Messages from 37175

Article: 37175
Subject: Crossing a clock domain
From: VR <crossing@notjordanbutaclockdomain.com>
Date: Mon, 3 Dec 2001 04:43:19 +0000 (UTC)
Hey all.

I have a uC that's updating a register in my XC4010E via a standard "three-wire" SPI.

The uC writes to an 8-bit shift register in the FPGA -- the uC's SPI clock is used to clock
the FPGA flip-flops and the SPI_nCS (chip select) is used as the ENABLE(active low) to the
register for shifting.

To prevent any odd behavior, I am double buffering my data -- the shift register is one
buffer, and I parallel load the data from the shift register into a second 8-bit register,
which is clocked by my FPGA native 40MHz clock. (The 2nd register feeds the inputs of a
loadable free running counter).

My problem is in the control of the second register. I only want this register to update
when the data (in the SPI register) is valid, but the SPI register is being clocked by something
completely asynchronous to the FPGA's clock.

My first idea was to use the SPI_nCS as the ENABLE(active high) on the second register; the
register would clock in data on rising edges of the FPGA clock and only when SPI_nCS was high.
Since the SPI_nCS "envelope" surrounds an SPI transaction, when the signal is NOT low, I know
an SPI operation wouldn't be occurring.

I also wondered if a better solution might be to use three T-flip-flops and divide down the
uC's SPI clock by 8, so on the 8th clock(when the last bit from the uC gets clocked into the
shiftreg) I register the SPI register's data. I would use the output from the third T-FF as
the ENABLE(active high) on my 2nd register and still clock the 2nd register from the FPGA
40MHz clock.

I'm sure all of the above will work, but I didn't know which would be a better solution (if
any) and if there are other things to keep in mind.

Thanks,
VR.

Article: 37176
Subject: Re: PCI card - 2 layers versus four layers
From: "Austin Franklin" <austin@dark98room.com>
Date: Mon, 3 Dec 2001 01:42:39 -0500
Did you read the PCI spec carefully?  The PCI spec requires power and ground
planes, since the maximum distance for the PCI power/ground connector pads
to the plane is .25", as stated in 4.4.2.1.  Also, it is a requirement to
decouple the 3.3V PCI pins too, even though they are not used, per the same
section.

"Dan" <daniel.deconinck@sympatico.ca> wrote in message
news:I8TN7.26205$cC5.2965973@news20.bellglobal.com...
> Hello,
>
> I am shipping a 2 layer PCI card (33MHz, 32-bit). It uses a Xilinx with a
> 2.5V core and 5 Volt tolerant IOs (XC2S50-5PQ208C).
>
> I laid out the board with as much ground plane on the bottom and as much
> routing on the top as was possible. It's 90% ground plane. I believe that
> this works OK on many PCs but I think I still need to improve the
> electrical characteristics of the board for proper operation across all PCs.
>
> I currently use through hole bypass caps all around the perimeter of the
> Xilinx chip.
>
> I am sure things will get better by switching to both surface mount caps
> and a four layer PCB. My question is how important is each of these two
> improvements when compared to one another? For example X% of the
> improvement will come by switching from through hole caps to surface
> mount and (100-X)% of the improvement will come from switching from two
> layers to four layers.
>
> I am wondering if simply switching to surface mount caps will give enough
> of a boost in performance.
>
>
> Sincerely
> Daniel DeConinck
> www.PixelSmart.com
> TEL: 416-248-4473
>
>
>



Article: 37177
Subject: Re: Crossing a clock domain
From: Philip Freidin <philip@fliptronics.com>
Date: Mon, 03 Dec 2001 07:26:26 GMT
On Mon, 3 Dec 2001 04:43:19 +0000 (UTC), VR
<crossing@notjordanbutaclockdomain.com> wrote:
>Hey all.
>
>I have a uC that's updating a register in my XC4010E via a standard "three-wire" SPI.
>
>The uC writes to an 8-bit shift register in the FPGA -- the uC's SPI clock is used to clock
>the FPGA flip-flops and the SPI_nCS (chip select) is used as the ENABLE(active low) to the
>register for shifting.

So far, so good. We will call this reg the SPIReg.

>To prevent any odd behavior, I am double buffering my data -- the shift register is one
>buffer, and I parallel load the data from the shift register into a second 8-bit register,
>which is clocked by my FPGA native 40MHz clock. (The 2nd register feeds the inputs of a
>loadable free running counter).

OK, we call this reg the ReloadReg.

Although you don't say so, all the following assumes that your free running
counter will reload from the ReloadReg at some arbitrary time with
respect to the updating process, i.e. it could use the second register while
a new value is arriving, has just arrived, or is about to arrive.

>My problem is in the the control of the second register.

Well actually, the problem is when to load ReloadReg (the second register).

>I only want this register to update when data(in the SPI register) is valid

Right

>but the SPI register is being clocked by something completely asynchronous
>to the FPGA's clock.

Right

>My first idea was to use the SPI_nCS as the ENABLE(active high) on the second
>register; the register would clock in data on rising edges of the FPGA clock and
>only when SPI_nCS was high.

This won't work. The failure scenario is that there is a race condition between
SPI_nCS going high, and the next FPGA clock. ENABLE for ReloadReg has
a setup and hold time requirement for reliable loading. This can be violated
by this scheme.

This will lead to an incorrect value being loaded into ReloadReg, consisting of
some new bits and some old bits, or to metastability. If you are unlucky, while the
ReloadReg has junk in it, it will be used as the reload value.

Following FPGA clocks will correct the value, but there is a finite probability
of the above scenario happening.

>Since the SPI_nCS "envelope" surrounds an SPI transaction, when the signal is NOT low, I know
>an SPI operation wouldn't be occurring.

The problem is on the boundary of SPI_nCS going from low to high. There is also
an issue with SPI_nCS going from high to low, and potentially corrupting a load
at this point too, but because the prior contents and the data on the D pins
are the same, this won't cause problems.

>I also wondered if a better solution might be to use three T-flip-flops and divide down the
>uC's SPI clock by 8, so on the 8th clock(when the last bit from the uC gets clocked into the
>shiftreg) I register the SPI register's data. I would use the output from the third T-FF as
>the ENABLE(active high) on my 2nd register and still clock the 2nd register from the FPGA
>40MHz clock.

This still has the problem of your ReloadReg enable being in the wrong
clock domain.

>I'm sure all of the above will work, but I didn't know which would be a better solution (if
>any) and if there are other things to keep in mind.

I'm sure the above will eventually fail.

Here's what I would do (have done in dozens of production designs ).

You are right to be concerned about the data for your counter being
in a register that is clocked by the same clock as the counter. The only
issue is how to load ReloadReg reliably.

Consider the following:

1) Once the SPIReg has been updated, we only need to copy it once
    into ReloadReg

2) The SPI clock is significantly slower than the 40MHz FPGA clock,
    so you can take a few 40MHz cycles to get the data safely over into
    the 40MHz domain, since it will take far longer for the SPI system to
    send a new value.

3) Since the SPI system is async to the 40MHz system, your design can't
    be sensitive to the exact 40MHz cycle in which ReloadReg is updated.
    So taking 3 * 40MHz cycles to do it safely is fine.

Here is a structure that will work reliably:

Connect 4 D flipflops as a 4 bit shifter (Q0 to D1, Q1 to D2, Q2 to D3).
Clock the shifter with the 40MHz clock.

Connect the input D0 to SPI_nCS.

Connect a 4 input AND gate to Q0, Q1, Q2, and ~Q3 , to detect the sequence
1,1,1,0 . This will occur after SPI_nCS has gone high, and 3 clocks of 40MHz.

If a metastability occurred or you failed a setup/hold requirement, you might
see something like the following:

1,0,0,0
0,0,0,0
1,0,0,0
1,1,0,0
1,1,1,0   <- match case
1,1,1,1
1,1,1,1

Regardless of metastabilities, by the time you get a match, things will
be settled, as you have taken 75 ns , which should be more than enough
given the characteristics of current FPGAs.

Take the output of the AND gate and pass it through 1 more FF. Now we
have 100ns of resolution time (and update latency) . Note that the match
signal will only be high for one 25ns cycle. Use the output of this FF as
the active high enable for ReloadReg.

This should be extremely reliable.
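
A cycle-level sketch of the structure described above, written in Python rather than HDL. It only models the logic (4-stage shifter, Q0 & Q1 & Q2 & ~Q3 detect, one extra flip-flop); it does not model analog metastability, so a "bad" first sample is represented simply as a wrong 0 or 1 in the input list. The function name and the one-sample-per-clock input format are assumptions made for illustration.

```python
def reload_enable(ncs_samples):
    """Return the ReloadReg enable, one value per 40 MHz clock edge.

    ncs_samples[i] is the value of SPI_nCS captured by the first
    flip-flop at clock edge i (hypothetical input format).
    """
    q = [0, 0, 0, 0]        # Q0..Q3; Q0 is the stage fed by SPI_nCS
    enable_ff = 0           # extra flip-flop after the AND gate
    enables = []
    for ncs in ncs_samples:
        # The AND gate sees the flip-flop outputs from before this edge.
        match = q[0] & q[1] & q[2] & (1 - q[3])
        # All flip-flops update together on the clock edge.
        q = [ncs, q[0], q[1], q[2]]
        enable_ff = match
        enables.append(enable_ff)
    return enables


if __name__ == "__main__":
    clean = [0, 0, 1, 1, 1, 1, 1, 1]      # SPI_nCS rises cleanly
    glitchy = [0, 1, 0, 1, 1, 1, 1, 1]    # first stage sampled inconsistently
    print(reload_enable(clean))
    print(reload_enable(glitchy))
```

In both runs the enable is high for exactly one clock, four clock edges after the last 0-to-1 change seen at Q0, which corresponds to the 100 ns latency mentioned above for a 25 ns clock.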

>Thanks,
>VR.

Philip Freidin
Fliptronics

Article: 37178
Subject: Synplify 7 and Xilinx 4.1 Pair
From: dottavio@ised.it (Antonio)
Date: 2 Dec 2001 23:39:23 -0800
Some questions about this pair:

1) I use Synplify to synthesize for Xilinx. Why is there always the
possibility to create a constraint file in the Xilinx tools as well? Do I
have to delete it?

2) Synplify produces an .ncf file containing P&R constraints. How can I
specify it as an input to the P&R?

Article: 37179
Subject: Multicycle Synplify question
From: dottavio@ised.it (Antonio)
Date: 2 Dec 2001 23:46:24 -0800
How can I recognize that a path is a multicycle path, so that I can
specify more clock cycles for it in the Synplify constraints? What happens
if it is not actually a multicycle path?

For example, for the following divide-by-3 counter:


library IEEE;
use IEEE.std_logic_1164.all;	  
use IEEE.std_logic_unsigned.all;

entity counter_divider_3 is
	port (
		clk			: in  STD_LOGIC;
		reset		: in  STD_LOGIC;
		count_3		: out STD_LOGIC_VECTOR (2 downto 0);
		clk_div_3 	: out STD_LOGIC 
		);
end counter_divider_3;


architecture counter_divider_3_arch of counter_divider_3 is
	signal int_count_3      : STD_LOGIC_VECTOR (1 downto 0) ;
	signal reset_clk_a_b    : STD_LOGIC_VECTOR (3 downto 0) ;
	signal count_0_delayed  : STD_LOGIC;
begin		

	process (clk , reset)
	begin 
		if reset='1' then  
			int_count_3 <= "01"; 
		elsif falling_edge(clk) then	   
			-- & here is bit concatenation, not a logical AND!
			int_count_3 <= int_count_3(0) & not(int_count_3(0) or int_count_3(1));
		end if;
	end process;

	process(clk)
	begin
   		if rising_edge(clk) then
       		count_0_delayed <= int_count_3(0);
   		end if;
	end process;
	
	clk_div_3 <= int_count_3(0) nor count_0_delayed;
	
	process(clk)
	begin
    	if falling_edge(clk) then
        	count_3 <= '0' & int_count_3;
    	end if;
	end process;
	
end counter_divider_3_arch;
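
As a cross-check of what the VHDL above computes, here is a half-cycle-resolution behavioral sketch in Python. It assumes the counter has just been released from reset ("01") and that count_0_delayed starts at 0, which the VHDL does not specify; the function name is made up for this illustration. The output is sampled once after every rising and every falling edge of clk.

```python
def simulate_div3(n_clock_cycles):
    """Return clk_div_3 sampled after each rising and falling clk edge."""
    c1, c0 = 0, 1      # int_count_3 = "01" after reset
    delayed = 0        # count_0_delayed (initial value is an assumption)
    out = []
    for _ in range(n_clock_cycles):
        # Rising edge: count_0_delayed <= int_count_3(0)
        delayed = c0
        out.append(int(not (c0 or delayed)))
        # Falling edge:
        # int_count_3 <= int_count_3(0) & not(int_count_3(0) or int_count_3(1))
        c1, c0 = c0, int(not (c0 or c1))
        out.append(int(not (c0 or delayed)))
    return out


if __name__ == "__main__":
    print(simulate_div3(6))
```

Under these assumptions the output repeats every three clk cycles and is high for three of the six half cycles, i.e. a divide-by-3 with 50% duty cycle. Note that this relies on paths between the falling-edge and rising-edge registers, which are half-cycle paths.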







I use these constraints for it:


# Synplicity, Inc. constraint file
# C:\Tesi\Aggiunte_fino_al_1_Gennaio_2002\Xilinx\CounterDivider3_2Dic2001\counter_divider_3.sdc
# Written on Sun Dec 02 17:37:07 2001
# by Synplify Pro, 7.0.1    Scope Editor
#
# Clocks
#
define_clock -disable  -comment {-improve}   -name {clk}  -freq 165.000 -clockgroup default_clkgroup
#
# Inputs/Outputs
#
define_input_delay -disable      -default
define_output_delay -disable     -default
define_output_delay -disable     {clk_div_3}
define_output_delay -disable     {count_3[2:0]}
define_input_delay -disable      {reset}
#
# Multicycle Path
#
define_multicycle_path           -comment {-improve}  -from {i:count_0_delayed}  -to {p:clk_div_3}  4
define_multicycle_path           -comment {-improve}  -from {i:int_count_3[1:0]}  -to {p:clk_div_3}  4
define_multicycle_path           -comment {-improve}  -from {i:int_count_3[1:0]}  -to {i:count_0_delayed}  4





I obtain an estimated clock of 185 MHz, but when I implement it in
Xilinx 4.1 the result is really bad, about 140 MHz. What is wrong?


Thanks

Article: 37180
Subject: Re: Is there a full open-source synthesis path for any FPGA?
From: Simon Gornall <simon@gornall.net>
Date: Mon, 03 Dec 2001 09:41:57 +0000
rickman wrote:
> 
> Simon Gornall wrote:
>
> [Reasons why GCC is a good but limited analogy to FPGA P&R]
> 
> That may all be true. But I still maintain that place and route software
> is inherently more complex than compilers.

No argument here!

> The tasks required to
> convert C language instructions to machine code for a given, well
> defined architecture is conceptually straight forward and well
> understood by nearly anyone graduating with a computer science degree.
> On the other hand, place and route algorithms are in a class of problems
> known as NP complete if my schooling has not failed me (or my memory).
> This means essentially that you can NEVER deterministically find the
> best solution to the problem for a realistic application given the state
> of technology in the foreseeable future. At least this is true until we
> are using Quantum computing which can explore all solution sets
> simultaneously.

Well, not quite. NP-Complete means you're both NP and NP-hard. "NP"
means a "Non-deterministic Turing machine can solve the problem
in Polynomial time". In practice, this means the solution will take
a loooong time on a real (deterministic) machine, because the known
exact methods for NP-complete problems involve either an enormous
number of iterations to get the answer, or they have a lot of
variables, increasing the search space. Sometimes (as in FPGA
routing, I'd expect) both :-(  The search space gets big very rapidly
when you have "lots" of potential solutions to examine :-((

There is the interesting factor that if you solve one NP-complete
problem efficiently, you can in theory solve them all, because any NP
problem can be transformed into any NP-complete one in polynomial time
as well...

> The difference in problem statement means that the algorithms for solving
> them and the means of developing them are very, very different. The
> suboptimal solution hunt will always require custom algorithms and
> special tuning that are far more device specific than what is done to
> write a code optimizer for a processor.

<grin> I did a PhD using neural networks to map feature spaces into 
decision trees. My major discovery was that the relaxation-labelling
equations that were developed for optic-flow are actually an instance
of the Hopfield neural network solution set. 

I'd expect that behind the scenes, you'd probably need a peer-voting
scheme with conventional constraint-based logic as inputs to multiple
types of solver - for example, you could have a genetic algorithm, a
K-nearest-neighbour and a neural network all providing possible 
solutions to localised routing, with a second tier above making the
decision as to which one to "accept" as a potential solution - the
one that best matches the other localised areas. I worked on some
similar stuff when I was a post-Doc.

> You obviously understand compilers pretty well. But what do you know
> about designing place and route software? I don't profess to be an
> expert, but this is a very different animal than writing a compiler.

Very little :-) It seems to me that the routing is the problem though,
and there are *lots* of techniques to try and maximise global "fit"
over local minima in the solution space.

> 
> > One could look at:
> >
> >   http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html
> >   http://www.eecg.toronto.edu/~vaughn/vpr/e64.html (routing images)
> >
> > as a darn good start. I mailed the guy who wrote the package about a
> > year ago though, and he said specifying the 'resource descriptions'
> > as I refer to them above is by far the hardest problem, because of
> > course you have to specify the constraints under which the resources
> > operate as well as the method by which you instantiate the constraint
> > on the resource.
> 
> This is encouraging. But how does it compare to the commercial tools?
> They don't say what the "chip" is. I assume it is an imaginary one, the
> routing appears to be very, very simplistic. Most FPGAs have multiple
> levels of routing and important limitations on how you can use that
> routing. I expect this would greatly complicate routing algorithms.

It will, but not necessarily to the level you expect. A constraint is
a constraint - whether it spans one CLB block or 4 or 16 doesn't really
matter. What does matter is the weighting given to how you would use
the resource, but that's part of the problem...

> But then maybe I am overstating the complexity of P&R algorithms. But
> they have been the bane of FPGA design for as long as there have been
> FPGAs. If you have a chip that runs 20% slower and have tools that
> optimize the P&R to give 20% better results, you will be able to meet or
> beat your competition. I am sure that every FPGA company works very hard
> to improve the P&R tools.

I'm not actually claiming it would be easy :-) I said I thought it would
be hard. I do think it's in the realm of the possible though. At the 
moment I have too much to do (I'm building a radio telescope and writing
the s/w to control it - I can do that in Linux so it takes priority over
the FPGA stuff)

Vaughn founded a company that's been bought by Altera, so he works for
them now. It'll be interesting to see if 'vpr' will stick around. Grab
a copy now!

> But none of that changes the viability of open source tools for FPGA
> design. Perhaps the availability of free (as in beer) tools and low cost
> hardware will encourage more "amateur" work in the tool area and we will
> start to see some open source tools. But I don't expect to see them
> being used much professionally during my career. I have about 10 - 15
> years left. We will see if anything changes my mind by then.

Agreed. I'm working on the premise that if Xilinx get moaned at often
enough, they will eventually listen. If only companies were as predictable
as FPGA routing :-)

ATB,
	Simon.

Article: 37181
Subject: Re: 128-bit scrambling and CRC computations
From: kahhean@hotmail.com (Chua Kah Hean)
Date: 3 Dec 2001 02:37:44 -0800
Hi all gurus out there,

I got very curious after reading all the posts in this thread.

I know that we can use a lookup table method to implement CRC in
parallel.  E.g. we can use a 256-byte table to calculate the CRC 8 bits
per cycle.

It seems to me that to use the same trick for a 128-bit input would
require a 2^128 element table, which must be a no-go.
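
For reference, the table method mentioned above can be sketched as follows. This is only an illustration under assumptions: the polynomial (CRC-16-CCITT, 0x1021), the zero initial value and the function names are chosen here and are not taken from the thread. The table has one entry per possible input byte, which is why the same approach with n input bits per step needs 2^n entries.

```python
POLY = 0x1021  # CRC-16-CCITT polynomial, chosen only for illustration


def crc16_bitwise(data, crc=0):
    """Reference implementation: one message bit per step, MSB first."""
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (crc >> 15) & 1
            crc = (crc << 1) & 0xFFFF
            if top ^ bit:
                crc ^= POLY
    return crc


# One entry per possible input byte: 2**8 = 256 entries.
TABLE = [crc16_bitwise(bytes([b])) for b in range(256)]


def crc16_table(data, crc=0):
    """Consume 8 message bits per step using the 256-entry table."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc16_bitwise(msg)), hex(crc16_table(msg)))
```

Both functions give the same result for any input; only the number of message bits consumed per step differs.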

Many people have talked about things like unrolling and pipelining the
input.  Can anybody point me to a source where such approaches are
explained in greater detail so that I can appreciate what you all have
been driving at?

Thanks in advance.

TA TA
kahhean

Article: 37182
Subject: Free PCI simulation model???
From: "Sul Weh" <sweather1999@yahoo.com>
Date: Mon, 03 Dec 2001 11:53:09 GMT
Does anyone know where I can find a free PCI simulation model in VHDL?
32-bit @ 33MHz or 66MHz is preferable.

thanks

SW

Article: 37183
Subject: Re: 128-bit scrambling and CRC computations
From: allan_herriman.hates.spam@agilent.com (Allan Herriman)
Date: Mon, 03 Dec 2001 12:30:18 GMT
On 3 Dec 2001 02:37:44 -0800, kahhean@hotmail.com (Chua Kah Hean)
wrote:

>Hi all gurus out there,
>
>I got very curious after reading all the posts in this thread.
>
>I know that we can use a lookup table method to implement CRC in
>parallel.  E.g. we can use a 256-byte table to calculate the CRC 8 bits
>per cycle.
>
>It seems to me that to use the same trick for a 128-bit input would
>require a 2^128 element table, which must be a no-go.
>
>Many people have talked about things like unrolling and pipelining the
>input.  Can anybody point me to a source where such approaches are
>explained in greater detail so that I can appreciate what you all have
>been driving at?

Instead of a monster lookup table mimicking a bunch of XOR gates, just
use the XOR gates directly.  Many of the terms cancel out:  A xor A =
0,  0 xor A = A, etc. so the number of xor gates usually isn't
excessive and you avoid the exponential growth in table size.
(It's actually the depth of the xor gates, not the number of them,
that matters, because the depth determines the delay and hence the
clock rate.)

Take a look at the logic generated by some of the free online parallel
CRC generators:
http://www.easics.be/webtools/crctool
http://www.geocities.com/steve0192/vhdl.htm

The first one (crctool) will generate a function that turns an input
word and a feedback word into a new CRC value, which is the feedback
word for the next clock.

Here's the logic generated by crctool for one bit of a 16 bit CRC with
128 bit input word:

D := Data;	-- the input word
C := CRC;	-- the feedback word

NewCRC(0) := D(127) xor D(125) xor D(124) xor D(123) xor D(122) xor
D(121) xor D(120) xor D(111) xor D(110) xor D(109) xor
D(108) xor D(107) xor D(106) xor D(105) xor D(103) xor
D(101) xor D(99) xor D(97) xor D(96) xor D(95) xor
D(94) xor D(93) xor D(92) xor D(91) xor D(90) xor D(87) xor
D(86) xor D(83) xor D(82) xor D(81) xor D(80) xor D(79) xor
D(78) xor D(77) xor D(76) xor D(75) xor D(73) xor D(72) xor
D(71) xor D(69) xor D(68) xor D(67) xor D(66) xor D(65) xor
D(64) xor D(63) xor D(62) xor D(61) xor D(60) xor D(55) xor
D(54) xor D(53) xor D(52) xor D(51) xor D(50) xor D(49) xor
D(48) xor D(47) xor D(46) xor D(45) xor D(43) xor D(41) xor
D(40) xor D(39) xor D(38) xor D(37) xor D(36) xor D(35) xor
D(34) xor D(33) xor D(32) xor D(31) xor D(30) xor D(27) xor
D(26) xor D(25) xor D(24) xor D(23) xor D(22) xor D(21) xor
D(20) xor D(19) xor D(18) xor D(17) xor D(16) xor D(15) xor
D(13) xor D(12) xor D(11) xor D(10) xor D(9) xor D(8) xor
D(7) xor D(6) xor D(5) xor D(4) xor D(3) xor D(2) xor
D(1) xor D(0) xor C(8) xor C(9) xor C(10) xor C(11) xor
C(12) xor C(13) xor C(15);
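
Equations of this kind can be derived mechanically by unrolling a bit-serial CRC symbolically. The sketch below is not crctool's algorithm, just a minimal illustration: each CRC bit is held as a set of terms, and XOR is the set symmetric difference, so that A xor A cancels by itself. The polynomial used for the listing above is not stated in the post, so the example assumes CRC-16-CCITT (0x1021) with the data word shifted in MSB first; the resulting term lists therefore need not match the listing. All function names are made up for this example.

```python
def parallel_crc_equations(poly, crc_width, data_width):
    """Unroll a bit-serial CRC symbolically.

    Returns a list where entry i is the set of terms, such as ('C', 3)
    or ('D', 7), whose XOR gives bit i of the new CRC.
    """
    crc = [{("C", i)} for i in range(crc_width)]
    for j in range(data_width - 1, -1, -1):        # MSB of the data first
        fb = crc[crc_width - 1] ^ {("D", j)}       # feedback bit
        nxt = []
        for i in range(crc_width):
            shifted = crc[i - 1] if i > 0 else set()
            # XOR in the feedback wherever the polynomial has a tap.
            nxt.append(shifted ^ fb if (poly >> i) & 1 else set(shifted))
        crc = nxt
    return crc


def evaluate(equations, crc_value, data_value):
    """Evaluate the XOR equations for concrete CRC and data values."""
    out = 0
    for i, terms in enumerate(equations):
        bit = 0
        for kind, idx in terms:
            src = crc_value if kind == "C" else data_value
            bit ^= (src >> idx) & 1
        out |= bit << i
    return out


def serial_crc(poly, crc_width, data_width, crc, data):
    """Bit-serial reference used to check the unrolled equations."""
    mask = (1 << crc_width) - 1
    for j in range(data_width - 1, -1, -1):
        fb = ((crc >> (crc_width - 1)) ^ (data >> j)) & 1
        crc = (crc << 1) & mask
        if fb:
            crc ^= poly
    return crc


if __name__ == "__main__":
    eqs = parallel_crc_equations(0x1021, 16, 128)
    terms = sorted(eqs[0], key=lambda t: (t[0], -t[1]))
    print("NewCRC(0) :=", " xor ".join(f"{k}({i})" for k, i in terms))
```

The length of each term list gives the fan-in of the XOR tree for that output bit, which is what determines the logic depth discussed below.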


(Switch to a fixed-pitch font.)

Here's the logic you'll end up with:

clock-----------------------+
                            |
        +-------+      +----------+
        | huge  |      | register |
input-->| xor   |----->|d        q|--+-> CRC out
(128)   | tree  | (16) |          |  |    (16)
        +-------+      +----------+  |
            ^                        |
            |                        |
            +------------------------+
                  feedback (16)

The "speed" is determined by the minimum clock period, which in this
case is limited by the number of logic levels in the xor tree - i.e.
the maximum delay between any flip flop output and any flip flop
input.
You can't do anything with this directly, as the feedback must happen
in a single clock cycle.

If you look more closely at the logic expression, you'll see that it
can be decomposed into the form (input xor feedback) where input is
the xor of a bunch of input bits, and feedback is the xor of a bunch
of feedback bits.

This leads to the following design:

clock--------------------------------------+
                                           |
        +-------+      +-------+      +----------+
        | medium|      | small |      | register |
input-->| xor   |----->| xor   |----->|d        q|--+-> CRC
(128)   | tree  | (16) | tree  | (16) |          |  |   out
        +-------+      +-------+      +----------+  |   (16)
                           ^                        |
                           |                        |
                           +------------------------+
                               feedback (16)

This isn't any faster than the first attempt, but notice that the
"medium xor tree" is not in the feedback path.  This means it can be
pipelined - we can put flip flops in the logic so that the calculation
is performed over several clock cycles.  The logic depth between any
flip flop output and any flip flop input is reduced - we can have a
faster clock.

This is shown here:

clock-----------------------+-----------------------------
                            |
        +-------+      +----------+      +-------+      +-
        | medium|      | register |      | small |      |
input-->| xor   |----->|d        q|----->| xor   |----->|d
(128)   | tree  | (16) |          | (16) | tree  | (16) |
        +-------+      +----------+      +-------+      +-
                                             ^
                                             |
                                             +------------
                                                   feedbac

(I pruned the right side to avoid line wrap, but you should get the
idea.)
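
The decomposition and the pipelining step can be checked with a small behavioral model. This is a sketch under assumptions, not the generated logic: it uses CRC-16-CCITT (0x1021) with a zero initial value and computes each "tree" with a bit-serial loop instead of real XOR gates; the function names are invented here. It relies only on the fact that the CRC update is linear over GF(2), so the next CRC equals (feedback part) xor (input part).

```python
POLY, WIDTH, WORD = 0x1021, 16, 128   # illustrative CRC-16-CCITT, 128-bit words


def crc_step(crc, word):
    """Advance the CRC by one WORD-bit input word (bit-serial reference)."""
    for j in range(WORD - 1, -1, -1):
        fb = ((crc >> (WIDTH - 1)) ^ (word >> j)) & 1
        crc = (crc << 1) & ((1 << WIDTH) - 1)
        if fb:
            crc ^= POLY
    return crc


def input_tree(word):
    """The "medium xor tree": depends on the input word only."""
    return crc_step(0, word)


def feedback_tree(crc):
    """Feedback part of the "small xor tree": depends on the CRC only."""
    return crc_step(crc, 0)


def pipelined_crc(words, init=0):
    """Register the input tree output, then combine it with the feedback."""
    pipe_reg, pipe_valid, crc = 0, False, init
    for word in list(words) + [None]:          # one extra clock to flush
        if pipe_valid:
            # CRC register update uses the value registered one clock ago.
            crc = feedback_tree(crc) ^ pipe_reg
        if word is not None:
            pipe_reg, pipe_valid = input_tree(word), True
        else:
            pipe_valid = False
    return crc


if __name__ == "__main__":
    words = [0x0123456789ABCDEF0123456789ABCDEF, (1 << 128) - 1, 1]
    ref = 0
    for w in words:
        ref = crc_step(ref, w)
    print(hex(ref), hex(pipelined_crc(words)))
```

The pipelined version produces the same CRC as the single-cycle version, one clock later; the input tree could be split over further register stages in the same way because it is outside the feedback loop.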

In theory the synthesis tools can do all this for you.  E.g. you can
describe a serial CRC calculation, put it in a for loop to iterate
over the input word, tell it how many clock cycles to take, and the
synthesiser should spit out something equivalent to the above.
(I have used this approach with LFSRs with some success at these bit
rates.)

I could make a comment about the relative benefits of HDLs and
schematics for high speed design, but I don't want to ignite yet
another religious war.

Regards,
Allan.

Article: 37184
Subject: Re: PCI card - 2 layers versus four layers
From: acher@in.tum.de (Georg Acher)
Date: 3 Dec 2001 14:19:19 GMT
In article <u0m7n0ku4ccvd2@corp.supernews.com>,
 "Austin Franklin" <austin@dark98room.com> writes:
|> Did you read the PCI spec carefully?  The PCI spec requires power and ground
|> planes, since the maximum distance for the PCI power/ground connector pads
|> to the plane is .25", as stated in 4.4.2.1. <...>

Then 80% of the cheaper network and sound cards violate the spec. I have never
seen an RTL8139 based network card with a multilayer PCB.

-- 
         Georg Acher, acher@in.tum.de         
         http://www.in.tum.de/~acher/
          "Oh no, not again !" The bowl of petunias          

Article: 37185
Subject: XC17S00A programmable as XC17S00 for 2 XC2Ss?
From: Utku Ozcan <ozcan@netas.com.tr>
Date: Mon, 03 Dec 2001 17:19:08 +0200

We need a PROM that can program two Spartan-II
XC2S50-5FG256Cs at the same time.

The XC2S50 has a configuration file size of 559,200 bits.
So we need a PROM of size 2 x 559,200 = 1,118,400
bits.

The devices I have found are

XC17S00A 3.3V XC17S200A one time programmable
XC17V00A 3.3V XC17V02 in system programmable

The problem is, we need an OTP SPROM but our Data I/O
Programmer device only supports XC17S00 and XC17S00XL
devices but not XC17S00A devices.

However, when I looked at the datasheet of the XC17S00/XL
http://www.xilinx.com/support/programr/files/17s00.pdf
and the datasheet of the XC17S00A http://www.xilinx.com/partinfo/ds078.pdf
I saw that the internal logic of these devices is the same.
I haven't found any difference.

The XC17S00/XL family doesn't have any PROM that can hold two
Spartan-II XC2S50 configurations at the same time. On the other hand,
Xilinx doesn't say that Spartan-II devices can be programmed
with XC17S00/XL PROMs.

The reason why we look at XC17S00/XL devices for Spartan-II
is that our Data I/O Programmer only supports XC17S00/XL
devices.

My assumption is that the XC17S00/XL mode of the Data I/O Programmer is
compatible with XC17S00A PROMs, so that I can use the XC17S00/XL
mode of the Data I/O Programmer to program XC17S00A PROMs. Can anyone confirm this?

Utku

Article: 37186
Subject: Benchmarking RC
From: Hananiel Sarella <hsarella@honeybee.ececs.uc.edu>
Date: 03 Dec 2001 10:44:04 -0500
Hello,
        I'm a grad student trying to benchmark an FPGA board. Are there any
 non-proprietary benchmarks available for FPGAs? Specifically, I am looking for
 benchmark efforts that measure the performance of multiple FPGA chips and the
 interconnect between them. Pointers to any benchmarking effort will be helpful.
 Thank you very much,
hananiel


Article: 37187
Subject: Re: Phase noise (jitter) of XILINX logic elements - ?
From: Austin Lesea <austin.lesea@xilinx.com>
Date: Mon, 03 Dec 2001 07:48:09 -0800
Alex,

I prefer to look at this as "what is the jitter noise floor" in a CMOS FPGA?

Getting in, and getting out of the FPGA is the biggest problem, followed by the
internal distribution of the clock signals.

This is something we have carefully characterized, as we are the 'FPGA Lab'
responsible for the verification of the design.

To get in, get onto a BUFG (global clock resource) and then get out (by using
the DDR clock forwarding FF's) is about 35 to 55 ps P-P (nothing else
happening).

If you have another BUFG operating, the jitter goes up to 55 ps to 65 ps P-P.

If you then have 10% of all nodes in a 2V3000 all toggle at the same time on the
same clock domain (BUFG), the jitter measured is ~ 150 ps P-P on an
asynchronous clock domain.

The primary source of jitter is the coupling through the ground, which affects
the slicing level of all of the logic.

Use of LVDS input buffers, and output buffers helps for the external jitter
contributors, but does nothing for the internal contributors.

150 ps P-P of jitter in a design was ignored up until recently.  With DDR
(double data rate) logic designs, and clock periods of 4 ns in some designs, the
half clock period is 2 ns, and 150 ps becomes a significant part of the timing
budget.
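
The arithmetic behind that statement, as a small sketch using only the figures quoted above (the function name and the `ddr` flag are made up for this illustration):

```python
def jitter_fraction(jitter_pp_ps, clock_period_ps, ddr=True):
    """Fraction of the usable timing window consumed by P-P jitter."""
    # With DDR both clock edges are used, so the window is half a period.
    window_ps = clock_period_ps / 2 if ddr else clock_period_ps
    return jitter_pp_ps / window_ps


if __name__ == "__main__":
    # 150 ps P-P jitter, 4 ns clock period
    print(f"DDR: {jitter_fraction(150, 4000):.1%} of the half period")
    print(f"SDR: {jitter_fraction(150, 4000, ddr=False):.2%} of the full period")
```

For the numbers in this post, the jitter takes 150 ps out of a 2000 ps half period, i.e. 7.5% of the DDR timing window.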

See:

 http://www.xilinx.com/support/techxclusives/slack-techX21.htm

Austin

Alex Sherstuk wrote:

> Dear colleagues,
>
> Some time ago there was discussion about phase noise (jitter) introduced by
> XILINX FPGA DLL's
>
> Here is an other question:
>
> What phase noise (jitter) is introduced by a regular logic element of XILINX
> FPGA (e.g. SPARTAN2)?
> What is the timing uncertainty introduced by XILINX CLB trigger?
>
> Thanks,
>    Alex


Article: 37188
Subject: Re: Modelsim
From: Santiago de Pablo <sanpab@eis.uva.es>
Date: Mon, 03 Dec 2001 16:58:33 +0100


"Ed Browne, Precision Electronic Solutions" wrote:
> 
> It's appalling that Xilinx would sell a product to design an FPGA/CPLD
> without the ability to simulate the design unless you buy a $1000+
> simulator.  Neither the free version nor the eval version allows testing on
> anything over 500 lines.  At that limit on my machine, it simply closes - no
> slowing down.
> 
> Does anyone have a lower cost alternative, preferably one that would accept
> the HDL bencher output?

Hi Ed and all:

  I have used the WebPACK HDL simulator (i.e. the Mentor Graphics
at-last-free ModelSim simulator) and it runs OK for small to medium designs
(see http://www.DTE.eis.uva.es/OpenProjects/OpenDSP/index.htm). The
*trick* is to use VHDL or Verilog to design the circuit (up to, as you
know, 500 lines) and use TCL (*.cmd) files to simulate (at the HDL level,
of course; I wish I could simulate the routed design with it, but I cannot:
I use Foundation). Yes, if you use *.cmd files for simulations instead of
HDL test benches, you can simulate bigger designs.

  Another difference: simulating the same 100K-line code took 120 seconds
with the 500-line limitation and 2 seconds without it.
Nice, but why pay!

> 
> Ed Browne
> Precision Electronic Solutions
> 

Regards, Santiago (sanpab@eis.uva.es).

Article: 37189
Subject: Re: 128-bit scrambling and CRC computations
From: rickman <spamgoeshere4@yahoo.com>
Date: Mon, 03 Dec 2001 11:12:41 -0500
Links: << >>  << T >>  << A >>
Allan Herriman wrote:
> Here's the logic generated by crctool for one bit of a 16 bit CRC with
> 128 bit input word:
> 
> D := Data;      -- the input word
> C := CRC;       -- the feedback word
> 
> NewCRC(0) := D(127) xor D(125) xor D(124) xor D(123) xor D(122) xor
> D(121) xor D(120) xor D(111) xor D(110) xor D(109) xor
> D(108) xor D(107) xor D(106) xor D(105) xor D(103) xor
> D(101) xor D(99) xor D(97) xor D(96) xor D(95) xor
> D(94) xor D(93) xor D(92) xor D(91) xor D(90) xor D(87) xor
> D(86) xor D(83) xor D(82) xor D(81) xor D(80) xor D(79) xor
> D(78) xor D(77) xor D(76) xor D(75) xor D(73) xor D(72) xor
> D(71) xor D(69) xor D(68) xor D(67) xor D(66) xor D(65) xor
> D(64) xor D(63) xor D(62) xor D(61) xor D(60) xor D(55) xor
> D(54) xor D(53) xor D(52) xor D(51) xor D(50) xor D(49) xor
> D(48) xor D(47) xor D(46) xor D(45) xor D(43) xor D(41) xor
> D(40) xor D(39) xor D(38) xor D(37) xor D(36) xor D(35) xor
> D(34) xor D(33) xor D(32) xor D(31) xor D(30) xor D(27) xor
> D(26) xor D(25) xor D(24) xor D(23) xor D(22) xor D(21) xor
> D(20) xor D(19) xor D(18) xor D(17) xor D(16) xor D(15) xor
> D(13) xor D(12) xor D(11) xor D(10) xor D(9) xor D(8) xor
> D(7) xor D(6) xor D(5) xor D(4) xor D(3) xor D(2) xor
> D(1) xor D(0) xor C(8) xor C(9) xor C(10) xor C(11) xor
> C(12) xor C(13) xor C(15);
> 
> (Switch to fixed point font.)
> 
> Here's the logic you'll end up with:
> 
> clock-----------------------+
>                             |
>         +-------+      +----------+
>         | huge  |      | register |
> input-->| xor   |----->|d        q|--+-> CRC out
> (128)   | tree  | (16) |          |  |    (16)
>         +-------+      +----------+  |
>             ^                        |
>             |                        |
>             +------------------------+
>                   feedback (16)
> 
> The "speed" is determined by the minimum clock period, which in this
> case is limited by the number of logic levels in the xor tree - i.e.
> the maximum delay between any flip flop output and any flip flop
> input.
> You can't do anything with this directly, as the feedback must happen
> in a single clock cycle.
> 
> If you look more closely at the logic expression, you'll see that it
> can be decomposed into the form (input xor feedback) where input is
> the xor of a bunch of input bits, and feedback is the xor of a bunch
> of feedback bits.
> 
> This leads to the following design:
> 
> clock--------------------------------------+
>                                            |
>         +-------+      +-------+      +----------+
>         | medium|      | small |      | register |
> input-->| xor   |----->| xor   |----->|d        q|--+-> CRC
> (128)   | tree  | (16) | tree  | (16) |          |  |   out
>         +-------+      +-------+      +----------+  |   (16)
>                            ^                        |
>                            |                        |
>                            +------------------------+
>                                feedback (16)
> 
> This isn't any faster than the first attempt, but notice that the
> "medium xor tree" is not in the feedback path.  This means it can be
> pipelined - we can put flip flops in the logic so that the calculation
> is performed over several clock cycles.  The logic depth between any
> flip flop output and any flip flop input is reduced - we can have a
> faster clock.
> 
> This is shown here:
> 
> clock-----------------------+-----------------------------
>                             |
>         +-------+      +----------+      +-------+      +-
>         | medium|      | register |      | small |      |
> input-->| xor   |----->|d        q|----->| xor   |----->|d
> (128)   | tree  | (16) |          | (16) | tree  | (16) |
>         +-------+      +----------+      +-------+      +-
>                                              ^
>                                              |
>                                              +------------
>                                                    feedbac
> 
> (I pruned the right side to avoid line wrap, but you should get the
> idea.)
> 
> In theory the synthesis tools can do all this for you.  E.g. you can
> describe a serial CRC calculation, put it in a for loop to iterate
> over the input word, tell it how many clock cycles to take, and the
> synthesiser should spit out something equivalent to the above.
> (I have used this approach with LFSRs with some success at these bit
> rates.)
> 
> I could make a comment about the relative benefits of HDLs and
> schematics for high speed design, but I don't want to ignite yet
> another religious war.
> 
> Regards,
> Allan.

I am not clear about how you generated this logic, but it does not match
the general problem. Even though there are only 16 bits in the CRC,
there should be 128 bits in the "feedback" register as well as in the
input. This means that there would be about the same number of feedback
signals to the "small" XOR tree as there are input signals to the medium
tree. So pipelining will reduce your complexity roughly by a factor of
2, but not by as much as your analysis above indicates. This of course
does not reduce the number of logic levels by a factor of 2, but only by
about half a LUT level when using 4-input LUTs.

Try this with a very simple one like X43. You start with 43 bits in the
register and have to add one bit for every extra bit in the input word.
If you have 16 bits in at one time, you need a 58 bit feedback word. 
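
A small sketch of that scrambler case, under stated assumptions: I take "X43" to mean a self-synchronous scrambler with polynomial 1 + x^43, so each output bit is the input bit xor the output from 43 bits earlier, and I start from an all-zero history. The function names are made up. The word-wide version keeps a window of 43 + 16 - 1 = 58 past output bits, which is one way to arrive at the 58 quoted above; that reading of the number is mine.

```python
import random

DELAY = 43  # assumed scrambler polynomial: 1 + x^43


def scramble_serial(bits, delay=DELAY):
    """One bit per step: out[t] = in[t] xor out[t - delay]."""
    out = []
    for t, b in enumerate(bits):
        prev = out[t - delay] if t >= delay else 0
        out.append(b ^ prev)
    return out


def scramble_words(bits, width=16, delay=DELAY):
    """Process 'width' bits per step using a window of past outputs.

    The window holds delay + width - 1 bits; window[-1] is the newest
    output. Requires width <= delay so every tap is an already-known bit.
    """
    span = delay + width - 1
    window = [0] * span
    out = []
    for start in range(0, len(bits), width):
        word = bits[start:start + width]
        # Bit i of the word taps the output 'delay' positions behind it.
        new = [b ^ window[span - delay + i] for i, b in enumerate(word)]
        out.extend(new)
        window = (window + new)[-span:]
    return out


random.seed(1)
data = [random.randint(0, 1) for _ in range(16 * 20)]
same = scramble_serial(data) == scramble_words(data)
print("window length:", DELAY + 16 - 1, "match:", same)
```

Both versions produce the same bit stream; only the number of bits handled per clock changes.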

Hmmm... does that mean that there should be 128 + C - 1 bits in the
register, where C is the size of your CRC? I don't remember that being
the case.
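
For the CRC case, the (input xor feedback) decomposition quoted above can be checked numerically. This is a minimal sketch and not crctool output: it assumes a bit-serial, MSB-first CRC-16 with polynomial 0x1021 (the thread does not say which polynomial was used), and crc16_update is a made-up name. In this model the state carried from one 128-bit word to the next is the 16-bit CRC register only.

```python
import random


def crc16_update(crc, data, nbits=128, poly=0x1021):
    """Advance a 16-bit CRC register over one nbits-wide input word.

    Assumptions: MSB-first bit order, no reflection, no final xor.
    """
    for i in reversed(range(nbits)):
        bit = (data >> i) & 1
        msb = (crc >> 15) & 1
        crc = (crc << 1) & 0xFFFF
        if msb ^ bit:
            crc ^= poly
    return crc


random.seed(2)
checks = []
for _ in range(100):
    crc = random.getrandbits(16)
    data = random.getrandbits(128)
    full = crc16_update(crc, data)           # the "huge xor tree"
    input_part = crc16_update(0, data)       # input-only tree, can be pipelined
    feedback_part = crc16_update(crc, 0)     # feedback-only tree, 16 inputs
    checks.append(full == input_part ^ feedback_part)

print("decomposition holds:", all(checks))
```

The split works because every step of the update is linear over GF(2), so the contributions of the data word and of the old CRC value can be computed separately and xored at the end.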


-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 37190
Subject: Re: problem with manual floorplanner
From: husby_d@yahoo.com (Don Husby)
Date: 3 Dec 2001 08:36:43 -0800
Links: << >>  << T >>  << A >>
Theron Hicks <hicksthe@egr.msu.edu> wrote in message news:<3C0930CF.8C3C49B4@egr.msu.edu>...
>     I have a problem with the manual floorplanner in ise4.1.  I have a
> design which I know will place and route but the system will not quite
> make it if I use all coregen parts.  If I use mostly inferred counters
> and adders it will work OK.  If I try to manually place the parts then
> they get screwed up.  I would like to use the absolute simplest method
> to locate the coregen parts using the UCF file
> ...

I've seen this problem too.  Usually, it works to floorplan
one flip-flop from the middle of a counter.  The others will be
placed correctly by PAR.  If you just have carry logic without
flip-flops, you're out of luck.  You can try an area constraint
(fit the logic in a rectangle).  I think this is possible with the
floorplanner.  The best solution is to instantiate each element
of the carry chain.  Even this doesn't always work with the
floorplanner.

Article: 37191
Subject: Xilinx Parallel FIR Implementations
From: bgaughan@aircom.com (Brady Gaughan)
Date: 3 Dec 2001 08:37:16 -0800
Links: << >>  << T >>  << A >>
I have been looking at an upcoming design where I will have 2 16-bit
channels of downconversion/decimation and a 16-bit channel of
upconversion/interpolation.
My two sample rates are 102.4MHz and 25.6MHz, with overall dec. by 4
or interp. by 4.  I am also performing shifts of 12.8MHz which works
out to fs/8, fs/4 depending on which stage I perform them.  The fs/8
shift is less attractive because of the root(2)/2 terms and it would
run at 102.4MHz.  However, this fs/8 shift would allow me to use a
single filter per channel, versus 2 filter stages and fs/4 at 51.2MHz.
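
To illustrate why the fs/8 shift is the less attractive one, here is a rough sketch that lists the coefficient magnitudes a complex mixer needs for each option. The downward shift direction and the function names are my assumptions; the rates are the ones given above.

```python
import cmath


def mixer_sequence(shift_hz, sample_rate_hz):
    """One period of exp(-j*2*pi*shift*n/fs); assumes fs/shift is an integer."""
    period = round(sample_rate_hz / shift_hz)
    return [cmath.exp(-2j * cmath.pi * n / period) for n in range(period)]


def distinct_magnitudes(seq):
    """Distinct |real| and |imag| values in the sequence, rounded to 6 places."""
    mags = set()
    for z in seq:
        for part in (z.real, z.imag):
            mags.add(round(abs(part), 6))
    return sorted(mags)


fs4 = distinct_magnitudes(mixer_sequence(12.8e6, 51.2e6))    # shift at 51.2 MHz
fs8 = distinct_magnitudes(mixer_sequence(12.8e6, 102.4e6))   # shift at 102.4 MHz
print("fs/4 magnitudes:", fs4)
print("fs/8 magnitudes:", fs8)
```

The fs/4 sequence needs only 0 and 1 (sign changes and I/Q swaps), while the fs/8 sequence adds the root(2)/2 term, which costs real multiplies at the full 102.4 MHz rate.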

I have been targeting the VirtexE or Virtex2 families.  Along with
Matlab sims, I have been generating DA FIR cores to get some
size/speed estimates for FIRs with 16-bit inputs and 16-bit
coefficients.  While the size/speed of Serial DA and nearly-Serial DA
approaches are attractive, I need full rate or nearly-full rate
filters.  This has been leading me towards full parallel DA FIRs. The
first thing apparent is that these start to get large, but intuitively
I would think that these would approach the size of multiplier-based
designs?

So, if I'm heading towards MAC based FIRs, I wonder about using the
Virtex2 and its dedicated multipliers.  I understand from my local
Xilinx FAE that MAC-based FIR cores may be in the next Coregen update?
 I could use 4 Block Multipliers in a polyphase-type arrangement for
my dec. by 4 paths.  I could also exploit Halfband or other filter
symmetry to implement efficient FIRs.
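
A minimal sketch of the polyphase idea for the decimate-by-4 path, assuming a plain causal FIR with integer coefficients; the function names and the sample data are made up. It only shows that four sub-filters running at the output rate give the same result as filtering at the input rate and keeping every fourth sample. It says nothing about how the Virtex2 block multipliers would actually be shared or clocked.

```python
def decimate_direct(x, h, m=4):
    """Filter at the input rate, keep every m-th output: y[n] = sum h[k] x[m*n - k]."""
    def xs(i):
        return x[i] if 0 <= i < len(x) else 0
    return [sum(c * xs(m * n - k) for k, c in enumerate(h))
            for n in range(len(x) // m)]


def decimate_polyphase(x, h, m=4):
    """Same result from m sub-filters h[p::m], each working at the output rate."""
    def xs(i):
        return x[i] if 0 <= i < len(x) else 0
    out = []
    for n in range(len(x) // m):
        acc = 0
        for p in range(m):
            branch = h[p::m]  # taps h[p], h[p+m], h[p+2m], ...
            acc += sum(c * xs(m * (n - j) - p) for j, c in enumerate(branch))
        out.append(acc)
    return out


samples = list(range(1, 33))                 # made-up input data
taps = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]     # made-up coefficients
y_direct = decimate_direct(samples, taps)
y_poly = decimate_polyphase(samples, taps)
print("outputs match:", y_direct == y_poly)
```

Each of the four branches sees only every fourth input sample, so one multiplier per branch has a full output-rate period (25.6 MHz here) per tap group rather than an input-rate period.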

I would like to get some input from others as to what they have done
for similar applications.  Thanks for any insight!

Brady Gaughan
Airnet Communications
bgaughan@nospam.aircom.com

Article: 37192
Subject: Re: XNF file is rewritten and rendered useless
From: Brian Philofsky <brian.philofsky@xilinx.com>
Date: Mon, 03 Dec 2001 10:00:38 -0700
Links: << >>  << T >>  << A >>
For this case, I would suggest instantiating an FDCE rather than an FDC.  An
FDCE is a primitive rather than a macro, so you would not need an external XNF or
other file to describe it to the tools.  You can tie the clock enable to a logic
1 to keep it permanently enabled, and it should then serve the same function.

This does not explain your problem, but should get you going much quicker.


---  Brian



Don Teeter wrote:

> Please help if you can.  My VHDL design in Xilinx Foundation 2.1i uses
> library macro FDC.  To instantiate I include the provided file FDC.XNF as a
> source file.  At some point during synthesis the XNF file changes, losing
> some ports.  Then when I attempt to implement, I get error messages:
>
> Error: The pin 'D' of the cell 'cmdproc/U1' does not have an associated
> signal in the XNF design 'fdc'. (FPGA-LINK-17)
>
> One of these messages for each of three now-missing ports.  If I look in the
> XNF file I see it has changed and the ports are missing from it.  What
> gives?  How to prevent?  Thank you,
>
> Don T.



Article: 37193
Subject: Re: What do you like/dislike about place and route tools?
From: John_H <johnhandwork@mail.com>
Date: Mon, 03 Dec 2001 18:05:36 GMT
Links: << >>  << T >>  << A >>
I love seeing comments from others that reinforce the gripes I've had over
time.

My experience with the Altera MaxPlus-II tools is dated, with no Quartus to
back me up, but what I saw then is consistent with what I continue to see
in Xilinx:

   Nobody is doing "critical route" placement.

In my opinion, the best way to place and route a design is to figure out
what paths will be the most difficult to route.  When the delay paths with
two carry chains and four levels of logic are routed first, the paths with
two levels of logic should be cake to P&R with fewer resources available.
I don't care if my logic goes from one corner to another if it doesn't
impact the timing for that path and the critical routing resources have
already been used.  What does irk me is finding part of my few tight paths
getting placed inefficiently.  To have these critical paths in different
rows in my Altera design or in different, non-adjacent CLBs in my Xilinx
designs is irresponsible.

The Xilinx mapper in particular works against the concept of a critical
route based place and route.  The idea that the design should 1) be placed
then 2) be routed doesn't work (to any level of efficiency).  The P&R tool
should be 1) place, 2) route, 3) place, 4) route, 5) place, etc....  At
least in (an upcoming servicepack of) the version 4.1i tools there's some
attention given to critical routes, though more of a ripup and retry
approach.  Strike that, I think they refer to it as a retry:  no ripup of
any paths that meet timing.  This is a big step in the right direction but
still may be too little, too late in the P&R process to give the leaps in
performance.

With the proper P&R strategies, the silicon that's been designed to kick
some serious butt will finally be able to do just that.  Having the P&R
kick the engineer in the butt really needs to stop.

I hope the research you're doing is toward a very good end!


Article: 37194
Subject: Re: PCI card - 2 layers versus four layers
From: "Austin Franklin" <austin@dark98room.com>
Date: Mon, 3 Dec 2001 13:57:28 -0500
Links: << >>  << T >>  << A >>
Just because someone violates the spec, doesn't make it right, or something
other designers can/should do.  The spec IS the spec, like it or not, agree
with it or not...  It doesn't mean something done outside the spec won't
"work", depending on your definition of "work".

Having designed a dozen or so PCI cards (as well as PCI cores), I would
strongly urge people to stick to the spec.  That typically minimizes
problems, especially unless you're willing to do VERY extensive testing with
all existing motherboards and plug-in cards, in every conceivable
configuration...fully loaded, in every different slot etc, through full
temperature and voltage ranges...and continue testing as new boards etc.
come out...


> |> Did you read the PCI spec carefully?  The PCI spec requires power and ground
> |> planes, since the maximum distance for the PCI power/ground connector pads
> |> to the plane is .25", as stated in 4.4.2.1. <...>
>
> Then 80% of the cheaper network and soundcards violate the spec. I have never
> seen a RTL8139 based network card with a multilayer PCB.
>
> --
>          Georg Acher, acher@in.tum.de
>          http://www.in.tum.de/~acher/
>           "Oh no, not again !" The bowl of petunias



Article: 37195
Subject: Re: What do you like/dislike about place and route tools?
From: Andy Peters <andy@exponentmedia.deletethis.com>
Date: Mon, 03 Dec 2001 19:07:46 GMT
Links: << >>  << T >>  << A >>
Well, here's a complaint about Lattice's tools.

For some reason, Lattice thinks that designers care about how many logic
levels it takes to implement a function.  See, I don't care.  All I care
is that the finished design meets my timing constraints (and fits). 
Problem is, Lattice's tools don't know a timing constraint from a hole
in the wall.  What their tools expect you to do is to pick a combination
of fitter options, press "go" and after the place and route completes
(if, in fact, it does), you have to manually go through the timing
reports to see if you win or lose.  And when you lose, you have to go
back in and pick a different bunch of options.  The fitter "effort"
switch doesn't do what you think it does, it just picks a different
algorithm.  The "Explore" feature is broken.

If you want to take advantage of the fast I/O output enables, you have
to set a constraint in a constraint file that's called "end critical
path."  And the fitter will then warn you that there's "no combinational
logic..." to minimize if you drive your output enable from a flop.  (It
still "does the right thing," but the warning is stupid.)

I've told the Lattice rep more than once: I want to be able to set a
period constraint and I/O timing constraints, push the "start" button,
and go get a cup of coffee or get some lunch, and come back and find my
chip either routed or failed to meet timing (or it wouldn't fit).

I haven't even mentioned how unroutable their chips are.

---a

Article: 37196
Subject: Re: Is there a full open-source synthesis path for any FPGA?
From: Andy Peters <andy@exponentmedia.deletethis.com>
Date: Mon, 03 Dec 2001 19:34:53 GMT
Links: << >>  << T >>  << A >>
Neil Franklin wrote:

> Presently my real application is my design for running on such a
> board (and being developed on a normal prototype board). Custom board
> will follow after, to let more users use the design with less hassle.

What's a "normal prototype board"?  Don't tell me you're gonna wire-wrap
this thing.

Question: which open-source PCB layout tool will you be using for your
custom circuit-board layout?

Comment: all of the freeware/inexpensive board-layout tools suck, for
many reasons.

> Directly drive SDRAM off of the FPGA. There exist XAPPs on that.

You don't need an XAPP for that.  Just read any SDRAM data sheet.  Piece
of cake.  I hope that non-lazy college professors will start having
their students design DDR SDRAM controllers instead of "Traffic
Controllers" and "Vending Machines."

--andy

Article: 37197
Subject: Re: Modelsim
From: "Seb" <someone@microsoft.com>
Date: Mon, 3 Dec 2001 21:03:56 +0100
Links: << >>  << T >>  << A >>

So it should be possible to simulate large designs with the free version, as
long as you have the time?
How large is the time penalty?


"Ed Browne, Precision Electronic Solutions" <ed_b_pes@swbell.net> wrote in
message news:05uN7.600$oO4.343960630@newssvr11.news.prodigy.com...
> It's appalling that Xilinx would sell a product to design an FPGA/CPLD
> without the ability to simulate the design unless you buy a $1000+
> simulator.  Neither the free version nor the eval version allows testing on
> anything over 500 lines.  At that limit on my machine, it simply closes - no
> slowing down.
>
> Does anyone have a lower cost alternative, preferably one that would accept
> the HDL bencher output?
>
> Ed Browne
> Precision Electronic Solutions
>
> "Theron Hicks" <hicksthe@egr.msu.edu> wrote in message
> news:3BFA68F1.10118196@egr.msu.edu...
> >
> >
> > Leon Heller wrote:
> >
> > > Sorry, I've just checked the Xilinx version. It is only for small designs.
> > >
> > > --
> > > Leon Heller, G1HSM leon_heller@hotmail.con
> > > http://www.geocities.com/leon_heller
> > > Low-cost Altera Flex design kit: http://www.leonheller.com
> >
> > It will work with much larger designs.  It just runs slower.
> >
> >
>
>



Article: 37198
Subject: Re: Is there a full open-source synthesis path for any FPGA?
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Mon, 03 Dec 2001 12:19:40 -0800
Links: << >>  << T >>  << A >>


Andy Peters wrote:

> > Directly drive SDRAM off of the FPGA. There exist XAPPs on that.
>
> You don't need an XAPP for that.  Just read any SDRAM data sheet.  Piece
> of cake.  I hope that non-lazy college professors will start having
> their students design DDR SDRAM controllers instead of "Traffic
> Controllers" and "Vending Machines."

Agreed, but I would still encourage FPGA users to consult the free app notes
(Xilinx labels them XAPP). They are sometimes very good, sometimes so-so, but
they usually are well-documented, and they are FREE.
And you can do with them whatever you like, just don't ignore them offhand.

Peter Alfke, Xilinx Applications


Article: 37199
Subject: Re: PCI card - 2 layers versus four layers
From: Iwo Mergler <Iwo.mergler@soton.sc.philips.com>
Date: Mon, 03 Dec 2001 20:30:44 +0000
Links: << >>  << T >>  << A >>
Dan wrote:
> 
> Hello,
> 
> I am shipping a 2 layer PCI card (33mhz-32bit). It uses a Xilinx with a 2.5V
> core and 5Volt tolerant IOs.  ( XC2S50-5PQ208C)
> 
> I laid out the board with as much ground plane on the bottom and as much
> routing on the top as was possible. Its 90% ground plane. I believe that
> this works OK on many PCs but I think I still need to improve the electrical
> characteristics of the board for proper operation across all PCs.
> 
> I currently use through hole by pass caps all around the perimeter of the
> Xilinx chip.
> 
> I am sure things will get better by switching to both surface mount caps and
> a four layer PCB. My question is how important is each of these two
> improvements when compared to one another ? For example X% of the
> improvement will come by switching from  through hole caps to surface mount
> and (100-X) % of the improvement will come from switching from two layers to
> four layers.
> 
> I am wondering if simply switching to surface mount caps will give enough of
> a boost in performance.
> 

The PCI spec does not specify the type of caps because it
is in your own interest to keep the supply clean.

The spec does specify a four layer board and a certain track
geometry because the mainboard expects certain impedances
and timings. If you don't stick to this, it can cause other,
unrelated things in the system to break in a most amusing way.

It is no secret that most motherboards can work with cards
outside the spec. One of my designs (2 layers) did work on
the top of a (2 layer, 15cm) slot riser card in all computers 
I could get hold of. This was for a research project and I
wouldn't even dream of selling something like that.

Your design may work or it may not. Which one depends on the 
design of the rest of the system, the particular chips used, 
the temperature and the phase of the moon. This also means that it
is not necessarily your card that stops working; there can be
side effects elsewhere in the system.

Iwo


