Messages from 20725

Article: 20725
Subject: Re: Xilinx 9500 CPLD
From: Peter Alfke <palfke@earthlink.net>
Date: Sat, 19 Feb 2000 05:00:31 GMT

Small CPLDs have some advantages over FPGAs:
more predictable performance, faster pin-to-pin delays, non-volatility,
and most importantly, conceptually easier-to-grasp design methods, and
simpler software.
As CPLDs get bigger, these advantages become less important, and the
disadvantages (high static power consumption and a severely limited
number of flip-flops) become more annoying.
So SRAM-based FPGAs become a more attractive alternative beyond a
certain size. FPGA pin-to-pin delays are now the same as those of CPLDs,
and there are far more flip-flops, while the static power is almost zero.
(Nobody can avoid the dynamic power.) And volatility has become a
non-issue in almost all cases.

CoolRunner is the only CPLD that avoids the static power; that's why it
can implement more macrocells in a meaningful way.

CPLDs and FPGAs really serve different applications, with little overlap.

Peter Alfke, Xilinx Applications

Tim Tuan wrote:

> Hi, why doesn't Xilinx make their 9500 CPLDs any larger. What's the
> constraining factor?
>
> Thanks,
> -T
> mailto:timtuan@yahoo.com

Article: 20726
Subject: Re: Generating a Higher Frequency Clock from a Lower One in FPGA
From: Peter Alfke <palfke@earthlink.net>
Date: Sat, 19 Feb 2000 05:23:52 GMT

You need:
1. an adjustable oscillator at 64 MHz,
2. a divide-by-64 counter, and
3. a phase comparator at 1 MHz.

#2 and #3 are very easy to implement. #1 is the problem.
You could build a ring oscillator out of a chain of reasonably fast delay
elements. Let's assume each element has a delay of 32 ps (makes the math
easier). You would cascade 250 to achieve the half-period delay of roughly
8 ns at 64 MHz.
The delay drifts with temperature and supply voltage, and you must adjust
it from the phase comparator. But you can only adjust the half-period with
a granularity of 32 ps, i.e. you will have an uncontrollable sporadic
frequency error of up to plus/minus 120 kHz. Can you tolerate that amount
of uncertainty and jitter?
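
A quick check of the arithmetic above, as a minimal sketch rather than
anything from the original post. It takes the 32 ps element delay and the
250-element chain as given and computes how far the frequency moves when one
element is added. All names are illustrative. Note that 250 x 32 ps is
exactly 8 ns, which corresponds to 62.5 MHz; the exact half-period at 64 MHz
is 7.8125 ns, so the post's figures are rounded.

```python
# Ring-oscillator tuning granularity, using the rounded figures from the post.
STEP_S = 32e-12   # assumed delay of one element (32 ps)
TAPS = 250        # elements in the chain, giving an 8 ns half-period


def freq_for(taps: int) -> float:
    """Oscillation frequency when the half-period is `taps` elements long."""
    return 1.0 / (2 * taps * STEP_S)


nominal_hz = freq_for(TAPS)                 # 62.5 MHz with the rounded numbers
step_hz = nominal_hz - freq_for(TAPS + 1)   # change caused by one extra element
worst_error_hz = step_hz / 2                # best setting is within half a step

print(f"nominal  : {nominal_hz / 1e6:.3f} MHz")
print(f"one step : {step_hz / 1e3:.1f} kHz")
print(f"error    : +/- {worst_error_hz / 1e3:.1f} kHz")
```

With these numbers one element shifts the frequency by roughly 250 kHz, so
the closest available setting is within about half of that, in line with
the plus/minus 120 kHz quoted.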

Next question is, where do you find, and how do you multiplex, these
fine-grained delay elements? Ray suggested the best programmable element
there is, the carry chain. But that is still too coarse for you (I think).
Xilinx uses dedicated circuitry in the digital DLL in all Virtex devices.
And we think that is still too coarse for a PLL, where frequency errors
are cumulative and must be fixed all the time. (In a DLL the error is not
cumulative, and a 35 ps error is usually acceptable.)

That's why people use an analog PLL, in spite of all the awful headaches it
creates.

Peter Alfke

Nestor wrote:

> Hi everyone.
>
> I am interested in building a clock synthesizer using an FPGA.  My aim is to
> generate a higher frequency clock from a lower frequency reference using an
> FPGA.  For instance, a 64MHz could be generated from a 1MHz reference.  In
> traditional analog phase-locked loops (PLL) this is possible.  My intent is
> to use a digital PLL (DPLL) or an analog-digital hybrid version of the DPLL
> (everything digital except the VCO) to synthesize my higher frequency from
> the lower reference.
>
> From what I have read, a DPLL approaches its analog equivalent if the loop
> is oversampled.  Does this mean that, in order to generate my 64MHz from the
> 1MHz, I would need to use a sampling frequency higher than 64MHz?
>
> If this is true, then the analog PLL would be the better choice to
> synthesize the 64MHz frequency since no frequency higher than 1MHz would be
> required.
>
> Thanks in advance for any suggestions or other comments.
>
> Nestor
> nestor@stansync.com
> nestor@ece.concordia.ca

Article: 20727
Subject: x18 FIFO's in Virtex
From: Keyvan Irani <irani@we.mediaone.net>
Date: Sat, 19 Feb 2000 05:46:04 GMT

Hello,

Does any one know of any way to implement an 18 bit wide FIFO in
Virtex without utilizing 100% of two Block RAMs?

Regards,
K. Irani

Article: 20728
Subject: Re: BEHAVIOURAL VHDL
From: Ray Andraka <randraka@ids.net>
Date: Sat, 19 Feb 2000 06:23:01 GMT

If you are synthesizing, you should be using RTL code rather than a
behavioral description.  If the behavioral description is synthesizable
at all, it likely does not contain enough information for the synthesizer
to guess at an implementation.

Synplicity will infer RAMs and ROMs from an RTL integer array.

ritchie99_uk@my-deja.com wrote:

> HI ALL,
>
> what's the performance of the actual synthesisers for a behavioural
> vhdl input
> i am looking to target the VIRTEX-E with a behavioural vhdl input, and
> here i don't know  which synthesiser(s) is (are) good especially for
> inferring Block Rams and distributed RAMs
>
> i read that "exemplar" infers automatically the BRAM , what about FPGA
> express that come with F2.1i, is it good  ???
> same question for synplify and symplicity ( i hope that it's the right
> spelling ...)
>
> thanks in anticipation
>
> --ritchie
>
> Sent via Deja.com http://www.deja.com/
> Before you buy.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20729
Subject: Re: x18 FIFO's in Virtex
From: Ray Andraka <randraka@ids.net>
Date: Sat, 19 Feb 2000 06:27:17 GMT

How deep?  You can use the CLB Ram for relatively small FIFOs, same way
as with 4K designs.

Keyvan Irani wrote:

> Hello,
>
> Does any one know of any way to implement an 18 bit wide FIFO in
> Virtex without utilizing 100% of two Block RAMs?
>
> Regards,
> K. Irani

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20730
Subject: Viewlogic 4 and XACT6.1 - any good for XC4k ??
From: z80@ds2.com (Peter)
Date: Sat, 19 Feb 2000 12:02:28 +0000


I still have the above tools installed, but I haven't done any FPGA
design for about 2 years, and before that it was mostly ASIC
prototyping with large 3k devices.

The supported devices are 2k, 3k and 4k and I know the 2k are
obsolete, and the 3k are still available but pricing isn't too good. 

The 4k *appear* to still be a reasonably good choice for a long-life
project, because it seems to me that the newer Xilinx families get
"churned" a lot more quickly in today's super fast moving marketplace.

I have some production-volume (hundreds) FPGA projects coming up and
would be doing only small designs, say 5k gates or less. Am I right
about the 4k range being a good choice for that? Or should I get rid
of all this old (although just about totally bug-free and solid) stuff
and get one of the starter kits which Xilinx offer? Are they any good?
I know XACT6 is very hard to beat.

Any advice appreciated.


Peter.
--
Return address is invalid to help stop junk mail.
E-mail replies to zX80@digiYserve.com but remove the X and the Y.
Please do NOT copy usenet posts to email - it is NOT necessary.

Article: 20731
Subject: Re: Viewlogic 4 and XACT6.1 - any good for XC4k ??
From: Ray Andraka <randraka@ids.net>
Date: Sat, 19 Feb 2000 15:25:53 GMT

You won't be able to target much of anything other than 4000E and 4000EX
series parts with that.  The new tools do run through place and route
faster, and do a little better with automatic placement than Xact6, and
the whole thing will run under NT.  In the process though, you gain lots
of bugs and less functionality in the floorplanner.

Peter wrote:

> I still have the above tools installed, but I haven't done any FPGA
> design for about 2 years, and before that it was mostly ASIC
> prototyping with large 3k devices.
>
> The supported devices are 2k, 3k and 4k and I know the 2k are
> obsolete, and the 3k are still available but pricing isn't too good.
>
> The 4k *appear* to still be a reasonably good choice for a long-life
> project, because it seems to me that the newer Xilinx families get
> "churned" a lot more quickly in today's super fast moving marketplace.
>
> I have some production-volume (hundreds) FPGA projects coming up and
> would be doing only small designs, say 5k gates or less. Am I right
> about the 4k range being a good choice for that? Or should I get rid
> of all this old (although just about totally bug-free and solid) stuff
> and get one of the starter kits which Xilinx offer? Are they any good?
> I know XACT6 is very hard to beat.
>
> Any advice appreciated.
>
> Peter.
> --
> Return address is invalid to help stop junk mail.
> E-mail replies to zX80@digiYserve.com but remove the X and the Y.
> Please do NOT copy usenet posts to email - it is NOT necessary.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20732
Subject: Re: BEHAVIOURAL VHDL
From: "J.R." <j_robby@hotmail.com>
Date: Sat, 19 Feb 2000 19:38:18 -0000

Is not RTL level description a behavioural one?? I am a little bit confused
by the jargon in use...
In my understanding:
- Structural VHDL is any VHDL design that is based on composing sub-designs
(use Hierarchy).
- Behavioural VHDL is any VHDL that is target independent be it DATA FLOW
(boolean equations)
or RTL (register transfer with IF, WHILE, CASE etc. statements).

So a VHDL design can be structural and behavioural at the same time if the
sub-designs are coded behaviourally.

There is another type of VHDL: the one that takes advantage of the target HW
structure (infer special blocks like BlockRAMs in Virtex for example). For
me, this is a subset of VHDL, specific to a particular HW platform (not
standard) .

Any comments????

Article: 20733
Subject: Lattice Download Cable
From: "aaf" <mindornf@online.no>
Date: Sat, 19 Feb 2000 22:15:18 +0100

Hi everyone!

Having the need to reprogram an old Lattice ispLSI 1016, and without the
download cable at hand, I hope one of you can help me get around the
problem. My questions are:

a) Can I program a single device from the PC printer port without using
   any kind of buffering?

b) If so, which pin of the LPT port is connected to which programming line?

Thanks in advance
Aage

Article: 20734
Subject: Call for Participation: SIGDA Ph.D. Forum at DAC'2000
From: chou@malibu.ece.uci.edu (Pai Chou)
Date: 19 Feb 2000 23:44:27 GMT

                        Tuesday, June 6, 2000, 7-9pm
                        Los Angeles Convention Center


*** Submission Deadline: March 10, 2000

-What is the Ph.D. Forum at DAC?

The Ph. D. Forum at the Design Automation Conference is an annual event
for Ph.D. students to present their work in front of a poster and have
interactive discussions with attendants. The forum is hosted by SIGDA
and is OPEN to all members of the DA community. It will take place
during the SIGDA member meeting immediately after and adjacent to the
DAC Cocktail party (Tuesday, 7-9pm), in Room 502A at the Los Angeles
Convention Center.

The motivation for this forum started from a 1996 NSF workshop entitled
"Future Research Directions in CAD for Electronic Systems: Putting the D
back in CAD." Since its debut at DAC 1998, we have had many outstanding
Ph.D. candidates. They had some very positive comments about the forum!
Please participate and make this a continuing success.

-Goals

The goals of the Ph. D. forum are
* for graduate students to get feedback on their thesis work from other
  researchers.
* for the industry (CAD, system companies) to preview academic
  work-in-progress.
* to provide a structured way for increasing interaction between
  academia and industry.

-Eligibility

Eligible students are those who expect to complete their thesis within
1-2 years, and those who have completed their theses in the 1999-2000
academic year.

Pre-completion students must have a university-approved thesis proposal
or at least one published conference paper.

-What to Submit
* A one-page abstract of the thesis in PDF, not including figures or
  references, and not to exceed 750 words.
* A university-approved thesis proposal, or a published paper;
  this is required of ALL students.
* Names of five reviewers whom the student would like to review the abstract

The submission will be reviewed to ensure that the abstract is supported
via the accompanying paper/proposal.

-Travel Grants
 (Notification April 30, 2000)

Some funding will be made available to students to present their work.
The criteria will be based on the quality of the presentation and the
potential benefit for both the students and Forum attendees.


-Forum Presentation
 (Tuesday, June 6, 2000, 7:00 - 9:00 p.m.)

Students will present their work in a poster session, hosted by the SIGDA
during their member meeting.

http://www.eng.uci.edu/~daforum/

--
Pai H. Chou, Assistant Professor of ECE             email chou@ece.uci.edu
Henry Samueli School of Engineering, UC Irvine      phone (949) 824-3229
444F Engineering Tower, Irvine, CA 92697-2625, USA  fax   (949) 824-3203

Article: 20735
Subject: Distributed Arithmetic De-mystified
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 02:08:30 GMT

For all of you who have been asking, and those who wanted to know but
were afraid to ask, I have finally gotten a page explaining distributed
arithmetic up on my website.  And for those who don't have a clue what
I'm talking about, distributed arithmetic is a hardware technique that
lets us hide lots of multipliers in an FPGA.  Take a look and let me
know what y'all think.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20736
Subject: Re: BEHAVIOURAL VHDL
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 02:11:42 GMT

My understanding of the terminology is that behavioral just emulates the
function, while RTL also carries some information about the structure (namely
the locations of the registers and the logic between them).  For example, one
could do a behavioral model of a pipelined multiplier by using the * operator
plus a series of clock delays to match the pipeline length.  An RTL description
would describe, at least in high-level terms, the logic between each of the
pipeline stages.
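
The distinction can be sketched in software. The following is a Python
stand-in, not VHDL and not code from the post: `behavioral` uses the `*`
operator followed by a pure delay line, while `rtl` keeps one register per
stage and spells out the logic between registers. The choice of a
shift-and-add stage (one bit of the multiplier per stage), the 4-bit width,
and all names are assumptions made for illustration.

```python
from collections import deque

WIDTH = 4  # operand width in bits; also the pipeline depth in this sketch


def behavioral(stream):
    """'*' operator plus WIDTH clock delays: function only, no structure."""
    delay = deque([None] * WIDTH, maxlen=WIDTH)
    out = []
    for a, b in stream:
        out.append(delay[0])   # oldest entry: the product from WIDTH cycles ago
        delay.append(a * b)    # maxlen discards the oldest entry
    return out


def rtl(stream):
    """One register per stage; stage i adds the partial product for bit i of b."""
    regs = [None] * WIDTH      # each register holds (a, b, running_sum) or None
    out = []
    for a, b in stream:
        out.append(None if regs[-1] is None else regs[-1][2])
        nxt = []
        for i in range(WIDTH):
            # Stage 0 is fed by the inputs, stage i by the previous register.
            src = (a, b, 0) if i == 0 else regs[i - 1]
            if src is None:
                nxt.append(None)
            else:
                sa, sb, acc = src
                term = (sa << i) if (sb >> i) & 1 else 0
                nxt.append((sa, sb, acc + term))
        regs = nxt             # clock edge: all registers update together
    return out


# Four operand pairs followed by padding to flush the pipeline.
inputs = [(3, 5), (15, 15), (7, 0), (9, 12)] + [(0, 0)] * WIDTH
b_out = behavioral(inputs)
r_out = rtl(inputs)
print(b_out)
print(r_out)
```

Both models produce the same output sequence cycle for cycle; only the
second says anything about what sits between the registers.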

"J.R." wrote:

> Is not RTL level description a behavioural one?? I am a little bit confused
> by the jargon in use...
> In my understanding:
> - Structural VHDL is any VHDL design that is based on composing sub-designs
> (use Hierarchy).
> - Behavioural VHDL is any VHDL that is target independent be it DATA FLOW
> (boolean equations)
> or RTL (register transfer with IF, WHILE, CASE etc. statements).
>
> So a VHDL design can be structural and behavioural at the same time if the
> sub-designs are coded behaviourally.
>
> There is another type of VHDL: the one that takes advantage of the target HW
> structure (infer special blocks like BlockRAMs in Virtex for example). For
> me, this is a subset of VHDL, specific to a particular HW platform (not
> standard) .
>
> Any comments????

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20737
Subject: Re: Generating a Higher Frequency Clock from a Lower One in FPGA
From: murray@pa.dec.com (Hal Murray)
Date: 20 Feb 2000 04:01:57 GMT


> From what I have read, a DPLL approaches its analog equivalent if the loop
> is oversampled.  Does this mean that, in order to generate my 64MHz from the
> 1MHz, I would need to use a sampling frequency higher than 64MHz?

I think the answer to that one is "yes".

Doing everything with digital logic works great if you want
to make a slower clock - that is you get to divide by a big number.
The divide ratio determines the jitter on individual clock edges.
If you are dividing by 100, you can get the clock edge within 1% of
a cycle of where you want it.

You can make the long term frequency as accurate as you want by
dividing by N on some cycles and N+1 on others.  (Think of it as
dividing by N plus a fraction.)
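
The N / N+1 scheme described above can be sketched with an accumulator, in
the style of a fractional divider. This is an illustrative model only; the
function name and the example ratio of 100 + 1/4 are assumptions, not
figures from the post.

```python
def fractional_divider(n: int, num: int, den: int, cycles: int) -> list[int]:
    """Return the divide ratio used on each output cycle.

    The long-term average ratio is n + num/den. An accumulator adds `num`
    on every output cycle; whenever it reaches `den`, that cycle divides
    by n + 1 instead of n.
    """
    acc = 0
    ratios = []
    for _ in range(cycles):
        acc += num
        if acc >= den:
            acc -= den
            ratios.append(n + 1)
        else:
            ratios.append(n)
    return ratios


ratios = fractional_divider(n=100, num=1, den=4, cycles=8)
average = sum(ratios) / len(ratios)
print(ratios)    # one long cycle in every four
print(average)   # 100.25
```

Individual edges still land on a master-clock boundary, so each edge can be
off by up to one master-clock period (about 1% of a cycle when dividing by
100), while the average frequency is exact.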


I don't know how to multiply up with digital logic.

-- 
These are my opinions, not necessarily my employers.

Article: 20738
Subject: Re: Generating a Higher Frequency Clock from a Lower One in FPGA
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 04:41:20 GMT

For a DPLL, you have a master clock that is several times higher than the clock
you wish to synthesize.  How high is determined by the amount of jitter you can
allow in the generated clock.  Typically you want that to be at least 16x.

You can multiply a clock digitally with a delay-locked loop, but you'll need
access to some fairly small incremental delays and equal routing delays to make
it work.  Not an easy task in an FPGA.  If I were going to do it, I'd probably
look at using the carry chain for the delay line because it gives you the finest
incremental delays available to the user.
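
To put numbers on the 16x rule of thumb for the 64 MHz case from the
original question, here is a rough sketch. It assumes that edge placement is
quantized to one master-clock period, which is a simplification, and the
function name is illustrative.

```python
def dpll_master_clock(f_out_hz: float, oversample: int = 16):
    """Master clock needed for a given oversampling ratio.

    Returns (master clock in Hz, edge resolution in seconds, resolution as
    a fraction of the output period), assuming output edges can only be
    placed on master-clock boundaries.
    """
    f_master_hz = f_out_hz * oversample
    resolution_s = 1.0 / f_master_hz
    return f_master_hz, resolution_s, 1.0 / oversample


f_master_hz, resolution_s, fraction = dpll_master_clock(64e6, 16)
print(f"master clock : {f_master_hz / 1e6:.0f} MHz")
print(f"resolution   : {resolution_s * 1e9:.3f} ns ({fraction:.2%} of a cycle)")
```

Under these assumptions, a 64 MHz output at 16x would call for a master
clock of about 1 GHz, which is why an all-digital approach is unattractive
for multiplying up.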

Hal Murray wrote:

> > From what I have read, a DPLL approaches its analog equivalent if the loop
> > is oversampled.  Does this mean that, in order to generate my 64MHz from the
> > 1MHz, I would need to use a sampling frequency higher than 64MHz?
>
> I think the answer to that one is "yes".
>
> Doing everything with digital logic works great if you want
> to make a slower clock - that is you get to divide by a big number.
> The divide ratio determines the jitter on individual clock edges.
> If you are dividing by 100, you can get the clock edge within 1% of
> a cycle of where you want it.
>
> You can make the long term frequency as accurate as you want by
> dividing by N on some cycles and N+1 on others.  (Think of it as
> dividing by N plus a fraction.)
>
> I don't know how to multiply up with digital logic.
>
> --
> These are my opinions, not necessarily my employers.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20739
Subject: Re: Xilinx M2.1 Floorplanner Question
From: bobperl@best_no_spam_thanks.com (Bob Perlman)
Date: Sun, 20 Feb 2000 04:43:58 GMT

On Fri, 18 Feb 2000 03:55:49 GMT, Ray Andraka <randraka@ids.net>
wrote:

>Was the movement limited to within a CLB?  If not check the placement
>report to see what and why it happened.
>

To which placement report are you referring?  I've looked in the .par
file, and other than a number of statements of the form "Resolved that
CLB <DRAM_ADR_8> must be placed at site CLB_R20C11," I don't see
anything that would help me figure out why things are moving.   And
placed logic is moving between CLBs, not just within CLBs.

Bob Perlman


>Bob Perlman wrote:
>
>> Hi -
>>
>> I don't know how many of you use the Xilinx M2.1 floorplanner.  If you
>> do, I have a question for you.
>>
>> Yesterday I used the floorplanner to place portions of a
>> schematic-based XCS30XL design, and managed to go from a design that
>> failed route after 1-1/2 hours (didn't complete route and didn't meet
>> timing on the routed nets)  to a design that routed and met all timing
>> constraints in 40 minutes.  So, I'm happy with the results, but was
>> puzzled by the fact that the Xilinx tools moved some of the cells that
>> I'd placed.  Any RPMs that I placed stayed put, but cells that I'd
>> moved individually into the placement window were sometimes in new
>> places after routing.  You could see that the place and route tools
>> had kept the cells more or less where I'd placed them, but moved some
>> cells around.
>>
>> Is this expected behavior when using the floorplanner?  If so, what's
>> to keep I/O pin assignments from moving?
>>
>> Thanks,
>> Bob Perlman
>>
>>
>> -----------------------------------------------------
>> Bob Perlman
>> Cambrian Design Works
>> Digital Design, Signal Integrity
>> http://www.best.com/~bobperl/cdw.htm
>> Send e-mail replies to best<dot>com, username bobperl
>> -----------------------------------------------------

-----------------------------------------------------
Bob Perlman
Cambrian Design Works
Digital Design, Signal Integrity
http://www.best.com/~bobperl/cdw.htm
Send e-mail replies to best<dot>com, username bobperl
-----------------------------------------------------

Article: 20740
Subject: Re: Spartan and timing analyzer: clock nets using non-dedicated
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 04:52:34 GMT

It can be done with careful use of opposite clock edges if the 40M and 80M clocks
are phase locked, even with some amount of unknown skew, as long as the skew doesn't
push you too close to the opposite edge.  I know it can be done in a 4025E device
if you are careful with placement (the register to register timing is tight, so
the registers in opposite clock domains have to be placed in horizontally adjacent
CLBs using the direct connects - that is the fastest register to register connect
in a 4K device).  Spend a lot of time on those clock domain crossings to make sure
the skew doesn't kill ya.

Rickman wrote:

> Andy Peters wrote:
> >
> > Tom Burgess wrote in message <38A35E12.DE1CC4F8@hia.nrc.ca>...
> > >The newer parts are amazing all right. Even a slow XLA should give
> > >12.5 - (1.5 Tcko + 1.5? route + 3.0 Tgls + 0.7 Tecck) = 5.8 ns margin.
> > >If we were talking about ye olde 4000 series of 5+ years ago, then "worry"
> > >might have been the right word.
> >
> > Well, the part I'm using is a Spartan XL-4.  I've decided to not worry about
> > it!
>
> Maybe I am missing something. If you are generating a slower clock (40
> MHz) from a faster clock using a divide by 2 FF, then you will have skew
> between your clock domains. Signals moving from the 40 MHz domain to the
> 80 MHz domain will have a reduced setup time (by the amount of the
> skew). This will not be easy to deal with since you are starting with
> only 12.5 nS.
>
> But signals going from 80 MHz domain to the 40 MHz domain will have a
> setup time based on the skew time, not the clock cycle time. The 40 MHz
> clock edge is delayed from the 80 MHz clock edge. If you can't guaranty
> that the minimum delay time for the signal is greater than the skew
> time, which you can't, then you must use the skew time as your clock
> cycle time for this signal!
>
> Am I missing something in the design?
>
> --
>
> Rick Collins
>
> rick.collins@XYarius.com
>
> remove the XY to email me.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design
>
> Arius
> 4 King Ave
> Frederick, MD 21701-3110
> 301-682-7772 Voice
> 301-682-7666 FAX
>
> Internet URL http://www.arius.com

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20741
Subject: Re: multiplier
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 05:24:54 GMT



Mathew Wojko wrote:

> In article <38ACBDDF.CA310F0A@ids.net> you wrote:
> : Mathew Wojko wrote:
>
> : > Ray Andraka (randraka@ids.net) wrote:
> : > : Wallace trees are not generally the fastest multipliers in FPGAs.  See the
> : >
> : > If you pipeline them they generally are.
> : >
>
> : No, they are not.  A wallace tree produces a sum vector and a carry vector.
> : Those have to be added together to obtain the full sum.
>
> : However, that final adder determines the maximum clock rate of the
> multiplier.
>
> Precisely. The Wallace tree is a carry-save architecture. When pipelined,
> carry values only ever propagate one bit-position within each stage of
> processing (no carry propagation latencies are experienced). Thus fast
> clocking rates for this 'tree-part' of the multiplier can be achieved.
>
> However, when combining the carry and sum vectors, you do not want to
> compromise the performance obtained thus far from the 'tree-part' of
> the multiplier. A simple ripple adder implemented using fast carry logic
> will not yield the same performance as achieved by the wallace tree.
> Thus overall performance will be affected.
>
> : Now fade to the FPGA.  The fast carry chain logic in modern FPGAs is a highly
> : optimized dedicated path that is about an order of magnitude faster than logic
> : implemented in the LUT logic and connected via the general routing resources.
> : That fact makes it extremely difficult to improve upon the performance of the
> : carry chain ripple carry adder.
>
> This is the point that I don't necessarily agree on. I agree that you
> cannot improve on the performance of a ripple carry adder. Using the
> fast-carry logic provides unparalleled results for their implementation.
> However, there exist other addition techniques that will provide better
> pipeline performance when implemented on an FPGA. The trick is not
> to ripple or propagate the carry great lengths between successive
> pipeline stages.

That carry has to be tightly pipelined.  In the extreme case, you can pipeline the
carry out of each bit, but you'll need to add skew and deskew registers to the design
to compensate for the pipeline latency.  If you do this, you will find that you can get
a shorter clock period with a pipelined array multiply than you can with a wallace tree
because the array multiplier's routing is all to nearest neighbors.  The limiting factor
in that case is the routing time required to get the multiplicands distributed across
the array.  Note the array can be either a row ripple or column ripple array and you
still get the same tight routing.  The pipeline latency is 2n for the array multiplier
instead of n+logn for the wallace tree.  The thing that slows down the wallace tree when
compared to this extreme case is the length of the routes - a wallace tree has a quite
complicated routing pattern compared to the very simple routing of an array multiplier.

Note that a wallace tree is to a column ripple array as a row ripple tree is to a row
ripple array.  Both the former are tree implementations of the latter to reduce the
transport delay.  The resources for the wallace tree with a pipelined ripple carry
output adder are the same as for an array multiplier.  In either case, the improvement
over a partial products tree is not as great as you might expect because the array
multiplier/wallace tree multiplier is more than twice the area of the partial products
multiplier (area translates to longer routes for inputs).  The inputs also have to
fan-out to more loads, which requires additional buffering to keep the speed up.

I found some years ago (and I recall being somewhat surprised by the result at first)
that at least in Xilinx 4K, you get  a shorter clock period and lower latency out of
two partial products type multipliers than out of a single pipelined array multiplier,
and the area, without even considering the skew/deskew registers needed for the array,
is about the same for both.  Throw in the skew and deskew registers needed to pipeline
the carries, and you go way over on the area.   I expect Virtex to come out even more
favorable for the partial products multiplier because its carry structure lets you do
a 2xN partial product in one layer of logic.
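
As a rough illustration of the partial products approach, the sketch below
splits the multiplier into 2-bit fields, forms one 2xN partial product per
field, and sums the results in a pairwise adder tree. It models only the
arithmetic, not the FPGA mapping, timing, or area; the function name and the
8-bit default width are assumptions.

```python
def partial_product_multiply(a: int, b: int, width: int = 8) -> int:
    """Unsigned a * b as a tree sum of 2-bit-by-N partial products."""
    # One term per 2-bit field of b: a times that field, shifted to its weight.
    terms = [(a * ((b >> i) & 0b11)) << i for i in range(0, width, 2)]
    # Pairwise adder tree; in hardware each level would be a row of
    # ripple-carry adders using the dedicated carry chain.
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)    # pad an odd level with a zero operand
        terms = [terms[i] + terms[i + 1] for i in range(0, len(terms), 2)]
    return terms[0]


print(partial_product_multiply(200, 123))   # same value as 200 * 123
```

An 8-bit multiplier needs four such partial products and two adder levels,
versus eight 1-bit rows in a plain array.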

>
>
> : This non-homogenous mix of logic means that the
> : cheap ripple carry adder is about as fast as you're gonna get in the FPGA (short
> : of pipelining the carry) for word widths up to around 24-32 bits.
>
> Exactly. If you pipeline the carry then you can achieve a matching
> performance result to that of the wallace tree. Remember that the
> Wallace tree pipelines the carry result at every stage of processing.
> That's why it's called a carry-save technique. Why you would want to
> use a carry ripple adder after expending the extra logic to implement
> a Wallace tree to reduce partial products is beyond me.
>
> : The result is
> : a wallace tree buys you nothing in terms of area, and in fact is twice as big as
> : a row-ripple tree because the ripple carry adders use one LUT per bit (the
> : carry is in dedicated logic in xilinx or splits the lut in altera) where the full
> : adders in the wallace tree need two luts per bit (one for sum, one for carry).
>
> I agree that the wallace requires more area than the row-ripple tree. As
> you have pointed out, that's true because you do not pipeline the carry
> values in a row-ripple tree (what I call vector based computation),
> whereas in the wallace tree you do. As such, the wallace tree *does* give
> you added performance for area. The clocking speed is substantially faster
> since carry values only propagate one bit position between pipeline stages
> rather than up to 2n bits as in the row-ripple technique.
>
> : The larger area costs clock cycle time since the routing in FPGAs has substantial
> : delay.  Now pipelining will get back the performance (requires a register
> : immediately in front of the final adder for best clock speed), but the fact of
> : the matter is you are still limited by the speed of that final adder.
>
> But that's my point. Why include a carry ripple adder at the final
> stage? This is the obvious performance limiting factor. By using carry
> lookahead techniques you can obtain better performance results than
> the carry ripple adder. Regardless of the carry ripple adder implemented
> by the fast-carry logic.
>
>  So a
> : wallace tree gets you at best, the same performance as a row-ripple tree with
> : double the area (more if you use partial product techniques at the front layer).
> : This is why a wallace tree multiplier is not appropriate for an FPGA.
>
> Sorry, but I disagree. A wallace tree multiplier is appropriate for
> an FPGA *if* you use the appropriate adder to combine the sum and carry
> results. The BCLA adder is a perfect addition technique to combine with the
> wallace tree. Using this, (implemented correctly) the pipeline latency
> at every stage of processing will only be from one 4-input LUT output to a
> register. Thus this technique matches well to both ALTERA and Xilinx FPGA
> architectures.

See my comments above.  I still stand by my assertion that a wallace tree multiplier is
rarely appropriate for an FPGA with a fast carry chain.  The more than doubled area
does not give doubled performance because of routing delays inside the array (which for
a wallace tree are not necessarily to nearest neighbors) and more importantly in the
distribution of the multiplicands to the array.

>
>
> : That said, the column route delay penalty in Altera 10K devices does make a
> : wallace tree a little more attractive for pipelined trees that cannot fit in one
> : row.  The reason for that is the clock period is limited by the delay from the
> : output register on one level of the tree through the carry chain to the msb
> : output register of the next level.  If the levels cross a row boundary, there is
> : a significant delay hit which will reduce the clock frequency unless additional
> : registers are added ahead of and in the same row of the carry chain.  If the tree
> : extends across several rows, several layers of pipeline registers are needed if
> : the tree is all ripple carry adds.  A wallace tree can reduce the hit, but again
> : at the expense of a considerable amount of area...and that is only true for trees
> : that extend across more than two rows.  You get the same clock cycle performance
> : in less area by simply adding the extra pipeline registers instead of doing a
> : wallace tree, but at the expense of a little clock latency.  Note that this is a
> : special case.  The other special case occurs in FPGAs without carry chains, where
> : in order to get an advantage by using a wallace tree, your final adder should use
> : a fast carry scheme.
>
> : > However, it depends on how you define speed. If you are referring to the
> : > clocking rate, then a fully pipelined Wallace tree multiplier will provide
> : > the best results - over vector and array based techniques. However,
> : > Wallace trees require a large amount of device resource to do so (CLB count).

A Wallace tree is a vector approach.  It is the tree implementation of a column ripple
array.  A column ripple array uses carry-save adders just like the Wallace tree, except
that they are connected to the next row instead of in a tree.
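
A minimal Python sketch of that relationship, offered as an illustration rather
than anyone's actual design: both arrangements use the same carry-save adder
and differ only in how the adders are wired.  The function names and the 8-bit
default width are assumptions, and the model counts adder levels only.  It says
nothing about the routing delay or area discussed above.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: three addends in, sum word and carry word out.

    No carry ripples along the word; a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def partial_products(x, y, width):
    """One shifted copy of x for every set bit of the width-bit value y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def column_ripple_multiply(x, y, width=8):
    """Chain the carry-save adders row after row (array arrangement)."""
    rows = partial_products(x, y, width)  # assumes width >= 2
    sum_word, carry_word = rows[0], rows[1]
    for row in rows[2:]:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, row)
    levels = len(rows) - 2
    return sum_word + carry_word, levels  # final carry-propagate add


def wallace_multiply(x, y, width=8):
    """Group the rows in threes at every level (tree arrangement)."""
    rows = partial_products(x, y, width)
    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[grouped:])  # leftover rows pass straight through
        rows = next_rows
        levels += 1
    return sum(rows), levels  # final carry-propagate add
```

For an 8x8 multiply the tree needs 4 carry-save levels where the chained array
needs 6.  Either way the final add is an ordinary carry-propagate adder.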

>
> : >
> : > If you are interested in pipelined structures and associated clocking
> : > rates, be prepared to experience an area/time tradeoff for multiplication
> : > implementations. Thats is, the faster you wish to clock the implementation,
> : > the more area you will have to use.
> : >
> : > If you are interested in the functional density of the implementation,
> : > I'd say that vector based approaches (which add partial products in parallel
> : > - using fast carry logic) provide best utilisation results.
>
> : For FPGAs with fast carry chains, these partial product techniques also provide
> : the fastest multipliers short of pipelining the carries.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20742
Subject: Re: Xilinx M2.1 Floorplanner Question
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 05:40:18 GMT
Links: << >> << T >> << A >>

If the placement found conflicts, it would be in the PAR report.  Sounds to me
like your floorplan file for the rev you are working on didn't get updated
with your floorplan changes.  The only times I've seen stuff move from a CLB
I floorplanned are 1) RLOCs/LOCs in the source or constraints override the
floorplan, 2) I specified the wrong floorplan file, or 3) I didn't set the
floorplan file.  In 2.1, if you specify a floorplan file outside of the rev,
it copies that into that rev when you specify the floorplan.  If you then go
and modify the original floorplan file, it doesn't get copied into the rev
again unless you go back and respecify the floorplan file.  It's a little
backasswards I think, but I've gotten used to it.  Hope that's what's
wrong...sure wouldn't want that as a bug waiting to bite.

Bob Perlman wrote:

> On Fri, 18 Feb 2000 03:55:49 GMT, Ray Andraka <randraka@ids.net>
> wrote:
>
> >Was the movement limited to within a CLB?  If not check the placement
> >report to see what and why it happened.
> >
>
> To which placement report are you referring?  I've looked in the .par
> file, and other than a number of statements of the form "Resolved that
> CLB <DRAM_ADR_8> must be placed at site CLB_R20C11," I don't see
> anything that would help me figure out why things are moving.   And
> placed logic is moving between CLBs, not just within CLBs.
>
> Bob Perlman
>
> >Bob Perlman wrote:
> >
> >> Hi -
> >>
> >> I don't know how many of you use the Xilinx M2.1 floorplanner.  If you
> >> do, I have a question for you.
> >>
> >> Yesterday I used the floorplanner to place portions of a
> >> schematic-based XCS30XL design, and managed to go from a design that
> >> failed route after 1-1/2 hours (didn't complete route and didn't meet
> >> timing on the routed nets)  to a design that routed and met all timing
> >> constraints in 40 minutes.  So, I'm happy with the results, but was
> >> puzzled by the fact that the Xilinx tools moved some of the cells that
> >> I'd placed.  Any RPMs that I placed stayed put, but cells that I'd
> >> moved individually into the placement window were sometimes in new
> >> places after routing.  You could see that the place and route tools
> >> had kept the cells more or less where I'd placed them, but moved some
> >> cells around.
> >>
> >> Is this expected behavior when using the floorplanner?  If so, what's
> >> to keep I/O pin assignments from moving?
> >>
> >> Thanks,
> >> Bob Perlman
> >>
> >>
> >> -----------------------------------------------------
> >> Bob Perlman
> >> Cambrian Design Works
> >> Digital Design, Signal Integrity
> >> http://www.best.com/~bobperl/cdw.htm
> >> Send e-mail replies to best<dot>com, username bobperl
> >> -----------------------------------------------------
>
> -----------------------------------------------------
> Bob Perlman
> Cambrian Design Works
> Digital Design, Signal Integrity
> http://www.best.com/~bobperl/cdw.htm
> Send e-mail replies to best<dot>com, username bobperl
> -----------------------------------------------------

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20743
Subject: Re: x18 FIFO's in Virtex
From: Peter Alfke <palfke@earthlink.net>
Date: Sun, 20 Feb 2000 06:29:10 GMT
Links: << >> << T >> << A >>

I suppose two bits are parity. You could store only 16 bits in the FIFO and
regenerate parity at the output (takes one half of a Virtex CLB).
Otherwise, 18 is just an awkward number...
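
A minimal Python sketch of the parity idea, under assumptions the thread does
not state: even parity, one parity bit per byte, with bit 16 covering the low
byte and bit 17 the high byte.  The function names are hypothetical, and a
deque stands in for the 16-bit-wide FIFO.

```python
from collections import deque


def byte_parity(byte):
    """Even-parity bit: 1 when the byte holds an odd number of ones."""
    return bin(byte & 0xFF).count("1") & 1


def strip_parity(word18):
    """Keep only the 16 data bits for storage in the FIFO."""
    return word18 & 0xFFFF


def restore_parity(word16):
    """Rebuild the 18-bit word by recomputing both parity bits."""
    low = word16 & 0xFF
    high = (word16 >> 8) & 0xFF
    return (byte_parity(high) << 17) | (byte_parity(low) << 16) | word16


fifo = deque()                        # stands in for the 16-bit-wide FIFO
fifo.append(strip_parity(0x10001))    # write side: drop the two parity bits
word_out = restore_parity(fifo.popleft())  # read side: regenerate them
```

Note that this only works if the parity bits are redundant.  If the writer
uses them to flag errors detected upstream, that information is lost.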

Peter Alfke

Ray Andraka wrote:

> How deep?  You can use the CLB Ram for relatively small FIFOs, same way
> as with 4K designs.
>
> Keyvan Irani wrote:
>
> > Hello,
> >
> > Does any one know of any way to implement an 18 bit wide FIFO in
> > Virtex without utilizing 100% of two Block RAMs?
> >
> > Regards,
> > K. Irani
>
> --
> -Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930     Fax 401/884-7950
> email randraka@ids.net
> http://users.ids.net/~randraka

Article: 20744
Subject: Re: Viewlogic 4 and XACT6.1 - any good for XC4k ??
From: z80@ds2.com (Peter)
Date: Sun, 20 Feb 2000 08:34:52 +0000
Links: << >> << T >> << A >>


>You won't be able to target much anything other than 4000E and 4000EX
>series parts with that.  The new tools do run through place and route
>faster, and do a little better with automatic placement than Xact6, and
>the whole thing will run under NT.  In the process though, you gain lots
>of bugs and less functionality in the floorplanner.

Looks like I should go for the new tools. I paid some $15k for the
stuff I have, and recently someone posted a crack for both the
dongles, so I was quite happy to have the investment protected. :)

How do the bugs get found? Do you mean that the design does not work
and there is no indication of why, or does post-route DRC find them?

Are the 4000E and EX parts obsolete or just too expensive for what
they do?

I did like one particular feature in the old tools which was great for
multiple clock signals: if you placed an L and SC=1 (skew of 1ns max)
attributes onto a clock line, it would use a long line to run that
clock. APR worked just fine with this. However PPR broke this, and one
was back to doing it "properly" using the 1 or 2 global clock nets.
This really hammers dynamic power consumption, increasing it often
several times.


Peter.
--
Return address is invalid to help stop junk mail.
E-mail replies to zX80@digiYserve.com but remove the X and the Y.
Please do NOT copy usenet posts to email - it is NOT necessary.

Article: 20745
Subject: Re: Generating a Higher Frequency Clock from a Lower One in FPGA
From: nestor@ece.concordia.ca
Date: Sun, 20 Feb 2000 15:36:56 GMT
Links: << >> << T >> << A >>

Thanks Peter, Ray and Hal for your input.

  Since creating a completely digital PLL in an FPGA looks to be
quite difficult, what about creating a hybrid PLL where only the
voltage-controlled oscillator would be external (analog) and the rest
(phase detector, loop filter and divide-by-N) would be designed in the
FPGA?
  In most previous cases I have seen, only the phase detector was designed
digitally and the rest was still analog.  I wouldn't mind building a
hybrid PLL solution as long as I could be guaranteed that clock
multiplication would be possible.  Otherwise, I would have to resort
to a completely analog design...

Thanks in advance.

Nestor
nestor@stansync.com
nestor@ece.concordia.ca

Article: 20746
Subject: Re: Viewlogic 4 and XACT6.1 - any good for XC4k ??
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 15:59:48 GMT
Links: << >> << T >> << A >>



Peter wrote:

> >You won't be able to target much anything other than 4000E and 4000EX
> >series parts with that.  The new tools do run through place and route
> >faster, and do a little better with automatic placement than Xact6, and
> >the whole thing will run under NT.  In the process though, you gain lots
> >of bugs and less functionality in the floorplanner.
>
> Looks like I should go for the new tools. I paid some $15k for the
> stuff I have, and recently someone posted a crack for both the
> dongles, so I was quite happy to have the investment protected. :)
>
> How do the bugs get found? Do you mean that the design does not work
> and there is no indication of why, or does post-route DRC find them?
>

Most of the bugs I've seen so far are in the mapper and floorplanner.  When
they occur, the mapper exits with errors.  The 'pushbutton' flow seems to be
pretty bug-free.  Bugs are things like not allowing a legal combination in a
CLB, the floorplanner not dealing with RLOCs correctly, and the like.

>
> Are the 4000E and EX parts obsolete or just too expensive for what
> they do?

Spartan parts have the same functionality, are faster and are cheaper.  The E
and EX are not obsolete...yet.  They are the oldest families still sold
though.

>
>
> I did like one particular feature in the old tools which was great for
> multiple clock signals: if you placed an L and SC=1 (skew of 1ns max)
> attributes onto a clock line, it would use a long line to run that
> clock. APR worked just fine with this. However PPR broke this, and one
> was back to doing it "properly" using the 1 or 2 global clock nets.
> This really hammers dynamic power consumption, increasing it often
> several times.

This is the type of thing I meant when referring to less functionality.

>
>
> Peter.
> --
> Return address is invalid to help stop junk mail.
> E-mail replies to zX80@digiYserve.com but remove the X and the Y.
> Please do NOT copy usenet posts to email - it is NOT necessary.

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20747
Subject: Re: Generating a Higher Frequency Clock from a Lower One in FPGA
From: Ray Andraka <randraka@ids.net>
Date: Sun, 20 Feb 2000 16:07:23 GMT
Links: << >> << T >> << A >>

You can use a run-of-the-mill analog PLL chip such as the widely
available 74FCT88915 for the PLL and use the FPGA or a CPLD to do the
reference and feedback divides.  I like the 88915 because it is available
from several vendors (Motorola, IDT, Cypress), has a simple external loop
filter, and has 2x, x/2 and multiple 1x outputs that are skew controlled.
By itself, it is a low-skew PLL clock driver, but with the addition of
external dividers it is quite a capable clock synthesizer.  Another PLL
I've used is the National CGS410 pixel clock generator.  It has the dividers
inside and also has a simple loop filter.
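
A rough Python sketch of the divider arithmetic, assuming the usual
relationship f_out = f_ref * N / R for a reference divider R and a feedback
divider N.  The function name and the max_divide limit are hypothetical.  The
PLL chip's VCO range and loop filter are not modelled, so check any result
against the data sheet.

```python
from fractions import Fraction


def divider_ratios(f_ref_hz, f_out_hz, max_divide=256):
    """Return (R, N, f_compare) with f_out = f_ref * N / R in lowest terms.

    R divides the reference, N divides the fed-back output, and the phase
    comparator sees f_compare = f_ref / R = f_out / N.
    """
    ratio = Fraction(f_out_hz, f_ref_hz)   # reduced automatically
    n, r = ratio.numerator, ratio.denominator
    if n > max_divide or r > max_divide:
        raise ValueError("ratio needs a divider larger than max_divide")
    return r, n, Fraction(f_ref_hz, r)


# 1 MHz reference, 64 MHz output: no reference divide, feedback divide by 64
r, n, f_compare = divider_ratios(1_000_000, 64_000_000)
```

Keeping the ratio in lowest terms gives the highest comparison frequency.
Larger dividers with the same ratio also lock, but the phase comparator then
updates less often.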

nestor@ece.concordia.ca wrote:

> Thanks Peter, Ray and Hal for your input.
>
>   Since creating a completelly digital DPLL in an FPGA looks to be
> quite difficult, what about creating a hybrid PLL where only the
> voltage-controlled oscillator would be external (analog) and the rest
> (phase detector, loop filter and divide-by-N) would be designed in the
> FPGA?
>   In most previous cases I have seen, only the phase detector designed
> digitally and the rest was still analog.  I wouldn't mind building a
> hybrid PLL solution as long as I could be guaranteed that clock
> multiplication would be possible.  Otherwise, I would have to resort
> to a completely analog design...
>
> Thanks in advance.
>
> Nestor
> nestor@stansync.com
> nestor@ece.concordia.ca

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email randraka@ids.net
http://users.ids.net/~randraka

Article: 20748
Subject: Divider
From: dave_admin@my-deja.com
Date: Sun, 20 Feb 2000 18:00:05 GMT
Links: << >> << T >> << A >>

Hi,

Does anybody have HDL source for a 32- or 16-bit divider?
Smaller but parameterized dividers are also welcome.

regards,
Dave.


Sent via Deja.com http://www.deja.com/
Before you buy.
