
Article: 2875
Subject: Floating Point and Reconfigurable Architectures
From: andre@ai.mit.edu (Andre' DeHon)
Date: Wed, 21 Feb 1996 22:49:12 GMT

	Floating point -- that's one worth trying to sort out... 

Do you really need Floating Point?

	I guess that's the question which always comes to mind first.  In
practice, I see floating point wielded in many situations where it isn't
necessary, and I see people demanding it who don't really need it. 

	Often I see people who think they need floating point simply
because they want a decimal place.  "Modern" languages like C, which lack
support for non-integer fixed point numbers, have this tendency to force
you into that mindset.  
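
	For a concrete sketch of the alternative C omits -- scaled integers
with an implicit binary point -- consider a 16.16 fixed point format (the
format, the names, and the 64-bit intermediate are illustrative assumptions):

#include <stdio.h>

typedef long fix16;                        /* 16.16 fixed point */
#define FIX(x) ((fix16)((x) * 65536.0))    /* double -> 16.16 */

static fix16 fix_mul(fix16 a, fix16 b)
{
    /* a 64-bit intermediate (e.g. gcc's long long) keeps the full product */
    return (fix16)(((long long)a * b) >> 16);
}

int main(void)
{
    fix16 price = FIX(19.95), qty = FIX(3.0);
    printf("total = %.2f\n", fix_mul(price, qty) / 65536.0); /* 59.85 */
    return 0;
}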

	My impression is that the main case for really needing floating
point is when the dynamic range of the variables is very large and the
magnitudes at any point in a program (or stored value) are unpredictable
-- and, hence, must be explicitly represented along with the values.

	I believe the only other justification for floating point is "ease
of programming" -- the programmer isn't burdened with thinking about the
dynamic range or magnitudes of the intermediate values (though he does
still need to think about the number of bits of significance kept in these
intermediates).  I don't want to under-rate this issue, but I do want to
sort it out as separate from the cases where floating point is
fundamentally required.

	[BTW -- I'd be glad to be educated about other uses and virtues of 
floating point in case I'm simply ignorant on the matter.] 



Can FPGAs do floating point?

   Of course -- see, for example, [Shirazi, Walters and Athanas, FCCM'95]
    (though I think I heard some concern about the stability of their
       detailed short floating point implementation)

How well can FPGAs do Floating Point? 

	Admittedly, poorly.
 
	Unfortunately, there are no direct implementations to point at (I
don't feel comfortable directly extrapolating from the ~16-bit floating
point implementation mentioned above to something comparable with
the more traditional 32 or 64 bit formats).
	
	From my review of multiply implementations (multiply takes up about
half of the space in the few floating point implementations I've seen),
I see a 200x space-time disadvantage for an FPGA implementation versus a
hardwired implementation.  I wouldn't be surprised to see the full floating
point implementation fare worse -- perhaps 500x.


Is this important?

	That's a good one to discuss.  

	How often is floating point *really* needed? -- that's a question
for application and algorithm people to think hard about.  I, for one,
would like to hear people's thoughts.

	Are people so attached to the convenience of floating point that
they can't think of doing things any other way (even if floating point
isn't strictly necessary)?  This will never be something we can quantify --
but it would be nice to get a feel for whether or not floating point (IEEE
floating point with all of its little quirks and details) is a piece of
standardization baggage we're going to be stuck with for all time.

	

Is Floating Point a reconfigurable killer?

	Certainly not.  If it is important for one of the reasons mentioned
above, then it may be time to think about adapting reconfigurable
architectures to deal with it better.  ...and, of course, if it isn't then
it doesn't matter...
     
What can we do if floating point is so important?

   * Range analysis and compile to fixed point -- I appreciate the desire
to take the burden of keeping track of the location of the decimal
point and value ranges away from the programmer.  There are many cases
where we can give that responsibility to the *compiler* rather than to the
hardware.  Certainly, such a scheme could take care of all the cases where
floating point was not needed for dealing with wide, unpredictable dynamic
ranges, but was useful in easing the task of programming.  I could see a
compiler being told the basic ranges of source inputs at some point of a
program and propagating range/significance analysis through the code to
infer the ranges needed at each point in the program.  In the process it
could decide whether the decimal point could be fixed statically (and where
it occurred in the number) or whether it needed to be handled dynamically
(see the sketch after this list).
--> sounds like a good compiler/synthesis project for someone to look into.

  * Do what processors do -- General-purpose processors/ALUs were (are)
pretty bad at floating point, themselves.  That's why modern
microprocessors (and older high-performance processors) include hardwired
floating point units to handle floating point operations.  Modern,
"general-purpose" processors dedicate about 15% of their die area to hold a
floating-point unit.  If this were important, one could do the same with
reconfigurable devices -- add a few FPUs in the corner of the die.  E.g.
the Altera 10K now has banks of specialized memory blocks; one could
replace those blocks with a specialized FPU if it were deemed that critical
to all applications.

  * Keep Floating-Point datapaths in mind while designing reconfigurable
elements -- If you look at most FPGAs, you'll see that
adder/subtractor/comparison datapaths were carefully considered during
design.  This leads to architectures which aren't purely "a bunch of
homogeneous bit-level processing elements" and which deal with arithmetic
moderately robustly.  With similar care and attention to the requirements
for floating point, I suspect one could architect more "floating point
friendly" architectures.  I'd start by looking at what some of the SIMD
processors have done along this vein to improve their ability to handle
floating point code (particularly, I'm thinking of the MasPar MP-2 and
not the CM, which took the approach mentioned above of adding a specialized
FPU).
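
	To sketch the range-analysis idea from the first item above: propagate
[min,max] intervals through each operation and let the compiler pick word
widths.  A toy version in C (the helper names are hypothetical; only add and
multiply are shown):

#include <stdio.h>
#include <math.h>

typedef struct { double lo, hi; } range;

static range r_add(range a, range b)
{
    range r; r.lo = a.lo + b.lo; r.hi = a.hi + b.hi; return r;
}

static range r_mul(range a, range b)
{
    /* extremes of a product lie at the corner combinations */
    double p[4]; range r; int i;
    p[0] = a.lo*b.lo; p[1] = a.lo*b.hi; p[2] = a.hi*b.lo; p[3] = a.hi*b.hi;
    r.lo = r.hi = p[0];
    for (i = 1; i < 4; i++) {
        if (p[i] < r.lo) r.lo = p[i];
        if (p[i] > r.hi) r.hi = p[i];
    }
    return r;
}

/* integer bits needed to hold the range, sign bit included */
static int int_bits(range r)
{
    double m = fabs(r.lo) > fabs(r.hi) ? fabs(r.lo) : fabs(r.hi);
    return (int)ceil(log(m + 1.0) / log(2.0)) + 1;
}

int main(void)
{
    range x = { -3.0, 5.0 }, y = { 0.0, 100.0 };
    printf("x+y needs %d bits, x*y needs %d bits\n",
           int_bits(r_add(x, y)), int_bits(r_mul(x, y)));
    return 0;
}

	A real tool would track fraction bits (significance) the same way and
fall back to a dynamic representation only where the intervals blow up.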


					Andre' DeHon
					Reinventing Computing
					MIT AI Lab

<http://www.ai.mit.edu/projects/transit/rc_home_page.html>
	



   

   
   
  


Article: 2876
Subject: Re: Floating Point and Reconfigurable Architectures
From: sc@vcc.com (Steve Casselman)
Date: Thu, 22 Feb 1996 04:53:42 GMT

> 
> 
> 	Floating point -- that's one worth trying to sort out... 
> 
> Do you really need Floating Point?
> 
Because floating point is available, it is used.  In other words, programs
and algorithms exist that take full advantage of floating point range.
Many real world apps depend on the range of floats; it would take lifetimes
to redo the codes and algorithms involved.  I'd like to know what length
word one would have to use in fixed point to recreate the full dynamic
range of single precision floating point.
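
[A back-of-envelope answer, assuming IEEE-754 single precision with
denormals:

    largest finite value  ~  2^+128
    smallest denormal     =  2^-149

so a fixed-point word spanning both extremes, while keeping the full 24-bit
significand at the top end, would need roughly 128 + 149 + 1 ~ 278 bits.]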

More to the point, what computer system around today doesn't do floating
point? How can we say "Reconfigurable Computing" if we can't do what other
computing systems do? It is a tough one and we have to tackle it head on to
get ANY respect from the rest of the computing world.

> How well can FPGAs do Floating Point? 
> 
> 	Admittedly, poorly.

Most floating point work I've seen uses VHDL, which is not efficient.  While
it may take some time, I'm going to be working on this soon, which is why I'm
writing my little Xilinx assembly language.


> Is this important?
> 
> 	
Absolutely!!!! I cannot tell you how many times I've heard "can it do floating
point?"  If all I can say is "if you make it, it can," I don't get the sale.
For me it is very important.

> 
> Is Floating Point a reconfigurable killer?
> 
> adapting reconfigurable architectures to deal with it better. 
>      
This is what has to be done. It is tough when one or two companies hold
the patents that lock the field up. Reconfigurable computing is a small
market because the right devices don't exist and the right devices don't
exist because it is a small market (ARRRRGGGGG!!!!).



> What can we do if floating point is so important?
> 
>    * Range analysis and compile to fixed point -- I appreciate the desire
> to take the burden of keeping track of the location of the decimal

I appreciate what you're saying here, but wouldn't it be nice to just have
the right structures to do the right thing?

> 
>   * Do what processors do -- General-purpose processors/ALUs were (are)
> pretty bad at floating point, themselves.  That's why modern
> microprocessors (and older high-performance processors) include hardwired
> floating point units to handle floating point operations.  

This is an IMPORTANT thought.  Look at what they had to do in the 50's, then
at what they did in the early 70's, then at what they did in the 80's:
"those who do not know history are condemned to repeat it."

>   * Keep Floating-Point datapaths in mind while designing reconfigurable
> elements -- If you look at most FPGAs, you'll see that
> adder/substractor/comparison datapaths were carefully considered during

And barrel shifters -- don't forget barrel shifters; in floating point it is
the adds that really get you!

Steve Casselman
Virtual Computer


Article: 2877
Subject: Re: Floating Point and Reconfigurable Architectures
From: mbutts@netcom.com (Mike Butts)
Date: Thu, 22 Feb 1996 19:09:48 GMT
In the same vein as Steve and Andre have been saying...

15 years ago the Forth community struggled with the same issues
about floating point.  Chuck Moore always maintained that if
you really knew your application and its numerical behavior,
like any good programmer should, then FP was unnecessary baggage.
He and other Forthers quite successfully developed numerical
applications in fixed point Forth, ranging when necessary, etc.
That was in a time when FP was rare and expensive, there were
many different formats, etc.  After not too many more years,
the cost of FP dropped to the point where the issue started
seeming silly, and everybody started using FP anyway.  It's
easier to just throw FP at the problem than do rigorous analysis,
and after all programmers are at least as lazy as anyone else.
I think computers are supposed to make problem solving easier,
so I think that is a reasonable position to take.

I wasn't paying attention, but I believe the DSP community is
now going through a similar process.  FP DSPs are increasingly
common, powerful and affordable.

I think FCCMs are in the early stage of a similar path.  Now
FP is terribly expensive, especially if implemented in LUTs
and programmable interconnect.  FP operators are so atomic
and well understood and universal that they are bound to be
hard-wired into programmable arrays at some point in the
future.  How to architect arrays to mix LUTs with other
hard function units is an interesting problem, that Andre
and his cohorts and others in the research community, and
Altera and probably others in the commercial world, are
usefully exploring.

The real point of FCCMs is to program the hardware that
is *usefully* reprogrammed.  The more cases we find where
we can use hard-wired hardware as elements in our programmable
soup, the better off we will be in speed, area and power.
Arithmetic elements are good cases like that, and will get
better and better as silicon shrinks further.

For now it's very important to find gate-efficient ways to
get arithmetic done in the FPGAs we have.  It's useful to
learn from those who traveled this path before us in
GP computing, and later in DSP, and use the same tricks.

A good example is the resurrection of CORDIC arithmetic
to replace multiplies with adds, reported by Chris Dick
of La Trobe U in Melbourne at FPGA '96 ("Computing the
Discrete Fourier Transform on FPGA Based Systolic
Arrays", pp 129-135.)  He gets a 1000-point DFT on
one XC4010 in 51.45 milliseconds, 5ms on 10 XC4010s.

   --Mike

-- 
Mike Butts, Portland, Oregon   mbutts@netcom.com



Article: 2878
Subject: Re: Xilinx FPGA's with Mentor Tools?
From: zeev@cadence.com (Zeev Yelin)
Date: Thu, 22 Feb 1996 21:20:09 GMT
In article <4fqs7k$2tn@hacgate2.hac.com>, Lance Gin <c43lyg@dso.hac.com> wrote:


> >Just for the record, we're using the A3F release and XACT 5.1.1 on 
> >Sun Solaris 2.5. Generally, once bashed into shape by my mate vi ;-) 
> >the system works well. 
> 
I was told by Xilinx they are not yet ported to Solaris, and yet you tell
us casually that you are working on Solaris 2.5!  Amazing.
How stable is it?  Share with us, Lance.
thx
Zeev

-- 
Zeev Yelin, Cadence Design Systems(Israel)

"no such thing as a Finish Line"

My personal opinions, use it or lose it, always at your own risk.


Article: 2879
Subject: Re[2]: Xilinx FPGA's with Mentor Tools?
From: Lance Gin <c43lyg@dso.hac.com>
Date: 22 Feb 1996 22:49:59 GMT
> Re: Xilinx FPGA's with Mentor Tools?
> From: vanbeek@students.uiuc.edu (Christopher VanBeek)
> Date: 1996/02/12
>
> Hi,
>
> I'm a college student at the University of Illinois and am just
> learning to use VHDL.  I have compiled and synthesized some simple
> designs using Mentor's tools and Xilinx libraries.  Here was
> the design flow I used:
>
> First, I had to create a work directory using Mentor's "qvlib"
> program.  Then compile the VHDL using "qvcom".  Simulation can be
> performed using "qvsim", or I think QuickSim (I have not tried
> QuickSim).  I found qvsim to be faster than QuickSim ever was,
> and it allows step tracing and breakpoints in the source as well
> as waveforms.  To synthesize, I compiled using "qvcom" with the
> "-synthesis" option.  This executes Mentor's System-1076 Compiler
> after compiling the VHDL.  That creates a symbol in the work
> directory and the Autologic viewport for the design.  Then I ran
> Mentor's Autologic program, read in the viewport, and set the
> destination technology.  The Autologic libraries for the Xilinx
> FPGAs are on supportnet.mentorg.com in the
> /pub/mentortech/tdd/libraries/fpga directory.  Then I set a couple
> constraints and hit the synthesis button.  It created a bunch of
> sheets with Xilinx gates and flip flops on them.
>
> Christopher Van Beek

chris,

we haven't gotten into the details yet, but our flow will need to support
an XC4025E device in a pga-299 pkg using mixed schematic/VHDL entry as
follows:

DA->sys1076(ugh!)/quickHDL/qsim2->autologic2->XACT

we'll also be using model tech's v-system VHDL simulator on PC for blocks.
our local mentor FAE thinks this flow will work (with some caveats) and
has given me a xilinx doc which describes autologic2 synthesis guidelines.
he also made sure the latest autologic library for XC4KE was installed at
the supportnet ftp site. i've got a URL for some xilinx/mentor tutorials
which you might be interested in:

http://www.mentorug.org/sigs/univ_sig/index.html

you might also like to subscribe to a few of the mentor e-mail exploders
like asic_fpga, falcon, or univ_sig where i've got a more up-to-date version
of this thread going. call mentor tech support at (800) 547-4303 for details.

thanks for your flow info chris. when the time comes, i'll compare our flow
with yours, and possibly contact you again with comments. regards,

-- 
____________________________________________________________________________

Lance Gin                                              "off the keyboard
Delco Systems - GM Hughes Electronics                   over the bridge,
OFC: 805.961.7737  FAX: 805.961.7329                    through the gateway,
C43LYG@dso.hac.com                                      nothing but NET!"
____________________________________________________________________________




Article: 2880
Subject: Re: Floating Point and Reconfigurable Architectures
From: Brad Taylor <blt@emf.net>
Date: Thu, 22 Feb 1996 20:05:41 -0800
Andre' DeHon wrote:
> 
>         Floating point ...

I suspect the most efficient way to do floating point multiply
accumulates with an FPGA is to connect it to an i860 or a
C40 and then throw away the FPGA.

If, however, one needs floating point transcendental functions, divides,
atan2 or other complex functions, they can run more efficiently on an
FPGA by using the CORDIC algorithm.  This produces 1 bit of result per clock
and can be pipelined to create a complete result every clock.  I'd like
to see a DSP that could do 50 million atan()s per second.
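
A minimal C sketch of the vectoring-mode CORDIC iteration (assuming 16.16
fixed point and a precomputed atan table; one table entry, and one bit of
angle resolved, per pass -- shifts and adds only):

#include <stdio.h>
#include <math.h>

#define ITER 16
#define ONE  65536.0                /* 16.16 scaling */

static long tab[ITER];              /* atan(2^-i) in 16.16 radians */

static void init(void)
{
    int i;
    for (i = 0; i < ITER; i++)
        tab[i] = (long)(atan(ldexp(1.0, -i)) * ONE + 0.5);
}

/* rotate (x,y) onto the x-axis, accumulating the angle; assumes x > 0 */
static long cordic_atan2(long y, long x)
{
    long z = 0;
    int i;
    for (i = 0; i < ITER; i++) {
        long xs = x >> i, ys = y >> i;
        if (y > 0) { x += ys; y -= xs; z += tab[i]; }
        else       { x -= ys; y += xs; z -= tab[i]; }
    }
    return z;
}

int main(void)
{
    init();
    printf("cordic: %f   libm: %f\n",
           cordic_atan2(1 * 65536L, 2 * 65536L) / ONE, atan2(1.0, 2.0));
    return 0;
}

In hardware the loop unrolls into one pipeline stage per iteration, which
is where the result-per-clock throughput comes from.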

Another reason to do floating point in an FPGA might simply be because
you have an FPGA available, not an FPU.

I believe the reason that FPUs have become so common is first that they
are affordable, and second that using floats eliminates a lot of bugs.
I certainly use floats in C.  To properly do math with ints requires
a lot of expertise in a field that very few are expert in, which is 
precision analysis.

It must be noted, however, that real world problems can usually be
implemented without floats.  The economic advantage of a 10-100x
performance gain might impel one to implement a solution as integer
math and to do the precision analysis.

It seems to me that the value of FPGAs is not in emulating IEEE
floating point operations, but in solving real world problems, which
they tend to do very well.  The precision analysis issue is one that
won't go away and will probably only be solved when high level tools
are used to transform floating point algorithms into FPGA configurations.

By the way, the carry chain in a 4K FPGA can be used to implement the 
find first one function.  When the FPGA manufacturers start to support
wide fanin muxes, the barrel shift might be a little less painful.

Also I suspect that floats using bit serial math are relatively efficient.
Has anyone checked this out?  

Brad Taylor


Article: 2881
Subject: Re: Java and reconfigurable computing
From: billms@nixon.icsl.ucla.edu (Bill Mangione-Smith)
Date: Fri, 23 Feb 1996 06:10:50 GMT
In article <1996Feb21.190049.3248@super.org> sc@vcc.com (Steve Casselman) writes:

   I think the Java thread is about reconfigurable computing in that
   Java talks about virtual machines and these could be implemented as
   FPGA based hardware objects. Below is a little SUN blurb about Java
   processors coming out. I read that the pico Java will be 2mm square;
   this would fit in a 10,000 gate FPGA and the micro Java might fit in
   a 40,000 gate FPGA. A good reconfigurable computing project would be
   a Java to FPGA compiler that could take the Java language and decide
   what could go into hardware and what would be run in software. Then of
   course design a machine that would speed up such a program:)

Java has a 32-bit datapath, floating point, stacks (probably requiring
stack->register mapping or stack caching), a reasonably complicated
calling convention, user-level exception handlers, and a requirement
for garbage collection.  I can't imagine how this beast could ever fit
in a 40k FPGA, let alone a 10k job.  The execution path is fairly simple, so
you won't get a speed boost accelerating a specific set of virtual machine
instructions.  I guess you could shoot to compile the high level code to 
hardware, but for that task java is perhaps the *worst* high level language
you could choose, given the heavy weight of its calls and memory model (read:
requisite reliance on objects for anything complex).

Java is kind of interesting, and it *appears* that Gosling was trying to 
target dynamic compilation when he defined the virtual machine.  However, 
it's a long way from anything that I can imagine ever being implemented in
reconfigurable logic, and I personally consider myself a believer in that
concept.

There are a lot of people who think that Sun is just blowing smoke
with their content-free announcement.  If you read it closely, they
don't even claim that the processor will actually execute virtual
machine instructions directly, leaving open the possibility of
something simple like a sparc core with dynamic compilation (hah!  now
I'm calling dynamic compilation simple...).  However, a recent posting
from a Sun architect strongly suggests that in fact they do intend to
use this approach, even if the news release doesn't claim it.

Bill


Article: 2882
Subject: Re: Floating Point and Reconfigurable Architectures
From: Andreas Doering <doering@iti.mu-luebeck.de>
Date: 23 Feb 1996 13:29:27 GMT
I agree totally:
in software most people use floating point thoughtlessly.
Once I saw a linker which needed FP just to show what percentage
of memory was used.
I guess that most of the widespread programming languages are unfit
for automatic data range compilation.  Languages like Haskell with
a very clever type system might fill the gap.  Even VHDL would not free
the engineer from explicitly controlling precision.
Another option might be the use of fractions.  In good old
D.E. Knuth's books there is a discussion of this.
It would only require a fast gcd-unit, which should be
doable in hardware with moderate effort.
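
A sketch of the algorithm such a unit would run -- the binary (Stein)
gcd, which needs only shifts, compares and subtracts, so it maps
naturally onto hardware:

#include <stdio.h>

static unsigned gcd(unsigned a, unsigned b)
{
    int shift = 0;
    if (a == 0) return b;
    if (b == 0) return a;
    while (((a | b) & 1) == 0) { a >>= 1; b >>= 1; shift++; } /* common 2s */
    while ((a & 1) == 0) a >>= 1;                 /* a is now odd */
    while (b != 0) {
        while ((b & 1) == 0) b >>= 1;             /* strip factors of 2 */
        if (a > b) { unsigned t = a; a = b; b = t; }
        b -= a;                                   /* even result; loop */
    }
    return a << shift;
}

int main(void)
{
    printf("gcd(1071, 462) = %u\n", gcd(1071, 462)); /* 21 */
    return 0;
}
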
Just my 2%
A.


-----------------------------------------------------------------
                        Andreas Doering
                        Medizinische Universitaet zu Luebeck
                        Institut fuer Technische Informatik
                        Germany
----------------------------------------------------------------



Article: 2883
Subject: Re: Xilinx is NOT specified MINIMUM delay -
From: budwey@sequoia.com (Mike Budwey)
Date: Fri, 23 Feb 1996 17:37:34 GMT
Peter Alfke (peter@xilinx.com) wrote:
: In article <Dn3uCy.25v@icon.rose.hp.com>, tak@core.rose.hp.com (Tom
: Keaveny) wrote:

: > Note, that there are a number of "synchronous" bus spec's that mandate
: > a non-zero hold time.

: Please let me suggest a more careful nomenclature:

: "Hold-time" is not an OUTPUT characteristic,

: "Hold time" is always an INPUT requirement. A positive hold time means
: that input data is required to be "held" valid until after the active
: clock edge.

: "Propagation delay" or "clock-to-out" is the relevant OUTPUT specification.


: If two devices are directly interconnected, and share a common clock
: without any skew, then a positive hold-time requirement at the input can
: only be satisfied by a guaranteed minimum clock-to-out delay on the output
: that drives that input.

: That's why a positive hold-time requirement on a data input is so bad, and
: that's why Xilinx has added internal delay to increase the data set-up
: time so much that the pin-to-pin hold-time requirement on all inputs is
: never positive.

:  As a result, you can take data away simultaneously with the clock, and
: still be sure that the old data is being clocked in.
: A minimum clock-to-out delay specification is then not needed. The actual,
: physically unavoidable shortest output delay acts as additional
: protection against clock-skew problems.

: Peter Alfke, Xilinx Applications.

Come on, Peter; surely you don't mean to suggest that nobody design a
XILINX part to interface to anything besides other XILINX parts.
There are certain facts of life which require consideration of at least
some system clock skew.

If a bus spec REQUIRES that data not change for 3ns (for example) after
the clock, I would call that a hold time.  Why, the bus spec may even
refer to it as t(DH)!
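
For concreteness, the constraint reduces to (a sketch, ignoring trace
delays):

    t_co(min, driver)  >=  t_hold(receiver) + t_skew(clock)

With a 3ns hold requirement and any real clock skew, the driving part
needs a guaranteed minimum clock-to-out -- exactly the number that isn't
specified.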

So, how would you suggest one design an interface to such a bus using a
XILINX device?  If you would suggest that XILINX not be considered for
these designs, let us know.  At least we won't waste time evaluating
them.

Mike Budwey	budwey@sequoia.com


Article: 2884
Subject: Re: Floating Point and Reconfigurable Architectures
From: gah@u.washington.edu (G. Herrmannsfeldt)
Date: 23 Feb 1996 23:02:43 GMT
In many scientific problems, floating point is required.  The dynamic
range is large, and the precision is relatively small.

But outside of scientific work, I don't believe this.

In text processing, I believe that floating point should mostly not be
used.  In Knuth's TeX and Metafont, floating point is never used for
anything that will affect the output.  It is used for things in the log
file, though.

PostScript uses floating point, and I know of a number of cases where
the results are wrong because of it.

Even a 180 degree rotation can't be done in PostScript, at least not
with the rotate operator.  Rounding in the sin/cos means that the zero
terms in the rotation matrix are not zero.  At high resolution, this is
easily visible.

Metafont does fixed point sin, cos, sqrt, etc. just to do this right.

Oh well, just a chance to say something that I don't see said very often.


-- glen


Article: 2885
Subject: PCI models synthesized to FPGAs?
From: David Emrich <emrich@exemplar.com>
Date: 24 Feb 1996 01:52:13 GMT
I would like to hear from people who have synthesized PCI models to
FPGAs.

========================================================================
David Emrich                                        Exemplar Logic, Inc.
emrich@exemplar.com                         815 Atlantic Ave., Suite 105
                                                 Alameda, CA  94501-2274
                                                                     USA
========================================================================



Article: 2886
Subject: Re: Floating Point and Reconfigurable Architectures
From: louca@caip.rutgers.edu (Loucas Louca)
Date: 23 Feb 1996 21:27:44 -0500
mbutts@netcom.com (Mike Butts) writes:

>For now it's very important to find gate-efficient ways to
>get arithmetic done in the FPGAs we have.  It's useful to
>learn from those who traveled this path before us in
>GP computing, and later in DSP, and use the same tricks.

>   --Mike

>-- 
>Mike Butts, Portland, Oregon   mbutts@netcom.com


This is what we also think, so we have implemented FP addition 
and multiplication on the Altera FLEX 8000 using the IEEE 
single precision format.  Various methods were investigated 
in order to get the best combination of time-space.  We 
finally used a pipeline design for the adder and a 
digit-serial design for the multiplier.  

The adder takes about 50% of the chip and it has a peak rate 
of about 7MFlops.  The  multiplier takes 344 logic cells 
(34% of the chip) and it is clocked at 15.9 MHz.  Since we 
are using a digit-size of 4, 12 clock cycles are needed for 
the complete result to be available.  This translates to a 
rate of 1.3MFlops.
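
(As a quick check on that figure: 15.9 MHz / 12 cycles per result ~=
1.3 MFlops.)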


If you would like to get more information on these designs
please send email to louca@ece.rutgers.edu

Loucas




Article: 2887
Subject: Re: Floating Point and Reconfigurable Architectures
From: louca@caip.rutgers.edu (Loucas Louca)
Date: 23 Feb 1996 21:44:47 -0500
Brad Taylor <blt@emf.net> writes:

>Also I supect that floats using bit serial math are relatively efficient.
>Has anyone checked this out?  

>Brad Taylor

We did design a 32-bit IEEE single precision FP multiplier using
digit-serial arithmetic on the Altera FLEX 8000.  For details on area
requirements and speed see my previous posting.  For more info email
louca@ece.rutgers.edu.

Loucas


Article: 2888
Subject: Xilinx 7336 EPLD
From: dsp@io.com (William A. Gordon)
Date: 24 Feb 1996 07:02:02 GMT
Is there anyone in the Austin, TX area who has had experience developing
for the Xilinx 7336 EPLD device?  What are good tools for Boolean equation
entry AND functional simulation?

Any help would be appreciated.

Regards,


William A. Gordon, Jr.
dsp@io.com


Article: 2889
Subject: Re: Floating Point and Reconfigurable Architectures
From: lazzaro@snap.CS.Berkeley.EDU (John Lazzaro)
Date: 24 Feb 1996 19:28:20 GMT
In article <mbuttsDn6yKC.FoB@netcom.com>, Mike Butts <mbutts@netcom.com> wrote:
>I wasn't paying attention, but I believe the DSP community is
>now going through a similar process.  FP DSPs are increasingly
>common, powerful and affordable.

But DSP's have a powerful motivation for staying as silicon-efficient
as possible -- the descent into low-end, high-volume embedded
application domains. A fixed-point multiplier will always take less
die area than a floating-point multiplier, and that difference in area
can be the difference between profit and loss when selling DSP's below
the $1 price point. And this is the price point that will need to be
breached for voice I/O to become truly ubiquitous, to choose just one
example ...



-- 
-------------------------------------------------------------------------------
John Lazzaro                My Home Page: http://http.cs.berkeley.edu/~lazzaro
lazzaro@cs.berkeley.edu     Chipmunk CAD: http://www.pcmp.caltech.edu/chipmunk/
-------------------------------------------------------------------------------


Article: 2890
Subject: Floating Point on FPGAs -- Numbers are great...
From: andre@ai.mit.edu (Andre' DeHon)
Date: Sat, 24 Feb 1996 20:53:44 GMT


	Numbers are great! 

> This is what we also think, so we have implemented FP addition 
> and multiplication on the Altera FLEX 8000 using the IEEE 
> single precision format.  Various methods were investigated 
> in order to get the best combination of time-space.  We 
> finally used a pipeline design for the adder and a 
> digit-serial design for the multiplier.  
>
> The adder takes about 50% of the chip and it has a peak rate 
> of about 7MFlops.  The  multiplier takes 344 logic cells 
> (34% of the chip) and it is clocked at 15.9 MHz.  Since we 
> are using a digit-size of 4, 12 clock cycles are needed for 
> the complete result to be available.  This translates to a 
> rate of 1.3MFlops.
>
>
> If you would like to get more information on these designs
> please send email to louca@ece.rutgers.edu

	 Let's do some back of the envelope calculations of FP capacity
based on these numbers. 

FPGA ------------------------------------
	I'm going to start by assuming that Altera LE's are about the same
size as Xilinx 4-LUTs (half a CLB in the 3/4k family) -- I think this is
about ballpark right, but it could be off by 20% one way or the other.

	Assume:  1 LE ~= 600K lambda^2 
             (lambda = 1/2 minimum feature size, a technology normalizer)

	You must be using an 81188 -- 1008 LEs -- since you say 344 logic
cells is 34% of the chip.

	I'll call 50% of the chip 500 cells to keep things easy.
	
On addition this gives a throughput of 1 FP add per: 

	500 * 600K lambda^2 * 63ns
     or  0.053 FP Adds / lambda^2*s

     on multiply, a throughput of 1 FP-MPY per:
        344 * 600K lambda^2 * 63ns * 12
     or  0.0064 FP-mpy / lambda^2*s

Custom ------------------------------------

	I happen to have a custom 32-bit FPU paper in my files (one data
point -- not enough to necessarily know whether or not this is a particularly
good or bad custom implementation).

    Matsushita has a 32-bit CMOS FP-mpy JSSC vol 19, #5, p.697ff.
	
	5.75x5.67mm die in lambda=1um --> 32.5M lambda^2 area
        78.7 ns  multiply

	--> 0.39 FP-mpy/lambda^2*s
	

	[ 0.39 / 0.0064 ~= 61x ]

	It fares better than I had predicted in my first message.  (At some
point it would be worthwhile to look a bit more broadly to see if this custom
implementation is typical.)

	Also note, a composite FP unit which does more than just FP-mpy
would be less dense when just considering the FP-MPY operation.

DSP/proc extrapolate ------------------------------------

	Now, if you buy a processor/DSP to do FP-MPY's for you, the
FP-MPYer usually consumes about 10-20% of the area on the die.  So, the ratio
for a DSP etc. which integrates such a multiplier might be more like 1/5th
this (12x) rather than ~60x.

Summary ------------------------------------


     0.0064 FP-mpy/lambda^2*s FPGA
     0.08 FP-mpy/lambda^2*s   DSP estimate [12x FPGA density]
     0.39 FP-mpy/lambda^2*s   custom [60x FPGA density]
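
	The same arithmetic in a few lines of C, using the numbers quoted
above (results in FP ops per lambda^2*s):

#include <stdio.h>

static double density(double cells, double lambda2_per_cell, double seconds)
{
    return 1.0 / (cells * lambda2_per_cell * seconds);
}

int main(void)
{
    printf("FPGA add:   %.3f\n", density(500.0, 600e3, 63e-9));      /* 0.053  */
    printf("FPGA mpy:   %.4f\n", density(344.0, 600e3, 12 * 63e-9)); /* 0.0064 */
    printf("custom mpy: %.2f\n", density(1.0, 32.5e6, 78.7e-9));     /* 0.39   */
    return 0;
}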

	To round this out, does anyone have good references/numbers on the
number of ALU cycles a "typical" (or some particular) processor w/out FP
hardware requires to implement a 32-bit FP-add and FP-mpy?  (My intuition is
that the processor sans FPU will have a lower computational density than
the FPGA.)

	
                                        Andre' DeHon
	                                andre@mit.edu
                                        Reinventing Computing
                                        MIT AI Lab

<http://www.ai.mit.edu/projects/transit/rc_home_page.html>




Article: 2891
Subject: Re: Verilog vs. VHDL comparison
From: alexk@dspis.co.il (Alex Koegel)
Date: 25 Feb 1996 09:12:56 GMT
In article <31294A97.31FC@microplex.com>,
   Fred Fierling <fff@microplex.com> wrote:
>No doubt this would make a FAQ if this newsgroup had one:
>
>I'm trying to decide between Verilog and VHDL for FGPA design now and
>possible ASIC design in the future. Does anyone know of any articles
>or reports that compare the two?  Preferably something less biased than
>an article by a company with an interest in either...
>

Yes. Try to find a pointer to John Colley's ESNUG entry:
http://www.chronologic.com/misc/reviews/verilog.review.html

Alex Koegel
DSPC Israel


Article: 2892
Subject: Re: PCI models synthesized to FPGAs?
From: Eric Ryherd <eric@vautomation.com>
Date: 25 Feb 1996 13:27:59 GMT
I just completed a project targeting AT&T Orca 2C parts.

The timing required for PCI is VERY tight. And FPGAs are still relatively
slow especially for wire delays. The real trick to PCI is not the
PCI side of the world, but in how you interface to the local bus side.
We had more problems there than in the PCI side of the world.
I have to commend the PCI spec writers for creating a relatively easy
to read specification that doesn't leave too many things to interpretation.

I tried to have a fully synthesizable core but could not meet the timing.
Not even close really, although much of the blame lies on the shoulders
of the AT&T place and route tool.  If I had more control over where
certain functions were to be placed, the timing probably could have made
it.
Instead, I had to create several "hardmacros" -- still synthesized, but
the hierarchy gave me something to "latch onto", and I could tell the
P&R tool where to put these macros.

Even with all of this we are still just barely making 25 MHz.
Fortunately the FPGA is really just a prototype for an ASIC and we will
definitely meet the timing there for the full 33 MHz.
Since this is just a prototype, we determined it was not worthwhile
spending a lot of time with manual place and route and additional
hardmacros to achieve the full 33 MHz in the FPGA.

Another option is that there are faster speed Orca parts coming out
by the end of the year which should be fast enough to meet the PCI
timing.

The core we developed was both a Target and an Initiator. A Target
only core should be pretty easy to implement in an FPGA even at 33 MHz.
It's the Initiator that is the real tough part.

We won't be offering the PCI core as a "standard" core.  Instead it will
be built into other cores.  I find it very difficult to believe that
there can be such a thing as a "standard" PCI core.  PCI by its
very nature is a configurable core.  The local bus side will always have
its own, unique requirements.  Thus, each implementation of a PCI
core is really a custom development.

-- 
Eric Ryherd                eric@vautomation.com  
VAutomation Inc.           Synthesizable HDL Cores 
20 Trafalgar Square        http://www.vautomation.com
Suite 443 Nashua NH 03063  (603) 882-2282 FAX:882-1587




Article: 2893
Subject: Re: Xilinx 8100 Series
From: Eric Ryherd <eric@vautomation.com>
Date: 25 Feb 1996 13:39:29 GMT
Edward Leventhal <ed.leventhal@omitron.gsfc.nasa.gov> wrote:
>Hello,
>
>	Could someone please tell me the status of the Xilinx 8100
>"Sea Of Gates" FPGAs?  Are these parts available?  Will the current
>XACT software be used for these parts or is there another software
>package which must be used - If so, can XACT be "upgraded" ??
>
>	I have read that these FPGAs yield excellent routing when
>used with logic synthesis (e.g. VHDL), and I am interested in any
>feedback / information.

I was a Beta site for these parts. THEY'RE GREAT!
The design flow is very much like an ASIC: HDL->Synthesis->EDIF->P&R.
The gates you get are also very much like an ASIC.  Various flavors
of AND/OR and DFFs and stuff, none of these crazy LUTs into DFFs.
The routing is excellent.  Lots of available tracks.  You still won't get
100% utilization depending on your design, but 90+% wasn't a problem for me.
The best part is that you don't have a fixed number of random logic and
sequential logic elements.  Each CLC can be configured as either random
logic or as 1/2 of a DFF.  Our designs are generally random-logic intensive,
which makes them terrible in regular LUT based FPGAs (which have tons of
DFFs and never enough LUTs).

The parts are available. But the package choices are limited.
PQ84s are easy to get and should be on virtually any distributor's shelves.

The biggest bummer is of course that the P&R SW is not in the "standard"
XACT release. You need XACT8000. You'll have to check with Xilinx on the
cost. On the other hand, XACT8000 is about 8000X better than plain old
XACT... 

-- 
Eric Ryherd                eric@vautomation.com  
VAutomation Inc.           Synthesizable HDL Cores 
20 Trafalgar Square        http://www.vautomation.com
Suite 443 Nashua NH 03063  (603) 882-2282 FAX:882-1587




Article: 2894
Subject: Re: Verilog vs. VHDL comparison
From: Fred Fierling <fff@microplex.com>
Date: Sun, 25 Feb 1996 10:31:52 -0800
Alex Koegel wrote:
> In article <31294A97.31FC@microplex.com>,
>    Fred Fierling <fff@microplex.com> wrote:
> >I'm trying to decide between Verilog and VHDL for FGPA design now and
> >possible ASIC design in the future. Does anyone know of any articles

> Yes. Try to find a pointer to John Colley's ESNUG entry:
> /www.chronologic.com/misc/reviews/verilog.review.html

After reading this article, it occurs to me that I've been naive.
The question isn't which one is technically best; the question is
which one will win the marketing war?

--
Fred Fierling           fff@microplex.com     Tel: +1 604 444-4232
Microplex Systems Ltd  http://microplex.com/  Tel: +1 800 665-7798
8525 Commerce Court                           Fax: +1 604 444-4239
Burnaby, BC   V5A 4N3


Article: 2895
Subject: Re: Xilinx 8100 Series
From: "Steve Knapp (Xilinx, Inc.)" <stevek>
Date: 26 Feb 1996 01:16:03 GMT
Edward Leventhal <ed.leventhal@omitron.gsfc.nasa.gov> wrote:
>Hello,
>
>	Could someone please tell me the status of the Xilinx 8100
>"Sea Of Gates" FPGAs?  Are these parts available?  Will the current
>XACT software be used for these parts or is there another software
>package which must be used - If so, can XACT be "upgraded" ??
>
The XC8100 FPGAs are now in production.  They use an addition to the XACT
software called XACT8000.  Please contact your local Xilinx representative
for pricing information.

Thank you for your interest in Xilinx programmable logic.
-- 
=====================================================================
   _
  / /\/  Steven K. Knapp               E-mail:  stevek@xilinx.com 
  \ \    Corporate Applications Mgr.      Tel:  1-408-879-5172 
  / /    Xilinx, Inc.                     Fax:  1-408-879-4442
  \_\/\  2100 Logic Drive                 Web:  http://www.xilinx.com
         San Jose, CA 95124

=====================================================================



Article: 2896
Subject: Programming ATMEL config. PROMs ?
From: Peter Fenn <peterf@electrosolv.co.za>
Date: 26 Feb 1996 04:48:06 GMT
We have recently purchased AT17C128 Serial PROMs to replace Xilinx
configuration PROMs, but do not have any apparent means to program these
new devices.

I have downloaded Atmel's CONFIGURATOR application note describing the
programming spec, which now prompts me to ask the question "Is there not a
piece of programming software already out there somewhere that'll save me
some time?"

I imagine the serial bus protocol could be implemented relatively simply
using a couple of pins on a PC's parallel port interface...
-Is there such a program at an FTP site somewhere?

Thanks in advance.

Regards


PETER FENN

****************************************************
*                ____              ______________  *
*  E L E C T R O |   \             ELECTROSOLV cc  *
*  ==============|    \                            *  
*                |     )=======      ELECTRONICS   * 
*  ==============|    / S O L V      & SOFTWARE    *
*                |___/               DESIGN GROUP  * 
*                                                  * 
****************************************************





Article: 2897
Subject: Re: Xilinx is NOT specified MINIMUM delay -
From: murray@pa.dec.com (Hal Murray)
Date: 26 Feb 1996 07:43:51 GMT
In article <peter-2102961234010001@appsmac-1.xilinx.com>, peter@xilinx.com (Peter Alfke) writes:

[snip]

> If two devices are directly interconnected, and share a common clock
> without any skew, then a positive hold-time requirement at the input can
> only be satisfied by a guaranteed minimum clock-to-out delay on the output
> that drives that input.
> 
> That's why a positive hold-time requirement on a data input is so bad, and
> that's why Xilinx has added internal delay to increase the data set-up
> time so much that the pin-to-pin hold-time requirement on all inputs is
> never positive.
> 
>  As a result, you can take data away simultaneously with the clock, and
> still be sure that the old data is being clocked in.
> A minimum clock-to-out delay specification is then not needed. The actual,
> physically unavoidable shortest output delay acts as additionasl
> protection against clock-skew problems.


I've worked on several 3000/3100 designs.  I like your chips (or 
I'd use something else) and I think I understand your reasoning for 
the no-min-delay philosophy.  But it sure makes my job difficult. 

Let me put that another way.  Any official help in that area would
increase my productivity and make your chips more attractive/valuable. 


Here is my view on (part of) the design process...

I would like to be able to convince myself that a board/system I 
am about to build will meet all the timing requirements before I 
push the button to make the PCB.  I'm willing to do a lot of work 
to do that.  For example, correcting for trace lengths and taking
advantage of your 70% across-chip rule. 


If you are hard nosed about no-min-delay, then I have a lot of trouble
proving that a design will work.  Here are a few examples:


    Suppose I want to clock data from one 3100 to another.  The specs 
    say 0 hold time but no min output delay.  So I have to have 0 
    clock skew.  That's unrealistic.  But how much skew can I get 
    away with?  Should I work hard making it small or put the effort 
    into something else?

    Suppose I have 0 clock skew between the chips.  When I look at 
    the fine print in the data book that corresponds to your description 
    above, it only applies if I'm using the CMOS clock input.  What 
    do I do if I'm using the TTL input?  [A lot of modern CMOS chips 
    have output pads that drive TTL levels.]

    PCI has a min clock to output spec of 2 ns.  How can I convince 
    myself that a sensible design will work?  Will it still work 
    after the next few speed upgrades?  [I assume it will work because 
    Xilinx is pushing PCI.  You might have added some extra delay 
    in your design, but I doubt it because that makes other things
    harder and they are already hard enough.] 


Article: 2898
Subject: FPGA and testing
From: Marcin Piaskowski <zwirek@kpbm.pb.bielsko.pl>
Date: Mon, 26 Feb 1996 18:38:36 +0100
HI ALL!

Excuse me if you have just discussed this topic!  I'm new in this group!
A friend of mine asked me to find some information about FPGAs and testing,
or the testability of them (I don't know much about it).
I have found some WWW sites but I still can't get to the FPGA and
testability information.

Can you help me?
If you do, send me an answer by e-mail (I don't really read this group
very often).

BTW: How many participants of this group are from POLAND???

-- 
Marcin Piaskowski
http://kpbm.pb.bielsko.pl/~zwirek   Sorry, only in Polish, so far ;-)
mailto:zwirek@kpbm.pb.bielsko.pl
Zw



Article: 2899
Subject: Languages for reconfigurable computing.
From: sc@vcc.com (Steve Casselman)
Date: Mon, 26 Feb 1996 18:30:41 GMT
Links: << >>  << T >>  << A >>
So now we see we are only 60x away from floating point dominance of the
world (not bad when most companies were not even trying:).  We have to think
about what languages to use to program.  VHDL and Verilog are out since they
are only 2D (you can specify timing but you end up with a time-wise flat
design) and cannot handle things like MIT's multicontext FPGA -- which is
where we have to go IMHO.  High level languages like C and Fortran are
naturally "time aware."  By that I mean if you look at the ALU as a
reconfigurable unit (it reconfigures in one clock), an HLL describes an
algorithm's execution over time.  On the other hand, most HLLs cannot
describe the fine parallelism that FPGAs can execute.  It would be good if I
could take existing programs and have the compiler understand that I have
this great resource available to me.  The hangup now is that there are not
many compiler writers out there thinking about this problem.

Something reconfigurable to think about (instead of how to port my FPGA to
ASICs or whether my antifuse part routes better:)


Steve Casselman
Virtual Computer Corporation



