
Messages from 500

Article: 500
Subject: Re: driving PCI
From: jhallen@world.std.com (Joseph H Allen)
Date: Fri, 9 Dec 1994 19:11:39 GMT
In article <3ca509$gd5@atlantic.merl.com>, Doug Hahn  <hahn@ca.merl.com> wrote:
>I was wondering if anyone has any experience driving a PCI
>bus from an XC4000 device.  PCI has specific I/V characteristics
>which need to be met and has anybody have any experience meeting
>these criteria (what output driver configuration is needed?).

I haven't tried it, but the I/V curve for the 3K and 2K devices is just in
the accepted area for PCI.  I assume 4K is too.  The only problem I can
think of is that having a large number of pins driving a high-capacitance bus
all on the same clock edge probably exceeds the power capability of the
chip.  I think there's a capacitance per power pin limit in the 4K specs
somewhere; so be sure to check it.
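As a rough illustration of why that limit matters, dynamic power scales
as C*V^2*f per switching pin.  A back-of-the-envelope sketch (all
numbers fed to these helpers are illustrative assumptions, not figures
from the PCI spec or the XC4000 data sheet):

```c
/* Rough CV^2f estimate of simultaneous-switching power.  The load,
   voltage, frequency, and pin-count values used with these helpers
   are illustrative guesses, not PCI or Xilinx specifications. */
double pin_power(double c_load, double v_supply, double f_switch)
{
    /* energy C*V^2 dissipated per cycle, times switching frequency */
    return c_load * v_supply * v_supply * f_switch;
}

double bus_power(int n_pins, double c_load, double v_supply, double f_switch)
{
    /* worst case: all pins switching on the same clock edge */
    return n_pins * pin_power(c_load, v_supply, f_switch);
}
```

With, say, 32 AD pins each seeing 50 pF at 5 V and toggling at 33 MHz,
this works out to roughly 41 mW per pin and over 1.3 W for the bus,
which is why a capacitance-per-power-pin limit is worth checking.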

I'd like to hear if you have any success with this.

-- 
/*  jhallen@world.std.com (192.74.137.5) */               /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}


Article: 501
Subject: homebuilt processors using FPGAs (long)
From: jsgray@ix.netcom.com (Jan Gray)
Date: 11 Dec 1994 04:08:40 GMT
(Hope the crosspost to comp.arch.fpga is OK, the topic is amateur
processor implementations using FPGAs.)

In <3c6is4$d7k@gordon.enea.se> pefo@enea.se (Per Fogelstrom) writes: 
>
>PDP11 Hacker ..... (ard@siva.bris.ac.uk) wrote:
>
>: My main interest is in designing a CPU _from scratch_. OK, I know I'll get
>: poor performance from all those FPGAs wire-wrapped together (all that
>: capacitive loading for one thing), but with a good underlying design it should
>: be useable (heck.. The PERQ 1a had a CPU built from 250 TTL chips, PALs and
>: PROMs, clocked at 5MHz, and still beats this 386DX33 for graphics performance
>: :-)). And there's the joy when a prompt appears on a machine that you even
>: designed the instruction set for.
>
>I've did a few bitslice designs many years ago. One was for my own amusement
>and was based on AMD2903 slices (32 bits, 8 chips). It was fun but very time-
>consuming. It was clocked with 5Mhz and executed reg-reg instructions in
>two clocks. I later redesigned it to fetch and decode in the same cycle as
>the previous execute. It never ran any serious software.
>
>Per
>

On homebrew computers: start simple and learn as you go.  When they work
they are *very* satisfying. I was encouraged by helpful U.Waterloo
hardware hacker friends (thanks Ashok and Mike and co., wherever you
are) into building my first homebrew 6809 system -- the "Gray-1", in
12th grade about 14 years ago.  It started with ROM, SRAM, and LEDs,
and gradually acquired serial ports, video, and a Votrax speech
synthesizer.  Eight bit micros and 1 MHz clock rates are easy to do:
easy to wire wrap, and easy to program.  Start with one of those; PICs
look like a good choice today.

On homebrew processors: I went into the software biz but my love for
hardware and computer architecture remains.  I've always been envious of
the engineers in industry and academia who get to design and build new
processors.  For a hobbyist, custom VLSI, gate arrays, or standard cell
has these hugely expensive barriers to entry.  And only the most
determined hobbyist would build a useful 32-bit CPU using bitslice
parts.

In the years since, the programmable logic industry has arrived!  These
days you can buy, quantity one, 5,000 gate field programmable gate
arrays (FPGAs) for ~$100, and 10,000 gate parts for about ~$200.  The
beauty of these parts is they are adequately dense for implementing
processors and they abstract away a lot of the high speed circuit stuff
for you.  For instance, clock skew is of little concern.  If you stick
to fully synchronous designs (no async preset/clear, no gated clocks,
etc.), carefully floorplan your functional units, and stay on chip :-),
your designs have a good chance of working at 20-25 MHz.

In my copious spare time I am experimenting with homebrew RISC CPUs.
Right now I have a partially finished, partially functional 16-bit RISC
CPU and ambitions for a dual issue 32-bit CPU.  The former ("jr16") is
compiled for a Xilinx XC4005PC84C-5, the latter ("NVLIW1" -- "not very
long instruction word #1") will be for a XC4010PC84C-5.

jr16 is a pipelined, 3-operand, load/store RISC with sixteen 16-bit
registers.  The
basic instruction formats are:
{  0, op: 3, rd: 4, ra: 4,  rb: 4  /* add/logic operations   */ },
{ 10, op: 2, rd: 4, ra: 4, imm: 4  /* load/store, EA=ra+imm4 */ }, and
{ 11, op: 2, rd: 4,        imm: 8  /* load immediate, branch */ }.
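As a sanity check that each format fills exactly 16 bits, here is a
hypothetical C decoder for the three formats; the field order follows
the post, but the MSB-first packing (format tag in the top bits) is my
assumption, not Jan's actual encoding.

```c
#include <stdint.h>

/* Sketch of a jr16 instruction decoder.  Field names follow the
   formats in the post; the exact bit ordering within the 16-bit
   word (tag in the top bits, rb/imm in the low bits) is assumed. */
typedef struct {
    unsigned fmt;                  /* 0, 2 (binary 10), or 3 (binary 11) */
    unsigned op, rd, ra, rb, imm;
} jr16_insn;

jr16_insn jr16_decode(uint16_t w)
{
    jr16_insn i = {0};
    if ((w >> 15) == 0) {          /* {  0, op:3, rd:4, ra:4, rb:4 } */
        i.fmt = 0;
        i.op  = (w >> 12) & 0x7;
        i.rd  = (w >> 8) & 0xF;
        i.ra  = (w >> 4) & 0xF;
        i.rb  = w & 0xF;
    } else if ((w >> 14) == 2) {   /* { 10, op:2, rd:4, ra:4, imm:4 } */
        i.fmt = 2;
        i.op  = (w >> 12) & 0x3;
        i.rd  = (w >> 8) & 0xF;
        i.ra  = (w >> 4) & 0xF;
        i.imm = w & 0xF;
    } else {                       /* { 11, op:2, rd:4,       imm:8 } */
        i.fmt = 3;
        i.op  = (w >> 12) & 0x3;
        i.rd  = (w >> 8) & 0xF;
        i.imm = w & 0xFF;
    }
    return i;
}
```

Each branch consumes 1+3+4+4+4, 2+2+4+4+4, or 2+2+4+8 bits, i.e. all
three formats pack into exactly 16 bits.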

The instruction pipeline is the classic IF (insn fetch), RF (write back
previous result and reg fetch), and EX (execute add/logic/effective
address computation).  If there's a load/store, the pipeline stalls
until it completes.

The 16-bit datapath is 8 rows by 5 columns of CLBs (Xilinx Configurable
Logic Blocks) (only ~20% of an XC4005 which has an array of 14x14 CLBs).
The columns are: rfa (reg file read port A), rfb (reg file read port B),
mux (multiplex B or immediate data), adder, logic unit (and, or, xor,
xnor).  Results (add/logic/load data) are multiplexed into a write-back
register on long lines (LLs) using the XC4000's dedicated LL tristate
drivers.

For this first design I avoided a separate PC incrementor and associated
multiplexors and instead use r15 for a PC.  Thus the clock phases are:

phase	register file		exec. unit	load/store

1	write back result reg	add 2 to PC	latch insn, read another
2	read next A, B regs	add 2 to PC
3	write back PC		user insn add/logic
4	read PC			user insn add/logic

(The execution unit takes two clocks to add/mux result at (unproven) 40
MHz.)

A nice aspect of this design is that the alternating inc-PC and
user-insn cycles mean the previous user insn finishes and any results
are written back to the reg file before the next user insn's operands
are read, thus eliminating any need for bypass multiplexors in the
operand busses or ugly operation latencies in the programming model.

To date I have this design running using the 11 MHz Xilinx XChecker
circuit probe, incrementing PC, fetching instructions from an on-chip
16-word boot ROM, and performing ALU operations, but haven't yet
implemented condition codes, branch or load/store circuitry.  Soon!  (I
know it works as far as it does because I can verify internal state: the
XChecker probe allows you to examine the state of every function
generator and flip flop on the part.)

As for top speed, XDelay static timing analysis (I don't have the
simulator software) indicates I should be able to clock this at 40 MHz
(25 ns).  (I do have a critical path or two to better pipeline yet).
Thus it should do 10 peak MIPS, not too shabby for a first design.

One neat thing about the Xilinx XC4000 architecture (and I haven't
seriously looked at the other FPGA vendors' architectures to know if
this is unique, inferior, or superior) is that there are enough flip
flops mixed in with the function generators that you can make a RISC
datapath in as few as three columns of CLBs: one register file (from
which you have to take two clocks to read two operands), one adder, and
one logic unit, with result multiplexing done on the LLs using tristate
drivers.  And using the dedicated carry paths you can do 16-bit adders
in 9 CLBs with about 25 ns delay, and 32-bit adders in 17 CLBs with
about 35 ns delay.

As for the dual-issue 32-bit NVLIW1, my current plans are for a two-unit
implementation of a simple VLIW architecture.  Each "unit" has its own
file of sixteen 32-bit registers and 3-operand instructions (rdest = ra
op rb); rdest and rb are local to the unit, specified using a 4-bit reg
no., but ra can be read from either unit and is spec'd using 1+4 bits.
Thus a
2-unit machine has a basic 34-bit insn word:
{ op0: 4, rd0: 4, ra0: 5, rb0: 4, op1: 4, rd1: 4, ra1: 5, rb1: 4 }.

(I'd obviously like to get that 34-bit word down to 32-bits but there
isn't much fluff left.  Any ideas out there?  32 - 2*(4+5+4) = 6, and
six bits doesn't encode two operations very well...)
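For concreteness, here is a sketch of packing that 34-bit word; the
field order is from the post, while carrying it in a 64-bit integer and
the MSB-first layout are my assumptions.

```c
#include <stdint.h>

/* Pack the NVLIW1 instruction word
   { op0:4, rd0:4, ra0:5, rb0:4, op1:4, rd1:4, ra1:5, rb1:4 }.
   The field order matches the post; the MSB-first layout inside a
   uint64_t is an assumption for illustration only. */
uint64_t nvliw_pack(unsigned op0, unsigned rd0, unsigned ra0, unsigned rb0,
                    unsigned op1, unsigned rd1, unsigned ra1, unsigned rb1)
{
    uint64_t w = 0;
    w = (w << 4) | (op0 & 0xF);
    w = (w << 4) | (rd0 & 0xF);
    w = (w << 5) | (ra0 & 0x1F);   /* ra selects from either unit: 1+4 bits */
    w = (w << 4) | (rb0 & 0xF);
    w = (w << 4) | (op1 & 0xF);
    w = (w << 4) | (rd1 & 0xF);
    w = (w << 5) | (ra1 & 0x1F);
    w = (w << 4) | (rb1 & 0xF);
    return w;                      /* 4+4+5+4 + 4+4+5+4 = 34 bits total */
}
```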

Using the above "modestly decoupled" architecture (a separate PC
incrementer, bypass result multiplexing, and VLIW-like limited access
between register files/functional units), it should do a peak of two
instructions every two 25 ns clocks, or 40 MIPS.  Here, the columns of
functional units in the data path floor plan will be something like
  LAMMRRR RRRMMAL
  (L=logic unit, A=adder, MM=4-way A-bus source mux,
  RRR=3-read 2-write register file)
with the two halves being placed such that splitting the
LL bus lets me mux the adder or logic unit results of each concurrently.

Thus the datapath of this 32-bit dual-issue machine should fit nicely in
14 columns X 17 rows of a 20x20 XC4010.  On a 4013 (24x24) I would add
a 16-entry 256-byte direct mapped cache (16 16-byte lines) whose cache
and data SRAMs would burn another 5 rows by 16 columns.  On a 4025,
(32x32) ...

It is amazing what you can squeeze onto these parts if you design the
machine architecture carefully to exploit FPGA resources.  In contrast,
there was a very interesting article in a recent EE Times by a fellow
from VAutomation doing virtual 6502's in VHDL, then synthesizing them
down into arbitrary FPGA architectures.  Although the 6502 design used
only about 4000 "ASIC gates" it didn't quite fit in an XC4010, a
so-called "10,000 gate" FPGA.  That a dual-issue 32-bit RISC should fit,
and a 4 MHz 6502 does not, says a great deal about VHDL synthesis
vs. manual placement, about legacy architectures vs. custom ones, and
maybe even something about CISC vs. RISC...

Well, that serves as kind of a brain dump of work (play) in progress.
Please drop me a line if you have questions, advice, etc.

Jan Gray
Redmond, WA
jsgray@ix.netcom.com (home: hacking processors)
jangr@microsoft.com  (work: hacking Microsoft Visual C++)


Article: 502
Subject: Re: driving PCI
From: joel@hibp2.ecse.rpi.edu (Joel Glickman)
Date: 12 Dec 1994 01:04:08 GMT
jhallen@world.std.com (Joseph H Allen) writes:

>In article <3ca509$gd5@atlantic.merl.com>, Doug Hahn  <hahn@ca.merl.com> wrote:
>>I was wondering if anyone has any experience driving a PCI
>>bus from an XC4000 device.  PCI has specific I/V characteristics
>>which need to be met and has anybody have any experience meeting
>>these criteria (what output driver configuration is needed?).

>I haven't tried it, but the I/V curve for the 3K and 2K devices is just in
>the accepted area for PCI.  I assume 4K is too.  The only problem I can
>think of is that having a large number of pins driving a high-capacitance bus
>all on the same clock edge probably exceeds the power capability of the
>chip.  I think there's a capacitance per power pin limit in the 4K specs
>somewhere; so be sure to check it.

>I'd like to here if you have any success with this.

>-- 
>/*  jhallen@world.std.com (192.74.137.5) */               /* Joseph H. Allen */

I too am interested in this .. I have contacted Xilinx and they have a PCI
compatibility package .. Just send email to pci@xilinx.com and they'll send
it out to you.  From what I gather, their XC3100 series is fully PCI compliant.
They even have a handy VHDL file that implements a PCI bridge..

-Joel
 glickj@rpi.edu



Article: 503
Subject: FPGA Design Consulting Service Available
From: Phillip Roberts <proberts@rmii.com>
Date: 12 Dec 1994 03:59:14 GMT



	If you have a Xilinx FPGA design that you would like completed,

	Optimum Solutions can help you out!


	Optimum Solutions offers Viewlogic schematic entry, 

	Simulation, and Xilinx FPGA design services over the Internet.


	All you need to do is present a rough block diagram and/or 

	a rough specification of what you want. You will receive 

	detailed Viewlogic schematics, Viewsim simulation vectors, 

	timing and performance specifications, and a fully 

	routed LCA file.


	All this for a reasonable fee!


	If you're interested, please e-mail Phillip Roberts 

	at Optimum Solutions: proberts@rmii.com




Article: 504
Subject: Re: L-Edit and Benchmarks
From: jackg@downhaul.crc.ricoh.com (Jack Greenbaum)
Date: 12 Dec 1994 23:25:37 GMT
In article <3c80k0$9j7@timp.ee.byu.edu> hutch@timp.ee.byu.edu (Brad Hutchings) writes:

   From: hutch@timp.ee.byu.edu (Brad Hutchings)
   Newsgroups: comp.arch.fpga
   Date: 8 Dec 1994 15:16:32 -0700
   Organization: ECEN Department, Brigham Young University
   References: <1994Dec7.210319.3344@super.org>

   |> 
   |> I think the format for any benchmarks should be either 
   |> 
   |> 1) pen and paper or
   |> 2) C code
   |> 
   |> The idea here is to specify the benchmark at the highest level.
   |> In the pen and paper approach the benchmark is pure algorithm.
   |> Consisting of a mathematical description of the input data, the
   |> algorithm and output data. The C code approach gives the implementor
   |> a real example of the behavior of an algorithm. 

   Why C-code? C-code seems like a poor choice as it only supports 
   sequential semantics. Thus any algorithm implemented in C will
   demonstrated *sequential* behavior. However, the goal is to implement 
   hardware that is highly concurrent. I don't see where C will be
   helpful in this case.

C is useful for specifying the input and output of the algorithm.
It is a much better-defined specification language than, for example,
English. There are many IEEE specs that use either C or Pascal (e.g. the
Ethernet spec) to augment text descriptions. 

The original author's first sentence says explicitly "The idea here is to
specify the benchmark at the highest level". This says nothing about
proposing that your benchmark implementation must follow this behavior,
only that it solve the same problem.  Maybe I'm just reading what makes
sense to me and attributing it to the original author.

   |> A C benchmark allows
   |> the designer to make hardware/software tradeoffs -- if their using
   |> a big board they can put the whole algorithm in hardware if they
   |> are using a small board they can cut up the program and divide and 
   |> conqure. 

   I think that C falls flat when it comes to hardware/software tradeoffs.
   Again, a C-implementation of an algorithm and a hardware implementation
   of an algorithm will likely be quite different for the reasons that
   I expressed above. Using C will only complicate the design-space
   search for mixed hardware-software solutions.

Only if you use C as input to your system, as opposed to as the
definition of a computational problem which you are to demonstrate your
system's ability to solve using its own specification paradigm.  Just
another view.

   -- 
	   Brad L. Hutchings (801) 378-2667          Assistant Professor
   Brigham Young University - Electrical Eng. Dept. - 459 CB - Provo, UT 84602
			  Reconfigurable Logic Laboratory

--
Jack Greenbaum       | Ricoh California Research Center
jackg@crc.ricoh.com  | 2882 Sand Hill Rd. Suite 115
(415) 496-5711 voice | Menlo Park, CA 94025-7002
(415) 854-8740 fax   | 


Article: 505
Subject: Re: L-Edit and Benchmarks
From: hutch@timp.ee.byu.edu (Brad Hutchings)
Date: 13 Dec 1994 09:11:02 -0700

In article <JACKG.94Dec12152537@downhaul.crc.ricoh.com>, jackg@downhaul.crc.ricoh.com (Jack Greenbaum) writes:
|> In article <3c80k0$9j7@timp.ee.byu.edu> hutch@timp.ee.byu.edu (Brad Hutchings) writes:
|> 
|>    Why C-code? C-code seems like a poor choice as it only supports 
|>    sequential semantics. Thus any algorithm implemented in C will
|>    demonstrated *sequential* behavior. However, the goal is to implement 
|>    hardware that is highly concurrent. I don't see where C will be
|>    helpful in this case.
|> 
|> C is useful for specification of the input and output of the algorithm.
|> It is a much more well-defined specification language than English for
|> example. There are many IEEE specs that use either C or Pascal (e.g. the
|> Ethernet spec) to augment text descriptions. 

Sure. But why would C be better than VHDL, for example (which the author
was arguing against)? VHDL allows a much broader range of abstractions
that make sense for both hardware and software. Concurrent and
sequential semantics are supported. If all that matters is the
specification of the benchmark and not its implementation then
just about any *executable* specification will do. However, if the
eventual goal is to compare different systems and approaches, it would be
useful to have a specification language that can get closer to
hardware so that specific approaches and implementation strategies
can be directly compared.

|> 
|> The original author's first sentence says explicitly "The idea here is to
|> specify the benchmark at the highest level". This says nothing about
|> proposing that your benchmark implementation must follow this behavior,
|> only that it solve the same problem.  Maybe I'm just reading what makes
|> sense to me and attributing it to the original author.
|> 
|>    |> A C benchmark allows
|>    |> the designer to make hardware/software tradeoffs -- if their using
|>    |> a big board they can put the whole algorithm in hardware if they
|>    |> are using a small board they can cut up the program and divide and 
|>    |> conqure. 
|> 
|>    I think that C falls flat when it comes to hardware/software tradeoffs.
|>    Again, a C-implementation of an algorithm and a hardware implementation
|>    of an algorithm will likely be quite different for the reasons that
|>    I expressed above. Using C will only complicate the design-space
|>    search for mixed hardware-software solutions.
|> 
|> Only if you use C as input to your system, as opposed to the
|> definition of a computational problem which you are to demonstrate your
|> system's ability to solve using it's own specification paradigm. Just
|> another view.

I missed your point. I was commenting on how C is of very little use
for doing software/hardware tradeoffs. How does C help here? C can
help to *define* the problem, but it seems like a poor choice if the
real goal is to experiment with different hardware/software tradeoffs.



-- 
        Brad L. Hutchings (801) 378-2667          Assistant Professor
Brigham Young University - Electrical Eng. Dept. - 459 CB - Provo, UT 84602
                       Reconfigurable Logic Laboratory



Article: 506
Subject: Re: Any Good HDL Tools for the PC?
From: chaseb@netcom.com (Bryan Chase)
Date: Tue, 13 Dec 1994 19:17:15 GMT
Ryan Raz (morph@io.org) wrote:
: We are currently working on a large design including digital filters,
: shift registers, SRAM's, VRAM's, etc. Also we will be using FPGA's
: to handle timing, control and ALU functions.

: We are looking for CAD tools for overall design description and simulation
: and FPGA synthesis. So far we have looked at Data I/O's Synario,
: Viewlogic's Pro Series, Exemplar Logic and the Xilinx development tools.

: Are there any comments on these systems or on alternatives?

I've used Viewlogic WORKVIEW Plus with the Xilinx compilers
(the new 5.0 on a Sparc, as well as the older versions on a PC) with 
pretty good results.  The PC platform crashes a lot, I believe due to
the overabundance of M*cr*S*ft products on it, but it's fairly useable.
A better solution, IMO, would be a real workstation.

The Altera tools for the PC work pretty well too, although I am not as
familiar with how well their compiler works.  I do know that it is quite
easy to code up some AHDL (Altera's Hardware language) and get it
compiled.  They have some nice design and debug tools.  Their waveform
simulator could use some additions, like accepting macro-command file
input, but it has its edges rounded better than most.

If you are truly looking for HDL and *NOT* schematic entry, your low-
cost choice should probably be the Altera tools, since the Xilinx
tools make it somewhat difficult to do purely HDL designs.  I am not
familiar with other design environments, but many do exist, for other
FPGAs not mentioned.


Article: 507
Subject: Re: L-Edit and Benchmarks
From: guccione@sparcplug.mcc.com (Steve Guccione)
Date: Tue, 13 Dec 1994 19:56:56 GMT
[... use of C, VHDL as benchmarking language ...]

A different approach, and one that is gaining popularity, is to
specify the algorithm in a general way, then permit any sort of
implementation.  This allows true testing of the architecture, rather
than the cleverness of the optimizer (not that testing optimizers is
necessarily a bad thing).  I believe the SLALOM benchmark takes this
approach.

This is probably more practical, considering the wide range (or maybe
lack of range :^) of programming language support for these
machines.

I personally would favor a high level language which permits the
explicit expression of parallelism.  There has been some interesting
work done along these lines with data parallel C and Occam.

But I know at least one person who believes FORTRAN will be necessary
if these machines are to be accepted by the high performance computing
community (but I'd rather not even think about it ...).

-- Steve
-- 12/13/94




Article: 508
Subject: FCCM'95 final Call for Papers
From: jma@descartes.super.org (Jeffrey M. Arnold)
Date: Tue, 13 Dec 1994 23:12:35 GMT
     #######  #####   #####  #     #           ###    #####  #######
     #       #     # #     # ##   ##           ###   #     # #
     #       #       #       # # # #            #    #     # #
     #####   #       #       #  #  #           #      ###### ######
     #       #       #       #     #                       #       #
     #       #     # #     # #     #                 #     # #     #
     #        #####   #####  #     #                  #####   #####

                     C A L L    F O R    P A P E R S

                            THE THIRD ANNUAL
         IEEE SYMPOSIUM ON FPGAs FOR CUSTOM COMPUTING MACHINES
                            Napa, California
			   April 19 - 21, 1995

            For more information, refer to the WWW URL page:  
             http://www.super.org:8000/FPGA/comp.arch.fpga


PURPOSE:   To bring together researchers to present  recent work 
in the use of Field Programmable Gate Arrays or other means for 
obtaining reconfigurable computing elements.  This symposium will 
focus primarily on the current opportunities and problems in this 
new and evolving technology for computing.

SOLICITATIONS:  Papers are solicited on all aspects of the use or 
applications of FPGAs or other means for obtaining reconfigurable 
computing elements in attached or special-purpose processors or 
co-processors, especially including but not limited to:
* Coprocessor boards for augmenting the instruction set of general-
  purpose computers.
* Attached processors for specific purposes (e.g. signal processing).
* Languages, compilation techniques, tools, and environments for 
  programming.
* Application domains.
* Architecture prototyping for emulation and instruction.
A special session will be organized in which vendors of hardware and 
software can present new or upcoming products involving FPGAs for 
computing.

SUBMISSIONS: Authors should send submissions (4 copies, 10 pages
double-spaced maximum) before January 16, 1995, to Peter Athanas.  A 
Proceedings will be published by the IEEE Computer Society.  Specific 
questions about the conference should be directed to Kenneth Pocek.

SPONSORSHIP: The IEEE Computer Society and the TC on Computer Architecture.

CO-CHAIRS:
Kenneth L. Pocek					
Intel							
Mail Stop RN6-18					
2200 Mission College Boulevard 					
Santa Clara, California  95052				
(408)765-6705 voice (408)765-5165 fax			
kpocek@sc.intel.com					

Peter M. Athanas
Virginia Polytechnic Institute and State University	
Bradley Department of Electrical Engineering	
340 Whittemore Hall
Blacksburg, Virginia 24061-0111	
(703)231-7010 voice (703)231-3362 fax
athanas@vt.edu

ORGANIZING COMMITTEE:
Jeffrey Arnold, Supercomputing Research Center		
Brad Hutchings, Brigham Young Univ.
Duncan Buell, Supercomputing Research Center		
Tom Kean, Xilinx, Inc. (U.K). 
Pak Chan, Univ. California, Santa Cruz			
Wayne Luk, Oxford Univ.
Apostolos Dollas,  Technical Univ. of Crete		



Article: 509
Subject: Benchmark Specs. in C
From: sc@vcc.com (Steve Casselman)
Date: Wed, 14 Dec 1994 04:23:40 GMT
> |> 
> |> I think the format for any benchmarks should be either 
> |> 
> |> 1) pen and paper or
> |> 2) C code
> |> 
> |> The idea here is to specify the benchmark at the highest level.
> |> In the pen and paper approach the benchmark is pure algorithm.
> |> Consisting of a mathematical description of the input data, the
> |> algorithm and output data. The C code approach gives the implementor
> |> a real example of the behavior of an algorithm. 
> 
> Why C-code? C-code seems like a poor choice as it only supports 
> sequential semantics.Thus any algorithm implemented in C will
> demonstrated *sequential* behavior. 

A C-coded (or any sequential language) *emulation* of an algorithm is a 
form of specification. It does not have to give an exact form of
implementation, nor does it have to specify hardware structure. A C
program can, however,
give one a measure of performance when comparing throughput of present day 
CPUs against new reconfigurable architectures.

> However, the goal is to implement 
> hardware that is highly concurrent. I don't see where C will be
> helpful in this case.

In my opinion the goal is to implement algorithms in hardware to accelerate
a computation. If there is one thing C has going for it, it is that there
are lots of algorithms (and benchmarks) implemented in that language.  

> 
> |> A C benchmark allows
> |> the designer to make hardware/software tradeoffs -- if their using
> |> a big board they can put the whole algorithm in hardware if they
> |> are using a small board they can cut up the program and divide and 
> |> conqure. 
> 
> I think that C falls flat when it comes to hardware/software tradeoffs.

It seems to me that we need to start with software to have a hw/sw
tradeoff; otherwise we will have to rewrite the billion lines of code
all over again.

> Again, a C-implementation of an algorithm and a hardware implementation
> of an algorithm will likely be quite different for the reasons that
> I expressed above. Using C will only complicate the design-space
> search for mixed hardware-software solutions.

I agree that if we use C as the implementation specification it puts all
the burden on the compiler writer. I think it will come to that some day,
however. For now I think we should just specify (give an example of) the
algorithm to be implemented.

For example:

void subroutine1(void) {
.....
}

void subroutine2(void) {
.....
}

void subroutine3(void) {
.....
}

main() {
	subroutine1();
	subroutine2();
	subroutine3();
}

A large system might be able to place all the functions in hardware at
one time.  A smaller system might implement the three hardware designs
sequentially.  A very small system might have to split each subroutine
into more than one hardware object.  Exactly how these subroutines are
implemented (VHDL, Verilog, schematic) would be up to the designer.  But
at least we would know everyone is getting the same results, because we
all have the same C program benchmark to refer to.


Steve Casselman


Article: 510
Subject: Any Way to Download a XNF to FPGA?
From: dyliu@dyliu.dorm2.nctu.edu.tw (二舍法克)
Date: 15 Dec 1994 08:58:01 GMT

           hi all:

		As the title says: is there any easy way to download an XNF file to the FPGA?

--


Article: 511
Subject: LOGIC MINIMIZATION
From: weedk@salmon.wv.tek.com (Kirk A Weedman)
Date: 15 Dec 1994 13:20:35 -0800
I'm looking for some shareware (preferably in C) code that does logic
minimization - i.e. Presto, Espresso II, boozer, MINI, etc.. I would
really like to get the ESPRESSO II algorithm so I can implement it
in some code I'm writing.

	Kirk
	weedk@pogo.wv.tek.com



Article: 512
Subject: Random numbers.
From: sc@vcc.com (Steve Casselman)
Date: Fri, 16 Dec 1994 01:27:28 GMT

The RNG I implemented was like a Fibonacci number
generator, except I add one every time I access the
EVC over the SBus. I take two random integers, stick
them together, format them, and subtract 1. The results
are double-precision floating point random numbers. I
then use this RNG in the NAS embar supercomputer
benchmark. The results are below. It should be noted
that at whatever speed this benchmark runs on a Sparc
20, it will run around twice as fast using an EVC.
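For readers without the hardware, here is a loose software sketch of an
additive Fibonacci-style generator that yields doubles in [0,1); the
lags, the seeding scheme, and the scaling are my assumptions for
illustration, not the EVC design.

```c
#include <stdint.h>

/* Additive lagged-Fibonacci generator sketch:
   x[n] = x[n-24] + x[n-55] mod 2^32, scaled into [0, 1).
   The lags (24, 55) and the LCG used for seeding are conventional
   choices assumed here, not taken from the post. */
#define LAG_A 24
#define LAG_B 55

static uint32_t lfib_state[LAG_B];
static int lfib_i = 0;
static int lfib_j = LAG_B - LAG_A;

void lfib_seed(uint32_t s)
{
    /* fill the state with a simple LCG so no entry starts at zero-ish */
    for (int k = 0; k < LAG_B; k++) {
        s = s * 1664525u + 1013904223u;
        lfib_state[k] = s;
    }
    lfib_i = 0;
    lfib_j = LAG_B - LAG_A;
}

double lfib_next(void)
{
    /* unsigned overflow wraps mod 2^32, which is exactly what we want */
    uint32_t x = (lfib_state[lfib_i] += lfib_state[lfib_j]);
    lfib_i = (lfib_i + 1) % LAG_B;
    lfib_j = (lfib_j + 1) % LAG_B;
    return x / 4294967296.0;       /* scale 32-bit integer into [0, 1) */
}
```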

Steve Casselman

Run 1
numbers > 0.000000 and < 0.100000 99979
numbers > 0.100000 and < 0.200000 100238
numbers > 0.200000 and < 0.300000 100190
numbers > 0.300000 and < 0.400000 99991
numbers > 0.400000 and < 0.500000 100290
numbers > 0.500000 and < 0.600000 99362
numbers > 0.600000 and < 0.700000 100268
numbers > 0.700000 and < 0.800000 99627
numbers > 0.800000 and < 0.900000 100384
numbers > 0.900000 and < 1.000000 99671
Total random numbers = 1000000
max = 0.999998027735670
min = 0.000000566747618

Run 2
numbers > 0.000000 and < 0.100000 100082
numbers > 0.100000 and < 0.200000 100478
numbers > 0.200000 and < 0.300000 100391
numbers > 0.300000 and < 0.400000 100174
numbers > 0.400000 and < 0.500000 99763
numbers > 0.500000 and < 0.600000 99686
numbers > 0.600000 and < 0.700000 99779
numbers > 0.700000 and < 0.800000 100166
numbers > 0.800000 and < 0.900000 99272
numbers > 0.900000 and < 1.000000 100209
1000000
max = 0.999998301519109
min = 0.000000095615990


Using Sparc Station 2 for everything
CPU TIME =  632.6700 (684.9300 when compiled with -p)
N = 2^24
NO. GAUSSIAN PAIRS = 13176389.
COUNTS:
0       6140517.
1       5865300.
2       1100361.
3         68546.
4          1648.
5            17.
6             0.
7             0.
8             0.
9             0.

Using EVC just for random numbers
CPU TIME = 353.880004
N = 2^24
NO. GAUSSIAN PAIRS = 13177271.
COUNTS:
0 	6138931.
1 	5865486.
2 	1101640.
3 	  69558.
4 	   1634.
5 	     22.
6 	      0.
7 	      0.
8 	      0.
9 	      0.

The top two functions were replaced by hardware: 26.3 + 23.9 = 50.2%.
It's a little more if you take out mcount, which is the
profiler itself.

 %time  cumsecs      #call   ms/call  name
  26.3   179.74  100663620      0.00  _aint
  23.9   343.38        257    636.73  _vranlc_
  15.1   446.78   13176389      0.01  _sqrt
  14.2   543.68          1  96900.00  _MAIN_
  13.7   637.73                       mcount
   6.8   684.58   13176389      0.00  _log
   0.0   684.59          3      3.33  _cfree
   0.0   684.60          5      2.00  _ioctl
   0.0   684.61         16      0.62  _write
   0.0   684.61         11      0.00  .div
   0.0   684.61         11      0.00  .mul
   0.0   684.61         64      0.00  .rem
   0.0   684.61         16      0.00  .udiv
   0.0   684.61          3      0.00  .umul
   ... other junk
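For context, the embar (NAS EP) kernel that produced the Gaussian-pair tables above works roughly like this: generate uniform deviates, turn accepted pairs into Gaussian pairs by the acceptance-rejection (Marsaglia polar) method, and histogram each pair by max(|X|, |Y|). A Python sketch of the counting loop (from the published benchmark description, not Steve's actual code; the stock `random` module stands in for the vranlc generator):

```python
import math
import random

def ep_counts(n_pairs: int, seed: int = 1) -> list[int]:
    """Sketch of the NAS EP kernel: make n_pairs Gaussian pairs by the
    polar method and bin them by the integer part of max(|X|, |Y|)."""
    rng = random.Random(seed)
    counts = [0] * 10
    made = 0
    while made < n_pairs:
        x = 2.0 * rng.random() - 1.0         # uniform in (-1, 1)
        y = 2.0 * rng.random() - 1.0
        t = x * x + y * y
        if 0.0 < t <= 1.0:                   # accept points inside the unit circle
            f = math.sqrt(-2.0 * math.log(t) / t)
            counts[min(9, int(max(abs(x * f), abs(y * f))))] += 1
            made += 1
    return counts
```

Run with a large n_pairs, roughly 46.6% of pairs land in bin 0 and 44.5% in bin 1, matching the shape of the tables above.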


Article: 513
Subject: Re: Any benchmark for FCMs?.
From: wgomes@wiliki.eng.hawaii.edu (Wilfred Gomes)
Date: Fri, 16 Dec 1994 01:58:53 GMT
Links: << >>  << T >>  << A >>

I still do not see how a design spread over 256 FPGAs would rate
favourably on a benchmark, other than being an academic exercise!





wilfred



Article: 514
Subject: Re: Any Way to Download a XNF to FPGA?
From: fliptron@netcom.com (Philip Freidin)
Date: Fri, 16 Dec 1994 07:07:54 GMT
Links: << >>  << T >>  << A >>
In article <3cp0ep$914@news.csie.nctu.edu.tw> dyliu@dyliu.dorm2.nctu.edu.tw (二舍法克) writes:
>
>           hi all:
>
>		as title, any easy way to make the XNF file download!!
>
>--

YES:	(assuming you are using an XC4000 style device, and version 5 SW)

	1)	xnfmerge  filename.xnf    <<< this is your xnf file
	2)	xnfprep filename	  <<< merge generated an .xff 
						file, and this turns it
						into a  .xtf file
	3)	ppr filename		  <<< ppr turns the .xtf into .lca
	4)	makebits filename	  <<< this turns the .lca file
						into a .bit file
	5)	xchecker filename	  <<< this takes the .bit file
						and downloads it to a chip.
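The five steps above can be sketched as a small Python wrapper; only the tool names and file extensions come from Philip's post, everything else (working directory, error handling) is assumed:

```python
import subprocess

def build_flow(design: str, download: bool = True) -> list[list[str]]:
    """Return the command sequence for the Xilinx v5 XNF-to-bitstream flow."""
    steps = [
        ["xnfmerge", f"{design}.xnf"],  # .xnf -> .xff
        ["xnfprep", design],            # .xff -> .xtf
        ["ppr", design],                # .xtf -> .lca (place & route)
        ["makebits", design],           # .lca -> .bit
    ]
    if download:
        steps.append(["xchecker", design])  # .bit -> chip via download cable
    return steps

def run_flow(design: str) -> None:
    for cmd in build_flow(design):
        subprocess.run(cmd, check=True)  # abort on the first tool that fails
```

With `download=False` the flow stops after makebits, leaving a .bit file you can download later.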


ALL THE BEST
	Philip Freidin   :-)     :-)     :-)


I couldn't help myself. 



Article: 515
Subject: Industry FPGA Applications?
From: jff@mrc.uidaho.edu (Jim Frenzel)
Date: 16 Dec 1994 15:19:55 GMT
Links: << >>  << T >>  << A >>
I am preparing a presentation for an IEEE conference on 
FPGA applications and would like to hear from people
in industry.  (I have the FCCM proceedings, but *most* of
those papers are from academics).

I would also be very interested in hearing from FPGA vendors
as to how customers are using their parts. (Great opportunity
to showcase your product!) :-)

--

  Jim Frenzel, Asst. Prof   Electrical Engineering, BEL 213
  208-885-7532              University of Idaho         
  jfrenzel@uidaho.edu       Moscow, ID 83844-1023 USA


Article: 516
Subject: Re: Any Way to Download a XNF to FPGA?
From: brekke@dopey.me.iastate.edu (Monty H. Brekke)
Date: 16 Dec 1994 21:12:47 GMT
Links: << >>  << T >>  << A >>
In article <3cp0ep$914@news.csie.nctu.edu.tw>,
二舍法克 <dyliu@dyliu.dorm2.nctu.edu.tw> wrote:
>
>           hi all:
>
>		as title, any easy way to make the XNF file download!!
>
>--

   Nope. You absolutely must place/route the XNF file and run Makebits.

						--Monty



Article: 517
Subject: Analog FPGA ???
From: mittra@sequent.com (Swapnajit Mittra)
Date: Sat, 17 Dec 94 01:39:57 GMT
Links: << >>  << T >>  << A >>
  A few days back I saw a posting regarding 'analog FPGAs' in this
  newsgroup. Somebody gave a reference to some articles in Electronic
  Design(?). Can someone repost that information?

  Thanks in advance,
  Swapnajit


Article: 518
Subject: Call For Papers ASIC '95
From: rauletta@site.gmu.edu (Richard J. Auletta)
Date: 17 Dec 1994 13:58:00 GMT
Links: << >>  << T >>  << A >>


-------------------------------------------------------------------
       #     #####    ###    #####            ###    #####  #######
      # #   #     #    #    #     #           ###   #     # #
     #   #  #          #    #                  #    #     # #
    #     #  #####     #    #                 #      ###### ######
    #######       #    #    #                             #       #
    #     # #     #    #    #     #                 #     # #     #
    #     #  #####    ###    #####                   #####   #####

                               Eighth Annual
                 APPLICATION SPECIFIC INTEGRATED CIRCUIT
                       Conference and Exhibit 1995

               "Implementing the Information Superhighway 
                       with Emerging Technologies"

                         Stouffer Renaissance Hotel
                              Austin, Texas
                             September 18-22


                 CALL FOR PAPERS, TUTORIALS, & WORKSHOPS 

  The IEEE International ASIC Conference and Exhibit  provides a forum
  for examining current issues related to ASIC applications and system
  implementation, design, test, and design automation.  The conference
  offers a  balance of emphasis on  industry standard  techniques  and 
  evolving research topics. Information is exchanged through workshops,
  tutorials,  and paper presentations.  These promote an understanding
  of the current technical challenges and issues of system integration
  using  programmable logic devices, gate arrays, cell based  ICs, and
  full custom ICs in both digital and analog domains.  
  ____________________________________________________________________

  Technical Papers, Tutorials, and Workshop Proposals are solicited in
  the following areas: 

ASIC Applications:    Wireless Communications, PC/WS and Peripherals,
                      Multimedia, Networking, Image Processing, Data
                      Communications, Storage Technologies, Graphics, 
                      Digital Signal Processing 

Technologies:         Digital, Analog, Mixed Signal, CMOS, BiCMOS, ECL,
                      GaAs 

CAD Tools:            Design Capture, Layout, Test, Synthesis,
                      Modeling, Simulation  

Architectures:        PLDs, Gate Arrays, Cell Based ICs, Full Custom ICs 

Evolving Research:    Research in Methodologies, Tools, Technologies &
                      Architectures  

Design Methodologies: System Design, Top-down, Graphical, HDLs  

Manufacturing:        Process, Testability, Packaging 

Workshops: Four or eight hour technical workshops covering ASIC design
knowledge and skills.  Proposals to form these workshops for either
introductory or advanced levels are invited. ASIC industry as well as
universities are encouraged to submit proposals. Contact the Workshop
Chair.  

______________________________________________________________________

                       INSTRUCTIONS TO AUTHORS 

Authors of papers, tutorials, and workshops are  asked to submit 15
copies of a review package that consists of a  500 word summary and
a title page. The title page should include the technical area from
above, the title,  a 50 word abstract, the authors' names, as well as 
an indication of the primary contact author with a COMPLETE mailing
address,  telephone number and TELEX/FAX/Email.  The summary should
clearly state:   1) title of the paper;  2) the purpose of the work; 
3) the major contributions to the art; and  4) the specific results 
and their significance.  

                           IMPORTANT DATES

            Summaries and Proposals due:      March 3, 1995 
            Notification of Acceptance:      April 14, 1995 
            Final Camera Ready Manuscript due: June 2, 1995 

                        SEND REVIEW PACKAGE TO 

                        Lynne M. Engelbrecht 
                        ASIC Conference Coordinator 
                        1806 Lyell Avenue 
                        Rochester, NY 14606 
                        Phone: (716) 254-2350 
                        Fax: (716) 254-2237 

CONFERENCE INFORMATION 
http://asic.union.edu 
Technical Sessions, Exhibits, Schedule, Registration, 
Hotel Sites, Airline Discounts, Proceedings, and the Advance Program 


CONFERENCE CHAIR	TECHNICAL CHAIR		WORKSHOP CHAIR
William A. Cook		Richard A. Hull 	P. R. Mukund
Eastman Kodak Co.	Xerox Corp.		RIT
Rochester, NY 14650	Webster, NY 14580	Rochester, NY 14623
Phone: (716) 477-5119 	Phone: (716) 422-0281	Phone: (716) 475-2174
Fax: (716) 477-4947 	Fax: (716) 422-9237 	Fax: (716) 475-5845
bcook@kodak.com 	rah.wbst102a@xerox.com 	mukund@cs.rit.edu


EXHIBIT CO-CHAIRS

Kerry Van Iseghem 	Kenneth W. Hsu
LSI Logic Corporation 	RIT
Victor, NY 14564	Rochester, NY 14623
Phone: (716) 233-8820	Phone: (716) 475-2655
Fax: (716) 233-8822	Fax: (716) 475-5041
kerryv@lsil.com		kwheec@ritvax.isc.rit.edu

               Sponsored by the IEEE Rochester Section 
        in cooperation with the Solid State Circuits Council  
	            and the IEEE Austin Section
-------------------------------------------------------------------


Article: 519
Subject: Re: L-Edit and Benchmarks
From: mbutts@netcom.com (Mike Butts)
Date: Sat, 17 Dec 1994 22:16:14 GMT
Links: << >>  << T >>  << A >>
guccione@sparcplug.mcc.com (Steve Guccione) writes:
>But I know at least one person who believes FORTRAN will be necessary
>if these machines are to be accepted by the high performance computing
>community (but I'd rather not even think about it ...).

At least FORTRAN doesn't have pointers...

       --Mike


-- 
Mike Butts, Portland, Oregon   mbutts@netcom.com



Article: 520
Subject: Re: L-Edit and Benchmarks
From: John Forrest <jf@ap.co.umist.ac.uk>
Date: 18 Dec 1994 11:10:36 GMT
Links: << >>  << T >>  << A >>
In article <D0rM2x.MJr@mcc.com> Steve Guccione,
guccione@sparcplug.mcc.com writes:
> [... use of C, VHDL as benchmarking language ...]
>
> A different approach, and one that is gaining popularity, is to
> specify the algorithm in a genera l way, then permit any sort of
> implementation.  This allows true testing of the architecture, rather
> than the cleverness of the optimizer (not that testing optimizers is
> necessarily a bad thing).  I believe the SLALOM benchmark takes this
> approach.

Well, it all depends what one is trying to benchmark:

 Use of a particular language: this is useful if one wants to see how
good a particular language translation system is. There are major
problems in comparing one language with another. Take C and VHDL: C is a
real programming language with which one can get good effective code
(although C++ is better), but it lacks parallel constructs, timing info and
detailed type sizes; while VHDL is more of a hardware language, one which
has major flaws such as poor timing specification (at least in terms of
what is wanted rather than what has been achieved), very limited
concurrent/parallel operations, and which is not very good for writing
software (if one wants to look at hw/sw tradeoffs).

[I ought at this point to indicate that my research area is using FPGAs
to accelerate software by mapping key functions, but the implementations
we produce only make sense when suitably coupled with the microprocessor
and original program.]

 Language independent algorithms: useful if that is what one wants. The
problem here is that I personally doubt one can be completely independent
of the underlying semantics of one's implementation system. The biggest
problems will be in the nature of I/O, how the protocols are defined and
knowledge about data sizes. This will help some systems and not others.
Another problem is that the amount of concurrency available has major
implications for effective algorithmic complexity.

 General problem description: this at least allows people to choose the
best algorithm for their technique - one then compares fastest sorters,
say. The problems should be self-evident.

Much of this goes back to the problem with benchmarks. They are not, nor
are they supposed to be, real world examples. The problem is that most
people will really be interested in the effects on real examples, rather
than benchmarks. The worst case scenario is that the benchmarks will
constrain the tools, and prove as usual not to be typical.
_____________________________________________________________
Dr John Forrest           Tel: +44-161-200-3315
Dept of Computation       Fax: +44-161-200-3321
UMIST                  E-mail: jf@ap.co.umist.ac.uk
MANCHESTER M60 1QD
UK


Article: 521
Subject: PCI HW Engr: $55-65K; Portland, OR; Verilog/Synopsys; 100 M byte/sec.
From: smaki@teleport.com (Shaun:503-614-9627 VoiceMail)
Date: Mon, 19 Dec 1994 00:28:23
Links: << >>  << T >>  << A >>

Requires past success with high-volume, low-cost boards with no jumper wires.
Call for more information, or email smaki@teleport.com with background/contact
information.  Full-time work for the right person.  Stock options.


Article: 522
Subject: Any Way to Download a XNF to FPGA
From: h9219523@ (Chiu See Ming <EEE3>)
Date: Mon, 19 Dec 1994 13:27:55 GMT
Links: << >>  << T >>  << A >>

hello.
	In response to the letter before, I would like to know, for example
with a Xilinx 4000 or 3000: to convert an XNF file and load it into the part,
do I need any software to help me? What is its name, and where can I get it?


Regards,
David.



Article: 523
Subject: Re: Any Way to Download a XNF to FPGA
From: bobe@soul.tv.tek.com (Bob Elkind)
Date: 19 Dec 1994 16:24:32 GMT
Links: << >>  << T >>  << A >>
(Chiu See Ming <EEE3>) writes:
>	In response the the letter before, I would like to know for example
>Xilinx 4000 or 3000, to change XNF and load it into Xilinx, do I need any
>software to help me? What is its name, where can I get?

To make this perhaps a bit more plain, here is an illustration:

Schematics are the FPGA equivalent of source code

   .XNF files are the equivalent of source code that has been
   pre-compiled to assembly language source code

   .LCA files are post-compilation linked assembly language code

   .BIT files are equivalent to compiled binary object code


The analogy isn't bulletproof, but good enough for this discussion.

To get from .XNF files to .BIT files, one needs the "compiler" (at least).
For Xilinx FPGAs, both NeoCad and Xilinx *sell* compilers.  There are no
freeware/shareware Xilinx FPGA compilers.  If there were one, I would run
from it, since anyone trying to "offer" one would obviously be irrational.
The "compiler" is the place and route tool, by and large.

If you want to play this game, you will need/want to buy competent and
well-supported (that means *commercial*, in this case) software.

Bob Elkind, Tektronix TV
bobe@tv.tv.tek.com   (speaking for myself)


Article: 524
Subject: ASIC emulation summary
From: linder@ERC.MsState.Edu (Dan Linder)
Date: 19 Dec 94 09:46:54
Links: << >>  << T >>  << A >>
A couple of weeks ago I asked for info on ASIC emulation from several
groups and promised to provide a summary.  Since all the responses didn't
make it to the different groups, here is a summary of the posting and email
activity starting with my original post.  The material is given in the
order I saw it. Thanks for all of your replies.

Dan

===========================================================================

We're looking into ASIC emulation of the Quickturn variety and are
interested in experiences with Quickturn or any other FPGA-based emulation
systems.  Can you just drop a design on it and start running test vectors
through, or do you still have to do some FPGA-like hardware design?  Does it
scale well to large emulations, say of a complete CPU or even multiple
chips?  We hear that the emulation runs 100 times slower than actual
hardware (which seems a little slow).  Are the FPGAs really that much
slower individually or is it a problem with their combination into a larger
system?

Any insights, experiences and/or references to articles describing
experiences would be greatly appreciated.  I will summarize responses for
the net.  Thanks again.

**********************************************************************
  NSF Engineering Research Center for Computational Field Simulation
**********************************************************************
Daniel H. Linder                               linder@erc.msstate.edu
NSF Engineering Research Center for CFS        (601) 325-2057
P.O. Box 6176                                  fax:  (601) 325-7692
Miss. State, MS  39762
**********************************************************************

===========================================================================

Date: Wed, 30 Nov 94 11:52:33 PST
From: mbutts!mbutts%mbutts@uunet.uu.net ( Mike Butts)
To: linder@erc.msstate.edu
Subject: Re: ASIC emulation (Quickturn, etc.)
Cc: mbutts@uunet.uu.net

Hi, Dan, glad you asked.  Here's the reply which I posted to the 
newsgroups.  You might want to contact one of our local people, 
such as Ken Mason, who's our AE in your part of the world, in
our Cary, NC office (919-380-7178).

> We're looking into ASIC emulation of the Quickturn variety and are
> interested in experiences with Quickturn or any other FPGA-based emulation
> systems.  
>
> Can you just drop a design on it and start running test vectors
> through, or do you still have to do some FPGA-like hardware design? 

Yes you can.  All the FPGA-specific details are completely contained
within the design compiler, which does the technology mapping from the
source netlist and libraries into the FPGAs.  The emulation user sees
the design elements, netnames, etc. in the design's terms.  After running
your vectors, or even without vectors, you can go directly in-circuit.
 
> Does it scale well to large emulations, say of a complete CPU or even 
> multiple chips?  

It scales very well to large emulations.  Intel, Sun, and many other
major developers are emulating entire CPU designs, at 2 million gates and more.
Quickturn's System Emulator M3000 has a 3 million gate capacity, with 
provisions for multi-M3000 systems that allow over 10 million gate emulations
off the shelf.  Most CPU developers and many ASSP and ASIC projects now use
Quickturn emulators to run OSs and applications before tapeout.

> We hear that the emulation runs 100 times slower than actual hardware 
> (which seems a little slow).  Are the FPGAs really that much slower 
> individually or is it a problem with their combination into a larger system?

The programmable interconnect inside and between FPGAs does take more time 
than real metal and wires, because of RC delays in pass transistors and many 
more chip-crossings.  100X slowdown is an upper bound in our experience.  
Most emulations run from 1 to 8 MHz.  That's 3 to 5 orders of magnitude faster 
than cycle-based simulators, which is the difference between running lots of 
real code and just doing vectors or one OS boot.  Multi-million-gate CPU 
emulations are slower than 200K gate ASIC emulations, but the CPU projects find
the speed is plenty for what they do so it all works out.  ASICs typically run 
at multi-MHz in current-generation emulators, and there are many techniques 
for successfully matching the target system's speed to the emulator.

> Any insights, experiences and/or references to articles describing
> experiences would be greatly appreciated.  I will summarize responses for
> the net.  Thanks again.
 
A detailed and quantitative article written by a user is called "Logic Design 
Aids Design Process", by Jim Gateley of Sun, in the July 1994 issue of 
ASIC & EDA.  It's an account of the MicroSPARC II project's experiences with 
the Quickturn Enterprise (previous generation) logic emulator on a 200K 
gate 32-bit SPARC CPU.

"During the 25 days prior to tapeout, the emulated processor and testbed
system successfully executed power-on self tests and open boot PROM, booted
single- and multi-user Solaris, Open Windows, and Open Windows applications.
Altogether, emulation logged 15 bugs and enhancements against MicroSPARC II, 
PROM, and the kernel before tapeout. First silicon was very clean.  
MicroSPARC II shipped three months early."

           --Mike Butts, Emulation Architect, 
                         Quickturn Design Systems (mbutts@qcktrn.com)

===========================================================================

Date: Wed, 30 Nov 94 14:14:47 PST
From: John.Sullivan@Eng.Sun.COM (John J. Sullivan)
To: mbutts@netcom.com, linder@erc.msstate.edu
Subject: Re: ASIC emulation (Quickturn, etc.)
Cc: John.Sullivan@Eng.Sun.COM

In article Fpy@netcom.com,  mbutts@netcom.com (Mike Butts) writes:
> linder@ERC.MsState.Edu (Dan Linder) writes:
>> We're looking into ASIC emulation of the Quickturn variety and are
>> interested in experiences with Quickturn or any other FPGA-based emulation
>> systems.  
>>
>> Can you just drop a design on it and start running test vectors
>> through, or do you still have to do some FPGA-like hardware design? 

> Yes you can.  All the FPGA-specific details are completely contained
> within the design compiler, which does the technology mapping from the
> source netlist and libraries into the FPGAs.  The emulation user sees
> the design elements, netnames, etc. in the design's terms.  After running
> your vectors, or even without vectors, you can go directly in-circuit.

Not to put down Quickturn products, but just a FYI:

Quickturn emulation may indeed be very simple if you are doing a medium
to large ASIC design based purely on a gate-library.  However, you're 
greatly understating the problem for any type of semi-custom or 
full-custom design.  Any kind of memory structure in your design such
as RAMs or register files can be very problematic to model, especially
multi-ported memories.  And all of your custom circuits will have to
be modeled at the primitive gate level (behavioral Verilog or VHDL will
have to be completely re-written.)  For a large design, it can take > 1
day to compile the design to be loaded into the emulation system.

MicroSparc-II was a great experience for Sun in terms of both emulation
and design.  The chip (and its predecessor MicroSparc-I) were both
attempts to have highly automated design flows (synthesis, layout, 
chip assembly.)  I believe this gave them an advantage in getting to
emulation quickly because it forced them to avoid complex structures
that would be hard to map.  (They also sacrificed speed and density,
but MicroSparc-II still gained quite an advantage by being quickly 
ported to a 0.5um technology.) 

Our experiences with two other processors SuperSparc-II and UltraSparc-I
were that it took 3-4 engineers plus a full-time Quickturn FAE on site
approximately 8-9 months to bring up the system to run vectors or do ICE.

Quickturn has been very helpful and responsive to our problems, and
their systems have allowed us to go a long distance toward bug-free
silicon. But, I just want to point out that this does not necessarily
come for free without substantial investment of time and resources on
the user's end.

----------------------------------------------------------------------------
John Sullivan, SparcTech VLSI                  | email: sullivan@eng.sun.com
Sun Microsystems, Sunnyvale CA, Bldg. SUN02    | phone: 408-774-8097

===========================================================================

(anonymous)

In general, you are looking at working with a state of the art CAE
tool, one which has no more than a few hundred installations worldwide.
(If I'm wrong, their sales rep can correct that quickly.)  In addition,
you are talking about a very, very complex simulation job.  My past 
experience with CAE tools is that:

   (1) It takes a certain amount of expertise (measured in full time
       people) to get the thing running at all.  This can sometimes be
       avoided by having the vendor set it up for you (clone a working
       site).

   (2) At that point you can "play" with trivial cases -- things well
       within the envelope of what has been stressed by a lot of 
       different users.  If you're not pushing the state of the art
       in size of simulation, number of test cases, performance, etc
       real work can get done at this point.  You probably will stumble
       over a bug or two, and if you bypassed (1) above you will be 
       clueless how to troubleshoot the bug and completely dependent
       on your vendor, who will have their money already and thus be 
       a bit less responsive than they were before they were paid.

   (3) Then you load the real problem on the system, stretching some 
       limit no one has before, or doing something some way the tool's
       designers never thought of, and (1) you will uncover defects
       in the tool, or (2) the problem no longer fits on the tool you
       bought, or (3) performance falls apart and you have to tune it
       back into usefulness.  This is the point at which those full 
       time, really talented people I mentioned above bail out your
       project by figuring out how to work around the tool's bugs and
       limitations.

BTW, the industry puts up with this because without the tools, the design
or simulation problems simply couldn't be done in our lifetimes.

I once knew an engineer who owned a Porsche 930 Turbo.  He didn't have the
money to pay $5000 or so to have the engine rebuilt every year or so, so
he rebuilt it himself in his garage.  That was the right tradeoff for him.
Most of the rest of us own simpler, less aggressive, easier to drive,
simpler to maintain cars from GM or Ford.

===========================================================================

From: Kenny Chen - MPG SLV <kenchen@pcocd2.intel.com>
Date: Wed, 30 Nov 1994 15:36:05 -0800
To: linder@ERC.MsState.Edu (Dan Linder)
Subject: Re: ASIC emulation (Quickturn, etc.)
Newsgroups: comp.arch
X-Newsreader: TIN [version 1.2 PL2]

> (My original post was included here.)

Dan,

	1. You just need to synthesize your design into Quickturn's library,
	   which maps onto the Xilinx FPGAs.  You don't need to
	   do FPGA-like hardware design.  (Unless you want to.)  Although
	   there's a mode where you can use it as a tester for vectors, in
	   general it can do better than that.  AMD used QT emulation for K5
	   and booted dos/windows on a PC.

	2. Depends on what you mean by "scalable".

	3. Yes, you can put the whole CPU into it, as long as it fits. :)  If
	   your design gets too big, their SW can partition it into several
	   boxes ($$$!).  For ASICs you should be able to fit several chips
	   into one box.  Don't trust their sales quote.

	4. It's slow, so be prepared for that.  But it's a hundred times faster
	   than simulation.  It's due to the interconnect and backplane routing.

--
 -Kenny Chen

===========================================================================

Newsgroups: comp.arch.fpga
From: dej@eecg.toronto.edu (David Jones)
Subject: Re: ASIC emulation (Quickturn, etc.)
Nntp-Posting-Host: ziffs.eecg.toronto.edu
Organization: Department of Computer Engineering, University of Toronto
Date: 30 Nov 94 22:03:31 GMT

In article <mbuttsD03IM1.G8C@netcom.com>, Mike Butts <mbutts@netcom.com> wrote:
>real code and just doing vectors or one OS boot.  Multi-million-gate CPU 
>emulations are slower than 200K gate ASIC emulations, but the CPU projects find
>the speed is plenty for what they do so it all works out.  ASICs typically run 

Interesting question: What is the slowest "acceptable" speed for logic
emulation of a CPU?

In particular, would 1/256 full-speed be acceptable?

===========================================================================

From: rwieler@ee.umanitoba.ca (wieler)
Newsgroups: comp.arch.fpga
Subject: Re: ASIC emulation (Quickturn, etc.)
Date: 30 Nov 1994 23:38:21 GMT
Organization: Elect & Comp Engineering, U of Manitoba, Winnipeg, Manitoba,Canada
Distribution: world
Reply-To: rwieler@ee.umanitoba.ca
NNTP-Posting-Host: wine.ee.umanitoba.ca



> (My original post was included here.)

No experiences except on our homegrown system; however, you should not be
surprised at system-level or large emulations running 100 times slower.
Remember that CPU bus speeds (internal) are now running at 100+ MHz; think
of the wire length busses have to run through on a board that will fit
such a design for emulation.  There is no way you will get anywhere near
that speed.  However, a drop in speed of only 100 is small potatoes when
you think of the drop in speed when simulating.  Good luck.

Richard 
Dept of Electrical and Computer Eng.
University of Manitoba

===========================================================================

Date: Thu, 1 Dec 94 11:27:52 PST
From: mbutts!mbutts%mbutts@uunet.uu.net ( Mike Butts)
To: John.Sullivan@Eng.Sun.COM, linder@erc.msstate.edu
Subject: Re: ASIC emulation (Quickturn, etc.)
Cc: mbutts@uunet.uu.net

Thanks for your comments, John.  No question that a big full-custom
design emulation can be a big effort.  ASIC designs like the ones Dan
was asking about go quite a bit easier, especially if their 
clocking isn't too exotic.

We've gotten a lot more capable in our memory modeling lately.
The System Realizer includes a menu-driven memory compiler
which can generate memories for XC4013 CLB implementation in the 
Logic Modules, or in the bigger or more heavily multiported cases,
in the Core Memory Module hardware.

No question that full-custom designs can raise modeling issues
which don't come up in the ASIC world because we've already done the 
libraries.

I'm very glad that we've been able to help with Sun projects.  The
System Realizers reflect much of the experience we gained in 
working with you folks and everyone else, and I believe they are 
another big step forward towards our ultimate goal of making emulation
as easy to use as simulation.  Certainly it remains a complex 
and evolving technology.

Thanks!

       --Mike  (mbutts@qcktrn.com)

----- Begin Included Message -----

(John Sullivan's post above was included here.)
 
----- End Included Message -----

===========================================================================

Date: Thu, 1 Dec 94 18:24:25 PST
From: weedk@pogo.WV.TEK.COM (Kirk A Weedman)
To: linder@ERC.MsState.Edu
Subject: Re: ASIC emulation (Quickturn, etc.)
Newsgroups: comp.arch.fpga
In-Reply-To: <LINDER.94Nov30111210@gemini.ERC.MsState.Edu>
Organization: Tektronix, Inc., Wilsonville,  OR.

> (My original post was included here.)

I too am curious about their tools.  What tools are you currently using?
I've been using Cadence Concept for schematic capture or a tool called CIRGEN
that automatically generates Concept schematics from equations (can use just
about any vendor library). Next I create an EDIF netlist and feed that into
ALTERA tools along with a mapping file.  So far I like the Altera tools and
parts - one design for use in a 33Mhz processor application - but am looking
at other vendors too. Anyway, let me know what you hear about their tools.

        Kirk    weedk@pogo.wv.tek.com

===========================================================================

From: Paul Micheletti <pm1@sparc.SanDiegoCA.NCR.COM>
Subject: Quickturn (fwd)
To: linder@ERC.MsState.Edu
Date: Mon, 5 Dec 1994 12:00:53 -0800 (PST)
Cc: pm1@sparc.SanDiegoCA.NCR.COM
X-Mailer: ELM [version 2.4 PL20]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3200      

>  (My original post was included here.)

We just finished evaluating our newly purchased Quickturn MARS 
system, and found the task of emulating an ASIC to be non-trivial, 
but worthwhile.  We took an already designed and tested ASIC
off of a known good PC board and replaced it with a Quickturn
emulation model of this ASIC.  This ASIC was approximately 70K gates,
so it easily fit into a single Logic Block Module (LBM).

The biggest problems we encountered were:
   1) timing problems induced by using their RAM macros for
      our RAM blocks.
   2) the learning curve for operating new software.

I never had to know what parts of the design were placed into which
FPGA, because the software adequately hides the need for this info
from the user.  The design is automatically partitioned into 
multiple FPGAs, and the place and route for these FPGAs was performed
using a neat tool that spawns jobs off to multiple machines, which
is needed when performing >200 FPGA compiles.

Our ASIC test vectors ran against this model at 1MHz which is 1/20th
of the 20 MHz ASIC clock rate.  When we performed the actual emulation
on our PC board, we were able to run the clock at 1.75MHz. This is 
just under 10% of the real ASIC worst case clock rate.

If we used a larger ASIC for this test, our observed clock frequency 
would have been lower because of the added time required when connecting 
multiple LBM modules.  I don't know the exact degradation for this since 
we haven't tried using multiple LBMs yet.

-- Paul Micheletti
-- AT&T Global Information Solutions
-- email: paul.micheletti@sandiegoca.ncr.com






