Messages from 38325

Article: 38325
Subject: APEX-II vs VIRTEX-II
From: spholroyd@iee.org (Steve Holroyd)
Date: 11 Jan 2002 10:39:21 -0800

I am currently tasked with recommending the largest, fastest FPGA with
the most memory that will be readily available in the first half of this
year for an FPGA Array Card.

The choices have been narrowed down to two families: Altera's APEX-II
(EP2A70) and Xilinx's Virtex-II (XC2V6000).

Which can operate at the highest speed?

Steve

Article: 38326
Subject: How to constrain the inputs of a multi-level parity generator and
From: Kevin Brace <nospamtomekevinbraceusenet@nospamtomehotmail.com>
Date: Fri, 11 Jan 2002 12:58:39 -0600

        Hi, I am having problems with trying to constrain the inputs
going into a multi-level parity generator in XST Verilog.
Here I am trying to generate a parity of 36 inputs for my PCI IP core,
and, of course, Xilinx and Altera FPGAs are 4-input LUT-based, so the
input signals go through multiple levels of LUTs to calculate the
parity.
In the first level, the parity generator uses 9 LUTs to calculate
parity.
In the second level, 2 LUTs take in 8 of the 9 outputs of the first
level LUTs, and the remaining one output from the first level LUT will
be used at the third level.
At the third and final level, 2 outputs from the second-level LUTs and one
output from the first-level LUTs are used to calculate the final parity
result.
Here is the partial Verilog code for the top module, where I instantiate
the parity generator, and for the parity generator itself.


___________________________ Top Module _____________________________



Parity_Generator Parity_Generator_Generator_Instance(
				.clk(clk),
				.Parity_Input({c_be_n[3:0], ad_Port[31:0]}),
				.XORed_Result(Parity_Generated)
				);

____________________________________________________________________




__________________________ Parity Generator ________________________

module Parity_Generator(
						clk,
						Parity_Input,
						XORed_Result
						);

input		clk;
input[35:0]	Parity_Input;
output		XORed_Result;



reg			XORed_Result;

wire[8:0]	First_Intermediate_Parity;
wire[2:0]	Second_Intermediate_Parity;
wire		Final_Parity;


// First level
assign		First_Intermediate_Parity[8]	= Parity_Input[35] ^
Parity_Input[34] ^ Parity_Input[33] ^ Parity_Input[32];
assign		First_Intermediate_Parity[7]	= Parity_Input[31] ^
Parity_Input[30] ^ Parity_Input[29] ^ Parity_Input[28];
assign		First_Intermediate_Parity[6]	= Parity_Input[27] ^
Parity_Input[26] ^ Parity_Input[25] ^ Parity_Input[24];
assign		First_Intermediate_Parity[5]	= Parity_Input[23] ^
Parity_Input[22] ^ Parity_Input[21] ^ Parity_Input[20];
assign		First_Intermediate_Parity[4]	= Parity_Input[19] ^
Parity_Input[18] ^ Parity_Input[17] ^ Parity_Input[16];
assign		First_Intermediate_Parity[3]	= Parity_Input[15] ^
Parity_Input[14] ^ Parity_Input[13] ^ Parity_Input[12];
assign		First_Intermediate_Parity[2]	= Parity_Input[11] ^
Parity_Input[10] ^ Parity_Input[ 9] ^ Parity_Input[ 8];
assign		First_Intermediate_Parity[1]	= Parity_Input[ 7] ^
Parity_Input[ 6] ^ Parity_Input[ 5] ^ Parity_Input[ 4];
assign		First_Intermediate_Parity[0]	= Parity_Input[ 3] ^
Parity_Input[ 2] ^ Parity_Input[ 1] ^ Parity_Input[ 0];


// Second level
assign		Second_Intermediate_Parity[2]	= First_Intermediate_Parity[8];
assign		Second_Intermediate_Parity[1]	= First_Intermediate_Parity[7] ^
First_Intermediate_Parity[6] ^ First_Intermediate_Parity[5] ^
First_Intermediate_Parity[4];
assign		Second_Intermediate_Parity[0]	= First_Intermediate_Parity[3] ^
First_Intermediate_Parity[2] ^ First_Intermediate_Parity[1] ^
First_Intermediate_Parity[0];


// Final level
assign		Final_Parity	= Second_Intermediate_Parity[2] ^
Second_Intermediate_Parity[1] ^ Second_Intermediate_Parity[0];




always @ (posedge clk) begin

	XORed_Result	<= Final_Parity;


end



endmodule
____________________________________________________________________





        From what I see, the c_be_n[3:0] should go through,

assign		First_Intermediate_Parity[8]	= Parity_Input[35] ^
Parity_Input[34] ^ Parity_Input[33] ^ Parity_Input[32];


But the problem I have here is that when I synthesize the code, XST
Verilog (ISE WebPACK's synthesis tool) or Xilinx MAP somehow
automatically chooses which inputs go into which LUTs, and I have a
problem with that.
I want "c_be_n[3:0]," which is an unregistered bus signal of the PCI bus, to
go through as few LUTs as possible to reduce setup time requirements.
For "ad_Port[31:0]," that signal comes from inside of the chip (from
DFFs), so I don't have to worry too much about how many levels of LUTs
it passes through.
I tried disabling (unchecking) XST's option called XOR collapsing, but
it didn't seem to make any difference.
I recently upgraded to the latest ISE WebPACK 4.1WP2.0 from 4.1WP0.0,
but that didn't seem to make any difference, either.
For MAP, setting the "Map to Inputs" option to 4 or 5 didn't seem to make
a difference.
        I first noticed this problem when I synthesized my PCI IP core
trying to meet 66MHz PCI timings (Tsu < 3ns, Tval(Tco) < 6ns) just out of
curiosity.
In 33MHz PCI, this whole issue of which signals go through how many LUTs
for calculating parity was not a big issue because Tsu only has to be <
7ns.
        I found someone else discussing a better way of calculating
36-bit parity than the method shown above for Virtex architecture
devices, so I modified my code to take advantage of that idea.
Here is the new partial Verilog code for the top module, where I
instantiate the parity generator, and for the parity generator itself.


___________________________ Top Module _____________________________

Parity_Generator Parity_Generator_Instance(
				.clk(clk),
				.Fast_Path_Parity_Input(c_be_n[3:0]),
				.Parity_Input_1(ad_Port[3:0]),
				.Parity_Input_2(ad_Port[7:4]),
				.Parity_Input_3(ad_Port[11:8]),
				.Parity_Input_4(ad_Port[15:12]),
				.Parity_Input_5(ad_Port[19:16]),
				.Parity_Input_6(ad_Port[23:20]),
				.Parity_Input_7(ad_Port[27:24]),
				.Parity_Input_8(ad_Port[31:28]),
				.XORed_Result(Parity_Generated)
				);

____________________________________________________________________



__________________________ Parity Generator ________________________

module Parity_Generator(
						clk,
						Fast_Path_Parity_Input,
						Parity_Input_1,
						Parity_Input_2,
						Parity_Input_3,
						Parity_Input_4,
						Parity_Input_5,
						Parity_Input_6,
						Parity_Input_7,
						Parity_Input_8,
						XORed_Result
						);

input		clk;
input[3:0]	Fast_Path_Parity_Input;
input[3:0]	Parity_Input_1;
input[3:0]	Parity_Input_2;
input[3:0]	Parity_Input_3;
input[3:0]	Parity_Input_4;
input[3:0]	Parity_Input_5;
input[3:0]	Parity_Input_6;
input[3:0]	Parity_Input_7;
input[3:0]	Parity_Input_8;
output		XORed_Result;




reg			XORed_Result;

wire[7:0]	First_Intermediate_Parity;
wire[1:0]	Second_Intermediate_Parity;
wire		Third_Intermediate_Parity;
wire		Final_Parity;


// First level
assign		First_Intermediate_Parity[7]	= Parity_Input_1[3] ^
Parity_Input_1[2] ^ Parity_Input_1[1] ^ Parity_Input_1[0];
assign		First_Intermediate_Parity[6]	= Parity_Input_2[3] ^
Parity_Input_2[2] ^ Parity_Input_2[1] ^ Parity_Input_2[0];
assign		First_Intermediate_Parity[5]	= Parity_Input_3[3] ^
Parity_Input_3[2] ^ Parity_Input_3[1] ^ Parity_Input_3[0];
assign		First_Intermediate_Parity[4]	= Parity_Input_4[3] ^
Parity_Input_4[2] ^ Parity_Input_4[1] ^ Parity_Input_4[0];
assign		First_Intermediate_Parity[3]	= Parity_Input_5[3] ^
Parity_Input_5[2] ^ Parity_Input_5[1] ^ Parity_Input_5[0];
assign		First_Intermediate_Parity[2]	= Parity_Input_6[3] ^
Parity_Input_6[2] ^ Parity_Input_6[1] ^ Parity_Input_6[0];
assign		First_Intermediate_Parity[1]	= Parity_Input_7[3] ^
Parity_Input_7[2] ^ Parity_Input_7[1] ^ Parity_Input_7[0];
assign		First_Intermediate_Parity[0]	= Parity_Input_8[3] ^
Parity_Input_8[2] ^ Parity_Input_8[1] ^ Parity_Input_8[0];


// Second level
assign		Second_Intermediate_Parity[1]	= First_Intermediate_Parity[7] ^
First_Intermediate_Parity[6] ^ First_Intermediate_Parity[5] ^
First_Intermediate_Parity[4];
assign		Second_Intermediate_Parity[0]	= First_Intermediate_Parity[3] ^
First_Intermediate_Parity[2] ^ First_Intermediate_Parity[1] ^
First_Intermediate_Parity[0];


// Third level
assign		Third_Intermediate_Parity		= Second_Intermediate_Parity[1] ^
Second_Intermediate_Parity[0];


// Final level
assign		Final_Parity	= Fast_Path_Parity_Input[3] ^
Fast_Path_Parity_Input[2] ^ Fast_Path_Parity_Input[1] ^
Fast_Path_Parity_Input[0] ^ Third_Intermediate_Parity;




always @ (posedge clk) begin

	XORed_Result	<= Final_Parity;


end



endmodule

____________________________________________________________________



        In the above shown code, "ad_Port[31:0]" has to go through 4
levels of LUTs, but like the previous version, that signal comes from
inside of the chip (from DFFs), so I don't have to worry too much about
how many levels of LUTs it passes through.
The nice part of this method is that "c_be_n[3:0]" only has to go
through 1 level of LUT.
Yes, a 5-input LUT's gate delay is larger than a 4-input LUT's gate
delay, but the 5-input LUT's gate delay is far better than two 4-input
LUTs connected in series with the routing delay between two 4-input
LUTs.
In theory XST should use Virtex architecture's 5-input LUT, but when I
synthesize this code with the same XOR Collapse option disabled, XST
still seems to collapse the XOR structure of the HDL code, and
synthesizes with 3 levels of LUTs using only 4-input LUTs.
How can I work around this problem and instruct XST not to collapse the
XOR gate structure?
The "XOR Collapse" option I unchecked was through Project Navigator ->
Processes for Current Source -> Synthesize -> Properties -> HDL Options.
I am absolutely willing to use an XST synthesis constraint file, which I
already use to constrain the fanout of individual input signals, if that
is possible, but I don't want to insert vendor-specific synthesis
directives into my code because my PCI IP code will have to be
synthesizable with other synthesis tools.
I am using ISE WebPACK 4.1WP2.0 to develop my PCI IP core, and the
device I am currently targeting is Xilinx Spartan-II XC2S150-5CPQ208 or
6CPQ208.




Thanks,



Kevin Brace (Don't respond to me directly, respond within the
newsgroup.)

Article: 38327
Subject: Re: Avoid routing through a certain area (Xilinx)
From: Bret Wade <bret.wade@xilinx.com>
Date: Fri, 11 Jan 2002 12:07:40 -0700



rickman wrote:

>
> > After rethinking this I realized that there is no need to remove the macro using
> > JBits. Simply delete the macro in FPGA Editor after place and route.
>
> Bret, are you saying that a macro like this can be treated as a single
> object and removed with a single command in the FPGA Editor?

Yes, set the list window to "all macros", pick the macro from the list and delete.

> >
> > Regarding automatic macro creation, since any .ncd file can be converted to an
> > .nmc, it would be possible to use JBits to create an .ncd and then convert the
> > .ncd to an .nmc using FPGA Editor (File-->Save as macro). Since no external pins
> > would be needed for an interface, the only remaining operation is to set a
> > reference component (Select component, Edit-->Set Macro Reference Comp).
>
> What if external pins are needed? In my application, I will be defining
> five blocks of logic all connected to a common bus and to external pins.
> One block would have a lot of IO. The remaining four blocks are all
> equivalent with only about 30 IOs. These four blocks are each loaded
> with a design for 1 of N possible interfaces that match the HW that is
> plugged into the board.

I was using the term "external pins" in the context of macro creation. These are
physical component pins (usually slice pins) that are defined as ports in/out of the
macro and given names matching the ports in the corresponding symbol in the logical
design. What I meant was that no external pins are needed in the dummy macro since it
does not need to interface to a real design. I wasn't making any statement regarding I/O
components which can indeed be part of the macro.

Bret

>
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 38328
Subject: Re: Q:Hand placed fast 32 bit barrel shifter for APEX?
From: Ray Andraka <ray@andraka.com>
Date: Fri, 11 Jan 2002 19:37:17 GMT

Your shifter is apparently not being constructed as a merged tree.  If it were a merged
tree, a 32 bit shift would be 5 layers, each layer being a 32 bit 2:1 mux.  The first
layer shifts by either 0 or 16 bits, the next by 0 or 8 bits, the next by 0 or 4 and so
on.  With this construction you get a composite shift of 0 to 31 bits.  Each layer
consists of 32 2:1 muxes, each of which occupies an LE.  5 Layers is 160 LEs.  I suspect
the shifter you are dealing with is instead a set of 32 32:1 muxes.  Each of those is a
tree containing the logic equivalent of 31 2:1 muxes, or about 6 times the logic.  I
further suspect that the logic has been reduced using the cascade chains to make 4:1 muxes
from pairs of LE's, resulting in approximately 3 times the LE's of the merged tree.
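
As a rough illustration of the merged-tree construction described above, here is a minimal Verilog sketch of a 32 bit rotate-left shifter (rotate-left is assumed because that is what ssy says his shifter is).  The module and signal names are hypothetical, and a synthesizer is free to restructure the logic, so this code alone does not guarantee the 160 LE figure.

```verilog
// Hypothetical merged-tree 32-bit rotate-left shifter.
// Five layers, each a 32-bit-wide 2:1 mux (32 muxes per layer, 160 in total).
module Barrel_Rotate_Left_32(
						Data_In,
						Shift_Amount,
						Data_Out
						);

input[31:0]	Data_In;
input[4:0]	Shift_Amount;
output[31:0]	Data_Out;

wire[31:0]	Layer_16;
wire[31:0]	Layer_8;
wire[31:0]	Layer_4;
wire[31:0]	Layer_2;

// Layer 1: rotate by 0 or 16 bits
assign		Layer_16	= Shift_Amount[4] ? {Data_In[15:0],  Data_In[31:16]}  : Data_In;
// Layer 2: rotate by 0 or 8 bits
assign		Layer_8		= Shift_Amount[3] ? {Layer_16[23:0], Layer_16[31:24]} : Layer_16;
// Layer 3: rotate by 0 or 4 bits
assign		Layer_4		= Shift_Amount[2] ? {Layer_8[27:0],  Layer_8[31:28]}  : Layer_8;
// Layer 4: rotate by 0 or 2 bits
assign		Layer_2		= Shift_Amount[1] ? {Layer_4[29:0],  Layer_4[31:30]}  : Layer_4;
// Layer 5: rotate by 0 or 1 bit
assign		Data_Out	= Shift_Amount[0] ? {Layer_2[30:0],  Layer_2[31]}     : Layer_2;

endmodule
```

The composite rotation is the sum of the per-layer amounts, so Shift_Amount = 5'b10011 rotates by 16 + 2 + 1 = 19 bits.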

Routing usually is not part of the macro; however, if it is a placed macro, then the
routing can be more or less forced onto certain paths.  You have to know what the
connection matrix is, however, if you are going to congest the routing.  In that respect,
the Xilinx devices are easier to tweak for performance because you have access to the
routing matrix when doing the floorplanning, and routes are generally more local so there
is less chance of having other stuff upset the routing in your floorplanned logic.

ssy wrote:

> Hi Ray
>
> Thanks for your help first
>
> but I have some different idea from you
>
> Every time before I compile in Quartus, I assign the shifter to a
> custom region that holds three MegaLABs. It actually took almost every
> LE in those 3 MegaLABs (about 450 LEs). This is my first question: you say
> it occupies only 160 LEs, so how do I achieve this? BTW, my shifter is a
> 32 bit rotate left shifter.
>
> Also, every compilation gets a different routing result, so I think the
> place and route information is not contained in the library from Altera;
> the P&R of the shifter is performed on the fly with the other logic of the
> design. Is that right?
>
> hope for further help from you
>
> Ray Andraka <ray@andraka.com> wrote in message news:<3C3E8F95.C4CD51B2@andraka.com>...
> > A 32 bit barrel shift with 32 bits in and out should occupy 160 LEs.
> > Since the proper construction does not use the cascade/carry chains, it
> > can be laid out with 2 bits per LAB, so that it takes up 16 LABs.
> > Depending on how they are laid out, There may not be enough row routes
> > to squeeze it all into a single megalab.  Unfortunately, Altera does not
> > provide information on the row route connections available at each LAB
> > (it is a sparse connection matrix), so doing hand placement can actually
> > hurt performance and density by forcing wires to go through an
> > intermediate lab to make connections.  That said, the routing time is
> > fairly uniform at each level of hierarchy in Altera, so you may find
> > that you get little additional performance trying to do the placement
> > yourself.
> >
> > ssy wrote:
> >
> > > Hi everyone
> > >
> > > I am looking for a fast 32 bit barrel shifter for the APEX20K400E. I use
> > > the LPM from Altera, but after P&R, I found that it occupies three MegaLABs,
> > > with many wires running between them.
> > >
> > > So I wonder whether somebody has hand placed the shifter?
> >
> > --
> > --Ray Andraka, P.E.
> > President, the Andraka Consulting Group, Inc.
> > 401/884-7930     Fax 401/884-7950
> > email ray@andraka.com
> > http://www.andraka.com
> >
> >  "They that give up essential liberty to obtain a little
> >   temporary safety deserve neither liberty nor safety."
> >                                           -Benjamin Franklin, 1759

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 38329
Subject: Re: FPGA Synthesis and implementation
From: "H.L" <alphaboran@yahoo.com>
Date: Fri, 11 Jan 2002 11:45:53 -0800

Thanks so much, you are a great help.

"Ray Andraka" <ray@andraka.com> wrote in message
news:3C3DED27.69F8054C@andraka.com...
> If your bus mux is connected directly from the BRAMs and you only have latency
> of 1, then that is a problem.  If you look at the Tcko of the BRAMs, you'll find
> that it is pretty long compared to other delays in the chip.  Add to that the
> fact that routing around the BRAMs is usually pretty congested (especially if
> you are using the BRAMs as 32 bits wide), plus your routing through several
> layers of logic before hitting a flip-flop.   You need to put an additional
> pipeline register between the BRAM and your bus mux.  You will also have to
> floorplan those flip-flops to be immediately adjacent to the BRAM.  With 32 bits
> wide, You are going to have to avoid using the BRAMs on the edges of the chips,
> as it gets too congested there to put the registers close enough to the BRAMs.
> Even then, you may find that you need to use narrower data paths to the BRAMs to
> get enough fast connections from the BRAM.  If you are using the BRAM ENA or WE,
> be aware that those inputs have a long set up time too, so the driving flip-flop
> has to be right there.  Those inputs are actually even more critical than the
> data read path.  In high performance stuff, I try to keep those inputs wired to
> constants and control the read/write address counters instead.
>
> "H.L" wrote:
>
> > "Ray Andraka" <ray@andraka.com> wrote in message
> > news:3C3DB8F1.48F10CD1@andraka.com...
> > > 155 MHz is not hard to achieve in a VirtexE (any speed grade), but you do have
> > > to be careful about how the design is implemented, particularly making sure that
> > > you don't have lots of levels of logic.
> >
> > In the timing errors the max levels of logic is 6, is this good?
> >
> > > You will likely need to do some
> > > floorplanning to get the speed, especially when reading from the BRAMs.  If you
> > > are accessing the BRAMs at 155 MHz, you will need registers immediately adjacent
> > > to the BRAM with no LUTs between the BRAM and the registers, and you will have
> > > to floorplan those to place them there.  Depending on how wide the BRAMs are,
> > > you may not be able to read them at 155MHz in a -6 or if 16 bits wide, even a -7
> > > part.  One solution may be to run the BRAM at half the clock rate and read/write
> > > two locations per clock by using a set of staging registers.
> >
> > I use 3 BRAMS (128x32) at 155MHz , I have 3 modules that access them so I
> > use BUS MUXs for the memory arbitration (the BUS MUXs is LUT based and
> > registered with latency 1), all 3 modules use  a fsm to read and write to
> > the BRAMs. Do you think that the logic is total wrong? In the functional
> > simulation all seem well  (if that counts :)) )
> > In the timing errors I get that the total delay is mostly owing to route
> > (70% route - 30% logic), do you think that with floorplanning I will be able
> > to decrease the delays?
> >
> > Thank you very much for the help , I am new into these :)
> >
> > >
> > > The first step, of course, is to look at the timing report to see where your
> > > design is not meeting the timing.  Once you do that, you'll know where you need
> > > to focus your attention.
> > >
> > >
> > >
> > > "H.L" wrote:
> > >
> > > > Hello all,
> > > >
> > > > I have to program a Virtex-E FPGA at 155MHz. For this purpose I use 8 vhdl
> > > > entities,a MUX BUS and a Block RAM from the CORE GENERATOR (I use XILINX ISE
> > > > 4.1 with SP2). I use for synthesis FPGA EXPRESS 3.6.1 , so I create a fpga
> > > > express project where I add the vhdl sources and the 2 edn files (the one
> > > > for the mux bus and the one for the block ram), is this the correct
> > > > procedure? I manage to export the netlist for my design but in the PAR
> > > > process I get too many timing errors!!!
> > > >
> > > > Thanks a lot
> > >
> > > --
> > > --Ray Andraka, P.E.
> > > President, the Andraka Consulting Group, Inc.
> > > 401/884-7930     Fax 401/884-7950
> > > email ray@andraka.com
> > > http://www.andraka.com
> > >
> > >  "They that give up essential liberty to obtain a little
> > >   temporary safety deserve neither liberty nor safety."
> > >                                           -Benjamin Franklin, 1759
> > >
> > >
>
> --
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930     Fax 401/884-7950
> email ray@andraka.com
> http://www.andraka.com
>
>  "They that give up essential liberty to obtain a little
>   temporary safety deserve neither liberty nor safety."
>                                           -Benjamin Franklin, 1759
>
>
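
The advice quoted above (an extra pipeline register between each BRAM and the bus mux) could look roughly like the sketch below.  It is written in Verilog only for consistency with the other code in this archive; the original design is VHDL.  The module and signal names are hypothetical, and the BRAMs themselves are assumed to be instantiated elsewhere (for example from CORE Generator), with only their data outputs arriving here.

```verilog
// Hypothetical read path: one register stage directly after each BRAM
// data output, followed by a registered 3:1 bus mux.
module Bram_Read_Path(
						clk,
						Read_Select,
						Bram_0_Data,
						Bram_1_Data,
						Bram_2_Data,
						Read_Data
						);

input		clk;
input[1:0]	Read_Select;
input[31:0]	Bram_0_Data;
input[31:0]	Bram_1_Data;
input[31:0]	Bram_2_Data;
output[31:0]	Read_Data;

reg[31:0]	Bram_0_Reg;
reg[31:0]	Bram_1_Reg;
reg[31:0]	Bram_2_Reg;
reg[1:0]	Read_Select_Reg;
reg[31:0]	Read_Data;

always @ (posedge clk) begin

	// Stage 1: capture the BRAM outputs with no LUTs in between
	Bram_0_Reg		<= Bram_0_Data;
	Bram_1_Reg		<= Bram_1_Data;
	Bram_2_Reg		<= Bram_2_Data;
	// Delay the select by one clock so it stays aligned with the data
	Read_Select_Reg	<= Read_Select;

	// Stage 2: registered bus mux
	case (Read_Select_Reg)
		2'd0:		Read_Data	<= Bram_0_Reg;
		2'd1:		Read_Data	<= Bram_1_Reg;
		default:	Read_Data	<= Bram_2_Reg;
	endcase

end

endmodule
```

This adds one clock of read latency compared with a mux fed directly from the BRAMs, so the state machines reading the data would have to account for it.  The code does not place the stage 1 registers next to the BRAMs; that still has to be done with floorplanning constraints.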

Article: 38330
Subject: Re: Picking an FPGA
From: Austin Lesea <austin.lesea@xilinx.com>
Date: Fri, 11 Jan 2002 11:58:40 -0800

Rick,

I have responded before on the power on current issue, but for the group, I
will respond here again.

For existing designs in Virtex, Virtex E, Spartan II, and Spartan IIE,
consult the datasheets, and the app notes:

 http://www.support.xilinx.com/xapp/xapp450.pdf

 http://www.support.xilinx.com/xapp/xapp451.pdf

These demonstrate ways to start up with as little as 80 mA at -40 C with the
industrial grade parts by adding three components that cost pennies.

In Virtex II, there is no startup current at all on any of the three
supplies.  The startup current equals the operational current, and there is
no extra current to be supplied.  If all you can supply is the operational
currents, it powers on cleanly and configures.

4KXLA, and Spartan XL also have no startup current.

Austin



rickman wrote:

> I am finalizing my FPGA selection for a line of DSP boards that we will
> be making for a number of years. I have always been more familiar with
> Xilinx but had a chance to work with the Altera 10K parts this past
> year. They seem ok, but the nearly identical ACEX 1K family is better in
> most regards. But the gate size is limited if we are looking at having
> future growth and I am not finding as good a price as with the Xilinx
> SpartanII parts. The only vendors I can find are Arrow and Newark, and
> Newark does not show much on their web site. Anyone know how to get good
> price numbers on the Altera parts without having a handful of specific
> parts? If I call the vendors, they always want me to give them a few part
> numbers and I am window shopping and need pricing on all the parts so I
> can make my choices.
>
> The other problem I have with the Altera FPGAs is the lack of LUT RAM.
> There are only a small number of RAM (EAB/ESB) in these parts and I need
> a lot more blocks of it than are available. They don't have to be big,
> the 16 words available in a bank of Xilinx LUTs is perfect. For example,
> I will need 64 blocks of RAM if four modules of the 8 channel ADC/DAC
> are on board. This is not hard using the Xilinx LUTs. Anyone know of a
> way to do something similar in an Altera FPGA?
>
> BTW, just to mention why I like the Altera parts... THEY DON'T HAVE A
> STARTUP CURRENT SURGE!!! Was I at all unclear about that?  :)
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 38331
Subject: Re: Picking an FPGA
From: Ray Andraka <ray@andraka.com>
Date: Fri, 11 Jan 2002 20:17:00 GMT

Rick,

I think you are intending to use these in a signal processing (read
arithmetic and filtering) application.  If that is the case, be very
careful.  The SpartanII/Virtex offer more advantages for DSP than just
having distributed RAM.  If you look at the Altera carry chain, it breaks
the 4LUTs into a pair of 3LUTs, one for the carry and one for the 'sum'.
One input to those 3LUTs is your carry from the adjacent bit.  That means,
at best, you get a two input arithmetic function in each level of logic.
Things like adder/subtractors wind up using two or more times the LUTs as
the equivalent function in Xilinx.  Also, the carry chain runs through the
LAB, so your data flow has to run from LAB to LAB.  In the 10K and I believe
(correct me if I am wrong) in the ACEX families, there are no direct
connections between LABs, so the data path has to go onto the row routing.
There are 6 row routes for every 8 LE's, so in a heavily arithmetic design
you run out of row routes when the row gets 3/4 full.  It is actually worse
than that: since the row routing connections are a sparse matrix, you need
to route through an intermediate LAB if there is not a direct connection
between your source and destination.  As the row fills up, the number of no
connects goes up sharply, accelerating the saturation.  In a heavily
arithmetic design, you hit a pretty hard limit at about 50% device
utilization because of this.  The 20K family greatly improves the situation
by the addition of direct connects between adjacent LABs.  Also, don't
discount the utility of the SRL16's in Xilinx.  Not only do those make very
compact delay queues, which are extensively used in filters, but they also
give you a way to reload LUT contents without having to reconfigure the
device.  This is valuable for DA filters, since the coefficients are stored
as partial sums in LUTs.  In Altera, a reprogrammable filter is much harder
to build because the LUTs don't help you there.  The SRL16's are also great
for doing small reorder queues.  Reordering comes up frequently in signal
processing in such operations as FFTs, channel multiplexing etc.
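
For readers who have not used them, the delay queue mentioned above can be described behaviorally as a 16-deep shift register with a selectable tap.  The sketch below uses hypothetical names.  It leaves out any reset on purpose, since a reset on the shift register generally prevents a LUT-based implementation; whether a given synthesis tool actually maps this onto an SRL16 is an assumption to check in its report.

```verilog
// Hypothetical 1-bit delay queue, 1 to 16 clocks deep.
// Written without a reset so that it can fit a LUT-based shift register.
module Delay_Queue_16(
						clk,
						Data_In,
						Tap_Select,
						Data_Out
						);

input		clk;
input		Data_In;
input[3:0]	Tap_Select;
output		Data_Out;

reg[15:0]	Shift_Register;

// New data enters at bit 0 and moves up one position per clock
always @ (posedge clk) begin

	Shift_Register	<= {Shift_Register[14:0], Data_In};

end

// Tap N returns the input from N + 1 clocks ago
assign		Data_Out	= Shift_Register[Tap_Select];

endmodule
```

A W-bit wide delay is W copies of this module sharing one Tap_Select, which is why such queues stay compact.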

Before you abandon the Xilinx offerings, I would look long and hard at what
you are giving up.  The cost may not be worth the small gains you get.

rickman wrote:

> I am finalizing my FPGA selection for a line of DSP boards that we will
> be making for a number of years. I have always been more familiar with
> Xilinx but had a chance to work with the Altera 10K parts this past
> year. They seem ok, but the nearly identical ACEX 1K family is better in
> most regards. But the gate size is limited if we are looking at having
> future growth and I am not finding as good a price as with the Xilinx
> SpartanII parts. The only vendors I can find are Arrow and Newark, and
> Newark does not show much on their web site. Anyone know how to get good
> price numbers on the Altera parts without having a handful of specific
> parts? If I call the vendors, they always want me to give them a few part
> numbers and I am window shopping and need pricing on all the parts so I
> can make my choices.
>
> The other problem I have with the Altera FPGAs is the lack of LUT RAM.
> There are only a small number of RAM (EAB/ESB) in these parts and I need
> a lot more blocks of it than are available. They don't have to be big,
> the 16 words available in a bank of Xilinx LUTs is perfect. For example,
> I will need 64 blocks of RAM if four modules of the 8 channel ADC/DAC
> are on board. This is not hard using the Xilinx LUTs. Anyone know of a
> way to do something similar in an Altera FPGA?
>
> BTW, just to mention why I like the Altera parts... THEY DON'T HAVE A
> STARTUP CURRENT SURGE!!! Was I at all unclear about that?  :)
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 38332
Subject: Re: APEX-II vs VIRTEX-II
From: Ray Andraka <ray@andraka.com>
Date: Fri, 11 Jan 2002 20:17:55 GMT

Depends very heavily on what you are putting in there.

Steve Holroyd wrote:

> I am currently tasked with recommending the largest, fastest FPGA with
> the most memory that will be readily available in the first half of this
> year for an FPGA Array Card.
>
> The choices have been narrowed down to two families: Altera's APEX-II
> (EP2A70) and Xilinx's Virtex-II (XC2V6000).
>
> Which can operate at the highest speed?
>
> Steve

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 38333
Subject: Xilinx PAR and Editor speed up
From: "Bryan" <bryan@srccomp.com>
Date: Fri, 11 Jan 2002 13:23:37 -0700

Because of a separate project I am doing, I had a Linux box set up to
emulate windows using wine.  The interesting thing is that we set up two
identical PCs.  Both had same motherboard with dual P3 1Ghz processors and
1Gbyte of ram.  Both have 7200rpm IBM hard drives.  We took the same edif
and ucf files and routed them on both.  The design was in a XC2V3000 part
with utilization around 70%.  The Linux box showed almost a 2 to 1 speedup
in PAR.  Also the Linux box shows huge speedups using fpga_editor.  The
windows box(running windows 2000pro) takes over 10 minutes just to open the
routed ncd.  The Linux box takes just a little over 3 minutes.  If you add a
net with autoroute on, windows box is several minutes, Linux box is less
than 10 seconds.  Just wanted to see if anyone else has experimented with
this.  I am now doing all of my routes and fpga_editor work on the linux box
and saving a lot of time.  BTW, the time for PAR on this particular design
was over 2 hours on the windows box and just over an hour on the linux box.

Bryan

Article: 38334
Subject: Re: Xilinx PAR and Editor speed up
From: "Bryan" <bryan@srccomp.com>
Date: Fri, 11 Jan 2002 13:30:30 -0700

BTW, nothing was running under windows, not even a screen saver.  Task
Manager showed that no other native apps were using any processor cycles.

Bryan


"Bryan" <bryan@srccomp.com> wrote in message
news:3c3f4b2e$0$26669$724ebb72@reader2.ash.ops.us.uu.net...
> Because of a separate project I am doing, I had a Linux box set up to
> emulate windows using wine.  The interesting thing is that we set up two
> identical PCs.  Both had same motherboard with dual P3 1Ghz processors and
> 1Gbyte of ram.  Both have 7200rpm IBM hard drives.  We took the same edif
> and ucf files and routed them on both.  The design was in a XC2V3000 part
> with utilization around 70%.  The Linux box showed almost a 2 to 1 speedup
> in PAR.  Also the Linux box shows huge speedups using fpga_editor.  The
> windows box(running windows 2000pro) takes over 10 minutes just to open the
> routed ncd.  The Linux box takes just a little over 3 minutes.  If you add a
> net with autoroute on, windows box is several minutes, Linux box is less
> than 10 seconds.  Just wanted to see if anyone else has experimented with
> this.  I am now doing all of my routes and fpga_editor work on the linux box
> and saving a lot of time.  BTW, the time for PAR on this particular design
> was over 2 hours on the windows box and just over an hour on the linux box.
>
> Bryan
>
>
>
>

Article: 38335
Subject: Re: How to constrain the inputs of a multi-level parity generator and
From: Ray Andraka <ray@andraka.com>
Date: Fri, 11 Jan 2002 20:31:22 GMT
Links: << >> << T >> << A >>

You are kind of stuck if you are unwilling to use any vendor specific coding.  You could use
keep buffers in your synthesis to force the synthesizer to assign logic to specific LUTs.
The mapper should pretty much leave those alone as long as the synthesizer outputs LUT
primitives.  Unfortunately, the keep buffers have different syntaxes between vendors.

An option, if this part is not changed from instance to instance, is to compile the parity
component separately to an edif using vendor specific code, then distribute it and
instantiate it as a black box in the design.  The PAR tools will look for the edif to merge
the black box component with the rest of the design.  You can include the code when compiling
it for simulation so that the simulation is correct, and remove it from the synthesis compile
script so that it gets black boxed.
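
Whichever route is taken, the restructured generator quoted below can be checked against a plain 36-bit XOR before fighting the tools. The following is a minimal Python sketch, not synthesis code: the function names are invented for illustration, and it models only the logic function (not LUT mapping or timing).

```python
from functools import reduce
from operator import xor
import random

def flat_parity(bits):
    """Reference model: XOR of every bit in the list."""
    return reduce(xor, bits, 0)

def fast_path_parity(cbe, ad):
    """Parity structured like the second quoted version: ad[31:0] passes
    through three XOR levels, c_be_n[3:0] only through the last one."""
    assert len(cbe) == 4 and len(ad) == 32
    first = [flat_parity(ad[i:i + 4]) for i in range(0, 32, 4)]  # eight 4-input XORs
    second = [flat_parity(first[0:4]), flat_parity(first[4:8])]  # two 4-input XORs
    third = second[0] ^ second[1]                                # one 2-input XOR
    return flat_parity(cbe) ^ third                              # one 5-input XOR

def check(trials=1000, seed=0):
    """Compare both models on random vectors; True if they always agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        cbe = [rng.randint(0, 1) for _ in range(4)]
        ad = [rng.randint(0, 1) for _ in range(32)]
        if fast_path_parity(cbe, ad) != flat_parity(cbe + ad):
            return False
    return True
```

Calling check() confirms the two structures compute the same function, so any regrouping done by the synthesizer changes only the path depth of c_be_n, not the result.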

Kevin Brace wrote:

>         Hi, I am having problems with trying to constrain the inputs
> going into a multi-level parity generator in XST Verilog.
> Here I am trying to generate a parity of 36 inputs for my PCI IP core,
> and, of course, Xilinx and Altera FPGAs are 4-input LUT-based, so the
> input signals go through multiple levels of LUTs to calculate the
> parity.
> In the first level, the parity generator uses 9 LUTs to calculate
> parity.
> In the second level, 2 LUTs take in 8 of the 9 outputs of the first
> level LUTs, and the remaining one output from the first level LUT will
> be used at the third level.
> At the final third level, 2 inputs from the second level LUTs and one
> input from the first level LUTs will be used to calculate the final
> parity calculation result.
> Here are the partial Verilog codes for the top module where I
> instantiate the parity generator, and the parity generator.
>
> ___________________________ Top Module _____________________________
>
> Parity_Generator Parity_Generator_Generator_Instance(
>                                 .clk(clk),
>                                 .Parity_Input({c_be_n[3:0], ad_Port[31:0]}),
>                                 .XORed_Result(Parity_Generated)
>                                 );
>
> ____________________________________________________________________
>
> __________________________ Parity Generator ________________________
>
> module Parity_Generator(
>                                                 clk,
>                                                 Parity_Input,
>                                                 XORed_Result
>                                                 );
>
> input           clk;
> input[35:0]     Parity_Input;
> output          XORed_Result;
>
> reg                     XORed_Result;
>
> wire[8:0]       First_Intermediate_Parity;
> wire[2:0]       Second_Intermediate_Parity;
> wire            Final_Parity;
>
> // First level
> assign          First_Intermediate_Parity[8]    = Parity_Input[35] ^
> Parity_Input[34] ^ Parity_Input[33] ^ Parity_Input[32];
> assign          First_Intermediate_Parity[7]    = Parity_Input[31] ^
> Parity_Input[30] ^ Parity_Input[29] ^ Parity_Input[28];
> assign          First_Intermediate_Parity[6]    = Parity_Input[27] ^
> Parity_Input[26] ^ Parity_Input[25] ^ Parity_Input[24];
> assign          First_Intermediate_Parity[5]    = Parity_Input[23] ^
> Parity_Input[22] ^ Parity_Input[21] ^ Parity_Input[20];
> assign          First_Intermediate_Parity[4]    = Parity_Input[19] ^
> Parity_Input[18] ^ Parity_Input[17] ^ Parity_Input[16];
> assign          First_Intermediate_Parity[3]    = Parity_Input[15] ^
> Parity_Input[14] ^ Parity_Input[13] ^ Parity_Input[12];
> assign          First_Intermediate_Parity[2]    = Parity_Input[11] ^
> Parity_Input[10] ^ Parity_Input[ 9] ^ Parity_Input[ 8];
> assign          First_Intermediate_Parity[1]    = Parity_Input[ 7] ^
> Parity_Input[ 6] ^ Parity_Input[ 5] ^ Parity_Input[ 4];
> assign          First_Intermediate_Parity[0]    = Parity_Input[ 3] ^
> Parity_Input[ 2] ^ Parity_Input[ 1] ^ Parity_Input[ 0];
>
> // Second level
> assign          Second_Intermediate_Parity[2]   = First_Intermediate_Parity[8];
> assign          Second_Intermediate_Parity[1]   = First_Intermediate_Parity[7] ^
> First_Intermediate_Parity[6] ^ First_Intermediate_Parity[5] ^
> First_Intermediate_Parity[4];
> assign          Second_Intermediate_Parity[0]   = First_Intermediate_Parity[3] ^
> First_Intermediate_Parity[2] ^ First_Intermediate_Parity[1] ^
> First_Intermediate_Parity[0];
>
> // Final level
> assign          Final_Parity    = Second_Intermediate_Parity[2] ^
> Second_Intermediate_Parity[1] ^ Second_Intermediate_Parity[0];
>
> always @ (posedge clk) begin
>
>         XORed_Result    <= Final_Parity;
>
> end
>
> endmodule
> ____________________________________________________________________
>
>         From what I see, the c_be_n[3:0] should go through,
>
> assign          First_Intermediate_Parity[8]    = Parity_Input[35] ^
> Parity_Input[34] ^ Parity_Input[33] ^ Parity_Input[32];
>
> But the problem I have here is that when I synthesize the code, XST
> Verilog (ISE WebPACK's synthesis tool) or Xilinx MAP somehow
> automatically chooses which inputs go into which LUTs, and I have a
> problem with that.
> I want "c_be_n[3:0]" which is an unregistered bus signal of PCI bus to
> go through as few LUTs as possible to reduce setup time requirements.
> For "ad_Port[31:0]," that signal comes from inside of the chip (from
> DFFs), so I don't have to worry too much about how many levels of LUTs
> it passes through.
> I tried disabling (unchecking) XST's option called XOR collapsing, but
> it didn't seem to make any difference.
> I recently upgraded to the latest ISE WebPACK 4.1WP2.0 from 4.1WP0.0,
> but that didn't seem to make any difference, either.
> For MAP, setting Map to Inputs option to 4 or 5 didn't seem to make
> difference.
>         I first noticed this problem when I synthesized my PCI IP core
> trying to meet 66MHz PCI timings (Tsu < 3ns, Tval(Tco) < 6ns) just for
> curiosity.
> In 33MHz PCI, this whole issue of which signals go through how many LUTs
> for calculating parity was not a big issue because Tsu only has to be <
> 7ns.
>         I found someone else discussing a better way of calculating
> 36-bit parity than the method shown above for Virtex architecture
> devices, so I modified my code to take advantage of that idea.
> Here are the new partial Verilog codes for the top module where I
> instantiate the parity generator, and the parity generator.
>
> ___________________________ Top Module _____________________________
>
> Parity_Generator Parity_Generator_Instance(
>                                 .clk(clk),
>                                 .Fast_Path_Parity_Input(cben[3:0]),
>                                 .Parity_Input_1(ad_Port[3:0]),
>                                 .Parity_Input_2(ad_Port[7:4]),
>                                 .Parity_Input_3(ad_Port[11:8]),
>                                 .Parity_Input_4(ad_Port[15:12]),
>                                 .Parity_Input_5(ad_Port[19:16]),
>                                 .Parity_Input_6(ad_Port[23:20]),
>                                 .Parity_Input_7(ad_Port[27:24]),
>                                 .Parity_Input_8(ad_Port[31:28]),
>                                 .XORed_Result(Parity_Generated)
>                                 );
>
> ____________________________________________________________________
>
> __________________________ Parity Generator ________________________
>
> module Parity_Generator(
>                                                 clk,
>                                                 Fast_Path_Parity_Input,
>                                                 Parity_Input_1,
>                                                 Parity_Input_2,
>                                                 Parity_Input_3,
>                                                 Parity_Input_4,
>                                                 Parity_Input_5,
>                                                 Parity_Input_6,
>                                                 Parity_Input_7,
>                                                 Parity_Input_8,
>                                                 XORed_Result
>                                                 );
>
> input           clk;
> input[3:0]      Fast_Path_Parity_Input;
> input[3:0]      Parity_Input_1;
> input[3:0]      Parity_Input_2;
> input[3:0]      Parity_Input_3;
> input[3:0]      Parity_Input_4;
> input[3:0]      Parity_Input_5;
> input[3:0]      Parity_Input_6;
> input[3:0]      Parity_Input_7;
> input[3:0]      Parity_Input_8;
> output          XORed_Result;
>
> reg                     XORed_Result;
>
> wire[7:0]       First_Intermediate_Parity;
> wire[1:0]       Second_Intermediate_Parity;
> wire            Third_Intermediate_Parity;
> wire            Final_Parity;
>
> // First level
> assign          First_Intermediate_Parity[7]    = Parity_Input_1[3] ^
> Parity_Input_1[2] ^ Parity_Input_1[1] ^ Parity_Input_1[0];
> assign          First_Intermediate_Parity[6]    = Parity_Input_2[3] ^
> Parity_Input_2[2] ^ Parity_Input_2[1] ^ Parity_Input_2[0];
> assign          First_Intermediate_Parity[5]    = Parity_Input_3[3] ^
> Parity_Input_3[2] ^ Parity_Input_3[1] ^ Parity_Input_3[0];
> assign          First_Intermediate_Parity[4]    = Parity_Input_4[3] ^
> Parity_Input_4[2] ^ Parity_Input_4[1] ^ Parity_Input_4[0];
> assign          First_Intermediate_Parity[3]    = Parity_Input_5[3] ^
> Parity_Input_5[2] ^ Parity_Input_5[1] ^ Parity_Input_5[0];
> assign          First_Intermediate_Parity[2]    = Parity_Input_6[3] ^
> Parity_Input_6[2] ^ Parity_Input_6[1] ^ Parity_Input_6[0];
> assign          First_Intermediate_Parity[1]    = Parity_Input_7[3] ^
> Parity_Input_7[2] ^ Parity_Input_7[1] ^ Parity_Input_7[0];
> assign          First_Intermediate_Parity[0]    = Parity_Input_8[3] ^
> Parity_Input_8[2] ^ Parity_Input_8[1] ^ Parity_Input_8[0];
>
> // Second level
> assign          Second_Intermediate_Parity[1]   = First_Intermediate_Parity[7] ^
> First_Intermediate_Parity[6] ^ First_Intermediate_Parity[5] ^
> First_Intermediate_Parity[4];
> assign          Second_Intermediate_Parity[0]   = First_Intermediate_Parity[3] ^
> First_Intermediate_Parity[2] ^ First_Intermediate_Parity[1] ^
> First_Intermediate_Parity[0];
>
> // Third level
> assign          Third_Intermediate_Parity               = Second_Intermediate_Parity[1] ^
> Second_Intermediate_Parity[0];
>
> // Final level
> assign          Final_Parity    = Fast_Path_Parity_Input[3] ^
> Fast_Path_Parity_Input[2] ^ Fast_Path_Parity_Input[1] ^
> Fast_Path_Parity_Input[0] ^ Third_Intermediate_Parity;
>
> always @ (posedge clk) begin
>
>         XORed_Result    <= Final_Parity;
>
> end
>
> endmodule
>
> ____________________________________________________________________
>
>         In the above shown code, "ad_Port[31:0]" has to go through 4
> levels of LUTs, but like the previous version, that signal comes from
> inside of the chip (from DFFs), so I don't have to worry too much about
> how many levels of LUTs it passes through.
> The nice part of this method is that "c_be_n[3:0]" only has to go
> through 1 level of LUT.
> Yes, a 5-input LUT's gate delay is larger than a 4-input LUT's gate
> delay, but the 5-input LUT's gate delay is far better than two 4-input
> LUTs connected in series with the routing delay between two 4-input
> LUTs.
> In theory XST should use Virtex architecture's 5-input LUT, but when I
> synthesize this code with the same XOR Collapse option disabled, XST
> still seems to collapse the XOR structure of the HDL code, and
> synthesizes with 3 levels of LUTs using only 4-input LUTs.
> How can I work around this problem to instruct XST not to collapse the
> XOR gate structure?
> The "XOR Collapse" option I unchecked was through Project Navigator ->
> Processes for Current Source -> Synthesize -> Properties -> HDL Options.
> I am absolutely willing to use an XST synthesis constraint file, which I
> already use to constrain fanouts of individual input signals, if that
> is possible, but I don't want to insert vendor specific synthesis
> directives into my code because my PCI IP code will have to be
> synthesizable with other synthesis tools.
> I am using ISE WebPACK 4.1WP2.0 to develop my PCI IP core, and the
> device I am currently targeting is Xilinx Spartan-II XC2S150-5CPQ208 or
> 6CPQ208.
>
> Thanks,
>
> Kevin Brace (Don't respond to me directly, respond within the
> newsgroup.)

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 38336
Subject: Re: asic vs. fpga
From: jay.mitchell@c1d.com (jay mitchell)
Date: 11 Jan 2002 12:48:26 -0800
Links: << >> << T >> << A >>

The other big difference that separates FPGA and ASIC designs is the
ability of ASIC designs to be mixed mode.

--jay

Article: 38337
Subject: Re: latch vs. register
From: jmrice@ntlworld.com (Martin Rice)
Date: 11 Jan 2002 12:55:14 -0800
Links: << >> << T >> << A >>

Matthias Weber <msweber@onlinehome.de> wrote in message news:<1103_1010490929@news.online.de>...
> hi,
> 
> do i understand right that latches consist of simple flipflops without being clocked so that the circuit stores immediately every change of signal.
> is the difference between latches and registers that the latter are clocked (constructed by D-, RS- or JK-FlipFlops)?
> 
> thanks for information,
> 
> matthias weber

Unfortunately, the terms latch, register and flip-flop do not have
universally accepted meanings that are adhered to.  You can have
memory elements with no clock input, you can have memory elements that
do have a clock, elements that have an enable, and you can have
collections of memory elements.

Take two two-input NOR gates and join the output of each to one input
of the other, in a cross-coupled sort of circuit.  This is a bistable
circuit that can be changed from one state to the other by asserting
the right signals on the two spare inputs.  It is normally referred to
as an SR latch, but also as an RS latch, an RS flip-flop, an SR
bistable, etc.  Perhaps this is the circuit you have in mind when you
mention 'simple flipflops'.
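
The cross-coupled NOR circuit can be sketched as a small Python model. This is only an illustration with made-up names: it iterates the two gates until the outputs stop changing, and ignores real gate delays.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1

def sr_latch(s, r, q=0, qn=1):
    """Settle two cross-coupled NOR gates and return (q, qn).

    s=1 sets, r=1 resets, s=r=0 holds the previous state.
    s=r=1 is the usual forbidden input and is not handled specially."""
    for _ in range(4):  # a few passes are enough for the legal inputs
        q_next = nor(r, qn)
        qn_next = nor(s, q)
        if (q_next, qn_next) == (q, qn):
            break
        q, qn = q_next, qn_next
    return q, qn

state = sr_latch(1, 0)          # set
state = sr_latch(0, 0, *state)  # hold: stays set with both inputs released
```

The hold call shows the point of the circuit: with both spare inputs inactive, the feedback alone keeps the last state.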

If you add a third input (a clock) and some circuitry that means that
the output can only change in response to a change of state on the
clock, then you get an edge-triggered device.  Most people call this a
flip-flop, although you do find references to edge-triggered latches.

If, instead of the clock, you add a third input (an enable) and some
(different) circuitry that means that the output follows the input
when the enable is in one state, but stores the latest input when the
enable is in its other state, then you get a level-triggered device. 
Most people call this a latch, and to emphasise the way the output
follows the input when the enable is active, qualify the latch as
transparent.
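
The difference between the edge-triggered and the level-triggered device shows up when both are fed the same stimulus. A minimal behavioral sketch in Python, with class and function names chosen only for this illustration:

```python
class TransparentLatch:
    """Level-triggered: output follows d while enable is 1, holds otherwise."""
    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q

class EdgeFlipFlop:
    """Edge-triggered: output changes only on a 0 -> 1 transition of clk."""
    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, d, clk):
        if clk and not self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q

def run(device, samples):
    """Apply a list of (d, control) pairs and collect the outputs."""
    return [device.update(d, c) for d, c in samples]

# d changes from 1 to 0 while the control input is still high
stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
latch_out = run(TransparentLatch(), stimulus)  # [0, 1, 0, 0, 0]
ff_out = run(EdgeFlipFlop(), stimulus)         # [0, 1, 1, 1, 1]
```

In the third sample the latch output drops with the input because it is still transparent, while the flip-flop keeps the value it captured on the rising edge.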

If you string some flip-flops (edge-triggered) together, with all the
clocks combined, you get what most people call a register. On the
other hand, the term registered when describing a digital output,
probably just means that you need to apply a clock signal to get the
output to change, and may apply to just one output.

So, be careful what you read into these terms.  Make sure you know
whether the device is level triggered or edge triggered, and what
polarity of level or edge makes the device output change.

Martin Rice

Article: 38338
Subject: Re: Avoid routing through a certain area (Xilinx)
From: Bret Wade <bret.wade@xilinx.com>
Date: Fri, 11 Jan 2002 15:13:56 -0700
Links: << >> << T >> << A >>

Bret Wade wrote:

> rickman wrote:
>
> >
> > > After rethinking this I realized that there is no need remove the macro using
> > > JBits. Simply delete the macro in FPGA Editor after place and route.
> >
> > Bret, are you saying that a macro like this can be treated as a single
> > object and removed with a single command in the FPGA Editor?
>
> Yes, set the list window to "all macros", pick the macro from the list and delete.
>

I should also point out that hard macros can be imported into an .ncd using FPGA Editor
(Edit--> Add Macro), so there is no need to represent the anti-core macro in the logical
front end and "compile" it into the design. This also means that a library of anti-cores
can be developed and used interchangeably.

Bret

Article: 38339
Subject: Re: Picking an FPGA
From: rickman <spamgoeshere4@yahoo.com>
Date: Fri, 11 Jan 2002 17:16:23 -0500
Links: << >> << T >> << A >>

Thanks for the advice Ray. I finally figured out a way to do most of
what I want. My real problem is that I am trying to design a board that
can be everything to everyone depending on the chips you add. This means
low power if that is what you need or high performance if that is what
you need. And of course, it always has to be low cost. 

So I keep running into walls with this approach. The one that I really
hate is the problems that the Virtex/SpartanII startup current causes.
But if I only use one chip per voltage, I can get around that. So for
now I am looking at using a CoolRunner as the PC/104 interface. This is
not too expensive and it keeps that part of the power down. Then
everything else will go into an XC2S150E. This has a little room for
growth if there is future need for a bigger part. Too bad they didn't
keep the pinout compatible with the Virtex E parts, d..m! The other
problem this causes me is the lack of reconfiguration for different
combinations of IO. There is just not room on the board (or power
supply) to give each IO module its own FPGA like the current board has.
The new board has 4 IO sites rather than just 2. So I will be looking
hard at Jbits and various methods of modularizing bitstreams in the
future. 

For now this will get us off the ground and allow us to design a
workable board. Maybe when the Spartan version of the XC2V parts is
available I will work with that if I am still working then :)

BTW, I am aware of the limitations of the Altera architecture for doing
math. The 10K/1K family has other limitations as well since not all of
the inputs to the LUT can be used when you are using all the inputs to
the FFs. But the real kicker is the MAX+PLUS II tool. We found that in a
dense design it will just plain lie about timing. The analyzer says you
have a good design that meets timing and because of the complex routing
that can be required, it miscalculates and the chip will fail at
temperature. At least they have moved most of the 10K/1K family to
Quartus in the paid versions. The free versions still only support them
under MAX+PLUS II. 

Ray Andraka wrote:
> 
> Rick,
> 
> I think you are intending to use these in a signal processing (read
> arithmetic and filtering) application.  If that is the case, be very
> careful.  The SpartanII/Virtex offer more advantages for DSP than just
> having distributed RAM.  If you look at the Altera carry chain, it breaks
> the 4LUTs into a pair of 3LUTs, one for the carry and one for the 'sum'.
> One input to those 3LUTs is your carry from the adjacent bit.  That means,
> at best, you get a two input arithmetic function in each level of logic.
> Things like adder/subtractors wind up using two or more times the LUTs as
> the equivalent function in Xilinx.  Also, the carry chain runs through the
> LAB, so your data flow has to run from LAB to LAB.  In the 10K and I believe
> (correct me if I am wrong) in the Acex families, there are no direct
> connections between LABs, so the data path has to go onto the row routing.
> There are 6 row routes for every 8 LE's, so in a heavily arithmetic design
> you run out of row routes when the row gets 3/4 full.  It is actually worse
> than that: since the row routing connections are a sparse matrix, you need
> to route through an intermediate LAB if there is not a direct connection
> between your source and destination.  As the row fills up, the number of no
> connects goes up sharply, accelerating the saturation.  In a heavily
> arithmetic design, you hit a pretty hard limit at about 50% device
> utilization because of this.  The 20K family greatly improves the situation
> by the addition of direct connects between adjacent LABs.  Also, don't
> discount the utility of the SRL16's in Xilinx.  Not only do those make very
> compact delay queues, which are extensively used in filters, but they also
> give you a way to reload LUT contents without having to reconfigure the
> device.  This is valuable for DA filters, since the coefficients are stored
> as partial sums in LUTs.  In altera, a reprogrammable filter is much harder
> to build because the LUTs don't help you there.  The SRL16's are also great
> for doing small reorder queues.  Reordering comes up frequently in signal
> processing in such operations as FFTs, channel multiplexing etc.
> 
> Before you abandon the Xilinx offerings, I would look long and hard at what
> you are giving up.  The cost may not be worth the small gains you get.
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930     Fax 401/884-7950
> email ray@andraka.com
> http://www.andraka.com
> 
>  "They that give up essential liberty to obtain a little
>   temporary safety deserve neither liberty nor safety."
>                                           -Benjamin Franklin, 1759

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 38340
Subject: Re: Picking an FPGA
From: guys@mediaone.net (guy)
Date: 11 Jan 2002 14:18:13 -0800
Links: << >> << T >> << A >>

Hey Rick,
I see you are in MD.
Strong suggestion here - call your local Altera office 410-203-1245
and ask for Jeff Wills.  He can meet with you and tell you all about
Altera's DSP capabilities architecturally, tools, and IP.  Much more
significant than 10k.  Do yourself a favor and get this in person or
on phone.

Guy
Altera Corp.


rickman <spamgoeshere4@yahoo.com> wrote in message news:<3C3F3029.5E555144@yahoo.com>...
> I am finalizing my FPGA selection for a line of DSP boards that we will
> be making for a number of years. I have always been more familiar with
> Xilinx but had a chance to work with the Altera 10K parts this past
> year. They seem ok, but the nearly identical ACEX 1K family is better in
> most regards. But the gate size is limited if we are looking at having
> future growth and I am not finding as good a price as with the Xilinx
> SpartanII parts. The only vendors I can find are Arrow and Newark, and
> Newark does not show much on their web site. Anyone know how to get good
> price numbers on the Altera parts without having a handful of specific
> parts? If I call the vendors, they always want me to give them a few part
> numbers and I am window shopping and need pricing on all the parts so I
> can make my choices. 
> 
> The other problem I have with the Altera FPGAs is the lack of LUT RAM.
> There are only a small number of RAM (EAB/ESB) in these parts and I need
> a lot more blocks of it than are available. They don't have to be big,
> the 16 words available in a bank of Xilinx LUTs is perfect. For example,
> I will need 64 blocks of RAM if four modules of the 8 channel ADC/DAC
> are on board. This is not hard using the Xilinx LUTs. Anyone know of a
> way to do something similar in an Altera FPGA? 
> 
> BTW, just to mention why I like the Altera parts... THEY DON'T HAVE A
> STARTUP CURRENT SURGE!!! Was I at all unclear about that?  :)
> 
> -- 
> 
> Rick "rickman" Collins
> 
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
> 
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 38341
Subject: Re: Picking an FPGA
From: rickman <spamgoeshere4@yahoo.com>
Date: Fri, 11 Jan 2002 17:43:54 -0500
Links: << >> << T >> << A >>

Yes Austin, we did discuss the power surge issue, and there are ways around
the problem. But none of the proposed solutions are workable for me. I
looked at xapp451, "SpartanII poweron assist" again just today to make
sure I did not miss anything. The problem is that the added circuitry
is not as cheap or as simple as you would like to think. I noticed that
the example used in this xapp uses a 2600 uF capacitor! The AVX
datasheet only goes as high as 1000uF at 6.3 volts with a .32" x .17"
footprint including pads. This would require three of these or something
around .32" x .6". I don't consider that to be small when being used
with a .7" square chip. Actually the largest cap I could find in a quick
search was a 470 uF device which would require 5 at $2.50 each. That
makes the cost of the POS circuit nearly as much as the FPGA!!! It would
actually be easier for me to increase the size of my DCDC converter and
supply the extra Amps directly. 
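
The capacitor arithmetic above can be laid out explicitly. This is a rough sketch using only the figures quoted in this post; it assumes the full 2600 uF is a hard minimum, which the post does not actually state, and the variable names are made up.

```python
import math

def caps_needed(target_uf, cap_uf):
    """Number of identical parallel capacitors needed to reach target_uf."""
    return math.ceil(target_uf / cap_uf)

TARGET_UF = 2600        # value used in the xapp example
PRICE_470_USD = 2.50    # quoted price of one 470 uF part

n_1000 = caps_needed(TARGET_UF, 1000)   # 3
n_470 = caps_needed(TARGET_UF, 470)     # 6

# Three 0.32" x 0.17" parts side by side, before any spacing between pads
row_width_in = round(n_1000 * 0.17, 2)  # 0.51

# Five 470 uF parts, as counted in the post
five_total_uf = 5 * 470                 # 2350
five_cost_usd = 5 * PRICE_470_USD       # 12.5
```

Under that assumption, five 470 uF parts come up 250 uF short and a sixth would be needed, which only strengthens the cost argument.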

I believe you (or Peter) offered in the newsgroup to provide more
accurately qualified data on the POS (power on surge), but all I ever
got was a phone call or email indicating that the numbers "could" be
reduced. Rather than being told what the reduced numbers were, I was
asked what my target was. I guess I was hoping that the POS was actually
very overstated, especially in the smaller part. But it seems that it
could only be reduced by some 25% or so. 

In any event, I am about ready to go with a XC2Ve part along with a
coolrunner for the 5 volt interface. 

I may not like some of the limitations of the Spartan II parts, but I
will love them if I can find a way to design for partial
reconfiguration. 

Austin Lesea wrote:
> 
> Rick,
> 
> I have responded before on the power on current issue, but for the group, I
> will respond here again.
> 
> For existing designs in Virtex, Virtex E, Spartan II, and Spartan IIE,
> consult the datasheets, and the app notes:
> 
>  http://www.support.xilinx.com/xapp/xapp450.pdf
> 
>  http://www.support.xilinx.com/xapp/xapp451.pdf
> 
> Which demonstrates ways to start up with as little as 80 mA by adding three
> components that cost pennies at -40 C with the industrial grade parts.
> 
> In Virtex II, there is no startup current at all on any of the three
> supplies.  The startup current equals the operational current, and there is
> no extra current to be supplied.  If all you can supply is the operational
> currents, it powers on cleanly and configures.
> 
> 4KXLA, and Spartan XL also have no startup current.
> 
> Austin
> 
> rickman wrote:
> 
> > I am finalizing my FPGA selection for a line of DSP boards that we will
> > be making for a number of years. I have always been more familiar with
> > Xilinx but had a chance to work with the Altera 10K parts this past
> > year. They seem ok, but the nearly identical ACEX 1K family is better in
> > most regards. But the gate size is limited if we are looking at having
> > future growth and I am not finding as good a price as with the Xilinx
> > SpartanII parts. The only vendors I can find are Arrow and Newark, and
> > Newark does not show much on their web site. Anyone know how to get good
> > price numbers on the Altera parts without having a handful of specific
> > parts? If I call the vendors, they always want me to give them a few part
> > numbers and I am window shopping and need pricing on all the parts so I
> > can make my choices.
> >
> > The other problem I have with the Altera FPGAs is the lack of LUT RAM.
> > There are only a small number of RAM (EAB/ESB) in these parts and I need
> > a lot more blocks of it than are available. They don't have to be big,
> > the 16 words available in a bank of Xilinx LUTs is perfect. For example,
> > I will need 64 blocks of RAM if four modules of the 8 channel ADC/DAC
> > are on board. This is not hard using the Xilinx LUTs. Anyone know of a
> > way to do something similar in an Altera FPGA?
> >
> > BTW, just to mention why I like the Altera parts... THEY DON'T HAVE A
> > STARTUP CURRENT SURGE!!! Was I at all unclear about that?  :)
> >
> > --
> >
> > Rick "rickman" Collins
> >
> > rick.collins@XYarius.com
> > Ignore the reply address. To email me use the above address with the XY
> > removed.
> >
> > Arius - A Signal Processing Solutions Company
> > Specializing in DSP and FPGA design      URL http://www.arius.com
> > 4 King Ave                               301-682-7772 Voice
> > Frederick, MD 21701-3110                 301-682-7666 FAX

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 38342
Subject: Re: latch vs. register
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Fri, 11 Jan 2002 15:30:40 -0800
Links: << >> << T >> << A >>

Martin Rice wrote:

>
> Unfortunately, the terms latch, register and flip-flop do not have
> universally accepted meanings, that are adhered to.

Let me suggest that in this newsgroup a latch is a bistable storage element that has only one rank, i.e. is transparent (input directly affecting the
output) while enabled.

A flip-flop is more complex, dual-rank, and is thus never transparent, and the data input never affects the output directly.
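
The dual-rank idea can be illustrated with two single-rank latches enabled on opposite clock levels. This Python sketch is a behavioral model only, with invented class names, and is not a description of any particular device's flip-flop.

```python
class Latch:
    """Single rank: transparent while enable is true, holds otherwise."""
    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q

class MasterSlaveFlipFlop:
    """Two ranks: the master is transparent while clk is low, the slave
    while clk is high, so d never reaches the output directly."""
    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def update(self, d, clk):
        m = self.master.update(d, not clk)
        return self.slave.update(m, clk)

ff = MasterSlaveFlipFlop()
stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
outputs = [ff.update(d, clk) for d, clk in stimulus]  # [0, 1, 1, 1, 0]
```

In the third sample d falls while clk is high and the output does not move, because the master rank is closed at that moment; the new value only appears after the next rising edge.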

Usually, a register is a collection of flip-flops with a common clock.

If we adhere to these definitions, we are in synch with most of the industry.

Peter Alfke

Article: 38343
Subject: speech recognition - active noise cancellation
From: cjwang_1225@hotmail.com (chris)
Date: 11 Jan 2002 17:13:03 -0800
Links: << >> << T >> << A >>

i am trying to do a speech recognition application and i need a clean
input voice signal to a microphone. my problem is that i need to get
rid of ambient noise in a room without affecting the voice signal at
all. i have thought about doing an adaptive filter like application,
but i cannot think of a way to isolate just the ambient noise without
touching the voice. my main goal would be to come up with some kind of
active noise control setup that would be able to phase-cancel the
ambient noise while keeping the voice signal clean. does anyone know
how this can be done? i have seen something similar from andreas
electronics, but their product doesn't exactly fit my requirement. any
help would be appreciated. thanks.
chris wang

Article: 38344
Subject: Re: Xilinx PAR and Editor speed up
From: Duane Clark <junkmail@junkmail.com>
Date: Fri, 11 Jan 2002 18:04:46 -0800
Links: << >> << T >> << A >>

Bryan wrote:
> ...  I am now doing all of my routes and fpga_editor work on the linux box
> and saving a lot of time.

Just out of curiosity, do the toolbar buttons in the main window of 
fpga_editor work for you? If they do, what version of wine and Linux are 
you using?

For me (current wine cvs and RH 6.2), the toolbar buttons in the main 
window do not work, and there does not appear to be any other way to 
select the particular pips and lines etc that are desired. Oddly enough, 
the toolbar buttons do work in the popup windows for the slices and 
iobs. Everything else in fpga_editor seems to work fine.

I had not tried a run time comparison, but since I can dual boot the 
same machine, maybe I will try that one of these days.

Duane

Article: 38345
Subject: Re: multiply (*) 11000000000
From: Ken McElvain <ken@synplicity.com>
Date: Fri, 11 Jan 2002 18:04:58 -0800
Links: << >> << T >> << A >>

In Synplify, we have a special multiply by constant
mode that does some recoding of the constant to
reduce the number of adders.  I think you will get
a good result just doing the multiply.  For more
complex constants we will probably get fewer adders
than you would expect.  Try multiplying by 7 and see
what happens.

Ken McElvain
CTO, Synplicity

Ray Andraka wrote:

> Because the synthesizer may not recognize that it can be done with an adder.
> Often a template is used which in turn instantiates the vendor core for the
> multiplier.  I believe if you do this in synplicity, you'll get a LUT based sum
> of partial products construction based on the Xilinx coregen constant coefficient
> multiplier.  The synthesis is not smart enough to distill that down to an adder
> (it would if the multiplier template produced a full array multiplier, but that
> is usually a very inefficient construct in an FPGA).
> 
> Jay wrote:
> 
> 
>>What about just typing a "*" and let your synthesizer turn it into 2
>>adders?  This way nobody has to try to figure out why you're adding 2
>>shifted numbers when they're reading the code.
>>
>>Kenily <aiurh@iuehr.erug> wrote in message news:<ee74130.-1@WebX.sUN8CHnE>...
>>
>>>i want to implement a multiplier.one
>>>multiply 0x600(Hex).how do i implement?
>>>
> 
> --
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930     Fax 401/884-7950
> email ray@andraka.com
> http://www.andraka.com
> 
>  "They that give up essential liberty to obtain a little
>   temporary safety deserve neither liberty nor safety."
>                                           -Benjamin Franklin, 1759
> 
> 
>
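
[Editor's sketch, not part of the original posts.] One well-known way to recode a constant so that fewer adders are needed is the non-adjacent form (signed digits +1/-1, no two adjacent non-zero digits). The sketch below is an assumption about the general idea only; it does not claim to reproduce what Synplify or any other synthesizer does internally, and the function names are hypothetical.

```python
def naf_terms(constant):
    """Recode a non-negative integer into signed power-of-two terms.

    Returns a list of (digit, shift) pairs with digit in {+1, -1} such
    that constant == sum(digit << shift), using the non-adjacent form.
    """
    terms = []
    shift = 0
    while constant:
        if constant & 1:
            # +1 when the low two bits are 01, -1 when they are 11.
            digit = 2 - (constant & 3)
            terms.append((digit, shift))
            constant -= digit
        constant >>= 1
        shift += 1
    return terms


def multiply_by_constant(x, constant):
    """Multiply using only shifts, adds, and subtracts."""
    return sum(digit * (x << shift) for digit, shift in naf_terms(constant))


for c in (7, 0x600):
    terms = naf_terms(c)
    print(f"{c:#x}: {terms} -> {len(terms) - 1} adder/subtractor(s)")
```

For 7 the plain binary form has three non-zero bits (two adders), while the recoded form is 8x - x (one subtractor). For 0x600 both forms need a single adder or subtractor.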

Article: 38346
Subject: Re: speech recognition - active noise cancellation
From: "Kevin Neilson" <kevin_neilson@removethis-yahoo.com>
Date: Sat, 12 Jan 2002 02:07:36 GMT
Links: << >> << T >> << A >>

I think the trick is to have a unit with two microphones.  One is highly
directional toward the speaker.  The other is more omnidirectional but with
a null in the direction of the speaker.  The data from the omnidirectional
mic is the noise that you subtract from the directional mic.

"chris" <cjwang_1225@hotmail.com> wrote in message
news:24a13eb0.0201111713.7f9af7b5@posting.google.com...
> i am trying to do a speech recognition application and i need a clean
> input voice signal to a microphone. my problem is that i need to get
> rid of ambient noise in a room without affecting the voice signal at
> all. i have thought about doing an adaptive filter like application,
> but i cannot think of a way to isolate just the ambient noise without
> touching the voice. my main goal would be to come up with some kind of
> active noise control setup that would be able to phase-cancel the
> ambient noise while keeping the voice signal clean. does anyone know
> how this can be done? i have seen something similar from andreas
> electronics, but their product doesn't exactly fit my requirement. any
> help would be appreciated. thanks.
> chris wang
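
[Editor's sketch, not part of the original post.] The two-microphone idea is usually implemented as an adaptive noise canceller rather than a plain subtraction, because the noise reaches the two microphones through different paths. The sketch below uses an LMS filter on synthetic data. Everything in it is an assumption for illustration: the sine-wave "voice", the white-noise source, the two-tap room path, the tap count, and the step size `mu`. It also assumes the reference microphone picks up no voice at all, which a real room will not guarantee.

```python
import math
import random


def lms_cancel(primary, reference, taps=8, mu=0.005):
    """Subtract an adaptive (LMS) estimate of the noise from the primary mic."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error is the cleaned output sample
        weights = [w + 2.0 * mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_square(values):
    return sum(v * v for v in values) / len(values)


rng = random.Random(0)
n = 4000
voice = [math.sin(2.0 * math.pi * 0.01 * k) for k in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # what the reference mic hears
# Noise as it arrives at the primary mic, through an assumed two-tap path.
room_noise = [0.6 * noise[k] + (0.3 * noise[k - 1] if k > 0 else 0.0)
              for k in range(n)]
primary = [v + r for v, r in zip(voice, room_noise)]

cleaned = lms_cancel(primary, noise)

# Compare residual noise power over the second half, after adaptation.
half = n // 2
before = mean_square([p - v for p, v in zip(primary[half:], voice[half:])])
after = mean_square([c - v for c, v in zip(cleaned[half:], voice[half:])])
print(f"residual noise power: before={before:.3f}, after={after:.3f}")
```

A smaller `mu` adapts more slowly but leaves less residual noise once converged; any voice leaking into the reference microphone would be partly cancelled as well.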

Article: 38347
Subject: Re: speech recognition - active noise cancellation
From: Jerry Avins <jya@ieee.org>
Date: Fri, 11 Jan 2002 21:44:19 -0500
Links: << >> << T >> << A >>

Kevin Neilson wrote:
> 
> I think the trick is to have a unit with two microphones.  One is highly
> directional toward the speaker.  The other is more omnidirectional but with
> a null in the direction of the speaker.  The data from the omnidirectional
> mic is the noise that you subtract from the directional mic.
> 
> "chris" <cjwang_1225@hotmail.com> wrote in message
> news:24a13eb0.0201111713.7f9af7b5@posting.google.com...
> > i am trying to do a speech recognition application and i need a clean
> > input voice signal to a microphone. my problem is that i need to get
> > rid of ambient noise in a room without affecting the voice signal at
> > all. i have thought about doing an adaptive filter like application,
> > but i cannot think of a way to isolate just the ambient noise without
> > touching the voice. my main goal would be to come up with some kind of
> > active noise control setup that would be able to phase-cancel the
> > ambient noise while keeping the voice signal clean. does anyone know
> > how this can be done? i have seen something similar from andreas
> > electronics, but their product doesn't exactly fit my requirement. any
> > help would be appreciated. thanks.
> > chris wang

Contact throat microphones were used in fighter planes 50 years ago.
They discriminate against ambient noise very nicely indeed. 50 dB SNR
improvements were cited, but I didn't make measurements myself.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
-----------------------------------------------------------------------

Article: 38348
Subject: Re: How do I use Altera's PLL megafunction to multiply some frequency ?
From: vitaliyt@xillix.com (Vitaliy Tkachenko)
Date: 11 Jan 2002 18:49:17 -0800
Links: << >> << T >> << A >>

In APEX 20KE and later devices, external feedback can be used as
well. In this case you need a connection between the PLL output and
the PLL feedback input.

kayrock66@yahoo.com (Jay) wrote in message news:<d049f91b.0201101743.44b19f31@posting.google.com>...
> I'm not sure I understand your question but I'll try my hand at it
> because nobody else has.  The Altera clock multiplication is done
> using hard macro PLL's on the die.  The feedback circuitry is
> encapsulated inside the function, you just tell it what multiplication
> factor you want, and let Altera do the rest.
> 
> "Dimitry Yegorov 1598864168" <dmyegorov@geolink-group.com> wrote in message news:<a0kbgd$97h$1@josh.sovintel.ru>...
> > I know that it must be an easy question, but I can't find the answer - it is
> > not published in the Help. Seems that some loop should be added to max2plus
> > PLL megafunction to implement the F*N multiplier.
> > Any comments are welcome, I am the beginner.
> > Thanks!

Article: 38349
Subject: Re: Suitability of Atmel for project?
From: i_never_check_this@hotmail.com (Stout)
Date: 11 Jan 2002 21:05:15 -0800
Links: << >> << T >> << A >>

>  I'm a little lost. You say SPI, but then mention UARTS. 
> These are all SLAVE devices ?

They are each banging away at about 150 kbps, not synchronous to each
other (i.e. they all have their own clock that they are supplying and
are all completely uncoordinated).  So the worst case is that my board
has to listen to all 5 going at once.  However, the traffic is short
bursts so that the overall throughput from any one input is more like
2000 bps.

Anyway thanks to all who have replied.  I now have a lot more insight
into what direction to go in, or should I say, what directions to
avoid.

- Stout
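
[Editor's sketch, not part of the original post.] The figures quoted above imply a large gap between the peak and the average load. A quick calculation, using only the numbers in the post (5 inputs, about 150 kbps burst rate, about 2000 bps sustained per input); the variable names are made up for illustration:

```python
CHANNELS = 5
BURST_RATE_BPS = 150_000   # per-input line rate during a burst
AVERAGE_RATE_BPS = 2_000   # per-input sustained throughput

# Worst case: all inputs bursting at the same time.
peak_aggregate_bps = CHANNELS * BURST_RATE_BPS
# Long-run load the downstream logic has to sustain.
average_aggregate_bps = CHANNELS * AVERAGE_RATE_BPS
# Fraction of the time each input is actually active.
duty_cycle = AVERAGE_RATE_BPS / BURST_RATE_BPS

print(f"peak aggregate:    {peak_aggregate_bps} bps")
print(f"average aggregate: {average_aggregate_bps} bps")
print(f"per-input duty:    {duty_cycle:.2%}")
```

The receive side has to keep up with the peak figure, while buffering and downstream processing can be sized closer to the average.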
