Messages from 37650

Article: 37650
Subject: Re: division 64
From: Muzaffer Kal <muzaffer@dspia.com>
Date: Tue, 18 Dec 2001 18:23:12 GMT

On Tue, 18 Dec 2001 23:56:30 +1100, Russell Shaw
<rjshaw@iprimus.com.au> wrote:
>Muzaffer Kal wrote:
>> 
>> On Mon, 17 Dec 2001 10:31:33 -0500, "Pallek, Andrew [CAR:CN34:EXCH]"
>> <apallek@americasm01.nt.com> wrote:
>> 
>> >If you just want to divide by 64, shift right by 6 places.  The modulo is what was shifted
>> >out.
>> 
>> what if the dividend is negative ?
>
>The shift_right() function in ieee.numeric_bit operates on
>signed numbers by maintaining the sign bits. For an unsigned
>number, zeros are shifted in.

Yes, one would need to at least use an arithmetic shift, but that was
not my point. If the dividend is negative, a shift doesn't always work
if what you want is division in the sense most people (and all
adaptive filters) define it. Say we want to divide "n" by "d"; we can
represent the operation by
	n = d * q + r
where q is the quotient and r is the remainder. For this operation to
be considered division at all we need
	|r| < |d|
but the sign of the remainder is a little bit tricky, because with a
negative dividend one can choose it to be either negative or positive.
In other words:
	-5 = 2 * (-2) + (-1) or -5 = 2 * (-3) + (+1)
With the second choice the remainder is never negative (for a positive
divisor), regardless of the sign of the dividend. This is not the
commonly accepted definition of division (no cpu has an integer
division which works like this and adaptive filters hate it with a
passion). The usual definition of the remainder includes
	sign(r) = sign(n)

The problem is that a shift doesn't do this. You get the second option,
which may or may not be what you want. It all depends on what your
"definition" of division is.
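
To make the difference concrete, here is a minimal sketch in Python
(whose >> on negative integers behaves as an arithmetic shift). The
function names are made up for illustration, and the bias-before-shift
correction in shift_div64_trunc is a commonly used technique, not
something proposed in this thread.

```python
def shift_div64(n):
    """Plain arithmetic shift right by 6: rounds toward minus infinity."""
    return n >> 6


def trunc_div64(n):
    """Reference: divide by 64 rounding toward zero, so sign(r) == sign(n)."""
    q = abs(n) // 64
    return -q if n < 0 else q


def shift_div64_trunc(n):
    """Shift-based divide by 64 that rounds toward zero.

    Adding 63 (= 64 - 1) to a negative dividend before the shift turns
    the floor into a truncation.
    """
    bias = 63 if n < 0 else 0
    return (n + bias) >> 6


# Columns: n, plain-shift quotient and remainder, corrected quotient and remainder
for n in (-129, -65, -64, -5, -1, 0, 5, 64):
    q_shift = shift_div64(n)
    q_trunc = shift_div64_trunc(n)
    print(n, q_shift, n - 64 * q_shift, q_trunc, n - 64 * q_trunc)
```

For n = -5 the plain shift gives q = -1 with r = +59, while the
corrected version gives q = 0 with r = -5. In hardware the bias would
cost one adder in front of the shift.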

Muzaffer Kal

http://www.dspia.com
DSP algorithm implementations for FPGA systems

Article: 37651
Subject: Re: ISP by JTAG using a microcontroller
From: Greg Neff <gregeneff@yahoo.com>
Date: Tue, 18 Dec 2001 13:25:08 -0500

On Tue, 11 Dec 2001 11:51:57 +0100, "alco" <alco@cardiocontrol.com>
wrote:

(SNIP)
>
> - Has anything changed recently in the JTAG interface for the xc9536 that
>might cause a microcontroller to fail programming the cpld.
(SNIP)
>select a version-2 9536 as a target device.
>
>Thanks,
>
>Alco Looye
>alco@cardiocontrol.com
>
>
>

We did the same thing, and had the same problem.  The newer XC9536
silicon requires a longer flash erase time.  The fix for this is to
manually edit your SVF file to account for this.

Near the top of the file you should see four lines like this (each
following an SDR 27 line):

RUNTEST 1300000 TCK;

Change these to:

RUNTEST 3000000 TCK;

This increases the erase time from 1.3 seconds to 3.0 seconds.
Regenerate your XSVF file, and you will be OK.
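
If you would rather not hand-edit the file, the same change can be
scripted. The following is only a sketch in Python: the function name
and the sample text are hypothetical, and it assumes the wait lines
look exactly like the one quoted above. Check that the reported count
is four before regenerating the XSVF.

```python
import re

# Matches the erase-wait line quoted above, tolerating extra spaces or tabs.
OLD_WAIT = re.compile(r"^([ \t]*RUNTEST[ \t]+)1300000([ \t]+TCK[ \t]*;)",
                      re.MULTILINE)


def lengthen_erase_wait(svf_text, new_count=3000000):
    """Return (patched_text, number_of_lines_changed)."""
    return OLD_WAIT.subn(
        lambda m: f"{m.group(1)}{new_count}{m.group(2)}", svf_text)


# Tiny stand-in for a real SVF file; other RUNTEST values are left alone.
sample = "RUNTEST 1300000 TCK;\nRUNTEST 1 TCK;\n"
patched, changed = lengthen_erase_wait(sample)
print(changed)
print(patched, end="")
```

To use it on a real file, read the SVF text, pass it through
lengthen_erase_wait(), and write the result to a new file so the
original is kept.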

Also, see Xilinx Answers Database record 4475.  You will notice that
the SVF file generator produces an erase time of 1.3 seconds, even
though the maximum erase time specified by Xilinx is 2.6 seconds.

BTW, we cleaned up the Xilinx 8051 code to get rid of signed
variables, unnecessarily long variables, and other inefficiencies.
This halved the programming time.  I told Xilinx that they should
clean up their example code, and they basically told me to go away and
leave them alone.

===================================
Greg Neff
VP Engineering
*Microsym* Computers Inc.
greg@guesswhichwordgoeshere.com

Article: 37652
Subject: Kindergarten Stuff
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Tue, 18 Dec 2001 10:29:01 -0800

This is a friendly and helpful newsgroup, but let's make sure that it does not
get abused.
Lots of textbooks explain how to divide by a power of 2, where the remainder is,
and how you sign-extend the MSB. Explaining that is not the purpose of this
newsgroup.

Let's use our "bandwidth" for more complex and perhaps controversial questions
that are not explained in textbooks and data books.

Peter Alfke, Xilinx Applications

Article: 37653
Subject: Re: ISP by JTAG using a microcontroller
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 19 Dec 2001 07:54:19 +1300

Greg Neff wrote:
<snip> 
> BTW, we cleaned up the Xilinx 8051 code to get rid of signed
> variables, unnecessarily long variables, and other inefficiencies.
> This halved the programming time.  I told Xilinx that they should
> clean up their example code, and they basically told me to go away and
> leave them alone.

 Interesting :-)

 I'm sure there is somewhere you could post the cleaned up code...

 What was the final 8051 Code / RAM footprint, after you did this ?

 Did you look at run length compression, or just use a BIT file copy ?

 -jg

Article: 37654
Subject: Re: FGPA express bidir pins Xilinx, FPGA-pmap-18
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Tue, 18 Dec 2001 10:56:34 -0800

Maybe a more general explanation is in order:
It is inherently impossible to extend a bidirectional line across a conventional
amplifier. To go across a chip boundary, you have to know the signal flow and
activate the appropriate driver.  The "wired AND Longline" cannot pass through
the chip boundary, since the I/O contains an amplifier, and a conventional
amplifier is always unidirectional.

Peter Alfke
================================
Falk Brunner wrote:

> "Wilco Vahrmeijer" <wilco@cardiocontrol.com> schrieb im Newsbeitrag
> news:9vnl4h$1j1i$1@news.versatel.net...
> > Hi all,
> >
> > We've got a problem with FPGA express (FPGAexpress 3.6.6613, bundled
> > with Xilinx ISE 4.1) and bidir pins with a Xilinx device:
> >
> > I've made two blocks, and each block has control signals and one
> > bidirectional pin (tri-state buffered). On the upper layer, these two
> > signals are routed to the same output pin. (See attachments)
> >
> > The problem is a warning from FPGA express:
> > "FPGA-pmap-18  (1 Occurrence) Warning: The port type of port
> > '/TryOutBiDir-1/BiDirPin' is unknown. An output pad will be inserted"
> >
> > and FPGA express inserts an output buffer instead of a bidir buffer.
> > Internally the signal is bidirectional; to the outside it's unidirectional.
> >
> > I want a bidirectional output pin !! Can somebody help me??
>
> To have a bidirectional bus inside AND outside the FPGA you have to isolate
> them.
>
> entity TryOutBiDir is
>   port
>   (
>     BiDirPin: inout STD_LOGIC
>   );
> end TryOutBiDir;
>
> architecture TryOutBiDir_arch of TryOutBiDir is
>
>   component driver
>     port
>     (
>       Write2Readed: in    STD_LOGIC;
>       highZ:        in    STD_LOGIC;
>       Readed:       out   STD_LOGIC;
>       DQ:           inout STD_LOGIC
>     );
>   end component;
>
>   begin
>
>   Driver_1  : Driver  port map (Write2Readed1,HighZ1,Readed1,BiDirPin_int);
>   Driver_2  : Driver  port map (Write2Readed2,HighZ2,Readed2,BiDirPin_int);
>
>   BiDirPin<=BidirPin_int when con='1' else 'Z';
>
> end TryOutBiDir_arch;
>
> This code is not complete; the signal declarations are missing. You also
> need to generate the con signal, which controls the Tristate driver of the
> IO Pin.
>
> --
> MfG
> Falk

Article: 37655
Subject: Re: Kindergarten Stuff
From: Muzaffer Kal <muzaffer@dspia.com>
Date: Tue, 18 Dec 2001 19:07:48 GMT

On Tue, 18 Dec 2001 10:29:01 -0800, Peter Alfke
<peter.alfke@xilinx.com> wrote:

>This is a friendly and helpful newsgroup, but let's make sure that it does not
>get abused.
>Lots of textbooks explain how to divide by a power of 2, where the remainder is,
>and how you sign-extend the MSB. Explaining that is not the purpose of this
>newsgroup.

I don't think I have ever seen the charter of this group but I know
what you mean. It is as if all the people reading are sitting in a
circle and when one asks "how do I divide by two?" everybody starts
chanting "shift, shift, shift". But if a question is asked, I think it
needs to be answered, in a correct and comprehensive manner.
Muzaffer Kal

http://www.dspia.com
DSP algorithm implementations for FPGA systems

Article: 37656
Subject: Barrel shifter puts three 2->1 muxes / slice in Xilinx
From: "Carl Brannen" <carl.brannen@terabeam.com>
Date: Tue, 18 Dec 2001 19:20:37 +0000 (UTC)

This came as a result of thinking about how to make more efficient barrel
shifters.  Most readers are familiar with fall-through barrel shifters and how
they're usually implemented (with columns of 2 to 1 muxes, each column
dedicated to shifting the data by a different power of 2).
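
For anyone who wants the conventional scheme spelled out, here is a
small behavioral model in Python (not HDL). It assumes a 16-bit
rotate-left with a 3-bit shift amount; the width, the direction, the
choice of rotate rather than shift, and all names are illustrative
assumptions, not taken from the post.

```python
WIDTH = 16
STAGES = 3                      # shift amounts 0..7
MUX_COUNT = WIDTH * STAGES      # one 2 to 1 mux per bit per stage


def mux2(sel, a, b):
    """2 to 1 mux: returns b when sel is 1, otherwise a."""
    return b if sel else a


def rotate_stage(bits, amount, sel):
    """One column of WIDTH muxes: pass through, or rotate left by 'amount'."""
    return [mux2(sel, bits[i], bits[(i - amount) % WIDTH])
            for i in range(WIDTH)]


def barrel_rotate_left(value, shift):
    """Rotate a WIDTH-bit value left by 0..7 using power-of-2 stages."""
    bits = [(value >> i) & 1 for i in range(WIDTH)]   # bits[0] is the LSB
    for k in range(STAGES):
        bits = rotate_stage(bits, 1 << k, (shift >> k) & 1)
    return sum(b << i for i, b in enumerate(bits))


print(hex(barrel_rotate_left(0x8001, 1)))
print(MUX_COUNT)
```

Each stage is controlled by one bit of the shift amount, so this
16-bit, 0 to 7 example needs 16 * 3 = 48 muxes.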

I got to thinking about how to use MUXF5s for barrel shifters.  It's clear that
if you could bring out the terms that feed the MUXF5 you could get three
results out of a slice instead of just two.  That would essentially give me
three 2 to 1 muxes in one slice instead of two, and it would really improve
barrel shifter packing (and maybe be good for random logic use).
 
The problem with doing it is that it's hard to get the output of the "F" LUT
out of the slice.  But it can be done by bringing it out the CARRY-OUT.  You use
a MUXCY, and apply '0' and '1' to the DI and CI inputs, and the "F" LUT output
to the S input of the MUXCY.  That programs the MUXCY to be a buffer of the "F"
LUT, and you get the (otherwise hidden) "F" LUT output as the carry-out (which
can easily route to the outside of the slice).
 
The only problem with that is that it's very hard to program the CI of the
MUXCY to '1' and still use the MUXF5.  In fact, since the BX input is going to
have to be used by the 'S' input of the MUXF5, you have to use the carry input
of the slice.  And that puts the problem of generating a '1' into the next door
neighbor to the slice, where you'll have to use a LUT to generate it, thereby
wasting the LUT you were trying to save.
 
That was where my analysis ended, but the other night I realized that for a
barrel shifter, I don't have to control the value of that carry-out at all
times.  I only need to control it when I'm actually going to select it in the
next stage of logic.  And since the selector for the next stage of logic will
be the same selector as is coming in on the 'S' pin for the MUXF5, that
suggests that there might be a solution.
 
In fact there is.  You program the "DI" input of the MUXCY to '1', and connect
both the MUXCY.CI input  and the MUXF5.S input to the same BX input.  That's a
natural use for the carry input, but it's usually not done because normally
arithmetic logic is not used at the same time as the MUXF5 pin.  But you can do
it.
 
The result is that when the MUXF5 selects the "F" LUT, (i.e. when the BX input
is '1'), the MUXF5 operates normally, but the Carry-out will be forced to '1'.
But that's the condition under which you would normally ignore the carry-out
anyway, if you were using the circuit as a barrel shifter (with positive shift
amount).
 
On the other hand, when the SHIFT input (BX) is low, the "G" LUT is selected
for the MUXF5, and the DI and CI inputs of the MUXCY end up as '1' and '0'
respectively.  That causes the CARRY-OUT to follow the complement of the "F"
LUT output.  But we all know how easy it is to invert logic in a Xilinx, so I
just put an inverter on the CARRY-OUT and the logic takes care of getting rid
of the inversion for me.
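
The behavior described in the last few paragraphs can be checked with a
tiny truth-table model. This is a sketch in Python, not a netlist: it
models only the prose description (MUXCY select driven by the "F" LUT,
DI tied to '1', CI and the MUXF5 select both on BX, plus the external
inverter), and the function names are made up for illustration.

```python
def muxcy(s, di, ci):
    """Carry mux: passes CI when the select S is 1, otherwise DI."""
    return ci if s else di


def muxf5(f, g, bx):
    """F5 mux with the polarity used in the post: BX=1 picks F, BX=0 picks G."""
    return f if bx else g


def slice_outputs(f, g, bx):
    """Return (F5, inverted carry-out) for one slice.

    f and g are the outputs of the "F" and "G" LUTs (0 or 1).
    """
    f5 = muxf5(f, g, bx)
    carry_out = muxcy(s=f, di=1, ci=bx)   # DI tied to '1', CI shares BX
    return f5, 1 - carry_out              # external inverter on CARRY-OUT


# Columns: BX, F, G, (F5, inverted carry-out)
for bx in (0, 1):
    for f in (0, 1):
        for g in (0, 1):
            print(bx, f, g, slice_outputs(f, g, bx))
```

With BX low the pair is (G, F), so both LUT outputs are visible at
once; with BX high F5 carries F and the carry path is a constant that
the next stage ignores.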
 
The router isn't too good at combining MUXCYs and MUXF5s, so to get this into a
single slice I have to RLOC it.  But the good news is that this works.
 
Generalizing, this means that I can get certain collections of 3 logic
functions in a single slice.  The general rule is:
 
The three logic functions are {F,G',F5}, and the 9 input variables are
{F1,F2,F3,F4,G1,G2,G3,G4, and BX}
 
F <= LUT(F1,F2,F3,F4);
 
G' <= LUT(G1,G2,G3,G4) nand  BX;
 
with BX select
    F5 <=
        G when '0',
        F when others;
 
I thought this was cool.  It allows a 16-bit wide 0 to 7 bit barrel shifter in
just 36 LUTs, which is 12 fewer than the number needed to create a barrel
shifter the usual way.

The logic for the above was implemented in schematics, but I could easily
convert this to VHDL if anyone is interested.

I also figured out a way to program a column of slices to perform a vector of 3
to 1 muxes instead of just 2 to 1 muxes.  This can be used to create very
efficient barrel shifters where the number of shift amounts is a power of 3.
An example would be a barrel shifter that shifts between 0 and 8 bits.  With the
usual barrel shift technique, such a barrel shifter would require 4 stages, but
using 3 to 1 muxes it requires only 2 stages.  There are some fixed costs
associated with computing the controls for the stages and (since it uses
arithmetic functions) driving the CARRY-IN for each stage.  (It's actually more
complex than I'm implying here.)  I'll post code for it if anyone is interested.
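
The stage-count argument can be illustrated with a behavioral model.
This Python sketch only shows the arithmetic of a two-stage, base-3
shifter; it says nothing about how the 3 to 1 muxes are mapped onto
slices. The 16-bit width, the rotate-left direction, and all names are
assumptions for illustration.

```python
WIDTH = 16


def mux3(sel, a, b, c):
    """3 to 1 mux: sel 0, 1, 2 picks a, b, c."""
    return (a, b, c)[sel]


def rotated(bits, amount):
    """Wiring only: bits rotated left by a fixed amount (bits[0] is the LSB)."""
    return [bits[(i - amount) % WIDTH] for i in range(WIDTH)]


def ternary_rotate_left(value, shift):
    """Rotate a WIDTH-bit value left by 0..8 using two columns of 3 to 1 muxes."""
    if not 0 <= shift <= 8:
        raise ValueError("shift must be in 0..8")
    bits = [(value >> i) & 1 for i in range(WIDTH)]
    for step in (1, 3):                  # stage weights 3**0 and 3**1
        digit = (shift // step) % 3      # base-3 digit controlling this stage
        by1 = rotated(bits, step)
        by2 = rotated(bits, 2 * step)
        bits = [mux3(digit, bits[i], by1[i], by2[i]) for i in range(WIDTH)]
    return sum(b << i for i, b in enumerate(bits))


print(hex(ternary_rotate_left(0x0001, 8)))
```

The conversion of the shift amount into two base-3 digits is the kind
of control computation that shows up as a fixed cost.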

Carl


-- 
Posted from firewall.terabeam.com [216.137.15.2] 
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37657
Subject: Defauolt Should Be "Inputs and Outputs" For IOBs
From: "S. Ramirez" <sramirez@cfl.rr.com>
Date: Tue, 18 Dec 2001 19:25:19 GMT

     I don't know about you guys and gals and pals, but every time I do a
design, without exception, I ALWAYS go into the Xilinx Design Manager
Design --> Options --> Implementation Edit Options and select
"Inputs and Outputs" for Pack I/O Registers/Latches into IOBs for.  I ALWAYS
want my designs to use IOB flip flops if possible.  It seems to me that the
default "Off" is a waste of these flip flops.  Does anyone here ever turn
this off?
Simon Ramirez, Consultant
Synchronous Design, Inc.
Oviedo, FL  USA

Article: 37658
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
From: Steven Derrien <sderrien@irisa.fr>
Date: Tue, 18 Dec 2001 20:36:44 +0100

Hello,

This could be very useful for optimizing floating point adders or subtracters,
since they use large barrel shifters for normalization and denormalization. If you
have some VHDL for this I'd be eager to use it to see how it affects area and speed !!

Steven



Carl Brannen wrote:

> <snip>

Article: 37659
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Tue, 18 Dec 2001 12:12:25 -0800

I agree that it looks very clever and interesting.
But, just as an aside, floating point needs arithmetic shifters for normalization,
not barrel shifters.
Also, remember that Virtex-II has lots of multipliers, many of them begging to be
used as "free" shifters (multiply by a power of 2).
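
As a rough illustration of the multiplier-as-shifter idea, here is a
Python sketch. It assumes 16-bit data so that the largest constant,
2**16, still fits a signed 18-bit multiplier input; the width and the
function names are assumptions for illustration only.

```python
W = 16  # data width (assumed)


def shift_left_via_multiply(x, s):
    """Left shift by s: multiply by the one-hot constant 2**s, keep W bits."""
    return (x * (1 << s)) & ((1 << W) - 1)


def shift_right_via_multiply(x, s):
    """Arithmetic right shift by s (0..W) of a signed value.

    Multiply by 2**(W - s) and keep the part of the product above bit W.
    The >> here only models picking the upper product bits.
    """
    product = x * (1 << (W - s))
    return product >> W


print(hex(shift_left_via_multiply(0x00FF, 4)))
print(shift_right_via_multiply(-1234, 3))
```

The variable shift amount becomes a one-hot multiplicand, so a small
decoder is still needed in front of the multiplier.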

Peter Alfke
====================================
Steven Derrien wrote:

> Hello,
>
> This could be very useful for optimizing floating point adders or subtracters,
> since they use large barrel shifters for normalization and denormalization. If you
> have some VHDL for this I'd be eager to use it to see how it affects area and speed !!
>
> Steven
>
> Carl Brannen wrote:
>
> > <snip>

Article: 37660
Subject: Spartan-IIE schematic symbol?
From: "Peter Fenn" <Peter.Fenn@avnet.com>
Date: Tue, 18 Dec 2001 12:31:29 -0800

Spartan-IIE: I am urgently looking for a (board-level) schematic symbol (preferably ORCAD or VIEWLOGIC) for an XC2S100E-6FT256C Xilinx FPGA. Is anyone in a position to help on this?
-Thanks in advance :-)

Article: 37661
Subject: Re: Defauolt Should Be "Inputs and Outputs" For IOBs
From: "Austin Franklin" <austin@dark98room.com>
Date: Tue, 18 Dec 2001 15:34:08 -0500

"S. Ramirez" <sramirez@cfl.rr.com> wrote in message
news:zKMT7.137295$Ga5.21230731@typhoon.tampabay.rr.com...
>      I don't know about you guys and gals and pals, but every time I do a
> design, without exception, I ALWAYS go into the Xilinx Design Manager
> Design --> Options --> Implementation Edit Options and select
> "Inputs and Outputs" for Pack I/O Registers/Latches into IOBs for.  I ALWAYS
> want my designs to use IOB flip flops if possible.  It seems to me that the
> default "Off" is a waste of these flip flops.  Does anyone here ever turn
> this off?
> Simon Ramirez, Consultant
> Synchronous Design, Inc.
> Oviedo, FL  USA

Hi Simon,

That's what you get for using Design Mangler...er...Manager ;-)

Austin

Article: 37662
Subject: Re: is it OK?
From: Ray Andraka <ray@andraka.com>
Date: Tue, 18 Dec 2001 20:46:13 GMT

Why not just ask the magic eight ball?  Never mind, it would probably tell him to try again.

Andy Peters wrote:

> You could simulate it, and find out for yourself if it is OK.
>
> OK?
>
> "chensw20hotmail.com" wrote:
> >
> > Now, I want to implement it with a counter. Is it OK?
> >
> > /* counter[2:0] runs while read enable is asserted; the data is steered by the counter */
> > always @(posedge NA_Clock or negedge Rst )
> > begin
> > if(!Rst) // active-low reset, to match "negedge Rst" in the sensitivity list
> > NA_Count<=0;
> > else if(NA_Read_Enable) NA_Count<=NA_Count+1;
> > else NA_Count<=0;
> > end
> >
> > /* Data read from the FIFO is stored in NA_Des_Data0..7 individually.
> >    NA_Data_Out[15:0]: FIFO data out; NA_Des_Data0..7: [15:0] each */
> > always @(posedge NA_Clock or negedge Rst )
> > begin
> > if(!Rst) // active-low reset, to match "negedge Rst" in the sensitivity list
> > begin
> > NA_Des_Data0 <=16'b0;
> > NA_Des_Data1 <=16'b0;
> > NA_Des_Data2 <=16'b0;
> > NA_Des_Data3 <=16'b0;
> > NA_Des_Data4 <=16'b0;
> > NA_Des_Data5 <=16'b0;
> > NA_Des_Data6 <=16'b0;
> > NA_Des_Data7 <=16'b0;
> >
> > end
> >  else
> > case(NA_Count)
> > 3'b000: NA_Des_Data0 <=NA_Data_Out;
> > 3'b001: NA_Des_Data1 <=NA_Data_Out;
> > 3'b010: NA_Des_Data2 <=NA_Data_Out;
> > 3'b011: NA_Des_Data3 <=NA_Data_Out;
> > 3'b100: NA_Des_Data4 <=NA_Data_Out;
> > 3'b101: NA_Des_Data5 <=NA_Data_Out;
> > 3'b110: NA_Des_Data6 <=NA_Data_Out;
> > 3'b111: NA_Des_Data7 <=NA_Data_Out;
> > default :
> >  begin
> > NA_Des_Data0 <=16'b0;
> > NA_Des_Data1 <=16'b0;
> > NA_Des_Data2 <=16'b0;
> > NA_Des_Data3 <=16'b0;
> > NA_Des_Data4 <=16'b0;
> > NA_Des_Data5 <=16'b0;
> > NA_Des_Data6 <=16'b0;
> > NA_Des_Data7 <=16'b0;
> > end
> > endcase
> > end
> > is it OK?
> > Thanks

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 37663
Subject: Re: ISP by JTAG using a microcontroller
From: Greg Neff <gregeneff@yahoo.com>
Date: Tue, 18 Dec 2001 16:08:35 -0500

On Wed, 19 Dec 2001 07:54:19 +1300, Jim Granville
<jim.granville@designtools.co.nz> wrote:

>Greg Neff wrote:
><snip> 
>> BTW, we cleaned up the Xilinx 8051 code to get rid of signed
>> variables, unnecessarily long variables, and other inefficiencies.
>> This halved the programming time.  I told Xilinx that they should
>> clean up their example code, and they basically told me to go away and
>> leave them alone.
>
> Interesting :-)
>
> I'm sure there is somewhere you could post the cleaned up code...



>
> What was the final 8051 Code / RAM footprint, after you did this ?
>
> Did you look at run length compression, or just use a BIT file copy ?
>
> -jg


===================================
Greg Neff
VP Engineering
*Microsym* Computers Inc.
greg@guesswhichwordgoeshere.com

Article: 37664
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
From: Ray Andraka <ray@andraka.com>
Date: Tue, 18 Dec 2001 21:09:15 GMT

Here's my two cents worth (maybe not even that much).

1) Peter, the term barrel shift is commonly (although technically incorrectly) applied
to shifters which have a variable shift distance.  The Virtex-II multipliers can in
fact be used this way, but it can be done considerably faster (with more pipelining) in
the fabric for very little additional cost, especially when you consider the resources
taken by the added pipeline registers you need in front of and behind the multiplier to
get anywhere close to the data sheet speeds.  It all comes down to how do I best use
the resources available to me.

2) The carry chain can also be used for a free doubler circuit.   However, watch the
timing.  There exist false paths (that are also quite slow comparatively speaking)
introduced by the non-standard use of the carry chain (the chain connections are only
used to the next neighbor, not all the way up the chain).  Timingwise, the conventional
approach seems to yield better propagation delays in combinatorial only shifters, and
considerably better times in fully pipelined shifters.  This is a good trick to put in
your back pocket for those times where the need for density outweighs the needs of the
clock cycle.

3) I'd be interested in seeing your layout solution.  Getting the layout right is
not a trivial part of making this perform well.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 37665
Subject: Re: ISP by JTAG using a microcontroller
From: Greg Neff <gregeneff@yahoo.com>
Date: Tue, 18 Dec 2001 16:28:07 -0500

On Wed, 19 Dec 2001 07:54:19 +1300, Jim Granville
<jim.granville@designtools.co.nz> wrote:

>Greg Neff wrote:
><snip> 
>> BTW, we cleaned up the Xilinx 8051 code to get rid of signed
>> variables, unnecessarily long variables, and other inefficiencies.
>> This halved the programming time.  I told Xilinx that they should
>> clean up their example code, and they basically told me to go away and
>> leave them alone.
>
> Interesting :-)
>
> I'm sure there is somewhere you could post the cleaned up code...

No can do.  We developed the code for a paying customer, so we are not
free to give it away.

>
> What was the final 8051 Code / RAM footprint, after you did this ?

It was a few modules of a large piece of automatic production
programming and test code.  I scanned the link map, and it looks like
it used about 2,300 bytes of code, and 36 bytes of RAM in data space.

>
> Did you look at run length compression, or just use a BIT file copy ?
>

I just used the standard JED -> SVF -> XSVF file flow.  I stored this
file, as well as other production XSVF and HEX files, in a big
external flash.  As part of the tester FPGA I built an automatic
address incrementer for this big flash, so that I could easily
sequentially read each byte from the files to be programmed into the
devices on the UUT.
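
The address-incrementer idea can be illustrated with a toy software model. This is only a sketch under assumptions: the class and function names are hypothetical, and the real thing was logic in the tester FPGA, not Python. The point it shows is that the reader only sets a start address once and then pulls bytes, with the counter advancing on every read.

```python
class AutoIncrementFlash:
    """Toy model of a flash behind an auto-incrementing address counter."""

    def __init__(self, image: bytes):
        self._image = image   # contents of the (hypothetical) external flash
        self._addr = 0        # address counter kept by the tester logic

    def set_address(self, addr: int) -> None:
        """Load the counter with the start address of a stored file."""
        self._addr = addr

    def read_next(self) -> int:
        """Return the byte at the current address, then advance the counter."""
        byte = self._image[self._addr]
        self._addr += 1
        return byte


def stream_file(flash: AutoIncrementFlash, start: int, length: int) -> bytes:
    """Read `length` sequential bytes beginning at `start`."""
    flash.set_address(start)
    return bytes(flash.read_next() for _ in range(length))
```

With this arrangement the code that plays back an XSVF or HEX file never has to compute addresses itself; it just asks for the next byte.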

===================================
Greg Neff
VP Engineering
*Microsym* Computers Inc.
greg@guesswhichwordgoeshere.com

Article: 37666
Subject: Re: Spartan-IIE schematic symbol?
From: "Austin Franklin" <austin@dark98room.com>
Date: Tue, 18 Dec 2001 16:57:29 -0500

Typically, the schematic symbol isn't a "canned" symbol, it is a custom
symbol tailored to the function/pinout of the FPGA.  They don't take THAT
long to make if you know the tool...

Does anyone here **really** use a "canned" symbol for their FPGAs?  For a
PAL, yes, but an FPGA?

"Peter Fenn" <Peter.Fenn@avnet.com> wrote in message
news:ee73c6a.-1@WebX.sUN8CHnE...
> Spartan-IIE: I am urgently looking for a (board-level) schematic symbol
> (preferably ORCAD or VIEWLOGIC) for an XC2S100E-6FT256C Xilinx FPGA. Is
> anyone in a position to help on this?
> -Thanks in advance :-)

Article: 37667
Subject: Re: Kindergarten Stuff
From: "Austin Franklin" <austin@dark98room.com>
Date: Tue, 18 Dec 2001 17:04:15 -0500

Hi Peter,

"Peter Alfke" <peter.alfke@xilinx.com> wrote in message
news:3C1F8AEC.BFD2E067@xilinx.com...
> This is a friendly and helpful newsgroup, but let's make sure that it does not
> get abused.
> Lots of textbooks explain how to divide by a power of 2, where the remainder is,
> and how you sign-extend the MSB. Explaining that is not the purpose of this
> newsgroup.

Where does it say that?  I've never seen the charter, but I certainly
wouldn't turn away a question of how to do division in an FPGA because it's
in a textbook!

> Let's use our "bandwidth" for more complex and perhaps controversial questions
> that are not explained in textbooks and data books.

Why?  If someone doesn't like a particular discussion topic, then cripes,
just don't read it!

Regards,

Austin

Article: 37668
Subject: Re: Spartan-IIE schematic symbol?
From: Greg Neff <gregeneff@yahoo.com>
Date: Tue, 18 Dec 2001 17:14:29 -0500

On Tue, 18 Dec 2001 16:57:29 -0500, "Austin Franklin"
<austin@dark98room.com> wrote:

(snip)>
>Does anyone here **really** use a "canned" symbol for their FPGAs?  For a
>PAL, yes, but an FPGA?
>
(snip)

Not a chance.  With large pin counts we usually build a heterogeneous
symbol so that we can put different functional blocks of the FPGA on
different schematic sheets.  This makes the design easier to follow.
Unfortunately, it is a tedious manual process that has to be checked
and double-checked.

===================================
Greg Neff
VP Engineering
*Microsym* Computers Inc.
greg@guesswhichwordgoeshere.com

Article: 37669
Subject: Re: Kindergarten Stuff
From: "Bryan" <bryan@srccomp.com>
Date: Tue, 18 Dec 2001 15:37:20 -0700

So let's talk controversial....

If Lucent can support hard macros in Epic with hard routing, then why can't
Xilinx?  My application requires it and Xilinx doesn't support it in FPGA
editor (which was programmed by the same softies as Epic).  Oh, I remember
why they don't support it.  Because nobody cares about designs that push the
limitations of FPGAs.  Because everybody else that is making designs for
Xilinx parts is still in kindergarten finger painting with verilog and hdl.
Ha, I didn't get my EE degree to be a soft weirdo.  Anybody can throw code
together and get poor performance.

flame away kindergarten kids

Bryan

"Peter Alfke" <peter.alfke@xilinx.com> wrote in message
news:3C1F8AEC.BFD2E067@xilinx.com...
> This is a friendly and helpful newsgroup, but let's make sure that it does not
> get abused.
> Lots of textbooks explain how to divide by a power of 2, where the remainder is,
> and how you sign-extend the MSB. Explaining that is not the purpose of this
> newsgroup.
>
> Let's use our "bandwidth" for more complex and perhaps controversial questions
> that are not explained in textbooks and data books.
>
> Peter Alfke, Xilinx Applications
>
>

Article: 37670
Subject: Re: SPI interface in VHDL
From: Adam Hawes <hawe0006@infoeng.flinders.edu.au>
Date: Wed, 19 Dec 2001 09:22:00 +1030

> The Altera NIOS softcore processor comes with a flexible, parameterizable
> SPI interface module in VHDL or Verilog. The complete NIOS license with all
> tools, board and of course SPI is US-$ 995,-
> Check out:

For that matter, the XESS XS95 and XS40 boards (Xilinx-based) come with
SPI-compatible codecs on them (they are SPI, aren't they? Correct me if I'm
wrong).  Check out http://www.xess.com for the documentation.  There's source
for an SPI library that may do what you want.

Cheers,
Adam


Article: 37671
Subject: Divide by 3, with remainder, efficient and fast, for Altera or Xilinx
From: "Carl Brannen" <carl.brannen@terabeam.com>
Date: Tue, 18 Dec 2001 23:13:45 +0000 (UTC)

Let's see how long a VHDL chunk will fit in this forum...

library IEEE;
use IEEE.std_logic_1164.all;

-- Divide by 3 circuit example.  Uses only 85 LUTs
-- (up to 4-input) to compute the quotient and
-- remainder when a 32-bit input is divided by 3.
--
-- Designer:  Carl Brannen
--
-- Feel free to modify this circuit and use it in
-- your own designs. I am aware of no patents that
-- it infringes on, but you will have to make your
-- own determination of this.  My only request is
-- that you leave a comment to the effect that your
-- knowledge of the algorithm is through me.
--
-- Synthesize with optimize set for "low", and 
-- "area".  This circuit is already optimized,
-- the computer will only waste its time (and likely
-- increase the size and delay of the result) if it
-- tries to optimize further.
--
-- This code was written in response to this post
-- on the comp.arch.FPGA thread:
--
-- <<<
-- "I need to implement in an fpga an algorithm that will divide an integer
-- by 3.  The dividend length is still to be determined but will be
-- somewhere between 20 and 30 bits, and the divisor is always the number
-- 3.
-- 
-- Does anyone know an efficient combinatoric algorithm that can accomplish
-- this?"
-- >>>
--
-- http://www.fpga-faq.com/archives/11400.html#11409

entity DIV32_3 is
    port (
        CLK:    in  STD_LOGIC;
        AIN:    in  STD_LOGIC_VECTOR(31 downto 0);
        REMOUT: out STD_LOGIC_VECTOR( 1 downto 0);
        QOUT:   out STD_LOGIC_VECTOR(30 downto 0);
        TEST:   out STD_LOGIC_VECTOR(53 downto 0)
    );
end DIV32_3;

architecture DIV32_3_arch of DIV32_3 is

-- Partial remainders:
signal R1V:  STD_LOGIC_VECTOR(15 downto 0);  --  16 LUTs
signal R2V:  STD_LOGIC_VECTOR( 7 downto 0);  --   8 LUTs
signal R4V:  STD_LOGIC_VECTOR( 3 downto 0);  --   4 LUTs
signal R8V:  STD_LOGIC_VECTOR( 1 downto 0);  --   2 LUTs
signal P4V:  STD_LOGIC_VECTOR( 1 downto 0);  --   2 LUTs
signal P3V:  STD_LOGIC_VECTOR( 1 downto 0);  --   2 LUTs
signal P2V:  STD_LOGIC_VECTOR( 1 downto 0);  --   2 LUTs
signal P1V:  STD_LOGIC_VECTOR( 1 downto 0);  --   2 LUTs
signal P0V:  STD_LOGIC_VECTOR( 1 downto 0);  --   2 LUTs

-- Rearrangement of partial remainders only:
signal PRV:  STD_LOGIC_VECTOR(15 downto 0);  --   0 LUTs (wiring only)

-- carries internal to blocks:
signal X0V:  STD_LOGIC_VECTOR(13 downto 0);  --  14 LUTs

-- Flip-flop for QOUT:
signal QOUTQ,QOUTD: STD_LOGIC_VECTOR(30 downto 0);  --  31 LUTs

-- Flip-flop for Remainder:
signal REMQ,REMD:   STD_LOGIC_VECTOR( 1 downto 0);


--                                           --  -------

-- Total LUT count:                          --  85 LUTs (45 slices or 23 CLBs)

-- Force tool to not "optimize" (i.e. bloat) the design
-- by creating a set of flip-flop outputs.
signal FKD,FKQ: STD_LOGIC_VECTOR(53 downto 0);


begin

-- Scheme for quickest determination of remainder when dividing by 3.
-- Example, 32-bit input, R80 provides the 2-bit remainder result in
-- just 4 stages of 4-input LUTs:
--
--                   AIN
-- 3322 2222 2222 1111 1111 1100 0000 0000
-- 1098 7654 3210 9876 5432 1098 7654 3210
-- ---- ---- ---- ---- ---- ---- ---- ----
--  R17  R16  R15  R14  R13  R12  R11  R10
--   \   /     \   /     \   /     \   /
--    R23       R22       R21       R20
--        \    /              \    /
--         R41                 R40
--            -----\    /-----
--                  R80
--
-- Scheme for computing quotients from the above
-- remainder scheme.  More partial remainders have
-- to be computed, as compared to the above remainders,
-- these ones are called PRs.
--
-- The quotient for the highest four bits is computed
-- directly from (no greater than 4-input) LUTs.  The
-- lower quotients all require a remainder input.  That
-- remainder allows direct computation of the quotient
-- for the high two bits, and a partial remainder needs
-- to be computed to get the lower 2 bits as well.  The
-- following diagram suppresses Rxx that aren't used, and
-- only shows how the PRs are calculated.  The lowest
-- value in each column gives the Rxx or PRx that computes
-- the partial remainder at that column:
--
--                   AIN
-- 3322 2222 2222 1111 1111 1100 0000 0000
-- 1098 7654 3210 9876 5432 1098 7654 3210
-- ---- ---- ---- ---- ---- ---- ---- ----
--  R17       R15       R13       R11
--             |         |         |
--            P4         |   R21   |
--           /           |    |  \ |
--       R23             P3   P2   P1
--                      /     /    |
--                     /     /     P0
--                    /-----/-----/
--                 R41
--
-- ---- ---- ---- ---- ---- ---- ---- ----
--  R17  R23  P4   R41  P3   P2   P0   R8
--  PR7  PR6  PR5  PR4  PR3  PR2  PR1  PR0
--
--
-- From the above tables, it's clear that the
-- longest computation is that of the quotient
-- at position 0, as would be expected.  The
-- number of stages of logic is only 6, and the
-- longest paths are as follows:
--
-- R17 R16 R15 R14 R13 R12
--   \  /    \  /    \  /
--   R23     R22     R21 
--      \    /         \ 
--       R41            P1
--            \     /
--              P0
--               |
--              Remainder[3:2]
--               |
--              Quotient[1:0]
--
--
-- In order to make the VHDL shorter, I've packed
-- the remainders into longer STD_LOGIC_VECTORs
-- as follows:
-- R1V <= R17 & R16 & ... R10
-- R2V <= R23 & R22 & R21 & R20
-- R4V <= R41 & R40
-- R8V <= R80
--
--
-- I normally don't like to complicate things any more
-- than they have to, but I hate to have to create all
-- those unnecessary "SEL" assignments.

-- If Xilinx would support a select statement like this:
--
--     with AIN(4*I+3 downto 4*I)
--
-- I wouldn't have to do this this way, but this is the first
-- way to implement this that comes to mind.  I guess I could
-- define LUT4s, since none of these are trivial, but that wouldn't
-- port to Altera.

-- Generate the R1x logic (16 LUTs)
G1: for I in 0 to 7 generate
R1V(I*2 + 0)
 <= ((not AIN(4*I+3)) and (not AIN(4*I+2)) and (not AIN(4*I+1)) and (
AIN(4*I+0)))
 or ((not AIN(4*I+3)) and (    AIN(4*I+2)) and (not AIN(4*I+1)) and (not
AIN(4*I+0)))
 or ((not AIN(4*I+3)) and (    AIN(4*I+2)) and (    AIN(4*I+1)) and (
AIN(4*I+0)))
 or ((    AIN(4*I+3)) and (not AIN(4*I+2)) and (    AIN(4*I+1)) and (not
AIN(4*I+0)))
 or ((    AIN(4*I+3)) and (    AIN(4*I+2)) and (not AIN(4*I+1)) and (
AIN(4*I+0)));
R1V(I*2 + 1)
 <= ((not AIN(4*I+3)) and (not AIN(4*I+2)) and (    AIN(4*I+1)) and (not
AIN(4*I+0)))
 or ((not AIN(4*I+3)) and (    AIN(4*I+2)) and (not AIN(4*I+1)) and (
AIN(4*I+0)))
 or ((    AIN(4*I+3)) and (not AIN(4*I+2)) and (not AIN(4*I+1)) and (not
AIN(4*I+0)))
 or ((    AIN(4*I+3)) and (not AIN(4*I+2)) and (    AIN(4*I+1)) and (
AIN(4*I+0)))
 or ((    AIN(4*I+3)) and (    AIN(4*I+2)) and (    AIN(4*I+1)) and (not
AIN(4*I+0)));
end generate;

-- Generate the R2x logic (8 LUTs)
G2: for I in 0 to 3 generate
R2V(I*2 + 0)
 <= ((not R1V(4*I+3)) and (not R1V(4*I+2)) and (not R1V(4*I+1)) and (
R1V(4*I+0)))
 or ((not R1V(4*I+3)) and (    R1V(4*I+2)) and (not R1V(4*I+1)) and (not
R1V(4*I+0)))
 or ((not R1V(4*I+3)) and (    R1V(4*I+2)) and (    R1V(4*I+1)) and (
R1V(4*I+0)))
 or ((    R1V(4*I+3)) and (not R1V(4*I+2)) and (    R1V(4*I+1)) and (not
R1V(4*I+0)))
 or ((    R1V(4*I+3)) and (    R1V(4*I+2)) and (not R1V(4*I+1)) and (
R1V(4*I+0)));
R2V(I*2 + 1)
 <= ((not R1V(4*I+3)) and (not R1V(4*I+2)) and (    R1V(4*I+1)) and (not
R1V(4*I+0)))
 or ((not R1V(4*I+3)) and (    R1V(4*I+2)) and (not R1V(4*I+1)) and (
R1V(4*I+0)))
 or ((    R1V(4*I+3)) and (not R1V(4*I+2)) and (not R1V(4*I+1)) and (not
R1V(4*I+0)))
 or ((    R1V(4*I+3)) and (not R1V(4*I+2)) and (    R1V(4*I+1)) and (
R1V(4*I+0)))
 or ((    R1V(4*I+3)) and (    R1V(4*I+2)) and (    R1V(4*I+1)) and (not
R1V(4*I+0)));
end generate;

-- Generate the R4x logic (4 LUTs)
G4: for I in 0 to 1 generate
R4V(I*2 + 0)
 <= ((not R2V(4*I+3)) and (not R2V(4*I+2)) and (not R2V(4*I+1)) and (
R2V(4*I+0)))
 or ((not R2V(4*I+3)) and (    R2V(4*I+2)) and (not R2V(4*I+1)) and (not
R2V(4*I+0)))
 or ((not R2V(4*I+3)) and (    R2V(4*I+2)) and (    R2V(4*I+1)) and (
R2V(4*I+0)))
 or ((    R2V(4*I+3)) and (not R2V(4*I+2)) and (    R2V(4*I+1)) and (not
R2V(4*I+0)))
 or ((    R2V(4*I+3)) and (    R2V(4*I+2)) and (not R2V(4*I+1)) and (
R2V(4*I+0)));
R4V(I*2 + 1)
 <= ((not R2V(4*I+3)) and (not R2V(4*I+2)) and (    R2V(4*I+1)) and (not
R2V(4*I+0)))
 or ((not R2V(4*I+3)) and (    R2V(4*I+2)) and (not R2V(4*I+1)) and (
R2V(4*I+0)))
 or ((    R2V(4*I+3)) and (not R2V(4*I+2)) and (not R2V(4*I+1)) and (not
R2V(4*I+0)))
 or ((    R2V(4*I+3)) and (not R2V(4*I+2)) and (    R2V(4*I+1)) and (
R2V(4*I+0)))
 or ((    R2V(4*I+3)) and (    R2V(4*I+2)) and (    R2V(4*I+1)) and (not
R2V(4*I+0)));
end generate;

-- The R80 logic: (2 LUTs)
R8V(0)
 <= ((not R4V(3)) and (not R4V(2)) and (not R4V(1)) and (    R4V(0)))
 or ((not R4V(3)) and (    R4V(2)) and (not R4V(1)) and (not R4V(0)))
 or ((not R4V(3)) and (    R4V(2)) and (    R4V(1)) and (    R4V(0)))
 or ((    R4V(3)) and (not R4V(2)) and (    R4V(1)) and (not R4V(0)))
 or ((    R4V(3)) and (    R4V(2)) and (not R4V(1)) and (    R4V(0)));
R8V(1)
 <= ((not R4V(3)) and (not R4V(2)) and (    R4V(1)) and (not R4V(0)))
 or ((not R4V(3)) and (    R4V(2)) and (not R4V(1)) and (    R4V(0)))
 or ((    R4V(3)) and (not R4V(2)) and (not R4V(1)) and (not R4V(0)))
 or ((    R4V(3)) and (not R4V(2)) and (    R4V(1)) and (    R4V(0)))
 or ((    R4V(3)) and (    R4V(2)) and (    R4V(1)) and (not R4V(0)));

-- P4 = R23 # R15 (2 LUTs)
P4V(0)
 <= ((not R2V(7)) and (not R2V(6)) and (not R1V(11)) and (    R1V(10)))
 or ((not R2V(7)) and (    R2V(6)) and (not R1V(11)) and (not R1V(10)))
 or ((not R2V(7)) and (    R2V(6)) and (    R1V(11)) and (    R1V(10)))
 or ((    R2V(7)) and (not R2V(6)) and (    R1V(11)) and (not R1V(10)))
 or ((    R2V(7)) and (    R2V(6)) and (not R1V(11)) and (    R1V(10)));
P4V(1)
 <= ((not R2V(7)) and (not R2V(6)) and (    R1V(11)) and (not R1V(10)))
 or ((not R2V(7)) and (    R2V(6)) and (not R1V(11)) and (    R1V(10)))
 or ((    R2V(7)) and (not R2V(6)) and (not R1V(11)) and (not R1V(10)))
 or ((    R2V(7)) and (not R2V(6)) and (    R1V(11)) and (    R1V(10)))
 or ((    R2V(7)) and (    R2V(6)) and (    R1V(11)) and (not R1V(10)));

-- P3 = R41 # R13 (2 LUTs)
P3V(0)
 <= ((not R4V(3)) and (not R4V(2)) and (not R1V(7)) and (    R1V(6)))
 or ((not R4V(3)) and (    R4V(2)) and (not R1V(7)) and (not R1V(6)))
 or ((not R4V(3)) and (    R4V(2)) and (    R1V(7)) and (    R1V(6)))
 or ((    R4V(3)) and (not R4V(2)) and (    R1V(7)) and (not R1V(6)))
 or ((    R4V(3)) and (    R4V(2)) and (not R1V(7)) and (    R1V(6)));
P3V(1)
 <= ((not R4V(3)) and (not R4V(2)) and (    R1V(7)) and (not R1V(6)))
 or ((not R4V(3)) and (    R4V(2)) and (not R1V(7)) and (    R1V(6)))
 or ((    R4V(3)) and (not R4V(2)) and (not R1V(7)) and (not R1V(6)))
 or ((    R4V(3)) and (not R4V(2)) and (    R1V(7)) and (    R1V(6)))
 or ((    R4V(3)) and (    R4V(2)) and (    R1V(7)) and (not R1V(6)));

-- P2 = R41 # R21 (2 LUTs)
P2V(0)
 <= ((not R4V(3)) and (not R4V(2)) and (not R2V(3)) and (    R2V(2)))
 or ((not R4V(3)) and (    R4V(2)) and (not R2V(3)) and (not R2V(2)))
 or ((not R4V(3)) and (    R4V(2)) and (    R2V(3)) and (    R2V(2)))
 or ((    R4V(3)) and (not R4V(2)) and (    R2V(3)) and (not R2V(2)))
 or ((    R4V(3)) and (    R4V(2)) and (not R2V(3)) and (    R2V(2)));
P2V(1)
 <= ((not R4V(3)) and (not R4V(2)) and (    R2V(3)) and (not R2V(2)))
 or ((not R4V(3)) and (    R4V(2)) and (not R2V(3)) and (    R2V(2)))
 or ((    R4V(3)) and (not R4V(2)) and (not R2V(3)) and (not R2V(2)))
 or ((    R4V(3)) and (not R4V(2)) and (    R2V(3)) and (    R2V(2)))
 or ((    R4V(3)) and (    R4V(2)) and (    R2V(3)) and (not R2V(2)));

-- P1 = R11 # R21 (2 LUTs)
P1V(0)
 <= ((not R1V(3)) and (not R1V(2)) and (not R2V(3)) and (    R2V(2)))
 or ((not R1V(3)) and (    R1V(2)) and (not R2V(3)) and (not R2V(2)))
 or ((not R1V(3)) and (    R1V(2)) and (    R2V(3)) and (    R2V(2)))
 or ((    R1V(3)) and (not R1V(2)) and (    R2V(3)) and (not R2V(2)))
 or ((    R1V(3)) and (    R1V(2)) and (not R2V(3)) and (    R2V(2)));
P1V(1)
 <= ((not R1V(3)) and (not R1V(2)) and (    R2V(3)) and (not R2V(2)))
 or ((not R1V(3)) and (    R1V(2)) and (not R2V(3)) and (    R2V(2)))
 or ((    R1V(3)) and (not R1V(2)) and (not R2V(3)) and (not R2V(2)))
 or ((    R1V(3)) and (not R1V(2)) and (    R2V(3)) and (    R2V(2)))
 or ((    R1V(3)) and (    R1V(2)) and (    R2V(3)) and (not R2V(2)));

-- P0 = R41 # P1 (2 LUTs)
P0V(0)
 <= ((not R4V(3)) and (not R4V(2)) and (not P1V(1)) and (    P1V(0)))
 or ((not R4V(3)) and (    R4V(2)) and (not P1V(1)) and (not P1V(0)))
 or ((not R4V(3)) and (    R4V(2)) and (    P1V(1)) and (    P1V(0)))
 or ((    R4V(3)) and (not R4V(2)) and (    P1V(1)) and (not P1V(0)))
 or ((    R4V(3)) and (    R4V(2)) and (not P1V(1)) and (    P1V(0)));
P0V(1)
 <= ((not R4V(3)) and (not R4V(2)) and (    P1V(1)) and (not P1V(0)))
 or ((not R4V(3)) and (    R4V(2)) and (not P1V(1)) and (    P1V(0)))
 or ((    R4V(3)) and (not R4V(2)) and (not P1V(1)) and (not P1V(0)))
 or ((    R4V(3)) and (not R4V(2)) and (    P1V(1)) and (    P1V(0)))
 or ((    R4V(3)) and (    R4V(2)) and (    P1V(1)) and (not P1V(0)));

-- Assemble the partial remainders into the inputs for the quotient
-- calculations:
PRV(15 downto 0)         --      Remainder of 
  <= R1V(15 downto 14)   -- R17  AIN[31:28]
  &  R2V( 7 downto  6)   -- R23  AIN[31:24]
  &  P4V( 1 downto  0)   -- P4   AIN[31:20]
  &  R4V( 3 downto  2)   -- R41  AIN[31:16]
  &  P3V( 1 downto  0)   -- P3   AIN[31:12]
  &  P2V( 1 downto  0)   -- P2   AIN[31:8]
  &  P0V( 1 downto  0)   -- P0   AIN[31:4]
  &  R8V( 1 downto  0);  -- R8   AIN[31:0]

-- The highest quotient block has no remainder coming in,
-- so compute it directly: (3 LUTs)
with AIN(31 downto 28) select
  QOUTD(30 downto 28) <=
    "000" when "0000" | "0001" | "0010",
    "001" when "0011" | "0100" | "0101",
    "010" when "0110" | "0111" | "1000",
    "011" when "1001" | "1010" | "1011",
    "100" when "1100" | "1101" | "1110",
    "101" when others;

-- 0000 00
-- 0001 00
-- 0010 00

-- 0011 01
-- 0100 01
-- 0101 01

-- 0110 10
-- 0111 10
-- 1000 10

-- 1001 11
-- 1010 11
-- 1011 11


-- Compute the quotient in blocks of 4 bits
Q: for I in 0 to 6 generate
  -- Top two bits are computed directly from the carry-in and AIN
 QOUTD(4*I + 2)
  <= ((not PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (not AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (not AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (not AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (
AIN(4*I+2)));
 QOUTD(4*I + 3)
  <= ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (    AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (    AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (not AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (not AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (
AIN(4*I+2)));
  -- I need to compute the remainder out of the top two bits for the lower
  -- two bits
 X0V(2*I + 0)
  <= ((not PRV(2*I+3)) and (not PRV(2*I+2)) and (not AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (not AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (    AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (    PRV(2*I+2)) and (not AIN(4*I+3)) and (
AIN(4*I+2)));
 X0V(2*I + 1)
  <= ((not PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((not PRV(2*I+3)) and (    PRV(2*I+2)) and (not AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (not AIN(4*I+3)) and (not
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (not PRV(2*I+2)) and (    AIN(4*I+3)) and (
AIN(4*I+2)))
  or ((    PRV(2*I+3)) and (    PRV(2*I+2)) and (    AIN(4*I+3)) and (not
AIN(4*I+2)));
  -- Now I can compute the lowest two bits:
 QOUTD(4*I + 0)
  <= ((not X0V(2*I+1)) and (not X0V(2*I+0)) and (    AIN(4*I+1)) and (
AIN(4*I+0)))
  or ((not X0V(2*I+1)) and (    X0V(2*I+0)) and (not AIN(4*I+1)) and (not
AIN(4*I+0)))
  or ((not X0V(2*I+1)) and (    X0V(2*I+0)) and (not AIN(4*I+1)) and (
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (not AIN(4*I+1)) and (
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (    AIN(4*I+1)) and (not
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (    AIN(4*I+1)) and (
AIN(4*I+0)));
 QOUTD(4*I + 1)
  <= ((not X0V(2*I+1)) and (    X0V(2*I+0)) and (    AIN(4*I+1)) and (not
AIN(4*I+0)))
  or ((not X0V(2*I+1)) and (    X0V(2*I+0)) and (    AIN(4*I+1)) and (
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (not AIN(4*I+1)) and (not
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (not AIN(4*I+1)) and (
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (    AIN(4*I+1)) and (not
AIN(4*I+0)))
  or ((    X0V(2*I+1)) and (not X0V(2*I+0)) and (    AIN(4*I+1)) and (
AIN(4*I+0)));
end generate;


-- The following flip-flop registering of all the partial logic is included
-- only to prevent the synthesis tool from optimizing out the beautiful logic
-- I've created.
FKD(53 downto 0) <= R1V & R2V & R4V & R8V & P4V & P3V & P2V & P1V & P0V & X0V;

-- Register the remainder
REMD(1 downto 0) <= PRV(1 downto 0);

process (CLK)
begin
  if CLK'event and CLK='1' then
    FKQ   <= FKD;
    QOUTQ <= QOUTD;
    REMQ  <= REMD;
  end if;
end process;


-- Output assignment:

REMOUT <= REMQ(1 downto 0);
QOUT   <= QOUTQ(30 downto 0);

-- Test output (unused, but included for synthesis restriction).
TEST   <= FKQ;

end DIV32_3_arch;
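
A software cross-check of the scheme above can be handy before synthesis. The following Python sketch is not part of the posted VHDL. It is a behavioural model written on the assumption that each LUT pair holds a value mod 3 and that the "#" combine step amounts to addition mod 3, which holds because every field is a multiple of 4 bits wide and 16 mod 3 = 1. The names rem3, combine and div32_3 are made up for illustration; the node names in the comments follow the diagrams in the VHDL.

```python
def rem3(nibble):
    """Remainder of a 4-bit value mod 3 (what one R1x LUT pair computes)."""
    return nibble % 3


def combine(hi, lo):
    """The '#' step: remainder of two adjacent fields from their remainders.

    The upper field is weighted by a power of 16, and 16 % 3 == 1, so the
    weight drops out and the two remainders simply add mod 3.
    """
    return (hi + lo) % 3


def div32_3(ain):
    """Return (quotient, remainder) of a 32-bit unsigned value divided by 3."""
    assert 0 <= ain < (1 << 32)
    nib = [(ain >> (4 * i)) & 0xF for i in range(8)]  # nib[7] is AIN[31:28]

    # Remainder tree, using the node names from the VHDL comments.
    r1 = [rem3(n) for n in nib]                                  # R10..R17
    r2 = [combine(r1[2 * j + 1], r1[2 * j]) for j in range(4)]   # R20..R23
    r4 = [combine(r2[2 * j + 1], r2[2 * j]) for j in range(2)]   # R40, R41
    r8 = combine(r4[1], r4[0])                                   # R80
    p4 = combine(r2[3], r1[5])   # P4 = R23 # R15 -> AIN[31:20]
    p3 = combine(r4[1], r1[3])   # P3 = R41 # R13 -> AIN[31:12]
    p2 = combine(r4[1], r2[1])   # P2 = R41 # R21 -> AIN[31:8]
    p1 = combine(r1[1], r2[1])   # P1 = R11 # R21 -> AIN[15:4]
    p0 = combine(r4[1], p1)      # P0 = R41 # P1  -> AIN[31:4]

    # pr[i] is the remainder of AIN[31:4*i], i.e. PR0..PR7.
    pr = [r8, p0, p2, p3, r4[1], p4, r2[3], r1[7]]

    # Top nibble has no remainder coming in, so divide it directly.
    q = (nib[7] // 3) << 28

    # Lower nibbles: two bits at a time, each step carrying a remainder 0..2.
    for i in range(7):
        carry = pr[i + 1]
        q_hi, x0 = divmod(carry * 4 + (nib[i] >> 2), 3)  # QOUTD(4i+3:4i+2), X0
        q_lo = (x0 * 4 + (nib[i] & 0x3)) // 3            # QOUTD(4i+1:4i+0)
        q |= ((q_hi << 2) | q_lo) << (4 * i)
    return q, pr[0]
```

Comparing div32_3(x) against divmod(x, 3) over a batch of 32-bit values is a cheap way to generate test vectors for a VHDL testbench.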



Article: 37672
Subject: Re: Kindergarten Stuff
From: "Austin Franklin" <austin@dark09room.com>
Date: Tue, 18 Dec 2001 18:25:53 -0500

> Anybody can throw code
> together and get poor performance.

Wait...this isn't a Microsoft news group, is it?

;-)

Article: 37673
Subject: You take the low road and I'll ......
From: Austin Lesea <austin.lesea@xilinx.com>
Date: Tue, 18 Dec 2001 16:11:51 -0800

Bryan,

Reminds me of the Dilbert Cartoon where they are telling tales of their early
programming years...

"I remember using assembly code..."

"That is nothing, I remember using 1's and 0's...."

"You had zeroes?  Wow, we had to use lower case 'l's and upper case 'ohs'..."

"Bunch of babies, I only had 1's!"

Why we as engineers would enjoy pain, and brag about it, still amazes me.

A design that is well architected, self-documented, commented, and reliable is
more important to many customers.  I prefer to throw all of my energies into
supporting those designs (in HDLs), which now account for 99% of what is being
done out there.

Austin



Bryan wrote:

> So let's talk controversial....
>
> If Lucent can support hard macros in Epic with hard routing, then why can't
> Xilinx?  My application requires it and Xilinx doesn't support it in FPGA
> editor (which was programmed by the same softies as Epic).  Oh, I remember
> why they don't support it.  Because nobody cares about designs that push the
> limitations of FPGAs.  Because everybody else that is making designs for
> Xilinx parts is still in kindergarten finger painting with verilog and hdl.
> Ha, I didn't get my EE degree to be a soft weirdo.  Anybody can throw code
> together and get poor performance.
>
> flame away kindergarten kids
>
> Bryan
>
> "Peter Alfke" <peter.alfke@xilinx.com> wrote in message
> news:3C1F8AEC.BFD2E067@xilinx.com...
> > This is a friendly and helpful newsgroup, but let's make sure that it does not
> > get abused.
> > Lots of textbooks explain how to divide by a power of 2, where the remainder is,
> > and how you sign-extend the MSB. Explaining that is not the purpose of this
> > newsgroup.
> >
> > Let's use our "bandwidth" for more complex and perhaps controversial questions
> > that are not explained in textbooks and data books.
> >
> > Peter Alfke, Xilinx Applications
> >
> >

Article: 37674
Subject: Re: Kindergarten Stuff
From: Bret Wade <bret.wade@xilinx.com>
Date: Tue, 18 Dec 2001 17:13:54 -0700

Hello Bryan,

FPGA Editor and the other implementation tools do support routed hard macros.
Hard macros aren't widely used because of the limitations in the timing analysis
tools in dealing with the macros, but the support is there.

Regards,
Bret Wade
Xilinx Product Applications

Bryan wrote:

> So let's talk controversial....
>
> If Lucent can support hard macros in Epic with hard routing, then why can't
> Xilinx?  My application requires it and Xilinx doesn't support it in FPGA
> editor (which was programmed by the same softies as Epic).  Oh, I remember
> why they don't support it.  Because nobody cares about designs that push the
> limitations of FPGAs.  Because everybody else that is making designs for
> Xilinx parts is still in kindergarten finger painting with verilog and hdl.
> Ha, I didn't get my EE degree to be a soft weirdo.  Anybody can throw code
> together and get poor performance.
>
> flame away kindergarten kids
>
> Bryan
>
> "Peter Alfke" <peter.alfke@xilinx.com> wrote in message
> news:3C1F8AEC.BFD2E067@xilinx.com...
> > This is a friendly and helpful newsgroup, but let's make sure that it does not
> > get abused.
> > Lots of textbooks explain how to divide by a power of 2, where the remainder is,
> > and how you sign-extend the MSB. Explaining that is not the purpose of this
> > newsgroup.
> >
> > Let's use our "bandwidth" for more complex and perhaps controversial questions
> > that are not explained in textbooks and data books.
> >
> > Peter Alfke, Xilinx Applications
> >
> >
