Messages from 37725

Article: 37725
Subject: Re: Default Should Be "Inputs and Outputs" For IOBs
From: "S. Ramirez" <sramirez@cfl.rr.com>
Date: Wed, 19 Dec 2001 17:28:59 GMT


"David Miller" <spam@quartz.net.nz> wrote in message
news:3C20222D.4040601@quartz.net.nz...
> M. Ramirez's question still holds good -- is there ever a reason not to
> pack flops into IOBs?

David,
    Even if there were a reason, and there always is, wouldn't it be good for
the default to be "Inputs and Outputs"?  Us crusty guys know better, but how
many newbies fire up the tools not knowing EVERY detail and overlook this
one?  Since IOB flip flops are a freebie and improve offset timing, why not
use them as the default?  Just a suggestion.
Simon

Article: 37726
Subject: Re: Kindergarten Stuff
From: Bret Wade <bret.wade@xilinx.com>
Date: Wed, 19 Dec 2001 11:00:22 -0700

Hello Bryan,

It sounds as though you're running into some limitations with manual routing in
FPGA Editor rather than a lack of support for routed hard macros. As David
mentioned, you may have more success by reusing automatic routing, either by
using routed hard macros or the new directed routing feature. The directed
routing feature has the advantage of not introducing any limitation wrt timing
analysis. If you feel that manual routing in FED is broken, I suggest that you
contact the hotline and ask them to log a CR for you.

Regards,
Bret Wade
Xilinx Product Applications

Bryan wrote:

> Ok, this is great.  Two other times I posted to this group about making hard
> macros and got no response.  The low road paid off it seems.  I really want
> to get these hard macros with locked routing to work.  What I see when I try
> to make a hard macro is that I cannot manually route the nets in editor.  I
> want to hand route the clock signal on local routing and hand route an
> enable.  The first thing that happens is I start routing the clock through a
> switchbox from a normal IBUF, works fine.  Then I try to re-select the input
> site to the switchbox to make a split in the routing.  I cannot select the
> input bubble again and if I just select the net and a new destination out of
> the switchbox it complains.  I have a routed design open in editor right
> next to the one I am hand routing.  Since I couldn't get this to work, I
> decided to first see if I could route the line by hand like the router did.
> I know I am selecting usable sites because I am cheating off a routed
> design.  So how do I re-select that bubble to continue hand routing?
>
> From what I have seen on the group most people are letting the tool route
> the "macro" then going into editor and turning the ncd to a macro.  Then
> lock routing.  What I have been doing is placement only and then open the
> ncd to hand route my couple nets.  What I am building into a macro is a 16
> bit FIFO, I have 16 of these FIFOs in the design and each one contains IOB
> latches.  Since they contain IOB latches none of them have the identical IOB
> placement because of prohibit sites.  If I hand route I can keep all of the
> clock tree in the logic portion the same and only change which IOBs it goes
> to for each macro.
>
> Any help greatly appreciated, but if you can answer this why can't my Xilinx
> FAE?

Because I used to support this stuff for NeoCAD.

>
>
> Bryan
>
> "Bret Wade" <bret.wade@xilinx.com> wrote in message
> news:3C1FDBC2.A88FF9B8@xilinx.com...
> > Hello Bryan,
> >
> > FPGA Editor and the other implementation tools do support routed hard
> > macros. Hard macros aren't widely used because of the limitations in the
> > timing analysis tools in dealing with the macros, but the support is there.
> >
> > Regards,
> > Bret Wade
> > Xilinx Product Applications
> >
> > Bryan wrote:
> >
> > > So let's talk controversial....
> > >
> > > If Lucent can support hard macros in Epic with hard routing, then why
> > > can't Xilinx?  My application requires it and Xilinx doesn't support it
> > > in FPGA editor (which was programmed by the same softies as Epic).  Oh, I
> > > remember why they don't support it.  Because nobody cares about designs
> > > that push the limitations of FPGAs.  Because everybody else that is
> > > making designs for Xilinx parts is still in kindergarten finger painting
> > > with verilog and hdl.  Ha, I didn't get my EE degree to be a soft weirdo.
> > > Anybody can throw code together and get poor performance.
> > >
> > > flame away kindergarten kids
> > >
> > > Bryan
> > >
> > > "Peter Alfke" <peter.alfke@xilinx.com> wrote in message
> > > news:3C1F8AEC.BFD2E067@xilinx.com...
> > > > This is a friendly and helpful newsgroup, but let's make sure that it
> > > > does not get abused.
> > > > Lots of textbooks explain how to divide by a power of 2, where the
> > > > remainder is, and how you sign-extend the MSB. Explaining that is not
> > > > the purpose of this newsgroup.
> > > >
> > > > Let's use our "bandwidth" for more complex and perhaps controversial
> > > > questions that are not explained in textbooks and data books.
> > > >
> > > > Peter Alfke, Xilinx Applications
> > > >
> > > >
> >

Article: 37727
Subject: Re: Default Should Be "Inputs and Outputs" For IOBs
From: Bret Wade <bret.wade@xilinx.com>
Date: Wed, 19 Dec 2001 11:09:53 -0700

David Miller wrote:

> > ALWAYS
> >
> >>want my designs to use IOB flip flops if possible.  It seems to me that
>
> > That's what you get for using Design Mangler...er...Manager ;-)
>
> heh.  I find that make does a fair job of managing builds.  But then, I
> always did find CLIs more user friendly than GUIs.
>
> Even if you invoke map from the commandline or means other than through
> DM, packing flops into I/Os is not done unless the -pr flag is supplied.
>   So I suppose DM is following the defaults of map.
>
> M. Ramirez's question still holds good -- is there ever a reason not to
> pack flops into IOBs?
>

I think that packing registers is not the default map option because the
expectation is that registers will have IOB=TRUE|FALSE attributes applied to
them by the front end tool. This attribute takes precedence over the -pr map
switch and allows for individual control of registers.

Regards,
Bret Wade
Xilinx Product Applications

Article: 37728
Subject: Re: How can I reduce Spartan-II routing delays to meet 33MHz PCI's Tsu <
From: Eric Crabill <eric.crabill@xilinx.com>
Date: Wed, 19 Dec 2001 10:17:33 -0800


Hi,

Have you considered changing your logic design?  I can't say
for sure from reading your timing report, but it looks like you
are using FRAME# and IRDY#, through some combinational logic,
to enable the output flip flops for the AD bus.

I can imagine several places where you would do this; one is
the logic that clocks out the address for your initiator when
the bus is idle and you have GNT#.  This would clock out the
address for the address phase(s).

You can eliminate this path entirely by continually clocking
out an address, even if it is invalid, because that address
will not be driven on the bus (i.e. it doesn't matter what is
in the output flops if the tristate lines are high...)

I prefer to think of this problem (the one of the output clock
enables for the data path) as "When do I want to HALT outgoing
data?" instead of trying to solve the more natural problem of
when you should actually be enabling it.  Use the fact that
the datapath will be tristated to your advantage.

You will still have to solve the problem of when to turn off
the tristates, though.

Eric

Kevin Brace wrote:
> 
> Hi, I would like to know if someone knows the strategies on how to reduce
> routing (net) delays for Spartan-II.
> So far, I treated synthesis tool(XST)/Map/Par as a blackbox, but because
> my design (a PCI IP core) was not meeting Tsu (Tsu < 7ns), I started to
> take a closer look of how LUTs are placed on the FPGA.
> Using Floorplanner, I saw the LUTs being placed all over the FPGA, so I
> decided to hand place the LUTs using UCF flow.
> That was the most effective thing I did to reduce interconnect delay
> (reduced the worst interconnect delay by about 2.7 ns (11 ns down to 8.3
> ns)), but unfortunately, I still have to reduce the interconnect delay
> by 1.3 ns (worst Tsu currently at 8.3 ns).
> Basically, I have two input signals, FRAME# and IRDY# that are not
> meeting timings.
> Here are two of the worst violators for FRAME# and IRDY#,
> respectively.
> 
> ________________________________________________________________________________
> 
> ================================================================================
>  Timing constraint: COMP "frame_n" OFFSET = IN 7 nS  BEFORE COMP "clk" ;
> 
>  503 items analyzed, 61 timing errors detected.
>  Minimum allowable offset is   8.115ns.
> 
> --------------------------------------------------------------------------------
> Slack:                  -1.115ns (requirement - (data path - clock path - clock arrival))
>   Source:               frame_n
>   Destination:          PCI_IP_Core_Instance_ad_Port_2
>   Destination Clock:    clk_BUFGP rising at 0.000ns
>   Requirement:          7.000ns
>   Data Path Delay:      10.556ns (Levels of Logic = 6)
>   Clock Path Delay:     2.441ns (Levels of Logic = 2)
>   Timing Improvement Wizard
>   Data Path: frame_n to PCI_IP_Core_Instance_ad_Port_2
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tiopi                 1.224   frame_n
>                                   frame_n_IBUF
>     net (fanout=45)       0.591   frame_n_IBUF
>     Tilo                  0.653   PCI_IP_Core_Instance_I_25_LUT_7
>     net (fanout=3)        0.683   N21918
>     Tbxx                  0.981   PCI_IP_Core_Instance_I_XXL_1357_1
>     net (fanout=15)       2.352   PCI_IP_Core_Instance_I_XXL_1357_1
>     Tilo                  0.653   PCI_IP_Core_Instance_I_125_LUT_17
>     net (fanout=1)        0.749   PCI_IP_Core_Instance_N3059
>     Tilo                  0.653   PCI_IP_Core_Instance_I__n0055
>     net (fanout=1)        0.809   PCI_IP_Core_Instance_N3069
>     Tioock                1.208   PCI_IP_Core_Instance_ad_Port_2
>     ----------------------------  ------------------------------
>     Total                10.556ns (5.372ns logic, 5.184ns route)
>                                   (50.9% logic, 49.1% route)
> 
>    Clock Path: clk to PCI_IP_Core_Instance_ad_Port_2
>    Delay type         Delay(ns)  Logical Resource(s)
>    ----------------------------  -------------------
>     Tgpio                 1.082   clk
>                                   clk_BUFGP/IBUFG
>     net (fanout=1)        0.007   clk_BUFGP/IBUFG
>     Tgio                  0.773   clk_BUFGP/BUFG
>     net (fanout=423)      0.579   clk_BUFGP
>     ----------------------------  ------------------------------
>     Total                 2.441ns (1.855ns logic, 0.586ns route)
>                                   (76.0% logic, 24.0% route)
> 
> --------------------------------------------------------------------------------
> 
> ================================================================================
> Timing constraint: COMP "irdy_n" OFFSET = IN 7 nS  BEFORE COMP "clk" ;
> 
>  698 items analyzed, 74 timing errors detected.
>  Minimum allowable offset is   8.290ns.
> 
> --------------------------------------------------------------------------------
> Slack:                  -1.290ns (requirement - (data path - clock path - clock arrival))
>   Source:               irdy_n
>   Destination:          PCI_IP_Core_Instance_ad_Port_2
>   Destination Clock:    clk_BUFGP rising at 0.000ns
>   Requirement:          7.000ns
>   Data Path Delay:      10.731ns (Levels of Logic = 6)
>   Clock Path Delay:     2.441ns (Levels of Logic = 2)
>   Timing Improvement Wizard
>   Data Path: irdy_n to PCI_IP_Core_Instance_ad_Port_2
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tiopi                 1.224   irdy_n
>                                   irdy_n_IBUF
>     net (fanout=138)      0.766   irdy_n_IBUF
>     Tilo                  0.653   PCI_IP_Core_Instance_I_25_LUT_7
>     net (fanout=3)        0.683   N21918
>     Tbxx                  0.981   PCI_IP_Core_Instance_I_XXL_1357_1
>     net (fanout=15)       2.352   PCI_IP_Core_Instance_I_XXL_1357_1
>     Tilo                  0.653   PCI_IP_Core_Instance_I_125_LUT_17
>     net (fanout=1)        0.749   PCI_IP_Core_Instance_N3059
>     Tilo                  0.653   PCI_IP_Core_Instance_I__n0055
>     net (fanout=1)        0.809   PCI_IP_Core_Instance_N3069
>     Tioock                1.208   PCI_IP_Core_Instance_ad_Port_2
>     ----------------------------  ------------------------------
>     Total                10.731ns (5.372ns logic, 5.359ns route)
>                                   (50.1% logic, 49.9% route)
> 
>   Clock Path: clk to PCI_IP_Core_Instance_ad_Port_2
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tgpio                 1.082   clk
>                                   clk_BUFGP/IBUFG
>     net (fanout=1)        0.007   clk_BUFGP/IBUFG
>     Tgio                  0.773   clk_BUFGP/BUFG
>     net (fanout=423)      0.579   clk_BUFGP
>     ----------------------------  ------------------------------
>     Total                 2.441ns (1.855ns logic, 0.586ns route)
>                                   (76.0% logic, 24.0% route)
> 
> --------------------------------------------------------------------------------
> 
> Timing summary:
> ---------------
> 
> Timing errors: 135  Score: 55289
> 
> Constraints cover 27511 paths, 0 nets, and 4835 connections (92.1% coverage)
> 
> ________________________________________________________________________________
> 
> Locations of various resources:
> 
> FRAME#: pin 23
> IRDY#:  pin 24
> AD[2]:  pin 62
> PCI_IP_Core_Instance_I_25_LUT_7: CLB_R12C1.s1
> PCI_IP_Core_Instance_I_XXL_1357_1: CLB_R12C2
> PCI_IP_Core_Instance_I_125_LUT_17: CLB_R23C9.s0
> PCI_IP_Core_Instance_I__n0055: CLB_R24C9.s0
> 
> Input signals other than FRAME# and IRDY# are all meeting Tsu < 7 ns
> requirement, and because I now figured out how to use IOB FFs, I can
> easily meet Tval < 11 ns (Tco) for all output signals.
> I am using Xilinx ISE WebPack 4.1 (which doesn't come with FPGA Editor),
> and the PCI IP core is written in Verilog.
> The device I am targeting is Xilinx Spartan-II 150K system gate speed
> grade -5 part (XC2S150-5CPQ208), and I did meet all 33MHz PCI timings
> with Spartan-II 150K system gate speed grade -6 part (XC2S150-6CPQ208)
> when I resynthesized the PCI IP core for speed grade -6 part, and
> basically reused the same UCF file with the floorplan (I had to make
> small modifications to the UCF file because some of the LUT names
> changed).
> The reason I really care about Xilinx Spartan-II 150K system gate speed
> grade -5 part is because that is the chip that is on the PCI prototype
> board of Insight Electronics Spartan-II Development Kit.
> Yes, I wish the PCI prototype board came with speed grade -6 . . .
> Because I want the PCI IP core to be portable across different platforms
> (most notably Xilinx and Altera FPGAs), I am not really interested in
> making any vendor specific modification to my Verilog RTL code, but I
> won't mind using various tricks in the .UCF file (for Xilinx) or .ACF
> file (I believe that is the Altera equivalent of Xilinx .UCF file).
> Here are some solutions I came up with.
> 
> 1) Reduce the signal fanout (Currently at 35 globally, but FRAME# and
> IRDY#'s fanout are 200. What number should I reduce the global fanout
> to?).
> 
> 2) Use USELOWSKEWLINES in a UCF file (already tried on some long
> routings, but didn't seem to help. I will try to play around with this
> option a little more with different signals.).
> 
> 3) Floorplan all the LUTs and FFs on the FPGA (currently, I only
> floorplanned the LUTs that violated Tsu, and most of them take inputs
> from FRAME# and IRDY#.).
> 
> 4) Use Guide file Leverage mode in Map and Par.
> 
> 5) Try routing my design 2000 times (That will take several days . . . I
> once routed my design about 20 times. After routing my design 20 times,
> Par seems to get stuck in certain Timing Score range beyond 20
> iterations.).
> 
> 6) Pay for ISE Foundation 4.1 (I don't want to pay for tools because I
> am poor), and use FPGA Editor (I wish ISE WebPack came with FPGA
> Editor.). At least from FPGA Editor, I can see how the signals are
> actually getting routed.
> 
> 7) Use a different synthesis tool other than XST (I am poor, so I doubt
> that I can afford.).
> 
> I would like to hear from anyone who can comment on the solutions I just
> wrote, or has other suggestions on what I can do to reduce the delays to
> meet 33MHz PCI's Tsu < 7 ns requirement.
> 
> Thanks,
> 
> Kevin Brace (don't respond to me directly, respond within the newsgroup)
> 
> P.S.  Considering that I am struggling to meet 33MHz PCI timings with
> Spartan-II speed grade -5, how does Xilinx meet 66MHz PCI timings on
> Virtex/Spartan-II speed grade -6? (I can only barely meet 33MHz PCI
> timings with Spartan-II speed grade -6 using floorplanner.)
> Is it possible to move a signal through a input pin like FRAME# and
> IRDY# (pin 23 and pin 24 respectively for Spartan-II PQ208), go through
> a few levels of LUTs, and reach far away IOB output FF and tri-state
> control FF like pin 67 (AD[0]) or pin 203 (AD[31]) in 5 ns? (3 ns + 1.9
> to 2 ns natural clock skew = 4.9 ns to 5.0 ns realistic Tsu)
> Can a signal move that fast on Virtex/Spartan-II speed grade -6? (I sort
> of doubt from my experience.)
> I know that Xilinx uses the special IRDY and TRDY pin in LogiCORE PCI,
> but that won't seem to help FRAME#, since FRAME# has to be sampled
> unregistered to determine an end of burst transfer.
> What kind of tricks is Xilinx using in their LogiCORE PCI other than the
> special IRDY and TRDY pin?
> Does anyone know?
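
The slack figures in the timing report quoted above can be reproduced from
the formula the report itself prints. The following is a minimal sketch that
only redoes that arithmetic; the function name is made up for illustration
and is not part of any Xilinx tool.

```python
def offset_in_slack(requirement, data_path, clock_path, clock_arrival=0.0):
    """Slack as printed in the report:
    requirement - (data path - clock path - clock arrival), all in ns."""
    return requirement - (data_path - clock_path - clock_arrival)

# Numbers copied from the quoted report (OFFSET = IN 7 ns BEFORE clk).
frame_slack = offset_in_slack(7.000, 10.556, 2.441)
irdy_slack = offset_in_slack(7.000, 10.731, 2.441)

print(f"frame_n slack: {frame_slack:.3f} ns")
print(f"irdy_n  slack: {irdy_slack:.3f} ns")
```

A negative slack is the amount by which the data path has to shrink; the
"minimum allowable offset" in the report is the requirement minus the slack.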

Article: 37729
Subject: Efficient new multiplier for Spartan2, Virtex &c.
From: "Carl Brannen" <carl.brannen@terabeam.com>
Date: Wed, 19 Dec 2001 18:20:06 +0000 (UTC)

I haven't seen this algorithm published.  If it is original, I'm
throwing it into the public domain. 

-- Efficient fall through multiplier in Xilinx Spartan2, Virtex,
-- VirtexE.  (Will work in Virtex2, but they have dedicated
-- multipliers so they probably don't need this.)
--
-- This fall through multiplier gets 3 bits per initial adder
-- rather than the usual 2 bits.  This is accomplished by taking
-- advantage of under utilized adders in a standard multiplier.
--
-- This is an efficient multiplier when the multiplier has
-- 3n - 1 bits, with n>1.  Where it really rocks is when the
-- multiplier has (3n-1)*2^m bits.
--
-- This is in contrast to the usual Xilinx Virtex multiplier
-- which is relatively efficient when the multiplier has
-- 2*2^m bits.
--
-- For large enough multiplies, this algorithm gets more and
-- more efficient, compared to the usual Xilinx multiplier.
--
-- While a standard multiplier uses the MULT_AND logic well, the
-- stages that add up the partial product results are simple adders
-- built from LUT2s.  Any time you see a big array of LUTs used with
-- less than all four inputs needed you have to wonder if there's
-- a more efficient way of packing the logic in.
--
-- A note on the usual Xilinx multiplier.  (Those skilled in the art
-- should skip this section.)
--
-- The usual multiplier uses two bits per partial product.  The least
-- significant partial product produces one of {0M, 1M, 2M, 3M}, while the
-- next least significant produces one of {0M, 4M, 8M, 12M}.  Adding
-- together these two results produces any multiple of M from 0M to 15M:
--
--  0M + 0M =  0M
--  0M + 1M =  1M
--  0M + 2M =  2M
--  0M + 3M =  3M
--  4M + 0M =  4M
--  4M + 1M =  5M
--  4M + 2M =  6M
--  4M + 3M =  7M
-- ...
-- 12M + 2M = 14M
-- 12M + 3M = 15M
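
The two-bits-per-partial-product scheme described above can be modelled in a
few lines of Python. This is only a behavioural sketch, not the Xilinx
implementation; the function name and the example multiplicand M = 5 are
assumptions chosen for illustration.

```python
def radix4_partial_products(a, m):
    """Split the multiplier a into 2-bit digits. Each digit selects one of
    {0, 1, 2, 3} * m, weighted by 4**i, giving one partial product per digit."""
    products = []
    shift = 0
    while a >> shift:
        digit = (a >> shift) & 0b11
        products.append((digit * m) << shift)
        shift += 2
    return products

M = 5  # example multiplicand (assumption)
for a in (6, 14, 15):
    parts = radix4_partial_products(a, M)
    print(a, parts, sum(parts))
```

Summing the list models the adder stages that follow the partial products.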
--
--
-- Instead of keeping all my numbers as positive (or 2's complement
-- negative) numbers, I save bits if I allow negative numbers and
-- keep a single bit that indicates that the result is to be interpreted
-- as a negative number.  This is how Seymour Cray did his arithmetic,
-- but I don't know if he used this internally to his multipliers.
-- But if I assume that my single column of slices is only going to
-- give me one of four possible multiples of M, I have a problem.  If
-- I choose the four values to be {0M, 1M, 2M, 3M}, then I only get
-- seven values when I negate them as -0M = 0M.  But for a 3-bit
-- coded multiplier I'm going to need 2^3 = 8 values from each slice.
-- If I use {1M, 2M, 3M, 4M}, then I miss zero.
--
-- The solution is to use two sign bits, one for positive numbers, the
-- other for negative.  Let M = 5, so the nine possible values are as
-- follows:  (Note that I only need 8 of these nine values.)
--
--  P  N  Mult | Value
--  -  -  ---- + -----
--  0  1   4   | -20   (i.e. -4*M)
--  0  1   3   | -15
--  0  1   2   | -10
--  0  1   1   | - 5
--  0  0   X   |   0
--  1  0   1   |   5
--  1  0   2   |  10
--  1  0   3   |  15
--  1  0   4   |  20   (i.e.  4*M)
--
-- This would be exactly what the doctor ordered if the multiplier
-- were in a base slightly different from the octal.  With octal, the
-- eight values that a digit can take are the familiar {0,1, ... 7}.
-- With this unusual base, the 8 values that a digit can take are
-- instead {-4,-3,-2,-1,0,1,2,3}.  (I choose to keep these rather than
-- -3 through 4 because it saves a LUT somewhere.)
--
-- So I need a base conversion between base 8 and base, well, I'll
-- call it base 8-4.  Here's how numbers are interpreted in the
-- two bases:
--
-- Base 8:    A = (A0  ) + (A1  )*8 + (A2  )*8^2 + ...
-- Base 8-4:  B = (B0-4) + (B1-4)*8 + (B2-4)*8^2 + ...
--
-- From this it is obvious that to convert the number "A" in base 8
-- to base 8-4, I need merely add the octal constant o444444... to "A".
-- This perfectly converts it to the corresponding (i.e. carrying the
-- same numerical value) number in base 8-4.
--
-- This conversion is very convenient when the multiplier has a lot of
-- bits, but it isn't needed for relatively short multipliers.  In
-- particular, a multiplier of n x 5 would do well to avoid performing
-- the base conversion explicitly.
--
--
-- After performing the base conversion, I take each digit from B
-- (where B = A + o44...44. = A + "100100100...100100") and use
-- it to create a single partial product.  For an n x (3m-1) multiply,
-- I'll end up with m partial products.
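
The base conversion and the one-partial-product-per-digit step can be checked
with a short Python model. This is a sketch under stated assumptions: the
function names and the ndigits parameter are hypothetical, and the overflow
check is my own addition, since the note does not say how a carry out of the
top digit is handled.

```python
def to_base_8_minus_4(a, ndigits):
    """Recode a non-negative integer into ndigits radix-8 digits, each in
    -4..3, by adding the octal constant o44...4 and subtracting 4 per digit."""
    offset = int("4" * ndigits, 8)          # o44...4
    b = a + offset
    if b >= 8 ** ndigits:
        # Assumption: reject values that would carry out of the top digit.
        raise ValueError("value does not fit in ndigits signed digits")
    return [((b >> (3 * i)) & 7) - 4 for i in range(ndigits)]

def from_signed_digits(digits):
    """Inverse of the recoding: sum of digit * 8**i."""
    return sum(d * 8 ** i for i, d in enumerate(digits))

def multiply(a, m, ndigits):
    """One signed partial product (digit * m) per recoded digit."""
    digits = to_base_8_minus_4(a, ndigits)
    return sum((d * m) * 8 ** i for i, d in enumerate(digits))

print(to_base_8_minus_4(5, 2), multiply(5, 7, 2))
```

Each digit selects a multiple of M between -4M and 3M, so the number of
partial products is one third of the bit count rather than one half.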
--
-- I can't just add up the partial products because they're not in
-- the usual format for 2's complement arithmetic.  I'll have to add
-- extra logic to the adder stages in order to handle the signed
-- values being added.  It turns out that there is exactly enough
-- freedom in a Xilinx slice in order to handle these explicitly
-- signed numbers.
--
-- With two mode bits, I can choose 4 functions in a Xilinx arithmetic
-- slice.  The table to add two signed numbers (each with "N" and "P"
-- bits, and each with a partial sum in the set 1M to 4M) is fairly
-- complicated.  Let "S" be the higher precision partial product, and
-- "T" be the lower precision number.
--
-- Rather than be confusing, I'll denote the "N" bit of "S" by "S.N",
-- same with the "P" bit, and I'll denote the unsigned vector part
-- of "S" by "S.V".  Same with "T".  Then the function to add "S" and
-- "T" is as follows:  (Note that since "S" is higher in significance
-- than "T", it follows that "S.V" is larger, as an unsigned number,
-- than "T.V", except if "S" is zero, in which case "S.V" is a don't
-- care.  For this reason, all the subtractions in the following
-- table result in unsigned integers.)
--
--     "S"         "T"            "S+T"
--  ---------   ---------   ------------------
--   S  P N V    T  P N V    S+T      P N   V
--  --  - - -   --  - - -   ------    - -  ---
--  -A  0 1 A   -B  0 1 B   -(A+B)    0 1  A+B
--  -A  0 1 A    0  0 0 X   -(A  )    0 1   A
--  -A  0 1 A    B  1 0 B   -(A-B)    0 1  A-B
--   0  0 0 A   -B  0 1 B   -(  B)    0 1   B
--   0  0 0 A    0  0 0 X      0      0 0   X
--   0  0 0 A    B  1 0 B    (  B)    1 0   B
--   A  1 0 A   -B  0 1 B    (A-B)    1 0  A-B
--   A  1 0 A    0  0 0 X    (A  )    1 0   A
--   A  1 0 A    B  1 0 B    (A+B)    1 0  A+B
--
-- From this, it is clear that "(S+T).N" and "(S+T).P" are
-- simple (4-LUT) functions of "S.N", "S.P", "T.N" and "T.P".
-- It's also clear that "(S+T).V" is computed by one of the
-- four functions {A+B, A-B, A, B}.  It turns out that these
-- four arithmetic functions exactly fit in a single slice.
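
The addition table above can be exercised with a small Python model. This is
a behavioural sketch, not the slice-level logic: the tuple layout (P, N, V)
and the function names are assumptions, and a don't-care "V" is simply stored
as 0.

```python
def encode(x):
    """Signed integer -> (P, N, V). V is a don't care when x == 0."""
    return (int(x > 0), int(x < 0), abs(x))

def decode(p, n, v):
    """(P, N, V) -> signed integer; P = N = 0 means zero regardless of V."""
    return v if p else (-v if n else 0)

def add(s, t):
    """Add S (higher significance) and T using one of {A+B, A-B, A, B}."""
    sp, sn, sv = s
    tp, tn, tv = t
    if not (sp or sn):                 # S is zero: pass T through ("B")
        return (tp, tn, tv)
    if not (tp or tn):                 # T is zero: pass S through ("A")
        return (sp, sn, sv)
    if (sp, sn) == (tp, tn):           # same sign: "A+B", sign of S
        return (sp, sn, sv + tv)
    return (sp, sn, sv - tv)           # opposite signs: "A-B", sign of S

M = 5  # example multiplicand (assumption)
print(decode(*add(encode(8 * -2 * M), encode(3 * M))))
```

The subtraction never goes negative because a non-zero "S" is at least 8M in
magnitude while "T" is at most 4M.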
--
-- It's all very well that I keep the internal partial sums
-- in this odd notation, but it won't do to deliver that to
-- the customer.  So how do I compute the final result?
--
-- First of all, the final result can't have the "N" bit set,
-- because this is an unsigned multiply.  If it has the "P" bit
-- set, then the correct product shows up on "(S+T).V" and I'm
-- done.  The only hairy case is when the "P" bit is low.  In
-- that case, the final result will be supposed to be zero, but
-- "(S+T).V" will be an "X".  For this reason, I have to connect
-- up the final "P" result to the synchronous reset pin of the
-- final register in such a way that when "P" is zero, the
-- final result register is held to zero.
--
-- This comment has gone on long enough that I'm including it
-- as a separate note rather than with the VHDL.  I'll add VHDL
-- code for a sample multipliers as replies to this note, but
-- these aren't easy to build, so give me some time.

Carl


-- 
Posted from firewall.terabeam.com [216.137.15.2] 
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37730
Subject: Re: Low area barrel shift puts 3 to 1 mux in a Xilinx LUT:
From: Mike Treseler <mike.treseler@flukenetworks.com>
Date: Wed, 19 Dec 2001 10:21:19 -0800

Carl Brannen wrote:
> 
> Design reports:

Leo gives 55 LCs on altera 20k -- 193 MHz !!

 -- Mike Treseler

Article: 37731
Subject: Re: MIPS or MOPS?
From: "Rupert Pigott" <Darkb00ng@btinternet.com>
Date: Wed, 19 Dec 2001 18:24:34 +0000 (UTC)

This almost sounds like a homework question. :P

Err, wouldn't you get more progress by coming up with some code and
calculating worst case cycle counts ?

Cheers,
Rupert

AAP3 <aams@dr.com> wrote in message
news:wRXT7.584$B47.961868@typhoon.columbus.rr.com...
> Hi..to all
> I wrote some functions for a CDMA receiver and I want to find the number of
> MIPS required by each function. How do I calculate it?
> And which is the more accurate measure, MIPS or MOPS?
> More info:
> data rate 2Mbps.
> system clock 50MHz.
> 4 times oversampling.
> 16 Spreading factor.
>
> Thanks.

Article: 37732
Subject: Re: Default Should Be "Inputs and Outputs" For IOBs
From: "Austin Franklin" <austin@dark87room.com>
Date: Wed, 19 Dec 2001 13:31:10 -0500

> > Hi Simon,
> >
> > That's what you get for using Design Mangler...er...Manager ;-)
> >
> > Austin
>
> Hi Austin,
>     Is Max Pus II any better?
> Simon

Hi Simon,

Have you been giving flying lessons to any "suspicious" characters recently?

I haven't used MaxPlus II in probably 6 or more years, so I couldn't tell
you...

Regards and Happy Holidays,

Austin

Article: 37733
Subject: Re: Default Should Be "Inputs and Outputs" For IOBs
From: "Austin Franklin" <austin@dark87room.com>
Date: Wed, 19 Dec 2001 13:32:42 -0500

> > > ALWAYS
> > >
> > >>want my designs to use IOB flip flops if possible.  It seems to me that
> >
> > > That's what you get for using Design Mangler...er...Manager ;-)
> >
> > heh.  I find that make does a fair job of managing builds.  But then, I
> > always did find CLIs more user friendly than GUIs.
> >
> > Even if you invoke map from the commandline or means other than through
> > DM, packing flops into I/Os is not done unless the -pr flag is supplied.
> >   So I suppose DM is following the defaults of map.
> >
> > M. Ramirez's question still holds good -- is there ever a reason not to
> > pack flops into IOBs?
> >
>
> I think that packing registers is not the default map option because the
> expectation is that registers will have IOB=TRUE|FALSE attributes applied
> to them by the front end tool. This attribute takes precedence over the -pr
> map switch and allows for individual control of registers.
>
> Regards,
> Bret Wade
> Xilinx Product Applications


Bret,

I don't know that that is true.  Even if Synplicity has that checked, the
Xilinx tools STILL need the "-pr b" to be added to the mapper from what I
remember.

Regards,

Austin

Article: 37734
Subject: Re: You take the low road and I'll ......
From: "Austin Franklin" <austin@dark87room.com>
Date: Wed, 19 Dec 2001 13:39:58 -0500

Hi Austin,

> If I see hdl code, at least I can see where it is going, even if it is
> written badly.

For me, that is not true.  I have to wade around pages and pages of
text....where with a schematic, I can pick up what's going on almost
instantly.  Schematics offer, if done right that is, a built-in block
diagram...which can not be done with text files very easily.  The data flow
is FAR easier to pick out in a schematic than in HDL, and control logic may
or may not be easier to "understand" in HDL...it depends on how it's done.

> Nice thing about software is that people have figured out how to manage
> it, and document it.

Why is that any different than schematics?

> If I examine a design, from top to bottom, I can make a determination of
> the quality of the design by examining the hdl code.  It is possible, but
> more difficult to see what is going on in schematic.

I believe the exact opposite.

> As a
> technical manager, code review is one tool that should be used to make
> sure the project is on track, following the rules, and has a higher
> likelihood of success.

And you've never seen/done a schematic review?  I believe schematics are FAR
easier to review than text files are.  Anyway, design reviews are typically
NOT the source files, but the architecture...it's rare that one brings
source files to a design review and gives a copy to everyone in the room,
and people just sit around flipping through hundreds of pages of text
discussing constructs...

I think you should attend my lecture in the spring on mixed design entry for
FPGA design ;-)

Regards,

Austin

Article: 37735
Subject: Re: How can I reduce Spartan-II routing delays to meet 33MHz PCI's Tsu < 7 ns requirement?
From: "Falk Brunner" <Falk.Brunner@gmx.de>
Date: Wed, 19 Dec 2001 19:58:22 +0100

"Kevin Brace" <kevinbraceusenetkillspam@hotmail.com.killspam> schrieb im
Newsbeitrag news:3C202A1A.B51A40DD@hotmail.com.killspam...

> IRDY# (pin 23 and pin 24 respectively for Spartan-II PQ208), go through
> a few levels of LUTs, and reach far away IOB output FF and tri-state
^^^^^^^^^^^^^^^^^^^^^^

I think THAT's a good point to start optimization. HOW?

Ok, you have a decoding logic with, let's say, four levels of logic. IRDY#
enters the FPGA, runs through the logic and reaches the input of an IOB FF.
Now the propagation time through 4 levels of logic is too long, what to do?
Let's say our logic has 10 input signals. One of them is IRDY#. So this
decoder can be represented by a 1024-entry ROM, right? If you don't have a
1024-entry ROM, you could use two 512-entry ROMs, one for each value of IRDY#.
The outputs are MUXed with IRDY#. So IRDY# only has to travel through 1 level
of logic (the timing analyzer calls this 3 levels, since it counts clock_2_out
and setup as separate levels).
Got my point?
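
The two-ROMs-plus-MUX idea can be sketched in Python. This is only a
behavioural model: the decode function below is a made-up stand-in for the
real decoding logic, and the nine "other" inputs are assumed to be available
early (for example from registers), so that only IRDY# is late.

```python
from itertools import product

def decode(irdy_n, others):
    """Hypothetical 10-input decoder standing in for the multi-level logic."""
    return bool((sum(others) % 3 == 0) ^ bool(irdy_n and others[0]))

# Precompute one 512-entry table per value of IRDY#; neither depends on IRDY#.
ALL_OTHERS = list(product((0, 1), repeat=9))
rom_irdy0 = {o: decode(0, o) for o in ALL_OTHERS}
rom_irdy1 = {o: decode(1, o) for o in ALL_OTHERS}

def decode_fast(irdy_n, others):
    """IRDY# only drives the final 2:1 mux that picks between the two ROMs."""
    return rom_irdy1[others] if irdy_n else rom_irdy0[others]

print(len(rom_irdy0), len(rom_irdy1))
```

The late signal sees a single mux level, at the cost of evaluating the
decoder twice in parallel.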

--
MfG
Falk

Article: 37736
Subject: Re: Default Should Be "Inputs and Outputs" For IOBs
From: "S. Ramirez" <sramirez@cfl.rr.com>
Date: Wed, 19 Dec 2001 19:24:54 GMT
Links: << >> << T >> << A >>

"Austin Franklin" <austin@dark87room.com> wrote in message
news:u21n7gqtl9on53@corp.supernews.com...
> > > Hi Simon,
> > >
> > > That's what you get for using Design Mangler...er...Manager ;-)
> > >
> > > Austin
> >
> > Hi Austin,
> >     Is Max Pus II any better?
> > Simon
>
> Hi Simon,
>
> Have you been giving flying lessons to any "suspicious" characters
> recently?
>
> I haven't used MaxPlus II in probably 6 or more years, so I couldn't tell
> you...
>
> Regards and Happy Holidays,
>
> Austin

Austin,
     It was a joke, Austin, it was a joke (similar to yours by the way)
     You and your wire and kids also have a Merry Christmas and Happy New
Year.
Simon (In Beautiful 78 degree Florida)

Article: 37737
Subject: Re: Defauolt Should Be "Inputs and Outputs" For IOBs
From: "S. Ramirez" <sramirez@cfl.rr.com>
Date: Wed, 19 Dec 2001 19:27:05 GMT
Links: << >> << T >> << A >>


"Bret Wade" <bret.wade@xilinx.com> wrote in message
news:3C20D7F1.4AB91218@xilinx.com...
> I think that packing registers is not the default map option because the
> expectation is that registers will have IOB=TRUE|FALSE attributes applied
> to them by the front end tool. This attribute takes precedence over the -pr
> map switch and allows for individual control of registers.
>
> Regards,
> Bret Wade
> Xilinx Product Applications


Bret,
    At every company where I have worked, we have never used the
above-mentioned attribute.
Simon Ramirez

Article: 37738
Subject: Re: MIPS or MOPS?
From: gah@ugcs.caltech.edu (glen herrmannsfeldt)
Date: 19 Dec 2001 19:33:54 GMT
Links: << >> << T >> << A >>

Murali Jayapala <mjayapal@esat.kuleuven.ac.be> writes:

>I guess the first step is to check the proper definitions of MIPS and MOPS.
>As far as I know  MIPS stands for Millions of instructions per second and MOPS
>is millions of operations per second.
>Now if you have mapped the algorithm on a VLIW/parallel machine, then each
>'instruction' has more than one 'operation'.  However if the platform is a
>risc machine then each instruction has just one operation. So in case of
>parallel machines it would be fair to evaluate through MOPS., while in risc
>machines MOPS and MIPS both signify the same result..

The current definition of MIPS that I know of is
"Meaningless Indicator of Processor Speed", which makes sense
under current architectures.

Some people still find MFLOPS, Millions of Floating point Operations
Per Second, useful, though it might make some difference whether
those operations are additions, multiplications, or divisions.
It is mostly interesting for scientific programs that are
mostly floating point, where the number of floating point operations
scales nicely with the problem size.  For example, multiplying two
N by N matrices takes pretty much N**3 floating multiply and N**3
floating add instructions, and the loop operations can likely be
done in parallel.  Running the program for different N and scaling
by 2*N**3 will tell you how long it will take for N's that are neither
too small (startup overhead) nor too large (page thrashing).
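
As a rough illustration of that scaling argument, here is a minimal Python
sketch (the function names are my own, and pure-Python timings say nothing
about real hardware) that counts 2*N**3 floating point operations for a
naive matrix multiply, turns a measured run into an MFLOPS figure, and uses
that figure to predict run time at another N:

```python
import time

def matmul(a, b):
    """Naive N x N multiply: N**3 multiplies and N**3 adds."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

def flop_count(n):
    """Floating point operations for one N x N multiply."""
    return 2 * n ** 3

def measured_mflops(n):
    """Time one multiply and scale the run time by 2*N**3."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    return flop_count(n) / elapsed / 1e6

def predicted_seconds(n, mflops):
    """Estimated run time at size n, given a measured MFLOPS rate."""
    return flop_count(n) / (mflops * 1e6)

rate = measured_mflops(40)
print(f"about {rate:.2f} MFLOPS; N=200 should take "
      f"roughly {predicted_seconds(200, rate):.2f} s")
```

The prediction only holds in the middle range described above; for very
small or very large N the measured rate itself changes.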

-- glen

Article: 37739
Subject: Re: Defauolt Should Be "Inputs and Outputs" For IOBs
From: David Miller <spam@quartz.net.nz>
Date: Thu, 20 Dec 2001 08:34:15 +1300
Links: << >> << T >> << A >>

>> ALWAYS
>> want my designs to use IOB flip flops if possible.  It seems to me that


> That's what you get for using Design Mangler...er...Manager ;-)


heh.  I find that make does a fair job of managing builds.  But then, I 
always did find CLIs more user friendly than GUIs.

Even if you invoke map from the command line, or by some means other than
DM, flops are not packed into I/Os unless the -pr flag is supplied.
So I suppose DM is following the defaults of map.

M. Ramirez's question still holds good -- is there ever a reason not to 
pack flops into IOBs?

-- 
David Miller, BCMS (Hons)  | When something disturbs you, it isn't the
Endace Measurement Systems | thing that disturbs you; rather, it is
Mobile: +64-21-704-djm     | your judgement of it, and you have the
Fax:    +64-21-304-djm     | power to change that.  -- Marcus Aurelius

Article: 37740
Subject: Re: Defauolt Should Be "Inputs and Outputs" For IOBs
From: Ray Andraka <ray@andraka.com>
Date: Wed, 19 Dec 2001 20:11:40 GMT
Links: << >> << T >> << A >>

Ditto

"S. Ramirez" wrote:

> "Bret Wade" <bret.wade@xilinx.com> wrote in message
> news:3C20D7F1.4AB91218@xilinx.com...
> > I think that packing registers is not the default map option because the
> > expectation is that registers will have IOB=TRUE|FALSE attributes applied
> > to them by the front end tool. This attribute takes precedence over the -pr
> > map switch and allows for individual control of registers.
> >
> > Regards,
> > Bret Wade
> > Xilinx Product Applications
>
> Bret,
>     Every company where I have worked, we've never used the above mentioned
> attribute.
> Simon Ramirez

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 37741
Subject: Re: You take the low road and I'll ......
From: Ray Andraka <ray@andraka.com>
Date: Wed, 19 Dec 2001 20:24:59 GMT
Links: << >> << T >> << A >>

Both schematics and HDL can be horrendous or stellar.  I've seen examples both
ways of both.  In either case, proper use of hierarchy is the key to a
maintainable design.

Austin Franklin wrote:

> Hi Austin,
>
> > If I see hdl code, at least I can see where it is going, even if it is
> > written badly.
>
> For me, that is not true.  I have to wade around pages and pages of
> text....where with a schematic, I can pick up what's going on almost
> instantly.  Schematics offer, if done right that is, a built-in block
> diagram...which can not be done with text files very easily.  The data flow
> is FAR easier to pick out in a schematic than in HDL, and control logic may
> or may not be easier to "understand" in HDL...it depends on how it's done.
>
> > Nice thing about software is that people have figured out how to manage
> > it, and document it.
>
> Why is that any different than schematics?
>
> > If I examine a design, from top to bottom, I can make a determination of
> > the quality of the design by examining the hdl code.  It is possible, but
> > more difficult to see what is going on in schematic.
>
> I believe the exact opposite.
>
> > As a technical manager, code review is one tool that should be used to make
> > sure the project is on track, following the rules, and has a higher
> > likelihood of success.
>
> And you've never seen/done a schematic review?  I believe schematics are FAR
> easier to review than text files are.  Anyway, design reviews are typically
> NOT the source files, but the architecture...it's rare that one brings
> source files to a design review and gives a copy to everyone in the room,
> and people just sit around flipping through hundreds of pages of text
> discussing constructs...
>
> I think you should attend my lecture in the spring on mixed design entry for
> FPGA design ;-)
>
> Regards,
>
> Austin

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 37742
Subject: Re: Best-case timing?
From: Austin Lesea <austin.lesea@xilinx.com>
Date: Wed, 19 Dec 2001 12:49:09 -0800
Links: << >> << T >> << A >>

Stephen,

Best case timing is actually pretty easy for the manufacturer.  We know what
the fastest corner of the silicon process is, we get parts from that corner,
test them, make them cold, supply them with the highest Vcc, and see if the
performance agrees with the models and the extractions.

We did not tend to specify minimums (although we do now, more and more in the
newer parts) because it wasn't supposed to matter.  Now that it does, we do
provide that information once the process is stable (the part is in
manufacturing as a regular product, not ES material).

For those that wish to have a small delta between min and max, they must
choose the fastest speed grade.  This is because a slower speed grade may
contain a device that is from the fastest corner (as it obviously is fast
enough).
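
The speed grade argument can be sketched in a few lines of Python.  The
2 ns and 6 ns limits are the PCI figures quoted in this thread; every part
number below is HYPOTHETICAL and chosen only to illustrate the point that
the minimum comes from the fastest process corner regardless of grade,
while the maximum depends on the grade:

```python
# PCI 66 MHz clock-to-out window, as quoted in this thread.
PCI66_TCO_MIN_NS = 2.0
PCI66_TCO_MAX_NS = 6.0

# HYPOTHETICAL values for illustration only; not taken from any data sheet.
FASTEST_CORNER_TCO_NS = 2.2          # same for every grade: any grade may
                                     # contain a fastest-corner device
WORST_CASE_TCO_NS = {"fast_grade": 4.5, "slow_grade": 5.8}

def tco_window(grade):
    """Return (min, max) clock-to-out for a speed grade."""
    return FASTEST_CORNER_TCO_NS, WORST_CASE_TCO_NS[grade]

def min_max_delta(grade):
    """Spread between best-case and worst-case clock-to-out."""
    tmin, tmax = tco_window(grade)
    return tmax - tmin

def meets_pci66(grade):
    """True if the whole (min, max) window fits inside the PCI limits."""
    tmin, tmax = tco_window(grade)
    return PCI66_TCO_MIN_NS <= tmin and tmax <= PCI66_TCO_MAX_NS

for grade in WORST_CASE_TCO_NS:
    print(grade, tco_window(grade), round(min_max_delta(grade), 3),
          meets_pci66(grade))
```

With these made-up numbers the fast grade has the smaller min-to-max
spread, which is the reason given above for choosing it.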

As for the PCI core (and any IP), we warrant its operation partly by the
data sheet specifications for the part, and partly by the exhaustive testing
of the IP in real silicon.  I would refer to this as specification by
application.  Just one more reason to use a core if it is available!

Austin

Stephen Byrne wrote:

> I originally posted this yesterday on google groups, but I'm not seeing it
> on my home news server.  In case it is not visible to all, I'm reposting.
>
> Hello All,
>
> My company is currently comparing 66MHz PCI core solutions from Xilinx
> and Altera, as well as debating using a home-spun core.  One issue
> I've come upon is the PCI requirement for a MAX clock-to-out time of 6
> ns and MIN clock-to-out time of 2 ns.  Both the Xilinx ISE and Altera
> Quartus II tools seem very helpful in supplying MAX (worst-case) Tco
> times, but I don't see any info on best-case times.  Apparently the
> SDF files for back-annotated timing sim have the same worst-case
> numbers repeated 3 times, resulting in the same simulation regardless
> of case selection.  My question is: how is anyone (FPGA vendors
> included) guaranteeing a MIN Tco of 2 ns across all conditions and
> parts if the design tools don't even yield that information?
>
> Thank You,
>
> Stephen Byrne

Article: 37743
Subject: Re: Default Should Be "Inputs and Outputs" For IOBs
From: "Austin Franklin" <austin@dark87room.com>
Date: Wed, 19 Dec 2001 16:48:50 -0500
Links: << >> << T >> << A >>

> > > > Hi Simon,
> > > >
> > > > That's what you get for using Design Mangler...er...Manager ;-)
> > > >
> > > > Austin
> > >
> > > Hi Austin,
> > >     Is Max Pus II any better?
> > > Simon
> >
> > Hi Simon,
> >
> > Have you been giving flying lessons to any "suspicious" characters
> > recently?
> >
> > I haven't used MaxPlus II in probably 6 or more years, so I couldn't
> > tell you...
> >
> > Regards and Happy Holidays,
> >
> > Austin
>
> Austin,
>      It was a joke, Austin, it was a joke (similar to yours by the way)

Hi Simon,

Can't anyone be straight with you around ;-)

> You and your wire and kids also have a Merry Christmas and Happy New
> Year.

My wire?  Hum.  I'll have to ask my wife about that...

> Simon (In Beautiful 78 degree Florida)

May you lose one bit for every byte!

Regards,

Austin

Article: 37744
Subject: Re: Efficient new multiplier for Spartan2, Virtex &c.
From: "Alexei Lomakin" <alomakin@mc.com>
Date: Wed, 19 Dec 2001 13:48:58 -0800
Links: << >> << T >> << A >>

Hi Carl,

Is it efficient in terms of size or
speed?

Thanks,

Alexei

Article: 37745
Subject: Re: How can I reduce Spartan-II routing delays to meet 33MHz PCI's Tsu < 7 ns requirement?
From: "Austin Franklin" <austin@dark87room.com>
Date: Wed, 19 Dec 2001 16:51:30 -0500
Links: << >> << T >> << A >>

Something sounds wrong...aren't you registering your PCI signals in the
IOBs, and are you using the built-in PCI logic?  Making 33MHz in an SII
should be a snap.

"Kevin Brace" <kevinbraceusenetkillspam@hotmail.com.killspam> wrote in
message news:3C202A1A.B51A40DD@hotmail.com.killspam...
> Hi, I will like to know if someone knows the strategies on how to reduce
> routing (net) delays for Spartan-II.
> So far, I treated synthesis tool(XST)/Map/Par as a blackbox, but because
> my design (a PCI IP core) was not meeting Tsu (Tsu < 7ns), I started to
> take a closer look of how LUTs are placed on the FPGA.
> Using Floorplanner, I saw the LUTs being placed all over the FPGA, so I
> decided to hand place the LUTs using UCF flow.
> That was the most effective thing I did to reduce interconnect delay
> (reduced the worst interconnect delay by about 2.7 ns (11 ns down to 8.3
> ns)), but unfortunately, I still have to reduce the interconnect delay
> by 1.3 ns (worst Tsu currently at 8.3 ns).
> Basically, I have two input signals, FRAME# and IRDY# that are not
> meeting timings.
> Here are the two of the worst violators for FRAME# and IRDY#,
> respectively.
>
>
>
>
> ________________________________________________________________________________
>
>
>
> ================================================================================
>  Timing constraint: COMP "frame_n" OFFSET = IN 7 nS  BEFORE COMP "clk" ;
>
>  503 items analyzed, 61 timing errors detected.
>  Minimum allowable offset is   8.115ns.
>
> --------------------------------------------------------------------------------
> Slack:                  -1.115ns (requirement - (data path - clock path
> - clock arrival))
>   Source:               frame_n
>   Destination:          PCI_IP_Core_Instance_ad_Port_2
>   Destination Clock:    clk_BUFGP rising at 0.000ns
>   Requirement:          7.000ns
>   Data Path Delay:      10.556ns (Levels of Logic = 6)
>   Clock Path Delay:     2.441ns (Levels of Logic = 2)
>   Timing Improvement Wizard
>   Data Path: frame_n to PCI_IP_Core_Instance_ad_Port_2
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tiopi                 1.224   frame_n
>                                   frame_n_IBUF
>     net (fanout=45)       0.591   frame_n_IBUF
>     Tilo                  0.653   PCI_IP_Core_Instance_I_25_LUT_7
>     net (fanout=3)        0.683   N21918
>     Tbxx                  0.981   PCI_IP_Core_Instance_I_XXL_1357_1
>     net (fanout=15)       2.352   PCI_IP_Core_Instance_I_XXL_1357_1
>     Tilo                  0.653   PCI_IP_Core_Instance_I_125_LUT_17
>     net (fanout=1)        0.749   PCI_IP_Core_Instance_N3059
>     Tilo                  0.653   PCI_IP_Core_Instance_I__n0055
>     net (fanout=1)        0.809   PCI_IP_Core_Instance_N3069
>     Tioock                1.208   PCI_IP_Core_Instance_ad_Port_2
>     ----------------------------  ------------------------------
>     Total                10.556ns (5.372ns logic, 5.184ns route)
>                                   (50.9% logic, 49.1% route)
>
>    Clock Path: clk to PCI_IP_Core_Instance_ad_Port_2
>    Delay type         Delay(ns)  Logical Resource(s)
>    ----------------------------  -------------------
>     Tgpio                 1.082   clk
>                                   clk_BUFGP/IBUFG
>     net (fanout=1)        0.007   clk_BUFGP/IBUFG
>     Tgio                  0.773   clk_BUFGP/BUFG
>     net (fanout=423)      0.579   clk_BUFGP
>     ----------------------------  ------------------------------
>     Total                 2.441ns (1.855ns logic, 0.586ns route)
>                                   (76.0% logic, 24.0% route)
>
>
> --------------------------------------------------------------------------------
>
>
>
>
> ================================================================================
> Timing constraint: COMP "irdy_n" OFFSET = IN 7 nS  BEFORE COMP "clk" ;
>
>  698 items analyzed, 74 timing errors detected.
>  Minimum allowable offset is   8.290ns.
>
> --------------------------------------------------------------------------------
> Slack:                  -1.290ns (requirement - (data path - clock path
> - clock arrival))
>   Source:               irdy_n
>   Destination:          PCI_IP_Core_Instance_ad_Port_2
>   Destination Clock:    clk_BUFGP rising at 0.000ns
>   Requirement:          7.000ns
>   Data Path Delay:      10.731ns (Levels of Logic = 6)
>   Clock Path Delay:     2.441ns (Levels of Logic = 2)
>   Timing Improvement Wizard
>   Data Path: irdy_n to PCI_IP_Core_Instance_ad_Port_2
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tiopi                 1.224   irdy_n
>                                   irdy_n_IBUF
>     net (fanout=138)      0.766   irdy_n_IBUF
>     Tilo                  0.653   PCI_IP_Core_Instance_I_25_LUT_7
>     net (fanout=3)        0.683   N21918
>     Tbxx                  0.981   PCI_IP_Core_Instance_I_XXL_1357_1
>     net (fanout=15)       2.352   PCI_IP_Core_Instance_I_XXL_1357_1
>     Tilo                  0.653   PCI_IP_Core_Instance_I_125_LUT_17
>     net (fanout=1)        0.749   PCI_IP_Core_Instance_N3059
>     Tilo                  0.653   PCI_IP_Core_Instance_I__n0055
>     net (fanout=1)        0.809   PCI_IP_Core_Instance_N3069
>     Tioock                1.208   PCI_IP_Core_Instance_ad_Port_2
>     ----------------------------  ------------------------------
>     Total                10.731ns (5.372ns logic, 5.359ns route)
>                                   (50.1% logic, 49.9% route)
>
>   Clock Path: clk to PCI_IP_Core_Instance_ad_Port_2
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tgpio                 1.082   clk
>                                   clk_BUFGP/IBUFG
>     net (fanout=1)        0.007   clk_BUFGP/IBUFG
>     Tgio                  0.773   clk_BUFGP/BUFG
>     net (fanout=423)      0.579   clk_BUFGP
>     ----------------------------  ------------------------------
>     Total                 2.441ns (1.855ns logic, 0.586ns route)
>                                   (76.0% logic, 24.0% route)
>
>
> --------------------------------------------------------------------------------
>
>
> Timing summary:
> ---------------
>
> Timing errors: 135  Score: 55289
>
> Constraints cover 27511 paths, 0 nets, and 4835 connections (92.1%
> coverage)
>
>
> ________________________________________________________________________________
>
>
> Locations of various resources:
>
> FRAME#: pin 23
> IRDY#:  pin 24
> AD[2]:  pin 62
> PCI_IP_Core_Instance_I_25_LUT_7: CLB_R12C1.s1
> PCI_IP_Core_Instance_I_XXL_1357_1: CLB_R12C2
> PCI_IP_Core_Instance_I_125_LUT_17: CLB_R23C9.s0
> PCI_IP_Core_Instance_I__n0055: CLB_R24C9.s0
>
>
>
> Input signals other than FRAME# and IRDY# are all meeting Tsu < 7 ns
> requirement, and because I now figured out how to use IOB FFs, I can
> easily meet Tval < 11 ns (Tco) for all output signals.
> I am using Xilinx ISE WebPack 4.1 (which doesn't come with FPGA Editor),
> and the PCI IP core is written in Verilog.
> The device I am targeting is Xilinx Spartan-II 150K system gate speed
> grade -5 part (XC2S150-5CPQ208), and I did meet all 33MHz PCI timings
> with Spartan-II 150K system gate speed grade -6 part (XC2S150-6CPQ208)
> when I resynthesized the PCI IP core for speed grade -6 part, and
> basically reused the same UCF file with the floorplan (I had to make
> small modifications to the UCF file because some of the LUT names
> changed).
> The reason I really care about Xilinx Spartan-II 150K system gate speed
> grade -5 part is because that is the chip that is on the PCI prototype
> board of Insight Electronics Spartan-II Development Kit.
> Yes, I wish the PCI prototype board came with speed grade -6 . . .
> Because I want the PCI IP core to be portable across different platforms
> (most notably Xilinx and Altera FPGAs), I am not really interested in
> making any vendor specific modification to my Verilog RTL code, but I
> won't mind using various tricks in the .UCF file (for Xilinx) or .ACF
> file (I believe that is the Altera equivalent of Xilinx .UCF file).
> Here are some solutions I came up with.
>
>
> 1) Reduce the signal fanout (Currently at 35 globally, but FRAME# and
> IRDY#'s fanout are 200. What number should I reduce the global fanout
> to?).
>
> 2) Use USELOWSKEWLINES in a UCF file (already tried on some long
> routings, but didn't seem to help. I will try to play around with this
> option a little more with different signals.).
>
> 3) Floorplan all the LUTs and FFs on the FPGA (currently, I only
> floorplanned the LUTs that violated Tsu, and most of them take inputs
> from FRAME# and IRDY#.).
>
> 4) Use Guide file Leverage mode in Map and Par.
>
> 5) Try routing my design 2000 times (That will take several days . . . I
> once routed my design about 20 times. After routing my design 20 times,
> Par seems to get stuck in certain Timing Score range beyond 20
> iterations.).
>
> 6) Pay for ISE Foundation 4.1 (I don't want to pay for tools because I
> am poor), and use FPGA Editor (I wish ISE WebPack came with FPGA
> Editor.). At least from FPGA Editor, I can see how the signals are
> actually getting routed.
>
> 7) Use a different synthesis tool other than XST (I am poor, so I doubt
> that I can afford.).
>
>
> I will like to hear from anyone who can comment on the solutions I just
> wrote, or has other suggestions on what I can do to reduce the delays to
> meet 33MHz PCI's Tsu < 7 ns requirement.
>
>
>
>
> Thanks,
>
>
>
> Kevin Brace (don't respond to me directly, respond within the newsgroup)
>
>
>
>
> P.S.  Considering that I am struggling to meet 33MHz PCI timings with
> Spartan-II speed grade -5, how come Xilinx meet 66MHz PCI timings on
> Virtex/Spartan-II speed grade -6? (I can only barely meet 33MHz PCI
> timings with Spartan-II speed grade -6 using floorplanner.)
> Is it possible to move a signal through a input pin like FRAME# and
> IRDY# (pin 23 and pin 24 respectively for Spartan-II PQ208), go through
> a few levels of LUTs, and reach far away IOB output FF and tri-state
> control FF like pin 67 (AD[0]) or pin 203 (AD[31]) in 5 ns? (3 ns + 1.9
> to 2 ns natural clock skew = 4.9 ns to 5.0 ns realistic Tsu)
> Can a signal move that fast on Virtex/Spartan-II speed grade -6? (I sort
> of doubt from my experience.)
> I know that Xilinx uses the special IRDY and TRDY pin in LogiCORE PCI,
> but that won't seem to help FRAME#, since FRAME# has to be sampled
> unregistered to determine an end of burst transfer.
> What kind of tricks is Xilinx using in their LogiCORE PCI other than the
> special IRDY and TRDY pin?
> Does anyone know?
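
The slack figures in the timing report quoted above follow directly from
the formula it prints, slack = requirement - (data path - clock path -
clock arrival).  A small Python sketch, using only numbers copied from the
quoted report (the function and variable names are my own), reproduces the
FRAME# path total, its logic/route split, and both slack values:

```python
# (delay type, delay in ns, kind) for the frame_n data path in the report.
FRAME_N_DATA_PATH = [
    ("Tiopi", 1.224, "logic"),
    ("net", 0.591, "route"),
    ("Tilo", 0.653, "logic"),
    ("net", 0.683, "route"),
    ("Tbxx", 0.981, "logic"),
    ("net", 2.352, "route"),
    ("Tilo", 0.653, "logic"),
    ("net", 0.749, "route"),
    ("Tilo", 0.653, "logic"),
    ("net", 0.809, "route"),
    ("Tioock", 1.208, "logic"),
]

def path_totals(path):
    """Return (total, logic, route) delay in ns for a list of path entries."""
    logic = sum(delay for _, delay, kind in path if kind == "logic")
    route = sum(delay for _, delay, kind in path if kind == "route")
    return logic + route, logic, route

def slack_ns(requirement, data_path, clock_path, clock_arrival=0.0):
    """Slack as defined in the report header."""
    return requirement - (data_path - clock_path - clock_arrival)

total, logic, route = path_totals(FRAME_N_DATA_PATH)
frame_slack = slack_ns(7.0, total, 2.441)
irdy_slack = slack_ns(7.0, 10.731, 2.441)

print(f"frame_n: total {total:.3f} ns, logic {logic:.3f}, route {route:.3f}")
print(f"frame_n slack {frame_slack:.3f} ns, irdy_n slack {irdy_slack:.3f} ns")
```

Roughly half of the data path is routing, and the single 2.352 ns net with
a fanout of 15 is the largest entry in the path.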

Article: 37746
Subject: Re: FPGA-Conversion. IP Cores
From: "Austin Franklin" <austin@dark87room.com>
Date: Wed, 19 Dec 2001 17:02:55 -0500
Links: << >> << T >> << A >>

> Another bad news for a conversion service is that Clear Logic recently
> lost a key ruling against Altera.
>
>
>
> http://www.altera.com/corporate/press_box/releases/corporate/pr-wins_clear_logic.html
>
>
> I sort of find the ruling troubling because assuming that an Altera-made
> IP is not included in the customer's design, should anyone have any
> control of the bit stream file you generated from Altera's software?
> I suppose that what Altera wants to say is that because the customer had
> to agree prior to using an Altera software (like MAX+PLUS II or
> Quartus), the customer has to use the generated bit stream file in a way
> agreed in the software licensing agreement.

Not only is it bad for conversion services, but bad for all engineering.  I
am STUNNED that a court could find in Altera's favor!  It's entirely absurd
(and arrogant) for the court, and Altera, to claim you can't do what you
want with the bitstream, license agreement or not!  That's like getting a
license agreement with a hammer that says you can only use it with a
particular brand of nails!  Clear Logic must have had some bad lawyers.

These kinds of foolish, ignorant (IMO) court rulings just make the hair on
the back of my neck stand up!  It's YOUR design, and you should be able to
do whatever you want with it.  Just because you used their tools in the
design process should not limit your use of YOUR design.

OK, I'm done ranting for a few minutes.

Austin

Article: 37747
Subject: Re: multi-cycle constraint
From: David Miller <spam@quartz.net.nz>
Date: Thu, 20 Dec 2001 11:56:23 +1300
Links: << >> << T >> << A >>

> I apply a set of multi-cycle constraints to a module and it works
> fine, both timing analyzer and timing simulation. Then I incorporate
> this module and a larger design and apply the the same constraints
> again. This time timing analyzer reports is OK but the timing sim is
> wrong. Any idea to solve the problem?


How is the timing simulation wrong?  Since you don't say, I will make a
guess: is your timed simulation model being generated without the
"-xon false" flag to ngd2vhdl?

Without that flag, ngd2vhdl propagates X's through flops whose inputs 
don't meet timing requirements.  ngd2vhdl doesn't know that that path 
has a multicycle constraint on it, and may be flagging problems that 
aren't real.



-- 
David Miller, BCMS (Hons)  | When something disturbs you, it isn't the
Endace Measurement Systems | thing that disturbs you; rather, it is
Mobile: +64-21-704-djm     | your judgement of it, and you have the
Fax:    +64-21-304-djm     | power to change that.  -- Marcus Aurelius

Article: 37748
Subject: How to route a segment at a time with FPGA Editor
From: "Carl Brannen" <carl.brannen@terabeam.com>
Date: Wed, 19 Dec 2001 23:10:54 +0000 (UTC)
Links: << >> << T >> << A >>

Bryan, it's been a while since I've done this (about 18 months), but I ended up
doing it for pretty much the same reasons you did.  I ran out of clock inputs
for some logic that had to take clocks from outside and I wanted precisely
repeatable timing for a bunch of asynchronous busses that came on to the chip.
So I built the clock domain transfer circuitry into a hard macro and placed it
near each input bus.  Here are some notes:

(1) As soon as you say "hard macro", you're going to exceed Xilinx's ability to
support you.  You're on your own.

(2) The tools tend to blow up if you make too large a hard macro, or if you put
too much routing on one.  I only route the clocks.

(3) I don't try to get hard macros to specify clocks to IOBs.  Instead, I
choose to bring the external clocks into pins that are within about 5 CLBs of
where I bring the data pins in.  It just turns out that the router handles
that particular clock combination with some grace.

(4) The FPGA editor has gotten more and more difficult to use as time has gone
on.  XACT was far better.  And yes, routing in hard macros is a pain in the
butt.  I'll write some notes as I walk through a hard macro routing problem:

(a) From the "File" menu I open the "Main Properties" menu.
(b) I turn off "Stub_Trimming", "AutomaticRouting", "EnhancedManual Routing",
and "DelayBased Routing", then press "Apply" and "Close".
(c) I turn off the "RatsNest" button because all the uncompleted Data paths
will drive me nuts.
(d) On the "List1" window, I get rid of "All Components" and replace it with
"All Nets"
(e) I select the clock I want to route and press the "hilite" button.
(f) When I want to see where I can route to and that kind of stuff, I turn off
the "Routes" button in the tool bar.  That makes all the routes invisible
(except the stuff I hilited), because the problem with the tool is that you
can't select a segment of an even partially routed net without selecting the
whole net.
(g) When I see the next segment that I want to route the net to, I select that
segment and hilite it too.  (By the way, you can add buttons that allow you to
hilite in different colors.  This is very useful, and is found in the Xilinx
documentation for FPGA Editor, but I'm assuming you haven't specialized your
FPGA Editor to do this stuff.)
(h) I can now route a single segment at a time by selecting the "from" segment
with the left mouse button, holding down the "shift" key, selecting the "to"
segment with the left mouse button, releasing the "shift" key, and then
pressing the "route" button on the right hand panel.

This actually works, believe it or not.  The secret is that in order to
properly manipulate routes, you have to turn off the visibility for routed
lines.

Where this gets stupid is when you try to route a bunch of stuff for something
that is critical for route usage.  You basically have to turn on visibility of
routed stuff, hilite your intended route, then turn off route visibility in
order to make it happen.

Enjoy.

Carl


-- 
Posted from firewall.terabeam.com [216.137.15.2] 
via Mailgate.ORG Server - http://www.Mailgate.ORG

Article: 37749
Subject: Re: Kindergarten Stuff
From: "Carl Brannen" <carl.brannen@terabeam.com>
Date: Wed, 19 Dec 2001 23:15:48 +0000 (UTC)
Links: << >> << T >> << A >>

"Bryan" <bryan@srccomp.com> wrote in message
news:3c20c0ef$0$25796$4c41069e@reader1.ash.ops.us.uu.net...

> Ok, this is great.  Two other times I posted to this group about making hard
> macros and got no response.  The low road paid off it seems.  I really want
> to get these hard macros with locked routing to work.  What I see when I try
> to make a hard macro is that I cannot manually route the nets in editor.  I
> want to hand route the clock signal on local routing and hand route an
> enable.  The first thing that happens is I start routing the clock through a
> switchbox from a normal IBUF, works fine.  Then I try to re-select the input
> sight to the switchbox to make a split in the routing.  I cannot select the
> input bubble again and if I just select the net and a new destination out of
> the switchbox it complains.  I have a routed design open in editor right
> next to the one I am hand routing.  Since I couldn't get this to work, I
> decided to first see if I could route the line by hand like the router did.
> I know I am selecting usable sights because I am cheating of a routed
> design.  So how do I re-select that bubble to continue hand routing?

Bryan, it's been a while since I've done this (about 18 months), but I ended up
doing it for pretty much the same reasons you did.  I ran out of clock inputs
for some logic that had to take clocks from outside and I wanted precisely
repeatable timing for a bunch of asynchronous busses that came on to the chip.
So I built the clock domain transfer circuitry into a hard macro and placed it
near each input bus.  Here are some notes:

(1) As soon as you say "hard macro", you're going to exceed Xilinx's ability to
support you.  You're on your own.

(2) The tools tend to blow up if you make too large a hard macro, or if you put
too much routing on one.  I only route the clocks.

(3) I don't try to get hard macros to specify clocks to IOBs.  Instead, I
choose to bring the external clocks into pins that are within about 5 CLBs of
where I bring the data pins in.  It just turns out that the router handles
that particular clock combination with some grace.

(4) The FPGA editor has gotten more and more difficult to use as time has gone
on.  XACT was far better.  And yes, routing in hard macros is a pain in the
butt.  I'll write some notes as I walk through a hard macro routing problem:

(a) From the "File" menu I open the "Main Properties" menu.
(b) I turn off "Stub_Trimming", "AutomaticRouting", "EnhancedManual Routing",
and "DelayBased Routing", then press "Apply" and "Close".
(c) I turn off the "RatsNest" button because all the uncompleted Data paths
will drive me nuts.
(d) On the "List1" window, I get rid of "All Components" and replace it with
"All Nets"
(e) I select the clock I want to route and press the "hilite" button.
(f) When I want to see where I can route to and that kind of stuff, I turn off
the "Routes" button in the tool bar.  That makes all the routes invisible
(except the stuff I hilited), because the problem with the tool is that you
can't select a segment of an even partially routed net without selecting the
whole net.
(g) When I see the next segment that I want to route the net to, I select that
segment and hilite it too.  (By the way, you can add buttons that allow you to
hilite in different colors.  This is very useful, and is found in the Xilinx
documentation for FPGA Editor, but I'm assuming you haven't specialized your
FPGA Editor to do this stuff.)
(h) I can now route a single segment at a time by selecting the "from" segment
with the left mouse button, holding down the "shift" key, selecting the "to"
segment with the left mouse button, releasing the "shift" key, and then
pressing the "route" button on the right hand panel.

This actually works, believe it or not.  The secret is that in order to
properly manipulate routes, you have to turn off the visibility for routed
lines.

Where this gets stupid is when you try to route a bunch of stuff for something
that is critical for route usage.  You basically have to turn on visibility of
routed stuff, hilite your intended route, then turn off route visibility in
order to make it happen.

Enjoy.

Carl


-- 
Posted from firewall.terabeam.com [216.137.15.2] 
via Mailgate.ORG Server - http://www.Mailgate.ORG
