Messages from 137025

Article: 137025
Subject: Re: FPGA partial/catastrophic failure mode question
From: Neil Steiner <neil.steiner@east.isi.edu>
Date: Fri, 19 Dec 2008 11:00:59 -0500
> Do you mean partial reconfiguration? If those words are foreign to
> you, look it up on the Xilinx or Altera websites. I'm unfamiliar with
> the procedure in detail, but understand that it's commonly done. The
> key point is to lock down all of the configuration that is/isn't
> changing (presumably to specific area groups), then there exists a
> mechanism for changing the remaining circuitry.

My defect avoidance is indeed through active partial reconfiguration, 
although I work at a much finer granularity than the slots that people 
typically use for PR.

As for Altera, their products do not yet support partial 
reconfiguration, so while defect avoidance should still be feasible (if 
one had knowledge of the bitstream format), it would require a full 
reconfiguration of the device.

Article: 137026
Subject: Re: FPGA partial/catastrophic failure mode question
From: Neil Steiner <neil.steiner@east.isi.edu>
Date: Fri, 19 Dec 2008 14:39:59 -0500
Thanks for the reply Gabor.

> Where I have seen localized degradation (partial burn-out) it
> has been on I/O drivers.  This generally leads to an unusable
> device without some rewiring at the board level.

Very interesting.  As you say, this would require changes outside of the 
device itself, which could be a problem in aerospace applications.

I wonder though if the driver degradation could be reduced by 
over-designing the board.  In other words, if I knew that a system would 
be difficult to access, but I wanted it to keep running as long as 
possible, it sounds like I might be well served by conservative 
(defensive?) I/O design rules.

> There are clearly applications for your idea at the manufacturing
> defect level, however.  For example Xilinx uses parts that are
> only tested to work with a particular pattern for their volume
> discount ASIC-replacement program.  In addition to the reduced
> test time and therefore cost, this theoretically improves yields.

I believe the test time is the most commonly cited reason for EasyPath. 
I'm sure Xilinx could provide some very interesting failure mode 
details here, but it seems a little, you know, tacky to ask them.

> Obviously the ability to use parts with manufacturing defects
> for general use would be a big plus to Xilinx, especially on the
> high-end parts that tend to have lower yields (check out the
> price tags on the XC2V8000 if you want to see what low yield
> does to cost).  If the defects could be mapped reliably you
> may have usable parts with an effectively slightly smaller fabric
> size at a fraction of the price of the "perfect" silicon.

And I'm delighted to hear somebody else echoing an argument that I've 
made in published work.  I would happily have taken "mostly good" 
XC2V10000 or XC2VP125 devices, but I digress.

> In order for this sort of application to get to volume use,
> however you would need to apply the relocation at the tail
> end of the build process.  It is unlikely that large-scale
> users of these devices will want to run place&route for
> every chip that goes out the door.  Small-scale users like
> ASIC-simulation where the bitstream is generally only used
> once would benefit from this.

You are right, of course, that this would be inconvenient in the context 
of current manufacturing.  I'm thinking of it in a different context 
though, where a device or system manages its own configuration and 
performs its own place and route, something that I've demonstrated for 
V2P.  Admittedly, there needs to be a minimum of known good logic for 
the base design, but perhaps that's where EasyPath comes in.

For mainstream systems we're certainly not there yet, but I suspect it 
may come to that in 10 or 15 years, depending on the yield with upcoming 
technologies.

Article: 137027
Subject: Re: FPGA partial/catastrophic failure mode question
From: Neil Steiner <neil.steiner@east.isi.edu>
Date: Fri, 19 Dec 2008 15:01:12 -0500
> In the one genuinely faulty part that I've seen, it was a very localised 
> failure in the middle of the fabric, and rerunning the P&R tools with 
> some trivial code change (which makes it use different resources) could 
> mask the fault.
> 
> This was repeatable, and definitely due to a bad spot on the die.
> 
> Oh, it was an engineering sample.  I assume the fault was due to 
> inadequate testing at the factory rather than some field failure.

Fascinating.  I had been wondering whether a failure like that would 
defeat the power rails or the configuration shift registers, but 
apparently not necessarily so.

> Perhaps your reviewer was thinking about the sort of failures associated 
> with exceeding the absolute maximum ratings of the part.

The context under consideration was an aerospace system able to perform 
its own placement and routing, and therefore able to work around damage 
that might be sustained during its lifetime.

I've demonstrated a system that can do its own placement, routing, and 
partial reconfiguration while continuing to run, but since its back-end 
implementation tools are hosted on the FPGA, the defect avoidance 
capability is useless unless the FPGA remains mostly functional.  That 
was the point that the reviewer was making, and the question implicit in 
my post.

Article: 137028
Subject: Re: Looking for a strategy to identify nets in post-map netlist
From: Mike Treseler <mtreseler@gmail.com>
Date: Fri, 19 Dec 2008 12:59:14 -0800
KJ wrote:

. . .
> Darn near every time completing the above steps will negate the supposed 
> need for post route simulation...and it will take far less time and effort 
> to do so.

Well said. Thanks for the posting.
An sdf + netlist sim is a test of my design *process*.
A failure is a fire alarm, not something to be debugged directly.
      -- Mike Treseler

Article: 137029
Subject: Re: FPGA partial/catastrophic failure mode question
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 19 Dec 2008 21:17:11 +0000 (UTC)
Neil Steiner <neil.steiner@east.isi.edu> wrote:
 
> The context under consideration was an aerospace system able to perform 
> its own placement and routing, and therefore able to work around damage 
> that might be sustained during its lifetime.
 
> I've demonstrated a system that can do its own placement, routing, and 
> partial reconfiguration while continuing to run, but since its back-end 
> implementation tools are hosted on the FPGA, the defect avoidance 
> capability is useless unless the FPGA remains mostly functional.  That 
> was the point that the reviewer was making, and the question implicit in 
> my post.

I once went to a talk by someone running Linux on a PPC in a Xilinx
chip, and then doing partial reconfiguration from that running
Linux system.  You do have to be careful not to configure yourself
out, though.  Also, no protection against failure modes including
the PPC and its connection to the configuration lines.

-- glen

Article: 137030
Subject: Re: FPGA partial/catastrophic failure mode question
From: Neil Steiner <neil.steiner@east.isi.edu>
Date: Fri, 19 Dec 2008 16:58:57 -0500
> I once went to a talk by someone running Linux on a PPC in a Xilinx
> chip, and then doing partial reconfiguration from that running
> Linux system.  You do have to be careful not to configure yourself
> out, though.  Also, no protection against failure modes including
> the PPC and its connection to the configuration lines.

Thank you for stating my mantra!  ;)

The key to doing this successfully is to give the system a dynamic model 
of itself that stays in sync with the changes that it undergoes.  That 
not only tells it what wires and logic are or are not in use, but also 
allows it to avoid clobbering existing wires or logic or injected defects.

With that foundation in place, I have demonstrated the ability to 
implement or remove EDIF circuits at will, and arbitrarily add, extend, 
trim, or remove connections between those circuits and/or the base 
system, without requiring the slot model mandated by the PR flow.  It 
turns out that partial active reconfiguration works really well if one 
is careful to avoid the kinds of things you allude to.

But returning to the original point of the post, if FPGA failures are 
typically sudden and catastrophic, then my ability to avoid masked 
defects is not particularly useful.

Article: 137031
Subject: Re: FPGA partial/catastrophic failure mode question
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 19 Dec 2008 23:54:10 +0000 (UTC)
Neil Steiner <neil.steiner@east.isi.edu> wrote:
(snip)
 
> The key to doing this successfully is to give the system a dynamic model 
> of itself that stays in sync with the changes that it undergoes.  That 
> not only tells it what wires and logic are or are not in use, but also 
> allows it to avoid clobbering existing wires or logic or injected defects.
 
> With that foundation in place, I have demonstrated the ability to 
> implement or remove EDIF circuits at will, and arbitrarily add, extend, 
> trim, or remove connections between those circuits and/or the base 
> system, without requiring the slot model mandated by the PR flow.  It 
> turns out that partial active reconfiguration works really well if one 
> is careful to avoid the kinds of things you allude to.

Reminds me of:

http://en.wikipedia.org/wiki/Core_War

-- glen

Article: 137032
Subject: Re: FPGA partial/catastrophic failure mode question
From: Jeff Cunningham <jcc@sover.net>
Date: Fri, 19 Dec 2008 23:08:55 -0500
Neil Steiner wrote:
> Thanks for the reply Gabor.
> 
>> Where I have seen localized degradation (partial burn-out) it
>> has been on I/O drivers.  This generally leads to an unusable
>> device without some rewiring at the board level.
> 
> Very interesting.  As you say, this would require changes outside of the 
> device itself, which could be a problem in aerospace applications.

Couldn't you, instead of having one I/O pin connect to a net, have two 
I/Os going to each net? Then if one dies, you have the other one as a 
backup. Maybe have some sort of resistor between the two I/Os to isolate 
them, to reduce the chance that both get blown by some glitch on the board.

-Jeff

Article: 137033
Subject: Re: Custom IP Core DMA (Xilinx Virtex II Pro)
From: Jeff Cunningham <jcc@sover.net>
Date: Fri, 19 Dec 2008 23:36:26 -0500
Gerrit Schünemann wrote:
> Hi,
> I did an XPS Project utilizing the PowerPCs and an custom IP core. Now I
> would like to gain DMA to the DDR Memory on the XUP Board from the
> custom IP. Is there a "direct" connection to a memory controller
> available/suitable, or should I utilize the PLB?
> The idea is to measure some fast data with the core, write it to the
> memory and then process it with the PowerPC.

Some possibilities are:

1)
Connect your IP and the PPC to a PLB bus that also connects to a PLB DDR 
controller. Then you only have to implement a PLB master interface on 
your IP, which is not very hard (much easier than a slave interface). 
The IBM coreconnect docs explain the PLB bus in detail.

2)
Use the multiport memory controller (MPMC) tools to create a memory 
controller with a separate port for your IP (which could be a NPI 
interface or PLB) and a port for the PPC (which will be PLB - actually 
two PLB ports, one for data and one for program access). This is a 
potentially faster design, but can also be a resource hog.

I agree with greenlean that you really ought to consider EDK for 
connecting up different IP pieces and controllers and things. Also, do 
stay away from the clunky and obtuse IPIF stuff. Much better to figure 
out how to interface to NPI or PLB and implement any buffering or FIFOs 
in your own IP.

-Jeff

Article: 137034
Subject: PLL and clock in altera cyclone 2 fpga
From: Jamie Morken <jmorken@shaw.ca>
Date: Fri, 19 Dec 2008 21:13:15 -0800
Hi,

I am using a Cyclone II FPGA and have a propagation delay warning in one 
of the megafunctions, lpm_divide.  If we use a slower clock to this 
block it will work properly, but the system clock is 27 MHz, which is too 
fast for the bit widths of the numerator and denominator even with 
pipelining selected in lpm_divide.  I haven't used the Cyclone PLL 
before, but its lowest output frequency is 10 MHz, which is still a bit 
higher than I would like to run the lpm_divide at.  I could add another 
external crystal, but I was wondering if it is possible to generate a 
logic clock inside the FPGA by using flipflops etc.  I have been told 
that clocking this way is a bad idea due to logic glitches, but I am 
not sure why, or whether that is true.

cheers,
Jamie

Article: 137035
Subject: Re: PLL and clock in altera cyclone 2 fpga
From: Lorenz Kolb <lorenz.kolb@uni-ulm.de>
Date: Sat, 20 Dec 2008 08:25:01 +0100
Jamie Morken wrote:
> Hi,
> 
> I am using a cyclone 2 FPGA, and have a propagation delay warning in one 
> of the megafunction's, lpm_divide.  If we use a slower clock to this 
> block it will work properly, but the system clock is 27MHz which is too 
> fast for the bit width's of the numerator and denominator even with 
> pipelining selected in lpm_divide.  I haven't used the cyclone PLL 
> before, but its lowest output frequency is 10MHz which is still a bit 
> higher than I would like to run the lpm_divide at.  I could add another 
> external crystal, but I was wondering if it is possible to generate a 
> logic clock inside the FPGA by using flipflops etc.  I have been told 
> that this is a bad idea to clock this way due to logic glitches, but am 
> not sure why or if that is true?
> 

Well, actually, using combinatorial signals as a clock is not good design 
practice, though it will most likely work in Your case.

A much better way to solve Your problem might be to create a clock 
that's twice as fast as the clock You really want (I suppose You are 
going for 9 MHz?).  So create an 18 MHz clock and use a clock-enable 
signal, generated with logic, that is only high every second clock cycle 
of Your 18 MHz clock (that's the "official way" to deal with those problems).

You could probably even use the 27 MHz clock directly and divide it by 
three using clock enables, and set that logic path to timing ignore.
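
For illustration, a minimal VHDL sketch of the divide-by-three clock-enable 
idea (entity and signal names here are made up for the example, not taken 
from the original design):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Generates a one-cycle enable pulse on every third edge of the 27 MHz
-- system clock; the slow logic stays on clk27 but only updates when
-- ce_9m = '1', so no derived clock is ever routed.
entity div3_ce is
  port (
    clk27 : in  std_logic;   -- 27 MHz system clock
    rst   : in  std_logic;   -- synchronous reset
    ce_9m : out std_logic    -- high for one clk27 cycle out of every three
  );
end entity;

architecture rtl of div3_ce is
  signal count : unsigned(1 downto 0) := (others => '0');
begin
  process (clk27)
  begin
    if rising_edge(clk27) then
      if rst = '1' then
        count <= (others => '0');
        ce_9m <= '0';
      elsif count = 2 then
        count <= (others => '0');
        ce_9m <= '1';
      else
        count <= count + 1;
        ce_9m <= '0';
      end if;
    end if;
  end process;
end architecture;

The divider (or any other slow block) then uses ce_9m as a clock enable, 
which in practice gives its logic three 27 MHz periods to settle; the path 
still needs a multicycle or false-path constraint so the timing tools know 
that, as noted above.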

> cheers,
> Jamie

Regards,

Lorenz

Article: 137036
Subject: Re: FPGA partial/catastrophic failure mode question
From: Thomas Stanka <usenet_nospam_valid@stanka-web.de>
Date: Sat, 20 Dec 2008 00:43:09 -0800 (PST)
Hi,

On 19 Dez., 03:34, Neil Steiner <neil.stei...@east.isi.edu> wrote:
> Is it true that when FPGAs fail, they typically experience sudden and
> complete failure, rather than gradual localized degradation?
[..]
> matter.  I am interested if your thoughts both with respect to normal
> aging, and to damage from radiation or other effects.
I think this depends on several points. You would like to do some partial
reconfiguration when bad areas are detected. That works for some of the
failures seen.

I once investigated an FPGA with an input failure. The input was
only for configuring some internal functionality, and the device kept
working apart from the fact that this input no longer had any influence
and the power consumption was nearly 10 times the normal power
consumption.
On detailed inspection, the device showed a fast, complete wearout due to
overheating. So there was nearly no chance of recovering this device with
partial reconfiguration, as the lifetime would have been very short anyway.

For space applications we're talking typically about permanent failures
due to total dose. These failures can lead to longer delays and
higher power consumption. The power consumption has severe
impacts after exceeding a certain level, as the thermal power can no
longer be handled. Delay failures are very hard to detect in the
first place, and then to localize, so as to know which part has to be shifted.

bye Thomas



Article: 137037
Subject: Large BRAM synthesis
From: benwang08@gmail.com
Date: Sat, 20 Dec 2008 01:50:50 -0800 (PST)
Hi,

My design needs to use most of the BRAMs (80%) on a V5 LX50T as one
memory store. The synthesis results gave me a clock rate of about 100 MHz,
which is far from my expectation. I am new to this, but can someone
tell me whether this is normal - that the routing and fanout slow down the
design enormously when a lot of BRAMs are used? How can I solve this
problem? Xilinx says the BRAM can run at 550 MHz. Under what conditions
can this be achieved?

Thanks in advance,
Ben

Article: 137038
Subject: Re: Large BRAM synthesis
From: Lorenz Kolb <lorenz.kolb@uni-ulm.de>
Date: Sat, 20 Dec 2008 15:51:53 +0100
benwang08@gmail.com wrote:
> Hi,
> 
> My design needs to use most of BRAMs (80%) on a V5 LX50T as one memory
> storage. The synthesis results gave me a clock rate of about 100 MHz,
> which is far from my expectation. I am new to this, but can someone
> tell me is this normal - the routing and fanout will slow down the
> design enormously when a lot of BRAMs are used?

I have not had a look at the technology map for Your device (maybe 
PlanAhead is worth a look for You and Your design!), but my guess is: 
those BRAM slices are distributed/scattered throughout Your device.

Thus, taking a bunch of them to build a larger RAM will result in a lot 
of routing delay and slow down Your maximum clock speed.

> How can solve this
> problem? Xilinx said the BRAM can run at 550 MHz. On what condition
> can this be achieved?

I suppose: one BRAM with both inputs and outputs registered ...
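
As a rough VHDL illustration of that configuration (my own example code 
with made-up names; this infers a single block RAM with the extra output 
register that the headline Fmax numbers assume):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_single is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(9 downto 0);       -- 1K x 36 fits one Virtex-5 BRAM
    din  : in  std_logic_vector(35 downto 0);
    dout : out std_logic_vector(35 downto 0)
  );
end entity;

architecture rtl of bram_single is
  type ram_t is array (0 to 1023) of std_logic_vector(35 downto 0);
  signal ram    : ram_t;
  signal dout_i : std_logic_vector(35 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      dout_i <= ram(to_integer(addr));  -- the BRAM's own synchronous read
      dout   <= dout_i;                 -- extra pipeline register on the output
    end if;
  end process;
end architecture;

Once many BRAMs are ganged into one big memory, the address decode and the 
output mux between them sit on general routing, which is where the 100 MHz 
comes from.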

> 
> Thanks in advance,
> Ben

Regards,

Lorenz

Article: 137039
Subject: Re: PLL and clock in altera cyclone 2 fpga
From: Mike Treseler <mtreseler@gmail.com>
Date: Sat, 20 Dec 2008 07:52:30 -0800
Jamie Morken wrote:

> I am using a cyclone 2 FPGA, and have a propagation delay warning in one 
> of the megafunction's, lpm_divide. 

lpm_divide sucks up luts and routes, and is slow.
I would make a constant reciprocal table
and use one of the dsp block multipliers.
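
As a sketch of what that might look like (my own example, assuming the 
denominator fits in 8 bits and that an occasional off-by-one in the 
quotient is acceptable; exact division would need a wider reciprocal or a 
correction step):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity recip_divide is
  port (
    clk  : in  std_logic;
    num  : in  unsigned(15 downto 0);
    den  : in  unsigned(7 downto 0);   -- assumed non-zero
    quot : out unsigned(15 downto 0)   -- approximately num / den
  );
end entity;

architecture rtl of recip_divide is
  type rom_t is array (1 to 255) of unsigned(16 downto 0);
  -- Constant table of rounded-up reciprocals in Q1.16 format.
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for d in 1 to 255 loop
      r(d) := to_unsigned((65536 + d - 1) / d, 17);
    end loop;
    return r;
  end function;
  constant RECIP : rom_t := init_rom;
  signal prod : unsigned(32 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      prod <= num * RECIP(to_integer(den));  -- one DSP-block multiply
      quot <= prod(31 downto 16);            -- discard the 16 fraction bits
    end if;
  end process;
end architecture;
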

   -- Mike Treseler

Article: 137040
Subject: Re: PLL and clock in altera cyclone 2 fpga
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sat, 20 Dec 2008 17:22:10 +0000 (UTC)
Mike Treseler <mtreseler@gmail.com> wrote:
> Jamie Morken wrote:
 
>> I am using a cyclone 2 FPGA, and have a propagation delay warning in one 
>> of the megafunction's, lpm_divide. 
 
> lpm_divide sucks up luts and routes, and is slow.
> I would make a constant reciprocal table
> and use one of the dsp block multipliers.

or use an iterative divider (if you can).

-- glen

Article: 137041
Subject: Re: PLL and clock in altera cyclone 2 fpga
From: "KJ" <kkjennings@sbcglobal.net>
Date: Sat, 20 Dec 2008 15:24:30 -0500

"Jamie Morken" <jmorken@shaw.ca> wrote in message 
news:L9%2l.17194$iY3.2034@newsfe14.iad...
> Hi,
>
> If we use a slower clock to this block it will work properly, but the 
> system clock is 27MHz which is too fast for the bit width's of the 
> numerator and denominator even with pipelining selected in lpm_divide.

I've used lpm_divide at ~100+ MHz in Stratix without any difficulties, and 
Cyclone II performance should be fairly comparable.  It could be that you 
have a long logic path leading up to the divide (or perhaps on the 
output result) and that is the real culprit.  Is the lpm_divide being 
directly instantiated or is it being inferred from code (i.e. x <= a / b)? 
Directly instantiating it gives you more control in trading off latency in 
clock cycles.

> but I was wondering if it is possible to generate a logic clock inside the 
> FPGA by using flipflops etc.  I have been told that this is a bad idea to 
> clock this way due to logic glitches, but am not sure why or if that is 
> true?
>
The reason why you never want to generate a clock signal with logic or flip 
flops in an FPGA/CPLD is that you open yourself up to failing timing.  No 
matter what the device, the generated clock will
- be delayed a bit from the system clock (there are no 0 ps 
logic/flops/routing in the real world),
- have a skew that changes from route to route,
- have a delay that is not controllable.

Now ask yourself what happens when a signal that is clocked by the system 
clock enters a flop that is clocked by the skewed clock.  The answer is 
that, depending on the specific logic, the signal coming out of the flop 
clocked by the system clock can beat the delayed clock and get clocked into 
the second flip flop one clock cycle early, or it can fail to meet the setup 
and/or hold time requirements of that flip flop, causing the register to 
behave unpredictably.  This can show up in the timing analysis, but only if 
you enable analysis of fast paths (which is not the default in Quartus).

PLLs get around this problem because they delay the output by exactly one 
full clock cycle (for a 'zero' skew clock) after having done the requested 
frequency multiply/dividing.

Kevin Jennings 



Article: 137042
Subject: Re: FPGA partial/catastrophic failure mode question
From: hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray)
Date: Sat, 20 Dec 2008 14:35:11 -0600

>But returning to the original point of the post, if FPGA failures are 
>typically sudden and catastrophic, then my ability to avoid masked 
>defects is not particularly useful.

I think the failure rate of FPGAs is low enough that you will have
trouble getting enough data to say what is "typical".  The factory
probably has some data, but they probably won't say much.

I'd expect the most common problem would be ESD damage to an I/O pin.

Even if something deep inside the chip does die, you have the
problem of figuring out what/where the problem is.  I suppose
you could design your logic so it was easy to test.  I've never
done that.  It sounds hard.  I wonder how much extra logic it would
add.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.


Article: 137043
Subject: Re: Looking for a strategy to identify nets in post-map netlist
From: Svenn Are Bjerkem <svenn.bjerkem@googlemail.com>
Date: Sat, 20 Dec 2008 12:50:48 -0800 (PST)
On Dec 19, 9:59 pm, Mike Treseler <mtrese...@gmail.com> wrote:
> KJ wrote:
>
> . . .
>
> > Darn near every time completing the above steps will negate the supposed
> > need for post route simulation...and it will take far less time and effort
> > to do so.
>
> Well said. Thanks for the posting.
> An sdf + netlist sim is a test of my design *process*.
> A failure is a fire alarm, not something to be debugged directly.

Old code "written" in HDL-Designer by somebody not with the company
anymore and in such a way that your toe-nails curl, and the biggest
problem of them all: It used to actually work in the old design. The
RTL design is actually working in my setup also, giving correct
results with recorded data both from back then and today. This is a
clear indication that I should be able to make the old design work
now, too, but I am data-mining something that I didn't write, and the
way I usually do that is to look at data in different places in the
failing design just to identify where the things go wrong. This is a
one-off fire-fighting and not a design process in general. The result
will be RTL code and timing constraints that will ensure that the
design is routable without need for hacks. Then we rewrite the whole
stuff after first prototype.

--
Svenn



Article: 137044
Subject: Re: Large BRAM synthesis
From: benwang08@gmail.com
Date: Sat, 20 Dec 2008 14:01:55 -0800 (PST)
On Dec 20, 6:51 am, Lorenz Kolb <lorenz.k...@uni-ulm.de> wrote:
> I have not had a look at the technology map for Your device (maybe
> PlanAhead is worth a look for You and Your design!), but from my guess:
> those BRAM-slices are quite some sort of distributed/scattered
> throughout Your device.
>
> Thus taking a bunch of them to build a larger RAM will result in a lot
> of routing delay and thus slow down Your maximum clock speed.
>
> I suppose: one BRAM with both in- and outputs registered ...

Thanks for your reply.
BRAM slices are distributed over the device in the Xilinx case. If we
cannot have a large memory made of smaller BRAM slices with a decent data
rate, I can't see the point of having multiple megabits of BRAM on a
single device. The only scenario that uses all the BRAMs may be just to
have a large number of pipelines, each with a small amount of BRAM. Am
I right? Please help if you have comments.
Thanks,

Article: 137045
Subject: Re: Large BRAM synthesis
From: nico@puntnl.niks (Nico Coesel)
Date: Sat, 20 Dec 2008 22:20:18 GMT
benwang08@gmail.com wrote:

>Hi,
>
>My design needs to use most of BRAMs (80%) on a V5 LX50T as one memory
>storage. The synthesis results gave me a clock rate of about 100 MHz,
>which is far from my expectation. I am new to this, but can someone
>tell me is this normal - the routing and fanout will slow down the
>design enormously when a lot of BRAMs are used? How can solve this
>problem? Xilinx said the BRAM can run at 550 MHz. On what condition
>can this be achieved?

Try to add registered muxes. This will speed up the routing paths at
the expense of needing extra clock cycles before a result comes out of
the blockram. Also be aware that running blockram at 550 MHz leaves
very little time for logic between the blockram output and a flipflop.
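
A rough sketch of the registered-mux idea (example code with made-up names 
and sizes; two banks are read in parallel, their outputs registered, then a 
registered output mux, so reads take two extra cycles):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity banked_ram is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(12 downto 0);   -- 8K words in two 4K banks
    din  : in  std_logic_vector(31 downto 0);
    dout : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of banked_ram is
  type ram_t is array (0 to 4095) of std_logic_vector(31 downto 0);
  signal ram0, ram1 : ram_t;
  signal q0, q1     : std_logic_vector(31 downto 0);  -- registered bank outputs
  signal sel_q      : std_logic;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- stage 1: write the selected bank, read both banks, register the select
      if we = '1' and addr(12) = '0' then
        ram0(to_integer(addr(11 downto 0))) <= din;
      end if;
      if we = '1' and addr(12) = '1' then
        ram1(to_integer(addr(11 downto 0))) <= din;
      end if;
      q0    <= ram0(to_integer(addr(11 downto 0)));
      q1    <= ram1(to_integer(addr(11 downto 0)));
      sel_q <= addr(12);
      -- stage 2: registered output mux, keeping the BRAM-to-logic path short
      if sel_q = '0' then
        dout <= q0;
      else
        dout <= q1;
      end if;
    end if;
  end process;
end architecture;

Scaling the same idea to more banks (and, if needed, a tree of registered 
muxes) is what keeps a large multi-BRAM memory fast.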

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
                     "If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------

Article: 137046
Subject: Re: Large BRAM synthesis
From: Lorenz Kolb <lorenz.kolb@uni-ulm.de>
Date: Sat, 20 Dec 2008 23:28:53 +0100
benwang08@gmail.com wrote:
> On Dec 20, 6:51 am, Lorenz Kolb <lorenz.k...@uni-ulm.de> wrote:
>> I have not had a look at the technology map for Your device (maybe
>> PlanAhead is worth a look for You and Your design!), but from my guess:
>> those BRAM-slices are quite some sort of distributed/scattered
>> throughout Your device.
>>
>> Thus taking a bunch of them to build a larger RAM will result in a lot
>> of routing delay and thus slow down Your maximum clock speed.
>>
>> I suppose: one BRAM with both in- and outputs registered ...
> 
> Thank for your reply.
> BRAM slices are distributed over the device in Xilinx case. If we can
> not have a large memory made of smaller BRAM slices with decent data
> rate, I couldn't see the points having multi-mega bytes of BRAM on a
> single device.

Ok, let's see what use-cases come to my mind: massively parallel data 
manipulation (typical video filters can be reduced to moving 3- to 4-line 
filters, thus using 3 to 4 BRAMs). Routing for that number should be 
reasonably fast.

FIFOs buffering some data: either the design is slow, and then the FIFOs 
have to be large, or the design is fast and small FIFOs will be sufficient.

So it seems there is quite a reasonable number of applications for what 
Xilinx is building.

> The only scenario to use all the BRAMs may be just to
> have a large number of pipelines each with a small amount of BRAMs. Am
> I right? Please help if you have comments.

Well actually, You might be able to speed the system up using 
pipelining (however, that will cost You latency and, depending on Your 
algorithm, might result in pipeline flushes and thus have a dramatic 
impact on Your performance). Are You totally sure that You need that 
much on-chip memory?
It is often possible to trade off memory against computing power 
(and from my point of view computing power on an FPGA is quite readily 
available ...)

> Thanks,

Regards,

Lorenz

Article: 137047
Subject: Re: PLL and clock in altera cyclone 2 fpga
From: "Gary Pace" <abc@def.com>
Date: Sat, 20 Dec 2008 21:31:59 -0600

"Jamie Morken" <jmorken@shaw.ca> wrote in message
news:L9%2l.17194$iY3.2034@newsfe14.iad...
> Hi,
>
> I am using a cyclone 2 FPGA, and have a propagation delay warning in one
> of the megafunction's, lpm_divide.  If we use a slower clock to this block
> it will work properly, but the system clock is 27MHz which is too fast for
> the bit width's of the numerator and denominator even with pipelining
> selected in lpm_divide.  I haven't used the cyclone PLL before, but its
> lowest output frequency is 10MHz which is still a bit higher than I would
> like to run the lpm_divide at.  I could add another external crystal, but
> I was wondering if it is possible to generate a logic clock inside the
> FPGA by using flipflops etc.  I have been told that this is a bad idea to
> clock this way due to logic glitches, but am not sure why or if that is
> true?
>
> cheers,
> Jamie

Jamie :

LPM_DIVIDE is very resource-expensive and slow - it is a complex
combinatorial implementation.
I would suggest the following solutions:
1 : Implement a successive-approximation divide - this will use far fewer
resources but will take at least one clock cycle per bit of the quotient
(plus some for sign restoration, probably).
2 : Register the inputs and outputs of LPM_DIVIDE, then define a suitable
multi-cycle assignment from the input registers to the output register.  This
will use all the resources that LPM_DIVIDE needs, but allows you to cope
with its slowness.
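
As an illustration of option 2 (my own sketch; the entity, signal, and 
register names are invented, and the exact constraint syntax should be 
checked against the TimeQuest documentation):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity slow_divider is
  generic (W : natural := 32);
  port (
    clk  : in  std_logic;
    num  : in  unsigned(W-1 downto 0);
    den  : in  unsigned(W-1 downto 0);   -- assumed non-zero
    quot : out unsigned(W-1 downto 0)
  );
end entity;

architecture rtl of slow_divider is
  signal num_r, den_r, quot_c : unsigned(W-1 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      num_r <= num;          -- registered inputs
      den_r <= den;
      quot  <= quot_c;       -- registered output
    end if;
  end process;

  quot_c <= num_r / den_r;   -- combinatorial divide (typically infers lpm_divide)

  -- Then, in the SDC file, something along these lines (register names and
  -- cycle counts are placeholders) tells the fitter the divide has four
  -- clocks to settle:
  --   set_multicycle_path -setup -end 4 -from [get_registers {*num_r* *den_r*}] -to [get_registers {*quot*}]
  --   set_multicycle_path -hold  -end 3 -from [get_registers {*num_r* *den_r*}] -to [get_registers {*quot*}]
  -- The surrounding logic must then only change num/den and sample quot
  -- every fourth cycle.
end architecture;
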




Article: 137048
Subject: Bit width in CPU cores
From: rickman <gnuarm@gmail.com>
Date: Sun, 21 Dec 2008 01:08:14 -0800 (PST)
I am resurrecting a CPU design I created some years back for use in an
FPGA.  It was highly optimized for minimal resource usage, both logic
and BRAMs.  One of the things I didn't optimize fully is the usage of
the full 9 bits of BRAMs in most FPGAs.  When used in byte width or
larger, they use multiples of 9 bits rather than 8 to provide a parity
bit.  It seems a waste to ignore these bits.

I have already optimized the instruction set fairly well for a zero-
operand architecture (dual stack based) using 8 bit instructions.  In
general, 8 bits is overkill for this sort of machine.  For example,
various MISC CPUs use 5 bit instructions packed into larger words.
There is no need for 256 different instructions, so I used an
immediate field in the jump and call instructions and still have a
total of 36 instructions.

Given this as a starting point, an easy way to use this extra bit is
as a return flag.  There are any number of instructions that, if
followed by a return, could now be combined, saving both the time of
execution and the byte of storage for the return instruction.
Referring to Koopman's data on word frequency, this will reduce the
number of bytes in the code by up to 7.5% and speed by up to nearly
12%.  I say "up to" because only 19 of the current 36 instructions (or
about half) can be optimized this way.  Instructions that use the
return stack can't be combined with the RET opcode because it also
uses the return stack.

The result is likely to be that half of the time this RET bit can be
used giving a 3+ % savings in code size and 6% savings in clock
cycles.  I would never think of enlarging the program memory by 12.5%
to get a 3% savings, but since this bit is actually free, it makes
some sense to use it this way.

On instructions where it has a better use, such as the literal, it is
not used for the return.  The literal instruction goes from a 7 bit
literal to an 8 bit literal shifted onto the return stack.  The jump
and call instructions go from having a 4 bit immediate field to a 5
bit field.  This may seem like not much, but in the code I had written
for this CPU it was a significant percentage of the time that the
length of the jump was far enough that it would not fit in 4 bits and
required an extra byte of storage.  Going to a 5 bit field would have
accommodated most of these cases.
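
For what it's worth, here is a tiny VHDL sketch of how that decode might 
look (this is a guess at the field layout described above, not the actual 
encoding, and all names are invented):

library ieee;
use ieee.std_logic_1164.all;

entity ret_bit_decode is
  port (
    instr       : in  std_logic_vector(8 downto 0);  -- one 9-bit program word
    uses_rstack : in  std_logic;  -- decoded: opcode already touches the return stack
    is_lit_jmp  : in  std_logic;  -- decoded: literal/jump/call class (bit 8 is operand)
    do_return   : out std_logic   -- pop the return stack into the PC after executing
  );
end entity;

architecture rtl of ret_bit_decode is
begin
  -- Bit 8 acts as a free "return after this instruction" flag for ordinary
  -- opcodes; for literals and jump/call it is the extra operand bit instead,
  -- and opcodes that use the return stack themselves cannot fold in a RET.
  do_return <= instr(8) and not uses_rstack and not is_lit_jmp;
end architecture;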

The question is whether there is a better way to use this extra bit in
an instruction.  I am sure it would not be productive to just add more
opcodes.  I haven't seen any other ideas in any designs I have read
about, so the RET function seems like the best.

Rick

Article: 137049
Subject: Re: Bit width in CPU cores
From: rickman <gnuarm@gmail.com>
Date: Sun, 21 Dec 2008 01:13:34 -0800 (PST)
On Dec 21, 4:08 am, rickman <gnu...@gmail.com> wrote:
> I am resurrecting a CPU design I created some years back for use in an
> FPGA.  It was highly optimized for minimal resource usage, both logic
> and BRAMs.  One of the things I didn't optimize fully is the usage of
> the full 9 bits of BRAMs in most FPGAs.  When used in byte width or
> larger, they use multiples of 9 bits rather than 8 to provide a parity
> bit.  It seems a waste to ignore these bits.
>
> I have already optimized the instruction set fairly well for a zero-
> operand architecture (dual stack based) using 8 bit instructions.  In
> general, 8 bits is overkill for this sort of machine.  For example,
> various MISC CPUs use 5 bit instructions packed into larger words.
> There is no  need for 256 different instructions, so I used an
> immediate field in the jump and call instructions and still have a
> total of 36 instructions.
>
> Given this as a starting point, an easy way to use this extra bit is
> as a return flag.  There are any number of instructions that if
> followed by a return, could now be combined saving both the time of
> execution and the byte of storage for the return instruction.
> Referring to Koopman's data on word frequency this will reduce the
> number of bytes in the code by up to 7.5% and speed by up to nearly
> 12%.  I say "up to" because only 19 of the current 36 instructions (or
> about half) can be optimized this way.  Instructions that use the
> return stack can't be combined with the RET opcode because it also
> uses the return stack.
>
> The result is likely to be that half of the time this RET bit can be
> used giving a 3+ % savings in code size and 6% savings in clock
> cycles.  I would never think of enlarging the program memory by 12.5%
> to get a 3% savings, but since this bit is actually free, it makes
> some sense to use it this way.
>
> On instructions where it has a better use, such as the literal, it is
> not used for the return.  The literal instruction goes from a 7 bit
> literal to an 8 bit literal shifted onto the return stack.  The jump
> and call instructions go from having a 4 bit immediate field to a 5
> bit field.  This may seem like not much, but in the code I had written
> for this CPU it was a significant percentage of the time that the
> length of the jump was far enough that it would not fit in 4 bits and
> required an extra byte of storage.  Going to a 5 bit field would have
> accommodated most of these cases.
>
> The question is whether there is a better way to use this extra bit in
> an instruction.  I am sure it would not be productive to just add more
> opcodes.  I haven't seen any other ideas in any designs I have read
> about, so the RET function seems like the best.

I meant to crosspost to CLF and I also wanted to ask if anyone has
looked at similarly making the word size a multiple of 9 rather than 8
bits.  That not only matches the BRAMs, but also the multipliers.  Is
there any real advantage to this?  I guess this could make it a bit
hard to address 8 bit bytes in an 18 bit word.  Or maybe not.
Opinions?

Rick


