Messages from 81925

Article: 81925
Subject: Re: RAMB16_S9
From: "John_H" <johnhandwork@mail.com>
Date: Mon, 04 Apr 2005 19:01:59 GMT
"Ann" <ann.lai@analog.com> wrote in message news:ee8d229.1@webx.sUN8CHnE...
> Hi, I have read these materials before I wrote the code, and I have just
> re-read it; it seems like the way I instantiate the code is right. I don't
> know why the data is not there though. Does anyone have an example or
> something? Thanks, Ann

I looked at your code segments earlier and they look 100% correct.  The state
machine goes through 256 writes to the same address.  The first write to
that address should produce a valid read value.  As long as your clock is
verifiably there, I'd suggest the mistake may be in the code that reports
the read value, making it *look* like the read value is zero when it
isn't.

I hope you find the trouble.

- John_H
(by the way, your posts aren't wrapping when sent, which causes some problems
in other newsreaders)



Article: 81926
Subject: Re: Stupid question
From: Jason Zheng <xin.zheng@jpl.nasa.gov>
Date: Mon, 04 Apr 2005 12:20:41 -0700
mk wrote:
> On 04 Apr 2005 19:48:58 +0100 (BST), Thomas Womack
> <twomack@chiark.greenend.org.uk> wrote:
> 
> 
>>Is there any way of using the Xilinx toolchain on a Mac?
>>
>>I have become spoiled by my Mac Mini, and unpacking my loud PC
>>just to run place-and-route seems inelegant.
>>
>>Tom
> 
> 
> give this a try
> http://www.microsoft.com/mac/products/virtualpc/virtualpc.aspx
Ahh... Even if it runs, expect at least a 10-fold performance decrease in 
PAR (assuming that you have already upgraded the Mac Mini's pathetic 
256MB of RAM). Sadly, it is already slow enough. With the time you'd waste 
waiting for PAR to finish, you might as well spend some time installing a 
water-cooled x86 box. Better yet, running the tools on an Xbox (with Linux) 
might even be faster (http://www.xbox-linux.org/)!

-jz

Article: 81927
Subject: Re: fpga async design help me
From: Ray Andraka <ray@andraka.com>
Date: Mon, 04 Apr 2005 15:26:11 -0400
perltcl@yahoo.com wrote:

>hi
>
>I need help with my async design. I'm using Xilinx Virtex-II. I'm very
>new to async stuff, so my understanding is very limited --
>particularly of different FPGA architectures (and async terminology).
>
>Here is what I want to do:
>
>module async(clk, loopbackclk, ....);
>input clk;
>output loopbackclk; reg loopbackclk;
>// decl, init and reset stuff omitted
>always @(clk)
>begin
>  case (state)
>  0: begin  // do stuff
>            state <= state + 1;
>            loopbackclk <= state;
>     end
>  1: begin  // do stuff
>            state <= state + 1;
>            loopbackclk <= state;
>     end
>  ....
>  endcase
>end
>endmodule
>
>Now in my top module:
>
>module top();
>wire clk, loopbackclk;
>async a(clk, loopbackclk, ...);
>
>// now depends on what I use for synthesis --
>module SOME_BUF_STUFF?(O, I);   // if using Xilinx tools
>  assign clk = I;               // maybe O
>  assign loopbackclk = O;       // maybe I
>endmodule // end loopback
>
>// I'm totally blank here, please tell me what to do if using Icarus Verilog
>param(.... clk ....            // if using Icarus Verilog
>param(.... loopbackclk ...     // if using Icarus Verilog
>
>endmodule // top module
>
>A few questions:
>First, is there some generic "buffer" or "pipe" (or insert the correct
>terms here) for different FPGAs through which I can loop my "state" back
>as "clk", so that my state transitions depend only on the internal
>circuit, not on a global clock.
>Please give me specific "names" for them, so that I can actually try
>it.
>
>Since I prefer using generic tools like Icarus Verilog, please help if
>you know how to do it. (If possible, I only use vendor-specific tools
>for P and R.)
>
>Thanks.
>
>  
>
You are not likely to succeed without doing hand place and route because 
of the delays in the FPGA.  You need to be very careful to eliminate 
hazards due to race conditions in your design using proper cover terms.  
Also, remember that the logic is implemented conceptually as small 
look-up tables, so you need to be careful about any glitches generated 
while traversing the LUT.  Also, be aware that 'wires' in an FPGA add 
delay, so routing can become very important in order to avoid adding 
unintended delays.  Yes, it can be done, but the existing tools are not 
meant for asynchronous design (and will quickly get you into trouble if 
you depend blindly on them), and the FPGAs are optimized for synchronous 
design.  Generally speaking, you'll probably be using local signals for 
clock in this case, so the global clock buffers are likely to be of 
little or no interest to you (I think that is what you are asking).
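For what it's worth, a minimal top-level sketch of the loopback wiring being
asked about (hypothetical hookup; it assumes the poster's "async" module from
above, and the commented-out BUFG is the Xilinx unisim global-buffer
primitive, shown only as the alternative, since for a self-timed loop plain
local routing is usually what you want):

// Sketch only: loop the generated signal back in as the "clock".
module top;
  wire clk;          // the "clock" seen by the async module
  wire loopbackclk;  // signal generated inside the async module

  assign clk = loopbackclk;                   // plain loopback on local routing

  // BUFG bufg_i (.O(clk), .I(loopbackclk));  // global-buffer alternative

  async a (.clk(clk), .loopbackclk(loopbackclk) /* , ... */);
endmodule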

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759



Article: 81928
Subject: Re: Xilinx tools, bugs all around?
From: "Antti Lukats" <antti@openchip.org>
Date: Mon, 4 Apr 2005 21:33:50 +0200
"Bret Wade" <bret.wade@xilinx.com> schrieb im Newsbeitrag
news:424DC52A.1060704@xilinx.com...
> Antti Lukats wrote:
>
> > but with the Virtex 4 bug, thats a bit scarier
> >
> > a simple design with 16 counters connected to 16 pin locked GCK inputs.
> >
> > P&R fails, saing one signal is not fully routed
>
> Hello Antti,
>
> If PAR only fails to route a single signal, that's usually an indication
> of a packing or placement problem leading to an unroutable connection,
> rather than a congestion issue. These problems are usually not too
> difficult to understand and correct with packing or placement
> constraints. Although you are focused on the number of clocks in the
> design, you don't say whether the unrouted signal is a clock net or
> something else, so I won't speculate on the root cause.
>
> I suggest examining the design in FPGA Editor and trying to understand
> where the routing conflict is. If you are unable to make any progress
> with this method, I suggest opening a webcase and providing a test for
> investigation.
>
> Regards,
> Bret Wade
> Xilinx Product Applications
>

Hm, thanks for the hints, but I think there is an issue related to V4,
because the same design with 16 clocks routes OK on V2 without any problems.
The whole design occupies less than 4% of the V4, so I am pretty confident
the design is routable. And if a simple design is routable, there should be
no reason to look into FPGA Editor or add placement constraints to make the
design routable.

I will try to open a webcase too.

Antti


Article: 81929
Subject: Re: ModelSim XE and WindowsXP
From: Nemesis <nemesis@nowhere.invalid>
Date: Mon, 04 Apr 2005 19:38:30 GMT
While I was thinking up a nice intro, "Nemesis" wrote:

>> ModelSim needs an env variable called LM_LICENSE_FILE which points to 
>> the license.dat file. When you install ModelSim as administrator I guess 
>> that this env. variable is installed in the private space of the 
>> administrator. Have you verified that LM_LICENSE_FILE is defined as 
>> system variable and not user variable? Can you see it when you log in as 
>> a normal user?
> Now I can't check (I'll see tomorrow), but the error says that a "text
> file" cannot be written.
> However I'll try setting this environment variable.

I checked, and the environment variable is correct. Today I saw that if I
open the licensing wizard before opening ModelSim, it correctly reports
that the license is valid, but when I open ModelSim as a normal user it
stops working.

-- 
BREAKFAST.COM Halted... Cereal Port Not Responding.
 
 |\ |       |HomePage   : http://nem01.altervista.org
 | \|emesis |XPN (my nr): http://xpn.altervista.org


Article: 81930
Subject: Re: Searching for Vision Concavity Algorithm
From: "Brad Smallridge" <bradsmallridge@dslextreme.com>
Date: Mon, 4 Apr 2005 12:44:58 -0700
Hi Jonathan,

Thank you for your consideration.

I have lots of little objects being fed to me one pixel at a time.
It's a line-scan sorting operation with multiple ejectors.

I have an algorithm now that does blob labeling.  I am thinking
that as the blob grows, the rate at which pixels are added to the
left and right sides should first increase, then decrease. If I
see increase, decrease, and increase, that might indicate a
convexity, and should be fairly simple to detect.  At least this
is what I am thinking today.
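A minimal sketch of the per-scan-line bookkeeping this idea needs (hypothetical
port names and widths; it assumes one binary pixel per clock and a line_start
pulse in the cycle before the first pixel of each line).  It just tracks the
left-most and right-most set pixel of the current line, which is also the
representation Jonathan's point (4) below builds on:

// Hypothetical sketch: per-line left/right extents of a binary line-scan stream.
module line_extents #(
  parameter XW = 11                    // pixel-counter width (up to 2048 pixels/line)
)(
  input  wire          clk,
  input  wire          line_start,     // pulse in the cycle before pixel 0
  input  wire          pix,            // binary pixel value, one per clock
  output reg  [XW-1:0] left,           // first set column this line (valid when hit)
  output reg  [XW-1:0] right,          // last set column this line (valid when hit)
  output reg           hit             // at least one set pixel seen this line
);
  reg [XW-1:0] x;                      // current column

  always @(posedge clk) begin
    if (line_start) begin
      x   <= {XW{1'b0}};
      hit <= 1'b0;
    end else begin
      x <= x + 1'b1;
      if (pix) begin
        if (!hit) begin
          left <= x;                   // first set pixel on this line
          hit  <= 1'b1;
        end
        right <= x;                    // keeps updating until the last set pixel
      end
    end
  end
endmodule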

I do need the concavity information because it has proven
so far to be the best way to determine where to segment
the objects that are touching.  The standard erosion/dilation
techniques only separate some of the objects.

I can remove holes with a filter if that becomes a problem.

So are you doing vision now?

Brad



"Jonathan Bromley" <jonathan.bromley@doulos.com> wrote in message 
news:1k0251h31thoai7cftfg4h3vjtam859ukr@4ax.com...
> On Fri, 1 Apr 2005 11:03:22 -0800, "Brad Smallridge"
> <bradsmallridge@dslextreme.com> wrote:
>
>>But how do you calculate, or otherwise detect the concavities?
>>I have done some initial work with small areas and bit patterns
>>but one soon runs out of logic gates.
>
> Most of the traditional binary image manipulation algorithms
> use various types of linked memory structures for flexibility.
> These don't work at all well in FPGA.  When faced with the
> limited memory and non-existent memory allocation opportunities
> in an FPGA, you'll need special algorithms that are sure to be
> application specific.  The key questions, it seems to me, are...
>
> 1) How is the image presented to you?  Do you get it pixel-by-
>   pixel from a camera of some kind, or are you given ready-
>   processed data structures from a CPU that writes to your
>   FPGA?
> 2) How big is the largest object that you need to process?
>   For the most interesting image processing operations,
>   you need to be able to store a complete bitmap of the
>   whole object, which in practice means storing every
>   pixel in a rectangle at least big enough to hold the
>   object.
> 3) Do you need to process multiple objects simultaneously?
>   I ask this because, the last time I did any binary vision
>   stuff, we were capturing images of codfish on a conveyor
>   belt.  The belt went under a line-scan camera, and it was
>   wide enough that there could be several fish in view at
>   any one time.  Our software needed to process all of them,
>   else you got an awful lot of waste fish on the floor :-)
> 4) Are you interested in any included holes in the object,
>   or only its outline?  If you care only about the outline,
>   then the extent of the image on each scan line can be
>   represented by only two numbers - the left and right
>   boundaries of the object on that line.
>
> Given the kind of data representation I outlined in (4),
> there's a relatively simple algorithm for extracting the
> convex hull.  It requires only two additional bits of
> storage associated with each scan line, in addition to
> the left and right boundary information.  However, it is
> not totally FPGA-friendly because at each scan line you
> have to look back over all previous scan lines on the
> object.  You may find it's best to include a little
> CPU in the design, to help with the sequencing of this
> or similar algorithms.
>
> I wonder... does your application REALLY need all the
> concavity information?  It may be possible to get all
> the information you need from simple accumulated
> measurements such as the centre-of-gravity, area,
> and first and second moments of inertia of the object.
> -- 
> Jonathan Bromley, Consultant
>
> DOULOS - Developing Design Know-how
> VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services
>
> Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
> Tel: +44 (0)1425 471223          mail:jonathan.bromley@doulos.com
> Fax: +44 (0)1425 471573                Web: http://www.doulos.com
>
> The contents of this message may contain personal views which
> are not the views of Doulos Ltd., unless specifically stated. 



Article: 81931
Subject: Re: Achieving required speed in Virtex-II Pro FPGA
From: Ray Andraka <ray@andraka.com>
Date: Mon, 04 Apr 2005 15:51:14 -0400
v_mirgorodsky@yahoo.com wrote:

>Hi ALL,
>
>I got the problem solved in a not very efficient way. I replaced SRL16
>elements with conventional triggers (flip-flops) and now the design flies -
>the fmax went all the way up to 214+ MHz.
>
>The only thing left to figure out is why conventional triggers do such a
>good job and the "very efficient" SRL16 appeared to mess up everything :(
>
>With best regards,
>Vladimir S. Mirgorodsky
>
>  
>
The SRL16s MUST be used with a flip-flop on the output and located in 
the same slice in order to obtain reasonably high performance.  The 
reason is the propagation time through the SRL16 and back out of the 
slice is dismally slow.  In order to use that flip-flop, however, the 
flip-flop cannot have a reset on it other than the power-on reset 
because the SR line is shared with one of the controls for the SRL16.  
The catch is that in order to get the synthesis to infer the SRL16 
followed by a flip-flop, you need to either instantiate the flip-flop or 
put a reset pin on it.  You may also be able to make it produce this 
using a syn_preserve or similar attribute, but that really seems to be 
synthesis tool version dependent.

So the short answer is your synthesizer is inferring the SRL16 without 
putting a flip-flop after it in the same slice, which makes for a very 
long set-up time through the SRL16.

Another factor is the multiple levels of logic.  The Xilinx PAR is 
notoriously bad at placing the additional LUTs when you have more than 
one LUT level between flip-flops in a signal path.  If you can, redesign 
the logic so that the critical path goes through as few layers of LUTs as possible 
between flip-flops.  Otherwise, at least look at the placement results 
and try some manual floorplanning to improve the LUT placement. 

 From your success in replacing the SRL16, it sounds like just making 
sure you get that flip-flop on the SRL16 output will probably fix this 
for you.

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759



Article: 81932
Subject: Re: XC3000 non-recoverable lockup problem
From: Jim Granville <no.spam@designtools.co.nz>
Date: Tue, 05 Apr 2005 08:07:09 +1200
lecroy7200@chek.com wrote:
<snip>
> As per our off-line talks, I have gone ahead and rebuilt the design
> using slew limited outputs for the two pins in question.  I have begun
> running my transient tests but it will be a few weeks before I am
> convinced this was the problem.
> 
> The following link is to my post about the reflected energy causing
> possible problems:
> 
> http://groups-beta.google.com/group/comp.arch.fpga/browse_frm/thread/1423e577bf37d509/1f921b2ef9ae4542?q=reflected&rnum=3#1f921b2ef9ae4542
> 
> The following was taken from a Xilinx app. note.
> 
> "For all FPGA families, ringing signals are not a cause for reliability
> concerns. To cause such a problem, the Absolute Maximum DC conditions
> need to be violated for a considerable amount of time (seconds). "
<snip>
That's from a pin-failure viewpoint - i.e. energy damage.
They also spec a MAX peak current.

There IS another failure mode, which is the lateral currents that result 
from the clamp diodes (which are actually sideways transistors).
It is not easy to KNOW what peak currents you get, especially on cable 
or external runs.

At the highest levels, these injection currents cause latch-up, but 
there can be lower levels where operation is compromised but the
device does not latch up.

Latchup tests are purely "did the SCR trigger?" ones; they do NOT 
(AFAIK) ever check whether the part logically misfired in any way.

-jg


Article: 81933
Subject: Re: Achieving required speed in Virtex-II Pro FPGA
From: Ray Andraka <ray@andraka.com>
Date: Mon, 04 Apr 2005 17:04:18 -0400
Antti Lukats wrote:

><v_mirgorodsky@yahoo.com> schrieb im Newsbeitrag
>news:1112375908.017059.293810@g14g2000cwa.googlegroups.com...
>  
>
>>Hi ALL,
>>
>>I got the problem solved in a not very efficient way. I replaced SRL16
>>elements with conventional triggers (flip-flops) and now the design flies -
>>the fmax went all the way up to 214+ MHz.
>>
>>The only thing left to figure out is why conventional triggers do such a
>>good job and the "very efficient" SRL16 appeared to mess up everything :(
>>    
>>
>
>hm, that's strange
>there is a usually unused flip-flop at the 'end' of the SRL16,
>so making the SRL16 1 clock shorter and using that flop should have the same
>performance as flip-flops only
>if what you say is so, then it must be a bug in the timing estimation ??
>
>antti
>
>
>  
>
Most likely, that flip-flop is not being placed with the SRL16 due to 
routing limitations.  Specifically, the flip-flop cannot have its reset 
used other than as part of the dedicated global reset, because the reset 
pin to the slice is shared with the WE function for the SRL16.  If you 
forced instantiation of a flip-flop by adding a reset, it forced the 
flip-flop into another slice, which kills the SRL16 timing.

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759



Article: 81934
Subject: Re: Achieving required speed in Virtex-II Pro FPGA
From: Ray Andraka <ray@andraka.com>
Date: Mon, 04 Apr 2005 17:24:48 -0400
You have a few choices. 

You can instantiate the SRL16 and FF.  Doing that guarantees the proper 
components, and provided you don't have a reset on the FF, the mapper 
will pack the two into the same slice even if you don't put RLOCs on 
it.  Without the RLOCs, there is no issue of portability between Xilinx 
families later than XC4000* and SpartanI.  Including RLOCs adds an 
additional wrinkle because the RLOC format for Virtex, VirtexE and 
Spartan2 is different than that for later families, and with Spartan3 or 
Virtex4, there are restrictions as to which columns can have an SRL16.  
Anyway, you have the choice of using or not using RLOCs.

You can also infer the flip-flop by connecting global reset to it.  If 
you do this, you MUST make sure every flip-flop in the design also has 
the global reset  connected to it if you are inferring the global 
reset.  If you leave any flip-flop out, you wind up with a huge net on 
general routing resources for the reset.  You also get a signal wired to 
the SRL flip-flop reset pin, which in turn forces it out of the SRL slice.

You can connect the inferred flip-flop reset pin to an instantiated ROC 
component.  This puts the flip-flop reset on the built-in reset network.

Depending on the synthesis tool, you may be able to set an attribute to 
force the synthesizer to put a flip-flop on the SRL16 output.  If you do 
this, check your result any time you use a different tool or version of 
the tool.  If you have more than one SRL16 chained together, the tools 
historically have only put the flip-flop on the last one in the chain, 
which is no better than not using the flip-flop at all.

Depending on the tool, you may also be able to put a keep buffer between 
the inferred SRL16 and flip-flop to force that signal to be retained.  
Early on, I had mixed results with this using Synplify.  Some versions 
it worked, others it didn't (one version it forced a LUT to be inserted 
between the SRL and the FF....the worst possible outcome).

How do I deal with it?  I have an IP block that instantiates RLOC'd 
SRL16s and flip-flops.  It takes the desired delay and Virtex family 
as generics, generates an array of SRL16s and FFs to match the 
width of the output port, and divides the delay up into as many SRL16+FF 
segments as needed to create the delay.

The root of the problem is the SRL16 has a comparatively very slow 
clock-Q time, which is not a problem as long as the SRL16 is wired only 
to the flip-flop in the same slice (thereby avoiding adding routing 
delays to the long clock to Q).  This is compounded because synthesis 
tools don't automatically stick a register on the SRL16 output. 
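For reference, a minimal Verilog sketch of one such SRL16+FF delay segment
(instantiating the Xilinx unisim primitives SRL16E and FDE; the wrapper name,
the parameter, and the use of a clock enable are just illustrative, and no
RLOCs are shown):

// Sketch of one SRL16E + FDE delay segment.  The FDE has no reset connected,
// so the mapper can pack it into the same slice as the SRL16E, hiding the
// SRL16's slow clock-to-Q behind the register.
module srl16_ff_delay #(
  parameter [3:0] LEN_M1 = 4'd14   // SRL16 length minus 1 (total delay = LEN_M1 + 2)
)(
  input  wire clk,
  input  wire ce,
  input  wire d,
  output wire q
);
  wire [3:0] a = LEN_M1;           // static address selects the tap
  wire       srl_q;

  SRL16E srl_i (
    .Q   (srl_q),
    .A0  (a[0]),
    .A1  (a[1]),
    .A2  (a[2]),
    .A3  (a[3]),
    .CE  (ce),
    .CLK (clk),
    .D   (d)
  );

  FDE fd_i (
    .Q  (q),
    .C  (clk),
    .CE (ce),
    .D  (srl_q)
  );
endmodule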



-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759



Article: 81935
Subject: Re: Xilinx tools, bugs all around?
From: Bret Wade <bret.wade@xilinx.com>
Date: Mon, 04 Apr 2005 15:28:58 -0600
Antti Lukats wrote:

> "Bret Wade" <bret.wade@xilinx.com> schrieb im Newsbeitrag
> news:424DC52A.1060704@xilinx.com...
> 
>>Antti Lukats wrote:
>>
>>
>>>but with the Virtex 4 bug, thats a bit scarier
>>>
>>>a simple design with 16 counters connected to 16 pin locked GCK inputs.
>>>
>>>P&R fails, saing one signal is not fully routed
>>
>>Hello Antti,
>>
>>If PAR only fails to route a single signal, that's usually an indication
>>of a packing or placement problem leading to an unroutable connection,
>>rather than a congestion issue. These problems are usually not too
>>difficult to understand and correct with packing or placement
>>constraints. Although you are focused on the number of clocks in the
>>design, you don't say whether the unrouted signal is a clock net or
>>something else, so I won't speculate on the root cause.
>>
>>I suggest examining the design in FPGA Editor and trying to understand
>>where the routing conflict is. If you are unable to make any progress
>>with this method, I suggest opening a webcase and providing a test for
>>investigation.
>>
>>Regards,
>>Bret Wade
>>Xilinx Product Applications
>>
> 
> 
> Hm, thanks for the hints, but I think there is an issue related to V4,
> because the same design with 16 clocks routes OK on V2 without any problems.
> The whole design occupies less than 4% of the V4, so I am pretty confident
> the design is routable. And if a simple design is routable, there should be
> no reason to look into FPGA Editor or add placement constraints to make the
> design routable.
> 
> I will try to open a webcase too.
> 
> Antti

Hello Antti,

Yes, this is likely a tool problem related to a new feature in the V4 
parts such as the Regional Clocks. But until we know more details about 
the problem, like which connection is unrouted, we are no closer to a 
solution.

It would also help to know what tool version is involved. I suspect that 
you are not yet using 7.1i since I would expect a different failure 
mode. 7.1i does an "unroutability check" before routing that detects 
and errors out on unroutable connections, but you didn't describe that 
scenario. The 7.1i version also solves many of the early teething 
problems found with V4 devices.

Regards,
Bret

Article: 81936
Subject: Re: Xilkernel: configure to use 2 PPCs
From: "Joseph" <joeylrios@gmail.com>
Date: 4 Apr 2005 15:05:38 -0700
Thanks for the response Josh,

I knew about the ability to run two separate OSes, but really do want
to get an SMP machine out of the Virtex II.  I will check out the mem
management section like you advised.  I would like to avoid software
solutions to this issue.  Honestly, I am a bit surprised that more
information isn't readily available on using the hard cores in an SMP
fashion.  If anyone else has any advice or references to pass along, I
would appreciate it.

Joseph


Article: 81937
Subject: Re: Reverse engineering ASIC into FPGA
From: "MikeJ" <support@{nospam}fpgaarcade.com>
Date: Mon, 4 Apr 2005 23:33:18 +0100
mmm, next time you want one done, send me to India! Haven't been there yet
:)

Depends how big the ASIC is and how much info you have on it.

www.fpgaarcade.com

I have cloned a few early NAMCO ASICs and made plug-in 28-pin replacements.
No documentation on them, but functionally simple. Very small amounts of
code compared to my normal large Virtex-4 type stuff, but lots of debugging
and trial and error to get exact behaviour under all (tested, at least)
cases.

I have also (almost) finished the Atari ST custom chip sets, for which there
is a lot of documentation.

What are you after?
/Mike.



Article: 81938
Subject: Re: Reverse engineering ASIC into FPGA
From: "MikeJ" <support@{nospam}fpgaarcade.com>
Date: Mon, 4 Apr 2005 23:35:44 +0100
oh, I also have written a number of tools to turn various ASIC netlists back
into VHDL...
Again, it all depends what you want to do.



Article: 81939
Subject: Re: Stupid question
From: Ben Twijnstra <btwijnstra@gmail.com>
Date: Mon, 04 Apr 2005 22:37:07 GMT
Hi Jason,

> Better yet, running the tools on an xbox (with linux) might even be faster
> (http://www.xbox-linux.org/)! 

Hmmm... A Celery 733 is not exactly the preferred CPU for
computing-intensive stuff like P&R. I can even imagine a G5 using VirtualPC
running faster than that.

Best regards,


Ben


Article: 81940
Subject: Re: XC95108 problem
From: "Ross Marchant" <rossm@NOexcelSPAMtech.com.auSTRALIA>
Date: Tue, 5 Apr 2005 08:58:19 +1000

"Laurent Gauch" <laurent.gauch@DELETEALLCAPSamontec.com> wrote in message
news:4250FE32.90605@DELETEALLCAPSamontec.com...
>
>
> Ross Marchant wrote:
> > Hi,
> >
> > I'm using the XC95108 CPLD and Xilinx ISE 7.1.01i. The problem I am
> > having is that outputs are inverted when they aren't supposed to be.
> >
> > *****************
> >
> > This is my vhdl file:
>
> --------------------------------------------------------------------------
--
> > ----
> > library IEEE;
> > use IEEE.STD_LOGIC_1164.ALL;
> > use IEEE.STD_LOGIC_ARITH.ALL;
> > use IEEE.STD_LOGIC_UNSIGNED.ALL;
> >
> > ---- Uncomment the following library declaration if instantiating
> > ---- any Xilinx primitives in this code.
> > --library UNISIM;
> > --use UNISIM.VComponents.all;
> >
> > entity test is
> >     Port ( In1 : in std_logic;
> >            Out1 : out std_logic);
> > end test;
> >
> > architecture Behavioral of test is
> >
> > begin
> >  Out1 <= In1;
> > end Behavioral;
> >
> > *****************
> >
> > This is my ucf file:
> > #PACE: Start of Constraints generated by PACE
> >
> > #PACE: Start of PACE I/O Pin Assignments
> > NET "In1"  LOC = "P24"  ;
> > NET "Out1"  LOC = "P54"  ;
> >
> > #PACE: Start of PACE Area Constraints
> >
> > #PACE: Start of PACE Prohibit Constraints
> >
> > #PACE: End of Constraints generated by PACE
> >
> > *****************
> >
> > Now I find that if I put a low signal on pin 24 I get a high signal on
> > pin 54 and vice-versa, even though the post-fit simulation shows it
> > working correctly.
> > What could be wrong??
> >
>
> Try again removing your lib declaration
> --use IEEE.STD_LOGIC_ARITH.ALL;
> --use IEEE.STD_LOGIC_UNSIGNED.ALL;
>

Unfortunately this does not work. I have started a web case with Xilinx and
they are looking into it for me.

Thanks
Ross



Article: 81941
Subject: Re: RAMB16_S9
From: Ann <ann.lai@analog.com>
Date: Mon, 4 Apr 2005 16:31:34 -0700
Hi, for some reason, the write line has to toggle high and low for me to
write and read data back. I thought that for this kind of memory module, you
only need WE to be high to write, and WE to be low when you read. If in my
state machine I have 10 write cycles where I set WE <= 1'b1, and then the
rest read and keep looping in read where I set WE = 1'b0, it doesn't work.
If I set it to write 10 cycles, read 10 cycles, write 10 cycles, read 10
cycles... etc., then it works. Does anyone know what is wrong? I am terribly
confused. Thanks, Ann

Article: 81942
Subject: Re: RAMB16_S9
From: "John_H" <johnhandwork@mail.com>
Date: Mon, 04 Apr 2005 23:52:03 GMT
If you're simulating, look for the wr_en being assigned 0 outside the always
block you showed us; this would present odd behavior.

The Block RAM absolutely does not require the WE edge.  The WE level is
sampled on the rising edge of the clock to the BlockRAM with specific setup
and hold requirements.

Are you simulating, using ChipScope, looking at test points, or other?
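For illustration, a minimal inferred-RAM sketch of the intended behaviour
(hypothetical module and signal names, roughly RAMB16_S9-shaped at 2K x 8
with the parity bit left out).  WE is just a level sampled at the rising
clock edge; no toggling is required:

// Sketch: synchronous single-port RAM with registered read.
// With non-blocking assignments the read returns the old contents on a
// write cycle; on the actual RAMB16 primitive the WRITE_MODE attribute
// (WRITE_FIRST / READ_FIRST / NO_CHANGE) selects this behaviour.
module bram_sketch (
  input  wire        clk,
  input  wire        we,      // sampled on posedge clk, write when high
  input  wire [10:0] addr,
  input  wire [7:0]  din,
  output reg  [7:0]  dout
);
  reg [7:0] mem [0:2047];

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;
    dout <= mem[addr];         // registered read port
  end
endmodule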

"Ann" <ann.lai@analog.com> wrote in message news:ee8d229.3@webx.sUN8CHnE...
> Hi, for some reason, the write line has to toggle high and low for me to
> write and read data back. I thought that for this kind of memory module,
> you only need WE to be high to write, and WE to be low when you read. If
> in my state machine I have 10 write cycles where I set WE <= 1'b1, and
> then the rest read and keep looping in read where I set WE = 1'b0, it
> doesn't work. If I set it to write 10 cycles, read 10 cycles, write 10
> cycles, read 10 cycles... etc., then it works. Does anyone know what is
> wrong? I am terribly confused. Thanks, Ann



Article: 81943
Subject: Re: exp(-x) function
From: Ray Andraka <ray@andraka.com>
Date: Mon, 04 Apr 2005 20:17:22 -0400
Kolja Sulimma wrote:

>
> Most likely that's the way to do it. An alternative would be a 
> hyperbolic CORDIC, also explained by Ray:
> http://www.andraka.com/files/crdcsrvy.pdf
>
> Kolja Sulimma

If I needed high precision or for some reason could not use reasonably 
sized look-up tables, I'd consider a hybrid of these two, replacing the 
look-up with a CORDIC exp(x) after performing the normalization to strip 
off the integer part.  The look-up method I described in my prior post will 
get you plenty of precision for most applications.
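As a rough illustration of the normalize-then-look-up idea (not Ray's exact
method, which is in his earlier post; the fixed-point formats, widths and
module name here are made up for the sketch, and the ROM initialization is
simulation-only):

// Sketch: y ~= exp(-x) for x >= 0, using exp(-x) = 2^-n * 2^-f where
// n is the integer part and f the fraction of t = x*log2(e).
// x is unsigned Q4.12, y is unsigned Q0.16; the ROM holds 2^-f.
// Assumes n fits in 4 bits (i.e. x below about 11); larger x would need
// the shift amount widened or the output clamped to zero.
module exp_neg_sketch (
  input  wire        clk,
  input  wire [15:0] x,     // Q4.12
  output reg  [15:0] y      // Q0.16, two clocks behind x
);
  localparam [15:0] LOG2E_Q12 = 16'd5909;   // round(1.442695 * 2^12)

  reg [31:0] t;                             // Q8.24 = Q4.12 * Q4.12
  wire [3:0] n = t[27:24];                  // integer part of t
  wire [7:0] f = t[23:16];                  // top 8 fraction bits of t

  reg [15:0] rom [0:255];                   // rom[i] = round(2^16 * 2^(-i/256))
  integer i;
  initial begin
    rom[0] = 16'hFFFF;                      // 2^0 saturated to fit 16 bits
    for (i = 1; i < 256; i = i + 1)
      rom[i] = $rtoi(65536.0 * (2.0 ** (-i / 256.0)) + 0.5);
  end

  always @(posedge clk) begin
    t <= x * LOG2E_Q12;                     // normalize: t = x * log2(e)
    y <= rom[f] >> n;                       // table for the fraction, shift for the integer
  end
endmodule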

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759



Article: 81944
Subject: Re: can c++ code be loaded to a hardware PGA coprocessor card
From: CTips <ctips@bestweb.net>
Date: Mon, 04 Apr 2005 20:42:40 -0400
[I've crossposted to comp.arch.fpga, where this question really belongs]

Mark wrote:

> 
> We have a particular software system/program
> that spends a large fraction of it's time in two or three
> particular functions that do lots of trig/geometric and matrix
> calculations. Due to the nature of the algorithms they
> use, they're not parallelizable (the computer is a quad CPU box).
> Also because the rest of the code depends on the results
> from these functions, they are a bottleneck for the rest
> of the code that we can split across cpus.
> 
> I think I've read that network hardware engineers use
> special hardware that allows them to program the hardware
> to directly implement code via "Programmable arrays" or something.
> What I'd really like is an expansion card that would plug
> into the bus and a toolkit so that these few C++ functions
> in question would actually be implemented very fast in
> hardware and not in software on the unix box.  The C++ code
> would call the function and the toolkit API would set it
> up so that the function call goes to the card, is processed
> and the result returned back to the C++ program as the return
> looking to the rest of the code like it was done in the app.
> 
> I suppose the code for those functions would be dumped in a
> binary file on the unix box, then the card would know to
> load from the file when instructed to via a api call embedded
> in the main application (in a constructor probably). Probably
> the toolkit would have a way to define what C++ funcs to
> grab to put on the card when the C++ is compiled.
> 
> Is this doable?  I think Xilinx and other companies make the chips
> that do this, but I'm having trouble finding the end companies
> that make an actual card we can use.
> thanks
> Mark
> 

You're going to run into a few problems.

First, FPGAs are programmed (in general) in Verilog/VHDL, not C++. 
Anyone know of any real behavioral synthesis from C/C++ tools?

Second, if you're using trig functions, you're going to have to 
implement them differently than you would using software. Look up CORDIC 
algorithms, for example.

Third, you might not get the speed-ups you expect. It depends on the 
chunk of work you're going to offload. It takes a significant amount of 
time to dump data to an accelerator card, and then to read the result 
back in. It works best if each function call takes a long time to complete.

Fourth, if the code is truly non-parallelizable (i.e. you can't use 
SSE2/Altivec and you can't split it across multiple CPUs) then it's quite 
likely that FPGAs won't help too much. Again, someone more clued in than 
me might be able to answer better, but I suspect that the FPUs in the 
CPU _MAY_ be better than what you can achieve on an FPGA (it will 
depend on your algorithm etc.)

Out of curiosity, what CPU are you currently using? Also, are you using 
double-precision or single-precision FP?

Have you looked at the possibility of speeding up the performance of the 
software implementation? In particular, have you looked at how your 
trigonometric functions are implemented, and whether you can trade 
accuracy/precision for performance? Unless you're absolutely sure that 
the software can't be improved, I wouldn't recommend looking at FPGA 
acceleration.

Article: 81945
Subject: Re: can c++ code be loaded to a hardware PGA coprocessor card
From: "JJ" <johnjakson@yahoo.com>
Date: 4 Apr 2005 19:08:40 -0700
I'd agree. The PCI will kill you first, anything that is difficult for an
FPGA but easy on the PC will kill you again, and finally C++ will not be as
fast as HDL, by my estimate maybe 2-5x slower (my pure prejudice). If you
must use C, take a look at Handel-C; at least it's based on Occam, so it's
provably able to synthesize into HW because it isn't really C, it just looks
like it.

If you absolutely must use IEEE floating point to get particular results,
forget it, but I usually find these barriers are artificial; a good amount of
transformation can flip things around entirely.

To be fair, an FPGA PCI card could wipe out a PC only if the problem is a
natural fit, say continuously processing a large stream of raw data either
from converters or a special interface and then reducing it in some way
to a report level. Perhaps a hard disk could be specially interfaced to the
PCI card to bypass the OS; I'm not sure if that can really help, getting
high end there. Better still if the operators involved are simple but occur
in the hundreds at least in parallel.

The x86 has at least a 20x starting clock advantage, around 20 ops per FPGA
clock for simple inline code. An FPGA solution would really have to be
several times faster to even make it worth considering. A couple of
years ago, when PCI was relatively faster and PCs & FPGAs relatively
slower, the bottleneck would have been less of a problem.

BUT, I also think the x86 is way overrated, at least when I measure numbers.

One thing FPGAs do with essentially no penalty is randomized processing.
The x86 can take a huge hit, maybe a factor of 5, if the application goes
from entirely inside the cache to almost never inside it, though it depends
on how close the data is temporally and spatially.

Now, standing things upside down: take some arbitrary HW function based
on some simple math that is unnatural to a PC, say summing a vector of
13-bit saturated numbers. This uses less HW than the 16-bit version by about
a quarter, but that sort of thing starts to torture the x86, since each
trivial operator now needs to do a couple of things, maybe even perform
a test and branch per point, which will hurt the branch predictor. Imagine
the test is strictly a random choice: real murder on the predictor and
pipeline.
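As a concrete (hypothetical) version of that example, a 13-bit saturating
accumulator is a couple of lines of Verilog and costs roughly one adder plus
a mux per instance in the FPGA, whereas the straightforward software version
carries the per-element test described above:

// Sketch: accumulate a stream of 13-bit values, saturating at 13'h1FFF.
module sat_sum13 (
  input  wire        clk,
  input  wire        rst,    // synchronous clear
  input  wire        en,     // one input element per enabled clock
  input  wire [12:0] din,
  output reg  [12:0] acc
);
  wire [13:0] sum = acc + din;                   // one extra bit catches the carry

  always @(posedge clk)
    if (rst)
      acc <= 13'd0;
    else if (en)
      acc <= sum[13] ? 13'h1FFF : sum[12:0];     // saturate on overflow
endmodule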

Taken to its logical extreme, even quite simple projects, such as say a
CPU emulator, can run 100s of times slower as C code than as the
actual HW, even at the FPGA's leisurely rate of 1/20th the PC clock.

It all depends. One thing to consider, though, is the system bandwidth in
your problem for moving data into & out of RAMs or buffers. Even a
modest FPGA can handle 200-plus reads/writes per clock, where I
suspect most x86s can really only express 1 load or store to a cached
location about every 5 ops. Then the FPGA starts to shine, with a 200 vs
20/4 ratio.

Also, when you start in C++, you have already favored the PC, since you
likely expressed ints as 32-bit numbers and used FP. If you're using FP when
integer can work, you really stacked the deck, but that can often be
undone. When you code in HDL for the data size you actually need, you
are favoring the FPGA by the same margin in reverse. Mind you, I have
never seen FP math get synthesized; you would have to instantiate a
core for that.

One final option to consider: use an FPGA CPU, take a 20x
performance cut, and run the code on that. The hit might not even be 20x,
because the SRAM or even DRAM is at your speed rather than 100s of times
slower than the PC's. Then look for opportunities to add a special-purpose
instruction and see what the impact of one kernel op might be. An example
crypto op might easily replace 100 opcodes with just 1 op. Now also
consider that you can gang up a few CPUs too.

It just depends on what you are doing and whether it's mostly IO or
mostly internal crunching.

johnjakson at usa dot com


Article: 81946
Subject: Re: XMD : Running XMD with Caches on
From: Peter Ryser <peter.ryser@xilinx.com>
Date: Mon, 04 Apr 2005 21:43:23 -0700
Nju,

XMD keeps the main memory consistent with the contents of the caches and 
vice versa. Debugging with caches on and/or off works in a consistent 
way and with guaranteed integrity of the data/code in the caches and the 
memory.

- Peter


Njuguna Njoroge wrote:
> Hello,
> 
> I would like to know whether turning on the caches in the PPC influences the functionality of XMD.
> 
> 1) For instance, when XMD downloads an ELF binary to the memory, it issues writes to processor through the debug ports. Is it safe to assume that these writes bypass the data cache? If this wasn't the case and you are using a writeback cache setting, then there is a chance that the instructions wouldn't make it to main memory. Thus, when executing the program, the instructions won't be read by the processor because it searches the instruction cache, then main memory on a miss. Does this make sense?
> 
> 2) When using the debug mrd (memory read) or mwr (memory write), is it safe to assume that the data cache is bypassed since it is a debug memory read/write, even if the address actually resides in the cache? If the debug read/write does search the cache and causes a miss, will the configured cache behavior ensue (like fetching the rest of the cache line on a miss)? If this is the case, then debug reads could change the state of the cache/memory, which may not be desired by the programmer.
> 
> In general, I would like to understand the mechanism that XMD uses to issue writes/reads to the processor for both instruction download and debug memory read/writes. The "PowerPC Processor Reference Guide" goes into nice detail about the debug capabilities of the PPC 405 with the various configuration registers and signals. However, there is no documentation (that I have found) that discusses how XMD employs those debug features. Therefore, I don't know if XMD is configuring the caches to go into non-cacheable mode for the debug memory accesses or it uses the existing configuration as defined by the program.
> 
> I'm working on a ML 310 board with a V2P30 -6 chip.
> 
> NN


Article: 81947
Subject: Re: Open PowerPC Core?
From: David <david.nospam@westcontrol.removethis.com>
Date: Tue, 05 Apr 2005 09:04:47 +0200
On Mon, 04 Apr 2005 11:48:09 -0700, Eric Smith wrote:

> Tobias Weingartner wrote:
>> I doubt it's a matter of patents, but more a matter of licening.  The two
>> are very different beasts.
> 
> But if there isn't a patent on an architecture, you don't need a license
> to implement it.  The purpose of the license is to grant you a right that
> was taken away from the patent.  If there's no patent, you haven't been
> denied the right.

Since this topic has come up, maybe someone could answer this for me:

I've seen publicly available (often open source) cores for other
processors, such as the AVR.  Are these sort of cores legal to make,
distribute and use?  Supposing I made (from scratch) an msp430 compatible
core for an FPGA - any ideas whether that would be legal or not?  I'm
guessing that using the name "msp430" would be a trademark and/or
copyright violation, but if there are no patents involved it should be
okay?  Does it make any difference whether it is just used by the
developer, released as an inaccessible part of a closed design, or whether
it is released for free use by others?

mvh.,

David


Article: 81948
Subject: Re: Reverse engineering ASIC into FPGA
From: "Neo" <zingafriend@yahoo.com>
Date: 5 Apr 2005 00:42:58 -0700
Currently we are doing one such assignment for a client. They want to
do a board respin and wanted us to replace the few ASICs in there with
FPGAs. Fortunately they are not complex, but the process sucks:
little or no documentation, or it's in some foreign language, crazy!! And
nothing for reference except the working board. So it's like code,
debug, debug, debug... until you get it right on the screen.


Article: 81949
Subject: Re: Open PowerPC Core?
From: "Antti Lukats" <antti@openchip.org>
Date: Tue, 5 Apr 2005 09:46:07 +0200

"David" <david.nospam@westcontrol.removethis.com> schrieb im Newsbeitrag
news:pan.2005.04.05.07.04.46.345000@westcontrol.removethis.com...
> On Mon, 04 Apr 2005 11:48:09 -0700, Eric Smith wrote:
>
> > Tobias Weingartner wrote:
> >> I doubt it's a matter of patents, but more a matter of licening.  The
two
> >> are very different beasts.
> >
> > But if there isn't a patent on an architecture, you don't need a license
> > to implement it.  The purpose of the license is to grant you a right
that
> > was taken away from the patent.  If there's no patent, you haven't been
> > denied the right.
>
> Since this topic has come up, maybe someone could answer this for me:
>
> I've seen publicly available (often open source) cores for other
> processors, such as the AVR.  Are these sort of cores legal to make,
> distribute and use?  Supposing I made (from scratch) an msp430 compatible

it has been done: a full SoC based on an MSP430-compatible core :)
http://bleyer.org/


> core for an FPGA - any ideas whether that would be legal or not?  I'm
> guessing that using the name "msp430" would be a trademark and/or
> copyright violation, but if there are no patents involved it should be
> okay?  Does it make any difference whether it is just used by the
> developer, released as an inaccessible part of a closed design, or whether
> it is released for free use by others?
>
> mvh.,
>
> David
>




