On Oct 30, 12:23 am, Patrick Dubois <prdub...@gmail.com> wrote:
> Hi,
>
> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).
>
> The cores usually found for an FPGA implementation only permit FFTs on
> a number of points that is a power of 2. In our particular case
> however, we need to be able to do an FFT on a vector of say, 1025
> points. Our current algorithm is to zero-pad this vector to 2048
> points.
>
> The problem with this approach is that we nearly double the number of
> points for downstream processing. This is very problematic at high
> datarates.
>
> We have a few ideas but since this seems like a common problem, I'd
> like to ask people here for tips.
>
> Thanks!
>
> Patrick

Hello Patrick,

The FFT algorithm is efficient because it uses a divide and conquer approach to process the data in subsets. The data is processed by multiple "butterfly" processing kernels. Power of two data sets are easy to deal with, because you can just use power of two butterflies and the addressing is regular.

You can still use the FFT algorithm on non power of two data sets, but if you do not want to pad the data up to the next power of two, you will need non power of two butterflies. For your case, 1025 factors to 5*5*41 so you would need to have radix-5 and radix-41 butterflies if you really want to have a 1025 point FFT.

If you just want to not have to pad all the way to 2048 but don't care about padding a bit you could find a nicer size. For example 1152 would be 2^7 * 3^2 and would only need radix-2 and radix-3 butterflies. If you want an FFT that is not a power of two in size, you will need a non power of two butterfly in the mix.

Regards,

John McCaskill
www.fastertechnology.com
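The size selection described in the post above (factor the length, or pad up to a nearby size that only needs small radices) can be sketched in a few lines of Python. This is an illustration only and not part of the original post; the names `factorize` and `next_smooth_size` and the `radices` parameter are made up for the example.

```python
def factorize(n):
    """Return the prime factors of n in ascending order, e.g. 12 -> [2, 2, 3]."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever is left is a prime factor
    return factors


def next_smooth_size(n, radices=(2, 3)):
    """Smallest size >= n whose prime factors all belong to `radices`."""
    size = n
    while any(f not in radices for f in factorize(size)):
        size += 1
    return size


if __name__ == "__main__":
    print(factorize(1025))         # [5, 5, 41]
    print(next_smooth_size(1025))  # 1152
```

With radix-2 and radix-3 butterflies only, 1025 points would be padded to 1152 rather than 2048. Allowing a radix-5 butterfly as well (`radices=(2, 3, 5)`) brings the padded size down to 1080.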

On Tue, 30 Oct 2007 07:10:49 -0700, Andy <jonesandy@comcast.net> wrote:
>On Oct 29, 12:15 pm, mk <kal*@dspia.*comdelete> wrote:
>> On Mon, 29 Oct 2007 17:09:15 +0000, Philip Potter
>> <p...@see.sig.invalid> wrote:
>> >IIRC Xilinx Virtex-5 is on a 45nm process.
>>
>> No, Virtex-5 is 65nm. No foundry is selling 45nm production wafers
>> yet. http://www.xilinx.com/products/silicon_solutions/fpgas/virtex/virtex5...
>
>No FPGA foundry... Intel is selling 45nm quad core processors now.

Intel is not a foundry, i.e. it doesn't sell foundry services where people can send them a design to be manufactured in their fabrication facilities for a price. TSMC, UMC, Chartered and SMIC are examples of foundries, and none of them is selling production 45nm wafers at this time.

On Oct 30, 6:38 am, Patrick Dubois <prdub...@gmail.com> wrote:
> On 30 oct, 09:23, John_H <newsgr...@johnhandwork.com> wrote:
>
> > I'd suggest that an arbitrary number - selected at runtime rather than
> > by design - isn't practical but designing for 1025 points might be.
> >
> > I forget if it was LeCroy or Tektronix, but about a decade ago I read
> > the whitepaper on their scope's FFT capability. Scopes also have the
> > "limitation" of non-2^n size samples. Rather than decompose their FFT
> > into 2-element butterflies, they decomposed their 2/5/10 multiple
> > waveform samples into 2-element and 5-element FFT nodules. If the idea
> > behind the FFT is to reuse the sin/cos values for a given phase delta,
> > decomposing into a non-2^n system can work well. What obviously won't
> > work well for this approach is any size with large prime numbers.
> >
> > Your 1025 point example calls for 5x5x41 which wouldn't bring so much in
> > acceleration.
> >
> > Sorry I don't have an url for the whitepaper - it's been a decade.
> >
> > - John_H
>
> Thanks for the idea. But ideally I'd like to limit myself to power of
> 2 FFTs so that I can use the Xilinx core. The number of points needs
> to be flexible from 64 to 32k. Decomposition into smaller chunks seems
> like a promising avenue. Although if one number is decomposable into
> two power of 2s (e.g. 8192, which is divisible by 64 and 128), then
> that number itself will be a power of 2.
>
> I'd be quite happy with an algorithm that increases the number of
> points possible (compared to only power of 2s). Being able to do
> _every_ possible number of points is probably too strict a
> requirement.
>
> Patrick

Since you have used zero-filling, you don't seem to need a transform the exact size of the data. There is another technique complementary to zero-filling, sometimes called data folding. Since the DFT is a circular convolution, when your data set is larger than the transform size you can continue to 'wrap' the data in a circular manner and sum the overlapped samples. You then perform the smaller sized FFT. You might consider data folding up to sqrt(2) times the smaller power of two FFT size and zero-filling up to the larger transform size above that.

Chapter 8 (by fred harris) of 'Handbook of Digital Signal Processing Engineering Applications' edited by Douglas Elliott discusses this approach.

With either zero-fill or data folding it is useful to remember that the window size that determines bin response shapes is the data size, not the transform size.

Dale B. Dalrymple
http://dbdimages.com
http://stores.lulu.com/dbd
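A minimal sketch of the data folding idea in the post above, written in plain Python so it runs without any DSP library. It is an illustration under stated assumptions, not code from the post or from the cited book: the helper names `dft`, `fold` and `spectrum_at` are hypothetical, and a direct O(N^2) DFT stands in for the FFT core. The property it demonstrates is that the M-point DFT of the folded data equals the spectrum of the original, longer data evaluated at the M frequencies k/M.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here only as a stand-in for an FFT core."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fold(x, m):
    """Wrap x circularly into m samples, summing the samples that overlap."""
    folded = [0.0] * m
    for i, sample in enumerate(x):
        folded[i % m] += sample
    return folded


def spectrum_at(x, k, m):
    """Spectrum of the unfolded data x at frequency k/m cycles per sample."""
    return sum(s * cmath.exp(-2j * cmath.pi * k * t / m) for t, s in enumerate(x))


if __name__ == "__main__":
    data = [1.0, 2.0, -3.0, 0.5, 4.0, -1.0, 2.5, 0.0, 7.0, -2.0]  # 10 samples
    m = 8                                                          # transform size
    bins = dft(fold(data, m))
    worst = max(abs(bins[k] - spectrum_at(data, k, m)) for k in range(m))
    print("largest difference:", worst)  # expected to be at rounding-error level
```

For the 1025-point case this would mean adding sample 1024 onto sample 0 and running a 1024-point transform, at the cost of a coarser frequency grid than the 2048-point zero-padded version.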

Hi all, I found the problem: the "Retiming" option in Synplify Pro caused the address signals to be retimed and they could therefore not be packed into an IOB flip flop. This caused a much higher delay than in the other output signals. Thanks for your help, Simon "Simon Heinzle" <sheinzle@inf.ethz.ch> wrote in message news:47261129$1@news1-rz-ap.ethz.ch... > Hi FPGA Group! > > I'm struggling to get a fast speed (~ 200 MHz) for the DDR2 DRAM > interface, generated with the Xilinx Memory Interface Generator. The > complete system consists of a PCI interface, an I/O DMA buffer, a burst > module bursting from DMA buffer to the DDR2 DRAM interface. > > What is the best way to define setup/hold times for the I/O pads (UCF)? > (the RAM interface consists of a bi-dir data bus DQ, some output signals > e.g. A and the DRAM clocks CK) > 1. Using the OFFSET = OUT 5 ns AFTER "SYS_CLK_P"? Unfortunately, this does > only work using the Input Clock Pin, but it should probably better be in > reference to the DRAM clocks CK) > 2. Using TIMESPEC "TS_DDR_OUT" = FROM FFS TO "DDR_OUT" 5 ns; ? Is probably > better, but I'm not exactly sure. > > Furthermore, how would you tackle the problem if the timing at the pads > cannot be met? > > Thanks in advance for helpful answers and pointers in the right direction! > > Best regards, > Simon > > >

On 30 oct, 12:03, dbd <d...@ieee.org> wrote: > Since you have used zero-filling, you don't seem to need a transform > the exact size of the data. Correct. Although I want to minimize data growth as much as possible (which is the drawback of zero-filling). > There is another technique complementary > to zero-filling, sometimes called data folding. Since the DFT is a > circular convolution, when your data set is larger than the transform > size you can continue to 'wrap' the data in a circular manner and sum > the overlapped samples. You then perform the smaller sized FFT. You > might consider data folding up to sqrt 2 times the smaller power of > two FFT size and zero-filling up to the larger transform size above > that. > > Chapter 8 (by fred harris) of 'Handbook of Digital Signal Processing > Engineering Applications' edited by Douglas Eliot discusses this > approach. Interesting. Thanks for the reference, I'll try to find a copy of the book. Patrick

Antti <Antti.Lukats@googlemail.com> wrote:
> foundation 1.5 mapper terminates on windows XP with GPF fault
[...]
> or any other recommendations how to create bit file from VHDL for
> XC3100A FPGA

Why not set up a VMware sandbox with Win95 (or something like that) and run the old tools in that box? (It is also much easier to back up the VMware image.)

WD

Andrew FPGA wrote:
(snip)
> Also, they compared the FPGA to a standard cell asic. Presumably with
> full custom the difference would be even greater. (and the NRE FPGA
> advantage would be even greater too of course.)

Is sea-of-gates still available? I used to have the descriptions of the libraries for them. For SOG, as I understand it, only one custom mask is needed, over what is pretty much an array of transistors. The timing isn't quite as good as standard cell, as the transistor size can't be varied as much. It should be a lot cheaper, though.

-- glen

On 30 Okt., 20:18, Walter Dvorak <use-reply...@invalid.invalid> wrote:
> Antti <Antti.Luk...@googlemail.com> wrote:
> > foundation 1.5 mapper terminates on windows XP with GPF fault
> [...]
> > or any other recommendations how to create bit file from VHDL for
> > XC3100A FPGA
>
> why not setup up a vmware sandbox with win95 (or
> something like that) and running the old tools in this box?
>
> (and it is much easier to backup the vmware image)
>
> WD
> --

Yes, I can do that. My question was how likely F 1.5 is to work at all on 95 or NT.

OK, I also checked M 1.5 with schematic entry; it also GPFs in map, so it's not a Synplify EDIF issue. Weird.

I also installed F 3.1, which according to Xilinx should also support the 3100A, but it doesn't, so I may still need to get M 1.5 working somehow.

Antti

Nevo wrote: (snip) > This is definitely on my list of improvements to make in the future. I'm > going to build the first unit with the brute force approach and add in > refinements. (Christmas 2008 should be killer!) :) > Not only will this change reduce total power through the circuit, it'll > allow me to use a smaller transformer, which is one of the costliest parts > in the design. I once thought about doing an array of christmas lights multiplexed so that I could turn each one on and off under computer control. It needs a lot of diodes to do that, though now with LED christmas lights those wouldn't be needed. -- glen

Wei Wang wrote: > Eric, I appreciate your willingness to dig further to help, but my > question was how I could check the block ram mapping of my design > which I thought it was quite generic, and I would expect answers, such Without looking at your HDL code, I don't think we have any clue how you've attached the blockrams to your CPU, so I'm not sure how we could tell you how to check the mapping. > BTW, I > suppose most of us in this group do not work for ourselves, only lazy > university students would post their entire project and let somebody > else do the work for them. In general, I agree. But sometimes it's hard to offer any help without more detail, and sometimes the easiest way to see the relevant detail is to look at the HDL. I'm sorry that I don't have any more specific advice to offer. Best regards, Eric

Nevo wrote: (snip) > If I were designing the circuit board, I'd put in pads for snubbers. I'm > buying from a 'group buy' on a DIY Christmas lights forum and the existing > boards don't have snubbers designed in. I'll provide this feedback for next > year's designs. :) How about a URL so that others interested could follow along? -- glen

Hi all, I would like to debug a system containing a MicroBlaze and a PPC405. I'm using xmd (gdb) for both of these units. I have a single mdm unit and a jtagppc (a single JTAG interface). Is there a way to debug both of the processors simultaneously (via two GDBs)? Thanks in advance, Mordehay

> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).

Hi Patrick,

A popular misconception seems to be that in order to use the Cooley-Tukey FFT the number of points must be a power of 2. Digital Signal Processing with Field Programmable Gate Arrays by Uwe Meyer-Baese, for example, has an example where he shows the Cooley-Tukey FFT for N = 12. He claims 674 real additions and 432 real multiplications for a 12 point DFT, and 108 real additions and 28 real multiplications for the 12 point FFT. The signal flow graph he shows has, in the first stage, three 4-point DFTs, followed by twiddle factors, and then a final stage of four 3-point DFTs.

He also goes on to present the Good-Thomas FFT algorithm and the Winograd FFT. Again he demonstrates for N=12.

Regards
Andrew
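The two-stage structure described above (three 4-point DFTs, twiddle factors, then four 3-point DFTs) is the general Cooley-Tukey split of N = N1*N2. The sketch below is a plain Python illustration of that split and is not taken from the book; the function names `dft` and `cooley_tukey` are hypothetical, and the small DFTs are computed directly rather than with optimized kernels, so it says nothing about the operation counts quoted in the post.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT used for the small kernels and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def cooley_tukey(x, n1, n2):
    """N = n1*n2 point DFT: n1 DFTs of size n2, twiddles, then n2 DFTs of size n1."""
    n = n1 * n2
    assert len(x) == n
    # Stage 1: n1 DFTs of length n2 on the decimated sequences x[a], x[a+n1], ...
    inner = [dft(x[a::n1]) for a in range(n1)]
    # Twiddle factors W_N^(a*k2) applied between the two stages.
    for a in range(n1):
        for k2 in range(n2):
            inner[a][k2] *= cmath.exp(-2j * cmath.pi * a * k2 / n)
    # Stage 2: n2 DFTs of length n1 taken across the stage-1 outputs.
    out = [0j] * n
    for k2 in range(n2):
        column = dft([inner[a][k2] for a in range(n1)])
        for k1 in range(n1):
            out[k2 + n2 * k1] = column[k1]
    return out


if __name__ == "__main__":
    x = [complex(i, 0) for i in range(12)]
    fast = cooley_tukey(x, 3, 4)  # three 4-point DFTs, then four 3-point DFTs
    ref = dft(x)
    print(max(abs(a - b) for a, b in zip(fast, ref)))  # rounding-error level
```

Nothing in the split requires N1 or N2 to be a power of two; the power-of-two case is simply the one where every kernel is a radix-2 butterfly.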

On 30 Okt., 16:42, mk <kal*@dspia.*comdelete> wrote: > >No FPGA foundry... Intel is selling 45nm quad core processors now. > > Intel is not a foundry ie it doesn't sell foundry services where > people can send them a design to be manufactured in their fabrication > facilities for a price. TSMC, UMC, Chartered and SMIC are examples of > foundries and none of them is selling production 45nm wafers at this > time. IBM claims to provide a 45 nm ASIC process(IBM Cu-45HP ASIC). I don't know, if you could get a production start right now(if your device would be ready for production), but it is offered to customers on their homepage. bye Thomas

On 27 Okt., 17:08, mk <kal*@dspia.*comdelete> wrote: > On Sat, 27 Oct 2007 00:39:05 -0700, Thomas Stanka > > <usenet_nospam_va...@stanka-web.de> wrote: > >when it comed to adders because nearly every actual fpga provides fast > >carry logic which I haven't seen in ASIC so far. > > Almost all ASIC libraires I've used has a full adder in it which is > basically what a fast carry logic is in an FPGA where they have > hardwired full adders which don't need to be made from luts and don't > need the programmable interconnect. I'ts very easy to accomplish the > same in an ASIC. Is it only a fulladder gate, or a full adder (for given bitsize)? The adders I've seen in ASIC libs provide no fast carry compared to normal cell delay. But that has nothing to say, as I'm not that experienced in comparing ASIC technologies. bye Thomas

<me_2003@walla.co.il> wrote in message news:1193780010.206149.316160@57g2000hsv.googlegroups.com... > Hi all, > I would like to debug a system containing a microblze and a ppc405. > I'm > using the xmd (gdb) for both of these units. I have a single mdm unit > and a jtagppc (a single jtag interface). > Is there a way to debug both of the processors simultaneously (via > two > GDBs). > Thanks in advance, Mordehay > Just connect to the two processors in XMD via two separate connect commands. XMD will open up a GDBServer port for each processor. Connect a GDB session each to the GDBServer ports. You are all set to debug the two processors simultaneously. Be aware of the default system reset performed by XMD upon download of a program. You can use the debugconfig command to change this behavior, if it is destructive for your simultaneous debug sessions.

Hi,

me_2003@walla.co.il wrote:
> I would like to debug a system containing a microblze and a ppc405.
> I'm
> using the xmd (gdb) for both of these units. I have a single mdm unit
> and a jtagppc (a single jtag interface).
> Is there a way to debug both of the processors simultaneously (via
> two
> GDBs).

You should be able to do this. Using xmd, connect to both CPUs:

% connect mb mdm
% connect ppc hw

You may need to add other options to each connect statement, depending on your FPGA and JTAG setup etc. This sequence would make the MB target 0, and the PPC target 1.

xmd should then be listening on two different ports, one for the mb, and one for the ppc. Since xmd tends to allocate ports from 1234 upwards, my guess is that the mb will be on port 1234, and the ppc on port 1235. The actual port numbers will be printed by xmd after each connection is made.

Then start each gdb and issue a target command:

target remote localhost:1234 (for microblaze)
target remote localhost:1235 (for the ppc)

Some minor details may remain for you to work out, but this is an overview of the process.

Regards,

John

Patrick Dubois wrote:
> Hi,
>
> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).
>
> The cores usually found for an FPGA implementation only permit FFTs on
> a number of points that is a power of 2. In our particular case
> however, we need to be able to do an FFT on a vector of say, 1025
> points. Our current algorithm is to zero-pad this vector to 2048
> points.
>
> The problem with this approach is that we nearly double the number of
> points for downstream processing. This is very problematic at high
> datarates.
>
> We have a few ideas but since this seems like a common problem, I'd
> like to ask people here for tips.
>
> Thanks!
>
> Patrick
>

As others have mentioned, you need non-power of two FFT kernels to get non-power of two transform sizes unless you use zero padding. A possibility would be to have a set of non-power of two kernels you could combine using the mixed radix algorithm to get larger FFTs.

If your FFT kernel sizes are relatively prime, you get a significant simplification because it doesn't need phase rotations between the FFT kernels. For example, a 1540 point FFT can be had by combining 4, 5, 7 and 11 point kernels with no intervening rotations. If you use rotations between the FFT kernels, you can mix the kernels without regard to whether or not they are relatively prime.

There are Winograd kernels for all these sizes, and others (Rader or Singleton as I recall) for 3 and 5 point. These are actually quite a bit easier to do in hardware than they are with a microprocessor, as the reordering is a bit convoluted. You might look at the Smith & Smith book on FFT's (http://www.amazon.com/exec/obidos/ASIN/0780310918/andraka), as they have a very good coverage of non-Cooley-Tukey FFTs.

Unfortunately, arbitrary sizes are a little harder to do if it is to be flexible because the addressing and phase rotations are not as well ordered as they are for power of 2 sizes.
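The "no phase rotations when the kernel sizes are relatively prime" point above is the Good-Thomas (prime factor) index mapping. The following Python sketch shows it for two coprime factors; it is an illustration only, with hypothetical function names (`dft`, `good_thomas`) and direct DFTs standing in for Winograd or other optimized kernels. A size such as 1540 = 4*5*7*11 would apply the same mapping repeatedly, which this two-factor sketch does not attempt.

```python
import cmath
from math import gcd


def dft(x):
    """Direct O(N^2) DFT used for the small kernels and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def good_thomas(x, n1, n2):
    """N = n1*n2 point DFT for coprime n1, n2 with no twiddles between stages."""
    n = n1 * n2
    assert len(x) == n and gcd(n1, n2) == 1
    # Input map: sample index (n2*a + n1*b) mod N goes to grid position (a, b).
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Length-n2 DFTs along the rows, then length-n1 DFTs along the columns.
    rows = [dft(row) for row in grid]
    cols = [dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Output map (Chinese remainder theorem): bin k has k1 = k mod n1, k2 = k mod n2.
    return [cols[k % n2][k % n1] for k in range(n)]


if __name__ == "__main__":
    x = [complex(i * i % 7, i) for i in range(15)]
    fast = good_thomas(x, 3, 5)
    ref = dft(x)
    print(max(abs(a - b) for a, b in zip(fast, ref)))  # rounding-error level
```

The price of dropping the twiddle multipliers is the modular input and output addressing, which matches the remark in the post that the reordering is "a bit convoluted".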

On Oct 30, 6:17 am, futz...@gmail.com wrote:
> On Oct 25, 9:57 am, techG <giuliopul...@gmail.com> wrote:
>
> > HI, I'm new in FPGA, I have to build a SPI interface (in VHDL) to let
> > an fpga read and write a flash memory.
> > The fpga is a Xilinx Spartan3E, while the memory is an ST M25P16
> > (serial I/O).
> > Do you know if is there any built vhdl core to start with?
>
> > Thanks in advance
> > Giulio
>
> You can try www.opencores.org;
> The SPI core interface is quite simple to code up as well.
> Cheers

I just found a project for an SPI controller on opencores.org (spiflashcontroller). It's not as simple as I expected, but I found it more useful! Thank you all.

On Tue, 30 Oct 2007 14:46:40 -0700, Thomas Stanka <usenet_nospam_valid@stanka-web.de> wrote: >On 30 Okt., 16:42, mk <kal*@dspia.*comdelete> wrote: >> >No FPGA foundry... Intel is selling 45nm quad core processors now. >> >> Intel is not a foundry ie it doesn't sell foundry services where >> people can send them a design to be manufactured in their fabrication >> facilities for a price. TSMC, UMC, Chartered and SMIC are examples of >> foundries and none of them is selling production 45nm wafers at this >> time. > > >IBM claims to provide a 45 nm ASIC process(IBM Cu-45HP ASIC). I don't >know, if you could get a production start right now(if your device >would be ready for production), but it is offered to customers on >their homepage. Here is a quote from their press release: "IBM SiGe BiCMOS 6WL design kits are available now. IBM plans to have SiGe BiCMOS 5PAe design kits available this summer and first design kits for CMOS 11LP Foundry products later in 2007. The planned availability date for Cu-45HP ASIC is in early 2008." http://www-03.ibm.com/press/us/en/pressrelease/21648.wss

On Tue, 30 Oct 2007 14:50:44 -0700, Thomas Stanka <usenet_nospam_valid@stanka-web.de> wrote: >On 27 Okt., 17:08, mk <kal*@dspia.*comdelete> wrote: >> On Sat, 27 Oct 2007 00:39:05 -0700, Thomas Stanka >> >> <usenet_nospam_va...@stanka-web.de> wrote: >> >when it comed to adders because nearly every actual fpga provides fast >> >carry logic which I haven't seen in ASIC so far. >> >> Almost all ASIC libraires I've used has a full adder in it which is >> basically what a fast carry logic is in an FPGA where they have >> hardwired full adders which don't need to be made from luts and don't >> need the programmable interconnect. I'ts very easy to accomplish the >> same in an ASIC. > >Is it only a fulladder gate, or a full adder (for given bitsize)? >The adders I've seen in ASIC libs provide no fast carry compared to >normal cell delay. But that has nothing to say, as I'm not that >experienced in comparing ASIC technologies. Xilinx fast carry logic path for v5 is described in this document http://direct.xilinx.com/bvdocs/userguides/ug190.pdf page 193. As you can see it is not a multi-bit carry lookahead or anything similarly complicated. It's a hardwired implementation of a ripple carry logic which can be duplicated using standard cell full adders relatively easily.
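As a bit-level illustration of what "ripple carry logic built from full adders" means in the post above, here is a small behavioral model in Python. It is a sketch for explanation only, not a description of the Virtex-5 carry chain or of any ASIC library cell; the function names and the default 16-bit width are assumptions made for the example, and the model says nothing about timing.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out) for input bits a, b, cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=16):
    """Add two unsigned `width`-bit values by chaining full adders LSB first."""
    carry = 0
    result = 0
    for i in range(width):
        # The carry out of bit i feeds the carry in of bit i+1 (the ripple path).
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_carry_add(1234, 4321))  # (5555, 0)
    print(ripple_carry_add(0xFFFF, 1))   # (0, 1): the carry ripples through all 16 bits
```

The second call is the worst case for delay in hardware, because the carry has to propagate through every stage before the top sum bit and carry out are valid.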

Nicolas Matringe wrote: > What are the books you wouldn't work without ? I could work without books, but not without a simulator. -- Mike Treseler

On 31 Okt., 05:15, mk <kal*@dspia.*comdelete> wrote:
> On Tue, 30 Oct 2007 14:50:44 -0700, Thomas Stanka
> <usenet_nospam_va...@stanka-web.de> wrote:
> >On 27 Okt., 17:08, mk <kal*@dspia.*comdelete> wrote:
> >> On Sat, 27 Oct 2007 00:39:05 -0700, Thomas Stanka
> >> <usenet_nospam_va...@stanka-web.de> wrote:
> >> >when it comed to adders because nearly every actual fpga provides fast
> >> >carry logic which I haven't seen in ASIC so far.
> >>
> >> Almost all ASIC libraires I've used has a full adder in it which is
> >> basically what a fast carry logic is in an FPGA where they have
> >> hardwired full adders which don't need to be made from luts and don't
> >> need the programmable interconnect. I'ts very easy to accomplish the
> >> same in an ASIC.
> >
> >Is it only a fulladder gate, or a full adder (for given bitsize)?
> >The adders I've seen in ASIC libs provide no fast carry compared to
> >normal cell delay. But that has nothing to say, as I'm not that
> >experienced in comparing ASIC technologies.
>
> Xilinx fast carry logic path for v5 is described in this document
> http://direct.xilinx.com/bvdocs/userguides/ug190.pdf page 193. As you
> can see it is not a multi-bit carry lookahead or anything similarly
> complicated. It's a hardwired implementation of a ripple carry logic
> which can be duplicated using standard cell full adders relatively
> easily.

Thank you, I have no need to learn the details of the Virtex-5 carry logic at the moment. My point is that in the FPGA technologies I know, the carry chain path has a much smaller delay than a normal gate. This means you could do 8 to 16 bit addition in a pipeline running at nearly the maximum technology speed for FF-gate-FF; you may even need more time for multiplexing the data before or after the adder stage than for the addition itself.

In ASICs I've only seen gates whose carry chain delay is comparable to a normal gate delay. This means that a 16 bit ripple adder won't be able to run anywhere near FF-gate-FF speed but needs something like FF-17xgate-FF. In that case the adder stage will likely dominate your pipeline frequency. You could only speed up a ripple carry adder by placing it tightly together, but that has to be done for any timing-critical path anyway.

bye Thomas

Thomas Stanka wrote:
> Is it only a fulladder gate, or a full adder (for given bitsize)?
> The adders I've seen in ASIC libs provide no fast carry compared to
> normal cell delay. But that has nothing to say, as I'm not that
> experienced in comparing ASIC technologies.

There are also different adder implementations at the synthesis tool level. For example, Design Compiler with the proper libraries supports the following adders: ripple-carry, carry-look-ahead, delay-optimized flexible parallel-prefix, Brent-Kung, conditional-sum and ripple-carry-select.

--Kim

Thanks, I have seen this product but it is not flexible enough for me. This kind of chip is too specific, that's why I wanted to do this with an FPGA. I don't know exactly how flash memory works, so I can't tell what throughput I could reach with an IDE interface and a flash memory controller in an FPGA... I thought someone else might have done this before, which would save me some time... Anyway, I'll have to search by myself :)
