Article: 161250
Subject: Re: Tiny CPUs for Slow Logic
From: Theo Markettos <theom+news@chiark.greenend.org.uk>
Date: 20 Mar 2019 11:52:07 +0000 (GMT)
already5chosen@yahoo.com wrote:
> As to niches, all "hard" blocks that we currently have in FPGAs are about
> niches.  It's extremely rare that user's design uses all or majority of
> the features of given FPGA device and need LUTs, embedded memories, PLLs,
> multiplies, SERDESs, DDR DRAM I/O blocks etc in exact amounts appearing in
> the device.  It still makes sense, economically, to have them all built
> in, because masks and other NREs are mighty expensive while silicon itself
> is relatively cheap.  Multiple small hard CPU cores are really not very
> different from features, mentioned above.

A lot of these 'niches' have been proven in soft-logic.

Implement your system in soft-logic, discover that there's lots of
multiply-adds and they're slow and take up area.  A DSP block is thus an
'accelerator' (or 'most compact representation') of the same concept in
soft-logic.

The same goes for BRAMs (can be implemented via registers but too much
area), adders (slow when implemented with generic LUTs), etc.

Other features (SERDES, PLLs, DDR, etc) can't be done at all without
hard-logic support.  If you want those features, you need the hard logic,
simple as that.

Through analysis of existing designs we can show a provable win of the hard
logic over the soft logic, making it worthwhile to put it on the silicon and
integrate it into the tools.  In some of these cases, I'd guess the win over
the soft-logic is a 10x or more saving in area.

Rick's idea can be done today in soft-logic.  So someone could build a proof
of concept and measure the cases where it improves things over the baseline. 
If that case is compelling, let's put it in the hard logic.
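
For instance, the proof of concept needs nothing exotic: instantiate a tiny
soft core next to ordinary fabric logic and measure area, speed and design
effort against the pure-HDL baseline.  A minimal VHDL sketch of that kind of
instantiation (the "tiny_cpu" entity, its generic and its port names are
hypothetical placeholders, not any real vendor core):

    -- Inside the architecture of the design under test: the core replaces
    -- one slow state machine and talks to the fabric over plain signals.
    u_seq : entity work.tiny_cpu
        generic map (
            PROGRAM_FILE => "sequencer.hex"   -- small program, assembled offline
        )
        port map (
            clk      => clk,
            rst      => rst,
            gpio_in  => status_bits,          -- inputs sampled from the fabric
            gpio_out => control_bits          -- outputs driven back into the fabric
        );

Synthesise it both ways and the area and Fmax numbers fall straight out of
the reports.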

But thus far we haven't seen a clear case for why someone should build a
proof of concept.  I'm not saying it doesn't exist, but we need a clear
elucidation of the problem that it might solve.

Theo

Article: 161251
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Wed, 20 Mar 2019 13:37:13 +0000
On 20/03/19 10:56, already5chosen@yahoo.com wrote:
> On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
>> On 19/03/19 17:35, already5chosen@yahoo.com wrote:
>>> On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
>>>> The "granularity" of the computation and communication will be a key
>>>> to understanding what the OP is thinking.
>>> 
>>> I don't know what Rick had in mind. I personally would go for one
>>> "hard-CPU" block per 4000-5000 6-input logic elements (i.e. Altera ALMs
>>> or Xilinx CLBs). Each block could be configured either as one 64-bit core
>>> or pair of 32-bit cores. The block would contain hard instruction
>>> decoders/ALUs/shifters and hard register files. It can optionally borrow
>>> adjacent DSP blocks for multipliers. Adjacent embedded memory blocks can
>>> be used for data memory. Code memory should be a bit more flexible giving
>>> to designer a choice between embedded memory blocks or distributed memory
>>> (X)/MLABs(A).
>> 
>> It would be interesting to find an application level description (i.e.
>> language constructs) that
>>   - could be automatically mapped onto those primitives by a toolset
>>   - was useful for more than a niche subset of applications
>>   - was significantly better than existing tools
>> 
>> I wouldn't hold my breath :)
> 
> 
> I think, you are looking at it from wrong angle. One doesn't really need new
> tools to design and simulate such things. What's needed is a combination of
> existing tools - compilers, assemblers, probably software simulator plug-ins
> into existing HDL simulators, but the latter is just luxury for speeding up
> simulations, in principle, feeding HDL simulator with RTL model of the CPU
> core will work too.

That would be one perfectly acceptable embodiment of a toolset
that I mentioned.

But more difficult than creating such a toolset is defining
an application level description that a toolset can munge.

So, define (initially by example, later more formally) inputs
to the toolset and outputs from it. Then we can judge whether
the concepts are more than handwaving wishes.



> As to niches, all "hard" blocks that we currently have in FPGAs are about
> niches. It's extremely rare that user's design uses all or majority of the
> features of given FPGA device and need LUTs, embedded memories, PLLs,
> multiplies, SERDESs, DDR DRAM I/O blocks etc in exact amounts appearing in
> the device. It still makes sense, economically, to have them all built in,
> because masks and other NREs are mighty expensive while silicon itself is
> relatively cheap. Multiple small hard CPU cores are really not very different
> from features, mentioned above.

All the blocks you mention have a simple API and an easily
enumerated set of behaviours.

The whole point of processors is that they enable much more
complex behaviour that is practically impossible to enumerate.

Alternatively, if it is possible to enumerate the behaviour
of a processor, then it would be easy and more efficient to
implement the behaviour in conventional logic blocks.

Article: 161252
Subject: Re: Tiny CPUs for Slow Logic
From: already5chosen@yahoo.com
Date: Wed, 20 Mar 2019 07:11:33 -0700 (PDT)
On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
>
> But more difficult that creating such a toolset is defining
> an application level description that a toolset can munge.
>
> So, define (initially by example, later more formally) inputs
> to the toolset and outputs from it. Then we can judge whether
> the concepts are more than handwaving wishes.
>

I don't understand what you are asking for.

If I had such thing, I'd use it in exactly the same way that I use soft
cores (Nios2) today. I will just use them more frequently, because today it
costs me logic resources (often acceptable, but not always) and synthesis
and fitter time (and that what I really hate). On the other hand, "hard"
core would be almost free in both aspects.
It would be as expensive as "soft" or even costlier, in HDL simulations, but
until now I managed to avoid "full system" simulations that cover everything
including CPU core and the program that runs on it. Or may be, I did it once
or twice years ago and already don't remember. Anyway, for me it's not an
important concern and I consider myself rather heavy user of soft cores.

Also, theoretically, if performance of the hard core is non-trivially higher
than that of soft cores, either due to higher IPC (I didn't measure, but
would guess that for majority of tasks Nios2-f IPC is 20-30% lower than ARM
Cortex-M4) or due to higher clock rate, then it will open up even more
niches. However I'd expect that performance factor would be less important
for me, personally, than other factors mentioned above.


Article: 161253
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Wed, 20 Mar 2019 14:31:23 +0000
On 20/03/19 14:11, already5chosen@yahoo.com wrote:
> On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
>> 
>> But more difficult that creating such a toolset is defining an application
>> level description that a toolset can munge.
>> 
>> So, define (initially by example, later more formally) inputs to the
>> toolset and outputs from it. Then we can judge whether the concepts are
>> more than handwaving wishes.
>> 
> 
> I don't understand what you are asking for.

Go back and read the parts of my post that you chose to snip.

Give a handwaving indication of the concepts that avoid the
conceptual problems that I mentioned.

Or better still, get the OP to do it.



> If I had such thing, I'd use it in exactly the same way that I use soft cores
> (Nios2) today. I will just use them more frequently, because today it costs
> me logic resources (often acceptable, but not always) and synthesis and
> fitter time (and that what I really hate). On the other hand, "hard" core
> would be almost free in both aspects. It would be as expensive as "soft" or
> even costlier, in HDL simulations, but until now I managed to avoid "full
> system" simulations that cover everything including CPU core and the program
> that runs on it. Or may be, I did it once or twice years ago and already
> don't remember. Anyway, for me it's not an important concern and I consider
> myself rather heavy user of soft cores.
> 
> Also, theoretically, if performance of the hard core is non-trivially higher
> than that of soft cores, either due to higher IPC (I didn't measure, but
> would guess that for majority of tasks Nios2-f IPC is 20-30% lower than ARM
> Cortex-M4) or due to higher clock rate, then it will open up even more
> niches. However I'd expect that performance factor would be less important
> for me, personally, than other factors mentioned above.


Article: 161254
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 07:50:59 -0700 (PDT)
On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown wrote:
> On 20/03/2019 03:30, gnuarm.deletethisbit@gmail.com wrote:
> > On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos
> > wrote:
> >> Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
> >>> Understand XMOS's xCORE processors and xC language, see how they
> >>> complement and support each other. I found the net result
> >>> stunningly easy to get working first time, without having to
> >>> continually read obscure errata!
> >>
> >> I can see the merits of the XMOS approach.  But I'm unclear how
> >> this relates to the OP's proposal, which (I think) is having tiny
> >> CPUs as hard logic blocks on an FPGA, like DSP blocks.
> >>
> >> I completely understand the problem of running out of hardware
> >> threads, so a means of 'just add another one' is handy.  But the
> >> issue is how to combine such things with other synthesised logic.
> >>
> >> The XMOS approach is fine when the hardware is uniform and the
> >> software sits on top, but when the hardware is synthesised and the
> >> 'CPUs' sit as pieces in a fabric containing random logic (as I
> >> think the OP is suggesting) it becomes a lot harder to reason about
> >> what the system is doing and what the software running on such
> >> heterogeneous cores should look like.  Only the FPGA tools have a
> >> full view of what the system looks like, and it seems stretching
> >> them to have them also generate software to run on these cores.
> >
> > When people talk about things like "software running on such
> > heterogeneous cores" it makes me think they don't really understand
> > how this could be used.  If you treat these small cores like logic
> > elements, you don't have such lofty descriptions of "system software"
> > since the software isn't created out of some global software package.
> > Each core is designed to do a specific job just like any other piece
> > of hardware and it has discrete inputs and outputs just like any
> > other piece of hardware.  If the hardware clock is not too fast, the
> > software can synchronize with and literally function like hardware,
> > but implementing more complex logic than the same area of FPGA fabric
> > might.
> >
>
> That is software.
>
> If you want to try to get cycle-precise control of the software and use
> that precision for direct hardware interfacing, you are almost certainly
> going to have a poor, inefficient and difficult design.  It doesn't
> matter if you say "think of it like logic" - it is /not/ logic, it is
> software, and you don't use that for cycle-precise control.  You use
> when you need flexibility, calculations, and decisions.

I suppose you can make anything difficult if you try hard enough.

The point is you don't have to make it difficult by talking about "software
running on such heterogeneous cores".  Just talk about it being a small
hunk of software that is doing a specific job.  Then the mystery is gone
and the task can be made as easy as the task is.

In VHDL this would be a process().  VHDL programs are typically chock full
of processes and no one wrings their hands worrying about how they will
design the "software running on such heterogeneous cores".

BTW, VHDL is software too.
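
As a concrete (invented) example of the sort of small, single-purpose job
being talked about, here is a trivial sequencer written as an ordinary VHDL
process; "count" is an unsigned signal and MAX_COUNT a constant, both
assumed to be declared elsewhere:

    timer : process(clk)
    begin
        if rising_edge(clk) then
            if start = '1' then              -- kick off one timed job
                count <= (others => '0');
                busy  <= '1';
                done  <= '0';
            elsif busy = '1' then
                if count = MAX_COUNT then
                    busy <= '0';
                    done <= '1';             -- report completion to the fabric
                else
                    count <= count + 1;
                end if;
            end if;
        end if;
    end process;

The same job expressed as a couple of dozen instructions on a tiny core is
no more mysterious than this process is.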


> > There is no need to think about how the CPUs would communicate unless
> > there is a specific need for them to do so.  The F18A uses a
> > handshaked parallel port in their design.  They seem to have done a
> > pretty slick job of it and can actually hang the processor waiting
> > for the acknowledgement saving power and getting an instantaneous
> > wake up following the handshake.  This can be used with other CPUs or
> >
>
> Fair enough.

Ok, that's a start.

Rick C.

Article: 161255
Subject: Re: Tiny CPUs for Slow Logic
From: already5chosen@yahoo.com
Date: Wed, 20 Mar 2019 07:51:08 -0700 (PDT)
On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
> On 20/03/19 14:11, already5chosen@yahoo.com wrote:
> > On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
> >> 
> >> But more difficult that creating such a toolset is defining an application
> >> level description that a toolset can munge.
> >> 
> >> So, define (initially by example, later more formally) inputs to the
> >> toolset and outputs from it. Then we can judge whether the concepts are
> >> more than handwaving wishes.
> >> 
> > 
> > I don't understand what you are asking for.
> 
> Go back and read the parts of my post that you chose to snip.
> 
> Give a handwaving indication of the concepts that avoid the
> conceptual problems that I mentioned.

Frankly, it starts to sound like you never used soft CPU cores in your designs.
So, for somebody like myself, who uses them routinely for different tasks since 2006, you are really not easy to understand.
Concept? Concepts are good for new things, not for something that is a variation of something old and routine and obviously working.

> 
> Or better still, get the OP to do it.
> 

With that part I agree.



Article: 161256
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 07:52:57 -0700 (PDT)
On Wednesday, March 20, 2019 at 6:29:50 AM UTC-4, already...@yahoo.com wrote:
> On Wednesday, March 20, 2019 at 4:32:07 AM UTC+2, gnuarm.del...@gmail.com wrote:
> > On Tuesday, March 19, 2019 at 11:24:33 AM UTC-4, Svenn Are Bjerkem wrote:
> > > On Tuesday, March 19, 2019 at 1:13:38 AM UTC+1, gnuarm.del...@gmail.com wrote:
> > > > Most of us have implemented small processors for logic operations
> > > > that don't need to happen at high speed.  Simple CPUs can be built
> > > > into an FPGA using a very small footprint much like the ALU blocks.
> > > > There are stack based processors that are very small, smaller than
> > > > even a few kB of memory.
> > > >
> > > > If they were easily programmable in something other than C would
> > > > anyone be interested?  Or is a C compiler mandatory even for
> > > > processors running very small programs?
> > > >
> > > > I am picturing this not terribly unlike the sequencer I used many
> > > > years ago on an I/O board for an array processor which had it's own
> > > > assembler.  It was very simple and easy to use, but very much not a
> > > > high level language.  This would have a language that was high
> > > > level, just not C rather something extensible and simple to use and
> > > > potentially interactive.
> > > >
> > > > Rick C.
> > >
> > > picoblaze is such a small cpu and I would like to program it in
> > > something else but its assembler language.
> >
> > Yes, it is small.  How large is the program you are interested in?
> >
> > Rick C.
>
> I don't know about Svenn Are Bjerkem, but can tell you about myself.
> Last time when I considered something like that and wrote enough of the
> program to make measurements the program contained ~250 Nios2
> instructions.  I'd guess, on minimalistic stack machine it would take
> 350-400 instructions.
> At the end, I didn't do it in software. Coding the same functionality in
> HDL turned out to be not hard, which probably suggests that my case was
> smaller than average.
>
> Another extreme, where I did end up using "small" soft core, it was much
> more like "real" software: 2300 Nios2 instructions.

What sorts of applications were these?

Rick C.

Article: 161257
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 07:58:27 -0700 (PDT)
On Wednesday, March 20, 2019 at 6:41:55 AM UTC-4, already...@yahoo.com wrot=
e:
> On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
> > On 19/03/19 17:35, already5chosen@yahoo.com wrote:
> > > On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
> > >>
> > >> The UK Parliament is an unmitigated dysfunctional mess.
> > >>
> > >
> > > Do you prefer dysfunctional mesh ;)
> >
> > :) I'll settle for anything that /works/ predictably :(
> >
>
> UK political system is completely off-topic in comp.arch.fpga. However
> I'd say that IMHO right now your parliament is facing unusually difficult
> problem on one hand, but at the same time it's not really "life or death"
> sort of the problem. Having troubles and appearing non-decisive in such
> situation is normal. It does not mean that the system is broken.

I was watching a video of a guy who bangs together Teslas from salvage
cars.  This one was about him actually buying a used Tesla from Tesla and
the many trials and tribulations he had.  He had traveled to a dealership
over an hour drive away and they said they didn't have anything for him.
At one point he says he is not going to get too wigged out over all this
because it is a "first world problem".  That gave me insight into my own
issues, realizing that what seems at first to me to be a major issue is an
issue that much of the world would LOVE to have.

I'm wondering if Brexit is not one of those issues...  I'm just sayin'...

FPGA design is similar.  Consider which of your issues are "first world"
issues when you design.

Rick C.

Article: 161258
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 08:26:46 -0700 (PDT)
On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
> gnuarm.deletethisbit@gmail.com wrote:
> > On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos wrote:
> > >
> > When people talk about things like "software running on such
> > heterogeneous cores" it makes me think they don't really understand how
> > this could be used.  If you treat these small cores like logic elements,
> > you don't have such lofty descriptions of "system software" since the
> > software isn't created out of some global software package.  Each core
> > is designed to do a specific job just like any other piece of hardware
> > and it has discrete inputs and outputs just like any other piece of
> > hardware.  If the hardware clock is not too fast, the software can
> > synchronize with and literally function like hardware, but implementing
> > more complex logic than the same area of FPGA fabric might.
>
> The point is that we need to understand what the whole system is doing.  In
> the XMOS case, we can look at a piece of software with N threads, running
> across the cores provided on the chip.  One piece of software, distributed
> over the hardware resource available - the system is doing one thing.
>
> Your bottom-up approach means it's difficult to see the big picture of
> what's going on.  That means it's hard to understand the whole system, and
> to program from a whole-system perspective.

I never mentioned a bottom up or a top down approach to design.  Nothing
about using these small CPUs is about the design "direction".  I am pretty
sure that you have to define the circuit they will work in before you can
start designing the code.


> > Not sure what is hard to think about.  It's a CPU, a small CPU with
> > limited memory to implement small tasks that can do rather complex
> > operations compared to a state machine really and includes memory,
> > arithmetic and logic as well as I/O without having to write a single
> > line of HDL.  Only the actual app needs to be written.
>
> Here are the semantic descriptions of basic logic elements:
>
> LUT:  q = f(x,y,z)
> FF:   q <= d_in  (delay of one cycle)
> BRAM: q = array[addr]
> DSP:  q = a*b + c
>
> A P&R tool can build a system out of these building blocks.  It's notable
> that the state-holding elements in this schema do nothing else except
> holding state.  That makes writing the tools easier (and we all know how
> difficult the tools already are).  In general, we don't tend to instantiate
> these primitives manually but describe the higher level functions (eg a 64
> bit add) in HDL and allow the tools to select appropriate primitives for us
> (eg a number of fast-adder blocks chained together).
>
> What's the logic equation of a processor?

Obviously it is like a combination of LUTs with FFs and able to implement
any logic you wish, including math.  BTW, in many devices the elements are
not at all so simple.  Xilinx LUTs can be used as shift registers.  There is
additional logic within the logic blocks that allows math with carry chains,
combining LUTs to form larger LUTs, breaking LUTs into smaller LUTs, and
let's not forget about routing, which may not be used much anymore, not
sure.

So your simple world of four elements is really not so valid.
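
For example, a plain delay line like the sketch below (signal names are
invented) will normally be packed by the Xilinx tools into LUT-based shift
registers (SRLs) rather than into sixteen separate flip-flops, which is
exactly the point that the LUT is more than a pure q = f(x,y,z) function
generator:

    -- taps : std_logic_vector(15 downto 0); no reset, so SRL inference applies
    delay_line : process(clk)
    begin
        if rising_edge(clk) then
            taps <= taps(14 downto 0) & din;  -- shift in one new bit per clock
        end if;
    end process;
    dout <= taps(15);                         -- din delayed by 16 clocks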


> It has state, but vastly more
> state than the simplicity of a flipflop.  What pattern does the P&R tool
> need to match to infer a processor?

Why does it need to be inferred?  If you want to write an HDL tool to turn
HDL into processor code, have at it.  But then there are other methods.
Someone mentioned his MO is to use other tools for designing his algorithms
and letting that tool generate the software for a processor or the HDL for
an FPGA.  That would seem easy enough to integrate.

> How is any verification tool going to understand whether the processor
> with software is doing the right thing?

Huh?  You can't simulate code on a processor???


> If your answer is 'we don't need verification tools, we program by hand'
> then a) software has bugs, and automated verification is a handy way to
> catch them, and b) you're never going to be writing hundreds of different
> mini-programs to run on each core, let alone make them correct.

You seem to have left the roadway here.  I'm lost.


> If we scale the processors up a bit, I could see the merits in say a bank
> of, say, 32 Cortex M0s that could be interconnected as part of the FPGA
> fabric and programmed in software for dedicated tasks (for instance, read
> the I2C EEPROM on the DRAM DIMM and configure the DRAM controller at boot).

I don't follow your logic.  What is different about the ARM processor from
the stack processor other than that it is larger and slower and requires a
royalty on each one?  Are you talking about writing the code in C vs.
whatever is used for the stack processor?


> But this is an SoC construct (built using SoC builder tools, and over which
> the programmer has some purview although, as it turns out, sketchier than
> you might think[1]).  Such CPUs would likely be running bigger corpora of
> software (for instance, the DRAM controller vendor's provided
> initialisation code) which would likely be in C.  But in this case we could
> just use a soft-core today (the CPU ISA is most irrelevant for this
> application, so a RISC-V/Microblaze/NIOS would be fine).
>
> [1] https://inf.ethz.ch/personal/troscoe/pubs/hotos15-gerber.pdf

The point of the many hard cores is the saving of resources.  Soft cores
would be the most wasteful way to implement logic.  If the application is
large enough they can implement things in software that aren't as practical
in HDL, but that would be a different class of logic from the tiny CPUs I'm
talking about.


> I can also see another niche, at the extreme bottom end, where a CPLD might
> have one of your processors plus a few hundred logic cells.  That's
> essentially a microcontroller with FPGA, or an FPGA with microcontroller -
> which some of the vendors already produce (although possibly not
> small/cheap/low power enough).  Here I can't see the advantages of using a
> stack-based CPU versus paying a bit more to program in C.  Although I don't
> have experience in markets where the retail price of the product is $1, and
> so every $0.001 matters.
>
> > > I would be interested to know what applications might use heterogeneous
> > > many-cores and what performance is achievable.
> >
> > Yes, clearly not getting the concept.  Asking about heterogeneous
> > performance is totally antithetical to this idea.
>
> You keep mentioning 700 MIPS, which suggests performance is important.  If
> these are simple state machine replacements, why do we care about
> performance?

You lost me with the gear shift.  The mention of instruction rate is about
the CPU being fast enough to keep up with FPGA logic.  The issue with
"heterogeneous performance" is the "heterogeneous" part, lumping the many
CPUs together to create some sort of number cruncher.  That's not what this
is about.  Like in the GA144, I fully expect most CPUs to be sitting around
most of the time idling, waiting for data.  This is a good thing actually.
These CPUs could consume significant current if they run at GHz all the
time.  I believe in the GA144 at that slower rate each processor can use
around 2.5 mA.  Not sure if a smaller process would use more or less power
when running flat out.  It's been too many years since I worked with those
sorts of numbers.


> In essence, your proposal has a disconnect between the situations existing
> FPGA blocks are used (implemented automatically by P&R tools) and the
> situations software is currently used (human-driven software and
> architectural design).  It's unclear how you claim to bridge this gap.

I don't usually think of designing in those terms.  If I want to design
something, I design it.  I ignore many tools, only using the ones I find
useful.  In this case I would have no problem writing code for the processor
and, if needed, rolling into the FPGA simulation a model of the processor to
run the code.  In a professional implementation I would expect these models
to be written for me in modules that run much faster than HDL so the
simulation speed is not impacted.

I certainly don't see how P&R tools would be a problem.  They accommodate
multipliers, DSP blocks, memory blocks and many, many special bits of
assorted components inside the FPGAs which vary from vendor to vendor.
Clock generators and distribution are pretty unique to each manufacturer.
Lattice has all sorts of modules to offer like I2C and embedded Flash.  Then
there are entire CPUs embedded in FPGAs.  Why would supporting them be so
different from what I am talking about?

Rick C.

Article: 161259
Subject: Re: Tiny CPUs for Slow Logic
From: David Brown <david.brown@hesbynett.no>
Date: Wed, 20 Mar 2019 16:30:11 +0100
On 20/03/2019 15:50, gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown wrote:
>> On 20/03/2019 03:30, gnuarm.deletethisbit@gmail.com wrote:
>>> On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos 
>>> wrote:
>>>> Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
>>>>> Understand XMOS's xCORE processors and xC language, see how
>>>>> they complement and support each other. I found the net
>>>>> result stunningly easy to get working first time, without
>>>>> having to continually read obscure errata!
>>>> 
>>>> I can see the merits of the XMOS approach.  But I'm unclear
>>>> how this relates to the OP's proposal, which (I think) is
>>>> having tiny CPUs as hard logic blocks on an FPGA, like DSP
>>>> blocks.
>>>> 
>>>> I completely understand the problem of running out of hardware 
>>>> threads, so a means of 'just add another one' is handy.  But
>>>> the issue is how to combine such things with other synthesised
>>>> logic.
>>>> 
>>>> The XMOS approach is fine when the hardware is uniform and the 
>>>> software sits on top, but when the hardware is synthesised and
>>>> the 'CPUs' sit as pieces in a fabric containing random logic
>>>> (as I think the OP is suggesting) it becomes a lot harder to
>>>> reason about what the system is doing and what the software
>>>> running on such heterogeneous cores should look like.  Only the
>>>> FPGA tools have a full view of what the system looks like, and
>>>> it seems stretching them to have them also generate software to
>>>> run on these cores.
>>> 
>>> When people talk about things like "software running on such 
>>> heterogeneous cores" it makes me think they don't really
>>> understand how this could be used.  If you treat these small
>>> cores like logic elements, you don't have such lofty descriptions
>>> of "system software" since the software isn't created out of some
>>> global software package. Each core is designed to do a specific
>>> job just like any other piece of hardware and it has discrete
>>> inputs and outputs just like any other piece of hardware.  If the
>>> hardware clock is not too fast, the software can synchronize with
>>> and literally function like hardware, but implementing more
>>> complex logic than the same area of FPGA fabric might.
>>> 
>> 
>> That is software.
>> 
>> If you want to try to get cycle-precise control of the software and
>> use that precision for direct hardware interfacing, you are almost
>> certainly going to have a poor, inefficient and difficult design.
>> It doesn't matter if you say "think of it like logic" - it is /not/
>> logic, it is software, and you don't use that for cycle-precise
>> control.  You use when you need flexibility, calculations, and
>> decisions.
> 
> I suppose you can make anything difficult if you try hard enough.
> 

Equally, you can make anything sound simple if you are vague enough and
wave your hands around.

> The point is you don't have to make it difficult by talking about
> "software running on such heterogeneous cores".  Just talk about it
> being a small hunk of software that is doing a specific job.  Then
> the mystery is gone and the task can be made as easy as the task is.
> 

I did not use the phrase "software running on such heterogeneous cores"
- and I am not trying to make anything difficult.  You are making cpu
cores.  They run software.  Saying they are "like logic elements" or
"they connect directly to hardware" does not make it so - and it does
not mean that what they run is not software.

> 
> In VHDL this would be a process().  VHDL programs are typically chock
> full of processes and no one wrings their hands worrying about how
> they will design the "software running on such heterogeneous cores".
> 
> 
> BTW, VHDL is software too.

I agree that VHDL is software.  And yes, there are usually processes in
VHDL designs.

I am not /worrying/ about these devices running software - I am simply
saying that they /will/ be running software.  I can't comprehend why you
want to deny that.  It seems that you are frightened of software or
programmers, and want to call it anything /but/ software.

If the software a core is running is simple enough to be described in
VHDL, then it should be a VHDL process - not software in a cpu core.  If
it is too complex for that, it is going to have to be programmed
separately in an appropriate language.  That is not necessarily harder
or easier than VHDL design - it is just different.

If you try to force the software to be synchronous with timing on the
hardware, /then/ you are going to be in big difficulties.  So don't do
that - use hardware for the tightest timing, and software for the bits
that software is good for.


> 
>>> There is no need to think about how the CPUs would communicate
>>> unless there is a specific need for them to do so.  The F18A uses
>>> a handshaked parallel port in their design.  They seem to have
>>> done a pretty slick job of it and can actually hang the processor
>>> waiting for the acknowledgement saving power and getting an
>>> instantaneous wake up following the handshake.  This can be used
>>> with other CPUs or
>>> 
>> 
>> Fair enough.
> 
> Ok, that's a start.
> 

I'd expect that the sensible way to pass data between these, if you need
to do so much, is using FIFO's.
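
A minimal sketch of what the fabric side of such a coupling might look like
(the FIFO port names and data signals here are invented for the example):

    push : process(clk)
    begin
        if rising_edge(clk) then
            fifo_wr_en <= '0';                        -- default: no write
            if sample_valid = '1' and fifo_full = '0' then
                fifo_din   <= sample;                 -- hand one word to the core
                fifo_wr_en <= '1';
            end if;
        end if;
    end process;

The core on the far side then drains the FIFO at whatever rate its software
happens to run, which is exactly the decoupling a FIFO buys you.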


Article: 161260
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 08:50:41 -0700 (PDT)
On Wednesday, March 20, 2019 at 6:56:51 AM UTC-4, already...@yahoo.com wrot=
e:
> On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
> > On 19/03/19 17:35, already5chosen@yahoo.com wrote:
> > > On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
> > >> The "granularity" of the computation and communication will be a key
> > >> to understanding what the OP is thinking.
> > >
> > > I don't know what Rick had in mind. I personally would go for one
> > > "hard-CPU" block per 4000-5000 6-input logic elements (i.e. Altera ALMs
> > > or Xilinx CLBs). Each block could be configured either as one 64-bit
> > > core or pair of 32-bit cores. The block would contain hard instruction
> > > decoders/ALUs/shifters and hard register files. It can optionally
> > > borrow adjacent DSP blocks for multipliers. Adjacent embedded memory
> > > blocks can be used for data memory. Code memory should be a bit more
> > > flexible, giving to designer a choice between embedded memory blocks or
> > > distributed memory (X)/MLABs(A).
> >
> > It would be interesting to find an application level
> > description (i.e. language constructs) that
> >   - could be automatically mapped onto those primitives
> >     by a toolset
> >   - was useful for more than a niche subset of applications
> >   - was significantly better than existing tools
> >
> > I wouldn't hold my breath :)
>
>
> I think, you are looking at it from wrong angle.
> One doesn't really need new tools to design and simulate such things.
> What's needed is a combination of existing tools - compilers, assemblers,
> probably software simulator plug-ins into existing HDL simulators, but the
> latter is just luxury for speeding up simulations; in principle, feeding
> HDL simulator with RTL model of the CPU core will work too.

I agree, but I think it will be very useful to have a proper model of the
CPUs for faster simulations.  If it were one CPU it's different.  But using
100 CPUs would very likely make simulation a real chore without a fast
model.


> As to niches, all "hard" blocks that we currently have in FPGAs are about
> niches. It's extremely rare that user's design uses all or majority of the
> features of given FPGA device and need LUTs, embedded memories, PLLs,
> multiplies, SERDESs, DDR DRAM I/O blocks etc in exact amounts appearing in
> the device.

This is exactly the reason why FPGA companies resisted even incorporating
block RAM initially.  I recall conversations with Xilinx representatives
about these issues here.  It was indicated that the cost of the added
silicon was significant and they would be "seldom" used.  Now many people
would not buy an FPGA without multipliers and/or DSP blocks.  This is really
just another step in the same direction.


> It still makes sense, economically, to have them all built in, because
> masks and other NREs are mighty expensive while silicon itself is
> relatively cheap. Multiple small hard CPU cores are really not very
> different from features, mentioned above.

I don't know the details of costs for FPGAs.  What I do know is that the
CPUs I am talking about would use the silicon area of a rather few logic
blocks.  The reference design I use is in a 180 nm process and is an eighth
of a square mm.  With an 18 nm process the die area would be 1,260 sq um.
That's not very big.  100 of them would occupy 0.126 sq mm.  If they have
much use, that's a pretty small die area.  For comparison, an XC7A200T has
a die area of about 132 sq mm and 33,000 slices, for an area of 3,923 sq um
per slice.  Of course this is loaded with overhead which is likely more
than half the area, but it gives you some perspective about the cost of
adding these CPUs... very, very little, around the die area of a single
slice.  It also gives you an idea of how large the FPGA logic functions
have grown.

Rick C.

Article: 161261
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Wed, 20 Mar 2019 15:51:18 +0000
On 20/03/19 14:51, already5chosen@yahoo.com wrote:
> On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
>> On 20/03/19 14:11, already5chosen@yahoo.com wrote:
>>> On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
>>>>
>>>> But more difficult that creating such a toolset is defining an application
>>>> level description that a toolset can munge.
>>>>
>>>> So, define (initially by example, later more formally) inputs to the
>>>> toolset and outputs from it. Then we can judge whether the concepts are
>>>> more than handwaving wishes.
>>>>
>>>
>>> I don't understand what you are asking for.
>>
>> Go back and read the parts of my post that you chose to snip.
>>
>> Give a handwaving indication of the concepts that avoid the
>> conceptual problems that I mentioned.
> 
> Frankly, it starts to sound like you never used soft CPU cores in your designs.
> So, for somebody like myself, who uses them routinely for different tasks since 2006, you are really not easy to understand.

Professionally, since 1978 I've done everything from low noise
analogue electronics, many hardware-software systems using
all sorts of technologies, networking at all levels of the
protocol stack, "up" to high availability distributed soft
real-time systems.

And almost all of that has been on the bleeding edge.

So, yes, I do have more than a passing acquaintance with
the characteristics of many hardware and software technologies,
and where partitions between them can, should and should not
be drawn.


> Concept? Concepts are good for new things, not for something that is a variation of something old and routine and obviously working.

Whatever is being proposed, is it old or new?

If old then the OP needs enlightenment and concrete
examples can easily be noted.

If new, then provide the concepts.


>> Or better still, get the OP to do it.
>>
> 
> With that part I agree.

Article: 161262
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Wed, 20 Mar 2019 15:55:58 +0000
On 20/03/19 15:30, David Brown wrote:
> If the software a core is running is simple enough to be described in
> VHDL, then it should be a VHDL process - not software in a cpu core.  If
> it is too complex for that, it is going to have to be programmed
> separately in an appropriate language.  That is not necessarily harder
> or easier than VHDL design - it is just different.

Precisely.


> If you try to force the software to be synchronous with timing on the
> hardware, /then/ you are going to be in big difficulties.  So don't do
> that - use hardware for the tightest timing, and software for the bits
> that software is good for.

Precisely.


>>>> There is no need to think about how the CPUs would communicate
>>>> unless there is a specific need for them to do so.  The F18A uses
>>>> a handshaked parallel port in their design.  They seem to have
>>>> done a pretty slick job of it and can actually hang the processor
>>>> waiting for the acknowledgement saving power and getting an
>>>> instantaneous wake up following the handshake.  This can be used
>>>> with other CPUs or
>>>>
>>>
>>> Fair enough.
>>
>> Ok, that's a start.
>>
> 
> I'd expect that the sensible way to pass data between these, if you need
> to do so much, is using FIFO's.

And that raises the question of the "comms protocols" or
"programming model" between each side, e.g. rendezvous,
FIFO depth, blocking, non-blocking, timeouts, etc.



Article: 161263
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 09:30:51 -0700 (PDT)
On Wednesday, March 20, 2019 at 11:30:15 AM UTC-4, David Brown wrote:
> On 20/03/2019 15:50, gnuarm.deletethisbit@gmail.com wrote:
> > On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown wrote:
> >> On 20/03/2019 03:30, gnuarm.deletethisbit@gmail.com wrote:
> >>> On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos
> >>> wrote:
> >>>> Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
> >>>>> Understand XMOS's xCORE processors and xC language, see how
> >>>>> they complement and support each other. I found the net
> >>>>> result stunningly easy to get working first time, without
> >>>>> having to continually read obscure errata!
> >>>>
> >>>> I can see the merits of the XMOS approach.  But I'm unclear
> >>>> how this relates to the OP's proposal, which (I think) is
> >>>> having tiny CPUs as hard logic blocks on an FPGA, like DSP
> >>>> blocks.
> >>>>
> >>>> I completely understand the problem of running out of hardware
> >>>> threads, so a means of 'just add another one' is handy.  But
> >>>> the issue is how to combine such things with other synthesised
> >>>> logic.
> >>>>
> >>>> The XMOS approach is fine when the hardware is uniform and the
> >>>> software sits on top, but when the hardware is synthesised and
> >>>> the 'CPUs' sit as pieces in a fabric containing random logic
> >>>> (as I think the OP is suggesting) it becomes a lot harder to
> >>>> reason about what the system is doing and what the software
> >>>> running on such heterogeneous cores should look like.  Only the
> >>>> FPGA tools have a full view of what the system looks like, and
> >>>> it seems stretching them to have them also generate software to
> >>>> run on these cores.
> >>>
> >>> When people talk about things like "software running on such
> >>> heterogeneous cores" it makes me think they don't really
> >>> understand how this could be used.  If you treat these small
> >>> cores like logic elements, you don't have such lofty descriptions
> >>> of "system software" since the software isn't created out of some
> >>> global software package. Each core is designed to do a specific
> >>> job just like any other piece of hardware and it has discrete
> >>> inputs and outputs just like any other piece of hardware.  If the
> >>> hardware clock is not too fast, the software can synchronize with
> >>> and literally function like hardware, but implementing more
> >>> complex logic than the same area of FPGA fabric might.
> >>>
> >>
> >> That is software.
> >>
> >> If you want to try to get cycle-precise control of the software and
> >> use that precision for direct hardware interfacing, you are almost
> >> certainly going to have a poor, inefficient and difficult design.
> >> It doesn't matter if you say "think of it like logic" - it is /not/
> >> logic, it is software, and you don't use that for cycle-precise
> >> control.  You use when you need flexibility, calculations, and
> >> decisions.
> >
> > I suppose you can make anything difficult if you try hard enough.
> >
>
> Equally, you can make anything sound simple if you are vague enough and
> wave your hands around.

Not trying to make it sound "simple".  Just saying it can be useful and not
the same as designing a chip with many CPUs for the purpose of providing
lots of MIPS to crunch numbers.  Those ideas and methods don't apply here.


> > The point is you don't have to make it difficult by talking about
> > "software running on such heterogeneous cores".  Just talk about it
> > being a small hunk of software that is doing a specific job.  Then
> > the mystery is gone and the task can be made as easy as the task is.
> >
>
> I did not use the phrase "software running on such heterogeneous cores"
> - and I am not trying to make anything difficult.  You are making cpu
> cores.  They run software.  Saying they are "like logic elements" or
> "they connect directly to hardware" does not make it so - and it does
> not mean that what they run is not software.

You don't need to complicate the design by applying all the limitations of
multi-processing when this is NOT at all the same.  I call them logic
elements because that is the intent, for them to implement logic.  Yes, it
is software, but that in itself creates no problems I am aware of.

As to the connection, I really don't get your point.  They either connect
directly to the hardware because that's how they are designed, or they
don't... because that's how they are designed.  I don't know what you are
saying about that.


> > In VHDL this would be a process().  VHDL programs are typically chock
> > full of processes and no one wrings their hands worrying about how
> > they will design the "software running on such heterogeneous cores".
> >
> >
> > BTW, VHDL is software too.
>
> I agree that VHDL is software.  And yes, there are usually processes in
> VHDL designs.
>
> I am not /worrying/ about these devices running software - I am simply
> saying that they /will/ be running software.  I can't comprehend why you
> want to deny that.

Enough!  The CPUs run software.  Now, what is YOUR point?


> It seems that you are frightened of software or
> programmers, and want to call it anything /but/ software.
>
> If the software a core is running is simple enough to be described in
> VHDL, then it should be a VHDL process - not software in a cpu core.

Ok, now you have crossed into a philosophical domain.  If you want to think
in these terms I won't dissuade you, but it has no meaning in digital design
and I won't discuss it further.


> If
> it is too complex for that, it is going to have to be programmed
> separately in an appropriate language.  That is not necessarily harder
> or easier than VHDL design - it is just different.

Ok, so what?


> If you try to force the software to be synchronous with timing on the
> hardware, /then/ you are going to be in big difficulties.  So don't do
> that - use hardware for the tightest timing, and software for the bits
> that software is good for.

LOL!  You are thinking in terms that are very obsolete.  Read about how the
F18A synchronizes with other processors and you will find that this is an
excellent way to interface to the hardware as well.  Just like logic, when
the CPU handshakes with a logic clock, it only has to meet the timing of a
clock cycle, just like all the logic in the same design.  In a VHDL process
the steps are written out in sequence and not assumed to be running in
parallel, just like software.  When the process reaches a point of
synchronization it will halt, just like logic.
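
In RTL terms that stall-until-acknowledged behaviour is just an ordinary
two-signal handshake; a sketch with invented state and signal names ("state"
is a two-value enumerated type, IDLE and WAIT_ACK, declared elsewhere):

    handshake : process(clk)
    begin
        if rising_edge(clk) then
            case state is
                when IDLE =>
                    if have_data = '1' then
                        req   <= '1';        -- offer data to the neighbour
                        state <= WAIT_ACK;
                    end if;
                when WAIT_ACK =>
                    if ack = '1' then        -- "halted" here until acknowledged
                        req   <= '0';
                        state <= IDLE;
                    end if;
            end case;
        end if;
    end process;

An F18A node waiting on a port is doing the same thing, except that what
halts is the program counter itself.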


> >>> There is no need to think about how the CPUs would communicate
> >>> unless there is a specific need for them to do so.  The F18A uses
> >>> a handshaked parallel port in their design.  They seem to have
> >>> done a pretty slick job of it and can actually hang the processor
> >>> waiting for the acknowledgement saving power and getting an
> >>> instantaneous wake up following the handshake.  This can be used
> >>> with other CPUs or
> >>>
> >>
> >> Fair enough.
> >
> > Ok, that's a start.
> >
>
> I'd expect that the sensible way to pass data between these, if you need
> to do so much, is using FIFO's.

Between what exactly???  You are designing a system that is not before you.
More importantly you don't actually know anything about the ideas used in
the F18A and GA144 designs.

I'm not trying to be rude, but you should learn more about them before you
assume they need to work like every other processor you've ever used.  The
F18A and GA144 really only have two particularly unique ideas.  One is that
the processor is very, very small and as a consequence, fast.  The other is
the communications technique.

Charles Moore is a unique thinker and he realized that with the advance of
processing technology CPUs could be made very small and so become MIPS
fodder.  By that I mean you no longer need to focus on utilizing all the
MIPS in a CPU.  Instead, they can be treated as disposable and only a tiny
fraction of the available MIPS used to implement some function... usefully.

While the GA144 is a commercial failure for many reasons, it does illustrate
some very innovative ideas and is what prompted me to consider what happens
when you can scatter CPUs around an FPGA as if they were logic blocks.

No, I don't have a fully developed "business plan".  I am just interested in
exploring the idea.  Moore's (Green Array's actually, CM isn't actively
working with them at this point I believe) chip isn't very practical because
Moore isn't terribly interested in being practical exactly.  But that isn't
to say it doesn't embody some very interesting ideas.

Rick C.

Article: 161264
Subject: Re: Tiny CPUs for Slow Logic
From: already5chosen@yahoo.com
Date: Wed, 20 Mar 2019 09:32:23 -0700 (PDT)
On Wednesday, March 20, 2019 at 5:51:21 PM UTC+2, Tom Gardner wrote:
> On 20/03/19 14:51, already5chosen@yahoo.com wrote:
> > On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
> >> On 20/03/19 14:11, already5chosen@yahoo.com wrote:
> >>> On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
> >>>>
> >>>> But more difficult that creating such a toolset is defining an
> >>>> application level description that a toolset can munge.
> >>>>
> >>>> So, define (initially by example, later more formally) inputs to the
> >>>> toolset and outputs from it. Then we can judge whether the concepts
> >>>> are more than handwaving wishes.
> >>>>
> >>>
> >>> I don't understand what you are asking for.
> >>
> >> Go back and read the parts of my post that you chose to snip.
> >>
> >> Give a handwaving indication of the concepts that avoid the
> >> conceptual problems that I mentioned.
> >
> > Frankly, it starts to sound like you never used soft CPU cores in your
> > designs.  So, for somebody like myself, who uses them routinely for
> > different tasks since 2006, you are really not easy to understand.
>
> Professionally, since 1978 I've done everything from low noise
> analogue electronics, many hardware-software systems using
> all sorts of technologies, networking at all levels of the
> protocol stack, "up" to high availability distributed soft
> real-time systems.
>
> And almost all of that has been on the bleeding edge.
>
> So, yes, I do have more than a passing acquaintance with
> the characteristics of many hardware and software technologies,
> and where partitions between them can, should and should not
> be drawn.
>

Is it a sort of admission that you indeed never designed with soft cores?

>
> > Concept? Concepts are good for new things, not for something that is a
> > variation of something old and routine and obviously working.
>
> Whatever is being proposed, is it old or new?
>
> If old then the OP needs enlightenment and concrete
> examples can easily be noted.
>
> If new, then provide the concepts.
>

It is a new variation of an old concept.
A cross between the PPCs in the ancient Virtex-II Pro and the soft cores
used virtually everywhere in more modern times.
Probably best characterized by what it is not like: it is not like Xilinx
Zynq or Altera Cyclone5-HPS.

The "new" part comes more from the new economics of sub-20nm processes than
from the abstractions that you try to draft into it. NRE is more and more
expensive, gates are more and more cheap (well, the cost of gates started
to stagnate in the last couple of years, but that does not matter; what
matters is that at something like TSMC 12nm gates are already quite cheap).
So, adding multiple small hard CPU cores that could be used as a replacement
for the multiple soft CPU cores that people already use today now starts to
make sense. Maybe it's not a really good proposition, but at these silicon
geometries it can't be written off as an obviously stupid proposition.

It appears that I don't agree with Rick about "how small is small" and,
respectively, about how many of them should be placed on the die, but we
probably agree about the percentage of the FPGA area that intuitively seems
worth allocating to such a feature - more than 1% but less than 5%.
Also, he appears to like stack-based ISAs while I lean toward a more
conventional 32-bit or 32/64-bit RISC, or maybe even toward a modern CISC
akin to Renesas RX, but those are relatively minor details.


>
> >> Or better still, get the OP to do it.
> >>
> >
> > With that part I agree.


Article: 161265
Subject: Re: Tiny CPUs for Slow Logic
From: David Brown <david.brown@hesbynett.no>
Date: Wed, 20 Mar 2019 22:38:11 +0100
On 20/03/2019 17:30, gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 11:30:15 AM UTC-4, David Brown
> wrote:
>> On 20/03/2019 15:50, gnuarm.deletethisbit@gmail.com wrote:
>>> On Wednesday, March 20, 2019 at 6:14:21 AM UTC-4, David Brown
>>> wrote:
>>>> On 20/03/2019 03:30, gnuarm.deletethisbit@gmail.com wrote:
>>>>> On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo
>>>>> Markettos wrote:
>>>>>> Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
>>>>>>> Understand XMOS's xCORE processors and xC language, see
>>>>>>> how they complement and support each other. I found the
>>>>>>> net result stunningly easy to get working first time,
>>>>>>> without having to continually read obscure errata!
>>>>>> 
>>>>>> I can see the merits of the XMOS approach.  But I'm
>>>>>> unclear how this relates to the OP's proposal, which (I
>>>>>> think) is having tiny CPUs as hard logic blocks on an FPGA,
>>>>>> like DSP blocks.
>>>>>> 
>>>>>> I completely understand the problem of running out of
>>>>>> hardware threads, so a means of 'just add another one' is
>>>>>> handy.  But the issue is how to combine such things with
>>>>>> other synthesised logic.
>>>>>> 
>>>>>> The XMOS approach is fine when the hardware is uniform and
>>>>>> the software sits on top, but when the hardware is
>>>>>> synthesised and the 'CPUs' sit as pieces in a fabric
>>>>>> containing random logic (as I think the OP is suggesting)
>>>>>> it becomes a lot harder to reason about what the system is
>>>>>> doing and what the software running on such heterogeneous
>>>>>> cores should look like.  Only the FPGA tools have a full
>>>>>> view of what the system looks like, and it seems stretching
>>>>>> them to have them also generate software to run on these
>>>>>> cores.
>>>>> 
>>>>> When people talk about things like "software running on such 
>>>>> heterogeneous cores" it makes me think they don't really 
>>>>> understand how this could be used.  If you treat these small 
>>>>> cores like logic elements, you don't have such lofty
>>>>> descriptions of "system software" since the software isn't
>>>>> created out of some global software package. Each core is
>>>>> designed to do a specific job just like any other piece of
>>>>> hardware and it has discrete inputs and outputs just like any
>>>>> other piece of hardware.  If the hardware clock is not too
>>>>> fast, the software can synchronize with and literally
>>>>> function like hardware, but implementing more complex logic
>>>>> than the same area of FPGA fabric might.
>>>>> 
>>>> 
>>>> That is software.
>>>> 
>>>> If you want to try to get cycle-precise control of the software
>>>> and use that precision for direct hardware interfacing, you are
>>>> almost certainly going to have a poor, inefficient and
>>>> difficult design. It doesn't matter if you say "think of it
>>>> like logic" - it is /not/ logic, it is software, and you don't
>>>> use that for cycle-precise control.  You use when you need
>>>> flexibility, calculations, and decisions.
>>> 
>>> I suppose you can make anything difficult if you try hard
>>> enough.
>>> 
>> 
>> Equally, you can make anything sound simple if you are vague enough
>> and wave your hands around.
> 
> Not trying to make it sound "simple".  Just saying it can be useful
> and not the same as designing a chip with many CPUs for the purpose
> of providing lots of MIPS to crunch numbers.  Those ideas and methods
> don't apply here.

Fair enough.  I have not suggested it was like using lots of CPUs for 
number crunching.  (That is not what I would think the GA144 is good for 
either.)

> 
> 
>>> The point is you don't have to make it difficult by talking
>>> about "software running on such heterogeneous cores".  Just talk
>>> about it being a small hunk of software that is doing a specific
>>> job.  Then the mystery is gone and the task can be made as easy
>>> as the task is.
>>> 
>> 
>> I did not use the phrase "software running on such heterogeneous
>> cores" - and I am not trying to make anything difficult.  You are
>> making cpu cores.  They run software.  Saying they are "like logic
>> elements" or "they connect directly to hardware" does not make it
>> so - and it does not mean that what they run is not software.
> 
> You don't need to complicate the design by applying all the
> limitations of multi-processing when this is NOT at all the same.  I
> call them logic elements because that is the intent, for them to
> implement logic.  Yes, it is software, but that in itself creates no
> problems I am aware of.
> 

I agree that software should not in itself create a problem.  Trying to 
think of them as "logic" /would/ create problems.  Think of them as 
software, and program them as software.  I expect you'd think of them as 
entirely independent units with independent programs, rather than as a 
multi-cpu or heterogeneous system.

> As to the connection, I really don't get your point.  They either
> connect directly to the hardware because that's how they are
> designed, or they don't... because that's how they are designed.  I
> don't know what you are saying about that.
> 

"Synchronise directly with hardware" might be a better phrase.

> 
>>> In VHDL this would be a process().  VHDL programs are typically
>>> chock full of processes and no one wrings their hands worrying
>>> about how they will design the "software running on such
>>> heterogeneous cores".
>>> 
>>> 
>>> BTW, VHDL is software too.
>> 
>> I agree that VHDL is software.  And yes, there are usually
>> processes in VHDL designs.
>> 
>> I am not /worrying/ about these devices running software - I am
>> simply saying that they /will/ be running software.  I can't
>> comprehend why you want to deny that.
> 
> Enough!  The CPUs run software.  Now, what is YOUR point?
> 

My point was that these are not logic, they are not logic elements (even 
if they could be physically small and cheap and scattered around a chip 
like logic elements).  Thinking about them as "sequential logic 
elements" is not helpful.  Think of them as small processors running 
simple and limited /software/.  Unless you can find a way to 
automatically generate code for them, then they will be programmed using 
a /software/ programming language, not a logic or hardware programming 
language.  If you are happy to accept that now, then great - we can move on.

> 
>> It seems that you are frightened of software or programmers, and
>> want to call it anything /but/ software.
>> 
>> If the software a core is running is simple enough to be described
>> in VHDL, then it should be a VHDL process - not software in a cpu
>> core.
> 
> Ok, now you have crossed into a philosophical domain.  If you want to
> think in these terms I won't dissuade you, but it has no meaning in
> digital design and I won't discuss it further.
> 
> 
>> If it is too complex for that, it is going to have to be
>> programmed separately in an appropriate language.  That is not
>> necessarily harder or easier than VHDL design - it is just
>> different.
> 
> Ok, so what?
> 
> 
>> If you try to force the software to be synchronous with timing on
>> the hardware, /then/ you are going to be in big difficulties.  So
>> don't do that - use hardware for the tightest timing, and software
>> for the bits that software is good for.
> 
> LOL!  You are thinking in terms that are very obsolete.  Read about
> how the F18A synchronizes with other processors and you will find
> that this is an excellent way to interface to the hardware as well.
> Just like logic, when the CPU hand shakes with a logic clock, it only
> has to meet the timing of a clock cycle, just like all the logic in
> the same design.

That is not using software for synchronising with hardware (or other 
cpus) - it is using hardware.

When a processor's software has a loop waiting for an input signal to go 
low, then it reads a byte input, then it waits for the first signal to 
go high again - that is using software for synchronisation.  That's okay 
for slow interfacing.  When it waits for one signal, then uses three 
NOP's before setting another signal to get the timing right, that is 
using software for accurate timing - a very fragile solution.
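
To make the contrast concrete, here is a minimal C sketch of that kind of
software-driven synchronisation.  The register names and addresses
(STROBE_REG, DATA_REG, OUT_REG) are made up purely for illustration, and the
NOP padding uses gcc-style inline assembly - it is a sketch of the idea, not
code for any particular device:

#include <stdint.h>

/* Hypothetical memory-mapped I/O registers - illustrative only. */
#define STROBE_REG (*(volatile uint8_t *)0x4000u)
#define DATA_REG   (*(volatile uint8_t *)0x4001u)
#define OUT_REG    (*(volatile uint8_t *)0x4002u)

/* Software-only synchronisation: poll the strobe, grab the byte, then
 * wait for the strobe to return high.  Fine for slow interfaces. */
static uint8_t read_byte_polled(void)
{
    while (STROBE_REG & 1u)        /* wait for the strobe to go low  */
        ;
    uint8_t data = DATA_REG;       /* sample the data byte           */
    while (!(STROBE_REG & 1u))     /* wait for the strobe to go high */
        ;
    return data;
}

/* Software-only timing: pad with NOPs to hit a pulse width.  Any change
 * of compiler, optimisation level or clock rate can break this. */
static void fragile_pulse(void)
{
    OUT_REG = 1u;
    __asm__ volatile ("nop; nop; nop");
    OUT_REG = 0u;
}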

When it is reading from a register that is latched by an external enable 
signal, it is using hardware for the interfacing and synchronisation. 
When the cpu has signals that can pause its execution at the right steps 
in handshaking, it is using hardware synchronisation.  That is, of 
course, absolutely fine - that is using the right tools for the right jobs.


>  In a VHDL process the steps are written out in
> sequence and not assumed to be running in parallel, just like
> software.  When the process reaches a point of synchronization it
> will halt, just like logic.
> 

You use VHDL processes for cycle-precise, simple sequences.  You use 
software on a processor for less precise, complex sequences.

> 
>>>>> There is no need to think about how the CPUs would
>>>>> communicate unless there is a specific need for them to do
>>>>> so.  The F18A uses a handshaked parallel port in their
>>>>> design.  They seem to have done a pretty slick job of it and
>>>>> can actually hang the processor waiting for the
>>>>> acknowledgement saving power and getting an instantaneous
>>>>> wake up following the handshake.  This can be used with other
>>>>> CPUs or
>>>>> 
>>>> 
>>>> Fair enough.
>>> 
>>> Ok, that's a start.
>>> 
>> 
>> I'd expect that the sensible way to pass data between these, if you
>> need to do so much, is using FIFO's.
> 
> Between what exactly???  You are designing a system that is not
> before you.  More importantly you don't actually know anything about
> the ideas used in the F18A and GA144 designs.

Between whatever you want as you pass data around your chip.

> 
> I'm not trying to be rude, but you should learn more about them
> before you assume they need to work like every other processor you've
> ever used.  The F18A and GA144 really only have two particularly
> unique ideas.  One is that the processor is very, very small and as a
> consequence, fast.  The other is the communications technique.

Communication between the nodes is with a synchronising port.  A write 
to the port blocks until the receiving node does a read - similarly, a 
read blocks until the sending node does a write.  Hardware 
synchronisation, not software, and not entirely unlike an absolutely 
minimal blocking FIFO.  It is an interesting idea, though somewhat limiting.
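
As a rough model of those semantics (C used only for illustration; this is a
sketch of the idea, ignoring memory-ordering details, and not how the GA144
actually implements its ports):

#include <stdbool.h>
#include <stdint.h>

/* One-deep "port": the writer blocks until the previous word has been
 * consumed, and the reader blocks until a word is available. */
struct port {
    volatile uint32_t data;
    volatile bool     full;   /* set by the writer, cleared by the reader */
};

static void port_write(struct port *p, uint32_t word)
{
    while (p->full)            /* block until the receiver has read    */
        ;
    p->data = word;
    p->full = true;
}

static uint32_t port_read(struct port *p)
{
    while (!p->full)           /* block until the sender has written   */
        ;
    uint32_t word = p->data;
    p->full = false;
    return word;
}

The difference, of course, is that in the GA144 a blocked node simply stops
and saves power, which a software spin-loop cannot do.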

> 
> Charles Moore is a unique thinker and he realized that with the
> advance of processing technology CPUs could be made very small and so
> become MIPS fodder.  By that I mean you no longer need to focus on
> utilizing all the MIPS in a CPU.  Instead, they can be treated as
> disposable and only a tiny fraction of the available MIPS used to
> implement some function... usefully.
> 
> While the GA144 is a commercial failure for many reasons, it does
> illustrate some very innovative ideas and is what prompted me to
> consider what happens when you can scatter CPUs around an FPGA as if
> they were logic blocks.

As I said before, it is a very interesting and impressive concept, with 
a lot of cool ideas - despite being a commercial failure.

I think one of the biggest reasons for its failure is that it is a 
technologically interesting solution with no matching problem - there is 
no killer app for it.  On top of that comes a significant learning curve 
and development challenge compared to alternative established solutions.

I want to know if that is going to happen with your ideas here.  Sure, 
you don't have a full business plan - but do you at least have thoughts 
about the kind of usage where these mini cpus would be a technologically 
superior choice compared to using state machines in VHDL (possibly 
generated with external programs), sequential logic generators (like C 
to HDL compilers, matlab tools, etc.), normal soft processors, or normal 
hard processors?

Give me a /reason/ to all this - rather than just saying you can make a 
simple stack-based cpu that's very small, so you could have lots of them 
on a chip.

> 
> No, I don't have a fully developed "business plan".  I am just
> interested in exploring the idea.  Moore's (Green Array's actually,
> CM isn't actively working with them at this point I believe) chip
> isn't very practical because Moore isn't terribly interested in being
> practical exactly.  But that isn't to say it doesn't embody some very
> interesting ideas.
> 
> Rick C.
> 


Article: 161266
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Wed, 20 Mar 2019 19:21:07 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:
>
> I agree that software should not in itself create a problem.  Trying to
> think of them as "logic" /would/ create problems.  Think of them as
> software, and program them as software.  I expect you'd think of them as
> entirely independent units with independent programs, rather than as a
> multi-cpu or heterogeneous system.

Ok, please tell me what those problems would be.  I have no idea what you
mean by what you say.  You are likely reading a lot into this that I am not
intending.


> > As to the connection, I really don't get your point.  They either
> > connect directly to the hardware because that's how they are
> > designed, or they don't... because that's how they are designed.  I
> > don't know what you are saying about that.
> >
>
> "Synchronise directly with hardware" might be a better phrase.

I don't know why and likely I'm not going to care.  I think you need to
learn more of how the F18A works.


> > Enough!  The CPUs run software.  Now, what is YOUR point?
> >
>
> My point was that these are not logic, they are not logic elements (even
> if they could be physically small and cheap and scattered around a chip
> like logic elements).  Thinking about them as "sequential logic
> elements" is not helpful.  Think of them as small processors running
> simple and limited /software/.  Unless you can find a way to
> automatically generate code for them, then they will be programmed using
> a /software/ programming language, not a logic or hardware programming
> language.  If you are happy to accept that now, then great - we can move on.

You have it backwards.  Please show me what you think the problems are.  I
don't care if they run software or have a Maxwell demon tossing bits about
as long as it does what I need.  You seem to get hung up on terminology so
easily.


> > LOL!  You are thinking in terms that are very obsolete.  Read about
> > how the F18A synchronizes with other processors and you will find
> > that this is an excellent way to interface to the hardware as well.
> > Just like logic, when the CPU hand shakes with a logic clock, it only
> > has to meet the timing of a clock cycle, just like all the logic in
> > the same design.
>
> That is not using software for synchronising with hardware (or other
> cpus) - it is using hardware.

So???  You are the one who keeps talking about software/hardware whatever.
I'm talking about the software being able to synchronize with the clock of
the other hardware.  When that happens there are tight timing constraints,
in the same sense as software sampling an ADC on a periodic basis and
having to process the resulting data before the next sample is ready.  The
only difference is that something like the F18A, running at a few GHz, can
do a lot in a 10 ns clock cycle.


> When a processor's software has a loop waiting for an input signal to go
> low, then it reads a byte input, then it waits for the first signal to
> go high again - that is using software for synchronisation.  That's okay
> for slow interfacing.  When it waits for one signal, then uses three
> NOP's before setting another signal to get the timing right, that is
> using software for accurate timing - a very fragile solution.

That is your construct because you know nothing of how the F18A works.  As
I've mentioned before, you would do well to read some of the app notes on
this device.  It really does have some good ideas to offer.


> When it is reading from a register that is latched by an external enable
> signal, it is using hardware for the interfacing and synchronisation.
> When the cpu has signals that can pause its execution at the right steps
> in handshaking, it is using hardware synchronisation.  That is, of
> course, absolutely fine - that is using the right tools for the right jobs.

Duh!


> >  In a VHDL process the steps are written out in
> > sequence and not assumed to be running in parallel, just like
> > software.  When the process reaches a point of synchronization it
> > will halt, just like logic.
> >
>
> You use VHDL processes for cycle-precise, simple sequences.  You use
> software on a processor for less precise, complex sequences.

You are making arbitrary distinctions.  The point is that if these CPUs are
available, they can be used to implement significant sections of logic in
less space on the die than in the FPGA fabric.


> Between whatever you want as you pass data around your chip.

FIFOs are used for specific purposes.  Not every interface needs them.  Your
suggestion that they should be used, without an understanding of why, is
pretty pointless.


> > I'm not trying to be rude, but you should learn more about them
> > before you assume they need to work like every other processor you've
> > ever used.  The F18A and GA144 really only have two particularly
> > unique ideas.  One is that the processor is very, very small and as a
> > consequence, fast.  The other is the communications technique.
>
> Communication between the nodes is with a synchronising port.  A write
> to the port blocks until the receiving node does a read - similarly, a
> read blocks until the sending node does a write.  Hardware
> synchronisation, not software, and not entirely unlike an absolutely
> minimal blocking FIFO.  It is an interesting idea, though somewhat limiting.

Oh, what are the limitations?  Also be aware that the blocking doesn't need
to work as you describe it.  Mostly the block would be on the read side: a
processor would block until the data it needs is available... or until a
clock signal transitions to indicate that the data it has calculated can be
output... just like the other LUT/FF logic blocks of an FPGA.


> > Charles Moore is a unique thinker and he realized that with the
> > advance of processing technology CPUs could be made very small and so
> > become MIPS fodder.  By that I mean you no longer need to focus on
> > utilizing all the MIPS in a CPU.  Instead, they can be treated as
> > disposable and only a tiny fraction of the available MIPS used to
> > implement some function... usefully.
> >
> > While the GA144 is a commercial failure for many reasons, it does
> > illustrate some very innovative ideas and is what prompted me to
> > consider what happens when you can scatter CPUs around an FPGA as if
> > they were logic blocks.
>
> As I said before, it is a very interesting and impressive concept, with
> a lot of cool ideas - despite being a commercial failure.
>
> I think one of the biggest reasons for its failure is that it is a
> technologically interesting solution with no matching problem - there is
> no killer app for it.  On top of that comes a significant learning curve
> and development challenge compared to alternative established solutions.

Saying there is no killer app is the result rather than the cause.  Yes, it
was designed out of the idea of "what happens when I interconnect a bunch
of these processors?" without considering a lot of real-world design needs.
The chip has limited RAM; more could have been included in some way, even
if not on each processor.  There is no Flash, which again could have been
included.  The I/Os are all 1.8 volts.  There was no real memory interface
provided; rather, a DRAM interface was emulated in firmware and actually
doesn't work, so one had to be written for static RAM, which is hard to
come by these days.  I don't recall the full list.

But this is not about the GA144.


> I want to know if that is going to happen with your ideas here.  Sure,
> you don't have a full business plan - but do you at least have thoughts
> about the kind of usage where these mini cpus would be a technologically
> superior choice compared to using state machines in VHDL (possibly
> generated with external programs), sequential logic generators (like C
> to HDL compilers, matlab tools, etc.), normal soft processors, or normal
> hard processors?

The point wasn't that I don't have a business plan.  The point was that I
haven't given this as much thought as would have been done if I were
working on a business plan.  I'm kicking around an idea.  I'm not in a
position to create FPGA with or without small CPUs.


> Give me a /reason/ to all this - rather than just saying you can make a
> simple stack-based cpu that's very small, so you could have lots of them
> on a chip.

Why?  Why don't you give ME a reason?  Why don't you switch your point of
view and figure out how this would be useful?  Neither of us have anything
to gain or lose.


> > No, I don't have a fully developed "business plan".  I am just
> > interested in exploring the idea.  Moore's (Green Array's actually,
> > CM isn't actively working with them at this point I believe) chip
> > isn't very practical because Moore isn't terribly interested in being
> > practical exactly.  But that isn't to say it doesn't embody some very
> > interesting ideas.

Rick C.

Article: 161267
Subject: Re: Tiny CPUs for Slow Logic
From: David Brown <david.brown@hesbynett.no>
Date: Thu, 21 Mar 2019 08:37:10 +0100
Links: << >>  << T >>  << A >>
On 21/03/2019 03:21, gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:

> 
>> I want to know if that is going to happen with your ideas here.
>> Sure, you don't have a full business plan - but do you at least
>> have thoughts about the kind of usage where these mini cpus would
>> be a technologically superior choice compared to using state
>> machines in VHDL (possibly generated with external programs),
>> sequential logic generators (like C to HDL compilers, matlab tools,
>> etc.), normal soft processors, or normal hard processors?
> 
> The point wasn't that I don't have a business plan.  The point was
> that I haven't given this as much thought as would have been done if
> I were working on a business plan.  I'm kicking around an idea.  I'm
> not in a position to create FPGA with or without small CPUs.
> 
> 
>> Give me a /reason/ to all this - rather than just saying you can
>> make a simple stack-based cpu that's very small, so you could have
>> lots of them on a chip.
> 
> Why?  Why don't you give ME a reason?  Why don't you switch your
> point of view and figure out how this would be useful?  Neither of us
> have anything to gain or lose.
> 

I don't have any good ideas of what these might be used for.  And I 
can't see how it ends up as /my/ responsibility to figure out why /your/ 
idea might be a good idea.

You presented an idea - having several small, simple cpus on a chip. 
It's taken a long time, and a lot of side-tracks, to drag out of you 
what you are really thinking about.  (Perhaps you didn't have a clear 
idea in your mind with your first post, and it has solidified along the way -
in which case, great, and I'm glad the thread has been successful there.)

I've been trying to help by trying to look at how these might be used, 
and how they compare to alternative existing solutions.  And I have been 
trying to get /you/ to come up with some ideas about when they might be 
useful.  All I'm getting is a lot of complaints, insults, condescension, 
patronisation.  You tell me I don't understand what these are for - yet 
you refuse to say what they are for (the nearest we have got in any post 
in this thread to evidence that there is any use-case, is you telling me 
you have ideas but refuse to tell me as I am not an FPGA designer by 
profession).  You are forever telling me about the wonders of the F18A 
and the GA144, and how I can't understand your ideas because I don't 
understand that device - while simultaneously telling me that device is 
irrelevant to your proposal.  You are asking for opinions and thoughts 
about how people would program these devices, then tell me I am wrong 
and closed-minded when I give you answers.

Hopefully, you have got /some/ ideas and thoughts out of this thread. 
You can take a long, hard look at the idea in that light, and see if it 
really is something that could be useful - in today's world with today's 
tools and technology, or tomorrow's world with new tools and development 
systems.

But next time you want to start a thread asking for ideas and opinions, 
how about responding with phrases like "I hadn't thought of it that 
way", "I think FPGA designers IME would like this" - not "You are wrong, 
and clearly ignorant".

You are a smart guy, and you are great at answering other people's 
questions and helping them out - but boy, are you bad at asking for help 
yourself.




Article: 161268
Subject: Re: Tiny CPUs for Slow Logic
From: already5chosen@yahoo.com
Date: Thu, 21 Mar 2019 02:22:03 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Thursday, March 21, 2019 at 4:21:13 AM UTC+2, gnuarm.del...@gmail.com wrote:
>
> So???  You are the one who keeps talking about software/hardware whatever.
> I'm talking about the software being able to synchronize with the clock of
> the other hardware.  When that happens there are tight timing constraints,
> in the same sense as software sampling an ADC on a periodic basis and
> having to process the resulting data before the next sample is ready.  The
> only difference is that something like the F18A, running at a few GHz, can
> do a lot in a 10 ns clock cycle.
>
>

I certainly don't like the "few GHz" part.
Distributing a single multi-GHz clock over the full area of an FPGA is a
non-starter from the power perspective alone, but even ignoring the power,
such distribution takes significant area, making the whole proposition
unattractive.  As I understand it, the whole point is that these thingies
take little area, so they are not harmful even for those buyers of the
device that don't utilize them at all, or utilize them very little.
Alternatively, multi-GHz clocks can be generated by local specialized PLLs,
but I am afraid that PLLs would be several times bigger than the cores
themselves, and need good, non-noisy power supplies and grounds that are
probably hard to get in the middle of the chip, etc.  I really know too
little about PLLs, but I think I know enough to conclude that it's not a
much better idea than chip-wide clock distribution at multi-GHz.

My idea of small hard cores is completely different in that regard.  IMHO,
they should run either with the same clock as the surrounding FPGA fabric,
or with a clock delivered by a simple clock doubler.  Even clock quadrupling
does not seem like a good idea to my engineering intuition.


Article: 161269
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Thu, 21 Mar 2019 09:40:26 +0000
Links: << >>  << T >>  << A >>
On 21/03/19 02:21, gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:
>> 
>> I agree that software should not in itself create a problem.  Trying to 
>> think of them as "logic" /would/ create problems.  Think of them as 
>> software, and program them as software.  I expect you'd think of them as 
>> entirely independent units with independent programs, rather than as a 
>> multi-cpu or heterogeneous system.
> 
> Ok, please tell me what those problems would be.  I have no idea what you
> mean by what you say.  You are likely reading a lot into this that I am not
> intending.

I have no difficulty understanding what he is saying.

Several people have difficulty understanding what you
are proposing.

You are proposing vague ideas, so the onus is on you
to make your ideas clear.


>>> As to the connection, I really don't get your point.  They either connect
>>> directly to the hardware because that's how they are designed, or they
>>> don't... because that's how they are designed.  I don't know what you are
>>> saying about that.
>>> 
>> 
>> "Synchronise directly with hardware" might be a better phrase.
> 
> I don't know why and likely I'm not going to care.  I think you need to
> learn more of how the F18A works.

No, we really don't have to learn more about one specific
processor - especially if it is just to help you.

If, OTOH, you succinctly summarise its key points and
how that achieves benefits, then we might be interested.


>>> Enough!  The CPUs run software.  Now, what is YOUR point?
>>> 
>> 
>> My point was that these are not logic, they are not logic elements (even if
>> they could be physically small and cheap and scattered around a chip like
>> logic elements).  Thinking about them as "sequential logic elements" is not
>> helpful.  Think of them as small processors running simple and limited
>> /software/.  Unless you can find a way to automatically generate code for
>> them, then they will be programmed using a /software/ programming language,
>> not a logic or hardware programming language.  If you are happy to accept
>> that now, then great - we can move on.
> 
> You have it backwards.  Please show me what you think the problems are.  I
> don't care if they run software or have a Maxwell demon tossing bits about as
> long as it does what I need.  You seem to get hung up on terminology so
> easily.

You need to explain your points better.

There's the old adage that "you only realise how little
you know about a subject when you try to teach it to
other people".


> That is your construct because you know nothing of how the F18A works.  As
> I've mentioned before, you would do well to read some of the app notes on
> this device.  It really does have some good ideas to offer.

Give us the elevator pitch, so we can estimate whether
it would be a beneficial use of our remaining life.



> The point wasn't that I don't have a business plan.  The point was that I
> haven't given this as much thought as would have been done if I were working
> on a business plan.  I'm kicking around an idea.  I'm not in a position to
> create FPGA with or without small CPUs.
> 
> 
>> Give me a /reason/ to all this - rather than just saying you can make a 
>> simple stack-based cpu that's very small, so you could have lots of them on
>> a chip.
> 
> Why?  Why don't you give ME a reason?  Why don't you switch your point of
> view and figure out how this would be useful?  Neither of us have anything to
> gain or lose.

Why? Because you are trying to propagate your ideas.
The onus is on you to convince us, not the other way
around.

Article: 161270
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Thu, 21 Mar 2019 09:44:52 +0000
Links: << >>  << T >>  << A >>
On 20/03/19 16:32, already5chosen@yahoo.com wrote:
> On Wednesday, March 20, 2019 at 5:51:21 PM UTC+2, Tom Gardner wrote:
>> On 20/03/19 14:51, already5chosen@yahoo.com wrote:
>>> On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
>>>> On 20/03/19 14:11, already5chosen@yahoo.com wrote:
>>>>> On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
>>>>>> 
>>>>>> But more difficult that creating such a toolset is defining an
>>>>>> application level description that a toolset can munge.
>>>>>> 
>>>>>> So, define (initially by example, later more formally) inputs to
>>>>>> the toolset and outputs from it. Then we can judge whether the
>>>>>> concepts are more than handwaving wishes.
>>>>>> 
>>>>> 
>>>>> I don't understand what you are asking for.
>>>> 
>>>> Go back and read the parts of my post that you chose to snip.
>>>> 
>>>> Give a handwaving indication of the concepts that avoid the conceptual
>>>> problems that I mentioned.
>>> 
>>> Frankly, it starts to sound like you never used soft CPU cores in your
>>> designs. So, for somebody like myself, who uses them routinely for
>>> different tasks since 2006, you are really not easy to understand.
>> 
>> Professionally, since 1978 I've done everything from low noise analogue
>> electronics, many hardware-software systems using all sorts of
>> technologies, networking at all levels of the protocol stack, "up" to high
>> availability distributed soft real-time systems.
>> 
>> And almost all of that has been on the bleeding edge.
>> 
>> So, yes, I do have more than a passing acquaintance with the
>> characteristics of many hardware and software technologies, and where
>> partitions between them can, should and should not be drawn.
>> 
> 
> Is it sort of admission that you indeed never designed with soft cores?

No, it is not.


>>> Concept? Concepts are good for new things, not for something that is a
>>> variation of something old and routine and obviously working.
>> 
>> Whatever is being proposed, is it old or new?
>> 
>> If old then the OP needs enlightenment and concrete examples can easily be
>> noted.
>> 
>> If new, then provide the concepts.
>> 
> 
> It is a new variation of an old concept.  A cross between the PPCs in the
> ancient Virtex-II Pro and the soft cores found virtually everywhere in more
> modern times.  Probably best characterized by what it is not like: it is
> not like Xilinx Zynq or Altera Cyclone5-HPS.
> 
> The "new" part comes more from the new economics of sub-20nm processes than
> from the abstractions you try to drag into it.  NRE is more and more
> expensive, gates are cheaper and cheaper (well, the cost of gates started
> to stagnate in the last couple of years, but that does not matter; what
> matters is that at something like TSMC 12nm gates are already quite cheap).
> So adding multiple small hard CPU cores, used as replacements for the
> multiple soft CPU cores that people already routinely use today, now starts
> to make sense.  Maybe it's not a really good proposition, but at these
> silicon geometries it can't be written off as an obviously stupid one.

The starting points are fine, but so what?

There's little point building something if it
isn't useful in practice.

For examples of that, see Intel's 432 and 860
processors, and there are other examples.


Article: 161271
Subject: Re: Tiny CPUs for Slow Logic
From: Theo <theom+news@chiark.greenend.org.uk>
Date: 21 Mar 2019 10:49:06 +0000 (GMT)
Links: << >>  << T >>  << A >>
gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
> > Your bottom-up approach means it's difficult to see the big picture of
> > what's going on.  That means it's hard to understand the whole system, and
> > to program from a whole-system perspective.
> 
> I never mentioned a bottom up or a top down approach to design.  Nothing
> about using these small CPUs is about the design "direction".  I am pretty
> sure that you have to define the circuit they will work in before you can
> start designing the code.

Your approach is 'I have this low-level thing (a tiny CPU), what can I use
it for?'.  That's bottom up.  A top down view would be 'my problem is X,
what's the best way to solve it?'.  The advantage of the latter view is you
can explore some of the architectural space before targeting a solution
that's appropriate to the problem (with metrics to measure it), aiming to
find the global maximum.  In a bottom-up approach you need to sell to users
that your idea will help their problem, but until you build a system they
don't know that it will even be a local maximum.

> > What's the logic equation of a processor?  
> 
> Obviously it is like a combination of LUTs with FFs and able to implement
> any logic you wish including math.  BTW, in many devices the elements are
> not at all so simple.  Xilinx LUTs can be used as shift registers.  There
> are additional logic within the logic blocks that allow math with carry
> chains, combining LUTs to form larger LUTs, breaking LUTs into smaller
> LUTs and lets not forget about routing which may not be used much anymore,
> not sure.

You can still reason about blocks as combinations of basic functions.  A
block that is LUT+FF can still be analysed in separate parts.
A processor is a 'black box' as far as the tools go.  That means any
software is opaque to analysis of correctness.  The tools therefore can't
know that the circuit they produced matches the input HDL.

Simulation does not give you equivalence checking of the form of LVS (layout
versus schematic) or compiler correctness testing, it only tests a
particular set of (usually hand-defined) test cases.  There's much less
coverage than equivalence checking tools.

> Why does it need to be inferred.  If you want to write an HDL tool to turn
> HDL into processor code, have at it.  But then there are other methods. 
> Someone mentioned his MO is to use other tools for designing his
> algorithms and letting that tool generate the software for a processor or
> the HDL for an FPGA.  That would seem easy enough to integrate.

That's roughly what OpenCL and friends can do.  But those are top-down
architecturally (starting with a chip block diagram), rather than starting
with tiny building blocks as you're suggesting.

> Huh?  You can't simulate code on a processor???

Verification is greater than simulation, as described above.

> > If we scale the processors up a bit, I could see the merits in say a
> > bank of, say, 32 Cortex M0s that could be interconnected as part of the
> > FPGA fabric and programmed in software for dedicated tasks (for
> > instance, read the I2C EEPROM on the DRAM DIMM and configure the DRAM
> > controller at boot).
> 
> I don't follow your logic.  What is different about the ARM processor from
> the stack processor other than that it is larger and slower and requires a
> royalty on each one?  Are you talking about writing the code in C vs. 
> what ever is used for the stack processor?

If you have an existing codebase (supplied by the vendor of your external
chip, for example), it'll likely be in C.  It won't be in
special-stack-assembler, and your architecture seems to be designed to not
be amenable to compilers.

> The point of the many hard cores is the saving of resources.  Soft cores
> would be the most wasteful way to implement logic.  If the application is
> large enough they can implement things in software that aren't as
> practical in HDL, but that would be a different class of logic from the
> tiny CPUs I'm talking about.

'Wastefulness' is one parameter.  But you can also consider that every
unused hard-core is also wasteful in terms of silicon area.  Can you show
that the hard-cores would be used enough of the time to outweigh the space
they waste on other people's designs?

> You lost me with the gear shift.  The mention of instruction rate is about
> the CPU being fast enough to keep up with FPGA logic.  The issue with
> "heterogeneous performance" is the "heterogeneous" part, lumping the many
> CPUs together to create some sort of number cruncher.  That's not what
> this is about.  Like in the GA144, I fully expect most CPUs to be sitting
> around most of the time idling, waiting for data.  This is a good thing
> actually.  These CPUs could consume significant current if they run at GHz
> all the time.  I believe in the GA144 at that slower rate each processor
> can use around 2.5 mA.  Not sure if a smaller process would use more or
> less power when running flat out.  It's been too many years since I worked
> with those sorts of numbers.

OK, so once we drop any idea of MIPS, we're talking about something simpler
than a Cortex M0.  You should be able to make a design that clocks at a few
hundred MHz on an FPGA process.  You could choose to run it synchronously
with your FPGA logic, or on an internal clock and synchronise inputs and
outputs.  You probably wouldn't tile these, but you could deploy them as a
'hardware thread' in places you need a complicated state machine.

> > In essence, your proposal has a disconnect between the situations existing
> > FPGA blocks are used (implemented automatically by P&R tools) and the
> > situations software is currently used (human-driven software and
> > architectural design).  It's unclear how you claim to bridge this gap.
> 
> I certainly don't see how P&R tools would be a problem.  They accommodate
> multipliers, DSP blocks, memory block and many, many special bits of
> assorted components inside the FPGAs which vary from vendor to vendor. 
> Clock generators and distribution is pretty unique to each manufacturer. 
> Lattice has all sorts of modules to offer like I2C and embedded Flash. 
> Then there are entire CPUs embedded in FPGAs.  Why would supporting them
> be so different from what I am talking about?

If this is a module that the tools have no visibility over, ie just a blob
with inputs and outputs, then they can implement that.  In that instance
there is a manageability problem - beyond a handful of processes, writing
heterogeneous distributed software is hard.  Unless each processor is doing
a very small, well-defined, task, I think the chances of bugs are high.

If instead you want interaction with the toolchain in terms of
generating/checking the software running on such cores, that's also
problematic.


I hadn't seen Picoblaze before, but that seems a strong fit with what you're
suggesting.  So a question: why isn't it more successful?  And why isn't
Xilinx putting hard Picoblazes into their FPGAs, which they could do
tomorrow if they felt the need?

Theo

Article: 161272
Subject: Re: Tiny CPUs for Slow Logic
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Thu, 21 Mar 2019 11:48:13 +0000
Links: << >>  << T >>  << A >>
On 21/03/19 10:49, Theo wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>> On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
>>> Your bottom-up approach means it's difficult to see the big picture of
>>> what's going on.  That means it's hard to understand the whole system, and
>>> to program from a whole-system perspective.
>>
>> I never mentioned a bottom up or a top down approach to design.  Nothing
>> about using these small CPUs is about the design "direction".  I am pretty
>> sure that you have to define the circuit they will work in before you can
>> start designing the code.
> 
> Your approach is 'I have this low-level thing (a tiny CPU), what can I use
> it for?'.  That's bottom up.  A top down view would be 'my problem is X,
> what's the best way to solve it?'.  

The OP's attitude and responses have puzzled me. However, they
make more sense if that is indeed his design strategy - and I
suspect it is, based on comments he has made in other parts
of this thread.

That attitude surprises me, since all my /designs/ have been
based on "what do I need to achieve" plus "what can individual
technologies achieve" plus "which combination of technologies
is best at achieving my objectives". I.e top down with a
knowledge of the bottom pieces.

Of course I /implement/ my designs in a more bottom up way.

(I agree with the rest of your statements)

Article: 161273
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Thu, 21 Mar 2019 07:25:51 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Thursday, March 21, 2019 at 5:22:09 AM UTC-4, already...@yahoo.com wrote:
> On Thursday, March 21, 2019 at 4:21:13 AM UTC+2, gnuarm.del...@gmail.com wrote:
> >
> > So???  You are the one who keeps talking about software/hardware whatever.
> > I'm talking about the software being able to synchronize with the clock of
> > the other hardware.  When that happens there are tight timing constraints,
> > in the same sense as software sampling an ADC on a periodic basis and
> > having to process the resulting data before the next sample is ready.  The
> > only difference is that something like the F18A, running at a few GHz, can
> > do a lot in a 10 ns clock cycle.
> >
> >
>
> I certainly don't like the "few GHz" part.
> Distributing a single multi-GHz clock over the full area of an FPGA is a
> non-starter from the power perspective alone, but even ignoring the power,
> such distribution takes significant area, making the whole proposition
> unattractive.  As I understand it, the whole point is that these thingies
> take little area, so they are not harmful even for those buyers of the
> device that don't utilize them at all, or utilize them very little.

There is no multi-GHz clock distribution.  These CPUs can be self-timed.
The F18A is.  Think of asynchronous logic.  It's not literally
asynchronous, but similar, with internal delays setting the speed so all
the internal logic works correctly.  The only clock would be whatever
clock the rest of the logic is using.

Think of these CPUs running from a clock generated by a ring oscillator in
each CPU.  There would be a minimum CPU speed over PVT (Process, Voltage,
Temperature).  That's all you need to make this work.


> Alternatively, multi-GHz clocks can be generated by local specialized
> PLLs, but I am afraid that PLLs would be several times bigger than the
> cores themselves, and need good, non-noisy power supplies and grounds
> that are probably hard to get in the middle of the chip, etc.  I really
> know too little about PLLs, but I think I know enough to conclude that
> it's not a much better idea than chip-wide clock distribution at
> multi-GHz.

That's the advantage of synchronizing at the interface rather than trying
to run in lockstep.  The CPUs free-run at some fast speed.  They sit
waiting for data on a clock transition, not clocking, using very little
power.  On receiving the same clock edge the rest of the chip is using,
the CPU starts running; data previously generated is output (like a FF),
data on the inputs is read and processed, and the result is held while the
CPU pends on the next clock edge, again going into a sleep state.
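
As a sketch of what one of these cores would spend its life doing (C used
only for illustration; the function names are hypothetical, standing in for
whatever hardware handshake the core provides):

#include <stdint.h>

/* Hypothetical handshake interface - not a real API. */
extern void     wait_for_fabric_clock_edge(void);  /* core sleeps here      */
extern uint32_t read_inputs(void);                 /* sample fabric signals */
extern void     write_outputs(uint32_t value);     /* drive fabric signals  */
extern uint32_t compute(uint32_t inputs);          /* the "logic" itself    */

void node_main(void)
{
    uint32_t result = 0;
    for (;;) {
        wait_for_fabric_clock_edge(); /* sleep until the shared clock edge */
        write_outputs(result);        /* like a FF updating on the edge    */
        uint32_t in = read_inputs();  /* read this cycle's inputs          */
        result = compute(in);         /* must finish before the next edge  */
    }
}

From the outside it looks like a registered block of logic; the fact that it
executes instructions between edges is invisible to the rest of the design.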

You can read how the F18A does it at an atomic level in the clock
management.  The wake up is *very* fast.


> My idea of small hard cores is completely different in that regard.
> IMHO, they should run either with the same clock as the surrounding FPGA
> fabric, or with a clock delivered by a simple clock doubler.  Even clock
> quadrupling does not seem like a good idea to my engineering intuition.

This would make the CPU ridiculously slow and not a good trade-off for
fabric logic.

CPUs can be size-efficient when they do a lot of sequential calculations.
This essentially takes advantage of the enormous multiplexer in the memory
to let it replace a larger amount of logic.  But if the needs are faster
than a slow processor can handle, the processor needs to run at a much
higher clock speed.  This allows an even higher space efficiency, since
now the logic in the CPU is executing more instructions in a single clock.

So let a small CPU run at a very high rate and synchronize at the system
clock rate by handshaking, just like a LUT/FF logic block, without worrying
about the fact that it is running a lot of instructions.  It just needs to
run enough to get the job done.  The timing is like the logic in a data
path between FFs.  It has to run fast enough to reach the next FF before
the next clock edge.  It won't matter if it is faster.  So the CPU only
needs a minimum spec on its internal clock speed.
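
To put rough numbers on that budget (assumed figures, only to show the
arithmetic - not data for any real device):

/* Back-of-envelope budget check with assumed numbers - not device data. */
#define FABRIC_CLK_HZ      100000000u                      /* assumed 100 MHz system clock     */
#define CYCLE_NS           (1000000000u / FABRIC_CLK_HZ)   /* 10 ns window per cycle           */
#define WORST_CASE_INSN_NS 2u                              /* assumed worst-case PVT insn time */
#define INSNS_PER_CYCLE    (CYCLE_NS / WORST_CASE_INSN_NS) /* at least 5 guaranteed per cycle  */

Anything the core can get done within that instruction budget is a candidate
for replacing fabric logic.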

Rick C.

Article: 161274
Subject: Re: Tiny CPUs for Slow Logic
From: gnuarm.deletethisbit@gmail.com
Date: Thu, 21 Mar 2019 07:27:40 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Thursday, March 21, 2019 at 3:37:14 AM UTC-4, David Brown wrote:
> On 21/03/2019 03:21, gnuarm.deletethisbit@gmail.com wrote:
> > On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:
>
> >
> >> I want to know if that is going to happen with your ideas here.
> >> Sure, you don't have a full business plan - but do you at least
> >> have thoughts about the kind of usage where these mini cpus would
> >> be a technologically superior choice compared to using state
> >> machines in VHDL (possibly generated with external programs),
> >> sequential logic generators (like C to HDL compilers, matlab tools,
> >> etc.), normal soft processors, or normal hard processors?
> >
> > The point wasn't that I don't have a business plan.  The point was
> > that I haven't given this as much thought as would have been done if
> > I were working on a business plan.  I'm kicking around an idea.  I'm
> > not in a position to create FPGA with or without small CPUs.
> >
> >
> >> Give me a /reason/ to all this - rather than just saying you can
> >> make a simple stack-based cpu that's very small, so you could have
> >> lots of them on a chip.
> >
> > Why?  Why don't you give ME a reason?  Why don't you switch your
> > point of view and figure out how this would be useful?  Neither of us
> > have anything to gain or lose.
> >
>
> I don't have any good ideas of what these might be used for.  And I
> can't see how it ends up as /my/ responsibility to figure out why /your/
> idea might be a good idea.
>
> You presented an idea - having several small, simple cpus on a chip.
> It's taken a long time, and a lot of side-tracks, to drag out of you
> what you are really thinking about.  (Perhaps you didn't have a clear
> idea in your mind with your first post, and it has solidified along the
> way - in which case, great, and I'm glad the thread has been successful
> there.)
>
> I've been trying to help by trying to look at how these might be used,
> and how they compare to alternative existing solutions.  And I have been
> trying to get /you/ to come up with some ideas about when they might be
> useful.  All I'm getting is a lot of complaints, insults, condescension,
> patronisation.  You tell me I don't understand what these are for - yet
> you refuse to say what they are for (the nearest we have got in any post
> in this thread to evidence that there is any use-case, is you telling me
> you have ideas but refuse to tell me as I am not an FPGA designer by
> profession).  You are forever telling me about the wonders of the F18A
> and the GA144, and how I can't understand your ideas because I don't
> understand that device - while simultaneously telling me that device is
> irrelevant to your proposal.  You are asking for opinions and thoughts
> about how people would program these devices, then tell me I am wrong
> and closed-minded when I give you answers.
>
> Hopefully, you have got /some/ ideas and thoughts out of this thread.
> You can take a long, hard look at the idea in that light, and see if it
> really is something that could be useful - in today's world with today's
> tools and technology, or tomorrow's world with new tools and development
> systems.
>
> But next time you want to start a thread asking for ideas and opinions,
> how about responding with phrases like "I hadn't thought of it that
> way", "I think FPGA designers IME would like this" - not "You are wrong,
> and clearly ignorant".
>
> You are a smart guy, and you are great at answering other people's
> questions and helping them out - but boy, are you bad at asking for help
> yourself.

I think if you go back and read, I said it all before.  But because there
is a lot of new thinking involved, it was very hard to get you to
understand what was being said, rather than continuing to look at it the
way you have been looking at it for the last few decades.

Rick C.


