Messages from 159200

Article: 159200
Subject: Minimal-operation shift-and-add (or subtract)
From: Tim Wescott <tim@seemywebsite.com>
Date: Thu, 01 Sep 2016 14:53:25 -0500
Links: << >> << T >> << A >>

There's a method that I know, but can't remember the name.  And now I 
want to tell someone to Google for it.

It basically starts with the notion of multiplying by shift-and-add, but 
uses the fact that if you shift and then either add or subtract, you can 
minimize "addition" operations.

I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work!  See my website if you're interested
http://www.wescottdesign.com

Article: 159201
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: lasselangwadtchristensen@gmail.com
Date: Thu, 1 Sep 2016 13:19:09 -0700 (PDT)
Links: << >> << T >> << A >>

Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim Wescott:
> There's a method that I know, but can't remember the name.  And now I 
> want to tell someone to Google for it.
> 
> It basically starts with the notion of multiplying by shift-and-add, but 
> uses the fact that if you shift and then either add or subtract, you can 
> minimize "addition" operations.
> 
> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> 

Booth?

-Lasse

Article: 159202
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: Tim Wescott <tim@seemywebsite.com>
Date: Thu, 01 Sep 2016 15:24:28 -0500
Links: << >> << T >> << A >>

On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:

> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim Wescott:
>> There's a method that I know, but can't remember the name.  And now I
>> want to tell someone to Google for it.
>> 
>> It basically starts with the notion of multiplying by shift-and-add,
>> but uses the fact that if you shift and then either add or subtract,
>> you can minimize "addition" operations.
>> 
>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>> 
>> 
> Booth?
> 
> -Lasse

That's it.  Thanks.

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work!  See my website if you're interested
http://www.wescottdesign.com

Article: 159203
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: rickman <gnuarm@gmail.com>
Date: Thu, 1 Sep 2016 18:40:25 -0400
Links: << >> << T >> << A >>

On 9/1/2016 4:24 PM, Tim Wescott wrote:
> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:
>
>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim Wescott:
>>> There's a method that I know, but can't remember the name.  And now I
>>> want to tell someone to Google for it.
>>>
>>> It basically starts with the notion of multiplying by shift-and-add,
>>> but uses the fact that if you shift and then either add or subtract,
>>> you can minimize "addition" operations.
>>>
>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>
>>>
>> Booth?
>>
>> -Lasse
>
> That's it.  Thanks.

That is very familiar from college, but I don't recall the utility.  It 
would be useful for multiplying by constants, but otherwise how would 
this be used to advantage?  It would save add/subtract operations, but I 
can't think of another situation where this would be useful.

If the algorithm is doing an add and shift, the add does not increase 
the time or the hardware.  If building the full multiplier, an adder is 
included for each stage, it is either used or not used.  When done in 
software, the same applies.  It is easier to do the add than to skip 
over it.

-- 

Rick C

Article: 159204
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: Tim Wescott <seemywebsite@myfooter.really>
Date: Thu, 01 Sep 2016 18:09:27 -0500
Links: << >> << T >> << A >>

On Thu, 01 Sep 2016 18:40:25 -0400, rickman wrote:

> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:
>>
>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
>>> Wescott:
>>>> There's a method that I know, but can't remember the name.  And now I
>>>> want to tell someone to Google for it.
>>>>
>>>> It basically starts with the notion of multiplying by shift-and-add,
>>>> but uses the fact that if you shift and then either add or subtract,
>>>> you can minimize "addition" operations.
>>>>
>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>
>>>>
>>> Booth?
>>>
>>> -Lasse
>>
>> That's it.  Thanks.
> 
> That is very familiar from college, but I don't recall the utility.  It
> would be useful for multiplying by constants, but otherwise how would
> this be used to advantage?  It would save add/subtract operations, but I
> can't think of another situation where this would be useful.
> 
> If the algorithm is doing an add and shift, the add does not increase
> the time or the hardware.  If building the full multiplier, an adder is
> included for each stage, it is either used or not used.  When done in
> software, the same applies.  It is easier to do the add than to skip
> over it.

I asked here and on comp.arch.embedded.  It's for a guy who's doing 
assembly-language programming on a PIC12xxx -- for that guy, and for a 
small range of constants (4 through 7), it can save time over a full-
blown multiplication algorithm.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!

Article: 159205
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: lasselangwadtchristensen@gmail.com
Date: Thu, 1 Sep 2016 16:22:17 -0700 (PDT)
Links: << >> << T >> << A >>

Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev rickman:
> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> > On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:
> >
> >> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim Wescott:
> >>> There's a method that I know, but can't remember the name.  And now I
> >>> want to tell someone to Google for it.
> >>>
> >>> It basically starts with the notion of multiplying by shift-and-add,
> >>> but uses the fact that if you shift and then either add or subtract,
> >>> you can minimize "addition" operations.
> >>>
> >>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> >>>
> >>>
> >> Booth?
> >>
> >> -Lasse
> >
> > That's it.  Thanks.
> 
> That is very familiar from college, but I don't recall the utility.  It 
> would be useful for multiplying by constants, but otherwise how would 
> this be used to advantage?  It would save add/subtract operations, but I 
> can't think of another situation where this would be useful.
> 
> If the algorithm is doing an add and shift, the add does not increase 
> the time or the hardware.  If building the full multiplier, an adder is 
> included for each stage, it is either used or not used.  When done in 
> software, the same applies.  It is easier to do the add than to skip 
> over it.

you only need half the stages so it is half the size* and the critical path through your adders are only half as long

* you need a few multiplexors to choose between x1 and x2, subtract is invert and carry in

-Lasse

Article: 159206
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Thu, 1 Sep 2016 16:47:01 -0700 (PDT)
Links: << >> << T >> << A >>

> 
> It basically starts with the notion of multiplying by shift-and-add, but 
> uses the fact that if you shift and then either add or subtract, you can 
> minimize "addition" operations.
> 
> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> 

Canonical Signed Digit (CSD) representation.  The CSD form of any number has at least 1/3 of the bits cleared.

244 = 011110100 = +000-0100

(Replace 01111 with +000-)

Article: 159207
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Thu, 1 Sep 2016 16:49:10 -0700 (PDT)
Links: << >> << T >> << A >>


> 
> Canonical Signed Digit (CSD) representation.  The CSD form of any number has at least 1/3 of the bits cleared.
> 
> 244 = 011110100 = +000-0100
> 
> (Replace 01111 with +000-)

I meant:
244 = 011110100 = +000-0+00

Article: 159208
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: rickman <gnuarm@gmail.com>
Date: Fri, 2 Sep 2016 02:23:07 -0400
Links: << >> << T >> << A >>

On 9/1/2016 7:09 PM, Tim Wescott wrote:
> On Thu, 01 Sep 2016 18:40:25 -0400, rickman wrote:
>
>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen wrote:
>>>
>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
>>>> Wescott:
>>>>> There's a method that I know, but can't remember the name.  And now I
>>>>> want to tell someone to Google for it.
>>>>>
>>>>> It basically starts with the notion of multiplying by shift-and-add,
>>>>> but uses the fact that if you shift and then either add or subtract,
>>>>> you can minimize "addition" operations.
>>>>>
>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>
>>>>>
>>>> Booth?
>>>>
>>>> -Lasse
>>>
>>> That's it.  Thanks.
>>
>> That is very familiar from college, but I don't recall the utility.  It
>> would be useful for multiplying by constants, but otherwise how would
>> this be used to advantage?  It would save add/subtract operations, but I
>> can't think of another situation where this would be useful.
>>
>> If the algorithm is doing an add and shift, the add does not increase
>> the time or the hardware.  If building the full multiplier, an adder is
>> included for each stage, it is either used or not used.  When done in
>> software, the same applies.  It is easier to do the add than to skip
>> over it.
>
> I asked here and on comp.arch.embedded.  It's for a guy who's doing
> assembly-language programming on a PIC12xxx -- for that guy, and for a
> small range of constants (4 through 7), it can save time over a full-
> blown multiplication algorithm.

Oh, sure, I use that all the time for constant multiplication.  No magic 
involved.  In some cases (such at multiplying by 4) it is simpler to 
just use a simple shift and add (which becomes trivial for a single 
bit).  Even for multipliers with two bits set, it is easier to use a 
simple add the shifted values.  In fact, the only case of multiplying by 
numbers in the range of 4 to 7 where Booth's algorithm simplifies the 
work is 7.  Since there are only four cases, I expect this could easily 
be implemented as four special cases.

-- 

Rick C

Article: 159209
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: rickman <gnuarm@gmail.com>
Date: Fri, 2 Sep 2016 02:39:07 -0400
Links: << >> << T >> << A >>

On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev rickman:
>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>> wrote:
>>>
>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
>>>> Wescott:
>>>>> There's a method that I know, but can't remember the name.
>>>>> And now I want to tell someone to Google for it.
>>>>>
>>>>> It basically starts with the notion of multiplying by
>>>>> shift-and-add, but uses the fact that if you shift and then
>>>>> either add or subtract, you can minimize "addition"
>>>>> operations.
>>>>>
>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>
>>>>>
>>>> Booth?
>>>>
>>>> -Lasse
>>>
>>> That's it.  Thanks.
>>
>> That is very familiar from college, but I don't recall the utility.
>> It would be useful for multiplying by constants, but otherwise how
>> would this be used to advantage?  It would save add/subtract
>> operations, but I can't think of another situation where this would
>> be useful.
>>
>> If the algorithm is doing an add and shift, the add does not
>> increase the time or the hardware.  If building the full
>> multiplier, an adder is included for each stage, it is either used
>> or not used.  When done in software, the same applies.  It is
>> easier to do the add than to skip over it.
>
> you only need half the stages so it is half the size* and the
> critical path through your adders are only half as long

I think you need to look at the algorithm again.  The degenerate case is 
a multiplier with alternating ones and zeros.  An add or subtract is 
needed at each 1->0 or 0->1 transition.  Since every bit is a transition 
you still need an adder/subtractor for every bit.

Of course you could add logic to detect these cases and do fewer adder 
ops, but then that is not Booth's algorithm anymore and is much more 
complex.  Booth's algorithm looks at pairs of bits in the multiplier, 
this would require looking at more bits.

If you are thinking in terms of constant multiplication then again, this 
is a modified method that combines Booth's with straight adds.

> * you need a few multiplexors to choose between x1 and x2, subtract
> is invert and carry in

Multiplexers are not low cost in any sense in many technologies, but it 
doesn't matter.  Booth's algorithm doesn't use multiplexers.

-- 

Rick C

Article: 159210
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: Tauno Voipio <tauno.voipio@notused.fi.invalid>
Date: Fri, 2 Sep 2016 13:59:30 +0300
Links: << >> << T >> << A >>

On 1.9.16 22:53, Tim Wescott wrote:
> There's a method that I know, but can't remember the name.  And now I
> want to tell someone to Google for it.
>
> It basically starts with the notion of multiplying by shift-and-add, but
> uses the fact that if you shift and then either add or subtract, you can
> minimize "addition" operations.
>
> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.


It seems that you're after Booth's algorithm:
<https://en.wikipedia.org/wiki/Booth%27s_multiplication_algorithm>.

-- 

-TV

Article: 159211
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: lasselangwadtchristensen@gmail.com
Date: Fri, 2 Sep 2016 13:01:36 -0700 (PDT)
Links: << >> << T >> << A >>

Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> > Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev rickman:
> >> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>> wrote:
> >>>
> >>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev Tim
> >>>> Wescott:
> >>>>> There's a method that I know, but can't remember the name.
> >>>>> And now I want to tell someone to Google for it.
> >>>>>
> >>>>> It basically starts with the notion of multiplying by
> >>>>> shift-and-add, but uses the fact that if you shift and then
> >>>>> either add or subtract, you can minimize "addition"
> >>>>> operations.
> >>>>>
> >>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
> >>>>>
> >>>>>
> >>>> Booth?
> >>>>
> >>>> -Lasse
> >>>
> >>> That's it.  Thanks.
> >>
> >> That is very familiar from college, but I don't recall the utility.
> >> It would be useful for multiplying by constants, but otherwise how
> >> would this be used to advantage?  It would save add/subtract
> >> operations, but I can't think of another situation where this would
> >> be useful.
> >>
> >> If the algorithm is doing an add and shift, the add does not
> >> increase the time or the hardware.  If building the full
> >> multiplier, an adder is included for each stage, it is either used
> >> or not used.  When done in software, the same applies.  It is
> >> easier to do the add than to skip over it.
> >
> > you only need half the stages so it is half the size* and the
> > critical path through your adders are only half as long
> 
> I think you need to look at the algorithm again.  The degenerate case is 
> a multiplier with alternating ones and zeros.  An add or subtract is 
> needed at each 1->0 or 0->1 transition.  Since every bit is a transition 
> you still need an adder/subtractor for every bit.
> 
> Of course you could add logic to detect these cases and do fewer adder 
> ops, but then that is not Booth's algorithm anymore and is much more 
> complex.  Booth's algorithm looks at pairs of bits in the multiplier, 
> this would require looking at more bits.
> 

you are right, I was thinking of "modified Booth" it looks at 3 bits at 
a time, 

http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif


> If you are thinking in terms of constant multiplication then again, this 
> is a modified method that combines Booth's with straight adds.
> 
> 
> > * you need a few multiplexors to choose between x1 and x2, subtract
> > is invert and carry in
> 
> Multiplexers are not low cost in any sense in many technologies, but it 
> doesn't matter.  Booth's algorithm doesn't use multiplexers.

may not be low cost, but compared to a full adder?

and since the inputs come from the multiplicand and the multiplier not from other intermediate results it shouldn't be in the critical path

-Lasse

Article: 159212
Subject: Re: PALCE22v10 / GAL22v10 programming algorithms needed
From: thomas.entner99@gmail.com
Date: Fri, 2 Sep 2016 13:53:25 -0700 (PDT)
Links: << >> << T >> << A >>

In my youth I tried to build a GAL programmer, but I never got it to work with the samples I had.

Later I found out that there appeared to be quite different programming algorithms for different parts with the same name, so be aware...

Thomas

Article: 159213
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: rickman <gnuarm@gmail.com>
Date: Fri, 2 Sep 2016 19:37:49 -0400
Links: << >> << T >> << A >>

On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>> rickman:
>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>> wrote:
>>>>>
>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>> Tim Wescott:
>>>>>>> There's a method that I know, but can't remember the
>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>
>>>>>>> It basically starts with the notion of multiplying by
>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>> operations.
>>>>>>>
>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>
>>>>>>>
>>>>>> Booth?
>>>>>>
>>>>>> -Lasse
>>>>>
>>>>> That's it.  Thanks.
>>>>
>>>> That is very familiar from college, but I don't recall the
>>>> utility. It would be useful for multiplying by constants, but
>>>> otherwise how would this be used to advantage?  It would save
>>>> add/subtract operations, but I can't think of another situation
>>>> where this would be useful.
>>>>
>>>> If the algorithm is doing an add and shift, the add does not
>>>> increase the time or the hardware.  If building the full
>>>> multiplier, an adder is included for each stage, it is either
>>>> used or not used.  When done in software, the same applies.  It
>>>> is easier to do the add than to skip over it.
>>>
>>> you only need half the stages so it is half the size* and the
>>> critical path through your adders are only half as long
>>
>> I think you need to look at the algorithm again.  The degenerate
>> case is a multiplier with alternating ones and zeros.  An add or
>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>> bit is a transition you still need an adder/subtractor for every
>> bit.
>>
>> Of course you could add logic to detect these cases and do fewer
>> adder ops, but then that is not Booth's algorithm anymore and is
>> much more complex.  Booth's algorithm looks at pairs of bits in the
>> multiplier, this would require looking at more bits.
>>
>
> you are right, I was thinking of "modified Booth" it looks at 3 bits
> at a time,
>
> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>
>
>> If you are thinking in terms of constant multiplication then again,
>> this is a modified method that combines Booth's with straight
>> adds.
>>
>>
>>> * you need a few multiplexors to choose between x1 and x2,
>>> subtract is invert and carry in
>>
>> Multiplexers are not low cost in any sense in many technologies,
>> but it doesn't matter.  Booth's algorithm doesn't use
>> multiplexers.
>
> may not be low cost, but compared to a full adder?

In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF. 
Because there is extra carry logic in the LC one bit of an adder is the 
same logic as one bit of a multiplexer.  The only disadvantage of an 
adder is the carry propagation time, but they don't cascade, properly 
designed you end up with one ripple cascade through one adder and then 
the other logic delays as the adders all cascade in parallel.


> and since the inputs come from the multiplicand and the multiplier
> not from other intermediate results it shouldn't be in the critical
> path

???

I don't see how you use multiplexers instead of adders.  If the 
multiplier changes from one calculation to the next you need adders in 
every position.  If the multiplier is fixed the combinations of sums is 
fixed and no multiplexers are needed.

-- 

Rick C

Article: 159214
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: lasselangwadtchristensen@gmail.com
Date: Fri, 2 Sep 2016 17:08:36 -0700 (PDT)
Links: << >> << T >> << A >>

Den l=C3=B8rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> > Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> >> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> >>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
> >>> rickman:
> >>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>>>> wrote:
> >>>>>
> >>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
> >>>>>> Tim Wescott:
> >>>>>>> There's a method that I know, but can't remember the
> >>>>>>> name. And now I want to tell someone to Google for it.
> >>>>>>>
> >>>>>>> It basically starts with the notion of multiplying by
> >>>>>>> shift-and-add, but uses the fact that if you shift and
> >>>>>>> then either add or subtract, you can minimize "addition"
> >>>>>>> operations.
> >>>>>>>
> >>>>>>> I.e., 255 =3D 256 - 1, 244 =3D 256 - 16 + 4, etc.
> >>>>>>>
> >>>>>>>
> >>>>>> Booth?
> >>>>>>
> >>>>>> -Lasse
> >>>>>
> >>>>> That's it.  Thanks.
> >>>>
> >>>> That is very familiar from college, but I don't recall the
> >>>> utility. It would be useful for multiplying by constants, but
> >>>> otherwise how would this be used to advantage?  It would save
> >>>> add/subtract operations, but I can't think of another situation
> >>>> where this would be useful.
> >>>>
> >>>> If the algorithm is doing an add and shift, the add does not
> >>>> increase the time or the hardware.  If building the full
> >>>> multiplier, an adder is included for each stage, it is either
> >>>> used or not used.  When done in software, the same applies.  It
> >>>> is easier to do the add than to skip over it.
> >>>
> >>> you only need half the stages so it is half the size* and the
> >>> critical path through your adders are only half as long
> >>
> >> I think you need to look at the algorithm again.  The degenerate
> >> case is a multiplier with alternating ones and zeros.  An add or
> >> subtract is needed at each 1->0 or 0->1 transition.  Since every
> >> bit is a transition you still need an adder/subtractor for every
> >> bit.
> >>
> >> Of course you could add logic to detect these cases and do fewer
> >> adder ops, but then that is not Booth's algorithm anymore and is
> >> much more complex.  Booth's algorithm looks at pairs of bits in the
> >> multiplier, this would require looking at more bits.
> >>
> >
> > you are right, I was thinking of "modified Booth" it looks at 3 bits
> > at a time,
> >
> > http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
> >
> >
> >> If you are thinking in terms of constant multiplication then again,
> >> this is a modified method that combines Booth's with straight
> >> adds.
> >>
> >>
> >>> * you need a few multiplexors to choose between x1 and x2,
> >>> subtract is invert and carry in
> >>
> >> Multiplexers are not low cost in any sense in many technologies,
> >> but it doesn't matter.  Booth's algorithm doesn't use
> >> multiplexers.
> >
> > may not be low cost, but compared to a full adder?
>=20
> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.=20
> Because there is extra carry logic in the LC one bit of an adder is the=
=20
> same logic as one bit of a multiplexer.  The only disadvantage of an=20
> adder is the carry propagation time, but they don't cascade, properly=20
> designed you end up with one ripple cascade through one adder and then=20
> the other logic delays as the adders all cascade in parallel.

sure most fpgas have fast carry chains or build in multipliers so hand=20
coding "fancy" multiplier structures might not come out ahead
=20
>=20
>=20
> > and since the inputs come from the multiplicand and the multiplier
> > not from other intermediate results it shouldn't be in the critical
> > path
>=20
> ???
>=20
> I don't see how you use multiplexers instead of adders.  If the=20
> multiplier changes from one calculation to the next you need adders in=20
> every position.  If the multiplier is fixed the combinations of sums is=
=20
> fixed and no multiplexers are needed.

with a regular multiplier you have to go through N layers of adders=20
with a modified Booth you only have to go through N/2 adders=20

-Lasse

Article: 159215
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: rickman <gnuarm@gmail.com>
Date: Fri, 2 Sep 2016 21:45:06 -0400
Links: << >> << T >> << A >>

On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
> Den lørdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>>>> rickman:
>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>>>> Tim Wescott:
>>>>>>>>> There's a method that I know, but can't remember the
>>>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>>>
>>>>>>>>> It basically starts with the notion of multiplying by
>>>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>>>> operations.
>>>>>>>>>
>>>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Booth?
>>>>>>>>
>>>>>>>> -Lasse
>>>>>>>
>>>>>>> That's it.  Thanks.
>>>>>>
>>>>>> That is very familiar from college, but I don't recall the
>>>>>> utility. It would be useful for multiplying by constants, but
>>>>>> otherwise how would this be used to advantage?  It would save
>>>>>> add/subtract operations, but I can't think of another situation
>>>>>> where this would be useful.
>>>>>>
>>>>>> If the algorithm is doing an add and shift, the add does not
>>>>>> increase the time or the hardware.  If building the full
>>>>>> multiplier, an adder is included for each stage, it is either
>>>>>> used or not used.  When done in software, the same applies.  It
>>>>>> is easier to do the add than to skip over it.
>>>>>
>>>>> you only need half the stages so it is half the size* and the
>>>>> critical path through your adders are only half as long
>>>>
>>>> I think you need to look at the algorithm again.  The degenerate
>>>> case is a multiplier with alternating ones and zeros.  An add or
>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>>>> bit is a transition you still need an adder/subtractor for every
>>>> bit.
>>>>
>>>> Of course you could add logic to detect these cases and do fewer
>>>> adder ops, but then that is not Booth's algorithm anymore and is
>>>> much more complex.  Booth's algorithm looks at pairs of bits in the
>>>> multiplier, this would require looking at more bits.
>>>>
>>>
>>> you are right, I was thinking of "modified Booth" it looks at 3 bits
>>> at a time,
>>>
>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>>>
>>>
>>>> If you are thinking in terms of constant multiplication then again,
>>>> this is a modified method that combines Booth's with straight
>>>> adds.
>>>>
>>>>
>>>>> * you need a few multiplexors to choose between x1 and x2,
>>>>> subtract is invert and carry in
>>>>
>>>> Multiplexers are not low cost in any sense in many technologies,
>>>> but it doesn't matter.  Booth's algorithm doesn't use
>>>> multiplexers.
>>>
>>> may not be low cost, but compared to a full adder?
>>
>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
>> Because there is extra carry logic in the LC one bit of an adder is the
>> same logic as one bit of a multiplexer.  The only disadvantage of an
>> adder is the carry propagation time, but they don't cascade, properly
>> designed you end up with one ripple cascade through one adder and then
>> the other logic delays as the adders all cascade in parallel.
>
> sure most fpgas have fast carry chains or build in multipliers so hand
> coding "fancy" multiplier structures might not come out ahead
>
>>
>>
>>> and since the inputs come from the multiplicand and the multiplier
>>> not from other intermediate results it shouldn't be in the critical
>>> path
>>
>> ???
>>
>> I don't see how you use multiplexers instead of adders.  If the
>> multiplier changes from one calculation to the next you need adders in
>> every position.  If the multiplier is fixed the combinations of sums is
>> fixed and no multiplexers are needed.
>
> with a regular multiplier you have to go through N layers of adders
> with a modified Booth you only have to go through N/2 adders

and N/2 multiplexers which have the same delay... PLUS the selectable 
inverter which may or may not be combined with the adder.  What's the 
point?  A simple adder has N stages of delay in an FPGA, same as the 
much more complicated modified Booth's adder.  In an ASIC there may be 
some advantage.  In software, I expect the much more complicated control 
will make the modified Booth's algorithm the slowest of the three.

People so often forget that multiplexers are not trivial logic.

-- 

Rick C

Article: 159216
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: lasselangwadtchristensen@gmail.com
Date: Sat, 3 Sep 2016 05:46:42 -0700 (PDT)
Links: << >> << T >> << A >>

Den l=C3=B8rdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
> > Den l=C3=B8rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
> >> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> >>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> >>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> >>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
> >>>>> rickman:
> >>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
> >>>>>>>> Tim Wescott:
> >>>>>>>>> There's a method that I know, but can't remember the
> >>>>>>>>> name. And now I want to tell someone to Google for it.
> >>>>>>>>>
> >>>>>>>>> It basically starts with the notion of multiplying by
> >>>>>>>>> shift-and-add, but uses the fact that if you shift and
> >>>>>>>>> then either add or subtract, you can minimize "addition"
> >>>>>>>>> operations.
> >>>>>>>>>
> >>>>>>>>> I.e., 255 =3D 256 - 1, 244 =3D 256 - 16 + 4, etc.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> Booth?
> >>>>>>>>
> >>>>>>>> -Lasse
> >>>>>>>
> >>>>>>> That's it.  Thanks.
> >>>>>>
> >>>>>> That is very familiar from college, but I don't recall the
> >>>>>> utility. It would be useful for multiplying by constants, but
> >>>>>> otherwise how would this be used to advantage?  It would save
> >>>>>> add/subtract operations, but I can't think of another situation
> >>>>>> where this would be useful.
> >>>>>>
> >>>>>> If the algorithm is doing an add and shift, the add does not
> >>>>>> increase the time or the hardware.  If building the full
> >>>>>> multiplier, an adder is included for each stage, it is either
> >>>>>> used or not used.  When done in software, the same applies.  It
> >>>>>> is easier to do the add than to skip over it.
> >>>>>
> >>>>> you only need half the stages so it is half the size* and the
> >>>>> critical path through your adders are only half as long
> >>>>
> >>>> I think you need to look at the algorithm again.  The degenerate
> >>>> case is a multiplier with alternating ones and zeros.  An add or
> >>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
> >>>> bit is a transition you still need an adder/subtractor for every
> >>>> bit.
> >>>>
> >>>> Of course you could add logic to detect these cases and do fewer
> >>>> adder ops, but then that is not Booth's algorithm anymore and is
> >>>> much more complex.  Booth's algorithm looks at pairs of bits in the
> >>>> multiplier, this would require looking at more bits.
> >>>>
> >>>
> >>> you are right, I was thinking of "modified Booth" it looks at 3 bits
> >>> at a time,
> >>>
> >>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
> >>>
> >>>
> >>>> If you are thinking in terms of constant multiplication then again,
> >>>> this is a modified method that combines Booth's with straight
> >>>> adds.
> >>>>
> >>>>
> >>>>> * you need a few multiplexors to choose between x1 and x2,
> >>>>> subtract is invert and carry in
> >>>>
> >>>> Multiplexers are not low cost in any sense in many technologies,
> >>>> but it doesn't matter.  Booth's algorithm doesn't use
> >>>> multiplexers.
> >>>
> >>> may not be low cost, but compared to a full adder?
> >>
> >> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
> >> Because there is extra carry logic in the LC one bit of an adder is th=
e
> >> same logic as one bit of a multiplexer.  The only disadvantage of an
> >> adder is the carry propagation time, but they don't cascade, properly
> >> designed you end up with one ripple cascade through one adder and then
> >> the other logic delays as the adders all cascade in parallel.
> >
> > sure most fpgas have fast carry chains or build in multipliers so hand
> > coding "fancy" multiplier structures might not come out ahead
> >
> >>
> >>
> >>> and since the inputs come from the multiplicand and the multiplier
> >>> not from other intermediate results it shouldn't be in the critical
> >>> path
> >>
> >> ???
> >>
> >> I don't see how you use multiplexers instead of adders.  If the
> >> multiplier changes from one calculation to the next you need adders in
> >> every position.  If the multiplier is fixed the combinations of sums i=
s
> >> fixed and no multiplexers are needed.
> >
> > with a regular multiplier you have to go through N layers of adders
> > with a modified Booth you only have to go through N/2 adders
>=20
> and N/2 multiplexers which have the same delay... PLUS the selectable=20
> inverter which may or may not be combined with the adder.  What's the=20
> point? =20

the multiplexers get their input from the multiplicant not the output=20
of the previous adder

> A simple adder has N stages of delay in an FPGA, same as the=20
> much more complicated modified Booth's adder.  In an ASIC there may be=20
> some advantage.  In software, I expect the much more complicated control=
=20
> will make the modified Booth's algorithm the slowest of the three.

agreed=20
=20
> People so often forget that multiplexers are not trivial logic.

I think the main purpose of modified booth is speed not size=20

-Lasse

Article: 159217
Subject: Re: Help me choose an FPGA to design network protocols
From: "Michael Kellett" <nospam@invalid.com>
Date: sat, 3 sep 2016 16:13:48 +0100
Links: << >> << T >> << A >>

PM X:
> On Monday, August 29, 2016 at 11:38:27 AM UTC-7, Cecil Bayona wrote:
>> On 8/29/2016 1:32 PM, rickman wrote:
>> > On 8/29/2016 4:58 AM, PM X wrote:
>> >>>
>> >>> UDP/IP is much simpler the TCP/IP. It is commonly done in FPGAs.
>> >>>
>> >>> For example:
>> >>>
>> >>> http://www.fpga4fun.com/10BASE-T.html
>> >>>
>> >>> OK. It is only 10Base-T. But it's not that different than the
10GbE that
>> >>> we do.
>> >>>
>> >>> You can get a crappy NIC and Basys 3 Artix 7 board for less than
$200.00
>> >>>
>> >>>
http://store.digilentinc.com/pmodnic100-network-interface-controller/
>> >>>
>> >>> It won't be low latency (the NIC has an SPI serial interface) but
it
>> >>> will teach concepts.
>> >>>
>> >>> Rob.
>> >>
>> >> Thanks. Is this the board you are referring to?
>> >>
http://store.digilentinc.com/basys-3-artix-7-fpga-trainer-board-recommended-for-introductory-users/
>> >>
>> >>
>> >> If so, this board doesn't seem to have SPI (at least no listed in
the
>> >> description). Also, do you think this board has enough capacity
(in
>> >> terms of logic elements, etc.) to support a fairly complicated
design
>> >> like UDP/IP?
>> >
>> > What do you mean it doesn't have SPI?  SPI is a simple shift
register
>> > interface which can *easily* be implemented in an FPGA (or MCU)
using
>> > the GPIOs.
>> >
>> > Do you mean ISP, in system programming?  If it doesn't have ISP how
do
>> > you load your design?
>> >
>>
>> It's a FPGA, you can add SPI easily, there are IPs for free to allow
>> that to happen.
>>
>> In my post I was also going to mention the BASYS-3  board, I left it
out
>> because the Arty Board has a ton of memory available that this one
>> doesn't but this on has a lot of switches and LED which can be
handy.
>> --
>> Cecil - k5nwa
> 
> OK, thanks. I will check out both of them. What is the largest design
you (or someone you know) have implemented on these boards? The Artix
line seems to be lower end than Virtex line, so trying to get an idea if
they can support somewhat complicated designs.

I haven't used the Arty except on a 1 day Xilinx "get to know Vivado"
jolly. It's quite anice board. We are looking at alternatives to Lattice
(no big reason, just due diligence) with a design currently running on a
Lattice ECP3-35. The FPGA on the Arty could easily do Ethernet trispeed
MAC, IP, ARP, UDP and some more. (We do this on the Lattice (about 20%
of it) so I know of what I speak (at least that far)) TCP on the FPGA
would be bigger (could be much bigger).

As an aside - why does HF trading use TCP - (think of this as an
interview question :-)

Our UDP support is good at transmitting, poor at receiving ('coz that's
what we need) - it does support an in-house protocol for re-transmission
and message integrity - much simpler than TCP and faster.

By the time you've outgrown the Arty you've either done the career shift
or it isn't going happen.

Michael Kellett

Article: 159218
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: rickman <gnuarm@gmail.com>
Date: Sat, 3 Sep 2016 18:10:49 -0400
Links: << >> << T >> << A >>

On 9/3/2016 8:46 AM, lasselangwadtchristensen@gmail.com wrote:
> Den lørdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
>> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
>>> Den lørdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickman:
>>>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
>>>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
>>>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
>>>>>>> rickman:
>>>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
>>>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
>>>>>>>>>> Tim Wescott:
>>>>>>>>>>> There's a method that I know, but can't remember the
>>>>>>>>>>> name. And now I want to tell someone to Google for it.
>>>>>>>>>>>
>>>>>>>>>>> It basically starts with the notion of multiplying by
>>>>>>>>>>> shift-and-add, but uses the fact that if you shift and
>>>>>>>>>>> then either add or subtract, you can minimize "addition"
>>>>>>>>>>> operations.
>>>>>>>>>>>
>>>>>>>>>>> I.e., 255 = 256 - 1, 244 = 256 - 16 + 4, etc.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Booth?
>>>>>>>>>>
>>>>>>>>>> -Lasse
>>>>>>>>>
>>>>>>>>> That's it.  Thanks.
>>>>>>>>
>>>>>>>> That is very familiar from college, but I don't recall the
>>>>>>>> utility. It would be useful for multiplying by constants, but
>>>>>>>> otherwise how would this be used to advantage?  It would save
>>>>>>>> add/subtract operations, but I can't think of another situation
>>>>>>>> where this would be useful.
>>>>>>>>
>>>>>>>> If the algorithm is doing an add and shift, the add does not
>>>>>>>> increase the time or the hardware.  If building the full
>>>>>>>> multiplier, an adder is included for each stage, it is either
>>>>>>>> used or not used.  When done in software, the same applies.  It
>>>>>>>> is easier to do the add than to skip over it.
>>>>>>>
>>>>>>> you only need half the stages so it is half the size* and the
>>>>>>> critical path through your adders are only half as long
>>>>>>
>>>>>> I think you need to look at the algorithm again.  The degenerate
>>>>>> case is a multiplier with alternating ones and zeros.  An add or
>>>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
>>>>>> bit is a transition you still need an adder/subtractor for every
>>>>>> bit.
>>>>>>
>>>>>> Of course you could add logic to detect these cases and do fewer
>>>>>> adder ops, but then that is not Booth's algorithm anymore and is
>>>>>> much more complex.  Booth's algorithm looks at pairs of bits in the
>>>>>> multiplier, this would require looking at more bits.
>>>>>>
>>>>>
>>>>> you are right, I was thinking of "modified Booth" it looks at 3 bits
>>>>> at a time,
>>>>>
>>>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
>>>>>
>>>>>
>>>>>> If you are thinking in terms of constant multiplication then again,
>>>>>> this is a modified method that combines Booth's with straight
>>>>>> adds.
>>>>>>
>>>>>>
>>>>>>> * you need a few multiplexors to choose between x1 and x2,
>>>>>>> subtract is invert and carry in
>>>>>>
>>>>>> Multiplexers are not low cost in any sense in many technologies,
>>>>>> but it doesn't matter.  Booth's algorithm doesn't use
>>>>>> multiplexers.
>>>>>
>>>>> may not be low cost, but compared to a full adder?
>>>>
>>>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
>>>> Because there is extra carry logic in the LC one bit of an adder is the
>>>> same logic as one bit of a multiplexer.  The only disadvantage of an
>>>> adder is the carry propagation time, but they don't cascade, properly
>>>> designed you end up with one ripple cascade through one adder and then
>>>> the other logic delays as the adders all cascade in parallel.
>>>
>>> sure most fpgas have fast carry chains or build in multipliers so hand
>>> coding "fancy" multiplier structures might not come out ahead
>>>
>>>>
>>>>
>>>>> and since the inputs come from the multiplicand and the multiplier
>>>>> not from other intermediate results it shouldn't be in the critical
>>>>> path
>>>>
>>>> ???
>>>>
>>>> I don't see how you use multiplexers instead of adders.  If the
>>>> multiplier changes from one calculation to the next you need adders in
>>>> every position.  If the multiplier is fixed the combinations of sums is
>>>> fixed and no multiplexers are needed.
>>>
>>> with a regular multiplier you have to go through N layers of adders
>>> with a modified Booth you only have to go through N/2 adders
>>
>> and N/2 multiplexers which have the same delay... PLUS the selectable
>> inverter which may or may not be combined with the adder.  What's the
>> point?
>
> the multiplexers get their input from the multiplicant not the output
> of the previous adder

Look at your diagram again.  The multiplexer is muxing the output of the 
previous stage or the output of the stage prior to that if the previous 
stage is skipped.  The multiplicand is the other input to the adder and 
the control signal is derived from the multiplier which means there is 
an added level of delay into the first stage with then has to ripple 
through the entire structure.  The multiplexers *must* be in the path of 
the additions.



  Multiplicand                                 Multiplier
       |                                            |
       +--------------------+                       |
       |                    |                       |
       |                +-------+                   |
       |                |       |     Control       |
       +----------+     | +/-/0 |-------------------+
       |          |     |       |                   |
       |          |     +-------+                   |
       |          |         |                       |
       |          |         +------+                |
       |          |         |      |                |
       |      +-------+ +-------+  |   +--------+   |
       |       \       V       /   |   |        |   |
       |        \         +/- /----)---| Decode |---+
       |         \    ADD    /     |   |        |   |
       |          \_________/      |   +--------+   |
       |               |           |                |
       |               |         +-+                |
       |               |         |                  |
       |            +---------------+  +--------+   |
       |             \             /   |        |   |
       +----------+   \    MUX    /----| Decode |---+
       |          |    \         /     |        |   |
       |          |     +-------+      +--------+   |
       |          |         |                       |
       |          |         +------+                |
       |          |         |      |                |
       |      +-------+ +-------+  |   +--------+   |
       |       \       V       /   |   |        |   |
       |        \         +/- /----)---| Decode |---+
       |         \    ADD    /     |   |        |   |
       |          \_________/      |   +--------+   |
       |               |           |                |
       |               |         +-+                |
       |               |         |                  |
       |            +---------------+  +--------+   |
       |             \             /   |        |   |
       +----------+   \    MUX    /----| Decode |---+
       |          |    \         /     |        |   |
       |          |     +-------+      +--------+   |
       |          |         |                       |



>> A simple adder has N stages of delay in an FPGA, same as the
>> much more complicated modified Booth's adder.  In an ASIC there may be
>> some advantage.  In software, I expect the much more complicated control
>> will make the modified Booth's algorithm the slowest of the three.
>
> agreed
>
>> People so often forget that multiplexers are not trivial logic.
>
> I think the main purpose of modified booth is speed not size

I'm not sure how much faster a multiplexer is compared to the adder. 
Even if you bypass a bunch of adders, you pass through an equivalent 
number of muxes.  In an FPGA this is equivalent in speed.  In an ASIC 
you may see some speed advantage.

-- 

Rick C

Article: 159219
Subject: Re: Minimal-operation shift-and-add (or subtract)
From: lasselangwadtchristensen@gmail.com
Date: Sat, 3 Sep 2016 16:44:15 -0700 (PDT)
Links: << >> << T >> << A >>

Den s=C3=B8ndag den 4. september 2016 kl. 00.10.56 UTC+2 skrev rickman:
> On 9/3/2016 8:46 AM, lasselangwadtchristensen@gmail.com wrote:
> > Den l=C3=B8rdag den 3. september 2016 kl. 03.45.13 UTC+2 skrev rickman:
> >> On 9/2/2016 8:08 PM, lasselangwadtchristensen@gmail.com wrote:
> >>> Den l=C3=B8rdag den 3. september 2016 kl. 01.37.56 UTC+2 skrev rickma=
n:
> >>>> On 9/2/2016 4:01 PM, lasselangwadtchristensen@gmail.com wrote:
> >>>>> Den fredag den 2. september 2016 kl. 08.39.20 UTC+2 skrev rickman:
> >>>>>> On 9/1/2016 7:22 PM, lasselangwadtchristensen@gmail.com wrote:
> >>>>>>> Den fredag den 2. september 2016 kl. 00.40.28 UTC+2 skrev
> >>>>>>> rickman:
> >>>>>>>> On 9/1/2016 4:24 PM, Tim Wescott wrote:
> >>>>>>>>> On Thu, 01 Sep 2016 13:19:09 -0700, lasselangwadtchristensen
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Den torsdag den 1. september 2016 kl. 21.53.33 UTC+2 skrev
> >>>>>>>>>> Tim Wescott:
> >>>>>>>>>>> There's a method that I know, but can't remember the
> >>>>>>>>>>> name. And now I want to tell someone to Google for it.
> >>>>>>>>>>>
> >>>>>>>>>>> It basically starts with the notion of multiplying by
> >>>>>>>>>>> shift-and-add, but uses the fact that if you shift and
> >>>>>>>>>>> then either add or subtract, you can minimize "addition"
> >>>>>>>>>>> operations.
> >>>>>>>>>>>
> >>>>>>>>>>> I.e., 255 =3D 256 - 1, 244 =3D 256 - 16 + 4, etc.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>> Booth?
> >>>>>>>>>>
> >>>>>>>>>> -Lasse
> >>>>>>>>>
> >>>>>>>>> That's it.  Thanks.
> >>>>>>>>
> >>>>>>>> That is very familiar from college, but I don't recall the
> >>>>>>>> utility. It would be useful for multiplying by constants, but
> >>>>>>>> otherwise how would this be used to advantage?  It would save
> >>>>>>>> add/subtract operations, but I can't think of another situation
> >>>>>>>> where this would be useful.
> >>>>>>>>
> >>>>>>>> If the algorithm is doing an add and shift, the add does not
> >>>>>>>> increase the time or the hardware.  If building the full
> >>>>>>>> multiplier, an adder is included for each stage, it is either
> >>>>>>>> used or not used.  When done in software, the same applies.  It
> >>>>>>>> is easier to do the add than to skip over it.
> >>>>>>>
> >>>>>>> you only need half the stages so it is half the size* and the
> >>>>>>> critical path through your adders are only half as long
> >>>>>>
> >>>>>> I think you need to look at the algorithm again.  The degenerate
> >>>>>> case is a multiplier with alternating ones and zeros.  An add or
> >>>>>> subtract is needed at each 1->0 or 0->1 transition.  Since every
> >>>>>> bit is a transition you still need an adder/subtractor for every
> >>>>>> bit.
> >>>>>>
> >>>>>> Of course you could add logic to detect these cases and do fewer
> >>>>>> adder ops, but then that is not Booth's algorithm anymore and is
> >>>>>> much more complex.  Booth's algorithm looks at pairs of bits in th=
e
> >>>>>> multiplier, this would require looking at more bits.
> >>>>>>
> >>>>>
> >>>>> you are right, I was thinking of "modified Booth" it looks at 3 bit=
s
> >>>>> at a time,
> >>>>>
> >>>>> http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif
> >>>>>
> >>>>>
> >>>>>> If you are thinking in terms of constant multiplication then again=
,
> >>>>>> this is a modified method that combines Booth's with straight
> >>>>>> adds.
> >>>>>>
> >>>>>>
> >>>>>>> * you need a few multiplexors to choose between x1 and x2,
> >>>>>>> subtract is invert and carry in
> >>>>>>
> >>>>>> Multiplexers are not low cost in any sense in many technologies,
> >>>>>> but it doesn't matter.  Booth's algorithm doesn't use
> >>>>>> multiplexers.
> >>>>>
> >>>>> may not be low cost, but compared to a full adder?
> >>>>
> >>>> In an FPGA the unit of size is the Logic Cell (LC), a LUT and a FF.
> >>>> Because there is extra carry logic in the LC one bit of an adder is =
the
> >>>> same logic as one bit of a multiplexer.  The only disadvantage of an
> >>>> adder is the carry propagation time, but they don't cascade, properl=
y
> >>>> designed you end up with one ripple cascade through one adder and th=
en
> >>>> the other logic delays as the adders all cascade in parallel.
> >>>
> >>> sure most fpgas have fast carry chains or build in multipliers so han=
d
> >>> coding "fancy" multiplier structures might not come out ahead
> >>>
> >>>>
> >>>>
> >>>>> and since the inputs come from the multiplicand and the multiplier
> >>>>> not from other intermediate results it shouldn't be in the critical
> >>>>> path
> >>>>
> >>>> ???
> >>>>
> >>>> I don't see how you use multiplexers instead of adders.  If the
> >>>> multiplier changes from one calculation to the next you need adders =
in
> >>>> every position.  If the multiplier is fixed the combinations of sums=
 is
> >>>> fixed and no multiplexers are needed.
> >>>
> >>> with a regular multiplier you have to go through N layers of adders
> >>> with a modified Booth you only have to go through N/2 adders
> >>
> >> and N/2 multiplexers which have the same delay... PLUS the selectable
> >> inverter which may or may not be combined with the adder.  What's the
> >> point?
> >
> > the multiplexers get their input from the multiplicant not the output
> > of the previous adder
>=20
> Look at your diagram again.  The multiplexer is muxing the output of the=
=20
> previous stage or the output of the stage prior to that if the previous=
=20
> stage is skipped.  The multiplicand is the other input to the adder and=
=20
> the control signal is derived from the multiplier which means there is=20
> an added level of delay into the first stage with then has to ripple=20
> through the entire structure.  The multiplexers *must* be in the path of=
=20
> the additions.

the multiplexers can be moved to multiplicand, adding zero is the same as s=
kipping

http://www.ellab.physics.upatras.gr/~bakalis/Eudoxus/mbm8.gif

>=20
> >> A simple adder has N stages of delay in an FPGA, same as the
> >> much more complicated modified Booth's adder.  In an ASIC there may be
> >> some advantage.  In software, I expect the much more complicated contr=
ol
> >> will make the modified Booth's algorithm the slowest of the three.
> >
> > agreed
> >
> >> People so often forget that multiplexers are not trivial logic.
> >
> > I think the main purpose of modified booth is speed not size
>=20
> I'm not sure how much faster a multiplexer is compared to the adder.=20
> Even if you bypass a bunch of adders, you pass through an equivalent=20
> number of muxes. =20

I'm still talking modified Booth, it halfs the number of layers
=20
-Lasse

Article: 159220
Subject: eliminating a DDS
From: John Larkin <jjlarkin@highlandtechnology.com>
Date: Sun, 04 Sep 2016 11:11:32 -0700
Links: << >> << T >> << A >>



I have a design that will use a DDS synthesizer to generate an
internal trigger rate for a pulse generator. The chip will be a ZYNQ
7020. The required upper frequency limit is maybe 20 MHz. The FPGA
will have the usual, 48 bit or so, phase accumulator and sine lookup
stuff clocked at maybe 100 MHz. The FPGA drives a fast DAC which in
turn drives an LC lowpass filter and a comparator. Standard stuff.

But could such a clock be generated entirely inside the FPGA?

Just using the MSB of the DDS phase accumulator works, but it will
have one full clock, 10 ns p-p, of jitter. That will be ugly at 20
MHz. I've got to look into some sort of outboard analog filtering to
clean up that single-bit clock, but I'm not optimistic. DDS is just
too weird.

Do you suppose that one of the FPGA PLLs be used to clean up the DDS
clock, scrub the jitter somehow? That could maybe be used over a
modest range, octave maybe, followed by some dividers.

Any other ideas for making a programmable-frequency clock with DDS
sort of resolution, but without all that outboard analog stuff?

I've been playing with sorta DDS in LT Spice, using a quantizer to
approximate the DDS accumulator and DAC, but that's obviously not the
best tool for this.



-- 

John Larkin         Highland Technology, Inc

lunatic fringe electronics

Article: 159221
Subject: Re: eliminating a DDS
From: Mike Perkins <spam@spam.com>
Date: Sun, 4 Sep 2016 20:13:04 +0100
Links: << >> << T >> << A >>

On 04/09/2016 19:11, John Larkin wrote:
>
>
> I have a design that will use a DDS synthesizer to generate an
> internal trigger rate for a pulse generator. The chip will be a ZYNQ
> 7020. The required upper frequency limit is maybe 20 MHz. The FPGA
> will have the usual, 48 bit or so, phase accumulator and sine lookup
> stuff clocked at maybe 100 MHz. The FPGA drives a fast DAC which in
> turn drives an LC lowpass filter and a comparator. Standard stuff.
>
> But could such a clock be generated entirely inside the FPGA?
>
> Just using the MSB of the DDS phase accumulator works, but it will
> have one full clock, 10 ns p-p, of jitter. That will be ugly at 20
> MHz. I've got to look into some sort of outboard analog filtering to
> clean up that single-bit clock, but I'm not optimistic. DDS is just
> too weird.
>
> Do you suppose that one of the FPGA PLLs be used to clean up the DDS
> clock, scrub the jitter somehow? That could maybe be used over a
> modest range, octave maybe, followed by some dividers.

That isn't how FPGA PLLs work. They add jitter rather than removing it!

> Any other ideas for making a programmable-frequency clock with DDS
> sort of resolution, but without all that outboard analog stuff?
>
> I've been playing with sorta DDS in LT Spice, using a quantizer to
> approximate the DDS accumulator and DAC, but that's obviously not the
> best tool for this.

The jitter of a clock derived from within a FPGA would simply be related 
to the clock frequency used.

If you use a 250MHz clock, as per the max frequency of many cheap FPGAs, 
then jitter will be 4ns (+ a small bit).

What jitter spec are you looking for? What is the range of frequencies 
you require?

Would a VCO / PLL be a better bet to filter the digital jitter, using 
the MSB of your phase accumulator as the reference?

-- 
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk

Article: 159222
Subject: Re: eliminating a DDS
From: John Larkin <jjlarkin@highlandtechnology.com>
Date: Sun, 04 Sep 2016 12:25:23 -0700
Links: << >> << T >> << A >>

On Sun, 4 Sep 2016 20:13:04 +0100, Mike Perkins <spam@spam.com> wrote:

>On 04/09/2016 19:11, John Larkin wrote:
>>
>>
>> I have a design that will use a DDS synthesizer to generate an
>> internal trigger rate for a pulse generator. The chip will be a ZYNQ
>> 7020. The required upper frequency limit is maybe 20 MHz. The FPGA
>> will have the usual, 48 bit or so, phase accumulator and sine lookup
>> stuff clocked at maybe 100 MHz. The FPGA drives a fast DAC which in
>> turn drives an LC lowpass filter and a comparator. Standard stuff.
>>
>> But could such a clock be generated entirely inside the FPGA?
>>
>> Just using the MSB of the DDS phase accumulator works, but it will
>> have one full clock, 10 ns p-p, of jitter. That will be ugly at 20
>> MHz. I've got to look into some sort of outboard analog filtering to
>> clean up that single-bit clock, but I'm not optimistic. DDS is just
>> too weird.
>>
>> Do you suppose that one of the FPGA PLLs be used to clean up the DDS
>> clock, scrub the jitter somehow? That could maybe be used over a
>> modest range, octave maybe, followed by some dividers.
>
>That isn't how FPGA PLLs work. They add jitter rather than removing it!

We did one clock X2 multiply with a Xilinx DLL, 40 ==> 80 MHz, and the
resulting clock period was bimodal, about 80 ps or so. Ugly. The
actual jitter, ignoring the bimode, wasn't bad.

>
>> Any other ideas for making a programmable-frequency clock with DDS
>> sort of resolution, but without all that outboard analog stuff?
>>
>> I've been playing with sorta DDS in LT Spice, using a quantizer to
>> approximate the DDS accumulator and DAC, but that's obviously not the
>> best tool for this.
>
>The jitter of a clock derived from within a FPGA would simply be related 
>to the clock frequency used.
>
>If you use a 250MHz clock, as per the max frequency of many cheap FPGAs, 
>then jitter will be 4ns (+ a small bit).
>
>What jitter spec are you looking for? What is the range of frequencies 
>you require?

Picoseconds of period jitter would be nice!

>
>Would a VCO / PLL be a better bet to filter the digital jitter, using 
>the MSB of your phase accumulator as the reference?

Or maybe a filter and comparator? Instinct suggests that would be
mediocre. The real advantage of outboard DDS-MSB clock cleanup is all
those DAC pins saved; probably no cost advantage.

This was a longshot question, just to see if there was some clever
trick lurking somewhere.



-- 

John Larkin         Highland Technology, Inc

lunatic fringe electronics

Article: 159223
Subject: Re: eliminating a DDS
From: lasselangwadtchristensen@gmail.com
Date: Sun, 4 Sep 2016 13:23:46 -0700 (PDT)
Links: << >> << T >> << A >>

Den s=C3=B8ndag den 4. september 2016 kl. 20.11.40 UTC+2 skrev John Larkin:
> I have a design that will use a DDS synthesizer to generate an
> internal trigger rate for a pulse generator. The chip will be a ZYNQ
> 7020. The required upper frequency limit is maybe 20 MHz. The FPGA
> will have the usual, 48 bit or so, phase accumulator and sine lookup
> stuff clocked at maybe 100 MHz. The FPGA drives a fast DAC which in
> turn drives an LC lowpass filter and a comparator. Standard stuff.
>=20
> But could such a clock be generated entirely inside the FPGA?
>=20
> Just using the MSB of the DDS phase accumulator works, but it will
> have one full clock, 10 ns p-p, of jitter. That will be ugly at 20
> MHz. I've got to look into some sort of outboard analog filtering to
> clean up that single-bit clock, but I'm not optimistic. DDS is just
> too weird.
>=20
> Do you suppose that one of the FPGA PLLs be used to clean up the DDS
> clock, scrub the jitter somehow? That could maybe be used over a
> modest range, octave maybe, followed by some dividers.
>=20
> Any other ideas for making a programmable-frequency clock with DDS
> sort of resolution, but without all that outboard analog stuff?
>=20
> I've been playing with sorta DDS in LT Spice, using a quantizer to
> approximate the DDS accumulator and DAC, but that's obviously not the
> best tool for this.
>=20

you might be able to do something with the clock manager PLL, it does have=
=20
a jitter filter mode but I haven't had any reason to look at how it works

other trickery you could do is=20

use an DDR output flop to get double resolution or more with a faster clock
use some trickery with pll and serdes output=20

-Lasse

Article: 159224
Subject: Re: eliminating a DDS
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Sun, 4 Sep 2016 22:30:15 +0100
Links: << >> << T >> << A >>

On 04/09/16 19:11, John Larkin wrote:

> Do you suppose that one of the FPGA PLLs be used to clean up the DDS
> clock, scrub the jitter somehow? That could maybe be used over a
> modest range, octave maybe, followed by some dividers.

*IIRC* you have to use the Xilinx clocking wizard (part
of ISE or Vivado), and the SERDES blocks' multipliers have
a jitter of 150-200ps.

Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources

Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Custom Search