Messages from 156425

Article: 156425
Subject: Tristates in synthesis
From: Philipp Klaus Krause <pkk@spth.de>
Date: Fri, 04 Apr 2014 14:17:12 +0200

How well can the synthesis tools deal with tristates?

Suppose I use the following Verilog code for a Xilinx CPLD, with t as the
top-level module and data_io and addr_i connected to I/O ports. Will
this work as intended?

module b(data_io, addr_i);
	inout[6:0] data_io;
	input[12:0] addr_i;

	assign data_io = (addr_i == 321) ? 7'b1111111 : 7'bZZZZZZZ;
endmodule

module i(data_io, addr_i);
	inout[1:0] data_io;
	input[12:0] addr_i;

	assign data_io = (addr_i == 123) ? 2'b00 : 2'bZZ;
endmodule

module t(data_io, addr_i);
	inout[6:0] data_io;
	input[12:0] addr_i;

	b b(data_io[6:0], addr_i);
	i i(data_io[1:0], addr_i);
endmodule

Article: 156426
Subject: Re: Tristates in synthesis
From: GaborSzakacs <gabor@alacron.com>
Date: Fri, 04 Apr 2014 09:14:49 -0400

Philipp Klaus Krause wrote:
> How well can the synthesis tools deal with tristates?
> 
> If I use the following Verilog code for a Xilinx CPLD, with t as the
> top-level module, and data_io and addr_i connected to I/O ports. Will
> this work as intended?
> 
> module b(data_io, addr_i);
> 	inout[6:0] data_io;
> 	input[12:0] addr_i;
> 
> 	assign data_io = (addr_i == 321) ? 7'b1111111 : 7'bZZZZZZZ;
> endmodule
> 
> module i(data_io, addr_i);
> 	inout[1:0] data_io;
> 	input[12:0] addr_i;
> 
> 	assign data_io = (addr_i == 123) ? 2'b00 : 2'bZZ;
> endmodule
> 
> module t(data_io, addr_i);
> 	inout[6:0] data_io;
> 	input[12:0] addr_i;
> 
> 	b b(data_io[6:0], addr_i);
> 	i i(data_io[1:0], addr_i);
> endmodule

And did you try it?

I've never had a problem with external (IOB) tristates with Xilinx
tools.  Internal tristates are not recommended, even though in some
cases the tools infer similar logic.  The last devices that had
internal tristates (Virtex and Virtex E, Spartan 2 and 2e) are
either past end of life or not recommended for new design.
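
As a rough illustration of the "similar logic" point, the posted design
could be flattened so that the only tristate drivers sit at the I/O pins
and the choice between sources is an ordinary mux. This is a sketch under
assumptions, not a tested CPLD design: the names t_flat, sel_b and sel_i
are made up for the example, and the address constants are copied from
the question.

```verilog
// Sketch only: t_flat, sel_b and sel_i are illustrative names.
module t_flat(data_io, addr_i);
	inout  [6:0]  data_io;
	input  [12:0] addr_i;

	// Address decodes taken from modules b and i in the question.
	wire sel_b = (addr_i == 13'd321);
	wire sel_i = (addr_i == 13'd123);

	// Bits [6:2] are driven only when b is selected.
	assign data_io[6:2] = sel_b ? 5'b11111 : 5'bzzzzz;

	// Bits [1:0] are shared: b drives ones, i drives zeros, and the
	// pins float otherwise. The two addresses differ, so at most one
	// source is active and the selection reduces to a small mux.
	assign data_io[1:0] = sel_b ? 2'b11 :
	                      sel_i ? 2'b00 : 2'bzz;
endmodule
```

Written this way, every tristate maps directly onto an output enable of
an I/O pin, which is the case that the tools handle without trouble.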

-- 
Gabor

Article: 156427
Subject: Re: Tristates in synthesis
From: al.basili@gmail.com (alb)
Date: 4 Apr 2014 14:54:11 GMT

Hi Philipp,

Philipp Klaus Krause <pkk@spth.de> wrote:
> How well can the synthesis tools deal with tristates?

It is possible that the tool vendor provides recommendations on how to
describe tristates. Check their coding guidelines.

AFAIK tristates can be easily handled if connected directly to I/O, since
the I/O cells generally support this mode. But I do use them at lower
levels as well, to describe bus access for instance, since I find it more
readable than describing muxes myself.

It works, if you follow tool vendor coding guidelines. And if it does 
not, you may blame them! :-)

Al

Article: 156428
Subject: Re: Simulation deltas
From: matt.lettau@gmail.com
Date: Fri, 4 Apr 2014 08:49:01 -0700 (PDT)

On Thursday, April 3, 2014 6:01:34 AM UTC-7, Carl wrote:
> Hi,
>
> This question deals both with an actual problem, and with some more conceptual thoughts on simulation deltas and how an RTL entity should behave with regards to this.
>
> This post regards the case of a simulation with ideal time - that is, no delays (in time) modelled, rather trusting only simulation deltas for the ordering of events.
>
> *Conceptual*
>
> I would argue that for a well-behaved synchronous RTL entity, the following must be true:
>
> *All readings of the input ports must be made *on* the delta of the rising flank of the clock - not one or any other number of deltas after that.*
>
> Would people agree on that?
>
> It follows from the possibility of other logic, hierarchically above the entity in question, altering the input ports as little as one delta after the rising flank. That must be allowed.
>
> *My actual problem*
>
> After a lot of debugging of one of my simulations, I found a Xilinx simulation primitive (IDELAYE2 in Unisim) *not* adhering to the statement in the previous section, which had caused all the problems.
>
> See the signals plotted here:
> http://www.fpga-dev.com/misc/deltaDelayProblem.png
>
> It's enough to focus on the "ports" section. The ports are:
> - c: in, the clock
> - cntValueIn: in
> - ld: in, writeEnable for writing cntValueIn to an internal register
> - cntValueOut: out, giving the contents of that register
>
> As can be seen, my 'ld' operation is de-asserted one delta after the rising flank. I argue this should be OK, but it is obvious that the data is never written (cntValueOut remains 0). If I delay the de-assertion of 'ld' just one more delta, the write *does* take effect as desired.
>
> I would argue this is a (serious) flaw of the Xilinx primitive. Would people agree on that as well?
>
> (The following is not central for the above discussion, may be skipped.)
>
> I have checked the actual reason for the problem. See the "internals" section of the signals. First, Xilinx delays both the clock and the ports to the *_dly signals. Fully OK, if from now on operating on the delayed signals. The problem is that the process writing to the internal register is not clocked by c_dly, but by another signal, c_in, which is delayed *one more* delta. This causes my requested 'ld' to be missed. (c_in is driven from c_dly in another process, inverting the clock input if the user has requested that.)
>
> I argue that synchronous entities must be modelled in such a way that all processes reading input ports *must* be clocked directly by the input clock port - not by some derived signal that is lagging (if only by one delta). If this is not possible, the input ports being read must be delayed accordingly. In this case, if Xilinx wishes to conditionally invert the clock like this, causing another delta of delay, the input ports must also be delayed the corresponding number of deltas.
>
> Cheers,
> Carl

I would agree with Kevin's assessment and offer an easy solution. As soon as you involve vendor supplied models you might as well just assume that they are not purely behavioral in the sense you are describing. The easy way to deal with this is to move edges of stimulus signals in test benches to the falling edge of the clock, and to ensure your clock is running in simulation at an appropriate time period as it would in the real hardware.
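
A minimal sketch of that falling-edge approach, written in Verilog to
match the code elsewhere on this page. Everything here is assumed for
illustration: the testbench name, the 10 ns clock period, and the value
written are not from the original post, and the device under test is
left out, so the block only shows where the stimulus edges land.

```verilog
`timescale 1ns / 1ps

// Illustrative testbench: the names, the 10 ns period and the value 17
// are assumptions. The device under test is omitted.
module tb_negedge_stimulus;
	reg       clk          = 1'b0;
	reg       ld           = 1'b0;
	reg [4:0] cnt_value_in = 5'd0;

	// Free-running clock with a realistic, non-zero period.
	always #5 clk = ~clk;

	initial begin
		// Change stimulus on falling edges only, so every input is
		// stable for half a period on either side of the rising edge.
		@(negedge clk);
		cnt_value_in = 5'd17;
		ld           = 1'b1;

		@(negedge clk);
		ld = 1'b0;

		@(negedge clk);
		$finish;
	end
endmodule
```

With the inputs changing half a period away from the sampling edge, an
extra delta or two inside the model's clock path no longer matters.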

Article: 156429
Subject: Re: Simulation deltas
From: GaborSzakacs <gabor@alacron.com>
Date: Fri, 04 Apr 2014 12:01:33 -0400

matt.lettau@gmail.com wrote:
> On Thursday, April 3, 2014 6:01:34 AM UTC-7, Carl wrote:
>> Hi,
>>
>> This question deals both with an actual problem, and with some more conceptual thoughts on simulation deltas and how an RTL entity should behave with regards to this.
>>
>> This post regards the case of a simulation with ideal time - that is, no delays (in time) modelled, rather trusting only simulation deltas for the ordering of events.
>>
>> *Conceptual*
>>
>> I would argue that for a well-behaved synchronous RTL entity, the following must be true:
>>
>> *All readings of the input ports must be made *on* the delta of the rising flank of the clock - not one or any other number of deltas after that.*
>>
>> Would people agree on that?
>>
>> It follows from the possibility of other logic, hierarchically above the entity in question, altering the input ports as little as one delta after the rising flank. That must be allowed.
>>
>> *My actual problem*
>>
>> After a lot of debugging of one of my simulations, I found a Xilinx simulation primitive (IDELAYE2 in Unisim) *not* adhering to the statement in the previous section, which had caused all the problems.
>>
>> See the signals plotted here:
>> http://www.fpga-dev.com/misc/deltaDelayProblem.png
>>
>> It's enough to focus on the "ports" section. The ports are:
>> - c: in, the clock
>> - cntValueIn: in
>> - ld: in, writeEnable for writing cntValueIn to an internal register
>> - cntValueOut: out, giving the contents of that register
>>
>> As can be seen, my 'ld' operation is de-asserted one delta after the rising flank. I argue this should be OK, but it is obvious that the data is never written (cntValueOut remains 0). If I delay the de-assertion of 'ld' just one more delta, the write *does* take effect as desired.
>>
>> I would argue this is a (serious) flaw of the Xilinx primitive. Would people agree on that as well?
>>
>> (The following is not central for the above discussion, may be skipped.)
>>
>> I have checked the actual reason for the problem. See the "internals" section of the signals. First, Xilinx delays both the clock and the ports to the *_dly signals. Fully OK, if from now on operating on the delayed signals. The problem is that the process writing to the internal register is not clocked by c_dly, but by another signal, c_in, which is delayed *one more* delta. This causes my requested 'ld' to be missed. (c_in is driven from c_dly in another process, inverting the clock input if the user has requested that.)
>>
>> I argue that synchronous entities must be modelled in such a way that all processes reading input ports *must* be clocked directly by the input clock port - not by some derived signal that is lagging (if only by one delta). If this is not possible, the input ports being read must be delayed accordingly. In this case, if Xilinx wishes to conditionally invert the clock like this, causing another delta of delay, the input ports must also be delayed the corresponding number of deltas.
>>
>> Cheers,
>> Carl
> 
> I would agree with Kevin's assessment and offer an easy solution. As soon as you involve vendor supplied models you might as well just assume that they are not purely behavioral in the sense you are describing. The easy way to deal with this is to move edges of stimulus signals in test benches to the falling edge of the clock, and to ensure your clock is running in simulation at an appropriate time period as it would in the real hardware.

The problem with that approach is that the vendor IP is driven by user
IP and not the test bench directly.  You certainly don't want the
user IP (for synthesis) working on the opposite clock edge.  In the
past I have worked around the Xilinx model issues by adding unit delays
in the code that instantiates it, but even that leaves a bad taste in
my mouth, as it shouldn't be necessary for behavioral simulation.

-- 
Gabor

Article: 156430
Subject: Re: Simulation deltas
From: KJ <kkjennings@sbcglobal.net>
Date: Fri, 4 Apr 2014 13:54:05 -0700 (PDT)

On Friday, April 4, 2014 12:01:33 PM UTC-4, Gabor wrote:
> The problem with that approach is that the vendor IP is driven by user
> IP and not the test bench directly.

I didn't see anything in the OP indicating whether the driving signals were testbench or design...but you could be right.

>  You certainly don't want the
> user IP (for synthesis) working on the opposite clock edge.  In the
> past I have worked around the Xilinx model issues by adding unit delays
> in the code that instantiates it, but even that leaves a bad taste in
> my mouth, as it shouldn't be necessary for behavioral simulation.

Again the way to fight a model that tries to model reality is with more 'reality' of your own.  Make the assignments that assign to signals that connect with the primitive be delayed by 1 ns (i.e. "a <= b after 1 ns;").  Synthesis tools ignore the 'after' clause, sim does not.
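
The quoted assignment is VHDL. For readers working in Verilog, like the
code at the top of this page, a rough counterpart is an intra-assignment
delay on a nonblocking assignment. The sketch below is only an
illustration: the module and port names are invented, and the claim that
the delay is dropped by synthesis reflects how synthesis tools generally
treat such delays, so check the tool's own documentation.

```verilog
`timescale 1ns / 1ps

// Illustrative module: drive_with_delay, clk, b and a are made-up names.
module drive_with_delay(clk, b, a);
	input  clk;
	input  b;
	output a;
	reg    a;

	// b is sampled at the rising edge and a is updated 1 ns later.
	// Simulators honor the #1; synthesis tools generally ignore it.
	always @(posedge clk)
		a <= #1 b;
endmodule
```

Any signal produced this way changes 1 ns after the clock edge in
simulation, so a model that samples a delta or two late still sees the
old value.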

I agree that you shouldn't have to do this when you're simulating the original design sources (but I thought he was simulating a post-route design being driven by a testbench).  It's ugly, but I guess that is part of the baggage that comes with Brand X...maybe switch to Brand A and see if the laundry comes out cleaner.

Kevin Jennings

Article: 156431
Subject: Re: Tristates in synthesis
From: Brane2 <brankob@avtomatika.com>
Date: Fri, 4 Apr 2014 22:27:48 -0700 (PDT)

On Friday, 4 April 2014 13:14:49 UTC, Gabor wrote:

<SNIP>

> The last devices that had
> internal tristates (Virtex and Virtex E, Spartan 2 and 2e) are
> either past end of life or not recommended for new design.

Does anyone know why tristates were banned from newer devices?
Is it about signal integrity, some kind of leakage on the floating nets, or something different altogether?

Article: 156432
Subject: Re: Tristates in synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sat, 5 Apr 2014 05:58:08 +0000 (UTC)

Brane2 <brankob@avtomatika.com> wrote:

(snip)
> Does anyone know why were tristates banned from newer devices ?
> Is it about signal integrity, some kind of leakages on the 
> floating nets or something different alltogether ?

Scaling.

When the wires get smaller and longer, they require buffers along
the way to drive fast signals the full length. The buffers only go
one direction.

-- glen

Article: 156433
Subject: Re: Tristates in synthesis
From: rickman <gnuarm@gmail.com>
Date: Sat, 05 Apr 2014 04:20:53 -0400

On 4/5/2014 1:27 AM, Brane2 wrote:
> On Friday, 4 April 2014 13:14:49 UTC, Gabor wrote:
>
> <SNIP>
>
>> The last devices that had
>> internal tristates (Virtex and Virtex E, Spartan 2 and 2e) are
>> either past end of life or not recommended for new design.
>
> Does anyone know why were tristates banned from newer devices ?
> Is it about signal integrity, some kind of leakages on the floating nets or something different alltogether ?

Tristates are just not a good solution to most problems in FPGAs.  They 
were called "long lines" because that is what they were, very long wires 
in the part which use up a lot of real estate and slow down a signal. 
Considering that they use space on every part when most designers don't 
use them, they decided to give them the boot and free the space for more 
logic which can do the same job and runs faster anyway.

-- 

Rick

Article: 156434
Subject: Re: Simulation deltas
From: rickman <gnuarm@gmail.com>
Date: Sat, 05 Apr 2014 13:02:16 -0400

On 4/3/2014 1:17 PM, KJ wrote:
> On Thursday, April 3, 2014 10:42:56 AM UTC-4, Carl wrote:
>> I don't really get what your two points mean in this context. I do understand
>> and agree on the literal meaning of them.
>>
>> I don't think those points necessarily address my issue. My issue doesn't only
>> relate to causality. The main problem is to determine *exactly when something
>> is sampled*.
>>
>> Since you don't agree with the statement however; how then should synchronous
>> elements communicate with each other? If I clock a unit with 'clk', and I can't
>> expect that unit to sample the input ports (which I drive) on (exactly on,
>> without any delta delays) the rising edge of 'clk', then how long after the
>> edge must I hold the input data stable? One delta? Two, ten? One ps, one ns?
>>
> Actually, I misread a bit your actual question, I do agree that inputs should get sampled on only one simulation delta cycle...and they do.  For some reason, I thought you were talking about outputs being generated.
>
> In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive.  I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct.  Since that primitive is attempting to model reality, there very well would be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality in that they cannot be changing instantaneously either.  Inputs to such a model must meet the setup/hold constraints of the design.

This is a specious argument.  Delta delays are not in any way related to 
physical delays and are intended to deal with issues in the logic of 
simulation, not real world physics.  If the Xilinx primitive is trying 
to model timing delays it has done a pretty durn poor job of it since a 
delta delay is zero simulation time.

> When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues.  This is an invalid assumption if you include parts into your model that model reality where delays do occur.  The model is not wrong in that case, it is your usage of that model.

This model is clearly *not* modeling timing delays.  Just read his 
description of the problem and you will see that.

> Just like on a physical board, on the input side to such a model, you need to insure that you do not violate setup or hold constraints.  If you do, then a physical board will not always work, in a simulation environment your simulation will fail (which is what you're experiencing).  On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).

This discussion is not at all about setup or hold times.  The OP is 
performing functional simulation which is very much like unit delay 
simulation.  The purpose of delta delays is to prevent the order of 
evaluating sequential logic from affecting the outcome.  So the output 
of all logic gets a delta delay (zero simulation time, but logically 
delayed only) so that the output change is indeed causal and can not 
affect other sequential elements on that same clock edge.

In fact, this is the classic problem where a logic element is inserted 
into the clock path for some sequential elements and not others creating 
the exact problem the OP is observing.  Normally, designers know not to 
do this.  I guess someone at Xilinx was out that day in the training class.

-- 

Rick

Article: 156435
Subject: Re: Simulation deltas
From: KJ <kkjennings@sbcglobal.net>
Date: Sat, 5 Apr 2014 12:21:06 -0700 (PDT)

On Saturday, April 5, 2014 1:02:16 PM UTC-4, rickman wrote:
> > In any case, your conceptual question doesn't relate to the problem that you
> > are seeing with the Xilinx primitive.  I have no idea whether it correctly
> > models the primitive or not, but let's assume for a moment that it is
> > correct.  Since that primitive is attempting to model reality, there very
> > well would be a delay between the input clock to that primitive and when
> > that primitive actually samples input signals. If that is the situation,
> > then inputs must also model reality in that they cannot be changing
> > instantaneously either.  Inputs to such a model must meet the setup/hold
> > constraints of the design.

> This is a specious argument.  Delta delays are not in any way related to
> physical delays and are intended to deal with issues in the logic of
> simulation, not real world physics.

Nothing at all specious, it is correct.  If you're connecting to a block that models delays (and the OP's does), then the solution is to model reality as well on the inputs in order to meet setup/hold time as well as to not sample outputs before Tco max.  Whether those delays are caused by the model using delta delays or real time delays does not change the fact that the solution I provided is correct.  It will be correct if the offending model uses delta delays or actual post-route delays.

> > When you're performing functional simulation, there can be an assumption
> > that you can ignore setup/hold time issues.  This is an invalid assumption
> > if you include parts into your model that model reality where delays do
> > occur.  The model is not wrong in that case, it is your usage of that
> > model.

> This model is clearly *not* modeling timing delays.  Just read his
> description of the problem and you will see that.

I did read the post, and there are timing delays.  Just because the delays are simulation deltas does not make them 'not a delay'.  Since the model he is using implements these delays, the user needs to account for that.  If you don't want to account for it, then you should use a different model.

> > Just like on a physical board, on the input side to such a model, you need
> > to insure that you do not violate setup or hold constraints.  If you do,
> > then a physical board will not always work, in a simulation environment
> > your simulation will fail (which is what you're experiencing).  On the
> > output side of a model, you need to make sure that you're not sampling too
> > early (i.e. sooner than the Tco min).

> This discussion is not at all about setup or hold times.  The OP is
> performing functional simulation which is very much like unit delay
> simulation.

I agree that the OP's problem is not about setup or hold times.  The work around/solution I suggested was to add delays in order to conform with setup or hold times, "Just like on a physical board...".  My solution has a direct connection with reality (i.e. a physical board with the design programmed in), other solutions might not.

If you're adding something to work around some problem, you're on much firmer ground if there is an actual basis that can be traced back to specifications.  On the assumption that the external thing connected to the part being worked around is a physical part, ask yourself if adding Tpd and Tco delays to that model makes it closer or farther away from a 'true' model of that part.

Someone else posted that they typically worked around this by changing the inputs to be driven by the opposite edge of the clock.  That probably works also, but again ask yourself does that make the simulation model closer to reality?  Don't think so.

Of course, there is also the possibility that the stuff connecting to the Xilinx primitive is itself internal to the device in which case I suggested adding a 1 ns (or really whatever small non-zero time delay you want).  Again, inside a real device, the output of a flop will not change in zero time so adding a small nominal delay as a work around can be justified as modeling reality.

In any case, the work around you use should have a rational basis for being the way it is.  If the only justification is that 'it was the only way I could get the sim to run' then there is probably a design error that is being covered up, rather than a model limitation that is being worked around.

Kevin Jennings

Article: 156436
Subject: Re: Tristates in synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sun, 6 Apr 2014 09:57:58 +0000 (UTC)

rickman <gnuarm@gmail.com> wrote:

(snip)
> Tristates are just not a good solution to most problems in FPGAs.  

Well, at some point getting the logic right is most important,
and if implementing it with simulated tristates gets it right,
and is more readable, then that is probably better. 

> They were called "long lines" because that is what they were, 
> very long wires in the part which use up a lot of real estate 
> and slow down a signal. 

Above about 0.8u, one could consider a "wire" as an equipotential,
that is, (close enough to) a perfect conductor, with capacitance
to ground and driven by a current source. The capacitance depends
on the width and length. If width is constant, then length.

The delay, then, depends on the total length of the wire, not
the distance between the source and sink. (That is, when there
is more than one source or sink.) The model is a lumped capacitance
driven by a current source. (Proper scaling of MOSFETs decreases
the channel width, length, and oxide thickness in proportion,
keeping the on resistance constant.) Circuits speed up as
capacitance decreases, both from width and length.

As above, the capacitance depends on width and length, but the
resistance depends on width, length, and height. Height scales
with width. As width gets smaller, capacitance decreases in
proportion, but resistance increases quadratically. The model
is now a distributed capacitance and distributed resistance,
which required a big change in all the timing tools.

At some point the delay through the distributed RC wires
gets too long, and intermediate buffers are required. 
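
The scaling argument can be written out as a back-of-the-envelope
estimate. This is a sketch under stated assumptions, not a process
model: the symbols (length L, width w, height h = \alpha w, dielectric
thickness d, resistivity \rho, permittivity \varepsilon, buffer delay
t_buf) are introduced here for illustration, the capacitance uses a
parallel-plate approximation that ignores fringing, and the wire delay
uses the Elmore estimate RC/2 for a distributed line.

```latex
% Assumed symbols: L = wire length, w = width, h = \alpha w = height,
% d = dielectric thickness, \rho = resistivity, \varepsilon = permittivity.
\begin{align}
C &\approx \varepsilon \frac{L\,w}{d}, &
R &= \rho \frac{L}{w\,h} = \rho \frac{L}{\alpha w^{2}}, \\
t_{\text{wire}} &\approx \tfrac{1}{2} R\,C
  = \frac{\rho\,\varepsilon}{2\,\alpha\,d} \cdot \frac{L^{2}}{w}.
\end{align}
% Splitting the wire into N equal segments with N-1 buffers,
% each buffer adding a delay t_buf:
\begin{equation}
t_{\text{total}} \approx N \cdot \frac{\rho\,\varepsilon}{2\,\alpha\,d}
  \cdot \frac{(L/N)^{2}}{w} + (N-1)\,t_{\text{buf}}
  = \frac{t_{\text{wire}}}{N} + (N-1)\,t_{\text{buf}}.
\end{equation}
```

With d held fixed, C falls in proportion to w while R grows as 1/w^2, so
the wire delay grows as 1/w and as the square of the length. Cutting the
wire into N buffered segments turns the quadratic dependence on length
into a roughly linear one, at the cost of the buffer delays, and those
buffers only drive in one direction.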

> Considering that they use space on every part when most 
> designers don't use them, they decided to give them the 
> boot and free the space for more logic which can do the 
> same job and runs faster anyway.

I haven't thought about this for a while. I think the lines
would still be used, but the driver always enabled.

-- glen

Article: 156438
Subject: Re: Simulation deltas
From: rickman <gnuarm@gmail.com>
Date: Sun, 06 Apr 2014 11:42:34 -0400

On 4/5/2014 3:21 PM, KJ wrote:
> On Saturday, April 5, 2014 1:02:16 PM UTC-4, rickman wrote:
>>> In any case, your conceptual question doesn't relate to the problem that you
>>> are seeing with the Xilinx primitive.  I have no idea whether it correctly
>>> models the primitive or not, but let's assume for a moment that it is
>>> correct.  Since that primitive is attempting to model reality, there very
>>> well would be a delay between the input clock to that primitive and when
>>> that primitive actually samples input signals. If that is the situation,
>>> then inputs must also model reality in that they cannot be changing
>>> instantaneously either.  Inputs to such a model must meet the setup/hold
>>> constraints of the design.
>
>> This is a specious argument.  Delta delays are not in any way related to
>> physical delays and are intended to deal with issues in the logic of
>> simulation, not real world physics.
>
> Nothing at all specious, it is correct.  If you're connecting to a block that models delays (and the OP's does), then the solution is to model reality as well on the inputs in order to meet setup/hold time as well as to not sample outputs before Tco max.  Whether those delays are caused by the model using delta delays or real time delays does not change the fact that the solution I provided is correct.  It will be correct if the offending model uses delta delays or actual post-route delays.
>
>>> When you're performing functional simulation, there can be an assumption
>>> that you can ignore setup/hold time issues.  This is an invalid assumption
>>> if you include parts into your model that model reality where delays do
>>> occur.  The model is not wrong in that case, it is your usage of that
>>> model.
>
>> This model is clearly *not* modeling timing delays.  Just read his
>> description of the problem and you will see that.
>
> I did read the post, and there are timing delays.  Just because the delays are simulation deltas does not make them 'not a delay'.  Since the model he is using implements these delays, the user needs to account for that.  If you don't want to account for it, then you should use a different model.

I'm not going to argue with you about this.  The models are wrong by 
conventions of VHDL.  I have seen no evidence that the models are trying 
to simulate timing delays.  A delta delay is *zero* time in the 
simulation.  If they wanted to model timing delays they would use a time 
delay, not delta delays.  The problem with using delta delays is that 
they don't even approximate timing values and they corrupt functional 
simulation as the OP is seeing.  It is a bit absurd to expect users to 
insert delta delays in their code to fake out imagined timing delays of 
0 ns.  There is no utility to this concept.


>>> Just like on a physical board, on the input side to such a model, you need
>>> to insure that you do not violate setup or hold constraints.  If you do,
>>> then a physical board will not always work, in a simulation environment
>>> your simulation will fail (which is what you're experiencing).  On the
>>> output side of a model, you need to make sure that you're not sampling too
>>> early (i.e. sooner than the Tco min).
>
>> This discussion is not at all about setup or hold times.  The OP is
>> performing functional simulation which is very much like unit delay
>> simulation.
>
> I agree that the OP's problem is not about setup or hold times.  The work around/solution I suggested was to add delays in order to conform with setup or hold times, "Just like on a physical board...".  My solution has a direct connection with reality (i.e. a physical board with the design programmed in), other solutions might not.
>
> If you're adding something to work around some problem, you're on much firmer ground if there is an actual basis that can be traced back to specifications.  On the assumption that the external thing connected to the part being worked around is a physical part, ask yourself if adding Tpd and Tco delays to that model makes it closer or farther away from a 'true' model of that part.

But this is not relevant.  I would prefer to add the delta delays where 
needed and to document them as being required to deal with the errors in 
the Xilinx models which is why they are there, not to add timing 
information to a functional simulation which is a bit absurd.


> Someone else posted that they typically worked around this by changing the inputs to be driven by the opposite edge of the clock.  That probably works also, but again ask yourself does that make the simulation model closer to reality?  Don't think so.

I would consider this to be adding an error to work around the Xilinx 
error.


> Of course, there is also the possibility that the stuff connecting to the Xilinx primitive is itself internal to the device in which case I suggested adding a 1 ns (or really whatever small non-zero time delay you want).  Again, inside a real device, the output of a flop will not change in zero time so adding a small nominal delay as a work around can be justified as modeling reality.

Now you are starting to understand delta delays.  That is what VHDL does 
in the simulation.  The output of a sequential element changes 1 delta 
delay after the clock edge.  You are proposing that additional delta 
delays be added by the user to compensate for the delta delays being 
introduced in the clock path by the corrupt Xilinx model.  This is in 
conflict with best design practices.

I feel that Xilinx should have added those delays to the input data path 
so that the rest of the simulation can be written like a standard VHDL 
design.
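
The delta argument above can be put in a few lines of Python. This is only a sketch under stated assumptions, not how any simulator or the Xilinx model is written: `sampled_value`, `clk_path_deltas` and `data_path_deltas` are hypothetical names, the source flop's output is assumed to change exactly one delta after the clock edge, and each extra signal assignment in a path is assumed to cost exactly one delta.

```python
def sampled_value(clk_path_deltas, data_path_deltas, old="old", new="new"):
    """Return what a destination flop captures on one clock edge.

    clk_path_deltas  -- signal assignments between the testbench clock and
                        the clock the flop actually sees (hypothetical knob)
    data_path_deltas -- extra signal assignments added to the data path
    """
    # The source flop's output changes 1 delta after the edge; every extra
    # assignment in the data path pushes the change out by one more delta.
    data_change_delta = 1 + data_path_deltas
    # The destination process runs in the delta where its clock has its event.
    sample_delta = clk_path_deltas
    # Signals are updated before processes run within a delta, so a change
    # that lands in the sampling delta is already visible to the process.
    return new if data_change_delta <= sample_delta else old


print(sampled_value(0, 0))  # old: ordinary RTL, data is captured one cycle later
print(sampled_value(1, 0))  # new: one delta in the clock path, data races through
print(sampled_value(1, 1))  # old: one matching delta in the data path restores it
```

In this model the capture is correct whenever the data path carries at least as many added deltas as the clock path, which is why the count has to be revisited if the model's clock path changes.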


> In any case, the work around you use should have a rational basis for being the way it is.  If the only justification is that 'it was the only way I could get the sim to run' then there is probably a design error that is being covered up, rather than a model limitation that is being worked around.

The rational basis is not "it was the only way I could get the sim to 
run", it is "this is the best way to work around the Xilinx model 
problems".  Ideally the fixes would be added to a wrapper around the 
offending Xilinx code if possible.

-- 

Rick

Article: 156439
Subject: Re: Tristates in synthesis
From: rickman <gnuarm@gmail.com>
Date: Sun, 06 Apr 2014 16:00:38 -0400
Links: << >> << T >> << A >>

On 4/6/2014 5:57 AM, glen herrmannsfeldt wrote:
> rickman <gnuarm@gmail.com> wrote:
>
>> Considering that they use space on every part when most
>> designers don't use them, they decided to give them the
>> boot and free the space for more logic which can do the
>> same job and runs faster anyway.
>
> I haven't thought about this for a while. I think the lines
> would still be used, but the driver always enabled.

Not sure what you mean.  The original tristate busses on FPGAs crossed 
the entire chip.  I seem to recall the last generation of chips with 
tri-state busses only had them across quadrants, but I may be confusing 
this with another feature.  The point is that if you don't need them 
they are entirely wasted (just like any other special feature) but 
unlike other hard logic on the chip the equivalent logic implemented in 
the fabric is not so terribly larger or slower.

I talked about this a number of times with the FAEs and I don't recall 
them mentioning the need for buffers.  I think it was just a chip area 
vs. utility trade off.  There are just a lot of people who think in 
terms of tri-state busses rather than muxed busses so they stuck around 
for some time past their real usefulness.

I remember when I was still in grad school a friend had started work at 
IBM where they were making chips for government sonar work.  They had 
already reached the point of not using tristate busses on chip, 
replacing them with the extra wiring and muxes.  That was some 20 years 
before these features were taken off of FPGAs.

-- 

Rick

Article: 156440
Subject: Re: Tristates in synthesis
From: Theo Markettos <theom+news@chiark.greenend.org.uk>
Date: 06 Apr 2014 21:09:39 +0100 (BST)
Links: << >> << T >> << A >>

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> At some point the delay through the distributed RC wires
> gets too long, and intermediate buffers are required. 

The advantage of switched interconnect, which is what often gets used today
instead of buses, is that the intermediate buffers become registers.  That
means it's possible to have more than one value in flight at once.

With buffers you still have to charge/discharge the whole wire, the buffers
just make it faster than a driver at one end.  But if it's registered the
cycle time can be shorter because you only need to charge the section of line
until the next register.  It'll then pass along in the next (shorter) clock
cycle.  Next cycle you can put something else on the wire.

Of course this means handling the case that data doesn't come back in a
single cycle, but it means your cycles end up a lot faster than they would
otherwise be (well, bottlenecked somewhere else more likely).
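
A rough illustration of "more than one value in flight", written as a small Python model. It assumes an idealised wire cut into `stages` registered segments that all shift once per clock; `run_pipeline` and its arguments are made-up names for this sketch, and no electrical behaviour is modelled.

```python
from collections import deque


def run_pipeline(values, stages):
    """Push one value per cycle into a wire split by `stages` registers.

    Returns a list of (cycle, register contents, output) tuples, where
    output is the value leaving the last register in that cycle.
    """
    regs = deque([None] * stages, maxlen=stages)
    trace = []
    # Keep clocking after the last input so the wire drains completely.
    for cycle, value in enumerate(list(values) + [None] * stages):
        out = regs[-1]          # value held by the last register this cycle
        regs.appendleft(value)  # every register takes its neighbour's value
        trace.append((cycle, list(regs), out))
    return trace


for cycle, regs, out in run_pipeline("ABC", stages=3):
    print(cycle, regs, out)
```

With three segments, the third cycle shows A, B and C all on the wire at once, and each value takes three (short) cycles to arrive, which is the latency the sender has to be prepared for.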

Theo

Article: 156441
Subject: Re: New Lattice FPGAs on 40nm ?
From: rickman <gnuarm@gmail.com>
Date: Sun, 06 Apr 2014 19:10:22 -0400
Links: << >> << T >> << A >>

On 3/5/2014 2:21 AM, Sean Durkin wrote:
> Brane2 wrote:
>> Thanks.
>>
>> Any thoughts on MachXO2/3 ?
>>
>> Are they serious with that or are they retreating to reserve defence line ?
>
> No idea. Those parts weren't mentioned in our discussions with them.
>
> I'd recommend contacting your local sales rep and/or Lattice FAE, but in
> our case they also didn't know what hit them when ECP4 was cancelled, so
> I'm not really sure how much sales reps and FAEs are "in the loop"
> nowadays...

I am a Lattice user of the XP family which was obsoleted recently.  I'm 
not happy with that, but I'll consider that this was prompted by the fab 
being closed so they had little choice even if the way they did it was 
not so much to my liking.

The MachXO2/3 parts are indeed low end, but they continue to make one 
mistake (in my opinion) that X and A also make.  They just won't provide 
parts in easy to use lower pin count packages.  I picked the XP part 
because it was the best trade off between size, I/O count, density and 
ease of use (in terms of board fab).  There is *nothing* else on the 
market that meets my criteria.

Lattice prefers to provide very fine pitch BGA/CSP packages that require 
very fine pitch board design rules and often very small vias.  When you 
think about their application focus, when you say, "low-power/low-cost 
market", you should be saying, portable/hand held devices.  In this 
market package size is also a major factor, so no 64/100 pin QFPs.

Maybe I'll just bite the bullet and practice some 3/3 design rule 
layouts.  lol  But where to get them prototyped?  Anyone know of 6 
layer, fine pitch PCB batch fabs going on?  I've found a couple that do 
4 layer moderate pitch at very reasonable prices.

-- 

Rick

Article: 156442
Subject: Re: Simulation deltas
From: KJ <kkjennings@sbcglobal.net>
Date: Sun, 6 Apr 2014 17:44:39 -0700 (PDT)
Links: << >> << T >> << A >>

On Sunday, April 6, 2014 11:42:34 AM UTC-4, rickman wrote:
> > If you're adding something to work around some problem, you're on much
> > firmer ground if there is an actual basis that can be traced back to
> > specifications.  On the assumption that the external thing connected to the
> > part being worked around is a physical part, ask yourself if adding Tpd and
> > Tco delays to that model makes it closer or farther away from a 'true'
> > model of that part.

> But this is not relevant.  I would prefer to add the delta delays where
> needed and to document them as being required to deal with the errors in
> the Xilinx models which is why they are there, not to add timing
> information to a functional simulation which is a bit absurd.

Uh huh...when I say to add delays as a work around, you see it as 'not relevant' and 'absurd', but when you suggest adding delta delays you think you're relevant...OK...gotcha.

If you had actually put *any* thought into the problem you would see that all of the 'as being required' places that one would need to add delays would be the inputs (as I suggested) and the delays...well, you never suggested any amount for a delay (where I did).  Good tip!

> > Of course, there is also the possibility that the stuff connecting to the
> > Xilinx primitive is itself internal to the device in which case I suggested
> > adding a 1 ns (or really whatever small non-zero time delay you want).
> > Again, inside a real device, the output of a flop will not change in zero
> > time so adding a small nominal delay as a work around can be justified as
> > modeling reality.

> You are proposing that additional delta delays be added by the user to
> compensate for the delta delays being introduced in the clock path by the
> corrupt Xilinx model.  This is in conflict with best design practices.

Ah yes, the 'conflict with best design practice' canard.  I suggested using a different model if available, and if you're stuck with the model, then here is the way to work around it.  What I suggested can be traced back to specifications, what you suggest...well, not so much.  Just how many 'delta delays' do you think you can add and trace that code back to a specification?

> I feel that Xilinx should have added those delays to the input data path
> so that the rest of the simulation can be written like a standard VHDL
> design.

Is what you 'feel' supposed to be relevant?

So are you suggesting that one should do nothing until the Xilinx model is fixed?  When I encounter a bug, I submit it to the vendor and work on a work around since I can't depend on them to field a fix in a time frame usable by me.  I guess you live in a different world where it is OK to say development has stopped while you wait for a supplier to fix something.

> > In any case, the work around you use should have a rational basis for being
> > the way it is.  If the only justification is that 'it was the only way I
> > could get the sim to run' then there is probably a design error that is
> > being covered up, rather than a model limitation that is being worked
> > around.

> The rational basis is not "it was the only way I could get the sim to
> run", it is "this is the best way to work around the Xilinx model
> problems".  Ideally the fixes would be added to a wrapper around the
> offending Xilinx code if possible.

So rather than accepting a solution that I suggested that has a basis that can be traced back to specification, can be reused regardless of how many delta delays get added sometime in the future (seems that you forgot about that possibility) you're into
- Railing on Xilinx
- Waiting for them to fix the model
- Or add a magic wrapper being apparently clueless that no wrapper will fix the problem and dismissing my work around as being 'not relevant', 'absurd', etc.

Gotcha.

I'm done with this thread, catch you in the future in some other thread.

Kevin Jennings

Article: 156443
Subject: Re: Simulation deltas
From: HT-Lab <hans64@htminuslab.com>
Date: Mon, 07 Apr 2014 09:45:27 +0100
Links: << >> << T >> << A >>

On 03/04/2014 14:01, Carl wrote:
> Hi,
>
> This question deals both with an actual problem, and with some more conceptual thoughts on simulation deltas and how an RTL entity should behave with regards to this.
>
> This post regards the case of a simulation with ideal time - that is, no delays (in time) modelled, rather trusting only simulation deltas for the ordering of events.
>

You might extract some useful info from this discussion:

http://verificationguild.com/modules.php?name=Forums&file=viewtopic&t=537

Delta delays avoid a lot of simulation nasties like race conditions but 
still suffer from some real world implementation issues as you have 
discovered.

Good luck,
Hans
www.ht-lab.com

Article: 156444
Subject: Re: Simulation deltas
From: colin <colin_toogood@yahoo.com>
Date: Mon, 7 Apr 2014 02:43:43 -0700 (PDT)
Links: << >> << T >> << A >>

I once got bitten by this sort of thing.
Turned out that the default modelsim timing granularity was too big and the simulation rounded delays down to zero.

Colin

Article: 156445
Subject: static timing analysis
From: al.basili@gmail.com (alb)
Date: 7 Apr 2014 13:34:48 GMT
Links: << >> << T >> << A >>

Hi everyone,

any good reference on STA? I know the principles fairly well but I've 
always wanted to get a deeper understanding of this - vast - subject, 
especially the algorithms behind it.

I often find formulas not very accurate and lacking important aspects 
like clock skew or clock-to-out delays.

Thanks for any pointer.

Al

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 156446
Subject: Re: [cross-post][long] svn workflow for fpga development
From: Chris Higgs <chiggs.99@gmail.com>
Date: Mon, 7 Apr 2014 08:36:00 -0700 (PDT)
Links: << >> << T >> << A >>

On Sunday, March 30, 2014 11:07:39 PM UTC+1, alb wrote:
> I fully agree with you here, that would be my next item on my personal
> agenda...but revolutionary change requires time and patience ;-).

In my own experience, I've found it's far easier to lead by example than battle the internal corporate structure - I soon got tired of arguing!

If the company is wedded to out-dated version control software I'll still use git locally.  There are often wrappers[1] that make interfacing easy.  I'll run GitLab to provide myself a nice HTTP code/diff browser etc.  If there's no bug-tracker(!!) I'll use GitLab issues to track things locally. If the company has no regression, I'll run a Jenkins server on my box.  If tests aren't scripted, I'll spend some time writing some Makefiles.  If the tests aren't self-checking, I'll gradually add some pass/fail criteria so the tests become useful. I'll then start plotting graphs for things like simulation coverage, FPGA resource utilisation etc. using Jenkins.

Unless you're working in an extremely restrictive environment with no control over your development box, none of this requires sign-off from the powers that be.  You'll find other developers and then management are suddenly curious to know how you can spot only a few minutes after they've checked something in that the resource utilisation for their block has doubled... or how you can say with such confidence that a certain feature has never been tested in simulation.  Once they see the nice web interface of Jenkins and the pretty graphs, understand the ease with which you can see what's happening in the repository, they'll soon be asking for you to centralise your development set-up so they can all benefit :)

Chris

[1] https://www.kernel.org/pub/software/scm/git/docs/git-svn.html

PS apologies for breaking the cross-post again... curse GG

Article: 156447
Subject: Re: Simulation deltas
From: Carl <carwer0@gmail.com>
Date: Mon, 7 Apr 2014 10:34:07 -0700 (PDT)
Links: << >> << T >> << A >>


Just to clarify, this is not a post-route simulation. This is a simulation of a larger custom RTL design. In various parts of it, some primitives from the Xilinx Unisim library are used.

There are numerous workarounds of course, they are obvious to all of us, and which one someone would choose is largely a matter of taste - for me, this is not the central discussion here. I rather seek the lesson to learn (if any) after having spent half a day debugging, finally having found the behaviour of this primitive to be the cause of the problem.

What the discussion boils down to is whether functional models may behave like this. If the answer is yes, there should be a general design practice that is always used when interfacing to RTL logic or functional models you haven't developed yourself.

I see from the discussion that the arguments regarding this differ. My original post suggested I was leaning towards the Xilinx primitive being flawed, and having taken in the arguments above, this is still my opinion. The Unisim library contains simulation primitives. For functional simulation (there's the Simprim library for timing simulations) they should follow the design practice that the interfacing logic is only required to hold the input signals valid *on* the active edge of the input clock. Not longer (also not in terms of deltas).

One effect of requiring the user to hold inputs active any longer (say, by adding 'after 1 ns' to any interfacing logic signals) would be a (sometimes) significant increase in simulation time. One of the strengths of functional simulation is that changes only happen on the clock edges, and changes around the clock edges are separated only by deltas, not by time. (Remember, VHDL signals are expensive in this regard. Reducing signal changes means everything to efficient simulations.)


There is another side of this discussion, which is not about how to interface to models/logic by others, but rather how to select your own design rules to avoid these problems within the code you develop on your own. However, I believe a designer seldom has a legitimate reason to mess with the clock path in RTL code. Typically, vendor primitives are instantiated for any such functionality (clock muxing etc.). There might be situations where you would rather _infer_ than instantiate though, and then this *does* become a problem. However, I have never come across such a situation.

Article: 156448
Subject: Re: Simulation deltas
From: Carl <carwer0@gmail.com>
Date: Mon, 7 Apr 2014 10:40:27 -0700 (PDT)
Links: << >> << T >> << A >>

On Monday, 7 April 2014 at 10:45:27 UTC+2, HT-Lab wrote:
>
> You might extract some useful info from this discussion:
>
> http://verificationguild.com/modules.php?name=Forums&file=viewtopic&t=537
>

A very related topic, yes.

If you have several clocks in your design, you must make sure any edges supposed to occur simultaneously *do* occur simultaneously, also with regard to deltas. This requires some care when generating your test bench clocks. I make sure to generate them from within one and the same process, keeping the desired phase relationship. Generating one, and dividing the second from the first, is doomed to fail. Logic interfacing to both clocks then has a big risk of missing signal transactions.

Article: 156449
Subject: Re: Simulation deltas
From: Carl <carwer0@gmail.com>
Date: Mon, 7 Apr 2014 10:40:50 -0700 (PDT)
Links: << >> << T >> << A >>

On Monday, 7 April 2014 at 11:43:43 UTC+2, colin wrote:
> Turned out that the default modelsim timing granularity was too big and the simulation rounded delays down to zero.

This though must have been due to time delays rather than delta delays. (I know Xilinx states its primitives require 1 ps precision, whereas default precision for ModelSim is 1 ns.)
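
The arithmetic behind a delay vanishing can be sketched as below. This is not ModelSim's actual algorithm: it simply assumes the simulator truncates every delay to a whole number of resolution steps, and the 100 ps delay is a made-up value for illustration. `quantize_delay` is a hypothetical helper.

```python
def quantize_delay(delay_ps, resolution_ps):
    """Truncate a delay to a whole number of simulator time steps.

    Truncation is an assumption made for this sketch; a real simulator
    may round differently.
    """
    return (delay_ps // resolution_ps) * resolution_ps


# A hypothetical 100 ps delay inside a model:
print(quantize_delay(100, 1))     # 100 -> kept at 1 ps resolution
print(quantize_delay(100, 1000))  # 0   -> disappears at 1 ns resolution
```

Once the delay has collapsed to zero, only delta ordering is left to separate the events, which is how a time-delay problem can end up looking like a delta problem.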
