Wednesday, February 26, 2014

Avoiding Verilog's Non-determinism, Part 1

In my last post we looked at some example code that showed off Verilog's non-determinism. Here it is again (you can actually run it on multiple simulators here on EDA Playground):

module top;
   reg ready;
   integer result;

   initial begin
      #10;
      ready <= 1;
      result <= 5;
   end

   initial begin
      @(posedge ready);
      if (result == 5) begin
         $display("result was ready");
      end
      else begin
         $display("result was not ready");
      end
   end
endmodule

Just to review from last time, the problem is that sometimes the @(posedge ready) will trigger before result has the value 5 and sometimes it will trigger after result has the value 5. We have called this non-determinism, but a more common term for it is race condition. There is a race between the values of ready and result making it to that second process (the second initial block). If result is updated first (wins the race), then everything runs as the writer of the code intended. If ready is updated first (wins the race), then the second process reads result before it has been updated, which is not what the writer of the code intended.

Now the question is, is there a way to write this code so that there is no race condition? Well, first of all, I surveyed my body of work on simulation-only code and didn't find very many uses of non-blocking assignments like that. The common advice in the Verilog world is to use non-blocking assignments in clocked always blocks, not in "procedural" code like this. If we change the above to use blocking instead of non-blocking assignments, does that fix the problem? Here's what the new first initial block looks like:

   initial begin
      #10;
      ready = 1;
      result = 5;
   end

You can try it on EDA Playground and see that it behaves the same as it did before on every simulator except GPL Cver. With non-blocking assignments Cver printed "result was not ready," and with blocking assignments it prints "result was ready." That doesn't give me a lot of warm fuzzy feelings though. In fact, looking at that code makes me feel worse. If I'm thinking procedurally, it looks totally backwards to set ready to one before assigning the value to result. My instinct would be to assign result first and only then set ready, writing the first initial block like this:

   initial begin
      #10;
      result = 5;
      ready = 1;
   end

Is that better for avoiding race conditions? If I take the explanation for why race conditions exist in Verilog from Jan Decaluwe's VHDL's Crown Jewel post at face value, I think it actually is. That post explains that right after the first assignment (signal value update, if we use Jan's wording) in the first initial block, Verilog could decide to trigger the second process (the second initial block). That case causes problems in the original code because the first assignment is to ready and result doesn't yet have its updated value. With the assignments re-ordered so that result is assigned before ready, even if the second initial block is activated after the first assignment, it will not try to read the value of result. It will just keep blocking, waiting for a posedge on ready (which will happen next). Race condition: eliminated. Here is the full fixed code example on EDA Playground.
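For reference, the fixed module should look roughly like the sketch below. It is reconstructed from the snippets in this post (the original example, switched to blocking assignments, with result assigned before ready) rather than copied from the EDA Playground link, so treat it as an approximation of that code.

```verilog
module top;
   reg ready;
   integer result;

   // Producer: blocking assignments, data first and flag second.
   initial begin
      #10;
      result = 5;   // the value is in place...
      ready = 1;    // ...before the flag that announces it
   end

   // Consumer: wakes only on the rising edge of ready, so whenever
   // it runs, the assignment to result has already completed.
   initial begin
      @(posedge ready);
      if (result == 5) begin
         $display("result was ready");
      end
      else begin
         $display("result was not ready");
      end
   end
endmodule
```

Because a blocking assignment completes before the next statement in the same block starts, result already holds 5 by the time ready changes, no matter when the simulator decides to wake the second initial block.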

Strangely enough, I spent the day yesterday debugging and fixing a race condition in our production testbench code here at work. It was very different from this one, so don't get too confident after reading this single blog post. I was able to boil the problem from yesterday down into another small example and so my next post will show off that code and how I eliminated that particular race.

UPDATE: As promised, another example of a race condition.

2 comments:

Victor Lyuboslavsky said...

We can also use our old friend #0 to "fix" this issue: http://www.edaplayground.com/x/3NJ

I just saw code today where a #0 was used after firing a SystemVerilog event to make sure the triggered code was executed before the subsequent line.

Bryan said...

I strongly vote for choosing the solution that doesn't involve using the semantically nonsensical #0 :-)

And yes, I've seen that before. We experienced a difference of opinion between simulator vendors on whether or not an assignment inside an initial block at time zero would trigger another process (this was code we got from an FPGA vendor that had only tested with one simulator). Adding a #0 before the assignment in the initial block helped the two simulators agree :-/