Monday, October 6, 2014

More SystemVerilog Streaming Examples

In my previous post I promised I would write about more interesting cases of streaming using a slice_size and arrays of ints and bytes.  Well, I just posted another set of streaming examples to edaplayground.  I'm going to mostly let you look at that code and learn by example, but I will take some time in this post to explain what I think is the trickiest of the conversions.  When you go to edaplayground, choose the Modelsim simulator to run these.  Riviera-PRO is currently the only other SystemVerilog simulator choice on edaplayground, and it messes up on the tricky ones (more on that in a bit).

These examples demonstrate using the streaming operator to do these conversions:
  • unpacked array of bytes to int
  • queue of ints to queue of bytes
  • queue of bytes to queue of ints
  • int to queue of bytes
  • class to queue of bytes
Of all those examples, the queue of ints to queue of bytes and the queue of bytes to queue of ints are the tricky ones that I want to spend more time explaining.  They are both tricky for the same reason.  If you are like me, your first thought on how to convert a queue of ints to a queue of bytes is to just do this:

byte_queue_dest = {<< byte{int_queue_source}};

Before I explain why that might not be what you want, be sure you remember what "right" and "left" mean from my previous post.  The problem with the straightforward streaming concatenation above is it will start on the right of the int queue (int_queue_source[max_index], because it's unpacked), grab the right-most byte of that int (int_queue_source[max_index][7:0], because the int itself is packed), and put that byte on the left of byte_queue_dest (byte_queue_dest[0], because it is unpacked).  It will then grab the next byte from the right of the int_queue (int_queue_source[max_index][15:8]) and put it in the next position of byte_queue_dest (byte_queue_dest[1]), and so on.  The result is that you end up with the ints from the int queue reversed in the byte queue.  If that doesn't make sense, change the code in the example to the above streaming concatenation and just try it.
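
If you'd rather not edit the edaplayground example, here is a minimal standalone sketch of that single-stream conversion.  The module name is made up for illustration, and the expected values in the comments are just what the explanation above predicts, so treat them as something to check in your simulator rather than a guarantee.

```systemverilog
module naive_stream_demo;  // hypothetical module name, not from the edaplayground example
   int  int_queue_source[$];
   byte byte_queue_dest[$];

   initial begin
      int_queue_source = {'h44332211, 'h88776655};

      // Single stream: byte-sized slices are taken starting from the right
      // end of the whole source, so the last int in the queue is consumed first.
      byte_queue_dest = {<< byte{int_queue_source}};

      // Predicted by the explanation above: 55 66 77 88 11 22 33 44
      // (the second int's bytes land ahead of the first int's bytes)
      foreach (byte_queue_dest[i])
         $display("byte_queue_dest[%0d] = 'h%h", i, byte_queue_dest[i]);
   end
endmodule
```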

To preserve the byte ordering, you do this double streaming concatenation:

byte_queue_dest = {<< byte{ {<< int{int_queue_source}} }};

Let's step through this using the literal representation of the arrays so that rights and lefts will be obvious.  You start with an (unpacked, of course) queue of ints:

int_queue_source = {'h44332211, 'h88776655};

And just to be clear, that means int_queue_source[0][7:0] is 'h11 and we want that byte to end up as byte_queue_dest[0].  The inner stream takes 32 bits at a time from the right and puts them on the left of a temporary packed array of bits.  That ends up looking like this:

temp_bits = 'h8877665544332211;

Now the outer stream takes 8 bits at a time from the right of that and puts them on the left of a queue.  That gives you this in the end:

byte_queue_dest = {'h11, 'h22, 'h33, 'h44, 'h55, 'h66, 'h77, 'h88};

Which, if you wanted to preserve the logical byte ordering, is correct.  Going from bytes to ints, it turns out, is pretty much the same: reverse the queue of bytes and then stream an int at a time.
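
For the bytes-to-ints direction, a sketch along the same lines might look like the following.  The module name is again hypothetical; the inner stream reverses the bytes and the outer stream pulls them back out an int at a time.  The values in the comment are what I'd expect from walking through it by hand, and given the simulator differences I mention below, it's worth confirming on your own tool.

```systemverilog
module bytes_to_ints_demo;  // hypothetical module name
   byte byte_queue_source[$];
   int  int_queue_dest[$];

   initial begin
      byte_queue_source = {8'h11, 8'h22, 8'h33, 8'h44, 8'h55, 8'h66, 8'h77, 8'h88};

      // Inner stream: reverse the byte order into a temporary packed value.
      // Outer stream: take 32 bits at a time from the right of that value.
      int_queue_dest = {<< int{ {<< byte{byte_queue_source}} }};

      // Expected if byte ordering is preserved: 'h44332211, 'h88776655
      foreach (int_queue_dest[i])
         $display("int_queue_dest[%0d] = 'h%h", i, int_queue_dest[i]);
   end
endmodule
```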

So what happens with Riviera-PRO?  If you try it in edaplayground you see that the resulting queue of bytes in the int-to-byte conversion ends up with a whole bunch of extra random bytes on the right (highest indexes of the queue).  8 extra, to be exact.  Same for the int queue result in the byte-to-int conversion.  I think Riviera-PRO must be streaming past the end of the temporary packed array (that I called temp_bits above) and pulling in bytes from off in the weeds.  Pretty crazy.  That's all done behind the scenes so I don't really know, but that's sure what it looks like.  Hopefully they can fix that soon.

Well, I hope I've helped clear up how to use the streaming operators for someone.  If I haven't, I have at least helped myself understand them better.  Ask any questions you have in the comments.

Friday, October 3, 2014

SystemVerilog Streaming Operator: Knowing Right from Left

SystemVerilog has this cool feature that is very handy for converting one type of collection of bits into another type of collection of bits.  It's the streaming operator.  Or the streaming concatenation operator.  Or maybe it's the concatenation of streaming expressions (it's also called pack/unpack parenthetically).  Whatever you want to call it, it's nice.  If you have an array of bytes and you want to turn it into an int, or an array of ints that you want to turn into an array of bytes, or if you have a class instance that you want to turn into a stream of bits, then streaming is amazing.  What used to require a mess of nested for-loops can now be done with a concise single line of code.
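
To make the loop-versus-one-liner point concrete, here is a small sketch that packs a four-byte array into an int both ways.  The names (loop_vs_stream_demo, looped, streamed) are invented for this illustration, and I'm assuming you want element 0 of the array to end up in the most significant byte.

```systemverilog
module loop_vs_stream_demo;  // hypothetical module name
   byte bytes[4];
   int  looped;
   int  streamed;

   initial begin
      bytes = '{8'h11, 8'h22, 8'h33, 8'h44};

      // The old way: place each byte into its slot by hand.
      for (int i = 0; i < 4; i++)
         looped[8*(3-i) +: 8] = bytes[i];

      // The streaming way: left-to-right, so bytes[0] becomes the top byte.
      streamed = {>> {bytes}};

      // Both should print 'h11223344 under the assumption above.
      $display("looped   = 'h%h", looped);
      $display("streamed = 'h%h", streamed);
   end
endmodule
```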

As nice as it is, getting the hang of the streaming operator is tough.  The SystemVerilog 1800-2012 LRM isn't totally clear (at least to me) on the details of how it works.  The statement from the LRM that really got me was this: "The stream_operator << or >> determines the order in which blocks of data are streamed: >> causes blocks of data to be streamed in left-to-right order, while << causes blocks of data to be streamed in right-to-left order."  You might have some intuitive idea about which end of a stream of bits is on the "right" and which is on the "left," but I sure didn't.  After looking at the examples of streaming on page 240 of the LRM I thought I had it, and then none of my attempts to write streaming concatenations worked like I thought they should.  Here's why: "right" and "left" are different depending on whether your stream of bits is a packed array or an unpacked array.

As far as I can tell, "right" and "left" are in reference to the literal SystemVerilog code representations of arrays of bits.  A literal packed array is generally written like this:

bit [7:0] packed_array = 8'b0011_0101;

And packed_array[0] is on the right (that 1 right before the semicolon).  A literal unpacked array is written like this:

bit unpacked_array[] = '{1'b1, 1'b0, 1'b1, 1'b0};

unpacked_array[0] is on the left (the first value after the left curly brace).  I don't know about you, but I'm generally more concerned with actual bit positions, not what is to the right and left in a textual representation of an array, but there you have it.

Once I got that down, I still had problems.  It turns out the results of streaming concatenations will be different depending on the variable you are storing them in.  It's really the same right/left definitions coming into play.  If you are streaming using the right-to-left (<<) operator, the right-most bit of the source will end up in the left-most bit of the destination.  If your destination is a packed array then, just as I explained above, "right" means bit zero and "left" means the highest index bit, so the right-most source bit ends up in the highest index bit.  If your destination is an unpacked array, your right-most source bit will end up as bit zero of the unpacked array (which is the "left" bit according to the literal representation).
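
Here is a small sketch of that destination-dependent behavior, streaming the same packed source one bit at a time into a packed destination and into an unpacked one.  The module and variable names are made up for this illustration, and the expected values in the comments are what the right/left rules above predict.

```systemverilog
module right_left_demo;  // hypothetical module name
   bit [7:0] packed_source;
   bit [7:0] packed_dest;
   bit       unpacked_dest[$];

   initial begin
      packed_source = 8'b0011_0101;

      // No slice_size given, so the stream moves one bit at a time.
      packed_dest   = {<< {packed_source}};
      unpacked_dest = {<< {packed_source}};

      // Packed destination: source bit 0 lands in the highest index bit,
      // so this should print 10101100.
      $display("packed_dest      = 'b%b", packed_dest);

      // Unpacked destination: source bit 0 lands at index 0,
      // so this should print 1.
      $display("unpacked_dest[0] = 'b%b", unpacked_dest[0]);
   end
endmodule
```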

Got all that?  If not, I put a code example on edaplayground that you can run and examine the output of.  The examples are all streaming bits one at a time.  It gets a little harder to wrap your head around what happens when you stream using a slice_size and when your source and/or destination array is an unpacked array of bytes or ints.  I'll write another post explaining some tricks for those next (UPDATE: next post is here).

Wednesday, April 30, 2014

A Quick Look at svlib

I just took a quick look at svlib from Verilab.  Very cool.  It's a library for SystemVerilog that gives you file globbing, regular expressions, a better string class, simple ini config file parsing (with yaml support promised for the future!), and more.  It was announced back in March and it took me this long to get around to reading about it.  Hopefully it doesn't take me that long to actually try it out :-)

They welcome feedback so brace yourself, here it comes.  First of all it's open source (Apache license) which is excellent.  It's open source and it has documentation.  Amazing!  :-)  It is not currently developed openly though.  Could we get a github, bitbucket, or sourceforge project going?  Our industry (design verification) desperately needs to admit and recognize that we are software developers.  I mean no, we are verifiers!  Bug finders!  It just so happens that writing software is the primary technique we use to verify designs and find bugs (hence the need for svlib).  We need to get better at writing software.  Open Source projects are a great way for us to help each other develop those skills.  The paper and presentation talk about the trade-offs and design considerations that were considered by the svlib authors as they wrote svlib.  How much better would it be for all of us to be able to see and participate in the discussions that led to the particular design of something like svlib?

Second item of feedback.  Recommending people do an import svlib_pkg::* is no surprise, but it's still bad.  C++ and Python programmers long ago realized how bad their equivalents are: using namespace and from import *.  We SystemVerilog programmers need to realize this too.  Brian Hunter makes the case for SystemVerilog in his seminal Namespaces, Build Order, and Chickens video.  You can see the reasoning for C++ and Python all over the internet.  As I have made this argument people have pointed out how awkward the alternative svlib_pkg::foo looks in your code.  A big thing that would help with that is to drop the _pkg suffix.  We don't need that suffix, it's like doing this. svlib::foo is not that bad and clearly shows exactly where foo came from.

Third item of feedback.  It was bold and probably justified to put this all together in a single library named svlib.  It probably simplified some things and expedited getting the code working and out the door.  Those are good things.  Long-term though, I think we'll be better served if this were split into multiple smaller libraries.  Maybe one for regexes alone, one for os/system interactions, one for ini parsing, and so forth.  Python and its libraries are good examples to follow here.  If you make the project open I promise to help out with this where I can and I'm sure others would too.

Those concerns aside, svlib is a great thing to happen to the SystemVerilog community and hopefully just the start of better things to come.  Collaboration and sharing of libraries and tools like this will help our entire industry grow and progress.

Thursday, February 27, 2014

Avoiding Verilog's Non-determinism, Part 2

At the end of my last post I promised I would have another non-determinism (AKA, race condition) example from recent real-life experience. Here it comes.

Before I show you any code I want to explain how this race condition was introduced. We had a signal in an interface that needed to be widened. We had a function in some simulation-only code that looked at part of that signal and didn't care about the new bits that were added. The engineer who widened the signal decided not to change the function and instead added a new variable and assigned (using the assign keyword) the bits of interest from the newly widened signal to this new variable. He then passed this new variable to the original function in place of the original newly-widened one. Seems reasonable, right? Well, after he made that change some tests started failing and after some digging it began to look like a race condition, but it wasn't obvious where the race was coming from. The problem was that assign statement, because assign creates yet another process. The original newly widened signal was given a value in one process and the assigned variable got updated in a separate process. There was now a race between those two values (the newly-widened one and the assigned one) to get to the third process that consumed them (the process that includes the previously mentioned function).

Got all that? Here's the boiled-down code that illustrates what was going on:

module top;
   reg [3:0] foo;
   wire foo_0;
   reg ready;
   
   assign foo_0 = foo[0];

   initial begin
      #10;
      foo = 'h0;
      ready = 0;
      #10;
      foo = 'hf;
      ready = 1;
      #10;
      foo = 'h0;
      ready = 0;
   end

   initial begin
      forever begin
         @(ready);
         $display("foo[0]: 'b%0b", foo[0]);
         $display("foo_0:  'b%0b", foo_0);
         $display("------------");
      end
   end
endmodule

And of course, an EDA Playground version that you can play with. See the differences with different simulators? It was interesting that GPL Cver and Veriwell give the same results as the simulator I use at work (Cadence): the value of foo_0 is always a step behind the value of foo[0]. Also interesting that if I change the wire to a reg Cadence changes and gives the same results as Modelsim and Icarus (plain verilog won't allow that to be a reg, that's why I made it a wire for EDA Playground), which was not the behavior I was getting in the larger production testbench code (foo_0 was a step behind there). Non-determinism in the flesh.

The solution was to get rid of the new variable (foo_0) and the assign and just pass the foo to the function. Modifying the function to deal with a wider foo wasn't too difficult and the race between foo and foo_0 was eliminated. Easy, right? Trust me, fixing the problem wasn't the hard part. Identifying the root cause was much, much harder.
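
As a rough sketch of that fix applied to the boiled-down example: the function below is a hypothetical stand-in for the real one (which I haven't shown), written to accept the whole widened foo and pick out the bit it cares about.  With no assign in the picture, foo and ready are both written in one process, in order, so the consuming process sees an up-to-date foo whenever ready changes.

```systemverilog
module top;
   reg [3:0] foo;
   reg ready;

   // Hypothetical stand-in for the original function: it takes the full
   // widened signal and ignores the bits it doesn't care about.
   function automatic logic low_bit(input logic [3:0] value);
      return value[0];
   endfunction

   initial begin
      #10;
      foo = 'h0;
      ready = 0;
      #10;
      foo = 'hf;
      ready = 1;
      #10;
      foo = 'h0;
      ready = 0;
   end

   initial begin
      forever begin
         @(ready);
         // foo is read directly; there is no intermediate foo_0 to lag behind.
         $display("foo[0]: 'b%0b", low_bit(foo));
         $display("------------");
      end
   end
endmodule
```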

I'm understanding Jan Decaluwe's concerns with Verilog more and more as I encounter and dig into these race conditions. I think before reading his blog entries on Sigasi I thought concurrency and non-determinism had to go hand in hand. My embedded software background and experience with real-life concurrency was a large contributor to that opinion: isn't Verilog just modeling concurrency accurately when it's non-deterministic? Jan's point, I believe, is that the pain of this "accurate" modeling doesn't really buy you anything, and because processes in Verilog spring up and multiply almost (ok, definitely) without you noticing, it can be very difficult to have them communicate reliably. In software like C you know exactly when you are creating another thread but in Verilog it is not so obvious. RTL Verilog has it easy because you can follow a basic (if over-restrictive) guideline (synchronize to the same clock and use non-blocking assignments in clocked processes), but in higher-level simulation-only code I don't know of an easy guideline you can follow. I guess the guideline is this: learn what creates a process and think about values racing from one process to another.

Wednesday, February 26, 2014

Avoiding Verilog's Non-determinism, Part 1

In my last post we looked at some example code that showed off Verilog's non-determinism. Here it is again (you can actually run it on multiple simulators here on EDA Playground):

module top;
   reg ready;
   integer result;

   initial begin
      #10;
      ready <= 1;
      result <= 5;
   end

   initial begin
      @(posedge ready);
      if(result == 5) begin
       $display("result was ready");
      end
      else begin
       $display("result was not ready");
      end
   end   
endmodule

Just to review from last time, the problem is that sometimes the @(posedge ready) will trigger before result has the value 5 and sometimes it will trigger after result has the value 5. We have called this non-determinism, but a more common term for it is race condition. There is a race between the values of ready and result making it to that second process (the second initial block). If result is updated first (wins the race) then everything runs as the writer of the code intended. If ready is updated first (wins the race) then the result will not actually be ready when the writer of the code intended.

Now the question is, is there a way to write this code so that there is no race condition? Well, first of all I surveyed my body of work on simulation-only code and didn't find very many uses of non-blocking assignments like that. The common advice in the Verilog world is to use non-blocking assignments in clocked always blocks, not in "procedural" code like this. If we change the above to use blocking instead of non-blocking assignments, does that fix the problem? Here's what the new first initial block looks like:

   initial begin
      #10;
      ready = 1;
      result = 5;
   end

You can try it on EDA Playground and see that it still behaves the same as it did before except for with GPL Cver. With non-blocking assignments you get "result was not ready" with Cver and now you get "result was ready." That doesn't give me a lot of warm fuzzy feelings though. In fact, looking at that code makes me feel worse. If I'm thinking procedurally it looks totally backwards to set ready to one before assigning the value to result. My instinct would be to write the first initial block like this:

   initial begin
      #10;
      result = 5;
      ready = 1;
   end

Is that better for avoiding race conditions? If I take the explanation for why race-conditions exist in Verilog from Jan Decaluwe's VHDL's Crown Jewel post at face value, I think it actually is. That post explains that right after the first assignment (signal value update, if we use Jan's wording) in the first initial block Verilog could decide to trigger the second process (the second initial block). That case causes problems in the original code because the first assignment is to ready and result doesn't yet have its updated value. With the assignments re-ordered as above even if the second initial block is activated after the first assignment it will not try to read the value of result. It will just block waiting for a posedge ready (which will happen next). Race condition: eliminated. Here is the full fixed code example on EDA Playground.
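
For reference, here is a sketch of what the whole module looks like with that re-ordering applied (blocking assignments, result written before ready).  I'm assuming this matches the linked EDA Playground version; the reasoning is simply that statements in a begin/end block execute in order, so result already holds 5 by the time the posedge on ready can be seen.

```systemverilog
module top;
   reg ready;
   integer result;

   initial begin
      #10;
      result = 5;   // write the data first
      ready = 1;    // then raise the flag that the other process waits on
   end

   initial begin
      // Even if this process is activated right after the first assignment
      // above, it is still blocked here and does not read result yet.
      @(posedge ready);
      if (result == 5) begin
         $display("result was ready");
      end
      else begin
         $display("result was not ready");
      end
   end
endmodule
```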

Strangely enough, I spent the day yesterday debugging and fixing a race condition in our production testbench code here at work. It was very different from this one, so don't get too confident after reading this single blog post. I was able to boil the problem from yesterday down into another small example and so my next post will show off that code and how I eliminated that particular race.

UPDATE: As promised another example of a race condition.