Sunday, December 21, 2014

Adventure in Building a Home Gym

Last year I started assembling a little garage gym so I could do some weight lifting in the comfort of home. I scoured the online classifieds and found good deals on used power racks, a bench, and some weights. I was doing pretty well finding these deals, I thought. The one oddity was with the weights. The seller wouldn't sell them separate from this monster:

I had no use for this because I had bought into the idea that compound lifts were the best and this enormous and heavy thing served only to isolate your shoulders. I took it anyway because the guy was nice and he was giving me a good deal on the weights whether this was included or not. I was also thinking in the back of my head that I have a friend that likes metalworking that could probably help me turn it into something more useful. Nearly a full year later my hope was finally realized.

I stored it as is in my garage for about 6 months. My basement was being remodeled so there were lots of things in the garage with it (on it, under it, around it) so that wasn't a big deal. Once the basement was finished and we moved everything back in, it was painfully in the way. I moved it to the side of the house where it sat for a couple months, and then finally my next-door neighbor showed me how his acetylene torch worked and this thing was now in a much more compact state:

The flat bench I originally bought was a little rickety and cheap and my idea was to turn this into a nice sturdy one. A couple weeks ago my schedule and my metalworking friend's schedule finally synced up and he graciously helped me use his chop saw, MIG welder, and drill press to reform the original beast into this:

It turns out welding is pretty dang fun. Once this metal frame was done I cut out and sanded some 1/2 inch plywood for the actual bench and attached it using bolts and t-nuts. My wife graciously painted the metal with some Rustoleum, found some padding and vinyl and helped me cover the wood. Here are the last few progress pictures:

Monday, October 6, 2014

More SystemVerilog Streaming Examples

In my previous post I promised I would write about more interesting cases of streaming using a slice_size and arrays of ints and bytes.  Well, I just posted another set of streaming examples to edaplayground.  I'm going to mostly let you look at that code and learn by example, but I will take some time in this post to explain what I think is the trickiest of the conversions.  When you go to edaplayground, choose the Modelsim simulator to run these.  Riviera-PRO is currently the only other SystemVerilog simulator choice on edaplayground, and it messes up on the tricky ones (more on that in a bit).

These examples demonstrate using the streaming operator to do these conversions:
  • unpacked array of bytes to int
  • queue of ints to queue of bytes
  • queue of bytes to queue of ints
  • int to queue of bytes
  • class to queue of bytes
Of all those examples, the queue of ints to queue of bytes and the queue of bytes to queue of ints are the tricky ones that I want to spend more time explaining.  They are both tricky for the same reason.  If you are like me, your first thought on how to convert a queue of ints to a queue of bytes is to just do this:

byte_queue_dest = {<< byte{int_queue_source}};

Before I explain why that might not be what you want, be sure you remember what "right" and "left" mean from my previous post.  The problem with the straightforward streaming concatenation above is it will start on the right of the int queue (int_queue_source[max_index], because it's unpacked), grab the right-most byte of that int (int_queue_source[max_index][7:0], because the int itself is packed), and put that byte on the left of byte_queue_dest (byte_queue_dest[0], because it is unpacked).  It will then grab the next byte from the right of the int_queue (int_queue_source[max_index][15:8]) and put it in the next position of byte_queue_dest (byte_queue_dest[1]), and so on.  The result is that you end up with the ints from the int queue reversed in the byte queue.  If that doesn't make sense, change the code in the example to the above streaming concatenation and just try it.

To preserve the byte ordering, you do this double streaming concatenation:

byte_queue_dest = {<< byte{ {<< int{int_queue_source}} }};

Let's step through this using the literal representation of the arrays so that rights and lefts will be obvious.  You start with an (unpacked, of course) queue of ints:

int_queue_source = {'h44332211, 'h88776655};

And just to be clear, that means int_queue_source[0][7:0] is 'h11 and we want that byte to end up as byte_queue_dest[0].  The inner stream takes 32 bits at a time from the right and puts them on the left of a temporary packed array of bits.  That ends up looking like this:

temp_bits = 'h8877665544332211;

Now the outer stream takes 8 bits at a time from the right of that and puts them on the left of a queue.  That gives you this in the end:

byte_queue_dest = {'h11, 'h22, 'h33, 'h44, 'h55, 'h66, 'h77, 'h88};

Which, if you wanted to preserve the logical byte ordering, is correct.  Going from bytes to ints, it turns out, is pretty much the same: reverse the queue of bytes and then stream an int at a time.
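Here's a minimal sketch of both directions together, so you can see the bytes-to-ints case written out.  It reuses the values from the walkthrough above; the module name and int_queue_back are names I made up for illustration.

```systemverilog
module byte_int_roundtrip;
  // Same source values as the walkthrough; other names are hypothetical.
  int  int_queue_source[$] = {'h44332211, 'h88776655};
  byte byte_queue_dest[$];
  int  int_queue_back[$];

  initial begin
    // ints -> bytes: reverse the ints, then stream a byte at a time
    byte_queue_dest = {<< byte{ {<< int{int_queue_source}} }};

    // bytes -> ints: reverse the bytes, then stream an int at a time
    int_queue_back  = {<< int{ {<< byte{byte_queue_dest}} }};

    foreach (byte_queue_dest[i])
      $display("byte_queue_dest[%0d] = 'h%h", i, byte_queue_dest[i]);
    foreach (int_queue_back[i])
      $display("int_queue_back[%0d]  = 'h%h", i, int_queue_back[i]);
  end
endmodule
```

If the simulator handles the nested streams the way the walkthrough describes, int_queue_back should come out matching int_queue_source.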

So what happens with Riviera-PRO?  If you try it in edaplayground you see that the resulting queue of bytes in the int-to-byte conversion ends up with a whole bunch of extra random bytes on the right (highest indexes of the queue).  8 extra, to be exact.  Same for the int queue result in the byte-to-int conversion.  I think Riviera-PRO must be streaming past the end of the temporary packed array (that I called temp_bits above) and pulling in bytes from off in the weeds.  Pretty crazy.  That's all done behind the scenes so I don't really know, but that's sure what it looks like.  Hopefully they can fix that soon.

Well, I hope I've helped clear up how to use the streaming operators for someone.  If I haven't, I have at least helped myself understand them better.  Ask any questions you have in the comments.

Friday, October 3, 2014

SystemVerilog Streaming Operator: Knowing Right from Left

SystemVerilog has this cool feature that is very handy for converting one type of collection of bits into another type of collection of bits.  It's the streaming operator.  Or the streaming concatenation operator.  Or maybe it's the concatenation of streaming expressions (it's also called pack/unpack parenthetically).  Whatever you want to call it, it's nice.  If you have an array of bytes and you want to turn it into an int, or an array of ints that you want to turn into an array of bytes, or if you have a class instance that you want to turn into a stream of bits, then streaming is amazing.  What used to require a mess of nested for-loops can now be done with a concise single line of code.

As nice as it is, getting the hang of the streaming operator is tough.  The SystemVerilog 1800-2012 LRM isn't totally clear (at least to me) on the details of how it works.  The statement from the LRM that really got me was this, "The stream_operator << or >> determines the order in which blocks of data are streamed: >> causes blocks of data to be streamed in left-to-right order, while << causes blocks of data to be streamed in right-to-left order."  You might have some intuitive idea about which end of a stream of bits is on the "right" and which is on the "left" but I sure didn't.  After looking at the examples of streaming on page 240 of the LRM I thought I had it, and then none of my attempts to write streaming concatenations worked like I thought they should.  Here's why: "right" and "left" are different depending on whether your stream of bits is a packed array or an unpacked array.

As far as I can tell, "right" and "left" are in reference to the literal SystemVerilog code representations of arrays of bits.  A literal packed array is generally written like this:

bit [7:0] packed_array = 8'b0011_0101;

And packed_array[0] is on the right (that 1 right before the semicolon).  A literal unpacked array is written like this:

bit unpacked_array[] = '{1'b1, 1'b0, 1'b1, 1'b0};

unpacked_array[0] is on the left (the first value after the left curly brace).  I don't know about you, but I'm generally more concerned with actual bit positions, not what is to the right and left in a textual representation of an array, but there you have it.

Once I got that down, I still had problems.  It turns out the results of streaming concatenations will be different depending on the variable you are storing them in.  It's really the same right/left definitions coming into play.  If you are streaming using the right-to-left (<<) operator, the right-most bit of the source will end up in the left-most bit of the destination.  If your destination is a packed array then, just as I explained above, "right" means bit zero and "left" means the highest index bit, so the right-most source bit lands in the highest index bit.  If your destination is an unpacked array, your right-most source bit will end up as element zero of the unpacked array (which is the "left" element according to the literal representation).
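As a rough sketch of that difference, here is the same one-bit-at-a-time stream stored into a packed and an unpacked destination.  The source value is the packed literal from earlier; the module and destination names are made up for illustration.

```systemverilog
module stream_left_right;
  bit [7:0] packed_src = 8'b0011_0101;  // packed_src[0] is the right-most bit (1)
  bit [7:0] packed_dst;                 // hypothetical packed destination
  bit       unpacked_dst[8];            // hypothetical unpacked destination

  initial begin
    // Default slice size is one bit, so << reverses the bit order.
    packed_dst   = {<< {packed_src}};
    unpacked_dst = {<< {packed_src}};

    // Packed: the source's right-most bit lands in the highest index bit.
    $display("packed_dst      = 'b%b", packed_dst);       // expect 'b10101100
    // Unpacked: the source's right-most bit lands in element zero.
    $display("unpacked_dst[0] = 'b%b", unpacked_dst[0]);  // expect 'b1
    $display("unpacked_dst[7] = 'b%b", unpacked_dst[7]);  // expect 'b0
  end
endmodule
```

In both cases the source's right-most bit went to the destination's "left"; it's only the index of that "left" position that changes.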

Got all that?  If not, I put a code example on edaplayground that you can run and examine the output of.  The examples are all streaming bits one at a time.  It gets a little harder to wrap your head around what happens when you stream using a slice_size and when your source and/or destination array is an unpacked array of bytes or ints. I'll write another post explaining some tricks for those next (UPDATE: next post is here).

Wednesday, April 30, 2014

A Quick Look at svlib

I just took a quick look at svlib from Verilab.  Very cool.  It's a library for SystemVerilog that gives you file globbing, regular expressions, a better string class, simple ini config file parsing (with yaml support promised for the future!), and more.  It was announced back in March and it took me this long to get around to reading about it.  Hopefully it doesn't take me that long to actually try it out :-)

They welcome feedback so brace yourself, here it comes.  First of all it's open source (Apache license) which is excellent.  It's open source and it has documentation.  Amazing!  :-)  It is not currently developed openly though.  Could we get a github, bitbucket, or sourceforge project going?  Our industry (design verification) desperately needs to admit and recognize that we are software developers.  I mean no, we are verifiers!  Bug finders!  It just so happens that writing software is the primary technique we use to verify designs and find bugs (hence the need for svlib).  We need to get better at writing software.  Open Source projects are a great way for us to help each other develop those skills.  The paper and presentation talk about the trade-offs and design considerations that the svlib authors weighed as they wrote it.  How much better would it be for all of us to be able to see and participate in the discussions that led to the particular design of something like svlib?

Second item of feedback.  Recommending people do an import svlib_pkg::* is no surprise, but it's still bad.  C++ and Python programmers long ago realized how bad their equivalents are: using namespace and from import *.  We SystemVerilog programmers need to realize this too.  Brian Hunter makes the case for SystemVerilog in his seminal Namespaces, Build Order, and Chickens video.  You can see the reasoning for C++ and Python all over the internet.  As I have made this argument people have pointed out how awkward the alternative svlib_pkg::foo looks in your code.  A big thing that would help with that is to drop the _pkg suffix.  We don't need that suffix, it's like doing this. svlib::foo is not that bad and clearly shows exactly where foo came from.
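To make the wildcard import problem concrete, here's a small sketch.  The two packages and their describe() functions are invented for this example and have nothing to do with svlib's actual contents.

```systemverilog
// Two hypothetical packages that happen to export the same name.
package lib_a;
  function automatic string describe();
    return "from lib_a";
  endfunction
endpackage

package lib_b;
  function automatic string describe();
    return "from lib_b";
  endfunction
endpackage

module scope_demo;
  // With both wildcard imports in place, an unqualified describe()
  // would be ambiguous and the compiler should reject it:
  //   import lib_a::*;
  //   import lib_b::*;
  //   initial $display("%s", describe());

  // Explicit scoping says exactly where each function came from.
  initial begin
    $display("%s", lib_a::describe());
    $display("%s", lib_b::describe());
  end
endmodule
```

The explicit version is a few characters longer per call, but nobody reading it has to guess which package a name belongs to.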

Third item of feedback.  It was bold and probably justified to put this all together in a single library named svlib.  It probably simplified some things and expedited getting the code working and out the door.  Those are good things.  Long-term though, I think we'll be better served if this were split into multiple smaller libraries.  Maybe one for regexes alone, one for os/system interactions, one for ini parsing, and so forth.  Python and its libraries are good examples to follow here.  If you make the project open I promise to help out with this where I can and I'm sure others would too.

Those concerns aside, svlib is a great thing to happen to the SystemVerilog community and hopefully just the start of better things to come.  Collaboration and sharing of libraries and tools like this will help our entire industry grow and progress.

Thursday, February 27, 2014

Avoiding Verilog's Non-determinism, Part 2

At the end of my last post I promised I would have another non-determinism (AKA, race condition) example from recent real-life experience. Here it comes.

Before I show you any code I want to explain how this race condition was introduced. We had a signal in an interface that needed to be widened. We had a function in some simulation-only code that looked at part of that signal and didn't care about the new bits that were added. The engineer who widened the signal decided not to change the function and instead added a new variable and assigned (using the assign keyword) the bits of interest from the newly widened signal to this new variable. He then passed this new variable to the original function in place of the original newly-widened one. Seems reasonable, right? Well, after he made that change some tests started failing and after some digging it began to look like a race condition, but it wasn't obvious where the race was coming from. The problem was that assign statement, because assign creates yet another process. The original newly widened signal was given a value in one process and the assigned variable got updated in a separate process. There was now a race between those two values (the newly-widened one and the assigned one) to get to the third process that consumed them (the process that includes the previously mentioned function).

Got all that? Here's the boiled-down code that illustrates what was going on:

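module top;
   reg [3:0] foo;
   wire foo_0;
   reg ready;
   // assign creates a separate process that updates foo_0 whenever foo changes
   assign foo_0 = foo[0];

   // Stimulus process. The #1 delays are assumed; the listing lost its timing controls.
   initial begin
      foo = 'h0;
      ready = 0;
      #1;
      foo = 'hf;
      ready = 1;
      #1;
      foo = 'h0;
      ready = 0;
   end

   // Consumer process. Waiting on ready is assumed to be what triggers it.
   initial begin
      forever begin
         @(ready);
         $display("foo[0]: 'b%0b", foo[0]);
         $display("foo_0:  'b%0b", foo_0);
      end
   end
endmodule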

And of course, an EDA Playground version that you can play with. See the differences with different simulators? It was interesting that GPL Cver and Veriwell give the same results as the simulator I use at work (Cadence): the value of foo_0 is always a step behind the value of foo[0]. Also interesting that if I change the wire to a reg Cadence changes and gives the same results as Modelsim and Icarus (plain verilog won't allow that to be a reg, that's why I made it a wire for EDA Playground), which was not the behavior I was getting in the larger production testbench code (foo_0 was a step behind there). Non-determinism in the flesh.

The solution was to get rid of the new variable (foo_0) and the assign and just pass the foo to the function. Modifying the function to deal with a wider foo wasn't too difficult and the race between foo and foo_0 was eliminated. Easy, right? Trust me, fixing the problem wasn't the hard part. Identifying the root cause was much, much harder.
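Here's a rough sketch of what that fix looks like in the boiled-down setting. The function low_bit is a hypothetical stand-in for the real simulation-only function, and the #1 delays and the wait on ready are assumptions about the test's timing.

```systemverilog
module top_fixed;
   reg [3:0] foo;
   reg       ready;

   // Hypothetical stand-in for the simulation-only function: it takes the
   // whole widened signal and picks out the bit it cares about itself.
   function automatic logic low_bit(input logic [3:0] value);
      return value[0];
   endfunction

   // Stimulus process (timing is assumed for this sketch).
   initial begin
      foo = 'h0;
      ready = 0;
      #1;
      foo = 'hf;
      ready = 1;
      #1;
      foo = 'h0;
      ready = 0;
   end

   // Consumer process: no assign and no foo_0, so there is no third
   // process whose update can race against this one.
   initial begin
      forever begin
         @(ready);
         $display("low_bit(foo): 'b%0b", low_bit(foo));
      end
   end
endmodule
```

foo is written in the same process as ready, just before it, so by the time the consumer wakes up on ready there is no other process left to wait for.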

I'm understanding Jan Decaluwe's concerns with Verilog more and more as I encounter and dig into these race conditions. I think before reading his blog entries on Sigasi I thought concurrency and non-determinism had to go hand in hand. My embedded software background and experience with real-life concurrency was a large contributor to that opinion: isn't Verilog just modeling concurrency accurately when it's non-deterministic? Jan's point, I believe, is that the pain of this "accurate" modeling doesn't really buy you anything, and because processes in Verilog spring up and multiply almost (ok, definitely) without you noticing it can be very difficult to have them communicate reliably. In software like C you know exactly when you are creating another thread but in Verilog it is not so obvious. RTL Verilog has it easy because you can follow a basic (if over-restrictive) guideline (synchronize to the same clock and use non-blocking assignments in clocked processes), but in higher-level simulation-only code I don't know of an easy guideline you can follow. I guess the guideline is this: learn what creates a process and think about values racing from one process to another.
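For comparison, here's a minimal sketch of the RTL guideline applied to the same foo and foo_0 pair. The clock, the counter behavior, and the module name are all made up for illustration.

```systemverilog
module nba_guideline;
   reg       clk   = 0;
   reg [3:0] foo   = 0;
   reg       foo_0 = 0;

   // Hypothetical free-running clock.
   always #5 clk = ~clk;

   // Two separate clocked processes, both using non-blocking assignments.
   // Each right-hand side is sampled before any of this edge's updates
   // land, so the order the two processes run in does not matter.
   always @(posedge clk) foo   <= foo + 1;
   always @(posedge clk) foo_0 <= foo[0];

   // Observe on the falling edge, well away from the updates.
   initial begin
      repeat (4) begin
         @(negedge clk);
         $display("foo: 'h%0h  foo_0: 'b%0b", foo, foo_0);
      end
      $finish;
   end
endmodule
```

foo_0 is still a step behind foo[0] here, but it is behind by exactly one clock on every simulator, which is the whole point of the guideline.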