As I showed in a previous article, this component (avalon_st_error_adapter), though simple to describe, is not so easy to generate. The Europa implementation turns out to be acceptable, by dint of some object overloading.
So - how does a MyHDL implementation of this component turn out?
Finding #1: It is possible!... but I certainly struggled with the implementation.
Two aspects of this component were difficult - not because it would be hard to code the logic in Python; rather, because it was hard to code the problem using the limited subset of Python which is convertible to Verilog.
In MyHDL, I created a mapping tuple, outputMapping, whose element at index 'i' gives the index of the bit which drives output error bit i. Then I loop over all the output bits, assign the proper index to a variable (which becomes a reg in Verilog), and drive each output from the indexed bit, something like this:
# (j is large enough to hold any index into intermediate.)
j = intbv(0, min=0, max=2 + len(i_err))
for i in range(len(outputMapping)):
    j[:] = outputMapping[i]
    o_err.next[i] = intermediate[int(j)]
Notice that I use an intermediate signal, rather than assigning directly from the input error signal. The intermediate signal is 2 bits wider than the input error signal. One "extra" bit is driven with 0 (for otherwise undriven outputs); the other is driven with the logical OR of any "other" input bits. This way, I know the index value from which to drive every output error bit, whether it's a direct-map bit, an "other" bit or an output which must be driven with 0.
# (k is large enough to hold any index into intermediate.)
k = intbv(0, min=0, max=2 + len(i_err))
for i in range(len(otherList)):
    k[:] = otherList[i]
    intermediate[otherIndex] = \
        intermediate[otherIndex] | intermediate[int(k)]
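For what it's worth, here's how outputMapping and otherList can be computed from the error-bit type hashes. This is plain Python, not MyHDL; the helper name build_maps and the exact field handling are my own reading of the scheme (intermediate = input bits, then an OR-of-others bit, then a constant-0 bit):

```python
# Plain-Python sketch (not MyHDL): derive outputMapping and otherList
# from the error-bit type hashes.  Layout assumption: intermediate is
# [i_err bits, OR-of-unmatched bit, constant-0 bit].
def build_maps(o_err_types, i_err_types):
    other_index = len(i_err_types)      # bit driven by OR of unmatched inputs
    zero_index = other_index + 1        # bit hardwired to 0

    # Input bits whose type matches no output feed the "other" bit.
    other_list = tuple(sorted(
        idx for typ, idx in i_err_types.items() if typ not in o_err_types))

    # For each output bit, pick the intermediate index that drives it.
    output_mapping = [zero_index] * len(o_err_types)
    for typ, idx in o_err_types.items():
        if typ == 'other':
            output_mapping[idx] = other_index
        elif typ in i_err_types:
            output_mapping[idx] = i_err_types[typ]
    return tuple(output_mapping), other_list
```

With the six-input, four-output parameterization shown further down, this works out to outputMapping == (7, 6, 1, 0) and otherList == (2, 3, 4, 5), matching the case statements in the generated Verilog.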
The implementation code looks very, very unlike the conceptual picture in one's brain - it should be more like straight bit assignments and an OR gate. The violent disagreement between concept and implementation is the sort of thing that makes my spider-sense tingle.
Finding #2: Unit testing is swell.
My design flow here was to add a feature, add a test for it, add another feature, add another test, ... This worked great: I found lots of bugs quickly as I went, and the need to write tests forced me to carefully consider interface/API issues.
Here's what one of my tests looks like: it tests the function of the simplest possible error adapter, with a single input and output error bit of matching type:
def testA(self):
    """ Simple case: data, valid, ready, one-bit matching input
    and output error.

    (Arguably this shouldn't be a valid error adapter - there's
    nothing to adapt!  Philosophical issue whether or not this
    should be supported.  I'm going to allow it, because to make
    a special case of it to exclude it would be more work.)
    """
    # Direct mapping.
    def error_map(i_err):
        return i_err
    self.doATest( \
        "testA", \
        8, \
        {
            'o_err': { 'a' : 0, },
            'i_err': { 'a' : 0, },
        }, \
        error_map = error_map, \
    )
In this test I define the mapping from input error to output error, the data width, and the hash which defines the error bit mapping. Utility routines do the rest of the work of creating the actual adapter, running the simulation, verifying the outputs vs. the inputs, and running toVerilog on the adapter. Other tests have a similar framework - only the error mapping routine, the data width and the parameterization hash are varied. See test_avalon_st_error_adapter.py for the rest of the tests, and the definition of test utility routine doATest.
Finding #3: That Verilog is nigh-unreadable!
Perhaps as a natural result of the fact that the implementation relies on special MyHDL "tricks", the Verilog output appears to have been written by an evil genius. For example, if the input and output error bits are defined as follows:
{
    'o_err': {
        'nomatch' : 0,
        'other'   : 1,
        'a'       : 2,
        'c'       : 3,
    },
    'i_err': {
        'a' : 1,
        'b' : 2,
        'c' : 0,
        'd' : 3,
        'e' : 4,
        'f' : 5,
    },
},
you might hope that the output error bit assignment would look something like this:
assign other = i_err[5] | i_err[4] | i_err[3] | i_err[2];
assign o_err = {i_err[0], i_err[1], other, 1'b0};
rather than this:
always @(i_err) begin: _testG_mapOutputErrorBits
    integer i;
    reg [3-1:0] k;
    reg [8-1:0] intermediate;
    reg [3-1:0] j;
    intermediate = {1'h0, 1'h0, i_err};
    k = 0;
    for (i=0; i<4; i=i+1) begin
        // synthesis parallel_case full_case
        case (i)
            0: k = 2;
            1: k = 3;
            2: k = 4;
            default: k = 5;
        endcase
        intermediate[6] = (intermediate[6] | intermediate[k]);
    end
    j = 0;
    for (i=0; i<4; i=i+1) begin
        // synthesis parallel_case full_case
        case (i)
            0: j = 7;
            1: j = 6;
            2: j = 1;
            default: j = 0;
        endcase
        o_err[i] <= intermediate[j];
    end
end
So it goes. The illegibility of the Verilog output may not matter so much in practice, if 1) the unit test facilities lead to well-verified logic, so that the Verilog doesn't need to be studied for bugs (when was the last time you looked for bugs in an assembler listing of your C code?), and 2) the toVerilog function is reliable.
I think that'll be it for now. As usual, I include a full archive of the code:
For example: mapping = (1, 2, 0) would generate a module containing these assignments:
assign x[0] = a[1];
assign x[1] = a[2];
assign x[2] = a[0];
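As a sanity check, the intended behavior is easy to model in plain Python (a bit-level model for illustration, not MyHDL):

```python
# Bit-level model of the permuting module: x[i] = a[mapping[i]].
def permute(a_bits, mapping):
    return tuple(a_bits[mapping[i]] for i in range(len(mapping)))

# With mapping (1, 2, 0): x[0] = a[1], x[1] = a[2], x[2] = a[0].
```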
Making parameterized assignments like this is straightforward in Europa (assume @mapping contains the particular permutation to generate):
for my $i (0 .. -1 + @mapping) {
    $module->add_contents(
        e_assign->new({
            lhs => "x\[$i\]",
            rhs => "a\[$mapping[$i]\]",
        }),
    );
}
(europa generator, derived from europa_module_factory)
(complete Verilog listing, as generated)
A similar implementation in MyHDL turns out to be pretty clean:
@always_comb
def logic():
    for i in mapping:
        x.next[i] = a[mapping[i]]
(initial attempt in its entirety)
Using the above, I wrote some simple unit tests which try a variety of permutation mappings of various widths, driving a bunch of input values and verifying the resulting output values. This works great! But trouble loomed when I tried to generate some Verilog. My innocent-looking code apparently angers toVerilog, eliciting a giant stack dump. Here's the final part of the stack dump, which I think is the actual error:
(lots of lines of stack trace deleted)
myhdl.ToVerilogError: in file .../permute.py, line 15:
    Requirement violation: Expected (down)range call
So... it is documented that toVerilog does not support everything you can write in python; in fact, you're limited to a pretty small subset of the language in code which will be translated to HDL. This might be what I've run into, here, but sadly I don't know what a "(down)range call" is, nor who was expecting it.
I asked Stefaan, who originally pointed me toward MyHDL, about this, and he mentioned a discussion of an issue in a forum posting. In that thread, someone attempted a different implementation of a permuting module, and ran into trouble. Mr. MyHDL himself, Jan Decaluwe, came to the rescue with a method. The solution is to loop through the mapping not directly by element, but instead by index number, and to use a temp variable to index into the mapping list. (I think this is leveraging off of the same special case which allows inferred RAMs to work.) Here's what the modified generator function looks like:
@always_comb
def logic():
    tmp = intbv(0, min=0, max=len(mapping))
    for i in range(len(mapping)):
        tmp[:] = mapping[i]
        x.next[i] = a[int(tmp)]
I think it's pretty clear that the above version of the generator does the same thing as the initial version (though it's a bit more cluttered), and, since my unit tests pass just fine with this version, I'm pretty happy with it. And, toVerilog runs without error. So what does the generated Verilog look like?
always @(a) begin: _permute_logic
    reg [2-1:0] tmp;
    integer i;
    tmp = 0;
    for (i=0; i<3; i=i+1) begin
        // synthesis parallel_case full_case
        case (i)
            0: tmp = 1;
            1: tmp = 2;
            default: tmp = 0;
        endcase
        x[i] <= a[tmp];
    end
end
(complete Verilog listing, as generated)
This is a lot more complex than the simple list of assignments I was hoping for.
So, the tradeoffs: with MyHDL, it's easy to write unit tests with the full power of python, and the tested behavior can then be written out as Verilog. If the conversion process is successful, the resulting generated Verilog can be regarded (though warily) as tested. But, the generated Verilog may not be so readable, and confirming that it matches the original design intent requires work (via PLI, you can run your unit tests against the generated Verilog in a simulator - haven't tried this yet).
Writing the behavioral definition can be a struggle, since it may not be clear which aspects of the language toVerilog will accept (though that's probably just my lack of experience with the tool showing through).
On the whole I think the tradeoff is worth it: MyHDL should make a strong foundation for building up a library of tested functional building blocks.
As usual, I'm attaching the various files associated with this article. At the top level are the MyHDL generator script, with its test script, verilog generator script and output file. One level down in subdirectory permute, you'll find the Europa generator, associated scripts and output file.
Say, do people prefer winzip files over rar files? Let me know.
python testPermute.py
20080213 20:50: Edit: for no good reason I used one parameterization (mapping) for the Europa example, and a different one for the MyHDL example. That doesn't help to make things clear! I fixed it... sorry if you were confused by the original version.
The goal of the MyHDL project is to empower hardware designers with the elegance and simplicity of the Python language.
Sounds good to me! In the interest of science, I decided to check it out. After reading the manual and tinkering with it for a while, I'm ready to talk about my experience with MyHDL.
MyHDL Features
Simple example: switchable inverter
I'll start off with the simplest imaginable example: a purely combinational module with one input and one output. The output is either the same as the input, or its logical inverse, according to a generation parameter. Don't close your browser window - even this simple example demonstrates interesting facets of MyHDL.
Here's the implementation:
from myhdl import always_comb

def inv_or_buf(mode, a, x):
    @always_comb
    def buffer():
        x.next = a

    @always_comb
    def inverter():
        x.next = not a

    if (mode == 0):
        logic = buffer
    else:
        logic = inverter
    return logic
A few things to note:
Here's something that stands out for me about MyHDL: the low-level routines which define behavior are very simple, with not much more expressive power than the HDL they will eventually be transformed to. If you want to define behavior which depends on a parameter, then for all but the simplest of behaviors, you must declare logic for all possible parameter values, and then conditionally return only the logic which corresponds to the particular parameterization. There is a special case which allows for building ROM-like logic (anything where an output doesn't depend on inputs in an easily-computable way); fortunately, you can express any combinational logic function as ROM-like logic. I rely on this in my implementation of the Avalon-ST error adapter, which I'll get to later.
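The "declare everything, return only one" pattern can be boiled down to a plain-Python analog (no MyHDL required; the function names here are just for illustration):

```python
# Plain-Python analog of the pattern: both behaviors are declared,
# but only the one selected by the generation parameter is returned.
def make_logic(mode):
    def buf(a):         # x = a
        return a
    def inverter(a):    # x = not a
        return int(not a)
    return buf if mode == 0 else inverter
```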
Generating some Verilog
Here's some code to invoke inv_or_buf, and produce Verilog:
from myhdl import toVerilog, Signal, intbv
from inv_or_buf import inv_or_buf

def make_some_verilog(mode, name):
    a = Signal(intbv(0)[1:])
    x = Signal(intbv(0)[1:])
    toVerilog.name = name
    toVerilog(inv_or_buf, mode, a, x)

make_some_verilog(0, "buffer")
make_some_verilog(1, "inverter")
The resulting Verilog output shows up in two files:
buffer.v:
module buffer (
    a,
    x
);
input [0:0] a;
output [0:0] x;
wire [0:0] x;

assign x = a;

endmodule
inverter.v:
module inverter (
    a,
    x
);
input [0:0] a;
output [0:0] x;
wire [0:0] x;

assign x = (!a);

endmodule
The output is not the most readable code you've ever seen, but it does appear to be correct.
Leaving some for later
That's it for now. I haven't touched on MyHDL unit testing, which is one of its major strengths - I'll leave that for a future article.
I've created a new component, vji_component. In the planned pulse measurement testbench, this component will be the bridge between the host system and the pulse generator logic. The general purpose of vji_component is to provide one or more host-accessible input or output signals, while hiding all the complexity of using the sld_virtual_instance.
For components I've written previously, I've provided a handful of test cases. These test cases were simple: each one generates a particular instance of the component, and then compares the generated HDL against a "known good" reference HDL file. One problem with this approach is that my "known good" files have not actually been verified for correct function. Still, this method lets me proceed confidently with component changes which should not result in changed output. vji_component follows the same basic flow that I've established with previous components, but with one additional test feature: a system test.
In the system test, a vji_component instance is configured to have an input and output signal of width 24 (by default; the width is configurable). The output signal wires to the input signal through inverters. A tcl script drives random numbers into the writedata port, reads back the inverted signal on the readdata port, and verifies the value. The block diagram shows what's going on (sorry about that "inv" block - my attempt at an ASCII inverter symbol ended in failure).
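The check the tcl script performs amounts to the following (a plain-Python model of the loopback, using the default 24-bit width; the real test drives the hardware through the virtual JTAG ports):

```python
import random

WIDTH = 24                      # default signal width in the test system
MASK = (1 << WIDTH) - 1

def loopback(writedata):
    # the output signal wires back to the input through inverters,
    # so reading back should give the bitwise inverse
    return ~writedata & MASK

# drive random values, read back, verify the inversion
for _ in range(100):
    value = random.getrandbits(WIDTH)
    assert loopback(value) == value ^ MASK
```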
With this new system test, I'm taking the opportunity to create the entire system from as few source files as possible, under control of a Makefile. The system top-level is generated by a europa_module_factory-derived perl module and looks exactly like any other component. (I have come to realize that my use of the word "component" is not standard. When I say "component" I just mean some logic with optional sub-instances. Just about any HDL hierarchy is a "component", so maybe I need a different word.)
The test system source files are as follows:
The upshot of all this: 5 source files encode the system and test scripts. Typing "make" runs everything and reports any errors.
Zip archive of the vji_component and associated tests.
Now, I'm planning to build this pulse-measurement circuit (and later on, the follow-on data decode circuit) in firmware in the f2013. I could drive pulses into the f2013 by aiming an IR remote control at the IR transceiver (as seen in VIII. My Little GP1UE267XK) and pushing various buttons on the remote. That sounds pretty annoying - I'd have to pick up and put down the remote all the time, in between typing, and the resulting pulses would vary in length according to factors beyond my control (as I documented in XIII. WWASD?).
Here's a better idea: I'll build logic in the FPGA to generate precise, reproducible pulse sequences, under control of the host PC. Without moving my hands from the keyboard, I'll download various pulse sequences to the hardware, which in turn will drive the device-under-test (f2013 firmware in active development). If I equip the f2013 and testbench logic with a SPI-to-JTAG bridge (as seen in XIV. Hello, world!), then the f2013 can send its interpretation of the pulse sequence back to the host. A script can compare the f2013's report with what was sent - so - a regression test system is possible.
Here's a top-level block diagram of the system:
You can see three basic sub-blocks:
Notice symmetry: the testbench in XII. Gathering the XBox DVD Remote Codes: Method transformed sequences of pulses from the IR remote into sequences of pulse durations. The new testbench will do the inverse transformation, durations to pulses. I have a big pile of labeled sample data which I collected during the IR remote protocol analysis; I can "play back" the samples to the f2013 firmware as I develop it.
Alright then. For the implementation, I'll generate the entire FPGA system (quartus project, pinout declarations and HDL) via script, relying heavily on europa_module_factory. The next step will be to flesh out the sub-sub-blocks within the VJI and Pulse Gen sub-blocks.
The component's Family/Genus/Species is Adapter/Avalon-ST adapter/Avalon-ST Error Adapter.
Adapter
An adapter, generally speaking, is a simple component which is inserted in between a pair of components of a particular type, to accomplish some sort of conversion. A typical example is the data-width adapter (to allow connection of, say, an Avalon-MM master and slave, or an Avalon-ST source and sink, which happen to have different data widths). A good adapter is fairly simple, does only one thing, and is completely parameterized according to information from the interfaces it connects to.
Avalon-ST Adapter
Adapters of this description are tailored to the particular set of signals supported by Avalon-ST. These adapters understand the specific direction of Avalon-ST signals (e.g. the "data" signal is an output from a source, and an input to a sink; the "ready" signal is an input to a source, and an output from a sink).
Avalon-ST Error Adapter
This adapter does straight-through wiring on all Avalon-ST signals except for one, the "error" signal. All signals other than "error" are required to have the same bit width on both the source and sink which the adapter connects to. And that's where the regularity ends: the "error" signal is wonderfully free to vary. The source may have no error signal, or a multiple-bit one; likewise on the sink. With mismatched widths, how can the adapter do its job? Well, one more thing about the error signal: each individual bit of the error signal has a "type", which is an arbitrary string, or the special string "other". Given a "type map" for all the source and sink error bits, there are some simple rules for error signal adaptation: error bits whose types match wire straight through; an output bit of type "other" is driven with the OR of all unmatched input bits; and any output bit with no matching input is driven with 0.
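Those rules, sketched in plain Python (my own illustrative model, not the Europa implementation):

```python
def adapt_error(i_err_bits, i_types, o_types):
    # i_err_bits: list of input error bit values, indexed per i_types.
    # i_types / o_types: hashes from type string to bit index.
    unmatched_or = 0
    for typ, idx in i_types.items():
        if typ not in o_types:
            unmatched_or |= i_err_bits[idx]      # collect unmatched inputs

    out = [0] * len(o_types)                     # unmatched outputs stay 0
    for typ, idx in o_types.items():
        if typ == 'other':
            out[idx] = unmatched_or              # OR of unmatched inputs
        elif typ in i_types:
            out[idx] = i_err_bits[i_types[typ]]  # direct-map bits
    return out
```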
Huzzah, a new base class
Since any Avalon-ST adapter will have a mess of same-width signals which wire straight through, and then maybe one signal which needs some special treatment, it makes sense to derive a base class (avalon_st_adapter) from europa_module_factory, from which all Avalon-ST adapters will further derive. This base class calls into a derived class method for doing any special signal handling, then does straight-through wiring on any remaining (non-special) signals. The derived class is concerned only with doing its special job on its special signal(s), and managing any options relevant to the special signal(s).
Command-line args - limited to simple numerals and strings thus far
But that's far too limiting. Here's why: avalon_st_adapter is a component in its own right, though really a silly one. Its generation parameters are a set of signal descriptions (name, width and type) on its "in" (driven by the source) and "out" (driving the sink) interfaces. It's natural to think of these input parameters as a simple pair of hashes, keyed on signal type. But I want to retain the guideline of pure command-line specification of parameters. I could encode those hashes as comma-separated lists of things, to be massaged and processed by a script into proper perl data structures, but that seems like a lot of work. What to do? No problem, I simply pass my hashes in perl syntax on the command line, appropriately escaped, and "eval" does the parsing for me. Validation of these non-scalar fields presents a bit of a nuisance, but nothing that can't be dealt with. For now, I simply validate against the field "type" (HASH, ARRAY, CODE, undef-for-scalar), but nothing stops me from (in the future) defining nested parameter data structures which encode the same sorts of value sets and ranges that I already use for scalar parameters.
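For comparison, a Python version of the same trick would lean on ast.literal_eval rather than a bare eval (everything below is illustrative - made-up option names, and hashes written without spaces to keep shell quoting out of the picture):

```python
import ast

# argv as it might arrive from the shell
argv = ["--in_err={'a':1,'c':0}", "--out_err={'a':2,'other':1}"]

params = {}
for arg in argv:
    key, _, value = arg.lstrip('-').partition('=')
    params[key] = ast.literal_eval(value)   # parse the literal safely
```

The same idea as the Perl eval, but literal_eval only accepts data literals, so a hostile command line can't run code.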
To the basic set of port declarations which form avalon_st_adapter's generation parameters, avalon_st_error_adapter adds two more hashes, in which the "in" and "out" interface error bit types are described.
Testing, testing...
Once I had the basic avalon_st_adapter class working, and a skeleton implementation of avalon_st_error_adapter, I found myself doing lots of exploratory refactoring. I worried that I'd break the functionality, which I was happy with. Solution: unit tests. In my case, this means, for each component, a handful of test-case scripts, each of which produces an HDL module, and a top-level test script, "test.sh". After making a change, I run ". test.sh", which runs all the test cases and diffs the output HDL against a set of "known-good" files in a subdirectory. Occasionally, a change is made which does change the output HDL, and for those cases, I carefully examine the new and old files to convince myself that the new HDL file can replace the old known-good one (or note that I've created a new bug, and fix it).
Avalon-ST signal type "error": wonderfully free form
Actually, this is rather an annoying adapter, due to the unconstrained nature of the "error" signal. You'll note that all of the nuisance is concentrated in avalon_st_error_adapter::make_special_assignments().
HDL comments
I went a bit out of my way to produce comments on the adapter assignments, to label the signals and error bit types. Here's a nice ascii block diagram of error adapter test case "3":
... and here's a snippet of test case 3's HDL implementation, showing handy assignment comments:
That's it for now. For all my most ardent fans, I'm attaching the new avalon_st_adapter and avalon_st_error_adapter components, along with their test scripts and known-good HDL files. I'm also including the latest version of europa_module_factory, which changed slightly to support the new command-line processing.
From the top
From the user's point of view, the top level is a simple script, "mk.pl", which invokes the top-level package. This script is not truly a part of the architecture, but it's an easy place to start. Here's the basic call-and-response, at your friendly neighborhood cygwin shell:
[SOPC Builder]$ perl -I common mk.pl
mk.pl:
Missing required parameter 'component'
Missing required parameter 'top'
Missing required parameter 'name'

Usage:
  mk.pl [--help]
      (Print this help message)
  mk.pl --component=<component name> \
        --top=<top level module> \
        --help
      (Print sub-package-specific (component::top) help)
  mk.pl --component=<component name> \
        --top=<top level module> \
        --name=<top level module name> \
        <component-specific options>
      (Create a module of type component::top, with the
       given name and options)
A few notes here:
sub-package-specific help
Here's something cool: component sub-packages must declare their required fields and valid values for those fields. Wouldn't it be handy if sub-package-specific help text were built from that same set of declared required fields? Yes, very handy. Example:
[SOPC Builder]$ perl -I common mk.pl --component=spi_slave \
>     --top=spi_slave_mm --help
Allowed fields in package 'spi_slave::spi_slave_mm':
  datawidth:
    Data width
    range: [1 .. maxint]
  lsbfirst:
    data direction (0: msb first; 1: lsb first)
    range: [0 .. 1]
(By the way, sub-package "spi_slave_mm" is one of the expected top-level packages - it's the SPI Slave component with an Avalon-MM flowcontrol interface.)
How about some help for a less top-level sub-package?
[SOPC Builder]$ perl -I common mk.pl --component=spi_slave \
>     --top=fifo --help
Allowed fields in package 'spi_slave::fifo':
  datawidth:
    range: [1 .. maxint]
  depth:
    allowed values: 1
You can see that this help is less verbose - that's simply because sub-package "fifo" didn't happen to provide descriptions for its fields.
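The mechanism is easy to sketch in Python terms (the field-spec shape below - description, range, values - is my own modeling of the idea, not the actual Europa code):

```python
# Sketch: render help text from a package's declared fields.
def help_text(package, fields):
    lines = ["Allowed fields in package '%s':" % package]
    for name in sorted(fields):
        spec = fields[name]
        lines.append("  %s:" % name)
        if 'description' in spec:
            lines.append("    %s" % spec['description'])
        if 'range' in spec:
            lo, hi = spec['range']
            hi = 'maxint' if hi is None else hi     # open-ended upper bound
            lines.append("    range: [%s .. %s]" % (lo, hi))
        if 'values' in spec:
            lines.append("    allowed values: %s" %
                         ", ".join(str(v) for v in spec['values']))
    return "\n".join(lines)
```

Declare the fields once, and both validation and help text fall out of the same data.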
Building an SPI Slave
Help text is swell, but what does a successful component build look like? A few more -I includes are required; a Makefile helps keep things tidy:
[SOPC Builder]$ make
perl \
    -I $QUARTUS_ROOTDIR/sopc_builder/bin \
    -I $QUARTUS_ROOTDIR/sopc_builder/bin/europa \
    -I $QUARTUS_ROOTDIR/sopc_builder/bin/perl_lib \
    -I ./common \
    mk.pl \
    --component=spi_slave \
    --top=spi_slave_mm \
    --name=spi_0 \
    --target_dir=. \
    --lsbfirst=0 --datawidth=8
I've set a name for the top-level HDL file (spi_0), and specified a target directory in which to generate. Parameters "lsbfirst" and "datawidth" are passed along to the chosen subpackage, "spi_slave::spi_slave_mm".
Generator Innards
The basic inner loop of a generator sub-package looks something like this:
# construct an instance of a sub-package module-factory, for example:
my $tx = spi_slave::tx->new({
    lsbfirst  => 0,
    datawidth => 13,
});

# ask the module-factory for an HDL instance, and add it to the module:
$module->add_contents(
    $tx->make_instance({
        name => "the_tx",
    }),
);
Besides factory-generated instances, the sub-package will add simple logic of its own to the module.
SPI Slave Results
The new SPI Slave component occupies 32 LEs in my example system (tb_8a), and functions just as the old SPI component did (the old component occupied 40 LEs). The new component is heavily modularized; individual modules tend to be very simple. The module hierarchy of the component has 3 levels:
The simplicity of this component hides some of the power of the europa_module_factory architecture. It turns out that only a single factory instance is created by each sub-package of the SPI Slave, and in only one case (sub-package "fifo") does a factory deliver more than one HDL instance; in general, though, a single sub-package will create multiple factory instances, which in turn will deliver multiple HDL instances.
By the way, that middle level, spi_slave_st, is a perfectly viable top-level component all on its own, assuming you'd like an Avalon-ST sink and source, rather than an Avalon-MM slave with flow control. This highlights what I believe is a major feature of the architecture: hierarchy comes "for free". Any perl package (and likewise, HDL module) can be instantiated within another. The way is clear to create deeply-nested design hierarchies composed of reusable blocks. It's also possible to build complete systems of components and interconnect, all within a single running perl process. But possibly the most common use of hierarchy will be to add a small amount of functionality to an existing component, by wrapping that component in another layer.
Here's an archive of the component factory modules, spi slave modules and Makefile/build script.
So, I'll be writing perl scripts which will generate HDL for me. Perl is wonderfully flexible, which means there are an unwonderfully infinite number of ways to proceed from here. Let's see if I can trim down the possibilities a bit with some goals...
... and guidelines:
... but I don't mean...
It might sound like I'm saying that the perl package hierarchy should reflect the HDL hierarchy. Not so; in fact, this is not possible in general. To understand why, consider the fact that instances of a particular module may appear at various places within the HDL hierarchy. I'll just place all of my subpackages one level down from the top-level package in the package hierarchy; in the file system, package foo (foo.pm) and subpackage foo::bar (bar.pm) will reside in subdirectory foo.
I expect a payoff!
Related note: why bother with all these subpackages? I see these potential payoffs for the added complexity:
With sadness, a confession
For my Europa-generated modules, I would like to think in terms of two possible forms of parametrization:
The two types of parametrization are orthogonal: a module may have no parametrization, generation-time parametrization only, instantiation-time parametrization only, or both types of parametrization.
Unfortunately, Europa (as it stands today) does not handle instance-time parametrization very well. In particular, the most obviously-useful form of instance-time parameterization, parameterizable port widths, is not supported. So, I'm forced to fall back upon generation-time parametrization even for simple port width parameters.
So what does it look like?
The nucleus of the implementation is a perl package, europa_module_factory
, which defines the base class. Subclasses of europa_module_factory are responsible for producing families of modules grouped by generation-time parametrization. Each subclass implements the following methods:
This is sounding much more abstract than it actually is, so it's time for a simple example. The sub-block of the SPI slave, "rx", is a simple shift register with serial input, parallel output and a couple of control signals. There are two generation options, "lsbfirst" and "datawidth". Here's its perl module, rx.pm:
package spi_slave::rx;

use europa_module_factory;
@ISA = qw(europa_module_factory);
use strict;

sub add_contents_to_module {
    my $this = shift;
    my $module = $this->module();
    my $dw = $this->datawidth();

    $module->add_contents(
        e_register->new({
            out    => "rxbit",
            in     => "sync_MOSI",
            enable => "sample",
        }),
    );

    my $rxshift_expression;
    if ($this->lsbfirst()) {
        my $msb = $dw - 1;
        $rxshift_expression = "{rxbit, rxshift[$msb : 1]}";
    } else {
        my $msb2 = $dw - 2;
        $rxshift_expression = "{rxshift[$msb2 : 0], rxbit}";
    }

    $module->add_contents(
        e_register->new({
            out    => {name => "rxshift", width => $dw, export => 1,},
            in     => $rxshift_expression,
            enable => "shift",
        }),
    );
}

sub get_fields {
    my $class = shift;
    my %fields = (
        datawidth => {range => [1, undef]},
        lsbfirst  => {range => [0, 1]},
    );
    return \%fields;
}

1;
How to invoke that perl module? A simple Makefile and top-level generation script (mk.pl) handle the grunt work. The command line is:
make COMPONENT=spi_slave FACTORY=rx NAME=rx_0 \ OPTIONS="--lsbfirst=1 --datawidth=8"
And the resulting HDL (with a bit of boilerplate removed) is:
//Module class: spi_slave::rx
//Module options:
//    datawidth: 8
//    lsbfirst:  1
//    name:      rx_0

module rx_0 (
    // inputs:
    clk,
    reset_n,
    sample,
    shift,
    sync_MOSI,

    // outputs:
    rxshift
);

    output [ 7: 0] rxshift;
    input clk;
    input reset_n;
    input sample;
    input shift;
    input sync_MOSI;

    reg rxbit;
    reg [ 7: 0] rxshift;

    always @(posedge clk or negedge reset_n)
    begin
        if (reset_n == 0)
            rxbit <= 0;
        else if (sample)
            rxbit <= sync_MOSI;
    end

    always @(posedge clk or negedge reset_n)
    begin
        if (reset_n == 0)
            rxshift <= 0;
        else if (shift)
            rxshift <= {rxbit, rxshift[7 : 1]};
    end

endmodule
A nearly-identical invocation generates the SPI component top-level:
make COMPONENT=spi_slave FACTORY=spi_slave NAME=spi_0 \ OPTIONS="--lsbfirst=1 --datawidth=8"
Save some for later
Thoughts for future work:
For the curious, I attach the complete set of files for the spi_slave and underlying europa_module_factory, as of this moment. The SPI slave is not yet complete, but has a top-level module and two sub-modules for illustration.
But I eschew all that for a different Altera creation: a collection of perl modules called Europa.
First, the history. Long ago, some clever engineers needed to wire a little soft-core CPU to its UART, onchip memory, PIOs and other peripherals. They could have just banged out a Verilog description of the interconnections and called it a day, but that was too easy. Also, what if someone wanted eleven UARTs? What if they somehow thought VHDL was better? Then what? Clearly, automatic generation of the interconnect bus was indicated, and while we're at it, go ahead and generate the HDL for the UART and PIO as well. What better language in which to write a generator program than Perl? Case closed.
Time for a simple example. Consider the following logic, displayed in a format which still, for me, resonates deeply:
(I hope the notation is clear: the diagram shows a module named simple, which has some inputs, an OR gate, a D-flip-flop, and an output.)
Module simple translates fairly readily to Verilog:
module simple(
    clk,
    reset_n,
    a,
    b,
    x
);
    input clk;
    input reset_n;
    input a;
    input b;
    output x;

    wire tmp;
    assign tmp = a | b;

    reg x;
    always @(posedge clk or negedge reset_n)
        if (~reset_n)
            x <= 1'b0;
        else
            x <= tmp;
endmodule
Even in this extremely simple example, you can see Verilog's flaws. The module's inputs and output are listed twice: once in the module port list and again as input and output declarations within the module. A different sort of redundancy exists between a given signal's direction (as input, output or neither - internal) and its role in the described logic (signal with no source, signal with no sink or signal with both source and sink). Here's the Europa equivalent of the above Verilog, which solves those problems:
use europa_all;

my $project = e_project->new({
  do_write_ptf => 0,
});

my $module = e_module->new({name => "simple"});

$module->add_contents(
  e_assign->new({
    lhs => "tmp",
    rhs => "a | b",
  }),
  e_register->new({
    out    => "x",
    in     => "tmp",
    enable => "1'b1",
  }),
);

$project->top($module);
$project->output();
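To show how little machinery the generate-HDL-from-objects idea requires, here's a toy sketch of the same pattern in Python. To be clear: the `Module` class and its methods are invented for illustration and are nothing like Europa's actual API, and the port list is hardcoded to keep the toy small.

```python
# Minimal sketch of the metaprogramming idea behind Europa: build a
# netlist as objects, then emit HDL text. Class and method names here
# are invented for illustration; they are not Europa's API.

class Module:
    def __init__(self, name):
        self.name = name
        self.assigns = []   # (lhs, rhs) continuous assignments
        self.regs = []      # (out, in) registers with async reset

    def add_assign(self, lhs, rhs):
        self.assigns.append((lhs, rhs))

    def add_register(self, out, in_):
        self.regs.append((out, in_))

    def to_verilog(self):
        # Ports are hardcoded for this toy; a real generator would
        # infer them from each signal's role in the netlist.
        lines = [f"module {self.name}( clk, reset_n, a, b, x );"]
        for lhs, rhs in self.assigns:
            lines.append(f"  wire {lhs};")
            lines.append(f"  assign {lhs} = {rhs};")
        for out, in_ in self.regs:
            lines.append(f"  reg {out};")
            lines.append("  always @(posedge clk or negedge reset_n)")
            lines.append(f"    if (~reset_n) {out} <= 1'b0;")
            lines.append(f"    else {out} <= {in_};")
        lines.append("endmodule")
        return "\n".join(lines)

m = Module("simple")
m.add_assign("tmp", "a | b")
m.add_register("x", "tmp")
print(m.to_verilog())
```

The point is the shape of the thing, not the emitted text: the design lives as data structures, so a generator can infer signal direction and width instead of making you declare them twice.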
The benefits are clear:
So there you have it. I can merrily go off and build my SPI slave component in Europa, and generate to the HDL of my choice. Great!
However! I couldn't very well count myself among the top 1 billion software architects on the planet if I just went off and coded my component as pages and pages of formless Perl/Europa. No, no, no. I must first make class hierarchy diagrams, invent some directory structure guidelines, and worry about names of API methods. That's the key to success in this business.
Side-note/tangent/rant: there is an alternate Verilog style for coding combinational logic which would prefer the following for the tmp assignment:
reg tmp;
always @(a or b)
  tmp = a | b;
I think most programmers would report a long list of flaws in this style. For myself, I find that:
That said, I admit (to my astonishment) that the above is a preferred coding style in the industry.
Not a problem: in the end, the precise style of Verilog coding is irrelevant, because (in my so-humble opinion), if you're coding in Verilog, you've already made the wrong choice. So let's not fight this fight: we can leave Verilog style issues to the language lawyers, the guardians of legacy code bases, and those evil-doers with a vested interest in seeing that HDL coding remains a black art.
Here are the sub-blocks of the SPI Slave implementation, which will map directly to HDL modules:
That seems like a lot of blocks! Fortunately, though, most of them are very very simple.
Next, I'll get to dig into the implementation. I'll probably need to say some introductory words about Europa, first.
A Note on Clock-Domain Crossing
I've made the choice to synchronize the SPI input signals as they enter the FPGA; all logic in the SPI slave will be in the system clock domain. Delaying the SPI signals like this implies an upper bound on the SCLK frequency, relative to the system clock rate (I think the max SCLK will be something like 1/4 the system clock frequency). There is another option: SCLK could drive a subsection of the SPI slave, all the way from serial input to parallel output. The parallel output would connect to proper clock-crossing FIFOs. This solution would be more complex, but should be able to run at a higher clk/SCLK ratio. I won't implement this solution for now, but it's worth keeping in mind if higher bandwidth is needed.
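The synchronize-then-sample approach boils down to an edge detector in the system clock domain. Here's a small Python model of that idea (my own sketch, not RTL); note that it idealizes away metastability and synchronizer latency, which are what actually force the SCLK bound down toward clk/4.

```python
# Sketch: detect SCLK edges by oversampling with the system clock.
# 'ratio' is system clocks per SCLK period. This idealized model
# ignores the two-flop synchronizer delay and input jitter that set
# the real-world bound on SCLK.

def sclk_samples(ratio, periods):
    # Idealized SCLK as seen by the system clock: low for the first
    # half of each period, high for the second half.
    return [1 if (i % ratio) >= ratio // 2 else 0
            for i in range(ratio * periods)]

def count_rising_edges(samples):
    # The edge detector: compare each sample against a one-sample-old
    # copy (a previous-value register, in hardware terms).
    prev, edges = 0, 0
    for s in samples:
        if s == 1 and prev == 0:
            edges += 1
        prev = s
    return edges

print(count_rising_edges(sclk_samples(4, 10)))   # all 10 edges seen
```

In hardware, `prev` is just a flop fed by the (already synchronized) SCLK input, and the `s == 1 and prev == 0` term is the rising-edge strobe that clocks the rest of the slave logic.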
(The data-related stuff is obvious; see Wikipedia for a decent explanation of CPOL and CPHA.)
I see a way to make my task easier: drop configurable CPOL and CPHA. That sort of configurability in an SPI slave is a useless feature, and I can prove it. First, consider this fact: most existing SPI slaves lack CPHA and CPOL configurability. (Imagine a cheap SPI-equipped ADC chip. If it were configurable, how would you configure it? By tying pins high or low? Too expensive. By init-time communication via secret codes from the SPI master? Well, I'm sure you see the problem with that idea.) Because there exist non-configurable SPI slaves, SPI masters (like the one in the f2013) must pick up the slack and provide variable CPHA and CPOL. There's no value in making both ends of the link configurable, so I'll drop that bit of needless complexity and choose CPOL=0, CPHA=0 for my new SPI slave.
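For reference, the standard CPOL/CPHA convention (as described in the Wikipedia article mentioned above) packs the two bits into a mode number, and together they determine which SCLK edge data is sampled on. A quick Python restatement:

```python
# Standard SPI mode numbering: mode = CPOL*2 + CPHA. With CPHA=0 data
# is sampled on the leading SCLK edge; CPOL sets the idle level, so
# the leading edge is rising when CPOL=0 and falling when CPOL=1.

def spi_mode(cpol, cpha):
    return (cpol << 1) | cpha

def sample_edge(cpol, cpha):
    # Returns 'rising' or 'falling': the edge on which data is sampled.
    leading_is_rising = (cpol == 0)
    sample_on_leading = (cpha == 0)
    if sample_on_leading:
        return 'rising' if leading_is_rising else 'falling'
    else:
        return 'falling' if leading_is_rising else 'rising'

print(spi_mode(0, 0), sample_edge(0, 0))   # 0 rising
```

So my CPOL=0, CPHA=0 choice is mode 0: SCLK idles low, and the slave samples MOSI on the rising edge.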
That's a relief. Fewer features means less complexity, fewer bugs, easier testing. So, what features do I deem useful enough to implement?
That'll do. Next time: some block diagrams.
The first two components are beyond reproach: the JTAG UART because I have no idea how it works, but it works, and my little byte-pipe because it's so cute and tiny.
But the SPI slave: that's another matter. My feeling upon reading the HDL implementation, spi.v, is that its designer was not only criminally incompetent, but also completely uninterested in producing legible code. And maybe 40 LEs is not so much, but I believe I can do better. Also, in the process of doing this reimplementation, I can write a few words on my favorite hardware design methodology, "Europa".
To wrap up, here are the metrics by which I'll judge the existing spi.v versus my new implementation, in priority order:
There you have it. The challenge is on!
But, since I seem to be compelled to develop a design environment rather than actually work on a design (what was I working on? Something about turning something on, or off... something like that), let's talk about debugging tools. Well. Let's talk about bugs, first.
Bugs that, say, the compiler catches are simple: they're right there in red text. Those bugs die quickly. (Are these even bugs? Perhaps not. But I'll assume they are, for the sake of my point.) On the other end of the spectrum are those extremely intermittent bugs which seem to occur only when we're not looking. We know these bugs through stale logfile tracings, collections of disproven hypotheses, and a body of murky, often contradictory and superstitious lore which grows throughout the long, long lifespan of the bug.
Bugs elude us by hiding. So, what do I want from a debugging tool? I wouldn't ask for too much - just something which:
I do have a few debugging tools handy. How do they rate?
So, it looks like I'm a little heavy on the powerful-but-hard-to-use side. What I'm missing is something simple and quick to iterate on, which can give me lots of data without burdening the system too much. What I need is something like... printf.
Something like printf
Most microcontrollers have some form of built-in serial communications module. The f2013 is a bit odd: rather than a plain-ol' UART, its communications module speaks SPI or I2C. But that's ok - my laptop doesn't have a UART either. Fortunately, to assist me in the simple goal of streaming bytes from the f2013 to my laptop under firmware control, I have a giant heap of programmable logic right next to my f2013, which I can use to bridge the gap between the f2013 and the laptop. Think of the solution as an "SPI-to-JTAG bridge". Here's a block diagram showing the path of a byte from f2013 to laptop:
The components of the system are:
Now, not to pull a fast one: I should point out that this is the first time I'm using a system which consists of anything other than hand-typed Verilog and the odd megafunction module. I designed this system using Altera's SOPC Builder, which you can think of as a heap of useful hardware components and an automatic bus generator. The system consumes 303 logic elements - pretty small. For reference, tb_6, my most complex system so far, consumed 312 LEs. SOPC Builder generation and Quartus compilation complete in about 2.5 minutes.
For anyone reading who happens to be familiar with SOPC Builder, I used these tricks to optimize the system for low logic consumption and fast generation:
f2013 firmware
This is a pretty simple firmware project: all I'm doing is sending a string of bytes, over and over, so I can see it in the terminal program. It took a bit of research to hit upon the correct combination of control register values; see utility routines init_spi and send_spi in the attached project archive.
By the way, I have to create my own SPI chipselect (SS_n), using a generic f2013 pin, since the built-in SPI master doesn't provide that automatically. Also notice that send_spi is a polling transmit routine: clearly the next step is to create an IRQ-based transmitter.
Hello, world!
Here's a screen capture of the spi test firmware in action:
P.S. The bug is in the pin assignments
For my own reference, mostly, here's a table of pin names and functions for this little test bench:
f2013 function | f2013 SPI function | 1c20 pin | J15 pin |
---|---|---|---|
P1.4 | <none> | U11 | J15-12 |
P1.5 | SCLK | Y11 | J15-14 |
P1.6 | MOSI | W11 | J15-13 |
P1.7 | MISO | V11 | J15-11 |
Here's the tb_8 archive.
20070922: a small optimization: SOPC Builder's DMA is a bit overpowered for the simple task of moving bytes from one place to another. Also, needing to use undocumented features annoys me. It was about a half hour's work to create a new component (simple_byte_pipe) that does the same job, with less logic. The new system (files attached as tb_8a.zip) uses only 248 LEs, and builds in 1.5 minutes.
Remember that the XBox remote transmits in the RCA protocol; that protocol allows for 5 different sorts of pulses:
The real world is usually a bit messy - in this particular case, the data I measured coming out of the IR receiver module diverges from those nice values. So, by trial and error, I determined an upper and lower bound for each pulse type. I bin the data into the 5 types according to these thresholds (a value falls in a bin if it is in the range (min-bin-value, max-bin-value), where the parens represent noninclusive boundaries):
bin | min-bin-value | max-bin-value |
---|---|---|
mark_4ms | 4.01 | 4.07 |
space_4ms | 3.9 | 4.0 |
mark_500us | 0.5 | 0.56 |
space_1ms | 0.9 | 1.0 |
space_2ms | 1.94 | 2.0 |
Notice that the "mark" bins have larger than nominal values, while the "space" bins have smaller than nominal values. (By the way, by design, every duration value I collected falls into one of the 5 bins.)
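The binning rule is trivial to state in code. Here's my own Python restatement of the thresholds from the table above (the real processing is done by the scripts in the testbench archive); bounds are exclusive, as described:

```python
# Classify a measured duration (in ms) into one of the 5 pulse bins,
# using the exclusive (min, max) thresholds from the table above.
# Returns None if the value falls into no bin.

BINS = {
    'mark_4ms':   (4.01, 4.07),
    'space_4ms':  (3.9,  4.0),
    'mark_500us': (0.5,  0.56),
    'space_1ms':  (0.9,  1.0),
    'space_2ms':  (1.94, 2.0),
}

def classify(duration_ms):
    for name, (lo, hi) in BINS.items():
        if lo < duration_ms < hi:
            return name
    return None

print(classify(4.046), classify(0.54), classify(1.97))
# mark_4ms mark_500us space_2ms
```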
Here are some statistics on the collected data, grouped by bin:
bin | average value | min value | max value | deviation from nominal (%) | number of samples |
---|---|---|---|---|---|
mark_4ms | 4.046216011 | 4.02744 | 4.06724 | +1.155% | 1514 |
space_4ms | 3.979702576 | 3.96054 | 3.99844 | -0.507% | 1514 |
mark_500us | 0.539785722 | 0.51732 | 0.55442 | +7.957% | 37850 |
space_1ms | 0.95811015 | 0.94466 | 0.98132 | -4.189% | 18168 |
space_2ms | 1.965947435 | 1.94998 | 1.98714 | -1.703% | 18168 |
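As a sanity check on the "deviation from nominal" column: it's just the average's offset from the nominal pulse duration implied by the bin name. In Python:

```python
# Recompute the 'deviation from nominal' column as
# (average - nominal) / nominal, in percent. Averages are copied from
# the table above; nominal durations (ms) come from the bin names.

nominal = {'mark_4ms': 4.0, 'space_4ms': 4.0, 'mark_500us': 0.5,
           'space_1ms': 1.0, 'space_2ms': 2.0}
average = {'mark_4ms': 4.046216011, 'space_4ms': 3.979702576,
           'mark_500us': 0.539785722, 'space_1ms': 0.95811015,
           'space_2ms': 1.965947435}

for name in nominal:
    dev = (average[name] - nominal[name]) / nominal[name] * 100
    print(f"{name}: {dev:+.3f}%")   # matches the table's deviation column
```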
Average, min and max are useful, and reveal basic facts, but also hide other things. For a full picture, there's nothing like... a picture. I discovered something interesting when I plotted each bin's data as a histogram. Click on a chart thumbnail to see the full-size version:
IR Receiver Output
[Histogram charts, one per bin: mark_4ms, space_4ms, mark_500us, space_1ms, space_2ms]
Isn't that peculiar? The data for each bin is clustered into groups rather than being a nice normal distribution.
Well. The measurement is made on the output of the IR receiver; could that receiver be distorting my nice clean data? Moving upstream a bit, I took apart the Xbox remote control, and found a very simple circuit driving the IR LED. (The box labeled "micro" is an integrated circuit whose markings were pretty much indecipherable - presumably some simple microcontroller.)
I've taken a new set of measurements between the cathode of the IR LED and ground.
Internal-to-XBox-Remote, IR-cathode-to-ground
[Histogram charts, one per bin: mark_4ms, space_4ms, mark_500us, space_1ms, space_2ms]
These histograms show that the XBox remote itself is emitting pulse durations which tend to cluster into sub-bins separated by about 20μs. The histograms have higher peaks than those measured at the IR output - perhaps the IR receiver circuit has a "smearing" effect (I can imagine that the automatic gain-control part of the receiver would have this effect).
Move forward!
This measurement and analysis is fun laboratory work, but I'm anxious to get started on some firmware. Were I to continue in this vein, I'd work on some of these tasks:
But instead, now it's time to get back to my long-neglected f2013. As a preliminary step, I plan to learn about the f2013's clock source options and think about the problem of debugging visibility in embedded systems.
Oh, right. Here are the up-to-date tb_6 (IR receiver measurement) and tb_7 (IR LED cathode measurement) files.
First, a recap: back in X. Start Making Sense, I displayed disappointment at Signaltap's partial scriptability, and resigned myself to manually processing the data, in heavy interaction with Signaltap's GUI and Excel. This solution, though workable, didn't sit well with me, and fortunately I found a better method.
The Virtual JTAG Interface To The Rescue!
The sld_virtual_jtag megafunction, aka the Virtual JTAG interface (VJI), provides access to the same on-chip hardware resources that signaltap makes use of, but in a far more user-configurable form. Here's what the Mighty Altera Corporation says:
The megafunction can be used to diagnose, sample, and update the
values of internal parts of your logic. With this megafunction, you
can easily sample and update the values of the internal counters and
state machines in your hardware device.
You can build your own custom software debugging IP using the Tcl
commands listed above to debug your hardware. This IP
communicates with the instances of the sld_virtual_jtag
megafunction inside your design.
I used the VJI, suitably wrapped in glue logic, to gather data under control of a tcl script. I'll give a brief textual description of the method I used; check out the attached design files for more details.
Remember that my goal is to measure the durations of the mark and space values emitted by the remote control. The measurement circuit consists of these functional blocks:
Those simple circuit elements, running at 50MHz, write the sequence of IR duration values into the FIFO. The read side of the FIFO is controlled by the VJI, which exposes two values to the outside world:
A tcl script (pseudo code, here - the actual script is get_data.tcl) running on the host gathers the FIFO data:
while (FIFO nonempty)
  read FIFO
  print FIFO readdata
end while
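The actual script is tcl, but the control flow is easy to model. Here's a Python sketch against a mock FIFO; the `VjiFifo` class is a stand-in I invented for the real VJI/JTAG transactions:

```python
# Model of the host-side drain loop: poll the FIFO-empty flag exposed
# through the VJI, reading one duration value per iteration. VjiFifo
# is a mock; in get_data.tcl, empty() and read() are JTAG transactions
# against the sld_virtual_jtag instance.

class VjiFifo:
    def __init__(self, values):
        self._values = list(values)

    def empty(self):
        return not self._values

    def read(self):
        return self._values.pop(0)

def drain(fifo):
    data = []
    while not fifo.empty():
        data.append(fifo.read())
    return data

print(drain(VjiFifo([4046, 3979, 540])))   # [4046, 3979, 540]
```

One practical note: because the loop polls until empty, it naturally tolerates the host being much slower than the 50MHz capture logic, as long as the FIFO is deep enough to absorb the burst.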
The script process_data.bat transforms data from all raw data files into a single summary file and one processed data file per raw data file.
So, that's the basic operation of the data-acquisition logic. Next time I'll do some analysis on the gathered data.
Postscript: The testbench for this data acquisition fiesta is "tb_6"; here are the files. This testbench could actually be useful to anyone trying to analyze a serial protocol, so it's worthwhile to give a bit of an overview. The zip file contains these files: