Tuesday, September 1, 2009

software development using a simulator

The software in the prior entry, msploader, contains a directory openmsp430. This contains just enough of the openmsp430 core to allow you to simulate your msp430 program. The openmsp430 package available from opencores appears to be very polished. It is not just a dozen verilog files, there is a debugger, documentation on the test bench and debugger as well as software tools to help debugging if/when you use this core in an fpga. I recommend you log in to opencores and download the whole openmsp430 package.

http://www.opencores.org/project,openmsp430

My goal is not to use this core in an fpga but to instead do software development with rom, ram, register, pin, etc, visibility for a real msp430. Also provides a good learning tool for those wanting to learn embedded or assembly or just how to develop in a simulation environment where you have your software and the hardware engineers logic.

I have only just barely started with this core and simulation so there are no frills, eventually I would like to be able to demonstrate how to stimulate inputs using C such that a non-verilog programmer can develop in this sort of an environment.

For the last howevermany years my day job has included just such a thing. Long before the boards/chips arrive you can start your software development and/or debug the logic being developed by the hardware engineers. Using a small amount of throwaway code as an abstraction layer and maybe some code for stimulus if needed, you can write programs in a native programming language (like C). Basically you can write and debug code right now that can be reused when the hardware arrives. Traditionally you either wait for the hardware or you create some sort of software solution to wrap around your code, but that software solution does not include a functional simulation of the actual logic, so you may get well into your development only to find that the hardware and software do not agree on how each other should work. How many thousands of lines of code do you have to undo and unravel to fix this blunder?

I wanted to do this with free tools like verilator or icarus verilog. I like verilator in concept because it produces C/C++ and the interface with your programs is easier and more comfortable lets say. But it is very picky about the verilog and right off the bat started complaining about openmsp430 code. I did not want to get into that just yet so I stuck with icarus verilog for now which was happy with the code as-was. I have been using http://cyclicity-cdl.sourceforge.net recently. To use this the hardware has to be designed in cdl then you can turn cdl into C++ and/or verilog. To use it here I would need to re-write the msp430 core in cdl, something I would like to do some day, but have not even started.

Again, I have only just started using icarus verilog and this openmsp430 core so all I have to offer at the moment is a simple blink the led demo.

I am running the 8.x and 9.x ubuntus. The tools you will need in addition to those mentioned in prior blog entries are icarus verilog and gtkwave. The icarus verilog apt-got from ubuntu worked fine, the package name is just verilog. The version of gtkwave varies across the linux distros though, so I prefer to build the latest from sources. I prefer the current gtkwave user interface to the prior ones.

I have modified the openmsp430 .inc file and test bench so that it runs so many thousands of clock cycles and stops. Some verilog is required (explained below) to change this for now, I have a few ideas to work around that in the future.

To convert the logic into something you can simulate:

cd /path/to/msploader/openmsp430/
make

This builds sim.vvp.

Next you need a rom.mem file. The verilog language includes a way to load memories/proms before the simulation starts. Kinda like using msploader to program the flash in your part before you reset it and let it run. The .mem file is trivial, one line is one memory location. You need to know what memory space the hardware engineer is expecting; in this case, using the .inc settings I have left, the rom.mem file wants addresses 0xF800-0xFFFF, with a 16 bit word per line (it is an ascii file, have a look). The mspdiss program in the msploader package has a bit of code that creates a rom.mem file from a .elf binary. So lets take a sample program:

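.equ WDTCTL , 0x120
.equ P1OUT , 0x021
.equ P1DIR , 0x022

mov.w #0x0280,r1 ; set the stack pointer
mov.w #0x5A80,&WDTCTL ; stop the watchdog timer

bis.b #1, &P1DIR ; make port 1 pin 0 an output
top:
xor.b #1, &P1OUT ; toggle the pin
mov #13,r15 ; short delay count
a:
dec r15
jnz a

jmp top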


I named mine blink.s, here were my steps to get a rom.mem from it:

PATH=/msp430/bin/:$PATH
msp430-as blink.s -o blink.o
msp430-ld blink.o -o blink.elf
../trunk/mspdiss blink.elf
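
If you want a feel for what ends up in rom.mem without reading the mspdiss source, here is a rough Python sketch of the layout described above. It is not the mspdiss code. The function names, the 0xFFFF fill value and the four digit hex formatting are my assumptions for illustration; only the 0xF800-0xFFFF range, the one word per line layout and the 0x4031 0x0280 instruction words come from this post.

```python
# Sketch only: the exact rom.mem layout mspdiss emits is an assumption here.
ROM_BASE = 0xF800      # first rom address, from the post
ROM_TOP = 0xFFFF       # last rom address, from the post
RESET_VECTOR = 0xFFFE  # the core fetches its start address from here


def build_rom_words(program, load_addr, fill=0xFFFF):
    """Return one 16 bit word per rom location, lowest address first."""
    n_words = (ROM_TOP - ROM_BASE + 1) // 2
    words = [fill] * n_words  # fill value is a guess (erased flash reads as 1s)
    first = (load_addr - ROM_BASE) // 2
    for offset, word in enumerate(program):
        words[first + offset] = word & 0xFFFF
    # Point the reset vector at the program so the core starts there.
    words[(RESET_VECTOR - ROM_BASE) // 2] = load_addr & 0xFFFF
    return words


def format_mem(words):
    """One four digit hex word per line."""
    return "".join("%04x\n" % w for w in words)


if __name__ == "__main__":
    # 0x4031 0x0280 is "mov.w #0x0280,r1", the first instruction of blink.s.
    rom = build_rom_words([0x4031, 0x0280], load_addr=0xFC00)
    text = format_mem(rom)
    print(len(rom), "words, reset vector ->", text.splitlines()[-1])
```

The point is only that the file is a flat list of words indexed from the bottom of the rom, so a word at address A lands on line (A - 0xF800) / 2, counting from zero.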

Then run the simulation:

vvp sim.vvp

VCD info: dumpfile sim.vcd opened for output.
SIMULATION DONE



This is where gtkwave comes into play. The key to using this technology is the mastery of gtkwave or some other similar tool. I will go through some basics but I cannot in this space cover every little thing, learning how to do it is part of the fun.

Note, you do not have to re-build the sim.vvp file every time you change your rom.mem file, you only have to re-build it if you change the verilog.

The output of the simulation is sim.vcd (there is a line in the verilog hardcoding this file name, you can change it if you wish).

Open the file with gtkwave:

gtkwave sim.vcd

There is a window with the letters SST above it and inside you see a plus and tb_openMSP430, that is a verilog module, the top level module.

I have modified openmsp430 by adding a chip.v layer to make it feel more like a real chip.

So click on the plus next to tb_openMSP430 to expand it, then on the plus next to chip_0, then click on the plus next to core, then click on the plus next to execution_unit_0 to expand it. Finally, click on register_file_0. The window under the top SST window now has a list of signal names, alu_stat[3:0], alu_stat_wr, cpuoff, etc. There is a filter edit box at the bottom, you can use it to limit the signals shown in that list. If the signal names for a particular thing have something in common you can use this to select just one bus or one module's data. Lets try that, enter the letters pc in that filter box. I now see only four signals. The first one pc[15:0] is the one I was interested in. Click on that signal name to select it (note: you can use the standard ctrl-A or click and shift click or ctrl-click to select more than one signal at a time).

With the signal pc selected press the Append button below the filter edit box, Append, Insert, and Replace take the selected signal and then place it in the waveform window based on which of the buttons pressed.

This signal happens to be the program counter, which is also r0, you can verify this by removing the pc letters from the filter and then finding r0[15:0] and adding it to the waveform window.

What I typically do at this point is fit the whole waveform to the window. On the gtkwave I am using there are three blue magnifying glasses on the tool bar, one with a minus, one with a plus, and one with a box like thing. The one with the box like thing makes the waveform fit to the window, press it. If you dont have that tool bar then under the Time menu select Zoom->Full.

Now hopefully your window is wide enough that you see some green stuff next to the signal names. You may need to widen your window then fit the signals again. I see a little bit of red (red is normally bad, it means the signals are undefined; in this case it is fine, the chip is coming out of reset), then four zeros, then the green boxes have pluses because the content wont fit in the small box. Here is where you start to really learn to use gtkwave. Pick one of the green boxes that has the 0000 in it and click on its right side, not right on the transition but just to the left of it. If you do it right a vertical red line shows up and snaps to that transition.

Now expand the signals window and or move the slider at the bottom. What you should see is pc[15:0]=FFFE. What this is telling you is that the register pc is changing from the value 0x0000 to 0xFFFE at that point in time. Now go up to the magnifying glass with the plus sign and click it a few times. Pretty quick you can see the plus in the box to the right of your red line turn into the FFFE value. For the pc and r0 waveforms right now you wont see this but when you click near a transition gtkwave (other waveform viewers work the same way), if close enough it will snap to that transition but on that side of the transition. So if you click to the left of the transition other signals along that time line may show one value, then if you click just to the right of that transition you may see another value in other signals. Again for these two you wont see that. If you click far enough away from a transition it will not snap, which is also desirable depending on what you want to look at.

In the signals window click on pc[15:0]=FFFE to select it. On the tool bar to the right of the magnifying glasses I see two sets of blue arrows, one has an arrow pointing at vertical bars the other does not. Click on the right arrow that does not have a vertical bar with it. If you hit the right one what should have happened is the red line moves to the next transition of pc[15:0].

So when you click on the right arrow you should see 0xFC00. Click it again several more times. You will see the program counter incrementing by twos, then eventually it hits a loop.

Go back to the signals list and add r15[15:0]. Then in the upper SST window select chip_0, which is one level down from the top, and find the signal p1_dout[7:0]. Add it and zoom to full so that you see the whole simulation at one time in the waveform window.

There is the blinking led. The lsbit of p1_dout is a gpio pin. The program loads register 15 with a value and counts to zero, then it toggles p1_dout bit 0 and loads the value and counts down, repeating this forever. The sim was limited but long enough that you can see r15 counting down and the output pin toggling. Certainly toggling every 13 or so passes through the delay loop when run on a real msp at real speeds is way too fast for your eye to see the blinking. I picked a small number to make it easy to follow in the simulation.
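
To put a rough number on how fast that is, here is a back of the envelope Python sketch. The per instruction cycle counts are what I understand them to be for the real msp430 cpu, the openmsp430 core may be off by a cycle here and there, and the 1 MHz clock is just a hypothetical round number, not something measured on my board.

```python
# Cycle counts are for the real MSP430 CPU as I understand the family
# user's guide; treat them as assumptions, not measurements.
XOR_ABS = 4   # xor.b #1, &P1OUT
MOV_IMM = 2   # mov #13, r15
DEC_REG = 1   # dec r15
JNZ = 2       # jnz a (same cost taken or not)
JMP = 2       # jmp top


def cycles_per_toggle(count):
    """Clock cycles from one toggle of the pin to the next."""
    return XOR_ABS + MOV_IMM + count * (DEC_REG + JNZ) + JMP


def blink_hz(count, clock_hz):
    """Blink rate; two toggles (on, then off) make one full blink."""
    return clock_hz / (2.0 * cycles_per_toggle(count))


if __name__ == "__main__":
    for count in (13, 0xFFFF):
        cycles = cycles_per_toggle(count)
        print(count, cycles, round(blink_hz(count, 1000000), 3))
```

With a count of 13 the pin flips every few dozen clocks, far too fast to see. With the largest count a 16 bit register can hold the same loop blinks a couple of times a second at that assumed clock, which is more like what you would load on a real part.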

Now click on tb_openMSP430 in the SST window, then select p1_dout_0 and add it to the waveform. I made a separate signal name for port 1 pin 0 output. This should look like a square wave; when tied to an led this one signal will make the led blink.
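
Since a .vcd file is plain ascii you can also check that square wave from a script instead of by eye. Below is a minimal Python sketch of a vcd reader. It is not part of msploader, it ignores module scopes (so it matches the signal name anywhere in the design), and the sample text in it is made up for illustration rather than copied from a real sim.vcd.

```python
# Made-up sample in VCD syntax, only for demonstrating the parser.
SAMPLE_VCD = '''$timescale 1ns $end
$scope module tb_openMSP430 $end
$var wire 1 ! p1_dout_0 $end
$var wire 16 " pc [15:0] $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
x!
bx "
$end
#100
0!
b1111111111111110 "
#250
1!
#400
0!
'''


def signal_changes(vcd_text, name):
    """Return [(time, value_string), ...] for every signal called name."""
    ids = set()      # short id codes the file assigns to the signal
    changes = []
    now = 0
    in_defs = True
    for line in vcd_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if in_defs:
            # $var <type> <width> <id> <name> ... $end
            if tok[0] == "$var" and len(tok) >= 5 and tok[4] == name:
                ids.add(tok[3])
            elif tok[0] == "$enddefinitions":
                in_defs = False
            continue
        if tok[0].startswith("#"):                      # timestamp
            now = int(tok[0][1:])
        elif tok[0][0] in "bB" and len(tok) == 2:       # vector change
            if tok[1] in ids:
                changes.append((now, tok[0][1:]))
        elif tok[0][0] in "01xXzZ" and len(tok) == 1:   # scalar change
            if tok[0][1:] in ids:
                changes.append((now, tok[0][0]))
    return changes


if __name__ == "__main__":
    for when, value in signal_changes(SAMPLE_VCD, "p1_dout_0"):
        print(when, value)
```

To try it on a real run, read sim.vcd into a string and pass that in place of SAMPLE_VCD. The differences between successive timestamps give you the width of each half of the square wave.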

gtkwave has some quirks as far as removing signals from the waveform window. You select the signal names like a normal list window but you have to ctrl-X to remove them, the delete key doesnt do it.

A feature that I dont often take advantage of, but should, is that if you leave the gtkwave window up, change your program, re-run it so that it creates another sim.vcd, you can go to File->Reload Waveform in gtkwave and re-load the new simulation results without having to reselect the waveforms that you were interested in.

Before letting you loose, under chip_0 select rom_0. This example program did not write to ram so I cant show you anything there, but it does execute from rom, so you can select all the signals in the rom_0 module and add them to the waveform. If you go back to where the PC was set to FFFE at the beginning and then look down your marker what you see is some pre-fetching. The rom appears to know ahead of time what the pc is wanting to fetch. There is a timing game here that you/I would have to look into the logic to understand, basically the 0x4031 comes from the pc address 0xFC00 since that instruction contains an immediate the core has to fetch the next value from memory which is the 0x280 constant at 0xFC02, you see that. If you add register r1 to the waveforms you will see when r1 gets written (I am thinking all of the registers shown here are behind by a clock cycle, that is fine this kind of thing happens in hardware designs and you have to get used to it). The real point here was to show you that you can look at the rom memory bus and see the instruction fetches in execution order. Unlike a debugger you do not get a snapshot of ram or rom at any one time you only get the what is the core reading/writing right now view. Watching this in execution order is quite useful, when you think your code should have branched to something and it does not you can see from the fetch that it did not and you can look at the registers to see what the values and flags were to see why it did not do the branch you wanted. It is kinda like single stepping a debugger I guess but with much greater visibility.
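
If you are wondering how the core knows that 0x4031 needs a second word, it is all in the bit fields of the instruction. Here is a small Python sketch that pulls apart a two operand msp430 instruction word. The function names are mine and this is not code from mspdiss, just the field layout as I understand it from the instruction set documentation.

```python
# Two operand (format I) opcodes live in the top four bits of the word.
MNEMONICS = {0x4: "mov", 0x5: "add", 0x6: "addc", 0x7: "subc",
             0x8: "sub", 0x9: "cmp", 0xA: "dadd", 0xB: "bit",
             0xC: "bic", 0xD: "bis", 0xE: "xor", 0xF: "and"}


def decode_format1(word):
    """Split a two operand instruction word into its fields."""
    opcode = (word >> 12) & 0xF
    if opcode not in MNEMONICS:
        return None  # single operand or jump, not handled in this sketch
    return {
        "op": MNEMONICS[opcode],
        "src": (word >> 8) & 0xF,        # source register number
        "ad": (word >> 7) & 1,           # destination addressing mode
        "byte": bool((word >> 6) & 1),   # True for .b, False for .w
        "as": (word >> 4) & 3,           # source addressing mode
        "dst": word & 0xF,               # destination register number
    }


def has_immediate(fields):
    """Source mode @pc+ means the operand is the next word in memory."""
    return fields["src"] == 0 and fields["as"] == 3


if __name__ == "__main__":
    fields = decode_format1(0x4031)
    print(fields, has_immediate(fields))
```

For 0x4031 the source is register 0 (the pc) in autoincrement mode and the destination is register 1, which is exactly mov.w #imm,r1 with the immediate sitting in the following word. That is the extra fetch you see on the rom bus.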

If you want your sim to run longer, then edit the file tb_openMSP430.v, at the bottom you will see:

initial // Normal end of test
begin
#10000;
$display("SIMULATION DONE");
$finish;
end

Verilog is both a language you can use for hardware designs and a language you can use for testing hardware designs. There are some language features that cannot be represented in hardware and this is one of them. (The tb in the filename usually indicates test bench, which is a bit of code you put around the design under test. That tb code is usually written using non-synthesizable verilog, meaning it simulates but you cannot make a logic chip from it.) Anyway the lines between the begin and end can be viewed like a software program that executes in order. The line #10000; means wait 10000 simulation time units (nanoseconds, picoseconds, or whatever the timescale in effect for that file says), then it goes to the next line which displays output on the console and then the $finish ends the simulation. Make that #10000; number bigger to make the sim longer.

There is an initial begin end block just above the one described above:

initial // Timeout
begin
#5000000;
$display(" (simulation Timeout) ");
$finish;
end


The engineer put a bit of a safety in there so that the sim didnt run forever. So if you make that #10000 bigger than #5000000 then the sim will end with a simulation Timeout message. Remember the longer the sim the bigger the .vcd file.

Which leads to the last comment. vcd files are ascii and compress really really well, so if you want to share with your friends compress the .vcd file. Gtkwave also supports a compressed file format that takes up much less disk space and loads much faster, so as your projects get bigger you will want to know about:

vcd2lxt sim.vcd

This takes the file sim.vcd and makes a compressed version sim.lxt, then you can use gtkwave to view the .lxt file just as you would the .vcd file:

gtkwave sim.lxt

Be careful, if you do not put the vcd2lxt in your make file, it is quite easy to run a sim that creates a .vcd file, you forget to convert it to lxt, then you open the lxt file to look at the results of the new sim and nothing has changed. That is because you are looking at the old .lxt file. It happens to the best of us.
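
If you would rather have a script catch that mistake for you, the check is just a timestamp comparison. A minimal Python sketch, with a function name I made up; it only reports, it does not run vcd2lxt for you.

```python
import os


def lxt_is_stale(vcd_path, lxt_path):
    """True if the .lxt is missing or older than the .vcd it came from."""
    if not os.path.exists(lxt_path):
        return True
    return os.path.getmtime(vcd_path) > os.path.getmtime(lxt_path)


if __name__ == "__main__":
    if not os.path.exists("sim.vcd"):
        print("no sim.vcd here, run the sim first")
    elif lxt_is_stale("sim.vcd", "sim.lxt"):
        print("sim.lxt is missing or older than sim.vcd, re-run vcd2lxt")
    else:
        print("sim.lxt is up to date")
```

A make rule with sim.lxt depending on sim.vcd does the same comparison, which is why putting the conversion in the make file is the easier fix.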

Monday, August 31, 2009

msploader

It was quite difficult (for me) to come up with the right google searches, but I finally found not one but two efforts at loading the msp430 from linux, specifically the boards designed for the ez430 using the usb stick that comes with the ez430. The problem is the usb interface is done with a separate processor which has its own protocol on the usb serial side and from that creates the spy-bi-wire on the target msp430 side. TI did not publish this protocol but instead hid it in a windows dll. So the linux solution thus far has been to use some tricks to ride on top of that windows dll. One was called msp430fet by Travis Goodspeed (if not every then every other google search you do will hit his blog). He stopped working on it when fetproxy came out, which is an attempt to replace msp430-gdbproxy on linux without the windows dll.

My goals are much much simpler. I want a program to load a binary from a file into the msp430. Finding the libraries and building and using gdb with fetproxy is more than I am willing to tolerate.

I was already working on a disassembler for the msp430. Not for public consumption really but I feel the best way to learn an instruction set is to write a disassembler for it. Elf files are somewhat trivial to read, esp for finding a single chunk of binary data, so naturally I started there. The disassembler is still in the package below, no guarantees that it is accurate, for accurate or at least supported disassembly use binutils objdump.
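
To give a taste of why a disassembler is a good teacher, here is how the jump instructions come apart, as a small Python sketch. This is not the code from the package, the names are mine, and the two test words are simply what a dec/jnz/jmp delay loop assembles to when placed at 0xFC00.

```python
# Condition codes in bit order 000 through 111.
JUMPS = ["jnz", "jz", "jnc", "jc", "jn", "jge", "jl", "jmp"]


def decode_jump(word, addr):
    """Return (mnemonic, target address) or None if word is not a jump."""
    if (word >> 13) != 0b001:
        return None
    cond = (word >> 10) & 0x7
    offset = word & 0x3FF
    if offset & 0x200:          # sign extend the 10 bit word offset
        offset -= 0x400
    # The offset counts words from the address after the jump.
    target = (addr + 2 + 2 * offset) & 0xFFFF
    return JUMPS[cond], target


if __name__ == "__main__":
    for addr, word in ((0xFC06, 0x23FE), (0xFC08, 0x3FFB)):
        name, target = decode_jump(word, addr)
        print("%04x: %s 0x%04x" % (addr, name, target))
```

Once you have written the ten lines that do the sign extension and the times two, you do not forget that jumps are pc relative and limited in range.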

The current release of msploader is here:
http://www.dwelch.com/msp430stuff/msploader-r001_36d6295d5849.tar.gz

Prior releases:
http://www.dwelch.com/msp430stuff/msploader-r000_923b533b2beb.tar.gz

It happens to be a full hg repo with change info. Although the initial release, rev000 has only the one commit of the initial source.

This is a somewhat brute force program. For now the serial port is hardcoded, you will need to go into ser.c and change it. I will probably fix that before long and make it a command line option. Also, it is assuming a single binary with a single entry point, basically no interrupt support or any other events in the interrupt vector table at this time. The code is simple enough though that adding support for multiple blocks is not hard. At this time I do not have support for intel hex files, only elf files as I use binutils to generate my binaries and leave them in the native elf format without the extra objcopy to intel hex. For this tool to be useful to other compilers and toolchains I should add support for intel hex files.

One thing I learned fairly quickly was not to perform an erase all. The problem is that it erases all of the flash. The program space and ivt are fine, but it also clears the configuration space with the factory calibrated clock initialization values. msploader erases flash pages as needed to load the program and leaves the rest unchanged. The ivt does have to be cleared in order to program the entry point.
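
The idea of erasing only what is needed can be sketched in a few lines of Python. This is not the msploader source. The 512 byte segment size and the 0xF800-0xFFFF main flash range are my assumptions for a 2 KB part, and the function name is made up.

```python
SEGMENT_SIZE = 512              # assumed main flash segment size
MAIN_FLASH = (0xF800, 0xFFFF)   # assumed range for a 2 KB part


def segments_to_erase(start, length, vector_addr=0xFFFE):
    """Base addresses of the segments a program and its reset vector touch."""
    lo, hi = MAIN_FLASH
    end = start + length - 1
    if length <= 0 or start < lo or end > hi:
        raise ValueError("program does not fit in main flash")
    first = start - (start % SEGMENT_SIZE)
    last = end - (end % SEGMENT_SIZE)
    bases = set(range(first, last + 1, SEGMENT_SIZE))
    # The reset vector has to be rewritten, so its segment goes too.
    bases.add(vector_addr - (vector_addr % SEGMENT_SIZE))
    return sorted(bases)


if __name__ == "__main__":
    print([hex(b) for b in segments_to_erase(0xFC00, 20)])
```

Because the function refuses anything outside main flash, the information memory holding the calibration values can never end up on the erase list.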

At the time of this writing there are some Linux problems with the ez430 usb stick. Running on ubuntu 9.10 beta (9.10 wont be out for a month or two) and 9.04 you have the problem that you can only open the serial port one time per reboot of the computer. I think I saw a website with a solution but have not pursued it. My choice was instead to downgrade to 8.10. 8.10 has its own problems but with little effort you can be in a situation where you can load the msp more than once without having to reboot the computer.

When you plug the usb stick in you will see something like this:

[85098.925011] ti_usb_3410_5052: TI USB 3410/5052 Serial Driver v0.9
[85099.064020] usb 4-3: new full speed USB device using ohci_hcd and address 5
[85099.280621] usb 4-3: configuration #1 chosen from 2 choices
[85099.283545] ti_usb_3410_5052 4-3:1.0: TI USB 3410 1 port adapter converter detected
[85099.283684] ti_usb_3410_5052: probe of 4-3:1.0 failed with error -5


Notice how usb 4-3 is mentioned in the above output, your output will vary, but with 8.04 or 8.10 should end in the error -5.

Using the usb information from dmesg perform this step:

echo 2 > /sys/bus/usb/devices/4-3/bConfigurationValue

This results in a new dmesg output:

[85295.502696] ti_usb_3410_5052 4-3:2.0: TI USB 3410 1 port adapter converter detected
[85295.502963] usb 4-3: TI USB 3410 1 port adapter converter now attached to ttyUSB2


The specific usb port will also vary so for now you need to examine the output of dmesg and change ser.c to match (and re-compile) before using msploader.

So long as you do not remove the usb stick you can load the msp430 as many times as you like.
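
Reading dmesg by eye gets old, so here is a Python sketch that picks the usb device id and the tty name out of the kernel messages. The sample text is the output shown above; the regular expressions and function names are my own and are only known to match messages formatted like these.

```python
import re

# dmesg lines copied from the output shown in this post.
SAMPLE_DMESG = """\
[85098.925011] ti_usb_3410_5052: TI USB 3410/5052 Serial Driver v0.9
[85099.283545] ti_usb_3410_5052 4-3:1.0: TI USB 3410 1 port adapter converter detected
[85099.283684] ti_usb_3410_5052: probe of 4-3:1.0 failed with error -5
[85295.502696] ti_usb_3410_5052 4-3:2.0: TI USB 3410 1 port adapter converter detected
[85295.502963] usb 4-3: TI USB 3410 1 port adapter converter now attached to ttyUSB2
"""

DETECTED = re.compile(r"ti_usb_3410_5052 ([\d.-]+):\d+\.\d+: .* detected")
ATTACHED = re.compile(r"now attached to (ttyUSB\d+)")


def find_usb_device(dmesg_text):
    """Most recent usb device id (like 4-3) the TI driver detected."""
    found = DETECTED.findall(dmesg_text)
    return found[-1] if found else None


def find_tty(dmesg_text):
    """Most recent ttyUSB name the adapter was attached to."""
    found = ATTACHED.findall(dmesg_text)
    return found[-1] if found else None


def config_path(device):
    """The sysfs file that the echo 2 step writes to."""
    return "/sys/bus/usb/devices/%s/bConfigurationValue" % device


if __name__ == "__main__":
    print(config_path(find_usb_device(SAMPLE_DMESG)))
    print("/dev/" + find_tty(SAMPLE_DMESG))
```

Feed it the real dmesg output and you get the path for the echo step and the serial port name to put in ser.c.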

This msploader release also includes an openmsp430 directory. This will be described in a separate post. Basically using an open source msp430 core in verilog and icarus verilog to create a simulation you can execute your msp430 programs with visibility inside the pins. I dont expect this to be a 100% accurate at the signal level core, but you can certainly watch the rom, ram and registers, something you cannot do staring at the chip on a board.

llvm and clang for the msp430

As of this writing the stable llvm release does not contain msp430 support, but versions in svn do so I am using the following steps to build llvm using clang as the c frontend.

My instructions are derived from the clang instructions on this page:
http://clang.llvm.org/get_started.html

I tried using trunk for a while but it did not take long before I synced with a release that wouldnt build. I did not want to be that cutting edge, so by wandering around starting at these web addresses:
http://llvm.org/svn/llvm-project/llvm/ for llvm itself and
http://llvm.org/svn/llvm-project/cfe/ for clang I found what they called release_26 in both repos. ymmv.


svn co http://llvm.org/svn/llvm-project/llvm/branches/release_26/
cd release_26
cd tools
svn co http://llvm.org/svn/llvm-project/cfe/branches/release_26/ clang
cd /path/to/release_26/
./configure --enable-targets=msp430 --enable-optimized --disable-doxygen --prefix=/llvm
make
make install


As with binutils --prefix=/llvm defines the installation directory for the binaries, you do not need to build in or anywhere near that directory, nor do you need to prep that directory, make install takes care of it. It does need to be a path where you have file permissions to create and write files.

Note, the build time for llvm is quite slow, not sure why. If you have a multi core processor you can speed it up by using the -j option on make, for example a four core machine you might try

make -j 3

By default all targets are enabled. Using --enable-targets to limit the build to only the targets you are interested in will greatly reduce the compile time.

Now for a test program:


unsigned short add_them ( unsigned short a, unsigned short b )
{
return(a+b);
}


I named mine test2.c


PATH=/llvm/bin:$PATH
clang -Wall -emit-llvm -c test2.c -o test2.bc
opt -std-compile-opts test2.bc -f -o=test2opt.bc
llc -march=msp430 test2.bc -f -o=test2.s
llc -march=msp430 test2opt.bc -f -o=test2opt.s


I wanted to show a blinking led example using a timed counter, but of course the optimizer removes the counter loop, so I changed it to the above simple example that optimizes to what you would expect.

Not optimized:


cat test2.s

.file "test2.bc"


.text
.align 4
.globl add_them
.type add_them,@function
add_them:
.BB1_0: # %entry
sub.w #6, r1
mov.w r15, 2(r1)
mov.w r14, @r1
mov.w r14, r15
add.w 2(r1), r15
mov.w r15, 4(r1)
add.w #6, r1
ret
.size add_them, .-add_them


Optimized:


cat test2opt.s

.file "test2opt.bc"


.text
.align 4
.globl add_them
.type add_them,@function
add_them:
.BB1_0: # %entry
add.w r14, r15
ret
.size add_them, .-add_them



What llvm gives you that gcc does not is several optimization options.

With gcc your optimization is limited per function on the initial compile step.

With llvm you can optimize on
-the initial compile step.
-per bytecode file (not limited to per function per file but the whole file)
-link individual bytecode files to one bytecode file and optimize that one bigger file
-optimize on the llc step from bytecode to the target processor

Linking individual files to one file and then optimizing that file is done like this:

llvm-link a.bc b.bc -f -o=ab.bc
opt -std-compile-opts ab.bc -f -o=abopt.bc



Even if you limited yourself to two optimization levels, none and full, the number of optimization combinations still adds up to more than you are going to be willing to try.


I did one optimization experiment using zlib to compress and then decompress some text and then compared the output to the original. For that one experiment optimizing on the initial c compilation and/or optimizing the individual bytecode files hurt the overall performance. The best results were when I compiled from c to bytecode files with no optimizations, then linked the bytecode files into one big bytecode file, then I performed the optimization on the one big bytecode file. This kinda makes sense as the optimizer has the most amount of code to work with. That doesnt mean there are no other combinations that would have performed better, I did not try them (would have been a few hundred tests).

llc has optimization on by default and I leave that as is since the bytecode doesnt know the target platform only llc can take advantage of the processors features and instructions.

Note for that one test on that one day for that one processor gcc produced a little bit faster code, around 10% if I remember right. llvm is being actively developed and from the google searches sounds like it is the standard compiler for iPhone development so I have faith that those numbers will get better with time (and with other programs/algorithms that I did not test).

There is also a gcc for the msp430, I do not have any specific objections to that toolchain, it is quite popular from what I can tell for the msp430 family. Binaries are available, build instructions involve taking a certain gcc and patching it.

With the world going 64 bit I recently ran into problems with the gcc frontend to llvm. This was not on the msp430 but the ARM (take the build instructions above and change --enable-targets=msp430 to --enable-targets=arm,msp430 and your one build turns into a multiprocessor cross compiler, I think if you leave the --enable-targets off you get all the targets!). llvm-gcc with or without -m32 caused longs to be 64 bit on a 64 bit linux and 32 bit on a 32 bit linux. For cross compiling the host processor's interpretation of long should not affect the results. Granted loose use of the long variable type is the problem of the programmer not the compiler. zlib has longs and ints scattered willy-nilly, something I hope will get cleaned up in the near future. So my choices were 1) fixup zlib, 2) only use 32 bit linux installs 3) switch from llvm-gcc to (the immature) clang and use -m32. I chose the last one.

binutils for msp430

binutils currently supports the msp430 family of microcontrollers, it is quite simple to build your own cross binutils from sources.

Start by downloading the binutils sources
http://ftp.gnu.org/gnu/binutils/

As of this writing 2.19.1 is the current release, for a direct link
http://ftp.gnu.org/gnu/binutils/binutils-2.19.1.tar.gz

Navigate to a place where you want to perform the build, this does not need to be the final destination you can remove the source directory after you finish the build and install.

The --prefix=/msp430 configuration option specifies the path where I want the binaries installed once compiled. You do not need to prepare this path ahead of time, the make install step will do this, it does need to be a place where you have file permissions to create and write files.

These steps normally work with mingw32 and msys on windows, at this time I have only tried it on Linux (32 and 64 bit ubuntu 8.04 through 9.10beta)


tar -xzvf binutils-2.19.1.tar.gz
cd binutils-2.19.1
./configure --target=msp430 --prefix=/msp430
make
make install


For ubuntu you may need packages like build-essential, bison, flex and possibly others.

To test your install, create a simple program like this one:


outer:
mov #1234,r15
inner:
dec r15
jnz inner

jmp outer


I named mine test.s


PATH=/msp430/bin:$PATH
msp430-as test.s -o test.o
msp430-ld test.o -o test.elf
msp430-objdump -D test.elf

test.elf: file format elf32-msp430


Disassembly of section .text:

0000fc00 <__ctors_end>:
fc00: 3f 40 d2 04 mov #1234, r15 ;#0x04d2

0000fc04 <inner>:
fc04: 1f 83 dec r15 ;
fc06: fe 23 jnz $-2 ;abs 0xfc04
fc08: fb 3f jmp $-8 ;abs 0xfc00



Looks like it is working. For the device I am using, the 2012, a starting address of 0xFC00 is fine. You can change this by adding -Ttext 0xFD00 to the msp430-ld command (0xFD00 or whatever address).
My loader writes the reset address in the vector table to match the starting address in the binary (.elf) so I dont worry about making binutils do that. Your program will not run though unless address 0xFFFE in flash contains the entry point address of your program.
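
For the curious, finding the entry point in an elf file takes very little code. The Python sketch below reads it from the elf header and turns it into the two bytes that belong at 0xFFFE. It is an illustration and not the msploader source; whether msploader uses this header field or the address of the first loaded chunk is something I am leaving open here, and the function names are made up.

```python
import struct

EM_MSP430 = 105  # machine number for the msp430 in the elf header


def elf_entry(header):
    """Return e_entry from the first bytes of a 32 bit little endian elf."""
    if len(header) < 28 or header[:4] != b"\x7fELF":
        raise ValueError("not an elf file")
    if header[4] != 1 or header[5] != 1:
        raise ValueError("expected 32 bit little endian elf")
    # After the 16 byte ident: e_type, e_machine, e_version, e_entry.
    e_type, e_machine, e_version, e_entry = struct.unpack_from("<HHII", header, 16)
    if e_machine != EM_MSP430:
        raise ValueError("not an msp430 elf")
    return e_entry


def reset_vector_bytes(entry):
    """The two bytes to program at 0xFFFE, low byte first."""
    return struct.pack("<H", entry & 0xFFFF)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            entry = elf_entry(f.read(28))
        print("entry 0x%04x, vector bytes %r" % (entry, reset_vector_bytes(entry)))
```

Run it with the path to an .elf built by the steps above as the only argument to see the entry address and the vector bytes.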

getting started

I have had an interest in the TI msp430 microcontroller for a few years now. Became aware of it when the ez430 came out (http://www.ti.com/ez430), I am a sucker for $20 microcontroller evaluation boards. I really like the instruction set, the size, power, cost, etc. I want to remember that I became frustrated for a couple of reasons. One most likely was the lack of linux support for the tool, so I had to use windows. Perhaps it was also the windows tools of the time; I am not interested in an IDE, in fact I will intentionally avoid them. I want my text editor, a command line compiler and a dumb loader. I also want to remember that I didnt like something about being forced to use the internal RC based clock.

Recently I dusted off the ez430 box and started playing with it again. Started to think this instruction set is so simple perhaps I can make an emulator. Stumbled on two linux based attempts at loaders. Found that an msp430 backend has begun for llvm. All of this combined has re-sparked an interest. And most recently I gave up on the simulator I had started as I found (at least one) an open source core in verilog that runs nicely under icarus verilog (http://www.icarus.com/eda/verilog).

So the next few posts will cover building binutils from sources so that you have an assembler and linker. The instruction set is so simple why would you want to use C? If you do though I have a build of llvm working using llvm's clang C frontend (the gcc front end is undesirable as a cross compiler on 64 bit systems, at least for ARM, so I prefer the clang front end for now as the results are consistent between 32 and 64 bit Linux). And then a simple loader program for the ez430 based usb stick. And on top of all that using the openmsp430 core for simulating programs and getting much better visibility for debugging.