Monday, August 31, 2009

llvm and clang for the msp430

As of this writing the stable llvm release does not contain msp430 support, but versions in svn do, so I am using the following steps to build llvm with clang as the C frontend.

My instructions are derived from the clang instructions on this page:
http://clang.llvm.org/get_started.html

I tried using trunk for a while, but it did not take long before I synced to a revision that wouldn't build. I did not want to be that cutting edge, so by wandering around starting at these web addresses:
http://llvm.org/svn/llvm-project/llvm/ for llvm itself and
http://llvm.org/svn/llvm-project/cfe/ for clang
I found what they call release_26 in both repos. YMMV.


svn co http://llvm.org/svn/llvm-project/llvm/branches/release_26/
cd release_26/tools
svn co http://llvm.org/svn/llvm-project/cfe/branches/release_26/ clang
cd ..
./configure --enable-targets=msp430 --enable-optimized --disable-doxygen --prefix=/llvm
make
make install


As with binutils, --prefix=/llvm defines the installation directory for the binaries. You do not need to build in or anywhere near that directory, nor do you need to prep it; make install takes care of that. It does need to be a path where you have permission to create and write files.

Note, the build time for llvm is quite slow; I am not sure why. If you have a multi-core processor you can speed it up with make's -j option. On a four core machine, for example, you might try

make -j 3

By default all targets are enabled; using --enable-targets to limit the build to only the targets you are interested in greatly reduces the compile time.

Now for a test program:


unsigned short add_them ( unsigned short a, unsigned short b )
{
    return(a+b);
}


I named mine test2.c


PATH=/llvm/bin:$PATH
clang -Wall -emit-llvm -c test2.c -o test2.bc
opt -std-compile-opts test2.bc -f -o=test2opt.bc
llc -march=msp430 test2.bc -f -o=test2.s
llc -march=msp430 test2opt.bc -f -o=test2opt.s


I wanted to show a blinking LED example using a timed counter, but of course the optimizer removes the counter loop, so I changed it to the simple example above, which optimizes to what you would expect.
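For what it is worth, here is a sketch of the kind of delay loop I had in mind (the names are my own, not from any msp430 header): declaring the counter volatile is the usual way to keep the optimizer from deleting a busy-wait loop.

```c
/* Hypothetical delay loop for a blinking-LED demo.  Without 'volatile'
   the optimizer sees a loop with no observable effect and removes it;
   'volatile' forces every read and write of 'count' to actually happen. */
unsigned short delay(void)
{
    volatile unsigned short count;
    unsigned short laps = 0;

    for (count = 0; count < 1000; count++)
        laps++;                 /* the loop survives because of 'count' */

    return laps;                /* 1000 */
}
```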

Not optimized:


cat test2.s

.file "test2.bc"


.text
.align 4
.globl add_them
.type add_them,@function
add_them:
.BB1_0: # %entry
sub.w #6, r1
mov.w r15, 2(r1)
mov.w r14, @r1
mov.w r14, r15
add.w 2(r1), r15
mov.w r15, 4(r1)
add.w #6, r1
ret
.size add_them, .-add_them


Optimized:


cat test2opt.s

.file "test2opt.bc"


.text
.align 4
.globl add_them
.type add_them,@function
add_them:
.BB1_0: # %entry
add.w r14, r15
ret
.size add_them, .-add_them



What llvm gives you that gcc does not is a choice of where to optimize.

With gcc your optimization is limited to the initial compile step, essentially function by function within each source file.

With llvm you can optimize at several points:
-the initial compile step
-each bytecode file (not just function by function, but the whole file at once)
-after linking individual bytecode files into one bytecode file, optimizing that one bigger file
-the llc step from bytecode to the target processor

Linking individual files to one file and then optimizing that file is done like this:

llvm-link a.bc b.bc -f -o=ab.bc
opt -std-compile-opts ab.bc -f -o=abopt.bc



Even if you limited yourself to two optimization levels, none and full, the number of optimization combinations still adds up to more than you are going to be willing to try.


I did one optimization experiment using zlib to compress some text, decompress it, and compare the output to the original. For that one experiment, optimizing on the initial C compilation and/or optimizing the individual bytecode files hurt the overall performance. The best results came from compiling from C to bytecode with no optimization, linking the bytecode files into one big bytecode file, then optimizing that one big file. That kinda makes sense, as the optimizer has the most code to work with. That doesn't mean there were no other combinations that would have performed better; I did not try them (it would have been a few hundred tests).

llc has optimization on by default and I leave that as is, since the bytecode does not know the target platform; only llc can take advantage of the processor's features and instructions.

Note, for that one test, on that one day, for that one processor, gcc produced slightly faster code, around 10% if I remember right. llvm is being actively developed, and from google searches it sounds like it is the standard compiler for iPhone development, so I have faith that those numbers will get better with time (and with other programs/algorithms that I did not test).

There is also a gcc for the msp430. I do not have any specific objections to that toolchain; from what I can tell it is quite popular for the msp430 family. Binaries are available, and the build instructions involve taking a certain gcc version and patching it.

With the world going 64 bit I recently ran into problems with the gcc frontend to llvm. This was not on the msp430 but on the ARM (take the build instructions above and change --enable-targets=msp430 to --enable-targets=arm,msp430 and your one build turns into a multiprocessor cross compiler; I think if you leave --enable-targets off you get all the targets). llvm-gcc, with or without -m32, made longs 64 bit on a 64 bit linux and 32 bit on a 32 bit linux. For cross compiling, the host processor's interpretation of long should not affect the results. Granted, loose use of the long variable type is the programmer's problem, not the compiler's; zlib has longs and ints scattered willy-nilly, something I hope will get cleaned up in the near future. So my choices were 1) fix up zlib, 2) only use 32 bit linux installs, or 3) switch from llvm-gcc to the (immature) clang and use -m32. I chose the last.
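To make the long-size problem concrete, here is a small illustration of my own (not from zlib): the fixed-width types in stdint.h sidestep the host-dependent size of long, which is one way code like zlib could be cleaned up.

```c
/* 'long' is 8 bytes on a typical 64 bit linux and 4 bytes on a
   32 bit one, so code that assumes long is 32 bits breaks when the
   host changes.  uint32_t from stdint.h is 4 bytes everywhere. */
#include <stdint.h>
#include <stddef.h>

size_t host_long_size(void) { return sizeof(long);     } /* host dependent */
size_t fixed_u32_size(void) { return sizeof(uint32_t); } /* always 4 */
```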
