Brooks Patola

Aarch64 and x86_64 Assembler

For this lab we will build on the last Compiled C Lab post. This time, instead of just using compiler options to see how the assembly code changes, we will actually be writing an assembly program on two different architectures! “Cool” (say only the geeks)! Let's get going…

Before we begin writing our program, we were provided three different versions of the same “Hello World” C program to inspect on both architectures.
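The original sources only appear as screenshots, so here is a rough C sketch of what three such variants usually look like on Linux. The message text and the function names (hello_printf, hello_write, hello_syscall) are assumptions for illustration, not the lab's actual files. Each function returns the number of bytes it wrote.

```c
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> on glibc */
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed message; the lab's exact text is not shown here. */
static const char MSG[] = "Hello World!\n";

/* Variant 1: buffered, formatted output through the C library. */
long hello_printf(void) {
    int n = printf("%s", MSG);
    fflush(stdout);        /* push the buffered text out right away */
    return n;
}

/* Variant 2: the thin libc wrapper around the write system call. */
long hello_write(void) {
    return (long)write(STDOUT_FILENO, MSG, strlen(MSG));
}

/* Variant 3: the generic syscall() entry point with an explicit call number. */
long hello_syscall(void) {
    return syscall(SYS_write, STDOUT_FILENO, MSG, strlen(MSG));
}
```

Each variant sits a little closer to the kernel than the one before it, which is worth keeping in mind when comparing the disassembly of main in each case.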

The 3 versions of “Hello World” and their objdump <main> output on x86_64:

printf() version:

Screen Shot 2017-09-29 at 10.47.54 PM

Screen Shot 2017-09-29 at 10.58.44 PM

write() version:

Screen Shot 2017-09-29 at 10.53.14 PM

Screen Shot 2017-09-29 at 11.00.13 PM

syscall() version:

Screen Shot 2017-09-29 at 10.54.39 PM

Screen Shot 2017-09-29 at 11.01.06 PM

A partial view of the objdump -d output for the syscall() version (notice the total amount of code):

Screen Shot 2017-09-29 at 11.05.27 PM.png

Now we will look at two versions of “Hello World” written in different assembly syntax and their corresponding objdump -d output…

GNU AS (GAS) syntax (AT&T style):

Screen Shot 2017-09-29 at 11.39.51 PM

This is the full output of the objdump (notice how small it is compared to its C counterpart):

Screen Shot 2017-09-29 at 11.45.39 PM.png

NASM syntax (Intel style):

Screen Shot 2017-09-29 at 11.41.49 PM.png

Again, notice how small the output is compared to the C version…

Screen Shot 2017-09-29 at 11.47.48 PM.png

A very interesting discussion and comparison of the two Linux assemblers can be found here.

Now let's look at the three different versions of the “Hello World” C program and their objdump -d output on aarch64.

It is interesting to note how both architectures handle the program instructions.
Although they appear quite similar, we can notice variations in how they use their registers (which we learned about in the last post!).

printf() version:

Screen Shot 2017-09-29 at 11.56.23 PM.png

write() version:

Screen Shot 2017-09-29 at 11.57.26 PM.png

syscall() version:

Screen Shot 2017-09-29 at 11.58.18 PM.png

Now let's inspect a “Hello World” assembly program and its objdump -d output on aarch64:

Screen Shot 2017-09-30 at 12.11.46 AM.png

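Very similar to its x86_64 counterpart, although we can notice a slight variation in how they use their registers. For example, a Linux system call on x86_64 puts the call number in rax and the first arguments in rdi, rsi and rdx before executing syscall, while aarch64 puts the call number in x8 and the arguments in x0, x1 and x2 before executing svc 0.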

Now it is time to implement an assembly program with a loop that prints the values 0-9, as follows:

Screen Shot 2017-09-30 at 12.45.43 AM.png

Wait! This may not be as easy as it seems! We will have to print the word “loop” on each iteration with the index value beside it. To print the index value we must convert an integer to its digit character, which means adding the ASCII code for '0' (48) to the value. For those curious, you can refer to the manpage for ascii to look up the character codes.
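The integer-to-character step can be sketched in C before tackling it in assembly. This is only an illustration: the "Loop: " prefix and the names digit_to_char and format_loop_line are assumptions, since the real output format lives in the screenshot.

```c
#include <stddef.h>
#include <string.h>

/* Convert a value 0..9 to its ASCII digit: '0' is 48, so 7 -> 55 -> '7'. */
char digit_to_char(unsigned value) {
    return (char)('0' + value);
}

/* Write "Loop: <digit>\n" into out (at least 9 bytes) and return its length.
   The "Loop: " prefix is an assumed format, not taken from the lab. */
size_t format_loop_line(unsigned index, char *out) {
    static const char prefix[] = "Loop: ";
    size_t n = sizeof prefix - 1;      /* length without the trailing NUL */
    memcpy(out, prefix, n);
    out[n++] = digit_to_char(index);   /* only valid for index 0..9 */
    out[n++] = '\n';
    out[n] = '\0';
    return n;
}
```

In assembly the same idea boils down to adding '0' to the loop register and storing the low byte into the message buffer before the write call.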

An example using x86_64:

Screen Shot 2017-09-30 at 12.34.14 AM

A bit more confusing than a simple loop in C, I’d say.

We are now asked to extend the code to loop from 00-30, then to suppress the high digit when it is 0, in the end printing the values 0-30. Let's give it a try…

Wait! This seems tricky… and it might be even trickier than it seems!

We will need to take the loop index and convert it to a 2-digit decimal number by dividing by 10. To perform this operation we will use the div instruction, which takes the dividend from rdx:rax (so rdx must be zeroed first when the value fits in rax) and the divisor from the register supplied as an argument. The quotient will be placed in rax and the remainder will be placed in rdx.
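The divide-by-10 step, including dropping a leading zero, can be sketched in C to check the logic before writing it in assembly. The function name format_index is made up for this example, and the quotient and remainder variables simply mirror what div leaves in rax and rdx.

```c
#include <stddef.h>

/* Format index (0..99) as decimal, omitting the tens digit when it is zero.
   out needs at least 3 bytes. Returns the number of digit characters. */
size_t format_index(unsigned index, char *out) {
    unsigned quotient  = index / 10;   /* what div leaves in rax */
    unsigned remainder = index % 10;   /* what div leaves in rdx */
    size_t n = 0;
    if (quotient != 0)                 /* suppress the high digit when it is 0 */
        out[n++] = (char)('0' + quotient);
    out[n++] = (char)('0' + remainder);
    out[n] = '\0';
    return n;
}
```

With this logic 7 comes out as "7" and 30 comes out as "30". The assembly version needs a compare and a conditional jump around the store of the high digit to get the same effect.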

I initially ran into a couple of errors and had to spend a bit of time on Google to figure out what was happening…

Screen Shot 2017-09-30 at 1.51.06 AM.png

It seems we were using gcc to link, which by default adds the C library startup code. That code already contains a _start that invokes main, so it clashes with our own _start and also expects a main that we never defined…

So I attempted a suggested fix…

Screen Shot 2017-09-30 at 1.51.50 AM.png

Interesting error log, although I can see that adding -m32 makes it a 32-bit executable, which won’t work on this architecture. So, alas, I attempted to compile without the -m32 flag…

Screen Shot 2017-09-30 at 1.52.03 AM.png

The output…

Screen Shot 2017-09-30 at 1.58.47 AM.png

Sweet! It works!

Link to the x86_64 code.

We were then asked to perform the same loop on aarch64. The source code for that can be found here.

Overall, I found writing and debugging in assembly far more difficult to grasp at first than a high-level language such as C (especially as a noob to assembly language and to how these specific architectures deal with registers). I believe this is due to the rather unintuitive nature of writing individual instructions and working directly with the underlying CPU registers. Although it may seem a bit discouraging at first because of the unfamiliar instructions and mnemonics, I can see the value in learning it. With assembly we can address memory directly, control the generated machine code more precisely, and manipulate bits more easily than in high-level languages.

Comparing the two architectures, aarch64 feels like it has a simpler instruction structure, but if I were forced to choose a personal preference, I’d side with x86_64 for now. They are very similar, but I do prefer the clear register names such as rbp and the bash-script feel of marking immediate values with a $ in AT&T syntax. I will spend some more time learning about assembly language and these architectures, and hopefully reach a better understanding of how everything interacts in the near future.

For deeper learning, refer to these great references:



And if you are still feeling a bit discouraged with assembly, here is a good discussion as to why it is still important today!

