I talked a lot about control flow on the C system stack in
my post about the inferior runloops problem in Parrot. Since not everybody is familiar with all the mechanics of the platform ABI, I figured I would write up a theory post about how all that works.
C and Assembly CodeThe compilation process for a language like C is very straight-forward: Write C code, invoke compiler, convert code to machine code, execute it. Ultimately for a program written in C code or any language at all for that matter to execute, it must be converted to a stream of machine code instructions. The hardware simply doesn't understand anything else. Any C code can be translated, by hand if necessary into an equivalent sequence of assembly instructions, which in turn can be translated (again, you can do this by hand) into machine code. Assembly instructions and machine code tend to have a one-to-one relationship, but C code has a one-to-many relationship with assembly. It's these variants that an optimizer searches for, finding faster ways to perform the same operations. It is also possible to convert a listing of assembly code into equivalent C code. This is called "decompilation", and is an important technique in the field of reverse engineering.
C code and assembly can be converted back and forth, and are generally equivalent. There are some operations in assembly that cannot be strictly converted to C code, but that's a subset we are able to ignore for now.
Function CallsLook now at a piece of C code that we should all be familiar with, a function call:
x = Foo(a, b, c)
This line of code translates, more or less, into this sequence of x86 assembly:
mov eax, 'c'
push eax
mov eax, 'b'
push eax
mov eax 'a'
push eax
call _Foo
mov 'x', eax
sub esp, 12
I'm obviously simplifying in some places. This is probably not what your compiler would actually generate, but is not entirely dissimilar to what you would get if variables
a
,
b
,
c
, and
x
were all declared volatile, and if you didn't optimize. To make a function call, the arguments are all pushed onto the system stack (usually in left-to-right order) and the function is called. The '
call
' operation does two things: it pushes a return address onto the stack and then jumps to the function Foo. When the function returns, the return value is stored in the
eax
register (except floating-point values, which are different and beyond the scope of this post). The important point to take away from this is that the arguments and the function return value are all on the system stack in a very particular order when the function call was made. Notice also that we need to "clean" the stack after the function call: We move the stack pointer back to where it was before pushing the three arguments. The return address is popped automatically in the
ret
command that we will see next.
As a general rule, the calling function should have absolutely no knowledge or interaction with the callee's stack frame or anything below it. Likewise, the callee should have no knowledge of the caller's stack frame or anything above it.
Now, let's look at the guts of the function Foo. Here's some C code to implement it:
int Foo(int a, int b, int c)
{
return a + b + c;
}
Converting to assembly, this is:
_Foo:
mov eax, 0
mov ebx, [esp+4]
add eax, ebx
mov ebx, [esp+8]
add eax, ebx
mov ebx, [esp+12]
add eax, ebx
ret
Again, not what you are likely to actually see as the output from a good compiler. I've intentionally omitted some details for simplicity. The values a, b and c are stored on the stack, so internally to the function we read them off the stack using offsets of the esp stack register. We cannot pop the values here because popping would move the return address (it's on the top of the stack, remember).
The StackThis is all well and good for the simple example, but what if we want to have local variables in the function? And more importantly, how do we make local variables in a way that we can recurse properly? To do this, we need to create a
stack frame. But before we talk about frames, let's talk about the stack.
The stack is just a contiguous region of memory that is pointed to by the stack register
esp
. The stack grows "down", so it starts at a high value in memory and each additional item pushed goes to the next lowest place on the stack. The "
push eax
" instruction does this, essentially:
mov [esp], eax
sub esp, 4
And the "
pop eax
" instruction reverses this process:
add esp, 4
mov eax, [esp]
The biggest difference is that the push and pop instructions execute much more quickly then the add/mov combination does otherwise.
Stack FramesWith this knowledge of the stack, we can now talk about local variables in a function. Let's rewrite Foo to have some local temporary variables:
int Foo(int a, int b, int c)
{
int x, y;
x = a + b;
y = x + c;
return y;
}
Converting to assembly now, we create space on the stack for x and y:
_Foo:
push ebp ; Save the old value of ebp
mov ebp, esp ; ebp points to the stack
sub esp, 8 ; make space on the stack for x and y
...
mov esp, ebp
pop ebp
ret
The code I've shown you here is nearly identical for every function created in C. These are the
standard entry and exit sequences for x86 machines. The old value of
ebp
is pushed onto the stack to save it. This is so we can recurse easily without having to worry about the caller's stack.
ebp
is then set to point to
esp
, the "top" of the stack. Then we decrease
esp
, creating space on the stack to store local variables without having to manually "push" space for each one individually. From this point forward we can address all function arguments as offsets from
ebp
, and all function variables as offsets from
esp
. This moving stack frame allows us to recurse the function with no limitations besides stack space. Inside the function this is where the variables are stored. Remember that in x86 assembly, the
[]
brackets are pointer dereferences like "prefix:*" in C.
- old ebp =
[ebp]
- return value =
[ebp + 4]
- a =
[ebp + 8]
- b =
[ebp + 12]
- c =
[ebp + 16]
- x =
[esp + 0]
- y =
[esp + 4]
When we return, we move
esp
back to where
ebp
is, pop the stored value of
ebp
off the stack (for the calling function), and then pop the return address off the stack and jump to it. In the calling function we "
add esp 12
" to clean up the arguments from the stack and continue execution from there.
It's worth mentioning that if we create storage on the stack, such as a string or array, and write past the bounds of that array, we can accidentally overwrite things like the stored value of
ebp
or even the return address. A
stack overflow attack does this exact thing to write a specific value to the return address, which causes execution to "return" to somebody's malicious code somewhere. The C standard library string.h is notorious for allowing these kinds of things to happen.
When I talk about having functions "on the stack", what I mean is that we have stack frames for those functions on the stack above where we are currently executing.
BacktracesIf we look at a backtrace in a debugger like GDB, we see a listing of functions representing a call sequence. This sequence can be found by tracing the stack: starting at the current function and iterating over the stack. GDB can find the return address at
[ebp + 4]
, follow that to the caller function, and repeat the process until we reach
main()
. Combine finding addresses with finding metadata about the function name (in terms of debugging symbols) and we can produce a proper backtrace. Commands in GDB follow a lot of stack terminology: "frame" shows us the current frame, "up" moves us up to the previous stack frame and "down" moves us down to the next stack frame.
Continuation Passing StyleParrot doesn't use stacks for it's control flow. Instead, it uses continuations. It's beyond the scope of this post to really talk about the differences and nuances of each. Instead of being stored on the system stack, continuation objects form a linked list (actually, more of a tree or graph) in the heap. By doing this, continuations aren't automatically cleaned up when they are invoked, and we can reuse them for things like loops, coroutines, closures, and all sorts of control flow optimizations and other fun things.
ConclusionThat's a quick overview of the C system stack. For more information, I've contributed to a book on Wikibooks called
x86 Disassembly that talks more about the translation from assembly language back into C code. It covers function calls and stack frames in far more detail then I could in a single blog post. Of course, I would be happy to post more if people still have questions about it all.
In Parrot, it's difficult to think of control flow as a unified thing. We either think of PIR control flow or C control flow, both of which are very different from the other! We can, if we need to, convert PIR down to C code, and from there convert to assembly or machine code too. It's all equivalent and interrelated,