uKernel DevLog #2
Avg. 12 minute(s) of readingI'm currently enrolled in a class about embedded and real-time systems. For this class's final project, I'm developing a real-time kernel for Arduino UNO. I'll try to document the development in a series of posts. This is part #2.
On the last chapter, we've talked about basic concepts of real-time systems. Now, we'll talk about implementing multiple stacks, one for each task.
Why multiple stacks?
There are two reasons for us to want to implement a system where each task has its own stack:
- In the future, we'll implement mutexes which can lead to tasks being in a blocked state (priority inversion);
- We want our task functions to follow the prototypes of POSIX thread
functions:
void* task(void* arg)
.
On this DevLog, we'll focus on the second one.
The new task functions
Originally, a task for toggling an LED could look something like this:
1// toggles an LED 2void runTask() { 3 digitalWrite(LEDPIN, !digitalRead(LEDPIN)); 4}
This function would be called for each activation of the task. Alongside this function, we would also benefit from having a setup function for each task, that would be called whenever the task is created:
1// setup the LED pin as OUTPUT and turn the LED on 2void setupTask() { 3 pinMode(LEDPIN, OUTPUT); 4 digitalWrite(LEDPIN, HIGH); 5}
The plan is to join these methods (and the multiple calls to runTask
) into a
single method, that is called once:
1void taskFunc(void *arg) { 2 // set pin as output 3 pinMode(LEDPIN, OUTPUT); 4 digitalWrite(LEDPIN, HIGH); // turn on LED 5 6 while (true) { 7 // toggle LED 8 digitalWrite(LEDPIN, !digitalRead(LEDPIN)); 9 10 Sched_Yield(); // finished execution for this activation 11 } 12}
In order for the function to only be called once, we wrap the contents that need
to run on each activation into a while(true)
loop.
Note: contrary to what POSIX threads, we return void from these functions,
because they aren't meant to end.
Implementation
Now it's time for us to talk about how to implement this system. It should be noted that much of these ideas were taken from the documentation and code developed for FreeRTOS. The assembly code examples were also taken from their documentation (with some slight changes).
The stacks
As of now, each task has a task control block (TCB): a class containing the
execution information of a task. The TCB of each task contains the stack of the
given task. A task's stack is just an array of bytes (unsigned char
). It is
possible to calculate the maximum stack size needed for each task, but for now,
we'll just assume it is always big enough for our needs.
1typedef unsigned char byte; 2 3class Task { 4 void (* taskFunc)(void *); // task's code 5 const unsigned int period; // task's period 6 unsigned int delay; // time left until next activation 7 byte stack[BIG_ENOGH]; // task's stack 8 9 /* ... */ 10};
Context switching
Whenever a task is blocked, preempted, or finishes the current activation, we need to suspend its execution. When we do this, we need to save its context, so we can restore it later.
Saving a task's context implies saving: its program counter (PC
), the
state of the CPU's general purpose registers (32 registers, from R0
to R31
),
the CPU's status register (SREG
), the stack pointer (2 registers, SPH
and
SPL
), and the contents of the stack.
Let's focus our discussion on the situation where a task is preempted, because of a timer interrupt. Interrupt handlers (ISR) are functions with some special properties:
- Registers modified inside an ISR are restored to their original value (the value on ISR entry) on exit;
- Interrupts are disabled on ISR entry;
- ISRs end with a
RETI
(return from interrupt) instruction, instead of the usualRET
(return) instruction. This instruction re-enables interrupts on ISR exit.
The fact that the registers modified inside the ISR are saved and restored automatically isn't enough for our use case, because during preemption other registers can be modified by new running task, without the compiler knowing. Still, this is interesting, because we can look at the code disassembly and learn how to save registers.
1; --------------------------------------- 2; CODE GENERATED BY THE COMPILER TO SAVE 3; THE REGISTERS THAT GET ALTERED BY THE 4; APPLICATION CODE DURING THE ISR. 5PUSH R1 6PUSH R0 7IN R0,0x3F 8PUSH R0 9CLR R1 10PUSH R18 11PUSH R19 12PUSH R20 13PUSH R21 14PUSH R22 15PUSH R23 16PUSH R24 17PUSH R25 18PUSH R26 19PUSH R27 20PUSH R30 21PUSH R31 22; --------------------------------------- 23 24CALL 0x0000029B ; Call a subroutine 25 26; --------------------------------------- 27; CODE GENERATED BY THE COMPILER TO 28; RESTORE THE REGISTERS PREVIOUSLY 29; SAVED. 30POP R31 31POP R30 32POP R27 33POP R26 34POP R25 35POP R24 36POP R23 37POP R22 38POP R21 39POP R20 40POP R19 41POP R18 42POP R0 43OUT 0x3F,R0 44POP R0 45POP R1 46RETI 47; ---------------------------------------
As we can see, the code to save and the code to restore the registers is very similar: we start by pushing the registers, and then pop them in the opposite order.
Note: The behavior described above is caused by GCC's signal attribute. In
Arduino, we can declare an ISR for timer1 as ISR(TIMER1_COMPA_vect);
. This
automatically includes the signal attribute. In some other programs you
might need to do something like this:
void SIG_TIMER1_COMPA_vect(void) __atribute__ (( signal ));
.
GCC's naked attribute
If we add code to save all registers, this would result in some registers being saved twice: once by the compiler generated code, and again by our code. To avoid this, we use GCC's naked attribute. This attribute tells the compiler to not generate any function entry (prologue) or exit (epilogue) code for the given function. When using the naked attribute, the compiler generates the following code:
1; ------------------------------------------------ 2; NO COMPILER GENERATED CODE TO SAVE THE REGISTERS 3; ------------------------------------------------ 4 5CALL 0x0000029B ; Call a subroutine 6 7; ------------------------------------------------ 8; NO COMPILER GENERATED CODE TO RESTOR THE 9: REGISTERS OR RETURN FROM THE ISR 10; ------------------------------------------------
As we can see, now the compiler generates neither the code to save/restore the registers, nor the instruction to return from the ISR. With this, we can safely add our code to save and restore the context (don't forget the return instruction):
1ISR(TIMER1_COMPA_vect) __atribute__ (( naked )); 2 3ISR(TIMER1_COMPA_vect) { 4 // macro that explicitly saves the execution context 5 SAVE_CONTEXT(); 6 7 handleTick(); // handle the timer tick 8 9 // macro that explicitly restores the execution context 10 RESTORE_CONTEXT(); 11 12 // return from interrupt 13 asm volatile ( "reti" ); 14}
Saving the context
The following macro is responsible for saving the execution context. Note that
the variable currStack
contains the memory address of the current stack's
stack pointer. We can think of it as a pointer to the stack pointer, which in
turn points to the top of the stack. In the case of Arduino UNO, pointers are
16-bit wide, 1 byte for each Stack Pointer register (SPH
, SPL
).
1 2#define SAVE_CONTEXT() \ 3asm volatile ( \ 4 "push r0 \n\t" \ (1) 5 "in r0, __SREG__ \n\t" \ (2) 6 "cli \n\t" \ (3) 7 "push r0 \n\t" \ (4) 8 "push r1 \n\t" \ (5) 9 "clr r1 \n\t" \ (6) 10 "push r2 \n\t" \ (7) 11 "push r3 \n\t" \ 12 "push r4 \n\t" \ 13 "push r5 \n\t" \ 14 15 : 16 : 17 : 18 19 "push r30 \n\t" \ 20 "push r31 \n\t" \ 21 "lds r26, currStack \n\t" \ (8) 22 "lds r27, currStack + 1 \n\t" \ (9) 23 "in r0, __SP_L__ \n\t" \ (10) 24 "st x+, r0 \n\t" \ (11) 25 "in r0, __SP_H__ \n\t" \ (12) 26 "st x+, r0 \n\t" \ (13) 27)
- We save
R0
(__tmp_reg__
) first (1), because it is used when saving the status register (SREG
); - We move
SREG
toR0
(2), so we can push it to the stack later (4); - After saving
SREG
, we disable interrupts. It is important that we save interrupts as soon as possible. We do this after savingSREG
, because that's where the information about interrupts being enabled/disabled is stored. Although we're just analyzing the case where a task is preempted, because of a timer interrupt, there are other cases where we need to use this macro (task blocking, sleeping, yielding), so we need to disable interrupts explicitly; - The compiler assumes that
R1
(__zero_reg__
) to be zero. We save (5) its value before clearing (6) it; - Between (7) and (8), we save all general purpose registers in
numerical order (from
R2
toR31
); - In (8) and (9) we take the address of where we store the stack pointer
and store it in registers
R26
andR27
(x+
register). This is done, so we can take the low byte (10 and 11) and the high nibble (12 and 13) of the stack pointer (SP
) and store it there.
Restoring the context
The process to restore the context is similar to the one to save it: we mainly just pop the registers we pushed while saving the context in the reverse order.
1#define RESTORE_CONTEXT() \ 2asm volatile ( \ 3 "lds r26, currStack \n\t" \ (1) 4 "lds r27, currStack + 1 \n\t" \ (2) 5 "ld r28, x+ \n\t" \ 6 "out __SP_L__, r28 \n\t" \ (3) 7 "ld r29, x+ \n\t" \ 8 "out __SP_H__, r29 \n\t" \ (4) 9 "pop r31 \n\t" \ 10 "pop r30 \n\t" \ 11 12 : 13 : 14 : 15 16 "pop r1 \n\t" \ 17 "pop r0 \n\t" \ (5) 18 "out __SREG__, r0 \n\t" \ (6) 19 "pop r0 \n\t" \ (7) 20)
- As mentioned above,
currStack
contains the address from where the stack pointer can be retrieved. It is loaded into thex+
register (1 and 2); - The stack pointer is loaded into registers
R28
andR29
, and then loaded to the stack pointer registers (SPH
,SPL
); - All the general purpose registers are popped from the stack in the reverse order that they were push in;
- While saving the context, the status register (
SREG
) was pushed betweenR0
andR1
. We restore it before restoringR0
.
Program counter
When an interrupt occurs, the program counter (PC
) is automatically placed on
the stack. This is the same thing that happens when we call a function. This
leads to the following stack state upon context saving:
1| SPH | 2|----------------| 3| SPL | 4|----------------| 5| R1 - R31 | 6|----------------| 7| SREG | 8|----------------| 9| R0 | 10|----------------| 11| PC | <-- Program Counter is pushed before the SAVE_CONTEXT macro 12|----------------| 13| Stack contents | 14|----------------|
After the RESTORE_CONTEXT macro, the PC
is at the top of the stack. When the
code reaches the next RET
/RETI
instruction, it will pop the PC from the
stack and continue executing the pointed-to instruction.
Initializing the stack
With the ideas from the previous chapter, we know how to save and restore the
context of a task on its own stack: we just need to change the value of
currStack
to the one corresponding to the task that should execute right now.
There are only two questions now:
- Choosing the size of a task's stack;
- What should be the initial state of the task;
Choosing the size of a task's stack
To do this, we can analyze the task's code and calculate the stack size for a given task, or determine the size experimentally.
Analyzing the code can be interesting. In fact, we did that for our first task and came close to the real value. The problem is that it is a somewhat time consuming process and small changes in a task's code can become cumbersome, because of compiler optimizations. Still, if we don't care about finding the exact size, and fine with having just an upper-bound, we can naively add the size of all local variables. With this, we will focus on the experimental analysis.
Some people suggest analyzing some of GCC's output, namely the output produced
by -fdump-tree-all
, and -fstack-usage
. We decided to use a more
brute-force approach.
Canaries
We add a few bytes of a compile-time known value to the beginning and the end of a stack. These are useful for run-time assertions that the stack hasn't been corrupted: we didn't try to write more than the maximum stack size.
We use the canaries described above to find the minimum size needed for a stack. We use a binary search where we start with upper-bound described previously (sum all needed local variables' sizes) and start lowering the stack size iteratively. When leave the task running for a while, checking if the canaries remain intact.
With this, we eventually reach the minimum stack size (with reasonable confidence). This method probably doesn't very well for tasks that have many branching paths, because it is hard to test all paths.
Note: the canaries and the context saving operation add a little more spacial overhead to the task's stack. The stack size that the user passes to the Task constructor doesn't need to consider this (the constructor adds this size automatically). As of writing, the canaries add 6 bytes, and the context switching adds 34 bytes (32 general purpose registers + the status register + the program counter).
What should be the initial state of the task?
When we create a task, we also need to initialize its stack. The first time the task starts executing, it will be because of a RESTORE_CONTEXT. This means that the initial state of the stack should match the state that would happen if we preempted the task just as it was about to start executing: context saved when the next instruction to execute would be the first instruction of the task.
To do this, we just need to mimic the behavior of SAVE_CONTEXT. Bellow is an adaptation of the code we use to do this:
1void inline Task::push2stack(stack_t pushable) { 2 *this->stackAddr = pushable; 3 // the stack grows backwards in Arduino UNO 4 this->stackAddr--; 5} 6 7void Task::initializeStack() { 8 // this->stackAddr starts at the beginning of the stack 9 10 // save task code (PC) at the bottom of the stack 11 auto runAddr = (POINTER_SIZE_TYPE) this->run; 12 this->push2stack((stack_t) (runAddr & (POINTER_SIZE_TYPE) 0x00ff)); 13 runAddr >>= 8; 14 this->push2stack((stack_t) (runAddr & (POINTER_SIZE_TYPE) 0x00ff)); 15 16 this->push2stack((stack_t) 0x00); /* R0 */ 17 this->push2stack((stack_t) 0x80); /* SREG */ 18 this->push2stack((stack_t) 0x00); /* R1 - compiler expects it to be 0 */ 19 // debug values 20 this->push2stack((stack_t) 0x02); /* R2 */ 21 this->push2stack((stack_t) 0x03); /* R3 */ 22 this->push2stack((stack_t) 0x04); /* R4 */ 23 this->push2stack((stack_t) 0x05); /* R5 */ 24 this->push2stack((stack_t) 0x06); /* R6 */ 25 this->push2stack((stack_t) 0x07); /* R7 */ 26 this->push2stack((stack_t) 0x08); /* R8 */ 27 this->push2stack((stack_t) 0x09); /* R9 */ 28 this->push2stack((stack_t) 0x10); /* R10 */ 29 this->push2stack((stack_t) 0x11); /* R11 */ 30 this->push2stack((stack_t) 0x12); /* R12 */ 31 this->push2stack((stack_t) 0x13); /* R13 */ 32 this->push2stack((stack_t) 0x14); /* R14 */ 33 this->push2stack((stack_t) 0x15); /* R15 */ 34 this->push2stack((stack_t) 0x16); /* R16 */ 35 this->push2stack((stack_t) 0x17); /* R17 */ 36 this->push2stack((stack_t) 0x18); /* R18 */ 37 this->push2stack((stack_t) 0x19); /* R19 */ 38 this->push2stack((stack_t) 0x20); /* R20 */ 39 this->push2stack((stack_t) 0x21); /* R21 */ 40 this->push2stack((stack_t) 0x22); /* R22 */ 41 this->push2stack((stack_t) 0x23); /* R23 */ 42 43 // place the parameter on the stack in the expected location 44 auto paramAddr = (POINTER_SIZE_TYPE) this->params; 45 this->push2stack((stack_t) (paramAddr & (POINTER_SIZE_TYPE) 0x00ff)); 46 paramAddr >>= 8; 47 this->push2stack((stack_t) (paramAddr & (POINTER_SIZE_TYPE) 0x00ff)); 48 49 // more debug values on R26-R31 50 this->push2stack((stack_t) 0x26); /* R26 */ 51 this->push2stack((stack_t) 0x27); /* R27 */ 52 this->push2stack((stack_t) 0x28); /* R28 */ 53 this->push2stack((stack_t) 0x29); /* R29 */ 54 this->push2stack((stack_t) 0x30); /* R30 */ 55 this->push2stack((stack_t) 0x31); /* R31 */ 56}
- In Arduino UNO, the pointer size is 16-bit, so we define
POINTER_SIZE_TYPE
asuint16_t
; - The type
stack_t
represents a byte usingunsigned char
; - I excluded the code that sets the canaries as it is optional;
- Registers R24 and R25 contain the argument (
void *
) passed to the task; - Most registers don't have a meaningful value at the start of the task, so we fill them with debug values (values that are easily identifiable on a debugger);
- In Arduino UNO, the stack grows backwards: from the end of the stack array (high addresses) to the beginning of the array (low addresses);
- The
PC
is stored at the bottom of the stack and contains the address of the task's function: address of the first instruction.
Task yielding
When a task finishes an activation (executes all work for the current period), it needs to wait until the next activation to execute again. In the previous chapter, the tasks returned after each activation, thus signaling the kernel about the end of the activation. Since our task functions now contain a loop, we need a new way to signal the kernel to handle this. We saw 2 possible implementations:
- The task would sleep for the duration until the beginning of the next period:
- This would be difficult to do and would require more code from the programmer's side, so we decided to avoid it.
- The task calls a method (
yield
) to signal the kernel that it is done executing.
We went with the second option, because it is simpler to implement, and also to use by the programmer.
Given the ideas on previous chapters, yielding is fairly straight forward: the task calls a method that:
- Saves its context;
- Sets it as not ready for execution;
- Selects the next task to run: new highest priority task;
- Restores the context to the new task.
Idle task
There may be CPU idle periods during execution. This is problematic, because a task that wants to yield, would have no task to yield to. In order to deal with this edge-case without implementing more code, we decided to implement a special task (idle task). This task is set up, so it is always ready (never yields), and the lowest priority task of all the ready tasks.
With this, whenever we reach a CPU idle period, this is the task that runs. This model resembles a background execution server, that is something that we might talk about if we ever discuss aperiodic tasks.
Conclusion
In this chapter, we saw how we can save and restore the execution context of our tasks. This allows us to have a single function for each task, that is only called once (similar to POSIX threads). In future chapters, we will use this to implement blocking, and sleeping of tasks.
In the next chapter, we will look into dynamic priorities for tasks, and implement the Earliest Deadline First (EDF) algorithm.
Stay safe :P