# Port of Mimiker Operating System for AArch64 Architecture

(Port systemu operacyjnego Mimiker na architekturę AArch64)

Paweł Jasiak

Bachelor's thesis

**Supervisors:** mgr Krystian Bacławski, dr Piotr Witkowski

> University of Wrocław, Faculty of Mathematics and Computer Science, Institute of Computer Science

> 20 June 2021

#### Abstract

Many processor families are available today. In my work, I present the process of preparing an operating system port for a new family of processors, using the port of Mimiker to the AArch64 architecture as an example. As this is the first port of that system, it required extra work to separate the architecture-dependent parts from the architecture-independent ones. The port requires modifying many critical parts of the kernel: processor exception handling, management of process address spaces, and data exchange between kernel threads and user programs. In addition, it was necessary to prepare a set of tools that support the selected architecture, such as a compiler and a hardware debugger. As a result of this work, Mimiker can run for the first time on widely available hardware: the Raspberry Pi 3.


to Wiktor, who showed me the way

# Contents

- 1 Introduction
  - 1.1 CPU architecture
  - 1.2 Mimiker
  - 1.3 What is a port?
  - 1.4 What is needed for a port?
- 2 RPI 3
  - 2.1 CPU
    - 2.1.1 MMU
    - 2.1.2 Exception levels
    - 2.1.3 Exceptions
    - 2.1.4 Timer
  - 2.2 UART
- 3 Preparing the system for porting process
  - 3.1 Dictionary
  - 3.2 CPU context
    - 3.2.1 FPU context
  - 3.3 Pmap
    - 3.3.1 Access emulation
    - 3.3.2 Growkernel
  - 3.4 KASAN
    - 3.4.1 Kernel bootstrap
    - 3.4.2 Grow kernel
  - 3.5 Infrastructure for testing
  - 3.6 Generic UART interface
    - 3.6.1 Low level interface
    - 3.6.2 Generic interrupt handler
    - 3.6.3 UART-TTY thread
- 4 Mimiker on AArch64
  - 4.1 How to run emulator?
  - 4.2 Interaction with user-space
    - 4.2.1 Copy
    - 4.2.2 Syscall handler
    - 4.2.3 crt0
    - 4.2.4 Signals
    - 4.2.5 Nonlocal goto
  - 4.3 pmap
    - 4.3.1 Interface
    - 4.3.2 Protection map
    - 4.3.3 Walk
    - 4.3.4 Activation
    - 4.3.5 Access emulation
    - 4.3.6 Growkernel
  - 4.4 KASAN
  - 4.5 Boot
    - 4.5.1 start.S
    - 4.5.2 boot.c
    - 4.5.3 board stack
  - 4.6 Exception handler
  - 4.7 Context switching
  - 4.8 Device tree
    - 4.8.1 Rootdev
    - 4.8.2 Timer
    - 4.8.3 PL011
  - 4.9 Summary
- 5 Mimiker on Raspberry Pi 3
  - 5.1 Installation
    - 5.1.1 Toolchain
    - 5.1.2 Configuration
    - 5.1.3 Compilation
    - 5.1.4 Final installation
    - 5.1.5 Debugging
  - 5.2 Challenges
    - 5.2.1 Boot process
    - 5.2.2 Destroying x0
    - 5.2.3 Address alignment
    - 5.2.4 Cache control
- 6 Summary
  - 6.1 Future work
- Bibliography

# Chapter 1

# Introduction

Nowadays there are multiple CPU architectures. The most popular in consumer devices are x86\_64 and AArch64. The first one has been used in most personal computers since 2003. The second one was introduced in 2011 and is used mostly in smartphones and IoT devices, but last year Apple began migrating its computers to that architecture. The ARM architecture is expected to become even more popular in the following years.

From a software engineer's point of view, the target architecture usually does not matter: we have many abstractions over hardware and operating systems. But somebody needs to create these abstractions.

In my thesis, I will guide the reader through the abstractions that need to be created in an operating system for a new architecture.

I assume that the reader knows the basic concepts from a standard computer architecture course and basic facts about Unix kernel design.

The structure of this thesis is as follows. This chapter introduces the Mimiker operating system and explains what a port is. Chapter 2 describes the most important facts about the Raspberry Pi 3 board, which is the target of my port. In Chapter 3, I explain the most important changes in the MIPS and machine-independent code of Mimiker. Chapter 4 describes implementation details of the AArch64 port. Chapter 5 contains instructions on how to run Mimiker on the Raspberry Pi 3 and describes the challenges of switching from an emulator to physical hardware. Chapter 6 summarizes my work and proposes future work.

## 1.1 CPU architecture

The central processing unit is the most essential part of every computer. It performs basic operations, such as arithmetic, for the rest of the system. Each CPU provides an interface for software called the instruction set architecture (ISA). The ISA is standardized for each CPU family. Each program that runs on the CPU is translated (directly or indirectly) from its original source code to binary code, which encodes instructions understandable by a given CPU.

However, a CPU architecture is more than an instruction set. The CPU also provides a standardized way (for each family) of handling things like interrupts and virtual memory management.

In my thesis, I will show the most essential parts of CPU architecture that kernel developers should become aware of.

## 1.2 Mimiker

Mimiker is a research operating system inspired by the world of Unix, and in particular by its \*BSD flavour. The main effort of the project is currently improving its kernel to the point where it supports more Unix userspace programs [16].

It is a fully open-source project developed at the University of Wrocław.

Originally, Mimiker was written for the Malta board, which contains a CPU from the MIPS family.

The MIPS architecture was introduced in 1986, but it has lost popularity in recent years. As a result, in March 2021 MIPS announced that the development of the MIPS architecture had ended.

Unfortunately, it is very difficult to get a working Malta board, so we decided to port Mimiker to a new architecture. We chose AArch64 because it is easily accessible to everyone through cheap boards like the Raspberry Pi, which have become very popular in recent years.

The result of my thesis is a fully working port for that board, which makes it possible to run Mimiker on a physical machine for the first time.

## 1.3 What is a port?

As mentioned in section 1.1, each program needs to be translated into binary code that is understandable by the CPU. Different families of CPUs can have different instruction sets and different interfaces for hardware management. A kernel compiled for one CPU family cannot run on a CPU from another family, because that CPU does not understand its binary encoding of instructions. Even if we generate the correct instruction encoding, the hardware interface could be different. A port of an operating system is a version of the kernel and user-space programs adapted to a specific architecture, with drivers for the devices used by the new hardware. Each port should contain only a minimal subset of machine-dependent code, which is used by the rest of the kernel through the hardware abstraction layer (HAL). The reason behind that is simple: more lines of code mean more potential defects.

The HAL is a set of interfaces, implemented by device drivers and by the port, that allows the rest of the kernel to be written generically. This set also determines how to write a new driver for the system. It speeds up adding support for new hardware, because the high-level interface of each driver is already defined.
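As an illustration only, here is a minimal C sketch of what such an interface can look like. All names (console\_ops\_t, console\_puts and the memory-backed fake driver) are hypothetical and do not come from Mimiker: generic code calls the device only through a table of function pointers, and each port or driver fills in that table. A buffer in memory stands in for real hardware, so the sketch runs anywhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical HAL interface: generic kernel code only sees this table. */
typedef struct console_ops {
  void (*put_char)(void *state, char c);
  int (*get_char)(void *state); /* -1 when no input is available */
} console_ops_t;

typedef struct console {
  const console_ops_t *ops; /* filled in by the port or driver */
  void *state;              /* driver-private data */
} console_t;

/* Machine-independent code: works with any console implementation. */
static void console_puts(console_t *cons, const char *s) {
  for (; *s != '\0'; s++)
    cons->ops->put_char(cons->state, *s);
}

/* Fake "driver" that writes into a memory buffer instead of a device. */
typedef struct membuf {
  char data[64];
  size_t len;
} membuf_t;

static void membuf_put_char(void *state, char c) {
  membuf_t *buf = state;
  if (buf->len + 1 < sizeof(buf->data)) { /* keep room for '\0' */
    buf->data[buf->len++] = c;
    buf->data[buf->len] = '\0';
  }
}

static int membuf_get_char(void *state) {
  (void)state;
  return -1; /* this fake device never has input */
}

static const console_ops_t membuf_ops = {
  .put_char = membuf_put_char,
  .get_char = membuf_get_char,
};
```

A real port would supply, for example, a UART-backed table with the same two functions; console\_puts itself would not change.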

Portability is a very important feature of code. Portable code can be easily reused on newer hardware – this saves the developer's time and reduces resources needed to create a new version of the code. Portable systems are more likely to gain popularity.

## 1.4 What is needed for a port?

First of all, we need a toolchain that can create binaries for the given architecture. In our case, it is the GNU toolchain with the GCC compiler. We also need a set of tools for testing and debugging code. As a software debugger, we use gdb, and as an emulator of the Raspberry Pi 3 we chose QEMU. For more information about the toolchain, see 5.1.

We also need to find which subsystems must be rewritten to support the new architecture. Here is a short introduction to the most important parts.

Kernel bootstrapping is the first phase of kernel initialization. We need to configure the CPU and enable the MMU before jumping to the machine-independent part of the code. It is the most sensitive part of machine-dependent code. We need to do it only once during system startup, but every decision made during bootstrapping affects the behavior of the hardware. For example, we can define how the CPU should react to unaligned accesses.

Interaction with the MMU is required to implement virtual memory. In our case, it is handled by the pmap module, which is described in 4.3.

We need to be able to react to external events, e.g. a key press on a keyboard. They are delivered as CPU exceptions, which are described in 4.6.

Context switching allows us to run multiple programs concurrently. It is machine-dependent because each CPU can have a different set of general-purpose registers. It is described in 4.7.

A kernel without user-space processes is useless. Basic interaction with them needs to be machine-dependent because ISAs differ between CPUs. It also relies on exception handling, which is used to implement system calls and is also CPU-specific. These interactions are described in 4.2.

A system that can only use the CPU cannot do anything interesting. External devices are important for interacting with the world. Since Mimiker already has working terminals, we want to implement a UART driver for communicating with the system. The second important device is the timer: without it, we cannot measure time, which is important for process management. For the driver implementations, see 4.8.1.

# Chapter 2

# RPI 3

Raspberry Pi is a small single-board computer developed by the Raspberry Pi Foundation in association with Broadcom [25]. It is a simple, easily accessible platform. My work is based on the Raspberry Pi 3, which uses the BCM2837 chip [21].

Here I describe the basic facts about the hardware that are needed for the porting process. They include the memory management unit of the AArch64 architecture, the CPU exceptions needed for handling external events, the timer interface, and the PL011 UART device.

## 2.1 CPU

In this section, I introduce important facts about the quad-core ARM Cortex-A53 CPU used by the Raspberry Pi 3 [22].

This CPU implements the ARMv8-A architecture [2], whose 64-bit execution state and instruction set are called AArch64. In this work, we use these terms interchangeably.

### 2.1.1 MMU

The memory management unit is responsible for translating addresses from virtual to physical memory.

In modern operating systems, each process has its own separate address space. From a process's point of view, no other process exists and every address belongs to that process. The only requirement for using a given memory address is a system call asking the kernel to make that address valid. In this scenario, multiple processes can use the same virtual addresses, but they should not share the underlying memory. For that purpose, we have the concept of virtual memory: for each memory address used by a process, there is a mapping from a virtual address to a physical address. To achieve that, we need support from hardware, and this is the role of the MMU. The MMU performs address translation, and the kernel decides where each virtual memory region of a process is mapped in physical memory. For more information about the role of the memory management unit and virtual memory, see [14].

For that purpose, we use a data structure called the page table.

#### Page Table layout

For general information about how a page table works, see [14]. In my thesis, I use a 4-level page table (levels 0 to 3), where level 3 contains the final mapping into physical memory.

We use 4 levels because our CPU supports them and we want to be able to use as much memory as possible. Newer Raspberry Pi models support up to 8 GiB of RAM, which cannot be addressed by the 2-level page table used by the MIPS version of Mimiker.

| bits [47:39]  | bits [38:30]  | bits [29:21]  | bits [20:12]  | bits [11:0]    |
|---------------|---------------|---------------|---------------|----------------|
| Level 0 index | Level 1 index | Level 2 index | Level 3 index | Offset on page |

Table 2.1: Virtual address format
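The split from Table 2.1 maps directly to shifts and masks. The sketch below assumes 4 KiB pages and 9 index bits per level, as in the table; the helper names vaddr\_index and vaddr\_offset are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* offset occupies bits [11:0] */
#define LEVEL_BITS 9     /* 512 entries per table */
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

/* Index into the table at `level` (0..3) used to translate `va`. */
static unsigned vaddr_index(uint64_t va, int level) {
  /* level 3 starts at bit 12, level 2 at 21, level 1 at 30, level 0 at 39 */
  int shift = PAGE_SHIFT_4K + LEVEL_BITS * (3 - level);
  return (unsigned)((va >> shift) & LEVEL_MASK);
}

/* Byte offset of `va` within its 4 KiB page. */
static unsigned vaddr_offset(uint64_t va) {
  return (unsigned)(va & ((1u << PAGE_SHIFT_4K) - 1));
}
```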

Each page directory occupies exactly one page (4096 bytes) and holds 512 64-bit entries (PDEs, page directory entries). With four levels, a single page table can therefore address $512^4 \times 4096 = 2^{48}$ bytes, i.e. 256 TiB.

| Level 0 | Level 1 | Level 2 | Level 3 |
|---------|---------|---------|---------|
| 256 TiB | 512 GiB | 1 GiB   | 2 MiB   |

Table 2.2: Page table mapping size

This table describes how much memory can be addressed by a single page table at a given level. For example, consider a single page table at level 2. It occupies one page of 4096 bytes, and a single entry always has 8 bytes (in our CPU), so it holds 512 entries, each pointing to a level 3 table. The same goes for level 3, but there a single entry describes one page, which is 4096 bytes of memory. Altogether, we have $512 \times 512 \times 4096 = 2^{30}$ bytes of memory, which is 1 GiB.
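The same reasoning applies to every level: one level 3 entry maps a 4 KiB page, and each level above multiplies the coverage by 512. A short C sketch with a hypothetical table\_coverage helper reproduces the values from Table 2.2.

```c
#include <stdint.h>

#define PT_PAGE_BYTES 4096ULL /* bytes mapped by one level 3 entry */
#define PT_ENTRIES 512ULL     /* 4096-byte table / 8-byte entries */

/* Bytes covered by one complete page table at `level` (0..3). */
static uint64_t table_coverage(int level) {
  uint64_t size = PT_PAGE_BYTES * PT_ENTRIES; /* level 3 table: 2 MiB */
  for (int l = 3; l > level; l--)
    size *= PT_ENTRIES; /* each level up covers 512 times more */
  return size;
}
```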

In addition to the mapping, the page table contains permission bits for each page. We are interested in the following bits, provided by the memory management unit of our CPU. Their combinations are translated into a more user-friendly format in Table 2.3:

- *AF* [10]: access flag; without this bit, every access to the page triggers a CPU exception; Mimiker uses it to track accesses to every page; for more, see 4.3.5;
- *USER* [6]: unprivileged access permission; without this bit, every access to the page from exception level 0 triggers a CPU exception;
- *RO* [7]: read only; with this bit, every write access to the page triggers a CPU exception;
- *UXN* [54]: unprivileged execute never; with this bit, an instruction fetch from the page at exception level 0 triggers a CPU exception; for information about exception levels, see 2.1.2;
- *PXN* [53]: privileged execute never; with this bit, an instruction fetch from the page at higher exception levels triggers a CPU exception.



Figure 2.1: Page table entry

For more details see [3].

| access       | AF | USER | RO | UXN & PXN |
|--------------|----|------|----|-----------|
| user read    | 1  | 1    | *  | *         |
| user write   | 1  | 1    | 0  | *         |
| user exec    | 1  | 1    | *  | 0         |
| kernel read  | 1  | *    | *  | *         |
| kernel write | 1  | *    | 0  | *         |
| kernel exec  | 1  | *    | *  | 0         |

Table 2.3: Protection map

This table describes which access bits need to be set in a page table entry for the MMU to translate an access successfully. Other configurations cause a memory fault.
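Table 2.3 can also be read as a predicate over the descriptor bits. The following sketch encodes it in C using the bit positions from the list above; the macro and function names are hypothetical rather than Mimiker's identifiers, and, like the table, it ignores the other checks the MMU performs (such as the descriptor's valid bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Descriptor bits from the list above (illustrative names). */
#define PTE_USER (1ULL << 6)  /* unprivileged access allowed */
#define PTE_RO (1ULL << 7)    /* read only */
#define PTE_AF (1ULL << 10)   /* access flag */
#define PTE_PXN (1ULL << 53)  /* privileged execute never */
#define PTE_UXN (1ULL << 54)  /* unprivileged execute never */

typedef enum { ACC_READ, ACC_WRITE, ACC_EXEC } access_t;

/* True if the access succeeds according to Table 2.3. */
static bool access_allowed(uint64_t pte, bool user, access_t acc) {
  if (!(pte & PTE_AF))
    return false; /* AF clear: every access faults */
  if (user && !(pte & PTE_USER))
    return false; /* EL0 needs the USER bit */
  if (acc == ACC_WRITE && (pte & PTE_RO))
    return false; /* writes need RO clear */
  if (acc == ACC_EXEC && (pte & (user ? PTE_UXN : PTE_PXN)))
    return false; /* UXN for user, PXN for kernel */
  return true;
}
```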

Two different page tables can be active at the same time. Their addresses are stored in the ttbr0 and ttbr1 CPU registers. The first page table is dedicated to user space, the second one to kernel space.

The CPU decides which page table to use based on the highest bits of the virtual address. Whether the access is then allowed depends on the current exception level and on the permission bits described above.
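A sketch of this selection rule in C, assuming 48-bit virtual addresses in both halves (TCR\_EL1.T0SZ = T1SZ = 16) and the top-byte-ignore feature disabled; select\_ttbr is a hypothetical helper. Addresses whose bits [63:48] are all zeros go through ttbr0, addresses with all ones go through ttbr1, and any other address causes a translation fault.

```c
#include <stdint.h>

typedef enum { USE_TTBR0, USE_TTBR1, VA_FAULT } ttbr_sel_t;

/* Assumes 48-bit VAs in both halves and no top-byte-ignore. */
static ttbr_sel_t select_ttbr(uint64_t va) {
  uint64_t top = va >> 48; /* bits [63:48] */
  if (top == 0)
    return USE_TTBR0; /* user-space half */
  if (top == 0xffff)
    return USE_TTBR1; /* kernel-space half */
  return VA_FAULT;    /* address outside both halves */
}
```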

### 2.1.2 Exception levels

In ARMv8, execution takes place at one of the exception levels. A higher exception level means more privileges for the executed code. It is a common practice to give a program access only to the resources it needs. In Mimiker, we use exception levels to separate user-space and kernel-space threads.

When an exception occurs, the CPU jumps to a special procedure defined separately for each kind of exception and for each exception level. We can implement interrupt handlers or system calls using this mechanism.

There are four exception levels on AArch64:

- EL0 (application): usually the level where user space runs;
- EL1 (kernel): usually the level where the kernel runs;
- EL2 (hypervisor): used by a hypervisor;
- EL3 (firmware): reserved for low-level firmware and security code.

These exception levels determine privileges for memory and registers access.

Mimiker uses EL3 and EL2 only to configure EL1 at the beginning of the kernel bootstrap code. Then it switches to EL1, where the kernel runs, and uses EL0 for user-space programs.

### 2.1.3 Exceptions

In AArch64, there are two main kinds of exceptions (4.2):

- synchronous
- asynchronous

For each type of exception, we need to define a special function called an exception handler, which is executed when the exception occurs.

The CPU must be able to find these handlers when an exception occurs. For this purpose, there is an exception vector table that contains their entry points, and its address is stored in the vbar register.
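The layout of this table is fixed by the architecture: the base address in VBAR\_EL1 is 2 KiB aligned, and the table has 16 entries of 0x80 bytes each, grouped by where the exception was taken from and by its type (synchronous, IRQ, FIQ, SError). The sketch below computes the offset of the entry the CPU jumps to; vector\_offset and the enum names are hypothetical.

```c
#include <stdint.h>

/* Where the exception comes from; each group holds four entries. */
typedef enum {
  FROM_CUR_EL_SP0 = 0, /* current EL, stack pointer SP_EL0 */
  FROM_CUR_EL_SPX = 1, /* current EL, stack pointer SP_ELx */
  FROM_LOWER_A64 = 2,  /* lower EL in AArch64, e.g. user space */
  FROM_LOWER_A32 = 3,  /* lower EL in AArch32 */
} exc_origin_t;

/* Kind of exception; selects the entry within a group. */
typedef enum {
  EXC_SYNC = 0,
  EXC_IRQ = 1,
  EXC_FIQ = 2,
  EXC_SERROR = 3,
} exc_type_t;

/* Offset of the vector entry relative to the address in VBAR_EL1. */
static uint32_t vector_offset(exc_origin_t from, exc_type_t type) {
  return (uint32_t)from * 0x200 + (uint32_t)type * 0x80;
}
```

For example, an svc executed by a user program lands at offset 0x400 (synchronous, lower EL using AArch64), and a timer interrupt taken while the kernel runs on its own stack lands at 0x280.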

#### Interrupts

Interrupts are asynchronous events generated by the outside world. A good example is the timer tick. When we configure the timer, we specify that we want it to generate an interrupt after x time units. It can be used not only for time measurement but also for scheduling: after x ticks, we decide whether to replace the running thread with another one.

It is important that an interrupt can arrive at any point in time, for example between any two instructions of the interrupted code. So we cannot make any special assumptions about the state of the CPU.

#### Traps

Traps are synchronous exceptions generated by special instructions. For example, if we want to transfer control from user space to kernel space, we can use the svc instruction, which generates a trap. In fact, we use this instruction to implement system calls in Mimiker.

#### Aborts

Aborts are synchronous exceptions generated by instructions, but unlike traps, they are not intended. They can occur, for example, as a result of an invalid memory access.
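Traps and aborts both arrive at the synchronous entry of the exception vector, so the handler has to tell them apart. On AArch64 it can read the exception syndrome register ESR\_EL1, whose exception class field (bits [31:26]) says what happened. The sketch below decodes a few classes relevant to this thesis; the class values are architectural, while classify\_sync and the enum are hypothetical names.

```c
#include <stdint.h>

/* Exception class values of the ESR_ELx.EC field (ARMv8-A). */
#define EC_FP_SIMD 0x07      /* FP/SIMD access trapped, e.g. FPU disabled */
#define EC_SVC64 0x15        /* svc instruction executed in AArch64 */
#define EC_IABORT_LOWER 0x20 /* instruction abort from a lower EL */
#define EC_IABORT_CUR 0x21   /* instruction abort from the current EL */
#define EC_DABORT_LOWER 0x24 /* data abort from a lower EL */
#define EC_DABORT_CUR 0x25   /* data abort from the current EL */

typedef enum { SYNC_SYSCALL, SYNC_FPU_TRAP, SYNC_ABORT, SYNC_OTHER } sync_kind_t;

static unsigned esr_class(uint64_t esr) {
  return (unsigned)((esr >> 26) & 0x3f); /* EC is bits [31:26] */
}

static sync_kind_t classify_sync(uint64_t esr) {
  switch (esr_class(esr)) {
  case EC_SVC64:
    return SYNC_SYSCALL; /* trap: intended entry into the kernel */
  case EC_FP_SIMD:
    return SYNC_FPU_TRAP;
  case EC_IABORT_LOWER:
  case EC_IABORT_CUR:
  case EC_DABORT_LOWER:
  case EC_DABORT_CUR:
    return SYNC_ABORT; /* abort: e.g. an invalid memory access */
  default:
    return SYNC_OTHER;
  }
}
```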

### 2.1.4 Timer

Our CPU provides the ARM generic timer. We do not need any advanced features of the timer, so let us discuss only the basics.

We use the following registers:

- cntpct\_el0: contains the current value of the timer;
- cntp\_ctl\_el0: controls whether the timer is enabled;
- cntp\_cval\_el0: compare register; when the current value of the timer becomes greater than or equal to it, the timer sends an interrupt to the CPU.

With that knowledge, we can implement a driver for this timer (see 4.8.2). It is required for task scheduling: we want to give each process a time slice during which it can run on the CPU. After that, we want to decide which process should run next. For that, we use timer interrupts, which trigger the scheduling subsystem.
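The compare-and-fire behaviour can be modelled in plain C. In the sketch below, a hypothetical arm\_timer\_t structure stands in for the three registers (a real driver would access them with mrs and msr, for example through READ\_SPECIALREG and WRITE\_SPECIALREG from 3.1); ENABLE and IMASK are bits 0 and 1 of cntp\_ctl\_el0.

```c
#include <stdbool.h>
#include <stdint.h>

#define CNTP_CTL_ENABLE (1u << 0) /* timer enabled */
#define CNTP_CTL_IMASK (1u << 1)  /* timer interrupt masked */

/* Hypothetical software model of the timer registers. */
typedef struct {
  uint64_t cntpct; /* current counter value */
  uint32_t ctl;    /* cntp_ctl_el0 */
  uint64_t cval;   /* cntp_cval_el0 */
} arm_timer_t;

/* Arm the timer to fire `ticks` counter ticks from now. */
static void timer_arm(arm_timer_t *t, uint64_t ticks) {
  t->cval = t->cntpct + ticks;
  t->ctl = CNTP_CTL_ENABLE; /* enabled, interrupt not masked */
}

/* True when the timer would raise its interrupt. */
static bool timer_irq_pending(const arm_timer_t *t) {
  return (t->ctl & CNTP_CTL_ENABLE) && !(t->ctl & CNTP_CTL_IMASK) &&
         t->cntpct >= t->cval;
}
```

A scheduler would typically call timer\_arm again from the interrupt handler to start the next time slice.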

## 2.2 UART

The Raspberry Pi 3 has a PL011 UART [15].

This universal asynchronous receiver/transmitter provides:

- separate 16x8 transmit and 16x12 receive FIFO memory
- programmable baud rate generator
- standard asynchronous communication bits
- false start bit detection
- line break generation and detection
- support of the modem control functions CTS and RTS
- programmable hardware flow control
- fully-programmable serial interface characteristics:
  - data can be 5, 6, 7 or 8 bits
  - even, odd, stick or no-parity bit generation and detection
  - 1 or 2 stop bit generation
  - baud rate generation

We use two of the interrupts generated by this UART:

- UARTTXINTR: the transmit interrupt;
- UARTRXINTR: the receive interrupt.

These interrupts are necessary to implement a terminal over this UART. For now, a terminal over UART is the only way to run interactive shell sessions in Mimiker.

The last important piece of information is the location of the registers used to control the PL011. All of them are listed in [15].
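As an example of programming these registers, the baud rate is set through the integer and fractional divisors UARTIBRD and UARTFBRD. According to the PL011 documentation, the divisor is UARTCLK / (16 × baud rate), with the fractional part stored in units of 1/64. The sketch below computes both values; pl011\_divisors is a hypothetical helper, and the 48 MHz clock used in the usage note is only an example, not necessarily the clock configured on the Raspberry Pi 3.

```c
#include <stdint.h>

typedef struct {
  uint32_t ibrd; /* integer divisor, written to UARTIBRD */
  uint32_t fbrd; /* fractional divisor in 1/64 units, written to UARTFBRD */
} pl011_div_t;

/* BAUDDIV = uartclk / (16 * baud); computed as round(64 * BAUDDIV). */
static pl011_div_t pl011_divisors(uint32_t uartclk, uint32_t baud) {
  /* 64 * uartclk / (16 * baud) == 4 * uartclk / baud, rounded to nearest */
  uint64_t div64 = ((uint64_t)uartclk * 4 + baud / 2) / baud;
  pl011_div_t d = {
    .ibrd = (uint32_t)(div64 >> 6),   /* whole part */
    .fbrd = (uint32_t)(div64 & 0x3f), /* 6-bit fractional part */
  };
  return d;
}
```

With a 48 MHz clock and 115200 baud this yields UARTIBRD = 26 and UARTFBRD = 3, i.e. an effective divisor of 26 + 3/64 ≈ 26.047 compared with the exact 26.042.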

# Chapter 3

# Preparing the system for porting process

In this chapter, I will describe the most important changes that have been done before we could implement AArch64 support for Mimiker.

It starts with a short dictionary of the most frequently used C constructs in the Mimiker codebase. Next, I go through the changes in the CPU context representation, the memory mapping subsystem, the kernel address sanitizer routines, and a new hardware abstraction layer over UART devices.

## 3.1 Dictionary

We use some specific constructions in our code. Here I will introduce those that might not be obvious.

The following macros read and write special registers using the mrs and msr instructions.

```
READ_SPECIALREG(reg);
WRITE_SPECIALREG(reg, val);
```

WITH\_INTR\_DISABLED disables interrupts in the following scope. They are re-enabled at the end of the scope. It also handles nested calls: we maintain an internal counter.

```
WITH_INTR_DISABLED {
  ...
}
```

WITH\_MTX\_LOCK acquires a mutual exclusion lock (mutex) for everything in the following scope. The lock is released automatically.

```
WITH_MTX_LOCK(&lock) {
  ...
}
```

SCOPED\_MTX\_LOCK acquires a mutex for the current scope. The lock is released automatically at the end of the scope.

```
SCOPED_MTX_LOCK(&lock);
```

The same, but for spin locks.

```
WITH_SPIN_LOCK(&lock) {
  ...
}
```

Wait on the given condition variable with the given lock.

```
cv_wait(&cv, &lock);
```

Wake a single thread waiting on a condition variable.

```
cv_signal(&cv);
```

Macro for a for loop that iterates over a list.

```
TAILQ_FOREACH(var, head, name)
```

Insert an element at the end of the list.

```
TAILQ_INSERT_TAIL(head, elm, name);
```

Remove an element from the list.

```
TAILQ_REMOVE(head, elm, name);
```

For the detailed list API, see the NetBSD manual pages [23]. For mutexes see [26], for spin locks see [27] and for condition variables see [28].
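The WITH\_\* and SCOPED\_\* macros release their lock automatically when the scope ends. One way to build such macros, shown here only as a sketch and not as Mimiker's actual definitions, is the GCC/Clang cleanup attribute combined with a for statement whose body runs once. A hypothetical toy lock that only records its state replaces a real mutex.

```c
#include <stddef.h>

/* Hypothetical toy lock: it only records whether it is held. */
typedef struct toy_mtx {
  int locked;
} toy_mtx_t;

static toy_mtx_t *toy_lock(toy_mtx_t *m) {
  m->locked = 1;
  return m;
}

/* Cleanup handler: receives a pointer to the guard variable. */
static void toy_unlock_guard(toy_mtx_t **guard) {
  (*guard)->locked = 0;
}

/* Hold the lock until the end of the enclosing scope (one per scope). */
#define SCOPED_TOY_LOCK(m)                                                 \
  toy_mtx_t *_scoped_guard                                                 \
    __attribute__((cleanup(toy_unlock_guard), unused)) = toy_lock(m)

/* Hold the lock for the following block; `_once` makes it run once. */
#define WITH_TOY_LOCK(m)                                                   \
  for (toy_mtx_t *_guard                                                   \
         __attribute__((cleanup(toy_unlock_guard), unused)) = toy_lock(m), \
         *_once = (m);                                                     \
       _once != NULL; _once = NULL)
```

Because the unlock happens in the cleanup handler, leaving the block with break or return also releases the lock.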

## 3.2 CPU context

The CPU context consists of the CPU registers. We need to manipulate the context mostly for two actions:

- context switch
- exception handler

A context switch is the procedure of replacing the currently running thread with another one. Each thread has a time slice during which it can run on the CPU. After that period, the scheduler must decide which thread should replace the running thread, and the dispatcher performs the change. This is very important for modern operating systems, which run hundreds of threads on everyday machines. For example, my system was running 637 threads at the time of writing, and I have only 8 available cores. Yet from the user's point of view, every process appears to run simultaneously.

The second action is handling a CPU exception. Then we need to save the context of the running thread, go to the exception handler in the kernel, handle the exception and, at the end, restore the context of the interrupted thread.

At the beginning of my work, there were several different CPU context representations:

- struct sigcontext
- struct \_\_ucontext
- struct mcontext
- struct ctx
- struct exc\_frame
- jmp\_buf
- sigjmp\_buf

Each of them has a similar structure that contains all CPU registers as individual fields or grouped by arrays, for example:

```
struct sigcontext {
  int sc_onstack;    /* sigstack state to restore */
  sigset_t sc_mask;  /* signal mask to restore */
  int sc_pc;         /* pc at time of signal */
  int sc_regs[32];   /* processor regs 0 to 31 */
  int sc_mullo;
  int sc_mulhi;      /* mullo and mulhi registers... */
  int sc_fpused;     /* fp has been used */
  int sc_fpregs[33]; /* fp regs 0 to 31 and csr */
  int sc_fpc_eir;    /* floating point exception instruction reg */
};
```

Listing 1: sigcontext structure

Now all of them are unified into one:

```
typedef uint64_t __greg_t;
typedef __greg_t __gregset_t[_NGREG];

typedef struct {
  union __freg {
    uint8_t __b8[16];
    uint16_t __h16[8];
    uint32_t __s32[4];
    uint64_t __d64[2];
  } __qregs[_NFREG] __aligned(16);
  uint32_t __fpcr; /* FPCR */
  uint32_t __fpsr; /* FPSR */
} __fregset_t;

typedef struct mcontext {
  __gregset_t __gregs; /* General Purpose Register set */
  __fregset_t __fregs; /* FPU/SIMD Register File */
  __greg_t __spare[8]; /* future proof */
} mcontext_t;

typedef struct ctx {
  __gregset_t __gregs;
} ctx_t;

struct __ucontext {
  unsigned int uc_flags;  /* properties */
  ucontext_t *uc_link;    /* context to resume */
  sigset_t uc_sigmask;    /* signals blocked in this context */
  stack_t uc_stack;       /* the stack used by this context */
  mcontext_t uc_mcontext; /* machine state */
};
```

Listing 2: context representation (mix of ucontext.h [31] and mcontext.h [32])

Now we use only ucontext\_t for context representation. It is not the most efficient solution, but it is the simplest one: we do not need to convert one representation into another, and the code is much cleaner.

This change also allowed us to fix nonlocal jumps, which were broken because user space and kernel space used different context representations in the setcontext system call.

### 3.2.1 FPU context

The FPU context contains only floating-point registers. We do not use any of them in kernel code, but we want to support them in user space, so we need to take care of them during context switching. On the other hand, we do not want to save and restore the FPU context on every exception or context switch; we want to do that only when it is really needed. For each thread, we have special flags that express the state of the FPU for that thread:

- TDP\_FPUCTXSAVED FPU context was saved by ctx\_switch.
- TDP\_FPUINUSE FPU is in use and its context should be saved & restored on demand.

When a new user-space thread is spawned, its FPU is disabled. The first access to the floating-point unit triggers an exception; the kernel then enables the FPU for that thread and sets the TDP\_FPUINUSE flag.

We do not need to save the FPU context during exception handling, because usually we go back to user space with the same thread, and we have a guarantee that the kernel does not touch any FPU register. However, we may switch to another user thread. In that scenario, we need to save the FPU context of the old thread in the ctx\_switch procedure. Finally, when we go back to user space in user\_exc\_leave, we check whether TDP\_FPUCTXSAVED is set and restore the FPU context if needed.
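The flag protocol above can be summarized as a small state machine. The C sketch below models it; the thread structure, the function names and the flag values are hypothetical, and counters replace the real code that saves and restores the FPU registers. Only the two flag names and the roles of ctx\_switch and user\_exc\_leave come from the text.

```c
#include <stdbool.h>

#define TDP_FPUINUSE 0x1 /* flag names from the text, values made up */
#define TDP_FPUCTXSAVED 0x2

/* Hypothetical, heavily simplified thread descriptor. */
typedef struct toy_thread {
  unsigned pflags;  /* TDP_* flags */
  bool fpu_enabled; /* FP instructions allowed (no trap) */
  int saves;        /* counters stand in for saving q0-q31, FPCR, FPSR */
  int restores;
} toy_thread_t;

/* First FP instruction of the thread traps into the kernel. */
static void fpu_access_trap(toy_thread_t *td) {
  td->fpu_enabled = true;
  td->pflags |= TDP_FPUINUSE;
}

/* ctx_switch away from td: save only if used and not saved already. */
static void ctx_switch_out(toy_thread_t *td) {
  if ((td->pflags & TDP_FPUINUSE) && !(td->pflags & TDP_FPUCTXSAVED)) {
    td->saves++;
    td->pflags |= TDP_FPUCTXSAVED;
  }
}

/* user_exc_leave: restore only if ctx_switch saved the state. */
static void user_exc_leave_fpu(toy_thread_t *td) {
  if (td->pflags & TDP_FPUCTXSAVED) {
    td->restores++;
    td->pflags &= ~TDP_FPUCTXSAVED;
  }
}
```

The TDP\_FPUCTXSAVED check in ctx\_switch\_out matters in this model: if a thread is switched out twice before returning to user space, saving again would overwrite its context with the registers of whichever thread used the FPU in between.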

## 3.3 Pmap

In this section, I describe the most important changes in the pmap module.

The physical-mapping module (pmap) manages machine-dependent address translation and page tables that are used either directly or indirectly by the MMU.

When the kernel allocates new virtual memory for kernel space or user space, it needs to be mapped into physical memory. Without that, every access to the memory triggers a memory fault exception. Pmap manages the page tables and the TLB when needed. But memory mapping is not everything. Pmap also makes it possible to modify the access permissions of a given page. This is a very important feature of a modern virtual memory subsystem, and it is necessary for sharing memory between different processes in user space. The best-known feature that uses it is copy-on-write. When a process calls the **fork** system call, all pages are marked as read-only. When either process (parent or child) tries to write to memory, a memory fault occurs. The kernel allocates a new page and copies the content of the old page into it. Finally, pmap changes the mapping of the old virtual address to the new physical page and clears the caches and TLB entries for that address. For more details, see [11].
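The copy-on-write steps above can be illustrated with a toy model in which a physical page is a reference-counted buffer and a page table entry is a pointer plus a writable bit. All types and functions below are hypothetical and greatly simplified (no locking, no real TLB); the sketch only mirrors the sequence from the text: share read-only on fork, copy on the first write.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16 /* tiny "page" to keep the example small */

/* Hypothetical physical page with a count of mappings sharing it. */
typedef struct toy_page {
  int refcnt;
  char data[TOY_PAGE_SIZE];
} toy_page_t;

/* Hypothetical page table entry: target page plus write permission. */
typedef struct toy_pte {
  toy_page_t *pg;
  bool writable;
} toy_pte_t;

/* fork(): the child maps the same page; both mappings become read-only. */
static void cow_fork(toy_pte_t *parent, toy_pte_t *child) {
  parent->writable = false;
  *child = *parent;
  child->pg->refcnt++;
}

/* Write fault on a read-only mapping: copy if shared, then allow writes. */
static int cow_write_fault(toy_pte_t *pte) {
  toy_page_t *old = pte->pg;
  if (old->refcnt > 1) {
    toy_page_t *copy = malloc(sizeof(*copy));
    if (copy == NULL)
      return -1;
    memcpy(copy->data, old->data, TOY_PAGE_SIZE);
    copy->refcnt = 1;
    old->refcnt--;
    pte->pg = copy; /* a real pmap would also invalidate the TLB entry */
  }
  pte->writable = true; /* the sole remaining user writes in place */
  return 0;
}
```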

For information about this module see 4.3.


### 3.3.1 Access emulation

On the Malta board, we do not have hardware tracking of page accesses, so we emulate this functionality. The same situation occurs on AArch64 (see 4.3.5). The implementation of pmap\_emulate\_bits is similar for both supported architectures, so I will comment only on the AArch64 implementation. Here is the implementation for MIPS; the reader can compare both of them.

```
int pmap_emulate_bits(pmap_t *pmap, vaddr_t va, vm_prot_t prot) {
  paddr_t pa;

  WITH_MTX_LOCK (&pmap->mtx) {
    if (!pmap_extract_nolock(pmap, va, &pa))
      return EFAULT;

    pte_t pte = pmap_pte_read(pmap, va);

    if ((prot & VM_PROT_READ) && !(pte & PTE_SW_READ))
      return EACCES;

    if ((prot & VM_PROT_WRITE) && !(pte & PTE_SW_WRITE))
      return EACCES;

    if ((prot & VM_PROT_EXEC) && (pte & PTE_SW_NOEXEC))
      return EACCES;
  }

  vm_page_t *pg = vm_page_find(pa);
  assert(pg != NULL);

  WITH_MTX_LOCK (&pv_list_lock) {
    /* Kernel non-pageable memory? */
    if (TAILQ_EMPTY(&pg->pv_list))
      return EINVAL;
  }

  pmap_set_referenced(pg);
  if (prot & VM_PROT_WRITE)
    pmap_set_modified(pg);

  return 0;
}
```
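The logic of pmap\_emulate\_bits can be illustrated without any page tables. The sketch below is a user-space model under stated assumptions. `sw_page_t`, `emulate_bits` and the `SK_PROT_*` flags are hypothetical. A single record stands in both for the software PTE bits (PTE\_SW\_READ, PTE\_SW\_WRITE, PTE\_SW\_NOEXEC) and for the referenced/modified state of the vm\_page\_t. There is no locking.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical access kinds, similar in spirit to VM_PROT_*. */
enum { SK_PROT_R = 1, SK_PROT_W = 2, SK_PROT_X = 4 };

typedef struct sw_page {
  unsigned sw_prot; /* accesses the mapping is allowed to perform */
  bool referenced;  /* emulated "accessed" bit */
  bool modified;    /* emulated "dirty" bit */
} sw_page_t;

/* Called on a fault caused by an access of kind prot to pg. */
int emulate_bits(sw_page_t *pg, unsigned prot) {
  /* A real protection violation is reported, not emulated. */
  if ((prot & ~pg->sw_prot) != 0)
    return EACCES;
  /* Otherwise the fault only served to record the access. */
  pg->referenced = true;
  if (prot & SK_PROT_W)
    pg->modified = true;
  return 0;
}
```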



#### 3.3.2 Growkernel

I introduced the pmap\_growkernel function, which increases the virtual address space of the kernel. This change was needed to make KASAN work on the AArch64 architecture, but it also works on MIPS. For more information about the origin of this change see 3.4.

To increase the virtual address space of the kernel we need to know the old end of the virtual address space **①** (the vm\_kernel\_end variable), the new end **③** and the location of the kernel page table **②**. We only need to add directory entries to the kernel page table **⑤** (if they do not exist yet) and inform KASAN about the new area of memory that can be used by the kernel memory allocator **⑥**. L1\_SPACE\_SIZE **④** is the number of bytes covered by a single page directory entry. The difference between the MIPS version and the AArch64 version of this function (see 4.3.6) is related to the size of pointers (4 vs 8 bytes).

```
void pmap_growkernel(vaddr_t maxkvaddr) {
  assert(mtx_owned(&vm_kernel_end_lock));
  assert(maxkvaddr > vm_kernel_end);                 /* ① */

  pmap_t *pmap = pmap_kernel();                      /* ② */
  vaddr_t va;

  maxkvaddr = roundup2(maxkvaddr, L1_SPACE_SIZE);    /* ③ */

  WITH_MTX_LOCK (&pmap->mtx) {
    for (va = vm_kernel_end; va < maxkvaddr; va += L1_SPACE_SIZE) { /* ④ */
      if (!is_valid_pde(PDE_OF(pmap, va)))
        pmap_add_pde(pmap, va);                      /* ⑤ */
    }
  }

  /*
   * kasan_grow calls pmap_kenter which acquires pmap->mtx.
   * But we are under vm_kernel_end_lock from kmem so it is safe to call
   * kasan_grow.
   */
  kasan_grow(maxkvaddr);                             /* ⑥ */

  vm_kernel_end = maxkvaddr;
}
```

Listing 4: pmap\_growkernel MIPS (pmap.c [33])
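The arithmetic in pmap\_growkernel is easy to check in isolation. The sketch below makes two assumptions. First, vm\_kernel\_end is already aligned to L1\_SPACE\_SIZE; this holds after every earlier call, because vm\_kernel\_end is set to the rounded maxkvaddr. Second, L1\_SPACE\_SIZE is a power of two, as roundup2 requires. `ROUNDUP2` and `pdes_to_visit` are hypothetical helpers, and the 4 MiB value used in the test is only an example.

```c
#include <stdint.h>

/* Round x up to a multiple of the power of two m (the idea behind roundup2). */
#define ROUNDUP2(x, m) (((x) + ((m) - 1)) & ~((uintptr_t)(m) - 1))

/* Number of page directory slots visited by the loop in pmap_growkernel. */
unsigned pdes_to_visit(uintptr_t kernel_end, uintptr_t maxkvaddr,
                       uintptr_t l1_space_size) {
  maxkvaddr = ROUNDUP2(maxkvaddr, l1_space_size);
  unsigned n = 0;
  for (uintptr_t va = kernel_end; va < maxkvaddr; va += l1_space_size)
    n++; /* pmap_growkernel would add a PDE here if it were missing */
  return n;
}
```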

# 3.4 KASAN

KASAN is a kernel address sanitizer: a tool that allows us to detect different types of memory errors at runtime. Julian Pszczołowski is the author of the original KASAN implementation for Mimiker, which detects stack overflow, buffer overflow and use-after-free errors [13].

The main idea of an address sanitizer is to create a special contiguous memory area that describes all the memory in use. That area is called the shadow map. Each byte of memory available to the program is described by one bit of the shadow map, which tells whether it is safe to access. In practice one shadow byte describes 8 bytes of memory. Before each instruction that accesses memory, the compiler inserts a call to a special function that checks in the shadow map whether the access is valid. For more information please refer to Julian's thesis.
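The shadow-map idea can be demonstrated in user space. This is a minimal sketch, not Mimiker's kasan.c. It assumes a scale of 8, so one shadow byte describes an 8-byte granule (the same 1:8 ratio as the 4GB to 512MB figures given below). It also assumes the following encoding: 0 means the whole granule is valid, k in 1..7 means only its first k bytes are valid, and 0xff means the whole granule is invalid. The toy 256-byte memory and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE 8 /* bytes of memory described by one shadow byte */
#define MEM_SIZE 256   /* size of the toy sanitized memory */

/* One shadow byte per granule: 0 = valid, 1..7 = that many leading bytes
 * valid, 0xff = invalid (never allocated or already freed). */
static uint8_t shadow[MEM_SIZE / SHADOW_SCALE];

/* Mark [off, off + len) as valid; off must be granule-aligned. */
void shadow_mark_valid(size_t off, size_t len) {
  for (size_t i = 0; i < len / SHADOW_SCALE; i++)
    shadow[off / SHADOW_SCALE + i] = 0;
  if (len % SHADOW_SCALE)
    shadow[(off + len) / SHADOW_SCALE] = (uint8_t)(len % SHADOW_SCALE);
}

/* Mark every granule touched by [off, off + len) as invalid. */
void shadow_mark_invalid(size_t off, size_t len) {
  for (size_t i = 0; i < (len + SHADOW_SCALE - 1) / SHADOW_SCALE; i++)
    shadow[off / SHADOW_SCALE + i] = 0xff;
}

/* The check performed before a 1-byte access at offset off. */
bool shadow_check_byte(size_t off) {
  uint8_t s = shadow[off / SHADOW_SCALE];
  if (s == 0)
    return true;
  if (s == 0xff)
    return false;
  return (off % SHADOW_SCALE) < s;
}
```

Marking 13 bytes at offset 16 as valid, which resembles a 13-byte allocation, makes offsets 16..28 accessible and offset 29 an overflow. Marking the range invalid again models a free, after which any access is reported as use-after-free.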

This implementation relied on a very important assumption: that the shadow map for the whole memory can be allocated at the beginning of the kernel's life. This was justified on MIPS. Pointers there have 32 bits, so the CPU can address at most 4GB of memory, which requires a 512MB shadow map. In fact we do not need that much, and we used only 16MB for the shadow map, which is a reasonable value for the Malta board.

This assumption does not hold on AArch64. Pointers there have 64 bits, and there is no way to describe the whole address space: even a 48-bit virtual address space would need a 32TiB shadow map. So we need to split the initialization of the shadow map into two phases. The shadow map is still a contiguous area in memory, located in our case at the end of the kernel virtual address space.
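The sizes quoted above follow from the 1:8 ratio between the shadow map and the memory it describes. The sketch below only repeats that arithmetic. The 48-bit virtual address space used for the AArch64 case is an assumption that matches the 32TiB figure, and `shadow_size` is a hypothetical helper.

```c
#include <stdint.h>

#define KASAN_SCALE 8 /* one shadow byte per 8 bytes of memory */

/* Size of the shadow map needed to describe `bytes` bytes of memory. */
uint64_t shadow_size(uint64_t bytes) {
  return bytes / KASAN_SCALE;
}
```

At this ratio, the 16MB shadow map used on Malta describes 128MB of sanitized memory.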

#### 3.4.1 Kernel bootstrap

First we need to create the shadow map for the initial memory mapping, which contains the kernel code and static variables.

We need to create valid page table entries for the shadow map **②**, which is located at KASAN\_MD\_SHADOW\_START. The shadow map is filled with proper values later, during KASAN initialization. The mapping, however, has to be created here, because the general memory allocator becomes available only after KASAN is initialized. Instead, we use a primitive physical memory allocator **①**.

```
size_t kasan_sanitized_size =
  roundup2(va - KASAN_MD_SANITIZED_START,
           SUPERPAGESIZE * KASAN_SHADOW_SCALE_SIZE);
size_t kasan_shadow_size = kasan_sanitized_size / KASAN_SHADOW_SCALE_SIZE;
va = KASAN_MD_SHADOW_START;
/* Allocate physical memory for shadow area */
paddr_t pa = (paddr_t)bootmem_alloc(kasan_shadow_size); /* ① */
/* How many PDEs should we use? */
int num_pde = kasan_shadow_size / SUPERPAGESIZE;
for (int i = 0; i < num_pde; i++) {
  /* Allocate a new PT */
  pte = bootmem_alloc(PAGESIZE);
  pde[PDE_INDEX(va)] = PTE_PFN((paddr_t)pte) | PTE_KERNEL;
  for (int j = 0; j < PT_ENTRIES; j++) {
    pte[PTE_INDEX(va)] = PTE_PFN(pa) | PTE_KERNEL;      /* ② */
    va += PAGESIZE;
    pa += PAGESIZE;
  }
}
```

Listing 5: Shadow map bootstrap MIPS (boot.c [34])

#### 3.4.2 Grow kernel

We know how to create the initial shadow map. However, it does not let us use any more sanitized memory, and we do not want the kernel to use unsanitized memory while KASAN is running. From the previous section (3.3) we know how to increase the kernel virtual address space.

At the end of pmap\_growkernel, kasan\_grow is called by the pmap subsystem. It is responsible for growing the shadow map. For each missing page of the shadow map **①** we allocate a new page using abstractions from machine-independent subsystems **②**. That page is mapped by pmap into the kernel page table **③**. Finally, the memory area that was added to the kernel virtual address space is marked as invalid **④**; the kernel memory allocator will mark it as valid later. This function implements the key feature of KASAN on AArch64.

```
void kasan_grow(vaddr_t maxkvaddr) {
  assert(mtx_owned(&vm_kernel_end_lock));
  maxkvaddr = roundup2(maxkvaddr, PAGESIZE * KASAN_SHADOW_SCALE_SIZE);
  assert(maxkvaddr < KASAN_MD_MAX_SANITIZED_END);
  vaddr_t va = kasan_va_to_shadow(_kasan_sanitized_end);
  vaddr_t end = kasan_va_to_shadow(maxkvaddr);

  /* Allocate and map shadow pages to cover the new KVA space. */
  for (; va < end; va += PAGESIZE) {                             /* ① */
    vm_page_t *pg = vm_page_alloc(1);                            /* ② */
    pmap_kenter(va, pg->paddr, VM_PROT_READ | VM_PROT_WRITE, 0); /* ③ */
  }

  if (maxkvaddr > _kasan_sanitized_end) {
    kasan_mark_invalid((const void *)(_kasan_sanitized_end),     /* ④ */
                       maxkvaddr - _kasan_sanitized_end,
                       KASAN_CODE_FRESH_KVA);
    _kasan_sanitized_end = maxkvaddr;
  }
}
```

Listing 6: Increase shadow map (kasan.c [35])
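The number of shadow pages that kasan\_grow allocates can be computed the same way. This sketch assumes a shadow translation of the form shadow\_start + (va - sanitized\_start) / scale, which is the usual way an address-to-shadow function such as kasan\_va\_to\_shadow is defined. `va_to_shadow`, `shadow_pages_needed` and the constants are hypothetical, not Mimiker's definitions.

```c
#include <stdint.h>

#define SK_PAGESIZE 4096u
#define SK_SCALE 8u

/* Shadow address of va, assuming sanitized memory starts at san_start and its
 * shadow starts at shadow_start. */
uintptr_t va_to_shadow(uintptr_t va, uintptr_t san_start,
                       uintptr_t shadow_start) {
  return shadow_start + (va - san_start) / SK_SCALE;
}

/* Shadow pages needed when the sanitized end moves from old_end to new_end
 * (both aligned to SK_PAGESIZE * SK_SCALE, as kasan_grow ensures). */
unsigned shadow_pages_needed(uintptr_t old_end, uintptr_t new_end,
                             uintptr_t san_start, uintptr_t shadow_start) {
  unsigned n = 0;
  uintptr_t va = va_to_shadow(old_end, san_start, shadow_start);
  uintptr_t end = va_to_shadow(new_end, san_start, shadow_start);
  for (; va < end; va += SK_PAGESIZE)
    n++; /* kasan_grow would call vm_page_alloc and pmap_kenter here */
  return n;
}
```

Rounding maxkvaddr to PAGESIZE * KASAN\_SHADOW\_SCALE\_SIZE is what keeps the shadow range page-aligned: every PAGESIZE * 8 bytes of new kernel memory need exactly one new shadow page.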

# 3.5 Infrastructure for testing

Testing is an important part of development; without tests it is impossible to track the progress of a project such as an operating system. Unfortunately, it is not easy to run tests for a kernel. Usually we can run a program, collect its output or exit code and generate a report based on these values. This assumes, however, that the program runs on top of an operating system, whereas here it runs directly on hardware or on an emulator.

Previously we used the fact that the Malta board has multiple UARTs, and one of them was dedicated to diagnostic output for testing purposes. We used **socat** to grab the output of that UART and decide whether a test passed. The Raspberry Pi 3 does not have enough UARTs for that, so we decided to redesign our approach.

We introduced a special function, ktest\_success, which is a dead end in the kernel. We use gdb to set a breakpoint on this function. If execution stops inside it, we kill the emulator and quit gdb with exit code 0; otherwise we quit with a different exit code. In case of failure we also collect diagnostics, e.g. we print the kernel log and the stack trace of every thread.

These changes simplified our complicated logic and eliminated a race condition on one of the UARTs, which was a source of errors in our continuous integration.

# 3.6 Generic UART interface

The general terminal interface supported by Mimiker is tty. In this section I will describe the generic UART interface, created as glue between UART drivers and the machine-independent tty layer. The terminal subsystem is responsible for processing incoming characters for processes. That processing requires managing multiple queues of characters on different layers; here we will only discuss the UART layer. For more information about tty see [12].

At the beginning of my work, a big part of the tty layer lived inside the UART driver. This was understandable, since we had only one terminal exposed to user space. The new platform brings a new UART driver (see 2.2), and we want to share as much code as we can between drivers.

I proposed an abstraction over the UART: a set of functions for interacting with the hardware and a generic structure describing the current state of the UART.

#### 3.6.1 Low level interface

```
typedef uint8_t (*uart_getc_t)(void *state);
typedef bool (*uart_rx_ready_t)(void *state);
typedef void (*uart_putc_t)(void *state, uint8_t byte);
typedef bool (*uart_tx_ready_t)(void *state);
typedef void (*uart_tx_enable_t)(void *state);
typedef void (*uart_tx_disable_t)(void *state);

typedef struct uart_state {
  spin_t u_lock;
  ringbuf_t u_rx_buf; /* Software receiver queue. */
  ringbuf_t u_tx_buf; /* Software transmitter queue. */
  tty_thread_t u_ttd;
  void *u_state;      /* Private state - mostly memory and irq resources. */
} uart_state_t;
```

Listing 7: Generic low level UART interface (uart.h [36])

- uart\_getc\_t returns a single byte from the UART.
- uart\_rx\_ready\_t returns true if the receiver hardware queue has data ready to be read.
- uart\_putc\_t puts a byte into the UART.
- uart\_tx\_ready\_t returns true if the transmitter hardware queue can accept another byte.
- uart\_tx\_enable\_t enables the transmitter interrupt.
- uart\_tx\_disable\_t disables the transmitter interrupt.
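To make the interface concrete, here is a sketch of a backend for an imaginary UART whose receive and transmit FIFOs are simulated by small arrays. Everything except the four callback types is hypothetical: `fake_uart_t`, the FIFO depth, and the `uart_ops_t` table that groups the callbacks. In a real driver the private state would hold memory and IRQ resources instead; see 4.8.3 for a real implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback types from Listing 7. */
typedef uint8_t (*uart_getc_t)(void *state);
typedef bool (*uart_rx_ready_t)(void *state);
typedef void (*uart_putc_t)(void *state, uint8_t byte);
typedef bool (*uart_tx_ready_t)(void *state);

/* Hypothetical grouping of one driver's callbacks. */
typedef struct uart_ops {
  uart_getc_t getc;
  uart_rx_ready_t rx_ready;
  uart_putc_t putc;
  uart_tx_ready_t tx_ready;
} uart_ops_t;

#define FAKE_FIFO 4

/* Simulated device FIFOs. */
typedef struct fake_uart {
  uint8_t rx[FAKE_FIFO];
  unsigned rx_count; /* bytes waiting to be read */
  uint8_t tx[FAKE_FIFO];
  unsigned tx_count; /* bytes waiting to be sent */
} fake_uart_t;

static bool fake_rx_ready(void *state) {
  return ((fake_uart_t *)state)->rx_count > 0;
}

/* Precondition: fake_rx_ready(state) is true. */
static uint8_t fake_getc(void *state) {
  fake_uart_t *u = state;
  uint8_t byte = u->rx[0];
  for (unsigned i = 1; i < u->rx_count; i++) /* shift the FIFO */
    u->rx[i - 1] = u->rx[i];
  u->rx_count--;
  return byte;
}

static bool fake_tx_ready(void *state) {
  return ((fake_uart_t *)state)->tx_count < FAKE_FIFO;
}

/* Precondition: fake_tx_ready(state) is true. */
static void fake_putc(void *state, uint8_t byte) {
  fake_uart_t *u = state;
  u->tx[u->tx_count++] = byte;
}

const uart_ops_t fake_uart_ops = {
  .getc = fake_getc,
  .rx_ready = fake_rx_ready,
  .putc = fake_putc,
  .tx_ready = fake_tx_ready,
};
```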

This low-level interface can be used to implement a generic UART interrupt handler and a generic worker for the TTY subsystem. For an example implementation of these methods see 4.8.3.

#### 3.6.2 Generic interrupt handler

The generic interrupt handler responds to two kinds of events:

- a new character has been put into the hardware queue (TTY\_THREAD\_RXRDY);
- the device can accept a new character to send (TTY\_THREAD\_TXRDY).

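If a new character has been put into the hardware queue **①**, we put it into the software receive queue **②** and notify the tty subsystem **③**. If the device can accept a new character **④**, we send as many characters from the software transmit queue as we can **⑤**. Finally, if the software transmit queue becomes empty, we ask the tty thread to refill it **⑥** (if the tty output queue still holds characters). We also disable the transmitter interrupt until the queue is refilled.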

```
intr_filter_t uart_intr(void *data /* device_t* */) {
  device_t *dev = data;
  uart_state_t *uart = dev->state;
  tty_thread_t *ttd = &uart->u_ttd;
  intr_filter_t res = IF_STRAY;

  WITH_SPIN_LOCK (&uart->u_lock) {
    /* data ready to be received? */
    if (uart_rx_ready(dev)) {                                /* ① */
      (void)ringbuf_putb(&uart->u_rx_buf, uart_getc(dev));   /* ② */
      ttd->ttd_flags |= TTY_THREAD_RXRDY;                    /* ③ */
      cv_signal(&ttd->ttd_cv);
      res = IF_FILTERED;
    }

    /* transmit register empty? */
    if (uart_tx_ready(dev)) {
      uint8_t byte;
      while (uart_tx_ready(dev) && ringbuf_getb(&uart->u_tx_buf, &byte)) /* ④ */
        uart_putc(dev, byte);                                /* ⑤ */
      if (ringbuf_empty(&uart->u_tx_buf)) {
        /* If we're out of characters and there are characters
         * in the tty's output queue, signal the tty thread to refill. */
        if (ttd->ttd_flags & TTY_THREAD_OUTQ_NONEMPTY) {
          ttd->ttd_flags |= TTY_THREAD_TXRDY;                /* ⑥ */
          cv_signal(&ttd->ttd_cv);
        }
        /* Disable TXRDY interrupts - the tty thread will re-enable them
         * after filling tx_buf. */
        uart_tx_disable(dev);
      }
      res = IF_FILTERED;
    }
  }

  return res;
}
```
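Both software queues used above (u\_rx\_buf and u\_tx\_buf) are byte ring buffers. A minimal fixed-size ring buffer with the same put, get and empty operations could look like the sketch below. The `rb_*` names and the capacity are hypothetical; this is not Mimiker's ringbuf\_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 8 /* capacity in bytes (illustrative) */

typedef struct rb {
  uint8_t data[RB_SIZE];
  size_t head;  /* index of the oldest byte */
  size_t count; /* number of stored bytes */
} rb_t;

bool rb_empty(const rb_t *rb) {
  return rb->count == 0;
}

/* Append a byte; returns false (the byte is dropped) if the buffer is full. */
bool rb_putb(rb_t *rb, uint8_t byte) {
  if (rb->count == RB_SIZE)
    return false;
  rb->data[(rb->head + rb->count) % RB_SIZE] = byte;
  rb->count++;
  return true;
}

/* Remove the oldest byte; returns false if the buffer is empty. */
bool rb_getb(rb_t *rb, uint8_t *byte) {
  if (rb->count == 0)
    return false;
  *byte = rb->data[rb->head];
  rb->head = (rb->head + 1) % RB_SIZE;
  rb->count--;
  return true;
}
```

The interrupt handler discards the result of ringbuf\_putb with (void), so a character that does not fit into a full receive queue is lost. In this sketch that case corresponds to rb\_putb returning false.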



#### 3.6.3 UART-TTY thread

The role of the UART-TTY thread is very similar to the role of the UART interrupt handler, but on the upper layer. Once we are signalled **①** by the UART interrupt handler, we check what needs to be done. If there are received characters **②**, we move **③** them from the receive queue into the upper layer of the terminal subsystem. If the transmitter can accept more data **④**, we move characters from the terminal's output queue to the outgoing buffer **⑤**.

```
static void uart_tty_thread(void *arg) {
  device_t *dev = arg;
  uart_state_t *uart = dev->state;
  tty_thread_t *ttd = &uart->u_ttd;
  tty_t *tty = ttd->ttd_tty;
  uint8_t work, byte;

  while (true) {
    WITH_SPIN_LOCK (&uart->u_lock) {
      /* Sleep until there's work for us to do. */
      while ((work = ttd->ttd_flags & TTY_THREAD_WORK_MASK) == 0)
        cv_wait(&ttd->ttd_cv, &uart->u_lock);       /* ① */
      ttd->ttd_flags &= ~TTY_THREAD_WORK_MASK;
    }
    WITH_MTX_LOCK (&tty->t_lock) {
      if (work & TTY_THREAD_RXRDY) {                /* ② */
        /* Move characters from rx_buf into the tty's input queue. */
        while (uart_getb_lock(uart, &byte))
          if (!tty_input(tty, byte))                /* ③ */
            klog("dropped character %hhx", byte);
      }
      if (work & TTY_THREAD_TXRDY)                  /* ④ */
        uart_tty_fill_txbuf(dev);                   /* ⑤ */
    }
  }
}
```
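The sleep loop in uart\_tty\_thread follows a common pattern. The interrupt handler sets work bits, and the thread takes all pending bits and clears them in one step while holding the lock, so no request is lost or handled twice. The bit manipulation alone can be sketched as follows. The flag values and `take_work` are hypothetical. Treating the "output queue non-empty" flag as a state bit outside the work mask is an assumption based on how the interrupt handler uses it. In the real code this step runs under uart->u\_lock.

```c
#include <stdint.h>

/* Hypothetical flag values; Mimiker defines its own TTY_THREAD_* constants. */
#define SK_RXRDY 0x1u
#define SK_TXRDY 0x2u
#define SK_OUTQ_NONEMPTY 0x4u /* state bit, not a work request */
#define SK_WORK_MASK (SK_RXRDY | SK_TXRDY)

/* Take all pending work bits and clear them; other bits are left intact. */
uint32_t take_work(uint32_t *flags) {
  uint32_t work = *flags & SK_WORK_MASK;
  *flags &= ~SK_WORK_MASK;
  return work;
}
```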

Listing 9: UART TTY thread (uart\_tty.c [38])
## Chapter 4

# Mimiker on AArch64

In this chapter we will focus on changes in Mimiker related to AArch64 and Raspberry Pi 3.

It covers most of the AArch64 and Raspberry Pi 3 specific code in Mimiker. I will go through the interaction between kernel space and user space, the memory mapping module, the machine-dependent part of the kernel address sanitizer, the first instructions of the kernel, exception handling and finally a basic set of drivers.

The reader is encouraged to follow this chapter with the source code viewer available at [16].

These changes were first developed for QEMU emulator and then adjusted to physical hardware.

All assembly code presented in this chapter uses the ARMv8 instruction set [2] with the following calling convention:

- x31: stack pointer;
- x30: link register;
- x19–x28: callee-saved registers;
- x9–x18: caller-saved registers;
- x8: indirect return value address;
- x0–x7: function arguments and their results.

### 4.1 How to run the emulator?

A special toolchain is required to build Mimiker. Its most significant part is GCC (the GNU Compiler Collection), version 11 at the time of writing. Building the toolchain is fully automated by make scripts in the Mimiker repository [17]. For the full list of required packages see the Dockerfile in the root directory of the project; it describes the image used by the CI system for automatic tests. More detailed instructions are available in 5.1. There is also a development environment accessible over ssh; for access, please contact the administrators of the Mimiker website [16]. If you have an account on the development machine, you can skip the installation of the toolchain.

The simplest build command is: make BOARD=rpi3

It produces a ramdisk with user-space programs (initrd.cpio), a sysroot directory with debugging symbols for each user-space binary, sys/mimiker.elf (an ELF file with the kernel) and sys/mimiker.img (the final kernel image).

To run an interactive session, use:

./launch --board=rpi3 -d init=/bin/ksh

This command runs the tmux terminal multiplexer with three windows:

1. QEMU logs;
2. the Mimiker console (our equivalent of a TTY) running ksh;
3. a gdb session.

By default, execution stops at the first instruction of the machine-independent part of kernel initialization, and the continue command has to be issued manually in gdb.

#### 4.2 Interaction with user-space

In modern general-purpose operating systems, a kernel without user programs is useless. In this section we will focus on the machine-dependent part of the communication between the kernel and user programs.

#### 4.2.1 Copy

The copy functions are designed to copy contiguous data from one address to another, like memcpy, but they are safe to use for copying data between kernel space and user space: reading from or writing to an unmapped address will not cause a kernel panic.

copyin copies len bytes of data from the user-space address uaddr to the kernel-space address kaddr.

int copyin(const void \*uaddr, void \*kaddr, size\_t len);

copyout copies len bytes of data from the kernel-space address kaddr to the user-space address uaddr.

int copyout(const void \*kaddr, void \*uaddr, size\_t len);

copyinstr copies a NULL-terminated string, at most len bytes long, from the user-space address uaddr to the kernel-space address kaddr. The number of bytes actually copied, including the terminating NULL, is returned in done (if done is not NULL).


int copyinstr(const void \*uaddr, void \*kaddr, size\_t len, size\_t \*done);

copystr copies a NULL-terminated string, at most len bytes long, from the kernel-space address kfaddr to the kernel-space address kdaddr. The number of bytes actually copied, including the terminating NULL, is returned in done (if done is not NULL).

int copystr(const void \*kfaddr, void \*kdaddr, size\_t len, size\_t \*done);

The copy functions return 0 on success. If a bad address is encountered they return EFAULT.
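The contract of copystr (a string copy with a length limit and an optional done counter) can be written in plain C. This is a sketch, not Mimiker's implementation. Kernel-space addresses are ordinary pointers here, and `sketch_copystr` is a hypothetical name. The ENAMETOOLONG result for a string that does not fit in len bytes is an assumption borrowed from the NetBSD copy(9) convention, since the text above only specifies EFAULT.

```c
#include <errno.h>
#include <stddef.h>

/* Copy a NUL-terminated string of at most len bytes (terminator included). */
int sketch_copystr(const void *kfaddr, void *kdaddr, size_t len, size_t *done) {
  const char *src = kfaddr;
  char *dst = kdaddr;
  size_t i;
  for (i = 0; i < len; i++) {
    dst[i] = src[i];
    if (src[i] == '\0') {
      if (done != NULL)
        *done = i + 1; /* the count includes the terminating NUL */
      return 0;
    }
  }
  if (done != NULL)
    *done = i; /* len bytes copied, but no terminator found */
  return ENAMETOOLONG;
}
```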

All of the copy functions need to do the following:

- validate arguments;
- set a flag that tells the kernel what to do in case of a fault;
- copy data.

The first step is simple: the kernel needs to check that uaddr lies in user space, which is a contiguous range between USER\_SPACE\_START and USER\_SPACE\_END.

For the second step there is a special field in the thread structure, td\_onfault. On a fault the kernel checks whether td\_onfault is set. If it is, the kernel jumps to the special copyerr routine, which returns EFAULT from the copy function. This is needed because the kernel has no control over the arguments passed from user space. Consider read:

ssize\_t read(int fd, void \*buf, size\_t count);

It is a common mistake to pass a wrong pointer as a syscall argument, e.g.:

```
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char *read_data(FILE *fp, size_t count) {
  char *buf = malloc(count); /* ① result is not checked, may be NULL */
  read(fileno(fp), buf, count);
  return buf;
}
```

Listing 10: unchecked malloc result

It is possible that malloc returns NULL **①**. In that case the kernel triggers a memory fault while copying data to user space, yet the kernel must not fail because of an invalid user pointer. This is why we set a special flag to handle memory errors in the functions that copy data between kernel space and user space.

For the third step we use the generic **bcopy** implementation.

Let us look at the final implementation of copyin. We check whether the user-space pointer points to user space **①**. Then we set a flag in the thread structure **②**, which indicates that the thread is inside a copy function. Finally bcopy [24] is called **③**. On success we return 0 directly from copyin **④**.


```
ENTRY(copyin)
        sub     sp, sp, #16
        str     lr, [sp]

        # len > 0
        cbz     x2, 1f                      /* ① */

        # (uintptr_t)us < (uintptr_t)(us + len)
        add     x3, x0, x2
        cmp     x0, x3
        b.hi    copyerr

        # (uintptr_t)(us + len) <= USER_SPACE_END
        ldr     x4, =USER_SPACE_END
        cmp     x3, x4
        b.hi    copyerr

        set_onfault x3, x4, copyerr         /* ② */
        bl      bcopy                       /* ③ */
        clr_onfault x3

1:      mov     x0, xzr                     /* ④ */
        ldr     lr, [sp]
        add     sp, sp, #16
        ret
END(copyin)
```

Listing 11: copyin (copy.S [39])
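The three steps of copyin can also be expressed in C. The sketch below is a user-space model, not kernel code. SK\_USER\_SPACE\_START/END get made-up values, user memory is a small array, and a page fault is simulated by `load_user_byte`. That function jumps through a jmp\_buf that stands in for td\_onfault, which is the role set\_onfault and copyerr play in the assembly. setjmp and longjmp are the standard C functions; using them to imitate a fault handler is only an analogy. Unlike the listing, the sketch also checks the lower bound USER\_SPACE\_START mentioned in the text.

```c
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up user address range for the model. */
#define SK_USER_SPACE_START 0x1000u
#define SK_USER_SPACE_END   0x2000u
/* Only the first half of the user range is "mapped". */
#define SK_MAPPED_END       0x1800u

static uint8_t user_mem[SK_USER_SPACE_END - SK_USER_SPACE_START];
static jmp_buf *onfault; /* stands in for td->td_onfault */

/* Simulated user memory load that "faults" on unmapped addresses. */
static uint8_t load_user_byte(uintptr_t uaddr) {
  if (uaddr >= SK_MAPPED_END)
    longjmp(*onfault, 1); /* the fault handler would jump to copyerr */
  return user_mem[uaddr - SK_USER_SPACE_START];
}

int sketch_copyin(uintptr_t uaddr, void *kaddr, size_t len) {
  if (len == 0)
    return 0;
  /* Step 1: validate the range, including wrap-around of uaddr + len. */
  if (uaddr < SK_USER_SPACE_START || uaddr + len < uaddr ||
      uaddr + len > SK_USER_SPACE_END)
    return EFAULT;
  /* Step 2: arm the fault handler (set_onfault). */
  jmp_buf env;
  if (setjmp(env) != 0) {
    onfault = NULL; /* clr_onfault */
    return EFAULT;  /* what copyerr returns */
  }
  onfault = &env;
  /* Step 3: copy the data. */
  uint8_t *dst = kaddr;
  for (size_t i = 0; i < len; i++)
    dst[i] = load_user_byte(uaddr + i);
  onfault = NULL;
  return 0;
}
```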

#### 4.2.2 Syscall handler

A system call (syscall) is the way in which a user program communicates with the kernel. From the point of view of a user-space programmer, most syscalls are normal functions provided by *libc*. These functions generate exceptions in order to switch the context from user space to kernel space, but they also have to pass their arguments. Normal functions have their own calling convention, but the convention for system calls can be different.

**user-space** In *Mimiker* (on *AArch64*) we have the following convention:

- each syscall can have at most six arguments;
- arguments are passed via registers from x0 to x5;
- the syscall number is encoded in the exception (as the immediate operand of the svc instruction);

so from the user-space side it is very simple: we only call assembly code that generates the exception **①**. At the end of the syscall wrapper we also need to update the errno variable **②** according to the C standard; this is the responsibility of \_sc\_error.

Listing 12: syscall entry from user-space (syscall.h [40])
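The errno handling described above can be sketched in C. The helper below is hypothetical. It assumes the wrapper receives both a return value and an error code, as syscall\_handler produces them in syscall\_result\_t. It shows only the conversion to the usual "-1 and errno" convention attributed to \_sc\_error, not the assembly that raises the exception.

```c
#include <errno.h>

/* Hypothetical: convert the kernel's (retval, error) pair for the caller. */
long sc_finish(long retval, int error) {
  if (error != 0) {
    errno = error; /* the part attributed to _sc_error */
    return -1;
  }
  return retval;
}
```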

**kernel-space** In kernel space we need a trampoline from the machine-dependent exception code to the machine-independent implementation of a specific syscall. We have the following generic syscall definition:

```
typedef int syscall_t(proc_t *p, void *args, register_t *result);
typedef struct sysent {
    int nargs;    /* number of args passed to syscall */
    syscall_t *call; /* syscall implementation */
    } sysent_t;
```

Listing 13: system call entry (sysent.h [41])

For each system call, args is cast to a syscall-specific argument type, e.g.:

```
#define SYSCALLARG(x) union { register_t _pad; x arg; }   /* ② */

typedef struct {
  SYSCALLARG(int) fd;
  SYSCALLARG(void *) buf;
  SYSCALLARG(size_t) nbyte;
} read_args_t;

static int sys_read(proc_t *p, read_args_t *args, register_t *res) {
  int fd = SCARG(args, fd);                                /* ① */
  void *u_buf = SCARG(args, buf);
  size_t nbyte = SCARG(args, nbyte);

  int error;
  uio_t uio = UIO_SINGLE_USER(UIO_READ, 0, u_buf, nbyte);
  if ((error = do_read(p, fd, &uio)))
    return error;
  *res = nbyte - uio.uio_resid;
  return 0;
}
```

Listing 14: read system call (syscalls.c [42])

It is very important for us that each argument has the size of a register. For that purpose we use the combination of SCARG **①** and SYSCALLARG **②**. This keeps the trampoline code simple.
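The effect of SYSCALLARG can be checked with a static assertion. In this sketch `reg_t` stands in for register\_t and is assumed to be a 64-bit integer, as on AArch64. SCARG is given the usual BSD-style definition ((p)->name.arg), which is an assumption since its definition does not appear in the listings. Reading an int or a pointer from a register-sized slot relies on a little-endian layout, as on typical AArch64 systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t reg_t; /* stand-in for register_t on a 64-bit machine */

#define SYSCALLARG(x) union { reg_t _pad; x arg; }
#define SCARG(p, name) ((p)->name.arg) /* assumed definition */

typedef struct {
  SYSCALLARG(int) fd;
  SYSCALLARG(void *) buf;
  SYSCALLARG(size_t) nbyte;
} read_args_sk_t;

/* Every field occupies exactly one register slot ... */
static_assert(sizeof(read_args_sk_t) == 3 * sizeof(reg_t),
              "one register per argument");

/* ... so the structure can be filled with one memcpy from saved registers. */
void load_read_args(read_args_sk_t *args, const reg_t regs[3]) {
  memcpy(args, regs, sizeof(*args));
}
```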

**trampoline** We can divide the syscall trampoline into the following steps:

- copy arguments from the user context into the args structure **①**;
- find the sysent\_t of the called syscall **②**;
- call the syscall function **③**;
- jump to the exception return code.

To copy the arguments, we use a buffer that stores the subsequent arguments of the args structure **①**. This is why every argument has the size of a register.

```
static void syscall_handler(register_t code, ctx_t *ctx,
                            syscall_result_t *result) {
  register_t args[SYS_MAXSYSARGS];
  const int nregs = 6;

  memcpy(args, &_REG(ctx, X0), nregs * sizeof(register_t)); /* ① */

  if (code > SYS_MAXSYSCALL) {
    args[0] = code;
    code = 0;
  }

  sysent_t *se = &sysent[code];                             /* ② */
  size_t nargs = se->nargs;

  thread_t *td = thread_self();
  register_t retval = 0;

  int error = se->call(td->td_proc, (void *)args, &retval); /* ③ */

  result->retval = error ? -1 : retval;
  result->error = error;
}
```

Listing 15: system call trampoline (trap.c [43])
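The dispatch part of the trampoline does not depend on the architecture and can be exercised in user space. The sketch below is a toy with a two-entry table and a hypothetical sys\_add call. As in the listing, unknown numbers are redirected to entry 0 with the original number in args[0]. Here entry 0 returns ENOSYS, which is an assumption made for the example. The bound check compares against the number of table entries.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef int64_t reg_t;
typedef int syscall_fn_t(void *args, reg_t *result);

typedef struct {
  int nargs;
  syscall_fn_t *call;
} sysent_sk_t;

/* Entry 0: receives unknown syscall numbers in args[0]. */
static int sys_nosys(void *args, reg_t *result) {
  (void)args;
  (void)result;
  return ENOSYS;
}

/* Hypothetical syscall returning the sum of its two arguments. */
static int sys_add(void *args, reg_t *result) {
  reg_t *a = args;
  *result = a[0] + a[1];
  return 0;
}

static const sysent_sk_t sysent_sk[] = {
  {1, sys_nosys},
  {2, sys_add},
};
#define SK_NSYSCALLS (sizeof(sysent_sk) / sizeof(sysent_sk[0]))
#define SK_NREGS 6

typedef struct {
  reg_t retval;
  int error;
} result_sk_t;

/* regs plays the role of the saved x0..x5 of the exception context. */
result_sk_t dispatch(reg_t code, const reg_t regs[SK_NREGS]) {
  reg_t args[SK_NREGS];
  memcpy(args, regs, sizeof(args));
  if (code < 0 || (uint64_t)code >= SK_NSYSCALLS) {
    args[0] = code;
    code = 0;
  }
  reg_t retval = 0;
  int error = sysent_sk[code].call(args, &retval);
  result_sk_t res = { error ? -1 : retval, error };
  return res;
}
```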

#### 4.2.3 crt0

C standard [1] says that:

- The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters, or with two parameters, or in some other implementation-defined manner.
- If they are declared, the parameters to the main function shall obey the following constraints:
  - The value of argc (the first argument) shall be nonnegative.
  - argv[argc] (argv is the second argument) shall be a null pointer.
  - If the value of *argc* is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.
  - If the value of argc is greater than zero, the string pointed to by argv[0] represents the *program name*. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc - 1] represent the program parameters.
  - The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

But many things have to happen before main, e.g. calling functions marked with \_\_attribute\_\_((constructor)) or initializing *libc*. So the real entry point of a program is \_start, which in our case comes from crt0.S.
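Code that runs before main is easy to observe with the constructor attribute supported by GCC and Clang. In this sketch a constructor sets a flag, which is already set when main starts. The names `early_init` and `early_init_done` are arbitrary.

```c
/* Set before main runs: the C runtime start-up code calls constructors. */
int early_init_done = 0;

__attribute__((constructor)) static void early_init(void) {
  early_init_done = 1;
}
```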

To understand the implementation of \_start we need to know the layout of the initial process stack. *Mimiker* puts program arguments and environment variables onto the initial stack of the process. Let us keep the previous variable names, and additionally let envp be the table of environment variables, m the number of environment variables and n the number of *program arguments*.

| Contents      | Description                                                                  |
|---------------|------------------------------------------------------------------------------|
|               | stack segment high address                                                   |
| envp[m - 1]   |                                                                              |
| ⋮             | each of envp[i] is a NULL-terminated string                                  |
| envp[1]       |                                                                              |
| envp[0]       |                                                                              |
| argv[n - 1]   |                                                                              |
| ⋮             | each of argv[i] is a NULL-terminated string                                  |
| argv[1]       |                                                                              |
| argv[0]       |                                                                              |
| envp          | NULL-terminated environment vector storing pointers to envp[0], …, envp[m - 1] |
| argv          | NULL-terminated argument vector storing pointers to argv[0], …, argv[n - 1]    |
| argc          | a single uint32 declaring the number of arguments                            |
| program stack |                                                                              |
| ⋮             |                                                                              |
|               | stack segment low address                                                    |



```
ENTRY(_start)
        # Grab argc from stack.
        ldr     w0, [sp, #0]

        # Prepare argv.
        add     x1, sp, #8

        # Prepare envp, it starts at argv + argc + 1
        lsl     x2, x0, #3
        add     x2, x1, x2
        add     x2, x2, #8

        # Jump to ___start in crt0-common.c
        # void ___start(int argc, char **argv, char **envp)
        b       ___start
END(_start)
```

Listing 16: crt0 (crt0.S [44])
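The pointer arithmetic done by \_start can be mirrored in C on a hand-built stack image. This sketch assumes 8-byte stack slots, as on AArch64 (uintptr_t has that size on a 64-bit host). Slots hold pointers stored as integers. `start_args_t` and `parse_initial_stack` are hypothetical names.

```c
#include <stdint.h>

/* What _start hands over to ___start. */
typedef struct start_args {
  int argc;
  uintptr_t *argv; /* argv[i] holds a char * stored as an integer */
  uintptr_t *envp;
} start_args_t;

/* sp points at the argc slot of the initial stack. */
start_args_t parse_initial_stack(uintptr_t *sp) {
  start_args_t a;
  a.argc = (int)(uint32_t)sp[0]; /* ldr w0, [sp, #0] */
  a.argv = sp + 1;               /* add x1, sp, #8 */
  a.envp = a.argv + a.argc + 1;  /* lsl/add/add: skip argv[] and its NULL */
  return a;
}
```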

The rest of the program initialization happens in the machine-independent \_\_\_start function provided by *libc*.

#### 4.2.4 Signals


Signals [8] are a form of inter-process communication used in Unix-like operating systems. A signal is an asynchronous notification sent to a process or to a thread. When a signal is delivered, the kernel interrupts the execution of the process and runs the signal handler (if one is set). A signal can be sent directly by the kernel, e.g. as a result of a memory fault, or by another process using the kill system call [9]. A user-space program can be interrupted by a signal at almost any time.

static void sighandler(int sig, ksiginfo\_t \*info, ucontext\_t \*ctx);

The machine-dependent part of signal handling is calling the signal handler. In Mimiker we copy the following data onto the user thread's stack:

- the **sigcode** procedure: a special piece of code that calls the **sigreturn** system call to go back from the signal handler to the context of the interrupted thread;
- struct ksiginfo: information about the signal for the user;
- struct ucontext: the context of the interrupted thread.

The sigcode procedure is needed because the signal handler can clobber any register, so we cannot simply return from it. We need to go back to the kernel to restore the context of the thread. Our solution is to copy onto the stack a simple procedure that calls a special syscall **①**, and to set the return address of the signal handler to that procedure **③** (Listing 18). In this way we go back to the kernel and restore the context.

```
ENTRY(sigcode)
        mov     x0, sp          /* address of ucontext_t to restore */
        svc     #SYS_sigreturn  /* ① */
        brk     #0
EXPORT(esigcode)
END(sigcode)
```

Listing 17: sigcode.S (sigcode.S [45])

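We also need to allocate memory for these data on the user stack **①** and pass them to the handler in registers according to the ABI **②**. x0 holds the signal number, x1 the address of the ksiginfo and x2 the address of the ucontext. ELR is set to the handler and LR to sigcode **③**.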

```
int sig_send(signo_t sig, sigset_t *mask, sigaction_t *sa,
             ksiginfo_t *ksi) {
  thread_t *td = thread_self();
  mcontext_t *uctx = td->td_uctx;

  ucontext_t uc;
  mcontext_copy(&uc.uc_mcontext, uctx);
  uc.uc_sigmask = *mask;

  register_t sc_code = sig_stack_push(uctx, sigcode, esigcode - sigcode); /* ① */
  register_t sc_info = sig_stack_push(uctx, ksi, sizeof(ksiginfo_t));
  register_t sc_uctx = sig_stack_push(uctx, &uc, sizeof(ucontext_t));

  _REG(uctx, ELR) = (register_t)sa->sa_handler;  /* ② */
  _REG(uctx, X0) = sig;
  _REG(uctx, X1) = sc_info;
  _REG(uctx, X2) = sc_uctx;
  _REG(uctx, LR) = sc_code;                      /* ③ */

  return 0;
}
```

Listing 18: signal.c (signal.c [46])
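sig\_stack\_push itself is not shown. One plausible shape is sketched below on a simulated stack. It assumes the function moves the user stack pointer down by the size of the object, keeps the 16-byte alignment that AArch64 requires for sp, copies the object there and returns its address. `sketch_stack_push` and `user_stack_t` are hypothetical, and a real kernel would use copyout rather than memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_ALIGN 16 /* AArch64 requires a 16-byte aligned sp */

/* Simulated user stack: sp is an offset into mem and grows downwards. */
typedef struct user_stack {
  _Alignas(STACK_ALIGN) uint8_t mem[256];
  size_t sp;
} user_stack_t;

/* Push size bytes of obj and return the offset of the copy.
 * No overflow check in this sketch: size must not exceed sp. */
size_t sketch_stack_push(user_stack_t *st, const void *obj, size_t size) {
  st->sp -= size;
  st->sp &= ~(size_t)(STACK_ALIGN - 1); /* round down to keep alignment */
  memcpy(&st->mem[st->sp], obj, size);
  return st->sp;
}
```

Because the ucontext is pushed last in sig\_send, it ends up exactly at the new stack pointer. This is why sigcode can pass sp as the address of the ucontext\_t to restore.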

#### 4.2.5 Nonlocal goto

In this section I describe the AArch64 implementation of nonlocal jumps.

As on MIPS, there are six different functions related to nonlocal jumps.

\_setjmp saves the context of the running thread into a jump buffer. It needs to save the callee-saved registers x19, x20, x21, x22, x23, x24, x25, x26, x27, x28 and x29 ①, the link register lr ② and the stack pointer sp ③. We also need to save all floating point registers ⑤ and set flags ④ telling the kernel that this context contains valid CPU and FPU registers.
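From a caller's point of view, these functions behave like the standard setjmp/longjmp pair from <setjmp.h>. A later longjmp makes the earlier setjmp call return a second time with a nonzero value, with the saved stack pointer and callee-saved registers restored. The example below uses the standard functions, not Mimiker's \_setjmp buffer layout. `fail`, `parse_digit` and `digit_sum` are illustrative names.

```c
#include <setjmp.h>

static jmp_buf env;

/* Abandon the computation and resume at the setjmp point. */
static void fail(void) {
  longjmp(env, 1); /* setjmp below "returns" again, this time with 1 */
}

static int parse_digit(char c) {
  if (c < '0' || c > '9')
    fail();
  return c - '0';
}

/* Sum the digits of s; returns -1 if a non-digit is found. */
int digit_sum(const char *s) {
  if (setjmp(env) != 0)
    return -1; /* reached via longjmp from fail() */
  int sum = 0;
  for (; *s != '\0'; s++)
    sum += parse_digit(*s);
  return sum;
}
```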


```
ENTRY(_setjmp)
    /* Copy saved registers */
    str     x19, [x0, UC_REGS_X19]              /* ① */
    stp     x20, x21, [x0, UC_REGS_X20]
    stp     x22, x23, [x0, UC_REGS_X22]
    stp     x24, x25, [x0, UC_REGS_X24]
    stp     x26, x27, [x0, UC_REGS_X26]
    stp     x28, x29, [x0, UC_REGS_X28]
    str     lr, [x0, UC_REGS_LR]                /* ② */

    /* sp register is special */
    mov     x1, sp
    str     x1, [x0, UC_REGS_SP]                /* ③ */

    /* flags */
    ldr     x1, [x0, UC_FLAGS]
    orr     x1, x1, (_UC_FPU | _UC_CPU)
    str     x1, [x0, UC_FLAGS]                  /* ④ */

    /* FPU status */
    mrs     x1, fpcr                            /* ⑤ */
    str     w1, [x0, UC_FPREGS_FPCR]
    mrs     x1, fpsr
    str     w1, [x0, UC_FPREGS_FPSR]

    stp     q0, q1, [x0, UC_FPREGS_Q0]
    stp     q2, q3, [x0, UC_FPREGS_Q2]
    stp     q4, q5, [x0, UC_FPREGS_Q4]
    stp     q6, q7, [x0, UC_FPREGS_Q6]
    stp     q8, q9, [x0, UC_FPREGS_Q8]
    stp     q10, q11, [x0, UC_FPREGS_Q10]
    stp     q12, q13, [x0, UC_FPREGS_Q12]
    stp     q14, q15, [x0, UC_FPREGS_Q14]
    stp     q16, q17, [x0, UC_FPREGS_Q16]
    stp     q18, q19, [x0, UC_FPREGS_Q18]
    stp     q20, q21, [x0, UC_FPREGS_Q20]
    stp     q22, q23, [x0, UC_FPREGS_Q22]
    stp     q24, q25, [x0, UC_FPREGS_Q24]
    stp     q26, q27, [x0, UC_FPREGS_Q26]
    stp     q28, q29, [x0, UC_FPREGS_Q28]
    stp     q30, q31, [x0, UC_FPREGS_Q30]

    mov     x0, xzr
    ret
END(_setjmp)
```

Listing 19: \_setjmp (\_setjmp.S [47])

\_longjmp - the counterpart of \_setjmp. It restores every register written by \_setjmp and makes the original \_setjmp call return the value passed as its second argument (x1).

```
ENTRY(_longjmp)
    ldr     x19, [x0, UC_REGS_SP]
    mov     sp, x19
    ldr     x19, [x0, UC_REGS_X19]
    ldp     x20, x21, [x0, UC_REGS_X20]
    ldp     x22, x23, [x0, UC_REGS_X22]
    ldp     x24, x25, [x0, UC_REGS_X24]
    ldp     x26, x27, [x0, UC_REGS_X26]
    ldp     x28, x29, [x0, UC_REGS_X28]
    ldr     lr, [x0, UC_REGS_LR]

    /* FPU status */
    ldr     w2, [x0, UC_FPREGS_FPCR]
    /* msr ignores upper 32 bits for fpcr & fpsr */
    msr     fpcr, x2
    ldr     w2, [x0, UC_FPREGS_FPSR]
    msr     fpsr, x2

    ldp     q0, q1, [x0, UC_FPREGS_Q0]
    ldp     q2, q3, [x0, UC_FPREGS_Q2]
    ldp     q4, q5, [x0, UC_FPREGS_Q4]
    ldp     q6, q7, [x0, UC_FPREGS_Q6]
    ldp     q8, q9, [x0, UC_FPREGS_Q8]
    ldp     q10, q11, [x0, UC_FPREGS_Q10]
    ldp     q12, q13, [x0, UC_FPREGS_Q12]
    ldp     q14, q15, [x0, UC_FPREGS_Q14]
    ldp     q16, q17, [x0, UC_FPREGS_Q16]
    ldp     q18, q19, [x0, UC_FPREGS_Q18]
    ldp     q20, q21, [x0, UC_FPREGS_Q20]
    ldp     q22, q23, [x0, UC_FPREGS_Q22]
    ldp     q24, q25, [x0, UC_FPREGS_Q24]
    ldp     q26, q27, [x0, UC_FPREGS_Q26]
    ldp     q28, q29, [x0, UC_FPREGS_Q28]
    ldp     q30, q31, [x0, UC_FPREGS_Q30]

    mov     x0, x1
    ret
END(_longjmp)
```

Listing 20: \_longjmp (\_setjmp.S [47])

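longjmp - in addition to what \_longjmp does, it restores the saved signal mask ①. Instead of restoring the registers directly, it builds a ucontext\_t from the jump buffer and passes it to setcontext.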

```
void longjmp(jmp_buf env, int val) {
  ucontext_t *sc_uc = (void *)env;
  ucontext_t uc;
  memset(&uc, 0, sizeof(ucontext_t));

  if (_REG(sc_uc, SP) == 0)
    goto err;

  if (val == 0)
    val = 1;

  uc.uc_flags =
    _UC_CPU | ((sc_uc->uc_flags & _UC_STACK) ? _UC_SETSTACK : _UC_CLRSTACK);

  sigprocmask(SIG_SETMASK, &sc_uc->uc_sigmask, NULL); /* ① */

  uc.uc_link = 0;

  _REG(&uc, X0) = val;

  _REG(&uc, X19) = _REG(sc_uc, X19);
  _REG(&uc, X20) = _REG(sc_uc, X20);
  _REG(&uc, X21) = _REG(sc_uc, X21);
  _REG(&uc, X22) = _REG(sc_uc, X22);
  _REG(&uc, X23) = _REG(sc_uc, X23);
  _REG(&uc, X24) = _REG(sc_uc, X24);
  _REG(&uc, X25) = _REG(sc_uc, X25);
  _REG(&uc, X26) = _REG(sc_uc, X26);
  _REG(&uc, X27) = _REG(sc_uc, X27);
  _REG(&uc, X28) = _REG(sc_uc, X28);
  _REG(&uc, X29) = _REG(sc_uc, X29);

  _REG(&uc, SP) = _REG(sc_uc, SP);
  _REG(&uc, LR) = _REG(sc_uc, LR);
  _REG(&uc, PC) = _REG(sc_uc, PC);
  _REG(&uc, SPSR) = _REG(sc_uc, SPSR);
  _REG(&uc, TPIDR) = _REG(sc_uc, TPIDR);

  if (sc_uc->uc_flags & _UC_FPU) {
    memcpy(&uc.uc_mcontext.__fregs, &sc_uc->uc_mcontext.__fregs,
           sizeof(__fregset_t));
    uc.uc_flags |= _UC_FPU;
  }

  setcontext(&uc);
err:
  longjmperror();
  abort();
}
```

Listing 21: longjmp (longjmp.c [48])
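The val == 0 check in longjmp implements a rule of the C standard: setjmp returns 0 only for the direct call, so a longjmp with value 0 must resume with 1 instead. The host-side sketch below uses the standard `<setjmp.h>` of the build machine, not Mimiker's libc, to show that behaviour; resumed\_value is a hypothetical helper. setjmp appears as the controlling expression of a switch, one of the few contexts in which the standard allows it.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_env;

/* Hypothetical helper: calls setjmp, immediately jumps back with `val`
 * and reports which value setjmp returned the second time. */
static int resumed_value(int val) {
  switch (setjmp(demo_env)) {
  case 0:
    longjmp(demo_env, val); /* direct return: jump back right away */
  case 1:
    return 1;
  case 2:
    return 2;
  default:
    return -1; /* any other value; not needed for this demonstration */
  }
}
```

Calling `resumed_value(0)` yields 1, exactly the substitution performed by the `if (val == 0)` branch of Listing 21.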

setjmp - in addition to what \_setjmp does, it saves the current signal mask and the current alternate signal stack settings.


```
ENTRY(setjmp)
    sub     sp, sp, CALLFRAME_SIZ
    stp     lr, x19, [sp]

    /* Save env in safe register. */
    mov     x19, x0

    /* Save current signal mask at ucontext::uc_sigmask.
     * If set is NULL, then the signal mask is unchanged (i.e., how is
     * ignored), but the current value of the signal mask is
     * nevertheless returned in oldset (if it is not NULL). */
    add     x2, x19, UC_MASK            /* &env->uc_sigmask */
    mov     x1, xzr
    bl      sigprocmask
    cmp     x0, xzr
    bne     botch

    /* Save stack_t at ucontext::uc_stack.
     * By specifying ss as NULL, and old_ss as a non-NULL value, one
     * can obtain the current settings for the alternate signal stack
     * without changing them. */
    add     x1, x19, UC_STACK           /* &env->uc_stack */
    /* We know that x0 is equal to 0 here. */
    bl      sigaltstack
    cmp     x0, xzr
    bne     botch

    /* stack_t::ss_flags is an int */
    ldr     w0, [x19, UC_STACK+SS_FLAGS]
    and     w0, w0, SS_ONSTACK
    cmp     w0, wzr
    beq     1f

    /* ucontext_t::uc_flags is an int */
    ldr     w0, [x19, UC_FLAGS]
    orr     w0, w0, _UC_STACK
    str     w0, [x19, UC_FLAGS]

1:
    /* restore jmpbuf */
    mov     x0, x19
    ldp     lr, x19, [sp]
    add     sp, sp, CALLFRAME_SIZ

    b       _setjmp
botch:
    bl      abort
END(setjmp)
```

Listing 22: setjmp (setjmp.S [49])

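Finally, sigsetjmp and siglongjmp are the variants of setjmp and longjmp that let the caller decide whether the signal mask is saved and restored, which matters when jumping out of signal handlers. They are only dispatchers: sigsetjmp records its savesigs decision in the \_UC\_SIGMASK flag and calls setjmp or \_setjmp, and siglongjmp checks that flag and calls longjmp or \_longjmp accordingly.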

```
/* int sigsetjmp(jmp_buf buf, int savesigs) */
ENTRY(sigsetjmp)
    cmp     x1, xzr
    bne     1f
    str     x1, [x0, UC_FLAGS]
    b       _setjmp
1:
    mov     x1, _UC_SIGMASK
    str     x1, [x0, UC_FLAGS]
    b       setjmp
END(sigsetjmp)

/* void siglongjmp(sigjmp_buf env, int val) */
ENTRY(siglongjmp)
    ldr     x2, [x0, UC_FLAGS]
    and     x2, x2, _UC_SIGMASK
    cmp     x2, _UC_SIGMASK
    beq     longjmp
    b       _longjmp
END(siglongjmp)
```

Listing 23: sigsetjmp and siglongjmp (sigsetjmp.S [50])

## 4.3 pmap

The physical-mapping module (pmap) manages machine-dependent address translation and access tables that are used either directly or indirectly by the MMU.

When the kernel allocates new virtual memory for kernel space or user space, that memory needs to be mapped to physical memory. Without such a mapping, every access to it triggers a memory fault exception. Pmap manages the page tables and, when needed, the TLB. Mapping memory is not its only task: pmap can also modify the access permissions of a given virtual address. This is a very important feature of a modern virtual memory subsystem and it is necessary for sharing memory between user-space processes. The best-known mechanism that relies on it is copy-on-write. When a process calls the **fork** system call, all of its pages are marked as read-only. When either process (parent or child) then tries to write to such a page, a memory fault occurs. The kernel allocates a new page and copies the contents of the old page into it. Finally, pmap maps the faulting virtual address to the new physical page and clears the caches and TLB entries for that address. For more details about copy-on-write see [11].
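The copy-on-write flow described above can be sketched as a small user-space simulation. This is a deliberately simplified model, assuming per-frame reference counts and a single page table entry per process; it is not how Mimiker's VM subsystem is structured, and all names (sim\_pte\_t, sim\_fork, sim\_write, SIM\_PAGESIZE) are hypothetical. A store to a write-protected entry plays the role of the memory fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIM_PAGESIZE 64 /* tiny "page" size, for illustration only */
#define SIM_NFRAMES 8

/* Hypothetical physical memory: frames plus a reference count per frame. */
static char phys[SIM_NFRAMES][SIM_PAGESIZE];
static int refcnt[SIM_NFRAMES];
static int next_free;

/* One page table entry of a process. */
typedef struct {
  int frame;     /* index of the physical frame */
  bool writable; /* hardware write permission */
} sim_pte_t;

static int alloc_frame(void) {
  assert(next_free < SIM_NFRAMES);
  refcnt[next_free] = 1;
  return next_free++;
}

/* fork: the child shares the parent's frame and both lose write access. */
static void sim_fork(sim_pte_t *parent, sim_pte_t *child) {
  parent->writable = false;
  *child = *parent;
  refcnt[parent->frame]++;
}

/* A store; hitting a write-protected entry is the "memory fault", which is
 * resolved by copying the page if it is still shared. */
static void sim_write(sim_pte_t *pte, int off, char val) {
  if (!pte->writable) {
    if (refcnt[pte->frame] > 1) {
      int f = alloc_frame();
      memcpy(phys[f], phys[pte->frame], SIM_PAGESIZE);
      refcnt[pte->frame]--;
      pte->frame = f; /* remap to the private copy */
    }
    /* A real kernel would also invalidate the stale TLB entry here. */
    pte->writable = true;
  }
  phys[pte->frame][off] = val;
}
```

When the last remaining sharer faults, no copy is needed: the sketch only restores write permission on the existing frame.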


#### 4.3.1 Interface

pmap\_t is the structure that describes the virtual address space of a process.

```
typedef struct pmap {
  mtx_t mtx;                      /* protects all fields in this structure */
  asid_t asid;                    /* address space identifier */
  paddr_t pde;                    /* directory page table physical address */
  vm_pagelist_t pte_pages;        /* pages we allocate in page table */
  TAILQ_HEAD(, pv_entry) pv_list; /* all pages mapped by this physical map */
} pmap_t;
```

Listing 24: Physical map definition (pmap.c [51])

Return true if va belongs to the given pmap. It is used mostly for sanity checks in our codebase.

bool pmap\_address\_p(pmap\_t \*pmap, vaddr\_t va);

Return true if the range from start to end belongs to the given pmap. It is also used for sanity checks.

bool pmap\_contains\_p(pmap\_t \*pmap, vaddr\_t start, vaddr\_t end);

Return the first address that belongs to the given pmap.

vaddr\_t pmap\_start(pmap\_t \*pmap);

Return the last address that belongs to the given pmap.

vaddr\_t pmap\_end(pmap\_t \*pmap);

Bootstrap the kernel pmap. It sets the address space identifier for the page table allocated during early bootstrapping, initializes the mutexes of the pmap module and initializes the list of pages used by the kernel page table.

void init\_pmap(void);

Allocate a new page table and address space identifier, and bootstrap a new pmap with that page table. This function is called every time a new process is created, either directly by the kernel or by a system call.

pmap\_t \*pmap\_new(void);

Delete the pmap structure, free the pages that belong to it and release its address space identifier.

void pmap\_delete(pmap\_t \*pmap);

Map a page to the given virtual address in the pmap with the given protection and cache flags. It is used mostly to map pages into the address spaces of user threads. For the kernel pmap, pmap\_kenter should be used instead.

Find the physical address to which a given virtual address is mapped in the pmap. If the address is mapped, the physical address is stored in pap and true is returned.

bool pmap\_extract(pmap\_t \*pmap, vaddr\_t va, paddr\_t \*pap);

Remove the mappings in the range from start to end from the given pmap.

void pmap\_remove(pmap\_t \*pmap, vaddr\_t start, vaddr\_t end);

Map a page to the given virtual address in the kernel pmap with the given protection and cache flags.


Find the physical address to which a given virtual address is mapped in the kernel pmap.

```
bool pmap_kextract(vaddr_t va, paddr_t *pap);
```

Remove the mappings in the range from start to end from the kernel pmap.

void pmap\_kremove(vaddr\_t start, vaddr\_t end);

Change the protection of a mapping. It can be called as a result of the mprotect system call or by the copy-on-write mechanism in the kernel [11].


Remove the mappings of the given page from every pmap.

void pmap\_page\_remove(vm\_page\_t \*pg);

Fill the given page with zeros.

void pmap\_zero\_page(vm\_page\_t \*pg);

Copy the contents of page src to page dst.

void pmap\_copy\_page(vm\_page\_t \*src, vm\_page\_t \*dst);

Software tracking of referenced and modified bits. For more information see 4.3.5.

```
bool pmap_clear_referenced(vm_page_t *pg);
bool pmap_clear_modified(vm_page_t *pg);
bool pmap_is_referenced(vm_page_t *pg);
bool pmap_is_modified(vm_page_t *pg);
void pmap_set_referenced(vm_page_t *pg);
void pmap_set_modified(vm_page_t *pg);
int pmap_emulate_bits(pmap_t *pmap, vaddr_t va, vm_prot_t prot);
```

Activate the mappings of the given pmap. It is called when the active address space has to be changed.

void pmap\_activate(pmap\_t \*pmap);

Return the pmap for a given virtual address.

pmap\_t \*pmap\_lookup(vaddr\_t addr);

Return the kernel pmap.

pmap\_t \*pmap\_kernel(void);

Return the active user pmap.

pmap\_t \*pmap\_user(void);

Increase the usable kernel virtual address space to at least maxkvaddr. More details are available in 3.3.2.

void pmap\_growkernel(vaddr\_t maxkvaddr);

Below I describe the most important internals of this module. For a high-level overview, see the FreeBSD manual pages [5].

#### 4.3.2 Protection map

vm\_prot\_map translates every combination of VM protection flags into the page table entry access bits described in 2.3.

```
static const pte_t pte_common = L3_PAGE | ATTR_SH_IS;
static const pte_t pte_noexec = ATTR_XN | ATTR_SW_NOEXEC;

static const pte_t vm_prot_map[] = {
  [VM_PROT_NONE] = pte_noexec | pte_common,
  [VM_PROT_READ] =
    ATTR_AP_RO | ATTR_SW_READ | ATTR_AF | pte_noexec | pte_common,
  [VM_PROT_WRITE] =
    ATTR_AP_RW | ATTR_SW_WRITE | ATTR_AF | pte_noexec | pte_common,
  [VM_PROT_READ | VM_PROT_WRITE] = ATTR_AP_RW | ATTR_SW_READ |
    ATTR_SW_WRITE | ATTR_AF | pte_noexec | pte_common,
  [VM_PROT_EXEC] = ATTR_AF | pte_common,
  [VM_PROT_READ | VM_PROT_EXEC] =
    ATTR_AP_RO | ATTR_SW_READ | ATTR_AF | pte_common,
  [VM_PROT_WRITE | VM_PROT_EXEC] =
    ATTR_AP_RW | ATTR_SW_WRITE | ATTR_AF | pte_common,
  [VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXEC] =
    ATTR_AP_RW | ATTR_SW_READ | ATTR_SW_WRITE | ATTR_AF | pte_common,
};
```

Listing 25: protection map (pmap.c [51])

#### 4.3.3 Walk

These functions walk the page table. They use the direct map ①, which maps all physical memory into a contiguous area of virtual memory. The difference between them is that pmap\_lookup\_pte returns NULL when an intermediate level of the page table is missing, while pmap\_ensure\_pte always returns a valid pointer to a page table entry: missing levels are allocated as needed ②.

```
static pte_t *pmap_lookup_pte(pmap_t *pmap, vaddr_t va) {
  pde_t *pdep;
  paddr_t pa = pmap->pde;

  /* Level 0 */
  pdep = (pde_t *)PHYS_TO_DMAP(pa) + L0_INDEX(va); /* ① */
  if (!(pa = PTE_FRAME_ADDR(*pdep)))
    return NULL;

  /* Level 1 */
  pdep = (pde_t *)PHYS_TO_DMAP(pa) + L1_INDEX(va);
  if (!(pa = PTE_FRAME_ADDR(*pdep)))
    return NULL;

  /* Level 2 */
  pdep = (pde_t *)PHYS_TO_DMAP(pa) + L2_INDEX(va);
  if (!(pa = PTE_FRAME_ADDR(*pdep)))
    return NULL;

  /* Level 3 */
  return (pde_t *)PHYS_TO_DMAP(pa) + L3_INDEX(va);
}

static pte_t *pmap_ensure_pte(pmap_t *pmap, vaddr_t va) {
  pde_t *pdep;
  paddr_t pa = pmap->pde;

  /* Level 0 */
  pdep = (pde_t *)PHYS_TO_DMAP(pa) + L0_INDEX(va);
  if (!(pa = PTE_FRAME_ADDR(*pdep))) {
    pa = pmap_alloc_pde(pmap, va); /* ② */
    *pdep = pa | L0_TABLE;
  }

  /* Level 1 */
  pdep = (pde_t *)PHYS_TO_DMAP(pa) + L1_INDEX(va);
  if (!(pa = PTE_FRAME_ADDR(*pdep))) {
    pa = pmap_alloc_pde(pmap, va);
    *pdep = pa | L1_TABLE;
  }

  /* Level 2 */
  pdep = (pde_t *)PHYS_TO_DMAP(pa) + L2_INDEX(va);
  if (!(pa = PTE_FRAME_ADDR(*pdep))) {
    pa = pmap_alloc_pde(pmap, va);
    *pdep = pa | L2_TABLE;
  }

  /* Level 3 */
  return (pde_t *)PHYS_TO_DMAP(pa) + L3_INDEX(va);
}
```
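The L0\_INDEX to L3\_INDEX macros used above extract 9-bit fields from the virtual address. Below is a minimal sketch of that computation, assuming a 4 KiB granule and 48-bit virtual addresses (four levels of 512 entries each); the names are hypothetical and the real macros may be defined differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 4 KiB pages, 48-bit virtual addresses and four
 * translation levels of 512 entries each (9 index bits per level). */
#define SIM_PAGE_SHIFT 12
#define SIM_LEVEL_BITS 9
#define SIM_LEVEL_MASK ((1u << SIM_LEVEL_BITS) - 1)

/* Position of the index field of level `lvl` (0 to 3) in the address. */
static unsigned level_shift(int lvl) {
  return SIM_PAGE_SHIFT + SIM_LEVEL_BITS * (unsigned)(3 - lvl);
}

/* Index of the entry used at level `lvl`, like L0_INDEX ... L3_INDEX. */
static unsigned level_index(uint64_t va, int lvl) {
  return (unsigned)(va >> level_shift(lvl)) & SIM_LEVEL_MASK;
}
```

Under these assumptions, level 0 uses bits 47:39, level 1 bits 38:30, level 2 bits 29:21 and level 3 bits 20:12; for example, the KASAN shadow base 0xffffff0000000000 used in 4.4 falls into level 0 entry 510.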

#### 4.3.4 Activation

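pmap\_activate activates the given page table for user-space accesses. The physical address of level 0 of the page table must be stored in the TTBR0\_EL1 register together with the address space identifier (ASID) ①, and the EPD0 bit must be cleared in the TCR\_EL1 register ②. When there is no user pmap, EPD0 is set instead, which disables translation table walks through TTBR0\_EL1.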

```
void pmap_activate(pmap_t *umap) {
  SCOPED_NO_PREEMPTION();

  PCPU_SET(curpmap, umap);

  uint64_t tcr = READ_SPECIALREG(TCR_EL1);

  if (umap == NULL) {
    WRITE_SPECIALREG(TCR_EL1, tcr | TCR_EPD0);
  } else {
    uint64_t ttbr0 = ((uint64_t)umap->asid << ASID_SHIFT) | umap->pde;
    WRITE_SPECIALREG(TTBR0_EL1, ttbr0);         /* ① */
    WRITE_SPECIALREG(TCR_EL1, tcr & ~TCR_EPD0); /* ② */
  }
}
```

Listing 27: activate virtual address space (pmap.c [51])
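The register values computed by pmap\_activate can be checked in isolation. This sketch assumes, as defined by the Arm architecture, that the ASID occupies bits [63:48] of TTBR0\_EL1 (so ASID\_SHIFT would be 48) and that TCR\_EL1.EPD0 is bit 7. The helper names are hypothetical, and whether 8 or 16 ASID bits are in use (selected by TCR\_EL1.AS) is ignored here.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_ASID_SHIFT 48            /* TTBRn_EL1.ASID is bits [63:48] */
#define SIM_TCR_EPD0 (UINT64_C(1) << 7) /* TCR_EL1.EPD0 */

/* Combine an ASID with the (aligned) physical address of the L0 table. */
static uint64_t make_ttbr0(uint16_t asid, uint64_t l0_pa) {
  return ((uint64_t)asid << SIM_ASID_SHIFT) | l0_pa;
}

static uint16_t ttbr_asid(uint64_t ttbr) {
  return (uint16_t)(ttbr >> SIM_ASID_SHIFT);
}

/* Like pmap_activate: clear EPD0 when a user pmap is active, set it
 * (disabling TTBR0 walks) when there is none. */
static uint64_t tcr_for(uint64_t tcr, int have_user_pmap) {
  return have_user_pmap ? (tcr & ~SIM_TCR_EPD0) : (tcr | SIM_TCR_EPD0);
}
```

Because the level 0 table is page-aligned, its address never overlaps the ASID field, so a plain bitwise OR is sufficient.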

#### 4.3.5 Access emulation

We do not use hardware management of the AF (access flag) and DBM (dirty bit modifier) bits, so we need to manage them in software. Hardware management is an optional feature introduced in ARMv8.1 that not every CPU supports (the Cortex-A53 in the Raspberry Pi 3 implements ARMv8.0), and the software approach keeps us compatible with the solutions used in the MIPS version. After a page is mapped, its page table entry does not contain the AF bit, so the first access to that page triggers a page fault. When we handle that exception, pmap\_emulate\_bits checks whether the access is permitted ①. If it is, pmap\_set\_referenced and pmap\_set\_modified set the needed bits (AF and some permission bits) in the page table entry ②. Otherwise an error is returned.

```
int pmap_emulate_bits(pmap_t *pmap, vaddr_t va, vm_prot_t prot) {
  paddr_t pa;

  WITH_MTX_LOCK (&pmap->mtx) {
    if (!pmap_extract_nolock(pmap, va, &pa))
      return EFAULT;

    pte_t pte = *pmap_lookup_pte(pmap, va);

    if ((prot & VM_PROT_READ) && !(pte & ATTR_SW_READ)) /* ① */
      return EACCES;

    if ((prot & VM_PROT_WRITE) && !(pte & ATTR_SW_WRITE))
      return EACCES;

    if ((prot & VM_PROT_EXEC) && (pte & ATTR_SW_NOEXEC))
      return EACCES;
  }

  vm_page_t *pg = vm_page_find(pa);
  assert(pg != NULL);

  WITH_MTX_LOCK (pv_list_lock) {
    /* Kernel non-pageable memory? */
    if (TAILQ_EMPTY(&pg->pv_list))
      return EINVAL;
  }

  pmap_set_referenced(pg); /* ② */
  if (prot & VM_PROT_WRITE)
    pmap_set_modified(pg);

  return 0;
}
```

Listing 28: Emulate access and reference bits (pmap.c [51])
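The decision logic of pmap\_emulate\_bits can be modelled on a single page table entry. In this sketch the bit values are hypothetical stand-ins for ATTR\_SW\_READ, ATTR\_SW\_WRITE, ATTR\_SW\_NOEXEC, the access flag and the hardware write permission; the real code works on a vm\_page\_t through pmap\_set\_referenced and pmap\_set\_modified, which is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical access types, standing in for VM_PROT_READ etc. */
#define SIM_PROT_READ (1u << 0)
#define SIM_PROT_WRITE (1u << 1)
#define SIM_PROT_EXEC (1u << 2)

/* Hypothetical PTE bits: software permissions recorded at mapping time and
 * hardware bits that are only set once an access has been allowed. */
#define SIM_SW_READ (1u << 0)
#define SIM_SW_WRITE (1u << 1)
#define SIM_SW_NOEXEC (1u << 2)
#define SIM_HW_AF (1u << 3) /* access flag: while clear, any access faults */
#define SIM_HW_RW (1u << 4) /* hardware write permission */

/* Mirrors the checks of pmap_emulate_bits on one entry. Returns 0 and
 * updates the hardware bits if the access is allowed, EACCES otherwise. */
static int sim_emulate_bits(uint32_t *pte, uint32_t prot) {
  if ((prot & SIM_PROT_READ) && !(*pte & SIM_SW_READ))
    return EACCES;
  if ((prot & SIM_PROT_WRITE) && !(*pte & SIM_SW_WRITE))
    return EACCES;
  if ((prot & SIM_PROT_EXEC) && (*pte & SIM_SW_NOEXEC))
    return EACCES;

  *pte |= SIM_HW_AF; /* "referenced": later accesses do not fault */
  if (prot & SIM_PROT_WRITE)
    *pte |= SIM_HW_RW; /* "modified": later writes do not fault */
  return 0;
}
```

A read of a read-only page only sets the access flag, so a later write still faults and can be rejected; this is how the software-maintained modified bit stays accurate.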

#### 4.3.6 Growkernel

Because the address space on AArch64 is much bigger than on MIPS, we cannot describe the whole kernel virtual address space up front in the various subsystems. This is why pmap\_growkernel exists. When kmem (the kernel memory allocator) fails with an out-of-memory error, it calls pmap\_growkernel to extend the memory available to kernel space. The new memory range is then added to vmem (the virtual memory allocator) and the kmem call is restarted. This function is similar to the MIPS version described in 3.3.2, so it is not commented in detail here.

```
void pmap_growkernel(vaddr_t maxkvaddr) {
  assert(mtx_owned(&vm_kernel_end_lock));
  assert(maxkvaddr > vm_kernel_end);

  pmap_t *pmap = pmap_kernel();
  vaddr_t va;

  maxkvaddr = roundup2(maxkvaddr, L2_SIZE);

  WITH_MTX_LOCK (&pmap->mtx) {
    for (va = vm_kernel_end; va < maxkvaddr; va += L2_SIZE)
      pmap_ensure_pte(pmap, va);
  }

  kasan_grow(maxkvaddr);

  vm_kernel_end = maxkvaddr;
}
```

Listing 29: pmap\_growkernel (pmap.c [51])
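The arithmetic in pmap\_growkernel is easy to check in isolation. Assuming L2\_SIZE is 2 MiB (the size covered by one level 2 entry with a 4 KiB granule), this sketch counts how many pmap\_ensure\_pte calls a request results in; sim\_roundup2 mirrors the usual power-of-two roundup2 macro, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_L2_SIZE (UINT64_C(1) << 21) /* 2 MiB, assuming 4 KiB pages */

/* Round x up to a multiple of m, where m is a power of two. */
static uint64_t sim_roundup2(uint64_t x, uint64_t m) {
  return (x + m - 1) & ~(m - 1);
}

/* Number of loop iterations (pmap_ensure_pte calls) pmap_growkernel would
 * perform when growing the kernel from kernel_end to maxkvaddr. */
static uint64_t grow_steps(uint64_t kernel_end, uint64_t maxkvaddr) {
  uint64_t n = 0;
  maxkvaddr = sim_roundup2(maxkvaddr, SIM_L2_SIZE);
  for (uint64_t va = kernel_end; va < maxkvaddr; va += SIM_L2_SIZE)
    n++;
  return n;
}
```

Because maxkvaddr is rounded up to L2\_SIZE, even a one-byte extension prepares page tables for a whole 2 MiB region, so the following small requests do not need another call to pmap\_growkernel.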

## 4.4 KASAN

Thanks to the changes described in 3.4, we only need to choose where the shadow map is located and build the initial shadow map.

For the shadow map I have chosen address 0xffffff0000000000 ④. It is located at the end of kernel space, and only the direct map lies at higher addresses. Note that BSD systems use the reverse order.

This address has to be passed directly to the C compiler, because stack accesses are sanitized by code that gcc emits inline, which does not call our functions. As a result, one more change is needed in the boot code: all functions running at virtual addresses must use a virtually mapped stack. This is why aarch64\_init returns \_boot\_stack.
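With a shadow scale of 8 (one shadow byte describes eight bytes of kernel memory, the usual KASAN layout and an assumption here), the translation from an address to its shadow byte can be written in two equivalent forms: relative to the start of the sanitized area, or as `(addr >> 3) + offset`, the form used by compiler-inserted checks. The sketch assumes a hypothetical sanitized start of 0xffff000000000000; only the shadow start comes from the text. GCC accepts such an offset through its `-fasan-shadow-offset` option; whether Mimiker's build passes it in exactly this way is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_SHADOW_START UINT64_C(0xffffff0000000000)    /* from the text */
#define SIM_SANITIZED_START UINT64_C(0xffff000000000000) /* assumed */
#define SIM_SCALE_SHIFT 3 /* assumed: 1 shadow byte covers 8 bytes */

/* Shadow address computed relative to the start of the sanitized area. */
static uint64_t addr_to_shadow(uint64_t addr) {
  return SIM_SHADOW_START + ((addr - SIM_SANITIZED_START) >> SIM_SCALE_SHIFT);
}

/* The constant a compiler would add after shifting the address. */
static uint64_t compiler_offset(void) {
  return SIM_SHADOW_START - (SIM_SANITIZED_START >> SIM_SCALE_SHIFT);
}

/* The same mapping in the compiler's form: (addr >> 3) + offset. */
static uint64_t addr_to_shadow_inline(uint64_t addr) {
  return (addr >> SIM_SCALE_SHIFT) + compiler_offset();
}
```

The two forms agree (with wrap-around unsigned arithmetic) as long as the sanitized start is a multiple of 8, which is why a single constant offset is enough for the compiler.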

Here is the code that builds the initial shadow map:

```
size_t kasan_sanitized_size =                              /* ① */
  2 * SUPERPAGESIZE +                                      /* ② */
  roundup2(va - KASAN_MD_SANITIZED_START,
           SUPERPAGESIZE * KASAN_SHADOW_SCALE_SIZE);
size_t kasan_shadow_size =
  kasan_sanitized_size / KASAN_SHADOW_SCALE_SIZE;
vaddr_t kasan_shadow_end = KASAN_MD_SHADOW_START + kasan_shadow_size; /* ③ */
va = KASAN_MD_SHADOW_START;                                /* ④ */
*(vaddr_t *)AARCH64_PHYSADDR(&_kasan_sanitized_end) =
  KASAN_MD_SANITIZED_START + kasan_sanitized_size;

while (va < kasan_shadow_end) {
  if (l0[L0_INDEX(va)] == 0)
    l0[L0_INDEX(va)] = (pde_t)bootmem_alloc(PAGESIZE) | L0_TABLE;

  pde_t *l1k = (pde_t *)PTE_FRAME_ADDR(l0[L0_INDEX(va)]);
  if (l1k[L1_INDEX(va)] == 0)
    l1k[L1_INDEX(va)] = (pde_t)bootmem_alloc(PAGESIZE) | L1_TABLE;

  pde_t *l2k = (pde_t *)PTE_FRAME_ADDR(l1k[L1_INDEX(va)]);
  if (l2k[L2_INDEX(va)] == 0)
    l2k[L2_INDEX(va)] = (pde_t)bootmem_alloc(PAGESIZE) | L2_TABLE;

  pde_t *l3k = (pde_t *)PTE_FRAME_ADDR(l2k[L2_INDEX(va)]);

  for (int j = 0; va < kasan_shadow_end && j < PT_ENTRIES; j++) {
    l3k[L3_INDEX(va)] = pa | ATTR_AP_RW | ATTR_XN | pte_default;
    va += PAGESIZE;
    pa += PAGESIZE;
  }
}
```

Listing 30: Build initial shadow map (boot.c [52])

First we calculate the size of the current kernel space ①. With that knowledge we calculate the end of the shadow area ③, which starts at the address chosen above ④, and the physical memory backing it is allocated. Then the mapping between virtual and physical memory is created in the page table for the shadow map: the outer loop walks the page table in each iteration, allocating missing directory levels, and the inner loop fills the level 3 page table entries.

The additional superpages ② are a workaround for a bug in the machine-independent part of the memory management subsystem, which is not a part of this port.
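To make the size computation in Listing 30 concrete, the following sketch evaluates kasan\_sanitized\_size and kasan\_shadow\_size for a given kernel size. It assumes SUPERPAGESIZE is 2 MiB and KASAN\_SHADOW\_SCALE\_SIZE is 8; both values are assumptions, not taken from the source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_SUPERPAGESIZE (UINT64_C(2) << 20) /* assumed 2 MiB */
#define SIM_SCALE 8                           /* assumed shadow scale */

/* Round x up to a multiple of m, where m is a power of two. */
static uint64_t sim_roundup2(uint64_t x, uint64_t m) {
  return (x + m - 1) & ~(m - 1);
}

/* kernel_size corresponds to va - KASAN_MD_SANITIZED_START in Listing 30. */
static uint64_t sanitized_size(uint64_t kernel_size) {
  return 2 * SIM_SUPERPAGESIZE +
         sim_roundup2(kernel_size, SIM_SUPERPAGESIZE * SIM_SCALE);
}

static uint64_t shadow_size(uint64_t kernel_size) {
  return sanitized_size(kernel_size) / SIM_SCALE;
}
```

For a 10 MiB kernel, the sanitized range is rounded up to 16 MiB and extended by the two extra superpages to 20 MiB; its shadow takes 2.5 MiB, i.e. 640 pages of 4 KiB.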

#### 4.5 Boot

In this section we discuss what happens between the first instruction and the jump to the machine-independent part of the code.

The first thing to know is that the first instruction, the kernel entry point, is located at address 0x200000.

#### 4.5.1 start.S

As we can see, the start is not complicated. First there is a magic header needed by the bootloader **①**. Next we check the current CPU number **②**: Mimiker is not yet ready to run on multiprocessor machines, so only CPU0 continues executing the code, while the other CPUs stay in a loop forever. We need to save the pointer to the device tree blob **③**. This binary blob stores serialized information about the devices present in the machine and the kernel's command line. The device tree specification is available at [30]. Then the initial stack is prepared **④** and we jump to the C code in **aarch64\_init**. That code configures the CPU and returns a temporary stack **⑤**. Next we jump to **board\_stack**, which configures the final stack for the kernel, and finally to **board\_init**, which is a trampoline to the machine-independent code.

```
_ENTRY(_start)
    /* Based on locore.S from FreeBSD. */
    b       1f
    .long   0                           /* ① */
    .quad   IMAGE_OFFSET
    .quad   IMAGE_SIZE
    .quad   IMAGE_FLAGS
    .quad   0
    .quad   0
    .quad   0
    .long   0x644d5241                  /* Magic "ARM\x64" */
    .long   0
1:
    /* Get CPU number. */
    mrs     x3, MPIDR_EL1               /* ② */
    and     x3, x3, #3
    cmp     x3, #0
    bne

    /* Save pointer to dtb. */
    mov     x19, x0                     /* ③ */
    /* Setup initial stack. */
    adr     x3, __boot_stack_end        /* ④ */
    mov     sp, x3

    bl      aarch64_init

    mov     sp, x0                      /* ⑤ */
    /* Restore dtb pointer. */
    mov     x0, x19

    bl      board_stack
    mov     sp, x0

    b       board_init
_END(_start)
```

Listing 31: First kernel instructions (start.S [53])
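Two constants in Listing 31 can be checked in C. The magic word 0x644d5241 is the string "ARM\x64" read as a little-endian 32-bit integer (the header layout matches the Linux arm64 Image header), and the CPU number is taken from the low two bits of MPIDR\_EL1, i.e. from the Aff0 field, which is enough to tell apart the four cores of the Raspberry Pi 3. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a little-endian 32-bit word from four bytes. */
static uint32_t le32(const unsigned char *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
}

/* Mirrors "and x3, x3, #3": keep the low two bits of MPIDR_EL1 (Aff0). */
static unsigned cpu_number(uint64_t mpidr) {
  return (unsigned)(mpidr & 3);
}
```

On a four-core cluster, MPIDR\_EL1 values such as 0x80000000 and 0x80000003 (bit 31 is reserved and reads as one) give CPU numbers 0 and 3; only CPU 0 passes the check in start.S.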

#### 4.5.2 boot.c

aarch64\_init is a dispatcher that calls the functions which configure the CPU.

AArch64 has four exception levels (see 2.1.2). In our case, exception level 0 is where user space runs and exception level 1 is where the kernel runs. At the beginning, however, we are not in exception level 1, so we have to drop to it, and this is exactly what drop\_to\_el1 does ①.

Next we need to clear the .bss section of our binary **②**.

Finally, we build the initial page table for the kernel and configure the MMU to use it **③**. For more details see 2.1.1, 4.3 and the source code.

```
__boot_text void *aarch64_init(void) {
  drop_to_el1(); /* ① */
  configure_cpu();
  clear_bss(); /* ② */

  /* Set end address of kernel for boot allocation purposes. */
  _bootmem_end = (void *)align(AARCH64_PHYSADDR(__ebss), PAGESIZE);

  enable_mmu(build_page_table()); /* ③ */
  return &_boot_stack[PAGESIZE];
}
```

Listing 32: Early machine-dependent initialization (boot.c [52])

#### 4.5.3 board\_stack

The board\_stack function prepares the final stack for the first kernel thread. This stack is preallocated in the thread0 structure **①**. We process the device tree blob **②** and extract the important data: memory size, the kernel's command line and the location of the initrd. This information is stored on the kernel stack.

```
void *board_stack(paddr_t dtb) {
  dtb_early_init(dtb, fdt_totalsize(PHYS_TO_DMAP(dtb)));

  kstack_t *stk = &thread0.td_kstack; /* ① */

  thread0.td_uctx = kstack_alloc_s(stk, mcontext_t);

  /*
   * NOTE: memsize, rd_start, rd_size, cmdline + 2 = 6
   */
  char **kenvp = kstack_alloc(stk, 6 * sizeof(char *));
  process_dtb(kenvp, stk, (void *)PHYS_TO_DMAP(dtb)); /* ② */
  kstack_fix_bottom(stk);
  init_kenv(kenvp);

  return stk->stk_ptr;
}
```

Listing 33: Build kernel stack (board.c [54])
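fdt\_totalsize (from libfdt) reads the totalsize field of the flattened device tree header. According to the Devicetree Specification [30], the header begins with big-endian 32-bit words: the magic number 0xd00dfeed at offset 0 and totalsize at offset 4. Below is a minimal sketch of reading that field without libfdt; the names are hypothetical, and unlike fdt\_totalsize the sketch also validates the magic number.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_FDT_MAGIC UINT32_C(0xd00dfeed)

/* FDT header fields are stored big-endian. */
static uint32_t be32(const unsigned char *p) {
  return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 |
         (uint32_t)p[3];
}

/* Hypothetical helper: returns the totalsize field of a device tree blob,
 * or 0 if the blob does not start with the FDT magic number. */
static uint32_t sim_fdt_totalsize(const unsigned char *blob) {
  if (be32(blob) != SIM_FDT_MAGIC)
    return 0;
  return be32(blob + 4);
}
```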

The last step is **board\_init**, which configures the machine-independent part of the kernel using data from the dtb and jumps to the machine-independent kernel entry point.

```
__noreturn void board_init(void) {
  init_kasan();
  init_klog();
  rpi3_physmem();
  intr_enable();
  kernel_init();
}
```

Listing 34: Jump to machine-independent code (board.c [54])

## 4.6 Exception handler

Exceptions are a form of exceptional control flow, implemented partly by the hardware and partly by the operating system. An exception is an abrupt change in the control flow in response to some change in the processor's state.

There are four classes of exceptions.

| class     | cause                         | async/sync | return behaviour                    |
|-----------|-------------------------------|------------|-------------------------------------|
| interrupt | signal from I/O device        | async      | always returns to next instruction  |
| trap      | intentional exception         | sync       | always returns to next instruction  |
| fault     | potentially recoverable error | sync       | might return to current instruction |
| abort     | nonrecoverable error          | sync       | never returns                       |

Table 4.2: classes of exceptions

For more high-level information about exceptions, see [10].

Mimiker handles only four types of exceptions:

- Synchronous EL1h
- IRQ EL1h
- Synchronous 64-bit EL0
- IRQ 64-bit EL0

There are more types of exceptions (e.g. FIQ and SError), but since Mimiker runs only at exception levels 0 and 1, we do not need to handle the others. For more information about exception levels, see [7] and Section 2.1.
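On AArch64 the exception vector table pointed to by VBAR\_EL1 has 16 entries of 0x80 bytes each: four groups (current EL using SP\_EL0, current EL using SP\_ELx, lower EL in AArch64, lower EL in AArch32), each with four kinds (synchronous, IRQ, FIQ, SError). The sketch below computes the entry offsets, which shows where the four handlers listed above live. The enum and function names are illustrative and do not come from Mimiker's sources; only the layout follows the ARMv8-A architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; the layout follows the ARMv8-A vector table. */
enum vec_group {
  VEC_CUR_SP0 = 0,   /* current EL, using SP_EL0 ("EL1t") */
  VEC_CUR_SPX = 1,   /* current EL, using SP_ELx ("EL1h") */
  VEC_LOWER_A64 = 2, /* lower EL running AArch64 (64-bit EL0) */
  VEC_LOWER_A32 = 3, /* lower EL running AArch32 */
};

enum vec_kind { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

#define VEC_ENTRY_SIZE 0x80 /* each entry holds up to 32 instructions */

/* Byte offset of a vector entry from the start of the table. */
static uint32_t vec_offset(enum vec_group g, enum vec_kind k) {
  assert(g <= VEC_LOWER_A32 && k <= VEC_SERROR);
  return (uint32_t)(g * 4 + k) * VEC_ENTRY_SIZE;
}
```

The four handled exceptions therefore sit at offsets 0x200, 0x280, 0x400 and 0x480. Since the low 11 bits of VBAR\_EL1 are reserved, the whole table must be 2 KiB aligned.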

IRQ handlers are simple: we save the context of the running thread ① and jump to the main interrupt handler written in C. save\_ctx and load\_ctx are macros that save and restore the context of a thread; their additional parameter (0 or 1) tells whether the context belongs to user space (EL0) or kernel space (EL1). The same code is used for the kernel exception handler.

Listing 35: irq exception handler (evec.S [55])

Things are more complicated for the user exception handler. Again we need to save the CPU context **①**, but here we also need to take care of the FPU context when returning to user space. We check whether the FPU is in use **②**. If it is, we enable the FPU **③** for the CPU and, if the thread already has a saved FPU context **④**, it is restored **⑤**. This works exactly as in the MIPS port (see Section 3.2.1).

```
  .cfi_signal_frame
  save_ctx 0                        /* ① */
  mov     x0, sp
  bl      user_trap_handler

user_exc_leave:
  /* disable interrupts */
  msr     daifset, #DAIF_I

  /* load thread_t::tdp_flags */
  /* thread_t::tdp_flags is a volatile unsigned - use 32-bit */
  load_pcpu x1
  ldr     x1, [x1, #PCPU_CURTHREAD]
  ldr     w3, [x1, #TD_PFLAGS]

  and     w2, w3, #TDP_FPUINUSE     /* ② */
  cmp     w2, wzr
  beq     .skip_fpu_restore

  /* enable FPU */
  mrs     x2, cpacr_el1
  and     x2, x2, ~CPACR_FPEN_MASK
  orr     x2, x2, CPACR_FPEN_TRAP_NONE
  msr     cpacr_el1, x2             /* ③ */

  and     w2, w3, #TDP_FPUCTXSAVED  /* ④ */
  cmp     w2, wzr
  beq     .skip_fpu_restore

  /* clear TDP_FPUCTXSAVED flag */
  and     w2, w3, #~TDP_FPUCTXSAVED
  str     w2, [x1, #TD_PFLAGS]

  /* restore FPU context */
  ldr     x1, [x1, #TD_UCTX]
  load_fpu_ctx x1, 2                /* ⑤ */

.skip_fpu_restore:
  load_ctx 0
  eret
```

Listing 36: user exception handler (evec.S [55])
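The flag logic of this return path can be summarised in C. The sketch below is a hypothetical model, not Mimiker code: only the names TDP\_FPUINUSE and TDP\_FPUCTXSAVED come from the listing, their values are made up, and the two booleans stand for the `msr cpacr_el1` step ③ and the `load_fpu_ctx` step ⑤.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names from the listing; the values are assumptions. */
#define TDP_FPUINUSE 0x1u
#define TDP_FPUCTXSAVED 0x2u

typedef struct {
  bool enable_fpu;  /* step 3: open FPU access in cpacr_el1 */
  bool restore_ctx; /* step 5: reload the saved FPU registers */
} fpu_return_action_t;

/* Decide what to do with the FPU before eret, updating *flags like the
 * assembly does: TDP_FPUCTXSAVED is cleared once the context is reloaded. */
static fpu_return_action_t fpu_on_user_return(unsigned *flags) {
  fpu_return_action_t act = {false, false};
  if (!(*flags & TDP_FPUINUSE)) /* step 2: thread never used the FPU */
    return act;
  act.enable_fpu = true;
  if (*flags & TDP_FPUCTXSAVED) { /* step 4: state was saved earlier */
    *flags &= ~TDP_FPUCTXSAVED;
    act.restore_ctx = true;
  }
  return act;
}
```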

## 4.7 Context switching

Context switching is one of the most important parts of an operating system. Without it we cannot run multiple programs concurrently.


long ctx\_switch(thread\_t \*from, thread\_t \*to);

This procedure switches the thread running on the current CPU from *from* to *to*. Context switching is a very sensitive operation, so the procedure must be called with interrupts disabled; it checks this first and halts the CPU otherwise **①**. Then we check whether we must save the FPU context **②**: this is the case when we switch away from a thread that uses the FPU and whose FPU context has not been saved yet. After that, the CPU context of the current thread is saved **③**. In the next step, the active virtual address space is switched to the one used by the *to* thread **④**, and finally the CPU context of that thread is loaded **⑤**.

For the reason why the FPU context is saved only here, see Section 3.2.1.

```
# ctx_switch must be called with interrupts disabled
        mrs     x2, daif
        and     x2, x2, #PSR_I            /* ① */
        cmp     x2, xzr
        bne     .ctx_save
        hlt     #0

        # save context of @from thread
.ctx_save:
        /* thread_t::tdp_flags is a volatile unsigned - use 32-bit */
        ldr     w3, [x0, #TD_PFLAGS]
        and     w2, w3, #TDP_FPUINUSE | TDP_FPUCTXSAVED
        mov     w4, #TDP_FPUINUSE
        cmp     w2, w4                    /* ② */
        bne     .skip_fpu_save

        orr     w3, w3, #TDP_FPUCTXSAVED
        str     w3, [x0, #TD_PFLAGS]

        /* enable FPU and save context */
        mrs     x2, cpacr_el1
        and     x2, x2, ~CPACR_FPEN_MASK
        orr     x2, x2, CPACR_FPEN_TRAP_NONE
        msr     cpacr_el1, x2

        ldr     x2, [x0, #TD_UCTX]
        save_fpu_ctx x2, 3

        /* disable FPU */
        mrs     x2, cpacr_el1
        and     x2, x2, ~CPACR_FPEN_MASK
        msr     cpacr_el1, x2

.skip_fpu_save:
        sub     sp, sp, #CTX_SIZE
        SAVE_CTX                          /* ③ */
        mov     x2, sp
        str     x2, [x0, #TD_KCTX]

.ctx_resume:
        # switch stack pointer to @to thread
        ldr     x2, [x1, #TD_KCTX]
        mov     sp, x2

        # update curthread pointer to reference @to thread
        load_pcpu x2
        str     x1, [x2, #PCPU_CURTHREAD]

        # switch user space if necessary
        mov     x0, x1
        bl      vm_map_switch             /* ④ */

        # restore @to thread context
        LOAD_CTX                          /* ⑤ */
        add     sp, sp, #CTX_SIZE
        ret
```

Listing 37: context switch (switch.S [56])
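The test at ② masks both FPU flags and compares the result with TDP\_FPUINUSE alone, so the FPU registers are saved only for a thread that uses the FPU and whose FPU context is not saved yet. A minimal C model of that condition follows; the flag values are assumptions (only the names appear in the listing) and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names from the listing; the values are assumptions. */
#define TDP_FPUINUSE 0x1u
#define TDP_FPUCTXSAVED 0x2u

/* Returns true when ctx_switch has to save the FPU context of @from,
 * and marks the context as saved, mirroring the orr/str pair. */
static bool fpu_save_needed(unsigned *flags) {
  unsigned masked = *flags & (TDP_FPUINUSE | TDP_FPUCTXSAVED);
  if (masked != TDP_FPUINUSE)
    return false; /* FPU unused, or its state is already saved */
  *flags |= TDP_FPUCTXSAVED;
  return true;
}
```

Calling it twice in a row shows why the second flag matters: a second switch away from the same thread, without a return to user space in between, does not overwrite the state that was already saved.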
## 4.8 Device tree

In this section I describe the minimal subset of drivers needed to boot Mimiker on the RPi3.

We can think of devices as a tree. There is a single root, which is an ancestor of all devices. Inner nodes manage resources, such as interrupts, on behalf of their child devices (e.g. a USB controller). Leaves are the end devices of our infrastructure (e.g. a keyboard).
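To make the tree picture concrete, the sketch below models devices with a parent pointer and, for a given device, finds the nearest ancestor that manages its interrupts. The structure and function names are hypothetical and far simpler than Mimiker's device\_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device node. */
typedef struct dev_node {
  const char *name;
  struct dev_node *parent; /* NULL only for the root */
  bool intr_controller;    /* does this node dispatch interrupts? */
} dev_node_t;

/* Walk towards the root until a node that manages interrupts is found. */
static dev_node_t *intr_parent(dev_node_t *dev) {
  for (dev_node_t *d = dev->parent; d != NULL; d = d->parent)
    if (d->intr_controller)
      return d;
  return NULL;
}
```

In this model a keyboard hangs below a USB controller, which in turn hangs below the root; each device asks its nearest interrupt-managing ancestor rather than the root directly.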

### 4.8.1 Rootdev

Rootdev is a pseudo-device whose purpose is to be the ancestor of all other devices. For simplicity, the interrupt controller is integrated into the rootdev device.

It provides methods for dispatching, enabling and disabling interrupts.

The rootdev interrupt handler is the interrupt dispatcher. It first handles CPU-local interrupts **①** and then interrupts from peripherals **②**.

```
static void rootdev_intr_handler(ctx_t *ctx, device_t *dev, void *arg) {
  assert(dev != NULL);
  rootdev_t *rd = dev->state;

  /* Handle local interrupts. */
  bcm2835_intr_handle(rootdev_local_handle, /* ① */
                      BCM2836_LOCAL_INTC_IRQPENDINGN(0),
                      &rd->intr_event[BCM2836_INT_BASECPUN(0)]);

  /* Handle GPU0 interrupts. */
  bcm2835_intr_handle(rootdev_arm_base, /* ② */
                      (BCM2835_ARMICU_OFFSET + BCM2835_INTC_IRQ1PENDING),
                      &rd->intr_event[BCM2835_INT_GPU0BASE]);

  /* Handle GPU1 interrupts. */
  bcm2835_intr_handle(rootdev_arm_base,
                      (BCM2835_ARMICU_OFFSET + BCM2835_INTC_IRQ2PENDING),
                      &rd->intr_event[BCM2835_INT_GPU1BASE]);

  /* Handle base interrupts. */
  bcm2835_intr_handle(rootdev_arm_base,
                      (BCM2835_ARMICU_OFFSET + BCM2835_INTC_IRQBPENDING),
                      &rd->intr_event[BCM2835_INT_BASICBASE]);
}
```

Listing 38: rootdev interrupt handler (bcm2835\_rootdev.c [57])

The following function handles a given set of interrupts. Each interrupt is represented by a single bit: 1 means that the interrupt is pending and 0 means that it is not. A set is stored in a 32-bit memory-mapped register ①; these registers are mapped into virtual memory during rootdev initialization. We iterate over the pending interrupts ② and handle them one by one ③.

```
static void bcm2835_intr_handle(bus_space_handle_t irq_base,
                                bus_size_t offset,
                                intr_event_t **events) {
  uint32_t pending = bus_space_read_4(rootdev_bus_space, irq_base, offset); /* ① */

  while (pending) {
    int irq = ffs(pending) - 1; /* ② */
    /* XXX: some pending bits are shared between BASIC and GPU0/1. */
    if (events[irq])
      intr_event_run_handlers(events[irq]); /* ③ */
    pending &= ~(1U << irq); /* clear the bit that has just been handled */
  }
}
```

Listing 39: bcm2835 interrupt handler (bcm2835\_rootdev.c [57])
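The loop in bcm2835\_intr\_handle visits the set bits of the pending word from the least significant one upwards. The stand-alone sketch below replays that loop on an ordinary integer and records the dispatch order instead of calling intr\_event\_run\_handlers; the handler type and the recording array are illustrative only. It uses ffs() from `<strings.h>`, which returns the 1-based index of the lowest set bit, or 0 for a zero argument.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h> /* ffs() */

#define NPENDING 32

typedef void (*irq_handler_fn)(int irq);

/* Dispatch every set bit of `pending`, lowest IRQ number first. */
static void dispatch_pending(uint32_t pending, irq_handler_fn handlers[NPENDING]) {
  while (pending) {
    int irq = ffs((int)pending) - 1;  /* index of the lowest set bit */
    if (handlers[irq])                /* skip lines nobody registered for */
      handlers[irq](irq);
    pending &= ~(UINT32_C(1) << irq); /* clear the bit just handled */
  }
}

/* Recording handler used to observe the dispatch order. */
static int seen[NPENDING], nseen;
static void record_irq(int irq) { seen[nseen++] = irq; }
```

The bit has to be cleared with `&= ~mask`: writing `&= mask` instead would keep exactly the bit that was just handled, so the loop would never terminate.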

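Enabling an interrupt requires a dispatcher very similar to the interrupt handler: the IRQ number determines which group of registers (local, GPU0, GPU1 or basic) has to be programmed.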

```
static void rootdev_enable_irq(intr_event_t *ie) {
  int irq = ie->ie_irq;
  assert(irq < NIRQ);

  if (irq < BCM2836_NIRQ) {
    /* Enable local IRQ. */
    enable_local_irq(irq);
  } else if (irq < BCM2835_INT_GPU1BASE) {
    /* Enable GPU0 IRQ. */
    enable_gpu_irq(irq - BCM2835_INT_GPU0BASE, BCM2835_INTC_IRQ1ENABLE);
  } else if (irq < BCM2835_INT_BASICBASE) {
    /* Enable GPU1 IRQ. */
    enable_gpu_irq(irq - BCM2835_INT_GPU1BASE, BCM2835_INTC_IRQ2ENABLE);
  } else {
    /* Enable base IRQ. */
    enable_gpu_irq(irq - BCM2835_INT_BASICBASE, BCM2835_INTC_IRQBENABLE);
  }
}
```

Listing 40: (bcm2835\_rootdev.c [57])

To enable a single interrupt, we set the corresponding bit in the enable register **①**.

Listing 41: (bcm2835\_rootdev.c [57])

Disabling interrupts works analogously.

For more information about interrupts, see Section 2.1.3.
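On the BCM2835 family interrupt controller [15], the enable and disable registers act as write-1-to-set and write-1-to-clear: writing a 1 to a bit of an enable register enables that interrupt, writing a 1 to the matching bit of the disable register disables it, and 0 bits leave the state unchanged. This is why enabling or disabling one interrupt is a single write, without a read-modify-write sequence. The sketch below simulates such a register pair in software; intc\_bank\_t and its functions are hypothetical stand-ins for the real memory-mapped registers and the bus\_space accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for one bank of the interrupt controller. */
typedef struct {
  uint32_t enabled; /* the enable mask kept internally by the hardware */
} intc_bank_t;

/* Model of a write to an "Enable IRQs" register: 1 bits set, 0 bits ignored. */
static void intc_write_enable(intc_bank_t *b, uint32_t value) {
  b->enabled |= value;
}

/* Model of a write to a "Disable IRQs" register: 1 bits clear, 0 bits ignored. */
static void intc_write_disable(intc_bank_t *b, uint32_t value) {
  b->enabled &= ~value;
}

/* Enabling or disabling one line is a single write of a one-hot mask. */
static void enable_irq_line(intc_bank_t *b, unsigned irq) {
  assert(irq < 32);
  intc_write_enable(b, UINT32_C(1) << irq);
}

static void disable_irq_line(intc_bank_t *b, unsigned irq) {
  assert(irq < 32);
  intc_write_disable(b, UINT32_C(1) << irq);
}
```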

### 4.8.2 Timer

A timer is necessary to run periodic tasks, e.g. the scheduler. Here I show a simple implementation of a driver for the timer described in Section 2.1.4.

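First, the period is converted into a number of counter ticks (state->step) by multiplying it by the timer frequency. Then we read the current value of the counter ①, set the compare value for the next tick ② and finally enable the timer ③.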

```
static int arm_timer_start(timer_t *tm, unsigned flags __unused,
                           const bintime_t start __unused,
                           const bintime_t period) {
  arm_timer_state_t *state = ((device_t *)tm->tm_priv)->state;
  state->step = bintime_mul(period, tm->tm_frequency).sec;

  WITH_INTR_DISABLED {
    uint64_t count = READ_SPECIALREG(cntpct_el0);         /* ① */
    WRITE_SPECIALREG(cntp_cval_el0, count + state->step); /* ② */
    WRITE_SPECIALREG(cntp_ctl_el0, CNTCTL_ENABLE);        /* ③ */
  }

  return 0;
}
```

Listing 42: (timer.c [58])

To stop the timer, it is enough to write CNTCTL\_DISABLE to the cntp\_ctl\_el0 register.

```
static int arm_timer_stop(timer_t *tm) {
  WRITE_SPECIALREG(cntp_ctl_el0, CNTCTL_DISABLE);
  return 0;
}
```

Listing 43: (timer.c [58])

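To get the current time, we read the current value of the counter **①** and convert it to the bintime format used by the machine-independent part of Mimiker. The 64-bit counter value is multiplied by the tick period (tm\_min\_period) in two 32-bit halves; the product for the high half is then shifted left by 32 bits and added to the product for the low half.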

```
static bintime_t arm_timer_gettime(timer_t *tm) {
  uint64_t count = READ_SPECIALREG(cntpct_el0); /* ① */
  bintime_t res = bintime_mul(tm->tm_min_period, (uint32_t)count);
  bintime_t high_bits = bintime_mul(tm->tm_min_period,
                                    (uint32_t)(count >> 32));
  bintime_add_frac(&res, (high_bits.frac << 32));
  res.sec += (high_bits.sec << 32) + (high_bits.frac >> 32);
  return res;
}
```

Listing 44: (timer.c [58])
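The sketch below re-implements the two conversions used by this driver, assuming that bintime\_t is laid out as in FreeBSD (whole seconds plus a 64-bit binary fraction of a second): turning a period into counter ticks, as arm\_timer\_start does, and turning a 64-bit counter value into a bintime via two 32-bit halves, as arm\_timer\_gettime does. The helper bodies follow the FreeBSD-style bintime multiplication; they are an illustration, not a copy of Mimiker's headers, and period\_to\_ticks and ticks\_to\_bintime are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: time = sec + frac / 2^64 seconds. */
typedef struct {
  int64_t sec;
  uint64_t frac;
} bintime_t;

/* Multiply a bintime by a 32-bit integer without overflowing frac. */
static bintime_t bintime_mul(bintime_t bt, uint32_t x) {
  uint64_t p1 = (bt.frac & 0xffffffffu) * x;
  uint64_t p2 = (bt.frac >> 32) * x + (p1 >> 32);
  bintime_t r;
  r.sec = bt.sec * x + (int64_t)(p2 >> 32);
  r.frac = (p2 << 32) | (p1 & 0xffffffffu);
  return r;
}

/* Add a fraction of a second, carrying into sec on overflow. */
static void bintime_add_frac(bintime_t *bt, uint64_t x) {
  uint64_t old = bt->frac;
  bt->frac += x;
  if (bt->frac < old)
    bt->sec++;
}

/* Counter ticks per period, as computed in arm_timer_start. */
static uint64_t period_to_ticks(bintime_t period, uint32_t freq) {
  return (uint64_t)bintime_mul(period, freq).sec;
}

/* 64-bit tick count -> bintime, as in arm_timer_gettime. */
static bintime_t ticks_to_bintime(bintime_t min_period, uint64_t count) {
  bintime_t res = bintime_mul(min_period, (uint32_t)count);
  bintime_t high = bintime_mul(min_period, (uint32_t)(count >> 32));
  /* `high` must be scaled by 2^32: shift it left by 32 bits across sec:frac */
  bintime_add_frac(&res, high.frac << 32);
  res.sec += (high.sec << 32) + (int64_t)(high.frac >> 32);
  return res;
}
```

For example, assuming a 19.2 MHz counter, half a second corresponds to 9600000 ticks. Periods that are not exact binary fractions (such as 1/100 s) are stored with a rounded-down fraction, so the computed step can come out one tick short.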

The most important part is the interrupt handler. It triggers the actions that depend on the current time **①** and, at the end, updates the time of the next tick **②**.

```
static intr_filter_t arm_timer_intr(void *data /* device_t* */) {
  arm_timer_state_t *state = ((device_t *)data)->state;

  tm_trigger(&state->timer); /* ① */
  uint64_t prev = READ_SPECIALREG(cntp_cval_el0);
  WRITE_SPECIALREG(cntp_cval_el0, prev + state->step); /* ② */

  return IF_FILTERED;
}
```

Listing 45: timer interrupt handler (timer.c [58])
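arm\_timer\_intr computes the next deadline from the previous compare value (cntp\_cval\_el0) rather than from the current counter. The small simulation below, with a made-up constant interrupt latency, illustrates the consequence: the deadlines stay on the fixed grid start + n·step even though every handler runs late, whereas re-arming relative to the counter accumulates the latency. All names and numbers here are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Next deadline as in arm_timer_intr: previous compare value + step. */
static uint64_t next_from_prev(uint64_t prev_cval, uint64_t now, uint64_t step) {
  (void)now; /* the current counter value is deliberately ignored */
  return prev_cval + step;
}

/* Alternative that re-arms relative to the time the handler actually ran. */
static uint64_t next_from_now(uint64_t prev_cval, uint64_t now, uint64_t step) {
  (void)prev_cval;
  return now + step;
}

typedef uint64_t (*rearm_fn)(uint64_t prev_cval, uint64_t now, uint64_t step);

/* Simulate `n` ticks where every handler runs `latency` ticks after firing. */
static uint64_t simulate(rearm_fn rearm, uint64_t start, uint64_t step,
                         uint64_t latency, int n) {
  uint64_t cval = start + step;
  for (int i = 0; i < n; i++) {
    uint64_t now = cval + latency; /* handler observes the counter late */
    cval = rearm(cval, now, step);
  }
  return cval;
}
```

With a step of 1000 and a latency of 7, ten ticks end at 11000 with the first variant and drift to 11070 with the second. One property of the first variant follows from the generic timer semantics: if a handler is delayed by more than a whole step, the new compare value is already in the past, so the interrupt fires again immediately and the missed ticks are caught up.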

### 4.8.3 PL011

For the PL011 device described in Section 2.2, we only need to implement the following functions.

pl011\_rx\_ready checks whether the receive FIFO holds data, i.e. whether the RXFE (receive FIFO empty) flag in the flag register is clear.

```
static bool pl011_rx_ready(void *state) {
  pl011_state_t *pl011 = state;
  return (bus_read_4(pl011->regs, PL01XCOM_FR) & PL01X_FR_RXFE) == 0;
}
```

Listing 46: (pl011.c [59])

pl011\_getc reads a character from the data register. The receive FIFO must not be empty, which can be checked with pl011\_rx\_ready.

```
static uint8_t pl011_getc(void *state) {
  pl011_state_t *pl011 = state;
  return bus_read_4(pl011->regs, PL01XCOM_DR);
}
```

Listing 47: (pl011.c [59])

pl011\_tx\_ready checks whether the transmit FIFO has room, i.e. whether the TXFF (transmit FIFO full) flag is clear.

```
static bool pl011_tx_ready(void *state) {
  pl011_state_t *pl011 = state;
  return (bus_read_4(pl011->regs, PL01XCOM_FR) & PL01X_FR_TXFF) == 0;
}
```

Listing 48: (pl011.c [59])

pl011\_tx\_enable enables the transmitter by setting the TXE bit in the control register.

```
static void pl011_tx_enable(void *state) {
  pl011_state_t *pl011 = state;
  set4(pl011->regs, PL011COM_CR, PL011_CR_TXE);
}
```

Listing 49: (pl011.c [59])

pl011\_tx\_disable disables the transmitter by clearing the TXE bit in the control register.

```
static void pl011_tx_disable(void *state) {
  pl011_state_t *pl011 = state;
  clr4(pl011->regs, PL011COM_CR, PL011_CR_TXE);
}
```

Listing 50: (pl011.c [59])

All of these functions are thin wrappers that read a register or set and clear bits in it.
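A typical way to combine such primitives is polled (busy-wait) I/O: wait until the transmit FIFO has room before writing a character, and wait until the receive FIFO is non-empty before reading one. The sketch below shows that pattern against a simulated one-character loopback UART. The uart\_ops\_t table, the simulation and the polled helpers are hypothetical; they only mimic the roles of the PL011 functions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation table mirroring the roles of the PL011 functions. */
typedef struct {
  bool (*rx_ready)(void *state);
  uint8_t (*get_char)(void *state);
  bool (*tx_ready)(void *state);
  void (*put_char)(void *state, uint8_t c);
} uart_ops_t;

/* Busy-wait until there is room in the transmit FIFO, then send `c`. */
static void uart_putc_polled(const uart_ops_t *ops, void *state, uint8_t c) {
  while (!ops->tx_ready(state))
    ;
  ops->put_char(state, c);
}

/* Busy-wait until the receive FIFO holds a character, then return it. */
static uint8_t uart_getc_polled(const uart_ops_t *ops, void *state) {
  while (!ops->rx_ready(state))
    ;
  return ops->get_char(state);
}

/* Loopback simulation: a one-character "FIFO" shared by TX and RX. */
typedef struct {
  bool full;
  uint8_t data;
} sim_uart_t;

static bool sim_rx_ready(void *s) { return ((sim_uart_t *)s)->full; }
static bool sim_tx_ready(void *s) { return !((sim_uart_t *)s)->full; }

static uint8_t sim_get_char(void *s) {
  sim_uart_t *u = s;
  u->full = false;
  return u->data;
}

static void sim_put_char(void *s, uint8_t c) {
  sim_uart_t *u = s;
  u->data = c;
  u->full = true;
}

static const uart_ops_t sim_ops = {sim_rx_ready, sim_get_char, sim_tx_ready,
                                   sim_put_char};
```

Polling is simple and needs no interrupts, which makes it convenient for early console output; the enable and disable functions above exist for the interrupt-driven path instead.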

## 4.9 Summary

In this chapter we have seen the most important pieces of code of the AArch64 port and the Raspberry Pi 3 drivers. We started with the glue between user space and kernel space, which is used by every single program. Next we went through the MMU-related code, which allows us to use virtual memory as an abstraction over resources. After that we saw kernel bootstrapping and the initialization of the kernel address sanitizer. To conclude the core kernel code, we became familiar with exception handlers and context switching. Finally, we saw the Raspberry Pi 3 drivers that implement our hardware abstraction layer.

# Chapter 5

# Mimiker on Raspberry Pi 3

In this chapter I show how to run Mimiker on the Raspberry Pi 3 board with an ARMv8 Cortex-A53 CPU.

Everything was tested with Debian 10 as the build machine [18], a Raspberry Pi 3 Model B, a MicroSD card and a Segger J-Link EDU as the hardware debugger [20].

## 5.1 Installation

### 5.1.1 Toolchain

The repository contains a toolchain directory. Each of its subdirectories has a Makefile, so you only need to run make from the console; this builds Debian (.deb) packages with the toolchain. Please be patient: compiling the toolchain takes a long time.

These packages must be installed with the dpkg [19] command; currently they are supported only on Debian [18].

### 5.1.2 Configuration

Mimiker has a few build options that need to be set before compilation. These are:

- BOARD: build an image for the given board
- CLANG: use clang instead of gcc as the C compiler
- LOCKDEP: build with the lock dependency validator
- KASAN: build with the kernel address sanitizer
- KGPROF: build with the kernel profiler

BOARD needs to be set to rpi3 in config.mk. For now, only the KASAN option is well tested.

### 5.1.3 Compilation

You only need to run the make command.

### 5.1.4 Final installation

I highly recommend using the SD card image of the Raspbian operating system [29]. That image already contains the necessary firmware and configuration files on its boot partition.

Copy the kernel (mimiker.img) and the initrd to the memory card. Then modify the kernel, arm\_64bit and initramfs options in config.txt and set kernel\_address to 0x200000, according to [6].

The memory card should be formatted in the standard way: only a single FAT32 partition is needed.

Note that because of the problems discussed in Sections 5.2.2 and 5.2.4, and because Mimiker is under active development without automatic tests on Raspberry Pi 3, Mimiker may crash after launch.

### 5.1.5 Debugging

#### Hardware debugger

For debugging I used the Segger J-Link EDU [20].

It implements the JTAG (Joint Test Action Group) standard for verifying designs and testing printed circuit boards after manufacture.

I used that tool with the OpenOCD software. Before we can start debugging, we need to connect the J-Link to the Raspberry Pi 3. You can use the diagram in Figure 5.1, taken from [7]:



Figure 5.1: J-Link connection diagram for Raspberry Pi 3 [7]

Figure 5.2 shows a photo of my setup:



Figure 5.2: Raspberry Pi 3 with J-Link and UART

It includes an additional UART connected on the right side and power supplied via micro USB.

#### OpenOCD

For debugging you can use OpenOCD (it is included in our toolchain).

openocd -c "" -f jlink.cfg -f rpi3\_64.cfg

With the following configuration for jlink.cfg:

adapter driver jlink

and following configuration for rpi3\_64.cfg:

```
transport select jtag

reset_config trst_and_srst
adapter speed 1000
jtag_ntrst_delay 500

if { [info exists CHIPNAME] } {
  set _CHIPNAME $CHIPNAME
} else {
  set _CHIPNAME rpi3
}

if { [info exists DAP_TAPID] } {
  set _DAP_TAPID $DAP_TAPID
} else {
  set _DAP_TAPID 0x4ba00477
}

jtag newtap $_CHIPNAME cpu -expected-id $_DAP_TAPID -irlen 4
dap create $_CHIPNAME.dap -chain-position $_CHIPNAME.cpu

set _TARGETNAME_0 $_CHIPNAME.cpu0
set _TARGETNAME_1 $_CHIPNAME.cpu1
set _TARGETNAME_2 $_CHIPNAME.cpu2
set _TARGETNAME_3 $_CHIPNAME.cpu3

set _CTINAME_0 $_CHIPNAME.cti0
set _CTINAME_1 $_CHIPNAME.cti1
set _CTINAME_2 $_CHIPNAME.cti2
set _CTINAME_3 $_CHIPNAME.cti3

# The ARM Cross-Trigger Interface (CTI)
cti create $_CTINAME_0 -dap $_CHIPNAME.dap -ap-num 0 -baseaddr 0x80018000
target create $_TARGETNAME_0 aarch64 -dap $_CHIPNAME.dap -coreid 0 \
    -dbgbase 0x80010000 -cti $_CTINAME_0

cti create $_CTINAME_1 -dap $_CHIPNAME.dap -ap-num 0 -baseaddr 0x80019000
target create $_TARGETNAME_1 aarch64 -dap $_CHIPNAME.dap -coreid 1 \
    -dbgbase 0x80012000 -cti $_CTINAME_1

cti create $_CTINAME_2 -dap $_CHIPNAME.dap -ap-num 0 -baseaddr 0x8001A000
target create $_TARGETNAME_2 aarch64 -dap $_CHIPNAME.dap -coreid 2 \
    -dbgbase 0x80014000 -cti $_CTINAME_2

cti create $_CTINAME_3 -dap $_CHIPNAME.dap -ap-num 0 -baseaddr 0x8001B000
target create $_TARGETNAME_3 aarch64 -dap $_CHIPNAME.dap -coreid 3 \
    -dbgbase 0x80016000 -cti $_CTINAME_3

$_TARGETNAME_0 configure -event reset-assert-post "aarch64 dbginit"
$_TARGETNAME_0 configure -event gdb-attach { halt }
$_TARGETNAME_1 configure -event reset-assert-post "aarch64 dbginit"
$_TARGETNAME_1 configure -event gdb-attach { halt }
$_TARGETNAME_2 configure -event reset-assert-post "aarch64 dbginit"
$_TARGETNAME_2 configure -event gdb-attach { halt }
$_TARGETNAME_3 configure -event reset-assert-post "aarch64 dbginit"
$_TARGETNAME_3 configure -event gdb-attach { halt }
```

After that you can use gdb for remote debugging.

```
aarch64-mimiker-elf-gdb sys/mimiker.elf \
   -ex 'set architecture aarch64' \
   -ex 'file sys/mimiker.elf' \
   -ex 'target extended-remote localhost:3333' \
   -ex 'monitor reset init' \
   -ex 'monitor targets rpi3.cpu0' \
   -ex 'monitor load_image sys/mimiker.img 0x200000 bin' \
   -ex 'monitor reg pc 0x200000' \
   -ex 'load sys/mimiker.elf' \
   -ex 'source .gdbinit'
```

For more information about debugging without operating system support and an explanation of the commands used, see [7].

## 5.2 Challenges

Here I want to mention the most troublesome problems I encountered when running Mimiker on Raspberry Pi 3. Most of them are caused by differences between the QEMU emulator and real hardware. QEMU does not emulate every detail of the Raspberry Pi 3, so most of these errors cannot be detected by our CI system. They require installation and debugging on physical hardware, which is a more complicated process than development in a virtualized environment.

### 5.2.1 Boot process

On the QEMU emulator we can boot directly from an ELF (Executable and Linkable Format) file, but this does not work on the physical machine: there, the kernel image needs to be a raw binary blob. We can produce it with the objcopy command:

objcopy -O binary mimiker.elf mimiker.img

### 5.2.2 Destroying x0

The most puzzling error that I found concerns the first instruction of the kernel code:

0000000000200000 \<\_start>:

| Address | Encoding | Instruction | Operands |
|---------|----------|-------------|----------|
| 200000  | 14000010 | b           | 200040 \<\_start+0x40> |
| …       |          |             |          |
| 200010  | 000af75c | .word       | 0x000af75c |
| 200014  | 00000000 | .word       | 0x00000000 |
| 200018  | 00000002 | .word       | 0x00000002 |
| …       |          |             |          |
| 200038  | 644d5241 | .word       | 0x644d5241 |
| 20003c  | 00000000 | .word       | 0x00000000 |
| 200040  | aa0003f3 | mov         | x19, x0 |
| 200044  | d53800a3 | mrs         | x3, mpidr\_el1 |
| 200048  | 92400463 | and         | x3, x3, #0x3 |
| 20004c  | f100007f | cmp         | x3, #0x0 |
| 200050  | 54000001 | b.ne        | 200050 \<\_start+0x50> // b.any |
| 200054  | 1000c3e3 | adr         | x3, 2018d0 \<\_bootmem\_end> |
| 200058  | 9100007f | mov         | sp, x3 |
| 20005c  | 940004f7 | bl          | 201438 \<aarch64\_init> |
| 200060  | 9100001f | mov         | sp, x0 |
| 200064  | aa1303e0 | mov         | x0, x19 |
| 200068  | 9400050e | bl          | 2014a0 \<\_board\_stack\_veneer> |
| 20006c  | 9100001f | mov         | sp, x0 |
| 200070  | 14000512 | b           | 2014b8 \<\_board\_init\_veneer> |

Before the first instruction is executed, x0 contains the address of the ATAGs or the DTB. However, for an unknown reason, the first branch instruction destroys x0 and puts the value of the program counter in that register. We can work around this by hardcoding that address in the kernel, but it is not the best solution.

### 5.2.3 Address alignment

During early MMU initialization we set the SCTLR\_SA, SCTLR\_SA0 and SCTLR\_A bits of sctlr\_el1. They enable alignment checking of the kernel stack pointer, the user stack pointer and all data memory accesses, respectively. QEMU does not implement these checks. As a result, we got alignment exceptions on Raspberry Pi 3, because our implementation of memcpy did not meet these requirements. That issue has since been resolved.
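With SCTLR\_A set, any data access whose address is not a multiple of the access size raises an alignment fault. A common way for a copy routine to respect that rule is to copy single bytes until the destination is aligned, and to use word-sized accesses only when the source shares the same misalignment. The sketch below illustrates this general technique in C; it is not Mimiker's memcpy (whose fix is not shown here), the name aligned\_memcpy is hypothetical, and, like typical kernel code, it assumes compilation with -fno-strict-aliasing because it accesses byte buffers through uint64\_t pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `n` bytes without ever issuing an 8-byte access at an address
 * that is not 8-byte aligned (safe with SCTLR_EL1.A set). */
static void *aligned_memcpy(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Word copies are only possible when both pointers share alignment. */
  if (((uintptr_t)d & 7) == ((uintptr_t)s & 7)) {
    while (n > 0 && ((uintptr_t)d & 7) != 0) { /* head: byte by byte */
      *d++ = *s++;
      n--;
    }
    while (n >= 8) { /* body: aligned 64-bit accesses */
      *(uint64_t *)d = *(const uint64_t *)s;
      d += 8;
      s += 8;
      n -= 8;
    }
  }
  while (n > 0) { /* tail, or everything if alignments differ */
    *d++ = *s++;
    n--;
  }
  return dst;
}
```

A faster variant for differently aligned buffers would combine aligned loads with shifts, but the byte fallback is enough to show the rule.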

### 5.2.4 Cache control

In the original implementation of Mimiker for the MIPS architecture, we did not care about cache control: QEMU does not emulate caches, and we never tried to run Mimiker on a physical Malta board.

It was not a problem for the AArch64 implementation running on the QEMU emulator either: everything works without any cache maintenance. Unfortunately, on real hardware caches matter. It appears that running multiple threads in different address spaces causes cache inconsistencies for user-space processes: one process sees cached data left by another process that previously ran on the same core. This is a serious problem, because we cannot reproduce it in a virtualized environment, so our tests cannot detect this kind of bug.

# Chapter 6

# Summary

Adapting an operating system to a new architecture is a long journey that requires knowledge of most parts of the kernel. Working in an emulated environment is not the same as working with real hardware, because an emulator usually does not implement all hardware details; real hardware does not forgive mistakes that an emulator ignores. The first port is an additional challenge, because it requires separating the machine-dependent part of the kernel and creating a machine-independent abstraction over the hardware.

In my thesis I prepared the toolchain and infrastructure for developing Mimiker on the AArch64 architecture and the Raspberry Pi 3 board. I separated the machine-dependent part of the MIPS code from the kernel and wrote the following components:

- kernel bootstrapping
- context switching
- exception handler
- pmap
- KASAN
- copy routines
- syscall handler
- timer driver
- UART driver

The results of that work are available in the Mimiker repository [17]. Now we have:

- the original MIPS implementation
- a fully working AArch64 port
- the drivers needed to run on the Raspberry Pi 3 board
- automated CI tests for AArch64, which pass
- an integrated kernel address sanitizer
- infrastructure for future work on other architectures

A nice side effect is that anybody with basic knowledge can run Mimiker at home.



Tetris game running on the Raspberry Pi 3 build.

The well-known Tetris 6 game is an example of a program that uses complex abstractions (for example, system calls and the terminal subsystem). We can now run that program on the AArch64 build with full functionality.

## 6.1 Future work

There are two big achievements of my work that can be further developed in the future.

The first is support for a multi-core CPU, which gives an opportunity to add SMP support to Mimiker. This requires changes in multiple kernel subsystems. The most affected one will be the scheduler, but it is not the only one. The VFS subsystem has substantial locking problems: running multiple processes that use the file system causes deadlocks. The locking mechanisms also need to be adapted, e.g. spinlocks assume that there is only a single CPU core.

The second is bootstrapping Mimiker on a physical board. We know that QEMU is not perfect and hides some of our bugs. We may invest time in a test infrastructure that uses multiple Raspberry Pi 3 boards connected to a development server, with a setup similar to the one used in my work. It should allow us to track changes in the machine-dependent subsystems of Mimiker without QEMU's quirks.

Now that the AArch64 port is as mature as the MIPS version, we can start implementing drivers for the remaining devices available on the Raspberry Pi 3. Drivers for USB and video will make it possible to use the Raspberry Pi 3 as a multimedia device running the Mimiker operating system.

# Bibliography

- [1] C99 standard (ISO/IEC 9899:1999)
- [2] ARM Cortex-A Series Version: 1.0 Programmer's Guide for ARMv8-A
- [3] ARMv8-A Address Translation Version: 1.0
- [4] MARSHALL KIRK MCKUSICK, GEORGE V. NEVILLE-NEIL, ROBERT N.M. WATSON, The Design and Implementation of the FreeBSD Operating System, Second Edition
- [5] FREEBSD MANPAGES, pmap(9) https://www.freebsd.org/cgi/man.cgi?query=pmap&apropos=0&sektion=0&manpath=FreeBSD+13.0-current&arch=default&format=html
- [6] RASPBERRY PI DOCUMENTATION, config.txt https://www.raspberrypi.org/documentation/configuration/config-txt/
- [7] MICHAŁ BARNAŚ, Hardware Abstraction Layer For ARMv8 Processors
- [8] NETBSD MANPAGES, signal(7) https://man.netbsd.org/signal.7
- [9] NETBSD MANPAGES, kill(2) https://man.netbsd.org/kill.2
- [10] BRYANT, O'HALLARON, Computer Systems A Programmer's Perspective, Second Edition
- [11] CHARLES D. CRANOR, Design and implementation of the UVM virtual memory system
- [12] JAKUB PIECUCH, Implementation of the Terminal Subsystem and Job Control in the Mimiker Operating System
- [13] JULIAN PSZCZOŁOWSKI, Integrating the Kernel Address Sanitizer into the Mimiker Operating System

- [14] ANDREW S. TANENBAUM, HERBERT BOS, *Modern Operating Systems*, Fourth Edition
- [15] BROADCOM BCM2837 ARM Peripherals
- [16] MIMIKER WEB PAGE, https://mimiker.ii.uni.wroc.pl
- [17] MIMIKER REPOSITORY, https://github.com/cahirwpz/mimiker
- [18] DEBIAN, The Universal Operating System https://www.debian.org/
- [19] DPKG SUITE Debian manpages https://manpages.debian.org/buster/dpkg/dpkg.1.en.html
- [20] SEGGER WEB PAGE https://www.segger.com/products/debug-probes/j-link/
- [21] BCM2837 DOCUMENTATION https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md
- [22] ARM CORTEX-A53 MPCORE PROCESSOR TECHNICAL REFERENCE MANUAL https://developer.arm.com/documentation/ddi0500/j/
- [23] NETBSD MANPAGES, queue(3) https://man.netbsd.org/queue.3
- [24] NETBSD MANPAGES, bcopy(3) https://man.netbsd.org/bcopy.3
- [25] BROADCOM WEBPAGE https://www.broadcom.com/
- [26] MIMIKER MUTEX API https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/sys/mutex.h?r=bfb3bae9
- [27] MIMIKER SPIN LOCK API https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/sys/spinlock.h?r=27b8c19a
- [28] MIMIKER CONDITIONAL VARIABLE API https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/sys/condvar.h?r=71604845
- [29] RASPBERRY PI OPERATING SYSTEM IMAGES https://www.raspberrypi.org/software/operating-systems/
- [30] DEVICETREE SPECIFICATION https://github.com/devicetree-org/devicetree-specification


- [31] MIMIKER SOURCE CODE include/sys/ucontext.h https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/sys/ucontext.h?r=4ba50916
- [32] MIMIKER SOURCE CODE include/aarch64/mcontext.h https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/aarch64/mcontext.h?r=731f9b87
- [33] MIMIKER SOURCE CODE sys/mips/pmap.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/mips/pmap.c?r=d5439d54
- [34] MIMIKER SOURCE CODE sys/mips/boot.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/mips/boot.c?r=2609772a
- [35] MIMIKER SOURCE CODE sys/kern/kasan.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/kern/kasan.c?r=fc47d4fd
- [36] MIMIKER SOURCE CODE include/dev/uart.h https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/dev/uart.h?r=2609772a
- [37] MIMIKER SOURCE CODE sys/drv/uart.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/drv/uart.c?r=2609772a
- [38] MIMIKER SOURCE CODE sys/kern/uart\_tty.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/kern/uart\_tty.c?r=2609772a
- [39] MIMIKER SOURCE CODE sys/aarch64/copy.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/copy.S?r=8dda89f3
- [40] MIMIKER SOURCE CODE include/aarch64/syscall.h https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/aarch64/syscall.h?r=6da6392f
- [41] MIMIKER SOURCE CODE include/sys/sysent.h https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/include/sys/sysent.h?r=4cae32de
- [42] MIMIKER SOURCE CODE sys/kern/syscalls.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/kern/syscalls.c?r=8ae97262
- [43] MIMIKER SOURCE CODE sys/aarch64/trap.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/trap.c?r=db7eaf68
- [44] MIMIKER SOURCE CODE lib/csu/aarch64/crt0.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/lib/csu/aarch64/crt0.S?r=4cb7508a
- [45] MIMIKER SOURCE CODE sys/aarch64/sigcode.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/sigcode.S?r=9c185874
- [46] MIMIKER SOURCE CODE sys/aarch64/signal.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/signal.c?r=9c185874
- [47] MIMIKER SOURCE CODE lib/libc/gen/aarch64/\_setjmp.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/lib/libc/gen/aarch64/\_setjmp.S?r=f0de79b8
- [48] MIMIKER SOURCE CODE lib/libc/gen/aarch64/longjmp.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/lib/libc/gen/aarch64/longjmp.c?r=1d93d219
- [49] MIMIKER SOURCE CODE lib/libc/gen/aarch64/setjmp.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/lib/libc/gen/aarch64/setjmp.S?r=96506ee9
- [50] MIMIKER SOURCE CODE lib/libc/gen/aarch64/sigsetjmp.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/lib/libc/gen/aarch64/sigsetjmp.S?r=f0de79b8
- [51] MIMIKER SOURCE CODE sys/aarch64/pmap.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/pmap.c?r=fd5537f8
- [52] MIMIKER SOURCE CODE sys/aarch64/boot.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/boot.c?r=fd5537f8
- [53] MIMIKER SOURCE CODE sys/aarch64/start.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/start.S?r=fd5537f8
- [54] MIMIKER SOURCE CODE sys/aarch64/board.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/board.c?r=fd5537f8
- [55] MIMIKER SOURCE CODE sys/aarch64/evec.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/evec.S?r=1f315016
- [56] MIMIKER SOURCE CODE sys/aarch64/switch.S https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/switch.S?r=96506ee9
- [57] MIMIKER SOURCE CODE sys/drv/bcm2835\_rootdev.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/drv/bcm2835\_rootdev.c?r=2609772a
- [58] MIMIKER SOURCE CODE sys/aarch64/timer.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/aarch64/timer.c?r=2609772a
- [59] MIMIKER SOURCE CODE sys/drv/pl011.c https://mimiker.ii.uni.wroc.pl/source/xref/mimiker/sys/drv/pl011.c?r=2609772a