---
title: 2024 Autumn/Winter Open-Source OS Training Camp Stage 3 Summary - Yang Xuecong
date: 2024-11-28 10:36:20
categories:
  - summary
tags:
  - author:dosconio
  - 2024秋冬季开源操作系统训练营
  - 第三阶段总结报告
---

Having completed the third stage of the 2024 Autumn/Winter Open-Source Operating System Training Camp, I would like to share my experience and a summary with you.

There are three system structures to compare:
- Unikernel
  - Single privilege level, physical address space, kernel and user apps combined into one image
- Macrokernel
  - Multiple privilege levels, paged address spaces, isolation between the kernel and user apps
- Hypervisor (virtualization)
  - Isolation between the host and the guests

The key to the component-based design is the Cargo *feature*. Features can be set in the `Cargo.toml` file and through Makefile environment variables, and they decide which code is compiled and linked. A feature works much like an advanced `#if ... #endif` switch.
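
As a rough C analogy of that `#if ... #endif` comparison (not ArceOS code), the sketch below uses two hypothetical macros, `FEATURE_ALLOC` and `FEATURE_PAGING`, in the role of Cargo features. Only the components whose switch is defined are compiled and linked. In a Rust crate the same gating would be written with `#[cfg(feature = "...")]`.

```c
#include <stdio.h>

/* Hypothetical feature switches. A build system could also pass them as -D flags. */
#ifndef FEATURE_ALLOC
#define FEATURE_ALLOC 1
#endif
/* FEATURE_PAGING is deliberately left undefined, so that component is compiled out. */

#define COMPONENT_ALLOC  (1u << 0)
#define COMPONENT_PAGING (1u << 1)

#if defined(FEATURE_ALLOC)
static void init_allocator(void) { puts("allocator initialized"); }
#endif

#if defined(FEATURE_PAGING)
static void init_paging(void) { puts("paging initialized"); }
#endif

/* Runs only the initializers that were compiled in and returns a bitmask of them. */
static unsigned init_components(void) {
    unsigned enabled = 0;
#if defined(FEATURE_ALLOC)
    init_allocator();
    enabled |= COMPONENT_ALLOC;
#endif
#if defined(FEATURE_PAGING)
    init_paging();
    enabled |= COMPONENT_PAGING;
#endif
    return enabled;
}
```

Building with `-DFEATURE_PAGING` would compile the paging initializer in as well, without any change to the source.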

### Unikernel

<!-- U1 -->

We use three stages to describe how a system evolves from a bare-metal program into a component-based system.
- Bare-metal program
  - Firmware + Bootloader
  - Initialize the special registers
  - Initialize the MMU (for paging)
  - Initialize the stack
  - Initialize the interrupt vector table
  - Initialize peripherals / devices
  - Transfer control to the `main` program
- Layered
  - Firmware + Bootloader (layer)
  - **Hardware initialization** (layer): special registers, MMU (for paging), stack, interrupt vector table
  - Peripherals (layer)
  - Transfer control to the `main` program (layer)
- Component-based
  - Firmware + Bootloader
  - **Hardware initialization** (HAL)
  - Runtime environment (RTE)
  - Transfer control to the `main` program

```C
void Reset_Handler(void) {
  __asm__ volatile(
    ".code 32 \n"
    "CPSID if \n" /* Mask interrupts */
    /* Put any cores other than 0 to sleep */
    "MRC p15, 0, R0, c0, c0, 5 \n" /* Read MPIDR */
    "ANDS R0, R0, #3 \n"
    "goToSleep: \n"
    "ITT NE \n" /* Needed in Thumb mode for the following WFINE instruction */
    "WFINE \n"
    "BNE goToSleep \n"
    /* Reset SCTLR settings */
    "MRC p15, 0, R0, c1, c0, 0 \n" /* Read CP15 System Control register */
    "BIC R0, R0, #(0x1 << 12) \n" /* Clear I bit 12 to disable the I-cache */
    "BIC R0, R0, #(0x1 << 2) \n" /* Clear C bit 2 to disable the D-cache */
    "BIC R0, R0, #0x1 \n" /* Clear M bit 0 to disable the MMU */
    "BIC R0, R0, #(0x1 << 11) \n" /* Clear Z bit 11 to disable branch prediction */
    "BIC R0, R0, #(0x1 << 13) \n" /* Clear V bit 13 to disable high vectors (HIVECS) */
    "BIC R0, R0, #(0x1 << 29) \n" /* Clear AFE bit 29 to enable the full range of access permissions */
    "ORR R0, R0, #(0x1 << 30) \n" /* Set TE bit 30 to take exceptions in Thumb mode */
    "MCR p15, 0, R0, c1, c0, 0 \n" /* Write value back to CP15 System Control register */
    "ISB \n"
    /* Configure ACTLR */
    "MRC p15, 0, r0, c1, c0, 1 \n" /* Read CP15 Auxiliary Control Register */
    "ORR r0, r0, #(1 << 1) \n" /* Enable L2 prefetch hint (UNK/WI since r4p1) */
    "MCR p15, 0, r0, c1, c0, 1 \n" /* Write CP15 Auxiliary Control Register */
    /* Set Vector Base Address Register (VBAR) to point to this application's vector table */
    "LDR R0, =Vectors \n"
    "MCR p15, 0, R0, c12, c0, 0 \n"
    "ISB \n"
    /* ... */
    "CPSIE if \n" /* Unmask interrupts */
    "BL __libc_init_array \n"
    "BL main \n"
    /* ... */
  );
}
```

The code above is excerpted from the reset handler in the STM32MP13 HAL. It initializes the stack, the interrupt vector table, and the critical system registers, and finally transfers control to the `main` program.
Its logic is similar to rCore's startup code, and reading it helps us better understand how MCUs, MPUs, and CPUs start up.

---

<!-- U2 -->

Like rCore, we need to implement the memory interfaces for heap operations while avoiding memory leaks and memory fragmentation.
This helps us manage memory more efficiently and makes future extension easier.

There are two kinds of memory allocation functions: one is page-based (`palloc`), and the other is byte-based (`balloc`). The byte allocator ("ByteAlloc") is built on top of the page allocator ("PageAlloc").

> If "PageAlloc" were built on top of "ByteAlloc", page alignment would be difficult to guarantee.

Memory allocation algorithms:
- TLSF (Two-Level Segregated Fit)
- Buddy
- Slab
- Bump
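
A minimal sketch of this layering, assuming a toy setup. A bitmap page allocator over a static, page-aligned array stands in for `palloc`. A bump byte allocator, the simplest algorithm in the list, requests a fresh page from it whenever the current page runs out. The names `page_alloc` and `byte_alloc` and the sizes are hypothetical. The bump allocator never frees individual allocations.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 16u

/* Backing memory for the toy page allocator, aligned to a page boundary. */
static uint8_t memory[NUM_PAGES * PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
static uint8_t page_used[NUM_PAGES];

/* Page allocator (hypothetical): returns one free, page-aligned page or NULL. */
static void *page_alloc(void) {
    for (size_t i = 0; i < NUM_PAGES; i++) {
        if (!page_used[i]) {
            page_used[i] = 1;
            return &memory[i * PAGE_SIZE];
        }
    }
    return NULL;
}

/* Rounds x up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t x, uintptr_t align) {
    return (x + align - 1) & ~(align - 1);
}

/* Bump byte allocator layered on page_alloc: [bump_next, bump_end) is free. */
static uintptr_t bump_next, bump_end;

static void *byte_alloc(size_t size, size_t align) {
    /* This sketch only serves requests that fit inside a single page. */
    if (size == 0 || size > PAGE_SIZE || align == 0 || align > PAGE_SIZE ||
        (align & (align - 1)) != 0)
        return NULL;
    uintptr_t start = align_up(bump_next, align);
    if (start + size > bump_end) {
        /* Current page exhausted: take a fresh page from the page allocator. */
        uint8_t *page = page_alloc();
        if (page == NULL)
            return NULL;
        bump_next = (uintptr_t)page;
        bump_end = bump_next + PAGE_SIZE;
        start = bump_next; /* a fresh page satisfies any align <= PAGE_SIZE */
    }
    bump_next = start + size;
    return (void *)start;
}
```

Going the other way would be awkward: a page allocator built on this byte allocator would have to over-allocate and round up just to reach 4 KiB alignment.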

---

<!-- U3 -->

How to enable the paging mechanism:
1. Early in kernel startup, use a fixed, simple linear mapping for part of memory:
   - `0xffff_ffc0_8000_0000 ~ 0xffff_ffc0_c000_0000` $\rightarrow$ `0x8000_0000 ~ 0xc000_0000` (1 GiB)
   - Note that some address-related registers, such as SP, must be switched to the corresponding linear (high) virtual addresses
2. Then, if the `paging` feature is enabled, rebuild the complete page-table mapping.
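
The early mapping can be sketched as follows. The sketch assumes RISC-V Sv39, since the addresses above fit its sign-extended 39-bit layout, and 1 GiB block entries in the root page table. `PHYS_VIRT_OFFSET` is derived from the mapping above (`0xffff_ffc0_8000_0000 - 0x8000_0000`). Keeping an identity entry for the same range is a common companion step: code still running at physical addresses survives the moment paging is switched on. The table is only built in memory here. Writing `satp` and moving SP are left as comments, and all function names are hypothetical.

```c
#include <stdint.h>

/* Derived from the mapping above: 0xffff_ffc0_8000_0000 - 0x8000_0000. */
#define PHYS_VIRT_OFFSET 0xffffffc000000000ULL

/* Sv39 PTE flag bits (RISC-V privileged specification). */
#define PTE_V (1ULL << 0) /* valid */
#define PTE_R (1ULL << 1) /* readable */
#define PTE_W (1ULL << 2) /* writable */
#define PTE_X (1ULL << 3) /* executable */
#define PTE_G (1ULL << 5) /* global */
#define PTE_A (1ULL << 6) /* accessed */
#define PTE_D (1ULL << 7) /* dirty */

static uint64_t phys_to_virt(uint64_t pa) { return pa + PHYS_VIRT_OFFSET; }
static uint64_t virt_to_phys(uint64_t va) { return va - PHYS_VIRT_OFFSET; }

/* Index into the Sv39 root table: VA bits [38:30], one entry per 1 GiB. */
static unsigned root_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1ff); }

/* Leaf PTE for a 1 GiB block: PPN = pa >> 12, stored from bit 10 upward. */
static uint64_t gib_block_pte(uint64_t pa, uint64_t flags) {
    return ((pa >> 12) << 10) | flags;
}

static uint64_t boot_pt[512] __attribute__((aligned(4096)));

static void init_boot_page_table(void) {
    const uint64_t base = 0x80000000ULL; /* start of the 1 GiB block */
    const uint64_t flags = PTE_V | PTE_R | PTE_W | PTE_X | PTE_G | PTE_A | PTE_D;
    /* Identity entry: keeps the current (physical) PC valid after paging is on. */
    boot_pt[root_index(base)] = gib_block_pte(base, flags);
    /* Linear entry: 0xffff_ffc0_8000_0000 -> 0x8000_0000. */
    boot_pt[root_index(phys_to_virt(base))] = gib_block_pte(base, flags);
    /* A real kernel would now write satp (MODE = Sv39, PPN = boot_pt >> 12),
     * execute sfence.vma, and add PHYS_VIRT_OFFSET to SP before jumping high. */
}
```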

---

<!-- U4 -->

**Task switching**:
Swaps the currently running task with a task from the ready queue.
On a single-core CPU, multitasking can only be concurrent, not parallel.

Task states
- Running
  - The number of running tasks is at most the number of processor cores
  - **SWITCH-TO**: Ready, Blocked, or Exited
- Ready
  - Ready to be scheduled at any time
  - **SWITCH-TO**: Running
- Blocked
  - Waiting for an event, or for a resource to satisfy some condition
  - **SWITCH-TO**: Ready
- Exited
  - The task has finished and is waiting to be reclaimed
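
The transition rules in the list can be encoded as a small check that a scheduler runs before changing a task's state. This is a hypothetical sketch with made-up names, not ArceOS's task API.

```c
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_READY, TASK_BLOCKED, TASK_EXITED };

/* Returns true if the state list allows switching from `from` to `to`. */
static bool can_switch(enum task_state from, enum task_state to) {
    switch (from) {
    case TASK_RUNNING: /* preempted or yielded, waiting, or finished */
        return to == TASK_READY || to == TASK_BLOCKED || to == TASK_EXITED;
    case TASK_READY:   /* picked by the scheduler */
        return to == TASK_RUNNING;
    case TASK_BLOCKED: /* the awaited event or resource arrived */
        return to == TASK_READY;
    case TASK_EXITED:  /* terminal: only reclaimed, never rescheduled */
        return false;
    }
    return false;
}

struct task {
    int id;
    enum task_state state;
};

/* Applies a legal transition; otherwise returns false and leaves the task unchanged. */
static bool set_state(struct task *t, enum task_state to) {
    if (!can_switch(t->state, to))
        return false;
    t->state = to;
    return true;
}
```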

---

<!-- U5 -->

Task switching: first, save the context of the current task; then, restore the context of the new task; finally, transfer control to the new task to complete the switch.
Note that interrupts on the processor performing the switch should be disabled during the switching process (e.g. `CLI` on x86).
If necessary (for example, on SMP systems), **spinlocks** and **mutexes** are used to avoid race conditions.
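
The save/restore sequence can be tried out in user space with the `<ucontext.h>` API. It is obsolescent in POSIX but still provided by glibc on Linux. `swapcontext` saves the current registers and stack pointer into one context and restores another. A kernel does the same with a few lines of assembly and with interrupts disabled. This user-space sketch has no interrupts to mask, so that step appears only as a comment. The `trace` array exists only to make the switching order visible.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx; /* saved contexts of the two "tasks" */
static char task_stack[64 * 1024];    /* the second task needs its own stack */
static int trace[8];
static int trace_len;

static void task_entry(void) {
    trace[trace_len++] = 2;
    /* Save this task's context and restore main's: a voluntary switch. */
    swapcontext(&task_ctx, &main_ctx);
    trace[trace_len++] = 4;
    /* Returning from the entry function resumes uc_link, i.e. main_ctx. */
}

static void run_switch_demo(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_entry, 0);

    trace[trace_len++] = 1;
    /* In a kernel, interrupts would be disabled around each switch. */
    swapcontext(&main_ctx, &task_ctx); /* save main, restore task */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &task_ctx); /* resume the task where it left off */
    trace[trace_len++] = 5;
}
```

After `run_switch_demo()`, `trace` holds 1, 2, 3, 4, 5: each `swapcontext` call hands the CPU to the other context exactly where it last stopped.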

---

<!-- U6 -->

Preemption is usually driven by two conditions: the current task's time slice is exhausted, and an interrupt source, such as the timer, fires and gives control back to the kernel. The privilege level at which a trap occurs can be used to detect nested or re-entrant traps.

Scheduling algorithms
- Cooperative scheduling (e.g. FIFO)
- Preemptive scheduling
  - ROUND_ROBIN
  - CFS (Completely Fair Scheduler)
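
To make the time-slice condition concrete, the sketch below simulates ROUND_ROBIN on one core. Each tick runs the task at the head of the ready queue. A task that uses up its slice without finishing is preempted and moved to the tail. The workloads, the slice length, and all names are hypothetical. A real kernel would drive this from the timer interrupt rather than from a loop.

```c
#define MAX_TASKS 8

/* Simulates round-robin scheduling. work[i] is the number of ticks task i
 * needs; timeline[k] records which task ran on tick k. Returns the number
 * of ticks simulated, or -1 if there are too many tasks. */
static int round_robin(const int *work, int n, int slice, int *timeline, int max_ticks) {
    int remaining[MAX_TASKS];
    int queue[MAX_TASKS]; /* circular ready queue of task indices */
    int head = 0, count = 0, ticks = 0;

    if (n > MAX_TASKS)
        return -1;
    for (int i = 0; i < n; i++) {
        remaining[i] = work[i];
        queue[count++] = i;
    }
    while (count > 0 && ticks < max_ticks) {
        int t = queue[head]; /* dequeue the next ready task */
        head = (head + 1) % MAX_TASKS;
        count--;
        for (int used = 0; used < slice && remaining[t] > 0 && ticks < max_ticks; used++) {
            timeline[ticks++] = t;
            remaining[t]--;
        }
        if (remaining[t] > 0) { /* slice exhausted: preempt, back to the tail */
            queue[(head + count) % MAX_TASKS] = t;
            count++;
        }
    }
    return ticks;
}
```

With workloads `{3, 1, 2}` and a slice of 2 ticks, the timeline is `0 0 1 2 2 0`: task 0 is preempted after two ticks and finishes only after the others have had their turn.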

---

<!-- U7 -->

Devices usually fall into categories such as `net`, `block`, and `display`.

How to **discover and initialize devices**

- At startup, `axruntime` discovers the devices and initializes each one with the appropriate driver
- `axdriver` implements the process of discovering and initializing devices
  - Devices are discovered with a two-level probing loop
    - **Level 1**: traverse all `virtio_mmio` address ranges (determined by the platform's physical memory layout) and set up a temporary page mapping for each
    - **Level 2**: enumerate the drivers with the `for_each_drivers` macro, and probe each virtio device with `probe_mmio`
- Probing discovers the devices on each bus, matches them against the drivers one by one, and initializes them
  - Buses that connect devices
    - PCI
    - MMIO
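
The two nested loops can be sketched as below. The sketch is hypothetical and heavily simplified. Instead of mapping each `virtio_mmio` range and reading its registers, the caller passes in the device ID found in each slot, with 0 meaning an empty slot as in the virtio-mmio transport. A static driver table stands in for the `for_each_drivers` macro. The device IDs 1 (network), 2 (block), and 16 (GPU) follow the virtio specification.

```c
#include <stddef.h>
#include <stdint.h>

enum dev_kind { DEV_NET, DEV_BLOCK, DEV_DISPLAY };

struct driver {
    const char *name;
    uint32_t virtio_id; /* virtio device ID this driver accepts */
    enum dev_kind kind;
};

/* Hypothetical driver table standing in for the for_each_drivers macro. */
static const struct driver drivers[] = {
    {"virtio-net", 1, DEV_NET},
    {"virtio-blk", 2, DEV_BLOCK},
    {"virtio-gpu", 16, DEV_DISPLAY},
};

/* Level 2: offer one MMIO slot to each driver in turn; the first match wins. */
static const struct driver *probe_mmio(uint32_t device_id) {
    if (device_id == 0)
        return NULL; /* empty slot */
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++)
        if (drivers[i].virtio_id == device_id)
            return &drivers[i];
    return NULL; /* a device, but no driver for it */
}

/* Level 1: walk every virtio-mmio slot of the platform. A real kernel would
 * map each range and read its DeviceID register; here the IDs are passed in. */
static size_t discover(const uint32_t *slot_ids, size_t nslots,
                       const struct driver **bound) {
    size_t n = 0;
    for (size_t s = 0; s < nslots; s++) {
        const struct driver *drv = probe_mmio(slot_ids[s]);
        if (drv != NULL)
            bound[n++] = drv; /* a real driver would initialize the device here */
    }
    return n;
}
```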

---

<!-- U8 -->

A file system is the mechanism an operating system uses to manage files and data on storage devices such as hard disks, SSDs, and flash memory.
(In Linux, most devices also appear as one or more files, e.g. under `/dev`.)

File systems
- RAMFS: a memory-based virtual file system
  - For temporary data storage: fast, but the data is easily lost (e.g. on power-off).
- DEVFS: the device file system
  - For managing and accessing hardware devices, simplifying the development of and access to device drivers.
- ProcFS: the process file system
  - Provides process and system status information for monitoring and management.
- SysFS: the system file system
  - Exports kernel objects and their attributes for viewing and managing hardware devices and drivers.

### Macrokernel

Compared with the unikernel, a macrokernel adds:
- Privilege levels
- Address spaces

So we need to:
- Map the user/kernel address spaces
  - Usual method
    - The high half of the address space (page table) is used as kernel space
    - The low half is used as user application space, because user programs usually start from low addresses
    - Kernel space is shared by all processes, while each user space is used independently
- Add system calls (crossing privilege levels and address spaces)
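
One common way to realize "kernel space is shared, user space is independent" is to copy the kernel's upper-half root entries into every new user page table. All processes then see the same kernel mappings, while the lower half starts empty. The sketch below assumes a 512-entry root table as in RISC-V Sv39, where entries 256 to 511 cover the high, sign-extended half. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PT_ENTRIES 512
#define KERNEL_FIRST_ENTRY 256 /* Sv39: root entries 256..511 map the high half */

/* Sv39 halves: user space is the low half, kernel space the high half. */
#define USER_SPACE_END    0x0000004000000000ULL /* exclusive */
#define KERNEL_SPACE_BASE 0xffffffc000000000ULL

static bool is_user_addr(uint64_t va) { return va < USER_SPACE_END; }
static bool is_kernel_addr(uint64_t va) { return va >= KERNEL_SPACE_BASE; }

/* Creates a new user root table: empty low half, kernel high half shared by copy. */
static void new_user_root(uint64_t user_root[PT_ENTRIES],
                          const uint64_t kernel_root[PT_ENTRIES]) {
    memset(user_root, 0, KERNEL_FIRST_ENTRY * sizeof(uint64_t));
    memcpy(&user_root[KERNEL_FIRST_ENTRY], &kernel_root[KERNEL_FIRST_ENTRY],
           (PT_ENTRIES - KERNEL_FIRST_ENTRY) * sizeof(uint64_t));
}
```

Only root entries are copied, so the lower-level kernel page tables they point to are shared rather than duplicated.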

<!-- M1 -->

A user application switches between two privilege levels:
- Task context: user level, executes the application logic
- Trap context: kernel level, handles system calls and exceptions
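
The trap-context side can be sketched as a system call dispatcher. The sketch assumes the RISC-V Linux calling convention: syscall number in `a7`, arguments in `a0` to `a5`, return value in `a0`. It uses the generic Linux numbers `write` = 64 and `exit` = 93. The `trap_frame` layout and the handler names are hypothetical, and the handlers only record what they would do.

```c
#include <stdint.h>

#define SYS_WRITE 64 /* generic Linux syscall numbers, as used on RISC-V */
#define SYS_EXIT  93
#define LINUX_ENOSYS 38

/* Registers saved on entry to the kernel (hypothetical layout). */
struct trap_frame {
    uint64_t x[32]; /* x10..x17 are a0..a7 */
    uint64_t sepc;  /* user PC of the ecall instruction */
};

enum { REG_A0 = 10, REG_A1 = 11, REG_A2 = 12, REG_A7 = 17 };

static int exit_code = -1; /* records the exit status for illustration */

static int64_t sys_write(uint64_t fd, uint64_t buf, uint64_t len) {
    (void)fd;
    (void)buf;
    return (int64_t)len; /* pretend every byte was written */
}

static int64_t sys_exit(uint64_t code) {
    exit_code = (int)code;
    return 0;
}

/* Called from the trap handler on an environment call from user mode. */
static void handle_syscall(struct trap_frame *tf) {
    int64_t ret;
    switch (tf->x[REG_A7]) {
    case SYS_WRITE: ret = sys_write(tf->x[REG_A0], tf->x[REG_A1], tf->x[REG_A2]); break;
    case SYS_EXIT:  ret = sys_exit(tf->x[REG_A0]); break;
    default:        ret = -LINUX_ENOSYS; break;
    }
    tf->x[REG_A0] = (uint64_t)ret; /* return value goes back in a0 */
    tf->sepc += 4;                 /* resume after the 4-byte ecall */
}
```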

---

<!-- M2 -->

The mapping back end (`Backend`) of an address-space area determines how that area is mapped:
- **Linear**
  - **Case**: the target physical memory region already exists, so the mapping is established directly
  - Can be used for MMIO regions and special shared memory regions
  - The corresponding physical page frames must be **contiguous**
- **Alloc** (lazy)
  - **Case**: frames are allocated from the page-fault handler, i.e. the mapping is fixed up only after a fault occurs ("mending the pen after the sheep are lost")
  - Mapped page by page, so the corresponding physical page frames are usually **non-contiguous**
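
The difference between the two back ends shows up in the page-fault path. The hypothetical sketch below keeps a tiny single-level page table. A Linear area computes every frame from a fixed base, so its frames are contiguous by construction. An Alloc area leaves its pages unmapped until the first access faults, then takes whatever frame the frame allocator returns. The toy frame allocator deliberately hands out frames out of order to show that Alloc areas need not be contiguous. All names and sizes are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_VPAGES 16

enum backend { BACKEND_LINEAR, BACKEND_ALLOC };

struct area {
    uint64_t start, end; /* virtual range [start, end), page aligned */
    enum backend backend;
    uint64_t pa_base;    /* Linear only: physical address of `start` */
};

static uint64_t page_table[NUM_VPAGES]; /* vpn -> physical address, 0 = unmapped */

/* Toy frame allocator that hands out frames in a scattered order. */
static const uint64_t free_frames[] = {0x90005000, 0x90001000, 0x90009000};
static unsigned next_frame;

static uint64_t alloc_frame(void) {
    if (next_frame >= sizeof free_frames / sizeof free_frames[0])
        return 0;
    return free_frames[next_frame++];
}

/* Linear areas are mapped up front; Alloc areas stay empty until a fault. */
static void map_area(const struct area *a) {
    if (a->backend != BACKEND_LINEAR)
        return;
    for (uint64_t va = a->start; va < a->end; va += 1u << PAGE_SHIFT) {
        if ((va >> PAGE_SHIFT) >= NUM_VPAGES)
            break;
        page_table[va >> PAGE_SHIFT] = a->pa_base + (va - a->start);
    }
}

/* Page-fault handler: only Alloc areas are populated lazily. */
static bool handle_page_fault(const struct area *a, uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;
    if (va < a->start || va >= a->end || a->backend != BACKEND_ALLOC || vpn >= NUM_VPAGES)
        return false; /* not ours: a real kernel would kill the task */
    if (page_table[vpn] != 0)
        return true;  /* already mapped */
    uint64_t frame = alloc_frame();
    if (frame == 0)
        return false; /* out of memory */
    page_table[vpn] = frame;
    return true;
}
```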

---

<!-- M3 -->

To be compatible with Linux applications, we need to implement compatible system calls, file systems, and other system resources, which means following the POSIX standard.

POSIX lets developers write applications against a standard set of APIs without worrying about differences between the underlying operating systems.

Apart from *Windows*, many modern operating systems, such as *Linux*, *macOS* (Unix-based), and many embedded systems (such as some RTOSes), are partially or fully POSIX-compliant. This lets applications developed on one of these systems be ported to the others with little or no change.

When loading an ELF application, each segment is placed at its target virtual address using the program header's `VirtAddr` and `MemSiz` fields. Beware that some sections, such as `.bss`, are all zeros, so their contents are not stored in the ELF file; instead, the loader allocates the space and fills it with zeros (the part where `MemSiz` exceeds `FileSiz`).
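
The `.bss` handling comes down to copying `FileSiz` bytes and zero-filling up to `MemSiz`. The sketch below assumes a 64-bit ELF image already read into memory and uses the `Elf64_Phdr` type from glibc's `<elf.h>`. `dest` stands in for the memory already mapped at the segment's `p_vaddr`. Mapping and permissions are omitted, and `load_segment` is a hypothetical name.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Copies one PT_LOAD segment from the in-memory ELF image to `dest`, which is
 * assumed to be the mapping of [p_vaddr, p_vaddr + p_memsz).
 * Returns 1 if loaded, 0 if not a loadable segment, -1 if the header is bad. */
static int load_segment(const uint8_t *image, size_t image_size,
                        const Elf64_Phdr *ph, uint8_t *dest) {
    if (ph->p_type != PT_LOAD)
        return 0;
    if (ph->p_filesz > ph->p_memsz || ph->p_offset > image_size ||
        ph->p_filesz > image_size - ph->p_offset)
        return -1;
    /* Bytes present in the file (.text, .data, ...). */
    memcpy(dest, image + ph->p_offset, ph->p_filesz);
    /* The rest (.bss) is not stored in the file: zero-fill it. */
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 1;
}
```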

An important piece of support is the hosted-environment `main` function.
We should provide:
- Argument information (`argc` and `argv`): the number of arguments and the array of argument string pointers
- Environment information (`envv`): the array of environment variable string pointers
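
On Linux-compatible systems the loader usually passes this information on the program's initial user stack. The layout is `argc`, then the `argv` pointers and a NULL, then the environment pointers and a NULL, with the strings themselves stored higher up. The sketch below builds that layout in an ordinary buffer standing in for the user stack. It omits the auxiliary vector that real Linux ABIs also place after the environment pointers. `build_user_stack` is a hypothetical name, and a real kernel would store user virtual addresses rather than host pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds [argc][argv...][NULL][envv...][NULL] below the strings, which are
 * copied to the top of `stack`. Returns the 16-byte-aligned stack pointer,
 * or NULL if the buffer is too small. */
static uintptr_t *build_user_stack(uint8_t *stack, size_t size,
                                   char *const argv[], char *const envv[]) {
    size_t argc = 0, envc = 0, str_bytes = 0;
    for (; argv[argc] != NULL; argc++)
        str_bytes += strlen(argv[argc]) + 1;
    for (; envv[envc] != NULL; envc++)
        str_bytes += strlen(envv[envc]) + 1;

    size_t words = 1 + argc + 1 + envc + 1; /* argc, argv, NULL, envv, NULL */
    if (str_bytes + words * sizeof(uintptr_t) + 15 > size)
        return NULL; /* 15 = worst-case loss from aligning SP down */

    char *strings = (char *)(stack + size - str_bytes); /* strings at the top */
    uintptr_t sp = ((uintptr_t)strings - words * sizeof(uintptr_t)) & ~(uintptr_t)15;

    uintptr_t *slot = (uintptr_t *)sp;
    *slot++ = argc;
    for (size_t i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;
        memcpy(strings, argv[i], n);
        *slot++ = (uintptr_t)strings;
        strings += n;
    }
    *slot++ = 0; /* end of argv */
    for (size_t i = 0; i < envc; i++) {
        size_t n = strlen(envv[i]) + 1;
        memcpy(strings, envv[i], n);
        *slot++ = (uintptr_t)strings;
        strings += n;
    }
    *slot = 0;   /* end of envv */
    return (uintptr_t *)sp;
}
```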

### Hypervisor

Each virtual machine has its own independent virtual hardware, built on top of the physical computer. Each virtual machine runs its own operating system, and that operating system believes it has the execution environment to itself and cannot tell whether it is running on a physical or a virtual machine (in the ideal case).

> Is this similar to virtualization software such as VMware?

> Is the **virtual 8086 mode** on x86 based on the same or a similar principle?

The difference between a hypervisor and an emulator is *whether the architecture of the virtual execution environment is the same as that of the physical environment that supports it*.

> This seems to be related to KVM acceleration for emulators.

Layers of resource objects supported by the hypervisor:
- VM: manages the guest address space
- vCPU: virtualization of computing resources, i.e. the flows of execution on a VM
- vMem: memory virtualization based on the VM's physical memory layout
- vDevice: device virtualization, including direct mapping (passthrough) and emulation
- vUtilities: interrupt virtualization and bus device operations

A hypervisor is usually implemented with the following two operations:
- **run_guest**: enters the guest environment
- **guest_exit**: exits the guest environment
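
These two operations are typically combined into a vCPU run loop. `run_guest` enters the guest until something traps (a guest exit), the hypervisor examines the exit reason, and it either handles the exit and re-enters the guest or stops. The sketch below is hypothetical. `run_guest` is simulated by replaying a scripted list of exit reasons. The reasons are loosely modeled on common ones (MMIO access, hypercall, guest shutdown) rather than on any particular hardware interface.

```c
#include <stddef.h>

enum exit_reason { EXIT_MMIO, EXIT_HYPERCALL, EXIT_SHUTDOWN };

struct vcpu {
    const enum exit_reason *script; /* simulated sequence of guest exits */
    size_t len, pos;
    unsigned mmio_handled, hypercalls_handled;
};

/* Stand-in for entering the guest. A real run_guest would save host state,
 * load guest state, and run the guest until the next trap (guest_exit). */
static enum exit_reason run_guest(struct vcpu *v) {
    if (v->pos >= v->len)
        return EXIT_SHUTDOWN;
    return v->script[v->pos++];
}

/* vCPU loop: enter the guest, handle the exit, repeat until shutdown. */
static void vcpu_run(struct vcpu *v) {
    for (;;) {
        enum exit_reason why = run_guest(v); /* returns on guest_exit */
        switch (why) {
        case EXIT_MMIO:      v->mmio_handled++; break;       /* emulate the device access */
        case EXIT_HYPERCALL: v->hypercalls_handled++; break; /* serve the guest's request */
        case EXIT_SHUTDOWN:  return;
        }
    }
}
```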