Introduction to Theseus
Note: for general info about Theseus and a quick start guide, see the top-level README.
Theseus is a new OS written from scratch in Rust to experiment with novel OS structure, better state management, and how to leverage intralingual design principles to shift OS responsibilities like resource management into the compiler.
Continue to the next chapter to learn more about Theseus, or feel free to check out our published academic papers for a deep dive into the research and design concepts behind Theseus.
What's in a name?
The ship wherein Theseus and the youth of Athens returned from Crete had thirty oars, and was preserved by the Athenians down even to the time of Demetrius Phalereus, for they took away the old planks as they decayed, putting in new and stronger timber in their places, in so much that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same. — Plutarch, Theseus
The name "Theseus" was inspired by The Ship of Theseus, an ancient Greek metaphysical paradox and thought experiment that pondered: "if you iteratively replace every individual piece of an object, is that re-built object still the same object?"
Though we do not attempt to answer this question, we do wish to enable any and every OS component to be replaced — across all layers of the system — at runtime without rebooting. This goal of easy and arbitrary live evolution was (and still is) one of the original motivating factors behind Theseus.
Theseus's Design and Structure
Theseus is a safe-language OS, in which everything runs in a single address space (SAS) and single privilege level (SPL). This includes everything from low-level kernel components to higher-level OS services, drivers, libraries, and more, all the way up to user applications. Protection and isolation are provided by means of compiler- and language-ensured type safety and memory safety, as explained in a later section.
Structure of many small Cells
Theseus is implemented as a collection of many small entities called cells; a cell is a software-defined unit of modularity that acts as the core building block of Theseus.
The cell concept is a term we coined to represent an individual entity of code and/or data that can be loaded into Theseus.
A cell is not a thread of execution, nor is it related to Rust's std::cell types.
The Biological Cell Analogy
Cells in Theseus are inspired by and akin to biological cells in an organism, as they both have many attributes in common:
- Cells are the basic structural unit
- Cells are tiny parts of a greater whole, yet remain distinct despite complex interactions and hierarchies
- Cells each have widely differing roles, but can all be viewed under a single uniform abstraction
- Cells have an identifiable boundary (cell membrane = public interface) that explicitly regulates what enters and exits (selective permeability = naming visibility)
- Cells can be arbitrarily "refactored" into multiple different units (meiosis/mitosis = live evolution)
- Cells can be replaced independently after failing or dying (cell death and replacement = fault recovery)
As such, we sometimes refer to Theseus as a cytokernel, in that it is composed of cells. This reinforces the distinction between the design of Theseus and that of other kernels, e.g., monolithic kernels, microkernels, multikernels, etc. Read more here.
Cell ≈ Crate
Currently, there is a one-to-one relationship between a cell and a Rust crate. The crate is Rust's project container that consists of source code and a dependency manifest file. The crate also serves as Rust's translation unit (elementary unit of compilation); in Theseus we configure each Rust crate to be built into a single .o object file (a relocatable ELF file).
Thus, the cell abstraction is always present in Theseus, but takes different forms as shown in the below diagram.
- At implementation time, a cell is a crate.
- At compile (build) time, a cell is a single .o object file.
- At runtime, a cell 🄲 is a structure that contains the set of sections 🅂 from its crate object file, which have been dynamically loaded and linked into memory, as well as metadata about the inter-dependencies between it and other cells.
In Theseus, the metadata stored for each cell is defined by the kernel/crate_metadata crate, which includes two main types:
- LoadedCrate, which represents a single crate loaded into memory and linked against other loaded crates. The LoadedCrate owns the memory regions holding its sections, along with other metadata about sections and symbols in that crate.
- LoadedSection, which represents an individual section within a loaded crate, as specified in its object file. A LoadedSection comprises several main items:
  - The section type, e.g., .text (an executable function), .rodata (constant data), or .data/.bss (read-write data)
  - Outgoing dependencies: the list of other sections from other crates that this section depends on (and links against).
  - Incoming dependencies: the list of other sections from other crates that depend on (link against) this section.
  - References to its containing "parent" crate and the location within that crate's memory region where this section is loaded.
Note that dependencies are tracked on a fine-grained, per-section basis in order to facilitate challenging OS goals like live evolution at runtime, system flexibility, fault recovery, and more.
Dependencies are derived from relocation entries specified in the .rela.* sections of the ELF object file. This is much more precise than deriving dependencies from crate-level Cargo.toml manifests.
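To make the shape of this metadata concrete, the sketch below shows roughly what such types could look like; the exact field names and types are illustrative assumptions, not the real definitions in the kernel/crate_metadata crate.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

// Illustrative stand-ins; Theseus's real types differ.
pub struct MappedPages; // an owned, mapped memory region
pub enum SectionType { Text, Rodata, Data, Bss }

/// A crate that has been loaded and linked into memory (sketch).
pub struct LoadedCrate {
    pub crate_name: String,
    /// The sections contained in this crate, keyed by section name.
    pub sections: HashMap<String, Arc<LoadedSection>>,
    /// The owned memory region holding this crate's executable sections.
    pub text_pages: Option<MappedPages>,
    // ... plus .rodata/.data regions, symbol maps, etc.
}

/// An individual section within a loaded crate (sketch).
pub struct LoadedSection {
    pub typ: SectionType,
    /// Outgoing dependencies: sections in other crates that this section links against.
    pub sections_i_depend_on: Vec<Weak<LoadedSection>>,
    /// Incoming dependencies: sections in other crates that link against this section.
    pub sections_dependent_on_me: Vec<Weak<LoadedSection>>,
    /// Back-reference to the "parent" crate that contains this section.
    pub parent_crate: Weak<LoadedCrate>,
}
```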
Each cell is loaded and linked into a namespace, which we refer to as a CellNamespace or CrateNamespace: a true namespace of all of the publicly-visible symbols exposed by the cells within it. Namespaces are useful for quick dependency (symbol) resolution during dynamic linking, and also play a key role in the above system goals, especially flexibility, as they can be used to efficiently realize multiple distinct OS personalities that serve different applications with disparate needs.
The above diagram depicts a simple set of three crates whose sections depend upon each other and are thus linked into a single namespace. The MappedPages (MP) objects are Theseus's abstraction of owned memory regions.
Comparison with other OS designs
The below figure shows the distinction between the structure of existing OS/kernel designs and Theseus.
Monolithic OSes are the most common, including Linux and other Unix-like OSes and most commercial systems like Windows and macOS. In a monolithic OS, all kernel components exist and run in a single kernel address space, meaning that intra-kernel communication is fast and efficient: simply use function calls and shared memory accesses. However, monolithic OSes are less resilient to failures in the kernel, as any crash in kernel space (such as a buggy driver) can bring down the entire system. Applications must use system calls to ask the kernel to perform privileged operations on their behalf, requiring a privilege mode switch.
Microkernel OSes are less common, but still widespread in certain computing domains where reliability is key, such as embedded systems. Microkernels move as much kernel functionality as possible into separate user space "system server" processes, leaving the kernel itself very small. This improves resiliency, as each kernel entity executes in user space in its own address space; if one crashes, the rest of the system can continue execution by restarting the failed system process. However, microkernels are less efficient: all inter-entity functionality requires Inter-Process Communication (IPC), requiring costly context switches and mode switches.
Multikernel OSes offer high scalability to manycore hardware architectures by running a separate instance of a small kernel replicated across each hardware core. Depending on the underlying hardware, system service processes may also be replicated redundantly across (subsets of) cores to improve performance by reducing contention. They typically borrow standard OS interfaces and abstractions from monolithic and microkernel systems, though presenting a standard shared memory abstraction can harm performance.
Theseus OS does not base its structure on any aspect of the underlying hardware, unlike the above three system designs. Everything, including applications, system services, and core kernel components, exists and runs in a single address space and a single privilege level (in "kernel space"). The structure of Theseus is purely software-defined and based on the modularity concept of cells. Thus, communication and shared memory access is efficient because isolation and protection are ensured by the compiler. However, everything must be written in a safe language like Rust. See this section for more about Theseus's safe-language OS design.
Source code organization
Source code in the Theseus repository is categorized into three main folders:
- kernel/: components that implement the core functionality of the OS.
- applications/: user applications, tests, benchmarks, etc. that can be invoked to run in Theseus.
- libs/: components that act as standalone libraries usable outside of Theseus.
1. Kernel
Crates in the kernel/ folder are considered to be "first-party" or "privileged" components that can use unsafe code if necessary, e.g., for directly interacting with hardware at the lowest levels of the OS. That being said, we go to great lengths to avoid unsafe code throughout all of Theseus.
Kernel crates cannot depend on any application crates; if they did, the application crate would be erroneously included and built into the kernel image. Kernel crates can depend on libs crates.
2. Applications
Crates in the applications/ folder are user applications that cannot use any unsafe code.
Currently this consists mostly of simple utilities and command-line tools that are developed specifically for Theseus, as well as various small apps used to test functionality or run benchmarks for performance measurements.
Application crates can depend on kernel crates, libs crates, and even other application crates, though the latter is not recommended. See this section for more details about how applications work and how to develop one.
In the future, we expect to restrict applications to depend only upon functions and types explicitly exposed through a dedicated library, i.e., libtheseus, but this remains future work.
3. Libs
Crates in the libs/ folder are standalone projects that must not depend on anything else in Theseus, such as kernel or application crates. They are intended to be re-used by other software projects and may eventually be refactored out of the Theseus repository. The libs/ folder also includes some other repositories that we may have forked and modified for use by Theseus, often included as git submodules.
Other folders
The other folders in the root of the repository are mostly build/configuration tools and scripts. Here's a quick rundown:
- book: contains the source code of this Theseus book, which you're currently reading.
- cfg: contains the project-wide configuration Makefile Config.mk and JSON files that specify the compiler target platform for various Theseus builds.
- docker: contains scripts and config files required to set up a basic Docker image that can be used to build and run Theseus.
- scripts: contains miscellaneous scripts for setting up a build environment, testing, debugging, etc.
- tools: contains custom Rust programs that run as part of Theseus's build process. See the tools/README file for more.
Project Workspace
All of the crates in the main kernel and applications folders are organized into a single top-level workspace, a way of using cargo (Rust's package manager and build tool) to build them all together into the same target/ directory.
This ensures they can all be directly linked together against each other and that the dependencies between them will be resolved properly by the compiler and linker toolchains.
You can see how the members of this workspace are defined in the root Cargo.toml file, and how all other folders are ignored. This uses cargo's virtual manifest feature.
Read more about Theseus's build process here.
External dependencies
Theseus does depend on many external, third-party crates that are hosted on the official crates.io registry or on GitHub. This is currently allowed for all crates, no matter whether they are kernel, application, or libs crates. In the future, we may restrict or forbid which kinds of crates can be used by applications and whether they can expose unsafe code or certain underlying assembly instructions.
Booting Process and Flow of Execution
Initial assembly code
The Theseus kernel takes over from the bootloader and first executes code in 32-bit protected mode, which corresponds to the start function in kernel/nano_core/src/boot/arch_x86_64/boot.asm.
Currently we use GRUB configured as a legacy bootloader (non-UEFI) and Theseus expects to be booted by a Multiboot2-compliant bootloader.
In the future, we intend to add support for booting via the UEFI standard, especially on other architectures without a legacy BIOS.
After initializing a very simple page table and other miscellaneous hardware features, the assembly file boot.asm jumps to long_mode_start, which now runs 64-bit code in long mode. Then, it jumps to start_high, such that we are now running the base kernel image in the higher half (see more about higher-half kernels here). We then set up a new Global Descriptor Table (GDT), segmentation registers, and finally call the Rust code entry point nano_core_start() with the proper arguments.
After calling nano_core_start, the assembly files are no longer used, and nano_core_start should never return.
Initial Rust code: the nano_core
The nano_core, specifically nano_core_start(), is the first Rust code to run in Theseus.
It performs a very minimal bootstrap/setup procedure, in which it performs the following duties:
- Initializes logging and a basic VGA text display, for the purpose of debugging.
- Sets up simple CPU exception handlers, for the purpose of catching early errors.
- Sets up a basic virtual memory environment.
  - This creates the first and only virtual address space and remaps all of the bootloader-loaded sections into that new single address space.
  - Importantly, Theseus doesn't depend on anything else from the bootloader after this point.
- Initializes the mod_mgmt subsystem, which creates the first CrateNamespace and allows other crates to be dynamically loaded.
- Loads and invokes the captain, which handles the rest of the OS initialization procedures.
The nano_core is quite general and minimalistic; it rarely needs to change. The majority of the OS-specific configuration and initialization happens in the captain, so changes should likely be made there.
Main Initialization routine: the captain
The captain "steers the ship" of Theseus, meaning that it contains the basic logic for initializing all of the other subsystems in the proper order and with the proper flow of data between them.
Currently, there is a single captain implementation in Theseus (for a standard x86_64 machine), which does the following:
- Initializes ACPI and APIC to discover multicore and other hardware configuration,
- Sets up interrupt and exception handlers,
- Sets up basic device drivers,
- Spawns event handling threads,
- Initializes the window manager and graphics subsystem,
- Starts the first user application, which is currently a single terminal window.
At the end, the captain must enable interrupts to allow the system to schedule other tasks. It then falls into an idle loop that does nothing and will never be run again by the scheduler.
Note: in the future, Theseus will add additional architecture-specific captains for different platforms.
Theseus Ideas and Inspiration
Theseus is a safe-language OS, meaning that it relies on type safety and memory safety guarantees from the Rust language and compiler to enforce protection and isolation between tasks and components. As such, it foregoes hardware protection, which generally results in higher efficiency due to the ability to bypass overhead stemming from switching privilege modes and address spaces.
This is possible only when all applications are written in safe Rust, which prevents them from circumventing any type-based restrictions to cause unintended or undefined behavior.
Check out this presentation slide deck to learn more about how we ensure protection and isolation in Theseus based on the foundation of Rust's type and memory safety guarantees.
For more details about Theseus's research merit and novel design principles, see our selected list of papers and presentations here.
P.I.E. Principle
The P.I.E. principle is one of the guiding lights in the design of Theseus and much of our other systems software research. The main idea is that there are three pillars of computing goals, of which the hardware should be responsible for only two:
- Performance
- Isolation
- Efficiency
Traditionally, systems software designers have looked to hardware to provide all three -- high performance, strong isolation, and efficiency (low overhead). We believe that hardware cannot fully realize all three. The P.I.E. principle asserts that hardware should only be responsible for performance and efficiency, but should have no role (or a minimal role) in providing isolation, safety, and security. Isolation should be the responsibility of software alone.
We sometimes refer to this as the PHIS principle: Performance in Hardware, Isolation in Software.
But why?
For one, speculative execution exploits like Meltdown and Spectre have shown that hardware-ensured isolation does not protect kernel data from untrusted user space applications to the extent we once thought. It is difficult if not impossible to verify the true behavior of closed-source hardware (CPU architectures), so we turn to open-source software instead, where the OS, compiler, language libraries, and more are subject to scrutiny and even formal verification.
In addition, modern languages like Rust are able to ensure type safety and memory safety at compile time, without the overhead of traditional safe/managed languages that rely upon inefficient garbage collection and transparent heap-based object management. Thus, we can leverage these safety guarantees to ensure that compiled code does not violate isolation between tasks (threads of execution) and software modules without the need for significant runtime checks.
Theseus transcends the reliance on hardware to provide isolation, and completely foregoes hardware privilege levels (x86's Ring 0 vs. Ring 3 distinction) and multiple address spaces. Instead, we run all code at Ring 0 in a single virtual address space, including user applications that are written in purely safe Rust. This maximizes efficiency whilst preserving protection, because we can guarantee at compile time that a given application or kernel component cannot violate isolation between modules, rendering hardware privilege levels obsolete. Theseus still does use virtual memory translation provided by the MMU, but simply for convenience and ease of memory management; it can be very difficult and inefficient to directly handle and allocate physical memory for applications, and also to find large contiguous chunks of physical memory.
Going beyond safety
We show that it's possible to leverage safe languages and compilers to go much further than just basic isolation and memory safety. For more details, read about Theseus's novel concept of intralingual design (coming soon).
Application Support and Development
One of the unusual features of Theseus, compared to mainstream operating systems like Linux, is that safe applications are loaded into the same single address space as the rest of the OS and run at the same kernel privilege level. Below, we provide information about how such apps are supported by Theseus and how you can develop a new app.
Dynamic Linking and Loading of Application Crates
Applications are simply object files that are loaded into the address space, just like any other kernel crate.
The only real distinction is that they must use only safe code (unsafe code is forbidden), and they must expose a public entry point function named main, shown below. If the main function is not pub, it may be removed by compiler optimizations or be undiscoverable by the application loader code, causing the application crate to be non-runnable.
pub fn main(args: Vec<String>) -> isize { ... }
Note that application-level libraries do not need to expose a main function; only applications that intend to be run as binary executables do. If you forget to include a main() function in your application, the crate manager in Theseus will load and link it successfully, but attempting to run it will fail with a runtime error.
Creating and building a new application
Theseus's build system will automatically build any crates in the applications/ directory, so all you have to do is place your new application crate there. The name of the directory holding your crate files must be the same as the name of the crate as specified in its Cargo.toml name field.
So, for example, you could create a new application crate called my_app with the following file structure:
applications/
├── my_app
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
├── ...
The applications/my_app/src/lib.rs file contains the application code with at least a fn main() body (as shown above).
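For instance, a minimal lib.rs for the hypothetical my_app crate could look like the sketch below; the #![no_std] and alloc setup reflects how Theseus application crates are typically written, but the canonical template is the example application mentioned later.

```rust
// applications/my_app/src/lib.rs -- a minimal sketch of a Theseus application.
#![no_std]
extern crate alloc;

use alloc::{string::String, vec::Vec};

/// The public entry point invoked by Theseus's application loader.
/// It receives the command-line arguments and returns an exit code.
pub fn main(args: Vec<String>) -> isize {
    // A real application would do useful work here; this sketch simply
    // returns the number of arguments it was given.
    args.len() as isize
}
```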
The applications/my_app/Cargo.toml file must specify the same name as the containing directory:
[package]
name = "my_app"
...
After building and running Theseus, you can type my_app into the Theseus shell to run the application as expected.
Examples
See the many examples in the applications/ directory. The example application is designed to serve as a starting point for your new application that you can easily duplicate. We offer a ported version of getopts to help parse command-line arguments.
Dependencies: how to use OS functionality
Currently, applications can use any Theseus kernel crate as a direct dependency (via its Cargo.toml file). This is a temporary design choice to bridge the lack of a real standard library.
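For illustration, a hypothetical application could pull in a kernel crate via a path dependency like the one sketched below; the particular crate chosen here (memory) is just an example of the pattern, not a requirement.

```toml
# applications/my_app/Cargo.toml -- illustrative sketch of a kernel-crate dependency.
[package]
name = "my_app"
version = "0.1.0"

[dependencies]
# Depend directly on a Theseus kernel crate by its path within the repository.
memory = { path = "../../kernel/memory" }
```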
In the future, this will be replaced with libtheseus in combination with Rust's standard library, in which applications can only access the kernel functionality re-exported by libtheseus and any functionality offered by the Rust standard library. This has two benefits:
- Applications will not be able to access public but "sensitive" kernel functions unless they are explicitly made visible to applications via the libtheseus library.
- Applications will not have to know which kernel crate provides a specific feature; they can simply depend on the single libtheseus crate to access any OS feature. Their dependency management will be very simple.
Theseus's Build Process
Cargo
Theseus uses cargo, Rust's package manager and build tool, to automatically manage dependencies and invoke the actual Rust compiler for us.
We utilize cargo's workspace feature with a virtual manifest to group all of the main crates together into a single top-level meta project, which significantly speeds up build times.
As such, the crates from the main repository folders (kernel/ and applications/) and all of their dependencies are all compiled into a single target/ folder.
The members of this workspace are defined in the root Cargo.toml manifest file, plus the list of other folders that should be ignored by cargo.
Makefiles
Although we use cargo to build all Rust code, we still use make and Makefiles to handle high-level build tasks. You should never need to directly run cargo or rustc commands; go through make instead.
The top-level Makefile essentially just invokes the Rust toolchain and compiler via cargo, then copies the compiled object files from the appropriate target/ directory into the top-level build/ directory, and finally generates a bootable .iso image using various bootloader tools, e.g., GRUB.
The only special build action the Makefile takes is to use the nasm assembler to compile the architecture-specific assembly code in nano_core/boot/, and then fully link that against the nano_core into a separate static binary.
Configuring Theseus
Continue on to the next section to read more about configuring Theseus's build.
Configuring Theseus
Theseus source code uses the standard Rust-provided cfg options for conditional compilation via build-time configuration. We expose the ability to set these options via the THESEUS_CONFIG environment variable, which can be set on the command line, in the Makefile itself, or in a Rust build script.
To set one or more cfg options on the command line, all cfg options must be specified in one quoted string, with each individual cfg option separated by whitespace. For example:
make run THESEUS_CONFIG="cfg_option_1 cfg_option_2"
Here's how you would set cfg options in the Makefile. In this case, we set the same cfg_option_1 whenever make my_target is executed:
my_target : export override THESEUS_CONFIG += cfg_option_1
my_target:
$(MAKE) run
Using cfg options in Rust source code
In Rust, you can use cfg statements in one of two main ways:
- As attributes on code blocks, which enable conditional compilation. Below, foo() will be compiled as the top block if cfg_option_1 was set; otherwise, foo() will be compiled as the bottom block.

  #[cfg(cfg_option_1)]
  fn foo() { println!("cfg_option_1 was enabled!"); }

  #[cfg(not(cfg_option_1))]
  fn foo() { println!("cfg_option_1 was disabled!"); }

- As runtime if-conditionals, which enable runtime use and checking of a statically-known cfg option via the cfg!() macro, which returns a boolean value.

  fn foo() {
      if cfg!(cfg_option_1) {
          println!("cfg_option_1 was enabled!");
      } else {
          println!("cfg_option_1 was disabled!");
      }
  }
Follow the links below to read more about how Rust supports cfg options:
How THESEUS_CONFIG and cfg work together
A single line in the top-level Makefile converts the cfg options specified by THESEUS_CONFIG into Rust-known --cfg values that can be used with the standard Rust cfg attributes. That line, shown below, simply adds the "--cfg" prefix onto each whitespace-separated value in THESEUS_CONFIG and appends them all to the RUSTFLAGS environment variable.
cargo : export override RUSTFLAGS += $(patsubst %,--cfg %, $(THESEUS_CONFIG))
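As a concrete illustration of that transformation (using placeholder option names), the invocation below would cause the patsubst rule above to append --cfg loadable --cfg my_option to RUSTFLAGS:

```sh
# Placeholder example: two whitespace-separated cfg options...
make run THESEUS_CONFIG="loadable my_option"
# ...become two separate flags passed to rustc via RUSTFLAGS:
#   RUSTFLAGS="... --cfg loadable --cfg my_option"
```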
Note: you can also add --cfg XYZ directly to RUSTFLAGS rather than use THESEUS_CONFIG.
This is a standard approach that avoids the overhead of our prior approach based on a global build script. Read more here.
Other Configuration Options
TODO: describe the new theseus_features crate and how it allows one to include optional crates in a Theseus build.
Debug vs. Release Mode
Theseus can be built in a variety of modes, but offers two presets: debug and release build modes.
By default, Theseus is built in release mode for usable performance within an emulator like QEMU.
To build in debug mode, set the BUILD_MODE environment variable when running make, like so:
make run BUILD_MODE=debug [host=yes]
As with most languages, release mode in Rust is significantly faster, but it can be more difficult to debug with GDB.
Note: Theseus runs extremely slowly when built in debug mode; only use it when necessary to debug tricky problems. You will definitely want to run debug builds of Theseus in QEMU using KVM, e.g., by using the above host=yes argument on the make command.
There is a special Makefile, cfg/Config.mk, that contains the build mode options as well as other configuration options used in the kernel Makefile.
Static Build-time Linking vs. Dynamic Runtime Linking
Theseus offers two primary forms of linking and packaging its compiled crates into an ISO image. As depicted in the image below, the first (left side) is a conventional fully statically-linked build, as used in all other OSes, while the second (right side) is a novel dynamic linking approach used for Theseus research.
Standard build-time (static) linking
By default, Theseus is built into a single kernel binary just like a regular OS, in which all kernel crates are linked into a single static library and then packaged into a bootable .iso file. This is what happens when you run make as usual.
Dynamic runtime linking (loadable mode)
However, the research version of Theseus uses full dynamic loading and linking for all crates (except the nano_core) to achieve its various goals of live evolution, availability through fault tolerance, flexibility, etc.
Loading and linking of crates at runtime is precisely how Theseus achieves runtime-persistent bounds; the crate management subsystem knows where it loaded a given crate into memory and can therefore maintain metadata about each loaded crate to track its bounds and dependencies.
We refer to this as loadable mode, because crates are loaded at runtime (as cells) instead of being linked into a single static kernel binary. To enable this, set the THESEUS_CONFIG="loadable" option (or run make loadable, which sets this for you). This causes the following to occur:
- Builds each crate into its own separate object file as normal, but does not link them all together.
- Copies each crate's object file into the top-level build directory's module subdirectory (build/grub-isofiles/modules) such that each crate is a separate object file in the final .iso image.
  - The bootloader loads these object files into memory for us, which we discover and map into memory when initializing the crate management subsystem from the nano_core. This allows Theseus to see and load available crate object files at the very beginning of the OS boot-up without needing full support for a filesystem.
- Sets the loadable config option, which, as seen in the nano_core (and other crates), will enable the #![cfg(loadable)] code blocks that dynamically load other crates rather than include them as static dependencies.
  - In loadable code blocks, the caller dynamically looks up the symbol for a given callee function and invokes it dynamically instead of directly calling it via a regular function call. This produces a "soft dependency" in the source code rather than a "hard dependency" that actually requires the callee crate to be statically linked to the caller crate; a sketch of this pattern is shown after this list.
    - This is somewhat similar to dlopen() and dlsym() for loading shared objects on Linux, at least conceptually.
  - Search the code base for cfg(loadable) and cfg(not(loadable)) to see where else it is used.
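A minimal sketch of what such a cfg(loadable)-gated call site could look like is below. The crate, type, and function names here (shutdown, LoadedSection, get_symbol_or_load) are illustrative stand-ins meant to convey the soft-vs-hard dependency idea, not Theseus's actual mod_mgmt API.

```rust
// Sketch only: the names below are illustrative stand-ins, not Theseus's real API.

/// Stand-in for the statically-linked callee crate (used when `loadable` is off).
#[cfg(not(loadable))]
mod shutdown {
    pub fn shutdown() { /* ... */ }
}

/// Stand-in for a dynamically loaded and linked section.
#[cfg(loadable)]
struct LoadedSection {
    start_address: usize,
}

/// Stand-in for a symbol lookup on the current namespace (e.g., a CrateNamespace).
#[cfg(loadable)]
fn get_symbol_or_load(_name: &str) -> Option<LoadedSection> {
    None // a real implementation would search for, or load, the backing crate
}

/// Hard dependency: the callee crate is statically linked, so just call it.
#[cfg(not(loadable))]
fn invoke_shutdown() {
    shutdown::shutdown();
}

/// Soft dependency: resolve the callee's symbol by name at runtime,
/// then invoke it through a function pointer.
#[cfg(loadable)]
fn invoke_shutdown() {
    let section = get_symbol_or_load("shutdown::shutdown")
        .expect("failed to find or load the `shutdown::shutdown` symbol");
    // Converting a section's address into a typed function pointer is unsafe;
    // Theseus encapsulates this kind of cast inside its crate management code.
    let func: fn() = unsafe { core::mem::transmute(section.start_address) };
    func();
}
```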
Built-in Rust cfg and target options
The #[cfg()] attribute and cfg!() macro can also be used with built-in cfg options set by the Rust compiler, for example, target-specific values. For example, Theseus frequently uses options like:
#[cfg(target_arch = "x86_64")]
#[cfg(target_feature = "sse2")]
The advantage of these built-in options is that they can also be used in Cargo.toml manifest files to conditionally set dependencies. For example:
## Only include the `core_simd` crate as a dependency when "sse2" is enabled.
[target.'cfg(target_feature = "sse2")'.dependencies.core_simd]
...
Unfortunately, you cannot use non-built-in cfg options, such as anything that comes from THESEUS_CONFIG values, to conditionally specify dependencies in Cargo.toml files.
Using cargo features
Another option for configuration is to expose features from a given crate; read more about features here.
Theseus does not use features extensively because it is structured as many small crates in one overarching virtual workspace. In this form, you cannot easily set one feature for a target crate across multiple dependent crates at the same time, e.g., using a single command-line argument; instead, you must individually change the Cargo.toml specification of every single crate that depends on that target crate.
Thus, we use cfg blocks instead of features.
That being said, Theseus does choose which features it wants to use when bringing in dependencies on third-party crates, but this is minimal and only occurs for a few dependencies. Typically, features are only specified in order to choose a no_std version of a crate, i.e., telling that crate to use the Rust core library instead of the standard library.
Building Out-of-Tree Rust Crates Safely
Background: The Problem
Because Rust currently lacks a stable ABI, there is no easy, stable, or safe way to integrate two or more separately-compiled Rust binaries together. By integrate, we mean the ability to have one binary depend upon or invoke another pre-built binary, such as an executable, statically-linked library, dynamically-linked shared object, etc.
There is another related problem that stems from how the Rust compiler appends unique IDs (metadata used by the compiler) to each compiled crate and each (non-mangled) symbol in those crates; this issue presents itself even in unlinked object files.
As an example, the page_allocator crate in Theseus will be compiled into an object file with a name like page_allocator-c55b593144fe8446.o, and the function page_allocator::allocate_pages_at() implemented and exposed by that crate will be emitted as the symbol _ZN14page_allocator17allocate_pages_at17heb9fd5c4948b3ccfE. The values of both the crate's unique ID (c55b593144fe8446) and every symbol's unique ID (e.g., heb9fd5c4948b3ccfE) are deterministic, but depend on many factors. Those factors include the compiler version, the source directory, the target directory, and more. We sometimes refer to both of these unique IDs as a hash value, since the compiler creates them by hashing together these various factors; how this hash is generated is considered opaque and liable to change, so we treat it as a black box.
Theseus loads and links crate object files dynamically at runtime. When we build all of the Theseus kernel crates together into a single target directory (read more here), the unique IDs/hash values appended to every crate name and symbol are based on the build machine's source and target directories (among other factors). A running instance of Theseus will have a single instance of the page_allocator crate loaded into memory and expect all other crates to depend upon that instance, meaning that they should be compiled to expect linkage against its specifically-hashed symbols, e.g., _ZN14page_allocator17allocate_pages_at17heb9fd5c4948b3ccfE.
If you separately compile another crate my_crate that depends on the exact same set of Theseus kernel crates, cargo will recompile all Theseus crates from source into that new target directory, resulting in the recompiled object files and their symbols having completely different unique ID hashes from those in the original Theseus instance. As such, when attempting to load my_crate into that already-running prebuilt instance of Theseus, it will fail to load and link because that version of my_crate will depend on differently-hashed crates/symbols, e.g., it may depend upon the symbol _ZN14page_allocator17allocate_pages_at17hd64cba3bd66ea729E instead of _ZN14page_allocator17allocate_pages_at17heb9fd5c4948b3ccfE (note the different appended hash values).
Therefore, the real problem is that there is no supported method to tell cargo that it should build a crate against a prebuilt set of dependencies. See this GitHub issue for more about why this feature would be useful, but why it still isn't supported (hint: no stable Rust ABI).
A Bad, Unsafe Solution
Technically, we could solve this by using an existing non-Rust stable ABI, like the C language ABI. This would entail defining/exposing Rust functions, data, and types in a C-compatible way such that they are compatible with the C ABI (its expected struct memory layout and calling convention). Unfortunately, this necessitates the usage of unsafe FFI code blocks (via C-style extern functions) to connect two separate bodies of fully-safe Rust code, which is both dumb and tedious.
In the above example, instead of simply invoking page_allocator::allocate_pages_at() directly, we would need to export the appropriate wrapper functions like so:
// in `page_allocator`
#[no_mangle]
pub extern "C" fn allocate_pages_at(num_pages: usize, ...) -> ... {
    page_allocator::allocate_pages_at(num_pages, ...)
    ...
}
and then invoke it using unsafe FFI code blocks like so:
// in `my_crate`
extern "C" {
    fn allocate_pages_at(num_pages: usize, ...);
}

fn main() {
    unsafe {
        allocate_pages_at(15, ...);
        ...
    }
}
Note that many details are omitted above; while these code wrappers and bindings can be autogenerated, unsafety cannot be avoided.
Surely we can do better!
Solution: theseus_cargo for out-of-tree builds
A superior solution is to "trick" the Rust compiler into using the prebuilt crates from an existing build of Theseus.
To this end, we've created theseus_cargo, a custom build tool and wrapper around cargo that resolves an out-of-tree crate's dependencies on in-tree Theseus crates using their prebuilt objects instead of rebuilding them from source.
This is realized in two parts:
- Generating the prebuilt dependencies while building Theseus's (in-tree) kernel crates,
- Correctly building the out-of-tree crate(s) against those prebuilt Theseus crates.
1. Generating the set of prebuilt dependencies
To create a set of dependency files understood by Rust's compiler toolchain, the main top-level Makefile invokes another custom build tool, a Rust program called copy_latest_crate_objects found in the tools/ directory.
It is invoked like so (details omitted):
cargo run ... tools/copy_latest_crate_objects -- \
--input "target/.../deps" \
--output-deps "build/deps/" \
--output-sysroot "build/deps/sysroot/" \
...
The arguments above specify that we wish to
- Use the Rust-produced build artifacts (compiled crates) in the target/.../deps directory as input, and then
- Copy them into the build/deps/ output directory.
- Also, copy the prebuilt Rust fundamental libraries (core, alloc) that were cross-compiled into the Theseus platform-specific sysroot folder into build/deps/sysroot/.
Afterwards, the build/deps/ directory contains all prebuilt dependencies needed to compile an out-of-tree crate against the existing build of Theseus, with all the properly versioned (correctly hashed) crates and symbols.
This tool also generates a TheseusBuild.toml file that describes the parameters of this build of Theseus, such that it can be replicated by theseus_cargo. For example:
target = "x86_64-unknown-theseus"
rustflags = "--emit=obj -C debuginfo=2 -C code-model=large -C relocation-model=static -D unused-must-use -Z merge-functions=disabled -Z share-generics=no"
cargoflags = "--release"
host_deps = "./host_deps"
2. Building other Rust code against the prebuilt Theseus dependencies
With the contents of build/deps/ described above, we can invoke theseus_cargo to build the new out-of-tree crate in a separate compilation instance.
The theseus_cargo tool is a Rust program that invokes cargo, captures its verbose output, and then modifies and re-runs the rustc commands issued by cargo so that they use the prebuilt crates to fulfill the out-of-tree crate's dependencies. Those prebuilt crates are a set of dependency files, namely .rmeta and .rlib files, that are understood by the Rust compiler's internal metadata parsers.
The main modification theseus_cargo makes to rustc commands is to replace the <values> of the following arguments with the paths and names of the prebuilt Theseus crates in build/deps/:
-L dependency=<dir>
--extern <crate_name>=<crate_file>.rmeta
If a given rustc command has any arguments that need to be changed, theseus_cargo re-issues that command.
Currently, to use theseus_cargo, it must be compiled and installed from source:
cargo install --path="tools/theseus_cargo" --root=$INSTALL_DIR
Then, it can be invoked just like cargo build, e.g., to build a crate against the prebuilt dependencies in the input folder:
$INSTALL_DIR/theseus_cargo --input "build/deps/" build
Currently, theseus_cargo prints very verbose output and will show a lot of irrelevant warning and log statements describing what it is doing. If the out-of-tree crate was successfully built, it will finally print something like "Ran rustc command (modified for Theseus) successfully" before exiting with exit code 0.
See the tools/theseus_cargo source code for more details.
The approach of capturing and modifying rustc commands using verbose output from cargo is obviously not ideal, but there is currently no other supported way to obtain this info because cargo is removing its --build-plan option.
Related Links, Discussions, Alternative Approaches
- A Stable Modular ABI for Rust (Rust Internals Forum)
- The Rust ABI wiki
- The abi_stable crate, which provides "safe" traits, macros, and wrappers around underlying Rust-to-Rust FFI.
Building and Running C programs on Theseus
Warning: Support for building C programs atop Theseus is experimental and liable to change at any moment.
As Theseus is a safe-language OS that runs all code in a single address space (SAS) and single privilege level (SPL), there is no guarantee of safety, protection, or isolation when running any other unsafe or non-Rust code directly atop Theseus.
Nevertheless, we have introduced experimental support for building C programs atop Theseus; proceed at your own risk.
Prerequisites
You must have a version of GCC and Binutils cross-compiled for Theseus, e.g., the x86_64-elf target with red-zone usage disabled.
To make things easy, we have written an automated script and a guide on how to build and install all necessary tools.
Note that the x86_64-elf-* binaries must be on your system PATH before running any of the following gcc commands.
Quickstart: Building a C program
See the c_test directory for an example dummy C program. All it does is run a simple main() function that returns a constant value.
In short, building a C program requires the following steps:
make # 1. Build Theseus OS itself
make tlibc # 2. Build tlibc, Theseus's libc
make c_test # 3. Build a sample C program
make orun # 4. Run Theseus in QEMU (without rebuilding anything)
Running a C program
Once the C program's executable ELF file has been packaged into Theseus's ISO image, you can execute it in Theseus using the loadc application. Executables are automatically placed in the _executable namespace folder by default, so run the following in Theseus's shell:
loadc /namespaces/_executable/dummy_works
You should observe the return value displayed on the shell GUI, as well as various log messages that show output from tlibc alongside those from Theseus's kernel.
How does it all work?
The following sections describe how to set up the toolchain, how tlibc is built, and how C programs are compiled and linked.
Building GCC and Binutils to target Theseus (x86_64-elf)
We provide a script that does all of this for you; see scripts/install_x86_64-elf-gcc.sh, which can be invoked like so:
./scripts/install_x86_64-elf-gcc.sh $HOME/src $HOME/opt
In this document, we refer to two directories, both of which can be set to the location of your choice:
- $SRC: the directory that contains the source for gcc and binutils
  - For example, $HOME/src/
- $DEST: the directory that will hold our compiled gcc and binutils packages and libraries (where they will be installed)
  - For example, $HOME/opt/
(Instructions were taken from this tutorial on the OS dev wiki.)
1. Build a standalone version of GCC & Binutils
Install the required packages:
sudo apt-get install gcc build-essential bison flex libgmp3-dev libmpc-dev libmpfr-dev texinfo gcc-multilib
Download and build GCC/Binutils
For this tutorial, we'll use gcc version 10.2.0, released July 20, 2020, and binutils version 2.35.1, released September 19, 2020.
You can obtain them from many mirrors online, such as these:
- gcc: https://mirrors.kernel.org/gnu/gcc/gcc-10.2.0/
- binutils: https://mirrors.kernel.org/gnu/binutils/
Create a destination directory for the newly-built packages to be installed into:
mkdir $DEST
export PREFIX="$DEST/gcc-10.2.0"
Extract each source code package into the directory of your choice ($SRC), and then build binutils:
mkdir build-binutils
cd build-binutils
../binutils-2.35.1/configure --prefix="$PREFIX" --disable-nls --disable-werror
make -j$(nproc)
make install
Then go back to the $SRC directory and build gcc:
# Use a GCC script to download all necessary prerequisites
cd gcc-10.2.0
./contrib/download_prerequisites
cd ../
mkdir build-gcc
cd build-gcc
../gcc-10.2.0/configure --prefix="$PREFIX" --disable-nls --enable-languages=c,c++
make -j$(nproc)
make install
2. Build GCC and Binutils again, to cross-target Theseus (x86_64-elf)
Now that we have a standalone build of gcc/binutils that is independent from the one installed by your host system's package manager, we can use that to build a version of gcc that inherently performs cross-compilation for a specific target, in this case, our Theseus x86_64-elf target.
Note: these instructions are based on this tutorial from the OS dev wiki.
First, create a directory for the cross compiler to be built and installed into, e.g., $DEST/cross.
mkdir $DEST/cross
export PREFIX="$DEST/cross"
export TARGET=x86_64-elf
export PATH="$PREFIX/bin:$PATH"
Second, re-build the same binutils package as above, but in a way that configures it to target Theseus.
../binutils-2.35.1/configure --target=$TARGET --prefix="$PREFIX" --with-sysroot --disable-nls --disable-werror
make -j$(nproc)
make install
Confirm that your new cross-compiler binutils package exists and is on your system PATH:
which -- $TARGET-as
should output something like:
/home/my_username/opt/cross/bin/x86_64-elf-as
Then go back to the $SRC directory and build a version of gcc that cross-compiles C/C++ programs to Theseus.
mkdir cross-build-gcc
cd cross-build-gcc
../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" --disable-nls --enable-languages=c,c++ --without-headers
make all-gcc -j$(nproc)
make all-target-libgcc -j$(nproc)
make install-gcc
make install-target-libgcc
Before moving on, let's check to make sure our cross-compiled gcc is working.
$DEST/cross/bin/$TARGET-gcc --version
This should print out some information about your newly-built gcc. Add the -v flag to dump out even more info.
3. Re-building GCC without the default red-zone usage
Importantly, we must disable the red zone in gcc entirely. When invoking gcc itself, we can simply pass the -mno-red-zone argument on the command line, but that doesn't affect the cross-compiled version of libgcc itself. Thus, to avoid libgcc functions invalidly using the non-existent red zone in Theseus, we have to build a no-red-zone version of libgcc; only then can we successfully build and link C programs for Theseus without libgcc's methods trying to write to the red zone.
Note: instructions were adapted from this tutorial.
Adjusting the GCC config
First, create a new file named t-x86_64-elf within the gcc source tree at $SRC/gcc-10.2.0/gcc/config/i386/.
Add the following lines to that new file and save it:
MULTILIB_OPTIONS += mno-red-zone
MULTILIB_DIRNAMES += no-red-zone
Yes, even though we're building for x86_64, we put it in the original x86 architecture config folder called i386.
Then, instruct gcc's build process to use that new multilib configuration. Open the file $SRC/gcc-10.2.0/gcc/config.gcc and search for the following configuration lines, which start around line 1867 (for gcc-10.2.0):
x86_64-*-elf*)
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/x86-64.h"
;;
Add a line such that it looks like this:
x86_64-*-elf*)
tmake_file="${tmake_file} i386/t-x86_64-elf"
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/x86-64.h"
;;
Note: the indentation before tmake_file must be a TAB, not spaces.
Building GCC again with no red zone
Go back to the build directory, then reconfigure and re-make libgcc:
cd $SRC/cross-build-gcc
../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" --disable-nls --enable-languages=c,c++ --without-headers
make all-gcc -j$(nproc)
make all-target-libgcc -j$(nproc)
make install-gcc
make install-target-libgcc
To check that it worked, run the following two commands:
x86_64-elf-gcc -print-libgcc-file-name
x86_64-elf-gcc -mno-red-zone -print-libgcc-file-name
The first one should output a path to libgcc.a, and the second should output a similar path with no-red-zone as the containing directory:
$DEST/cross/lib/gcc/x86_64-elf/10.2.0/libgcc.a
$DEST/cross/lib/gcc/x86_64-elf/10.2.0/no-red-zone/libgcc.a
Appendix: How to use the no-red-zone version of GCC
To properly use this new version of GCC that cross-compiles to the Theseus target and disables the red zone, make sure you:
- use the x86_64-elf-gcc executable that now resides in $DEST/cross
- specify the -mno-red-zone flag, either on the command line or as part of LDFLAGS
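For example, compiling a single C source file with this toolchain might look like the following (the file names are placeholders):

```sh
# Compile one C file for Theseus with the red zone disabled.
$DEST/cross/bin/x86_64-elf-gcc -mno-red-zone -c hello.c -o hello.o
```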
tlibc: Compiling and Linking Theseus's libc
Warning: Support for building C programs atop Theseus is experimental and liable to change at any moment.
Theseus's libc implementation, tlibc, is a work in progress and currently a proof-of-concept library that's missing most standard libc functionality.
Building tlibc in a Theseus-compatible way
Most standard library and libc implementations are built as fully-linked static or dynamic libraries; in Rust terms, this corresponds to the staticlib or cdylib crate type (see more about crate types and linkage here).
This doesn't work well for Theseus for a few reasons.
First, because Theseus runs everything in a single privilege level, there is no clear point of separation between the lowest level of user code and the highest level of kernel code.
In conventional OSes, standard libraries use the system call interface to separate their code from the rest of the OS.
Therein, building against a specific OS platform is easy -- you simply define the system call interface and compile against any necessary header files.
There is no complex linking that needs to occur, since the lowest level of the dependency chain ends at the syscall assembly instruction, which makes the library self-contained from the linker's point of view.
Second, Theseus dynamically links raw object files at runtime, so we cannot easily create a fully statically-linked binary for a standalone C library because it won't know where its dependencies will exist in memory. Again, this is not a problem for standard libc implementations since it doesn't need to directly link against each specific syscall handler function.
Thus, we use the standard rlib crate type for tlibc and perform partial linking of the raw compiled object files ourselves:
ld -r -o tlibc/target/.../tlibc.o tlibc/target/.../deps/*.o
Alternatively, we could also use ar to create an archive of all of the object files, as shown below; there's not much of a functional difference between the two approaches, but some build tools prefer .a archives instead of .o object files.
ar -rcs tlibc/target/.../libtlibc.a tlibc/target/.../deps/*.o
We use the theseus_cargo tool (as described here) to ensure that tlibc is compiled against and depends on the correct version of crates and symbols from an existing Theseus build.
Using tlibc
Once we have the tlibc.o (or .a) file, we can use it to satisfy any C program's dependencies on basic libc functions/data. The next section describes how we use the tlibc file to build a standalone C executable that can run atop Theseus.
Compiling and Linking C programs
Warning: Support for building C programs atop Theseus is experimental and liable to change at any moment.
Currently, we use a custom form of "split" linking that is both fully-static at compile time and partially-dynamic at runtime.
1. Full static linking at build time
This procedure can be broken down into the following steps:
- Compile tlibc as a standard Rust crate into a series of object files,
- Combine or link those tlibc object files into a single archive or object file,
- Compile the basic C program, e.g., dummy.c,
- Statically link the C program to the prebuilt tlibc object to produce a standalone executable.
The first two steps were described previously; the latter two are described below.
To build and link a dummy C program against tlibc, the invocation of gcc currently requires several arguments to customize the compiler's behavior as well as its usage of the ld linker. The key arguments are shown and described below:
x86_64-elf-gcc \
-mno-red-zone -nostdlib -nostartfiles \
-ffunction-sections -fdata-sections \
-mcmodel=large \
-Wl,-gc-sections \
-Wl,--emit-relocs \
-o dummy_works \
path/to/crtbegin.o \
dummy.c \
path/to/tlibc.o \
path/to/crtend.o
The most important arguments are:
- -mno-red-zone, -mcmodel=large: match Theseus's Rust-side compiler configuration so the C code can properly invoke the Rust code.
- -Wl,--emit-relocs: include details in the ELF executable (.rela sections) about the specific relocation actions the linker performed.
After the above gcc command, we have a standalone executable ELF file named dummy_works.
2. Partial dynamic re-linking at runtime
The ELF executable dummy_works can immediately be loaded and executed atop Theseus, but it won't necessarily work properly and execute as expected. This is because it was fully statically linked, meaning that the executable includes duplicate instances of the same data and function sections that already exist in the loaded instances of Theseus crates in memory (cells). Most importantly, those data sections represent system-wide singleton states (static variables in Rust) that have already been initialized and are in active use by all other Theseus components. Thus, the data instances packaged into the executable have not been initialized and can't safely be used. Using those sections would result in multiple copies of data that's supposed to be a system-wide singleton; this would be bad news for most Theseus components, e.g., the frame allocator's system-wide list of free physical memory.
To solve this problem, we re-perform (overwrite) all of the relocations in the executable ELF file such that they refer to the existing sections already loaded into Theseus instead of the new uninitialized/unused ones in the executable itself. This only applies to sections that already exist in Theseus; references to new sections that are unique to the executable are kept intact, of course. The relocation information is encoded into the ELF file itself as standard .rela.* sections via the --emit-relocs linker argument shown above.
This procedure is currently performed by the loadc application; it also handles loading the ELF executable segments (program headers) and managing their metadata.
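Conceptually, this re-linking pass boils down to walking the executable's relocation entries and redirecting the ones whose target symbols already exist in Theseus. The sketch below illustrates that idea with made-up types and function names; it is not the actual loadc implementation.

```rust
// Sketch only: illustrative types and names, not the real `loadc` code.

/// A relocation entry parsed from one of the executable's .rela.* sections.
struct RelocationEntry {
    symbol_name: String, // the symbol this relocation refers to
    offset: usize,       // where in the loaded segment to apply it
}

/// Returns the address of `symbol_name` if it is already loaded in Theseus.
fn find_loaded_symbol(_symbol_name: &str) -> Option<usize> {
    None // a real implementation would query the current CrateNamespace
}

/// Writes the resolved address into the executable's loaded segment.
fn write_relocation(_target: usize, _value: usize) {
    // elided: perform the architecture-specific relocation write
}

/// Re-perform relocations so the executable uses Theseus's existing sections.
fn relink_executable(relocations: &[RelocationEntry], segment_base: usize) {
    for rela in relocations {
        if let Some(existing_addr) = find_loaded_symbol(&rela.symbol_name) {
            // The symbol already exists in a loaded Theseus cell: overwrite
            // the relocation so it points at that existing instance.
            write_relocation(segment_base + rela.offset, existing_addr);
        }
        // Otherwise, the section is unique to the executable, so the
        // relocation the linker already performed is kept intact.
    }
}
```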
Overview of Key Subsystems
Like every OS, Theseus's functionality can be categorized into multiple subsystems, which each implement a different core OS feature. This section covers the design and implementation of key subsystems in Theseus, such as memory management, multitasking support, graphical displays, etc.
Memory Management in Theseus
Memory management is one of the most unique aspects of Theseus's design, compared to how other OSes manage memory.
Single Virtual Address Space
As previously mentioned, Theseus is a Single Address Space (SAS) OS, meaning that all kernel entities, libraries, and applications are loaded into and execute within a single address space. This is possible due to careful design combined with isolation based on Rust's type and memory safety instead of hardware-based memory protection.
That being said, Theseus's single address space is a virtual address space, not a physical address space; all Rust-level pointers that are dereferenced and addresses that are accessed are virtual addresses. Although Theseus could technically operate directly on physical memory addresses without using virtual memory at all, the use of virtual addresses affords us many benefits, e.g., easier contiguous memory allocation, guard pages to catch stack overflow, etc.
Types and Terminology
Theseus uses precise, specific terminology and dedicated types to avoid confusion and correctness errors related to mixing up physical and virtual memory. The following table concisely describes the basic memory types with links to their source-level documentation:
| Description of Type | Virtual Memory Type | Physical Memory Type |
|---|---|---|
| A memory address | VirtualAddress | PhysicalAddress |
| A chunk of memory | Page | Frame |
| A range of contiguous chunks | PageRange | FrameRange |
| Allocator for memory chunks | page_allocator | frame_allocator |
Addresses
In Theseus, virtual and physical addresses are given dedicated, separate types that are not interoperable.
This is to ensure that programmers are intentional about which type of address they are using and cannot accidentally mix them up.
The constructors for VirtualAddress
and PhysicalAddress
also ensure that you cannot create an invalid address and that all addresses used across the system are canonical in form, which is based on the hardware architecture's expectations.
For 64-bit architectures, the set of possible VirtualAddress
es goes from 0
to 0xFFFFFFFFFFFFFFFF
, and all canonical addresses in that range can be used.
However, while the set of possible PhysicalAddress
es has the same range, there are large "holes" across the physical address space that correspond to non-existent, unusable physical addresses; the locations of specific holes are hardware-dependent and generally not known until the OS asks the bootloader at runtime.
Thus, you can be guaranteed that every canonical virtual address actually exists and is usable, but not every canonical physical address.
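For illustration, a hedged sketch of constructing and using these address types follows; the exact constructor names and return types in Theseus's memory crates may differ slightly, so treat this as an assumption-laden example rather than the real API.
fn address_example() -> Result<(), &'static str> {
    // Construct addresses through checked constructors; invalid or non-canonical
    // values are rejected instead of being silently accepted.
    let vaddr = VirtualAddress::new(0xFFFF_8000_0000_1000).ok_or("non-canonical virtual address")?;
    let paddr = PhysicalAddress::new(0x10_1000).ok_or("invalid physical address")?;
    // The two types are intentionally not interchangeable; this would be a compile-time error:
    // let wrong: VirtualAddress = paddr;
    let _ = (vaddr, paddr);
    Ok(())
}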
Page
s, Frame
s, and ranges
A chunk of virtual memory is called a Page
, while a chunk of physical memory is called a Frame
.
Page
s and Frame
s have the same size, typically 4KiB (4096 bytes) but ultimately dependent upon hardware.
These chunks are the smallest elementary unit that the hardware's Memory Management Unit (MMU) can operate on, i.e., they are indivisible from the hardware's point of view.
In other words, the MMU hardware cannot map any chunk of memory smaller than a single Page
to any chunk of memory smaller than a single Frame
.
A Page
has a starting VirtualAddress
and an ending VirtualAddress
; for example, a Page
may start (inclusively) at address v0x5000
and end (exclusively) at v0x6000
. Similarly, a Frame
has a starting PhysicalAddress
and an ending PhysicalAddress
, for example from p0x101000
to p0x102000
.
A Page
can be said to contain a VirtualAddress
within its bounds, and likewise a Frame
can be said to contain a PhysicalAddress
.
Although Page
s and Frame
s have internal numbers, we typically identify them by their starting address, e.g., "the page starting at v0x9000
" instead of "page 9".
Intrinsically, Page
s have no relation to PhysicalAddress
es, and similarly, Frame
s have no relation to VirtualAddress
es.
For convenience, Theseus provides dedicated "range" types to represent a contiguous range of virtual Page
s or physical Frame
s.
They are inclusive ranges on both sides; see our custom RangeInclusive
type that replaces Rust's built-in core::ops::RangeInclusive
type for more information.
These types implement the standard Rust [IntoIterator
] trait, allowing for easy iteration over all pages or frames in a range.
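As a rough usage sketch (the constructor names here are assumptions, not necessarily the exact API), a PageRange can be built from two Pages and iterated over directly:
fn range_example() {
    // Pages containing the two bounding virtual addresses; the range is inclusive on both ends.
    let start = Page::containing_address(VirtualAddress::new_canonical(0x5000));
    let end = Page::containing_address(VirtualAddress::new_canonical(0x8000));
    let range = PageRange::new(start, end);
    for page in range {
        // each `page` is one Page whose bounds lie within [v0x5000, v0x8000]
        let _ = page;
    }
}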
Theseus employs macros to generate the implementation of the above basic types,
as they are symmetric across the virtual memory and physical memory categories.
This ensures that the VirtualAddress
and PhysicalAddress
have the same interface (public methods); the same is true for Page
and Frame
, and so on.
Page and Frame allocators
Theseus's page allocator and frame allocator are effectively identical in their design and implementation, but the former allocates virtual memory Page
s while the latter allocates physical memory Frame
s.
While the underlying implementation may change over time, the general interface for both allocators is the same.
- You can request a new allocation of one or more
Page
s or Frame
s at any address, which will only fail if there is no remaining virtual or physical memory. - You can request that allocation to start at the
Page
or Frame
containing a specific address, but it will fail if the Page
or Frame
containing that address has already been allocated. - You can also specify the minimum number of bytes that the new allocation must cover, which will be rounded up to the nearest
Page
or Frame
granularity.- These allocators cannot allocate anything smaller than a single
Page
or Frame
; for smaller allocations, you would want to use dynamically-allocated heap memory.
- Both allocators support the concept of "reserved" regions of memory, which are only usable by specific kernel entities, e.g., memory handling functions that run at early boot/init time.
- The
frame_allocator
uses reserved regions to preserve some ranges of physical memory for specific usage, e.g., memory that contains early boot information from the bootloader, or memory that contains the actual executable kernel code.
- Both allocators also support early memory allocation before the heap is set up, allowing Theseus to bootstrap its dynamic memory management system with a series of small statically-tracked allocations in a static array.
The page allocator and frame allocator in Theseus do not directly allow you to access memory; you still must map the virtual Page
(s) to the physical Frame
(s) in order to access the memory therein.
We use more dedicated types to represent this, described below.
In Theseus, like other OSes, there is a single frame allocator because there is only one set of physical memory -- the actual system RAM connected to your computer's motherboard. It would be logically invalid and unsound to have multiple frame allocators that could independently allocate chunks of physical memory from the same single physical address space. Unlike other multi-address space OSes, Theseus also has a single page allocator, because we only have one virtual address space. All pages must be allocated from that one space of virtual addresses, therefore only a single page allocator is needed.
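A hedged sketch of what requesting allocations looks like; the exact function names and signatures in the page_allocator and frame_allocator crates may differ from what is assumed here.
fn allocation_example() -> Result<(), &'static str> {
    // Any 4 virtual pages, wherever the allocator finds free ones:
    let pages = page_allocator::allocate_pages(4).ok_or("out of virtual addresses")?;
    // Enough physical frames to cover 10,000 bytes, rounded up to whole frames:
    let frames = frame_allocator::allocate_frames_by_bytes(10_000).ok_or("out of physical memory")?;
    // Neither object lets you access memory yet; the pages must still be mapped to frames,
    // as described in the next section.
    let _ = (pages, frames);
    Ok(())
}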
Advanced memory types
While the above "basic" types focus on preventing simple programmer mistakes through type-enforced clarity, the below "advanced" types strive to prevent much more complex errors through type-specific invariants.
AllocatedPages
: a range of Page
s contiguous in virtual memory that have a single exclusive owner.- This can only be obtained by requesting an allocation from the
page_allocator
.
AllocatedFrames
: a range of Frame
s contiguous in physical memory that have a single exclusive owner.- This can only be obtained by requesting an allocation from the
frame_allocator
.
MappedPages
: a range of virtually-contiguous pages that are mapped to physical frames and have a single exclusive owner.- We will discuss
MappedPages
in the next section about memory mapping.
The main difference between the basic types and advanced types is that the advanced types guarantee exclusivity.
As such, if you have a PageRange
from v0x6000
to v0x8000
, there is nothing preventing another entity from creating a similar PageRange
that overlaps it, e.g., from v0x7000 to v0x9000
.
However, if you have an AllocatedPages
from v0x6000
to v0x8000
, you are guaranteed that no other entity across the entire system has an AllocatedPages
object that contains any of those pages.
That is a powerful guarantee that allows us to build stronger isolation and safety invariants when allocating, mapping, and accessing memory, as shown in the next section.
Mapping virtual memory to physical memory
In order to actually use and access memory, you must map a virtual address to a physical address. You cannot access a physical address directly because the CPU expects to work with virtual addresses that will be auto-translated to physical addresses by the MMU and other memory controller hardware1. Also, a virtual address is useless by itself; it doesn't have any content or purpose until you first map it to a real underlying physical address.
Memory mapping involves setting up a page table and establishing page table entries to represent the virtual-to-physical address mapping in a way that the MMU can understand. As long as a mapping exists in the currently-active page table, you can access the contents in real physical memory by accessing a virtual address that is mapped to it. Accessing memory simply means dereferencing a pointer at a particular virtual address; a pointer is effectively a typed virtual address.
Attempting to access a virtual address that is not currently mapped will result in a page fault, a CPU exception that temporarily halts normal execution flow to allow a page fault handler in the OS to attempt to set up an appropriate page table entry for the non-mapped virtual address. Using page faults is how on-demand paging is realized; Theseus currently does not do this because it goes against the PHIS principle.
Theseus's memory subsystem specifies three key types for mapping memory:
Mapper
: provides functions to map virtual memory to physical memory, which create and return MappedPages
objects.
PageTable
: a top-level page table, which owns the root frame of the page table and a Mapper
instance that uses that page table frame to create new page table entries.- This auto-dereferences into a
Mapper
, allowing all Mapper
functions to be called on a PageTable
object.
MappedPages
: an object that represents a range of virtually-contiguous pages that are mapped to physical frames and have a single exclusive owner.
The Mapper
type implements the core two functions for mapping memory in Theseus:
map_allocated_pages()
: maps a specific range of AllocatedPages
to frames that are allocated on demand by the system at a random physical address.- Useful if you have some virtual pages and just want to use them for general purposes, and you don't care which physical memory gets mapped to it.
map_allocated_pages_to()
: maps a specific range of AllocatedPages
to a specific range of AllocatedFrames
.- Useful if you need to access a hardware device or other memory that exists at a specific physical address.
The astute reader will notice that you can only map a range of exclusively-owned AllocatedPages
to a range of exclusively-owned AllocatedFrames
.
You cannot simply map a raw virtual address or page to a raw physical address or frame, they have to be specifically requested from the corresponding allocator and then mapped using the current active page table.
This is part of Theseus's solution to ensure that accessing any arbitrary region of memory is guaranteed safe at compile time.
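A hedged sketch of both mapping functions; the flag type, the error type, and the exact signatures below are assumptions based on the description above, not the verbatim API.
fn mapping_example(page_table: &mut PageTable) -> Result<MappedPages, &'static str> {
    // General-purpose memory: we don't care which physical frames back these pages.
    let pages = page_allocator::allocate_pages(2).ok_or("out of virtual addresses")?;
    let mapped = page_table.map_allocated_pages(pages, PteFlags::new().writable(true))?;
    // Device (MMIO) memory at a known physical address would instead use
    // frame_allocator::allocate_frames_at(...) together with map_allocated_pages_to(...).
    Ok(mapped)
}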
This assumes the CPU is in a standard mode like protected mode or long mode with paging enabled. The MMU can be disabled, after which virtual addresses do not exist and physical addresses can be directly used, but Theseus (and most other OSes) do not disable the MMU.
The MappedPages
type
The MappedPages
type represents a region of virtually-contiguous pages that are statically guaranteed to be mapped to real physical frames (which may or may not be contiguous).
A MappedPages
instance includes the following items:
- The range of
AllocatedPages
that it owns and that are currently mapped
- The permissions flags with which it was mapped, e.g., whether it is read-only, writable, cacheable, etc.
- The root page table under which it was mapped, for extra safety and sanity checks.
MappedPages
is the fundamental, sole way to represent and access mapped memory in Theseus; it serves as the backing representation for stacks, heaps, and other arbitrary memory regions, e.g., device MMIO and loaded cells.
Invariants and Safety Guarantees at Compile-time
The design of MappedPages
empowers the compiler's type system to enforce the following key invariants, extending Rust's memory safety checks to all OS-known memory regions, not just the compiler-known stack and heap.
- The mapping from virtual pages to physical frames must be one-to-one, or bijective.
- Each
Page
can only be mapped to one Frame
, and each Frame
can only be mapped by a single Page
, system-wide.
- A memory region must be unmapped exactly once, only after no outstanding references to it remain.
- A memory region must not be accessible beyond its bounds.
- A memory region can only be referenced as mutable or executable if mapped as such.
These invariants integrate nicely with Rust's existing memory safety rules, such as preventing multiple invalid aliases (aliasing XOR mutability), out-of-bounds access, use after free and double free, and forbidden mutable access.
How to use MappedPages
A key aspect of MappedPages
is its "access methods" that allow callers to safely reinterpret the underlying mapped memory region as a particular type.
Reinterpreting untyped memory is a crucial feature for any memory management subsystem; Theseus provides fully-safe interfaces to do so, while existing OSes do not.
Reinterpretive casting is sometimes also referred to as "struct overlay", as you're overlaying a struct on top of an existing memory region.
Access method name | Return type | Description |
---|---|---|
as_type() | &T | returns an immutable reference to a generic type T starting at a particular offset into the memory region. |
as_type_mut() | &mut T | same as as_type() , but returns a mutable reference. |
as_slice() | &[T] | returns a reference to a slice (dynamic-sized array) of N elements of type T , starting at a particular offset. |
as_slice_mut() | &mut [T] | same as as_slice() , but returns a mutable reference to a slice of type T . |
LoadedSection::as_func() | & <impl Fn(...)> | returns a reference to a function that exists within a LoadedSection 's memory region, which must be an executable .text section. |
These access methods ensure the aforementioned invariants of the MappedPages
type.
- The size of the generic type
T
, which must be known at compile time (T: Sized
), plus the offset must not exceed the bounds of the memory region.- The same is true for slices: the number of elements of a sized type
T: Sized
plus the offset must not exceed the region's bounds.
- If a mutable reference is requested, the underlying memory region must have been mapped as writable.
- The same is true for functions and executable memory regions.
- These methods all return references to the requested type or slice, in which the lifetime of the returned reference (
&T
) is dependent upon the lifetime of the MappedPages
object, in order to statically prevent use-after-free errors.- One cannot obtain an owned instance of a type
T
from an underlying MappedPages
memory region, because that would remove the semantic connection between the type T
and the existence of the underlying memory mapping.
In comparison, other OSes typically return raw virtual address values from a memory mapping operation, which you must then unsafely cast to a typed pointer of your choice. With raw addresses, there is no lifetime guarantee to ensure that the mapping persists for as long as those virtual addresses are used. As such, Theseus removes at compile time the potential to easily cause unsafe, undefined behavior by using a raw virtual address after it has been unmapped.
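For example, here is a hedged sketch of overlaying a struct onto a mapped MMIO region via as_type(); the struct, the offset, and the error type are assumptions made purely for illustration.
#[repr(C)]
struct DeviceRegisters {
    control: u32,
    status: u32,
}
fn read_status(mmio: &MappedPages) -> Result<u32, &'static str> {
    // Overlay DeviceRegisters at offset 0 of the mapped region.
    // The returned reference borrows from `mmio`, so it cannot outlive the mapping.
    let regs: &DeviceRegisters = mmio.as_type(0)?;
    Ok(regs.status)
}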
For more details, see the Theseus paper from OSDI 2020, or Kevin Boos's dissertation, both available here.
The MappedPages
type also exposes other convenient utility methods:
remap()
: change the permissions flags for the virtual pages, which are still mapped to the same physical frames.
merge()
: combine multiple contiguous MappedPages
objects into one.
unmap_into_parts()
: unmap the memory region and re-take ownership of its constituent parts (AllocatedPages
, etc) for future use.- Without calling this, a
MappedPages
object will be auto-unmapped upon drop and its constituent parts deallocated for future use, but that will happen behind the scenes without you being able to directly access them.
- You can also call any of the methods from
PageRange
as well.
Deallocating frames
Deallocating virtual pages is easy because the range of AllocatedPages
is directly stored in and owned by a MappedPages
object, so it is simply a matter of deallocating them once they are dropped.
However, deallocating a range of AllocatedFrames
is much more difficult because each page in a range of virtually-contiguous pages may be mapped to a different, non-contiguous frame.
This means we may have to deallocate many sets of AllocatedFrames
, up to one per page.
In existing OSes, there is no way to easily and immediately determine which frames are mapped to which virtual pages; this requires a reverse mapping from 1
frame to N
pages, which is prohibitively expensive to maintain.
As such, OS kernels typically run a periodic "garbage collection" thread on idle CPUs that sweeps the page tables to determine which frames can be reclaimed.
However, Theseus's design vastly simplifies the procedure of reclaiming unused physical frames for deallocation. The single address space design and guaranteed bijective (1-to-1) mappings mean that a frame is mapped exclusively by a single page; when that page is no longer mapped to that frame, the frame can be deallocated. We refer to this as exclusive mappings, and they are realized via a combination of several crates:
- When frame(s) are unmapped in the
page_table_entry
crate, it creates an UnmapResult
that may contain a set of UnmappedFrames
.- The primary function of interest is
PageTableEntry::set_unmapped()
.
- Using strong type safety, the
frame_allocator
is able to accept a set of UnmappedFrames
as a trusted "token" stating that the included frames cannot possibly still be mapped by any pages. It can therefore safely deallocate them.- Deallocation occurs seamlessly because an
UnmappedFrames
object can be converted into an AllocatedFrames
object, see here for details and source.
Heap: dynamic memory allocation
Heaps are used to dynamically allocate chunks of memory smaller than the granularity of one page.
Note: One can request a large allocation from the heap, but in Theseus it will be backed by an individually-created
MappedPages
object of newly-allocated pages and frames that are mapped to one another, so it's generally less efficient to use the heap for large allocations.
In Theseus, the primary purpose of the heap is to enable the usage of Rust's alloc
types, e.g., Box
, Arc
, Vec
, and other dynamically-allocated collections types.
Heap allocators must implement Rust's GlobalAlloc
trait in order to be used as the backing allocator behind these alloc
types.
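To make the shape of that interface concrete, below is an illustrative, never-freeing bump allocator over a fixed arena. This is only a hedged sketch of what implementing GlobalAlloc entails; it is not Theseus's actual heap, block_allocator, or multiple_heaps implementation, and every name in it is invented for the example.
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 64 * 1024;

// Illustrative only: hands out memory from a fixed arena and never reclaims it.
struct BumpHeap {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize, // offset of the next free byte within `arena`
}

// Sound to share across CPUs because `alloc` only performs atomic updates on `next`.
unsafe impl Sync for BumpHeap {}

unsafe impl GlobalAlloc for BumpHeap {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as usize;
        let mut offset = self.next.load(Ordering::Relaxed);
        loop {
            // Round the candidate start address up to the requested alignment.
            let start = (base + offset + layout.align() - 1) & !(layout.align() - 1);
            let end = start + layout.size();
            if end > base + ARENA_SIZE {
                return core::ptr::null_mut(); // arena exhausted; allocation fails
            }
            match self.next.compare_exchange_weak(offset, end - base, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return start as *mut u8,
                Err(actual) => offset = actual, // lost a race with another CPU; retry
            }
        }
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never reclaims memory; a real heap returns blocks to its free lists.
    }
}

// Registering an allocator like this (done once, by the real `heap` crate in Theseus)
// is what makes Box, Arc, Vec, etc. usable in a no_std kernel:
// #[global_allocator]
// static GLOBAL_HEAP: BumpHeap = BumpHeap {
//     arena: UnsafeCell::new([0u8; ARENA_SIZE]),
//     next: AtomicUsize::new(0),
// };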
Overview of Relevant Crates
heap
: the default heap implementation that offers a static singleton fixed-size block allocator.- This is the first heap initialized and created during early OS boot.
block_allocator
: the underlying allocator implementation that optimizes allocations of common power-of-two sizes, e.g., 8 bytes, 32 bytes, etc.- Uses the
linked_list_allocator
crate as a fallback for uncommon allocation sizes.
multiple_heaps
: a more complex allocator that implements multiple heaps of arbitrary sizes and usage patterns.- Each internal heap instance is based on a zone allocator, a modified version of the slab allocator from the
slabmalloc
crate. - Unused heap space can easily be transferred among different internal heap instances for rapid, efficient heap growth.
- Currently, one internal heap is created for each CPU core, with the core ID being used to identify and select which heap should be used for allocation.
- It is trivially easy to use
multiple_heaps
in a different way, such as per-task heaps or per-namespace heaps.
One unique aspect of Theseus's "combination" heap design is that the early heap, fully-featured heap, and per-core dedicated heaps are all combined into a single heap abstraction that can be accessed via a singleton global heap instance.
It starts out with the simple block allocator described above, and then, once more key system functionality has been initialized during OS boot, the switch_to_multiple_heaps()
function is invoked to transparently activate the more complex, per-core heap allocators.
Another unique aspect of heaps in Theseus is that all entities across the system use and share the same set of global heaps. This allows allocations to seamlessly flow and be passed among applications, libraries, and kernel entities without the need for inefficient and complex exchange heaps used in other SAS OSes.
Note: Theseus's combination heap design was implemented before Rust's
alloc
types supported non-global allocators and placement constructors. We haven't yet investigated how to break these heap instances down into individual allocators that can be used with specific allocation placement functions like
Box::new_in()
. If you're interested in working on this, please file an issue on GitHub or otherwise contact the Theseus maintainers.
Tasking Subsystem in Theseus
The tasking subsystem in Theseus implements full support for multitasking, in which multiple different tasks can execute concurrently1 atop a single set of shared resources, i.e., CPUs and memory.
Because Theseus is a single address space (SAS) OS, it does not have a dedicated address space for each task, and thus does not follow the classic POSIX/Unix-like "process" abstraction where a process is a group of threads that all execute the same program in the same address space. One could consider tasks in Theseus to be the same as threads in other systems; the terms "task" and "thread" can be used interchangeably. One could also consider the entirety of Theseus to be a single "process" in that all Tasks execute within the same address space, but the similarities between a process and Theseus end there.
In general, the interfaces exposed by the task management subsystem in Theseus follow the Rust standard library's model for threading, with several similarities:
- You can spawn (create) a new task with a function or a closure as the entry point.
- You can customize a new task using a convenient builder pattern.
- You can wait on a task to exit by
join
ing it. - You can use any standard synchronization types for inter-task communication, e.g., shared memory or channels.
- You can catch the action of stack unwinding after a panic or exception occurs in a task.
In this way, tasks in Theseus are effectively a combination of the concept of language-level green threads and OS-level native threads.
The Task
struct
There is one instance of the Task
struct for each task that currently exists in the system.
A task is often thought of as an execution context, and the task struct includes key information about the execution of a given program's code.
Compared to that of other OSes, the Task
struct in Theseus is quite minimal in size and scope,
because our state management philosophy strives to keep only states relevant to a given subsystem in that subsystem.
For example, scheduler-related states are not present in Theseus's task struct; rather, they are found in the relevant scheduler crate in which they are used.
In other words, Theseus's task struct is not monolithic and all-encompassing.
Theseus's task struct includes several key items:
- The actual
Context
of the task's on-CPU execution.- This holds the values of CPU registers (e.g., the stack pointer and other registers) that are saved and restored when context switching between this task and others.
- The name and unique ID of that task.
- These are used primarily for human readability and debugging purposes.
- The runnability (
RunState
) of that task, i.e., whether it can be scheduled in, and its current running status.- This includes the task's exit value, if it has exited.
- The namespace that task is running within; see
CrateNamespace
.- Running tasks in different
CrateNamespace
s is one way to partially mimic the separation offered by the standard process abstraction; see here for more info.
- The task's stack.
- The task's environment, e.g., its current working directory.
- A variety of states related to handling execution failures and the cleanup of a failed task.
- These are used for fault tolerance purposes like unwinding, task cleanup, and auto-restarting critical tasks upon failure.
Note: the source-level documentation for the
Task
struct describes each member field of the Task struct in much greater detail.
The Task struct itself is split into two main parts:
- Immutable state: things that cannot change after the initial creation of the Task.
- Mutable state: things that can change over the lifetime of the Task.
The immutable states are contained in the Task
struct itself, while the mutable states are contained within the TaskInner
struct.
Each Task struct contains a TaskInner
instance protected by a lock.
This design serves to significantly reduce locking overhead and contention when accessing the states of a Task in a read-only manner.
Note that certain mutable states, primarily runstate information, have been moved into Task
itself because they can be safely modified atomically, but in general, all other mutable states are placed within TaskInner
.
We are also careful to correctly specify the public visibility of state items within the Task
and TaskInner
structures to ensure that they cannot be modified maliciously or accidentally by other crates.
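A hedged structural sketch of this split follows; the field names and types are illustrative placeholders rather than Theseus's exact definitions, and the lock assumes a spinlock-style Mutex such as the spin crate's.
use core::sync::atomic::AtomicU8;
use spin::Mutex;

// Placeholder types so the sketch is self-contained.
struct Context;   // saved callee-saved registers, stack pointer, etc.
struct ExitValue; // completion value or kill reason

pub struct Task {
    // Immutable after creation:
    id: usize,
    name: &'static str,
    // Frequently-read mutable state, kept lock-free via atomics:
    runstate: AtomicU8, // encodes Initializing / Runnable / Blocked / Exited / Reaped
    // Everything else that mutates lives behind a single lock:
    inner: Mutex<TaskInner>,
}

struct TaskInner {
    saved_context: Option<Context>,
    exit_value: Option<ExitValue>,
    // ... stack, environment, fault-handling state, etc.
}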
The TaskRef
type
Tasks frequently need to be shared across many entities and subsystems throughout Theseus.
To accommodate this, Theseus offers the TaskRef
type, which is effectively a shared reference to a Task
, i.e., an Arc<Task>
.
There are several reasons that we introduce a dedicated newtype
instead of using an Arc<Task>
directly:
- To clarify all code related to task management.
- To control the visibility (public vs. private) of items in the Task struct.
- To guarantee that the task-local data area (per-task data) is properly set up when a new Task is created.
- A circular reference from a
Task
to its enclosing TaskRef
is necessary to realize task-local data; see the TaskLocalData
type for more details.
- To expose a limited set of functions that allow foreign crates to modify or query only certain task states, e.g.,
join
ing a task. - To greatly optimize comparison and equality tests between two Tasks.
- Two
TaskRef
s are considered equal if they point to the same underlying Task
struct. - This avoids having to apply comparison tests against all fields in the
Task
struct, a very expensive operation.
- To prevent other entities from obtaining direct access to a
Task
struct, or wrapping a Task
in a non-Arc
shared pointer type.
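Reusing the Task sketch from the previous section, the newtype and pointer-based equality described above might look roughly like this (a hedged sketch, not Theseus's exact definition):
use alloc::sync::Arc;

#[derive(Clone)]
pub struct TaskRef(Arc<Task>);

impl PartialEq for TaskRef {
    fn eq(&self, other: &Self) -> bool {
        // Equal iff both refer to the very same Task allocation;
        // no field-by-field comparison of the whole struct is needed.
        Arc::ptr_eq(&self.0, &other.0)
    }
}
impl Eq for TaskRef {}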
Note: while
TaskLocalData
offers a very basic form of per-task data, commonly known as Thread-Local Storage (TLS), full support for standard compiler-known TLS areas in the generated object code is a work in progress.
The global task list
Like all other OSes, Theseus maintains a global list of all tasks in the system.
Currently, this task list is stored as a map from a numeric task ID to a TaskRef
.
Tasks are added to the task list when they are initially spawned, and will remain in the task list for the entirety of their lifecycle. It is important to note that the presence of a task in the task list is not indicative of that task's runnability or execution status. A task is only removed from the task list once it has been reaped, i.e., it has completely exited and its exit value has been "taken" by another task; for example, a "parent" task may reap a "child" task that it has spawned.
Context switching
In OS terminology, the term "context switch" is often incorrectly overloaded and casually used to refer to any number of related topics:
- Switching from one thread to another thread in the same address space.
- Switching from one thread to another thread in a different address space.
- Switching from user mode to kernel mode (e.g., Ring 3 to Ring 0) during a system call.
- Switching from a user thread to a kernel thread upon an interrupt being triggered.
Only number 1 above is what we consider to be a true context switch, and that is what we refer to here when we say "context switch." Number 2 above is an address space switch, e.g., switching page tables, a different action that could potentially occur when switching to a different task, if the next task is in a different process/address space than the current task. Number 3 above is a mode switch that generally does not result in the full execution context being saved or restored; only some registers may be pushed or popped onto the stack, depending on the calling convention employed by a given platform's system calls. Number 4 above is similar to number 3, but is triggered by the hardware rather than userspace software, so more execution context states may need to be saved/restored.
One key aspect of context switching is that it is transparent to the actual code that is currently executing, as the lower layers of the OS kernel will save and restore execution context as needed before resuming. Thus, context switching (with preemption) allows for multiple untrusted and uncooperative tasks to transparently share the CPU, while maintaining the idealistic model that they each are the only task executing in the entire system and have exclusive access to the CPU.
Implementing context switching
The implementation for context switching in Theseus is split across several crates, with each crate corresponding to a particular subset of SIMD instructions being enabled.
The top-level context_switch
crate automatically selects the correct version based on which subset of SIMD functionality is chosen by the target hardware platform's specification.
For example, if SSE2 was enabled, #[cfg(target_feature = "sse2")]
would be true and the context_switch_sse2
crate would be used as the context switching implementation.
Currently, one can select this target by using the x86_64-unknown-theseus-sse
target while building Theseus:
make run TARGET=x86_64-unknown-theseus-sse
Theseus supports both SSE2 and AVX, but its default target x86_64-unknown-theseus
disables both.
This tells the compiler to generate soft floating-point instructions instead of SIMD instructions, meaning that the SIMD register set is not used at all.
Thus, disabling SIMD results in the simplest and fastest version of context switching, as only the basic set of general-purpose CPU registers must be saved and restored; all SIMD registers can be ignored.
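A hedged sketch of how the top-level crate can re-export one implementation based on target features; crate names other than context_switch_sse2 are illustrative assumptions.
// In the top-level `context_switch` crate:
#[cfg(target_feature = "avx")]
pub use context_switch_avx::*;
#[cfg(all(target_feature = "sse2", not(target_feature = "avx")))]
pub use context_switch_sse2::*;
#[cfg(not(target_feature = "sse2"))]
pub use context_switch_regular::*;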
Context switching is inherently an unsafe operation, hence why the standard context_switch()
function is marked unsafe.
It must be implemented in assembly in order to ensure that the compiler doesn't insert any instructions that modify register values in between our instructions that save/restore them.
It is only invoked by the task_switch()
function, as only that function has the proper information — saved register values and the destination for the restored register values — to correctly invoke it.
Pre-emptive vs. Cooperative Multitasking
Theseus implements full support for preemptive multitasking, in which a given task is interrupted at a certain periodic time interval in order to allow other tasks to take over execution.
This prevents one greedy task from hogging all system resources and/or starving other tasks from execution time.
In Theseus, like most other systems, we implement this using a timer interrupt that fires every few milliseconds on each CPU core.
The length of this time period is called the timeslice, and is a configurable setting.
Currently, this timer interrupt is set up in the LocalApic
initialization routine, which runs when each CPU core is discovered and initialized.
The interrupt handler itself is very simple, and is currently found in a function called lapic_timer_handler()
in kernel/interrupts/src/lib.rs
.
Cooperative multitasking is also possible, but at the moment Theseus does not offer an easy way to disable preemption to use only cooperative multitasking; however, it wouldn't be difficult to add.
Currently, tasks can choose to yield the processor to other tasks by invoking schedule()
, which will select another task to run next and then switch to it.
This scheduling function is the same function that is invoked by the aforementioned timer interrupt handlers to preempt the current task every few milliseconds.
The Task lifecycle
Theseus tasks follow a typical task lifecycle, which is in-part demonstrated by the possible variants of the RunState
enum.
- Initializing: the task is being created.
- After the task is finished being created and is fully spawned, its runstate will be set to Runnable.
- Runnable: the task is able to be scheduled in (but is not necessarily currently executing).
- A runnable task may be blocked to prevent it from being scheduled in until it is subsequently unblocked.
Note: "running" and "runnable" are not the same thing.
- A task is considered runnable after it has spawned and before it has exited, as long as it is not blocked.
- A task is only said to be running while it is currently executing on a given CPU core, and is merely considered runnable when it is waiting to be scheduled in whilst other tasks execute.
- Blocked: the task is blocked on some other condition and is not able to be scheduled in.
- A blocked task may be unblocked to mark it as runnable again.
- Exited: the task is no longer executing and will never execute again.
- An exited task has an
ExitValue
, which is one of:- Completed -- the task ran to completion normally and finished as expected.
- Killed -- the task stopped executing prematurely as a result of a crash (language-level panic or machine-level exception) or as a result of a kill request.
- An exited task must not be cleaned up until it has been "reaped" (see below).
- Reaped: the task has exited and its
ExitValue
has been taken by another Task
.- A task is cleaned up and removed from the system once it has been reaped.
- An exited task will be auto-reaped by the system if no other task is waiting on it to exit.
See
JoinableTaskRef
for more info on how this works.
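A hedged sketch of these lifecycle states as a Rust enum; the naming and payload details of the real RunState and ExitValue types may differ.
use alloc::boxed::Box;
use core::any::Any;

pub enum RunState {
    Initializing,      // being created; cannot run yet
    Runnable,          // can be scheduled in (not necessarily running right now)
    Blocked,           // waiting on some condition; cannot be scheduled in
    Exited(ExitValue), // finished; holds the exit value until reaped
    Reaped,            // exit value has been taken; the task can be fully cleaned up
}

pub enum ExitValue {
    Completed(Box<dyn Any + Send>), // the entry function's return value
    Killed(KillReason),             // why the task was prematurely stopped
}

pub struct KillReason; // placeholder for panic / exception / kill-request details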
Spawning new Tasks
The functionality to spawn (create) a new task is implemented in a separate spawn
crate.
Theseus provides a TaskBuilder
interface to allow one to customize the creation of a new task, which starts with a call to new_task_builder()
.
The caller must pass a function and argument when spawning a task: the function is the entry point for the task, and the argument will be passed to that entry point function.
Once the task builder has been used to suitably customize the new task, one must invoke the spawn()
method on the TaskBuilder
object to actually create the new Task
instance and add it to one or more runqueues.
This does not immediately execute the task, but rather only makes it eligible to be scheduled in during the next task switch period.
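A hedged usage sketch of this flow; new_task_builder and spawn are named in this chapter, while the name method, the error type, and the join call are assumptions about the API.
use alloc::string::String;

fn spawn_example() -> Result<(), &'static str> {
    let worker_task = spawn::new_task_builder(worker, 21usize)
        .name(String::from("worker"))
        .spawn()?; // the task is now Runnable, but not yet running
    // Later, another task can wait for it and take its exit value:
    // let exit_value = worker_task.join()?;
    let _ = worker_task;
    Ok(())
}

fn worker(arg: usize) -> usize {
    // the entry function receives the argument passed to new_task_builder
    arg * 2
}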
The function passed into the spawn routine is actually not the first function to run when the task is first switched to.
All new tasks have the same entry point, the task_wrapper()
function, which simplifies the procedure of jumping into a new task:
- Setting up the proper stack contents
- Invoking the task's entry function with the proper argument
- Catching panics and exceptions during unwinding
- Handling the task's states after it has exited, whether a completion or failure.
Cleaning up Tasks
The Task
struct implements Rust's Drop
trait, meaning that once all references to a Task have ended, the Task object itself can be dropped and cleaned up.
Because Tasks are more complex than most structures, the drop handler alone isn't sufficient to properly clean up and remove all traces of that task and the effects that it has had on the system.
The clean up procedure begins after a task exits, either by running to completion or being killed upon request or after encountering a panic or exception.
- If the task ran to completion, the
task_wrapper()
will call task_cleanup_success()
, a function that marks the task as having successfully exited and stores the returned ExitValue
in its Task
struct. - If the task did not finish, the
task_wrapper()
will call task_cleanup_failure()
, a function that marks the task as having encountered a failure, and stores the reason that it was prematurely killed in its Task
struct.
After handling the successful or failed exit condition, the last piece of the task lifecycle is the task_cleanup_final()
function, which removes the task from any runqueues it may be on, drops the final reference to that task, and then re-enables interrupts and yields the processor.
That final task reference, when dropped, triggers the aforementioned drop handler for the Task struct, which automatically releases all of its acquired states and allocated resources.
Note that the procedure of stack unwinding accomplishes the release of most resources and allocations, since those are represented by owned values that exist on the program's stack.
Note: there are a separate set of similar task lifecycle functions for critical system tasks that were spawned as restartable. The primary difference is that after the cleanup actions are completed, a new task is spawned with the same entry point and initial argument as the failed task.
Multitasking on a uniprocessor machine (with a single CPU core) is not truly concurrent, it just appears to be concurrent over a given time interval because the execution of multiple tasks is quickly interleaved. True concurrency can be achieved on a multiprocessor machine, in which different tasks execute simultaneously on different cores.
Invariants Upheld in Task Management
Theseus enforces several key invariants related to task management in order to empower the compiler to uphold memory safety and prevent resource leaks throughout the task lifecycle.
All task lifecycle functions leverage the same set of generic type parameters.
The trait bounds on these three type parameters <F, A, R>
are a key aspect of task-related invariants.
F
: the type of the entry function, i.e., its signature including arguments and return type.A
: the type of the single argument passed into the entry functionF
.R
: the return type of the entry functionF
.
Invariant 1. Spawning a new task must not violate memory safety.
Rust already ensures this for multiple concurrent userspace threads, as long as they were created using its standard library thread type.
Instead of using the standard library, Theseus provides its own task abstraction, overcoming the standard library’s need
to extralingually accommodate unsafe, platform-specific thread interfaces, e.g. fork()
.
Theseus does not offer fork because it is known to be unsafe and unsuitable for SAS systems1,
as it extralingually duplicates task context, states, and underlying memory regions without reflecting that aliasing at the language level.
Theseus’s task abstraction preserves safety similarly to and as an extension of Rust threads.
The interfaces to spawn a new task (in the spawn crate) require specifying the exact type of the entry
function F
, its argument A
, and its return type R
, with the following constraints:
- The entry function
F
must be runnable only once, meaning it must satisfy the FnOnce
trait bound. - The argument type
A
and return type R
must be safe to transfer between threads, meaning they must satisfy the Send
trait bound. - The lifetime of all three types must outlast the duration of the task itself, meaning they must have a
'static
lifetime.
All task lifecycle management functions are fully type-safe and parameterized with the same generic type parameters, <F,A,R>
.
This ensures both the compile-time availability of type information and the provenance of that type information from head to tail (spawn to cleanup) across all stages in the task's lifecycle.
Theseus thus empowers the compiler to statically prevent invalidly-typed task entry functions with arguments, return values, or execution semantics that violate type safety or memory safety.
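A hedged sketch of a spawn-like signature carrying these bounds; the real functions in the spawn crate have richer signatures and behavior, but share the same constraints on F, A, and R.
fn spawn_task<F, A, R>(entry: F, arg: A) -> Result<(), &'static str>
where
    F: FnOnce(A) -> R + Send + 'static, // runnable exactly once, sendable to another CPU, outlives the task
    A: Send + 'static,                  // the argument crosses into the new task
    R: Send + 'static,                  // the return value crosses back out as the exit value
{
    // ... create the Task, set up its stack so that `entry(arg)` runs inside the
    // task wrapper, and add it to a runqueue ...
    let _ = (entry, arg);
    Ok(())
}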
Invariant 2: All task states must be released in all possible execution paths.
Releasing task states requires special consideration beyond simply dropping a Task object to prevent resource leakage, as mentioned in the previous chapter. There are several examples of how the multiple stages of task cleanup each permit varying levels of resource release:
- The task's stack is used during unwinding and can only be cleaned up once unwinding is complete.
- The saved execution
Context
can be released when a task has exited. - A task's runstate and exit value must persist until it has been reaped.
The task cleanup functions described in the previous chapter demonstrate the lengths to which Theseus goes to ensure that task states and resources are fully released in both normal and exceptional execution paths.
In addition, as mentioned above, all cleanup functions are parameterized with the same <F, A, R>
generic type parameters,
which is crucial for realizing restartable tasks because the failure handler for a restartable task must know its specific type parameters for the entry function, argument, and return type in order to re-spawn a new instance of the failed task.
Invariant 3: All memory transitively reachable from a task’s entry function must outlive that task.
Although all memory regions in Theseus are represented by MappedPages
, which prevents use-after-free via lifetime invariants,
it is difficult to use Rust lifetimes to sufficiently express the relationship between a task and arbitrary memory regions it accesses.
The Rust language does not and cannot specify a task-based lifetime, e.g., &'task
, to indicate that the lifetime of a given reference is tied to the lifetime of the current task.
Furthermore, a Rust program running as a task cannot specify in its code that its variables bound to objects in memory are tied to the lifetime of an underlying MappedPages
instance, as they are hidden beneath abstractions like stacks, heaps, or static program sections (.text, .rodata, etc).
Even if possible, this would be highly unergonomic, inconvenient, and render ownership useless.
For example, all local stack variables would need to be defined as borrowed references with lifetimes derived from that of the MappedPages
object representing the stack.
Thus, to uphold this invariant, we instead establish a clear chain of ownership:
each task owns the LoadedCrate
that contains its entry function,
and that LoadedCrate
owns any other LoadedCrate
s it depends on by means of the per-section dependencies in the crate metadata.
As such, the MappedPages
regions containing all functions and data reachable from a task’s entry function are guaranteed
to outlive that task itself.
This avoids the unsavory solution of littering lifetime constraints across all program variables, allowing Rust code to be written normally with the standard assumption that the
stack, heap, data, and text sections will always exist.
Andrew Baumann, Jonathan Appavoo, Orran Krieger, and Timothy Roscoe. "A fork() in the road". In Proceedings of HotOS, 2019.
Display Subsystem
Warning: the display subsystem in Theseus is in need of complete redesign. It is inefficient and poorly implemented, as it was simply a means to the end of being able to interact with the system, and unfortunately has not been a focus of any significant effort.
How the Window Manager works
Design
Typically, both the application that owns/creates a window and the window manager that controls that window need to access it jointly. The application needs to display its content into the main part of the window, and the window manager needs information about the location and depth ordering of all windows to render them.
To share a window between an application and the window manager, the application holds a strong reference (Arc
) to the window, while the window manager holds a weak reference (Weak
) to that same window. This allows the window manager to control and manage a window without conceptually owning it.
We use a Mutex
to wrap each window to allow the application task and window manager task to safely access it jointly. However, Mutex
introduces the possibility of deadlock: when an application wants to access its window, it must acquire the Mutex
lock, operate on the window, and then release the lock. If the application doesn't release the lock on its window, the window manager will be forced to block until the lock is released, preventing it from performing typical operations like switching between windows, delivering events, or deleting windows.
To solve this problem, we define two structures: Window
and WindowInner
. WindowInner
only contains the information required by the window manager. The window manager holds a list of references to WindowInner
objects, while only the application owns the outer Window
object (which itself does contain a reference to the underlying WM-owned WindowInner
object). The Window
struct also contains other application-relevant states that describe the window.
The WindowInner
structure
The window_inner
crate defines a WindowInner
structure. It holds the state and methods needed to display the window on the screen.
A WindowInner
has a framebuffer to which it displays the content of the window. The framebuffer is generic over the type of pixel it contains. When the window is rendered to the screen, a compositor may composite each pixel differently according to that pixel type. Currently, we have implemented a standard RGB pixel and a pixel with an alpha (transparency) channel.
Both an application's window and the window manager have a reference to the same WindowInner
object. The application can configure and draw in the framebuffer and the manager can display and composite the window with others.
This structure also has an event producer. The window manager gets events from I/O devices such as keyboards and pushes them to the corresponding producer.
Window
A Window
object represents a window and is owned by an application. It contains its profile, a title, a consumer and a list of displayables. The consumer can get events pushed to the producer in its profile by the manager.
A Window
provides methods to display the displayables in it and render itself to the screen. The window manager is responsible for compositing it with other windows through a framebuffer compositor.
Displayables
The displayable
crate defines a Displayable
trait. A Displayable
is an item which can display itself onto a framebuffer. It usually consists of basic graphical elements and acts as a component of a window, such as a button or a text box. Currently, we have implemented a TextDisplay
which is a block of text. In the future, we will implement other kinds of displayables.
An application can own multiple displayables and display any type of Displayable
in its window.
The WindowManager
The window_manager
crate defines a WindowManager
structure. This structure keeps track of the active window, a list of shown windows, and a list of hidden windows; the hidden ones are completely overlapped by other windows. The structure implements basic methods to manipulate these lists, such as adding or deleting a window.
The WindowManager
structure contains a bottom framebuffer, which represents the background image, and a top (final) framebuffer, which holds the floating window border and the mouse cursor. When refreshing an area, it renders the framebuffers in the order: background -> hidden list -> shown list -> active -> top. It provides several methods to update only a rectangular area or a few pixels for better performance.
The structure defines a loop for generic events, a loop for keyboard events, and a loop for mouse events. Theseus initializes them as tasks to handle inputs. The window manager structure provides methods to operate on the window list in reaction to these inputs. It can move a window when we drag it with the mouse, or pass other events to the active window. The owner application of the active window can then handle those events.
The window_manager
crate owns a WINDOW_MANAGER
instance which contains all the existing windows. It invokes the methods of WindowManager
to manage these windows.
How to Create Windows and Display Content
Create a Window
An application invokes the Window::new()
function in the window
crate to create a new window. The function creates a new Window
object and adds a weak reference to its WindowInner
to the WINDOW_MANAGER
instance in window_manager
. It then returns the window to the application. Once the application terminates, the window it owns is dropped automatically, and the weak reference held by the window manager is removed.
Display in a Window
An application can create a Displayable
and invoke Window.display()
to display it. This method is generic and works for all kinds of displayables.
After displaying a displayable in its framebuffer, the window invokes its render()
method to render the updates to the screen. A framebuffer compositor will composite a list of framebuffers and forward the result to a final framebuffer which is mapped to the screen.
Handle Key Inputs
An application invokes Window.handle_event()
to handle the events sent to it. For example, an active window will receive all the key input events. An application can invoke Window.handle_event()
in a loop to handle these inputs from the keyboard.
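Putting the above together, an application's use of the window crate might look roughly like the following. Window::new, display, render, and handle_event are named in this chapter, but their exact signatures (constructor parameters, return and error types) are assumptions made for illustration only.
fn gui_app() -> Result<(), &'static str> {
    // Create a window; the window manager keeps only a weak reference to it.
    // (The real constructor likely takes a position, size, and other parameters.)
    let mut window = Window::new()?;
    // Draw a displayable into the window's framebuffer, then render the update to the screen.
    let mut text = TextDisplay::new()?;
    window.display(&mut text)?;
    window.render()?;
    // Handle events (e.g., key presses) delivered to this window while it is active.
    loop {
        window.handle_event()?;
    }
}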
Running Theseus on Virtual or Real Hardware
We have tested that Theseus runs properly in a variety of environments, currently on x86_64 only:
- Virtual machine emulators: QEMU, bochs, VirtualBox, VMware Workstation Player.
- Real hardware: Intel NUC devices, Supermicro servers, various Thinkpad laptops, PCs with Gigabyte motherboards.
Currently, the primary limiting factor is that the device must support booting via USB or PXE using traditional BIOS rather than UEFI; support for UEFI is a work in progress.
Note that as Theseus is not fully mature, booting on your own hardware is done at your own risk. Be sure that you have a backup of all of your important files before doing so.
If you experience a problem booting Theseus on any virtual or real hardware platform, please take a look at the open issues on GitHub to see if someone has already reported your problem or attempted to fix it. If so, leave a comment describing your experience or open a new issue to help the Theseus developers work towards supporting your hardware environment!
Running Theseus in a Virtual Machine
Using a virtual machine emulator is by far the easiest way to develop, test, and run Theseus.
QEMU
Our primary test environment and recommended emulator is QEMU, which is the default choice for running Theseus using our built-in Makefile commands.
For example, the make run
target automatically runs Theseus in a QEMU virtual machine after it finishes the build process.
The top-level Makefile specifies the configuration parameters for Theseus in QEMU, such as system memory, attached storage devices, serial log output, and more.
All of these parameters start with QEMU_
and can be overridden on the command line, or indirectly by setting environment variables such as net
or host
, or by editing the Makefile itself.
Bochs
In older versions of Theseus, we used both Bochs and QEMU for testing. Bochs is supported but its configuration may be out of date; the configuration is found in the bochsrc.txt
(direct link) file in the root repository directory.
Bochs runs quite slowly and supports virtualization of far fewer hardware devices than QEMU; thus, we do not recommend using it. However, you can try running Theseus in Bochs using the Makefile target for it:
make bochs
VMware Workstation Player
We have tested Theseus on VMWare Workstation and it generally works out of the box. However, there are some options that you may wish to enable to improve performance and offer access to more devices in Theseus.
First, download VMware Workstation Player here, which can be installed and used for free for non-commercial work.
On Linux, you will download a .bundle
file, which then needs to be executed in a terminal. For example:
chmod +x VMware-Player-<...>.bundle
sudo ./VMware-Player-<...>.bundle
After opening VMware Workstation Player, do the following:
- Click
Create A New Virtual Machine
. - In the New Virtual Machine Wizard window, choose
Use ISO image:
and browse to select the Theseus ISO image, which should be located in thebuild
directory, e.g.,build/theseus-x86_64.iso
. Then, clickNext
. - Under
Guest Operating System
, choose theOther
button and thenOther 64-bit
from the drop-down menu. Then, clickNext
. - Set the
Name:
field to "Theseus" or whatever you prefer. ClickNext
. - Disk size doesn't matter; click
Next
. - Click
Customize Hardware
, and then select the following settings:- 512MB of memory (less may work, but the minimum is on the order of 10-20 MB).
- 2 or more processor cores.
- Select
Virtualize CPU performance counters
if you want to use them (not required). - If you want to obtain Theseus's log output, then you need to add a serial port connection:
- Click
Add...
on the bottom left to add a new hardware type, thenSerial Port
. - Under
Connection
, selectUse output file:
and then choose a destination file name for the serial log to be written to. For example,/home/your_user/theseus_vmware.log
. - Click
Save
.
- Click
- Click
Finish
, and thenClose
.
Theseus should boot up after a few seconds. You can view the serial log output by cat
ting or opening the file:
cat /home/your_user/theseus_vmware.log
VirtualBox
We have tested Theseus on VirtualBox and it generally works out of the box. However, there are some options that you may wish to enable to improve performance and offer access to more devices in Theseus.
First, download VirtualBox here and install it on your system. On Ubuntu and other Debian-based Linux distributions, you will download a .deb
file that you can open in the Software Installer or install on the command line like so:
sudo dpkg -i virtualbox-<...>.deb
After opening VirtualBox, do the following:
- Click
New
. - In the Create Virtual Machine window, set
Type
toOther
andVersion
toOther/Unknown (64-bit)
, choose a name, and then clickNext
. - In the next window, choose 512MB of memory (less may work, but the minimum is on the order of 10-20 MB).
- Continue clicking next through all of the storage disk options, those do not matter.
- Back at the main window, right click on your new Theseus machine and choose
Settings
. - In the left sidebar, click
Storage
and then select the💿 Empty
option to choose an image for the optical disk.
Click on the💿▾
button on the right side of theOptical Drive:
option, selectChoose a disk file
, and then navigate to the Theseus ISO image in thebuild/
directory, e.g.,build/theseus-x86_64.iso
. - Under
System
in the left sidebar, go to theProcessor
tab and select 2 (or more) processors. - If you want to obtain Theseus's log output, then you need to add a serial port connection:
- Click
Serial Ports
in the left sidebar, under thePort 1
tab, select theEnable Serial Port
checkbox. - Under the
Port Mode
drop-down menu, selectRaw File
option. - In the
Path/Address
text box, type the destination file name for the serial log to be written to. For example,/home/your_user/theseus_vbox.log
. - Click
Ok
.
- Click
- In the main window, select the Theseus VM entry from the left sidebar and then click
Start
on the top bar.
Theseus should boot up after a few seconds. You can view the serial log output by cat
ting or opening the file:
cat /home/your_user/theseus_vbox.log
PCI passthrough of devices with QEMU
PCI passthrough can be used to allow a guest OS to directly access a physical device. The following instructions are a combination of this guide on host setup for VFIO passthrough devices and this kernel documentation on VFIO.
There are three main steps to prepare a device for PCI passthrough:
- Find device information
- Detach device from current driver
- Attach device to VFIO driver
Once these steps are completed, the device slot information can be passed to QEMU using the vfio flag. For example, for device 59:00.0, we run:
make run vfio=59:00.0
Finding device information
First, run lspci -vnn
to find the slot information, the kernel driver in use for the device, and the vendor ID and device code for the device you want to use.
Below is sample output for a Mellanox ethernet card we'd like to access using PCI passthrough:
59:00.0 Ethernet controller [0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]
Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:0008]
Flags: bus master, fast devsel, latency 0, IRQ 719, NUMA node 1
Memory at 39bffe000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at bf200000 [disabled] [size=1M]
Capabilities: <access denied>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
Detach device from current driver
To detach the device from the kernel driver, run the following command, filling in the `slot_info` and `driver_name` with the values you retrieved in the previous step:
echo $slot_info > /sys/bus/pci/drivers/$driver_name/unbind
In the above example, this would look like:
echo 0000:59:00.0 > /sys/bus/pci/drivers/mlx5_core/unbind
If you run `lspci -v` now, you'll see that a kernel driver is no longer attached to this device.
Attach device to VFIO driver
First, load the VFIO driver by running:
modprobe vfio-pci
To attach the new driver, run the following command, filling in the `vendor_id` and `device_code` with the values you retrieved in the first step:
echo $vendor_id $device_code > /sys/bus/pci/drivers/vfio-pci/new_id
e.g. echo 15b3 1019 > /sys/bus/pci/drivers/vfio-pci/new_id
Now, QEMU can be launched with direct access to the device.
Return device to the Host OS
To reset the device, you can either reboot the system or return the device to the host OS using the following commands (replacing `$slot_info` with the value previously retrieved):
echo 1 > /sys/bus/pci/devices/$slot_info/remove
echo 1 > /sys/bus/pci/rescan
Note: access for unprivileged users
To give an unprivileged user access to this VFIO device, find the IOMMU group that the device belongs to:
readlink /sys/bus/pci/devices/$slot_info/iommu_group
for example:
readlink /sys/bus/pci/devices/0000:59:00.0/iommu_group
for which we obtain the output below, in which `74` is the group number:
../../../../kernel/iommu_groups/74
Finally, give access to the current user via this command:
chown $USER /dev/vfio/$group_number
Running Theseus on Headless Systems Interactively
By default, Theseus expects to run in a standard desktop environment with a basic graphical display (monitor) and a real keyboard (and optionally, a mouse). For interacting with the system, Theseus uses the keyboard as its primary input and the graphical display as its primary output.
Theseus can also run in headless mode, in which the "head" of the computer (its monitor and keyboard) is absent and I/O is done over the network or a serial port connection. This is useful for environments like servers, embedded microcontrollers, certain virtual machine setups, and anything else without a display.
The current version of Theseus listens for incoming connections on serial ports only (COM1 and COM2). Upon receiving data (like a key press) on a serial port, Theseus will spawn a terminal emulator to handle I/O on that port.
Note: headless interactive mode can coexist simultaneously with regular graphical display mode.
TODO: describe the various options for disabling hardware graphical displays
Connecting a Terminal Emulator to Theseus
Currently, we have only tested this with virtual serial ports connected to Theseus through a VMM like QEMU, but it should also work with real hardware serial ports.
While Theseus is running in a VM or on another physical machine, we recommend using a terminal emulator program on the host machine to connect to the serial port device on the Theseus machine. Examples include:
screen
picocom
minicom
By default, the Theseus Makefile starts QEMU with two serial ports, COM1 and COM2, that the host machine can connect to and exchange data over.
The first serial port, COM1, is connected to the `stdio` streams of the terminal that spawned the QEMU process.
This stream is used for the system log and for controlling QEMU, so it is best to use COM2 to interact with Theseus separately such that the headless virtual terminal is not polluted with log statements.
The second serial port, COM2, is connected to a dynamically-allocated pseudo-terminal (PTY) that will be allocated by QEMU. To connect to it, inspect the QEMU output when it first starts to find a line like this:
char device redirected to /dev/pts/3 (label serial1-base)
This tells you that QEMU connected the second serial port (COM2) on the guest Theseus VM to the Linux PTY device at `/dev/pts/3`.
Note that QEMU uses 0-based indexing for serial ports, so its "serial1" label refers to the second serial port, our "SERIAL2" (COM2 on x86).
Now, once QEMU is running, you can connect a terminal emulator on the host to the serial port in Theseus, and Theseus will present an interactive terminal over that connection. To do so, run any of the following commands and then press any key to start the terminal prompt:
screen /dev/pts/3
picocom /dev/pts/3
minicom -D /dev/pts/3
Note that some programs (namely `minicom`) do not necessarily send the expected value when you press the Backspace key. Thus, if you experience unexpected behavior when pressing Backspace, ensure that your program sends the ASCII `DEL` (0x7F) character rather than ASCII `BS` (0x08), which will only move the cursor to the left by one character.
In our experience, `screen` and `picocom` work as expected, but `minicom` does not.
To change `minicom`'s default behavior, you can do the following:
- Press `Ctrl + A` twice to open the meta control bar at the bottom of the screen.
- Press `T` to open the "Terminal Settings" menu.
- Press `B` to toggle the "Backspace key sends" setting.
  - You want this to be `DEL`, not `BS`.
Either of these serial ports can be changed in QEMU using the environment variables `SERIAL1` and `SERIAL2`, respectively, though again, we recommend only using `SERIAL2` in virtual environments.
On real hardware, where there is only one serial port and thus COM1 must be used, you can either disable the log or initialize it with a different serial port (e.g., COM2) to avoid polluting the terminal emulator connected to COM1 with system log printouts.
Booting Theseus from a USB drive
To boot over USB, simply run
make boot usb=sdc
in which `sdc` is the device node for the USB disk itself (not a partition like `sdc2`).
The OS image (.iso file) will be written to that USB drive.
On WSL or other host environments where `/dev` device nodes don't exist, you can simply run `make iso` and burn the `.iso` file in the `build/` directory to a USB drive. For example, on Windows we recommend using Rufus to burn ISOs.
Then, once the bootable USB is ready, plug it into your PC, restart or power on the machine, and choose that USB device from the BIOS or legacy boot device screen.
Booting Theseus on Real Hardware via PXE
The following instructions are a combination of this guide on OSTechNix to set up PXE for Ubuntu and this guide by Andrew Wells on how to use any ISO with PXE.
PXE can be used to load Theseus onto a target computer that is connected by LAN to the host machine used for development. To set up the host machine for PXE, first make the Theseus ISO by navigating to the Theseus directory and running:
make iso
Then, you will need to set up a TFTP and DHCP server which the test machine will access.
Setting up the TFTP Server
First, install all necessary packages and dependencies for TFTP:
sudo apt-get install apache2 tftpd-hpa inetutils-inetd nasm
Edit the tftp-hpa configuration file:
sudo nano /etc/default/tftpd-hpa
Add the following lines:
RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"
Then, edit the `inetd` configuration file by opening the editor:
sudo nano /etc/inetd.conf
And adding:
tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot
Restart the TFTP server and check to see if it's running:
sudo systemctl restart tftpd-hpa
sudo systemctl status tftpd-hpa
If the TFTP server is unable to start and mentions an in-use socket, reopen the tftp-hpa configuration file, change the `TFTP_ADDRESS=":69"` line to use port `6969` instead, and then restart the TFTP server.
Setting up the DHCP Server
First, install the DHCP server package:
sudo apt-get install isc-dhcp-server
Then run `ifconfig` to view the available network devices and find your network device's name, e.g., `eth0`.
Edit the `/etc/default/isc-dhcp-server` configuration file and add the network device name from the step above to `INTERFACES`, e.g., `INTERFACES="eth0"`.
Configure an arbitrary IP address that will be used in the next step:
sudo ifconfig <network-device-name> 192.168.1.105
This command may need to be re-run each time the computer being used as the server is restarted.
Edit the `/etc/dhcp/dhcpd.conf` file by uncommenting the line `authoritative;` and adding a subnet configuration such as the one below:
subnet 192.168.1.0 netmask 255.255.255.0 {
range 192.168.1.20 192.168.1.30;
option routers 192.168.1.1;
option broadcast-address 192.168.1.255;
default-lease-time 600;
max-lease-time 7200;
}
allow booting;
allow bootp;
option option-128 code 128 = string;
option option-129 code 129 = text;
next-server 192.168.1.105;
filename "pxelinux.0";
Restart the DHCP server and check to see if it's running:
sudo systemctl restart isc-dhcp-server
sudo systemctl status isc-dhcp-server
Loading the Theseus ISO Into the TFTP Server
In order for the TFTP server to load Theseus, we need the Theseus ISO and a memdisk file in the boot folder. To get the memdisk file, first download syslinux, which contains it:
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/syslinux-5.10.tar.gz
tar -xzvf syslinux-*.tar.gz
Then navigate to the memdisk folder and compile it:
cd syslinux-*/memdisk
make memdisk
Next, make a TFTP boot folder for Theseus and copy the memdisk binary into it along with the Theseus ISO:
sudo mkdir /var/lib/tftpboot/theseus
sudo cp /root/syslinux-*/memdisk/memdisk /var/lib/tftpboot/theseus/
sudo cp /Theseus/build/theseus-x86_64.iso /var/lib/tftpboot/theseus/
Navigate to the PXE configuration file:
sudo nano /var/lib/tftpboot/pxelinux.cfg/default
And add Theseus as a menu option by adding the following:
label theseus
menu label Theseus
root (hd0,0)
kernel theseus/memdisk
append iso initrd=theseus/theseus-x86_64.iso raw
Finally, restart the DHCP server one more time and make sure it's running:
sudo systemctl restart isc-dhcp-server
sudo systemctl status isc-dhcp-server
On the target computer, boot into the BIOS, turn on Legacy boot mode, and select network booting as the top boot option. Once the target computer is restarted, it should boot into a menu which displays booting into Theseus as an option.
Subsequent PXE Uses
After setting up PXE the first time, you can run `make pxe` to make an updated ISO, remove the old one, and copy the new one into the TFTP boot folder. At that point, you should be able to boot the new version of Theseus by restarting the target computer. If there are issues restarting the DHCP server after it worked the first time, one possible solution is to confirm that the IP address is the one you intended, using the command from earlier:
sudo ifconfig <network-device-name> 192.168.1.105
The Golden Rule of Software Development
Code for others how you wish they would code for you.
What does this mean? You should adhere to the following principles.
- Good abstractions. Another developer using your code should never have to study the internals of the code itself; rather, they should be able to fully understand how to use your code simply from its struct/function names and documentation. Use intuitive names and try to design an interface that makes sense, is simple and easy to use, and doesn't surprise anyone with unnecessary trickery.
- Be clean. Write well-crafted, concise code with sufficient features to be useful, but without bloat. Adhere to code style conventions, including proper spacing, doc comments, naming conventions, etc.
- Foolproof code. Think carefully about how others will use your code, and design it thoughtfully to prevent them from making mistakes, ideally by catching misuse at compile time instead of runtime (see the sketch just after this list).
- Errors are important! Handle errors gracefully and thoroughly, and return detailed error messages that clearly describe the issue. Don't ever let something fail silently!
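To make the "foolproof code" principle concrete, here is a minimal Rust sketch of catching misuse at compile time rather than at runtime. The names (`ValidFrame`, `map_frame`, `max_frames`) are purely illustrative and are not actual Theseus APIs; the idea is that a checked constructor is the only way to build the type, so every downstream function can rely on its invariant.

```rust
/// A frame number that is guaranteed to lie within the bounds of physical memory.
/// (Illustrative only; not a real Theseus type.)
pub struct ValidFrame(usize);

impl ValidFrame {
    /// The *only* way to obtain a `ValidFrame` is through this checked constructor.
    pub fn new(frame: usize, max_frames: usize) -> Result<ValidFrame, &'static str> {
        if frame < max_frames {
            Ok(ValidFrame(frame))
        } else {
            Err("frame number exceeds the bounds of physical memory")
        }
    }
}

/// Because this function accepts a `ValidFrame` instead of a bare `usize`,
/// the compiler guarantees that every caller has already passed the bounds check.
pub fn map_frame(frame: ValidFrame) {
    let _frame_number = frame.0;
    // ... perform the mapping ...
}
```

Passing a raw, unchecked `usize` to `map_frame` simply won't compile, which is exactly the kind of mistake prevention this principle asks for.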
Below are some other good practices.
- Accurate metadata. In addition to good code and documentation, make sure to fill in additional metadata, such as the details present in each crate's `Cargo.toml` file: description, keywords, authors, etc.
- No "magic" numbers. Do not use literal number values that have no documentation or explanation of why they exist. For example, instead of just writing a value like 4096 in the code, create a `const` that accurately describes the semantic meaning of that value, e.g., `const PAGE_SIZE: usize = 4096;`. Magic numbers are terrible to maintain and don't help anyone who looks at your code in the future (a short example follows this list).
- Minimize global state. Remove static (global) state as much as possible, and rethink how the same data sharing can be done without globals.
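As a small illustration of the "magic numbers" rule, the sketch below contrasts an undocumented literal with the named `PAGE_SIZE` constant from the example above; the helper functions themselves are hypothetical.

```rust
/// The size of one page in bytes.
const PAGE_SIZE: usize = 4096;

// Unclear: why 4096 and 4095? Future readers have to guess.
pub fn bytes_to_pages_unclear(bytes: usize) -> usize {
    (bytes + 4095) / 4096
}

// Clear: the constant documents the value's meaning in one place,
// and changing it later only requires editing a single line.
pub fn bytes_to_pages(bytes: usize) -> usize {
    (bytes + PAGE_SIZE - 1) / PAGE_SIZE
}
```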
Rust-specific Guidelines
- Rust style. Follow proper Rust coding style and naming conventions. Use correct spacing, indentation, and alignment that matches the existing style. Make your code visually appealing, with spaces around operators like equals signs and plus signs, spaces after commas, etc. Punctuation is important for legibility!
- Rust documentation. Use proper rustdoc-style documentation for all structs, functions, and types. Make sure all of your documentation links are correct, and that you're using the correct rustdoc formatting for doc comments. Triple slashes `///` should be used above function and struct definitions, double slashes `//` for C-style inline comments (or block comments like `/* */`), and `//!` for crate top-level documentation. Use Markdown formatting to describe function arguments, return values, and usage examples, in a way consistent with Rust's official libraries.
- `Option`s and `Result`s. Use `Option`s and `Result`s properly. Don't use special values that have overloaded meanings, e.g., an integer in which `0` means no value, or something like that. Here's a good resource for better understanding error handling in Rust. An `Option` should be returned when an operation might fail, but that failure condition doesn't affect the rest of the system. For example, if you're searching for an element in a list, an `Option` is the suitable choice because the caller of your getter function would only call it in order to get and use the return value. A `Result` should be returned if something can fail or succeed and the caller needs to know whether it succeeded, but may not need the actual return value, e.g., an init function that returns void. In this case, `Result` is the best choice because we want to force the caller to acknowledge that the init function succeeded, or handle its error if it failed. In Theseus, `Result`s are mandatory when a function has a side effect, such as setting a parameter or value that might not exist or be initialized yet; in that case, a `Result` must be used to indicate whether the function succeeded (a sketch follows this list).
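The sketch below ties the last two points together: rustdoc comments (`//!` and `///`), an `Option` for a lookup whose "not found" case is expected, and a `Result` for an init-style function whose success the caller must acknowledge. All of the names here are hypothetical, not actual Theseus APIs, and for simplicity it uses `std` collection types.

```rust
//! A tiny, hypothetical crate used only to illustrate the guidelines above.

/// A registry that maps device names to device IDs.
pub struct DeviceRegistry {
    devices: Vec<(String, u32)>,
}

impl DeviceRegistry {
    /// Creates an empty registry.
    pub fn new() -> DeviceRegistry {
        DeviceRegistry { devices: Vec::new() }
    }

    /// Returns the ID of the device with the given `name`, if it exists.
    ///
    /// An `Option` fits here: "not found" is an expected outcome that the
    /// caller simply handles; it is not an error it must be forced to acknowledge.
    pub fn get_device_id(&self, name: &str) -> Option<u32> {
        self.devices
            .iter()
            .find(|(dev_name, _)| dev_name == name)
            .map(|(_, id)| *id)
    }

    /// Initializes the registry, e.g., by probing the system for devices.
    ///
    /// A `Result` fits here: the caller doesn't need a return value,
    /// but it must acknowledge whether initialization succeeded or failed.
    pub fn init(&mut self) -> Result<(), &'static str> {
        // (Hypothetical) device probing would populate `self.devices` here.
        self.devices.push(("example_device".to_string(), 0));
        Ok(())
    }
}
```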
Theseus-specific Guidelines
- Handle `Result`s properly and fully. Don't ignore a `Result` error; instead, log that error and then handle it if possible. If you cannot handle it, return that error to the caller so they can attempt to handle it. NEVER SILENCE OR LAZILY HIDE ERRORS. (A short sketch follows this list.)
- Never use unsafe code. If you absolutely cannot avoid it, then you should review your case on an individual basis with the maintainers of Theseus. In most cases, unsafe code is not necessary and can be rewritten in safe code.
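For example, the hedged sketch below (hypothetical function names; it assumes a dependency on the standard `log` crate for the `error!`/`info!` macros) logs an error for diagnostics and then propagates it to the caller instead of silencing it:

```rust
use log::{error, info};

/// A hypothetical fallible operation.
fn read_config() -> Result<String, &'static str> {
    Err("config device not yet initialized")
}

/// Logs failures and propagates them to the caller; never hides them.
pub fn setup() -> Result<(), &'static str> {
    // Bad:  let _ = read_config();      // silently discards the error
    // Bad:  read_config().unwrap();     // panics instead of handling it
    // Good: log the error for diagnostics, then let the caller decide what to do.
    let config = read_config().map_err(|e| {
        error!("setup(): failed to read config: {}", e);
        e
    })?;
    info!("setup(): loaded config: {}", config);
    Ok(())
}
```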
Adding New Functionality to Theseus
The easiest way to add new functionality is just to create a new crate by duplicating an existing crate and changing the details in its new `Cargo.toml` file. At the very least, you'll need to change the `name` entry under the `[package]` heading at the top of the `Cargo.toml` file, and you'll need to change the dependencies for your new crate.
If your new kernel crate needs to be initialized, you can invoke it from the `captain::init()` function, although there may be more appropriate places to do so, such as the `device_manager`'s functions for initializing device drivers.
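As a rough sketch, a new kernel crate's `lib.rs` often just exposes a fallible `init` function that such an init path can call. Everything below is hypothetical: the crate name, the `#![no_std]` attribute, and the assumed dependency on the `log` crate; the exact signature your crate needs depends on where it gets invoked.

```rust
//! `my_new_driver`: a hypothetical new kernel crate for Theseus.
#![no_std]

use log::info;

/// One-time initialization for this crate, intended to be invoked from an
/// init path such as `captain::init()` or a `device_manager` routine.
pub fn init() -> Result<(), &'static str> {
    // ... probe hardware, register handlers, spawn tasks, etc. ...
    info!("my_new_driver initialized");
    Ok(())
}
```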
If you want to create a new application for Theseus, see those instructions here.
Advice for Contributing and using git
The main Theseus repository, `theseus-os/Theseus`, is what we call the upstream. To contribute, you should create your own fork of that repository through the GitHub website, and then check out your own fork. That way, your fork will be the `origin` remote by default, and you can then add the upstream as another remote by running:
git remote add upstream https://github.com/theseus-os/Theseus
Never push to the main branch
Currently, the main branch on the upstream, `theseus-os/Theseus/theseus_main`, is protected from direct pushes.
This is true even for GitHub users who are in the `theseus-os` organization and have write access to the Theseus repo.
The only way to contribute to it is by merging a pull request into the main branch, which only authorized users can do.
Instead, check out your own fork as above, create a new branch with a descriptive name, e.g., `kevin/logging_typo`, develop your feature on that branch, and then submit a pull request.
This is a standard Git workflow that allows people to review your code, check for pitfalls and compatibility problems, and make comments and suggestions before the code makes its way into the main branch.
You must do this for all changes, even tiny ones that may seem insignificant.
Submitting a pull request
To submit a pull request (PR), go to the GitHub page of your forked Theseus repo,
select the branch that you created from the drop-down menu, and then click "New pull request". By default, GitHub will create a new PR that merges your branch into the upstream `theseus_main` branch, which is usually what you want to do.
Now, give your PR a good title and description, scroll down to review the commits and files changed,
and if everything looks good, click "Create pull request" to notify the maintainers that you have contributions that they should review.
Review your own work
Perform an initial review of your own code before submitting a pull request.
Please don't place the whole burden of fixing a bunch of tiny problems on the others who must review your code. This includes building the documentation and reviewing it in HTML form in a browser (`make view-doc`) to make sure everything is formatted correctly and that all hyperlinks work.
Double-check commit contents
When making a commit, review your changes with `git status` and `git diff`, as well as on the GitHub comparison page, to ensure that you're not committing accidental modifications or editing files that you shouldn't be.
This makes the maintainers' lives a lot easier, meaning your PR is more likely to be accepted.
You don't need to worry about having too many small commits, as we will squash (combine) all of your PR's commits into a single large commit when merging it into the upstream main branch.
Papers and Presentations about Theseus
Over the years, Kevin Boos, Ramla Ijaz, and other Theseus collaborators have given many presentations about Theseus OS and related topics. This page offers a selected collection of the slide decks from those talks (including some video recordings), as well as a list of selected peer-reviewed academic publications and theses.
Selected Papers and Theses
- [OSDI 2020] Theseus: an Experiment in Operating System Structure and State Management
- The main paper describing Theseus's design principles, implementation, evaluation, and limitations.
- Paper (PDF) — OSDI 2020 Video Talk — OSDI 2020 Short Video — Slides (PDF)
- [KISV 2023] Leveraging Rust for Lightweight OS Correctness
- A paper about extending intralingual design and applying a hybrid approach (type systems plus formal verification) for proving lightweight correctness guarantees about Theseus's memory management subsystem.
- Paper (PDF)
- Kevin Boos PhD Dissertation: Theseus: Rethinking Operating Systems Structure and State Management
- Ramla Ijaz Master's Thesis: Exploring Intralingual Design in Operating Systems
Other published works
- [OSDI 2022] Poster: Correct and Performant Device Drivers via Intralingual Design
- An overview of in-progress work to use formal verification + intralingual design for better device drivers.
- Poster PDF
- [PLOS 2017] Theseus: A State Spill-free Operating System
- Paper PDF — a shorter, outdated ideas paper about early Theseus design.
- Superseded by the OSDI 2020 paper.
- [EuroSys 2017] A Characterization of State Spill in Modern Operating Systems
- Introduces and studies the concept of state spill.
- Motivation for our future work on Theseus.
Selected Presentations and Slide Decks
- Theseus: a Rust-native OS for Safety and Reliability (Sept 2023) – [Video talk]
- How Theseus uses Rust, plus Rust challenges (early 2022)
- How Safe-language OSes work, with Theseus examples — [Video Talk]
- Overview of Theseus Design (Late 2020)
- A Programmer's Introduction to Theseus
- Kevin Boos PhD Defense: Deep Dive into Theseus OS Research — [Video Talk]
- Ramla Ijaz Master's Defense: Deep Dive into Intralingual Design in Theseus
- Crate Loading/Management and Crate Namespaces in Theseus
- Supporting Unwinding and Exceptions/Panics in Theseus (Late 2019)
- Implementing Basic Application Support for Theseus (mid-2018)
- Supporting Multicore Processors on Theseus
- Realizing Dynamic Loading and Linking of Crates (Late 2017)
Theseus README + Quick Start
Click here to see the main Theseus README for quick start instructions.