Porting wasmtime-runtime, the key Wasmtime crate
A previous post from late 2021 chronicled our ongoing journey to port Wasmtime to Theseus.
While our bottom-up approach got off to a strong start, we quickly encountered our first major challenge when examining the wasmtime-runtime crate, as it contains many dependencies on platform-specific and legacy system interfaces:
- Unix-like memory mapping and protection
- Signal/trap handling
- Thread-local storage
- Stack introspection and backtracing
- File and I/O abstractions
- Exception (panic) handling and unwinding resumption
This post describes our progress over a few weeks of working to add these features to Theseus in order to support wasmtime-runtime's many complex dependencies.
Porting & reorganizing third-party libraries
We first re-organized Theseus's repository to include two folders for third-party crates:
- libs/: contains standalone crates that don't depend on Theseus, but can be used by Theseus and others.
- ports/: contains crates that have been ported to depend directly on Theseus-specific crates, e.g., those in kernel/, and are thus not standalone.
The main features ported over the past couple of months (early winter 2021-2022) are shown in the table below:
Crate / Feature | Summary | Reason Needed for wasmtime-runtime |
---|---|---|
libc | Rust wrapper around the actual platform-specific libc implementation | Used to establish memory mappings and register signal handlers |
region | Cross-platform APIs for virtual memory functions | Used to allocate large chunks of memory and remap/protect memory regions as exec/read/write as needed |
TLS | Thread-Local Storage areas | Used for the thread_local!() macro, which is needed to handle traps and stack unwinding upon exceptions that occur while executing native code that was JIT-compiled from a WASM binary |
object | Helper crate for reading/writing object files, e.g., ELF | Used to read object files generated by cranelift's backend and to manage unwind info |
Porting libc to Theseus
Support for a minimal subset of libc functionality has been an ongoing but low-priority effort, mostly for two reasons:
- Running C code directly on Theseus is inherently unsafe, as C is not a safe language and can thus violate Theseus's language safety-based isolation and resource usage guarantees.
- No crates that Theseus depends on have needed an underlying libc, thus Theseus as a platform did not need to offer one... until now.
Theseus's libc implementation is called tlibc, which is described in this book chapter.
So far it has been loosely based on the Redox project's relibc.
Our efforts of late were focused on supporting mmap for POSIX-style memory mappings, which Theseus has traditionally eschewed because they are unsafe and poorly designed from a state management perspective [1].
In the future, we may also support POSIX-style signal handlers, but for now we have chosen to re-implement Wasmtime's signal handling directly in safe Rust atop Theseus rather than going through an unsafe libc FFI boundary for no good reason.
The bulk of the mmap implementation for tlibc was added in commit fffda85.
The key aspects of this are:
- tlibc exposes a POSIX-style mmap() function that calls Theseus APIs to instantiate new MappedPages objects, and then saves them in a private list so that they aren't dropped until the corresponding munmap() call is invoked.
  - This is required because Theseus's abstraction of a virtual memory mapping, MappedPages, is auto-unmapped upon drop to guarantee safety.
- Currently, mlock and munlock are dummy functions because Theseus doesn't perform any swapping or paging to disk.
- Memory protection (mprotect) is offered, but is currently limited because Theseus forces all current memory mappings to be marked as "present" in the page table.
  - Thus, stripping read permissions from a mapping technically works, but it violates the guarantees of the MappedPages type, i.e., the mapping is present and valid for the entire lifetime of a MappedPages object.
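To make the "private list" idea concrete, here is a minimal sketch of how an mmap()/munmap() pair can keep drop-to-unmap objects alive. It is not the tlibc code: the MappedPages struct below is only a stand-in for Theseus's real type, and MappingTable, mapped_len, and the bump-style address assignment are hypothetical names and simplifications invented for illustration.

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 4096;

/// Stand-in for Theseus's `MappedPages` (assumption: the real type unmaps
/// its pages when dropped; here `drop` does nothing observable).
struct MappedPages {
    len: usize,
}

impl Drop for MappedPages {
    fn drop(&mut self) {
        // In Theseus, this is where the pages would actually be unmapped.
    }
}

/// Hypothetical private table of live mappings, keyed by start address.
/// Holding the `MappedPages` here is what keeps the mapping alive.
struct MappingTable {
    live: HashMap<usize, MappedPages>,
    next_addr: usize,
}

impl MappingTable {
    fn new() -> Self {
        Self { live: HashMap::new(), next_addr: 0x1000 }
    }

    /// mmap-like: create a mapping, stash it, and hand back only an address.
    fn mmap(&mut self, len: usize) -> Option<usize> {
        if len == 0 {
            return None;
        }
        // Round the length up to a whole number of pages.
        let rounded = len.checked_add(PAGE_SIZE - 1)? & !(PAGE_SIZE - 1);
        let start = self.next_addr;
        self.next_addr = self.next_addr.checked_add(rounded)?;
        self.live.insert(start, MappedPages { len: rounded });
        Some(start)
    }

    /// Length of the live mapping starting at `addr`, if there is one.
    fn mapped_len(&self, addr: usize) -> Option<usize> {
        self.live.get(&addr).map(|mp| mp.len)
    }

    /// munmap-like: removing the entry drops the `MappedPages`,
    /// which is the point at which the mapping would go away.
    fn munmap(&mut self, addr: usize) -> bool {
        self.live.remove(&addr).is_some()
    }
}
```

The design point is that the caller only ever sees a raw address, so something else must own the mapping object until munmap() is called; a second munmap() on the same address simply finds nothing to remove.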
While we were at it, we went even further with additional improvements to theseus_cargo, libc, and tlibc to facilitate integration of Rust and C code atop Theseus.
- Each time we implement a new feature in tlibc, we must also update the libc crate to expose it. Here's an example of that for mmap, with some testing functions included.
- theseus_cargo now supports building out-of-tree components that depend on both Rust and C code, e.g., native libraries.
  - It also now supports building rlib and staticlib crate types.
- We added basic stdio features to tlibc, e.g., for printf, which is useful for testing purposes.

In summary, we fixed all the issues with tlibc, libc, and theseus_cargo. Now, Rust and C code (both in-tree and out-of-tree components) can all be compiled and loaded/linked together in Theseus.
Porting region to Theseus
With tlibc now supporting basic libc memory mapping functions, porting the region crate was fairly straightforward.
Importantly, however, we chose not to force region on Theseus to depend on tlibc, mainly because it would introduce another layer of unsafety.
The primary implementations of alloc and free are here; they are similar to the mmap() implementation in tlibc.
We also must express the region::Protection type in terms of Theseus's page table EntryFlags, which was generally straightforward.
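As a rough illustration of that translation, the sketch below converts read/write/execute protection bits into x86_64-style page table entry bits. Both types here are simplified stand-ins written for this example, not the actual region::Protection or Theseus EntryFlags definitions, and to_entry_flags is a hypothetical helper name. The one behavior taken from this post is that every live mapping stays "present", so a request without read permission cannot be honored faithfully.

```rust
/// Simplified stand-in for `region::Protection` (assumption: a bit set
/// of read/write/execute permissions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Protection(u8);

impl Protection {
    const NONE: Protection = Protection(0);
    const READ: Protection = Protection(1 << 0);
    const WRITE: Protection = Protection(1 << 1);
    const EXECUTE: Protection = Protection(1 << 2);

    fn contains(self, other: Protection) -> bool {
        self.0 & other.0 == other.0
    }

    fn union(self, other: Protection) -> Protection {
        Protection(self.0 | other.0)
    }
}

/// Simplified stand-in for Theseus's `EntryFlags`, using the standard
/// x86_64 page table entry bit positions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct EntryFlags(u64);

impl EntryFlags {
    const PRESENT: u64 = 1 << 0;
    const WRITABLE: u64 = 1 << 1;
    const NO_EXECUTE: u64 = 1 << 63;
}

/// Hypothetical conversion helper.
fn to_entry_flags(prot: Protection) -> EntryFlags {
    // Every live mapping is kept present, so readability is implied
    // even when `prot` does not include READ.
    let mut bits = EntryFlags::PRESENT;
    if prot.contains(Protection::WRITE) {
        bits |= EntryFlags::WRITABLE;
    }
    // x86_64 expresses executability negatively: set NX unless EXECUTE
    // was requested.
    if !prot.contains(Protection::EXECUTE) {
        bits |= EntryFlags::NO_EXECUTE;
    }
    EntryFlags(bits)
}
```

Note that Protection::NONE and Protection::READ map to the same flags in this sketch, which is the same limitation described for mprotect in tlibc.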
The one tricky part of region that we disliked is QueryIter, which allows the caller to query all virtual memory areas across the entire current virtual address space to find ones that span or overlap with a certain range of addresses.
This is problematic for a few reasons:
- Theseus's state management philosophy dictates that it does not maintain a centralized list of all memory mappings, so there's nothing to iterate over by default.
- Theseus provides a very safe and clear API for interacting with memory mappings, which region::query_range() completely ignores because it assumes a POSIX-style virtual memory API.
- Theseus strives to prevent TOCTTOU attacks by avoiding the concept of handles that point to a resource indirectly. By design (and by necessity atop conventional OSes), QueryIter separates the "time of check" from the "time of use", leading to potentially confusing behavior and errors in which a memory region returned from a query no longer exists by the time one attempts to use it.
In the end, our solution was to allow QueryIter to expose and return only references to the memory areas already created by the region crate itself. This mitigates the safety issues that could arise from exposing all memory regions maintained by Theseus to higher-level Rust code that may use them unsafely through the region APIs.
Hopefully this feature restriction doesn't pose a problem in the future.
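The restricted query can be pictured as a filter over a list that the crate itself owns, rather than a walk over the whole address space. The sketch below shows that shape only; Region, OwnedRegions, and this query_range signature are hypothetical and do not match the real region crate's types.

```rust
/// Hypothetical record of one memory area created through the crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Region {
    start: usize,
    len: usize,
}

impl Region {
    fn end(&self) -> usize {
        self.start + self.len
    }

    /// True if this region overlaps the half-open range [start, start + len).
    fn overlaps(&self, start: usize, len: usize) -> bool {
        self.start < start + len && start < self.end()
    }
}

/// Hypothetical registry holding only the areas this crate allocated.
struct OwnedRegions {
    regions: Vec<Region>,
}

impl OwnedRegions {
    /// Yields references to owned regions overlapping the given range.
    /// Mappings created elsewhere in the system are never visible here.
    fn query_range(&self, start: usize, len: usize) -> impl Iterator<Item = &Region> + '_ {
        self.regions.iter().filter(move |r| r.overlaps(start, len))
    }
}
```

Because the iterator borrows the registry, the returned references cannot outlive it, which in this sketch narrows (though does not eliminate) the gap between checking for a region and using it.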
Supporting Thread-Local Storage on Theseus
Thread-Local Storage (TLS) allows one to declare a variable that will be instantiated on a per-thread basis, with each thread having its own local copy that other threads cannot access.
This is useful for many reasons, e.g., programming convenience, performant access to thread-specific data without locking, etc.
Our motivation for finally supporting it in its ultimate flexible form, the ELF standard TLS areas, stemmed from wasmtime-runtime, which uses it in myriad ways.
Note: previously, Theseus offered a cheap imitation of TLS using the GS register to store limited, targeted data about each task, but it wasn't usable by any applications, libraries, or even other non-task kernel crates.
We implemented TLS support across several commits. This was a surprisingly complex and tricky implementation that required a lot of trial-and-error experimentation to determine how to correctly lay out each TLS object in the per-task TLS area.
Another complicating factor is that Theseus loads and links all crates at runtime, which means that our implementation must support both statically-linked TLS areas from the base kernel image as well as newcomers found in dynamically-loaded crates. There are a lot of tradeoffs here relating to reserving and allocating offset ranges in the TLS space for TLS data sections, tracking TLS data sections per namespace and per crate, etc.; these are best saved for a separate post about TLS.
We went a step further by implementing Rust's thread_local!() macro for any Theseus crate, which offers lazy dynamic initialization and cleanup of TLS areas.
This overcomes the limitations of standard ELF TLS sections, which behave like static globals in Rust: they are const-initialized and never dropped.
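For readers unfamiliar with the macro, here is a small example of the behavior it provides, written against Rust's standard library on a conventional OS rather than against Theseus. The COUNTER variable and the helper function names are made up for this example; the point is only that each thread lazily gets its own independent copy.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own COUNTER, initialized on first access.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Increments the calling thread's counter and returns the new value.
fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

/// Spawns a fresh thread, bumps its counter `times` times, and returns
/// the last value that thread observed.
fn bump_on_new_thread(times: u32) -> u32 {
    thread::spawn(move || {
        let mut last = 0;
        for _ in 0..times {
            last = bump();
        }
        last
    })
    .join()
    .unwrap()
}
```

Calling bump_on_new_thread(3) returns 3 regardless of how many times the calling thread has already called bump(), because the new thread starts from its own freshly initialized counter and never touches the caller's copy.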
Porting the object crate to Theseus
The object crate is standalone and doesn't need to be ported to Theseus specifically, thus we can simply port it to no_std and place it in Theseus's libs/ directory.
The only real difficulty here is that while object does support no_std, no previous users of object needed to write to an object file in a no_std environment.
Thinking about it, we do agree that's kind of weird, but Theseus is just like that sometimes. 😊
Once we convinced the maintainers of object that this feature was necessary, the changes required to do so weren't very involved.
It boiled down to a rearrangement of object's Cargo features and configuration blocks: check out the PR we submitted (that was accepted) for more details.
Miscellaneous Improvements
- We improved the page allocator to allow it to lazily merge contiguous freed chunks of pages.
  - This happens lazily after an allocation request first fails; it is possible to also do it proactively in AllocatedPages::drop(), but that makes deallocation more expensive.
  - Needed for loading C object files or static libraries with entry points at a fixed address, e.g., the default entry point of 0x400000.
  - Future work: support building and loading position-independent executables and code (PIE and PIC). This is required to simultaneously load multiple C executables at the same fixed address, because Theseus only offers a single virtual address space.
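The lazy-merge strategy can be sketched with a toy free list: freeing is a cheap push, and adjacent chunks are coalesced only after an allocation attempt fails. This is a simplified illustration, not Theseus's page allocator; Chunk, FreeList, and the method names are hypothetical, and the sketch allocates by size only, whereas the real use case above involves requests for a specific address.

```rust
/// Hypothetical free chunk, measured in whole pages.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Chunk {
    start: usize,
    count: usize,
}

/// Toy free list that merges adjacent chunks only on demand.
struct FreeList {
    chunks: Vec<Chunk>,
}

impl FreeList {
    /// Deallocation stays cheap: no merging happens here.
    fn free(&mut self, chunk: Chunk) {
        self.chunks.push(chunk);
    }

    /// Coalesces chunks whose page ranges touch end-to-start.
    fn merge_adjacent(&mut self) {
        self.chunks.sort_by_key(|c| c.start);
        let mut merged: Vec<Chunk> = Vec::with_capacity(self.chunks.len());
        for c in self.chunks.drain(..) {
            if let Some(prev) = merged.last_mut() {
                if prev.start + prev.count == c.start {
                    prev.count += c.count;
                    continue;
                }
            }
            merged.push(c);
        }
        self.chunks = merged;
    }

    /// First-fit: carve `count` pages off the front of a big-enough chunk.
    fn try_take(&mut self, count: usize) -> Option<Chunk> {
        let idx = self.chunks.iter().position(|c| c.count >= count)?;
        let c = self.chunks[idx];
        if c.count == count {
            self.chunks.swap_remove(idx);
        } else {
            self.chunks[idx] = Chunk { start: c.start + count, count: c.count - count };
        }
        Some(Chunk { start: c.start, count })
    }

    /// Tries the fast path first; merges and retries only on failure.
    fn allocate(&mut self, count: usize) -> Option<Chunk> {
        if count == 0 {
            return None;
        }
        if let Some(c) = self.try_take(count) {
            return Some(c);
        }
        self.merge_adjacent();
        self.try_take(count)
    }
}
```

The tradeoff mirrors the one described above: the cost of sorting and merging is paid only by the rare allocation that would otherwise fail, instead of by every deallocation.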
[1] See our OSDI 2020 paper for an in-depth discussion of this.