What is Mac OS X?


© Amit Singh. All Rights Reserved. Written in December 2003

XNU: The Kernel

The Mac OS X kernel is called XNU. It can be viewed as consisting of the following components:

Mach

XNU contains code based on Mach, the legendary architecture that originated as a research project at Carnegie Mellon University in the mid-1980s (Mach itself traces its philosophy to the Accent operating system, also developed at CMU) and has been part of many important systems. Early versions of Mach had monolithic kernels, with much of BSD's code in the kernel. Mach 3.0 was the first microkernel implementation.
XNU's Mach component is based on Mach 3.0, although it's not used as a microkernel. The BSD subsystem is part of the kernel and so are various other subsystems that are typically implemented as user-space servers in microkernel systems. XNU's Mach is responsible for various low-level aspects of the system, such as:
  • preemptive multitasking, including kernel threads (POSIX threads on Mac OS X are implemented using kernel threads)
  • protected memory
  • virtual memory management
  • inter-process communication
  • interrupt management
  • real-time support
  • kernel debugging support (the built-in low-level kernel debugger, ddb, is part of XNU's Mach component, and so is kdp, a remote kernel debugging protocol implementation)
  • console I/O
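As a user-space illustration of the first item in the list above, here is a minimal sketch of POSIX threads sharing a counter. It uses only the standard pthread API; the names SharedCounter, worker and run_workers are made up for this example and have nothing to do with XNU's internals.

```cpp
#include <cassert>
#include <pthread.h>

// Shared state for the sketch: a counter protected by a POSIX mutex.
struct SharedCounter {
    pthread_mutex_t lock;
    long value;
    int iterations;   // how many increments each thread performs
};

// Thread entry point: increment the shared counter under the mutex.
static void *worker(void *arg) {
    SharedCounter *c = static_cast<SharedCounter *>(arg);
    for (int i = 0; i < c->iterations; ++i) {
        pthread_mutex_lock(&c->lock);
        ++c->value;
        pthread_mutex_unlock(&c->lock);
    }
    return nullptr;
}

// Start `nthreads` POSIX threads, wait for all of them, and return the
// final count. Returns -1 if the request is out of range or a thread
// could not be created.
long run_workers(int nthreads, int iterations) {
    const int kMaxThreads = 64;
    if (nthreads < 0 || nthreads > kMaxThreads) return -1;

    SharedCounter c;
    pthread_mutex_init(&c.lock, nullptr);
    c.value = 0;
    c.iterations = iterations;

    pthread_t threads[kMaxThreads];
    int started = 0;
    for (; started < nthreads; ++started) {
        if (pthread_create(&threads[started], nullptr, worker, &c) != 0) break;
    }
    for (int i = 0; i < started; ++i) pthread_join(threads[i], nullptr);
    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Calling run_workers(4, 1000) starts four threads that each add 1000 to the counter. On Mac OS X, each of these pthreads would be backed by a kernel thread, as noted above.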
The sequence of events before the kernel is passed control is described in Booting Mac OS X. The secondary bootloader eventually calls the kernel's "startup" code, forwarding various boot arguments to it. This low-level code is where every processor in the system starts (from the kernel's point of view). Various important variables, such as the maximum virtual and physical addresses and the threshold temperature for throttling down a CPU's speed, are initialized here; BAT registers are cleared, AltiVec (if present) is initialized, caches are initialized, and so on. Eventually this code jumps to the boot initialization code for the architecture (ppc_init() on the PowerPC). Thereafter:
  • A template thread is filled in, and an initial thread is created from this template. It is set to be the "current" thread.
  • Some CPU housekeeping is done.
  • The "Platform Expert" (see below) is initialized (PE_init_platform()), with a flag indicating that the VM is not yet initialized. This saves the boot arguments, the device tree and display information in a state variable. Another call to PE_init_platform() is made after the VM is initialized.
  • Mach VM is initialized.
  • The function machine_startup() is called. It takes some actions based on the boot arguments, performs some housekeeping, starts thermal monitoring for the CPU, and calls setup_main().
  • setup_main() performs a lot of work: initializing the scheduler, IPC, kernel extension loading, clock, timers, tasks, threads, etc. and finally creates a kernel thread called startup_thread that creates further kernel threads.
  • startup_thread creates a number of other threads (the idle threads, service threads for clock and device, ...). It also initializes the thread reaper, the stack swapin and the periodic scheduler mechanism. It is here that the BSD subsystem is initialized (via bsd_init()). startup_thread becomes the pageout daemon once it finishes its work.
At this point, Mach is up and running.
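The ordering of the steps above is easier to see in a toy model. The sketch below only records the sequence as strings; it is not XNU source. The toy_ functions borrow their names from the functions mentioned in the text, but their bodies are invented, and the position of the second PE_init_platform() call (directly after VM initialization) is an assumption based on the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: it records the order of the steps, it does not boot anything.
static std::vector<std::string> g_log;

static void toy_bsd_init() { g_log.push_back("bsd_init"); }

static void toy_startup_thread() {
    g_log.push_back("startup_thread: create idle and service threads");
    toy_bsd_init();   // the BSD subsystem is brought up from here
    g_log.push_back("startup_thread: become pageout daemon");
}

static void toy_setup_main() {
    g_log.push_back("setup_main: scheduler, IPC, clock, timers, tasks, threads");
    toy_startup_thread();   // in the real kernel this runs as a new kernel thread
}

static void toy_machine_startup() {
    g_log.push_back("machine_startup");
    toy_setup_main();
}

// The flag mirrors the "VM is not yet initialized" indication in the text.
static void toy_PE_init_platform(bool vm_initialized) {
    g_log.push_back(vm_initialized ? "PE_init_platform(vm ready)"
                                   : "PE_init_platform(vm not ready)");
}

static void toy_vm_init() { g_log.push_back("vm_init"); }

// Run the modelled sequence and return the recorded steps in order.
std::vector<std::string> toy_boot() {
    g_log.clear();
    g_log.push_back("initial thread created from template");
    toy_PE_init_platform(false);
    toy_vm_init();
    toy_PE_init_platform(true);
    toy_machine_startup();
    return g_log;
}
```

Printing the vector returned by toy_boot() gives a compact trace of the sequence, with the two Platform Expert calls bracketing VM initialization and bsd_init appearing inside startup_thread.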
In addition to BSD system calls (the syscall API, as well as the sysctl and ioctl APIs), Mach messaging and IPC can be, and are, used (as appropriate) to exchange information between user space and kernel space. XNU also provides various ways of memory mapping and block copying. While it may be nice (say, from an academic point of view, if nothing else) to have many APIs in a system, the burden of choosing wisely which API to use always falls on the programmer. The situation is similar for user-space APIs on Mac OS X, as we shall see later.
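As one small example of the sysctl route, the sketch below asks the kernel for its OS type string. On Mac OS X it uses sysctlbyname() with the "kern.ostype" name; on other systems it falls back to uname() so that the code still builds and runs. The function name kernel_ostype is made up for this example.

```cpp
#include <cassert>
#include <string>
#include <sys/utsname.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Return the kernel's OS type string, or an empty string on failure.
std::string kernel_ostype() {
#if defined(__APPLE__)
    // Ask the kernel through the sysctl API. The buffer is zero-filled so
    // the result is always NUL-terminated.
    char buf[256] = {0};
    size_t len = sizeof(buf) - 1;
    if (sysctlbyname("kern.ostype", buf, &len, nullptr, 0) == 0) {
        return std::string(buf);
    }
#endif
    // Portable fallback: uname() reports the same kind of information.
    struct utsname u;
    if (uname(&u) == 0) return std::string(u.sysname);
    return std::string();
}
```

The same value could be obtained in other ways, which is a small instance of the "many APIs" point made above.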

BSD

XNU's BSD component uses FreeBSD as the primary reference codebase (although some code might be traced to other BSDs). Darwin 7.x (Mac OS X 10.3.x) uses FreeBSD 5.x. As mentioned before, BSD does not run as an external (or user-level) server, but is part of the kernel itself. Some aspects that BSD is responsible for include:
  • process model
  • user ids, permissions, basic security policies
  • POSIX API, BSD style system calls
  • TCP/IP stack, BSD sockets, firewall
  • VFS and filesystems (see Mac OS X Filesystems for details)
  • System V IPC
  • crypto framework
  • various synchronization mechanisms
Note that XNU has a unified buffer cache but it ties in to Mach's VM.
XNU uses a synchronization abstraction (built on top of Mach mutexes) called funnels to serialize access to the BSD portion of the kernel. The kernel variables pointing to these funnels have the _flock suffix, such as kernel_flock and network_flock. When Mach initializes the BSD subsystem via a call to bsd_init(), the first operation performed is the allocation of funnels (the kernel funnel's state is set to TRUE). Thereafter:
  • The kernel memory allocator is initialized.
  • The "Platform Expert" (see below) is called upon to see if there are any boot arguments for BSD.
  • VFS buffers/hash tables are allocated and initialized.
  • Process related structures are allocated/initialized. This includes the list of all processes, the list of zombie processes, hash tables for process ids and process groups.
  • Process 0 is created and initialized (credentials, file descriptor table, audit information, limits, etc.). The variable kernproc points to process 0.
  • The machine dependent real-time clock's time and date are initialized.
  • The Unified Buffer Cache is initialized (via ubc_init(), which essentially initializes a Mach VM Zone via zinit(), which allocates a region of memory from the page-level allocator).
  • Various VFS structures/mechanisms are initialized: the vnode table, the filesystem event mechanism, the vnode name cache, etc. Each filesystem type present is also initialized.
  • mbufs (memory buffers, used heavily in network memory-management) are initialized via mbinit().
  • Facilities/subsystems such as syslog, audit, kqueues, aio, and System V IPC are initialized.
  • The kernel's generic MIB (management information base) is initialized.
  • The data link interface layer is initialized.
  • Sockets and protocol families are initialized.
XNU uses a specific type of kernel extension, the NKE (Network Kernel Extension), to make the 4.4BSD networking architecture fit into Mac OS X.
  • Kernel profiling is started, and BSD is "published" as a resource in the IOKit.
  • Ethernet devices are initialized.
  • A Mach Zone is initialized for the vnode pager.
  • BSD tries to mount the root filesystem (which could be coming over the network, for example, a Mac OS X disk image (.dmg) exported over NFS).
  • devfs is mounted on /dev.
  • A new process is created (cloned) from kernproc (process 0). This newly created process has pid 1, and is set to become init (actually mach_init, which starts init). mach_init is loaded and run via bsdinit_task(), which is called by the BSD asynchronous trap handler (bsd_ast()).
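Several of the steps above (ubc_init(), the vnode pager) set up a Mach zone, that is, a pool of fixed-size elements carved out of a larger region of memory. The sketch below is a toy user-space version of that idea. The names ToyZone, toy_zinit, toy_zalloc, toy_zfree and toy_zdestroy are hypothetical, malloc() stands in for the page-level allocator, and the real zinit() takes different arguments and behaves differently (for instance, a real zone can grow).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy fixed-size allocator in the spirit of a Mach zone (hypothetical names).
struct ToyZone {
    void  *region;      // one block obtained up front (stand-in for pages)
    void  *free_list;   // singly linked list threaded through free elements
    size_t elem_size;
    size_t capacity;
    size_t in_use;
};

// Create a zone holding `capacity` elements of at least `elem_size` bytes.
ToyZone *toy_zinit(size_t elem_size, size_t capacity) {
    if (capacity == 0) return nullptr;
    // A free element stores the pointer to the next free element, so round
    // the element size up to a non-zero multiple of sizeof(void *).
    const size_t align = sizeof(void *);
    elem_size = ((elem_size + align - 1) / align) * align;
    if (elem_size == 0) elem_size = align;

    ToyZone *z = static_cast<ToyZone *>(std::malloc(sizeof(ToyZone)));
    if (!z) return nullptr;
    z->region = std::malloc(elem_size * capacity);
    if (!z->region) { std::free(z); return nullptr; }
    z->elem_size = elem_size;
    z->capacity = capacity;
    z->in_use = 0;
    z->free_list = nullptr;

    // Thread every element onto the free list.
    char *p = static_cast<char *>(z->region);
    for (size_t i = 0; i < capacity; ++i) {
        void *elem = p + i * elem_size;
        *static_cast<void **>(elem) = z->free_list;
        z->free_list = elem;
    }
    return z;
}

// Take one element from the zone, or return a null pointer if it is exhausted.
void *toy_zalloc(ToyZone *z) {
    if (!z->free_list) return nullptr;
    void *elem = z->free_list;
    z->free_list = *static_cast<void **>(elem);
    ++z->in_use;
    return elem;
}

// Return an element to the zone by pushing it back onto the free list.
void toy_zfree(ToyZone *z, void *elem) {
    *static_cast<void **>(elem) = z->free_list;
    z->free_list = elem;
    --z->in_use;
}

void toy_zdestroy(ToyZone *z) {
    std::free(z->region);
    std::free(z);
}
```

Because every element has the same size, allocation and release are constant-time list operations, which is a large part of the appeal of this design for frequently created kernel objects.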
The rest of the user space startup is described in Mac OS X System Startup.
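The funnel idea mentioned earlier in this section, one coarse lock serializing entry into a whole subsystem, can be modelled in user space. The sketch below is a toy under loud assumptions: ToyFunnel, toy_kernel_flock, toy_bsd_operation and toy_funnel_run are hypothetical names, std::mutex stands in for a Mach mutex, and none of this reflects the kernel's actual funnel interface.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Toy funnel: a named mutex that guards a whole subsystem rather than a
// single data structure.
class ToyFunnel {
public:
    explicit ToyFunnel(const char *name) : name_(name) {}
    void acquire() { mutex_.lock(); }
    void release() { mutex_.unlock(); }
    const char *name() const { return name_; }
private:
    std::mutex mutex_;
    const char *name_;
};

static ToyFunnel toy_kernel_flock("kernel");   // stands in for kernel_flock
static long toy_bsd_state = 0;                 // "BSD" data with no lock of its own

// Every entry into the toy "BSD" code takes the funnel first.
static void toy_bsd_operation() {
    toy_kernel_flock.acquire();
    ++toy_bsd_state;   // safe: only one thread is inside the funnel at a time
    toy_kernel_flock.release();
}

// Run `nthreads` threads that each call into the funnelled code `calls` times
// and return the final value of the shared state.
long toy_funnel_run(int nthreads, int calls) {
    toy_bsd_state = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([calls] {
            for (int i = 0; i < calls; ++i) toy_bsd_operation();
        });
    }
    for (std::thread &th : threads) th.join();
    return toy_bsd_state;
}
```

The design trades concurrency for simplicity: code inside the funnel needs no finer-grained locking, but only one thread at a time can make progress there.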

I/O Kit

I/O Kit, the object-oriented device driver framework of the XNU kernel, is radically different from the driver frameworks of traditional systems.
I/O Kit uses a restricted subset of C++ (based on Embedded C++) as its programming language. This system is implemented by the libkern library. Features of C++ that are not allowed in this subset include:
  • exceptions
  • multiple inheritance
  • templates
  • RTTI (run-time type information), although I/O Kit has its own run-time typing system
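To give a feel for how a run-time typing system can exist without C++ RTTI, here is a minimal sketch that also avoids exceptions, templates and multiple inheritance. All names (ToyMeta, ToyObject, ToyService, ToyDiskDriver, toyCastToDiskDriver) are hypothetical. libkern's real mechanism is more elaborate, so treat this as an illustration of the technique rather than a description of it.

```cpp
#include <cassert>
#include <cstring>

// Per-class type record: a name plus a link to the parent class's record.
struct ToyMeta {
    const char    *name;
    const ToyMeta *super;   // parent class, or a null pointer for the root
};

class ToyObject {
public:
    static const ToyMeta meta;
    virtual ~ToyObject() {}
    // Each class overrides this to return its own type record.
    virtual const ToyMeta *getMeta() const { return &meta; }
    // True if this object's class is `target` or derives from it.
    bool isKindOf(const ToyMeta *target) const {
        for (const ToyMeta *m = getMeta(); m != 0; m = m->super) {
            if (m == target) return true;
        }
        return false;
    }
};
const ToyMeta ToyObject::meta = { "ToyObject", 0 };

class ToyService : public ToyObject {
public:
    static const ToyMeta meta;
    virtual const ToyMeta *getMeta() const { return &meta; }
};
const ToyMeta ToyService::meta = { "ToyService", &ToyObject::meta };

class ToyDiskDriver : public ToyService {
public:
    static const ToyMeta meta;
    virtual const ToyMeta *getMeta() const { return &meta; }
    int blockSize() const { return 512; }
};
const ToyMeta ToyDiskDriver::meta = { "ToyDiskDriver", &ToyService::meta };

// Checked downcast: returns a null pointer on mismatch instead of throwing.
ToyDiskDriver *toyCastToDiskDriver(ToyObject *obj) {
    if (obj != 0 && obj->isKindOf(&ToyDiskDriver::meta)) {
        return static_cast<ToyDiskDriver *>(obj);
    }
    return 0;
}
```

Given a ToyObject pointer that really points at a ToyDiskDriver, toyCastToDiskDriver() returns a usable pointer; given a plain ToyService it returns a null pointer. The cost is one small static record and one virtual function per class.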
The device driver model provided by the I/O Kit has several useful features (in no particular order):
  • numerous device families (ATA/ATAPI, FireWire, Graphics, HID, Network, PCI, USB, ...)
  • object oriented abstractions of devices that can be shared
  • plug-and-play and hot-plugging
  • power management
  • preemptive multitasking, threading, symmetric multiprocessing, memory protection and data management
  • dynamic matching and loading of drivers (multiple bus types)
  • a database for tracking and maintaining detailed information on instantiated objects (the I/O Registry)
  • a database of all I/O Kit classes available on a system (the I/O Catalog)
  • an extensive API
  • mechanisms/interfaces for applications and user-space drivers to communicate with the I/O Kit
  • driver stacking
I/O Kit's implementation consists of three C++ libraries that are present in the kernel and available to loadable drivers: IOKit.framework, Kernel/libkern and Kernel/IOKit. The I/O Kit includes a modular, layered run-time architecture that presents an abstraction of the underlying hardware by capturing the dynamic relationships between the various hardware/software components (involved in an I/O connection).
Various tools such as ioreg, ioalloccount, ioclasscount, iostat, kextload, kextunload, kextstat, kextcache, etc. let you explore and control various aspects of I/O Kit. For example, the following command shows the status of dynamically loaded kernel extensions:
% kextstat
Index Refs Address    Size       Wired      Name (Version) <Linked Against>
    1    1 0x0        0x0        0x0        com.apple.kernel (7.2)
    2    1 0x0        0x0        0x0        com.apple.kpi.bsd (7.2)
    3    1 0x0        0x0        0x0        com.apple.kpi.iokit (7.2)
    4    1 0x0        0x0        0x0        com.apple.kpi.libkern (7.2)
...
The following command lists the contents of the I/O Registry in excruciating detail:
% ioreg -l -w 0
+-o Root <class IORegistryEntry, retain count 12>
  | {
  |   "IOKitBuildVersion" = "IOKit Component Version 7.2: Thu Dec 11 16:15:20 PST 2003; root(rcbuilder):RELEASE_PPC/iokit/RELEASE "
  |   "IONDRVFramebufferGeneration" = <0000000200000002>
  ...
/* thousands of lines of output */

Platform Expert

The Platform Expert is an object (one can think of it as a driver) that knows the type of platform that the system is running on. I/O Kit registers a nub (see below) for the Platform Expert. This nub then loads the correct platform specific driver, which further discovers the buses present on the system, registering a nub for each bus found. The I/O Kit loads a matching driver for each bus nub, which discovers the devices connected to the bus, and so on. Thus, the Platform Expert is responsible for actions such as:
  • Building the device tree (as described above)
  • Parsing certain boot arguments
  • Identifying the machine (including processor and bus clock speeds)
  • Initializing a "user interface" to be used in case of kernel panics
In the context of the I/O Kit, a "nub" is an object that defines an access point and communication channel for a device (a bus, a disk drive or partition, a graphics card, ...) or logical service (arbitration, driver matching, power management, ...).
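The discovery chain described in this section (a nub is registered, a matching driver attaches to it, and that driver registers further nubs) can be sketched as a small recursive model. Everything here is invented for illustration: the ToyNub and ToyDriver types, the catalog contents and the device layout. The "first match wins" rule is a simplification made for this sketch and should not be read as I/O Kit's matching policy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A nub: an access point for a device or bus that a driver can attach to.
struct ToyNub {
    std::string kind;   // what the nub represents, e.g. "pci-bus"
    std::string name;
};

// A driver: attaches to nubs of one kind and registers child nubs.
struct ToyDriver {
    std::string matches;            // nub kind this driver attaches to
    std::string name;
    std::vector<ToyNub> children;   // nubs it registers after discovery
};

// The "catalog" of available drivers (hypothetical contents).
static std::vector<ToyDriver> toy_catalog() {
    std::vector<ToyDriver> c;
    c.push_back(ToyDriver{"platform", "ToyPlatformDriver",
                          {ToyNub{"pci-bus", "pci0"}, ToyNub{"usb-bus", "usb0"}}});
    c.push_back(ToyDriver{"pci-bus", "ToyPCIDriver", {ToyNub{"disk", "disk0"}}});
    c.push_back(ToyDriver{"usb-bus", "ToyUSBDriver", {ToyNub{"hid", "keyboard0"}}});
    c.push_back(ToyDriver{"disk", "ToyDiskDriver", {}});
    return c;
}

// Attach a driver to `nub` if the catalog has a match, then recurse into the
// nubs that driver registers. Appends one line per nub to `log`.
static void toy_attach(const ToyNub &nub, const std::vector<ToyDriver> &catalog,
                       int depth, std::vector<std::string> &log) {
    const std::string indent(depth * 2, ' ');
    for (const ToyDriver &d : catalog) {
        if (d.matches != nub.kind) continue;
        log.push_back(indent + nub.name + " <- " + d.name);
        for (const ToyNub &child : d.children) {
            toy_attach(child, catalog, depth + 1, log);
        }
        return;   // first match wins in this sketch
    }
    log.push_back(indent + nub.name + " (no driver)");
}

// Start from the Platform Expert's nub and walk the whole toy system.
std::vector<std::string> toy_discover() {
    std::vector<std::string> log;
    const ToyNub root = ToyNub{"platform", "platform-expert"};
    const std::vector<ToyDriver> catalog = toy_catalog();
    toy_attach(root, catalog, 0, log);
    return log;
}
```

The log returned by toy_discover() is an indented tree, loosely reminiscent of ioreg output: the platform driver sits at the top, bus drivers below it, and device nubs (with or without a matching driver) at the leaves.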

libkern and libsa

As described earlier, the I/O Kit uses a restricted subset of C++. This system, implemented by libkern, provides features such as:
  • Dynamic object allocation, construction, destruction (including data structures such as Arrays, Booleans, Dictionaries, ...)
  • Certain atomic operations, miscellaneous functions (bcmp(), memcmp(), strlen(), ...)
  • Provisions for tracking the number of current instances for each class
  • Ways to avoid the "Fragile Base Class Problem"
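Two of the features listed above, managed object lifetimes and per-class instance tracking, can be illustrated with a short sketch. The classes ToyCounted and ToyString and their members are hypothetical and are not libkern's classes; the counters are also not thread-safe, which a real kernel implementation would have to address.

```cpp
#include <cassert>

// Toy reference-counted base class.
class ToyCounted {
public:
    ToyCounted() : retainCount_(1) {}   // the creator holds the first reference
    void retain() { ++retainCount_; }
    // Drop one reference; the object frees itself when the count reaches zero.
    void release() {
        if (--retainCount_ == 0) delete this;
    }
    int retainCount() const { return retainCount_; }
protected:
    virtual ~ToyCounted() {}   // protected: callers must go through release()
private:
    int retainCount_;
};

// A concrete class that also tracks how many of its instances are alive.
class ToyString : public ToyCounted {
public:
    static int instanceCount;   // number of live ToyString objects
    static ToyString *create() { return new ToyString(); }
protected:
    ToyString() { ++instanceCount; }
    virtual ~ToyString() { --instanceCount; }
};
int ToyString::instanceCount = 0;
```

A caller obtains an object with ToyString::create(), brackets each additional owner with retain() and release(), and can read ToyString::instanceCount at any time to see how many objects of that class are alive, which is the kind of figure a tool such as ioclasscount is meant to expose.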
libsa provides functions for miscellaneous purposes: binary searching, symbol remangling (used for gcc 2.95 to 3.3, for example), dgraphs, catalogs, kernel extension management, sorting, patching vtables, etc. 
