You Have a Kernel Read/Write. Not Enough! How to Extract Offsets from XNU Kernelcaches
Foreword
Opa334 recently shared a kernel read and write primitive similar to the one used in the DarkSword malware. It was the perfect occasion for me to run it on one of my test devices and actually get my hands dirty with kernel exploration. We always hear about kernel exploitation, but rarely get to walk through what it looks like in practice.
Once you have read and write primitives to the kernel, the first step is to read backward until you find the magic number, i.e. the Mach-O header signature:
```c
uint64_t magic = early_kread64(kernel_base);
if (magic == 0x100000cfeedfacf) {
    printf("[DEBUG] Found Mach-O magic at 0x%llx!\n", kernel_base);
}
```
Then you can compute the kernel slide and you are good to go.
I won't detail this here, but feel free to check MATTEYEUX's blog post on DarkSword.

The next difficulty is finding the offsets between this magic value and the kernel objects in memory. That is exactly what this post is about.
Introduction
Kernelcaches extracted from IPSW files come without symbols: just raw ARM64 code. Yet, the internal layout of every kernel data structure is recoverable if you know where to look.
Note: you can use blacktop/symbolicator to recover some symbols and make your life easier.
It reminds me of this sentence from J. Levin in the DisARM book:
[...] In fact, the whole premise of the command line tools I demonstrate is to avoid having to use a debugger.
I tried to push hard on that path...
This guide documents a repeatable methodology for extracting struct offsets from stripped kernelcaches. The techniques here were validated against iOS 16.7.12 (iPhone X, build 20H364) using Binary Ninja.
I voluntarily chose not to use the Kernel Development Kit, to force myself to work directly from ARM assembly.
Prerequisites
- A disassembler with decompiler support (Binary Ninja, IDA Pro + Hex-Rays, or Ghidra)
- A decrypted kernelcache (I used `ipsw` for extraction)
- The XNU open-source release for the closest matching version
Also, the decompiled pseudocode I share below has been modified and simplified for this post.
The Core Principle
The key insight behind this entire methodology is that functions like proc_pid(), vnode_mount(), or kauth_cred_getuid() are wrappers that read the field from a struct. When decompiled, they directly reveal the field's offset.
A stripped kernelcache still retains the names of these exported functions.
Phase 1: Cross-Referencing with XNU Source
The XNU kernel source is partially open. While the iOS build may differ from the published source, the struct layouts are usually very close. Use the source as a map, not as ground truth.
- Identify the field name from the accessor function name (e.g., `_proc_pid` → `p_pid`)
- Find the struct definition in the XNU source (e.g., `bsd/sys/proc_internal.h`)
- Use the source to predict which fields exist and roughly where they are
- Verify each prediction against the actual binary
Struct definitions in XNU source
| Struct | Header file |
|---|---|
| `proc` | `bsd/sys/proc_internal.h` |
| `vnode` | `bsd/sys/vnode_internal.h` |
| `socket` | `bsd/sys/socketvar.h` |
| `ucred` | `bsd/sys/ucred.h` |
| `task` | `osfmk/kern/task.h` |
| `thread` | `osfmk/kern/thread.h` |
| `filedesc` | `bsd/sys/filedesc.h` |
| `fileproc` | `bsd/sys/file_internal.h` |
| `fileglob` | `bsd/sys/file_internal.h` |
| `mount` | `bsd/sys/mount_internal.h` |
Apple frequently adds, removes, or reorders fields between iOS versions. Never assume the open-source layout matches exactly. The source tells you what fields exist; the binary tells you where they are.
For example, proc_ro (a read-only split of proc fields) exists in iOS 15.2+ but is not in older XNU source releases. If you only read the source, you would miss this entirely.
Phase 2: Finding Anchor Points
Global variables like allproc, kernproc, and nprocs are stored in the __DATA segment. They are referenced by functions via adrp/ldr instruction pairs. Finding these gives you entry points into the kernel's data structures from a known address.
ARM64 uses page-relative addressing:
```
adrp x8, 0xfffffff0078b7000   ; load page base
ldr  x8, [x8, #0x728]         ; load from page + offset
                              ; → effective address: 0xfffffff0078b7728
```
This is a load of the global variable at 0xfffffff0078b7728, which in the context of proc_iterate is allproc.
Kernel slide (KASLR)
All addresses in the static binary are pre-slide. At boot, the kernel is loaded at a random offset (the KASLR slide). On a live device, the actual addresses will be static_address + slide. The offsets between globals remain constant.
Phase 3: Accessor Functions
Search for functions whose names follow the pattern <struct>_<field> or <struct>_get<field>. These are almost always thin accessors.
How to recognize an accessor
An accessor function decompiles to essentially one operation:
```c
// _proc_pid at 0x5c892c
return *(arg1 + 0x60);
```
Or in ARM64 assembly:
```
ldr w0, [x0, #0x60]
ret
```
That single load instruction tells you: struct proc has p_pid at offset +0x60, and it is a 32-bit integer because the instruction is ldr w0, not ldr x0.
For example, given a target struct, search the function list for its name:
| Target struct | Search patterns |
|---|---|
| `proc` | `proc_pid`, `proc_ppid`, `proc_ucred`, `proc_name`, `proc_task` |
| `vnode` | `vnode_vtype`, `vnode_mount`, `vnode_vid`, `vnode_fsnode`, `vnode_getname` |
| `ucred` | `kauth_cred_getuid`, `kauth_cred_getgid`, `kauth_cred_getruid` |
| `task` | `get_task_map`, `get_bsdtask_info`, `task_reference` |
| `socket` | `file_socket`, `soisconnecting`, `soisconnected` |
| `mount` | `vfs_flags`, `vfs_statfs` |
Example: Mapping struct ucred
Search for functions containing kauth_cred_get:
```
_kauth_cred_getuid   → return *(arg1 + 0x18) → cr_uid   at +0x18
_kauth_cred_getruid  → return *(arg1 + 0x1C) → cr_ruid  at +0x1C
_kauth_cred_getsvuid → return *(arg1 + 0x20) → cr_svuid at +0x20
_kauth_cred_getgid   → return *(arg1 + 0x28) → cr_gid   at +0x28
_kauth_cred_getrgid  → return *(arg1 + 0x68) → cr_rgid  at +0x68
_kauth_cred_getsvgid → return *(arg1 + 0x6C) → cr_svgid at +0x6C
```
Decompiler vs. disassembly
Decompilers sometimes introduce confusing array-indexing notation. When the decompiler shows `arg1[0x15]`, the actual offset depends on what type it infers for `arg1`. Always verify against the raw disassembly.

For example, `arg1[0x15a]` in decompilation might mean `arg1 + 0x15a * sizeof(element)`. But the ARM64 instruction will show the real byte offset:
```
; 0x5c9a40
add x0, x0, #0x579    ; This is the actual offset
```
When in doubt, read the assembly instructions: they are always the ground truth.
Phase 4: Iterator and Constructor Functions
When accessor functions do not exist for a field (many internal fields are never exported), look at functions that iterate or construct instances of the struct. These functions touch many fields and reveal the overall layout.
The Iterator Pattern
Functions named *_iterate, *_foreach, or *_walk traverse linked lists of kernel objects. They reveal:
- The global head pointer of the list (a kernel global variable)
- The list entry offset within the struct. It is often `+0x00` for the primary list, but a struct can have multiple list entries at different offsets (e.g. `proc.p_list` at `+0x00` vs `proc.p_hash` at `+0xA0`)
- The count variable (for instance `nprocs` in `proc_iterate`)
- Various field accesses used for filtering
Example: proc_iterate
This single function revealed:
| What | How | Value |
|---|---|---|
| `allproc` global | First data reference loaded as list head | `0xfffffff0078b7728` |
| `zombproc` global | Second list head (conditional on flags) | `0xfffffff0078b7730` |
| `nprocs` global | Loop bound variable | `0xfffffff0078b7d00` |
| `p_list.le_next` | `i = *i` (following the list) | `+0x00` |
| `p_pid` | Stored into pidlist array | `+0x60` |
| `p_stat` | Compared against 1 (zombie filter) | `+0x64` |
| `p_listflag` | Reference count manipulation | `+0x464` |
The Constructor Pattern
Functions named *create*, *init*, or *alloc* initialize struct fields. They often set fields sequentially, revealing the struct layout in order.
For instance, `socreate_internal`, the socket creation routine, revealed over 20 struct fields once we traced the sequential stores to the newly allocated socket.
```c
// x21 = newly allocated socket
*(x21 + 0x18)  = protosw;           // so_proto
*(x21 + 0x1e0) = kauth_cred;        // so_cred
*(x21 + 0x1e4) = proc_pid(p);       // so_last_pid
*(x21 + 0x1e8) = proc_uniqueid(p);  // so_last_upid
*(x21 + 0x288) = tpidr_el1;         // so_background_thread
```
What to look for in constructors
- Calls to other accessor functions (e.g., `proc_pid()`) whose return value is stored
- `memcpy` calls that reveal embedded sub-structures
- `str xzr` (storing zero) to initialize pointer fields
Phase 5: Syscall Implementations (The Deep Dive)
When neither accessors nor iterators exist for a field, look at the syscall implementations that operate on the struct. Syscalls are the boundary between userspace and kernel space; they must read and write kernel structs to do their work.
Naming conventions
XNU syscall implementations follow the pattern sys_<name> or just <name> for older BSD syscalls:
| Syscall | Function name | Reveals |
|---|---|---|
| `chdir(2)` | `sys_chdir` | `filedesc.fd_cdir` offset |
| `chroot(2)` | `chroot` | `filedesc.fd_rdir` offset, chroot flag |
| `open(2)` | `vn_open_auth` | `fileproc`/`fileglob` chain |
| `fchdir(2)` | `sys_fchdir` | `filedesc` locking pattern |
For example, we can recover some fields of `proc` via `sys_chdir`.
The chdir syscall must update the current working directory. Decompiling it reveals:
```c
IORWLockWrite(proc + 0x128);        // fd_rw_lock
old = *(proc + 0x118);              // fd_cdir (old value)
*(proc + 0x118) = new_vnode;        // fd_cdir = new directory
lck_rw_unlock_exclusive(proc + 0x128);
if (old != NULL)
    vnode_rele(old);
```
This gives us three offsets from one function:
- `proc + 0x118` = `fd_cdir`
- `proc + 0x128` = `fd_rw_lock`
- And confirms the `filedesc` is inline in the `proc` (no intermediate pointer)
The inline vs. pointer question
A critical question when mapping any struct: is sub-struct X a pointer to a separate allocation, or is it embedded inline?
The answer comes from how the code accesses it. If you see:
```c
// Pointer to separate struct:
fd   = *(proc + SOME_OFFSET);  // load a pointer
cdir = *(fd + 0x18);           // dereference through it

// Inline (embedded):
cdir = *(proc + 0x118);        // direct access, no intermediate load
```
If there is no intermediate pointer load, the sub-struct is inline. This is exactly what we found for filedesc inside proc: the fields are at direct offsets from the proc base.
Phase 6: Zone ID Validation (Identifying Protected Structures)
zone_require() and zone_id_require_ro() are used to validate that pointers belong to the correct memory zone. These checks reveal what zone a struct lives in and whether it is read-only.
Reading zone validation
When you see code like this:
```c
// Inside _proc_ucred:
x1 = *(arg1 + 0x18);              // load proc_ro pointer
zone_id_require_ro_panic(5, x1);  // validate it belongs to zone #5
```
Then we can deduce:
- `proc + 0x18` is a pointer to another struct
- That struct lives in zone #5
- Zone #5 is a read-only zone (the `_ro` suffix)
Zone ID mapping
By collecting all zone_id_require_ro_panic calls across the kernelcache, you can build a complete map of protected zones:
| Zone ID | Struct | Protection |
|---|---|---|
| 3 | thread_ro | read-only |
| 5 | proc_ro | read-only |
| 7 | ucred | read-only |
| 0x17 | proc | Regular zalloc (with zone_require) |
Understanding which structures are in read-only zones tells you about the kernel's security architecture. Fields that Apple moved into proc_ro are protected and cannot be modified even with a kernel read/write primitive.
Phase 7: Following Pointer Chains (Graph Traversal)
Individual functions rarely traverse more than one or two pointer hops. But by combining offsets discovered in different functions, you can build paths between objects that have no direct accessor.
For example, there is no socket_get_proc() in the KPI — you cannot find the owning process of a socket with a single function search. But the path exists if you chain discoveries from earlier phases:
- From `socreate_internal` (Phase 4): `socket + 0x288` stores the creating thread (`tpidr_el1`)
- From `_current_proc` (Phase 1): `thread + 0x350` → `thread_ro`, then `thread_ro + 0x10` → `proc`
```
socreate_internal      _current_proc          _proc_pid
found in Phase 4       found in Phase 1       found in Phase 3
      │                      │                      │
      ▼                      ▼                      ▼
┌──────────┐ +0x288 ┌────────────┐ +0x350 ┌────────────┐ +0x10 ┌──────────┐ +0x60
│  socket  │ ──────→│   thread   │ ──────→│ thread_ro  │ ─────→│   proc   │ ─────→ p_pid
└──────────┘        └────────────┘        └────────────┘       └──────────┘
 (tpidr_el1)                               (zone RO #3)
```
Neither function knows about the other. But combining them gives you a three-hop path from any socket to its owning process — something you could never find by searching function names alone.
This is where the work becomes cumulative: every offset you confirmed in Phases 1–5 is a building block. The more you have, the more paths you can construct.
Phase 8: Hash Tables and Complex Data Structures
Some kernel lookups use hash tables instead of linked lists. The hash function and table structure can be recovered from the lookup function.
Example: PID hash table from _proc_find
_proc_find takes a PID and returns the corresponding proc. Decompiling it reveals:
- A multiplicative hash function applied to the PID
- A global hash table pointer at a known address
- A mask derived from table metadata
- A chain walk through the collision list, comparing PIDs
The hash entry lives at proc + 0xA0, which means the proc struct has a LIST_ENTRY at that offset for chaining in the hash table. The PID comparison happens at hash_entry - 0xA0 + 0x60, confirming p_pid at +0x60 from another angle.
Practical Tips
Function clusters reveal struct regions
If you find proc_pid at +0x60, proc_ppid at +0x20, and proc_pgrpid at +0x28, you know the PID-related fields are clustered in the +0x20–0x68 region. This helps you predict where other related fields might be, and focus your search.
Size hints from zalloc_ro_mut
When zalloc_ro_mut(zone_id, ptr, offset, src, size) is called, the size parameter tells you the total size of the read-only struct. For example, proc_ro is 0x80 bytes.
ARM64 instruction cheat sheet for offset extraction
| Instruction | What it tells you |
|---|---|
| `ldr x0, [x1, #0x60]` | 64-bit load from offset 0x60 |
| `ldr w0, [x1, #0x60]` | 32-bit load from offset 0x60 |
| `ldrh w0, [x1, #0x70]` | 16-bit load from offset 0x70 |
| `ldrb w0, [x1, #0x64]` | 8-bit load from offset 0x64 |
| `str x2, [x1, #0x18]` | 64-bit store at offset 0x18 |
| `add x0, x1, #0x579` | Compute address at offset 0x579 (often for strings/arrays) |
| `stp x2, x3, [x1, #0x50]` | Store pair: 64-bit values at +0x50 and +0x58 |
| `adrp x8, PAGE` then `ldr x8, [x8, #OFF]` | Global variable load at PAGE+OFF |
| `mrs x0, tpidr_el1` | Load current thread pointer |
Field size from instruction width
The ARM64 instruction tells you the field size:
- `ldr x` / `str x` → 8 bytes (pointer, uint64)
- `ldr w` / `str w` → 4 bytes (int32, uint32, pid_t)
- `ldrh` / `strh` → 2 bytes (uint16, short)
- `ldrb` / `strb` → 1 byte (uint8, char, bool)
References
- Jonathan Levin, *OS Internals (volumes I–III): the definitive reference on XNU internals
- Apple XNU source — opensource.apple.com
