You Have a Kernel Read/Write. Not Enough! How to Extract Offsets from XNU Kernelcaches
Foreword
Opa334 recently shared a kernel read and write primitive similar to the one used in the DarkSword malware. It was the perfect occasion for me to run it on one of my test devices and actually get my hands dirty with kernel exploration. We always hear about kernel exploitation, but rarely get to walk through what it looks like in practice.
Once you have read and write primitives to the kernel, the first step is to read backward until you find the magic number, i.e. the Mach-O header signature:
```c
uint64_t magic = early_kread64(kernel_base);
if (magic == 0x100000cfeedfacf) {
    printf("[DEBUG] Found Mach-O magic at 0x%llx!\n", kernel_base);
}
```
Then you can compute the kernel slide and you are good to go.
I won't detail this here, but feel free to check MATTEYEUX's blog post on DarkSword.

The next difficulty is finding the offsets between this magic value and the kernel objects in memory. That is exactly what this post is about.
Introduction
Kernelcaches extracted from IPSW files come without symbols: just raw ARM64 code. Yet, the internal layout of every kernel data structure is recoverable if you know where to look.
Note: you can use blacktop/symbolicator to recover some symbols and make your life easier.
It reminds me of this sentence from J. Levin in the DisARM book:
[...] In fact, the whole premise of the command line tools I demonstrate is to avoid having to use a debugger.
I tried to push hard on that path...
This guide documents a repeatable methodology for extracting struct offsets from stripped kernelcaches. The techniques here were validated against iOS 16.7.12 (iPhone X, build 20H364) using Binary Ninja.
I voluntarily chose not to use the Kernel Development Kit, to force myself to work directly from ARM assembly.
Prerequisites
- A disassembler with decompiler support (Binary Ninja, IDA Pro + Hex-Rays, or Ghidra)
- A decrypted kernelcache (I used `ipsw` for extraction)
- The XNU open-source release for the closest matching version
Also, the decompiled pseudocode I share below has been modified and simplified for this post.
The Core Principle
The key insight behind this entire methodology is that functions like proc_pid(), vnode_mount(), or kauth_cred_getuid() are wrappers that read the field from a struct. When decompiled, they directly reveal the field's offset.
A stripped kernelcache still retains the names of these exported functions.
Phase 1: Cross-Referencing with XNU Source
The XNU kernel source is partially open. While the iOS build may differ from the published source, the struct layouts are usually very close. Use the source as a map, not as ground truth.
- Identify the field name from the accessor function name (e.g., `_proc_pid` → `p_pid`)
- Find the struct definition in the XNU source (e.g., `bsd/sys/proc_internal.h`)
- Use the source to predict which fields exist and roughly where they are
- Verify each prediction against the actual binary
Struct definitions in XNU source
| Struct | Header file |
|---|---|
| `proc` | `bsd/sys/proc_internal.h` |
| `vnode` | `bsd/sys/vnode_internal.h` |
| `socket` | `bsd/sys/socketvar.h` |
| `ucred` | `bsd/sys/ucred.h` |
| `task` | `osfmk/kern/task.h` |
| `thread` | `osfmk/kern/thread.h` |
| `filedesc` | `bsd/sys/filedesc.h` |
| `fileproc` | `bsd/sys/file_internal.h` |
| `fileglob` | `bsd/sys/file_internal.h` |
| `mount` | `bsd/sys/mount_internal.h` |
Apple frequently adds, removes, or reorders fields between iOS versions. Never assume the open-source layout matches exactly. The source tells you what fields exist; the binary tells you where they are.
For example, proc_ro (a read-only split of proc fields) exists in iOS 15.2+ but is not in older XNU source releases. If you only read the source, you would miss this entirely.
Phase 2: Finding Anchor Points
Global variables like allproc, kernproc, and nprocs are stored in the __DATA segment. They are referenced by functions via adrp/ldr instruction pairs. Finding these gives you entry points into the kernel's data structures from a known address.
ARM64 uses page-relative addressing:
```
adrp x8, 0xfffffff0078b7000   ; load page base
ldr  x8, [x8, #0x728]         ; load from page + offset
                              ; → effective address: 0xfffffff0078b7728
```
This is a load of the global variable at 0xfffffff0078b7728, which in the context of proc_iterate is allproc.
Kernel slide (KASLR)
All addresses in the static binary are pre-slide. At boot, the kernel is loaded at a random offset (the KASLR slide). On a live device, the actual addresses will be static_address + slide. The offsets between globals remain constant.
Phase 3: Accessor Functions
Search for functions whose names follow the pattern <struct>_<field> or <struct>_get<field>. These are almost always thin accessors.
How to recognize an accessor
An accessor function decompiles to essentially one operation:
```c
// _proc_pid at 0x5c892c
return *(arg1 + 0x60);
```
Or in ARM64 assembly:
```
ldr w0, [x0, #0x60]
ret
```
That single load instruction tells you: struct proc has p_pid at offset +0x60, and it is a 32-bit integer because the instruction is ldr w0, not ldr x0.
For example, given a target struct, search the function list for its name:
| Target struct | Search patterns |
|---|---|
| `proc` | `proc_pid`, `proc_ppid`, `proc_ucred`, `proc_name`, `proc_task` |
| `vnode` | `vnode_vtype`, `vnode_mount`, `vnode_vid`, `vnode_fsnode`, `vnode_getname` |
| `ucred` | `kauth_cred_getuid`, `kauth_cred_getgid`, `kauth_cred_getruid` |
| `task` | `get_task_map`, `get_bsdtask_info`, `task_reference` |
| `socket` | `file_socket`, `soisconnecting`, `soisconnected` |
| `mount` | `vfs_flags`, `vfs_statfs` |
Example: Mapping struct ucred
Search for functions containing kauth_cred_get:
```
_kauth_cred_getuid   → return *(arg1 + 0x18) → cr_uid   at +0x18
_kauth_cred_getruid  → return *(arg1 + 0x1C) → cr_ruid  at +0x1C
_kauth_cred_getsvuid → return *(arg1 + 0x20) → cr_svuid at +0x20
_kauth_cred_getgid   → return *(arg1 + 0x28) → cr_gid   at +0x28
_kauth_cred_getrgid  → return *(arg1 + 0x68) → cr_rgid  at +0x68
_kauth_cred_getsvgid → return *(arg1 + 0x6C) → cr_svgid at +0x6C
```
Decompiler vs. disassembly
Decompilers sometimes introduce confusing array-indexing notation. When the decompiler shows `arg1[0x15]`, the actual offset depends on what type it infers for `arg1`. Always verify against the raw disassembly.

For example, `arg1[0x15a]` in decompilation might mean `arg1 + 0x15a * sizeof(element)`. But the ARM64 instruction will show the real byte offset:
```
; 0x5c9a40
add x0, x0, #0x579    ; This is the actual offset
```
When in doubt, read the assembly instructions: they are always the ground truth.
Phase 4: Iterator and Constructor Functions
When accessor functions do not exist for a field (many internal fields are never exported), look at functions that iterate or construct instances of the struct. These functions touch many fields and reveal the overall layout.
The Iterator Pattern
Functions named *_iterate, *_foreach, or *_walk traverse linked lists of kernel objects. They reveal:
- The global head pointer of the list (a kernel global variable)
- The list entry offset within the struct. It is often `+0x00` for the primary list, but a struct can have multiple list entries at different offsets (e.g. `proc.p_list` at `+0x00` vs `proc.p_hash` at `+0xA0`)
- The count variable (for instance `nprocs` in `proc_iterate`)
- Various field accesses used for filtering
Example: proc_iterate
This single function revealed:
| What | How | Value |
|---|---|---|
| `allproc` global | First data reference loaded as list head | `0xfffffff0078b7728` |
| `zombproc` global | Second list head (conditional on flags) | `0xfffffff0078b7730` |
| `nprocs` global | Loop bound variable | `0xfffffff0078b7d00` |
| `p_list.le_next` | `i = *i` (following the list) | `+0x00` |
| `p_pid` | Stored into pidlist array | `+0x60` |
| `p_stat` | Compared against 1 (zombie filter) | `+0x64` |
| `p_listflag` | Reference count manipulation | `+0x464` |
The Constructor Pattern
Functions named *create*, *init*, or *alloc* initialize struct fields. They often set fields sequentially, revealing the struct layout in order.
For instance, `socreate_internal`, the socket creation routine, revealed over 20 struct fields once we traced the sequential stores to the newly allocated socket.
```c
// x21 = newly allocated socket
*(x21 + 0x18)  = protosw;           // so_proto
*(x21 + 0x1e0) = kauth_cred;        // so_cred
*(x21 + 0x1e4) = proc_pid(p);       // so_last_pid
*(x21 + 0x1e8) = proc_uniqueid(p);  // so_last_upid
*(x21 + 0x288) = tpidr_el1;         // so_background_thread
```
What to look for in constructors
- Calls to other accessor functions (e.g., `proc_pid()`) whose return value is stored
- `memcpy` calls that reveal embedded sub-structures
- `str xzr` (storing zero) to initialize pointer fields
Phase 5: Syscall Implementations (The Deep Dive)
When neither accessors nor iterators exist for a field, look at the syscall implementations that operate on the struct. Syscalls are the boundary between userspace and kernel space; they must read and write kernel structs to do their work.
Naming conventions
XNU syscall implementations follow the pattern sys_<name> or just <name> for older BSD syscalls:
| Syscall | Function name | Reveals |
|---|---|---|
| `chdir(2)` | `sys_chdir` | `filedesc.fd_cdir` offset |
| `chroot(2)` | `chroot` | `filedesc.fd_rdir` offset, chroot flag |
| `open(2)` | `vn_open_auth` | `fileproc`/`fileglob` chain |
| `fchdir(2)` | `sys_fchdir` | `filedesc` locking pattern |
For example, we can recover some fields of `proc` via `sys_chdir`.
The chdir syscall must update the current working directory. Decompiling it reveals:
```c
IORWLockWrite(proc + 0x128);        // fd_rw_lock
old = *(proc + 0x118);              // fd_cdir (old value)
*(proc + 0x118) = new_vnode;        // fd_cdir = new directory
lck_rw_unlock_exclusive(proc + 0x128);
if (old != NULL)
    vnode_rele(old);
```
This gives us three offsets from one function:
- `proc + 0x118` = `fd_cdir`
- `proc + 0x128` = `fd_rw_lock`
- And confirms the `filedesc` is inline in the `proc` (no intermediate pointer)
The inline vs. pointer question
A critical question when mapping any struct: is sub-struct X a pointer to a separate allocation, or is it embedded inline?
The answer comes from how the code accesses it. If you see:
```c
// Pointer to separate struct:
fd   = *(proc + SOME_OFFSET);  // load a pointer
cdir = *(fd + 0x18);           // dereference through it

// Inline (embedded):
cdir = *(proc + 0x118);        // direct access, no intermediate load
```
If there is no intermediate pointer load, the sub-struct is inline. This is exactly what we found for filedesc inside proc: the fields are at direct offsets from the proc base.
Phase 6: Zone ID Validation (Identifying Protected Structures)
zone_require() and zone_id_require_ro() are used to validate that pointers belong to the correct memory zone. These checks reveal what zone a struct lives in and whether it is read-only.
Reading zone validation
When you see code like this:
```c
// Inside _proc_ucred:
x1 = *(arg1 + 0x18);              // load proc_ro pointer
zone_id_require_ro_panic(5, x1);  // validate it belongs to zone #5
```
Then we can deduce:
- `proc + 0x18` is a pointer to another struct
- That struct lives in zone #5
- Zone #5 is a read-only zone (the `_ro` suffix)
Zone ID mapping
By collecting all zone_id_require_ro_panic calls across the kernelcache, you can build a complete map of protected zones:
| Zone ID | Struct | Protection |
|---|---|---|
| 3 | thread_ro | read-only |
| 5 | proc_ro | read-only |
| 7 | ucred | read-only |
| 0x17 | proc | Regular zalloc (with zone_require) |
Understanding which structures are in read-only zones tells you about the kernel's security architecture. Fields that Apple moved into proc_ro are protected and cannot be modified even with a kernel read/write primitive.
Phase 7: Following Pointer Chains (Graph Traversal)
Individual functions rarely traverse more than one or two pointer hops. But by combining offsets discovered in different functions, you can build paths between objects that have no direct accessor.
For example, there is no socket_get_proc() in the KPI — you cannot find the owning process of a socket with a single function search. But the path exists if you chain discoveries from earlier phases:
- From `socreate_internal` (Phase 4): `socket + 0x288` stores the creating thread (`tpidr_el1`)
- From `_current_proc` (Phase 1): `thread + 0x350` → `thread_ro`, then `thread_ro + 0x10` → `proc`
```
socreate_internal      _current_proc          _proc_pid
found in Phase 4       found in Phase 1       found in Phase 3
      │                      │                      │
      ▼                      ▼                      ▼
┌──────────┐ +0x288 ┌────────────┐ +0x350 ┌────────────┐ +0x10 ┌──────────┐ +0x60
│  socket  │ ──────→│   thread   │ ──────→│ thread_ro  │ ─────→│   proc   │ ─────→ p_pid
└──────────┘        └────────────┘        └────────────┘       └──────────┘
 (tpidr_el1)                               (zone RO #3)
```
Neither function knows about the other. But combining them gives you a three-hop path from any socket to its owning process — something you could never find by searching function names alone.
This is where the work becomes cumulative: every offset you confirmed in Phases 1–5 is a building block. The more you have, the more paths you can construct.
Phase 8: Hash Tables and Complex Data Structures
Some kernel lookups use hash tables instead of linked lists. The hash function and table structure can be recovered from the lookup function.
Example: PID hash table from _proc_find
_proc_find takes a PID and returns the corresponding proc. Decompiling it reveals:
- A multiplicative hash function applied to the PID
- A global hash table pointer at a known address
- A mask derived from table metadata
- A chain walk through the collision list, comparing PIDs
The hash entry lives at proc + 0xA0, which means the proc struct has a LIST_ENTRY at that offset for chaining in the hash table. The PID comparison happens at hash_entry - 0xA0 + 0x60, confirming p_pid at +0x60 from another angle.
Practical Tips
Function clusters reveal struct regions
If you find proc_pid at +0x60, proc_ppid at +0x20, and proc_pgrpid at +0x28, you know the PID-related fields are clustered in the +0x20–0x68 region. This helps you predict where other related fields might be, and focus your search.
Size hints from zalloc_ro_mut
When zalloc_ro_mut(zone_id, ptr, offset, src, size) is called, the size parameter tells you the total size of the read-only struct. For example, proc_ro is 0x80 bytes.
ARM64 instruction cheat sheet for offset extraction
| Instruction | What it tells you |
|---|---|
| `ldr x0, [x1, #0x60]` | 64-bit load from offset 0x60 |
| `ldr w0, [x1, #0x60]` | 32-bit load from offset 0x60 |
| `ldrh w0, [x1, #0x70]` | 16-bit load from offset 0x70 |
| `ldrb w0, [x1, #0x64]` | 8-bit load from offset 0x64 |
| `str x2, [x1, #0x18]` | 64-bit store at offset 0x18 |
| `add x0, x1, #0x579` | Compute address at offset 0x579 (often for strings/arrays) |
| `stp x2, x3, [x1, #0x50]` | Store pair: 64-bit values at +0x50 and +0x58 |
| `adrp x8, PAGE` then `ldr x8, [x8, #OFF]` | Global variable load at PAGE+OFF |
| `mrs x0, tpidr_el1` | Load current thread pointer |
Field size from instruction width
The ARM64 instruction tells you the field size:
- `ldr x` / `str x` → 8 bytes (pointer, uint64)
- `ldr w` / `str w` → 4 bytes (int32, uint32, pid_t)
- `ldrh` / `strh` → 2 bytes (uint16, short)
- `ldrb` / `strb` → 1 byte (uint8, char, bool)
References
- Jonathan Levin, *OS Internals (volumes I–III): the definitive reference on XNU internals
- Apple XNU source — opensource.apple.com
