SQLite Primary Key Table B-Tree Storage Mechanism Explained


SQLite B-Tree Architecture for Primary Key Tables

The core of SQLite’s storage engine revolves around its implementation of B-trees, which are hierarchical data structures optimized for disk-based storage and retrieval. Unlike B+ trees, which separate internal nodes (for navigation) and leaf nodes (for data storage), SQLite’s B-tree design merges these roles. This design choice directly impacts how tables—including those with only a primary key—are stored on disk.

Every SQLite database is divided into fixed-size pages (default: 4 KB). These pages form the atomic unit of I/O operations. For a table with a single-column primary key, the B-tree structure organizes rows such that each leaf node contains the primary key value and any associated data. In the simplest case (e.g., a table with only a primary key and no additional columns), the leaf node stores the primary key itself.

Why B-Tree Instead of B+ Tree?
B+ trees are often praised for their efficiency in range queries due to their linked leaf nodes. However, SQLite prioritizes simplicity and predictable performance across diverse workloads. B-trees allow SQLite to store both keys and data in leaf nodes, reducing the need for additional pointer lookups required in B+ trees. This trade-off is intentional: SQLite avoids specialized optimizations (like B+ trees) to maintain a lightweight codebase and universal applicability.

Leaf Node Structure
For a table with only a primary key (e.g., CREATE TABLE t(id INTEGER PRIMARY KEY);), each leaf node contains:

  • A header with metadata (e.g., cell count, free block offsets).
  • Cells storing the primary key values. Each cell includes the key’s encoded value and a pointer to its payload (if applicable).

In this minimalistic scenario, the payload is effectively the key itself. The B-tree ensures that keys are sorted in ascending order, enabling binary search within pages and logarithmic-time traversal across the tree.

Interior Node Dynamics
Interior nodes (non-leaf pages) guide navigation to leaf nodes. Each interior node contains K keys and K+1 pointers to child pages. For example, if an interior page has keys [10, 20, 30], it will have four pointers: pages for keys <10, 10–20, 20–30, and >30. The value of K is determined dynamically based on key size and page capacity, ensuring that interior nodes maximize space utilization without exceeding page limits.


Variable-Length Keys and Page Capacity Constraints

A critical challenge in SQLite’s storage model is handling variable-length keys, such as strings or blobs, which can theoretically exceed the available space on a page. This raises questions about how SQLite guarantees that interior nodes always store a minimum number of keys (K ≥ 4) and how overflow mechanisms prevent structural degradation.

Key Size Limitations and Overflow Pages
SQLite enforces a strict rule: no single key can consume more than one-fourth of a page’s usable space. For a 4 KB page, this limits individual keys to approximately 1 KB. If a key exceeds this threshold, SQLite splits it across overflow pages. For example, a 5 KB string key in an index would occupy one primary page slot (with a 1 KB prefix) and four overflow pages (each holding 1 KB of data). This ensures that interior pages retain sufficient capacity to store at least four keys, preserving the B-tree’s balance and search efficiency.

Calculating K for Interior Pages
The number of keys (K) in an interior page is derived via:

Usable_Page_Space = Page_Size - Reserved_Header_Bytes  
Max_Key_Size = Usable_Page_Space / 4  
K = Floor((Usable_Page_Space - Pointer_Size) / (Avg_Key_Size + Pointer_Size))  

For fixed-size keys (e.g., 4-byte integers), K is stable. For variable-length keys, SQLite uses prefix compression to minimize redundancy. Consecutive keys in sorted order share common prefixes, allowing only the differing suffix to be stored. This optimization increases K without compromising page integrity.

Implications for Primary Key-Only Tables
When a table’s primary key is a variable-length type (e.g., TEXT), SQLite automatically enforces the 1/4-page limit. If the key exceeds this size, overflow pages are allocated. However, primary keys often use fixed-size types (e.g., INTEGER), avoiding overflow and ensuring that K remains maximized.


SQL Key Definitions vs. B-Tree Key Implementation

The term “key” in SQLite’s documentation refers to B-tree keys, which are distinct from SQL-level primary or unique keys. Understanding this distinction is vital for reconciling schema design with storage behavior.

SQL Primary Keys and B-Tree Keys
In SQLite, every table is represented as a B-tree. For ROWID tables (the default), the B-tree key is the ROWID, a 64-bit integer. Declaring a column as INTEGER PRIMARY KEY aliases it to the ROWID, effectively making the SQL primary key the B-tree key. For WITHOUT ROWID tables, the B-tree key is the concatenation of columns in the SQL primary key.

Unique Indexes as B-Tree Keys
Unique indexes (including implicit ones for primary keys) are stored as separate B-trees. Each entry in an index B-tree contains the indexed columns’ values followed by the corresponding table’s B-tree key (ROWID or primary key columns). This ensures index entries uniquely identify rows.

Composite Keys and Storage Layout
For a composite primary key (e.g., PRIMARY KEY (a, b)), the B-tree key is the concatenation of a and b in indexed order. Each column’s value is encoded using SQLite’s binary serialization format, which includes type identifiers and variable-length data. The on-disk layout ensures that composite keys are compared byte-for-byte, following the declared collation for TEXT columns.

Practical Example: Minimal Table Storage
Consider a table with only a primary key:

CREATE TABLE example(id TEXT PRIMARY KEY) WITHOUT ROWID;  

The B-tree key is the id column. If id is "ABCDEFGHIJ" (10 ASCII bytes), each leaf node cell stores:

  • A 1-byte header (serial type code for TEXT).
  • The 10-byte string.
  • No payload (since there are no additional columns).

If the string grows beyond 1 KB, overflow pages are allocated, but the B-tree structure remains intact.

Debugging Common Misconceptions

  • Myth: “SQLite’s B-tree keys are the same as SQL unique constraints.”
    Reality: Unique constraints are enforced via separate B-trees (indexes), whereas the table’s B-tree key is either the ROWID or the primary key.
  • Myth: “Large keys fragment the B-tree.”
    Reality: Overflow pages isolate large values, preserving the B-tree’s balance and page occupancy guarantees.

By dissecting SQLite’s B-tree mechanics, page management, and key handling, developers can optimize schema designs and anticipate storage behaviors. This deep integration of file format and query execution underscores SQLite’s reliability as an embedded database engine.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *