A zero-copy, compile-time multi-index data structure for Zig that enables querying a single collection through multiple independent indexes simultaneously, with automatic index selection and synchronization.
const MultiIndex = @import("multi-index").MultiIndex;
const Person = struct {
name: []const u8,
age: u8,
email: []const u8, // Not indexed; can still be stored
};
var people = MultiIndex(Person, .{
.name = .{
.unique = true,
.ordered = false, // Unordered index (hash table)
},
.age = .{}, // Ordered index (AVL tree, multiset allowed)
}).init(allocator);
defer people.deinit();
// Insert
try people.insert(.{ .name = "Alice", .age = 30, .email = "alice@example.com" });
try people.insert(.{ .name = "Bob", .age = 25, .email = "bob@example.com" });
// O(1) amortized lookup by name
if (people.find(.name, "Alice")) |person| {
std.debug.print("{s} is {}\n", .{ person.name, person.age });
}
// O(log n) range queries by age
var range = people.range(.age, 25, 30);
while (range.next()) |person| {
std.debug.print("- {s} ({})\n", .{ person.name, person.age });
}
// Cannot lookup by email (not indexed)
// people.find(.email, "alice@example.com") // Compile error!
// Automatic re-indexing on update
try people.update(iterator, .{ .name = "Alicia", .age = 31, .email = "alicia@example.com" });zig-multi-index is a generic data structure that maintains a single collection of values indexed through multiple independent index backends (e.g. AVL trees, hash tables...). Each index is a separate data structure that references the same underlying collection, enabling O(log n) or O(1) lookups across any indexed field without duplicating the actual data.
Key characteristics:
- Type-safe, field-based indexing: Indexes are selected using tagged field names (
.name,.age) - Automatic index selection: The library chooses the best data structure based on your config (currently AVL or hash-table)
- Zero-copy: All indexes reference the same underlying node; no data duplication
- Flexible: Index only the fields you need; store any data you want
- Atomic updates: Changes to one index are atomic across all indexes via two-phase insert/update protocol
zig-multi-index maintains three layers:
- Node Structure (Runtime): A wrapper containing the actual value and headers for each index
- Indexes (Runtime): Multiple independent data structures (AVL or hash table), each maintaining a view into the nodes
- MultiIndex Wrapper (Compile-time generated): A type-safe API that coordinates all indexes
MultiIndex<Person> (type-safe API)
├─ Index[.name] (Hash Table) → Person.name nodes (O(1) lookup)
└─ Index[.age] (AVL Tree) → Person.age nodes (O(log n) + range queries)
All indexes contain pointers to the SAME underlying Node<Person> instances
The key insight is using intrusive node headers: instead of storing separate copies of data, each index holds a header (node pointer) that is embedded inside the actual node. This allows:
- Zero duplication: One copy of each value lives in memory
- Structural synchronization: Removing a node automatically removes it from all indexes
- Cache efficiency: All indexes traverse the same node pointers
const Node = struct {
headers: struct {
name: hash_table.Node, // embedded header for name index (hash table)
age: avl.Node, // embedded header for age index (AVL tree)
},
value: Person, // the actual data (single copy)
};Note: The header field names match the indexed field names for clarity. Each header type depends on the configuration.
To maintain consistency across indexes, operations use a prepare-finish pattern:
Insert:
- Prepare Phase: Each index validates constraints (uniqueness) and reserves a position (hint)
- Finish Phase: Each index commits the node using the reserved hint
If any index fails validation, the entire operation fails before any index is modified, preserving atomicity.
// Step 1: Prepare all indexes
var hints: Hints = undefined;
inline for (indexes) |index| {
hints[i] = try index.prepare_insert(node); // May fail on duplicate
}
// Step 2: All or nothing commit
inline for (indexes) |index| {
index.finish_insert(hints[i], node); // Guaranteed to succeed
}Update:
- Prepare: Check if reordering/rehashing is needed in each index and validate constraints
- Commit: Remove from old position, insert at new position (if needed)
If an index doesn't need reordering, it's skipped.
The Config type describes which fields get indexed and how:
MultiIndex(T, .{
.field_name = .{
.unique = true, // Enforce uniqueness (default: false)
.ordered = true, // Choose an ordered data structure (default: true)
// for AVL
.compare_fn = custom_cmp, // Custom comparator (default: std.math.order)
// for HashTable
.hash_context = context, // Custom hash context for unordered indexes
.custom = CustomIndexType, // Use custom index backend
},
// ... only indexed fields go here
})At compile-time, Zig's comptime evaluation:
- Analyzes the config struct
- Generates unique index types for each field
- Creates a unified
Indexesstruct containing all backends - Generates
Node.Headersstruct with embedded headers - Produces an API that references fields by comptime field enums
Comptime:
- No runtime polymorphism or virtual dispatch
- All type relationships known at compile-time
- Inlining opportunities throughout
Structural Guarantees:
- Intrusive headers mean no extra allocations
- Update hints allow skipping unnecessary rebalancing
Index Backends:
- AVL Trees (ordered, O(log n) insert/lookup, self-balancing)
- Hash Tables (unordered, O(1) amortized insert/lookup)
- Selection is compile-time based on the config struct
pub fn insert(self: *Self, value: T) !void
pub fn update(self: *Self, it: Iterator, value: T) !T
pub fn erase_it(self: *Self, it: Iterator) void
pub fn count(self: Self) usize
pub fn reset(self: *Self) void
pub fn deinit(self: *Self) void// Exact match on any indexed field
pub fn find(self: Self, comptime field: Field, v: KeyType) ?T
pub fn find_it(self: Self, comptime field: Field, v: KeyType) Iterator
pub fn equal_range(self: Self, field: Field, v: KeyType) Range
// Range queries (ordered indexes only)
pub fn lower_bound(self: Self, comptime field: Field, v: KeyType) Iterator
pub fn upper_bound(self: Self, comptime field: Field, v: KeyType) Iterator
pub fn range(self: Self, field: Field, v1: KeyType, v2: KeyType) Rangevar range = people.range(.age, 25, 30);
while (range.next()) |person| {
// person is of type T
}// Change axe of iteration
iterator.switch_field(.new_field)
// Bulk erase by range
map.erase_range(map.range(.field, low, high))No runtime decision-making about which index to use. The compiler generates monomorphic code with no indirection. An insert call directly inlines into the specific AVL and hash table implementations being used.
Field names are not strings—they're comptime field enums generated from your struct. Attempting to index a non-indexed field is a compile-error:
people.find(.email, "...") // Error: email is not indexedStores one copy of each value with embedded headers. The indexes are just different tree structures pointing at the same nodes. This is similar to C++ Boost.MultiIndex's intrusive approach, but enforced by Zig's type system.
Specify ordered: true/false and the library picks the optimal backend at compile-time. Mix different index types for different fields:
MultiIndex(Person, .{
.id = .{ .unique = true, .ordered = false }, // Hash table (O(1) lookup)
.age = .{ .unique = false, .ordered = true }, // AVL (O(log n) + ranges)
})Updates that span multiple indexes use a prepare-finish protocol, ensuring all-or-nothing semantics. If reindexing fails on any index, the entire operation is rolled back.
zig-multi-index currently provides two backends:
| Backend | Ordered | Lookup | Insert | Remove | Use Case |
|---|---|---|---|---|---|
| AVL Tree | ✅ Yes | O(log n) | O(log n) | O(log n) | Range queries, sorted iteration |
| Hash Table | ❌ No | O(1) avg | O(1) avg | O(1) avg | Fast exact-match lookups |
The architecture is designed to support additional backends. Planned additions:
- Red-Black trees, scape goat trees... Similar to an AVL but with slightly different performance.
- Skip Lists: Ordered access but simpler to implement and better cache locality for sequential scans
- B-Trees: Better I/O performance for disk-backed or large-scale indexes
Example (future):
MultiIndex(Person, .{
.timestamp = .{ .custom = SkipList }, // Custom backend
.score = .{ .ordered = true }, // AVL
.uid = .{ .ordered = false }, // Hash table
})zig-multi-index is ideal for:
- Databases/indexes with multiple query paths (uid, name, timestamp, etc.)
- Cache managers needing LRU + lookup-by-key
- Routing tables with prefix + length ordering
- Event queues indexed by priority and deadline
- Any scenario where one collection needs 2+ independent access patterns
Consider alternatives if:
- You only have a single query pattern (use a single index)
- Your indexes don't need to stay in sync (use separate data structures)
- You need complex queries beyond exact match and range (full database)
Beyond simple key-value indexes, the architecture is extensible to other data structures:
- Priority Queue Index: Use a field as a priority key in a priority queue, enabling efficient
pop_min()orpop_max()across the entire dataset - Full-Text Search Index: Index string fields for keyword matching
These would be accessed similarly:
// Future: priority queue index on deadline field
var urgent = events.pop_min(.deadline);
// Future: full-text search index
var results = docs.search(.content, "keyword");This project is licensed under the MIT License - see the LICENSE file for details.
Author: Joachim Giron