JankSQL uses the CSharpTest.Net B-Tree implementation, which is some amazing software. It implements a simple interface in its BTree
generic class so that we can make a BTree of keys and values over BTree<Key, Value>
. The class supports persistence and locking and gives enumerators that look up keys and walks the values available starting at a key.
In JankSQL, The BTree
class is used with the Tuple
class that implements a tuple of typed values, each represented with the ExpressionOperand
class. Tuple represents a set of values, so it's used for both the key and the value. Thus, JankSQL's use of BTree
is always on BTree<Tuple, Tuple>
. Tuple has helpers that implement the comparison and persistence interfaces that BTree
requires.
JankSQL implements a table, then, with a key-value store built on a BTree<Tuple, Tuple>
object. The value Tuple
contains all of the columns of the table. The key is a Tuple
that contains a single integer which is used as a monotonically increasing row ID.
Since the table has no index, any operation against it is a scan. Inserting a new row simply adds a one to the last used key and inserts the row as the value for that key. Deleting a row simply removes the row, and the key number is not re-used.
For now, this approach is quite adequate, but it does mean that a table can't survive more than 232 operations because the row ID value will wrap-around. (This is tracked by Issue #2).
Conceptually, we can consider table's fundamental storage -- sometimes called it "heap", perhaps incorrectly -- to be a map between the row ID and the actual row payload: BTree<RowID, Tuple>
. It's just that the row ID itself is implemented as a Tuple
, too.
A unique index in Jank augments the fundamental BTree
with another access path. Each index is implemented a map from the keys of the index to the row ID. We can consider the table and the first index as an example:
BTree<Tuple, Tuple> theTable; // key: row ID, value: rows
BTree<Tuple, Tuple> firstIndex; // key: index key, value: row ID
To find a row, we can look it up by key in firstIndex
to get a row ID. Then, to get the remaining columns, the row ID is used to probe theTable
to get that payload.
Any number of indexes can be created, all referencing back to theTable
via the row ID key.
Classically, BTrees implement only unique indexes: keys can't be duplicated. CSharpTest's implementation is no different, so Jank must provide some mechanism for handling duplicate key values in non-unique indexes.
Jank's approach simply appends a unique ID to the key set. If a non-unique index is created with key columns Col1
and Col2
, the effective key becomes (Col1, Col2, uniqueifier)
. A probe for a value into a non-unique index naturally is a scan, since there may be zero, one, or more values matching the key due to its non-unique nature.
Jank is limited again by using a 32-bit integer here, so any non-unique index an have only 232 keys with the same value.
The addition or removal of a row to the table updates all indexes. Updating a value in an existing row must update the indexes that cover that column.