Abstract
This series is the journey to implement a simplistic LSM engine, more focussed on simplicity than features. The history goes to previous failed attempts to implement database(s) as part of book reading clubs and using already implemented projects. Then came across this wonderful course mini-lsm – building LSM storage engine in a week. It is written in Rust, and even though Rust is on learning list decided to use Java to implement the course. Goal was to keep least amount of friction, learning Rust and implementing would slow me down initially. Eventually, shall reimplement again in Rust.
This series is not going to delve into theoretical aspects, they are covered in details in lot of posts and books. Goal is – A simple working LSM implementation
The series shall be multi-part and does not exactly follows the mini-lsm course. The exact details shall come as we walk the trail. Broadly, the plan is following
- Implement Memtable
- Implement SSTable
- SSTable compaction
- Add WAL
- Optionally use the engine for Key-Value store and add client lib
NOTE: All Unit Test cases are generated via Claude. And the code currently doesn’t handle concurrency issues
Memtable
Memtable is focus for this series. The book Database Internal covers it in details in Chapter 7 Log-Structured Storage. Essentially Memtable is the structure that keeps data in memory and receives all the writes. Once a threshold is reached like memory size, Memtable is flushed to the disk as SSTable.
To implement a Memtable, we need a Data structure that meets following requirements
- Fast Lookup
- Provides sorted reading of completed data or allow partial scan
To begin, added a simple interface for Memtable functionality
public interface Memtable {
byte[] get(byte[] key);
void put(byte[] key, byte[] value);
int getId();
long approximateSize();
}
To meet the requirements for a Memtable, ConcurrentSkipListMap fits the needs as it provides ordering as we need and fast lookup.
The implementation is fairly straight forward and doesn’t need much discussion.
public class SkipListMemtable implements Memtable {
private ConcurrentSkipListMap<ByteArrayWrapper, byte[]> map = new ConcurrentSkipListMap<>();
private final int id;
private AtomicLong estimatedSize = new AtomicLong(0);
public SkipListMemtable(int id) {
this.id = id;
}
@Override
public byte[] get(byte[] key) {
byte[] value = map.get(new ByteArrayWrapper(key));
return value == null ? null : Arrays.copyOf(value, value.length);
}
@Override
public void put(byte[] key, byte[] value) {
map.put(new ByteArrayWrapper(key), value);
estimatedSize.addAndGet(key.length + value.length);
}
// .. some functionality not shown
}
ByteArrayWrapper has been used as wrapper class with comparator. There are other ways to pass comparator, however this makes reading code easier. We shall handle delete calls later, as delete in LSM world means adding tombstones.
Iterator
Coming to 2nd important part of adding an iterator which can facilitate reading data to flush, as well as provide scan functionality. To keep things simple, kept the functionality to minimum
public record KeyValuePair(ByteArrayWrapper keyWrapper, byte[] data){}
public interface MemtableIterator {
boolean hasNext();
KeyValuePair next();
}
KeyValuePair was created to keep code simpler and in-sync with Iterator philosophy. It’s implemented differently on rust mini-lsm side.
The implementation mostly rely on the ConcurrentSkipListMap iterator. We have 2 cases to support
- Reading entire data of Memtable
- Reading partial data by providing key range. Here we shall keep life simple and ensure that provided keys are included. Key exclusion shall be implemented later
Here is quick implementation
public class DefaultMemtableIterator implements MemtableIterator {
private ConcurrentSkipListMap<ByteArrayWrapper, byte[]> map;
Iterator<Map.Entry<ByteArrayWrapper, byte[]>> iterator;
/**
* Defaulkt constructor to iterator complete Memtable like flushing to disk
* @param memtable
*/
public DefaultMemtableIterator(ConcurrentSkipListMap<ByteArrayWrapper, byte[]> memtable) {
this.map = memtable;
this.iterator = memtable.entrySet().iterator();
}
/**
* Memtable to be used when querying a range of data from within the Memtable
* To start we implement key's being inclusive, we can add the Bounds later
*
* @param memtable
* @param beginKey
* @param endKey
*/
public DefaultMemtableIterator(ConcurrentSkipListMap<ByteArrayWrapper, byte[]> memtable, ByteArrayWrapper beginKey, ByteArrayWrapper endKey) {
this.map = memtable;
NavigableMap<ByteArrayWrapper, byte[]> iterableMap;
if(beginKey == null && endKey == null) {
iterableMap = memtable;
} else if(beginKey == null ) {
iterableMap = map.headMap(endKey);
} else if(endKey == null ) {
iterableMap = map.tailMap(beginKey);
} else {
iterableMap = map.subMap(beginKey, true, endKey, true);
}
iterator = iterableMap.entrySet().iterator();
}
@Override
public boolean hasNext() {
return iterator.hasNext();
}
@Override
public KeyValuePair next() {
Map.Entry<ByteArrayWrapper, byte[]> entry = iterator.next();
return new KeyValuePair(entry.getKey(), entry.getValue());
}
}
We rely on the NavigableMap provided by ConcurrentSkipListMap. This completes a very basic Memtable implementation. There is more needed to handle Memtables in LSM, and we shall add them as we move forward on the journey.