The storage model used in encounter.io

08 Jun 2014

The first version of the encounter.io editor used a pretty standard setup with MongoDB on the server and backbone.js on the client. Objects were stored as a whole and users had to press save buttons to have their changes saved. We noticed that this was not user-friendly and that we should do better.

So I started investigating how to automatically track changes to objects, so we could add auto-save. At the time I came across Object.observe and it seemed ideal for the client. So I wrote avers.js, which helps track changes to deeply nested objects. A similar project exists as part of Polymer: observe-js.

Auto-save is very convenient, but can also be dangerous, in particular if there is no undo. So instead of overwriting the object on the server after each change, we store the changes individually. Because each change is usually very small and mostly self-contained, this gives us undo and potentially collaborative editing for free. Think Google Docs, where two people can edit the same document at the same time without stepping on each other's toes.

The new storage model evolved over time. This document describes the basic idea and implementation. Our server is written in Haskell and uses RethinkDB for persistent storage. The code is not open source, mainly because I haven't put much thought into how to sensibly split it up into a public library. But if you are interested, get in touch with me and I'll show you the code.

The Avers storage model

The Avers storage model defines how domain objects are managed both on the server and client. The definition includes a protocol which is used to synchronize changes to these objects. Avers imposes certain restrictions on how the domain objects can be modeled. Therefore, it is not suitable for every use case.

Avers started out as avers.js, a purely client-side library to observe changes made to plain JavaScript objects. From there on, it grew into a protocol used to describe these changes, as well as a server-side implementation to handle persistent storage.

Objects

The basic entity in Avers is an object. Metadata about the object include a unique identifier, a type which is used to determine how to parse the object, and when and by whom it was created. The content itself is stored separately.

Atomic changes to an object are explicitly recorded and stored as patches. Each revision of an object is assigned a (per-object) unique number. This number starts at zero and increments with each patch.

To restore the contents of an object at a given revision, the sequence of patches leading up to that revision is applied to a stub. To make this retrieval efficient, precomputed snapshots are stored at regular revision intervals.
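The retrieval described above can be sketched roughly as follows. This is only an illustration, not the actual Avers code: the function contentAtRevision, the in-memory snapshots and patches objects, and the simplified { key, value } patch shape are all assumptions made for this example.

```javascript
// Hypothetical in-memory stores, both keyed by revision number:
//   snapshots[rev] is the content of the object at revision rev,
//   patches[rev] brings the object from revision rev - 1 to rev.
function contentAtRevision(snapshots, patches, rev, applyPatch) {
    // Find the latest snapshot at or before the requested revision.
    var base = rev;
    while (base > 0 && !snapshots.hasOwnProperty(base)) {
        base -= 1;
    }

    // Start from a copy of that snapshot, or from an empty stub if
    // there is none. The copy keeps the stored snapshot untouched.
    var content = snapshots.hasOwnProperty(base)
        ? JSON.parse(JSON.stringify(snapshots[base]))
        : {};

    // Replay only the patches between the snapshot and the target.
    for (var r = base + 1; r <= rev; r++) {
        applyPatch(content, patches[r]);
    }
    return content;
}

// Simplified stand-in for real patch application (an assumption).
function applyPatch(content, patch) {
    content[patch.key] = patch.value;
}

var patches = {
    1: { key: 'title', value: 'Draft' },
    2: { key: 'title', value: 'Final' },
    3: { key: 'author', value: 'alice' }
};
var snapshots = { 2: { title: 'Final' } };

// Uses the snapshot at revision 2 and replays only patch 3.
var v3 = contentAtRevision(snapshots, patches, 3, applyPatch);
```

With a snapshot every N revisions, at most N - 1 patches have to be replayed for any revision, no matter how long the history of the object is.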

Operations

Operations describe atomic changes to an object. Since Avers started out as a JavaScript library, operations map closely to changes emitted by Object.observe. Avers maps these changes to just two operations:

set: Add, update or delete a property.
splice: Remove and insert elements of an array.

Operations include a path, which describes at which position, relative to the root, the operation should be applied. The path is encoded as a dot-delimited series of properties to walk down.

Example:

var op = { type: 'set', path: 'the.answer', value: 42 };

// Apply the operation to this root object:
var root = { the: {} };
applyOperation(op, root);

// Verify that the answer was set.
assert(root.the.answer === 42);
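For illustration, here is one way applyOperation could be written. It is a minimal sketch rather than the code from avers.js. The field names of the splice operation (index, remove, insert) and the convention that a set without a value deletes the property are assumptions; the text above only fixes type and path.

```javascript
// Walk down a dot-delimited path and return the value it points to.
// An empty path refers to the root itself.
function resolvePath(root, path) {
    if (path === '') {
        return root;
    }
    return path.split('.').reduce(function (obj, key) {
        return obj[key];
    }, root);
}

// Apply a single operation to the root object, mutating it in place.
function applyOperation(op, root) {
    if (op.type === 'set') {
        // The last path segment names the property; the segments
        // before it identify the parent object.
        var keys = op.path.split('.');
        var last = keys.pop();
        var parent = resolvePath(root, keys.join('.'));
        if (op.value === undefined) {
            // Assumed convention: a missing value deletes the property.
            delete parent[last];
        } else {
            parent[last] = op.value;
        }
    } else if (op.type === 'splice') {
        // Assumed shape: remove `remove` elements starting at `index`,
        // then insert the elements of `insert` at the same position.
        var array = resolvePath(root, op.path);
        Array.prototype.splice.apply(
            array, [op.index, op.remove].concat(op.insert));
    } else {
        throw new Error('Unknown operation type: ' + op.type);
    }
}

var doc = { the: {}, list: [1, 2, 3] };
applyOperation({ type: 'set', path: 'the.answer', value: 42 }, doc);
applyOperation(
    { type: 'splice', path: 'list', index: 1, remove: 1, insert: [9, 8] },
    doc);
// doc.list is now [1, 9, 8, 3]
```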

Releases

In the traditional model, the user makes multiple changes and then presses a save button to commit them. Avers saves changes automatically as soon as they happen. This makes it impossible to tell whether a change is self-contained or part of a larger changeset.

To allow users to safely edit an object and only publish it once they are assured that its state is sound, they can explicitly mark revisions as released.

Authorization

Each object has an associated record which describes who is authorized to view and manipulate it. The record is stored separately, to allow editing it independently of the actual object.

Avers only provides this record as a means to store the authorization rules. The form in which the rules are stored and how the checks are implemented (ACL, RBAC, ABAC, ...) depend on the particular requirements of the project.

In most cases, a simple model based on roles (admin, user, collaborator etc) and permissions (read, modify, delete etc) will be sufficient.
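Such a role-based check could look roughly like this. The shape of the authorization record, the rolePermissions table and the isAuthorized function are hypothetical; Avers does not prescribe any of them.

```javascript
// Hypothetical mapping from roles to the permissions they grant.
var rolePermissions = {
    admin:        ['read', 'modify', 'delete'],
    collaborator: ['read', 'modify'],
    user:         ['read']
};

// Check whether a user may perform an action on an object, given the
// authorization record of that object. The record is assumed to map
// user IDs to roles.
function isAuthorized(record, userId, permission) {
    var role = record.roles[userId];
    var granted = rolePermissions[role] || [];
    return granted.indexOf(permission) !== -1;
}

var record = { roles: { alice: 'admin', bob: 'collaborator' } };

isAuthorized(record, 'alice', 'delete'); // true
isAuthorized(record, 'bob', 'delete');   // false
isAuthorized(record, 'carol', 'read');   // false, carol has no role
```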

Persistent storage

To provide persistent storage, objects and related metadata are stored in a database. There are three basic tables: objects, patches and snapshots.

The tables are independent of each other, and because all data is written once and never modified afterwards, no foreign key constraints have to be maintained. The tables can even be stored in different databases.

Objects are given a unique, alphanumeric ID upon creation. This ID is used in derived form for all rows in the aforementioned three main tables.

objects: id
patches: id@rev - Patch which brings the object from rev - 1 to rev
snapshots: id@rev - Snapshot of an object at rev.

Because databases enforce a uniqueness constraint on primary keys, each revision of an object can have at most one patch. This ensures that the patch sequence for a given object is monotonic. Release and Authorization objects also get derived IDs:

release: id/release/rev
authorization: id/authorization
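To make the key scheme concrete, here is a small sketch. The helper functions and the in-memory Table are stand-ins invented for this example; a real database enforces the primary key constraint itself.

```javascript
// Derive primary keys from an object ID, following the scheme above.
function patchKey(id, rev)    { return id + '@' + rev; }
function snapshotKey(id, rev) { return id + '@' + rev; }
function releaseKey(id, rev)  { return id + '/release/' + rev; }
function authorizationKey(id) { return id + '/authorization'; }

// Stand-in for a database table with a unique primary key.
function Table() {
    this.rows = {};
}

Table.prototype.insert = function (key, row) {
    if (this.rows.hasOwnProperty(key)) {
        throw new Error('Duplicate primary key: ' + key);
    }
    this.rows[key] = row;
};

// Two clients race to write revision 1 of the same object. The first
// insert succeeds, the second one is rejected by the key constraint.
var patchTable = new Table();
patchTable.insert(patchKey('abc123', 1), { author: 'alice' });

var rejected = false;
try {
    patchTable.insert(patchKey('abc123', 1), { author: 'bob' });
} catch (e) {
    rejected = true;
}
```

The client whose patch was rejected has to fetch the patches it missed and retry with the next revision number.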

Secondary indices

It is often necessary to look up objects by a certain condition, or to ensure that a particular field is unique among all objects of a type (such as the username). Maintaining these secondary indices is outside the scope of Avers.

Most projects will require additional tables anyway, such as for HTTP sessions, or to store user passwords outside of the main Avers tables. These tables can be used to provide secondary indices.