Ever since we introduced our ideas for the next version of Akonadi, we’ve been working on a proof-of-concept implementation, but we haven’t talked much about it. I’d therefore like to give a short progress report.
Having chosen decentralized storage and a key-value store as the underlying technology, we first needed to prove that this approach can deliver the desired performance with all pieces of the infrastructure in place. I think we have mostly reached that milestone by now. The new architecture is very flexible and looks promising so far. IMO we managed quite well to keep the levels of abstraction to the necessary minimum, which results in a system that is easy to adjust as new problems need to be solved and that feels very controllable from a developer’s perspective.
We’ve started off by implementing the full stack for a single resource and a single domain type. For this we developed a simple dummy resource that currently uses an in-memory hash map as its backend and can only store events. This is a sufficient first step, as turning it into the full solution is a matter of adding further FlatBuffers schemas for other types and defining the indexes needed for the queries we want to run. By working on a single type first, we can carve out the necessary interfaces and make sure that the effort required to add new types stays minimal, thus maximizing code reuse.
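To illustrate why adding a type should mostly be declarative, here is a minimal Python sketch. It assumes a hypothetical `DomainType` description (a type name plus the properties to index) standing in for a real FlatBuffers schema, and plain dicts standing in for the in-memory hash map backend. None of these names come from the actual codebase; like the dummy resource, it only supports creating entities.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass(frozen=True)
class DomainType:
    """Hypothetical stand-in for a FlatBuffers schema plus its index definitions."""
    name: str
    indexed_properties: Tuple[str, ...]


@dataclass
class InMemoryStore:
    """Generic store: one hash map for entities, one hash map per (type, property) index."""
    entities: Dict[Tuple[str, str], Dict[str, Any]] = field(default_factory=dict)
    indexes: Dict[Tuple[str, str], Dict[Any, Set[str]]] = field(default_factory=dict)

    def write(self, dtype: DomainType, uid: str, entity: Dict[str, Any]) -> None:
        # Creation only: removal and modification would also need to clean up stale index entries.
        self.entities[(dtype.name, uid)] = entity
        # Only the properties the type declares as indexed get an index entry.
        for prop in dtype.indexed_properties:
            index = self.indexes.setdefault((dtype.name, prop), {})
            index.setdefault(entity.get(prop), set()).add(uid)

    def query(self, dtype: DomainType, prop: str, value: Any) -> List[Dict[str, Any]]:
        # Look up matching uids in the secondary index, then fetch the entities.
        uids = self.indexes.get((dtype.name, prop), {}).get(value, set())
        return [self.entities[(dtype.name, uid)] for uid in sorted(uids)]


# Adding a new type is just another declaration; the store code is reused unchanged.
EVENT = DomainType("event", ("summary",))
TODO = DomainType("todo", ("status",))
```

The point of the design choice is that `InMemoryStore` never mentions events or todos: everything type-specific lives in the declaration, which is what keeps the cost of each additional type low.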
The design we’re pursuing, as presented during the PIM sprint, consists of:
- A set of resource processes
- A store per resource, maintained by the individual resources (there is no central store)
- A set of indexes maintained by the individual resources
- A clientapi that knows how to access the store and how to talk to the resources through a plugin provided by the resource implementation.
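The component list above can be sketched as follows. This is a simplified Python illustration, not the real client API: `ResourcePlugin`, `DummyPlugin`, `ClientApi` and the `"<type>.<instance>"` resource id convention are all hypothetical. The key property it shows is that each resource instance owns its own store, and the client only reaches it through the plugin the resource provides.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class ResourcePlugin(ABC):
    """Hypothetical plugin interface a resource implementation ships for clients."""

    @abstractmethod
    def send_create(self, entity: Dict[str, Any]) -> None:
        """Hand a new entity to the resource (a separate process in the real design)."""

    @abstractmethod
    def read(self, prop: str, value: Any) -> List[Dict[str, Any]]:
        """Read from this resource's own store."""


class DummyPlugin(ResourcePlugin):
    """Toy plugin: the 'resource' and its private store live in this object."""

    def __init__(self) -> None:
        self._store: List[Dict[str, Any]] = []  # per-resource store, not shared

    def send_create(self, entity: Dict[str, Any]) -> None:
        self._store.append(entity)

    def read(self, prop: str, value: Any) -> List[Dict[str, Any]]:
        # Linear scan for brevity; a real resource would consult its indexes.
        return [e for e in self._store if e.get(prop) == value]


class ClientApi:
    """Routes every call to the plugin of the addressed resource; there is no central store."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], ResourcePlugin]] = {}
        self._instances: Dict[str, ResourcePlugin] = {}

    def register(self, resource_type: str, factory: Callable[[], ResourcePlugin]) -> None:
        self._factories[resource_type] = factory

    def _plugin(self, resource_id: str) -> ResourcePlugin:
        # Hypothetical id convention "<type>.<instance>", e.g. "dummy.instance1".
        if resource_id not in self._instances:
            resource_type = resource_id.split(".", 1)[0]
            self._instances[resource_id] = self._factories[resource_type]()
        return self._instances[resource_id]

    def create(self, resource_id: str, entity: Dict[str, Any]) -> None:
        self._plugin(resource_id).send_create(entity)

    def query(self, resource_id: str, prop: str, value: Any) -> List[Dict[str, Any]]:
        return self._plugin(resource_id).read(prop, value)
```

Because two instances of the same resource type get separate plugin objects, writing to `"dummy.a"` never shows up when querying `"dummy.b"`.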
By now we can write to the dummyresource through the client API; the resource internally queues the new entity, updates its indexes and writes the entity to storage. On the reading side we can execute simple queries against the indexes and retrieve the matching entities. Meanwhile, the synchronizer process can also generate new entities, so client and synchronizer can write to the store concurrently. We can therefore do the full write/read round trip, which means the most fundamental requirements are covered. Still missing are operations other than creating new entities (removal and modification), as well as the writeback to the source by the synchronizer. But that’s just a matter of completing the implementation (we have the design).
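The write path described above (enqueue, update the index, write the entity) can be sketched with a single processing thread that drains a queue fed by several producers. This is a minimal Python sketch under assumptions: threads stand in for the client and synchronizer processes, a `summary` property stands in for whatever secondary index the resource defines, and `Resource`, `enqueue` and `query_by_summary` are hypothetical names.

```python
import queue
import threading
from typing import Any, Dict, List, Optional, Set


class Resource:
    """Toy resource: concurrent producers enqueue; one processor owns index and storage."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Optional[Dict[str, Any]]]" = queue.Queue()
        self.storage: Dict[str, Dict[str, Any]] = {}
        self.summary_index: Dict[str, Set[str]] = {}  # secondary index: summary -> uids
        self._lock = threading.Lock()  # guards storage/index against concurrent readers
        self._worker = threading.Thread(target=self._process)
        self._worker.start()

    def enqueue(self, entity: Dict[str, Any]) -> None:
        # Safe to call from any thread, e.g. the client API or the synchronizer.
        self._pending.put(entity)

    def _process(self) -> None:
        while True:
            entity = self._pending.get()
            if entity is None:  # shutdown sentinel
                break
            with self._lock:
                # Update the secondary index, then write the entity itself.
                self.summary_index.setdefault(entity["summary"], set()).add(entity["uid"])
                self.storage[entity["uid"]] = entity

    def shutdown(self) -> None:
        # The sentinel is queued after everything already enqueued, so all of it gets processed.
        self._pending.put(None)
        self._worker.join()

    def query_by_summary(self, summary: str) -> List[Dict[str, Any]]:
        with self._lock:
            uids = sorted(self.summary_index.get(summary, set()))
            return [self.storage[uid] for uid in uids]


def produce(resource: Resource, prefix: str, n: int) -> None:
    for i in range(n):
        resource.enqueue({"uid": f"{prefix}-{i}", "summary": prefix})


# Client and synchronizer write concurrently; the single processor serializes the writes.
res = Resource()
client = threading.Thread(target=produce, args=(res, "client", 100))
sync = threading.Thread(target=produce, args=(res, "synchronizer", 100))
client.start()
sync.start()
client.join()
sync.join()
res.shutdown()
```

Funnelling all writers through one queue means the index and the stored entities are only ever modified by a single thread, which keeps them consistent without the writers having to coordinate with each other.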
To the numbers: writing from the client is currently implemented in a very inefficient way, and it’s trivial to drastically improve this, but in my latest test I could already write ~240 (small) entities per second. Reading achieves around 40k entities per second (in a single query), including the lookup on the secondary index. The upper limit of what the storage itself can achieve (on my laptop) is 30k entities per second for writing and 250k entities per second for reading, so there is room for improvement =)
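To put those figures in perspective, the ratio between the raw storage limit and the measured full-stack throughput gives the headroom left. The short Python calculation below uses only the numbers quoted above; note that the ratio is an upper bound on possible gains, not a prediction, since the full stack will always do more work per entity than the bare storage.

```python
def headroom(measured: float, limit: float) -> float:
    """How many times faster the raw storage is than the full stack."""
    return limit / measured


# Entities per second, as quoted above (single laptop).
measured = {"write": 240, "read": 40_000}
storage_limit = {"write": 30_000, "read": 250_000}

for op in ("write", "read"):
    share = measured[op] / storage_limit[op]
    print(f"{op}: {share:.1%} of the storage limit, {headroom(measured[op], storage_limit[op]):g}x headroom")
# write: 0.8% of the storage limit, 125x headroom
# read: 16.0% of the storage limit, 6.25x headroom
```

Writing is clearly where the inefficiency sits, which matches the observation that the client-side write path is the part that is trivial to improve.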
Given that the design and performance look promising so far, the next milestone will be to refactor the codebase enough that new resources can be added with ease, and to make sure all the necessary facilities (such as a proper logging system), or at least stubs thereof, are available.
I’m writing this on a plane to Singapore, which we’re using as a gateway to Indonesia to chase waves and volcanoes for the next few weeks, but after that I’m looking forward to going full steam ahead with what we started here. I think it’s going to be something cool =)