So what is Kube? (and who is Sink?)

Michael first blogged about Kube, but we apparently missed to properly introduce the Project. Let me fix that for you 😉

Kube is a modern groupware client, built to be effective and efficient on a variety of platforms and form-factors. It is built on top of a high-performance data access layer and Qt Quick to provide an exceptional user experience with minimal resource usage. Kube is based on the lessons learned from KDE Kontact and Akonadi, building on the strengths and replacing the weak points.

Kube is further developed in coordination with Roundcube Next, to achieve a consistent user experience across the two interfaces and to ensure that we can collaborate while building the UX.

A roadmap has been available for some time for the first release here, but in the long run we of course want to go beyond a simple email application. The central aspects of the the problem space that we want to address is communication and collaboration as well as organization. I know this is till a bit fuzzy, but there is a lot of work to be done before we can specify this clearly.

To ensure that we can move fast once the basic framework is ready, the architecture is very modular to enable component reuse and make it as easy as possible to create new ones. This way we can shift our focus over time from building the technology stack to evolving the UX.

Sink

Sink is a high-performance data access layer that provides a plugin mechanism for various backends (remote servers e.g. imap, local maildir, …) an editable offline cache that can replay changes to the server, a query system for efficient data-access and a unified API for groupware types such as events, mails, todos, etc.

It is built on top of LMDB (a key-value store) and Qt to be fast and efficient.

Sink is built for reliability, speed and maintainability.

What Kube & Sink aren’t

It is not a rename of Kontact and Akonadi.
Kontact and Akonadi will continue to be maintained by the KDEPIM team and Kube is a separate project (altough we share bits and pieces under the hood).
It is not a rewrite of Kontact
There is no intention of replicating Kontact. We’re not interested in providing every feature that Kontact has, but rather focus on a set that is useful for the usecases we try to solve (which is WIP).

Development

Development planning happens on phabricator, and the kdepim mailinglist. Our next sprint is in Toulouse together with the rest of the KDEPIM team.

We also have a weekly meeting on Wednesday, 16:00 CET with notes sent to the ML. If you would like to participate in those meetings just let me know, you’re more than welcome.

Current state

Kube is under heavy development and in an early stage, but we’re making good progress and starting to see the first results (you can read mail from maildir and even reply to mails). However, it is not yet ready for general consumption (though installable installable).

If you want to follow the development closely it is also possible to build Kube inside a docker container, or just use the container that contains a built version of Kube (it’s not yet updated automatically, so let me know if you want further information on that).

I hope that makes it a bit clearer what Kube and Sink is and isn’t, and where we’re going with it. If something is still unclear, please let me know in the comments section, and if you want to participate, by all means, join us =)

Kube Architecture – A Primer

Kube’s architecture is starting to emerge, so it is time that I give an overview on the current plans.

But to understand why we’re going where we’re going it is useful to consider the assumptions we work with, so let’s start there:

Kube is a networked application.
While Kube can certainly be used on a machine that has never seen a network connection, that is not where it shines. Kube is built to interact with various services and to work well with multiple devices. This is the reality we live in and that we’re building for.
Kube is scalable.
Kube not only scales from small datasets that are quick to synchronize to large datasets, that we can’t simply load into memory all at once. It also scales to different form factors. Kube is usable on devices with small and large screens, with touch or mouse input, etc.
Kube is cross platform.
Kube should run just as well on your laptop (be it Linux, OS X or Windows) as it does on your mobile (be it Plasma Mobile or Android).
Kube is a platform for rapid development.
We’re not interested in rebuilding mail and calendar and stopping there. Groupware needs to evolve and we want to facilitate communication and collaboration, not email and events. This requires that the user experience can continue to evolve and that we can experiment with new ideas quickly, without having to do large-scale changes to the codebase.
Groupware types are overlapping.
Traditionally PIM/Groupware applications are split up by formats and protocols, such as IMAP, MIME and iCal but that’s not how typical workflows work. Just because the transport chosen by iTip for an invitation happens to be a MIME message transported over IMAP to my machine, doesn’t mean that’s necessarily how I want to view it. I may want to start a communication with a person from my addressbook, calendar or email composer. A note may turn into a set of todo’s eventually. …

A lot of pondering over these points has led to a set of concepts that I’d like to quickly introduce:

Components

Kube is built from different components. Each component is a KPackage that provides a QML UI backed by various C++ elements from the Kube framework. By building reusable components we ensure that i.e. the email application can show the very same contact view as the addressbook, with all the actions you’d expect available. This not only allows us to mix various UI elements freely while building the User Experience, it also ensures consistency across the board with little effort. The components load their data themselves by instantiating the appropriate models and are thus fully self contained.

Components will come in various granularities, from simple widgets suitable for popup display to i.e. a full email application.

The components concept will also be interesting for integration. A plasma clock plasmoid could for instance detect that the Kube calendar package is available, and show this instead of it’s native one. That way the integration is little effort, the user experience is well integrated (you get the exact same UX as in the regular application), and the full set of functionality is directly available (unlike when only the data was shared).

Models

Kube is reactive. Models provide the data that the UI is built upon, so the UI only has to render whatever the model provides. This avoids complex stateful UI’s and ensures a proper separation of bussiness logic and UI. The UI directly instantiates and configures the models it requires.
The models feed on the data they get from Sink or other sources, and are as such often thin wrappers around other API’s. The dynamic nature of models allows to dynamically load more data as required to keep the system efficient.

Actions

In the other direction provide “Actions” the interaction with the rest of the system. An action can be “mark as read”, or “send mail”, or any other interaction with the system that is suitable for reuse. The action system is a publisher-subscriber system where various parts can execute actions that are handled by one of the registered action-handlers.

This loose-coupling between action and handler allows actions to be dynamically handled by different parts of the system system, i.e. based on the currently active account when sending an email. It also ensures that action handlers are nice and small functional components that can be invoked from various parts in the system that require similar functionality.

Pre-Handlers allow preparatory steps to be injected into the action-execution, such as retrieving configuration or requesting authentication, or resolving some identifier over a remote service. Anything that is required really to have all input data available to be able to execute the action handler.

Controllers

Controllers are C++ components that expose properties for a QML UI. These are useful to prepare data for the UI where a simple model is not sufficient, and can include additional UI-helpers such as validators or autocompletion for input fields.

Accounts

Accounts is the attempt to account for (pun intended) the networked nature of the environment we’re working in. Most information we’re working with in Kube is or should be synchronized over one or the other account and there remains very little that is specific to the local machine (besides application state). This means most data and configuration is always tied to an account to ensure clear ownership.

However accounts not only manifest in where data is being put, they also manifest as “plugins” for various backends. They tie together a QML configuration UI, an underlying configuration controller (for validation and autocompletion etc), a Sink resource to access data i.e. over IMAP, a set of action handlers i.e. to send mail over smtp and potentially various defaults for identity etc.

In case you’re internally already shouting “KAccounts!, KAccounts!”; We’re aware of the overlap, but I don’t see how we can solve all our problems using it, and there is definitely an argument for an integrated solution with regards to portability to other platforms. However, I do think there are opportunities in terms of platform integration.

An that’s it!

Further information can be found in the Kube Documentation.

Bringing Akonadi Next up to speed

It’s been a while since the last progress report on akonadi next. I’ve since spent a lot of time refactoring the existing codebase, pushing it a little further,
and refactoring it again, to make sure the codebase remains as clean as possible. The result of that is that an implementation of a simple resource only takes a couple of template instantiations, apart from code that interacts with the datasource (e.g. your IMAP Server) which I obviously can’t do for the resource.

Once I was happy with that, I looked a bit into performance, to ensure the goals are actually reachable. For write speed, operations need to be batched into database transactions, this is what allows the db to write up to 50’000 values per second on my system (4 year old laptop with an SSD and an i7). After implementing the batch processing, and without looking into any other bottlenecks, it can process now ~4’000 values per second, including updating ten secondary indexes. This is not yet ideal given what we should be able to reach, but does mean that a sync of 40’000 emails would be done within 10s, which is not bad already. Because commands first enter a persistent command queue, pulling the data offline is complete even faster actually, but that command queue afterwards needs to be processed for the data to become available to the clients and all of that together leads to the actual write speed.

On the reading side we’re at around 50’000 values per second, with the read time growing linearly with the amount of messages read. Again far from ideal, which is around 400’000 values per second for a single db (excluding index lookups), but still good enough to load large email folders in a matter of a second.

I implemented the benchmarks to get these numbers, so thanks to HAWD we should be able to track progress over time, once I setup a system to run the benchmarks regularly.

With performance being in an acceptable state, I will shift my focus to the revisioned, which is a prerequisite for the resource writeback to the source. After all, performance is supposed to be a desirable side-effect, and simplicity and ease of use the goal.

Randa

Coming up next week is the yearly Randa meeting where we will have the chance to sit together for a week and work on the future of Kontact. This meetings help tremendously in injecting momentum into the project, and we have a variety of topics to cover to direct the development for the time to come (and of course a lot of stuff to actively hack on). If you’d like to contribute to that you can help us with some funding. Much appreciated!

Kontact on Windows

I recently had the dubious pleasure of getting Kontact to work on windows, and after two weeks of agony it also yielded some results =)

Not only did I get Kontact to build on windows (sadly still something to be proud off), it is also largely functional. Even timezones are now working in a way that you can collaborate with non-windows users, although that required one or the other patch to kdelibs.

To make the whole excercise as reproducible as possible I collected my complete setup in a git repository [0]. Note that these builds are from the kolab stable branches, and not all the windows specific fixes have made it back upstream yet. That will follow as soon as the waters calm a bit.

If you want to try it yourself you can download an installer here [1],
and if you don’t (I won’t judge you for not using windows) you can look at the pretty pictures.

[0] https://github.com/cmollekopf/kdepimwindows
[1] http://mirror.kolabsys.com/pub/upload/windows/Kontact-E5-2015-06-30-19-41.exe

Kontact_WindowsAccount_Wizard

Reproducible testing with docker

Reproducible testing is hard, and doing it without automated tests, is even harder. With Kontact we’re unfortunately not yet in a position where we can cover all of the functionality by automated tests.

If manual testing is required, being able to bring the test system into a “clean” state after every test is key to reproducibility.

Fortunately we have a lightweight virtualization technology available with linux containers by now, and docker makes them fairly trivial to use.

Docker

Docker allows us to create, start and stop containers very easily based on images. Every image contains the current file system state, and each running container is essentially a chroot containing that image content, and a process running in it. Let that process be bash and you have pretty much a fully functional linux system.

The nice thing about this is that it is possible to run a Ubuntu 12.04 container on a Fedora 22 host (or whatever suits your fancy), and whatever I’m doing in the container, is not affected by what happens on the host system. So i.e. upgrading the host system does not affect the container.

Also, starting a container is a matter of a second.

Reproducible builds

There is a large variety of distributions out there, and every distribution has it’s own unique set of dependency versions, so if a colleague is facing a build issue, it is by no means guaranteed that I can reproduce the same problem on my system.

As an additional annoyance, any system upgrade can break my local build setup, meaning I have to be very careful with upgrading my system if I don’t have the time to rebuild it from scratch.

Moving the build system into a docker container therefore has a variety of advantages:
* Builds are reproducible across different machines
* Build dependencies can be centrally managed
* The build system is no longer affected by changes in the host system
* Building for different distributions is a matter of having a couple of docker containers

For building I chose to use kdesrc-build, so building all the necessary repositories is the least amount of effort.

Because I’m still editing the code from outside of the docker container (where my editor runs), I’m simply mounting the source code directory into the container. That way I don’t have to work inside the container, but my builds are still isolated.

Further I’m also mounting the install and build directories, meaning my containers don’t have to store anything and can be completely non-persistent (the less customized, the more reproducible), while I keep my builds fast and incremental. This is not about packaging after all.

Reproducible testing

Now we have a set of binaries that we compiled in a docker container using certain dependencies, so all we need to run the binaries is a docker container that has the necessary runtime dependencies installed.

After a bit of hackery to reuse the hosts X11 socket, it’s possible run graphical applications inside a properly setup container.

The binaries are directly mounted from the install directory, and the prepared docker image contains everything from the necessary configurations to a seeded Kontact configuration for what I need to test. That way it is guaranteed that every time I start the container, Kontact starts up in exactly the same state, zero clicks required. Issues discovered that way can very reliably be reproduced across different machines, as the only thing that differs between two setups is the used hardware (which is largely irrelevant for Kontact).

..with a server

Because I’m typically testing Kontact against a Kolab server, I of course also have a docker container running Kolab. I can again seed the image with various settings (I have for instance a John Doe account setup, for which I have the account and credentials already setup in client container), and the server is completely fresh on every start.

Wrapping it all up

Because a bunch of commands is involved, it’s worthwhile writing a couple of scripts to make the usage a easy as possible.

I went for a python wrapper which allows me to:
* build and install kdepim: “devenv srcbuild install kdepim”
* get a shell in the kdepim dir: “devenv srcbuild shell kdepim”
* start the test environment: “devenv start set1 john”

When starting the environment the first parameter defines the dataset used by the server, and the second one specifies which client to start, so I can have two Kontact instances with different users for invitation handling testing and such.

Of course you can issue any arbitrary command inside the container, so this can be extended however necessary.

While that would of course have been possible with VMs for a long time, there is a fundamental difference in performance. Executing the build has no noticeable delay compared to simply issuing make, and that includes creating a container from an image, starting the container, and cleaning it up afterwards. Starting the test server + client also takes all of 3 seconds. This kind of efficiency is really what enables us to use this in a lather, rinse, repeat approach.

The development environment

I’m still using the development environment on the host system, so all file-editing and git handling etc. happens as usual so far. I still require the build dependencies on the host system, so clang can compile my files (using YouCompleteMe) and hint if I made a typo, but at least these dependencies are decoupled from what I’m using to build Kontact itself.

I also did a little bit of integration in Vim, so my Make command now actually executes the docker command. This way I get seamless integration and I don’t even notice that I’m no longer building on the host system. Sweet.

While I’m using Vim, there’s no reason why that shouldn’t work with KDevelop (or whatever really..).

I might dockerize my development environment as well (vim + tmux + zsh + git), but more on that in another post.

Overall I’m very happy with the results of investing in a couple of docker containers, and I doubt we could have done the work we did, without that setup. At least not without a bunch of dedicated machines just for that. I’m likely to invest more in that setup, and I’m currently contemplating dockerizing also my development setup.

In any case, sources can be found here:
https://github.com/cmollekopf/docker.git

Progress on the prototype for a possible next version of akonadi

Ever since we introduced our ideas the next version of akonadi, we’ve been working on a proof of concept implementation, but we haven’t talked a lot about it. I’d therefore like to give a short progress report.

By choosing decentralized storage and a key-value store as the underlying technology, we first need to prove that this approach can deliver the desired performance with all pieces of the infrastructure in place. I think we have mostly reached that milestone by now. The new architecture is very flexible and looks promising so far. We managed IMO quite well to keep the levels of abstraction to a necessary minimum, which results in a system that is easily adjusted as new problems need to be solved and feels very controllable from a developer perspective.

We’ve started off with implementing the full stack for a single resource and a single domain type. For this we developed a simple dummy-resource that currently has an in-memory hash map as backend, and can only store events. This is a sufficient first step, as turning that into the full solution is a matter of adding further flatbuffer schemas for other types and defining the relevant indexes necessary to query what we want to query. By only working on a single type we can first carve out the necessary interfaces and make sure that we make the effort required to add new types minimal and thus maximize code reuse.

The design we’re pursuing, as presented during the pim sprint, consists of:

  • A set of resource processes
  • A store per resource, maintained by the individual resources (there is no central store)
  • A set of indexes maintained by the individual resources
  • A clientapi that knows how to access the store and how to talk to the resources through a plugin provided by the resource implementation.

By now we can write to the dummyresource through the client api, the resource internally queues the new entity, updates it’s indexes and writes the entity to storage. On the reading part we can execute simple queries against the indexes and retrieve the found entities. The synchronizer process can meanwhile generate also new entities, so client and synchronizer can write concurrently to the store. We therefore can do the full write/read roundtrip meaning we have most fundamental requirements covered. Missing are other operations than creating new entities (removal and modifications), and the writeback to the source by the synchronizer. But that’s just a matter of completing the implementation (we have the design).

To the numbers: Writing from the client is currently implemented in a very inefficient way and it’s trivial to drastically improve this, but in my latest test I could already write ~240 (small) entities per second. Reading works around 40k entities per second (in a single query) including the lookup on the secondary index. The upper limit of what the storage itself can achieve (on my laptop) is at 30k entities per second to write, and 250k entities per second to read, so there is room for improvement =)

Given that design and performance look promising so far, the next milestone will be to refactor the codebase sufficiently to ensure new resources can be added with sufficient ease, and making sure all the necessary facilities (such as a proper logging system), or at least stubs thereof, are available.

I’m writing this on a plane to Singapore which we’re using as gateway to Indonesia to chase after waves and volcanoes for the next few weeks, but after that I’m  looking forward to go full steam ahead with what we started here. I think it’s going to be something cool =)

A new folder subscription system

Wouldn’t it be great if Kontact would allow you to select a set of folders you’re interested in, that setting would automatically be respected by all your devices and you’d still be able to control for each individual folder whether it should be visible and available offline?

I’ll line out a system that allows you to achieve just that in a groupware environment. I’ll take Kolab and calendar folders as example, but the concept applies to all groupware systems and is just as well applicable to email or other groupware content.

User Scenarios

  •  Anna has access to hundreds of shared calendars, but she usually only uses a few selected ones. She therefore only has a subset of the available calendars enabled, that are shown to her in the calendar selection dialog, available for offline usage and also get synchronized to her mobile phone. If she realizes she no longer requires a calendar, she simply disables it and it disappears from the Kontact, the Webclient and her phone.
  • Joe works with a small team that shares their calendars with him. Usually he only uses the shared team-calendar, but sometimes he wants to quickly check if they are in the office before calling them, and he’s often doing this in the train with unreliable internet connection. He therefore disables the team member’s calendars but still enables synchronization for them. This hides the calendars from all his devices, but he still can quickly enable them on his laptop while being offline.
  • Fred has a mailing list folder that he always reads on his mobile, but never on his laptop. He keeps the folder enabled, but hides it on his laptop so his folder list isn’t unnecessarily cluttered.

What these scenarios tell us is that we need a flexible mechanism to specify the folders we want to see and the folders we want synchronized. Additionally we want, in today’s world where we have multiple devices, to synchronize the selection of folders that are important to us. It is likely I’d like to see the calendar I have just enabled in Kontact also on my phone. However, we always want to keep the possibility to alter that default setting on specific devices.

Current State

If you’re using a Kolab Server, you can use IMAP subscriptions to control what folders you want to see on your devices. Kontact currently respects that setting in that it makes all folders visible and available for offline usage. Additionally you have local subscriptions to disable certain folders (so they are not downloaded or displayed) on a specific device. That is not very flexible though, and personally I ended up having pretty much all folders enabled that I ever used, leading to cluttered folder selections and lot’s of bandwith and storage space used to keep everything available offline.

To change the subscription state, KMail offers to open the IMAP-subscription dialog which allows to toggle the subscription state of individual folders. This works, but is not well integrated (it’s a separate dialog), and is also not well integrable since it’s IMAP specific.

Because the solution is not well integrated, it tends to be rather static in my experience. I tend to subscribe to all folders that I ever use, which results in a very long and cluttered folder-list.

A new integrated subscription system

What would be much better, is if the back-end could provide a default setting that is synchronized to the server, and we could quickly enable or disable folders as we require them. Additionally we can override the default settings for each individual folder to optimize our setup as required.

To make the system more flexible, while not unnecessarily complex, we need a per folder setting that allows to override a backend provided default value. Additionally we need an interface for applications to alter the subscription state through Akonadi (instead of bypassing it). This allows for a well integrated solution that doesn’t rely on a separate, IMAP-specific dialog.

Each folder requires the following settings:

  • An enabled/disabled state that provides the default value for synchronizing and displaying a folder.
  • An explicit preference to synchronize a folder.
  • An explicit preference to make a folder visible.

A folder is visible if:

  • There is an explicit preference that the folder is visible.
  • There is no explicit preference on visibility and the folder is enabled.

A folder is synchronized if:

  • There is an explicit preference that the folder is synchronized.
  • There is no explicit preference on synchronization and the folder is enabled.

The resource-backend can synchronize the enabled/disabled state which should give a default experience as expected. Additionally it is possible to override that default state using the explicit preference on a per folder level.

User Interaction

By default you would be working with the enabled/disabled state, that is synchronized by the resource backend. If you enable a folder it becomes visible and synchronized, if you disable it, it becomes invisible and not synchronized. For the enabled/disabled state we can build a very easy user interface, as it is a single boolean state, that we can integrate into the primary UI.

Because the enabled/disabled state is synchronized, an enabled calendar will automatically appear on your MyKolab.com web interface and your mobile. One click, and you’re all set.

Mockup of folder sync properties
Example mockup of folder sync properties

In the advanced settings, you can then override visibility and synchronization preference at will as a local-only setting, giving you full flexibility. This can be hidden in a properties dialog, so it doesn’t clutter the primary UI.

This makes the default usecase very simple to use (you either want a folder or you don’t want it), while we keep full flexibility in overriding the default behaviour.

IMAP Synchronization

The IMAP resource will synchronize the enabled/disabled state with IMAP subscriptions if you have subscriptions enabled in the resource. This way we can use the enabled/disabled state as interface to change the subscriptions, and don’t have to use a separate dialog to toggle that state.

Interaction with existing mechanisms

This mechanism can probably replace local subscriptions eventually. However, in order not to break existing setups I plan to leave local subscriptions working as they currently are.

Conclusion

By implementing this proposal we get the required flexibility to make sure the resources of our machine are optimally used, while different clients still interact with each other as expected. Additionally we gain a uniform interface to enable/disable a collection that can be synchronized by backends (e.g. using the IMAP subscription state). This will allow applications to nicely integrate this setting, and should therefore make this feature a lot easier to use and overall more agile.

New doors are opened as this will enable us to do on-demand loading of folders. By having the complete folder list available locally (but disabled by default and thus hidden), we can use the collections to load their content temporarily and on-demand. Want to quickly look at that shared calendar you don’t have enabled? Simply search for it and have a quick look, the data is synchronized on-demand and the folder is as quickly gone as you found it, once it is no longer required. This will diminish the requirement to have folders constantly clutter your folder list even further.

So, what do you think?