Kontact on Windows

I recently had the dubious pleasure of getting Kontact to work on windows, and after two weeks of agony it also yielded some results =)

Not only did I get Kontact to build on windows (sadly still something to be proud off), it is also largely functional. Even timezones are now working in a way that you can collaborate with non-windows users, although that required one or the other patch to kdelibs.

To make the whole excercise as reproducible as possible I collected my complete setup in a git repository [0]. Note that these builds are from the kolab stable branches, and not all the windows specific fixes have made it back upstream yet. That will follow as soon as the waters calm a bit.

If you want to try it yourself you can download an installer here [1],
and if you don’t (I won’t judge you for not using windows) you can look at the pretty pictures.

[0] https://github.com/cmollekopf/kdepimwindows
[1] http://mirror.kolabsys.com/pub/upload/windows/Kontact-E5-2015-06-30-19-41.exe


Posted in KDE, Kolab | Tagged , | 10 Comments

Reproducible testing with docker

Reproducible testing is hard, and doing it without automated tests, is even harder. With Kontact we’re unfortunately not yet in a position where we can cover all of the functionality by automated tests.

If manual testing is required, being able to bring the test system into a “clean” state after every test is key to reproducibility.

Fortunately we have a lightweight virtualization technology available with linux containers by now, and docker makes them fairly trivial to use.


Docker allows us to create, start and stop containers very easily based on images. Every image contains the current file system state, and each running container is essentially a chroot containing that image content, and a process running in it. Let that process be bash and you have pretty much a fully functional linux system.

The nice thing about this is that it is possible to run a Ubuntu 12.04 container on a Fedora 22 host (or whatever suits your fancy), and whatever I’m doing in the container, is not affected by what happens on the host system. So i.e. upgrading the host system does not affect the container.

Also, starting a container is a matter of a second.

Reproducible builds

There is a large variety of distributions out there, and every distribution has it’s own unique set of dependency versions, so if a colleague is facing a build issue, it is by no means guaranteed that I can reproduce the same problem on my system.

As an additional annoyance, any system upgrade can break my local build setup, meaning I have to be very careful with upgrading my system if I don’t have the time to rebuild it from scratch.

Moving the build system into a docker container therefore has a variety of advantages:
* Builds are reproducible across different machines
* Build dependencies can be centrally managed
* The build system is no longer affected by changes in the host system
* Building for different distributions is a matter of having a couple of docker containers

For building I chose to use kdesrc-build, so building all the necessary repositories is the least amount of effort.

Because I’m still editing the code from outside of the docker container (where my editor runs), I’m simply mounting the source code directory into the container. That way I don’t have to work inside the container, but my builds are still isolated.

Further I’m also mounting the install and build directories, meaning my containers don’t have to store anything and can be completely non-persistent (the less customized, the more reproducible), while I keep my builds fast and incremental. This is not about packaging after all.

Reproducible testing

Now we have a set of binaries that we compiled in a docker container using certain dependencies, so all we need to run the binaries is a docker container that has the necessary runtime dependencies installed.

After a bit of hackery to reuse the hosts X11 socket, it’s possible run graphical applications inside a properly setup container.

The binaries are directly mounted from the install directory, and the prepared docker image contains everything from the necessary configurations to a seeded Kontact configuration for what I need to test. That way it is guaranteed that every time I start the container, Kontact starts up in exactly the same state, zero clicks required. Issues discovered that way can very reliably be reproduced across different machines, as the only thing that differs between two setups is the used hardware (which is largely irrelevant for Kontact).

..with a server

Because I’m typically testing Kontact against a Kolab server, I of course also have a docker container running Kolab. I can again seed the image with various settings (I have for instance a John Doe account setup, for which I have the account and credentials already setup in client container), and the server is completely fresh on every start.

Wrapping it all up

Because a bunch of commands is involved, it’s worthwhile writing a couple of scripts to make the usage a easy as possible.

I went for a python wrapper which allows me to:
* build and install kdepim: “devenv srcbuild install kdepim”
* get a shell in the kdepim dir: “devenv srcbuild shell kdepim”
* start the test environment: “devenv start set1 john”

When starting the environment the first parameter defines the dataset used by the server, and the second one specifies which client to start, so I can have two Kontact instances with different users for invitation handling testing and such.

Of course you can issue any arbitrary command inside the container, so this can be extended however necessary.

While that would of course have been possible with VMs for a long time, there is a fundamental difference in performance. Executing the build has no noticeable delay compared to simply issuing make, and that includes creating a container from an image, starting the container, and cleaning it up afterwards. Starting the test server + client also takes all of 3 seconds. This kind of efficiency is really what enables us to use this in a lather, rinse, repeat approach.

The development environment

I’m still using the development environment on the host system, so all file-editing and git handling etc. happens as usual so far. I still require the build dependencies on the host system, so clang can compile my files (using YouCompleteMe) and hint if I made a typo, but at least these dependencies are decoupled from what I’m using to build Kontact itself.

I also did a little bit of integration in Vim, so my Make command now actually executes the docker command. This way I get seamless integration and I don’t even notice that I’m no longer building on the host system. Sweet.

While I’m using Vim, there’s no reason why that shouldn’t work with KDevelop (or whatever really..).

I might dockerize my development environment as well (vim + tmux + zsh + git), but more on that in another post.

Overall I’m very happy with the results of investing in a couple of docker containers, and I doubt we could have done the work we did, without that setup. At least not without a bunch of dedicated machines just for that. I’m likely to invest more in that setup, and I’m currently contemplating dockerizing also my development setup.

In any case, sources can be found here:

Posted in KDE, Kolab | Tagged , , | 3 Comments

Progress on the prototype for a possible next version of akonadi

Ever since we introduced our ideas the next version of akonadi, we’ve been working on a proof of concept implementation, but we haven’t talked a lot about it. I’d therefore like to give a short progress report.

By choosing decentralized storage and a key-value store as the underlying technology, we first need to prove that this approach can deliver the desired performance with all pieces of the infrastructure in place. I think we have mostly reached that milestone by now. The new architecture is very flexible and looks promising so far. We managed IMO quite well to keep the levels of abstraction to a necessary minimum, which results in a system that is easily adjusted as new problems need to be solved and feels very controllable from a developer perspective.

We’ve started off with implementing the full stack for a single resource and a single domain type. For this we developed a simple dummy-resource that currently has an in-memory hash map as backend, and can only store events. This is a sufficient first step, as turning that into the full solution is a matter of adding further flatbuffer schemas for other types and defining the relevant indexes necessary to query what we want to query. By only working on a single type we can first carve out the necessary interfaces and make sure that we make the effort required to add new types minimal and thus maximize code reuse.

The design we’re pursuing, as presented during the pim sprint, consists of:

  • A set of resource processes
  • A store per resource, maintained by the individual resources (there is no central store)
  • A set of indexes maintained by the individual resources
  • A clientapi that knows how to access the store and how to talk to the resources through a plugin provided by the resource implementation.

By now we can write to the dummyresource through the client api, the resource internally queues the new entity, updates it’s indexes and writes the entity to storage. On the reading part we can execute simple queries against the indexes and retrieve the found entities. The synchronizer process can meanwhile generate also new entities, so client and synchronizer can write concurrently to the store. We therefore can do the full write/read roundtrip meaning we have most fundamental requirements covered. Missing are other operations than creating new entities (removal and modifications), and the writeback to the source by the synchronizer. But that’s just a matter of completing the implementation (we have the design).

To the numbers: Writing from the client is currently implemented in a very inefficient way and it’s trivial to drastically improve this, but in my latest test I could already write ~240 (small) entities per second. Reading works around 40k entities per second (in a single query) including the lookup on the secondary index. The upper limit of what the storage itself can achieve (on my laptop) is at 30k entities per second to write, and 250k entities per second to read, so there is room for improvement =)

Given that design and performance look promising so far, the next milestone will be to refactor the codebase sufficiently to ensure new resources can be added with sufficient ease, and making sure all the necessary facilities (such as a proper logging system), or at least stubs thereof, are available.

I’m writing this on a plane to Singapore which we’re using as gateway to Indonesia to chase after waves and volcanoes for the next few weeks, but after that I’m  looking forward to go full steam ahead with what we started here. I think it’s going to be something cool =)

Posted in KDE, Kolab | Tagged , , | 19 Comments

On Domain Models and Layers in kdepim

In our current kdepim code we use some classes throughout the codebase. I’m going to line out the problems with that and propose how we can do better.

The Application Domain

Each application has a “domain” it was created for. KOrganizer has for instance the calendar domain, and kmail the email domain, and each of those domains can be described with domain objects, which make up the domain model. The domain model of an application is essential, because it is what defines how we can represent the problems of that domain. If Korganizer didn’t have a domain model with attendees to events, we wouldn’t have any way to represent attendees internally, and thus couldn’t develop a feature based on that.

The logic implementing the functionality on top of those domain objects is the domain logic. It implements for instance what has to happen if we remove an event from a calendar, or how we can calculate all occurrences of a recurring event.

In the calendaring domain we use KCalCore to provide many of those domain objects and a large part of the domain logic. KCalCore::Event for instance, represents an event, can hold all necessary data of that event, and has the domain logic directly built-in to calculate recurrences.
Since it is a public library, it provides domain-objects and the domain-logic for all calendaring applications, which is awesome, right? Only if you use it right.


KCalCore provides additionally to the containers and the calendaring logic also serialization to the iCalendar format, which is also why it more or less tries to adhere to the iCalendar RFC, for both representation an interpretation of calendaring data. This is of course very useful for applications that deal with that, and there’s nothing particularly wrong with it. One could argue that serialization and interpretation of calendaring data should be split up, but since both is described by the same RFC I think it makes a lot of sense to keep the implementations together.


A problem arises when classes like KCalCore::Event are used as domain objects, and interface for the storage layer, and as actual storage format, which is precisely what we do in kdepim.

The problem is that we introduce very high coupling between those components/layers and by choosing a library that adheres to an RFC the whole system is even locked down by a fully grown specification. I suppose that would be fine if only one application is using the storage layer,
and that application’s sole purpose is to implement exactly that RFC and nothing else, ever. In all other cases I think it is a mistake.

Domain Logic

The domain logic of an application has to evolve with the application by definition. The domain objects used for that are supposed to model the problem at hand precisely, in a way that a domain logic can be built that is easy to understand and evolve as requirements change. Properties that are not used by an application only hide the important bits of a domain objects, and if a new feature is added it must be possible to adjust the domain object to reflect that. By using a class like KCalCore::Event for the domain object, these adjustments become largely impossible.

The consequence is that we employ workarounds everywhere. KCalCore doesn’t provide what you need? Simply store it as “custom property”. We don’t have a class for calendars? Let’s use Akonadi::Collection with some custom attributes. Mechanisms have been designed to extend these rigid structures so we can at least work with it, but that only lead to more complex code that is ever harder to understand.

Instead we could write domain logic that expresses precisely what we need, and is easier to understand and maintain.

Zanshin for instance took the calendaring domain, and applied the GettingThingsDone (GTD) methodology to it. It takes a rather simple approach to todo’s and initially only required a description, a due date and a state. However, it introduced the notion that only “projects” can have subtodo’s. This restriction needs to be reflected in the domain model, and implemented in the domain logic.
Because there are no projects in KCalCore, it was simply defined that todo’s with a magic property “X-Project” are defined as project. There’s nothing wrong with that itself, but you don’t want to litter your code with “if (todo->hasProperty(X-Project))”. So what do you do? You create a wrapper. And that wrapper is now already your new domain object with a nice interface. Kevin fortunately realized that we can do better, and rewrote zanshin with its own custom domain objects, that simply interface with the KCalCore containers in a thin translation layer to akonadi. This made the code much clearer, and keeps those “x-property”-workarounds in one place only.


A useful approach to think about application architecture are IMO layers. It’s not a silver bullet, and shouldn’t be done too excessively I think, but in some cases layer do make a lot of sense. I suggest to think about the following layers:

  • The presentation layer: Displays stuff and nothing else. This is where you expose your domain model to the UI, and where your QML sits.
  • The domain layer: The core of the application. This is where all the useful magic happens.
  • The data access layer: A thin translation layer between domain and storage. It makes it possible to use the same storage layer from multiple domains and to replace the storage layer without replacing all the rest.
  • The storage layer: The layer that persists the domain model. Akonadi.

By keeping these layer’s in mind we can do a better job at keeping the coupling at a reasonable level, allowing individual components to  change as required.

The presentation layer is required in any case if we want to move to QML. With QML we can no longer have half of the domain logic in the UI code, and most of the domain model should probably be exposed as a model that is directly accessible by QML.

The data access layer is where akonadi provides a standardized interface for all data, so multiple applications can shared the same storage layer. This is currently made up by the i.e. KCalCore for calendars, the akonadi client API, and a couple of akonadi objects, such as Akonadi::Item and Akonadi::Collection. As this layer defines what data can be accessed by all applications, it needs to be flexible and likely has to be evolved frequently.

The way forward

For akonadi’s client API, aka the data access layer, I plan on defining a set of interfaces for things like calendars, events, mailfolders, emails, etc. This should eventually replace KCalCore, KContacts and friends from being the canonical interface to the data.

Applications should eventually move to their own domain logic implementation. For reading and structuring data, models are IMO a suitable tool, and if we design them right this will also pave the way for QML interfaces. Of course i.e. KCalCore still has its uses for its calendaring routines, or as a serialization library to create iTip messages, but we should IMO stop using it for everything. The same of course applies to KContacts.

What we still could do IMO, is share some domain logic between applications, including some domain objects. A KDEPIM::Domain::Contact could be used across applications, just like KContact::Adressee was. This keeps different applications from implementing the same logic, but of course also introduces coupling between those again.

What IMO has to stay separate is the data access layer, which implements an interface to the storage layer, and that doesn’t necessarily conform to the domain layer (you could i.e. store “Blog posts” as notes in storage). This separation is IMO useful, as I expect the application domain to evolve separately from what actual storage backends provide (see zanshin).

This is of course quite a chunk of work, that won’t happen at once. But need to know now where we want to end up in a couple of years, if we intend to ever get there.

Posted in KDE, Kolab, Uncategorized | 1 Comment

Putting the code where it belongs

I have been working on better ways to write asynchronous code. In this post I’m going to analyze one of our current tools, KJob, in how it helps us writing asynchronous code and what is missing. I’m then going to present my prototype solution to address these problems.


In KDE we have the KJob class to wrap asynchronous operations. KJob gives us a framework for progress and error reporting, a uniform start method, and by subclassing it we can easily write our own reusable asynchronus operations. Such an asynchronous operation typically takes a couple of arguments, and returns a result.

A KJob, in it’s simplest form, is the asynchronous equivalent of a function call:

int doSomething(int argument) {
    return getNumber(argument);
struct DoSomething : public KJob {
    KJob(int argument): mArgument(argument){}

    void start() {
        KJob *job = getNumberAsync(mArgument);
        connect(job, SIGNAL(result(KJob*)), this, SLOT(onJobDone(KJob*)));

    int mResult;
    int mArgument;

private slots:
    void onJobDone(KJob *job) {
        mResult = job->result;

What you’ll notice immediately that this involves a lot of boilerplate code. It also introduces a lot of complexity in a seemingly trivial task. This is partially because we have to create a class when we actually wanted a function, and partially because we have to use class members to replace variables on the stack, that we don’t have available during an asynchronous operation.

So while KJob gives us a tool to wrap asynchronous operations in a way that they become reusable, it comes at the cost of quite a bit of boilerplate code. It also means that what can be written synchronously in a simple function, requires a class when writing the same code asynchronously.

Inversion of Control

A typical operation is of course slightly more complex than doSomething, and often consists of several (asynchronous) operations itself.

What in imperative code looks like this:

int doSomethingComplex(int argument) {
    return operation2(operation1(argument));

…results in an asynchronous operation that is scattered over multiple result handlers somewhat like this:

void start() {
    KJob *job = operation1(mArgument);
    connect(job, SIGNAL(result(KJob*)), this, SLOT(onOperation1Done(KJob*)));

void onOperation1Done(KJob *operation1Job) {
    KJob *job = operation2(operation1Job->result());
    connect(job, SIGNAL(result(KJob*)), this, SLOT(onOperation1Done(KJob*)));

void onOperation2Done(KJob *operation1Job) {
    mResult = operation1Job->result();

We are forced to split the code over several functions due to the inversion of control introduced by the handler based asynchronous programming. Unfortunately these additional functions (the handlers), that we are now forced to use, are not helping the program strucutre in any way. This manifests itself also in the rather useless function names that typically follow a pattern such as on”$Operation”Done() or similar. Further, because the code is scattered over functions, values that are available on the stack in a synchronous function have to be stored explicitly as class member, so they are available in the handler where they are required for a further step.

The traditional way to make code easy to comprehend is to split it up into functions that are then called by a higher level function. This kind of function composition is no longer possible with asynchronous programs using our current tools. All we can do is chain handler after handler. Due to the lack of this higher level function that composes the functionality, a reader is also force to read very single line of the code, instead of simply skimming the function names, only drilling deeper if more detailed information about the inner workings are required.
Since we are no longer able to structure the code in a useful way using functions, only classes, and in our case KJob’s, are left to structure the code. However, creating subjob’s is a lot of work when all you need is a function, and while it helps the strucutre, it scatters the code even more, making it potentially harder to read and understand. Due to this we also often end up with large and complex job classes.

Last but not least we loose all available control structures by the inversion of control. If you write asynchronous code you don’t have the if’s, for’s and while’s available that are fundamental to write code. Well, obviously they are still there, but if you write asynchronous code you can’t use them as usual because you can’t plug a complete asynchonous operation inside an if{}-block. The best that you can do, is initiate the operation inside the imperative control structures, and dealing with the results later on in handlers. Because we need control structures to build useful programs, these are usually emulated by building complex statemachines where each function depends on the current class state. A typical (anti)pattern of that kind is a for loop creating jobs, with a decreasing counter in the handler to check if all jobs have been executed. These statemachines greatly increase the complexity of the code, are higly error prone, and make larger classes incomprehensible without drawing complex state diagrams (or simply staring at the screen long enough while tearing your hear out).

Oh, and before I forget, of course we also no longer get any useful backtraces from gdb as pretty much every backtrace comes straight from the eventloop and we have no clue what was happening before.

As a summary, inversion of control causes:

  • code is scattered over functions that are not helpful to the structure
  • composing functions is no longer possible, since what would normally be written in a function is written as a class.
  • control structures are not usable, a statemachine is required to emulate this.
  • backtraces become mostly useless

As an analogy, your typical asynchronous class is the functional equivalent of single synchronous function (often over 1000 loc!), that uses goto’s and some local variables to build control structures. I think it’s obvious that this is a pretty bad way to write code, to say the least.


Fortunately we received a new tool with C++11: lambda functions
Lambda functions allow us to write functions inline with minimal syntactical overhead.

Armed with this I set out to find a better way to write asynchronous code.

A first obvious solution is to simply write the result handler of a slot as lambda function, which would allow us to write code like this:

make_async(operation1(), [] (KJob *job) {
    //Do something after operation1()
    make_async(operation2(job->result()), [] (KJob *job) {
        //Do something after operation2()

It’s a simple and concise solution, however, you can’t really build reuasable building blocks (like functions) with that. You’ll get one nested tree of lambda’s that depend on each other by accessing the results of the previous jobs. The problem making this solution non-composable is that the lambda function which we pass to make_async, starts the asynchronous task, but also extracts results from the previous job. Therefore you couldn’t, for instance, return an async task containing operation2 from a function (because in the same line we extract the result of the previous job).

What we require instead is a way of chaining asynchronous operations together, while keeping the glue code separated from the reusable bits.

JobComposer is my proof of concept to help with this:

class JobComposer : public KJob
    //KJob start function
    void start();

    //This adds a new continuation to the queue
    void add(const std::function<void(JobComposer&, KJob*)> &jobContinuation);

    //This starts the job, and connects to the result signal. Call from continuation.
    void run(KJob*);

    //This starts the job, and connects to the result signal. Call from continuation.
    //Additionally an error case continuation can be provided that is called in case of error, and that can be used to determine wether further continuations should be executed or not.
    void run(KJob*, const std::function<bool(JobComposer&, KJob*)> &errorHandler);


The basic idea is to wrap each step using a lambda-function to issue the asynchronous operation. Each such continuation (the lambda function) receives a pointer to the previous job to extract results.

Here’s an example how this could be used:

auto task = new JobComposer;
task->add([](JobComposer &t, KJob*){
    KJob *op1Job = operation1();
    t.run(op1Job, [](JobComposer &t, KJob *job) {
        kWarning() << "An error occurred: " << job->errorString()
task->add([](JobComposer &t, KJob *job){
    KJob *op2Job = operation2(static_cast<Operation1*>(job)->result());
    t.run(op2Job, [](JobComposer &t, KJob *job) {
        kWarning() << "An error occurred: " << job->errorString()
task->add([](JobComposer &t, KJob *job){
    kDebug() << "Result: " << static_cast<Operation2*>(job)->result();

What you see here is the equivalent of:

int tmp = operation1();
int res = operation2(tmp);
kDebug() << res;

There are several important advantages of using this to writing traditional asynchronous code using only KJob:

  • The code above, which would normally be spread over several functions, can be written within a single function.
  • Since we can write all code within a single function we can compose functions again. The JobComposer above could be returned from another function and integrated into another JobComposer.
  • Values that are required for a certain step can either be extracted from the previous job, or simply captured in the lambda functions (no more passing of values as members).
  • You only have to read the start() function of a job that is written this way to get an idea what is going on. Not the complete class.
  • A “backtrace” functionality could be built in to JobComposer that would allow to get useful information about the state of the program although we’re in the eventloop.

This is of course only a rough prototype, and I’m sure we can craft something better. But at least in my experiments it proved to work very nicely.
What I think would be useful as well are a couple of helper jobs that replace the missing control structures, such as a ForeachJob which triggers a continuation for each result, or a job to execute tasks in parallel (instead of serial as JobComposer does).

As a little showcase I rewrote a job of the imap resource.
You’ll see a bit of function composition, a ParallelCompositeJob that executes jobs in parallel, and you’ll notice that only relevant functions are left and all class members are gone. I find the result a lot better than the original, and the refactoring was trivial and quick.

I’m quite certain that if we build these tools, we can vastly improve our asynchronous code making it easier to write, read, and debug.
And I think it’s past time that we build proper tools.

Posted in KDE, Kolab | Tagged | 7 Comments

A new folder subscription system

Wouldn’t it be great if Kontact would allow you to select a set of folders you’re interested in, that setting would automatically be respected by all your devices and you’d still be able to control for each individual folder whether it should be visible and available offline?

I’ll line out a system that allows you to achieve just that in a groupware environment. I’ll take Kolab and calendar folders as example, but the concept applies to all groupware systems and is just as well applicable to email or other groupware content.

User Scenarios

  •  Anna has access to hundreds of shared calendars, but she usually only uses a few selected ones. She therefore only has a subset of the available calendars enabled, that are shown to her in the calendar selection dialog, available for offline usage and also get synchronized to her mobile phone. If she realizes she no longer requires a calendar, she simply disables it and it disappears from the Kontact, the Webclient and her phone.
  • Joe works with a small team that shares their calendars with him. Usually he only uses the shared team-calendar, but sometimes he wants to quickly check if they are in the office before calling them, and he’s often doing this in the train with unreliable internet connection. He therefore disables the team member’s calendars but still enables synchronization for them. This hides the calendars from all his devices, but he still can quickly enable them on his laptop while being offline.
  • Fred has a mailing list folder that he always reads on his mobile, but never on his laptop. He keeps the folder enabled, but hides it on his laptop so his folder list isn’t unnecessarily cluttered.

What these scenarios tell us is that we need a flexible mechanism to specify the folders we want to see and the folders we want synchronized. Additionally we want, in today’s world where we have multiple devices, to synchronize the selection of folders that are important to us. It is likely I’d like to see the calendar I have just enabled in Kontact also on my phone. However, we always want to keep the possibility to alter that default setting on specific devices.

Current State

If you’re using a Kolab Server, you can use IMAP subscriptions to control what folders you want to see on your devices. Kontact currently respects that setting in that it makes all folders visible and available for offline usage. Additionally you have local subscriptions to disable certain folders (so they are not downloaded or displayed) on a specific device. That is not very flexible though, and personally I ended up having pretty much all folders enabled that I ever used, leading to cluttered folder selections and lot’s of bandwith and storage space used to keep everything available offline.

To change the subscription state, KMail offers to open the IMAP-subscription dialog which allows to toggle the subscription state of individual folders. This works, but is not well integrated (it’s a separate dialog), and is also not well integrable since it’s IMAP specific.

Because the solution is not well integrated, it tends to be rather static in my experience. I tend to subscribe to all folders that I ever use, which results in a very long and cluttered folder-list.

A new integrated subscription system

What would be much better, is if the back-end could provide a default setting that is synchronized to the server, and we could quickly enable or disable folders as we require them. Additionally we can override the default settings for each individual folder to optimize our setup as required.

To make the system more flexible, while not unnecessarily complex, we need a per folder setting that allows to override a backend provided default value. Additionally we need an interface for applications to alter the subscription state through Akonadi (instead of bypassing it). This allows for a well integrated solution that doesn’t rely on a separate, IMAP-specific dialog.

Each folder requires the following settings:

  • An enabled/disabled state that provides the default value for synchronizing and displaying a folder.
  • An explicit preference to synchronize a folder.
  • An explicit preference to make a folder visible.

A folder is visible if:

  • There is an explicit preference that the folder is visible.
  • There is no explicit preference on visibility and the folder is enabled.

A folder is synchronized if:

  • There is an explicit preference that the folder is synchronized.
  • There is no explicit preference on synchronization and the folder is enabled.

The resource-backend can synchronize the enabled/disabled state which should give a default experience as expected. Additionally it is possible to override that default state using the explicit preference on a per folder level.

User Interaction

By default you would be working with the enabled/disabled state, that is synchronized by the resource backend. If you enable a folder it becomes visible and synchronized, if you disable it, it becomes invisible and not synchronized. For the enabled/disabled state we can build a very easy user interface, as it is a single boolean state, that we can integrate into the primary UI.

Because the enabled/disabled state is synchronized, an enabled calendar will automatically appear on your MyKolab.com web interface and your mobile. One click, and you’re all set.

Mockup of folder sync properties

Example mockup of folder sync properties

In the advanced settings, you can then override visibility and synchronization preference at will as a local-only setting, giving you full flexibility. This can be hidden in a properties dialog, so it doesn’t clutter the primary UI.

This makes the default usecase very simple to use (you either want a folder or you don’t want it), while we keep full flexibility in overriding the default behaviour.

IMAP Synchronization

The IMAP resource will synchronize the enabled/disabled state with IMAP subscriptions if you have subscriptions enabled in the resource. This way we can use the enabled/disabled state as interface to change the subscriptions, and don’t have to use a separate dialog to toggle that state.

Interaction with existing mechanisms

This mechanism can probably replace local subscriptions eventually. However, in order not to break existing setups I plan to leave local subscriptions working as they currently are.


By implementing this proposal we get the required flexibility to make sure the resources of our machine are optimally used, while different clients still interact with each other as expected. Additionally we gain a uniform interface to enable/disable a collection that can be synchronized by backends (e.g. using the IMAP subscription state). This will allow applications to nicely integrate this setting, and should therefore make this feature a lot easier to use and overall more agile.

New doors are opened as this will enable us to do on-demand loading of folders. By having the complete folder list available locally (but disabled by default and thus hidden), we can use the collections to load their content temporarily and on-demand. Want to quickly look at that shared calendar you don’t have enabled? Simply search for it and have a quick look, the data is synchronized on-demand and the folder is as quickly gone as you found it, once it is no longer required. This will diminish the requirement to have folders constantly clutter your folder list even further.

So, what do you think?

Posted in KDE, Kolab, Uncategorized | Tagged , , , , | 6 Comments

Kontact-Nepomuk Integration: Why data from akonadi is indexed in nepomuk

So Akonadi is already a “cache” for your PIM-data, and now we’re trying hard to feed all that data into a second “cache” called Nepomuk, just for some searching? We clearly must be crazy.

The process of keeping these to caches in sync is not entirely trivial, storing the data in Nepomuk is rather expensive, and obviously we’re duplicating all data. Rest assured we have our reasons though.

  • Akonadi handles the payload of items stored in it transparently, meaning it has no idea what it is actually caching (apart from some hints such as mimetypes). While that is a very good design decision (great flexibility), it has the drawback that we can’t really search for anything inside the payload (because we don’t know what we’re searching through, where to look, etc)
  • The solution to the searching problem is of course building an index, which is a cache of all data optimized for searching. It essentially structures the data in a way that content->item lookups become fast (while normal usage does this the other way round). So that  already means duplicating all your data (more or less), because we’re trading disk-space and memory for searching speed. And Nepomuk is what we’re using as index for that.

Now there would of course be simpler ways to build an index for searching than using Nepomuk, but Nepomuk provides way more opportunities than just a simple, textbased index, allowing us to build awesome features on top of it, while the latter would essentially be a dead end.

To build that cache we’re doing the following:

  • analyze all items in Akonadi
  • split them up into individual parts such as (for an email example): subject, plaintext content, email addresses, flags
  • store that separated data in Nepomuk in a structured way

This results in networks of data stored in Nepomuk:

PersonA [hasEMailAddress] addressA
PersonA [hasEMailAddress] addressB
emailA [hasSender] addressA
emailB [hasSender] addressB

So this “network” relates emails to email-addresses, and email-addresses to contacts, and contacts to actual persons, and suddenly you can ask the system for all emails from a person, no matter which of the person’s email-addresses have been used in the mails. Of course we can add to that IM conversations with the same Person, or documents you exchanged during that conversation, … the possibilities are almost endless.

Based on that information much more powerful interfaces can be written. For instance one could write a communication tool which doesn’t really care anymore which communication channel you’re using and dynamically mixes IM and email depending on whether/where the other person is currently available for a chat or would rather have a mail, which can be read later on, and doing so without splitting the conversation across various mail/chat interfaces.
This is of course just one example of many (neither am I claiming the idea, it’s just a nice example for what is possible).

So that’s basically why we took the difficult route for searching (At least that is why I am working on this).

Now, we’re not quite there yet, but we already start to get the first fruits of our labor;

  • KMail can now automatically complete addresses from all emails you have ever received
  • Filtering in KMail does fulltext searching, making it a lot easier to find old conversations
  • The kpeoples library already uses this data for contacts merging, which will result in a much nicer addressbook
  • And of course having the data available in Nepomuk enables other developers to start working with it

I’ll follow up on that post with some more technical background on how the feeders are working and possibly some information on the problematic areas from a client perspective (such as the address auto-completion in KMail).

Posted in KDE, Uncategorized | 31 Comments