Kontact-Nepomuk Integration: Why data from Akonadi is indexed in Nepomuk

So Akonadi is already a “cache” for your PIM data, and now we’re trying hard to feed all that data into a second “cache” called Nepomuk, just for some searching? We clearly must be crazy.

The process of keeping these two caches in sync is not entirely trivial, storing the data in Nepomuk is rather expensive, and obviously we’re duplicating all the data. Rest assured, we have our reasons though.

  • Akonadi handles the payload of items stored in it transparently, meaning it has no idea what it is actually caching (apart from some hints such as mimetypes). While that is a very good design decision (great flexibility), it has the drawback that we can’t really search for anything inside the payload (because we don’t know what we’re searching through, where to look, etc.).
  • The solution to the searching problem is of course building an index, which is a cache of all data optimized for searching. It essentially structures the data so that content->item lookups become fast (while normal usage goes the other way round); a minimal sketch follows this list. That already means duplicating all your data (more or less), because we’re trading disk space and memory for search speed. And Nepomuk is what we’re using as the index for that.
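To make the content->item idea concrete, here is a minimal, purely illustrative sketch of an inverted index in plain C++. The items and terms are made up, and the real index of course lives in Nepomuk/Virtuoso, not in an in-memory map:

    // Toy inverted index: maps a term to the set of item ids containing it.
    #include <iostream>
    #include <map>
    #include <set>
    #include <sstream>
    #include <string>

    int main()
    {
        // Hypothetical items: id -> plain-text content.
        const std::map<int, std::string> items = {
            {1, "meeting tomorrow about the budget"},
            {2, "budget draft attached"},
            {3, "holiday photos"}
        };

        // Build the index once: term -> ids (trading disk space and memory for speed).
        std::map<std::string, std::set<int>> index;
        for (const auto &item : items) {
            std::istringstream words(item.second);
            std::string term;
            while (words >> term)
                index[term].insert(item.first);
        }

        // A content -> item lookup is now a single map access instead of a
        // full scan over every item's payload.
        for (int id : index["budget"])
            std::cout << "item " << id << " matches\n";
        return 0;
    }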

Now there would of course be simpler ways to build an index for searching than using Nepomuk, but Nepomuk provides way more opportunities than just a simple, text-based index, allowing us to build awesome features on top of it, whereas a plain text index would essentially be a dead end.

To build that cache we’re doing the following (a simplified sketch follows the list):

  • analyze all items in Akonadi
  • split them up into individual parts such as (for an email example): subject, plaintext content, email addresses, flags
  • store that separated data in Nepomuk in a structured way
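As a toy illustration of those three steps (this is not the actual feeder code, and all type and predicate names here are invented), splitting one email into individual statements could look roughly like this:

    // Illustrative only: how a feeder might split one email into individual
    // statements before handing them to Nepomuk.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Statement {
        std::string subject;   // the resource the statement is about
        std::string predicate; // e.g. "hasSubject", "hasSender"
        std::string object;    // a literal value or another resource
    };

    struct Email {
        std::string id;
        std::string subject;
        std::string plaintext;
        std::vector<std::string> senders;
        std::vector<std::string> flags;
    };

    // Step two of the list above: split the item into its individual parts.
    std::vector<Statement> analyze(const Email &mail)
    {
        std::vector<Statement> result;
        result.push_back({mail.id, "hasSubject", mail.subject});
        result.push_back({mail.id, "plainTextContent", mail.plaintext});
        for (const auto &address : mail.senders)
            result.push_back({mail.id, "hasSender", address});
        for (const auto &flag : mail.flags)
            result.push_back({mail.id, "hasFlag", flag});
        return result;
    }

    int main()
    {
        const Email mail{"emailA", "Budget", "see attachment", {"addressA"}, {"seen"}};
        // Step three would store these statements in Nepomuk; here we just print them.
        for (const auto &s : analyze(mail))
            std::cout << s.subject << " [" << s.predicate << "] " << s.object << "\n";
        return 0;
    }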

This results in networks of data stored in Nepomuk:

PersonA [hasEMailAddress] addressA
PersonA [hasEMailAddress] addressB
emailA [hasSender] addressA
emailB [hasSender] addressB

So this “network” relates emails to email addresses, email addresses to contacts, and contacts to actual persons, and suddenly you can ask the system for all emails from a person, no matter which of the person’s email addresses were used in the mails. Of course we can add to that IM conversations with the same person, or documents you exchanged during those conversations, … the possibilities are almost endless.
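To show what such a query boils down to, here is another purely illustrative sketch that answers “all emails sent by PersonA, no matter which address was used” over the toy triples above (the real query of course runs against Nepomuk, not an in-memory list):

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    // A single subject-predicate-object statement.
    struct Triple { std::string s, p, o; };

    int main()
    {
        const std::vector<Triple> graph = {
            {"PersonA", "hasEMailAddress", "addressA"},
            {"PersonA", "hasEMailAddress", "addressB"},
            {"emailA", "hasSender", "addressA"},
            {"emailB", "hasSender", "addressB"},
        };

        // 1. Collect all addresses belonging to PersonA.
        std::set<std::string> addresses;
        for (const auto &t : graph)
            if (t.s == "PersonA" && t.p == "hasEMailAddress")
                addresses.insert(t.o);

        // 2. Find every email whose sender is any of those addresses.
        for (const auto &t : graph)
            if (t.p == "hasSender" && addresses.count(t.o))
                std::cout << t.s << " was sent by PersonA\n";
        return 0;
    }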

Based on that information, much more powerful interfaces can be written. For instance, one could write a communication tool that no longer cares which communication channel you’re using: it dynamically mixes IM and email, depending on whether (and where) the other person is currently available for a chat or would rather get a mail to read later, without splitting the conversation across various mail and chat interfaces.
This is of course just one example of many (nor am I claiming the idea as my own; it’s just a nice example of what is possible).

So that’s basically why we took the difficult route for searching (at least, that is why I am working on this).

Now, we’re not quite there yet, but we’re already starting to reap the first fruits of our labor:

  • KMail can now automatically complete addresses from all emails you have ever received
  • Filtering in KMail does fulltext searching, making it a lot easier to find old conversations
  • The KPeople library already uses this data for contact merging, which will result in a much nicer addressbook
  • And of course having the data available in Nepomuk enables other developers to start working with it

I’ll follow up on this post with some more technical background on how the feeders work, and possibly some information on the problematic areas from a client perspective (such as the address auto-completion in KMail).

Author: cmollekopf

Christian Mollekopf is an open source software enthusiast with a special interest in personal organization tools. He started to contribute actively to KDE in 2008 and currently works for Kolab Systems, leading the development of the next-generation desktop client.

31 thoughts on “Kontact-Nepomuk Integration: Why data from Akonadi is indexed in Nepomuk”

    1. Heh, was just going to ask if you could take a look at my message to the KDE nepomuk list about the same issue. I don’t know what’s going on but the indexer is clearly upsetting the storage service.

      1. Yeah, very unfortunate indeed that we ended up with this in the final release. Unfortunately I can’t reproduce the issue so far, so I will have to look into that. If you could open a bug report and throw it in my direction, that would be great.

    1. You’re wrong and you’re right. Akonadi is not a store, it’s just a cache optimised for fast and standardised access to PIM data. The data is ‘stored’ on the IMAP server or maildir or whatever. Nepomuk does not store the data either, it stores an index to the data and a semantic model of the data. But you’re right, both the PIM item cache and the semantic index *could* be implemented in the same database process. The obstacles to that are the lack of a QtSQL Virtuoso plugin, and the database-specific bits for it in Akonadi.

      1. And fwiw, we’re already flogging the databases and their Qt APIs fairly hard* with lots of parallel large-scale access – perhaps having the functions split across two DBs does give us, ironically, some degree of protection from imperfect concurrent access implementations ;).
        * I had the chance to ask the SQLite author recently if it would be suitable for Akonadi’s use, and we established that it would be close to its limits.

    2. The purpose of akonadi is exactly to be able to deal with various storage backends in a uniform way. The typical storage backend is somewhere on a server though, and this is where akonadi is used to its full extent (i.e. offline IMAP access). While one could use Nepomuk as primary storage, that would severely limit how one can use the data across various machines (and implementing the synchronization for nepomuk would be reinventing akonadi).

      1. I think you all misunderstood the idea. AFAIA the idea is to use Nepomuk storage instead of vCard disk storage. Contact data (except photos, which could be stored on disk) is small enough to store directly in Nepomuk. And Akonadi would cache the Nepomuk data.

        1. That’s not the idea at all (IMO). While there exists the concept of storing all your data in one huge database in a structured way, I don’t think that idea is feasible/practical. There are just too many good reasons to store data elsewhere, and way too many formats, for us to expect to replace the native storage mechanism. On the other hand, some metadata (such as ratings) is currently stored solely in nepomuk, and where we draw the line between data that is only available in nepomuk and data that is written to a backend (possibly via akonadi) is an ongoing process. IMO only non-mission-critical data should end up in nepomuk only, and as much as possible should be written to the backend. I.e. I wouldn’t want my photos to be in nepomuk only, but if the ratings are, that’s fine with me (although if you rely heavily on that, you’d want it also to be somehow stored with the photos, e.g. for syncing and backup).

  1. beojan, I think that is what the blog post wants to teach us. akonadi is just there to make the whole thing more complex and debugging much harder …

    1. I don’t think that’s true. Akonadi tackles a difficult problem domain (synchronization), and while it certainly has its flaws, I think the concept is sound, and it’s just a matter of stabilization and fixing the remaining issues.

  2. beojan: Nepomuk isn’t a data store, it’s a store for data about data – relations and indices. Storing data and data about data is different because they have different workloads and performance profiles, so you need to optimize for them differently. Akonadi for example does things like using the database or the filesystem depending on the size of the object in question, in the hopes of achieving optimal speed for each case. This isn’t something Nepomuk should concern itself with, nor would it be easily done there. In effect, a unified “grand store” wouldn’t simplify things, it would just make a big bloated mess that would be harder to optimize than individual parts that concern themselves with more specific problems. It’s a trade-off between the syncing problem and this one.
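    As a rough illustration of that size-based decision (this is not the actual Akonadi code; the threshold and names are invented for the sketch):

        // Small payloads go into the database, large ones onto the filesystem.
        #include <cstddef>
        #include <iostream>
        #include <string>

        enum class Storage { Database, Filesystem };

        // Hypothetical cut-off; the real value is an Akonadi tuning decision.
        constexpr std::size_t kExternalPayloadThreshold = 4096;

        Storage chooseStorage(const std::string &payload)
        {
            return payload.size() < kExternalPayloadThreshold ? Storage::Database
                                                              : Storage::Filesystem;
        }

        int main()
        {
            const std::string smallNote(100, 'x');
            const std::string hugeAttachment(1000000, 'x');
            std::cout << (chooseStorage(smallNote) == Storage::Database) << "\n";        // prints 1
            std::cout << (chooseStorage(hugeAttachment) == Storage::Filesystem) << "\n"; // prints 1
            return 0;
        }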

    On a side note, I finally switched to KMail with 4.10 and I’m loving it. The last PIM sprint did absolute _wonders_ for connected IMAP!

  3. I have a dream. One day Akonadi and Nepomuk will be removed from KDE and I can return to what was once my favourite desktop. A+N did not only make Kontact unusable, virtually all KDE programs now have a package dependency on them. Disabling did not work because every mouseclick can start virtuoso, mysql, soprano and other performance hogs. So I needed to find a replacement for all the great KDE apps. Even if the semantic desktop is bug free one day, I do not want to have it. The fact that the European Union sponsored its development makes it even more suspicious since Nepomuk and Akonadi are technically spyware components. Am I the only one who thinks that people with a hidden agenda managed to gain influence at KDE?

    1. “The fact that the European Union sponsored its development makes it even more suspicious since Nepomuk and Akonadi are technically spyware components.”

      Bong! tin foil hat alert!

  4. As I have always said, the big problem with Nepomuk is that you first pushed it into everybody’s desktops eating resources, and *then* you added useful features that people can actually benefit from. You already dug your grave by doing that. You won’t remove the negative impression everybody got.

    1. We said that about the early Plasma/Applications releases, too, yet here we are. Progress doesn’t come from working in a black box for years. You need testers, you need users to guide your development, and you don’t get that by working in secret.

    2. History teaches us how to remove a negative impression:
      1) make it almost work
      2) change its name from “Vista” to “7”
      At least that worked very well in at least one case 😉

    3. Sure, as I said, maybe this has been a mistake, or a necessary push for the technology. It would have been way more effort for the developers to shield the users properly from that development process. I don’t think we dug our grave though; it’s now about looking forward and improving the situation, not about lamenting what has already happened.

    4. As a recent arrival to KDE (4.9 was my first experience), I really like the potential presented in Nepomuk. It may not be perfect yet but I like what it offers (fairly consistently on my install) today, especially for music and image files.

      My point being, as KDE gains in popularity and new users discover its value, they arrive at KDE without the negative impression.

      — Smittie

  5. Apart from the utility of using a service already optimised for indexing and searching (nepomuk), as opposed to caching (akonadi), I always presumed the main reason for using nepomuk as the indexer was to make kmail resources available to nepomuk and the semantic desktop in general.

    When it’s working (hah!) I do find it useful to be able to search related items, including messages and contacts, via krunner etc.

  6. If Akonadi is only a cache… that explains at last why PIM is nearly unusable in offline mode with disconnected IMAP. Can Akonadi be forced to sync all mails?

  7. What’s really unfortunate: most IMAP servers provide an index for server-side search. However, since akonadi/nepomuk were invented, it doesn’t seem to be possible to use server-side search in kmail any more. And since nepomuk indexing has its issues all the time, using kmail has become a real pain. Why not let the user choose, so we can keep on using the well-established and proven server-side search?

  8. So, are Akonadi and Nepomuk so great when my emails won’t move properly in KMail, displaying mails takes 10 minutes AND I get millions of these messages in the output:

    akonadi_nepomuk_feeder(10898) ItemQueue::removeDataResult: "No such object path '/datamanagement'"
    akonadi_nepomuk_feeder(10898) ItemQueue::batchJobResult: Error while storing graph: "No such object path '/datamanagement'"
    akonadi_nepomuk_feeder(24877) ItemQueue::removeDataResult: "The name org.kde.nepomuk.DataManagement was not provided by any .service files"
    akonadi_nepomuk_feeder(24877) ItemQueue::batchJobResult: Error while storing graph: "The name org.kde.nepomuk.DataManagement was not provided by any .service files"
    akonadi_nepomuk_feeder(23034) ItemQueue::removeDataResult: "The name org.kde.nepomuk.DataManagement was not provided by any .service files"
    akonadi_nepomuk_feeder(23034) ItemQueue::batchJobResult: Error while storing graph: "The name org.kde.nepomuk.DataManagement was not provided by any .service files"
    [… the same lines repeat endlessly …]

  9. I am running the following command in a cron job:

    06 * * * * /usr/bin/akonadictl stop 2>&1

    My firewall logs show that every time this command is executed, the PC running the cron job attempts to contact 198.105.254.114 using SPT=56445 DPT=512

    Why does the running of “akonadictl stop” via cron initiate outgoing traffic to a remote site?

  10. I might be slightly more appreciative of these applications(?) if they had names that meant something, so I could remember what the heck they’re doing on my system besides being irritating in a variety of ways.

    Is the below normal?

    1 S apb 2619 2383 0 80 0 - 76212 poll_s Oct16 ? 00:00:01 /usr/bin/akonaditray -session 1023728d2901e9000138861115400000016770008_1413481012_589210
    0 S apb 2644 2383 0 80 0 - 75994 poll_s Oct16 ? 00:00:05 /usr/bin/akonadi_control
    0 S apb 2647 2644 0 80 0 - 367797 poll_s Oct16 ? 00:00:13 akonadiserver
    0 S apb 2649 2647 0 80 0 - 310954 poll_s Oct16 ? 00:07:36 /usr/sbin/mysqld --defaults-file=/home/apb/.local/share/akonadi/mysql.conf --datadir=/home/apb/.local/share/akonadi/db_data/ --socket=/tmp/akonadi-apb.F0ucrZ/mysql.socket
    0 S apb 2907 2644 0 80 0 - 83373 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_akonotes_resource akonadi_akonotes_resource_0
    0 S apb 2908 2644 0 80 0 - 83373 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_akonotes_resource akonadi_akonotes_resource_1
    0 S apb 2909 2644 0 80 0 - 83922 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_akonotes_resource akonadi_akonotes_resource_2
    0 S apb 2910 2644 0 80 0 - 159068 poll_s Oct16 ? 00:00:02 /usr/bin/akonadi_archivemail_agent --identifier akonadi_archivemail_agent
    0 S apb 2917 2644 0 99 19 - 85062 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_baloo_indexer --identifier akonadi_baloo_indexer
    0 S apb 2918 2644 0 80 0 - 81225 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_contacts_resource akonadi_contacts_resource_0
    0 S apb 2919 2644 0 80 0 - 81989 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_ical_resource akonadi_ical_resource_0
    0 S apb 2920 2644 0 80 0 - 85957 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_kalarm_resource akonadi_kalarm_resource_0
    0 S apb 2921 2644 0 80 0 - 85943 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_kalarm_resource akonadi_kalarm_resource_1
    0 S apb 2922 2644 0 80 0 - 85876 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_kalarm_resource akonadi_kalarm_resource_2
    0 S apb 2923 2644 0 80 0 - 83393 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_agent_launcher akonadi_maildir_resource akonadi_maildir_resource_0
    0 S apb 2924 2644 0 80 0 - 86592 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_maildispatcher_agent --identifier akonadi_maildispatcher_agent
    0 S apb 2935 2644 0 80 0 - 159067 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_mailfilter_agent --identifier akonadi_mailfilter_agent
    0 S apb 2964 2644 0 80 0 - 77454 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_migration_agent --identifier akonadi_migration_agent
    0 S apb 3001 2644 0 80 0 - 99065 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_newmailnotifier_agent --identifier akonadi_newmailnotifier_agent
    0 S apb 3002 2644 0 80 0 - 136412 poll_s Oct16 ? 00:00:07 /usr/bin/akonadi_notes_agent --identifier akonadi_notes_agent
    0 S apb 3005 2644 0 80 0 - 147635 poll_s Oct16 ? 00:00:01 /usr/bin/akonadi_sendlater_agent --identifier akonadi_sendlater_agent
