Kontact-Nepomuk Integration: Why data from Akonadi is indexed in Nepomuk

So Akonadi is already a “cache” for your PIM data, and now we’re trying hard to feed all that data into a second “cache” called Nepomuk, just for some searching? We clearly must be crazy.

The process of keeping these two caches in sync is not entirely trivial, storing the data in Nepomuk is rather expensive, and obviously we’re duplicating all the data. Rest assured, we have our reasons though.

  • Akonadi handles the payload of items stored in it transparently, meaning it has no idea what it is actually caching (apart from some hints such as MIME types). While that is a very good design decision (great flexibility), it has the drawback that we can’t really search for anything inside the payload, because we don’t know what we’re searching through or where to look.
  • The solution to the searching problem is of course building an index, which is a cache of all data optimized for searching. It essentially structures the data in a way that content->item lookups become fast (normal usage does the lookup the other way round). That already means duplicating all your data (more or less), because we’re trading disk space and memory for search speed, and Nepomuk is what we’re using as the index for that (see the sketch right below this list).
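
To make the space-for-speed trade-off concrete, here is a minimal, purely illustrative sketch of such a content->item inversion in Python. It is not how Nepomuk or the feeders are implemented, and the item texts are made up:

    from collections import defaultdict

    # A toy item store: item id -> payload text (what Akonadi caches opaquely).
    items = {
        1: "Meeting notes about the Akonadi feeder",
        2: "Lunch next week?",
        3: "Feeder bug report and meeting follow-up",
    }

    # The index inverts that relation: word -> set of item ids (content -> item lookup).
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in text.lower().split():
            index[word].add(item_id)

    # Searching is now a cheap lookup instead of scanning every payload.
    print(sorted(index["meeting"]))  # -> [1, 3]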

Now there would of course be simpler ways to build an index for searching than using Nepomuk, but Nepomuk provides far more opportunities than a simple text-based index, allowing us to build awesome features on top of it, whereas a plain index would essentially be a dead end.

To build that cache we’re doing the following (a rough sketch of the splitting step follows the list):

  • analyze all items in Akonadi
  • split them up into individual parts such as (for an email example): subject, plaintext content, email addresses, flags
  • store that separated data in Nepomuk in a structured way
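
As a rough, hypothetical illustration of the splitting step, the fragment below pulls those kinds of parts out of a raw mail using Python’s standard email module. The field names are invented and do not reflect the actual feeder code or the Nepomuk ontologies:

    from email import message_from_string
    from email.utils import getaddresses

    raw = (
        "From: Person A <persona@example.org>\n"
        "To: Person B <personb@example.org>\n"
        "Subject: Feeder status\n"
        "\n"
        "Plain-text body goes here.\n"
    )

    msg = message_from_string(raw)

    # Split the opaque payload into the individual parts we want to index.
    parts = {
        "subject": msg["Subject"],
        "addresses": [addr for _, addr in getaddresses(msg.get_all("From", []) + msg.get_all("To", []))],
        "plaintext": msg.get_payload(),
        "flags": ["seen"],  # flags would come from Akonadi metadata; hard-coded here
    }
    print(parts["subject"], parts["addresses"])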

This results in networks of data stored in Nepomuk:

PersonA [hasEMailAddress] addressA
PersonA [hasEMailAddress] addressB
emailA [hasSender] addressA
emailB [hasSender] addressB

So this “network” relates emails to email addresses, email addresses to contacts, and contacts to actual persons, and suddenly you can ask the system for all emails from a person, no matter which of the person’s email addresses were used in the mails. Of course we can add to that IM conversations with the same person, or documents you exchanged during such a conversation, … the possibilities are almost endless.
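
That lookup can be illustrated with a toy in-memory “triple store” in Python, using exactly the example statements above. Real Nepomuk stores RDF and is queried through its own APIs, so take this purely as a sketch of the idea:

    # A tiny in-memory "triple store": (subject, predicate, object) statements,
    # mirroring the example relations from the post.
    triples = [
        ("PersonA", "hasEMailAddress", "addressA"),
        ("PersonA", "hasEMailAddress", "addressB"),
        ("emailA", "hasSender", "addressA"),
        ("emailB", "hasSender", "addressB"),
    ]

    def emails_from(person):
        # Follow the relations: person -> all known addresses -> all mails sent from them.
        addresses = {o for s, p, o in triples if s == person and p == "hasEMailAddress"}
        return sorted(s for s, p, o in triples if p == "hasSender" and o in addresses)

    print(emails_from("PersonA"))  # -> ['emailA', 'emailB']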

Based on that information, much more powerful interfaces can be written. For instance, one could write a communication tool that no longer cares which communication channel you’re using and dynamically mixes IM and email, depending on whether and where the other person is currently available for a chat or would rather receive a mail to read later on, all without splitting the conversation across various mail/chat interfaces.
This is of course just one example of many (nor am I claiming the idea as mine; it’s just a nice example of what is possible).

So that’s basically why we took the difficult route for searching (at least that is why I am working on this).

Now, we’re not quite there yet, but we’re already starting to see the first fruits of our labor:

  • KMail can now automatically complete addresses from all emails you have ever received (see the small completion sketch after this list)
  • Filtering in KMail does fulltext searching, making it a lot easier to find old conversations
  • The KPeople library already uses this data for contact merging, which will result in a much nicer address book
  • And of course having the data available in Nepomuk enables other developers to start working with it
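
Conceptually, the address completion boils down to a prefix lookup over all addresses the indexer has ever seen. A hypothetical sketch (not the KMail/Nepomuk implementation, and with made-up addresses):

    import bisect

    # Addresses harvested from previously received mail (what the indexer makes queryable).
    known_addresses = sorted([
        "alice@example.org",
        "alicia@example.net",
        "bob@example.org",
    ])

    def complete(prefix):
        # Binary search for the first candidate, then collect all matches with that prefix.
        start = bisect.bisect_left(known_addresses, prefix)
        results = []
        for addr in known_addresses[start:]:
            if not addr.startswith(prefix):
                break
            results.append(addr)
        return results

    print(complete("ali"))  # -> ['alice@example.org', 'alicia@example.net']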

I’ll follow up on this post with some more technical background on how the feeders work, and possibly some information on the problematic areas from a client perspective (such as the address auto-completion in KMail).


30 Responses to Kontact-Nepomuk Integration: Why data from Akonadi is indexed in Nepomuk

  1. Kurt K. says:

    Akonadi and Nepomuk are both nice technologies. Too bad that a completely broken version of the feeder was released with 4.10.0: http://lists.opensuse.org/opensuse-kde/2013-02/msg00041.html
    I hope 4.10.1 will fix that. Disabling the mail indexer is currently the only way… :(

    • Heh, I was just going to ask if you could take a look at my message to the KDE Nepomuk list about the same issue. I don’t know what’s going on, but the indexer is clearly upsetting the storage service.

      • cmollekopf says:

        Yeah, very unfortunate indeed that we ended up with this in the final release. I can’t reproduce the issue so far, so I will have to look into that. If you could open a bug report and throw it in my direction, that would be great.

  2. beojan says:

    Why not use nepomuk as the primary storage backend for akonadi so the data is only stored once?

    • You’re wrong and you’re right. Akonadi is not a store, it’s just a cache optimised for fast and standardised access to PIM data. The data is ‘stored’ on the IMAP server or maildir or whatever. Nepomuk does not store the data either; it stores an index to the data and a semantic model of the data. But you’re right, both the PIM item cache and the semantic index *could* be implemented in the same database process. The obstacles to that are the lack of a QtSQL Virtuoso plugin, and the database-specific bits for it in Akonadi.

      • And fwiw, we’re already flogging the databases and their Qt APIs fairly hard* with lots of parallel large-scale access – perhaps having the functions split across two DBs does give us, ironically, some degree of protection from imperfect concurrent access implementations ;).
        * I had the chance to ask the SQLite author recently if it would be suitable for Akonadi’s use, and we established that it would be close to its limits.

    • cmollekopf says:

      The purpose of Akonadi is exactly to be able to deal with various storage backends in a uniform way. The typical storage backend is somewhere on a server though, and this is where Akonadi is used to its full extent (i.e. offline IMAP access). While one could use Nepomuk as the primary storage, that would severely limit how one can use the data across various machines (and implementing the synchronization for Nepomuk would be reinventing Akonadi).

      • Govik says:

        I think you all misunderstood the idea. AFAIA the idea is to use Nepomuk storage instead of vCard disk storage. Contact data (except photos, which could be stored on disk) are small enough to store directly in Nepomuk. And Akonadi would cache the Nepomuk data.

        • cmollekopf says:

          That’s not the idea at all (IMO). While there is the concept of storing all your data in one huge database in a structured way, I don’t think that idea is feasible/practical. There are just too many good reasons to store data elsewhere, and way too many formats, for us to expect Nepomuk to replace the native storage mechanism. On the other hand, some metadata (such as ratings) is currently stored solely in Nepomuk, and where we draw the line between data that is only available in Nepomuk and data that is written to a backend (possibly via Akonadi) is an ongoing discussion. IMO only non-mission-critical data should end up in Nepomuk only, and as much as possible should be written to the backend. I.e. I wouldn’t want my photos to be in Nepomuk only, but if the ratings are, that’s fine with me (although if you rely heavily on that, you’d want those to also be stored with the photos somehow, e.g. for syncing and backup).

  3. Andreas Schneider says:

    beojan, I think that is what the blog post wants to teach us: Akonadi is just there to make the whole thing more complex and debugging much harder …

    • cmollekopf says:

      I don’t think that’s true. Akonadi tackles a difficult problem domain (synchronization), and while it certainly has its flaws, I think the concept is sound, and it’s just a matter of stabilization and fixing the remaining issues.

  4. Eike Hein says:

    beojan: Nepomuk isn’t a data store, it’s a store for data about data – relations and indices. Storing data and storing data about data are different because they have different workloads and performance profiles, so you need to optimize for them differently. Akonadi for example does things like using the database or the filesystem depending on the size of the object in question, in the hope of achieving optimal speed for each case. This isn’t something Nepomuk should concern itself with, or that would be easily done there. In effect, a unified “grand store” wouldn’t simplify things; it would just make a big bloated mess that would be harder to optimize than individual parts that concern themselves with more specific problems. It’s a trade-off between the syncing problem and this one.
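
    A schematic sketch of that size-based routing idea (the threshold and names here are invented for illustration; this is not Akonadi’s actual code):

        import hashlib
        from pathlib import Path

        SIZE_THRESHOLD = 4096   # invented cut-off: small payloads go to the DB, big ones to files
        db = {}                 # stand-in for a database table: item id -> payload bytes
        blob_dir = Path("/tmp/akonadi-like-blobs")

        def store_payload(item_id, payload: bytes):
            """Route a payload to the database or the filesystem depending on its size."""
            if len(payload) <= SIZE_THRESHOLD:
                db[item_id] = payload
                return ("db", item_id)
            blob_dir.mkdir(parents=True, exist_ok=True)
            path = blob_dir / hashlib.sha1(payload).hexdigest()
            path.write_bytes(payload)
            return ("file", str(path))

        print(store_payload(1, b"a short vCard"))
        print(store_payload(2, b"x" * 100000))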

    On a side note, I finally switched to KMail with 4.10 and I’m loving it. The last PIM sprint did absolute _wonders_ for connected IMAP!

  5. Roland Wolf says:

    I have a dream. One day Akonadi and Nepomuk will be removed from KDE and I can return to what was once my favourite desktop. Not only did A+N make Kontact unusable, virtually all KDE programs now have a package dependency on them. Disabling did not work because every mouse click can start virtuoso, mysql, soprano and other performance hogs. So I needed to find a replacement for all the great KDE apps. Even if the semantic desktop is bug-free one day, I do not want to have it. The fact that the European Union sponsored its development makes it even more suspicious, since Nepomuk and Akonadi are technically spyware components. Am I the only one who thinks that people with a hidden agenda managed to gain influence at KDE?

  6. Nicolas says:

    As I have always said, the big problem with Nepomuk is that you first pushed it onto everybody’s desktops, eating resources, and *then* you added useful features that people can actually benefit from. You already dug your grave by doing that. You won’t remove the negative impression everybody got.

    • We said that about the early Plasma/Applications releases, too, yet here we are. Progress doesn’t come from working in a black box for years. You need testers, you need users to guide your development, and you don’t get that by working in secret.

    • federico says:

      History teaches us how to remove a negative impression:
      1) make it almost work
      2) change its name from “Vista” to “7”
      That worked very well in at least one case ;)

    • cmollekopf says:

      Sure, as I said, maybe this has been a mistake, or a necessary push for the technology; it would have been way more effort for the developers to properly shield the users from that development process. I don’t think we dug our grave though; it’s now about looking forward and improving the situation, not about lamenting what has already happened.

    • Smittie says:

      As a recent arrival to KDE (4.9 was my first experience), I really like the potential presented in Nepomuk. It may not be perfect yet, but I like what it offers (fairly consistently on my install) today, especially for music and image files.

      My point being, as KDE gains in popularity and new users discover its value, they arrive at KDE without the negative impression.

      — Smittie

  7. Blackpaw says:

    Apart from the utility of using a service already optimised for indexing and searching (Nepomuk), as opposed to caching (Akonadi), I always presumed the main reason for using Nepomuk as the indexer was to make KMail resources available to Nepomuk and the semantic desktop in general.

    When it’s working (hah!) I do find it useful to be able to search related items, including messages and contacts, via KRunner etc.

  8. Doc.B says:

    If Akonadi is only a cache… that explains at last why PIM is nearly unusable in offline mode with disconnected IMAP. Can Akonadi be forced to sync all mails?

  9. Sascha says:

    What’s really unfortunate: most IMAP servers provide an index for server-side search. However, since Akonadi/Nepomuk were invented, it no longer seems to be possible to use the server-side search in KMail. And since at the same time Nepomuk indexing has its issues all the time, using KMail has become a real pain. Why not let the user choose to keep on using the well-established and proven server-side search?

  10. Smittie says:

    A somewhat tangential question. Can I completely remove Akonadi (Kontact, KMail and all of the Akonadi stuff) and keep Nepomuk?

  11. Joe says:

    So, are Akonadi and Nepomuk so great when my emails won’t move properly in KMail, displaying mails takes 10 minutes AND I get millions of these messages in the output:

    akonadi_nepomuk_feeder(10898) ItemQueue::removeDataResult: “No such object path ‘/datamanagement'”
    akonadi_nepomuk_feeder(10898) ItemQueue::batchJobResult: Error while storing graph: “No such object path ‘/datamanagement'”
    akonadi_nepomuk_feeder(24877) ItemQueue::removeDataResult: “The name org.kde.nepomuk.DataManagement was not provided by any .service files”
    akonadi_nepomuk_feeder(24877) ItemQueue::batchJobResult: Error while storing graph: “The name org.kde.nepomuk.DataManagement was not provided by any .service files”
    akonadi_nepomuk_feeder(23034) ItemQueue::removeDataResult: “The name org.kde.nepomuk.DataManagement was not provided by any .service files”
    akonadi_nepomuk_feeder(23034) ItemQueue::batchJobResult: Error while storing graph: “The name org.kde.nepomuk.DataManagement was not provided by any .service files”
    […the same messages repeat many more times…]

  12. Pingback: Semantic Desktop: Akonadi and Nepomuk | Muktware

  13. security conscious says:

    I am running the following command in a cron job:

    06 * * * * /usr/bin/akonadictl stop 2>&1

    My firewall logs show that every time this command is executed, the PC running the cron job attempts to contact 198.105.254.114 using SPT=56445 DPT=512

    Why does the running of “akonadictl stop” via cron initiate outgoing traffic to a remote site?
