Data-Source Registry (was: how to handle dataproviders per default)

Registered by Seif Lotfy

The Zeitgeist engine maintains a publicly available list of recognized data-sources (components inserting information into Zeitgeist). It also provides an option to disable individual data-sources.

The data-source registry of the Zeitgeist engine has DBus object path :const:`/org/gnome/zeitgeist/data_source_registry` under the bus name :const:`org.gnome.zeitgeist.DataSourceRegistry`.

The information available about the data-sources is the following:
 - Unique ID? (TBD, will be discussed in a separate bug report)
 - Name, indicated by the data-source
 - Description, indicated by the data-source
 - EventTemplates, set of templates for the sort of events it inserts. This is given by the data-source and is *optional* and purely *informational*.
 - Running, whether one or more instances of the data-source have registered themselves and are currently running
 - LastSeen, timestamp in msec of the last time the data-source performed an activity (RegisterDataSource or InsertEvents call)
 - Enabled, whether the data-source is allowed to insert data or the user has specifically disabled it
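
The fields above could be modelled roughly as follows. This is only an illustrative sketch and not the engine's actual data structure: the class names, attribute names and the `touch` helper are assumptions made for the example, and the still-undecided unique ID is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventTemplate:
    # Hypothetical, simplified template; None means "not specified",
    # since templates are optional and need not be complete.
    actor: Optional[str] = None
    interpretation: Optional[str] = None


@dataclass
class DataSource:
    # Hypothetical record mirroring the fields listed above.
    name: str                      # indicated by the data-source
    description: str               # indicated by the data-source
    event_templates: List[EventTemplate] = field(default_factory=list)
    running: bool = False          # at least one registered instance is running
    last_seen: int = 0             # timestamp in msec of the last activity
    enabled: bool = True           # False once the user has disabled it

    def touch(self, now_msec: int) -> None:
        """Record an activity (e.g. a RegisterDataSource or InsertEvents call)."""
        self.last_seen = now_msec
```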

In the future there will be a GUI or some other interface showing a list of all known data-sources and providing the option to disable them selectively.

As a side effect of having this registry, the EventTemplates, Running and Enabled information is available to recent.py (the GtkRecentlyUsed-powered data-source) and to any future data-sources (e.g. something GIO- or KIO-based). They can use it to choose which information to send and which would result in a duplicate, guessing from the actor and interpretation fields given in the templates*.

* Remember that it is legal for several data-sources to insert the same sort of information (a matching template doesn't necessarily mean the events are the same, and templates don't need to be complete). For practical purposes, though, this should be safe enough to fix the GtkRecentlyUsed case, provided it is documented properly so that data-source writers can take the problem into account.
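
A data-source such as recent.py could make that guess along these lines. This is a minimal sketch under stated assumptions, not the actual implementation: `Template`, `Source`, `template_matches` and `likely_duplicate` are hypothetical names, the actor strings are made up, and an unset template field is treated as "matches anything".

```python
from typing import NamedTuple, Optional, Sequence


class Template(NamedTuple):
    # Hypothetical simplified template; None means the field is unspecified.
    actor: Optional[str] = None
    interpretation: Optional[str] = None


class Source(NamedTuple):
    # Hypothetical view of a registry entry.
    name: str
    templates: Sequence[Template]
    running: bool
    enabled: bool


def template_matches(template: Template, actor: str, interpretation: str) -> bool:
    """An unspecified field matches any value (templates may be incomplete)."""
    return (template.actor in (None, actor)
            and template.interpretation in (None, interpretation))


def likely_duplicate(actor: str, interpretation: str,
                     sources: Sequence[Source], own_name: str) -> bool:
    """Guess whether another running, enabled data-source already covers
    this kind of event. A match is only a heuristic, not a guarantee."""
    for source in sources:
        if source.name == own_name or not (source.running and source.enabled):
            continue
        if any(template_matches(t, actor, interpretation)
               for t in source.templates):
            return True
    return False
```

A data-source that publishes no templates never causes a match here, so events are only skipped when another source has explicitly announced that it handles them.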

------------------------------------------------
OLD SUMMARY:

Currently one of the issues that is bugging a couple of us is how to deal with dataproviders and policies to avoid event duplication.

Let's start with a scenario where I have the gedit plugin installed:
1) I open a file with gedit (1 event) ====> gedit sends "open" (over dbus) and recentlyused sends "open" (2 events)
2) I modify and save the file (1 event) ====> gedit sends "modify" (over dbus) and so does recentlyused (2 events)
3) I close the file (1 event) ====> gedit sends "close" over dbus (1 event)

In real life there were only 3 events, however zeitgeist gets 5. This happens because gedit also reports to recentlyused, which is handled by our recentlyused manager.
Now as it turns out we can't get rid of recentlyused so soon, since some apps (archive manager) are not pluggable to send events to zeitgeist, yet they inform recentlyused.

So my approach for co-existence will be as follows.
This applies to dataproviders maintained by us onl...

1) On first run, datahub executes a script that installs a set of plugins for apps and enables them by default (each app uses a different way of enabling them, such as gconf etc.). We can have a dialog that asks which apps you want to cover with zeitgeist.
2) Create a blacklist from the installed plugins for apps. This blacklist is then used by zeitgeist-datahub's recentlyused manager to ignore those apps, to avoid duplication. This way, when a user turns off a plugin, we avoid logging from the app too.
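
Step 2 could look something like the sketch below. It is only an illustration of the idea: the function name, the dictionary layout of the recentlyused items and the application identifiers are all assumptions, not the datahub's real code.

```python
from typing import Dict, Iterable, List, Set


def filter_recent_items(items: Iterable[Dict[str, str]],
                        blacklist: Set[str]) -> List[Dict[str, str]]:
    """Drop recentlyused items coming from apps that have their own plugin.

    `items` is a hypothetical list of dicts with "app" and "uri" keys;
    `blacklist` holds the (hypothetical) names of apps covered by plugins.
    """
    return [item for item in items if item["app"] not in blacklist]


# Apps with an installed plugin are ignored by the recentlyused manager.
blacklist = {"gedit"}
items = [
    {"app": "gedit", "uri": "file:///tmp/notes.txt"},
    {"app": "file-roller", "uri": "file:///tmp/archive.zip"},
]
kept = filter_recent_items(items, blacklist)
```

With this input only the archive manager item is kept, so the gedit events are logged once, by the plugin.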

I really think this matter has to be tackled since it would be crucial for our "A Priori" algorithm, and would make it 100% reliable.
Let's discuss this here.

Blueprint information

Status:
Complete
Approver:
Seif Lotfy
Priority:
High
Drafter:
Siegfried Gevatter
Direction:
Needs approval
Assignee:
Siegfried Gevatter
Definition:
Approved
Series goal:
Accepted for 0.3
Implementation:
Implemented
Milestone target:
0.3.3
Started by
Siegfried Gevatter
Completed by
Siegfried Gevatter

Whiteboard

-----------------------------------------------------------------------------------
thekorn:
I don't get why we are talking about this over and over again. When we think we came to a conclusion it is always a matter of time before somebody brings this up *again*. However, IMO this is nothing we should discuss in a blueprint. If the current situation is a problem for users it is a bug, so we should discuss it in a bug report, and there is already bug 462894 which deals with this problem.
And if we still think this is an issue, we should try to solve it for all time.
If we would like to solve it (Mikkel convinced me some time ago that we should go with the current situation) then I see only one valid and possible solution: let certain clients claim exclusive communication (insert and/or query) with the zeitgeist daemon. This exclusive communication is defined by a set of templates and it is not the clients who check if they are allowed to do any kind of communication, it is the engine who does all the management. I started implementing this idea at lp:~thekorn/zeitgeist/exclusive_clients but did not finish it, because I don't think we need it.
-----------------------------------------------------------------------------------
Seif: @thekorn
If we want to go that way, then I would prefer manually hardcoding a block in our recentlyused manager for the dataproviders we wrote plugins for, and creating a dataprovider installer :)
Cheers
Seif
----------------------------
kamstrup: Going to an exclusive API sounds, excuse me being blunt, completely bonkers. Then we might as well run everything inside the same process and have no dbus API whatsoever. Exclusive access to the API defeats the entire purpose of dbus.

There are two ways to solve this. 1) By magical deduplication inside the engine, or 2) by defining this to be purely a configuration issue.

I claim that 1) is utterly impossible to do "right". By inserting some hacks in our current code we might get something that works right *for our very limited use so far*, but solving this problem in general will be a daunting task. Who says that duplicate events from a particularly slow source can't be ½s or even 2s apart?

And I actually think that 2) is not just the only choice, but also the right choice. It is up to the distros to package Zeitgeist in a way so that events are logged consistently. If distros decide to ship some bad setup, there is no amount of magical dupe detection that can save us.

So the question is: How do we solve this 100% inside the dataproviders package? The easiest thing is probably to not emit open/save events from the Gedit plugin, just close.


Work Items
