Large requests increase memory usage considerably

Bug #624310 reported by Michal Hruby
This bug affects 8 people
Affects               Status    Importance  Assigned to  Milestone
Zeitgeist Extensions  Invalid   Undecided   Unassigned
Zeitgeist Framework   Invalid   Low         Unassigned
zeitgeist (Ubuntu)    Invalid   Low         Unassigned

Bug Description

I'm seeing with standalone Sezen that after running it, memory usage of the zeitgeist-daemon process goes up from ~13MB to ~40MB. This is understandable: when Sezen starts, it issues one big query asking for everything grouped by most recent subjects, which in my case returns ~11 thousand events, so the extra 30MB can be explained by the memory allocated for the DBus reply.

Still, my question is whether Zeitgeist should be at the mercy of applications, since nothing prevents them from spiking the memory usage of the core process. (I have already seen zeitgeist using 80-100MB of memory on my system a couple of times.) Perhaps there's a way to tell python-dbus to free its buffers?

Seif Lotfy (seif)
Changed in zeitgeist:
status: New → Confirmed
importance: Undecided → High
milestone: none → 0.5.1
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

I'd actually guess that the memory was allocated inside sqlite rather than python-dbus, but that's hard to tell without really profiling it.

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Come to think of it - the fact that Sezen requests 11k events also suggests that there is something wrong in the design scope of either Zeitgeist or Sezen.

Before we lose ourselves in an optimization and cache-pruning spree (which may very well consume a lot of development time for some very small gains), I think we should figure out why it needs 11k events and whether ZG needs some API additions or Sezen needs fixing.

Revision history for this message
Seif Lotfy (seif) wrote :

Markus is already working on a profiling tool for this...
https://code.edge.launchpad.net/~zeitgeist-extensions/zeitgeist-extensions/trunk

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.5.1 → 0.6
importance: High → Undecided
Revision history for this message
Seif Lotfy (seif) wrote :

Thinking about it, I somehow remember the iterator idea. Maybe this is something we should really look at.
An extension keeping an extra cursor for each iterator could be an idea, with:
1) get_iterator(time_range, event_template, result_type, transaction_size), which returns an iterator object with next() and previous()
2) release_iterator(iterator_id)
This approach could be very useful for Sezen and Unity, where we wouldn't need to execute big queries all at once, but rather in chunks and on demand (see the sketch below).
Don't bash me for trying.
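
A purely illustrative sketch of what such an extension could look like (all names, including the offset/limit-aware engine call, are hypothetical; nothing like this exists in Zeitgeist today):

  class QueryIterator(object):
      """Keeps a server-side cursor over one query and hands out fixed-size chunks."""

      def __init__(self, engine, time_range, event_template, result_type, transaction_size):
          self._engine = engine
          self._query = (time_range, event_template, result_type)
          self._size = transaction_size
          self._offset = 0

      def next(self):
          time_range, template, result_type = self._query
          # assumes a hypothetical engine query with offset/limit support
          chunk = self._engine.find_events(time_range, template, result_type,
                                           offset=self._offset, limit=self._size)
          self._offset += self._size
          return chunk

      def previous(self):
          self._offset = max(0, self._offset - 2 * self._size)
          return self.next()


  class IteratorExtension(object):
      """get_iterator()/release_iterator(), as sketched in the comment above."""

      def __init__(self, engine):
          self._engine = engine
          self._iterators = {}
          self._next_id = 0

      def get_iterator(self, time_range, event_template, result_type, transaction_size):
          self._next_id += 1
          self._iterators[self._next_id] = QueryIterator(
              self._engine, time_range, event_template, result_type, transaction_size)
          return self._next_id

      def release_iterator(self, iterator_id):
          self._iterators.pop(iterator_id, None)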

Revision history for this message
Siegfried Gevatter (rainct) wrote : Re: [Bug 624310] Re: Large requests increase memory usage considerably

Seif, that's what we have FindEventIds for.

Revision history for this message
Seif Lotfy (seif) wrote :

I know, yet the iteration is then handled from the client side... I am
suggesting a way to do it completely from the engine side :)


Revision history for this message
Markus Korn (thekorn) wrote :

As far as I remember, the main reason why we dropped this iterator idea during our API redesign efforts a year ago was that there is no easy (and performant) way to do batched database queries. The problem is that we don't have a 'stable' order of events: just think of a query which returns the last inserted events; if new events are inserted between requesting one page and the next, how should we handle them?
Client-side batching, by using FindEventIds, slicing over the result and getting the actual event objects on demand, seems a much more reasonable approach (see the sketch below).
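
A rough sketch of that client-side pattern, assuming a client object exposing FindEventIds/GetEvents-style calls (the method names here are illustrative, not the exact zeitgeist.client API):

  BATCH_SIZE = 200

  def iter_events(client, time_range, templates, batch_size=BATCH_SIZE):
      """Fetch all matching ids up front (cheap), then resolve the full Event
      objects lazily, one slice at a time, so the daemon never has to build
      one huge reply."""
      event_ids = client.find_event_ids(time_range, templates)
      for start in range(0, len(event_ids), batch_size):
          chunk = event_ids[start:start + batch_size]
          for event in client.get_events(chunk):
              yield event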

Revision history for this message
Seif Lotfy (seif) wrote :

OK, now it all makes sense again. Sorry guys.
On another note, I found a nifty tool for memory profiling:
http://guppy-pe.sourceforge.net/
Let's see what we can do with that.
Cheers
Seif
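
For reference, a minimal way to use guppy/heapy from inside the daemon process (e.g. from a throwaway debug extension) would be something like:

  from guppy import hpy

  hp = hpy()
  hp.setrelheap()     # only count objects allocated after this point
  # ... run the expensive find_events() query here ...
  print(hp.heap())    # per-type breakdown of which objects hold the memory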


Revision history for this message
Markus Korn (thekorn) wrote :

I've made some tests tonight, on an activity log with 18k random web history events.
The in-memory size of the daemon is about 7MiB on startup.
After running
  FindEvents(TimeRange.until_now(), [], StorageState.Any, 0, 1)
(the same query Sezen uses *over and over again*)
the daemon's memory consumption grows to ~38MiB.
After this test I used a pure ZeitgeistEngine instance and ran
  engine.find_events(TimeRange.until_now(), [], StorageState.Any, 0, 1)
on the same database (without any DBus interaction), and I'm observing the same memory consumption boost.

Conclusion: DBus is doing well, the buffers are freed; it must be something else.
Next step: find a way to track sqlite3_memory_used() from Python (or any other way to measure how much memory sqlite3 is using), as sketched below.
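
One possible way to read sqlite's own counters from Python is ctypes (the shared library name is an assumption and may differ per system; the numbers are only meaningful if it resolves to the same libsqlite3 the daemon's sqlite3 module is linked against):

  import ctypes

  libsqlite = ctypes.CDLL("libsqlite3.so.0")
  libsqlite.sqlite3_memory_used.restype = ctypes.c_int64
  libsqlite.sqlite3_memory_highwater.restype = ctypes.c_int64

  print("sqlite currently holds %d bytes" % libsqlite.sqlite3_memory_used())
  print("sqlite peak usage so far: %d bytes" % libsqlite.sqlite3_memory_highwater(0))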

Revision history for this message
Seif Lotfy (seif) wrote :

OK, I don't think this issue will be tackled for 0.6 now, so postponing to 0.7.

Changed in zeitgeist:
milestone: 0.6 → 0.7
Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.7.0 → none
Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.8.0
Revision history for this message
Nicolás Abel Carbone (nicocarbone) wrote :

Is there a possibility that this bug is related to this one in Unity? https://bugs.launchpad.net/ubuntu/+source/zeitgeist/+bug/757727

If so, is it being worked on?

Revision history for this message
Seif Lotfy (seif) wrote :

So, using Fedora 32-bit, Zeitgeist consumes about 90 MB after a few hours.
I am using Synapse and GNOME Shell.

Revision history for this message
Seif Lotfy (seif) wrote :

I had a brief discussion with the guys on #python
---
<seiflotfy> can u explain to me why inheriting from object with __slots__ uses less memory than inheriting from list? make it for dummies so i can explain it to my team

<verte> seiflotfy: do you know what a struct looks like in memory?

<seiflotfy> verte, nope

<verte> seiflotfy: __slots__ is exactly that. on some implementations of python, the object literally has one pointer for each field. there's no dict hanging about to store attributes.

<seiflotfy> so the statement is true then ?

<nedbat> seiflotfy: the statement about __slots__ reducing memory? Yes, that was a major goal of __slots__.

<verte> seiflotfy: oh, yes. if you need methods on what would otherwise be a tuple, and you expect to have lots of those objects, __slots__ may be a good idea.

Revision history for this message
Seif Lotfy (seif) wrote :

Based on what I posted above, I noticed that in our datamodel.py both Event and Subject inherit from list, which could cause the excessive memory usage. I think we should look into re-implementing them as plain objects, with all the necessary additions like __getitem__ and __setitem__ to avoid breaking the API/ABI. A rough comparison is sketched below.
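
A back-of-the-envelope check of the idea, with plain stand-in classes rather than the real datamodel (field names are placeholders):

  import sys

  class SubjectAsList(list):
      """Current style: the class subclasses list, so instances also carry a __dict__."""
      pass

  class SubjectSlots(object):
      """Proposed style: a fixed set of slots, no per-instance __dict__."""
      __slots__ = ("uri", "interpretation", "manifestation", "mimetype", "text")

      def __init__(self, uri, interpretation, manifestation, mimetype, text):
          self.uri = uri
          self.interpretation = interpretation
          self.manifestation = manifestation
          self.mimetype = mimetype
          self.text = text

  a = SubjectAsList(["file:///x", "doc", "file", "text/plain", "x"])
  b = SubjectSlots("file:///x", "doc", "file", "text/plain", "x")

  # The list subclass pays for the list storage plus an (empty) attribute dict
  # per instance; the __slots__ object stores exactly five pointers.
  print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))
  print(sys.getsizeof(b))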

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Before we freak out about __slots__ and structs, I think we need hard evidence that we lose all this memory to Events and Subjects. Attaching some profiler to the Python VM should sort this out quickly. And why doesn't Ubuntu show the same symptoms?

Changed in zeitgeist:
milestone: 0.8.0 → 0.8.1
Revision history for this message
Siegfried Gevatter (rainct) wrote :

Here's an easy way to check this:

[rainct, zeitgeist-trunk]$ ps -u $USERNAME -o comm,rss | grep zeitgeist-dae
zeitgeist-daemo 13948
[rainct, zeitgeist-trunk]$ ./reproducer.py
96466
[rainct, zeitgeist-trunk]$ ps -u $USERNAME -o comm,rss | grep zeitgeist-dae
zeitgeist-daemo 31032

Revision history for this message
Seif Lotfy (seif) wrote :

So I created a shitty extension to see how much memory consumption happens before anything is sent over DBus.
The idea is that I ask for ALL events directly from the engine, 10 seconds after Zeitgeist starts.
Here are my observations:

find_eventids: starts with 16.8 MB and ends up with 25.4 MB

find_events: starts with 16.8 MB and ends up with 106.2 MB

I think for the first case we can't do much.
However, for the second case we need to reduce the memory footprint of Event and Subject in our datamodel, maybe by using __slots__.

I also wrote two scripts, one in Python and the other in Vala; both connect to the DB and ask for all ids.
Observation: the Vala one uses 6.6 MB and the Python one 12.9 MB.

Maybe we can write our own cursor around sqlite in C and create bindings for it, to let us control the memory usage, and instead of returning lists from fetchall we could make it return tuples (see the sketch below).
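
As a first step, the same effect can be sketched in plain Python by iterating the sqlite3 cursor instead of calling fetchall(), so only one row is materialized at a time (database path and SQL are placeholders):

  import sqlite3

  def iter_rows(db_path, sql, params=()):
      """Yield result rows one at a time instead of building one big list."""
      conn = sqlite3.connect(db_path)
      try:
          for row in conn.execute(sql, params):   # each row is a plain tuple
              yield row
      finally:
          conn.close()

  # e.g. stream all event ids without holding them all in memory at once:
  # for (event_id,) in iter_rows("/path/to/activity.sqlite", "SELECT id FROM event"):
  #     handle(event_id)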

Revision history for this message
Seif Lotfy (seif) wrote :

Also, here is the Python test script.

Revision history for this message
Seif Lotfy (seif) wrote :

And here is the Vala test script.

Revision history for this message
Seif Lotfy (seif) wrote :

For the last two scripts, make sure you change the path to the DB.

Seif Lotfy (seif)
Changed in zeitgeist:
importance: Undecided → Low
status: Confirmed → In Progress
milestone: 0.8.1 → none
Changed in zeitgeist (Ubuntu):
importance: Undecided → Low
Revision history for this message
Trever Fischer (tdfischer) wrote :

After some very thorough analysis, I've concluded that zeitgeist is not to blame here.

If my understanding of the kernel slab allocator is correct, free() doesn't actually release the memory from a program's address space until the kernel decides it needs to be reclaimed. As such, zeitgeist-daemon is not at fault and probably not actually using all that memory.
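
One quick way to test that theory is to explicitly ask the allocator to return its free pages and then watch RSS again (glibc-specific; the library name is an assumption):

  import ctypes

  libc = ctypes.CDLL("libc.so.6")
  libc.malloc_trim(0)   # release unused heap memory back to the OS, if possible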

Revision history for this message
Trever Fischer (tdfischer) wrote :

Also attaching massif output.

Changed in zeitgeist:
status: In Progress → Invalid
Changed in zeitgeist-extensions:
status: New → Invalid
Changed in zeitgeist (Ubuntu):
status: New → Invalid