Redis garbage collection
Bug #797749 reported by Lars Butler
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenQuake (deprecated) | Fix Released | High | Lars Butler |
Bug Description
Jobs should clean up after themselves when they are complete. If Redis holds on to too much historical data, the KVS can become very slow, which affects the performance of the OQ engine as a whole.
Changed in openquake:
  importance: Undecided → High
  status: New → Confirmed
  description: updated

Changed in openquake:
  assignee: nobody → Lars Butler (lars-butler)

Changed in openquake:
  status: Confirmed → In Progress
  milestone: none → 0.4.1

Changed in openquake:
  milestone: 0.4.1 → none

tags: added: current performance

tags: added: current-cycle; removed: current

Changed in openquake:
  milestone: none → 0.4.3

Changed in openquake:
  milestone: 0.4.3 → 0.4.1

tags: added: nosql; removed: current-cycle

tags: added: sys-quality

Changed in openquake:
  status: In Progress → Fix Committed

Changed in openquake:
  status: Fix Committed → Fix Released
I did some initial investigation and brainstorming for this bug. Here's what I came up with:
----------------------------------------
1) Key generation functions
There are 3 functions in openquake.kvs.__init__.py which generate various types of keys for a 'job':

generate_job_key(job_id)
generate_site_key(job_id, block_id)
generate_product_key(job_id, product, block_id="", site="")
These are the keys that we care about the most when it comes to garbage collection in Redis.
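For illustration only, here is a minimal sketch of what such key-generation helpers could look like. The 'JOB'/'SITE'/'PRODUCT' prefixes and the '!' separator are assumptions made up for this example, not the actual openquake.kvs implementation:

# Illustrative sketch only; the real helpers in openquake.kvs may use a
# different separator and key layout.
KVS_KEY_SEPARATOR = '!'

def generate_job_key(job_id):
    # e.g. 'JOB!123'
    return KVS_KEY_SEPARATOR.join(['JOB', str(job_id)])

def generate_site_key(job_id, block_id):
    # e.g. 'SITE!123!0'
    return KVS_KEY_SEPARATOR.join(['SITE', str(job_id), str(block_id)])

def generate_product_key(job_id, product, block_id="", site=""):
    # e.g. 'PRODUCT!123!mean_hazard_curve!0!site1'
    return KVS_KEY_SEPARATOR.join(
        ['PRODUCT', str(job_id), str(product), str(block_id), str(site)])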
----------------------------------------
2) 'Garbage collected' decorator
I think that if we can capture these keys in a list somewhere, we can ask Redis to delete (http://redis.io/commands/del) each key in the list when the job is complete.
One way I thought of capturing these keys is by adding a @garbage_collected decorator to each of the 3 functions mentioned above. For example:

@garbage_collected
def generate_site_key(job_id, block_id):
    ...

Assuming that job_id will always be the first argument for these 3 key generation functions, the @garbage_collected decorator would do a 'rpush JOB_KEYS!<job_id> <generated_key>', where <job_id> is the value of the job_id arg and <generated_key> is the value returned by the decorated function. Thus, each time any kind of job key is generated, that key will be stored in a list (in Redis).
This doesn't guarantee that we keep track of ALL job data, but we should capture at least 99%. There is a risk that we might miss something, but what I've proposed here is--I think--a good first step.
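A rough sketch of such a decorator, assuming a redis-py client is obtained through a hypothetical get_client() helper (the helper is an assumption for illustration, not existing openquake code; the JOB_KEYS!<job_id> list name is the one described above):

import functools

import redis

def get_client():
    # Hypothetical helper; the real code would reuse the engine's
    # existing KVS connection settings.
    return redis.Redis()

def garbage_collected(fn):
    """Record every key produced by fn in a per-job Redis list.

    Assumes job_id is always the first positional argument of the
    decorated key-generation function.
    """
    @functools.wraps(fn)
    def wrapper(job_id, *args, **kwargs):
        key = fn(job_id, *args, **kwargs)
        # Equivalent to: rpush JOB_KEYS!<job_id> <generated_key>
        get_client().rpush('JOB_KEYS!%s' % job_id, key)
        return key
    return wrapper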
----------------------------------------
3) Doing the garbage collection
When a job is finished, we should be able to call a KVS gc() function to clean everything up. Here's how I think this can be implemented:

In openquake.kvs.__init__:

def gc(job_id):
    # Get the list of keys for this job. These keys are all stored in the
    # KVS under the key 'JOB_KEYS!<job_id>', where <job_id> is the input
    # to the function.
    # Use the Redis client to delete (http://redis.io/commands/del) each
    # of these keys.
    # Then delete the contents of 'JOB_KEYS!<job_id>'. We don't need this
    # list anymore.
    # Return the number of items deleted.
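Filling in that skeleton, here is a minimal sketch using redis-py directly (get_client() is the same hypothetical helper as in the decorator sketch above; the real function would go through openquake.kvs's own connection handling):

import redis

def get_client():
    # Hypothetical helper; assumes a default local Redis instance.
    return redis.Redis()

def gc(job_id):
    """Delete all KVS data recorded for the given job.

    Returns the number of job keys deleted, or None if nothing was
    recorded for this job_id.
    """
    client = get_client()
    gc_list_key = 'JOB_KEYS!%s' % job_id

    # Get the list of keys recorded for this job.
    keys = client.lrange(gc_list_key, 0, -1)
    if not keys:
        return None

    # Delete each recorded key, then the bookkeeping list itself.
    deleted = client.delete(*keys)
    client.delete(gc_list_key)
    return deleted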