file/dir backend does not create top-level atoms directory

Bug #1352550 reported by Brad Clements
This bug affects 2 people
Affects: taskflow
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

I am creating a simple flow and using the 'file' backend to persist to /tmp/taskflow, code is like this:

    backend = backends.fetch(conf={
        'connection': 'file',
        'path': file_backend_path, # /tmp/taskflow
    })

    store = { ... }
    flow = get_my_flow()
    engine = engines.load(flow,
                          engine_conf=engine_conf,
                          store=store,
                          backend=backend)

When running, it fails as follows:

   2014-08-04 16:14:49,977 taskflow.persistence.backends DEBUG Looking for 'file' backend driver in 'taskflow.persistence'
   2014-08-04 16:14:49,983 stevedore.extension DEBUG found extension EntryPoint.parse('memory = taskflow.persistence.backends.impl_memory:MemoryBackend')
   2014-08-04 16:14:49,983 stevedore.extension DEBUG found extension EntryPoint.parse('sqlite = taskflow.persistence.backends.impl_sqlalchemy:SQLAlchemyBackend')
   2014-08-04 16:14:49,983 stevedore.extension DEBUG found extension EntryPoint.parse('postgresql = taskflow.persistence.backends.impl_sqlalchemy:SQLAlchemyBackend')
   2014-08-04 16:14:49,983 stevedore.extension DEBUG found extension EntryPoint.parse('zookeeper = taskflow.persistence.backends.impl_zookeeper:ZkBackend')
   2014-08-04 16:14:49,983 stevedore.extension DEBUG found extension EntryPoint.parse('file = taskflow.persistence.backends.impl_dir:DirBackend')
   2014-08-04 16:14:49,986 stevedore.extension DEBUG found extension EntryPoint.parse('mysql = taskflow.persistence.backends.impl_sqlalchemy:SQLAlchemyBackend')
   2014-08-04 16:14:49,987 stevedore.extension DEBUG found extension EntryPoint.parse('dir = taskflow.persistence.backends.impl_dir:DirBackend')
   2014-08-04 16:14:49,987 taskflow.utils.persistence_utils WARNING No logbook provided for flow taskflow.patterns.graph_flow.Flow: ups-invoice-process; 3, creating one.
   2014-08-04 16:14:49,987 taskflow.utils.lock_utils DEBUG Got file lock "/tmp/taskflow/locks/book"
   2014-08-04 16:14:49,988 taskflow.utils.lock_utils DEBUG Released file lock "/tmp/taskflow/locks/book"
   2014-08-04 16:14:49,988 taskflow.utils.lock_utils DEBUG Got file lock "/tmp/taskflow/locks/book"
   2014-08-04 16:14:49,988 taskflow.utils.lock_utils DEBUG Got file lock "/tmp/taskflow/locks/flow"
   2014-08-04 16:14:49,989 taskflow.utils.lock_utils DEBUG Released file lock "/tmp/taskflow/locks/flow"
   2014-08-04 16:14:49,989 taskflow.utils.lock_utils DEBUG Released file lock "/tmp/taskflow/locks/book"
   2014-08-04 16:14:49,989 stevedore.extension DEBUG found extension EntryPoint.parse('default = taskflow.engines.action_engine.engine:SingleThreadedActionEngine')
   2014-08-04 16:14:49,989 stevedore.extension DEBUG found extension EntryPoint.parse('serial = taskflow.engines.action_engine.engine:SingleThreadedActionEngine')
   2014-08-04 16:14:49,996 stevedore.extension DEBUG found extension EntryPoint.parse('worker-based = taskflow.engines.worker_based.engine:WorkerBasedActionEngine')
   2014-08-04 16:14:49,996 stevedore.extension DEBUG found extension EntryPoint.parse('parallel = taskflow.engines.action_engine.engine:MultiThreadedActionEngine')
   2014-08-04 16:14:49,997 taskflow.utils.lock_utils DEBUG Got file lock "/tmp/taskflow/locks/flow"
   2014-08-04 16:14:49,997 taskflow.utils.lock_utils DEBUG Got file lock "/tmp/taskflow/locks/atom"
   2014-08-04 16:14:49,997 taskflow.persistence.backends.impl_dir ERROR Failed running locking file based session
   Traceback (most recent call last):
     File "/home/bkc/src/eclipse/main/openstack-taskflow/taskflow/persistence/backends/impl_dir.py", line 120, in _run_with_process_lock
    return functor(*args, **kwargs)
     File "/home/bkc/src/eclipse/main/openstack-taskflow/taskflow/persistence/backends/impl_dir.py", line 219, in _save_atoms_and_link
    self._save_atom_details(atom_detail, ignore_missing=True)
     File "/home/bkc/src/eclipse/main/openstack-taskflow/taskflow/persistence/backends/impl_dir.py", line 171, in _save_atom_details
    self._write_to(ad_path, jsonutils.dumps(ad_data))
     File "/home/bkc/src/eclipse/main/openstack-taskflow/taskflow/persistence/backends/impl_dir.py", line 112, in _write_to
    with open(filename, 'wb') as fp:
   IOError: [Errno 2] No such file or directory: '/tmp/taskflow/atoms/f2250c9e-e284-4b95-bb8e-2a436b3fc674'
   2014-08-04 16:14:49,998 taskflow.utils.lock_utils DEBUG Released file lock "/tmp/taskflow/locks/atom"
   2014-08-04 16:14:49,998 taskflow.utils.lock_utils DEBUG Released file lock "/tmp/taskflow/locks/flow"

inotifywait shows that /tmp/taskflow/atoms is never created.

If I manually create /tmp/taskflow/atoms before running the engine, it works.
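
For reference, the manual workaround is roughly this (a sketch, assuming the same /tmp/taskflow path; other subdirectories may also turn out to be needed):

    import os

    file_backend_path = '/tmp/taskflow'
    atoms_dir = os.path.join(file_backend_path, 'atoms')
    # Pre-create the directory the dir backend writes atom details into;
    # without it, _write_to() fails with ENOENT as in the traceback above.
    if not os.path.isdir(atoms_dir):
        os.makedirs(atoms_dir)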

Revision history for this message
Brad Clements (bkc) wrote :

I am running from trunk

commit e72a02d97043120195c237b567d83e39c3c3feda
Merge: 6da7782 2405bd8
Author: Jenkins <email address hidden>
Date: Fri Jul 25 01:26:40 2014 +0000

    Merge "Enable hacking checks H305 and H307 in tox.ini template"

Revision history for this message
Joshua Harlow (harlowja) wrote :

Ah, I think I see it. Did you run upgrade() on the backend before running?

https://github.com/openstack/taskflow/blob/master/taskflow/persistence/backends/base.py#L61

When the file/dir persistence backend is used, this goes through and creates the needed directories.

https://github.com/openstack/taskflow/blob/master/taskflow/persistence/backends/impl_dir.py#L301
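
Something like the following before loading the engine (a sketch; get_connection() returns a Connection exposing the upgrade() method linked above):

    # Run once after fetching the backend and before engines.load();
    # for the file/dir backend this creates the missing directories.
    conn = backend.get_connection()
    try:
        conn.upgrade()
    finally:
        conn.close()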

Changed in taskflow:
importance: Undecided → Critical
assignee: nobody → Joshua Harlow (harlowja)
importance: Critical → Undecided
status: New → Incomplete
Revision history for this message
Joshua Harlow (harlowja) wrote :

Also note that if you use the following as a context manager, this will be done for you.

https://github.com/openstack/taskflow/blob/master/taskflow/persistence/backends/__init__.py#L72

This would work as follows:

with backends.backend(conf) as backend:
    store = { ... }
    flow = get_my_flow()
    engine = engines.load(flow,
                          engine_conf=engine_conf,
                          store=store,
                          backend=backend)
    engine.run()
    ....

Revision history for this message
Brad Clements (bkc) wrote :

Hi,

I did not run 'upgrade' on the backend. Is that documented as being required somewhere?

Searching the code base, I only see it called in some unit tests; it's not explicitly called in any of the examples.

I have seen examples of using the backend as a context manager, but that just makes me think "oh, that's a nice way to automatically call close() on the backend", so I must either do that or make sure I call close() myself.

The requirement to call upgrade() before using a backend doesn't seem obvious anywhere.

In fact, in the docs at http://docs.openstack.org/developer/taskflow/persistence.html#taskflow.persistence.backends.backend

it says "if the schema is already up to date this is a no-op". Since I know I'm creating new logbooks, this implies there's no need to use the context manager in my case, and for http://docs.openstack.org/developer/taskflow/persistence.html#taskflow.persistence.backends.base.Connection.upgrade it says "Migrate the persistence backend to the most recent version." which also indicates it's only needed if there are existing logbooks that might need to be upgraded

Also, what if I pass a dictionary/persistence_conf for the backend parameter to engines.load instead of an actual backend? Will all the engines call upgrade(), or do they only do so as an artifact of using the backend through the context manager?

If I write my own backend, should I rely on all users always calling upgrade() before taking any other action with the backend connection, or might they only call it when there are existing logbooks whose schema might need to be upgraded?

It sounds like upgrade() is being used partly to prepare the backend in addition to doing a schema update. Should the prepare() functionality be broken out into its own backend method? Or should upgrade() be renamed prepare_and_upgrade(), or should the documentation and examples just be updated?

Revision history for this message
Joshua Harlow (harlowja) wrote :

Hi,

It might make sense to move to separate prepare() and upgrade() functions. I'd be OK with that.

I think the examples (and the docs in code) should also be updated.

Maybe upgrade() should really be renamed prepare() (and we then remove upgrade() in version N + 1). prepare() does seem like a better name and is more meaningful for backends that don't do schema upgrades (i.e. the file-based backend, which has no concept of schemas).

What do you think?

Changed in taskflow:
importance: Undecided → High
status: Incomplete → Confirmed
assignee: Joshua Harlow (harlowja) → nobody
Revision history for this message
Joshua Harlow (harlowja) wrote :

Another idea: how about a step-by-step plan here?

1. Update docs & examples (code and otherwise) to ensure they show upgrade() occurring.
2. Create a prepare() method, have upgrade() call prepare(), and mark upgrade() as deprecated (doing something like what is being done in https://review.openstack.org/#/c/102091/12/taskflow/utils/deprecation.py); a sketch follows below.
3. In version N + 1, remove upgrade() and adjust docs and examples again to reflect prepare().
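
A rough sketch of step 2 (names are hypothetical; the real deprecation handling would come from the review linked above):

    import warnings


    class Connection(object):

        def prepare(self):
            """Prepare the backend for use (make directories, create tables, ...)."""
            raise NotImplementedError()

        def upgrade(self):
            # Deprecated alias kept around for one more release; callers
            # should switch to prepare().
            warnings.warn("upgrade() is deprecated, use prepare() instead",
                          DeprecationWarning, stacklevel=2)
            return self.prepare()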

Revision history for this message
Brad Clements (bkc) wrote :

I defer to your experience as to whether and how changes should be made.

If you move to a prepare() method instead of upgrade(), does that mean backend implementations will be responsible for determining for themselves whether an upgrade is necessary, and that they would do so from the prepare() call?

In some respects that seems to make sense to me because only the backend implementation can know for sure if an upgrade is really required or not.

Revision history for this message
Joshua Harlow (harlowja) wrote :

Right, backends would be allowed to do upgrade-style activities in prepare() if they so choose, or they can make directories or do any other kind of preparation-style activity. I'll put a review up for this, unless you want to?
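
For the file/dir backend, prepare() could look roughly like this (a sketch; directory names besides atoms are illustrative):

    import os


    class DirBackendConnection(object):

        def __init__(self, base_path):
            self._base_path = base_path

        def prepare(self):
            # For a directory-based backend, "preparation" is just making
            # sure the directories it will write into exist ahead of time.
            for name in ('atoms', 'books', 'flow_details'):
                path = os.path.join(self._base_path, name)
                if not os.path.isdir(path):
                    os.makedirs(path)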

Joshua Harlow (harlowja)
Changed in taskflow:
status: Confirmed → Triaged