Since the migration to the new deployment last night, creation of new subordinate repositories (i.e. ones where there's already a target-default repository for their target) is failing. The traceback from turnip-api looks like this:
Traceback (most recent call last):
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle
self.handle_request(listener, req, client, addr)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/router.py", line 242, in __call__
response = self.invoke_subrequest(request, use_tweens=True)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/router.py", line 217, in invoke_subrequest
response = handle_request(request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/tweens.py", line 46, in excview_tween
response = view_callable(exc, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/config/views.py", line 385, in viewresult_to_response
result = view(context, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/tweens.py", line 21, in excview_tween
response = handler(request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/router.py", line 163, in handle_request
response = view_callable(context, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/config/views.py", line 596, in __call__
return view(context, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/config/views.py", line 329, in attr_view
return view(context, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/pyt
hon2.7/site-packages/pyramid/config/views.py", line 305, in predicate_wrapper
return view(context, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/python2.7/site-packages/pyramid/config/views.py", line 355, in rendered_view
result = view(context, request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/python2.7/site-packages/pyramid/config/views.py", line 501, in _requestonly_view
response = view(request)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/env/local/lib/python2.7/site-packages/cornice/service.py", line 527, in wrapper
response = view_()
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/turnip/api/views.py", line 77, in collection_post
repo, clone_from=repo_clone, clone_refs=clone_refs)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/turnip/api/store.py", line 167, in init_repo
import_into_subordinate(sub_path, clone_from)
File "/srv/turnip/turnip-api/payloads/f486417f5fd1674241677fe2f1afb886b3aa3b24/turnip/api/store.py", line 138, in import_into_subordinate
os.link(from_path, sub_path)
OSError: [Errno 13] Permission denied
But it's not currently clear to me why this is getting EACCES; the permissions look fine on disk. https://pastebin.canonical.com/p/s4SYJR3rgw/ shows lp:ubuntu/+source/germinate (27656) and a failed attempt to create lp:~cjwatson/ubuntu/+source/germinate (43224). Based on the find output (assuming that find iterates over directories in the same order as os.listdir, which the output seems to suggest), the EACCES is probably on objects/pack/pack-ccdf56ab39b00807e62fc0e290def554e6935b05.pack, but there seems no clear reason why that should be the case.
I haven't been able to reproduce this on staging, and manually running ln on the probable file in question succeeded. Stracing all the gunicorn processes slowed things down to the point where repository creation attempts timed out (which is a separate bug). Maybe this is an NFS race involving new directories or something?
######### On the NFS client side ###########
# link(): SVr4, 4.3BSD, POSIX.1-2001 (but see NOTES), POSIX.1-2008.
int link(const char *oldpath, const char *newpath);
ERRORS:
EACCES Write access to the directory containing newpath is denied, or search permission is denied for one of the directories in the path prefix of oldpath or newpath.
Notes:
- Hard links, as created by link(), cannot span filesystems. Use symlink(2) if this is required.
Bugs:
On NFS filesystems, the return code may be wrong in case the NFS server performs the link creation and dies before it can say so. Use stat(2) to find out if the link got created.
----
So, unless ganesha deviates from POSIX.1-2008, OR the ganesha nfsd RPC thread crashes, EACCESS should be a result of either NEWPATH having bad permissions for NFS user OR there is an issue with directory (+x) permissions in ONE directory of EITHER oldpath or newpath (/aaa/bbb/cccc <- the full path I mean).
----
######### On the NFS SERVER side ###########
Considering the NFS server is giving us EACCESS, then...
With https:/ /github. com/nfs- ganesha/ nfs-ganesha/ wiki/NFS- Ganesha- Architecture:
the EACCESS comes from the def_handle_ops abstraction (FSAL/default_ methods. c): fsal_test_access() or fsal_obj_ ops->test_ access( ). We also have the MDCACHE in between final fsal_test_access() and the caller.
The stacktrace:
fsal_print_ access_ by_acl( ) check_access_ acl() test_access( ) test_access( ) -> comes from mdcache( ops->test_ access( )) call
mdcache_ handle_ ops_init( ) -> initializes fsal_obj_ops for:
mdcache_ fsal_init( ) ops->test_access = mdcache_ test_access;
start_ fsals()
fsacl_
fsal_
mdcache_
would tell us the exact error if:
if (!isFullDebug( COMPONENT_ NFS_V4_ ACL))
return;
debug was enabled.
--------
For the fsal_test_access():
/* test_access
* common (default) access check method for fsal_obj_handle objects.
* NOTE: A fsal can replace this method with their own custom access
* checker. If so and they wish to have an option to switch
* between their custom and this one, it their test_access
* method's responsibility to do that test and select this one.
*/
looks like a filesystem abstraction can replace the check method. fsal_test_access() is the DEFAULT FSAL check access but not the only one.
NOTE: I'm not sure what is the underlying FSAL you're using (ceph ?)
--------
From the FSAL abstraction layer:
/** access) (struct fsal_obj_handle *obj_hdl,
* @brief Check access for a given user against a given object
*
* This function checks whether a given user is allowed to perform the
* specified operations against the supplied file. The goal is to
* allow filesystem specific semantics to be applied to cached
* metadata.
*
* This method must read attributes and/or get them from a cache.
*
* @param[in] obj_hdl Handle to check
* @param[in] access_type Access requested
* @param[out] allowed Returned access that could be granted
* @param[out] denied Returned access that would be granted
* @param[in] owner_skip Skip test if op_ctx->creds is owner
*
* @return FSAL status.
*/
fsal_status_t (*test_
f...