[EDP][UI] Allow relative or absolute paths in the local hdfs

Bug #1315126 reported by Trevor McKay
This bug affects 1 person
Affects | Status       | Importance | Assigned to  | Milestone
Sahara  | Fix Released | High       | Chad Roberts | 2014.2

Bug Description

Sahara currently demands that the URL for an hdfs data source begins with the hdfs scheme "hdfs://". The UI enforces this restriction as well.

When the hdfs scheme is specified, hadoop requires the hostname and/or port to be specified as well. For use of an external hdfs, of course, the host/port are always necessary.

On a long-running cluster, the local hdfs can be used by specifying the host/port of the local namenode in the path so that data can be read from and written to the cluster hdfs, for example "hdfs://test-master:9000/user/hadoop/input". However, hadoop also supports simpler absolute and relative paths that access the local hdfs without specifying the host and port.

For example (assuming the hadoop user), the relative path "output_path" will evaluate to

  hdfs://namenode:port/user/hadoop/output_path

and the absolute path "/output_path" will evaluate to

  hdfs://namenode:port/output_path
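As a rough illustration of these resolution rules, the Python sketch below maps a scheme-less path onto a full local-hdfs URL. The function name and its defaults (namenode host, port 9000 taken from the test-master example, the hadoop user) are assumptions for illustration, as is the /user/<user> home-directory convention; hadoop itself performs this resolution against its configured default filesystem and working directory, not with this code.

```python
import posixpath
from urllib.parse import urlparse


def resolve_local_hdfs_path(path, namenode="namenode", port=9000, user="hadoop"):
    """Hypothetical sketch: expand a scheme-less path to a local-hdfs URL.

    Assumes the default filesystem is hdfs://<namenode>:<port> and that the
    user's home directory is /user/<user>.
    """
    if urlparse(path).scheme:
        # Already a full URL such as hdfs://host:port/...; leave it as is.
        return path
    if path.startswith("/"):
        # Absolute paths start at "/" in the local hdfs.
        full = posixpath.normpath(path)
    else:
        # Relative paths are resolved against the user's home directory.
        full = posixpath.normpath(posixpath.join("/user", user, path))
    return "hdfs://%s:%d%s" % (namenode, port, full)


print(resolve_local_hdfs_path("output_path"))
print(resolve_local_hdfs_path("/output_path"))
```

Under these assumptions the two calls yield hdfs://namenode:9000/user/hadoop/output_path and hdfs://namenode:9000/output_path, matching the examples above.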

Sahara should relax the restriction that hdfs URLs start with "hdfs://", and allow absolute/relative paths to access the local hdfs.

Trevor McKay (tmckay) wrote:

The patch for this in the sahara API is simple: do not require the hdfs scheme, but if a scheme is present, require it to be hdfs and require a hostname. The UI will need changes to allow paths without a scheme (but the API change can be tested with the CLI).

diff --git a/sahara/service/validations/edp/data_source.py b/sahara/service/validations/edp/data_source.py
index d2c8072..89d393c 100644
--- a/sahara/service/validations/edp/data_source.py
+++ b/sahara/service/validations/edp/data_source.py
@@ -69,8 +69,9 @@ def _check_hdfs_data_source_create(data):
     if len(data['url']) == 0:
         raise ex.InvalidException("HDFS url must not be empty")
     url = urlparse.urlparse(data['url'])
-    if url.scheme != "hdfs":
-        raise ex.InvalidException("URL scheme must be 'hdfs'")
-    if not url.hostname:
-        raise ex.InvalidException("HDFS url is incorrect, "
-                                  "cannot determine a hostname")
+    if url.scheme:
+        if url.scheme != "hdfs":
+            raise ex.InvalidException("URL scheme must be 'hdfs'")
+        if not url.hostname:
+            raise ex.InvalidException("HDFS url is incorrect, "
+                                      "cannot determine a hostname")
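For trying out which URLs pass, here is a self-contained Python 3 sketch of the patched check. It uses urllib.parse in place of the Python 2 urlparse module, and both the function name check_hdfs_url and the local InvalidException class are hypothetical stand-ins, not sahara's actual validation module or its ex package.

```python
from urllib.parse import urlparse


class InvalidException(Exception):
    """Hypothetical stand-in for sahara's ex.InvalidException."""


def check_hdfs_url(url_string):
    """Sketch of the patched HDFS data source URL validation."""
    if len(url_string) == 0:
        raise InvalidException("HDFS url must not be empty")
    url = urlparse(url_string)
    # The scheme is now optional: bare relative or absolute paths pass.
    if url.scheme:
        # If a scheme is given, it must be hdfs and carry a hostname.
        if url.scheme != "hdfs":
            raise InvalidException("URL scheme must be 'hdfs'")
        if not url.hostname:
            raise InvalidException("HDFS url is incorrect, "
                                   "cannot determine a hostname")
```

With this logic, "output_path", "/output_path" and "hdfs://test-master:9000/user/hadoop/input" are accepted, while an empty string, a swift:// URL, or "hdfs:///user/hadoop/input" (a scheme but no hostname) are rejected.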

Trevor McKay (tmckay) wrote:

bah, launchpad messed up my format. But you get the idea.

Chad Roberts (croberts)
summary: - [EDP] Allow relative or absolute paths in the local hdfs
+ [EDP][UI] Allow relative or absolute paths in the local hdfs
Openstack Gerrit (openstack-gerrit) wrote: Fix merged to sahara (master)

Reviewed: https://review.openstack.org/91664
Committed: https://git.openstack.org/cgit/openstack/sahara/commit/?id=6296e8f09a15480bb3a931a8b5579b13019e2e44
Submitter: Jenkins
Branch: master

commit 6296e8f09a15480bb3a931a8b5579b13019e2e44
Author: Trevor McKay <email address hidden>
Date: Thu May 1 16:16:30 2014 -0400

    Allow HDFS data source paths without the hdfs:// scheme

    HDFS paths without a leading hdfs:// scheme will be interpreted by
    hadoop to refer to the local hdfs. Relative paths will be treated
    as relative to the user running the job; absolute paths will start
    at "/" in the local hdfs. Sahara should allow these forms for
    paths so that data sources can simply reference the local hdfs
    on a long running cluster.

    This can be tested/utilized from the CLI, but the UI will need
    an additional change to allow a user to submit these paths.

    Partial-Bug: #1315126
    Change-Id: I575bf9ff20ff348f7bf2bc52f116b74288b8464a

Chad Roberts (croberts)
Changed in sahara:
status: New → Confirmed
assignee: nobody → Chad Roberts (croberts)
importance: Undecided → High
milestone: none → juno-1
Changed in sahara:
status: Confirmed → In Progress
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote: Fix proposed to sahara (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/97762

OpenStack Infra (hudson-openstack) wrote: Change abandoned on sahara (stable/icehouse)

Change abandoned by afazekas (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/97762

Thierry Carrez (ttx)
Changed in sahara:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix merged to sahara-dashboard (master)

Reviewed: https://review.openstack.org/92833
Committed: https://git.openstack.org/cgit/openstack/sahara-dashboard/commit/?id=601983762700cfa0a389c45412553867ca580afd
Submitter: Jenkins
Branch: master

commit 601983762700cfa0a389c45412553867ca580afd
Author: Chad Roberts <email address hidden>
Date: Thu May 8 10:08:00 2014 -0400

    Allowing for HDFS data sources without hdfs://

    Now that HDFS data sources can be either full URLs
    or paths relative to the cluster itself, it is no
    longer desirable to force "hdfs://" at the start
    of hdfs data sources.
    * No longer force the prefix on the create form
    * For swift, we now check to see that the location
      starts with swift://, if it doesn't we add it

    Change-Id: Id1fcdb741ce702ad194cf86556be2f3f0e8cd049
    Partial-Bug: bug 1315126
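The dashboard behaviour described in this commit can be sketched roughly as follows. The helper name normalize_data_source_url and its arguments are hypothetical, and this is not the sahara-dashboard code; it only illustrates the stated rule: swift locations get a swift:// prefix when it is missing, while hdfs locations (full URLs or plain paths) are passed through unchanged.

```python
def normalize_data_source_url(data_source_type, url):
    """Hypothetical sketch of the create-form handling of data source URLs."""
    if data_source_type == "swift":
        # Add the swift:// prefix only if the user did not type it.
        if not url.startswith("swift://"):
            url = "swift://" + url
    # hdfs locations are no longer forced to start with hdfs://, so
    # relative and absolute paths are kept exactly as entered.
    return url
```

Keeping the hdfs branch as a pass-through leaves validation of the hdfs value to the sahara API check shown earlier in this thread.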

Thierry Carrez (ttx)
Changed in sahara:
milestone: juno-1 → 2014.2