Explore path-index like queries

Bug #1639044 reported by Paul Everitt
Affects: KARL4
Status: Won't Fix
Importance: Medium
Assigned to: Jim Fulton

Bug Description

We currently hard-code "community" as a column in our (hacked) pgtextindex. However, we are also going to need to query folder/files in a subfolder. Our current plan was to add another column called "parent" and use it for that case.

We could consider solving both with the Zope concept of path index. Do a little research on that. Perhaps it can be encoded directly as an array of docids.

Jim Fulton (jim-zope) wrote :

I had to refresh my memory of what path indexes did. So I looked at:

https://docs.zope.org/zope2/zope2book/SearchingZCatalog.html#path-index-record-attributes

And then at:

https://github.com/zopefoundation/Products.ZCatalog/tree/master/src/Products/PluginIndexes/PathIndex

/me mops up brain matter

Wow, this isn't what I thought I'd remembered.

So, implementing a path index that supports just searching with level = 0 is pretty simple and fast. Just store and btree-index the document paths with trailing delimiters (e.g. '/foo/bar/') and do prefix searches:

  path like '/foo/bar/%'

Is this sufficient? Or do you want support for level != 0?
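
A minimal sketch of the level = 0 prefix search described above, assuming a sorted list as a stand-in for the btree index. The helper names (with_trailing_slash, build_index, descendants) and the sample paths are hypothetical, not taken from KARL or pgtextindex. It shows why the trailing delimiter matters: without it, a search for '/foo/bar' would also pick up '/foo/barn'.

```python
import bisect


def with_trailing_slash(path):
    """Normalize a path so it always ends with the delimiter."""
    return path if path.endswith("/") else path + "/"


def build_index(paths):
    """Stand-in for a btree index: a sorted list of normalized paths."""
    return sorted(with_trailing_slash(p) for p in paths)


def descendants(index, folder):
    """Return every indexed path matching: path LIKE '<folder>/%'.

    Like the SQL pattern, this also matches the folder's own entry.
    """
    prefix = with_trailing_slash(folder)
    # In sorted order, all entries sharing a prefix are contiguous,
    # so we seek to the first candidate and scan until the prefix ends.
    start = bisect.bisect_left(index, prefix)
    results = []
    for path in index[start:]:
        if not path.startswith(prefix):
            break
        results.append(path)
    return results


# Hypothetical sample data.
index = build_index(
    ["/foo", "/foo/bar", "/foo/bar/a", "/foo/bar/a/b", "/foo/barn", "/foo/barn/x"]
)
under_bar = descendants(index, "/foo/bar")
```

Here under_bar contains '/foo/bar/' and everything below it, but nothing under '/foo/barn/'.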

Paul Everitt (paul-agendaless) wrote : Re: [Bug 1639044] Explore path-index like queries

We will need the ability to do the following. For an object at:

/communities/some-community/folders/folderA/folder22/folderB

…to get folder listings for folderB, folder22, and folderA, with filtered query support.

We can do this in another way, simply by adding a column “parent”.

We don’t necessarily need to do queries for folderA and get everything in folder22 and folderB, if that’s what you mean. But we do need to know everything in some-community.

—Paul


Jim Fulton (jim-zope) wrote :

On Sat, Nov 5, 2016 at 9:17 AM, Paul Everitt <email address hidden> wrote:

> We will need the ability to do the following. For an object at:
>
> /communities/some-community/folders/folderA/folder22/folderB
>
> …to get folder listings for folderB, folder22, and folderA, with
> filtered query support.
>

The query I gave would give you all of the descendants for any given path.
This makes sense for situations where you use a path search to narrow some
other search to a section of a site.

You can also construct queries to narrow things further, but the index only
helps with the common prefix in a query, so...

> We can do this in another way, simply by adding a column “parent”.
>

If you want *just* the immediate children of an object, then an index on
parent would be better, at least in terms of performance.
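
A rough sketch of the two query shapes being compared, assuming a table with both a path column and a parent column. SQLite is used only as a runnable stand-in for PostgreSQL; the table name docs, the column names, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL so this runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (docid INTEGER PRIMARY KEY, path TEXT, parent TEXT)"
)
conn.execute("CREATE INDEX docs_parent ON docs (parent)")

FOLDERS = "/communities/some-community/folders/"
A = FOLDERS + "folderA/"
F22 = A + "folder22/"
B = F22 + "folderB/"

conn.executemany(
    "INSERT INTO docs (docid, path, parent) VALUES (?, ?, ?)",
    [
        (1, A, FOLDERS),
        (2, F22, A),
        (3, B, F22),
        (4, A + "notes.txt/", A),
        (5, B + "report.pdf/", B),
        (6, "/communities/other-community/folders/folderZ/",
         "/communities/other-community/folders/"),
    ],
)


def children(conn, folder):
    """Immediate children only: an equality lookup on the parent column."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE parent = ? ORDER BY path", (folder,)
    )
    return [path for (path,) in rows]


def descendants(conn, folder):
    """Everything under a folder: a prefix match on the path column.

    A real implementation would escape % and _ occurring in `folder`.
    """
    rows = conn.execute(
        "SELECT path FROM docs WHERE path LIKE ? ORDER BY path", (folder + "%",)
    )
    return [path for (path,) in rows]


listing = children(conn, A)
in_community = descendants(conn, "/communities/some-community/")
```

The folder listing needs only the equality lookup, while "everything in some-community" needs the prefix match. Further filters can be added to either query as ordinary WHERE clauses.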

The efficacy of a solution can only be measured relative to a problem.

>
> We don’t necessarily need to do queries for folderA and get everything
> in folder22 and folderB, if that’s what you mean. But we do need to know
> everything in some-community.
>

See above.

The, to me, weird thing about the ``level`` parameter in PathIndex queries
is that it lets you do things like:

- search for paths that have some subpath (foo/bar) at some level
(``level`` > 0) in the tree.

- search for paths that have some subpath (foo/bar) anywhere in a path.

I suppose some use case motivated this (although I can also imagine an
over-eager generalization :). If it's over-eager generalization, it's a
shame because it made the implementation a lot more complicated and
searches slower.
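
To make the ``level`` semantics above concrete, here is a small sketch of the matching rule only. It is an illustration, not the PathIndex implementation; the function names are hypothetical, and using level=None for "anywhere" is this sketch's own convention.

```python
def split_path(path):
    """Split '/a/b/c' into ['a', 'b', 'c'], ignoring empty components."""
    return [part for part in path.split("/") if part]


def matches(path, subpath, level=0):
    """Return True if `subpath` occurs in `path` at the requested level.

    level >= 0: the subpath components must start at exactly that depth.
    level is None: the subpath may start at any depth.
    """
    parts = split_path(path)
    wanted = split_path(subpath)
    if level is None:
        starts = range(len(parts) - len(wanted) + 1)
    else:
        starts = [level]
    return any(parts[s:s + len(wanted)] == wanted for s in starts)


prefix_hit = matches("/foo/bar/x", "foo/bar")               # level 0: prefix
deeper_hit = matches("/a/foo/bar/x", "foo/bar", level=1)    # fixed depth
anywhere_hit = matches("/a/b/foo/bar", "foo/bar", level=None)
```

Only the level = 0 case reduces to a prefix test on the stored path string. The other two depend on components that are not at the start of the path, so a plain btree on the path cannot narrow them.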

Jim


Paul Everitt (paul-agendaless) wrote :

> On Nov 5, 2016, at 10:19 AM, Jim Fulton <email address hidden> wrote:
>
> On Sat, Nov 5, 2016 at 9:17 AM, Paul Everitt <email address hidden> wrote:
>
>> We will need the ability to do the following. For an object at:
>>
>> /communities/some-community/folders/folderA/folder22/folderB
>>
>> …to get folder listings for folderB, folder22, and folderA, with
>> filtered query support.
>>
>
> The query I gave would give you all of the descendants for any given path.

Ok, great, then this could replace our current approach of adding a “community” column in the repoze.pgtextindex table. Whether we *would* do that transition depends on the next (and AFAIK, only) other path-based case.

> This makes sense for situations where you use a path search to narrow some
> other search to a section of a site.
>
> You can also construct queries to narrow things further, but the index only
> helps with the common prefix in a query, so...
>
>
>> We can do this in another way, simply by adding a column “parent”.
>>
>
> If you want *just* the immediate children of an object, then an index on
> parent would be better, at least in terms of performance.

Yep.

> The efficacy of a solution can only be measured relative to a problem.
>
>
>>
>> We don’t necessarily need to do queries for folderA and get everything
>> in folder22 and folderB, if that’s what you mean. But we do need to know
>> everything in some-community.
>>
>
> See above.
>
> The, to me, weird thing about the ``level`` parameter in PathIndex queries
> is that it lets you do things like:
>
> - search for paths that have some subpath (foo/bar) at some level
> (``level`` > 0) in the tree.
>
> - search for paths that have some subpath (foo/bar) anywhere in a path.
>
> I suppose some use case motivated this (although I can also imagine an
> over-eager generalization :). If it's over-eager generalization, it's a
> shame because it made the implementation a lot more complicated and
> searches slower.

Yeah, that sounds like the kind of over-abstraction that I probably requested. :)

—Paul

Changed in karl4:
milestone: 025 → 026
Changed in karl4:
milestone: 026 → 027
Paul Everitt (paul-agendaless) wrote :

Jim, I wonder if this should be kept open. Your next view (the Archive Portlet) will require limiting results to a single community, which is part of the path. You'll likely be re-thinking this discussion.

Jim Fulton (jim-zope) wrote :

I don't know about keeping this open. The original use case, filtering by community_id, isn't well served, I don't think.

OTOH, if you wanted to search by community *name*, then I think an index on path(state) could make a lot of sense.

Paul Everitt (paul-agendaless) wrote :

Ok, we'll discuss this in the ticket for the view where you'll need it. But yes, I should have indicated that community name would suffice.

Changed in karl4:
status: New → Won't Fix