Problem in FixedWordlistDataModelElement.py if a list element is equal to the beginning of another element

Bug #1712789 reported by Markus Wurzenberger
This bug affects 1 person
Affects: logdata-anomaly-miner - lightweight tool for log checking, log analysis
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Example Parser:

model=SequenceModelElement('example', [
    FixedWordlistDataModelElement('element1', ['a', 'b', 'c', 'aa']),
    FixedDataModelElement('element2', ' d')
])

Line:

aa d

Output:

Unparsed atom received

The problem in FixedWordlistDataModelElement.py is that it iterates through self.wordlist, checks whether the data to be parsed starts with the current word, and breaks at the first hit. In our example the parser first checks whether 'aa d' starts with 'a', which is the case, so it takes 'a' as the value of the path '/model/example/element1'. Next the AMiner checks whether the remaining data 'a d' starts with ' d'. It does not, so the line is considered unparsed. The correct output would be:

/model/example/element1: 4 ('aa')
/model/example/element1/element2 : d (' d')
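
The failure can be reproduced outside the AMiner with a minimal standalone sketch. The names first_match and parse_sequence are hypothetical stand-ins for the wordlist element and the sequence element. They only mimic the first-match behaviour described above and are not the project's code.

```python
def first_match(wordlist, data):
    """Return (index, word) for the first word that data starts with, else None."""
    for pos, word in enumerate(wordlist):
        if data.startswith(word):
            return pos, word
    return None


def parse_sequence(wordlist, fixed, data):
    """Mimic a two-element sequence: a wordlist element, then a fixed string."""
    hit = first_match(wordlist, data)
    if hit is None:
        return None
    pos, word = hit
    rest = data[len(word):]
    if not rest.startswith(fixed):
        return None  # the whole line counts as unparsed
    return pos, word, fixed


# 'a' is listed before 'aa', so 'a' is consumed and ' d' no longer matches.
print(parse_sequence(['a', 'b', 'c', 'aa'], ' d', 'aa d'))
# With the longer word listed first, the same line parses.
print(parse_sequence(['aa', 'a', 'b', 'c'], ' d', 'aa d'))
```

Only the order of the wordlist differs between the two calls, which is why the reported index depends on how the list is written.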

My workaround was to generate a temporary copy of the list, sorted by element length with the longest element first, and to return as 'wordPos' the position of the matched element in the original list. The code looks as follows:

import MatchElement

class FixedWordlistDataModelElement:
  def __init__(self, id, wordlist):
    self.id=id
    self.wordlist=wordlist

  def getChildElements(self):
    return(None)

  def getMatchElement(self, path, matchContext):
    """@return None when there is no match, MatchElement otherwise."""
    data=matchContext.matchData
    # Try longer words first so that 'aa' is tested before its prefix 'a'.
    # Sort a copy: the order of self.wordlist defines the reported index.
    tmp_wordlist=sorted(self.wordlist, key=len, reverse=True)
    matchData=None
    wordPos=0

    for word in tmp_wordlist:
      if data.startswith(word):
        matchData=word
        # Report the position in the original, unsorted list.
        wordPos=self.wordlist.index(word)
        break

    if matchData is None: return(None)

    matchContext.update(matchData)
    return(MatchElement.MatchElement("%s/%s" % (path, self.id),
        matchData, wordPos, None))
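
The idea behind the workaround can be checked without the AMiner modules. The following is a minimal sketch, assuming plain Python strings as input. The function name match_longest is hypothetical, and sorting indices instead of words is a variation on the code above, not part of it.

```python
def match_longest(wordlist, data):
    """Return (original_index, word) for the longest word that data starts with."""
    # Sort the indices, not the words, so the original position survives
    # the reordering and no second lookup in the list is needed.
    order = sorted(range(len(wordlist)),
                   key=lambda i: len(wordlist[i]), reverse=True)
    for pos in order:
        if data.startswith(wordlist[pos]):
            return pos, wordlist[pos]
    return None


print(match_longest(['a', 'b', 'c', 'aa'], 'aa d'))
print(match_longest(['a', 'b', 'c', 'aa'], 'a d'))
```

The returned index always refers to the list as the user wrote it, so downstream consumers of the index are not affected by the internal ordering.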

Roman Fiedler (roman-fiedler-deactivatedaccount) wrote :

Since the index in the wordlist is the result of the match, re-sorting the list after the model element has been created is not an option. Hence the wordlist has to have the correct shape beforehand.

Done:
* Updated module documentation to state the relevance of sorting for model element output
* Added check to constructor to verify that the wordlist has a sane structure before using it

See
https://git.launchpad.net/logdata-anomaly-miner/commit/?id=6a7cf5650034ed62833f05bf91ce7afc6c2660dd
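
A constructor-time check of this kind could look roughly like the sketch below. It is an assumption about what such a check might do, not the code from the commit. The function name check_wordlist and the choice of ValueError are hypothetical.

```python
def check_wordlist(wordlist):
    """Raise ValueError if an earlier word is a prefix of a later one.

    With first-match semantics the later word could then never be matched.
    """
    for pos, earlier in enumerate(wordlist):
        for later in wordlist[pos + 1:]:
            if later.startswith(earlier):
                raise ValueError(
                    "word %r is shadowed by %r at a lower position"
                    % (later, earlier))


check_wordlist(['aa', 'a', 'b', 'c'])      # accepted: longer word comes first
try:
    check_wordlist(['a', 'b', 'c', 'aa'])  # rejected: 'a' shadows 'aa'
except ValueError as err:
    print(err)
```

Rejecting a badly ordered list at construction time keeps the match index stable, at the cost of requiring the user to order the list correctly.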

Changed in logdata-anomaly-miner:
status: New → Confirmed
assignee: nobody → Roman Fiedler (roman-fiedler)
importance: Undecided → Medium
status: Confirmed → Fix Committed
Changed in logdata-anomaly-miner:
status: Fix Committed → Fix Released