logdata-anomaly-miner - lightweight tool for log checking, log analysis

Problem in FixedWordlistDataModelElement.py if a list element is equal to the beginning of another element

Bug #1712789 reported by Markus Wurzenberger on 2017-08-24

This bug affects 1 person

Affects:	logdata-anomaly-miner - lightweight tool for log checking, log analysis
Status:	Fix Released
Importance:	Medium
Assigned to:	Unassigned
Milestone:	none

Bug Description

Example Parser:

model=SequenceModelElement('example', [
  FixedWordlistDataModelElement('element1', ['a', 'b', 'c', 'aa']),
  FixedDataModelElement('element2', ' d')
])

Line:

aa d

Output:

Unparsed atom received

The problem in FixedWordlistDataModelElement.py is that it iterates through self.wordlist, checks whether the data to be parsed starts with the current word, and stops at the first word that matches. In our example the parser first checks whether 'aa d' starts with 'a', which is the case, so it takes 'a' as the value of the path '/model/example/element1'. Next the AMiner checks whether the remaining data 'a d' starts with ' d'. It does not, so the line is considered unparsed. The correct output would be:

/model/example/element1: 4 ('aa')
/model/example/element1/element2 : d (' d')
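
The behaviour described above can be reproduced without the AMiner itself. The following is only a minimal sketch: first_match is a hypothetical helper that imitates the first-hit iteration over the wordlist, not the actual code of FixedWordlistDataModelElement.py.

```python
def first_match(wordlist, data):
    """Return (position, word) of the first list entry that data starts with.

    Hypothetical helper imitating the first-hit lookup described in the
    report; returns None when no entry matches.
    """
    for pos, word in enumerate(wordlist):
        if data.startswith(word):
            return pos, word
    return None


wordlist = ['a', 'b', 'c', 'aa']
line = 'aa d'

# 'a' is found before 'aa' is ever tested, so only one character is consumed.
pos, word = first_match(wordlist, line)
remaining = line[len(word):]
print(pos, repr(word), repr(remaining))

# The following fixed element ' d' can no longer match the remaining data.
print(remaining.startswith(' d'))
```

Because 'a' comes before 'aa' in the list, the longer word is shadowed and the rest of the sequence fails.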

My workaround was to generate a temporary copy of the list, sorted by element length with the longest element first, and to return as 'wordPos' the position of the matched element in the original list. The code looks as follows:

import MatchElement

class FixedWordlistDataModelElement:
  def __init__(self, id, wordlist):
    self.id=id
    self.wordlist=wordlist

  def getChildElements(self):
    return(None)

  def getMatchElement(self, path, matchContext):
    """@return None when there is no match, MatchElement otherwise."""
    data=matchContext.matchData
    # Try longer words first so that 'aa' is tested before its prefix 'a'.
    tmp_wordlist=sorted(self.wordlist, key=len, reverse=True)
    matchData=None
    wordPos=0

    for word in tmp_wordlist:
      if data.startswith(word):
        matchData=word
        # Report the position in the original, unsorted list.
        wordPos=self.wordlist.index(word)
        break

    if matchData is None: return(None)

    matchContext.update(matchData)
    return(MatchElement.MatchElement("%s/%s" % (path, self.id),
        matchData, wordPos, None))


Roman Fiedler (roman-fiedler-deactivatedaccount) wrote on 2017-08-28:

Because the index in the wordlist is the result of the match, re-sorting the list after the model element has been created is not an option. Hence the wordlist must already have the correct shape when it is passed in.

Done:
* Updated the module documentation to state the relevance of sorting for the model element output
* Added a check to the constructor to verify that the wordlist has a sane structure before it is used
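
A constructor check of this kind could look roughly like the sketch below. It is an assumption about the general approach, not the code from the referenced commit: check_wordlist is a hypothetical name, and the choice of ValueError and the message text are illustrative only.

```python
def check_wordlist(wordlist):
    """Reject wordlists in which an earlier word is a prefix of a later one.

    Hypothetical validation helper: with first-hit matching, such a later
    word could never be returned because the earlier word always wins.
    """
    for ref_pos, ref_word in enumerate(wordlist):
        for test_word in wordlist[ref_pos + 1:]:
            if test_word.startswith(ref_word):
                raise ValueError(
                    'Word %r would be shadowed by word %r at lower position'
                    % (test_word, ref_word))


# A list ordered so that longer words precede their prefixes passes the check.
check_wordlist(['aa', 'a', 'b', 'c'])

# The list from the bug report is rejected, since 'a' shadows 'aa'.
try:
    check_wordlist(['a', 'b', 'c', 'aa'])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

With such a check in place, the example from the report fails when the model is constructed rather than silently producing unparsed lines, and the author of the model decides the order (and therefore the indices) of the words.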

See
https://git.launchpad.net/logdata-anomaly-miner/commit/?id=6a7cf5650034ed62833f05bf91ce7afc6c2660dd

Changed in logdata-anomaly-miner:
status:	New → Confirmed
assignee:	nobody → Roman Fiedler (roman-fiedler)
importance:	Undecided → Medium
status:	Confirmed → Fix Committed

Markus Wurzenberger (mwurzenberger) on 2019-06-26

Changed in logdata-anomaly-miner:
status:	Fix Committed → Fix Released
