Hamlet

Run parsers and generate hpz files

Bug #254443 reported by andrew on 2008-08-03

Affects   Status        Importance   Assigned to                     Milestone
Hamlet    In Progress   High         WVU Modeling Intelligence Lab

Bug Description

We now have a fully functioning pipeline but are lacking the data to use with it. Each parser will create a .hpz file that is used as input to the pre-processor. We need to generate these files.

This has already been done for the law dataset (or at least half of it). It also needs to be done for the following datasets:

* STEP
* Text/HTML
* Java

A large task for this will be finding the data to run the parsers on.

* STEP - There is a folder on wisp at hamlet/data/STEP containing a collection of STEP files we extracted from the nara data. This would be a good starting point. Talk to Greg if you need assistance with this.

* Text/HTML - I'm not sure about this one. Should we use the data from the nara folks? Talk to Adam and see what he has to say, since he wrote this parser.

* Java - This is your cup of tea. The good thing about this parser is that it can be run on a wide variety of datasets. I recommend you start with Weka. Try to build an hpz file for at least three different large open source projects.

I'm going to ask that Adam and Greg comment on this bug with instructions/recommendations for using their parsers. Be on the lookout for that.

andrew (andrew-j-matheny) on 2008-08-03

Changed in hamlet:
assignee:	nobody → mhull1

Gregory Gay (gregoryg) wrote on 2008-08-04:

About the only thing I can think to add about the STEP parser is that you might need to change the hard-coded directory in it to match where you store the STEP files. Just give me a shout if you need any help.
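
For illustration only, one way to avoid editing a hard-coded path is to read the directory from the environment and fall back to a default. This is a rough sketch, not the actual parser code: the HAMLET_STEP_DIR variable and the step_dir and find_step_files helpers are hypothetical names, and the sketch assumes the STEP files use the common .stp or .step extensions.

```python
import os
from pathlib import Path

# Default taken from the folder named in the bug description; adjust as needed.
DEFAULT_STEP_DIR = "hamlet/data/STEP"


def step_dir(env=os.environ):
    """Return the STEP data directory, preferring a (hypothetical) env override."""
    return Path(env.get("HAMLET_STEP_DIR", DEFAULT_STEP_DIR))


def find_step_files(root):
    """List files under root whose extension is .stp or .step (case-insensitive)."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".stp", ".step"}
    )
```

With something like this, pointing the parser at a different checkout would only need HAMLET_STEP_DIR set before the run, and find_step_files(step_dir()) would give the list of inputs to feed to the parser.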

Gregory Gay (gregoryg) wrote on 2008-08-12:

No longer using .hpz files.

Changed in hamlet:
assignee:	mhull1 → wvumil
importance:	Undecided → High
status:	New → In Progress

Gregory Gay (gregoryg) wrote on 2008-08-15:

STEP corpora added to svn.
