This repository was archived by the owner on Feb 19, 2020. It is now read-only.

Remove whitespace-only sentences in NewLineSentenceSegmenter #59

Open
hiroshinoji wants to merge 1 commit into dlwh:master from hiroshinoji:filter_at_newline_segmenter

Conversation

@hiroshinoji

NewLineSentenceSegmenter did not trim each segmented sentence, so a whitespace-only segment could reach the parser as an empty sentence. For example, the following command always printed an error:

$ echo I live in Osaka . | java -Xmx4g -cp assembly.jar epic.parser.ParseText --model parsers/SpanModel-300.parser --sentences newline --tokens whitespace
(TOP (S (NP (PRP He) ) (VP (VBZ lives)  (PP (IN in)  (NP (NNP Osaka) )))))
### Could not tag Vector(), because No parse for Vector(): infinite partition... epic.parser.projections.ChartProjector$class.project(ChartProjector.scala:36);epic.parser.projections.AnchoredRuleMarginalProjector.project(EnumeratedAnchoring.scala:78)

I added a filter for empty sentences, as in MLSentenceSegmenter, which avoids this problem by trimming every sentence. With this change, the error no longer appears.
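
As a rough illustration of the idea (not the actual patch), the trim-then-filter step could look like the stand-alone sketch below. The object name `NewlineSegmentSketch` and the method `segment` are hypothetical and are not part of epic's API. The sketch only assumes that segmentation amounts to splitting on newlines.

```scala
// Hypothetical stand-alone sketch; this is NOT the code of epic's
// NewLineSentenceSegmenter, only an illustration of trim-then-filter.
object NewlineSegmentSketch {

  /** Split on newlines, trim each piece, and drop pieces that are
    * empty after trimming (i.e. whitespace-only lines).
    */
  def segment(text: String): IndexedSeq[String] =
    text.split("\n").toIndexedSeq
      .map(_.trim)          // remove leading/trailing whitespace
      .filter(_.nonEmpty)   // discard sentences with no content left

  def main(args: Array[String]): Unit = {
    // The second line contains only spaces and would otherwise become
    // an empty sentence with no tokens.
    val input = "I live in Osaka .\n   \nHe lives in Osaka .\n"
    segment(input).foreach(println)
  }
}
```

Without the `filter` step, the whitespace-only line would survive as an empty string and tokenize to an empty sequence, which matches the `Vector()` reported in the error above.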
