Acronyms at the end of sentence are incorrectly parsed #13

@dinamic

The library has been really useful to us for breaking text into sentences. I've noticed one issue so far. If a sentence ending with an acronym is the last sentence in the text, everything is okay, but if another sentence follows it, the result is incorrect. It gets even worse if the acronym is capitalized.

Here it works fine:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m..', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(1) {
  [0] =>
  string(25) "Let's meet at 10:00 a.m.."
}

But it fails in this one, where the split lands inside the acronym:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m.. How about Greg?', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(2) {
  [0] =>
  string(22) "Let's meet at 10:00 a."
  [1] =>
  string(19) "m.. How about Greg?"
}

With a capitalized acronym it fails differently, and the text is not split at all:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 A.M.. How about Greg?', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(1) {
  [0] =>
  string(41) "Let's meet at 10:00 A.M.. How about Greg?"
}
