The library has been really useful to us for breaking text into sentences. I've noticed one issue so far: if the last sentence of the text ends with a dotted acronym such as "a.m.", everything is okay, but if another sentence follows it, the result is incorrect. It gets even worse if the acronym is capitalized.
Here it works fine:
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m..', \Sentence::SPLIT_TRIM);
var_dump($sentences);
array(1) {
  [0] =>
  string(25) "Let's meet at 10:00 a.m.."
}
But it fails here, where another sentence follows and the split lands inside the acronym:
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m.. How about Greg?', \Sentence::SPLIT_TRIM);
var_dump($sentences);
array(2) {
  [0] =>
  string(22) "Let's meet at 10:00 a."
  [1] =>
  string(19) "m.. How about Greg?"
}
With a capitalized acronym it fails differently, and the text is not split at all:
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 A.M.. How about Greg?', \Sentence::SPLIT_TRIM);
var_dump($sentences);
array(1) {
  [0] =>
  string(41) "Let's meet at 10:00 A.M.. How about Greg?"
}
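For reference, the result I would expect in both failing cases is two sentences, with the break right after `a.m..` (or `A.M..`). Here is a minimal sketch of that expected behavior. Note that `expectedSplit()` is a hypothetical helper written just for this report, not part of the library, and its naive regex (break on whitespace that follows `.`, `!` or `?` and precedes an uppercase letter) is an assumption for illustration only. It is not how the library works and would mis-split ordinary abbreviations such as "Mr. Smith".

```php
<?php
// Hypothetical helper for illustration only; NOT the library's algorithm.
// It splits on whitespace that follows a sentence terminator (. ! ?)
// and is immediately followed by an uppercase letter.
function expectedSplit(string $text): array
{
    return preg_split('/(?<=[.!?])\s+(?=[A-Z])/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// The three inputs from this report.
$inputs = [
    "Let's meet at 10:00 a.m..",
    "Let's meet at 10:00 a.m.. How about Greg?",
    "Let's meet at 10:00 A.M.. How about Greg?",
];

foreach ($inputs as $input) {
    // Print each expected sentence list as compact JSON, one input per line.
    echo json_encode(expectedSplit($input)), PHP_EOL;
}
```

With this sketch the first input stays a single sentence, and the other two each yield two sentences: the part ending in `a.m..` or `A.M..`, followed by `How about Greg?`.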