
Commit fd4175e

Merge pull request #912 from S2P2/fix-join-broken-num
Fix empty string ('') added (in some cases) when using word_tokenize with join_broken_num=True
2 parents a38fd5e + dcd2b47 commit fd4175e


1 file changed: +2 -2 lines changed


pythainlp/tokenize/_utils.py

Lines changed: 2 additions & 2 deletions
@@ -61,8 +61,8 @@ def rejoin_formatted_num(segments: List[str]) -> List[str]:
                 connected_token += segments[segment_idx]
                 pos += len(segments[segment_idx])
                 segment_idx += 1
-
-            tokens_joined.append(connected_token)
+            if connected_token:
+                tokens_joined.append(connected_token)
             match = next(matching_results, None)
         else:
             tokens_joined.append(token)
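A minimal, self-contained sketch of the loop around this change, written only to show how an empty `connected_token` can arise. The regex `_NUM_WITH_SEPARATOR` and the function name `rejoin_formatted_num_sketch` are hypothetical; the real `rejoin_formatted_num` in `pythainlp/tokenize/_utils.py` uses its own pattern and may differ in detail. The empty string appears when a single earlier segment already covered the next regex match, so the inner `while` never runs for it.

```python
import re
from typing import List

# Hypothetical pattern (assumption): digit runs joined by ".", "," or ":".
# The pattern actually used by pythainlp may differ.
_NUM_WITH_SEPARATOR = re.compile(r"\d+(?:[.,:]\d+)+")


def rejoin_formatted_num_sketch(segments: List[str]) -> List[str]:
    """Re-join segments that a tokenizer split inside a formatted number."""
    original = "".join(segments)
    matches = _NUM_WITH_SEPARATOR.finditer(original)
    tokens_joined: List[str] = []
    pos = 0  # character offset where segments[segment_idx] starts in `original`
    segment_idx = 0
    match = next(matches, None)
    while segment_idx < len(segments) and match:
        token = segments[segment_idx]
        if pos >= match.start():
            connected_token = ""
            # Concatenate every segment that starts before the match ends.
            while pos < match.end() and segment_idx < len(segments):
                connected_token += segments[segment_idx]
                pos += len(segments[segment_idx])
                segment_idx += 1
            # Guard corresponding to this commit: if an earlier segment already
            # consumed this whole match, nothing was collected, so skip "".
            if connected_token:
                tokens_joined.append(connected_token)
            match = next(matches, None)
        else:
            tokens_joined.append(token)
            segment_idx += 1
            pos += len(token)
    # Segments after the last match are kept unchanged.
    tokens_joined += segments[segment_idx:]
    return tokens_joined


print(rejoin_formatted_num_sketch(["price", "1", ",", "000", "baht"]))
print(rejoin_formatted_num_sketch(["1.5x2.3", "y"]))
```

For `["1.5x2.3", "y"]`, the first segment spans both matches (`1.5` and `2.3`), so when the loop reaches the second match `pos` is already at its end. With the `if connected_token:` guard removed, this sketch would return `["1.5x2.3", "", "y"]`; with it, the result is `["1.5x2.3", "y"]`.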
