According to the documentation in https://www.tensorflow.org/api_docs/python/tf/strings/split:
If sep is None or an empty string, consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
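However, when the delimiter is empty, the current implementation splits the string character by character instead of on runs of whitespace, so every character (whitespace included) becomes its own token.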
onnxruntime-extensions/operators/text/string_split.cc
Lines 36 to 50 in 98fb96d
    if (delimiter.size() == 0) {
      char word[2] = "a";
      for (int64_t row = 0; row < dimensions[0]; ++row) {
        const std::string& str = X[row];
        if (str.empty())
          continue;
        maxc = str.size() > maxc ? str.size() : maxc;
        for (auto it = str.begin(); it != str.end(); ++it) {
          word[0] = *it;
          words.push_back(word);
          indices.push_back(row);
          indices.push_back(std::distance(str.begin(), it));
        }
      }
    } else {
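For comparison, here is a minimal sketch of the behavior the TensorFlow documentation describes for an empty separator: runs of whitespace act as one separator, and no empty tokens are emitted for leading or trailing whitespace. `SplitResult` and `SplitOnWhitespace` are hypothetical names used only for illustration; they are not part of onnxruntime-extensions. The sketch assumes ASCII whitespace as classified by `std::isspace`, and it assumes the second index of each pair should be the token position within the row, which is one possible reading of the `words` / `indices` / `maxc` outputs in the snippet above.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type mirroring the words / indices / maxc outputs
// seen in the snippet above. Not part of onnxruntime-extensions.
struct SplitResult {
  std::vector<std::string> words;  // tokens from all rows, in order
  std::vector<int64_t> indices;    // flattened (row, token position) pairs
  int64_t max_columns = 0;         // largest token count found in any row
};

// Splits each input string on runs of ASCII whitespace.
// Empty strings and whitespace-only strings contribute no tokens.
SplitResult SplitOnWhitespace(const std::vector<std::string>& X) {
  SplitResult out;
  for (int64_t row = 0; row < static_cast<int64_t>(X.size()); ++row) {
    const std::string& str = X[row];
    int64_t col = 0;
    size_t pos = 0;
    while (pos < str.size()) {
      // Skip a run of whitespace (this also handles leading whitespace).
      while (pos < str.size() &&
             std::isspace(static_cast<unsigned char>(str[pos]))) {
        ++pos;
      }
      // Consume one token of non-whitespace characters.
      const size_t start = pos;
      while (pos < str.size() &&
             !std::isspace(static_cast<unsigned char>(str[pos]))) {
        ++pos;
      }
      if (pos > start) {
        out.words.push_back(str.substr(start, pos - start));
        out.indices.push_back(row);
        out.indices.push_back(col++);
      }
    }
    if (col > out.max_columns) out.max_columns = col;
  }
  // Every token has exactly one (row, position) pair.
  assert(out.indices.size() == 2 * out.words.size());
  return out;
}
```

Unlike the per-character loop, this records the token position rather than the character offset as the second index, so the dense output width is bounded by the number of tokens in the longest row rather than by `str.size()`.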