Splitting Vec<u8> of non-Unicode characters #928
-
Hello, I have a `u8` vector that may contain non-Unicode bytes. I need to split it wherever there is a `\0`, but a `\n` or `\r\n` must not be a split point; the piece should simply continue past them. Can you please show me an example?
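A minimal std-only sketch of the split described above, assuming the data is already in memory as a `&[u8]`; `split_on_nul` is a hypothetical helper name. Slices have their own `split` method that takes a predicate, so nothing is decoded as UTF-8 and only the NUL byte acts as a separator.

```rust
/// Splits `data` at every NUL byte (0x00). Only NUL is a separator, so
/// `\n` and `\r\n` stay inside the pieces, and bytes that are not valid
/// UTF-8 (such as 0xFF) are returned unchanged.
fn split_on_nul(data: &[u8]) -> Vec<&[u8]> {
    data.split(|&b| b == 0).collect()
}

fn main() {
    let data: &[u8] = b"one\r\ntwo\0\xFFthree\nfour\0five";
    let pieces = split_on_nul(data);
    assert_eq!(pieces.len(), 3);
    assert_eq!(pieces[0], b"one\r\ntwo");
    assert_eq!(pieces[1], b"\xFFthree\nfour");
    assert_eq!(pieces[2], b"five");
    for piece in &pieces {
        // Print with non-printable and non-ASCII bytes escaped.
        println!("{}", piece.escape_ascii());
    }
}
```

A trailing NUL produces a final empty piece; add `.filter(|p| !p.is_empty())` before `collect` if that is unwanted.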
Replies: 9 comments 7 replies
-
I think you probably want to use [`bytes::Regex`](https://docs.rs/regex/latest/regex/bytes/struct.Regex.html) for this. As for a specific example, I don't really understand what you're saying. Could you please provide some input and the desired output?
-
I would just like an example of how to build a `Regex` with `Regex::new` from the `regex::bytes` module, so that it covers my question.
2022-11-17 13:32 GMT+01:00, Andrew Gallant ***@***.***>:
> I think you probably want to use [`bytes::Regex`](https://docs.rs/regex/latest/regex/bytes/struct.Regex.html) for this.
> As for a specific example, I don't really understand what you're saying. Could you please provide some input and the desired output?
-
It's right there in the link I gave you. :-) For example: https://docs.rs/regex/latest/regex/bytes/struct.Regex.html#method.find
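The `find` method linked above reports where a match is. As a std-only analog (not the `bytes::Regex` API itself), the same positions for a single-byte pattern can be found by scanning the slice directly. `first_nul` and `nul_spans` are hypothetical names; the `(start, end)` pairs are half-open byte ranges, the same shape as the span a regex match covers.

```rust
/// Byte offset of the first NUL, i.e. "where is the first match?".
fn first_nul(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == 0)
}

/// Half-open (start, end) byte ranges of every NUL in `haystack`.
fn nul_spans(haystack: &[u8]) -> Vec<(usize, usize)> {
    haystack
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == 0)
        .map(|(i, _)| (i, i + 1))
        .collect()
}

fn main() {
    // 0xFE is not valid UTF-8; the byte scan does not care.
    let haystack: &[u8] = b"ab\0\xFEc\0";
    assert_eq!(first_nul(haystack), Some(2));
    assert_eq!(nul_spans(haystack), vec![(2, 3), (5, 6)]);
    println!("{:?}", nul_spans(haystack));
}
```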
-
Yesterday I wanted to build with Unicode disabled, and when I did
`let re = regex::bytes::Regex::new(r"0x0").unwrap();`
it didn't work.
2022-11-17 14:18 GMT+01:00, Andrew Gallant ***@***.***>:
> It's right there in the link I gave you. :-) For example:
> https://docs.rs/regex/latest/regex/bytes/struct.Regex.html#method.find
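One thing worth checking in the pattern above: `r"0x0"` is the three-character text `0x0`, so it matches the ASCII characters `0`, `x`, `0`, not a NUL byte. In the regex crate's syntax a two-digit hex escape such as `\x00` denotes a NUL, so the pattern would be written `r"\x00"`. The std-only sketch below does not call the regex crate; it only makes the difference between these strings visible. `pattern_bytes` is a hypothetical helper.

```rust
/// Returns the raw bytes of a pattern string, to show what it really contains.
fn pattern_bytes(pattern: &str) -> &[u8] {
    pattern.as_bytes()
}

fn main() {
    // r"0x0" is three ordinary ASCII characters: '0' (0x30), 'x' (0x78), '0' (0x30).
    assert_eq!(pattern_bytes(r"0x0"), &[0x30u8, 0x78, 0x30]);

    // r"\x00" keeps the backslash: four characters that regex syntax
    // interprets as the hex escape for the NUL byte.
    assert_eq!(pattern_bytes(r"\x00"), b"\\x00");

    // In a normal (non-raw) Rust string literal, "\x00" is already one NUL.
    assert_eq!(pattern_bytes("\x00"), &[0u8]);
}
```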
-
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=65826762b40d386ae495a1781305930f
2022-11-17 14:45 GMT+01:00, Andrew Gallant ***@***.***>:
> This might help: https://jvns.ca/blog/good-questions/
> (I linked to [ESR's version of the same thing](http://www.catb.org/%7Eesr/faqs/smart-questions.html) previously, but I forgot just how patronizing it was.)
-
The output will later be a `Vec<&[u8]>`, but I'm fine with that part. If your regex idea does not stop at non-UTF-16 characters, then you did what I wanted.
2022-11-17 15:10 GMT+01:00, Andrew Gallant ***@***.***>:
> Notice that you don't even need a `bytes::Regex` because a NUL byte is valid UTF-8.
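To illustrate the point that a NUL byte is valid UTF-8, together with `str::as_bytes`: when the input happens to be valid UTF-8, splitting the `&str` on `'\0'` and converting each piece gives exactly the same `Vec<&[u8]>` as splitting the raw bytes, because in UTF-8 the byte 0x00 only ever encodes U+0000. For input that is not valid UTF-8, the byte split is the only option. A sketch; `split_nul_any` is a hypothetical name.

```rust
/// Splits at NUL. Valid UTF-8 input goes through `str::split`, which works
/// because NUL (U+0000) is itself valid UTF-8; anything else is split as
/// raw bytes. Both paths yield the same pieces.
fn split_nul_any(data: &[u8]) -> Vec<&[u8]> {
    match std::str::from_utf8(data) {
        Ok(text) => text.split('\0').map(str::as_bytes).collect(),
        Err(_) => data.split(|&b| b == 0).collect(),
    }
}

fn main() {
    let valid: &[u8] = b"caf\xC3\xA9\0r\xC3\xA9sum\xC3\xA9"; // "café\0résumé"
    let invalid: &[u8] = b"abc\0\xFF\xFEdef";
    assert!(std::str::from_utf8(valid).is_ok());
    assert!(std::str::from_utf8(invalid).is_err());

    // Same result as a plain byte split in both cases.
    for data in [valid, invalid] {
        let bytes_only: Vec<&[u8]> = data.split(|&b| b == 0).collect();
        assert_eq!(split_nul_any(data), bytes_only);
    }
}
```

In practice the byte path alone is enough; the `str` branch is there only to show that the two agree.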
-
I'll try to translate it so you can better understand what I need: in the example I sent you, there may be bytes that are not UTF-8 but may be UTF-16. I need to split them in such a way that the pieces can be collected at the end into a `Vec<&[u8]>` or a `Vec<Vec<u8>>`. I believe that if they are split this way, their contents will be preserved exactly as they are, and it won't matter that they are not valid UTF-8.
2022-11-17 15:14 GMT+01:00, Peter Kubek ***@***.***>:
> The output will later be a `Vec<&[u8]>`, but I'm fine with that part. If your regex idea does not stop at non-UTF-16 characters, then you did what I wanted.
> 2022-11-17 15:10 GMT+01:00, Andrew Gallant ***@***.***>:
>> Notice that you don't even need a `bytes::Regex` because a NUL byte is valid UTF-8.
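Two points from this message can be checked in code, sketched here with a hypothetical `split_owned` helper. First, collecting into `Vec<Vec<u8>>` only copies each piece, and joining the pieces back with the NUL separator reproduces the input byte for byte, so nothing is decoded or altered. Second, a caveat the thread does not discuss: in UTF-16, every character up to U+00FF has a 0x00 byte in its two-byte code unit, so splitting on every single NUL byte cuts UTF-16 text apart.

```rust
/// Splits at NUL and copies each piece into its own owned Vec<u8>.
fn split_owned(data: &[u8]) -> Vec<Vec<u8>> {
    data.split(|&b| b == 0).map(|piece| piece.to_vec()).collect()
}

fn main() {
    let data: &[u8] = b"first\r\n\xFF\0second\0";
    let owned = split_owned(data);
    assert_eq!(owned.len(), 3); // trailing NUL -> final empty piece

    // Re-joining with the separator gives back the exact input bytes.
    assert_eq!(owned.join(&0u8), data);

    // UTF-16LE "hi" is 68 00 69 00: its own zero bytes become split points.
    let utf16le: &[u8] = &[0x68, 0x00, 0x69, 0x00];
    assert_eq!(split_owned(utf16le), vec![vec![0x68u8], vec![0x69], vec![]]);
}
```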
-
Do you think this split will be faster than the normal `split` from std if the file I work with is big?
2022-11-17 15:39 GMT+01:00, Andrew Gallant ***@***.***>:
> A `&str` can be converted to `&[u8]` via [`str::as_bytes`](https://doc.rust-lang.org/std/primitive.str.html#method.as_bytes).
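Whether a regex-based split beats std's `split` on a large file is something to measure rather than guess. A rough std-only timing harness is sketched below; `time_it` is a hypothetical helper, the input is synthetic, and `Instant`-based wall-clock timing is coarse (build with `--release`; a dedicated benchmarking crate gives more reliable numbers). A regex-based splitter would be passed to the same harness for comparison.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times; returns total elapsed time and the last result.
/// `black_box` discourages the optimizer from skipping the work.
fn time_it<F: FnMut() -> usize>(iters: u32, mut f: F) -> (Duration, usize) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        last = black_box(f());
    }
    (start.elapsed(), last)
}

fn main() {
    // Synthetic input: 1,000,000 bytes with a NUL at every 100th position.
    let data: Vec<u8> = (0..1_000_000u32)
        .map(|i| if i % 100 == 99 { 0 } else { b'a' })
        .collect();

    let (elapsed, pieces) = time_it(20, || black_box(&data).split(|&b| b == 0).count());
    assert_eq!(pieces, 10_001); // 10,000 NULs, the last one at the very end
    println!("std split: {pieces} pieces, 20 runs in {elapsed:?}");
}
```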
-
Hello, I want to ask whether `aho_corasick` supports processing non-UTF-8 bytes if I use the `replace_all_with` method.
2022-11-17 16:08 GMT+01:00, Andrew Gallant ***@***.***>:
> Maybe. Benchmark it.
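The thread leaves this question open. For what it's worth, aho-corasick's documentation lists byte-oriented counterparts to its `&str` methods (for example `replace_all_with_bytes` alongside `replace_all_with`); check the exact signature for the version in use. The std-only sketch below is not aho-corasick, which matches many patterns at once efficiently; it only illustrates that a byte-level replace never needs UTF-8. `replace_bytes_naive` is a hypothetical name.

```rust
/// Replaces every non-overlapping occurrence of `needle` in `haystack`
/// with `replacement`, scanning left to right. Works on raw bytes, so the
/// input does not have to be valid UTF-8. Naive O(n * m) matching.
fn replace_bytes_naive(haystack: &[u8], needle: &[u8], replacement: &[u8]) -> Vec<u8> {
    assert!(!needle.is_empty(), "needle must not be empty");
    let mut out = Vec::with_capacity(haystack.len());
    let mut i = 0;
    while i < haystack.len() {
        if haystack[i..].starts_with(needle) {
            out.extend_from_slice(replacement);
            i += needle.len();
        } else {
            out.push(haystack[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // 0xFF stays untouched; only the NUL bytes are replaced.
    let out = replace_bytes_naive(b"a\0b\xFF\0", b"\0", b"|");
    assert_eq!(out, b"a|b\xFF|");
    println!("{}", out.escape_ascii());
}
```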