-
Notifications
You must be signed in to change notification settings - Fork 95
URI's are being double encoded, first by _call_uri, then again by URI::Template #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, * The escaping (percent encoding) of chars in the 128 .. 255 range
passed to the URI constructor or when setting URI parts using the
accessor methods depend on the state of the internal UTF8 flag (see
utf8::is_utf8) of the string passed. If the UTF8 flag is set the UTF-8
encoded version of the character is percent encoded. If the UTF8 flag
isn't set the Latin-1 version (byte) of the character is percent
encoded. This basically exposes the internal encoding of Perl strings.
Thus, it would be better to check if the key is utf8 using utf8::is_utf8. Rgds. |
|
Note that statement is listed under BUGS as it's a clear bug in URI. It's better to use utf8::upgrade and utf8::downgrade to put strings in a consistent state for buggy modules like this. |
|
@fskale seems like you have a good idea of what to do, but I just wanted to call out this: |
|
I just want to comment on "Imho the whole utf8 handling has to be rewritten to check whether the string is utf8 or not as mentioned in the URI manpage." - it's impossible to check if a string is utf8 or not. is_utf8 tells you how the string is being internally represented, not whether it's encoded. So like I said the best way is to get the internal representation in a consistent state so this module won't get confused. |
|
@fskale, @Grinnz, @veqryn : Once we have an agreement / way to go with this, we can release to CPAN. If you can test the patch against https://github.com/pplu/aws-sdk-perl/tree/release/0.37 (that has some refactors applied to service handling) it would be awesome. |
|
@fskale I am not sure I understand your comments. Can you provide a working PR or code that will solve this library? Otherwise, I am just going to add a second regex to convert |
|
This is incorrect. utf8::is_utf8 does not tell you whether it needs to be encoded, but how it is internally represented. A string can contain non-ascii characters but not have the UTF8 bit set, and vice versa - your test case only covers one possibility. You can instead run utf8::upgrade $str unconditionally (this operates in place and does nothing if the string is already upgraded) so that it is always interpreted the same way when it eventually gets passed to URI. |
|
@fskale This isn't supposed to be complicated. Stop writing scripts and command line things, and just write a PR please. |
|
Guys, can we please reach an agreement on this so it can be merged and released? This issue is causing problems for us in production on a regular basis. I'd prefer to not have to fork PAWS to get this fixed. |
|
My PR is ready to go. If anyone would like to change it, they are welcome to open a separate PR. |
|
This pr fixes the general issue of double url encoding, and is not necessarily targeting any utf8 issues. I suggest that this PR be merged to fix the general case, and if there is still an issue with utf handling, that be addressed in a separate PR, especially since there has been no word from the utf8 camp in 19 days. @pplu Any chance this can get merged soon? Just trying to get this PR back in motion, as it seems to have stalled, and it will fix issues for a lot of folks. |
|
Just a suggestion: to fix a general case of double-encoding URLs, you could use uri_unescape. This regex only unescapes one aspect of it, double encoding could have other side effects. |
|
For what it's worth I don't actually think the issue is related to UTF-8. URI::Template is documented that it will url-escape the vars you pass it; since the vars are already url-escaped, they become double escaped. Perhaps the solution is not to uri_escape_utf8 the vars at all before passing them to the template. |
|
Deeper investigation into the double-encoding issue reveals that there is a bug in the module URI::Template, used for the uri creation. It should, according to the RFC it follows, allow the caller to pass in already-uri-encoded strings, and not re-encode them, but it doesn't We (Shadowcat), intend to fix this by patching URI::Template to follow the RFC more closely. Some characters should, according to the AWS docs, be avoided completely, we will also add code that disallows these characters to be used in the URI parameters. See also: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html |
I'm not sure how this could be "fixed". Much like UTF-8 encoded strings in Perl, there is no way to tell whether a string is already URI-encoded, because many valid strings are also valid URI-encodings of strings. The user must specify (or use a function documented to do such) whether the string they are passing is already encoded. (But maybe the RFC you refer to already accounts for this?) |
|
Makes me wonder if we should adopt the '...' thing for literal, i.e. passing a template arg as a scalar means "encode this if required", passing it as a scalarref means "just dump it in there" |
|
@Grinnz I'm not sure if you've looked at the module/RFC ? There is a way to send "always encode this" and a way to send "I encoded this already (at least some of it)", so if someone was intending to send strings that contain %XX and wanted it further encoded, they wouldn't use the "I already encoded it" variant. In Paws we're pre-encoding (because AWS is a bit more peculiar about what should be encoded, I assume), and we're wanting the URI::Template to not encode the "/" in for Keys. Therefore we're wanting it not re-encoded,. If someone deliberately sent %xx as part of a Key name in Paws, we will have already encoded it, so it will come out correctly. Edit: It seems "%" is to be avoided in Key names, so that shouldn't be a problem we'll have that way either: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html |
|
I think I have fixed this with a change to botocore/botocore/data/s3/2006-03-01/service-2.json for example I did try and use 'id' in the 'querystring' but I could only get it to work with the request URI was |
This only fixes a very specific use, imo it would be better to fix the generic use (eg by sorting URI Template or the code around the encoding) |
|
True. I did find the same problems on most of the 'Configuaration' Put and Get commands. So far just a change to boto to get past the first door. I have checked the py code and they encode a little differntly that we do and have a few spots where they kludge things together Anyway do you have a solution for the above on the perl side of things? |
|
The URI::Template fix for this has been made and released. The change to Paws is in tests/stabilisation branch by Shadowcat, awaiting merge. It also needs a warning adding, see: #276 (comment) |
|
I'm closing this PR as the issue it was for has been resolved in another PR (#265). Any extra items discussed should be in separate issues. |
URI's (such as S3 Bucket Keys) are being double encoded.
For example, if I start with this bucket key:
trial/blarg/hello@joe.com/something4.msghttps://github.com/pplu/aws-sdk-perl/blob/master/lib/Paws/Net/RestXmlCaller.pm#L73 will turn it int:
trial/blarg/hello%40joe.com/something4.msgThen URI::Template gets called here https://github.com/pplu/aws-sdk-perl/blob/master/lib/Paws/Net/RestXmlCaller.pm#L82 , which turns it into:
trial/blarg/hello%2540joe.com/something4.msgThis bug is evident here: #111 , and probably a few other issues too.
Here is an excerpt from a debug session showing the issue:
@pplu Can you please cut a new release if you merge this? This is a major blocker for our use of Paws.
Thank you,
V
This change is