Skip to content

Conversation

@veqryn
Copy link

@veqryn veqryn commented Feb 22, 2018

URI's (such as S3 Bucket Keys) are being double encoded.

For example, if I start with this bucket key:
trial/blarg/hello@joe.com/something4.msg

https://github.com/pplu/aws-sdk-perl/blob/master/lib/Paws/Net/RestXmlCaller.pm#L73 will turn it int:
trial/blarg/hello%40joe.com/something4.msg

Then URI::Template gets called here https://github.com/pplu/aws-sdk-perl/blob/master/lib/Paws/Net/RestXmlCaller.pm#L82 , which turns it into:
trial/blarg/hello%2540joe.com/something4.msg

This bug is evident here: #111 , and probably a few other issues too.

Here is an excerpt from a debug session showing the issue:

Paws::Net::RestXmlCaller::_call_uri(/usr/local/lib/perl5/site_perl/5.24.0/Paws/Net/RestXmlCaller.pm:83):
83:         return $uri;
  DB<3> x $uri_template
0  '/{Bucket}/{+Key}'
  DB<4> x $vars
0  HASH(0x3ae7218)
   'Bucket' => 'mybucket.com'
   'Key' => 'trial/blarg/hello%40joe.com/something4.msg'
  DB<5> x $uri
0  URI::_generic=SCALAR(0x3abb680)
   -> '/mybucket.com/trial/blarg/hello%2540joe.com/something4.msg'

@pplu Can you please cut a new release if you merge this? This is a major blocker for our use of Paws.

Thank you,
V


This change is Reviewable

@fskale
Copy link

fskale commented Mar 28, 2018

Hi,
is there any forthcoming in the release of the fix ?
We ran into the same problem.
When doing the url_escaping (utf8 keys), the key will get escaped not one but 4 times !
Also, the perl URI manpage clearly states:

*   The escaping (percent encoding) of chars in the 128 .. 255 range
        passed to the URI constructor or when setting URI parts using the
        accessor methods depend on the state of the internal UTF8 flag (see
        utf8::is_utf8) of the string passed. If the UTF8 flag is set the UTF-8
        encoded version of the character is percent encoded. If the UTF8 flag
        isn't set the Latin-1 version (byte) of the character is percent
        encoded. This basically exposes the internal encoding of Perl strings.

Thus, it would be better to check if the key is utf8 using utf8::is_utf8.
If it is, then encode otherwise not.

Rgds.
Franz

@Grinnz
Copy link
Contributor

Grinnz commented Mar 28, 2018

Note that statement is listed under BUGS as it's a clear bug in URI. It's better to use utf8::upgrade and utf8::downgrade to put strings in a consistent state for buggy modules like this.

@veqryn
Copy link
Author

veqryn commented Mar 28, 2018

@fskale seems like you have a good idea of what to do, but I just wanted to call out this:
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
example: https://github.com/aws/aws-sdk-go/blob/master/private/protocol/rest/build.go#L31-L42

@veqryn
Copy link
Author

veqryn commented Mar 28, 2018

@fskale
It is probably URI::Template that is messing it up.
In any case, feel free to improve on my PR.

@pplu will you be considering this PR or the underlying bug any time soon?

@Grinnz
Copy link
Contributor

Grinnz commented Mar 28, 2018

I just want to comment on "Imho the whole utf8 handling has to be rewritten to check whether the string is utf8 or not as mentioned in the URI manpage." - it's impossible to check if a string is utf8 or not. is_utf8 tells you how the string is being internally represented, not whether it's encoded. So like I said the best way is to get the internal representation in a consistent state so this module won't get confused.

@pplu
Copy link
Owner

pplu commented Mar 29, 2018

@fskale, @Grinnz, @veqryn : Once we have an agreement / way to go with this, we can release to CPAN. If you can test the patch against https://github.com/pplu/aws-sdk-perl/tree/release/0.37 (that has some refactors applied to service handling) it would be awesome.

@veqryn
Copy link
Author

veqryn commented Mar 29, 2018

@fskale I am not sure I understand your comments. Can you provide a working PR or code that will solve this library?

Otherwise, I am just going to add a second regex to convert ~ back into their unencoded form.

@Grinnz
Copy link
Contributor

Grinnz commented Mar 30, 2018

This is incorrect. utf8::is_utf8 does not tell you whether it needs to be encoded, but how it is internally represented. A string can contain non-ascii characters but not have the UTF8 bit set, and vice versa - your test case only covers one possibility. You can instead run utf8::upgrade $str unconditionally (this operates in place and does nothing if the string is already upgraded) so that it is always interpreted the same way when it eventually gets passed to URI.

@veqryn
Copy link
Author

veqryn commented Mar 30, 2018

@fskale This isn't supposed to be complicated. Stop writing scripts and command line things, and just write a PR please.

@gadgetjunkie
Copy link

Guys, can we please reach an agreement on this so it can be merged and released? This issue is causing problems for us in production on a regular basis. I'd prefer to not have to fork PAWS to get this fixed.

@veqryn
Copy link
Author

veqryn commented Apr 17, 2018

My PR is ready to go. If anyone would like to change it, they are welcome to open a separate PR.

@gadgetjunkie
Copy link

This pr fixes the general issue of double url encoding, and is not necessarily targeting any utf8 issues.

I suggest that this PR be merged to fix the general case, and if there is still an issue with utf handling, that be addressed in a separate PR, especially since there has been no word from the utf8 camp in 19 days.

@pplu Any chance this can get merged soon? Just trying to get this PR back in motion, as it seems to have stalled, and it will fix issues for a lot of folks.

@Grinnz
Copy link
Contributor

Grinnz commented Apr 17, 2018

Just a suggestion: to fix a general case of double-encoding URLs, you could use uri_unescape. This regex only unescapes one aspect of it, double encoding could have other side effects.

@Grinnz
Copy link
Contributor

Grinnz commented Apr 17, 2018

For what it's worth I don't actually think the issue is related to UTF-8. URI::Template is documented that it will url-escape the vars you pass it; since the vars are already url-escaped, they become double escaped. Perhaps the solution is not to uri_escape_utf8 the vars at all before passing them to the template.

@castaway
Copy link
Collaborator

castaway commented Jul 9, 2018

Deeper investigation into the double-encoding issue reveals that there is a bug in the module URI::Template, used for the uri creation. It should, according to the RFC it follows, allow the caller to pass in already-uri-encoded strings, and not re-encode them, but it doesn't

We (Shadowcat), intend to fix this by patching URI::Template to follow the RFC more closely.

Some characters should, according to the AWS docs, be avoided completely, we will also add code that disallows these characters to be used in the URI parameters. See also: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html

@Grinnz
Copy link
Contributor

Grinnz commented Jul 10, 2018

It should, according to the RFC it follows, allow the caller to pass in already-uri-encoded strings, and not re-encode them, but it doesn't

I'm not sure how this could be "fixed". Much like UTF-8 encoded strings in Perl, there is no way to tell whether a string is already URI-encoded, because many valid strings are also valid URI-encodings of strings. The user must specify (or use a function documented to do such) whether the string they are passing is already encoded. (But maybe the RFC you refer to already accounts for this?)

@shadowcat-mst
Copy link

Makes me wonder if we should adopt the '...' thing for literal, i.e. passing a template arg as a scalar means "encode this if required", passing it as a scalarref means "just dump it in there"

@castaway
Copy link
Collaborator

castaway commented Jul 11, 2018

@Grinnz I'm not sure if you've looked at the module/RFC ? There is a way to send "always encode this" and a way to send "I encoded this already (at least some of it)", so if someone was intending to send strings that contain %XX and wanted it further encoded, they wouldn't use the "I already encoded it" variant.

In Paws we're pre-encoding (because AWS is a bit more peculiar about what should be encoded, I assume), and we're wanting the URI::Template to not encode the "/" in for Keys. Therefore we're wanting it not re-encoded,. If someone deliberately sent %xx as part of a Key name in Paws, we will have already encoded it, so it will come out correctly.

Edit: It seems "%" is to be avoided in Key names, so that shouldn't be a problem we'll have that way either: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html

@byterock
Copy link
Collaborator

byterock commented Nov 3, 2019

I think I have fixed this with a change to botocore/botocore/data/s3/2006-03-01/service-2.json

for example

GetBucketInventoryConfiguration":{
      "name":"GetBucketInventoryConfiguration",
      "http":{
        "method":"GET",
--        "requestUri":"/{Bucket}?inventory"
++        "requestUri":"/{Bucket}?inventory&id={Id}"
      },
…
       "Id":{
          "shape":"InventoryId",
          "documentation":"<p>The ID used to identify the inventory configuration.</p>",
-          "location":"querystring",
-          "locationName":"id"
+          "location":"uri",
+          "locationName":"Id"
        }

I did try and use 'id' in the 'querystring' but I could only get it to work with the request URI was
'/dev.cargotel.paws?inventory=&id=test'

@castaway
Copy link
Collaborator

castaway commented Nov 5, 2019

I think I have fixed this with a change to botocore/botocore/data/s3/2006-03-01/service-2.json

for example

GetBucketInventoryConfiguration":{
      "name":"GetBucketInventoryConfiguration",
      "http":{
        "method":"GET",
--        "requestUri":"/{Bucket}?inventory"
++        "requestUri":"/{Bucket}?inventory&id={Id}"
      },
…
       "Id":{
          "shape":"InventoryId",
          "documentation":"<p>The ID used to identify the inventory configuration.</p>",
-          "location":"querystring",
-          "locationName":"id"
+          "location":"uri",
+          "locationName":"Id"
        }

I did try and use 'id' in the 'querystring' but I could only get it to work with the request URI was
'/dev.cargotel.paws?inventory=&id=test'

This only fixes a very specific use, imo it would be better to fix the generic use (eg by sorting URI Template or the code around the encoding)

@byterock
Copy link
Collaborator

byterock commented Nov 5, 2019

True. I did find the same problems on most of the 'Configuaration' Put and Get commands.

So far just a change to boto to get past the first door.

I have checked the py code and they encode a little differntly that we do and have a few spots where they kludge things together

Anyway do you have a solution for the above on the perl side of things?
AWS takes the ''/dev.cargotel.paws?inventory=&id=test'' with no problems so I am going to stick with with as it shows up many more bugs that I have been nattering away at.

@castaway
Copy link
Collaborator

castaway commented Nov 6, 2019

The URI::Template fix for this has been made and released. The change to Paws is in tests/stabilisation branch by Shadowcat, awaiting merge.

It also needs a warning adding, see: #276 (comment)

@castaway
Copy link
Collaborator

I'm closing this PR as the issue it was for has been resolved in another PR (#265). Any extra items discussed should be in separate issues.

@castaway castaway closed this Feb 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants