Skip to content

Conversation

speeddragon
Copy link

During a cleanup of my files (stored in DigitalOcean Spaces, equivalent to S3), I've noticed some issues accessing files with question marks (?) in the object name. E.g.: test.jpg?id=3 and test.jpg?.

I'm not sure if the best solution here would be avoiding storing files with special characters, like mentioned on AWS documentation, but also found similar issues in the past, like #660 so I wanted to come up a solution for this case, but I think there would still be some other edge cases to explore regarding this.

To solve this, under the perform/2 function in ExAws.Operation.S3, I've encoded the object path.

To avoid the double encoding, I needed to remove Url.uri_encode/1 under:

  • ExAws.Auth.signature/8
  • ExAws.Request.Url.query/1

I'm not familiar with other AWS services and how they handle the query params, but I believe I also would need to add uri_encode/1 to the other ExAws.Operation modules (JSON, Query, RestQuery).

Probably exist some other valid cases to not encode every query parameter, I believe versionId would be one of them, but I think there should be more. (This is fixed in the second commit)

I'm also not sure if DigitalOcean Spaces mimics 100% the same S3 behaviour.

I would like some guidance on whether I should invest more time in this or drop it. Maybe we shouldn't support this kind of functionality due to the complexity versus the benefits.

@@ -14,7 +14,7 @@ defmodule ExAws.Request.Url do
|> convert_port_to_integer
|> (&struct(URI, &1)).()
|> URI.to_string()
|> String.trim_trailing("?")
|> String.replace_suffix("?", "")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to support text.txt? object, since if the object name ends with the question mark, the URI.to_string will add another question mark and the String.trim_trailing/2 will remove both.

Comment on lines +87 to +103
config = %{
http_client: ExAws.Request.HttpMock,
json_codec: JSX,
access_key_id: "AKIAIOSFODNN7EXAMPLE",
secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
region: "us-east-1",
retries: [
max_attempts: 5,
base_backoff_in_ms: 1,
max_backoff_in_ms: 20
],
normalize_path: false,
scheme: "https://",
host: "s3.amazonaws.com",
virtual_host: true,
port: 443
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was put together just to check if the test is valid. This should be improved and moved to a separate test.

@bernardd
Copy link
Contributor

Hi @speeddragon! Thanks for taking a look at this. I'm not across the docs enough to know what the "correct" behaviour for cases like this is. In terms of whether you should continue work on it, the two relevant questions are: 1) What existing stuff (if any) will it break, and 2) What do the official (boto) libraries do in the same case?

@speeddragon
Copy link
Author

Thanks for replying @bernardd. To answer to your questions:

  1. I'm not sure what could break this, since this is ex_aws (a core repository for other dependencies) and I've removed |> Url.uri_encode() from ex_aws/auth.ex. I would need to run tests on other projects that depend on this one. I will do this to answer this question.

  2. A quick check in boto repository, I found Dealing with Escaped Characters in S3 Key Names boto/boto3#3584. They had issues with +, so I think ? isn't supported either. They delegated the request to other teams, but I don't think they support it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants