Skip to content

Conversation

@jgraham14
Copy link

Issue:
When .csv files are uploaded as form attachments through pyodk to ODK Central servers using S3-compatible storage, the Content-Type header is not set correctly. This causes ODK Central to throw internal server errors when attempting to access these files later.

Solution:

  • Added logic to detect CSV files (case-insensitive .csv extension)
  • Set Content-Type: text/csv header specifically for CSV uploads
  • Maintained original behavior for all other file types

What has been done to verify that this works as intended?

  • Manually tested this behavior with an ODK Central server using S3 storage
  • Added unit tests for CSV header handling. N.B.: I'm not familiar with writing unit tests, and so had claude do this part. I feel uncomfortable vouching for their comprehensiveness.

Why is this the best possible solution? Were any other approaches considered?

  • This seemed like the minimal required fix, changing only headers for form attachments, and only for .csv files. I have not tested whether this issue occurs if pyodk is used to upload other kinds of form attachments to servers using S3 storage, though I imagine that is a rarer workflow.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

  • This should theoretically not affect the behavior end users experience.

Do we need any specific form for testing your changes? If so, please attach one.

  • Unneeded.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

  • Should not.

Before submitting this PR, please make sure you have:

  • [x ] included test cases for core behavior and edge cases in tests
  • [x ] run python -m unittest and verified all tests pass
  • [x ] run ruff format pyodk tests and ruff check pyodk tests to lint code
    • N.B. - I only committed the changes ruff made for the tests I added. Ruff also suggested changes in a number of files I had not touched.
  • [x ] verified that any code or assets from external sources are properly credited in comments

jgraham14 added 3 commits May 23, 2025 16:36
…n uploading .csv form attachments, which then caused ODK Central to throw an internal server error when trying to fetch these from s3 storage
@jgraham14
Copy link
Author

Hi! I diagnosed this problem incorrectly, and it is an issue on Central's end. I thought, based on how my workflow runs, that this was an issue caused by PyODK uploading .csvs, while .csvs uploaded manually work correctly. This is not actually the core problem. Instead, it is caused by uploading .csvs with missing data. I identified two slightly different cases:

  1. If you attempt to upload a .csv with a header line (column names) but then no data in those columns, ODK Central will accept the file but then have a 500 internal server error when fetching that file back from S3 storage. This breaks across platforms - when uploaded manually in the browser or using pyodk on a windows machine, and when uploaded by a script using pyodk on a linux machine.
  2. If you attempt to upload a .csv with a header line and incomplete data (i.e., in the last row of data, there is a missing value in the final column), the same error happens. However, for me, this broke only on my linux machine, not on windows. I don't think this has anything to do with pyodk (it worked on windows both with pyodk and manually through the browser), I think it is a different line endings thing.

As a result, I don't think there's actually any changes needed in PyODK (this change above doesn't hurt, but I don't think it's needed).

I will open an issue in the Central repository as well to report this. Feel free to close this PR, I would do myself but I want to make sure someone reads this first and I'm not entirely sure how github notifactions work. Sorry for the wild goose chase!

@lognaturel
Copy link
Member

lognaturel commented May 26, 2025

Thanks so much for the follow-up here and apologies for the original issue! Central doesn’t currently do any backend validation of CSVs so the issue you’re getting is not too surprising.

@lindsay-stevens i thought you had run into an issue that could be related to content type headers but perhaps I’m confusing things?

@matthew-white
Copy link
Member

matthew-white commented May 26, 2025

@lognaturel, maybe you're thinking of getodk/central#940? In this case, I think the underlying issue is getodk/central-backend#1351. By the way, I also left some related comments on #115.

@lognaturel
Copy link
Member

Thanks for reporting and following up on this, @jgraham14! Closing in favor of #116.

@lognaturel lognaturel closed this Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants