Skip to content

Conversation

@Tschuppi81
Copy link
Contributor

@Tschuppi81 Tschuppi81 commented Nov 10, 2025

Org: Ensure mime type validator on file upload fields in form code

TYPE: Feature
LINK: ogc-2738

@linear
Copy link

linear bot commented Nov 10, 2025

@codecov
Copy link

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.39%. Comparing base (cadf75d) to head (00b1614).
⚠️ Report is 2 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/onegov/form/validators.py 93.33% 2 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/onegov/agency/forms/agency.py 97.26% <100.00%> (ø)
src/onegov/election_day/forms/election.py 98.14% <100.00%> (ø)
src/onegov/election_day/forms/election_compound.py 97.76% <100.00%> (ø)
src/onegov/election_day/forms/upload/common.py 100.00% <ø> (ø)
src/onegov/election_day/forms/vote.py 97.68% <100.00%> (ø)
src/onegov/file/utils.py 97.46% <ø> (ø)
src/onegov/form/fields.py 94.46% <100.00%> (+0.09%) ⬆️
src/onegov/form/widgets.py 85.46% <100.00%> (ø)
src/onegov/landsgemeinde/forms/agenda.py 47.66% <100.00%> (-0.27%) ⬇️
src/onegov/landsgemeinde/forms/assembly.py 94.91% <100.00%> (-0.09%) ⬇️
... and 11 more

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cadf75d...00b1614. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Tschuppi81 Tschuppi81 requested a review from Daverball December 4, 2025 12:03
@Tschuppi81
Copy link
Contributor Author

I saw that files types are handled differently for view_upload_file_by_json in handle_file_upload. Basically all file types are allowed. Shall we keep this?

@Tschuppi81
Copy link
Contributor Author

Should I completely remove type application/octet-stream ? It is mostly used in conjunction with application/zip

@Daverball
Copy link
Member

I saw that files types are handled differently for view_upload_file_by_json in handle_file_upload. Basically all file types are allowed. Shall we keep this?

We can make sure to set supported_content_types on GeneralFileCollection. That's the only one that would allow anything to be uploaded currently through those views.

Copy link
Member

@Daverball Daverball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but there's a couple of details we should iron out.


upload_field_class: type[UploadField] = UploadField
upload_widget: Widget[UploadField] = UploadWidget()
validators = [WhitelistedMimeType()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here

@Daverball
Copy link
Member

Daverball commented Dec 4, 2025

Should I completely remove type application/octet-stream ? It is mostly used in conjunction with application/zip

It's probably fine to remove it for now. There may however be the rare false positive for any files that cannot be identified correctly by libmagic. Generally pdfs, zips and any other binary file formats can end up as application/octet-stream, it's a generic catch-all content type for binary data if it couldn't be detected as anything else.

id=id,
default=default,
widget=widget, # type:ignore[arg-type]
validators=[*(validators or ())],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still pass these to the unbound field, not the field list. There are still things like the file size validator that makes more sense on the individual field. It allows you to re-simplify the validators back to what they were before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having it on the FieldList it does not work. On the unbound field the validators will be overwritten from the FieldList defaulted validators which results in the wrong mime type(s)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this still happens? Before it made sense because you set a default validator on the class, now this should no longer happen, as long as you only pass the validators to the unbound field.

It's possible however that this also interacts badly with field dependencies, since there's no special casing for field lists. It might make sense to change how we implement field dependencies, since right now it is pretty fragile.

Copy link
Contributor Author

@Tschuppi81 Tschuppi81 Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails when parsing a form from formcode, see test_parse_multiplefileinput.
I am looking into why it fails once I move the validators from the FieldList to the UnboundField.

My guess is that add_field skips something where as the new UploadMultipleField ctor does not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're just looking for the validator in the wrong place in that test. The validator is on each field in the field list, not the field list itself. So I would get rid of this again and simplify the validators, otherwise the validator will run twice for each file. Once you move the validation from a validator in validators to post_validate you no longer will have to check for the presence of that validator anyways, since it's baked into the field itself.

if not any(isinstance(validator, WhitelistedMimeType)
for validator in validators
):
validators.append(validator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure this logic still holds up when field dependencies are in play, I have a feeling it does not, since it will wrap everything in a conditional validator, so you probably need to detect that case, so you don't add an unconditional and potentially redundant mimetype validator.

Copy link
Member

@Daverball Daverball Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier and more robust to store the mimetypes in an extra attribute and to override the post_validate hook, since validation_stopped is one of the parameters, which would be True if the dependency condition is unfulfilled.

So you can do something like the following in post_validate:

if not validation_stopped:
    WhitelistedMimeType(self.mimetypes)(form, self)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually nevermind, this has the same problem, StrictOptional only stops validation if there is no data. Although technically there should be no data, when the field is hidden, so it might not matter in the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would still be a bit cleaner to move the validator into post_validate like in my code suggestion above, instead of trying to modify the given validator chain. If someone passes in a WhitelistValidator we can emit a warning or raise an exception to use the mimetypes parameter instead.

description: str = '',
id: str | None = None,
default: Sequence[StrictFileDict] = (),
default: LaxFileDict | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually StrictFileDict was correct, just the sequence part was incorrect, I just remembered wrong which type was used here originally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still reads LaxFileDict instead of StrictFileDict, which is incorrect.

var bar = $('<div class="progress"><span class="meter" style="width: 0%"></span></div>')
.attr('data-filename', file.name)
.prependTo(progress);
var bar = $(
Copy link
Contributor Author

@Tschuppi81 Tschuppi81 Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the Unsupported Media Type error is visible

Image

@Tschuppi81 Tschuppi81 requested a review from Daverball January 13, 2026 14:41
Copy link
Member

@Daverball Daverball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're in a pretty good spot now. But we should clean up and refactor things a bit.

If mimetypes is an argument to upload fields, we no longer need to pass around instances of WhitelistedMimeType everywhere, we should use the new parameter and avoid inserting a validator object, instead let's rely on the post_validate method of Field, which we can override in UploadField (we don't need to override it in UploadMultipleField, but we should add an explicit mimetypes parameter, so we can easily access mimetypes from UploadMultipleField.mimetypes in addition to each subfield, this means this parameter should both be assigned to self.mimetypes and passed to the self.upload_field_class call.

'image/x-win-bitmap',
'image/x-pcx',
'image/x-portable-pixmap',
'image/x-tga'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to change this list? Even if the current version of Pillow registers mime types for these, there's no guarantee a future version won't once again get rid of them, so I'd suggest leaving these, even if they're redundant.

description: str = '',
id: str | None = None,
default: Sequence[StrictFileDict] = (),
default: LaxFileDict | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still reads LaxFileDict instead of StrictFileDict, which is incorrect.

if not any(isinstance(validator, WhitelistedMimeType)
for validator in validators
):
validators.append(validator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would still be a bit cleaner to move the validator into post_validate like in my code suggestion above, instead of trying to modify the given validator chain. If someone passes in a WhitelistValidator we can emit a warning or raise an exception to use the mimetypes parameter instead.

id=id,
default=default,
widget=widget, # type:ignore[arg-type]
validators=[*(validators or ())],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're just looking for the validator in the wrong place in that test. The validator is on each field in the field list, not the field list itself. So I would get rid of this again and simplify the validators, otherwise the validator will run twice for each file. Once you move the validation from a validator in validators to post_validate you no longer will have to check for the presence of that validator anyways, since it's baked into the field itself.

Comment on lines +125 to +136
if isinstance(field.data, list): # UploadMultipleField
for data in field.data:
if not data:
continue # in case of file deletion

self.validate_filesize(field, data)

else:
self.validate_filesize(field, field.data)

def validate_filesize(self, field: Field, data: dict[Any, Any]) -> None:
if data.get('size', 0) > self.max_bytes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(field.data, list): # UploadMultipleField
for data in field.data:
if not data:
continue # in case of file deletion
self.validate_filesize(field, data)
else:
self.validate_filesize(field, field.data)
def validate_filesize(self, field: Field, data: dict[Any, Any]) -> None:
if data.get('size', 0) > self.max_bytes:
if field.data.get('size', 0) > self.max_bytes:

Let's simplify this again, since it doesn't need to work on a FieldList, it's good enough if it works for each field in the list.

Comment on lines +42 to +55
def assert_whitelisted_mimetype_validator(
field: UploadField | UploadMultipleField
) -> None:

validator = find_validator(field, WhitelistedMimeType)
assert validator
assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]


def find_validator(
field: Field | FileField,
cls: type
) -> Validator[Any, Any] | None:
return next((v for v in field.validators if isinstance(v, cls)), None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can significantly simplify this when the whitelist is stored on the field (in fact you can probably get rid of this function and just replace it with the assertion):

Suggested change
def assert_whitelisted_mimetype_validator(
field: UploadField | UploadMultipleField
) -> None:
validator = find_validator(field, WhitelistedMimeType)
assert validator
assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]
def find_validator(
field: Field | FileField,
cls: type
) -> Validator[Any, Any] | None:
return next((v for v in field.validators if isinstance(v, cls)), None)
def assert_whitelisted_mimetype_validator(
field: UploadField | UploadMultipleField
) -> None:
assert field.whitelist == WhitelistedMimeType.whitelist

Comment on lines +372 to +376
validator = find_validator(form['file'], WhitelistedMimeType)
assert validator
assert validator.whitelist == { # type:ignore[attr-defined]
'application/msword', 'application/pdf'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validator = find_validator(form['file'], WhitelistedMimeType)
assert validator
assert validator.whitelist == { # type:ignore[attr-defined]
'application/msword', 'application/pdf'
}
assert form['file'].whitelist == { # type:ignore[attr-defined]
'application/msword', 'application/pdf'
}

Comment on lines +381 to +383
validator = find_validator(form['file'], WhitelistedMimeType)
assert validator
assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validator = find_validator(form['file'], WhitelistedMimeType)
assert validator
assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]
assert form['file'].whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]

Comment on lines +401 to +405
validator = find_validator(form['files'], WhitelistedMimeType)
assert validator
assert validator.whitelist == { # type:ignore[attr-defined]
'application/msword', 'application/pdf'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validator = find_validator(form['files'], WhitelistedMimeType)
assert validator
assert validator.whitelist == { # type:ignore[attr-defined]
'application/msword', 'application/pdf'
}
assert form['files'].whitelist == { # type:ignore[attr-defined]
'application/msword', 'application/pdf'
}

Comment on lines +410 to +412
validator = find_validator(form['my_files'], WhitelistedMimeType)
assert validator
assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validator = find_validator(form['my_files'], WhitelistedMimeType)
assert validator
assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]
assert form['my_files'].whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants