-
Notifications
You must be signed in to change notification settings - Fork 2
Rework mime type white list #2198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Rework mime type white list #2198
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
|
I saw that files types are handled differently for |
|
Should I completely remove type |
We can make sure to set |
Daverball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, but there's a couple of details we should iron out.
src/onegov/form/fields.py
Outdated
|
|
||
| upload_field_class: type[UploadField] = UploadField | ||
| upload_widget: Widget[UploadField] = UploadWidget() | ||
| validators = [WhitelistedMimeType()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here
It's probably fine to remove it for now. There may however be the rare false positive for any files that cannot be identified correctly by libmagic. Generally pdfs, zips and any other binary file formats can end up as |
| id=id, | ||
| default=default, | ||
| widget=widget, # type:ignore[arg-type] | ||
| validators=[*(validators or ())], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still pass these to the unbound field, not the field list. There are still things like the file size validator that makes more sense on the individual field. It allows you to re-simplify the validators back to what they were before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without having it on the FieldList it does not work. On the unbound field the validators will be overwritten from the FieldList defaulted validators which results in the wrong mime type(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this still happens? Before it made sense because you set a default validator on the class, now this should no longer happen, as long as you only pass the validators to the unbound field.
It's possible however that this also interacts badly with field dependencies, since there's no special casing for field lists. It might make sense to change how we implement field dependencies, since right now it is pretty fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails when parsing a form from formcode, see test_parse_multiplefileinput.
I am looking into why it fails once I move the validators from the FieldList to the UnboundField.
My guess is that add_field skips something where as the new UploadMultipleField ctor does not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're just looking for the validator in the wrong place in that test. The validator is on each field in the field list, not the field list itself. So I would get rid of this again and simplify the validators, otherwise the validator will run twice for each file. Once you move the validation from a validator in validators to post_validate you no longer will have to check for the presence of that validator anyways, since it's baked into the field itself.
| if not any(isinstance(validator, WhitelistedMimeType) | ||
| for validator in validators | ||
| ): | ||
| validators.append(validator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure this logic still holds up when field dependencies are in play, I have a feeling it does not, since it will wrap everything in a conditional validator, so you probably need to detect that case, so you don't add an unconditional and potentially redundant mimetype validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be easier and more robust to store the mimetypes in an extra attribute and to override the post_validate hook, since validation_stopped is one of the parameters, which would be True if the dependency condition is unfulfilled.
So you can do something like the following in post_validate:
if not validation_stopped:
WhitelistedMimeType(self.mimetypes)(form, self)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually nevermind, this has the same problem, StrictOptional only stops validation if there is no data. Although technically there should be no data, when the field is hidden, so it might not matter in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would still be a bit cleaner to move the validator into post_validate like in my code suggestion above, instead of trying to modify the given validator chain. If someone passes in a WhitelistValidator we can emit a warning or raise an exception to use the mimetypes parameter instead.
| description: str = '', | ||
| id: str | None = None, | ||
| default: Sequence[StrictFileDict] = (), | ||
| default: LaxFileDict | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually StrictFileDict was correct, just the sequence part was incorrect, I just remembered wrong which type was used here originally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still reads LaxFileDict instead of StrictFileDict, which is incorrect.
Co-authored-by: David Salvisberg <david.salvisberg@seantis.ch>
| var bar = $('<div class="progress"><span class="meter" style="width: 0%"></span></div>') | ||
| .attr('data-filename', file.name) | ||
| .prependTo(progress); | ||
| var bar = $( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Daverball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're in a pretty good spot now. But we should clean up and refactor things a bit.
If mimetypes is an argument to upload fields, we no longer need to pass around instances of WhitelistedMimeType everywhere, we should use the new parameter and avoid inserting a validator object, instead let's rely on the post_validate method of Field, which we can override in UploadField (we don't need to override it in UploadMultipleField, but we should add an explicit mimetypes parameter, so we can easily access mimetypes from UploadMultipleField.mimetypes in addition to each subfield, this means this parameter should both be assigned to self.mimetypes and passed to the self.upload_field_class call.
| 'image/x-win-bitmap', | ||
| 'image/x-pcx', | ||
| 'image/x-portable-pixmap', | ||
| 'image/x-tga' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason to change this list? Even if the current version of Pillow registers mime types for these, there's no guarantee a future version won't once again get rid of them, so I'd suggest leaving these, even if they're redundant.
| description: str = '', | ||
| id: str | None = None, | ||
| default: Sequence[StrictFileDict] = (), | ||
| default: LaxFileDict | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still reads LaxFileDict instead of StrictFileDict, which is incorrect.
| if not any(isinstance(validator, WhitelistedMimeType) | ||
| for validator in validators | ||
| ): | ||
| validators.append(validator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would still be a bit cleaner to move the validator into post_validate like in my code suggestion above, instead of trying to modify the given validator chain. If someone passes in a WhitelistValidator we can emit a warning or raise an exception to use the mimetypes parameter instead.
| id=id, | ||
| default=default, | ||
| widget=widget, # type:ignore[arg-type] | ||
| validators=[*(validators or ())], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're just looking for the validator in the wrong place in that test. The validator is on each field in the field list, not the field list itself. So I would get rid of this again and simplify the validators, otherwise the validator will run twice for each file. Once you move the validation from a validator in validators to post_validate you no longer will have to check for the presence of that validator anyways, since it's baked into the field itself.
| if isinstance(field.data, list): # UploadMultipleField | ||
| for data in field.data: | ||
| if not data: | ||
| continue # in case of file deletion | ||
|
|
||
| self.validate_filesize(field, data) | ||
|
|
||
| else: | ||
| self.validate_filesize(field, field.data) | ||
|
|
||
| def validate_filesize(self, field: Field, data: dict[Any, Any]) -> None: | ||
| if data.get('size', 0) > self.max_bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if isinstance(field.data, list): # UploadMultipleField | |
| for data in field.data: | |
| if not data: | |
| continue # in case of file deletion | |
| self.validate_filesize(field, data) | |
| else: | |
| self.validate_filesize(field, field.data) | |
| def validate_filesize(self, field: Field, data: dict[Any, Any]) -> None: | |
| if data.get('size', 0) > self.max_bytes: | |
| if field.data.get('size', 0) > self.max_bytes: |
Let's simplify this again, since it doesn't need to work on a FieldList, it's good enough if it works for each field in the list.
| def assert_whitelisted_mimetype_validator( | ||
| field: UploadField | UploadMultipleField | ||
| ) -> None: | ||
|
|
||
| validator = find_validator(field, WhitelistedMimeType) | ||
| assert validator | ||
| assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] | ||
|
|
||
|
|
||
| def find_validator( | ||
| field: Field | FileField, | ||
| cls: type | ||
| ) -> Validator[Any, Any] | None: | ||
| return next((v for v in field.validators if isinstance(v, cls)), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can significantly simplify this when the whitelist is stored on the field (in fact you can probably get rid of this function and just replace it with the assertion):
| def assert_whitelisted_mimetype_validator( | |
| field: UploadField | UploadMultipleField | |
| ) -> None: | |
| validator = find_validator(field, WhitelistedMimeType) | |
| assert validator | |
| assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] | |
| def find_validator( | |
| field: Field | FileField, | |
| cls: type | |
| ) -> Validator[Any, Any] | None: | |
| return next((v for v in field.validators if isinstance(v, cls)), None) | |
| def assert_whitelisted_mimetype_validator( | |
| field: UploadField | UploadMultipleField | |
| ) -> None: | |
| assert field.whitelist == WhitelistedMimeType.whitelist |
| validator = find_validator(form['file'], WhitelistedMimeType) | ||
| assert validator | ||
| assert validator.whitelist == { # type:ignore[attr-defined] | ||
| 'application/msword', 'application/pdf' | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| validator = find_validator(form['file'], WhitelistedMimeType) | |
| assert validator | |
| assert validator.whitelist == { # type:ignore[attr-defined] | |
| 'application/msword', 'application/pdf' | |
| } | |
| assert form['file'].whitelist == { # type:ignore[attr-defined] | |
| 'application/msword', 'application/pdf' | |
| } |
| validator = find_validator(form['file'], WhitelistedMimeType) | ||
| assert validator | ||
| assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| validator = find_validator(form['file'], WhitelistedMimeType) | |
| assert validator | |
| assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] | |
| assert form['file'].whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] |
| validator = find_validator(form['files'], WhitelistedMimeType) | ||
| assert validator | ||
| assert validator.whitelist == { # type:ignore[attr-defined] | ||
| 'application/msword', 'application/pdf' | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| validator = find_validator(form['files'], WhitelistedMimeType) | |
| assert validator | |
| assert validator.whitelist == { # type:ignore[attr-defined] | |
| 'application/msword', 'application/pdf' | |
| } | |
| assert form['files'].whitelist == { # type:ignore[attr-defined] | |
| 'application/msword', 'application/pdf' | |
| } |
| validator = find_validator(form['my_files'], WhitelistedMimeType) | ||
| assert validator | ||
| assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| validator = find_validator(form['my_files'], WhitelistedMimeType) | |
| assert validator | |
| assert validator.whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] | |
| assert form['my_files'].whitelist == WhitelistedMimeType.whitelist # type:ignore[attr-defined] |

Org: Ensure mime type validator on file upload fields in form code
TYPE: Feature
LINK: ogc-2738