@mkorvas mkorvas commented Jul 12, 2024

This is the result of my first encounter with this codebase (Docdeid and Deduce); this PR is the second part (Deduce). My goal was to understand its inner workings and then make sure that capitalized street names are pseudonymized (all-caps or titlecased, also covering the special case of the "IJ" digraph in Dutch). While at it, I noticed unexpected behaviour for patient names vs. other person names and improved that as well.
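To illustrate the capitalization cases mentioned above, here is a minimal sketch of how title-cased and all-caps variants of a street name could be generated while treating a leading "ij" as a single letter. The function names `dutch_titlecase` and `capitalization_variants` are hypothetical and are not taken from Deduce or Docdeid; the actual implementation in this PR may well differ.

```python
from typing import Set


def dutch_titlecase(word: str) -> str:
    """Title-case one word, capitalizing a leading "ij" as the digraph "IJ"."""
    lowered = word.lower()
    if lowered.startswith("ij"):
        # Dutch orthography capitalizes both letters: "IJsselstraat".
        return "IJ" + lowered[2:]
    return lowered[:1].upper() + lowered[1:]


def capitalization_variants(street: str) -> Set[str]:
    """Return the original, title-cased and all-caps forms of a street name.

    Assumption: words are separated by single spaces; hyphenated parts
    are not title-cased separately in this sketch.
    """
    titlecased = " ".join(dutch_titlecase(w) for w in street.split(" "))
    return {street, titlecased, street.upper()}


if __name__ == "__main__":
    for variant in sorted(capitalization_variants("ijsselstraat")):
        print(variant)
```

Note that Python's built-in `str.title()` would produce "Ijsselstraat" here, which is why the digraph needs separate handling.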

This depends on changes in Docdeid, filed as vmenger/docdeid#20.

To use that Docdeid version, I checked out the two repos side by side and added the following configuration in Deduce's pyproject.toml:

[tool.poetry.dependencies]
docdeid = {path = "../docdeid", develop = true}

FWIW, I also see a diff in my local (uncommitted) version of base_config.json affecting "initiaal_patient" mentions, but it's been 4 months since I worked intensively on this codebase, so I no longer remember whether it's useful or even necessary. If some tests fail for you without it, let me know; this may well be the reason.

Beware! `poetry.lock` is not up-to-date in this commit (and the most recent commits wouldn't work with the latest released version of `docdeid` anyway).

Leaving the test case commented out for now.

This won't be a frequent problem, but it's something I noticed when first trying out this tool.

Otherwise, random names are labeled as "patient", which will be wrong in most cases.

...and use it to determine when a patient name is to be merged with a neighbouring person mention and when not.

...as required by Pylint.

This makes Pylint happier and the code simpler.

This is needed to reduce the number of arguments for the `_match_sequence` method, and it creates a cleaner inheritance hierarchy between annotators, too.