Releases: vmenger/deduce
v3.0.6
v3.0.5
v3.0.4
v3.0.3
3.0.3 (2024-07-16)
Added
- A cache_path option, to define the path for saving/loading the lookup structure cache. You should use this if your install directory is not writable.
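One way to choose a value for cache_path is to check whether a preferred directory is writable and fall back to another location otherwise. The sketch below is illustrative only: pick_cache_dir and its two arguments are hypothetical names, not part of deduce, and passing the result on as the cache_path option is left to the caller.

```python
import os
from pathlib import Path


def pick_cache_dir(preferred: Path, fallback: Path) -> Path:
    """Return `preferred` if it is an existing, writable directory, else `fallback`.

    Hypothetical helper for illustration; not part of deduce.
    """
    if preferred.is_dir() and os.access(preferred, os.W_OK):
        return preferred
    # Create the fallback directory so it is ready to hold cache files.
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback
```

The returned path could then be supplied wherever the cache location is configured.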
Removed
- the config_file keyword, now replaced by config, which accepts both filenames and dicts
- old lookup list names, e.g. prefixes now replaced by prefix
- annotator types custom, regexp, token_pattern, dd_token_pattern and annotation_context, all replaced by setting class directly as annotator_type
- everything in deduce.pattern, patient patterns now replaced by PatientNameAnnotator
v3.0.2
v3.0.1
v3.0.0
3.0.0 (2023-12-20)
Added
- speed optimizations, ~250%
- pseudo-annotating eponymous diseases (e.g. Creutzfeldt-Jakob)
- PatientNameAnnotator, which replaces deduce.pattern
- a structured way for loading and building lookup structures (lists and tries), including caching
- pre_match_words for some regexp annotators, speeding up the annotating
- option to present a user config as dict (using config keyword)
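The idea behind a pre-match word check can be sketched as follows: before running a comparatively expensive regexp, test cheaply whether any of a set of required words occurs in the text at all. This is a minimal sketch under that assumption, not deduce's implementation; find_matches and the postbus pattern are hypothetical names used for illustration.

```python
import re
from typing import List, Optional, Set


def find_matches(
    text: str,
    pattern: re.Pattern,
    pre_match_words: Optional[Set[str]] = None,
) -> List[str]:
    """Run `pattern` on `text`, skipping the regexp when no pre-match word occurs."""
    if pre_match_words is not None:
        # Cheap check: compare the lowercased words of the text to the required words.
        words = set(re.findall(r"\w+", text.lower()))
        if words.isdisjoint(pre_match_words):
            return []
    return [m.group(0) for m in pattern.finditer(text)]


postbus = re.compile(r"postbus\s\d{4,5}", re.IGNORECASE)
print(find_matches("Stuur naar Postbus 12345", postbus, {"postbus"}))  # ['Postbus 12345']
print(find_matches("Geen adres bekend", postbus, {"postbus"}))  # []
```

The saving comes from texts that contain none of the required words, where the regexp is never evaluated.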
Changed
- speedup for TokenPatternAnnotator
- some internals of ContextPatternAnnotator
- initials now detected by lookup list, rather than pattern
- redactor open and close chars from <> to [], as previous chars caused issues in HTML (so deidentified text now shows [PATIENT], [LOCATIE], etc.)
- names of lookup structures to singular (prefix, rather than prefixes)
- INSTELLING tag to ZIEKENHUIS and ZORGINSTELLING
- refactored and simplified annotator loading, specifically the annotator_type config keyword now accepts references to classes (e.g. deduce.annotator.TokenPatternAnnotator)
- renamed interfix_with_capital annotator to interfix_with_name
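Accepting a class reference as a string generally means resolving a dotted path to a class at load time. The following is a minimal sketch of that general technique using importlib, not deduce's actual loader; resolve_class, the config_entry layout, and its "args" key are assumptions for illustration, and a standard-library class stands in for an annotator so the example runs without deduce installed.

```python
import importlib


def resolve_class(reference: str) -> type:
    """Resolve a dotted reference such as 'package.module.ClassName' to the class."""
    module_name, _, class_name = reference.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {reference!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical config entry; a stdlib class is used in place of an annotator.
config_entry = {"annotator_type": "collections.OrderedDict", "args": {}}
annotator_cls = resolve_class(config_entry["annotator_type"])
instance = annotator_cls(**config_entry["args"])
print(type(instance).__name__)  # OrderedDict
```

With this approach a config can point to any importable class, including user-defined ones, without the loader keeping a fixed list of type names.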
Deprecated
- the config_file keyword, now replaced by config, which accepts both filenames and dicts
- old lookup list names, e.g. prefixes now replaced by prefix
- annotator types 'custom', 'regexp', 'token_pattern', 'dd_token_pattern' and 'annotation_context', all replaced by setting class directly as annotator_type
Removed
- automated coverage reporting on coveralls.io
- options lowercase_lookup, lowercase_neg_lookup for token patterns
- everything in deduce.pattern, patient patterns now replaced by PatientNameAnnotator
- utils.any_in_text
Fixed
- some small additions/removals for specific lookup lists
- smaller bugs related to overlapping matches
v2.5.0
2.5.0 (2023-11-28)
Added
- the RegexpPseudoAnnotator component for filtering regexp matches based on preceding/following words
- a prefix_with_interfix pattern for names, detecting e.g. Dr. van Loon
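Filtering regexp matches on their neighbouring words can be sketched as below. This is a simplified illustration of the idea, not the RegexpPseudoAnnotator implementation; filter_matches, pre_pseudo, and post_pseudo are hypothetical names, and the word sets are made up for the example.

```python
import re
from typing import List, Set


def filter_matches(
    text: str,
    pattern: re.Pattern,
    pre_pseudo: Set[str],
    post_pseudo: Set[str],
) -> List[str]:
    """Keep matches unless the word directly before or after marks a false positive."""
    kept = []
    for match in pattern.finditer(text):
        before = re.findall(r"\w+", text[: match.start()].lower())
        after = re.findall(r"\w+", text[match.end():].lower())
        if before and before[-1] in pre_pseudo:
            continue  # preceding word rules the match out
        if after and after[0] in post_pseudo:
            continue  # following word rules the match out
        kept.append(match.group(0))
    return kept


two_digits = re.compile(r"\b\d{2}\b")
text = "Patient is 64 jaar, krijgt 20 mg."
print(filter_matches(text, two_digits, pre_pseudo={"kamer"}, post_pseudo={"mg"}))  # ['64']
```

Here "20" is dropped because it is followed by "mg", while "64" is kept.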
Fixed
- a bug with BsnAnnotator with non-digit characters in regexp
Changed
- the age detection component, with improved logic and pseudo patterns
- annotations are no longer counted adjacent when separated by a comma
- streets are prioritized over names when merging overlapping annotations
- removed some false positives for postal codes ending in gr or ie
- extended the postbus pattern for xx.xxx format (old notation)
- some smaller optimizations and exceptions for institution, hospital, placename, residence, medical term, first name, and last name lookup lists
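Prioritizing one tag over another when annotations overlap can be sketched as a greedy selection ordered by tag priority. This is a minimal sketch of the general idea, not deduce's merging logic; Annotation, PRIORITY, and resolve_overlaps are hypothetical names, and the tag strings are chosen for illustration.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    tag: str


# Lower number wins; tags not listed get the lowest priority.
PRIORITY = {"street": 0, "name": 1}


def resolve_overlaps(annotations: List[Annotation]) -> List[Annotation]:
    """Greedily keep annotations by tag priority, then length, dropping overlaps."""
    ordered = sorted(
        annotations,
        key=lambda a: (PRIORITY.get(a.tag, 99), -(a.end - a.start), a.start),
    )
    kept: List[Annotation] = []
    for ann in ordered:
        # Keep only if it does not overlap anything already kept.
        if all(ann.end <= k.start or ann.start >= k.end for k in kept):
            kept.append(ann)
    return sorted(kept, key=lambda a: a.start)


candidates = [
    Annotation(0, 7, "name"),
    Annotation(0, 18, "street"),
    Annotation(30, 35, "name"),
]
print(resolve_overlaps(candidates))
```

In this example the name annotation at offsets 0 to 7 overlaps the street annotation and is dropped, while the non-overlapping name at 30 to 35 is kept.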