Hello,
I would like to know the rationale for making surname matching case-sensitive. For example, I see this for hospitals:
hospital = dd.ds.LookupSet(matching_pipeline=[dd.str.LowercaseString()])
But this for surnames:
surname = dd.ds.LookupSet()
Is this intended to limit the number of false positives?
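To make the trade-off concrete, here is a minimal standalone sketch. It does not use Deduce or docdeid internals. The function `find_surnames` and the three-entry surname set are hypothetical and only for illustration. Several Dutch surnames are also ordinary words: "bos" (forest), "kok" (cook), and "groot" (large). A lowercasing step would therefore match those words in running text. The same step is what would catch names written entirely in capitals.

```python
import re

# Hypothetical surname lookup set (illustrative only, not Deduce's bundled list).
SURNAMES = {"Bos", "Kok", "Groot"}


def find_surnames(text, surnames, lowercase=False):
    """Return the tokens in `text` that match an entry in `surnames`.

    lowercase=False: exact, case-sensitive matching.
    lowercase=True: tokens and entries are both lowercased before comparison,
    loosely mimicking a matching pipeline with a lowercasing step.
    """
    lookup = {s.lower() for s in surnames} if lowercase else set(surnames)
    matches = []
    for tok in re.findall(r"\w+", text):
        key = tok.lower() if lowercase else tok
        if key in lookup:
            matches.append(tok)
    return matches


sentence = "De kok liep door het bos met meneer Groot."
print(find_surnames(sentence, SURNAMES))                  # ['Groot']
print(find_surnames(sentence, SURNAMES, lowercase=True))  # ['kok', 'bos', 'Groot']

letter_header = "MEVROUW J. BOS"
print(find_surnames(letter_header, SURNAMES))                  # []
print(find_surnames(letter_header, SURNAMES, lowercase=True))  # ['BOS']
```

Case-sensitive matching avoids flagging "kok" and "bos" as names, but it misses "BOS" in an all-caps header. This is only one plausible reading of the design choice, not a confirmed rationale.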
We are considering using Deduce to anonymize text that sometimes comes from letters in which the personal information appears with almost no context, all in capitals:
SOME NAME AND SURNAME
MY ADDRESS 123-A
1234 AB MY CITY
My understanding is that I could extend the library via lookup_data_path by providing a text file with these variants (for example, all in caps). Still, it would be really interesting to know why it works this way.
Thanks for all the fantastic work on the library.
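The variant file mentioned above could be produced with a small preprocessing script. The sketch below makes several assumptions: the lookup list is a plain UTF-8 text file with one entry per line, and the helper names `with_uppercase_variants` and `extend_lookup_file` are hypothetical. It has not been checked against the actual layout Deduce expects under lookup_data_path.

```python
from pathlib import Path


def with_uppercase_variants(entries):
    """Return the entries followed by their ALL-CAPS forms, deduplicated.

    Original entries keep their order and come first; empty strings are dropped.
    """
    entries = list(entries)  # allow any iterable, including generators
    seen = set()
    result = []
    for entry in entries + [e.upper() for e in entries]:
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


def extend_lookup_file(src, dst):
    """Read one entry per line from `src`; write entries plus uppercase variants to `dst`.

    Assumes a plain one-entry-per-line UTF-8 format (an assumption, not Deduce's spec).
    """
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    entries = [line.strip() for line in lines if line.strip()]
    Path(dst).write_text("\n".join(with_uppercase_variants(entries)) + "\n", encoding="utf-8")


print(with_uppercase_variants(["Bos", "de Groot", "KOK"]))
# ['Bos', 'de Groot', 'KOK', 'BOS', 'DE GROOT']
```

Adding variants this way leaves the default case-sensitive behaviour untouched for ordinary prose. It only adds exact all-caps matches, which are unlikely to collide with common words written in lowercase.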