Surnames matching is case sensitive #156

@jacobmendoza

Hello,

I would like to know the rationale behind making surname matching case sensitive. For example, the lookup set for hospitals lowercases its entries before matching:

    hospital = dd.ds.LookupSet(matching_pipeline=[dd.str.LowercaseString()])

But surnames:

    surname = dd.ds.LookupSet()

Is this trying to limit the amount of false positives?
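To make the trade-off I have in mind concrete, here is a toy sketch in plain Python. It does not use Deduce or docdeid: `ToyLookupSet` and its `normalize` argument are hypothetical stand-ins that I made up for illustration, only loosely analogous to a lookup set with a matching pipeline. It shows that lowercasing catches all-caps surnames but also starts flagging ordinary words such as "bakker" (Dutch for "baker").

```python
# Toy stand-in for a lookup set; this is NOT Deduce's or docdeid's implementation.
class ToyLookupSet:
    def __init__(self, items, normalize=None):
        # `normalize` is a hypothetical hook, loosely analogous to a matching pipeline.
        self.normalize = normalize or (lambda s: s)
        self.items = {self.normalize(item) for item in items}

    def __contains__(self, token):
        # Apply the same normalization to the token before the set lookup.
        return self.normalize(token) in self.items


surnames = ["Bakker", "Visser"]

case_sensitive = ToyLookupSet(surnames)
case_insensitive = ToyLookupSet(surnames, normalize=str.lower)

# All-caps text from a letter: only the lowercased set finds the surname.
print("BAKKER" in case_sensitive)    # False
print("BAKKER" in case_insensitive)  # True

# Ordinary lowercase word: the lowercased set now flags it as a surname too.
print("bakker" in case_sensitive)    # False
print("bakker" in case_insensitive)  # True
```

If this is the reason, case-sensitive matching would be a precision choice that costs recall on all-caps input.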

We are considering using Deduce to anonymize text that sometimes comes from letters in which the personal information appears with almost no context and entirely in capitals:

SOME NAME AND SURNAME
MY ADDRESS 123-A
1234 AB MY CITY

My understanding is that I could extend the library via lookup_data_path by providing a text file with these variants (all in caps, for example), but I would still like to understand why it works this way.
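As a rough sketch of that workaround, the snippet below adds all-caps variants to a surname list. It assumes a plain text file with one surname per line; I have not verified the layout Deduce actually expects under lookup_data_path, so the file names and format here are assumptions, and `add_uppercase_variants` is a hypothetical helper of my own.

```python
from pathlib import Path
import tempfile


def add_uppercase_variants(source: Path, target: Path) -> list[str]:
    """Write each surname and its all-caps variant to `target`, one per line."""
    lines = source.read_text(encoding="utf-8").splitlines()
    names = [line.strip() for line in lines if line.strip()]
    # Union of the original spellings and their uppercase forms, deduplicated.
    variants = sorted(set(names) | {name.upper() for name in names})
    target.write_text("\n".join(variants) + "\n", encoding="utf-8")
    return variants


# Demo on a temporary directory; the file names are made up for this example.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "surnames.txt"
    dst = Path(tmp) / "surnames_with_caps.txt"
    src.write_text("Jansen\nde Vries\n", encoding="utf-8")
    result = add_uppercase_variants(src, dst)
    written = dst.read_text(encoding="utf-8").splitlines()
    print(result)
```

This keeps the original spellings, so mixed-case text would match exactly as before, and only the all-caps forms are new.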

Thanks for all the fantastic work on the library.
