
How to efficiently delete user data from a data lake for GDPR compliance? #59

@kanistha

I have a question about GDPR compliance: we need to delete a user's data from the data lake when the user requests account deletion. We currently store user data for analytics in Azure Data Lake with the following configuration:

  • Type: Data Lake Storage Gen1
  • Data format in Data lake: Avro
  • Using default partitioning based on time

We use a de-identified data lake approach to stay in line with data privacy requirements: sensitive information is de-identified and protected before it ever enters the data lake, which minimizes the storage and use of personally identifiable information. Concretely, before storing data in the data lake we mask it with a random id. Is it still required to delete this non-personally identifiable information from the data lake to be GDPR compliant? If so, is there an efficient way to delete user-specific data from the data lake, given that Azure Data Lake Store is an append-only file system and data, once committed, cannot be erased or updated?
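
To make the setup concrete, here is a minimal sketch of the kind of masking I mean. It is not our actual pipeline code: the `Pseudonymizer` class, the `de_identify` function, the field names (`user_id`, `email`, `name`), and the idea of keeping the id mapping in a store outside the data lake are all assumptions made for illustration.

```python
import uuid


class Pseudonymizer:
    """Hypothetical helper: maps real user ids to random ids.

    Assumption: the mapping is kept in a separate store, outside the
    data lake. Here it is just an in-memory dict.
    """

    def __init__(self):
        self._mapping = {}

    def random_id_for(self, user_id: str) -> str:
        # Reuse the same random id for the same user so that analytics
        # can still group events per (masked) user.
        if user_id not in self._mapping:
            self._mapping[user_id] = uuid.uuid4().hex
        return self._mapping[user_id]


def de_identify(event: dict, pseudonymizer: Pseudonymizer,
                pii_fields=("email", "name")) -> dict:
    """Return a copy of the event with PII fields dropped and the real
    user id replaced by a random id, ready to be written to the lake."""
    cleaned = {
        key: value
        for key, value in event.items()
        if key not in pii_fields and key != "user_id"
    }
    cleaned["random_id"] = pseudonymizer.random_id_for(event["user_id"])
    return cleaned


if __name__ == "__main__":
    p = Pseudonymizer()
    raw = {"user_id": "u1", "email": "a@example.com", "page": "/home"}
    print(de_identify(raw, p))
```

Only records shaped like the output of `de_identify` end up in the Avro files, so the lake itself never contains the real user id or the listed PII fields.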

Please let me know if you need any further information.

Thanks a lot in advance for your help.
