How PostgreSQL Anonymizer helps businesses comply with GDPR


PostgreSQL is an open source database management system, widely regarded today as one of the most credible alternatives to Oracle Database. Its ecosystem is teeming with projects, to the point of being daunting.

The French company Dalibo has been supporting businesses in their adoption of PostgreSQL since 2005. It also contributes to the ecosystem through Dalibo Labs, one of whose projects should be of great interest to companies that want to comply with the GDPR (General Data Protection Regulation). PostgreSQL Anonymizer is an extension that protects sensitive data at the source, directly within the database.

“When we saw the GDPR coming, we immediately understood that it would cause a major upheaval within companies,” explains Damien Clochard, co-founder of Dalibo and creator of PostgreSQL Anonymizer. “As PostgreSQL sits at the heart of the data, we wondered what role it could play in the protection of personal data.”

Rethink data protection at source

Most anonymization tools follow the ETL (Extract, Transform, Load) principle, which consists of connecting to the database, extracting the desired information and then anonymizing it. The anonymization needed for GDPR compliance is therefore carried out outside the database, during the processing phase, and not in the database itself.

“With this method, the data leaves PostgreSQL. For security reasons, we would prefer that PostgreSQL anonymize the data itself, rather than an external tool.” With PostgreSQL Anonymizer, database architects can specify, from the design stage, the masking rules to be applied to a given column in the database, in keeping with the principle of privacy by design advocated by the GDPR.

PostgreSQL Anonymizer has two components: functions for declaring which data should be anonymized and by which method, and a toolbox of functions that anonymize data in various ways (outright erasure, partial masking, generalization, scrambling, etc.).
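To make these four families of methods concrete, here is a minimal Python sketch of what each one does to a value. It is only an illustration of the general techniques: the function names (erase, partial_mask, generalize, scramble) are hypothetical and are not the extension's own API, which runs inside PostgreSQL.

```python
import random
from typing import List, Optional, Tuple


def erase(value: str) -> Optional[str]:
    """Outright erasure: the value is replaced by nothing at all."""
    return None


def partial_mask(value: str, keep_start: int, keep_end: int, filler: str = "*") -> str:
    """Partial masking: keep the first and last characters, hide the middle."""
    if keep_start + keep_end >= len(value):
        # Value too short to keep anything visible: hide it entirely.
        return filler * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + filler * hidden + value[len(value) - keep_end:]


def generalize(number: int, width: int) -> Tuple[int, int]:
    """Generalization: replace an exact number by the range [low, high) it falls in."""
    low = (number // width) * width
    return (low, low + width)


def scramble(column: List[str], seed: Optional[int] = None) -> List[str]:
    """Scrambling: shuffle a column so values no longer line up with their rows."""
    rng = random.Random(seed)
    shuffled = list(column)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled
```

For instance, partial_mask("0612345678", 2, 2) keeps only the first two and last two digits of a phone number, and generalize(37, 10) turns an exact age into the range 30 to 40.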

Beware of indirect identifiers!

Data masking is complex. This is why Dalibo recommends addressing the problem as early as the database design stage.

“Masking data is simple for easily identifiable information such as a first name, a surname or a telephone number. But it quickly turns into a headache with indirect identifiers. For example, a postal code alone does not identify someone. But add a date of birth and a gender, and the person becomes much easier to identify.”
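The effect described in this quote can be checked with a few lines of Python. The sketch below counts how many people share each combination of values and returns the size of the smallest group; a result of 1 means at least one person is uniquely identifiable. The sample rows, the column names and the smallest_group helper are all made up for the illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_group(rows: List[Dict[str, str]], columns: Sequence[str]) -> int:
    """Size of the smallest set of rows sharing the same values on `columns`."""
    if not rows:
        return 0
    groups = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(groups.values())


# Hypothetical sample data, invented for this example.
people = [
    {"zip": "75011", "birth": "1980-03-12", "gender": "F"},
    {"zip": "75011", "birth": "1975-07-30", "gender": "M"},
    {"zip": "69003", "birth": "1980-03-12", "gender": "F"},
    {"zip": "69003", "birth": "1991-11-02", "gender": "M"},
]

alone = smallest_group(people, ["zip"])
combined = smallest_group(people, ["zip", "birth", "gender"])
```

In this sample, every postal code is shared by two people, so the postal code alone singles out nobody. Once date of birth and gender are added, every combination is unique and each person can be singled out.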

The database architect is in a position to spot indirect identifiers and specify how to protect them. And when a new technique makes it possible to identify a person, it is reported to the database administrator, who can then create an appropriate rule. That rule is global and built directly into the data model, rather than applied only to the processing in question.

Two prestigious sponsors

After a little over three years of development, the PostgreSQL Anonymizer extension is finally available in a first stable version, unveiled in May 2022. The stabilization of the code was funded by two major players working in very different sectors.

“The first is the General Directorate of Public Finances, a large administration for which data protection is of critical importance,” says Damien Clochard. “The other is bioMérieux, which wanted to integrate this solution into its software for medical laboratories. This is a particularly interesting use case, because it concerns a field where the requirements for protecting personal data go beyond what the GDPR calls for.”

PostgreSQL Anonymizer 1.0 is available free of charge, under an open source license, from the Dalibo Labs site.




