Privacy Enhancing Technologies – Protecting data and still using it: squaring the circle



Research relies on data, but collecting data can violate privacy. This is not an insoluble dilemma: there are technical methods that make it possible to use data and protect it at the same time.

Science can use data to make our lives better: it develops new medicines, analyzes social trends, or helps build smart cities. Public administrations and companies also rely on data, for example to plan public transport or the power grid.

However, private data can be sensitive, so it cannot be easily collected and shared.

Collecting data without violating privacy at the same time? That sounds like an insoluble contradiction. But there is a way to square the circle.

Synthetic instead of anonymous

Traditionally, data is anonymized so that researchers can evaluate it safely. “Mrs. Smith” becomes “Mrs. Meier,” the telephone number becomes “079 *** ** **.” But anonymization has its limits: today, so much data exists about each person that an anonymized record can often be linked with information from the Internet to identify them again.

Synthetic data offers a solution. An artificial intelligence (AI) examines the original data and learns its patterns. It then generates a new, fictitious dataset with the same statistical properties: the phone numbers still have 10 digits and the ratio of women to men is the same as in the original data.
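The idea can be illustrated with a minimal Python sketch. It is not a real AI generator: as a stand-in, it "learns" only two simple properties (the share of women and the length of the phone numbers) and then produces fictitious records that preserve them. The field names `sex` and `phone`, the functions `fit` and `generate`, and the sample records are all made up for illustration.

```python
import random

def fit(records):
    """Learn simple statistics from the original records (stand-in for an AI model)."""
    share_women = sum(1 for r in records if r["sex"] == "F") / len(records)
    phone_digits = len(records[0]["phone"])
    return {"share_women": share_women, "phone_digits": phone_digits}

def generate(model, n, seed=0):
    """Create n fictitious records that keep the learned statistics."""
    rng = random.Random(seed)
    n_women = round(model["share_women"] * n)
    sexes = ["F"] * n_women + ["M"] * (n - n_women)
    rng.shuffle(sexes)
    records = []
    for sex in sexes:
        # Random digits: the number has the right length but belongs to nobody in particular.
        phone = "".join(rng.choice("0123456789") for _ in range(model["phone_digits"]))
        records.append({"sex": sex, "phone": phone})
    return records

# Made-up "original" data: three women, one man, 10-digit phone numbers.
original = [
    {"sex": "F", "phone": "0790000001"},
    {"sex": "F", "phone": "0790000002"},
    {"sex": "M", "phone": "0790000003"},
    {"sex": "F", "phone": "0790000004"},
]

model = fit(original)
synthetic = generate(model, 8)
```

The synthetic dataset has eight records instead of four, yet the ratio of women to men and the phone number format match the original. Real generators learn far richer patterns, such as correlations between columns.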

Researchers can use synthetic data to create statistics. Computer scientists use it to build software or a database.

Fully encrypted

Another way to protect data is encryption. This allows data to be stored or sent without unauthorized persons being able to view it. The problem is that in order for researchers to be able to work with the data, it must be decrypted – and is unprotected in the process.

Not so with homomorphic encryption. This mathematical trick makes it possible to keep calculating with encrypted data: if you add two homomorphically encrypted numbers, the result is the correct sum, itself still in encrypted form.
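To make the addition property concrete, here is a toy Python sketch of the Paillier scheme, one well-known encryption scheme that supports adding encrypted numbers. The choice of Paillier is an assumption made for illustration; the article does not name a specific scheme. The primes are deliberately tiny, so this code is insecure and only shows the arithmetic. The function names are hypothetical.

```python
import math
import secrets

# Toy key material: tiny primes for readability only (insecure by design).
P, Q = 1789, 1867
N = P * Q                  # public modulus; plaintexts must satisfy 0 <= m < N
N2 = N * N
G = N + 1                  # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)       # modular inverse of LAM (needs Python 3.8+)

def encrypt(m):
    """Encrypt integer m with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover the plaintext using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Add the hidden plaintexts; in Paillier this is a product of ciphertexts."""
    return (c1 * c2) % N2

c1 = encrypt(17)
c2 = encrypt(25)
c_sum = add_encrypted(c1, c2)   # computed without ever decrypting c1 or c2
result = decrypt(c_sum)
```

Whoever computes `c_sum` sees only ciphertexts. Only the holder of the private key can decrypt it and read the sum of 17 and 25.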

Use without collecting

When it comes to rare diseases or genetic analyses, the data from a single hospital is not enough – hospitals have to pool data.

This is possible without sharing the data. Instead of sending the hospital data to researchers, the researchers send their models to the hospitals. The hospitals do the calculations on site and only send the results back. The researchers then combine the models from the hospitals.
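For a simple statistic such as an average, the principle could look like the following Python sketch. Each hospital runs `local_summary` on its own premises and returns only two aggregate numbers; the researchers combine these into a pooled mean. The function names and the patient values are hypothetical.

```python
def local_summary(values):
    """Runs inside the hospital: returns aggregates only, never patient rows."""
    return {"count": len(values), "total": sum(values)}

def combine(summaries):
    """Runs at the researchers' site: pooled mean from the hospitals' aggregates."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / count

# Made-up measurements that stay inside each hospital.
hospital_a = [54, 61, 47]
hospital_b = [70, 58]

pooled_mean = combine([local_summary(hospital_a), local_summary(hospital_b)])
```

The pooled mean equals the mean over all five patients, although no individual value ever left a hospital.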

The principle works not only for simple statistics, but also for machine learning and artificial intelligence. In so-called federated learning, each hospital trains a model on its own data. A central location combines the individual models and sends the resulting main model back to the hospitals. The process repeats until the AI is fully trained.
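The back-and-forth can be sketched in Python with a deliberately tiny model: a single weight `w` in the formula y = w * x. This is a minimal illustration of federated averaging under assumed settings (learning rate, number of rounds, made-up data), not a description of any particular production system.

```python
def local_train(w, data, lr=0.01, epochs=5):
    """Runs inside a hospital: a few gradient steps on local data only."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(hospitals, rounds=50):
    """Central location: send the model out, then average what comes back."""
    w = 0.0  # initial main model
    for _ in range(rounds):
        # Each hospital returns only its updated weight and its sample count.
        updates = [(local_train(w, data), len(data)) for data in hospitals]
        total = sum(n for _, n in updates)
        # Weighted average: hospitals with more data count more.
        w = sum(w_local * n for w_local, n in updates) / total
    return w

# Made-up local datasets of (x, y) pairs, roughly following y = 2 * x.
hospitals = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(3.0, 6.2), (4.0, 7.8), (5.0, 10.1)],
]

w_final = federated_average(hospitals)
```

After the rounds finish, `w_final` is close to 2, the trend hidden in both datasets, even though the central location only ever saw model weights and sample counts.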

The future of data protection

All of these technologies are so-called PETs, “Privacy Enhancing Technologies”. They are already being used today, but there are still some hurdles: there is a lack of know-how, resources and regulatory guidelines.

But the future is promising: thanks to PETs, data can be protected and still used. This means that more data could soon be shared securely and used for research.
