Zacharias 🐝 Voulgaris

2 years ago · 2 min read


A Data Anonymization Method Immune to a Hacker A.I.

Why?

Data anonymization is a hot topic, and for good reason. With so many privacy breaches, and the potential for even worse scenarios, it's no wonder that everyone in the data community is aware of the matter. This includes data scientists, since we often need to deal with sensitive data of this sort, usually referred to as PII (personally identifiable information). However, common anonymization methods such as hashing, although useful, don't stand a chance against modern A.I. systems that can figure out the information we are trying to hide through clever deductions based on the remaining data.
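
To make the weakness of hashing concrete, here is a minimal sketch of a linkage attack. No A.I. is needed for this toy case: a plain join on the columns left in the clear is enough. All records, column names, and the `reidentify` helper are hypothetical and exist only for illustration.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hash a string the way a naive anonymization step might."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical "anonymized" release: names are hashed, other columns are untouched.
released = [
    {"name_hash": sha256_hex("Alice Smith"), "zip": "10001",
     "birth_year": 1985, "diagnosis": "asthma"},
    {"name_hash": sha256_hex("Bob Jones"), "zip": "94110",
     "birth_year": 1972, "diagnosis": "diabetes"},
]

# Hypothetical public auxiliary data sharing the same quasi-identifiers.
public = [
    {"name": "Alice Smith", "zip": "10001", "birth_year": 1985},
    {"name": "Bob Jones", "zip": "94110", "birth_year": 1972},
    {"name": "Carol White", "zip": "60614", "birth_year": 1990},
]


def reidentify(released_rows, public_rows, keys=("zip", "birth_year")):
    """Link released rows back to names via the columns left in the clear."""
    index = {}
    for person in public_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    found = {}
    for row in released_rows:
        for name in index.get(tuple(row[k] for k in keys), []):
            # Hashing the candidate name confirms the guess.
            if sha256_hex(name) == row["name_hash"]:
                found[name] = row["diagnosis"]
    return found


print(reidentify(released, public))
```

The hash hides the name column, but the remaining columns still single out each person, which is exactly the kind of deduction described above.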


What?

What if there were a method for anonymizing the data at hand while maintaining the relationships among the variables it consists of? This simple question may make the whole process seem straightforward, but it’s much more challenging than it seems. Recently, I talked with some A.I. experts who mentioned this matter and were content with a particular Python package they had found that enables this sort of anonymization while maintaining 65-70% of the information at hand. I also came up with a Julia-based solution to the problem a couple of weeks back, one that maintains a much larger proportion of the information in the data. So, solutions to the anonymization problem exist, if you know how to deal with the data.


How?

Dealing with the data so that you maintain the bulk of the information while making it anonymous isn’t easy. The idea is to create new data that closely resembles the original and use that instead. If this process is done in a stochastic manner (i.e. using randomness in a controlled way), it is impossible to reverse. In other words, no matter how intelligent the hacker, even an A.I. one, going back to the original data is not possible. This is not an issue for the data scientist analyzing this data, since what she works with is the information in that data (aka the signal), which is retained to a large extent, making the anonymized data more or less as valuable as the original.
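
As a rough illustration of the general idea (not the Julia-based solution or the Python package mentioned earlier, neither of which is described here), the sketch below fits the means and covariances of a numeric dataset and then draws fresh random rows from a Gaussian with the same parameters. The function names, the toy age/income data, and the Gaussian assumption are all mine. This approach only preserves linear relationships between variables and does not by itself provide a formal privacy guarantee.

```python
import math
import random
import statistics


def covariance_matrix(columns):
    """Sample covariance matrix of a list of equally long numeric columns."""
    means = [statistics.fmean(c) for c in columns]
    n = len(columns[0])
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T == matrix (positive definite input)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def synthesize(rows, n_samples, seed=None):
    """Draw new rows from a Gaussian fitted to the original rows (assumption)."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    L = cholesky(covariance_matrix(columns))
    synthetic = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in means]  # independent standard normals
        synthetic.append([m + sum(L[i][k] * z[k] for k in range(i + 1))
                          for i, m in enumerate(means)])
    return synthetic


def pearson(xs, ys):
    """Pearson correlation coefficient of two numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical original data: age and an income that depends on age.
_rng = random.Random(0)
original = []
for _ in range(500):
    age = _rng.gauss(40, 10)
    original.append([age, 1000 * age + _rng.gauss(0, 5000)])

synthetic = synthesize(original, n_samples=5000, seed=1)

print("original correlation: ", round(pearson(*zip(*original)), 3))
print("synthetic correlation:", round(pearson(*zip(*synthetic)), 3))
```

The synthetic rows share the age/income relationship of the original rows, yet none of them corresponds to a real record. A production method would also need to handle categorical variables, non-Gaussian distributions, and outliers that could still single out an individual.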


Where?

The best part about this process is that it’s applicable everywhere, across all domains where data science is usable. In every such project, data is eventually transformed into numbers, so regardless of the domain it comes from, it’s possible to secure its privacy with the aforementioned anonymization process. Also, it doesn't matter what you plan to do with this data, since all this takes place in the preprocessing stage of the analysis, before the actual modelling part.


What now?

Now you have the option to anonymize the data at hand without having to worry about a hacker A.I. compromising its privacy. You just need to find a data scientist who is adept at this process (ahem!). So, if you have a proof-of-concept project in mind, you can carry it out even with someone outside your organization, using anonymized data for it. This can open up new possibilities for deriving value from the data at hand without jeopardizing the privacy of the people it describes. So, perhaps ethical use of data is not so far-fetched a concept!



PS – This is the kind of article I would normally publish on my data science and A.I. blog, FoxyDataScience.com. This time I decided to publish it here as it’s easier for real people to comment (blogs tend to get more SEO leeches and other spammers). If you enjoy this article, consider visiting my 100% ad-free blog and checking out my other educational material. Cheers!


Comments
Pascal Derrien

2 years ago #2

Data protection encryption and the right to anonymize data are only at the very beginning of their journey. Thanks 🙏 for reminding us that we should not take things for granted in that domain.

Jerry Fletcher

2 years ago #1

Zacharias, my respect for your skills just went up another notch! Given the stuff I've stumbled upon about AI in the last couple weeks, this is impressive. And so it goes.
