A Data Anonymization Method Immune to a Hacker A.I.
Why?
Data anonymization is a hot topic, and for good reason. With so many privacy breaches, and the potential for even worse scenarios, it's no wonder that everyone in the data community is aware of the matter. This includes data scientists, since we often need to deal with sensitive data of this sort, usually referred to as personally identifiable information (PII). However, common anonymization methods such as hashing, although useful, don't stand a chance against modern A.I. systems, which can figure out the information we are trying to hide through clever deductions based on the remaining data.
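To make the weakness of hashing concrete, here is a minimal Python sketch of a deduction "based on the remaining data". Everything in it is made up for illustration: the records, the column names, and the `link` helper are assumptions, not taken from any real dataset or tool. It shows that hashing the name column does little if other columns (here ZIP code and birth year) can be matched against data the attacker already holds. A plain join is enough for this; an A.I. system could presumably do the same with fuzzier matches.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hash a string the way a naive anonymization step might."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# "Anonymized" release: names are hashed, the other columns are left as-is.
# All records are invented for this example.
released = [
    {"name_hash": sha256_hex("Alice Example"), "zip": "10001",
     "birth_year": 1984, "diagnosis": "asthma"},
    {"name_hash": sha256_hex("Bob Sample"), "zip": "94110",
     "birth_year": 1990, "diagnosis": "diabetes"},
]

# Auxiliary data an attacker might already hold, e.g. a public directory.
directory = [
    {"name": "Alice Example", "zip": "10001", "birth_year": 1984},
    {"name": "Bob Sample", "zip": "94110", "birth_year": 1990},
    {"name": "Carol Nobody", "zip": "60601", "birth_year": 1975},
]


def link(released_rows, directory_rows):
    """Match released rows to names via the unhashed columns."""
    names_by_key = {}
    for person in directory_rows:
        key = (person["zip"], person["birth_year"])
        names_by_key.setdefault(key, []).append(person["name"])

    reidentified = {}
    for row in released_rows:
        candidates = names_by_key.get((row["zip"], row["birth_year"]), [])
        # A unique match re-identifies the person despite the hashed name.
        if len(candidates) == 1:
            reidentified[candidates[0]] = row["diagnosis"]
    return reidentified


reidentified = link(released, directory)
print(reidentified)
```

The hash is never reversed here; the remaining columns alone give the person away.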
What?
What if there was a method for anonymizing the data at hand while maintaining the relationships among its variables? This simple question may make the whole process seem straightforward, but it’s much more challenging than it seems. Recently, I talked with some A.I. experts who mentioned this matter and were content with a particular Python package they had found, which enables this sort of anonymization while maintaining 65-70% of the information at hand. I also came up with a Julia-based solution to the problem a couple of weeks back, one that maintains a much larger proportion of the information in the data. So, solutions to the anonymization problem exist, if you know how to deal with the data.
How?
Dealing with the data so that you maintain the bulk of the information while making it anonymous isn’t easy. The idea is to create new data that closely resembles the original and use that instead. If this is done in a stochastic manner (i.e. using randomness in a controlled way), the process is impossible to reverse. In other words, no matter how intelligent the hacker, even an A.I. one, going back to the original data is not possible. This is not an issue for the data scientist analyzing the data, since what she works with is the information in that data (aka the signal), which is retained to a large extent, making the anonymized data more or less as valuable as the original.
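As a rough illustration of the general idea (and not the Julia method or the Python package mentioned earlier, neither of which is described here), the sketch below fits a multivariate Gaussian to some numeric data and draws fresh random rows from it. The `synthesize` function, the toy columns, and the choice of a Gaussian model are all assumptions made for this example. A Gaussian only captures linear relationships between variables, so treat it as the simplest possible stand-in for "new data that resembles the original".

```python
import numpy as np


def synthesize(data: np.ndarray, n_samples: int,
               rng: np.random.Generator) -> np.ndarray:
    """Draw new rows from a Gaussian fitted to the columns of `data`.

    Hypothetical helper: it keeps the column means and the covariance
    between columns, but no synthetic row is copied from the original.
    """
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # columns are the variables
    return rng.multivariate_normal(mean, cov, size=n_samples)


rng = np.random.default_rng(42)

# Made-up "original" data: two strongly related numeric columns.
x = rng.normal(50.0, 10.0, size=2000)
y = 2.0 * x + rng.normal(0.0, 5.0, size=2000)
original = np.column_stack([x, y])

synthetic = synthesize(original, n_samples=2000, rng=rng)

# The relationship between the columns (the signal) should carry over.
corr_original = np.corrcoef(original, rowvar=False)[0, 1]
corr_synthetic = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"correlation, original:  {corr_original:.3f}")
print(f"correlation, synthetic: {corr_synthetic:.3f}")
```

The analyst would then work with `synthetic` in place of `original`. Because each synthetic row comes from a random draw rather than a transformation of a specific original row, there is no per-row mapping to invert.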
Where?
The best part about this process is that it’s applicable everywhere, across all domains where data science is usable. In every such project, data is eventually transformed into numbers, so regardless of the domain it comes from, it’s possible to secure its privacy with the aforementioned anonymization process. Also, it doesn't matter what you plan to do with this data, since all this takes place in the preprocessing stage of the analysis, before the actual modelling.
What now?
Now you have the option to anonymize the data at hand without having to worry about a hacker A.I. compromising its privacy. You just need to find a data scientist who is adept at this process (ahem!). So, if you have a proof-of-concept project in mind, you can carry it out even with someone outside your organization, using anonymized data for it. This can open up new possibilities for deriving value from the data at hand without jeopardizing the privacy of the people involved in that data. So, perhaps ethical use of data is not so far-fetched a concept!
PS – This is the kind of article I would normally publish on my data science and A.I. blog, FoxyDataScience.com. This time I decided to publish it here as it’s easier for real people to comment (blogs tend to get more SEO leeches and other spammers). If you enjoy this article, consider visiting my 100% ad-free blog and checking out my other educational material. Cheers!