Zacharias 🐝 Voulgaris

4 years ago · 2 min read · ~10 ·

A Data Anonymization Method Immune to a Hacker A.I.

Why?

Data anonymization is a hot topic, and for good reason. With so many privacy breaches, and the potential for even worse scenarios, it's no wonder that everyone in the data community is aware of the matter. This includes data scientists, since we often need to deal with sensitive data of this sort, usually referred to as personally identifiable information (PII). However, common anonymization methods such as hashing, although useful, don't stand a chance against modern A.I. systems that can figure out the information we are trying to hide through clever deductions based on the remaining data.
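
To make the weakness concrete, here is a minimal sketch of how hashed records can be re-identified from the remaining data. It does not reproduce any particular attack or A.I. system; the records, field names, and the `hash_id` and `link` helpers are all made up for illustration, and the "attacker" is just a plain join on the columns that were left untouched.

```python
import hashlib


def hash_id(value: str) -> str:
    """Hash an identifier with SHA-256 (a common, naive anonymization step)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical "anonymized" release: names are hashed, but the
# quasi-identifiers (zip code, birth year) are left as they were.
released = [
    {"name_hash": hash_id("Alice Example"), "zip": "10001",
     "birth_year": 1985, "diagnosis": "asthma"},
    {"name_hash": hash_id("Bob Example"), "zip": "10002",
     "birth_year": 1990, "diagnosis": "diabetes"},
]

# Hypothetical auxiliary data the attacker already has (e.g. a public list).
public = [
    {"name": "Alice Example", "zip": "10001", "birth_year": 1985},
    {"name": "Bob Example", "zip": "10002", "birth_year": 1990},
]


def link(released_rows, public_rows):
    """Match released rows to names using only the untouched columns.

    Assumes each (zip, birth_year) pair is unique in the public data.
    """
    index = {(p["zip"], p["birth_year"]): p["name"] for p in public_rows}
    matches = {}
    for row in released_rows:
        key = (row["zip"], row["birth_year"])
        if key in index:
            matches[index[key]] = row["diagnosis"]
    return matches


reidentified = link(released, public)
print(reidentified)
```

The hash is never broken here; it is simply bypassed, which is why hiding the identifier column alone offers little protection when the other columns are distinctive.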


What?

What if there were a method for anonymizing the data at hand while maintaining the relationships among the variables it consists of? This simple question may make the whole process seem straightforward, but it's much more challenging than it seems. Recently, I talked with some A.I. experts who mentioned this matter and were content with a particular Python package they had found, which enables this sort of anonymization while maintaining 65-70% of the information at hand. I also came up with a Julia-based solution to the problem a couple of weeks back, one that maintains a much larger proportion of the information in the data. So, solutions to the anonymization problem exist, if you know how to deal with the data.


How?

Dealing with the data so that you maintain the bulk of the information while making it anonymous isn't easy. The idea is to create new data that closely resembles the original and use that instead. If this is done in a stochastic manner (i.e. using randomness in a controlled way), the process is impossible to reverse. In other words, no matter how intelligent the hacker, even an A.I. one, going back to the original data is not possible. This is not an issue for the data scientist analyzing the data, since what she works with is the information in it (aka the signal), which is retained to a large extent, making the anonymized data more or less as valuable as the original.
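
As a rough illustration of the general idea (and not the Julia method or the Python package mentioned above, neither of which is described here), the sketch below fits a multivariate normal distribution to some numeric data and samples brand-new rows from it. The toy columns, the `synthesize` helper, and the Gaussian assumption are all mine; this simple approach only preserves means and linear correlations, and it is not by itself a formal privacy guarantee.

```python
import numpy as np


def synthesize(data: np.ndarray, n_samples: int, seed=None) -> np.ndarray:
    """Sample synthetic rows from a multivariate normal fitted to `data`.

    Only the column means and the covariance matrix of the original data
    are used, so no original row is copied into the output.
    """
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Toy "original" data: two related numeric variables (made up).
rng = np.random.default_rng(0)
age = rng.normal(40, 10, size=2000)
income = 1000 * age + rng.normal(0, 5000, size=2000)
original = np.column_stack([age, income])

synthetic = synthesize(original, n_samples=2000, seed=1)

# Compare how strongly the two variables are related in each dataset.
orig_corr = np.corrcoef(original, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"original correlation:  {orig_corr:.3f}")
print(f"synthetic correlation: {synth_corr:.3f}")
```

The two correlations should come out close to each other, which is the "signal" an analyst cares about, while the individual synthetic rows are random draws rather than transformed copies of real records. Real data with categorical or strongly non-Gaussian variables would need a richer model than this.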


Where?

The best part about this process is that it's applicable everywhere, across all domains where data science is usable. In every such project, data is eventually transformed into numbers, so regardless of the domain it comes from, it's possible to secure its privacy with the aforementioned anonymization processes. Also, it doesn't matter what you plan to do with this data, since all of this takes place in the preprocessing stage of the analysis, before the actual modelling part.


What now?

Now you have the option to anonymize the data at hand without having to worry about a hacker A.I. compromising its privacy. You just need to find a data scientist who is adept at this process (ahem!). So, if you have a proof-of-concept project in mind, you can carry it out even with someone outside your organization, using anonymized data for it. This can open up new possibilities for deriving value from the data at hand without jeopardizing the privacy of the people it describes. So, perhaps the ethical use of data is not such a far-fetched concept after all!



PS – This is the kind of article I would normally publish on my data science and A.I. blog, FoxyDataScience.com. This time I decided to publish it here as it’s easier for real people to comment (blogs tend to get more SEO leeches and other spammers). If you enjoy this article, consider visiting my 100% ad-free blog and checking out my other educational material. Cheers!


Comments

Pascal Derrien

4 years ago #2

Data protection, encryption, and the right to anonymize data are only at the very beginning of their journey. Thanks 🙏 for reminding us that we should not take things for granted in that domain.

Jerry Fletcher

4 years ago #1

Zacharias, my respect for your skills just went up another notch! Given the stuff I've stumbled upon about AI in the last couple of weeks, this is impressive. And so it goes.
