Zacharias 🐝 Voulgaris

3 years ago · 2 min. read

Data Synthetics without A.I. and Why This Adds Value to You as an Individual


Data Synthetics is a term I coined for the framework and processes involved in synthesizing data (instead of just analyzing it). In my view, it's by far the most significant thing in data science today and one of the many applications of A.I.: specialized systems that generate new data from a given dataset while maintaining the properties of the original. But isn't there an abundance of data out there? Well, yes, but we could always use some more. The rationale is much like that of a fiction writer, who often prefers to create her own characters for a novel or a short story even though there are plenty of real-world people she could copy into her text. So, if you don't want to be part of someone else's work of fiction (especially one that gets published and read by many other people), you may want to keep your personally identifiable information (PII) from roaming free in the world. Some of that information you may be unable to change (e.g., health-related PII, aka PHI), so protecting it is of paramount importance.

Data synthetics can do this for you by creating new data that is very similar to the existing data, thereby putting an unbridgeable gap between your PII and the data used by, for example, a predictive model. This similarity also helps keep the model's predictions relevant to you, since the general underlying pattern (aka the signal in the data) remains the same.

Plenty of brilliant A.I. professionals, be they scientists or engineers, have delved into this problem and come up with mathematically elegant solutions. One such solution is the Variational AutoEncoder (VAE), a kind of artificial neural network (ANN) that aims to figure out the underlying distributions of the data and create new data based on them. These distributions are a mathematical model that aims to describe the signal. It's not the only such model, and probably not even the best one, but it's good enough for something basic. The problem with VAEs (and other A.I. systems) is that they need sufficiently large datasets to figure out this signal and manifest it in new data. Additionally, building a VAE isn't so simple unless you understand the technology and the not-so-trivial math involved.
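To make the "figure out the distribution, then create new data from it" idea concrete, here is a minimal sketch in Python. It is not a VAE, and it is not the framework described later in this post. It simply assumes that two continuous variables follow a joint Gaussian, which real data often won't. The function names (`fit_gaussian_2d`, `synthesize`) and the toy dataset are hypothetical and exist only for illustration.

```python
import math
import random
import statistics


def fit_gaussian_2d(rows):
    """Estimate the means, standard deviations, and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance, computed by hand to stay compatible with older Pythons.
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def synthesize(params, n, rng):
    """Draw n new points from the fitted two-variable Gaussian."""
    mx, my, sx, sy, rho = params
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 into y gives x and y the fitted correlation rho.
        y = my + sy * (rho * z1 + residual * z2)
        out.append((x, y))
    return out


rng = random.Random(42)

# Hypothetical "original" dataset: y depends linearly on x, plus noise.
original = []
for _ in range(2000):
    x = rng.gauss(50, 10)
    original.append((x, 0.5 * x + rng.gauss(0, 3)))

params = fit_gaussian_2d(original)
synthetic = synthesize(params, 2000, rng)
synthetic_params = fit_gaussian_2d(synthetic)

print("original :", [round(p, 2) for p in params])
print("synthetic:", [round(p, 2) for p in synthetic_params])
```

Comparing `params` with `synthetic_params` shows the point of the exercise: the synthetic rows are all new, yet their means, spreads, and correlation (the signal, under this simplifying assumption) stay close to those of the original data.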

What if there was a way to develop synthetic data without utilizing A.I.? What if all you needed to know was the Math you learned in school and a few other things based on that Math, elegant but not overly sophisticated? Well, that's what I've done recently, with sufficient success to consider it something usable and useful. This framework, which I call ROOF (hence the picture at the top) and developed in Julia 1.5, is light on computational resources and can be applied to any kind of continuous data (there is also a version for ordinal data, though I imagine that's not something you care about that much). If you are in this line of work or know someone who is, feel free to reach out to me. Cheers!

