Zacharias 🐝 Voulgaris

2 weeks ago · 5 min read

Data Management Best Practices for Modern Backend Data Security

Source: (after some processing work)

The Backend Data Security Threat Is Real!

Unless you’ve been living under a rock, chances are that you’ve heard of at least one of the many data security breaches of the past year. If you’ve been paying attention, you’ll also remember that most of them involved backend data security, or the lack thereof, and had devastating consequences. Some of the highlights include:

Facebook (April) – 533 million user records were compromised, including phone numbers and other personally identifiable information.

LinkedIn (April) – About half a billion user profiles were compromised. A small fraction of this information (personal for the most part) was then sold on the dark web.

Volkswagen & Audi (June) – The data of 3.3 million customers was compromised, much of it sensitive.

CVS Health (June) – Over a billion search records of the company’s customers, totaling more than 200 GB and including health-related data, were leaked through a poorly protected database.

OneMoreLead (August) – The data of 126 million individuals, most of it personal, was leaked through an unsecured online database.

Note that some of these companies are well-known international corporations, which shows that no one is beyond the reach of the hackers behind such breaches. Still, hackers tend to be opportunistic: if you are aware of the data security risks and address them, they will most likely leave you alone and move on to an easier target.

Common Backend Frameworks Used Today

Just so that we are all on the same page, let’s clarify what we mean by backend and backend frameworks.

First of all, the backend is the server-side part of an application, where its data is stored and processed. A backend framework is a piece of software that streamlines the work done there, including the various ETL processes involved. Such a framework doesn’t just save you time; it also helps standardize backend-related tasks across application domains. Note, however, that these frameworks don’t guarantee data security, even if some of them have relevant protocols in place. Still, for most modern use cases, they tend to be a better option than a pipeline built from scratch.

The backend frameworks most commonly used today are Django, Ruby on Rails, Flask, Laravel, Phoenix, CakePHP, Express, and Spring Boot. Although most of these frameworks are designed with DevOps professionals in mind, it’s good to be familiar with them even if you work with an application’s backend in a different capacity, such as building an API for it or extending its functionality.

Traditional backend workflow. Source:

Best Practices for Data Governance in the Backend

Fortunately, there is something you can do to make sure these backend vulnerabilities aren’t exploited. The relevant measures fall into one or more of the following categories: infrastructure, specialized software, encrypted databases (RDBMS), data encryption (especially for searching and indexing), backups, and general Cybersecurity measures. Let’s look at each of them in more detail.

Infrastructure-related data security

Implement change management in databases and audit them regularly. This measure may sound like a no-brainer, but it’s important. If the server hosting the database logs all activity on it, such as failed login attempts, it can raise an alarm when something looks suspicious, which may deter a potential hacker. Naturally, checking these logs regularly, and even analyzing them with the help of a data scientist, can add actionable information to the whole process.
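To illustrate the alarm idea, here is a minimal sketch that counts failed login attempts per source IP in a log and flags the IPs above a threshold. The log line format, the `FAILED_LOGIN` pattern, and the default threshold of 5 are hypothetical assumptions; a real database server or SSH daemon writes its own format, so the regular expression would need adapting.

```python
import re
from collections import Counter

# Hypothetical log format (an assumption, adapt to your server), e.g.:
# "2021-09-01T10:00:00 FAILED LOGIN user=admin ip=203.0.113.7"
FAILED_LOGIN = re.compile(r"FAILED LOGIN user=(?P<user>\S+) ip=(?P<ip>\S+)")


def suspicious_ips(log_lines, threshold=5):
    """Return {ip: count} for IPs with at least `threshold` failed logins."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = ["2021-09-01T10:00:00 FAILED LOGIN user=admin ip=203.0.113.7"] * 6
    sample.append("2021-09-01T10:05:00 LOGIN OK user=alice ip=198.51.100.2")
    for ip, n in suspicious_ips(sample).items():
        print(f"ALERT: {n} failed logins from {ip}")
```

In practice the alert would go to a monitoring system rather than standard output, and the threshold would be applied per time window.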

Perform vulnerability assessments and Cybersecurity penetration tests. Tools like nmap and OpenVAS can help identify infrastructure vulnerabilities, such as open ports and the versions of the services running on them, so you can make sure everything meets proper Cybersecurity standards. To put all this to the test, you can have a Cybersecurity professional perform penetration tests to assess how hard it is to break into the computers involved.
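nmap does far more than this, but the core of a basic TCP connect scan fits in a few lines. The sketch below uses only Python's standard `socket` module to report which ports in a given list accept connections on a host. It is IPv4-only, and the timeout value is an assumption; only scan machines you are authorized to test.

```python
import socket


def open_tcp_ports(host, ports, timeout=0.5):
    """Return the sorted subset of `ports` that accept a TCP connection on `host`.

    This is a plain connect scan: it completes the full TCP handshake,
    so the target can easily log it.
    """
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex returns 0 on success and an error code otherwise
                if sock.connect_ex((host, port)) == 0:
                    found.append(port)
            except OSError:
                pass  # e.g. unresolvable host; treat the port as not open
    return sorted(found)
```

For example, `open_tcp_ports("127.0.0.1", [22, 80, 443, 3306, 5432])` lists which of those common service ports are listening on the local machine. Unlike nmap, it does not identify the service versions behind the open ports.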

Data security through specialized software

You need to use software from the following categories for each computer handling sensitive data:



Pop-up blockers


Practices related to encryption

Use encryption in any exposed data streams. If data can be accessed by others, it needs to be encrypted, preferably with a strong, well-vetted algorithm such as AES-256 (the cipher usually meant by "military-grade" encryption). For data stored on a server, this can be software-based (e.g., BitLocker disk encryption) or hardware-assisted (e.g., a Trusted Platform Module chip on the server’s motherboard, which protects the encryption keys). For data in transit over a network, the standard choice is TLS.
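For data streams that travel over a network, a minimal sketch with Python's standard `ssl` module might look like this. It builds a client context that verifies server certificates and hostnames and refuses anything older than TLS 1.2; that TLS 1.2 floor is an assumed sensible minimum, not a requirement stated in the text.

```python
import socket
import ssl


def make_client_context():
    """Client-side TLS context with certificate and hostname verification."""
    ctx = ssl.create_default_context()  # loads the system CA certificates
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject TLS 1.0/1.1 and older
    return ctx


def open_encrypted_stream(host, port=443):
    """Connect to host:port and return a TLS-wrapped socket."""
    ctx = make_client_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname enables SNI and is checked against the certificate
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the plain socket if the handshake fails
        raise
```

For example, `open_encrypted_stream("example.com")` returns a socket whose `version()` reports the negotiated protocol; anything sent through it is encrypted on the wire.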

Use data encryption for searching and indexing. This can leverage searchable symmetric encryption (SSE), public-key encryption with keyword search (PEKS), or even homomorphic encryption (ideally, fully homomorphic encryption). You can also conceal the search indexes in more creative ways, so that they are useless to a hacker who intercepts them.
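Full SSE or homomorphic schemes need specialized libraries, but one simple way to conceal a search index, often called a blind index, can be shown with the standard library alone. Each keyword is replaced by an HMAC-SHA256 token computed with a secret key, so the server can match exact keywords without ever storing them in plaintext. This is a simplified stand-in, not SSE proper: identical keywords always produce identical tokens, which leaks how often each term occurs. The key handling is an assumption; in practice the key would live in a key management system, away from the database.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def token(key: bytes, keyword: str) -> str:
    """Deterministic, keyed token for a keyword (HMAC-SHA256, hex-encoded)."""
    normalized = keyword.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


class BlindIndex:
    """Maps keyword tokens to document IDs; never stores plaintext keywords."""

    def __init__(self, key: bytes):
        self._key = key
        self._index = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in text.split():
            self._index[token(self._key, word)].add(doc_id)

    def search(self, keyword: str) -> set:
        return set(self._index.get(token(self._key, keyword), set()))


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical: load from a KMS in practice
    idx = BlindIndex(key)
    idx.add("rec-1", "diabetes prescription")
    idx.add("rec-2", "flu vaccine")
    print(idx.search("Diabetes"))  # {'rec-1'}
```

Someone who steals the index without the key sees only opaque hex tokens mapped to record IDs.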

Make use of encrypted databases. Some DBMSs, such as CryptDB and Cipherbase, provide a high level of security for the data stored in them. Be sure to read their documentation carefully, however, since vulnerabilities may remain; some of them are described in the docs themselves.

Backing up practices

Back up all the data regularly. Many people do this on their personal computers too, since hackers aren’t the only way to lose your data. You can automate the process with a bash script or dedicated backup software.
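The paragraph above mentions automating backups with a bash script; to keep all the examples here in one language, the sketch below does the equivalent in Python. It archives a directory into a timestamped `.tar.gz` file and writes a SHA-256 checksum next to it, so a later restore can first verify that the archive is intact. The directory paths are hypothetical, and scheduling (cron, systemd timers, Task Scheduler) is left out.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def _checksum_path(archive: Path) -> Path:
    """Path of the checksum file stored next to an archive."""
    return archive.parent / (archive.name + ".sha256")


def back_up(source_dir, backup_dir) -> Path:
    """Archive source_dir into backup_dir as a timestamped .tar.gz file.

    A SHA-256 checksum is written beside the archive for later verification.
    """
    source = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Microseconds in the timestamp keep back-to-back runs from colliding.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    archive = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)

    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    _checksum_path(archive).write_text(f"{digest}  {archive.name}\n")
    return archive


def verify(archive) -> bool:
    """Return True if the archive still matches its stored checksum."""
    archive = Path(archive)
    expected = _checksum_path(archive).read_text().split()[0]
    return hashlib.sha256(archive.read_bytes()).hexdigest() == expected
```

The checksum file uses the same "hash, two spaces, filename" layout as `sha256sum`, so `sha256sum -c` can also check it from the shell.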

Use RAID. In some cases, it might make sense to invest in a RAID configuration. Mirrored levels such as RAID 1+0 or 0+1 add redundancy to the data and can also speed up some I/O operations, whereas RAID 0 only stripes data for speed and provides no redundancy at all. Which RAID setup you use will depend on your data and your budget.
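To make the redundancy point concrete, the toy model below mimics the RAID 1+0 layout in memory: data blocks are striped across mirrored pairs of "disks", so the array survives the loss of any single disk, or of one disk from each pair. It only illustrates the layout; real RAID is handled by a controller or the OS (e.g., Linux `mdadm`), and the default of two mirrored pairs (four disks) is an assumption.

```python
def raid10_write(blocks, pairs=2):
    """Stripe blocks across `pairs` mirrored pairs; returns 2 * pairs disks."""
    disks = [[] for _ in range(2 * pairs)]
    for i, block in enumerate(blocks):
        pair = i % pairs  # striping: consecutive blocks go to different pairs
        disks[2 * pair].append(block)      # primary copy
        disks[2 * pair + 1].append(block)  # mirror copy
    return disks


def raid10_read(disks, failed=()):
    """Rebuild the original block order, ignoring the disk indices in `failed`."""
    surviving = []
    for p in range(len(disks) // 2):
        a, b = 2 * p, 2 * p + 1
        if a not in failed:
            surviving.append(disks[a])
        elif b not in failed:
            surviving.append(disks[b])
        else:
            raise RuntimeError(f"both disks of mirror pair {p} failed: data lost")
    # Interleave the stripes back into their original order.
    out = []
    for i in range(max(len(stripe) for stripe in surviving)):
        for stripe in surviving:
            if i < len(stripe):
                out.append(stripe[i])
    return out
```

For example, after `disks = raid10_write([b"A", b"B", b"C"])`, calling `raid10_read(disks, failed={0, 3})` still returns all three blocks, while losing disks 0 and 1 (a whole mirror pair) raises an error.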

General Cybersecurity measures

Identify sensitive data and label it accordingly. This measure can help not just you but other professionals dealing with this data in the future. After all, hackers tend not to value all data the same way!
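A first pass at identifying sensitive data can be automated. The sketch below scans the values of each field in a list of records and labels a field as sensitive when any value looks like an email address or a phone number. The two regular expressions and the label names are simplified assumptions; real discovery of personal data covers many more patterns (IDs, card numbers, health codes) and usually needs human review.

```python
import re

# Simplified, illustrative patterns (assumptions): they will miss some
# formats and may occasionally match non-sensitive values such as dates.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def label_columns(records):
    """Return {field: set_of_labels} for a list of dict records."""
    labels = {}
    for record in records:
        for field, value in record.items():
            found = labels.setdefault(field, set())
            text = str(value).strip()
            for name, pattern in PATTERNS.items():
                if pattern.match(text):
                    found.add(name)
    return labels
```

Fields that end up with a non-empty label set, e.g. `{f for f, l in label_columns(rows).items() if l}`, are the candidates to tag as sensitive in your schema or data catalog.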

Develop a policy for dealing with sensitive data. You may need the help of a Cybersecurity professional for this one, but it’s worth it, especially if you handle lots of sensitive data.

Manage access to sensitive data. This measure can involve something as simple as setting up an authentication process requiring a password or biometrics. It doesn’t cost much, and you only have to do it once. Again, having a Cybersecurity professional helping you with this may prove very useful.
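If the authentication step is password-based, the passwords themselves should never be stored in plaintext. A minimal sketch with the standard library: hash each password with PBKDF2-HMAC-SHA256 and a random per-user salt, and compare hashes in constant time. The default of 600,000 iterations is an assumption based on current OWASP guidance for this algorithm and should be tuned to your hardware; Argon2 or bcrypt, available through third-party packages, are common alternatives.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return 'iterations$salt_hex$hash_hex', suitable for storing in a database."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # compare_digest avoids leaking how many leading bytes matched
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Storing the iteration count alongside the salt lets you raise the work factor later without invalidating existing hashes.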

Disable or enable certain OS features. Legacy mechanisms such as LanMan (LM) authentication on Windows and unnecessary setuid or setgid programs on Linux can be disabled. At the same time, you can enable logging of important system events, ensure that all user accounts have passwords (preferably strong ones), and make sure that accounts exist only for the users who actually need access to the system.
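Before disabling setuid or setgid programs, you first need to find them. On Linux this is commonly done with `find / -perm /6000 -type f`; the Python sketch below performs the same walk with the standard `os` and `stat` modules, so the results can be reviewed or compared against an approved list. The starting directory is up to you, and scanning `/` requires appropriate permissions.

```python
import os
import stat


def find_setid_files(root):
    """Yield (path, flags) for regular files under root with setuid/setgid set."""
    for dirpath, _dirnames, filenames in os.walk(root):  # symlinked dirs not followed
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
            except OSError:
                continue  # file vanished or is unreadable
            if not stat.S_ISREG(mode):
                continue
            flags = []
            if mode & stat.S_ISUID:
                flags.append("setuid")
            if mode & stat.S_ISGID:
                flags.append("setgid")
            if flags:
                yield path, flags
```

For example, iterating over `find_setid_files("/usr/bin")` lists candidates such as `passwd` or `sudo`; anything you don't recognize is worth investigating before removing its special bits with `chmod`.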

Ensure all systems and programs are patched. Security patches are commonplace today and are released regularly. Keeping all your programs, including the operating system, up to date is crucial for closing known vulnerabilities before a hacker can exploit them.
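Patch status can be checked automatically as well. The sketch below compares an inventory of installed versions against the minimum versions that contain known security fixes and reports what needs updating. Both inventories and the package names are hypothetical; in practice the installed list would come from your package manager and the minimums from vendor advisories. It also assumes plain dotted numeric versions, so strings like `1.1.1k` would need extra parsing.

```python
def parse_version(text):
    """'2.4.10' -> (2, 4, 10); assumes plain dotted numeric versions."""
    return tuple(int(part) for part in text.split("."))


def is_older(current, required):
    """True if version `current` is lower than `required`."""
    a, b = parse_version(current), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def needs_patch(installed, minimum_secure):
    """Return sorted names whose installed version is below the secure minimum.

    installed:      {name: version} of what is deployed
    minimum_secure: {name: first version containing the security fix}
    """
    return sorted(
        name
        for name, required in minimum_secure.items()
        if name in installed and is_older(installed[name], required)
    )
```

For example, with a hypothetical `{"web-server": "1.20.1"}` installed and `{"web-server": "1.20.2"}` as the secure minimum, `needs_patch` returns `["web-server"]`.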

How These Practices Have Changed Over Recent Years

Although general Cybersecurity practices have remained more or less the same, other data management practices have evolved over the past few years. Regarding infrastructure, for example, there's now more emphasis on vulnerability assessment and penetration testing, mainly because many of today's hacks target networks, which has made network security a much more widespread concern.

Additionally, encryption-related practices have become both more mainstream and more specialized. Encryption used to mean simply applying basic public-key encryption to data in transit; now it is widely used for data at rest too. At the same time, homomorphic encryption is gaining traction, mainly thanks to its latest versions, which are considerably faster and easier to use.

As for backup practices, these are also more widely used, particularly in more sophisticated setups. With the cost of storage having dropped, and still dropping, more IT professionals now use RAID setups, since swapping out a failed hard disk in a redundant array is much more practical than manually restoring its data with a conventional backup program.

Beyond these practices, the one thing that has noticeably changed is our attitude towards Cybersecurity. Now more people have become aware of the risks, and they have started to expect it in many IT applications. It used to be a nice-to-have, but now it is more of a requirement.

Parting Thoughts

There is no doubt that this topic is both practical and relevant to the daily work of any developer or data engineer. To make the most of it, however, you need to incorporate at least some of the practices mentioned previously into your workflow. Thinking about the importance of data security is also paramount, especially if it leads you to refine the ETL pipelines you work with and make them more secure. After all, when it comes to Cybersecurity, it’s better to spend more time on it than less, especially if you care about your end users. If you wish to learn more about Cybersecurity and how it affects data-related work, check out my mini-courses on WintellectNow. Cheers!
