Mind the gap, mind the data – the era of remote operations

Maciej Mazur, Chief Data Scientist, PGS Software

The importance of data and data systems has grown rapidly in recent years. Many businesses have transformed into data-driven organizations to maximize income, to better understand their users, or even to create new value streams. Then, suddenly, the world had to change its mindset: phrases such as ‘on-site’, ‘face-to-face’ and ‘direct’ were dropped from the dictionary in favour of ‘remote’ and ‘distributed’.

But how can governments, usually associated with doing things the old way and with little experience of working remotely, cope with this new reality, especially in the era of an epidemic threat?

Distributed Communities

Many countries have elections coming up. With an epidemic threat unfolding, it’s a challenge to make sure that all citizens’ rights are respected (for example, a citizen’s right to vote while under quarantine).

On the one hand, governments should try to keep public services running unaffected; on the other hand, they also need special solutions to cope with the necessary restrictions. The open questions include how candidates and parties are going to campaign and, above all, how people will be able to vote when they shouldn’t (or legally can’t) leave their homes.

The latter doesn’t seem to be much of a challenge from a technical perspective: gathering votes is a simple data analytics process, in which data from many distributed points must be collected, transformed into a common view, stored and counted. Unfortunately, implementing such solutions isn’t that simple, as the 2016 US presidential election showed (or even this year’s Democratic caucuses in Iowa!).
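The gather, transform and count flow can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: the `StationReport` format, the station IDs and the validation rules are hypothetical, and the sketch deliberately leaves out authentication, transmission and storage, which is exactly where real deployments get hard.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StationReport:
    """Hypothetical report: one polling point's tally, candidate name -> votes."""
    station_id: str
    tally: dict


def is_valid(report: StationReport, known_stations: set, seen: set) -> bool:
    """Reject unknown stations, repeated submissions and impossible counts."""
    return (
        report.station_id in known_stations
        and report.station_id not in seen
        and all(isinstance(v, int) and v >= 0 for v in report.tally.values())
    )


def aggregate(reports, known_stations):
    """Gather distributed reports into one common view and count the votes."""
    totals = Counter()
    seen = set()
    rejected = []
    for report in reports:
        if not is_valid(report, known_stations, seen):
            rejected.append(report.station_id)
            continue
        seen.add(report.station_id)
        for candidate, votes in report.tally.items():
            # Normalize names so "alice" and "Alice " count for the same candidate.
            totals[candidate.strip().title()] += votes
    return totals, rejected


reports = [
    StationReport("A", {"alice": 120, "Bob": 80}),
    StationReport("B", {"Alice ": 95, "bob": 110}),
    StationReport("A", {"Alice": 500}),  # duplicate submission
    StationReport("Z", {"Bob": 999}),    # unknown station
]
totals, rejected = aggregate(reports, known_stations={"A", "B"})
print(dict(totals))  # {'Alice': 215, 'Bob': 190}
print(rejected)      # ['A', 'Z']
```

The counting itself is trivial; almost all of the code is about deciding which inputs to trust, which is where the real challenges lie.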

Considering this, one of the main challenges in providing these special solutions is the protection of data and the prevention of cybercrime.

Distributed Challenges

Firstly, it’s important to realize that these challenges are spread across various layers, each coming with a set of difficulties:

  • Real-time data gathering – a growing number of edge devices requires standardization, high throughput, and the ability to collect and send valid data securely and, above all, to authenticate users.
  • Transmission – encrypting traffic from edge locations is becoming a critical requirement in every distributed solution. Fortunately, 5G is emerging.
  • Real-time data processing – identifying critical data and securing storage is only the beginning. Gathered information and results must also be processed and protected immediately.
  • Security and severity – protecting distributed systems becomes harder for a simple reason: the number of possible attack vectors grows with every new edge device, transmission standard or channel.
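To make the authentication requirement from the first bullet more concrete, here is a minimal sketch of how an edge device might prove that a message came from it and was not altered in transit, using an HMAC over a canonical JSON encoding (Python standard library only). The device IDs and the pre-shared key table are hypothetical; a real deployment would also need key provisioning, replay protection (for example nonces or timestamps) and transport encryption such as TLS, none of which is shown.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared keys, one per edge device, provisioned out of band.
DEVICE_KEYS = {"edge-001": b"example-key-not-for-production"}


def _canonical(payload: dict) -> bytes:
    """Stable byte encoding so sender and receiver hash exactly the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign(device_id: str, payload: dict, key: bytes) -> dict:
    """Run on the edge device: attach an HMAC-SHA256 tag to the payload."""
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"device_id": device_id, "payload": payload, "tag": tag}


def verify(message: dict) -> bool:
    """Run on the collecting server: accept only known devices with a valid tag."""
    key = DEVICE_KEYS.get(message.get("device_id"))
    if key is None:
        return False
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the tag through timing differences.
    return hmac.compare_digest(expected, message["tag"])


msg = sign("edge-001", {"reading": 42}, DEVICE_KEYS["edge-001"])
tampered = {**msg, "payload": {"reading": 43}}
print(verify(msg), verify(tampered))  # True False
```

A shared-secret HMAC is only one option; public-key signatures avoid storing every device's secret on the server, at the cost of heavier key management on the devices.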

Current technical solutions such as cloud platforms, encryption standards and multi-layer networking address these challenges reasonably well by adding security on top of known architectures. The difficulty arises with the growing need for true real-time data platforms with built-in, real-time auditability. Ultimately, such systems will require solid protection against:

  1. Corrupted data – data received from edge locations or other distributed sources that might affect the results.
  2. Data override – data tampering is a rising threat that often happens without any alert or notice from security departments.
  3. Data leakage – system vulnerabilities typically lead to breaches that enable information theft.

To address the need for new security standards in data platforms, two well-known techniques from the blockchain world can be used: tokenization and checksum validation.
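Checksum validation in the blockchain sense usually means chaining hashes: each entry's checksum also covers the previous entry's checksum, so silently overriding an old record breaks the link. The sketch below is a minimal illustration using SHA-256 from `hashlib` and a hypothetical `HashChainLog` class, not a full ledger. An attacker who can rewrite the entire log can also recompute the entire chain, which is why a copy of the latest checksum should be kept in a separate, independently secured place.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous checksum" for the first entry


def link_hash(prev_hash: str, record: dict) -> str:
    """Checksum covering both the record and the previous checksum."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class HashChainLog:
    """Hypothetical append-only log with blockchain-style chained checksums."""

    def __init__(self):
        self.entries = []  # list of (record, checksum) pairs

    def head(self) -> str:
        """Latest checksum; a copy should be stored somewhere independently secured."""
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, record: dict) -> str:
        checksum = link_hash(self.head(), record)
        self.entries.append((record, checksum))
        return checksum

    def first_invalid(self):
        """Index of the first entry whose checksum no longer matches, or None."""
        prev = GENESIS
        for index, (record, stored) in enumerate(self.entries):
            if link_hash(prev, record) != stored:
                return index
            prev = stored
        return None


log = HashChainLog()
for station, votes in [("A", 120), ("B", 95), ("C", 60)]:
    log.append({"station": station, "votes": votes})
print(log.first_invalid())  # None
log.entries[1][0]["votes"] = 9500  # silent data override
print(log.first_invalid())  # 1
```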

Hide Your Data

Let’s look at a practical example. To fight the epidemic by sharing healthcare data across countries in real time, personal data must be masked while information about symptoms and treatment is still transferred. Crucially, it must also be ensured that the shared information comes from valid and authorized entities, so that no one can tamper with the results.

Smart tokenization can help here. Here’s how:

Imagine storing raw data in an encrypted location. A tokenization function sitting on top of it can generate anonymized yet still meaningful data, masking and protecting the original sensitive data. For example:

  • Name and surname: real-looking, but fake.
  • Address: valid only in terms of general location.
  • Email: a fake alias, but on a domain with valid DNS.
  • Social ID: totally fake.
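A minimal sketch of such a tokenization function, assuming a keyed hash (HMAC-SHA256) to derive consistent fake values and a separate in-memory `vault` dict standing in for the encrypted store. The name lists, `ALIAS_DOMAIN` and the record fields are hypothetical placeholders; a production system would need far larger, locale-aware pools and a proper secrets store.

```python
import hashlib
import hmac
import secrets

# Hypothetical pools of fake values; real systems would need far larger lists.
FIRST_NAMES = ["Anna", "Jan", "Maria", "Piotr", "Ewa", "Tomasz"]
LAST_NAMES = ["Nowak", "Kowalski", "Wisniewski", "Wojcik", "Kaminski", "Zielinski"]
ALIAS_DOMAIN = "alias.example.org"  # hypothetical domain with valid DNS that forwards mail


def _keyed_digest(key: bytes, label: str, value: str) -> bytes:
    """HMAC-SHA256, so the same input and key always give the same output."""
    return hmac.new(key, f"{label}:{value}".encode(), hashlib.sha256).digest()


def _pick(key: bytes, label: str, value: str, options: list) -> str:
    """Deterministically choose a fake value for the given input."""
    digest = _keyed_digest(key, label, value)
    return options[int.from_bytes(digest[:4], "big") % len(options)]


def tokenize(record: dict, key: bytes, vault: dict) -> dict:
    """Return a masked copy of `record`; the raw record is kept only in `vault`."""
    person = f"{record['name']} {record['surname']}"
    token_id = secrets.token_hex(8)  # random social ID, no link to the real one
    vault[token_id] = record         # only the data keeper can resolve the token
    alias = _keyed_digest(key, "email", record["email"]).hex()[:12]
    return {
        "name": _pick(key, "first", person, FIRST_NAMES),  # real-looking but fake
        "surname": _pick(key, "last", person, LAST_NAMES),
        "address": record["city"],                         # general location only
        "email": f"{alias}@{ALIAS_DOMAIN}",
        "social_id": token_id,
        "symptoms": record["symptoms"],                    # the data worth sharing
    }


key = secrets.token_bytes(32)
vault = {}
raw = {
    "name": "Jan", "surname": "Kowalski", "street": "Example Street 1",
    "city": "Wroclaw", "email": "jan@example.com", "social_id": "00000000000",
    "symptoms": ["fever", "cough"],
}
masked = tokenize(raw, key, vault)
print(masked)
```

Because names and email aliases are derived from a secret key, the same person maps to the same fake identity for as long as the key stays the same, so records can still be linked for analysis; changing the key regenerates all of them. The social ID token is random, so only the vault can resolve it back to the real person.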

Tokenization can store unique checksums, IDs and encryption keys in separately secured places and monitor them for changes. This makes it possible to catch attempts at data tampering in near real time. All information is rotated frequently, with new fake data regenerated each time. Such an approach enables:

  1. Data tracking – near-real-time monitoring and auditability, showing each change, its source and how it differs from the raw information. This provides on-the-go security for data platforms.
  2. Data sharing – inside and between organizations, any sensitive information is masked or aliased and has meaning only to the original data keeper.
  3. User protection – real-time monitoring of user aliases allows additional security mechanisms such as content filtering, attack protection, leakage blockers, etc.
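The separately stored checksums, tamper monitoring, rotation and audit trail can be sketched as a small registry held by the original data keeper. This is a hypothetical design, assuming the shared store is a plain dict mapping alias to record and that SHA-256 checksums of a canonical JSON encoding are good enough to detect changes; `TokenRegistry` and its methods are illustrative names, not an existing library.

```python
import hashlib
import json
import secrets
from datetime import datetime, timezone


def checksum(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


class TokenRegistry:
    """Hypothetical registry kept by the data keeper, apart from the shared store.

    The shared store only sees alias -> record; the registry keeps
    alias -> checksum plus an audit trail of every event.
    """

    def __init__(self):
        self.checksums = {}  # alias -> checksum of the shared record
        self.audit = []      # (UTC timestamp, event, alias)

    def _log(self, event: str, alias: str) -> None:
        self.audit.append((datetime.now(timezone.utc).isoformat(), event, alias))

    def register(self, store: dict, record: dict) -> str:
        alias = secrets.token_hex(8)  # random alias, meaningless outside the registry
        store[alias] = record
        self.checksums[alias] = checksum(record)
        self._log("register", alias)
        return alias

    def scan(self, store: dict) -> list:
        """Return aliases that were modified, deleted or injected, logging an alert for each."""
        suspicious = [a for a, c in self.checksums.items()
                      if a not in store or checksum(store[a]) != c]
        suspicious += [a for a in store if a not in self.checksums]
        for alias in suspicious:
            self._log("tamper-alert", alias)
        return suspicious

    def rotate(self, store: dict) -> dict:
        """Move every record to a fresh alias; returns an old -> new mapping."""
        if self.scan(store):
            raise RuntimeError("store failed integrity scan; refusing to rotate")
        mapping = {}
        for old in list(self.checksums):
            new = secrets.token_hex(8)
            store[new] = store.pop(old)
            self.checksums[new] = self.checksums.pop(old)
            mapping[old] = new
            self._log("rotate", new)
        return mapping


# Usage: register two records, rotate aliases, then simulate a silent override.
store = {}
registry = TokenRegistry()
a = registry.register(store, {"symptoms": ["fever"], "region": "Lower Silesia"})
b = registry.register(store, {"symptoms": ["cough"], "region": "Mazovia"})
mapping = registry.rotate(store)
a, b = mapping[a], mapping[b]
store[b]["symptoms"] = ["none"]  # tampering without going through the registry
print(registry.scan(store))  # a list containing only b's current alias
```

Keeping the checksums outside the shared store is the point of the design: whoever can edit the shared data still cannot update the reference values, so a silent override shows up on the next scan.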

Thanks to such solutions, governments could move to online elections more readily, and healthcare providers could share patient information to discover similarities quickly.

The Future

The havoc caused by the current epidemic shows that we, as societies, must change the way we define our reality. Remote work, process automation, dispersed communities, augmented environments and edge communication are all becoming common (normal!) everywhere. All current and future systems will have to evolve toward a distributed reality to meet end-user expectations for experience, real-time processing and safety.