Conforming to the established legislation and respecting the sensitive information of stakeholders and natural persons is a major requirement and challenge for every process, framework or application that stores or processes personal data.

With the advent and adoption of the GDPR (General Data Protection Regulation) by all member states of the EU, the set of requirements that each party dealing with personal data must fulfil has been defined far more strictly than before. Moreover, data regulation under the GDPR has been extended: it not only covers the kind of data stored, but also explicitly defines the rights of the natural person whose personal data are stored, commonly referred to as the “data subject” in GDPR terminology, after their consent is given and their data are stored. Under the present regulation, a data subject can revoke any consent given at any time and request that any stored personal data be removed. In general, data subjects now have the right to be aware of the details of how their data are processed, analysed, shared or used in decision-making and analytics, and additionally have the right to access and/or erase these data.

ChildRescue relies heavily on personal data. Data subjects include the missing or unaccompanied children, as well as the citizens who register and act as the social network of ChildRescue. Apart from using the data subjects’ data and feedback for the resolution of cases, the participating organisations, and thus ChildRescue, also use these data for statistical analysis of past cases, an activity that falls under the term “processing” as defined by the GDPR. Moreover, ChildRescue will also use algorithms based on profile data to provide suggestions to search and rescue organisations on the most probable transport routes. Such tasks fall under the GDPR definitions of profiling and automated decision-making.

Citizens who use the ChildRescue platform to help find missing children will effectively contribute to investigation procedures by offering evidence. Processing of this kind of data falls within the conditions of lawful processing presented in Article 6 of the GDPR, namely the “performance of a task carried out in the public interest or in the exercise of official authority vested in the controller”. Authorities may need to be able to trace evidence back to the citizen who generated it, both for legal reasons and for further direct communication that may be needed, and moreover to avert abuse of the platform by users with malicious intent. ChildRescue therefore needs to provide a robust and secure Data Privacy and Anonymization Framework that respects the GDPR.

The landscape of Data Privacy and Anonymization

When data is communicated between two parties, it is often a requirement that the content of the communication remain secret from non-participating parties. Encryption techniques providing this kind of security have been used since antiquity, usually to protect military secrets. Nowadays, sensitive personal data is at the core of data encryption; whenever two parties need to exchange information in a way that keeps the content hidden, an encryption scheme must be applied. Furthermore, during a communication each party should be able to verify the other party’s identity and rule out a malicious impersonator. Both encryption and identity verification (via so-called digital signatures) typically rely on similar techniques, which fall under the domain of cryptography. Data encryption transforms data into scrambled, unreadable ciphertext by means of mathematical algorithms. The original readable message is referred to as plaintext; in its encrypted, unreadable form it is referred to as ciphertext. Restoring the plaintext requires the corresponding decryption algorithm and the appropriate key, which only the receiving party should hold.

Figure 0‑1 The encryption and decryption process: an algorithm scrambles (encrypts) the plaintext into ciphertext, and the receiving party uses a key to unscramble (decrypt) it back into plaintext.
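The cycle described above, from plaintext to ciphertext and back, can be sketched in Python. The following is a toy stream cipher for illustration only: it derives a keystream from a shared key with SHA-256 and XORs it with the message, so the same function both encrypts and decrypts. It is not cryptographically secure; a production system such as ChildRescue would rely on vetted primitives (for example AES through an established library).

```python
import hashlib
from itertools import count

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream of the given length from the key
    by hashing the key together with a block counter (SHA-256).
    Toy construction for illustration; NOT secure for real use."""
    out = b""
    for i in count():
        out += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        if len(out) >= length:
            return out[:length]

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR the plaintext with the keystream to produce the ciphertext."""
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Decryption is the same XOR operation with the same key."""
    return encrypt(ciphertext, key)

key = b"shared secret key"
ciphertext = encrypt(b"meet at dawn", key)
assert ciphertext != b"meet at dawn"                 # unreadable without the key
assert decrypt(ciphertext, key) == b"meet at dawn"   # reversible with the key
```

Note that, unlike masking or anonymization discussed below, this transformation is fully reversible by design: anyone holding the key can recover the plaintext.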

Apart from encryption, there are cases where data contains both information that needs to be communicated publicly and sensitive personal information that must remain inaccessible. In these cases, techniques that perform data masking on the original data set need to be applied. Data masking is a process in which data elements that should not be accessible to users with certain roles are hidden and replaced by similar-looking fake data, typically characters that still meet the format requirements of the system that works with the masked results. Masking ensures that vital parts of personally identifiable information (PII), such as the first five digits of a social security number, are obscured or otherwise de-identified. The main difference between encryption and data masking is that a masked data set is in plain format and can be read by anyone, but cannot be reversed to the original data in any way. In data masking, data privacy is protected by storing or transmitting the masked data without disclosing the transformations performed. Additionally, for encryption reversibility is required, while for masking reversibility is a weakness. At the same time, among the arsenal of IT security techniques available, pseudonymization and anonymization are highly recommended by the GDPR. Pseudonymization means transforming data so that personal data cannot be retrieved without additional identifiers that are not present in the transformed set, while anonymization means transforming data so that personal data cannot be retrieved in any way from the transformed set.
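The distinction between masking (irreversible, format-preserving) and pseudonymization (re-linkable only with a separately held key) can be illustrated with a short Python sketch. The field names and the keyed-hash scheme below are illustrative assumptions, not part of the ChildRescue design.

```python
import hmac
import hashlib

def mask_ssn(ssn: str) -> str:
    """Mask the first five digits of a social security number,
    leaving only the last four visible. Irreversible: the masked
    value cannot be turned back into the original."""
    return "***-**-" + ssn[-4:]

def pseudonymize(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).
    Without the secret key the original value cannot be recovered,
    but the controller holding the key can consistently re-link
    records belonging to the same data subject."""
    return hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "ssn": "123-45-6789", "diagnosis": "negative"}
secret = b"controller-held key"   # stored separately from the data set

safe_record = {
    "subject_id": pseudonymize(record["name"], secret),  # pseudonymized
    "ssn": mask_ssn(record["ssn"]),                      # masked
    "diagnosis": record["diagnosis"],                    # non-personal payload
}
assert safe_record["ssn"] == "***-**-6789"
```

Because `pseudonymize` is deterministic for a given key, the same person always maps to the same `subject_id`, which preserves the ability to analyse past cases statistically without exposing names.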

Figure 0‑2 An example of using anonymization: there is no point in encrypting the measurements and results of a medical study, as these need to be accessible to all interested researchers. The personal data of the patients involved, however, should not be disclosed. Anonymization techniques can be applied to address cases like these.
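The idea in the caption can be sketched as follows: direct identifiers are dropped and the exact age is generalized into a coarse band, so the medical results stay readable for researchers while individual patients become harder to single out. This is illustrative only; robust anonymization must also account for combinations of quasi-identifiers (for example via k-anonymity), and the field names are assumptions.

```python
def generalize_age(age: int, band: int = 10) -> str:
    """Replace an exact age with a coarse age band, e.g. 34 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"

patients = [
    {"name": "A. Smith", "age": 34, "result": "positive"},
    {"name": "B. Jones", "age": 37, "result": "negative"},
]

# Drop direct identifiers (name) and generalize quasi-identifiers (age).
# The measurements remain usable for research, but there is no key or
# lookup table that could map a row back to a patient.
anonymized = [
    {"age_band": generalize_age(p["age"]), "result": p["result"]}
    for p in patients
]
assert anonymized[0] == {"age_band": "30-39", "result": "positive"}
```

Unlike pseudonymization, no secret key exists here that could re-identify a row, which is what qualifies the output as anonymized rather than pseudonymized.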