Secure data collaboration brings great value to companies: exchanging data with business partners can help grow the business. But it also exposes the company to serious risks such as data leakage and loss of control. In this article we discuss how to protect information in use so that you don’t have to give up collaboration.

Table of Contents:

  • The difficulty of protecting information in use
  • Benefits and challenges of data protection
  • Privacy Preservation Techniques
  • Privacy Objectives and their relationship to these techniques
  • In-Use Protection for Unstructured Data

The difficulty of protecting information in use

 
Of the three states of data, the most complicated to protect is data in use, i.e. when it is being accessed by one or more applications for processing. In environments such as Big Data platforms or Data Lakes, it is not complicated to protect data at rest through encryption. Nor is it complicated (rather, it is a mandatory measure) to encrypt data in transit to and from these platforms using HTTPS or another form of transport-level encryption.

However, once data is being consumed, protecting and controlling it is especially complicated. Control and protection usually go only as far as access, including identity management mechanisms or even conditional or role-based access control (RBAC).

When data is shared with a third party and is being displayed to a user or consumed by an application, it is very difficult to prevent it from being copied to another system, for example.
 
[Figure: Data in use]

Benefits and challenges of data protection

 
When it comes to sharing particularly sensitive data, such as medical records or extremely confidential information, the risk of sharing can outweigh the benefit, sometimes even when security measures have been implemented. A loss of medical data by a third party (e.g. a data processor) can result in heavy fines for the party that collected or shared it (e.g. the data controller). The benefits of sharing medical data or clinical evidence are very high in cases such as new drug development, but a leak or loss of this data can have serious consequences.

On the other hand, if data is shared between pharmaceutical companies, for example, confidentiality also comes into play: a leak can reduce competitiveness and lead to losses of millions of dollars for the organization.

The technological challenge for data in use is how to protect it while maintaining its confidentiality and privacy and mitigating possible information leaks.

Not surprisingly, in the Ponemon Institute’s annual analysis of the cost of a data breach, third-party data breaches are among the factors that most increase the cost of a breach, alongside others such as regulatory compliance failures.

Despite benefits such as improved productivity and efficiency, not being able to guarantee the control and protection of data in use is one of the biggest barriers to data sharing between companies, as stated in a study conducted by Everis for the European Commission (Study on data sharing between companies in Europe). Concerns about not being able to maintain privacy (49%) and fear of losing trade secrets (33%) are the top two barriers to data sharing between companies.

European and British governments want companies to share more data. Governments know that the value of data increases when it is shared and used by many; that is why, for example, the Data Governance Act and the Data Act are laying the groundwork to encourage data sharing and establish clear rules about data access and use. However, governments also know that insisting on this is not enough to ensure that data is shared securely, which is why data spaces (a concept initially developed by the Fraunhofer Institute) are being promoted in Europe to increase data sharing while allowing companies and citizens to maintain control of their data.

The cost of protecting data in use is another issue identified by UN statisticians. In many cases, the data being collected is sensitive and includes details about individuals and organisations that can be used to identify them and draw conclusions about their behaviour, health and socio-political leanings. In the wrong hands, this data can be used to cause physical, social or economic harm.
 

Privacy Preservation Techniques

 
That is why the Privacy Preserving Techniques Task Team (PPTTT) is advising the United Nations Big Data and Data Science Working Group on the development of a policy framework for data governance and information management, specifically around “privacy preserving techniques”.

In the “UN Handbook on Privacy-Preserving Computation Techniques” published by this team (last updated September 2021), the different emerging privacy-preserving techniques are explored, describing the state of the art of these techniques and the challenges to bring these technologies into widespread use.

We will summarize below some of these techniques without going into the cryptographic mechanisms behind them:
 

1. Secure Multi-party Computation

 
Secure multi-party computation is a subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private. Unlike traditional cryptographic tasks, where cryptography guarantees the security and integrity of communication or storage and the adversary is outside the participants’ system, cryptography in this model protects the privacy of the participants from each other.

Imagine that we want to know the average salary of four people (A, B, C and D) without any of them disclosing their salary individually. Person A would split their salary, say 50, into four parts, e.g. 20, -10, 45 and -5, keep one part and share the other three (-10, 45, -5) with B, C and D. B, C and D would do the same with their own salaries. Once each person holds one part of their own salary and one part from each of the others, they add up the parts they hold and share only that sum with the rest. Adding the four sums and dividing by 4 yields the average salary without anyone having revealed their individual salary.
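The following minimal Python sketch illustrates this additive secret-sharing idea. It assumes honest participants who follow the protocol and uses plain integers (real MPC protocols work over a finite field or ring and add protection against cheating parties); the function names and salary figures are ours, purely for illustration.

```python
import random

def make_shares(secret, n_parties):
    """Split an integer secret into n additive shares that sum to the secret."""
    shares = [random.randint(-100, 100) for _ in range(n_parties - 1)]
    shares.append(secret - sum(shares))
    return shares

def average_without_disclosure(salaries):
    """Each party splits its salary into shares, keeps one and distributes the rest.
    Every party then publishes only the sum of the shares it holds."""
    n = len(salaries)
    # shares[i][j] is the share of party i's salary held by party j
    shares = [make_shares(s, n) for s in salaries]
    # each party j adds up the shares it holds (one from every party, including itself)
    partial_sums = [sum(shares[i][j] for i in range(n)) for j in range(n)]
    # the partial sums are public; adding them recovers only the total, never the inputs
    return sum(partial_sums) / n

salaries = [50, 65, 40, 85]          # known only to their respective owners
print(average_without_disclosure(salaries))  # 60.0, no individual salary revealed
```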

For the most part, MPC is still a subject of academic research. Some companies use MPC protocols for specific functions. Some specialize in standard products that cover specific problems and others develop customized products or specific consulting.
 
[Figure: Secure multi-party computation]

2. Homomorphic Encryption

 
Homomorphic encryption is a form of encryption that allows calculations to be performed on encrypted data without first decrypting it. The result of such a computation remains encrypted and, when decrypted, is identical to the output that would have been produced by performing the same operations on the unencrypted data. In other words, it adds an evaluation capability: processing encrypted data without access to the secret key. Homomorphic encryption can be seen as an extension of public-key cryptography.

An example of use would be a cloud computing service for medical data, where different companies hand over encrypted data and the service performs calculations without having to decrypt it. This would avoid complex legal confidentiality processes when operating on data as sensitive as patient data.
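To make the homomorphic property tangible, here is a toy sketch of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The primes are deliberately tiny and the code is only illustrative; real deployments use vetted libraries (such as those mentioned below), never hand-rolled implementations.

```python
from math import gcd
import random

# Toy Paillier cryptosystem (additively homomorphic). Demonstration-sized primes only.
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)            # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property: multiplying ciphertexts adds the underlying plaintexts.
c1, c2 = encrypt(20), encrypt(22)
assert decrypt((c1 * c2) % n2) == 42
print("sum computed on encrypted data:", decrypt((c1 * c2) % n2))
```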

Some of the challenges of homomorphic encryption are performance, which is much lower than when operating on unencrypted data, and the difficulty of verifying the correctness of the results.

There is an initiative to standardize homomorphic encryption. Although at a theoretical level the technology is at an advanced stage, the applicable technological solutions are still scarce. There are libraries and implementations by different centres and organisations such as HElib from IBM Research, PALISADE from the New Jersey Institute of Technology or SEAL from Microsoft Research.
 
[Figure: Homomorphic encryption]

3. Differential Privacy

 
Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about the individuals in it.

It is based on adding a certain amount of “noise” to the results of queries made on a dataset, so that useful aggregate answers can be obtained without exposing individual data. For example, a platform could be allowed to ask how many people over 50 live in Paris without the census and citizens’ ages ever being exported.
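As a rough illustration of how such noise might be applied, the sketch below implements the classic Laplace mechanism for a counting query: adding or removing one person changes the count by at most 1 (sensitivity 1), so Laplace noise with scale 1/ε gives ε-differential privacy. The dataset, function names and the choice of ε are hypothetical.

```python
import random

def dp_count(records, predicate, epsilon=0.5):
    """Counting query answered with the Laplace mechanism (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    # difference of two Exponential(epsilon) draws is Laplace(0, 1/epsilon)
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Hypothetical census records: (resident id, age). The raw data never leaves the holder.
census = [("resident_%d" % i, random.randint(1, 99)) for i in range(10_000)]

# Noisy answer to "how many residents are over 50?"
print(dp_count(census, lambda r: r[1] > 50))
```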

Chrome uses this approach to discover frequently visited pages to improve its caching features. Apple uses it in iOS to discover frequently used words and emojis in a text messaging app to improve predictive text models.

There are currently no standard commercial products, although there are academic implementations of differential privacy for specific problems. For example, PSI (a Private data Sharing Interface), developed by the Harvard University-led Privacy Tools Project, implements a generic methodology for providing differentially private access to sensitive datasets.
 
[Figure: Differential privacy]

4. Zero Knowledge Proof

 
In cryptography, a zero-knowledge protocol or zero-knowledge proof (ZKP) is a cryptographic protocol that establishes a method for one party to prove to another that a statement (usually mathematical) is true, without revealing anything other than the veracity of the statement.

Applying this technology to real cases makes it possible to present audit evidence of certain facts without providing unnecessary additional details: for example, proving that taxes have been paid, or that a person is over 18 without showing a driver’s license or national ID card and without giving access to the person’s address.
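A classic, minimal example of the idea is the Schnorr identification protocol sketched below, in which a prover demonstrates knowledge of a secret exponent x with y = g^x mod p without revealing x. The parameters are toy-sized for readability, and this is only an illustration of the zero-knowledge principle, not how products such as U-Prove or Idemix work internally.

```python
import random

# Toy Schnorr identification protocol with demonstration-sized parameters.
p, q, g = 23, 11, 2          # g has order q in the multiplicative group mod p
x = 7                        # prover's secret
y = pow(g, x, p)             # public value the statement refers to

# 1. Commitment: the prover picks a random nonce r and sends t = g^r mod p
r = random.randrange(q)
t = pow(g, r, p)

# 2. Challenge: the verifier sends a random challenge c
c = random.randrange(q)

# 3. Response: the prover sends s = r + c*x mod q
s = (r + c * x) % q

# 4. Verification: g^s must equal t * y^c mod p; the secret x is never disclosed
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof accepted without revealing x")
```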

Although ZKP technologies are still maturing, there has been a strong push for real-world adoption in recent years. Several companies have created products based on ZKP (e.g. Microsoft’s U-Prove and IBM’s Idemix). Practical applications of zero knowledge have also appeared in cryptocurrencies such as Zcash and, more broadly, in blockchain, for example to validate the transaction history when synchronizing a new node.
 
[Figure: Zero-knowledge proof]

5. Trusted Execution Environment

 
A Trusted Execution Environment (TEE) is a secure area of a main processor that ensures that the code and data loaded inside it are protected with respect to confidentiality and integrity. Processing in a TEE is not performed on encrypted data; instead, the execution environment itself is secured by special hardware. This environment is usually called an enclave, and its memory space is protected against access from outside the enclave.

Examples of TEE use include the protection of premium content (e.g. movies, audio) in streaming environments on HDTVs, smartphones, etc., where even the owner of the device is prevented from accessing the protected data stored on it.

This technology requires special hardware such as Intel® SGX or ARM TrustZone. There are also libraries that enable development for these platforms, such as Google’s Asylo or Microsoft’s Open Enclave SDK, and some cloud environments, such as Microsoft Azure, also offer TEE capabilities.
 
[Figure: Trusted Execution Environment]

Privacy Objectives and their relationship to these techniques

 
In a secure data collaboration environment we have input, processing and output data. In this sense there are different privacy objectives:

  • Input Privacy: Implies that the processor cannot access or derive values in addition to those provided in the input. Input privacy covers the input data, and the intermediate and final results of processing.
  • Output Privacy: Implies that the output results do not contain identifiable input data. Output privacy is a property of the output product.
  • Policy Enforcement: A privacy-preserving system implements policy enforcement if it has a mechanism by which the giver of input data can exercise control over what computations can be made on the input data and what output data can be published. Policy enforcement is system-wide.

 
[Figure: Privacy objectives]
The following graphic summarizes how the technologies described above map to these privacy goals. For example, ZKP and HE prevent data from being obtained from the input (input privacy). Differential privacy techniques prevent the results from being reverse-engineered to recover input data (output privacy). MPC and TEE techniques, in addition to protecting the input data (input privacy), make it possible to establish rules so that only very specific operations or queries can be performed on the input data (policy enforcement).
 
[Figure: Privacy and technology objectives]

In-Use Protection for Unstructured Data

 
As explained in the following article, privacy-preserving computation (PPC) techniques are maturing for analytics and artificial intelligence use cases. However, these technologies do not normally address the sharing of unstructured data such as files or documents. They are evolving for use cases in Big Data or Deep Learning environments where information is shared in order to perform calculations on it while avoiding the disclosure of sensitive data.

In the case of documents or files, protection in use could be provided by anonymisation techniques that avoid including certain private data about individuals in the documents, but this applies only to very specific use cases, not to collaboration understood as the need to work on documentation with others in real time.

For protecting documents in use, the most effective techniques are those of digital rights control (IRM, Information Rights Management; E-DRM, Enterprise Digital Rights Management), which make it possible to control who can open a document, from where and under what conditions (view only, edit, print, copy and paste, etc.), and even to monitor the actions performed on the data.

IRM essentially provides a secure enclave, or “digital embassy”, on someone else’s computer that allows access to the data only under certain conditions. The owner of the data is not the owner of the device, yet the former retains a degree of control over the data on the latter’s device.
 
[Figure: IRM, Information Rights Management]
IRM allows you to collaborate securely on documents no matter where they are located, whether they are stored in the cloud or on someone else’s device.

If you want to see what these controls offer using an end-user friendly solution, please contact us.