What is the meaning of “big data”?

Gartner Inc., a leading research and advisory company, defines Big Data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”.
Big Data involves large datasets drawn from different sources, from web data to social media data. Analyzing this information and finding correlations in it allows organizations to gain relevant insights, most often but not exclusively for marketing purposes. For example, it can indicate which advertisements might appeal to given users.

What is the GDPR’s stance on Big Data: does it restrict or enable its use?

First, we need to look at how organizations can exploit Big Data, as different approaches can have different privacy implications.
We can identify two macro-categories. In Opinion 03/2013 on purpose limitation, adopted on 2 April 2013, the Article 29 Working Party notes that Big Data can be used to identify more general trends and correlations, but can also be processed in order to directly affect individuals. We need to keep this distinction in mind, as the two uses have different impacts on individuals’ rights and require organizations to act accordingly.

It is undeniable that the use of Big Data has so far brought significant benefits to our society in different fields, from artificial intelligence applications to advertising, from public administration to national health services. However, as mentioned above, the use and processing of Big Data can have relevant implications for privacy, data protection and the associated rights of individuals.
The GDPR allows organizations to use Big Data and recognizes its benefits, but at the same time requires them to put in place adequate safeguards to protect data subjects’ rights. This means that Big Data must not come at the expense of privacy and data protection.

What should an organization do to exploit the benefits of big data and at the same time be compliant with the GDPR?

Organizations whose business is centered on Big Data analytics should focus on the following questions:

  • How can I guarantee transparency in the data processing?
  • How, and to what extent, is the repurposing of personal data allowed in my specific context?
  • What is the legal basis for the data processing? Do I have consent from the data subject and, if not, on which legal grounds am I relying?
  • Which adverse effects, if any, can my processing have on the subjects involved?
  • Is the data subject aware of the processing, or can he or she reasonably expect this type of data processing to happen?
  • How can I minimize the collection and processing of personal data?
  • Which security measures do I need to implement in order to minimize the impact of the processing on the individuals concerned?

These are some of the most relevant points that an organization must take into account when dealing with Big Data.

If my organization is processing public data, which measures should be taken to be compliant with GDPR?

Processing of publicly available data, understood as data manifestly made public by the data subject, does not require consent under Art. 9 of the GDPR, provided that the following conditions are met:

  • Personal data must be “processed lawfully, fairly and in a transparent manner in relation to the data subject”.
  • The data controller must have a legitimate interest in the data processing, and that interest must be balanced against data subjects’ fundamental rights. For instance, the GDPR acknowledges that organizations may have a legitimate interest in market research activities, as long as that interest is not overridden by the interests or the fundamental rights and freedoms of the data subject, taking into consideration the reasonable expectations of data subjects based on their relationship with the controller. Assessing legitimate interest requires careful consideration of the circumstances in which the data was originally collected.
  • Further processing must be compatible with the purpose of the original collection. According to Recital 50 of the GDPR, processing for statistical purposes should be considered a compatible lawful processing operation. An important factor when assessing purpose compatibility is “the existence of appropriate safeguards, which may include encryption or pseudonymisation”, according to Article 6(4)(e). For general analysis with Big Data, Recital 29 of the GDPR expressly seeks to create incentives to apply pseudonymisation.
  • An organization must look at the effects of the processing on individuals and must minimize the processing of personal data to what is strictly necessary to achieve its purposes.
  • Organizations dealing with Big Data may also have to appoint a Data Protection Officer. A Data Protection Officer is required when the core activities of the controller or the processor consist of processing operations which, by virtue of their nature, their scope and/or their purposes, require regular and systematic monitoring of data subjects on a large scale.


Why is pseudonymisation so important?

Pseudonymisation plays a key role when dealing with Big Data. Recital 29 of the GDPR expressly provides that “measures of pseudonymisation should, whilst allowing general analysis, be possible within the same controller”. On the one hand, the GDPR recognizes the advantages of research carried out using Big Data. On the other hand, it balances this type of analysis with data protection, by clarifying that a controller doing general analysis (including analysis based on Big Data) should carry out a data protection impact assessment according to Article 35 of the GDPR and implement measures which meet, in particular, the principles of data protection by design and data protection by default.

Applying pseudonymisation to personal data can reduce the risks to the data subjects concerned and help controllers and processors meet their data-protection obligations, according to Recital 28 of the GDPR. Pseudonymisation means processing personal data in such a manner that the data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures ensuring that the personal data are not attributed to an identified or identifiable natural person. It thereby helps reduce the linkability of a dataset with the original identity of a data subject.
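As a rough illustration of this definition, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256), one common pseudonymisation technique. The secret key plays the role of the “additional information” and is assumed to be stored separately from the dataset under its own access controls. The function names, the "email" field and the choice of HMAC are assumptions made for this example; the GDPR does not prescribe any particular technique.

```python
import hashlib
import hmac
import secrets


def generate_key() -> bytes:
    """Create a random secret key. It must be stored separately from the data."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a keyed HMAC-SHA256 pseudonym for a direct identifier."""
    # Normalise so the same person always maps to the same pseudonym.
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, key: bytes, id_field: str = "email") -> dict:
    """Copy a record, replacing the identifier field with its pseudonym."""
    # The field name "email" is a hypothetical default for this sketch.
    out = {k: v for k, v in record.items() if k != id_field}
    out["pseudonym"] = pseudonymise(record[id_field], key)
    return out


if __name__ == "__main__":
    key = generate_key()
    record = {"email": "alice@example.com", "age_band": "30-39"}
    print(pseudonymise_record(record, key))
```

Because the same key always yields the same pseudonym, records about one person can still be linked for general analysis, while anyone without the key cannot easily map a pseudonym back to the identifier. Pseudonymised data remains personal data under the GDPR, and the remaining attributes may still allow re-identification when combined.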

When creating clusters of individuals through data mining operations, organizations should implement techniques which avoid re-identification and the singling out of individuals. Pseudonymisation can help reduce the risk of harm to data subjects, so controllers that use it may in some cases be able to avoid notification of security incidents.
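One simple way to lower the risk of singling out is to drop any cluster that contains too few individuals before the results are used. The sketch below is a k-anonymity-style check under stated assumptions: the attribute names, the sample data and the threshold k are hypothetical, and the GDPR does not define a specific threshold. It is a partial safeguard, not a guarantee against re-identification.

```python
from collections import Counter


def group_key(record: dict, keys: list) -> tuple:
    """Build the combination of attribute values that defines a cluster."""
    return tuple(record[name] for name in keys)


def suppress_small_groups(records: list, keys: list, k: int = 5) -> list:
    """Keep only records whose cluster contains at least k individuals."""
    counts = Counter(group_key(r, keys) for r in records)
    return [r for r in records if counts[group_key(r, keys)] >= k]


if __name__ == "__main__":
    # Hypothetical sample: three people share one cluster, one person is alone.
    sample = [
        {"region": "north", "age_band": "30-39"},
        {"region": "north", "age_band": "30-39"},
        {"region": "north", "age_band": "30-39"},
        {"region": "south", "age_band": "70-79"},
    ]
    kept = suppress_small_groups(sample, ["region", "age_band"], k=2)
    print(len(kept))
```

With k=2, the single record in the "south" cluster is removed because it would describe exactly one person.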

Processing of personal data for statistical purposes.

The use of data for statistical purposes is explicitly deemed compatible with the purpose limitation principle and can be construed broadly, covering uses not just in the public interest but also by private companies for commercial gain. In Opinion 03/2013 on purpose limitation, the Working Party expressly stated that Big Data can be used to identify general trends and correlations, but that its processing can also directly affect individuals. “Statistical purposes”, in particular, cover a wide range of processing activities, from commercial purposes (e.g. analytical tools of websites or Big Data applications aimed at market research) to public interests (e.g. statistical information produced from data collected by hospitals to determine the number of people injured as a result of road accidents).

Therefore, an organization can use Big Data for statistical purposes without breaching the purpose limitation principle. However, the GDPR also leaves it to the Member States, within the limits of the GDPR, to determine statistical content, control of access, specifications for the processing of personal data for statistical purposes, and appropriate measures to safeguard the rights and freedoms of the data subject and to ensure statistical confidentiality. This may well affect harmonization at EU level, as some States may be more permissive with the use of Big Data and others more restrictive.

Want to read the first part of our GDPR Insights? Read the blog post: https://roialty.com/general-data-protection-regulation-compliance-value-brands/.