How will GDPR affect Big Data analytics?

By now, everyone will have heard of the General Data Protection Regulation (GDPR), the fearsome ‘big bad’ that will swoop in and shut down businesses for losing an email address. Come May 25th 2018, all businesses will be legally required to comply: identifying what data is held, by whom and for what purpose, among other more complex necessities.

Today, however, we operate in a hugely different business environment from the one in which the GDPR was first announced and drafted, and a world away from the oversight of the Data Protection Act 1998, even though that Act is, for now, the most up-to-date regulation of its kind.

So the question remains: with such stringent limitations placed on the use of personally identifiable information (PII), how will this regulation interact with the rise of Big Data and AI analytics?

Big data is, in short, defined as an ‘extremely large data set’ that may be analysed to reveal patterns, trends and associations. Gartner classify it as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

The limitation, then, comes from the essence of the regulation itself. How can someone take full advantage of Big Data whilst also being certain that the data being processed is entirely free of PII?

Most notably, whilst Article 22 of the regulation does not expressly forbid the automated processing of data, it does forbid subjecting people to decisions based solely on that processing without their explicit consent (with notable exceptions). It states: “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”

The issue, then, is twofold. Firstly, there is a glut of big data currently available to be analysed: information that was created and collected without the oversight of GDPR. Secondly, there is the matter of actually using this data post-GDPR. In a dataset with potentially millions of records, how can businesses reliably and knowledgeably assess whether any of the people behind those records have not consented to have their data analysed? Who is responsible for this data? What are the limitations on its use? These are questions that simply cannot be answered with any ease.

The simplest solution would be to adopt the same strategy that most marketing firms are using with their databases: wiping them and starting from scratch. But is this feasible?

The EU’s Digital Single Market strategy highlights Big Data as a “catalyst for economic growth, innovation and digitisation across all economic sectors … and for society as a whole.” In practice, the strategy is designed to “tear down regulatory walls”, with the aim of contributing €415bn per year to the European economy. Since both initiatives come from the EU, it stands to reason that GDPR is not meant to cancel out the Digital Single Market strategy.

The ICO currently provides six recommendations for businesses working with Big Data in terms of GDPR compliance:

1) anonymise personal data, where personal data is not necessary for the analysis;

2) be transparent about the use of personal data for big data analytics and provide privacy notices at appropriate stages throughout a big data project;

3) embed a privacy impact assessment process into big data projects to help identify privacy risks and address them;

4) adopt a privacy by design approach in the development and application of big data analytics;

5) develop ethical principles to help reinforce key data protection principles; and

6) implement internal and external audits of machine learning algorithms to check for bias, discrimination and errors.

So how will GDPR affect Big Data analytics? The answer is complex, and entirely dependent on the nature of the Big Data utilised by each individual company. In the first stages, anonymisation of PII is one of the clearest paths to compliance, but the larger the dataset to be analysed, the greater the chance of it holding a stray item of identifiable data.
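To make the 'stray item' problem concrete, here is a minimal Python sketch of one possible approach: drop the fields that directly identify a person, redact anything that looks like an email address in the remaining text, then scan again for leftovers. The field names, the redaction marker and the deliberately simple email pattern are all assumptions made for illustration. Removing direct identifiers does not by itself guarantee anonymisation, since combinations of other fields can still point to an individual.

```python
import re

# Deliberately simple pattern for email-like strings. This is an
# illustrative assumption, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def strip_identifiers(record):
    """Return a copy of `record` without direct-identifier fields and
    with email-like strings in the remaining text fields redacted."""
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if isinstance(value, str):
            value = EMAIL_RE.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


def find_residual_emails(records):
    """Return the indexes of records that still contain an email-like string."""
    flagged = []
    for index, record in enumerate(records):
        values = record.values()
        if any(isinstance(v, str) and EMAIL_RE.search(v) for v in values):
            flagged.append(index)
    return flagged
```

Running find_residual_emails before and after strip_identifiers gives a rough measure of how much identifiable data was hiding in free-text fields. A real project would need checks for far more than email addresses.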

The key, as with the majority of regulatory practices, now sits with due diligence. If businesses can prove a clear and precise trail of where the data was sourced, its intended purpose, and any efforts made to remove personally identifiable information, any compliance challenges should be mitigated, if not sidestepped altogether.
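As a rough illustration of what such a trail might look like in practice, the Python sketch below keeps one record per dataset covering its source, its intended purpose and each step taken to remove identifiable information. The class name, field names and JSON output are hypothetical choices for this example, not a format prescribed by the GDPR or the ICO.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DatasetAuditRecord:
    """Hypothetical due-diligence record for a single dataset."""
    dataset: str
    source: str
    purpose: str
    acquired_on: date
    deidentification_steps: list = field(default_factory=list)

    def log_step(self, description):
        """Append a description of one effort made to remove PII."""
        self.deidentification_steps.append(description)

    def to_json(self):
        """Serialise the record so it can be stored alongside the data."""
        payload = asdict(self)
        # date objects are not JSON serialisable, so store an ISO string.
        payload["acquired_on"] = self.acquired_on.isoformat()
        return json.dumps(payload, indent=2)
```

Calling log_step each time a cleaning pass is run, and saving the output of to_json with the dataset, would give a business something concrete to show when asked where its data came from and what was done to it.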