Most of the time it is clear and self-evident what personal data is, but sometimes it is not so apparent. If you haven't checked out our post How to know if data is personal data: avoid rookie GDPR mistakes, consider reading it first. It explains how to determine what data should be designated as personal, which will make this post easier to follow.
Identification of an individual using personal data
Can we identify a specific person from a set of data that relates to them?
The goal is that the specific individual is singled out from others (whoever others are).
In order to identify a specific person, an identifier is needed. The identifier has to reference exactly one specific individual; for the sake of this blog, the author will refer to that outcome as a unique value response (1). If an identifier returns zero values, it will be referred to as a null value response (0); if it returns more than one result, as a group value response (>1); and if it returns an error, as an error value response (n/a).
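The four response types above can be sketched in code. This is only an illustration under stated assumptions: an identifier is modeled as a predicate over records, and the `Response` enum, the `classify` function and the sample records are hypothetical names and data invented for this example, not part of GDPR or any library.

```python
from enum import Enum
from typing import Callable, Iterable


class Response(Enum):
    NULL = "null value response (0)"
    UNIQUE = "unique value response (1)"
    GROUP = "group value response (>1)"
    ERROR = "error value response (n/a)"


def classify(identifier: Callable[[dict], bool], records: Iterable[dict]) -> Response:
    """Apply an identifier (a predicate) to every record and classify the outcome."""
    try:
        matches = sum(1 for record in records if identifier(record))
    except Exception:
        # Assumption: any failure while evaluating the identifier is an error response.
        return Response.ERROR
    if matches == 0:
        return Response.NULL
    if matches == 1:
        return Response.UNIQUE
    return Response.GROUP


# Hypothetical records, invented for illustration only.
people = [
    {"name": "A", "employer": "Acme Ltd."},
    {"name": "B", "employer": "Acme Ltd."},
    {"name": "C", "employer": "Other Inc."},
]

print(classify(lambda p: p["employer"] == "Other Inc.", people))  # Response.UNIQUE
print(classify(lambda p: p["employer"] == "Acme Ltd.", people))   # Response.GROUP
print(classify(lambda p: p["employer"] == "Nobody", people))      # Response.NULL
print(classify(lambda p: p["salary"] > 0, people))                # Response.ERROR (missing field)
```

Only the first call singles out one person; the others show why a group, null or error response does not amount to identification.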
The word identifier is used here because it is mentioned in the provisions of Article 4.1. of GDPR and Preamble item (30) of GDPR, and the author's intent is to build on the concept of the identifier by broadening it.
For the sake of the following example, the following personal data types and their values are used to identify a specific person: an individual of a certain age (e.g. 30 - 45) who has a certain salary (e.g. 90.000 EUR/year), works for a certain company (e.g. Acme Ltd.), was employed at a certain point in time (e.g. September 2020), lives in a spacious apartment (e.g. in the 6th Arrondissement in Paris) bought with a low-interest mortgage (2%) thanks to a good credit rating, and drives a specific make and model of car (a new Audi Q5).
A combination of data values of salary, employment month, neighborhood, mortgage interest rate, and make and model of a car is called a composite identifier (as it is composed of a specific combination of several data types).
An identifier that uses only one data type to identify a specific person is called a simple identifier.
In the above-mentioned example, several data types are actually referred to as identifiers in Article 4.1. of GDPR.
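The difference between a simple and a composite identifier can be sketched as a small data structure. This is a sketch under assumptions: the `Identifier` class, the field names (`salary_eur`, `employment_month` and so on) and the placeholder social security number are hypothetical, chosen only to mirror the example values used in this post.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Identifier:
    # Maps a data type (field name) to a condition on its value.
    conditions: dict[str, Callable[[Any], bool]]

    @property
    def kind(self) -> str:
        # One data type makes a simple identifier, several make a composite one.
        return "simple" if len(self.conditions) == 1 else "composite"

    def matches(self, record: dict) -> bool:
        # A record matches only if every data type is present and satisfies its condition.
        return all(
            field in record and test(record[field])
            for field, test in self.conditions.items()
        )


# Composite identifier built from the example values in this post.
composite = Identifier({
    "salary_eur": lambda v: v == 90_000,
    "employment_month": lambda v: v == "2020-09",
    "neighborhood": lambda v: v == "6th Arrondissement, Paris",
    "mortgage_rate_percent": lambda v: v == 2,
    "car": lambda v: v == "Audi Q5",
})

# Simple identifier: a single data type (placeholder value, not a real number).
simple = Identifier({"social_security_number": lambda v: v == "000-00-0000"})

# Hypothetical record for illustration.
person = {
    "salary_eur": 90_000,
    "employment_month": "2020-09",
    "neighborhood": "6th Arrondissement, Paris",
    "mortgage_rate_percent": 2,
    "car": "Audi Q5",
}

print(composite.kind, composite.matches(person))  # composite True
print(simple.kind, simple.matches(person))        # simple False
```

Note that a matching record says nothing yet about uniqueness: the same composite identifier may match several people in a larger data set.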
Did we manage to single out one specific person or a group of people in the Acme Ltd. example? For the sake of this argument, let's say that the identifier responded by referencing 3 people (a small group, but still a group).
Does the combination of all of the data that we used to build our composite identifier yield:
- a group value response,
- a unique value response,
- a null value response or
- an error value response?
The answer should be self-evident: as we said that this composite identifier references 3 people, the result was a group value response. However, the expected result was that the identifier returns a reference to exactly one (1) specific individual, that is, yields a unique value response.
In the process of identifying one specific person, this matter can be approached from the following angles:
- change the composite identifier so that it returns a unique value response, by adding or substituting more relevant data types and changing the number of elements the composite identifier was initially made of or
- discard the composite identifier and change the underlying data: acquire new data, transform existing data and use such data as a simple or composite identifier that would yield a unique value response and reference exactly one person with extremely high probability (e.g. social security number)
Can we identify one specific person and yield a unique value response with certainty, using the above-mentioned personal information (the information that, combined, constitutes the composite identifier)? It becomes self-evident that we cannot claim that with certainty, so the answer lies in the optimization of either the identifier or the data.
It becomes self-evident that the first approach to this situation would be to modify a composite identifier by:
- adding, removing, or changing the data types the composite identifier consists of (e.g. it would be possible to add whether the person fluently speaks more than four languages as a TRUE or FALSE result) and/or
- adding value conditions to the values of data types in the existing composite identifier (e.g. we can add the condition that age is less than 39) or
- transforming the data, redoing the identifiers, and optimizing again
until the identifier yields a unique value response and references exactly one specific person. It is obvious that the heuristic approach is the only right approach for getting the expected results.
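The refinement loop described above can be sketched as follows. This is one possible greedy heuristic, not a prescribed method: the `refine` function, the field names and the three sample people are assumptions made up for illustration. It starts from a composite identifier that matches a group of 3 and adds conditions until exactly one person remains.

```python
from typing import Callable

Condition = Callable[[dict], bool]


def count_matches(conditions: list[Condition], records: list[dict]) -> int:
    """Number of records that satisfy every condition of the identifier."""
    return sum(1 for r in records if all(c(r) for c in conditions))


def refine(
    base: list[Condition],
    candidates: list[tuple[str, Condition]],
    records: list[dict],
) -> tuple[list[str], int]:
    """Greedily add candidate conditions until exactly one record matches."""
    active = list(base)
    added = []
    for name, condition in candidates:
        if count_matches(active, records) == 1:
            break  # unique value response reached
        trial = active + [condition]
        # Keep a condition only if it narrows the group without emptying it.
        if 0 < count_matches(trial, records) < count_matches(active, records):
            active = trial
            added.append(name)
    return added, count_matches(active, records)


# Hypothetical group of 3 people who all match the initial composite identifier.
people = [
    {"employer": "Acme Ltd.", "car": "Audi Q5", "age": 44, "speaks_over_four_languages": False},
    {"employer": "Acme Ltd.", "car": "Audi Q5", "age": 36, "speaks_over_four_languages": True},
    {"employer": "Acme Ltd.", "car": "Audi Q5", "age": 38, "speaks_over_four_languages": False},
]

base = [lambda r: r["employer"] == "Acme Ltd.", lambda r: r["car"] == "Audi Q5"]
candidates = [
    ("age < 39", lambda r: r["age"] < 39),
    ("speaks more than four languages", lambda r: r["speaks_over_four_languages"]),
]

added, remaining = refine(base, candidates, people)
print(added, remaining)  # ['age < 39', 'speaks more than four languages'] 1
```

In this sample data the age condition narrows the group from 3 to 2, and the language condition narrows it to exactly one person.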
If it's possible to identify exactly one specific person using only information available at hand (information that is readily available and being processed), that is called direct identification.
Indirect identification is a process of identity confirmation where the information at hand is not enough to confirm identity, but combining the available information with additional information from a third party would be able to confirm the identity. To put it a bit differently, there would be no identity confirmation without a third party, and that is why it is called indirect identification (as the process must engage the third party).
On a high level, indirect identification relies on the fact that the risk of having inadequate data to confirm one's identity will be treated by the identity confirmation requestor doing the following:
- engaging a third party that would enrich the data of the identity confirmation requestor with whatever additional data the third party can offer
- combining the data of the identity confirmation requestor and the data of the third party to successfully confirm the identity of an individual and/or
- employing the third party to conduct the identity confirmation process fully and return only the response: either successful, with the identity confirmed (accompanied by the full data set needed to identify a specific person), or unsuccessful, with the identity not confirmed (accompanied by the reasons why, as enumerated error codes).
This last step (employing the third party to conduct the identity confirmation process fully) can be used either as:
- primary identity confirmation and/or
- a failsafe process that runs only if the identity confirmation requestor is unable to confirm the identity, in which case the third party tries to confirm the identity and/or
- secondary identity confirmation where it is expected to return the same value as primary identity confirmation (if not, identity confirmation is unsuccessful and identity is not confirmed)
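As an illustration of the failsafe variant, the sketch below tries the requestor's own data first and delegates to a third party only when that data does not single out one person. Everything here is hypothetical: the `REGISTRY` data, the `registry_lookup` function standing in for a third-party service, the response shape and the error codes `NO_MATCH` and `AMBIGUOUS` are assumptions, not a real API.

```python
from typing import Callable


def matching(records: list[dict], known: dict) -> list[dict]:
    """Records that agree with `known` on every field they both contain."""
    return [r for r in records if all(r[k] == v for k, v in known.items() if k in r)]


# Hypothetical third-party registry that also holds an ID number.
REGISTRY = [
    {"name": "A", "employer": "Acme Ltd.", "id_number": "X-001"},
    {"name": "B", "employer": "Acme Ltd.", "id_number": "X-002"},
]


def registry_lookup(known: dict) -> dict:
    """Stand-in for a third party that runs the confirmation and returns only the response."""
    found = matching(REGISTRY, known)
    if len(found) == 1:
        return {"confirmed": True, "data": found[0]}
    return {"confirmed": False, "error_codes": ["NO_MATCH" if not found else "AMBIGUOUS"]}


def confirm_identity(known: dict, own_records: list[dict],
                     third_party: Callable[[dict], dict]) -> dict:
    """Failsafe flow: use the data at hand first, then fall back to the third party."""
    own = matching(own_records, known)
    if len(own) == 1:
        return {"confirmed": True, "data": own[0], "source": "requestor"}
    # Data at hand is inadequate: delegate the whole confirmation to the third party.
    return {**third_party(known), "source": "third party"}


# The requestor's own records lack the ID number, so they match a group of 2.
own_records = [
    {"name": "A", "employer": "Acme Ltd."},
    {"name": "B", "employer": "Acme Ltd."},
]

ok = confirm_identity({"employer": "Acme Ltd.", "id_number": "X-001"}, own_records, registry_lookup)
failed = confirm_identity({"employer": "Acme Ltd.", "id_number": "X-999"}, own_records, registry_lookup)
print(ok["confirmed"], ok["source"], ok["data"]["name"])  # True third party A
print(failed["confirmed"], failed["error_codes"])         # False ['NO_MATCH']
```

The third party's response carries either the full data set or the error codes, mirroring the successful and unsuccessful outcomes described above.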
The indirect identity confirmation process relies on:
- primary identity confirmation based on an imperfect alphanumeric simple identifier, run by a third party and provided by an institution with high authority and a strong focus on identity management. However, as identity management is not the institution's core business, errors are expected, but at such a low level that the risk is negligible. However negligible, the risk still needs treatment. The risk is treated with a carefully crafted combination of composite identifiers, described in the second bullet as the secondary identity confirmation process
- secondary identity confirmation that consists of a combination of corrective composite identifiers that meet specific significant criteria (data type components) and would assure that exactly one specific person is recognized by narrowing down the group of multiple persons to one specific person
Results would be interpreted as follows:
- if primary and secondary identity confirmations both return the same single result referencing a specific individual, identity confirmation is successful and identity is confirmed (if not, identity confirmation is unsuccessful and identity is not confirmed)
- if primary or secondary identity confirmation returns exactly one result referencing a specific individual, and the other identity confirmation returns more than one result, and both primary and secondary identity confirmation results contain a reference to the same specific person, identity confirmation would be successful and identity confirmed (in any other case, identity confirmation is unsuccessful and identity is not confirmed) except
- in case primary identity confirmation is unable to run or returns an error, the results of the secondary identity confirmation process would be accepted as true only if it returns a reference to exactly one specific person (if not, identity confirmation is unsuccessful and identity is not confirmed)
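The three interpretation rules above can be written down as a small decision function. This is a sketch under assumptions: each confirmation result is modeled as a set of person references, `None` stands for "unable to run or returned an error", and the `interpret` name is hypothetical. The text does not say what happens when the secondary confirmation fails, so the sketch treats that case as not confirmed.

```python
from typing import Optional


def interpret(primary: Optional[set], secondary: Optional[set]) -> Optional[str]:
    """Return the confirmed person's reference, or None if identity is not confirmed."""
    if secondary is None:
        # Assumption: a failing secondary confirmation is not covered by the rules,
        # so it is treated here as unconfirmed.
        return None
    if primary is None:
        # Primary could not run or returned an error:
        # accept the secondary result only if it references exactly one person.
        return next(iter(secondary)) if len(secondary) == 1 else None
    # One side returns exactly one person and the other side's results contain
    # that same person (this also covers both sides returning the same single result).
    if len(primary) == 1 and primary <= secondary:
        return next(iter(primary))
    if len(secondary) == 1 and secondary <= primary:
        return next(iter(secondary))
    return None


print(interpret({"A"}, {"A"}))            # A
print(interpret({"A"}, {"A", "B"}))       # A
print(interpret({"A", "B"}, {"B"}))       # B
print(interpret({"A"}, {"B"}))            # None
print(interpret({"A", "B"}, {"A", "B"}))  # None
print(interpret(None, {"A"}))             # A
print(interpret(None, {"A", "B"}))        # None
```

Modeling results as sets keeps the rules short: "contains a reference to the same specific person" becomes a subset check.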
In any case, any data used to identify a specific individual should be classified as personal data, regardless of any other facts or process, and should enjoy full protection as set out in GDPR.
This blog post is made available by the author, who is a licensed ISO 27001 Internal Auditor and has extensive experience in managing privacy. This blog is intended for educational purposes only, as well as to present the author's views on how business understands the law, not to provide specific legal advice. By using this blog site you understand that there is no attorney-client relationship between you and this blog publisher. The blog should not be used as a substitute for competent legal advice from a licensed professional attorney. Views of the author do not necessarily represent the views of Infranet (see our incorporation details), nor do they constitute a promise. Photos: Pexels.com