Data quality: The necessary prerequisite for identity resolution

Ray Wright

Let’s face it, having a robust identity resolution capability is a key requirement for most consumer-facing organizations today. As more interactions and processes are digitized, being able to recognize and authenticate individuals to prevent fraud and maximize customer satisfaction is now table stakes. Identity resolution is also imperative for insightful analytics and profitable marketing campaigns. 

What is identity resolution?

At its core, identity resolution is about matching identifying information to individual elements of internal or external reference data. How can you tell that two or more records are related? The answer: Link them with a unique identifier, such as a customer ID number, or merge them into a single record, so that you end up with a trustworthy customer database.

To determine whether two records are related, or whether a single record is valid, third-party identifiers can be used, and they may also facilitate matching with external data. Frequently used identifiers include email addresses, device IDs, and IP addresses. Identifiers that are becoming more popular include mobile IDs, addressable TV IDs, social handles, and IoT (Internet of Things) device IDs, such as those from an Apple Watch or a health-and-fitness mobile app.

Top challenges for identity strategies

1. Many data sources

As the number of devices used by consumers increases, the number of data sources grows. As a result, it becomes challenging to pair data elements across many different sources to identify matches and then, if required, to merge them into single customer records.

Consider this example: In some cases, the consumer’s name and contact information are captured and added to the record. In this situation, it may be possible to know that two records match because the name, address, date of birth, and social security number are identical on each record. However, while dates of birth and social security numbers do not change, it is entirely possible that they were entered incorrectly into the acquiring application or form or, worse, entered by someone who has created a synthetic identity. It’s also quite possible that names and addresses have changed, or were entered differently, especially when a long period of time has passed.

2. Data decay  

The passing of time can result in data decay: consumer data becomes outdated in some way. Our research shows that companies believe up to 30% of their contact data may be inaccurate at any point in time. When new sources of data are added, it can be hard to maintain relationships with older sources. Other elements may never have been correct because they were captured or entered manually, and humans often make mistakes.

3. Unreliable data

Inaccurate data is one of the biggest challenges for identity resolution. How can you be sure that a match between data fields is accurate if the underlying data is suspect? Consider this example: One record shows that J. Nash lives at 123 Main Street and another record shows that John Nash lives at that same address. A third record has John Nash living at 213 Main Street. It’s hard to say if these all relate to the same person without corroborating data. Do other data fields match or better explain the differences? Does John have a sister or wife with the initial “J”? Has John Nash moved since completing earlier forms? 

Control your data quality 

The common thread among these challenges is data quality. With more data sources and more ways to collect data, it becomes more difficult to harvest valuable and quickly actionable information on each customer. In turn, without data quality, it will be hard to add a unique identifier and be confident that each record is a truthful source.

In this section, we outline the top data quality best practices needed in today’s tech stacks.  

1. Data validation

Best practice is to validate contact data as it is collected. In that way, errors can be identified while the consumer is available to correct them. Then, after data has been collected, regular checks can be made to identify changes and update the data. For example, people move and their old addresses become obsolete. Running a check against the National Change of Address (NCOA) database periodically will help in identifying up-to-date addresses. 
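As a rough illustration of validating at the point of capture, the sketch below runs two format checks on a submitted record and returns a list of problems that could be shown to the consumer while they are still available to correct them. The field names (`email`, `zip`) and the function name are assumptions made for this example. The checks are purely syntactic; confirming that an address is deliverable or current would require reference data such as NCOA, which is not shown here.

```python
import re

# Deliberately loose email pattern (an assumption for this sketch):
# exactly one "@", no whitespace, and at least one dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# US ZIP code: five digits, optionally followed by a four-digit extension.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def validate_contact(record: dict) -> list[str]:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []

    email = record.get("email", "").strip()
    if not EMAIL_PATTERN.match(email):
        problems.append("email: not in a recognizable format")

    zip_code = record.get("zip", "").strip()
    if not ZIP_PATTERN.match(zip_code):
        problems.append("zip: expected 5 digits or ZIP+4")

    return problems


if __name__ == "__main__":
    print(validate_contact({"email": "j.nash@example.com", "zip": "02139"}))
    print(validate_contact({"email": "j.nash@example", "zip": "2139"}))
```

Returning every problem at once, rather than stopping at the first, lets a form highlight all fields that need attention in a single pass.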

2. Data profiling

Other valuable checks can be performed with data profiling. Profiling compares data in the same fields against one another, checking for format variances, whether data falls in the right ranges (e.g., dates of birth should fall within a certain range), and whether relationships between data in different fields always hold true. For example, a large bank found that despite being associated with different first and last names, some addresses were used in hundreds of credit card applications. The expectation was that consumers with different names will typically live at unique addresses. Checking the data quality showed otherwise and pointed to potential fraud.
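The two profiling checks described above, a range check on dates of birth and a cross-field check on how many distinct names share one address, could be sketched roughly as follows. This is a minimal illustration, not the bank's actual method: the field names, the `max_names` threshold, and the default date range are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date


def addresses_shared_by_many_names(applications, max_names=3):
    """Map each address used by more than `max_names` distinct names to that count."""
    names_by_address = defaultdict(set)
    for app in applications:
        # Light normalization so "12 Oak St" and "12  oak st " compare equal.
        address = " ".join(app["address"].lower().split())
        name = (app["first_name"].strip().lower(), app["last_name"].strip().lower())
        names_by_address[address].add(name)
    return {
        address: len(names)
        for address, names in names_by_address.items()
        if len(names) > max_names
    }


def out_of_range_birth_dates(applications, earliest=date(1900, 1, 1), latest=None):
    """Return applications whose date of birth falls outside [earliest, latest]."""
    latest = latest or date.today()
    return [
        app for app in applications
        if not (earliest <= app["date_of_birth"] <= latest)
    ]
```

An address flagged by the first function is not proof of fraud (apartment buildings and shared households exist); it is a prompt for closer review.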

3. Data matching

When there are common and verified identifiers in different data sources, matching records is relatively simple. However, in most cases, challenges arise because the data is not sound. Circling back to the top challenges we outlined above, let’s take a look at how inaccurate data can create roadblocks for data matching and, in turn, identity resolution.

Difficulties arise when the data does not have common identifiers, is formatted differently, has different field names, and so on. Two approaches are typically used to improve record matching: deterministic and probabilistic.

In the deterministic approach, basic logic is used. If a unique identifier in record A matches one in record B, and a different unique identifier in B matches one in record C, then records A and C relate to the same individual. In the probabilistic approach, the likelihood of a match is considered. If records for John Nash and Jack Nash are compared and found to have the same date of birth and address, there’s a strong probability that the records relate to the same person. It’s not 100% certain, as a deterministic match would be, because they could be twins, but the likelihood is very high, assuming you trust the data.
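The two approaches can be sketched in a few lines. The deterministic part below links records that share any identifier value and follows those links transitively, so A and C end up together through B. The probabilistic part is a simplified weighted score, not a formal statistical model: the weights, the field names, and the use of `difflib` string similarity for names are assumptions made for illustration, and real systems typically tune or estimate such weights from data.

```python
from difflib import SequenceMatcher


def link_deterministically(records):
    """Group record IDs that share any identifier value, directly or transitively.

    `records` maps a record ID to a dict of identifier name -> value.
    """
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links to the group's representative (with path halving).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    first_seen = {}  # (identifier name, value) -> first record that carried it
    for rid, identifiers in records.items():
        for key, value in identifiers.items():
            if value is None:
                continue
            owner = first_seen.setdefault((key, value), rid)
            parent[find(rid)] = find(owner)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return list(groups.values())


# Illustrative weights only; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"date_of_birth": 0.4, "address": 0.4, "name": 0.2}


def match_score(a, b):
    """Weighted agreement score between two records, from 0 (no match) to 1."""
    score = 0.0
    if a["date_of_birth"] == b["date_of_birth"]:
        score += WEIGHTS["date_of_birth"]
    if a["address"].lower() == b["address"].lower():
        score += WEIGHTS["address"]
    # Names get partial credit based on string similarity.
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    score += WEIGHTS["name"] * similarity
    return score
```

With these weights, the John Nash and Jack Nash records from the example score high because date of birth and address agree exactly and the names are similar. Where to set the threshold for accepting a match is a business decision that depends on the cost of a false match.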
 

How data quality enables identity resolution and impacts the creation of a single customer view

Data quality is a necessity for maximizing identity resolution, and strong identity resolution capabilities are needed to ensure the success of any single customer view project.

With accurate data and both deterministic and probabilistic approaches to matching, it is possible to think about combining all customer data into a single or a 360° view. As discussed, combining records from different sources requires resolving identities between different records or data sources. Typically, a single customer view refers to the consolidation of all internal customer data into a set of comprehensive customer records. A 360° view refers to the addition of external or third-party data to the consolidated internal view.

There’s a debate, though: Is it really possible to build a comprehensive 360° view, bringing all customer data together? And does the cost make it worthwhile?

Most organizations want to know who their customers are, their needs, locations, and channel preferences. This seems essential in today’s fast-moving world. After all, customers much prefer to be informed about relevant offers and to be communicated with in a personalized way that recognizes their prior purchases and their expected needs. Yet whether it makes sense to attempt to consolidate all customer insights and attributes into individual records depends on the complexity of your business. There are many considerations, but two very important criteria are:

1. The state of data

Consider the type of data you generate, who is responsible for managing it, and how the resulting insights are used. Our latest research shows that 45% of the US companies responding to our recent global data management survey have more than 10 contact databases. On top of that, add the many other data sources that hold related transaction data (web visit data, e-commerce data, social media data, and a wide variety of third-party data), and the challenge of creating a comprehensive 360-degree view may be too complex and expensive. It may be easier for each function to create its own single source of truth, at least as a first step, and to periodically sync the different sources.

2. Data technology 

It is also important to consider how many different data platforms are in use. For example, production and finance departments rely on ERP systems to manage orders, inventories, and financials. They also rely on a master data management system to store important product, customer, and supplier data. Sales organizations will use a customer relationship management system to keep track of customer engagements, opportunities, and contracts. Marketing departments increasingly prefer to leverage a customer data platform to generate a segmented marketing database, orchestrate campaigns, and manage ROI. Integrations between platforms can enable data sharing across departments that may have varying tech stacks, without the need for a monolithic database.

The key to success is to ensure that accurate data is maintained. Sharing poor quality data from a single source of truth or across multiple platforms will lead to a lack of trust and a diminution in your ability to recognize customers when their data appears online or in one of the platforms. Strong identity resolution requires accurate data to work. Without it, there will be little trust in your customer insights, and it will be difficult to come to decisive conclusions. 
