When it comes to big data, most companies are familiar with its volume, variety and velocity, but they may be less attuned to data quality. Decision-makers often rush ahead with ambitious plans to improve their business intelligence strategies without realizing that, to discover golden nuggets of information, they first need to govern the content they collect.
What is big data, exactly?
Business leaders' lofty goals for improved efficiency and productivity may not come to fruition if they get caught up in the buzz surrounding big data without taking the time to understand what it is and how it can help.
Many people know they want big data, but they have a hard time nailing down its exact definition. Information Management describes it as data which, due to its qualities and complexities, requires innovative processing solutions. It is a combination of structured data - that which fits into traditional databases, such as contact information or mathematical model outputs - and unstructured data. Unstructured data encompasses the many forms of content that can't be compiled quite as easily, such as tweets, videos and website browsing metrics.
Big data helps companies connect the dots
Combining both types of information is helpful because it allows companies to identify correlations that might otherwise be difficult to see. For instance, by comparing its sales records with customers' tweets and Facebook posts about anticipated departures, a local drug store might determine that a spike in toothpaste sales coincides with the time when summer camps send supply lists to families who have enrolled their children.
Data quality tools ensure that insight is correct
However, analysts can cross wires and come up with incorrect assessments if data isn't accurate and complete. This can occur when data quality is treated as an afterthought in big data plans, or when businesses underestimate how varied the content can be.
Software developer Patrick McKenzie recently wrote a blog post for Kalzumeus that demonstrates this idea. McKenzie explains that when companies invest in software, they make a number of false assumptions about their systems' needs. One of the most prevalent issues involves names. This simple but important piece of identifying information can be recorded incorrectly if programs limit the number of characters a name may contain, cannot handle foreign letters or do not have enough fields for an individual's complete name.
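To make the three name pitfalls above concrete, here is a minimal Python sketch of the kind of check a data quality tool might run before a record is stored. It is an illustration only: the function name `name_storage_issues`, the 20-character limit and the two-field layout are assumptions made for this example, and they do not come from McKenzie's post or from any particular product.

```python
import unicodedata
from typing import List

# Hypothetical constraints of a legacy system; both values are assumptions.
MAX_NAME_LENGTH = 20   # characters the name column can hold
NAME_FIELDS = 2        # e.g. one "first name" and one "last name" field


def name_storage_issues(full_name: str) -> List[str]:
    """Return the reasons a name would be stored incorrectly, if any."""
    issues = []
    # Normalize so an accented letter counts as one character, not two.
    normalized = unicodedata.normalize("NFC", full_name.strip())

    if len(normalized) > MAX_NAME_LENGTH:
        issues.append("too long")
    if not normalized.isascii():
        issues.append("non-ASCII characters")
    if len(normalized.split()) > NAME_FIELDS:
        issues.append("more parts than fields")
    return issues


if __name__ == "__main__":
    for name in ["Ann Lee", "Björk", "José García Márquez de la Torre"]:
        print(name, "->", name_storage_issues(name) or "ok")
```

A check like this does not decide what the correct name is. It only flags records that the assumed system would truncate or mangle, so that someone can review them before they feed into an analysis.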
Data quality tools can call attention to problems in content and prevent companies from making decisions based on inaccurate information.