Improving Data Quality

Good governance is the key to realizing the full business benefits of big data initiatives.

Data is the coin of the realm for businesses today. Organizations of all shapes and sizes are embracing technologies that allow them to analyze massive data sets in order to gain market insights, optimize operations and create personalized customer experiences. IDC analysts predict the global market for big data and business analytics solutions will increase from $189 billion in 2019 to $274 billion by 2022.

Unfortunately, big data initiatives are often undermined by bad data.

Only 15 percent of big data initiatives make it past the pilot stage, according to Gartner. Meanwhile, Experian reports that an astounding 95 percent of organizations say poor data quality causes operational inefficiencies, lost productivity, compliance risks, customer satisfaction issues and other negative effects. What’s more, poor business decisions based on inaccurate or faulty data are extremely costly — a recent IBM study says poor data quality costs U.S. businesses more than $3.1 trillion every year.

A big part of the problem is that organizations are capturing and storing huge amounts of data but haven’t developed the processes or organizational structure to properly verify its accuracy or value. To effectively deal with these issues, businesses of all sizes need to develop a comprehensive framework to ensure the availability, accuracy, integrity and security of their information assets. Although small to midsize businesses (SMBs) may have different needs and objectives than large enterprises, they have much to gain from effective data governance.
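The verification step described above can be partially automated. The sketch below shows one minimal form such a check might take; the field names and rules (requiring an email address, a non-negative order total, and unique IDs) are illustrative assumptions, not a prescribed standard:

```python
# Minimal data-quality audit: flag incomplete, invalid and duplicate records.
# Field names and validation rules are illustrative, not a standard.

def audit_records(records):
    seen_ids = set()
    issues = []
    for i, rec in enumerate(records):
        if not rec.get("email"):
            issues.append((i, "missing email"))
        if rec.get("order_total", 0) < 0:
            issues.append((i, "negative order total"))
        rec_id = rec.get("id")
        if rec_id in seen_ids:
            issues.append((i, "duplicate id"))
        seen_ids.add(rec_id)
    return issues

records = [
    {"id": 1, "email": "a@example.com", "order_total": 20.0},
    {"id": 2, "email": "", "order_total": 35.0},                # incomplete
    {"id": 1, "email": "c@example.com", "order_total": -5.0},   # duplicate, invalid
]
print(audit_records(records))
# → [(1, 'missing email'), (2, 'negative order total'), (2, 'duplicate id')]
```

In practice such rules would come from the governance framework itself, so that every team applies the same definition of "accurate" to shared data.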

Formalize Policies, Procedures

A well-designed governance framework includes rules, policies and processes that formalize how an organization will collect, access, manage and use data resources across the entire organization. According to Gartner, only about a third of U.S. businesses have formal data governance frameworks in place.

What’s more, most organizations continue to depend on legacy data management processes that predate the big data era. Typically, they treat governance as a limited technology project that primarily requires installing or updating software and watching it work. According to the Experian study, this approach is inadequate for addressing challenges arising from a multitude of internal and external data sources, growing data volumes, data ownership issues and more.

Without a governance framework, data sprawl can become a huge issue, with data continually copied for disparate applications including backup, disaster recovery, development, testing and analytics. Over time, this creates mass data fragmentation, with data spread across myriad infrastructure silos where it is accessed, updated and manipulated by many different individuals. That makes it nearly impossible to have a “single source of truth” that ensures everyone in an organization is making decisions based on the same data.

The first step to creating effective governance is understanding that it is an ongoing process that will require regular assessments and adjustments to meet changing goals and challenges. It must also be viewed not as simply an IT issue, but as an organizational issue requiring the participation of executives, stakeholders and employees across the organization.

Defining Data Ownership

A governance program should also identify who owns and is accountable for data assets. Several mature data governance models such as COSO's Internal Control framework and ISACA's COBIT (Control Objectives for Information and Related Technologies) call for IT to be the single data owner, but that is changing. Newer models establish decentralized ownership, putting data in the hands of business users.

The danger of an IT-centric approach to governance is that data won’t be managed as an asset that crosses organizational boundaries. Organizations can wind up with a series of tactical projects such as data migrations or storage upgrades that neither address underlying issues nor deliver expected business benefits. As the Harvard Business Review notes: “Companies that want to compete in the age of data need to do three things: share data tools, spread data skills and spread data responsibility.”

Decentralized ownership only works within tight parameters, however. There must be controls on who can access data and what they can do with it. Additionally, there should be periodic reviews of access rights to ensure users’ permissions align with their job roles and do not compromise data security.
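A periodic access review of the kind described above amounts to comparing each user's granted permissions against what their role allows. The role and permission names below are hypothetical examples, not drawn from any particular governance model:

```python
# Periodic access review: flag permissions a user holds beyond what their
# role allows. Role and permission names are hypothetical examples.

ROLE_PERMISSIONS = {
    "analyst": {"read_sales", "read_marketing"},
    "engineer": {"read_sales", "write_sales"},
}

def find_excess_permissions(users):
    """Return (user, excess permissions) pairs that need review."""
    flagged = []
    for user in users:
        allowed = ROLE_PERMISSIONS.get(user["role"], set())
        excess = set(user["granted"]) - allowed
        if excess:
            flagged.append((user["name"], sorted(excess)))
    return flagged

users = [
    {"name": "dana", "role": "analyst", "granted": {"read_sales", "write_sales"}},
    {"name": "lee", "role": "engineer", "granted": {"read_sales"}},
]
print(find_excess_permissions(users))  # → [('dana', ['write_sales'])]
```

Running a report like this on a schedule turns the review from a manual audit into a routine control, which is what keeps decentralized ownership within safe parameters.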

A governance program should also include procedures for storing, archiving, backing up and securing data. These are primarily IT issues, but a variety of emerging tools are simplifying these processes through automation. For example, automated archival tools streamline the process of moving data off primary storage tiers, and e-discovery tools deliver powerful search and tagging functionality that improves the ability to review and classify data.
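The core of an automated archival policy can be sketched simply: scan the primary tier and move anything untouched for longer than a retention window. The directory layout and the 90-day default below are illustrative assumptions, not a reference to any specific product:

```python
# Sketch of an automated archival policy: move files not modified within a
# retention window off the primary tier. Paths and the 90-day default are
# illustrative assumptions.
import os
import shutil
import time

def archive_stale_files(primary_dir, archive_dir, max_age_days=90):
    """Move files not modified within max_age_days into archive_dir."""
    os.makedirs(archive_dir, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for name in os.listdir(primary_dir):
        path = os.path.join(primary_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

Commercial tools layer scheduling, tiering policies and audit logging on top of this basic loop, but the policy decision, how long data stays on primary storage, belongs to the governance program rather than to IT alone.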

Given the number of systems, processes and people involved, governance initiatives aren’t good candidates for “big bang” implementations where everything happens at once. Most analysts recommend a phased approach focused on tactical projects that build toward the end goal. Start by creating a team of stakeholders, IT personnel and subject matter experts to develop an overall vision, incremental goals and project timelines. Addressing data quality issues in a persistent and consistent manner will allow companies to realize the true value of their data assets in the long run.