Data governance is the key to realizing the full business benefits of generative AI applications.
Generative AI burst onto the scene in 2022 with the introduction of ChatGPT. In less than two years, it has entered the mainstream, with organizations scrambling to take advantage of it. The latest McKinsey Global Survey on AI, released on May 30, 2024, found that 65 percent of organizations use gen AI regularly. That’s almost double the percentage from McKinsey’s previous survey just 10 months earlier. What’s more, organizations are starting to see real benefits from gen AI, including reduced costs and increased revenue.
Unfortunately, gen AI initiatives are often undermined by bad data.
Almost two-thirds (63 percent) of McKinsey survey respondents said inaccuracy was the greatest gen AI risk, up from 56 percent in the previous survey. In fact, inaccuracy is the only risk that saw a significant increase. Nearly one-fourth (23 percent) of respondents said their organizations had experienced negative consequences due to inaccurate gen AI output.
A big part of the problem is that organizations capture and store huge amounts of data but haven’t developed the processes for verifying its accuracy or value. To effectively deal with these issues, organizations of all sizes need a data governance framework to ensure the availability, accuracy, integrity and security of their information assets.
Formalize Policies, Procedures
A well-designed governance framework includes rules, policies and processes that formalize how an organization will collect, access, manage and use data resources. According to Gartner, only about a third of U.S. businesses have formal data governance frameworks in place.
Moreover, most organizations continue to depend on legacy data management processes. Typically, they treat data management as a limited technology project: install or update some software, then watch it work. That approach is inadequate for the challenges arising from a multitude of internal and external data sources, growing data volumes, data ownership issues and more.
Without a governance framework, data sprawl can become a huge issue. Data is continually copied for disparate applications, including backup, disaster recovery, development, testing and analytics. Over time this creates mass data fragmentation, with data spread across myriad infrastructure silos where it is accessed, updated and manipulated by many different users and applications. That makes it nearly impossible to have a “single source of truth” that ensures everyone in the organization makes decisions based on the same data.
Ensuring Effective Data Governance
The first step to creating effective governance is understanding that it is an ongoing process that will require regular assessments and adjustments. It must also be viewed not as simply an IT issue but as an organizational issue requiring the participation of executives, stakeholders and employees across the organization.
A governance program should also identify who owns and is accountable for data assets. Several mature data governance models, such as COSO's Internal Control framework and ISACA's COBIT, have called for IT to be the single data owner. Newer models establish decentralized ownership, putting data in the hands of business users. The danger of an IT-centric approach to governance is that data won’t be managed as an asset that crosses organizational boundaries. Organizations can wind up with a series of tactical projects that neither address underlying issues nor deliver expected benefits.
Decentralized ownership only works within tight parameters, however. There must be controls on who can access data and what they can do with it. Additionally, access rights should be periodically reviewed to ensure users’ permissions align with their job roles and do not compromise data security.
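A periodic access review like the one described above can be largely automated by comparing each user's granted permissions against what their role actually allows. The sketch below illustrates the idea; the role names, permission strings and user records are illustrative assumptions, not tied to any particular identity system.

```python
# Minimal sketch of an automated access-rights review: flag any grants
# that exceed what a user's job role allows. Roles, permissions and
# users here are hypothetical examples.

ROLE_PERMISSIONS = {
    "analyst": {"read:sales", "read:marketing"},
    "engineer": {"read:sales", "write:etl"},
}

def review_access(users):
    """Return (user, excess grants) pairs where grants exceed the role."""
    findings = []
    for user in users:
        allowed = ROLE_PERMISSIONS.get(user["role"], set())
        excess = set(user["grants"]) - allowed
        if excess:
            findings.append((user["name"], sorted(excess)))
    return findings

users = [
    {"name": "alice", "role": "analyst", "grants": ["read:sales"]},
    {"name": "bob", "role": "analyst", "grants": ["read:sales", "write:etl"]},
]
print(review_access(users))  # flags bob's out-of-role write:etl grant
```

Running a check like this on a schedule turns the review from an occasional manual audit into a routine control.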
The Value of a Phased Approach
A governance program should also include procedures for storing, archiving, backing up and securing data. This is critical not only for business operations but to reduce the risk that gen AI will expose sensitive data or intellectual property.
A variety of tools can simplify these processes through automation. For example, automated archival tools streamline the process of moving data off primary storage tiers, and e-discovery tools deliver powerful search and tagging functionality that improve the ability to review and classify data.
Given the number of systems, processes and people involved, data governance initiatives aren’t good candidates for “big bang” implementations where everything happens at once. Most analysts recommend a phased approach focused on tactical projects that build toward the end goal. Start by creating a team of stakeholders, IT personnel and subject matter experts to develop an overall vision, incremental goals and project timelines. Addressing data quality issues persistently and consistently will allow organizations to realize the true value of their data assets and take full advantage of the transformative benefits of gen AI.