Enterprise Data Quality Guide: “Who Does What” | Michael Segner | Sep 2024


How Larger Organizations Can Operationalize Data Quality Programs for Modern Data Platforms

An answer to “who does what” for enterprise data quality. Image courtesy of the author.

Data quality is vital to the success of any organization, especially one running a modern data platform. Yet many larger organizations struggle to operationalize their data quality programs effectively. In this article, we will explore one answer to that problem, along with best practices larger organizations can apply.

Practical questions often arise when discussing data quality in larger organizations. Who is responsible for what? Why is data quality important? How can data quality be ensured across different teams and departments?

Imagine data quality as a relay race, where each leg – detection, triage, resolution, and measurement – depends on the others. To ensure the success of a data quality program, it is essential to have a clear understanding of who does what and why.

Modern data teams are realizing the need to align around their most valuable data products. Whether it’s a revenue-generating machine learning application or strategic insights derived from curated data, organizations must invest in their data products to drive business value.

Foundational Data Products

Prior to becoming discoverable, every foundational data product should have a designated data platform engineering owner responsible for end-to-end monitoring of freshness, volume, schema, and baseline quality. By setting baseline quality requirements and ensuring data consistency, organizations can lay a solid foundation for their data products.
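As an illustration, the baseline monitoring an owner is responsible for can be automated with a few simple checks. This is only a sketch under assumed thresholds and names (`max_staleness`, `expected_columns`, the 50% volume tolerance), not the API of any particular monitoring tool.

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at, max_staleness=timedelta(hours=6)):
    """Flag the table as stale if it has not been updated recently enough."""
    return datetime.now(timezone.utc) - last_loaded_at <= max_staleness

def check_volume(row_count, recent_counts, tolerance=0.5):
    """Flag anomalous loads: row count outside +/- tolerance of the recent average."""
    if not recent_counts:
        return True  # no history yet, so nothing to compare against
    avg = sum(recent_counts) / len(recent_counts)
    return abs(row_count - avg) <= tolerance * avg

def check_schema(actual_columns, expected_columns):
    """Fail if any expected column is missing; extra columns are tolerated here."""
    return set(expected_columns).issubset(set(actual_columns))
```

In practice these checks would run on a schedule against platform metadata, with failures routed to the designated platform engineering owner rather than to downstream consumers.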

Derived Data Products

Monitoring data quality at the derived data product level is crucial to prevent bad data from entering the system. Domain-based data stewards should be responsible for triaging alerts and ensuring quality at this level.

One popular way to bridge the gap between foundational and derived data products is through a dedicated triage team that supports all products within a given domain. This approach ensures efficiency without compromising on quality.

By setting explicit SLAs and monitoring table-level health scores for derived data products, organizations can ensure high data quality across use cases.
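One simple way to express such a health score is as a weighted pass rate over recent checks, compared against an SLA target. The weights and the 95% target below are illustrative assumptions, not an industry standard.

```python
def health_score(check_results):
    """Weighted fraction of passing checks.

    check_results maps a check name to (passed: bool, weight: float),
    e.g. {"freshness": (True, 2.0), "volume": (True, 1.0)}.
    """
    total = sum(weight for _, weight in check_results.values())
    passed = sum(weight for ok, weight in check_results.values() if ok)
    return passed / total if total else 1.0

def meets_sla(score, target=0.95):
    """Compare a table's health score against its agreed SLA target."""
    return score >= target
```

For example, a table passing freshness (weight 2.0) and volume (1.0) but failing schema (1.0) scores 0.75 and would breach a 95% SLA, triggering triage.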

Summary

Operating data quality programs in larger organizations is no easy task, but with clear processes, ownership, monitoring, and communication strategies in place, it is possible to achieve success. By prioritizing data quality and operational response, organizations can build trust in their data and drive business value.

For more insights on data engineering, data quality, and related topics, follow me on Medium. Together, we can strive towards achieving excellence in data quality for modern data platforms.
