Data Governance in the Cloud: Make It Lean and Clean
All data needs governance, especially data in the cloud. Before we get into how to approach data governance in the cloud, let’s look at the origins of data governance and how it’s evolving.
Historically, there have been a few different ways to implement a data governance program. The first was to set up a centralized office for data governance, hire data stewards, and have them begin governing your data. Unfortunately, that didn’t work so well, because centrally hired stewards often lacked expertise in the data or the organization.
Next came Data Governance 2.0. Instead of a centralized program, federated data governance became all the rage. Stewards were identified from among the business, anointed, and charged with managing data, and the centralized office became a facilitator and provider of tools.
But that didn’t work so well either, because stewards already have full-time jobs and no real motivation to take on extra work. As a result, some organizations have thrown up their hands and dissolved their data offices.
But there’s a third way, focused on lean governance and just-enough oversight. Modern cataloging tools such as Alation automate a lot of the work, and cloud-based data platforms such as Snowflake provide features to implement even the most challenging access rules and sharing networks.
At Hakkoda, all of our solutions include built-in data governance. But what does that mean? First, it means that data governance principles apply to every step, and every deliverable has data governance woven into it at a foundational level. It means that you don’t need to spin up an organization-wide data governance program to reap the benefits of governed data.
A Multi-Pronged Approach to Data Governance in the Cloud
Every data object we deliver conforms to basic metadata minimums. These minimums ensure that all data assets are documented to a consistent standard. You’ll have consistent, detailed info about every database, schema, table, column, and metric we create. This means that all of your views, queries, shares, and transformations will be reliable and usable. We’ll tag your data in Snowflake to make sure everyone knows where the sensitive data is, along with anything else you want people to know about it, like domain, lineage, regional usability, approved uses, inclusion/exclusion scenarios, or basic fit-for-use info.
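To make the idea of metadata minimums concrete, here is a minimal sketch of a completeness check. The required fields, the `AssetMetadata` class, and the sample table name are all hypothetical; a real program would agree on its own list and read the values from its catalog or from Snowflake tags.

```python
from dataclasses import dataclass, field

# Hypothetical metadata minimums; every organization will choose its own.
REQUIRED_FIELDS = ("description", "owner", "domain", "sensitivity")


@dataclass
class AssetMetadata:
    """Metadata recorded for one data asset (database, schema, table, ...)."""
    name: str
    attributes: dict = field(default_factory=dict)


def missing_metadata(asset: AssetMetadata) -> list:
    """Return the required fields that are absent or blank for this asset."""
    missing = []
    for required in REQUIRED_FIELDS:
        value = asset.attributes.get(required)
        if value is None or not str(value).strip():
            missing.append(required)
    return missing


# Illustrative asset: documented, but not yet classified for sensitivity.
orders = AssetMetadata(
    "SALES.PUBLIC.ORDERS",
    {
        "description": "One row per customer order",
        "owner": "sales-data-team",
        "domain": "sales",
    },
)
print(missing_metadata(orders))
```

A check like this can run as part of delivery, so an asset that falls short of the minimums is flagged before anyone starts relying on it.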
We’ll help you understand the profiling data you receive from the tools you already have, such as Snowflake, Alation, and dbt, and show you how to apply the business rules you already know to reveal where data quality may be lacking. We’ll then use those insights to recommend automated techniques for remediation, mitigation, and programmatically improving the quality of the data you produce or receive.
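As a rough illustration of applying known business rules to find quality gaps, the sketch below measures how often each rule fails across a set of rows. The rule names, column names, and sample rows are invented for the example; in practice the rules come from your business and the rows from your platform.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]

# Hypothetical business rules: each returns True when a row passes.
RULES: Dict[str, Rule] = {
    "order_total_non_negative": lambda r: isinstance(r.get("order_total"), (int, float))
    and r["order_total"] >= 0,
    "customer_id_present": lambda r: bool(r.get("customer_id")),
}


def failure_rates(rows: List[Row], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return the fraction of rows that fail each rule (0.0 when there are no rows)."""
    if not rows:
        return {name: 0.0 for name in rules}
    return {
        name: sum(1 for row in rows if not rule(row)) / len(rows)
        for name, rule in rules.items()
    }


# Illustrative sample: one missing customer ID and one negative total.
sample = [
    {"customer_id": "C1", "order_total": 20.0},
    {"customer_id": "", "order_total": 15.5},
    {"customer_id": "C3", "order_total": -4.0},
    {"customer_id": "C4", "order_total": 9.0},
]
rates = failure_rates(sample, RULES)
print(rates)
```

Tracking these rates over time is one simple way to decide which rules deserve automated remediation first.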
Your rules for who can access your data will be implemented at a programmatic level. We use Snowflake’s native data governance features, including dynamic data masking, row-level access control, user audit, access monitoring, and tagging. If you don’t already have one, we can also help you set up a reusable, flexible structure for implementing these features.
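One way to keep access rules reusable is to generate the policy statements from a small template instead of writing each one by hand. The sketch below builds Snowflake SQL for a dynamic data masking policy on a string column; the helper name, policy name, table, column, and role are assumptions made for illustration, and the generated statements would still need review before being run.

```python
import re

# Plain (unquoted) identifiers only, with up to three dot-separated parts.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def _check_identifier(value: str) -> str:
    """Reject anything that is not a plain identifier, to avoid SQL injection."""
    if not _IDENTIFIER.match(value):
        raise ValueError(f"not a plain identifier: {value!r}")
    return value


def masking_policy_sql(policy_name, table, column, allowed_roles):
    """Build statements that create a masking policy and attach it to a column.

    Hypothetical helper: roles in allowed_roles see the real value,
    every other role sees a fixed placeholder.
    """
    policy = _check_identifier(policy_name)
    roles = ", ".join("'" + _check_identifier(r).upper() + "'" for r in allowed_roles)
    create = (
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val ELSE '***MASKED***' END;"
    )
    attach = (
        f"ALTER TABLE {_check_identifier(table)} "
        f"MODIFY COLUMN {_check_identifier(column)} SET MASKING POLICY {policy};"
    )
    return [create, attach]


statements = masking_policy_sql(
    "email_mask", "crm.public.customers", "email", ["pii_reader"]
)
for statement in statements:
    print(statement)
```

Because the role list is a parameter, the same template can cover many sensitive columns, which is the kind of reusable structure the paragraph above describes.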
Dark Data Audit
If we discover ‘dark data’ in the course of our engagement, either unintentionally or based on your request, we’ll share a report of our findings with you. Then you can decide what to do with this stale, unused data. Dark data is usually the result of forgotten processes, temporary tables never removed, or outdated methods and workarounds. Most likely, you won’t want to migrate it to the cloud, but you may not be ready to part with it, either. We can be your partner to determine the best course of action.
Old data can be a liability, so it’s important to retire it, archive it, and purge it at end of life. What is the end of life? Several factors determine it: your organization’s legal requirements under the relevant regulations, your business requirements, your industry’s standards and best practices, and any data destruction rules that apply to your business. Once you’ve defined how long your data should be kept, we can automate your purge requirements using Snowflake’s native data protection features. Do you need to keep it in a cold archive for potential (but not fast) retrieval? We can help you locate unused or expired data and automate regular unload and removal to Amazon S3 Glacier, Azure Archive Storage, or Google Cloud Archive storage.
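Once retention periods are defined, the decision for each table can be reduced to a simple rule. The sketch below is one possible version: the function name and both thresholds (archive after a year without access, purge after seven years) are placeholders, not recommendations, and your legal and business requirements should set the real values.

```python
from datetime import date


def lifecycle_action(
    created: date,
    last_accessed: date,
    today: date,
    archive_after_days: int = 365,     # placeholder: idle time before cold archive
    purge_after_days: int = 7 * 365,   # placeholder: total age before purge
) -> str:
    """Classify a data asset as 'purge', 'archive', or 'keep'."""
    age_days = (today - created).days
    idle_days = (today - last_accessed).days
    if age_days >= purge_after_days:
        return "purge"    # past end of life
    if idle_days >= archive_after_days:
        return "archive"  # unused, but still within its retention period
    return "keep"


today = date(2024, 1, 1)
print(lifecycle_action(date(2015, 6, 1), date(2016, 1, 1), today))
print(lifecycle_action(date(2021, 1, 1), date(2022, 3, 1), today))
print(lifecycle_action(date(2023, 1, 1), date(2023, 12, 15), today))
```

In this sketch the purge check runs first, so data past its end of life is never moved to an archive it should not occupy.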
Data Governance in the Cloud
One of the biggest challenges data professionals face is understanding how cloud technology impacts their data environment and how to apply governance effectively. We’re happy to offer a governance program review focused on the data cloud. We will partner with you to confirm where your governance is fantastic and identify where it could be more fantastic. After a joint review based on cloud data governance best practices, we’ll provide you with a score and recommendations for the following areas: governance roles and responsibilities, data cataloging and metadata, data quality and automation, policies and standards, and potential data valuation and monetization opportunities.