Previously, I’ve covered dynamic data masking and row-level access, which comprise half of Snowflake’s set of data governance features. This time I will cover the other half – tagging, and the ability to audit user access.
Let’s start with tagging. Most people are somewhat familiar with tagging: it’s a way of attaching information to an artifact such as a web page, data object, or report. You can tag your data objects in Snowflake with all sorts of metadata, but for governance purposes we’re specifically excited about using tagging to indicate data classification or sensitivity. You set up tags as key-value pairs (key = ‘value’), where the key names the type of tag you are attaching and the value is the setting for that particular object. So you might set up one where the tag key is “classification” and the values could be “public,” “private,” “restricted,” and “confidential.” When you go to affix a tag to a column such as Social Security Number, you would set your tag as (classification = ‘confidential’).
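To make the key-value idea concrete, here is a minimal sketch in Python that builds the Snowflake statements for the classification example. The table name customers and column name ssn are hypothetical, and the helper functions are purely illustrative; check the generated SQL against your own environment before running it.

```python
# Tag key and allowed values taken from the example in the text.
TAG_KEY = "classification"
ALLOWED_VALUES = ("public", "private", "restricted", "confidential")


def create_tag_sql(key, allowed_values):
    """Build a CREATE TAG statement that restricts the tag to a fixed value set."""
    values = ", ".join(f"'{v}'" for v in allowed_values)
    return f"CREATE TAG IF NOT EXISTS {key} ALLOWED_VALUES {values};"


def set_column_tag_sql(table, column, key, value, allowed_values):
    """Build an ALTER TABLE statement that attaches key = 'value' to one column."""
    if value not in allowed_values:
        raise ValueError(f"{value!r} is not an allowed value for tag {key!r}")
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET TAG {key} = '{value}';"


# Hypothetical table and column names, for illustration only.
print(create_tag_sql(TAG_KEY, ALLOWED_VALUES))
print(set_column_tag_sql("customers", "ssn", TAG_KEY, "confidential", ALLOWED_VALUES))
```

Validating the value before building the statement mirrors what the allowed-values list does on the Snowflake side: a typo like ‘confidental’ is rejected up front instead of quietly becoming a fifth classification level.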
You’d definitely want to lay out your set of tags and acceptable values in a framework or model prior to beginning tagging. Are you tagging classification levels, or do you need to go straight to PII? A PII tag could be as simple as (PII = ‘Yes’), and you’d want to apply that at the lowest level possible. What I mean by that is if you apply (PII = ‘Yes’) at a table or schema level, you’re saying that every column in that table or every table in that schema contains PII. That will make it more difficult than it needs to be when you’re ready to start using tags for their greatest purpose – searching for your tagged data and using it to apply masking and access policies.
So, when you’re using tagging as part of your data governance controls in Snowflake, it’s best to be very targeted and specific about attaching tags at the right level (account, database, schema, table, or column) so that you can apply masking and row-level access policies only to the data that needs them. Column level is ideal for sensitivity tagging, but it’s possible you have tables where every column has the same level of sensitivity, in which case it makes sense to apply the tag at the table level.
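The effect of tagging at different levels can be sketched with a small Python model. It assumes that a tag set on a parent object is inherited by its children unless a child sets its own value, which is how I understand Snowflake’s tag inheritance to work. The database, schema, table, and column names are hypothetical, and the lookup function is an illustration rather than anything Snowflake provides.

```python
def effective_tag(tags, path, key):
    """Return the value of `key` for the object at `path`.

    `path` is a tuple such as (database, schema, table, column); the empty
    tuple stands for the account. The object itself is checked first, then
    each parent in turn, so the most specific tag wins.
    """
    for depth in range(len(path), -1, -1):
        value = tags.get(path[:depth], {}).get(key)
        if value is not None:
            return value
    return None


# Hypothetical objects: one table-level tag and one column-level override.
TAGS = {
    ("sales_db", "crm", "customers"): {"classification": "restricted"},
    ("sales_db", "crm", "customers", "ssn"): {"classification": "confidential"},
}

ssn = ("sales_db", "crm", "customers", "ssn")
email = ("sales_db", "crm", "customers", "email")
order_id = ("sales_db", "crm", "orders", "order_id")

print(effective_tag(TAGS, ssn, "classification"))
print(effective_tag(TAGS, email, "classification"))
print(effective_tag(TAGS, order_id, "classification"))
```

In this model the email column picks up ‘restricted’ from its table even though nobody tagged it directly, which is exactly why a broad table-level or schema-level tag needs to be a deliberate choice.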
Auditing User Access
Now that you’re using dynamic data masking, row-level access, and tagging as part of your data governance arsenal in Snowflake, there’s one more concept to consider: auditing user access. After setting up your governance rules and features in Snowflake, the last step is to verify that users are accessing the data they should be, and not accessing any data they shouldn’t be.
Snowflake provides a view called ACCESS_HISTORY (in the ACCOUNT_USAGE schema of the shared SNOWFLAKE database), which contains a record for each query executed and the columns it accessed, either directly or indirectly (for example, through a view). This gives you the ability to see exactly who has accessed which data, and it lets you support regulatory compliance auditing for the last 365 days (1 year) of queries.
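As a rough sketch of what that audit looks like, the Python below flattens rows shaped like Snowflake’s ACCOUNT_USAGE.ACCESS_HISTORY view into (user, object, column) tuples. The sample rows, users, and object names are invented for illustration; in practice you would fetch the rows with whichever Snowflake client you use and parse the JSON columns first.

```python
def column_accesses(rows):
    """Flatten access-history style rows into unique (user, object, column) tuples."""
    seen = set()
    for row in rows:
        # Direct objects are what the query named; base objects are what it
        # ultimately read (for example, the table behind a view).
        for field in ("direct_objects_accessed", "base_objects_accessed"):
            for obj in row.get(field, []):
                for col in obj.get("columns", []):
                    seen.add((row["user_name"], obj["objectName"], col["columnName"]))
    return sorted(seen)


def who_accessed(rows, object_name, column_name):
    """List the users whose queries touched one specific column."""
    return sorted({user for user, obj, col in column_accesses(rows)
                   if obj == object_name and col == column_name})


# Invented sample data: ALICE reads SSN through a view, BOB reads ORDERS directly.
ORDERS = {"objectName": "SALES_DB.CRM.ORDERS", "objectDomain": "Table",
          "columns": [{"columnName": "ORDER_ID"}]}
ROWS = [
    {"query_id": "q1", "user_name": "ALICE",
     "direct_objects_accessed": [{"objectName": "SALES_DB.CRM.CUSTOMER_V",
                                  "objectDomain": "View",
                                  "columns": [{"columnName": "SSN"}]}],
     "base_objects_accessed": [{"objectName": "SALES_DB.CRM.CUSTOMERS",
                                "objectDomain": "Table",
                                "columns": [{"columnName": "SSN"}]}]},
    {"query_id": "q2", "user_name": "BOB",
     "direct_objects_accessed": [ORDERS],
     "base_objects_accessed": [ORDERS]},
]

print(who_accessed(ROWS, "SALES_DB.CRM.CUSTOMERS", "SSN"))
```

Checking both the direct and the base objects matters here: ALICE never queried the CUSTOMERS table by name, but her query against the view still read its SSN column.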
This capability can also be used for other cool stuff like detecting unused data (so you can purge or archive – or at least stop paying to store it!) and determining who uses what data prior to making a change or drop.
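Detecting unused data comes down to a set difference between the tables you have and the tables that show up in the access records. Here is a minimal Python sketch under the same assumptions as before: rows shaped like ACCESS_HISTORY, with invented table names and sample data.

```python
def unused_tables(all_tables, rows):
    """Return the tables that never appear as a base object in the given rows."""
    accessed = {obj["objectName"]
                for row in rows
                for obj in row.get("base_objects_accessed", [])}
    return sorted(set(all_tables) - accessed)


# Invented inventory and access records, for illustration only.
ALL_TABLES = [
    "SALES_DB.CRM.CUSTOMERS",
    "SALES_DB.CRM.ORDERS",
    "SALES_DB.CRM.LEGACY_LEADS",
]
ACCESS_ROWS = [
    {"user_name": "ALICE",
     "base_objects_accessed": [{"objectName": "SALES_DB.CRM.CUSTOMERS"}]},
    {"user_name": "BOB",
     "base_objects_accessed": [{"objectName": "SALES_DB.CRM.ORDERS"}]},
]

print(unused_tables(ALL_TABLES, ACCESS_ROWS))
```

Keep in mind that “unused” here only means “not read within the window the records cover,” so a table that feeds a once-a-year report could still look idle.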
Making Data Governance Concrete
These four Snowflake features provide you with the tools to execute technically on a business-focused data governance policy. They allow you to turn a set of business rules about how data can be used, accessed, and searched (and by whom) into a set of concrete tasks that your Snowflake architect or admin can press into immediate service on your company’s most valuable asset: your data!