Disclaimer: This blog is not intended to provide legal or regulatory guidance or advice and should not be relied upon for such. The examples in this blog illustrate the complexity involved and the technical capabilities you should consider. As I point out below, this is a very complex and fluid area. You need expert and local guidance to understand your regulatory obligations. I am not your lawyer.
The Compliance Challenge
Global data compliance can be very difficult to navigate. Regulations vary not only by country but also by region or province. This creates a significant challenge for organizations looking for a comprehensive view of the business. Technical challenges and organizational alignment are difficult enough but now you must meet a variety of data privacy and data residency requirements.
If you want to dip into the complexity of data localization, run a simple Google search and you’ll find law firms, strategic consultancies and even lobbyists offering points of view and summaries of the variety and movement in these regulations.
If you are curious about fines levied against companies for European privacy violations (under GDPR), check it out here.
This blog highlights recent advancements in the technology that enables architecting for a global solution, specifically for data warehouses and global analytics.
The Technical Challenge
Let’s assume you’ve worked with good data privacy consultants and legal counsel and you understand the requirements for the markets in which you operate.
At this point, you set out to architect your global data warehouse, governance and analytics solution. You likely arrived at the data mesh framework. It addresses many issues of past approaches and is a good framework for design. In the context of a single public cloud provider, it seems doable.
Then you tally up your regions and notice that you don’t need one public cloud provider, you need all three! This is due to a myriad of preferences (location and latency, government regulations, regional strengths). Logically modeling multiple cloud data warehouses is possible but quickly gets complicated, and you are likely beginning to realize how complex and fraught with risk this strategy is.
It is way more than the data model, integration and replication. At the cloud layer you have services like access, security, monitoring and performance that just got exponentially more challenging.
The Snowflake Solution
Snowflake is a global cloud data platform that can work across AWS, Google Cloud (GCP) and Microsoft Azure, and multiple regions of each, as if it were one. This is the job of the cloud services layer in Snowflake’s three-layer architecture (storage, compute, cloud services). Snowflake cloud services enable global access and security. Your data mesh now comes back into focus. There is lots of talk about the storage and compute layers of Snowflake, for good reason, but if you are a global organization dealing with data compliance, the cloud services layer is a bit of magic. I’ve led large teams building cloud-agnostic service layers. It’s no joke and, more importantly, it is no longer needed. Benoit Dageville and the Snowflake team have done this for you, so you are free to move on up the data value chain.
I feel this is an under-appreciated superpower of Snowflake: the location of the data is now at the command of mere mortals (holders of the ORGADMIN role). You can now address data localization by physically storing data in the locations required by regulations. But that’s just one element of the global data compliance challenge.
Now that you have an omni-cloud, global data platform, you are prepared to enable localization, but there are even more capabilities that drive the evolution from compliance into analytics.
At this point you understand your requirements for Regulated Data (e.g. Personally Identifiable Information, Personal Health Information, Payment Card Information). Whatever is defined by governing bodies as Regulated Data must be stored and processed in that region and must remain there. Further, accessing the data from outside the region can also constitute ‘transfer’.
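To make the rule concrete, here is a minimal sketch in Python of the two checks described above: where regulated data may be stored, and who may read it. It is an illustration only, not Snowflake functionality and not legal logic. The Dataset class, the function names and the region names are all hypothetical, and real rules come from your counsel, not from code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    home_region: str  # hypothetical label for the region whose rules govern the data
    regulated: bool   # True for PII, PHI, payment card data and similar


def storage_allowed(ds: Dataset, storage_region: str) -> bool:
    """Regulated data must be stored in its home region; other data may live anywhere."""
    return (not ds.regulated) or storage_region == ds.home_region


def access_allowed(ds: Dataset, accessor_region: str) -> bool:
    """Treat out-of-region access to regulated data as a transfer and deny it."""
    return (not ds.regulated) or accessor_region == ds.home_region


customers = Dataset("customers_de", home_region="eu-central", regulated=True)
sales_totals = Dataset("sales_totals_de", home_region="eu-central", regulated=False)

print(storage_allowed(customers, "eu-central"))  # True
print(access_allowed(customers, "us-east"))      # False: remote access counts as transfer
print(access_allowed(sales_totals, "us-east"))   # True: not regulated
```

The point of the sketch is that storage location and access location are separate checks. Passing the first does not mean you pass the second.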
There are a few more tools Snowflake provides to help you make your data secure, accessible and compliant, such as:
Encryption: Snowflake encrypts all data by default and you can also add client-side encryption.
Data Masking: Most analytics and algorithms don’t really need PII or other regulated data. Data masking lets you expose PII only to the limited set of users and systems that need it, which restricts access to regulated data even inside the local region. Because user roles and access needs change over time, Snowflake offers dynamic data masking, which applies the masking when the data is queried rather than when it is stored.
Secure Data Sharing: In the context of this post there are two kinds of sharing to think about. First, within a region you aren’t moving data around: you create secure data shares and grant access to other accounts in your region, and even externally to suppliers and vendors. Decide what you are sharing beforehand (redacting or masking as needed) to ensure compliance. The technology gives you an environment that is simultaneously more secure and more accessible. Second, for your global data analytics platform, sharing across regions and clouds is done through replication. This enables your compliant data sets to be securely shared to your global database.
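The masking and sharing ideas above can be sketched in a few lines of Python. This is a conceptual illustration under my own assumptions, not how Snowflake implements either feature (in Snowflake, masking is configured as policies attached to columns). The role name, column names and function names below are hypothetical.

```python
ROLES_WITH_PII_ACCESS = {"PRIVACY_OFFICER"}   # hypothetical role name
REGULATED_COLUMNS = {"email", "card_number"}  # hypothetical column names


def mask_value(column: str, value: str, role: str) -> str:
    """Return the raw value only to roles cleared for regulated data."""
    if column not in REGULATED_COLUMNS or role in ROLES_WITH_PII_ACCESS:
        return value
    return "***MASKED***"


def read_row(row: dict, role: str) -> dict:
    """Mask at read time, so changing a role takes effect without rewriting stored data."""
    return {col: mask_value(col, val, role) for col, val in row.items()}


def prepare_for_share(rows: list) -> list:
    """Drop regulated columns entirely before a data set leaves the region."""
    return [
        {col: val for col, val in row.items() if col not in REGULATED_COLUMNS}
        for row in rows
    ]


row = {"customer_id": "42", "email": "a@example.com", "country": "DE"}

print(read_row(row, "ANALYST"))
# {'customer_id': '42', 'email': '***MASKED***', 'country': 'DE'}
print(read_row(row, "PRIVACY_OFFICER"))
# {'customer_id': '42', 'email': 'a@example.com', 'country': 'DE'}
print(prepare_for_share([row]))
# [{'customer_id': '42', 'country': 'DE'}]
```

Note the difference between the two: masking keeps the column and hides the value depending on who is asking, while the share preparation step removes the regulated columns so they never leave the region at all.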
Global data compliance is challenging and changing. At the same time the value of data and the requirement for its speed and reach are increasing. Data is the fuel of innovation and the landscape of capabilities is changing quickly. Don’t reach for the old playbook or homebrew a problem someone has already solved. Focus on the tail of the data value chain – that’s where the value is created.