Do we need data modeling in the world of data clouds?
Data clouds are incredibly fast: retrieving data is easy and quick for even the largest volumes. The computing power you can bring to bear with Snowflake data warehouses is almost limitless. For those of us who began working with data in the ’90s, ’80s, or earlier, it’s the stuff of dreams.
But it does begin to elicit some questions about the practices we developed to work with constrained computing and storage, and whether those are still necessary in a data world seemingly without limits. To help answer that, I’ll share my journey from technical writing to data modeling and my observations on what is here and what lies ahead.
I began my career in tech as a technical writer in the late ’90s, but that’s not exactly what I’m here to tell you about. I spent my first six months documenting databases for a department of DBAs, then suddenly made the leap into data modeling: the credit card company I worked for needed data modelers fast and was willing to train me ASAP.
My career in data had begun and I’ve never looked back. I took a series of courses covering conceptual, logical, and physical data modeling, followed by training in SQL, ETL, and database administration.
Training to Design Highly Efficient Databases
The overriding reason for almost all of this training was to ensure I understood relational modeling concepts well enough to design databases that would function efficiently, quickly, and with data integrity. The data had to be organized just right to eliminate any hint of duplication or excess processing at query time, because storage and computing were expensive and the lag time to scale them was astronomical.
It could take weeks or months to order, receive, deploy, and start using an expensive new server back in those days. If you ran out of servers to run your operations, you were SOL until IT could spin up some new ones, or find space on the old ones by deleting something.
None of that applies now, right? Speed and data storage can go as high and fast as you like, assuming you’re willing to pay. If you are, the sky’s the limit. So why would you concern yourself with data modeling? Why not just dump all your data into the cloud and get on with living in the future? Do you really need to bother with a practice that ensures efficient storage and processing when those are no longer a limitation?
I’m here to say you do and you should. And here’s why…
Get Yourself a Map
Part of the purpose of modeling your data is to provide a map of what data you have and where to find it. A data model provides a visual representation of your data – entities, relationships, tables, columns, data types, keys, requirements, and, importantly, definitions of everything in your model. You can’t just hand somebody the keys to a data lake and say, “Here’s a big blob of data, make it sing.” They need a map.
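To make the "map" idea concrete, here is a minimal sketch of a data model held as plain Python objects. Everything in it is hypothetical and for illustration only: the `Column` and `Entity` classes, the `customer` and `card` entities, and the `locate` and `describe` helpers are assumptions, not part of any modeling tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str
    definition: str
    is_key: bool = False


@dataclass
class Entity:
    name: str
    definition: str
    columns: List[Column] = field(default_factory=list)


# Hypothetical two-entity model; the names and definitions are made up.
ENTITIES = [
    Entity(
        "customer",
        "A person who holds an account.",
        [
            Column("customer_id", "INTEGER", "Unique customer identifier.", is_key=True),
            Column("full_name", "VARCHAR", "Customer's legal name."),
        ],
    ),
    Entity(
        "card",
        "A payment card issued to a customer.",
        [
            Column("card_id", "INTEGER", "Unique card identifier.", is_key=True),
            Column("customer_id", "INTEGER", "Owner of the card; refers to customer."),
        ],
    ),
]


def locate(entities: List[Entity], column_name: str) -> List[str]:
    """Return the names of the entities that contain the given column."""
    return [e.name for e in entities if any(c.name == column_name for c in e.columns)]


def describe(entities: List[Entity]) -> List[str]:
    """Render the model as readable lines: one per entity, one per column."""
    lines = []
    for e in entities:
        lines.append(f"{e.name}: {e.definition}")
        for c in e.columns:
            marker = " (key)" if c.is_key else ""
            lines.append(f"  {c.name} {c.data_type}{marker} - {c.definition}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(ENTITIES)))
    print(locate(ENTITIES, "customer_id"))
```

Even at this toy scale, the model answers the two questions a newcomer asks first: what data exists, and where a given field lives.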
If you don’t want your developers figuring these things out from scratch every time they begin a new data project, and you don’t want them each to reach their own conclusions about usage, meaning, and context every time they build something new, a data model is what they need to understand the data. Also, without a data model, all of the assembled data knowledge built by your developers over time will exist only in the application or program code, and only people who understand that code will know your data. A data model makes this knowledge available to all, a concept commonly known as the “democratization of data.”
So in essence, data modeling is still about efficiency, but now it’s about reducing the time your people, not your servers, spend dealing with your data.
Find the Treasure on Your Map
Another reason you need a data model is to aid compliance with laws, regulations, and standards such as HIPAA, GDPR, CCPA, and PCI-DSS. These rules specify the types and instances of data that must be protected, controlled, or audited. You need a data model that identifies and defines this sensitive data to ensure compliance. A data model gives you a map of where your regulated data is, so you can protect it appropriately.
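As a rough sketch of how a model can flag regulated data, the snippet below tags each modeled column with the rules assumed to apply to it and then looks columns up by rule. The `ModeledColumn` class, the `regulated_columns` helper, the table and column names, and the tag assignments are all hypothetical examples, not legal guidance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModeledColumn:
    table: str
    name: str
    # Rules assumed to apply to this column (illustrative tags only).
    regulations: List[str] = field(default_factory=list)


# Hypothetical columns and tags; a real model would be reviewed by compliance staff.
COLUMNS = [
    ModeledColumn("customer", "customer_id"),
    ModeledColumn("customer", "email", ["GDPR"]),
    ModeledColumn("card", "card_number", ["PCI-DSS"]),
    ModeledColumn("visit", "diagnosis_code", ["HIPAA"]),
]


def regulated_columns(columns: List[ModeledColumn], regulation: str) -> List[str]:
    """Return 'table.column' for every column tagged with the given regulation."""
    return [f"{c.table}.{c.name}" for c in columns if regulation in c.regulations]


if __name__ == "__main__":
    for rule in ("GDPR", "PCI-DSS", "HIPAA"):
        print(rule, regulated_columns(COLUMNS, rule))
```

The point of the design is that the tags live in the model rather than in application code, so an auditor can ask "where is our cardholder data?" without reading a single query.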
Share the Navigation Duties
The really cool thing about data models is that they are visual representations that are easily understandable after learning some very basic concepts. This means that anybody can read them, understand them, and come up with new ideas and insights without having to learn a programming language or application. Acquainting more people within your enterprise with the data available to them and how to use it (aka data literacy) is a vital step in creating a data-driven organization that can ask better questions and uncover new opportunities.
Find Your Way Home
Finally, when you model your data you decrease the possibility of data duplication and relationship errors. Accurate data models represent your business and analytics operations, ensuring integrity and results that you can trust, in a repeatable fashion. Creating novel insights becomes simple and straightforward when your data is in a diagram. Documenting, defining, describing, and categorizing data simplifies picking the elements you need and joining them correctly. Building cubes, data marts, dashboards, and data sets for AI/ML operations becomes a matter of selecting and interpreting, not an (ill-fated) expedition into uncharted territory.
The great news is that data modeling in a cloud world doesn’t have to be a long or painful process. Existing data structures need to be identified, analyzed, and understood, and a skilled modeler or architect can create a basic model of the critical data sets iteratively and quickly.
In the process, you’ll receive artifacts that can be ingested and organized quickly in your data cataloging tool, minimizing your time to insight and value.
If this sounds like a fancy way of saying that data modeling is really data governance, that’s a reasonable conclusion! Modeling is an integral part of managing your data, wherever you store it.