By Andrew Olafson and Theodore Caulton
One of the many data challenges hospitals have faced over the last 50 years has been storing patient information securely, compactly, and over long periods of time. Any data format that meets all three of those criteria must trade accessibility off against storage efficiency. This is especially difficult when data needs to be accessed by doctors and shared between hospital systems.
Hospitals, by and large, have adopted FHIR (Fast Healthcare Interoperability Resources), a data standard that meets these needs and is well suited to sharing data between hospitals. This solution was ideal when hospitals only needed to store and share data. However, as data analytics have matured and hospitals look to perform research and analysis on their data, the FHIR data model has proven unwieldy for analysis at scale because of its nested, document-oriented file format. To address this, Hakkoda has begun developing a scalable solution.
Understanding the FHIR Data Maturity Model
The FHIR data format, initially developed in 2012, was designed to tackle the exponential growth of medical data. It was built to offer a clean solution for storing large volumes of patient data while also being easy to share between hospital systems. Its serialization formats (JSON and XML) make it compact and easy to share, but difficult to analyze, especially at scale.
Until now, an analyst who wanted to gain insight from a set of FHIR records had to flatten and map the nested JSON by hand before any data science could be applied to the resulting tables. Repeating that flattening and mapping across all of an organization’s data as it grows quickly becomes unmanageable. And if every hospital system uses its own process for flattening and mapping FHIR data, sharing models or methodology between organizations becomes practically impossible.
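To make the manual step concrete, here is a minimal sketch of generic JSON flattening in Python, applied to a small, hypothetical FHIR Patient resource. The `flatten` helper is illustrative and not part of any FHIR library; it turns nested keys and list positions into dotted column names.

```python
import json
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Recursively flatten nested dicts and lists into dotted column names.

    List elements are addressed by position, e.g. "name.0.family".
    """
    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{i}" if prefix else str(i)))
    else:
        flat[prefix] = value  # a leaf value becomes one column
    return flat


patient_json = """
{
  "resourceType": "Patient",
  "id": "example-123",
  "gender": "female",
  "birthDate": "1985-04-12",
  "name": [{"family": "Doe", "given": ["Jane", "A"]}]
}
"""

row = flatten(json.loads(patient_json))
for column, val in row.items():
    print(f"{column} = {val}")
```

A second patient with two names, or three given names, would produce different columns (`name.1.family`, `name.0.given.2`), which is why ad hoc flattening does not standardize across records, let alone across hospitals.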
So, Why OMOP?
The OMOP Common Data Model (CDM) offers a solution to this standardization dilemma by providing a standardized, tabular target into which FHIR data can be mapped. Because OMOP is open source, the community can create and share data models as well as analytical queries built on the OMOP format, and there is a growing open source library of queries designed to run against it. A hospital could therefore systematically convert its FHIR data to OMOP, apply a vetted data model from the OMOP community, and have ready-built queries at its fingertips. As an added bonus, the hospital could then share its model with a sister hospital system, or adopt a model from another system that has also implemented OMOP.
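As a rough illustration of what mapping into OMOP means, the sketch below turns a FHIR Patient into a row for the OMOP CDM PERSON table. The column names follow the PERSON table and 8507/8532 are the standard OMOP gender concepts, but the `patient_to_person` helper, the surrogate `person_id`, and the simplifications (race and ethnicity left unmapped) are assumptions for this example, not Hakkoda’s mapping logic.

```python
from typing import Optional

# OMOP standard Gender concepts: 8507 = MALE, 8532 = FEMALE.
# By OMOP convention, concept_id 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def patient_to_person(patient: dict, person_id: int) -> dict:
    """Map a FHIR Patient resource to a row of the OMOP CDM PERSON table.

    Hypothetical, simplified sketch: race and ethnicity are left unmapped (0).
    """
    # FHIR birthDate may be partial ("1985" or "1985-04"), so parse piecewise.
    parts = [int(p) for p in patient["birthDate"].split("-")]
    month: Optional[int] = parts[1] if len(parts) > 1 else None
    day: Optional[int] = parts[2] if len(parts) > 2 else None
    gender = patient.get("gender", "unknown")
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(gender, 0),
        "year_of_birth": parts[0],
        "month_of_birth": month,
        "day_of_birth": day,
        "race_concept_id": 0,
        "ethnicity_concept_id": 0,
        "person_source_value": patient["id"],
        "gender_source_value": gender,
    }


person = patient_to_person(
    {"resourceType": "Patient", "id": "example-123", "gender": "female", "birthDate": "1985-04-12"},
    person_id=1,
)
print(person["gender_concept_id"], person["year_of_birth"], person["month_of_birth"])  # → 8532 1985 4
```

Once every source lands in the same PERSON columns, any query written against those columns runs unchanged at another OMOP site.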
Some organizations have built applications to convert small FHIR bundles to the OMOP data format, but there is no universal solution that scales across an entire organization. As data science becomes more important to informed decision-making, analysis needs a complete picture of a hospital’s data. The OMOP data model provides a potentially scalable format that hospitals can use for large-scale analysis. Given that 84% of hospitals use the FHIR data format, it is inevitable that hospitals will need to move to a standard tabular format to modernize their analytics.
Hakkoda’s Solution: FHIR to OMOP
Hakkoda believes that moving to OMOP, particularly early, will give any organization that adopts it a strong competitive advantage. To that end, Hakkoda has developed a system for migrating FHIR data to OMOP at scale.
FHIR bundles start out as XML or JSON files, which are structured and machine-readable. This makes them ideal for storage or for programs designed to read and display this type of content. However, traditional analytics techniques cannot take advantage of this data until it has been transformed into a tabular format. Hakkoda simplifies and standardizes this process with an end-to-end dbt pipeline that flattens FHIR bundles and then maps them to the standard OMOP tables.
A FHIR bundle can package its resources as a transaction or as a batch. The resources within a bundle can be of different types, such as patients, practitioners, or observations. Resources can also be related to one another, for example by belonging to the same patient record or by representing different parts of a single healthcare encounter. The FHIR bundle used for this demonstration was produced using Synthea and copied into Azure Blob Storage. In production, batch processing of FHIR bundles could be automated using Snowflake’s native Snowpipe, streams, and tasks. The first stage of our FHIR to OMOP pipeline, called fhir_starter, flattens this data.
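The sketch below shows, under simplifying assumptions, how a bundle’s resources can be grouped by type so that each type can land in its own table. The `split_bundle` function and the sample bundle are hypothetical and do not reproduce fhir_starter; real transaction bundle entries also carry `fullUrl` and `request` elements, which are omitted here for brevity.

```python
from collections import defaultdict


def split_bundle(bundle: dict) -> dict[str, list[dict]]:
    """Group the resources in a FHIR Bundle by resourceType.

    Each key (e.g. "Patient", "Observation") corresponds to one target table.
    """
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle resource")
    grouped: dict[str, list[dict]] = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        if resource is not None:  # entries without a resource carry nothing to load
            grouped[resource["resourceType"]].append(resource)
    return dict(grouped)


bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Encounter", "id": "e1",
                      "subject": {"reference": "Patient/p1"}}},
        {"resource": {"resourceType": "Observation", "id": "o1",
                      "subject": {"reference": "Patient/p1"},
                      "encounter": {"reference": "Encounter/e1"}}},
        {"resource": {"resourceType": "Observation", "id": "o2",
                      "subject": {"reference": "Patient/p1"}}},
    ],
}

tables = split_bundle(bundle)
print({name: len(rows) for name, rows in tables.items()})
# → {'Patient': 1, 'Encounter': 1, 'Observation': 2}
```

The `subject` and `encounter` references are what tie the Observation rows back to the same patient record and encounter once they sit in separate tables.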
fhir_starter uses Snowflake’s Snowpark and Python to parse our JSON FHIR bundle and write each resource type to its own target table within Snowflake. It currently supports FHIR v4 (R4) and earlier. We have separate Snowpark UDFs for HL7 v2 and EDI files.
Running fhir_starter places each FHIR resource type into its own table within Snowflake. Some commonly used fields, such as id and date of birth, are flattened into columns, while others remain nested; we can flatten these other nested columns as needed. Note also that any resource type not present in a given bundle is ignored.
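A minimal sketch of this partial flattening, assuming the grouped output of a bundle-splitting step: a few top-level fields are promoted to columns and the full resource is kept as JSON text, much as a Snowflake VARIANT column would hold it. The `FLATTENED_FIELDS` choices and the `to_rows` helper are hypothetical.

```python
import json

# Hypothetical choice of fields to promote to columns per resource type;
# everything else stays nested inside the raw JSON column.
FLATTENED_FIELDS = {
    "Patient": ["id", "birthDate", "gender"],
    "Observation": ["id", "status", "effectiveDateTime"],
}


def to_rows(resource_type: str, resources: list[dict]) -> list[dict]:
    """Promote a few top-level fields to columns and keep the full resource as JSON text."""
    fields = FLATTENED_FIELDS.get(resource_type, ["id"])
    rows = []
    for res in resources:
        row = {field: res.get(field) for field in fields}
        row["raw"] = json.dumps(res)  # stands in for a Snowflake VARIANT column
        rows.append(row)
    return rows


# Output of a bundle-splitting step for a bundle that contains only a Patient.
grouped = {
    "Patient": [{"resourceType": "Patient", "id": "p1", "birthDate": "1985-04-12",
                 "gender": "female", "name": [{"family": "Doe"}]}],
}
tables = {rtype: to_rows(rtype, res) for rtype, res in grouped.items()}
print(tables["Patient"][0]["birthDate"])  # → 1985-04-12
print("Observation" in tables)            # → False: absent types produce no table
```

In Snowflake, each list of rows would then be written to its own table, for example through a Snowpark DataFrame and `DataFrame.write.save_as_table`; that step is omitted here because it needs a live session.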
This flattened FHIR output is the source for our target tables in the final stages of this FHIR to OMOP DAG. Our pipeline currently maps data to only 9 of the 37 OMOP tables; however, we are continuing to develop our mappings.
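The mapping stage needs a rule for which OMOP table each FHIR resource feeds. The routing below is an illustrative assumption, not the pipeline’s actual nine mappings. It uses OMOP CDM table names and shows one common wrinkle: a FHIR Observation may belong in either MEASUREMENT or OBSERVATION depending on its category. Splitting on the `laboratory` and `vital-signs` category codes is a choice made for this sketch.

```python
from typing import Optional

# Illustrative routing only: NOT the pipeline's actual set of mapped tables.
FHIR_TO_OMOP = {
    "Patient": "person",
    "Encounter": "visit_occurrence",
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
    "MedicationRequest": "drug_exposure",
    "Immunization": "drug_exposure",
}

# FHIR observation-category codes treated as measurements in this sketch.
MEASUREMENT_CATEGORIES = {"laboratory", "vital-signs"}


def omop_target(resource: dict) -> Optional[str]:
    """Return the OMOP table a FHIR resource would load into, or None if unmapped."""
    rtype = resource["resourceType"]
    if rtype == "Observation":
        codes = {
            coding.get("code")
            for category in resource.get("category", [])
            for coding in category.get("coding", [])
        }
        return "measurement" if codes & MEASUREMENT_CATEGORIES else "observation"
    return FHIR_TO_OMOP.get(rtype)


lab = {
    "resourceType": "Observation",
    "category": [{"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/observation-category",
        "code": "laboratory",
    }]}],
}
print(omop_target(lab))                          # → measurement
print(omop_target({"resourceType": "Patient"}))  # → person
print(omop_target({"resourceType": "Device"}))   # → None (not mapped in this sketch)
```

Returning `None` for unmapped types lets a pipeline skip or log resources whose mappings have not been built yet instead of failing.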
Powering Secure Data Storage and Shareability
The benefits of FHIR bundles for long-term data storage and sharing between hospitals shouldn’t be understated. Hospital systems need a robust, standardized format for storing data that may not be accessed frequently. However, as data becomes increasingly important for decision-making and building competitive advantages, hospitals will need to stay ahead of the curve and ensure their data is analytics-ready.
Hospitals that continue to have analysts flatten and map FHIR data ad hoc before each analysis will find their insights rest on incomplete or, worse, time-lagged data that fails to capture the full picture of their business. FHIR to OMOP makes it possible to quickly move all current and future data into a tabular format where it can easily be piped into standardized metrics or dashboards.
The two figures below illustrate this. Our FHIR bundles started in the messy form shown below, but in short order they can be transformed into legible, continuously updating visualizations. The dashboard visualizations were created using the OMOP query library.
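Dashboards like these rest on queries that reference only OMOP table and column names. As a hypothetical example (not taken from the OMOP query library), the sketch below computes per-condition prevalence over tiny, synthetic PERSON and CONDITION_OCCURRENCE tables, using an in-memory SQLite database as a stand-in for Snowflake. The concept IDs in the sample rows are sample values only.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake; columns are a subset of the OMOP CDM tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
-- Synthetic sample rows.
INSERT INTO person VALUES (1, 8532, 1985), (2, 8507, 1970), (3, 8532, 1990);
INSERT INTO condition_occurrence VALUES
    (10, 1, 201826, '2020-01-05'),
    (11, 1, 201826, '2021-03-02'),
    (12, 2, 320128, '2019-07-19'),
    (13, 3, 201826, '2022-11-30');
""")

# Share of persons with at least one occurrence of each condition concept.
PREVALENCE_SQL = """
SELECT c.condition_concept_id,
       COUNT(DISTINCT c.person_id) AS persons,
       ROUND(1.0 * COUNT(DISTINCT c.person_id) / (SELECT COUNT(*) FROM person), 3) AS prevalence
FROM condition_occurrence AS c
GROUP BY c.condition_concept_id
ORDER BY persons DESC
"""

results = conn.execute(PREVALENCE_SQL).fetchall()
for concept_id, persons, prevalence in results:
    print(concept_id, persons, prevalence)
```

`COUNT(DISTINCT person_id)` counts person 1 once even though that person has two occurrences of the same condition. Because the query names only OMOP tables and columns, the same SQL could in principle run against any OMOP instance.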
The Life-Saving Value of Standardized Data
A standardized data model can transform data usage and sharing within healthcare organizations. The easy-to-analyze tabular format of OMOP data, along with its growing, community-driven query library, makes it easy to apply complex but practical models within a hospital’s data analytics systems. Because the solution is open source, its value, and that of the query library, will grow as more hospitals adopt OMOP.
Snowflake’s Snowpark and its integration with dbt are invaluable to Hakkoda’s FHIR to OMOP solution. Because dbt can run both Python and SQL, FHIR to OMOP can keep incrementally converting FHIR data as it grows and feed it into dashboards, so that insights stay current with the underlying data.
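Incremental conversion can be pictured as a watermark: each run processes only resources updated since the previous run. In dbt this filter would typically sit in an incremental model guarded by the `is_incremental()` macro; the Python sketch below shows the same idea outside dbt, assuming every resource carries a `meta.lastUpdated` timestamp. The `incremental_batch` and `parse_instant` helpers are hypothetical.

```python
from datetime import datetime
from typing import Optional


def parse_instant(value: str) -> datetime:
    """Parse a FHIR instant; fromisoformat accepts a trailing "Z" only from Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def incremental_batch(
    resources: list[dict], watermark: Optional[datetime]
) -> tuple[list[dict], Optional[datetime]]:
    """Return resources updated after the watermark, plus the advanced watermark.

    A watermark of None means a first, full run.
    """
    fresh: list[dict] = []
    newest = watermark
    for res in resources:
        updated = parse_instant(res["meta"]["lastUpdated"])
        if watermark is None or updated > watermark:
            fresh.append(res)
            if newest is None or updated > newest:
                newest = updated
    return fresh, newest


resources = [
    {"id": "p1", "meta": {"lastUpdated": "2023-05-01T10:00:00Z"}},
    {"id": "p2", "meta": {"lastUpdated": "2023-05-02T09:30:00Z"}},
]
batch, wm = incremental_batch(resources, None)
print([r["id"] for r in batch])  # → ['p1', 'p2'] (first run loads everything)

resources.append({"id": "p3", "meta": {"lastUpdated": "2023-05-03T08:00:00Z"}})
batch, wm = incremental_batch(resources, wm)
print([r["id"] for r in batch])  # → ['p3'] (only the new record)
```

Persisting the returned watermark between runs is what keeps each run’s cost proportional to new data rather than to the full history.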