As we discussed in the previous medical device data article, there is no shortage of challenges that can emerge when interacting with large medical device datasets. At the same time, the troves of rich data they contain make it imperative that we make these databases more accessible, useful, and relevant to the organizations that interface with them.
To accomplish this, we have to extract the data, clean it, and flatten it into a data model before building analytics applications that make it more usable and readable for end users. Centralizing that data in a single place, meanwhile, lets users easily access all of the information while saving time and money.
Step 1: Ingesting the Datasets
The first step in making the data more accessible and useful to end users is bringing all the sets of data together into a central location.
Currently, these medical device datasets are scattered across the websites of the government entities they belong to. Not only are these sites sometimes a nightmare to navigate due to outdated user interfaces, but this isolation also increases the time it takes to look through datasets from disparate sources.
As things stand, you have to access the source, make the lookup within said source, and then collect that data to sift through. If you could aggregate those datasets by ingesting them into a single source, however, you would be able to find all of your information with a single lookup.
We do this by building dedicated connectors to the sources themselves. Some datasets, like MAUDE, can be loaded via REST API, while others, such as the MedEffect dataset, require manual ingestion. In either case, the raw files are placed in AWS S3 storage buckets before being moved into Snowflake.
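As a rough illustration, here is a minimal Python sketch of that first hop, assuming the public openFDA device event endpoint as the REST source for MAUDE; the bucket name, key layout, and paging are hypothetical stand-ins, not our production connectors.

```python
import json

import boto3
import requests

# Illustrative names: openFDA is a public REST source for MAUDE data,
# but the bucket and key layout here are hypothetical.
MAUDE_URL = "https://api.fda.gov/device/event.json"
BUCKET = "medical-device-raw"

def ingest_maude_page(skip: int = 0, limit: int = 100) -> None:
    """Pull one page of MAUDE adverse events and stage it in S3."""
    resp = requests.get(MAUDE_URL, params={"skip": skip, "limit": limit}, timeout=30)
    resp.raise_for_status()
    records = resp.json()["results"]

    # Land the raw JSON in S3; Snowflake can then COPY INTO a staging
    # table from an external stage pointed at this bucket.
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=f"maude/raw/events_{skip}_{skip + limit}.json",
        Body=json.dumps(records),
    )

if __name__ == "__main__":
    ingest_maude_page()
```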
Step 2: Cleaning the Data
The next step in our process to make these datasets more accessible and useful is to eliminate the empty information they contain and standardize the data. As we discussed in the last blog, much of the incoming data is largely unstandardized, since it is compiled from a diverse range of reporters who may not have all the information needed to make the data helpful to its end users.
For example, device or patient age can be recorded in multiple ways. Some reporters might put the age in days, some in months, and some in years, but those records ultimately need to be standardized to be readable in reports and visualizations down the line. Some information, meanwhile, may be entered incorrectly or incompletely, which leads to additional reporting issues.
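A minimal sketch of that normalization in Python might look like the following; the unit codes are illustrative, since the exact codes depend on the extract you are working with.

```python
DAYS_PER_YEAR = 365.25

def age_in_years(value: float, unit: str) -> float | None:
    """Normalize a reported age to years, whatever unit it arrived in."""
    # The unit strings below are illustrative; the actual codes depend
    # on the source dataset's conventions.
    unit = unit.strip().lower()
    if unit in ("yr", "year", "years"):
        return value
    if unit in ("mo", "month", "months"):
        return value / 12
    if unit in ("da", "day", "days"):
        return value / DAYS_PER_YEAR
    return None  # unknown unit: flag for manual review

print(age_in_years(18, "MO"))  # 1.5 years
```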
Manual entry errors are another problem: in one case, a patient's weight was recorded as 1,000 kg. To keep the analytics from becoming skewed by bad data, we filter out records like these using cutoff points we have set based on experience with the datasets.
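In pandas, that filtering step could look something like this sketch; the column name and cutoff values are illustrative stand-ins, not the actual thresholds we use.

```python
import pandas as pd

# Illustrative column name and cutoffs; the real thresholds are the
# experience-based cutoff points described above.
MIN_WEIGHT_KG = 0.2
MAX_WEIGHT_KG = 400.0

def filter_weights(events: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose patient weight falls outside plausible bounds."""
    in_range = events["patient_weight_kg"].between(MIN_WEIGHT_KG, MAX_WEIGHT_KG)
    # Rows with missing weights are kept; only clearly bad values go.
    return events[in_range | events["patient_weight_kg"].isna()]
```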
Step 3: Building out the Data Models
The next step in the process is flattening the data correctly and building out data models that can connect the data together. This is important because it gives us clean data to innovate on in later parts of the process.
As far as we know, ours is the only properly cleaned version of the MAUDE dataset available on the Snowflake Marketplace. Other instances have not been flattened completely and still retain objects nested in arrays. Because we have all the data properly laid out and flattened, it becomes possible to build an entity relationship diagram (ERD), which in turn makes it easier to connect different parts of the database, such as the 510(k) and adverse events portions of the dataset.
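In Snowflake itself this kind of flattening is typically done with LATERAL FLATTEN; the same idea in Python, sketched against an openFDA-style payload with a nested device array, might look like this (field names are illustrative):

```python
import pandas as pd

def flatten_devices(events: list[dict]) -> pd.DataFrame:
    """Explode the nested `device` array into one row per event-device pair."""
    # Field names follow the openFDA-style payload but are illustrative.
    return pd.json_normalize(
        events,
        record_path="device",                     # one row per device entry
        meta=["report_number", "date_received"],  # keep event-level keys
        record_prefix="device_",
        errors="ignore",                          # tolerate missing meta keys
    )
```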
Since we have the data built out in this way, it is also easier for us to incorporate any potential innovations you may want to make with the data. Furthermore, we can also layer additional data into the dataset to make it more understandable and accessible to unfamiliar audiences, such as adding parent organizations for manufacturers and listing full names for the product families instead of just their codes.
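As a sketch of that enrichment, the example below layers two hypothetical lookup tables onto the flattened events with left joins; the table contents, column names, and values are placeholders rather than our actual reference data.

```python
import pandas as pd

# Hypothetical lookup tables; the real reference data (parent
# organizations, product family names) comes from curated sources.
parent_orgs = pd.DataFrame(
    {"manufacturer": ["ACME SURGICAL"], "parent_org": ["ACME HOLDINGS"]}
)
product_families = pd.DataFrame(
    {"product_code": ["ABC"], "product_family": ["Example Product Family"]}
)

def enrich(events: pd.DataFrame) -> pd.DataFrame:
    """Layer readable names onto the flattened events via left joins."""
    # Assumes the events table carries `manufacturer` and `product_code`
    # columns after flattening; both names are illustrative.
    return (
        events.merge(parent_orgs, on="manufacturer", how="left")
        .merge(product_families, on="product_code", how="left")
    )
```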
This gives us all the tables we need in place for the next step, visualization, without funneling us into specific visualizations the way a single, more heavily combined table would.
Step 4: Analytics and Visualizations
Once we have all the data modeled, we can move on to analyzing it and creating visualizations that make the data more accessible than reading it from a spreadsheet or writing your own queries.
In the case of the MAUDE dataset, we built out a Sigma worksheet that allows you to view the data over time from multiple levels of granularity. To start, you can view the whole dataset at once, seeing where manufacturing issues are coming from or where other key events are happening.
You can also view trends and see whether events are staying level or diminishing over time. Events can also be broken down by manufacturer so you can see how you are doing compared to your competitors.
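Under the hood, views like these boil down to simple aggregations. Here is an illustrative pandas version of the manufacturer-by-month rollup that powers this kind of trend chart, with assumed column names:

```python
import pandas as pd

def monthly_event_counts(events: pd.DataFrame) -> pd.DataFrame:
    """Count adverse events per manufacturer per month for trend charts."""
    # `date_received` and `manufacturer` are assumed column names.
    monthly = events.assign(
        month=pd.to_datetime(events["date_received"]).dt.to_period("M")
    )
    return (
        monthly.groupby(["manufacturer", "month"])
        .size()
        .reset_index(name="event_count")
        .sort_values(["manufacturer", "month"])
    )
```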
Similarly, you can drop a layer in granularity to look at a specific manufacturer over time, including a breakdown of a given company that lets you identify which vertical the reported issues are coming from, whether that is a manufacturing error, a use error, or something else. We will talk more about this particular view and the innovation it enables in our next article.
The Sigma worksheet we built, meanwhile, also allows you to see how you are trending as a company in terms of adverse events. Finally, if you are on a development team, you can drill down to the product family level and see what issues are happening in each vertical, with those vertical classifications generated automatically using Large Language Models.
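We will save the details of that workflow for the next article, but purely as an illustration of how an LLM classification step might look (not our actual pipeline), here is a sketch using the OpenAI Python SDK; the model choice, prompt, and vertical labels are all assumptions:

```python
from openai import OpenAI

# Illustrative label set; the real verticals come from the dataset.
VERTICALS = ["manufacturing error", "use error", "device malfunction", "other"]

def classify_event(narrative: str) -> str:
    """Ask an LLM to assign one issue vertical to an event narrative."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": "Classify the adverse event narrative into exactly one of: "
                + ", ".join(VERTICALS),
            },
            {"role": "user", "content": narrative},
        ],
    )
    return response.choices[0].message.content.strip()
```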
Panning for Gold in Medical Device Data with Hakkōda
The process above is an important first step in making medical device datasets more useful to a wide array of end users by centralizing, normalizing, and optimizing the accessibility of the data they contain. But it is just the first step, and it is something that any data firm or data engineer could do given the time and resources.
In the next article we will go into greater depth about how Hakkoda is bringing innovation to these datasets and empowering end users to pan for gold in these large sets of data.
Interested in learning more about unlocking the insights buried in your medical data? Take a spin through the full workbook of our visualizations here. You can also talk to one of our experts about how to extract more value from your medical device data today.