Can Machine Learning Tame Healthcare’s Big Data?

Posted on September 20, 2016 | Written By

Anne Zieger is a healthcare journalist who has written about the industry for 30 years. Her work has appeared in all of the leading healthcare industry publications, and she's served as editor in chief of several healthcare B2B sites.

Big data is both a blessing and a curse. The blessing is that if we use it well, it will tell us important things we don’t know about patient care processes, clinical improvement, outcomes and more. The curse is that if we don’t use it, we’ve got a very expensive and labor-hungry boondoggle on our hands.

But there may be hope for progress. One article I read today suggests that another technology may hold the key to unlocking these blessings — that machine learning may be the tool that lets us harvest the big data fields. The piece, whose writer, oddly enough, was cited only as “Mauricio,” a lead cloud expert, argues that machine learning is “the most effective way to excavate buried patterns in the chunks of unstructured data.” While I am an HIT observer rather than a techie, what limited tech knowledge I possess suggests that machine learning will play an important role in taming big data in healthcare.

In the piece, Mauricio notes that big data is characterized by the high volume of data, including both structured and unstructured data; the high velocity of data flowing into databases every working second; the variety of data, which can range from text and email to audio to financial transactions; the complexity of data coming from multiple incompatible sources; and the variability of data flow rates.

Though his is a general analysis, I’m sure we can agree that healthcare big data specifically matches his description. I don’t know whether you who are reading this include wild cards like social media content or video in your big data repositories, but even if you don’t, you may well in the future.

Anyway, for the purposes of this discussion, let’s summarize by saying that in this context, big data isn’t just made of giant repositories of relatively normalized data, it’s a whirlwind of structured and unstructured data in a huge number of formats, flooding into databases in spurts, trickles and floods around the clock.

To Mauricio, an obvious choice for extracting value from this chaos is machine learning, which he defines as a data analysis method that automates analytical model-building. In machine learning models, systems adapt independently, without human intervention, automatically applying customized algorithms and mathematical calculations to big data. “Machine learning offers a deeper insight into collected data and allows the computers to find hidden patterns which human analysts are bound to miss,” he writes.
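For the technically inclined readers, here is a rough illustration of what that kind of automated pattern-finding looks like in miniature. This is a minimal sketch, not anyone’s production system: the patient records are synthetic, and the columns (age, BMI, blood pressure) are stand-ins I have chosen purely for illustration. It uses the off-the-shelf scikit-learn library to let a clustering algorithm discover groupings in the data with no hand-written rules.

```python
# A minimal sketch of "automated model-building": an unsupervised learner
# finds groupings in patient data with no hand-written rules. All column
# names and data here are synthetic, purely for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical structured slice of a big-data store: one row per patient.
# Columns: age, BMI, systolic blood pressure.
patients = np.column_stack([
    rng.normal(55, 12, 500),   # age
    rng.normal(28, 5, 500),    # BMI
    rng.normal(130, 15, 500),  # systolic BP
])

# Scale the features so no single unit dominates, then let the algorithm
# discover cohorts on its own -- the "hidden patterns" Mauricio describes.
X = StandardScaler().fit_transform(patients)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("patients per discovered cohort:", np.bincount(model.labels_))
```

The point of the sketch is that nobody told the system what a “cohort” looks like; the groupings fall out of the data, which is exactly the kind of insight a human analyst sifting millions of records would be hard pressed to produce by hand.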

According to the author, there are already machine learning models in place which help predict the appearance of genetically influenced diseases such as diabetes and heart disease. Other possibilities for machine learning in healthcare – which he doesn’t mention but are referenced elsewhere – include getting a handle on population health. After all, an iterative learning technology could be a great choice for making predictions about population trends. You can probably think of several other possibilities.
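Again purely for illustration, here is what such a predictive model might look like at toy scale. The “diabetes” labels below come from a made-up rule applied to synthetic numbers (a stand-in, not clinical logic), just to show the train-and-test loop that this sort of prediction relies on.

```python
# A hedged sketch of the predictive use case: train a classifier to flag
# elevated diabetes risk from routine measurements. The data and the
# "risk" rule below are synthetic stand-ins, not clinical logic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000

# Hypothetical features: age, BMI, fasting glucose (mg/dL).
X = np.column_stack([
    rng.normal(50, 15, n),
    rng.normal(27, 6, n),
    rng.normal(100, 20, n),
])
# Toy label: a synthetic "diabetes" flag loosely tied to glucose and BMI.
y = ((X[:, 2] > 115) & (X[:, 1] > 28)).astype(int)

# Hold out 20% of the records to check the model on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

A real population health model would be trained on actual clinical histories rather than a contrived rule, but the workflow is the same: learn from records you have, then score the patients you haven’t seen yet.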

Now, like many other industries, healthcare suffers from a data silo problem, and we’ll have to address that issue before we create the kind of multi-source, multi-format data pool that Mauricio envisions. Leveraging big data effectively will also require people to cooperate across departmental and even organizational boundaries, as John Lynn noted in a post from last year.

Even so, it’s good to identify tools and models that can help get the technical work done, and machine learning seems promising. Have any of you experimented with it?