Hands-On Guidance for Data Integration in Health: The CancerLinQ Story

Posted on June 15, 2017 I Written By

Andy Oram is an editor at O’Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space.

Andy also writes often for O’Reilly’s Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O’Reilly’s Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

Institutions throughout the health care field are talking about data sharing and integration. Everyone knows that improved care, cost controls, and expanded research requires institutions who hold patient data to safely share it. The American Society of Clinical Oncology’s CancerLinQ, one of the leading projects analyzing data analysis to find new cures, has tackled data sharing with a large number of health providers and discovered just how labor-intensive it is.

CancerLinQ fosters deep relationships and collaborations with the clinicians from whom it takes data. The platform turns around results from analyzing the data quickly and to give the clinicians insights they can put to immediate use to improve the care of cancer patients. Issues in collecting, storing, and transmitting data intertwine with other discussion items around cancer care. Currently, CancerLinQ isolates the data from each institution, and de-identifies patient information in order to let it be shared among participating clinicians. CancerLinQ LLC is a wholly-owned nonprofit subsidiary of ASCO, which has registered CancerLinQ as a trademark.

CancerLinQ logo

Help from Jitterbit

In 2015, CancerLinQ began collaborating with Jitterbit, a company devoted to integrating data from different sources. According to Michele Hazard, Director of Healthcare Solutions, and George Gallegos, CEO, their company can recognize data from 300 different sources, including electronic health records. At the beginning, the diversity and incompatibility of EHRs was a real barrier. It took them several months to figure out each of the first EHRs they tackled, but now they can integrate a new one quickly. Oncology care, the key data needed by CancerLinQ, is a Jitterbit specialty.

Jitterbit logo

One of the barriers raised by EHRs is licensing. The vendor has to “bless” direct access to EHR and data imported from external sources. HIPAA and licensing agreements also make tight security a priority.

Another challenge to processing data is to find records in different institutions and accurately match data for the correct patient.

Although the health care industry is moving toward the FHIR standard, and a few EHRs already expose data through FHIR, others have idiosyncratic formats and support older HL7 standards in different ways. Many don’t even have an API yet. In some cases, Jitterbit has to export the EHR data to a file, transfer it, and unpack it to discover the patient data.

Lack of structure

Jitterbit had become accustomed to looking in different databases to find patient information, even when EHRs claimed to support the same standard. One doctor may put key information under “diagnosis” while another enters it under “patient problems,” and doctors in the same practice may choose different locations.

Worse still, doctors often ignore the structured fields that were meant to hold important patient details and just dictate or type it into a free-text note. CancerLinQ anticipated this, unpacking the free text through optical character recognition (OCR) and natural language processing (NLP), a branch of artificial intelligence.

It’s understandable that a doctor would evade the use of structured fields. Just think of the position she is in, trying to keep a complex cancer case in mind while half a dozen other patients sit in the waiting room for their turn. In order to use the structured field dedicated to each item of information, she would have to first remember which field to use–and if she has privileges at several different institutions, that means keeping the different fields for each hospital in mind.

Then she has to get access to the right field, which may take several clicks and require movement through several screens. The exact information she wants to enter may or may not be available through a drop-down menu. The exact abbreviation or wording may differ from EHR to EHR as well. And to carry through a commitment to using structured fields, she would have to go through this thought process many times per patient. (CancerLinQ itself looks at 18 Quality eMeasures today, with the plan to release additional measures each year.)

Finally, what is the point of all this? Up until recently, the information would never come back in a useful form. To retrieve it, she would have to retrace the same steps she used to enter the structured data in the first place. Simpler to dump what she knows into a free-text note and move on.

It’s worth mentioning that this Babyl of health care information imposes negative impacts on the billing and reimbursement process, even though the EHRs were designed to support those very processes from the start. Insurers have to deal with the same unstructured data that CancerLinQ and Jitterbit have learned to read. The intensive manual process of extracting information adds to the cost of insurance, and ultimately the entire health care system. The recent eClinicalWorks scandal, which resembles Volkswagon’s cheating on auto emissions and will probably spill out to other EHR vendors as well, highlights the failings of health data.

Making data useful

The clue to unblocking this information logjam is deriving insights from data that clinicians can immediately see will improve their interventions with patients. This is what the CancerLinQ team has been doing. They run analytics that suggest what works for different categories of patients, then return the information to oncologists. The CancerLinQ platform also explains which items of data were input to these insights, and urges the doctors to be more disciplined about collecting and storing the data. This is a human-centered, labor-intensive process that can take six to twelve months to set up for each institution. Richard Ross, Chief Operating Officer of CancerLinQ calls the process “trench warfare,” not because its contentious but because it is slow and requires determination.

Of the 18 measures currently requested by CancerLinQ, one of the most critical data elements driving the calculation of multiple measures is staging information: where the cancerous tumors are and how far it has progressed. Family history, treatment plan, and treatment recommendations are other examples of measures gathered.

The data collection process has to start by determining how each practice defines a cancer patient. The CancerLinQ team builds this definition into its request for data. Sometimes they submit “pull” requests at regular intervals to the hospital or clinic, whereas other times the health care provider submits the data to them at a time of its choosing.

Some institutions enforce workflows more rigorously than others. So in some hospitals, CancerLinQ can persuade the doctors to record important information at a certain point during the patient’s visit. In other hospitals, doctors may enter data at times of their own choosing. But if they understand the value that comes from this data, they are more likely to make sure it gets entered, and that it conforms to standards. Many EHRs provide templates that make it easier to use structured fields properly.

When accepting information from each provider, the team goes through a series of steps and does a check-in with the provider at each step. The team evaluates the data in a different stage for each criterion: completeness, accuracy of coding, the number of patients reported, and so on. By providing quick feedback, they can help the practice improve its reporting.

The CancerLinQ/Jitterbit story reveals how difficult it is to apply analytics to health care data. Few organizations can afford the expertise they apply to extracting and curating patient data. On the other hand, CancerLinQ and Jitterbit show that effective data analysis can be done, even in the current messy conditions of electronic data storage. As the next wave of technology standards, such as FHIR, fall into place, more institutions should be able to carry out analytics that save lives.