The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 2 of 2)

Posted on September 23, 2016 | Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

The first part of this article summarized what Web developers have done to structure data, and started to look at the barriers presented by health care. This part presents more recommendations for making structured data work.

The Grand Scheme of Things
Once you start classifying things, it’s easy to become ensnared by grandiose pipe dreams and enter a free fall trying to design the perfect classification system. A good system is distinguished by knowing its limitations. That’s why microdata on the Web succeeded. In other areas, the field of ontology is littered with the carcasses of projects that reached too far. And health care ontologies always teeter on the edge of that danger.

Let’s take an everyday classification system as an example of the limitations of ontology. We all use genealogies. Imagine being able to sift information about a family quickly, navigating from father to son and along the trail of siblings. But even historical families, such as royal ones, introduce difficulties right away. For instance, children born out of wedlock should be shown differently from legitimate heirs. Modern families present even bigger headaches. How do you represent blended families where many parents take responsibilities of different types for the children, or people who provided sperm or eggs for artificial insemination?

The human condition does not submit to easy classification, and that is doubly true of health, one of its most complex aspects. I’m sure, for instance, that the science of mosquito-borne diseases moves much faster than the ICD standard for disease. ICD itself should be replaced with something that embodies semantic meaning. Whatever replaces it, constant flexibility must be the hallmark of any ontology.

Transgender people present another enormous challenge to ontologies and EHRs. They’re a test case for every kind of variation in humanity. Their needs and status vary from person to person, with no classification suiting everybody. These needs can change over time as people make transitions. And they may simultaneously need services defined for male and female, with the mix differing from one patient to the next.

Getting to the Point
As the very term “microdata” indicates, those who wish to expose semantic data on the Web can choose just a few items of information for that favored treatment. A movie theater may have text on its site extolling its concession stand, its seating, or its accommodations for the disabled, but these are not part of the microdata given to search engines.

A big problem in electronic health records is their insistence that certain things be filled out for every patient. Any item that is of interest for any class of patient must appear in the interface, a problem known in the data industry as a Cartesian explosion. Many observers counsel a “less is more” philosophy in response. Interestingly, a recent article that complained of “bloated records” and suggested a “less is more” approach goes on to recommend adding scads of new data to the record to cover behavioral and environmental information. Without mentioning the contradiction explicitly, the authors address it through the hope that better interfaces for entering and displaying information will ease the burden on the clinician.
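
To make the problem concrete, here is a toy sketch in Python; the conditions and field names are invented for illustration and are not drawn from any real EHR.

```python
# Invented field lists: which structured items matter for which patients.
FIELDS_BY_CONDITION = {
    "diabetes": {"a1c", "foot_exam", "retinal_exam"},
    "asthma": {"peak_flow", "inhaler_technique"},
    "pregnancy": {"gestational_age", "fundal_height"},
    "smoking": {"packs_per_day", "cessation_counseling"},
}

# A one-size-fits-all form must show the union of every condition's fields...
all_fields = set().union(*FIELDS_BY_CONDITION.values())

# ...while any single patient needs only the fields for her own conditions.
patient_conditions = {"diabetes", "smoking"}
relevant = set().union(*(FIELDS_BY_CONDITION[c] for c in patient_conditions))

print(f"Fields on the universal form: {len(all_fields)}")      # 9
print(f"Fields this patient actually needs: {len(relevant)}")  # 5
```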

The various problems with ontologies that I have explained throw doubt on whether EHRs can attain such simplicity. Patients are not restaurants. To really understand what’s important about a patient–whether to guide the clinician in efficient data entry or to display salient facts to her–we’ll need systems embodying artificial intelligence. Such systems always feature false positives and negatives. They also depend on continuous learning, which means they’re never perfect. I would not like to be the patient whose data gets lost or misclassified during the process of tuning the algorithms.

I do believe that some improvements in EHRs can promote the use of structured data. Doctors should be allowed to enter the data in the order and the manner they find intuitive, because that order and that manner reflect their holistic understanding of the patient. But suggestions can prompt them to save some of the data in structured format, without forcing them to break their trains of thought. Relevant data will be collected and irrelevant fields will not be shown or preserved at all.
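
As a rough illustration of that suggestion-based approach (not any vendor’s actual implementation), here is a hedged Python sketch; the patterns and field names are invented, and a real system would rely on clinical NLP rather than a couple of regular expressions.

```python
import re

# Hypothetical patterns for values worth capturing as structured data.
SUGGESTION_PATTERNS = {
    "blood_pressure": re.compile(r"\b(\d{2,3})\s*/\s*(\d{2,3})\b"),
    "smoking_status": re.compile(r"\b(never|former|current)\s+smoker\b", re.I),
}

def suggest_structured_fields(note_text):
    """Scan a free-text note and propose structured values the clinician
    can accept or dismiss; the note itself is left untouched."""
    suggestions = {}
    for field, pattern in SUGGESTION_PATTERNS.items():
        match = pattern.search(note_text)
        if match:
            suggestions[field] = match.group(0)
    return suggestions

note = "Pt is a former smoker, BP 142/88 today, discussed diet."
print(suggest_structured_fields(note))
# {'blood_pressure': '142/88', 'smoking_status': 'former smoker'}
```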

The resulting data will be less messy than what we have in unstructured text currently, but still messy. So what? That is the nature of data. Analysts will make the best use of it they can. But structure should never get in the way of the information.

The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 1 of 2)

Posted on September 22, 2016 | Written By Andy Oram


Most innovations in electronic health records, notably those tied to the Precision Medicine initiative that has recently raised so many expectations, operate by moving clinical information into structures of one type or another. This might be a classification system such as ICD, or a specific record such as “medications” or “lab results” with fixed units and lists of names to choose from. There’s no arguing against the benefits of structured data. But its costs are high as well. So we should avoid repeating old mistakes. Experiences drawn from the Web may have something to teach the health care field with respect to structured data.

What Works on the Web
The Web grew out of a structured data initiative. The dream of organizing information goes back decades, and was embodied in Standard Generalized Markup Language (SGML) years before Tim Berners-Lee stole its general syntax to create HTML and present information on the Web. SGML could let a firm mark in its documents that FR927 was a part number whereas SG1 was a building. Any tags that met the author’s fancy could be defined. This put semantics into documents. In other words, the meaning of text could be abstracted from the the text and presented explicitly. Semantics got stripped out of HTML. Although the semantic goals of SGML were re-introduced into the HTML successor XML, it found only niche uses. Another semantic Web tool, JSON, was reserved for data storage and exchange, not text markup.

Since the Web got popular, people have been trying to reintroduce semantics into it. There was Dublin Core, then RDF, then microdata in places like schema.org–just to list a few. Two terms denoting structured data on the Web, the Semantic Web and Linked Data, have been enthusiastically taken up by the World Wide Web Consortium and Tim Berners-Lee himself.
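
For readers who haven’t seen it, here is a minimal sketch of what schema.org-style microdata looks like and how a consumer might pull it out of a page. Both the markup and the tiny parser (built on Python’s standard html.parser) are illustrative only; real crawlers handle far more of the specification.

```python
from html.parser import HTMLParser

# Invented markup in the schema.org microdata style.
PAGE = """
<div itemscope itemtype="https://schema.org/Restaurant">
  <span itemprop="name">Blue Plate Diner</span>
  <span itemprop="telephone">555-0134</span>
  <span itemprop="servesCuisine">Diner</span>
</div>
"""

class MicrodataCollector(HTMLParser):
    """Remember the current itemprop so the next text node is stored under it."""
    def __init__(self):
        super().__init__()
        self.current_prop = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self.current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self.current_prop and data.strip():
            self.data[self.current_prop] = data.strip()
            self.current_prop = None

collector = MicrodataCollector()
collector.feed(PAGE)
print(collector.data)
# {'name': 'Blue Plate Diner', 'telephone': '555-0134', 'servesCuisine': 'Diner'}
```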

But none of these structured data initiatives are widely known among the Web-browsing public, probably because they all take a lot of work to implement. Furthermore, they run into the bootstrapping problem faced by nearly all standards: if your web site uses semantics that aren’t recognized by the browser, they’re just dropped on the ground (or even worse, the browser mangles your web pages).

Even so, recent years have seen an important form of structured data take off. When you look up a movie or restaurant on a major search engine such as Google, Yahoo!, or Bing, you’ll see a summary of the information most people want to see: local showtimes for the movie, phone number and ratings for a restaurant, etc. This is highly useful (particularly on mobile devices) and can save you the trouble of visiting the web site from which the data comes. Google calls these summaries Rich Cards and Rich Snippets.

If my memory serves me right, the basis for these snippets didn’t come from standards committees involving years of negotiation between stakeholders. Google just decided what would be valuable to its users and laid out the standard. It got adopted because it was a win-win. The movie theaters and restaurants got their information right into the viewer’s face, and the search engine became instantly more valuable and more likely to be used again. The visitors doing the search obviously benefited too. Everyone found it worth their time to implement the standards.

Interestingly, as structure moves into metadata, HTML itself is getting less semantic. The most recent standard, HTML5, did add a few modest tags such as header and footer. But many sites are replacing meaningful HTML markup, such as p for paragraph, with two ultra-generic tags: div for a division that is set off from other parts of the page, and span for a piece of text embedded within another. Formatting is expressed through CSS, a separate language.

Having reviewed a bit of Web history, let’s see what we can learn from it and apply to health care.

Make the Customer Happy
Win-win is the key to getting a standard adopted. If your clinician doesn’t see any benefit from the use of structured data, she will carp and bristle at any attempt to get her to enter it. One of the big reasons electronic health records are so notoriously hard to use is, “All those fields to fill out.” And while lists of medications or other structured data can help the doctor choose the right one, they can also help her enter serious errors–perhaps because she chose the one next to the one she meant to choose, or because the one she really wanted isn’t offered on the list.

Doctors’ resentment gets directed against every institution implicated in the structured data explosion: the ONC and CMS, which demand quality data and other fields of information for their own inscrutable purposes; the vendor that designed the clunky system; and the hospital or clinic that forces doctors to use it. But the Web experience suggests that doctors would fill out fields that would help them in their jobs. The use of structured data should be negotiated, not dictated, just like other innovations such as hand-washing protocols or checklists. Is it such a radical notion to put technology at the service of the people using it?

I know it’s frustrating to offer that perspective, because many great things come from collecting data that is used in analytics and can turn up unexpected insights. If we fill out all those fields, maybe we’ll find a new cure! But the promised benefit is too far off and too speculative to justify the hourly drag upon the doctor’s time.

We can fall back on the other hope for EHR improvement: an interface that makes data entry so easy that doctors don’t mind using structured fields. I have some caveats to offer about that dream, which will appear in the second part of this article.

OCHIN Shows That Messy Data Should Not Hold Back Health Care

Posted on September 12, 2016 | Written By Andy Oram


The health care industry loves to complain about patient data. It’s full of errors, which can be equally the fault of patients or staff. And hanging over the whole system is lack of interoperability, which hampers research.

Well, it’s not as if the rest of the universe is a pristine source of well-formed statistics. Every field has to deal with messy data. And somehow retailers, financial managers, and even political campaign staff manage to extract useful information from the data soup. This doesn’t mean that predictions are infallible–after all, when I check a news site about the Mideast conflicts, why does the publisher think I’m interested in celebs from ten years ago whose bodies look awful now? But there is still no doubt that messy data can transform industry.

I’m all for standards and for more reliable means of collecting and vetting patient data. But for the foreseeable future, health care institutions are going to have to deal with suboptimal data. And OCHIN is one of the companies that shows how it can be done.

I recently had a chance to talk with CEO Abby Sears and Clayton Gillett, Vice President of Data Services and Integration, and to see a demo of OCHIN’s analytical tool, Acuere. Its basic offering is a no-nonsense interface that lets clinicians and administrators do predictions and hot-spotting.

Acuere is part of a trend in health care analytics that goes beyond clinical decision support and marshals large amounts of data to help with planning (see an example screen in Figure 1). For instance, a doctor can rank her patients by the number of alerts the system generates (a patient with diabetes whose glucose is getting out of control, or a smoker who hasn’t received counseling for smoking cessation). An administrator can rank a doctor against others in the practice. This summary just gives a flavor of the many services Acuere can perform; my real thrust in this article is to talk about how OCHIN obtains and processes its data. Sears and Gillett talked about the following challenges and how they’re dealing with them.
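
To give a feel for what alert-based ranking involves, here is a toy Python sketch; the patient fields, thresholds, and rules are invented and are not Acuere’s actual logic.

```python
# Invented panel of patients with a few care-relevant attributes.
PATIENTS = [
    {"name": "Patient A", "diabetic": True, "last_a1c": 9.4,
     "smoker": True, "cessation_counseled": False},
    {"name": "Patient B", "diabetic": True, "last_a1c": 6.8,
     "smoker": False, "cessation_counseled": False},
    {"name": "Patient C", "diabetic": False, "last_a1c": None,
     "smoker": True, "cessation_counseled": True},
]

def alerts_for(patient):
    """Return the list of open care gaps for one patient."""
    alerts = []
    if patient["diabetic"] and patient["last_a1c"] and patient["last_a1c"] > 9.0:
        alerts.append("diabetes poorly controlled")
    if patient["smoker"] and not patient["cessation_counseled"]:
        alerts.append("no smoking-cessation counseling")
    return alerts

# Rank the panel so patients with the most open alerts float to the top.
ranked = sorted(PATIENTS, key=lambda p: len(alerts_for(p)), reverse=True)
for p in ranked:
    print(p["name"], alerts_for(p))
```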

Figure 1. Acuere Provider Report Card

Patient identification
Difficulties in identifying patients and matching their records have repeatedly surfaced as the biggest barrier to information exchange and use in the US health care system. A 2014 ONC report cites it as a major problem (on pages 13 and 20). An article I cited earlier also blames patient identification for many of the problems of health care analytics. But the American public and Congress have been hostile to unique identifiers for some time, so health care institutions just have to get by without them.

OCHIN handles patient matching as other institutions, such as Health Information Exchanges, do. They compare numerous fields of records–not just obvious identifiers such as name and Social Security number, but address, demographic information, and perhaps a dozen other things. Sears and Gillett said it’s also hard to know which patients to attribute to each health care provider.
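
Below is a stripped-down Python sketch of that multi-field comparison; the fields, weights, and threshold are invented, and real record-linkage systems are typically probabilistic and far more elaborate.

```python
# Invented weights: fields that are harder to share by coincidence count more.
FIELD_WEIGHTS = {
    "last_name": 3.0,
    "first_name": 2.0,
    "birth_date": 4.0,
    "zip_code": 1.0,
    "phone": 1.5,
}
MATCH_THRESHOLD = 8.0  # arbitrary cutoff for this illustration

def match_score(record_a, record_b):
    """Sum the weights of the fields on which two records agree."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        a, b = record_a.get(field), record_b.get(field)
        if a and b and a.lower() == b.lower():
            score += weight
    return score

a = {"last_name": "Rivera", "first_name": "Ana", "birth_date": "1980-02-14",
     "zip_code": "97214", "phone": "555-0134"}
b = {"last_name": "Rivera", "first_name": "Anna", "birth_date": "1980-02-14",
     "zip_code": "97214"}

score = match_score(a, b)
print(score, "-> same patient" if score >= MATCH_THRESHOLD else "-> needs review")
```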

Data sources
The recent Precision Medicine initiative seeks to build “a national research cohort of one million or more U.S. participants.” But OCHIN already has a database on 7.6 million people and has signed more contracts to reach 10 million this fall. Certainly, there will be advantages to the Precision Medicine database. First, it will contain genetic information, which OCHIN’s data suppliers don’t have. Second, all the information on each person will be integrated, whereas OCHIN has to take de-identified records from many different suppliers and try to integrate them using the techniques described in the previous section, plus check for differences and errors in order to produce clean data.

Nevertheless, OCHIN’s data is impressive, and it took a lot of effort to accumulate it. They get not only medical data but information about the patient’s behavior and environment. Along with 200 different vital signs, they can map the patient’s location to elements of the neighborhood, such as income levels and whether healthy food is sold in local stores.
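
As a rough sketch of that kind of geographic enrichment, the Python fragment below joins a patient record to invented neighborhood attributes by census tract; OCHIN’s actual sources and field names are not shown here.

```python
# Invented tract-level attributes; a real source would be census or survey data.
TRACT_ATTRIBUTES = {
    "41051001400": {"median_income": 38000, "healthy_food_retail": False},
    "41051002100": {"median_income": 72000, "healthy_food_retail": True},
}

def enrich_patient(patient):
    """Attach neighborhood attributes to a patient record by census tract."""
    extras = TRACT_ATTRIBUTES.get(patient.get("census_tract"), {})
    return {**patient, **extras}

patient = {"id": "p-001", "census_tract": "41051001400"}
print(enrich_patient(patient))
# {'id': 'p-001', 'census_tract': '41051001400',
#  'median_income': 38000, 'healthy_food_retail': False}
```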

They get Medicare data from qualified entities who were granted access to it by CMS, Medicaid data from the states, patient data from commercial payers, and even data on the uninsured (a population that is luckily shrinking) from providers who treat them. Each institution exports data in a different way.

How do they harmonize the data from these different sources? Sears and Gillett said it takes a lot of manual translation. Data is divided into seven areas, such as medications and lab results. OCHIN uses standards whenever possible and participates in groups that set standards. There are still labs that don’t use LOINC codes to report results, as well as pharmacies and doctors who don’t use RxNorm for medications. Even ICD-10 changes yearly, as codes come and go.
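
Much of that manual translation boils down to maintaining mapping tables. The Python sketch below shows the idea for labs; the local codes are invented, the LOINC codes are commonly published ones, and the same pattern would apply to drugs and RxNorm.

```python
# Invented local lab codes mapped to LOINC.
LOCAL_LAB_TO_LOINC = {
    "GLU": "2345-7",     # Glucose [Mass/volume] in Serum or Plasma
    "HGBA1C": "4548-4",  # Hemoglobin A1c/Hemoglobin.total in Blood
}

def harmonize_lab(result):
    """Translate a site-specific lab result to LOINC, flagging anything
    that has no mapping yet for manual review."""
    code = LOCAL_LAB_TO_LOINC.get(result["local_code"])
    return {**result, "loinc": code, "needs_review": code is None}

print(harmonize_lab({"local_code": "GLU", "value": 112, "units": "mg/dL"}))
print(harmonize_lab({"local_code": "XYZ", "value": 5, "units": "?"}))
```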

Data handling
OCHIN isn’t like a public health agency that may be happy sharing data 18 months after it’s collected (as I was told at a conference). OCHIN wants physicians and their institutions to have the latest data on patients, so they carry out millions of transactions each day to keep their database updated as soon as data comes in. Their analytics run multiple times every day, to provide the fast results that users get from queries.

They are also exploring the popular “big data” forms of analytics that are sweeping other industries: machine learning, using feedback to improve algorithms, and so on. Currently, the guidance they offer clinicians is based on traditional clinical recommendations from randomized trials. But they are seeking to expand those sources with other insights from lightweight methods of data analysis.

So data can be useful in health care. Modern analytics should be available to every clinician. After all, OCHIN has made it work. And they don’t even serve up ads for chronic indigestion or 24-hour asthma relief.