Free EMR Newsletter Want to receive the latest news on EMR, Meaningful Use, ARRA and Healthcare IT sent straight to your email? Join thousands of healthcare pros who subscribe to EMR and HIPAA for FREE!!

The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 2 of 2)

Posted on September 23, 2016 I Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

The first part of this article summarized what Web developers have done to structure data, and started to look at the barriers presented by health care. This part presents more recommendations for making structured data work.

The Grand Scheme of Things
Once you start classifying things, it’s easy to become ensnared by grandiose pipe dreams and enter a free fall trying to design the perfect classification system. A good system is distinguished by knowing its limitations. That’s why microdata on the Web succeeded. In other areas, the field of ontology is littered with the carcasses of projects that reached too far. And health care ontologies always teeter on the edge of that danger.

Let’s take an everyday classification system as an example of the limitations of ontology. We all use genealogies. Imagine being able to sift information about a family quickly, navigating from father to son and along the trail of siblings. But even historical families, such as royal ones, introduce difficulties right away. For instance, children born out of wedlock should be shown differently from legitimate heirs. Modern families present even bigger headaches. How do you represent blended families where many parents take responsibilities of different types for the children, or people who provided sperm or eggs for artificial insemination?

The human condition is a complicated one not subject to easy classification, and that naturally extends to health, which is one of the most complex human conditions. I’m sure, for instance, that the science of mosquito borne diseases moves much faster than the ICD standard for disease. ICD itself should be replaced with something that embodies semantic meaning. But constant flexibility must be the hallmark of any ontology.

Transgender people present another enormous challenge to ontologies and EHRs. They’re a test case for every kind of variation in humanity. Their needs and status vary from person to person, with no classification suiting everybody. These needs can change over time as people make transitions. And they may simultaneously need services defined for male and female, with the mix differing from one patient to the next.

Getting to the Point
As the very term “microdata” indicates, those who wish to expose semantic data on the Web can choose just a few items of information for that favored treatment. A movie theater may have text on its site extolling its concession stand, its seating, or its accommodations for the disabled, but these are not part of the microdata given to search engines.

A big problem in electronic health records is their insistence that certain things be filled out for every patient. Any item that is of interest for any class of patient must appear in the interface, a problem known in the data industry as a Cartesian explosion. Many observers counsel a “less is more” philosophy in response. It’s interesting that a recent article that complained of “bloated records” and suggested a “less is more” approach goes on to recommend the inclusion of scads of new data in the record, to cover behavioral and environmental information. Without mentioning the contradiction explicitly, the authors address it through the hope that better interfaces for entering and displaying information will ease the burden on the clinician.

The various problems with ontologies that I have explained throw doubt on whether EHRs can attain such simplicity. Patients are not restaurants. To really understand what’s important about a patient–whether to guide the clinician in efficient data entry or to display salient facts to her–we’ll need systems embodying artificial intelligence. Such systems always feature false positives and negatives. They also depend on continuous learning, which means they’re never perfect. I would not like to be the patient whose data gets lost or misclassified during the process of tuning the algorithms.

I do believe that some improvements in EHRs can promote the use of structured data. Doctors should be allowed to enter the data in the order and the manner they find intuitive, because that order and that manner reflect their holistic understanding of the patient. But suggestions can prompt them to save some of the data in structured format, without forcing them to break their trains of thought. Relevant data will be collected and irrelevant fields will not be shown or preserved at all.

The resulting data will be less messy than what we have in unstructured text currently, but still messy. So what? That is the nature of data. Analysts will make the best use of it they can. But structure should never get in the way of the information.

The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 1 of 2)

Posted on September 22, 2016 I Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

Most innovations in electronic health records, notably those tied to the Precision Medicine initiative that has recently raised so many expectations, operate by moving clinical information into structure of one type or another. This might be a classification system such as ICD, or a specific record such as “medications” or “lab results” with fixed units and lists of names to choose from. There’s no arguing against the benefits of structured data. But its costs are high as well. So we should avoid repeating old mistakes. Experiences drawn from the Web may have something to teach the health care field in respect to structured data.

What Works on the Web
The Web grew out of a structured data initiative. The dream of organizing information goes back decades, and was embodied in Standard Generalized Markup Language (SGML) years before Tim Berners-Lee stole its general syntax to create HTML and present information on the Web. SGML could let a firm mark in its documents that FR927 was a part number whereas SG1 was a building. Any tags that met the author’s fancy could be defined. This put semantics into documents. In other words, the meaning of text could be abstracted from the the text and presented explicitly. Semantics got stripped out of HTML. Although the semantic goals of SGML were re-introduced into the HTML successor XML, it found only niche uses. Another semantic Web tool, JSON, was reserved for data storage and exchange, not text markup.

Since the Web got popular, people have been trying to reintroduce semantics into it. There was Dublin Core, then RDF, then microdata in places like schema.org–just to list a few. Two terms denoting structured data on the Web, the Semantic Web and Linked Data, have been enthusiastically taken up by the World Wide Web Consortium and Tim Berners-Lee himself.

But none of these structured data initiatives are widely known among the Web-browsing public, probably because they all take a lot of work to implement. Furthermore, they run into the bootstrapping problem faced by nearly all standards: if your web site uses semantics that aren’t recognized by the browser, they’re just dropped on the ground (or even worse, the browser mangles your web pages).

Even so, recent years have seen an important form of structured data take off. When you look up a movie or restaurant on a major search engine such a Google, Yahoo!, or Bing, you’ll see a summary of the information most people want to see: local showtimes for the movie, phone number and ratings for a restaurant, etc. This is highly useful (particularly on mobile devices) and can save you the trouble of visiting the web site from which the data comes. Google calls these summaries Rich Cards and Rich Snippets.

If my memory serves me right, the basis for these snippets didn’t come from standards committees involving years of negotiation between stake-holders. Google just decided what would be valuable to its users and laid out the standard. It got adopted because it was a win-win. The movie theaters and restaurants got their information right into the viewer’s face, and the search engine became instantly more valuable and more likely to be used again. The visitors doing the search obviously benefitted too. Everyone found it worth their time to implement the standards.

Interestingly, as structure moves into metadata, HTML itself is getting less semantic. The most recent standard, HTML5, did add a few modest tags such as header and footer. But many sites are replacing meaningful HTML markup, such as p for paragraph, with two ultra-generic tags: div for a division that is set off from other parts of the page, and span for a piece of text embedded within another. Formatting is expressed through CSS, a separate language.

Having reviewed a bit of Web history, let’s see what we can learn from it and apply to health care.

Make the Customer Happy
Win-win is the key to getting a standard adopted. If your clinician doesn’t see any benefit from the use of structured data, she will carp and bristle at any attempt to get her to enter it. One of the big reasons electronic health records are so notoriously hard to use is, “All those fields to fill out.” And while lists of medications or other structured data can help the doctor choose the right one, they can also help her enter serious errors–perhaps because she chose the one next to the one she meant to choose, or because the one she really wanted isn’t offered on the list.

Doctors’ resentment gets directed against every institution implicated in the structured data explosion: the ONC and CMS who demand quality data and other fields of information for their own inscrutable purposes, the vendor who designs up the clunky system, and the hospital or clinic that forces doctors to use it. But the Web experience suggests that doctors would fill out fields that would help them in their jobs. The use of structured data should be negotiated, not dictated, just like other innovations such as hand-washing protocols or checklists. Is it such a radical notion to put technology at the service of the people using it?

I know it’s frustrating to offer that perspective, because many great things come from collecting data that is used in analytics and can turn up unexpected insights. If we fill out all those fields, maybe we’ll find a new cure! But the promised benefit is too far off and too speculative to justify the hourly drag upon the doctor’s time.

We can fall back on the other hope for EHR improvement: an interface that makes data entry so easy that doctors don’t mind using structured fields. I have some caveats to offer about that dream, which will appear in the second part of this article.