
Scenarios for Health Care Reform (Part 2 of 2)

Posted on May 18, 2017 | Written By

Andy Oram is an editor at O'Reilly Media, a highly respected book publisher and technology information provider. An employee of the company since 1992, Andy currently specializes in open source, software engineering, and health IT, but his editorial output has ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. His articles have appeared often on EMR & EHR and other blogs in the health IT space. Andy also writes often for O'Reilly's Radar site (http://oreilly.com/) and other publications on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM, and DebConf.

The first part of this article suggested two scenarios that could promote health care reform. We’ll finish off the scenarios in this part of the article.

Capitalism Disrupts Health Care

In the third scenario, reform is stimulated by an intrepid data science firm that takes on health care with greater success than most of its predecessors. After assembling an impressive analytics toolkit from open source software components–thus simplifying licensing–it approaches health care providers and offers them a deal they can’t refuse: analytics demonstrated to save them money and support their growth, all delivered for free. The data science firm asks in return only that they let it use deidentified data from their patients and practices to build an enhanced service that it will offer paying customers.

Some health care providers balk at the requirement to share data, but their legal and marketing teams explain that they have been doing it for years already with companies whose motives are less commendable. Increasingly, the providers are won over. The analytics service appeals particularly to small, rural, and safety-net providers. Hammered by payment cuts and growing needs among their populations, they are on the edge of going out of business and grasp the service as their last chance to stay in the black.

Participating in the program requires the extraction of data from electronic health records, and some EHR vendors try to stand in the way in order to protect their own monopoly on the data. Some even point to clauses in their licenses that prohibit the sharing. But they get a rude message in return: so valuable are the analytics that the providers are ready to jettison the vendors in a minute. The vendors ultimately go along and even compete on the basis of their ability to connect to the analytics.

Once stability and survival are established, the providers can use the analytics for more and more sophisticated benefits. Unlike the inadequate quality measures currently in use, the analytics provide a robust framework for assessing risk, stratifying populations, and determining how much a provider should be rewarded for treating each patient. Fee-for-outcome becomes standard.

Providers make deals to sign up patients for long-term relationships. Unlike the weak Medicare ACO model, which punishes a provider for things their patients do outside their relationship, the emerging system requires a commitment from the patient to stick with a provider. However, if the patient can demonstrate that she was neglected or failed to receive standard of care, she can switch to another provider and even require the misbehaving provider to cover costs. To hold up their end of this deal, providers find it necessary to reveal their practices and prices. Physician organizations develop quality-measurement platforms such as the recent PRIME registry in family medicine. A race to the top ensues.

What If Nothing Changes?

I’ll finish this upbeat article with a fourth scenario in which we muddle along as we have for years.

The ONC and Centers for Medicare & Medicaid Services continue to swat at waste in the health care system by pushing accountable care. But their ratings penalize safety-net providers, and payments fail to correlate with costs as hoped.

Fee-for-outcome flounders, so health care costs continue to rise to intolerable levels. Already, in Massachusetts, the US state that leads in universal health coverage, 40% of the state budget goes to Medicaid, and likely federal cuts will make it impossible to keep up coverage. Many other states and countries are witnessing the same pattern of rising costs.

The same pressures ride like a tidal wave through the rest of the health care system. Private insurers continue to withdraw from markets or lose money by staying. So either explicitly or through complex and inscrutable regulatory changes, the government allows insurers to cut sick people from their rolls and raise the cost burdens on patients and their employers. As patient rolls shrink, more hospitals close. Political rancor grows as the public watches employer money go into health insurance instead of wages, more of their own stagnant incomes go to health care costs, and government budgets get tied up in health care instead of education and other social benefits.

Chronic diseases creep through the population, mocking crippled efforts at public health. Rampant obesity among children leads to more and earlier diabetes. Dementia also rises as the population ages, and climate change scatters its effects across all demographics.

Furthermore, when patients realize what costs they must take on to seek health care, they delay doctor visits until their symptoms are unbearable. More people become disabled or perish, with negative impacts that spread through the economy. Output declines, and more families become trapped in poverty. Self-medication for pain and mental illness becomes more popular, with predictable impacts on the opiate addiction crisis. Even our security is affected: the military finds it hard to recruit healthy soldiers, and our foreign policy depends increasingly on drone strikes that kill civilians and inflame negative attitudes toward the US.

I think that, after considering this scenario, most of us would prefer one of the previous three I laid out in this article. If health care continues to be a major political issue for the next election, experts should try to direct discussion away from the current unproductive rhetoric toward advocacy for solutions. Some who read this article will hopefully feel impelled to apply themselves to one of the positive scenarios and bring it to fruition.

Scenarios for Health Care Reform (Part 1 of 2)

Posted on May 16, 2017 | Written By Andy Oram

All reformers in health care know what the field needs to do; I laid out four years ago the consensus about patient-supplied data, widespread analytics, mHealth, and transparency. Our frustration comes in trying to crack the current hidebound system open and create change. Recent interventions by US Republicans to repeal the Affordable Care Act, whatever their effects on costs and insurance coverage, offer no promise to affect workflows or treatment. So this article suggests three potential scenarios where reform could succeed, along with a vision of what will happen if none of them take hold.

Patients Forge Their Own Way Forward

In the first scenario, a tiny group of self-trackers, athletes, and empowered patients start a movement that ultimately wins over hundreds of millions of individuals.

These scattered enthusiasts, driven to overcome debilitating health problems or achieve extraordinary athletic feats, start to pursue self-tracking with fanaticism. Consumer or medical-grade devices provide them with ongoing data about their progress, and an open source platform such as HIE of One gives them a personal health record (PHR).

They also take charge of their interactions with the health care system. They find that most primary care providers aren’t interested in the data and concerns they bring, or don’t have time to process those data and concerns in the depth they need, or don’t know how to. Therefore, while preserving standard relationships with primary care providers and specialists where appropriate, the self-trackers seek out doctors and other practitioners for consultations about their personal health programs. A small number of providers recognize an opportunity here and set up practices around these consultations. The interactions look quite different from standard doctor visits. The customers, instead of just submitting themselves to examination and gathering advice, steer the conversation and set the goals.

Power relationships between doctors and customers also start to change. Although traditional patients can (and often do) walk away and effectively boycott a practice with which they’re not comfortable, the new customers use this power to set the agenda and to sort out the health care providers they find beneficial.

The turning point probably comes when someone–probably a research facility, because it puts customer needs above business models–invents a cheap, comfortable, and easy-to-use device that meets the basic needs for monitoring and transmitting vital signs. It may rest on the waist or some other place where it can be hidden, so that there is no stigma to wearing it constantly and no reason to reject its use on fashion grounds. A beneficent foundation invests several million dollars to make the device available to schoolchildren or some other needy population, and suddenly the community of empowered patients leaps from a minuscule pool to a mainstream phenomenon.

Researchers join the community in search of subjects for their experiments, and patients offer data to the researchers in the hope of speeding up cures. At all times, the data is under control of the subjects, who help to direct research based on their needs. Analytics start to turn up findings that inform clinical decision support.

I haven’t mentioned the collection of genetic information so far, because it requires more expensive processes, presents numerous privacy risks, and isn’t usually useful–normally it tells you that you have something like a 2% risk of getting a disease instead of the general population’s 1% risk. But where genetic testing is useful, it can definitely fit into this system.

Ultimately, the market for consultants that started out tiny becomes the dominant model for delivering health care. Specialists and hospitals are brought in only when their specific contributions are needed. The savings that result bring down insurance costs for everyone. And chronic disease goes way down as people get quick feedback on their lifestyle choices.

Government Puts Its Foot Down

After a decade of cajoling health care providers to share data and adopt a fee-for-outcome model, only to witness progress at a snail’s pace, the federal government decides to try a totally different tack in this second scenario. As part of the Precision Medicine initiative (which originally planned to sign up one million volunteers), and leveraging the ever-growing database of Medicare data, the Office of the National Coordinator sets up a consortium and runs analytics on top of its data to be shared with all legitimate researchers. The government also promises to share the benefits of the analytics with anyone in the world who adds their data to the database.

The goals of the analytics are multi-faceted, combining fraud checks, a search for cures, and everyday recommendations about improving interventions to save money and treat patients earlier in the disease cycle. The notorious 17-year gap between research findings and widespread implementation shrinks radically. Now, best practices are available to any patient who chooses to participate.

As with the personal health records in the previous scenario, the government database in this scenario creates a research platform of unprecedented size, both in the number of records and the variety of participating researchers.

To further expand the power of the analytics, the government demands exponentially greater transparency not just in medical settings but in all things that make us sick: the food we eat (reversing the rulings that protect manufacturers and restaurants from revealing what they’re putting in our bodies), the air and water that surrounds us, the effects of climate change (a major public health issue, spreading scourges such as mosquito-borne diseases and heat exhaustion), disparities in food and exercise options among neighborhoods, and more. Public awareness leads to improvements in health that lagged for decades.

In the next section of this article, I’ll present a third scenario that achieves reform from a different angle.

An Intelligent Interface for Patient Diagnosis by HealthTap

Posted on January 9, 2017 | Written By Andy Oram

HealthTap, an organization that’s hard to categorize, really should appear in more studies of modern health care. Analysts are agog over the size of the Veterans Administration’s clientele, and over a couple of other major institutions such as Kaiser Permanente–but who is looking at the 104,000 physicians and the hundreds of millions of patients from 174 countries in HealthTap’s database?

HealthTap allows patients to connect with doctors online, and additionally hosts an enormous repository of doctors’ answers to health questions. In addition to its sheer size and its unique combination of services, HealthTap is ahead of most other health care institutions in its use of data.

I talked with founder and CEO Ron Gutman about a new service, Dr. AI, that triages the patient and guides her toward a treatment plan: online resources for small problems, doctors for major problems, and even a recommendation to head off to the emergency room when that is warranted. The service builds on the patient/doctor interactions HealthTap has offered over its six years of operation, but is fully automated.

Somewhat reminiscent of IBM’s Watson, Dr. AI evaluates the patient’s symptoms and searches a database for possible diagnoses. But the Dr. AI service differs from Watson in several key aspects:

  • Whereas Watson searches a huge collection of clinical research journals, HealthTap searches its own repository of doctor/patient interactions and advice given by its participating doctors. Thus, Dr. AI is more in line with modern “big data” analytics of the kind PatientsLikeMe performs.

  • More importantly, HealthTap potentially knows more about the patient than Watson does, because the patient can build up a history with HealthTap.

  • And most important, Dr. AI is interactive. Instead of doing a one-time search, it employs artificial intelligence techniques to generate questions. For instance, it may ask, “Did you take an airplane flight recently?” Each question arises from the totality of what HealthTap knows about the patient and the patterns found in HealthTap’s data. (A sketch of this kind of question selection appears below.)
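
HealthTap has not published Dr. AI’s internals, so what follows is only a minimal sketch of one textbook way to generate such questions: maintain a Bayesian posterior over candidate diagnoses and ask whichever yes/no question promises the greatest expected information gain. Every diagnosis, symptom, and probability here is invented for illustration.

    import math

    # Hypothetical P(symptom | diagnosis) table; all numbers are invented.
    LIKELIHOOD = {
        "flu":         {"fever": 0.9, "recent_flight": 0.1, "leg_swelling": 0.05},
        "dvt":         {"fever": 0.1, "recent_flight": 0.5, "leg_swelling": 0.9},
        "common_cold": {"fever": 0.3, "recent_flight": 0.1, "leg_swelling": 0.01},
    }
    QUESTIONS = ["fever", "recent_flight", "leg_swelling"]

    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def update(prior, symptom, answer):
        """Apply Bayes' rule to the diagnosis posterior for a yes/no answer."""
        post = {}
        for dx, p in prior.items():
            like = LIKELIHOOD[dx][symptom]
            post[dx] = p * (like if answer else 1 - like)
        total = sum(post.values())
        return {dx: p / total for dx, p in post.items()}

    def next_question(prior, asked):
        """Pick the unasked question with the greatest expected entropy drop."""
        best, best_gain = None, -1.0
        for q in QUESTIONS:
            if q in asked:
                continue
            p_yes = sum(p * LIKELIHOOD[dx][q] for dx, p in prior.items())
            gain = entropy(prior) - (
                p_yes * entropy(update(prior, q, True))
                + (1 - p_yes) * entropy(update(prior, q, False))
            )
            if gain > best_gain:
                best, best_gain = q, gain
        return best

    prior = {"flu": 1 / 3, "dvt": 1 / 3, "common_cold": 1 / 3}
    q = next_question(prior, asked=set())
    print("ask about:", q)
    print("posterior after a yes:", update(prior, q, answer=True))

In a real triage loop, the questions would repeat until the posterior is concentrated enough to route the patient to self-care, a doctor, or the emergency room.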

The following video shows Dr. AI in action:

A well-stocked larder of artificial intelligence techniques feeds Dr. AI’s interactive triage service: machine learning, natural language processing (because the doctor advice is stored in plain text), Bayesian learning, and pattern recognition. These allow a dialog tailored to each patient that is, to my knowledge, unique in the health care field.

HealthTap continues to grow as a platform for remote diagnosis and treatment. In a world with too few clinicians, it may become standard for people outside the traditional health care system.

Newly Released Open Source Libraries for Health Analytics from Health Catalyst

Posted on December 19, 2016 | Written By Andy Oram

I celebrate and try to report on each addition to the pool of open source resources for health care. Some, of course, are more significant than others, and I suspect the new healthcare.ai libraries released by the Health Catalyst organization will prove to be one of the significant offerings. One can do a search for health care software on sites such as GitHub and turn up thousands of hits (of which many are probably under open and free licenses), but for a company with the reputation and accomplishments of Health Catalyst to open up the tools it has been using internally gives healthcare.ai great legitimacy from the start.

According to Health Catalyst’s Director of Data Science Levi Thatcher, the main author of the project, these tools are tried and tested. Many of them are based on popular free software libraries in the general machine learning space: he mentions in particular the Python Scikit-learn library and the R language’s caret and data.table libraries. The contribution of Health Catalyst is to build on these general tools to produce libraries tailored for the needs of health care facilities, with their unique populations, workflows, and billing needs. The company has used the libraries to deploy models related to operational, financial, and clinical questions. Eventually, Thatcher says, most of Health Catalyst’s applications will use predictive analytics based on healthcare.ai, and now other programmers can too.
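
I won’t try to reproduce the healthcare.ai API from memory here; the sketch below instead shows the general pattern such libraries build on, written directly against scikit-learn. The CSV file, column names, and outcome variable are all hypothetical stand-ins for an EHR extract.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical extract of patient encounters with a readmission label.
    df = pd.read_csv("encounters.csv")
    features = ["age", "num_prior_admissions", "a1c", "systolic_bp"]

    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["readmitted_30d"], test_size=0.2, random_state=42
    )

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    # AUC is a common headline metric for clinical risk models.
    probs = model.predict_proba(X_test)[:, 1]
    print("test AUC:", roc_auc_score(y_test, probs))

The value a health care-specific library adds over this generic pattern lies in the surrounding plumbing: data connections, deployment, and defaults suited to clinical populations.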

Currently, Health Catalyst is providing libraries for R and Python. Moving them from internal projects to open source was not particularly difficult, according to Thatcher: the team mainly had to improve the documentation and broaden the range of usable data connections (ODBC and more). The packages can be installed in the manner common to free software projects in these languages. The documentation includes guidelines for submitting changes, so that an ecosystem of developers can build up around the software. When I asked about RESTful APIs, Thatcher answered, “We do plan on using RESTful APIs in our work—mainly as a way of integrating these tools with ETL processes.”

I asked Thatcher one more general question: why did Health Catalyst open the tools? What benefit do they derive as a company by giving away their creative work? Thatcher answers, “We want to elevate the industry and educate it about what’s possible, because a rising tide will lift all boats. With more data publicly available each year, I’m excited to see what new and open clinical or socio-economic datasets are used to optimize decisions related to health.”

The Pain of Recording Patient Risk Factors as Illuminated by Apixio (Part 2 of 2)

Posted on October 28, 2016 | Written By Andy Oram

The previous section of this article introduced Apixio’s analytics for payers in the Medicare Advantage program. Now we’ll step through how Apixio extracts relevant diagnostic data.

The technology of PDF scraping
Providers usually submit SOAP notes to the Apixio web site in the form of PDFs. This came as a surprise to me, after hearing about the extravagant efforts that have gone into CCDs and other new formats such as the Blue Button project launched by the VA. Normally provided in an XML format, these documents claim to adhere to standards and offer a relatively gentle face to a computer program. In contrast, a PDF is one of the most challenging formats to parse: words and other characters are reduced to graphical symbols, while layout bears little relation to the human meaning of the data.

Structured documents such as CCDs contain only about 20% of what CMS requires, and often are formatted in idiosyncratic ways so that even the best CCDs would be no more informative than a Word document or PDF. But the main barrier to getting information, according to Schneider, is that Medicare Advantage works through the payers, and providers can be reluctant to give payers direct access to their EHR data. This reluctance springs from a variety of reasons, including worries about security, the feeling of being deluged by requests from payers, and a belief that the providers’ IT infrastructure cannot handle the burden of data extraction. Their stance has nothing to do with protecting patient privacy, because HIPAA explicitly allows providers to share patient data for treatment, payment, and operations, and that is exactly what they are doing when they give sensitive data to Apixio in PDF form. Thus, Apixio had to master OCR and text processing to serve that market.

Processing a PDF requires several steps, integrated within Apixio’s platform (a toy sketch follows the list):

  1. Optical character recognition to re-create the text from the PDF’s page images.

  2. Further structuring to recognize, for instance, when the PDF contains a table that needs to be broken up horizontally into columns, or constructs such as the field name “Diagnosis” followed by the desired data.

  3. Natural language processing to find the grammatical patterns in the text. This processing naturally must understand medical terminology, common abbreviations such as CHF, and codings.

  4. Analytics that pull out the data relevant to risk and present it in a usable format to a human coder.
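
To make the four steps concrete, here is a toy sketch; it bears no resemblance to Apixio’s production pipeline, and the “Diagnosis:” label, abbreviation list, and file name are invented. pdf2image and pytesseract are common open source choices for the OCR step.

    import re

    import pytesseract
    from pdf2image import convert_from_path

    ABBREVIATIONS = {"CHF": "congestive heart failure"}  # tiny stand-in lexicon

    def extract_diagnoses(pdf_path):
        # 1. OCR each rendered page image back into text.
        pages = convert_from_path(pdf_path)
        text = "\n".join(pytesseract.image_to_string(page) for page in pages)

        # 2. Recover a little structure: text following a "Diagnosis:" label.
        fields = re.findall(r"Diagnosis:\s*(.+)", text)

        # 3. Crude stand-in for NLP: expand known clinical abbreviations.
        expanded = [
            " ".join(ABBREVIATIONS.get(word, word) for word in field.split())
            for field in fields
        ]

        # 4. Return candidates in a usable form for a human coder to review.
        return sorted(set(expanded))

    print(extract_diagnoses("soap_note.pdf"))  # hypothetical input file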

Apixio can accept dozens of notes covering the patient’s history. It often turns up diagnoses that “fell through the cracks,” as Schneider puts it. The diagnostic information Apixio returns can be used by medical professionals to generate reports for Medicare, but it has other uses as well. Apixio tells providers when they are treating a patient for an illness that does not appear in their master database. Providers can use that information to deduce when patients are left out of key care programs that can help them. In this way, the information can improve patient care. One coder the company followed was able to triple her rate of reviewing patient charts using Apixio’s service.

Caught between past and future
If the Apixio approach to culling risk factors appears roundabout and overwrought, like bringing in a bulldozer to plant a rosebush, think back to the role of historical factors in health care. Given the ways doctors have been taught to record medical conditions, and the tools available to them, Apixio does a small part in promoting the progressive role of accountable care.

Hopefully, changes to the health care field will permit more direct ways to deliver accountable care in the future. Medical schools will convey the requirements of accountable care to their students and teach them how to record data that satisfies these requirements. Technologies will make it easier to record risk factors the first time around. Quality measures and the data needed by policy-makers will be clarified. And most of all, the advantages of collaboration will lead providers and payers to form business agreements or even merge, at which point the EHR data will be opened to the payer. The contortions providers currently need to go through, in trying to achieve 21st-century quality, remind us of where the field needs to go.

The Pain of Recording Patient Risk Factors as Illuminated by Apixio (Part 1 of 2)

Posted on October 27, 2016 | Written By Andy Oram

Many of us strain against the bonds of tradition in our workplace, harboring a secret dream that the industry could start afresh, streamlined and free of hampering traditions. But history weighs on nearly every field, including my own (publishing) and the one I cover in this blog (health care). Applying technology in such a field often involves the legerdemain of extracting new value from imperfect records and processes with deep roots.

Along these lines, when Apixio aimed machine learning and data analytics at health care, they unveiled a business model based on measuring risk more accurately so that Medicare Advantage payments to health care payers and providers reflect their patient populations more appropriately. Apixio’s tools permit improvements to patient care, as we shall see. But the core of the platform they offer involves uploading SOAP notes, usually in PDF form, and extracting diagnostic codes that coders may have missed or that may not be supportable. Machine learning techniques extract the diagnostic codes for each patient over the entire history provided.

Many questions jostled in my mind as I talked to Apixio CTO John Schneider. Why are these particular notes so important to the Centers for Medicare & Medicaid Services (CMS)? Why don’t doctors keep track of relevant diagnoses as they go along in an easy-to-retrieve manner that could be pipelined straight to Medicare? Can’t modern EHRs, after seven years of Meaningful Use, provide better formats than PDFs? I asked him these things.

A mini-seminar ensued on the evolution of health care and its documentation. A combination of policy changes and persistent cultural habits have tangled up the various sources of information over many years. In the following sections, I’ll look at each aspect of the documentation bouillabaisse.

The financial role of diagnosis and risk
Accountable care, in varying degrees of sophistication, calculates the risk of patient populations in order to gradually replace fee-for-service with payments that reflect how adeptly the health care provider has treated the patient. Accountable care lay behind the Affordable Care Act and got an extra boost at the beginning of 2016 when CMS took on the “goal of tying 30 percent of traditional, or fee-for-service, Medicare payments to alternative payment models, such as ACOs, by the end of 2016 — and 50 percent by the end of 2018.”

Although many accountable care contracts–like those of the much-maligned 1970s Managed Care era–ignore differences between patients, more thoughtful programs recognize that accurate and fair payments require measurement of how much risk the health care provider is taking on–that is, how sick their patients are. Thus, providers benefit from scrupulously complete documentation (having learned that upcoding and sloppiness will no longer be tolerated and will lead to significant fines, according to Schneider). And this would seem to provide an incentive for the provider to capture every nuance of a patient’s condition in a clearly coded, structured way.

But this is not how doctors operate, according to Schneider. They rebel when presented with dozens of boxes to check off, as crude EHRs tend to present things. They stick to the free-text SOAP note (fields for subjective observations, objective observations, assessment, and plan) that has been taught for decades. It’s often up to post-processing tools to code exactly what’s wrong with the patient. Sometimes the SOAP notes don’t even distinguish the four parts in electronic form, but exist as free-flowing Word documents.

A number of key diagnoses come from doctors who have privileges at the hospital but come in only sporadically to do consultations, and who therefore don’t understand the layout of the EHR or make attempts to use what little structure it provides. Another reason codes get missed or don’t easily surface is that doctors are overwhelmed, so that accurately recording diagnostic information in a structured way is a significant extra burden, an essentially clerical function loaded onto these highly skilled healthcare professionals. Thus, extracting diagnostic information many times involves “reading between the lines,” as Schneider puts it.

For Medicare Advantage payments, CMS wants a precise delineation of properly coded diagnoses in order to discern the risk presented by each patient. This is where Apixio comes in: by mining the free-text SOAP notes for information that can enhance such coding. We’ll see what they do in the next section of this article.

The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 2 of 2)

Posted on September 23, 2016 | Written By Andy Oram

The first part of this article summarized what Web developers have done to structure data, and started to look at the barriers presented by health care. This part presents more recommendations for making structured data work.

The Grand Scheme of Things
Once you start classifying things, it’s easy to become ensnared by grandiose pipe dreams and enter a free fall trying to design the perfect classification system. A good system is distinguished by knowing its limitations. That’s why microdata on the Web succeeded. In other areas, the field of ontology is littered with the carcasses of projects that reached too far. And health care ontologies always teeter on the edge of that danger.

Let’s take an everyday classification system as an example of the limitations of ontology. We all use genealogies. Imagine being able to sift information about a family quickly, navigating from father to son and along the trail of siblings. But even historical families, such as royal ones, introduce difficulties right away. For instance, children born out of wedlock should be shown differently from legitimate heirs. Modern families present even bigger headaches. How do you represent blended families where many parents take responsibilities of different types for the children, or people who provided sperm or eggs for artificial insemination?

The human condition is a complicated one not subject to easy classification, and that naturally extends to health, which is one of the most complex human conditions. I’m sure, for instance, that the science of mosquito-borne diseases moves much faster than the ICD standard for disease. ICD itself should be replaced with something that embodies semantic meaning. But constant flexibility must be the hallmark of any ontology.

Transgender people present another enormous challenge to ontologies and EHRs. They’re a test case for every kind of variation in humanity. Their needs and status vary from person to person, with no classification suiting everybody. These needs can change over time as people make transitions. And they may simultaneously need services defined for male and female, with the mix differing from one patient to the next.

Getting to the Point
As the very term “microdata” indicates, those who wish to expose semantic data on the Web can choose just a few items of information for that favored treatment. A movie theater may have text on its site extolling its concession stand, its seating, or its accommodations for the disabled, but these are not part of the microdata given to search engines.

A big problem in electronic health records is their insistence that certain things be filled out for every patient. Any item that is of interest for any class of patient must appear in the interface, a problem known in the data industry as a Cartesian explosion. Many observers counsel a “less is more” philosophy in response. It’s interesting that a recent article that complained of “bloated records” and suggested a “less is more” approach goes on to recommend the inclusion of scads of new data in the record, to cover behavioral and environmental information. Without mentioning the contradiction explicitly, the authors address it through the hope that better interfaces for entering and displaying information will ease the burden on the clinician.

The various problems with ontologies that I have explained throw doubt on whether EHRs can attain such simplicity. Patients are not restaurants. To really understand what’s important about a patient–whether to guide the clinician in efficient data entry or to display salient facts to her–we’ll need systems embodying artificial intelligence. Such systems always feature false positives and negatives. They also depend on continuous learning, which means they’re never perfect. I would not like to be the patient whose data gets lost or misclassified during the process of tuning the algorithms.

I do believe that some improvements in EHRs can promote the use of structured data. Doctors should be allowed to enter the data in the order and the manner they find intuitive, because that order and that manner reflect their holistic understanding of the patient. But suggestions can prompt them to save some of the data in structured format, without forcing them to break their trains of thought. Relevant data will be collected and irrelevant fields will not be shown or preserved at all.
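
As a purely hypothetical illustration of “suggest, don’t force,” the following sketch scans free text for terms that could be offered back to the doctor as structured entries, leaving the note itself untouched. The medication list and sample note are invented.

    import re

    KNOWN_MEDICATIONS = {"metformin", "lisinopril", "atorvastatin"}

    def suggest_structured_entries(note_text):
        # Offer, never demand: each hit becomes a suggestion the doctor
        # can accept or ignore without breaking her train of thought.
        suggestions = []
        for word in re.findall(r"[a-z]+", note_text.lower()):
            if word in KNOWN_MEDICATIONS:
                suggestions.append({"field": "medication", "value": word})
        return suggestions

    note = "Taking metformin daily; BP well controlled on lisinopril."
    for s in suggest_structured_entries(note):
        print(f"Save as structured {s['field']}? -> {s['value']}")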

The resulting data will be less messy than what we have in unstructured text currently, but still messy. So what? That is the nature of data. Analysts will make the best use of it they can. But structure should never get in the way of the information.

The Burden of Structured Data: What Health Care Can Learn From the Web Experience (Part 1 of 2)

Posted on September 22, 2016 | Written By Andy Oram

Most innovations in electronic health records, notably those tied to the Precision Medicine initiative that has recently raised so many expectations, operate by moving clinical information into structure of one type or another. This might be a classification system such as ICD, or a specific record such as “medications” or “lab results” with fixed units and lists of names to choose from. There’s no arguing against the benefits of structured data. But its costs are high as well. So we should avoid repeating old mistakes. Experiences drawn from the Web may have something to teach the health care field in respect to structured data.

What Works on the Web
The Web grew out of a structured data initiative. The dream of organizing information goes back decades, and was embodied in Standard Generalized Markup Language (SGML) years before Tim Berners-Lee stole its general syntax to create HTML and present information on the Web. SGML could let a firm mark in its documents that FR927 was a part number whereas SG1 was a building. Any tags that met the author’s fancy could be defined. This put semantics into documents. In other words, the meaning of text could be abstracted from the text and presented explicitly. Semantics got stripped out of HTML. Although the semantic goals of SGML were re-introduced in XML, a simplified descendant of SGML, it found only niche uses in text markup. Another semantic tool, JSON, was reserved for data storage and exchange, not text markup.

Since the Web got popular, people have been trying to reintroduce semantics into it. There was Dublin Core, then RDF, then microdata in places like schema.org–just to list a few. Two terms denoting structured data on the Web, the Semantic Web and Linked Data, have been enthusiastically taken up by the World Wide Web Consortium and Tim Berners-Lee himself.

But none of these structured data initiatives are widely known among the Web-browsing public, probably because they all take a lot of work to implement. Furthermore, they run into the bootstrapping problem faced by nearly all standards: if your web site uses semantics that aren’t recognized by the browser, they’re just dropped on the ground (or even worse, the browser mangles your web pages).

Even so, recent years have seen an important form of structured data take off. When you look up a movie or restaurant on a major search engine such as Google, Yahoo!, or Bing, you’ll see a summary of the information most people want to see: local showtimes for the movie, phone number and ratings for a restaurant, etc. This is highly useful (particularly on mobile devices) and can save you the trouble of visiting the web site from which the data comes. Google calls these summaries Rich Cards and Rich Snippets.

If my memory serves me right, the basis for these snippets didn’t come from standards committees involving years of negotiation between stakeholders. Google just decided what would be valuable to its users and laid out the standard. It got adopted because it was a win-win. The movie theaters and restaurants got their information right into the viewer’s face, and the search engine became instantly more valuable and more likely to be used again. The visitors doing the search obviously benefited too. Everyone found it worth their time to implement the standards.

Interestingly, as structure moves into metadata, HTML itself is getting less semantic. The most recent standard, HTML5, did add a few modest tags such as header and footer. But many sites are replacing meaningful HTML markup, such as p for paragraph, with two ultra-generic tags: div for a division that is set off from other parts of the page, and span for a piece of text embedded within another. Formatting is expressed through CSS, a separate language.

Having reviewed a bit of Web history, let’s see what we can learn from it and apply to health care.

Make the Customer Happy
Win-win is the key to getting a standard adopted. If your clinician doesn’t see any benefit from the use of structured data, she will carp and bristle at any attempt to get her to enter it. One of the big reasons electronic health records are so notoriously hard to use is, “All those fields to fill out.” And while lists of medications or other structured data can help the doctor choose the right one, they can also help her enter serious errors–perhaps because she chose the one next to the one she meant to choose, or because the one she really wanted isn’t offered on the list.

Doctors’ resentment gets directed against every institution implicated in the structured data explosion: the ONC and CMS, who demand quality data and other fields of information for their own inscrutable purposes, the vendor who designs the clunky system, and the hospital or clinic that forces doctors to use it. But the Web experience suggests that doctors would fill out fields that would help them in their jobs. The use of structured data should be negotiated, not dictated, just like other innovations such as hand-washing protocols or checklists. Is it such a radical notion to put technology at the service of the people using it?

I know it’s frustrating to offer that perspective, because many great things come from collecting data that is used in analytics and can turn up unexpected insights. If we fill out all those fields, maybe we’ll find a new cure! But the promised benefit is too far off and too speculative to justify the hourly drag upon the doctor’s time.

We can fall back on the other hope for EHR improvement: an interface that makes data entry so easy that doctors don’t mind using structured fields. I have some caveats to offer about that dream, which will appear in the second part of this article.

OCHIN Shows That Messy Data Should Not Hold Back Health Care

Posted on September 12, 2016 | Written By Andy Oram

The health care industry loves to complain about patient data. It’s full of errors, which can be equally the fault of patients or staff. And hanging over the whole system is lack of interoperability, which hampers research.

Well, it’s not as if the rest of the universe is a pristine source of well-formed statistics. Every field has to deal with messy data. And somehow retailers, financial managers, and even political campaign staff manage to extract useful information from the data soup. This doesn’t mean that predictions are infallible–after all, when I check a news site about the Mideast conflicts, why does the publisher think I’m interested in celebs from ten years ago whose bodies look awful now? But there is still no doubt that messy data can transform industry.

I’m all for standards and for more reliable means of collecting and vetting patient data. But for the foreseeable future, health care institutions are going to have to deal with suboptimal data. And OCHIN is one of the companies that shows how it can be done.

I recently had a chance to talk with CEO Abby Sears and the Vice President of Data Services and Integration, Clayton Gillett, and to see a demo of OCHIN’s analytical tool, Acuere. Their basic offering is a no-nonsense interface that lets clinicians and administrators do predictions and hot-spotting.

Acuere is part of a trend in health care analytics that goes beyond clinical decision support and marshals large amounts of data to help with planning (see an example screen in Figure 1). For instance, a doctor can rank her patients by the number of alerts the system generates (a patient with diabetes whose glucose is getting out of control, or a smoker who hasn’t received counseling for smoking cessation). An administrator can rank a doctor against others in the practice. (A toy sketch of this kind of alert-based ranking follows the figure.) This summary just gives a flavor of the many services Acuere can perform; my real thrust in this article is to talk about how OCHIN obtains and processes its data. Sears and Gillett talked about the following challenges and how they’re dealing with them.

Figure 1. Acuere Provider Report Card
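
Here is the promised toy sketch of alert-based ranking. It is not Acuere’s actual logic; the thresholds and column names are invented.

    import pandas as pd

    # Hypothetical patient panel for one doctor.
    patients = pd.DataFrame({
        "patient_id": [101, 102, 103],
        "a1c": [9.1, 6.2, 7.8],               # hemoglobin A1c, percent
        "smoker": [True, False, True],
        "cessation_counseling": [False, False, True],
    })

    def alert_count(row):
        alerts = 0
        if row["a1c"] > 8.0:                  # glucose getting out of control
            alerts += 1
        if row["smoker"] and not row["cessation_counseling"]:
            alerts += 1                       # smoker without counseling
        return alerts

    # Rank the panel so the patients needing attention float to the top.
    patients["alerts"] = patients.apply(alert_count, axis=1)
    print(patients.sort_values("alerts", ascending=False))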

Patient identification
Difficulties in identifying patients and matching their records have repeatedly surfaced as the biggest barrier to information exchange and use in the US health care system. A 2014 ONC report cites it as a major problem (on pages 13 and 20). An article I cited earlier also blames patient identification for many of the problems of health care analytics. But the American public and Congress have been hostile to unique identifiers for some time, so health care institutions just have to get by without them.

OCHIN handles patient matching as other institutions, such as Health Information Exchanges, do. They compare numerous fields of records–not just obvious identifiers such as name and social security number, but address, demographic information, and perhaps a dozen other things. Sears and Gillett said it’s also hard to know which patients to attribute to each health care provider.
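
For the curious, here is a minimal sketch of field-weighted matching in the spirit just described. The fields, weights, and threshold are invented, and production matchers use far more sophisticated comparators (phonetic codes, edit distances, frequency-adjusted weights).

    from difflib import SequenceMatcher

    WEIGHTS = {"name": 0.4, "dob": 0.3, "address": 0.2, "zip": 0.1}

    def similarity(a, b):
        # Crude string similarity in [0, 1]; real systems use better comparators.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match_score(rec_a, rec_b):
        # Weighted sum of per-field similarities.
        return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())

    a = {"name": "Jon Smith", "dob": "1961-04-09",
         "address": "12 Elm St", "zip": "97201"}
    b = {"name": "John Smith", "dob": "1961-04-09",
         "address": "12 Elm Street", "zip": "97201"}

    score = match_score(a, b)
    print(f"match score: {score:.2f}")  # treat, say, > 0.85 as a probable match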

Data sources
The recent Precision Medicine initiative seeks to build “a national research cohort of one million or more U.S. participants.” But OCHIN already has a database on 7.6 million people and has signed more contracts to reach 10 million this fall. Certainly, there will be advantages to the Precision Medicine database. First, it will contain genetic information, which OCHIN’s data suppliers don’t have. Second, all the information on each person will be integrated, whereas OCHIN has to take de-identified records from many different suppliers and try to integrate them using the techniques described in the previous section, plus check for differences and errors in order to produce clean data.

Nevertheless, OCHIN’s data is impressive, and it took a lot of effort to accumulate it. They get not only medical data but information about the patient’s behavior and environment. Along with 200 different vital signs, they can map the patient’s location to elements of the neighborhood, such as income levels and whether healthy food is sold in local stores.

They get Medicare data from qualified entities who were granted access to it by CMS, Medicaid data from the states, patient data from commercial payers, and even data on the uninsured (a population that is luckily shrinking) from providers who treat them. Each institution exports data in a different way.

How do they harmonize the data from these different sources? Sears and Gillett said it takes a lot of manual translation. Data is divided into seven areas, such as medications and lab results. OCHIN uses standards whenever possible and participates in groups that set standards. There are still labs that don’t use LOINC codes to report results, as well as pharmacies and doctors who don’t use RxNorm for medications. Even ICD-10 changes yearly, as codes come and go.
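
A minimal sketch of what that manual translation amounts to in code: a per-supplier lookup table from local lab codes onto LOINC, with unmapped codes routed to human review. The local codes here are invented; verify any LOINC code against loinc.org before relying on it.

    LOCAL_TO_LOINC = {
        "labcorp_a1c": "4548-4",    # hemoglobin A1c
        "questlab_gluc": "2345-7",  # glucose, serum/plasma
    }

    def harmonize(result):
        code = LOCAL_TO_LOINC.get(result["local_code"])
        if code is None:
            # Unmapped codes go to a review queue rather than being dropped.
            return {**result, "loinc": None, "needs_review": True}
        return {**result, "loinc": code, "needs_review": False}

    print(harmonize({"local_code": "labcorp_a1c", "value": 7.2, "units": "%"}))
    print(harmonize({"local_code": "mystery_123", "value": 5.0, "units": "mg/dL"}))

Multiply a table like this by every supplier, every data domain, and every yearly code revision, and the scale of the manual work becomes clear.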

Data handling
OCHIN isn’t like a public health agency that may be happy sharing data 18 months after it’s collected (as I was told at a conference). OCHIN wants physicians and their institutions to have the latest data on patients, so they carry out millions of transactions each day to keep their database updated as soon as data comes in. Their analytics run multiple times every day, to provide the fast results that users get from queries.

They are also exploring the popular “big data” forms of analytics that are sweeping other industries: machine learning, using feedback to improve algorithms, and so on. Currently, the guidance they offer clinicians is based on traditional clinical recommendations from randomized trials. But they are seeking to expand those sources with other insights from lightweight methods of data analysis.

So data can be useful in health care. Modern analytics should be available to every clinician. After all, OCHIN has made it work. And they don’t even serve up ads for chronic indigestion or 24-hour asthma relief.