In 2016, a major transformation occurred in how we evaluate clinical data for real-world effectiveness. Congress passed the 21st Century Cures Act, which led to the creation of the Real-World Evidence Program in the United States. It signaled a paradigm shift in medicine by formally recognizing the importance of Real-World Data (RWD) in bringing medical innovation to patients. But the immense promise of RWD comes with sizeable hurdles. Simply having massive quantities of data at your disposal does not automatically equate to having meaningful answers—especially given the sheer length and complexity of medical records.
Medical records in their raw form are far from consistently organized or tabulated. Inconsistencies, redundancies, and format variations make it difficult to identify the relevant aspects of a patient’s medical journey. To gain deeper clinical insight from information that holds the nuances and critical details of patient care, we must rely on more sophisticated machine learning approaches such as natural language processing for healthcare. We need to be able to process and make sense of the information-rich natural language that exists throughout medical records to understand a patient’s health journey at the highest fidelity—regardless of where that information is stored.
The Challenges of Medical Data
The most important, insightful clinical data exist in narrative form, capturing the medical journey a patient experiences as a series of events. Many important elements are hard to contextualize in ways that allow someone to make clinical sense of them while also maintaining patient privacy and data security. The processes of inputting information are inherently flawed because electronic medical records (EMRs) come from administrative systems that are designed for reimbursement purposes, not research.
The challenges of medical records stem from an overall lack of standardization, clear labeling, and enforcement of data consistency. It’s also not possible to go back and retroactively apply standardization. Much of this variability comes down to medical notes, such as progress or office notes, and the individual style in which people fill in medical information. This reflects a broader issue: the subjectiveness of language. Inconsistencies are commonplace because multiple health care providers input content without being mandated to apply consistent standards, and they may lack the financial or technical resources to do so. This is further complicated by the fact that medical conditions often vary in how they present, and disease usually affects people differently.
Clinical content is harder to decipher when the stated intent of care does not match what the actual documentation reflects. The challenge is in trying to document everything that a health care provider has thought about and done in the care of a patient versus how that information is captured in the medical record. Therefore, you have to be careful to consider how the coding of notes is impacted by human interpretation.
There is also the sheer size and volume of information in patient records. Consider that one patient can make dozens of visits to different providers over several years, and each type of visit can generate different levels of data. In this sense, the amount of medical information that’s created becomes immense. As medical record data continue to grow at an exponential pace, hospitals continue storing it in their own siloed ways, based on individual workflows and operations. The result makes cross-comparisons difficult when, for example, contradictory outcomes emerge from the same procedures, or when hospitals use outdated data formats and modes of information sharing, such as faxes, scanned PDFs, and pathology reports with handwritten scribbles. The trick is to bring order to this explosion of scattered data and highlight key elements that yield deeper insight into the nuances of a patient’s healthcare journey.
Machine Learning, Natural Language Processing and the Complexity of Health Data
To handle the Pandora’s box of health data that lives within medical records, we need to harness deep learning. Deep learning is a subset of machine learning that makes it possible for multi-layer, computational neural networks to solve complex problems. One application of machine learning is natural language processing (NLP), which helps process and understand human language in a way that gets at the heart of what matters in the data. It’s like extracting a valuable metal from an ore.
By enhancing or highlighting specific, clinically relevant content, the “noise” that is captured for regulatory purposes becomes less prominent and convoluting. The results are data sets that can be tailored based on the area of focus. Machine learning, together with NLP, transforms data into an output that is “fit” enough for human review, from which we can draw insights. For instance, NLP can classify sections of medical records so that they are more searchable. The significance of this machine-based approach is that we can “read and summarize” thousands of pages of text incredibly quickly. This is a feat in scalable medical research that could never be achieved if we depended strictly on human methods of extraction.
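To make the idea of section classification concrete, here is a minimal, rule-based sketch. The section headers and the sample note are invented for illustration; a production system would learn these patterns from labeled examples rather than hard-code them, since real EMRs vary widely.

```python
import re

# Hypothetical section headers for illustration only; real records use
# many more variants, and a trained model would replace this table.
SECTION_PATTERNS = {
    "history": re.compile(r"^(history of present illness|hpi|past medical history)", re.I),
    "medications": re.compile(r"^(medications|current meds)", re.I),
    "assessment": re.compile(r"^(assessment and plan|a/p|impression)", re.I),
}

def classify_sections(note: str) -> dict:
    """Split a free-text note into labeled sections so each part is searchable."""
    sections, current = {}, "unlabeled"
    for line in note.splitlines():
        stripped = line.strip()
        for label, pattern in SECTION_PATTERNS.items():
            if pattern.match(stripped):
                current = label
                break
        sections.setdefault(current, []).append(stripped)
    return {label: " ".join(lines).strip() for label, lines in sections.items()}

note = """History of Present Illness
62-year-old presents with chest pain.
Medications
Aspirin 81 mg daily.
Assessment and Plan
Rule out acute coronary syndrome."""

print(classify_sections(note))
```

Once sections carry labels, a query such as “all medication lists mentioning aspirin” becomes a simple filtered search instead of a full-text scan of thousands of pages.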
Setting Machine Learning Up for Success
Machine learning is a voracious tool for data processing. But it requires a lot of time and training to be used in a meaningful way. Machines do not have the same level of cognitive reasoning as humans, so they need to be pointed in the right direction. Specifically, they need human experts to “teach” them basic rules to follow by labeling the data, ensuring that the correct information is extracted from medical records. By training the machine learning model to “read,” human experts are essentially guiding it with examples, themes, and relevant concepts within the medical record text to create a coded (or structured) representation. With time, the process gets more efficient at extracting nuanced information, which is vetted by human expertise to ensure that it accurately aligns with the clinical question being asked.
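The labeling-then-learning loop described above can be sketched in a few lines. This toy example uses invented snippets and labels and a deliberately naive word-overlap scorer; it stands in for the real training pipeline only to show how expert-assigned labels turn free text into a coded representation a machine can generalize from.

```python
from collections import Counter, defaultdict

# Toy expert-labeled examples (invented): each snippet is paired with the
# concept a clinician says it expresses.
labeled_examples = [
    ("patient denies shortness of breath", "negated_symptom"),
    ("no chest pain reported", "negated_symptom"),
    ("complains of severe headache", "present_symptom"),
    ("reports ongoing nausea", "present_symptom"),
]

def train_word_counts(examples):
    """Count which words appear under each expert-assigned label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Score a new snippet by word overlap with each label's vocabulary."""
    words = text.lower().split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

model = train_word_counts(labeled_examples)
print(predict(model, "patient denies any nausea"))  # -> negated_symptom
```

The point of the sketch is the division of labor: experts supply the labels, and the model generalizes from them—here, recognizing that “denies” signals negation even in a sentence it has never seen.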
A Deeper Dive With NLP
Extracting this meaningful information from jumbled medical records depends on a tool that can understand the unique “grammar” of individual medical records. Critical questions that we often ask ourselves include: How do we differentiate various sections of a medical record and classify the many document types stored within? How do we distinguish a patient’s history and physical from discharge summaries, lab results, visit notes, and the like?
This is where NLP comes in. For instance, NLP relies on machine learning to be sensitive to the inconsistencies in how information is documented and the multiple ways a medical concept can be expressed, abbreviated, or mistakenly written. NLP must also adapt to the constant, fast-paced evolution of medicine, our understanding of disease, how we test for diseases, and the new terms and updated lexicon that reflect this change. In other words, NLP is at the heart of navigating the semi-structured and unstructured data that exist throughout medical records.
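A small sketch illustrates the variant problem, assuming a hand-built synonym table: abbreviations, lay terms, and a common misspelling all map to one canonical concept. Real systems draw these mappings from curated terminologies such as the UMLS Metathesaurus rather than a hard-coded dictionary, and handle far more context than simple substitution.

```python
import re

# Hypothetical variant table for illustration: surface forms (abbreviations,
# lay terms, misspellings) mapped to a canonical concept name.
CANONICAL = {
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "myocardial infraction": "myocardial infarction",  # common misspelling
    "htn": "hypertension",
    "high blood pressure": "hypertension",
}

def normalize_concepts(text: str) -> str:
    """Replace known variants with their canonical concept name."""
    result = text.lower()
    # Replace longer phrases first so multi-word variants win over substrings;
    # word boundaries keep "mi" from matching inside other words.
    for variant in sorted(CANONICAL, key=len, reverse=True):
        result = re.sub(rf"\b{re.escape(variant)}\b", CANONICAL[variant], result)
    return result

print(normalize_concepts("Hx of HTN and prior MI"))
# -> hx of hypertension and prior myocardial infarction
```

After normalization, a search for “myocardial infarction” finds the patient regardless of whether the note said “MI,” “heart attack,” or a misspelled variant—exactly the kind of consistency the raw record lacks.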