The projected costs of healthcare in the United States due to the novel coronavirus (SARS-CoV-2) and the disease it causes, COVID-19, are eye-opening. Total costs will range between $56B and $556B by the end of 2021, according to a recent report commissioned by America’s Health Insurance Plans1. Hospitals are projected to lose over $200B by mid-year 2020, an average of about $51B per month over a four-month period2. There have also been disruptions to routine surgical care, with over 28M surgeries postponed or canceled during the 12-week peak of the COVID-19 crisis3. In addition, the use of telehealth services, remote monitoring, and postal delivery of health products has grown rapidly. One national retail pharmacy chain reported a 600% growth in demand for telehealth communications, virtual visits, and home delivery services during the first quarter of 2020 as compared to Q1 2019. Retail pharmacies that have mini-clinics embedded in local communities have experienced rapid growth during this COVID-19 period. This is largely driven by the existence of trusted relationships and established interactions with consumers who seek health care goods and services within their community4.

On full display is the need to quickly understand the impact of these changing dynamics and address costly challenges within our health care system in real time. One critical question is the extent to which digital technologies can be useful in informing our understanding of what works in health care delivery and what does not. Access to more complete data is crucial to empower a diverse range of stakeholders with the knowledge and ammunition needed to make informed choices about expenditures and the use of scarce health care resources. In this regard, effective decisions require data that covers the full range of factors associated with health care delivery, not data that is merely good enough.

Improvements in health outcomes, particularly over the last 100 years, have notably occurred after a breakthrough in scientific discovery or advancements in technology. Digital health technology has been called the Fourth Industrial Revolution with its increased capacity to use tools and technologies for health care solutions, such as Big Data cloud technology, advanced computer analytics, artificial intelligence (AI), and Natural Language Processing (NLP)5. These breakthroughs have increased our ability to make quicker, more precise predictions related to disease manifestation, clinical outcomes, and operational costs. In this era, the exchange of digital information has also emerged as a dominant factor within the healthcare ecosystem, symbolized by the convergence of electronic computing and communication technologies. These technologies have transformed how we make choices, how we engage, and with whom we engage. They have forced us to create new standards for the integrity and quality of the data that is produced6.

Tell Me Something That I Don’t Already Know

A critical aspect of digital innovation is the value of AI in making accurate predictions, particularly regarding health outcomes. Specifically, AI uses existing data (information that is available) to find out about something missing (information that you do not have)7. In other words, AI helps by using information from past experiences to find solutions to current and emerging health care challenges for which there are limited answers. As an example, AI digital health applications can quickly simulate large amounts of data stored on cloud-based platforms to precisely target pathogens in genomically defined cancers to help prolong life. These digital innovations have also shown promise in remotely monitoring a patient’s current condition and then predicting the use of needed services in the future to improve their health status, lower their medical costs or increase satisfaction with the care provided8,9.

Most importantly, the reality of a global health pandemic like COVID-19 has made us aware of how quickly the operational and clinical needs within the healthcare system can change and a “new normal” can be established10. It further demonstrates the urgency for public-private partnerships to work collaboratively and develop innovative solutions using digital applications. Customers will demand answers to pressing healthcare questions and will want them quickly. COVID-19 is an illustration of how the business of healthcare must change and be more effective in response to an emerging crisis.

In times like these, what insight will data provide? Will the predictive prowess of AI alone be sufficient? What data can be provided to fill the void of what I don’t already know?

The Black Swan

The impact of COVID-19 has been characterized by Healthbox as a black swan or “an event that happens so rarely that it is incredibly hard to predict and even harder to prepare for”11. This is not to be confused with the psychological horror thriller film, Black Swan, although some days do seem like a horror movie!

The marketplace demand for insight from real-world data (RWD) to help make better, more informed decisions has never been greater. There is more interest in and acceptance of RWD from which to draw insights for COVID-19 vaccine development and treatment options12. This trend will continue over the next 18 to 36 months. Effective solutions depend on the availability of relevant data to inform decisions and deliver vital healthcare services, especially to more vulnerable patient populations.

The COVID-19 experience has revealed inefficiencies in our US and global healthcare systems that have resulted in an inability to scale quickly in response to demand for ribonucleic acid (RNA) testing, vaccine development, and treatment. While some entities are faring better than others, the stress on the healthcare infrastructure is coupled with technological challenges that expose the limitations in interoperability and our ability to communicate using existing resources6. These challenges have also created more reliance on the “Internet of Things.” That is, there is more dependence on digital communications via text messaging, cellular phones, web conferencing, and other mobile applications as we shelter in place.

These applications give new meaning to the “Learning Healthcare System,” a term coined by the Institute of Medicine in 2007. Greater importance is placed on fine-tuning technology to predictively “think” and efficiently perform at scale.

Can Machines Think and See?

In 1950, Alan Turing, a noted mathematician and computer scientist, proposed a way to assess whether a machine or computing system can exhibit intelligent behavior. Known as the Turing Test, it was his way of approaching the question “Can machines think?” A Dartmouth professor, John McCarthy, later coined the term “artificial intelligence” for the field8. While AI holds much promise for innovation, it still depends on how well human beings can use available data to inform and teach machines to think as humans think. For example, we have seen over the past months how challenging it is to apply AI in real time to help individuals who are treated for COVID-19. It takes time to collect and evaluate evidence, which can create lags in getting real-time solutions and answers to questions such as: When is it appropriate to ventilate, and who is the most appropriate candidate given a limited supply of ventilators? Is it safe for patients to take ibuprofen, and what are the risk factors to evaluate before treatment?

Deep Learning techniques, advanced by researchers such as Brendan Frey, assist AI technology in performing critical tasks, such as recognizing images or interpreting syntax and semantics in natural language processing (NLP)8. Specific to healthcare, this is akin to the ability to take large amounts of data, recognize patterns, and quickly transform them into digestible insight that can help answer a medical or drug discovery question. AI also holds promise for operational effectiveness in health care, such as predicting cycle time in drug supply processing to improve throughput, lowering study costs by identifying the appropriate patients for an investigational study, or minimizing errors in transcribing patient responses or treatment instructions13.

While these examples may demonstrate the ability of machines to think and see, important issues of caution should also be considered. Just coming up with an AI algorithm and having it work the first time is not realistic. It takes time to train machines to create on a large scale and this is not a simple thing to do. Specifically, one must consider exactly what kinds of data are needed, the source of the data, and where to acquire it. An extensive amount of computing horsepower is needed to train computers with lots of real-world examples to process and validate data to replicate human judgment. Highly trained professionals are needed with experience in both clinical and health information management to serve as data annotators, taggers, and abstractors. In sum, it takes careful planning, dedicated resources, and execution.

Unlocking Insights Trapped in Clinical Narratives

Technology has evolved to enable the faster curation (or abstraction) of traditionally hard-to-get information from medical records14. These improvements often provide deeper insight into decisions made by health care professionals and patients regarding the course of care or treatment impact. However, is this good enough?

One typical way to create a dataset for research is by abstracting data elements that have been formatted in a structured or common way. This provides a more convenient way to acquire and manage data elements because their format is consistent within a patient record. A greater challenge is harnessing insights from medical records that do not have this structure (unstructured medical records). This is typically the case in documents like provider notes, treatment reports, image summaries, and other scanned PDF documents. One way of effectively curating this data is by using a combination of machine learning technology, such as NLP, and human verification of its output.

Why is this? The documentation of clinical information is not done in the same way or consistently across providers or healthcare systems. Health data in the US is generally collected for reimbursement purposes, not for research. This makes it necessary to find ways to more effectively process, interpret and verify the accuracy of medical records, whether they are structured or unstructured. The end goal is to generate evidence to help us understand relevant conditions, such as symptoms, comorbidities, diagnostic test results, treatment pathways, tumor disease staging, disease progression, and the like. Furthermore, conditions like disease staging and progression often require complex proxy disease stage assignments using protocols that define relevant clinical pathways. This type of data is not laid out in a uniform format or structured documentation style.

Moreover, the richness of insight that describes the interaction between a healthcare provider and a patient can often be found in physician notes. These are notes that relate to issues like a chief complaint or concern, the reason for the doctor visit, timing of symptom onset, symptom history, symptom severity, patient satisfaction with therapy, and treatment plans. Information within these unstructured assets can best be abstracted by using a combination of machine learning technologies and human curation approaches.

To be clear, Machine Learning and NLP technology can be used for many different purposes, one of which is to unlock the richness of data from unstructured health records. When machine curation is applied, scaled, and confirmed by experienced human curators, the result can be a significant reduction in the time and effort needed to get to the deep insights buried in unstructured data sources. Real-world evidence can be generated from the application of science (digital technology) and art (human expertise) to produce RWD for predictive insight to address our most pressing health care challenges.
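For readers who want to picture how machine curation and human confirmation might fit together, here is a minimal sketch in Python. It is an illustration only, not a description of any production system: simple text patterns stand in for a trained NLP model, and the field names, patterns, confidence scores, and review threshold are all hypothetical assumptions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Extraction:
    name: str                 # data element pulled from the note
    value: str                # text captured for that element
    confidence: float         # assumed score; a real model would compute this
    needs_human_review: bool  # True when a human curator should confirm


# Hypothetical patterns standing in for an NLP model, each paired with
# an assumed confidence score.
PATTERNS = {
    "chief_complaint": (
        re.compile(r"chief complaint:\s*([^.\n]+)", re.I), 0.95),
    "symptom_onset": (
        re.compile(r"(?:onset|began|started)\s+"
                   r"(\d+\s+(?:day|week|month)s?\s+ago)", re.I), 0.80),
    "disease_stage": (
        re.compile(r"\bstage\s+(IV|III|II|I)\b", re.I), 0.60),
}

# Assumed cutoff: anything scored below this is routed to a human curator.
REVIEW_THRESHOLD = 0.90


def curate(note: str) -> List[Extraction]:
    """Extract data elements from a free-text note and flag uncertain ones."""
    results = []
    for name, (pattern, confidence) in PATTERNS.items():
        match = pattern.search(note)
        if match:
            results.append(Extraction(
                name=name,
                value=match.group(1).strip(),
                confidence=confidence,
                needs_human_review=confidence < REVIEW_THRESHOLD,
            ))
    return results


note = ("Chief complaint: persistent cough. Symptoms began 2 weeks ago. "
        "Imaging consistent with stage II disease.")

for item in curate(note):
    print(item)
```

In this sketch, the machine handles the first pass over the note, and only the lower-confidence elements (here, symptom onset and disease stage) are queued for confirmation by a trained curator.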

Navigation through Uncertain Terrain

The presence of COVID-19 has clearly unveiled the challenge of acquiring data and developing meaningful datasets to answer questions regarding the presence of disease and its impact on health outcomes15. This requires the tenacity to change or improve on what we are doing. Insight into healthcare needs and effective interventions is based on having complete data that is captured throughout the patient health care journey. This facilitates our ability to get to evidence-based solutions. Providers need relevant information to make critical health care decisions, particularly in urgent care situations. Patients need to have confidence in knowing that the precision of technology is confirmed by the expert eye of a trained human professional.

Our hope and confidence rest in the ability to get to the right answers by unlocking insightful information better, faster, and more efficiently using both AI digital technology and human curation.

In upcoming blogs, Ciox Real World Data will shine a light on the growing awakening and dominance of digital technologies in health care delivery and information exchange. We will also describe how Ciox RWD builds its data applications using effective solutions that combine Machine Learning and NLP with expert human curation to deliver high-definition datasets to answer pressing questions.

Innovation is really about transformation. This transformative power is the catalyst for change, and it calls for being bodacious enough to use innovation to create a pathway to a right-now solution. Data that provides a complete picture of health can help to predict and respond to the health care needs of patients today and tomorrow.



  1. Wakely (March 30, 2020). COVID-19 Cost Scenario Modeling: Estimating the Cost of COVID-19 Treatment for U.S. Private Insurers. America’s Health Insurance Plans.
  2. American Hospital Association (May 5, 2020).
  3. British Journal of Surgery Society (May 11, 2020). European Colorectal College. St. Gallen, Switzerland.
  4. Minemyer P. (May 6, 2020). CVS seeing massive increases in telehealth use, home prescription delivery due to COVID-19.
  5. Rowlands D. (December 2019). What is digital health and why does it matter? Retrieved from:
  6. Diamandis PH, Kotler S. (2012). Abundance: The future is better than you think. Free Press. New York, NY.
  7. Agrawal A, Gans J, Goldfarb A (2018). Prediction Machines: The simple economics of artificial intelligence. Harvard Business Review Press. Boston, MA.
  8. The New York Times Magazine (May 2020). Artificial Intelligence: Health care, Everyday Life, Business (Its perils and its promise). New York, NY.
  9. Ciox Real World Data (August 2019). Pharma’s digital awakening: Research-ready health information and AI to reduce costs and deliver better treatment. Clinical Leader.
  10. Colangelo M. (April 2020). Deep Analysis of Global Pandemic Data Reveals Important Insights. Forbes.
  11. Healthbox (April 21, 2020). What structural weaknesses did COVID-19 expose in the U.S. healthcare system?
  12. Healthbox (April 30, 2020). How will COVID-19 change healthcare over the next 18 to 36 months?
  13. Evans H, Agarwal A & Bazos (2017). Building a digital infrastructure. Pharmaceutical Executive: 37(7).
  14. Natarajan S. (April 27, 2020). Leveraging NLP across the Healthcare Spectrum. Forbes.
  15. Dyke D, Kapadia V. (May 20, 2020). Big Data Powers Insight for COVID-19. Healthcare Business Today.

Acknowledgment of Contributions: The author would like to thank the following Ciox Real World Data individuals for their support and contributions in developing some of the content for this blog: Patty Sheridan, MBA, RHIA, FAHIMA, Former SVP, Data Market Development; Dan O’Conner, SVP, Growth; Mark Yap, FNP-C, Business Analyst; Ana Bargo, Data Scientist; Jeannine Cain, MSHI, RHIA, CPHI, Business Analyst; and Jessica Weiss, Director of Marketing at Ciox Real World Data.