When Does Size Matter? Promises, Pitfalls, and Appropriate Interpretation of “Big” Medical Records Data

      The analysis of large datasets in ophthalmic research, and medical research more broadly, has become increasingly common. The growth of electronic health records (EHRs) has facilitated passive collection of large quantities of computerized medical data; EHR systems have been adopted nearly universally in the United States.
      Researchers are eager to leverage these data into insights that meaningfully improve clinical care and patient outcomes. The American Academy of Ophthalmology’s Intelligent Research in Sight (IRIS) Registry represents one such dataset.
      In this issue, Chiang et al (p. 1143) outline the creation of the IRIS Registry and describe the basic characteristics of the physicians and patients included. The IRIS Registry contains an impressive amount of data from ophthalmologic practices; in 2016 alone, IRIS aggregated data on 37 million encounters from 17 million unique patients across 10 000 providers. The utility of the IRIS Registry for clinical research is demonstrated in a study by Park and Lum (p. 1273), also appearing in this issue. The authors describe the 1-year risk of return to the operating room after vitrectomy, with a sample of more than 73 000 eyes with epiretinal membrane and 41 000 eyes with macular holes.
      This investigation is an example of studies the IRIS Registry may enable, and we expect many others will soon follow.
      Yet enthusiasm about the impressive size of large medical datasets should not diminish awareness of their weaknesses; as with all research, it is essential to draw careful conclusions that are well supported by the data. This editorial gives an overview of big EHR data, reviews their potential strengths for medical research, and outlines 5 common pitfalls, with recommendations for mitigating them.

      Big Data Basics

      We can think of most “big” datasets as having 2 dimensions: width and length. Wide datasets have information on a large number of variables (e.g., genetic sequence data) and long datasets contain information on a large number of people (e.g., census data). Databases from EHR systems, including the IRIS Registry, contain information on both a large number of variables and a large number of people.
      Medical studies using observational data will generally have at least 1 of 3 goals: (1) description, (2) prediction, or (3) causal inference. Descriptive studies yield insight into the patient experience by reporting on the prevalence and incidence of conditions and treatments. The studies by Chiang et al and Park and Lum featured in this issue are primarily descriptive in nature. Predictive studies provide estimates of a patient’s risk for an outcome, given their individual characteristics, without any intention to intervene. Examples include risk scores, such as the Framingham Risk Score.
      Finally, studies of causal inference aim to quantify the impact of an intervention, for example, estimating improvement in visual acuity after treating diabetic macular edema to determine efficacy of different drug treatments.
      The specific context and goals of studies will dictate which strengths and limitations of large EHR data will be most pertinent.

      Promises of Big Electronic Health Record Data

      The IRIS Registry aggregates data collected during routine provision of clinical care, making it possible to gain insight from information that would otherwise be inaccessible to researchers through more conventional studies. Although EHR systems and registries like IRIS require a substantial initial investment of resources to create, they are scalable once in place; the marginal cost and effort required to add new information is low. Compare this with the more traditional paradigm for medical research, in which study investigators specify predefined clinical data to be collected outside the course of usual care. The costs (in time, effort, and money) of collecting additional measurements or enrolling new patients prevent collection of original research data on the extensive scale that EHR systems enable.
      As a result, investigators can leverage large, passively collected datasets to perform studies that would otherwise be infeasible; the broad range of clinical variables available creates numerous possibilities for potential analyses. The value of “real-world evidence” from passively collected datasets is increasingly recognized by institutions including the US Food and Drug Administration and the National Academies of Sciences, Engineering, and Medicine.
      Research can be performed relatively rapidly, without delays for data collection. Another tangible advantage of the large size of the IRIS Registry and similar EHR data is the increased absolute number of patients with rare outcomes or in uncommon subgroups; large databases have previously been used to study rare events, such as infectious endophthalmitis after cataract surgery or systemic complications after anti–vascular endothelial growth factor therapy.

      Five Potential Pitfalls of Big Electronic Health Record Data

      Data quality: The efficiency of passive collection of research data from EHRs comes with trade-offs in data quality; data errors can arise from several sources.
      For databases that aggregate EHR data from multiple sources, including the IRIS Registry, errors may be introduced from the extraction and harmonization of information across systems that store medical data in substantially different formats. Clinicians also may make data-entry errors in the EHR interface or fail to report important information in structured fields, opting to write free-text notes instead. After all, the primary motivation for clinician EHR use is not research, but the support of clinical workflow, satisfaction of administrative requirements, and documentation for reimbursement. As a result, important types of information may not be captured or may be imprecisely measured in EHRs. Ambiguity in the available data often leads researchers to use inclusion criteria, exposure measures, case definitions, and outcomes that are inexact. For instance, Park and Lum
      identify patients receiving vitrectomy for epiretinal membrane and macular holes using a combination of International Classification of Diseases, 9th and 10th Revisions, and Current Procedural Terminology codes. Assumptions were made to prioritize conditions: for a hypothetical patient with diagnostic codes indicating the presence of epiretinal membrane, vitreous floaters, and macular hole, macular hole was labeled the primary diagnosis. Furthermore, the authors were not able to identify what percentage of macular holes actually closed, only the percentage of patients with macular hole who returned to the operating room.
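      The kind of code-based prioritization rule described above can be sketched as a simple ranked lookup. The diagnosis codes and the ranking below are illustrative assumptions, not the actual definitions used by Park and Lum:

```python
# Illustrative sketch of prioritizing among co-occurring diagnosis codes.
# The ICD-10 codes and the priority ranking are hypothetical examples,
# not the definitions used in any IRIS Registry analysis.
PRIORITY = ["macular_hole", "epiretinal_membrane", "vitreous_floaters"]

CODE_TO_CONDITION = {
    "H35.34": "macular_hole",         # macular cyst, hole, or pseudohole
    "H35.37": "epiretinal_membrane",  # puckering of macula
    "H43.39": "vitreous_floaters",    # other vitreous opacities
}

def primary_diagnosis(codes):
    """Return the highest-priority condition among a patient's codes."""
    conditions = {CODE_TO_CONDITION[c] for c in codes if c in CODE_TO_CONDITION}
    for condition in PRIORITY:  # first match in the ranking wins
        if condition in conditions:
            return condition
    return None

# A patient coded with all three conditions is labeled with macular hole.
print(primary_diagnosis(["H35.37", "H43.39", "H35.34"]))  # macular_hole
```

      Any real case definition would, of course, also incorporate procedure codes and timing windows; the point is only that such rules are explicit, checkable assumptions.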
      Although there is little researchers can do to change the quality of EHR data, definitions for key variables, especially outcomes, should be validated whenever possible. Validation compares an inexact definition (e.g., using the presence or absence of diagnostic codes) with a gold standard (e.g., full chart review) and reports metrics such as sensitivity, specificity, positive predictive value, and negative predictive value. Performance can vary widely on the basis of the condition and the sophistication of the definition; one study found diagnostic code–based definitions for 32 conditions yielded positive predictive values ranging from 23% to 100%.
      Researchers may cite studies that have previously validated the variable definition they use, or validation may be performed internally on a subset of their data.
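      As a concrete illustration, the four validation metrics follow directly from a 2×2 table comparing the code-based definition against chart review. The counts below are invented for illustration:

```python
# Compute validation metrics comparing a code-based case definition
# against a gold-standard chart review. All counts are invented.
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases the definition catches
        "specificity": tn / (tn + fp),  # non-cases it correctly excludes
        "ppv": tp / (tp + fp),          # flagged patients who are true cases
        "npv": tn / (tn + fn),          # unflagged patients truly negative
    }

# Hypothetical internal validation subset: 200 charts reviewed.
m = validation_metrics(tp=45, fp=15, fn=5, tn=135)
print({k: round(v, 3) for k, v in m.items()})
# sensitivity 0.9, specificity 0.9, ppv 0.75, npv 0.964
```

      Note that even with 90% sensitivity and specificity, the positive predictive value here is only 75%, because the condition is uncommon in the validation sample.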
      Patient loss to follow-up: Electronic health record databases often do not contain a complete record of a patient’s interactions with the medical system. Single-center EHR data, as well as multicenter registries like IRIS, do not capture clinical interactions occurring at nonparticipating facilities. This can lead to substantial undercounting of clinical events after a treatment or procedure. As a result, descriptive studies may report artificially low rates of complications, an issue that should be seriously considered before basing performance-based quality-of-care metrics exclusively on registries and other datasets with incomplete follow-up, regardless of their size. In studies estimating causal effects, incomplete follow-up can lead to selection bias, particularly when reasons for loss to follow-up are associated with both the exposure and the outcome.
      EHR data may not be ideal for answering questions that require a long, uninterrupted duration of individual patient follow-up. Studies that depend on information collected across patient encounters should report potential loss to follow-up as a limitation. To test the robustness of conclusions to this limitation, authors may consider conducting quantitative bias analyses.
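      A minimal quantitative bias analysis for incomplete event capture can be sketched in a few lines: assume a range of plausible capture fractions and report how the corrected rate moves. All numbers below are hypothetical:

```python
# Simple quantitative bias analysis for incomplete follow-up: how would
# an observed reoperation rate change if only a fraction of events were
# captured by participating facilities? All numbers are hypothetical.
def corrected_rate(observed_events, n_patients, capture_fraction):
    """Event rate after scaling observed events up by the capture fraction."""
    return observed_events / capture_fraction / n_patients

observed_events, n_patients = 70, 1000  # observed rate: 7.0%
for capture in (1.0, 0.9, 0.8, 0.7):
    rate = corrected_rate(observed_events, n_patients, capture)
    print(f"capture fraction {capture:.0%}: corrected rate {rate:.1%}")
```

      If conclusions would change at capture fractions the authors consider plausible, the limitation is material and should be reported as such.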
      Overemphasis on statistical significance: There are also several important statistical considerations when working with any large database. A by-product of having many patient observations is increased statistical power and precision; this can lead to remarkably small P values for hypothesis testing, even when the observed differences between groups are clinically inconsequential.
      Simply put, statistical significance does not necessarily imply clinical significance. Furthermore, large sample sizes can have the unfortunate side effect of magnifying problems related to statistical model misspecification. Incorrect model assumptions will lead to biased estimates; the narrower the confidence intervals, the more likely the true parameter will be excluded.
      In addition, the high dimensionality of EHR data can lead investigators to test an excessive number of hypotheses or to manipulate the data in arbitrary ways until arriving at the desired result, typically reporting only results that are statistically significant.
      These practices, sometimes called “p-hacking” or “data-dredging,” lead to an inflated risk of false-positive results (also known as “type I errors”) and findings that are not reproducible.
      To ensure that the audience can fairly judge whether results are meaningful enough to warrant action, P values must not be presented in isolation. Authors should report meaningful effect estimates, such as the absolute difference or risk ratio, with corresponding confidence intervals. To prevent misleading results from p-hacking practices, primary analyses can be prespecified. If post hoc testing is performed, the total number of tests and description of the tests should be reported, regardless of whether results were statistically significant.
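      The gap between statistical and clinical significance can be made concrete with a toy simulation. The outcome scale and all numbers below are invented, not drawn from any study discussed here:

```python
import math
import random

# Toy demonstration: with very large samples, a clinically trivial mean
# difference (here, 0.5 units on an assumed, illustrative outcome scale)
# produces a vanishingly small P value.
random.seed(0)
n = 100_000
group_a = [random.gauss(60.0, 15.0) for _ in range(n)]
group_b = [random.gauss(60.5, 15.0) for _ in range(n)]

mean_a = sum(group_a) / n
mean_b = sum(group_b) / n
var_a = sum((x - mean_a) ** 2 for x in group_a) / (n - 1)
var_b = sum((x - mean_b) ** 2 for x in group_b) / (n - 1)

diff = mean_b - mean_a                     # effect estimate
se = math.sqrt(var_a / n + var_b / n)      # standard error of the difference
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))       # two-sided z-test P value
ci = (diff - 1.96 * se, diff + 1.96 * se)  # 95% confidence interval

# The P value is tiny, yet the effect (about half a unit against a
# standard deviation of 15) may be clinically inconsequential; reporting
# diff with ci makes that plain.
print(f"difference = {diff:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f}), P = {p:.1e}")
```

      Presenting the effect estimate and interval alongside the P value lets readers judge whether the difference is worth acting on.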
      Confounding: Causal inference studies using big EHR data are vulnerable to the same confounding biases common in other observational studies. Confounding occurs when outcomes and treatment decisions share common causes. For instance, confounding by severity occurs when mild cases of a condition are treated differently than severe cases and have better outcomes, independent of treatment received. A naïve comparison of outcomes for the 2 treatment alternatives will produce misleading results. Investigators have demonstrated that influenza vaccination appears to increase the risk of influenza-related complications when the study design and analysis do not account for the differences in age and pulmonary disease between vaccine recipients and nonrecipients.
      When designing observational studies for causal inference, it can be helpful to conceptualize how an ideal randomized trial would be conducted and to make study design choices to mimic the ideal trial.


      To minimize bias due to confounding, factors that act as a common cause of the outcome and treatment must be accounted for through statistical adjustment, restriction, or matching. Although a complete overview of study design for causal inference is beyond the scope of this article, several textbooks provide detailed guidance.

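      Confounding by severity can be illustrated with a toy simulation in which the outcome depends only on disease severity, yet the naïve comparison makes one treatment look harmful. All probabilities are invented:

```python
import random

# Toy simulation of confounding by severity: severe cases are more likely
# to receive treatment B and more likely to have a poor outcome, entirely
# independent of treatment. All probabilities are invented.
random.seed(1)

def simulate(n=100_000):
    counts = {("A", 0): 0, ("A", 1): 0, ("B", 0): 0, ("B", 1): 0}
    strata = {}  # (treatment, severe) -> (events, total)
    for _ in range(n):
        severe = random.random() < 0.5
        # Severe cases get treatment B 80% of the time; mild cases, 20%.
        rx = "B" if random.random() < (0.8 if severe else 0.2) else "A"
        # The outcome depends only on severity, not on treatment.
        bad = random.random() < (0.30 if severe else 0.10)
        counts[(rx, int(bad))] += 1
        events, total = strata.get((rx, severe), (0, 0))
        strata[(rx, severe)] = (events + int(bad), total + 1)
    return counts, strata

counts, strata = simulate()
for rx in ("A", "B"):
    rate = counts[(rx, 1)] / (counts[(rx, 0)] + counts[(rx, 1)])
    print(f"crude bad-outcome rate, treatment {rx}: {rate:.1%}")
# The crude comparison makes B look harmful. Stratifying by severity
# reveals nearly identical rates within each stratum:
for (rx, severe), (events, total) in sorted(strata.items()):
    print(f"treatment {rx}, severe={severe}: {events / total:.1%}")
```

      Stratification plays the role here that adjustment, restriction, or matching plays in a real analysis: it compares like with like.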
      Appropriate reporting: Observational studies including research from large EHR datasets can have important implications for both clinical practice and broader health policy. Investigators using these datasets have a responsibility to be circumspect, thoughtfully considering the strengths and limitations of their approach, faithfully reporting their analytic methods, and drawing responsible, unexaggerated conclusions. All authors should prioritize clear reporting and, at minimum, follow the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines
      for observational research and the RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) statement for routinely collected health data.

      Conclusions

      Most of these limitations are not unique to studies using “big” data; yet the size of a dataset alone will not compensate for them. Data from EHRs, including the IRIS Registry, will open the door to research investigations that would otherwise have been infeasible or impossible. As members of the medical research community continue to gain access to unprecedented amounts of data, we would be wise to remember that the use of “big” EHR data comes with both substantial promise and potential pitfalls.

      References

      1. Henry J, Pylypchuk Y, Searcy T, Patel V. Adoption of electronic health record systems among U.S. non-federal acute care hospitals: 2008-2015. The Office of the National Coordinator for Health Information Technology Data Brief 35. Accessed January 22, 2018.
      2. Healthcare Information and Management Systems Society (HIMSS). Essentials Brief: 2016 Outpatient Practice Management and Electronic Health Record Solutions Study. July 2016. Available at: https://www.himssanalytics.org/essentials-brief/essentials-brief-2017-outpatient-pm-ehr-study.
      3. Chiang MF, Sommer A, Rich WL, et al. The 2016 American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight) Database: characteristics and methods. Ophthalmology. 2018;125:1143-1148.
      4. Park DW, Lum F. Return to the operating room after macular surgery: IRIS Registry analysis. Ophthalmology. 2018;125:1273-1278.
      5. D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117:743-753.
      6. Wells JA, Glassman AR, et al; Diabetic Retinopathy Clinical Research Network. Aflibercept, bevacizumab, or ranibizumab for diabetic macular edema. N Engl J Med. 2015;372:1194-1203.
      7. US Food and Drug Administration. Use of real-world evidence to support regulatory decision-making for medical devices: guidance for industry and Food and Drug Administration staff. Accessed February 23, 2018.
      8. National Academies of Sciences, Engineering, and Medicine. Examining the Impact of Real-World Evidence on Medical Product Development. Washington, DC: National Academies Press; 2018. Accessed February 23, 2018.
      9. Haripriya A, Chang DF, Ravindran RD. Endophthalmitis reduction with intracameral moxifloxacin prophylaxis. Ophthalmology. 2017;124:768-775.
      10. Moja L, Lucenteforte E, Kwag KH, et al. Systemic safety of bevacizumab versus ranibizumab for neovascular age-related macular degeneration. Cochrane Database Syst Rev. 2014;9:CD011230.
      11. Bowman S. Impact of electronic health record systems on information integrity: quality and safety implications. Perspect Health Inf Manag. 2013;10 (eCollection 2013).
      12. Quan H, Li B, Saunders LD, et al. Assessing validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions in a unique dually coded database. Health Serv Res. 2008;43:1424-1441.
      13. Hernan MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615-625.
      14. Lash TL, Fox MP, Fink AK. Applying Quantitative Bias Analysis to Epidemiologic Data. New York: Springer; 2009.
      15. American Statistical Association. The ASA’s statement on statistical significance and p-values. Am Stat. 2016;70:129-133.
      16. van der Laan M. Statistics as a science, not an art: the way to survive in data science. AMSTAT News. 2015. Accessed February 22, 2018.
      17. Nuzzo R. Scientific method: statistical errors. Nature. 2014;506:150-152.
      18. Hak E, Verheij TJM, Nichol KL, Hoes AW. Confounding by indication in non-experimental evaluation of vaccine effectiveness: the example of prevention of influenza complications. J Epidemiol Community Health. 2002;56:951-955.
      19. Hernan MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC; forthcoming. Available at: https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/. Accessed January 23, 2018.
      20. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.
      21. von Elm E, Altman DG, Egger M, et al; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335:806-808.
      22. Benchimol EI, Smeeth L, Guttmann A, et al. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015;12:e1001885.
