Can Automated Imaging for Optic Disc and Retinal Nerve Fiber Layer Analysis Aid Glaucoma Detection?

Open Access. Published: March 22, 2016. DOI: https://doi.org/10.1016/j.ophtha.2016.01.041

      Purpose

      To compare the diagnostic performance of automated imaging for glaucoma.

      Design

      Prospective, direct comparison study.

      Participants

      Adults with suspected glaucoma or ocular hypertension referred to hospital eye services in the United Kingdom.

      Methods

      We evaluated 4 automated imaging test algorithms: the Heidelberg Retinal Tomography (HRT; Heidelberg Engineering, Heidelberg, Germany) glaucoma probability score (GPS), the HRT Moorfields regression analysis (MRA), scanning laser polarimetry (GDx enhanced corneal compensation; Glaucoma Diagnostics (GDx), Carl Zeiss Meditec, Dublin, CA) nerve fiber indicator (NFI), and Spectralis optical coherence tomography (OCT; Heidelberg Engineering) retinal nerve fiber layer (RNFL) classification. We defined abnormal tests as an automated classification of outside normal limits for HRT and OCT or NFI ≥ 56 (GDx). We conducted a sensitivity analysis, using borderline abnormal image classifications. The reference standard was clinical diagnosis by a masked glaucoma expert including standardized clinical assessment and automated perimetry. We analyzed 1 eye per patient (the one with more advanced disease). We also evaluated the performance according to severity and using a combination of 2 technologies.

      Main Outcome Measures

      Sensitivity and specificity, likelihood ratios, diagnostic odds ratio, and proportion of indeterminate tests.

      Results

      We recruited 955 participants, and 943 were included in the analysis. The average age was 60.5 years (standard deviation, 13.8 years); 51.1% were women. Glaucoma was diagnosed in at least 1 eye in 16.8%; 32% of participants had no glaucoma-related findings. The HRT MRA had the highest sensitivity (87.0%; 95% confidence interval [CI], 80.2%–92.1%), but lowest specificity (63.9%; 95% CI, 60.2%–67.4%); GDx had the lowest sensitivity (35.1%; 95% CI, 27.0%–43.8%), but the highest specificity (97.2%; 95% CI, 95.6%–98.3%). The HRT GPS sensitivity was 81.5% (95% CI, 73.9%–87.6%), and specificity was 67.7% (95% CI, 64.2%–71.2%); OCT sensitivity was 76.9% (95% CI, 69.2%–83.4%), and specificity was 78.5% (95% CI, 75.4%–81.4%). Including only eyes with severe glaucoma, sensitivity increased: HRT MRA, HRT GPS, and OCT would miss 5% of eyes, and GDx would miss 21% of eyes. A combination of 2 different tests did not improve the accuracy substantially.

      Conclusions

      Automated imaging technologies can aid clinicians in diagnosing glaucoma, but may not replace current strategies because they can miss some cases of severe glaucoma.

      Abbreviations and Acronyms:

      CI (confidence interval), GDx (Glaucoma Diagnostics), GPS (glaucoma probability score), HRT (Heidelberg retina tomograph), MRA (Moorfields regression analysis), NFI (nerve fiber indicator), OCT (optical coherence tomography), RNFL (retinal nerve fiber layer), ROC (receiver operating characteristic)
      Diagnosis of glaucoma by an experienced ophthalmologist remains the best reference standard.
      • Prum Jr., B.E.
      • Rosenberg L.F.
      • Gedde S.J.
      • et al.
      Primary Open-Angle Glaucoma Preferred Practice Pattern® Guidelines.
      However, diagnosis can be challenging, especially in people with early glaucoma. Accurate clinical diagnosis of glaucoma is limited by subjectivity, reliance on the examiner's experience, and a wide variation of optic disc structure among the population.
      • Prum Jr., B.E.
      • Rosenberg L.F.
      • Gedde S.J.
      • et al.
      Primary Open-Angle Glaucoma Preferred Practice Pattern® Guidelines.
      • Reus N.J.
      • Lemij H.G.
      • Garway-Heath D.F.
      • et al.
      Clinical assessment of stereoscopic optic disc photographs for glaucoma: the European Optic Disc Assessment Trial.
      Automated imaging of the optic nerve head or retinal nerve fiber layer (RNFL) increasingly is being introduced into practice for diagnosis and monitoring.
      • Stein J.D.
      • Talwar N.
      • Laverne A.M.
      • et al.
      Trends in use of ancillary glaucoma tests for patients with open-angle glaucoma from 2001 to 2009.
      Interpretation of some of the outputs may require expertise, but classification of results as normal or abnormal also can be generated by automatic comparison with a normative database.
      Several imaging technologies that quantify the structure of the retina and optic nerve head can be used in glaucoma.
      • Lin S.C.
      • Singh K.
      • Jampel H.D.
      • et al.
      American Academy of Ophthalmology Ophthalmic Technology Assessment Committee Glaucoma Panel
      Optic nerve head and retinal nerve fiber layer analysis: a report by the American Academy of Ophthalmology.
      Confocal scanning laser ophthalmoscopy is commercially available as Heidelberg retina tomograph (HRT; Heidelberg Retina Tomograph III [Heidelberg Engineering, Heidelberg, Germany]). It includes 2 classification algorithms, the Moorfields regression analysis (MRA)
      • Wollstein G.
      • Garway-Heath D.F.
      • Hitchings R.A.
      Identification of early glaucoma cases with the scanning laser ophthalmoscope.
      and the glaucoma probability score (GPS).
      • Coops A.
      • Henson D.B.
      • Kwartz A.J.
      • Artes P.H.
      Automated analysis of Heidelberg retina tomograph optic disc images by glaucoma probability score.
      • Burgansky-Eliash Z.
      • Wollstein G.
      • Bilonick R.A.
      • et al.
      Glaucoma detection with the Heidelberg retina tomograph 3.
      The RNFL can be assessed using either scanning laser polarimetry, currently available as the GDx-PRO (Glaucoma Diagnostics [GDx] Carl Zeiss Meditec, Dublin, CA)
      • Garas A.
      • Vargha P.
      • Holló G.
      Comparison of diagnostic accuracy of the RTVue Fourier-domain OCT and the GDx-VCC/ECC polarimeter to detect glaucoma.
      or spectral-domain optical coherence tomography (OCT), with several commercial devices available.
      • Sehi M.
      • Iverson S.M.
      Glaucoma diagnosis and monitoring using advanced imaging technologies.
      These imaging tests are user friendly and provide automated quantitative classifications.
      • Tay E.
      • Andreou P.
      • Xing W.
      • et al.
      A questionnaire survey of patient acceptability of optic disc imaging by HRT II and GDx.
      Although many published data describe the diagnostic performance of imaging techniques in cohorts of retrospectively selected glaucoma patients or glaucoma-free normal subjects, there is no high-quality evidence of the comparative accuracy of current imaging techniques for identifying glaucoma in consecutive patients with unknown status screened for possible glaucoma.
      • Lin S.C.
      • Singh K.
      • Jampel H.D.
      • et al.
      American Academy of Ophthalmology Ophthalmic Technology Assessment Committee Glaucoma Panel
      Optic nerve head and retinal nerve fiber layer analysis: a report by the American Academy of Ophthalmology.
      • Burr J.
      • Mowatt G.
      • Siddiqui M.A.R.
      • et al.
      The clinical and cost effectiveness of screening for open angle glaucoma: a systematic review and economic evaluation.
      Existing data from case-control studies may not be applicable to the clinically relevant population who undergo assessment and diagnosis.
      • Medeiros F.A.
      • Ng D.
      • Zangwill L.M.
      • et al.
      The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma.
      We aimed to assess and compare the performance of these commercially available technologies to detect glaucoma in a prospective cohort. This work was conducted as part of a wider publicly funded study (National Institute for Health Research Health Technology Assessment [HTA], 09/22/111) that also evaluated cost-effectiveness of these imaging technologies in a triage setting in the United Kingdom.

      Methods

       Study Design and Participants

      We conducted a pragmatic multicenter, within-patient, comparative evaluation of the diagnostic accuracy of automated imaging techniques for diagnosis of glaucoma, the Glaucoma Automated Tests Evaluation (GATE). We selected participants prospectively, and they underwent imaging with all technologies under evaluation and then had the reference standard diagnosis (clinical assessment by a glaucoma expert, including examination of the fundus by biomicroscopy and visual field testing with Humphrey 24-2 Swedish interactive threshold algorithm testing, masked to the imaging test results). The study was approved by the North of Scotland Research Ethics Committee (reference, 10/S0801/58) and was conducted according to the tenets of the Declaration of Helsinki. The full study protocol is publicly available.

      National Institute for Health Research. Evaluation, Trials and Studies. Available at: http://www.nets.nihr.ac.uk/projects/hta/0922111. Accessed February 1, 2015.

      We sought patient views on the design, conduct, and analysis of the study through representatives from the International Glaucoma Society.
      The study was coordinated from a central study office in the Health Services Research Unit, University of Aberdeen, and was conducted in 5 National Health Service hospital eye services in the United Kingdom (Aberdeen, Bedford, Hinchingbrooke, Liverpool, and Moorfields). We identified eligible patients from their referral letter as being adults (age ≥18 years) who were newly referred from primary care to the department of ophthalmology of the recruiting hospital with a possible glaucoma diagnosis or glaucoma-related finding. This included high intraocular pressure; possible abnormalities in the optic disc, visual field test results, or both; and possible narrow anterior chamber angle. Patients were ineligible if they had a previous diagnosis of glaucoma or had already been seen by an ophthalmologist.

       Participant Recruitment Process

      We sent information about the study to potentially eligible patients at each recruiting hospital before their first hospital appointment. At that first clinic appointment, we approached patients, and those who agreed to participate and signed the consent form were enrolled. We recorded the demographics (age, gender) of patients who declined to take part.

       Testing Regimen

      Before seeing the ophthalmologist, participants were imaged using all imaging tests in a random order followed by visual field measurement with standard automated perimetry with the Humphrey 24-2 Swedish interactive threshold algorithm strategy. An ophthalmologist with glaucoma expertise (reference standard) who was masked to the imaging results then assessed participants. We conducted all tests on the same day in 2 centers, and in 3 centers, the clinician assessment was on a separate date, within 2 weeks.

       Index Tests

      The following imaging technologies were compared.

       Heidelberg Retina Tomograph III

      A confocal laser scanning ophthalmoscope (Heidelberg Retina Tomograph III; Heidelberg Engineering) measures quantitative structural information of the optic disc anatomy and creates a topographic map of the retinal surface. Images are given a quality index (the mean topography standard deviation), which the manufacturer recommends should be less than 40 μm. We assessed 2 classification tools: the MRA and the GPS. Both methods produced an overall classification of within normal limits, borderline, or outside normal limits by comparison with normative data. Abnormal index test results for MRA and GPS were defined as an overall classification of outside normal limits.

       Scanning Laser Polarimetry

      Scanning laser polarimetry (the GDx with enhanced corneal compensation or the GDx PRO; Carl Zeiss Meditec) analyzes polarized light reflected from the fundus to measure the RNFL thickness. Images are given a quality index Q, which the manufacturer recommends should be 7 or more. The software generated an automated discriminating classifier of glaucoma, the nerve fiber indicator (NFI) value. The manufacturers report that 95% of the normal population has an NFI value of 35 or less and that 99% of the normal population has an NFI value of 55 or less.

      Carl Zeiss Meditec, Inc. GDxPRO Scanning Laser Polarimeter: User Manual. URL: www.amedeolucente.it/pdf/GDxPRO_User_Manual.pdf; Accessed September 2014.

      An abnormal index test result for the GDx was taken as an NFI value of 56 or more. An NFI value between 36 and 55 was considered to be similar to the borderline category of the other imaging tests. Measurements were obtained using either the GDx PRO or GDx variable corneal compensation (VCC) with an updated enhanced corneal compensation module.
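      The NFI cutoffs described above can be sketched as a small function. This is an illustrative sketch only: the function name is hypothetical, NFI is assumed to be reported as a whole number, and the category labels are borrowed from the HRT and OCT classifications for consistency rather than taken from the GDx software.

```python
def classify_nfi(nfi: int) -> str:
    """Map a GDx nerve fiber indicator (NFI) value to the three categories used in the study.

    Cutoffs follow the text: 56 or more is abnormal, 36 to 55 is treated as
    borderline, and 35 or less is taken as normal. Labels are an assumption.
    """
    if nfi >= 56:
        return "outside normal limits"  # abnormal index test result
    if nfi >= 36:
        return "borderline"  # comparable to the borderline category of the other tests
    return "within normal limits"
```

      In the default analysis only the first category counts as an abnormal test; the sensitivity analysis also counts the borderline category as abnormal.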

       Optical Coherence Tomography

      Spectral-domain OCT was performed using the Spectralis (Heidelberg Engineering), which produces a detailed cross-sectional image of the retina using reflected light in a similar manner to B-mode ultrasound. Images are given a quality index Q, which the manufacturer recommends should be more than 15. The software (RNFL software version 5) automatically compared an average RNFL thickness with a normative database and produced an overall classification of within normal limits, borderline, or outside normal limits. Abnormal index test results for OCT were taken as a classification of outside normal limits.
      We classed images as good quality if they met the manufacturer quality threshold as described above. Pupils were not dilated routinely unless image quality was inadequate. Between 1 and 3 experienced technicians performed the imaging tests at each center. We did not have any restriction on the same technician performing all imaging tests on an individual. The imaging technologies automatically generated a classification measurement without user input, except the HRT MRA, which required manual identification of the margin of the optic disc.
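      The quality rule can be written down compactly. The sketch below assumes hypothetical device keys ("HRT", "GDx", "OCT") and a single numeric quality value per image; the thresholds are the manufacturer recommendations quoted in the text.

```python
def is_good_quality(device: str, quality: float) -> bool:
    """Return True if an image meets the manufacturer-recommended quality threshold."""
    if device == "HRT":
        # Mean topography standard deviation, in micrometers: should be less than 40.
        return quality < 40
    if device == "GDx":
        # Quality index Q: should be 7 or more.
        return quality >= 7
    if device == "OCT":
        # Spectralis quality index Q: should be more than 15.
        return quality > 15
    raise ValueError(f"unknown device: {device}")
```

      Note that the direction of the comparison differs by device: a lower value is better for the HRT, and a higher value is better for the GDx and OCT.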

       Reference Standard

      An ophthalmologist with glaucoma expertise performed a clinical assessment that formed the reference standard (Table 1). All glaucoma experts participating in the study were trained in the procedures and used a preagreed definition of glaucoma (Table 1). Clinical examination included Goldmann applanation tonometry, gonioscopy, biomicroscopic examination of the optic disc (pupils dilated unless contraindicated), and evaluation of the visual field test with Humphrey Swedish interactive threshold algorithm 24-2. This represents currently recommended clinical practice in the United Kingdom.
      National Institute for Health and Clinical Excellence
      Glaucoma: Diagnosis and Management of Chronic Open Angle Glaucoma and Ocular Hypertension. Clinical Guidelines CG85.
      Imaging test results were not available to the ophthalmologist. If a clinical diagnosis could not be established (e.g., because of an unreliable visual field measurement), an inconclusive diagnosis was recorded. To ensure a valid and consistent application of the agreed reference standard, a limited number of consultant ophthalmologists provided the assessment (n = 11) according to the diagnosis chart shown in Table 1.
      Table 1. Possible Diagnoses by the Clinician Performing the Reference Standard Measurement, Ranked in Order of Severity
      Diagnosis | Definition
      Glaucoma
       Severe | Evidence of glaucomatous optic neuropathy* and a characteristic visual field loss† with MD of –12.01 dB or worse
       Moderate | Evidence of glaucomatous optic neuropathy* and a characteristic visual field loss† with MD between –6.01 dB and –12 dB
       Mild | Evidence of glaucomatous optic neuropathy* and a characteristic visual field loss† with MD of –6 dB or better
      Glaucoma suspect
       Disc suspect | Appearance suggestive of glaucomatous optic neuropathy, but also may represent a variation of normality, with normal visual fields (with or without high IOP)
       Visual field suspect | Visual field loss suggestive of glaucoma, but also may represent a variation of normality, with normal appearance of the optic disc (with or without high IOP)
       Visual field and disc suspect | Both the optic disc and visual field have some features that resemble glaucoma, but also may represent a variation of normal (with or without high IOP)
      Ocular hypertension | Both the visual field and optic nerve appear normal in the presence of elevated pressure >21 mmHg
      Primary angle closure | Closed anterior chamber angle (appositionally or synechial) in at least 270° and at least 1 of the following 2: IOP >21 mmHg and presence of peripheral anterior synechiae; both visual field and optic nerve appear normal
      Primary angle-closure suspect | Closed anterior chamber angle (appositionally without any synechiae) in at least 270°, with IOP ≤21 mmHg; both visual field and optic nerve appear normal
      No glaucoma-related findings | Absence of any of the above diagnoses
      IOP = intraocular pressure; MD = mean deviation.
      * Any of the following: optic disc or retinal nerve fiber layer structural abnormalities; diffuse thinning, focal narrowing, or notching of the optic disc rim, especially at the inferior or superior poles; documented, progressive thinning of the neuroretinal rim with an associated increase in cupping of the optic disc; diffuse or localized abnormalities of the peripapillary retinal nerve fiber layer, especially at the inferior or superior poles; disc rim or peripapillary retinal nerve fiber layer hemorrhages; and optic disc neural rim asymmetry of the 2 eyes consistent with loss of neural tissue.
      † Reliable visual field abnormality considered a valid representation of the subject's functional status. Visual field damage consistent with retinal nerve fiber layer damage (e.g., nasal step, arcuate field defect, or paracentral depression in clusters of test sites). Visual field loss in 1 hemifield that is different from the other hemifield, that is, across the horizontal midline (in early or moderate cases). Absence of other known explanations.

       Outcomes

      Primary outcomes were sensitivity and specificity. Secondary outcomes were diagnostic odds ratio, likelihood ratio, and proportion of indeterminate test results.
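      For readers less familiar with these measures, the sketch below shows how each is derived from the 2 × 2 table of index test result against reference standard. The function and field names are illustrative, and the calculation assumes that none of the denominators is zero.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    lr_positive: float
    lr_negative: float
    diagnostic_odds_ratio: float


def accuracy_from_counts(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Derive the study's outcome measures from true/false positive/negative counts.

    Indeterminate results are assumed to have been set aside beforehand.
    Raises ZeroDivisionError if fp or fn is zero or a margin is empty.
    """
    sensitivity = tp / (tp + fn)  # proportion of glaucoma eyes testing abnormal
    specificity = tn / (tn + fp)  # proportion of nonglaucoma eyes testing normal
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    diagnostic_odds_ratio = (tp * tn) / (fp * fn)  # equals lr_positive / lr_negative
    return Accuracy(sensitivity, specificity, lr_positive, lr_negative, diagnostic_odds_ratio)
```

      The proportion of indeterminate tests is simply the number of indeterminate results divided by the number of participants imaged with that test.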

       Sample Size

      We calculated the sample size using a 5% significance level based on a 2-sided test using standard diagnostic accuracy study methods.
      • McNemar Q.
      Note on the sampling error of the difference between correlated proportions or percentages.
      A study of 897 individuals would have 90% power to detect a 9% difference in accuracy for the primary outcome of diagnosis of glaucoma based on conservative assumptions (a probability of disagreement of 0.18, a glaucoma rate of 25%, and a sensitivity of 86%).
      • Burr J.
      • Mowatt G.
      • Siddiqui M.A.R.
      • et al.
      The clinical and cost effectiveness of screening for open angle glaucoma: a systematic review and economic evaluation.
      This sample size also would yield 80% power for detecting a 6% difference in accuracy should the sensitivity be 93% (the current best estimate from meta-analyses of high-quality diagnostic studies). For specificity, there would be more than 90% power to detect a 5% difference. We assumed that 6% of test results would be indeterminate, and this increased the total sample size to 954.
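      The final adjustment for indeterminate results is simple arithmetic. The sketch below assumes that the required number was divided by the expected proportion of determinate results and rounded to the nearest whole number; the text does not state the rounding rule, and rounding up instead would give 955.

```python
def inflate_for_indeterminates(n_required: int, indeterminate_rate: float) -> int:
    """Inflate a sample size so that n_required determinate results are expected.

    Rounding to the nearest integer is an assumption made here to illustrate
    the adjustment; it is not a rule stated in the text.
    """
    return round(n_required / (1 - indeterminate_rate))


print(inflate_for_indeterminates(897, 0.06))  # 897 / 0.94 is about 954.26
```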

       Statistical Analysis

      We included a single eye per patient in the analysis. We ranked the clinical diagnosis in order of decreasing severity according to Table 1. We used the worst eye based on this ranking for our analyses. When both eyes had the same diagnosis, we chose an eye at random. We calculated diagnostic measures (sensitivity, specificity, likelihood ratios, and diagnostic odds ratio) for each test for detection of glaucoma. We compared the diagnostic performance (sensitivity and specificity) of the imaging tests using McNemar's test
      • McNemar Q.
      Note on the sampling error of the difference between correlated proportions or percentages.
      and generated corresponding 95% paired confidence intervals (CIs).
      • Newcombe R.G.
      Improved confidence intervals for the difference between binomial proportions based on paired data.
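      Because every participant underwent every imaging test, comparisons between tests are paired, and only the discordant pairs carry information. The sketch below shows an exact (binomial) form of McNemar's test; the text does not say whether the exact or the chi-square form was used, so this is one common variant rather than the study's documented procedure. Here b and c are the 2 discordant counts (for example, among eyes with glaucoma, the number abnormal on test A only and the number abnormal on test B only).

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of k or fewer discordant pairs in one direction under
    # the null hypothesis that each direction is equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

      Concordant pairs (both tests abnormal or both normal) do not enter the calculation, which is why the sample size depends on the assumed probability of disagreement.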
      We did not impute any missing data. We reported an indeterminate result when no automatic classification was generated, imaging quality was low, or an artifact was present. We explored the following using sensitivity analyses: performance of the tests across the spectrum of disease, classifying borderline test results as abnormal, combining results of more than one imaging test, and considering glaucoma suspects as glaucoma. In combining test results, if either test gave abnormal results, we classified this as abnormal. We only classified combined results as normal if results from both tests were normal. We calculated the area under the receiver operating characteristic curve (ROC) for each test using the global parameter for HRT or OCT and the NFI parameter for GDx, and we compared these using the method of DeLong et al.
      • DeLong E.R.
      • DeLong D.M.
      • Clarke-Pearson D.L.
      Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.
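      The rule for combining 2 imaging tests, as stated above, can be sketched as follows. The result labels are illustrative, and the handling of the remaining case (one test normal and the other indeterminate) is an assumption, because the text does not specify it.

```python
def combine_results(result_a: str, result_b: str) -> str:
    """Combine two index test results using the either-abnormal rule.

    Each input is one of "abnormal", "normal", or "indeterminate".
    """
    if "abnormal" in (result_a, result_b):
        return "abnormal"  # either test abnormal makes the combination abnormal
    if result_a == "normal" and result_b == "normal":
        return "normal"  # normal only when both tests are normal
    return "indeterminate"  # assumption: not defined in the text
```

      A rule of this kind can only raise sensitivity and lower specificity relative to either test alone, which is the trade-off examined in the combined-test analysis.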
      We have reported the results according to the Standards for Reporting Diagnostic accuracy studies (STARD) guidelines.
      • Bossuyt P.M.
      • Reitsma J.B.
      • Bruns D.E.
      • et al.
      Standards for reporting of diagnostic accuracy. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration.
      We conducted a post hoc analysis, suggested by a peer reviewer, to explore informally the difference in test performance between tests by calculating individual recruiting center sensitivity and specificity (results not reported here).
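      The selection of 1 eye per patient described at the start of this section can be sketched as below. The diagnosis labels and function name are hypothetical, and the ordering within the glaucoma suspect group is assumed to follow the order in which Table 1 lists the categories.

```python
import random

# Diagnoses in decreasing order of severity, following the order of Table 1.
SEVERITY_ORDER = [
    "severe glaucoma",
    "moderate glaucoma",
    "mild glaucoma",
    "disc suspect",
    "visual field suspect",
    "visual field and disc suspect",
    "ocular hypertension",
    "primary angle closure",
    "primary angle-closure suspect",
    "no glaucoma-related findings",
]
RANK = {diagnosis: position for position, diagnosis in enumerate(SEVERITY_ORDER)}


def select_study_eye(right_dx: str, left_dx: str, rng: random.Random) -> str:
    """Return "right" or "left": the eye with the more severe diagnosis, or a random eye if tied."""
    right_rank, left_rank = RANK[right_dx], RANK[left_dx]
    if right_rank < left_rank:
        return "right"
    if left_rank < right_rank:
        return "left"
    return rng.choice(["right", "left"])  # same diagnosis in both eyes
```

      Passing an explicit random.Random instance (for example, random.Random(2016)) keeps the tie-breaking reproducible.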

      Results

      We recruited 966 participants (48% of those approached) between April 2011 and July 2013. After obtaining informed consent, we excluded 11 participants: 10 were ineligible (preexisting glaucoma, n = 4; referred from another ophthalmologist, n = 4; or not referred for glaucoma, n = 2), and 1 person withdrew from the study. Therefore, 955 participants were available for the index test comparison. In 12 participants, imaging was not completed for all index tests, and these participants were excluded from all analyses. No adverse events were reported during the study. Figure 1 shows participant flow and the classification of results by test and reference standard. The baseline demographics and ocular characteristics of the 943 participants who underwent all index tests are shown in Table 2. Age and gender of nonparticipants were similar to those of participants (see Fig 1).
      Figure 1. Flow diagram showing diagnosis of the cohort. FN = false negative; FP = false positive; GPS = glaucoma probability score; HRT = Heidelberg retinal tomography; MRA = Moorfields regression analysis; OCT = optical coherence tomography; TN = true negative; TP = true positive.
      Table 2. Participant Demographics and Ocular Characteristics
      Characteristic | All Participants | Glaucoma | Nonglaucoma
      No. | 943 | 158 | 770
      Mean age (SD), yrs | 60.5 (13.8) | 67.4 (12.7) | 59.2 (13.6)
      Female gender, no. (%) | 482 (51.1) | 74 (46.8) | 401 (52.1)
      Ethnicity, no. (%)*
       Black or black Caribbean | 25 (2.7) | 4 (2.5) | 21 (2.7)
       Black or black British-African | 20 (2.1) | 6 (3.8) | 14 (1.8)
       Asian or Asian British-Indian | 18 (1.9) | 5 (3.2) | 13 (1.7)
       Asian or Asian British-Pakistani | 4 (0.4) | 0 (0) | 4 (0.5)
       Chinese | 1 (0.1) | 1 (0.6) | 0 (0)
       Other Asian background | 4 (0.4) | 1 (0.6) | 3 (0.4)
       Mixed white and black African | 1 (0.1) | 1 (0.6) | 0 (0)
       White British | 826 (89.2) | 140 (88.6) | 686 (89.1)
       Other | 29 (3.1) | 0 (0) | 29 (3.8)
      Characteristic | Right Eye | Left Eye
      Refraction
       Mean sphere, D, mean (SD), n | 0.4 (3.3), 571 | 1.0 (3.6), 561
       Myopia greater than –5 D, N/n (%) | 37/943 (3.9) | 36/943 (3.8)
       Hyperopia greater than +5 D, N/n (%) | 38/943 (4.0) | 51/943 (5.4)
       Astigmatism greater than 3 D, N/n (%) | 16/943 (1.7) | 16/943 (1.7)
      Visual acuity, logMAR, mean (SD), n | 0.0 (0.30), 925 | 0.0 (0.3), 926
      D = diopter; logMAR = logarithm of the minimum angle of resolution; SD = standard deviation.
      * Self-reported ethnicity; no ethnicity was recorded for 15 participants.
      In 11 of the 943 participants who underwent all the imaging tests, no reference standard was collected (Fig 1). Reference standard results (diagnosis by clinician) of the remaining 932 participants' worse eyes (as described in “Methods”) are shown in Table 3. Table 4 (available at www.aaojournal.org) shows intraocular pressure and visual field mean deviation according to diagnosis. The most common diagnosis was no glaucoma-related findings (32% of participants had no glaucoma-related findings in either eye). Glaucoma was diagnosed in at least 1 eye in 17% of the cohort, and 6.6% had glaucoma in both eyes at referral. Most eyes that were diagnosed with glaucoma had primary open-angle glaucoma (78%).
      Table 3. Diagnosis of Participants' Worse Eye by Clinician in Secondary Care (Reference Standard)
      Diagnosis by Clinician | Worse Eye, No. | Worse Eye, %
      Number of eyes | 932 |
      Glaucoma (reference standard positive test results) | 158 | 17.0
       Mild | 78 | 8.4
       Moderate | 45 | 4.8
       Severe | 26 | 2.8
       Severity not recorded | 9 | 1.0
      Disc suspect | 170 | 18.2
      VF suspect | 36 | 3.9
      VF + disc suspect | 36 | 3.9
      OHT | 115 | 12.3
      PAC | 31 | 3.3
      PAC suspect | 83 | 8.9
      No glaucoma-related findings | 299 | 32.1
      Inconclusive diagnosis | 4 | 0.4
      OHT = ocular hypertension; PAC = primary angle closure; VF = visual field.
      The distribution of test results, including indeterminate rates, is shown in Table 5 (available at www.aaojournal.org). Indeterminate rates were low (<10%) for all index tests; OCT had the lowest rate of indeterminate test results (4.2%). Indeterminate test results (including those with low quality) were not included in the primary analysis.
      Figure 1 shows the categorization of the test results according to the reference standard finding (true and false positive, true and false negative). Of the 943 participants for whom all 4 tests were performed, 158 were classified by the clinician as glaucoma and 770 as not glaucoma. No conclusive diagnosis could be made for 4 participants. Table 6 (available at www.aaojournal.org) shows the diagnostic performance of the 4 tests, including CIs. The HRT MRA had the highest sensitivity (87.0%) but the lowest specificity (63.9%); GDx had the lowest sensitivity (35.1%) but the highest specificity (97.2%); and the other 2 tests provided intermediate results (HRT GPS values were very similar to the HRT MRA results, and OCT had very similar sensitivity and specificity values). The diagnostic odds ratio ranged from 9.24 for HRT GPS to 18.48 for GDx. Table 7 (available at www.aaojournal.org) shows the paired comparison of the 4 imaging tests.
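      As a rough cross-check, the likelihood ratios and diagnostic odds ratios can be approximated from the sensitivities and specificities reported in the abstract. The sketch below uses those rounded percentages, so its output will differ slightly from values calculated from the underlying counts (such as those in Table 6); the dictionary and function names are illustrative.

```python
def ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative, lr_positive / lr_negative


# Sensitivity and specificity as reported (rounded) for the default analysis.
REPORTED = {
    "HRT MRA": (0.870, 0.639),
    "HRT GPS": (0.815, 0.677),
    "OCT": (0.769, 0.785),
    "GDx": (0.351, 0.972),
}

for test_name, (sens, spec) in REPORTED.items():
    lr_pos, lr_neg, dor = ratios(sens, spec)
    print(f"{test_name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, DOR {dor:.2f}")
```

      With these rounded inputs, HRT GPS gives the smallest diagnostic odds ratio and GDx the largest, in line with the range reported above.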
      We explored with sensitivity analyses the performance of the tests across the spectrum of disease by excluding mild or moderate glaucoma diagnoses, or both, as shown in Figure 2. Restricting the reference standard to a diagnosis of severe glaucoma yielded very high sensitivity (95%) for HRT MRA, HRT GPS, and OCT; of these 3, OCT demonstrated the highest specificity (71%).
      Figure 2. Graph showing the sensitivity and specificity of the tests across the spectrum of disease. GPS = glaucoma probability score; HRT = Heidelberg retinal tomography; MRA = Moorfields regression analysis; OCT = optical coherence tomography.
      We also explored the effect of including a borderline classification as an abnormal index test in addition to outside normal limits results, shown in Figure 3. Figure 4 (available at www.aaojournal.org) shows the categorization flow diagram for the analysis in Figure 3. Sensitivity was higher for all tests than under the default analysis, but with correspondingly lower specificity. Likelihood ratios (with 95% CIs) showed that all 4 imaging tests could help both to rule in and to rule out glaucoma (no CI contained 1.0). The positive likelihood ratio for GDx was substantially higher than for the other tests (all of which were less than 5). This was because the very high specificity of GDx more than offset its relatively low sensitivity. The diagnostic odds ratios ranged from 7.36 for GDx to 14.62 for HRT MRA. Table 8 (available at www.aaojournal.org) shows a further analysis of the diagnostic performance of the imaging tests when glaucoma suspect cases as well as glaucoma cases were classified as cases of disease for the reference standard.
      Figure 3. Graph showing comparison of the sensitivity and specificity of the tests when test results classified as borderline or outside normal limits were included as abnormal test results, compared with test results of outside normal limits only. GPS = glaucoma probability score; HRT = Heidelberg retinal tomography; MRA = Moorfields regression analysis; NFI = nerve fiber indicator; OCT = optical coherence tomography.
      Figure 5 (available at www.aaojournal.org) shows the receiver operating characteristic (ROC) curves for each test under the default analysis. The areas under the ROC curve were similar, with OCT having the highest (0.84) and MRA the lowest (0.79). There was no statistical evidence of a difference between them (P = 0.077).
      We further explored the diagnostic accuracy of using combinations of tests (Table 9; available at www.aaojournal.org). The HRT MRA was combined with the other 3 tests to form a combined test. Combining tests in this way increased the sensitivity as expected, but only marginally, and at the expense of a greater decrease in specificity. Finally, the post hoc informal exploration of center differences showed some indication of MRA specificity varying between centers.

      Discussion

      The GATE study was a large, prospective, within-patient, comparative diagnostic accuracy study that provided the sensitivity and specificity data of 4 diagnostic imaging tests for glaucoma. In our study, most participants had successful testing and all 4 imaging tests had some value in terms of diagnosing or ruling out glaucoma. The HRT MRA had the highest sensitivity, but lower specificity, compared with other tests. By contrast, GDx had the best specificity, although the lowest sensitivity. The HRT GPS results were similar to those of the HRT MRA, as might be anticipated given that their analysis is based on imaging the same structure (i.e., the optic disc), and this finding has been reported previously in different populations (Coops et al., 2006).
      Sensitivity for OCT was of very similar magnitude to its specificity. Most tests conducted yielded good-quality automated test results, between 92% (GDx) and 97% (OCT). Optical coherence tomography had the lowest percentage of low-quality imaging results and GDx had the highest, according to the image quality classification provided in the device software. There also was a small percentage of cases (<5%) in which an automated imaging test result could not be produced.
      A number of sensitivity analyses were carried out to assess the robustness of the findings of the primary analysis. The HRT MRA had consistently the highest sensitivity across most analyses, but at a cost of lower specificity compared with other tests. When excluding cases of mild glaucoma, HRT GPS had higher sensitivity than HRT MRA. For severe glaucoma, OCT had the highest sensitivity and specificity. By contrast, GDx had the highest specificity, but the lowest sensitivity, across all analyses. None of the imaging tests had a sensitivity of 100% in identifying severe glaucoma.
      Varying the definition of an abnormal imaging test result by counting the borderline category as positive had the anticipated effect of improving the detection of glaucoma, although at the expense of more nonglaucoma cases being classified falsely as glaucoma. Using combinations of 2 imaging tests led to a marginal improvement in the detection of glaucoma and a reduction in the number of indeterminate test results. Because the HRT GPS and HRT MRA analyses consistently gave the highest sensitivity and both are available on the same machine (HRT), HRT sensitivity could be increased by assigning a diagnosis of glaucoma when either the HRT GPS or the HRT MRA classification was outside normal limits. However, the gain in sensitivity was smaller than the accompanying loss in correctly identifying nondisease cases. The additional cost and practical implications of combining technologies, including training and the purchase of additional equipment, suggest that the use of a single technology is to be preferred. The informal exploration of test performance by center suggested that HRT MRA performance may vary slightly more between centers than that of the other tests. However, differences between centers in terms of population are likely and may explain this; alternatively, it may be a chance finding.
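      The "either test abnormal" rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, and the 10 paired results are invented toy data chosen only to show the mechanism.

```python
def sens_spec(test_positive, has_disease):
    """Compute (sensitivity, specificity) from paired per-participant booleans."""
    pairs = list(zip(test_positive, has_disease))
    tp = sum(1 for t, d in pairs if t and d)
    fn = sum(1 for t, d in pairs if not t and d)
    tn = sum(1 for t, d in pairs if not t and not d)
    fp = sum(1 for t, d in pairs if t and not d)
    return tp / (tp + fn), tn / (tn + fp)


def either_positive(test_a, test_b):
    """Combined test that is abnormal when either component test is abnormal."""
    return [a or b for a, b in zip(test_a, test_b)]


# Invented toy data: 4 participants with disease, 6 without.
has_disease = [True, True, True, True, False, False, False, False, False, False]
test_a = [True, True, True, False, True, True, False, False, False, False]
test_b = [True, False, False, True, False, True, True, False, False, False]

combined = either_positive(test_a, test_b)
for label, result in (("A", test_a), ("B", test_b), ("A or B", combined)):
    sens, spec = sens_spec(result, has_disease)
    print(f"{label}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

      A combined "either abnormal" test can never be less sensitive than its components, because every case detected by one test is still detected. For the same reason it can never be more specific, because every false positive from either test is retained.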
      Although the automated imaging classifications performed well at identifying disease (sensitivity), they did not perform as well in identifying normal cases (specificity). The exception to this was GDx, which had low sensitivity but very high specificity. The choice of which imaging test is to be preferred to aid diagnosis reflects the inherent trade-off regarding diagnostic testing, where the desire not to miss glaucoma when present must be balanced against the desire to identify correctly those who are without disease.
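      One way to make this trade-off concrete is to convert sensitivity and specificity into predictive values at a given prevalence. The sketch below is illustrative only and is not an analysis reported in this study. It uses the rounded estimates for HRT MRA and GDx and assumes a prevalence of 16.8%, the proportion of participants diagnosed with glaucoma in at least 1 eye in this referred cohort. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


PREVALENCE = 0.168  # assumption: glaucoma in at least 1 eye in this cohort

for name, sens, spec in (("HRT MRA", 0.870, 0.639), ("GDx", 0.351, 0.972)):
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    print(f"{name}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

      Under these assumptions, the more sensitive test gives the more reassuring negative result and the more specific test gives the more convincing positive result. Both values would change in a population with a different prevalence.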
      Among the strengths of this study is the consecutive recruitment of subjects without a known history of disease, which reflects a potential clinical application of a diagnostic test. Patients were referred from community optometrists, and this gives a representative sample of current UK practice. The recruited cohort included a large percentage of patients without disease as well as those across the spectrum of glaucoma, including glaucoma suspects and those with ocular hypertension. Prevalence of glaucoma in this cohort was 16.8%. Other reported studies evaluating the performance of these diagnostic technologies often have compared patients already diagnosed with glaucoma with participants already identified as normal, a design that carries a risk of selection bias. A previous head-to-head comparison of the best-performing parameters of earlier versions of the imaging devices (the Heidelberg Retina Tomograph II, GDx VCC, and Stratus OCT) found similar performance across techniques in a prospective sample of glaucoma patients compared with healthy volunteers (Medeiros et al., 2004).
      However, the sample did not represent routine practice (i.e., when the disease status is unknown), and the technologies have been upgraded since.
      The tests that we compared are all routinely available in clinical practice, and each provided an automated classification of glaucoma status. By using the automated outputs, there is no need to interpret the large number of images and parameters that are produced by the technologies, and this therefore may be attractive to clinicians who are not glaucoma experts. An additional strength of our study is that we used a paired design in which participants underwent all 4 tests, allowing a head-to-head comparison. We chose as the reference standard the clinical assessment provided by different ophthalmologists with glaucoma expertise. All ophthalmologists were trained in the study protocols and agreed to a common set of criteria to define glaucoma and normality and were masked to the imaging test results when making their diagnosis. By using different ophthalmologists working at different units, the results of the study are more likely to be generalizable than results from studies performed in a single unit.
      Among the limitations, we recognize that diagnosing glaucoma during the very early stage of disease is challenging. Although consensus was sought through structured discussions and agreement, some assessor differences may have remained between the centers. Ideally, a longitudinal follow-up would provide the best possible reference standard. This was proposed by Medeiros et al. (2005), who used optic nerve head progression on stereophotographic examination as the criterion for glaucoma diagnosis. We used a single eye (worse diagnosis) per patient, and assessment of the presence of glaucoma in either eye may have a slightly different diagnostic performance. A limitation of the HRT MRA analysis is that it requires a trained technician to identify manually the margins of the optic disc on the acquired image, and this may affect the diagnostic performance if it is not carried out appropriately. There are other OCT instruments in clinical use with glaucoma diagnostic capabilities, and the results of this study using the Spectralis device may not be fully applicable to other OCT technologies. In this study, we analyzed only a small proportion of the data that these devices can generate. In addition, OCT hardware and software are evolving rapidly, and new developments may improve their diagnostic performance. Also, current OCT technology can now quantify macular parameters that may be useful to diagnose glaucoma, for example, in myopia.
      The results of this study have implications for clinical practice. No test demonstrated or could have been expected to demonstrate 100% sensitivity and specificity, and less-than-perfect results reflect a combination of factors that include an imperfect reference standard and normative databases, at least in part because of the variability of normal anatomic features and instrument operator–dependent factors. Thus, relying solely on the automatic classification to diagnose glaucoma or using this as a replacement for visual field testing is not recommended. We suggest automated imaging technologies are best suited as an adjunct to the ophthalmologist's assessment. It should also be noted that the clinical biomicroscopic examination of the optic nerve and nerve fiber layer yields a true stereoscopic view of the structures that may reveal glaucomatous features, such as optic disc hemorrhages, that are not visible in the computer-generated images from the imaging technologies used in this study. These imaging instruments probably have their greatest usefulness when assisting in the diagnosis of more subtle glaucomatous optic neuropathy, particularly in situations where the nerve fiber layer and optic disc are indistinct clinically. Additionally, they may assist clinicians who are not glaucoma specialists, and for trainees the additional information provided by these devices can be a valuable component of training. From a UK perspective, using the automated classifications of these imaging technologies as a triage test may prove to be cost-effective (Azuara-Blanco et al., 2016).
      In conclusion, this study offers valuable insight into the comparative diagnostic performance of automated diagnostic tests using some of their outputs (disc classification for HRT and RNFL classification for GDx enhanced corneal compensation and Spectralis OCT) that are easy to read and interpret and are encountered in a real-life setting. Automated testing can aid glaucoma diagnosis. However, reliance solely on imaging (as used in this study) as a means of diagnosis is not recommended because some patients with severe glaucoma may be missed.

      Acknowledgments

      The authors thank all the GATE study participants and the staff at each of our recruiting centers for taking part in this study: NHS Grampian, Hinchingbrooke Hospital NHS Trust, Bedford Hospital NHS Trust, Moorfields Eye Hospital NHS Foundation Trust, and The Royal Liverpool and Broadgreen University Hospitals NHS Trust; the members of our independent steering committee: Colm O'Brien (chair), Anthony King, Anja Tuulonen, Russell Young, and David Wright; the staff at the GATE study office; and Gladys McPherson and the programming team based in the Centre for Healthcare Randomised Trials within the Health Services Research Unit, University of Aberdeen.

      References

        • Prum Jr., B.E.
        • Rosenberg L.F.
        • Gedde S.J.
        • et al.
        Primary Open-Angle Glaucoma Preferred Practice Pattern® Guidelines.
        Ophthalmology. 2016; 123: P41-P111
        • Reus N.J.
        • Lemij H.G.
        • Garway-Heath D.F.
        • et al.
        Clinical assessment of stereoscopic optic disc photographs for glaucoma: the European Optic Disc Assessment Trial.
        Ophthalmology. 2010; 117: 717-723
        • Stein J.D.
        • Talwar N.
        • Laverne A.M.
        • et al.
        Trends in use of ancillary glaucoma tests for patients with open-angle glaucoma from 2001 to 2009.
        Ophthalmology. 2012; 119: 748-758
        • Lin S.C.
        • Singh K.
        • Jampel H.D.
        • et al.
        • American Academy of Ophthalmology
        • Ophthalmic Technology Assessment Committee Glaucoma Panel
        Optic nerve head and retinal nerve fiber layer analysis: a report by the American Academy of Ophthalmology.
        Ophthalmology. 2007; 114: 1937-1949
        • Wollstein G.
        • Garway-Heath D.F.
        • Hitchings R.A.
        Identification of early glaucoma cases with the scanning laser ophthalmoscope.
        Ophthalmology. 1998; 105: 1557-1563
        • Coops A.
        • Henson D.B.
        • Kwartz A.J.
        • Artes P.H.
        Automated analysis of Heidelberg retina tomograph optic disc images by glaucoma probability score.
        Invest Ophthalmol Vis Sci. 2006; 47: 5348-5355
        • Burgansky-Eliash Z.
        • Wollstein G.
        • Bilonick R.A.
        • et al.
        Glaucoma detection with the Heidelberg retina tomograph 3.
        Ophthalmology. 2007; 114: 466-471
        • Garas A.
        • Vargha P.
        • Holló G.
        Comparison of diagnostic accuracy of the RTVue Fourier-domain OCT and the GDx-VCC/ECC polarimeter to detect glaucoma.
        Eur J Ophthalmol. 2012; 22: 45-54
        • Sehi M.
        • Iverson S.M.
        Glaucoma diagnosis and monitoring using advanced imaging technologies.
        US Ophthalmic Rev. 2013; 6: 15-25
        • Tay E.
        • Andreou P.
        • Xing W.
        • et al.
        A questionnaire survey of patient acceptability of optic disc imaging by HRT II and GDx.
        Br J Ophthalmol. 2004; 88: 719-720
        • Burr J.
        • Mowatt G.
        • Siddiqui M.A.R.
        • et al.
        The clinical and cost effectiveness of screening for open angle glaucoma: a systematic review and economic evaluation.
        Health Technol Assess. 2007; 11 (ix-x, 1–190): iii-iv
        • Medeiros F.A.
        • Ng D.
        • Zangwill L.M.
        • et al.
        The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma.
        Invest Ophthalmol Vis Sci. 2007; 48: 214-222
        • National Institute for Health Research
        Evaluation, Trials and Studies.
        Available at: http://www.nets.nihr.ac.uk/projects/hta/0922111. Accessed February 1, 2015.
        • Carl Zeiss Meditec, Inc.
        GDxPRO Scanning Laser Polarimeter: User Manual.
        Available at: www.amedeolucente.it/pdf/GDxPRO_User_Manual.pdf. Accessed September 2014.
        • National Institute for Health and Clinical Excellence
        Glaucoma: Diagnosis and Management of Chronic Open Angle Glaucoma and Ocular Hypertension. Clinical Guidelines CG85.
        (Available at) (Accessed August 1, 2015)
        • McNemar Q.
        Note on the sampling error of the difference between correlated proportions or percentages.
        Psychometrika. 1947; 12: 153-157
        • Newcombe R.G.
        Improved confidence intervals for the difference between binomial proportions based on paired data.
        Stat Med. 1998; 17: 2635-2650
        • DeLong E.R.
        • DeLong D.M.
        • Clarke-Pearson D.L.
        Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.
        Biometrics. 1988; 44: 837-845
        • Bossuyt P.M.
        • Reitsma J.B.
        • Bruns D.E.
        • et al.
        Standards for reporting of diagnostic accuracy. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration.
        Ann Intern Med. 2003; 138: W1-W12
        • Medeiros F.A.
        • Zangwill L.M.
        • Bowd C.
        • Weinreb R.N.
        Comparison of the GDx VCC scanning laser polarimeter, HRT II confocal scanning laser ophthalmoscope, and Stratus OCT optical coherence tomograph for the detection of glaucoma.
        Arch Ophthalmol. 2004; 122: 827-837
        • Medeiros F.A.
        • Zangwill L.M.
        • Bowd C.
        • et al.
        Use of progressive glaucoma or optic disc change as the reference standard for evaluation of diagnostic tests in glaucoma.
        Am J Ophthalmol. 2005; 139: 1010-1018
        • Azuara-Blanco A.
        • Banister K.
        • Boachie C.
        • et al.
        Automated imaging technologies for the diagnosis of glaucoma: a comparative diagnostic study for the evaluation of the diagnostic accuracy, performance as triage tests and cost-effectiveness (GATE study).
        Health Technol Assess. 2016; 20: 1-168