G. Evidence Hierarchies

So should we assess evidence the way Michelin guides assess hotels and restaurants? (Glasziou 2004).

Researchers often use evidence hierarchies or other frameworks to portray the relative quality of various study designs for the purposes of evaluating single studies as well as a body of evidence comprising multiple studies. An example of a basic evidence hierarchy is:

  • Systematic reviews and meta-analyses of RCTs
  • Randomized controlled trials (RCTs)
  • Non-randomized controlled trials
  • Prospective observational studies
  • Retrospective observational studies
  • Expert opinion

In this instance, as is common in such hierarchies, the top item is a systematic review of RCTs, an integrative method that pools data or results from multiple single studies. (Hierarchies for single primary data studies typically have RCTs at the top.) Also, the bottom item, expert opinion, does not comprise evidence as such, though it may reflect the judgment of one or more people drawing on their perceptions of scientific evidence, personal experience, and other subjective input. There are many versions of such hierarchies, including some with more extensive levels/breakdowns.
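
As a minimal sketch, the basic hierarchy above can be written as an ordered enumeration in Python. The class name, member names, and example studies are hypothetical, chosen only for illustration. Sorting by design rank alone shows what such a hierarchy captures (the design label) and what it leaves out (how well each study was designed and carried out).

```python
from enum import IntEnum


class DesignRank(IntEnum):
    """Positions in the basic hierarchy listed above (1 = top)."""
    SYSTEMATIC_REVIEW_OF_RCTS = 1
    RCT = 2
    NON_RANDOMIZED_CONTROLLED_TRIAL = 3
    PROSPECTIVE_OBSERVATIONAL = 4
    RETROSPECTIVE_OBSERVATIONAL = 5
    EXPERT_OPINION = 6


def rank_by_design(studies):
    """Sort (name, DesignRank) pairs from highest- to lowest-ranked design.

    Only the design label is used; nothing about study conduct is considered.
    """
    return sorted(studies, key=lambda study: study[1])


# Hypothetical studies, invented for illustration only.
studies = [
    ("registry analysis", DesignRank.RETROSPECTIVE_OBSERVATIONAL),
    ("pooled trial review", DesignRank.SYSTEMATIC_REVIEW_OF_RCTS),
    ("single-site trial", DesignRank.RCT),
]
ordered = rank_by_design(studies)
```

Because the sort key is the design label only, a poorly conducted trial would still be placed above a carefully conducted observational study.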

Hierarchies cannot, moreover, accommodate evidence that relies on combining the results from RCTs and observational studies (Rawlins 2008).

As noted earlier in this chapter, although the general type or name of a study design (e.g., RCT, prospective cohort study, case series) conveys certain attributes about the quality of a study, the study design name itself is not a good proxy for study quality. One of the weaknesses of these conventional one-dimensional evidence hierarchies is that, while they tend to reflect internal validity, they do not generally reflect external validity, i.e., the applicability of the evidence to more diverse patients and care settings. Depending on the intended use of the findings of a single study or of a body of evidence, an assessment of internal validity may be insufficient. Such hierarchies do not lend themselves to characterizing the quality of a body of diverse, complementary evidence that may yield fuller understanding of how well an intervention works across a heterogeneous population in different real-world circumstances. Box III-12 lists these and other limitations of conventional evidence hierarchies.

Box III-12. Limitations of Conventional Evidence Hierarchies

  • Originally developed for pharmacological models of therapy
  • Poor design and implementation of high-ranking study designs may yield less valid findings than lower-ranking, though better designed and implemented, study types
  • Emphasis on experimental control, while enhancing internal validity, can jeopardize external validity
  • Cannot accommodate evidence that relies on considering or combining results from multiple study designs
  • Though intended to address internal validity of causal effect of an intervention on outcomes, they have been misapplied to questions about diagnostic accuracy, prognosis, or adverse events
  • The number of existing hierarchies (60+) and the inconsistencies among them suggest shortcomings, e.g., in:
    • ranking of meta-analyses relative to RCTs
    • ranking of different observational studies
    • terminology (“cohort studies,” “quasi-experimental,” etc.)

Sources: See, e.g.:

  • Glasziou P, et al. Assessing the quality of research. BMJ. 2004;328:39-41.
  • Rawlins MD. On the evidence for decisions about the use of therapeutic interventions. The Harveian Oration of 2008. London: Royal College of Physicians, 2008.
  • Walach H, et al. Circular instead of hierarchical: methodological principles for the evaluation of complex interventions. BMC Med Res Methodol. 2006;6:29.

Box III-13 shows an evidence framework from the Oxford Centre for Evidence-Based Medicine that defines five levels of evidence for each of several types of evidence questions pertaining to disease prevalence, screening tests, diagnostic accuracy, therapeutic benefits, and therapeutic harms. The lowest level of evidence for several of these evidence questions, “Mechanism-based reasoning,” refers to some plausible scientific basis, e.g., biological, chemical, or mechanical, for the impact of an intervention. Although the framework is still one-dimensional for each type of evidence question, it does allow for moving up or down a level based on study attributes other than the basic study design.

More useful evidence appraisal extends beyond rigid one-dimensional evidence hierarchies while still weighing the respective methodological strengths and limitations of various study designs. Such appraisal (Glasziou 2004; Howick 2009; Walach 2006) recognizes that:

  • Appraising evidence quality must extend beyond categorizing study designs
  • Different types of research questions call for different study designs
  • It is more important for “direct” evidence to demonstrate that the effect size is greater than the combined influence of plausible confounders than it is for the study to be experimental
  • Best scientific evidence, for a pragmatic estimate of effectiveness and safety, may derive from a complementary set of methods
    • They can offset respective weaknesses/vulnerabilities
    • “Triangulating” findings achieved with one method by replicating them with other methods may provide a more powerful and comprehensive approach than the prevailing RCT approach
  • Systematic reviews are necessary, no matter the research type

Box III-13. Oxford Centre for Evidence-Based Medicine 2011 Levels of Evidence

Table 1 of 2. You can view the complete table in its PDF format by going to http://www.cebm.net/wp-content/uploads/2014/06/CEBM-Levels-of-Evidence-2.1.pdf.


  • How common is the problem?
    • Step 1 (Level 1*): Local and current random sample surveys (or censuses)
    • Step 2 (Level 2*): Systematic review of surveys that allow matching to local circumstances**
    • Step 3 (Level 3*): Local non-random sample**
  • Is this diagnostic or monitoring test accurate? (Diagnosis)
    • Step 1 (Level 1*): Systematic review of cross sectional studies with consistently applied reference standard and blinding
    • Step 2 (Level 2*): Individual cross sectional studies with consistently applied reference standard and blinding
    • Step 3 (Level 3*): Non-consecutive studies, or studies without consistently applied reference standards**
  • What will happen if we do not add a therapy? (Prognosis)
    • Step 1 (Level 1*): Systematic review of inception cohort studies
    • Step 2 (Level 2*): Inception cohort studies
    • Step 3 (Level 3*): Cohort study or control arm of randomized trial*
  • Does this intervention help? (Treatment Benefits)
    • Step 1 (Level 1*): Systematic review of randomized trials or n-of-1 trials
    • Step 2 (Level 2*): Randomized trial or observational study with dramatic effect
    • Step 3 (Level 3*): Non-randomized controlled cohort/follow-up study**
  • What are the COMMON harms? (Treatment Harms)
    • Step 1 (Level 1*): Systematic review of randomized trials, systematic review of nested case-control studies, n-of-1 trial with the patient you are raising the question about, or observational study with dramatic effect
    • Step 2 (Level 2*): Individual randomized trial or (exceptionally) observational study with dramatic effect
    • Step 3 (Level 3*): Non-randomized controlled cohort/follow-up study (post-marketing surveillance) provided there are sufficient numbers to rule out a common harm. (For long-term harms the duration of follow-up must be sufficient.)**
  • What are the RARE harms? (Treatment Harms)
    • Step 1 (Level 1*): Systematic review of randomized trials or n-of-1 trial
    • Step 2 (Level 2*): Randomized trial or (exceptionally) observational study with dramatic effect
  • Is this (early detection) test worthwhile? (Screening)
    • Step 1 (Level 1*): Systematic review of randomized trials
    • Step 2 (Level 2*): Randomized trial
    • Step 3 (Level 3*): Non-randomized controlled cohort/follow-up study**

Table 2 of 2.

  • How common is the problem?
    • Step 4 (Level 4*): Case-series**
    • Step 5 (Level 5): n/a
  • Is this diagnostic or monitoring test accurate? (Diagnosis)
    • Step 4 (Level 4*): Case-control studies, or poor or non-independent reference standard**
    • Step 5 (Level 5): Mechanism-based reasoning
  • What will happen if we do not add a therapy? (Prognosis)
    • Step 4 (Level 4*): Case-series or case-control studies, or poor quality prognostic cohort study**
    • Step 5 (Level 5): n/a
  • Does this intervention help? (Treatment Benefits)
    • Step 4 (Level 4*): Case-series, case-control studies, or historically controlled studies**
    • Step 5 (Level 5): Mechanism-based reasoning
  • What are the COMMON harms? (Treatment Harms)
    • Step 4 (Level 4*): Case-series, case-control, or historically controlled studies**
    • Step 5 (Level 5): Mechanism-based reasoning
  • What are the RARE harms? (Treatment Harms)
  • Is this (early detection) test worthwhile? (Screening)
    • Step 4 (Level 4*): Case-series, case-control, or historically controlled studies**
    • Step 5 (Level 5): Mechanism-based reasoning

* Level may be graded down on the basis of study quality, imprecision, indirectness (study PICO does not match questions PICO), because of inconsistency between studies, or because the absolute effect size is very small; Level may be graded up if there is a large or very large effect size.

** As always, a systematic review is generally better than an individual study.

Source: OCEBM Levels of Evidence Working Group. The Oxford 2011 Levels of Evidence. Oxford Centre for Evidence-Based Medicine. http://www.cebm.net/index.aspx?o=5653
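
The grading note in Box III-13 can be read as a simple adjustment rule applied to a starting level. The sketch below is one possible reading, not the OCEBM method itself: the function name, the reason labels, and the choice to move exactly one level per adjustment are assumptions made for illustration, since the note does not say how many levels an adjustment is worth.

```python
# Reasons for grading down, paraphrased from the note in Box III-13.
DOWNGRADE_REASONS = {
    "study quality",
    "imprecision",
    "indirectness",
    "inconsistency",
    "very small absolute effect",
}


def adjust_level(base_level, reasons=(), large_effect=False):
    """Return an adjusted level of evidence between 1 (highest) and 5.

    Assumption (not from the source): any downgrade reason moves the level
    one step down, and a large effect moves it one step up.
    """
    if base_level not in range(1, 6):
        raise ValueError("base_level must be an integer from 1 to 5")
    reasons = set(reasons)
    unknown = reasons - DOWNGRADE_REASONS
    if unknown:
        raise ValueError(f"unrecognized downgrade reasons: {sorted(unknown)}")

    level = base_level
    if reasons:
        level += 1  # graded down: a larger number is a lower level
    if large_effect:
        level -= 1  # graded up
    return max(1, min(5, level))  # keep the result within Levels 1 to 5


# A single randomized trial starts at Level 2 for a treatment-benefit question.
imprecise_trial = adjust_level(2, reasons=["imprecision"])
```

Under these assumptions, a randomized trial graded down for imprecision is treated as Level 3, the same level as a non-randomized controlled cohort study in the treatment-benefit row.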
