B. Assessing the Quality of Primary Data Studies

Our confidence that a study's estimate of a treatment effect, of the accuracy of a screening or diagnostic test, or of another impact of a health care technology is correct reflects our understanding of the quality of that study. For various types of interventions, we examine certain attributes of the design and conduct of a study to assess its quality. For example, some of the attributes or criteria commonly used to assess the quality of studies for demonstrating the internal validity of the impact of therapies on health outcomes are the following:

  • Prospective, i.e., following a study population over time as it receives an intervention or exposure and experiences outcomes, rather than retrospective design
  • Experimental rather than observational
  • Controlled, i.e., with one or more comparison groups, rather than uncontrolled
  • Contemporaneous control groups rather than historical ones
  • Internal (i.e., managed within the study) control groups rather than external ones
  • Concealment of the allocation of patients to intervention and control groups
  • Randomized assignment of patients to intervention and control groups
  • Blinding of patients, clinicians, and investigators as to patient assignment to intervention and control groups
  • Large enough sample size (number of patients/participants) to detect true treatment effects with statistical significance
  • Minimal patient drop-outs or loss to follow-up (or differences in these between intervention and control groups) for the duration of the study
  • Consistency of pre-specified study protocol (patient populations, assignment to intervention and control groups, regimens, etc.) and outcome measures with the reported (post-study) protocol and outcome measures

Similarly, some attributes that are commonly used for assessing the external validity of the impact of therapies and other technologies on health outcomes include:

  • Flexible entry criteria to identify/enroll a patient population that is representative of patient diversity likely to be offered the intervention in practice, including demographic characteristics, risk factors, disease stage/severity, comorbidities
  • Large enough patient population to conduct meaningful subgroup analyses (especially for pre-specified subgroups)
  • Dosing, regimen, technique, delivery of the intervention consistent with anticipated practice
  • Comparator is standard of care or other relevant, clinically acceptable (not-substandard) intervention
  • Dosing, regimen, or other forms of delivering the comparator consistent with standard care
  • Patient monitoring and efforts to maintain patient adherence comparable to those in practice
  • Accompanying/concurrent/ancillary care similar to what will be provided in practice
  • Training, expertise, skills of clinicians and other health care providers similar to those available or feasible for providers anticipated to deliver the intervention
  • Selection of outcome measures relevant to those experienced by and important to intended patient groups
  • Systematic effort to follow up on all patients to minimize attrition
  • Intention-to-treat analysis used to account for all study patients
  • Study duration consistent with the course/episode of disease/condition in practice in order to detect outcomes of importance to patients and clinicians
  • Multiple study sites representative of type/level of health care settings and patient and clinician experience anticipated in practice

RCTs are designed to maximize internal validity, and are generally regarded as the “gold standard” study design for demonstrating the causal impact of a technology on health care outcomes. However, some attributes that strengthen the internal validity of RCTs tend to diminish RCTs’ external validity. Probing the strengths and limitations of RCTs with respect to internal and external validity is also instructive for understanding the utility of other studies. A variety of design aspects intended to improve the external validity of RCTs and related experimental designs are described briefly later in this chapter.

The commonly recognized attributes of study quality noted above that strengthen internal and external validity of primary data studies are derived from an extensive body of methodological concepts and principles, including those summarized below: confounding and the need for controls, prospective vs. retrospective design, sources of bias, random error, and selected other factors.

1. Types of Validity in Methods and Measurement

Whether they are experimental or non-experimental in design, studies vary in their ability to produce valid findings. Validity refers to how well a study or data collection instrument measures what it is intended to measure. Understanding different aspects of validity helps in comparing strengths and weaknesses of alternative study designs and our confidence in the findings generated by those studies. Although these concepts are often addressed in reference to primary data methods, they generally apply as well to integrative methods.

Internal validity refers to the extent to which the results of a study accurately represent the causal relationship between an intervention and an outcome in the particular circumstances of that study. This includes the extent to which the design and conduct of a study minimize the risk of any systematic (non-random) error (i.e., bias) in the study results. Internal validity can be suspect when biases in the design or conduct of a clinical trial or other study could have affected outcomes, thereby causing the study results to deviate from the true magnitude of the treatment effect. True experiments such as RCTs generally have high internal validity.

External validity refers to the extent to which the results of a study conducted under particular circumstances can be generalized (or are applicable) to other circumstances. When the circumstances of a particular study (e.g., patient characteristics, the technique of delivering a treatment, or the setting of care) differ from the circumstances of interest (e.g., patients with different characteristics, variations in the technique of delivering a treatment, or alternative settings of care), the external validity of the results of that study may be limited.

Construct validity refers to how well a measure is correlated with other accepted measures of the construct of interest (e.g., pain, anxiety, mobility, quality of life), and discriminates between groups known to differ according to the construct. Face validity is the ability of a measure to represent reasonably (i.e., to be acceptable “on its face”) a construct (i.e., a concept, trait, or domain of interest) as judged by someone with knowledge or expertise in the construct.

Content validity refers to the degree to which the set of items of a data collection instrument is known to represent the range or universe of meanings or dimensions of a construct of interest. For example, how well do the domains of a health-related quality of life index for rheumatoid arthritis represent the aspects of quality of life or daily functioning that are important to patients with rheumatoid arthritis?

Criterion validity refers to how well a measure, including its various domains or dimensions, is correlated with a known gold standard or definitive measurement, if one exists. The similar concept of concurrent validity refers to how well a measure correlates with a previously validated one, and to the ability of a measure to accurately differentiate between different groups at the time the measure is applied. Predictive validity refers to the ability to use differences in a measure of a construct to predict future events or outcomes. It may be considered a subtype of criterion validity.

Convergent validity refers to the extent to which different measures that are intended to measure the same construct actually yield similar results, such as two measures of quality of life. Discriminant validity concerns whether different measures that are intended to measure different constructs actually fail to be positively associated with each other. Convergent validity and discriminant validity contribute to, or can be considered subtypes of, construct validity.
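
To make these correlation-based notions concrete, the following minimal sketch (Python 3.10+) checks convergent, discriminant, and criterion validity with Pearson correlations; the measure names and scores are invented toy data, not drawn from any actual instrument.

```python
# Illustrative sketch with invented toy data; not from the source text.
from statistics import correlation  # Pearson's r; available in Python 3.10+

qol_measure_a = [62, 55, 71, 48, 80, 66, 59, 73]  # new quality-of-life scale
qol_measure_b = [60, 52, 74, 45, 78, 69, 57, 70]  # established scale, same construct
anxiety_scale = [30, 44, 25, 49, 18, 28, 41, 22]  # different construct
gold_standard = [64, 54, 72, 47, 81, 68, 58, 71]  # definitive criterion measure, if one exists

# Convergent validity: two measures of the same construct should correlate highly.
print("convergent:  ", round(correlation(qol_measure_a, qol_measure_b), 2))

# Discriminant validity: measures of different constructs should fail to be
# strongly positively associated with each other.
print("discriminant:", round(correlation(qol_measure_a, anxiety_scale), 2))

# Criterion validity: correlation with a gold-standard measurement.
print("criterion:   ", round(correlation(qol_measure_a, gold_standard), 2))
```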

2. Confounding and the Need for Controls

Confounding occurs when any factor that is associated with an intervention has an impact on an outcome that is independent of the impact of the intervention. As such, confounding can “mask” or muddle the true impact of an intervention. To diminish the impact of confounding factors, it is necessary to provide a basis for comparing what happens to patients who receive an intervention with what happens to those who do not.

The main purpose of control groups is to enable isolating the impact of an intervention of interest on patient outcomes from the impact of any extraneous factors. The composition of the control group is intended to be as close as possible to that of the intervention group, and both groups are managed as similarly as possible, so that the only difference between the groups is that one receives the intervention of interest and the other does not. In controlled clinical trials, the control groups may receive a current standard of care, no intervention, or a placebo.

For a factor to be a confounder in a controlled trial, it must differ between the intervention and control groups and be independently predictive of the outcome, i.e., it must have an impact on outcomes that is independent of the intervention of interest. Confounding can arise due to differences between the intervention and control groups, such as differences in baseline risk factors at the start of a trial or different exposures during the trial that could affect outcomes. Investigators may not be aware of all potentially confounding factors in a trial. Examples of potentially confounding factors are age, prevalence of comorbidities at baseline, and different levels of ancillary care. To the extent that potentially confounding factors are present at different rates between comparison groups, a study is subject to selection bias (described below).

Most controlled studies use contemporaneous controls alongside (i.e., constituted and followed simultaneously with) intervention groups. Investigators sometimes rely on historical control groups. However, a historical control group is subject to known or unknown inherent differences (e.g., risk factors or other prognostic factors) from a current intervention group, and environmental or other contextual differences arising due to the passage of time that may confound outcomes. In some instances, including those noted below, historical controls have sufficed to demonstrate definitive treatment effects. In a crossover design study, patients start in one group (intervention or control) and then are switched to the other (sometimes multiple times), thereby acting as their own controls, although such designs are subject to certain forms of bias.

Various approaches are used to ensure that intervention and control groups comprise patients with similar characteristics, diminishing the likelihood that baseline differences between them will confound observed treatment effects. The best of these approaches is randomization of patients to intervention and control groups. Random allocation diminishes the impact of any potentially known or unrecognized confounding factors by tending to distribute those factors evenly across the groups to be compared. “Pseudo-randomization” approaches such as alternate assignment or using birthdays or identification numbers to assign patients to intervention and control groups can be vulnerable to confounding.
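
The following toy simulation (Python; all parameters invented for illustration) shows why randomization matters: random allocation leaves the share of high-risk patients similar across arms, whereas assignment tied to patient characteristics, such as clinicians preferentially giving a new intervention to sicker patients, leaves the arms imbalanced on the confounder.

```python
# Toy simulation with invented parameters; not from the source text.
import random

random.seed(1)
n = 1000
# Baseline confounder: 1 = high-risk patient, 0 = low-risk.
high_risk = [1 if random.random() < 0.3 else 0 for _ in range(n)]

# Randomized assignment: allocation is independent of risk status.
randomized = [random.random() < 0.5 for _ in range(n)]

# Non-random assignment: clinicians preferentially give the new intervention
# to high-risk patients, a recipe for confounding.
clinician = [random.random() < (0.8 if r else 0.3) for r in high_risk]

def high_risk_share(assigned):
    """Share of high-risk patients in the intervention and control arms."""
    arm = [r for r, a in zip(high_risk, assigned) if a]
    ctl = [r for r, a in zip(high_risk, assigned) if not a]
    return round(sum(arm) / len(arm), 2), round(sum(ctl) / len(ctl), 2)

# The randomized arms carry similar shares of high-risk patients; the
# clinician-assigned arms differ markedly, so any observed difference in
# outcomes would be confounded by baseline risk.
print("randomized:", high_risk_share(randomized))
print("clinician: ", high_risk_share(clinician))
```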

Placebo Controls

Among the ongoing areas of methodological controversy in clinical trials is the appropriate use of placebo controls. Issues include: (1) the appropriateness of using a placebo in a trial of a new therapy when a therapy judged to be effective already exists; (2) the statistical requirements for discerning what may be smaller differences in outcomes between a new therapy and an existing one, compared to differences in outcomes between a new therapy and a placebo; (3) concerns about comparing a new treatment to an existing therapy that, except during the trial itself, may be unavailable in a given setting (e.g., a developing country) because of its cost or other economic or social constraints (Rothman 1994; Varmus 1997); and (4) when and how to use the placebo effect to patient advantage. As with other health technologies, surgical procedures can be subject to the placebo effect. Following previous missteps that raised profound ethical concerns, guidance was developed for using “sham” procedures as placebos in RCTs of surgical procedures (Horng 2003). Some instances of patient blinding have been most revealing about the placebo effect in surgery, including arthroscopic knee surgery (Moseley 2002), percutaneous myocardial laser revascularization (Stone 2002), and neurotransplantation surgery (Boer 2002). Even so, the circumstances in which placebo surgery is ethically and scientifically acceptable, as well as practically feasible and acceptable to enrolled patients, may be very limited (Campbell 2011).

In recent years there has been considerable scientific progress in understanding the physiological and psychological basis of the placebo response, prompting efforts to put it to use in improving outcomes. It remains important to control for the placebo effect in order to minimize its confounding effect on evaluating the treatment effect of an intervention. However, once a new drug or other technology is in clinical use, the patient expectations and learning mechanisms contributing to the placebo effect may be incorporated into medication regimens to improve patient satisfaction and outcomes. Indeed, this approach may be personalized based on patient genomics, medical history, and other individual characteristics (Enck 2013).

3. Prospective vs. Retrospective Design

Prospective studies are planned and implemented by investigators using real-time data collection. These typically involve identification of one or more patient groups according to specified risk factors or exposures, followed by collection of baseline (i.e., initial, prior to intervention) data, delivery of interventions of interest and controls, collecting follow-up data, and comparing baseline to follow-up data for the patient groups. In retrospective studies, investigators collect samples of data from past interventions and outcomes involving one or more patient groups.

Prospective studies are usually subject to fewer types of confounding and bias than retrospective studies. In particular, retrospective studies are more subject to selection bias than prospective studies. In retrospective studies, patients’ interventions and outcomes have already transpired and been recorded, raising opportunities for intentional or unintentional selection bias on the part of investigators. In prospective studies, patient enrollment and data collection can be designed to reduce bias (e.g., selection bias and detection bias), which is an advantage over most retrospective studies. Even so, the logistical challenges of maintaining blinding of patients and investigators are considerable, and unblinding can introduce performance and detection bias.

Prospective and retrospective studies have certain other relative advantages and disadvantages that render them more or less useful for certain types of research questions. Both are subject to certain types of missing or otherwise limited data. As retrospective studies primarily involve selection and analyses of existing data, they tend to be far less expensive than prospective studies. However, their dependence on existing data makes it difficult to fill data gaps or add data fields to data collection instruments, although they can rely in part on importing and adjusting data from other existing sources. Given the costs of enrolling enough patients and collecting sufficient data to achieve statistical significance, prospective studies tend to be more suited to investigating health problems that are prevalent and yield health outcomes or other events that occur relatively frequently and within short follow-up periods. The typically shorter follow-up periods of prospective studies may subject them to seasonal or other time-dependent biases, whereas retrospective studies can be designed to extract data from longer time spans. Retrospective studies offer the advantage of being able to canvass large volumes of data over extended time periods (e.g., from registries, insurance claims, and electronic health records) to identify patients with specific sets of risk factors or rare or delayed health outcomes, including certain adverse events.

4. Sources of Bias

The quality of a primary data study determines our confidence that its estimated treatment effect is correct. Bias refers to any systematic (i.e., not due to random error) deviation of an observation from the true nature of an event. In clinical trials, bias may arise from any factor that systematically distorts (increases or decreases) the observed magnitude of an outcome (e.g., treatment effect or harm) relative to the true magnitude of the outcome. As such, bias diminishes the accuracy (though not necessarily the precision; see discussion below) of an observation. Biases may arise from inadequacies in the design, conduct, analysis, or reporting of a study.

Major types of bias in comparative primary data studies are described below, including selection bias, performance bias, detection bias, attrition bias, and reporting bias (Higgins, Altman, Gøtzsche 2011; Higgins, Altman, Sterne 2011; Viswanathan 2014). Also noted are techniques and other study attributes that tend to diminish each type of bias. These attributes for diminishing bias also serve as criteria for assessing the quality of individual studies.

Selection bias refers to systematic differences between baseline characteristics of the groups that are compared, which can arise from, e.g., physician assignment of patients to treatments, patient self-selection of treatments, or association of treatment assignment with patient clinical characteristics or demographic factors. Among the means for diminishing selection bias are random sequence generation (random allocation of patients to treatment and control groups) and allocation concealment for RCTs, control groups to diminish confounders in cohort studies, and case matching in case-control studies.

Allocation concealment refers to the process of ensuring that the persons assessing patients for potential entry into a trial, as well as the patients, do not know whether any particular patient will be allocated to an intervention group or control group. This helps to prevent the persons who manage the allocation, or the patients, from influencing (intentionally or not) the assignment of a patient to one group or another. Patient allocation based on personal identification numbers, birthdates, or medical record numbers may not ensure concealment. Better methods include centralized randomization (i.e., managed at one site rather than at each enrollment site); sequentially numbered, opaque, sealed envelopes; and coded medication bottles or containers.
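
As a rough illustration of centralized randomization with allocation concealment, the sketch below (Python) generates a permuted-block allocation schedule, one common sequence-generation technique, keyed to sequential study IDs; the workflow and identifiers are hypothetical.

```python
# Hypothetical workflow sketch; not from the source text.
import random

def permuted_block_schedule(n_patients, block_size=4, seed=2024):
    """Centrally generated allocation list using permuted blocks."""
    rng = random.Random(seed)  # seed held only by the central coordinating site
    schedule = []
    while len(schedule) < n_patients:
        block = ["intervention"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)  # randomize arm order within each block
        schedule.extend(block)
    return schedule[:n_patients]

# The central site keeps this mapping. An enrollment site requests the next
# sequential study ID only after a patient is confirmed eligible and enrolled,
# so no one at the site can foresee the arm behind the next ID.
allocation = {f"ID-{i + 1:03d}": arm
              for i, arm in enumerate(permuted_block_schedule(12))}
for study_id, arm in allocation.items():
    print(study_id, "->", arm)
```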

Performance bias refers to systematic differences between comparison groups in the care that is provided, or in exposure to factors other than the interventions of interest. This includes, e.g., deviating from the study protocol or assigned treatment regimens so that patients in control groups receive the intervention of interest, providing additional or co-interventions unevenly to the intervention and control groups, and inadequately blinding providers and patients to assignment to intervention and control groups, thereby potentially affecting whether or how assigned interventions or exposures are delivered. Techniques for diminishing performance bias include blinding of patients and providers (in RCTs and other controlled trials in particular), adhering to the study protocol, and sustaining patients’ group assignments.

Detection (or ascertainment) bias refers to systematic differences between groups in how outcomes are assessed. These differences may arise due to, e.g., inadequate blinding of outcome assessors regarding patient treatment assignment, reliance on patient or provider recall of events (also known as recall bias), inadequate outcome measurement instruments, or faulty statistical analysis. Whereas allocation concealment is intended to ensure that persons who manage patient allocation, as well as the patients themselves, do not influence patient assignment to one group or another, blinding refers to preventing anyone who could influence assessment of outcomes from knowing which patients have been assigned to one group or another. Knowledge of patient assignment itself can affect outcomes as experienced by patients or assessed by investigators. Techniques for diminishing detection bias include blinding of outcome assessors, including patients, clinicians, investigators, and/or data analysts, especially for subjective outcome measures; and validated and reliable outcome measurement instruments and techniques.

Attrition bias refers to systematic differences between comparison groups in withdrawals (drop-outs) from a study, loss to follow-up, or other exclusion of patients/participants and how these losses are analyzed. Ignoring these losses or accounting for them differently between groups can skew study findings, as patients who withdraw or are lost to follow-up may differ systematically from those patients who remain for the duration of the study. Indeed, patients’ awareness of whether they have been assigned to a particular treatment or control group may differentially affect their likelihood of dropping out of a trial. Techniques for diminishing attrition bias include blinding of patients as to treatment assignment, completeness of follow-up data for all patients, and intention-to-treat analysis (with imputations for missing data as appropriate).
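
The following sketch (Python, invented toy data) contrasts an intention-to-treat analysis, which retains every randomized patient in the assigned arm, with a per-protocol analysis that excludes drop-outs and can thereby overstate the treatment effect.

```python
# Invented toy data; not from the source text.
patients = [
    # (assigned arm, completed study?, outcome: 1 = improved, 0 = not)
    ("intervention", True, 1), ("intervention", True, 1),
    ("intervention", True, 0), ("intervention", False, 0),
    ("intervention", False, 0),
    ("control", True, 1), ("control", True, 0),
    ("control", True, 0), ("control", True, 0),
    ("control", False, 0),
]

def improvement_rate(arm, include_dropouts):
    rows = [p for p in patients if p[0] == arm and (include_dropouts or p[1])]
    return round(sum(p[2] for p in rows) / len(rows), 2)

# Intention-to-treat: drop-outs stay in their assigned arm (here counted,
# conservatively, as not improved; in practice missing outcomes may be imputed).
print("ITT:         ", improvement_rate("intervention", True),
      "vs", improvement_rate("control", True))

# Per-protocol: drop-outs excluded, which can overstate the treatment effect
# when discontinuation is related to prognosis.
print("per-protocol:", improvement_rate("intervention", False),
      "vs", improvement_rate("control", False))
```

In this toy example the per-protocol comparison inflates the apparent benefit because the drop-outs, who fared poorly, are silently excluded.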

Reporting bias refers to systematic differences between reported and unreported findings, including, e.g., differential reporting of outcomes between comparison groups and incomplete reporting of study findings (such as reporting statistically significant results only). Also, narrative and systematic reviews that do not report search strategies or disclose potential conflicts of interest raise concerns about reporting bias as well as selection bias (Roundtree 2009). Techniques for diminishing reporting bias include thorough reporting of outcomes consistent with outcome measures specified in the study protocol, including attention to documentation and rationale for any post-hoc (after the completion of data collection) analyses not specified prior to the study, and reporting of literature search protocols and results for review articles. Reporting bias, which concerns differential or incomplete reporting of findings in individual studies, is not the same as publication bias, which concerns the extent to which all relevant studies on a given topic proceed to publication.

Registration of Clinical Trials and Results

Two related sets of requirements have improved clinical trial reporting for many health technologies. These requirements help to diminish reporting bias and publication bias, thereby improving the quality of the evidence available for HTA. Further, they increase the value of clinical trials more broadly to trial participants, patients, clinicians, and other decision makers, and society (Huser 2013).

In the US, the Food and Drug Administration Amendments Act of 2007 (FDAAA) mandates that certain clinical trials of drugs, biologics, and medical devices that are subject to FDA regulation for any disease or condition be registered with ClinicalTrials.gov. A service of the US National Library of Medicine, ClinicalTrials.gov is a global registry and results database of publicly and privately supported clinical studies. Further, FDAAA requires investigators to register the results of these trials, generally no more than 12 months after trial completion. Applicable trials include those that have one or more sites in the US, are conducted under an FDA investigational new drug application (IND) or investigational device exemption (IDE), or involve a drug, biologic, or device that is manufactured in the US and its territories and is exported for research (ClinicalTrials.gov 2012; Zarin 2011).

The International Committee of Medical Journal Editors (ICMJE) requires clinical trial registration as a condition for publication of research results generated by a clinical trial. Although the ICMJE does not advocate any particular registry, a registry must meet certain criteria for investigators to meet the condition for publication. (ClinicalTrials.gov meets these criteria.) The ICMJE requires registration of trial methodology but not trial results (ICMJE 2013).

As noted above, study attributes that affect bias can be used as criteria for assessing the quality of individual studies. For example, the use of randomization to reduce selection bias and blinding of outcomes assessors to reduce detection bias are among the criteria used for assessing the quality of clinical trials. Even within an individual study, the extent of certain types of bias may vary for different outcomes. For example, in a study of the impact of a technology on mortality and quality of life, the presence of detection bias and reporting bias may vary for those two outcomes.

Box III-4 shows a set of criteria for assessing risk of bias for benefits for several types of study design based on the main types of risk of bias cited above and used by the US Agency for Healthcare Research and Quality (AHRQ) Evidence-based Practice Centers (EPCs).

5. Random Error

In contrast to the systematic effects of various types of bias, random error is a source of non-systematic deviation of an observed treatment effect or other outcome from a true one. Random error results from chance variation in the sample of data collected in a study (i.e., sampling error). The extent to which an observed outcome is free from random error is precision. As such, precision is inversely related to random error.

Random error can be reduced, but it cannot be eliminated. P-values and confidence intervals account for the extent of random error, but they do not account for systematic error (bias). The main approach to reducing random error is to establish large enough sample sizes (i.e., numbers of patients in the intervention and control groups of a study) to detect a true treatment effect (if one exists) at acceptable levels of statistical significance. The smaller the true treatment effect, the more patients may be required to detect it. Therefore, investigators who are planning an RCT or other study consider the estimated magnitude of the treatment effect that they are trying to detect at an acceptable level of statistical significance, and then “power” (i.e., determine the necessary sample size of) the study accordingly. Depending on the type of treatment effect or other outcome being assessed, another approach to reducing random error is to reduce variation in an outcome for each patient by increasing the number of observations made for each patient. Random error also may be reduced by improving the precision of the measurement instrument used to take the observations (e.g., a more precise diagnostic test or instrument for assessing patient mobility).
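
As a minimal illustration of powering a study, the sketch below (Python) applies the standard normal-approximation sample-size formula for comparing two proportions; the effect sizes are arbitrary examples.

```python
# Standard normal-approximation formula; the effect sizes are arbitrary examples.
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients per arm to detect p1 vs p2 with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_group(0.40, 0.20))  # large effect (40% vs 20%): ~79 per arm
print(n_per_group(0.25, 0.20))  # small effect (25% vs 20%): ~1091 per arm
```

As the two cases show, detecting a smaller absolute difference at the same significance level and power requires many more patients per arm.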

6. Role of Selected Other Factors

Some researchers contend that if individual studies are to be assembled into a body of evidence for a systematic review, precision should be evaluated not at the level of individual studies, but when assessing the quality of the body of evidence. This is intended to avoid double-counting limitations in precision from the same source (Viswanathan 2014).

In addition to evaluating internal validity of studies, some instruments for assessing the quality of individual studies evaluate external validity. However, by definition, the external validity of a study depends not only on its inherent attributes, but on the nature of an evidence question for which the study is more or less relevant. An individual study may have high external validity for some evidence questions and low external validity for others. Certainly, some aspects of bias for internal validity noted above may also be relevant to external validity, such as whether the patient populations compared in a treatment and control group represent the same or different populations, and whether the analyses account for attrition in a way that represents the population of interest, including any patient attributes that differ between patients who were followed to study completion and those who were lost to follow-up. Some researchers suggest that if individual studies are to be assembled into a body of evidence for a systematic review, then external validity should be evaluated when assessing the quality of the body of evidence, but not at the level of individual studies (Atkins 2004; Viswanathan 2014).

Box III-4. Design-Specific Criteria to Assess Risk of Bias for Benefits


Source: Viswanathan M, Ansari MT, Berkman ND, Chang S, et al. Chapter 9. Assessing the risk of bias of individual studies in systematic reviews of health care interventions. In: Methods Guide for Effectiveness and Comparative Effectiveness Reviews. AHRQ Publication No. 10(14)-EHC063-EF. Rockville, MD: Agency for Healthcare Research and Quality. January 2014.

Some quality assessment tools for individual studies account for funding source (or sponsor) of a study and disclosed conflicts of interest (e.g., on the part of sponsors or investigators) as potential sources of bias. Rather than being direct sources of bias themselves, a funding source or a person with a disclosed conflict of interest may induce bias indirectly, e.g., in the form of certain types of reporting bias or detection bias. Also, whether the funding source of research is a government agency, a non-profit organization, or a health technology company does not necessarily determine whether it induces bias. Of course, all of these potential sources of bias should be systematically documented (Viswanathan 2014).
