C. Properties and Impacts Assessed

What does HTA assess? HTA may involve the investigation of one or more properties, impacts, or other attributes of health technologies or applications. In general, these include the following.

  • Technical properties
  • Safety
  • Efficacy and/or effectiveness
  • Economic attributes or impacts
  • Social, legal, ethical and/or political impacts

The properties, impacts, and other attributes assessed in HTA pertain across the range of types of technology. Thus, for example, just as drugs, devices, and surgical procedures can be assessed for safety, effectiveness, and cost effectiveness, so can hospital infection control programs, computer-based drug-utilization review systems, and rural telemedicine networks.

Technical properties include performance characteristics and conformity with specifications for design, composition, manufacturing, tolerances, reliability, ease of use, maintenance, etc.

Safety is a judgment of the acceptability of risk (a measure of the probability of an adverse outcome and its severity) associated with using a technology in a given situation, e.g., for a patient with a particular health problem, by a clinician with certain training, or in a specified treatment setting.

Efficacy and effectiveness both refer to how well a technology works, i.e., accomplishes its intended purpose, usually based on changes in one or more specified health outcomes or “endpoints” as described below. A technology that works under carefully managed conditions does not always work as well under more heterogeneous or less controlled conditions. In HTA, efficacy refers to the benefit of using a technology for a particular problem under ideal conditions, e.g., within the protocol of a carefully managed RCT, involving patients meeting narrowly defined criteria, or conducted at a “center of excellence.” Effectiveness refers to the benefit of using a technology for a particular problem under general or routine conditions, e.g., by a physician in a community hospital for a variety of types of patients. Whereas efficacy answers the question, “Can it work?” (in the best conditions), effectiveness answers the question “Does it work?” (in real-world conditions).

Clinicians, patients, managers and policymakers are increasingly aware of the practical implications of differences in efficacy and effectiveness. Researchers delve into registers, databases (e.g., of third-party payment claims and administrative data), and other epidemiological and observational data to discern possible associations between the use of technologies and patient outcomes in general or routine practice settings. As these are observational studies, their validity for establishing causal connections between interventions and patient outcomes is limited compared to experimental studies, particularly RCTs. Even so, observational studies can be used to generate hypotheses for experimental trials, and they can provide evidence about effectiveness that can complement other evidence about efficacy, suggesting whether findings under ideal conditions may be extended to routine practice. As discussed below, some different types of trials are designed to incorporate varied groups of patients and settings.

Box II-1 shows certain distinctions in efficacy and effectiveness for diagnostic tests. Whereas the relationship between a preventive, therapeutic, or rehabilitative technology and patient outcomes is often direct (though not always easy to measure), the relationship between a technology used for diagnosis or screening and patient outcomes is usually indirect. Also, diagnostic and screening procedures can have their own short-term and long-term adverse health effects, e.g., arising from biopsies, certain radiological procedures, or genetic testing for certain disorders.

Box II-1. Efficacy vs. Effectiveness for Diagnostic Tests

Aspect | Efficacy | Effectiveness
Patient Population | Homogeneous; patients with coexisting illness often excluded | Heterogeneous; includes all patients who usually have test
Procedures | Standardized | Often variable
Testing Conditions | Ideal | Conditions of everyday practice
Practitioner | Experts | All users

Adapted from: Institute of Medicine 1989.

Economic attributes or impacts of health technologies can be microeconomic and macroeconomic. Microeconomic concerns include costs, prices, charges, and payment levels associated with individual technologies. Other concerns include comparisons of resource requirements and outcomes (or benefits) of technologies for particular applications, such as cost effectiveness, cost utility, and cost benefit. (Methods for determining these are described in chapter V, Economic Analysis Methods.) Health technology can have or contribute to a broad range of macroeconomic impacts. These include impacts on a nation’s gross domestic product, national health care costs, resource allocation across health care and other industrial sectors, and international trade. Health technology can also be a factor in national and global patterns of investment, innovation, competitiveness, technology transfer, and employment (e.g., workforce size and mobility). Other macroeconomic issues that pertain to health technologies include the effects of intellectual property policies (e.g., for patent protection), regulation, third-party payment, and other policy changes that affect technological innovation, adoption, diffusion, and use.

Ethical, legal, and social considerations arise in HTA in the form of normative concepts (e.g., valuation of human life); choices about how and when to use technologies; research and the advancement of knowledge; resource allocation; and the integrity of HTA processes themselves (Heitman 1998). Indeed, the origins of technology assessment called for the field to support policymakers’ broader considerations of technological impacts, such as the “social, economic, and legal implications of any course of action” (US Congress, House of Representatives 1967) and the “short- and long-term social consequences (for example, societal, economic, ethical, legal) of the application of technology” (Banta 1993). More recently, for example, an integral component of the Human Genome Project of the US National Institutes of Health is the Ethical, Legal and Social Implications (ELSI) Research Program (Green 2011). One recently proposed broader framework, “HELPCESS,” includes consideration of: humanitarian, ethical, legal, public relationships, cultural, economic, safety/security, and social implications (Yang 2013).

Whether in health care or other sectors, technological innovation can challenge certain ethical, religious, cultural, and legal norms. Current examples include genetic testing, use of stem cells to grow new tissues, allocation of scarce organs for transplantation, and life-support systems for critically ill patients. For example, the slowly increasing supply of donated kidneys, livers, hearts, lungs, and other solid organs for transplantation continues to fall behind the expanding need for them, raising ethical, social, and political concerns about allocation of scarce, life-saving resources (Huesch 2012; Yoshida 1998). In dialysis and transplantation for patients with end-stage renal disease, ethical concerns arise from patient selection criteria, termination of treatment, and managing non-compliant and other problem patients (Moss 2011; Rettig 1991). Even so, these concerns continue to prompt innovations to overcome organ shortages (Lechler 2005), such as techniques for improving transplantation success rates with organs from marginal donors, organs from living donors, paired and longer chain donation, xenotransplantation (e.g., from pigs), stem cells to regenerate damaged tissues, and the longer-range goal of whole-organ tissue engineering (Soto-Gutierrez 2012).

Technologies that can diminish or strengthen patient dignity or autonomy include, e.g., end-of-life care, cancer chemotherapy, feeding devices, and assistive equipment for moving immobilized patients. Greater involvement of patients, citizens, and other stakeholders in health care decisions, technology design and development, and the HTA process itself is helping to address some concerns about the relationships between patients and health technology. Ethical questions also have led to improvements in informed consent procedures for patients involved in clinical trials.

Allocation of scarce resources to technologies that are expensive, misused, not uniformly accessible, or non-curative can raise broad concerns about equity and squandered opportunities to improve population health (Gibson 2002). The same technologies can pose various challenges in the context of different or evolving societal and cultural norms, economic conditions, and health care system delivery and financing configurations. Even old or “mainstream” technologies can raise concerns in changing social contexts, such as immunization, organ procurement for transplantation, or male circumcision (EUnetHTA 2008). In addition to technologies, certain actual or proposed uses of analytical methods can prompt such concerns; many observers object to using actual or implied cost per quality-adjusted life year (QALY) thresholds in coverage decisions (Nord 2010).

Methods for assessing ethical, legal, and social implications of health technology have been underdeveloped relative to other methods in HTA, although there has been increased attention in recent years to developing frameworks and other guidance for these analyses (Duthie 2011; Potter 2008). More work is needed for translating these implications into policy (Van der Wilt 2000), such as by involving different perspectives in the HTA process to better identify the types of effects or impacts that should be assessed, and to account for the values that these perspectives assign to life, quality of life, privacy, choice of care, and other matters (Reuzel 2001). Some methods used in analysis of ethical issues in HTA, based on work assembled by the European network for Health Technology Assessment (EUnetHTA), are listed in Box II-2. Recent examination of alternative methods used in ethical analysis in HTA suggests that they can yield similar results, and that having a systematic and transparent approach to ethical analysis is more important than the choice of methods (Saarni 2011).

Box II-2. Methods Used for Ethical Analysis in HTA

Method | Description
Casuistry | Solves morally challenging situations by comparing them with relevant and similar cases where an undisputed solution exists
Coherence analysis | Tests the consistency of ethical argumentation, values or theories on different levels, with an ideal goal of a logically coherent set of arguments
Principlism | Approaches ethical problems by addressing basic ethical principles, rooted in society’s common morality
Interactive, participatory HTA approaches | Involves different stakeholders in a real discourse, to reduce bias and improve the validity and applicability of the HTA
Social shaping of technology | Addresses the interaction between society and technology and emphasizes how to shape technology in the best ways to benefit people
Wide reflective equilibrium | Aims at a coherent conclusion by a process of reflective mutual adjustment among general principles and particular judgements

Source: Saarni et al. 2008.

As a form of objective scientific and social inquiry, HTA must be conducted ethically, with social responsibility, and with sensitivity to cultural differences. Some aspects to be incorporated or otherwise addressed include: identifying and minimizing potential conflicts of interest on the part of assessment staff and expert advisors; accounting for social, demographic, economic, and other dimensions of representativeness and equity in HTA resource allocation and topic selection; and incorporating patient and other stakeholder input on topic selection, evidence questions, and relevant outcomes/endpoints.

The terms “appropriate” and “necessary” often are used to describe whether or not a technology should be used in particular circumstances. These are judgments that typically reflect considerations of one or more of the properties and impacts described above. For example, the appropriateness of a diagnostic test may depend on its safety and effectiveness compared to alternative available interventions for particular patient indications, clinical settings, and resource constraints, perhaps as summarized in an evidence-based clinical practice guideline. A technology may be considered necessary if it is likely to be effective and acceptably safe for particular patient indications, and if withholding it would be deleterious to the patient's health (Hilborne 1991; Kahan 1994; Singer 2001).

As described in chapter I, HTA inquires about the unintended consequences of health technologies as well as intended ones, which may involve some or all of the types of impacts assessed. Some unintended consequences include, or lead to, unanticipated uses of technologies. Box II-3 lists some recent examples.

Box II-3. Recent Examples of Unintended Consequences of Health Technology

Technology | Intended or Original Uses | Unintended Consequences or Unanticipated Uses
Antibiotics (antibacterials) | Kill or inhibit growth of bacteria that cause infectious diseases | Overuse and improper use leading to multi-drug resistant bacterial strains [1]
Antiretroviral therapy (ART) | Treatment of HIV/AIDS | Return to risky sexual behaviors in some patient groups [2,3,4]
Aspirin | Relieve pain, fever, inflammation | Antiplatelet to prevent blood clots [5]
Bariatric surgery | Weight loss in obese patients | Cure or remission of type 2 diabetes in many of the obese patients [6]
Medical ultrasonography | Visualizing structures and blood flow in the body in real time | Fetal sex selection [7,8,9]
Prostate cancer screening with PSA test | Identify men with prostate cancer early enough to cure | Invasive testing, therapies, and adverse effects for men with slow-growing/low-risk cases that will never cause symptoms [10,11]
Sildenafil | Cardiovascular disorders, especially hypertension (used today for pulmonary arterial hypertension) | Treat male sexual dysfunction [12]

Sources:

1. Hollis A, Ahmed Z. Preserving antibiotics, rationally. N Engl J Med. 2013;369(26):2474-6.

2. Fu TC, et al. Changes in sexual and drug-related risk behavior following antiretroviral therapy initiation among HIV-infected injection drug users. AIDS. 2012;26(18):2383-91.

3. Kembabazi A, et al. Disinhibition in risky sexual behavior in men, but not women, during four years of antiretroviral therapy in rural, southwestern Uganda. PLoS One. 2013;8(7):e69634.

4. Tun W, et al. Increase in sexual risk behavior associated with immunologic response to highly active antiretroviral therapy among HIV-infected injection drug users. Clin Infect Dis. 2004;38(8):1167-74.

5. Hackam DG, Eikelboom JW. Antithrombotic treatment for peripheral arterial disease. Heart. 2007;93(3):303-8.

6. Brethauer SA, et al. Can diabetes be surgically cured? Long-term metabolic effects of bariatric surgery in obese patients with type 2 diabetes mellitus. Ann Surg. 2013;258(4):628-36.

7. George SM. Millions of missing girls: from fetal sexing to high technology sex selection in India. Prenat Diagn. 2006;26(7):604-9.

8. Nie JB. Non-medical sex-selective abortion in China: ethical and public policy issues in the context of 40 million missing females. Br Med Bull. 2011;98:7-20.

9. Thiele AT, Leier B. Towards an ethical policy for the prevention of fetal sex selection in Canada. J Obstet Gynaecol Can. 2010;32(1):54-7.

10. Hayes JH, Barry MJ. Screening for prostate cancer with the prostate-specific antigen test: a review of current evidence. JAMA. 2014;311(11):1143-9.

11. Lin K, Lipsitz R, Miller T, Janakiraman S; U.S. Preventive Services Task Force. Benefits and harms of prostate-specific antigen screening for prostate cancer: an evidence update for the U.S. Preventive Services Task Force. Ann Intern Med. 2008;149(3):192-9.

12. Kling J. From hypertension to angina to Viagra. Mod Drug Discov. 1998;1(2):31-8.

1. Measuring Health Outcomes

Health outcome variables are used to measure the safety, efficacy and effectiveness of health care technologies. Main categories of health outcomes are:

  • Mortality (death rate)
  • Morbidity (disease rate)
  • Adverse health events (e.g., harmful side effects)
  • Quality of life
  • Functional status
  • Patient satisfaction

For example, for a cancer treatment, the main outcome of interest may be five-year survival rate; for treatments of coronary artery disease, the main endpoints may be incidence of fatal and nonfatal acute myocardial infarction (heart attack) and recurrence of angina pectoris (chest pain due to poor oxygen supply to the heart). Although mortality, morbidity, and adverse events are usually the outcomes of greatest interest, the other types of outcomes are often important as well to patients and others. Many technologies affect patients, family members, providers, employers, and other interested parties in other important ways; this is particularly true for many chronic diseases. As such, there is increasing emphasis on quality of life, functional status, patient satisfaction, and related types of patient outcomes.

In clinical trials and other studies comparing alternative treatments, the effect on health outcomes of one treatment relative to another (e.g., a new treatment vs. a control treatment) can be expressed using various measures of treatment effect. These measures compare the probability of a given health outcome in the treatment group with the probability of the same outcome in a control group. Examples are absolute risk reduction, odds ratio, number needed to treat, and effect size. Box II-4 shows how choice of treatment effect measures can give different impressions of study results.

Box II-4. Choice of Treatment Effect Measures Can Give Different Impressions

A study of the effect of breast cancer screening can be used to contrast several treatment effect measures and show how they can give different impressions about the effectiveness of an intervention (Forrow 1992). Andersson (1988) reported the results of a large RCT that was conducted to determine the effect of mammographic screening on mortality from breast cancer. The trial involved more than 42,000 women who were over 45 years old. Half of the women were invited to have mammographic screening and were treated as needed. The other women (control group) were not invited for screening.

The report of this trial states that "Overall, women in the study group aged >55 had a 20% reduction in mortality from breast cancer." Although this statement of relative risk reduction is true, it is based on the reduction from an already low-probability event in the control group to an even lower one in the screened group. Calculation of other types of treatment effect measures provides important additional information. The table below shows the numbers of women aged >55 and of breast cancer deaths in the screened group and the control group. Based on these results, four treatment effect measures are calculated.

For example, absolute risk reduction is the difference in the rate of adverse events between the screened group and the control group. In this trial, the absolute risk reduction of 0.0007 means that the absolute effect of screening was to reduce the incidence of breast cancer mortality by 7 deaths per 10,000 women screened, or 0.07%.

Group | No. of Patients | Deaths from breast cancer | Probability of death from breast cancer | Relative risk reduction [1] | Absolute risk reduction [2] | Odds ratio [3] | No. needed to screen [4]
Screened | 13,107 | 35 | Ps = 0.0027 | 20.6% | 0.0007 | 0.79 | 1,429
Control | 13,113 | 44 | Pc = 0.0034 | | | |

Women in the intervention group were invited to attend mammographic screening at intervals of 18-24 months. Five rounds of screening were completed. Breast cancer was treated according to stage at diagnosis. Mean follow-up was 8.8 years.

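Ps and Pc are the probabilities of death from breast cancer in the screened group and the control group, respectively.

1. Relative risk reduction: (Pc - Ps) ÷ Pc

2. Absolute risk reduction: Pc - Ps

3. Odds ratio: [Ps ÷ (1 - Ps)] ÷ [Pc ÷ (1 - Pc)]

4. Number needed to screen to prevent one breast cancer death: 1 ÷ (Pc - Ps)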

Source of number of patients and deaths from breast cancer: Andersson 1988
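The four measures in Box II-4 can be recomputed from the reported counts. The following is a minimal sketch in Python, not part of the original box; the names `Arm` and `treatment_effects` are illustrative assumptions. It applies the footnote formulas twice: once to the unrounded probabilities, and once to probabilities rounded to four decimal places, which is how they are displayed in the box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """Counts for one arm of a two-arm trial (hypothetical helper type)."""
    patients: int
    events: int  # here: deaths from breast cancer

    @property
    def risk(self) -> float:
        """Probability of the event in this arm."""
        return self.events / self.patients


def treatment_effects(p_s: float, p_c: float) -> dict[str, float]:
    """Apply the four footnote formulas to Ps (screened) and Pc (control)."""
    arr = p_c - p_s  # absolute risk reduction
    return {
        "relative_risk_reduction": arr / p_c,
        "absolute_risk_reduction": arr,
        "odds_ratio": (p_s / (1 - p_s)) / (p_c / (1 - p_c)),
        "number_needed_to_screen": 1 / arr,
    }


# Counts as reported in the box (source: Andersson 1988).
screened = Arm(patients=13_107, events=35)
control = Arm(patients=13_113, events=44)

# Unrounded probabilities, computed directly from the counts.
exact = treatment_effects(screened.risk, control.risk)

# Probabilities rounded to four decimals first, as displayed in the box.
rounded = treatment_effects(round(screened.risk, 4), round(control.risk, 4))

for name in exact:
    print(f"{name:26s} unrounded={exact[name]:10.4f}  rounded inputs={rounded[name]:10.4f}")
```

With rounded inputs, the sketch gives the values shown in the box (20.6%, 0.0007, 0.79, and about 1,429). With unrounded probabilities, the number needed to screen comes out somewhat higher, at roughly 1,460, which shows how sensitive this measure is to small differences between low-probability events.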

a. Biomarkers and Surrogate Endpoints

Certain health outcomes or clinical endpoints have particular roles in clinical trials, other research, and HTA, including biomarkers, intermediate endpoints, and surrogate endpoints.

A biomarker (or biological marker) is an objectively measured variable or trait that is used as an indicator of a normal biological process, a disease state, or effect of a treatment (Biomarkers Definitions Working Group 2001). It may be a physiological measurement (height, weight, blood pressure, etc.), blood component or other biochemical assay (red blood cell count, viral load, glycated hemoglobin [HbA1c] level, etc.), genetic data (presence of a specific genetic mutation), or measurement from an image (coronary artery stenosis, cancer metastases).

An intermediate endpoint is a non-ultimate endpoint (e.g., not mortality or morbidity) that may be associated with disease status or progression toward an ultimate endpoint such as mortality or morbidity. Intermediate endpoints include certain biomarkers (e.g., HbA1c in prediabetes or diabetes, bone density in osteoporosis, tumor progression in cancer) and disease symptoms (e.g., angina frequency in heart disease, measures of lung function in chronic obstructive pulmonary disease). Some intermediate endpoints can serve as surrogate endpoints.

A surrogate endpoint is a measure (typically a biomarker) that is used as a substitute for a clinical endpoint of interest, such as morbidity and mortality. Surrogate endpoints are used in clinical trials when it is impractical to measure the primary endpoint during the course of the trial, such as when observation of the clinical endpoint would require years of follow-up. A surrogate endpoint is assumed, based on scientific evidence, to be a valid and reliable predictor of a clinical endpoint of interest. As such, changes in a surrogate endpoint should be highly correlated with changes in the clinical endpoint. For example, a long-standing surrogate marker for risk of stroke is hypertension, although understanding continues to evolve of the respective and joint roles of systolic and diastolic pressures in predicting stroke in the general population and in high-risk populations (Malyszko 2013). RCTs of new drugs for HIV/AIDS use biological markers such as virological (e.g., plasma HIV RNA) levels (or “loads”) and immunological (e.g., CD4+ cell counts) levels (Lalezari 2003) as surrogates for mortality and morbidity. Other examples of surrogate endpoints for clinical endpoints are negative cultures for cures of bacterial infections and decreased intraocular pressure for prevention of vision loss in glaucoma.

b. Quality of Life Measures

Quality of life (QoL) measures, or “health-related quality of life” measures or indexes, are increasingly used along with more traditional outcome measures to assess efficacy and effectiveness, providing a more complete picture of the ways in which health care affects patients. QoL measures capture such dimensions (or domains) as: physical function, social function, cognitive function, anxiety/distress, bodily pain, sleep/rest, energy/fatigue and general health perception. These measures may be generic (covering overall health) or disease-specific. They may provide a single aggregate score or yield a set of scores, each for a particular dimension. Some examples of widely used generic measures are:

  • CAHPS (formerly Consumer Assessment of Healthcare Providers and Systems)
  • EuroQol (EQ-5D)
  • Health Utilities Index
  • Nottingham Health Profile
  • Quality of Well-Being Scale
  • Short Form (12) Health Survey (SF-12)
  • Short Form (36) Health Survey (SF-36)
  • Sickness Impact Profile

Dimensions of selected generic QoL measures that have been used extensively and that are well validated for certain applications are shown in Box II-5. There is an expanding literature on the relative strengths and weaknesses of these generic QoL indexes, including how sensitive they are to changes in quality of life for people with particular diseases and disorders (Coons 2000; Feeny 2011; Fryback 2007; Kaplan 2011; Kaplan 1998; Post 2001; Saban 2008).

Box II-5. Dimensions of Selected Generic QoL Measures

EuroQol EQ-5D (Rabin 2001)

  • Mobility
  • Self-care
  • Usual activities
  • Pain/discomfort
  • Anxiety/depression

Functional Independence Measure (Hsueh 2002; Linacre 1994)

  • Self-care
  • Sphincter control
  • Mobility
  • Communication
  • Psychosocial
  • Cognition

Nottingham Health Profile (Doll 1993; Jenkinson 1988)

  • Physical mobility
  • Pain
  • Sleep
  • Energy
  • Social isolation
  • Emotional reactions

Quality of Well-Being Scale (Frosch 2004; Kaplan 1989)

  • Mobility
  • Physical activity
  • Social activity
  • Symptom-problem complex

Short Form (SF)-36 (Martin 2011; Ware 1992)

  • Physical functioning
  • Role - physical
  • Social functioning
  • Bodily pain
  • Mental health
  • Role - emotional
  • Vitality
  • General health perceptions

Sickness Impact Profile (Bergner 1981; de Bruin 1992)

  • Body care and movement
  • Ambulation
  • Mobility
  • Sleep and rest
  • Home management
  • Recreation and pastimes
  • Emotional behavior
  • Alertness behavior
  • Communication
  • Social interaction
  • Work
  • Eating

Some of the diseases or conditions for which there are disease- (or condition-) specific measures are: angina, arthritis, asthma, epilepsy, heart disease, kidney disease, migraine, multiple sclerosis, urinary incontinence, and vision problems. See Box II-6 for dimensions used in selected measures.

Box II-6. Dimensions of Selected Disease-Specific QoL Measures

Adult Asthma Quality of Life Questionnaire (Juniper 2005; Juniper 1993)

  • Activity limitations
  • Emotional function
  • Exposure to environmental stimuli
  • Symptoms

Arthritis Impact Measurement Scales (AIMS2) (Söderlin 2004; Meenan 1992)

  • Mobility
  • Walking and bending
  • Hand and finger function
  • Arm function
  • Self care
  • Household tasks
  • Social activity
  • Support from family and friends
  • Arthritis pain
  • Work
  • Level of tension
  • Mood

Urinary Incontinence-Specific Quality of Life Instrument (I-QOL) (Patrick 1999; Wagner 1996)

  • Avoidance and limiting behavior
  • Psychosocial impacts
  • Social embarrassment

Considerable advances have been made in the development and validation of generic and disease-specific measures since the 1980s. These measures are increasingly used by health product companies to differentiate their products from those of competitors, which may have virtually indistinguishable effects on morbidity for particular diseases (e.g., hypertension, depression, arthritis) but may have different side effect profiles that affect patients’ quality of life (Gregorian 2003).

c. Health-Adjusted Life Years: QALYs, DALYs, and More

The category of measures known as health-adjusted life years (HALYs) recognizes that changes in an individual’s health status or the burden of population health should reflect not only the dimension of life expectancy but also a dimension of QoL or functional status. Three main types of HALYs are: quality-adjusted life years (QALYs), disability-adjusted life years (DALYs), and healthy-years equivalents (HYEs). One of the attributes of HALYs is that they are not specific to a particular disease or condition.

The QALY is a unit of health care outcome that combines gains (or losses) in length of life with quality of life. QALYs are usually used to represent years of life subsequent to a health care intervention that are weighted or adjusted for the quality of life experienced by the patient during those years (Torrance 1989). QALYs provide a common unit for multiple purposes, including: estimating the overall burden of disease; comparing the relative impact on personal and population health of specific diseases or conditions; comparing the relative impact on personal and population health of specific technologies; and making economic comparisons, such as of the cost-effectiveness (in particular the cost-utility) of different health care interventions. Some health economists and policymakers have proposed setting priorities among alternative health care interventions by selecting among these so as to maximize the additional health gain in terms of QALYs. This is intended to optimize allocation of scarce resources and thereby maximize social welfare (Gold 2002; Johannesson 1993; Mullahy 2001). QALYs are used routinely in assessing the impact or value of technologies by some HTA organizations, e.g., the National Institute for Health and Care Excellence (NICE) in the UK. Box II-7 illustrates the dual dimensions of QALYs, and how an intervention can result in a gain in QALYs.

Box II-7. Gain in Quality-Adjusted Life Years from a New Intervention

QALY = Length of Life × Quality Weight

[Figure: two panels, “Survival and Quality of Life with Current Treatment” and “Survival and Quality of Life with New Treatment.” The QALY gain is represented by the area of increased survival and quality of life.]
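The relationship in Box II-7, in which QALYs are the product of length of life and a quality weight, can be illustrated with a short calculation. The following is a minimal sketch in Python; the function name `qalys` and all survival and quality-weight figures are hypothetical, chosen only to show the arithmetic, and do not come from the box or from any study.

```python
def qalys(profile):
    """Return total QALYs for a list of (years, quality_weight) periods."""
    total = 0.0
    for years, weight in profile:
        if years < 0 or weight > 1.0:
            raise ValueError("years must be >= 0 and quality weight <= 1.0")
        total += years * weight  # each period contributes length x quality weight
    return total


# Hypothetical profiles, invented for illustration only.
current_treatment = [(4.0, 0.6)]            # 4 years at a quality weight of 0.6
new_treatment = [(4.0, 0.8), (2.0, 0.7)]    # better quality, then 2 extra years

qaly_gain = qalys(new_treatment) - qalys(current_treatment)
print(f"Current treatment: {qalys(current_treatment):.1f} QALYs")
print(f"New treatment:     {qalys(new_treatment):.1f} QALYs")
print(f"QALY gain:         {qaly_gain:.1f} QALYs")
```

In this invented case the current treatment yields 2.4 QALYs and the new treatment 4.6 QALYs, a gain of 2.2 QALYs. The gain arises from both dimensions: higher quality of life during the first four years and two additional years of survival.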

Although HALYs arise from a common concept of adjusting duration of life by individuals’ experience of quality of life, they differ in ways that have implications for their appropriate use, including for assessing cost-effectiveness. QALYs are used primarily to adjust a person’s life expectancy by the levels of health-related quality of life that the person is predicted to experience during the remainder of life or some interval of it. DALYs are primarily used to measure population disease burden; they are a measure of something ‘lost’ rather than something ‘gained.’ The health-related quality of life weights used for QALYs are intended to represent quality of life levels experienced by individuals in particular health states, whereas the disability weights used for DALYs represent levels of loss of functioning due to mental or physical disability caused by disease or injury. Another key distinction is that the burden of disability in calculating DALYs depends on one’s age. That is, DALYs incorporate an age-weighting function that assigns different weights to life years lived at different ages. Also, the origins of quality of life weights and disability weights are different (Sassi 2006; Fox-Rushby 2001).

The scale of quality of life used for QALYs can be based on general, multi-attribute QoL indexes or preference survey methods (Bleichrodt 1997; Doctor 2010; Weinstein 2010). The multi-attribute QoL indexes used for this purpose include, e.g., the SF-6D (based on the SF-36), EQ-5D, versions of the Health Utilities Index, and Quality of Well-Being Scale. The preference survey methods, such as the standard gamble, time-tradeoff, or rating scale methods (e.g., a visual analog scale), are used to elicit the utility or preferences of individuals (including patients, disabled persons, or others) for certain states of health or well-being. Another preference survey method, the person trade-off, is used for eliciting preferences for the health states of a community or population, although the standard gamble, time-tradeoff, and rating scales can be used at that level as well. The quality of life scale used for QALYs is typically standardized to a range of 0.0 (death) to 1.0 (perfect health). A scale may allow for ratings below 0.0 for states of disability and distress that some patients consider to be worse than death (Patrick 1994). Some work has been done to capture more dimensions of public preference and to better account for the value attributed to different health care interventions (Dolan 2001; Schwappach 2002). There is general agreement about the usefulness of standard measures of health outcomes such as QALYs to enable comparisons of the impacts of technologies across diseases and populations, and of standard approaches for valuing utilities for different health states. Among the areas of controversy are:

  • whether the QALY captures the full range of health benefits,
  • the criticism that the QALY does not account for social concerns for equity,
  • whether the QALY is the most appropriate generic preference-based measure of utility,
  • whether a QALY is the same regardless of who experiences it, and
  • what the appropriate perspective is for valuing health states, e.g., from the perspective of patients with particular diseases or the general public (Whitehead 2010).

Regarding perspective, for example, the values of the general public may not account for adaptation of the patients to changes in health states, and patients’ values may incorporate self-interest. Given this divergence, the appropriate perspective for health state valuations should depend on the context of the decisions or policies to be informed by the evaluation (Stamuli 2011; Oldridge 2008).

QoL measures and QALYs continue to be used in HTA while substantial work continues in reviewing, refining and validating them. As described in chapter V, Economic Analysis Methods, the QALY is often used as the unit of patient outcomes in cost-utility analyses.
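
As a rough illustration of how quality of life weights adjust duration of life, the following sketch sums the time spent in each health state multiplied by that state's weight on the 0.0 (death) to 1.0 (perfect health) scale. The function name, the health profiles, and all numbers are hypothetical and chosen only for arithmetic convenience; discounting and age weighting are deliberately left out.

```python
def qalys(health_profile):
    """Sum quality-adjusted life years over (years, weight) segments.

    Each weight is a health-related quality of life weight, where
    0.0 represents death and 1.0 represents perfect health.
    """
    return sum(years * weight for years, weight in health_profile)


# Hypothetical profiles for one patient (illustrative values only).
with_treatment = [(4.0, 0.75), (6.0, 0.5)]     # 10 life years in two health states
without_treatment = [(2.0, 0.5), (4.0, 0.25)]  # 6 life years in two health states

gain = qalys(with_treatment) - qalys(without_treatment)
print(qalys(with_treatment), qalys(without_treatment), gain)
```

In this made-up case the treatment adds four life years but also four QALYs, because the added years and the existing years are both lived at different quality of life levels; a cost-utility analysis would use the QALY difference as its outcome unit.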

2. Performance of Screening and Diagnostic Technologies

Screening and diagnostic tests provide information about the presence of a disease or other health condition. As such, they must be able to discriminate between patients who have a particular disease or condition and those who do not have it. Although the tests used for them are often the same, screening and diagnosis are distinct applications: screening is conducted in asymptomatic patients; diagnosis is conducted in symptomatic patients. As described below, whether a particular test is used for screening or for diagnosis can have a great effect on the probability that the test result truly indicates whether or not a patient has a given disease or other health condition. Although these tests are most often recognized as being used for screening and diagnosis, there are other, related uses of these tests across the spectrum of managing a disease or condition, as listed in Box II-8.

Box II-8. Uses of Tests for Asymptomatic and Symptomatic Patients

Asymptomatic patients (no known disease)

  • Susceptibility: presence of a risk factor for a disease (e.g., a gene for a particular form of cancer)
  • Presence of (hidden or occult) disease (e.g., Pap smear for cervical cancer)

Symptomatic patients (known or probable disease)

  • Diagnosis: presence of a particular disease or condition (e.g., thyroid tests for suspected hyperthyroidism)
  • Differential diagnosis: determine which disease or condition a patient has from among multiple possible alternatives (e.g., in a process of elimination using a series of tests to rule out particular diseases or conditions)
  • Staging: extent or progression of a disease (e.g., imaging to determine stages of cancer)
  • Prognosis: probability of progression of a disease or condition to a particular health outcome (e.g., a multi-gene test for survival of a particular type of cancer)
  • Prediction: probability of a treatment to result in progression of a disease or condition to a particular health outcome (e.g., a genetic test for the responsiveness of colorectal cancer to a particular chemotherapy)
  • Surveillance: periodic testing for recurrence or other change in disease or condition status
  • Monitoring: response to treatment (e.g., response to anticoagulation therapy)

The technical performance of a test depends on multiple factors. Among these are the precision and accuracy of the test, the observer variation in reading the test data, and the relationship between the disease of interest and the designated cutoff level (threshold) of the variable (usually a biomarker) used to determine the presence or absence of that disease. These factors contribute to the ability of a test to detect a disease when it is present and to not detect a disease when it is not present.

A screening or diagnostic test can have four basic types of outcomes, as shown in Box II-9. A true positive test result is one that detects a marker when the disease is present. A true negative test result is one that does not detect the marker when the disease is absent. A false positive test result is one that detects a marker when the disease is absent. A false negative test result is one that does not detect a marker when the disease is present.

Box II-9. Possible Outcomes of a Screening or Diagnostic Test

                    True Disease Status
Test Result         Present         Absent
Positive (+)        True +          False +
Negative (-)        False -         True -

Operating characteristics of tests and procedures are measures of their technical performance. These characteristics are based on the probabilities of the four possible types of outcomes of a test noted above. The two most commonly used operating characteristics of screening and diagnostic tests are sensitivity and specificity. Sensitivity measures the ability of a test to detect a particular disease (e.g., a particular type of infection) or condition (e.g., a particular genotype) when it is present. Specificity measures the ability of a test to correctly exclude that disease or condition in a person who truly does not have that disease or condition. The sensitivity and specificity of a test are independent of the true prevalence of the disease or condition in the population being tested.
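
The two characteristics can be computed directly from the four outcome counts. The sketch below is a minimal illustration: sensitivity is taken as true positives divided by all people with the disease, and specificity as true negatives divided by all people without it. The function names and the study counts are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of people with the condition who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of people without the condition who test negative."""
    return tn / (tn + fp)


# Hypothetical study: 100 people with the disease, 900 without.
tp, fn = 90, 10    # diseased: detected vs. missed
tn, fp = 855, 45   # non-diseased: correctly excluded vs. falsely flagged

print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.95
```

Each function uses counts from only one column of the outcome table (diseased or non-diseased), which is why neither value changes when the ratio of diseased to non-diseased people in the tested population changes.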

A graphical way of depicting these operating characteristics for a given diagnostic test is with a receiver operating characteristic (ROC) curve, which plots the relationship between the true positive ratio (sensitivity) and false positive ratio (1 - specificity) for all cutoff points of a disease or condition marker. For a perfect test, the area under the ROC curve would be 1.0; for a useless test (no better than a coin flip), the area under the ROC curve would be 0.5. ROC curves help to demonstrate how raising or lowering a cutoff point selected for defining a positive test result affects tradeoffs between correctly identifying people with a disease (true positives) and incorrectly labeling a person as positive who does not have the disease (false positives).
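
A small numerical sketch may help make the ROC construction concrete. It assumes that higher marker values indicate disease and that a result is positive when the value is at or above the cutoff; the function names and marker values are hypothetical, and the area is approximated with the trapezoidal rule.

```python
def roc_points(diseased, non_diseased):
    """Return (false positive ratio, true positive ratio) pairs, one per cutoff.

    A result is called positive when the marker value is >= the cutoff.
    """
    cutoffs = sorted(set(diseased) | set(non_diseased), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: nobody tests positive
    for c in cutoffs:
        tpr = sum(v >= c for v in diseased) / len(diseased)
        fpr = sum(v >= c for v in non_diseased) / len(non_diseased)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical marker values (illustrative only); the two groups overlap at 6.
diseased = [5, 6, 7, 8]
non_diseased = [1, 2, 3, 6]

print(area_under_curve(roc_points(diseased, non_diseased)))
```

Lowering the cutoff moves along the curve toward the upper right: more true positives are captured, at the price of more false positives. Fully separated groups would give an area of 1.0, and identical groups an area of 0.5.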

Sensitivity and specificity do not reveal the probability that a given patient really has a disease if the test is positive, or the probability that a given patient does not have the disease if the test is negative. These probabilities are captured by two other operating characteristics, shown in Box II-10. Positive predictive value is the proportion of those patients with a positive test result who actually have the disease. Negative predictive value is the proportion of patients with a negative test result who actually do not have the disease. Unlike sensitivity and specificity, the positive and negative predictive values of a test do depend on the true prevalence of the disease or condition in the population being tested. That is, the positive and negative predictive values of a test result are not constant performance characteristics of a test; they vary with the prevalence of the disease or condition in the population of interest. For example, if a disease is very rare in the population, even tests with high sensitivity and high specificity can have low positive predictive value, generating more false-positive than false-negative results.

Box II-10. Operating Characteristics of Diagnostic Tests

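Characteristic               Formula                                                Definition
Sensitivity                  True Positives / (True Positives + False Negatives)    Proportion of people with the condition who test positive
Specificity                  True Negatives / (True Negatives + False Positives)    Proportion of people without the condition who test negative
Positive predictive value    True Positives / (True Positives + False Positives)    Proportion of people with a positive test who have the condition
Negative predictive value    True Negatives / (True Negatives + False Negatives)    Proportion of people with a negative test who do not have the condition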
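
The dependence of predictive values on prevalence can be checked numerically. The sketch below holds sensitivity and specificity fixed and varies only the prevalence, working with expected proportions of the tested population rather than counts. The function name, the 95% test characteristics, and the three prevalence values are hypothetical.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sens * prevalence              # expected share of true positives
    fn = (1 - sens) * prevalence        # expected share of false negatives
    tn = spec * (1 - prevalence)        # expected share of true negatives
    fp = (1 - spec) * (1 - prevalence)  # expected share of false positives
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test (95% sensitive, 95% specific) in three settings.
for prevalence in (0.001, 0.05, 0.5):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these illustrative inputs, the positive predictive value is 0.95 when half of the tested population has the disease, about 0.5 at 5% prevalence, and below 0.02 at 0.1% prevalence, even though the test itself is unchanged.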

a. Biomarkers and Cutoff Points in Disease Detection

The biomarker for certain diseases or conditions is typically defined as a certain cutoff level of one or more variables. Examples of variables used for biomarkers for particular diseases are systolic and diastolic blood pressure for hypertension, HbA1c level for type 2 diabetes, coronary calcium score for coronary artery disease, and high-sensitivity cardiac troponin T for acute myocardial infarction. The usefulness of such biomarkers in making a definitive finding about presence or absence of a disease or condition varies; many are used in conjunction with information from other tests or patient risk factors. Biomarkers used to detect diseases have distributions in non-diseased as well as in diseased populations. For most diseases, these distributions overlap, so that a single cutoff level does not clearly separate non-diseased from diseased people. For example, an HbA1c level of 6.5% may be designated as the cutoff point for diagnosing type 2 diabetes. In fact, some people whose HbA1c level is lower than 6.5% also have diabetes (as confirmed by other tests), and some people whose HbA1c level is higher than 6.5% do not have diabetes. Lowering the cutoff point to 6.0% or 5.5% will correctly identify more people who are diabetic, but it will also incorrectly identify more people as being diabetic who are not. For diabetes as well as other conditions, clinically useful cutoff points may vary among different population subgroups (e.g., by age or race/ethnicity).

A cutoff point that is set to detect more true positives will also yield more false positives; a cutoff point that is set to detect more true negatives will also yield more false negatives. There are various statistical approaches for determining “optimal” cutoff points, e.g., where the intent is to minimize total false positives and false negatives, with equal weight given to sensitivity and specificity (Perkins 2006). However, the selection of a cutoff point should consider the acceptable risks of false positives vs. false negatives. For example, if the penalty for a false negative test is high (e.g., in patients with a fatal disease for which there is an effective treatment), then the cutoff point is usually set to be highly sensitive to minimize false negatives. If the penalty for a false positive test is high (e.g., leading to confirmatory tests or treatments that are invasive, associated with adverse events, and expensive), then the cutoff point is usually set to be highly specific to minimize false positives. Given the different purposes of screening and diagnosis, and the associated penalties of false positives and false negatives, cutoff points may be set differently for screening and diagnosis of the same disease.
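
The effect of these penalties on cutoff selection can be sketched with a simple penalty-weighted count of errors. This is only one possible criterion, shown for illustration; it is not presented as the method of any cited source. The function names, the penalty parameters, the HbA1c values, and the candidate cutoffs are all hypothetical.

```python
def error_counts(cutoff, diseased, non_diseased):
    """Return (false negatives, false positives) when values >= cutoff test positive."""
    fn = sum(v < cutoff for v in diseased)
    fp = sum(v >= cutoff for v in non_diseased)
    return fn, fp


def best_cutoff(candidates, diseased, non_diseased, fn_penalty=1.0, fp_penalty=1.0):
    """Pick the candidate cutoff with the lowest penalty-weighted error total."""
    def total_penalty(cutoff):
        fn, fp = error_counts(cutoff, diseased, non_diseased)
        return fn_penalty * fn + fp_penalty * fp
    return min(candidates, key=total_penalty)


# Hypothetical HbA1c values (%), illustrative only; the groups overlap.
diseased = [6.2, 6.4, 6.6, 6.8, 7.0, 7.4]
non_diseased = [5.0, 5.2, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5]
candidates = [5.5, 6.0, 6.5, 7.0]

print(best_cutoff(candidates, diseased, non_diseased))                  # equal penalties
print(best_cutoff(candidates, diseased, non_diseased, fn_penalty=5.0))  # missed cases costly
print(best_cutoff(candidates, diseased, non_diseased, fp_penalty=5.0))  # false alarms costly
```

On this made-up data the selected cutoff is 6.5 with equal penalties, drops to 6.0 when false negatives are penalized five times as heavily (a more sensitive test), and rises to 7.0 when false positives are penalized five times as heavily (a more specific test).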

b. Tests and Health Outcomes

Beyond technical performance of screening and diagnostic tests, their effect on health outcomes or health-related quality of life is often less immediate or direct than for other types of technologies. The impacts of most preventive, therapeutic, and rehabilitative technologies on health outcomes can be assessed as direct cause-and-effect relationships between interventions and outcomes. However, the relationship between the use of screening and diagnostic tests and health outcomes is typically indirect, given intervening decisions or other steps between the test and health outcomes. Even highly accurate test results may be ignored or improperly interpreted by clinicians. Therapeutic decisions that are based on test results can have differential effects on patient outcomes. Also, the impact of those therapeutic decisions may be subject to other factors, such as patient adherence to a drug regimen. Even so, health care decision makers and policymakers increasingly seek direct or indirect evidence demonstrating that a test is likely to have an impact on clinical decisions and health care outcomes.

The effectiveness (or efficacy) of a diagnostic (or screening) technology can be determined along a chain of inquiry that leads from technical capacity of a technology to changes in patient health outcomes to cost effectiveness (where relevant to decision makers), as follows.

  1. Technical capacity. Does the technology perform reliably and deliver accurate information?
  2. Diagnostic accuracy. Does the technology contribute to making an accurate diagnosis?
  3. Diagnostic impact. Do the diagnostic results influence use of other diagnostic technologies, e.g., does it replace other diagnostic technologies?
  4. Therapeutic impact. Do the diagnostic findings influence the selection and delivery of treatment?
  5. Patient outcome. Does use of the diagnostic technology contribute to improved health of the patient?
  6. Cost effectiveness. Does use of the diagnostic technology improve the cost effectiveness of health care compared to alternative interventions?

If a diagnostic technology is not effective at any step along this chain, then it is not likely to be effective at any subsequent step. Effectiveness at a given step does not imply effectiveness at a later step (Feeny 1986; Fineberg 1977; Institute of Medicine 1985). An often-cited hierarchy of studies for assessing diagnostic imaging technologies that is consistent with the chain of inquiry noted above is shown in Box II-11. A generic analytical framework of the types of evidence questions that could be asked about the impacts of a screening test is presented in Box II-12. Some groups have developed standards for assessing the quality of studies of the accuracy of screening and diagnostic tests, such as for conducting systematic reviews of the literature on those tests (Smidt 2006; Whiting 2011).

Box II-11. Hierarchical Model of Efficacy for Diagnostic Imaging: Typical Measures of Analysis

Level 1. Technical efficacy

  • Resolution of line pairs
  • Modulation transfer function change
  • Gray-scale range
  • Amount of mottle
  • Sharpness

Level 2. Diagnostic accuracy efficacy

  • Yield of abnormal or normal diagnoses in a case series
  • Diagnostic accuracy (% correct diagnoses in case series)
  • Sensitivity and specificity in a defined clinical problem setting
  • Measures of area under the ROC curve

Level 3. Diagnostic thinking efficacy

  • Number (%) of cases in a series in which image judged "helpful" to making the diagnosis
  • Entropy change in differential diagnosis probability distribution
  • Difference in clinicians' subjectively estimated diagnosis probabilities pre- to post-test information
  • Empirical subjective log-likelihood ratio for test positive and negative in a case series

Level 4. Therapeutic efficacy

  • Number (%) of times image judged helpful in planning management of patient in a case series
  • % of times medical procedure avoided due to image information
  • Number (%) of times therapy planned before imaging changed after imaging information obtained (retrospectively inferred from clinical records)
  • Number (%) of times clinicians' prospectively stated therapeutic choices changed after information obtained

Level 5. Patient outcome efficacy

  • % of patients improved with test compared with/without test
  • Morbidity (or procedures) avoided after having image information
  • Change in quality-adjusted life expectancy
  • Expected value of test information in quality-adjusted life years (QALYs)
  • Cost per QALY saved with imaging information
  • Patient utility assessment; e.g., Markov modeling; time trade-off

Level 6. Societal efficacy

  • Benefit-cost analysis from societal viewpoint
  • Cost-effectiveness analysis from societal viewpoint

Source: Thornbury JR, Fryback DG. Technology assessment − An American view. Eur J Radiol. 1992;14(2):147-56.

Box II-12. Example of Analytical Framework of Evidence Questions: Screening

  • Is screening test accurate for target condition?
  • Does screening result in adverse effects?
  • Do screening test results influence treatment decisions?
  • Do treatments change intermediate outcomes?
  • Do treatments result in adverse effects?
  • Do changes in intermediate outcomes predict changes in health outcomes?
  • Does treatment improve health outcomes?
  • Is there direct evidence that screening improves health outcomes?

Source: Adapted from: Harris RP, Helfand M, Woolf SH, et al. Current methods of the US Preventive Services Task Force. A review of the process. Am J Prev Med. 2001;20(3S):21-35.

For diagnostic (or screening) technologies that are still prototypes or in other early stages of development, there may be limited data on which to base answers to such questions as these. Even so, investigators and advocates of diagnostic technologies should be prepared to describe, at least qualitatively, how the technology might affect diagnostic accuracy, diagnostic impact, therapeutic impact, patient outcomes and cost effectiveness (where appropriate); how these effects might be measured; approximately what levels of performance would be needed to successfully implement the technology; and how further investigations should be conducted to make these determinations.

3. Timing of Assessment

There is no single correct time to conduct an HTA. It is conducted to meet the needs of a variety of policymakers seeking assessment information throughout the lifecycles of technologies. Regulators, payers, clinicians, hospital managers, investors, and others tend to make decisions about technologies at particular junctures, and each may subsequently reassess technologies. Indeed, the determination of a technology's stage of diffusion may be the primary purpose of an assessment. For insurers and other payers, technologies that are deemed “experimental” or “investigational” are usually excluded from coverage, whereas those that are established or generally accepted are usually eligible for coverage (Newcomer 1990; Reiser 1994; Singer 2001).

There are tradeoffs inherent in decisions regarding the timing for HTA. On one hand, the earlier a technology is assessed, the more likely its diffusion can be curtailed if it is unsafe or ineffective (McKinlay 1981). From centuries-old purging and bloodletting to the more recent autologous bone marrow transplantation with high-dose chemotherapy for advanced breast cancer, the list of poorly evaluated technologies that diffused into general practice before being found to be ineffective and/or harmful continues to grow. Box II-13 shows examples of health care technologies found to be ineffective or harmful after being widely diffused.

Box II-13. Technologies Found to be Ineffective or Harmful for Some or All Indications After Diffusion

  • Autologous bone marrow transplantation with high-dose chemotherapy for advanced breast cancer
  • Antiarrhythmic drugs
  • Bevacizumab for metastatic breast cancer
  • Colectomy to treat epilepsy
  • Diethylstilbestrol (DES) to improve pregnancy outcomes
  • Electronic fetal monitoring during labor without access to fetal scalp sampling
  • Episiotomy (routine or liberal) for birth
  • Extracranial-intracranial bypass to reduce risk of ischemic stroke
  • Gastric bubble for morbid obesity
  • Gastric freezing for peptic ulcer disease
  • Hormone replacement therapy for preventing heart disease in healthy menopausal women
  • Hydralazine for chronic heart failure
  • Intermittent positive pressure breathing
  • Mammary artery ligation for coronary artery disease
  • Magnetic resonance imaging (routine) for low back pain in first 6 weeks
  • Optic nerve decompression surgery for nonarteritic anterior ischemic optic neuropathy
  • Oxygen supplementation for premature infants
  • Prefrontal lobotomy for mental disturbances
  • Prostate-specific antigen (PSA) screening for prostate cancer
  • Quinidine for suppressing recurrences of atrial fibrillation
  • Radiation therapy for acne
  • Rofecoxib (COX-2 inhibitor) for anti-inflammation
  • Sleeping face down for healthy babies
  • Supplemental oxygen for healthy premature babies
  • Thalidomide for sedation in pregnant women
  • Thymic irradiation in healthy children
  • Triparanol (MER-29) for cholesterol reduction

Sources: Chou 2011; Coplen 1990; Enkin 2000; Feeny 1986; FDA Center for Drug Evaluation and Research 2010; Fletcher 2002; Grimes 1993; Mello 2001; The Ischemic Optic Neuropathy Decompression Trial Research Group 1995; Jüni 2004; Passamani 1991; Peters 2005; Rossouw 2002; Srinivas 2012; Toh 2010; US DHHS 1990, 1993; others.

On the other hand, to regard the findings of an early assessment as definitive or final may be misleading. An investigational technology may not yet be perfected; its users may not yet be proficient; its costs may not yet have stabilized; it may not have been applied in enough circumstances to recognize its potential benefits; and its long-term outcomes may not yet be known (Mowatt 1997). As one technology assessor concluded about the problems of when-to-assess: “It’s always too early until, unfortunately, it’s suddenly too late!” (Buxton 1987). Further, the “moving target problem” can complicate HTA. By the time an HTA is conducted, reviewed, and disseminated, its findings may be outdated by changes in a technology, how it is used, its competing technologies (comparators) for a given health problem (indication), the health problems for which it is used, and other factors (Goodman 1996). See chapter VI, Determine Topics for HTA, for further discussion of identification of candidate assessment topics, horizon scanning, setting assessment priorities, reassessment, and the moving target problem.

In recent years, the demand for HTA by health care decision makers has increasingly involved requests for faster responses to help inform emergent decisions. This has led to development of “rapid HTAs,” which are more focused, less comprehensive assessments designed to provide high-level responses to such decision maker requests within approximately four to eight weeks. See discussion of rapid HTA in chapter X, Selected Issues in HTA.

Among the factors affecting the timing of HTA is the sufficiency of evidence to undertake an HTA. One of the types of circumstances in which there are tradeoffs in “when to assess” is a coverage decision for a new technology (or new application of an existing technology) for which there is promising, yet non-definitive or otherwise limited, evidence. For some of these technologies, delaying any reimbursement until sufficient evidence is available for a definitive coverage decision could deny access for certain patients with unmet medical need who might benefit. Further, the absence of any reimbursement could slow the generation of evidence. In such instances, payers may provide for coverage with evidence development or other forms of managed entry of the technology in which reimbursement is made for particular indications or other well-defined uses of the technology in exchange for collection of additional evidence. See further discussion of managed entry in chapter X.
