Validity and Reliability of Clinical Tests - 6 minutes

Measures of Risk - 10 minutes

Epidemiological Biases - 5 minutes

Types of Studies - 10 minutes

Biostatistics

Some important miscellaneous terms

Generalizability

How applicable is a finding to the general population

P-value

Probability of finding a value at least this extreme by random chance, assuming the null hypothesis is true

Confidence Interval

Range of values expected to contain the true population value at a specified confidence level (e.g. 95%)

Efficacy

Performance of treatment under ideal circumstances

Effectiveness

Performance of treatment under real world circumstances

Precision is repeatability, Accuracy is closeness

Describing Distributions

Statistical distributions have invariant properties

Question #1

Investigators are studying prostate specific antigen (PSA) as a predictor for prostate cancer. To make the statistics easier, they are going to assume that PSA is a normally distributed population variable. Which of the following is correct under their assumption?

Mode is greater than median

Median is greater than mode

95% CI depends on degrees of freedom

Median is equal to mean

Mean is equal to standard deviation

The normal distribution is unimodal and symmetric.

The important invariant properties (for you) of normal distributions are the following:

Mean = Median = Mode

Unimodal

Symmetric

Area under curve is 1

Constant relationship between standard deviation and percentiles
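The last property can be checked numerically. A minimal sketch in R, assuming a standard normal distribution (the variable names are illustrative and not from the slides):

```r
# Proportion of a normal distribution within k standard deviations of the mean
k <- c(1, 2, 3)
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)

## [1] 0.6827 0.9545 0.9973
```

Because the relationship is constant, the same proportions hold for any mean and standard deviation.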

Real distributions can have one or multiple peaks

Skew describes the direction of the tail

Question #2

Which of the following corresponds to the measures of central tendency on the graph from left to right?

mean, median, mode

mode, mean, median

median, mode, mean

mode, median, mean

mean, mode, median

Mode is most common, median is middle, mean is average value.

Always remember that the y-axis on these plots is count or frequency. Therefore, whichever line is closest to the peak is the mode. The median is always in the middle. The mean is the most susceptible to outliers, so in a skewed distribution it will always be farthest out on the tail.
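The ordering described above can be seen on a small sample. This is only a sketch: the data are hypothetical values chosen to have a long right tail, and the mode is computed by hand because base R has no built-in function for it.

```r
# Hypothetical right-skewed sample (long tail toward large values)
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14)

# Mode: the most frequent value, taken from a frequency table
mode_x   <- as.numeric(names(which.max(table(x))))
median_x <- median(x)
mean_x   <- mean(x)

c(mode = mode_x, median = median_x, mean = mean_x)
```

For this sample the mode is 2, the median is 3, and the mean is 4.5, so the mean sits farthest out toward the tail.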

Hypothesis Testing

The null hypothesis (\(H_0\)) is always the default

Assume:

There are two or more groups being compared, or

One group is being compared to zero, or

One group is being compared to expectation.

For Step 1, probably safe to assume null is always rejected with \(p < 0.05\).

For ratios (e.g. Relative Risk, Odds Ratio), a 95% CI not overlapping 1 is significant.

For two sample tests, it is less straightforward how the CI relates to the p-value

Once \(H_0\) is rejected, we accept the alternative hypothesis \(H_A\).

T-test compares means of one or two groups

One sample: \(H_0\) = There is no difference between group mean and zero

Two sample: \(H_0\) = There is no difference between the disease and no disease groups

Paired: \(H_0\) = The difference of a measured variable between two time points on the same individuals is zero
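The three variants map onto three ways of calling `t.test` in R. The sketch below uses simulated, hypothetical measurements (the group names, means, and seed are assumptions for illustration only):

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical measurements; none of these numbers come from the slides
disease    <- rnorm(30, mean = 5.5, sd = 1)
no_disease <- rnorm(30, mean = 5.0, sd = 1)
before     <- rnorm(30, mean = 5.0, sd = 1)
after      <- before + rnorm(30, mean = 0.3, sd = 0.5)

p_one    <- t.test(disease, mu = 0)$p.value               # one sample: mean vs zero
p_two    <- t.test(disease, no_disease)$p.value           # two sample (Welch by default)
p_paired <- t.test(after, before, paired = TRUE)$p.value  # paired: same individuals twice
c(one_sample = p_one, two_sample = p_two, paired = p_paired)
```

The paired call tests the per-individual differences, which is why both vectors must have the same length and order.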

Will the plot be significant?

Run the t-test

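# Simulate two groups of 5000 whose means differ by 0.5 (same SD)
norm1 <- rnorm(5000, mean = 4.75, sd = 1.2)
norm2 <- rnorm(5000, mean = 5.25, sd = 1.2)
# Two-sample (Welch) t-test; extract only the p-value
t.test(norm1, norm2)$p.value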

## [1] 1.505077e-95

Have we rejected the null hypothesis?

Yes, we have accepted \(H_A\). There is a difference between disease and no disease groups.

Chi-squared test uses categorical (count) data

Two common tests

Goodness-of-fit

Test of independence

Goodness-of-fit

\(H_0\): The number of cases occurring in a subgroup is consistent with expected

\(H_A\): The number of cases occurring in a subgroup is not consistent with expected

Test of independence

\(H_0\): Categorical variable A and categorical variable B are independent

\(H_A\): Categorical variable A and categorical variable B are not independent

Always expect a contingency table for chi-squared

|  | Healthy | Disease | Total |
|---|---|---|---|
| Exposed | 40 | 60 | 100 |
| Not Exposed | 500 | 400 | 900 |
| Total | 540 | 460 | 1000 |

Table 1: A 2x2 contingency table
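A test of independence on the counts in Table 1 could be run as sketched below. The object names are illustrative; `chisq.test` is the base R function and applies a continuity correction to 2x2 tables by default.

```r
# Counts from Table 1: rows are exposure, columns are outcome
tab <- matrix(c(40, 60,
                500, 400),
              nrow = 2, byrow = TRUE,
              dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                              Outcome  = c("Healthy", "Disease")))

res <- chisq.test(tab)  # Yates' continuity correction is applied for 2x2 tables
res$expected            # counts expected if exposure and outcome were independent
res$p.value
```

The expected counts come only from the row and column totals, which is why the contingency table is all the test needs.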

| Exposure Status | Never Sick | Sometimes Sick | Mostly Sick | Total |
|---|---|---|---|---|
| High | 10 | 20 | 180 | 210 |
| Medium | 20 | 100 | 20 | 140 |
| Low | 100 | 40 | 10 | 150 |
| Total | 130 | 160 | 210 | 500 |

Table 2: A 3x3 contingency table

The contingency table can be of any size

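| Exposure Status | Never Sick | Infrequently Sick | Sometimes Sick | Mostly Sick | Always Sick | Total |
|---|---|---|---|---|---|---|
| Super High | 10 | 90 | 34 | 12 | 12 | 158 |
| Very High | 30 | 345 | 54 | 43 | 21 | 493 |
| High | 70 | 57 | 67 | 65 | 32 | 291 |
| Medium | 200 | 33 | 87 | 25 | 42 | 387 |
| Low | 130 | 89 | 58 | 45 | 56 | 378 |
| Very Low | 100 | 54 | 36 | 23 | 78 | 291 |
| Super Low | 90 | 23 | 36 | 63 | 8 | 220 |
| Total | 630 | 691 | 372 | 276 | 249 | 2218 |

Table 3: A 7x5 contingency table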

Pearson correlation compares two variables

The correlation can be positive or negative

For correlation, r is the critical statistic

Must be quantitative data

Not count data

\(r =\) correlation between variables

\(r^2 = \) amount of variance in y that is explained by x

p-value is still used for significance

For Step 1, most likely significant at \(p < 0.05\)

A wider spread in \(y\) means a lower \(r^2\)
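The link between spread in \(y\) and \(r^2\) can be illustrated with simulated data. This is a sketch under assumed values: the slope, noise levels, and seed are hypothetical and chosen only to make the contrast visible.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical data: y depends linearly on x plus noise
x       <- rnorm(200)
y_tight <- 2 * x + rnorm(200, sd = 0.5)  # little scatter around the line
y_wide  <- 2 * x + rnorm(200, sd = 3)    # much more scatter around the line

r_tight <- cor(x, y_tight)  # Pearson correlation is the default method
r_wide  <- cor(x, y_wide)
c(r2_tight = r_tight^2, r2_wide = r_wide^2)

cor.test(x, y_wide)$p.value  # p-value for the correlation
```

Both relationships have the same underlying slope; only the scatter differs, and that alone lowers \(r^2\).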

Question #3

|  | Clot | No Clot | Total |
|---|---|---|---|
| OCP Use | 500 | 400 | 900 |
| No OCP Use | 80 | 20 | 100 |
| Total | 580 | 420 | 1000 |

A study was conducted to assess the association between oral contraceptive (OCP) use and confirmed blood clots. The data from the study are presented above. Which of the following is the best method to assess the association between OCP use and blood clots?

Two sample T-test

Analysis of variance

Pearson correlation

Chi-square test

Spearman correlation

What kind of data is this?

The only test listed that uses categorical data is the Chi-square test. All of the other tests require at least rank or quantitative data.

Question #4

Investigators developed a new serum biomarker as a predictor for prostate cancer. To test it, they plan a cross-sectional study comprised of two groups. In one group, the researchers will include measurements of men with biopsy confirmed prostate cancer. In the other group, researchers will measure the level of their biomarker in men that have never previously been diagnosed with prostate cancer nor had a positive PSA test. The investigators will assume their biomarker is normally distributed. What is the best test to investigate whether the biomarker can distinguish the two groups?

Two sample Mann-Whitney U-test

Pearson correlation

Two sample T-test

Chi-squared test

Analysis of variance

The number of groups and the distribution are all that matter

The two sample T-test is the appropriate test in this case. The two sample Mann-Whitney U-test could work as well, but is slightly less efficient for normally distributed data than the T-test. The Pearson correlation requires two measured variables on the same sample. A chi-squared test requires categorical (i.e. count) data. An analysis of variance is typically used to measure the difference in means of three or more groups.

Hypothesis testing has four possible outcomes

Correct - Reject a false \(H_0\)

Probability of success is called "power"

Power depends on sample size

bigger sample = bigger power

Correct - Fail to reject a true \(H_0\)

Probability determined by \(\alpha\) as \(1-\alpha\)

Type 1 - Incorrect rejection of a true \(H_0\)

False Positive

Type 2 - Failure to reject a false \(H_0\)

False Negative
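The statement that a bigger sample gives bigger power can be checked with `power.t.test` from base R. As a sketch, the difference of 0.5 and standard deviation of 1.2 are borrowed from the simulated t-test earlier in these slides; the sample sizes are arbitrary choices.

```r
# Power of a two-sample t-test to detect a difference of 0.5 (SD = 1.2)
# at alpha = 0.05, for increasing per-group sample sizes
n <- c(10, 50, 100, 500)
power <- sapply(n, function(k) {
  power.t.test(n = k, delta = 0.5, sd = 1.2, sig.level = 0.05)$power
})
round(setNames(power, n), 2)
```

Power is one minus the Type 2 error rate, so the same numbers also show false negatives becoming less likely as the sample grows.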

Epidemiology

Types of prevention

Primary - Prevention

An action taken to prevent development of disease in a person who is well

Secondary - Screening

Identifying people in whom disease has begun but who do not have signs or symptoms

Tertiary - Treatment

Preventing complications in those who have developed signs and symptoms and have been diagnosed

Quaternary - Quit overtesting and overtreating

Recent effort to minimize excessive healthcare interventions in disease process

Endemic vs Sporadic vs Epidemic vs Pandemic

Statistical differences lie in setting and time frame

Attack rate

Typically used during epidemics or pandemics

Number of people who get disease / Number of people at risk

Incidence

Given a defined period of time

Number of people who get disease / Number of people at risk

Prevalence

No time course (i.e. measured at a single point in time)

Number of people with disease / Total number of people in the population

Simple diseases (e.g. SIR infections): Prevalence = Incidence x Average Disease Duration

Tests are usually cutoffs on a continuous variable

Sensitivity is true positives / number with disease

Specificity is true negatives / number w/o disease

PPV and NPV vary based on pre-test probability

Positive Predictive Value

Chance that person has the disease after a positive test result

\(PPV = TP / (TP + FP)\)

Negative Predictive Value

Chance that person does not have disease after a negative test result

\(NPV = TN / (TN + FN)\)

Both depend on how prevalent the disease is in the population
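The dependence on prevalence can be made concrete with a short calculation. A minimal sketch, assuming a hypothetical test with 90% sensitivity and 90% specificity; the function name `predictive_values` is illustrative.

```r
# PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)
predictive_values <- function(sens, spec, prev) {
  tp <- sens * prev              # true positives, as a fraction of everyone
  fn <- (1 - sens) * prev        # false negatives
  tn <- spec * (1 - prev)        # true negatives
  fp <- (1 - spec) * (1 - prev)  # false positives
  c(PPV = tp / (tp + fp), NPV = tn / (tn + fn))
}

# Same hypothetical test at two prevalences
low_prev  <- predictive_values(sens = 0.9, spec = 0.9, prev = 0.01)
high_prev <- predictive_values(sens = 0.9, spec = 0.9, prev = 0.50)
round(rbind(low_prev, high_prev), 3)
```

With these assumed values the PPV rises from about 8% at 1% prevalence to 90% at 50% prevalence, while the NPV falls, even though the test itself has not changed.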

This is what real diseases look like in the population

This is the real prevalence of HIV... Where would you put the cutoff?

Question #5

Assume a steady-state population that is not changing in any way. Which of the following statements is true for people who test positive regarding moving the cutoff for a positive test from the solid to the dotted line?

Decrease in test specificity

Increase in test sensitivity

Increase in PPV

Increase in NPV

Decrease in NPV

Question prefaces a positive test result

Incorrect - Moving the line to the right increases the specificity because it captures more true negatives as a proportion of all disease-free individuals

Incorrect - Moving the line to the right decreases the sensitivity because it captures fewer true positives as a proportion of all diseased individuals

Correct - Moving the line to the right increases positive predictive value: it raises the proportion of true positives among all positive tests by reducing the number of false positives

Incorrect - The question is concerned with positive tests, which do not factor into negative predictive value

Incorrect - The question is concerned with positive tests, which do not factor into negative predictive value
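The effect of moving the cutoff to the right can be sketched with two overlapping normal curves. Everything here is an assumption for illustration: the distributions, the 10% prevalence, and the two cutoff values are hypothetical, and a value above the cutoff is treated as a positive test.

```r
# Assumed model: test values are N(0, 1) in healthy and N(2, 1) in diseased
# people, with 10% prevalence. A value above the cutoff is a positive test.
test_metrics <- function(cutoff, prev = 0.10) {
  sens <- 1 - pnorm(cutoff, mean = 2, sd = 1)  # diseased above the cutoff
  spec <- pnorm(cutoff, mean = 0, sd = 1)      # healthy below the cutoff
  ppv  <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
  c(sensitivity = sens, specificity = spec, PPV = ppv)
}

solid  <- test_metrics(cutoff = 1.0)  # original cutoff
dotted <- test_metrics(cutoff = 1.5)  # cutoff moved to the right
round(rbind(solid, dotted), 3)
```

Under these assumptions sensitivity falls while specificity and PPV rise, matching the reasoning in the answer explanations.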

Odds and risk connect disease with exposure

Odds

Among people with an exposure, the number who get disease divided by the number who do not

Odds ratio (OR)

Excess odds of exposure of one population relative to another

Risk - Must know disease prevalence

Probability that someone with an exposure will get a disease

Risk Ratio (Relative Risk or RR)

Excess risk of one population relative to another

Both significant if CI does not include 1

Question #6

Investigators are studying the association between mesothelioma and asbestos exposure. Due to the relative rarity of the disease, they design a very large case-control study. In the end, they find an \(OR = 20\) (95% CI 19.54-20.52, \(p < 0.001\)). After assuming that the OR is a good approximation of risk, the authors conclude that the risk of mesothelioma is 20 times higher in those exposed to asbestos compared to control. Why is their assumption reasonable?

The incidence of mesothelioma in the population is low

The sample size of this study is very large

The result is highly significant

The OR is always a good approximation of outcome risk

The 95% CI is very narrow around the OR of 20

Think about the denominators for odds and risks.

Label the 2x2 table so that A and B are the exposed with and without disease, and C and D are the unexposed with and without disease. The odds ratio is (A / B) / (C / D) and the risk ratio is (A / (A + B)) / (C / (C + D)). When the disease is rare, A and C are very small relative to B and D. In that case, B is a good approximation of A + B and D is a good approximation of C + D. Thus, RR ~ (A / B) / (C / D) = OR.

OR approximates RR in low prevalence diseases

If true infections are low, denominator \(A+B \approx B\) and \(C+D \approx D\)
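The approximation can be checked with two hypothetical 2x2 tables, one for a rare disease and one for a common disease. The counts and function names below are made up for illustration; A and B are the exposed with and without disease, C and D the unexposed with and without disease.

```r
# A, B = exposed with / without disease; C, D = unexposed with / without disease
risk_ratio <- function(A, B, C, D) (A / (A + B)) / (C / (C + D))
odds_ratio <- function(A, B, C, D) (A / B) / (C / D)

# Rare disease (hypothetical counts): OR is close to RR
rare <- c(RR = risk_ratio(20, 9980, 10, 9990),
          OR = odds_ratio(20, 9980, 10, 9990))

# Common disease (hypothetical counts): OR overstates RR
common <- c(RR = risk_ratio(600, 400, 300, 700),
            OR = odds_ratio(600, 400, 300, 700))

round(rbind(rare, common), 3)
```

In the rare case both measures are about 2. In the common case the RR is still 2 but the OR is 3.5, so the OR would overstate the risk.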

Question #7

Two studies were conducted on different samples from the same population to assess the relationship between oral contraceptive use and the risk of deep venous thrombosis (DVT). Study A showed an increased risk of DVT among oral contraceptive users, with a relative risk of 2.0 and a 95% CI of 1.2-2.8. Study B showed a relative risk of 2.05 and a 95% CI of 0.8-3.1. Which of the following statements is most likely to be true regarding these 2 studies?

The p-value in study B is likely to be < 0.05

The result in study A is not accurate

The result in study A is not statistically significant

The result in study B is likely biased

The sample size is likely smaller in study B than study A

What gives a narrower confidence interval?

Incorrect - The CI in study B overlaps 1 so it is not significant

Incorrect - It is hard to judge accuracy without knowing the objective Truth

Incorrect - The CI in study A does not include 1 so it is statistically significant

Incorrect - There is no reason to believe B is biased

Correct - A bigger sample gives a narrower confidence interval and more power to reject a false null hypothesis, so the wider CI in study B suggests a smaller sample

Absolute risk reduction is a risk difference

Reminder

Exposed: \(Risk = A / (A + B)\)

Unexposed: \(Risk = C / (C + D)\)

\(AR = Risk_{Exposed} - Risk_{Unexposed}\)

\(ARR = Risk_{Control} - Risk_{Treatment}\)

Number needed to treat

Number of patients who must be treated for ONE patient to benefit

\(NNT = 1 / ARR\)

\(NNH = 1 / AR\)
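A short worked example of the ARR and NNT formulas, using a hypothetical trial (the counts are invented for illustration and do not come from any question in these slides):

```r
# Hypothetical trial: number with the bad outcome in each arm of 100 patients
control_events   <- 50
treatment_events <- 25
n_per_arm        <- 100

risk_control   <- control_events / n_per_arm    # 0.50
risk_treatment <- treatment_events / n_per_arm  # 0.25

arr <- risk_control - risk_treatment  # absolute risk reduction: 0.25
nnt <- 1 / arr                        # number needed to treat: 4
c(ARR = arr, NNT = nnt)
```

Here 4 patients need to be treated to prevent one bad outcome. NNH follows the same pattern with the attributable risk of a harm in place of the ARR.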

Types of Biases - My groupings

Biases of design or unseen variables

Selection bias

Non-random partitioning of individuals into groups

Observer-expectancy

Observer is unblinded and expects a particular outcome

Hawthorne effect

Subjects improve health behaviors because someone is watching

Effect modification bias

Magnitude of effect varies by third variable

Handled by stratification: report the effect separately within each stratum of the third variable

Confounding

Unseen third variable is an underlying cause for correlation of two other variables

Cannot be eliminated by stratification unless the confounder is identified and measured

Biases of information (measurement)

Recall bias

Subjects with disease can recall exposures better than healthy subjects

Procedure bias

Experimenters vary systematically in the way they do work

e.g. Experimenters don't follow the specified procedure

Instrument bias

Instrument is broken

Instruments can also be things like surveys or clerkship evaluations

Just means instrument is not reliable

Biases of time and completion

Lead-time bias

New test detects disease earlier

Survival appears improved with new test

Attrition bias

Subjects systematically withdraw

Could be things like side effects or lack of improvement

Loss to follow-up

Subjects randomly do not report for scheduled follow-up

Types of studies

The pyramid of evidence is a hierarchy

Closer to the top means better evidence

Experimental Trials

Randomized control trial is in the name

Randomized control trials are the gold standard

This is widely considered the gold standard for clinical evidence

Question: Primary purpose of randomization?

Answer: To eliminate selection bias

Selection bias is eliminated if randomization is technically correct

Question: Secondary goal of randomization?

Answer: To control confounders

Confounders are not necessarily eliminated even with perfect technical execution

Can use relative risk because investigator knows prevalence of disease and prior exposures

Crossover trial means the two groups switch

This post hoc analysis is overly simplified for real life

This understanding is sufficient for step 1

Confounders reduced because a patient can serve as their own control

Observational Studies

Prospective cohorts follow groups into the future

Retrospective cohorts follow groups from the past

Cohorts form the next level of evidence

Can use relative risk because investigator knows prevalence of exposure and disease

Case-control trials measure chance of exposure given disease

Case-control forms the next level down from cohorts

Must use odds ratio because investigator does not know prevalence of disease

Subjects grouped by cases and controls

Measure odds of exposure in case and control groups

Significantly improved power and decreased resource requirements compared to cohorts

Due to cases being selected at the outset

Selection and Recall biases are the biggest problem

Selecting appropriate controls is highly non-trivial

Sick people remember exposures (e.g. Melanoma patients stew about their sunburns)

Also common

Information biases

Cannot calculate incidence or prevalence

Cross-sectional trials measure exposure and disease simultaneously

Cross-sectional studies form the next level of evidence

Quick, cheap, and easy

Typically this is a starting point

Can establish prevalence of disease

Must use chi-squared or correlation for statistical test

Subjects can be grouped by exposure and disease into a 2x2 contingency table

Cannot establish causation

Cannot calculate risk metrics

Question #8

A study was conducted to evaluate the efficacy of a new antiviral drug for the treatment of the common cold in young children. The study population consisted of 100 children between the ages of 2 and 8 years. These children were diagnosed with rhinovirus infection and subsequently given the particular antiviral drug. One week later, it was observed that 92 of the 100 patients were asymptomatic. Which of the following is the true conclusion of this study?

The drug is highly effective as the effectiveness is 90%

The drug is moderately effective as the efficacy is 90%

An exact conclusion cannot be drawn from the study

The drug is not effective as the sample size is very small

No conclusion can be made, as compliance is generally very low in small children

A treatment is tested without a control

Incorrect - We can't compare to a real-world control.

Incorrect - We can't compare to an ideal control.

Correct - Most people will recover from a cold in a week or so.

Incorrect - The sample size may be adequate. There are no statistical tests to evaluate this statement.

Incorrect - Compliance would not be an issue in this case.

Question #9

A group of researchers are studying the relationship between mutations in HMG-CoA reductase and coronary heart disease (CHD). The study population is selected at random. Tissue samples are obtained for genotyping and stress echocardiograms are performed to assess CHD. In the subsequent paper, the authors conclude that there is an association between particular mutations in HMG-CoA reductase and CHD. Which of the following study designs did the authors utilize?

Retrospective cohort study

Cross-sectional study

Randomized clinical trial

Prospective cohort study

Case-control trial

What does the timeline look like?

Incorrect - A retrospective cohort starts at some point in the past. There is no indication of a past time or chart review in this study.

Correct - A cross-sectional study is a "snap shot". It simultaneously determines both risk factors and disease. It can establish an association, but it cannot say much about causation because the timeline is unknown.

Incorrect - Although patients are randomly selected, a randomized clinical trial requires a control group and requires some treatment under investigation.

Incorrect - A prospective cohort starts in the present and follows a group into the future. There is no indication of time or following patients or recording exposure.

Incorrect - A case-control trial requires identifying cases with disease and controls without disease, then identifying exposures, and calculating the odds of exposure given disease. There is no indication of that here.

Question #10

A study was conducted to evaluate the efficacy of a new antiviral drug for the treatment of the common cold in young children. The study population consisted of 100 children between the ages of 2 and 8 years divided into control and treatment arms. These children were diagnosed with rhinovirus infection and subsequently given the particular antiviral drug. One week later, it was observed that 42 out of 50 treatment patients were asymptomatic and 30 of 50 control patients were asymptomatic. How many people need to be treated with this drug for one to reach the primary end point of this study?