 # Biostatistics and Epidemiology

## A brief introduction

Austin Meyer, PhD
MS4

## Roadmap - Exactly 90 minutes...

Basic Statistics

• Accuracy versus Precision - 2 minutes
• Statistical inference - 3 minutes
• Distributions - 5 minutes
• Hypothesis testing - 10 minutes

Epidemiology

• Prevention and Outbreaks - 5 minutes
• Disease metrics - 5 minutes
• Measures of risk - 10 minutes
• Clinical test characteristics - 15 minutes
• Minimal Bayesian statistics - 10 minutes
• Study bias - Mostly for independent study
• Types of studies - 10 minutes
• Clinical application - 15 minutes

## There are some important random terms

• Generalizability

• How applicable is a finding to the general population
• P-value

• Probability of finding a value at least this extreme by random chance, assuming the null hypothesis is true
• Confidence Interval

• Interval over which population value is found with a specified probability (e.g. 95%)
• Efficacy

• Performance of treatment under ideal circumstances
• Effectiveness

• Performance of treatment under real world circumstances

## Precision is repeatability, Accuracy is closeness

## Statistical distributions have invariant properties

## Question #1

Investigators are studying prostate specific antigen (PSA) as a predictor for prostate cancer. To make the statistics easier, they are going to assume that PSA is a normally distributed population variable. Which of the following is correct under their assumption?

1. Mode is greater than median
2. Median is greater than mode
3. 95% CI depends on degrees of freedom
4. Median is equal to mean
5. Mean is equal to standard deviation

The normal distribution is unimodal and symmetric.

The important invariant properties (for you) of normal distributions are the following:

1. Mean = Median = Mode
2. Unimodal
3. Symmetric
4. Area under curve is 1
5. Constant relationship between standard deviation and percentiles
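The constant relationship between standard deviation and percentiles can be checked numerically. The following is a minimal sketch in R using the built-in normal distribution functions; the helper name `within_sd` is made up for illustration.

```r
# Fraction of a normal distribution lying within k standard deviations of the mean.
# pnorm() is the normal cumulative distribution function from base R.
within_sd <- function(k) pnorm(k) - pnorm(-k)

coverage <- sapply(1:3, within_sd)
print(round(coverage, 4))  # roughly 68%, 95%, and 99.7%

# The total area under the density curve is 1.
area <- integrate(dnorm, -Inf, Inf)$value
print(area)
```

These fractions are the same for every normal distribution, whatever its mean and standard deviation.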

## Real distributions can have one or multiple peaks

## Skew describes the direction of the tail

## Question #2

Which of the following corresponds to the measures of central tendency on the graph from left to right?

1. mean, median, mode
2. mode, mean, median
3. median, mode, mean
4. mode, median, mean
5. mean, mode, median

Mode is most common, median is middle, mean is average value.

Always remember that the y-axis on these plots is count or frequency. Therefore, whichever line is closest to the peak is the mode. The median is always in the middle. The mean is the most susceptible to outliers, so in a skewed distribution it will always be farthest out on the tail.

## The null hypothesis ($H_0$) is always the default

• Assume:
• There are two or more groups being compared, or
• One group is being compared to zero, or
• One group is being compared to expectation.

• For Step 1, probably safe to assume null is always rejected with $p < 0.05$.
• For ratios (e.g. Relative Risk, Odds Ratio), a 95% CI not overlapping 1 is significant.
• For two sample tests, it is less straightforward how the CI relates to the p-value so don't worry about it.

• Once $H_0$ is rejected, we accept the alternative hypothesis $H_A$.

## T-test compares means of one or two groups

• One sample test
• $H_0$ = There is no difference between group mean and zero

• Two sample test
• $H_0$ = There is no difference between the disease and no disease groups

• Paired test
• $H_0$ = The difference of a measured variable between two time points on the same individuals is zero

Will the plot be significant?

## T-test compares means of one or two groups

• Two sample: $H_0$ = There is no difference between the disease and no disease groups

• Run the T-test (in this case, in R language)

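    # Simulate a measured value in two groups of 5000 subjects each
    norm1 <- rnorm(5000, mean = 4.75, sd = 1.2)
    norm2 <- rnorm(5000, mean = 5.25, sd = 1.2)
    # Two sample t-test (Welch by default); extract only the p-value
    t.test(norm1, norm2)$p.value

    ## [1] 4.687206e-88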


• Have we rejected the null hypothesis?

• Yes, we have accepted $H_A$. There is a difference between disease and no disease groups.

## Chi-squared test uses categorical (count) data

• Two common tests
• Goodness-of-fit
• Test of independence
• Goodness-of-fit
• $H_0$: The number of cases occurring in a subgroup is consistent with random expectation
• $H_A$: The number of cases occurring in a subgroup is not consistent with random expectation
• Test of independence
• $H_0$: Categorical variable A and categorical variable B are independent
• $H_A$: Categorical variable A and categorical variable B are not independent

## Always expect a contingency table for chi-squared

| | Healthy | Disease | Total |
|---|---|---|---|
| Exposed | 40 | 60 | 100 |
| Not Exposed | 500 | 400 | 900 |
| Total | 540 | 460 | 1000 |

Table 1: A 2x2 contingency table

| Exposure Status | Never Sick | Sometimes Sick | Mostly Sick | Total |
|---|---|---|---|---|
| High | 10 | 20 | 180 | 210 |
| Medium | 20 | 100 | 20 | 140 |
| Low | 100 | 40 | 10 | 150 |
| Total | 130 | 160 | 210 | 500 |

Table 2: A 3x3 contingency table
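As a rough sketch of how a test of independence runs on such a table, the counts from Table 1 can be passed to R's built-in `chisq.test`. The object names `tab` and `res` are just illustrative.

```r
# Counts from Table 1: rows are exposure status, columns are outcome.
tab <- matrix(c(40, 60,
                500, 400),
              nrow = 2, byrow = TRUE,
              dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                              Outcome  = c("Healthy", "Disease")))

# Test of independence. For 2x2 tables chisq.test() applies Yates'
# continuity correction by default.
res <- chisq.test(tab)

print(res$expected)  # counts expected if exposure and outcome were independent
print(res$p.value)   # compare against 0.05 to decide whether to reject H0
```

The same call works unchanged on larger tables; only the degrees of freedom change.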

## The contingency table can be of any size

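| Exposure Status | Never Sick | Infrequently Sick | Sometimes Sick | Mostly Sick | Always Sick | Total |
|---|---|---|---|---|---|---|
| Super High | 10 | 90 | 34 | 12 | 12 | 158 |
| Very High | 30 | 345 | 54 | 43 | 21 | 493 |
| High | 70 | 57 | 67 | 65 | 32 | 291 |
| Medium | 200 | 33 | 87 | 25 | 42 | 387 |
| Low | 130 | 89 | 58 | 45 | 56 | 378 |
| Very Low | 100 | 54 | 36 | 23 | 78 | 291 |
| Super Low | 90 | 23 | 36 | 63 | 8 | 220 |
| Total | 530 | 691 | 372 | 276 | 249 | 2118 |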

Table 3: A 7x5 contingency table

## Pearson correlation compares values of two variables

## Spearman correlation compares ranked values of two variables

## For correlation, r is the critical statistic

• Must be quantitative data
• Not count data

• $r =$ correlation between variables

• $r^2 =$ proportion of variance in y that is explained by x

• p-value is still used for significance
• For Step 1, most likely significant at $p < 0.05$

## A wider spread in $y$ means a lower $r^2$

## Question #3

| | Clot | No Clot | Total |
|---|---|---|---|
| OCP Use | 500 | 400 | 900 |
| No OCP Use | 80 | 20 | 100 |
| Total | 580 | 420 | 1000 |

A study was conducted on OCPs and blood clots, and the data is shown. Which of the following is the best method to assess the association between OCP use and blood clots?

1. Two sample T-test
2. Analysis of variance
3. Pearson correlation
4. Chi-square test
5. Spearman correlation

What kind of data is this?

The only test available that utilizes categorical data is the Chi-square test. All of the other tests require at least rank or quantitative data.

## Question #4

To test a new biomarker, investigators plan a cross-sectional study comprised of two groups. In one group, the researchers will include men with confirmed prostate cancer. In the other group, researchers will include men with no evidence of prostate cancer. The investigators will assume their biomarker is normally distributed. What is the best test to investigate whether the biomarker can distinguish the two groups?

1. Two sample Mann-Whitney U-test
2. Pearson correlation
3. Two sample T-test
4. Chi-squared test
5. Analysis of variance

The number of groups and the distribution are all that matter

The two sample T-test is the appropriate test in this case. The two sample Mann-Whitney U-test could work as well, but is slightly less efficient for normally distributed data than the T-test. The Pearson correlation requires two measured variables on the same sample. A chi-squared test requires categorical (i.e. count) data. An analysis of variance is typically used to measure the difference in means of three or more groups.

## Hypothesis testing has four possible outcomes

• Correct - Reject a false $H_0$
• Probability of success is called "power"
• Power depends on sample size
• bigger sample = bigger power
• Correct - Fail to reject a true $H_0$
• Probability determined by $\alpha$ as $1-\alpha$
• Type 1 - Incorrect rejection of a true $H_0$
• False Positive
• Type 2 - Failure to reject a false $H_0$
• False Negative
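To see how power grows with sample size, here is a small sketch using R's built-in `power.t.test`. The effect size (difference in means of 0.5), the standard deviation (1.2), and $\alpha = 0.05$ are assumed values chosen for illustration, and `power_at` is a made-up helper name.

```r
# power.t.test() (base R, stats package) computes power for a two sample t-test.
# Assumed values for illustration only: true difference in means of 0.5,
# standard deviation of 1.2, and alpha = 0.05.
power_at <- function(n) {
  power.t.test(n = n, delta = 0.5, sd = 1.2, sig.level = 0.05)$power
}

sizes  <- c(20, 50, 100, 500)        # subjects per group
powers <- sapply(sizes, power_at)
print(data.frame(n_per_group = sizes, power = round(powers, 3)))
```

With the effect size held fixed, each increase in the per-group sample size raises the probability of rejecting a false $H_0$.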

## Types of prevention

• Primary - Prevention
• An action taken to prevent development of disease in a person who is well
• Secondary - Screening
• Identifying people in whom disease has begun but who do not have signs or symptoms
• Tertiary - Treatment
• Preventing complications in those who have developed signs and symptoms and have been diagnosed
• Quaternary - Quit overtesting and overtreating
• Recent effort to minimize excessive healthcare interventions in disease process

## Endemic vs Sporadic vs Epidemic vs Pandemic

## Statistical differences lie in setting and time frame

• Attack rate

• Typically used during epidemics or pandemics
• Number of people who get disease / Number of people at risk
• Incidence

• Given a defined period of time
• Number of people who get disease / Number of people at risk
• Prevalence

• No time course (i.e. measured at a single point in time)
• Number of people with disease / Number of people in the population
• Simple diseases (e.g. SIR infections): Prevalence = Incidence x Average Disease Duration

## Odds and risk connect disease with exposure

• Odds
• Probability that an event occurs divided by the probability that it does not: $\frac{P}{1 - P}$
• Odds ratio (OR)
• Excess odds of exposure of one population relative to another
• Risk - Must know disease prevalence
• Probability that someone with an exposure will get a disease
• Risk Ratio (Relative Risk or RR)
• Excess risk of one population relative to another
• Both significant if CI does not include 1

## Question #6

Investigators are studying the association between mesothelioma and asbestos exposure. Due to the relative rarity of the disease, they design a very large case-control study. In the end, they find an $OR = 20\ (19.54;20.52,\ p < 0.001)$. After assuming that the OR is a good approximation of risk, the authors conclude that the risk of mesothelioma is 20 times higher in those exposed to asbestos compared to control. Why is their assumption reasonable?

1. The incidence of mesothelioma in the population is low
2. The sample size of this study is very large
3. The result is highly significant
4. The OR is always a good approximation of outcome risk
5. The 95% CI is very narrow around the OR of 20

Think about the denominators for odds and risks.

Let A and B be the exposed people with and without disease, and C and D the unexposed people with and without disease. The odds ratio is (A / B) / (C / D) and the risk ratio is (A / (A + B)) / (C / (C + D)). When the number of people with the disease is small, A and C become very small. In that case, B is a good approximation of A + B and D is a good approximation of C + D. Thus, the RR ~ (A / B) / (C / D).

## OR approximates RR in low prevalence diseases

If true infections are low, denominator $A+B \approx B$ and $C+D \approx D$
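The approximation can be checked with a quick sketch in R. All counts below are hypothetical, chosen only to contrast a rare disease with a common one, and the function names are illustrative.

```r
# A = exposed with disease,   B = exposed without disease
# C = unexposed with disease, D = unexposed without disease
risk_ratio <- function(A, B, C, D) (A / (A + B)) / (C / (C + D))
odds_ratio <- function(A, B, C, D) (A / B) / (C / D)

# Hypothetical rare disease: OR and RR nearly coincide.
rare <- c(RR = risk_ratio(20, 9980, 1, 9999),
          OR = odds_ratio(20, 9980, 1, 9999))

# Hypothetical common disease: OR overstates RR.
common <- c(RR = risk_ratio(400, 600, 200, 800),
            OR = odds_ratio(400, 600, 200, 800))

print(round(rare, 2))
print(round(common, 2))
```

The gap between OR and RR widens as A and C grow relative to B and D.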

## Question #7

Two studies were conducted on different samples from the same population to assess the relationship between oral contraceptive use and the risk of deep venous thrombosis (DVT). Study A showed an increased risk of DVT among oral contraceptive users, with a relative risk of 2.0 and a 95% CI of 1.2-2.8. Study B showed a relative risk of 2.05 and a 95% CI of 0.8-3.1. Which of the following statements is most likely to be true regarding these 2 studies?

1. The p-value in study B is likely to be < 0.05
2. The result in study A is not accurate
3. The result in study A is not statistically significant
4. The result in study B is likely biased
5. The sample size is likely smaller in study B than study A

What gives a narrower confidence interval?

1. Incorrect - The CI in study B overlaps 1 so it is not significant
2. Incorrect - It is hard to judge accuracy without knowing the objective Truth
3. Incorrect - The CI in study A does not include 1 so it is statistically significant
4. Incorrect - There is no reason to believe B is biased
5. Correct - A bigger sample gives more power to reject a false null hypothesis and a narrower confidence interval, so the wider interval in study B suggests a smaller sample

## Absolute risk reduction is a risk difference

• Reminder
• Exposed: $Risk = \frac{A}{A + B}$
• Unexposed: $Risk = \frac{C}{C + D}$

• $AR = Risk_{Exposed} - Risk_{Unexposed}$
• $ARR = Risk_{Control} - Risk_{Treatment}$

• Number needed to treat
• Number of patients treated for ONE patient benefited
• $NNT = \frac{1}{ARR}$
• $NNH = \frac{1}{AR}$

## Tests are usually cutoffs on a continuous variable

## Sensitivity: Efficiency of finding TRUE POSITIVES in Real Positives

## Specificity: Efficiency of finding TRUE NEGATIVES in Real Negatives

## Sensitivity and Specificity are critically important for all of medicine

• These are the defining characteristics of any clinical finding (e.g. history, physical, test, image).
• They do not depend on anything... they are intrinsic to the exam/test

• If someone quotes the positive or negative predictive value as a property of a test, they are wrong; predictive values change with disease prevalence.

• Therefore, if you do not know the sensitivity or specificity of a test, you are missing information
• Without sensitivity and specificity, you cannot make a ROC curve
• Never trust a paper, poster, or company presentation that does not include a ROC curve

• Sensitivity
• $Sensitivity = \frac{TruePositives}{AllRealPositives}$
• $AllRealPositives = TP + FN$

• Specificity
• $Specificity = \frac{TrueNegatives}{AllRealNegatives}$
• $AllRealNegatives = TN + FP$

## Sensitivity and specificity are continuous when a test is continuous

## Unlike Sens and Spec, PPV and NPV vary with pre-test probability

• Positive Predictive Value
• Chance that person has the disease after a positive test result
• $PPV = \frac{TP}{TP + FP}$

• Negative Predictive Value
• Chance that person does not have disease after a negative test result
• $NPV = \frac{TN}{TN + FN}$

• Both depend on how prevalent the disease is in the population
Does PPV depend more on sensitivity or specificity?
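The dependence of PPV and NPV on prevalence can be sketched in R. The test characteristics (70% sensitive, 95% specific) and the three prevalences are hypothetical values for illustration, and `predictive_values` is a made-up helper.

```r
# Predictive values from sensitivity, specificity, and prevalence (pre-test probability).
predictive_values <- function(sens, spec, prev) {
  tp <- sens * prev              # true positives, as a fraction of the population
  fn <- (1 - sens) * prev        # false negatives
  tn <- spec * (1 - prev)        # true negatives
  fp <- (1 - spec) * (1 - prev)  # false positives
  c(PPV = tp / (tp + fp), NPV = tn / (tn + fn))
}

# Same hypothetical test (70% sensitive, 95% specific) at three prevalences.
prevalences <- c(0.01, 0.10, 0.50)
result <- sapply(prevalences, function(p) predictive_values(0.70, 0.95, p))
colnames(result) <- paste0("prev=", prevalences)
print(round(result, 3))
```

Sensitivity and specificity are held fixed throughout; only the prevalence changes, yet PPV rises and NPV falls as the disease becomes more common.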

## PPV goes more with Specificity and NPV goes more with Sensitivity

## This is what real diseases look like in the population

This is the real prevalence of HIV... Where would you put the cutoff?

## Question #5

Assume a steady-state population that is not changing in any way. Which of the following statements is true for people who test positive regarding moving the cutoff for a positive test from the solid to the dotted line?

1. Decrease in test specificity
2. Increase in test sensitivity
3. Increase in PPV
4. Increase in NPV
5. Decrease in NPV

Question prefaces a positive test result

1. Incorrect - Moving the line to the right increases specificity because it captures more true negatives as a proportion of all people without disease
2. Incorrect - Moving the line to the right decreases sensitivity because it captures fewer true positives as a proportion of all people with disease
3. Correct - Moving the line to the right increases positive predictive value: it raises the proportion of true positives among all positive tests by reducing the number of false positives
4. Incorrect - The question concerns positive tests, which do not factor into negative predictive value
5. Incorrect - The question concerns positive tests, which do not factor into negative predictive value

## ROC curves visually define clinical test yield

• If the curve approximates the diagonal, it is a bad test
• $AUC = 0.5$ for a bad test
• If the curve goes straight up the y-axis and then turns right along the top of the plot, it is a perfect test
• $AUC = 1$ for a perfect test

## ROC curves also establish the optimal dichotomous cutoff

The highest yield cutoff is the x-value that maximizes the distance from the diagonal to the curve

## With the optimal cutoff found, it maps to clinical test results

Again, the best cutoff is the x-value that maximizes the distance from the diagonal to the curve
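One way to sketch this cutoff search in R is shown below. It assumes a purely hypothetical test whose value is normally distributed with mean 0 in healthy people and mean 2 in diseased people (both with standard deviation 1), and it measures the distance to the diagonal vertically, as sensitivity minus false positive rate.

```r
# Assumed model: the test value is N(0, 1) in healthy people and N(2, 1) in
# diseased people; a value above the cutoff counts as a positive test.
cutoffs <- seq(-6, 8, by = 0.01)
sens <- 1 - pnorm(cutoffs, mean = 2, sd = 1)  # true positive rate
spec <- pnorm(cutoffs, mean = 0, sd = 1)      # true negative rate

# Vertical distance from the diagonal to the ROC curve at each cutoff
# (sensitivity minus false positive rate).
distance <- sens - (1 - spec)
best <- cutoffs[which.max(distance)]
print(best)

# Area under the curve by the trapezoid rule, integrating sensitivity
# over the false positive rate.
fpr <- 1 - spec
ord <- order(fpr)
auc <- sum(diff(fpr[ord]) * (head(sens[ord], -1) + tail(sens[ord], -1)) / 2)
print(auc)
```

Under these symmetric assumptions the best cutoff falls midway between the two group means; with real data the ROC curve would be built from observed test values instead.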

## Motivation.. In medicine, frequentist statistics is not too useful

• The value lies in the intuitive approach

• Frequentist: goal is to approximate objective truth through repeated trials
• The important metric is the probability that our estimate does not match reality

• Bayesian: goal is to approximate objective truth by updating prior probability with new evidence
• The important metric is the probability that our subjective experience matches reality

• Example: given a clinical test, which do you care more about?
• If your patient does not have disease, there is a 2% chance of getting a test result this extreme by chance.
• If your patient has a positive test, there is a 75% chance of having disease.

## Involves adding new information to existing probability

• This data is available all over the place (e.g. any prevalence data)
• Example: Region 6 prevalence of influenza right now is $\approx 4\%$

• Do something (e.g. rapid flu test)
• Absolute best Rapid Flu test: $Sensitivity \approx 70\%$ and $Specificity \approx 95\%$
• If positive, use likelihood ratio positive (a.k.a. Bayes factor positive)
• $LR+ = \frac{sensitivity}{1 - specificity} = \frac{0.7}{1 - 0.95} = 14$
• If negative, use likelihood ratio negative (a.k.a. Bayes factor negative)
• $LR- = \frac{1 - sensitivity}{specificity} = \frac{1 - 0.7}{0.95} = 0.32$

## Adjust the probability of disease with the following procedure

• Pre-test Probability $\rightarrow$ Pre-test Odds $\rightarrow$ Pre-test Odds x LR $\rightarrow$ Post-test Odds $\rightarrow$ Post-test Probability

• Probability $\rightarrow$ Odds: $O = \frac{P}{1 - P}$
• Odds $\rightarrow$ Probability: $P = \frac{O}{1 + O}$

• Positive test: $0.04 \rightarrow 0.04/0.96 = 0.042 \rightarrow 0.042 * 14 = 0.588 \rightarrow 0.588/1.588 = 0.37$
• Post-test probability following positive test: $37\%$

• Negative test: $0.04 \rightarrow 0.04/0.96 = 0.042 \rightarrow 0.042 * 0.32 = 0.0134 \rightarrow 0.0134/1.0134 = 0.0132$
• Post-test probability following negative test: $1.32\%$
• So this time of year a negative test is basically useless
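The procedure above can be sketched as a few small R functions. The function names are illustrative; the inputs are the influenza prevalence and rapid flu test characteristics quoted earlier. Because the code does not round intermediate values, its results can differ slightly in the last digit from the hand calculation.

```r
# Convert between probability and odds.
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

# Post-test probability from pre-test probability and a likelihood ratio.
post_test_prob <- function(pre_test, lr) {
  odds_to_prob(prob_to_odds(pre_test) * lr)
}

sens <- 0.70; spec <- 0.95; prevalence <- 0.04
lr_pos <- sens / (1 - spec)        # about 14
lr_neg <- (1 - sens) / spec        # about 0.32

print(post_test_prob(prevalence, lr_pos))  # about 0.37
print(post_test_prob(prevalence, lr_neg))  # about 0.013
```

Changing `prevalence` shows how strongly the same test result depends on the pre-test probability.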

## Biases of design or unseen variables

• Selection bias
• Non-random partitioning of individuals into groups
• Observer-expectancy
• Observer is unblinded and expects a particular outcome
• Effect modification bias
• Magnitude of effect varies by third variable
• Can be eliminated by stratification
• Confounding
• Unseen third variable is an underlying cause for correlation of two other variables
• Cannot be eliminated by stratification

## Biases of information (measurement)

• Recall bias
• Subjects with disease can recall exposures better than healthy subjects

• Procedure bias
• Experimenters vary systematically in the way they do work
• e.g. Experimenters don't follow the specified procedure

• Instrument bias
• Instrument is broken
• Instruments can also be things like surveys or clerkship evaluations
• Just means instrument is not reliable

## Biases of time and completion

• Lead-time bias
• New test detects disease earlier
• Survival appears improved with new test

• Attrition bias
• Subjects systematically withdraw
• Could be things like side effects or lack of improvement

• Loss to follow-up
• Subjects randomly do not report for scheduled follow-up

## The pyramid of evidence is a hierarchy

Closer to the top means better evidence

Experimental Trials

## Randomized control trial is in the name

## Randomized control trials are the gold standard

• This is widely considered the gold standard for clinical evidence

• Question: Primary purpose of randomization?
• Answer: To eliminate selection bias
• Selection bias is eliminated if randomization is technically correct

• Question: Secondary goal of randomization?
• Answer: To reduce confounding
• Confounders are not necessarily eliminated even with perfect technical execution

• Can use relative risk because investigator knows prevalence of disease and prior exposures

## Crossover trial means the two groups switch

• This post hoc analysis is overly simplified for real life

• This understanding is sufficient for step 1

• Confounders reduced because a patient can serve as their own control

Observational Studies

## Prospective cohorts follow groups into the future

## Retrospective cohorts follow groups from the past

## Cohorts form the next level of evidence

• Can use relative risk because investigator knows prevalence of exposure and disease
• Subjects vary by exposure status
• Can calculate incidence

• Selection bias is the biggest problem
• Investigator has infinite control over inclusion
• Other biases
• Attrition, loss to follow-up, confounding, Hawthorne

• Retrospective
• Information bias

## Case-control trials measure chance of exposure given disease

## Case-control forms the next level down from cohorts

• Must use odds ratio because investigator does not know prevalence of disease
• Subjects grouped by cases and controls
• Measure odds of exposure in case and control groups
• Significantly improved power and decreased resource requirements compared to cohorts
• Due to cases being selected at the outset
• Selection and Recall biases are the biggest problem
• Selecting appropriate controls is highly non-trivial
• Sick people remember exposures (e.g. Melanoma patients stew about their sunburns)
• Also common
• Information biases
• Cannot calculate incidence or prevalence

## Cross-sectional trials measure exposure and disease simultaneously

## Cross-sectional studies form the next level of evidence

• Quick, cheap, and easy
• Typically this is a starting point
• Can establish prevalence of disease
• Must use chi-squared or correlation for statistical test
• Subjects can be grouped by exposure and disease into a 2x2 contingency table

• Cannot establish causation
• Cannot calculate risk metrics

## Question #8

A study was conducted to evaluate the efficacy of a new antiviral drug for the treatment of the common cold in young children. The study population consisted of 100 children between the ages of 2 and 8 years. These children were diagnosed with rhinovirus infection and subsequently given the particular antiviral drug. One week later, it was observed that 92 of the 100 patients were asymptomatic. Which of the following is the true conclusion of this study?

1. The drug is highly effective as the effectiveness is 90%
2. The drug is moderately effective as the efficacy is 90%
3. An exact conclusion cannot be drawn from the study
4. The drug is not effective as the sample size is very small
5. No conclusion can be made, as compliance is generally very low in small children

A treatment is tested without a control

1. Incorrect - We can't compare to a real-world control.
2. Incorrect - We can't compare to an ideal control.
3. Correct - Most people will recover from a cold in a week or so, so without a control group the improvement cannot be attributed to the drug.
4. Incorrect - The sample size may be adequate. There are no statistical tests to evaluate this statement.
5. Incorrect - Compliance would not be an issue in this case.

## Question #9

Researchers are studying the relationship between mutations in HMG-CoA reductase and CAD. The study population is selected at random. Tissue samples are obtained for genotyping and stress echos are performed to assess CHD. In the subsequent paper, the authors conclude that there is an association between mutations in HMG-CoA reductase and CHD. Which of the following study designs did the authors utilize?

1. Retrospective cohort study
2. Cross-sectional study
3. Randomized clinical trial
4. Prospective cohort study
5. Case-control trial

What does the timeline look like?

1. Incorrect - A retrospective cohort starts at some point in the past. There is no indication of a past time or chart review in this study.
2. Correct - A cross-sectional study is a "snap shot". It simultaneously determines both risk factors and disease. It can establish an association, but it cannot say much about causation because the timeline is unknown.
3. Incorrect - Although patients are randomly selected, a randomized clinical trial requires a control group and requires some treatment under investigation.
4. Incorrect - A prospective cohort starts in the present and follows a group into the future. There is no indication of time or following patients or recording exposure.
5. Incorrect - A case-control trial requires identifying cases with disease and controls without disease, then identifying exposures, and calculating the odds of exposure given disease. There is no indication of that here.

## Question #10

A study was conducted to evaluate the efficacy of a new antiviral drug. The study population consisted of 100 rhinovirus-infected children. The treatment arm was given an antiviral drug and the control arm was given a placebo. One week later, researchers found that 42 out of 50 treatment patients were asymptomatic and 30 of 50 control patients were asymptomatic. On average, how many people need to be treated with this drug to cure one infected person?

1. 25/12
2. 50/12
3. 50/15
5. 50/8

$ARR = Risk_{Control} - Risk_{Treatment}$

$NNT = \frac{1}{ARR}$

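Here risk is the chance of remaining symptomatic: 20 of 50 control patients and 8 of 50 treatment patients.

$ARR = \frac{20}{50} - \frac{8}{50} = \frac{12}{50}$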

$NNT = \frac{1}{12/50} = \frac{50}{12}$
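The same arithmetic as a short R sketch, using the counts from the question stem. Variable names are illustrative, and risk is taken to mean still being symptomatic at one week.

```r
# Counts from Question #10.
n_treatment <- 50; cured_treatment <- 42
n_control   <- 50; cured_control   <- 30

# Risk here is the chance of still being symptomatic after one week.
risk_treatment <- (n_treatment - cured_treatment) / n_treatment  # 8/50
risk_control   <- (n_control - cured_control) / n_control        # 20/50

arr <- risk_control - risk_treatment  # absolute risk reduction, 12/50
nnt <- 1 / arr                        # number needed to treat, 50/12

print(c(ARR = arr, NNT = nnt))
```

In practice NNT is usually rounded up to the next whole patient.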

## A little controversy

## Subgroup demographics of the patients and physicians

## Question #11

Assuming that mortality is simply the incidence of death per 100 patients, after controlling for physician characteristics, what is the relative risk of death within 30 days of discharge for patients with a male physician versus a female physician?

1. 11.49
2. 11.07
3. 1.04
4. 0.96
5. 1.07

$Incidence_{Male} = \frac{a}{a + b}$

$Incidence_{Female} = \frac{c}{c + d}$

Thus, mortality is the risk of death, and the relative risk is:

$Risk_{Male} = 11.49\%$

$Risk_{Female} = 11.07\%$

$RR = \frac{11.49}{11.07} \approx 1.04$

## Question #12

Controlling for physician characteristics as before, what is the absolute risk reduction of having a female physician?

1. 0.0042
2. 0.0142
3. 0.0049
4. 0.0064
5. 0.0342

Mortality is the risk of death. Since this is subtraction, it is important to have the units correct (11.49% is 0.1149).

$ARR = Risk_{Male} - Risk_{Female}$

$ARR = 0.1149 - 0.1107 = 0.0042$

## Question #13

Given the information from the previous slide, on average how many patients would need to be treated by a female physician to save a life?

1. 2.4
2. 8.7
3. 87.1
4. 238.1
5. 871.4

Mortality is the risk of death. Since ARR is a subtraction, it is important to have the units correct.

$ARR = Risk_{Male} - Risk_{Female}$

$NNT = \frac{1}{ARR}$

$ARR = 0.1149 - 0.1107 = 0.0042$

$NNT = \frac{1}{ARR} = \frac{1}{0.0042} = 238.1$

## Question #14

Given the information from the previous slide, if we no longer allowed men to treat general medicine patients, approximately how long on average would it take for a female physician to save a patient that otherwise would have died under the previous treatment system?

1. 2 months
2. 8 months
3. 1.5 years
4. 3.5 years
5. 5 years

Approximately how many individual study patients are being seen each year by the physicians in this study?

In this study, female and male physicians see about 130 and 180 study patients per year, respectively. Thus, the time to save a patient in years is:

$\frac{238.1}{131.9} = 1.81$

$\frac{238.1}{180.5} = 1.32$

Both estimates are closest to 1.5 years.
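The chain of calculations in Questions #11 to #14 can be reproduced with a short R sketch. The inputs are the figures quoted in the worked answers; variable names are illustrative. Carrying the unrounded NNT through gives times that match the slides to within rounding.

```r
# Figures quoted in Questions #11 to #14 (mortality in percent, patients per year).
mortality_male   <- 11.49 / 100
mortality_female <- 11.07 / 100
patients_per_year <- c(female = 131.9, male = 180.5)

rr  <- mortality_male / mortality_female   # relative risk, about 1.04
arr <- mortality_male - mortality_female   # absolute risk reduction, about 0.0042
nnt <- 1 / arr                             # number needed to treat, about 238

# Years for one physician to see enough patients to reach the NNT.
years_to_save_one <- nnt / patients_per_year

print(round(c(RR = rr, ARR = arr, NNT = nnt), 4))
print(round(years_to_save_one, 2))
```

Note how a relative risk close to 1 still translates into a finite NNT once the absolute risks are known.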

The End