Problems and solutions

1. TDT test: calculating the test statistic and its p-value

In a transmission disequilibrium test we had the data

n11 = 99, n12 = 39, n21 = 86, n22 = 16
The test statistic is
T = (n12 - n21)^2 / (n12 + n21)
Its distribution under H0 is (approximately) chi-squared with df = 1.

Calculate the value of T and its p-value (but please do not use T as the name of your variable; in R, T is a built-in abbreviation for TRUE). The R name of the chi-squared distribution is chisq.


Solution:

n12 <- 39
n21 <- 86
# test statistic
(t.stat <- (n12 - n21)^2 / (n12 + n21))
# and its p-value
pchisq(t.stat, df = 1, lower = F)


2. TDT test: calculating the Bayes factor

We use the same data as in the previous problem. The hypotheses are

n12 ~ Binomial(n12 + n21, 0.5) under H0
n12 ~ Binomial(n12 + n21, p1) under H1
H0 is obtained from H1 by setting p1 = 0.5. When we use the prior
p1 ~ Beta(0.5, 0.5)
then the posterior of p1 (under H1) is
g(p1 | n12, n21) = Beta(p1 | n12 + 0.5, n21 + 0.5)

Calculate the Bayes factor using the Savage-Dickey density ratio (the ratio of prior to posterior densities at p1 = 0.5). For this, you need the density of the beta distribution, i.e., the function dbeta().


Solution:

# The density of Beta(a, b) distribution at x is dbeta(x, a, b)

# (hyper)parameters of the prior:
a <- 0.5
b <- 0.5
(B <- dbeta(0.5, a, b) / dbeta(0.5, n12 + a, n21 + b))
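
# As a check (an addition, not part of the original solution): the same
# Bayes factor is obtained directly as the ratio of marginal likelihoods
# Pr(data | H1) / Pr(data | H0); the binomial coefficients cancel.
(B.check <- beta(n12 + a, n21 + b) / (beta(a, b) * 0.5^(n12 + n21)))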


3. Cancer at Slater: p-value

In the Cancer at Slater example we had observed 8 cases, and the statistical model (under H0) is the binomial distribution with sample size 145 and probability parameter 0.03.

Calculate the p-value.

The R name of the binomial distribution is binom.


Solution:

# p-value by summing the probability mass function:
sum(dbinom(8:145, size = 145, prob = 0.03))
# p-value by using pbinom():
pbinom(7.5, size = 145, prob = 0.03, lower = F)

Comment: I didn't call pbinom() with an integer argument, because I did not want to depend on the designers of that function having made the same choice as I did about whether the point mass at x = 8 belongs to the CDF or to its complement.
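
To see which convention pbinom() actually uses (an added check, not part of the original solution): the CDF pbinom(x, ...) does include the point mass at x, so the following two numbers agree, and the p-value could also have been written with an integer argument.

dbinom(8, size = 145, prob = 0.03)
pbinom(8, size = 145, prob = 0.03) - pbinom(7, size = 145, prob = 0.03)
pbinom(7, size = 145, prob = 0.03, lower = F)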


4. Cancer at Slater: a Bayesian approach

In the Bayesian treatment of the Cancer at Slater example discussed in the lectures, we considered only the hypotheses theta = 0.03, 0.04, 0.05, 0.06. Why so few alternative hypotheses? Has the example been rigged to make the alternative hypothesis seem worse than it really should be? Let's extend the theta grid and check the results!

Now we reconsider the same problem, but this time consider the values theta = 0.03, 0.04, ..., 0.15. Take the prior probability of the null (theta = 0.03) to be 0.5, and distribute the rest of the prior probability evenly over the other theta values. The observation is x = 8 and the likelihood is the binomial likelihood with sample size 145 and probability parameter theta.

Calculate posterior probabilities of the different 'theories' theta = 0.03, 0.04, ..., 0.15.

Advice: once you have formed a vector containing the prior probabilities (corresponding to different thetas), and a vector containing the likelihood values (for different thetas), then the posterior probabilities are given by normalizing the product of prior times likelihood to sum to one, i.e., in the following manner

(prior * likelihood) / sum(prior * likelihood)

Solution:

theta <- seq(0.03, 0.15, by = 0.01)
k <- length(theta) - 1
prior <- c(0.5, 0.5 * rep(1, k) / k)

likelihood <- dbinom(x = 8, size = 145, prob = theta)

posterior <- prior * likelihood / sum(prior * likelihood)

plot(theta, posterior, type = 'h')

# odds in favour of the alternative:

(1 - posterior[1]) / posterior[1]

# ... the alternative hypothesis seems now even worse than previously



5. Vitamin C, a Bayesian approach

In the Vitamin C example, the observation is x = 13 in the model where x has binomial distribution with sample size 17 and probability parameter p. The null hypothesis is p = 0.5, and the prior probabilities of the null and the alternative hypothesis are equal.

Under the alternative hypothesis, the prior distribution of p is the uniform distribution on (p0, 1 - p0), where 0 < p0 < 0.5. The density of this distribution has the constant value 1 / (1 - 2*p0) on (p0, 1 - p0) and is zero elsewhere.

Go through the formulas for calculating the posterior probability of H0, and calculate and plot its value for p0 = 0.01, 0.02, ..., 0.49. (Check that the posterior probability stays above 0.20.)

Advice: R has the function integrate() for numerical integration. E.g., to calculate the integral of x^2 over (0, 1), use it as follows.

f <- function(x) x^2
int <- integrate(f, 0, 1)
int
However, integrate() returns an object whose structure you must unravel (hint: try str(int)) before you can store the (approximate) value of the integral in a numerical vector, whose elements you fill in with a for-loop.
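
For instance (a small addition illustrating the hint), continuing the example above:

str(int)     # shows the components of the object returned by integrate()
int$value    # the approximate value of the integral, as a plain number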


Solution:

p0 <- seq(0.01, 0.49, by = 0.01)

f <- function(p) p^13 * (1 - p)^4

posth0 <- numeric(length(p0)) # set up a vector for the results

for (i in 1:length(p0)) {
  v <- integrate(f, p0[i], 1 - p0[i])
  posth0[i] <- (1 + 2^17 / (1 - 2 * p0[i]) * v$value)^(-1)
}


plot(p0, posth0, type = 'l', ylab = 'Pr(H0 | x)')

# NB: the integrals could be calculated here using pbeta() instead of
# integrate().
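
# Here is a sketch of that alternative (an addition, not in the original
# solution): the integral of p^13 * (1 - p)^4 over (p0, 1 - p0) equals
# beta(14, 5) times the Beta(14, 5) probability of that interval.
v2 <- beta(14, 5) * (pbeta(1 - p0, 14, 5) - pbeta(p0, 14, 5))
posth0.2 <- (1 + 2^17 / (1 - 2 * p0) * v2)^(-1)
max(abs(posth0.2 - posth0))   # should be essentially zero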

6. Case control dataset: frequentist inference

Consider the following case-control dataset, where row one contains counts for cases and row two for controls, and where there are two marker classes (the columns).

(n <- matrix(c(602, 1065, 28, 101), 2, 2))

Calculate the chi-squared test statistic X^2 and its p-value, and the odds ratio OR together with an approximate confidence interval for it.

(There are functions in R for calculating X^2 and OR, but we won't reveal their names. Try to do without ready-made functions.)


Solution:

(n <- matrix(c(602, 1065, 28, 101), 2, 2))

s <- sum(n)
x2 <- (s * (n[1,1] * n[2,2] - n[1,2] * n[2, 1])^2) /
  ((n[1,1] + n[1, 2]) * (n[1,1] + n[2,1]) *
   (n[1,2] + n[2,2]) * (n[2,1] + n[2,2]))

x2

# p-value:

pchisq(x2, df = 1, lower = F)

# The previous values can be calculated more easily with

chisq.test(n, correct = F)

# odds ratio

OR <- n[1,1] * n[2,2] / (n[1,2] * n[2,1])

# an approximate confidence interval for the odds ratio:

exp(log(OR) + c(-1, 1) * 1.96 * sqrt(sum(1 / n)))

# The previous values, and the p-value can be calculated more easily with

fisher.test(n)
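
# (An added note, not in the original solution:) both test functions
# return "htest" objects, so individual numbers can be extracted
# programmatically, e.g.
chisq.test(n, correct = F)$statistic   # should match x2 above
fisher.test(n)$conf.int                # CI for the (conditional ML) odds ratio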


7. Case control dataset: Bayes factors

We still use the data of the previous problem.

Calculate the Bayes factor for H1 against H0, first from eq. (30) of the lecture notes, and then by numerical integration. Eq. (30) comes from the analysis which, under H1, uses independent Beta(0.5, 0.5) priors for p12 and p22, and which, under H0, uses a Beta(0.5, 0.5) prior for the common value p12 = p22. Here pij is the expected proportion for n[i,j], and p11 = 1 - p12 and p21 = 1 - p22.

Hint: Pr(data given H1) factorizes into a product of two one-dimensional integrals; Pr(data given H0) is a one-dimensional integral.

Warning: You will probably encounter numerical problems.


Solution:

# Bayes factor from eq. (30)

B <- beta(n[1,1] + 0.5, n[1,2] + 0.5) * beta(n[2,1] + 0.5, n[2,2] + 0.5) /
  (beta(0.5, 0.5) * beta(n[1,1] + n[2,1] + 0.5, n[1,2] + n[2,2] + 0.5))

# Bayes factor by numerical integration

# First define the three integrands

f11 <- function(p)
  dbinom(n[1,2], size = sum(n[1,]), prob = p) * dbeta(p, 0.5, 0.5)

f12 <- function(p)
  dbinom(n[2,2], size = sum(n[2,]), prob = p)  * dbeta(p, 0.5, 0.5)

f0 <- function(p)
  (dbinom(n[1,2], size = sum(n[1,]), prob = p) *
   dbinom(n[2,2], size = sum(n[2,]), prob = p) * dbeta(p, 0.5, 0.5))
   

# First attempt:

marg.lik.h0 <- integrate(f0, 0, 1)$value
marg.lik.h1 <- integrate(f11, 0, 1)$value *
  integrate(f12, 0, 1)$value

B.int1 <- marg.lik.h1 / marg.lik.h0

# This is not near the result calculated previously. What went wrong?

# Let's inspect the integrands:

x <- seq(0, 1, len = 401)
op <- par(mfrow = c(2, 2))
plot(x, f11(x), 'l')
plot(x, f12(x), 'l')
plot(x, f0(x), 'l')
par(op)

# They are all highly peaked -- that's the cause of the problems.

# Try increasing the number of evaluation points:

integrate(f11, 0, 1)
integrate(f11, 0, 1, subdivisions = 10000)

integrate(f12, 0, 1)
integrate(f12, 0, 1, subdivisions = 10000)

integrate(f0, 0, 1)
integrate(f0, 0, 1, subdivisions = 10000)

# ... no changes.

# Try restricting the integration set; all the integrands seem to vanish
# outside (0, 0.2)

integrate(f11, 0, 1)
integrate(f11, 0, 0.2)
integrate(f11, 0.2, 1)

integrate(f12, 0, 1)
integrate(f12, 0, 0.2)
integrate(f12, 0.2, 1)

integrate(f0, 0, 1)
integrate(f0, 0, 0.2)
integrate(f0, 0.2, 1)
# Aha: the integral of f0 was hard for integrate()

marg.lik.h0 <- (
                integrate(f0, 0, 0.2, subdivisions = 10000)$value
                + integrate(f0, 0.2, 1)$value
                )
marg.lik.h1 <- (
                (integrate(f11, 0, 0.2, subdivisions = 10000)$value
                + integrate(f11, 0.2, 1)$value) *
                (integrate(f12, 0, 0.2, subdivisions = 10000)$value
                 + integrate(f12, 0.2, 1)$value)
                )
B.int2 <- marg.lik.h1 / marg.lik.h0

B
B.int2

# Now, that's better.

Moral: calculating the Bayes factor can be tricky!
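
One way to sidestep such numerical problems (an addition to the original solution) is to do the closed-form calculation on the log scale with lbeta() and exponentiate only at the end:

log.B <- (lbeta(n[1,1] + 0.5, n[1,2] + 0.5) + lbeta(n[2,1] + 0.5, n[2,2] + 0.5)
          - lbeta(0.5, 0.5) - lbeta(n[1,1] + n[2,1] + 0.5, n[1,2] + n[2,2] + 0.5))
exp(log.B)   # should agree with B computed above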


8. TDT-test: Bayes factor via Laplace

We use the data of problem 1: n12 = 39, n21 = 86. The model is described in the lectures and also in problem 2. The parameter p = p12 has the non-informative prior Beta(0.5, 0.5).

Under H1, the joint density of the data and the parameter, f(data, p), is given by

choose(n12+n21, n12) / beta(0.5, 0.5) * p^(n12+0.5 - 1) * (1-p)^(n21+0.5 - 1)

Maximize log f(data, p) over p, evaluate the second derivative of the log joint density at the maximum, and use the Laplace approximation to the marginal likelihood under H1 to obtain the Bayes factor for H1 against H0.

NB: It is possible to do the maximization and the calculation of the Hessian at the maximum point with optim(), but that function is trickier to use than optimize().


Solution:

n12 <- 39
n21 <- 86

# maximum point analytically:

max.p <- (n12 - 0.5) / (n12 + n21 - 1)

# Let's pretend that we do not know how to calculate the maximum point.

# maximum point with optimize()

log.C1 <- lchoose(n12 + n21, n12) - lbeta(0.5, 0.5)
logf <- function(p)
  (log.C1 + (n12 - 0.5) * log(p) + (n21 - 0.5) * log(1 - p))

opt1 <- optimize(logf, interval = c(0, 1), maximum = TRUE)
phat <- opt1$maximum

# second derivative of logf:

L2 <- function(p)
  (-(n12 - 0.5) / p^2 - (n21 - 0.5) / (1 - p)^2)

A <- - L2(phat)

# Laplace approximation to the marginal likelihood under H1:
f1 <- sqrt(2 * pi / A) * exp(logf(phat))
# This is the likelihood under H0:
f0 <- dbinom(n12, n12 + n21, 0.5)

(B.Lap <- f1 / f0)


# The same results with optim()

L1 <- function(p)
  ((n12 - 0.5) / p - (n21 - 0.5) / (1 - p))

# Tell optim() that I want to maximize, not minimize
ctrl <- list(fnscale = -1)
# I do not use the default optimization method (Nelder-Mead),
# since it is unreliable for one-dimensional optimization.

opt12 <- optim(0.3, fn = logf, gr = L1, control = ctrl,
              lower = 0.01, upper = 0.99,
              method = "L-BFGS-B", hessian = TRUE)

A2 <- - opt12$hessian
f12 <- sqrt(2 * pi / A2) * exp(opt12$value)
(B.Lap2 <- f12 / f0)
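
# As a check (an addition, not in the original solution): for this model
# the marginal likelihood under H1 is available in closed form, so the
# Laplace approximations can be compared with the exact Bayes factor
# (the same value as in problem 2).
B.exact <- beta(n12 + 0.5, n21 + 0.5) / (beta(0.5, 0.5) * 0.5^(n12 + n21))
B.Lap; B.Lap2; B.exact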

9. t-test data: frequentist inference with textbook formulae

The lecture notes use the following data

xa <- scan()
2.065 1.435 2.792 1.017 1.992 1.794 0.733 2.040 2.355 1.536

xb <- scan()
2.19 2.25 1.22 2.58 2.64 2.56 2.45 3.00 1.73 2.55

xa; xb

Calculate the t-statistic using the textbook formulae

# pooled standard deviation:
sp <- sqrt( ((na - 1) * sa^2 + (nb - 1) * sb^2) / (na + nb - 2) )
# standard error of difference of means:
sed <- sp * sqrt(1 / na + 1 / nb)

t.stat <- (mb - ma) / sed
where na, ma and sa are the sample size, mean and standard deviation of data set xa, and nb, mb and sb are the sample size, mean and standard deviation of data set xb.

Calculate the F statistic corresponding to t.stat.

Finally, redo the calculations with t.test().


Solution:

xa <- scan()
2.065 1.435 2.792 1.017 1.992 1.794 0.733 2.040 2.355 1.536

xb <- scan()
2.19 2.25 1.22 2.58 2.64 2.56 2.45 3.00 1.73 2.55

xa; xb

na <- length(xa)
nb <- length(xb)
(ma <- mean(xa))
(mb <- mean(xb))
(sa <- sd(xa))
(sb <- sd(xb))

# sp, pooled standard deviation
(sp <- sqrt( ((na - 1) * sa^2 + (nb - 1) * sb^2) / (na + nb - 2) ))
# sed, standard error of difference of means
(sed <- sp * sqrt(1 / na + 1 / nb))

(t.stat <- (mb - ma) / sed)

(F.stat <- t.stat^2)

# Check the results with t.test()

t.test(xb, xa, var.equal = T)


10. t-test data: frequentist inference with aov()

Recalculate the F statistic using aov().


Solution:

The two-sample t-test is a special case of one-way ANOVA with two groups.

f <- factor(c(rep('a', na), rep('b', nb)))
x <- c(xa, xb)

summary(aov(x ~ f))
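
# (An added check, not in the original solution:) the F value can also be
# picked out of the aov() summary and compared with F.stat from problem 9.
summary(aov(x ~ f))[[1]]$"F value"[1]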

11. t-test data: Bayes factor with Laplace

First, load library ldDesign, and then calculate the Bayes factor using SS.oneway.bf().

Next we try to calculate the Bayes factor using the Laplace approximation. For this, you need to implement functions that evaluate the logarithm of the joint density of the data and the parameters under H0 and under H1. You can then locate their maxima with optim(), which can also calculate the Hessian matrix at the extremal point. Function optim() minimizes by default, so give it a function that calculates the negative of your target function.

In order to do this problem, we need to specify priors. Use independent normal priors with mean 0 and standard deviation 2 for the group means (and for the common mean under H0), and a Gamma(1, 0.1) prior (shape 1, rate 0.1) for the precision parameter of the errors; these are the priors used in the solution below. This is not the same prior that was used by Spiegelhalter and Smith, so the Bayes factor will be different, too.


Solution:

# Spiegelhalter and Smith Bayes factor

library(ldDesign)
B.SS <- SS.oneway.bf(c(na, nb), F.stat)

# Bayes factor with Laplace approximation:

# Function for the negative of the joint density of data and
# parameters under H1
neg.log.f1 <- function(v) {
  ma <- v[1]
  mb <- v[2]
  prec <- v[3]
  if (prec < 0) return(Inf)
  
  s <- 1 / sqrt(prec)
  loglik <- (
             sum(dnorm(log = TRUE, xa, mean = ma, sd = s))
             + sum(dnorm(log = TRUE, xb, mean = mb, sd = s))
             )
  logpri <- (
             sum(dnorm(log = TRUE, c(ma, mb), mean = 0, sd = 2))
             + dgamma(log = TRUE, prec, 1, 0.1)
             )
  return(-loglik - logpri)
  
}

# Function for the negative of the joint density of data and
# parameters under H0
neg.log.f0 <- function(v) {
  m <- v[1]
  prec <- v[2]
  if (prec < 0) return(Inf)

  s <- 1 / sqrt(prec)
  loglik <- sum(dnorm(log = TRUE, c(xa, xb), mean = m, sd = s))
  logpri <- (
             dnorm(log = TRUE, m, mean = 0, sd = 2)
             + dgamma(log = TRUE, prec, 1, 0.1)
             )
  return(-loglik - logpri)
}

# Reasonable initial values for optimization (under H1)
ini1 <- c(mean(xa), mean(xb), 1 / sp^2)
opt1 <- optim(ini1, neg.log.f1, hessian = TRUE)

x <- c(xa, xb)
# Reasonable initial values for optimization (under H0)
ini0 <- c(mean(x), 1 / var(x))
opt0 <- optim(ini0, neg.log.f0, hessian = TRUE)

B.Lap <- (2 * pi)^(3/2) * det(opt1$hessian)^(-0.5) * exp(-opt1$value) /
  ((2 * pi)^(2/2) * det(opt0$hessian)^(-0.5) * exp(-opt0$value))
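
# Compare the two Bayes factors (an added line); they need not agree,
# because the priors differ.
B.SS; B.Lap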



12. t-test data: BIC

Calculate the BIC criterion under H0 and H1, and approximate the Bayes factor using the BIC values.


Solution:

x <- c(xa, xb)
f <- factor(c(rep('a', length(xa)), rep('b', length(xb))))

m1 <- lm(x ~ f)
m0 <- lm(x ~ 1)

p1 <- 3 # two regression coeff's + error variance parameter
p0 <- 2 # one regression coeff + error variance

# The only thing that matters in the sequel is the difference p1 - p0,
# so you could use as well (p1 = 2 and p0 = 1) or (p1 = 1 and p0 = 0)


n <- length(x)
bic1 <- n * log(1 - summary(m1)$r.squared) + p1 * log(n)
bic0 <- n * log(1 - summary(m0)$r.squared) + p0 * log(n)

B.bic <- exp(-(bic1 - bic0) / 2)
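
# (An added check, not in the original solution:) R's built-in BIC()
# function uses different constant terms, but the constants cancel in
# the difference, so this should reproduce B.bic.
exp(-(BIC(m1) - BIC(m0)) / 2)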


13. Importance sampling exercise from the lecture notes


Solution:

Notice that we are now considering the paired t-test, not the two-sample t-test which we have been considering up to now. The data for the paired t-test is x = xb - xa, and the sample size is now 10.

x <- xb - xa
n <- length(x)

# Precision parameter for the prior on mu
prec.mu <- qnorm(0.05, lower = F)^2
# Check that this gives 5 %:
pnorm(1, lower = F, sd = 1 / sqrt(prec.mu))

# My prior for the precision parameter (in the likelihood) is
# Gamma(a, b)

a <- 1
b <- 0.1

mu.bar <- mean(x)
tau.bar <- (a + n/2) / (b + sum( (x - mu.bar)^2) / 2)

N <- 10000

mu1 <- tau.bar * sum(x) / (prec.mu + n * tau.bar)
sd1 <- 1 / sqrt(prec.mu + n * tau.bar)

a1 <- a + n/2
b1 <- b + sum((x - mean(x))^2) / 2

mu.sample <- rnorm(N, mean = mu1, sd = sd1)

tau.sample <- rgamma(N, a1, b1)

q.dens <- function(mu, tau)
   dnorm(mu, mean = mu1, sd = sd1) * dgamma(tau, a1, b1)

# Posterior is proportional to prior * likelihood;
# the following function calculates it
joint.dens <- function(mu, tau) {
  prior <- (
            dnorm(mu, mean = 0, sd = 1 / sqrt(prec.mu)) 
            * dgamma(tau, a, b)
            )
  lik <- numeric(length(mu))
  for (i in 1:length(mu)) {
    lik[i] <- prod(dnorm(x, mean = mu[i], sd = 1 / sqrt(tau[i])))
  }
  prior * lik
}

weights <- joint.dens(mu.sample, tau.sample) / q.dens(mu.sample, tau.sample)
# normalize them to a probability vector
weights <- weights / sum(weights)
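
# (An added diagnostic, not in the original solution:) the effective
# sample size of the normalized importance weights; if it is very small,
# the importance sampling estimates below are unreliable.
1 / sum(weights^2)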

# This is our histogram estimate for the marginal posterior density of mu:
hist(probability = TRUE, sample(size = 1000, mu.sample, prob = weights))


# This is our estimate for Pr(mu > 0 | x)
sum(weights * (mu.sample > 0))
# add the plot of the prior density:
u <- seq(-2, 2, len = 201)
lines(u, dnorm(u, sd = 1 / sqrt(prec.mu)))


# Bayes factor using the Savage-Dickey density ratio at mu = 0:

delta <- 0.1
g1 <- sum(weights * (- delta < mu.sample) * (mu.sample < delta)) / (2 * delta)

# check from the density histogram that the estimate is reasonable:

g1

# This is the approximate Bayes factor for H1 against H0:
B <- dnorm(0, sd = 1 / sqrt(prec.mu)) / g1
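
# (An added sensitivity check, not in the original solution:) the
# Savage-Dickey estimate depends on the window half-width delta,
# so it is worth trying a few values.
for (d in c(0.05, 0.1, 0.2)) {
  g <- sum(weights * (-d < mu.sample) * (mu.sample < d)) / (2 * d)
  print(dnorm(0, sd = 1 / sqrt(prec.mu)) / g)
}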



Last updated 2008-05-08 17:39
Petri Koistinen
petri.koistinen 'at' helsinki.fi