In a transmission disequilibrium test we had the data

n11 = 99, n12 = 39, n21 = 86, n22 = 16

The test statistic is

T = (n12 - n21)^2 / (n12 + n21)

Its distribution under H0 is (approximately) chi-squared with df = 1.

Calculate the value of T, and its p-value
(but please, do not use `T` as the name of your variable).
The R name of the chi-squared distribution is `chisq`.

Solution:

n12 <- 39
n21 <- 86
# test statistic
(t.stat <- (n12 - n21)^2 / (n12 + n21))
# and its p-value
pchisq(t.stat, df = 1, lower = F)
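As a cross-check (not required by the exercise), the same statistic is McNemar's test statistic without continuity correction; a minimal sketch, building the full 2x2 table from the data above:

tdt <- matrix(c(99, 86, 39, 16), 2, 2)  # off-diagonal cells are n21 = 86 and n12 = 39
mcnemar.test(tdt, correct = FALSE)      # statistic should equal t.stat above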

We use the same data as in the previous problem. The hypotheses are

n12 ~ Binomial(n12 + n21, 0.5) under H0
n12 ~ Binomial(n12 + n21, p1) under H1

H0 is obtained from H1 by setting p1 = 0.5. When we use the prior

p1 ~ Beta(0.5, 0.5)

then the posterior of p1 (under H1) is

g(p1 | n12, n21) = Beta(p1 | n12 + 0.5, n21 + 0.5)

Calculate the Bayes factor using the Savage-Dickey density ratio
(the ratio of prior to posterior densities at p1 = 0.5).
For this, you need the density of the beta distribution, i.e.,
the function `dbeta()`.

Solution:

# The density of Beta(a, b) distribution at x is dbeta(x, a, b)
# (hyper)parameters of the prior:
a <- 0.5
b <- 0.5
(B <- dbeta(0.5, a, b) / dbeta(0.5, n12 + a, n21 + b))
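The same Bayes factor can be cross-checked directly as the ratio of marginal likelihoods Pr(data | H1) / Pr(data | H0); a small sketch reusing n12, n21, a and b from above:

# marginal likelihood under H1: choose(N, n12) * B(n12 + a, n21 + b) / B(a, b)
marg1 <- choose(n12 + n21, n12) * beta(n12 + a, n21 + b) / beta(a, b)
# marginal likelihood under H0: binomial probability with p1 = 0.5
marg0 <- dbinom(n12, n12 + n21, 0.5)
marg1 / marg0   # should agree with B above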

In the Cancer at Slater example we had observed 8 cases, and the statistical model (under H0) is the binomial distribution with sample size 145 and probability parameter 0.03.

Calculate the p-value

- by summing the probability mass function for arguments 8, 9, ...
- more directly, by using the CDF of the binomial distribution.

The R name of the binomial distribution is `binom`.

Solution:

# p-value by summing the probability mass function:
sum(dbinom(8:145, size = 145, prob = 0.03))
# p-value by using pbinom():
pbinom(7.5, size = 145, prob = 0.03, lower = F)

Comment: I didn't call pbinom() with an integer argument, because I did not want to depend on the designers of that function having made the same choice as I did about whether the point mass at x = 8 belongs to the CDF or to its complement.
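A quick check of the convention, using the same numbers as above:

pbinom(7, size = 145, prob = 0.03, lower = F)   # P(X > 7) = P(X >= 8); agrees with the sum above
pbinom(8, size = 145, prob = 0.03, lower = F)   # P(X > 8); this would drop the point mass at 8

With lower = F, pbinom() returns the probability of a strictly larger value, so the call with 7.5 keeps the point mass at x = 8 in the p-value.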

In the Bayesian treatment of the Cancer at Slater example discussed in the lectures, we considered only the hypotheses theta = 0.03, 0.04, 0.05, 0.06. Why so few alternative hypotheses? Has the example been rigged to make the alternative hypothesis seem worse than it really should be? Let's extend the theta grid and check the results!

Now we reconsider the same problem, but in this case consider the values theta = 0.03, 0.04, ..., 0.15. Take the prior probability of the null (theta = 0.03) to be 0.5, and distribute the rest of the prior probability evenly over the other theta values. The observation is x = 8 and the likelihood is the binomial likelihood with sample size 145 and probability parameter theta.

Calculate posterior probabilities of the different 'theories' theta = 0.03, 0.04, ..., 0.15.

Advice: once you have formed a vector containing the prior probabilities (corresponding to different thetas), and a vector containing the likelihood values (for different thetas), then the posterior probabilities are given by normalizing the product of prior times likelihood to sum to one, i.e., in the following manner

(prior * likelihood) / sum(prior * likelihood)

Solution:

theta <- seq(0.03, 0.15, by = 0.01)
k <- length(theta) - 1
prior <- c(0.5, 0.5 * rep(1, k) / k)
likelihood <- dbinom(x = 8, size = 145, prob = theta)
posterior <- prior * likelihood / sum(prior * likelihood)
plot(theta, posterior, type = 'h')
# odds in favour of the alternative:
(1 - posterior[1]) / posterior[1]
# ... the alternative hypothesis seems now even worse than previously

In the Vitamin C example, the observation is x = 13 in the model where x has binomial distribution with sample size 17 and probability parameter p. The null hypothesis is p = 0.5, and the prior probabilities of the null and the alternative hypothesis are equal.

Under the alternative hypothesis, the prior distribution of p is the uniform distribution on (p0, 1 - p0), where 0 < p0 < 0.5. The density of this distribution has the constant value 1 / (1 - 2*p0) on (p0, 1 - p0) and is zero elsewhere.

Go through the formulas for calculating the posterior probability of H0, and calculate and plot its value for p0 = 0.01, 0.02, ..., 0.49. (Check that the posterior probability stays above 0.20.)

Advice:
R has the function `integrate()` for numerical integration.
E.g., to calculate the integral of x^2 over (0, 1), use it
as follows.

f <- function(x) x^2
int <- integrate(f, 0, 1)
int

However, `integrate()` returns an object,
whose structure you must unravel (hint: try `str(int)`)
before you can store the (approximate) value of the
integral as an element of a numerical vector, whose elements you
calculate using a for-loop.
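For instance (a small illustration), the approximate value is in the component int$value, so a vector of integrals can be filled in a loop:

vals <- numeric(3)
for (i in 1:3) vals[i] <- integrate(f, 0, i)$value
vals   # approximately i^3 / 3, i.e. 1/3, 8/3, 9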
Solution:
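Since the prior probabilities of H0 and H1 are equal,

Pr(H0 | x) = 1 / (1 + Pr(x | H1) / Pr(x | H0)),

where

Pr(x | H0) = choose(17, 13) * 0.5^17
Pr(x | H1) = choose(17, 13) / (1 - 2*p0) * integral of p^13 * (1 - p)^4 over (p0, 1 - p0).

The binomial coefficients cancel, which gives the expression

Pr(H0 | x) = (1 + 2^17 / (1 - 2*p0) * integral)^(-1)

used in the code below.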

p0 <- seq(0.01, 0.49, by = 0.01)
f <- function(p) p^13 * (1 - p)^4
posth0 <- numeric(length(p0))  # set up a vector for the results
for (i in 1:length(p0)) {
  v <- integrate(f, p0[i], 1 - p0[i])
  posth0[i] <- (1 + 2^17 / (1 - 2 * p0[i]) * v$value)^(-1)
}
plot(p0, posth0, type = 'l', ylab = 'Pr(H0 | x)')
# NB: the integrals could be calculated here using pbeta() instead of
# integrate().

Consider the following case-control dataset, where row one contains counts for cases and row two for controls, and where there are two marker classes (the columns).

(n <- matrix(c(602, 1065, 28, 101), 2, 2))

- Calculate the value of the chi-squared statistic X^2 (eq. (26)), and its p-value.
- Calculate the odds ratio OR given by
OR = (n11 * n22) / (n12 * n21)

Calculate an approximate 95% confidence interval for OR with

exp(log(OR) +- 1.96 * SE(log(OR))),

where

SE(log(OR)) = sqrt(sum(1 / n))

(There are functions in R for calculating X^2 and OR, but we won't reveal their names. Try to do without ready-made functions.)

Solution:

(n <- matrix(c(602, 1065, 28, 101), 2, 2))
s <- sum(n)
x2 <- (s * (n[1,1] * n[2,2] - n[1,2] * n[2,1])^2) /
  ((n[1,1] + n[1,2]) * (n[1,1] + n[2,1]) * (n[1,2] + n[2,2]) * (n[2,1] + n[2,2]))
x2
# p-value:
pchisq(x2, df = 1, lower = F)
# The previous values can be calculated more easily with
chisq.test(n, correct = F)
# odds ratio
OR <- n[1,1] * n[2,2] / (n[1,2] * n[2,1])
# an approximate confidence interval for the odds ratio:
exp(log(OR) + c(-1, 1) * 1.96 * sqrt(sum(1 / n)))
# The previous values, and the p-value, can be calculated more easily with
fisher.test(n)

We still use the data of the previous problem.

- Calculate the Bayes factor using eq. (30). The R function which calculates the beta function B(,) is `beta()`.

Eq. (30) comes from the analysis, which uses priors

- p12, p22 ~ Beta(0.5, 0.5), independently, under H1.
- p12 = p22 and p12 ~ Beta(0.5, 0.5) under H0.

Here pij is the expected proportion for n[i,j] and p11 = 1 - p12 and p21 = 1 - p22.

- Write the likelihood under H1 and H0 (one way to write them in R is sketched below).
- Recalculate the Bayes factor with numerical integration (function `integrate()`) using the above priors.
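As a sketch of the first item in R, reusing the matrix n and the proportions p12, p22 defined above (these match the integrands used in the solution):

# Under H1 the two rows are independent binomials with parameters p12 and p22:
lik.h1 <- function(p12, p22) dbinom(n[1,2], size = sum(n[1,]), prob = p12) *
                             dbinom(n[2,2], size = sum(n[2,]), prob = p22)
# Under H0 the rows share a common parameter p = p12 = p22:
lik.h0 <- function(p) dbinom(n[1,2], size = sum(n[1,]), prob = p) *
                      dbinom(n[2,2], size = sum(n[2,]), prob = p)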

Hint: Pr(data given H1) factorizes into a product of two one-dimensional integrals; Pr(data given H0) is a one-dimensional integral.

Warning: You will probably encounter numerical problems.

Solution:

# Bayes factor from eq. (30)
B <- beta(n[1,1] + 0.5, n[1,2] + 0.5) * beta(n[2,1] + 0.5, n[2,2] + 0.5) /
  (beta(0.5, 0.5) * beta(n[1,1] + n[2,1] + 0.5, n[1,2] + n[2,2] + 0.5))
# Bayes factor by numerical integration
# First define the three integrands
f11 <- function(p) dbinom(n[1,2], size = sum(n[1,]), prob = p) * dbeta(p, 0.5, 0.5)
f12 <- function(p) dbinom(n[2,2], size = sum(n[2,]), prob = p) * dbeta(p, 0.5, 0.5)
f0 <- function(p) (dbinom(n[1,2], size = sum(n[1,]), prob = p) *
                   dbinom(n[2,2], size = sum(n[2,]), prob = p) *
                   dbeta(p, 0.5, 0.5))
# First attempt:
marg.lik.h0 <- integrate(f0, 0, 1)$value
marg.lik.h1 <- integrate(f11, 0, 1)$value * integrate(f12, 0, 1)$value
B.int1 <- marg.lik.h1 / marg.lik.h0
# This is not near the result calculated previously. What went wrong?
# Let's inspect the integrands:
x <- seq(0, 1, len = 401)
op <- par(mfrow = c(2, 2))
plot(x, f11(x), 'l')
plot(x, f12(x), 'l')
plot(x, f0(x), 'l')
par(op)
# They are all highly peaked -- that's the cause of the problems.
# Try increasing the number of evaluation points:
integrate(f11, 0, 1)
integrate(f11, 0, 1, sub = 10000)
integrate(f12, 0, 1)
integrate(f12, 0, 1, sub = 10000)
integrate(f0, 0, 1)
integrate(f0, 0, 1, sub = 10000)
# ... no changes.
# Try restricting the integration set; all the integrands seem to vanish
# outside (0, 0.2)
integrate(f11, 0, 1)
integrate(f11, 0, 0.2)
integrate(f11, 0.2, 1)
integrate(f12, 0, 1)
integrate(f12, 0, 0.2)
integrate(f12, 0.2, 1)
integrate(f0, 0, 1)
integrate(f0, 0, 0.2)
integrate(f0, 0.2, 1)
# Ahaa: the integral of f0 was hard for integrate()
marg.lik.h0 <- ( integrate(f0, 0, 0.2, sub = 10000)$value +
                 integrate(f0, 0.2, 1)$value )
marg.lik.h1 <- ( (integrate(f11, 0, 0.2, sub = 10000)$value +
                  integrate(f11, 0.2, 1)$value) *
                 (integrate(f12, 0, 0.2, sub = 10000)$value +
                  integrate(f12, 0.2, 1)$value) )
B.int2 <- marg.lik.h1 / marg.lik.h0
B
B.int2
# Now, that's better.
# Moral: calculating the Bayes factor can be tricky!

We use the data of problem 1: n12 = 39, n21 = 86. The model is described in the lectures and also in problem 2. The parameter p = p12 has the non-informative prior Beta(0.5, 0.5).

Under H1, the joint density of the data and the parameter, f(data, p), is given by

choose(n12+n21, n12) / beta(0.5, 0.5) * p^(n12+0.5 - 1) * (1-p)^(n21+0.5 - 1)

- Calculate the maximum point (wrt p) of the joint density analytically.
- Now pretend that you don't know how to calculate the maximum point analytically, and calculate the maximum point numerically, using optimize(). (This is explained in the lecture notes.)
- Calculate the second derivative of log(f(data, p)) analytically.
- Calculate the Bayes factor using Laplace approximation. (We already calculated the correct result 610.8 in problem 2.)

NB: It is possible to do the maximization and calculation of the
Hessian at the maximum point with `optim()`, but this
function is trickier to use than `optimize()`.

Solution:

n12 <- 39
n21 <- 86
# maximum point analytically:
max.p <- (n12 - 0.5) / (n12 + n21 - 1)
# Let's pretend that we do not know how to calculate the maximum point.
# maximum point with optimize()
log.C1 <- lchoose(n12 + n21, n12) - lbeta(0.5, 0.5)
logf <- function(p) (log.C1 + (n12 - 0.5) * log(p) + (n21 - 0.5) * log(1 - p))
opt1 <- optimize(logf, interval = c(0, 1), maximum = TRUE)
phat <- opt1$max
# second derivative of logf:
L2 <- function(p) (-(n12 - 0.5) / p^2 - (n21 - 0.5) / (1 - p)^2)
A <- - L2(phat)
# Laplace approximation to the marginal likelihood under H1:
f1 <- sqrt(2 * pi / A) * exp(logf(phat))
# This is the likelihood under H0:
f0 <- dbinom(n12, n12 + n21, 0.5)
(B.Lap <- f1 / f0)
# The same results with optim()
# first derivative of logf (passed to optim() as the gradient):
L1 <- function(p) ((n12 - 0.5) / p - (n21 - 0.5) / (1 - p))
# Tell optim() that I want to maximize, not minimize
ctrl <- list(fnscale = -1)
# I cannot use the default optimization method (Nelder-Mead),
# since it cannot handle one-dimensional optimization.
opt12 <- optim(0.3, fn = logf, gr = L1, control = ctrl,
               lower = 0.01, upper = 0.99, method = "L-BFGS-B", hessian = TRUE)
A2 <- - opt12$hessian
f12 <- sqrt(2 * pi / A2) * exp(opt12$value)
(B.Lap2 <- f12 / f0)

The lecture notes use the following data

xa <- scan()
2.065 1.435 2.792 1.017 1.992 1.794 0.733 2.040 2.355 1.536

xb <- scan()
2.19 2.25 1.22 2.58 2.64 2.56 2.45 3.00 1.73 2.55

xa; xb

Calculate the t-statistic using the textbook formulae

# pooled standard deviation:
sp <- sqrt( ((na - 1) * sa^2 + (nb - 1) * sb^2) / (na + nb - 2) )
# standard error of difference of means:
sed <- sp * sqrt(1 / na + 1 / nb)
t.stat <- (mb - ma) / sed

where na, ma and sa are the sample size, mean and standard deviation of data set xa, and nb, mb and sb are the sample size, mean and standard deviation of data set xb.

Calculate the F statistic corresponding to t.stat.

Finally, redo the calculations with `t.test()`.

Solution:

xa <- scan()
2.065 1.435 2.792 1.017 1.992 1.794 0.733 2.040 2.355 1.536

xb <- scan()
2.19 2.25 1.22 2.58 2.64 2.56 2.45 3.00 1.73 2.55

xa; xb
na <- length(xa)
nb <- length(xb)
(ma <- mean(xa))
(mb <- mean(xb))
(sa <- sd(xa))
(sb <- sd(xb))
# sp, pooled standard deviation
(sp <- sqrt( ((na - 1) * sa^2 + (nb - 1) * sb^2) / (na + nb - 2) ))
# sed, standard error of difference of means
(sed <- sp * sqrt(1 / na + 1 / nb))
(t.stat <- (mb - ma) / sed)
(F.stat <- t.stat^2)
# Check the results with t.test()
t.test(xb, xa, var.equal = T)

Recalculate the F statistic using `aov()`.

Solution:

The two-sample t-test is a special case of one-way ANOVA, when we have two groups.

f <- factor(c(rep('a', na), rep('b', nb)))
x <- c(xa, xb)
summary(aov(x ~ f))

First, load library `ldDesign`, and then
calculate the Bayes factor using `SS.oneway.bf()`.

Next we try to calculate the Bayes factor using Laplace approximation.
For this, you need to implement functions which evaluate
the logarithm of the joint density of the data and the parameters
under H0 and H1. You can then locate their maxima with `optim()`,
which is also able to calculate the Hessian matrix at the
extremal point. Function `optim()` usually minimizes functions,
so give it a function which calculates the *negative* of your target
function.

In order to do this problem, we need to specify priors. Use

- ma, mb ~ N(0, 2^2), independently, and define the precision tau = 1/sigma^2, and let tau ~ Gamma(1, 0.1) (under H1).
- ma = mb = m, m ~ N(0, 2^2), and tau ~ Gamma(1, 0.1) (under H0).

This is not the same prior which was used by Spiegelhalter and Smith, so the Bayes factor will be different, too.

Solution:

# Spiegelhalter and Smith Bayes factor
library(ldDesign)
B.SS <- SS.oneway.bf(c(na, nb), F.stat)
# Bayes factor with Laplace approximation:
# Function for the negative of the joint density of data and
# parameters under H1
neg.log.f1 <- function(v) {
  ma <- v[1]
  mb <- v[2]
  prec <- v[3]
  if (prec < 0) return(Inf)
  s <- 1 / sqrt(prec)
  loglik <- ( sum(dnorm(log = TRUE, xa, mean = ma, sd = s)) +
              sum(dnorm(log = TRUE, xb, mean = mb, sd = s)) )
  logpri <- ( sum(dnorm(log = TRUE, c(ma, mb), mean = 0, sd = 2)) +
              dgamma(log = TRUE, prec, 1, 0.1) )
  return(-loglik - logpri)
}
# Function for the negative of the joint density of data and
# parameters under H0
neg.log.f0 <- function(v) {
  m <- v[1]
  prec <- v[2]
  if (prec < 0) return(Inf)
  s <- 1 / sqrt(prec)
  loglik <- sum(dnorm(log = TRUE, c(xa, xb), mean = m, sd = s))
  logpri <- ( dnorm(log = TRUE, m, mean = 0, sd = 2) +
              dgamma(log = TRUE, prec, 1, 0.1) )
  return(-loglik - logpri)
}
# Reasonable initial values for optimization (under H1)
ini1 <- c(mean(xa), mean(xb), 1 / sp^2)
opt1 <- optim(ini1, neg.log.f1, hessian = TRUE)
x <- c(xa, xb)
# Reasonable initial values for optimization (under H0)
ini0 <- c(mean(x), 1 / var(x))
opt0 <- optim(ini0, neg.log.f0, hessian = TRUE)
B.Lap <- (2 * pi)^(3/2) * det(opt1$hessian)^(-0.5) * exp(-opt1$value) /
  ((2 * pi)^(2/2) * det(opt0$hessian)^(-0.5) * exp(-opt0$value))

Calculate the BIC criterion under H0 and H1, and approximate the Bayes factor using the BIC values.

Solution:

x <- c(xa, xb)
f <- factor(c(rep('a', length(xa)), rep('b', length(xb))))
m1 <- lm(x ~ f)
m0 <- lm(x ~ 1)
p1 <- 3  # two regression coeff's + error variance parameter
p0 <- 2  # one regression coeff + error variance
# The only thing that matters in the sequel is the difference p1 - p0,
# so you could just as well use (p1 = 2 and p0 = 1) or (p1 = 1 and p0 = 0)
n <- length(x)
# Up to an additive constant, BIC = n * log(RSS / n) + p * log(n);
# since RSS = TSS * (1 - R^2) with the same total sum of squares TSS in both
# models, n * log(1 - R^2) can be used here: the constant cancels in bic1 - bic0.
bic1 <- n * log(1 - summary(m1)$r.squared) + p1 * log(n)
bic0 <- n * log(1 - summary(m0)$r.squared) + p0 * log(n)
B.bic <- exp(-(bic1 - bic0) / 2)

Solution:

Notice that we are now considering the paired t-test, not the two-sample t-test which we have been considering up to now. The data for the paired t-test is x = xb - xa, and the sample size is now 10.

x <- xb - xa
n <- length(x)
# Precision parameter for the prior on mu
prec.mu <- qnorm(0.05, lower = F)^2
# Check that this gives 5 %:
pnorm(1, lower = F, sd = 1 / sqrt(prec.mu))
# My prior for the precision parameter (in the likelihood) is
# Gamma(a, b)
a <- 1
b <- 0.1
mu.bar <- mean(x)
tau.bar <- (a + n/2) / (b + sum( (x - mu.bar)^2 ) / 2)
N <- 10000
mu1 <- tau.bar * sum(x) / (prec.mu + n * tau.bar)
sd1 <- 1 / sqrt(prec.mu + n * tau.bar)
a1 <- a + n/2
b1 <- b + sum((x - mean(x))^2) / 2
mu.sample <- rnorm(N, mean = mu1, sd = sd1)
tau.sample <- rgamma(N, a1, b1)
q.dens <- function(mu, tau) dnorm(mu, mean = mu1, sd = sd1) * dgamma(tau, a1, b1)
# Posterior is proportional to prior * likelihood;
# the following function calculates it
joint.dens <- function(mu, tau) {
  prior <- ( dnorm(mu, mean = 0, sd = 1 / sqrt(prec.mu)) *
             dgamma(tau, a, b) )
  lik <- numeric(length(mu))
  for (i in 1:length(mu)) {
    lik[i] <- prod(dnorm(x, mean = mu[i], sd = 1 / sqrt(tau[i])))
  }
  prior * lik
}
weights <- joint.dens(mu.sample, tau.sample) / q.dens(mu.sample, tau.sample)
# normalize them to a probability vector
weights <- weights / sum(weights)
# This is our histogram estimate for the marginal posterior density of mu:
hist(probability = TRUE, sample(size = 1000, mu.sample, prob = weights))
# This is our estimate for Pr(mu > 0 | x)
sum(weights * (mu.sample > 0))
# add the plot of the prior density:
u <- seq(-2, 2, len = 201)
lines(u, dnorm(u, sd = 1 / sqrt(prec.mu)))
# Bayes factor using the Savage-Dickey density ratio at mu = 0:
delta <- 0.1
g1 <- sum(weights * (- delta < mu.sample) * (mu.sample < delta)) / (2 * delta)
# check from the density histogram that the estimate is reasonable:
g1
# This is the approximate Bayes factor for H1 against H0
B <- dnorm(0, sd = 1 / sqrt(prec.mu)) / g1

Last updated 2008-05-08 17:39

Petri Koistinen

petri.koistinen 'at' helsinki.fi