Helsingin yliopisto Department of Mathematics and Statistics
Faculty of Science
Faculty of Social Sciences


57755 Statistical Methods for Association Mapping


This is an intensive graduate-level short course intended for students of statistics, mathematics and the biological sciences who are interested in association mapping. The course is sponsored by the ComBi graduate school in computational biology, bioinformatics and biometry.

Time and place
There are 5 days with 5 hours of lectures per day from Monday the 5th of May to Friday the 9th of May (week 19), 2008. The lectures are held in Exactum.
The course is worth 4 to 6 credit units.
The lecturers are Roderick Ball (Scion), Petri Koistinen (U. Helsinki, calculations using R), and Bob O'Hara (U. Helsinki, Bayesian calculations using BUGS).
Please fill in the registration form and email it according to the instructions.
Table of contents for the rest of this page
Course outline
Obtaining the credits
Course schedule

Course outline

This course introduces association mapping and the statistical methods and computational techniques it requires. The course is based on chapters 7 and 8 of the book Association Mapping in Plants, together with case studies of recent large-scale case-control studies of human disease.

Association mapping is a gene mapping method based on detecting and utilising population-level associations, i.e. non-independence or 'linkage disequilibrium' between genetic loci, e.g. between DNA markers and traits of interest. There is interest in using association mapping to find genetic loci associated with variation in complex traits and diseases.
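
As a small illustration of what non-independence between two loci means, the sketch below computes two standard linkage disequilibrium measures, the coefficient D and the squared correlation r^2, for a pair of biallelic loci. The function name and the example frequencies are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab : frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # D is the difference between the observed haplotype frequency and the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # r2 scales D squared by the allele-frequency variances, giving a value in [0, 1].
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Illustrative (made-up) frequencies: the AB haplotype is more common
# than independence would predict (0.5 * 0.5 = 0.25).
d, r2 = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

When the loci are independent, p_ab equals p_a * p_b and both measures are zero.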

The advent of dense maps of SNP markers covering the genome (e.g. 500,000 or more markers), and of technologies for screening large numbers of markers per individual, is leading to the generation of vast amounts of genomic data. However, obtaining useful information from the data is non-trivial, and many published associations are spurious. Statistical methods for analysing the data are presented. Experimental designs with sufficient power to overcome the low prior odds for genomic associations are equally vital. The course introduces methods and software for ensuring that designs have sufficient power to obtain reasonable posterior odds for associations.
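
The interplay between low prior odds, power and posterior odds can be sketched with a simplified calculation. It assumes that the only evidence used is whether a test was "significant", so that the posterior odds equal the prior odds multiplied by power / alpha. This is a textbook simplification and not necessarily the method taught in the course; the prior odds of 1 in 50,000 and the function names are hypothetical values chosen for illustration.

```python
def posterior_odds(prior_odds, power, alpha):
    """Posterior odds of a true association given a 'significant' test result.

    Uses odds form of Bayes' theorem with likelihood ratio power / alpha:
    P(significant | true association) / P(significant | no association).
    """
    return prior_odds * power / alpha


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)


# Hypothetical prior: roughly 1 marker in 50,000 is truly associated.
prior = 1 / 50_000

for alpha in (0.05, 1e-6):
    odds = posterior_odds(prior, power=0.8, alpha=alpha)
    print(f"alpha = {alpha:g}: posterior odds = {odds:.5f}, "
          f"posterior probability = {odds_to_probability(odds):.5f}")
```

Under these assumed numbers, a result that is significant at the conventional 5% level still leaves the posterior odds far below 1, which is one way to see why many published associations turn out to be spurious.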

The basic concepts of Bayesian statistics, and how to use them for testing scientific hypotheses, are introduced. This enables computation of posterior probabilities for scientific hypotheses. A range of techniques including analytical approximate methods, conjugate prior distributions and MCMC sampling are introduced. Bayesian computations for case studies are demonstrated and compared with classical 'frequentist' inference based on p-values, which are shown to be particularly problematic in a genomics context.
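
To make the idea of a conjugate prior concrete, here is a minimal sketch using the Beta-binomial model for an allele frequency: a Beta(a, b) prior combined with k copies of the allele among n sampled chromosomes gives a Beta(a + k, b + n - k) posterior. The data and function names are hypothetical, and the posterior probability is estimated by simple Monte Carlo sampling from the known posterior, not by MCMC.

```python
import random


def beta_binomial_update(a, b, k, n):
    """Update a Beta(a, b) prior after observing k successes in n trials."""
    return a + k, b + (n - k)


def posterior_prob_greater(a, b, threshold, draws=100_000, seed=1):
    """Monte Carlo estimate of P(p > threshold) when p ~ Beta(a, b)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws


# Hypothetical data: 7 of 10 sampled chromosomes carry the allele,
# with a uniform Beta(1, 1) prior on the allele frequency p.
a_post, b_post = beta_binomial_update(a=1, b=1, k=7, n=10)
post_mean = a_post / (a_post + b_post)
prob = posterior_prob_greater(a_post, b_post, threshold=0.5)
print(f"posterior: Beta({a_post}, {b_post}), mean = {post_mean:.3f}")
print(f"estimated P(p > 0.5 | data) = {prob:.3f}")
```

Because the posterior has a closed form here, no MCMC is needed; sampling methods become useful when the model has no conjugate structure.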

The R system for data analysis and graphics and the BUGS system are introduced, and the required computations demonstrated. R functions and libraries (ldDesign) will be provided.


The course starts from first principles of Bayesian statistics. Knowledge of the basics of calculus (differentiation, integration), matrix algebra, and probability theory is an advantage. Basic knowledge of genetics (e.g. Mendelian inheritance, heritability) is also an advantage. The course aims to cater for both biologically and statistically oriented students.

Obtaining the credits

There is a 1-hour exam at the end of the course and an optional project due by the end of May. Students who pass the exam will receive 4 credits. Satisfactory completion of the project is worth an extra 2 credits. For the project, students are encouraged to work in pairs, with one biologist and one statistician.


  • Ball, R.D. 2004: ldDesign, an R package for design of experiments for detection of linkage disequilibrium. http://cran.r-project.org/web/packages/ldDesign/index.html
  • Ball, R.D. 2005: Experimental designs for reliable detection of linkage disequilibrium in unstructured random population association studies. Genetics 170: 859-873.
  • De Silva, H.N., and Ball, R.D. 2007: Linkage disequilibrium mapping concepts, Chapter 7, pp. 103-132, In: Association Mapping in Plants, N.C. Oraguzie, E.H.A. Rikkerink, S.E. Gardiner, and H.N. De Silva (Editors). Springer 2007.
  • Ball, R.D. 2007: Statistical analysis and experimental design, Chapter 8, pp. 133-209, In: Association Mapping in Plants, N.C. Oraguzie, E.H.A. Rikkerink, S.E. Gardiner, and H.N. De Silva (Editors). Springer 2007.


The five days of lectures include a 5-hour introduction to the R system for statistical analysis and graphics and a 5-hour introduction to the BUGS system for Bayesian analysis using Gibbs sampling.

Course schedule

Monday May 5, 2008
  • 9:45 - 10:15 Registration
  • 10:15 - 12:15 Lectures, room D123 (Ball)
  • 1:30 - 4:30 Lectures, room D123 (Ball)
Tuesday May 6, 2008
  • 9:45 - 12:15 Lectures, room D123 (Ball)
  • 1:15 - 3:45 R(1), room C128 (Koistinen, Ball)
Wednesday May 7, 2008
  • 9:45 - 12:15 Lectures, room D123 (Ball)
  • 1:15 - 3:45 R(2), room C128 (Koistinen, Ball)
Thursday May 8, 2008
  • 9:45 - 12:15 Lectures, room D123 (Ball)
  • 1:45 - 4:15 BUGS(1), room C128 (O'Hara, Ball)
Friday May 9, 2008
  • 10:00 - 12:00 Lectures, room D123 (Ball)
  • 1:15 - 3:15 BUGS(2), room C128 (O'Hara, Ball)
  • 3:15 - 4:00 Time for discussion with the instructor (Ball)
  • 4:00 - 5:00 Test, room D123 (Ball)


Last updated 2010-01-19 13:23
Petri Koistinen
petri.koistinen 'at' helsinki.fi