Send your solutions to the lecturer by email (firstname.lastname@example.org), at the latest on Thu 3 March at 10.00. Use R4 as the title of your message.
This figure was produced by repeating the following experiment 100 times:
a sample of size 10 was generated from the standard normal
distribution. In each repetition, the 95% t-confidence interval was
calculated and stored (using
t.test(x)$conf.int). Finally, each confidence
interval was drawn (with function
so that those confidence intervals which
did not cover the true value zero were colored red.
Write R code for producing a similar figure.
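One possible starting point is sketched below. It is not the lecturer's code: the seed, the vertical orientation of the intervals and the use of segments() for drawing are assumptions made for illustration.

```r
set.seed(1)      # assumed seed, only to make the sketch reproducible
n_rep <- 100     # number of repetitions
n <- 10          # sample size in each repetition

# Row 1 holds the lower limits and row 2 the upper limits of the
# 95% t-confidence intervals, one column per repetition.
ci <- replicate(n_rep, as.numeric(t.test(rnorm(n))$conf.int))

# TRUE for the intervals that do not cover the true value zero.
miss <- ci[1, ] > 0 | ci[2, ] < 0

# Set up empty axes, then draw one vertical segment per interval.
plot(c(1, n_rep), range(ci), type = "n",
     xlab = "repetition", ylab = "95% confidence interval")
segments(x0 = 1:n_rep, y0 = ci[1, ], x1 = 1:n_rep, y1 = ci[2, ],
         col = ifelse(miss, "red", "black"))
abline(h = 0, lty = 2)   # the true value
```

In the long run about 5% of the intervals should miss zero, so sum(miss) typically comes out as a small number.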
We continue exploring the data set tn10.csv
from the probability theory course.
We investigate whether the decision not to come to the first course exam (
K11 is missing) and low exercise activity at the beginning of the course (
H1+H2+H3+H4+H5 < 6) are associated. Test the null hypothesis
that these classifications of the students are independent using the chi-squared
test. Do you reject the null hypothesis at significance level 0.05?
Hints: use the function
chisq.test() and form its input using the function
table(). As inputs to
table() you can use two logical vectors
that correspond to the row and column classifications.
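The mechanics of the hint can be sketched as follows. The data frame below is simulated and only stands in for tn10.csv; the column names K11 and H1, ..., H5 come from the exercise text, while the values, the sample size and the seed are made up for illustration.

```r
set.seed(2)      # assumed seed, only to make the sketch reproducible
n <- 120         # hypothetical number of students

# Simulated stand-in for tn10.csv (the real file is not read here).
tn10 <- data.frame(
  K11 = sample(0:24, n, replace = TRUE),
  H1 = rbinom(n, 3, 0.4), H2 = rbinom(n, 3, 0.4), H3 = rbinom(n, 3, 0.4),
  H4 = rbinom(n, 3, 0.4), H5 = rbinom(n, 3, 0.4)
)
tn10$K11[runif(n) < 0.3] <- NA   # some students skip the first exam

# Two logical vectors giving the row and column classifications.
no_exam <- is.na(tn10$K11)
low_activity <- with(tn10, H1 + H2 + H3 + H4 + H5 < 6)

tab <- table(no_exam, low_activity)   # 2 x 2 contingency table
res <- chisq.test(tab)                # chi-squared test of independence
res
res$p.value < 0.05                    # TRUE means: reject at level 0.05
```

For a 2 x 2 table chisq.test() applies Yates' continuity correction by default; passing correct = FALSE gives the uncorrected statistic.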
The bilirubin data set, bilirubin.dat, contains bilirubin concentration measurements for three individuals. The measurements from a single individual form one of the three groups. Now we draw a figure to investigate informally the assumption of constant variance within each group, which is needed for an ANOVA model (analysis of variance model) for this data. First read the data and then draw a stripchart from it with the command
stripchart(bili$Concentration ~ bili$Individual, method = 'jitter', vert = TRUE, pch = 1)
Next, using the command
arrows( ...arguments... , angle = 90, code = 3)
add line segments, which show for each of the three individuals the 1 SEM (standard error of the mean) limits
xbar[i] +- sem[i]
where xbar[i] is the mean of the bilirubin concentrations for the i'th individual, and sem[i] is the corresponding standard error of the mean, which is equal to the standard deviation of the concentration measurements for the i'th individual divided by the square root of the number of the concentration measurements for the i'th individual.
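A minimal sketch of the group-wise computation follows. The data frame is simulated and only stands in for bilirubin.dat: the column names are taken from the stripchart command, whereas the group sizes, the values and the seed are hypothetical. The length argument of arrows() is likewise just one reasonable choice.

```r
set.seed(3)      # assumed seed, only to make the sketch reproducible

# Simulated stand-in for bilirubin.dat (the real file is not read here).
bili <- data.frame(
  Individual = factor(rep(1:3, times = c(11, 10, 8))),
  Concentration = c(rnorm(11, 0.5, 0.2), rnorm(10, 0.4, 0.1), rnorm(8, 0.8, 0.3))
)

stripchart(bili$Concentration ~ bili$Individual, method = 'jitter', vert = TRUE, pch = 1)

# Group means, standard deviations and group sizes, one value per individual.
xbar <- tapply(bili$Concentration, bili$Individual, mean)
s <- tapply(bili$Concentration, bili$Individual, sd)
n <- tapply(bili$Concentration, bili$Individual, length)
sem <- s / sqrt(n)   # standard error of the mean

# The groups sit at x = 1, 2, 3; draw xbar[i] - sem[i] to xbar[i] + sem[i]
# with flat end caps (angle = 90, code = 3).
arrows(x0 = 1:3, y0 = xbar - sem, x1 = 1:3, y1 = xbar + sem,
       angle = 90, code = 3, length = 0.1)
```

If the error bars have clearly different lengths even though the group sizes are similar, that is an informal sign that the within-group variances differ.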
Using the LifeCycleSavings data set of R, first fit a linear model, which
tries to predict the variable
sr using the other variables of the
data frame as explanatory variables. Then calculate the following.
sr for a new country with pop15 equal to 28, pop75 equal to 2.4, dpi equal to 1700 and ddpi equal to 4.3.
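The fit and a prediction for the new country can be sketched as below. Which quantities the exercise asks for cannot be read from the text above, so the point prediction and the 95% prediction interval shown here are assumptions about what is wanted; the object names are hypothetical.

```r
# LifeCycleSavings ships with R; "sr ~ ." uses all other columns
# (pop15, pop75, dpi, ddpi) as explanatory variables.
fit <- lm(sr ~ ., data = LifeCycleSavings)
summary(fit)

# The new country must be given as a data frame with matching column names.
new_country <- data.frame(pop15 = 28, pop75 = 2.4, dpi = 1700, ddpi = 4.3)

# Point prediction of sr.
pred <- predict(fit, newdata = new_country)

# Point prediction together with a 95% prediction interval.
pred_int <- predict(fit, newdata = new_country,
                    interval = "prediction", level = 0.95)
pred
pred_int
```

Using interval = "confidence" instead gives an interval for the expected value of sr rather than for a single new observation.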
The file trees.dat contains measurements of the diameter (d), height (h) and volume (v) of black cherry trees (given in exotic units). While it is easy to measure the diameter and height of a growing tree, measuring its volume directly (without cutting it down) is not easy. Therefore it is of interest to try to predict the volume of a tree on the basis of its diameter and its height.
First try to explain v with a linear model, where the explanatory variables are h and d. Next draw the following residual plots
Convince yourself that this model is not good, since some of the residual plots exhibit a clear systematic pattern.
Next try to explain log(v) with a linear model, where the explanatory variables are log(h) and log(d). Draw again the residual plots and notice that the systematic structure has now disappeared.
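The comparison of the two models can be sketched as follows. Since trees.dat is not available here, the sketch uses R's built-in trees data (black cherry trees with columns Girth, Height and Volume) renamed to d, h and v; whether that data set matches trees.dat is an assumption. The choice of residual plots (against the fitted values and against each explanatory variable) and the helper resid_plots() are also hypothetical.

```r
# Stand-in for trees.dat: the built-in trees data with the course's names.
tr <- data.frame(d = trees$Girth, h = trees$Height, v = trees$Volume)

fit_raw <- lm(v ~ h + d, data = tr)                  # model on the original scale
fit_log <- lm(log(v) ~ log(h) + log(d), data = tr)   # model on the log scale

# Hypothetical helper: residuals against fitted values, h and d.
resid_plots <- function(fit, data, main) {
  r <- resid(fit)
  plot(fitted(fit), r, xlab = "fitted values", ylab = "residuals", main = main)
  abline(h = 0, lty = 2)
  plot(data$h, r, xlab = "h", ylab = "residuals", main = main)
  abline(h = 0, lty = 2)
  plot(data$d, r, xlab = "d", ylab = "residuals", main = main)
  abline(h = 0, lty = 2)
}

op <- par(mfrow = c(2, 3))   # top row: raw model, bottom row: log model
resid_plots(fit_raw, tr, "v ~ h + d")
resid_plots(fit_log, tr, "log(v) ~ log(h) + log(d)")
par(op)
```

A curved band of points in a residual plot is the kind of systematic pattern to look for; a structureless cloud around zero is what a well-fitting model produces.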
We continue with the trees data trees.dat. First fit the full linear model, where the response variable is log(v) and the explanatory variables are log(h) and log(d).
A simple model for connecting the (expected) values of the three quantities is the following
v = constant * h * d^2
Taking logarithms, you arrive at a reduced linear model, which is a special case of the full linear model.
Perform the F-test, where you compare the reduced linear model to the full linear model.
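Taking logarithms of v = constant * h * d^2 gives log(v) = log(constant) + log(h) + 2 log(d), that is, the full model with the coefficient of log(h) fixed at 1 and the coefficient of log(d) fixed at 2. One way to fit such a model in R, sketched here as an assumption rather than as the intended solution, is to put the fixed part in an offset() term so that only the intercept is estimated. As before, the built-in trees data stands in for trees.dat.

```r
# Stand-in for trees.dat: the built-in trees data with the course's names.
tr <- data.frame(d = trees$Girth, h = trees$Height, v = trees$Volume)

# Full model: both slopes are estimated.
full <- lm(log(v) ~ log(h) + log(d), data = tr)

# Reduced model: slopes fixed at 1 and 2 through the offset,
# only the intercept log(constant) is estimated.
reduced <- lm(log(v) ~ 1 + offset(log(h) + 2 * log(d)), data = tr)

# F-test of the reduced model against the full model
# (two restrictions, hence 2 numerator degrees of freedom).
ftest <- anova(reduced, full)
ftest
```

A large p-value in the second row of the table means the data do not contradict the simple formula; a small one means the full model fits significantly better.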
Last updated 2011-02-21 09:58
petri.koistinen 'at' helsinki.fi