Send your solutions to the lecturer by email (firstname.lastname@helsinki.fi), at the latest on Mon 14 Feb at 10.00. Use R2 as the title of your message.

The data file tn10.csv corresponds to the Excel file tn10.xls. The csv version of the data was written out from Excel using the CSV (comma separated values) format.

First take a look at the data file and read the help of the R function `read.table()`. Next formulate a command by which you can create a data frame called `results` which holds the contents of the file. Check that the data frame contains 82 observations of 19 variables.

The following command creates the data frame `results` from the data in the file tn10.dat, even if you failed to solve the previous problem:

results <- read.table('tn10.dat')

This data pertains to the points from the course on probability theory (Todennäköisyyslaskenta) held in 2010. To protect the culprits, the names of the students have been omitted and their student numbers have been replaced with random numbers. The variables `H1`, ..., `H10` contain the points from each of the 10 exercise sessions of the course, the variables `K11`, ..., `K14` contain the points from the four problems of the first course exam, and the variables `K21`, ..., `K24` the points from the four problems of the second course exam.

- Write commands which add the variables `H.sum` and `K.sum` to the data frame. `H.sum` should contain the sum of the points from the exercise sessions, but all the NA values should be counted as zero (hint: `rowSums()`). `K.sum` should contain the sum of the points from both of the two course exams.
- Write a command which creates the variable `Extra.points` based on the value of the variable `H.sum` such that if `H.sum` is in the range 10 <= H.sum < 15, then `Extra.points` is 1; for the range 15 <= H.sum < 20, `Extra.points` is 2; and so on; finally, for 40 points or more, `Extra.points` is 7. It is possible to write an elegant one-line solution using the function `cut()`, but some of you may find it easier to solve the problem with some other tools.
- How do you find out the student numbers (variable `opisnro`) of those students who did not obtain any points from the two course exams? (Watch out for missing values.)
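
To get a feel for the two hinted functions before applying them to `results`, the following minimal sketch may help. It uses a made-up data frame `toy` with hypothetical columns `a`, `b`, `total` and `band`, and break points chosen only for illustration; none of these come from the course data.

```r
# Toy data, not the course file: three rows, two score columns with NAs.
toy <- data.frame(a = c(1, NA, 3), b = c(2, 5, NA))

# rowSums() with na.rm = TRUE drops NA values, so they count as zero.
toy$total <- rowSums(toy[, c("a", "b")], na.rm = TRUE)

# cut() with right = FALSE builds left-closed intervals [0,4), [4,6), [6,Inf).
# labels = FALSE returns the interval index (1, 2, 3) instead of a factor.
toy$band <- cut(toy$total, breaks = c(0, 4, 6, Inf),
                right = FALSE, labels = FALSE)

print(toy)
```

Note that `right = FALSE` is what makes each interval include its lower end point and exclude its upper one, which matches ranges written in the form lower <= value < upper.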

Applied to a vector `x`, the following call calculates the quantiles corresponding to the probabilities 0.1, 0.5 and 0.9.

quantile(x, probs = c(0.1, 0.5, 0.9))

Now we want to calculate these quantiles for each of the numeric variables of the `iris` data set. Give a solution using one of the apply-type functions.
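
As a quick check of what a single `quantile()` call returns, here is a small sketch on a made-up vector (the values 1, ..., 9 are an assumption for illustration, not part of the exercise). It relies on R's default quantile definition.

```r
# A made-up vector, used only to show the shape of the result.
x <- 1:9

# The result is a named numeric vector with one element per probability;
# the names are "10%", "50%" and "90%".
q <- quantile(x, probs = c(0.1, 0.5, 0.9))
print(q)
```

Since the result for one vector is itself a vector of length three, applying the call to several variables naturally produces one such vector per variable.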

In the U.S., temperatures are usually expressed in degrees Fahrenheit (F) instead of degrees Celsius (C), which are used in the rest of the world. The conversion formula between the two temperature scales is the following:

C = 5 / 9 * (F - 32)

Write a function `FtoC` which converts temperatures given in degrees Fahrenheit into degrees Celsius. Also write a function `CtoF` which converts temperatures given in degrees Celsius into degrees Fahrenheit.
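
If the syntax for defining a function is unfamiliar, the sketch below shows the general pattern on an unrelated unit conversion. The function name `kmToMiles` and its argument `km` are hypothetical and chosen only for this illustration.

```r
# A hypothetical converter from kilometres to miles
# (one mile is 1.609344 km by definition).
kmToMiles <- function(km) {
  km / 1.609344
}

# Arithmetic in R is vectorised, so the same function
# also works on a whole vector of distances.
print(kmToMiles(c(0, 1.609344, 10)))
```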

Supposedly, a school teacher gave C. F. Gauss, at the age of seven, the problem of summing the integers 1, 2, ..., 100. Gauss found the answer almost instantly (without having been told about the arithmetic series).

1. How do you find that sum by using the function `sum()`?

2. For the sake of practice, do the same calculation using a for-loop. Of course, your first solution, with the sum-function, is much clearer and shorter.

Sometimes data must be combined from several sources, and a key tells which experimental unit each data item comes from.

The names of the students of the probability theory course are kept in a separate registry, from which they have been extracted to the file tn10nimet.dat (but the real names have been changed). Unfortunately, the names are not listed in the same order as in the file tn10.dat where the points can be found. Instead, the student number (variable `opisnro` in the file `tn10.dat` and variable `nro` in the file `tn10nimet.dat`) identifies the students uniquely.

Produce a data frame whose first three variables contain the student number, the first name (`etunimi`) and the family name (`sukunimi`), and whose remaining variables are the points from the exercise sessions and the two course exams. (Hint: `merge()`.)
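
The hinted function can join two data frames even when the key column has a different name in each. The following is a minimal sketch on made-up data; the data frames `scores` and `labels` and their columns `id`, `key`, `pts` and `name` are hypothetical and are not the course files.

```r
# Toy data frames (hypothetical); the key columns have different names:
# "id" in scores and "key" in labels, and the rows are in different order.
scores <- data.frame(id = c(3, 1, 2), pts = c(10, 20, 30))
labels <- data.frame(key = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))

# by.x / by.y name the key column in the first and second argument.
# The result has the key column first (under its name in the first
# argument), then the other columns of the first argument, then
# those of the second.
both <- merge(labels, scores, by.x = "key", by.y = "id")
print(both)
```

The order of the two data frames in the call therefore determines the order of the columns in the result.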

Last updated 2011-02-11 17:35

Petri Koistinen

petri.koistinen 'at' helsinki.fi