Send your solutions to the lecturer by email (firstname.lastname@helsinki.fi) by Mon 7 Feb at 10.00 at the latest. Use R2 as the subject of your message.

The data file `tn10.csv` corresponds to the Excel file `tn10.xls`. It was written out from Excel using the CSV (comma-separated values) format.

First take a look at the data file and read the help of the R function `read.table()`. Next, formulate a command that creates a data frame called `results` holding the contents of the file. Check that the data frame contains 82 observations of 19 variables.

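Suggested solution:

Reading through the help, we find that the function `read.csv2()` does the job. It assumes a header line, `;` as the field separator and `,` as the decimal mark, which is the convention Excel typically uses for CSV files in locales with a decimal comma, such as Finnish.

```r
results <- read.csv2('tn10.csv')
str(results)   # should report 82 obs. of 19 variables
```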
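To see what `read.csv2()` expects, here is a small sketch on made-up inline text (the column names and values are hypothetical, not taken from `tn10.csv`). It uses the `text` argument of `read.table()` in place of a file and checks that `read.csv2()` gives the same result as `read.table()` with the header, separator and decimal mark spelled out.

```r
# Hypothetical two-row extract in the semicolon / decimal-comma CSV convention
csv_text <- "opisnro;H1;H2\n101;2,5;3\n102;;1,5"

# read.csv2() defaults: header = TRUE, sep = ";", dec = ","
demo <- read.csv2(text = csv_text)
str(demo)

# The same data read with the settings spelled out
demo2 <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ",")
stopifnot(isTRUE(all.equal(demo, demo2)))

# An empty field in a numeric column becomes NA
demo$H1
```

For the real file, a check such as `stopifnot(dim(results) == c(82, 19))` confirms the expected size.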

The following command creates the data frame `results` from the data in the file `tn10.dat`, even if you failed to solve the previous problem:

```r
results <- read.table('tn10.dat')
```

The data consist of the points from the course on probability theory (Todennäköisyyslaskenta) held in 2010. To protect the culprits, the names of the students have been omitted and their student numbers have been replaced with random numbers. The variables `H1`, ..., `H10` contain the points from each of the 10 exercise sessions of the course, the variables `K11`, ..., `K14` contain the points from the problems of the first course exam, and the variables `K21`, ..., `K24` contain the points from the problems of the second course exam.

1. Write commands that add the variables `H.sum` and `K.sum` to the data frame. `H.sum` should contain the sum of the points from the exercise sessions, with all NA values counted as zero (hint: `rowSums()`). `K.sum` should contain the sum of the points from both course exams.
2. Write a command that creates the variable `Extra.points` from the value of `H.sum` as follows: if `10 <= H.sum < 15`, then `Extra.points` is 1; if `15 <= H.sum < 20`, it is 2; and so on, up to 7 for 40 points or more (below 10 points it is 0). A one-line solution can be written using the function `cut()`, but some of you may find it easier to solve the problem with another approach.
3. How do you find the student numbers (variable `opisnro`) of the students who did not obtain any points from the two course exams? (Watch out for missing values.)

Suggested solution:

```r
# 1: columns 2-11 are H1, ..., H10 (column 1 is opisnro)
H.sum <- rowSums(results[ , 2:11], na.rm = TRUE)
results$H.sum <- H.sum
K.sum <- with(results, K11 + K12 + K13 + K14 + K21 + K22 + K23 + K24)
# '+' gives NA if any exam problem is missing; count such totals as zero
K.sum[is.na(K.sum)] <- 0
results$K.sum <- K.sum

# 2: breaks 0, 10, 15, ..., 40, Inf give 8 left-closed intervals,
# whose integer codes 1..8 index into 0:7
lims <- c(0, seq(10, 40, by = 5), Inf)
results$Extra.points <- (0:7)[cut(results$H.sum, breaks = lims, right = FALSE)]

# 3: the following works irrespective of whether the NAs have been removed or not
ind <- is.na(K.sum) | (K.sum == 0)
results$opisnro[ind]
```
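Two details of this solution are easy to get wrong, so here is a small sketch on a hypothetical toy data frame (`toy`, `h_cols`, `k_cols`, `k_plus`, `ep_cut` and `ep_floor` are illustrative names, not part of the exercise). It selects the point columns by name with `grep()` instead of by position, shows that `+` propagates `NA` while `rowSums(na.rm = TRUE)` does not, and cross-checks the `cut()` mapping against an arithmetic rule, which is just one assumed example of the "other approach" the exercise mentions. Which treatment of a missing exam problem is intended is a judgment call; the sketch only shows how the two differ.

```r
# Hypothetical toy data: student number, two exercise sessions, two exam problems
toy <- data.frame(opisnro = c(11, 12, 13),
                  H1  = c(3, NA, 25), H2  = c(NA, NA, 20),
                  K11 = c(6, NA, 2),  K21 = c(NA, NA, 5))

# Select the point columns by name rather than by position
h_cols <- grep("^H[0-9]+$", names(toy), value = TRUE)
k_cols <- grep("^K[0-9]+$", names(toy), value = TRUE)

# 1: rowSums(na.rm = TRUE) counts NA as zero ...
toy$H.sum <- rowSums(toy[h_cols], na.rm = TRUE)   # 3, 0, 45
toy$K.sum <- rowSums(toy[k_cols], na.rm = TRUE)   # 6, 0, 7
# ... whereas '+' propagates NA: one missing problem makes the total NA
k_plus <- with(toy, K11 + K21)                    # NA, NA, 7

# 2: left-closed intervals [0,10), [10,15), ..., [35,40), [40,Inf)
lims <- c(0, seq(10, 40, by = 5), Inf)
toy$Extra.points <- (0:7)[cut(toy$H.sum, breaks = lims, right = FALSE)]

# Cross-check cut() on boundary values against floor(h / 5) - 1, clamped to 0..7
h <- c(0, 9.5, 10, 14.5, 15, 39.5, 40, 52)
ep_cut   <- (0:7)[cut(h, breaks = lims, right = FALSE)]
ep_floor <- pmin(pmax(floor(h / 5) - 1, 0), 7)
stopifnot(all(ep_cut == ep_floor))                # both give 0 0 1 1 2 6 7 7

# 3: students with no exam points
toy$opisnro[toy$K.sum == 0]                       # 12
toy$opisnro[is.na(k_plus) | k_plus == 0]          # 11 12, although student 11 scored 6 on K11
```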

Applied to a vector `x`, the following call calculates the quantiles corresponding to the probabilities 0.1, 0.5 and 0.9.

```r
quantile(x, probs = c(0.1, 0.5, 0.9))
```

Now we want to calculate these quantiles for each of the numeric variables of the `iris` data set. Give a solution using one of the apply-type functions.

Suggested solution:

First find out which of the variables are numeric, then use `lapply()` or `sapply()`. In a call `sapply(X, FUN, ...)`, any additional arguments given in place of `...` are passed on to the function `FUN`.

```r
data(iris)
ind <- sapply(iris, is.numeric)   # TRUE for the four measurements, FALSE for Species
sapply(iris[ind], quantile, probs = c(0.1, 0.5, 0.9))
```
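A short sketch of what the `sapply()` call returns, alongside two related apply-type variants. Because every call to `quantile()` here returns three values, `sapply()` simplifies the results into a 3-by-4 matrix (one column per numeric variable), while `lapply()` keeps a list of named vectors. `vapply()` is shown as an extra, stricter option: its third argument declares the expected shape of each result (`numeric(3)`), so a mismatch raises an error instead of silently producing a list.

```r
ind <- sapply(iris, is.numeric)
probs <- c(0.1, 0.5, 0.9)

# sapply(): results simplified into a matrix with rows "10%", "50%", "90%"
q <- sapply(iris[ind], quantile, probs = probs)
q

# lapply(): a list with one named vector of quantiles per variable
q_list <- lapply(iris[ind], quantile, probs = probs)
q_list$Petal.Length

# vapply(): like sapply(), but each result must be a numeric vector of length 3
q_v <- vapply(iris[ind], quantile, numeric(3), probs = probs)
stopifnot(isTRUE(all.equal(q, q_v)))

# Sanity check: the 50% quantile is the median
stopifnot(isTRUE(all.equal(q["50%", ], sapply(iris[ind], median))))
```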

In the U.S., temperatures are usually expressed in degrees Fahrenheit (F) instead of degrees Celsius (C), which are used in most of the rest of the world. The conversion formula between the two temperature scales is the following:

C = 5 / 9 * (F - 32)

Write a function `FtoC` which converts temperatures given in degrees Fahrenheit into degrees Celsius. Also write a function `CtoF` which converts temperatures given in degrees Celsius into degrees Fahrenheit.

Suggested solution:

```r
FtoC <- function(F) 5/9 * (F - 32)
CtoF <- function(C) 9/5 * C + 32
# Try them out:
FtoC(seq(0, 100, by = 5))
CtoF(seq(-20, 40, by = 5))
```
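One way to test the two functions is to check known reference points (water freezes at 32 F = 0 C and boils at 212 F = 100 C, and -40 is the same temperature on both scales) and to verify that `CtoF()` undoes `FtoC()`. The sketch below redefines the functions so that it runs on its own; `all.equal()` is used instead of `==` because floating-point results such as `5/9 * 180` need not be exactly 100.

```r
FtoC <- function(F) 5/9 * (F - 32)
CtoF <- function(C) 9/5 * C + 32

# Reference points: freezing point, boiling point, and the scales' crossing point
f_ref <- c(32, 212, -40)
c_ref <- c(0, 100, -40)
stopifnot(isTRUE(all.equal(FtoC(f_ref), c_ref)),
          isTRUE(all.equal(CtoF(c_ref), f_ref)))

# The functions are vectorised and inverse to each other
temps <- seq(-20, 40, by = 5)
stopifnot(isTRUE(all.equal(FtoC(CtoF(temps)), temps)))
```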

Supposedly, a school teacher gave C. F. Gauss, at the age of seven, the problem of summing the integers 1, 2, ..., 100. Gauss found the answer almost instantly (without having been told about the arithmetic series).

1. How do you find that sum by using the function `sum()`?

2. For the sake of practice, do the same calculation using a for-loop. Of course, your first solution, with the sum-function, is much clearer and shorter.

Suggested solution:

```r
# 1.
sum(1:100)
# 2.
s <- 0
for (i in 1:100) s <- s + i
s
```
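One common version of the anecdote has Gauss pairing the numbers from both ends: 1 + 100, 2 + 99, ..., 50 + 51 gives 50 pairs that each sum to 101, so the total is 50 * 101 = 5050, an instance of the formula n(n + 1)/2. The sketch below compares this closed form with both suggested solutions; `n` and the result names are illustrative.

```r
n <- 100

# Closed form for 1 + 2 + ... + n
gauss <- n * (n + 1) / 2

# Solution 1: vectorised sum
s_sum <- sum(1:n)

# Solution 2: explicit loop; seq_len(n) also behaves correctly when n is 0
s_loop <- 0
for (i in seq_len(n)) s_loop <- s_loop + i

c(formula = gauss, sum = s_sum, loop = s_loop)   # all 5050
```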

Sometimes data must be combined from several sources, using a key that tells which experimental unit each data item comes from.

The names of the students of the probability theory course are kept in a separate register, from which they have been extracted to the file `tn10nimet.dat` (but the real names have been changed). Unfortunately, the names are not listed in the same order as in the file `tn10.dat`, where the points can be found. Instead, the student number (variable `opisnro` in the file `tn10.dat` and variable `nro` in the file `tn10nimet.dat`) identifies the students uniquely.

Produce a data frame whose first three variables contain the student number (`opisnro`), the first name (`etunimi`) and the family name (`sukunimi`), and whose remaining variables contain the points from the exercise sessions and the two course exams. (Hint: `merge()`.)

Suggested solution:

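```r
results <- read.csv2('tn10.csv')
nimet <- read.table('tn10nimet.dat', as.is = TRUE)   # as.is: keep names as character
d <- merge(nimet, results, by.x = 'nro', by.y = 'opisnro')
# merge() names the key column after by.x; rename it as the task asks
names(d)[names(d) == 'nro'] <- 'opisnro'
str(d)
```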
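To see how `merge()` matches rows by key regardless of their order in the two inputs, here is a sketch on two hypothetical miniature data frames (`names_toy`, `points_toy`, the names and the points are all made up). By default `merge()` keeps only keys present in both inputs and sorts the result by the key; the key column takes its name from `by.x`, so it is renamed to `opisnro` afterwards.

```r
# Hypothetical miniature versions of the two files, listed in different orders
names_toy <- data.frame(nro = c(3, 1, 2),
                        etunimi = c("Aino", "Eero", "Kaisa"),
                        sukunimi = c("Virtanen", "Korhonen", "Nieminen"),
                        stringsAsFactors = FALSE)
points_toy <- data.frame(opisnro = c(1, 2, 3),
                         H1 = c(5, NA, 3),
                         K11 = c(6, 4, 0))

# Inner join on the student number; by.x / by.y name the key in each input
d <- merge(names_toy, points_toy, by.x = "nro", by.y = "opisnro")
names(d)[names(d) == "nro"] <- "opisnro"
d
```

Passing `all.y = TRUE` would also keep students whose number is missing from the name list, with `NA` in the name columns.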

Last updated 2011-02-11 17:33

Petri Koistinen

petri.koistinen 'at' helsinki.fi