Send your solutions to the lecturer by email (firstname.lastname@example.org), at the latest on Mon 7 Feb at 10.00. Use R2 as the title of your message.
The data file tn10.csv was written out from an Excel file using the CSV (comma separated values) format.
First take a look at the data file and read the help
of the R function
Next formulate a command by which you can create a data frame called
results which holds
the contents of the file.
Check that the data frame contains 82 observations of 19 variables.
Reading through the help, we find out that the function read.csv2() is the one to use:
results <- read.csv2('tn10.csv')
str(results)
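As a small illustration of why read.csv2() rather than read.csv() is the natural choice here, the sketch below assumes that the file uses semicolons as field separators and commas as decimal marks (presumably what Excel wrote, given that read.csv2() reads it). The temporary file and its columns id and score are made up for the example and have nothing to do with tn10.csv.

```r
# Write a tiny semicolon-separated file with decimal commas (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;score", "1;2,5", "2;4,0"), tmp)

# read.csv2() uses sep = ";" and dec = "," by default.
ok <- read.csv2(tmp)
str(ok)

# The same result with the defaults spelled out through read.table().
same <- read.table(tmp, header = TRUE, sep = ";", dec = ",")

unlink(tmp)
```

The column score comes out as numeric because the decimal comma is understood; with the wrong separator or decimal mark the columns would not be parsed as intended.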
The following command creates the data frame results from the data in tn10.dat even if you failed to solve the previous problem:
results <- read.table('tn10.dat')
This data pertains to points from the course on probability theory
(Todennäköisyyslaskenta) held in 2010.
To protect the culprits,
the names of the students have been omitted
and their student numbers have been replaced with random numbers.
The variables H1, ..., H10 contain points from each of the 10 exercise sessions of the course, the variables K11, ..., K14 the points from the problems of the first course exam, and the variables K21, ..., K24 the points from the problems of the second course exam.
1. Add the variables H.sum and K.sum to the data frame. H.sum should contain the sum of the points from the exercise sessions, but all the NA values should be counted as zero. K.sum should contain the sum of the points from both of the two course exams.
2. Add the variable Extra.points based on the value of the variable H.sum such that if H.sum is in the range 10 <= H.sum < 15, then Extra.points is 1; for the range 15 <= H.sum < 20 Extra.points is 2; and so on; finally for 40 points or more Extra.points is 7. A one-line solution can be written using the function cut() (but some of you may find it easier to solve the problem with some other approach).
3. Find the student numbers (opisnro) of those students who did not obtain any points from the two course exams. (Watch out for missing values.)
# 1:
H.sum <- rowSums(results[ , 2:11], na.rm = TRUE)
results$H.sum <- H.sum
K.sum <- with(results, K11 + K12 + K13 + K14 + K21 + K22 + K23 + K24)
# Let me remove NA's
K.sum[is.na(K.sum)] <- 0
results$K.sum <- K.sum
# 2:
lims <- c(0, seq(10, 40, by = 5), Inf)
results$Extra.points <- (0:7)[cut(results$H.sum, breaks = lims, right = FALSE)]
# 3:
# the following works irrespective of whether NA's have been removed or not
ind <- is.na(K.sum) | (K.sum == 0)
results$opisnro[ind]
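To see how rowSums() with na.rm = TRUE and cut() with right = FALSE behave, it may help to try them on a toy data frame first. The sketch below uses three made-up students and only three exercise sessions; the data frame toy and its values are invented for illustration. It converts the factor returned by cut() with as.integer() instead of indexing 0:7 with it, which is a small variation on the solution above and gives the same numbers.

```r
# Toy data: three hypothetical students, three exercise sessions.
toy <- data.frame(H1 = c(4, NA, 10),
                  H2 = c(6, 3, 15),
                  H3 = c(NA, 2, 20))

# NA entries are skipped, so they effectively count as zero in each row sum.
toy$H.sum <- rowSums(toy[, c("H1", "H2", "H3")], na.rm = TRUE)

# Half-open intervals [0,10), [10,15), ..., [35,40), [40,Inf).
lims <- c(0, seq(10, 40, by = 5), Inf)
bin <- cut(toy$H.sum, breaks = lims, right = FALSE)

# as.integer() gives the interval number 1..8; subtracting 1 gives 0..7.
toy$Extra.points <- as.integer(bin) - 1L
print(toy)
```

The first student has exactly 10 points and lands in the interval [10,15) because right = FALSE closes the intervals on the left; with the default right = TRUE the value 10 would fall into the lower interval instead.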
Applied to a vector x, the following call calculates quantiles corresponding to the probabilities 0.1, 0.5 and 0.9.
quantile(x, probs = c(0.1, 0.5, 0.9))
Now we want to calculate these quantiles for each of the numeric variables of the iris data set. Give a solution using one of the apply-type functions.
First find out which of the variables are numeric, then use sapply(). In the call
sapply(X, FUN, args)
the additional arguments args are passed to the function FUN.
data(iris)
ind <- sapply(iris, is.numeric)
sapply(iris[ind], quantile, probs = c(0.1, 0.5, 0.9))
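The same pattern can be checked by hand on a small made-up data frame where the quantiles are easy to work out. The data frame toy and its columns a, b and label are assumptions for the sake of the example only.

```r
# Toy data frame with two numeric columns and one character column.
toy <- data.frame(a = 1:11,
                  b = seq(0, 100, by = 10),
                  label = letters[1:11],
                  stringsAsFactors = FALSE)

# Logical vector: TRUE for the numeric columns only.
num <- sapply(toy, is.numeric)

# probs is not an argument of sapply(); it is handed on to quantile().
q <- sapply(toy[num], quantile, probs = c(0.1, 0.5, 0.9))
print(q)
```

Because quantile() returns a vector of the same length for every column, sapply() simplifies the result to a matrix with one column per variable and one row per probability.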
In the U.S., temperatures are usually expressed in degrees Fahrenheit (F) instead of degrees Celsius (C), which are used in the rest of the world. The conversion formula between the two temperature scales is the following.
C = 5 / 9 * (F - 32)
Write function FtoC which converts temperatures given in degrees Fahrenheit into degrees Celsius.
Also write function CtoF which converts temperatures given in degrees Celsius into degrees Fahrenheit.
FtoC <- function(F) 5/9*(F - 32)
CtoF <- function(C) 9/5 * C + 32
# Try them out:
FtoC(seq(0, 100, by = 5))
CtoF(seq(-20, 40, by = 5))
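Solving C = 5 / 9 * (F - 32) for F gives F = 9 / 5 * C + 32, so the two functions should undo each other. A quick sanity check along these lines, using well-known reference points (water freezes at 32 F and boils at 212 F, and the scales agree at -40), could look as follows; the functions are repeated so that the snippet runs on its own.

```r
FtoC <- function(F) 5/9 * (F - 32)
CtoF <- function(C) 9/5 * C + 32

# Known reference points.
freezing <- FtoC(32)    # freezing point of water in Celsius
boiling  <- FtoC(212)   # boiling point of water in Celsius
crossing <- CtoF(-40)   # the value where both scales agree

# Round trip: converting to Celsius and back should return the input
# (up to floating point error, hence all.equal() rather than ==).
x <- seq(0, 100, by = 5)
round_trip_ok <- isTRUE(all.equal(CtoF(FtoC(x)), x))
print(c(freezing, boiling, crossing))
print(round_trip_ok)
```

Both functions are vectorized without any extra work, since they only use arithmetic operators.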
Supposedly, a school teacher gave C. F. Gauss, at the age of seven, the problem of summing the integers 1, 2, ..., 100. Gauss found the answer almost instantly (without having been told about the arithmetic series).
1. How do you find that sum by using the function sum()?
2. For the sake of practice, do the same calculation using a for-loop. Of course, your first solution, with the sum-function, is much clearer and shorter.
# 1.
sum(1:100)
# 2.
s <- 0
for (i in 1:100) s <- s + i
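The trick usually attributed to Gauss is to pair the terms: 1 + 100, 2 + 99, ..., 50 + 51 gives 50 pairs that each sum to 101, so the total is n(n + 1)/2 = 100 * 101 / 2 = 5050. The sketch below compares this closed form with both solutions; the variable names are chosen for the example.

```r
n <- 100

# Vectorized solution.
s_sum <- sum(1:n)

# Explicit loop, accumulating one term at a time.
s_loop <- 0
for (i in 1:n) s_loop <- s_loop + i

# Closed form for the arithmetic series 1 + 2 + ... + n.
s_formula <- n * (n + 1) / 2

print(c(s_sum, s_loop, s_formula))
```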
Sometimes the data must be combined from several sources, and there is a key which tells which experimental unit each data item comes from.
The names of the students of the probability theory course are kept in a separate register, from which they have been extracted to the file tn10nimet.dat (but the real names have been changed).
Unfortunately, the names are not listed in the same order as in the file tn10.dat where the points can be found. Instead, the student numbers (variable opisnro in file tn10.dat and variable nro in the file tn10nimet.dat) identify the students uniquely.
Produce a data frame whose first three variables contain the student number (opisnro), the first name (etunimi) and the family name, and the rest of the variables are for the points from the exercise sessions and the two course exams.
results <- read.csv2('tn10.csv')
nimet <- read.table('tn10nimet.dat', as.is = TRUE)
d <- merge(nimet, results, by.x = 'nro', by.y = 'opisnro')
str(d)
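How merge() matches rows by key, regardless of their order in the two sources, can be seen on a toy example. The data frames names_df and points_df below are invented; only the key names nro and opisnro and the variable etunimi are borrowed from the exercise.

```r
# Names listed in a different order than the points (hypothetical data).
names_df <- data.frame(nro = c(3, 1, 2),
                       etunimi = c("Cecilia", "Anna", "Bertil"),
                       stringsAsFactors = FALSE)
points_df <- data.frame(opisnro = c(1, 2, 3),
                        H1 = c(2, NA, 4))

# Rows are paired by matching key values, not by position.
d <- merge(names_df, points_df, by.x = "nro", by.y = "opisnro")
print(d)
```

Note that the key column in the result takes its name from the first data frame (here nro), and that by default the rows are sorted by the key. If the variable should be called opisnro, it can be renamed afterwards, for example with names(d)[1] <- "opisnro".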
Last updated 2011-02-11 17:33
petri.koistinen 'at' helsinki.fi