Software Tools / R: Homework 1: due Monday, 7 Feb, 2011

Send your solutions by email to the lecturer (firstname.lastname@helsinki.fi) by Mon 7 Feb at 10.00 at the latest, with R1 as the subject of your message. Also include your name and student number in the message.

In most cases, your solutions should be a few lines of R code. Please send the solutions as plain text (do not format your code with a word processor).


Exercise 1. (Initial preparation)

Either

  1. start R in the microcomputer room C128 (instructions)
  2. or install R on your own computer (instructions).

Create a directory (folder) for the files you need during the course (instructions). You do not need to document your solution to this exercise, even though it is the most important R exercise in the course. You will automatically get credit for this exercise if you send solutions to any of the other R exercises.


Exercise 2. (Function calls)

R has a function called pnorm. What are the names of its formal arguments? Which values are bound to the formal arguments in the following call?

z <- 3
pnorm(1, lower = (1 < 2), 2, mean = z)

Suggested solution: To find out the names of the formal arguments, give the command args(pnorm) or read the help text by giving the command ?pnorm. The formal arguments and their default values are as follows.

> args(pnorm)
function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
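As an alternative to args(), the base function formals() returns the formal argument list of a closure, so the argument names can also be extracted programmatically. A minimal sketch (this relies on pnorm being an ordinary R closure, which it is in standard R):

```r
# formals() returns the formal argument list of a closure;
# names() keeps only the argument names (default values are dropped)
arg_names <- names(formals(pnorm))
print(arg_names)  # q, mean, sd, lower.tail, log.p
```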

In the given function call, there are two named actual arguments and two unnamed actual arguments. The named arguments are bound first; the name lower.tail has been abbreviated to lower in the call, and the abbreviation is matched to lower.tail by partial matching. Then the unnamed arguments are bound, from left to right, to the remaining unbound formal arguments in the order in which those formals are listed. If a formal argument is still unbound and it has a default value, it receives the default value. Hence the arguments are bound in the following order to the following values.

Formal argument   Value   Explanation
lower.tail        TRUE    named argument; the value of (1 < 2) is TRUE
mean              3       named argument; the value of variable z is 3
q                 1       first unbound formal; gets the first unnamed argument
sd                2       second unbound formal; gets the second unnamed argument
log.p             FALSE   default value
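R can perform the binding for you without evaluating anything: match.call() applies the argument-matching rules to an unevaluated call and returns it with every matched argument written out under its full formal name. A sketch (the variable names cl and mc are just illustrative):

```r
z <- 3
# quote() captures the call without evaluating it
cl <- quote(pnorm(1, lower = (1 < 2), 2, mean = z))
# match.call() matches the call's arguments against the formals of pnorm,
# including the partial match lower -> lower.tail
mc <- match.call(pnorm, cl)
print(mc)
```

Note that match.call() does not evaluate the arguments, so mean is shown as z and lower.tail as (1 < 2), and it lists only the arguments actually supplied, so the default log.p = FALSE does not appear.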

This function call calculates the probability P(X <= 1), when random variable X is normally distributed with mean 3 and standard deviation 2 (or variance 4).
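One way to double-check the table is to compare the original call with a call in which every argument is spelled out by its full name. Standardizing also gives P(X <= 1) = P(Z <= (1 - 3)/2) = P(Z <= -1) for a standard normal Z, which can be checked the same way:

```r
z <- 3
p_original <- pnorm(1, lower = (1 < 2), 2, mean = z)
# the same call with every formal argument bound explicitly by its full name
p_explicit <- pnorm(q = 1, mean = 3, sd = 2, lower.tail = TRUE, log.p = FALSE)
# standardizing: (1 - 3) / 2 = -1, so this is P(Z <= -1) for Z ~ N(0, 1)
p_standard <- pnorm(-1)
stopifnot(isTRUE(all.equal(p_original, p_explicit)),
          isTRUE(all.equal(p_original, p_standard)))
p_original  # about 0.1587
```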


Exercise 3. (Indexing)

First, create a vector x that contains 100 random values drawn from the Poisson distribution with mean 1.1.

x <- rpois(100, 1.1)

Formulate your answers to the following questions so that they work not only for your particular sample but for any random sample drawn as above.

  1. How do you extract a vector that contains the entries of x at positions 2, 3 and 20?
  2. How do you create a logical vector b whose i'th entry is TRUE if and only if the i'th entry of x is zero?
  3. How do you find out how many entries of x have the value zero?
  4. How do you select the non-zero entries of x?

Suggested solutions (there are many ways to solve this exercise):

# 1:
x[c(2, 3, 20)]
c(x[2], x[3], x[20])  # a longer way
# 2:
b <- x == 0
# 3:
sum(b)
sum(x == 0) # alternative
# 4:
# a)
b <- x == 0
x[!b]
# b)
x[x != 0]
# c)
x[!(x == 0)]
# and so on ...
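Because x is random, one way to convince yourself that the answers work for any random sample is to rerun them on several fresh samples and check conditions that must always hold. A minimal sketch; the seed value and the particular checks are illustrative choices, not part of the exercise:

```r
set.seed(123)  # arbitrary seed, only to make this check reproducible
for (k in 1:5) {
  x <- rpois(100, 1.1)
  b <- x == 0
  nonzero <- x[!b]
  # the three ways of selecting the non-zero entries agree
  stopifnot(identical(nonzero, x[x != 0]), identical(nonzero, x[!(x == 0)]))
  # zeros and non-zeros together account for every entry
  stopifnot(sum(b) + length(nonzero) == length(x))
  # positional indexing matches the longer explicit version
  stopifnot(identical(x[c(2, 3, 20)], c(x[2], x[3], x[20])))
}
```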

Exercise 4. (More advanced indexing)

Now we want to find the indices of those entries of vector x (generated in the previous exercise) which are greater than or equal to 2. One way of doing this is the following.

which(x >= 2)

Now try to achieve the same result without using the which function. Instead, index the vector inds, which you generate as follows, with a suitable logical vector.

inds <- 1:length(x)

Suggested solution:

inds <- 1:length(x)
inds[x >= 2]
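The logical-indexing solution can be compared directly with which(). In this sketch seq_along(x) is used as a common idiom equivalent to 1:length(x); it also behaves sensibly when x has length zero (it returns an empty vector, whereas 1:0 gives c(1, 0)). The seed is arbitrary:

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- rpois(100, 1.1)
inds <- seq_along(x)        # same as 1:length(x) for non-empty x
by_logical <- inds[x >= 2]  # positions where the condition is TRUE
by_which <- which(x >= 2)
stopifnot(identical(by_logical, by_which))
```

One difference to keep in mind: if x contained NA values, x >= 2 would be NA at those positions and inds[x >= 2] would return NA there, while which() silently drops them.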

Exercise 5. (Coping with missing values)

The following lines create a vector x that contains a random number of missing values (NAs).

n <- 100
x <- rnorm(n)
x[rbinom(n, 1, 0.1) == 1] <- NA

  1. How do you find out how many missing values there are in x?
  2. How do you replace all the missing values with zeros?

Suggested solution:

# number of missing values:
sum(is.na(x))
# replacing missing values with zeros:
x[is.na(x)] <- 0
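An equivalent alternative uses base R's replace(), which returns a modified copy and leaves the original x unchanged. A sketch that regenerates x as above (with an arbitrary seed) and checks that both approaches agree:

```r
set.seed(2)  # arbitrary seed for reproducibility
n <- 100
x <- rnorm(n)
x[rbinom(n, 1, 0.1) == 1] <- NA
n_missing <- sum(is.na(x))
# replace() returns a copy of x with the NA positions set to 0
x_filled <- replace(x, is.na(x), 0)
# the in-place version from the suggested solution, applied to a copy
x2 <- x
x2[is.na(x2)] <- 0
stopifnot(identical(x_filled, x2), !anyNA(x_filled))
```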

Exercise 6. (Ordering data according to the values of one variable)

Suppose that you want to plot data which resembles the data we generate as follows.

x <- runif(100, -pi, pi)
y <- sin(x)

Here we first sample 100 values uniformly from the interval (-pi, pi) and then compute their sines.

Try the command

plot(x, y, type = 'l')

(there is a lowercase L inside the quotation marks). The result is a line plot in which the point (x[1], y[1]) is connected to (x[2], y[2]), which in turn is connected to (x[3], y[3]), and so on. Since the x-values are not in increasing order, the plot looks like a spider's web.

Instead, you want a line plot that resembles the graph of the sine function. The trick is to sort the x vector into increasing order and to apply the same permutation to the y vector before plotting. How do you do this in practice? Pretend that you do not know the rule for calculating the y's from the x's.

(Hint: sort(), order().)


Suggested solution:

We sort the x's and reorder the y's using the permutation that sorts the x vector.

plot(sort(x), y[order(x)], type = 'l')

The same operation could also be done in other ways, e.g. like this:

ind <- order(x)
plot(x[ind], y[ind], type = 'l')
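To see what order() returns, it helps to try it on a tiny example with made-up values (the numbers below are illustrative only): order(x) gives the positions of the entries of x in increasing order, so indexing both vectors with it keeps each (x, y) pair together.

```r
x <- c(2.5, -1, 0.3)
y <- c(10, 20, 30)   # y[i] belongs to x[i]
ind <- order(x)      # 2 3 1: x[2] is smallest, then x[3], then x[1]
x[ind]               # -1.0 0.3 2.5, the same as sort(x)
y[ind]               # 20 30 10: the pairs stay matched
stopifnot(identical(x[ind], sort(x)))
```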

Since we know in this exercise that the y's are just the sines of the x's, we could also sort the x's first and then recalculate the y's.

plot(sort(x), sin(sort(x)), type = 'l')

However, this solution depends on knowing how the y values were calculated in the first place.

Last updated 2011-02-04 17:55
Petri Koistinen
petri.koistinen 'at' helsinki.fi