Chapter 3 Simple Manipulations — Numbers and Vectors

3.1 Vectors and Assignment

R operates on named data structures. The simplest such structure is the numeric vector, a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4, and 21.7, use the R command:

x <- c(10.4, 5.6, 3.1, 6.4, 21.7) #set up a vector
x
## [1] 10.4  5.6  3.1  6.4 21.7

This is an assignment statement using the function c(), which can take an arbitrary number of vector arguments and whose value is a vector obtained by concatenating its arguments end to end.

A few remarks:

  • We assigned the values to a variable called x.
  • The assignment operator is <-, which consists of the two characters < (“less than”) and - (“minus”) occurring strictly side-by-side; it ‘points’ to the object receiving the value of the expression. In most contexts, the ‘=’ operator can be used as an alternative. Both are used in this chapter, although you should learn one and stick with it.
  • The value of x doesn’t print automatically after an assignment. It does when we type just the name, as the last input line shows.
  • The value of x is prefaced with [1]. This is the index of the first element shown on that line of output; it appears because x is a vector. More on that later.
  • A number occurring by itself in an expression is taken as a vector of length one.
  • Assignment can also be made using the function assign(). An equivalent way of making the same assignment as above is with:
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

The usual operator, <-, can be thought of as a syntactic short-cut to this.

  • Assignments can also be made in the other direction, using the obvious change “->” in the assignment operator. So the same assignment could be made using
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
x
## [1] 10.4  5.6  3.1  6.4 21.7

If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command

1/x #reciprocal of x
## [1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295

the reciprocals of the five values would be printed at the terminal and the value of x would, of course, be unchanged, but the result of 1/x would not be stored anywhere.

Another example:

The following assignment

y <- c(x, 0, x)
y
##  [1] 10.4  5.6  3.1  6.4 21.7  0.0 10.4  5.6  3.1  6.4 21.7

would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.
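
As a quick check of the remarks above, the following sketch makes the same assignment in four ways and confirms that the resulting objects are identical. The names a, b, d, e, and recip are chosen here purely for illustration.

```r
# Four equivalent ways to store the same vector
a <- c(10.4, 5.6, 3.1, 6.4, 21.7)          # leftward arrow
b = c(10.4, 5.6, 3.1, 6.4, 21.7)           # equals sign
assign("d", c(10.4, 5.6, 3.1, 6.4, 21.7))  # assign() takes the name as a string
c(10.4, 5.6, 3.1, 6.4, 21.7) -> e          # rightward arrow

# identical() returns TRUE when two objects are exactly equal
identical(a, b)
identical(a, d)
identical(a, e)

# An expression used as a complete command is printed but not stored;
# assign it to a name to keep the result
recip <- 1 / a
length(recip)
```

Whichever form is used, the object that ends up in the workspace is the same; the choice is a matter of style.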

3.2 Vector Arithmetic

  1. Vectors can be used in arithmetic expressions, in which case the operations are performed element by element.
x = c(1, 2, 3) 
y = c(4, 5, 6)
v = 2 * x + y + 1

Note: Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular, a constant is simply repeated.

x1 = c(1, 2)
y1 = c(3, 5, 6, 8)
x1 + y1
## [1]  4  7  7 10

Try this yourself.
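
The recycling rule can be explored with a small sketch such as the one below. The vector names short_v and long_v are hypothetical, and suppressWarnings() is used only to keep the output quiet in the fractional case, where R would otherwise warn that the longer length is not a multiple of the shorter one.

```r
short_v <- c(10, 20)
long_v <- 1:4

# short_v is used twice to match the length of long_v
whole <- long_v + short_v          # 11 22 13 24

# a constant is simply repeated for every element
doubled <- long_v * 2              # 2 4 6 8

# fractional recycling: short_v is used two and a half times
partial <- suppressWarnings(1:5 + short_v)  # 11 22 13 24 15
```

Without suppressWarnings(), the last line still returns the same values but also emits a warning, which is usually a hint that the lengths were not what you intended.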

  2. The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power.

  3. In addition, all of the common arithmetic functions are available.

  • log, exp, sin, cos, tan, sqrt, and so on, all have their usual meaning.
exp((x + y)^2) / sqrt(x * y)
  • length(x) is the number of elements in x, and sum(x) gives the total of the elements in x.
  4. Quick guide to some built-in statistical functions in R:
mean(x) # compute the mean of vector x
median(x) # compute the median of vector x
sd(x) # compute the standard deviation of vector x
scale(x) # compute the z-scores of the numbers in vector x
quantile(x) # compute quantiles of vector x (by default the minimum, quartiles, and maximum)
var(x) # compute the variance of vector x
cov(x, y) # compute the covariance of vectors x and y
cor(x, y) # compute the correlation of vectors x and y

For example,

x <- c(0, 5, 9, 12, 35)
length(x)
## [1] 5
sum(x)
## [1] 61
xm <- mean(x)
xsd <- sqrt(var(x))
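
To connect the functions listed above, here is a minimal sketch that computes z-scores by hand and compares them with scale(). The names z, same_sd, and same_z are introduced here for illustration only; as.vector() is used because scale() returns a one-column matrix rather than a plain vector.

```r
x <- c(0, 5, 9, 12, 35)

xm <- mean(x)          # 12.2
xsd <- sqrt(var(x))    # standard deviation via the variance

# sd() should agree with the square root of var()
same_sd <- isTRUE(all.equal(xsd, sd(x)))

# z-scores by hand: centre on the mean, divide by the standard deviation
z <- (x - xm) / xsd

# scale() does the same centring and scaling
same_z <- isTRUE(all.equal(z, as.vector(scale(x))))

same_sd
same_z
```

Both comparisons use all.equal() rather than == because floating-point results can differ in the last few digits.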

3.3 Generating Regular Sequences

R has many facilities for generating commonly used sequences of numbers. For example 1:30 is the vector c(1, 2, ..., 29, 30).

  1. The colon operator has a high priority within an expression, so, for example, 2*1:15 is really 2*(1:15), so we have the vector c(2, 4, ..., 28, 30). Put n <- 10 and you can compare the sequences 1:n-1 and 1:(n-1).

  2. The construction 30:1 may be used to generate a sequence backward.

  3. The function seq() is a more general facility for generating sequences. It has five arguments, only some of which may be specified in any one call.

  • The first two arguments, if given, specify the beginning and end of the sequence, and if these are the only two arguments given, the result is the same as the colon operator. That is, seq(2,10) is the same vector as 2:10.
  • Parameters to seq(), and to many other R functions, can also be given in named form, in which case the order in which they appear is irrelevant. The first two parameters may be named from=value and to=value; thus seq(1,30), seq(from=1, to=30) and seq(to=30, from=1) are all the same as 1:30.
  • The next two parameters to seq() may be named by=value and length=value, which specify a step size and a length for the sequence, respectively. If neither of these is given, the default by=1 is assumed.

For example,

seq(-5, 5, by = .2) -> s3
s3
##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0
## [12] -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8
## [23] -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4
## [34]  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6
## [45]  3.8  4.0  4.2  4.4  4.6  4.8  5.0
s4 <- seq(length = 51, from = -5, by = .2)
s4
##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0
## [12] -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8
## [23] -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4
## [34]  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6
## [45]  3.8  4.0  4.2  4.4  4.6  4.8  5.0
  • The fifth parameter may be named along=vector, which, if used, must be the only parameter, and creates a sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty (as it can be).
  4. A related function is rep(), which can be used for replicating an object in various complicated ways. The simplest form is
options(width = 60)

s5 <- rep(x, times = 5)
s5
##  [1]  0  5  9 12 35  0  5  9 12 35  0  5  9 12 35  0  5  9
## [19] 12 35  0  5  9 12 35

which will put five copies of x end-to-end in s5. Another useful version is

options(width = 60)

s6 <- rep(x, each = 5)
s6
##  [1]  0  0  0  0  0  5  5  5  5  5  9  9  9  9  9 12 12 12
## [19] 12 12 35 35 35 35 35

which repeats each element of x five times before moving on to the next.
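
The precedence of the colon operator and the named arguments of seq() described in this section can be checked with a short sketch like the following. Note that length.out and along.with are the full names of the arguments abbreviated in the text as length and along (R accepts the shortened forms through partial matching). The variable names are chosen here for illustration.

```r
n <- 10

# The colon binds more tightly than subtraction
first <- 1:n - 1      # same as (1:n) - 1, i.e. 0, 1, ..., 9
second <- 1:(n - 1)   # 1, 2, ..., 9

# Two ways to describe the same sequence from -5 to 5 in steps of 0.2
s_by <- seq(from = -5, to = 5, by = 0.2)
s_len <- seq(from = -5, by = 0.2, length.out = 51)
same_seq <- isTRUE(all.equal(s_by, s_len))

# A sequence along an existing vector
v <- c("a", "b", "c")
idx <- seq(along.with = v)   # 1 2 3
```

Comparing first and second shows why parentheses matter: the two expressions differ in both their starting value and their length.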

3.4 Logical Vectors

As well as numerical vectors, R allows manipulation of logical quantities.

  1. The elements of a logical vector can have the values TRUE, FALSE, and NA (for “not available”, see below).
    • The first two are often abbreviated as T and F, respectively.
    • Note, however, that T and F are just variables that are set to TRUE and FALSE by default, but are not reserved words and hence can be overwritten by the user. Therefore, you should always use TRUE and FALSE.
  2. Logical vectors are generated by conditions.

For example,

temp <- x > 13

sets temp as a vector of the same length as x with values FALSE corresponding to elements of x where the condition is not met and TRUE where it is.

  3. The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition, if c1 and c2 are logical expressions, then c1&c2 is their intersection (“and”), c1|c2 is their union (“or”), and !c1 is the negation of c1.
z = 12
(8 < z) & (z < 15) # does z belong to the interval (8, 15)?
## [1] TRUE
  4. Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE becoming 0 and TRUE becoming 1.
x <- c(1, 1, 2, 4, 8, 1, 7, 1)
sum(x == 1) #number of "1" in x
## [1] 4
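
The pieces of this section can be combined in a small sketch: a condition produces a logical vector, the operators &, | and ! combine or negate such vectors, and arithmetic on them counts TRUE values. The vector and the names above, between, and outside are assumptions made for this illustration.

```r
x <- c(0, 5, 9, 12, 35)

# A condition yields one TRUE/FALSE per element
above <- x > 13                  # FALSE FALSE FALSE FALSE TRUE

# TRUE counts as 1 and FALSE as 0 in arithmetic
n_above <- sum(above)            # 1
prop_above <- mean(above)        # 0.2

# Combine two conditions element by element
between <- (x > 4) & (x < 13)    # FALSE TRUE TRUE TRUE FALSE

# Negation flips every element
outside <- !between              # TRUE FALSE FALSE FALSE TRUE
```

Taking the mean of a logical vector is a common idiom for the proportion of elements that satisfy a condition.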

3.5 Missing values

  1. In some cases, the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA. In general, any operation on an NA yields an NA: when the specification of an operation is incomplete, the result cannot be known and hence is not available.
    • The function is.na(x) gives a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA.
z <- c(1:3, NA)
ind <- is.na(z)
  • Notice that the logical expression x == NA is quite different from is.na(x) since NA is not really a value but a marker for a quantity that is not available. Thus, x == NA is a vector of the same length as x, all of whose values are NA as the logical expression itself is incomplete and hence undecidable.
  2. Note that there is a second kind of “missing” value, produced by numerical computation: the so-called Not a Number, or NaN, values. Examples are
0 / 0
## [1] NaN

or

Inf - Inf
## [1] NaN

which both give NaN since the result cannot be defined sensibly.

  3. In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is only TRUE for NaNs. Missing values are sometimes printed as <NA> when character vectors are printed without quotes.
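
The difference between NA and NaN, and between is.na(x) and x == NA, can be seen in a minimal sketch such as this one. The vector w and the result names are hypothetical; na.rm = TRUE is the argument that asks sum() to skip missing values.

```r
z <- c(1:3, NA)

ind <- is.na(z)        # FALSE FALSE FALSE TRUE
cmp <- z == NA         # NA NA NA NA: comparing with NA is itself undecidable

# A vector holding both kinds of "missing" value
w <- c(1, NA, 0 / 0)

na_flags <- is.na(w)   # FALSE TRUE TRUE  (NA and NaN both count)
nan_flags <- is.nan(w) # FALSE FALSE TRUE (only NaN counts)

# Any arithmetic involving NA gives NA unless missing values are removed
total_all <- sum(z)                  # NA
total_known <- sum(z, na.rm = TRUE)  # 6
```

In practice this means tests for missingness should always go through is.na() rather than a comparison with NA.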

3.6 Character Vectors

  1. Character quantities and character vectors are used frequently in R, for example, as plot labels. Where needed, they are denoted by a sequence of characters delimited by the double-quote character, e.g., "x-values", "New iteration results".

  2. Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes).

  3. Character vectors may be concatenated into a vector by the c() function; examples of their use will emerge frequently.

  4. The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings. Any numbers given among the arguments are coerced into character strings in the evident way, that is, in the same way they would be if they were printed. The arguments are by default separated in the result by a single blank character, but this can be changed by the named parameter sep=string, which changes the separator to string, possibly empty.

For example,

paste("I like", "Statistics")
## [1] "I like Statistics"
x = 2
paste("the value of x is", x)
## [1] "the value of x is 2"
labs <- paste(c("X","Y"), 1:10, sep="")
labs
##  [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9" 
## [10] "Y10"

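This makes labs into a character vector of length 10. Note that recycling of the shorter vector takes place here too: c("X", "Y") is repeated 5 times to match the sequence 1:10.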
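
The effect of sep and of recycling in paste() can be compared side by side in a short sketch. The strings and result names below are made up for illustration.

```r
# Default separator is a single blank
with_space <- paste("run", 1:3)            # "run 1" "run 2" "run 3"

# An empty separator joins the pieces directly
no_space <- paste("run", 1:3, sep = "")    # "run1" "run2" "run3"

# Any string can serve as the separator
dashed <- paste("a", "b", "c", sep = "-")  # "a-b-c"

# The shorter vector is recycled to match the longer one
recycled <- paste(c("X", "Y"), 1:4, sep = "")  # "X1" "Y2" "X3" "Y4"

# Numbers are converted as they would be printed
coerced <- paste("n =", 2.50)              # "n = 2.5"
```

The result always has the length of the longest argument, so a single string combined with 1:3 produces three labels.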

3.7 Index Vectors

Subsets of a vector’s elements may be selected by appending to the name of the vector an index vector in square brackets. More generally, any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression. Such index vectors can be any of four distinct types.

  1. A logical vector. In this case, the index vector must be of the same length as the vector from which elements are to be selected. Values corresponding to TRUE in the index vector are selected, and those corresponding to FALSE are omitted. For example
y <- x[!is.na(x)]

creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x.

Also,

(x+1)[(!is.na(x)) & x > 0] -> z

creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was both non-missing and positive.

  2. A vector of positive integral quantities.

In this case, the index vector’s values must lie in the set {1, 2, …, length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length, and the result is of the same length as the index vector. For example, x[6] is the sixth component of x and

x[1:10]

selects the first 10 elements of x (assuming length(x) is not less than 10). Also,

options(width = 60)

c("x", "y")[rep(c(1, 2, 2, 1), times = 4)]
##  [1] "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x" "x" "y"
## [15] "y" "x"

produces a character vector of length 16 consisting of “x”, “y”, “y”, “x” repeated four times.

  3. A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included. Thus
y <- x[-(1:5)]

gives y all but the first five elements of x.

  4. A vector of character strings.

This possibility only applies where an object has a names attribute to identify its components. In this case, a sub-vector of the names vector may be used in the same way as the positive integral labels in item 2 further above.

fruit <- c(5, 10, 1, 20)
names(fruit) <- c("orange", "banana", "apple", "peach")
fruit
## orange banana  apple  peach 
##      5     10      1     20
fruit[c("apple","orange")]
##  apple orange 
##      1      5

The advantage is that alphanumeric names are often easier to remember than numeric indices. This option is particularly useful in connection with data frames, as we shall see later.
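
The four kinds of index vector can be compared on a single object. The sketch below reuses the fruit vector from the example above; the result names (by_logical and so on) are introduced here only for illustration.

```r
fruit <- c(5, 10, 1, 20)
names(fruit) <- c("orange", "banana", "apple", "peach")

# 1. Logical index: keep elements where the condition is TRUE
by_logical <- fruit[fruit > 4]          # orange, banana, peach

# 2. Positive integers: pick positions, in the order given
by_position <- fruit[c(3, 1)]           # apple, orange

# 3. Negative integers: drop the listed positions
by_exclusion <- fruit[-(1:2)]           # apple, peach

# 4. Character strings: pick elements by name
by_name <- fruit[c("apple", "orange")]  # apple, orange

# Selecting by position and by name gives the same named vector here
identical(by_position, by_name)
```

Names are carried along with the selected values in every case, which is why the last comparison holds.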

An indexed expression can also appear on the receiving end of an assignment, in which case the assignment operation is performed only on those elements of the vector. The expression must be of the form vector[index_vector] as having an arbitrary expression in place of the vector name does not make much sense here.

The vector assigned must match the length of the index vector, and in the case of a logical index vector, it must again be the same length as the vector it is indexing. For example,

x[is.na(x)] <- 0

replaces any missing values in x by zeros and

y[y < 0] <- -y[y < 0]

has the same effect as

y <- abs(y)
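
Both replacement idioms can be tried on small vectors. This is a minimal sketch with made-up data; the names x_filled and y_flipped are hypothetical copies kept so the originals stay available for comparison.

```r
x <- c(3, NA, -2, NA, 7)
x_filled <- x
x_filled[is.na(x_filled)] <- 0     # only the NA positions are overwritten
x_filled                           # 3 0 -2 0 7

y <- c(-1.5, 2, -3)
y_flipped <- y
# only the negative elements are replaced, each by its own negation
y_flipped[y_flipped < 0] <- -y_flipped[y_flipped < 0]

identical(y_flipped, abs(y))       # TRUE
```

In each case the elements not selected by the index vector are left exactly as they were.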

3.8 Exercises:

  1. Create the vectors.

    1. (1, 2, 3, ..., 19, 20)
    2. (20, 19, ..., 2, 1)
    3. (1, 2, 3, ..., 19, 20, 19, 18, . . . , 2, 1)
    4. (4, 6, 3) and assign it to the vector tmp.
  2. Create a vector of the values of \( e^x \cos(x) \) at x = 3, 3.1, 3.2, …, 6.

  3. Create the following vectors.

    1. \( (0.1^3, 0.1^6, 0.1^9, \ldots, 0.1^{36}) \)
    2. \( (2, 2^2/2, 2^3/3, \ldots, 2^{25}/25) \)
  4. Use the function paste() to create the following character vectors of length 30:

    1. ("label 1", "label 2", ..., "label 30"). Note that there is a single space between label and the number following.
    2. ("fn1", "fn2", ..., "fn30"). In this case, there is no space between fn and the number following.

  5. Calculate the following: \[ \sum_{i=10}^{100} \left( i^3 + 4i^2 \right) \]

  1. Without using R, determine the results of the following computations:
x <- c(1, 2, 3)
x[1]/x[2]^3 - 1 + 2 * x[3] - x[2 - 1]
1:5 + c(1, 2, 3)
x1 <- c(1, 4, 3, NA, 7)
x2 <- c("a", "B", NA, "NA")
is.na(x1)
is.na(x2)
is.na(paste(c(1, NA)))
x <- c(1, 2, NA, 3)
mean(x)
mean(x, na.rm = TRUE)
c(1, 2, 3)[4]
  1. Compute the real roots of the quadratic equation \[ x^2+x+1=0 \] Recall the formula for the roots of a quadratic \[ x=\frac{-b \pm \sqrt{b^2-4ac}}{2a} \]

  2. Do the following for logical variables

    1. Create a logical vector
x <- seq(-3, 3, length = 200) > 0
# negate this vector
!x
    2. Compute the truth table for logical AND
c(TRUE, TRUE, FALSE, FALSE) & c(TRUE, FALSE, FALSE, TRUE)
    3. Compute the truth table for logical OR
c(TRUE, TRUE, FALSE, FALSE) | c(TRUE, FALSE, FALSE, TRUE)
    4. Explore arithmetic with logical and numeric vectors
1:3 + c(TRUE, FALSE, TRUE)