Module 1
Data Types
In programming, data types are an important concept. Variables can store data of different types, and different types can do different things. To process data correctly, a programming language must know what can and cannot be done to a particular value. For example, addition cannot be performed on the words "Hello" and "world". Similarly, you cannot change the numbers 5 or -22 from lower to uppercase.
For this reason, R has a feature called data types. Different kinds of values are assigned different data types, which help differentiate them. Each type has its own characteristics and rules that define its properties.
In this course we will consider the following data types:
- Numeric
- Integers
- Complex
- Logical
- Characters
There are more data types available in R, but they are beyond the scope of this course. Let’s go through these data types one by one.
Numeric Data Type
As you may expect, the numeric data type is for numerical values. To create a variable of the numeric data type, simply assign any numeric value to the variable:
x_num <- 1
print(x_num)
#> [1] 1
y_num <- -2.35
print(y_num)
#> [1] -2.35
Use the class() function to find out the data type of any variable.
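A quick sketch: applied to the two variables created above (recreated here so the snippet runs on its own), class() reports "numeric" for both, because R stores a number such as 1 as numeric unless you explicitly ask for an integer.

```r
# Recreate the two numeric variables from above
x_num <- 1
y_num <- -2.35
# class() reports the data type of each variable
class(x_num)
#> [1] "numeric"
class(y_num)
#> [1] "numeric"
```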
Integer Data Type
The integer data type is a special case of the numeric data type and is used for whole numbers. To store a value as an integer, we need to specify it as such, for example with the as.integer() function:
x_int <- as.integer(2)
print(x_int)
#> [1] 2
class(x_int)
#> [1] "integer"
If the input value is not a whole number (for example, 2.85), the as.integer() function drops the decimal part and keeps only the integer part. Note that it truncates rather than rounds, so 2.85 becomes 2, not 3.
y_int <- as.integer(2.85)
print(y_int)
#> [1] 2
class(y_int)
#> [1] "integer"
Another way to create a variable of the integer data type is to write an integer followed by the letter L:
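For example, the sketch below creates an integer with the L suffix; x_int2 is an illustrative name that does not appear elsewhere in this module. The last lines show that the same number written without L is stored as numeric.

```r
# x_int2 is an illustrative name; the L suffix makes 5 an integer
x_int2 <- 5L
print(x_int2)
#> [1] 5
class(x_int2)
#> [1] "integer"
# Without the L suffix, the same number is stored as numeric
class(5)
#> [1] "numeric"
```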
Complex Data Type
Complex data types are used to store numbers with an imaginary component, for instance 1 + 3i, 2 - 5i, and 3 - 4i. We are not going to use this data type in this class, but it is good to know that it exists.
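Although we will not use complex numbers, here is how one could be created. The value 20 + 6i is chosen to match the x_comp variable used in the conversion examples later in this module; Re() and Im() are base R functions that extract the real and imaginary parts.

```r
# A complex number: real part 20, imaginary part 6
x_comp <- 20 + 6i
print(x_comp)
#> [1] 20+6i
class(x_comp)
#> [1] "complex"
# Extract the imaginary part
Im(x_comp)
#> [1] 6
```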
Logical Data Type
A logical data type stores logical (also known as Boolean) values, TRUE and FALSE:
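A minimal sketch: the variables below mirror x_logical and y_logical, whose values (TRUE and FALSE) appear in the conversion examples later on. Comparison operators such as > also produce logical values.

```r
# The two logical values
x_logical <- TRUE
y_logical <- FALSE
class(x_logical)
#> [1] "logical"
# Comparisons also produce logical values
5 > 3
#> [1] TRUE
```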
Character Data Type
A character data type stores character values, or strings. Strings in R can contain letters, numbers, and symbols. The easiest way to indicate that a value is of the character type in R is to wrap it in single or double quotes:
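For instance, the two character variables used later in this module can be created with either kind of quotes. Note that x_char holds the string "2102", not a number, because of the quotes.

```r
# Double quotes and single quotes both create character strings
x_char <- "2102"
y_char <- 'Welcome to STAT 2102!'
print(y_char)
#> [1] "Welcome to STAT 2102!"
# The quotes make "2102" a string, not a number
class(x_char)
#> [1] "character"
```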
Converting Data Types
In R we can convert values from one data type into another. R has certain rules that govern these conversions.
Converting into Numeric Data Type
Before we discuss how to convert other data types into numeric, let’s first introduce the is.numeric() function, which checks whether a variable is of the numeric data type:
is.numeric(x_num)
#> [1] TRUE
is.numeric(x_char)
#> [1] FALSE
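One detail worth knowing: is.numeric() also returns TRUE for integers, which fits the description of integers as a special case of numeric data, even though class() reports "integer". The sketch below recreates x_int so it runs on its own.

```r
# Recreate the integer variable from the Integer Data Type section
x_int <- as.integer(2)
class(x_int)
#> [1] "integer"
# Integers still count as numeric for is.numeric()
is.numeric(x_int)
#> [1] TRUE
```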
To convert any other data type into numeric, we can use the as.numeric() function. Its behavior depends on the original type:
- Integer values: the type changes and the value stays the same.
- Complex values: the imaginary part of the number is discarded (R issues a warning).
- Logical values: TRUE is converted to 1, and FALSE is converted to 0.
- Character values: a string that represents a number, such as "2102", is converted into that number; if the string cannot be read as a number (for example, it contains words), the result is NA (again with a warning):
######################################
x_comp
#> [1] 20+6i
is.numeric(x_comp)
#> [1] FALSE
num1 <- as.numeric(x_comp)
#> Warning: imaginary parts discarded in coercion
class(num1)
#> [1] "numeric"
print(num1)
#> [1] 20
######################################
x_logical
#> [1] TRUE
logical1 <- as.numeric(x_logical)
class(logical1)
#> [1] "numeric"
print(logical1)
#> [1] 1
######################################
y_logical
#> [1] FALSE
logical2 <- as.numeric(y_logical)
class(logical2)
#> [1] "numeric"
print(logical2)
#> [1] 0
######################################
print(y_char)
#> [1] "Welcome to STAT 2102!"
char1 <- as.numeric(y_char)
#> Warning: NAs introduced by coercion
class(char1)
#> [1] "numeric"
print(char1)
#> [1] NA
######################################
print(x_char)
#> [1] "2102"
char2 <- as.numeric(x_char)
class(char2)
#> [1] "numeric"
print(char2)
#> [1] 2102
Converting into Integer Data Type
To convert any other data type into integer, we can use the as.integer() function. It behaves much like as.numeric() described above, except that it also drops the decimal part of numbers, so we will skip the details here. (Try it yourself!)
Converting into Logical Data Type
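To convert any other data type into logical, we can use the as.logical() function. For numeric values, it returns FALSE if the value is zero and TRUE for any other value. For character values, it returns NA unless the string spells out a logical value ("TRUE", "true", "True", "T", "FALSE", "false", "False", or "F"), so strings such as "2102" become NA: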
######################################
print(y_num)
#> [1] -2.35
is.logical(y_num)
#> [1] FALSE
logi1 <- as.logical(y_num)
class(logi1)
#> [1] "logical"
print(logi1)
#> [1] TRUE
######################################
print(y_char)
#> [1] "Welcome to STAT 2102!"
logi2 <- as.logical(y_char)
class(logi2)
#> [1] "logical"
print(logi2)
#> [1] NA
######################################
print(x_char)
#> [1] "2102"
logi3 <- as.logical(x_char)
class(logi3)
#> [1] "logical"
print(logi3)
#> [1] NA
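These NA results come from strings that do not spell out a logical value. As a short sketch based on the base R documentation for as.logical(): zero converts to FALSE, while the strings "TRUE", "true", "True", "T" (and their counterparts "FALSE", "false", "False", "F") are recognized; any other string gives NA.

```r
# Zero becomes FALSE
as.logical(0)
#> [1] FALSE
# Recognized spellings of logical values
as.logical("TRUE")
#> [1] TRUE
as.logical("false")
#> [1] FALSE
as.logical("T")
#> [1] TRUE
# Any other string becomes NA
as.logical("yes")
#> [1] NA
```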
Converting into Character Data Type
We can convert any data type into the character data type using the as.character() function. It converts the original value into a character string.
######################################
print(y_num)
#> [1] -2.35
is.character(y_num)
#> [1] FALSE
char1 <- as.character(y_num)
class(char1)
#> [1] "character"
print(char1)
#> [1] "-2.35"
######################################
print(x_comp)
#> [1] 20+6i
char2 <- as.character(x_comp)
class(char2)
#> [1] "character"
print(char2)
#> [1] "20+6i"
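The same function works on logical and integer values as well; a small sketch:

```r
# A logical value becomes the string "TRUE"
as.character(TRUE)
#> [1] "TRUE"
# An integer becomes its digits as a string
as.character(5L)
#> [1] "5"
# is.character() confirms the result is now a string
is.character(as.character(2102))
#> [1] TRUE
```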
Data Structures
In any programming language, you need to use different variables to store different data. Unlike programming languages such as C and Java, R does not require variables to be declared with a data type. Instead, variables are assigned R objects, and the data type of the R object becomes the data type of the variable. There are many types of R objects (data structures). The most commonly used ones are:
- Vectors
- Lists
- Matrices
- Data Frames
- Factors
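To illustrate that a variable's type simply follows the object assigned to it, the sketch below reassigns one variable (v, an illustrative name) three times; class() reports a different type each time, and no declaration is needed.

```r
# v is an illustrative name; no type declaration is needed
v <- 42
class(v)
#> [1] "numeric"
# Reassigning a string changes the variable's type
v <- "forty-two"
class(v)
#> [1] "character"
# Reassigning a logical vector changes it again
v <- c(TRUE, FALSE)
class(v)
#> [1] "logical"
```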
In this module, we will discuss vectors and lists. Later, we will go over the other data structures as well.
Vectors
Creating Vectors
The vector is the most basic data structure in R. There are various ways to create a vector. The most common way is to use the c() function:
vec1 <- c(1, 2, 3, 4, 5)
print(vec1)
#> [1] 1 2 3 4 5
vec2 <- c("fall", "winter", "spring", "summer")
print(vec2)
#> [1] "fall" "winter" "spring" "summer"
You can also use the : operator to create a vector:
vec3 <- 3:11
print(vec3)
#> [1] 3 4 5 6 7 8 9 10 11
Another way is to use the seq() function, specifying either the step size (by) or the number of elements (length.out):
vec4 <- seq(from = 1, to = 5, by = 0.7)
print(vec4)
#> [1] 1.0 1.7 2.4 3.1 3.8 4.5
vec5 <- seq(from = 1, to = 5, length.out = 8)
print(vec5)
#> [1] 1.000000 1.571429 2.142857 2.714286 3.285714 3.857143
#> [7] 4.428571 5.000000
One more function for creating vectors is rep(), which repeats values:
vec6 <- rep(5, times = 3)
print(vec6)
#> [1] 5 5 5
vec7 <- rep(c(1,3,4), times = 2)
print(vec7)
#> [1] 1 3 4 1 3 4
vec8 <- rep(c("apple", "orange", "mango"), times = 2, each = 3)
print(vec8)
#> [1] "apple" "apple" "apple" "orange" "orange" "orange"
#> [7] "mango" "mango" "mango" "apple" "apple" "apple"
#> [13] "orange" "orange" "orange" "mango" "mango" "mango"
How Many Elements Does Your Vector Contain?
We can use the length() function to check how many elements are stored in a vector:
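For example, using vec1 and vec2 from above (recreated so the snippet is self-contained):

```r
# Recreate the vectors from the Creating Vectors section
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c("fall", "winter", "spring", "summer")
# Count the elements of each vector
length(vec1)
#> [1] 5
length(vec2)
#> [1] 4
```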
Adding Elements to Vectors
To add new elements to an existing vector, we can use the c() function once again:
# Adding three elements, c(15, 3, 4), to vec1
vec9 <- c(vec1, c(15, 3, 4))
print(vec9)
#> [1] 1 2 3 4 5 15 3 4
# Merging vec1 and vec3
vec10 <- c(vec1, vec3)
print(vec10)
#> [1] 1 2 3 4 5 3 4 5 6 7 8 9 10 11
If you would like to insert one or more elements at a specific position in the vector, use the append() function:
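A minimal sketch using vec1 from above: the after argument of append() gives the position after which the new values are inserted, and after = 0 places them at the front. append() returns a new vector and leaves vec1 unchanged.

```r
vec1 <- c(1, 2, 3, 4, 5)
# Insert 9 after the 2nd element
append(vec1, 9, after = 2)
#> [1] 1 2 9 3 4 5
# Insert 7 and 8 at the very beginning (after = 0)
append(vec1, c(7, 8), after = 0)
#> [1] 7 8 1 2 3 4 5
# append() returns a new vector; vec1 itself is unchanged
print(vec1)
#> [1] 1 2 3 4 5
```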
Subsetting/Indexing Vectors
We use square brackets, [], to extract specific elements from vectors:
# selects the first element of the vec1
vec1[1]
#> [1] 1
# selects the 1st, 5th, and 8th elements of the vec9
vec9[c(1,5,8)]
#> [1] 1 5 4
# selects the 4th, 5th, 6th, and 7th elements of the vec9
vec9[4:7]
#> [1] 4 5 15 3
# selects the first and second elements of vec1
vec1[c(T, T, F, F, F)]
#> [1] 1 2
# select all elements of vec1 that are greater than 2.5
vec1[vec1 > 2.5]
#> [1] 3 4 5
# select all elements of vec1 that are not equal to 3
vec1[vec1 != 3]
#> [1] 1 2 4 5
# selects all elements of vec2 except the 4th one
vec2[-4]
#> [1] "fall" "winter" "spring"
# selects all elements of vec2 except the 1st and 2nd ones
vec2[c(-1, -2)]
#> [1] "spring" "summer"
Assigning New Values to Elements of the Existing Vector
Use the assignment operator, <-, to assign new values to elements of an existing vector:
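A short sketch: to keep vec1 unchanged for later examples, it works on a copy named vec_copy (an illustrative name). You can replace one element or several at once, and assigning to a position past the end lengthens the vector, filling any gap with NA.

```r
vec1 <- c(1, 2, 3, 4, 5)
# vec_copy is an illustrative name; working on a copy leaves vec1 intact
vec_copy <- vec1
# Replace the first element
vec_copy[1] <- 100
print(vec_copy)
#> [1] 100   2   3   4   5
# Replace the 2nd and 3rd elements at once
vec_copy[c(2, 3)] <- c(20, 30)
print(vec_copy)
#> [1] 100  20  30   4   5
# Assigning past the end extends the vector; position 6 becomes NA
vec_copy[7] <- 7
print(vec_copy)
#> [1] 100  20  30   4   5  NA   7
```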
Vectorization
The main advantage of vectors in R is that you can perform vectorized operations on them:
# Adding 1 to each element of vec1
print(vec1 + 1)
#> [1] 2 3 4 5 6
# For each element of the vector (1:3), raising 2 to the power of its elements
print(2^(1:3))
#> [1] 2 4 8
# Doing elementwise addition (you can do it with all arithmetic operations)
print(c(1, 2, 3) + c(4, 5, 6))
#> [1] 5 7 9
# Be careful! If the vectors have different lengths, R recycles the values
# of the shorter vector
print(c(1, 2, 3) + c(4, 5, 6, 7))
#> Warning in c(1, 2, 3) + c(4, 5, 6, 7): longer object length
#> is not a multiple of shorter object length
#> [1] 5 7 9 8
# Checking whether 2 is in vec1 using the %in% operator
print(2 %in% vec1)
#> [1] TRUE
Vectors Are Homogeneous!
The main disadvantage of vectors in R is that they can store only homogeneous data (data of the same type). If the elements of a vector are of different data types, R converts (coerces) them so that all elements are of the same type:
# R will convert all elements of vec12 into characters, because vectors can only
# contain homogeneous data
vec12 <- c(2, 3.5, "fall", 2.7)
print(vec12)
#> [1] "2" "3.5" "fall" "2.7"
class(vec12)
#> [1] "character"
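The same coercion happens with other mixtures of types. R converts toward the more general type in the order logical, integer, numeric, complex, character; the sketch below (with illustrative names mixed1 to mixed3) shows the steps before character.

```r
# Logical values mixed with a number become numbers (TRUE -> 1, FALSE -> 0)
mixed1 <- c(TRUE, FALSE, 3)
print(mixed1)
#> [1] 1 0 3
class(mixed1)
#> [1] "numeric"
# An integer mixed with a decimal number becomes numeric
mixed2 <- c(1L, 2.5)
class(mixed2)
#> [1] "numeric"
# A number mixed with a complex number becomes complex
mixed3 <- c(1, 2+3i)
class(mixed3)
#> [1] "complex"
```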
Question: What if I want to store heterogeneous data (data of different types)?
Solution: Use lists.
Lists
Creating Lists
You can create a list using the list() function:
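A minimal sketch, with assumptions: list1 is an illustrative list mixing three data types. The definition of list2 used in the next section is not shown in this module; all we know from its outputs is that its first element is c(2, 4, 10), so the remaining elements below are our own choice.

```r
# list1 is illustrative: a number, a string, and a logical value in one list
list1 <- list(2.5, "fall", TRUE)
print(list1)
#> [[1]]
#> [1] 2.5
#>
#> [[2]]
#> [1] "fall"
#>
#> [[3]]
#> [1] TRUE
# Assumed list2: only the first element, c(2, 4, 10), matches the outputs below
list2 <- list(c(2, 4, 10), "winter", FALSE)
length(list2)
#> [1] 3
```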
Subsetting/Indexing Lists Using Square Brackets (Single and Double), [] and [[]]
# Selecting the first element of the list2 as a list
list2[1]
#> [[1]]
#> [1] 2 4 10
# Selecting the first element of the list2 as it is
list2[[1]]
#> [1] 2 4 10
# Selecting the second element of the first element of the list2
list2[[1]][2]
#> [1] 4
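The difference between the two kinds of brackets shows up in class(): single brackets return a list, while double brackets return the element itself. This sketch assumes a list2 whose first element is c(2, 4, 10), with the other elements chosen only for illustration.

```r
# Assumed list2: only the first element, c(2, 4, 10), comes from the outputs above
list2 <- list(c(2, 4, 10), "winter", FALSE)
# Single brackets keep the list wrapper
class(list2[1])
#> [1] "list"
# Double brackets return the element itself
class(list2[[1]])
#> [1] "numeric"
```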
Merging Lists
You can merge lists using both the c() and list() functions. Can you tell the difference between the outputs these functions produce?
a <- list(1, 2, 3)
b <- list(4, 5, 6)
merged_list1 <- c(a, b)
print(merged_list1)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
#>
#> [[4]]
#> [1] 4
#>
#> [[5]]
#> [1] 5
#>
#> [[6]]
#> [1] 6
merged_list2 <- list(a, b)
print(merged_list2)
#> [[1]]
#> [[1]][[1]]
#> [1] 1
#>
#> [[1]][[2]]
#> [1] 2
#>
#> [[1]][[3]]
#> [1] 3
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 4
#>
#> [[2]][[2]]
#> [1] 5
#>
#> [[2]][[3]]
#> [1] 6
The c() function merged the elements of list a and list b and created a list containing 6 elements. In contrast, the list() function created a list containing two elements, list a and list b.
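You can confirm this with length():

```r
a <- list(1, 2, 3)
b <- list(4, 5, 6)
# c() splices the elements of a and b into one list
length(c(a, b))
#> [1] 6
# list() nests a and b as two elements
length(list(a, b))
#> [1] 2
```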
Flattening Lists into Vectors
You can convert a list into a vector using the unlist() function:
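For example, the list3 below matches the one printed in the next section (its definition is otherwise not shown in this module). Because the result is a vector, it must be homogeneous, so a list that mixes types is flattened into a character vector.

```r
# list3 matches the list printed in the next section
list3 <- list(c(1, 2, 3), 45, c(20, -5))
vec_flat <- unlist(list3)
print(vec_flat)
#> [1]  1  2  3 45 20 -5
class(vec_flat)
#> [1] "numeric"
# Mixed types are coerced to a common type when flattened
class(unlist(list(1, "a", TRUE)))
#> [1] "character"
```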
Manipulating Elements in a List
Adding an element to a list:
print(list3)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] 45
#>
#> [[3]]
#> [1] 20 -5
list3[4] <- 100
print(list3)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] 45
#>
#> [[3]]
#> [1] 20 -5
#>
#> [[4]]
#> [1] 100
Removing an element from a list:
# Removing the second element in the list3
list3[2] <- NULL
print(list3)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] 20 -5
#>
#> [[3]]
#> [1] 100
Changing values of elements in a list:
# Changing the third element of the first element of list3
list3[[1]][3] <- 50
print(list3)
#> [[1]]
#> [1] 1 2 50
#>
#> [[2]]
#> [1] 20 -5
#>
#> [[3]]
#> [1] 100