Vectors, Lists, Data frames and Tibbles

If you are not familiar with RStudio and the basics of R, it is highly recommended that you complete these parts of the tutorial first: Become familiar with R Studio and basic concepts of R, and What is R programming language?

Vectors

A vector is one of the simplest data structures, and it is used in many programming languages. It consists of a collection of elements, each of which can be identified by an index. The author often imagines arrays and vectors as a row of terraced houses in the United Kingdom: they all look the same from the outside, but inside each one is uniquely decorated, and each house has its own number. A vector in computer science must not be confused with a Euclidean vector; here, arrays and vectors are closer to an ordered, numbered sequence of values.

In R, a one-dimensional array is referred to as a vector; its number of elements is returned by the length() function. Every element must be of the same data type: a vector is homogeneous. The index of the first element is 1, whereas many other programming languages, such as C and Python, start at 0. Vectors can be added, subtracted, multiplied and divided; these operations are applied element by element.

In R, vectors are created with the c() function, which combines its arguments into one vector. For example, a vector containing the first three primes is defined as prime <- c(2, 3, 5).
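The following minimal sketch illustrates the points above: indexing starts at 1, length() returns the number of elements, and arithmetic works element by element. The name prime matches the practice below; the second vector, odd, is a hypothetical example chosen only for illustration.

```r
# Create a numeric vector holding the first three primes
prime <- c(2, 3, 5)

length(prime)          # number of elements: 3
prime[1]               # first element (R counts from 1): 2
prime[length(prime)]   # last element: 5

# Hypothetical second vector of the same length
odd <- c(1, 3, 5)

# Arithmetic is applied element by element
prime + odd            # 3 6 10
prime * odd            # 2 9 25
prime / odd            # 2 1 1
prime - odd            # 1 0 0

# Every element shares one type, reported by class()
class(prime)           # "numeric"
```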

 

 

Practice - Working with vectors

Numerical vector manipulation

  1. Read the following introduction to vectors in R: Vectors in R (TutorialsPoint)

  2. Create two vectors called prime and non.prime in a script or in the console, for example prime <- c(2, 3, 5) and non.prime <- c(1, 4, -1).

  3. Find the sum, product, quotient and difference of the prime and non.prime vectors.

  4. Print the first and last element of each vector.

    1. Type print(prime[1])

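    2. Type print(prime[3]). prime has three elements, so prime[3] is its last one; prime[length(prime)] returns the last element of a vector of any length.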

  5. Sort the two vectors with the sort() function and print the ordered vectors.
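A possible script for steps 2 to 5 is sketched below. The values are an assumption taken from prime.list and non.prime.list, which appear later in this tutorial; any numbers of your own will do.

```r
# Step 2: the same values as prime.list and non.prime.list used later
prime <- c(2, 3, 5)
non.prime <- c(1, 4, -1)

# Step 3: element-wise sum, product, quotient and difference
print(prime + non.prime)   # 3 7 4
print(prime * non.prime)   # 2 12 -5
print(prime / non.prime)   # 2.00 0.75 -5.00
print(prime - non.prime)   # 1 -1 6

# Step 4: first and last elements; length() avoids hard-coding the 3
print(prime[1])
print(prime[length(prime)])
print(non.prime[1])
print(non.prime[length(non.prime)])

# Step 5: sort() returns a new vector in increasing order
print(sort(prime))         # 2 3 5
print(sort(non.prime))     # -1 1 4
```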

Character vector manipulation

  1. Create the days vector; type days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

  2. Create another vector called months, containing the names of the twelve months.

  3. Try to find the sum, product, quotient and difference of days and months. R should report an error, because arithmetic operators cannot be applied to character vectors.

  4. Sort the two vectors and print the ordered vectors. Character vectors are sorted alphabetically.
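The character practice can be tried with the sketch below. Filling months from R's built-in constant month.name is an assumption made for brevity; typing the twelve names by hand works the same way. tryCatch() is used only so that the script keeps running after the expected error.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
# month.name is a built-in character vector of the English month names
months <- month.name

# Arithmetic on character vectors fails; capture the error message
result <- tryCatch(days + months, error = function(e) conditionMessage(e))
print(result)   # e.g. "non-numeric argument to binary operator"

# sort() orders character vectors alphabetically
print(sort(days))
print(sort(months))
```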

Logical vector manipulation

  1. Create two logical (boolean) vectors named sunny.days and rainy.days. Both of them should have 7 elements.

    1. Type rainy.days <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

    2. Type sunny.days <- c(FALSE,TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

  2. Find the sum, product, quotient and difference of the rainy.days and sunny.days vectors. The results are numeric vectors, not logical vectors, because in arithmetic R represents TRUE by 1 and FALSE by 0. Dividing by FALSE (that is, by 0) gives Inf.

  3. Sort the two vectors and print the ordered vectors. FALSE values are placed before TRUE values.
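The sketch below runs the logical practice with the two vectors given in step 1. It also shows sum(), which counts the TRUE values; that call is an extra not mentioned in the steps.

```r
rainy.days <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
sunny.days <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# TRUE counts as 1 and FALSE as 0, so the results are numbers
print(rainy.days + sunny.days)   # 1 1 1 1 1 1 1
print(rainy.days * sunny.days)   # 0 0 0 0 0 0 0
print(rainy.days - sunny.days)   # 1 -1 1 1 -1 -1 1
print(rainy.days / sunny.days)   # Inf 0 Inf Inf 0 0 Inf
class(rainy.days + sunny.days)   # "integer", a numeric type

# sum() of a logical vector counts its TRUE values
sum(rainy.days)                  # 4 rainy days

# Sorting places FALSE before TRUE
print(sort(rainy.days))
print(sort(sunny.days))
```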

Using vectors incorrectly

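  1. Create the vector incorrect.vector.1 <- c(1, "2", 3). The vector has been converted to a character vector: when elements of different types are combined, c() converts all of them to the most flexible type.

  2. Create the vector incorrect.vector.2 <- c(1, TRUE, 3, FALSE). The vector has been converted to a numeric vector, equal to c(1, 1, 3, 0).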
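The coercions described above can be checked with class(), as in this short sketch. The final as.numeric() call is an added illustration of converting back explicitly when the intended type is known.

```r
incorrect.vector.1 <- c(1, "2", 3)
print(incorrect.vector.1)        # "1" "2" "3"
class(incorrect.vector.1)        # "character"

incorrect.vector.2 <- c(1, TRUE, 3, FALSE)
print(incorrect.vector.2)        # 1 1 3 0
class(incorrect.vector.2)        # "numeric"

# Explicit conversion back to numbers
as.numeric(incorrect.vector.1)   # 1 2 3
```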

List

A list is also a one-dimensional data structure, but unlike a vector it is heterogeneous: it can consist of elements of the same data type or of different data types. Each element has a unique index too; the index of the first element is 1 and the index of the last one is the length of the list.

Lists are useful to group information of different data types. In R, lists are created with the list() function. For example, a list describing the weather can contain a date, the humidity, the temperature and the wind speed with its direction, so that character, integer and numeric values are stored together as one complete set. A list can also hold vectors.

 

Double square brackets extract a single element of a list, for example weather.condition[[1]]. Single brackets, as in weather.condition[1], return a smaller list instead.
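A minimal sketch of such a list follows. The values, and the choice of storing the wind speed and its direction as two separate elements, are hypothetical; the code shows the difference between [[ ]] and [ ] and adds a vector as an extra element.

```r
# Hypothetical values: date, humidity (%), temperature (degrees Celsius),
# wind speed (km/h) and wind direction
weather.condition <- list("2019-06-01", 65L, 18.5, 12.4, "NW")

length(weather.condition)      # 5 elements
weather.condition[[1]]         # "2019-06-01"
class(weather.condition[[2]])  # "integer": [[ ]] returns the element itself
class(weather.condition[2])    # "list": [ ] returns a list containing it

# A list can also hold a vector, e.g. hypothetical hourly temperatures
weather.condition[[6]] <- c(14.2, 16.8, 18.5)
length(weather.condition)      # 6
```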

Practice - Creating a list

Simple list

  1. Read the tutorial on lists: https://www.tutorialspoint.com/r/r_lists.htm

  2. Create a weather.condition list containing a date, the humidity, the temperature and the wind speed with its direction. Try to add your own values.

  3. Display every element of the list.

  4. Display the first two elements of the list; type weather.condition[1:2]

  5. Convert weather.condition into a vector; type unlist(weather.condition). All the values have been converted to character elements: a vector must be homogeneous, and the list contains character values such as the date.
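Steps 3 to 5 can be sketched as follows, reusing a hypothetical weather.condition list (the values are illustrative, not prescribed by the tutorial). The for loop is one way, among others, of displaying every element.

```r
# Hypothetical list: date, humidity, temperature, wind speed, direction
weather.condition <- list("2019-06-01", 65L, 18.5, 12.4, "NW")

# Step 3: display every element, one at a time
for (element in weather.condition) print(element)

# Step 4: the first two elements, returned as a shorter list
first.two <- weather.condition[1:2]
print(first.two)

# Step 5: unlist() flattens the list into one vector; because some
# elements are character, every value is converted to character
flat <- unlist(weather.condition)
print(flat)    # "2019-06-01" "65" "18.5" "12.4" "NW"
class(flat)    # "character"
```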

More complex list made of vectors

  1. Let’s create a two-dimensional structure from vectors of the same length, reusing the vectors created in the previous practice. Type weather.list <- list(days, rainy.days, sunny.days).

  2. Let’s build a character value that shows the values for Monday. Type paste(weather.list[[1]][1], weather.list[[2]][1], weather.list[[3]][1]). The result is "Monday TRUE FALSE".
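The two steps above are sketched below. The last line is an added illustration: because paste() is vectorised, passing the whole vectors builds the same summary for all seven days at once.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
rainy.days <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
sunny.days <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

weather.list <- list(days, rainy.days, sunny.days)

# [[i]] picks a vector from the list, then [1] picks its first element
monday <- paste(weather.list[[1]][1], weather.list[[2]][1],
                weather.list[[3]][1])
print(monday)   # "Monday TRUE FALSE"

# paste() works element by element on whole vectors
all.days <- paste(weather.list[[1]], weather.list[[2]], weather.list[[3]])
print(all.days)
```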

Operations not possible on lists

  1. Create a list of prime numbers; type prime.list <- list(2,3,5)

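  2. Create a list of non-prime numbers; type non.prime.list <- list(1,4,-1)

  3. Try to find the sum, product, quotient and difference of prime.list and non.prime.list. It should not work, because arithmetic operators cannot be applied to lists.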
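The sketch below captures the error with tryCatch() so that it can be printed. Converting both lists with unlist() first, shown on the last line, is one possible workaround and is not part of the original exercise.

```r
prime.list <- list(2, 3, 5)
non.prime.list <- list(1, 4, -1)

# Arithmetic operators do not accept lists, so this raises an error
result <- tryCatch(prime.list + non.prime.list,
                   error = function(e) conditionMessage(e))
print(result)   # e.g. "non-numeric argument to binary operator"

# Flattening both lists gives numeric vectors that can be added
unlist(prime.list) + unlist(non.prime.list)   # 3 7 4
```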

Data frames, tibbles and data tables

A data frame is a data structure that stores data as a two-dimensional table made of rows and columns. Similarly to a table in a database, all the values in a column must be of the same data type, while different columns can have different types. For that reason, each row may hold a heterogeneous collection of data. All the columns must have the same length, so every row has the same number of values.

 

 

Elements of a data frame can be selected with square brackets, using the notation [row or rows, column or columns]. Leaving one of the two positions empty selects all the rows or all the columns.
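The bracket notation is illustrated below on a small three-row data frame. Its values are the first three days of the practice data used later in this tutorial; selecting columns by name is an added option beyond the numeric indices described above.

```r
# Three-row data frame taken from the practice values below
weather <- data.frame(
  days = c("Monday", "Tuesday", "Wednesday"),
  humidity = c(34, 34, 34),
  temperatures = c(23, 23, 12)
)

weather[3, 3]                            # row 3, column 3: 12
weather[2, ]                             # second row, all columns
weather[, 3]                             # third column as a vector: 23 23 12
weather[1:2, c("days", "temperatures")]  # columns can also be chosen by name

# Each column keeps a single type (days is a factor before R 4.0.0)
sapply(weather, class)
```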

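A newer package called tibble has been created to simplify data frame manipulation. Its tibbles are data frames with stricter and more predictable behaviour; for example, they never convert strings to factors and they print only the first rows of a large table. It is worth reading about this way of working with two-dimensional tables.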

Practice - Creating a data frame using some vectors

This exercise creates a data frame using three vectors, called days, humidity and temperatures.

  1. Create the days vector; type days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

  2. Create the humidity vector, type humidity <- c(34,34,34,90,80,70,0) or something similar.

  3. Create the temperatures vector by typing something similar to this example: temperatures <- c(23,23,12,9,12,19,20)

  4. Use the help to find the data.frame function (for example, type ?data.frame) and read the help file.

  5. Create the data frame weather. Type weather <- data.frame(days,humidity,temperatures)

  6. Show the content of the data frame weather with the print function.

  7. In the Environment pane (top right section), click on weather to look at the content of the data frame. It should have seven rows and three columns: each vector has become one column.
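The practice can also be run as a single script, sketched below with the example values from steps 1 to 3. dim(), names() and str() are added here as quick checks of the result and are not required by the steps.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
humidity <- c(34, 34, 34, 90, 80, 70, 0)
temperatures <- c(23, 23, 12, 9, 12, 19, 20)

# Each vector becomes one column, named after the vector
weather <- data.frame(days, humidity, temperatures)
print(weather)

dim(weather)    # 7 rows, 3 columns
names(weather)  # "days" "humidity" "temperatures"
str(weather)    # one line per column with its type and first values
```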

 

Practice - Creating a data frame using a file

  1. Download the following data: weather.csv (source https://www.kaggle.com/zaraavagyan/weathercsv/downloads/weathercsv.zip/1).

  2. Copy the downloaded file into your project folder.

  3. Import the data using one of these two options:

    1. Using the Files pane of RStudio (bottom right of the screen), find the downloaded file. Click on the file and select Import Dataset.... Change the name to weather and click on Import.

    2. Alternatively, in the Environment pane (top right corner), click on Import Dataset and select the second option, From Text (readr).... Use the Browse utility to find and select weather.csv. Change the name to weather and click on Import.

  4. A new data frame called weather should now appear in the Environment; because it has the same name, it replaces the data frame created in the previous practice. Open the new data frame in a new tab and explore the data.
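The import can also be done in code. The sketch below assumes weather.csv sits in the working directory for the commented lines; since that file may not be at hand, the runnable part writes a small data frame built from the earlier practice values to a temporary CSV file and reads it back with base R's read.csv().

```r
# With weather.csv in the project folder, either line imports it:
#   weather <- read.csv("weather.csv")         # base R, returns a data.frame
#   weather <- readr::read_csv("weather.csv")  # readr, returns a tibble

# Self-contained check of read.csv() using a temporary file
practice <- data.frame(
  days = c("Monday", "Tuesday", "Wednesday"),
  humidity = c(34, 34, 34),
  temperatures = c(23, 23, 12)
)
csv.path <- tempfile(fileext = ".csv")
write.csv(practice, csv.path, row.names = FALSE)

imported <- read.csv(csv.path)
str(imported)      # whole numbers are read back as integers
unlink(csv.path)   # remove the temporary file
```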

Practice - Extracting some information from a data frame.

  1. Let’s show the humidity3pm column; type weather[ ,13].

  2. Let’s show the first 3 rows of the data frame: type weather[1:3,]

  3. Let’s show all the temperatures; type weather[,3]

  4. Let’s show the first two columns of the data frame; type weather[,1:2]

  5. Type weather[1:3,1:2]. The first 3 rows and first two columns are shown.

  6. Show the values of the last three rows.

  7. Show the values of the second column.

  8. Show the values of the last three rows and the second column.

  9. Try other combinations of your own.
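One possible answer to steps 6 to 8 is sketched below. It uses the seven-day practice data frame rather than weather.csv, so that the expected values are known; the same expressions apply to the imported data. tail() and negative indices are added alternatives not covered above.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
humidity <- c(34, 34, 34, 90, 80, 70, 0)
temperatures <- c(23, 23, 12, 9, 12, 19, 20)
weather <- data.frame(days, humidity, temperatures)

n <- nrow(weather)       # number of rows: 7
weather[(n - 2):n, ]     # step 6: the last three rows
weather[, 2]             # step 7: the second column
weather[(n - 2):n, 2]    # step 8: last three rows of the second column: 80 70 0

tail(weather, 3)         # the same rows as step 6, using tail()
weather[-1, ]            # a negative index drops rows: all but the first
```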

 

What can I learn next?

Understanding the communication with R

DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki