Reading in data

In your working folder there will be a file called ukgas.csv. UKgas is a dataset of quarterly UK gas consumption from 1960 Q1 to 1986 Q4, in millions of therms. It ships with R as a built-in dataset, but here we will use the csv copy to learn how to read a file in.



  • In R, set your working directory. This is where all your work will be saved.


#R
setwd("C:/FILE_PATH_TO_DATA_AND_RSCRIPTS") # use forward slashes in the path
  • Download ukgas.csv and save it to the working directory set above
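
Before reading the file, it can help to confirm that R is looking in the right place. The short sketch below uses only base R functions and assumes the file has been saved under the name ukgas.csv, as described above.

```r
# Show the current working directory
getwd()

# List any csv files R can see in the working directory
list.files(pattern = "\\.csv$")

# TRUE if ukgas.csv is in the working directory, FALSE otherwise
file.exists("ukgas.csv")
```

If the last line returns FALSE, check the path given to setwd() and where the file was saved.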



Remember to comment your code e.g.

#this is a comment

or even create code sections e.g.

###################################
#NEW SECTION
################################### 

?read.csv # look up read.csv in the help file
file1 <- read.csv("ukgas.csv") # read the csv into a data frame called file1
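
If you do not have ukgas.csv to hand, the round trip can still be tried with the built-in UKgas dataset. The sketch below is an illustration only: it assumes the csv has one row per year with columns named year, qtr1, qtr2, qtr3 and qtr4 (the names used later in this practical), and the object names demo, tmp and file_demo are made up for this example.

```r
# UKgas ships with R as a quarterly time series (1960 Q1 to 1986 Q4).
# Reshape it so each row is a year and each column a quarter
# (assumed layout of ukgas.csv, not confirmed by the file itself).
gas <- matrix(as.numeric(UKgas), ncol = 4, byrow = TRUE)
colnames(gas) <- c("qtr1", "qtr2", "qtr3", "qtr4")
demo <- data.frame(year = 1960:1986, gas)

# Write to a temporary csv, then read it back as we would ukgas.csv
tmp <- tempfile(fileext = ".csv")
write.csv(demo, tmp, row.names = FALSE)
file_demo <- read.csv(tmp)

str(file_demo) # column names and types
dim(file_demo) # number of rows and columns
```

Running str() straight after read.csv() is a quick way to check that the columns were read with the names and types you expect.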

Exploring the data

file1 # this displays all data in file1
head(file1) # this displays the first six rows
tail(file1) # this displays the last six rows
file1[1,] # this displays the first row
file1[1:5,] # this displays rows 1 to 5
file1[,1] # this displays the first column
file1[,1:5] # this displays columns 1 to 5
file1[,'year'] # this displays the column named year
file1[,'qtr1'] # this displays the column named qtr1
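
A few other base R functions are useful when exploring a data frame. The sketch below uses a stand-in called demo, built from the built-in UKgas dataset, so that it runs without the csv; it assumes the same column names as file1 (year, qtr1 to qtr4).

```r
# Stand-in for file1 built from the built-in UKgas dataset
# (assumption: ukgas.csv has columns year, qtr1, qtr2, qtr3, qtr4)
demo <- data.frame(year = 1960:1986,
                   matrix(as.numeric(UKgas), ncol = 4, byrow = TRUE,
                          dimnames = list(NULL, paste0("qtr", 1:4))))

names(demo)               # column names
nrow(demo)                # number of rows
demo$year                 # same values as demo[,'year']
demo[demo$year >= 1980, ] # only the rows from 1980 onwards
summary(demo$qtr1)        # min, quartiles, mean and max of qtr1
```

The $ form is shorthand for selecting a single column by name; the square bracket form is more general because it can select several rows and columns at once.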


You can assign your selected data to a new variable name using <- or =

year <- file1[,'year'] 
qtr1 = file1[,'qtr1']

It is best practice to use <- to assign a value to a new variable x, rather than =, which can be read as "x equals the value". = is typically reserved for naming arguments within function calls.
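
The difference between the two operators shows up inside a function call. The following is a minimal sketch using made-up values; the variable names y, a and b are for illustration only.

```r
# Inside a function call, = names an argument; it does not create a variable
mean(x = c(2, 4, 6))

# <- assigns wherever it appears, so this call also creates y as a side effect
mean(y <- c(2, 4, 6))
y

# At the top level of a script both forms assign the same value
a <- 10
b = 10
identical(a, b)
```

Keeping <- for assignment and = for arguments makes it easier to see at a glance which lines create variables.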

Plotting data

?plot

plot(x = year, y = qtr1)

Writing the data

?png 
 
png(file="plot1.png") # opens the png printing driver, output to appear in plot1.png
plot(x = year, y = qtr1, type = 'b', col = 'red') # this plot is printed
dev.off() # print driver closed

A pdf output can be printed using:

pdf(file="plot1.pdf", height=7, width=12)
#where height and width are given in inches
plot(x = year, y = qtr1, type = 'l', col = 'red')
dev.off()
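
To check that a plot really was written, you can test for the output file once the device has been closed. The sketch below writes to R's temporary directory so that nothing in your working directory is overwritten; the file name plot_demo.pdf and the stand-in data frame demo (built from the built-in UKgas dataset, assuming columns year and qtr1 to qtr4) are made up for this example.

```r
# Stand-in for file1 built from the built-in UKgas dataset
# (assumption: ukgas.csv has columns year, qtr1, qtr2, qtr3, qtr4)
demo <- data.frame(year = 1960:1986,
                   matrix(as.numeric(UKgas), ncol = 4, byrow = TRUE,
                          dimnames = list(NULL, paste0("qtr", 1:4))))

# Hypothetical output file in the temporary directory
out <- file.path(tempdir(), "plot_demo.pdf")

pdf(file = out, height = 7, width = 12) # open the pdf device
plot(x = demo$year, y = demo$qtr1, type = 'l', col = 'red')
dev.off()                               # close the device to finish the file

file.exists(out) # should be TRUE if the file was written
```

Forgetting dev.off() is a common reason for an empty or unreadable output file, because the device is still open.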


Beautifying plots

R builds plots in layers. You start with a base plot using the plot function and can then add layers of extra data, regression lines, legends, text, equations etc. on top of it using:

#look up points in help file
?points
 
#assign qtr2 data to variable called qtr2
qtr2 <- file1[,'qtr2']
 
#plot qtr 1
plot(x = year, y = qtr1, type = 'l', col = 'red')
 
#add qtr2 data
points(x = year, y = qtr2, type = 'l', col = 'black')


The arguments that can be applied in the points function are very similar to the plot function.
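
As a sketch of how shared arguments carry over, the example below labels the base plot and adds a second series with lines(), which draws the same thing as points() with type = 'l'. The axis labels and title are illustrative wording, and demo is a stand-in for file1 built from the built-in UKgas dataset (assuming columns year and qtr1 to qtr4).

```r
# Stand-in for file1 built from the built-in UKgas dataset
# (assumption: ukgas.csv has columns year, qtr1, qtr2, qtr3, qtr4)
demo <- data.frame(year = 1960:1986,
                   matrix(as.numeric(UKgas), ncol = 4, byrow = TRUE,
                          dimnames = list(NULL, paste0("qtr", 1:4))))

# Base plot with axis labels and a title (label text is illustrative)
plot(x = demo$year, y = demo$qtr1, type = 'l', col = 'red',
     xlab = 'Year', ylab = 'Gas consumption (millions of therms)',
     main = 'UK quarterly gas consumption')

# Add qtr2 as a dashed line; col, lty and lwd work here as they do in plot()
lines(x = demo$year, y = demo$qtr2, col = 'black', lty = 2, lwd = 2)
```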

#look up legend in help file
?legend


#assign qtr2 data to variable called qtr2
qtr2 <- file1[,'qtr2']

#plot qtr 1
plot(x = year, y = qtr1, type = 'l', col = 'red')

#add qtr2 data
points(x = year, y = qtr2, type = 'l', col = 'black')
 
#add legend
legend(x = 'topleft', y = NULL, legend = c('qtr1', 'qtr2'), col = c('red', 'black'), lty = 1)
 

If you add qtr3 in the same way, you will see that its data plot off the bottom of the y axis, because the axis was scaled to fit qtr1. Use the min and max functions to identify the minimum of qtr3 and the maximum of qtr1, then use the ylim argument in plot() to set the lower and upper limits of the y axis, as in the example below (where they are denoted by i and j, respectively).

#assign qtr3 and qtr4 data to variables
qtr3 <- file1[,'qtr3']
qtr4 <- file1[,'qtr4']

#replace i and j with the min of qtr3 and the max of qtr1
plot(x = year, y = qtr1, type = 'l', col = 'red', ylim=c(i,j))
points(x = year, y = qtr2, type = 'l', col = 'black')
points(x = year, y = qtr3, type = 'l', col = 'blue')
points(x = year, y = qtr4, type = 'l', col = 'green')
legend(x = 'topleft', y = NULL, legend = c('qtr1', 'qtr2','qtr3', 'qtr4'), col = c('red', 'black', 'blue', 'green'), lty = 1)
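
An alternative to picking the limits by hand is to let R compute them across all four quarters with range(), which returns the minimum and maximum together. This is a sketch rather than the expected answer; demo is a stand-in for file1 built from the built-in UKgas dataset (assuming columns year and qtr1 to qtr4), and ylims is a made-up variable name.

```r
# Stand-in for file1 built from the built-in UKgas dataset
# (assumption: ukgas.csv has columns year, qtr1, qtr2, qtr3, qtr4)
demo <- data.frame(year = 1960:1986,
                   matrix(as.numeric(UKgas), ncol = 4, byrow = TRUE,
                          dimnames = list(NULL, paste0("qtr", 1:4))))

# Smallest and largest value across all four quarter columns
ylims <- range(unlist(demo[, c('qtr1', 'qtr2', 'qtr3', 'qtr4')]))

plot(x = demo$year, y = demo$qtr1, type = 'l', col = 'red', ylim = ylims)
points(x = demo$year, y = demo$qtr2, type = 'l', col = 'black')
points(x = demo$year, y = demo$qtr3, type = 'l', col = 'blue')
points(x = demo$year, y = demo$qtr4, type = 'l', col = 'green')
legend(x = 'topleft', legend = c('qtr1', 'qtr2', 'qtr3', 'qtr4'),
       col = c('red', 'black', 'blue', 'green'), lty = 1)
```

Computing the limits from the data means the axis still fits if the data change or another series is added to the selection.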




You have now completed Exploring the data: the basics. Make sure you:

  • comment your script appropriately
  • save your script somewhere sensible
  • check that your script is similar to the example answer script.


Extended R Practical: Analysing simulated ALSPAC data