What is the R programming language?


An overview of R

R is an established programming language for statistics and mathematics; it is currently in the top 25 programming languages in the TIOBE Index. It provides advanced statistical and visualization tools that work efficiently on sizeable data sets. R is free and open-source, and it runs on many operating systems, including Unix, Linux, macOS, and Windows. It is used across disciplines (environmental sciences, geography, ecology, medical statistics, statistics, social sciences) and is commonly taught in undergraduate degrees. Other popular languages for statistical programming include MATLAB and Python. R is a functional programming language: programmers use functions to carry out data investigations, and they can create their own functions too.
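As a minimal sketch of the functional style described above, the following R snippet calls two built-in functions and then defines a custom one. The measurements and the function name `standard_error` are invented for illustration.

```r
# Hypothetical measurements, invented for this example
heights <- c(1.62, 1.75, 1.80, 1.68, 1.71)

# Built-in functions carry out common data investigations
print(mean(heights))
print(sd(heights))

# Programmers can also create their own functions
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

print(standard_error(heights))
```

Once defined, `standard_error` can be called on any numeric vector, in the same way as a built-in function.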

R supports several standard file formats and can be integrated with libraries written in other programming languages. The R community is proud of its extensive documentation, which includes The R Journal, the CRAN documentation, and a LaTeX-like documentation format. R releases have often been named after Peanuts comic strips and the season of the release; for example, the Great Pumpkin was released in the autumn.



Let's compare statistical programming languages with spreadsheets


| Spreadsheet | Statistical programming languages |
| --- | --- |
| Spreadsheets store data in a tabular format. Data stored in databases and standard file formats can be imported into a spreadsheet. Excel allows at most 1,048,576 rows by 16,384 columns. | Statistical programming languages can also import data from standard file formats and databases. In R, a data frame is limited to 2,147,483,647 rows and 2,147,483,647 columns. Other structures, such as vectors and single variables, can also be used. |
| The calculations and the data are both stored in the same file. | Calculations and data can be kept separate. An analysis can be run again on different data without rewriting all the formulae. |
| Spreadsheets provide some graphical representation tools. More advanced statistical representations are sometimes not available. | Statistical programming languages often provide a wider range of graphical representation tools, and the resulting plots can usually be formatted and refined. |
| Sharing and mining data through spreadsheets can be a difficult task. Spreadsheets are not guaranteed to be structured identically, and the number of rows and columns can vary too. | Statistical programming languages provide advanced tools to clean, mine, merge and analyse data from several sources and formats. Scripts can be shared separately from the data, which promotes code re-use. Data can be exported and shared using valid, standard file formats. |
| Spreadsheets are often provided as part of an office package, e.g. LibreOffice, OpenOffice and Microsoft Office. Some spreadsheets are free; others require a licence to be purchased. | Code in a statistical programming language is usually written in an Integrated Development Environment (IDE). A lexicon provides the list of keywords available to programmers so that an interpreter can interpret the code. The interpreter is often integrated with an IDE but can also be run directly from a command-line interface. Some statistical programming languages and IDEs are free; others require a licence to be purchased. |
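To illustrate how calculations can be kept separate from data, here is a small sketch in R. The two data frames and the function `summarise_scores` are hypothetical; in practice the data would usually be imported from a file, for example with `read.csv()`.

```r
# The calculation is written once, independently of any data set
summarise_scores <- function(data) {
  c(n = nrow(data), mean = mean(data$score), max = max(data$score))
}

# Two hypothetical data sets that share the same structure
class_a <- data.frame(student = c("A1", "A2", "A3"), score = c(12, 15, 18))
class_b <- data.frame(student = c("B1", "B2"), score = c(9, 20))

# The same analysis runs on both without rewriting any formula
print(summarise_scores(class_a))
print(summarise_scores(class_b))
```

Because the script contains no data of its own, it can be shared and re-used without sharing the data it was first written for.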

How does R work? ("not quite advanced" and "advanced" readers)

R is an interpreted programming language. At execution time, the interpreter reads and evaluates the code one expression at a time. Unlike compiled and assembled programming languages, R code does not have to be compiled and assembled into machine-language instructions before it can run. For that reason, an R script written on Linux can be executed on a Windows machine. However, because nothing is checked ahead of time, an error can stop a script half-way through its execution.
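The following sketch shows what stopping half-way can look like. The helper `run_steps` is hypothetical; `tryCatch()` is part of base R and is used here only so that the example itself finishes and can report how far it got.

```r
# Each expression is evaluated in turn; nothing is checked ahead of time
run_steps <- function() {
  completed <- character(0)
  tryCatch({
    x <- 10
    completed <- c(completed, "step 1")
    y <- x + 5
    completed <- c(completed, "step 2")
    z <- y + "a"                          # fails: a number cannot be added to text
    completed <- c(completed, "step 3")   # never reached
  }, error = function(e) {
    message("Stopped with error: ", conditionMessage(e))
  })
  completed
}

print(run_steps())
```

The first two steps complete before the faulty line is reached, which is why partial results or side effects can remain after a failed run.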


R has a lexicon of reserved keywords that are used by the R interpreter, which evaluates expressions against the R language definition. The language defines the data types of variables, the arithmetic, assignment, and boolean operators, and a set of programming constructs. The R language definition is given in the additional reading list.
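As a brief illustration of these building blocks, the sketch below uses a few data types, operators and constructs from base R. The variable names and the function `describe` are made up for the example.

```r
# Data types are attached to values rather than declared in advance
print(class(42))       # numeric
print(class(42L))      # integer
print(class("text"))   # character
print(class(TRUE))     # logical

# Assignment, arithmetic and boolean operators
radius <- 2
area <- pi * radius^2
is_large <- area > 10 && radius < 5

# Programming constructs built from reserved words (function, if, else, for)
describe <- function(value) {
  if (value > 10) "large" else "small"
}

for (v in c(3, area)) {
  print(describe(v))
}

# Reserved words cannot be used as variable names;
# the help page ?Reserved lists them
```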

Installation of R 


R can be installed using the preferred method of installing software for your operating system; some links are provided in the additional reading list.

First, download the installer. Then, run it to create the R framework. The main folders (or directories) of an R framework are described below. This complex structure, inspired by the traditional Unix folder structure, is required to execute and interpret any R code. The R framework is written mainly in R, C and Fortran, and it can interface with code written in other languages such as C++ and Java. For that reason, R is easily extensible.

  • The bin folder is short for binary, not "trash" or "waste bin". It contains all the programs executed by the R framework; its content is specific to each operating system.
  • The etc folder stores all the configuration files.
  • The include folder contains C header files.
  • The lib folder stores some components developed in C++.
  • The library folder contains all the R packages installed in an environment.
  • The R command opens an R environment from the command prompt (terminal) or an IDE.
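The folder structure can also be inspected from within R itself. The sketch below uses the base R functions `R.home()`, `list.files()` and `.libPaths()`; the variable name `r_root` is arbitrary, and the folders listed depend on the operating system and on how R was installed.

```r
# Root folder of the R installation currently in use
r_root <- R.home()
print(r_root)

# Top-level folders; the exact list varies between operating systems
print(list.files(r_root))

# Individual components can be located directly
print(R.home("bin"))   # executables
print(R.home("etc"))   # configuration files

# Folders searched for installed packages
print(.libPaths())
```

These calls only read information, so they are a safe way to explore an installation without editing any of its files.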

Some warnings:

  • It is advisable not to edit any of these files. Otherwise, R may stop working. 
  • It is highly recommended to check the version of R supported by any libraries and packages. Otherwise, these libraries may stop working.
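A version check along these lines can be sketched with base R functions. The package `stats` is used because it ships with R, and the threshold `"3.5.0"` is an arbitrary value chosen for the example, not a recommendation.

```r
# Version of R currently running
print(R.version.string)
print(getRversion())

# Version of an installed package ("stats" ships with R)
print(packageVersion("stats"))

# The Depends field, when present, states the R version a package expects
print(packageDescription("stats")$Depends)

# A script can refuse to run on an R version it was not written for;
# the threshold below is an arbitrary example
if (getRversion() < "3.5.0") {
  stop("This script assumes R 3.5.0 or later")
}
```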



What can I learn next?

Become familiar with RStudio and the basic concepts of R



DataSHIELD Wiki by DataSHIELD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.datashield.ac.uk/wiki