Basic Intro to R
In this article, we talk about the R programming language. We discuss its uses, the niche group of people who benefit most from it, and its advantages over some other programming languages for its specific purpose. Later, we dive deeper into its data types and get hands-on experience with programming.
R Programming Language (R): R is a programming language that is widely used for statistical computing and graphics. The R Foundation for Statistical Computing supports the R language and provides it as a free software environment. Statisticians and data miners widely use R to develop statistical software and perform data analysis. It is a strong tool for statistical operations and, since it is open source, it is well supported by the R community too.
We took a dive into Python with "An Approach to Data Analytics Using Python." There are several scenarios in which you might choose R over Python. First of all, if you employ a lot of statistics, R is elegant for this. Many seasoned statisticians prefer R over Python in day-to-day work. Many who have used both Python and R agree that the first learning steps are easier in R. R also has exceptional visualization tools, and many users find visualization with R packages more convenient than in Python.
Today, we’ll learn to run R, open files and do some simple data-wrangling. We’ll experiment with input file formats and do some calculations with R. After that, we’ll dive into Data Frames.
First of all, please download RStudio, which is freely available at https://www.rstudio.com/.
After that, install R on your system using this link.
RStudio
RStudio is a really nice Integrated Development Environment (IDE) for R. It enables statisticians and researchers to develop statistical programs.
In this article, we’ll demonstrate how to use RStudio, load in a file and run code.
Data Syntax
Syntax can be defined as the way statements are structured in a language. We can understand it by thinking of grammar in natural languages. Some of the syntax for working with R-objects is as follows:
Data frames: df <- data.frame(matrix(c(1, 2, 4, 9, "a", "r"), nrow = 2))
Data Structures
A data structure can be defined as a particular method of organizing data so that it can be used effectively. Data can be stored in a variety of ways, such as the following:
- Array
- Linked List
- Stack
- Queue
- Binary Tree
- Binary Search Tree
- Heap
- Graph
- Matrix
Data types
In contrast to other programming languages like Java and C, variables in R are not declared as a particular data type. Variables in R are assigned R-objects, and the data type of the R-object becomes the variable's data type. Data can be structured as numeric values, characters and factors.
Some of the commonly used R-objects are as follows:
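To see what this means in practice, here is a minimal sketch in which one variable name is reassigned to objects of different types. The variable names are purely illustrative.

```r
# The same variable can hold objects of different types over time;
# its type is simply the type of the object currently assigned to it.
x <- 42L             # an integer object
type_1 <- class(x)

x <- "forty-two"     # now a character object
type_2 <- class(x)

x <- TRUE            # now a logical object
type_3 <- class(x)

print(c(type_1, type_2, type_3))
```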
Vectors
Vectors are the most basic R data objects. The six types of atomic vectors are logical, integer, double, complex, character and raw.
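As a quick illustration, the sketch below builds one small value of each atomic type and asks R for its type with typeof(). The example values are arbitrary.

```r
# One example value for each of the six atomic vector types.
atomic_examples <- list(
  logical   = TRUE,
  integer   = 7L,             # the L suffix makes an integer
  double    = 3.14,
  complex   = 2 + 3i,
  character = "R",
  raw       = charToRaw("R")  # raw bytes of the string "R"
)

# typeof() reports the underlying storage type of each value.
types <- sapply(atomic_examples, typeof)
print(types)
```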
Using the colon operator with numeric data, a sequence of consecutive values can be created and stored in a variable such as v.
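A minimal sketch of the colon operator follows. The specific start and end values are assumptions chosen for illustration.

```r
# The colon operator creates a sequence in steps of 1.
v <- 5:13
print(v)

# It also counts downwards when the start is larger than the end.
countdown <- 3:1
print(countdown)
```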
Lists
A list in R is created using the list() function. It can contain elements of different types, such as vectors, numbers, strings and even another list.
Creating a List
-
- list_data <- list("Yellow", "Brown", c(1,2,3), FALSE, 35.53, 999)
- print(list_data)
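Once a list exists, its elements can be pulled out by position. The sketch below reuses the same list and contrasts double brackets, which return the element itself, with single brackets, which return a shorter list.

```r
list_data <- list("Yellow", "Brown", c(1, 2, 3), FALSE, 35.53, 999)

# [[ ]] extracts the element itself; [ ] returns a sub-list.
third_element <- list_data[[3]]
third_sublist <- list_data[3]

print(class(third_element))
print(class(third_sublist))

# Each element keeps its own type inside the list.
element_types <- sapply(list_data, class)
print(element_types)
```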
Matrices
Matrices in R consist of elements of the same atomic type, arranged in a two-dimensional rectangular layout.
Creating Matrices
-
- matrix1 <- matrix(c(1, 2, -1, 2, 8, 9), nrow = 2)
- print(matrix1)
- matrix2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
- print(matrix2)
-
- result <- matrix1 + matrix2
- cat("Result of addition","\n")
- print(result)
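It helps to know how matrix() fills its cells. The following sketch, with illustrative names, shows the default column-wise fill, the byrow option, and how to read a single cell.

```r
# By default, values fill the matrix column by column.
m_col <- matrix(1:6, nrow = 2)

# With byrow = TRUE, values fill the matrix row by row instead.
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(m_col)
print(m_row)

# Cells are addressed as [row, column].
cell_col <- m_col[1, 2]
cell_row <- m_row[1, 2]
print(c(cell_col, cell_row))
print(dim(m_col))
```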
Arrays
An array can store data in more than two dimensions. It is created using array(), which takes vectors as input and builds the array from the values of the dim parameter.
-
- vector1 <-c(0,1,2)
- vector2 <-c(5,6,7,8,9,10)
- column.names <- c("COL1","COL2","COL3")
- row.names <- c("ROW1","ROW2","ROW3")
- matrix.names <- c("Matrix1","Matrix2")
-
- result <- array(c(vector1,vector2),dim =c(3,3,2),dimnames = list(row.names,column.names,matrix.names))
- print(result)
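Array elements can then be read back by position or by dimension name. Here is a small sketch that assumes an array filled with the values 1 to 18 so the positions are easy to follow.

```r
# A 3 x 3 x 2 array filled with 1..18 (illustrative values).
arr <- array(
  1:18,
  dim = c(3, 3, 2),
  dimnames = list(
    c("ROW1", "ROW2", "ROW3"),
    c("COL1", "COL2", "COL3"),
    c("Matrix1", "Matrix2")
  )
)

# Index with [row, column, matrix], by name or by number.
elem <- arr["ROW2", "COL3", "Matrix2"]
print(elem)

# Leaving an index empty selects everything along that dimension.
first_matrix <- arr[, , "Matrix1"]
print(first_matrix)
```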
Factors
Factors are data objects used to categorize data and store it as levels. Both integers and strings can be stored as factors. They are highly useful for data analysis and statistical modeling. Factors are created using factor().
Using Factor
-
- data <- c("Right","Left","Right","Up","Up","Right","Left")
- print(data)
- print(is.factor(data))
-
- factor_data <- factor(data)
- print(factor_data)
- print(is.factor(factor_data))
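To look inside a factor, we can inspect its levels, count how often each level occurs, and view the integer codes R stores behind the labels. A short sketch using the same direction data:

```r
directions <- factor(c("Right", "Left", "Right", "Up", "Up", "Right", "Left"))

# The distinct categories, sorted alphabetically by default.
direction_levels <- levels(directions)
print(direction_levels)

# How many times each level occurs.
counts <- table(directions)
print(counts)

# Internally, each value is stored as an integer code pointing at a level.
codes <- as.integer(directions)
print(codes)
```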
Data Frames
Data frames are two-dimensional, heterogeneous tabular data structures that are mutable in size, i.e., their size can be changed.
- df <- data.frame(matrix(c(2,4,6,8,"C#","Corner"), ncol=3))
In order to read Excel files, we need a library: library(readxl)
Accessing data frames:
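For plain CSV files, base R is enough. The sketch below is only an illustration: it writes a tiny, made-up CSV to a temporary file and reads it back with read.csv(), so it runs without installing any package. The file contents and column names are assumptions, not data from this article.

```r
# Create a small, hypothetical CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ana,90", "Ben,85"), csv_path)

# read.csv() returns the file contents as a data frame.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)
print(scores)
str(scores)

# Remove the temporary file again.
unlink(csv_path)
```

With readxl loaded, an Excel file would be read in a similar way using read_excel() and the path to the workbook.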
- df[rows,columns]
- df$columns[row]
-
- emp.data <-data.frame(
- emp_id = c (1:5),
- emp_name = c("Ram","Laxman","Bharat","Shatrughan","Krishna"),
- salary = c(909.23,642.18,871.0,347.60,890.05),
- start_date =as.Date(c("2020-01-01","2021-02-12","2018-12-05","2019-06-14","2017-03-17")),
- stringsAsFactors = FALSE
- )
-
- print(emp.data)
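The two access patterns above, df[rows,columns] and df$columns[row], can be tried on a smaller version of the employee table. This is a sketch with a trimmed-down data frame; the bonus column is a hypothetical addition to show that a data frame can grow.

```r
emp <- data.frame(
  emp_id   = 1:3,
  emp_name = c("Ram", "Laxman", "Bharat"),
  salary   = c(909.23, 642.18, 871.0),
  stringsAsFactors = FALSE
)

# df[rows, columns]: one cell by row number and column name.
second_name <- emp[2, "emp_name"]

# df$column[row]: pick the column first, then the row.
third_salary <- emp$salary[3]

# A logical condition in the row position filters rows.
high_earners <- emp[emp$salary > 800, ]

# Data frames are mutable in size: assigning a new column adds it.
emp$bonus <- emp$salary * 0.1

print(second_name)
print(third_salary)
print(high_earners)
print(ncol(emp))
```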
Calculating stuff within R
You can transform the values of a variable with the following operators and functions:
- Addition, subtraction, multiplication, division (+, -, *, /)
- Square root: sqrt()
- Logarithm: log(), log10(), log2()
- Power: x^2
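Here is a short sketch that runs each of these on simple numbers. The input values are arbitrary and were picked so the results are easy to check by hand.

```r
x <- 16

total    <- x + 4      # addition
quotient <- x / 4      # division
root     <- sqrt(x)    # square root
squared  <- x^2        # power

log_nat  <- log(exp(1))  # natural logarithm of e
log_ten  <- log10(1000)  # base-10 logarithm
log_two  <- log2(x)      # base-2 logarithm

print(c(total, quotient, root, squared))
print(c(log_nat, log_ten, log_two))
```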
A function is a set of statements organized together to perform a particular task. R has a huge number of built-in functions, and users are also free to create their own. In the R programming language, a function is in fact an object. Built-in functions are predefined operations designed to process data in a certain way.
Examples
mean(), max(), seq(), c(), as.numeric() and read_excel() (from the readxl package) are all functions.
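Writing your own function looks like the sketch below. The function name range_width is hypothetical and chosen only for this example; it combines the built-in max() and min().

```r
# A user-defined function: the spread between the largest and smallest value.
range_width <- function(x) {
  max(x) - min(x)
}

width <- range_width(c(3, 9, 4))
print(width)

# A function is itself an object, so it can be inspected like one.
print(is.function(range_width))
print(class(range_width))

# Built-in functions are called the same way.
average <- mean(c(3, 9, 4))
steps   <- seq(1, 9, by = 2)
print(average)
print(steps)
```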
Differences between a Data Frame and a Matrix
A data frame contains a collection of "things" (rows), each with a set of properties (columns) that can be of different types. When every value is of the same type, the data is better thought of as a matrix. In a data frame the columns can contain different types of data, but in a matrix all the elements are the same type of data.
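The difference is easy to see when numbers and text are mixed. In this sketch, with illustrative values, the matrix converts everything to character while the data frame keeps a separate type per column.

```r
# In a matrix, mixed input is coerced to a single type (here: character).
m <- matrix(c(1, 2, "a", "b"), nrow = 2)
matrix_type <- typeof(m)
print(matrix_type)

# In a data frame, each column keeps its own type.
d <- data.frame(
  num = c(1, 2),
  chr = c("a", "b"),
  stringsAsFactors = FALSE
)
column_types <- sapply(d, class)
print(column_types)
```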
Conclusion
Today, we learned about the R programming language, which is widely used by statisticians. We started by installing RStudio and R, and then went on to learn about Data Syntax, Variables, Data Structures and Data Types, going deeper into each of them. They will come in handy in the future when we work with data and want to run various functions on it to reach our goals. In the next article we'll dive deeper into the R programming language and learn about Modeling and Visualization in R.