Introduction
R ships with many predefined functions for statistical analysis of data. Functions such as mean() and median() are available in a standard R installation; they take a vector as input and return the result. The statistical mode is the exception: R has no predefined function that returns the most frequent value, and the built-in mode() function reports the storage mode of an object instead. In this article, I will demonstrate how to calculate the mode of the observations in a variable of a dataset using a small user-defined function.
Calculating mode
The mode of a variable in a dataset is the observation that occurs more often than any other observation in that variable. Unlike mean() and median(), there is no predefined function in R that calculates it: the built-in mode() function returns the storage mode of an object (for example "numeric"), not its most frequent value. The mode therefore has to be calculated with a user-defined function.
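Note that base R's own mode() function reports how an object is stored rather than its most frequent value. A minimal sketch of that behaviour, using a made-up vector x that is not part of the article's data:

```r
# Illustrative vector (an assumption for this sketch): 2 is the most frequent value
x <- c(1, 2, 2, 3)

# Base R's mode() describes how the object is stored, not which value is most common
storage <- mode(x)
print(storage)

# The same holds for character data
char_storage <- mode(c("a", "b", "b"))
print(char_storage)
```

This is why the rest of the article relies on a user-defined function instead.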
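The user-defined function calcmode(), defined later in this article, takes a single vector. It has no trim or na.rm arguments (trim is an argument of mean(), and na.rm of mean() and median()), so trimming and removal of missing values are applied to the vector before it is passed in. The calls take the following forms,
- calcmode(dataset_name$variable_name)
- calcmode(trimmed_values)
- calcmode(dataset_name$variable_name[!is.na(dataset_name$variable_name)])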
Now to calculate the mode I will be using one of the predefined datasets available in R's datasets package. We will be using the mtcars dataset to calculate the mode of different variables in it.
- > mtcars
- mpg cyl disp hp drat wt qsec vs am gear carb
- Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
- Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
- Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
- Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
- Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
- Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
- Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
- Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
- Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
- Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
- Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
- Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
- Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
- Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
- Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
- Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
- Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
- Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
- Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
- Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
- Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
- Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
- AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
- Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
- Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
- Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
- Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
- Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
- Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
- Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
- Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
- Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
- >
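Now we will create a user-defined function to calculate the mode of a variable in the dataset. It collects the distinct values with unique(), counts how often each one occurs with match() and tabulate(), and returns the value with the highest count. When several values tie, which.max() picks the first of them. The function is defined as follows,
- > calcmode <- function(a) {
- + # distinct values, in order of first appearance
- + vals <- unique(a)
- + # count each distinct value and return the most frequent one (first one on ties)
- + vals[which.max(tabulate(match(a, vals)))]
- + }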
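To make the one-line body of calcmode() easier to follow, the sketch below runs the same steps one at a time. The vector a and the intermediate names vals, idx and counts are chosen here for illustration only:

```r
# Illustrative vector: 7 occurs three times, 4 twice, 1 once
a <- c(4, 7, 7, 1, 4, 7)

vals <- unique(a)                   # distinct values in order of first appearance: 4 7 1
idx <- match(a, vals)               # position of each element within vals: 1 2 2 3 1 2
counts <- tabulate(idx)             # occurrences per distinct value: 2 3 1
result <- vals[which.max(counts)]   # value with the highest count

print(result)
```

Because which.max() returns the position of the first maximum, a tie is resolved in favour of the value that appears earliest in the input.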
Now we will calculate the mode of variables of the mtcars dataset.
- > ds <- mtcars
- > var <- calcmode(ds$mpg)
- > var
- [1] 21
In the above code, the mode of the mpg variable of the mtcars dataset is calculated. The dataset is assigned to the variable ds and the user-defined calcmode() function is called with the mpg variable as its argument. Seven mpg values occur twice each (21.0, 22.8, 21.4, 19.2, 15.2, 10.4 and 30.4), and calcmode() returns 21 because it is the first of them in the data.
- > ds <- mtcars
- > var <- calcmode(ds$cyl)
- > var
- [1] 8
In the above code, the mode of the cyl variable is calculated in the same way by passing ds$cyl to the user-defined calcmode() function. The value 8 occurs in 14 of the 32 rows, more often than any other number of cylinders.
- > ds <- mtcars
- > var <- calcmode(ds$disp)
- > var
- [1] 275.8
In the above code, the mode of the disp variable is calculated by passing ds$disp to the user-defined calcmode() function. The displacement 275.8 occurs three times, once for each of the three Merc 450 models, and no other value occurs that often.
- > ds <- mtcars
- > var <- calcmode(ds$hp)
- > var
- [1] 110
- >
In the above code, the mode of the hp variable is calculated by passing ds$hp to the user-defined calcmode() function. Three values (110, 175 and 180) occur three times each, and calcmode() returns 110 because it appears first in the data.
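Since hp has more than one value with the highest count, it can be useful to see all of them rather than only the first. The following is a minimal sketch; all_modes is a hypothetical helper name introduced here, not a base R function:

```r
# Hypothetical helper: return every value that shares the highest count
all_modes <- function(x) {
  vals <- unique(x)                    # distinct values, first appearance order
  counts <- tabulate(match(x, vals))   # occurrences of each distinct value
  vals[counts == max(counts)]          # keep all values tied for the maximum
}

hp_modes <- all_modes(mtcars$hp)
print(hp_modes)    # 110 175 180

cyl_modes <- all_modes(mtcars$cyl)
print(cyl_modes)   # 8
```

A variable with a single mode, such as cyl, yields a vector of length one, so the helper can be used in place of calcmode() when ties matter.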
The function works on any vector, not only on variables of a dataset,
- a <- c(9, 6, 2, 55, 60, 35, 55, -31, 9, -5, 15)
- var <- calcmode(a)
- print(var)
It will generate the following output,
- > a <- c(9, 6, 2, 55, 60, 35, 55, -31, 9, -5, 15)
- > var <- calcmode(a)
- > print(var)
- [1] 9
- >
Using the above code, we have created a vector named a having 11 values. The vector is passed as an argument to the calcmode() function, and the result is assigned to the variable var. Both 9 and 55 occur twice; calcmode() returns 9 because it appears first in the vector.
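To see how often each value occurs, and whether several values share the highest count, base R's table() can serve as a cross-check. A minimal sketch using the same vector; the names freq, freq_sorted and top are chosen for illustration:

```r
a <- c(9, 6, 2, 55, 60, 35, 55, -31, 9, -5, 15)

# table() counts each distinct value; the names are the values as character strings
freq <- table(a)

# Sort so that the most frequent values are listed first
freq_sorted <- sort(freq, decreasing = TRUE)
print(freq_sorted)

# Values that share the highest count, converted back to numbers
top <- as.numeric(names(freq)[freq == max(freq)])
print(top)
```

Note that table() orders its result by value, whereas calcmode() resolves ties by order of first appearance.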
Trim argument
The trim argument belongs to mean(), where it drops a fraction of the sorted observations from each end before the calculation. calcmode() as defined above accepts a single argument, so a call such as calcmode(df1, trim = 0.1) stops with an "unused argument" error. To calculate the mode of the trimmed observations, sort the values, drop the chosen fraction from each end, and pass the remainder to calcmode() as follows,
- > df1 <- mtcars$mpg
- > df1
- [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
- > k <- floor(length(df1) * 0.3)
- > trimmed <- sort(df1)[(k + 1):(length(df1) - k)]
- > calc <- calcmode(trimmed)
- > calc
- [1] 19.2
With a trim fraction of 0.3, floor(32 * 0.3) = 9 values are removed from each end of the sorted mpg variable, leaving 14 observations. Among them 19.2, 21.0 and 21.4 occur twice each, and calcmode() returns 19.2 because it comes first in the sorted values. For comparison, the mode of the untrimmed variable is,
- > var <- calcmode(df1)
- > var
- [1] 21
We can also calculate the mode of a trimmed vector in the same way,
- a <- c(9, 6, 2, 43, 20, 3, 20, -31, 9, -5, 15)
- k <- floor(length(a) * 0.2)
- var <- calcmode(sort(a)[(k + 1):(length(a) - k)])
- print(var)
It will generate the following output,
- > a <- c(9, 6, 2, 43, 20, 3, 20, -31, 9, -5, 15)
- > k <- floor(length(a) * 0.2)
- > var <- calcmode(sort(a)[(k + 1):(length(a) - k)])
- > var
- [1] 9
We have created a vector named a and calculated the mode of its trimmed values. A trim fraction of 0.2 removes floor(11 * 0.2) = 2 values from each end of the sorted vector: -31 and -5 at the bottom, 20 and 43 at the top. Among the remaining values 2, 3, 6, 9, 9, 15 and 20, only 9 occurs twice, so the mode is 9.
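calcmode() as defined earlier takes a single argument, so any trimming has to happen before the call. The sketch below wraps that step in a function; trimmed_mode is a hypothetical name introduced here, and it assumes the same convention as mean(), dropping floor(n * trim) values from each end of the sorted data:

```r
# The article's mode function, repeated so the example is self-contained
calcmode <- function(a) {
  vals <- unique(a)
  vals[which.max(tabulate(match(a, vals)))]
}

# Hypothetical helper: sort, drop floor(n * trim) values from each end, then take the mode
trimmed_mode <- function(x, trim = 0) {
  stopifnot(trim >= 0, trim < 0.5)
  x <- sort(x)                        # sort() also drops NA values by default
  k <- floor(length(x) * trim)        # number of values removed from each end
  calcmode(x[(k + 1):(length(x) - k)])
}

res_mpg <- trimmed_mode(mtcars$mpg, trim = 0.3)
res_vec <- trimmed_mode(c(9, 6, 2, 43, 20, 3, 20, -31, 9, -5, 15), trim = 0.2)
print(res_mpg)
print(res_vec)
```

Because the values are sorted before counting, ties are resolved in favour of the smallest tied value, which can differ from the result of calling calcmode() on the unsorted data.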
Calculating mode by removing missing values
Missing values are not ignored by calcmode(): unique() and match() treat NA as a value of its own, so every NA is counted like any other observation. The function returns NA only when NA is the most frequent value in the variable.
To create a missing value in a variable we can use the below syntax,
- > data <- mtcars
- > data[2,4] = NA
- > df2 = data$hp
- > df2
- [1] 110 NA 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230 66 52 65 97 150 150 245 175 66 91 113 264 175 335 109
As we can see, the dataset named data contains a variable named hp whose second observation is set to NA. Here NA occurs only once, so calcmode(df2) still returns a number (175), but the missing observation has been counted as a value.
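How the calcmode() function defined earlier reacts to NA depends on how often NA occurs, which is easy to check on small vectors. A minimal sketch with made-up values (the vectors and the names one_na, many_na and n_missing are assumptions for illustration):

```r
# The article's mode function, repeated so the example is self-contained
calcmode <- function(a) {
  vals <- unique(a)
  vals[which.max(tabulate(match(a, vals)))]
}

# NA occurs once: it is counted as a value, but 5 is more frequent
one_na <- calcmode(c(5, 5, NA, 3))
print(one_na)

# NA occurs most often: the function returns NA
many_na <- calcmode(c(5, NA, NA, 3))
print(many_na)

# Counting the missing values before calculating the mode
n_missing <- sum(is.na(c(5, NA, NA, 3)))
print(n_missing)
```

Checking sum(is.na(x)) first shows whether missing values are frequent enough to affect the result.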
Removal of missing values
The na.rm = TRUE parameter is available in functions such as mean() and median(), where it indicates that NA values should be removed before the calculation. Note that the value is written TRUE, not True. calcmode() as defined above has no such parameter, so the missing values are removed by subsetting with !is.na() before the function is called.
The below code will remove missing values as follows,
- > rs2 = calcmode(df2[!is.na(df2)])
- > rs2
- [1] 175
The same applies to a vector that contains NA,
- a <- c(9, 6, 2, 43, 21, 21, 55, -31, 9, -5, NA)
- res <- calcmode(a)
- print(res)
Above code will return the following output,
- > a <- c(9, 6, 2, 43, 21, 21, 55, -31, 9, -5, NA)
- > res <- calcmode(a)
- > print(res)
- [1] 9
- >
Removing NA values and calculating the mode,
- Res1 <- calcmode(a[!is.na(a)])
- print(Res1)
The above code will generate the following output,
- > Res1 <- calcmode(a[!is.na(a)])
- > Res1
- [1] 9
- >
As we can see, a vector named a has been created which contains an NA value. NA occurs only once, so both calls return 9, the first of the two values (9 and 21) that occur twice. Removing NA before the call matters when missing values are frequent enough to become the most common value, in which case calcmode() would return NA.
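If the na.rm style of mean() and median() is preferred, the filtering step can be folded into the function itself. The following is a sketch under that assumption; calcmode_na is a hypothetical name, and its na.rm parameter is added here rather than taken from any existing R function for the mode:

```r
# Hypothetical variant of calcmode() with an na.rm switch
calcmode_na <- function(a, na.rm = FALSE) {
  if (na.rm) a <- a[!is.na(a)]   # drop missing values only when requested
  vals <- unique(a)
  vals[which.max(tabulate(match(a, vals)))]
}

# hp variable of mtcars with its second observation set to NA
hp <- mtcars$hp
hp[2] <- NA
res_hp <- calcmode_na(hp, na.rm = TRUE)
print(res_hp)

# Vector with one missing value
a <- c(9, 6, 2, 43, 21, 21, 55, -31, 9, -5, NA)
res_a <- calcmode_na(a, na.rm = TRUE)
print(res_a)
```

With na.rm left at its default of FALSE, the function behaves exactly like calcmode().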
Summary
In this article, I demonstrated how to calculate the mode of the variables of a dataset with a user-defined function, including trimming the observations and removing missing values before the calculation. Code snippets are provided for each case.