Introduction
Factors are data objects that categorize data and store it as a set of levels. They can hold both string and integer values. Factors are most useful for columns that have a limited number of unique values, which makes them well suited to data analysis and statistical modeling.
Creation of Factors
To create a factor in R, we call the factor() function with a vector as its input. We start with a character vector to convert:
- d <- c("East","West","East","North","North","East","West","West","West","East","North")
Let us now see the contents of the vector:
- > d <- c("East","West","East","North","North","East","West","West","West","East","North")
- > d
- [1] "East" "West" "East" "North" "North" "East" "West" "West" "West"
- [10] "East" "North"
To check whether d is a factor, we use the is.factor() function:
- is.factor(d)
The script returns the following:
- > is.factor(d)
- [1] FALSE
Object d is not a factor; it is a character vector. To convert it, we call the factor() function and pass the vector to it. This returns a new factor and leaves d itself unchanged:
- factor_data <- factor(d)
- > factor_data <- factor(d)
Let us now check whether factor_data is a factor by calling is.factor(factor_data).
Execution of the program should give the following output:
- > is.factor(factor_data)
- [1] TRUE
- >
The output confirms that factor_data is a factor. We have successfully created a factor from a vector by calling the factor() function.
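To make the difference between the vector and the factor concrete, the following sketch inspects what factor() stored: the distinct levels and one integer code per element. It reuses the vector d from above and calls only base R functions (levels(), as.integer(), table()); the variable names codes and counts are illustrative and not part of the original example.

```r
# Rebuild the example vector and convert it to a factor.
d <- c("East","West","East","North","North","East","West","West","West","East","North")
factor_data <- factor(d)

# The distinct categories, sorted alphabetically by default.
levels(factor_data)

# Each element is stored as an integer index into the levels.
codes <- as.integer(factor_data)
codes

# Count how many times each level occurs.
counts <- table(factor_data)
counts
```

Storing one small integer per element plus a short list of levels is what makes factors a compact representation for columns with few unique values.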
A factor can also come from a column of a data frame. In versions of R before 4.0.0, data.frame() converted columns of text data to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so text columns remain character vectors unless conversion is requested. Consider the example given below:
- height <- c(140,152,164,137,166,157,112)
- weight <- c(38,49,76,54,97,22,30)
- gender <- c("male","male","female","female","male","female","male")
Creating the data frame:
- input_data <- data.frame(height,weight,gender)
Let us view the contents of the data frame:
- > input_data <- data.frame(height,weight,gender)
- > input_data
- height weight gender
- 1 140 38 male
- 2 152 49 male
- 3 164 76 female
- 4 137 54 female
- 5 166 97 male
- 6 157 22 female
- 7 112 30 male
Let us check whether the column gender is a factor or not:
- is.factor(input_data$gender)
It returns the following output:
- > is.factor(input_data$gender)
- [1] FALSE
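No, the column is not a factor. Since R 4.0.0, data.frame() keeps text columns as character vectors by default; earlier versions converted them automatically and would return TRUE here. To obtain a factor, pass stringsAsFactors = TRUE to data.frame() or convert the column with factor(input_data$gender).
Once the column is a factor, printing it shows its levels, which here are female and male in alphabetical order.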
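The two ways of turning the gender column into a factor can be sketched as follows. This is a minimal illustration that assumes the same three vectors as above; stringsAsFactors is an existing argument of data.frame(), while the names df_auto and df_manual are made up for this example.

```r
height <- c(140, 152, 164, 137, 166, 157, 112)
weight <- c(38, 49, 76, 54, 97, 22, 30)
gender <- c("male", "male", "female", "female", "male", "female", "male")

# Option 1: ask data.frame() to convert text columns while building the frame.
df_auto <- data.frame(height, weight, gender, stringsAsFactors = TRUE)

# Option 2: build the frame first, then convert a single column explicitly.
df_manual <- data.frame(height, weight, gender)
df_manual$gender <- factor(df_manual$gender)

# Both columns are now factors with the same levels.
is.factor(df_auto$gender)
is.factor(df_manual$gender)
levels(df_manual$gender)
```

The second option is often preferable when only some text columns are categorical, since it leaves free-text columns untouched.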
The order of the levels in a factor can be changed by applying the factor() function again and specifying the new order through the levels argument.
Consider the example given below:
- d <- c("East","West","East","North","North","East","West","West","West","East","North")
- > d <- c("East","West","East","North","North","East","West","West","West","East","North")
Let us create the factors:
- factor_data <- factor(d)
- > factor_data <- factor(d)
Let us display the factor data:
- > factor_data
- [1] East West East North North East West West West East North
- Levels: East North West
Let us now apply the factor() function again, passing the required order through the levels argument:
- new_order_data <- factor(factor_data,levels = c("East","West","North"))
The above code gives the following output:
- > new_order_data <- factor(factor_data,levels = c("East","West","North"))
- > new_order_data
- [1] East West East North North East West West West East North
- Levels: East West North
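Reordering the levels changes how the factor is encoded and summarized, not which values it holds. The sketch below shows this on a shortened version of the vector; the names default_order and custom_order are illustrative.

```r
d <- c("East", "West", "East", "North")
default_order <- factor(d)
custom_order <- factor(default_order, levels = c("East", "West", "North"))

# The visible values are identical in both factors.
identical(as.character(default_order), as.character(custom_order))

# The integer codes follow each factor's own level order.
as.integer(default_order)
as.integer(custom_order)

# Summaries such as table() list the categories in level order.
names(table(custom_order))
```

This matters mostly for output that depends on level order, such as tables and the axis order of plots.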
In R, we can generate a factor with a regular pattern of levels using the gl() function. It takes two integers: the first specifies the number of levels, and the second specifies how many times each level is repeated.
The function takes the syntax gl(n, k, labels).
The following parameters are used in the above syntax:
- n: an integer that defines the number of levels.
- k: an integer that specifies the number of replications of each level.
- labels: an optional vector of labels for the resulting factor levels.
Consider the example given below which shows how the function can be used:
- vec <- gl(2, 3, labels = c("Texas", "Seattle","Boston"))
Then we print the contents of vec. Because n is 2, only the first two labels appear in the data, while Boston remains as an unused level:
- > vec <- gl(2, 3, labels = c("Texas", "Seattle","Boston"))
- > vec
- [1] Texas Texas Texas Seattle Seattle Seattle
- Levels: Texas Seattle Boston
- >
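As a small variation on the example above, the sketch below passes exactly one label per level and then shows one way to discard an unused level. It relies only on base R (gl(), nlevels(), droplevels()); the names cities and trimmed are illustrative.

```r
# Two levels, each repeated three times, with one label per level.
cities <- gl(2, 3, labels = c("Texas", "Seattle"))
as.character(cities)
nlevels(cities)

# Passing more labels than n, as in the example above, leaves an unused level.
vec <- gl(2, 3, labels = c("Texas", "Seattle", "Boston"))
nlevels(vec)

# droplevels() removes levels that never occur in the data.
trimmed <- droplevels(vec)
levels(trimmed)
```

Matching the number of labels to n avoids empty categories showing up later in tables or model output.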
We can also create a factor directly by passing the values to the factor() function. The following example demonstrates this:
- x <- factor(c("Married", "married", "single", "single"));
We can then print out the contents of the factor:
- > x <- factor(c("Married", "married", "single", "single"));
- > x
- [1] Married married single single
- Levels: married Married single
- >
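The elements of a factor can be accessed in the same way as those of a vector. We continue with the factor x created above; note that Married and married are distinct levels because level names are case-sensitive.
Let us access the 2nd element of the factor:
- x[2]
The script will run as follows: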
- > x[2]
- [1] married
- Levels: married Married single
- >
Let us access the 1st and the 3rd elements of the factor:
- x[c(1, 3)]
It will return the following:
- > x[c(1, 3)]
- [1] Married single
- Levels: married Married single
- >
Let us access all the factor elements except for the 1st one:
- x[-1]
It prints the following output:
- > x[-1]
- [1] married single single
- Levels: married Married single
- >
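Note in the output above that the level Married is still listed even though the element carrying it was excluded. The sketch below illustrates this, along with selection by a logical condition; drop = TRUE is an existing argument of factor subsetting in base R, and the variable names are illustrative.

```r
x <- factor(c("Married", "married", "single", "single"))

# Select elements by condition rather than by position.
singles <- x[x == "single"]
length(singles)

# Plain subsetting keeps every level, including unused ones.
nlevels(x[-1])

# drop = TRUE discards levels that no longer occur in the subset.
remaining <- x[-1, drop = TRUE]
levels(remaining)
```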
To modify the elements of a factor, we use simple reassignment. However, we cannot assign a value that lies outside the factor's predefined levels.
Let us change the value of the 3rd element from single to married:
- x[3] <- "married"
The code should run as follows:
- > x[3] <- "married"
- > x
- [1] Married married married single
- Levels: married Married single
- >
The above output shows that the change was made successfully. Our factor has three levels: married, Married, and single. If we attempt to assign a value outside these levels, R issues a warning and stores NA in that position instead:
- x[3] <- "divorced"
This will run as follows:
- > x[3] <- "divorced"
- Warning message:
- In `[<-.factor`(`*tmp*`, 3, value = "divorced") :
- invalid factor level, NA generated
- >
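One way to avoid the warning is to add the new level before assigning it. The sketch below shows that approach next to the failing case; it assumes a fresh copy of x, and the value "widowed" and the variable y are hypothetical, chosen only to demonstrate the NA result.

```r
x <- factor(c("Married", "married", "single", "single"))

# Register the new category first, keeping the existing levels.
levels(x) <- c(levels(x), "divorced")

# The assignment now succeeds without a warning.
x[3] <- "divorced"
as.character(x)

# Without that step, the unknown value is replaced by NA.
y <- factor(c("Married", "married", "single", "single"))
suppressWarnings(y[3] <- "widowed")
is.na(y[3])
```

Appending to levels(x) keeps the existing codes valid, whereas replacing the levels with a shorter or reordered vector would relabel existing elements.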
Summary
In this article, I demonstrated how to create factors in R using the R console and how to perform various operations on them: reordering levels, generating factors with gl(), accessing elements by index, modifying elements, and seeing what happens when an assigned value is not among the levels. Code snippets and their output accompany each step.