Introduction
R can read data from various types of files having data in different formats. One such data format is the format of files created using Microsoft Excel. Files saved in Microsoft Excel are generally in .xls or .xlsx format. Data available in files supporting .xls or .xlsx format can be read by R with the help of various packages provided in the directory of R.
In this article, we will be using readxl and XLConnect packages to read Excel data from files available in .xls or .xlsx format. To import data from Excel various techniques are available which require the installation of third‐party libraries as well. These libraries may not be available for all operating systems.
Therefore we can also use a package named XLConnect. To read and write files supporting .xls or .xlsx format, the XLConnect package requires a Java library, therefore XLConnect can run on different kinds of operating systems.
Installing the package XLConnect
We need to install and load the package in the directory of R, only then we can read an Excel file supporting formats such as .xls and .xlsx. In order to use a particular file in R, we can create an object of a file name as well.If Java is installed in an operating system, then to run the XLConnect package, there is no need to install and load any other library in R.
install.packages("XLConnect")
The above code will generate the following output.
> install.packages("XLConnect")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/XLConnect_1.0.1.zip'
Content type 'application/zip' length 22907452 bytes (21.8 MB)
downloaded 21.8 MB
package ‘XLConnect’ successfully unpacked and MD5 sums checked
Loading the XLConnect package
library("XLConnect")
The above code will generate the following output.
> library("XLConnect")
XLConnect 1.0.1 by Mirai Solutions GmbH [aut],
Martin Studer [cre],
The Apache Software Foundation [ctb, cph] (Apache POI),
Graph Builder [ctb, cph] (Curvesapi Java library)
http://www.mirai-solutions.com
https://github.com/miraisolutions/xlconnect
Once the package is loaded in the library, we can either load an entire Excel file or load a single worksheet at a time using a function named readWorksheetFromFile(). We have to pass two arguments inside the readWorksheetFromFile() function to read Excel files in R.
These two arguments are
- A file name with .xls or .xlsx format along with a valid path of the directory where the required file is stored.
- Worksheet’s name or index position.
Using readWorksheetFromFile function
> info <- readWorksheetFromFile("info.xlsx ", sheet = 1)
> print(info)
The above code will generate the following output.
> info <- readWorksheetFromFile("info.xlsx ", sheet = 1)
> print(info)
age. job marital eduation default balance housing loan contact day month duration campaign pdays previous poutcome y
1 58 management married tertiary no 2143 yes no unknown 5 may 261 1 -1 0 unknown no
2 44 technician single secondary no 143 no no unknown 7 may 261 1 -1 0 unknown no
3 33 entrepreneur married primary no 43 yes yes unknown 10 may 261 1 -1 0 unknown yes
4 47 management single unknown no 250 yes no unknown 4 may 261 1 -1 0 unknown no
5 33 blue-collar porced tertiary no 2100 yes no unknown 8 may 261 1 -1 0 unknown no
6 35 services married tertiary no 213 no yes unknown 25 may 261 1 -1 0 unknown yes
7 28 retired married secondary no 214 yes no unknown 19 may 261 1 -1 0 unknown no
8 42 technician single primary no 2500 no yes unknown 5 may 261 1 -1 0 unknown no
9 58 management married tertiary no 230 yes no unknown 9 may 261 1 -1 0 unknown yes
To remove an Excel file and clean the working environment of R, we can use the file.remove() function.
> file.remove("movies_data.xlsx")
readxl package
To use readxl package, we need to install readxl package first as follows.
install.packages("readxl")
The above code will generate the following output.
> install.packages("readxl")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/readxl_1.3.1.zip'
Content type 'application/zip' length 1708917 bytes (1.6 MB)
downloaded 1.6 MB
package ‘readxl’ successfully unpacked and MD5 sums checked
>
To read data from Excel files we can use readxl package to read data from .xls and .xlsx format files. The readxl package contains the read_excel() function which can be used to read data from Excel files.
We can also determine the number of sheets available in the Excel file as follows.
> library(readxl)
> excel_sheets('info.xlsx')
The above code will generate the following output.
> excel_sheets('info.xlsx')
[1] "Book1"
As we can see from the above output, the info.xlsx Excel file contains just a single worksheet named Book1.
Using the read_excel function to read data
> df <-read_excel('info.xlsx')
> print(df)
The above code will generate the following output.
> df <-read_excel('info.xlsx')
> print(df)
# A tibble: 9 x 17
age job marital eduation default balance housing loan contact day month duration campaign pdays previous poutcome y
<dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 58 management married tertiary no 2143 yes no unknown 5 may 261 1 -1 0 unknown no
2 44 technician single secondary no 143 no no unknown 7 may 261 1 -1 0 unknown no
3 33 entrepreneur married primary no 43 yes yes unknown 10 may 261 1 -1 0 unknown yes
4 47 management single unknown no 250 yes no unknown 4 may 261 1 -1 0 unknown no
5 33 blue-collar porced tertiary no 2100 yes no unknown 8 may 261 1 -1 0 unknown no
6 35 services married tertiary no 213 no yes unknown 25 may 261 1 -1 0 unknown yes
7 28 retired married secondary no 214 yes no unknown 19 may 261 1 -1 0 unknown no
8 42 technician single primary no 2500 no yes unknown 5 may 261 1 -1 0 unknown no
9 58 management married tertiary no 230 yes no unknown 9 may 261 1 -1 0 unknown yes
As we can see, the above output generates a table instead of a data frame.
Reading Excel data using the position of the worksheet
> df <-read_excel('info.xlsx', sheet=1)
> print(df1)
The above code will generate the following output.
> df <-read_excel('info.xlsx', sheet=1)
> print(df1)
> print(df1)
# A tibble: 9 x 17
age job marital eduation default balance housing loan contact day month duration campaign pdays previous poutcome y
<dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 58 management married tertiary no 2143 yes no unknown 5 may 261 1 -1 0 unknown no
2 44 technician single secondary no 143 no no unknown 7 may 261 1 -1 0 unknown no
3 33 entrepreneur married primary no 43 yes yes unknown 10 may 261 1 -1 0 unknown yes
4 47 management single unknown no 250 yes no unknown 4 may 261 1 -1 0 unknown no
5 33 blue-collar porced tertiary no 2100 yes no unknown 8 may 261 1 -1 0 unknown no
6 35 services married tertiary no 213 no yes unknown 25 may 261 1 -1 0 unknown yes
7 28 retired married secondary no 214 yes no unknown 19 may 261 1 -1 0 unknown no
8 42 technician single primary no 2500 no yes unknown 5 may 261 1 -1 0 unknown no
9 58 management married tertiary no 230 yes no unknown 9 may 261 1 -1 0 unknown yes
>
Reading Excel data using the name of the worksheet
> df1 <-read_excel('info.xlsx', sheet='Book1')
> print(df1)
The above code will generate the following output.
> df1 <-read_excel('info.xlsx', sheet='Book1')
> print(df1)
# A tibble: 9 x 17
age job marital eduation default balance housing loan contact day month duration campaign pdays previous poutcome y
<dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 58 management married tertiary no 2143 yes no unknown 5 may 261 1 -1 0 unknown no
2 44 technician single secondary no 143 no no unknown 7 may 261 1 -1 0 unknown no
3 33 entrepreneur married primary no 43 yes yes unknown 10 may 261 1 -1 0 unknown yes
4 47 management single unknown no 250 yes no unknown 4 may 261 1 -1 0 unknown no
5 33 blue-collar porced tertiary no 2100 yes no unknown 8 may 261 1 -1 0 unknown no
6 35 services married tertiary no 213 no yes unknown 25 may 261 1 -1 0 unknown yes
7 28 retired married secondary no 214 yes no unknown 19 may 261 1 -1 0 unknown no
8 42 technician single primary no 2500 no yes unknown 5 may 261 1 -1 0 unknown no
9 58 management married tertiary no 230 yes no unknown 9 may 261 1 -1 0 unknown yes
Summary
In this article, I demonstrated how to use readxl and XLConnect packages to read Excel data from files available in .xls or .xlsx format. I explained how to install and load packages to read Excel data. Proper coding snippets and outputs are also provided.