Introduction
Applications such as R and Microsoft Excel support importing and exporting data in tabular format. A CSV (comma-separated values) file stores tabular data as rows and columns: each line of the file is one row, and the values within a row are separated by commas.
In this article, I will discuss how to read comma-separated data values in R and store these values in a data frame.
Reading comma-separated values
The read.csv function reads data from comma-separated values (CSV) files. If the CSV file contains a header row, we can use the following syntax, because header = TRUE is the default.
df1 <- read.csv("filename")
To read data from a CSV file that does not include a header row, we can add the header argument and set its value to FALSE as follows.
> df1 <- read.csv("filename", header=FALSE)
For example, reading a file named bank.csv that contains a header row generates the following output.
> df1 = read.csv("bank.csv", header = TRUE)
> df1
age job marital education default balance housing loan contact day month duration campaign pdays previous
1 59 admin. married secondary no 2343 yes no unknown 5 may 1042 1 -1 0
2 56 admin. married secondary no 45 no no unknown 5 may 1467 1 -1 0
3 41 technician married secondary no 1270 yes no unknown 5 may 1389 1 -1 0
4 55 services married secondary no 2476 yes no unknown 5 may 579 1 -1 0
5 54 admin. married tertiary no 184 no no unknown 5 may 673 2 -1 0
Here we import data from a CSV file named bank.csv; the output shows 5 rows and 15 columns of the resulting data frame. The top row is the header, which contains the column names.
The read.csv function reads the file and builds a data frame. A data frame is one of the ways R represents tabular data, that is, data arranged in rows and columns.
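As a minimal sketch of this behavior, the snippet below writes a small made-up CSV file to a temporary location and checks what read.csv returns. The file contents and the names tmp and demo are assumptions for illustration only; they are not part of bank.csv.

```r
# Write a tiny illustrative CSV file (hypothetical data, not bank.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,admin.,45",
             "41,technician,1270"), tmp)

# header = TRUE is the default for read.csv
demo <- read.csv(tmp)

class(demo)   # the result is a data frame
dim(demo)     # number of rows, then number of columns
names(demo)   # column names taken from the header row
```

The same three calls (class, dim, and names) can be applied to any data frame returned by read.csv to confirm that the import worked as expected.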
The call below does not pass the header argument, so it relies on the default header = TRUE and assumes that the CSV file contains a header row.
> data = read.csv("bank.csv")
> data
age job marital education default balance housing loan contact day month duration campaign pdays previous
1 59 admin. married secondary no 2343 yes no unknown 5 may 1042 1 -1 0
2 56 admin. married secondary no 45 no no unknown 5 may 1467 1 -1 0
3 41 technician married secondary no 1270 yes no unknown 5 may 1389 1 -1 0
4 55 services married secondary no 2476 yes no unknown 5 may 579 1 -1 0
5 54 admin. married tertiary no 184 no no unknown 5 may 673 2 -1 0
6 42 management single tertiary no 0 yes yes unknown 5 may 562 2 -1 0
7 56 management married tertiary no 830 yes yes unknown 6 may 1201 1 -1 0
8 60 retired divorced secondary no 545 yes no unknown 6 may 1030 1 -1 0
9 37 technician married secondary no 1 yes no unknown 6 may 608 1 -1 0
10 28 services single secondary no 5090 yes no unknown 6 may 1297 3 -1 0
11 38 admin. single secondary no 100 yes no unknown 7 may 786 1 -1 0
12 30 blue-collar married secondary no 309 yes no unknown 7 may 1574 2 -1 0
13 29 management married tertiary no 199 yes yes unknown 7 may 1689 4 -1 0
14 46 blue-collar single tertiary no 460 yes no unknown 7 may 1102 2 -1 0
15 31 technician single tertiary no 703 yes no unknown 8 may 943 2 -1 0
16 35 management divorced tertiary no 3837 yes no unknown 8 may 1084 1 -1 0
17 32 blue-collar single primary no 611 yes no unknown 8 may 541 3 -1 0
18 49 services married secondary no -8 yes no unknown 8 may 1119 1 -1 0
19 41 admin. married secondary no 55 yes no unknown 8 may 1120 2 -1 0
20 49 admin. divorced secondary no 168 yes yes unknown 8 may 513 1 -1 0
21 28 admin. divorced secondary no 785 yes no unknown 8 may 442 2 -1 0
22 43 management single tertiary no 2067 yes no unknown 8 may 756 1 -1 0
23 43 management divorced tertiary no 388 yes no unknown 8 may 2087 2 -1 0
24 43 blue-collar married primary no -192 yes no unknown 8 may 1120 2 -1 0
25 37 unemployed single secondary no 381 yes no unknown 8 may 985 2 -1 0
26 35 blue-collar single secondary no 40 yes no unknown 9 may 617 4 -1 0
27 31 technician single tertiary no 22 yes no unknown 9 may 483 3 -1 0
28 43 blue-collar single secondary no 3 yes no unknown 9 may 929 3 -1 0
29 31 admin. married secondary no 307 yes no unknown 9 may 538 1 -1 0
30 28 blue-collar single secondary no 759 yes no unknown 9 may 710 1 -1 0
As the output above shows, the column names from the header row of the CSV file become the column names of the data frame.
If we do not want R to treat the first line as a header, we can pass the argument header=FALSE, and R will generate default column names (V1, V2, and so on). This option is intended for files without a header line; when a file does have one, R reads that line as the first row of data.
> df1 = read.csv("bank.csv", header = F)
> head(df1,20)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
1 59 admin. married secondary no 2343 yes no unknown 5 may 1042 1 -1 0
2 56 admin. married secondary no 45 no no unknown 5 may 1467 1 -1 0
3 41 technician married secondary no 1270 yes no unknown 5 may 1389 1 -1 0
4 55 services married secondary no 2476 yes no unknown 5 may 579 1 -1 0
5 54 admin. married tertiary no 184 no no unknown 5 may 673 2 -1 0
6 42 management single tertiary no 0 yes yes unknown 5 may 562 2 -1 0
7 56 management married tertiary no 830 yes yes unknown 6 may 1201 1 -1 0
8 60 retired divorced secondary no 545 yes no unknown 6 may 1030 1 -1 0
9 37 technician married secondary no 1 yes no unknown 6 may 608 1 -1 0
10 28 services single secondary no 5090 yes no unknown 6 may 1297 3 -1 0
11 38 admin. single secondary no 100 yes no unknown 7 may 786 1 -1 0
12 30 blue-collar married secondary no 309 yes no unknown 7 may 1574 2 -1 0
13 29 management married tertiary no 199 yes yes unknown 7 may 1689 4 -1 0
14 46 blue-collar single tertiary no 460 yes no unknown 7 may 1102 2 -1 0
15 31 technician single tertiary no 703 yes no unknown 8 may 943 2 -1 0
16 35 management divorced tertiary no 3837 yes no unknown 8 may 1084 1 -1 0
17 32 blue-collar single primary no 611 yes no unknown 8 may 541 3 -1 0
18 49 services married secondary no -8 yes no unknown 8 may 1119 1 -1 0
19 41 admin. married secondary no 55 yes no unknown 8 may 1120 2 -1 0
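The effect of header = FALSE can be checked on a small file. The sketch below assumes a made-up CSV file that still contains a header line (the data and the names tmp and no_header are hypothetical), and shows that R assigns the names V1, V2, V3 and keeps the header line as an ordinary row.

```r
# Tiny illustrative CSV file with a header line (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,admin.,45"), tmp)

# Tell R not to treat the first line as column names
no_header <- read.csv(tmp, header = FALSE)

names(no_header)   # default names generated by R
nrow(no_header)    # the header line is counted as a data row
no_header[1, ]     # first row holds the original column names as text
```

Because the header text ends up in the data, every column in this sketch is read as text rather than as numbers, which is usually a sign that header = FALSE was not the right choice for that file.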
Structure of data frame
We can also take a look at the structure of data that has been imported. To display the structure of data, we can use the following syntax.
str(df1)
Here df1 is the name of the data frame.
Let us now look at the structure of the data frame read from the bank.csv file.
> df <- read.csv("bank.csv", as.is=TRUE)
> str(df)
'data.frame': 11162 obs. of 17 variables:
$ age : int 59 56 41 55 54 42 56 60 37 28 ...
$ job : chr "admin." "admin." "technician" "services" ...
$ marital : chr "married" "married" "married" "married" ...
$ education: chr "secondary" "secondary" "secondary" "secondary" ...
$ default : chr "no" "no" "no" "no" ...
$ balance : int 2343 45 1270 2476 184 0 830 545 1 5090 ...
$ housing : chr "yes" "no" "yes" "yes" ...
$ loan : chr "no" "no" "no" "no" ...
$ contact : chr "unknown" "unknown" "unknown" "unknown" ...
$ day : int 5 5 5 5 5 5 6 6 6 6 ...
$ month : chr "may" "may" "may" "may" ...
$ duration : int 1042 1467 1389 579 673 562 1201 1030 608 1297 ...
$ campaign : int 1 1 1 1 2 2 1 1 1 3 ...
$ pdays : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ previous : int 0 0 0 0 0 0 0 0 0 0 ...
$ poutcome : chr "unknown" "unknown" "unknown" "unknown" ...
$ deposit : chr "yes" "yes" "yes" "yes" ...
As we can see, the data frame contains 11162 observations (rows) and 17 variables (columns). The variables are of integer and character data type; the argument as.is=TRUE keeps text columns as character vectors instead of converting them to factors.
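The column types reported by str can also be inspected programmatically. The following is a minimal sketch on a made-up CSV file (the data and the names tmp and demo are assumptions for illustration), using only base R functions.

```r
# Tiny illustrative CSV file (hypothetical data, not bank.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,technician,45"), tmp)

# as.is = TRUE keeps text columns as character vectors
demo <- read.csv(tmp, as.is = TRUE)

# Compact overview: dimensions, column names, types, first values
str(demo)

# The class of each column as a named character vector
sapply(demo, class)
```

Checking the types right after import helps catch columns that were read as text when numbers were expected.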
Import values using the read.table function
We can also use the read.table function to import values from CSV (comma-separated values) files in R. The values read from the file are stored in a data frame.
> df = read.table("bank.csv", header = TRUE)
> head(df,10)
age.job.marital.education.default.balance.housing.loan.contact.day.month.duration.campaign.pdays.previous.poutcome.deposit
1 59,admin.,married,secondary,no,2343,yes,no,unknown,5,may,1042,1,-1,0,unknown,yes
2 56,admin.,married,secondary,no,45,no,no,unknown,5,may,1467,1,-1,0,unknown,yes
3 41,technician,married,secondary,no,1270,yes,no,unknown,5,may,1389,1,-1,0,unknown,yes
4 55,services,married,secondary,no,2476,yes,no,unknown,5,may,579,1,-1,0,unknown,yes
5 54,admin.,married,tertiary,no,184,no,no,unknown,5,may,673,2,-1,0,unknown,yes
6 42,management,single,tertiary,no,0,yes,yes,unknown,5,may,562,2,-1,0,unknown,yes
7 56,management,married,tertiary,no,830,yes,yes,unknown,6,may,1201,1,-1,0,unknown,yes
8 60,retired,divorced,secondary,no,545,yes,no,unknown,6,may,1030,1,-1,0,unknown,yes
9 37,technician,married,secondary,no,1,yes,no,unknown,6,may,608,1,-1,0,unknown,yes
10 28,services,single,secondary,no,5090,yes,no,unknown,6,may,1297,3,-1,0,unknown,yes
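Here we also pass the argument header = TRUE to the read.table function because the file contains a header row. Note that read.table splits fields on whitespace by default, which is why each line in the output above appears as a single column with the commas still in place. Passing sep = "," splits the values into separate columns.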
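As a sketch of how the separator changes the result, the snippet below reads the same made-up CSV file twice with read.table, once with the default separator and once with sep = ",". The data and the names tmp, one_col, and split_cols are hypothetical.

```r
# Tiny illustrative CSV file (hypothetical data, not bank.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,technician,45"), tmp)

# Default separator is whitespace, so each line becomes a single column
one_col <- read.table(tmp, header = TRUE)
ncol(one_col)

# With sep = ",", the values are split into separate columns
split_cols <- read.table(tmp, header = TRUE, sep = ",")
ncol(split_cols)
names(split_cols)
```

With sep = "," and header = TRUE, read.table splits the file into the same columns that read.csv does, since read.csv is a wrapper around read.table with defaults suited to comma-separated files.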
Summary
In this article, I demonstrated how to read comma-separated data values in R and store them in a data frame. I also discussed how to read files with and without a header row. Two functions, read.csv and read.table, were used to import the data, and code snippets with their outputs were provided for each.