Read CSV and Excel files (.xls and .xlsx) into R using the ‘readxl’ package
Even today, many companies use Microsoft Excel to store their data. It is convenient, it keeps data in a tabular format, and it needs little or no training because most employees already know Excel well.
Excel data can be stored either as an Excel workbook (.xls, .xlsx) or as comma-separated values (.csv).
Before importing data into R, it helps to set the working directory to the folder where the files are stored. This avoids having to provide the full path each time a file is read. It can be done using the following commands:
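#knowing current working directory
getwd()
#setting a new working directory
setwd("C:/...imaginary file path")
Notice the use of forward slashes in the working directory path. In an R string a backslash is an escape character, so a Windows path needs either forward slashes or doubled backslashes.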
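To make the working directory idea concrete, here is a minimal sketch. It uses tempdir() as a stand-in for a real project folder, and the Windows path in it is purely hypothetical, so adapt both to your own machine.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# tempdir() stands in for a real data folder in this sketch
setwd(tempdir())

# A file written by name only lands in the working directory
writeLines("hello", "example.txt")
found <- file.exists(file.path(getwd(), "example.txt"))
print(found)

# A hypothetical Windows path: backslashes must be doubled inside an R string
win_path <- "C:\\Users\\me\\data"

# The same path written with forward slashes, which R accepts on Windows
fwd_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(fwd_path)

# Clean up and go back to where we started
unlink("example.txt")
setwd(old_wd)
```

Saving the old directory and restoring it at the end keeps the rest of the session unaffected.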
Importing CSV data into R
CSV data can be read into a data frame in R using the following command:
mydata = read.csv('mydata.csv', header=TRUE)
Notice that because the working directory is already set, we need not provide the full path to the file.
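If you do not have a CSV file at hand, the following sketch creates a tiny one in a temporary folder and reads it back. The column names and values are made up for illustration only.

```r
# Create a small CSV file in a temporary location (made-up sample data)
csv_path <- file.path(tempdir(), "mydata.csv")
writeLines(c("id,name,score",
             "1,Asha,82.5",
             "2,Ben,77",
             "3,Chloe,91"), csv_path)

# header = TRUE treats the first line as column names;
# stringsAsFactors = FALSE keeps text columns as plain character vectors
mydata <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Inspect column types and the first few rows
str(mydata)
head(mydata)
```

Calling str() right after import is a quick way to confirm that numbers were read as numbers and text as text.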
Importing Excel data into R
Several packages can read Excel data into R, such as ‘xlsx’, ‘gdata’ and ‘xlsReadWrite’. The one illustrated here is ‘readxl’.
#install package readxl (needed only once)
install.packages("readxl")
#load the package
library(readxl)
#read excel file (with .xls or .xlsx extension) into r dataframe
mydata = read_excel('myfile.xlsx', sheet=1) #reading first sheet
#for xls file replace .xlsx with .xls above
The advantage of the ‘readxl’ package is that it has no external dependencies, so it is easy to install and use on all operating systems.
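A workbook often has more than one sheet, so it can help to list the sheets before reading. The sketch below assumes ‘readxl’ is already installed and uses the sample workbook that ships with the package (located via readxl_example()) in place of your own file.

```r
library(readxl)

# Path to a sample workbook bundled with readxl (stands in for your own file)
path <- readxl_example("datasets.xlsx")

# List the sheet names available in the workbook
sheet_names <- excel_sheets(path)
print(sheet_names)

# A sheet can be selected by position or by name
by_position <- read_excel(path, sheet = 1)
by_name <- read_excel(path, sheet = sheet_names[1])

# read_excel() returns a tibble; convert it if a plain data frame is preferred
plain_df <- as.data.frame(by_name)
head(plain_df)
```

Selecting a sheet by name tends to be safer than by position when the sheet order in the workbook may change.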
Splitting data in R using the sample function and the caret package
Data is split into train and test sets in R so that a model can be trained on one part and evaluated on the other.
There are multiple ways of doing this.
1. Splitting data using the sample function
#load data into variable called mydata
mydata = read.csv('mydata.csv',header=TRUE)
#setting seed so we get same data split each time
set.seed(100) #can provide any number for seed
nall = nrow(mydata) #total number of rows in data
ntrain = floor(0.7 * nall) # number of rows for train, 70%
ntest = nall - ntrain # number of rows for test, the remaining ~30%
index = seq_len(nall) #row numbers from 1 to nall
trainIndex = sample(index, ntrain) #randomly chosen row numbers for train
testIndex = index[-trainIndex] #all remaining row numbers go to test
train = mydata[trainIndex,] #train data set
test = mydata[testIndex,] #test data set
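The same steps can be wrapped in a small helper so the split is easy to reuse. This is only a sketch: the function name split_data and its arguments are my own, and the built-in iris data set stands in for mydata.

```r
# Hypothetical helper: split a data frame into train and test parts
split_data <- function(data, train_fraction = 0.7, seed = 100) {
  set.seed(seed)                                    # reproducible split
  n_all <- nrow(data)
  n_train <- floor(train_fraction * n_all)          # rows that go to train
  train_index <- sample(seq_len(n_all), n_train)    # random row numbers
  list(train = data[train_index, , drop = FALSE],
       test  = data[-train_index, , drop = FALSE])  # everything else
}

# iris is a built-in data set, used here in place of mydata
parts <- split_data(iris)

# Every row should end up in exactly one of the two parts
n_total <- nrow(parts$train) + nrow(parts$test)
n_overlap <- length(intersect(rownames(parts$train), rownames(parts$test)))
print(c(total = n_total, overlap = n_overlap))
```

Checking that the two parts add up to the full data and share no rows is a cheap guard against indexing mistakes.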
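2. Splitting data using the caret package
In the caret package, data can be split based on the target variable (the y variable), so that train and test keep a similar distribution of the target.
For illustration, I am assuming the target variable is called TARGET.
#install caret package (needed only once)
install.packages("caret")
library(caret)
#p = 0.7 is assumed here to match the 70% train split above; list = FALSE returns row numbers instead of a list
trainIndex = createDataPartition(mydata$TARGET, p = 0.7, list = FALSE)
train = mydata[trainIndex,]
test = mydata[-trainIndex,]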
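To see what splitting on the target variable does in practice, the sketch below uses the built-in iris data set with Species playing the role of TARGET. It assumes ‘caret’ is installed, and the 70% proportion is my choice for illustration.

```r
library(caret)

set.seed(100)  # reproducible split

# Species plays the role of TARGET; p = 0.7 sends about 70% of each class to train
train_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train <- iris[train_index, ]
test <- iris[-train_index, ]

# Compare the class proportions in the two parts
train_props <- prop.table(table(train$Species))
test_props <- prop.table(table(test$Species))
print(train_props)
print(test_props)
```

Because the sampling is done within each class, the proportions in train and test should stay close to those of the full data, which matters most when some classes are rare.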
There are more ways of splitting data. If you want to read more,
please refer to the following link.
This post illustrates how to write a simple Hello World program in R.
To do this, download R from here and install it on your system.
R is available for Windows, Mac and Linux.
Once you install R, open R and you should see something like this
Now go to the File menu and select New Script. In the new file, let us print the introductory Hello World! with the following code
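A minimal version of such a script, assuming the built-in print() function is used, could look like this:

```r
# Print a greeting to the R console
print("Hello World!")
```

When run, the console shows the text in quotes with a [1] in front of it, which is how R labels the first element of a result.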
Now save your code as Hello World.R.
Congrats, you have successfully saved an R file. Now it’s time to execute it.
Select the code in the file and hit Ctrl+R. You can see the output in the R console behind the script window.
That’s it. You have created and executed your first R program.