Splitting Data into Train and Test using caret package in R

Splitting data in R using sample function and caret package

Data is split into train and test sets so that a model can be trained on one part and evaluated on rows it has not seen.

There are multiple ways of doing this.

1. Splitting data using sample function

#load data into variable called mydata
mydata = read.csv('mydata.csv', header=TRUE)
#setting seed so we get same data split each time
set.seed(100) #can provide any number for seed
nall = nrow(mydata) #total number of rows in data
ntrain = floor(0.7 * nall) #number of rows for train, 70%
ntest = nall - ntrain #number of rows for test, the remaining 30%
index = seq_len(nall) #row numbers 1..nall
trainIndex = sample(index, ntrain) #row numbers for the train set
testIndex = index[-trainIndex] #all remaining row numbers

train = mydata[trainIndex,]
test = mydata[testIndex,]
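
It is worth checking that the split did what was intended. The sketch below repeats the steps above on the built-in iris data set, which stands in for mydata here only so the example runs without a CSV file, and then checks that every row landed in exactly one of the two sets.

```r
# iris is a built-in data set, used only as a stand-in for mydata (assumption)
mydata <- iris

set.seed(100)
nall <- nrow(mydata)
ntrain <- floor(0.7 * nall)
trainIndex <- sample(seq_len(nall), ntrain)

train <- mydata[trainIndex, ]
test <- mydata[-trainIndex, ]

# Sizes add up to the full data set
sizesAddUp <- nrow(train) + nrow(test) == nall
# No row appears in both train and test
noOverlap <- length(intersect(rownames(train), rownames(test))) == 0

print(sizesAddUp)
print(noOverlap)
```

Both checks should print TRUE. If either prints FALSE, the index vectors were built incorrectly.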

2. Splitting data using caret package

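The caret package can split data based on the target (y) variable. Its createDataPartition function samples within each class of the target, so train and test keep roughly the same class proportions.

For illustration, I am assuming the target variable is called TARGET.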

#install caret package (needed only once)
#install.packages('caret')
#load package
library(caret)
set.seed(100) #same split each time
trainIndex = createDataPartition(mydata$TARGET,
                       p=0.7, list=FALSE, times=1)

train = mydata[trainIndex,]
test = mydata[-trainIndex,]
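
To see what splitting on the target variable means, here is a rough base R sketch of the same idea. It is not caret's implementation, only an illustration: it samples 70% of the rows within each class and then compares class proportions in the two sets. The iris data set and its Species column stand in for mydata and TARGET.

```r
# iris and Species are stand-ins for mydata and TARGET (assumption)
mydata <- iris
set.seed(100)

# Group the row numbers by class, then sample 70% of the rows in each group
rowsByClass <- split(seq_len(nrow(mydata)), mydata$Species)
trainIndex <- unlist(lapply(rowsByClass, function(rows) {
  rows[sample.int(length(rows), floor(0.7 * length(rows)))]
}), use.names = FALSE)

train <- mydata[trainIndex, ]
test <- mydata[-trainIndex, ]

# Class proportions in each set; these should be close to each other
trainProps <- prop.table(table(train$Species))
testProps <- prop.table(table(test$Species))
print(trainProps)
print(testProps)
```

With a plain random sample, as in the first method, a rare class can end up over- or under-represented in the test set by chance. Sampling within each class avoids that.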

There are more ways of splitting data. If you want to read more,
please refer to the following link

Hello world in R

This post illustrates how to write a simple Hello World program in R.

To do this, download R from here and install it on your system.

R is available on Windows, Mac as well as Linux platforms.

Once you install R, open it and you should see something like this:

[Screenshot: R-basic]

Now go to the File menu and select New Script. In the new file, let us print the introductory Hello World message with the following code

 print("Hello World") 

Now save your code as Hello World.R 

Congrats, you have successfully saved an R file. Now it’s time to execute it.
Select the code in the file and hit Ctrl+R (the shortcut in the R GUI on Windows). You can see the output in the R console behind the script window,
as follows

 [1] "Hello World"

That’s it. You have created your first R program and executed it successfully 🙂
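
A saved script can also be run in one go with R's source function, instead of selecting the lines by hand. The sketch below writes the one-line script to a temporary folder, a location chosen here only so the example is self-contained, and then runs it. In practice you would pass the path where you saved Hello World.R.

```r
# Temporary location used only for this example (assumption);
# replace with the folder where you saved your own script
script <- file.path(tempdir(), "Hello World.R")

# Create the one-line script, the same code as in the post
writeLines('print("Hello World")', script)

# Run every line of the script; the print() call writes to the console
source(script)
```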