Splitting Data into Train and Test using caret package in R

Splitting data in R using sample function and caret package

Data is split into train and test sets in R so that a model can be fitted on one part and evaluated on rows it has not seen.

There are multiple ways of doing this.

1. Splitting data using sample function

#load data into a variable called mydata
mydata = read.csv('mydata.csv', header=TRUE)
#set the seed so we get the same data split each time
set.seed(100) #any number can be used as the seed
nall = nrow(mydata) #total number of rows in data
ntrain = floor(0.7 * nall) #number of rows for train, 70%
ntest = nall - ntrain #number of rows for test, the remaining 30%
index = seq_len(nall) #row indices 1..nall
trainIndex = sample(index, ntrain) #row indices for the train data set
testIndex = index[-trainIndex] #all remaining row indices go to test

train = mydata[trainIndex,]
test = mydata[testIndex,]
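
To check that a split like this behaves as expected, here is a minimal sketch. Since mydata.csv is not available here, I am assuming the built-in iris data as a stand-in, and the names demo_data, demo_train and demo_test are only for illustration.

```r
# Built-in iris data stands in for mydata (an assumption for illustration)
demo_data <- iris

set.seed(100)
n_all <- nrow(demo_data)

# Draw 70% of the row indices at random, without replacement
train_idx <- sample(seq_len(n_all), size = floor(0.7 * n_all))

demo_train <- demo_data[train_idx, ]
demo_test  <- demo_data[-train_idx, ]

# Sanity checks: every row lands in exactly one of the two sets
stopifnot(nrow(demo_train) + nrow(demo_test) == n_all)
stopifnot(length(intersect(rownames(demo_train), rownames(demo_test))) == 0)

# With plain random sampling the class balance is left to chance
print(prop.table(table(demo_train$Species)))
```

The printed proportions will usually be close to, but not exactly, one third per species, because sample() does not look at the target column.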

2. Splitting data using caret package

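The caret package can split data based on the target variable, or y variable. When the target is a factor, createDataPartition samples within each class, so train and test keep roughly the same class proportions as the full data.

For illustration, I am assuming the target variable is a column named TARGET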

#install caret package (only needed once)
install.packages('caret')
#load package
library(caret)
#row indices for the train set: 70% of rows, sampled based on TARGET
trainIndex = createDataPartition(mydata$TARGET, 
                       p=0.7, list=FALSE,times=1)

train = mydata[trainIndex,]
test = mydata[-trainIndex,]
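
To see what splitting on the target does, here is a small sketch that compares class proportions before and after the split. It assumes the caret package is already installed, and it uses the built-in iris data with Species as the target in place of mydata and TARGET. The names iris_idx, iris_train and iris_test are only for illustration.

```r
library(caret)

set.seed(100)

# Species plays the role of TARGET here (an assumption for illustration)
iris_idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE, times = 1)

iris_train <- iris[iris_idx, ]
iris_test  <- iris[-iris_idx, ]

# Compare class proportions in the full data and in each part
print(prop.table(table(iris$Species)))
print(prop.table(table(iris_train$Species)))
print(prop.table(table(iris_test$Species)))
```

Because the sampling is done within each class, the three tables should show very similar proportions.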

There are more ways of splitting data. If you want to read more,
please refer to the following link:
http://topepo.github.io/caret/splitting.html

Author: Krishna

Data Analytics Enthusiast
