In R, data is split into train and test sets so that the model is trained on one part and evaluated on data it has not seen.
There are multiple ways of doing this.
1. Splitting data using sample function
    # load data into a variable called mydata
    mydata = read.csv('mydata.csv', header = TRUE)
    # set a seed so we get the same data split each time (any number works)
    set.seed(100)
    nall = nrow(mydata)          # total number of rows in data
    ntrain = floor(0.7 * nall)   # number of rows for train, 70%
    ntest = nall - ntrain        # number of rows for test, the remaining ~30%
    index = seq_len(nall)
    trainIndex = sample(index, ntrain)   # row indices of the train data set
    testIndex = index[-trainIndex]       # all remaining row indices
    train = mydata[trainIndex, ]
    test = mydata[testIndex, ]
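To check that a split like this behaves as expected, here is a minimal sketch. It assumes the built-in iris data set as a stand-in for mydata (my choice for illustration, not part of the original example) and verifies that the two pieces do not overlap and together cover every row.

```r
# Illustration only: iris stands in for mydata
set.seed(100)
mydata <- iris
nall <- nrow(mydata)
ntrain <- floor(0.7 * nall)

# Draw the train row indices; every other row goes to test
trainIndex <- sample(seq_len(nall), ntrain)
train <- mydata[trainIndex, ]
test <- mydata[-trainIndex, ]

# Sanity checks: sizes add up and no row appears in both sets
sizesAddUp <- nrow(train) + nrow(test) == nall
noOverlap <- length(intersect(rownames(train), rownames(test))) == 0
print(c(sizesAddUp = sizesAddUp, noOverlap = noOverlap))
```

Both checks should come back TRUE for any seed, since sample() draws without replacement by default.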
2. Splitting data using caret package
The caret package can split data based on the target variable (the y variable), so that train and test keep a similar distribution of the target.
For illustration, I am assuming the target variable is called TARGET.
    # install caret package (only needed once)
    install.packages('caret')
    # load package
    library(caret)
    # set a seed so the partition is the same each time
    set.seed(100)
    trainIndex = createDataPartition(mydata$TARGET, p = 0.7, list = FALSE, times = 1)
    train = mydata[trainIndex, ]
    test = mydata[-trainIndex, ]
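To show what splitting "based on the target variable" means, here is a rough base-R sketch of a stratified 70/30 split. It is not caret's actual implementation, and it assumes iris with its Species column as a stand-in for mydata and TARGET. The idea is to sample 70% of the rows within each class separately.

```r
# Illustration only: iris$Species stands in for mydata$TARGET
set.seed(100)
mydata <- iris

# Group the row indices by class of the target variable
byClass <- split(seq_len(nrow(mydata)), mydata$Species)

# Sample 70% of the rows inside each class, then combine
trainIndex <- unlist(lapply(byClass, function(rows) {
  rows[sample.int(length(rows), floor(0.7 * length(rows)))]
}), use.names = FALSE)

train <- mydata[trainIndex, ]
test <- mydata[-trainIndex, ]

# Class proportions in each piece
print(prop.table(table(train$Species)))
print(prop.table(table(test$Species)))
```

Because iris has the same number of rows in each class, the class proportions in train and test should match those of the full data. With a plain sample() split they can drift, especially for rare classes.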
There are more ways of splitting data. If you want to read more,
please refer to the following link