Comprehending Cross Validation throughout the Data Science pipeline



2. Exploratory data analysis

  • Analyse the target variable
  • Check if the data is balanced
  • Check the correlations

3. Split the data

4. Choose a Baseline algorithm

  • Train and Test the Model
  • Choose an evaluation metric
  • Refine our dataset
  • Feature engineering

5. Test Alternative Models: Ensemble models

6. Choose the best model and optimise its parameters

In this context, we outline below two more cases where we can use cross validation:

  1. In the choice of alternative models, and
  2. In hyperparameter tuning

We explain these below.

1. Choosing alternate models:

If we have two models and want to see which one is better, we can use cross validation to compare the two on a given dataset. For the code listed above, this is shown in the following section.

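"""### Test Alternative Models"""

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X and y are the feature matrix and target prepared in the earlier code.
# Each call returns one accuracy score per fold; .mean() averages the 5 folds.
logistic = LogisticRegression()

cross_val_score(logistic, X, y, cv=5, scoring="accuracy").mean()

rnd_clf = RandomForestClassifier()

cross_val_score(rnd_clf, X, y, cv=5, scoring="accuracy").mean()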
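To make the `cv=5` argument more concrete, here is a minimal sketch of how a dataset can be divided into five folds, with each fold taking one turn as the test set. The helper name `k_fold_indices` is hypothetical and written only for illustration. It also simplifies what scikit-learn does: for classifiers with an integer `cv`, scikit-learn uses stratified folds, which this sketch does not attempt.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    Hypothetical helper for illustration only; it does not stratify by class.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Ten samples, five folds: every sample is used for testing exactly once.
folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`. Averaging those five scores gives the kind of number that `.mean()` produces in the comparison above.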

 

2. Hyperparameter tuning

Finally, cross validation is also used in hyperparameter tuning.

As per “Cross validation parameter tuning grid search”:

“In machine learning, two tasks are commonly done at the same time in data pipelines: cross validation and (hyper)parameter tuning. Cross validation is the process of training learners using one set of data and testing it using a different set. Parameter tuning is the process of selecting the values for a model’s parameters that maximize the accuracy of the model.”
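As a rough sketch of how the two tasks combine, scikit-learn's `GridSearchCV` runs a cross validation for every candidate parameter value and keeps the best one. The synthetic data (`X_demo`, `y_demo`) and the grid of `C` values below are assumptions chosen for illustration, not the dataset or settings used in this article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X and y (assumption: not the article's data).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical grid: candidate values for the regularisation strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored with 5-fold cross validation on accuracy.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print("mean CV accuracy of best setting:", search.best_score_)
```

Because the best setting is picked using these same folds, its score tends to be somewhat optimistic. A separate held-out test set gives a fairer final estimate.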

So, to conclude, cross validation is a technique used in multiple parts of the data science pipeline.
