Predicting the survival of Titanic passengers

In this project, I used different machine learning models to predict the survival of passengers based on various features in the dataset.

Data

  • The data for this project was taken from Kaggle.
  • The dataset is labelled with two classes: "Survived: 1" and "Not survived: 0".

Step 1

  • Identified the discrepancy between the survival rates of male and female passengers. Looked at each feature independently.
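A comparison like the one in this step can be sketched as follows. This is a minimal illustration, assuming pandas and the Kaggle column names "Sex" and "Survived"; the six rows are made-up sample data, not the project's data.

```python
import pandas as pd

# Made-up sample rows for illustration; the real project would load the
# Kaggle training file instead (file name and loading code are assumptions).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# The mean of a 0/1 label within a group is that group's survival rate.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```

The same groupby pattern can be repeated for any other categorical feature to inspect it independently.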

Step 2

  • Performed one-hot encoding, feature removal, and feature scaling on the dataset's features.
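The three preprocessing operations named in this step might look roughly like this. It is a sketch under assumptions: pandas and scikit-learn are assumed tools, the choice of "Ticket" as the removed column and of "Age" and "Fare" as the scaled columns is hypothetical, and the rows are made-up sample data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample rows; which columns the project actually kept is not stated.
df = pd.DataFrame({
    "Ticket": ["A/5 21171", "PC 17599", "113803", "373450"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Feature removal: drop a column assumed (for this sketch) to be uninformative.
features = df.drop(columns=["Ticket"])

# One-hot encoding: drop_first removes one redundant indicator column.
features = pd.get_dummies(features, columns=["Sex"], drop_first=True, dtype=float)

# Feature scaling: standardise numeric columns to zero mean and unit variance.
numeric_cols = ["Age", "Fare"]
features[numeric_cols] = StandardScaler().fit_transform(features[numeric_cols])
print(features)
```

In a real pipeline the scaler would be fitted on the training split only and then applied to the test split, so that no test-set statistics leak into training.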

Step 3

  • Trained four different machine learning models on the dataset: logistic regression, kernel SVM, decision tree, and random forest.
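Training the four model types listed here could be sketched as below. scikit-learn is an assumption, as are all hyperparameters (the RBF kernel, the number of trees, the split size); synthetic data from make_classification stands in for the preprocessed Titanic features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters below are illustrative defaults, not the project's settings.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Kernel SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

Keeping the models in a dictionary makes it easy to loop over them with identical training and evaluation code.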

Step 4

  • Performed grid search and k-fold cross-validation to estimate each model's accuracy in predicting survival on the test data.
  • The best-performing model was the random forest, with a mean accuracy score of 84%.
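Grid search followed by k-fold cross-validation can be sketched for the random forest as follows. This assumes scikit-learn; the parameter grid, the fold counts (5 for the search, 10 for the final estimate), and the synthetic data are all illustrative choices, so the printed accuracy will not match the 84% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data standing in for the preprocessed Titanic features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hypothetical grid: every combination is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# 10-fold cross-validation of the best configuration found by the search.
cv_scores = cross_val_score(
    search.best_estimator_, X, y, cv=10, scoring="accuracy"
)
print(search.best_params_)
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

Scoring the tuned model on the same data used for the search tends to give a slightly optimistic estimate; a separate held-out set or nested cross-validation would give a stricter one.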


    Some visualisations from the project:

    Distribution of titles in the names of passengers:

    Survival ratio of the training and predicted datasets:

    Link to GitHub repository