The most frequent value for Embarked is 'S', so we will use it to replace the null values. For Age, we will replace missing values with the average age.

    # Mean of the known ages, computed on the training data only
    mean_age_train = np.mean(df.loc[df['Age'].notna(), 'Age'].values)
    df.loc[df['Age'].isna(), 'Age'] = mean_age_train
    df.loc[df['Embarked'].isna(), 'Embarked'] = 'S'

Note that we should store everything we learn from the training data, such as the most frequent Embarked class or the Age mean, because we will need this information at prediction time if the test data also contains missing values.

We will also compute and store the Fare mean in case we need it on the test data.

    mean_fare_train = np.mean(df.loc[df['Fare'].notna(), 'Fare'].values)

We got rid of the null values; now let's see what's next. Machine learning algorithms don't work with text (at least not directly), so we need to convert all strings to numbers. We will use ordinal encoding for this conversion. Ordinal encoding converts a categorical variable into numbers by assigning each category an integer. We're going to use Scikit-Learn's OrdinalEncoder to apply this transformation to the variables Sex, Ticket, and Embarked. Before that, we will back up the data in its current format for future use.

    df_bkp = df.copy()

    from sklearn.preprocessing import OrdinalEncoder

    # OrdinalEncoder expects a 2-D input; ravel() flattens the result back into a column
    df['Sex'] = OrdinalEncoder().fit_transform(df['Sex'].values.reshape((-1, 1))).ravel()
    df['Ticket'] = OrdinalEncoder().fit_transform(df['Ticket'].values.reshape((-1, 1))).ravel()
    df['Embarked'] = OrdinalEncoder().fit_transform(df['Embarked'].values.reshape((-1, 1))).ravel()

Visualizations
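To make the "learn on train, reuse on test" idea concrete, here is a minimal library-free sketch of the preprocessing described so far: filling missing Age and Embarked values and assigning each category an integer. The function names `fit_preprocessing` and `apply_preprocessing`, the sample rows, and the choice to map unseen categories to -1 are assumptions made for illustration; they are not part of the original code. Like Scikit-Learn's OrdinalEncoder with default settings, the sketch numbers categories in sorted order, although OrdinalEncoder returns floats rather than integers.

```python
from statistics import mean


def fit_preprocessing(train_rows):
    """Learn fill values and category-to-integer mappings from training rows."""
    ages = [r["Age"] for r in train_rows if r["Age"] is not None]
    ports = [r["Embarked"] for r in train_rows if r["Embarked"] is not None]
    codes = {}
    for col in ("Sex", "Embarked"):
        # Sort the categories so the numbering is deterministic
        categories = sorted({r[col] for r in train_rows if r[col] is not None})
        codes[col] = {cat: i for i, cat in enumerate(categories)}
    return {
        "age_mean": mean(ages),
        # Most frequent port; iterating in sorted order makes ties deterministic
        "embarked_mode": max(sorted(set(ports)), key=ports.count),
        "codes": codes,
    }


def apply_preprocessing(rows, params, unknown=-1):
    """Fill missing values and encode categories using stored training parameters."""
    out = []
    for row in rows:
        row = dict(row)  # work on a copy, leaving the input untouched
        if row["Age"] is None:
            row["Age"] = params["age_mean"]
        if row["Embarked"] is None:
            row["Embarked"] = params["embarked_mode"]
        for col, mapping in params["codes"].items():
            # Categories never seen during training get the `unknown` code (an assumption)
            row[col] = mapping.get(row[col], unknown)
        out.append(row)
    return out


# Hypothetical sample rows, not taken from the real dataset
train = [
    {"Sex": "male", "Age": 22.0, "Embarked": "S"},
    {"Sex": "female", "Age": 38.0, "Embarked": "C"},
    {"Sex": "female", "Age": None, "Embarked": "S"},
    {"Sex": "male", "Age": 30.0, "Embarked": None},
]
test = [{"Sex": "female", "Age": None, "Embarked": "Q"}]

params = fit_preprocessing(train)
train_encoded = apply_preprocessing(train, params)
test_encoded = apply_preprocessing(test, params)
```

Because `params` is computed once from the training rows and then passed to every later call, the test rows are filled and encoded with exactly the same values as the training rows, which is the reason for storing these statistics in the first place.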