+4 votes
in Machine Learning by (66.2k points)
CatBoost classifier can be run on data with non-numeric features. Is there any parameter in CatBoost to specify non-numeric features, or will it just handle them itself?

1 Answer

+2 votes
by (281k points)
Best answer

CatBoost does not detect non-numeric features on its own. You need to tell it which columns are non-numeric; otherwise, it will throw an error when it encounters non-numeric values. The parameter for this is "cat_features", which you can pass to the fit() function as a list or a NumPy array of column indices. CatBoost converts those non-numeric features into numeric features internally, which is why it needs to know their indices.

fit(X, y=None, cat_features=None, text_features=None, ...)

For example:

If your data has both numeric and non-numeric features, first find the column indices of the non-numeric ones.
Let's assume the non-numeric features are at indices 2, 7, 9, and 10. The fit() call will then look like this:

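from catboost import CatBoostClassifier

cat_features_indx = [2, 7, 9, 10]  # column indices of the non-numeric features

model = CatBoostClassifier()
model.fit(X, y, cat_features=cat_features_indx)  # X, y: your features and labels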
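
If you don't want to hard-code the indices, one option is to compute them from the data. The sketch below is only an illustration: the helper names non_numeric_column_indices and fit_classifier, the toy rows, and the iterations=50 setting are my own assumptions, not part of CatBoost. It treats any column containing a value that is not an int or float as non-numeric.

```python
def non_numeric_column_indices(rows):
    """Return indices of columns that contain at least one non-numeric value.

    Hypothetical helper (not part of CatBoost); `rows` is a list of
    equal-length lists, one per sample.
    """
    n_cols = len(rows[0])
    return [
        j
        for j in range(n_cols)
        if any(not isinstance(row[j], (int, float)) for row in rows)
    ]


def fit_classifier(rows, labels):
    """Fit a CatBoostClassifier, passing the detected indices as cat_features."""
    # Imported here so the helper above can be used without catboost installed.
    from catboost import CatBoostClassifier

    cat_idx = non_numeric_column_indices(rows)
    # iterations=50 is an arbitrary small value chosen for this toy example.
    model = CatBoostClassifier(iterations=50, verbose=False)
    model.fit(rows, labels, cat_features=cat_idx)
    return model


# Toy data, made up for illustration: columns 1 and 3 hold strings.
rows = [
    [5.1, "red", 3, "yes"],
    [4.9, "blue", 1, "no"],
    [6.2, "red", 2, "no"],
    [5.8, "green", 4, "yes"],
]
labels = [1, 0, 0, 1]

print(non_numeric_column_indices(rows))  # [1, 3]
```

Calling fit_classifier(rows, labels) would then train on this data with columns 1 and 3 marked as categorical. Note that this check only catches columns stored as non-numeric values; a categorical feature encoded as integers would still have to be listed by hand.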

You can find a CatBoost classifier example with categorical mushroom data here.