sklearn How to calculate F1 Macro in Keras?




sklearn f1 score (4)

i've tried to use the codes given from Keras before they're removed. Here's the code :

def precision(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def fbeta_score(y_true, y_pred, beta=1):
    if beta < 0:
        raise ValueError('The lowest choosable beta is zero (only precision).')

    # If there are no true positives, fix the F score at 0 like sklearn.
    if K.sum(K.round(K.clip(y_true, 0, 1))) == 0:
        return 0

    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    bb = beta ** 2
    fbeta_score = (1 + bb) * (p * r) / (bb * p + r + K.epsilon())
    return fbeta_score

def fmeasure(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=1)

From what i saw (i'm an amateur in this), it seems like they use the correct formula. But, when i tried to use it as a metrics in the training process, I got exactly equal output for val_accuracy, val_precision, val_recall, and val_fmeasure. I do believe that it might happen even if the formula correct, but i believe it is unlikely. Any explanation for this issue? Thank you


I also suggest this work-around

  • install keras_metrics package by ybubnov
  • call model.fit(nb_epoch=1, ...) inside a for loop taking advantage of the precision/recall metrics outputted after every epoch

Something like this:

    for mini_batch in range(epochs):
        model_hist = model.fit(X_train, Y_train, batch_size=batch_size, epochs=1,
                            verbose=2, validation_data=(X_val, Y_val))

        precision = model_hist.history['val_precision'][0]
        recall = model_hist.history['val_recall'][0]
        f_score = (2.0 * precision * recall) / (precision + recall)
        print 'F1-SCORE {}'.format(f_score)



Using a Keras metric function is not the right way to calculate F1 or AUC or something like that.

The reason for this is that the metric function is called at each batch step at validation. That way the Keras system calculates an average on the batch results. And that is not the right F1 score.

Thats the reason why F1 score got removed from the metric functions in keras. See here:

The right way to do this is to use a custom callback function in a way like this: https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2


since Keras 2.0 metrics f1, precision, and recall have been removed. The solution is to use a custom metric function:

from keras import backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))


model.compile(loss='binary_crossentropy',
          optimizer= "adam",
          metrics=[f1])

The return line of this function

return 2*((precision*recall)/(precision+recall+K.epsilon()))

was modified by adding the constant epsilon, in order to avoid division by 0. Thus NaN will not be computed.