[python] How could I use batch normalization in TensorFlow?



Answers

To add yet another alternative: as of TensorFlow 1.0 (February 2017), the high-level tf.layers.batch_normalization API is also included in TensorFlow itself.

It is super simple to use:

# Set this to True for training and False for testing
training = tf.placeholder(tf.bool)

x = tf.layers.dense(input_x, units=100)
x = tf.layers.batch_normalization(x, training=training)
x = tf.nn.relu(x)

... except that it adds extra ops to the graph (for updating its mean and variance variables) in such a way that they will not be dependencies of your training op. You can either run those ops separately:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
sess.run([train_op, extra_update_ops], ...)

or manually add the update ops as dependencies of your training op, and then just run your training op as usual:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_op = optimizer.minimize(loss)
...
sess.run([train_op], ...)
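At session time you then feed the training placeholder accordingly. A small sketch (the loss tensor and the feed values here are assumptions, not part of the snippet above):

# Training step: batch statistics are used and the moving averages get updated.
sess.run(train_op, feed_dict={input_x: batch_xs, training: True})

# Evaluation: the accumulated moving mean/variance are used instead.
sess.run(loss, feed_dict={input_x: test_xs, training: False})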
Question

I would like to use batch normalization in TensorFlow, since I found it in the source code core/ops/nn_ops.cc. However, I could not find it documented on tensorflow.org.

BN has different semantics in MLPs and CNNs, so I am not sure exactly what this BN does.

I did not find a method called MovingMoments either.
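For context, the MLP/CNN difference is only in which axes the moments are reduced over: per feature over the batch axis for an MLP, per channel over the batch and spatial axes for a CNN. A minimal sketch with tf.nn.moments (the shapes are illustrative, assuming NHWC layout):

import tensorflow as tf

x_mlp = tf.placeholder(tf.float32, [None, 100])         # [batch, features]
x_cnn = tf.placeholder(tf.float32, [None, 28, 28, 64])  # [batch, H, W, channels]

# MLP: one mean/variance per feature, reduced over the batch axis.
mean_mlp, var_mlp = tf.nn.moments(x_mlp, axes=[0])

# CNN: one mean/variance per channel, reduced over batch and spatial axes.
mean_cnn, var_cnn = tf.nn.moments(x_cnn, axes=[0, 1, 2])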

The C++ code is copied here for reference:

REGISTER_OP("BatchNormWithGlobalNormalization")
    .Input("t: T")
    .Input("m: T")
    .Input("v: T")
    .Input("beta: T")
    .Input("gamma: T")
    .Output("result: T")
    .Attr("T: numbertype")
    .Attr("variance_epsilon: float")
    .Attr("scale_after_normalization: bool")
    .Doc(R"doc(
Batch normalization.

t: A 4D input Tensor.
m: A 1D mean Tensor with size matching the last dimension of t.
  This is the first output from MovingMoments.
v: A 1D variance Tensor with size matching the last dimension of t.
  This is the second output from MovingMoments.
beta: A 1D beta Tensor with size matching the last dimension of t.
  An offset to be added to the normalized tensor.
gamma: A 1D gamma Tensor with size matching the last dimension of t.
  If "scale_after_normalization" is true, this tensor will be multiplied
  with the normalized tensor.
variance_epsilon: A small float number to avoid dividing by 0.
scale_after_normalization: A bool indicating whether the resulted tensor
  needs to be multiplied with gamma.
)doc");



So, a simple example of how to use this batchnorm class:

from bn_class import *

with tf.name_scope('Batch_norm_conv1') as scope:
    # a_conv1 is the pre-activation output of the first conv layer and
    # bn_train is a boolean placeholder; both are assumed defined elsewhere.
    ewma = tf.train.ExponentialMovingAverage(decay=0.99)
    bn_conv1 = ConvolutionalBatchNormalizer(num_filt_1, 0.001, ewma, True)
    update_assignments = bn_conv1.get_assigner()
    a_conv1 = bn_conv1.normalize(a_conv1, train=bn_train)
    h_conv1 = tf.nn.relu(a_conv1)
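Presumably (the bn_class module comes from an external write-up and is not shown here), the update_assignments op returned by get_assigner() then has to be run alongside the training step so that the moving averages stay fresh; something along these lines, where train_step and the feed tensors are assumptions:

# Hypothetical training iteration: refresh the EMA together with the weights.
sess.run([train_step, update_assignments],
         feed_dict={x: batch_xs, y_: batch_ys, bn_train: True})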



Just a heads-up: I can't comment because I don't have 50 rep. But @bgshi's answer above does not seem correct. When phase_train is set to False, it still updates the ema mean and variance. This can be verified with the following code snippet.

x = tf.placeholder(tf.float32, [None, 20, 20, 10], name='input')
phase_train = tf.placeholder(tf.bool, name='phase_train')

# generate random noise to pass into batch norm
x_gen = tf.random_normal([50,20,20,10])
pt_false = tf.Variable(tf.constant(False))  # note: defined but unused below

# generate a constant input to pass into batch norm
y = x_gen.eval()

[bn, bn_vars] = batch_norm(x, 10, phase_train)

tf.initialize_all_variables().run()  # assumes an active tf.InteractiveSession
train_step = lambda: bn.eval({x:x_gen.eval(), phase_train:True})
test_step = lambda: bn.eval({x:y, phase_train:False})
test_step_c = lambda: bn.eval({x:y, phase_train:True})

# Verify that this differs, as expected: two different x's have different norms
print(train_step()[0][0][0])
print(train_step()[0][0][0])

# Verify that this is the same as expected: the same x (y) yields the same norm
print(test_step_c()[0][0][0])
print(test_step_c()[0][0][0])

# THIS IS DIFFERENT, but it should be the same: it should only be reading from the ema.
print(test_step()[0][0][0])
print(test_step()[0][0][0])
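If the updates really do fire when phase_train is False, the usual fix is to gate the moment computation with tf.cond so that the inference branch only reads the EMA values. A sketch of that pattern (my own, not @bgshi's exact code):

def batch_norm_fixed(x, n_out, phase_train, decay=0.99):
    """Batch norm whose EMA is updated only when phase_train is True."""
    beta = tf.Variable(tf.zeros([n_out]), name='beta')
    gamma = tf.Variable(tf.ones([n_out]), name='gamma')
    batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
    ema = tf.train.ExponentialMovingAverage(decay=decay)

    def mean_var_with_update():
        # Apply the EMA update, then return the batch statistics.
        ema_apply_op = ema.apply([batch_mean, batch_var])
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)

    # Only the True branch touches the EMA; the False branch just reads it.
    mean, var = tf.cond(phase_train,
                        mean_var_with_update,
                        lambda: (ema.average(batch_mean), ema.average(batch_var)))
    return tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)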



There is also a batch_norm function coded by the developers: github.com/tensorflow/tensorflow/blob/… They don't have very good docs on how to use it, but here is how to use it (in my opinion):

from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          updates_collections=None,
                          is_training=True,
                          reuse=None,  # is this right?
                          trainable=True,
                          scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              updates_collections=None,
                              is_training=False,
                              reuse=True,  # is this right?
                              trainable=True,
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

To use it, you have to create a placeholder for train_phase that indicates whether you are in the training or inference phase (as in train_phase = tf.placeholder(tf.bool, name='phase_train')). Its value can be filled during inference or training with a tf.Session, as in:

test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})

or during training:

sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})
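For completeness, a minimal sketch of wiring batch_norm_layer into a network (the layer sizes and names here are illustrative assumptions):

x = tf.placeholder(tf.float32, [None, 784])
train_phase = tf.placeholder(tf.bool, name='phase_train')

W1 = tf.Variable(tf.truncated_normal([784, 100], stddev=0.1))
z1 = tf.matmul(x, W1)  # no bias needed; BN's beta plays that role
h1 = tf.nn.relu(batch_norm_layer(z1, train_phase, scope_bn='bn1'))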

and here is the function that implements BN according to them (it may be in the link I gave above, but I'm pasting it here just in case they change it):

@add_arg_scope
def batch_norm(inputs,
               decay=0.999,
               center=True,
               scale=False,
               epsilon=0.001,
               activation_fn=None,
               updates_collections=ops.GraphKeys.UPDATE_OPS,
               is_training=True,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               scope=None):
  """Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.
    "Batch Normalization: Accelerating Deep Network Training by Reducing
    Internal Covariate Shift"
    Sergey Ioffe, Christian Szegedy
  Can be used as a normalizer function for conv2d and fully_connected.
  Args:
    -inputs: a tensor of size `[batch_size, height, width, channels]`
            or `[batch_size, channels]`.
    -decay: decay for the moving average.
    -center: If True, subtract `beta`. If False, `beta` is ignored.
    -scale: If True, multiply by `gamma`. If False, `gamma` is
      not used. When the next layer is linear (also e.g. `nn.relu`), this can be
      disabled since the scaling can be done by the next layer.
    -epsilon: small float added to variance to avoid dividing by zero.
    -activation_fn: Optional activation function.
    -updates_collections: collections to collect the update ops for computation.
      If None, a control dependency would be added to make sure the updates are
      computed.
    -is_training: whether or not the layer is in training mode. In training mode
      it would accumulate the statistics of the moments into `moving_mean` and
      `moving_variance` using an exponential moving average with the given
      `decay`. When it is not in training mode then it would use the values of
      the `moving_mean` and the `moving_variance`.
    -reuse: whether or not the layer and its variables should be reused. To be
      able to reuse the layer scope must be given.
    -variables_collections: optional collections for the variables.
    -outputs_collections: collections to add the outputs.
    -trainable: If `True` also add variables to the graph collection
      `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
    -scope: Optional scope for `variable_op_scope`.
  Returns:
    a tensor representing the output of the operation.
  """
  with variable_scope.variable_op_scope([inputs],scope, 'BatchNorm', reuse=reuse) as sc:
    inputs_shape = inputs.get_shape()
    dtype = inputs.dtype.base_dtype
    axis = list(range(len(inputs_shape) - 1))
    params_shape = inputs_shape[-1:]
    # Allocate parameters for the beta and gamma of the normalization.
    beta, gamma = None, None
    if center:
      beta_collections = utils.get_variable_collections(variables_collections,'beta')
      beta = variables.model_variable('beta',shape=params_shape,dtype=dtype,initializer=init_ops.zeros_initializer,collections=beta_collections,trainable=trainable)
    if scale:
      gamma_collections = utils.get_variable_collections(variables_collections,'gamma')
      gamma = variables.model_variable('gamma',shape=params_shape,dtype=dtype,initializer=init_ops.ones_initializer,collections=gamma_collections,trainable=trainable)
    # Create moving_mean and moving_variance variables and add them to the
    # appropriate collections.
    moving_mean_collections = utils.get_variable_collections(variables_collections, 'moving_mean')
    moving_mean = variables.model_variable('moving_mean',shape=params_shape,dtype=dtype,initializer=init_ops.zeros_initializer,trainable=False,collections=moving_mean_collections)
    moving_variance_collections = utils.get_variable_collections(variables_collections, 'moving_variance')
    moving_variance = variables.model_variable('moving_variance',shape=params_shape,dtype=dtype,initializer=init_ops.ones_initializer,trainable=False,collections=moving_variance_collections)
    if is_training:
      # Calculate the moments based on the individual batch.
      mean, variance = nn.moments(inputs, axis, shift=moving_mean)
      # Update the moving_mean and moving_variance moments.
      update_moving_mean = moving_averages.assign_moving_average(moving_mean, mean, decay)
      update_moving_variance = moving_averages.assign_moving_average(moving_variance, variance, decay)
      if updates_collections is None:
        # Make sure the updates are computed here.
        with ops.control_dependencies([update_moving_mean,update_moving_variance]):
          outputs = nn.batch_normalization(inputs, mean, variance, beta, gamma, epsilon)
      else:
        # Collect the updates to be computed later.
        ops.add_to_collections(updates_collections, update_moving_mean)
        ops.add_to_collections(updates_collections, update_moving_variance)
        outputs = nn.batch_normalization(inputs, mean, variance, beta, gamma, epsilon)
    else:
      outputs = nn.batch_normalization(
          inputs, moving_mean, moving_variance, beta, gamma, epsilon)
    outputs.set_shape(inputs.get_shape())
    if activation_fn:
      outputs = activation_fn(outputs)
    return utils.collect_named_outputs(outputs_collections, sc.name, outputs)

I am pretty sure this is correct according to the discussion on GitHub.

There also seems to be another useful link:

http://r2rt.com/implementing-batch-normalization-in-tensorflow.html



