[python] TensorFlow - Read all examples from a TFRecords at once?



2 Answers

To read all the data just once, you need to pass num_epochs to the string_input_producer. When all the record are read, the .read method of reader will throw an error, which you can catch. Simplified example:

import tensorflow as tf

def read_and_decode(filename_queue):
 reader = tf.TFRecordReader()
 _, serialized_example = reader.read(filename_queue)
 features = tf.parse_single_example(
  serialized_example,
  features={
      'image_raw': tf.FixedLenFeature([], tf.string)
  })
 image = tf.decode_raw(features['image_raw'], tf.uint8)
 return image


def get_all_records(FILE):
 with tf.Session() as sess:
   filename_queue = tf.train.string_input_producer([FILE], num_epochs=1)
   image = read_and_decode(filename_queue)
   init_op = tf.initialize_all_variables()
   sess.run(init_op)
   coord = tf.train.Coordinator()
   threads = tf.train.start_queue_runners(coord=coord)
   try:
     while True:
       example = sess.run([image])
   except tf.errors.OutOfRangeError, e:
     coord.request_stop(e)
   finally:
     coord.request_stop()
     coord.join(threads)

get_all_records('/path/to/train-0.tfrecords')

And to use tf.parse_example (which is faster than tf.parse_single_example) you need to first batch the examples like that:

batch = tf.train.batch([serialized_example], num_examples, capacity=num_examples)
parsed_examples = tf.parse_example(batch, feature_spec)

Unfortunately this way you'd need to know the num of examples beforehand.

Question

How do you read all examples from a TFRecords at once?

I've been using tf.parse_single_example to read out individual examples using code similar to that given in the method read_and_decode in the example of the fully_connected_reader. However, I want to run the network against my entire validation dataset at once, and so would like to load them in their entirety instead.

I'm not entirely sure, but the documentation seems to suggest I can use tf.parse_example instead of tf.parse_single_example to load the entire TFRecords file at once. I can't seem to get this to work though. I'm guessing it has to do with how I specify the features, but I'm not sure how in the feature specification to state that there are multiple examples.

In other words, my attempt of using something similar to:

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
features = tf.parse_example(serialized_example, features={
    'image_raw': tf.FixedLenFeature([], tf.string),
    'label': tf.FixedLenFeature([], tf.int64),
})

isn't working, and I assume it's because the features aren't expecting multiple examples at once (but again, I'm not sure). [This results in an error of ValueError: Shape () must have rank 1]

Is this the proper way to read all the records at once? And if so, what do I need to change to actually read the records? Thank you much!




Besides, if you don't think 'tf.train.shuffle_batch' is the way you need. You may try combination of tf.TFRecordReader().read_up_to() and tf.parse_example() as well. Here's the example for your reference:

def read_tfrecords(folder_name, bs):
    filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once(glob.glob(folder_name + "/*.tfrecords")))
    reader = tf.TFRecordReader()
    _, serialized = reader.read_up_to(filename_queue, bs)
    features = tf.parse_example(
        serialized,
        features={
            'label': tf.FixedLenFeature([], tf.string),
            'image': tf.FixedLenFeature([], tf.string)
        }
    )
    record_image = tf.decode_raw(features['image'], tf.uint8)
    image = tf.reshape(record_image, [-1, 250, 151, 1])
    label = tf.cast(features['label'], tf.string)
    return image, label



If you need to read all the data from TFRecord at once, you can write way easier solution just in a few lines of code using tf_record_iterator:

An iterator that read the records from a TFRecords file.

To do this, you just:

  1. create an example
  2. iterate over records from the iterator
  3. parse each record and read each feature depending on its type

Here is an example with explanation how to read each type.

example = tf.train.Example()
for record in tf.python_io.tf_record_iterator(<tfrecord_file>):
    example.ParseFromString(record)
    f = example.features.feature
    v1 = f['int64 feature'].int64_list.value[0]
    v2 = f['float feature'].float_list.value[0]
    v3 = f['bytes feature'].bytes_list.value[0]
    # for bytes you might want to represent them in a different way (based on what they were before saving)
    # something like `np.fromstring(f['img'].bytes_list.value[0], dtype=np.uint8
    # Now do something with your v1/v2/v3



Related