Case Studies of (Batch) Data Reading in TensorFlow, and Packing and Reading TFRecord Files

by admin · Published October 8, 2020 · Updated October 8, 2020

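Overview:

Reading single samples:

Method 1: slice_input_producer()

# The result can be inspected directly with Session.run([images, labels]); the first argument must be wrapped in a list, e.g. [...]

[images, labels] = tf.train.slice_input_producer([images, labels], num_epochs=None, shuffle=True)

Method 2: string_input_producer()

# Requires a file reader: its read() method returns (key, value), which you then inspect with Session.run(value)

file_queue = tf.train.string_input_producer(filename, num_epochs=None, shuffle=True)

reader = tf.WholeFileReader()           # define the file reader

key, value = reader.read(file_queue)    # key: file name; value: file contents

!!! num_epochs=None sets no epoch limit, so the number of elements the queue can yield is unbounded (the dataset repeats indefinitely).

!!! If num_epochs is not None, the function creates a local counter variable `epochs`, so local variables must be initialized with tf.local_variables_initializer().

!!! Both methods above can generate a filename queue.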

(Shuffled) batch reading:

batchsize = 2  # number of samples per batch

tf.train.batch(tensors, batch_size=batchsize)

tf.train.shuffle_batch(tensors, batch_size=batchsize, capacity=batchsize*10, min_after_dequeue=batchsize*5) # capacity > min_after_dequeue
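The two calls differ in where the randomness comes from: batch() dequeues elements in arrival order, whereas shuffle_batch() keeps up to `capacity` elements in a buffer and draws each batch at random from it, dequeuing only while at least `min_after_dequeue` elements would remain, so new arrivals get mixed with older ones. Below is a rough stdlib model of that buffer, not TensorFlow code: TensorFlow refills the buffer from background threads, and `shuffle_batches` is a hypothetical name. The model demands capacity >= min_after_dequeue + batch_size so a full batch always fits above the mixing floor; the TF 1.x reading-data guide gave a similar recommendation, min_after_dequeue + (num_threads + a small safety margin) * batch_size.

```python
import random


def shuffle_batches(stream, batch_size, capacity, min_after_dequeue, seed=None):
    """Rough model of shuffle_batch(): random draws from a bounded buffer.

    While input is still arriving, the buffer is topped up to `capacity`
    before each batch, so at least `min_after_dequeue` elements stay behind
    to mix with new arrivals. Once the input is exhausted, the buffer is
    drained and a final partial batch is dropped.
    """
    # This model needs room for one full batch above the mixing floor.
    if capacity < min_after_dequeue + batch_size:
        raise ValueError("model requires capacity >= min_after_dequeue + batch_size")
    rng = random.Random(seed)
    it = iter(stream)
    buffer, exhausted = [], False
    while True:
        # Refill the buffer (the queue-runner threads' job in TensorFlow).
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(it))
            except StopIteration:
                exhausted = True
        if len(buffer) < batch_size:
            return
        batch = []
        for _ in range(batch_size):
            j = rng.randrange(len(buffer))                 # pick a random element
            buffer[j], buffer[-1] = buffer[-1], buffer[j]  # swap it to the end
            batch.append(buffer.pop())                     # and remove it in O(1)
        yield batch


# batch() would return 0..9 in order; this model mixes them within the buffer.
for b in shuffle_batches(range(10), batch_size=2, capacity=8, min_after_dequeue=4, seed=0):
    print(b)
```

With the settings used in this summary (capacity=batchsize*10, min_after_dequeue=batchsize*5) the model's constraint is satisfied.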

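!!! For all of the reading methods above, the queue-runner threads must be started with tf.train.start_queue_runners() before calling Session.run(); otherwise Session.run() blocks indefinitely waiting for data.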

Packing and reading TFRecord files

1. Reading single samples

Method 1: slice_input_producer()

def slice_input_producer(tensor_list, num_epochs=None, shuffle=True, seed=None, capacity=32, shared_name=None, name=None)

Example 1:

import tensorflow as tf

images = ['image1.jpg', 'image2.jpg', 'image3.jpg', 'image4.jpg']
labels = [1, 2, 3, 4]

# [images, labels] = tf.train.slice_input_producer([images, labels], num_epochs=None, shuffle=True)

# With num_epochs=2 the queue holds only 2*4 = 8 samples, so fetching the 9th sample raises an error (OutOfRangeError)
# [images, labels] = tf.train.slice_input_producer([images, labels], num_epochs=2, shuffle=True)

data = tf.train.slice_input_producer([images, labels], num_epochs=None, shuffle=True)
print(type(data))   # <class 'list'>

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # only required when num_epochs is not None
    coord = tf.train.Coordinator()  # thread coordinator
    threads = tf.train.start_queue_runners(sess, coord)  # start the threads of all queue runners in the graph

    for i in range(10):
        print(sess.run(data))

    coord.request_stop()
    coord.join(threads)

"""
Output:
[b'image2.jpg', 2]
[b'image1.jpg', 1]
[b'image3.jpg', 3]
[b'image4.jpg', 4]
[b'image2.jpg', 2]
[b'image1.jpg', 1]
[b'image3.jpg', 3]
[b'image4.jpg', 4]
[b'image2.jpg', 2]
[b'image3.jpg', 3]
"""

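!!! The first argument of slice_input_producer() must be a list whose elements are lists or Tensors, e.g. [images, labels].

!!! Mind the num_epochs setting: None repeats the data indefinitely; an integer requires tf.local_variables_initializer() and raises tf.errors.OutOfRangeError once num_epochs × dataset-size samples have been read.

Method 2: string_input_producer()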

def string_input_producer(string_tensor, num_epochs=None, shuffle=True, seed=None, capacity=32, shared_name=None, name=None, cancel_op=None)

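File readers

Each file type has its own file reader, which we call a reader object;

its read() method takes files from the filename queue, reads them, and outputs key (the file name) and value (the file contents);

reader = tf.TextLineReader()      ### reads one line at a time; suitable for any text file

reader = tf.TFRecordReader()      ### A Reader that outputs the records from a TFRecords file

reader = tf.WholeFileReader()     ### reads a whole file at once; suitable for images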
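The readers differ mainly in what a single read() returns. Judging from the output of Example 2 below, WholeFileReader yields (file name, entire contents) and TextLineReader yields ("file name:line number", the line without its newline). The stdlib sketch below only models those two record shapes to make the difference concrete; `whole_file_records` and `text_line_records` are hypothetical helpers, not TensorFlow APIs, and TFRecordReader's binary framing is not covered.

```python
import os
import tempfile


def whole_file_records(paths):
    """Model of WholeFileReader: one (key, value) record per file."""
    for path in paths:
        with open(path, "rb") as f:
            yield path.encode(), f.read()


def text_line_records(paths):
    """Model of TextLineReader: one record per line, key = b'<file>:<line number>'."""
    for path in paths:
        with open(path, "rb") as f:
            for lineno, line in enumerate(f, start=1):
                yield ("%s:%d" % (path, lineno)).encode(), line.rstrip(b"\r\n")


# A small CSV file with the same contents as data/B.csv in Example 2.
b_csv = os.path.join(tempfile.mkdtemp(), "B.csv")
with open(b_csv, "wb") as f:
    f.write(b"4.jpg,4\n5.jpg,5\n6.jpg,6\n")

for key, value in whole_file_records([b_csv]):
    print(os.path.basename(key.decode()), value)   # B.csv b'4.jpg,4\n5.jpg,5\n6.jpg,6\n'
for key, value in text_line_records([b_csv]):
    print(os.path.basename(key.decode()), value)   # B.csv:1 b'4.jpg,4', then B.csv:2, B.csv:3
```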

Example 2: reading CSV files

import tensorflow as tf

filename = ['data/A.csv', 'data/B.csv', 'data/C.csv']

file_queue = tf.train.string_input_producer(filename, shuffle=True, num_epochs=2)   # create the filename queue
reader = tf.WholeFileReader()           # define the file reader (reads a whole file at once)
# reader = tf.TextLineReader()            # define the file reader (reads line by line)
key, value = reader.read(file_queue)    # key: file name; value: file contents
print(type(file_queue))

init = [tf.global_variables_initializer(), tf.local_variables_initializer()]
with tf.Session() as sess:
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            for i in range(6):
                print(sess.run([key, value]))
            break
    except tf.errors.OutOfRangeError:
        print('read done')
    finally:
        coord.request_stop()
    coord.join(threads)

"""
reader = tf.WholeFileReader()           # file reader that reads a whole file at once
Output:
[b'data/C.csv', b'7.jpg,7\n8.jpg,8\n9.jpg,9\n']
[b'data/B.csv', b'4.jpg,4\n5.jpg,5\n6.jpg,6\n']
[b'data/A.csv', b'1.jpg,1\n2.jpg,2\n3.jpg,3\n']
[b'data/A.csv', b'1.jpg,1\n2.jpg,2\n3.jpg,3\n']
[b'data/B.csv', b'4.jpg,4\n5.jpg,5\n6.jpg,6\n']
[b'data/C.csv', b'7.jpg,7\n8.jpg,8\n9.jpg,9\n']
"""
"""
reader = tf.TextLineReader()           # file reader that reads line by line
Output:
[b'data/B.csv:1', b'4.jpg,4']
[b'data/B.csv:2', b'5.jpg,5']
[b'data/B.csv:3', b'6.jpg,6']
[b'data/C.csv:1', b'7.jpg,7']
[b'data/C.csv:2', b'8.jpg,8']
[b'data/C.csv:3', b'9.jpg,9']
"""

Example 3: reading images (each read returns a whole image file, not one line)

import tensorflow as tf

filename = ['1.jpg', '2.jpg']
filename_queue = tf.train.string_input_producer(filename, shuffle=False, num_epochs=1)
reader = tf.WholeFileReader()              # file reader
key, value = reader.read(filename_queue)   # read a file; key: file name; value: image data (bytes)

with tf.Session() as sess:
    tf.local_variables_initializer().run()
    coord = tf.train.Coordinator()      # thread coordinator
    threads = tf.train.start_queue_runners(sess, coord)

    for i in range(len(filename)):
        image_data = sess.run(value)
        with open('img_%d.jpg' % i, 'wb') as f:
            f.write(image_data)
    coord.request_stop()
    coord.join(threads)
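Every example so far follows the same thread pattern: start_queue_runners() launches background threads that keep filling the queue, Session.run() dequeues from it (and would simply wait if no thread were filling it), and coord.request_stop() plus coord.join(threads) shut those threads down. The sketch below is an analogy built on Python's threading and queue modules, not TensorFlow's implementation: a producer thread stands in for a QueueRunner and a threading.Event for tf.train.Coordinator. `run_pipeline` is a hypothetical name.

```python
import queue
import threading


def run_pipeline(samples, n_reads, capacity=4):
    """Fill a bounded queue from a background thread and read n_reads items."""
    q = queue.Queue(maxsize=capacity)   # bounded, like a TF queue with `capacity`
    stop = threading.Event()            # plays the role of tf.train.Coordinator

    def producer():
        i = 0
        while not stop.is_set():        # like `while not coord.should_stop()`
            try:
                # Time out periodically so the thread notices a stop request.
                q.put(samples[i % len(samples)], timeout=0.05)
                i += 1
            except queue.Full:
                continue

    t = threading.Thread(target=producer, daemon=True)
    t.start()                           # like tf.train.start_queue_runners()
    try:
        # Without the producer thread, q.get() would wait here, just as
        # Session.run() hangs when no queue runners were started.
        return [q.get(timeout=5) for _ in range(n_reads)]
    finally:
        stop.set()                      # like coord.request_stop()
        t.join()                        # like coord.join(threads)


print(run_pipeline(["a", "b", "c"], 5))  # ['a', 'b', 'c', 'a', 'b']
```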

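2. (Shuffled) batch reading:

Purpose: shuffle_batch() and batch() both fetch samples from the input queue in batches and are used in the same way; the difference is that shuffle_batch() returns the samples in random order.

Example 4: slice_input_producer() with batch()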

import tensorflow as tf
import numpy as np

images = np.arange(20).reshape([10, 2])
label = np.asarray(range(0, 10))
images = tf.cast(images, tf.float32)  # optional: only changes the dtype (without it the values print as integers)
label = tf.cast(label, tf.int32)      # optional: only changes the dtype

batchsize = 6   # number of elements per batch
input_queue = tf.train.slice_input_producer([images, label], num_epochs=None, shuffle=False)
image_batch, label_batch = tf.train.batch(input_queue, batch_size=batchsize)

# Fetch batchsize elements in random order; capacity is the queue capacity and must be larger than min_after_dequeue
# image_batch, label_batch = tf.train.shuffle_batch(input_queue, batch_size=batchsize, capacity=64, min_after_dequeue=10)

with tf.Session() as sess:
    coord = tf.train.Coordinator()      # thread coordinator
    threads = tf.train.start_queue_runners(sess, coord)     # start the threads of all queue runners in the graph
    for cnt in range(2):
        print("Fetch #{}, batch={}...".format(cnt+1, batchsize))
        image_batch_v, label_batch_v = sess.run([image_batch, label_batch])
        print(image_batch_v, label_batch_v, len(label_batch_v))

    coord.request_stop()
    coord.join(threads)

"""
Output:
Fetch #1, batch=6...
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]
 [ 6.  7.]
 [ 8.  9.]
 [10. 11.]] [0 1 2 3 4 5] 6
Fetch #2, batch=6...
[[12. 13.]
 [14. 15.]
 [16. 17.]
 [18. 19.]
 [ 0.  1.]
 [ 2.  3.]] [6 7 8 9 0 1] 6
"""

Example 5: reading images from disk in batches with string_input_producer() and batch()

import glob

import cv2 as cv
import tensorflow as tf


def read_imgs(filename, picture_format, input_image_shape, batch_size=1):
    """
    Read images from local disk in batches
    :param filename: image paths (including the file names); list
    :param picture_format: image format, e.g. ".bmp" or ".jpg"; str
    :param input_image_shape: shape of each input image; (h, w, c) or list
    :param batch_size: number of images loaded from the file queue per batch; int
    :return: a batch of batch_size images; Tensor
    """
    # Create the filename queue
    file_queue = tf.train.string_input_producer(filename, num_epochs=1, shuffle=True)
    # Create the file reader
    reader = tf.WholeFileReader()
    # Read a file from the filename queue
    _, img_bytes = reader.read(file_queue)
    # print(img_bytes)    # Tensor("ReaderReadV2_19:1", shape=(), dtype=string)
    # Decode the image
    if picture_format == ".bmp":
        new_img = tf.image.decode_bmp(img_bytes, channels=1)
    elif picture_format == ".jpg":
        new_img = tf.image.decode_jpeg(img_bytes, channels=3)
    else:
        raise ValueError("unsupported picture format: %s" % picture_format)
    # Give the image a fixed shape (tf.train.batch needs fully defined shapes).
    # tf.reshape does not resize: the image must already have exactly h*w*c values.
    # To change the size instead, use:
    # new_img = tf.image.resize_images(new_img, input_image_shape[:2])
    new_img = tf.reshape(new_img, input_image_shape)
    # Set the data type
    new_img = tf.image.convert_image_dtype(new_img, tf.uint8)

    # return new_img
    return tf.train.batch([new_img], batch_size)


def main():
    image_path = glob.glob(r'F:\demo\FaceRecognition\人臉庫\ORL\*.bmp')
    image_batch = read_imgs(image_path, ".bmp", (112, 92, 1), 5)
    print(type(image_batch))
    # image_path = glob.glob(r'.\*.jpg')
    # image_batch = read_imgs(image_path, ".jpg", (313, 500, 3), 1)

    sess = tf.Session()
    sess.run(tf.local_variables_initializer())
    tf.train.start_queue_runners(sess=sess)

    image_batch = sess.run(image_batch)
    print(type(image_batch))    # <class 'numpy.ndarray'>

    for i in range(len(image_batch)):
        cv.imshow("win_"+str(i), image_batch[i])
    cv.waitKey()
    cv.destroyAllWindows()


def start():
    image_path = glob.glob(r'F:\demo\FaceRecognition\人臉庫\ORL\*.bmp')
    image_batch = read_imgs(image_path, ".bmp", (112, 92, 1), 5)
    print(type(image_batch))    # <class 'tensorflow.python.framework.ops.Tensor'>

    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())
        coord = tf.train.Coordinator()      # thread coordinator
        threads = tf.train.start_queue_runners(sess, coord)     # start the threads of all queue runners in the graph
        image_batch = sess.run(image_batch)
        print(type(image_batch))    # <class 'numpy.ndarray'>

        for i in range(len(image_batch)):
            cv.imshow("win_"+str(i), image_batch[i])
        cv.waitKey()
        cv.destroyAllWindows()

        # If the Session is opened with `with` and the next two lines are missing, you get:
        # ERROR:tensorflow:Exception in QueueRunner: Enqueue operation was cancelled
        # Reason: the queue-runner threads are still working (more image data is waiting to be
        # enqueued), but the session closes as soon as batch_size images have been loaded,
        # which cancels the queue threads' pending enqueue operations.
        coord.request_stop()
        coord.join(threads)


if __name__ == "__main__":
    # main()
    start()
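In read_imgs(), tf.reshape only gives the decoded pixels a fixed shape; it cannot change the image size, so every file must already contain exactly 112*92*1 values, whereas tf.image.resize_images would rescale the image instead. A stdlib sketch of that difference, with hypothetical helpers `reshape` and `resize_nearest`; nearest-neighbour sampling is used for brevity, while resize_images defaults to bilinear interpolation:

```python
def reshape(flat, shape):
    """Arrange a flat list into nested lists of `shape` (row-major), like tf.reshape."""
    total = 1
    for d in shape:
        total *= d
    if len(flat) != total:
        raise ValueError("cannot reshape %d values into %s" % (len(flat), shape))
    if len(shape) == 1:
        return list(flat)
    step = total // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image given as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


pixels = list(range(112 * 92))                # stand-in for one decoded 112x92 grayscale image
img = reshape(pixels, (112, 92, 1))           # fine: exactly 112*92*1 values
print(len(img), len(img[0]), len(img[0][0]))  # 112 92 1
try:
    reshape(pixels, (100, 100, 1))            # reshape cannot change the image size
except ValueError as e:
    print(e)                                  # cannot reshape 10304 values into (100, 100, 1)
small = resize_nearest(reshape(pixels, (112, 92)), 56, 46)
print(len(small), len(small[0]))              # 56 46
```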

Example: reading images from disk in batches

Example 6: packing and reading TFRecord files

import numpy as np
import tensorflow as tf


def write_TFRecord(filename, data, labels, is_shuffler=True):
    """
    Pack data into the TFRecord format
    :param filename: output path; by default the file is created in the project directory; str
    :param data: paths of the files to pack; list
    :param labels: labels of the corresponding files; list
    :param is_shuffler: whether to shuffle the order of the packed samples, default True; bool
    :return: None
    """
    im_data = list(data)
    im_labels = list(labels)

    index = list(range(len(im_data)))
    if is_shuffler:
        np.random.shuffle(index)

    # Create the writer, then use it to write the Example samples
    writer = tf.python_io.TFRecordWriter(filename)
    for i in range(len(im_data)):
        im_d = im_data[index[i]]    # im_d: path of image index[i]
        im_l = im_labels[index[i]]  # im_l: label of that image

        # # Get the current image data, method 1 (raw decoded pixels; needs `import cv2`):
        # img = cv2.imread(im_d)
        # # Create the sample
        # ex = tf.train.Example(
        #     features=tf.train.Features(
        #         feature={
        #             "image": tf.train.Feature(
        #                 bytes_list=tf.train.BytesList(
        #                     value=[img.tobytes()])), # must be packed as bytes
        #             "label": tf.train.Feature(
        #                 int64_list=tf.train.Int64List(
        #                     value=[int(im_l)])),
        #         }
        #     )
        # )
        # Get the current image data, method 2: store the encoded file bytes as-is.
        # The packed file is less than half the size of method 1, but when reading it
        # the images must be decoded with tf.image.decode_jpeg/decode_png, not tf.decode_raw.
        with tf.gfile.GFile(im_d, "rb") as f:
            img_bytes = f.read()
        ex = tf.train.Example(
            features=tf.train.Features(
                feature={
                    "image": tf.train.Feature(
                        bytes_list=tf.train.BytesList(
                            value=[img_bytes])), # img_bytes is already of type bytes
                    "label": tf.train.Feature(
                        int64_list=tf.train.Int64List(
                            value=[int(im_l)])),
                }
            )
        )

        # Write the serialized sample
        writer.write(ex.SerializeToString())
    # Close the writer
    writer.close()
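For reference, TFRecordWriter does not store the Examples back to back: according to the TFRecord format description in the TensorFlow documentation, each record is framed as a little-endian uint64 length, a masked CRC32C of that length, the payload bytes (here, ex.SerializeToString()), and a masked CRC32C of the payload. The stdlib sketch below implements only that framing, which can help when inspecting a .tfrecord file by hand; it does not build or parse tf.train.Example protos, `write_records`/`read_records` are hypothetical names, and the pure-Python CRC loop is slow and meant for illustration only.

```python
import os
import struct
import tempfile


def _make_crc32c_table():
    poly = 0x82F63B78                    # reflected CRC-32C (Castagnoli) polynomial
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    # TFRecord stores the CRC rotated right by 15 bits plus a constant, mod 2**32.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, payloads):
    with open(path, "wb") as f:
        for data in payloads:
            header = struct.pack("<Q", len(data))
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc(data)))


def read_records(path):
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return
            (length,) = struct.unpack("<Q", header)
            (len_crc,) = struct.unpack("<I", f.read(4))
            if len_crc != masked_crc(header):
                raise IOError("corrupted length field")
            data = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(data):
                raise IOError("corrupted record")
            yield data


path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
write_records(path, [b"first record", b"second"])
print(list(read_records(path)))  # [b'first record', b'second']
print(os.path.getsize(path))     # 50 (16 bytes of framing per record + 12 + 6 payload bytes)
```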

Example: packing a TFRecord file

import cv2
import tensorflow as tf


def read_TFRecord(file_list, batch_size=10):
    """
    Read TFRecord files
    :param file_list: names of the TFRecord files; list
    :param batch_size: number of images read per batch
    :return: decoded images and their labels
    """
    file_queue = tf.train.string_input_producer(file_list, num_epochs=None, shuffle=True)
    reader = tf.TFRecordReader()
    _, ex = reader.read(file_queue)
    # Shuffle and batch the serialized Examples, then parse the whole batch at once
    batch = tf.train.shuffle_batch([ex], batch_size, capacity=batch_size * 10, min_after_dequeue=batch_size * 5)

    feature = {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64)
    }
    example = tf.parse_example(batch, features=feature)

    # decode_raw assumes 'image' holds raw uint8 pixels of 32x32x3 images (method 1 in
    # write_TFRecord); files packed with method 2 hold encoded JPEG/PNG bytes and need
    # tf.image.decode_jpeg / tf.image.decode_png instead.
    images = tf.decode_raw(example['image'], tf.uint8)
    images = tf.reshape(images, [-1, 32, 32, 3])

    return images, example['label']


def main():
    # filelist = ['data/train.tfrecord']
    filelist = ['data/test.tfrecord']
    images, labels = read_TFRecord(filelist, 2)
    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        try:
            while not coord.should_stop():
                for i in range(1):
                    image_bth, label_bth = sess.run([images, labels])
                    print(label_bth)

                    cv2.imshow("image_0", image_bth[0])
                    cv2.imshow("image_1", image_bth[1])
                break
        except tf.errors.OutOfRangeError:
            print('read done')
        finally:
            coord.request_stop()
        coord.join(threads)
        cv2.waitKey(0)
        cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
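On the reading side, tf.decode_raw followed by tf.reshape(images, [-1, 32, 32, 3]) only reinterprets bytes: it assumes each 'image' value holds raw uint8 pixels in row-major (height, width, channel) order, which matches method 1 in write_TFRecord for 32x32 color images. The -1 asks reshape to infer the batch size from the total byte count. A stdlib sketch of that arithmetic, with hypothetical helpers `infer_leading_dim` and `pixel_offset`:

```python
def infer_leading_dim(num_values, *inner_dims):
    """Return the size reshape([-1, *inner_dims]) would infer for num_values elements."""
    per_item = 1
    for d in inner_dims:
        per_item *= d
    if num_values % per_item != 0:
        raise ValueError("%d values are not a multiple of %d" % (num_values, per_item))
    return num_values // per_item


def pixel_offset(n, h, w, c, H=32, W=32, C=3):
    """Row-major (C-order) position of pixel [n, h, w, c] in the flat byte buffer."""
    return ((n * H + h) * W + w) * C + c


raw = bytes(2 * 32 * 32 * 3)                    # two raw 32x32x3 images, as in method 1
print(infer_leading_dim(len(raw), 32, 32, 3))   # 2
print(pixel_offset(1, 0, 0, 0))                 # 3072: the second image starts here

# Encoded files (method 2) have arbitrary lengths, e.g. a 1008-byte JPEG:
try:
    infer_leading_dim(1008, 32, 32, 3)
except ValueError as e:
    print(e)                                    # 1008 values are not a multiple of 3072
```

Even when an encoded file's length happens to divide evenly, its bytes are compressed data rather than pixels, so records written with method 2 need an image decoder instead of decode_raw.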

Example: reading a TFRecord file

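Site notice: the content of this site is sourced from cnblogs (博客園); in case of infringement, please contact us and we will handle it promptly.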
