Transfer learning with a pretrained convolutional neural network (translation of the official TensorFlow 2.0 tutorial)

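In this tutorial, you will learn how to classify images of cats and dogs using transfer learning from a pretrained network. It covers two topics: using a pretrained model for feature extraction, and fine-tuning a pretrained model.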

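A pretrained model is a saved network that was previously trained on a large dataset, typically on a large-scale image classification task. You can use the pretrained model as is, or use transfer learning to customize it for a given task.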

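The intuition behind transfer learning is that if a model is trained on a large and sufficiently general dataset, it effectively serves as a generic model of the visual world. You can then take advantage of the feature maps it has learned without having to train a large model on a large dataset from scratch.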

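In this tutorial, you will try two ways to customize a pretrained model:

Feature extraction: use the representations learned by a previous network to extract meaningful features from new samples. You simply add a new classifier, which is trained from scratch, on top of the pretrained model so that the feature maps learned previously can be repurposed for our dataset. You do not need to (re)train the entire model. The base convolutional network already contains features that are generically useful for classifying pictures. However, the final classification part of the pretrained model is specific to the original classification task, and therefore to the set of classes on which the model was trained.
Fine-tuning: unfreeze a few of the top layers of the frozen base model and jointly train the newly added classifier and the last layers of the base model. This lets us "fine-tune" the higher-order feature representations in the base model to make them more relevant to the specific task.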

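You will follow the general machine learning workflow:

Examine and understand the data
Build an input pipeline, in this case with tf.data datasets loaded through TensorFlow Datasets
Build the model
- Load the pretrained base model (and its pretrained weights)
- Stack the classification layers on top
Train the model
Evaluate the model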


from __future__ import absolute_import, division, print_function, unicode_literals

import os

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

keras = tf.keras

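1. Data preprocessing

1.1. Download the data

Use TensorFlow Datasets to load the cats and dogs dataset. The tfds package is the easiest way to load predefined data. If you have your own data and are interested in importing it with TensorFlow, see the tutorial on loading image data.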


import tensorflow_datasets as tfds

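The tfds.load method downloads and caches the data, and returns tf.data.Dataset objects. These objects provide powerful, efficient methods for manipulating data and piping it into your model.

Since "cats_vs_dogs" does not define standard splits, use the subsplit feature to divide it into 80% training, 10% validation, and 10% test data.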


SPLIT_WEIGHTS = (8, 1, 1)
splits = tfds.Split.TRAIN.subsplit(weighted=SPLIT_WEIGHTS)

(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs', split=list(splits),
    with_info=True, as_supervised=True)
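The 8:1:1 weighting above can also be written as percentage slices, which is the form newer TensorFlow Datasets releases document for splitting (for example `'train[:80%]'`). The sketch below only builds those strings in plain Python; the helper name `split_strings` is hypothetical and is not part of tfds.

```python
def split_strings(weights, name="train"):
    """Turn relative weights such as (8, 1, 1) into percentage slice strings."""
    total = sum(weights)
    result = []
    cumulative = 0
    for weight in weights:
        start = cumulative * 100 // total
        cumulative += weight
        end = cumulative * 100 // total
        # Omit the bound at either end of the split, as in 'train[:80%]'.
        lower = "" if start == 0 else "{}%".format(start)
        upper = "" if end == 100 else "{}%".format(end)
        result.append("{}[{}:{}]".format(name, lower, upper))
    return result


slices = split_strings((8, 1, 1))
print(slices)
```

Assuming a tfds version that supports the slicing syntax, the resulting list could be passed as the `split` argument of `tfds.load` in place of `list(splits)`.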

The resulting tf.data.Dataset objects contain (image, label) pairs. The images have variable shape and 3 channels, and the label is a scalar.


print(raw_train)
print(raw_validation)
print(raw_test)


    <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>
    <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>
    <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>

Show the first two images and labels from the training set:


# int2str maps an integer label to its class name
get_label_name = metadata.features['label'].int2str

for image, label in raw_train.take(2):
  plt.figure()
  plt.imshow(image)
  plt.title(get_label_name(label))


1.2. Format the data

Use the tf.image module to format the images: resize them to a fixed input size and rescale the input channels to the range [-1, 1].


IMG_SIZE = 160 # All images will be resized to 160x160

def format_example(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1  # map [0, 255] to [-1, 1]
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label
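To see why dividing by 127.5 and subtracting 1 lands in [-1, 1], the same arithmetic can be checked on single pixel values in plain Python. This is only a sketch of the rescaling step; the helper name `rescale_pixel` is hypothetical.

```python
def rescale_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to the range [-1, 1]."""
    return value / 127.5 - 1.0


for pixel in (0, 127.5, 255):
    print(pixel, "->", rescale_pixel(pixel))
```

The darkest pixel maps to -1, the brightest to 1, and the midpoint 127.5 maps to 0.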

Apply this function to each item in the dataset using the map method:


train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)

Shuffle and batch the data:


BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

# Only the training data is shuffled
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
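The shuffle buffer size controls how far an element can move from its original position. The following plain-Python generator is a simplified model of buffered shuffling, not tf.data's actual implementation: it keeps `buffer_size` items in memory, emits a random one whenever a new item arrives, and drains the buffer at the end. The name `buffered_shuffle` is hypothetical.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new item in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # The input is exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    for item in buffer:
        yield item


shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

With a buffer of 1 the order is unchanged, and with a buffer at least as large as the dataset the result is a full shuffle, which is why a buffer of 1000 only partially mixes a dataset of many thousands of images.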

Inspect a batch of data:


for image_batch, label_batch in train_batches.take(1):
  pass

image_batch.shape


    TensorShape([32, 160, 160, 3])

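2. Create the base model from a pretrained network

You will create the base model from the MobileNet V2 model developed at Google. It is pretrained on the ImageNet dataset, a large dataset of 1.4M web images in 1000 classes. ImageNet is a research training dataset with a fairly arbitrary set of categories, such as "jackfruit" and "syringe", but this base of knowledge will help us tell cats and dogs apart in our specific dataset.

First, you need to pick which layer of MobileNet V2 to use for feature extraction. Obviously, the very last classification layer (on "top", since most diagrams of machine learning models go from bottom to top) is not very useful. Instead, you will follow the common practice of relying on the very last layer before the flatten operation. This layer is called the "bottleneck layer". Compared with the final/top layer, the bottleneck layer retains much more generality.

Next, instantiate a MobileNet V2 model preloaded with weights trained on ImageNet. By specifying the include_top=False argument, you load a network that does not include the classification layers at the top, which is ideal for feature extraction.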


IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# Create the base model from the pretrained MobileNet V2 model
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

This feature extractor converts each 160x160x3 image into a 5x5x1280 block of features. See what it does to the example batch of images:


feature_batch = base_model(image_batch)
print(feature_batch.shape)


    (32, 5, 5, 1280)
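The 5x5 spatial size follows from the network's downsampling. As a rough sketch, assuming MobileNet V2 halves the spatial resolution five times (an overall stride of 32, which is not stated in this tutorial), the output size can be computed in plain Python. The function name `feature_map_size` is hypothetical.

```python
import math


def feature_map_size(input_size, num_halvings=5):
    """Spatial size after repeatedly halving the resolution, rounding up."""
    size = input_size
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
    return size


print(feature_map_size(160))  # 160 -> 80 -> 40 -> 20 -> 10 -> 5
```

The first halving matches the 80x80 output of the Conv1 layer shown in the model summary, and the last one matches the 5x5 bottleneck.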

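3. Feature extraction

You will freeze the convolutional base created in the previous step and use it as a feature extractor, then add a classifier on top of it and train that top-level classifier.

3.1. Freeze the convolutional base

It is important to freeze the convolutional base before you compile and train the model. Freezing (setting layer.trainable = False) prevents the weights in a given layer from being updated during training. MobileNet V2 has many layers, so setting the entire model's trainable flag to False freezes all of them.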


base_model.trainable = False
base_model.summary() # Take a look at the base model architecture


    Model: "mobilenetv2_1.00_160"
    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    ==================================================================================================
    input_1 (InputLayer)            [(None, 160, 160, 3) 0
    __________________________________________________________________________________________________
    Conv1_pad (ZeroPadding2D)       (None, 161, 161, 3)  0           input_1[0][0]
    __________________________________________________________________________________________________
    Conv1 (Conv2D)                  (None, 80, 80, 32)   864         Conv1_pad[0][0]
    __________________________________________________________________________________________________
    ..... (many layers omitted here)
    __________________________________________________________________________________________________
    Conv_1_bn (BatchNormalizationV1 (None, 5, 5, 1280)   5120        Conv_1[0][0]
    __________________________________________________________________________________________________
    out_relu (ReLU)                 (None, 5, 5, 1280)   0           Conv_1_bn[0][0]
    ==================================================================================================
    ...

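3.2. Add a classification head

To generate predictions from the block of features, average over the 5x5 spatial locations using a tf.keras.layers.GlobalAveragePooling2D layer, which converts the features into a single 1280-element vector per image.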


global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
feature_batch_average = global_average_layer(feature_batch)
print(feature_batch_average.shape)

(32, 1280)
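Global average pooling simply averages each channel over all spatial positions. The sketch below does this for one tiny feature map stored as nested Python lists in (height, width, channels) order; it illustrates the operation and is not how the Keras layer is implemented. The function name `global_average_pool` is hypothetical.

```python
def global_average_pool(feature_map):
    """Average a (height, width, channels) nested list over height and width."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for position in row:
            for channel, value in enumerate(position):
                totals[channel] += value
    return [total / (height * width) for total in totals]


# A 2x2 feature map with 2 channels.
tiny_map = [[[1, 10], [2, 20]],
            [[3, 30], [4, 40]]]
print(global_average_pool(tiny_map))
```

Applied to a 5x5x1280 block, the same averaging yields the 1280-element vector per image.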

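Apply a tf.keras.layers.Dense layer to convert these features into a single prediction per image. You do not need an activation function here because this prediction will be treated as a logit, or raw prediction value. Positive numbers predict class 1, negative numbers predict class 0.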


prediction_layer = keras.layers.Dense(1)
prediction_batch = prediction_layer(feature_batch_average)
print(prediction_batch.shape)


    (32, 1)
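A logit can be turned into a probability with the sigmoid function, and the sign of the logit gives the predicted class. The following plain-Python sketch shows that relationship; the function names are hypothetical.

```python
import math


def sigmoid(logit):
    """Convert a raw logit into a probability for class 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def predict_class(logit):
    """Positive logits predict class 1, all others predict class 0."""
    return 1 if logit > 0 else 0


for logit in (-2.0, 0.0, 2.0):
    print(logit, round(sigmoid(logit), 3), predict_class(logit))
```

A logit of 0 corresponds to a probability of exactly 0.5, so thresholding the logit at 0 is the same as thresholding the probability at 0.5.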

Now stack the feature extractor and these two layers using tf.keras.Sequential:


model = tf.keras.Sequential([
  base_model,
  global_average_layer,
  prediction_layer
])

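3.3. Compile the model

You must compile the model before training it. Since there are two classes and the model outputs a raw logit, use a binary cross-entropy loss computed from logits:


base_learning_rate = 0.0001
# The Dense(1) head has no activation and outputs raw logits, so the loss
# uses from_logits=True and the accuracy metric thresholds the logit at 0.
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])

model.summary()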


    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    mobilenetv2_1.00_160 (Model) (None, 5, 5, 1280)        2257984
    _________________________________________________________________
    global_average_pooling2d (Gl (None, 1280)              0
    _________________________________________________________________
    dense (Dense)                (None, 1)                 1281
    =================================================================
    Total params: 2,259,265
    Trainable params: 1,281
    Non-trainable params: 2,257,984
    _________________________________________________________________
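Because the classification head produces a raw logit rather than a probability, binary cross-entropy is best evaluated directly on the logit. The sketch below is a plain-Python version of a numerically stable formulation, shown for a single example; it is an illustration and not the Keras implementation. The function name `bce_from_logit` is hypothetical.

```python
import math


def bce_from_logit(label, logit):
    """Binary cross-entropy for one example, computed from a raw logit.

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which avoids overflow
    for logits of large magnitude.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


print(bce_from_logit(1, 0.0))   # an undecided model: log(2)
print(bce_from_logit(1, 5.0))   # confident and correct: small loss
print(bce_from_logit(1, -5.0))  # confident and wrong: large loss
```

Feeding logits into a loss that expects probabilities would give misleading values, which is why a loss configured for logits matters for a head without an activation.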

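The 2,257,984 (about 2.26M) parameters in MobileNet are frozen, but the Dense layer has 1,281 (about 1.3K) trainable parameters. These are divided between two tf.Variable objects, the weights and the biases.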


len(model.trainable_variables)

2
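The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input-output pair plus one bias per output. A small sketch, with the hypothetical helper `dense_param_count`:

```python
def dense_param_count(num_inputs, num_units):
    """Number of parameters in a fully connected layer: weights plus biases."""
    return num_inputs * num_units + num_units


head_params = dense_param_count(1280, 1)
base_params = 2257984  # frozen MobileNet V2 parameters, from the summary
print(head_params, base_params + head_params)
```

The 1280 weights and the single bias are the two trainable variables reported by `len(model.trainable_variables)`.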

3.4. Train the model

After training for 10 epochs, you should see about 96% accuracy.


# Approximate number of examples in each split, from the 8:1:1 weights
num_train, num_val, num_test = (
  metadata.splits['train'].num_examples*weight/10
  for weight in SPLIT_WEIGHTS
)

initial_epochs = 10
steps_per_epoch = round(num_train)//BATCH_SIZE
validation_steps = 20

# Evaluate the untrained classifier to get a baseline
loss0, accuracy0 = model.evaluate(validation_batches, steps=validation_steps)


    20/20 [==============================] - 4s 219ms/step - loss: 3.1885 - accuracy: 0.6109


print("initial loss: {:.2f}".format(loss0))
print("initial accuracy: {:.2f}".format(accuracy0))


    initial loss: 3.19
    initial accuracy: 0.61


history = model.fit(train_batches,
                    epochs=initial_epochs,
                    validation_data=validation_batches)


    Epoch 1/10
    581/581 [==============================] - 102s 175ms/step - loss: 1.8917 - accuracy: 0.7606 - val_loss: 0.8860 - val_accuracy: 0.8828
    ...
    Epoch 10/10
    581/581 [==============================] - 96s 165ms/step - loss: 0.4921 - accuracy: 0.9381 - val_loss: 0.1847 - val_accuracy: 0.9719
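The 581 steps per epoch in the log are consistent with the split arithmetic used earlier. The sketch below repeats that arithmetic in plain Python, assuming the cats_vs_dogs train split reports 23,262 examples; that figure is an assumption and is not stated in this tutorial.

```python
SPLIT_WEIGHTS = (8, 1, 1)
BATCH_SIZE = 32
total_examples = 23262  # assumed size of the cats_vs_dogs 'train' split

# Same computation as in the training code: weight/10 of the examples each.
num_train, num_val, num_test = (
    total_examples * weight / 10 for weight in SPLIT_WEIGHTS
)
steps_per_epoch = round(num_train) // BATCH_SIZE

print(num_train, steps_per_epoch)
```

Under that assumption the training split has about 18,610 examples, which at 32 images per batch gives 581 full batches.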

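3.5. Learning curves

Let's take a look at the learning curves of the training and validation accuracy/loss when using the MobileNet V2 base model as a fixed feature extractor.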


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

# Top panel: accuracy
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')

# Bottom panel: loss
plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()


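Note: if you are wondering why the validation metrics are clearly better than the training metrics, the main factor is that layers such as tf.keras.layers.BatchNormalization and tf.keras.layers.Dropout affect accuracy during training. They are turned off when the validation loss is calculated.

To a lesser extent, it is also because training metrics report the average over an epoch, while validation metrics are evaluated after the epoch, so the validation metrics see a model that has been trained slightly longer.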
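The averaging effect can be illustrated with a toy calculation. The per-batch accuracies below are made-up numbers chosen only to show a model that improves during an epoch; they are not results from this tutorial.

```python
# Hypothetical per-batch training accuracies within one epoch.
batch_accuracies = [0.70, 0.80, 0.90, 0.95]

# The training metric reported for the epoch is the running average.
reported_training_accuracy = sum(batch_accuracies) / len(batch_accuracies)

# Validation runs after the epoch, on the model as it stands at the end.
end_of_epoch_accuracy = batch_accuracies[-1]

print(reported_training_accuracy, end_of_epoch_accuracy)
```

The average includes the weaker early batches, so it trails the accuracy of the model that validation actually measures.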

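4. Fine-tuning

In the feature extraction experiment, you only trained a few layers on top of the MobileNet V2 base model. The weights of the pretrained network were not updated during training.

One way to increase performance further is to train (or "fine-tune") the weights of the top layers of the pretrained model alongside the classifier you added. The training process forces those weights to move from generic feature maps toward features associated specifically with our dataset.

Note: attempt this only after you have trained the top-level classifier with the pretrained model set to non-trainable. If you add a randomly initialized classifier on top of a pretrained model and try to train all layers jointly, the gradient updates will be too large (because of the classifier's random weights) and your pretrained model will forget what it has learned.

In addition, you should fine-tune a small number of top layers rather than the whole MobileNet model. In most convolutional networks, the higher up a layer is, the more specialized it is. The first few layers learn very simple and generic features that generalize to almost all types of images. As you go higher up, the features become increasingly specific to the dataset on which the model was trained. The goal of fine-tuning is to adapt these specialized features to the new dataset, rather than to overwrite the generic learning.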

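4.1. Unfreeze the top layers of the model

All you need to do is unfreeze the base_model and set the bottom layers to be non-trainable. Then recompile the model (necessary for these changes to take effect) and resume training.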


base_model.trainable = True

# See how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
  layer.trainable = False


    Number of layers in the base model:  155
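The slicing pattern above leaves only the layers from index 100 onward trainable. The sketch below reproduces that bookkeeping in plain Python with a stand-in class, since the point is the slice rather than Keras itself; `FakeLayer` is a hypothetical placeholder and not a Keras type.

```python
class FakeLayer:
    """Minimal stand-in for a layer with a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


layers = [FakeLayer("layer_{}".format(i)) for i in range(155)]
fine_tune_at = 100

# Freeze everything before index `fine_tune_at`; later layers stay trainable.
for layer in layers[:fine_tune_at]:
    layer.trainable = False

num_trainable = sum(layer.trainable for layer in layers)
print(num_trainable)
```

With 155 layers and a cut at index 100, the top 55 layers remain trainable. Not every one of those layers owns weights, so the number of trainable variables reported by the model is a separate count.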

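4.2. Compile the model

Compile the model using a much lower learning rate:


# Recompile so the new trainable flags take effect. The model still outputs
# logits, so the loss uses from_logits=True and accuracy thresholds at 0.
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate/10),
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])

model.summary()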


    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    mobilenetv2_1.00_160 (Model) (None, 5, 5, 1280)        2257984
    _________________________________________________________________
    global_average_pooling2d (Gl (None, 1280)              0
    _________________________________________________________________
    dense (Dense)                (None, 1)                 1281
    =================================================================
    Total params: 2,259,265
    Trainable params: 1,863,873
    Non-trainable params: 395,392
    _________________________________________________________________


len(model.trainable_variables)


    58

4.3. Continue training the model

If you trained to convergence earlier, this step should improve your accuracy by a few percentage points.


fine_tune_epochs = 10
total_epochs = initial_epochs + fine_tune_epochs

# Resume from epoch 10 and train up to epoch 20
history_fine = model.fit(train_batches,
                         epochs=total_epochs,
                         initial_epoch=initial_epochs,
                         validation_data=validation_batches)


    ...
    Epoch 20/20
    581/581 [==============================] - 116s 199ms/step - loss: 0.1243 - accuracy: 0.9849 - val_loss: 0.1121 - val_accuracy: 0.9875
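The log ends at "Epoch 20/20" because `epochs` is the index of the final epoch, not the number of additional epochs, and `initial_epoch` is where counting resumes. A plain-Python sketch of the resulting epoch labels, with the hypothetical helper `epoch_labels`:

```python
def epoch_labels(initial_epoch, total_epochs):
    """Labels printed when training resumes at `initial_epoch` (0-based)."""
    return ["Epoch {}/{}".format(epoch + 1, total_epochs)
            for epoch in range(initial_epoch, total_epochs)]


labels = epoch_labels(initial_epoch=10, total_epochs=20)
print(labels[0], "...", labels[-1])
```

Resuming at epoch 10 with a total of 20 therefore runs exactly 10 more epochs, labeled 11 through 20.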

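Let's take a look at the learning curves of the training and validation accuracy/loss when fine-tuning the last few layers of the MobileNet V2 base model and training the classifier on top of it. If the validation loss is much higher than the training loss, you may be seeing some overfitting. Some overfitting is also possible because the new training set is relatively small and similar to the original MobileNet V2 datasets.

After fine-tuning, the model reaches nearly 98% accuracy.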


# Append the fine-tuning history to the feature extraction history
acc += history_fine.history['accuracy']
val_acc += history_fine.history['val_accuracy']

loss += history_fine.history['loss']
val_loss += history_fine.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.8, 1])
# Vertical line marking where fine-tuning starts
plt.plot([initial_epochs-1,initial_epochs-1],
          plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 1.0])
plt.plot([initial_epochs-1,initial_epochs-1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()


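5. Summary

Using a pretrained model for feature extraction: when working with a small dataset, it is common to take advantage of features learned by a model trained on a larger dataset in the same domain. This is done by instantiating the pretrained model and adding a fully connected classifier on top. The pretrained model is "frozen" and only the weights of the classifier are updated during training. In this case, the convolutional base extracts all the features associated with each image, and you only train a classifier that determines the image class from that set of extracted features.
Fine-tuning a pretrained model: to improve performance further, you can repurpose the top layers of the pretrained model for the new dataset through fine-tuning. In this case, you tune the weights so that the model learns high-level features specific to the dataset. This technique is usually recommended when the training dataset is large and very similar to the original dataset on which the pretrained model was trained.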

Latest version: https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html
English version: https://tensorflow.google.cn/beta/tutorials/images/transfer_learning
Suggest a translation fix (PR): https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md