PyTorch TensorBoard Support
Follow along with the video below or on youtube.
Before You Start
To run this tutorial, you'll need to install PyTorch, TorchVision, Matplotlib, and TensorBoard.
With conda:
conda install pytorch torchvision -c pytorch
conda install matplotlib tensorboard
With pip:
pip install torch torchvision matplotlib tensorboard
Once the dependencies are installed, restart this notebook in the Python environment where you installed them.
Introduction
In this notebook, we'll be training a variant of LeNet-5 against the Fashion-MNIST dataset. Fashion-MNIST is a set of image tiles depicting various garments, with ten class labels indicating the type of garment depicted.
# PyTorch model and training necessities
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Image datasets and image manipulation
import torchvision
import torchvision.transforms as transforms

# Image display
import matplotlib.pyplot as plt
import numpy as np

# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter

# In case you are using an environment that has TensorFlow installed,
# such as Google Colab, uncomment the following code to avoid
# a bug with saving embeddings to your TensorBoard directory

# import tensorflow as tf
# import tensorboard as tb
# tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
Showing Images in TensorBoard
Let's start by adding sample images from our dataset to TensorBoard:
# Gather datasets and prepare them for consumption
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

# Store separate training and validations splits in ./data
training_set = torchvision.datasets.FashionMNIST('./data',
                                                 download=True,
                                                 train=True,
                                                 transform=transform)
validation_set = torchvision.datasets.FashionMNIST('./data',
                                                   download=True,
                                                   train=False,
                                                   transform=transform)

training_loader = torch.utils.data.DataLoader(training_set,
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=2)
validation_loader = torch.utils.data.DataLoader(validation_set,
                                                batch_size=4,
                                                shuffle=False,
                                                num_workers=2)

# Class labels
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

# Helper function for inline image display
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))

# Extract a batch of 4 images
dataiter = iter(training_loader)
images, labels = next(dataiter)

# Create a grid from the images and show them
img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)
Above, we used TorchVision and Matplotlib to create a visual grid of a minibatch of our input data. Below, we use the add_image() call on SummaryWriter to log the image for consumption by TensorBoard, and we also call flush() to make sure it's written to disk right away.
# Default log_dir argument is "runs" - but it's good to be specific
# torch.utils.tensorboard.SummaryWriter is imported above
writer = SummaryWriter('runs/fashion_mnist_experiment_1')
# Write image data to TensorBoard log dir
writer.add_image('Four Fashion-MNIST Images', img_grid)
writer.flush()
# To view, start TensorBoard on the command line with:
# tensorboard --logdir=runs
# ...and open a browser tab to http://localhost:6006/
If you start TensorBoard at the command line and open it in a new browser tab (usually at localhost:6006), you should see the image grid under the IMAGES tab.
Graphing Scalars to Visualize Training
TensorBoard is useful for tracking the progress and efficacy of your training. Below, we'll run a training loop, track some metrics, and save the data for TensorBoard's consumption.
Let's define a model to categorize our image tiles, and an optimizer and loss function for training:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Now let's train a single epoch, and evaluate the training vs. validation set losses every 1000 batches:
print(len(validation_loader))
for epoch in range(1):  # loop over the dataset multiple times
    running_loss = 0.0

    for i, data in enumerate(training_loader, 0):
        # basic training loop
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 1000 == 999:    # Every 1000 mini-batches...
            print('Batch {}'.format(i + 1))
            # Check against the validation set
            running_vloss = 0.0

            # In evaluation mode some model specific operations can be omitted, e.g. dropout layers
            net.train(False)  # Switching to evaluation mode, e.g. turning off regularisation
            for j, vdata in enumerate(validation_loader, 0):
                vinputs, vlabels = vdata
                voutputs = net(vinputs)
                vloss = criterion(voutputs, vlabels)
                running_vloss += vloss.item()
            net.train(True)  # Switching back to training mode, e.g. turning on regularisation

            avg_loss = running_loss / 1000
            avg_vloss = running_vloss / len(validation_loader)

            # Log the running loss averaged per batch
            writer.add_scalars('Training vs. Validation Loss',
                               {'Training': avg_loss, 'Validation': avg_vloss},
                               epoch * len(training_loader) + i)

            running_loss = 0.0

print('Finished Training')
writer.flush()
2500
Batch 1000
Batch 2000
Batch 3000
Batch 4000
Batch 5000
Batch 6000
Batch 7000
Batch 8000
Batch 9000
Batch 10000
Batch 11000
Batch 12000
Batch 13000
Batch 14000
Batch 15000
Finished Training
Switch to your open TensorBoard and have a look at the SCALARS tab.
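Note that add_scalars() (plural) groups several curves on one chart. Its sibling, add_scalar(), logs a single value per tag. A minimal sketch, using a hypothetical log directory runs/scalar_demo and made-up loss values:

```python
from torch.utils.tensorboard import SummaryWriter

# 'runs/scalar_demo' is a hypothetical log directory for this sketch
demo_writer = SummaryWriter('runs/scalar_demo')
for step in range(100):
    # One value per call; TensorBoard plots the series against 'step'
    demo_writer.add_scalar('Loss/train', 1.0 / (step + 1), step)
demo_writer.flush()
demo_writer.close()
```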
Visualizing Your Model
TensorBoard can also be used to examine the data flow within your model. To do this, call the add_graph() method with a model and sample input:
# Again, grab a single mini-batch of images
dataiter = iter(training_loader)
images, labels = next(dataiter)
# add_graph() will trace the sample input through your model,
# and render it as a graph.
writer.add_graph(net, images)
writer.flush()
When you switch over to TensorBoard, you should see a GRAPHS tab. Double-click the "NET" node to see the layers and data flow within your model.
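A related SummaryWriter call worth knowing, though not used in this tutorial: add_histogram() records the distribution of a tensor over time, which is handy for watching how layer weights evolve during training. A minimal self-contained sketch, using a stand-in nn.Linear rather than the Net model above, and a hypothetical log directory:

```python
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

# 'runs/histogram_demo' is a hypothetical log directory for this sketch
hist_writer = SummaryWriter('runs/histogram_demo')
model = nn.Linear(10, 2)  # stand-in for the Net model defined earlier

# Log one histogram per parameter tensor, tagged by parameter name;
# these appear under the HISTOGRAMS and DISTRIBUTIONS tabs
for name, param in model.named_parameters():
    hist_writer.add_histogram(name, param, global_step=0)
hist_writer.flush()
hist_writer.close()
```

In a real training loop you would call this once per epoch with the current epoch as global_step, so TensorBoard can stack the histograms over time.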
Visualizing Your Dataset with Embeddings
The 28-by-28 image tiles we're using can be modeled as 784-dimensional vectors (28 * 28 = 784). It can be instructive to project this to a lower-dimensional representation. The add_embedding() method will project a set of data onto the three dimensions with the highest variance, and display them as an interactive 3D chart.
Below, we'll take a sample of our data, and generate such an embedding:
# Select a random subset of data and corresponding labels
def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]

# Extract a random subset of data
images, labels = select_n_random(training_set.data, training_set.targets)

# get the class labels for each image
class_labels = [classes[label] for label in labels]

# log embeddings
features = images.view(-1, 28 * 28)
writer.add_embedding(features,
                     metadata=class_labels,
                     label_img=images.unsqueeze(1))
writer.flush()
writer.close()
Now if you switch to TensorBoard and select the PROJECTOR tab, you should see a 3D representation of the projection. You can rotate and zoom the model. Examine it at large and small scales, and see whether you can spot patterns in the projected data and the clustering of labels.
For better visibility, it's recommended to:
- Select "label" from the "Color by" dropdown on the left.
- Toggle the Night Mode icon along the top to place the light-colored images on a dark background.
Other Resources
For more information, have a look at:
- PyTorch documentation on torch.utils.tensorboard.SummaryWriter
- Tensorboard tutorial content in the PyTorch.org Tutorials
- For more information about TensorBoard, see the TensorBoard documentation