Caltech	CelebA	CIFAR	Cityscapes
EMNIST	FakeData	Fashion-MNIST	Flickr
ImageNet	Kinetics-400	KITTI	KMNIST
PhotoTour	Places365	QMNIST	SBD
SEMEION	STL10	SVHN	UCF101
VOC	WIDERFace

2. torchvision.transforms:

Data preprocessing methods for image data, such as enlarging, shrinking, and flipping images horizontally or vertically.

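from torchvision import transforms

image_size = 224  # example target size (an assumed value; adjust it to your model)
data_transform = transforms.Compose([
    transforms.ToPILImage(),   # depends on how the data is read later; not needed when using the built-in datasets
    transforms.Resize(image_size),
    transforms.ToTensor()
])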
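
To make the role of transforms.Compose more concrete, the following is a minimal pure-Python sketch of the underlying idea: a container that applies a list of callables in order. The class `SimpleCompose` and the two toy steps are hypothetical stand-ins written for illustration; they are not torchvision's implementation, and the "image" is just a list of pixel values.

```python
class SimpleCompose:
    """Hypothetical stand-in that mimics the chaining behaviour of transforms.Compose."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, x):
        # Feed the output of each step into the next one, in order.
        for step in self.steps:
            x = step(x)
        return x


# Toy "transforms" acting on a list of pixel values (assumed data, not a real image).
def horizontal_flip(pixels):
    return pixels[::-1]


def scale_to_unit(pixels):
    return [p / 255 for p in pixels]


pipeline = SimpleCompose([horizontal_flip, scale_to_unit])
result = pipeline([0, 51, 255])
print(result)
```

The order of the steps matters: each step receives whatever the previous step returned, which is why ToPILImage, Resize, and ToTensor have to be listed in a compatible sequence.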

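3. torchvision.models:

Pretrained models, covering image classification, semantic segmentation, object detection, instance segmentation, human keypoint detection, video classification, and more.

To improve training efficiency and avoid unnecessary repeated work, PyTorch officially provides a number of pretrained models. The official documentation lists which pretrained models are currently available, and how to use them is described in detail later. We take torchvision 0.10.0 as the example here; if you want more pretrained models, you can use the pretrained-models.pytorch repository. The existing pretrained models fall into the following categories: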

Classification

For image classification, PyTorch officially provides the following models, and the list keeps growing.

AlexNet	VGG	ResNet	SqueezeNet
DenseNet	Inception v3	GoogLeNet	ShuffleNet v2
MobileNetV2	MobileNetV3	ResNext	Wide ResNet
MNASNet	EfficientNet	RegNet	(continuously updated)

These models are pretrained on ImageNet-1k; how to use them is covered later. The documentation also lists the accuracy of each model on ImageNet-1k.

Semantic Segmentation

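The pretrained semantic segmentation models are trained on a subset of COCO train2017 restricted to the 20 object categories of Pascal VOC. Together with the background class this gives 21 labels: background, aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor.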

FCN ResNet50	FCN ResNet101	DeepLabV3 ResNet50	DeepLabV3 ResNet101
LR-ASPP MobileNetV3-Large	DeepLabV3 MobileNetV3-Large	(to be continued)

The documentation lists the mean IoU and global pixelwise accuracy of each pretrained model.
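
As a rough illustration of the two segmentation metrics named above, the sketch below computes global pixelwise accuracy and mean IoU from a confusion matrix. It uses one common definition of these metrics; the function name `segmentation_metrics` and the toy numbers are assumptions for illustration, and the official evaluation scripts may differ in details.

```python
def segmentation_metrics(confusion):
    """Return (global pixel accuracy, mean IoU) for a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(num_classes))
    global_acc = correct / total

    ious = []
    for i in range(num_classes):
        true_i = sum(confusion[i])                  # pixels labelled as class i
        pred_i = sum(row[i] for row in confusion)   # pixels predicted as class i
        union = true_i + pred_i - confusion[i][i]
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(confusion[i][i] / union)
    mean_iou = sum(ious) / len(ious)
    return global_acc, mean_iou


# Toy 2-class confusion matrix (made-up numbers).
conf = [[50, 10],
        [5, 35]]
acc, miou = segmentation_metrics(conf)
print(acc, miou)
```

Global accuracy counts every pixel equally, so it is dominated by large classes such as background, while mean IoU averages over classes and is therefore more sensitive to small ones.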

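Object Detection, Instance Segmentation and Keypoint Detection

The object detection, instance segmentation, and human keypoint detection models are likewise trained on COCO train2017. The instance segmentation categories and the human keypoint categories are listed below: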

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus','train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A','handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball','kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket','bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl','banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza','donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table','N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone','microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book','clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
COCO_PERSON_KEYPOINT_NAMES =['nose','left_eye','right_eye','left_ear','right_ear','left_shoulder','right_shoulder','left_elbow','right_elbow','left_wrist','right_wrist','left_hip','right_hip','left_knee','right_knee','left_ankle','right_ankle']
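
A category list like the one above is typically used to turn integer class ids into readable names. The sketch below shows that lookup on an excerpt of the list (only the first 14 entries, copied from above, to keep the example short). The helper `decode_labels` and the example ids are hypothetical; 'N/A' marks ids that have no category assigned.

```python
# Excerpt of the category list above (first 14 entries only, for brevity).
COCO_EXCERPT = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
]


def decode_labels(label_ids, names):
    """Map integer class ids to category names by list index."""
    return [names[i] for i in label_ids]


# Hypothetical class ids, standing in for the labels a detection model might output.
predicted = [1, 3, 13]
decoded = decode_labels(predicted, COCO_EXCERPT)
print(decoded)

# Ids that correspond to real object categories (not background, not 'N/A').
valid_ids = [i for i, name in enumerate(COCO_EXCERPT)
             if name not in ('N/A', '__background__')]
print(valid_ids)
```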

Faster R-CNN	Mask R-CNN	RetinaNet	SSDlite
SSD	(to be continued)

Likewise, the documentation lists the box AP, keypoint AP, and mask AP of these models, which are trained on COCO train2017.

Video classification

The video classification models are pretrained on Kinetics-400.

ResNet 3D 18	ResNet MC 18	ResNet (2+1) D
(to be continued)

Likewise, details on these models can be found in the documentation.

4. torchvision.io:

I/O operations for videos, images, and files, including reading, writing, encoding, and decoding.

5. torchvision.ops:

Operators specific to computer vision, including but not limited to NMS, RoIAlign (a method used in Mask R-CNN), and RoIPool (a method used in Fast R-CNN).
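
To show what NMS (non-maximum suppression) does, here is a minimal pure-Python sketch of the greedy algorithm: boxes are visited from highest to lowest score, and a box is kept only if its IoU with every already kept box does not exceed a threshold. The functions `iou` and `nms` below are illustrative and are not the torchvision.ops implementation; boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive area.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up coordinates).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

The second box overlaps the first with an IoU of 81/119, which is above the threshold, so it is suppressed in favour of the higher-scoring box.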

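6. torchvision.utils:

Utilities for stitching images together and for visualizing detection and segmentation results.

torchvision.utils provides visualization helpers that stitch several images into one and visualize the results of detection and segmentation. See the documentation for the specific functions.

In short, torchvision takes care of much of the repetitive and time-consuming work common in computer vision. It greatly reduces the effort involved in obtaining datasets, augmenting data, and pretraining models, so that we can get started with computer vision tasks more quickly.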

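2 PyTorchVideo (video)

Overview: PyTorchVideo is a deep learning library focused on video understanding. It provides reusable, modular, and efficient components for accelerating video understanding research. It is built on PyTorch and supports various deep learning video components, such as video models, video datasets, and video-specific transforms.
Features: based on PyTorch, provides a Model Zoo, supports data preprocessing and common datasets, uses a modular design, supports multiple modalities, and is optimized for mobile deployment.
Usage: TorchHub, PySlowFast, PyTorch Lightning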

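3 torchtext (text)

Main components of torchtext

torchtext makes it convenient to preprocess text, for example truncating and padding sequences or building a vocabulary. It consists of the following main components: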

Data processing tools: torchtext.data.functional, torchtext.data.utils
Datasets: torchtext.datasets
Vocabulary tools: torchtext.vocab
Evaluation metrics: torchtext.data.metrics
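
The two preprocessing steps mentioned above, building a vocabulary and truncating or padding sequences to a fixed length, can be sketched in plain Python as follows. This is an illustration of the concepts only, not the torchtext API: the helpers `build_vocab` and `encode`, the special tokens `<pad>` and `<unk>`, and the toy corpus are all assumptions.

```python
from collections import Counter


def build_vocab(tokenized_texts, specials=("<pad>", "<unk>")):
    """Map each token to an integer id; special tokens come first."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Order tokens by descending frequency, breaking ties alphabetically.
    itos = list(specials) + sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(itos)}


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or pad to exactly max_len."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    ids = ids[:max_len]                              # truncate long inputs
    ids += [vocab["<pad>"]] * (max_len - len(ids))   # pad short inputs
    return ids


corpus = [["i", "like", "pytorch"], ["i", "like", "vision"]]
vocab = build_vocab(corpus)
encoded = encode(["i", "like", "text", "a", "lot"], vocab, max_len=4)
padded = encode(["pytorch"], vocab, max_len=4)
print(vocab)
print(encoded, padded)
```

Tokens that never appeared in the corpus map to `<unk>`, and every encoded sequence has the same length, which is what allows sequences to be stacked into a batch.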

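Overview: torchtext is PyTorch's toolkit for natural language processing (NLP). It preprocesses text, for example by truncating and padding sequences and building vocabularies.
Building datasets: use the Field class to define the different types of data (Field belongs to the legacy torchtext API and is not available in newer releases).
Evaluation metrics: use the functions under torchtext.data.metrics to evaluate NLP tasks.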
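
As an example of the kind of metric used to evaluate NLP tasks, the sketch below computes clipped n-gram precision, the building block of the BLEU score. It is a simplified illustration and not the torchtext.data.metrics implementation: full BLEU additionally combines several n-gram orders and applies a brevity penalty. The function names and the toy sentences are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference ("clipping").
    """
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0


candidate = ["the", "the", "the", "cat"]
reference = ["the", "cat", "sat"]
p1 = clipped_precision(candidate, reference, 1)
p2 = clipped_precision(candidate, reference, 2)
print(p1, p2)
```

Clipping is what prevents a candidate from scoring well by repeating a common word: "the" appears three times in the candidate but is credited only once.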

transforms in practice


from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline
# Load the original image
img = Image.open("./lenna.jpg") 
print(img.size)
plt.imshow(img)
## transforms.CenterCrop(size)
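# Crop the given image around its center
# If the crop size exceeds the image size, the area beyond the image is padded with 0
img_centercrop1 = transforms.CenterCrop((500,500))(img)
print(img_centercrop1.size)
# If the crop size is smaller than the image, everything outside the requested size is discarded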
img_centercrop2 = transforms.CenterCrop((224,224))(img)
print(img_centercrop2.size)
plt.subplot(1,3,1),plt.imshow(img),plt.title("Original")
plt.subplot(1,3,2),plt.imshow(img_centercrop1),plt.title("500 * 500")
plt.subplot(1,3,3),plt.imshow(img_centercrop2),plt.title("224 * 224")
plt.show()
## transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
# Randomly change the brightness, contrast, saturation, and hue of the image
img_CJ = transforms.ColorJitter(brightness=1,contrast=0.5,saturation=0.5,hue=0.5)(img)
print(img_CJ.size)
plt.imshow(img_CJ)
## transforms.Grayscale(num_output_channels)
img_grey_c3 = transforms.Grayscale(num_output_channels=3)(img)
img_grey_c1 = transforms.Grayscale(num_output_channels=1)(img)
plt.subplot(1,2,1),plt.imshow(img_grey_c3),plt.title("channels=3")
plt.subplot(1,2,2),plt.imshow(img_grey_c1),plt.title("channels=1")
plt.show()
## transforms.Resize
# Resize while keeping the aspect ratio: the shorter edge becomes 224
img_resize = transforms.Resize(224)(img)
print(img_resize.size)
plt.imshow(img_resize)
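## transforms.Scale
# Aspect-preserving resize. Scale is deprecated in favor of Resize and is missing from newer torchvision releases,
# so fall back to Resize when Scale is unavailable
Scale = getattr(transforms, "Scale", transforms.Resize)
img_scale = Scale(224)(img)
print(img_scale.size)
plt.imshow(img_scale)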
## transforms.RandomCrop
# Randomly crop the image to the given size
# Set the random seed so the crops are reproducible
import torch
torch.manual_seed(31)
# Two independent random crops
img_random_crop1 = transforms.RandomCrop(224)(img)
img_random_crop2 = transforms.RandomCrop(224)(img)
print(img_random_crop1.size)
plt.subplot(1,2,1),plt.imshow(img_random_crop1)
plt.subplot(1,2,2),plt.imshow(img_random_crop2)
plt.show()
## transforms.RandomHorizontalFlip
# Random horizontal (left-right) flip
# Set the random seed; with the default p=0.5 the image may be left unflipped
import torch
torch.manual_seed(31)

img_random_H = transforms.RandomHorizontalFlip()(img)
print(img_random_H.size)
plt.imshow(img_random_H)
## transforms.RandomVerticalFlip
# Random vertical (top-bottom) flip
img_random_V = transforms.RandomVerticalFlip()(img)
print(img_random_V.size)
plt.imshow(img_random_V)
## transforms.RandomResizedCrop
# Crop a random region (here always 50% of the image area) and resize it to the given size
img_random_resizecrop = transforms.RandomResizedCrop(224,scale=(0.5,0.5))(img)
print(img_random_resizecrop.size)
plt.imshow(img_random_resizecrop)
## Combining transforms with transforms.Compose()
# An image usually goes through several operations; transforms.Compose() chains them together
transformer = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomVerticalFlip(),
])
img_transform = transformer(img)
plt.imshow(img_transform)
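
The size printed after a call such as transforms.Resize(224) in the walkthrough above follows from simple arithmetic: when a single integer is given, the shorter edge is scaled to that value and the longer edge is scaled by the same ratio. The helper `resized_size` below is a hypothetical sketch of that calculation, assuming the longer edge is truncated to an integer; exact rounding may differ between torchvision versions.

```python
def resized_size(width, height, size):
    """Output (width, height) when the shorter edge is scaled to `size`."""
    if width <= height:
        # Width is the shorter edge: it becomes `size`, height scales along.
        return size, int(size * height / width)
    # Height is the shorter edge.
    return int(size * width / height), size


print(resized_size(640, 480, 224))  # landscape image
print(resized_size(480, 640, 224))  # portrait image
print(resized_size(512, 512, 224))  # square image
```

Passing a tuple such as (224, 224) instead of a single integer forces both edges to the given values and therefore does not preserve the aspect ratio.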

PyTorch Study Notes (8): Introduction to the PyTorch Ecosystem

Introduction to the PyTorch Ecosystem

I. torchvision (images)

1. torchvision.datasets: