TensorRTx: Open-Source Code Overview

Date: 2023-09-27 14:20:16

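TensorRTx converts common network models to TensorRT. It aims to implement popular deep learning networks with the TensorRT network definition API. TensorRT has built-in parsers, including caffeparser, uffparser, and onnxparser. When using these parsers, we often run into "unsupported operation or layer" problems, especially with state-of-the-art models that use new types of layers.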

So why not skip the parsers altogether? Building the whole network with nothing but the TensorRT network definition API is not complicated.

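Every TensorRTx model is first implemented in pytorch/mxnet/tensorflow. Its weights are then exported to a file xxx.wts, and TensorRT loads those weights, defines the network, and runs inference. Some of the pytorch implementations can be found in my repo Pytorchx; the rest come from popular open-source implementations.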
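
The weight export step can be sketched in a few lines. The text above does not spell out the .wts layout, so this sketch assumes a plain-text layout: the first line holds the number of tensors, and each following line holds a tensor name, its element count, and every float32 value as 8 big-endian hex digits. The function names `write_wts` and `read_wts` and the demo tensor names are hypothetical, and plain Python lists stand in for framework tensors.

```python
import os
import struct
import tempfile


def write_wts(path, tensors):
    """Write {name: [float, ...]} to a .wts-style text file (assumed layout)."""
    with open(path, "w") as f:
        # First line: number of tensors in the file.
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            # Each value is packed as a big-endian float32 and written as hex.
            hex_words = [struct.pack(">f", float(v)).hex() for v in values]
            f.write(" ".join([name, str(len(values))] + hex_words) + "\n")


def read_wts(path):
    """Read a file produced by write_wts back into {name: [float, ...]}."""
    weights = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            name, size, *hex_words = f.readline().split()
            if int(size) != len(hex_words):
                raise ValueError(f"{name}: expected {size} values, got {len(hex_words)}")
            weights[name] = [struct.unpack(">f", bytes.fromhex(h))[0] for h in hex_words]
    return weights


# Round trip with values that float32 represents exactly (hypothetical names).
demo = {"conv1.weight": [0.5, -1.25, 3.0], "conv1.bias": [0.0]}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wts")
write_wts(demo_path, demo)
restored = read_wts(demo_path)
```

With a real model, the dictionary would be filled from the framework's parameter store (for example, flattened tensors from a pytorch state dict) instead of literal lists. A text format like this is easy to inspect and to parse from C++ without extra dependencies.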

News

  • 19 Aug 2022. Dominic and sbmalik: Yolov3-tiny and Arcface support TRT8.
  • 6 Jul 2022. xiang-wuu: SuperPoint - Self-Supervised Interest Point Detection and Description, vSLAM related.
  • 26 May 2022. triple-Mu: YOLOv5 python script with CUDA Python API.
  • 23 May 2022. yhpark: Real-ESRGAN, Practical Algorithms for General Image/Video Restoration.
  • 19 May 2022. vjsrinivas: YOLOv3 TRT8 support and Python script.
  • 15 Mar 2022. sky_hole: Swin Transformer - Semantic Segmentation.
  • 19 Oct 2021. liuqi123123 added cuda preprocessing for yolov5, preprocessing + inference is 3x faster when batchsize=8.
  • 18 Oct 2021. xupengao: YOLOv5 updated to v6.0, supporting n/s/m/l/x/n6/s6/m6/l6/x6.
  • 31 Aug 2021. FamousDirector: update retinaface to support TensorRT 8.0.
  • 27 Aug 2021. HaiyangPeng: add a python wrapper for hrnet segmentation.
  • 1 Jul 2021. freedenS: DE⫶TR: End-to-End Object Detection with Transformers. First Transformer model!
  • 10 Jun 2021. upczww: EfficientNet b0-b8 and l2.
  • 23 May 2021. SsisyphusTao: CenterNet DLA-34 with DCNv2 plugin.
  • 17 May 2021. ybw108: arcface LResNet100E-IR and MobileFaceNet.
  • 6 May 2021. makaveli10: scaled-yolov4 yolov4-csp.

Tutorials

Test Environment

  1. TensorRT 7.x
  2. TensorRT 8.x (some of the models support 8.x)

How to Run

Each folder contains a README that explains how to run the models in it.

Models

The following models are implemented.

| Name | Description |
|---|---|
| mlp | the very basic model for starters, properly documented |
| lenet | the simplest, as a "hello world" of this project |
| alexnet | easy to implement, all layers are supported in tensorrt |
| googlenet | GoogLeNet (Inception v1) |
| inception | Inception v3, v4 |
| mnasnet | MNASNet with depth multiplier of 0.5 from the paper |
| mobilenet | MobileNet v2, v3-small, v3-large |
| resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented |
| senet | se-resnet50 |
| shufflenet | ShuffleNet v2 with 0.5x output channels |
| squeezenet | SqueezeNet 1.1 model |
| vgg | VGG 11-layer model |
| yolov3-tiny | weights and pytorch implementation from ultralytics/yolov3 |
| yolov3 | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
| yolov3-spp | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
| yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3 |
| yolov5 | yolov5 v1.0-v6.0, pytorch implementation from ultralytics/yolov5 |
| retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface |
| arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface |
| retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detect face and mask attribute |
| dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch |
| crnn | pytorch implementation from meijieru/crnn.pytorch |
| ufld | pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 |
| hrnet | hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation |
| psenet | PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet |
| ibnnet | IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018 |
| unet | U-Net, pytorch implementation from milesial/Pytorch-UNet |
| repvgg | RepVGG, pytorch implementation from DingXiaoH/RepVGG |
| lprnet | LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch |
| refinedet | RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch |
| densenet | DenseNet-121, from torchvision.models |
| rcnn | FasterRCNN and MaskRCNN, model from detectron2 |
| tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 |
| scaled-yolov4 | yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4 |
| centernet | CenterNet DLA-34, pytorch from xingyizhou/CenterNet |
| efficientnet | EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch |
| detr | DE⫶TR, pytorch from facebookresearch/detr |
| swin-transformer | Swin Transformer - Semantic Segmentation, only supports Swin-T. The Pytorch implementation is microsoft/Swin-Transformer |
| real-esrgan | Real-ESRGAN. The Pytorch implementation is real-esrgan |
| superpoint | SuperPoint. The Pytorch model is from magicleap/SuperPointPretrainedNetwork |

Model Zoo

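The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to convert the .wts yourself from a pytorch/mxnet/tensorflow model, so that you can retrain your own models.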

GoogleDrive | BaiduPan pwd: uvv2

Tricky Operations

Some tricky operations encountered in these models have already been solved, though better solutions may exist.

| Name | Description |
|---|---|
| BatchNorm | Implemented by a scale layer, used in resnet, googlenet, mobilenet, etc. |
| MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to solve ceil_mode=True, see googlenet. |
| average pool with padding | use setAverageCountExcludesPadding() when necessary, see inception. |
| relu6 | use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet. |
| torch.chunk() | implement the 'chunk(2, dim=C)' by tensorrt plugin, see shufflenet. |
| channel shuffle | use two shuffle layers to implement channel_shuffle, see shufflenet. |
| adaptive pool | use fixed input dimension, and use regular average pooling, see shufflenet. |
| leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used, see yolov3 in branch trt4. |
| yolo layer v1 | yolo layer is implemented as a plugin, see yolov3 in branch trt4. |
| yolo layer v2 | three yolo layers implemented in one plugin, see yolov3-spp. |
| upsample | replaced by a deconvolution layer, see yolov3. |
| hsigmoid | hard sigmoid is implemented as a plugin, hsigmoid and hswish are used in mobilenetv3 |
| retinaface output decode | implement a plugin to decode bbox, confidence and landmarks, see retinaface. |
| mish | mish activation is implemented as a plugin, mish is used in yolov4 |
| prelu | mxnet's prelu activation with trainable gamma is implemented as a plugin, used in arcface |
| HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0 |
| LSTM | Implemented pytorch nn.LSTM() with tensorrt api |
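
The BatchNorm, relu6 and HardSwish rows are plain arithmetic identities, so they can be checked without TensorRT. The sketch below works on scalars in pure Python. It assumes inference-time BatchNorm with fixed running statistics and a scale layer that computes (x * scale + shift) ^ power per channel. The hard sigmoid formula relu6(x + 3) / 6 is the MobileNetV3 definition and is not stated in the table. All function names are hypothetical.

```python
import math


def fold_batchnorm(gamma, beta, mean, var, eps=1e-5):
    """Turn per-channel BatchNorm parameters into scale-layer parameters."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    power = [1.0] * len(scale)  # power 1 keeps the layer linear
    return scale, shift, power


def scale_layer(x, scale, shift, power):
    """Assumed scale-layer formula for one value of one channel."""
    return (x * scale + shift) ** power


def batchnorm_reference(x, gamma, beta, mean, var, eps=1e-5):
    """Textbook inference-time BatchNorm, used only to check the folding."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def relu(x):
    return max(x, 0.0)


def relu6(x):
    # Identity from the table: Relu6(x) = Relu(x) - Relu(x - 6)
    return relu(x) - relu(x - 6.0)


def hard_sigmoid(x):
    # Assumed MobileNetV3 definition: relu6(x + 3) / 6
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    # Identity from the table: hard_swish = x * hard_sigmoid
    return x * hard_sigmoid(x)
```

Folding BatchNorm this way moves the square root and the division to build time, so the engine only performs one multiply and one add per element at inference.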

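The channel shuffle row can be illustrated on a list of channel indices. This is a minimal sketch, assuming the first shuffle layer reshapes the channel axis to (groups, channels_per_group) and transposes it, and the second shuffle layer flattens it back. How the project actually splits the work between its two layers is not stated in the table, and the function name `channel_shuffle` is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channels the way ShuffleNet's channel shuffle does."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups

    # First shuffle layer (assumed): reshape C -> (groups, per_group), then transpose.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    transposed = [[grid[g][i] for g in range(groups)] for i in range(per_group)]

    # Second shuffle layer (assumed): flatten (per_group, groups) back to C.
    return [c for row in transposed for c in row]


shuffled = channel_shuffle(list(range(6)), groups=2)
```

Here `shuffled` interleaves the two groups [0, 1, 2] and [3, 4, 5]. Because the operation is only a reshape and a transpose, it needs no custom plugin.
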
Speed Benchmark

| Models | Device | BatchSize | Mode | Input Shape(HxW) | FPS |
|---|---|---|---|---|---|
| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 |
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 |
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4 |
| YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 |
| YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 |
| YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 |
| YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40 |
| YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27 |
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 |
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 |
| RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 |
| ArcFace(LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 |
| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 |

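Help wanted: if you have measured speed results, please open an issue or a PR.

Acknowledgments & Contact

Any comments, questions, and discussions are welcome. Please contact the author using the information below.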

E-mail: wangxinyu_es@163.com

WeChat ID: wangxinyu0375 (add the author on WeChat to join the tensorrtx discussion group; include the note: tensorrtx)