Incremental Generative Occlusion Adversarial Suppression Network for Person ReID
IEEE Transactions on Image Processing, 2021.


Cairong Zhao, Xinbi Lv, Shuguang Dou, Shanshan Zhang, Jun Wu, Liang Wang

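Tongji University, Nanjing University of Science and Technology, Fudan University, Institute of Automation, Chinese Academy of Sciences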








1) The IGOAS network first generates easy-to-hard occluded data through the IGO (Incremental Generative Occlusion) block and then suppresses the generated occluded regions with the adversarial suppression branch. Through this adversarial learning process, IGOAS learns more discriminative and robust features for occluded person re-id.


2) We propose an incremental generative block to generate easy-to-hard occlusion data. It makes the network more robust to occlusion by gradually learning harder occlusions instead of facing the hardest occlusions directly.

3) We develop an occlusion suppression module in the G&A framework. By suppressing the occlusions, our model can focus less on the background and more on the foreground.


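In occluded scenes, pedestrian images contain occlusions and carry less discriminative pedestrian information. Previous works design complex modules to capture implicit information (including human pose keypoints, masks, and spatial information) to achieve effective alignment. A few works focus on data augmentation, which brings only limited performance gains. To address the occlusion problem, we propose a novel Incremental Generative Occlusion Adversarial Suppression (IGOAS) method. On the occluded dataset Occluded-DukeMTMC, our method achieves 60.1% Rank-1 accuracy and 49.4% mAP.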


Fig. 1. Illustration of person re-id: (a) holistic re-id; (b) and (c) partial re-id; (d) to (f) occluded re-id.








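Fig. 2. The flowchart of the proposed IGOAS network.

As shown in Fig. 2, the G&A framework consists of a backbone network, a global branch (which extracts stable global features of the image), and an adversarial suppression branch (which uses the OSM to suppress the responses of the generated occlusion regions to zero, so that more attention is paid to foreground information).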

1) Backbone Network: We use Stages 1, 2, and 3 of ResNet-50 as the backbone network. For a fair comparison with recent works, we modify Stage 4 so that its first residual block performs no down-sampling, which yields a larger feature map of size 2048 × 24 × 8. In our framework, Stages 1, 2, and 3 are shared by both branches to reduce the number of learned parameters, whereas each branch has its own Stage 4 so that it can specialize in that branch's task.
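The 2048 × 24 × 8 size follows directly from the network strides. The sketch below only does that arithmetic; it assumes a 384 × 128 input (not stated in this section, but it is the size that gives 24 × 8 at a total stride of 16) and the standard torchvision-style ResNet-50 stride layout. It is not the authors' code.

```python
# Per-stage strides of a standard ResNet-50 (torchvision-style layout, assumed).
RESNET50_STRIDES = {
    "stem": 4,    # 7x7 conv with stride 2, then 3x3 max pool with stride 2
    "stage1": 1,
    "stage2": 2,
    "stage3": 2,
    "stage4": 2,  # becomes 1 when the first residual block does not down-sample
}


def stage4_output_shape(height, width, stage4_downsample=True):
    """Return (channels, H, W) of the Stage 4 output for an H x W input."""
    strides = dict(RESNET50_STRIDES)
    if not stage4_downsample:
        strides["stage4"] = 1
    total_stride = 1
    for stride in strides.values():
        total_stride *= stride
    # Exact for inputs divisible by the total stride, such as 384 x 128.
    return 2048, height // total_stride, width // total_stride


print(stage4_output_shape(384, 128))                           # (2048, 12, 4)
print(stage4_output_shape(384, 128, stage4_downsample=False))  # (2048, 24, 8)
```

Removing the last down-sampling halves the total stride from 32 to 16, which doubles both spatial dimensions of the final feature map.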

2) Global Branch: The global branch is meant to learn stable global features of the image. We therefore adopt the ResNet-50 baseline structure as the global branch, given its competitive re-id performance.

  1. Specifically, after Stage 4, a global average pooling (GAP) layer produces a 2048-d feature vector.
  2. A fully connected layer reduces the vector to 512-d,
  3. followed by a batch normalization (BN) layer
  4. and a ReLU layer.
  5. Finally, the resulting 512-d global feature vector is output for computing the classification loss.
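The head described in the list above (GAP, FC, BN, ReLU) can be sketched in pure Python with toy dimensions standing in for 2048 and 512. Assumptions: BN is shown in training mode without learnable scale and shift, the FC weights are random placeholders, and the classification loss of step 5 is not shown. A real implementation would use a deep learning framework.

```python
import math
import random


def global_average_pool(feature_map):
    """GAP: C x H x W nested lists -> C-d vector of per-channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def linear(vec, weight, bias):
    """Fully connected layer; weight has shape out_dim x in_dim."""
    return [sum(w * x for w, x in zip(w_row, vec)) + b for w_row, b in zip(weight, bias)]


def batch_norm(batch, eps=1e-5):
    """Training-mode BN without learnable scale/shift: normalize each dim over the batch."""
    out = [[0.0] * len(batch[0]) for _ in batch]
    for d in range(len(batch[0])):
        col = [vec[d] for vec in batch]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        for i, v in enumerate(col):
            out[i][d] = (v - mean) / math.sqrt(var + eps)
    return out


def relu(vec):
    return [max(0.0, x) for x in vec]


def global_branch_head(batch_maps, weight, bias):
    pooled = [global_average_pool(fm) for fm in batch_maps]  # step 1: GAP
    reduced = [linear(p, weight, bias) for p in pooled]        # step 2: FC
    normed = batch_norm(reduced)                               # step 3: BN
    return [relu(v) for v in normed]                           # step 4: ReLU


# Toy sizes: 8 input channels stand in for 2048, 4 outputs stand in for 512.
rng = random.Random(0)
C_IN, C_OUT, H, W, BATCH = 8, 4, 3, 2, 5
maps = [[[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
        for _ in range(BATCH)]
weight = [[rng.gauss(0.0, 0.1) for _ in range(C_IN)] for _ in range(C_OUT)]
bias = [0.0] * C_OUT
features = global_branch_head(maps, weight, bias)
print(len(features), len(features[0]))  # 5 4
```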

3) Adversarial Suppression Branch: The adversarial suppression branch aims to pay more attention to foreground information by suppressing the responses of the generated occlusion regions to zero.

We develop an occlusion suppression module (OSM) to achieve this goal.


1. Specifically, in the training phase, the IGO block converts the raw input into occluded data.
2. The raw data and the occluded data are then fed into their respective branches of the framework for feature extraction:

  • In the global branch, we retain the ResNet-50 baseline to extract stable global features of the raw data.
  • In the adversarial suppression branch, the OSM and a global max pooling operation are employed to force this branch to suppress the responses of the occlusions and to strengthen the discriminative feature representation of the non-occluded regions of the pedestrian.

3. Finally, the features of the two branches are concatenated into a more robust pedestrian feature descriptor. In the test phase, the incremental occlusion block is not applied.
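The training/test difference in step 3 can be illustrated with toy stand-ins. Everything here is hypothetical: `zero_top_left` replaces the IGO block, the OSM is omitted, and a single small feature map stands in for each branch's Stage 4 output. The sketch only shows that occlusion is applied during training, that the adversarial branch ends in global max pooling, and that the two branch features are concatenated.

```python
def global_average_pool(fmap):
    """C x H x W nested lists -> per-channel means."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def global_max_pool(fmap):
    """C x H x W nested lists -> per-channel maxima."""
    return [max(map(max, ch)) for ch in fmap]


def zero_top_left(fmap):
    """Hypothetical stand-in for the IGO block: zero the top-left unit of each channel."""
    return [[[0.0 if (i, j) == (0, 0) else v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in fmap]


def igoas_descriptor(fmap, training):
    global_feat = global_average_pool(fmap)                 # global branch: raw data
    branch_in = zero_top_left(fmap) if training else fmap  # no occlusion at test time
    adv_feat = global_max_pool(branch_in)                   # OSM omitted; ends in GMP
    return global_feat + adv_feat                           # concatenate both branches


fmap = [[[5.0, 1.0], [2.0, 0.0]]]  # one channel, 2 x 2
print(igoas_descriptor(fmap, training=True))   # [2.0, 2.0]
print(igoas_descriptor(fmap, training=False))  # [2.0, 5.0]
```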


Fig. 4. Simplified flow of the batch-based incremental occlusion block within one batch. Each cuboid represents the 3-D RGB tensor of one image.
For a batch of images, the block randomly generates a single occlusion region whose size and position are shared by all images in the batch (e.g., the red cuboid region).
All units inside the region are erased with a random value in [0, 255] to simulate occluded images.
Notably, the size of the occlusion increases with the number of training iterations.


--> 3-D RGB tensor
--> randomly generate an occlusion region (part of the image is erased)
--> the erased region is filled with a random value in [0, 255] (here all pixels in the occluded region share one single value, rather than each pixel taking its own random value in [0, 255])
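A sketch of a batch-based incremental occlusion step following the Fig. 4 caption and the notes above. The linear schedule (from 10% to 50% of the image area), the square-ish box shape, and drawing one fill value per batch are assumptions for illustration; the section only states that the occlusion grows with the iteration count, is shared across the batch, and is filled with a single random value in [0, 255].

```python
import random


def occlusion_ratio(iteration, total_iters, start=0.1, end=0.5):
    """Hypothetical easy-to-hard schedule: occluded area fraction grows linearly."""
    progress = min(iteration / total_iters, 1.0)
    return start + (end - start) * progress


def incremental_occlude(batch, iteration, total_iters, rng=random):
    """Occlude a batch of C x H x W images (nested lists, values in [0, 255]).

    One box, with the same size and position, and one fill value are shared by
    every image in the batch. Returns the occluded copies and the box.
    """
    channels, height, width = len(batch[0]), len(batch[0][0]), len(batch[0][0][0])
    side = occlusion_ratio(iteration, total_iters) ** 0.5  # square-ish box (assumption)
    box_h = max(1, round(height * side))
    box_w = max(1, round(width * side))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    fill = rng.randint(0, 255)  # a single value for every unit in the region
    occluded = []
    for img in batch:
        new_img = [[row[:] for row in ch] for ch in img]  # leave the input untouched
        for c in range(channels):
            for y in range(top, top + box_h):
                for x in range(left, left + box_w):
                    new_img[c][y][x] = fill
        occluded.append(new_img)
    return occluded, (top, left, box_h, box_w)


rng = random.Random(0)
batch = [[[[100] * 8 for _ in range(16)] for _ in range(3)] for _ in range(4)]  # 4 x 3 x 16 x 8
early, box_early = incremental_occlude(batch, 0, 1000, rng)
late, box_late = incremental_occlude(batch, 1000, 1000, rng)
print(box_early[2:], box_late[2:])  # (5, 3) (11, 6)
```

Early in training the box covers about 10% of the image; by the end of the schedule it covers about half, which is the easy-to-hard progression.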




Figure 2. Structure of the OSM. The multiplication symbol in the figure denotes element-wise multiplication.


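  1. The input feature X is first fed into an attention module to obtain the refined feature X^{'}. In this paper, CBAM [8] is used as the attention module of the OSM.
  2. X_{mask} is then obtained by element-wise multiplication of a binary mask and X^{'}, where the binary mask is obtained by rescaling the hand-crafted image occlusion mask to the size of the feature map.
  3. Finally, X_{mask} supervises X through the mask loss, so that during back-propagation the model learns to ignore the background regions (the black part of X_{mask} in Figure 2).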
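Steps 2 and 3 can be sketched as follows, assuming nearest-neighbour down-scaling of the image-level mask (1 = keep, 0 = occluded) to the feature-map size; the section only says the mask is rescaled. The attention module (CBAM) is not implemented: the hypothetical `refined` tensor stands in for X^{'}.

```python
def downscale_mask(image_mask, out_h, out_w):
    """Nearest-neighbour resize of an H x W binary mask (1 = keep, 0 = occluded)."""
    in_h, in_w = len(image_mask), len(image_mask[0])
    return [[image_mask[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
            for y in range(out_h)]


def apply_mask(refined, mask):
    """X_mask = M ⊙ X': multiply every channel of X' element-wise by the same 2-D mask."""
    return [[[v * m for v, m in zip(row, mask_row)] for row, mask_row in zip(ch, mask)]
            for ch in refined]


# 8 x 4 image-level mask whose top-left 4 x 2 block is the generated occlusion.
image_mask = [[0 if (y < 4 and x < 2) else 1 for x in range(4)] for y in range(8)]
mask = downscale_mask(image_mask, 4, 2)  # toy 4 x 2 feature map
print(mask)  # [[0, 1], [0, 1], [1, 1], [1, 1]]
refined = [[[1.5] * 2 for _ in range(4)] for _ in range(2)]  # stand-in for X' (2 x 4 x 2)
x_mask = apply_mask(refined, mask)
```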


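In the mask loss, MSE denotes the mean squared error. More specifically, the mask loss drives the features inside the occluded region as close to zero as possible. Because the location of the occlusion is random yet known, it can serve as supervision for the attention module to learn to suppress the responses of the generated occlusions.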
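The mask-loss equation itself is not reproduced in this note. One formulation consistent with the description is $\mathcal{L}_{mask} = \mathrm{MSE}(X^{'}, X_{mask})$ with $X_{mask} = M \odot X^{'}$, where M is the binary mask; this is an assumption, not copied from the paper. Because M is binary, the loss equals the mean of the squared values of X^{'} inside the occluded region, so only those responses are pushed toward zero:

```python
def apply_mask(refined, mask):
    """X_mask = M ⊙ X' with a 2-D binary mask shared by all channels."""
    return [[[v * m for v, m in zip(row, mrow)] for row, mrow in zip(ch, mask)]
            for ch in refined]


def mse(a, b):
    """Mean squared error between two equally shaped C x H x W nested lists."""
    fa = [v for ch in a for row in ch for v in row]
    fb = [v for ch in b for row in ch for v in row]
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)


def mask_loss(refined, mask):
    """Assumed form MSE(X', X_mask). Whether or not X_mask is detached in a real
    framework, only units inside the occluded region (mask == 0) contribute."""
    return mse(refined, apply_mask(refined, mask))


mask = [[0, 1], [1, 1]]  # the top-left unit is occluded
print(mask_loss([[[2.0, 1.0], [3.0, 4.0]]], mask))  # 1.0, i.e. (2 - 0)^2 / 4
print(mask_loss([[[0.0, 1.0], [3.0, 4.0]]], mask))  # 0.0, occluded response already zero
```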


Fig. 5. Comparison of (a) the single-based random occlusion block and (b) the batch-based incremental occlusion block.

In (a), each image in the batch suffers from an occlusion of its own random size and random position.
In (b), all images in the batch suffer from an occlusion of the same size and position. As the number of iterations increases, however, the block generates occlusions of varying size and position that progress from easy to hard.
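To make the contrast in Fig. 5 concrete, the sketch below draws occlusion boxes both ways for a batch of 384 × 128 images. The area range and the square-ish box shape are assumptions; in (b), the `area_frac` argument would come from an easy-to-hard schedule driven by the iteration count.

```python
import random


def sample_box(height, width, box_h, box_w, rng):
    """Random top-left corner for a box_h x box_w box inside an H x W image."""
    return rng.randint(0, height - box_h), rng.randint(0, width - box_w), box_h, box_w


def box_from_area(height, width, area_frac, rng):
    side = area_frac ** 0.5  # square-ish box covering about area_frac of the image
    return sample_box(height, width, max(1, round(height * side)),
                      max(1, round(width * side)), rng)


def single_based_boxes(batch_size, height, width, rng, min_frac=0.1, max_frac=0.5):
    """(a) Each image gets its own random size and position."""
    return [box_from_area(height, width, rng.uniform(min_frac, max_frac), rng)
            for _ in range(batch_size)]


def batch_based_boxes(batch_size, height, width, area_frac, rng):
    """(b) One size and position shared by the whole batch."""
    return [box_from_area(height, width, area_frac, rng)] * batch_size


rng = random.Random(0)
boxes_a = single_based_boxes(32, 384, 128, rng)
boxes_b = batch_based_boxes(32, 384, 128, 0.2, rng)
print(len(set(boxes_a)), len(set(boxes_b)))  # many distinct boxes vs exactly one
```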




A comparison with other data augmentation methods, namely Batch DropBlock [11], Slow-Drop Block [14], and the Batch Random Erasing Block, is shown in the figure below:

Figure 3. Comparison of IGO with three data augmentation methods

Fig. 6. Comparison of (a) Batch DropBlock, (b) Slow-Drop Block, (c) Batch-based Random Erasing Block, and (d) Incremental Generative Occlusion Block.

In (a), Batch DropBlock randomly drops the same fixed-size region of all deep feature maps in a batch to reinforce attentive feature learning on local regions. In (b), Slow-Drop Block moves the dropping operation from the deep feature layer to the input images to ensure input diversity for feature learning.
In (c), the batch-based random erasing block generates an occlusion mask of random size and random position and fills the masked region of the original image with the ImageNet mean.
In (d), our incremental occlusion block generates variable-position, easy-to-hard occlusions to enrich occlusion diversity, so that more varied occluded images can be generated. The easy-to-hard learning strategy also makes the network more robust to occlusion by gradually learning harder occlusions instead of facing the hardest occlusions directly.
(Note: variable position, with the occluded area growing gradually.) For (d), the authors claim that this augmentation can generate more occluded images [compared with (c), can it really produce more occluded images?] and that the easy-to-hard strategy makes the network more robust [what is the reason it is more robust?].
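The methods in Fig. 6 also differ in what replaces the erased units. The toy sketch below compares the fill values on a 3 × H × W image-shaped tensor, assuming the ImageNet mean (0.485, 0.456, 0.406) scaled to [0, 255] for (c). It is a simplification: Batch DropBlock actually operates on deep feature maps rather than input pixels, the "drop" mode is only a zero-fill stand-in for (a) and (b), and the IGO size schedule is not shown.

```python
import random

# ImageNet per-channel mean (0.485, 0.456, 0.406) rescaled to [0, 255].
IMAGENET_MEAN_255 = (0.485 * 255, 0.456 * 255, 0.406 * 255)


def erase_region(image, box, method, rng=random):
    """Return a copy of a 3 x H x W image with box = (top, left, h, w) replaced."""
    top, left, h, w = box
    if method == "drop":              # simplified stand-in for (a)/(b): zero the units
        fills = [0.0] * len(image)
    elif method == "random_erasing":  # (c): ImageNet mean, one value per channel
        fills = list(IMAGENET_MEAN_255)
    elif method == "igo":             # (d): one random value in [0, 255] for the region
        fills = [rng.randint(0, 255)] * len(image)
    else:
        raise ValueError(f"unknown method: {method}")
    out = [[row[:] for row in ch] for ch in image]
    for c, value in enumerate(fills):
        for y in range(top, top + h):
            for x in range(left, left + w):
                out[c][y][x] = value
    return out


image = [[[50] * 4 for _ in range(4)] for _ in range(3)]
box = (1, 1, 2, 2)
for method in ("drop", "random_erasing", "igo"):
    print(method, [erase_region(image, box, method, random.Random(1))[c][1][1] for c in range(3)])
```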

Fig. 9. Visualization of the Stage 4 feature maps learned with different occlusion methods.

The first, second, third, and fourth rows show the results for the baseline (ResNet-50) without occlusion simulation, IGOAS combined with single-based random occlusion, IGOAS combined with batch-based random occlusion, and IGOAS combined with batch-based incremental occlusion.



Fig. 8. Visualization of the Stage 4 feature maps learned by the different branches.

The first and second rows show the results for the global branch and the attentive branch.
From left to right, (i) Original images, (ii) Activation map, and (iii) Overlapped image.
In the heat map, the response increases from blue to red.


Table 1. Performance on Occluded-DukeMTMC

Table 2. Performance on Occluded-ReID

Table 3. Performance on two holistic person re-id datasets
