[HumanSeg] Add Semantic Connectivity-aware Learning and release teleconferencing dataset (PaddlePaddle#1685)

LutaoChu · web-flow · commit c6f8924844d8 · 2022-01-04T21:37:35.000+08:00
diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@ English | [简体中文](README_CN.md)
 ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
 ## PaddleSeg has released the new version including the following features:
 
+* We published a paper on portrait segmentation named [PP-HumanSeg](./contrib/PP-HumanSeg/paper.md), and released the Semantic Connectivity-aware Learning (SCL) framework and a large-scale teleconferencing video dataset.
 * We published a paper on interactive segmentation named [EdgeFlow](https://arxiv.org/abs/2109.09406), in which the proposed approach achieved SOTA performance on several well-known datasets, and upgraded the interactive annotation tool, [EISeg](./EISeg).
 * We released two [Matting](./contrib/Matting) algorithms, DIM and MODNet, which achieve extremely fine-grained segmentation.
 * We provided advanced features for segmentation model compression, [Knowledge Distillation](./slim/distill) and [Model Quantization](./slim/quant), which accelerate model inference in multi-device deployment.
diff --git a/README_CN.md b/README_CN.md
@@ -16,6 +16,7 @@ PaddleSeg团队将举办主题为《产业图像分割应用与实战》的两
 
 ## PaddleSeg 2.3 has been released. Welcome to try it out!
 
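+* The PaddleSeg team published a portrait segmentation paper, [PP-HumanSeg](./contrib/PP-HumanSeg/paper.md), and open-sourced the Semantic Connectivity-aware Learning (SCL) method and a large-scale teleconferencing video dataset.
 * The PaddleSeg team published an interactive segmentation paper, [EdgeFlow](https://arxiv.org/abs/2109.09406), which achieves SOTA performance on several datasets, and upgraded the interactive segmentation tool [EISeg](./EISeg).
 * Open-sourced two [Matting](./contrib/Matting) algorithms, the classic DIM and the real-time MODNet, for fine-grained portrait segmentation.
 * Released advanced segmentation features, [model distillation](./slim/distill) and [model quantization](./slim/quant), which further improve model deployment efficiency.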
diff --git a/configs/fastscnn/fastscnn_cityscapes_1024x1024_40k.yml b/configs/fastscnn/fastscnn_cityscapes_1024x1024_40k.yml
@@ -0,0 +1,21 @@
+_base_: '../_base_/cityscapes_1024x1024.yml'
+
+batch_size: 4
+iters: 40000
+
+loss:
+  types:
+    - type: CrossEntropyLoss
+  coef: [1.0, 0.4]
+
+lr_scheduler:
+  type: PolynomialDecay
+  learning_rate: 0.025
+  end_lr: 1.0e-4
+  power: 0.9
+
+model:
+  type: FastSCNN
+  num_classes: 19
+  enable_auxiliary_loss: True
+  pretrained: null
diff --git a/configs/fastscnn/fastscnn_cityscapes_1024x1024_40k_SCL.yml b/configs/fastscnn/fastscnn_cityscapes_1024x1024_40k_SCL.yml
@@ -0,0 +1,26 @@
+_base_: '../_base_/cityscapes_1024x1024.yml'
+
+batch_size: 4
+iters: 40000
+
+loss:
+  types:
+    - type: MixedLoss
+      losses:
+        - type: CrossEntropyLoss
+        - type: SemanticConnectivityLearning
+      coef: [1, 0.01]
+    - type: CrossEntropyLoss
+  coef: [1.0, 0.4]
+
+lr_scheduler:
+  type: PolynomialDecay
+  learning_rate: 0.025
+  end_lr: 1.0e-4
+  power: 0.9
+
+model:
+  type: FastSCNN
+  num_classes: 19
+  enable_auxiliary_loss: True
+  pretrained: null
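
The nested `coef` lists in the SCL config above can be read as a two-level weighted sum. The following is a minimal sketch of that arithmetic, assuming that `MixedLoss` sums its sub-losses weighted by its own `coef`, and that the two entries under `types` apply to the model's main and auxiliary outputs respectively. The helper names `mixed_loss` and `total_loss` and the loss values are hypothetical and are not part of PaddleSeg.

```python
def mixed_loss(sub_losses, coef):
    """Weighted sum of sub-losses (assumed MixedLoss-style combination)."""
    if len(sub_losses) != len(coef):
        raise ValueError("sub_losses and coef must have the same length")
    return sum(w * value for w, value in zip(coef, sub_losses))


def total_loss(ce_main, scl_main, ce_aux):
    """Combine per-output losses with the top-level coef [1.0, 0.4]."""
    # Inner level: CrossEntropyLoss + 0.01 * SemanticConnectivityLearning.
    main = mixed_loss([ce_main, scl_main], coef=[1, 0.01])
    # Outer level: main output weighted 1.0, auxiliary output weighted 0.4.
    return 1.0 * main + 0.4 * ce_aux


# Hypothetical loss values, chosen only for illustration.
print(total_loss(ce_main=0.50, scl_main=0.80, ce_aux=0.60))
```

Under these assumptions the SC Loss term contributes only 1% of its raw value to the main-output loss, so it acts as a small regularizer on top of cross entropy.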
diff --git a/configs/fcn/fcn_hrnetw18_cityscapes_1024x512_80k_bs4.yml b/configs/fcn/fcn_hrnetw18_cityscapes_1024x512_80k_bs4.yml
@@ -0,0 +1,17 @@
+_base_: '../_base_/cityscapes.yml'
+
+model:
+  type: FCN
+  backbone:
+    type: HRNet_W18
+    align_corners: False
+    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
+  num_classes: 19
+  pretrained: Null
+  backbone_indices: [-1]
+
+optimizer:
+  weight_decay: 0.0005
+
+iters: 80000
+batch_size: 4
diff --git a/configs/fcn/fcn_hrnetw18_cityscapes_1024x512_80k_bs4_SCL.yml b/configs/fcn/fcn_hrnetw18_cityscapes_1024x512_80k_bs4_SCL.yml
@@ -0,0 +1,26 @@
+_base_: '../_base_/cityscapes.yml'
+
+model:
+  type: FCN
+  backbone:
+    type: HRNet_W18
+    align_corners: False
+    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
+  num_classes: 19
+  pretrained: Null
+  backbone_indices: [-1]
+
+optimizer:
+  weight_decay: 0.0005
+
+iters: 80000
+batch_size: 4
+
+loss:
+  types:
+    - type: MixedLoss
+      losses:
+        - type: CrossEntropyLoss
+        - type: SemanticConnectivityLearning
+      coef: [1, 0.05]
+  coef: [1]
diff --git a/configs/ocrnet/ocrnet_hrnetw48_cityscapes_1024x512_40k.yml b/configs/ocrnet/ocrnet_hrnetw48_cityscapes_1024x512_40k.yml
@@ -0,0 +1,28 @@
+
+_base_: '../_base_/cityscapes.yml'
+
+batch_size: 2
+iters: 40000
+
+model:
+  type: OCRNet
+  backbone:
+    type: HRNet_W48
+
+    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
+  num_classes: 19
+  backbone_indices: [0]
+
+optimizer:
+  type: sgd
+
+lr_scheduler:
+  type: PolynomialDecay
+  learning_rate: 0.01
+  power: 0.9
+
+loss:
+  types:
+    - type: CrossEntropyLoss
+    - type: CrossEntropyLoss
+  coef: [1, 0.4]
diff --git a/configs/ocrnet/ocrnet_hrnetw48_cityscapes_1024x512_40k_SCL.yml b/configs/ocrnet/ocrnet_hrnetw48_cityscapes_1024x512_40k_SCL.yml
@@ -0,0 +1,32 @@
+
+_base_: '../_base_/cityscapes.yml'
+
+batch_size: 2
+iters: 40000
+
+model:
+  type: OCRNet
+  backbone:
+    type: HRNet_W48
+
+    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
+  num_classes: 19
+  backbone_indices: [0]
+
+optimizer:
+  type: sgd
+
+lr_scheduler:
+  type: PolynomialDecay
+  learning_rate: 0.01
+  power: 0.9
+
+loss:
+  types:
+    - type: MixedLoss
+      losses:
+        - type: CrossEntropyLoss
+        - type: SemanticConnectivityLearning
+      coef: [1, 0.1]
+    - type: CrossEntropyLoss
+  coef: [1, 0.4]
diff --git a/configs/pp_humanseg_lite/README.md b/configs/pp_humanseg_lite/README.md
@@ -1,6 +1,6 @@
 # PP-HumanSeg-Lite
 
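-A self-developed ultra-lightweight model, suitable for real-time segmentation on the web or on mobile devices.
+ConnectNet, a self-developed ultra-lightweight model, suitable for real-time segmentation on the web or on mobile devices.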
 
 ## Network Structure
 ![](pphumanseg_lite.png)
diff --git a/contrib/PP-HumanSeg/README.md b/contrib/PP-HumanSeg/README.md
@@ -8,6 +8,9 @@
 <img src="https://github.com/LutaoChu/transfer_station/raw/master/conference.gif" width="70%" height="70%">
 </p>
 
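+## Latest News
+- [2022-1-4] The portrait segmentation paper [PP-HumanSeg](./paper.md) was published at the WACV 2022 Workshop, and the Semantic Connectivity-aware Learning (SCL) method and a large-scale teleconferencing video dataset were open-sourced.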
+
 ## Contents
 - [Portrait segmentation models](#人像分割模型)
   - [General portrait segmentation](#通用人像分割)
diff --git a/contrib/PP-HumanSeg/paper.md b/contrib/PP-HumanSeg/paper.md
@@ -0,0 +1,35 @@
+# Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset
+Official resource for the paper PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset. [[Paper](https://arxiv.org/abs/2112.07146) | [Poster](https://paddleseg.bj.bcebos.com/dygraph/humanseg/paper/12-HAD-poster.pdf) | [YouTube](https://www.youtube.com/watch?v=FlK8R5cdD7E)]
+
+## Semantic Connectivity-aware Learning
+The SCL (Semantic Connectivity-aware Learning) framework introduces an SC Loss (Semantic Connectivity-aware Loss) to improve the quality of segmentation results from the perspective of connectivity. It supports multi-class segmentation. [[Source code](../../paddleseg/models/losses/semantic_connectivity_learning.py)]
+
+SCL can improve the integrity of segmented objects and increase segmentation accuracy. The experimental results on our Teleconferencing Video Dataset are reported in the paper, and the experimental results on Cityscapes are as follows:
+
+### Performance on Cityscapes
+| Model | Backbone | Learning Strategy | GPUs * Batch Size (Per Card) | Training Iters | mIoU (%) | Config |
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+|OCRNet|HRNet-W48|-|2*2|40000|76.23| [config](../../configs/ocrnet/ocrnet_hrnetw48_cityscapes_1024x512_40k.yml) |
+|OCRNet|HRNet-W48|SCL|2*2|40000|78.29(**+2.06**)|[config](../../configs/ocrnet/ocrnet_hrnetw48_cityscapes_1024x512_40k_SCL.yml) |
+|FCN|HRNet-W18|-|2*4|80000|77.81|[config](../../configs/fcn/fcn_hrnetw18_cityscapes_1024x512_80k_bs4.yml)|
+|FCN|HRNet-W18|SCL|2*4|80000|78.68(**+0.87**)|[config](../../configs/fcn/fcn_hrnetw18_cityscapes_1024x512_80k_bs4_SCL.yml)|
+|Fast SCNN|-|-|2*4|40000|56.41|[config](../../configs/fastscnn/fastscnn_cityscapes_1024x1024_40k.yml)|
+|Fast SCNN|-|SCL|2*4|40000|57.37(**+0.96**)|[config](../../configs/fastscnn/fastscnn_cityscapes_1024x1024_40k_SCL.yml)|
+
+## Large-Scale Teleconferencing Video Dataset
+A large-scale video portrait dataset that contains 291 videos from 23 conference scenes with 14K fine-labeled frames. The data can be obtained by sending an application email to chulutao@baidu.com.
+
+
+## Citation
+If our project is useful in your research, please cite:
+
+```latex
+@InProceedings{Chu_2022_WACV,
+    author    = {Chu, Lutao and Liu, Yi and Wu, Zewu and Tang, Shiyu and Chen, Guowei and Hao, Yuying and Peng, Juncai and Yu, Zhiliang and Chen, Zeyu and Lai, Baohua and Xiong, Haoyi},
+    title     = {PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset},
+    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
+    month     = {January},
+    year      = {2022},
+    pages     = {202-209}
+}
+```
diff --git a/docs/module/loss/SemanticConnectivityLearning_cn.md b/docs/module/loss/SemanticConnectivityLearning_cn.md
@@ -0,0 +1,27 @@
+简体中文 | [English](SemanticConnectivityLearning_en.md)
+## [SemanticConnectivityLearning](../../../paddleseg/models/losses/semantic_connectivity_learning.py)
+The SCL (Semantic Connectivity-aware Learning) framework introduces an SC Loss (Semantic Connectivity-aware Loss) to improve the quality of segmentation results from the perspective of connectivity. It supports multi-class segmentation.
+
+Reference paper:
+    Lutao Chu, Yi Liu, Zewu Wu, Shiyu Tang, Guowei Chen, Yuying Hao, Juncai Peng, Zhiliang Yu, Zeyu Chen, Baohua Lai, Haoyi Xiong.
+    "PP-HumanSeg: Connectivity-Aware Portrait Segmentation with a Large-Scale Teleconferencing Video Dataset"
+    In WACV 2022 workshop
+    https://arxiv.org/abs/2112.07146
+
+Procedure:
+Step 1. Connected component calculation
+Step 2. Connected component matching and SC Loss calculation
+```python
+class paddleseg.models.losses.SemanticConnectivityLearning(
+            ignore_index = 255,
+            max_pred_num_conn = 10,
+            use_argmax = True
+)
+```
+
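+## Semantic Connectivity Learning (SCL) Usage Guide
+
+### Args
+* **ignore_index** (int): A pixel value in the annotation map to ignore; ignored pixels do not contribute to the input gradient. Pixels that cannot be annotated (or are hard to annotate) can be assigned a specific gray value, and the corresponding positions are then excluded when the loss is computed. *Default: ``255``*
+* **max_pred_num_conn** (int): Maximum number of predicted connected components. At the start of training there are often many connected components, which makes the computation very time-consuming. The number of predicted connected components is therefore capped, and components beyond the cap do not take part in the computation. *Default: ``10``*
+* **use_argmax** (bool): Whether to apply argmax to the logits. *Default: ``True``*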
diff --git a/docs/module/loss/SemanticConnectivityLearning_en.md b/docs/module/loss/SemanticConnectivityLearning_en.md
@@ -0,0 +1,33 @@
+English | [简体中文](SemanticConnectivityLearning_cn.md)
+## [SemanticConnectivityLearning](../../../paddleseg/models/losses/semantic_connectivity_learning.py)
+The SCL (Semantic Connectivity-aware Learning) framework introduces an SC Loss (Semantic Connectivity-aware Loss)
+to improve the quality of segmentation results from the perspective of connectivity. It supports multi-class segmentation.
+
+Reference paper:
+    Lutao Chu, Yi Liu, Zewu Wu, Shiyu Tang, Guowei Chen, Yuying Hao, Juncai Peng, Zhiliang Yu, Zeyu Chen, Baohua Lai, Haoyi Xiong.
+    "PP-HumanSeg: Connectivity-Aware Portrait Segmentation with a Large-Scale Teleconferencing Video Dataset"
+    In WACV 2022 workshop
+    https://arxiv.org/abs/2112.07146
+
+Procedure:
+Step 1. Connected Components Calculation
+Step 2. Connected Components Matching and SC Loss Calculation
+
+```python
+class paddleseg.models.losses.SemanticConnectivityLearning(
+            ignore_index = 255,
+            max_pred_num_conn = 10,
+            use_argmax = True
+)
+```
+
+## Semantic Connectivity Learning Usage Guide
+
+### Args
+* **ignore_index** (int): A pixel value in the annotated image to ignore; ignored pixels do not contribute to the input gradient. Pixels that cannot be annotated (or are difficult to annotate) can be assigned a specific gray value, and the corresponding positions are then excluded when the loss is computed. *Default: ``255``*
+* **max_pred_num_conn** (int): Maximum number of predicted connected components. At the beginning of training there are often many connected components, which makes the computation very time-consuming. The number of predicted connected components is therefore capped, and components beyond the cap do not take part in the computation. *Default: ``10``*
+* **use_argmax** (bool): Whether to apply argmax to the logits. *Default: ``True``*
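
The two steps named in this document (connected component calculation, then matching) can be illustrated with a small pure-Python sketch. It is not the code in `semantic_connectivity_learning.py`. The function names, the choice of 4-connectivity, the overlap-based pairing, and the way `max_pred_num_conn` truncates the component list are all assumptions made for illustration. The actual SC Loss formula is defined in the paper and is not reproduced here.

```python
from collections import deque


def label_components(mask):
    """Step 1: label 4-connected foreground regions of a binary mask."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def component_pixels(labels, count):
    """Return one set of (row, col) pixels per labeled component."""
    comps = [set() for _ in range(count)]
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                comps[lab - 1].add((r, c))
    return comps


def match_components(pred_mask, gt_mask, max_pred_num_conn=10):
    """Step 2 (matching only): pair overlapping components and report IoU."""
    # Assumption: keep only the first max_pred_num_conn predicted components.
    pred = component_pixels(*label_components(pred_mask))[:max_pred_num_conn]
    gt = component_pixels(*label_components(gt_mask))
    pairs = []
    for i, p in enumerate(pred):
        for j, g in enumerate(gt):
            inter = len(p & g)
            if inter:
                pairs.append((i, j, inter / len(p | g)))
    return pairs


# Toy masks: the prediction splits into two regions, the label has one.
pred_mask = [[1, 1, 0, 0, 1],
             [1, 1, 0, 0, 1],
             [0, 0, 0, 0, 0]]
gt_mask = [[1, 1, 1, 0, 0],
           [1, 1, 1, 0, 0],
           [0, 0, 0, 0, 0]]
print(match_components(pred_mask, gt_mask))
```

In this toy case the isolated predicted region on the right overlaps no ground-truth component and therefore stays unmatched, which is the kind of connectivity error a connectivity-aware loss is meant to penalize.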
diff --git a/docs/module/loss/losses_cn.md b/docs/module/loss/losses_cn.md
@@ -23,4 +23,6 @@
 
 * ## [paddleseg.models.losses.ohem_cross_entropy_loss](./OhemCrossEntropyLoss_cn.md)
 
-* ## [paddleseg.models.losses.ohem_edge_attention_loss](./OhemEdgeAttentionLoss_cn.md)
+* ## [paddleseg.models.losses.ohem_edge_attention_loss](./OhemEdgeAttentionLoss_cn.md)
+
+* ## [paddleseg.models.losses.semantic_connectivity_learning](./SemanticConnectivityLearning_cn.md)
diff --git a/docs/module/loss/losses_en.md b/docs/module/loss/losses_en.md
@@ -23,4 +23,6 @@ English | [简体中文](losses_cn.md)
 
 * ## [paddleseg.models.losses.ohem_cross_entropy_loss](./OhemCrossEntropyLoss_en.md)
 
-* ## [paddleseg.models.losses.ohem_edge_attention_loss](./OhemEdgeAttentionLoss_en.md)
+* ## [paddleseg.models.losses.ohem_edge_attention_loss](./OhemEdgeAttentionLoss_en.md)
+
+* ## [paddleseg.models.losses.semantic_connectivity_learning](./SemanticConnectivityLearning_en.md)
diff --git a/paddleseg/models/losses/__init__.py b/paddleseg/models/losses/__init__.py
@@ -33,3 +33,4 @@
 from .point_cross_entropy_loss import PointCrossEntropyLoss
 from .pixel_contrast_cross_entropy_loss import PixelContrastCrossEntropyLoss
 from .semantic_encode_cross_entropy_loss import SECrossEntropyLoss
+from .semantic_connectivity_learning import SemanticConnectivityLearning
diff --git a/paddleseg/models/losses/semantic_connectivity_learning.py b/paddleseg/models/losses/semantic_connectivity_learning.py