Weekly-220410 - WYM's Blog

Last updated: April 10, 2022 pm

Weekly study report

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs^[1]

L-Verse: Bidirectional Generation Between Image and Text^[2]

Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences^[3]

Subspace Adversarial Training^[4]

1. CoordGAN

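1.1 Motivation

An image can be decomposed into a texture component and a structure component. Image generation aims to learn a continuous, smooth latent space that captures pixel-level correspondence between different images, so that texture and structure can be disentangled more cleanly (latent disentanglement) and more meaningful images can be generated. The paper learns an explicit correspondence map and designs a new coordinate space in which dense correspondences between images are learned.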

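1.2 Framework

The framework has four parts: a texture mapping network, a structure mapping network, a coordinate transformation network (which analyzes the dense correspondence map), and a generator module.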

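1.3 Experimental results

1.4 Summary and outlook

Summary: the paper designs a dense correspondence map, which allows the GAN latent space to be disentangled more effectively.
Outlook: the approach could be extended to 3D.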

2. L-Verse

Title: L-Verse (CVPR oral)
L-Verse: Bidirectional Generation Between Image and Text

Figure, first row: image-to-text generation results.
Figure, second row: text-to-image generation results.

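2.1 Motivation

Transformers perform strongly in high-resolution image generation. For converting between text and images, VQ-VAE maintains a codebook that efficiently extracts the information in an image and turns it into a sequence of feature vectors. The paper proposes L-Verse, which consists of a feature-augmented variational autoencoder (AugVAE) and a bidirectional auto-regressive Transformer (BiART). It supports both image-to-text and text-to-image generation without fine-tuning or an extra object detection framework.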
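To make the codebook idea concrete, here is a minimal sketch of the nearest-neighbor lookup at the core of VQ-VAE-style quantization. It is not AugVAE's implementation: the 2-D toy codebook, the feature values, and the names `quantize`, `CODEBOOK`, and `FEATURES` are assumptions made for illustration, and plain Python is used to stay dependency-free.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for f in features:
        distances = [squared_distance(f, c) for c in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical toy codebook with four 2-D entries.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Hypothetical encoder outputs, one vector per image patch.
FEATURES = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.95, 1.1)]

TOKENS = quantize(FEATURES, CODEBOOK)           # discrete image tokens
RECONSTRUCTION = [CODEBOOK[t] for t in TOKENS]  # vectors a decoder would see
print(TOKENS)
```

The resulting index sequence is what an autoregressive Transformer can model alongside text tokens, which is why the codebook turns an image into a sequence.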

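Contributions

AugVAE: achieves state-of-the-art image reconstruction performance.
BiART: has two different embedding vectors, so generation can be conditioned on the reference or on the target respectively; that is, it can generate text from an image and an image from text.
L-Verse: combines the two modules above and performs image-to-text generation without any extra object detection framework (such as Faster-RCNN, used to extract region features).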

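2.2 Framework

AugVAE: the blue part of the figure. It has a hierarchical structure and effectively extracts image features into image tokens.
REF: the given reference, i.e. text is generated from the reference image.
GEN: the given target, i.e. an image is generated from the text.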

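2.3 Experiments

Quantitative analysis: AugVAE reaches a state-of-the-art reconstruction FID.

Qualitative analysis: image-to-text (I2T) and text-to-image (T2I) results.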

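2.4 Summary and outlook

The main task of the paper is bidirectional generation between images and text. It proposes two components, AugVAE and BiART, which effectively extract the information in an image and enable generation in both directions.
The work is inspired by VQ-GAN in using a Transformer to generate high-resolution images, and draws on CLIP to sample images better.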

3. Probabilistic Warp Consistency

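3.1 Introduction

Method: The paper proposes Probabilistic Warp Consistency, a weakly-supervised objective for semantic matching. The goal is to supervise more directly the dense matching scores predicted by the network, which are encoded as a conditional probability distribution. The method first builds an image triplet: a known warp is applied to an image I to produce a warped image I', and these two are combined with another image J, where I and J show different instances of the same class. Probabilistic learning objectives are then derived from the constraints that the triplet provides. By extending the probabilistic output space with a learnable special state, the paper also handles occlusion and background clutter. To provide suitable supervision, it further designs an objective between image pairs that depict different classes.
Experiments: The objective is applied to four recent semantic matching architectures. The experiments show state-of-the-art results, and combining the objective with keypoint annotations also improves the strongly-supervised setting.
Research area: semantic matching, which finds pixel-level correspondences between different instances of the same class. It can be applied to semantic segmentation and image editing.
Problem addressed: strong supervision needs a large amount of manually annotated data. Weak supervision (here, image-level labels only, which is one form of weak supervision) is cheaper but often performs worse.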

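3.2 Related work

Semantic matching architectures: three main steps: 1) feature extraction; 2) cost volume construction, which stores the matching score between every pair of pixels in the two images; 3) displacement estimation.
Unsupervised and weakly-supervised semantic matching: use proxy losses on the cost volume constructed between real image pairs, with image labels as the only supervision.
Unsupervised learning from videos: self-supervised methods are proposed to extract features.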
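The three-step matching pipeline listed above can be sketched on a toy 1-D example. This is only an illustration under stated assumptions: the hand-written one-hot features stand in for the output of a feature extractor, the dot product is one possible similarity measure, and the names `cost_volume`, `best_matches`, `FEATS_I`, and `FEATS_J` are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cost_volume(feats_i, feats_j):
    """Step 2: C[i][j] scores location i of image I against location j of image J."""
    return [[dot(fi, fj) for fj in feats_j] for fi in feats_i]


def best_matches(cost):
    """Step 3a: for each location j in J, pick the best-scoring location i in I."""
    n_i, n_j = len(cost), len(cost[0])
    return [max(range(n_i), key=lambda i: cost[i][j]) for j in range(n_j)]


# Step 1 (assumed done elsewhere): one feature vector per location.
FEATS_I = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Image J shows the same content cyclically shifted by one location.
FEATS_J = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]

COST = cost_volume(FEATS_I, FEATS_J)
MATCHES = best_matches(COST)
# Step 3b: displacement from each location j in J to its match in I.
DISPLACEMENTS = [i - j for j, i in enumerate(MATCHES)]
print(MATCHES, DISPLACEMENTS)
```

Real architectures work on 2-D feature maps and usually refine the cost volume before estimating displacements; the toy version only shows how the three steps connect.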

3.3 Method

Probabilistic formulation

$P_{I \leftarrow J}(i \mid j)=\frac{\exp \left(C_{I \leftarrow J}(i, j)\right)}{\sum_{k} \exp \left(C_{I \leftarrow J}(k, j)\right)}$
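The formula above is a softmax over the cost volume: for a fixed location j in J, the scores of all candidate locations in I are normalized into a distribution. A minimal sketch, assuming the cost volume is a nested list indexed as `cost[i][j]`; the function name `matching_probabilities` and the toy values are illustrative.

```python
import math


def matching_probabilities(cost):
    """Return P with P[i][j] = exp(C[i][j]) / sum_k exp(C[k][j])."""
    n_i, n_j = len(cost), len(cost[0])
    probs = [[0.0] * n_j for _ in range(n_i)]
    for j in range(n_j):
        column = [cost[i][j] for i in range(n_i)]
        shift = max(column)  # subtracting the max avoids overflow in exp
        exps = [math.exp(c - shift) for c in column]
        total = sum(exps)
        for i in range(n_i):
            probs[i][j] = exps[i] / total
    return probs


# Toy cost volume: 3 locations in I (rows), 2 locations in J (columns).
COST = [[2.0, 0.0],
        [0.0, 0.0],
        [0.0, 1.0]]
PROBS = matching_probabilities(COST)
for row in PROBS:
    print([round(p, 3) for p in row])
```

Each column of `PROBS` sums to one, so it can be read as the probability that location j in J matches each location i in I.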

Probabilistic warp consistency constraints

Modeling unmatched regions

$L_{\text{vis-PW-bi}}=\sum_{i^{\prime}} \widehat{V}\left(i^{\prime}\right) \mathcal{H}\left(\widehat{P}_{I \leftarrow J \leftarrow I^{\prime}}\left(\cdot \mid i^{\prime}\right), P_{W}\left(\cdot \mid i^{\prime}\right)\right)$

$L_{\mathrm{PNeg}}=\sum_{i} \mathcal{B}\left(\widehat{P}_{A \leftarrow I}(\emptyset \mid i), p_{\mathrm{neg}}\right)$
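The structure of the visibility-weighted loss can be sketched as follows. The notes do not define the symbols, so this sketch assumes that H is the cross-entropy with the known-warp distribution P_W as the target and that V-hat is a per-location visibility weight. The function names and the toy distributions are hypothetical.

```python
import math


def cross_entropy(predicted, target, eps=1e-12):
    """Assumed form of H: -sum_k target[k] * log(predicted[k])."""
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))


def visibility_weighted_loss(predicted_rows, target_rows, visibility):
    """Sum over locations i' of V(i') * H(predicted(.|i'), target(.|i'))."""
    return sum(v * cross_entropy(p, t)
               for p, t, v in zip(predicted_rows, target_rows, visibility))


# Toy example with two query locations and three candidate matches each.
PREDICTED = [[0.7, 0.2, 0.1],   # composed prediction for location 0
             [0.1, 0.1, 0.8]]   # composed prediction for location 1
TARGET = [[1.0, 0.0, 0.0],      # distribution given by the known warp
          [0.0, 1.0, 0.0]]
VISIBILITY = [1.0, 0.0]         # location 1 is treated as not visible

LOSS = visibility_weighted_loss(PREDICTED, TARGET, VISIBILITY)
print(round(LOSS, 4))
```

Because the second location has zero visibility weight, its poor prediction does not contribute to the loss, which is the role the mask plays for occluded or unmatched regions.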

Training objectives

$L_{\text{weak}}=L_{\text{vis-PW-bi}}+\lambda_{\text{P-warp-sup}} L_{\text{P-warp-sup}}+\lambda_{\mathrm{PNeg}} L_{\mathrm{PNeg}}$

$L_{\text{strong}}=L_{\text{vis-PW-bi}}+\lambda_{\text{P-warp-sup}} L_{\text{P-warp-sup}}+\lambda_{\mathrm{kp}} L_{\mathrm{kp}}$

3.4 Experimental results

Quantitative analysis: ablation study

Qualitative analysis: the network directly predicts a Dirac-like distribution

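3.5 Summary and outlook

Probabilistic Warp Consistency: an objective function suited to weak supervision.
Several probabilistic losses are introduced from image triplets generated from real image pairs.
It achieves state-of-the-art results on several benchmarks.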

4. Subspace Adversarial Training

4.1 Introduction

The goal is to solve the catastrophic overfitting problem.

Adversarial training: Adversarial training (AT), which aims to minimize the model’s risk under the worst-case perturbations, is currently the most effective approach for improving the robustness of deep neural networks.
Single-step adversarial training: Since the adversarial examples above are generated by one-step gradient propagation, the corresponding AT is called single-step AT.
Proposed method: Based on this discovery, we propose a new AT method called Subspace Adversarial Training (Sub-AT), which identifies such an effective subspace and conducts AT in it.
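As an illustration of single-step adversarial example generation, here is a minimal sketch on a logistic-regression model. It assumes an FGSM-style sign step, which is a common choice for single-step AT but is not spelled out in these notes. The model weights, the input, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy w.r.t. x for a logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def single_step_adversarial(w, b, x, y, epsilon):
    """One gradient evaluation, then a step of size epsilon along its sign."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and a correctly classified input with label 1.
W, B = [2.0, -1.0], 0.0
X, Y = [0.5, 0.5], 1
EPSILON = 8 / 255

X_ADV = single_step_adversarial(W, B, X, Y, EPSILON)
P_CLEAN = sigmoid(sum(wi * xi for wi, xi in zip(W, X)) + B)
P_ADV = sigmoid(sum(wi * xi for wi, xi in zip(W, X_ADV)) + B)
print(X_ADV, round(P_CLEAN, 3), round(P_ADV, 3))
```

Single-step AT would then update the model on `X_ADV` instead of `X`. The perturbed input lowers the model's confidence in the true label while staying within `EPSILON` of the original in every coordinate.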

Contributions

We approach the catastrophic overfitting in single-step AT from a novel view of optimization and firstly reveal the close link between the fast-growing gradient of each sample and overfitting, which can also be applied to explain the robust overfitting in multi-step AT.

We propose an efficient AT method, Sub-AT, which constrains AT in a carefully extracted subspace, to control the growth of gradient. It uniformly resolves both kinds of overfitting, significantly improves the robustness, and successfully overcomes the sensitivity to learning rates. It is also very easy to combine with other AT methods to bring consistent improvements.

Our Sub-AT achieves state-of-the-art adversarial robustness on single-step AT and can successfully train with larger steps and larger radius, which brings further improvements. Notably, our pure single-step AT achieves over 51% robust accuracy against PGD-50 attack of ϵ = 8/255 on CIFAR-10, competitive to the multi-step PGD-10 AT with great time benefits.
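The idea of constraining training to a subspace can be sketched as a projected gradient step. This is a generic illustration, not the paper's procedure: the notes do not describe how the subspace is extracted, so the orthonormal basis below is hand-picked, and the names `project_onto_subspace` and `subspace_sgd_step` are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_subspace(gradient, basis):
    """Project a gradient onto the span of orthonormal basis vectors."""
    coeffs = [dot(gradient, u) for u in basis]
    return [sum(c * u[k] for c, u in zip(coeffs, basis))
            for k in range(len(gradient))]


def subspace_sgd_step(params, gradient, basis, lr):
    """Gradient descent step that only moves inside the subspace."""
    projected = project_onto_subspace(gradient, basis)
    return [p - lr * g for p, g in zip(params, projected)]


# Hand-picked 2-D subspace of a 3-D parameter space (assumption).
BASIS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# A gradient with one very large component outside the subspace.
GRADIENT = [0.5, -2.0, 100.0]
PARAMS = [1.0, 1.0, 1.0]

PROJECTED = project_onto_subspace(GRADIENT, BASIS)
NEW_PARAMS = subspace_sgd_step(PARAMS, GRADIENT, BASIS, lr=0.1)
print(PROJECTED, NEW_PARAMS)
```

The large third component is discarded by the projection, which hints at how restricting updates to a subspace can limit the effect of fast-growing gradient components.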

4.2 Method

4.3 Experimental results

Quantitative analysis:

4.4 Summary and outlook

Reveal the close link between the fast-growing gradient of each sample and overfitting, which can also explain the robust overfitting in multi-step AT.
Constrain AT in a carefully extracted subspace.

5. Next steps

Read papers and code related to feature matching.
Learn more about dense correspondence maps.
