Paper Reading | AttentionGAN

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

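Building on CycleGAN, the generator produces not only an image but also an attention map; the attention map, the generated image, and the original input are then fused to obtain the final output image.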

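See also U-GAT-IT, whose experimental results are more dependable than this paper's.

This paper has a predecessor whose algorithm is abbreviated AGGAN (Attention-Guided GAN).

The AttentionGAN (Attention-Guided GAN) paper proposes two schemes: one is AGGAN, and the other I will simply call AttentionGAN.

I will go straight to the main points.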

AGGAN

The architecture is CycleGAN's, except that the generator outputs a generated content image plus an attention mask, which are then fused.

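The generator $G_{X \rightarrow Y}$ can use the same network architecture as CycleGAN; the difference lies in the output layer:

keep the original 3-channel image output and use it as the content mask $C_y \in {\Bbb R}^{H \times W \times 3}$;

add one more output channel followed by a sigmoid activation to obtain the attention mask $A_y \in {\Bbb R}^{H \times W}$.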

The final generated image is obtained by fusion:

$G_y = C_y * A_y + x * (1-A_y)$

where $x$ is the input image.
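The fusion step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `fuse_aggan`, the channels-last layout, and the random stand-in tensors are not from the paper.

```python
import numpy as np

def fuse_aggan(content, attention, x):
    """Blend the content output with the input image using the attention mask.

    content:   (H, W, 3) content mask C_y
    attention: (H, W)    attention mask A_y with values in [0, 1]
    x:         (H, W, 3) input image
    """
    a = attention[..., None]  # add a channel axis so the mask broadcasts over RGB
    return content * a + x * (1.0 - a)

rng = np.random.default_rng(0)
H, W = 4, 5
x = rng.uniform(-1, 1, size=(H, W, 3))        # stand-in input image
content = rng.uniform(-1, 1, size=(H, W, 3))  # stand-in content output
logits = rng.normal(size=(H, W))              # raw output of the extra channel
attention = 1.0 / (1.0 + np.exp(-logits))     # sigmoid keeps the mask in (0, 1)

g_y = fuse_aggan(content, attention, x)
```

Where the mask is 0 the input pixel passes through unchanged, and where it is 1 the pixel is taken entirely from the content output.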

There are two discriminators: $D_Y$ and $D_{YA}$ (the latter takes 4-channel inputs: $[A_y, G_y], [A_y, y] \in {\Bbb R}^{H \times W \times 4}$).
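The 4-channel input of $D_{YA}$ amounts to a channel-wise concatenation of the attention mask with an image. A minimal sketch, assuming a channels-last layout; the helper name `attention_guided_input` and the constant stand-in arrays are hypothetical:

```python
import numpy as np

def attention_guided_input(attention, image):
    """Stack A_y (H, W) and an image (H, W, 3) into an (H, W, 4) array."""
    return np.concatenate([attention[..., None], image], axis=-1)

H, W = 4, 5
a_y = np.full((H, W), 0.5)   # stand-in attention mask
g_y = np.zeros((H, W, 3))    # stand-in for a generated image
y = np.ones((H, W, 3))       # stand-in for a real image from domain Y

fake_in = attention_guided_input(a_y, g_y)  # [A_y, G_y]
real_in = attention_guided_input(a_y, y)    # [A_y, y]
```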

The same applies to the generator $G_{Y \rightarrow X}$ and the discriminators $D_X, D_{XA}$.

Loss function

The generator's loss consists of five terms: Vanilla Adversarial Loss, Attention-Guided Adversarial Loss, Cycle-Consistency Loss, Pixel Loss, and Attention Loss.

The overall objective is:

$\begin{aligned} {\cal L}(G_{X \rightarrow Y}, G_{Y \rightarrow X}, D_X, D_Y, D_{XA}, D_{YA}) = {} & \lambda_{gan} [ {\cal L}_{GAN}(G_{X \rightarrow Y}, D_Y) + {\cal L}_{GAN}(G_{Y \rightarrow X}, D_{X}) \\ & \quad + {\cal L}_{AGAN}(G_{X \rightarrow Y}, D_{YA}) + {\cal L}_{AGAN}(G_{Y \rightarrow X}, D_{XA}) ] \\ & + \lambda_{cycle} * {\cal L}_{cycle}(G_{X \rightarrow Y}, G_{Y \rightarrow X}) \\ & + \lambda_{pixel} * {\cal L}_{pixel}(G_{X \rightarrow Y}, G_{Y \rightarrow X}) + \lambda_{tv} * [{\cal L}_{tv}(A_x) + {\cal L}_{tv}(A_y)] \end{aligned}$

Vanilla Adversarial Loss

${\cal L}_{GAN}(G_{X \rightarrow Y}, D_Y)$ is the ordinary adversarial loss:

${\cal L}_{GAN}(G_{X \rightarrow Y}, D_Y) = {\Bbb E}_{y \sim p_{data}(y)}[\log D_Y(y)] + {\Bbb E}_{x \sim p_{data}(x)}[\log (1-D_Y(G_{X \rightarrow Y}(x)))]$

Here $G_{X \rightarrow Y}(x) = G_y = C_y * A_y + x * (1-A_y)$.
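To make the expression concrete, here is a small sketch that evaluates it from discriminator probabilities. The helper `gan_value`, the `eps` guard, and the example probabilities are illustrative assumptions; expectations are replaced by batch means.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """E[log D(y)] + E[log(1 - D(G(x)))] from discriminator probabilities."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # eps avoids log(0) when a probability saturates at 0 or 1
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# Discriminator that separates real from fake fairly well
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# Discriminator that cannot tell them apart and outputs 0.5 everywhere
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator tries to push this value up, while the generator tries to push it down, so `confident` is larger than `fooled`.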

Attention-Guided Adversarial Loss

${\cal L}_{AGAN}(G_{X \rightarrow Y}, D_{YA})$ is the adversarial loss on the 4-channel input that includes the attention mask:

${\cal L}_{AGAN}(G_{X \rightarrow Y}, D_{YA}) = {\Bbb E}_{y \sim p_{data}(y)}[\log D_{YA}([A_y, y])] + {\Bbb E}_{x \sim p_{data}(x)}[\log (1-D_{YA}([A_y, G_{X \rightarrow Y}(x)]))]$

Cycle-Consistency Loss

${\cal L}_{cycle}(G_{X \rightarrow Y}, G_{Y \rightarrow X})$ is the cycle-consistency loss from CycleGAN:

${\cal L}_{cycle}(G_{X \rightarrow Y}, G_{Y \rightarrow X}) = {\Bbb E}_{x \sim p_{data}(x)}[ || G_{Y \rightarrow X}(G_{X \rightarrow Y}(x)) - x ||_1 ] + {\Bbb E}_{y \sim p_{data}(y)}[ || G_{X \rightarrow Y}(G_{Y \rightarrow X}(y)) - y ||_1 ]$
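A toy sketch of the cycle-consistency term follows. The "generators" are plain functions chosen so the result is easy to check, and the L1 norm is averaged over pixels; both choices are assumptions made for illustration, not the networks from the paper.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, used here as the L1 norm averaged over pixels."""
    return np.mean(np.abs(a - b))

def cycle_loss(g_xy, g_yx, x_batch, y_batch):
    """|| G_yx(G_xy(x)) - x ||_1 + || G_xy(G_yx(y)) - y ||_1, batch-averaged."""
    forward = l1(g_yx(g_xy(x_batch)), x_batch)
    backward = l1(g_xy(g_yx(y_batch)), y_batch)
    return forward + backward

# Toy "generators": exact inverses of each other, so the cycle loss vanishes.
g_xy = lambda img: img + 0.25
g_yx = lambda img: img - 0.25

rng = np.random.default_rng(0)
x_batch = rng.uniform(size=(2, 4, 4, 3))
y_batch = rng.uniform(size=(2, 4, 4, 3))

perfect = cycle_loss(g_xy, g_yx, x_batch, y_batch)
# Replace g_yx with the identity so it no longer inverts g_xy.
broken = cycle_loss(g_xy, lambda img: img, x_batch, y_batch)
```

With the mismatched pair, each direction is off by the constant 0.25 shift, so the loss grows accordingly.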

Pixel Loss

${\cal L}_{pixel}(G_{X \rightarrow Y}, G_{Y \rightarrow X})$ is the reconstruction loss between the generated image and the input image:

${\cal L}_{pixel}(G_{X \rightarrow Y}, G_{Y \rightarrow X}) = {\Bbb E}_{x \sim p_{data}(x)}[ || G_{X \rightarrow Y}(x) - x ||_1 ] + {\Bbb E}_{y \sim p_{data}(y)}[ || G_{Y \rightarrow X}(y) - y ||_1 ]$

Attention Loss

${\cal L}_{tv}(A_x)$ constrains the values of the attention mask to be smoother:

${\cal L}_{tv}(A_x) = \sum_{w,h=1}^{W,H} \left[ |A_x(w+1,h,c) - A_x(w,h,c)| + |A_x(w,h+1,c) - A_x(w,h,c)| \right]$
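The total-variation term is easy to sketch with array slicing. This is a minimal version for a single-channel mask; dropping the differences that would step past the border is an assumption, since the formula does not say how the edge is handled, and `tv_loss` is a hypothetical helper name.

```python
import numpy as np

def tv_loss(attention):
    """Sum of absolute differences between neighbouring mask values.

    attention: (H, W) attention mask. Differences that would step outside
    the mask are left out (an assumption about border handling).
    """
    dh = np.abs(attention[1:, :] - attention[:-1, :])  # vertical neighbours
    dw = np.abs(attention[:, 1:] - attention[:, :-1])  # horizontal neighbours
    return dh.sum() + dw.sum()

flat = np.full((4, 4), 0.5)                                   # constant mask
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)  # 0/1 checkerboard

flat_tv = tv_loss(flat)        # no variation at all
checker_tv = tv_loss(checker)  # every neighbouring pair differs by 1
```

A constant mask costs nothing, while a checkerboard pays for every one of its 24 neighbouring pairs, which is why the term favours smooth masks.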

The weights of the loss terms above are $\lambda_{cycle}=10, \lambda_{gan}=0.5, \lambda_{pixel}=1, \lambda_{tv}=10^{-6}$. As in CycleGAN, a buffer of 50 previously generated images is used when training the discriminators.
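One way such a buffer can work is sketched below. The notes only give the size (50); the class name `ImageBuffer` and the policy of swapping in an older image half of the time are assumptions modelled on common CycleGAN implementations, and strings stand in for image tensors.

```python
import random

class ImageBuffer:
    """History of generated images shown to the discriminator (hypothetical helper)."""

    def __init__(self, capacity=50, seed=None):
        self.capacity = capacity
        self.images = []
        self.rng = random.Random(seed)

    def query(self, image):
        # Until the buffer is full, store the new image and return it unchanged.
        if len(self.images) < self.capacity:
            self.images.append(image)
            return image
        # Afterwards, half of the time swap the new image with a stored one
        # and return the older image; otherwise return the new image as is.
        if self.rng.random() < 0.5:
            idx = self.rng.randrange(self.capacity)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image

buffer = ImageBuffer(capacity=50, seed=0)
returned = [buffer.query(f"fake_{i}") for i in range(200)]
```

Feeding the discriminator a mix of fresh and older fakes keeps it from reacting only to the generator's most recent outputs.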

AttentionGAN

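AttentionGAN mainly changes the network architecture: the downsampling part of the network shares weights, and the upsampling part has two branches, a content mask generator branch and an attention mask generator branch. The attention mask generator produces outputs with two meanings (identical in form): foreground attention masks and a background attention mask. Finally, these are combined with the original image and fused into the generated image.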

The generation process of the generator $G_{X \rightarrow Y}$ is:

$G(x)=\sum_{f=1}^{n-1}(C_y^f * A_y^f) + x * A_y^b$

where $x$ is the input image, $\{C_y^f\}_{f=1}^{n-1}$ are the content masks, and $[\{A_y^f\}_{f=1}^{n-1}, A_y^b]$ are the attention masks, namely the foreground attention masks and the background attention mask, respectively.

The generation process of the generator $G_{Y \rightarrow X}$ works the same way.
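The multi-mask fusion can be sketched as follows. The notes above do not state how the attention masks are normalised; this sketch assumes a softmax across the $n$ masks so that they sum to 1 at every pixel. The value `n = 10`, the helper names, and the random stand-in tensors are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_attentiongan(contents, attn_logits, x):
    """G(x) = sum_f C^f * A^f + x * A^b.

    contents:    (n-1, H, W, 3) content masks C^f
    attn_logits: (n, H, W) raw attention outputs; the last one is the background
    x:           (H, W, 3) input image
    """
    attn = softmax(attn_logits, axis=0)  # assumed: masks sum to 1 at each pixel
    fg, bg = attn[:-1], attn[-1]
    foreground = (contents * fg[..., None]).sum(axis=0)
    return foreground + x * bg[..., None]

rng = np.random.default_rng(0)
n, H, W = 10, 4, 5
x = rng.uniform(-1, 1, size=(H, W, 3))
contents = rng.uniform(-1, 1, size=(n - 1, H, W, 3))
attn_logits = rng.normal(size=(n, H, W))

out = fuse_attentiongan(contents, attn_logits, x)
```

Under this assumption each output pixel is a convex combination of the input pixel and the content outputs, so a dominant background mask simply copies the input through.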

As for the loss function, it is the same as CycleGAN's, with only the adversarial loss, the cycle-consistency loss, and the identity mapping loss:

$\begin{aligned} {\cal L}(G_{X \rightarrow Y}, G_{Y \rightarrow X}, D_X, D_Y) = {} & {\cal L}_{GAN}(G_{X \rightarrow Y}, D_Y, G_{Y \rightarrow X}, D_{X}) \\ & + \lambda_{cycle} * {\cal L}_{cycle}(G_{X \rightarrow Y}, G_{Y \rightarrow X}) \\ & + \lambda_{idt} * {\cal L}_{idt}(G_{X \rightarrow Y}, G_{Y \rightarrow X}) \end{aligned}$

where $\lambda_{cycle} = 10, \lambda_{idt}=0.5$; in the code, the weight actually applied to the identity term is $\lambda_{idt}=\lambda_{cycle}*0.5=5.0$.
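The weighting can be written out as a short sketch. The function name `attentiongan_objective`, the argument name `idt_ratio`, and the scalar loss values are hypothetical; only the weights 10 and 0.5 and their product come from the notes.

```python
def attentiongan_objective(adv, cycle, idt, lambda_cycle=10.0, idt_ratio=0.5):
    """Weighted sum of the three loss terms.

    idt_ratio is the 0.5 setting; following the note about the code, the
    weight actually applied to the identity term is lambda_cycle * idt_ratio.
    """
    effective_idt = lambda_cycle * idt_ratio
    total = adv + lambda_cycle * cycle + effective_idt * idt
    return total, effective_idt

# Made-up scalar loss values, only to show how the weights combine.
total, effective_idt = attentiongan_objective(adv=1.0, cycle=0.2, idt=0.1)
```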

Experimental results

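There is no need to use AGGAN. AttentionGAN does not perform as well as U-GAT-IT, but the ideas of the two can be combined.

I folded AttentionGAN's mask idea into U-GAT-IT: an extra branch is pulled out of U-GAT-IT's upsampling stage to generate attention masks, which are normalized with a sigmoid, multiplied element-wise with the corresponding U-GAT-IT outputs, and summed to produce the final image. I experimented with 5 attention masks. Judging the results by eye, some details do improve, for example people's eyes (looking at the mask images, the 5 masks do resemble segmentation-like attention heatmaps for eyes, faces, and background, and 2 of them have low probabilities overall). Overall, though, the result feels about the same as U-GAT-IT rather than a level up, so U-GAT-IT is still the one I would pick.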