Face Recognition Loss Functions and Their PyTorch Implementations: ArcFace, SphereFace, CosFace

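Broadly speaking, face recognition proceeds in three steps:

1. Face detection: locate the regions of the image that contain faces and draw a bounding box around each.

2. Face alignment: mark facial landmarks such as the eyes, nose, and mouth, and use them to align the face.

3. Recognition: take the aligned face and decide whose face it is.

The loss functions covered here belong to the third step. They aim at deciding more accurately whom a face belongs to, which is essentially a classification problem.

In one sentence, the core idea by which ArcFace, SphereFace, and CosFace improve on their predecessors is:

Only by training harder (train) can you expect to do better in competition (test).

All three set a harder objective for the convolutional neural network. Training becomes more difficult, and the network becomes a better classifier as a result.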

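I. Starting with the Predecessor

Let us first look at their predecessor, Softmax.

Wikipedia describes it as follows:

The softmax function, also called the normalized exponential function, is a generalization of the logistic function. It "squashes" a K-dimensional vector $\mathbf{z}$ of arbitrary real values into another K-dimensional real vector $\sigma(\mathbf{z})$ in which every element lies in the range $(0, 1)$ and all elements sum to 1. The function is usually given as:

$\sigma(\mathbf{z})_{j}=\frac{e^{z_{j}}}{\sum_{k=1}^{K}e^{z_{k}}} \quad \text{for } j=1,\dots,K.$

Put simply, softmax squashes a vector so that its elements sum to 1, and the squashed values can then be read as confidence scores, which is why it is so common in classification. In practice, to avoid overflow and underflow, the maximum element is usually subtracted from every element before the vector goes into softmax:

$\sigma(\mathbf{z})_{j}=\frac{e^{z_{j}-z_{max}}}{\sum_{k=1}^{K}e^{z_{k}-z_{max}}} \quad \text{for } j=1,\dots,K.$

For more background, see the article by 憶臻: "softmax函式計算時候為什麼要減去一個最大值？" (why subtract the maximum when computing softmax?).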
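
To make the max-subtraction trick concrete, here is a minimal sketch in plain Python (standard library only, rather than PyTorch, so it runs anywhere). The function names and the sample vectors are illustrative assumptions, not part of the original article.

```python
import math


def softmax_naive(z):
    """Direct translation of the definition; overflows for large inputs."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(z):
    """Subtract max(z) first; the result is mathematically identical."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]  # math.exp(1000.0) raises OverflowError

probs_small = softmax_stable(small)
# Only the differences between elements matter, so this matches probs_small.
probs_large = softmax_stable(large)
```

Because the largest exponent is always `0` after the shift, no term can overflow, and at least one term in the sum equals 1, so the denominator cannot underflow to zero.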

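Next, something that is easy to confuse with it: $Softmax\ Loss$.

Above, we fed in a vector $\mathbf{z}$ of length $K$ and obtained $\sigma$. The softmax loss is then:

$SL=\sum_{k=1}^{K}-y_k\log(\sigma_k)$

where $y$ is a one-hot vector of length $K$, that is, $y_k\in\{0,1\}$, and only the entry for the ground truth is set: $y_{gt}=1$. The loss can therefore be shortened to:

$SL=-y_{gt}\log(\sigma_{gt})=-\log(\sigma_{gt})$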

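At this point it is worth looking at the cross entropy $CE$:

$CE=\sum_{k=1}^{K}-P_{k}\log(p_k)$

where $P$ is the true distribution, which in a classification task is the same as $y$ above, and $p$ is the predicted distribution, which in a classification task is the same as $\sigma$ above. Substituting and simplifying gives:

$CE=\sum_{k=1}^{K}-y_k\log(\sigma_k)\ \to\ CE=-\log(\sigma_{gt})$

That looks rather familiar...

So we arrive at:

$SoftMax\ Loss = CrossEntropy(SoftMax)$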
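
The identity can be checked numerically. The following is a small sketch in plain Python; the logits, the class index, and the helper names are made up for illustration.

```python
import math


def softmax(z):
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(P, p):
    """CE = sum_k -P_k * log(p_k); terms with P_k == 0 contribute nothing."""
    return sum(-P_k * math.log(p_k) for P_k, p_k in zip(P, p) if P_k > 0)


def softmax_loss(z, gt):
    """SL = -log(sigma_gt)."""
    return -math.log(softmax(z)[gt])


z = [2.0, 0.5, -1.0]  # hypothetical logits for K = 3 classes
gt = 0                # index of the ground-truth class
y = [1.0 if k == gt else 0.0 for k in range(len(z))]  # one-hot label

ce = cross_entropy(y, softmax(z))
sl = softmax_loss(z, gt)  # same value as ce, up to rounding
```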

Reference:

https://blog.csdn.net/u014380165/article/details/77284921

II. SphereFace

Paper:

https://arxiv.org/pdf/1704.08063.pdf

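Strengthening the discriminative power of $SoftMax\ Loss$ comes down to two things in the feature distribution:

Make samples of the same class lie closer together.

Make samples of different classes lie farther apart.

Let us take another look at $SoftMax\ Loss$:

$L=-\log(\sigma_{gt})=-\log\left(\frac{e^{z_{gt}}}{\sum_{k=1}^{K}e^{z_{k}}}\right) \\=-\log\left(\frac{e^{W_{gt}^{T}x+b_{gt}}}{\sum_{k=1}^{K}e^{W_{k}^{T}x+b_{k}}}\right) \\=-\log\left(\frac{e^{||W_{gt}||\ ||x||\cos(\theta_{W_{gt},x})+b_{gt}}}{\sum_{k=1}^{K}e^{||W_{k}||\ ||x||\cos(\theta_{W_k,x})+b_{k}}}\right)$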

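where $\theta_{i,j}\in (0,\pi)$ is the angle between two vectors $i$ and $j$. If we normalize $W_k$ and set the bias $b_k$ to 0, that is, $||W_k||=1 \ and \ b_k=0$, we get:

$L_{m}=-\log\left(\frac{e^{||x||\cos(\theta_{W_{gt},x})}}{\sum_{k=1}^{K}e^{||x||\cos(\theta_{W_k,x})}}\right)$

The subscript $m$ stands for modified.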
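
The rewrite of the logit from an inner product to a norm-and-angle form is easy to verify by hand. Below is a minimal sketch in plain Python; the two vectors and the helper names are illustrative assumptions.

```python
import math


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle between two vectors, in radians."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


W_k = [3.0, 4.0]  # hypothetical class weight vector, ||W_k|| = 5
x = [2.0, 0.0]    # hypothetical feature vector, ||x|| = 2

theta = angle(W_k, x)
logit_full = dot(W_k, x)                             # W_k^T x with b_k = 0
logit_polar = norm(W_k) * norm(x) * math.cos(theta)  # ||W_k|| ||x|| cos(theta)

# After normalizing W_k the logit depends only on ||x|| and the angle.
W_hat = [w / norm(W_k) for w in W_k]
logit_modified = dot(W_hat, x)                       # ||x|| cos(theta)
```

Once every $W_k$ has unit norm, the classes can only be told apart by their angle to $x$, which is what makes an angular margin meaningful.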

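We then multiply the ground-truth angle $\theta_{W_{gt},x}$ by an integer $m$ greater than or equal to 1:

$L_{ang}=-\log\left(\frac{e^{||x||\cos(m\theta_{W_{gt},x})}}{e^{||x||\cos(m\theta_{W_{gt},x})}+\sum_{k\neq gt}e^{||x||\cos(\theta_{W_{k},x})}}\right),\ m\in\{1,2,...\}$

This widens the gap between classes. And because the angular gap between $W_{gt}$ and $x$ is amplified, the network has to pull the two closer together, which makes each class more compact.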

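The formula above still has a problem. Originally $\theta_{i,j}\in (0,\pi)$, but now $m\theta_{i,j}\in (0,m\pi)$, which leaves the interval $(0,\pi)$ on which $\cos$ is monotonically decreasing, so a larger angle no longer guarantees a smaller value. What now?

Swap in a different function: stitch $m$ pieces of $\cos$ together into one continuous, monotonically decreasing function:

$\psi(\theta_{i,j})=(-1)^n\cos(m\theta_{i,j})-2n,\ \theta_{i,j}\in[\frac{n\pi}{m},\frac{(n+1)\pi}{m}],\ n\in[0,m-1]$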
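
As a sanity check on the piecewise definition, here is a small sketch in plain Python that evaluates $\psi$ on a grid. The function name `psi`, the grid resolution, and the choice `m = 4` are assumptions made for illustration.

```python
import math


def psi(theta, m):
    """psi(theta) = (-1)^n * cos(m * theta) - 2n on [n*pi/m, (n+1)*pi/m]."""
    # theta == pi would give n == m, so it is folded into the last piece.
    n = min(int(m * theta / math.pi), m - 1)
    return (-1) ** n * math.cos(m * theta) - 2 * n


m = 4
thetas = [math.pi * i / 400 for i in range(401)]  # grid over [0, pi]
values = [psi(t, m) for t in thetas]              # falls from 1 to 1 - 2m
```

Each piece drops by exactly 2, from $1-2n$ to $-1-2n$, and the next piece starts where the previous one ended, so the stitched function is continuous and never increases on $[0,\pi]$.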

With this substitution:

$L_{ang}=-\log\left(\frac{e^{||x||\psi(\theta_{W_{gt},x})}}{e^{||x||\psi(\theta_{W_{gt},x})}+\sum_{k\neq gt}e^{||x||\cos(\theta_{W_k,x})}}\right)$

This gives us the SphereFace loss function.

The original paper writes it as:

$L_{ang}=\frac{1}{N}\sum_i-\log\left(\frac{e^{||x_i||\psi(\theta_{y_i,i})}}{e^{||x_i||\psi(\theta_{y_i,i})}+\sum_{j\neq y_i}e^{||x_i||\cos(\theta_{j,i})}}\right)$

where $i$ indexes the $i$-th sample, $y_i$ is the $ground\ truth$ label of the $i$-th sample, and $\theta_{j,i}$ is the angle between $W_j$ and sample $x_i$.
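
To see how the margin changes the loss for a single sample, the per-sample term of $L_{ang}$ can be sketched in plain Python. The angles, the feature norm, and the function names below are hypothetical values chosen for illustration.

```python
import math


def psi(theta, m):
    """Piecewise-monotonic replacement for cos(m * theta) on [0, pi]."""
    n = min(int(m * theta / math.pi), m - 1)
    return (-1) ** n * math.cos(m * theta) - 2 * n


def a_softmax_loss(x_norm, thetas, gt, m):
    """Per-sample L_ang; thetas[j] is the angle between the sample and W_j."""
    target = math.exp(x_norm * psi(thetas[gt], m))
    others = sum(math.exp(x_norm * math.cos(t))
                 for j, t in enumerate(thetas) if j != gt)
    return -math.log(target / (target + others))


thetas = [0.5, 1.4, 2.0]  # hypothetical angles to W_0, W_1, W_2 (radians)
loss_m1 = a_softmax_loss(5.0, thetas, gt=0, m=1)  # m = 1: the modified softmax loss
loss_m4 = a_softmax_loss(5.0, thetas, gt=0, m=4)  # same sample, margin m = 4
```

The same sample is penalized more heavily with `m = 4`, so the network only reaches a low loss once the angle to the correct class is much smaller than the angles to the others.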

[Figure: visualization from the paper]

PyTorch implementation:

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.nn import Parameter


    # SphereFace
    class SphereProduct(nn.Module):
        """Implementation of the large angular margin product.

        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            m: margin

        cos(m*theta)
        """

        def __init__(self, in_features, out_features, m=4):
            super(SphereProduct, self).__init__()
            self.in_features = in_features
            self.out_features = out_features
            self.m = m  # default lost in extraction; 4 assumed
            self.base = 1000.0
            self.gamma = 0.12
            self.power = 1  # value lost in extraction; 1 assumed
            self.LambdaMin = 5.0
            self.iter = 0
            self.weight = Parameter(torch.FloatTensor(out_features, in_features))
            nn.init.xavier_uniform_(self.weight)

            # Multiple-angle formulas: entry m maps x = cos(theta) in [-1, 1]
            # to cos(m*theta), which is again in [-1, 1].
            self.mlambda = [
                lambda x: x ** 0,
                lambda x: x ** 1,
                lambda x: 2 * x ** 2 - 1,
                lambda x: 4 * x ** 3 - 3 * x,
                lambda x: 8 * x ** 4 - 8 * x ** 2 + 1,
                lambda x: 16 * x ** 5 - 20 * x ** 3 + 5 * x
            ]
            # To get a feel for mlambda, plot the six polynomials:
            #
            #   import matplotlib.pyplot as plt
            #   x = [0.01 * i for i in range(-100, 101)]
            #   for f in mlambda:
            #       plt.plot(x, [f(i) for i in x])
            #   plt.show()

        def forward(self, input, label):
            # lambda = max(lambda_min, base * (1 + gamma * iteration)^(-power))
            self.iter += 1
            self.lamb = max(self.LambdaMin,
                            self.base * (1 + self.gamma * self.iter) ** (-1 * self.power))

            # ------------- cos(theta) & phi(theta) -------------
            cos_theta = F.linear(F.normalize(input), F.normalize(self.weight))
            cos_theta = cos_theta.clamp(-1, 1)
            cos_m_theta = self.mlambda[self.m](cos_theta)
            theta = cos_theta.data.acos()
            k = (self.m * theta / math.pi).floor()
            phi_theta = ((-1.0) ** k) * cos_m_theta - 2 * k
            NormOfFeature = torch.norm(input, 2, 1)

            # ------------- convert label to one-hot -------------
            # Create the mask on the same device as the logits.
            one_hot = torch.zeros(cos_theta.size(), device=cos_theta.device)
            one_hot.scatter_(1, label.view(-1, 1).long(), 1)

            # ------------- calculate output -------------
            output = (one_hot * (phi_theta - cos_theta) / (1 + self.lamb)) + cos_theta
            output *= NormOfFeature.view(-1, 1)

            return output

        def __repr__(self):
            return self.__class__.__name__ + '(' \
                + 'in_features=' + str(self.in_features) \
                + ', out_features=' + str(self.out_features) \
                + ', m=' + str(self.m) + ')'
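
Two details of the listing are easier to follow in isolation: the `mlambda` polynomials and the annealed `lambda`. The sketch below uses plain Python. `lambda_schedule` and `blended_target` are hypothetical helper names, and `power=1` is an assumption, because the value did not survive in the listing.

```python
import math

# Entry m maps cos(theta) to cos(m * theta), as in `mlambda` above.
mlambda = [
    lambda x: x ** 0,
    lambda x: x ** 1,
    lambda x: 2 * x ** 2 - 1,
    lambda x: 4 * x ** 3 - 3 * x,
    lambda x: 8 * x ** 4 - 8 * x ** 2 + 1,
    lambda x: 16 * x ** 5 - 20 * x ** 3 + 5 * x,
]


def lambda_schedule(iteration, base=1000.0, gamma=0.12, power=1, lambda_min=5.0):
    """lambda = max(lambda_min, base * (1 + gamma * iteration) ** (-power))."""
    return max(lambda_min, base * (1 + gamma * iteration) ** (-power))


def blended_target(cos_theta, phi_theta, lamb):
    """Target-class value in forward(): cos + (phi - cos) / (1 + lambda)."""
    return cos_theta + (phi_theta - cos_theta) / (1 + lamb)


theta = 0.7
errors = [abs(mlambda[m](math.cos(theta)) - math.cos(m * theta)) for m in range(6)]
early = blended_target(0.8, -0.4, lambda_schedule(0))     # close to cos(theta)
late = blended_target(0.8, -0.4, lambda_schedule(10000))  # closer to phi(theta)
```

With a large `lambda` early in training the output is almost the plain cosine, and the margin term is phased in as `lambda` decays towards its floor. This reads as a way to keep early training stable, although the article itself does not state the motivation.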

III. CosFace

Paper:

https://arxiv.org/pdf/1801.09414.pdf

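Like SphereFace, CosFace starts from the cosine form of $SoftMax\ Loss$ and sets $||W_k||=1 \ and \ b_k=0$. The authors also observe that $||x||$ contributes little to classification, so they simply fix it to $||x||=s$, which gives:

$L_{ns}=\frac{1}{N}\sum_{i}-\log\frac{e^{s\cos(\theta_{y_i,i})}}{\sum_je^{s\cos(\theta_{j,i})}}$

The subscript $ns$ presumably stands for normalized $SoftMax$.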

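Next, as in SphereFace above, a constant $m$ is introduced. The difference is that here $m$ is an additive margin rather than a multiplier on the angle: it is subtracted from the ground-truth cosine.

$L_{lmc}=\frac{1}{N}\sum_i-\log\frac{e^{s(\cos(\theta_{y_i,i})-m)}}{e^{s(\cos(\theta_{y_i,i})-m)}+\sum_{j\neq y_i}e^{s\cos(\theta_{j,i})}}$

$subject\ to:$

$W=\frac{W^*}{||W^*||}$

$x=\frac{x^*}{||x^*||}$

$\cos(\theta_{j,i})=W^T_jx_i$

This is the $Large\ Margin\ Cosine\ Loss$ proposed in CosFace.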
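
For a single sample the loss takes only a few lines. This is a minimal sketch in plain Python, assuming the cosines $W_j^Tx_i$ have already been computed. The function name `lmc_loss` and the sample cosines are illustrative, while `s=30.0` and `m=0.40` mirror the defaults in the listing that follows.

```python
import math


def lmc_loss(cosines, gt, s=30.0, m=0.40):
    """Per-sample L_lmc; cosines[j] = W_j^T x with W_j and x normalized."""
    logits = [s * (c - m) if j == gt else s * c for j, c in enumerate(cosines)]
    z_max = max(logits)  # log-sum-exp with the maximum subtracted
    log_sum = z_max + math.log(sum(math.exp(z - z_max) for z in logits))
    return log_sum - logits[gt]  # equals -log(softmax(logits)[gt])


cosines = [0.7, 0.5, -0.2]            # hypothetical cosines to three classes
loss_ns = lmc_loss(cosines, 0, m=0.0)  # m = 0 recovers the per-sample L_ns
loss_lmc = lmc_loss(cosines, 0)        # margin applied to the ground truth
```

In this example the sample is classified correctly without the margin (0.7 > 0.5) yet is still penalized with it, because 0.7 - 0.4 < 0.5. The ground-truth cosine has to beat every other cosine by at least $m$.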

Code implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.nn import Parameter


    # CosFace
    class AddMarginProduct(nn.Module):
        """Implementation of the large margin cosine distance.

        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            s: norm of input feature
            m: margin

        cos(theta) - m
        """

        def __init__(self, in_features, out_features, s=30.0, m=0.40):
            super(AddMarginProduct, self).__init__()
            self.in_features = in_features
            self.out_features = out_features
            self.s = s
            self.m = m
            self.weight = Parameter(torch.FloatTensor(out_features, in_features))
            nn.init.xavier_uniform_(self.weight)

        def forward(self, input, label):
            # ------------- cos(theta) & phi(theta) -------------
            cosine = F.linear(F.normalize(input), F.normalize(self.weight))
            phi = cosine - self.m
            # ------------- convert label to one-hot -------------
            # Create the mask on the same device as the logits instead of
            # hard-coding 'cuda', so the layer also runs on CPU.
            one_hot = torch.zeros(cosine.size(), device=cosine.device)
            one_hot.scatter_(1, label.view(-1, 1).long(), 1)
            # ------------- out_i = x_i if condition_i else y_i -------------
            # torch.where can replace this blend on torch >= 0.4.
            output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
            output *= self.s

            return output

        def __repr__(self):
            return self.__class__.__name__ + '(' \
                + 'in_features=' + str(self.in_features) \
                + ', out_features=' + str(self.out_features) \
                + ', s=' + str(self.s) \
                + ', m=' + str(self.m) + ')'

IV. ArcFace

Paper:

https://arxiv.org/pdf/1801.07698.pdf

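Without further ado, here is the formula:

$L=\frac{1}{N}\sum_i-\log\frac{e^{s\cos(\theta_{y_i,i}+m)}}{e^{s\cos(\theta_{y_i,i}+m)}+\sum_{j\neq y_i}e^{s\cos(\theta_{j,i})}}$

$subject\ to:$

$W=\frac{W^*}{||W^*||}$

$x=\frac{x^*}{||x^*||}$

$\cos(\theta_{j,i})=W^T_jx_i$

It is very similar to CosFace, except that $m$ is added to the angle itself. During training this forcibly enlarges the angle between a sample and its own class, so the network has to work harder to pull each class together more tightly.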
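
The difference between the two margins shows up when the ground-truth logit is written as a function of the angle. Below is a small sketch in plain Python. The function names are illustrative, and the margins 0.40 and 0.50 are taken from the default arguments in the code listings.

```python
import math


def cosface_target(theta, m=0.40):
    """CosFace: the margin is subtracted from the cosine."""
    return math.cos(theta) - m


def arcface_target(theta, m=0.50):
    """ArcFace: the margin is added to the angle."""
    return math.cos(theta + m)


# Angles for which theta + m stays within [0, pi].
thetas = [i * (math.pi - 0.50) / 100 for i in range(101)]
cos_gap = [math.cos(t) - cosface_target(t) for t in thetas]  # constant: m
arc_gap = [math.cos(t) - arcface_target(t) for t in thetas]  # depends on theta
```

CosFace lowers the ground-truth cosine by the same amount everywhere, whereas the ArcFace penalty is constant in angle and therefore varies when measured in cosine.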

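Pseudocode steps:

1. Normalize $x$.

2. Normalize $W$.

3. Compute $W^Tx$ to get the vector of predicted cosines.

4. Pick out the value in that vector that corresponds to the ground truth.

5. Take its arccosine to get the angle.

6. Add $m$ to the angle.

7. Build the one-hot code marking the position of the ground-truth value in the vector.

8. Use the one-hot code to put $\cos(\theta+m)$ back into its original position.

9. Multiply all values by the fixed value $s$.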
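
The steps above can be sketched for a single sample in plain Python. This is an illustration only: the function name `arcface_logits` and the toy inputs are assumptions, and the batched PyTorch version follows in the listing.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def arcface_logits(x, W, gt, s=30.0, m=0.50):
    """x: feature vector; W: one weight row per class; gt: true class index."""
    x_hat = normalize(x)                                    # step 1
    W_hat = [normalize(row) for row in W]                   # step 2
    cosines = [sum(w * c for w, c in zip(row, x_hat))
               for row in W_hat]                            # step 3
    target = max(-1.0, min(1.0, cosines[gt]))               # step 4 (clamped)
    theta = math.acos(target)                               # step 5
    theta_m = theta + m                                     # step 6
    one_hot = [1.0 if j == gt else 0.0
               for j in range(len(W))]                      # step 7
    blended = [h * math.cos(theta_m) + (1.0 - h) * c
               for h, c in zip(one_hot, cosines)]           # step 8
    return [s * v for v in blended]                         # step 9


x = [1.0, 1.0]                # hypothetical feature
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights for two classes
logits = arcface_logits(x, W, gt=0)
```

The resulting logits are then fed to an ordinary softmax cross entropy. Here both classes sit at the same angle to `x`, so the margin alone makes the ground-truth logit the smaller one.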

Code implementation:

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.nn import Parameter


    # ArcFace
    class ArcMarginProduct(nn.Module):
        """Implementation of the large margin arc distance.

        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            s: norm of input feature
            m: margin

        cos(theta + m)
        """

        def __init__(self, in_features, out_features, s=30.0, m=0.50, easy_margin=False):
            super(ArcMarginProduct, self).__init__()
            self.in_features = in_features
            self.out_features = out_features
            self.s = s
            self.m = m
            # What Parameter does: it turns a plain, non-trainable Tensor into a
            # trainable parameter and registers it on this module, so that it
            # shows up in net.parameters() and is updated by the optimizer.
            # https://www.jianshu.com/p/d8b77cc02410
            # Initialize the weights.
            self.weight = Parameter(torch.FloatTensor(out_features, in_features))
            nn.init.xavier_uniform_(self.weight)

            self.easy_margin = easy_margin
            self.cos_m = math.cos(m)
            self.sin_m = math.sin(m)
            self.th = math.cos(math.pi - m)
            self.mm = math.sin(math.pi - m) * m

        def forward(self, input, label):
            # ------------- cos(theta) & phi(theta) -------------
            # torch.nn.functional.linear(input, weight, bias=None): y = x * W^T + b
            cosine = F.linear(F.normalize(input), F.normalize(self.weight))
            # Clamp before the square root so rounding error cannot produce NaN.
            sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1))
            # cos(a + b) = cos(a) * cos(b) - sin(a) * sin(b)
            phi = cosine * self.cos_m - sine * self.sin_m
            if self.easy_margin:
                # torch.where(condition, x, y) -> Tensor: takes x where condition
                # is True and y elsewhere; the result has the broadcast shape
                # of condition, x, and y.
                # cosine > 0 means the two are similar, so phi is used there.
                phi = torch.where(cosine > 0, phi, cosine)
            else:
                phi = torch.where(cosine > self.th, phi, cosine - self.mm)
            # ------------- convert label to one-hot -------------
            # The one-hot mask writes cos(theta + m) into the ground-truth
            # position. It is created on the same device as the logits instead
            # of hard-coding 'cuda', so the layer also runs on CPU.
            one_hot = torch.zeros(cosine.size(), device=cosine.device)
            # scatter_(dim, index, value)
            one_hot.scatter_(1, label.view(-1, 1).long(), 1)
            # ------------- out_i = x_i if condition_i else y_i -------------
            # torch.where can replace this blend on torch >= 0.4.
            output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
            output *= self.s

            return output
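
The `th` and `mm` constants in the listing handle angles where $\theta+m$ would exceed $\pi$. There $\cos(\theta+m)$ starts rising again, so the code switches to a linear penalty, `cosine - mm`. The scalar sketch below, in plain Python, mirrors that branch. The standalone function `phi` and the grids are assumptions for illustration, not the author's code.

```python
import math

m = 0.50
cos_m, sin_m = math.cos(m), math.sin(m)
th = math.cos(math.pi - m)      # cosine of the largest theta with theta + m <= pi
mm = math.sin(math.pi - m) * m  # linear penalty used beyond that angle


def phi(cosine, easy_margin=False):
    """Scalar version of the margin branch in forward()."""
    sine = math.sqrt(max(0.0, 1.0 - cosine * cosine))
    value = cosine * cos_m - sine * sin_m  # cos(theta + m)
    if easy_margin:
        return value if cosine > 0 else cosine
    return value if cosine > th else cosine - mm


# Where theta + m < pi, phi matches cos(theta + m).
thetas = [i * (math.pi - m) / 100 for i in range(100)]
errors = [abs(phi(math.cos(t)) - math.cos(t + m)) for t in thetas]

# Across the whole range, a larger cosine never yields a smaller phi.
cosines = [-1.0 + 0.002 * i for i in range(1001)]
phis = [phi(c) for c in cosines]
```

With the fallback in place, the ground-truth value never increases as the angle grows, which a bare $\cos(\theta+m)$ would not guarantee past $\pi-m$.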

That wraps up the ArcFace, SphereFace, and CosFace loss functions.

Reference:

https://blog.csdn.net/fuwenyan/article/details/79657738
