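Deep Learning from Scratch (11): Object Detection, Implementing the YOLOv3 Algorithm (Part 2)

Course | Deep Learning from Scratch

Instructor | Sun Gaofeng, Senior R&D Engineer, Baidu Deep Learning Technology Platform Department

Schedule | Tuesdays and Thursdays, 20:00-21:00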

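Introduction

This is Baidu's official introductory deep learning course. It is aimed at students with little or no background in deep learning and is meant to take them from 0 to 1 and beyond. The course covers:

1. Deep learning fundamentals
2. Building neural networks and gradient descent with numpy
3. Principles and practice of the main areas of computer vision
4. Principles and practice of the main areas of natural language processing
5. Principles and practice of personalized recommendation algorithms

In the previous lecture, Sun Gaofeng, Senior R&D Engineer at Baidu's Deep Learning Technology Platform Department, explained how YOLOv3 generates candidate regions and extracts features with a convolutional neural network. This lecture covers building the loss function, multi-scale detection, and producing the prediction output.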

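Loss Function

In the previous lecture we associated the pixels of the output feature map with predicted boxes at a conceptual level. To train the network, the network output must also be tied to the predicted boxes mathematically, that is, through a loss function defined on the network output. This section builds the YOLO-V3 loss function.

For each predicted box, YOLO-V3 defines three kinds of loss: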

The objectness loss, which measures whether the box contains an object, computed from pred_objectness and label_objectness:

loss_obj = fluid.layers.sigmoid_cross_entropy_with_logits(pred_objectness, label_objectness)

The location loss, which measures the position of the object, computed from pred_location and label_location:

pred_location_x = pred_location[:, :, 0, :, :]
pred_location_y = pred_location[:, :, 1, :, :]
pred_location_w = pred_location[:, :, 2, :, :]
pred_location_h = pred_location[:, :, 3, :, :]
loss_location_x = fluid.layers.sigmoid_cross_entropy_with_logits(pred_location_x, label_location_x)
loss_location_y = fluid.layers.sigmoid_cross_entropy_with_logits(pred_location_y, label_location_y)
loss_location_w = fluid.layers.abs(pred_location_w - label_location_w)
loss_location_h = fluid.layers.abs(pred_location_h - label_location_h)
loss_location = loss_location_x + loss_location_y + loss_location_w + loss_location_h

The classification loss, which measures the object category, computed from pred_classification and label_classification:

loss_classification = fluid.layers.sigmoid_cross_entropy_with_logits(pred_classification, label_classification)
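To make the three terms concrete, here is a minimal NumPy sketch of what each one computes for a single predicted box. It is not the Paddle implementation: the helper name `sigmoid_bce` and all numeric values are illustrative assumptions, and the formula is the usual numerically stable form of sigmoid cross-entropy on raw logits.

```python
import numpy as np

def sigmoid_bce(logit, label):
    """Sigmoid cross-entropy on raw logits, in the stable form
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    logit = np.asarray(logit, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    return np.maximum(logit, 0) - logit * label + np.log1p(np.exp(-np.abs(logit)))

# Hypothetical raw outputs for one box: [tx, ty, tw, th, objectness, 3 class logits]
pred = np.array([0.2, -0.1, 0.3, 0.5, 2.0, 1.5, -2.0, -1.0])
label_location = np.array([0.6, 0.4, 0.1, 0.7])     # [dx, dy, tw, th] targets
label_objectness = 1.0                               # this box is a positive sample
label_classification = np.array([1.0, 0.0, 0.0])    # one-hot class target

loss_obj = sigmoid_bce(pred[4], label_objectness)
loss_xy = sigmoid_bce(pred[0:2], label_location[0:2]).sum()   # x, y: cross-entropy
loss_wh = np.abs(pred[2:4] - label_location[2:4]).sum()       # w, h: L1 distance
loss_cls = sigmoid_bce(pred[5:], label_classification).sum()

total = loss_obj + loss_xy + loss_wh + loss_cls
print(float(total))
```

The x and y offsets are treated like probabilities in [0, 1], which is why they share the cross-entropy form with objectness and classification, while width and height use a plain absolute difference.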

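The earlier sections showed how to compute these predictions and labels, but one detail was left open: we have not yet marked which anchor boxes should get an objectness label of -1. To do so, we compute the IoU between every predicted box and every ground-truth box and pick out the predicted boxes whose IoU exceeds the threshold. The code is as follows: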

import numpy as np

# Pick out the predicted boxes whose IoU with a ground-truth box exceeds the threshold
def get_iou_above_thresh_inds(pred_box, gt_boxes, iou_threshold):
    # pred_box: [N, H, W, num_anchors, 4] in (x1, y1, x2, y2) form
    # gt_boxes: per image, a list of (center_x, center_y, w, h) boxes
    batchsize = pred_box.shape[0]
    num_rows = pred_box.shape[1]
    num_cols = pred_box.shape[2]
    num_anchors = pred_box.shape[3]
    ret_inds = np.zeros([batchsize, num_rows, num_cols, num_anchors])
    for i in range(batchsize):
        pred_box_i = pred_box[i]
        gt_boxes_i = gt_boxes[i]
        for k in range(len(gt_boxes_i)):
            gt = gt_boxes_i[k]
            gtx_min = gt[0] - gt[2] / 2.
            gty_min = gt[1] - gt[3] / 2.
            gtx_max = gt[0] + gt[2] / 2.
            gty_max = gt[1] + gt[3] / 2.
            # Skip degenerate (padding) ground-truth boxes
            if (gtx_max - gtx_min < 1e-3) or (gty_max - gty_min < 1e-3):
                continue
            x1 = np.maximum(pred_box_i[:, :, :, 0], gtx_min)
            y1 = np.maximum(pred_box_i[:, :, :, 1], gty_min)
            x2 = np.minimum(pred_box_i[:, :, :, 2], gtx_max)
            y2 = np.minimum(pred_box_i[:, :, :, 3], gty_max)
            intersection = np.maximum(x2 - x1, 0.) * np.maximum(y2 - y1, 0.)
            s1 = (gty_max - gty_min) * (gtx_max - gtx_min)
            s2 = (pred_box_i[:, :, :, 2] - pred_box_i[:, :, :, 0]) * (pred_box_i[:, :, :, 3] - pred_box_i[:, :, :, 1])
            union = s2 + s1 - intersection
            iou = intersection / union
            above_inds = np.where(iou > iou_threshold)
            ret_inds[i][above_inds] = 1
    # [N, H, W, num_anchors] -> [N, num_anchors, H, W], matching label_objectness
    ret_inds = np.transpose(ret_inds, (0, 3, 1, 2))
    return ret_inds.astype('bool')

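The function above identifies the anchor boxes whose objectness should be labeled -1. The routine below then processes label_objectness, setting the label to -1 for anchors whose IoU exceeds the threshold but that are not positive samples.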

def label_objectness_ignore(label_objectness, iou_above_thresh_indices):
    # Note: we cannot simply write label_objectness[iou_above_thresh_indices] = -1,
    # because that could overwrite points whose label_objectness is 1.
    # Only predicted boxes that are labeled 0 AND whose IoU with a ground-truth
    # box exceeds the threshold are relabeled as -1.
    negative_indices = (label_objectness < 0.5)
    ignore_indices = negative_indices & iou_above_thresh_indices
    label_objectness[ignore_indices] = -1
    return label_objectness
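The relabeling rule is easiest to see on a tiny array. The sketch below uses made-up labels and a made-up IoU mask for one image, one anchor, and a 2x3 grid; only cells that are both negative and above the IoU threshold become -1.

```python
import numpy as np

# Toy objectness labels, shape [N=1, anchors=1, H=2, W=3]: 1 = positive, 0 = negative.
label_objectness = np.array([[[[1., 0., 0.],
                               [0., 0., 0.]]]])
# Hypothetical mask of predicted boxes whose IoU with a ground-truth box
# exceeds the threshold (same shape as the labels).
iou_above_thresh = np.array([[[[True, True, False],
                               [False, True, False]]]])

negative = label_objectness < 0.5        # candidates for "ignore"
ignore = negative & iou_above_thresh     # negative AND high IoU
labels = label_objectness.copy()
labels[ignore] = -1

print(labels[0, 0])
```

The positive cell at position (0, 0) keeps its label 1 even though its IoU is above the threshold, which is exactly the case the comment in the function warns about.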

The following code calls these two functions to set label_objectness to -1 for the relevant predicted boxes.

# Assumes the helpers from the previous lecture are in scope: multithread_loader,
# get_objectness_label, DarkNet53_conv_body, YoloDetectionBlock, get_yolo_box_xxyy.
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import Conv2D
from paddle.fluid.dygraph.base import to_variable

# Read data
reader = multithread_loader('/home/aistudio/work/insects/train',
                            batch_size=2,  # value lost in extraction; 2 is an assumption
                            mode='train')
img, gt_boxes, gt_labels, im_shape = next(reader())
# Compute the labels for the anchor boxes
label_objectness, label_location, label_classification, scale_location = get_objectness_label(
    img, gt_boxes, gt_labels,
    iou_threshold=0.7,
    anchors=[116, 90, 156, 198, 373, 326],
    num_classes=7, downsample=32)

NUM_ANCHORS = 3
NUM_CLASSES = 7
num_filters = NUM_ANCHORS * (NUM_CLASSES + 5)
with fluid.dygraph.guard():
    backbone = DarkNet53_conv_body('yolov3_backbone', is_test=False)
    detection = YoloDetectionBlock('detection', channel=512, is_test=False)
    conv2d_pred = Conv2D('out_pred', num_filters=num_filters, filter_size=1)

    x = to_variable(img)
    C0, C1, C2 = backbone(x)
    route, tip = detection(C0)
    P0 = conv2d_pred(tip)
    # anchors holds the preset anchor box sizes
    anchors = [116, 90, 156, 198, 373, 326]
    # downsample is the stride of feature map P0
    pred_boxes = get_yolo_box_xxyy(P0.numpy(), anchors, num_classes=7, downsample=32)
    iou_above_thresh_indices = get_iou_above_thresh_inds(pred_boxes, gt_boxes, iou_threshold=0.7)
    label_objectness = label_objectness_ignore(label_objectness, iou_above_thresh_indices)
    print(label_objectness.shape)  # [N, NUM_ANCHORS, H, W]

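In this way, samples that were not labeled positive but have a large IoU with a ground-truth box get an objectness label of -1, so they contribute to none of the loss terms.

The code that computes the total loss is: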

def get_loss(output, label_objectness, label_location, label_classification, scales, num_anchors=3, num_classes=7):
    # Reshape output from [N, C, H, W] to [N, NUM_ANCHORS, NUM_CLASSES + 5, H, W]
    reshaped_output = fluid.layers.reshape(output, [-1, num_anchors, num_classes + 5, output.shape[2], output.shape[3]])

    # Take the objectness predictions out of output
    pred_objectness = reshaped_output[:, :, 4, :, :]
    loss_objectness = fluid.layers.sigmoid_cross_entropy_with_logits(pred_objectness, label_objectness, ignore_index=-1)
    ## Sum over dimensions 1, 2, 3
    #loss_objectness = fluid.layers.reduce_sum(loss_objectness, dim=[1,2,3], keep_dim=False)

    # pos_samples is 1. at positive samples and 0. everywhere else
    pos_objectness = label_objectness > 0
    pos_samples = fluid.layers.cast(pos_objectness, 'float32')
    pos_samples.stop_gradient = True

    # Take all location predictions out of output
    tx = reshaped_output[:, :, 0, :, :]
    ty = reshaped_output[:, :, 1, :, :]
    tw = reshaped_output[:, :, 2, :, :]
    th = reshaped_output[:, :, 3, :, :]

    # Take the label for each location coordinate out of label_location
    dx_label = label_location[:, :, 0, :, :]
    dy_label = label_location[:, :, 1, :, :]
    tw_label = label_location[:, :, 2, :, :]
    th_label = label_location[:, :, 3, :, :]
    # Build the loss terms
    loss_location_x = fluid.layers.sigmoid_cross_entropy_with_logits(tx, dx_label)
    loss_location_y = fluid.layers.sigmoid_cross_entropy_with_logits(ty, dy_label)
    loss_location_w = fluid.layers.abs(tw - tw_label)
    loss_location_h = fluid.layers.abs(th - th_label)

    # Total location loss
    loss_location = loss_location_x + loss_location_y + loss_location_h + loss_location_w

    # Multiply by scales
    loss_location = loss_location * scales
    # Only positive samples contribute to the location loss
    loss_location = loss_location * pos_samples

    # Take all class-related channels out of output
    pred_classification = reshaped_output[:, :, 5:5 + num_classes, :, :]
    # Classification loss
    loss_classification = fluid.layers.sigmoid_cross_entropy_with_logits(pred_classification, label_classification)
    # Sum over dimension 2 (the class dimension)
    loss_classification = fluid.layers.reduce_sum(loss_classification, dim=2, keep_dim=False)
    # Only samples with positive objectness contribute to the classification loss
    loss_classification = loss_classification * pos_samples
    total_loss = loss_objectness + loss_location + loss_classification
    # Sum the loss over all predicted boxes
    total_loss = fluid.layers.reduce_sum(total_loss, dim=[1, 2, 3], keep_dim=False)
    # Average over all samples
    total_loss = fluid.layers.reduce_mean(total_loss)

    return total_loss

# Compute the loss
# Read data
reader = multithread_loader('/home/aistudio/work/insects/train',
                            batch_size=2,  # value lost in extraction; 2 is an assumption
                            mode='train')
img, gt_boxes, gt_labels, im_shape = next(reader())
# Compute the labels for the anchor boxes
label_objectness, label_location, label_classification, scale_location = get_objectness_label(
    img, gt_boxes, gt_labels,
    iou_threshold=0.7,
    anchors=[116, 90, 156, 198, 373, 326],
    num_classes=7, downsample=32)

NUM_ANCHORS = 3
NUM_CLASSES = 7
num_filters = NUM_ANCHORS * (NUM_CLASSES + 5)
with fluid.dygraph.guard():
    backbone = DarkNet53_conv_body('yolov3_backbone', is_test=False)
    detection = YoloDetectionBlock('detection', channel=512, is_test=False)
    conv2d_pred = Conv2D('out_pred', num_filters=num_filters, filter_size=1)

    x = to_variable(img)
    C0, C1, C2 = backbone(x)
    route, tip = detection(C0)
    P0 = conv2d_pred(tip)
    # anchors holds the preset anchor box sizes
    anchors = [116, 90, 156, 198, 373, 326]
    # downsample is the stride of feature map P0
    pred_boxes = get_yolo_box_xxyy(P0.numpy(), anchors, num_classes=7, downsample=32)
    iou_above_thresh_indices = get_iou_above_thresh_inds(pred_boxes, gt_boxes, iou_threshold=0.7)
    label_objectness = label_objectness_ignore(label_objectness, iou_above_thresh_indices)

    label_objectness = to_variable(label_objectness)
    label_location = to_variable(label_location)
    label_classification = to_variable(label_classification)
    scales = to_variable(scale_location)
    label_objectness.stop_gradient = True
    label_location.stop_gradient = True
    label_classification.stop_gradient = True
    scales.stop_gradient = True

    total_loss = get_loss(P0, label_objectness, label_location, label_classification, scales,
                          num_anchors=NUM_ANCHORS, num_classes=NUM_CLASSES)
    total_loss_data = total_loss.numpy()
    print(total_loss_data)

The run reported in the source gives:

array([623.6282], dtype=float32)

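The program above computes the total loss. At this point the reader has seen most of YOLO-V3: how to generate anchor boxes, label them, extract features with a convolutional neural network, associate the output feature map with predicted boxes, and build the loss function.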

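Multi-Scale Detection

So far the loss has been computed on feature map P0, whose stride is 32. This feature map is small and has few pixels; each pixel has a large receptive field and carries rich high-level semantic information, so it tends to detect large objects more easily. To detect smaller objects, predictions must also be made on larger feature maps. If we produced predictions directly on feature maps at the C2 or C1 level, a new problem would arise: those maps have not gone through enough feature extraction, their pixels carry too little semantic information, and it may be hard to extract useful feature patterns from them. In object detection, the usual solution is to enlarge the high-level feature map and fuse it with the low-level feature map. The resulting feature map carries rich semantic information and also has more pixels, so it can describe finer structures.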
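The fusion step described above can be sketched in a few lines of NumPy: enlarge the coarse map by nearest-neighbour upsampling, then concatenate it with the finer map along the channel axis. The shapes and channel counts below are illustrative assumptions, and `upsample_nearest_2x` is a hypothetical helper, not a Paddle API.

```python
import numpy as np

def upsample_nearest_2x(feature):
    """Nearest-neighbour 2x upsampling of an [N, C, H, W] array."""
    return feature.repeat(2, axis=2).repeat(2, axis=3)

# Hypothetical shapes: a coarse map (stride 32) and a finer backbone map (stride 16).
route = np.random.rand(1, 256, 19, 19).astype("float32")   # from the higher level
c1 = np.random.rand(1, 512, 38, 38).astype("float32")      # lower-level backbone feature

route_up = upsample_nearest_2x(route)            # -> [1, 256, 38, 38]
fused = np.concatenate([route_up, c1], axis=1)   # concatenate along channels
print(route_up.shape, fused.shape)
```

Each coarse pixel is copied into a 2x2 block, so the semantic content of the high-level map is preserved while its spatial size matches the low-level map.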

The concrete network structure is shown in Figure 19:

Figure 19: Generating the multi-level output feature maps P0, P1 and P2

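YOLO-V3 generates 3 anchor boxes at the center of each region. The anchor sizes on the three feature-map levels are P2 [(10×13), (16×30), (33×23)], P1 [(30×61), (62×45), (59×119)], and P0 [(116×90), (156×198), (373×326)]. The later (deeper) feature maps use larger anchors and capture large objects; the earlier feature maps use smaller anchors and capture small objects.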
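As a quick sanity check on these numbers, the sketch below picks the anchors for each level out of the flat anchor list and counts the predicted boxes. It assumes a square 608×608 input and 7 classes; the helper `anchors_for_level` is hypothetical and only illustrates the indexing.

```python
ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]   # P0, P1, P2
NUM_CLASSES = 7
INPUT_SIZE = 608   # assumed square input size

def anchors_for_level(anchors, mask):
    """Pick the (w, h) pairs named by `mask` out of the flat anchor list."""
    return [(anchors[2 * m], anchors[2 * m + 1]) for m in mask]

stride = 32
total_boxes = 0
for level, mask in enumerate(ANCHOR_MASKS):
    grid = INPUT_SIZE // stride                      # feature map height and width
    level_anchors = anchors_for_level(ANCHORS, mask)
    channels = len(mask) * (NUM_CLASSES + 5)         # output channels of this level
    total_boxes += grid * grid * len(mask)
    print("P%d: stride=%d grid=%dx%d channels=%d anchors=%s"
          % (level, stride, grid, grid, channels, level_anchors))
    stride //= 2                                     # next level halves the stride
print("total predicted boxes:", total_boxes)
```

Under these assumptions the grids are 19×19, 38×38 and 76×76, so most of the predicted boxes come from the finest level P2.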

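Multi-scale detection requires substantial changes to the code above, and the implementation becomes rather tedious. We therefore recommend using the API that Paddle provides, fluid.layers.yolov3_loss, which is described below:

fluid.layers.yolov3_loss(x, gt_box, gt_label, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gt_score=None, use_label_smooth=True, name=None)

x: the network output feature map for one level (P0, P1 or P2)

gt_box: the ground-truth boxes

gt_label: the ground-truth box labels

anchors: the anchor sizes in use, e.g. [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]

anchor_mask: the mask of anchors used at each level, [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

class_num: the number of object classes, 7 for the AI insect-recognition dataset

ignore_thresh: when the IoU between a predicted box and a ground-truth box exceeds ignore_thresh, the box is not treated as a negative sample; set to 0.7 in the YOLO-V3 model

downsample_ratio: the downsampling ratio of feature map P0, which is 32 with the Darknet53 backbone

gt_score: the confidence of the ground-truth boxes, used when the mixup trick is enabled

use_label_smooth: a training trick; set to False when not used

name: the name of the layer, e.g. 'yolov3_loss'; optional

The code below implements prediction over multiple feature-map levels: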

# Assumes DarkNet53_conv_body, YoloDetectionBlock and ConvBNLayer from the previous lecture are in scope.
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import Conv2D
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay

# Define the upsampling module
class Upsample(fluid.dygraph.Layer):
    def __init__(self, name_scope, scale=2):
        super(Upsample, self).__init__(name_scope)
        self.scale = scale

    def forward(self, inputs):
        # get dynamic upsample output shape
        shape_nchw = fluid.layers.shape(inputs)
        shape_hw = fluid.layers.slice(shape_nchw, axes=[0], starts=[2], ends=[4])
        shape_hw.stop_gradient = True
        in_shape = fluid.layers.cast(shape_hw, dtype='int32')
        out_shape = in_shape * self.scale
        out_shape.stop_gradient = True

        # resize by actual_shape
        out = fluid.layers.resize_nearest(
            input=inputs, scale=self.scale, actual_shape=out_shape)
        return out

# Define the YOLO-V3 model
class YOLOv3(fluid.dygraph.Layer):
    def __init__(self, name_scope, num_classes=7, is_train=True):
        super(YOLOv3, self).__init__(name_scope)

        self.is_train = is_train
        self.num_classes = num_classes
        # Backbone that extracts image features
        self.block = DarkNet53_conv_body(self.full_name(),
                                         is_test=not self.is_train)
        self.block_outputs = []
        self.yolo_blocks = []
        self.route_blocks_2 = []
        # Generate the three feature-map levels P0, P1, P2
        for i in range(3):
            # Module that produces ri and ti from ci
            yolo_block = self.add_sublayer(
                "yolo_detecton_block_%d" % (i),
                YoloDetectionBlock(
                    self.full_name(),
                    channel=512 // (2 ** i),
                    is_test=not self.is_train))
            self.yolo_blocks.append(yolo_block)

            num_filters = 3 * (self.num_classes + 5)

            # Module that produces pi from ti: a Conv2D with 3 * (num_classes + 5) output channels
            block_out = self.add_sublayer(
                "block_out_%d" % (i),
                Conv2D(self.full_name(),
                       num_filters=num_filters,
                       filter_size=1,
                       stride=1,
                       padding=0,
                       act=None,
                       param_attr=ParamAttr(
                           initializer=fluid.initializer.Normal(0., 0.02)),
                       bias_attr=ParamAttr(
                           initializer=fluid.initializer.Constant(0.0),
                           regularizer=L2Decay(0.))))
            self.block_outputs.append(block_out)
            if i < 2:
                # Convolution applied to ri
                route = self.add_sublayer(
                    "route2_%d" % i,
                    ConvBNLayer(self.full_name(),
                                ch_out=256 // (2 ** i),
                                filter_size=1,
                                stride=1,
                                padding=0,
                                is_test=(not self.is_train)))
                self.route_blocks_2.append(route)
            # Upsample ri so that it matches the size of c_{i+1}
            self.upsample = Upsample(self.full_name())

    def forward(self, inputs):
        outputs = []
        blocks = self.block(inputs)
        for i, block in enumerate(blocks):
            if i > 0:
                # Concatenate the convolved and upsampled r_{i-1} with this level's ci
                block = fluid.layers.concat(input=[route, block], axis=1)
            # Produce ti and ri from ci
            route, tip = self.yolo_blocks[i](block)
            # Produce pi from ti
            block_out = self.block_outputs[i](tip)
            # Collect pi
            outputs.append(block_out)

            if i < 2:
                # Convolve ri to adjust the number of channels
                route = self.route_blocks_2[i](route)
                # Upsample ri so that it matches the size of c_{i+1}
                route = self.upsample(route)

        return outputs

    def get_loss(self, outputs, gtbox, gtlabel, gtscore=None,
                 anchors=[10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 ignore_thresh=0.7,
                 use_label_smooth=False):
        """
        Compute the loss directly with fluid.layers.yolov3_loss, which is simpler and faster.
        """
        self.losses = []
        downsample = 32
        for i, out in enumerate(outputs):  # one loss per level
            anchor_mask_i = anchor_masks[i]
            loss = fluid.layers.yolov3_loss(
                x=out,                       # one of P0, P1, P2
                gt_box=gtbox,                # ground-truth box coordinates
                gt_label=gtlabel,            # ground-truth box classes
                gt_score=gtscore,            # ground-truth scores, needed for mixup; otherwise all 1, same shape as gtlabel
                anchors=anchors,             # anchor sizes [w0, h0, w1, h1, ..., w8, h8], 9 anchors in total
                anchor_mask=anchor_mask_i,   # e.g. [3, 4, 5] selects anchors 3, 4 and 5 for this level
                class_num=self.num_classes,  # number of classes
                ignore_thresh=ignore_thresh, # predicted boxes with IoU > ignore_thresh get objectness = -1
                downsample_ratio=downsample, # stride relative to the input: 32 for P0, 16 for P1, 8 for P2
                use_label_smooth=use_label_smooth)  # label smoothing is not used here, so this is False
            self.losses.append(fluid.layers.reduce_mean(loss))  # average over the images in the batch
            downsample = downsample // 2  # the next level has half the stride
        return sum(self.losses)  # sum over the levels

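End-to-End Training

The training flow is shown in the figure below. The input image goes through feature extraction to produce three output feature maps, P0 (stride=32), P1 (stride=16) and P2 (stride=8). Each level uses small square regions of a different size to generate its anchor boxes and predicted boxes, and these anchor boxes are then labeled.

P0 level: uses $32\times32$ regions and generates three anchor boxes of sizes $[116, 90]$, $[156, 198]$, $[373, 326]$ at the center of each region.

P1 level: uses $16\times16$ regions and generates three anchor boxes of sizes $[30, 61]$, $[62, 45]$, $[59, 119]$ at the center of each region.

P2 level: uses $8\times8$ regions and generates three anchor boxes of sizes $[10, 13]$, $[16, 30]$, $[33, 23]$ at the center of each region.

The feature map of each level is associated with the labels of its anchor boxes and a loss function is built for it; the total loss is the sum of the losses of the three levels. Minimizing this loss trains the network end to end.

Figure 20: End-to-end training flow

The training code is as follows: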

# Assumes YOLOv3 (above) and multithread_loader (previous lecture) are in scope.
import time
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.base import to_variable

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
IGNORE_THRESH = .7
NUM_CLASSES = 7
# The three values below were lost in extraction and are assumptions.
BATCH_SIZE = 10
LOG_INTERVAL = 1
SAVE_INTERVAL = 5

def get_lr(base_lr=0.0001, lr_decay=0.1):
    bd = [10000, 20000]
    lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay]
    learning_rate = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
    return learning_rate

if __name__ == '__main__':
    TRAINDIR = '/home/aistudio/work/insects/train'
    TESTDIR = '/home/aistudio/work/insects/test'
    VALIDDIR = '/home/aistudio/work/insects/val'

    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=True)  # create the model
        learning_rate = get_lr()
        opt = fluid.optimizer.Momentum(
            learning_rate=learning_rate,
            momentum=0.9,
            regularization=fluid.regularizer.L2Decay(0.0005))  # create the optimizer

        train_loader = multithread_loader(TRAINDIR, batch_size=BATCH_SIZE, mode='train')  # training data reader
        valid_loader = multithread_loader(VALIDDIR, batch_size=BATCH_SIZE, mode='valid')  # validation data reader

        MAX_EPOCH = 200
        for epoch in range(MAX_EPOCH):
            for i, data in enumerate(train_loader()):
                img, gt_boxes, gt_labels, img_scale = data
                gt_scores = np.ones(gt_labels.shape).astype('float32')
                gt_scores = to_variable(gt_scores)
                img = to_variable(img)
                gt_boxes = to_variable(gt_boxes)
                gt_labels = to_variable(gt_labels)
                outputs = model(img)  # forward pass, returns [P0, P1, P2]
                loss = model.get_loss(outputs, gt_boxes, gt_labels, gtscore=gt_scores,
                                      anchors=ANCHORS,
                                      anchor_masks=ANCHOR_MASKS,
                                      ignore_thresh=IGNORE_THRESH,
                                      use_label_smooth=False)  # compute the loss

                loss.backward()      # backpropagate to compute gradients
                opt.minimize(loss)   # update the parameters
                model.clear_gradients()
                if i % LOG_INTERVAL == 0:
                    timestring = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
                    print('{}[TRAIN]epoch {}, iter {}, output loss: {}'.format(timestring, epoch, i, loss.numpy()))

            # save params of model
            if (epoch % SAVE_INTERVAL == 0) or (epoch == MAX_EPOCH - 1):
                fluid.save_dygraph(model.state_dict(), 'yolo_epoch{}'.format(epoch))

            # Evaluate on the validation set at the end of every epoch
            model.eval()
            for i, data in enumerate(valid_loader()):
                img, gt_boxes, gt_labels, img_scale = data
                gt_scores = np.ones(gt_labels.shape).astype('float32')
                gt_scores = to_variable(gt_scores)
                img = to_variable(img)
                gt_boxes = to_variable(gt_boxes)
                gt_labels = to_variable(gt_labels)
                outputs = model(img)
                loss = model.get_loss(outputs, gt_boxes, gt_labels, gtscore=gt_scores,
                                      anchors=ANCHORS,
                                      anchor_masks=ANCHOR_MASKS,
                                      ignore_thresh=IGNORE_THRESH,
                                      use_label_smooth=False)
                if i % LOG_INTERVAL == 0:
                    timestring = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
                    print('{}[VALID]epoch {}, iter {}, output loss: {}'.format(timestring, epoch, i, loss.numpy()))
            model.train()

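Prediction

The prediction flow is shown in Figure 21:

Figure 21: End-to-end training flow

Prediction consists of two steps:

1. Compute the predicted box positions and class scores from the network output.

2. Use non-maximum suppression to remove predicted boxes that overlap heavily.

For step 1, we have already seen how to compute pred_objectness_probability, pred_boxes and pred_classification_probability from the network output. Here we recommend using fluid.layers.yolo_box directly. Its usage is:

fluid.layers.yolo_box(x, img_size, anchors, class_num, conf_thresh, downsample_ratio, name=None)

x: the network output feature map, e.g. P0, P1 or P2 above

img_size: the size of the input image

anchors: the anchor sizes used at this level, taken from the full list [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]

anchor_mask: not an argument of yolo_box (it does not appear in the signature above). The per-level masks [[6, 7, 8], [3, 4, 5], [0, 1, 2]] are applied beforehand to select the anchors passed in for each level.

class_num: the number of object classes

conf_thresh: the confidence threshold; for predicted boxes whose score is below it, the box position is not computed and is set to 0.0

downsample_ratio: the downsampling ratio of the feature map, e.g. 32 for P0, 16 for P1, 8 for P2

name=None: the name, e.g. 'yolo_box'

The function returns two values, boxes and scores: boxes holds the coordinates of all predicted boxes and scores holds their scores.

The score of a predicted box is defined as the probability of its class multiplied by the objectness probability that the box contains an object, that is: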

$$score = P_{obj} \cdot P_{classification}$$
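As a small numerical illustration of this definition, the NumPy sketch below turns raw logits into probabilities with a sigmoid and multiplies them. The logits are made-up values for two predicted boxes and three classes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw network outputs for 2 predicted boxes and 3 classes.
objectness_logits = np.array([2.0, -1.0])
class_logits = np.array([[3.0, -2.0, 0.0],
                         [0.5, 1.5, -3.0]])

p_obj = sigmoid(objectness_logits)      # shape [2]
p_cls = sigmoid(class_logits)           # shape [2, 3]
scores = p_obj[:, None] * p_cls         # score = P_obj * P_classification

best_class = scores.argmax(axis=1)      # most likely class per box
print(scores.round(3), best_class)
```

A box with low objectness gets low scores for every class, even if one of its class probabilities is high.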

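Add a function get_pred to the YOLOv3 class defined above. It calls fluid.layers.yolo_box to obtain the predicted boxes and scores for the three feature-map levels P0, P1 and P2 and concatenates them, which gives all predicted boxes together with their scores for each class.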

class YOLOv3(fluid.dygraph.Layer):
    def __init__(self, name_scope, num_classes=7, is_train=True):
        super(YOLOv3, self).__init__(name_scope)

        self.is_train = is_train
        self.num_classes = num_classes
        # Backbone that extracts image features
        self.block = DarkNet53_conv_body(self.full_name(),
                                         is_test=not self.is_train)
        self.block_outputs = []
        self.yolo_blocks = []
        self.route_blocks_2 = []
        for i in range(3):
            # Module that produces ri and ti from ci
            yolo_block = self.add_sublayer(
                "yolo_detecton_block_%d" % (i),
                YoloDetectionBlock(
                    self.full_name(),
                    channel=512 // (2 ** i),
                    is_test=not self.is_train))
            self.yolo_blocks.append(yolo_block)

            num_filters = 3 * (self.num_classes + 5)

            # Module that produces pi from ti: a Conv2D with 3 * (num_classes + 5) output channels
            block_out = self.add_sublayer(
                "block_out_%d" % (i),
                Conv2D(self.full_name(),
                       num_filters=num_filters,
                       filter_size=1,
                       stride=1,
                       padding=0,
                       act=None,
                       param_attr=ParamAttr(
                           initializer=fluid.initializer.Normal(0., 0.02)),
                       bias_attr=ParamAttr(
                           initializer=fluid.initializer.Constant(0.0),
                           regularizer=L2Decay(0.))))
            self.block_outputs.append(block_out)
            if i < 2:
                # Convolution applied to ri
                route = self.add_sublayer(
                    "route2_%d" % i,
                    ConvBNLayer(self.full_name(),
                                ch_out=256 // (2 ** i),
                                filter_size=1,
                                stride=1,
                                padding=0,
                                is_test=(not self.is_train)))
                self.route_blocks_2.append(route)
            # Upsample ri so that it matches the size of c_{i+1}
            self.upsample = Upsample(self.full_name())

    def forward(self, inputs):
        outputs = []
        blocks = self.block(inputs)
        for i, block in enumerate(blocks):
            if i > 0:
                # Concatenate the convolved and upsampled r_{i-1} with this level's ci
                block = fluid.layers.concat(input=[route, block], axis=1)
            # Produce ti and ri from ci
            route, tip = self.yolo_blocks[i](block)
            # Produce pi from ti
            block_out = self.block_outputs[i](tip)
            # Collect pi
            outputs.append(block_out)

            if i < 2:
                # Convolve ri to adjust the number of channels
                route = self.route_blocks_2[i](route)
                # Upsample ri so that it matches the size of c_{i+1}
                route = self.upsample(route)

        return outputs

    def get_loss(self, outputs, gtbox, gtlabel, gtscore=None,
                 anchors=[10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 ignore_thresh=0.7,
                 use_label_smooth=False):
        self.losses = []
        downsample = 32
        for i, out in enumerate(outputs):
            anchor_mask_i = anchor_masks[i]
            loss = fluid.layers.yolov3_loss(
                x=out,
                gt_box=gtbox,
                gt_label=gtlabel,
                gt_score=gtscore,
                anchors=anchors,
                anchor_mask=anchor_mask_i,
                class_num=self.num_classes,
                ignore_thresh=ignore_thresh,
                downsample_ratio=downsample,
                use_label_smooth=use_label_smooth)
            self.losses.append(fluid.layers.reduce_mean(loss))
            downsample = downsample // 2
        return sum(self.losses)

    def get_pred(self,
                 outputs,
                 im_shape=None,
                 anchors=[10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 valid_thresh=0.01):
        downsample = 32
        total_boxes = []
        total_scores = []
        for i, out in enumerate(outputs):
            anchor_mask = anchor_masks[i]
            # Select the (w, h) pairs of the anchors used at this level
            anchors_this_level = []
            for m in anchor_mask:
                anchors_this_level.append(anchors[2 * m])
                anchors_this_level.append(anchors[2 * m + 1])

            boxes, scores = fluid.layers.yolo_box(
                x=out,
                img_size=im_shape,
                anchors=anchors_this_level,
                class_num=self.num_classes,
                conf_thresh=valid_thresh,
                downsample_ratio=downsample,
                name="yolo_box" + str(i))
            total_boxes.append(boxes)
            # [N, M, class_num] -> [N, class_num, M]
            total_scores.append(
                fluid.layers.transpose(scores, perm=[0, 2, 1]))
            downsample = downsample // 2

        # Concatenate the boxes and scores of the three levels
        yolo_boxes = fluid.layers.concat(total_boxes, axis=1)
        yolo_scores = fluid.layers.concat(total_scores, axis=2)
        return yolo_boxes, yolo_scores

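Step 1 produces several predicted boxes for every small square region, and many of the resulting boxes overlap heavily. The redundant, highly overlapping boxes need to be removed.

The predicted boxes in the example code below were produced by running the model on an image. Eleven predicted boxes were selected and drawn on the image. Several predicted boxes appear around each person, so the redundant ones must be removed to obtain the final prediction.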

# Draw the bounding boxes of the target objects
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Helper that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor='k', facecolor='y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained with plt.gca()
    # bbox: the bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the border line
    # facecolor: fill color
    # fill: whether to fill the rectangle
    # linestyle: style of the border line
    # patches.Rectangle takes the top-left corner plus the width and height of the rectangle
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0] + 1, bbox[3] - bbox[1] + 1,
                             linewidth=1,  # value lost in extraction; 1 is an assumption
                             edgecolor=edgecolor, facecolor=facecolor, fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))  # figure size lost in extraction; (10, 10) is an assumption

filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)

currentAxis = plt.gca()

# Predicted box positions
boxes = np.array([[4.21716537e+01, 1.28230896e+02, 2.26547668e+02, 6.00434631e+02],
                  [3.18562988e+02, 1.23168472e+02, 4.79000000e+02, 6.05688416e+02],
                  [2.62704697e+01, 1.39430557e+02, 2.20587097e+02, 6.38959656e+02],
                  [4.24965363e+01, 1.42706665e+02, 2.25955185e+02, 6.35671204e+02],
                  [2.37462646e+02, 1.35731537e+02, 4.79000000e+02, 6.31451294e+02],
                  [3.19390472e+02, 1.29295090e+02, 4.79000000e+02, 6.33003845e+02],
                  [3.28933838e+02, 1.22736115e+02, 4.79000000e+02, 6.39000000e+02],
                  [4.44292603e+01, 1.70438187e+02, 2.26841858e+02, 6.39000000e+02],
                  [2.17988785e+02, 3.02472412e+02, 4.06062927e+02, 6.29106628e+02],
                  [2.00241089e+02, 3.23755096e+02, 3.96929321e+02, 6.36386108e+02],
                  [2.14310303e+02, 3.23443665e+02, 4.06732849e+02, 6.35775269e+02]])

# Predicted box scores
scores = np.array([0.5247661, 0.51759845, 0.86075854, 0.9910175, 0.39170712,
                   0.9297706, 0.5115228, 0.270992, 0.19087596, 0.64201415, 0.879036])

# Draw all predicted boxes
for box in boxes:
    draw_rectangle(currentAxis, box)

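We use non-maximum suppression (NMS) to remove the redundant boxes. The basic idea is that if several predicted boxes correspond to the same object, only the one with the highest score is kept and the rest are discarded. How do we decide that two predicted boxes correspond to the same object? If two predicted boxes have the same class and their positions overlap heavily, we can assume they predict the same target. NMS therefore picks the highest-scoring predicted box of a class, finds the predicted boxes whose IoU with it exceeds a threshold, and discards them. The IoU threshold is a hyperparameter that must be set in advance; the YOLO-V3 model uses 0.5.

For example, in the program above, boxes holds 11 predicted boxes and scores gives their scores for the class "person".

Step0 Create the list of selected boxes, keep_list = []

Step1 Sort by score: remain_list = [3, 5, 10, 2, 9, 0, 1, 6, 4, 7, 8]

Step2 Pick boxes[3]. keep_list is empty, so no IoU needs to be computed and the box goes straight into keep_list: keep_list = [3], remain_list = [5, 10, 2, 9, 0, 1, 6, 4, 7, 8]

Step3 Pick boxes[5]. keep_list already contains boxes[3]; IoU(boxes[3], boxes[5]) = 0.0, clearly below the threshold, so keep_list = [3, 5], remain_list = [10, 2, 9, 0, 1, 6, 4, 7, 8]

Step4 Pick boxes[10]. keep_list = [3, 5]; IoU(boxes[3], boxes[10]) = 0.0268 and IoU(boxes[5], boxes[10]) = 0.24, both below the threshold, so keep_list = [3, 5, 10], remain_list = [2, 9, 0, 1, 6, 4, 7, 8]

Step5 Pick boxes[2]. keep_list = [3, 5, 10]; IoU(boxes[3], boxes[2]) = 0.88 exceeds the threshold, so boxes[2] is discarded: keep_list = [3, 5, 10], remain_list = [9, 0, 1, 6, 4, 7, 8]

Step6 Pick boxes[9]. keep_list = [3, 5, 10]; IoU(boxes[3], boxes[9]) = 0.0577, IoU(boxes[5], boxes[9]) = 0.205, and IoU(boxes[10], boxes[9]) = 0.88, which exceeds the threshold, so boxes[9] is discarded: keep_list = [3, 5, 10], remain_list = [0, 1, 6, 4, 7, 8]

Step7 Repeat Step6 until remain_list is empty

The final result is keep_list = [3, 5, 10], that is, predicted boxes 3, 5 and 10 are selected, as shown in the figure below.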

# Draw the bounding boxes of the target objects
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Helper that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor='k', facecolor='y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained with plt.gca()
    # bbox: the bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the border line
    # facecolor: fill color
    # fill: whether to fill the rectangle
    # linestyle: style of the border line
    # patches.Rectangle takes the top-left corner plus the width and height of the rectangle
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0] + 1, bbox[3] - bbox[1] + 1,
                             linewidth=1,  # value lost in extraction; 1 is an assumption
                             edgecolor=edgecolor, facecolor=facecolor, fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))  # figure size lost in extraction; (10, 10) is an assumption

filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)

currentAxis = plt.gca()

boxes = np.array([[4.21716537e+01, 1.28230896e+02, 2.26547668e+02, 6.00434631e+02],
                  [3.18562988e+02, 1.23168472e+02, 4.79000000e+02, 6.05688416e+02],
                  [2.62704697e+01, 1.39430557e+02, 2.20587097e+02, 6.38959656e+02],
                  [4.24965363e+01, 1.42706665e+02, 2.25955185e+02, 6.35671204e+02],
                  [2.37462646e+02, 1.35731537e+02, 4.79000000e+02, 6.31451294e+02],
                  [3.19390472e+02, 1.29295090e+02, 4.79000000e+02, 6.33003845e+02],
                  [3.28933838e+02, 1.22736115e+02, 4.79000000e+02, 6.39000000e+02],
                  [4.44292603e+01, 1.70438187e+02, 2.26841858e+02, 6.39000000e+02],
                  [2.17988785e+02, 3.02472412e+02, 4.06062927e+02, 6.29106628e+02],
                  [2.00241089e+02, 3.23755096e+02, 3.96929321e+02, 6.36386108e+02],
                  [2.14310303e+02, 3.23443665e+02, 4.06732849e+02, 6.35775269e+02]])

scores = np.array([0.5247661, 0.51759845, 0.86075854, 0.9910175, 0.39170712,
                   0.9297706, 0.5115228, 0.270992, 0.19087596, 0.64201415, 0.879036])

# The source also selected a subset of boxes by x-coordinate here (left_ind, left_boxes,
# left_scores). The thresholds were lost in extraction and the result is not used below.

colors = ['r', 'g', 'b', 'k']

# Draw the predicted boxes that are finally kept
inds = [3, 5, 10]
for i in range(3):
    box = boxes[inds[i]]
    draw_rectangle(currentAxis, box, edgecolor=colors[i])
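The step-by-step walkthrough above can be checked with a compact greedy NMS sketch in NumPy, run on the same 11 boxes and scores. The helpers `iou_xyxy` and `greedy_nms` are hypothetical names written for this illustration. The IoU here uses plain continuous areas, so individual IoU values can differ slightly from those quoted in the walkthrough, but the kept boxes are the same.

```python
import numpy as np

def iou_xyxy(a, b):
    """IoU of two [x1, y1, x2, y2] boxes, using plain continuous areas."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def greedy_nms(boxes, scores, iou_thresh):
    keep = []
    for idx in np.argsort(scores)[::-1]:          # highest score first
        # keep the box only if it overlaps no already-kept box too much
        if all(iou_xyxy(boxes[idx], boxes[k]) <= iou_thresh for k in keep):
            keep.append(int(idx))
    return keep

boxes = np.array([[42.1716537, 128.230896, 226.547668, 600.434631],
                  [318.562988, 123.168472, 479.0, 605.688416],
                  [26.2704697, 139.430557, 220.587097, 638.959656],
                  [42.4965363, 142.706665, 225.955185, 635.671204],
                  [237.462646, 135.731537, 479.0, 631.451294],
                  [319.390472, 129.295090, 479.0, 633.003845],
                  [328.933838, 122.736115, 479.0, 639.0],
                  [44.4292603, 170.438187, 226.841858, 639.0],
                  [217.988785, 302.472412, 406.062927, 629.106628],
                  [200.241089, 323.755096, 396.929321, 636.386108],
                  [214.310303, 323.443665, 406.732849, 635.775269]])
scores = np.array([0.5247661, 0.51759845, 0.86075854, 0.9910175, 0.39170712,
                   0.9297706, 0.5115228, 0.270992, 0.19087596, 0.64201415, 0.879036])

keep_list = greedy_nms(boxes, scores, iou_thresh=0.5)
print(keep_list)
```

Boxes 3, 5 and 10 survive: each remaining box overlaps one of them with an IoU above 0.5 and is therefore suppressed.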

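The concrete implementation of non-maximum suppression is the nms function defined below. Because the dataset contains objects of several classes, we need multi-class NMS. It works on the same principle, except that NMS is applied to each class separately; the implementation is shown in multiclass_nms below.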

# Non-maximum suppression
# box_iou_xyxy is the IoU helper defined in an earlier lecture.
# The leftover debug print of the source (and the index arguments that only
# served it) is omitted here; it could fail when keep_inds was still empty.
def nms(bboxes, scores, score_thresh, nms_thresh, pre_nms_topk):
    """
    nms
    """
    inds = np.argsort(scores)
    inds = inds[::-1]
    keep_inds = []
    while (len(inds) > 0):
        cur_ind = inds[0]
        cur_score = scores[cur_ind]
        # if score of the box is less than score_thresh, just drop it
        if cur_score < score_thresh:
            break

        keep = True
        for ind in keep_inds:
            current_box = bboxes[cur_ind]
            remain_box = bboxes[ind]
            iou = box_iou_xyxy(current_box, remain_box)
            if iou > nms_thresh:
                keep = False
                break
        if keep:
            keep_inds.append(cur_ind)
        inds = inds[1:]

    return np.array(keep_inds)

# Multi-class non-maximum suppression
def multiclass_nms(bboxes, scores, score_thresh=0.01, nms_thresh=0.45, pre_nms_topk=1000, pos_nms_topk=100):
    """
    This is for multiclass_nms
    """
    batch_size = bboxes.shape[0]
    class_num = scores.shape[1]
    rets = []
    for i in range(batch_size):
        bboxes_i = bboxes[i]
        scores_i = scores[i]
        ret = []
        for c in range(class_num):
            scores_i_c = scores_i[c]
            keep_inds = nms(bboxes_i, scores_i_c, score_thresh, nms_thresh, pre_nms_topk)
            if len(keep_inds) < 1:
                continue
            keep_bboxes = bboxes_i[keep_inds]
            keep_scores = scores_i_c[keep_inds]
            # Each row is [label, score, x1, y1, x2, y2]
            keep_results = np.zeros([keep_scores.shape[0], 6])
            keep_results[:, 0] = c
            keep_results[:, 1] = keep_scores[:]
            keep_results[:, 2:6] = keep_bboxes[:, 0:4]
            ret.append(keep_results)
        if len(ret) < 1:
            rets.append(ret)
            continue
        ret_i = np.concatenate(ret, axis=0)
        scores_i = ret_i[:, 1]
        # Keep at most pos_nms_topk boxes per image, by score
        if len(scores_i) > pos_nms_topk:
            inds = np.argsort(scores_i)[::-1]
            inds = inds[:pos_nms_topk]
            ret_i = ret_i[inds]

        rets.append(ret_i)

    return rets

The complete test program follows. Its output on the test dataset is saved to the file pred_results.json.

# Assumes YOLOv3, multiclass_nms, test_data_loader, fluid and to_variable are in scope.
import json

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
VALID_THRESH = 0.01
NMS_TOPK = 400
NMS_POSK = 100
NMS_THRESH = 0.45
NUM_CLASSES = 7

if __name__ == '__main__':
    TRAINDIR = '/home/aistudio/work/insects/train/images'
    TESTDIR = '/home/aistudio/work/insects/test/images'
    VALIDDIR = '/home/aistudio/work/insects/val'
    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=False)
        params_file_path = '/home/aistudio/work/yolo_epoch50'
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)
        model.eval()

        total_results = []
        test_loader = test_data_loader(TESTDIR,
                                       batch_size=1,  # value lost in extraction; 1 is an assumption
                                       mode='test')
        for i, data in enumerate(test_loader()):
            img_name, img_data, img_scale_data = data
            img = to_variable(img_data)
            img_scale = to_variable(img_scale_data)

            outputs = model.forward(img)
            bboxes, scores = model.get_pred(outputs,
                                            im_shape=img_scale,
                                            anchors=ANCHORS,
                                            anchor_masks=ANCHOR_MASKS,
                                            valid_thresh=VALID_THRESH)

            bboxes_data = bboxes.numpy()
            scores_data = scores.numpy()
            result = multiclass_nms(bboxes_data, scores_data,
                                    score_thresh=VALID_THRESH,
                                    nms_thresh=NMS_THRESH,
                                    pre_nms_topk=NMS_TOPK,
                                    pos_nms_topk=NMS_POSK)
            for j in range(len(result)):
                result_j = result[j]
                img_name_j = img_name[j]
                total_results.append([img_name_j, result_j.tolist()])
            print('processed {} pictures'.format(len(total_results)))

        print('')
        json.dump(total_results, open('pred_results.json', 'w'))

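The JSON file stores the test results as a list containing the predictions for every image. It is structured as follows:

[[img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]],
 [img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]],
 ...
 [img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]]]

Each element of the list is the prediction result for one image, and the length of the list equals the number of images. The result for each image has the format:

[img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]]

The first element is the image name, img_name. The second element is a list of all prediction boxes for that image:

[[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]

Each element [label, score, x1, y1, x2, y2] describes one prediction box: label is the class the box is assigned to, score is the box's score, and x1, y1, x2, y2 give the top-left corner (x1, y1) and the bottom-right corner (x2, y2). An image may have many prediction boxes; all of them are placed in this list.

In the baseline version of the AI insect-recognition competition, the instructor provides code for computing the mAP metric. Running it on this pred_results.json file produces the final evaluation metric.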
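As a small illustration of working with this layout, the sketch below round-trips a hand-written result list through JSON and keeps only the confident boxes. The image names, scores and coordinates are made up for the example, boxes_above is a hypothetical helper, and the box columns are assumed to be ordered x1, y1, x2, y2 as the drawing code later in this article expects.

```python
import json

# Hypothetical predictions for two images, in the layout described above
total_results = [
    ["1001", [[0, 0.92, 12.0, 30.5, 88.0, 120.0],
              [3, 0.40, 200.0, 210.0, 260.0, 300.0]]],
    ["1002", []],  # an image with no detections
]

# json.dumps produces the same encoding that json.dump writes to a file
text = json.dumps(total_results)
loaded = json.loads(text)


def boxes_above(results, thresh):
    """Map image name -> list of (label, score, box) with score > thresh."""
    selected = {}
    for img_name, boxes in results:
        selected[img_name] = [
            (int(label), score, (x1, y1, x2, y2))
            for label, score, x1, y1, x2, y2 in boxes
            if score > thresh
        ]
    return selected


confident = boxes_above(loaded, 0.5)
print(confident)
```

Plain nested lists are used because NumPy arrays are not JSON serializable, which is why the test program converts each result with tolist() before dumping it.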

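Model Results and Visualization

The program above showed how to read the images in the test dataset and save the final results to a JSON file. To show the model's results more directly, the program below reads a single image and draws the prediction boxes produced for it.

Create a data reader that loads the data of a single image.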

import os

import cv2
import numpy as np

# Read a single test image
def single_image_data_loader(filename, test_image_size=608, mode='test'):
    """
    Load one image for testing; test data has no ground-truth labels
    """
    batch_size = 1
    def reader():
        batch_data = []
        img_size = test_image_size
        file_path = os.path.join(filename)
        img = cv2.imread(file_path)
        # OpenCV loads images as BGR; convert to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # Remember the original height and width before resizing
        H = img.shape[0]
        W = img.shape[1]
        img = cv2.resize(img, (img_size, img_size))

        # Scale to [0, 1], then standardize each channel
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        mean = np.array(mean).reshape((1, 1, -1))
        std = np.array(std).reshape((1, 1, -1))
        out_img = (img / 255.0 - mean) / std
        # HWC -> CHW
        out_img = out_img.astype('float32').transpose((2, 0, 1))
        img = out_img
        im_shape = [H, W]

        # Use the filename argument instead of relying on a global image_name
        batch_data.append((filename.split('.')[0], img, im_shape))
        if len(batch_data) == batch_size:
            yield make_test_array(batch_data)
            batch_data = []

    return reader
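The preprocessing inside the reader can be checked in isolation. The sketch below applies the same normalization and axis reordering to a synthetic array, so no image file or OpenCV call is needed; normalize_image is a hypothetical helper name, and the 4x6 all-white image is an assumption chosen only to keep the numbers easy to verify.

```python
import numpy as np


def normalize_image(img_rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale uint8 RGB pixels to [0, 1], standardize each channel,
    and reorder the axes from HWC to CHW as float32."""
    mean = np.array(mean).reshape((1, 1, -1))
    std = np.array(std).reshape((1, 1, -1))
    out = (img_rgb / 255.0 - mean) / std
    return out.astype('float32').transpose((2, 0, 1))


# A synthetic 4x6 all-white RGB image stands in for the resized photo
fake_img = np.full((4, 6, 3), 255, dtype=np.uint8)
chw = normalize_image(fake_img)
print(chw.shape)      # (3, 4, 6)
print(chw[:, 0, 0])   # per-channel values of one white pixel
```

The reshape to (1, 1, -1) lets the three mean and std values broadcast across every pixel, and the final transpose puts the channel axis first, which is the layout the network input uses.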

Define the plotting functions that draw the prediction boxes, as shown below.

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread

# Define the plotting functions
INSECT_NAMES = ['Boerner', 'Leconte', 'Linnaeus',
                'acuminatus', 'armandi', 'coleoptera', 'linnaeus']

# Draw one rectangle
def draw_rectangle(currentAxis, bbox, edgecolor='k', facecolor='y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained with plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the box outline
    # facecolor: fill color
    # fill: whether to fill the box
    # linestyle: line style of the outline
    # patches.Rectangle takes the top-left corner, then the width and height of the rectangle
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0] + 1, bbox[3] - bbox[1] + 1,
                             linewidth=1, edgecolor=edgecolor, facecolor=facecolor,
                             fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

# Draw the prediction results on the image
def draw_results(result, filename, draw_thresh=0.5):
    plt.figure(figsize=(10, 10))
    im = imread(filename)
    plt.imshow(im)
    currentAxis = plt.gca()
    colors = ['r', 'g', 'b', 'k', 'y', 'c', 'purple']
    for item in result:
        # Each item is [label, score, x1, y1, x2, y2]
        box = item[2:6]
        label = int(item[0])
        name = INSECT_NAMES[label]
        # Only draw boxes whose score exceeds the threshold
        if item[1] > draw_thresh:
            draw_rectangle(currentAxis, box, edgecolor=colors[label])
            plt.text(box[0], box[1], name, fontsize=12, color=colors[label])
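The geometry behind the drawing step can be sketched without any plotting library. The example below converts corner-format boxes into the corner, width and height values that a rectangle patch needs and applies the score threshold. rectangle_args and drawable_items are hypothetical helpers, the sample result rows are made up, and the +1 in the width and height is an assumption that box coordinates are inclusive pixel indices.

```python
INSECT_NAMES = ['Boerner', 'Leconte', 'Linnaeus',
                'acuminatus', 'armandi', 'coleoptera', 'linnaeus']


def rectangle_args(box):
    """Convert [x1, y1, x2, y2] into ((x, y), width, height),
    treating the corners as inclusive pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1, y1), x2 - x1 + 1, y2 - y1 + 1


def drawable_items(result, names, draw_thresh=0.5):
    """Return (class name, rectangle arguments) for boxes above the threshold."""
    items = []
    for item in result:
        label, score, box = int(item[0]), item[1], item[2:6]
        if score > draw_thresh:
            items.append((names[label], rectangle_args(box)))
    return items


# Hypothetical rows in the [label, score, x1, y1, x2, y2] layout
result = [[1, 0.9, 10, 20, 110, 70],
          [4, 0.3, 0, 0, 5, 5]]
items = drawable_items(result, INSECT_NAMES)
print(items)
```

Separating the filtering and geometry from the plotting calls makes it easy to check which boxes will be drawn before any figure is opened.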

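Use the single_image_data_loader function defined above to read the specified image, feed it to the network to compute the prediction boxes and scores, and then apply multi-class non-maximum suppression to remove redundant boxes. Finally, plot the result.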

import json

import paddle
import paddle.fluid as fluid

# Assumes YOLOv3, single_image_data_loader, multiclass_nms, draw_results
# and to_variable from the earlier sections are already defined or imported.

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
VALID_THRESH = 0.01
NMS_TOPK = 400
NMS_POSK = 100
NMS_THRESH = 0.45
NUM_CLASSES = 7

if __name__ == '__main__':
    image_name = '/home/aistudio/work/insects/test/images/2599.jpeg'
    params_file_path = '/home/aistudio/work/yolo_epoch50'
    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=False)
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)
        model.eval()

        total_results = []
        test_loader = single_image_data_loader(image_name, mode='test')
        for i, data in enumerate(test_loader()):
            img_name, img_data, img_scale_data = data
            img = to_variable(img_data)
            img_scale = to_variable(img_scale_data)

            outputs = model.forward(img)
            bboxes, scores = model.get_pred(outputs,
                                            im_shape=img_scale,
                                            anchors=ANCHORS,
                                            anchor_masks=ANCHOR_MASKS,
                                            valid_thresh=VALID_THRESH)

            bboxes_data = bboxes.numpy()
            scores_data = scores.numpy()
            results = multiclass_nms(bboxes_data, scores_data,
                                     score_thresh=VALID_THRESH,
                                     nms_thresh=NMS_THRESH,
                                     pre_nms_topk=NMS_TOPK,
                                     pos_nms_topk=NMS_POSK)

    # The batch holds a single image, so take its result and draw it
    result = results[0]
    draw_results(result, image_name, draw_thresh=0.5)

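The program above shows how to use trained weights to run prediction on an image and visualize the result. In the final output image, every insect is detected and marked with its bounding box and class.

Summary

Over the past four lectures, Mr. Sun explained the design ideas behind YOLOv3 and its concrete implementation in detail, and completed a concrete AI insect-recognition task using a forestry pest dataset as the example. Later lectures will continue to bring richer content to help learners master deep learning methods quickly.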

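[How to Study]

1. How do I watch the accompanying videos and practice with the code?

The videos and code have been published on the AI Studio practice platform. The videos can be watched on both PC and mobile, and you are encouraged to run the code yourself. Scan the QR code or open the following link:

https://aistudio.baidu.com/aistudio/course/introduce/888

2. What if I have questions while studying?

Join the deep learning training camp QQ group: 726887660. The class coordinator and PaddlePaddle developers answer questions and share study materials in the group.

3. How can I learn more?

Baidu PaddlePaddle will continue to update the "Zero-Basics Introduction to Deep Learning" course through the PaddlePaddle deep learning training camp. It is taught by senior Baidu deep learning R&D engineers every Tuesday and Thursday from 20:00 to 21:00, in a format that combines live streams, recordings, hands-on practice and Q&A.

Search for AI Studio and open Courses, then "百度架構師手把手教深度學習" (Baidu architects teach deep learning hands-on), or click "Read the original" at the end of the article to watch.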
