Zero-Basics Introduction to Deep Learning (11): Object Detection with YOLOv3, Part 2

Course | Zero-Basics Introduction to Deep Learning

Instructor | Sun Gaofeng, Senior R&D Engineer, Baidu Deep Learning Technology Platform Department

Schedule | Tuesday and Thursday evenings, 20:00-21:00

Introduction

This is Baidu's official introductory deep learning course, aimed at learners with little or no background in deep learning, helping you make the leap from 0 to 1+ in the field. In this course you will learn: 1. deep learning fundamentals; 2. building neural networks and gradient descent with numpy; 3. principles and practice of the main directions in computer vision; 4. principles and practice of the main directions in natural language processing; 5. principles and practice of personalized recommendation algorithms.

In the previous lecture, Sun Gaofeng, Senior R&D Engineer at Baidu's Deep Learning Technology Platform Department, explained how YOLOv3 generates candidate regions and extracts features with a convolutional neural network. This lecture covers building the loss function, multi-level detection, and producing the final predictions.

Loss Function

In the previous lecture we conceptually associated the pixels of the output feature map with the prediction boxes. To actually solve for the network, however, we must also relate the network output to the prediction boxes mathematically, that is, establish the relationship between the loss function and the network output. This section discusses how to build the YOLO-V3 loss function.

For each prediction box, the YOLO-V3 model builds three types of loss:

A loss representing whether the box contains an object, computed from pred_objectness and label_objectness:

loss_obj = fluid.layers.sigmoid_cross_entropy_with_logits(pred_objectness, label_objectness)

A loss representing the object's location, computed from pred_location and label_location:

pred_location_x = pred_location[:, :, 0, :, :]
pred_location_y = pred_location[:, :, 1, :, :]
pred_location_w = pred_location[:, :, 2, :, :]
pred_location_h = pred_location[:, :, 3, :, :]
loss_location_x = fluid.layers.sigmoid_cross_entropy_with_logits(pred_location_x, label_location_x)
loss_location_y = fluid.layers.sigmoid_cross_entropy_with_logits(pred_location_y, label_location_y)
loss_location_w = fluid.layers.abs(pred_location_w - label_location_w)
loss_location_h = fluid.layers.abs(pred_location_h - label_location_h)
loss_location = loss_location_x + loss_location_y + loss_location_w + loss_location_h

A loss representing the object's class, computed from pred_classification and label_classification:

loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(pred_classification, label_classification)

The preceding sections showed how to compute these predictions and labels, but one small problem remains: we have not yet marked which anchor boxes should have their objectness set to -1. To do that, we compute the IoU between every prediction box and the ground-truth boxes, and pick out the prediction boxes whose IoU exceeds the threshold. The implementation is as follows:

import numpy as np

# Select the prediction boxes whose IoU with a ground-truth box exceeds the threshold
def get_iou_above_thresh_inds(pred_box, gt_boxes, iou_threshold):
    batchsize = pred_box.shape[0]
    num_rows = pred_box.shape[1]
    num_cols = pred_box.shape[2]
    num_anchors = pred_box.shape[3]
    ret_inds = np.zeros([batchsize, num_rows, num_cols, num_anchors])
    for i in range(batchsize):
        pred_box_i = pred_box[i]
        gt_boxes_i = gt_boxes[i]
        for k in range(len(gt_boxes_i)):  #gt in gt_boxes_i:
            gt = gt_boxes_i[k]
            gtx_min = gt[0] - gt[2] / 2.
            gty_min = gt[1] - gt[3] / 2.
            gtx_max = gt[0] + gt[2] / 2.
            gty_max = gt[1] + gt[3] / 2.
            if (gtx_max - gtx_min < 1e-3) or (gty_max - gty_min < 1e-3):
                continue
            x1 = np.maximum(pred_box_i[:, :, :, 0], gtx_min)
            y1 = np.maximum(pred_box_i[:, :, :, 1], gty_min)
            x2 = np.minimum(pred_box_i[:, :, :, 2], gtx_max)
            y2 = np.minimum(pred_box_i[:, :, :, 3], gty_max)
            intersection = np.maximum(x2 - x1, 0.) * np.maximum(y2 - y1, 0.)
            s1 = (gty_max - gty_min) * (gtx_max - gtx_min)
            s2 = (pred_box_i[:, :, :, 2] - pred_box_i[:, :, :, 0]) * \
                 (pred_box_i[:, :, :, 3] - pred_box_i[:, :, :, 1])
            union = s2 + s1 - intersection
            iou = intersection / union
            above_inds = np.where(iou > iou_threshold)
            ret_inds[i][above_inds] = 1
    ret_inds = np.transpose(ret_inds, (0, 3, 1, 2))
    return ret_inds.astype('bool')

The function above identifies which anchor boxes should have their objectness labeled -1. The program below then processes label_objectness, marking as -1 those anchors whose IoU with a ground-truth box exceeds the threshold but which are not positive samples.

def label_objectness_ignore(label_objectness, iou_above_thresh_indices):
    # Note: we cannot simply write label_objectness[iou_above_thresh_indices] = -1,
    # because that could also flip points whose label_objectness is 1 to -1.
    # Only prediction boxes that are labeled 0 and whose IoU with a ground-truth
    # box exceeds the threshold should be marked -1.
    negative_indices = (label_objectness < 0.5)
    ignore_indices = negative_indices * iou_above_thresh_indices
    label_objectness[ignore_indices] = -1
    return label_objectness

The following program calls these two functions to set label_objectness to -1 for the relevant prediction boxes.

# Read data
reader = multithread_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(reader())
# Compute the labels for the anchor boxes
label_objectness, label_location, label_classification, scale_location = get_objectness_label(
    img, gt_boxes, gt_labels, iou_threshold=0.7,
    anchors=[116, 90, 156, 198, 373, 326], num_classes=7, downsample=32)

NUM_ANCHORS = 3
NUM_CLASSES = 7
num_filters = NUM_ANCHORS * (NUM_CLASSES + 5)
with fluid.dygraph.guard():
    backbone = DarkNet53_conv_body('yolov3_backbone', is_test=False)
    detection = YoloDetectionBlock('detection', channel=512, is_test=False)
    conv2d_pred = Conv2D('out_pred', num_filters=num_filters, filter_size=1)

    x = to_variable(img)
    C0, C1, C2 = backbone(x)
    route, tip = detection(C0)
    P0 = conv2d_pred(tip)

    # anchors holds the preset anchor-box sizes
    anchors = [116, 90, 156, 198, 373, 326]
    # downsample is the stride of feature map P0
    pred_boxes = get_yolo_box_xxyy(P0.numpy(), anchors, num_classes=7, downsample=32)
    iou_above_thresh_indices = get_iou_above_thresh_inds(pred_boxes, gt_boxes, iou_threshold=0.7)
    label_objectness = label_objectness_ignore(label_objectness, iou_above_thresh_indices)

label_objectness.shape

(2, 3, 10, 10)

In this way, samples that were not labeled as positive but still have a relatively large IoU with a ground-truth box get their objectness label set to -1, and they contribute nothing to any of the loss terms.
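To make the effect of the -1 label concrete, here is a minimal numpy sketch (the label and loss values are made up purely for illustration) of how a mask can exclude ignored anchors from the objectness loss; fluid.layers.sigmoid_cross_entropy_with_logits achieves the same effect internally through its ignore_index=-1 argument, as used in get_loss below.

import numpy as np

# Hypothetical objectness labels for 6 anchors:
# 1 = positive, 0 = negative, -1 = ignored (large IoU but not positive)
label_objectness = np.array([1, 0, -1, 0, -1, 1], dtype='float32')
# Made-up per-anchor loss values
per_anchor_loss = np.array([0.2, 0.7, 0.9, 0.4, 0.8, 0.1], dtype='float32')

# Anchors labeled -1 contribute nothing to the total objectness loss
valid_mask = (label_objectness != -1).astype('float32')
loss_objectness = np.sum(per_anchor_loss * valid_mask)
print(loss_objectness)  # 1.4: only the four anchors labeled 0 or 1 are counted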

The code to compute the total loss is as follows:

def get_loss(output, label_objectness, label_location, label_classification, scales,
             num_anchors=3, num_classes=7):
    # Reshape output from [N, C, H, W] to [N, NUM_ANCHORS, NUM_CLASSES + 5, H, W]
    reshaped_output = fluid.layers.reshape(
        output, [-1, num_anchors, num_classes + 5, output.shape[2], output.shape[3]])

    # Take the objectness-related predictions from output
    pred_objectness = reshaped_output[:, :, 4, :, :]
    loss_objectness = fluid.layers.sigmoid_cross_entropy_with_logits(
        pred_objectness, label_objectness, ignore_index=-1)
    ## Sum over dimensions 1, 2 and 3
    #loss_objectness = fluid.layers.reduce_sum(loss_objectness, dim=[1,2,3], keep_dim=False)

    # pos_samples is 1. at positive samples and 0. everywhere else
    pos_objectness = label_objectness > 0
    pos_samples = fluid.layers.cast(pos_objectness, 'float32')
    pos_samples.stop_gradient = True

    # Take all location-related predictions from output
    tx = reshaped_output[:, :, 0, :, :]
    ty = reshaped_output[:, :, 1, :, :]
    tw = reshaped_output[:, :, 2, :, :]
    th = reshaped_output[:, :, 3, :, :]

    # Take the labels for each coordinate from label_location
    dx_label = label_location[:, :, 0, :, :]
    dy_label = label_location[:, :, 1, :, :]
    tw_label = label_location[:, :, 2, :, :]
    th_label = label_location[:, :, 3, :, :]

    # Build the location loss
    loss_location_x = fluid.layers.sigmoid_cross_entropy_with_logits(tx, dx_label)
    loss_location_y = fluid.layers.sigmoid_cross_entropy_with_logits(ty, dy_label)
    loss_location_w = fluid.layers.abs(tw - tw_label)
    loss_location_h = fluid.layers.abs(th - th_label)

    # Total location loss
    loss_location = loss_location_x + loss_location_y + loss_location_h + loss_location_w
    # Multiply by scales
    loss_location = loss_location * scales
    # Only count the location loss of positive samples
    loss_location = loss_location * pos_samples

    # Take all class-related predictions from output
    pred_classification = reshaped_output[:, :, 5:5+num_classes, :, :]
    # Classification loss
    loss_classification = fluid.layers.sigmoid_cross_entropy_with_logits(
        pred_classification, label_classification)
    # Sum over dimension 2
    loss_classification = fluid.layers.reduce_sum(loss_classification, dim=2, keep_dim=False)
    # Only count the classification loss of samples with positive objectness
    loss_classification = loss_classification * pos_samples

    total_loss = loss_objectness + loss_location + loss_classification
    # Sum the loss over all prediction boxes
    total_loss = fluid.layers.reduce_sum(total_loss, dim=[1, 2, 3], keep_dim=False)
    # Average over all samples
    total_loss = fluid.layers.reduce_mean(total_loss)
    return total_loss

# Compute the loss function
# Read data
reader = multithread_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(reader())
# Compute the labels for the anchor boxes
label_objectness, label_location, label_classification, scale_location = get_objectness_label(
    img, gt_boxes, gt_labels, iou_threshold=0.7,
    anchors=[116, 90, 156, 198, 373, 326], num_classes=7, downsample=32)

NUM_ANCHORS = 3
NUM_CLASSES = 7
num_filters = NUM_ANCHORS * (NUM_CLASSES + 5)
with fluid.dygraph.guard():
    backbone = DarkNet53_conv_body('yolov3_backbone', is_test=False)
    detection = YoloDetectionBlock('detection', channel=512, is_test=False)
    conv2d_pred = Conv2D('out_pred', num_filters=num_filters, filter_size=1)

    x = to_variable(img)
    C0, C1, C2 = backbone(x)
    route, tip = detection(C0)
    P0 = conv2d_pred(tip)

    # anchors holds the preset anchor-box sizes
    anchors = [116, 90, 156, 198, 373, 326]
    # downsample is the stride of feature map P0
    pred_boxes = get_yolo_box_xxyy(P0.numpy(), anchors, num_classes=7, downsample=32)
    iou_above_thresh_indices = get_iou_above_thresh_inds(pred_boxes, gt_boxes, iou_threshold=0.7)
    label_objectness = label_objectness_ignore(label_objectness, iou_above_thresh_indices)

    label_objectness = to_variable(label_objectness)
    label_location = to_variable(label_location)
    label_classification = to_variable(label_classification)
    scales = to_variable(scale_location)
    label_objectness.stop_gradient = True
    label_location.stop_gradient = True
    label_classification.stop_gradient = True
    scales.stop_gradient = True

    total_loss = get_loss(P0, label_objectness, label_location, label_classification, scales,
                          num_anchors=NUM_ANCHORS, num_classes=NUM_CLASSES)
    total_loss_data = total_loss.numpy()

total_loss_data

array([623.6282], dtype=float32)

The program above computes the total loss. At this point the reader has seen most of the YOLO-V3 algorithm: generating anchor boxes, labeling them, extracting features with a convolutional neural network, associating the output feature map with the prediction boxes, and building the loss function.

Multi-Scale Detection

So far we have computed the loss only on feature map P0, whose stride is 32. That feature map is small, has few pixels, and each pixel has a large receptive field with rich high-level semantic information, so it tends to detect larger objects well. To detect smaller objects, we need to produce predictions on larger feature maps. But if we generate predictions directly on feature maps at the level of C2 or C1, a new problem appears: they have not gone through enough feature extraction, their pixels carry less semantic information, and it may be hard to extract effective feature patterns from them. In object detection, the usual solution is to enlarge the high-level feature maps and fuse them with the lower-level feature maps; the resulting feature maps both contain rich semantic information and have more pixels, so they can describe finer structures.
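As a minimal sketch of this fusion step (the shapes are hypothetical, chosen only to illustrate the arithmetic), nearest-neighbor upsampling doubles the spatial size of a high-level map so that it can be concatenated with the next lower-level map along the channel dimension:

import numpy as np

# Hypothetical feature maps in [N, C, H, W] layout:
# r1 is a high-level map, c2 the next lower-level map
r1 = np.zeros([1, 256, 20, 20], dtype='float32')
c2 = np.zeros([1, 256, 40, 40], dtype='float32')

# Nearest-neighbor upsampling with scale 2: repeat each pixel along H and W
r1_up = r1.repeat(2, axis=2).repeat(2, axis=3)   # shape becomes [1, 256, 40, 40]

# Channel-wise concatenation produces the fused map for the next prediction level
fused = np.concatenate([r1_up, c2], axis=1)      # shape becomes [1, 512, 40, 40]
print(r1_up.shape, fused.shape)

The Upsample class defined later performs the same spatial enlargement with fluid.layers.resize_nearest.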

The concrete network structure is shown in Figure 19:


Figure 19: Generating the multi-level output feature maps P0, P1 and P2

YOLO-V3 generates 3 anchor boxes at the center of each region. The anchor sizes used on the three feature-map levels are: P2 [(10×13), (16×30), (33×23)]; P1 [(30×61), (62×45), (59×119)]; P0 [(116×90), (156×198), (373×326)]. Feature maps further back use larger anchors and can capture information about large objects; feature maps further forward use smaller anchors and can capture information about small objects.
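As a small sketch, this is how an anchor mask picks each level's sizes out of the flat anchor list; the same indexing appears later in get_pred:

# Flat list of 9 anchor sizes: [w0, h0, w1, h1, ..., w8, h8]
anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]   # for P0, P1, P2 respectively

for level, mask in zip(['P0', 'P1', 'P2'], anchor_masks):
    sizes = [(anchors[2 * m], anchors[2 * m + 1]) for m in mask]
    print(level, sizes)
# P0 [(116, 90), (156, 198), (373, 326)]
# P1 [(30, 61), (62, 45), (59, 119)]
# P2 [(10, 13), (16, 30), (33, 23)]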

Because detection now happens at multiple scales, the code above would need substantial modification and the implementation becomes rather tedious, so we recommend using the API Paddle provides, fluid.layers.yolov3_loss, directly. Its signature is:

fluid.layers.yolov3_loss(x, gt_box, gt_label, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gt_score=None, use_label_smooth=True, name=None)

x: a network output feature map (one of P0, P1, P2 below)

gt_box: ground-truth boxes

gt_label: ground-truth box labels

anchors: sizes of all anchors used, e.g. [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]

anchor_mask: mask of the anchors used at each level, [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

class_num: number of object classes, 7 for the AI insect-recognition dataset

ignore_thresh: prediction boxes whose IoU with a ground-truth box exceeds ignore_thresh are not treated as negative samples; set to 0.7 in the YOLO-V3 model

downsample_ratio: the downsampling ratio of feature map P0, 32 when using the DarkNet53 backbone

gt_score: confidence of the ground-truth boxes, used with the mixup trick

use_label_smooth: a training trick; set it to False if unused

name: the layer's name, e.g. 'yolov3_loss'; optional

The concrete implementation of producing prediction boxes from multi-level feature maps is as follows:

# Define the upsampling module
class Upsample(fluid.dygraph.Layer):
    def __init__(self, name_scope, scale=2):
        super(Upsample, self).__init__(name_scope)
        self.scale = scale

    def forward(self, inputs):
        # get dynamic upsample output shape
        shape_nchw = fluid.layers.shape(inputs)
        shape_hw = fluid.layers.slice(shape_nchw, axes=[0], starts=[2], ends=[4])
        shape_hw.stop_gradient = True
        in_shape = fluid.layers.cast(shape_hw, dtype='int32')
        out_shape = in_shape * self.scale
        out_shape.stop_gradient = True

        # resize by actual_shape
        out = fluid.layers.resize_nearest(
            input=inputs, scale=self.scale, actual_shape=out_shape)
        return out

# Define the YOLO-V3 model
class YOLOv3(fluid.dygraph.Layer):
    def __init__(self, name_scope, num_classes=7, is_train=True):
        super(YOLOv3, self).__init__(name_scope)
        self.is_train = is_train
        self.num_classes = num_classes
        # Backbone that extracts image features
        self.block = DarkNet53_conv_body(self.full_name(), is_test=not self.is_train)
        self.block_outputs = []
        self.yolo_blocks = []
        self.route_blocks_2 = []
        # Generate feature maps at 3 levels: P0, P1, P2
        for i in range(3):
            # Add the module that produces ri and ti from ci
            yolo_block = self.add_sublayer(
                "yolo_detecton_block_%d" % (i),
                YoloDetectionBlock(self.full_name(),
                                   channel=512 // (2 ** i),
                                   is_test=not self.is_train))
            self.yolo_blocks.append(yolo_block)

            num_filters = 3 * (self.num_classes + 5)
            # Add the module that produces pi from ti: a Conv2D whose number of
            # output channels is 3 * (num_classes + 5)
            block_out = self.add_sublayer(
                "block_out_%d" % (i),
                Conv2D(self.full_name(),
                       num_filters=num_filters,
                       filter_size=1,
                       stride=1,
                       padding=0,
                       act=None,
                       param_attr=ParamAttr(
                           initializer=fluid.initializer.Normal(0., 0.02)),
                       bias_attr=ParamAttr(
                           initializer=fluid.initializer.Constant(0.0),
                           regularizer=L2Decay(0.))))
            self.block_outputs.append(block_out)
            if i < 2:
                # Convolve ri
                route = self.add_sublayer(
                    "route2_%d" % i,
                    ConvBNLayer(self.full_name(),
                                ch_out=256 // (2 ** i),
                                filter_size=1,
                                stride=1,
                                padding=0,
                                is_test=(not self.is_train)))
                self.route_blocks_2.append(route)
            # Enlarge ri so it matches the size of c_{i+1}
            self.upsample = Upsample(self.full_name())

    def forward(self, inputs):
        outputs = []
        blocks = self.block(inputs)
        for i, block in enumerate(blocks):
            if i > 0:
                # Concatenate the convolved and upsampled r_{i-1} with this level's ci
                block = fluid.layers.concat(input=[route, block], axis=1)
            # Produce ti and ri from ci
            route, tip = self.yolo_blocks[i](block)
            # Produce pi from ti
            block_out = self.block_outputs[i](tip)
            # Collect pi in the output list
            outputs.append(block_out)
            if i < 2:
                # Convolve ri to adjust the number of channels
                route = self.route_blocks_2[i](route)
                # Enlarge ri so its size matches c_{i+1}
                route = self.upsample(route)
        return outputs

    def get_loss(self, outputs, gtbox, gtlabel, gtscore=None,
                 anchors=[10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119,
                          116, 90, 156, 198, 373, 326],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 ignore_thresh=0.7,
                 use_label_smooth=False):
        """
        Compute the loss directly with fluid.layers.yolov3_loss; simpler and faster
        """
        self.losses = []
        downsample = 32
        for i, out in enumerate(outputs):
            # Compute the loss for each of the three levels
            anchor_mask_i = anchor_masks[i]
            loss = fluid.layers.yolov3_loss(
                x=out,                        # out is one of P0, P1, P2
                gt_box=gtbox,                 # ground-truth box coordinates
                gt_label=gtlabel,             # ground-truth box classes
                gt_score=gtscore,             # ground-truth box scores; needed for the mixup trick, otherwise set to 1, same shape as gtlabel
                anchors=anchors,              # anchor sizes, [w0, h0, w1, h1, ..., w8, h8] for all 9 anchors
                anchor_mask=anchor_mask_i,    # mask selecting this level's anchors; e.g. anchor_mask_i=[3, 4, 5] picks anchors 3, 4 and 5 for this level
                class_num=self.num_classes,   # number of classes
                ignore_thresh=ignore_thresh,  # when pred-gt IoU > ignore_thresh, mark objectness = -1
                downsample_ratio=downsample,  # how much the feature map shrinks the input: 32 for P0, 16 for P1, 8 for P2
                use_label_smooth=False)       # for the label_smooth trick; unused here, so False
            self.losses.append(fluid.layers.reduce_mean(loss))  # reduce_mean sums the loss within each image
            downsample = downsample // 2      # the next level's downsampling ratio halves
        return sum(self.losses)               # sum over the levels

Launching End-to-End Training

The training flow is shown in the figure below. The input image passes through feature extraction to produce three levels of output feature maps, P0 (stride=32), P1 (stride=16) and P2 (stride=8); small square regions of the corresponding sizes are used to generate the matching anchor boxes and prediction boxes, and the anchor boxes are then labeled.

For the P0-level feature map, $32\times32$ squares are used, and three anchor boxes of sizes $[116, 90]$, $[156, 198]$ and $[373, 326]$ are generated at each region's center.

For the P1-level feature map, $16\times16$ squares are used, and three anchor boxes of sizes $[30, 61]$, $[62, 45]$ and $[59, 119]$ are generated at each region's center.

For the P2-level feature map, $8\times8$ squares are used, and three anchor boxes of sizes $[10, 13]$, $[16, 30]$ and $[33, 23]$ are generated at each region's center.

The three levels of feature maps are associated with the labels of their corresponding anchor boxes, and a loss function is built for each; the total loss equals the sum of the three levels' losses. By minimizing this loss, end-to-end training can be launched.


Figure 20: The end-to-end training pipeline

The concrete implementation of the training loop is as follows:

import time
import os
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph import to_variable

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
IGNORE_THRESH = .7
NUM_CLASSES = 7

def get_lr(base_lr=0.0001, lr_decay=0.1):
    bd = [10000, 20000]
    lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay]
    learning_rate = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
    return learning_rate

if __name__ == '__main__':
    TRAINDIR = '/home/aistudio/work/insects/train'
    TESTDIR = '/home/aistudio/work/insects/test'
    VALIDDIR = '/home/aistudio/work/insects/val'

    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=True)  # build the model
        learning_rate = get_lr()
        opt = fluid.optimizer.Momentum(
            learning_rate=learning_rate,
            momentum=0.9,
            regularization=fluid.regularizer.L2Decay(0.0005))  # build the optimizer

        train_loader = multithread_loader(TRAINDIR, batch_size=10, mode='train')  # training data reader
        valid_loader = multithread_loader(VALIDDIR, batch_size=10, mode='valid')  # validation data reader

        MAX_EPOCH = 200
        for epoch in range(MAX_EPOCH):
            for i, data in enumerate(train_loader()):
                img, gt_boxes, gt_labels, img_scale = data
                gt_scores = np.ones(gt_labels.shape).astype('float32')
                gt_scores = to_variable(gt_scores)
                img = to_variable(img)
                gt_boxes = to_variable(gt_boxes)
                gt_labels = to_variable(gt_labels)
                outputs = model(img)  # forward pass, outputs [P0, P1, P2]
                loss = model.get_loss(outputs, gt_boxes, gt_labels, gtscore=gt_scores,
                                      anchors=ANCHORS,
                                      anchor_masks=ANCHOR_MASKS,
                                      ignore_thresh=IGNORE_THRESH,
                                      use_label_smooth=False)  # compute the loss
                loss.backward()     # backward pass to compute gradients
                opt.minimize(loss)  # update the parameters
                model.clear_gradients()
                if i % 1 == 0:
                    timestring = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
                    print('{}[TRAIN]epoch {}, iter {}, output loss: {}'.format(timestring, epoch, i, loss.numpy()))

            # save params of model
            if (epoch % 5 == 0) or (epoch == MAX_EPOCH - 1):
                fluid.save_dygraph(model.state_dict(), 'yolo_epoch{}'.format(epoch))

            # evaluate on the validation set after each epoch
            model.eval()
            for i, data in enumerate(valid_loader()):
                img, gt_boxes, gt_labels, img_scale = data
                gt_scores = np.ones(gt_labels.shape).astype('float32')
                gt_scores = to_variable(gt_scores)
                img = to_variable(img)
                gt_boxes = to_variable(gt_boxes)
                gt_labels = to_variable(gt_labels)
                outputs = model(img)
                loss = model.get_loss(outputs, gt_boxes, gt_labels, gtscore=gt_scores,
                                      anchors=ANCHORS,
                                      anchor_masks=ANCHOR_MASKS,
                                      ignore_thresh=IGNORE_THRESH,
                                      use_label_smooth=False)
                if i % 1 == 0:
                    timestring = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
                    print('{}[VALID]epoch {}, iter {}, output loss: {}'.format(timestring, epoch, i, loss.numpy()))
            model.train()

Prediction

The prediction pipeline is shown in Figure 21:


Figure 21: The prediction pipeline

Prediction consists of two steps:

1. Compute the prediction boxes' positions and their per-class scores from the network output.

2. Use non-maximum suppression to eliminate prediction boxes that overlap heavily.

For step 1, earlier sections already showed how to compute pred_objectness_probability, pred_boxes and pred_classification_probability from the network output; here we recommend using fluid.layers.yolo_box directly. Its usage is:

fluid.layers.yolo_box(x, img_size, anchors, class_num, conf_thresh, downsample_ratio, name=None)

x: a network output feature map, e.g. P0, P1 or P2 above

img_size: size of the input image

anchors: the anchor sizes used, e.g. [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]; in get_pred below, only the current level's anchors are passed in

class_num: number of object classes

conf_thresh: confidence threshold; the position values of prediction boxes scoring below it are not computed and are simply set to 0.0

downsample_ratio: downsampling ratio of the feature map, e.g. 32 for P0, 16 for P1, 8 for P2

name=None: a name, e.g. 'yolo_box'

It returns two values, boxes and scores: boxes holds the coordinates of all prediction boxes, and scores holds their scores.

A prediction box's score is defined as the probability of its predicted class multiplied by the objectness probability that the box contains an object:

$$score = P_{obj} \cdot P_{classification}$$
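A quick numpy sketch of this product with made-up numbers (yolo_box computes this internally):

import numpy as np

p_obj = np.array([0.9, 0.3])           # objectness probabilities for 2 boxes (made up)
p_cls = np.array([[0.8, 0.1],
                  [0.5, 0.4]])         # per-class probabilities for 2 classes (made up)

score = p_obj[:, None] * p_cls         # score = P_obj * P_classification
print(score)                           # [[0.72 0.09]
                                       #  [0.15 0.12]]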

Add a function get_pred to the YOLOv3 class defined above: by calling fluid.layers.yolo_box it obtains the prediction boxes and scores for the three feature maps P0, P1 and P2 and concatenates them, giving all prediction boxes together with their scores for each class.

class YOLOv3(fluid.dygraph.Layer):
    def __init__(self, name_scope, num_classes=7, is_train=True):
        super(YOLOv3, self).__init__(name_scope)
        self.is_train = is_train
        self.num_classes = num_classes
        # Backbone that extracts image features
        self.block = DarkNet53_conv_body(self.full_name(), is_test=not self.is_train)
        self.block_outputs = []
        self.yolo_blocks = []
        self.route_blocks_2 = []
        for i in range(3):
            # Add the module that produces ri and ti from ci
            yolo_block = self.add_sublayer(
                "yolo_detecton_block_%d" % (i),
                YoloDetectionBlock(self.full_name(),
                                   channel=512 // (2 ** i),
                                   is_test=not self.is_train))
            self.yolo_blocks.append(yolo_block)

            num_filters = 3 * (self.num_classes + 5)
            # Add the module that produces pi from ti: a Conv2D whose number of
            # output channels is 3 * (num_classes + 5)
            block_out = self.add_sublayer(
                "block_out_%d" % (i),
                Conv2D(self.full_name(),
                       num_filters=num_filters,
                       filter_size=1,
                       stride=1,
                       padding=0,
                       act=None,
                       param_attr=ParamAttr(
                           initializer=fluid.initializer.Normal(0., 0.02)),
                       bias_attr=ParamAttr(
                           initializer=fluid.initializer.Constant(0.0),
                           regularizer=L2Decay(0.))))
            self.block_outputs.append(block_out)
            if i < 2:
                # Convolve ri
                route = self.add_sublayer(
                    "route2_%d" % i,
                    ConvBNLayer(self.full_name(),
                                ch_out=256 // (2 ** i),
                                filter_size=1,
                                stride=1,
                                padding=0,
                                is_test=(not self.is_train)))
                self.route_blocks_2.append(route)
            # Enlarge ri so it matches the size of c_{i+1}
            self.upsample = Upsample(self.full_name())

    def forward(self, inputs):
        outputs = []
        blocks = self.block(inputs)
        for i, block in enumerate(blocks):
            if i > 0:
                # Concatenate the convolved and upsampled r_{i-1} with this level's ci
                block = fluid.layers.concat(input=[route, block], axis=1)
            # Produce ti and ri from ci
            route, tip = self.yolo_blocks[i](block)
            # Produce pi from ti
            block_out = self.block_outputs[i](tip)
            # Collect pi in the output list
            outputs.append(block_out)
            if i < 2:
                # Convolve ri to adjust the number of channels
                route = self.route_blocks_2[i](route)
                # Enlarge ri so its size matches c_{i+1}
                route = self.upsample(route)
        return outputs

    def get_loss(self, outputs, gtbox, gtlabel, gtscore=None,
                 anchors=[10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119,
                          116, 90, 156, 198, 373, 326],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 ignore_thresh=0.7,
                 use_label_smooth=False):
        self.losses = []
        downsample = 32
        for i, out in enumerate(outputs):
            anchor_mask_i = anchor_masks[i]
            loss = fluid.layers.yolov3_loss(
                x=out,
                gt_box=gtbox,
                gt_label=gtlabel,
                gt_score=gtscore,
                anchors=anchors,
                anchor_mask=anchor_mask_i,
                class_num=self.num_classes,
                ignore_thresh=ignore_thresh,
                downsample_ratio=downsample,
                use_label_smooth=False)
            self.losses.append(fluid.layers.reduce_mean(loss))
            downsample = downsample // 2
        return sum(self.losses)

    def get_pred(self, outputs, im_shape=None,
                 anchors=[10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119,
                          116, 90, 156, 198, 373, 326],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 valid_thresh=0.01):
        downsample = 32
        total_boxes = []
        total_scores = []
        for i, out in enumerate(outputs):
            anchor_mask = anchor_masks[i]
            anchors_this_level = []
            for m in anchor_mask:
                anchors_this_level.append(anchors[2 * m])
                anchors_this_level.append(anchors[2 * m + 1])

            boxes, scores = fluid.layers.yolo_box(
                x=out,
                img_size=im_shape,
                anchors=anchors_this_level,
                class_num=self.num_classes,
                conf_thresh=valid_thresh,
                downsample_ratio=downsample,
                name="yolo_box" + str(i))
            total_boxes.append(boxes)
            total_scores.append(fluid.layers.transpose(scores, perm=[0, 2, 1]))
            downsample = downsample // 2

        yolo_boxes = fluid.layers.concat(total_boxes, axis=1)
        yolo_scores = fluid.layers.concat(total_scores, axis=2)
        return yolo_boxes, yolo_scores

Step 1 produces multiple prediction boxes in every small square region, and many of the output boxes overlap heavily; the redundant overlapping boxes need to be removed.

The prediction boxes in the example code below were produced by running the model on an image; 11 boxes were selected and are drawn on the image. Several prediction boxes appear around each person, so the redundant ones must be eliminated to obtain the final result.

# Draw the object bounding boxes on the image
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Define a function that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor='k', facecolor='y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained via plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: border color
    # facecolor: fill color
    # fill: whether to fill
    # linestyle: border line style
    # patches.Rectangle takes the top-left corner coordinates and the width and height of the rectangle
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0] + 1, bbox[3] - bbox[1] + 1,
                             linewidth=1,
                             edgecolor=edgecolor, facecolor=facecolor,
                             fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))
filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)
currentAxis = plt.gca()

# Prediction box positions
boxes = np.array([[4.21716537e+01, 1.28230896e+02, 2.26547668e+02, 6.00434631e+02],
                  [3.18562988e+02, 1.23168472e+02, 4.79000000e+02, 6.05688416e+02],
                  [2.62704697e+01, 1.39430557e+02, 2.20587097e+02, 6.38959656e+02],
                  [4.24965363e+01, 1.42706665e+02, 2.25955185e+02, 6.35671204e+02],
                  [2.37462646e+02, 1.35731537e+02, 4.79000000e+02, 6.31451294e+02],
                  [3.19390472e+02, 1.29295090e+02, 4.79000000e+02, 6.33003845e+02],
                  [3.28933838e+02, 1.22736115e+02, 4.79000000e+02, 6.39000000e+02],
                  [4.44292603e+01, 1.70438187e+02, 2.26841858e+02, 6.39000000e+02],
                  [2.17988785e+02, 3.02472412e+02, 4.06062927e+02, 6.29106628e+02],
                  [2.00241089e+02, 3.23755096e+02, 3.96929321e+02, 6.36386108e+02],
                  [2.14310303e+02, 3.23443665e+02, 4.06732849e+02, 6.35775269e+02]])

# Prediction box scores
scores = np.array([0.5247661, 0.51759845, 0.86075854, 0.9910175, 0.39170712, 0.9297706,
                   0.5115228, 0.270992, 0.19087596, 0.64201415, 0.879036])

# Draw all prediction boxes
for box in boxes:
    draw_rectangle(currentAxis, box)


Here we use non-maximum suppression (NMS) to eliminate redundant boxes. The basic idea: if several prediction boxes all correspond to the same object, keep only the one with the highest score and discard the rest. How do we judge that two boxes correspond to the same object, and what criterion should we use? If two prediction boxes have the same class and their positions overlap heavily, we can assume they are predicting the same target. NMS therefore selects the highest-scoring box of a given class, finds all boxes whose IoU with it exceeds a threshold, and discards them. This IoU threshold is a hyperparameter that must be set in advance; the YOLO-V3 model sets it to 0.5.

For example, in the program above boxes holds 11 prediction boxes, and scores gives their scores for the class "person".

Step0: create the keep list, keep_list = []

Step1: sort the boxes by score, remain_list = [3, 5, 10, 2, 9, 0, 1, 6, 4, 7, 8]

Step2: select boxes[3]; keep_list is empty, so no IoU needs to be computed and the box goes straight in: keep_list = [3], remain_list = [5, 10, 2, 9, 0, 1, 6, 4, 7, 8]

Step3: select boxes[5]; keep_list already holds boxes[3], and IoU(boxes[3], boxes[5]) = 0.0, clearly below the threshold, so keep_list = [3, 5], remain_list = [10, 2, 9, 0, 1, 6, 4, 7, 8]

Step4: select boxes[10]; with keep_list = [3, 5], IoU(boxes[3], boxes[10]) = 0.0268 and IoU(boxes[5], boxes[10]) = 0.24 are both below the threshold, so keep_list = [3, 5, 10], remain_list = [2, 9, 0, 1, 6, 4, 7, 8]

Step5: select boxes[2]; with keep_list = [3, 5, 10], IoU(boxes[3], boxes[2]) = 0.88 exceeds the threshold, so boxes[2] is discarded: keep_list = [3, 5, 10], remain_list = [9, 0, 1, 6, 4, 7, 8]

Step6: select boxes[9]; with keep_list = [3, 5, 10], IoU(boxes[3], boxes[9]) = 0.0577 and IoU(boxes[5], boxes[9]) = 0.205, but IoU(boxes[10], boxes[9]) = 0.88 exceeds the threshold, so boxes[9] is discarded: keep_list = [3, 5, 10], remain_list = [0, 1, 6, 4, 7, 8]

Step7: repeat Step6 until remain_list is empty

In the end keep_list = [3, 5, 10]: prediction boxes 3, 5 and 10 are the ones finally selected, as shown in the figure below.

# Draw the object bounding boxes on the image
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Define a function that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor='k', facecolor='y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained via plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: border color
    # facecolor: fill color
    # fill: whether to fill
    # linestyle: border line style
    # patches.Rectangle takes the top-left corner coordinates and the width and height of the rectangle
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0] + 1, bbox[3] - bbox[1] + 1,
                             linewidth=1,
                             edgecolor=edgecolor, facecolor=facecolor,
                             fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))
filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)
currentAxis = plt.gca()

boxes = np.array([[4.21716537e+01, 1.28230896e+02, 2.26547668e+02, 6.00434631e+02],
                  [3.18562988e+02, 1.23168472e+02, 4.79000000e+02, 6.05688416e+02],
                  [2.62704697e+01, 1.39430557e+02, 2.20587097e+02, 6.38959656e+02],
                  [4.24965363e+01, 1.42706665e+02, 2.25955185e+02, 6.35671204e+02],
                  [2.37462646e+02, 1.35731537e+02, 4.79000000e+02, 6.31451294e+02],
                  [3.19390472e+02, 1.29295090e+02, 4.79000000e+02, 6.33003845e+02],
                  [3.28933838e+02, 1.22736115e+02, 4.79000000e+02, 6.39000000e+02],
                  [4.44292603e+01, 1.70438187e+02, 2.26841858e+02, 6.39000000e+02],
                  [2.17988785e+02, 3.02472412e+02, 4.06062927e+02, 6.29106628e+02],
                  [2.00241089e+02, 3.23755096e+02, 3.96929321e+02, 6.36386108e+02],
                  [2.14310303e+02, 3.23443665e+02, 4.06732849e+02, 6.35775269e+02]])

scores = np.array([0.5247661, 0.51759845, 0.86075854, 0.9910175, 0.39170712, 0.9297706,
                   0.5115228, 0.270992, 0.19087596, 0.64201415, 0.879036])

left_ind = np.where((boxes[:, 0] < 60) * (boxes[:, 0] > 20))
left_boxes = boxes[left_ind]
left_scores = scores[left_ind]

colors = ['r', 'g', 'b', 'k']

# Draw the prediction boxes that are finally kept
inds = [3, 5, 10]
for i in range(3):
    box = boxes[inds[i]]
    draw_rectangle(currentAxis, box, edgecolor=colors[i])

The concrete implementation of non-maximum suppression is the nms function defined below. Since the dataset contains objects of several classes, multi-class non-maximum suppression is needed; the principle is the same, except that NMS is performed separately for each class, as implemented in multiclass_nms below.

# Non-maximum suppression
def nms(bboxes, scores, score_thresh, nms_thresh, pre_nms_topk, i=0, c=0):
    """
    nms
    """
    inds = np.argsort(scores)
    inds = inds[::-1]
    keep_inds = []
    while len(inds) > 0:
        cur_ind = inds[0]
        cur_score = scores[cur_ind]
        # if score of the box is less than score_thresh, just drop it
        if cur_score < score_thresh:
            break

        keep = True
        for ind in keep_inds:
            current_box = bboxes[cur_ind]
            remain_box = bboxes[ind]
            iou = box_iou_xyxy(current_box, remain_box)
            if iou > nms_thresh:
                keep = False
                break
        if i == 0 and c == 4 and cur_ind == 951:
            print('suppressed, ', keep, i, c, cur_ind, ind, iou)
        if keep:
            keep_inds.append(cur_ind)
        inds = inds[1:]
    return np.array(keep_inds)

# Multi-class non-maximum suppression
def multiclass_nms(bboxes, scores, score_thresh=0.01, nms_thresh=0.45,
                   pre_nms_topk=1000, pos_nms_topk=100):
    """
    This is for multiclass_nms
    """
    batch_size = bboxes.shape[0]
    class_num = scores.shape[1]
    rets = []
    for i in range(batch_size):
        bboxes_i = bboxes[i]
        scores_i = scores[i]
        ret = []
        for c in range(class_num):
            scores_i_c = scores_i[c]
            keep_inds = nms(bboxes_i, scores_i_c, score_thresh, nms_thresh, pre_nms_topk, i=i, c=c)
            if len(keep_inds) < 1:
                continue
            keep_bboxes = bboxes_i[keep_inds]
            keep_scores = scores_i_c[keep_inds]
            keep_results = np.zeros([keep_scores.shape[0], 6])
            keep_results[:, 0] = c
            keep_results[:, 1] = keep_scores[:]
            keep_results[:, 2:6] = keep_bboxes[:, :]
            ret.append(keep_results)
        if len(ret) < 1:
            rets.append(ret)
            continue
        ret_i = np.concatenate(ret, axis=0)
        scores_i = ret_i[:, 1]
        if len(scores_i) > pos_nms_topk:
            inds = np.argsort(scores_i)[::-1]
            inds = inds[:pos_nms_topk]
            ret_i = ret_i[inds]
        rets.append(ret_i)
    return rets
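The nms function above relies on box_iou_xyxy, which was defined in an earlier installment of this series. For reference, here is a minimal reconstruction consistent with how nms calls it, computing the IoU of two [x1, y1, x2, y2] boxes (the +1.0 terms follow the pixel-inclusive convention used by draw_rectangle above; this is a sketch, not necessarily the exact code from the earlier lecture):

import numpy as np

def box_iou_xyxy(box1, box2):
    # Areas of the two boxes
    s1 = (box1[2] - box1[0] + 1.0) * (box1[3] - box1[1] + 1.0)
    s2 = (box2[2] - box2[0] + 1.0) * (box2[3] - box2[1] + 1.0)
    # Coordinates of the intersection rectangle
    xmin = np.maximum(box1[0], box2[0])
    ymin = np.maximum(box1[1], box2[1])
    xmax = np.minimum(box1[2], box2[2])
    ymax = np.minimum(box1[3], box2[3])
    # Clamp at zero so disjoint boxes get intersection 0
    intersection = np.maximum(xmax - xmin + 1.0, 0.) * np.maximum(ymax - ymin + 1.0, 0.)
    union = s1 + s2 - intersection
    return intersection / union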

Below is the complete test program; its outputs on the test set are saved to the file pred_results.json.

import json
from paddle.fluid.dygraph import to_variable

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
VALID_THRESH = 0.01
NMS_TOPK = 400
NMS_POSK = 100
NMS_THRESH = 0.45
NUM_CLASSES = 7

if __name__ == '__main__':
    TRAINDIR = '/home/aistudio/work/insects/train/images'
    TESTDIR = '/home/aistudio/work/insects/test/images'
    VALIDDIR = '/home/aistudio/work/insects/val'

    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=False)
        params_file_path = '/home/aistudio/work/yolo_epoch50'
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)
        model.eval()

        total_results = []
        test_loader = test_data_loader(TESTDIR, batch_size=1, mode='test')
        for i, data in enumerate(test_loader()):
            img_name, img_data, img_scale_data = data
            img = to_variable(img_data)
            img_scale = to_variable(img_scale_data)

            outputs = model.forward(img)
            bboxes, scores = model.get_pred(outputs,
                                            im_shape=img_scale,
                                            anchors=ANCHORS,
                                            anchor_masks=ANCHOR_MASKS,
                                            valid_thresh=VALID_THRESH)

            bboxes_data = bboxes.numpy()
            scores_data = scores.numpy()
            result = multiclass_nms(bboxes_data, scores_data,
                                    score_thresh=VALID_THRESH,
                                    nms_thresh=NMS_THRESH,
                                    pre_nms_topk=NMS_TOPK,
                                    pos_nms_topk=NMS_POSK)
            for j in range(len(result)):
                result_j = result[j]
                img_name_j = img_name[j]
                total_results.append([img_name_j, result_j.tolist()])
            print('processed {} pictures'.format(len(total_results)))

        print('')
        json.dump(total_results, open('pred_results.json', 'w'))

The json file stores the test results as a list of per-image predictions, structured as follows:

[[img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]],
 [img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]],
 ...
 [img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]]]

Each element of the list is the prediction result for one image, and the list's total length equals the number of images. The format of each image's result is:

[img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]]

The first element is the image name img_name; the second is the list of all prediction boxes for that image:

[[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]

Each element [label, score, x1, y1, x2, y2] in the box list describes one prediction box: label is the box's class label, score its score, (x1, y1) its top-left corner and (x2, y2) its bottom-right corner. An image may have many prediction boxes; all of them go into its box list.
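A minimal sketch of reading this file back and walking the structure (assuming pred_results.json was produced by the test program above):

import json

with open('pred_results.json') as f:
    total_results = json.load(f)

for img_name, pred_boxes in total_results[:3]:   # look at the first 3 images
    print(img_name, 'has', len(pred_boxes), 'prediction boxes')
    for label, score, x1, y1, x2, y2 in pred_boxes:
        if score > 0.5:                          # only show confident boxes
            print('  class', int(label), 'score %.2f' % score, 'box', (x1, y1, x2, y2))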

In the basic version of the AI insect-recognition competition, the instructors provide code for computing the mAP metric; feeding it this pred_results.json file yields the final evaluation score.

Model Results and Visualization

The program above showed how to read the test-set images and save the final results in a json file. To give readers a more intuitive sense of the model, the program below adds code that reads a single image and draws the prediction boxes produced for it.

Build a data reader that loads a single image:

# Read a single test image
import os
import cv2
import numpy as np

def single_image_data_loader(filename, test_image_size=608, mode='test'):
    """
    Load the image used for testing; test data carries no ground-truth labels
    """
    batch_size = 1
    def reader():
        batch_data = []
        img_size = test_image_size
        file_path = os.path.join(filename)
        img = cv2.imread(file_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        H = img.shape[0]
        W = img.shape[1]
        img = cv2.resize(img, (img_size, img_size))

        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        mean = np.array(mean).reshape((1, 1, -1))
        std = np.array(std).reshape((1, 1, -1))
        out_img = (img / 255.0 - mean) / std
        out_img = out_img.astype('float32').transpose((2, 0, 1))
        img = out_img  #np.transpose(out_img, (2,0,1))
        im_shape = [H, W]

        batch_data.append((image_name.split('.')[0], img, im_shape))
        if len(batch_data) == batch_size:
            yield make_test_array(batch_data)
            batch_data = []

    return reader

The plotting functions that draw the prediction boxes are defined below.

# Define the plotting functions
INSECT_NAMES = ['Boerner', 'Leconte', 'Linnaeus', 'acuminatus', 'armandi', 'coleoptera', 'linnaeus']

# Define a function that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor='k', facecolor='y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained via plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: border color
    # facecolor: fill color
    # fill: whether to fill
    # linestyle: border line style
    # patches.Rectangle takes the top-left corner coordinates and the width and height of the rectangle
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0] + 1, bbox[3] - bbox[1] + 1,
                             linewidth=1,
                             edgecolor=edgecolor, facecolor=facecolor,
                             fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

# Define a function that draws the prediction results
def draw_results(result, filename, draw_thresh=0.5):
    plt.figure(figsize=(10, 10))
    im = imread(filename)
    plt.imshow(im)
    currentAxis = plt.gca()
    colors = ['r', 'g', 'b', 'k', 'y', 'c', 'purple']
    for item in result:
        box = item[2:6]
        label = int(item[0])
        name = INSECT_NAMES[label]
        if item[1] > draw_thresh:
            draw_rectangle(currentAxis, box, edgecolor=colors[label])
            plt.text(box[0], box[1], name, fontsize=12, color=colors[label])

Use the single_image_data_loader function defined above to read the specified image, feed it through the network to compute the prediction boxes and scores, and then apply multi-class non-maximum suppression to remove redundant boxes. The final result is plotted.

import json
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph import to_variable

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
VALID_THRESH = 0.01
NMS_TOPK = 400
NMS_POSK = 100
NMS_THRESH = 0.45
NUM_CLASSES = 7

if __name__ == '__main__':
    image_name = '/home/aistudio/work/insects/test/images/2599.jpeg'
    params_file_path = '/home/aistudio/work/yolo_epoch50'

    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=False)
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)
        model.eval()

        total_results = []
        test_loader = single_image_data_loader(image_name, mode='test')
        for i, data in enumerate(test_loader()):
            img_name, img_data, img_scale_data = data
            img = to_variable(img_data)
            img_scale = to_variable(img_scale_data)

            outputs = model.forward(img)
            bboxes, scores = model.get_pred(outputs,
                                            im_shape=img_scale,
                                            anchors=ANCHORS,
                                            anchor_masks=ANCHOR_MASKS,
                                            valid_thresh=VALID_THRESH)

            bboxes_data = bboxes.numpy()
            scores_data = scores.numpy()
            results = multiclass_nms(bboxes_data, scores_data,
                                     score_thresh=VALID_THRESH,
                                     nms_thresh=NMS_THRESH,
                                     pre_nms_topk=NMS_TOPK,
                                     pos_nms_topk=NMS_POSK)

    result = results[0]
    draw_results(result, image_name, draw_thresh=0.5)

The program above demonstrates clearly how to use trained weights to run prediction on an image and visualize the result. In the final output image, every insect is detected and marked with its bounding box and specific class.

Summary

Over the past four lectures, Mr. Sun has explained YOLOv3's design ideas and concrete implementation in detail for readers, and completed a concrete AI insect-recognition task using an agricultural pest dataset as the running example. Upcoming lessons will bring even richer content to help learners master deep learning methods quickly.

[How to Learn]

1. How do I watch the accompanying videos and practice with the code?

The videos and code are published on the AI Studio platform. The videos can be watched on PC or mobile, and you are encouraged to run the code yourself. Scan the QR code or open the link below:

https://aistudio.baidu.com/aistudio/course/introduce/888

2. What if I have questions while studying?

Join the deep learning camp QQ group: 726887660. The class supervisor and PaddlePaddle engineers answer questions and share learning materials in the group.

3. How can I learn more?

Through the PaddlePaddle deep learning camp, Baidu will keep updating the "Zero-Basics Introduction to Deep Learning" course, taught live by senior Baidu deep learning R&D engineers every Tuesday and Thursday, 8:00-9:00, on workdays, combining livestreams, recordings, hands-on practice and Q&A. Stay tuned!

Search for AI Studio and open Courses - "Baidu Architects Teach Deep Learning Hands-On", or click "Read the original article" at the end of this post to watch.