All code below has been tested with PyTorch 1.4 and verified to run correctly.

How FCOS is trained

The FCOS paper (https://arxiv.org/pdf/1904.01355.pdf) describes its COCO training recipe, which is identical to RetinaNet's: an SGD optimizer with momentum=0.9 and weight_decay=0.0001, batch_size=16 with cross-GPU synchronized BN, 90000 iterations in total (about 12 epochs), an initial learning rate of 0.01, and the learning rate divided by 10 at iterations 60000 and 80000.
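For reference, a minimal sketch of that schedule in PyTorch; `model` here is just a stand-in, and the scheduler is stepped per iteration because the milestones are iteration counts:

import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the detector, for illustration only

# SGD as described in the paper: lr=0.01, momentum=0.9, weight_decay=0.0001
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,
                            momentum=0.9,
                            weight_decay=0.0001)
# divide the learning rate by 10 at iterations 60000 and 80000
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60000, 80000],
                                                 gamma=0.1)

for iteration in range(90000):
    # ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()  # stepped once per iteration, not per epoch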

My training code still uses the Adam optimizer (AdamW in the code below), because Adam converges faster in the early stage. In my experiments FCOS converges noticeably more slowly than RetinaNet, so FCOS needs to be trained for 24 epochs.

Why does FCOS converge more slowly than RetinaNet?

For object detection, a sample means an anchor. FCOS can be viewed as a special case with exactly one anchor per feature-map location, while RetinaNet places 9 anchors at each location. Even after removing the anchors that RetinaNet ignores, RetinaNet still has roughly 5 to 6 times as many anchor samples as FCOS for the same input image. Fewer samples means less supervision signal, so FCOS converges more slowly. The same phenomenon can be observed in the DETR paper: DETR produces only 100 samples per image, far fewer than the anchor samples in Faster R-CNN, so DETR needs 500 epochs of training to match the performance of Faster R-CNN trained with the 9x schedule (108 epochs).
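A rough back-of-the-envelope count makes this gap concrete. The sketch below assumes FPN levels P3-P7 with strides 8, 16, 32, 64 and 128 (the standard setup for both detectors) and the 667-pixel input used later in this article:

import math

input_size = 667
strides = [8, 16, 32, 64, 128]  # FPN levels P3-P7
# FCOS gets one sample per feature-map position
positions = sum(math.ceil(input_size / s)**2 for s in strides)
print(f'FCOS samples (1 per position):      {positions}')      # ~9400
print(f'RetinaNet anchors (9 per position): {positions * 9}')  # ~85000

After RetinaNet's ignored anchors are excluded, the effective ratio falls to roughly the 5 to 6 times mentioned above.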

Improvements proposed in the FCOS paper

Sharing the centerness head with the regression head:

The current implementation already does this.
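To make the layout concrete, here is a minimal sketch (hypothetical class and names, not the repository's exact code) of a regression tower whose output features feed both the box predictor and the centerness predictor:

import torch
import torch.nn as nn

class RegCenterHead(nn.Module):
    """Regression tower shared by the box and centerness predictors."""

    def __init__(self, inplanes=256):
        super().__init__()
        self.tower = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(inplanes, inplanes, 3, padding=1),
                          nn.ReLU(inplace=True)) for _ in range(4)
        ])
        self.reg_out = nn.Conv2d(inplanes, 4, 3, padding=1)  # l, t, r, b
        self.center_out = nn.Conv2d(inplanes, 1, 3, padding=1)

    def forward(self, x):
        feat = self.tower(x)  # both outputs read the same tower features
        return self.reg_out(feat), self.center_out(feat)

reg, ctr = RegCenterHead()(torch.randn(1, 256, 84, 84))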

Adding GN to the classification and regression heads:

That is, adding Group Normalization after each of the first four conv layers in the heads. The code is as follows:

layers.append(nn.GroupNorm(32, inplanes))
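In context, the head construction might look like the following minimal sketch (assuming inplanes=256, the usual FPN channel count; this is an illustration, not the repository's exact head code):

import torch.nn as nn

inplanes = 256
layers = []
for _ in range(4):
    layers.append(nn.Conv2d(inplanes, inplanes, kernel_size=3, padding=1))
    # GroupNorm with 32 groups after each of the first four conv layers
    layers.append(nn.GroupNorm(32, inplanes))
    layers.append(nn.ReLU(inplace=True))
head_tower = nn.Sequential(*layers)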

I have verified this on my own FCOS implementation: GN gives a stable mAP improvement. However, since neither TNN nor NCNN supports the GN operator, the current FCOS implementation does not include GN.

GIoU:

Already used in the regression loss.
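For reference, a standalone sketch of the GIoU computation (the repository's FCOSLoss has its own implementation; this version assumes boxes in [x1, y1, x2, y2] format):

import torch

def giou(pred, target, eps=1e-7):
    # pred, target: [N, 4] boxes in [x1, y1, x2, y2] format
    lt = torch.max(pred[:, :2], target[:, :2])  # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=eps)
    # smallest enclosing box, used for the extra GIoU penalty term
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    enc_wh = (enc_rb - enc_lt).clamp(min=0)
    enc_area = (enc_wh[:, 0] * enc_wh[:, 1]).clamp(min=eps)
    return iou - (enc_area - union) / enc_area

pred = torch.tensor([[10., 10., 50., 50.]])
target = torch.tensor([[20., 20., 60., 60.]])
loss = (1.0 - giou(pred, target)).mean()  # GIoU loss = 1 - GIoU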

center sampling:

In a labeled box, the object usually does not occupy 100% of the area, so there is always some background inside the box. When FCOS assigns ground truth, some positive locations may lie near the inner edge of the box and actually fall on background; such points are hard to converge during training. Center sampling marks a point as positive only when it falls within a smaller central region of the box, which filters out some of the points that are actually on background. The downside is an extra hyperparameter controlling the region size, which must be re-tuned whenever the dataset changes, which is inconvenient. In my view, if a dataset also provides segmentation labels, it is better to test directly whether a point lies inside the region enclosed by the segmentation mask; then no hyperparameter needs tuning for any dataset.
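A minimal sketch of the idea (hypothetical function, not the repository's code); `radius` is exactly the extra hyperparameter discussed above:

import torch

def center_sample_mask(points, gt_box, stride, radius=1.5):
    # points: [N, 2] (x, y) locations on one FPN level
    # gt_box: [4] tensor in [x1, y1, x2, y2] format
    cx = (gt_box[0] + gt_box[2]) / 2
    cy = (gt_box[1] + gt_box[3]) / 2
    r = radius * stride
    # clip the sampling region so it never leaves the original box
    x1 = torch.max(cx - r, gt_box[0])
    y1 = torch.max(cy - r, gt_box[1])
    x2 = torch.min(cx + r, gt_box[2])
    y2 = torch.min(cy + r, gt_box[3])
    return ((points[:, 0] > x1) & (points[:, 0] < x2) &
            (points[:, 1] > y1) & (points[:, 1] < y2))

points = torch.tensor([[12., 12.], [48., 48.]])
mask = center_sample_mask(points, torch.tensor([0., 0., 96., 96.]), stride=8)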

Training and testing FCOS

The FCOS training and testing code is exactly the same as RetinaNet's apart from the extra centerness loss, so the RetinaNet training code only needs minor modifications.
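For reference, the centerness target defined in the FCOS paper, where l^*, t^*, r^*, b^* are a location's distances to the left, top, right and bottom sides of its assigned ground-truth box:

centerness^{*}=\sqrt{\frac{\min(l^{*},r^{*})}{\max(l^{*},r^{*})}\times\frac{\min(t^{*},b^{*})}{\max(t^{*},b^{*})}}\\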

The config.py file is as follows:

import os
import sys

BASE_DIR = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)

from public.path import COCO2017_path
from public.detection.dataset.cocodataset import CocoDetection, Resize, RandomFlip, RandomCrop, RandomTranslate

import torchvision.transforms as transforms
import torchvision.datasets as datasets


class Config(object):
    log = './log'  # Path to save log
    checkpoint_path = './checkpoints'  # Path to store checkpoint model
    resume = './checkpoints/latest.pth'  # load checkpoint model
    evaluate = None  # evaluate model path
    train_dataset_path = os.path.join(COCO2017_path, 'images/train2017')
    val_dataset_path = os.path.join(COCO2017_path, 'images/val2017')
    dataset_annotations_path = os.path.join(COCO2017_path, 'annotations')

    network = "resnet50_fcos"
    pretrained = False
    num_classes = 80
    seed = 0
    input_image_size = 667

    train_dataset = CocoDetection(image_root_dir=train_dataset_path,
                                  annotation_root_dir=dataset_annotations_path,
                                  set="train2017",
                                  transform=transforms.Compose([
                                      RandomFlip(flip_prob=0.5),
                                      RandomCrop(crop_prob=0.5),
                                      RandomTranslate(translate_prob=0.5),
                                      Resize(resize=input_image_size),
                                  ]))
    val_dataset = CocoDetection(image_root_dir=val_dataset_path,
                                annotation_root_dir=dataset_annotations_path,
                                set="val2017",
                                transform=transforms.Compose([
                                    Resize(resize=input_image_size),
                                ]))

    epochs = 12
    per_node_batch_size = 16
    lr = 1e-4
    num_workers = 4
    print_interval = 100
    apex = True
    sync_bn = False

The train.py file is as follows:

import sys
import os
import argparse
import random
import shutil
import time
import warnings
import json

BASE_DIR = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(BASE_DIR)
warnings.filterwarnings('ignore')

import numpy as np
from thop import profile
from thop import clever_format
import apex
from apex import amp
from apex.parallel import convert_syncbn_model
from apex.parallel import DistributedDataParallel
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.distributed as dist
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torchvision import transforms
from config import Config
from public.detection.dataset.cocodataset import COCODataPrefetcher, collater
from public.detection.models.loss import FCOSLoss
from public.detection.models.decode import FCOSDecoder
from public.detection.models import fcos
from public.imagenet.utils import get_logger
from pycocotools.cocoeval import COCOeval

def parse_args():
    parser = argparse.ArgumentParser(
        description='PyTorch COCO Detection Distributed Training')
    parser.add_argument('--network',
                        type=str,
                        default=Config.network,
                        help='name of network')
    parser.add_argument('--lr',
                        type=float,
                        default=Config.lr,
                        help='learning rate')
    parser.add_argument('--epochs',
                        type=int,
                        default=Config.epochs,
                        help='num of training epochs')
    parser.add_argument('--per_node_batch_size',
                        type=int,
                        default=Config.per_node_batch_size,
                        help='per_node batch size')
    parser.add_argument('--pretrained',
                        type=bool,
                        default=Config.pretrained,
                        help='load pretrained model params or not')
    parser.add_argument('--num_classes',
                        type=int,
                        default=Config.num_classes,
                        help='model classification num')
    parser.add_argument('--input_image_size',
                        type=int,
                        default=Config.input_image_size,
                        help='input image size')
    parser.add_argument('--num_workers',
                        type=int,
                        default=Config.num_workers,
                        help='number of worker to load data')
    parser.add_argument('--resume',
                        type=str,
                        default=Config.resume,
                        help='put the path to resuming file if needed')
    parser.add_argument('--checkpoints',
                        type=str,
                        default=Config.checkpoint_path,
                        help='path for saving trained models')
    parser.add_argument('--log',
                        type=str,
                        default=Config.log,
                        help='path to save log')
    parser.add_argument('--evaluate',
                        type=str,
                        default=Config.evaluate,
                        help='path for evaluate model')
    parser.add_argument('--seed', type=int, default=Config.seed, help='seed')
    parser.add_argument('--print_interval',
                        type=int,
                        default=Config.print_interval,
                        help='print interval')
    parser.add_argument('--apex',
                        type=bool,
                        default=Config.apex,
                        help='use apex or not')
    parser.add_argument('--sync_bn',
                        type=bool,
                        default=Config.sync_bn,
                        help='use sync bn or not')
    parser.add_argument('--local_rank',
                        type=int,
                        default=0,
                        help='LOCAL_PROCESS_RANK')

    return parser.parse_args()

def validate(val_dataset, model, decoder):
    model = model.module
    # switch to evaluate mode
    model.eval()
    with torch.no_grad():
        all_eval_result = evaluate_coco(val_dataset, model, decoder)

    return all_eval_result

def evaluate_coco(val_dataset, model, decoder):
    results, image_ids = [], []
    for index in range(len(val_dataset)):
        data = val_dataset[index]
        scale = data['scale']
        cls_heads, reg_heads, center_heads, batch_positions = model(
            data['img'].cuda().permute(2, 0, 1).float().unsqueeze(dim=0))
        scores, classes, boxes = decoder(cls_heads, reg_heads, center_heads,
                                         batch_positions)
        scores, classes, boxes = scores.cpu(), classes.cpu(), boxes.cpu()
        boxes /= scale

        # make sure decode batch_size=1
        # scores shape:[1,max_detection_num]
        # classes shape:[1,max_detection_num]
        # bboxes shape[1,max_detection_num,4]
        assert scores.shape[0] == 1

        scores = scores.squeeze(0)
        classes = classes.squeeze(0)
        boxes = boxes.squeeze(0)

        # for coco_eval,we need [x_min,y_min,w,h] format pred boxes
        boxes[:, 2:] -= boxes[:, :2]

        for object_score, object_class, object_box in zip(
                scores, classes, boxes):
            object_score = float(object_score)
            object_class = int(object_class)
            object_box = object_box.tolist()
            if object_class == -1:
                break

            image_result = {
                'image_id': val_dataset.image_ids[index],
                'category_id':
                val_dataset.find_category_id_from_coco_label(object_class),
                'score': object_score,
                'bbox': object_box,
            }
            results.append(image_result)

        image_ids.append(val_dataset.image_ids[index])

        print('{}/{}'.format(index, len(val_dataset)), end='\r')

    if not len(results):
        print("No target detected in test set images")
        return

    json.dump(results,
              open('{}_bbox_results.json'.format(val_dataset.set_name), 'w'),
              indent=4)

    # load results in COCO evaluation tool
    coco_true = val_dataset.coco
    coco_pred = coco_true.loadRes('{}_bbox_results.json'.format(
        val_dataset.set_name))

    coco_eval = COCOeval(coco_true, coco_pred, 'bbox')
    coco_eval.params.imgIds = image_ids
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    all_eval_result = coco_eval.stats

    return all_eval_result

def main():
    args = parse_args()
    global local_rank
    local_rank = args.local_rank
    if local_rank == 0:
        global logger
        logger = get_logger(__name__, args.log)

    torch.cuda.empty_cache()

    if args.seed is not None:
        random.seed(args.seed)
        torch.manual_seed(args.seed)
        torch.cuda.manual_seed_all(args.seed)
        cudnn.deterministic = True

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend='nccl', init_method='env://')
    global gpus_num
    gpus_num = torch.cuda.device_count()
    if local_rank == 0:
        logger.info(f'use {gpus_num} gpus')
        logger.info(f"args: {args}")

    cudnn.benchmark = True
    cudnn.enabled = True
    start_time = time.time()

    # dataset and dataloader
    if local_rank == 0:
        logger.info('start loading data')

    train_sampler = torch.utils.data.distributed.DistributedSampler(
        Config.train_dataset, shuffle=True)
    train_loader = DataLoader(Config.train_dataset,
                              batch_size=args.per_node_batch_size,
                              shuffle=False,
                              num_workers=args.num_workers,
                              collate_fn=collater,
                              sampler=train_sampler)

    if local_rank == 0:
        logger.info('finish loading data')

    model = fcos.__dict__[args.network](**{
        "pretrained": args.pretrained,
        "num_classes": args.num_classes,
    })

    for name, param in model.named_parameters():
        if local_rank == 0:
            logger.info(f"{name},{param.requires_grad}")

    flops_input = torch.randn(1, 3, args.input_image_size,
                              args.input_image_size)
    flops, params = profile(model, inputs=(flops_input, ))
    flops, params = clever_format([flops, params], "%.3f")
    if local_rank == 0:
        logger.info(
            f"model: '{args.network}', flops: {flops}, params: {params}")

    criterion = FCOSLoss(image_w=args.input_image_size,
                         image_h=args.input_image_size).cuda()
    decoder = FCOSDecoder(image_w=args.input_image_size,
                          image_h=args.input_image_size).cuda()

    model = model.cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           patience=3,
                                                           verbose=True)

    if args.sync_bn:
        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

    if args.apex:
        amp.register_float_function(torch, 'sigmoid')
        amp.register_float_function(torch, 'softmax')
        model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
        model = apex.parallel.DistributedDataParallel(model,
                                                      delay_allreduce=True)
        if args.sync_bn:
            model = apex.parallel.convert_syncbn_model(model)
    else:
        model = nn.parallel.DistributedDataParallel(model,
                                                    device_ids=[local_rank],
                                                    output_device=local_rank)

    if args.evaluate:
        if not os.path.isfile(args.evaluate):
            if local_rank == 0:
                logger.exception(
                    '{} is not a file, please check it again'.format(
                        args.evaluate))
            sys.exit(-1)
        if local_rank == 0:
            logger.info('start only evaluating')
            logger.info(f"start resuming model from {args.evaluate}")
        checkpoint = torch.load(args.evaluate,
                                map_location=torch.device('cpu'))
        model.load_state_dict(checkpoint['model_state_dict'])
        if local_rank == 0:
            logger.info(f"start eval.")
            all_eval_result = validate(Config.val_dataset, model, decoder)
            logger.info(f"eval done.")
            if all_eval_result is not None:
                logger.info(
                    f"val: epoch: {checkpoint['epoch']:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
                )
        return

    best_map = 0.0
    start_epoch = 1
    # resume training
    if os.path.exists(args.resume):
        if local_rank == 0:
            logger.info(f"start resuming model from {args.resume}")
        checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
        start_epoch += checkpoint['epoch']
        best_map = checkpoint['best_map']
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
        if local_rank == 0:
            logger.info(
                f"finish resuming model from {args.resume}, epoch {checkpoint['epoch']}, best_map: {checkpoint['best_map']}, "
                f"loss: {checkpoint['loss']:3f}, cls_loss: {checkpoint['cls_loss']:2f}, reg_loss: {checkpoint['reg_loss']:2f}, center_ness_loss: {checkpoint['center_ness_loss']:2f}"
            )

    if local_rank == 0:
        if not os.path.exists(args.checkpoints):
            os.makedirs(args.checkpoints)

    if local_rank == 0:
        logger.info('start training')
    for epoch in range(start_epoch, args.epochs + 1):
        train_sampler.set_epoch(epoch)
        cls_losses, reg_losses, center_ness_losses, losses = train(
            train_loader, model, criterion, optimizer, scheduler, epoch, args)
        if local_rank == 0:
            logger.info(
                f"train: epoch {epoch:0>3d}, cls_loss: {cls_losses:.2f}, reg_loss: {reg_losses:.2f}, center_ness_loss: {center_ness_losses:.2f}, loss: {losses:.2f}"
            )

        if epoch % 5 == 0 or epoch == args.epochs:
            if local_rank == 0:
                logger.info(f"start eval.")
                all_eval_result = validate(Config.val_dataset, model, decoder)
                logger.info(f"eval done.")
                if all_eval_result is not None:
                    logger.info(
                        f"val: epoch: {epoch:0>5d}, IoU=0.5:0.95,area=all,maxDets=100,mAP:{all_eval_result[0]:.3f}, IoU=0.5,area=all,maxDets=100,mAP:{all_eval_result[1]:.3f}, IoU=0.75,area=all,maxDets=100,mAP:{all_eval_result[2]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAP:{all_eval_result[3]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAP:{all_eval_result[4]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAP:{all_eval_result[5]:.3f}, IoU=0.5:0.95,area=all,maxDets=1,mAR:{all_eval_result[6]:.3f}, IoU=0.5:0.95,area=all,maxDets=10,mAR:{all_eval_result[7]:.3f}, IoU=0.5:0.95,area=all,maxDets=100,mAR:{all_eval_result[8]:.3f}, IoU=0.5:0.95,area=small,maxDets=100,mAR:{all_eval_result[9]:.3f}, IoU=0.5:0.95,area=medium,maxDets=100,mAR:{all_eval_result[10]:.3f}, IoU=0.5:0.95,area=large,maxDets=100,mAR:{all_eval_result[11]:.3f}"
                    )
                    if all_eval_result[0] > best_map:
                        torch.save(model.module.state_dict(),
                                   os.path.join(args.checkpoints, "best.pth"))
                        best_map = all_eval_result[0]
        if local_rank == 0:
            torch.save(
                {
                    'epoch': epoch,
                    'best_map': best_map,
                    'cls_loss': cls_losses,
                    'reg_loss': reg_losses,
                    'center_ness_loss': center_ness_losses,
                    'loss': losses,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'scheduler_state_dict': scheduler.state_dict(),
                }, os.path.join(args.checkpoints, 'latest.pth'))

    if local_rank == 0:
        logger.info(f"finish training, best_map: {best_map:.3f}")
    training_time = (time.time() - start_time) / 3600
    if local_rank == 0:
        logger.info(
            f"finish training, total training time: {training_time:.2f} hours"
        )

def train(train_loader, model, criterion, optimizer, scheduler, epoch, args):
    cls_losses, reg_losses, center_ness_losses, losses = [], [], [], []

    # switch to train mode
    model.train()

    iters = len(train_loader.dataset) // (args.per_node_batch_size * gpus_num)
    prefetcher = COCODataPrefetcher(train_loader)
    images, annotations = prefetcher.next()
    iter_index = 1

    while images is not None:
        images, annotations = images.cuda().float(), annotations.cuda()
        cls_heads, reg_heads, center_heads, batch_positions = model(images)
        cls_loss, reg_loss, center_ness_loss = criterion(
            cls_heads, reg_heads, center_heads, batch_positions, annotations)
        loss = cls_loss + reg_loss + center_ness_loss
        if cls_loss == 0.0 or reg_loss == 0.0:
            # skip this batch; fetch the next one so the loop does not
            # re-forward the same images forever
            optimizer.zero_grad()
            images, annotations = prefetcher.next()
            iter_index += 1
            continue

        if args.apex:
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        optimizer.step()
        optimizer.zero_grad()

        cls_losses.append(cls_loss.item())
        reg_losses.append(reg_loss.item())
        center_ness_losses.append(center_ness_loss.item())
        losses.append(loss.item())

        images, annotations = prefetcher.next()

        if local_rank == 0 and iter_index % args.print_interval == 0:
            logger.info(
                f"train: epoch {epoch:0>3d}, iter [{iter_index:0>5d}, {iters:0>5d}], cls_loss: {cls_loss.item():.2f}, reg_loss: {reg_loss.item():.2f}, center_ness_loss: {center_ness_loss.item():.2f}, loss_total: {loss.item():.2f}"
            )
        iter_index += 1

    scheduler.step(np.mean(losses))

    return np.mean(cls_losses), np.mean(reg_losses), np.mean(
        center_ness_losses), np.mean(losses)

if __name__ == '__main__':
    main()
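Since train.py reads a --local_rank argument and initializes the process group with init_method='env://', it is meant to be launched with PyTorch's distributed launcher; for example, on a single node with two GPUs (matching the experiments below):

python -m torch.distributed.launch --nproc_per_node=2 train.py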

Training results

All experiments below use an input size of 667, which is equivalent to resize=400 in the paper; for why these are equivalent, see the explanation in 【庖丁解牛】從零實現RetinaNet(終). mAP is the COCOeval stats[0] value and mAR is the COCOeval stats[8] value.

\begin{array}[b]{|c|c|c|c|c|c|c|c|c|c|c|} \hline Network & batch & gpu-nums & apex & syncbn & epoch5-mAP-mAR-loss & epoch10-mAP-mAR-loss & epoch12-mAP-mAR-loss & epoch15-mAP-mAR-loss & epoch20-mAP-mAR-loss & epoch24-mAP-mAR-loss \\ \hline ResNet50-FCOS-myresize667-fastdecode & 32 & 2 & yes & no & 0.162,0.289,1.31 & 0.226,0.342,1.21 & 0.248,0.370,1.20 & 0.217,0.343,1.17 & 0.282,0.409,1.14 & 0.286,0.409,1.12 \\ \hline ResNet101-FCOS-myresize667-fastdecode & 24 & 2 & yes & no & 0.206,0.325,1.29 & 0.237,0.359,1.20 & 0.263,0.380,1.18 & 0.277,0.400,1.15 & 0.260,0.385,1.13 & 0.291,0.416,1.10 \\ \hline \end{array}\\

The mAP of my trained resnet50_FCOS model is slightly lower (by 0.7 points) than a ResNet50-RetinaNet with the same input size, probably because Group Normalization and center sampling are not used. However, the FCOS model's mAR is higher than RetinaNet's, which indicates that the centerness branch is effective at improving mAR.

All code has been uploaded to my GitHub repository:

This article is also available on my CSDN blog: