Notes on some code details in HRNet, so I don't forget them later.

I have recently been reading up on pose estimation and went through the HRNet code in detail. I won't cover the network architecture here; see the HRNet paper for that. The code comes from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch. These notes mainly cover the coordinate transformations in the inference stage.

HRNet is also a top-down method. At inference time, the code directly uses the Faster R-CNN model from torchvision for person detection, which produces pred_boxes:

import math

import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms


def detection(model, image, threshold=0.5):
    # 'dev' and COCO_INSTANCE_CATEGORY_NAMES are module-level globals
    # defined elsewhere in the demo script
    pil_image = Image.fromarray(image)
    transform = transforms.Compose([transforms.ToTensor()])
    input = transform(pil_image)
    with torch.no_grad():
        pred = model([input.to(dev)])
    pred_classes = [COCO_INSTANCE_CATEGORY_NAMES[i]
                    for i in list(pred[0]['labels'].cpu().numpy())]
    # Get the Prediction Score
    pred_boxes = [[(i[0], i[1]), (i[2], i[3])]
                  for i in list(pred[0]['boxes'].cpu().detach().numpy())]
    # Bounding boxes
    pred_scores = list(pred[0]['scores'].cpu().detach().numpy())
    person_boxes = []
    for pred_class, pred_box, pred_score in zip(pred_classes, pred_boxes, pred_scores):
        if (pred_score > threshold) and (pred_class == 'person'):
            person_boxes.append(pred_box)
    return person_boxes

A high threshold is used so that only confident person detections end up in person_boxes.
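For reference, a minimal sketch of how this detector can be set up and called; the model loading, image path and threshold value here are my own assumptions, not part of the repository snippet above, and `dev` / COCO_INSTANCE_CATEGORY_NAMES are the globals the demo script defines:

import cv2
import torch
import torchvision

dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# pretrained Faster R-CNN from torchvision (newer torchvision versions use the `weights` argument)
box_model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
box_model.to(dev)
box_model.eval()

# hypothetical input image; the detector expects an RGB array
image_bgr = cv2.imread('example.jpg')
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

person_boxes = detection(box_model, image_rgb, threshold=0.9)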

The full 2D pose coordinate prediction function is:

def pose_predict_2d(cfg, image, model, pred_boxes):
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    transformer = transforms.Compose([transforms.ToTensor(), normalize])
    centers, scales = [], []
    for box in pred_boxes:
        center, scale = box_to_center_scale(
            box, cfg.MODEL.IMAGE_SIZE[0], cfg.MODEL.IMAGE_SIZE[1])
        centers.append(center)
        scales.append(scale)
    rotation = 0
    model_inputs = []
    for center, scale in zip(centers, scales):
        trans = get_affine_transform(center, scale, rotation, cfg.MODEL.IMAGE_SIZE)
        model_input = cv2.warpAffine(
            image, trans,
            (int(cfg.MODEL.IMAGE_SIZE[0]), int(cfg.MODEL.IMAGE_SIZE[1])),
            flags=cv2.INTER_LINEAR)
        # hwc -> 1chw
        model_input = transformer(model_input)  # .unsqueeze(0)
        model_inputs.append(model_input)
    model_inputs = torch.stack(model_inputs)
    with torch.no_grad():
        output = model(model_inputs.to(dev))
    coords, _ = get_final_preds(
        cfg,
        output.cpu().detach().numpy(),
        np.asarray(centers),
        np.asarray(scales))
    return coords

Besides normalizing the image, the first step here is to compute the center and scale of each box:

def box_to_center_scale(box, model_image_width, model_image_height):
    """convert a box to center, scale information required for pose transformation
    Parameters
    ----------
    box : list of tuple
        list of length 2 with two tuples of floats representing
        bottom left and top right corner of a box
    model_image_width : int
    model_image_height : int

    Returns
    -------
    (numpy array, numpy array)
        Two numpy arrays, coordinates for the center of the box and the scale of the box
    """
    center = np.zeros((2), dtype=np.float32)

    bottom_left_corner = box[0]
    top_right_corner = box[1]
    box_width = top_right_corner[0] - bottom_left_corner[0]
    box_height = top_right_corner[1] - bottom_left_corner[1]
    bottom_left_x = bottom_left_corner[0]
    bottom_left_y = bottom_left_corner[1]
    center[0] = bottom_left_x + box_width * 0.5
    center[1] = bottom_left_y + box_height * 0.5

    aspect_ratio = model_image_width * 1.0 / model_image_height
    pixel_std = 200

    if box_width > aspect_ratio * box_height:
        box_height = box_width * 1.0 / aspect_ratio
    elif box_width < aspect_ratio * box_height:
        box_width = box_height * aspect_ratio
    scale = np.array(
        [box_width * 1.0 / pixel_std, box_height * 1.0 / pixel_std],
        dtype=np.float32)
    if center[0] != -1:
        scale = scale * 1.25

    return center, scale

This prepares for later warping the box region to the configured input size with an affine transform, keeping the crop centered on the box. The returned center is simply the box center. When resizing, there are two cases: if bw/bh > W/H, the box height is enlarged to bh = bw * H / W to preserve the aspect ratio; otherwise the box width is enlarged. The final scale is the adjusted [bw, bh] (divided by pixel_std = 200), and to bring in some extra context the whole box is then enlarged by 25% (scale * 1.25).
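As a quick worked example (all numbers made up for illustration), suppose cfg.MODEL.IMAGE_SIZE is [192, 256] and a detected box runs from (100, 100) to (200, 300):

# bw = 100, bh = 200; aspect_ratio = 192 / 256 = 0.75
# bw (100) < 0.75 * bh (150), so bw is enlarged to 150
center, scale = box_to_center_scale([(100, 100), (200, 300)], 192, 256)
print(center)  # [150. 200.]   -> box center
print(scale)   # [0.9375 1.25] -> [150, 200] / 200 * 1.25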

After obtaining the center and scale of each box, we need to build the network input for each person. First, for each person, the affine transform matrix (translation, rotation, scaling) from the original image to the network input is computed:

def get_affine_transform(center, scale, rot, output_size,
                         shift=np.array([0, 0], dtype=np.float32), inv=0):
    if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
        print(scale)
        scale = np.array([scale, scale])

    scale_tmp = scale * 200.0
    src_w = scale_tmp[0]
    dst_w = output_size[0]
    dst_h = output_size[1]

    rot_rad = np.pi * rot / 180
    src_dir = get_dir([0, src_w * -0.5], rot_rad)
    dst_dir = np.array([0, dst_w * -0.5], np.float32)

    src = np.zeros((3, 2), dtype=np.float32)
    dst = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
    src[2:, :] = get_3rd_point(src[0, :], src[1, :])
    dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans

OpenCV's cv2.getAffineTransform computes the affine transform from 3 pairs of corresponding points. Here src_w is the box width and dst_w, dst_h are the configured output size. get_dir rotates the point [0, -0.5 * src_w] by the rotation angle. The first two point pairs are therefore the center itself and this rotated offset point added to the center; the third point pair is then derived from the first two so that its line to the center is perpendicular to the line from the first offset point to the center. From these three point pairs the affine matrix is obtained (the corresponding inverse transform can be obtained the same way by swapping src and dst). cv2.warpAffine then produces the corresponding network input.
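The two helpers referenced here come from the repository's transform utilities; roughly, they look like the following (a sketch reproduced from memory, so double-check against the repo's utils/transforms.py):

def get_dir(src_point, rot_rad):
    # rotate src_point by rot_rad around the origin
    sn, cs = np.sin(rot_rad), np.cos(rot_rad)
    src_result = [0, 0]
    src_result[0] = src_point[0] * cs - src_point[1] * sn
    src_result[1] = src_point[0] * sn + src_point[1] * cs
    return src_result


def get_3rd_point(a, b):
    # third point: rotate (a - b) by 90 degrees around b
    direct = a - b
    return b + np.array([-direct[1], direct[0]], dtype=np.float32)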

Finally, each detected person yields a set of heatmaps, and get_final_preds converts them into the final coordinates:

def get_final_preds(config, batch_heatmaps, center, scale):
    coords, maxvals = get_max_preds(batch_heatmaps)

    heatmap_height = batch_heatmaps.shape[2]
    heatmap_width = batch_heatmaps.shape[3]

    # post-processing
    if config.TEST.POST_PROCESS:
        for n in range(coords.shape[0]):
            for p in range(coords.shape[1]):
                hm = batch_heatmaps[n][p]
                px = int(math.floor(coords[n][p][0] + 0.5))
                py = int(math.floor(coords[n][p][1] + 0.5))
                if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
                    diff = np.array(
                        [hm[py][px + 1] - hm[py][px - 1],
                         hm[py + 1][px] - hm[py - 1][px]])
                    # shift a quarter pixel towards the higher-valued neighbour
                    coords[n][p] += np.sign(diff) * .25

    preds = coords.copy()

    # Transform back
    for i in range(coords.shape[0]):
        preds[i] = transform_preds(
            coords[i], center[i], scale[i],
            [heatmap_width, heatmap_height])

    return preds, maxvals

def get_max_preds(batch_heatmaps):
    '''
    get predictions from score maps
    heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
    '''
    assert isinstance(batch_heatmaps, np.ndarray), \
        'batch_heatmaps should be numpy.ndarray'
    assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'

    batch_size = batch_heatmaps.shape[0]
    num_joints = batch_heatmaps.shape[1]
    width = batch_heatmaps.shape[3]
    heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
    idx = np.argmax(heatmaps_reshaped, 2)
    maxvals = np.amax(heatmaps_reshaped, 2)

    maxvals = maxvals.reshape((batch_size, num_joints, 1))
    idx = idx.reshape((batch_size, num_joints, 1))

    # duplicate the flat argmax index into an (x, y) pair
    preds = np.tile(idx, (1, 1, 2)).astype(np.float32)

    # recover the (x, y) coordinates of that index
    preds[:, :, 0] = (preds[:, :, 0]) % width
    preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)

    pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
    pred_mask = pred_mask.astype(np.float32)

    preds *= pred_mask

    return preds, maxvals

get_max_preds finds the position of each joint on its heatmap; the post-processing step then refines each joint position using the heatmap values of the neighbouring pixels (a quarter-pixel shift towards the higher-valued neighbour). Finally, transform_preds maps the coordinates back onto the original image:

def transform_preds(coords, center, scale, output_size):
    target_coords = np.zeros(coords.shape)
    trans = get_affine_transform(center, scale, 0, output_size, inv=1)
    for p in range(coords.shape[0]):
        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
    return target_coords

That is, the inverse of the affine transform described above is applied, which finally gives the joint coordinates in the original image.
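For completeness, the affine_transform helper used in transform_preds simply applies the 2x3 matrix to a single point in homogeneous coordinates; a minimal version (again a sketch of the repository's utility, not a verbatim copy) looks like:

def affine_transform(pt, t):
    # pt: (x, y); t: 2x3 affine matrix from get_affine_transform(..., inv=1)
    new_pt = np.array([pt[0], pt[1], 1.]).T
    new_pt = np.dot(t, new_pt)
    return new_pt[:2]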