Notes on HRNet Code Details

These are notes on a few code details in HRNet, written down so I don't forget them later.

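I have recently been reading about pose estimation and went through the HRNet code in some detail. I won't cover the network architecture here; see the HRNet paper for that. The code comes from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch. These notes mainly cover the coordinate transformations in the inference stage.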

HRNet is also a top-down method. At inference time the code uses the Faster R-CNN from torchvision directly for detection, which gives pred_boxes:

    # Excerpt: `dev` (the torch device) and COCO_INSTANCE_CATEGORY_NAMES are
    # module-level globals defined elsewhere in the script.
    import torch
    import torchvision.transforms as transforms
    from PIL import Image

    def detection(model, image, threshold=0.5):
        pil_image = Image.fromarray(image)
        transform = transforms.Compose([transforms.ToTensor()])
        input = transform(pil_image)
        with torch.no_grad():
            pred = model([input.to(dev)])
        # Class name of every detection
        pred_classes = [COCO_INSTANCE_CATEGORY_NAMES[i]
                        for i in list(pred[0]['labels'].cpu().numpy())]
        # Bounding boxes as [(x1, y1), (x2, y2)]
        pred_boxes = [[(i[0], i[1]), (i[2], i[3])]
                      for i in list(pred[0]['boxes'].cpu().detach().numpy())]
        # Confidence score of every detection
        pred_scores = list(pred[0]['scores'].cpu().detach().numpy())

        person_boxes = []
        for pred_class, pred_box, pred_score in zip(pred_classes, pred_boxes, pred_scores):
            if (pred_score > threshold) and (pred_class == 'person'):
                person_boxes.append(pred_box)
        return person_boxes

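Only detections labelled 'person' whose score exceeds the threshold are kept as person_boxes; a high threshold leaves only the confident ones.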
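
The filtering step can be tried without a detector. The following is a minimal sketch, assuming the detector returns boxes as [x1, y1, x2, y2] rows; the helper names to_corner_pairs and keep_confident_persons and all sample values are made up for illustration and are not part of the HRNet code.

```python
def to_corner_pairs(boxes_xyxy):
    """Convert [x1, y1, x2, y2] rows into [(x1, y1), (x2, y2)] pairs."""
    return [[(b[0], b[1]), (b[2], b[3])] for b in boxes_xyxy]


def keep_confident_persons(classes, boxes, scores, threshold=0.5):
    """Keep boxes whose class is 'person' and whose score exceeds the threshold."""
    return [
        box
        for cls, box, score in zip(classes, boxes, scores)
        if score > threshold and cls == "person"
    ]


# Hypothetical detector output for one image (values invented for this example).
classes = ["person", "dog", "person"]
boxes = to_corner_pairs([
    [100.0, 50.0, 200.0, 350.0],
    [10.0, 20.0, 60.0, 80.0],
    [300.0, 40.0, 380.0, 300.0],
])
scores = [0.98, 0.91, 0.32]

person_boxes = keep_confident_persons(classes, boxes, scores)
print(person_boxes)  # [[(100.0, 50.0), (200.0, 350.0)]]
```

The dog is dropped because of its class and the second person because its score of 0.32 is below the threshold, so a single box remains.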

The complete prediction function for the 2D pose coordinates is:

    import cv2
    import numpy as np

    def pose_predict_2d(cfg, image, model, pred_boxes):
        normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225])
        transformer = transforms.Compose([transforms.ToTensor(), normalize])

        # Center and scale of every person box
        centers, scales = [], []
        for box in pred_boxes:
            center, scale = box_to_center_scale(
                box, cfg.MODEL.IMAGE_SIZE[0], cfg.MODEL.IMAGE_SIZE[1])
            centers.append(center)
            scales.append(scale)

        rotation = 0
        model_inputs = []
        for center, scale in zip(centers, scales):
            # Warp the box region of the original image to the model input size
            trans = get_affine_transform(center, scale, rotation, cfg.MODEL.IMAGE_SIZE)
            model_input = cv2.warpAffine(
                image, trans,
                (int(cfg.MODEL.IMAGE_SIZE[0]), int(cfg.MODEL.IMAGE_SIZE[1])),
                flags=cv2.INTER_LINEAR)
            # HWC -> CHW (the batch dimension is added by torch.stack below)
            model_input = transformer(model_input)  # .unsqueeze(0)
            model_inputs.append(model_input)
        model_inputs = torch.stack(model_inputs)

        with torch.no_grad():
            output = model(model_inputs.to(dev))
        coords, _ = get_final_preds(cfg, output.cpu().detach().numpy(),
                                    np.asarray(centers), np.asarray(scales))
        return coords

Apart from normalizing the image, the first step here is to get the center and scale of each box:

    def box_to_center_scale(box, model_image_width, model_image_height):
        """convert a box to center,scale information required for pose transformation
        Parameters
        ----------
        box : list of tuple
            list of length 2 with two tuples of floats representing
            bottom left and top right corner of a box
        model_image_width : int
        model_image_height : int

        Returns
        -------
        (numpy array, numpy array)
            Two numpy arrays, coordinates for the center of the box and the scale of the box
        """
        center = np.zeros((2), dtype=np.float32)

        # In image coordinates these are (x_min, y_min) and (x_max, y_max)
        bottom_left_corner = box[0]
        top_right_corner = box[1]
        box_width = top_right_corner[0] - bottom_left_corner[0]
        box_height = top_right_corner[1] - bottom_left_corner[1]
        bottom_left_x = bottom_left_corner[0]
        bottom_left_y = bottom_left_corner[1]
        center[0] = bottom_left_x + box_width * 0.5
        center[1] = bottom_left_y + box_height * 0.5

        aspect_ratio = model_image_width * 1.0 / model_image_height
        pixel_std = 200

        # Grow the shorter side so the box has the model's aspect ratio
        if box_width > aspect_ratio * box_height:
            box_height = box_width * 1.0 / aspect_ratio
        elif box_width < aspect_ratio * box_height:
            box_width = box_height * aspect_ratio
        scale = np.array(
            [box_width * 1.0 / pixel_std, box_height * 1.0 / pixel_std],
            dtype=np.float32)
        if center[0] != -1:
            scale = scale * 1.25  # add some context around the person

        return center, scale

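This prepares for the later affine warp, which maps the box region to the configured input size with the box center at the center of the crop. The center returned here is simply the center of the box. Because the crop has to keep the model's aspect ratio W/H, there are two cases: when bw/bh > W/H, bh is enlarged to bh = bw*H/W; otherwise bw is enlarged to bw = bh*W/H. The final scale is the adjusted [bw, bh] divided by pixel_std = 200, and to bring in some surrounding context it is then multiplied by 1.25, i.e. the whole box is enlarged by 25%.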
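
A small worked example may make the two cases easier to check by hand. This is a pure-Python restatement of the rule described above, not the repository's function: center_and_scale and its padding parameter are names chosen here, the center[0] != -1 check is left out, and the 192x256 input size and the box are assumed values.

```python
def center_and_scale(box, model_w, model_h, pixel_std=200.0, padding=1.25):
    """Return (center, scale) for a box given as [(x1, y1), (x2, y2)]."""
    (x1, y1), (x2, y2) = box
    bw, bh = x2 - x1, y2 - y1
    center = (x1 + 0.5 * bw, y1 + 0.5 * bh)

    aspect = model_w / model_h
    if bw > aspect * bh:      # box too wide for the model: grow the height
        bh = bw / aspect
    elif bw < aspect * bh:    # box too tall for the model: grow the width
        bw = bh * aspect

    # Express the size in units of pixel_std and add 25% of context.
    scale = (bw / pixel_std * padding, bh / pixel_std * padding)
    return center, scale


# A 100x300 box with a 192x256 (W x H) model input: aspect ratio 0.75.
center, scale = center_and_scale([(100.0, 50.0), (200.0, 350.0)], 192, 256)
print(center)  # (150.0, 200.0)
print(scale)   # (1.40625, 1.875)
```

Here the box is too tall (100/300 < 0.75), so the width grows from 100 to 225 while the height stays 300; dividing by 200 and multiplying by 1.25 gives the printed scale, whose ratio is again 0.75.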

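Once the center and scale of each box are known, the input for each person has to be produced. The first step is to compute, for each person, the affine transform matrix (translation, rotation, scaling) from the original image to the model input: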

    def get_affine_transform(
            center, scale, rot, output_size,
            shift=np.array([0, 0], dtype=np.float32), inv=0
    ):
        if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
            print(scale)
            scale = np.array([scale, scale])

        # Undo the division by pixel_std = 200 to get the box size in pixels
        scale_tmp = scale * 200.0
        src_w = scale_tmp[0]
        dst_w = output_size[0]
        dst_h = output_size[1]

        rot_rad = np.pi * rot / 180
        src_dir = get_dir([0, src_w * -0.5], rot_rad)
        dst_dir = np.array([0, dst_w * -0.5], np.float32)

        src = np.zeros((3, 2), dtype=np.float32)
        dst = np.zeros((3, 2), dtype=np.float32)
        # Point pair 1: box center <-> center of the output
        src[0, :] = center + scale_tmp * shift
        dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
        # Point pair 2: center offset by the (rotated) direction vector
        src[1, :] = center + src_dir + scale_tmp * shift
        dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
        # Point pair 3: perpendicular to the first two
        src[2:, :] = get_3rd_point(src[0, :], src[1, :])
        dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])

        if inv:
            trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
        else:
            trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

        return trans

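OpenCV's cv2.getAffineTransform computes an affine transform from three pairs of corresponding points. Here src_w is the width of the box, and dst_w and dst_h are the configured output size. The get_dir function rotates the point [0, -0.5*src_w] by rot_rad. The first two point pairs are the center and the center offset by this rotated vector; the third pair is then derived from the first two so that its line to the second point is perpendicular to the line from the center to the second point. These three pairs give the affine matrix (the inverse transform can be obtained the same way by swapping source and destination). cv2.warpAffine then produces the actual model input.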
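
The post does not list get_dir or get_3rd_point, so the following is only a pure-Python sketch of what the description implies: rotate the offset vector, then add a third point at a right angle. The names rotate_point, third_point and point_pairs are hypothetical, the shift is assumed to be zero, and the center, scale and 192x256 output size are example values.

```python
import math


def rotate_point(point, rot_rad):
    """Rotate a 2D point about the origin by rot_rad radians."""
    sn, cs = math.sin(rot_rad), math.cos(rot_rad)
    x, y = point
    return (x * cs - y * sn, x * sn + y * cs)


def third_point(a, b):
    """Point c such that the segment b->c is b->a turned by 90 degrees."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return (b[0] - dy, b[1] + dx)


def point_pairs(center, scale, rot_deg, output_size):
    """Build the three source/destination points used for the affine fit."""
    src_w = scale[0] * 200.0
    dst_w, dst_h = output_size
    src_dir = rotate_point((0.0, -0.5 * src_w), math.pi * rot_deg / 180.0)
    src0 = (float(center[0]), float(center[1]))
    src1 = (src0[0] + src_dir[0], src0[1] + src_dir[1])
    dst0 = (0.5 * dst_w, 0.5 * dst_h)
    dst1 = (dst0[0], dst0[1] - 0.5 * dst_w)
    return ([src0, src1, third_point(src0, src1)],
            [dst0, dst1, third_point(dst0, dst1)])


src, dst = point_pairs((150.0, 200.0), (1.40625, 1.875), 0, (192, 256))
print(src)  # [(150.0, 200.0), (150.0, 59.375), (9.375, 59.375)]
print(dst)  # [(96.0, 128.0), (96.0, 32.0), (0.0, 32.0)]
```

With rot = 0 the three pairs describe a uniform scaling by dst_w / src_w about the box center. Only scale[0] enters the construction, which fits with box_to_center_scale first forcing the box to the model's aspect ratio.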

Finally, every detected person yields a set of heatmaps, and get_final_preds turns them into the final coordinates:

    import math

    def get_final_preds(config, batch_heatmaps, center, scale):
        coords, maxvals = get_max_preds(batch_heatmaps)

        heatmap_height = batch_heatmaps.shape[2]
        heatmap_width = batch_heatmaps.shape[3]

        # post-processing
        if config.TEST.POST_PROCESS:
            for n in range(coords.shape[0]):
                for p in range(coords.shape[1]):
                    hm = batch_heatmaps[n][p]
                    px = int(math.floor(coords[n][p][0] + 0.5))
                    py = int(math.floor(coords[n][p][1] + 0.5))
                    if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
                        # Shift a quarter pixel towards the higher neighbour
                        diff = np.array(
                            [
                                hm[py][px + 1] - hm[py][px - 1],
                                hm[py + 1][px] - hm[py - 1][px]
                            ]
                        )
                        coords[n][p] += np.sign(diff) * .25

        preds = coords.copy()

        # Transform back
        for i in range(coords.shape[0]):
            preds[i] = transform_preds(
                coords[i], center[i], scale[i], [heatmap_width, heatmap_height]
            )

        return preds, maxvals

    def get_max_preds(batch_heatmaps):
        '''
        get predictions from score maps
        heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
        '''
        assert isinstance(batch_heatmaps, np.ndarray), \
            'batch_heatmaps should be numpy.ndarray'
        assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'

        batch_size = batch_heatmaps.shape[0]
        num_joints = batch_heatmaps.shape[1]
        width = batch_heatmaps.shape[3]
        heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
        idx = np.argmax(heatmaps_reshaped, 2)
        maxvals = np.amax(heatmaps_reshaped, 2)

        maxvals = maxvals.reshape((batch_size, num_joints, 1))
        idx = idx.reshape((batch_size, num_joints, 1))

        # Duplicate the flat index so it can be split into x and y
        preds = np.tile(idx, (1, 1, 2)).astype(np.float32)

        # Convert the flat index into (x, y) coordinates
        preds[:, :, 0] = (preds[:, :, 0]) % width
        preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)

        # Zero out joints whose maximum response is not positive
        pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
        pred_mask = pred_mask.astype(np.float32)

        preds *= pred_mask
        return preds, maxvals
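
The two decoding steps in the functions above, the argmax lookup and the quarter-pixel shift, can be traced on a single small heatmap. This is a pure-Python sketch under simplifying assumptions: one joint, one heatmap stored as a list of rows, and made-up values; peak_location, sign and refine are illustrative names, not functions from the repository.

```python
import math


def peak_location(heatmap):
    """Return (x, y, value) of the maximum of one 2D heatmap (list of rows)."""
    width = len(heatmap[0])
    flat = [v for row in heatmap for v in row]
    idx = max(range(len(flat)), key=flat.__getitem__)
    # The flat index splits into a column (x) and a row (y).
    return float(idx % width), float(idx // width), flat[idx]


def sign(v):
    """-1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def refine(heatmap, x, y):
    """Shift the peak a quarter pixel towards the larger neighbour on each axis."""
    height, width = len(heatmap), len(heatmap[0])
    px, py = int(math.floor(x + 0.5)), int(math.floor(y + 0.5))
    if 1 < px < width - 1 and 1 < py < height - 1:
        x += 0.25 * sign(heatmap[py][px + 1] - heatmap[py][px - 1])
        y += 0.25 * sign(heatmap[py + 1][px] - heatmap[py - 1][px])
    return x, y


# 6x6 heatmap with its peak at x=3, y=2 and unequal neighbours.
heatmap = [[0.0] * 6 for _ in range(6)]
heatmap[2][3] = 1.0   # peak
heatmap[2][4] = 0.6   # right neighbour is larger than the left one
heatmap[2][2] = 0.2
heatmap[1][3] = 0.5   # upper neighbour is larger than the lower one
heatmap[3][3] = 0.1

x, y, score = peak_location(heatmap)
print(x, y, score)            # 3.0 2.0 1.0
print(refine(heatmap, x, y))  # (3.25, 1.75)
```

The peak moves right because the right neighbour is larger than the left one, and up because the row above is larger than the row below. Peaks too close to the border fail the bounds check and are left unchanged.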

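get_max_preds gives the position of each joint on the heatmap. The post-processing step then uses the heatmap values of the neighbouring pixels to correct the joint position slightly. Finally, transform_preds maps the coordinates back onto the original image: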

    def transform_preds(coords, center, scale, output_size):
        target_coords = np.zeros(coords.shape)
        # inv=1: transform from heatmap coordinates back to the original image
        trans = get_affine_transform(center, scale, 0, output_size, inv=1)
        for p in range(coords.shape[0]):
            target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
        return target_coords

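In other words, the inverse of the affine transform described above is applied, which gives the joint coordinates in the original image.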
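
For the case used at inference here (rot = 0 and zero shift), the inverse transform reduces to a uniform scaling plus a translation, so it can be written out by hand. The sketch below assumes exactly that case; inverse_affine and apply_affine are hypothetical names, and the 48x64 heatmap size, center and scale are example values rather than values taken from the post.

```python
def inverse_affine(center, scale, output_size, pixel_std=200.0):
    """2x3 matrix from output (heatmap) coordinates back to the source image.

    Valid only for rot = 0 and zero shift.
    """
    k = scale[0] * pixel_std / output_size[0]  # source pixels per output pixel
    return [
        [k, 0.0, center[0] - k * 0.5 * output_size[0]],
        [0.0, k, center[1] - k * 0.5 * output_size[1]],
    ]


def apply_affine(point, matrix):
    """Apply a 2x3 affine matrix to a 2D point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


trans = inverse_affine((150.0, 200.0), (1.40625, 1.875), (48, 64))
print(apply_affine((24.0, 32.0), trans))  # (150.0, 200.0)
print(apply_affine((3.25, 1.75), trans))  # (28.41796875, 22.75390625)
```

The heatmap center lands on the box center, and every other heatmap point is moved away from it by the factor src_w / heatmap_width, about 5.86 in this example.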
