Reproduce PersonLab (1)

The PersonLab reproduction process

I plan to record the whole mental journey of this reproduction and the problems I hit in coding practice. As I finish writing this sentence, I haven't written a single line of code yet~ so the text that follows is a narrative, told in chronological order by default.

Picking a reproduction approach:

- Top-down: start from `main()` in `train.py`, gradually building the functions it needs and their sub-functions
- Bottom-up: start from each smallest function (e.g. reading the dataset, preprocessing images, constructing labels) and gradually wrap upward into higher-level functions

I lean toward the latter:

I first thought through which modules the whole project needs, then created the following empty files:

backbone_network.py, label_construction.py, data_augmentation.py, data_iteration.py, loss.py, evaluate.py, greedy_decoding.py, instance_association.py, model.py, train.py

Creating the empty files first pushes you to actually do the reproduction; change them later as problems come up.

Starting with label_construction.py

The real essence of PersonLab is how it takes the label information COCO provides and, using human intuition and knowledge, reworks it into geometric supervision signals: heatmaps, short-range offsets, Hough score maps, mid-range pairwise offsets, long-range offsets, and person masks. So this part is also the key step of the reproduction.
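To set expectations, here is a quick sketch of the label tensor shapes constructed later in this post (COCO has 17 keypoint types; the image size and edge count below are placeholders of my choosing):

```python
import numpy as np

H, W, K, E = 600, 600, 17, 16            # image size, keypoint types, skeleton edges
heatmaps = np.zeros((H, W, K))           # binary disks around each keypoint
short_offsets = np.zeros((H, W, K, 2))   # per-pixel offsets inside each disk
mid_offsets = np.zeros((H, W, E, 2))     # start-disk pixel -> end keypoint
```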

对了, 有一点需要强调, 复现前要仔细阅读论文的实验部分(Experiment)的描述.我读了一些关键的描述语言, 后面如果涉及到再详细说明

Moving on.

The COCO keypoint detection task provides its labels in .json format. Take person_keypoints_val2017.json as an example: among other fields it contains images and annotations,

  • images provides each image's id (the image_id referenced by annotations), file_name, height, and width
  • annotations provides the annotation entries; multiple entries may come from the same image (it's a multi-person problem, after all), and each entry looks like this:
    {"segmentation": [[76,46.53,省略,31.03,99,省略,46.03]],
    "num_keypoints": 15,
    "area": 2404.375,
    "iscrowd": 0,
    "keypoints": [102,50,1,0,0,0,101,46,2,0,0,0,97,46,2,82,44,2,91,49,2,97,43,2,109,66,2,112,43,2,128,73,2,71,74,2,76,79,2,94,65,2,110,81,2,84,90,2,129,99,2],"image_id": 149770,
    "bbox": [65,31.03,81,77.5],
    "category_id": 1,
    "id": 427983}
Our goal is to find all the instances in each image and construct the supervision signals mentioned above, mainly from the coordinate information in their keypoints and segmentation; the bbox coordinates are actually not needed.

Two points from the paper deserve attention:

When using the segmentation, note the special-case handling the authors describe:

> we back-propagate across the full image, only excluding areas that contain people that have not been fully annotated with keypoints (person crowd areas and small scale person segments in the COCO dataset)

This means we have to handle the iscrowd==1 case: when computing the loss, the iscrowd==1 regions must be masked out.

In addition, in the paper's "Imputing missing keypoint annotations" section, the authors explain:

> The standard COCO dataset does not contain keypoint annotations in the training set for the small person instances, and ignores them during model evaluation. However, it contains segmentation annotations and evaluates mask predictions for those small instances. Since training our geometric embeddings requires keypoint annotations for training, we have run the single-person pose estimator of [G-RMI] (trained on COCO data alone) in the COCO training set on image crops around the ground truth box annotations of those small person instances to impute those missing keypoint annotations.

Since I don't plan to use another model to predict keypoints for small-scale persons for now, we simply ignore the small instance segmentations here.
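Below is a minimal sketch (my own, not the official implementation) of how such an ignore mask could be built and applied at loss time, covering both the iscrowd==1 regions and the person segments without keypoint annotations:

```python
import numpy as np

def build_ignore_mask(coco, anns, height, width):
    # True wherever the loss should NOT be back-propagated:
    # crowd regions and person segments without keypoint annotations
    ignore = np.zeros((height, width), dtype=bool)
    for ann in anns:
        if ann['iscrowd'] == 1 or ann['num_keypoints'] == 0:
            ignore |= coco.annToMask(ann).astype(bool)
    return ignore

def masked_l1(pred, target, ignore):
    # average the loss over the valid (non-ignored) pixels only
    valid = ~ignore
    return np.abs(pred - target)[valid].mean()
```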


```python
import numpy as np
from pycocotools.coco import COCO
import os
import cv2

def get_coco_annotations(ann_dir, img_dir, mode='val'):
    ann_file = os.path.join(ann_dir, 'person_keypoints_{}2017.json'.format(mode))
    coco = COCO(ann_file)
    image_id_list = coco.getImgIds()

    for img_id in image_id_list:

        anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
        if len(anns) == 0:
            continue

        file_name = coco.imgs[img_id]['file_name']
        file_path = os.path.join(img_dir, mode + '2017', file_name)
        img = cv2.imread(file_path)
        h, w, c = img.shape

        # crowd_mask = np.zeros((h, w), dtype='bool')
        instance_masks = []
        keypoints_skeletons = []

        for ann in anns:
            if ann['area'] == 0:
                continue
            mask = coco.annToMask(ann)

            if ann['iscrowd'] == 1:
                # TODO: the paper masks crowd regions out of the loss
                continue
            if ann['num_keypoints'] == 0:
                # TODO: the paper imputes these missing keypoints with G-RMI
                continue

            keypoints_skeletons.append(ann['keypoints'])
            instance_masks.append(mask)

        yield img, keypoints_skeletons, instance_masks

ann_dir = '/data/dataset/coco/annotations/'
img_dir = '/data/dataset/coco/images/'

c = get_coco_annotations(ann_dir, img_dir)

print(c)
```

```
<generator object get_coco_annotations at 0x0000024759D61518>
```

The above constructs a generator that iteratively yields (img, keypoints_skeletons, instance_masks) tuples.
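A minimal usage sketch (assuming the COCO paths above exist): the generator yields COCO's flat 51-number keypoint lists, which the functions below expect reshaped to (num_people, 17, 3), i.e. (x, y, visibility) per keypoint:

```python
img, keypoints_flat, instance_masks = next(c)
# reshape COCO's flat [x1, y1, v1, x2, y2, v2, ...] lists to (N, 17, 3)
keypoints_skeletons = np.array(keypoints_flat).reshape(-1, 17, 3)
print(img.shape, keypoints_skeletons.shape, len(instance_masks))
```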

What comes next is re-encoding these raw labels into a more geometric supervision representation.

One thing to be clear about: the offset representations constructed in the paper live at the quantization precision of the original image size, not at the size of the feature map the network directly outputs. The network's output is downsampled, so both the predicted heatmaps and the offsets are upsampled to restore them to the original image's resolution.

So when we encode the supervision representation, we need to reference the original image's height and width.
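For illustration, a hypothetical decoding-side sketch (names made up): if the network predicts at 1/8 of the input resolution, its outputs get resized back to the input size before decoding. Since the labels below encode offsets in original-image pixels, the resized values need no extra rescaling:

```python
import numpy as np
import cv2

H, W = 600, 600
pred_small = np.random.rand(H // 8, W // 8, 17).astype(np.float32)  # fake network output
pred_full = cv2.resize(pred_small, (W, H), interpolation=cv2.INTER_LINEAR)
print(pred_full.shape)  # (600, 600, 17)
```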

Based on each keypoint's position, construct a circular disk region centered on it with radius r; positions inside the region take the value 1 and positions outside take 0.

The most direct approach is to iterate over every position of the heatmap and decide, by computing distances, whether it falls within some keypoint's radius, which costs O(H*W) checks per keypoint. Geometric intuition says most of the area lies outside the disks, so is there a shortcut? The approach I take here is:

the Gaussian-kernel trick: place a Gaussian distribution at each keypoint's position. The distribution is centrally symmetric, so by picking a threshold (the Gaussian's value at the radius), positions above the threshold are set to 1 and those below to 0.

The Gaussian trick avoids a hand-written sweep over every heatmap position. Here I use OpenCV's cv2.GaussianBlur(), which is itself a filter and still traverses the map in O(H*W), but the loop runs inside the OpenCV library. (Should I benchmark this, given that the filter still sweeps the entire heatmap? Is there an off-the-shelf function that inserts a Gaussian kernel directly at specified positions?)

```python
def disk_mask_heatmap(one_hot_heatmap, radius):

    # GaussianBlur requires an odd kernel size
    if radius % 2 == 0:
        radius = radius - 1

    heatmap = cv2.GaussianBlur(one_hot_heatmap, ksize=(radius, radius), sigmaX=1, sigmaY=1)

    def get_threshold_value(ksize):
        # In order to get a circular (disk) mask, we need to find a threshold
        # value `t` within the Gaussian kernel: v = 1 if v > t else 0.
        # Sample the value at position: circle center + radius // 2
        # to get an approximate disk of the given radius.
        square = np.zeros(shape=(ksize[0] * 2, ksize[1] * 2))
        center = square.shape[0] // 2, square.shape[1] // 2
        square[center[0], center[1]] = 1
        gaussian = cv2.GaussianBlur(square, ksize, sigmaX=1, sigmaY=1)

        circle_border_value = gaussian[center[0], center[1] + ksize[1] // 2]  # border of the circle

        return circle_border_value

    threshold = get_threshold_value(ksize=(radius, radius))
    heatmap = (heatmap >= threshold)
    return heatmap
```
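As for the parenthetical question above: one direct alternative I can think of (my own sketch, not from the paper) is to precompute a single (2r+1)x(2r+1) disk patch and paste it at each keypoint with array slicing, which costs O(r^2) per keypoint instead of a full-map blur:

```python
import numpy as np

def stamp_disk_heatmap(keypoints, map_shape, radius):
    # paste one precomputed disk patch per keypoint, clipped at the borders
    H, W = map_shape
    heatmap = np.zeros((H, W))
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    patch = (xx ** 2 + yy ** 2 <= radius ** 2).astype(float)
    for (kx, ky) in keypoints:
        x0, x1 = max(kx - radius, 0), min(kx + radius + 1, W)
        y0, y1 = max(ky - radius, 0), min(ky + radius + 1, H)
        heatmap[y0:y1, x0:x1] = np.maximum(
            heatmap[y0:y1, x0:x1],
            patch[y0 - (ky - radius):y1 - (ky - radius),
                  x0 - (kx - radius):x1 - (kx - radius)])
    return heatmap

# e.g. stamp_disk_heatmap([(28, 228), (161, 361)], (600, 600), 32)
```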

The method above solves the problem of generating the disks, but when it comes to constructing the short-range offsets, we must find the positions within each keypoint's radius to compute the offsets. This demands code that, given a keypoint position, can retrieve the corresponding surrounding positions. In other words: while generating a given disk, also record the positions of all pixels inside it and their offsets relative to the center.

But the code above doesn't process keypoints one by one; it generates everything in one shot, so it cannot satisfy this requirement.

Here's a trick for getting position indices on a heatmap. For example, given an HxW heatmap:

```python
import numpy as np

H, W = 3, 3
map_shape = (H, W)
# idx[y, x] == (x, y): the (x, y) coordinate of every heatmap position
idx = np.rollaxis(np.indices(map_shape[::-1]), 0, 3).transpose((1, 0, 2))
print(idx.shape)
print(idx)
print(idx[1, 1, 1] - 2)
```

```
(3, 3, 2)
[[[0 0]
  [1 0]
  [2 0]]

 [[0 1]
  [1 1]
  [2 1]]

 [[0 2]
  [1 2]
  [2 2]]]
-1
```

This gives us the coordinate index of every position on the heatmap. Now back to constructing the disk regions.

## Getting the disk region each keypoint influences

```python
def get_keypoint_discs(all_keypoints, map_shape, K=17, radius=4):

    # per-pixel (x, y) coordinates, shape (H, W, 2)
    idx = np.rollaxis(np.indices(map_shape[::-1]), 0, 3).transpose((1, 0, 2))

    discs = [[] for _ in range(len(all_keypoints))]
    for i in range(K):

        # centers of keypoint type i over all people with that keypoint labeled
        centers = [keypoints[i, :2] for keypoints in all_keypoints if keypoints[i, 2] > 0]
        dists = np.zeros(map_shape + (len(centers),))

        for k, center in enumerate(centers):
            dists[:, :, k] = np.sqrt(np.square(center - idx).sum(axis=-1))
        if len(centers) > 0:
            # assign each pixel to its nearest instance so discs never overlap
            inst_id = dists.argmin(axis=-1)
        count = 0
        for j in range(len(all_keypoints)):
            if all_keypoints[j][i, 2] > 0:
                discs[j].append(np.logical_and(inst_id == count, dists[:, :, count] <= radius))
                count += 1
            else:
                discs[j].append(np.array([]))

    # discs: nested list, discs[n][k] is the boolean (H, W) disk mask of
    # person n's keypoint k (kept as a ragged list: some entries are empty)
    return discs
```

The returned discs give, for each person and each of the K keypoint types, the mask of all pixels inside the disk around that keypoint (with each pixel assigned to its nearest instance). That satisfies the requirement.
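For example, assuming person 0 has keypoint type 3 annotated:

```python
mask = discs[0][3]           # boolean (H, W) disk mask
ys, xs = np.nonzero(mask)    # coordinates of the pixels inside that disk
```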

We can then design functions that, given the positions of all keypoints of all people (implementations below):

  • get the disk region around each keypoint's position and assign values inside it, yielding the heatmaps
  • get the position index of every pixel inside each keypoint's disk and subtract the center position in x and y, yielding the short-range offsets
  • get the position index of every pixel inside the start keypoint's disk and subtract those pixel indices from the end keypoint's position, yielding the mid-range offsets
```python
import numpy as np
import os
import cv2
import matplotlib.pyplot as plt

def get_keypoint_discs(all_keypoints, map_shape, kpts_num=17, radius=4):
    print(all_keypoints)  # debug print; its output appears below

    # per-pixel (x, y) coordinates, shape (H, W, 2)
    idx = np.rollaxis(np.indices(map_shape[::-1]), 0, 3).transpose((1, 0, 2))

    discs = [[] for _ in range(len(all_keypoints))]
    for i in range(kpts_num):

        centers = [keypoints[i, :2] for keypoints in all_keypoints if keypoints[i, 2] > 0]
        dists = np.zeros(map_shape + (len(centers),))

        for k, center in enumerate(centers):
            dists[:, :, k] = np.sqrt(np.square(center - idx).sum(axis=-1))
        if len(centers) > 0:
            # each pixel belongs to its nearest instance, so discs never overlap
            inst_id = dists.argmin(axis=-1)
        count = 0
        for j in range(len(all_keypoints)):
            if all_keypoints[j][i, 2] > 0:
                discs[j].append(np.logical_and(inst_id == count, dists[:, :, count] <= radius))
                count += 1
            else:
                discs[j].append(np.array([]))

    # discs: nested list, discs[n][k] is the boolean (H, W) disk mask of
    # person n's keypoint k (kept as a ragged list: some entries are empty)
    return discs

def kpts_maps(keypoints_skeletons, discs, map_shape, kpts_num=17):
    kpts_maps = np.zeros(map_shape + (kpts_num,))

    for n in range(len(discs)):
        for k in range(kpts_num):
            if keypoints_skeletons[n, k, 2] > 0.:
                disk_indices = discs[n][k]
                kpts_maps[disk_indices, k] = 1.0

    return kpts_maps

def short_offsets(keypoints_skeletons, discs, map_shape, kpts_num=17):
    short_offsets = np.zeros(map_shape + (kpts_num, 2,))
    # [H, W, 2]: each pixel's (x, y) position index
    pixels_indices = np.rollaxis(np.indices(map_shape[::-1]), 0, 3).transpose((1, 0, 2))

    for n in range(len(discs)):
        for k in range(kpts_num):
            if keypoints_skeletons[n, k, 2] > 0:
                disk_indices = discs[n][k]

                kpt_x = keypoints_skeletons[n][k, 0]
                kpt_y = keypoints_skeletons[n][k, 1]

                # note: the paper defines short offsets as keypoint - pixel;
                # the sign here is flipped, which is harmless for the magnitude
                # visualization but must be kept consistent at decoding time
                short_offsets[disk_indices, k, 0] = pixels_indices[disk_indices, 0] - kpt_x
                short_offsets[disk_indices, k, 1] = pixels_indices[disk_indices, 1] - kpt_y

    return short_offsets

def mid_range_offsets(pair_wise_kpts, keypoints_skeletons, discs, map_shape, kpts_num=17):
    """
    pair_wise_kpts: the edges of the tree-structured skeleton (for COCO,
    `kpts_num - 1` of them), such as
    [[kpt_shoulder_r, kpt_ankle_r], [kpt_shoulder_r, nose], ...]

    discs: number of people x kpts_num [disk mask of the specified keypoint]

    return [H, W, number of edges, 2] (doubled if reverse edges are enabled)
    """
    directed_edges = []
    for edge in pair_wise_kpts:
        directed_edges.append(edge)
        # directed_edges.append(edge[::-1])  # uncomment to add reverse edges

    mid_offsets = np.zeros(map_shape + (len(directed_edges), 2,))

    # [H, W, 2]: each pixel's (x, y) position index
    pixels_indices = np.rollaxis(np.indices(map_shape[::-1]), 0, 3).transpose((1, 0, 2))

    for n in range(len(discs)):
        for edge_id, direct_edge in enumerate(directed_edges):
            begin_kpt_id, end_kpt_id = direct_edge
            begin_kpt = keypoints_skeletons[n][begin_kpt_id]
            end_kpt = keypoints_skeletons[n][end_kpt_id]

            if begin_kpt[2] > 0 and end_kpt[2] > 0:
                disk_indices = discs[n][begin_kpt_id]
                mid_offsets[disk_indices, edge_id, 0] = end_kpt[0] - pixels_indices[disk_indices, 0]
                mid_offsets[disk_indices, edge_id, 1] = end_kpt[1] - pixels_indices[disk_indices, 1]

    return mid_offsets

def visualize():
    # keypoints_skeletons1 = np.array([[[28,28,1],[61,61,1],[155,155,1]],[[20,20,1],[55,55,1],[6,6,1]]])
    keypoints_skeletons = np.array(
        [
            [[28, 228, 1], [161, 361, 1], [55, 455, 1], [64, 54, 1], [368, 550, 1]],
            # [[20, 120, 1], [55, 55, 1], [6, 6, 1]]
        ]
    )

    pair_wise_kpts = [[0, 1], [1, 2], [2, 4], [3, 4]]
    print("keypoints coordinates:\n", keypoints_skeletons, "\npair wise keypoints:\n", pair_wise_kpts)

    map_shape = (600, 600)
    kpts_num = 5
    radius = 32

    discs = get_keypoint_discs(keypoints_skeletons, map_shape, kpts_num, radius)

    kpts_heatmaps = kpts_maps(keypoints_skeletons, discs, map_shape, kpts_num)
    visual_kpts_heatmaps = np.amax(kpts_heatmaps, axis=-1)
    short = short_offsets(keypoints_skeletons, discs, map_shape, kpts_num)

    offsets_magnitude = np.sqrt(np.square(short).sum(axis=-1))
    visual_offsets_magnitude = np.max(offsets_magnitude, axis=-1)

    # show mid_range_offsets
    mid_offsets = mid_range_offsets(pair_wise_kpts,
                                    keypoints_skeletons,
                                    discs,
                                    map_shape,
                                    kpts_num)

    # mid_offsets_edge = mid_offsets[:, :, 0::2, :]  # directed edges
    mid_offsets_edge = mid_offsets  # undirected edges

    mid_offsets_edge = mid_offsets_edge.astype('int')

    # (H, W, 2)
    pixels_indices = np.rollaxis(np.indices(map_shape[::-1]), 0, 3).transpose((1, 0, 2))

    # canvas
    background = np.zeros(map_shape + (3,))

    for n in range(len(discs)):
        for edge_id, edge in enumerate(pair_wise_kpts):
            if keypoints_skeletons[n, edge[0], 2] > 0 and keypoints_skeletons[n, edge[1], 2] > 0:
                begin_disk_indices = discs[n][edge[0]]

                for (x, y) in pixels_indices[begin_disk_indices]:
                    # sample the disk sparsely for better visualization
                    if x % 8 == 0 and y % 8 == 0:
                        begin_kpt = (int(x), int(y))
                        # note: index with [y, x, ...], not [x, y, ...]
                        end_kpt = (int(x + mid_offsets_edge[y, x, edge_id, 0]),
                                   int(y + mid_offsets_edge[y, x, edge_id, 1]))

                        cv2.line(background, begin_kpt, end_kpt, (0, 0, 255), thickness=2)

    fig = plt.figure(figsize=(12, 12))
    fig.add_subplot(1, 3, 1)
    plt.imshow(visual_kpts_heatmaps)
    plt.title("heatmaps of keypoints")

    fig.add_subplot(1, 3, 2)
    plt.imshow(visual_offsets_magnitude)
    plt.title("the magnitude of short offsets")

    fig.add_subplot(1, 3, 3)
    plt.imshow(background)
    plt.title("mid_range_offsets of keypoints")

    plt.savefig('visualization.png')
    plt.show()

visualize()
```
```
keypoints coordinates:
 [[[ 28 228   1]
  [161 361   1]
  [ 55 455   1]
  [ 64  54   1]
  [368 550   1]]] 
pair wise keypoints:
 [[0, 1], [1, 2], [2, 4], [3, 4]]
[[[ 28 228   1]
  [161 361   1]
  [ 55 455   1]
  [ 64  54   1]
  [368 550   1]]]
```

Matplotlib also warns "Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers)", since the canvas is a float array drawn with 0-255 color values.

(Figure: visualization.png — left: keypoint heatmaps; middle: short-offset magnitudes; right: mid-range offsets.)