YOLOv3

Architecture

Darknet53 implementation

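import torch.nn as nn

class Darknet53(nn.Module):
    def __init__(self):
        super().__init__()
        # The layers before layer1 (which must output 32 channels) are not shown in this excerpt.
        self.layer1 = self._make_layer(32, 64, 1)      # 3 conv layers
        self.layer2 = self._make_layer(64, 128, 2)     # 5 conv layers
        self.layer3 = self._make_layer(128, 256, 8)    # 17 conv layers
        self.layer4 = self._make_layer(256, 512, 8)    # 17 conv layers
        self.layer5 = self._make_layer(512, 1024, 4)   # 9 conv layers

    def _make_layer(self, in_channels, out_channels, num_blocks):
        layers = []
        # Stride-2 conv halves the spatial size, e.g. channels 32 -> 64.
        layers.append(ConvBlock(in_channels, out_channels, 3, stride=2))
        for _ in range(num_blocks):
            # Inside each block the channels go e.g. 64 -> 32, then 32 -> 64.
            layers.append(ResBlock(out_channels))
        return nn.Sequential(*layers)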
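The layer counts in the comments above can be checked with a little arithmetic. This is a rough sketch, assuming each stage is one stride-2 conv plus num_blocks residual blocks of two convs each, as in _make_layer. The stem conv and the final fully connected layer are assumptions taken from the usual Darknet-53 description, not from the excerpt, and stage_conv_layers is a hypothetical helper name.

```python
# Number of residual blocks per stage, as passed to _make_layer.
NUM_BLOCKS = [1, 2, 8, 8, 4]

def stage_conv_layers(num_blocks):
    """One stride-2 ConvBlock plus two ConvBlocks per ResBlock."""
    return 1 + 2 * num_blocks

per_stage = [stage_conv_layers(n) for n in NUM_BLOCKS]
print(per_stage)  # [3, 5, 17, 17, 9]

# Assumption: one stem conv before layer1 and one fully connected layer at the end.
total = 1 + sum(per_stage) + 1
print(total)  # 53
```

Under those assumptions the total comes to 53, which is where the name Darknet53 comes from.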

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Bottleneck: a 1x1 conv halves the channels, then a 3x3 conv restores them.
        reduced_channels = channels // 2
        self.layer1 = ConvBlock(channels, reduced_channels, 1)
        # No activation on the second conv; it is applied after the residual add.
        self.layer2 = ConvBlock(reduced_channels, channels, 3, activation=False)
        self.activation = nn.LeakyReLU(0.1)

    def forward(self, x):
        residual = x                             # keep the input for the skip connection
        out = self.layer1(x)                     # 1x1 conv: channels -> channels // 2
        out = self.layer2(out)                   # 3x3 conv: channels // 2 -> channels
        out = self.activation(out + residual)    # add the skip, then LeakyReLU
        return out
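To see what the 1x1 bottleneck in ResBlock buys, here is a rough weight count. It is only a sketch: it ignores bias and batch-norm parameters (ConvBlock is not shown in the post, so its internals are an assumption), and plain_block_weights is a hypothetical baseline of two full-width 3x3 convs, not something YOLOv3 uses.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Weight count of a square-kernel conv, ignoring bias and batch-norm parameters."""
    return in_channels * out_channels * kernel_size * kernel_size

def resblock_weights(channels):
    """1x1 conv (C -> C/2) followed by a 3x3 conv (C/2 -> C), as in ResBlock."""
    reduced = channels // 2
    return conv_weights(channels, reduced, 1) + conv_weights(reduced, channels, 3)

def plain_block_weights(channels):
    """Hypothetical baseline: two full-width 3x3 convs (C -> C -> C)."""
    return 2 * conv_weights(channels, channels, 3)

for c in (64, 128, 256, 512, 1024):
    print(c, resblock_weights(c), plain_block_weights(c))
```

For every width the bottleneck needs 5 * C^2 weights against 18 * C^2 for the baseline, so it is about 3.6 times smaller.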

A look at Yolo_block

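Each pass through a strided convolution shrinks the feature map.
On a small feature map, small objects are hard to detect.
YOLOv3 therefore also makes predictions from the intermediate feature maps.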

When the output is 13 x 13:

The feature map passes through 5 convs.

The head outputs a 13 x 13 x 255 result.
The 5-conv output is also up-sampled to 26 x 26 for the next scale.

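When the output is 26 x 26:

The 26 x 26 backbone feature map is concatenated with the up-sampled 26 x 26 map.
The result passes through 5 convs.

The head outputs a 26 x 26 x 255 result.
The 5-conv output is also up-sampled to 52 x 52 for the next scale.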

When the output is 52 x 52:

The 52 x 52 backbone feature map is concatenated with the up-sampled 52 x 52 map.
The result passes through 5 convs.

The head outputs a 52 x 52 x 255 result.
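The up-sample and concat steps above can be sketched on toy data without any framework. The assumptions here: up-sampling is 2x nearest-neighbour, feature maps are stored as lists of 2-D channels, and upsample2x and concat_channels are hypothetical helper names. In a real model this would be done with the framework's own upsample and concat ops.

```python
def upsample2x(channel):
    """Nearest-neighbour 2x up-sampling of one 2-D channel (a list of rows)."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out

def concat_channels(a, b):
    """Channel-wise concat of two feature maps stored as lists of 2-D channels."""
    same_height = len(a[0]) == len(b[0])
    same_width = len(a[0][0]) == len(b[0][0])
    if not (same_height and same_width):
        raise ValueError("spatial sizes must match before concat")
    return a + b

# Toy example: a 1-channel 2x2 "deep" map and a 2-channel 4x4 "backbone" map.
deep = [[[1, 2], [3, 4]]]
backbone = [[[0] * 4 for _ in range(4)] for _ in range(2)]

upsampled = [upsample2x(ch) for ch in deep]
merged = concat_channels(upsampled, backbone)
print(upsampled[0])
print(len(merged), len(merged[0]), len(merged[0][0]))  # 3 4 4
```

The concat only works because up-sampling first brings the deep map to the same height and width as the backbone map. The channel counts simply add up.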

Head comparison across YOLO versions

YOLOv1:

7 x 7 x 30
(2 x 5) + 20 = 30 channels per grid cell
Each grid cell predicts 2 bboxes and 1 set of class scores shared by both boxes.
98 bboxes per image (= 49 * 2)

YOLOv2:

13 x 13 x 125
5 x (5 + 20) = 125 channels per grid cell
Each grid cell predicts 5 bboxes, with class scores per bbox.
845 bboxes per image (= 13 * 13 * 5)

YOLOv3:

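13 x 13 x 255, 26 x 26 x 255, 52 x 52 x 255
One grid cell corresponds to 32x32 [large objects], 16x16, and 8x8 [small objects] pixels of the original image, respectively.
So the 13 x 13 x 255 head handles large objects and the 52 x 52 x 255 head handles small ones.
3 x (5 + 80) = 255 channels per grid cell
Each grid cell predicts 3 bboxes, with class scores per bbox.
10,647 bboxes per image = 507 bboxes + 2,028 bboxes + 8,112 bboxes
Anchor boxes: 3 per scale, so 9 kinds in total are needed.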
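The numbers in this comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch: head_channels, total_boxes and the versions table are hypothetical names introduced for illustration, and the class counts (20 and 80) are the ones that appear in the formulas above.

```python
def head_channels(boxes_per_cell, num_classes, per_box_classes):
    """Depth of the head output for one grid cell."""
    if per_box_classes:
        return boxes_per_cell * (5 + num_classes)   # YOLOv2 / YOLOv3: classes per bbox
    return boxes_per_cell * 5 + num_classes         # YOLOv1: classes shared per cell

def total_boxes(grid_sizes, boxes_per_cell):
    """Number of predicted bboxes per image, summed over all output grids."""
    return sum(s * s * boxes_per_cell for s in grid_sizes)

versions = {
    "YOLOv1": dict(grids=[7], boxes=2, classes=20, per_box=False),
    "YOLOv2": dict(grids=[13], boxes=5, classes=20, per_box=True),
    "YOLOv3": dict(grids=[13, 26, 52], boxes=3, classes=80, per_box=True),
}

summary = {
    name: (
        head_channels(v["boxes"], v["classes"], v["per_box"]),
        total_boxes(v["grids"], v["boxes"]),
    )
    for name, v in versions.items()
}

for name, (channels, boxes) in summary.items():
    print(name, channels, boxes)
```

This prints 30 and 98 for YOLOv1, 125 and 845 for YOLOv2, and 255 and 10647 for YOLOv3, matching the figures listed above.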
