Real-Time Object Detection

Architecture

Web server 1 (camera)

- Windows > conda

Web server 2 (AI, object detection)

- Windows > WSL > conda

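There are three ways for the AI web server to get access to the camera:

- If the camera machine has a static IP, the AI server can reach it through port forwarding.

- Browser camera access does not work over plain HTTP (except from localhost). It works over HTTPS, but that requires an SSL certificate.

- Web server 1 (camera) periodically delivers its frames to the AI web server.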
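The third approach can be pictured as a simple paced loop: read a frame, send it, wait, repeat. The sketch below assumes a push-style variant in which the camera side POSTs raw JPEG bytes to a hypothetical `/upload_frame` endpoint on the AI server. `push_frames`, `http_sender`, and that endpoint are illustrative names, not part of the original. The exercise below instead has web server 2 pull a continuous stream from web server 1, but the pacing idea is the same.

```python
import time
import urllib.request
from typing import Callable, Optional


def push_frames(read_frame: Callable[[], Optional[bytes]],
                send: Callable[[bytes], None],
                interval_s: float,
                max_frames: int,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Read one JPEG frame, send it, then wait interval_s seconds.

    Stops when read_frame() returns None (camera closed or read failed)
    or after max_frames frames. Returns the number of frames sent.
    """
    sent = 0
    while sent < max_frames:
        frame = read_frame()
        if frame is None:
            break
        send(frame)
        sent += 1
        sleep(interval_s)
    return sent


def http_sender(url: str, timeout_s: float = 5.0) -> Callable[[bytes], None]:
    """Return a send() that POSTs JPEG bytes to url (a hypothetical endpoint)."""
    def send(jpeg: bytes) -> None:
        req = urllib.request.Request(url, data=jpeg, method="POST",
                                     headers={"Content-Type": "image/jpeg"})
        with urllib.request.urlopen(req, timeout=timeout_s):
            pass  # the response body is not needed
    return send
```

On the camera side, `read_frame` would wrap `cap.read()` followed by `cv2.imencode(".jpg", frame)`. Injecting `read_frame`, `send`, and `sleep` keeps the loop testable without a camera or a network.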

Frame transfer exercise

Web server 1 (camera)

conda env list

conda create -n alpacoms python=3.8.3

conda activate alpacoms

pip install fastapi

pip install "uvicorn[standard]"

pip install opencv-python
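To confirm the packages were installed into the activated `alpacoms` environment (and not into another Python), a quick check can be run with that environment's `python`. This is a sketch; the module list is an assumption derived from the pip commands above, and note that `opencv-python` is imported as `cv2`.

```python
import importlib.util
from typing import Iterable, List

# Import names, not pip names: opencv-python is imported as cv2.
REQUIRED_MODULES = ["fastapi", "uvicorn", "cv2"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all required modules are installed")
```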

Save the code below as webcam_server.py and run it from cmd: python webcam_server.py

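import cv2
import uvicorn
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
# Open the first webcam (device index 0) on the Windows machine.
cap = cv2.VideoCapture(0)
def generate_frames():
    # Read frames forever and emit each one as a part of a multipart (MJPEG) stream.
    while True:
        success, frame = cap.read()
        if not success:  # camera unplugged or in use by another program
            break
        ok, buffer = cv2.imencode(".jpg", frame)
        if not ok:  # skip a frame that failed to encode
            continue
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + buffer.tobytes() + b'\r\n')
@app.get("/video")
def video_feed():
    # Each new part replaces the previous image on the client side.
    return StreamingResponse(generate_frames(),
                             media_type="multipart/x-mixed-replace; boundary=frame")
if __name__ == "__main__":
    # 0.0.0.0 (not localhost) so that web server 2 inside WSL can connect.
    uvicorn.run(app, host="0.0.0.0", port=8000)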
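The `multipart/x-mixed-replace; boundary=frame` response is a sequence of parts, each starting with a `--frame` boundary line, followed by a `Content-Type` header, a blank line, the JPEG bytes, and a trailing CRLF; the client shows each new part in place of the previous one. The sketch below (helper names `make_part` and `split_parts` are hypothetical) reproduces the framing used by `generate_frames()` and parses it back. It assumes a complete byte string and does not handle a JPEG payload that happens to contain the boundary bytes.

```python
from typing import List

BOUNDARY = b"frame"  # must match "boundary=frame" in the media_type


def make_part(jpeg: bytes, boundary: bytes = BOUNDARY) -> bytes:
    """Wrap one JPEG the same way generate_frames() does."""
    return (b"--" + boundary + b"\r\n"
            b"Content-Type: image/jpeg\r\n\r\n" + jpeg + b"\r\n")


def split_parts(stream: bytes, boundary: bytes = BOUNDARY) -> List[bytes]:
    """Recover the JPEG payloads from a complete multipart byte string."""
    payloads = []
    for chunk in stream.split(b"--" + boundary + b"\r\n"):
        if not chunk:
            continue  # text before the first boundary is empty here
        _headers, sep, body = chunk.partition(b"\r\n\r\n")
        if not sep:
            continue  # no header/body separator: not a complete part
        # Drop the CRLF that terminates each part.
        payloads.append(body[:-2] if body.endswith(b"\r\n") else body)
    return payloads
```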

Web server 2 (receives the frames)

webcamtest.ipynb

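import cv2
import matplotlib.pyplot as plt
# 172.28.80.1 is the Windows host's IP as seen from WSL (win_ip in camera.py); adjust it to your setup.
cap = cv2.VideoCapture("http://172.28.80.1:8000/video")

success, frame = cap.read()
assert success, "could not read a frame from web server 1"

# OpenCV returns BGR images, matplotlib expects RGB, so convert before displaying.
plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))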
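The IP 172.28.80.1 is hardcoded, and in WSL2's default NAT networking the Windows host's address on the virtual network can change after a restart. A common way to look it up from inside WSL is the default gateway reported by `ip route show default`. The sketch below assumes NAT mode (in mirrored networking mode, `localhost` reaches Windows instead), and the function names are hypothetical. The Windows firewall may also need to allow inbound connections on port 8000.

```python
import re
import subprocess
from typing import Optional

_DEFAULT_VIA = re.compile(r"^default\s+via\s+(\d{1,3}(?:\.\d{1,3}){3})\b", re.MULTILINE)


def parse_default_gateway(ip_route_output: str) -> Optional[str]:
    """Extract the gateway IP from `ip route show default` output, or None."""
    match = _DEFAULT_VIA.search(ip_route_output)
    return match.group(1) if match else None


def windows_host_ip() -> Optional[str]:
    """Return WSL2's default gateway, which is the Windows host in NAT mode."""
    try:
        out = subprocess.run(["ip", "route", "show", "default"],
                             capture_output=True, text=True, check=False).stdout
    except OSError:  # `ip` is not available (e.g. not running under Linux)
        return None
    return parse_default_gateway(out)


if __name__ == "__main__":
    ip = windows_host_ip()
    print(f"http://{ip}:8000/video" if ip else "no default route found")
```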

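Real-time detection exercise

Web server 1 (camera) from the frame transfer exercise above is reused as-is.

main.py

1. '/' returns index.html, whose <img> tag requests /video_feed.

2. /video_feed streams the output of gen().

- gen() repeatedly fetches frames with the detection results drawn on them (from VideoCamera in camera.py).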
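The flow from `/video_feed` through `gen()` can be checked without a camera or a model. In the sketch below, a hypothetical `FakeCamera` stands in for `VideoCamera` and returns canned bytes instead of JPEGs; `gen()` has the same body as in main.py, and `islice` plays the role of a client reading the first parts of the stream.

```python
from itertools import islice
from typing import Iterable, Iterator


class FakeCamera:
    """Hypothetical stand-in for VideoCamera: get_frame() returns canned bytes."""

    def __init__(self, frames: Iterable[bytes]):
        self._frames = iter(frames)

    def get_frame(self) -> bytes:
        return next(self._frames)


def gen(camera) -> Iterator[bytes]:
    # Same framing as main.py: boundary line, header, blank line, JPEG, trailer.
    while True:
        frame = camera.get_frame()
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + frame + b"\r\n\r\n")


if __name__ == "__main__":
    for part in islice(gen(FakeCamera([b"jpg1", b"jpg2"])), 2):
        print(part)
```

Because `gen()` is a generator, StreamingResponse pulls one part at a time, so the camera is only read as fast as the client consumes frames.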

# Requires (in the WSL conda env): pip install fastapi "uvicorn[standard]" jinja2 ultralytics opencv-python
from camera import VideoCamera
import uvicorn
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates
from starlette.responses import StreamingResponse
app = FastAPI()
templates = Jinja2Templates(directory="templates")  # index.html lives in ./templates
@app.get('/')  # the first page the user opens
def index(request: Request):
    # Page on which the object-detection result is shown.
    return templates.TemplateResponse("index.html", {'request': request})
def gen(camera):  # camera: a VideoCamera object from camera.py
    while True:  # keep streaming for as long as the client stays connected
        # Get one frame (JPEG bytes) with the detection results drawn on it.
        frame = camera.get_frame()
        # yield hands back one multipart part per loop iteration instead of returning once.
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n\r\n')
# The <img> tag in index.html loads its image from this URL.
@app.get('/video_feed')
def video_feed():
    # StreamingResponse sends everything gen() yields as one continuous HTTP response.
    # gen() receives a new VideoCamera object (imported from camera.py).
    return StreamingResponse(gen(VideoCamera()),
                             media_type='multipart/x-mixed-replace; boundary=frame')
if __name__ == '__main__':
    # Open http://localhost:8001 in a browser.
    uvicorn.run(app, host="localhost", port=8001)

index.html

<html>
  <head>
    <title>Video Streaming Demonstration</title>
  </head>
  <body>
    <h1>Video Streaming Demonstration</h1>
    <!-- A plain img tag that displays the image loaded from src; no JavaScript is needed. -->
    <!-- Each time video_feed sends a new frame, the browser replaces the displayed image. -->
    <img id="bg" src="{{ url_for('video_feed') }}">
  </body>
</html>

camera.py

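Data flow: web server 1 (camera) > web server 2 (camera.py) > web server 2 (main.py)

camera.py connects to web server 1 (camera), reads its stream with cv2.VideoCapture, runs object detection on each frame, and passes the result image (bounding boxes and class labels) to gen() in main.py.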

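from ultralytics import YOLO
import cv2
import math
# Load the pretrained YOLOv8 small model (downloaded automatically on first use).
model = YOLO('yolov8s.pt')
# COCO class names in the index order the model predicts (model.names holds the model's own names for the same indices).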
classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
                    "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
                    "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
                    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
                    "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
                    "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
                    "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
                    "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
                    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
                    "teddy bear", "hair drier", "toothbrush"
                    ]
class VideoCamera(object):
    def __init__(self):
        # VideoCapture takes a device index (0 = first webcam) or a stream URL.
        # Here it opens web server 1's stream on the Windows host (IP as seen from WSL).
        win_ip = "172.28.80.1"
        stream_url = f"http://{win_ip}:8000/video"
        self.video = cv2.VideoCapture(stream_url)
    def __del__(self):
        self.video.release()
    def get_frame(self):  # returns one JPEG frame with the detection results drawn on it
        # Each read() call returns the next frame of the stream.
        success, image = self.video.read()
        if not success:
            # Stop the stream instead of passing None to the model.
            raise RuntimeError("could not read a frame from web server 1")
        # Run the model on the frame; stream=True returns a generator of results.
        results = model(image, stream=True)
        # One input image yields one result; two images would yield two.
        for r in results:
            # One bounding box per object detected in the frame.
            boxes = r.boxes
            for box in boxes:  # draw every box onto the image
                # Box corners in pixels: top-left (x1, y1), bottom-right (x2, y2).
                x1, y1, x2, y2 = box.xyxy[0]
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values
                # Draw the box on the frame in magenta ((255, 0, 255) in BGR), 2 px thick.
                cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 255), 2)
                # Confidence of the predicted class, rounded up to two decimals.
                confidence = math.ceil(float(box.conf[0]) * 100) / 100
                # Index of the predicted class.
                cls = int(box.cls[0])
                # Label text settings: the label is drawn 10 px above the box.
                org = (x1, y1 - 10)
                font = cv2.FONT_HERSHEY_SIMPLEX
                fontScale = 1
                color = (255, 0, 0)  # blue in BGR
                thickness = 2
                cv2.putText(image, classNames[cls] + " " + str(confidence), org, font, fontScale, color, thickness)
        # Encode the frame, with boxes and labels drawn, as JPEG and return its bytes.
        return cv2.imencode('.jpg', image)[1].tobytes()
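The per-box bookkeeping in `get_frame()` (integer pixel box, label text, text origin) can be separated from the drawing calls and tested on plain numbers. The sketch below uses a hypothetical `box_annotation` helper that takes the coordinates, confidence, and class index as plain Python values rather than YOLO tensors, and mirrors the rules in the code above: truncate coordinates to int, round the confidence up to two decimals, and place the label 10 px above the top edge.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]


def box_annotation(xyxy: Sequence[float], conf: float, cls: int,
                   class_names: Sequence[str]) -> Tuple[Box, str, Tuple[int, int]]:
    """Return (integer box, label text, text origin) for one detection."""
    x1, y1, x2, y2 = (int(v) for v in xyxy)      # truncate to pixel coordinates
    confidence = math.ceil(conf * 100) / 100     # round up to two decimals
    label = f"{class_names[cls]} {confidence}"   # e.g. "car 0.88"
    org = (x1, y1 - 10)                          # bottom-left of the text
    return (x1, y1, x2, y2), label, org
```

One consequence of this rule: for a box touching the top of the frame, `y1 - 10` is negative and the label is drawn off-screen. Clamping it, for example `max(y1 - 10, 10)`, is one option that is not in the original code.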