How to detect a speaker from facial landmarks of the mouth using face_recognition
I am trying to detect who is speaking in a webcam feed using facial landmarks, which I can get from the face_recognition library. I can successfully get the mouth's top lip and bottom lip points.
I want to calculate the distance between these points and, based on that distance, decide whether the person is speaking or not. This is what I have done so far:
import face_recognition
import cv2
import math
video_capture = cv2.VideoCapture(0)
while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()
    face_landmarks = face_recognition.face_landmarks(frame)
    try:
        p1=face_landmarks[0]['top_lip']
        p2=face_landmarks[0]['bottom_lip']
        x1,y1=p1[9]
        x3,y3=p1[8]
        x4,y4=p1[10]
        x2,y2=p2[9]
        x5,y5=p2[8]
        x6,y6=p2[10]
        dist = math.sqrt(((x2+x5+x6) - (x1+x3+x4)) ** 2 + ((y2+y5+y6) - (y1+y3+y4)) ** 2)
        print(dist)
        image = cv2.circle(frame, p1[8], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p1[9], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p1[10], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p2[8], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p2[9], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p2[10], 1, (255, 255, 255, 0), 2)
        # # cv2.clipLine(frame, p1, p2,(255,255,255,0), thickness=2)
        # for p1t in p1:
        #     image = cv2.circle(frame, p1t, 1, (255,255,255,0), 2)
        # for p1b in p2:
        #     image = cv2.circle(frame, p1b, 1, (255, 255, 255, 0), 2)
        cv2.namedWindow('Video', cv2.WINDOW_NORMAL)
        cv2.imshow('Video', frame)
    except Exception as e:
        raise(e)
    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()
But the distance I calculate varies even when the person is not speaking. If anyone has an idea how I can detect the speaker using mouth landmarks, please let me know. Thanks.
good luck
math.sqrt((x2-x1)**2+(y2+y1)**2) that's the simple formula.
if at all:
math.sqrt((x2-x1)**2 + (y2-y1)**2)
But that's not what your code is doing: it sums the coordinates of three points on each lip and then takes one distance between the two sums.
Then, there will always be some distance between the landmarks. To find out whether someone is moving their mouth, you'll need to build a time series from the distances (and maybe do some primitive frequency analysis).
You probably need to add up 3 distances (3 point pairs), not what you do now, for sure.
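The two suggestions above (three per-pair distances, then a time series) could be sketched roughly like this. It is only an illustration, not a tested speaker detector. The names mouth_opening and MouthActivity, the window length and the threshold are all made up for this example. The default index pairs are an assumption about how face_recognition orders the inner-lip points, so draw the points on a frame to check that each pair really sits one above the other. A rolling standard deviation stands in here for the "primitive frequency analysis".

```python
import math
from collections import deque
from statistics import pstdev

# Assumed pairing of inner-lip points (top_lip index, bottom_lip index).
# Verify against your own landmarks before relying on it.
DEFAULT_PAIRS = ((8, 10), (9, 9), (10, 8))


def mouth_opening(top_lip, bottom_lip, pairs=DEFAULT_PAIRS):
    """Mean Euclidean distance over the given top/bottom point pairs."""
    dists = []
    for top_idx, bottom_idx in pairs:
        x1, y1 = top_lip[top_idx]
        x2, y2 = bottom_lip[bottom_idx]
        dists.append(math.hypot(x2 - x1, y2 - y1))
    return sum(dists) / len(dists)


class MouthActivity:
    """Keeps the last `window` openings and reports whether they vary a lot."""

    def __init__(self, window=15, threshold=1.5):
        # window is in frames, threshold in pixels; both are guesses to tune.
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, opening):
        """Add one measurement; return True if the mouth seems to be moving."""
        self.history.append(opening)
        if len(self.history) < self.history.maxlen:
            return False  # not enough frames collected yet
        return pstdev(self.history) > self.threshold
```

In the loop from the question you would create one MouthActivity() before `while True:` and call `activity.update(mouth_opening(p1, p2))` for each frame where a face was found. Since the distances are in pixels, the threshold depends on how close the face is to the camera; dividing the opening by the mouth width would be one way to reduce that.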