Why does vid.set(cv2.CAP_PROP_POS_FRAMES, <index>) not align with the frame number?
I have a video that is 2 minutes 12 seconds long according to QuickTime on macOS 10.14 (Mojave).
I have the following code:
import cv2

vid = cv2.VideoCapture("video.mov")
length = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))  # = 3953
fps = int(vid.get(cv2.CAP_PROP_FPS))             # = 29

def frame_set(index):
    # Seek directly to the requested frame, then read it.
    success = vid.set(cv2.CAP_PROP_POS_FRAMES, index)
    success, img = vid.read()
    return img

def frame_walk(index):
    # Rewind to the start, discard `index` frames, then read the next one.
    success = vid.set(cv2.CAP_PROP_POS_FRAMES, 0)
    for i in range(index):
        vid.read()
    success, img = vid.read()
    return img

sum(abs(frame_set(0) - frame_walk(0)))    # = 0
sum(abs(frame_set(29) - frame_walk(29)))  # = 0
sum(abs(frame_set(30) - frame_walk(30)))  # = <big number> <---- PROBLEM, mismatch

frame_set(3953 - 128)   # = <image>
frame_set(3953 - 127)   # = None    <---- PROBLEM, should be valid image
frame_set(3952)         # = None    <---- PROBLEM, should be valid image
frame_walk(3953 - 127)  # = <image> <---- correct answer
frame_walk(3952)        # = <image> <---- correct answer
Clearly, the "frame index" method becomes misaligned as soon as "1 second" has elapsed in the video (index 30 is the first mismatch, at roughly 29 fps). The OpenCV ".set" method is not actually seeking to the correct frame, whereas the more cumbersome "walk" method works just fine.
Am I doing something wrong here?
This appears to be a bug in the OpenCV codebase, because the frame count ("length") divided by the fps gives a duration of 2 minutes 16 seconds, whereas QuickTime correctly reports 2 minutes 12 seconds. That difference accounts for the last 127 frames being unreachable through the ".set" method.
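The duration arithmetic above can be checked with a short sketch. The frame count (3953) and the truncated rate (29) come from the code in this post. The exact rate of 30000/1001 (about 29.97 fps) is only an assumption, since the real value is hidden by the int() call, and the helper name as_min_sec is made up for illustration.

```python
from fractions import Fraction

frame_count = 3953                    # CAP_PROP_FRAME_COUNT reported above
truncated_fps = 29                    # int(CAP_PROP_FPS) reported above
assumed_fps = Fraction(30000, 1001)   # ASSUMPTION: NTSC-style ~29.97 fps

def as_min_sec(seconds):
    """Format a duration in seconds as M:SS, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"

duration_truncated = Fraction(frame_count, truncated_fps)
duration_assumed = frame_count / assumed_fps

print(as_min_sec(duration_truncated))  # 2:16
print(as_min_sec(duration_assumed))    # 2:12

# Express the gap between the two durations in frames at the truncated rate.
gap_in_frames = (duration_truncated - duration_assumed) * truncated_fps
print(f"{float(gap_in_frames):.1f}")   # 127.9
```

If the assumed rate is right, the 4 second gap and the roughly 127 missing frames are both the size one would expect from rounding 29.97 down to 29. That does not by itself show where the rounding happens.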
to put it mildly, this is an "expectation mismatch". cv2.VideoCapture is a utility class to acquire images for computer vision, while it seems you want to build video editing software on top of it. (wrong library for this, sorry to say so)
some codecs (i've no idea about apple or mov) only store the (absolute) position information of keyframes, so any position relative to that is a plain guess.
besides that, your 2nd attempt (frame_walk) is broken. even IF you get the correct number of frames, the following
success, img = vid.read()
will return an EMPTY/INVALID numpy array. (the movie's over already)
and, since, like all other python noobs, YOU NEVER CHECK if it's valid or not, --- you'll just burn there.
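For illustration, the validity check mentioned above could look roughly like the sketch below. It is not code from the question. The helper name read_nth_frame is hypothetical, and it assumes the capture was just opened, so that the next read() returns frame 0. It relies only on read() returning a (success, image) pair, as cv2.VideoCapture does.

```python
def read_nth_frame(cap, index):
    """Return frame `index` (0-based) by reading sequentially.

    `cap` is any object with a cv2.VideoCapture-style read() method that
    returns (success, image). ASSUMPTION: `cap` is positioned at frame 0.
    Raises IndexError instead of silently returning None.
    """
    if index < 0:
        raise IndexError("frame index must be non-negative")
    for i in range(index):
        success, _ = cap.read()  # decode and discard frames 0 .. index - 1
        if not success:
            raise IndexError(f"video ended at frame {i}, before frame {index}")
    success, img = cap.read()    # this read returns frame `index`
    if not success:
        raise IndexError(f"frame {index} is past the end of the video")
    return img

# Possible usage with OpenCV (not executed here):
#   import cv2
#   img = read_nth_frame(cv2.VideoCapture("video.mov"), 3952)
```

With a check like this, a read past the end shows up as an exception at the failing index instead of a None that breaks later arithmetic.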
I do want to do computer vision; specifically, I am working on de-noising images using distance-based function approximation techniques. I am using the VideoCapture object to store images in a less RAM-intensive way (rather than as a single array).
I want to be able to index into the video just as you would index into an array. The actual number of frames provided by "length" is correct! If you look carefully, you can see that the only difference between "frame_walk" and "frame_set" is the way that the i-th frame is retrieved. I excluded the checking code for brevity; my actual code is longer and more careful.
Notice that using "frame_set" I am unable to access the last 127 frames of the video, even though they clearly exist. This is the behavior which I question.
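To make the array-style access concrete, here is a minimal sketch of a wrapper that never calls ".set" at all. It walks forward with read() and reopens the file when an earlier frame is requested. The class name SequentialFrames and the open_capture factory are hypothetical, and the frame count is passed in by the caller on the assumption that "length" is trustworthy, as it is for this video.

```python
class SequentialFrames:
    """Index frames like a sequence using only sequential read() calls.

    `open_capture` is a zero-argument callable returning a fresh
    cv2.VideoCapture-style object positioned at frame 0 (hypothetical
    factory); `length` is the frame count supplied by the caller.
    """

    def __init__(self, open_capture, length):
        self._open = open_capture
        self._length = length
        self._cap = open_capture()
        self._next = 0  # index of the frame the next read() will return

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if index < 0:
            index += self._length  # support negative indices like a list
        if not 0 <= index < self._length:
            raise IndexError(index)
        if index < self._next:
            # Going backwards: reopen the file rather than trusting a seek.
            release = getattr(self._cap, "release", None)
            if release is not None:
                release()
            self._cap = self._open()
            self._next = 0
        while True:
            success, img = self._cap.read()
            if not success:
                raise IndexError(f"decoder stopped at frame {self._next}")
            current = self._next
            self._next += 1
            if current == index:
                return img

# Possible usage with OpenCV (not executed here):
#   import cv2
#   frames = SequentialFrames(lambda: cv2.VideoCapture("video.mov"), 3953)
#   last = frames[-1]
```

Forward access costs only the frames in between, while a backward jump costs a full re-read from frame 0, so this suits mostly increasing access patterns better than random access.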