Grid Corpus Dataset

I'm working on a lip reading project that uses the Grid Corpus dataset. The dataset contains 34,000 videos: 34 speakers with 1,000 videos per speaker. Each video is 3 s long and has 75 frames. Every video comes with an align file, which lists certain numeric markings alongside the words spoken in that video. What could the markings in the align file stand for? Example of an align file (here sil stands for silence):

0 12250 sil
12250 19250 set
19250 27250 white
27250 30500 with
30500 36000 p
36000 43250 two
43250 55250 soon
55250 74500 sil
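
One possible reading, which is an assumption and not something the dataset description above confirms: the two numbers on each line are the start and end of the segment in which the word is spoken, and 1000 units correspond to one video frame (the last value, 74500, is close to 75 frames x 1000). A minimal Python sketch that parses the example under that assumption follows. The names parse_align, to_frames, spoken_words and UNITS_PER_FRAME are made up for illustration.

```python
from typing import List, Tuple

# ASSUMPTION: 1000 align-file units == 1 video frame. This is inferred from
# 75 frames per video and a final value of 74500; it is not confirmed here.
UNITS_PER_FRAME = 1000

Segment = Tuple[int, int, str]


def parse_align(text: str) -> List[Segment]:
    """Parse 'start end word' lines into (start, end, word) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, word = parts
        segments.append((int(start), int(end), word))
    return segments


def to_frames(segments: List[Segment]) -> List[Tuple[float, float, str]]:
    """Convert start/end values to (fractional) frame positions."""
    return [(s / UNITS_PER_FRAME, e / UNITS_PER_FRAME, w) for s, e, w in segments]


def spoken_words(segments: List[Segment]) -> List[str]:
    """Return the words in order, dropping the 'sil' (silence) markers."""
    return [w for _, _, w in segments if w != "sil"]


SAMPLE = """\
0 12250 sil
12250 19250 set
19250 27250 white
27250 30500 with
30500 36000 p
36000 43250 two
43250 55250 soon
55250 74500 sil
"""

if __name__ == "__main__":
    segs = parse_align(SAMPLE)
    print(spoken_words(segs))
    for start, end, word in to_frames(segs):
        print(f"{word}: frames {start} to {end}")
```

If the assumption holds, each word maps to a range of frames in the 75-frame video, which would let the frames be labelled with the word being spoken.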