Detecting Speech and Music in Audio Content
The Netflix TechBlog
NOVEMBER 13, 2023
To evaluate and benchmark our dataset, we manually labeled 20 audio tracks from various TV shows which do not overlap with our training data. We adapted the SOTA convolutional recurrent neural network ( CRNN ) architecture to accommodate our requirements for input/output dimensionality and model complexity.
Let's personalize your content