Detecting Speech and Music in Audio Content
The Netflix TechBlog
NOVEMBER 13, 2023
Speech and music activity detection (SMAD) works like semantic segmentation for audio: it separately tracks the amount of speech and the amount of music in each frame of an audio file. This makes it useful for content understanding tasks across the audio production and delivery lifecycle. Content in the dataset ranged in duration from 10 minutes to over an hour, spanning the genres listed below.
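To make "separately tracks speech and music in each frame" concrete, here is a minimal post-processing sketch. It assumes a hypothetical model that emits one speech score and one music score per frame, with an assumed frame hop (`hop_s`) and decision threshold (`threshold`); neither value comes from the original system. Because the two tracks are independent rather than mutually exclusive classes, a single frame can be labeled as both speech and music (for example, dialogue over a score).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str      # "speech" or "music"
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds (exclusive)


def frames_to_segments(scores, label, hop_s=0.02, threshold=0.5):
    """Merge runs of consecutive frames whose score is >= threshold into segments.

    hop_s and threshold are illustrative assumptions, not values from the source.
    """
    segments = []
    start = None  # index of the first frame in the current active run
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append(Segment(label, start * hop_s, i * hop_s))
            start = None
    if start is not None:  # close a run that lasts until the final frame
        segments.append(Segment(label, start * hop_s, len(scores) * hop_s))
    return segments


def smad_segments(speech_scores, music_scores, hop_s=0.02, threshold=0.5):
    """Hypothetical SMAD post-processing: each class is thresholded on its own
    track, so speech and music segments are allowed to overlap in time."""
    return (frames_to_segments(speech_scores, "speech", hop_s, threshold)
            + frames_to_segments(music_scores, "music", hop_s, threshold))


if __name__ == "__main__":
    # Made-up per-frame scores for six frames at a 0.5 s hop.
    speech = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1]
    music = [0.6, 0.7, 0.7, 0.1, 0.1, 0.9]
    for seg in smad_segments(speech, music, hop_s=0.5):
        print(seg)
```

With these made-up scores, speech is active from 0.5 s to 2.0 s and music from 0.0 s to 1.5 s, so frames between 0.5 s and 1.5 s carry both labels. A single softmax over {speech, music, other} could not express that overlap, which is the design reason for scoring each class on its own track.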