One of the leading media companies in Europe wanted to analyze the electoral debates accurately, both to compare the candidates and to measure the debates’ impact in the news and on social media. For each candidate, the comparison had to cover the number of turns, the duration of each turn and the cumulative speaking time, the number of distinct words used, the most frequent words, expressions and collocations, the complexity of the language, the evolution of emotion over time, and the topics discussed.
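Most of the per-candidate timing figures follow directly from speaker diarization, which splits the audio by speaker. The Python below is a minimal sketch, not Sigma’s implementation. It assumes the diarizer returns segments that carry a speaker label and start/end times in seconds. It merges consecutive segments from the same speaker into one turn, then counts the turns, records each turn’s duration, and sums the cumulative speaking time per speaker. The `Segment` class and the `merge_into_turns` and `turn_statistics` functions are hypothetical names. Treating every change of speaker as a turn boundary is also an assumption, and a real system might, for example, ignore very short interjections.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical diarization output format: one speaker label per segment,
    # with start and end times in seconds.
    speaker: str
    start: float
    end: float


def merge_into_turns(segments):
    """Merge consecutive segments from the same speaker into a single turn.

    Assumption: a turn ends whenever the speaker changes; pauses inside one
    speaker's uninterrupted run are counted as part of the same turn.
    """
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(last.speaker, last.start, max(last.end, seg.end))
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end))
    return turns


def turn_statistics(segments):
    """Per speaker: number of turns, duration of each turn, cumulative time."""
    stats = defaultdict(lambda: {"turns": 0, "durations": [], "total": 0.0})
    for turn in merge_into_turns(segments):
        duration = turn.end - turn.start
        entry = stats[turn.speaker]
        entry["turns"] += 1
        entry["durations"].append(duration)
        entry["total"] += duration
    return dict(stats)
```

With this definition, two back-to-back segments from candidate A count as one turn. If A speaks again after the moderator, that starts a new turn.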
Sigma’s Speech and Language technology can diarize the debates, compute these statistics, and monitor the press, radio, TV, and social media.
The solution combined several technologies. Speaker diarization partitioned the audio stream into segments according to speaker identity, and speech recognition transcribed the debate. Natural language processing and understanding extracted words, expressions, collocations, and entities and identified the most frequent ones. Text complexity analysis estimated how difficult the language was and gauged each candidate’s language proficiency. Topic detection and voice-based emotion estimation completed the speech and language analysis. The solution also used crawlers to monitor the news and social media, and statistical analysis to extract information and visualize the results.
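Per-candidate word statistics require the output of diarization and speech recognition to be joined. The sketch below shows one plausible way to do this and is not Sigma’s method. It assumes the recognizer returns word-level timestamps and assigns each word to the speaker whose diarization segment contains the word’s midpoint. It then counts, for each speaker, the distinct words and the most frequent ones. The function names and the `(token, start, end)` and `(speaker, start, end)` tuple layouts are hypothetical. A production system would normally also lemmatize and filter stop words before counting.

```python
import re
from collections import Counter, defaultdict


def speaker_at(time, segments):
    """Return the speaker whose segment contains `time`, or None.

    Hypothetical segment format: (speaker, start_seconds, end_seconds).
    """
    for speaker, start, end in segments:
        if start <= time < end:
            return speaker
    return None


def vocabulary_stats(words, segments, top_n=3):
    """Distinct-word count and most frequent words per speaker.

    Hypothetical ASR format: words is a list of (token, start, end) tuples.
    Each word goes to the speaker active at the word's midpoint.
    """
    tokens_by_speaker = defaultdict(list)
    for token, start, end in words:
        speaker = speaker_at((start + end) / 2, segments)
        if speaker is None:
            continue  # word falls outside every diarized segment
        # Lowercase and strip punctuation; \w is Unicode-aware for str patterns.
        norm = re.sub(r"[^\w']", "", token.lower())
        if norm:
            tokens_by_speaker[speaker].append(norm)

    result = {}
    for speaker, tokens in tokens_by_speaker.items():
        counts = Counter(tokens)
        result[speaker] = {
            "distinct_words": len(counts),
            "most_common": counts.most_common(top_n),
        }
    return result
```

The midpoint rule is a deliberately simple choice. It handles words that straddle a segment boundary without needing a separate overlap threshold.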