An AI voice makes news accessible to everyone
Why limit the audio presentation of journalism to podcasts? Aftenposten’s cloned voice will be able to present all the newspaper’s content – and by doing so, give everyone access to the same information.
Today a large part of society is left out when it comes to consuming journalism. It is, in fact, a democratic problem when people are kept from information about society because so much content is only available as text. It is also a big risk for news companies, as they may be missing out on a market opportunity by not offering an audio alternative to the huge amount of written journalism produced every day.
According to Dysleksi Norge, between 5 and 10% of all Norwegians have dyslexia. This means that as many as 270,000 to 540,000 children and adults in Norway may be reluctant to consume written journalism. They are not the only group that finds reading challenging. People with attention deficit disorder concentrate better when listening than when reading. Refugees and asylum seekers who are learning Norwegian also find it very helpful to be able to listen to and read Norwegian at the same time.
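The range above can be checked with a few lines of arithmetic. This is only a sketch: the population figure of 5.4 million is an assumption, back-derived from the article's own numbers (270,000 is 5% of 5.4 million), and the variable names are illustrative.

```python
# Assumed population of Norway, implied by the article's figures
# (270,000 people correspond to 5% of 5.4 million).
population = 5_400_000

# Share of the population with dyslexia, per Dysleksi Norge: 5 to 10%.
low_percent, high_percent = 5, 10

# Integer arithmetic avoids floating-point rounding noise.
low = population * low_percent // 100
high = population * high_percent // 100

print(f"{low:,} to {high:,} people")
```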
Students struggle to read
When Aftenposten started looking into this, we primarily had our newspaper for kids in mind – Aftenposten Junior skole. Since this is a news product for use in public schools, we are obligated to fulfil all accessibility requirements.
We learned from teachers that 92% of them have students in their classroom who struggle to read, and we were even told that schools were not interested in buying our product if we could not offer text-to-speech.
Two findings from our research also convinced us that the adults of the future will have needs quite similar to those of today’s users of Aftenposten Junior skole.
Firstly, we observed that many kids, beyond those who struggle to read, actively chose to listen to the text. And today’s kids and teenagers are potentially future subscribers who tend to bring their media habits from childhood into adulthood. After observing how popular listening is when given the choice between sound and text, we are pretty sure that we need to have a sound alternative ready for them before they grow up.
Secondly, dyslexia and attention deficit are lifelong conditions. This means that people who have them will probably still prefer listening to a long article over reading it when they grow up, and they will not find our news products worth paying for unless we can offer more than text-based journalism.
A voice you can recognise
Our primary goal was to make an artificial voice of the highest possible quality. That is why we offer a cloned voice and not a purely synthetic one. A synthetic voice is an artificial voice that is not meant to sound like a specific, real person. A cloned voice is created in the same way but simulates the speech of a real person. If that person’s voice is familiar to you, you will recognise it, and you may even find it hard to tell that an artificial voice, not a real person, is reading the news for you.
To build an artificial voice we needed speech data. Speech data in this context means recorded sentences from our newspapers. Using our past articles, our collaborator, BeyondWords, extracted 6,812 phonetically rich utterances. These sentences were recorded by Anne Lindholm, a podcast host at Aftenposten, who is now also the person behind our cloned voice.
After processing the speech data and training a neural network, the first version of the voice was ready – and it was impressive. Anne herself could not believe how similar it was to her own voice. Still, as with all other AI features, it needed training to improve. By training we mean that a person listened to a large number of audio files generated from articles and reported the mistakes they heard.
A linguist from the company that developed the voice technology then corrected the phonemic dictionary that underpins the quality of the cloned voice. A mistake corrected in this way is fixed in every future article in which the same word occurs. Over the last few months the voice has improved a lot, and we will soon be ready to scale up so that you can hear it on many more Aftenposten articles.
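Why a single dictionary correction carries over to every later article can be shown with a minimal sketch. It is not the actual BeyondWords system: the `lexicon` structure, the function names and the phoneme strings are all hypothetical placeholders, and real pronunciation dictionaries use proper phonetic notation.

```python
import re

# Hypothetical lexicon mapping a lowercase word to a phoneme string.
# The phoneme strings are made-up placeholders, not real transcriptions.
lexicon = {"oslo": "U SH L U"}

def to_phonemes(text, lexicon):
    """Look up each word; unknown words fall back to letter-by-letter spelling."""
    words = re.findall(r"\w+", text.lower())
    return [lexicon.get(word, " ".join(word.upper())) for word in words]

def apply_correction(lexicon, word, phonemes):
    """Store a reviewer's fix so later lookups of the word use it."""
    lexicon[word.lower()] = phonemes

# Before the correction, the unknown word is spelled out letter by letter.
before = to_phonemes("Bergen", lexicon)

# One correction, entered once in the shared dictionary.
apply_correction(lexicon, "Bergen", "B AE R G N")

# Two different articles converted afterwards both pick up the fix.
article_one = to_phonemes("Bergen i dag", lexicon)
article_two = to_phonemes("Nytt fra Bergen", lexicon)
```

Because every article is converted through the same dictionary, the fix does not have to be repeated per article.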
Many benefits of a robot
When it comes to quality, a real voice still beats the robotic one. But we have run A/B tests between the real voice and the artificial voice, and the results indicate that the difference in quality is small and that the benefits of a robot voice outweigh the disadvantages.
One of the benefits has to do with the nature of digital news. When a dramatic event unfolds, like the start of the war in Ukraine, the news is updated from minute to minute, and no real person can re-record audio as fast as the cloned voice can regenerate it. There is also the cost of having a real person record an updated article several times, and the time saved for journalists, who can instead focus on the next news article.
Artificial intelligence and our cloned voice have the potential to be revolutionary and to make a hugely positive impact for large groups in our society who can now access journalism they could never access before.
This is why we believe that offering a robot voice based on artificial intelligence is an important bet on the future of journalism. It shows that new technology can contribute to a more open and inclusive society where everyone has access to the same information.
Lena Beate Hamborg Pedersen
Product Manager, Schibsted Subscription Newspapers
Years in Schibsted: 3