Loquendo

Loquendo is an Italian multinational computer software technology corporation, headquartered in Torino, Italy, that provides speech recognition, speech synthesis, speaker verification and identification applications. Loquendo, which was founded in 2001 under the Telecom Italia Lab (formerly CSELT), also had offices in the United Kingdom, Spain, Germany, France, and the United States.

Its business products can be found in portable and in-car navigation devices, assistive devices for people with disabilities, smartphones, ebook readers, talking ATMs, computer games, voice-controlled domestic appliances and others. The voice synthesis and speech recognition systems are used in a new e-health application as part of the virtual assistant of the Junta de Andalucía Government Health Services in Spain.

Loquendo's products have received several awards, including being named a Speech Technologies Speech Engine Leader in 2007, 2008, and 2009. The company was rated 'Market Leader' by Speech Technologies in 2009 and 2010.

On 30 September 2011, Nuance announced that it had acquired Loquendo.

History
Loquendo was originally a research group created in the mid-seventies by managers at IRI-STET in the CSELT laboratories in Turin before becoming a company in its own right in 2001.

Speech synthesis
Building on the recommendations of the University of Padua, and applying the technique of so-called diphones (the union of a consonant and a vowel, of which there are 150 in total for Italian), the voice technology group led by Giulio Modena created in 1975 the first highly intelligible speech synthesizer able to speak (and sing) Italian. It was called MUSA (MUltichannel Speaking Automaton) and demonstrated what was possible with the technology of the time. The results achieved in those years were condensed into a 45 rpm audio disc published in 1978 and distributed in thousands of copies through the mass media. The audio track, after a short spoken self-presentation of the system, contained a humorous Italian version of the song Frère Jacques performed in a cappella polyphony with several singing voices (MUSA could manage up to 8 synthesis channels in parallel). The evolution of this prototype, with an increase in the number of diphones (to about 1000), refinement of the language analysis tools, and improved waveform management, led to a marked improvement of the synthetic voice. This in turn led to the creation of the first "voice synthesizer" integrated circuit developed internally at CSELT, which was manufactured by SGS and cataloged as a peripheral of Zilog's Z80 microprocessor (with the code M8950).

Later, in the nineties, "ELOQUENS" was born: a multi-platform software speech synthesizer for various operating systems (including DOS, Windows, System 7, Unix, and OS/2) and for telephone boards with very large numbers of channels, such as those used by the Italian telephone operator to build its reverse directory information service (used to obtain a subscriber's identity and address from their telephone number).

Towards the end of the 1990s speech synthesis took a new approach: instead of concatenating diphones, it selected and concatenated acoustic units of variable length, an approach made possible by the increased power of computers and especially the increasing capacity of mass storage systems. This resulted in "ACTOR" – "The human sounding voice" – which reached a large audience thanks to the number of telephone services and applications created by Loquendo-related companies.

In 2000, the synthesizer was released from the research labs as a commercial product, including a number of editing tools to produce synthetic audio enriched with emotions. It was also released as a software library for use in various products, from small portable devices such as mobile phones, navigators and palm computers, to multichannel, multilingual telephone servers for (semi)automatic call centers.

Loquendo speech synthesis has become an internet meme on YouTube, particularly in Spanish-language videos. It is often used in creepypastas and parody dubbings (often with vulgar language).

Speech recognition
Shortly after the start of the research into speech synthesis, the group began research on speech recognition, and at the beginning of the eighties it produced the first prototype, able to recognize the ten digits and a few simple commands.

The application of hidden Markov models in 1984 led to the development of a speech recognizer that could recognize connected words and sentences, created in collaboration with ELSAG, another company in the IRI-STET group. Also in collaboration with ELSAG, RIPAC (RIconoscimento PArlato Connesso), an early microprocessor designed to recognize connected speech, was presented in 1986. This processor had VLSI levels of integration and was composed of 70,000 transistors.

The need to produce speaker-independent speech recognizers for telephone applications led to the creation of speech databases with the recorded voices of hundreds of different people. In 1987 the first large database, obtained by recording the voices of more than 1000 people calling from all over Italy with an automatic procedure, was used in the creation of a specially crafted phone server at the CSELT labs.

This recorded material allowed the training of Markov models and, through the use of sophisticated algorithms, led to the development of "AURIS", the first commercial recognizer that could run on a variety of devices with digital signal processors (DSPs).

In the nineties, a large cross-European collaboration began and, together with a dozen other companies and universities, a very large speech database was collected throughout Europe, with the voices of more than 65,000 people.

This material, combined with a new mixed approach of hidden Markov models and neural networks, led to "FLEXUS", the first flexible-vocabulary speech recognizer, which allowed many varied telephone services to use automatic speech recognition in their human interfaces.

Merging "FLEXUS" and "ACTOR" into a single system created "Dialogos", allowing the creation of cutting-edge telephone services.

The birth of Loquendo as a company led to the development of support for many languages and the release of the recognizer as a software library for the creation of various telephony applications.

The company also introduced several systems for writing finite-state grammars and natural language models.

The speech database recording campaigns continued, moving on from Europe to Mediterranean countries, to South, Central and North America, and finally to countries in the Far East. Overall, a very large number of hours of speech have been recorded by contacting hundreds of thousands of people in the listed regions. The recordings have been collected over fixed telephone networks, in moving vehicles for mobile phones, and with high-quality microphones in domestic environments for consumer applications such as video games, appliances, and home automation in general.

Speaker recognition
Research into speaker recognition was initiated in the early eighties. Later, in the mid-2000s, speech databases tailored for this task became available. In collaboration with the Politecnico di Torino, the group began experiments on two different fronts: speaker "identification" and "verification".

The success of the research also pushed the company to develop products specifically for these tasks through the enabling platforms described below.

Speech coding
Research into speech coding started even before the work on speech recognition and synthesis. Its aim was to build equipment such as codecs and echo cancellers that would increase as much as possible the number of telephone conversations that can flow through a single cable (or satellite connection) without losing voice intelligibility.

In the late seventies, studies and experiments led to the creation of algorithms to encode the telephone speech signal and to the establishment of the European CCITT standard known as A-law encoding (8-bit logarithmic encoding, law "A", of an audio signal sampled at 8 kHz). This standard was then used in the codec for 64 kbit/s ISDN telephone lines.
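The logarithmic compression behind A-law can be illustrated with a short Python sketch. It uses the continuous A-law curve with the usual parameter A = 87.6 and assumes 8,000 samples per second, which is what gives 64 kbit/s with 8-bit samples. The function names are illustrative only, and the uniform 8-bit quantizer is a simplification: a real telephone codec uses a segmented approximation of this curve, which is not reproduced here.

```python
import math

A = 87.6  # usual A-law compression parameter


def alaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the continuous A-law curve."""
    ax = abs(x)
    k = 1.0 + math.log(A)
    if ax < 1.0 / A:
        y = A * ax / k                      # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / k    # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress."""
    ay = abs(y)
    k = 1.0 + math.log(A)
    if ay < 1.0 / k:
        x = ay * k / A
    else:
        x = math.exp(ay * k - 1.0) / A
    return math.copysign(x, y)


def quantize_8bit(y: float) -> int:
    """Simplified quantizer (assumption): sign plus 7-bit magnitude, 0..127."""
    magnitude = round(abs(y) * 127)
    return -magnitude if y < 0 else magnitude


BITS_PER_SAMPLE = 8
SAMPLES_PER_SECOND = 8000  # assumed telephone sampling rate
BIT_RATE = BITS_PER_SAMPLE * SAMPLES_PER_SECOND  # bits per second

if __name__ == "__main__":
    for sample in (0.001, 0.01, 0.1, 1.0):
        compressed = alaw_compress(sample)
        print(sample, round(compressed, 4), quantize_8bit(compressed))
    print("bit rate:", BIT_RATE, "bit/s")
```

Because the curve is steep near zero, quiet samples are spread over many more of the available code values than they would be with a linear 8-bit scale, which is why intelligibility holds up at this bit rate.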

In subsequent years the group built stronger codecs (used in telephone exchanges) and, within the pan-European GSM consortium, the codec used in second-generation mobile phones.

At the same time the group built a codec to transmit high-quality signals in spite of the 8 kHz band limit of the telephone cables, which was useful for audio and video conference applications.

Enabling platforms
In the late nineties, the development of the Internet in the form known today (hypertext resident on different servers that span the planet in one big network) led to the need to make these texts available by voice over the phone.

At the same time, interactive voice response (IVR) became increasingly popular, along with hardware and software tools for quickly developing new telephony applications. It became evident that the earlier development models, which had produced complex systems such as the automated directory inquiry service or Automatic Information Service Stations, were too rigid and would not easily allow the development of new applications.

It was therefore felt that there was a need for enabling platforms for automatic voice telephone systems that were both scalable and easily programmable. To this end a special working group was created to develop a voice browser prototype, to be shown to the public at SMAU 2000, with the name "VoxNauta". It was such a success that Telecom Italia decided to close its original research labs and create Loquendo on 1 February 2001.

Over the years "VoxNauta" was further developed in various scalable forms, from small servers to large enterprise systems with thousands of lines, and it has been installed in hundreds of companies around the world.

The emergence of standards for writing telephone services (VoiceXML) and of protocols (MRCP) for connecting servers hosting the speech technologies to servers hosting the telephone boards pushed the development of software-only products. It led to the creation of the Speech Server software, which hosts the text-to-speech and speech-recognizer engines from Loquendo.

This continuing research and development has led Loquendo to be one of the most widely known brands in the field of synthesis and voice recognition.

The brand
The name Loquendo was devised by the wife of the founding CEO, Silvano Giorcelli, while the logo was created by the Telecom Italia graphic department. When displayed as an animated GIF, the three ripples above the "O" turn on in sequence, giving the sense of the emission of sound.

The brand has not been protected by the company, and other Italian companies have names derived directly from Loquendo. This has contributed to its widespread use, even at the expense of competing brands.

Sale of the company
Over the years there have been rumors of the sale of Loquendo to other companies.

The most recent was in the summer of 2011, when it was announced that two US-based multinational companies, Nuance and Avaya, were looking into the possibility of a takeover.

As Nuance was a direct competitor of the Italian company, Loquendo workers were worried about the possible dismemberment of research and development and the disappearance from Italy of an excellent brand with forty years' experience.

A purchase by Avaya seemed more desirable, as its activities were complementary to those carried on by Loquendo. Avaya did not own any speech technology and therefore could have been very interested in developing it in-house rather than purchasing it from outside companies.

These reports were followed with great interest by the workers, by local authorities in Turin and Piedmont, and by the international scientific community.

On 13 August 2011, Telecom Italia publicly announced the sale of its entire stake in Loquendo to Nuance for 53 million euros.

Awards and Recognitions

 * CSELT won the «Telework Award», the first prize of the European Telework Week 1998, for its experimental demonstration of the usefulness of CSELT technologies for disabled users, such as quadriplegic or blind people, through the combination of different voice technologies (remarkable for their high quality).

Products

 * speech synthesis
 * speech recognition
 * speaker verification
 * voice browser