Text to speech in digital television

Text to speech in digital television refers to digital television products that use speech synthesis (computer-generated speech that “talks” to the end user) to provide access for blind or partially sighted people. By combining a digital television (a television, set-top box, personal video recorder, or other type of receiver) with a speech synthesis engine, blind and partially sighted people can access information that is normally displayed visually and so operate the menus and electronic program guides of the receiver.

User need
Using an audiovisual medium causes problems for certain people with disabilities, notably individuals with sight or hearing loss. These problems can be split between interface accessibility barriers and impediments in using the content itself. Text-to-speech features in television products help address interface accessibility barriers for blind and partially sighted people who may be unable to use the standard visual interface or even special features such as large fonts, magnifiers, and adjustable color schemes.

Digital television products are often more complicated than their analog predecessors. The ability to navigate many menus, to see on-screen program information, and to browse electronic program guides or on-screen content listings to find out what is available to watch is essential to using digital TVs.

Policy makers across the world have recognized the importance of access to (digital) television. Recital 64 of the EU’s Audiovisual Media Services Directive (AVMS) states: "The right of persons with a disability and of the elderly to participate and be integrated in the social and cultural life of the Community is inextricably linked to the provision of accessible audiovisual media services." The initial report of a European Commission study "Measuring progress of eAccessibility in Europe" refers to television as one of a set of fields "that are now essential elements of social and economic life." The United Nations Convention on the Rights of Persons with Disabilities makes specific reference to television access in Article 30(1) ("Participation in cultural life, recreation, leisure and sport"): "States Parties recognize the right of persons with disabilities to take part on an equal basis with others in cultural life, and shall take all appropriate measures to ensure that persons with disabilities: [...] b. Enjoy access to television programmes, films, theatre and other cultural activities, in accessible formats."

History
Text-to-speech software has been widely available for desktop computers since the 1990s, and Moore’s Law increases in CPU and memory capabilities have made it more feasible to include in software and hardware solutions. In the wake of these trends, text-to-speech is finding its way into everyday consumer electronics. In addition to text-to-speech solutions for computers, there are now talking watches and clocks, calendars, thermometers, kitchen aids, and many other products. Talking books and GPS navigation systems have become widely used as well.

Organizations representing blind and partially sighted people are long-standing supporters of text-to-speech technology in consumer electronics. In the UK, the Royal National Institute of Blind People (RNIB) has been advocating for speaking radio and television products since the early part of the century and has supported manufacturers in creating such solutions.

The Digital TV Group, the UK industry association for digital TV, first discussed the topic in 2007 and subsequently brought the industry together to write a technical specification for text to speech in the horizontal market in 2009. This collaboration formed part of the UK Government BERR Usability Action Plan. When complete, the specification was submitted to DigitalEurope for ETSI standardization and also published as a white paper. Subsequently, it was incorporated in the U-Book (UK Digital TV Usability and Accessibility Guidelines), which includes text to speech.

In 2010, two talking products for digital television came onto the market in the UK. The Sky Talker is an add-on for the Sky set-top box. It provides talking features for program and channel information and playback control. The Sky Talker is operated through the standard Sky remote control. In the same year, the Smart Talk Freeview (terrestrial digital broadcasting) set-top box was also launched in the UK market. This is a Goodmans-branded Freeview set-top box, developed by a partnership between Harvard International Ltd and the RNIB. It was the first complete talking solution for digital television in the UK, speaking the electronic program guide and menus and providing spoken assistance during setup.

In Japan, both Panasonic and Mitsubishi Electric have been producing talking television and Blu-ray products since 2010. According to information compiled by the Japanese blindness organization Lighthouse for the Blind, there are some 70 products from Mitsubishi and Panasonic with talking features.

Around 2011 in Spain, a talking Linux-based set-top box solution using the free Festival text-to-speech engine was distributed to blind and visually impaired people free of charge by the Ministry of Industry, Tourism and Trade. However, this product is no longer available.

In 2012, Panasonic launched its voice guidance solution on the UK market. Voice Guidance is a set of talking features for their 2012 Viera range (and beyond). Voice Guidance announces on-screen information on the most important menus and has support for reminders, recordings, and playback functions. It is available for Freesat and Freeview receivers. In creating its solution, Panasonic took into account advice from RNIB experts.

Also in 2012, TVonics, a former UK digital video recorder maker, launched its talking PVR solution: a twin-tuner Freeview HD recorder based on the Ivona TTS engine, which is widely lauded by disability groups for its high-quality voice. The TVonics solution was essentially a software addition to its existing platform and could be deployed as a software upgrade to customers of existing products. TVonics went into administration in June 2012. The RNIB acquired the core DVR IP, including the text-to-speech system. The TVonics brand was bought by Peterborough-based Pulse-Eight.

List of possible text-to-speech enabled features
 * Interaction with interactive services and widgets.
 * Initial setup and configuration. For connected TVs, this may include the network configuration, including authentication to the home network.
 * Power cycle control (on, off, standby).
 * Announcing the currently showing channel and program, plus the list of available channels.
 * Assistance and feedback for basic receiver functions such as change channel and volume control.
 * Speaking the electronic program guide (EPG) and assisting the user in navigation of the EPG and other lists of services and content including browsing on-demand, catch-up content, previously recorded or downloaded content, as well as user customizable lists (favorites, etc.).
 * Spoken feedback for reporting and changing the state of access services (such as audio description).
 * Talking features in support of playback and recording, including managing the recording schedule.
 * Notification of pay-per-view and other restricted content, restrictions, and conditions and control over these functions including the authorization mechanism.
 * Feedback and control for on-screen information banners, dialogues, and menus (including modal and other out-of-band prompts).
 * Customization of the talking features.
 * Speaking of on-screen manuals and help pages.
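
As a rough illustration of the channel and program announcement feature listed above, the following minimal Python sketch builds the sentence a receiver might hand to its speech engine after a channel change. It is not taken from any product or standard: the `ProgramInfo` fields, the `announce_channel` and `speak` functions, the wording of the announcement, and the sample channel are all assumptions made for this example, and `speak` simply prints the text in place of a real speech synthesis call.

```python
from dataclasses import dataclass


@dataclass
class ProgramInfo:
    """Hypothetical EPG entry; the field names are illustrative only."""
    channel_number: int
    channel_name: str
    title: str
    start: str  # local start time as "HH:MM"
    end: str    # local end time as "HH:MM"


def announce_channel(info: ProgramInfo) -> str:
    """Build the text to be spoken after a channel change (assumed wording)."""
    return (
        f"Channel {info.channel_number}, {info.channel_name}. "
        f"Now showing: {info.title}, from {info.start} to {info.end}."
    )


def speak(text: str) -> None:
    """Stand-in for a speech synthesis engine; this sketch only prints the text."""
    print(text)


if __name__ == "__main__":
    # Sample data invented for the example.
    current = ProgramInfo(1, "Example TV", "Evening News", "18:00", "18:30")
    speak(announce_channel(current))
```

Keeping the text construction separate from the `speak` call means the same announcement string could be reused for other outputs, such as a log or a braille display, although real receivers may structure this differently.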

Implementation guidance and standardization
An early effort to capture the user requirements and define a functional specification was undertaken by the Digital TV Group (DTG) in the UK, which published a White Paper on the subject. This White Paper has since been incorporated into the publication UK Digital TV Usability and Accessibility Guidelines (known as the U-Book). The same White Paper was also used as the basis for a discussion on text-to-speech for television between disability user groups and DigitalEurope, a European industry body for manufacturers of consumer equipment. The DigitalEurope work stream led to the International Electrotechnical Commission (IEC) setting up a project group (IEC 62731) to create an International Standard for text-to-speech in digital television. The first edition of the standard, IEC 62731:2013, was officially published as an International Standard in January 2013. The Standard does not dictate implementation but provides a functional description of how a text-to-speech enabled television product should behave and what should be spoken when it is used.