Switchboard Telephone Speech Corpus

The Switchboard Telephone Speech Corpus is a corpus of spoken English consisting of almost 260 hours of speech. It was created in 1990 by Texas Instruments under a DARPA grant and released in 1992 by NIST. The corpus contains 2,400 telephone conversations among 543 US speakers (302 male, 241 female). Participants did not know each other, and conversations were held on topics from a predetermined list.

Switchboard-2 Phase II was collected in 1999 and includes "4,472 five-minute telephone conversations involving 679 participants".

The corpus has been used in the development of speech recognition algorithms.

Text example:

A: All right um well [laughter-uh] let's see i'm twenty
B: How old are you Lisa. Okay that i'm older
A: Yeah how old are you. Older [laughter]
B: Older than you [laughter-are]
A: [laughter-okay]
B: Okay we are supposed to talk about places we like to go so i'm gonna and where are you from where are you calling from?
A: I'm calling from uh Provo Utah but I'm from Plano Texas
B: Oh you are from Plano my sister lives in Plano yes her husband is the new Director of Admissions at uh University of Texas at Dallas
A: Oh really. Oh wow my dad used to work at UTD also
B: Yeah so I [vocalized-noise]. Anyway so where's your favorite place to go?
A: Um. Generally we just go on family vacations to Arizona my grandparents live there that's generally our usual summer vacation
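
The excerpt above mixes speaker labels (A:, B:) with bracketed markers such as [laughter], [laughter-uh] and [vocalized-noise]. As a minimal sketch of how such text might be prepared for further processing, the following Python code splits a transcript into speaker turns and removes the markers. The function names split_turns and strip_markers are hypothetical, and the code assumes that a marker of the form [laughter-word] denotes a word spoken while laughing, so the word is kept; it is not part of any official Switchboard tooling.

```python
import re

# Bracketed markers such as [laughter], [laughter-uh], [vocalized-noise].
MARKER = re.compile(r"\[([a-z-]+)\]")
# Assumption: a speaker label is a standalone "A:" or "B:" token.
TURN = re.compile(r"\b([AB]):\s*")


def split_turns(text):
    """Split a flat transcript into (speaker, utterance) pairs."""
    parts = TURN.split(text)
    # parts looks like [prefix, speaker1, utterance1, speaker2, utterance2, ...]
    pairs = zip(parts[1::2], parts[2::2])
    # Collapse runs of whitespace inside each utterance.
    return [(speaker, " ".join(utt.split())) for speaker, utt in pairs]


def strip_markers(utterance):
    """Keep the word from [laughter-word]; drop every other marker."""
    def replace(match):
        tag = match.group(1)
        if tag.startswith("laughter-"):
            return tag[len("laughter-"):]
        return ""
    return " ".join(MARKER.sub(replace, utterance).split())


sample = (
    "A: All right um well [laughter-uh] let's see i'm twenty "
    "B: How old are you Lisa. Okay that i'm older "
    "A: Yeah how old are you. Older [laughter]"
)

for speaker, utterance in split_turns(sample):
    print(speaker, strip_markers(utterance))
```

Whether to keep or discard words spoken while laughing depends on the task; a recognizer's training text would usually keep them, which is the choice made in this sketch.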