Idiolect

Idiolect is an individual's unique use of language, including speech. This unique usage encompasses vocabulary, grammar, and pronunciation. This differs from a dialect, a common set of linguistic characteristics shared among a group of people.

The term is etymologically related to the Greek prefix idio- (meaning "own, personal, private, peculiar, separate, distinct") and -lect, abstracted from dialect, and ultimately from Ancient Greek λέγω.

Language
Language consists of sentence constructs, word choices, and expressions of style, and an idiolect comprises an individual's uses of these facets. Every person has a unique idiolect influenced by their language, socioeconomic status, and geographical location. Idiolects are among the objects of analysis in forensic linguistics.

The notion of language is used as an abstract description of language use and of the abilities of individual speakers and listeners. According to this view, a language is an "ensemble of idiolects... rather than an entity per se". Linguists study particular languages by examining the utterances produced by native speakers.

This contrasts with a view among non-linguists, at least in the United States, that languages as ideal systems exist outside the actual practice of language users. Based on work done in the US, Nancy Niedzielski and Dennis Preston describe a language ideology seemingly common among American English speakers. According to Niedzielski and Preston, many of their subjects believe that there is one "correct" pattern of grammar and vocabulary that underlies Standard English, and that individual usage comes from this external system.

Linguists who understand particular languages as a composite of unique, individual idiolects must nonetheless account for the fact that members of large speech communities, and even speakers of different dialects of the same language, can understand one another. All human beings seem to produce language in essentially the same way. This has led to searches for universal grammar, as well as attempts to further define the nature of particular languages.

Forensic linguistics
Forensic linguistics includes attempts to identify whether a person produced a given text by comparing the style of the text with the idiolect of the individual in question. The forensic linguist may conclude that the text is consistent with the individual, rule out the individual as the author, or deem the comparison inconclusive.

In 1995, Max Appedole relied in part on an analysis of Rafael Sebastián Guillén Vicente's writing style to identify him as Subcomandante Marcos, a leader of the Zapatista movement. Although the Mexican government regarded Subcomandante Marcos as a dangerous guerrilla, Appedole convinced the government that Guillén was a pacifist. Appedole's analysis is considered an early success in the application of forensic linguistics to criminal profiling in law enforcement.

In 1996, Ted Kaczynski was identified as the "Unabomber" with the help of forensic linguistics. The FBI and Attorney General Janet Reno pushed for the publication of an essay of Kaczynski's, which led to a tip-off from Kaczynski's brother, who recognized the writing style as his brother's idiolect.

In 1978, four men were convicted of murdering Carl Bridgewater. No forensic linguistics was involved in their case at the time. Later forensic linguistic analysis indicated that the idiolect in the police interview of one of the men was very similar to that of the same man's reported statement. Since idiolects are unique to an individual, this similarity suggests that the two documents are very unlikely to have been produced independently, and that one was created by using the other.

Detecting idiolect with corpora
Idiolect analysis differs depending on whether the corpus under analysis consists of written texts or of audio recordings. Written work is more carefully planned and more precise in its wording than spontaneous speech, which is full of informal language and conversational fillers such as "umm..." and "you know". Corpora with large amounts of input data allow word frequency and synonym lists to be generated, normally from the top ten bigrams drawn from the corpus. In such a situation, the context of word usage is considered, particularly when determining the legitimacy of a given bigram.
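As a rough illustration of the bigram step, the following Python sketch counts adjacent word pairs in a text and returns the ten most frequent. The tokenizer, the function names tokenize and top_bigrams, and the sample sentence are assumptions made for this example; they are not drawn from any particular forensic toolkit, and a real analysis would also weigh the context of each bigram as described above.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep runs of letters
    # and apostrophes as words. Real corpora need more careful handling.
    return re.findall(r"[a-z']+", text.lower())

def top_bigrams(text, n=10):
    # Pair each token with its successor, then count the pairs.
    tokens = tokenize(text)
    bigrams = zip(tokens, tokens[1:])
    return Counter(bigrams).most_common(n)

# Hypothetical sample of spontaneous speech with a recurring filler.
sample = "you know I mean you know it was like you know really odd"
top = top_bigrams(sample)
most_frequent_pair, count = top[0]
```

In this sample the filler "you know" is the most frequent pair, which is the kind of recurring pattern a frequency list is meant to surface. Note that the sketch pairs words across sentence boundaries, a simplification a fuller implementation might avoid.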

Whether a word or phrase counts as part of an idiolect is determined by its position relative to the window's head word and to the edge of the window. The window is kept to 7-10 words, and a sample under consideration as a feature of the idiolect may lie up to five words on either side (+5/-5) of the head word, which normally sits in the middle of the window. Samples at the edge of the window, far from the head word, are often deemed superfluous. Data in the corpus pertaining to idiolect are sorted into three categories: irrelevant, personal discourse marker(s), and informal vocabulary. Superfluous and non-superfluous data are then run through different functions to determine whether given words or phrases are part of an individual's idiolect.
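The windowing idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names window_around and split_by_distance, the default reach of four words on each side (giving a nine-word window), and the near threshold of two words are all hypothetical choices, and the sketch does not attempt the three-way categorization or the later processing functions, whose details are not specified here.

```python
def window_around(tokens, head_index, reach=4):
    # Collect the words within `reach` positions of the head word,
    # recording each word's signed offset from the head (0 = head).
    start = max(0, head_index - reach)
    end = min(len(tokens), head_index + reach + 1)
    return [(tokens[i], i - head_index) for i in range(start, end)]

def split_by_distance(window, near=2):
    # Assumed heuristic: words more than `near` positions from the head
    # are treated as candidates for being superfluous.
    close = [(w, off) for w, off in window if abs(off) <= near]
    far = [(w, off) for w, off in window if abs(off) > near]
    return close, far

# Hypothetical utterance; "sort" (index 5) is taken as the head word.
tokens = "well you know I was sort of thinking about it anyway".split()
window = window_around(tokens, head_index=5)
close, far = split_by_distance(window)
```

Setting reach=5 would allow offsets of up to +5/-5, at the cost of a window slightly wider than ten words.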