User:Monist H/sandbox

Sentence extraction is a technique used for automatic summarization of a text. In this shallow approach, statistical heuristics are used to identify the most salient sentences of a text. Sentence extraction is a low-cost approach compared to deeper, more knowledge-intensive approaches, which require additional knowledge bases such as ontologies or linguistic knowledge. In short, sentence extraction works as a filter that allows only important sentences to pass.

Most summarization work to date is based on extracting sentences from the original document. Sentence extraction techniques compute a score for each sentence based on features such as the position of the sentence in the document [Baxendale 1958; Edmundson 1969], word or phrase frequency [Luhn 1958], and key phrases, that is, terms which indicate the importance of the sentence to the summary, e.g. “this article talks about” [Edmundson 1969]. There have also been attempts to use machine learning (to identify important features) and natural language processing (to identify key passages, or to use relationships between words rather than a bag of words). The application of machine learning to summarization was pioneered by Kupiec, Pedersen, and Chen [1995], who developed a summarizer for scientific articles using a Bayesian classifier. To generate a coherent and readable summary, one has to do a significant amount of text analysis in order to build good feature vectors, handle discourse connectors, and refine the sentences. This system is an attempt in that direction.
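The feature-based scoring described above can be illustrated with a minimal sketch. It is not the method of any of the cited authors: it simply adds a word-frequency score, a position bonus, and a key-phrase bonus with equal weight. The stopword list, the key phrases, the weighting, and all function names are assumptions made for illustration only, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical stopword list and key phrases, chosen only for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "are", "and",
             "for", "on", "it", "this", "that", "as", "by", "with"}
KEY_PHRASES = ("this article", "in summary", "in conclusion")


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words with stopwords removed.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def score_sentences(sentences):
    # Document-wide frequency of every content word.
    freq = Counter(w for s in sentences for w in tokenize(s))
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Frequency feature: mean document frequency of the content words.
        freq_score = sum(freq[w] for w in words) / len(words) if words else 0.0
        # Position feature: bonus for the first and last sentence (assumed weight 1).
        pos_score = 1.0 if i in (0, last) else 0.0
        # Key-phrase feature: bonus if a cue phrase occurs (assumed weight 1).
        lowered = sentence.lower()
        cue_score = 1.0 if any(p in lowered for p in KEY_PHRASES) else 0.0
        scores.append(freq_score + pos_score + cue_score)
    return scores


def summarize(text, n=2):
    # Keep the n best-scoring sentences, returned in document order.
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = ("Cats are popular pets. The weather was mild. "
              "Cats sleep a lot, and cats purr. Dogs bark.")
    print(summarize(sample, n=2))
```

Returning the selected sentences in their original order, rather than in score order, keeps the output closer to the source text, although it does not by itself make the summary coherent.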

The major downside of applying sentence-extraction techniques to the task of summarization is the loss of coherence in the resulting summary. Nevertheless, sentence-extraction summaries can give valuable clues to the main points of a document and are often intelligible enough for human readers.