jsoup

jsoup is an open-source Java library designed to parse, extract, and manipulate data stored in HTML documents.

History
jsoup was created in 2009 by Jonathan Hedley. It is distributed under the MIT License, a permissive free software license similar to the Creative Commons attribution license.

Hedley's avowed intention in writing jsoup was "to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup."

Projects powered by jsoup
jsoup is used in a number of projects, including OpenRefine, the data-wrangling tool formerly known as Google Refine.