Wikipedia:Bots/Requests for approval/LyricsBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

LyricsBot
Operator:

Time filed: 10:58, Thursday January 3, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic (with some manual intervention for difficult matching)

Programming language(s): Python

Source code available: https://github.com/dcoetzee/wikipedia-lyrics-bot

Function overview: LyricsBot adds external links to articles about songs, albums, and artists that give full lyrics from licensed, legal lyrics providers.

Links to relevant discussions (where appropriate): Village_pump_(proposals)/Archive_97 (up since 23 Dec with unanimous consensus)

Edit period(s): Continuous (at most 5 or so edits per minute is expected, and this could be brought down as slow as necessary as there is no urgency)

Estimated number of pages affected: MetroLyrics has about 1 million songs listed, but many do not have Wikipedia articles - roughly 100K to 200K seems about right.

Exclusion compliant (Yes/No): Yes (verify here at lines 15, 212)

Already has a bot flag (Yes/No): No

Function details: The function of this bot is simple:

 * It creates subpages of Template:Lyrics containing external links to lyrics for particular songs, artists, and albums. For example, Lyrics/Dreadlock Holiday holds the lyrics link for Dreadlock Holiday.
 * It enumerates pages on MetroLyrics.com, a licensed provider of full lyrics to popular music.
 * It transcludes the Template:Lyrics subpage into the External links section of its corresponding article, creating the section if necessary, as in this sample edit to Dreadlock Holiday.
 * It adds a link to the MetroLyrics song page to the External links section of the corresponding article (creating the section if necessary), as in this sample edit to Dreadlock Holiday.
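
The link-insertion step can be sketched roughly as follows. This is an illustration only, not the bot's actual code (which is in the linked repository): the function name `add_external_link`, the heading regex, and the placeholder template parameters `ARTIST` and `SONG` are all assumptions. The "Licensed lyrics provider" HTML comment is the one described in the discussion below.

```python
import re

# Hypothetical link line: the real parameters of the MetroLyrics song
# template are not given in this request, so ARTIST/SONG are placeholders.
LINK_LINE = "* {{MetroLyrics song|ARTIST|SONG}}<!-- Licensed lyrics provider -->"

# Matches a wikitext heading such as "==External links==" on its own line.
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def add_external_link(wikitext: str, link_line: str) -> str:
    """Return wikitext with link_line appended to the External links section.

    The section is created at the end of the page if it does not exist.
    """
    if link_line in wikitext:
        return wikitext  # link already present, leave the page alone

    headings = list(HEADING.finditer(wikitext))
    for i, match in enumerate(headings):
        is_level_two = len(match.group(1)) == 2
        if is_level_two and match.group(2).lower() == "external links":
            # The section runs until the next level-2 heading, so the link
            # lands in the right place even if the section is not last.
            end = len(wikitext)
            for later in headings[i + 1:]:
                if len(later.group(1)) == 2:
                    end = later.start()
                    break
            body = wikitext[match.end():end].rstrip("\n")
            tail = wikitext[end:]
            rest = "\n" + tail if tail else ""
            return wikitext[:match.end()] + body + "\n" + link_line + "\n" + rest

    # No External links section found: create one at the end of the page.
    return wikitext.rstrip("\n") + "\n\n==External links==\n" + link_line + "\n"
```

Searching for the section by heading, rather than assuming it is the last one on the page, keeps the link out of a following References section when an article's section order departs from the MOS.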

Pages on MetroLyrics are matched to articles using infobox data about the artist and song title, and the matching is expected to be highly reliable where it succeeds automatically. Where this information is not available, does not match, or the article cannot be located, the page is queued for later human assistance. An article is not edited if LyricsBot has edited it before, which prevents the bot from reverting human editors.
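
A minimal sketch of that decision logic, under stated assumptions: the `normalize` rule (fold case, accents, punctuation and spacing), the dictionary keys `artist` and `title`, and the names `decide` and `review_queue` are all hypothetical and chosen for illustration. The bot's real matching algorithm is in the linked source code and may differ.

```python
import re
import unicodedata
from collections import deque
from typing import Optional


def normalize(name: str) -> str:
    """Fold case, accents, punctuation and spacing (assumed rule).

    With this rule "10cc" and "10 Cc" compare equal.
    """
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Listings that could not be matched automatically wait here for a human.
review_queue: deque = deque()


def decide(listing: dict, infobox: Optional[dict], bot_edited_before: bool) -> str:
    """Return "edit", "queue" or "skip" for one MetroLyrics listing."""
    if bot_edited_before:
        # Never touch an article twice, so a human removal is not undone.
        return "skip"

    keys = ("artist", "title")
    if infobox is None or not all(infobox.get(k) for k in keys):
        # Article not located, or infobox data missing.
        review_queue.append(listing)
        return "queue"

    if all(normalize(listing[k]) == normalize(infobox[k]) for k in keys):
        return "edit"

    # Data present but it does not match: leave it to a human.
    review_queue.append(listing)
    return "queue"
```

For example, `decide({"artist": "10 Cc", "title": "Dreadlock Holiday"}, {"artist": "10cc", "title": "Dreadlock Holiday"}, False)` returns `"edit"` under these assumptions, while passing `None` for the infobox queues the listing.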

The Template:Lyrics subpage structure enables lyrics links to be easily revised by another automatic process later on, should we choose to switch to a different provider or eventually adopt a Special:BookSources-type solution when more licensed providers enter the field.

Discussion
Why not just make Lyrics redirect to MetroLyrics song? Then you can still change provider easily enough, without the need for an extra transclusion layer. - Jarry1250 [Deliberation needed] 11:07, 3 January 2013 (UTC)
 * This wouldn't work because the template parameters to MetroLyrics song depend on the specific way in which MetroLyrics chooses to render names (e.g. they render 10cc as "10 Cc" and many others are also inconsistent with the official name). It would be problematic to use template logic to attempt to duplicate errors in titles because the number of errors is so large. This could be avoided if there were some kind of widely-used "ISBN-like" unique identifier for songs, but there is not. Dcoetzee 11:13, 3 January 2013 (UTC)
 * Hmm, I can see that. At least if we assume that edits to non-articles are cheaper than edits to articles. What determines the subpage format for Lyrics? PAGENAME? Is that sufficiently durable? - Jarry1250 [Deliberation needed] 11:38, 3 January 2013 (UTC)
 * It's not so much cheaper as easier (it allows for a mixture of sites each licensing different songs, and they can be both modified and moved about within articles without the need to be able to search articles for a variety of site-specific templates). The subpage format is based on PAGENAME, yes. I considered alternate structures, but the goal was basically to deal with complex issues of uniqueness without inventing my own unique identifiers. This deals gracefully with issues like songs that have multiple covers by different artists with identical lyrics, and distinct songs with identical names by the same artist. The subpages may fall out of sync as articles are moved, but this is easy to correct and only rarely would cause any problems. In cases where a single song has multiple sets of lyrics (e.g. associated with different versions), they can be listed in a bulleted list in the template subpage (this is why it includes the bullet). Dcoetzee 12:34, 3 January 2013 (UTC)
 * Hmm. I still can't say I'm keen on the idea of lumping external links together. It means newer users will struggle to edit them and takes important, visible changes out of mainspace for no particular reason at the moment, given there's only one provider. I can kind of see the case for eventually having them as subpages, but at this point? No, I can't say I really get it. Just add the MetroLyrics song template to the external links section of an article for now, I say. - Jarry1250 [Deliberation needed] 13:07, 3 January 2013 (UTC)
 * I'm fine with that, it simplifies things for me. That was how I originally planned to do it. Dcoetzee 20:15, 3 January 2013 (UTC)


 * This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT ⚡ 20:46, 3 January 2013 (UTC)
 * As noted on the user page, this bot was editing in supervised mode only, just to demonstrate what it does for the purpose of this request. All edits were reviewed by me prior to saving. Dcoetzee 21:37, 3 January 2013 (UTC)


 * Approved for trial. MBisanz talk 23:51, 14 January 2013 (UTC)
 * The trial is now complete. I reviewed all 50 edits made by the bot after it completed, including clicking through the link to make sure it targeted the right page on MetroLyrics, and they all look correct. In one isolated case the link was placed in a references section because the section order wasn't compliant with the MOS - this is fixed and it'll place in the External links section now even if it isn't last. I made a few changes to how it works during the run, including adding an HTML comment to each edit reading "Licensed lyrics provider" (see sample edit) that is intended to discourage unwitting removal of the links as copyright violations, adding a link to my talk page in the edit summary in case of error, and some improvements to the matching algorithm and performance. Dcoetzee 03:15, 16 January 2013 (UTC)
 * Approved. MBisanz talk 04:01, 17 January 2013 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.