Wikipedia talk:Merge some redundant lists to categories

This page is for comment on the project to merge lists to categories. Go to the project page to add a list to be merged. Philwelch 23:32, 24 Mar 2005 (UTC)

Looking at the bigger picture, categories do not work in the vast majority of mirror sites. Lists, however, do. -- John Gohde 21:30, 25 Mar 2005 (UTC)


 * Alright, that's two *technical* reasons for retaining lists that could otherwise be categories, even though we have good *conceptual* reasons for switching to categories. How about retaining lists *in the main article*, but keeping them synchronized with the category? Two redundant but identical sets of information are better than two redundant but different sets of information. Philwelch 01:33, 26 Mar 2005 (UTC)

This is also under discussion at Categorization policy. Currently, categories have some functionality that lists lack, AND lists have some functionality that categories lack. The latter is likely to change with future MediaWiki releases, but that may well take another year. At that point, it would be useful to re-open the discussion. Radiant_* 10:26, Mar 29, 2005 (UTC)
 * 1) Lists can be annotated (in other words, they can contain huge amounts of text).
 * 2) Lists can show relationships through bulleted indentations.
 * 3) Lists can suggest articles that need to be written.
 * 4) Lists work during server problems. -- John Gohde 04:49, 2 Apr 2005 (UTC)


 * No one is discussing merging all lists to all categories. Please read and understand the proposal before you comment on it. The fact is, there are presently actual lists that exist on Wikipedia that are redundant with actual categories, and those particular lists should be merged. Philwelch 18:39, 29 Mar 2005 (UTC)

I don't like this
I don't like this. Frankly, I'm really annoyed by the recent change of the Gay Icon thing from a list to a category. I think it was a valid (if slightly frivolous and POV) article; it's an absolute liability as a category. In particular, I think it's rather inappropriate when reading certain articles to have (via categorization) what amounts to a note at the bottom of the page saying "gay icon". I can imagine (with something close to horror) the proliferation of dozens of other equally frivolous and POV labels: "Jewish icon", "Big in Japan", etc. -- Jmabel | Talk 05:12, Mar 25, 2005 (UTC)


 * I think that's a specific problem with the category and not a general problem with the idea of merging lists to categories. I don't think we should have 303 articles under "gay icon", but then again, being identified as a gay icon is an important fact when we're dealing with figures such as Judy Garland and Margaret Cho. Keep in mind that the category existed before my test case--it was just an incomplete and redundant information set compared to the list in the article. So I don't see how this critique is germane to merging lists to categories as a potential project, although it's certainly germane to the Gay Icons category--perhaps these complaints would be more germane at Category talk:Gay icons. Philwelch 05:22, 25 Mar 2005 (UTC)


 * All I can go by is the test case. If it was a typical example (as a test case should be) this is a bad idea. -- Jmabel | Talk 06:37, Mar 25, 2005 (UTC)


 * Fair enough. But look at the reasoning. If there was something wrong with the test case, what should we change if this becomes a broader project? The test case wasn't necessarily a typical example of the specific kind of list we would merge to a category, it was an example of the merging methodology and process. I start to wonder if I would have gotten a better reception if I didn't edit a list that was related to a minority subculture... Philwelch 16:50, 25 Mar 2005 (UTC)


 * I agree with Jmabel, but on different grounds.
 * You risk having particularly famous people, places, etc. in so many categories as to make the whole thing unreadable.
 * Some lists keep track of things that are valuable in aggregate but that can't sustain an article on their own. (e.g. List of earthquakes, List of Latin phrases, List of ethnic slurs).
 * Now you can certainly argue that lists whose members don't warrant an article shouldn't exist in the first place or that many lists are at odds with What Wikipedia is not, but I think it's worth considering the ramifications of the changes that you are proposing. --CVaneg 17:26, 25 Mar 2005 (UTC)


 * I'm not proposing merging every list into a category. List of ethnic slurs definitely wouldn't be merged, because we shouldn't create an article for every ethnic slur. Remember, if we do this project, one of the primary things we will be doing is selecting which lists would best be merged to categories, and which lists to leave alone. But if we have something like List of engineering topics, shouldn't we try to merge that with Category:Engineering? Philwelch 18:16, 25 Mar 2005 (UTC)


 * From the original proposal: "I am proposing that all lists on Wikipedia that can be made into categories, or already have been made into categories, be merged into said categories by a script." Philwelch 18:17, 25 Mar 2005 (UTC)


 * Ah. That will teach me to actually read before responding.  In that case most of my reservations are not applicable.  I'd be interested to see how the script performed, particularly since lists can be formatted in various ways, and you would not want to lose any information. (e.g. converting sub-lists into sub-categories) --CVaneg 19:51, 25 Mar 2005 (UTC)


 * As soon as we find someone who can write a script or a bot to do this sort of thing, we'll see about that. That's definitely a technical issue we'll have to work out. Philwelch 20:28, 25 Mar 2005 (UTC)

Removal of a page from a category is not detected
Categories miss a major feature of lists: although pages in a category can be watched by applying Related Changes to the category, removal of a page from the category is not detected without having all these pages in advance on one's watchlist. For a list one can look in the page history. Therefore I think lists should not be deleted, even if there is a corresponding category.--Patrick 13:57, 25 Mar 2005 (UTC)


 * How about we keep a category listing, in list form, on the talk page of the category, that is automatically synchronized to the category by a bot? If we're already going to use a bot to merge lists to categories, I don't think it would be a challenge to use that bot to synchronize the category to the list every other day or something. Philwelch 16:56, 25 Mar 2005 (UTC)
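A rough sketch of the comparison such a synchronizing bot would need, assuming it can already fetch the titles on the list page and the titles in the category as plain strings (the fetching itself is not shown, and the function name and sample titles are made up for illustration):

```python
def membership_diff(list_titles, category_titles):
    """Compare a list page with a category.

    Returns two sorted lists:
      - titles on the list page that are not in the category
      - titles in the category that are not on the list page
    """
    on_list = set(list_titles)
    in_category = set(category_titles)
    return sorted(on_list - in_category), sorted(in_category - on_list)


# Hypothetical sample data.
list_page = ["Bridge", "Gear", "Turbine"]
category = ["Bridge", "Turbine", "Valve"]

only_on_list, only_in_category = membership_diff(list_page, category)
print(only_on_list)       # ['Gear']
print(only_in_category)   # ['Valve']
```

Running the same comparison against yesterday's snapshot of the category instead of the list page would also surface removals, which is the change that Related Changes does not show.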


 * Yes! I would go further-- map the categories, so that you can watch just the highest level category. I know, the mapping is complicated by the existence of cross-linking and recursion, but that is solvable-- you keep track of where you've been and when you hit a node you've already mapped, you simply refer to the earlier occurrence.  I have an example at User:Mwanner/Sandbox.  Mwanner 12:08, May 6, 2005 (UTC)
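The "keep track of where you've been" approach can be sketched as a depth-first walk with a set of visited categories. This is only an illustration: the subcats mapping, the category names, and the "(see above)" marker are assumptions, not how the sandbox example or MediaWiki actually does it.

```python
def map_category(root, subcats):
    """Return an indented outline of the category tree under `root`.

    `subcats` maps a category name to a list of its subcategories
    (hypothetical input). A category that has already been mapped is
    written as a back-reference rather than expanded again, so
    cross-links and cycles cannot cause endless recursion.
    """
    lines = []
    seen = set()

    def visit(name, depth):
        indent = "  " * depth
        if name in seen:
            lines.append(f"{indent}{name} (see above)")
            return
        seen.add(name)
        lines.append(f"{indent}{name}")
        for child in subcats.get(name, []):
            visit(child, depth + 1)

    visit(root, 0)
    return lines


# Hypothetical graph with a cross-link and a deliberate cycle.
graph = {
    "Engineering": ["Civil engineering", "Bridges"],
    "Civil engineering": ["Bridges"],
    "Bridges": ["Engineering"],
}

for line in map_category("Engineering", graph):
    print(line)
```

Each category is expanded exactly once, at the first place the walk reaches it, so the outline stays finite however tangled the graph is.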

Red links
Categories don't provide for red links (non-existing articles). How do you propose to fix that? -- AllyUnion (talk) 10:04, 28 Mar 2005 (UTC)


 * Any list with a significant quantity of red links would probably be a bad candidate for merging to a category. If a list has just a few red links, we can turn the red links into stubs and then create the category. The red link issue is an issue involved with the selection of which lists to merge to categories. Philwelch 16:29, 28 Mar 2005 (UTC)


 * Red links could be kept in the descriptive text on top of the category page, or in the category's talk page. Radiant_* 10:26, Mar 29, 2005 (UTC)


 * I agree with Radiant - see the next comment section on dynamic merge of missing links in the category page. --Yurik 21:18, 29 Mar 2005 (UTC)

Annotated Category Listings
IMHO, the only reason why categories and lists have not yet merged is that categories do not allow parametrized items. For example, in some actor's article, inserting Birth=...|Death=...|ShortDescription=... can result in a properly formed item on the Category:Actors page, with birth/death/short description next to the name. We can have some standardized formatting element on the category page, describing how category parameters should be rendered:  -- here, * means the name of the article.

The feature request has been added to bugzilla: [Bug 1775] --Yurik 08:31, 29 Mar 2005 (UTC)


 * No, that is by no means the only reason. Categories can only include articles that are already written, so there is no way to list items that do not merit, or do not yet have, articles. Also, many things may belong in a list somewhere, but are not inherent enough to deserve any mention in the article itself. Inclusion in a category necessarily shows up on the page of the article itself. -- Jmabel | Talk 16:16, Mar 29, 2005 (UTC)


 * I disagree - categories have a page associated with them, and that page can list any missing articles at the top, thus actually increasing their visibility. If the public demands inclusion of links alphabetically in the list, some form of a template can be used to dynamically insert those links in the list. Category page example (assuming category name is "Actors"):
 * First, some general description what this list is about...
 * The above category page would produce a list of all actors in "Full Name (Born-Died)" format, plus all missing links from the template will be included in red on that same page in alphabetical order. Some bot can later clean up all links that have been created. --Yurik 21:07, 29 Mar 2005 (UTC)
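To make the intended output concrete, here is a minimal sketch of the rendering described above. It is not proposed syntax: the function name, the parameter names (born, died), the "[red link]" marker, and the sample people are all invented for illustration.

```python
def render_category(members, missing, template="{name} ({born}-{died})"):
    """Return the display lines for a category page, sorted by title.

    members: dict mapping article title -> dict of parameters set in the
             article (hypothetical; must not itself contain a "name" key)
    missing: titles that have no article yet, shown as red links
    """
    rows = []
    for title, params in members.items():
        rows.append((title, template.format(name=title, **params)))
    for title in missing:
        rows.append((title, f"{title} [red link]"))
    # Sorting the (title, text) pairs interleaves existing and missing
    # entries in one alphabetical sequence.
    return [text for _, text in sorted(rows)]


# Hypothetical sample data.
members = {
    "Jane Example": {"born": 1900, "died": 1980},
    "Adam Sample": {"born": 1920, "died": 1999},
}
missing = ["Carol Placeholder"]

for line in render_category(members, missing):
    print(line)
```

With this sample data the missing entry appears between the two existing ones, which is the "in red on that same page in alphabetical order" behaviour being asked for.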


 * This feature is mostly needed for the duplicate category/list pages. When you have a list that does not categorize information, then the whole point is moot, as the list is just a simple article without any automation. I am trying to decrease the amount of manual categories+lists maintenance - people tend to forget to update all relevant lists, plus create duplicate cat+list pages. --Yurik 21:14, 29 Mar 2005 (UTC)


 * This proposal is now cross-posted (cleaned-up) at Wiki talk:Categorization policy. I think the discussion should be moved as well. --Yurik 07:36, 30 Mar 2005 (UTC)

Rather than creating new syntax, I think it would be easier - in the short term - just to create stub articles. Someone's going to have to do that at some point anyway; might as well get the ball rolling. In the long run, I'm hoping that categories will look more like lists - able to be annotated and to contain redlinks like any other page. -- Beland 22:43, 30 Mar 2005 (UTC)


 * There are two features I am proposing - a way to format/annotate the article name (with extra info) on the category page, and a way to add missing links. #1 is annotation:  (article page)   →     (category page) #2 is red links:   (category page).
 * I think you are referring to the second item, which, I agree, is arguable (it tries to follow wiki syntax). I do NOT agree with the stubs idea - most foreign wikis have mostly missing links. Those red links tell there is no article and encourage people to start writing, whereas stubs would falsely suggest there is an article when there is none and add to the general clutter without adding any useful information. The only interim alternative IMHO is to add missing links to the top of the category page, separate from auto-generated lists. --Yurik 23:20, 30 Mar 2005 (UTC)


 * Well, it's certainly debatable whether it's better to have a red link or a stub. At least on the English project, stubs are sorted by category, so people who have an interest in a particular subject area can easily find articles that need basic information.  Red links are not sorted that way, as far as I know.  Also, when the stub is created, someone should be putting a basic definition in there (which would be more information than a red link would give), and tagging it as a stub.  Personally, I've found that new articles get attention from other editors.  I guess it shows up on Recent Changes, and some people realize they have something to contribute to the topic, and then do so.  (As opposed to just sitting around not knowing that they should be contributing to something that hasn't yet been started.)  It also starts the process of centralizing information about that subject - there may be other articles that have bits and pieces of information that should be moved over, and there may be an existing article with a slightly different name with which to merge. -- Beland 03:13, 31 Mar 2005 (UTC)


 * Here's an idea for a feature request: links to stub articles have purple links, links to non-articles have red links, and links to normal articles have blue links. Philwelch 03:25, 31 Mar 2005 (UTC)


 * I like the stub-color links idea, except that new users can get confused by the multitude of link colors: regular, visited, stub, visited stub, missing, visited missing. As far as stubs vs links - I can see benefits for both - when first writing a long article / category, the writer usually just lists all possible relevant topics as links, and other, mostly casual, visitors might later fill them out after stumbling upon a red link; others (mostly those who keep track of recent changes), will see a stub created, and might contribute. Again, please consider interwikis with far fewer articles. --Yurik 07:46, 31 Mar 2005 (UTC)


 * I'm not sure why consideration of smaller wikis is necessary in this case; they can and do arrange things quite differently. Personally, I'm in favor of a long-term solution that allows redlinks to be intermixed with bluelinks, without any special syntax.  I'm saying we can get by without any redlinks in categories; if other wikis want to lobby for redlink support of any particular kind in categories sooner rather than later, that's fine by me. But I'd rather not have English categories made even more confusing to use than they already are. -- Beland 17:42, 31 Mar 2005 (UTC)

Categories produce relatively heavy server load

 * Myth or fact?
 * See the link to comments by Jamesday below. -- Beland 03:23, 31 Mar 2005 (UTC)

Watchlists and the History tab cannot track changes in a specified category's articles and subcategory lists

 * Any proposed solutions?
 * Yes - file a bug report and get it fixed. -- Beland 03:23, 31 Mar 2005 (UTC)

Categories appear broken on mirror sites

 * Please elaborate
 * You'll have to ask User:John Gohde for examples. -- Beland 03:23, 31 Mar 2005 (UTC)

Article and subcategory lists in categories cannot be annotated

 * Proposed feature to fix this above

Categories can't contain non-existent articles

 * They can, but only in the top (article) area. Proposed feature to fix this above

If an article is placed in too many categories, there will be too many category links at the bottom of the page

 * Any proposed solutions?
 * 1.) Continue to use lists. 2.) Allow for "hidden" categories. More discussion below. -- Beland 03:23, 31 Mar 2005 (UTC)

Some aspects that may merit mention in a list of some sort may nonetheless be so tangentially connected to the article that a category link at the bottom of the page is a liability

 * That is an argument for retaining some lists as lists. No one ever suggested merging all lists to categories--that's why I moved this page to "Merge some redundant lists to categories". Philwelch 20:29, 30 Mar 2005 (UTC)
 * Disagree - if they are redundant, they should be merged, otherwise why would there be both? Please show examples. --Yurik 22:50, 30 Mar 2005 (UTC)
 * Well, in these cases, the redundant categories should be converted into lists. These types of categories are currently being deleted by WP:CFD, so there are probably few, if any, examples of categories of this type.  List of famous left-handed people would be an example of a list of this type. -- Beland 03:23, 31 Mar 2005 (UTC)
 * Yurik, you just said you disagreed with me and then restated the point I'm trying to make here. Philwelch 03:29, 31 Mar 2005 (UTC)
 * Philwelch, sorry, misread the argument. Has anyone considered having a "Meta box" next to the main article edit box, so that all categories and interwikis would go there instead of cluttering up the article text? It sometimes gets very disorienting, especially for new authors. --Yurik 08:06, 31 Mar 2005 (UTC)

Categories are harder to move

 * Agree. Need a feature request (i.e. as part of "move" tab). Maybe limit for the new users - on click show move category request page. --Yurik 22:50, 30 Mar 2005 (UTC)
 * Bots are currently being used as an interim measure. -- Beland 03:23, 31 Mar 2005 (UTC)

It's harder to edit the article and subcategory membership lists of categories

 * What kind of edits? Bots can do major renames, whereas minor changes are usually related to specific articles
 * Well, if you want to break a large category into subcategories, then you have to edit say, 100 or 200 articles. It would be much easier if you could just import the list of articles into a text editor, split it up into 5 or 10 lists, and then post those lists to the 5 or 10 category pages you want.
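For illustration, the "split it up into 5 or 10 lists" step could look like the sketch below, assuming the membership has already been exported as a plain list of titles. The function name is made up, and splitting into alphabetical runs of equal size is just one possible rule; a topical split would need human judgment.

```python
def split_members(titles, parts):
    """Split a sorted copy of `titles` into `parts` contiguous groups
    whose sizes differ by at most one."""
    ordered = sorted(titles)
    size, extra = divmod(len(ordered), parts)
    groups = []
    start = 0
    for i in range(parts):
        # The first `extra` groups each take one additional title.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


# Hypothetical sample data.
for group in split_members(["Dam", "Axle", "Crane", "Bridge", "Engine"], 2):
    print(group)
```

Each resulting group would then be posted to one subcategory, which is exactly the step that currently requires editing every member article by hand.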

Categories break when they reach 200 members

 * Bug [1058] appears to be fixed. Any other?
 * Yes. First of all, from a usability perspective, having multiple pages for large categories, sorted alphabetically, is poor.  All articles should be displayed on the same page.  Also, it's not guaranteed that all subcategories are listed on the first page.  For example, you might find articles and subcategories beginning S-Z on the second page.  This is very confusing.  All subcategories should be displayed on the first page.

It's nicer to view a long list on a single page with subsections than a category which is split into subcategories on multiple pages

 * Please elaborate
 * Well, the general style for categories is that if there are too many articles (over 50? 100? 200?), you split the category up into subcategories, to make things easier to find. This means that the original membership is spread over multiple subcategory pages.  With a list, you would simply divide into sections, so you could still see all the articles in a unified listing, but things would also be easy to find.  You could fix this if you engineered non-alphabetical sorting for categories (and convinced people to use it). -- Beland 03:30, 31 Mar 2005 (UTC)

The site search engine doesn't work well with categories

 * Please elaborate. Isn't search disabled? Google & Yahoo are currently default
 * True, the site's search engine has larger problems. I guess Google does see categories.  I list this here because Jamesday mentioned it in his comments (linked below). -- Beland 03:23, 31 Mar 2005 (UTC)

Arguments for converting lists into categories

 * The pointer from a category to a member article is automatically synchronized with the pointer from the article to the category.
 * As articles are renamed, merged, and redirects created, the category membership list is automatically updated.
 * Complicates vandalism, as only one item in category would get vandalized
 * Example - Recent deaths can get vandalized, by adding names to it. Each change would have to get researched by the maintainer. When categories are used, the maintainer of the personal page would see the change, and would likely know more about the subject matter.
 * I'm not sure it makes vandalism harder, exactly...with a list, you can watch the list for vandalism. With categories, you have to watch the article to catch vandalism.  It's currently not possible to watch all possible articles that might in the future be added to a category, and it's not possible to view the history of changes to the category's membership lists.  With a list, you can watch both the list and all the articles in it.  You can also use "Related changes" to watch all the articles in a list without adding them to your personal watchlist.  But categories could be fixed so that "Related changes" and watchlists and histories work, at which point they would be more clearly superior to lists for publicizing vandalism to more editors. -- Beland 03:35, 31 Mar 2005 (UTC)
 * With a smaller category, an individual observer can have expertise over an entire topic with all entries, and will easily spot a mis-listed item. With large lists, such all-encompassing knowledge is less likely, thus more items might get in without being thoroughly checked. With categories, the article itself becomes the point of monitoring, and the original article's author might know better which category it should be subscribed to. If we get category monitoring, the whole point is moot, as both category-side AND article-side monitoring becomes possible. --Yurik 08:20, 31 Mar 2005 (UTC)

Weighing pros and cons
From Jamesday's comments here and here, it sounds like the performance problems are caused simply because category page views are not cached. This means that if 1,000 people view a category, then 1,000 database lookups are needed. We would very much prefer that, say, 990 of those people view a cached version, and do only 10 database lookups. Am I wrong in thinking that constructing a category page from scratch isn't significantly more expensive than constructing a list page from scratch? Either way, it needs to be determined whether or not a long list of articles should be "redlinked", and it's not like the entire database is searched to find articles that belong to the category; that information is pre-indexed by the database server, right?
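The arithmetic above can be checked with a toy model. This is only a sketch of time-based caching in general and says nothing about how MediaWiki caches pages; the class name, the 100-second lifetime, and the one-view-per-second traffic are assumptions chosen so the numbers match the example.

```python
class CachedCategoryPage:
    """Serve a rendered page from memory, rebuilding it after `ttl` seconds."""

    def __init__(self, build_page, ttl, clock):
        self.build_page = build_page  # hypothetical function doing the database lookup
        self.ttl = ttl                # lifetime of a cached copy, in seconds
        self.clock = clock            # callable returning the current time in seconds
        self.page = None
        self.built_at = None
        self.lookups = 0              # number of times the database was consulted

    def view(self):
        now = self.clock()
        if self.built_at is None or now - self.built_at >= self.ttl:
            self.page = self.build_page()
            self.built_at = now
            self.lookups += 1
        return self.page


# Simulate 1,000 views arriving one second apart, with a 100-second lifetime.
now = [0]
cache = CachedCategoryPage(lambda: "rendered page", ttl=100, clock=lambda: now[0])
for second in range(1000):
    now[0] = second
    cache.view()

print(cache.lookups)  # 10
```

The trade-off is staleness: in this model a newly categorized article could take up to the full lifetime to appear on the category page.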

I would feel uncomfortable proceeding with mass list-to-category conversion if the developers thought it would negatively impact site performance in a significant way. Jamesday seems to think this problem will be fixed in 4-9 months.

The next most important problem is the ability to watch and examine the history of categories. I think people should be able to veto conversion if they are watching a list or would like the history to remain viewable in a central location. We should also file a feature request in bugzilla so that these issues will be fixed, and then later come back and convert the lists that people were watching into categories. I don't find the "move it to a talk page and have a bot sync it" solution to be particularly satisfying. The server load generated by such a bot would be quite high, especially given the performance hit of each category load. The sheer number of lists that would have to be synchronized with categories would mean that updates would be significantly delayed, probably on the order of days. I think it would be better just to leave the original lists in place for the time being. If someone wants to run a bot to temporarily synchronize existing, redundant categories and lists, that might be a reasonable interim measure, but it might be easier just to wait for code improvements so we can do full conversion.

Ease of use is also an important issue - if it's hard to make useful contributions, fewer people will do it. If you could edit the category membership by editing the category page (instead of 700 individual article pages), that would be a lot easier. But personally, I wouldn't worry about getting this fixed before starting mass conversion. Especially since redundant lists create opportunities for editors to waste time making redundant improvements. There's also the question of whether or not every single list that an article is on should have a link from the article itself. Many people seem to find that doing that would unnecessarily clutter the navigation area at the bottom of the article, and that people should look at the "What links here" page to find all the lists that point to the article. Personally, I find this unsatisfactory, because it's not obvious to the new user that this information is available through this link, and also because there are a lot of other inbound links that clutter up that page. (Including many user and talk pages, and other articles which have common list or category memberships.) On the other hand, it would be nice to segregate the "important" memberships from the "trivial" ones, for easier navigation.

One way to do this would be to mark categories as "incidental" or "vital", and have two navigational boxes, as appropriate. Another would be to continue to use lists for "trivial" or "incidental" facts, and use categories only for "vital" memberships. (Or "topical categories" vs. "lists of things with certain attributes", etc.) A bot could be used to make sure that each article has a "See also" link for each list it is a member of. Alternatively, we could just convert lists of both kinds into categories, and just make a convention that the "important" ones should be first in the list.

All of the other negatives of categories can also be fixed with code improvements, including the ability to have manually sorted subsections. (Well, I don't actually care about broken mirror sites.)

-- Beland 03:21, 30 Mar 2005 (UTC)

Next steps
Honestly, I think the thing to do right now is to file a bunch of requests in bugzilla, and then put this project on hold for a few months until some critical fixes can be made. (And perhaps lobby a little for developer attention, so the project can proceed as quickly as possible.) It might be OK to deal with lists which are redundant with existing categories, to prevent wasted effort. But we'll need to get an idea from the developers about how many conversions would be a reasonably safe amount, and then have some process for allowing people to object to conversion for change-watching reasons. (Otherwise, I'm sure we wouldn't get community approval to run a bot for this purpose.) I'm not sure it's worth the effort of arguing about which lists are most important to convert, if we're going to be able to open the gate much more widely in a few months anyway.

In fact, I think the interim might be better spent converting "See also" links into categories. (Which is actually sort of what was done in the "Gay icon" case.) There's a lot more manual involvement here, and a lot of immediate benefits, including reduced redundancy and increased maintainability of cross-references, and considerably improved navigation. The performance impact should also be somewhat less, both because things will proceed more slowly, and because the categories already exist.

The only navigational penalty for converting "see also" links into categories is the additional click and page load that's required. But that's what navboxes are for. Adding a navbox to a category automatically adds all the articles that contain that box to the category, so there's a neat solution which satisfies both the need for short click-paths and easy navigation.

(That makes me wonder whether category "peers" shouldn't automatically be displayed at the bottom of every article. But that seems like a question for another page and another day.)

In any case, I've done some automated scans of previous database dumps to find articles in most need of "see also" link conversion. There are some subpages of Wikipedia:Auto-categorization which present the results in a convenient report designed to put everything a human editor needs to do the conversion for a given article in one place. -- Beland 03:21, 30 Mar 2005 (UTC)

Bot
I can implement any bot requests related to this project, if no one else is interested. Only fairly trivial enhancements would be necessary for Pearle to be able to do this. (I could also pass the source code on to someone else who would actually operate the bot.) -- Beland 03:21, 30 Mar 2005 (UTC)

Please don't do this yet!
Lists have one vital feature that categories cannot yet have: they can have entries for articles that do not exist. In addition, lists can support semantics that categories cannot. I believe that the proposals here are over-hasty, and are getting ahead of what the category scheme can as yet support semantically. In any case, the lists should be kept as long as possible until there is general community consensus for how to proceed. -- The Anome 07:44, Mar 30, 2005 (UTC)


 * Do you think my proposal addresses your concerns? Thanks! --Yurik 08:02, 30 Mar 2005 (UTC)


 * Anome, no one worth listening to thinks we should replace ALL lists with categories. That's why I moved this to "merge some redundant lists to categories". Philwelch 03:23, 31 Mar 2005 (UTC)


 * If a list is redundant with a category, the list should be made more useful, such as with annotation or more useful sorting/grouping. Stan 01:38, 2 Apr 2005 (UTC)


 * What if someone changes the list or adds or removes an article from the category? We have subcategories for sorting and grouping, Stan. What we shouldn't have is two identical sets of redundant, 'non-identical' information. And that will inevitably happen whenever a list is redundant with a category. Philwelch 04:21, 2 Apr 2005 (UTC)

Semantic structure?!
Isn't this problem a problem of semantic structuring? Lists and categories have something in common - they consist of a list of items which are hierarchically linked.

So what is the difference?

 * Categories usually do not link items by an order other than alphabetical (it's possible though, but it is a pain). Lists primarily do that.
 * The formatting of lists gives more freedom.
 * Categories live bottom up, lists top down (though categories can do something that lists can)
 * Pointers versus Edges - Categories use edges, while lists only use pointers. The concept of edges implies automatic handling of "the second side" of the link. (a new category is created only with a new article -> the category page has to be maintained) - the other way around it is not possible (currently).
 * Orphaned links can be created by lists, but not by categories (currently).
 * Readers often want to know the semantics of a lemma and browse through equivalents or antonyms - that's why navigation bars are so popular.
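The pointer/edge distinction can be illustrated with a small data structure in which each membership is recorded once and both directions are kept in step automatically. This is a sketch under assumed names (MembershipIndex and its methods are invented), not a description of the MediaWiki schema.

```python
from collections import defaultdict


class MembershipIndex:
    """Record each article/category link as an edge visible from both sides."""

    def __init__(self):
        self._categories_of = defaultdict(set)  # article -> categories
        self._members_of = defaultdict(set)     # category -> articles

    def add(self, article, category):
        # One call updates both directions, so they can never disagree.
        self._categories_of[article].add(category)
        self._members_of[category].add(article)

    def remove(self, article, category):
        self._categories_of[article].discard(category)
        self._members_of[category].discard(article)

    def members(self, category):
        return sorted(self._members_of.get(category, ()))

    def categories(self, article):
        return sorted(self._categories_of.get(article, ()))


# Hypothetical sample data.
index = MembershipIndex()
index.add("Bridge", "Engineering")
index.add("Gear", "Engineering")
print(index.members("Engineering"))   # ['Bridge', 'Gear']
index.remove("Bridge", "Engineering")
print(index.members("Engineering"))   # ['Gear']
print(index.categories("Bridge"))     # []
```

A list page, by contrast, stores only the forward pointer, so nothing on the article side changes when the list is edited.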

What we need
What we actually want to achieve is a semantic structure, where we try to use non-primitive constructions without having the primitives integrated yet (neither categories, lists nor navigation bars are). We need some basic primitives to link articles in a structured manner. --BoP 22:29, 30 August 2005 (UTC)
 * External structuring information: (already done with categories) "element of" <-> "category"
 * Properties: "name of property: value" (currently not well solved) - to make automatic structured list generation possible as well as defining proper predecessors and successors
 * list formatter: The lists associated with categories should be formattable by specifying a template
 * navigator: a navigation bar that displays some predecessors and successors and the category, also with an adjustable template (format probably like the category entry + template link)
 * semantic browser: a browser that can read the data and display it like the visible thesaurus or freemind would be really great
 * structuring: It is already demonstrated in Wikispecies, well we do not have to go that far but the goal would be something like OWL for semantics