Wikipedia talk:Categorization/Categories and subcategories/Archive

I think this is a bad idea because it encourages overcategorization. It encourages people to add half a dozen categories to an article, instead of the most appropriate one. This is important - the more categories an article is in, the less meaningful they become.

This is a debate that should be held some time anyway - in my opinion, certain classes of categorization are undesirable and should be avoided. In particular, any categorization by gender, skin color, or sexual or political preference (unless, of course, this is directly relevant to the person we're talking about). I do believe this obviates the problem that the "all or nothing" rule is intended to solve.

It seems topic articles are a good exception, as you state on the page. Radiant_ >|< 09:48, August 23, 2005 (UTC)
 * Articles should be in categories that help people find the articles. Some articles should be in many categories.  Some shouldn't.  If you want to convince me that this is a bad idea, you have to explain what overcategorization is, why it is bad, and why this will encourage it.  This proposal is trying to fill holes due to UNDERcategorization without opening up the floodgates where everything ends up in every possible related category.


 * While I disagree with your desire to remove any categorization by gender, skin color, or sexual or political preference, I don't believe that argument is relevant to this discussion. I just happened to use African American actors as an example, but I could have used Academy Award-winning actors to illustrate the same problem.  Just because Marlon Brando is in Category:Best Actor Oscar, that doesn't mean that he should be removed from the parent Category:Film actors.  This is also an all or nothing example.


 * I also want to point out that most of what I am trying to codify is already common practice. -- Samuel Wantman 07:43, 24 August 2005 (UTC)


 * I think the main difference is whether a category's subcategories, if put together, would form the entire category. For instance, "people" is subcatted by "profession" (and also "nationality"), and possibly subcatted further. Since every person has a profession, there should be no articles on individuals in the category:people. They should all be in the subcategories.
 * A category on actors could arguably be subcatted into Oscar winners and non-Oscar winners. However, the latter is rather silly, so we don't have it. Thus, Marlon can be in both cat:actors and cat:oscar winners.
 * On the other hand, if we're going to subcat by ethnicity, it is obvious that every person has an ethnicity. So if cat:Americans is subcatted with "African Americans", "European Americans" and "Native Americans" (etc) then it follows that all articles on individuals should be put in the latter, not the former. Radiant_ >|< 14:17, August 24, 2005 (UTC)
 * I think we are basically agreeing. There are some minor differences in our reasons for things, but I don't think they are worth fighting over here.  I would say that although categorizing by ethnicity may occasionally be useful, such as with "African Americans", that doesn't mean it makes sense to categorize everyone by ethnicity; I don't think doing so adds to the usefulness of the categorization scheme.  And, while everyone may have an ethnicity, it often is far from obvious or clear what someone's ethnicity may be.  To me that would fall into your "rather silly" category.  (I find some of the subcategorization by nationality to fall into the same silliness.  For instance, I don't think it is useful to sub-categorize all professions by nationality.  Some professions are international, like Category:Film directors.  Where should Roman Polanski be found?  I don't think his nationality is an important distinction.  But this is all beside the point.)  I'm going to change the example from Halle Berry to Marlon Brando, because it will make this proposal less controversial.  -- Samuel Wantman 19:40, 24 August 2005 (UTC)
 * On the subject of categories based on ethnicity, Category:Italian-Americans is downright huge. I've asked before exactly what qualifies one as Italian American.  People in that category are everything from 19th century immigrants to Nicolas Cage and Scott Baio.  I really don't see the value of categorizing everyone based upon their race/ethnicity.  How inclusive should such a list be?  And because one category is large, the community is compelled to complete Category:Turkish-Americans and Category:British-Americans.  Cacophony 05:05, 20 September 2005 (UTC)

more on all or nothing rule
I'm not sure the "all or nothing rule" as stated reflects widespread current practice. I think in some cases it's reasonable, like the example of Category:Best Actor Oscar (and we should perhaps discuss in another forum whether Category:Best Actor Oscar should even be a category in the first place). I think there are other cases where it would lead to massive redundancy, like the ethnicity or national origin subcats of Category:American people. I think the difference might be the relative size of the incomplete subsets to the higher level supercat. Back to the "utility" yardstick, or perhaps more related to the Principle of least astonishment, if the supercat contains a collection of "relatively small" subsets (and the subsets are incomplete) the preponderance of the articles will be directly in the supercat. In this case, I think duplicate categorization is warranted. If the subcats contain a large fraction of the articles based on a relatively obvious division I think duplicate categorization is not warranted. This may make for an "ugly" rule, not pleasantly in line with any kind of formal database view, but IMO viewing categorization as a formal database is not realistic. -- Rick Block (talk) 17:45, September 11, 2005 (UTC)
 * I agree with the utility yardstick. Can you think of a good category to illustrate the principle of least astonishment?  How would you rephrase things?  BTW, most of the members in Category:African-American actors are also in Category:American actors, which is, in my opinion, the way things should be. -- Samuel Wantman 04:43, 12 September 2005 (UTC)

The least astonishment comment was more directed to the situation where some relatively large category has a relatively small (and incomplete) subcategory, in which case I'd think most users would expect the higher level category to contain all the articles (including the ones in the small, incomplete subcategory). One example of this might be Category:American films shot in Japan, which is a subcategory of Category:American films (the members of the subcat are not currently also in the higher level category, which seems weird to me). Here's a modified version of the rule (additions italicized, deletions struck).


 * THE ALL OR NOTHING RULE — When the subcategories <s>just happen to be</s> ''are'' a few incomplete subsets of the category. <s>Can ALL the articles be moved into</s> ''Do MOST of the articles belong in only the main category and not in'' one of the existing subcategories? <s>Would it make it easier for the user if</s> ''Is it impossible or awkward to complete'' the set of subcategories <s>was completed</s> and ''move'' all the articles <s>moved</s> to the subcategories so that <s>NOTHING was</s> ''NO articles are'' left in the category? If the answer to both of these questions is <s>"no"</s> ''"yes"'', duplication is a good idea. For example, actor Marlon Brando is in Category:Best Actor Oscar and its parent Category:Film actors. While it is possible to add Category:Actors who never won an Oscar to <s>fill out</s> ''complete'' the categorization scheme and then make it possible to remove the duplications with all actors being in one or the other of the subcategories and none in the parent category, this wouldn't make the categories any more useful, and would make it much harder to categorize actors and search for them in categories. Another example of this is Category:Bridges in New York City and Category:Toll bridges in New York City. ALL the toll bridges are listed in both categories. These situations come about when one hierarchy of categories (toll bridges in the United States) is a subset of another hierarchy of categories (Bridges in the United States). In a sense the subcategories are related categories and not actually part of the same hierarchy. It also makes it easier to see a complete list of the bridges in each location.

-- Rick Block (talk) 04:52, 17 September 2005 (UTC)
 * These are good changes. I'm going to put them on the project page. -- Samuel Wantman 04:57, 17 September 2005 (UTC)