User talk:SDZeroBot/Category cycles

Geographic neighbors of different kind
A very common kind of category cycle is the following:
 * 1) <Foo> (place on land)
 * 2) Bodies of water of <Foo>
 * 3) <Bar> (a body of water)
 * 4) Places on <Bar>
 * 5) Back to <Foo>

Examples:
 * Gulf of Aden → Horn of Africa → Horn African countries → Eritrea → Geography of Eritrea → Landforms of Eritrea → Bodies of water of Eritrea → Red Sea → Landforms of the Red Sea → Bodies of water of the Red Sea
 * Rivers of Manhattan → Hudson River → Populated places on the Hudson River → New York City → Environment of New York City → Water in New York City → Bodies of water of New York City → Rivers of New York City
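
For illustration, the second example can be checked mechanically. This is a minimal sketch, not a description of how SDZeroBot works: the `find_cycle` helper and the dictionary layout are hypothetical, and the closing link from "Rivers of New York City" back to "Rivers of Manhattan" is assumed from the way the cycle is listed.

```python
def find_cycle(children):
    """Return one cycle as a list of categories (start repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the current path / finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for child in children.get(node, ()):
            colour = state.get(child, WHITE)
            if colour == GREY:
                # child is already on the current path, so we closed a loop
                return path[path.index(child):] + [child]
            if colour == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for start in list(children):
        if state.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Category names taken from the example above; each category is assumed
# to contain the next one, and the last is assumed to contain the first.
chain = [
    "Rivers of Manhattan",
    "Hudson River",
    "Populated places on the Hudson River",
    "New York City",
    "Environment of New York City",
    "Water in New York City",
    "Bodies of water of New York City",
    "Rivers of New York City",
]
children = {a: [b] for a, b in zip(chain, chain[1:] + chain[:1])}
cycle = find_cycle(children)
print(" → ".join(cycle))
```

The recursive depth-first search is fine for a handful of categories; a real category graph would need an iterative version to avoid hitting Python's recursion limit.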

When the category "Places on <Bar>" doesn't exist and thus is not part of the cycle, the solution is straightforward: remove the category for the place <Foo> from category <Bar> (e.g. as I did with cycles involving Category:Strait of Malacca; see Special:Diff/981392261, Special:Diff/981392401, Special:Diff/981392616, Special:Diff/981392692, and Special:Diff/981392760).

How do we resolve such categorization cycles? —⁠andrybak (talk) 18:00, 2 October 2020 (UTC)
 * Categories are to help readers, not to facilitate bots. The only question is whether each individual connection makes sense to guide readers to related topics or subtopics. Whether that ultimately causes loops that trip up nonhumans is irrelevant. postdlf (talk) 19:38, 3 October 2020 (UTC)
 * Makes no sense. Which reader goes up and down the category hierarchy looking for pages to read? The only way the deep category hierarchies actually get used is in tools like search and petscan, which after all exist for the readers/editors only. The existence of bad links (of which cycles are just one example) causes the tools to give wrong results. – SD0001  (talk) 16:47, 5 October 2020 (UTC)
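
The point about tools giving wrong results can be sketched with a depth-limited subcategory walk, roughly the kind of traversal a deep category search performs. This is an assumption-laden illustration: the `descendants` helper is hypothetical, the category names come from the Gulf of Aden example above, and the closing link from "Bodies of water of the Red Sea" back to "Gulf of Aden" is assumed from the way that cycle is listed.

```python
from collections import deque


def descendants(children, root, max_depth):
    """Collect every category reachable from root within max_depth subcategory steps."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    seen.discard(root)
    return seen


chain = [
    "Gulf of Aden",
    "Horn of Africa",
    "Horn African countries",
    "Eritrea",
    "Geography of Eritrea",
    "Landforms of Eritrea",
    "Bodies of water of Eritrea",
    "Red Sea",
    "Landforms of the Red Sea",
    "Bodies of water of the Red Sea",
]
# Each category is assumed to contain the next; the last wraps around to the first.
children = {a: [b] for a, b in zip(chain, chain[1:] + chain[:1])}

reached = descendants(children, "Bodies of water of Eritrea", max_depth=6)
print(sorted(reached))
```

With these assumed links, a search six levels below "Bodies of water of Eritrea" already pulls in "Horn African countries", which is not a body of water at all.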

Country → Culture of country → Language → Language-speaking countries
Another common kind of category cycle:
 * Indonesia → Geography of Indonesia → Landforms of Indonesia → Archipelagoes of Indonesia → Outer Banda Arc → Timor → East Timor → East Timorese culture → Languages of East Timor → Malay language → Malay-language culture → Malay-speaking countries and territories
 * Kurdish culture → Kurdish language → Kurdish-speaking countries and territories → Iran → Iranian society → Demographics of Iran → Iranian peoples → Iranic culture
 * Berber → Berber languages → Berber-speaking countries and territories → Libya → Libyan society → Demographics of Libya → Ethnic groups in Libya

How do we resolve such categorization cycles? —⁠andrybak (talk) 18:00, 2 October 2020 (UTC)
 * this was raised and discussed at . The widely agreed solution is that articles about a country (in your examples: East Timor, Iran and Libya) should be in a category of territories by language spoken, but not the eponymous category ( etc.). Indeed, only the country itself is defined as a Foo-speaking territory, not the rest of the content of the eponymous category. The relevant guideline here is WP:EPONCAT. While there seemed to be an agreement on the solution, it was not widely implemented (my bad, I guess I had other things on my mind). Place Clichy (talk) 18:01, 10 February 2021 (UTC)
 * Thanks for the pointer. I've cleaned up all subcategories of Category:Administrative territorial entities by language and made sure that the articles are in the corresponding language categories. —⁠andrybak (talk) 11:49, 13 February 2021 (UTC)
 * It would be nice to regenerate the list, because a large portion of the currently listed cycles involve an "X-speaking countries and territories" category. —⁠andrybak (talk) 11:54, 13 February 2021 (UTC)
 * ✅! – SD0001  (talk) 15:42, 13 February 2021 (UTC)

It's hopeless
Sorry to be pessimistic, but poking at specific examples like this is hopeless. The wiki category system is fundamentally broken, since there's no information about what kind of relationship each subcategory link represents. I know I've ranted about this elsewhere, but it's worth repeating. For example, what's the relationship between Category:Gulf of Aden and Category:Horn of Africa? borders-on, maybe? Let's assume it is. Then, how would you resolve the (hypothetical) cycle Indian Ocean borders-on Pacific Ocean borders-on Southern Ocean borders-on Indian Ocean? You can't. Until you know what the relationships are, this is truly an intractable problem. Not only can't you understand how to break the cycle, you can't even understand if the cycle is inherently unbreakable. -- RoySmith (talk) 18:58, 2 October 2020 (UTC)
 * Per Categorization, categories can either be a WP:TOPICCAT (is-related relationship) or WP:SETCAT (is-a relationship). I think for SETCATs it's pretty clear, but the is-related relationship of TOPICCATs can vary by editor. Borders-on doesn't sound like a good relationship. I would just remove or change such categorisations to the way I see fit, because after all these categorizations must have been done by a single editor without any discussion. I don't think many people monitor changes to categories which would mean bad categorisations are going to exist in plenty -- if they result in a cycle, the bot would highlight them but not otherwise. – SD0001  (talk) 18:29, 3 October 2020 (UTC)
 * Is there any way to tell from looking at the category graph which type a particular edge represents? -- RoySmith (talk) 19:50, 4 October 2020 (UTC)
 * You mean for a bot? Not really. The Set category template exists, but it has only 32.5k transclusions, so I suspect not all set categories have been tagged with it. – SD0001  (talk) 14:50, 5 October 2020 (UTC)
 * Well, it sounds like a good place to start this quest would be to get all the set categories tagged with the Set category template. That at least would give us a line in the sand. We could then start to build tools which enforce that there are no cycles composed entirely of set categories, and tools which could traverse the set category subgraph with the knowledge that it is a tree (or at least a DAG). -- RoySmith (talk) 16:01, 5 October 2020 (UTC)
 * Unfortunately there are more problems: set cats (like Category:Office buildings in Manhattan) often contain topic cats (like Category:Empire State Building), causing a leak. I'm surprised the guideline doesn't explicitly say that set cats should only contain set cats, though both WP:SUBCAT (When making one category a subcategory of another, ensure that the members of the subcategory really can be expected (with possibly a few exceptions) to belong to the parent also.) and the wording of the Set category template imply it. –  SD0001  (talk) 16:42, 5 October 2020 (UTC)
 * That's why I was explicit about "no cycles composed entirely of set categories". Adding an additional constraint that "set cats can only contain other set cats" would be a stronger constraint, and perhaps something we could make a long-term goal, but at least getting the set cats identified as such is a step in the right direction, and useful in its own right. -- RoySmith (talk) 16:48, 5 October 2020 (UTC)
 * I would be delighted if the logically obvious rule "set cats can only contain other set cats" could be explicitly added somewhere. At present Categorization explicitly advocates  as a subcat of, thus adding a plethora of non-cities to a set category of cities (and often compounding the error by removing the article New York City from  on the grounds that categories are hierarchical). Oculi (talk) 20:44, 14 October 2020 (UTC)
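
The two constraints discussed in this thread could be checked roughly as follows. This is only a sketch under stated assumptions: `set_cats` stands for the categories tagged with the Set category template, the function names are hypothetical, and the parent category in the sample data is invented for illustration. The weaker check ("no cycles composed entirely of set categories") uses Python's standard `graphlib` module (Python 3.9+); the stronger one lists every topic category sitting directly inside a set category.

```python
from graphlib import CycleError, TopologicalSorter


def set_subgraph_is_dag(children, set_cats):
    """True if the links between set categories alone contain no cycle."""
    sorter = TopologicalSorter()
    for parent in set_cats:
        # keep only links whose both ends are set categories
        kept = [c for c in children.get(parent, ()) if c in set_cats]
        sorter.add(parent, *kept)
    try:
        sorter.prepare()  # raises CycleError if the added links form a cycle
    except CycleError:
        return False
    return True


def topic_leaks(children, set_cats):
    """List (set category, non-set subcategory) pairs, i.e. the 'leaks'."""
    return sorted(
        (parent, child)
        for parent in set_cats
        for child in children.get(parent, ())
        if child not in set_cats
    )


# "Buildings (hypothetical set cat)" is invented; the other two names are
# the ones mentioned in the discussion above.
children = {
    "Buildings (hypothetical set cat)": ["Office buildings in Manhattan"],
    "Office buildings in Manhattan": ["Empire State Building"],
}
set_cats = {"Buildings (hypothetical set cat)", "Office buildings in Manhattan"}

print(set_subgraph_is_dag(children, set_cats))
print(topic_leaks(children, set_cats))
```

In this sample the set category subgraph passes the weaker check, while the stronger check reports the Empire State Building link as a leak, matching the distinction drawn between the two constraints.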

Regenerated
Following a request, I re-ran the bot. Regards, – SD0001  (talk) 18:17, 3 October 2020 (UTC)