User:Dfletter/Taxonomy

Introduction
Computers, Computer Science, Computer Engineering, Computing, Information Technology, Software Engineering: with so many different terms for such closely related topics, it is no wonder that Wikipedians find it difficult to reach consensus quickly on a category structure that encompasses so many varied ways of looking at these domains. This article is a working paper for Wikipedians collaborating on the part of the WikiProject devoted to improving Wikipedia's coverage of these domains.

Work is currently underway at Wikipedia:WikiProject_Computers to coordinate efforts on computer-related topics and to further the goals of the Wikipedia project. Since this is a large project requiring the work of many individuals over a long period of time, organizing and coordinating these individuals is important to the effort's success. An important part of that organization is a structure for the topics within our stated domain (a taxonomy). While several taxonomies already exist for this domain, the project is unlikely to simply adopt any one of them. However, these published taxonomies are helpful in the discussion.

In addition, it is my goal to create a strawman proposal to submit to the members of the project for their comment. Since the members of this project are also the ones who are categorizing articles, this proposed taxonomy will begin to influence the shape of the Wikipedia taxonomy as soon as it gains a reasonable level of support among the people reviewing and commenting on it.

Academia versus enterprise; science versus engineering; the search for new principles versus the application of known principles. These tensions exist in the quest to find a workable categorization scheme. But there are others too: the casual high-school user of Wikipedia researching a term paper versus the seasoned professional brushing up on topics they may not have studied in years, or the confusion between the creation of computer tools and the use of those tools in other disciplines. To gain wide acceptance, a categorization scheme must accommodate all these tensions.

One common way to answer this kind of challenge is to adopt, either wholesale or in part, the work of some acknowledged authority in the domain. There is no shortage of them, and none is completely satisfactory. Yet ceding that responsibility to someone else allows the larger project to proceed without endless disputes over the meta-data and, presumably, with more time spent on the core challenge of writing good articles for the topics. Even if the adoption of an existing structure is politically impossible, perhaps these authorities will at least be persuasive. To that end, you will find references to some existing structures below.

To familiarize the reader with whatever structure now exists in Wikipedia in these domains, there is a section that documents a significant portion of the current categorization scheme. However, given the fluid nature of Wikipedia, especially in domain areas in as much flux as these, it is out of date as soon as it is written. It helps only as a starting point before diving into whatever is current at the time this is read, as opposed to written.

The last section is a strawman created to provide a reasonable starting point for the discussions to follow.

ACM Computing Classification System
http://www.acm.org/class/1998/

Philosophy
The philosophy that guided the Committee in the development of this Classification System is as follows:

 1. The heart of the CCS is a tree, the easiest format in which to represent a hierarchical structure in a linear publication format.
 2. The classification tree is restricted to three letter-and-number-coded levels in order that the tree be able to accurately reflect the essential structure of the discipline over an extended period.
 3. Subject descriptors (an uncoded fourth level of the tree) provide sufficient detail to cope with new developments in the field. Originally, subject descriptors were intended to change frequently; in practice, however, it is difficult to delete obsolete subject descriptors without obliterating the references to works originally classified under them. Thus, subject descriptors are a permanent part of the tree. Those marked by an asterisk have been "retired" from active usage. Users of the Classification System may still search ACM's online and CD-ROM files using the retired descriptors for items classified before the descriptor was retired. Footnotes indicate the years the items were retired.
 4. Counts of past usage of CCS index terms assisted the Update Committee in deciding which terms to retire and which sections to consider for expansion.
 5. For the 1998 update, the Committee considered changing the overall structure of the CCS to reflect the rapidly evolving discipline of computing, but the constraint of maintaining a historical search capability as mentioned in item 3 severely limited its options in this respect. The Committee decided to retain the overall structure while implementing changes at lower levels of the tree, in order to have a working CCS that is still recognizable when compared to earlier versions. A major redesign of the CCS that would reach into higher node levels is being considered for the future, however.

The tree consists of 11 first-level nodes and one or two levels under each of these. The set of children of all first and second-level nodes begins with a node General and ends with a node Miscellaneous. The first-level nodes have letter designations (A through K). The second and third levels have combination letter-and-numerical designations. In actual classification usage, first-level nodes (like B. Hardware) are never used to classify material. For material at a general level, the General node (in this case B.0) is used instead. The General node at the first or second level can serve two purposes: it is used for papers that include broad treatments of the topic covered by its parent node (the node immediately preceding it in the tree), or it may cover several topics related to some (but not necessarily all) of its sibling nodes. For example, under K.7 The Computing Profession, the node K.7.0 General would be used to classify a general article on the computing profession, but also could be used for an article that dealt specifically with computing Occupations (K.7.1), Organizations (K.7.2) and Testing, Certification, and Licensing (K.7.3).
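
The General-node convention described above can be illustrated with a small sketch. This is not ACM code; the function names (level, parent, general_node, classify) are hypothetical, the tree fragment holds only nodes named in the text, and the rule for multi-topic material is my reading of the K.7 example.

```python
from typing import List, Optional

# Fragment of the CCS tree, limited to nodes named in the text above.
CCS_FRAGMENT = {
    "B": "Hardware",
    "B.0": "General",
    "K.7": "The Computing Profession",
    "K.7.0": "General",
    "K.7.1": "Occupations",
    "K.7.2": "Organizations",
    "K.7.3": "Testing, Certification, and Licensing",
}


def level(code: str) -> int:
    """Depth in the tree: 'B' -> 1, 'K.7' -> 2, 'K.7.1' -> 3."""
    return len(code.split("."))


def parent(code: str) -> Optional[str]:
    """Code of the parent node, or None for a first-level node."""
    head, _, _ = code.rpartition(".")
    return head or None


def general_node(code: str) -> str:
    """The General child of a node, which the text numbers 0 (B.0, K.7.0)."""
    return code + ".0"


def classify(codes: List[str]) -> str:
    """Pick one node for material touching the given nodes (assumed rule)."""
    if len(codes) == 1:
        code = codes[0]
        # First-level nodes are never used directly; fall back to General.
        return general_node(code) if level(code) == 1 else code
    parents = {parent(c) for c in codes}
    if len(parents) == 1 and None not in parents:
        # Several siblings under one parent: use that parent's General node.
        return general_node(parents.pop())
    raise ValueError("this sketch only handles siblings under one parent")
```

Under these assumptions, classify(["K.7.1", "K.7.2", "K.7.3"]) selects K.7.0, matching the example of an article that deals with occupations, organizations, and licensing together, and classify(["B"]) selects B.0 rather than B.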

A set of subject descriptors is associated with most leaves of the tree (although seldom with the General and Miscellaneous leaves).

In addition to the subject descriptors printed as a part of the CCS, proper nouns or implicit subject descriptors can be included under the proper numbered node. For example, "C++" is an implicit subject descriptor under D.3.2 Language Classifications, "OS/2" is an implicit subject descriptor under D.4.0 Operating Systems, General, and "Bill Gates" and "Grace Murray Hopper" are implicit subject descriptors under K.2 History of Computing.
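
One way to picture implicit subject descriptors is as a mapping from a numbered node to the proper nouns filed under it. The sketch below uses only the examples given above; the names IMPLICIT_DESCRIPTORS and nodes_for are hypothetical and are not part of the CCS itself.

```python
from typing import Dict, List

# Implicit subject descriptors from the examples above, keyed by CCS node code.
IMPLICIT_DESCRIPTORS: Dict[str, List[str]] = {
    "D.3.2": ["C++"],                                 # Language Classifications
    "D.4.0": ["OS/2"],                                # Operating Systems, General
    "K.2": ["Bill Gates", "Grace Murray Hopper"],     # History of Computing
}


def nodes_for(descriptor: str) -> List[str]:
    """Return the node codes under which a proper noun is filed, sorted."""
    return sorted(
        code
        for code, names in IMPLICIT_DESCRIPTORS.items()
        if descriptor in names
    )
```

A lookup such as nodes_for("OS/2") returns the single node D.4.0, and a name that is not filed anywhere returns an empty list.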

General Terms are a defined set of 16 words that typically apply to many areas of the field. The General Terms list is somewhat orthogonal relative to the actual tree.

Encyclopedia of Computer Science
Encyclopedia of Computer Science, Anthony Ralston (editor), Edwin D. Reilly (editor), David Hemmendinger (editor), ISBN 0470864125

The Fourth Edition of the Encyclopedia of Computer Science uses these high-level classifications:
 * Hardware
 * Computer Systems
 * Information and Data
 * Software
 * Mathematics of Computing
 * Theory of Computing
 * Methodologies
 * Applications
 * Computing Milieux

http://www.doc.ic.ac.uk/~ar9/AnnalsArticleSubmit.html http://doi.ieeecomputersociety.org/10.1109/MAHC.2004.1278849

Taxonomy of Computer Science & Engineering
AFIPS Press, 1980 LOC # 79-57474

Google
Google's Directory is created by volunteer human subject-matter experts who contribute to the Open Directory Project (www.dmoz.org). Follow this link to see Google's category structure. If you are interested in learning more about how Google is working to categorize the web, follow this link.

I have created a separate page to show Google's taxonomy as it applies to computers and aligned topics. /Google taxonomies for computers

Open Directory Project
The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors.

The ODP powers core directory services for some of the most popular portals and search engines on the Web, including AOL Search, Netscape Search, Google, Lycos, DirectHit, and HotBot, and hundreds of others.

site: http://www.dmoz.org

For a high-level summary of ODP's organization of computer-related topics, see /Directory Open Project Taxonomy

Library of Congress Catalog
Science
 * Q300-390  Cybernetics
 * Q350-390  Information theory
 * QA75.5-76.95   Electronic computers. Computer science.
 * QA76.75-76.765   Computer software

Technology
 * TK5101-6720   Telecommunication
 * TK7885-7895   Computer engineering
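
Because some of these ranges nest (Q350-390 falls inside Q300-390), a call number can fall under more than one heading. The following sketch shows one way to look up every matching heading. The names LCC_RANGES and headings_for are hypothetical, and treating the numeric part of a call number as an ordinary decimal is a simplifying assumption.

```python
from typing import List, Tuple

# (class letters, low, high, heading), copied from the ranges listed above.
LCC_RANGES: List[Tuple[str, float, float, str]] = [
    ("Q", 300, 390, "Cybernetics"),
    ("Q", 350, 390, "Information theory"),
    ("QA", 75.5, 76.95, "Electronic computers. Computer science."),
    ("QA", 76.75, 76.765, "Computer software"),
    ("TK", 5101, 6720, "Telecommunication"),
    ("TK", 7885, 7895, "Computer engineering"),
]


def headings_for(letters: str, number: float) -> List[str]:
    """Return every heading whose range contains the call number."""
    return [
        heading
        for cls, low, high, heading in LCC_RANGES
        if cls == letters and low <= number <= high
    ]
```

With these ranges, headings_for("Q", 360) returns both Cybernetics and Information theory, while headings_for("TK", 7890) returns only Computer engineering.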

Others
A. Kent and J. Belzer, et al., eds. Encyclopedia of Computer Science and Technology, 43 volumes and ongoing, Marcel Dekker, 1976 (for first volume).

A. Tucker, ed. The Computer Science and Engineering Handbook, CRC Press, 1997

R. Rojas, ed. Encyclopedia of Computers and Computer History, Fitzroy Dearborn, 2001

report, from the USA National Research Council's Committee to Assess the Scope and Direction of Computer Science and Technology

Current Category Structure for Wikipedia
The Wikipedia classification system is, by its nature, organic and ever-changing. However, the parts that people generally agree are helpful are usually static. To support analysis and criticism, I maintain another page showing the part of the category structure that is of interest to me. /WP Taxonomy

Proposed Category Structure for Wikipedia (not yet ready for review)
/Draft Version