Wikipedia:Categorization/Topic Maps

Moved from the namespace page


 * This is a Wikipedia:policy thinktank project ("Article grouping techniques" subsection), initiated by User:BernardVatant

Discuss here the case for a Topic Map structure for Wikipedia.

 This project started in 2001 and is no longer active. User:BernardVatant (2006-06-13)

What are Topic Maps about? See: http://www.topicmaps.org

What are topic maps? Why would they be any better than what we have now, which is readily understandable by most people? --User:LMS

Good question! I'll try to figure out a convincing and simple answer. "As simple as possible, but not simpler ..."

What about Topic Maps and Wikipedia?

Topic Maps are linked with discussions about various possible Wikipedia category schemes. I think that the Wiki spirit and technical structures are basically incompatible with any definite hierarchical organization, with "top categories" and "subcategories" and so on. A Wiki Web expands like a semantic network, or like the Universe itself, with neither a center nor an upper level. So does Wikipedia, and so it will go on doing.

But if we want users to find their way in this expanding network universe, we have to provide some maps for it somewhere. What do Topic Maps bring there? They allow us to move from a plain directed graph structure (a network of pages linked by one-way hypertext links) to a labeled hypergraph, where pages (or better: "topics" whose subjects are described by pages) are nodes connected by undirected edges (each carrying a label called a "role") to and through "association" nodes.

Let's take an example. At present, if you go to the Astronomy page, you may find hyperlinks to some astronomical subjects like Planets or Sun or Solar System.

But you don't know if there is a link back from Planets to Astronomy ... And you don't exactly know why this precise word in the text is hyperlinked, whereas "Radio" or "Physics" are not ...

Thinking about it, you would say: "Well, Planets and Sun are objects of the Solar System and Astronomy is about that sort of thing. That's why there is a hyperlink there."

See there are different levels of reality in the above subjects.

Astronomy is an abstract division of human activity, called maybe "scientific discipline" or "knowledge field" ... Planets define a class of observable objects, not all of them in the Solar System ... Sun is an individual object, element of Solar System ... Solar System is a set of individual objects, part of the Galaxy ...

A part of a Topic Map index for those subjects could be, for example, as follows:

Astronomy, Planets, Sun, Solar System and Telescopes are members of an association.

The type of this association should be e.g. "Knowledge Field"

The role types for this association type should be e.g. Discipline, Individual Object, Objects Class, Objects Set, Tool Class, ...

 * Astronomy, Astrophysics ... play the role of Discipline
 * Planets, Stars, Galaxies ... play the role of Objects Class
 * Sun, Mars play the role of Individual Object
 * Solar System plays the role of Objects Set
 * Telescopes play the role of Tool Class ...

A topic map index is built in such a way that from every topic A you can browse to the other member topics of any association in which A is a member. The problem is that I don't see right now how to integrate a Topic Map index inside Wikipedia. It would need a meta-level of which Wikipedia pages would be occurrences. See e.g. a collaborative Topic Map index for the Semantic Web at http://www.universimmedia.com/semantopic.htm
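To make the browsing rule concrete, here is a minimal sketch in Python of the "Knowledge Field" association described above. It is only an illustration under stated assumptions: the names `Association`, `TopicMapIndex` and `neighbours` are invented for this example and do not come from any Topic Maps standard or tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Association:
    """An association node: a type plus its (role, topic) members."""
    assoc_type: str
    members: list = field(default_factory=list)  # list of (role, topic) pairs


class TopicMapIndex:
    """Hypothetical index: maps each topic to the associations it belongs to."""

    def __init__(self):
        self.associations = []
        self._by_topic = defaultdict(list)

    def add(self, assoc):
        self.associations.append(assoc)
        for _role, topic in assoc.members:
            self._by_topic[topic].append(assoc)

    def neighbours(self, topic):
        """Return the set of (association type, role, other topic) reachable from topic."""
        found = set()
        for assoc in self._by_topic.get(topic, []):
            for role, other in assoc.members:
                if other != topic:
                    found.add((assoc.assoc_type, role, other))
        return found


index = TopicMapIndex()
index.add(Association("Knowledge Field", [
    ("Discipline", "Astronomy"),
    ("Objects Class", "Planets"),
    ("Individual Object", "Sun"),
    ("Objects Set", "Solar System"),
    ("Tool Class", "Telescopes"),
]))

# Browsing from Planets reaches Astronomy, and we also learn in which role.
for assoc_type, role, other in sorted(index.neighbours("Planets")):
    print(f"{assoc_type}: {other} (as {role})")
```

Unlike a one-way hyperlink, the association can be entered from any of its members, and each neighbour comes with the role it plays. Where such an index would live relative to the wiki pages themselves is left open here, as it is in the discussion.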

If collaborators are interested, we could start using this semantopic database to index some Wikipedia pages relevant to Knowledge Organization, Semantic Web, XML technology ... and in this way create both an example of the proposed indexing and a synergy between those two collaborative projects.

User:BernardVatant

The Semantopic project is not active any more (since 2003) and the above link is dead (2006-06-13)

I for one would be happy to see an enhanced form of semantic markup for wikipedia (and also nupedia). Topic maps would perhaps not be my first choice for a technology but nevertheless would be a good thing.

I wonder though how this could be fitted into a wiki. The linking scheme is not terribly expressive within wiki. Do you have any ideas?

I read this through once quickly. I didn't understand on the first reading. How can you expect ordinary Wikipedians then to understand it and participate? I'm skeptical. Can you explain it more simply? --User:LMS

Bah, you're skeptical of anything new, aren't you? ;-) If you read the proposal, what he's describing is something analogous to a genealogy tree, except for subject matter.  This has been attempted in, IIRC, "The Brain".  Personally I think it would be complicated to program, processor intensive to make the mappings, and of only occasional usefulness to the casual browser.  Other than that, though, I think it's a nifty idea.  -- BryceHarrington

No, Bryce, I think you are discounting a perfectly legitimate criticism on my part. --User:LMS

Yes, the explanation isn't clear at all, but from what I can gather from it and from the referenced sites, it is merely one application of the more general issue of metadata (including typed links). That is indeed something we may want to accommodate at some point, but for now I like Wikipedia's focus on building content. For a complete online encyclopedia system to be a useful product, it will need people with several different skills: creating content, editing and selecting for quality, organizing and cross-referencing the content, creating software for users and for authors and editors, and possibly more. Wikipedia is very good at creating content, but let's face it--most of it is of poor quality; Nupedia is better at editing for quality, but doesn't encourage the kind of collaborative brainstorming for creation that Wiki does; neither tool is especially adept at generating metadata; Nupedia's organizational structure is fixed, while Wiki's is more flexible; neither has good software support for either authors or editors, but that's more a function of the present limitations of the Web. When we get to the point where we have more content than we can handle, I think it would make sense to create better tools for generating and using metadata, but I don't think we're there yet. --LDC

The difficulty with this is that it's very hard to put the metadata back in once you have taken it out. You need an awful lot of domain knowledge to determine, for instance, link types. And neither wikipedia nor nupedia is addressing the issue of defining the semantic structure, at least not in a way that is machine comprehensible.

Within wikipedia this is going to be extremely difficult to achieve I think. We could use naming conventions within the links. But this is limited as wiki does not provide an automatic way of checking that these links are sane. Nupedia has the possibility of a more complex structure though, as it is going to have an XML representation, and at least some of these things would be possible.

Still, there are certain things that I think could be done for both. Giving nupedia a database, and wikipedia a specific area, that defined people, for instance, with an unambiguous naming scheme would allow us to distinguish different people with the same name, at least in many cases. The same would of course be true for places, and also for things that have several different names (organism species, for instance). Providing even small amounts of structure such as this can help enormously in extracting and transforming data that is held within mostly free text.

Long term, I think defining a semantic network of everything and having wiki/nupedia relate to it would be an excellent thing, but at the current time I think it would actually be more effort than writing the content as free text. It's a really hard problem. However, this should not be used as an excuse for not doing those things that could be done.

It would be lovely to hear from the original poster if he has any ideas about how to integrate topic maps or other similar technologies into a wiki. PL

I agree with the above comments. Nupedia is going about the encyclopedia-building project much more carefully and we still aren't at the point, I think, where adding semantic markup is practical. But it will be easier, I think, to establish our markup conventions (which are to be an extension of the XML DTD, TEI Lite) once we have a lot more articles.

Anything similar on a wiki would be, basically, impractical. Probably it would be better simply to wait for technology that identifies conceptual relationships and content types by parsing ordinary, unmarked-up language itself! --User:LMS

Many things to say about all the above pertinent remarks. Not the time to address them fully now. Simply a few remarks:
 * 0. Topic Maps are about representing complexity. Nobody I know has ever understood what they are about without a good deal of individual effort. But it's worth it, believe me!
 * 1. Waiting for the knowledge base to reach a significant content and size before starting to think about metastructure seems to me to run the risk of *never* thinking about it, or of doing so too late.
 * 2. Extracting conceptual relationships by an automatic parsing process? Very skeptical about that. It's up to human authors and users to define what they agree are the subjects of knowledge. See the ongoing debate about that on various threads on the topic map mailing list at http://www.infoloom.com/pipermail/topicmapmail/2001q2/thread.html
 * 3. As written above, I consider wiki pages as an "occurrence level". I can't figure out how to include the index level (topic map of subjects) inside it.

Bernard

I would like to say that Bernard is wrong to be skeptical about the idea of automatic parsing of conceptual relationships. There is no need to be skeptical about it, when it is absolutely clear that this is not possible and that it's extremely unlikely to become so for many years. Trust me on this one.

I think the problem with building semantic markup into nupedia post hoc is that it requires domain knowledge to do so, at least in many cases. In other words, it's going to be difficult to introduce semantic markup into articles already written; it can only go into articles whose authors can add the necessary structure. I therefore don't find the argument that we need articles to establish conventions persuasive. The argument that writing articles is more important is one I find more persuasive.

TEI-lite is okay as a basis for defining the document structure, but many of the semantic issues are not going to be covered. As before there are some issues that could be addressed (dates, people, places) quickly. I think keeping an eye on the developments which relate to the semantic web is a must for nupedia. With wikipedia I don't think it can happen. The technology is just not expressive enough. PL

If I understand this stuff correctly (which is doubtful since I have only spent 10 minutes reading it) topic maps would add two new features to Wikipedia:


 * 1) Bi-directional links --  So for example, from a page you would be able to see a list of all pages that link to that page.  You can do an approximation to that now in a crude and unsatisfying way by clicking on the title of the page to get a search for the page title.
 * 2) Typed links -- when you add a link you also specify a type.  So instead of writing in the Hamlet page "Hamlet was written by Shakespeare" you would write something like "Hamlet was written by:Shakespeare" . Presumably then this information could be used to generate different index pages.  The "type" of a page could be inferred by the type of the links to and from the page.  For example a list of "authors" could be generated as the list of all pages that have a "written by" link pointing to them, and a list of "written works" could be generated as a list of all pages that have a "written by" link in them pointing to other pages.
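As a rough illustration of the two features just listed, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `type:Target` syntax is taken from the Hamlet sentence above, the set of known link types (`LINK_TYPES`) is hypothetical, targets are limited to a single word, and none of this is an existing wiki feature.

```python
import re
from collections import defaultdict

# Hypothetical list of recognised link types; a real scheme would need
# some agreed registry of them.
LINK_TYPES = ["written by"]

# Matches "<link type>:<Target>", e.g. "written by:Shakespeare".
TYPED_LINK = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in LINK_TYPES) + r"):(\w+)"
)


def extract_typed_links(pages):
    """pages maps title -> text. Returns a list of (source, link type, target)."""
    links = []
    for title, text in pages.items():
        for link_type, target in TYPED_LINK.findall(text):
            links.append((title, link_type, target))
    return links


def build_indexes(links, link_type):
    """Return (pages containing the link, pages the link points to)."""
    sources = sorted({s for s, t, _ in links if t == link_type})
    targets = sorted({d for _, t, d in links if t == link_type})
    return sources, targets


def backlinks(links):
    """Reverse index: target page -> set of pages linking to it."""
    incoming = defaultdict(set)
    for source, _link_type, target in links:
        incoming[target].add(source)
    return incoming


pages = {
    "Hamlet": "Hamlet was written by:Shakespeare.",
    "Macbeth": "Macbeth was written by:Shakespeare.",
    "Shakespeare": "Shakespeare was an English playwright.",
}

links = extract_typed_links(pages)
written_works, authors = build_indexes(links, "written by")
print("Written works:", written_works)
print("Authors:", authors)
print("Links to Shakespeare:", sorted(backlinks(links)["Shakespeare"]))
```

The "authors" and "written works" lists fall out of the same link table read in opposite directions, which is also all that a bi-directional link amounts to here.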

Is my understanding correct? --User:Eob

Having read up about Topic maps, my opinions are:
 * 1) Bi-directional links -- You can already see a list of all pages that link to this page by using the 'What links here' function in the toolbox. Also, there may be cases where bi-directional links are not appropriate; e.g. overview articles might be linked to from a very large number of pages, and having links back to all of them on the overview page would be too much.
 * 2) Typed links -- to avoid confusing users by changing the required content of links, the backend programming of Wikipedia could use AI to look for strings such as 'written by ' + a name.
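The string-matching idea in point 2 can be sketched very roughly in Python. This is not AI, only a regular expression, and it rests on a naive assumption flagged in the code: that a "name" is a run of capitalised words following 'written by '. The function name `infer_authors` is made up for the example.

```python
import re

# Naive assumption: a name is one or more capitalised words
# immediately after the phrase "written by ".
WRITTEN_BY = re.compile(r"written by ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)")


def infer_authors(text):
    """Return every capitalised phrase that follows 'written by ' in text."""
    return WRITTEN_BY.findall(text)


print(infer_authors("Hamlet was written by William Shakespeare and is a tragedy."))
print(infer_authors("The letter was written by hand."))
# A false positive: "Monday" is capitalised but is not an author.
print(infer_authors("The report must be written by Monday."))
```

The last call shows the weakness: plain pattern matching cannot tell an author from any other capitalised word, so something would still have to supply the domain knowledge discussed earlier on this page.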

The purposes of topic mapping seem to be: --User:Hkandy
 * to allow users to see information in a concept map / mind map format - which is probably a minority interest
 * to allow use of information by semantic web applications - but it may not be worth making major changes to Wikipedia in order to aid the semantic web, which is still in its infancy. A more gradual approach, for example moving from xhtml towards xml (which could then contain more metadata on topics), might be more appropriate.