Wikipedia talk:Semantic Wikipedia

Untitled
See: http://semantic-mediawiki.org

Original post in the village pump
I've been thinking about this proposal for some time, but being a proposal, it of course requires comment and refining.

I've been thinking about how to make Wikipedia better, and while reading about Tim Berners-Lee's Semantic Web, it came to me. We need a Semantic Wikipedia. Perhaps I'm using the wrong term, but I like it. :)

The problem with Wikipedia right now is no sense of context. Everything related to something has to be manually added. For example, when we say that someone was born in 1932, that doesn't really mean anything. It doesn't alter the "people born in 1932" pages, it doesn't alter 1932, and it's nothing more than text. Likewise with saying they were born in Lubbock, Texas, or that they were an architect. Furthermore, saying that an object is in Edinburgh doesn't matter much until we tell Edinburgh that.

We need to automate and integrate these issues.

The idea I had was a mix of a template and meta data. For editing, it would take the appearance of a template, but the template would not be required. For example, a "biography" template would include birthday, deathday, primary occupation, secondary occupation, birthplace, deathplace, primary residence, etc. A planetary template would be appropriately different, as would an elemental one, and some things wouldn't use such templates at all, though they could still use meta data.

A major use of this would be for synopses. For example, "Why this person is famous" in one line. This would allow for quick browsing of entries without having to load the entire entry, and it would be automated based on the entry itself, rather than a manually maintained list. So a list of people born in 1932 could also list the synopsis as to why they're famous. This would also ease the process of picking notable anniversaries.

Some of this is managed by categories, but not nearly enough, and I'm not a fan of the categorization system as it presently stands. However, that is for another discussion.

Does everyone know just what I'm trying to get across here? Meta data to help organize and express the information. For example, the header to George W. Bush could include:


 * Birthdate: July 6 1946
 * Notable for: 43rd President of the United States.
 * Secondary reason: Presided during September 11th Attacks and invasions of Afghanistan and Iraq.
 * .. and any other possible metadata that could be used.

So, when looking at a list of people born on 7/6/46, it would list him, along with the reasons he's notable. This could allow for a quick list of automatically generated synopses.
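As a rough illustration of the idea above (a sketch only: the `ArticleMeta` record, its `birthdate` and `notable_for` fields, and the `born_on` helper are hypothetical names, not anything MediaWiki provides), per-article metadata could drive such a list like this:

```python
from dataclasses import dataclass


@dataclass
class ArticleMeta:
    """Hypothetical per-article metadata record (not a MediaWiki structure)."""
    title: str
    birthdate: str    # ISO date string, e.g. "1946-07-06"
    notable_for: str  # one-line synopsis


def born_on(articles, date):
    """Return 'Title: synopsis' lines for every article whose birthdate matches."""
    return [f"{a.title}: {a.notable_for}" for a in articles if a.birthdate == date]


articles = [
    ArticleMeta("George W. Bush", "1946-07-06", "43rd President of the United States"),
    # Placeholder record, included only to show that non-matching entries are skipped.
    ArticleMeta("Example Person", "1932-01-01", "placeholder entry for illustration"),
]

for line in born_on(articles, "1946-07-06"):
    print(line)
```

The list is computed from the articles' own metadata, so nobody has to maintain it by hand.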

A secondary advantage of using a template in this fashion would be to remove categories and interwikis from the main article space, making it easier to edit and comprehend. They would still be there, but exist as meta data, not in the main text area. Footer and header tables could be maintained the same way, removing them from the concern of the main text.

I've been pondering this for a while and finally put it in writing, so sorry if it rambles a bit. It focuses on biographies, but it could be useful for many things. A city template could include its coordinates, state and country, etc, allowing for people to get a quick list of townships in Ontario, for example, without having to rely on a manually maintained list. I guess what this is all about is automating many functions that should be automated.

Any comments? --Golbez 04:25, Nov 14, 2004 (UTC)


 * This issue is handled to some degree by the existence of categories. When a cat tag is added to an article, the category is automatically updated - no need for "manually updated lists". So adding Category:1946 births to George W. Bush puts him on that page. Unfortunately, cats aren't always the most reader-friendly - they are just lists with a little header, and they are slightly out-of-the-way at the bottom of Monobook. I think you have a good suggestion with creating synopses based on these categories - maybe something along the lines of a WikiReader, but tied directly to categories. The only problem is that this would require the creation of metadata for every article - all of them. Some stub articles you couldn't even make metadata from. So while I think this is a good idea, it might not be currently feasible, without a huge project to create vast wodges of metadata. Some pages already have semi-metadata - anything that was formatted by WikiProject Elements, for instance, like Hydrogen, has an infobox. But this is the minority of pages. --Whosyourjudas (talk) 03:58, 14 Nov 2004 (UTC)


 * Categories are inadequate, though, and contain no context. They don't say who the person was, or why that item is included in that category. And categories themselves are manually updated lists. Yes, all of them, because this is a work in progress, and always will be. Just because there are already so many articles doesn't mean the process can't be refined. Elements has a good idea, but that's just a couple hundred articles - There are many, many more biographies.


 * I guess my point here is, few articles have any context beyond their borders. We need to connect them. --Golbez 04:25, Nov 14, 2004 (UTC)


 * I think it's a great idea and have had similar thoughts myself (for instance some standard format for someone's birth and death dates that can be automatically recognised, replacing Categories YYYY Births and YYYY Deaths, which do their job but are 'clumsy'). However, like the semantic web itself, I think it will probably be another 5 years before the underlying standards evolve and 'bed-down' enough for the MediaWiki software (or its successor) to incorporate them to the level discussed above. CheekyMonkey 14:57, 14 Nov 2004 (UTC)


 * Then there's no harm in starting work on such now. :) --Golbez 23:19, Nov 14, 2004 (UTC)


 * I am very much interested in the Semantic Web, and I have also been thinking about how Wikipedia can be made more "machine readable". The problem is that people are not that used to working directly with metadata, and doing so is not very wiki-friendly, at least if one goes back to the original thoughts behind wikis (granted, Wikipedia has already side-stepped a lot of the original guidelines where it made sense, and for good measure). We would have to make it as easy to edit and expand the metadata as it is to edit the article right now...I think it is a difficult problem to solve (look at how oblivious most people are to the metadata information stored in Microsoft Word documents. Most of the time they're not even aware it's there).


 * As an aside, I think it is interesting how wikis and Wikipedia have already brought one of Tim Berners-Lee's original thoughts with the World Wide Web to life; that everyone would become a publisher. The easy editing of the Web was not implemented in the ground-breaking web browsers, and was long forgotten, but wikis bring that back. Wouldn't it be great if Wikipedia could be on the forefront of yet another web revolution; the Semantic Web? -- David Remahl 23:27, 14 Nov 2004 (UTC)


 * Thanks for the comments, and yeah, it was the articles on him and the Semantic Web that started this churning in my mind. My idea was, there would be templates (like "put date of birth here") but those wouldn't be required; the metadata would be stored with the article text just like categories and interwikis are, causing them to show up in a diff. A side benefit of this would be to also add a separate template text entry section of interwikis and categories, thus perhaps enforcing format rules as to where they belong in the article text. I wonder if Tim Berners-Lee is familiar with Wikipedia? Surely he's heard of it, but how familiar is he? --Golbez 10:30, Nov 15, 2004 (UTC)


 * He certainly has heard of it, he even described Wikipedia as "The Font of All Knowledge" in a speech to the MIT Emerging Technologies Conference (penultimate paragraph). CheekyMonkey 14:05, 15 Nov 2004 (UTC)


 * I like the idea. It is essentially normalizing part of the data (in database terms). It would allow one to use that data programmatically. Morris 03:26, Nov 18, 2004 (UTC)
 * Right. It would convert the information from pure text into meta data that can be used to ease organization, searching, browsing, and integration of data between articles. --Golbez 18:39, Nov 18, 2004 (UTC)


 * Yes, I agree such a system is needed. I thought about it also, for country data, because statistics like population or GDP change quickly, so there should be some nice way to update all articles about countries with new data. Samohyl Jan 12:32, 19 Nov 2004 (UTC)


 * Take a look at Web Ontology Language. There was a long discussion without conclusion, now archived at Wikipedia_talk:Categorization/Archive_3, just after categories were introduced. See the "Describing the relations" and "Ontologies and OWL" sections for my thoughts. -- Avaragado 19:05, 25 Nov 2004 (UTC)


 * I have also thought about this and think it would be necessary. If some initiative is started I would like to contribute (both to the concept and to the coding) - so please keep me informed :). -- Chrisgraf 16:04, 5 Dec 2004 (UTC)

On Data and tables
I agree with your end, Golbez, but not with your means. Yes, it is important to enable machines to access Wikipedia data so that its legacy goes further than a website. But never to the detriment of users.

It is not easy to write Wikipedia, but as Don Norman would put it, its complexity lies in the task, not in the tool. We're trying to write the biggest book ever, after all. So I believe that automating some tasks would certainly have big trade-offs. The first rule of the Usability guide for future improvements (which I just wrote up) is: never mix human-readable with computer-readable, not if it makes things more difficult for humans.

Let's suppose there was an automatic summary tag. Writing {summary:history of haiti} or {intro:history of haiti} would display respectively the first paragraph or the first line of history of haiti.

Imagine how nice! This way you could put a small summary of haiti in many other articles, one that would be automatically updated. Imagine the ease with which editing one article would in fact change a lot of others. And then you would just need to put many summaries together and you would have a ready-made history of the Caribbean, history of Latin America, history of the XX century and so on. Wouldn't it be great?

In fact no. An article on the history of a region is way more than the sum of its parts. A good summary of an article is not the same as a summary that should fit in other articles, like days. They may be good starting points, and many good articles have started from copy-pasting parts from others - and then editing them together. The end result would be articles that are harder to read and harder to edit.

Keep Wikipedia on a human scale, do not artificially inflate it. It has been working.


 * I don't think this is quite what I was proposing. I wasn't saying that entire articles should be cobbled together by machine from other articles; my first priority was dealing with things like lists. Who was born in a certain year, and why they're on Wikipedia. That could be done automatically, rather than having robots or people trying to handle it. Likewise, a list could be quickly generated of all landmarks in a certain US state. This is not meant to be a replacement for actual articles. --Golbez 22:18, Dec 12, 2004 (UTC)

But there is something to be said about semantics. Computers are able to read facts, not understand articles. Semantics are not in the body of text, but in those tables at the right.

In fact those tables themselves present problems. Each user creates them differently: some use HTML tables (a totally unreadable code), others use wiki tables (|--- a clever code, but it gets complicated when you want to add colors or control size), some even use. The result is a table that is hard to edit, hard to personalize and mainly something that our user number one sees as plain noise.

So, I propose a different approach, the article should keep not a table, but just the info. I'll explain by example. In the Kyoto article, instead of a table there should be a tag like that:

This would act as two things. First, as a template, just as if the user was using the Template:city, by putting a tag. Second, it would act as a variable setter, creating variables called “Population” or “Map” and assigning them the values. Maybe we should not even use the name Variable, as this is programming jargon and, as I said before, we do not want it to seem that Wikipedia is for programmers. Notice that I’m not saying something like

Var: CityName.local= “Shacbiss”

But a simple syntax, in which the word between the line break and the : sign is the variable name and everything after the : and before the next line break is the value.

Thus, any occurrence in the document of something like  $Name  or  $Population  would be replaced with the corresponding value. This way, if in the aforementioned example the city template was a table, with the corresponding information in variables, not only would it be easier to give information about a given city, but it would also be easier to change the layout of the table in all of them. Of course, in the example above, one could use a general template but a more specific  or even a , in a nice wikipedian style.

So it would be better for humans AND a computer could read all this data, and use it to draw a world map, design a game, or make some statistics...
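To make the syntax concrete, here is a minimal sketch of how it could work, assuming one "Name: value" pair per line and $Name placeholders in the layout. The helper names `parse_fields` and `render` are made up for illustration, and Python's standard `string.Template` stands in for whatever the wiki software would actually do:

```python
from string import Template


def parse_fields(block):
    """Parse 'Name: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in block.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def render(layout, fields):
    """Replace $Name placeholders; names with no value are left untouched."""
    return Template(layout).safe_substitute(fields)


# Illustrative article data; the population value is a placeholder, not a real figure.
article_fields = parse_fields("Name: Kyoto\nPopulation: (value goes here)")

# A hypothetical shared city layout, editable in one place for all cities.
city_layout = "City: $Name | Population: $Population | Map: $Map"

print(render(city_layout, article_fields))
```

Because the layout lives apart from the values, changing `city_layout` once would change how every city is displayed, while each article only carries its own plain "Name: value" lines.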

Feedback please?--Alexandre Van de Sande 14:08, 12 Dec 2004 (UTC)


 * I've just skimmed this so I can't supply an in-depth reply, but yes, "automating" the header and footer tables was part of my idea. That is, they would be semantic, they would mean more than they currently do. Their information would be transmitted back to other articles as needed, and there would be a specific (but by no means forced) format for each. The city table, and the information contained within it, is one example. A biography table would be another.
 * Furthermore, I want a way to tell what an article is about. IS it about a city? IS it a biography? Right now, there is no such mechanism. --Golbez 22:18, Dec 12, 2004 (UTC)

Just to add another voice to the chorus, I'd love to see the ability to add semantic tags to wikipedia; I think they'd take off a lot more than people here are thinking. As for applications, I dream of projects like CYC being able to make use of our massive editor base. -- Rei 23:49, 12 May 2006 (UTC)

Integration with Semantic MediaWiki: when?
Over the last few days I've been reading a couple of papers about "Semantic Wikipedia" and "Semantic MediaWiki"; they are dated 2006. Any chance the proposed extensions will be integrated into Wikipedia any time soon?

I'm not claiming it's easy (of course it's not!), I'm only a bit disappointed about not finding any roadmap for this integration here, on this discussion page. Perhaps I'm missing some useful link to the roadmap?

Zacchiro 16:01, 5 June 2007 (UTC)


 * I'm also curious about the roadmap towards a semantic Wikipedia. --GrandiJoos 17:46, 17 October 2007 (UTC)


 * Same here. I am not in the know about the cutting edge of Wikipedia, but after doing some research, apart from the fact that the semantic extension is in constant development, there don't seem to be any plans to adopt it. I want it in ASAP. Chendy (talk) 09:46, 25 May 2008 (UTC)


 * I think we need to send a request out to the semantic developers for a roadmap, and ensure the issue is discussed at the summit. —Preceding unsigned comment added by 118.92.232.96 (talk) 21:31, 23 July 2009 (UTC)

Why to integrate?
The big disadvantage of the Semantic Web movement is that people don't know WHY or WHAT FOR. Sometimes they answer "to enable computers to understand, think, infer, decide". WHY? It is humans who have to understand.

Of course there are some reasonable applications of Semantic Web technology for popular use, like price comparison etc. But Wikipedia is not for such tasks.

Besides, Wiki has the mechanism of backlinks, useful for categorisation. Are you sure that it is not enough?

andrzejgo - 0:15, 21 July 2007


 * With semantic knowledge you can ask Wikipedia questions like "Give me all American Nobel Prize laureates born after 1950" or "List all German cities with summer temperature higher than 26 C", which are laborious to answer now. I think these are pretty exciting possibilities and probably we haven't thought of the most interesting applications yet. -- þħɥʂıɕıʄʈʝɘɖı 20:04, 28 August 2007 (UTC)

Developing this "essay"
I intend to develop the content. I hope my additions are to everybody's liking; if not, please discuss with me :) Chendy (talk) 09:32, 1 June 2008 (UTC)

Proposal: Wikilink Everything
Here is my humble proposal to make Wikipedia a bit more semantic: If an article mentions Japan several times, then Japan should be wikilinked each time. Right now, the policy is to only wikilink the first occurrence of Japan. I feel this selection should be done by the Wikimedia software when rendering the page. Having more wikilinks would serve the main goal of wikilinks: make the sense behind a word explicit with a reference to the concept that is meant. What do you think about this idea? Thanks for your feedback! Nicolas1981 (talk) 11:39, 30 October 2008 (UTC)
 * Note: In case it is not clear, please check "signifiant" and "signifié" in the Course in General Linguistics. Nicolas1981 (talk) 11:44, 30 October 2008 (UTC)
 * You can't accurately link every word without major work getting the right meaning of those words. Also linking everything risks drowning out the actually meaningful stuff. With all the piping involved in getting the right meaning of words like faggot and bonnet there will be a dramatic increase in the complexity of the code for editors, and this is a major barrier to recruiting new editors. So linking everything has three disadvantages:
 * A lot of editor time is needed.
 * Fewer editors will be available to do this.
 * Large amounts of information with little meaning could drown out the existing semantic wiki functionality such as the death anomalies project.  Ϣere Spiel  Chequers  13:08, 12 January 2011 (UTC)

Let's get started!
I started creating the first few "articles" of what could become a Semantic Wikipedia here. Semantic grammar is a language like any other, so this Semantic Wikipedia is meant to be an encyclopedia by itself, just like the Swahili or Northern Sami Wikipedias. What do you think about the embryo I created? Cheers Nicolas1981 (talk) 11:04, 30 April 2009 (UTC)

Silently encoding semantic wikilinks
I started this discussion on the village pump that may be of interest here... Cheers, AndrewGNF (talk) 00:10, 21 July 2009 (UTC)

Why not utilize template parameter values and cross-link table values more than today?
Why is it not possible to search and cross link the template parameter values and table content more than we do today?

Examples:
 * Searching template parameter values: I want to search for all articles that contain a certain template (e.g. Infobox City) and a certain parameter value (e.g. established_date between year X and Y). I may want to sort the search result based on a parameter (e.g. population_size). Since several templates are similar (like Infobox City Spain, Infobox Settlement, etc), I may want to be able to state a set of alternative templates, or a category of templates, or a "super class" template that other templates have "inherited" parameters from. The search language should handle boolean expressions.
 * Cross linking: E.g. I may want the population size of a city to be mentioned in the floating text and in infoboxes for the city article as well as the corresponding municipality article and country article. But I only want to enter it at one place. Or I may change the value at either of these places.
 * Compilation of template parameter values from many articles: I may want to compile a table or list of all articles that use a certain template. Each article (e.g. each city) should be a row, and each column a parameter value. The table may be sorted and filtered based on certain parameters. This table should be included in an article about the county, and should be automatically updated whenever someone changes a template parameter value in one of the city articles, for example by a robot that subscribes to all articles that use the template.
 * Linking from one table to infoboxes in several articles: I may want to enter the population size of all the cities in a country only in one table, one row per city, and then use that value in the infoboxes in each city article.
 * Double linking and WYSIWYG editing of infobox fields: It should be easy to edit a parameter value directly within the articles where it is used, for example in a specific value in an infobox or a cell in a table, without editing the template call code or table code. If the value is cross-linked and embedded/transcluded from another article, it should be possible to change its value immediately without looking up the source article. For example, you may double click or right-click on a specific parameter value in an infobox or a table cell, or a bookmarked or embedded/transcluded portion of the text. Then you should be able to edit it, or find its source article, or find out what other articles are using the value and will be affected by a change. A preview of what effects a change would have on other articles would be difficult to achieve but is interesting. It would probably require saving a preview version on the server.
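The search and compilation examples above could be prototyped outside MediaWiki along these lines. This is only a sketch under simplifying assumptions: template calls are found with a regular expression that ignores nested templates and pipes inside values, the helpers `parse_template` and `search` are invented for illustration, and the three sample articles are placeholders rather than real data:

```python
import re


def parse_template(wikitext, template_name):
    """Return the parameters of the first call to template_name, or None.

    Simplification: assumes no nested templates or pipes inside values.
    """
    pattern = r"\{\{\s*" + re.escape(template_name) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    params = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def search(articles, template_names, predicate, sort_key):
    """Collect (title, params) rows from articles using any of the templates."""
    rows = []
    for title, text in articles.items():
        for name in template_names:
            params = parse_template(text, name)
            if params is not None:
                if predicate(params):
                    rows.append((title, params))
                break  # use the first matching template only
    return sorted(rows, key=lambda row: sort_key(row[1]))


# Placeholder articles, invented for the example.
articles = {
    "Town A": "{{Infobox City\n| established_date = 1850\n| population_size = 12000\n}}",
    "Town B": "{{Infobox Settlement\n| established_date = 1700\n| population_size = 90000\n}}",
    "Town C": "{{Infobox City\n| established_date = 1900\n| population_size = 5000\n}}",
}

# "established_date between 1800 and 1950", sorted by population_size.
result = search(
    articles,
    ["Infobox City", "Infobox Settlement"],
    predicate=lambda p: 1800 <= int(p["established_date"]) <= 1950,
    sort_key=lambda p: int(p["population_size"]),
)
for title, params in result:
    print(title, params["established_date"], params["population_size"])
```

Here the boolean expression is just an ordinary Python predicate, and the list of template names plays the role of the "set of alternative templates"; a real search language and an index kept up to date on every edit would be separate design problems.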

Is there a reason why no robots and extensions for this have been developed? Why add semantic classification markup? What practical use would that give besides the above, and besides today's article categories? These suggestions would require some kind of addressing mechanism for specific parameters, table cells or portions of the text. These are some suggestions:
 * Embedding bookmarked portions of the text: One simple solution would be a template called My text and another that makes it possible for one article to embed, transclude or double-link bookmarked text from another article. What should the articlename+bookmarkname URI look like? Would this be possible without extensions?
 * Addressing parameter values: We need a URI for addressing the value of the parameter named X, used in the first article call of template Y in article Z. Any suggestions for the address syntax?
 * Addressing table cell values: It should be possible to address a specific value in a table by the article name or article section name, the table name (if one is assigned) or table number, the row header name, and the column header (or column number). Any suggestions for the address syntax?
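Purely as one possible answer to the address-syntax questions above, and not an existing MediaWiki feature, a parameter address could take the hypothetical form Article#Template[n].parameter, where n counts template calls from 1. A short sketch of parsing such an address (the format, `ParamAddress` and `parse_address` are all assumptions made for this example):

```python
import re
from typing import NamedTuple


class ParamAddress(NamedTuple):
    """Parts of a hypothetical 'Article#Template[n].parameter' address."""
    article: str
    template: str
    call_index: int  # 1 = first call of the template in the article
    parameter: str


ADDRESS_RE = re.compile(
    r"^(?P<article>[^#]+)#(?P<template>[^\[\]]+)\[(?P<index>\d+)\]\.(?P<param>.+)$"
)


def parse_address(address):
    """Split an address into its parts, or raise ValueError if it is malformed."""
    m = ADDRESS_RE.match(address)
    if m is None:
        raise ValueError(f"not a parameter address: {address!r}")
    return ParamAddress(m["article"], m["template"], int(m["index"]), m["param"])


print(parse_address("Kyoto#Infobox City[1].population_size"))
```

A table-cell address could follow the same pattern with a table name or number in place of the template and a row/column pair in place of the parameter.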

These suggestions also require some Boolean search language syntax. Any suggestions?

Mange01 (talk) 19:07, 18 December 2009 (UTC)

How to make a Semantic Wikipedia

 * 1) Register a domain name such as semantic-wikipedia.org
 * 2) Install MediaWiki with the Semantic MediaWiki plugin
 * 3) Download a data dump of Wikipedia and install to the wiki
 * 4) Make or use revision control software similar to Git or SVN to merge changes from Wikipedia each day, while still keeping the local changes where semantic annotations have been added.
 * 5) Tell people about the wiki on Semantic Web mailing lists etc.
 * 6) Publish a dump of RDF-triples etc. —Preceding unsigned comment added by 129.241.122.126 (talk) 14:42, 24 March 2010 (UTC)
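For step 6, a dump can be as simple as one N-Triples line per annotated property. The following is a minimal sketch, assuming each page's annotations are available as a plain dictionary; the `to_ntriples` helper and the example.org URIs are placeholders, not the URIs a real Semantic MediaWiki export would use:

```python
def to_ntriples(subject_uri, properties, vocab_base):
    """Serialize a page's properties as N-Triples lines with plain string literals."""
    lines = []
    for name, value in properties.items():
        # Escape backslashes first, then quotes and newlines, as N-Triples requires.
        escaped = (
            value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'<{subject_uri}> <{vocab_base}{name}> "{escaped}" .')
    return lines


# Placeholder URIs; a real wiki would publish its own subject and property URIs.
triples = to_ntriples(
    "http://example.org/wiki/Kyoto",
    {"country": "Japan"},
    "http://example.org/property/",
)
print("\n".join(triples))
```

A real export would also need typed literals (dates, numbers) and links between pages as URI objects rather than strings, which this sketch leaves out.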

Academia
We should go ahead and do this. The academic community is quite interested in this idea, if the number of related hits on Google Scholar is any indication. Wikipedia's usefulness to researchers would be greatly increased by SMW. Tisane (talk) 09:46, 9 May 2010 (UTC)
 * Agree, Sadads (talk) 11:29, 6 July 2010 (UTC)
 * The above todo-list seems to be a good start. What's the catch? Don't you think it will work? Mange01 (talk) 14:54, 7 June 2011 (UTC)

Automatic link to lead section content
There's a feature request at the village pump for an automatic link to the beginning of content in the lead, skipping article tags and top hats. It may be of interest to the semantic wikipedia. Diego (talk) 11:11, 20 March 2012 (UTC)


 * Thank you, Diego, for this link. The Wikipedia Manual of Style requires the First sentence to include the name of the topic and its concise definition, which is precisely what is needed for a glossary reference by web sites anywhere in the world wide web. The ability to link directly to that sentence is an essential first step in creating a Semantic Wikipedia. Currently no facility exists to point to that specific information. My proposal to have editors insert anchors to that first sentence has been soundly rejected by editors who fear (I think unwisely) a proliferation of such tags throughout Wikipedia, and who seem unconcerned with the potential role that Wikipedia might play in the web. Since the distinction of the "First sentence" is an internal Wikipedia concept, with no syntactic marker for software to grab (indeed that is the technical essence of the problem), I am skeptical that there is a "software solution". Nonetheless, whether the solution is in software or in policy, I would appreciate your feedback on the need for the solution. DrFree (talk) 18:17, 20 March 2012 (UTC)