User:Moudy83/conference papers

Conference presentations and papers

 * See also: Wikimania and WikiSym conference series
 * This table is sortable.

{| class="wikitable sortable"
! Authors !! Title !! Conference / published in !! Year !! Online !! Notes !! Abstract !! Keywords
|- align="left" valign="top"
| Choi, Key-Sun
| IT Ontology and Semantic Technology
| International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2007)
| 2007
|
|
| {{hidden||IT (information technology) ontology is to be used for analyzing information technology as well as for enhancing it. Semantic technology is compared with the syntactic one. Ontology serves as a backbone for the meaning-centered reconfiguration of syntactic structure, which is one aspect of semantic technology. The purposes of IT ontology can be categorized into two: to capture the right information and services for user requests, and, on the other hand, to give insights into the future of IT and its possible paths by interlinking relations on component classes and instances. Consider question-answering based on ontology to improve the performance of QA. Each question type (e.g., 5W1H) will seek its specific relation from the ontology that has already been acquired from the relevant information resources (e.g., Wikipedia or news articles). The question is whether such relations and related classes are neutral, independent of domain, or whether they are affected by each specific domain. The first step of ontology learning for a question-answering application is to find such a neutral relation discovery mechanism and to take care of the specially distorted relation-instance mapping when populating the domain resources. Then, we consider domain ontology acquisition in a top-down manner from already-made similar resources (e.g., a domain-specific thesaurus) and also in a bottom-up manner from the relevant resources. But the already-made resources should be checked against the currently available resources for their coverage. The problem is that a thesaurus is comprised of classes, not the instances of terms that appear in corpora. They have little coverage over the resources, and even the mapping between classes and instances has not been established at this stage. Clustering technology could now filter out the irrelevant mappings. Clustering features could be made more accurate by using the more semantic ones that have been accumulated during these steps. For example, a pattern-based discovery process could be evolved by putting the discovered semantic features into the patterns. Keeping ontology use for question-answering in mind, it must be asked how well the acquired ontology can represent the resources used for the acquisition process. The derived questions are summarized into two: (1) how such an ideal, complete ontology could be generated for each specification of use, and (2) how much the ontology contributes to the intended problem-solving. The ideal case is to convert all resources into their corresponding ontology. But presupposing a gap between the meaning of the resources and the acquired ontology, a set of raw chunks in the resources may still be effective for answering given questions with some help from the acquired ontology, or even without resorting to it. Definitions of classes and relations in the ontology would be manifested through a dual structure to supplement the complementary factors between the idealized, complete, noise-free ontology shape and incomplete, error-prone knowledge. As a result, we now confront two problems: how to measure the ontology's effectiveness for each situation, and how to compare the use of the ontology across applications and transform it into another shape of ontology depending on the application, which could be helped by granularity control and even extended to reconfiguration of the knowledge structure. As a result, the intended IT ontology is modularized enough to be compromised later for each purpose of use, in efficient and effective ways. Still, we have to solve definition questions and their translation into ontology forms.}}
|
|- align="left" valign="top"
| Paci, Giulio; Pedrazzi, Giorgio & Turra, Roberta
| Wikipedia based semantic metadata annotation of audio transcripts
| 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2010)
| 2010
|
|
| {{hidden||A method to automatically annotate video items with semantic metadata is presented. The method has been developed in the context of the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords, using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of the ASR for this kind of video. For this reason a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating out-of-topic/out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations and concept categories.}}
|
|- align="left" valign="top"
| Shachaf, P.; Hara, N.; Herring, S.; Callahan, E.; Solomon, P.; Stvilia, B. & Matei, S.
| Global perspective on Wikipedia research
| Proceedings of the American Society for Information Science and Technology
| 2008
|
|
| {{hidden||This panel will provide a global perspective on Wikipedia research. The literature on Wikipedia is mostly anecdotal, and most of the research has focused primarily on the English Wikipedia, examining the accuracy of entries compared to established online encyclopedias (Emigh & Herring, 2005; Giles, 2005; Rosenzweig, 2006) and analyzing the evolution of articles over time (Viégas, Wattenberg, & Dave, 2004; Viégas, Wattenberg, Kriss, & van Ham, 2007). Others have examined the quality of contributions (Stvilia et al., 2005). However, only a few studies have conducted comparative analyses across languages or analyzed Wikipedia in languages other than English (e.g., Pfeil, Zaphiris, & Ang, 2006). There is a need for an international, cross-cultural understanding of Wikipedia. In an effort to address this gap, this panel will present a range of international and cross-cultural research on Wikipedia. The presenters will contribute different perspectives on Wikipedia as an international sociocultural institution and will describe similarities and differences across various national/language versions of Wikipedia. Shachaf and Hara will present variations in norms and behaviors on talk pages in various language versions of Wikipedia. Herring and Callahan will share results from a cross-language comparison of biographical entries that exhibit variations in content between the English and Polish versions of Wikipedia and will explain how these are influenced by the culture and history of the US and Poland. Stvilia will discuss some of the commonalities and variability of quality models used by different Wikipedias, and the problems of cross-language quality measurement aggregation and reasoning. Matei will describe the social structuration and distribution of roles and efforts in wiki teaching environments. Solomon's comments, as a discussant, will focus on how these comparative insights provide evidence of the ways in which an evolving institution, such as Wikipedia, may be a force for supporting cultural identity (or not).}}
|
|- align="left" valign="top"
| Schumann, E. T.; Brunner, L.; Schulz, K. U. & Ringlstetter, C.
| A semantic interface for post secondary education programs
| Proceedings of the American Society for Information Science and Technology
| 2008
|
|
|
|
|- align="left" valign="top"
| Ueda, H. & Murakami, H.
| Suggesting Japanese subject headings using web information resources
| Proceedings of the American Society for Information Science and Technology
| 2006
|
|
| {{hidden||We propose a method that suggests BSH4 (Japan Library Association, 1999) subject headings according to user queries when pattern-matching algorithms fail to produce a hit. As user queries are diverse and unpredictable, we explore a method that makes a suggestion even when the query is a new word. We investigate the use of information obtained from Wikipedia (“Wikipedia,” n.d.), the Amazon Web Service (AWS), and Google. We implemented the method, and our system suggests ten BSH4 subject headings according to user queries.}}
|
|- align="left" valign="top"
| Gazan, R.; Shachaf, P.; Barzilai-Nahon, K.; Shankar, K. & Bardzell, S.
| Social computing as co-created experience
| Proceedings of the American Society for Information Science and Technology
| 2007
|
|
|
|
|- align="left" valign="top"
| Buzydlowski, J. W.
| Exploring co-citation chains
| Proceedings of the American Society for Information Science and Technology
| 2006
|
|
| {{hidden||The game “Six Degrees of Kevin Bacon” is played by naming an actor and then, by thinking of other actors in movies, forming a chain of connections that links the named actor with Kevin Bacon. The number of different movies used to link the actor to Bacon indicates the degree to which the two are linked. For example, using John Travolta as the named actor: he appeared in the movie Look Who's Talking with Kirstie Alley, who was in She's Having a Baby with Kevin Bacon. So, John Travolta has a Bacon number, or degree, of two, as connected via Kirstie Alley. (For a more thorough discussion, see http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon; the example is taken from http://www.geocities.com/theeac/bacon.html.) Based on the above, perhaps another title for this paper could be “Six Degrees of Sir Francis Bacon,” as it indicates the framework for this paper by relating it to the above technique but placing it in an academic domain through the use of a scholarly bibliographic database. Additionally, the bibliometric technique of author co-citation analysis (ACA) will be used to help by automating the process of finding the connections.}}
|
|- align="left" valign="top"
| Shachaf, P.; Hara, N.; Bonk, C.; Mackey, T. P.; Hemminger, B.; Stvilia, B. & Rosenbaum, H.
| Wiki a la carte: Understanding participation behaviors
| Proceedings of the American Society for Information Science and Technology
| 2007
|
|
| {{hidden||This panel focuses on trends in research on wikis. Wikis have become prevalent in our society and are used for multiple purposes, such as education, knowledge sharing, collaboration, and coordination. Like other popular social computing tools, they raise new research questions and have attracted the attention of researchers in information science. While some focus on the semantic web, the automatic processing of data accumulated by users, and tool improvements, others discuss the social implications of wikis. This panel presents five studies that address the social uses of wikis that support information sharing. In their studies, the panelists use a variety of novel applications of research methods, such as action research, online ethnography, site observation, surveys, and interviews. The panelists will present their findings: Shachaf and Hara will discuss Wikipedians' norms and behaviors; Bonk will present collaborative writing on Wikibook; Mackey will discuss authorship and collaboration in PBwiki.com; Hemminger will share results from the early use of wikis for conference communications; and Stvilia will outline the community mechanism of information quality assurance in Wikipedia.}}
|
|- align="left" valign="top"
| Shachaf, P.; Hara; Eschenfelder, K.; Goodrum, A.; Scott, L. C.; Shankar, K.; Ozakca, M. & Robbin
| Anarchists, pirates, ideologists, and disasters: New digital trends and their impacts
| Proceedings of the American Society for Information Science and Technology
| 2006
|
|
| {{hidden||This panel will address both online disasters created by anarchists and pirates and disaster relief efforts aided by information and communication technologies (ICTs). An increasing number of people use ICTs to mobilize their resources and enhance their activities. This mobilization has unpredictable consequences for society: on one hand, the use of ICT has allowed the mobilization of millions of people for disaster relief efforts and peace movements; on the other hand, it has also helped hackers and pirates carry out destructive activities. In many cases it is hard to judge the moral consequences of the use of ICT by marginalized groups. The panel will present five studies, of which three will focus on online disobedience and two will focus on ICT use for disaster. Together these presentations illustrate both positive and negative consequences of the new digital trends. Goodrum deliberates on an ethic of hacktivism in the context of online activism. Eschenfelder discusses user modification of, or resistance to, technological protection measures. Shachaf and Hara present a study of anarchists who attack information posted on Wikipedia and modify the content by deleting, renaming, reinterpreting, and recreating information according to their ideologies. Scott examines consumer media behaviors after the Hurricane Katrina and Rita disasters. Shankar and Ozakca discuss volunteer efforts in the aftermath of Hurricane Katrina.}}
|
|- align="left" valign="top"
| Ayers, P.
| Researching Wikipedia - current approaches and new directions
| Proceedings of the American Society for Information Science and Technology
| 2006
|
|
|
|
|- align="left" valign="top"
| Sundin, O. & Haider, J.
| Debating information control in web 2.0: The case of Wikipedia vs. Citizendium
| Proceedings of the American Society for Information Science and Technology
| 2007
|
|
| {{hidden||Wikipedia is continually being scrutinised for the quality of its content. The question addressed in this paper concerns which notions of information, of collaborative knowledge creation, of authority and of the role of the expert are drawn on when information control in Wikipedia is discussed. This is done by focusing on the arguments made in the debates surrounding the launch of Citizendium, a proposed new collaborative online encyclopaedia. While Wikipedia claims not to attribute special status to any of its contributors, Citizendium intends to assign a decision-making role to subject experts. The empirical material for the present study consists of two online threads available from Slashdot. One, “A Look inside Citizendium”, dates from September; the second, “Co-Founder Forks Wikipedia”, from October 2006. The textual analysis of these documents was carried out through close interpretative reading. Five themes, related to different aspects of information control, emerged: 1. information types, 2. information responsibility, 3. information perspectives, 4. information organisation, 5. information provenance & creation. Each theme contains a number of different positions. It was found that these positions do not necessarily correspond with the different sides of the argument. Instead, at times the fault lines run through the two camps.}}
|
 * -- align="left" valign=top
 * Kimmerle, Joachim; Moskaliuk, Johannes & Cress, Ulrike
 * Individual Learning and Collaborative Knowledge Building with Shared Digital Artifacts.
 * Proceedings of World Academy of Science: Engineering \& Technology
 * 2008


 * -- align="left" valign=top
 * Yang, Kai-Hsiang; Chen, Chun-Yu; Lee, Hahn-Ming & Ho, Jan-Ming
 * EFS: Expert Finding System based on Wikipedia link pattern analysis
 * IEEE International Conference on Systems, Man and Cybernetics, 2008. SMC 2008.
 * 2008
 * {{hidden||Building an expert finding system is very important for many applications, especially in the academic environment. Previous work uses e-mails or Web pages as a corpus to analyze the expertise of each expert. In this paper, we present an Expert Finding System, abbreviated as EFS, that builds experts' profiles from their journal publications. For a given proposal, the EFS first looks up the Wikipedia Web site to get relative link information, and then lists and ranks all associated experts using that information. In our experiments, we use a real-world dataset which comprises 882 people and 13,654 papers, categorized into 9 expertise domains. Our experimental results show that the EFS works well on several expertise domains such as "Artificial Intelligence" and "Image & Pattern Recognition".}}


 * -- align="left" valign=top
 * Mullins, Matt & Fizzano, Perry
 * Treelicious: A System for Semantically Navigating Tagged Web Pages
 * IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
 * 2010


 * -- align="left" valign=top
 * Achananuparp, Palakorn; Han, Hyoil; Nasraoui, Olfa & Johnson, Roberta
 * Semantically enhanced user modeling
 * Proceedings of the ACM Symposium on Applied Computing
 * 2007
 * 
 * {{hidden||Content-based implicit user modeling techniques usually employ a traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in the vector space model, a simple term vector is not a sufficient representation of the user model, as it ignores the semantic relations between terms. In this paper, we present a novel method to enhance a traditional term-based user model with WordNet-based semantic similarity techniques. To achieve this, we use word definitions and relationship hierarchies in WordNet to perform word sense disambiguation and employ domain-specific concepts as category labels for the derived user models. We tested our method on Windows to the Universe, a public educational website covering subjects in the Earth and Space Sciences, and performed an evaluation of our semantically enhanced user models against human judgment. Our approach is distinguishable from existing work because we automatically narrow down the set of domain-specific concepts from initial domain concepts obtained from Wikipedia and because we automatically create semantically enhanced user models.}}


 * -- align="left" valign=top
 * Adafre, Sisay Fissaha; Jijkoun, Valentin & De, Rijke
 * Fact discovery in Wikipedia
 * IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, November 2, 2007 - November 5, 2007 Silicon Valley, CA, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Adafre, Sisay Fissaha; Jijkoun, Valentin & Rijke, Maarten De
 * Link-based vs. content-based retrieval for question answering using Wikipedia
 * 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, September 20, 2006 - September 22, 2006 Alicante, Spain
 * 2007
 * {{hidden||We describe our participation in the WiQA 2006 pilot on question answering using Wikipedia, with a focus on comparing link-based vs. content-based retrieval. Our system currently works for Dutch and English.}}


 * -- align="left" valign=top
 * Adar, Eytan; Skinner, Michael & Weld, Daniel S.
 * Information arbitrage across multi-lingual Wikipedia
 * 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain
 * 2009
 * 


 * -- align="left" valign=top
 * Alencar, Rafael Odon De; Davis Jr., Clodoveu Augusto & Goncalves, Marcos Andre
 * Geographical classification of documents using evidence from Wikipedia
 * 6th Workshop on Geographic Information Retrieval, GIR'10, February 18, 2010 - February 19, 2010 Zurich, Switzerland
 * 2010
 * 


 * -- align="left" valign=top
 * Amaral, Carlos; Cassan, Adan; Figueira, Helena; Martins, Andre; Mendes, Afonso; Mendes, Pedro; Pinto, Claudia & Vidal, Daniel
 * Priberam's question answering system in QA@CLEF 2007
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||This paper accounts for Priberam's participation in the monolingual question answering (QA) track of CLEF 2007. In previous participations, Priberam's QA system obtained encouraging results in both monolingual and cross-language tasks. This year we endowed the system with syntactical processing, in order to capture the syntactic structure of the question. The main goal was to obtain a more tuned question categorisation and consequently a more precise answer extraction. Besides this, we provided our system with the ability to handle topic-related questions and to use encyclopaedic sources like Wikipedia. The paper provides a description of the improvements made in the system, followed by a discussion of the results obtained in the Portuguese and Spanish monolingual runs.}}


 * -- align="left" valign=top
 * Arribillaga, Esnaola
 * Active knowledge generation by university students through cooperative learning
 * 2008 ITI 6th International Conference on Information and Communications Technology, ICICT 2008, December 16, 2008 - December 18, 2008 Cairo, Egypt
 * 2008
 * 
 * {{hidden||Social and cultural transformations caused by globalisation have fostered changes in current universities, institutions which, making intensive and responsible use of technologies, have to create a continuous-improvement-based pedagogical model consisting of communities. To this end, we propose here the adoption of the so-called hacker ethic, which highlights the importance of collaborative, passionate, creative and socially valuable work. Applying this ethic to higher education, current universities may become Net-Academy-based universities. Therefore, these institutions require a new digital culture that allows the transmission of the hacker ethic's values and, in turn, a Net-Academy-based learning model that enables students to transform into knowledge generators. In this way, wiki-technology-based systems may help universities to achieve the transformation they need. We present here an experiment to check whether these kinds of resources transmit to the students the values of the hacker ethic, allowing them to become active knowledge generators. This experiment revealed the problems of such technologies: the limited scope of the community created and the not-so-active knowledge-generator role of the students. Against these shortcomings, we describe here a Wikipedia-based methodology and discuss the possibilities of this alternative to help current universities upgrade into Net-Academy-based universities.}}


 * -- align="left" valign=top
 * Ashoori, Elham & Lalmas, Mounia
 * Using topic shifts in XML retrieval at INEX 2006
 * 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany
 * 2007
 * {{hidden||This paper describes the retrieval approaches used by Queen Mary, University of London in the INEX 2006 ad hoc track. In our participation, we mainly investigate an element-specific smoothing method within the language modelling framework. We adjust the amount of smoothing required for each XML element depending on its number of topic shifts, to provide focused access to XML elements in the Wikipedia collection. We also investigate whether using non-uniform priors is beneficial for the ad hoc tasks.}}


 * -- align="left" valign=top
 * Auer, Soren; Bizer, Christian; Kobilarov, Georgi; Lehmann, Jens; Cyganiak, Richard & Ives, Zachary
 * DBpedia: A nucleus for a Web of open data
 * 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of
 * 2007
 * 
 * {{hidden||DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human and machine consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.}}


 * -- align="left" valign=top
 * Augello, Agnese; Vassallo, Giorgio; Gaglio, Salvatore & Pilato, Giovanni
 * A semantic layer on semi-structured data sources for intuitive chatbots
 * International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2009, March 16, 2009 - March 19, 2009 Fukuoka, Japan
 * 2009
 * 


 * -- align="left" valign=top
 * Ayu, Media A.; Taylor, Ken & Mantoro, Teddy
 * Active learning: Engaging students in the classroom using mobile phones
 * 2009 IEEE Symposium on Industrial Electronics and Applications, ISIEA 2009, October 4, 2009 - October 6, 2009 Kuala Lumpur, Malaysia
 * 2009
 * 
 * {{hidden||Audience Response Systems (ARS) are used to achieve active learning in lectures and large group environments by facilitating interaction between the presenter and the audience. However, their use is discouraged by the requirement for specialist infrastructure in the lecture theatre and the management of the expensive clickers they use. We improve the ARS by removing the need for specialist infrastructure, by using mobile phones instead of clickers, and by providing a web-based interface in the familiar Wikipedia style. Responders usually vote by dialing, and this has been configured to be cost-free in most cases. The desirability of this approach is shown by the use the demonstration system has had, with 21,000 voters voting 92,000 times in 14,000 surveys to date.}}


 * -- align="left" valign=top
 * Babu, T. Lenin; Ramaiah, M. Seetha; Prabhakar, T.V. & Rambabu, D.
 * ArchVoc - Towards an ontology for software architecture
 * ICSE 2007 Workshops:Second Workshop on SHAring and Reusing architectural Knowledge Architecture, Rationale, and Design Intent, SHARK-ADI'07, May 20, 2007 - May 26, 2007 Minneapolis, MN, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Baeza-Yates, Ricardo
 * Keynote talk: Mining the web 2.0 for improved image search
 * 4th International Conference on Semantic and Digital Media Technologies, SAMT 2009, December 2, 2009 - December 4, 2009 Graz, Austria
 * 2009
 * 


 * -- align="left" valign=top
 * Banerjee, Somnath
 * Boosting inductive transfer for text classification using Wikipedia
 * 6th International Conference on Machine Learning and Applications, ICMLA 2007, December 13, 2007 - December 15, 2007 Cincinnati, OH, United states
 * 2007
 * 
 * {{hidden||Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task, for instance to improve the generalization performance on a classification task using models learned on related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. We observe that when the classifiers are built using the features generated from Wikipedia, they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in recent work on a similar setting.}}


 * -- align="left" valign=top
 * Baoyao, Zhou; Ping, Luo; Yuhong, Xiong & Wei, Liu
 * Wikipedia-graph based key concept extraction towards news analysis
 * 2009 IEEE Conference on Commerce and Enterprise Computing, CEC 2009, July 20, 2009 - July 23, 2009 Vienna, Austria
 * 2009
 * 
 * {{hidden||The well-known Wikipedia can serve as a comprehensive knowledge repository to facilitate textual content analysis, due to its abundance, high quality and good structure. In this paper, we propose WikiRank - a Wikipedia-graph-based ranking model which can be used to extract key Wikipedia concepts from a document. These key concepts can be regarded as the most salient terms representing the theme of the document. Unlike other existing graph-based ranking algorithms, the concept graph used for ranking in this model is constructed by leveraging not only the co-occurrence relations within the local context of a document but also the preprocessed hyperlink structure of Wikipedia. We have applied the proposed WikiRank model with the Support Propagation ranking algorithm to analyze news articles, especially enterprise news. Promising applications include Wikipedia Concept Linking and Enterprise Concept Cloud Generation.}}


 * -- align="left" valign=top
 * Bautin, Mikhail & Skiena, Steven
 * Concordance-based entity-oriented search
 * IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, November 2, 2007 - November 5, 2007 Silicon Valley, CA, United states
 * 2007
 * 
 * {{hidden||We consider the problem of finding the relevant named entities in response to a search query over a given text corpus. Entity search can readily be used to augment conventional web search engines for a variety of applications. To assess the significance of entity search, we analyzed the AOL dataset of 36 million web search queries with respect to two different sets of entities: namely (a) 2.3 million distinct entities extracted from a news text corpus and (b) 2.9 million Wikipedia article titles. The results clearly indicate that search engines should be aware of entities: under various criteria of matching, between 18-39% of all web search queries can be recognized as specifically searching for entities, while 73-87% of all queries contain entities. Our entity search engine creates a concordance document for each entity, consisting of all the sentences in the corpus containing that entity. We then index and search these documents using open-source search software. This gives a ranked list of entities as the result of search. Visit http://www.textmap.com for a demonstration of our entity search engine over a large news corpus. We evaluate our system by comparing the results of each query to the list of entities that have the highest statistical juxtaposition scores with the queried entity. The juxtaposition score is a measure of how strongly two entities are related, in terms of a probabilistic upper bound. The results show excellent performance, particularly over well-characterized classes of entities such as people.}}


 * -- align="left" valign=top
 * Beigbeder, Michel
 * Focused retrieval with proximity scoring
 * 25th Annual ACM Symposium on Applied Computing, SAC 2010, March 22, 2010 - March 26, 2010 Sierre, Switzerland
 * 2010
 * 
 * {{hidden||We present in this paper a scoring method for information retrieval based on the proximity of the query terms in the documents. The idea of the method is first to assign to each position in the document a fuzzy proximity value depending on its closeness to the surrounding keywords. These proximity values can then be summed over any range of text - including any passage or any element - and after normalization this sum is used as the relevance score for the extent. Some experiments on the Wikipedia collection used in the INEX 2008 evaluation campaign are presented and discussed.}}


 * -- align="left" valign=top
 * Beigbeder, Michel; Imafouo, Amelie & Mercier, Annabelle
 * ENSM-SE at INEX 2009: Scoring with proximity and semantic tag information
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||We present in this paper some experiments on the Wikipedia collection used in the INEX 2009 evaluation campaign with an information retrieval method based on proximity. The idea of the method is to assign to each position in the document a fuzzy proximity value depending on its closeness to the surrounding keywords. These proximity values can then be summed over any range of text - including any passage or any element - and after normalization this sum is used as the relevance score for the extent. To take the semantic tags into account, we define a contextual operator which allows considering at query time only the occurrences of terms that appear in a given semantic context.}}


 * -- align="left" valign=top
 * Bekavac, Bozo & Tadic, Marko
 * A generic method for multiword extraction from Wikipedia
 * ITI 2008 30th International Conference on Information Technology Interfaces, June 23, 2008 - June 26, 2008 Cavtat/Dubrovnik, Croatia
 * 2008
 * 
 * {{hidden||This paper presents a generic method for multiword expression extraction from Wikipedia. The method uses the properties of this specific encyclopedic genre in its HTML format, and it relies on the intention of the authors of articles to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia, and we present the results obtained.}}


 * -- align="left" valign=top
 * Berkner, Kathrin
 * WikiPrints - Rendering enterprise wiki content for printing
 * Imaging and Printing in a Web 2.0 World; and Multimedia Content Access: Algorithms and Systems IV, January 19, 2010 - January 21, 2010 San Jose, CA, United states
 * 2010
 * 
 * {{hidden||Wikis have become a tool of choice for collaborative, informative communication. In contrast to the immense Wikipedia, which serves as a reference web site and typically covers only one topic per web page, enterprise wikis are often used as project management tools and contain several closely related pages authored by members of one project. In that scenario it is useful to print closely related content for review or teaching purposes. In this paper we propose a novel technique for rendering enterprise wiki content for printing, called WikiPrints, which creates a linearized version of wiki content formatted as a mixture of web layout and conventional document layout suitable for printing. Compared to existing print options for wiki content, WikiPrints automatically selects content from different wiki pages given user preferences and usage scenarios. Metadata such as content authors or time of content editing are considered. A preview of the linearized content is shown to the user, and an interface for making manual formatting changes is provided.}}


 * -- align="left" valign=top
 * Bøhn, Christian & Nørvåg, Kjetil
 * Extracting named entities and synonyms from Wikipedia
 * 24th IEEE International Conference on Advanced Information Networking and Applications, AINA2010, April 20, 2010 - April 23, 2010 Perth, WA, Australia
 * 2010
 * 


 * -- align="left" valign=top
 * Bischoff, Andreas
 * The Pediaphon - Speech interface to the free Wikipedia encyclopedia for mobile phones, PDA's and MP3-players
 * DEXA 2007 18th International Workshop on Database and Expert Systems Applications, September 3, 2007 - September 7, 2007 Regensburg, Germany
 * 2007
 * 
 * {{hidden||This paper presents an approach to generate audio-based learning material dynamically from Wikipedia articles for M-Learning and ubiquitous access. It introduces the so-called 'Pediaphon', a speech interface to the free Wikipedia online encyclopedia, as an example application for 'microlearning'. The effective generation and the deployment of the audio data to the user via podcast or progressive download (pseudo streaming) are covered. A convenient cell phone interface to the Wikipedia content, which is usable with every mobile phone, is introduced.}}


 * -- align="left" valign=top
 * Biuk-Aghai, Robert P.
 * Visualizing co-authorship networks in online Wikipedia
 * 2006 International Symposium on Communications and Information Technologies, ISCIT, October 18, 2006 - October 20, 2006 Bangkok, Thailand
 * 2006
 * 
 * {{hidden||The Wikipedia online user-contributed encyclopedia has rapidly become a highly popular and widely used online reference source. However, perceiving the complex relationships in the network of articles and other entities in Wikipedia is far from easy. We introduce the notion of using co-authorship of articles to determine relationships between articles, and present the WikiVis information visualization system, which visualizes this and other types of relationships in the Wikipedia database in 3D graph form. A 3D star layout and a 3D nested cone tree layout are presented for displaying relationships between entities and between categories, respectively. A novel 3D pinboard layout is presented for displaying search results.}}


 * -- align="left" valign=top
 * Biuk-Aghai, Robert P.; Tang, Libby Veng-Sam; Fong, Simon & Si, Yain-Whar
 * Wikis as digital ecosystems: An analysis based on authorship
 * 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST '09, June 1, 2009 - June 3, 2009 Istanbul, Turkey
 * 2009
 * 


 * -- align="left" valign=top
 * Bocek, Thomas; Peric, Dalibor; Hecht, Fabio; Hausheer, David & Stiller, Burkhard
 * Peer vote: A decentralized voting mechanism for P2P collaboration systems
 * 3rd International Conference on Autonomous Infrastructure, Management and Security, AIMS 2009, June 30, 2009 - July 2, 2009 Enschede, Netherlands
 * 2009
 * 
 * {{hidden||Peer-to-peer (P2P) systems achieve scalability, fault tolerance, and load balancing with a low-cost infrastructure, characteristics from which collaboration systems, such as Wikipedia, can benefit. A major challenge in P2P collaboration systems is to maintain article quality after each modification in the presence of malicious peers. A way of achieving this goal is to allow modifications to take effect only if a majority of previous editors approve the changes through voting. The absence of a central authority makes voting a challenge in P2P systems. This paper proposes the fully decentralized voting mechanism PeerVote, which enables users to vote on modifications of articles in a P2P collaboration system. Simulations and experiments show the scalability and robustness of PeerVote, even in the presence of malicious peers.}}


 * -- align="left" valign=top
 * Bohm, Christoph; Naumann, Felix; Abedjan, Ziawasch; Fenz, Dandy; Grutze, Toni; Hefenbrock, Daniel; Pohl, Matthias & Sonnabend, David
 * Profiling linked open data with ProLOD
 * 2010 IEEE 26th International Conference on Data Engineering Workshops, ICDEW 2010, March 1, 2010 - March 6, 2010 Long Beach, CA, United states
 * 2010
 * 
 * {{hidden||Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of a source for the problem at hand (e.g., some integration task), then focus on and explore the relevant subset.}}


 * -- align="left" valign=top
 * Boselli, Roberto; Cesarini, Mirko & Mezzanzanica, Mario
 * Customer knowledge and service development, the Web 2.0 role in co-production
 * Proceedings of World Academy of Science, Engineering and Technology
 * 2009
 * {{hidden||The paper is concerned with relationships between SSME and ICTs and focuses on the role of Web 2.0 tools in the service development process. The research presented aims at exploring how collaborative technologies can support and improve service processes, highlighting customer centrality and value co-production. The core idea of the paper is the centrality of user participation and the collaborative technologies as enabling factors; Wikipedia is analyzed as an example. The result of such analysis is the identification and description of a pattern characterising specific services in which users collaborate by means of web tools with value co-producers during the service process. The pattern of collaborative co-production concerning several categories of services, including knowledge-based services, is then discussed.}}


 * -- align="left" valign=top
 * Bouma, Gosse; Kloosterman, Geert; Mur, Jori; Noord, Gertjan Van; Plas, Lonneke Van Der & Tiedemann, Jorg
 * Question answering with joost at CLEF 2007
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||We describe our system for the monolingual Dutch and multilingual English to Dutch QA tasks. We describe the preprocessing of Wikipedia, the inclusion of query expansion in IR, anaphora resolution in follow-up questions, and a question classification module for the multilingual task. Our best runs achieved 25.5% accuracy for the Dutch monolingual task, and 13.5% accuracy for the multilingual task.}}


 * -- align="left" valign=top
 * Brandes, Ulrik & Lerner, Jurgen
 * Visual analysis of controversy in user-generated encyclopedias
 * Houndmills, Basingstoke, Hants., RG21 6XS, United Kingdom
 * 2008
 * 
 * {{hidden||Wikipedia is a large and rapidly growing Web-based collaborative authoring environment, where anyone on the Internet can create, modify, and delete pages about encyclopedic topics. A remarkable property of some Wikipedia pages is that they are written by up to thousands of authors who may have contradicting opinions. In this paper, we show that a visual analysis of the who-revises-whom network gives deep insight into controversies. We propose a set of analysis and visualization techniques that reveal the dominant authors of a page, the roles they play, and the alters they confront. Thereby we provide tools to understand how Wikipedia authors collaborate in the presence of controversy.}}


 * -- align="left" valign=top
 * Bryant, Susan L.; Forte, Andrea & Bruckman, Amy
 * Becoming Wikipedian: Transformation of participation in a collaborative online encyclopedia
 * 2005 International ACM SIGGROUP Conference on Supporting Group Work, GROUP'05, November 6, 2005 - November 9, 2005 Sanibel Island, FL, United states
 * 2005
 * 


 * -- align="left" valign=top
 * Butler, Brian; Joyce, Elisabeth & Pike, Jacqueline
 * Don't look now, but we've created a bureaucracy: The nature and roles of policies and rules in Wikipedia
 * 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, April 5, 2008 - April 10, 2008 Florence, Italy
 * 2008
 * 


 * -- align="left" valign=top
 * Buzzi, Marina & Leporini, Barbara
 * Is Wikipedia usable for the blind?
 * W4A'08: 2008 International Cross-Disciplinary Conference on Web Accessibility, W4A, Apr 21 - 22 2008 Beijing, China
 * 2008
 * 


 * -- align="left" valign=top
 * Buzzi, M.Claudia; Buzzi, Marina; Leporini, Barbara & Senette, Caterina
 * Making Wikipedia editing easier for the blind
 * NordiCHI 2008: Building Bridges - 5th Nordic Conference on Human-Computer Interaction, October 20, 2008 - October 22, 2008 Lund, Sweden
 * 2008
 * 
 * {{hidden||A key feature of Web 2.0 is the possibility of sharing, creating and editing on-line content. This approach is increasingly used in learning environments to favor interaction and cooperation among students. These functions should be accessible as well as easy to use for all participants. Unfortunately, accessibility and usability issues still exist for Web 2.0-based applications. For instance, Wikipedia presents many difficulties for the blind. In this paper we discuss a possible solution for simplifying the Wikipedia editing page when interacting via screen reader. Building an editing interface that conforms to W3C ARIA (Accessible Rich Internet Applications) recommendations would overcome accessibility and usability problems that prevent blind users from actively contributing to Wikipedia.}}


 * -- align="left" valign=top
 * Byna, Surendra; Meng, Jiayuan; Raghunathan, Anand; Chakradhar, Srimat & Cadambi, Srihari
 * Best-effort semantic document search on GPUs
 * 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU-3, Held in cooperation with ACM ASPLOS XV, March 14, 2010 - March 14, 2010 Pittsburg, PA, United states
 * 2010
 * 
 * {{hidden||Semantic indexing is a popular technique used to access and organize large amounts of unstructured text data. We describe an optimized implementation of semantic indexing and document search on manycore GPU platforms. We observed that a parallel implementation of semantic indexing on a 128-core Tesla C870 GPU is only 2.4X faster than a sequential implementation on an Intel Xeon 2.4 GHz processor. We ascribe the less than spectacular speedup to a mismatch between the workload characteristics of semantic indexing and the unique architectural features of GPUs. Compared to the regular numerical computations that have been ported to GPUs with great success, our semantic indexing algorithm (the recently proposed Supervised Semantic Indexing algorithm, called SSI) has interesting characteristics - the amount of parallelism in each training instance is data-dependent, and each iteration involves the product of a dense matrix with a sparse vector, resulting in random memory access patterns. As a result, we observed that the baseline GPU implementation significantly under-utilizes the hardware resources (processing elements and memory bandwidth) of the GPU platform. However, the SSI algorithm also demonstrates unique characteristics, which we collectively refer to as the "forgiving nature" of the algorithm. These unique characteristics allow for novel optimizations that do not strive to preserve numerical equivalence of each training iteration with the sequential implementation. In particular, we consider best-effort computing techniques such as dependency relaxation and computation dropping to suitably alter the workload characteristics of SSI to leverage the unique architectural features of the GPU. We also show that the realization of dependency relaxation and computation dropping concepts on a GPU is quite different from how one would implement these concepts on a multicore CPU, largely due to the distinct architectural features supported by a GPU. Our new techniques dramatically enhance the amount of parallel workload, leading to much higher performance on the GPU. By optimizing data transfers between CPU and GPU and by reducing GPU kernel invocation overheads, we achieve further performance gains. We evaluated our new GPU-accelerated implementation of semantic document search on a database of over 1.8 million documents from Wikipedia. By applying our novel performance-enhancing strategies, our GPU implementation on a 128-core Tesla C870 achieved a 5.5X acceleration as compared to a baseline parallel implementation on the same GPU. Compared to a baseline parallel TBB implementation on a dual-socket quad-core Intel Xeon multicore CPU (8 cores), the enhanced GPU implementation is 11X faster. Compared to a parallel implementation on the same multi-core CPU that also uses data dependency relaxation and computation dropping techniques, our enhanced GPU implementation is 5X faster.}}


 * -- align="left" valign=top
 * Cabral, Luis Miguel; Costa, Luis Fernando & Santos, Diana
 * What Happened to Esfinge in 2007?
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||Esfinge is a general-domain Portuguese question answering system which uses the information available on the Web as an additional resource when searching for answers. Other external resources and tools used are a broad-coverage parser, a morphological analyser, a named entity recognizer and a Web-based database of word co-occurrences. In this fourth participation in CLEF, in addition to the new challenges posed by the organization (topics and anaphors in questions and the use of Wikipedia to search for and support answers), we experimented with a multiple question and multiple answer approach in QA.}}


 * -- align="left" valign=top
 * Calefato, Caterina; Vernero, Fabiana & Montanari, Roberto
 * Wikipedia as an example of positive technology: How to promote knowledge sharing and collaboration with a persuasive tutorial
 * 2009 2nd Conference on Human System Interactions, HSI '09, May 21, 2009 - May 23, 2009 Catania, Italy
 * 2009
 * 


 * -- align="left" valign=top
 * Chahine, C.Abi.; Chaignaud, N.; Kotowicz, J.P. & Pecuchet, J.P.
 * Context and keyword extraction in plain text using a graph representation
 * 4th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2008, November 30, 2008 - December 3, 2008 Bali, Indonesia
 * 2008
 * 


 * -- align="left" valign=top
 * Chandramouli, K.; Kliegr, T.; Nemrava, J.; Svatek, V. & Izquierdo, E.
 * Query refinement and user relevance feedback for contextualized image retrieval
 * 5th International Conference on Visual Information Engineering, VIE 2008, July 29, 2008 - August 1, 2008 Xi'an, China
 * 2008
 * 
 * {{hidden||The motivation of this paper is to enhance the user-perceived precision of results of content-based information retrieval (CBIR) systems with query refinement (QR), visual analysis (VA) and relevance feedback (RF) algorithms. The proposed algorithms were implemented as modules in the K-Space CBIR system. The QR module discovers hypernyms for the given query from a free-text corpus (such as Wikipedia) and uses these hypernyms as refinements for the original query. Extracting hypernyms from Wikipedia makes it possible to apply query refinement to more queries than in related approaches that use a static predefined thesaurus such as WordNet. The VA module uses the K-Means algorithm for clustering the images based on low-level MPEG-7 visual features. The RF module uses the preference information expressed by the user to build user profiles by applying SOM-based supervised classification, which is further optimized by a hybrid Particle Swarm Optimization (PSO) algorithm. The experiments evaluating the performance of the QR and VA modules show promising results.}}


 * -- align="left" valign=top
 * Chandramouli, K.; Kliegr, T.; Svatek, V. & Izquierdo, E.
 * Towards semantic tagging in collaborative environments
 * DSP 2009:16th International Conference on Digital Signal Processing, July 5, 2009 - July 7, 2009 Santorini, Greece
 * 2009
 * 
 * {{hidden||Tags are an efficient and effective way of organizing resources, but they are not always available. A technique called SCM/THD, investigated in this paper, extracts entities from free-text annotations and, using the Lin similarity measure over the WordNet thesaurus, classifies them into a controlled vocabulary of tags. Hypernyms extracted from Wikipedia are used to map uncommon entities to WordNet synsets. In collaborative environments, users can assign multiple annotations to the same object, hence increasing the amount of information available. Assuming that the semantics of the annotations overlap, this redundancy can be exploited to generate higher-quality tags. A preliminary experiment presented in the paper evaluates the consistency and quality of tags generated from multiple annotations of the same image. The results obtained on an experimental dataset comprising 62 annotations from four annotators show that the accuracy of a simple majority vote surpasses the average accuracy obtained through assessing the annotations individually by 18%. A moderate-strength correlation has been found between the quality of generated tags and the consistency of annotations.}}


 * -- align="left" valign=top
 * Chatterjee, Madhumita; Sivakumar, G. & Menezes, Bernard
 * Dynamic policy based model for trust based access control in P2P applications
 * 2009 IEEE International Conference on Communications, ICC 2009, June 14, 2009 - June 18, 2009 Dresden, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Chen, Jian; Shtykh, Roman Y. & Jin, Qun
 * A web recommender system based on dynamic sampling of user information access behaviors
 * IEEE 9th International Conference on Computer and Information Technology, CIT 2009, October 11, 2009 - October 14, 2009 Xiamen, China
 * 2009
 * 


 * -- align="left" valign=top
 * Chen, Qing; Shipper, Timothy & Khan, Latifur
 * Tweets mining using Wikipedia and impurity cluster measurement
 * 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, ISI 2010, May 23, 2010 - May 26, 2010 Vancouver, BC, Canada
 * 2010
 * 


 * -- align="left" valign=top
 * Chen, Scott Deeann; Monga, Vishal & Moulin, Pierre
 * Meta-classifiers for multimodal document classification
 * 2009 IEEE International Workshop on Multimedia Signal Processing, MMSP '09, October 5, 2009 - October 7, 2009 Rio De Janeiro, Brazil
 * 2009
 * 


 * -- align="left" valign=top
 * Chevalier, Fanny; Huot, Stephane & Fekete, Jean-Daniel
 * WikipediaViz: Conveying article quality for casual wikipedia readers
 * IEEE Pacific Visualization Symposium 2010, PacificVis 2010, March 2, 2010 - March 5, 2010 Taipei, Taiwan
 * 2010
 * 
 * {{hidden||As Wikipedia has become one of the most used knowledge bases worldwide, the problem of the trustworthiness of the information it disseminates becomes central. With WikipediaViz, we introduce five visual indicators integrated into the Wikipedia layout that can keep casual Wikipedia readers aware of important meta-information about the articles they read. The design of WikipediaViz was inspired by two participatory design sessions with expert Wikipedia writers and sociologists who explained the clues they used to quickly assess the trustworthiness of articles. Based on these results, we propose five metrics for maturity and quality assessment of Wikipedia articles, and their accompanying visualizations, to provide readers with important clues about the editing process at a glance. We also report and discuss the results of the user studies we conducted. Two preliminary pilot studies show that all our subjects trust Wikipedia articles almost blindly. With the third study, we show that WikipediaViz significantly reduces the time required to assess the quality of articles while maintaining good accuracy.}}


 * -- align="left" valign=top
 * Chidlovskii, Boris
 * Multi-label wikipedia classification with textual and link features
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||We address the problem of categorizing a large set of linked documents with important content and structure aspects, in particular from the Wikipedia collection proposed at the INEX 2009 XML Mining challenge. We analyze the network of collection pages and turn it into valuable features for classification. We combine the content-based and link-based features of pages to train an accurate categorizer for unlabelled pages. In the multi-label setting, we revise a number of existing techniques and test some which show good scalability. We report evaluation results obtained with a variety of learning methods and techniques on the training set of the Wikipedia corpus. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini & Eichmann, David
 * Detecting wikipedia vandalism with active learning and statistical language models
 * 4th Workshop on Information Credibility on the Web, WICOW'10, April 26, 2010 - April 30, 2010 Raleigh, NC, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Choubassi, Maha El; Nestares, Oscar; Wu, Yi; Kozintsev, Igor & Haussecker, Horst
 * An augmented reality tourist guide on your mobile devices
 * 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China
 * 2009
 * 
 * {{hidden||We present an augmented reality tourist guide on mobile devices. Many of the latest mobile devices contain cameras and location, orientation and motion sensors. We demonstrate how these devices can be used to bring tourism information to users in a much more immersive manner than traditional text or maps. Our system uses a combination of camera, location and orientation sensors to augment the live camera view on a device with the available information about the objects in the view. The augmenting information is obtained by matching a camera image to images in a database on a server that have geotags in the vicinity of the user location. We use a subset of geotagged English Wikipedia pages as the main source of images and augmenting text information. At the time of publication our database contained 50K pages with more than 150K images linked to them. A combination of motion estimation algorithms and orientation sensors is used to track objects of interest in the live camera view and place augmented information on top of them. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Ciglan, Marek; Rivierez, Etienne & Nørvåg, Kjetil
 * Learning to find interesting connections in Wikipedia
 * 12th International Asia Pacific Web Conference, APWeb 2010, April 6, 2010 - April 8, 2010 Busan, Republic of Korea
 * 2010
 * 


 * -- align="left" valign=top
 * Conde, Tiago; Marcelino, Luis & Fonseca, Benjamim
 * Implementing a system for collaborative search of local services
 * 14th International Workshop of Groupware, CRIWG 2008, September 14, 2008 - September 18, 2008 Omaha, NE, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Congle, Zhang & Dikan, Xing
 * Knowledge-supervised learning by co-clustering based approach
 * 7th International Conference on Machine Learning and Applications, ICMLA 2008, December 11, 2008 - December 13, 2008 San Diego, CA, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Cotta, Carlos
 * Keeping the ball rolling: Teaching strategies using Wikipedia: An argument in favor of its use in computer science courses
 * 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain
 * 2010


 * -- align="left" valign=top
 * Craswell, Nick; Demartini, Gianluca; Gaugaz, Julien & Iofciu, Tereza
 * L3S at INEX 2008: Retrieving entities using structured information
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Crouch, Carolyn J.; Crouch, Donald B.; Bapat, Salil; Mehta, Sarika & Paranjape, Darshan
 * Finding good elements for focused retrieval
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper describes the integration of our methodology for the dynamic retrieval of XML elements [2] with traditional article retrieval to facilitate the Focused and the Relevant-in-Context tasks of the INEX 2008 Ad Hoc Track. The particular problems that arise for dynamic element retrieval in working with text containing both tagged and untagged elements have been solved [3]. The current challenge involves utilizing its ability to produce a rank-ordered list of elements in the context of focused retrieval. Our system is based on the Vector Space Model [8]; basic functions are performed using the Smart experimental retrieval system [7]. Experimental results are reported for the Focused, Relevant-in-Context, and Best-in-Context tasks of both the 2007 and 2008 INEX Ad Hoc Tracks. These results indicate that the goal of our 2008 investigations, namely finding good focused elements in the context of the Wikipedia collection, has been achieved. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Crouch, Carolyn J.; Crouch, Donald B.; Bhirud, Dinesh; Poluri, Pavan; Polumetla, Chaitanya & Sudhakar, Varun
 * A methodology for producing improved focused elements
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||This paper reports the results of our experiments to consistently produce highly ranked focused elements in response to the Focused Task of the INEX Ad Hoc Track. The results of these experiments, performed using the 2008 INEX collection, confirm that our current methodology (described herein) produces such elements for this collection. Our goal for 2009 is to apply this methodology to the new, extended 2009 INEX collection to determine its viability in this environment. (These experiments are currently underway.) Our system uses our method for dynamic element retrieval [4], working with the semi-structured text of Wikipedia [5], to produce a rank-ordered list of elements in the context of focused retrieval. It is based on the Vector Space Model [15]; basic functions are performed using the Smart experimental retrieval system [14]. Experimental results are reported for the Focused Task of both the 2008 and 2009 INEX Ad Hoc Tracks. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Crouch, Carolyn J.; Crouch, Donald B.; Kamat, Nachiket; Malik, Vikram & Mone, Aditya
 * Dynamic element retrieval in the wikipedia collection
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||This paper describes the successful adaptation of our methodology for the dynamic retrieval of XML elements to a semi-structured environment. Working with text that contains both tagged and untagged elements presents particular challenges in this context. Our system is based on the Vector Space Model; basic functions are performed using the Smart experimental retrieval system. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (i.e., the paragraph). It returns a rank-ordered list of elements identical to that produced by the same query against an all-element index of the collection. Experimental results are reported for both the 2006 and 2007 Ad Hoc tasks. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Cui, Gaoying; Lu, Qin; Li, Wenjie & Chen, Yirong
 * Automatic acquisition of attributes for ontology construction
 * 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009, March 26, 2009 - March 27, 2009 Hong kong
 * 2009
 * 


 * -- align="left" valign=top
 * Curino, Carlo A.; Moon, Hyun J.; Tanca, Letizia & Zaniolo, Carlo
 * Schema evolution in wikipedia - Toward a web Information system benchmark
 * ICEIS 2008 - 10th International Conference on Enterprise Information Systems, June 12, 2008 - June 16, 2008 Barcelona, Spain
 * 2008
 * {{hidden||Evolving the database that is at the core of an information system represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source software MediaWiki. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for developing better methods and tools to support graceful schema evolution. Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.}}


 * -- align="left" valign=top
 * Dalip, Daniel Hasan; Goncalves, Marcos Andre; Cristo, Marco & Calado, Pavel
 * Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia
 * 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09, June 15, 2009 - June 19, 2009 Austin, TX, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Darwish, Kareem
 * CMIC@INEX 2008: Link-the-wiki track
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well for this task, achieving the highest precision at low recall levels. Further post-hoc experiments showed that constructing queries using titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using many phrases that were not central to articles as anchors. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Das, Sanmay & Magdon-Ismail, Malik
 * Collective wisdom: Information growth in wikis and blogs
 * 11th ACM Conference on Electronic Commerce, EC'10, June 7, 2010 - June 11, 2010 Cambridge, MA, United states
 * 2010
 * 
 * {{hidden||Wikis and blogs have become enormously successful media for collaborative information creation. Articles and posts accrue information through the asynchronous editing of users who arrive both seeking information and possibly able to contribute information. Most articles stabilize to high-quality, trusted sources of information representing the collective wisdom of all the users who edited the article. We propose a model for information growth which relies on two main observations: (i) as an article's quality improves, it attracts visitors at a faster rate (a rich-get-richer phenomenon); and, simultaneously, (ii) the chances that a new visitor will improve the article drop (there is only so much that can be said about a particular topic). Our model is able to reproduce many features of the edit dynamics observed on Wikipedia and on blogs collected from LiveJournal; in particular, it captures the observed rise in the edit rate, followed by 1/t decay.}}


 * -- align="left" valign=top
 * Demartini, Gianluca; Firan, Claudiu S. & Iofciu, Tereza
 * L3S at INEX 2007: Query expansion for entity ranking using a highly accurate ontology
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||Entity ranking on Web-scale datasets is still an open challenge. Several resources, for example Wikipedia-based ontologies, can be used to improve the quality of the entity ranking produced by a system. In this paper we focus on the Wikipedia corpus and propose algorithms for finding entities based on query relaxation using category information. The main contribution is a methodology for expanding the user query by exploiting the semantic structure of the dataset. Our approach focuses on constructing queries using not only keywords from the topic, but also information about relevant categories. This is done by leveraging a highly accurate ontology which is matched to the character strings of the topic. The evaluation is performed using the INEX 2007 Wikipedia collection and entity ranking topics. The results show that our approach performs effectively, especially for early precision metrics. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Demartini, Gianluca; Firan, Claudiu S.; Iofciu, Tereza; Krestel, Ralf & Nejdl, Wolfgang
 * A model for Ranking entities and its application to Wikipedia
 * Latin American Web Conference, LA-WEB 2008, October 28, 2008 - October 30, 2008 Vila Velha, Espirito Santo, Brazil
 * 2008
 * 
 * {{hidden||Entity Ranking (ER) is a recently emerging search task in Information Retrieval, where the goal is not finding documents matching the query words, but instead finding entities which match the types and attributes mentioned in the query. In this paper we propose a formal model to define entities as well as a complete ER system, providing examples of its application to enterprise, Web, and Wikipedia scenarios. Since searching for entities on Web-scale repositories is an open challenge, as the effectiveness of ranking is usually not satisfactory, we present a set of algorithms based on our model and evaluate their retrieval effectiveness. The results show that combining simple Link Analysis, Natural Language Processing, and Named Entity Recognition methods improves the retrieval performance of entity search by over 53\% for P@10 and 35\% for MAP.}}


 * -- align="left" valign=top
 * Demartini, Gianluca; Iofciu, Tereza & Vries, Arjen P. De
 * Overview of the INEX 2009 entity ranking track
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||In some situations search engine users would prefer to retrieve entities instead of just documents. Example queries include "Italian Nobel prize winners", "Formula 1 drivers that won the Monaco Grand Prix", or "German-speaking Swiss cantons". The XML Entity Ranking (XER) track at INEX creates a discussion forum aimed at standardizing evaluation procedures for entity retrieval. This paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as the underlying collection, and summarizes the approaches adopted by the participants. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Demidova, Elena; Oelze, Irina & Fankhauser, Peter
 * Do we mean the same? Disambiguation of extracted keyword queries for database search
 * 1st International Workshop on Keyword Search on Structured Data, KEYS '09, June 28, 2009 - June 28, 2009 Providence, RI, United states
 * 2009
 * 
 * {{hidden||Users often try to accumulate information on a topic of interest from multiple information sources. In this case a user's informational need might be expressed in terms of an available relevant document, e.g. a web page or an e-mail attachment, rather than a query. Database search engines are mostly adapted to queries manually created by users. In case a user's informational need is expressed in terms of a document, we need algorithms that map keyword queries automatically extracted from this document to the database content. In this paper we analyze the impact of selected document and database statistics on the effectiveness of keyword disambiguation for manually created as well as automatically extracted keyword queries. Our evaluation is performed using a set of user queries from the AOL query log and a set of queries automatically extracted from Wikipedia articles, both executed against the Internet Movie Database (IMDB). Our experimental results show that (1) knowledge of the document context is crucial in order to extract meaningful keyword queries; (2) statistics which enable effective disambiguation of user queries are not sufficient to achieve the same quality for the automatically extracted requests.}}


 * -- align="left" valign=top
 * Denoyer, Ludovic & Gallinari, Patrick
 * Overview of the INEX 2008 XML mining track categorization and clustering of XML documents in a graph of documents
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first, identifying key problems for mining semi-structured documents and new challenges of this emerging field; and second, studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain, i.e. classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Denoyer, Ludovic & Gallinari, Patrick
 * Machine learning for semi-structured multimedia documents: Application to pornographic filtering and thematic categorization
 * Machine Learning Techniques for Multimedia - Case Studies on Organization and Retrieval Tiergartenstrasse 17, Heidelberg, D-69121, Germany
 * 2008
 * {{hidden||We propose a generative statistical model for the classification of semi-structured multimedia documents. Its main originality is its ability to simultaneously take into account the structural and the content information present in a semi-structured document, and also to cope with different types of content (text, image, etc.). We then present the results obtained on two sets of experiments: one concerns the filtering of pornographic Web pages; the second concerns the thematic classification of Wikipedia documents. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Deshpande, Smita & Betke, Margrit
 * RefLink: An interface that enables people with motion impairments to analyze web content and dynamically link to references
 * 9th International Workshop on Pattern Recognition in Information Systems - PRIS 2009 In Conjunction with ICEIS 2009, May 6, 2009 - May 7, 2009 Milan, Italy
 * 2009
 * {{hidden||In this paper, we present RefLink, an interface that allows users to analyze the content of a web page by dynamically linking to an online encyclopedia such as Wikipedia. Upon opening a web page, RefLink instantly provides a list of terms extracted from the page and annotates each term with the number of its occurrences in the page. RefLink uses a text-to-speech interface to read out the list of terms. The user can select a term of interest and follow its link to the encyclopedia. RefLink thus helps users to perform an informed and efficient contextual analysis. Initial user testing suggests that RefLink is a valuable web browsing tool, in particular for people with motion impairments, because it greatly simplifies the process of obtaining reference material and performing contextual analysis.}}


 * -- align="left" valign=top
 * Dopichaj, Philipp; Skusa, Andre & He, Andreas
 * Stealing anchors to link the wiki
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper describes the Link-the-Wiki submission of Lycos Europe. We try to learn suitable anchor texts by looking at the anchor texts the Wikipedia authors used. Disambiguation is done by using textual similarity and also by checking whether a set of link targets "makes sense" together. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Doyle, Richard & Devon, Richard
 * Teaching process for technological literacy: The case of nanotechnology and global open source pedagogy
 * 2010 ASEE Annual Conference and Exposition, June 20, 2010 - June 23, 2010 Louisville, KY, United states
 * 2010
 * {{hidden||In this paper we propose approaching the concern addressed by the technology literacy movement by using process design rather than product design. Rather than requiring people to know an impossible amount about technology, we suggest that we can teach a process for understanding and making decisions about any technology. This process can be applied to new problems and new contexts that emerge from the continuous innovation and transformation of technology markets. Such a process offers a strategy for planning for and abiding the uncertainty intrinsic to the development of modern science and technology. We teach students from diverse backgrounds in an NSF-funded course on the social, human, and ethical (SHE) impacts of nanotechnology. The process we describe is global open source collective intelligence (GOSSIP). This paper traces out some of the principles of GOSSIP through the example of a course taught to a mixture of engineers and students from the Arts and the Humanities. Open source is obviously a powerful method: witness the development of Linux, and GNU before that, and the extraordinary success of Wikipedia. Democratic, and hence diverse, information flows have been suggested as vital to sustaining a healthy company. American Society for Engineering Education, 2010.}}


 * -- align="left" valign=top
 * Dupen, Barry
 * Using internet sources to solve materials homework assignments
 * 2008 ASEE Annual Conference and Exposition, June 22, 2008 - June 24, 2008 Pittsburg, PA, United states
 * 2008


 * -- align="left" valign=top
 * Edwards, Lilian
 * Content filtering and the new censorship
 * 4th International Conference on Digital Society, ICDS 2010, Includes CYBERLAWS 2010: 1st International Conference on Technical and Legal Aspects of the e-Society, February 10, 2010 - February 16, 2010 St. Maarten, Netherlands
 * 2010
 * 
 * {{hidden||Since the famous Time magazine cover of 1995, nation states have been struggling to control access to adult and illegal material on the Internet. In recent years, strategies for such control have shifted from the use of traditional policing - largely ineffective in a transnational medium - to the use of take-down and especially filtering applied by ISPs enrolled as "privatized censors" by the state. The role of the IWF in the UK has become a pivotal case study of how state and private interests have interacted to produce effective but non-transparent and non-accountable censorship even in a Western democracy. The IWF's role has recently been significantly questioned after a stand-off with Wikipedia in December 2008. This paper will set the IWF's recent acts in the context of a massive increase in global filtering of Internet content and suggest the creation of a Speech Impact Assessment process which might inhibit the growth of unchecked censorship.}}


 * -- align="left" valign=top
 * Elmqvis, Niklas; Do, Thanh-Nghi; Goodell, Howard; Henry, Nathalie & Fekete, Jean-Daniel
 * ZAME: Interactive large-scale graph visualization
 * 2008 Pacific Visualization Symposium, PacificVis 2008, March 4, 2008 - March 7, 2008 Kyoto, Japan
 * 2008
 * 
 * {{hidden||We present the Zoomable Adjacency Matrix Explorer (ZAME), a visualization tool for exploring graphs at a scale of millions of nodes and edges. ZAME is based on an adjacency matrix graph representation aggregated at multiple scales. It allows analysts to explore a graph at many levels, zooming and panning with interactive performance from an overview to the most detailed views. Several components work together in the ZAME tool to make this possible. Efficient matrix ordering algorithms group related elements. Individual data cases are aggregated into higher-order meta-representations. Aggregates are arranged into a pyramid hierarchy that allows for on-demand paging to GPU shader programs to support smooth multiscale browsing. Using ZAME, we are able to explore the entire French Wikipedia - over 500,000 articles and 6,000,000 links - with interactive performance on standard consumer-level computer hardware.}}


 * -- align="left" valign=top
 * Fachry, Khairun Nisa; Kamps, Jaap; Koolen, Marijn & Zhang, Junte
 * Using and detecting links in Wikipedia
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||In this paper, we document our efforts at INEX 2007, where we participated in the Ad Hoc Track, the Link the Wiki Track, and the Interactive Track that continued from INEX 2006. Our main aims at INEX 2007 were the following. For the Ad Hoc Track, we investigated the effectiveness of incorporating link evidence into the model, and of a CAS filtering method exploiting the structural hints in the INEX topics. For the Link the Wiki Track, we investigated the relative effectiveness of link detection based on retrieving similar documents with the Vector Space Model, and then filtering with the names of Wikipedia articles to establish a link. For the Interactive Track, we took part in the interactive experiment comparing an element retrieval system with a passage retrieval system. The main results are the following. For the Ad Hoc Track, we see that link priors improve most of our runs for the Relevant in Context and Best in Context Tasks, and that CAS pool filtering is effective for the Relevant in Context and Best in Context Tasks. For the Link the Wiki Track, the results show that detecting links with name matching works relatively well, though links were generally under-generated, which hurt the performance. For the Interactive Track, our test persons showed a weak preference for the element retrieval system over the passage retrieval system. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Fadaei, Hakimeh & Shamsfard, Mehrnoush
 * Extracting conceptual relations from Persian resources
 * 7th International Conference on Information Technology - New Generations, ITNG 2010, April 12, 2010 - April 14, 2010 Las Vegas, NV, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Fernandez-Garcia, Norberto; Blazquez-Del-Toro, Jose M.; Fisteus, Jesus Arias & Sanchez-Fernandez, Luis
 * A semantic web portal for semantic annotation and search
 * 10th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2006, October 9, 2006 - October 11, 2006 Bournemouth, United kingdom
 * 2006
 * {{hidden||The semantic annotation of the contents of Web resources is a required step in order to allow the Semantic Web vision to become a reality. In this paper we describe an approach to manual semantic annotation which tries to integrate both the semantic annotation task and the information retrieval task. Our approach exploits the information provided by Wikipedia pages and takes the form of a semantic Web portal, which allows a community of users to easily define and share annotations on Web resources. Springer-Verlag Berlin Heidelberg 2006.}}


 * -- align="left" valign=top
 * Ferrandez, Sergio; Toral, Antonio; Ferrandez, Oscar; Ferrandez, Antonio & Munoz, Rafael
 * Applying Wikipedia's multilingual knowledge to cross-lingual question answering
 * 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France
 * 2007
 * {{hidden||The application of the multilingual knowledge encoded in Wikipedia to an open-domain Cross-Lingual Question Answering system based on the Inter Lingual Index (ILI) module of EuroWordNet is proposed and evaluated. This strategy overcomes the problems due to ILI's low coverage on proper nouns (Named Entities). Moreover, as these are open-class words (highly changing), using a community-based up-to-date resource avoids the tedious maintenance of hand-coded bilingual dictionaries. A study reveals the importance of translating Named Entities in CL-QA and the advantages of relying on Wikipedia over ILI for doing this. Tests on questions from the Cross-Language Evaluation Forum (CLEF) justify our approach (20% of these are correctly answered thanks to Wikipedia's Multilingual Knowledge). Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Fier, Darja & Sagot, Benoit
 * Combining multiple resources to build reliable wordnets
 * 11th International Conference on Text, Speech and Dialogue, TSD 2008, September 8, 2008 - September 12, 2008 Brno, Czech republic
 * 2008
 * 
 * {{hidden||This paper compares automatically generated sets of synonyms in French and Slovene wordnets with respect to the resources used in the construction process. Polysemous words were disambiguated via a five-language word-alignment of the SEERA.NET parallel corpus, a subcorpus of the JRC Acquis. The extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. On the other hand, a bilingual approach sufficed to acquire equivalents for monosemous words. Bilingual lexicons were extracted from different resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Figueroa, Alejandro
 * Surface language models for discovering temporally anchored definitions on the web: Producing chronologies as answers to definition questions
 * 6th International Conference on Web Information Systems and Technologies, WEBIST 2010, April 7, 2010 - April 10, 2010 Valencia, Spain
 * 2010


 * -- align="left" valign=top
 * Figueroa, Alejandro
 * Are wikipedia resources useful for discovering answers to list questions within web snippets?
 * 4th International Conference on Web Information Systems and Technologies, WEBIST 2008, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal
 * 2009
 * 
 * {{hidden||This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of web sites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Figueroa, Alejandro
 * Mining Wikipedia for discovering multilingual definitions on the web
 * 4th International Conference on Semantics, Knowledge, and Grid, SKG 2008, December 3, 2008 - December 5, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||Ml-DfWebQA is a multilingual definition question answering system (QAS) that extracts answers to definition queries from the short descriptions of web sites returned by search engines, called web snippets. These answers are discriminated on the ground of lexico-syntactic regularities mined from multilingual resources supplied by Wikipedia. Results support that these regularities serve to significantly strengthen the answering process. In addition, Ml-DfWebQA increases the robustness of multilingual definition QASs by making use of aliases found in Wikipedia.}}


 * -- align="left" valign=top
 * Figueroa, Alejandro
 * Mining Wikipedia resources for discovering answers to list questions in web snippets
 * 4th International Conference on Semantics, Knowledge, and Grid, SKG 2008, December 3, 2008 - December 5, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of web sites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process.}}


 * -- align="left" valign=top
 * Figueroa, Alejandro & Atkinson, John
 * Using dependency paths for answering definition questions on the web
 * 5th International Conference on Web Information Systems and Technologies, WEBIST 2009, March 23, 2009 - March 26, 2009 Lisbon, Portugal
 * 2009


 * -- align="left" valign=top
 * Finin, Tim & Syed, Zareen
 * Creating and exploiting a Web of semantic data
 * 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010, January 22, 2010 - January 24, 2010 Valencia, Spain
 * 2010
 * {{hidden||Twenty years ago, Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer programs to share and understand structured and semi-structured information easily. We will review the evolution of the idea and the technologies to realize a Web of Data, and describe how we are exploiting them to enhance information retrieval and information extraction. A key resource in our work is Wikitology, a hybrid knowledge base of structured and unstructured information extracted from Wikipedia.}}


 * -- align="left" valign=top
 * Fogarolli, Angela
 * Word sense disambiguation based on Wikipedia link structure
 * ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Fogarolli, Angela & Ronchetti, Marco
 * Domain independent semantic representation of multimedia presentations
 * International Conference on Intelligent Networking and Collaborative Systems, INCoS 2009, November 4, 2009 - November 6, 2009 Barcelona, Spain
 * 2009
 * 
 * {{hidden||This paper describes a domain-independent approach for semantically annotating and representing multimedia presentations. It uses a combination of techniques to automatically discover the content of the media and, through supervised or unsupervised methods, can generate an RDF description of it. The domain independence is achieved by using Wikipedia as a source of knowledge instead of domain ontologies. The described approach can be relevant for understanding multimedia content, which can be used in information retrieval, categorization and summarization.}}


 * -- align="left" valign=top
 * Fogarolli, Angela & Ronchetti, Marco
 * Discovering semantics in multimedia content using Wikipedia
 * 11th International Conference on Business Information Systems, BIS 2008, May 5, 2008 - May 7, 2008 Innsbruck, Austria
 * 2008
 * 


 * -- align="left" valign=top
 * Fu, Linyun; Wang, Haofen; Zhu, Haiping; Zhang, Huajie; Wang, Yang & Yu, Yong
 * Making more Wikipedians: Facilitating semantics reuse for Wikipedia authoring
 * 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of
 * 2007
 * 
 * {{hidden||Wikipedia, a killer application of Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It can also serve as an ideal Semantic Web data source due to its abundance, influence, high quality and good structure. However, the heavy burden of building and maintaining such an enormous and ever-growing online encyclopedic knowledge base still rests on a very small group of people. Many casual users may still find it difficult to write high-quality Wikipedia articles. In this paper, we use RDF graphs to model the key elements in Wikipedia authoring, and propose an integrated solution based on RDF graph matching to make Wikipedia authoring easier, with the aim of making more Wikipedians. Our solution facilitates semantics reuse and provides users with: 1) a link suggestion module that suggests and auto-completes internal links between Wikipedia articles for the user; 2) a category suggestion module that helps the user place her articles in the correct categories. A prototype system is implemented, and experimental results show significant improvements over existing solutions to the link and category suggestion tasks. The proposed enhancements can be applied to attract more contributors and relieve the burden of professional editors, thus enhancing the current Wikipedia to make it an even better Semantic Web data source. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Fukuhara, Tomohiro; Arai, Yoshiaki; Masuda, Hidetaka; Kimura, Akifumi; Yoshinaka, Takayuki; Utsuro, Takehito & Nakagawa, Hiroshi
 * KANSHIN: A cross-lingual concern analysis system using multilingual blog articles
 * 2008 1st International Workshop on Information-Explosion and Next Generation Search, INGS 2008, April 26, 2008 - April 27, 2008 Shenyang, China
 * 2008
 * 
 * {{hidden||An architecture for cross-lingual concern analysis (CLCA) using multilingual blog articles, and its prototype system, are described. As people living in various countries use the Web, cross-lingual information retrieval (CLIR) plays an important role in next-generation search. In this paper, we propose CLCA as a CLIR application that helps users find the concerns of people across languages. We propose a layered architecture for CLCA and its prototype system, called KANSHIN. The system collects Japanese, Chinese, Korean, and English blog articles and analyzes concerns across languages. Users can find concerns from several viewpoints, such as temporal, geographical, and a network of blog sites. The system also helps users browse multilingual keywords using Wikipedia and identify spam blogs. An overview of the CLCA architecture and the system is given.}}


 * -- align="left" valign=top
 * Gang, Wang; Huajie, Zhang; Haofen, Wang & Yong, Yu
 * Enhancing relation extraction by eliciting selectional constraint features from Wikipedia
 * 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France
 * 2007
 * {{hidden||Selectional constraints are usually checked for detecting semantic relations. Previous work usually defined the constraints manually based on a handcrafted concept taxonomy, which is time-consuming and impractical for large-scale relation extraction. Further, the determination of entity type (e.g. NER) based on the taxonomy cannot achieve sufficiently high accuracy. In this paper, we propose a novel approach to extracting relation instances using features elicited from Wikipedia, a free online encyclopedia. The features are represented as selectional constraints and further employed to enhance the extraction of relations. We conduct case studies on the validation of the extracted instances for two common relations, hasArtist(album, artist) and hasDirector(film, director). Substantially high extraction precision (around 0.95) and validation accuracy (near 0.90) are obtained. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Garza, Sara E.; Brena, Ramon F. & Ramirez, Eduardo
 * Topic calculation and clustering: an application to Wikipedia
 * 7th Mexican International Conference on Artificial Intelligence, MICAI 2008, October 27, 2008 - October 31, 2008 Atizapan de Zaragoza, Mexico
 * 2008
 * 
 * {{hidden||Wikipedia is nowadays one of the most valuable information resources; nevertheless, its current structure, which has no formal organization, does not always allow useful browsing among topics. Moreover, even though most Wikipedia pages include a "See Also" section for navigating to those articles' related Wikipedia pages, the only references included there are those the authors are aware of, leading to incompleteness and other irregularities. In this work a method for finding related Wikipedia articles is proposed; this method relies on a framework that clusters documents into semantically calculated topics and selects the closest documents, which could enrich the "See Also" section.}}


 * -- align="left" valign=top
 * Gaugaz, Julien; Zakrzewski, Jakub; Demartini, Gianluca & Nejdl, Wolfgang
 * How to trace and revise identities
 * 6th European Semantic Web Conference, ESWC 2009, May 31, 2009 - June 4, 2009 Heraklion, Crete, Greece
 * 2009
 * 
 * {{hidden||The Entity Name System (ENS) is a service aiming to provide globally unique URIs for all kinds of real-world entities, such as persons, locations and products, based on descriptions of such entities. Because the entity descriptions available to the ENS for deciding on entity identity (do two entity descriptions refer to the same real-world entity?) change over time, the system has to revise its past decisions: one entity has been given two different URIs, or two entities have been attributed the same URI. The question we have to investigate in this context is then: how do we propagate entity decision revisions to the clients which make use of the URIs provided by the ENS? In this paper we propose a solution which relies on labelling the IDs with additional history information. These labels allow clients to locally detect deprecated URIs they are using and also merge IDs referring to the same real-world entity without needing to consult the ENS. Making update requests to the ENS only for the IDs detected as deprecated considerably reduces the number of update requests, at the cost of a decrease in uniqueness quality. We investigate how much the number of update requests decreases using ID history labelling, as well as how this impacts the uniqueness of the IDs on the client. For the experiments we use both artificially generated entity revision histories and a real case study based on the revision history of the Dutch and Simple English Wikipedia. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Gehringer, Edward
 * Assessing students' wiki contributions
 * 2008 ASEE Annual Conference and Exposition, June 22, 2008 - June 24, 2008 Pittsburg, PA, United states
 * 2008


 * -- align="left" valign=top
 * Geva, Shlomo
 * GPX: Ad-Hoc queries and automated link discovery in the Wikipedia
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||The INEX 2007 evaluation was based on the Wikipedia collection. In this paper we describe some modifications to the GPX search engine and the approach taken in the Ad-hoc and Link-the-Wiki tracks. In earlier versions of GPX, scores were recursively propagated from text-containing nodes, through ancestors, all the way to the document root of the XML tree. In this paper we describe a simplification whereby the score of each node is computed directly, doing away with the score propagation mechanism. Results indicate slightly improved performance. The GPX search engine was used in the Link-the-Wiki track to identify prospective incoming links to new Wikipedia pages. We also describe a simple and efficient approach to the identification of prospective outgoing links in new Wikipedia pages. We present and discuss evaluation results. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Geva, Shlomo; Kamps, Jaap; Lethonen, Miro; Schenkel, Ralf; Thom, James A. & Trotman, Andrew
 * Overview of the INEX 2009 Ad hoc track
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||This paper gives an overview of the INEX 2009 Ad Hoc Track. The main goals of the Ad Hoc Track were threefold. The first goal was to investigate the impact of the collection scale and markup, by using a new collection that is again based on the Wikipedia but is over 4 times larger, with longer articles and additional semantic annotations. For this reason the Ad Hoc Track tasks stayed unchanged, and the Thorough Task of INEX 2002-2006 returned. The second goal was to study the impact of more verbose queries on retrieval effectiveness, by using the available markup as structural constraints (now using both the Wikipedia's layout-based markup and the enriched semantic markup) and by the use of phrases. The third goal was to compare different result granularities by allowing systems to retrieve XML elements, ranges of XML elements, or arbitrary passages of text. This investigates the value of the internal document structure (as provided by the XML markup) for retrieving relevant information. The INEX 2009 Ad Hoc Track featured four tasks: for the Thorough Task, a ranked list of results (elements or passages) by estimated relevance was needed; for the Focused Task, a ranked list of non-overlapping results (elements or passages) was needed; for the Relevant in Context Task, non-overlapping results (elements or passages) were returned grouped by the article from which they came; and for the Best in Context Task, a single starting point (element start tag or passage start) for each article was needed. We discuss the setup of the track and the results for the four tasks. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Ghinea, Gheorghita; Bygstad, Bendik & Schmitz, Christoph
 * Multi-dimensional moderation in online communities: Experiences with three Norwegian sites
 * 3rd International Conference on Online Communities and Social Computing, OCSC 2009. Held as Part of HCI International 2009, July 19, 2009 - July 24, 2009 San Diego, CA, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Giampiccolo, Danilo; Forner, Pamela; Herrera, Jesus; Penas, Anselmo; Ayache, Christelle; Forascu, Corina; Jijkoun, Valentin; Osenova, Petya; Rocha, Paulo; Sacaleanu, Bogdan & Sutcliffe, Richard
 * Overview of the CLEF 2007 multilingual question answering track
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||The fifth QA campaign at CLEF [1], which had its first edition in 2003, offered not only a main task but also an Answer Validation Exercise (AVE) [2], which continued last year's pilot, and a new pilot: Question Answering on Speech Transcripts (QAST) [3, 15]. The main task was characterized by a focus on cross-linguality, while covering as many European languages as possible. As a novelty, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions from a cluster may contain co-references between one of them and the others. Finally, the need to search for answers in web formats was met by introducing Wikipedia as a document corpus. The results and the analyses reported by the participants suggest that the introduction of Wikipedia and the topic-related questions led to a drop in systems' performance. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Giuliano, Claudio; Gliozzo, Alfio Massimiliano; Gangemi, Aldo & Tymoshenko, Kateryna
 * Acquiring thesauri from wikis by exploiting domain models and lexical substitution
 * 7th Extended Semantic Web Conference, ESWC 2010, May 30, 2010 - June 3, 2010 Heraklion, Crete, Greece
 * 2010
 * 


 * -- align="left" valign=top
 * Gonzalez-Cristobal, Jose-Carlos; Goni-Menoyo, Jose Miguel; Villena-Roman, Julio & Lana-Serrano, Sara
 * MIRACLE progress in monolingual information retrieval at Ad-Hoc CLEF 2007
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||This paper presents the 2007 MIRACLE team's approach to the Ad-Hoc Information Retrieval track. The main work carried out for this campaign centered on monolingual experiments, in the standard and in the robust tracks. The most important contributions were the general introduction of automatic named-entity extraction and the use of Wikipedia resources. For the 2007 campaign, runs were submitted for the following languages and tracks: a) monolingual: Bulgarian, Hungarian, and Czech; b) robust monolingual: French, English and Portuguese. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Grac, Marek
 * Trdlo, an open source tool for building transducing dictionary
 * 12th International Conference on Text, Speech and Dialogue, TSD 2009, September 13, 2009 - September 17, 2009 Pilsen, Czech republic
 * 2009
 * 
 * {{hidden||This paper describes the development of an open-source tool named Trdlo. Trdlo was developed as part of our effort to build a machine translation system between very close languages. These languages usually do not have available pre-processed linguistic resources or dictionaries suitable for computer processing. Bilingual dictionaries have a big impact on translation quality. The methods proposed in this paper attempt to extend existing dictionaries with inferable translation pairs. Our approach requires only 'cheap' resources: a list of lemmata for each language and rules for inferring words from one language to another. It is also possible to use other resources like annotated corpora or Wikipedia. Results show that this approach greatly improves the effectiveness of building a Czech-Slovak dictionary. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Granitzer, Michael; Seifert, Christin & Zechner, Mario
 * Context based wikipedia linking
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Guo, Hongzhi; Chen, Qingcai; Cui, Lei & Wang, Xiaolong
 * An interactive semantic knowledge base unifying wikipedia and HowNet
 * 7th International Conference on Information, Communications and Signal Processing, ICICS 2009, December 8, 2009 - December 10, 2009 Macau Fisherman's Wharf, China
 * 2009
 * 
 * {{hidden||We present an interactive, exoteric semantic knowledge base, which integrates HowNet and the online encyclopedia Wikipedia. The semantic knowledge base mainly builds on items, categories, attributes and the relations between them. In the construction process, a mapping relationship is established from HowNet and Wikipedia to the new knowledge base. Different from other online encyclopedias or knowledge dictionaries, the categories in the semantic knowledge base are semantically tagged, and this can be well used in semantic analysis and semantic computing. Currently the knowledge base built in this paper contains more than 200,000 items and 1,000 categories, and these are still increasing every day.}}


 * -- align="left" valign=top
 * Gupta, Anand; Goyal, Akhil; Bindal, Aman & Gupta, Ankuj
 * Meliorated approach for extracting Bilingual terminology from wikipedia
 * 11th International Conference on Computer and Information Technology, ICCIT 2008, December 25, 2008 - December 27, 2008 Khulna, Bangladesh
 * 2008
 * 


 * -- align="left" valign=top
 * Hartrumpf, Sven; Glockner, Ingo & Leveling, Johannes
 * Coreference resolution for questions and answer merging by validation
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||For its fourth participation at QA@CLEF, the German question answering (QA) system InSicht was improved for CLEF 2007 in the following main areas: questions containing pronominal or nominal anaphors are treated by a coreference resolver; the shallow QA methods are improved; and a specialized module is added for answer merging. Results showed a performance drop compared to last year mainly due to problems in handling the newly added Wikipedia corpus. However, dialog treatment by coreference resolution delivered very accurate results so that follow-up questions can be handled similarly to isolated questions. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Haruechaiyasak, Choochart & Damrongrat, Chaianun
 * Article recommendation based on a topic model for Wikipedia Selection for Schools
 * 11th International Conference on Asian Digital Libraries, ICADL 2008, December 2, 2008 - December 5, 2008 Bali, Indonesia
 * 2008
 * 


 * -- align="left" valign=top
 * Hatcher-Gallop, Rolanda; Fazal, Zohra & Oluseyi, Maya
 * Quest for excellence in a wiki-based world
 * 2009 IEEE International Professional Communication Conference, IPCC 2009, July 19, 2009 - July 22, 2009 Waikiki, HI, United states
 * 2009
 * 


 * -- align="left" valign=top
 * He, Jiyin
 * Link detection with wikipedia
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper describes our participation in the INEX 2008 Link the Wiki track. We focused on the file-to-file task and submitted three runs, which were designed to compare the impact of different features on link generation. For outgoing links, we introduce the anchor likelihood ratio as an indicator for anchor detection, and explore two types of evidence for target identification, namely, the title field evidence and the topic article content evidence. We find that the anchor likelihood ratio is a useful indicator for anchor detection, and that in addition to the title field evidence, re-ranking with the topic article content evidence is effective for improving target identification. For incoming links, we use an exact match approach and a retrieval method based on language modeling, and find that the exact match approach works best. On top of that, our experiments show that the semantic relatedness between Wikipedia articles also has a certain ability to indicate links. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * He, Jiyin & Rijke, Maarten De
 * An exploration of learning to link with wikipedia: Features, methods and training collection
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||We describe our participation in the Link-the-Wiki track at INEX 2009. We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of the following aspects of our approaches: features, learning methods, and the collection used for training the models. We find that a learning-to-rank-based approach and a binary classification approach do not differ a lot. The new Wikipedia collection, which is larger and has more links than the previously used collection, provides better training material for learning our models. In addition, a heuristic run which combines the two intuitively most useful features outperforms the machine learning based runs, which suggests that a further analysis and selection of features is necessary. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * He, Jiyin; Zhang, Xu; Weerkamp, Wouter & Larson, Martha
 * Metadata and multilinguality in video classification
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||The VideoCLEF 2008 Vid2RSS task involves the assignment of thematic category labels to dual language (Dutch/English) television episode videos. The University of Amsterdam chose to focus on exploiting archival metadata and speech transcripts generated by both Dutch and English speech recognizers. A Support Vector Machine (SVM) classifier was trained on training data collected from Wikipedia. The results provide evidence that combining archival metadata with speech transcripts can improve classification performance, but that adding speech transcripts in an additional language does not yield performance gains. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * He, Miao; Cutler, Michal & Wu, Kelvin
 * Categorizing queries by topic directory
 * 9th International Conference on Web-Age Information Management, WAIM 2008, July 20, 2008 - July 22, 2008 Zhangjiajie, China
 * 2008
 * 
 * {{hidden||The categorization of a web user query by topic or category can be used to select useful web sources that contain the required information. In pursuit of this goal, we explore methods for mapping user queries to category hierarchies under which deep web resources are also assumed to be classified. Our sources for these category hierarchies, or directories, are Yahoo! Directory and Wikipedia. Forwarding an unrefined query (in our case a typical fact-finding query sent to a question answering system) directly to these directory resources usually returns no directories or incorrect ones. Instead, we develop techniques to generate more specific directory finding queries from an unrefined query and use these to retrieve better directories. Despite these engineered queries, our two resources often return multiple directories that include many incorrect results, i.e., directories whose categories are not related to the query, and thus web resources for these categories are unlikely to contain the required information. We develop methods for selecting the most useful ones. We consider a directory to be useful if web sources for any of its narrow categories are likely to contain the searched for information. We evaluate our mapping system on a set of 250 TREC questions and obtain precision and recall in the 0.8 to 1.0 range.}}


 * -- align="left" valign=top
 * Hecht, Brent & Gergle, Darren
 * The tower of Babel meets web 2.0: User-generated content and its applications in a multilingual context
 * 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010, April 10, 2010 - April 15, 2010 Atlanta, GA, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Hecht, Brent & Moxley, Emily
 * Terabytes of tobler: Evaluating the first law in a massive, domain-neutral representation of world knowledge
 * 9th International Conference on Spatial Information Theory, COSIT 2009, September 21, 2009 - September 25, 2009 Aber Wrac'h, France
 * 2009
 * 


 * -- align="left" valign=top
 * Heiskanen, Tero; Kokkonen, Juhana; Hintikka, Kari A.; Kola, Petri; Hintsa, Timo & Nakki, Pirjo
 * Tutkimusparvi the open research swarm in Finland
 * 12th International MindTrek Conference: Entertainment and Media in the Ubiquitous Era, MindTrek'08, October 7, 2008 - October 9, 2008 Tampere, Finland
 * 2008
 * 


 * -- align="left" valign=top
 * Hoffman, Joel
 * Employee knowledge: Instantly searchable
 * Digital Energy Conference and Exhibition 2009, April 7, 2009 - April 8, 2009 Houston, TX, United states
 * 2009
 * {{hidden||The online encyclopedia, Wikipedia, has proven the value of the world community contributing to an instantly searchable world knowledge base. The same technology can be applied to the company community: each individual sharing strategic tips directly related to company interests that are then instantly searchable. Each employee can share, using Microsoft Sharepoint Wiki Pages, those unique hints, tips, tricks, and knowledge that they feel could be of the highest value to other employees: how-to's and shortcuts in company software packages, learnings from pilot projects (successful or not), links to fantastic resources, etc. This growing knowledge base then becomes an instantly searchable, global resource for the entire company. Occidental of Elk Hills, Inc. recently, on October 15, 2008, started a rollout of Wiki page use at its Elk Hills, CA, USA properties. There are over 300 employees at Elk Hills, and its Wiki Home Page received over 1500 hits in its first day, with multiple employees contributing multiple articles. Employees are already talking about time-savers they have learned and applied. A second presentation was demanded by those that missed the first. The rollout has generated a buzz of excitement and interest that we will be encouraging into the indefinite future. The significance of a corporate knowledge base can be major: high-tech professionals not spending hours figuring out how to do what someone else has already figured out and documented, support personnel not having to answer the same questions over and over again but having only to point those asking to steps already documented, employees learning time-saving tips that they may never have learned or thought of, professionals no longer wasting time searching for results of other trials or having to reinvent the wheel. Time is money. Knowledge is power. Applying Wiki technology to corporate knowledge returns time and knowledge to the workforce, leading to bottom line benefits and powerful corporate growth. 2009, Society of Petroleum Engineers.}}


 * -- align="left" valign=top
 * Hong, Richang; Tang, Jinhui; Zha, Zheng-Jun; Luo, Zhiping & Chua, Tat-Seng
 * Mediapedia: Mining web knowledge to construct multimedia encyclopedia
 * 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China
 * 2009
 * 
 * {{hidden||In recent years, we have witnessed the blooming of Web 2.0 content such as Wikipedia, Flickr and YouTube, etc. How might we benefit from such rich media resources available on the internet? This paper presents a novel concept called Mediapedia, a dynamic multimedia encyclopedia that takes advantage of, and in fact is built from, the text and image resources on the Web. Mediapedia distinguishes itself from the traditional encyclopedia in four main ways. (1) It tries to present users with multimedia contents (e.g., text, image, video), which we believe are more intuitive and informative to users. (2) It is fully automated because it downloads the media contents as well as the corresponding textual descriptions from the Web and assembles them for presentation. (3) It is dynamic as it will use the latest multimedia content to compose the answer. This is not true for the traditional encyclopedia. (4) The design of Mediapedia is flexible and extensible such that we can easily incorporate new kinds of media such as video and new languages into the framework. The effectiveness of Mediapedia is demonstrated and two potential applications are described in this paper. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Hori, Kentaro; Oishi, Tetsuya; Mine, Tsunenori; Hasegawa, Ryuzo; Fujita, Hiroshi & Koshimura, Miyuki
 * Related word extraction from wikipedia for web retrieval assistance
 * 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010, January 22, 2010 - January 24, 2010 Valencia, Spain
 * 2010
 * {{hidden||This paper proposes a web retrieval system with extended queries generated from the contents of Wikipedia. By using the extended queries, we aim to assist users in retrieving Web pages and acquiring knowledge. To extract extended query items, we make much of hyperlinks in Wikipedia in addition to the related word extraction algorithm. We evaluated the system through experimental use by several examinees and questionnaires given to them. Experimental results show that our system works well for users' retrieval and knowledge acquisition.}}


 * -- align="left" valign=top
 * Huang, Darren Wei Che; Xu, Yue; Trotman, Andrew & Geva, Shlomo
 * Overview of INEX 2007 link the Wiki track
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||Wikipedia is becoming ever more popular. Linking between documents is typically provided in similar environments in order to achieve collaborative knowledge sharing. However, this functionality in Wikipedia is not integrated into the document creation process, and the quality of automatically generated links has never been quantified. The Link the Wiki (LTW) track at INEX 2007 aimed at producing a standard procedure, metrics and a discussion forum for the evaluation of link discovery. The tasks offered by the LTW track as well as its evaluation present considerable research challenges. This paper briefly describes the LTW task and the evaluation procedure used at the LTW track in 2007. Automated link discovery methods used by participants are outlined. An overview of the evaluation results is concisely presented and further experiments are reported. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Huang, Jin-Xia; Ryu, Pum-Mo & Choi, Key-Sun
 * An empirical research on extracting relations from Wikipedia text
 * 9th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2008, November 2, 2008 - November 5, 2008 Daejeon, Korea, Republic of
 * 2008
 * 
 * {{hidden||A feature-based relation classification approach is presented, in which probabilistic and semantic relatedness features between patterns and relation types are employed together with other linguistic information. The importance of each feature set is evaluated with a Chi-square estimator, and the experiments show that the relatedness features have a big impact on relation classification performance. A series of experiments is also performed to evaluate different machine learning approaches to relation classification, among which the Bayesian approach outperformed other approaches including Support Vector Machines (SVM). 2008 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Huynh, Dat T.; Cao, Tru H.; Pham, Phuong H.T. & Hoang, Toan N.
 * Using hyperlink texts to improve quality of identifying document topics based on Wikipedia
 * 1st International Conference on Knowledge and Systems Engineering, KSE 2009, October 13, 2009 - October 17, 2009 Hanoi, Viet nam
 * 2009
 * 


 * -- align="left" valign=top
 * Iftene, Adrian; Pistol, Ionut & Trandabat, Diana
 * Grammar-based automatic extraction of definitions
 * 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2008, September 26, 2008 - September 29, 2008 Timisoara, Romania
 * 2008
 * 


 * -- align="left" valign=top
 * Powell IV, Adam C. & Morris, Arthur E.
 * Wikipedia in materials education
 * 136th TMS Annual Meeting, 2007, February 25, 2007 - March 1, 2007 Orlando, FL, United States
 * 2007


 * -- align="left" valign=top
 * Jack, Hugh
 * Using a wiki for professional communication and collaboration
 * 2009 ASEE Annual Conference and Exposition, June 14, 2009 - June 17, 2009 Austin, TX, United States
 * 2009


 * -- align="left" valign=top
 * Jämsen, Janne; Näppilä, Turkka & Arvola, Paavo
 * Entity ranking based on category expansion
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||This paper introduces category and link expansion strategies for the XML Entity Ranking track at INEX 2007. Category expansion is a coefficient propagation method for the Wikipedia category hierarchy based on given categories or categories derived from sample entities. Link expansion utilizes links between Wikipedia articles. The strategies are evaluated within the entity ranking and list completion tasks. © 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Janik, Maciej & Kochut, Krys J.
 * Wikipedia in action: Ontological knowledge in text categorization
 * 2nd Annual IEEE International Conference on Semantic Computing, ICSC 2008, August 4, 2008 - August 7, 2008 Santa Clara, CA, United States
 * 2008
 * 
 * {{hidden||We present a new, ontology-based approach to automatic text categorization. An important and novel aspect of this approach is that our categorization method does not require a training set, in contrast to traditional statistical and probabilistic methods. In the presented method, the ontology, including the domain concepts organized into hierarchies of categories and interconnected by relationships, as well as instances and connections among them, effectively becomes the classifier. Our method focuses on (i) converting a text document into a thematic graph of entities occurring in the document, (ii) ontological classification of the entities in the graph, and (iii) determining the overall categorization of the thematic graph and, as a result, the document itself. In the presented experiments, we used an RDF ontology constructed from the full English version of Wikipedia. Our experiments, conducted on corpora of Reuters news articles, showed that our training-less categorization method achieved very good overall accuracy.}}


 * -- align="left" valign=top
 * Javanmardi, Sara; Ganjisaffar, Yasser; Lopes, Cristina & Baldi, Pierre
 * User contribution and trust in Wikipedia
 * 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009, November 11, 2009 - November 14, 2009 Washington, DC, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Jenkinson, Dylan & Trotman, Andrew
 * Wikipedia ad hoc passage retrieval and Wikipedia document linking
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||Ad hoc passage retrieval within Wikipedia is examined in the context of INEX 2007. An analysis of the INEX 2006 assessments suggests that a fixed-size window of about 300 terms is consistently seen and that this might be a good retrieval strategy. In runs submitted to INEX, potentially relevant documents were identified using BM25 (trained on INEX 2006 data). For each potentially relevant document the location of every search term was identified and the center (mean) located. A fixed-size window was then centered on this location. A method of removing outliers was examined in which all terms occurring outside one standard deviation of the center were considered outliers and the center was recomputed without them. Both techniques were examined with and without stemming. For Wikipedia linking we identified terms within the document that were over-represented and, from the top few, generated queries of different lengths. A BM25 ranking search engine was used to identify potentially relevant documents. Links from the source document to the potentially relevant documents (and back) were constructed at the granularity of whole documents. The best performing run used the 4 most over-represented search terms to retrieve 200 documents, and the next 4 to retrieve 50 more. © 2008 Springer-Verlag Berlin Heidelberg.}}
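The window-centering heuristic described above (center a fixed-size window on the mean of the query-term positions, optionally discarding positions more than one standard deviation from the center first) can be sketched as follows; the function name, default window size, and toy positions are assumptions for illustration.

```python
# Select a fixed-size passage window centered on the mean position of
# the query-term occurrences in a document, with optional one-standard-
# deviation outlier removal before recomputing the center.
from statistics import mean, pstdev

def passage_window(term_positions, doc_len, window=300, drop_outliers=False):
    positions = list(term_positions)
    if drop_outliers and len(positions) > 1:
        center, sd = mean(positions), pstdev(positions)
        kept = [p for p in positions if abs(p - center) <= sd]
        if kept:
            positions = kept
    center = mean(positions)
    start = max(0, int(center - window // 2))
    end = min(doc_len, start + window)
    return start, end

# Query terms found at these token offsets in a 1000-token document;
# the lone occurrence at 900 drags the plain mean toward the middle.
plain = passage_window([120, 150, 180, 900], 1000)
robust = passage_window([120, 150, 180, 900], 1000, drop_outliers=True)
```

With the outlier dropped, the window snaps onto the dense cluster of term occurrences near the start of the document.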


 * -- align="left" valign=top
 * Jiang, Jiepu; Lu, Wei; Rong, Xianqian & Gao, Yangyan
 * Adapting language modeling methods for expert search to rank Wikipedia entities
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||In this paper, we propose two methods to adapt language modeling methods for expert search to the INEX entity ranking task. In our experiments, we notice that language modeling methods for expert search, if directly applied to the INEX entity ranking task, cannot effectively distinguish entity types. Thus, our proposed methods aim at resolving this problem. First, we propose a method to take into account the INEX category query field. Second, we use an interpolation of two language models to rank entities, which can work on the text query alone. Our experiments indicate that both methods can effectively adapt language modeling methods for expert search to the INEX entity ranking task. © 2009 Springer Berlin Heidelberg.}}
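The second method above, interpolating two language models to score an entity, is a standard linear mixture. This sketch assumes additively smoothed unigram models and an arbitrary mixing weight; the model sources, texts, and parameters are invented for illustration, not taken from the paper.

```python
# Score an entity for a query by linearly interpolating two unigram
# language models, e.g. one built from the entity's page text and one
# from its surrounding context.

def unigram_lm(text, vocab_size, mu=1.0):
    """Unigram model with additive (Laplace-style) smoothing."""
    words = text.split()
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    n = len(words)
    return lambda w: (counts.get(w, 0) + mu) / (n + mu * vocab_size)

def interpolated_score(query, lm_a, lm_b, lam=0.6):
    score = 1.0
    for w in query.split():
        score *= lam * lm_a(w) + (1 - lam) * lm_b(w)
    return score

page_lm = unigram_lm("impressionist painter france paris", vocab_size=10)
context_lm = unigram_lm("painter artist movement", vocab_size=10)
s = interpolated_score("impressionist painter", page_lm, context_lm)
```

Smoothing keeps unseen query words from zeroing out the product, and the weight lam controls how much each model contributes.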


 * -- align="left" valign=top
 * Jijkoun, Valentin; Hofmann, Katja; Ahn, David; Khalid, Mahboob Alam; Rantwijk, Joris Van; Rijke, Maarten De & Sang, Erik Tjong Kim
 * The University of Amsterdam's question answering system at QA@CLEF 2007
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||We describe a new version of our question answering system, which was applied to the questions of the 2007 CLEF Question Answering Dutch monolingual task. This year, we made three major modifications to the system: (1) we added the contents of Wikipedia to the document collection and the answer tables; (2) we completely rewrote the module interface code in Java; and (3) we included a new table stream which returned answer candidates based on information learned from question-answer pairs. Unfortunately, the changes did not lead to improved performance. Unsolved technical problems at the time of the deadline led to missing justifications for a large number of answers in our submission. Our single run obtained an accuracy of only 8% with an additional 12% of unsupported answers (compared to 21% in last year's task). © 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Jinpan, Liu; Liang, He; Xin, Lin; Mingmin, Xu & Wei, Lu
 * A new method to compute the word relevance in news corpus
 * 2nd International Workshop on Intelligent Systems and Applications, ISA2010, May 22, 2010 - May 23, 2010 Wuhan, China
 * 2010
 * 


 * -- align="left" valign=top
 * Juffinger, Andreas; Kern, Roman & Granitzer, Michael
 * Crosslanguage Retrieval Based on Wikipedia Statistics
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||In this paper we present the methodology, implementations and evaluation results of the crosslanguage retrieval system we have developed for the Robust WSD Task at CLEF 2008. Our system is based on query preprocessing for translation and homogenisation of queries. The presented preprocessing of queries includes two stages: first, a query translation step based on term statistics of co-occurring articles in Wikipedia; second, different disjunct query composition techniques to search in the CLEF corpus. We apply the same preprocessing steps for the monolingual as well as the crosslingual task, thereby acting fairly and in a similar way across these tasks. The evaluation revealed that the similar processing comes at nearly no cost for monolingual retrieval but enables us to do crosslanguage retrieval and also a feasible comparison of our system performance on these two tasks. © 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kaiser, Fabian; Schwarz, Holger & Jakob, Mihaly
 * Using Wikipedia-based conceptual contexts to calculate document similarity
 * 3rd International Conference on Digital Society, ICDS 2009, February 1, 2009 - February 7, 2009 Cancun, Mexico
 * 2009
 * 


 * -- align="left" valign=top
 * Kamps, Jaap; Geva, Shlomo; Trotman, Andrew; Woodley, Alan & Koolen, Marijn
 * Overview of the INEX 2008 Ad hoc track
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper gives an overview of the INEX 2008 Ad Hoc Track. The main goals of the Ad Hoc Track were two-fold. The first goal was to investigate the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. This is a continuation of INEX 2007 and, for this reason, the retrieval results are liberalized to arbitrary passages and measures were chosen to fairly compare systems retrieving elements, ranges of elements, and arbitrary passages. The second goal was to compare focused retrieval to article retrieval more directly than in earlier years. For this reason, standard document retrieval rankings have been derived from all runs and evaluated with standard measures. In addition, a set of queries targeting Wikipedia has been derived from a proxy log, and the runs are also evaluated against the clicked Wikipedia pages. The INEX 2008 Ad Hoc Track featured three tasks: for the Focused Task a ranked list of non-overlapping results (elements or passages) was needed; for the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came; for the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the results for the three tasks, and examine the relative effectiveness of element and passage retrieval. This is examined in the context of content only (CO, or keyword) search as well as content and structure (CAS, or structured) search. Finally, we look at the ability of focused retrieval techniques to rank articles, using standard document retrieval techniques, both against the judged topics as well as against queries and clicks from a proxy log. © 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kamps, Jaap & Koolen, Marijn
 * The impact of document level ranking on focused retrieval
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||Document retrieval techniques have proven to be competitive methods in the evaluation of focused retrieval. Although focused approaches such as XML element retrieval and passage retrieval allow for locating the relevant text within a document, using the larger context of the whole document often leads to superior document-level ranking. In this paper we investigate the impact of using the document retrieval ranking in two collections used in the INEX 2008 Ad hoc and Book Tracks: the relatively short documents of the Wikipedia collection and the much longer books in the Book Track collection. We experiment with several methods of combining document and element retrieval approaches. Our findings are that 1) we can get the best of both worlds and improve upon both individual retrieval strategies by retaining the document ranking of the document retrieval approach and replacing the documents by the retrieved elements of the element retrieval approach, and 2) using document-level ranking has a positive impact on focused retrieval in Wikipedia, but has more impact on the much longer books in the Book Track collection. © 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kamps, Jaap & Koolen, Marijn
 * Is Wikipedia link structure different?
 * 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain
 * 2009
 * 
 * {{hidden||In this paper, we investigate the difference between Wikipedia and Web link structure with respect to their value as indicators of the relevance of a page for a given topic of request. Our experimental evidence is from two IR test collections: the .GOV collection used at the TREC Web tracks and the Wikipedia XML Corpus used at INEX. We first perform a comparative analysis of Wikipedia and .GOV link structure and then investigate the value of link evidence for improving search on Wikipedia and on the .GOV domain. Our main findings are: First, Wikipedia link structure is similar to the Web, but more densely linked. Second, Wikipedia's outlinks behave similarly to inlinks and both are good indicators of relevance, whereas on the Web the inlinks are more important. Third, when incorporating link evidence in the retrieval model, for Wikipedia the global link evidence fails and we have to take the local context into account.}}


 * -- align="left" valign=top
 * Kanhabua, Nattiya & Nørvåg, Kjetil
 * Exploiting time-based synonyms in searching document archives
 * 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia
 * 2010
 * 
 * {{hidden||Query expansion of named entities can be employed to increase retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search w.r.t. temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.}}


 * -- align="left" valign=top
 * Kaptein, Rianne & Kamps, Jaap
 * Finding entities in Wikipedia using links and categories
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||In this paper we describe our participation in the INEX Entity Ranking track. We explored the relations between Wikipedia pages, categories and links. Our approach is to exploit both category and link information. Category information is used by calculating distances between document categories and target categories. Link information is used for relevance propagation and in the form of a document link prior. Both sources of information have value, but using category information leads to the biggest improvements. © 2009 Springer Berlin Heidelberg.}}
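The category-distance idea described above reduces to shortest-path search over the Wikipedia category graph; this sketch uses plain BFS on a tiny invented graph (treated as undirected for simplicity), not the authors' actual scoring.

```python
# Shortest-path distance between two categories in a category graph,
# usable as a proxy for how close a page's categories are to the
# target categories of an entity-ranking query.
from collections import deque

def category_distance(graph, source, target):
    """BFS shortest-path length from source to target; None if unreachable."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

graph = {
    "Art": ["Painting", "Sculpture"],
    "Painting": ["Art", "Impressionism"],
    "Impressionism": ["Painting"],
    "Sculpture": ["Art"],
}
d = category_distance(graph, "Impressionism", "Art")
```

Smaller distances would then translate into higher relevance weights for the corresponding pages.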


 * -- align="left" valign=top
 * Kaptein, Rianne & Kamps, Jaap
 * Using links to classify Wikipedia pages
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper contains a description of experiments for the 2008 INEX XML-mining track. Our goal for the XML-mining track is to explore whether we can use link information to improve classification accuracy. Our approach is to propagate category probabilities over linked pages. We find that using link information leads to marginal improvements over a baseline that uses a Naive Bayes model. For the initially misclassified pages, link information is either not available or contains too much noise. © 2009 Springer Berlin Heidelberg.}}
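Propagating category probabilities over linked pages, as in the approach above, can be sketched as a one-step neighbourhood mixture: each page's distribution is blended with the average distribution of the pages it links to. The mixing weight and toy data are assumptions for illustration, not values from the paper.

```python
# One propagation step: mix each page's own category distribution with
# the mean distribution of its linked neighbours.

def propagate(probs, links, alpha=0.8):
    """probs: page -> {category: prob}; links: page -> list of linked pages."""
    updated = {}
    for page, dist in probs.items():
        neighbours = [probs[n] for n in links.get(page, []) if n in probs]
        mixed = {}
        for cat in dist:
            if neighbours:
                neigh = sum(n.get(cat, 0.0) for n in neighbours) / len(neighbours)
            else:
                neigh = dist[cat]  # isolated page keeps its own estimate
            mixed[cat] = alpha * dist[cat] + (1 - alpha) * neigh
        updated[page] = mixed
    return updated

probs = {"A": {"science": 0.9, "sport": 0.1},
         "B": {"science": 0.4, "sport": 0.6}}
links = {"A": ["B"], "B": ["A"]}
new = propagate(probs, links)
```

Repeating the step spreads label evidence further along the link graph; a confident neighbour nudges an uncertain page toward its category.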


 * -- align="left" valign=top
 * Kawaba, Mariko; Nakasaki, Hiroyuki; Yokomoto, Daisuke; Utsuro, Takehito & Fukuhara, Tomohiro
 * Linking Wikipedia entries to blog feeds by machine learning
 * 3rd International Universal Communication Symposium, IUCS 2009, December 3, 2009 - December 4, 2009 Tokyo, Japan
 * 2009
 * 


 * -- align="left" valign=top
 * Kc, Milly; Chau, Rowena; Hagenbuchner, Markus; Tsoi, Ah Chung & Lee, Vincent
 * A machine learning approach to link prediction for interlinked documents
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||This paper provides an explanation of how a recently developed machine learning approach, namely the Probability Measure Graph Self-Organizing Map (PM-GraphSOM), can be used for the generation of links between referenced or otherwise interlinked documents. This new generation of SOM models is capable of projecting generic graph-structured data onto a fixed-size display space. Such a mechanism is normally used for dimension reduction, visualization, or clustering purposes. This paper shows that the PM-GraphSOM training algorithm inadvertently encodes relations that exist between the atomic elements in a graph. If the nodes in the graph represent documents and the links in the graph represent the reference (or hyperlink) structure of the documents, then it is possible to obtain a set of links for a test document whose link structure is unknown. A significant finding of this paper is that the described approach is scalable in that links can be extracted in linear time. It will also be shown that the proposed approach is capable of predicting the pages which would be linked to a new document and is capable of predicting the links to other documents from a given test document. The approach is applied to web pages from Wikipedia, a relatively large XML text database consisting of many referenced documents. © 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kimelfeld, Benny; Kovacs, Eitan; Sagiv, Yehoshua & Yahav, Dan
 * Using language models and the HITS algorithm for XML retrieval
 * 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany
 * 2007
 * {{hidden||Our submission to the INEX 2006 Ad-hoc retrieval track is described. We study how to utilize the Wikipedia structure (XML documents with hyperlinks) by combining XML and Web retrieval. In particular, we experiment with different combinations of language models and the HITS algorithm. An important feature of our techniques is a filtering phase that identifies the relevant part of the corpus, prior to the processing of the actual XML elements. We analyze the effect of the above techniques based on the results of our runs in INEX 2006. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Kiritani, Yusuke; Ma, Qiang & Yoshikawa, Masatoshi
 * Classifying web pages by using knowledge bases for entity retrieval
 * 20th International Conference on Database and Expert Systems Applications, DEXA 2009, August 31, 2009 - September 4, 2009 Linz, Austria
 * 2009
 * 
 * {{hidden||In this paper, we propose a novel method to classify Web pages by using knowledge bases for entity search, which is a kind of typical Web search for information related to a person, location or organization. First, we map a Web page to entities according to the similarities between the page and the entities. Various methods for computing such similarity are applied. For example, we can compute the similarity between a given page and a Wikipedia article describing a certain entity. The frequency of an entity appearing in the page is another factor used in computing the similarity. Second, we construct a directed acyclic graph, named PEC graph, based on the relations among Web pages, entities, and categories, by referring to YAGO, a knowledge base built on Wikipedia and WordNet. Finally, by analyzing the PEC graph, we classify Web pages into categories. The results of some preliminary experiments validate the methods proposed in this paper. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kirtsis, Nikos; Stamou, Sofia; Tzekou, Paraskevi & Zotos, Nikos
 * Information uniqueness in Wikipedia articles
 * 6th International Conference on Web Information Systems and Technologies, WEBIST 2010, April 7, 2010 - April 10, 2010 Valencia, Spain
 * 2010


 * -- align="left" valign=top
 * Kisilevich, Slava; Mansmann, Florian; Bak, Peter; Keim, Daniel & Tchaikin, Alexander
 * Where would you go on your next vacation? A framework for visual exploration of attractive places
 * 2nd International Conference on Advanced Geographic Information Systems, Applications, and Services, GEOProcessing 2010, February 10, 2010 - February 16, 2010 St. Maarten, Netherlands
 * 2010
 * 


 * -- align="left" valign=top
 * Kittur, Aniket; Suh, Bongwon; Pendleton, Bryan A. & Chi, Ed H.
 * He says, she says: Conflict and coordination in Wikipedia
 * 25th SIGCHI Conference on Human Factors in Computing Systems 2007, CHI 2007, April 28, 2007 - May 3, 2007 San Jose, CA, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Kiyota, Yoji; Nakagawa, Hiroshi; Sakai, Satoshi; Mori, Tatsuya & Masuda, Hidetaka
 * Exploitation of the Wikipedia category system for enhancing the value of LCSH
 * 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09, June 15, 2009 - June 19, 2009 Austin, TX, United states
 * 2009
 * 
 * {{hidden||This paper addresses an approach that integrates two different types of information resources: the Web and libraries. Our method begins from any keywords in Wikipedia, and induces related subject headings of LCSH through the Wikipedia category system.}}


 * -- align="left" valign=top
 * Koolen, Marijn & Kamps, Jaap
 * What's in a link? from document importance to topical relevance
 * 2nd International Conference on the Theory of Information Retrieval, ICTIR 2009, September 10, 2009 - September 12, 2009 Cambridge, United kingdom
 * 2009
 * 


 * -- align="left" valign=top
 * Koolen, Marijn; Kaptein, Rianne & Kamps, Jaap
 * Focused search in books and Wikipedia: Categories, links and relevance feedback
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||In this paper we describe our participation in INEX 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc Track we investigate focused link evidence, using only links from retrieved sections. The new collection is not only annotated with Wikipedia categories, but also with YAGO/WordNet categories. We explore how we can use both types of category information, in the Ad Hoc Track as well as in the Entity Ranking Track. Results in the Ad Hoc Track show Wikipedia categories are more effective than WordNet categories, and Wikipedia categories in combination with relevance feedback lead to the best results. Preliminary results of the Book Track show full-text retrieval is effective for high early precision. Relevance feedback further increases early precision. Our findings for the Entity Ranking Track are in direct opposition to our Ad Hoc findings, namely, that the WordNet categories are more effective than the Wikipedia categories. This marks an interesting difference between ad hoc search and entity ranking. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kriplean, Travis; Beschastnikh, Ivan; McDonald, David W. & Golder, Scott A.
 * Community, consensus, coercion, control: CS*W or how policy mediates mass participation
 * 2007 International ACM Conference on Supporting Group Work, GROUP'07, November 4, 2007 - November 7, 2007 Sanibel Island, FL, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Kuribara, Shusuke; Abbas, Safia & Sawamura, Hajime
 * Applying the logic of multiple-valued argumentation to social web: SNS and Wikipedia
 * 11th Pacific Rim International Conference on Multi-Agents, PRIMA 2008, December 15, 2008 - December 16, 2008 Hanoi, Viet nam
 * 2008
 * 
 * {{hidden||The Logic of Multiple-Valued Argumentation (LMA) is an argumentation framework that allows for argument-based reasoning about uncertain issues under uncertain knowledge. In this paper, we describe its applications to the Social Web: SNS and Wikipedia. They are said to be the most influential social Web applications for the present and future information society. For SNS, we present an agent that judges the registration approval for Mymixi in mixi in terms of LMA. For Wikipedia, we focus on the deletion problem of Wikipedia and present agents that argue about the issue of whether contributed articles should be deleted or not, analyzing arguments proposed for deletion in terms of LMA. These attempts reveal that LMA can deal with not only potential applications but also practical ones such as extensive and contemporary applications. 2008 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kürsten, Jens; Richter, Daniel & Eibl, Maximilian
 * VideoCLEF 2008: ASR classification with Wikipedia categories
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||This article describes our participation in the VideoCLEF track. We designed and implemented a prototype for the classification of the video ASR data. Our approach was to regard the task as a text classification problem. We used terms from Wikipedia categories as training data for our text classifiers. For the text classification, the Naive-Bayes and KNN classifiers from the WEKA toolkit were used. We submitted experiments for classification tasks 1 and 2. For the translation of the feeds to English (translation task), Google's AJAX language API was used. Although our experiments achieved only low precision of 10 to 15 percent, we assume those results will be useful in a combined setting with the retrieval approach that was widely used. Interestingly, we could not improve the quality of the classification by using the provided metadata. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kutty, Sangeetha; Tran, Tien; Nayak, Richi & Li, Yuefeng
 * Clustering XML documents using closed frequent subtrees: A structural similarity approach
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||This paper presents the experimental study conducted over the INEX 2007 Document Mining Challenge corpus employing a frequent subtree-based incremental clustering approach. Using the structural information of the XML documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents. This matrix is used to progressively cluster the XML documents. In spite of the large number of documents in the INEX 2007 Wikipedia dataset, the proposed frequent subtree-based incremental clustering approach was successful in clustering the documents. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Lahti, Lauri
 * Personalized learning paths based on Wikipedia article statistics
 * 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain
 * 2010


 * -- align="left" valign=top
 * Lahti, Lauri & Tarhio, Jorma
 * Semi-automated map generation for concept gaming
 * Computer Graphics and Visualization 2008 and Gaming 2008: Design for Engaging Experience and Social Interaction 2008, MCCSIS'08 - IADIS Multi Conference on Computer Science and Information Systems, July 22, 2008 - July 27, 2008 Amsterdam, Netherlands
 * 2008


 * -- align="left" valign=top
 * Lam, Shyong K. & Riedl, John
 * Is Wikipedia growing a longer tail?
 * 2009 ACM SIGCHI International Conference on Supporting Group Work, GROUP'09, May 10, 2009 - May 13, 2009 Sanibel Island, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Lanamäki, Arto & Päivärinta, Tero
 * Metacommunication patterns in online communities
 * 3rd International Conference on Online Communities and Social Computing, OCSC 2009. Held as Part of HCI International 2009, July 19, 2009 - July 24, 2009 San Diego, CA, United states
 * 2009
 * 
 * {{hidden||This paper discusses contemporary literature on computer-mediated metacommunication and observes the phenomenon in two online communities. The results contribute by identifying six general-level patterns of how metacommunication refers to primary communication in online communities. A task-oriented, user-administered community (Wikipedia in Finnish) involved a remarkable number of specialized metacommunication genres. In a centrally moderated discussion-oriented community (Patientslikeme), metacommunication was intertwined more with primary ad hoc communication. We suggest that a focus on specialized metacommunication genres may appear useful in online communities. However, room for ad hoc (meta)communication is needed as well, as it provides a basis for user-initiated community development. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Kuchta, Jaroslaw
 * Passing from requirements specification to class model using application domain ontology
 * 2010 2nd International Conference on Information Technology, ICIT 2010, June 28, 2010 - June 30, 2010 Gdansk, Poland
 * 2010


 * -- align="left" valign=top
 * Larsen, Jakob Eg; Halling, Søren; Sigurðsson, Magnús & Hansen, Lars Kai
 * MuZeeker: Adapting a music search engine for mobile phones
 * Mobile Multimedia Processing - Fundamentals, Methods, and Applications Tiergartenstrasse 17, Heidelberg, D-69121, Germany
 * 2010
 * 
 * {{hidden||We describe MuZeeker, a search engine with domain knowledge based on Wikipedia. MuZeeker enables the user to refine a search in multiple steps by means of category selection. In the present version we focus on multimedia search related to music, and we present two prototype search applications (web-based and mobile) and discuss the issues involved in adapting the search engine for mobile phones. A category-based filtering approach enables the user to refine a search through relevance feedback by category selection instead of typing additional text, which is hypothesized to be an advantage in the mobile MuZeeker application. We report from two usability experiments using the think-aloud protocol, in which N=20 participants performed tasks using MuZeeker and a customized Google search engine. In both experiments web-based and mobile user interfaces were used. The experiment shows that participants are capable of solving tasks slightly better using MuZeeker, while the inexperienced MuZeeker users perform slightly slower than experienced Google users. This was found in both the web-based and the mobile applications. It was found that task performance in the mobile search applications (MuZeeker and Google) was 2-2.5 times lower than in the corresponding web-based search applications (MuZeeker and Google).}}


 * -- align="left" valign=top
 * Larson, Martha; Newman, Eamonn & Jones, Gareth J. F.
 * Overview of videoCLEF 2008: Automatic generation of topic-based feeds for dual language audio-visual content
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content. In its first year, VideoCLEF piloted the Vid2RSS task, whose main subtask was the classification of dual language video (Dutch-language television content featuring English-speaking experts and studio guests). The task offered two additional discretionary subtasks: feed translation and automatic keyframe extraction. Task participants were supplied with Dutch archival metadata, Dutch speech transcripts, English speech transcripts and ten thematic category labels, which they were required to assign to the test set videos. The videos were grouped by class label into topic-based RSS-feeds, displaying title, description and keyframe for each video. Five groups participated in the 2008 VideoCLEF track. Participants were required to collect their own training data; both Wikipedia and general web content were used. Groups deployed various classifiers (SVM, Naive Bayes and K-NN) or treated the problem as an information retrieval task. Both the Dutch speech transcripts and the archival metadata performed well as sources of indexing features, but no group succeeded in exploiting combinations of feature sources to significantly enhance performance. A small-scale fluency/adequacy evaluation of the translation task output revealed the translation to be of sufficient quality to make it valuable to a non-Dutch-speaking English speaker. For keyframe extraction, the strategy chosen was to select the keyframe from the shot with the most representative speech transcript content. The automatically selected shots were shown, with a small user study, to be competitive with manually selected shots. Future years of VideoCLEF will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Le, Qize & Panchal, Jitesh H.
 * Modeling the effect of product architecture on mass collaborative processes - An agent-based approach
 * 2009 ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC2009, August 30, 2009 - September 2, 2009 San Diego, CA, United states
 * 2010
 * {{hidden||Traditional product development efforts are based on well-structured and hierarchical product development teams. The products are systematically decomposed into subsystems that are designed by dedicated teams with well-defined information flows. Recently, a new product development approach called Mass Collaborative Product Development (MCPD) has emerged. The fundamental difference between a traditional product development process and an MCPD process is that the former is based on top-down decomposition while the latter is based on evolution and self-organization. The paradigm of MCPD has resulted in highly successful products such as Wikipedia, Linux and Apache. Despite the success of various projects using MCPD, it is not well understood how the product architecture affects the evolution of products developed using such processes. To address this gap, an agent-based model to study MCPD processes is presented in this paper. Through this model, the effect of product architectures on the product evolution is studied. The model is executed for different architectures ranging from slot architecture to bus architecture, and the rates of product evolution are determined. The simulation-based approach allows us to study how the degree of modularity of products affects the evolution time of products and different modules in the MCPD processes. The methodology is demonstrated using an illustrative example of mobile phones. This approach provides a simple and intuitive way to study the effects of product architecture on the MCPD processes. It is helpful in determining the best strategies for product decomposition and identifying the product architectures that are suitable for the MCPD processes.}}


 * -- align="left" valign=top
 * Le, Minh-Tam; Dang, Hoang-Vu; Lim, Ee-Peng & Datta, Anwitaman
 * WikiNetViz: Visualizing friends and adversaries in implicit social networks
 * IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, June 17, 2008 - June 20, 2008 Taipei, Taiwan
 * 2008
 * 
 * {{hidden||When multiple users with diverse backgrounds and beliefs edit Wikipedia together, disputes often arise due to disagreements among the users. In this paper, we introduce a novel visualization tool known as WikiNetViz to visualize and analyze disputes among users in a dispute-induced social network. WikiNetViz is designed to quantify the degree of dispute between a pair of users using the article history. Each user (and article) is also assigned a controversy score by our proposed ControversyRank model so as to measure the degree of controversy of a user (and an article) by the amount of disputes between the user (article) and other users in articles of varying degrees of controversy. On the constructed social network, WikiNetViz can perform clustering so as to visualize the dynamics of disputes at the user group level. It also provides an article viewer for examining an article revision so as to determine the article content modified by different users.}}


 * -- align="left" valign=top
 * Lee, Kangpyo; Kim, Hyunwoo; Shin, Hyopil & Kim, Hyoung-Joo
 * FolksoViz: A semantic relation-based folksonomy visualization using the Wikipedia corpus
 * 10th ACIS Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2009, In conjunction with IWEA 2009 and WEACR 2009, May 27, 2009 - May 29, 2009 Daegu, Korea, Republic of
 * 2009
 * 
 * {{hidden||Tagging is one of the most popular services in Web 2.0 and folksonomy is a representation of collaborative tagging. Tag cloud has been the one and only visualization of the folksonomy. The tag cloud, however, provides no information about the relations between tags. In this paper, targeting del.icio.us tag data, we propose a technique, FolksoViz, for automatically deriving semantic relations between tags and for visualizing the tags and their relations. In order to find the equivalence, subsumption, and similarity relations, we apply various rules and models based on the Wikipedia corpus. The derived relations are visualized effectively. The experiment shows that FolksoViz manages to find the correct semantic relations with high accuracy.}}


 * -- align="left" valign=top
 * Lee, Kangpyo; Kim, Hyunwoo; Shin, Hyopil & Kim, Hyoung-Joo
 * Tag sense disambiguation for clarifying the vocabulary of social tags
 * 2009 IEEE International Conference on Social Computing, SocialCom 2009, August 29, 2009 - August 31, 2009 Vancouver, BC, Canada
 * 2009
 * 
 * {{hidden||Tagging is one of the most popular services in Web 2.0. As a special form of tagging, social tagging is done collaboratively by many users, which forms a so-called folksonomy. As tagging has become widespread on the Web, the tag vocabulary is now very informal, uncontrolled, and personalized. For this reason, many tags are unfamiliar and ambiguous to users so that they fail to understand the meaning of each tag. In this paper, we propose a tag sense disambiguating method, called Tag Sense Disambiguation (TSD), which works in the social tagging environment. TSD can be applied to the vocabulary of social tags, thereby enabling users to understand the meaning of each tag through Wikipedia. To find the correct mappings from del.icio.us tags to Wikipedia articles, we define the Local Neighbor tags, the Global Neighbor tags, and finally the Neighbor tags that would be the useful keywords for disambiguating the sense of each tag based on the tag co-occurrences. The automatically built mappings are reasonable in most cases. The experiment shows that TSD can find the correct mappings with high accuracy.}}


 * -- align="left" valign=top
 * Lees-Miller, John; Anderson, Fraser; Hoehn, Bret & Greiner, Russell
 * Does Wikipedia information help Netflix predictions?
 * 7th International Conference on Machine Learning and Applications, ICMLA 2008, December 11, 2008 - December 13, 2008 San Diego, CA, United states
 * 2008
 * 
 * {{hidden||We explore several ways to estimate movie similarity from the free encyclopedia Wikipedia with the goal of improving our predictions for the Netflix Prize. Our system first uses the content and hyperlink structure of Wikipedia articles to identify similarities between movies. We then predict a user's unknown ratings by using these similarities in conjunction with the user's known ratings to initialize matrix factorization and K-Nearest Neighbours algorithms. We blend these results with existing ratings-based predictors. Finally, we discuss our empirical results, which suggest that external Wikipedia data does not significantly improve the overall prediction accuracy.}}


 * -- align="left" valign=top
 * Lehtonen, Miro & Doucet, Antoine
 * Phrase detection in the Wikipedia
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||The Wikipedia XML collection turned out to be rich in marked-up phrases as we carried out our INEX 2007 experiments. Assuming that a phrase occurs at the inline level of the markup, we were able to identify over 18 million phrase occurrences, most of which were either the anchor text of a hyperlink or a passage of text with added emphasis. As our IR system - EXTIRP - indexed the documents, the detected inline-level elements were duplicated in the markup with two direct consequences: 1) the frequency of the phrase terms increased, and 2) the word sequences changed. Because the markup was manipulated before computing word sequences for a phrase index, the actual multi-word phrases became easier to detect. The effect of duplicating the inline-level elements was tested by producing two run submissions in ways that were similar except for the duplication. According to the official INEX 2007 metric, the positive effect of duplicated phrases was clear. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Lehtonen, Miro & Doucet, Antoine
 * EXTIRP: Baseline retrieval from Wikipedia
 * 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany
 * 2007
 * {{hidden||The Wikipedia XML documents are considered an interesting challenge to any XML retrieval system that is capable of indexing and retrieving XML without prior knowledge of the structure. Although the structure of the Wikipedia XML documents is highly irregular and thus unpredictable, EXTIRP manages to handle all the well-formed XML documents without problems. Whether the high flexibility of EXTIRP also implies high performance concerning the quality of IR has so far been a question without definite answers. The initial results do not confirm any positive answers, but instead, they tempt us to define some requirements for the XML documents that EXTIRP is expected to index. The most interesting question stemming from our results is about the line between high-quality XML markup which aids accurate IR and noisy XML "spam" that misleads flexible XML search engines. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Leong, Peter; Siak, Chia Bin & Miao, Chunyan
 * Cyber engineering co-intelligence digital ecosystem: The GOFASS methodology
 * 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST '09, June 1, 2009 - June 3, 2009 Istanbul, Turkey
 * 2009
 * 
 * {{hidden||Co-intelligence, also known as collective or collaborative intelligence, is the harnessing of human knowledge and intelligence that allows groups of people to act together in ways that seem to be intelligent. Co-intelligence Internet applications such as Wikipedia are the first steps toward developing digital ecosystems that support collective intelligence. Peer-to-peer (P2P) systems are well fitted to Co-Intelligence digital ecosystems because they allow each service client machine to act also as a service provider without any central hub in the network of cooperative relationships. However, dealing with server farms, clusters and meshes of wireless edge devices will be the norm in the next generation of computing, but most present P2P systems have been designed with a fixed, wired infrastructure in mind. This paper proposes a methodology for cyber engineering intelligent-agent-mediated co-intelligence digital ecosystems. Our methodology caters for co-intelligence digital ecosystems with wireless edge devices working with service-oriented information servers.}}


 * -- align="left" valign=top
 * Li, Bing; Chen, Qing-Cai; Yeung, Daniel S.; Ng, Wing W.Y. & Wang, Xiao-Long
 * Exploring wikipedia and query log's ability for text feature representation
 * 6th International Conference on Machine Learning and Cybernetics, ICMLC 2007, August 19, 2007 - August 22, 2007 Hong Kong, China
 * 2007
 * 
 * {{hidden||The rapid increase of internet technology requires better management of web page contents. Much text mining research has been conducted on tasks such as text categorization, information retrieval, and text clustering. When machine learning methods or statistical models are applied to such a large scale of data, the first step is to represent a text document in a way that computers can handle. Traditionally, single words are employed as features in the Vector Space Model, making up the feature space for all text documents. The single-word based representation assumes word independence and does not consider relations between words, which may cause information loss. This paper proposes Wiki-Query segmented features for text classification, in hopes of better using the text information. The experiment results show that a much better F1 value has been achieved than that of classical single-word based text representation. This means that Wikipedia and query segmented features could better represent a text document.}}


 * -- align="left" valign=top
 * Li, Yun; Huang, Kaiyan; Ren, Fuji & Zhong, Yixin
 * Searching and computing for vocabularies with semantic correlations from Chinese Wikipedia
 * China-Ireland International Conference on Information and Communications Technologies, CIICT 2008, September 26, 2008 - September 28, 2008 Beijing, China
 * 2008
 * 


 * -- align="left" valign=top
 * Lian, Li; Ma, Jun; Lei, JingSheng; Song, Ling & Liu, LeBo
 * Automated construction Chinese domain ontology from Wikipedia
 * 4th International Conference on Natural Computation, ICNC 2008, October 18, 2008 - October 20, 2008 Jinan, China
 * 2008
 * 
 * {{hidden||Wikipedia (Wiki) is a collaborative on-line encyclopedia, where web users are able to share their knowledge about a certain topic. How to make use of the rich knowledge in the Wiki is a big challenge. In this paper we propose a method to construct domain ontology from the Chinese Wiki automatically. The main idea in this paper is based on entry segmenting and Feature Text (FT) extracting: we first segment the names of entries and establish the concept hierarchy. Secondly, we extract the FTs from the descriptions of entries to eliminate redundant information. Finally we calculate the similarity between pairs of FTs to revise the concept hierarchy and gain non-taxonomy relations between concepts. The primary experiment indicates that our method is useful for Chinese domain ontology construction.}}


 * -- align="left" valign=top
 * Liang, Chia-Kai; Hsieh, Yu-Ting; Chuang, Tien-Jung; Wang, Yin; Weng, Ming-Fang & Chuang, Yung-Yu
 * Learning landmarks by exploiting social media
 * 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China
 * 2009
 * 
 * {{hidden||This paper introduces methods for automatic annotation of landmark photographs via learning textual tags and visual features of landmarks from landmark photographs that are appropriately location-tagged from social media. By analyzing spatial distributions of text tags from Flickr's geotagged photos, we identify thousands of tags that likely refer to landmarks. Further verification by utilizing Wikipedia articles filters out non-landmark tags. Association analysis is used to find the containment relationship between landmark tags and other geographic names, thus forming a geographic hierarchy. Photographs relevant to each landmark tag were retrieved from Flickr and distinctive visual features were extracted from them. The results form an ontology for landmarks, including their names, equivalent names, geographic hierarchy, and visual features. We also propose an efficient indexing method for content-based landmark search. The resultant ontology could be used in tag suggestion and content-relevant re-ranking. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Lim, Ee-Peng; Kwee, Agus Trisnajaya; Ibrahim, Nelman Lubis; Sun, Aixin; Datta, Anwitaman; Chang, Kuiyu & Maureen
 * Visualizing and exploring evolving information networks in Wikipedia
 * 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia
 * 2010
 * 
 * {{hidden||Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks represent both the structure and content of a community's knowledge, and the networks evolve as the knowledge gets updated. By observing the networks evolve and finding their evolving patterns, one can gain higher order knowledge about the networks and conduct longitudinal network analysis to detect events and summarize trends. In this paper, we present SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia's information networks. SSNetViz+ supports time-based network browsing, content browsing and search. Using a terrorism information network as an example, we show that different timestamped versions of the network can be interactively explored. As information networks in Wikipedia are created and maintained by collaborative editing efforts, the edit activity data are also shown to help detect interesting events that may have happened to the network. SSNetViz+ also supports temporal queries that allow other relevant nodes to be added so as to expand the network being analyzed.}}


 * -- align="left" valign=top
 * Lim, Ee-Peng; Wang, Z.; Sadeli, D.; Li, Y.; Chang, Chew-Hung; Chatterjea, Kalyani; Goh, Dion Hoe-Lian; Theng, Yin-Leng; Zhang, Jun & Sun, Aixin
 * Integration of Wikipedia and a geography digital library
 * 9th International Conference on Asian Digital Libraries, ICADL 2006, November 27, 2006 - November 30, 2006 Kyoto, Japan
 * 2006
 * {{hidden||In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal and Wikipedia to meet the integration requirements. Springer-Verlag Berlin Heidelberg 2006.}}


 * -- align="left" valign=top
 * Linna, Li
 * The design of semantic web services discovery model based on multi proxy
 * 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2009, November 20, 2009 - November 22, 2009 Shanghai, China
 * 2009
 * 
 * {{hidden||Web services have changed the Web from a database of static documents to a service provider. To improve the automation of Web services interoperation, a lot of technologies have been recommended, such as semantic Web services and proxies. In this paper we propose a model for semantic Web service discovery based on semantic Web services and FIPA multi-proxies. This paper provides a broker which provides semantic interoperability between semantic Web service providers and proxies by translating WSDL to DF description for semantic Web services and DF description to WSDL for FIPA multi-proxies. We describe how the proposed architecture analyzes the request and matches the search query. The ontology management in the broker creates the user ontology and merges it with a general ontology (e.g., WordNet, Yago, Wikipedia ...). We also describe the recommendation component that recommends the WSDL to the Web service provider to increase their retrieval probability in the related queries.}}


 * -- align="left" valign=top
 * Lintean, Mihai; Moldovan, Cristian; Rus, Vasile & McNamara, Danielle
 * The role of local and global weighting in assessing the semantic similarity of texts using latent semantic analysis
 * 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23, May 19, 2010 - May 21, 2010 Daytona Beach, FL, United states
 * 2010
 * {{hidden||In this paper, we investigate the impact of several local and global weighting schemes on Latent Semantic Analysis' (LSA) ability to capture semantic similarity between two texts. We worked with texts varying in size from sentences to paragraphs. We present a comparison of 3 local and 3 global weighting schemes across 3 different standardized data sets related to semantic similarity tasks. For local weighting, we used binary weighting, term-frequency, and log-type. For global weighting, we relied on binary, inverse document frequencies (IDF) collected from the English Wikipedia, and entropy, which is the standard weighting scheme used by most LSA-based applications. We studied all possible combinations of these weighting schemes on the following three tasks and corresponding data sets: paraphrase identification at sentence level using the Microsoft Research Paraphrase Corpus, paraphrase identification at sentence level using data from the intelligent tutoring system ISTART, and mental model detection based on student-articulated paragraphs in MetaTutor, another intelligent tutoring system. Our experiments revealed that for sentence-level texts, term-frequency local weighting in combination with either IDF or binary global weighting works best. For paragraph-level texts, a log-type local weighting in combination with binary global weighting works best. We also found that global weights have a greater impact for sentence-level similarity as the local weight is undermined by the small size of such texts. Copyright 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.}}


 * -- align="left" valign=top
 * Liu, Changxin; Chen, Huijuan; Tan, Yunlan & Wu, Lanying
 * The design of e-Learning system based on semantic wiki and multi-agent
 * 2nd International Workshop on Education Technology and Computer Science, ETCS 2010, March 6, 2010 - March 7, 2010 Wuhan, Hubei, China
 * 2010
 * 


 * -- align="left" valign=top
 * Liu, Qiaoling; Xu, Kaifeng; Zhang, Lei; Wang, Haofen; Yu, Yong & Pan, Yue
 * Catriple: Extracting triples from wikipedia categories
 * 3rd Asian Semantic Web Conference, ASWC 2008, December 8, 2008 - December 11, 2008 Bangkok, Thailand
 * 2008
 * 


 * -- align="left" valign=top
 * Lu, Zhiqiang; Shao, Werimin & Yu, Zhenhua
 * Measuring semantic similarity between words using wikipedia
 * 2009 International Conference on Web Information Systems and Mining, WISM 2009, November 7, 2009 - November 8, 2009 Shanghai, China
 * 2009
 * 
 * {{hidden||Semantic similarity measures play an important role in the extraction of semantic relations and are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Unlike other methods, which are based on taxonomies or Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words using cosine similarity and TF-IDF. A stemming algorithm and stop-word removal are also used in preprocessing the snippets from Wikipedia. We set different thresholds to evaluate our results in order to decrease the interference from noise and redundancy. Our method was empirically evaluated using the Rubenstein-Goodenough benchmark dataset. It gives a higher correlation value (0.615) than some existing methods. Evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.}}


 * -- align="left" valign=top
 * Lukosch, Stephan & Leisen, Andrea
 * Comparing and merging versioned wiki pages
 * 4th International Conference on Web Information Systems and Technologies, WEBIST 2008, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal
 * 2009
 * 


 * -- align="left" valign=top
 * Lukosch, Stephan & Leisen, Andrea
 * Dealing with conflicting modifications in a Wiki
 * WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal
 * 2008


 * -- align="left" valign=top
 * Mansour, Osama
 * Group Intelligence: A distributed cognition perspective
 * International Conference on Intelligent Networking and Collaborative Systems, INCoS 2009, November 4, 2009 - November 6, 2009 Barcelona, Spain
 * 2009
 * 


 * -- align="left" valign=top
 * Mataoui, M'hamed; Boughanem, Mohand & Mezghiche, Mohamed
 * Experiments on PageRank algorithm in the XML information retrieval context
 * 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom
 * 2009
 * 
 * {{hidden||In this paper we present two adaptations of the PageRank algorithm to collections of XML documents, along with the experimental results obtained on the Wikipedia collection used at INEX 2007. These adaptations, which we refer to as DOCRANK and TOPICAL-DOCRANK, re-rank the results returned by the base run in order to improve retrieval quality. Our experiments are performed on the results returned by the three best-ranked systems in the "Focused" task of INEX 2007. Evaluations have shown improvements in the quality of retrieval results (the improvement for some topics is very significant, e.g., topics 491 and 521). The best improvement, achieved on the results returned by the Dalian University system (global rate obtained over the 107 topics of INEX 2007), was about 3.78%.}}


 * -- align="left" valign=top
 * Maureen; Sun, Aixin; Lim, Ee-Peng; Datta, Anwitaman & Chang, Kuiyu
 * On visualizing heterogeneous semantic networks from multiple data sources
 * 11th International Conference on Asian Digital Libraries, ICADL 2008, December 2, 2008 - December 5, 2008 Bali, Indonesia
 * 2008
 * 


 * -- align="left" valign=top
 * Minier, Zsolt; Bodo, Zalan & Csato, Lehel
 * Wikipedia-based Kernels for text categorization
 * 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2007, September 26, 2007 - September 29, 2007 Timisoara, Romania
 * 2007
 * 


 * -- align="left" valign=top
 * Mishra, Surjeet & Ghosh, Hiranmay
 * Effective visualization and navigation in a multimedia document collection using ontology
 * 3rd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2009, December 16, 2009 - December 20, 2009 New Delhi, India
 * 2009
 * 
 * {{hidden||We present a novel user interface for visualizing and navigating in a multimedia document collection. A domain ontology has been used to depict the background knowledge organization and to map the multimedia information nodes onto that knowledge map, thereby making the implicit knowledge organization in a collection explicit. The ontology is automatically created by analyzing the links in Wikipedia, and is delimited to tightly cover the information nodes in the collection. We present an abstraction of the knowledge map for creating a clear and concise view, which can be progressively 'zoomed in' or 'zoomed out' to navigate the knowledge space. We organize the graph based on mutual similarity scores between the nodes to aid the cognitive process during navigation. © 2009 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Missen, Malik Muhammad Saad; Boughanem, Mohand & Cabanac, Guillaume
 * Using passage-based language model for opinion detection in blogs
 * 25th Annual ACM Symposium on Applied Computing, SAC 2010, March 22, 2010 - March 26, 2010 Sierre, Switzerland
 * 2010
 * 
 * {{hidden||In this work, we evaluate the importance of passages in blogs, especially for the task of Opinion Detection. We argue that passages are the basic building blocks of blogs. Therefore, we use a Passage-Based Language Modeling approach for Opinion Finding in Blogs. Our decision to use Language Modeling (LM) in this work is based on the performance LM has shown in various Opinion Detection approaches. In addition, we propose a novel method for bi-dimensional query expansion with relevant and opinionated terms, using Wikipedia and a Relevance-Feedback mechanism respectively. We also compare the impact of two different query-term weighting (and ranking) approaches on the final results, as well as the performance of three passage-based document ranking functions (Linear, Avg, Max). For evaluation purposes, we use the TREC Blog06 data collection with the 50 topics of TREC 2006, over the best TREC-provided baseline with an opinion-finding MAP of 0.3022. Our approach gives a MAP improvement of almost 9.29% over the best TREC-provided baseline (baseline4).}}


 * -- align="left" valign=top
 * Mølgaard, Lasse L.; Larsen, Jan & Goutte, Cyril
 * Temporal analysis of text data using latent variable models
 * Machine Learning for Signal Processing XIX - 2009 IEEE Signal Processing Society Workshop, MLSP 2009, September 2, 2009 - September 4, 2009 Grenoble, France
 * 2009
 * 
 * {{hidden||Detecting and tracking temporal data is an important task in multiple applications. In this paper we study temporal text mining methods for Music Information Retrieval. We compare two ways of detecting the temporal latent semantics of a corpus extracted from Wikipedia, using a stepwise Probabilistic Latent Semantic Analysis (PLSA) approach and a global multiway PLSA method. The analysis indicates that the global analysis method is able to identify relevant trends which are difficult to obtain using a step-by-step approach. Furthermore, we show that inspection of PLSA models with different numbers of factors may reveal the stability of temporal clusters, making it possible to choose the relevant number of factors.}}


 * -- align="left" valign=top
 * Mohammadi, Mehdi & GhasemAghaee, Nasser
 * Building bilingual parallel corpora based on wikipedia
 * 2nd International Conference on Computer Engineering and Applications, ICCEA 2010, March 19, 2010 - March 21, 2010 Indonesia
 * 2010
 * 
 * {{hidden||Aligned parallel corpora are an important resource for a wide range of multilingual research, specifically corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method of extracting sentence-level alignments by using an extended link-based bilingual lexicon method. Experimental results show that our method increases precision while it reduces the total number of generated candidate pairs.}}


 * -- align="left" valign=top
 * Morgan, Jonathan T.; Derthick, Katie; Ferro, Toni; Searle, Elly; Zachry, Mark & Kriplean, Travis
 * Formalization and community investment in wikipedia's regulating texts: The role of essays
 * 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Mozina, Martin; Giuliano, Claudio & Bratko, Ivan
 * Argument based machine learning from examples and text
 * 2009 1st Asian Conference on Intelligent Information and Database Systems, ACIIDS 2009, April 1, 2009 - April 3, 2009 Dong Hoi, Viet nam
 * 2009
 * 
 * {{hidden||We introduce a novel approach to cross-media learning based on argument based machine learning (ABML). ABML is a recent method that combines argumentation and machine learning from examples, and its main idea is to use arguments for some of the learning examples. Arguments are usually provided by a domain expert. In this paper, we present an alternative approach, where the arguments used in ABML are automatically extracted from text with a technique for relation extraction. We demonstrate and evaluate the approach through a case study of learning to classify animals by using arguments automatically extracted from Wikipedia.}}


 * -- align="left" valign=top
 * Mulhem, Philippe & Chevallet, Jean-Pierre
 * Use of language model, phrases and wikipedia forward links for INEX 2009
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||We present in this paper the work of the Information Retrieval Modeling Group (MRIM) of the Computer Science Laboratory of Grenoble (LIG) at the INEX 2009 Ad Hoc Track. Our aim this year was twofold: first, to study the impact of extracted noun phrases used in addition to words as terms, and second, to use forward links present in Wikipedia to expand queries. For retrieval, we use a language model with Dirichlet smoothing on documents and/or doxels, and using a Fetch and Browse approach we select and rank the results. Our best runs rank first on the Thorough task according to the doxel evaluation, and first on the Focused, Relevance in Context, and Best in Context tasks according to the document evaluation. © 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Muller, Christof & Gurevych, Iryna
 * Using Wikipedia and Wiktionary in domain-specific information retrieval
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||The main objective of our experiments in the domain-specific track at CLEF 2008 is utilizing semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary to this task is new. We evaluate two retrieval models based on semantic relatedness, i.e. SR-Text and SR-Word, by comparing their performance to a statistical model as implemented by Lucene. We refer to Wikipedia article titles and Wiktionary word entries as concepts, and map query and document terms to concept vectors which are then used to compute the document relevance. In the bilingual task, we translate the English topics into the document language, i.e. German, by using machine translation. For SR-Text, we alternatively perform the translation process by using cross-language links in Wikipedia, whereby the terms are directly mapped to concept vectors in the target language. The evaluation shows that the latter approach especially improves the retrieval performance in cases where the machine translation system incorrectly translates query terms. © 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Muller, Claudia; Meuthrath, Benedikt & Jeschke, Sabina
 * Defining a universal actor content-element model for exploring social and information networks considering the temporal dynamic
 * 2009 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2009, July 20, 2009 - July 22, 2009 Athens, Greece
 * 2009
 * 


 * -- align="left" valign=top
 * Murugeshan, Meenakshi Sundaram; Lakshmi, K. & Mukherjee, Saswati
 * Exploiting negative categories and wikipedia structures for document classification
 * ARTCom 2009 - International Conference on Advances in Recent Technologies in Communication and Computing, October 27, 2009 - October 28, 2009 Kottayam, Kerala, India
 * 2009
 * 
 * {{hidden||This paper explores the effect of a profile-based method for classification of Wikipedia XML documents. Our approach builds two profiles, exploiting the whole content, the Initial Descriptions, and the links in the Wikipedia documents. For building profiles we use negative category information, which has been shown to perform well for classifying unstructured texts. The performance of the Cosine and Fractional Similarity metrics is also compared. The use of two classifiers and their weighted average improves the classification performance.}}


 * -- align="left" valign=top
 * Nadamoto, Akiyo; Aramaki, Eiji; Abekawa, Takeshi & Murakami, Yohei
 * Content hole search in community-type content using Wikipedia
 * 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia
 * 2009
 * 
 * {{hidden||SNSs and blogs, both of which are maintained by a community of people, have become popular in Web 2.0. We call such content "community-type content." A community is associated with the content, and those who use or contribute to community-type content are considered members of the community. Occasionally the members of a community do not understand the theme of the content from multiple viewpoints, hence the amount of information is often insufficient. It is convenient to present users with the information they have missed. As Web 2.0 became popular, the content on the Internet and the types of users changed. We believe that there is a need for next-generation search engines in Web 2.0: search engines that can find information users are unaware of; we call such information "content holes." In this paper we propose a method for searching for content holes in community-type content. We attempt to extract and represent content holes from discussions on SNSs and blogs. Conventional Web search techniques are generally based on similarities; our content-hole search is a different kind of search. In this paper we classify and illustrate a number of different searching methods, define content holes, and, as the first step toward realizing our aim, propose a content-hole search system using Wikipedia.}}


 * -- align="left" valign=top
 * Nakabayashi, Takeru; Yumoto, Takayuki; Nii, Manabu; Takahashi, Yutaka & Sumiya, Kazutoshi
 * Measuring peculiarity of text using relation between words on the web
 * 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia
 * 2010
 * 


 * -- align="left" valign=top
 * Nakasaki, Hiroyuki; Kawaba, Mariko; Utsuro, Takehito & Fukuhara, Tomohiro
 * Mining cross-lingual/cross-cultural differences in concerns and opinions in blogs
 * 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009, March 26, 2009 - March 27, 2009 Hong kong
 * 2009
 * 


 * -- align="left" valign=top
 * Nakayama, Kotaro; Ito, Masahiro; Hara, Takahiro & Nishio, Shojiro
 * Wikipedia relatedness measurement methods and influential features
 * 2009 International Conference on Advanced Information Networking and Applications Workshops, WAINA 2009, May 26, 2009 - May 29, 2009 Bradford, United kingdom
 * 2009
 * 
 * {{hidden||As a corpus for knowledge extraction, Wikipedia has become one of the most promising resources among researchers in various domains such as NLP, WWW, IR, and AI, since it has great coverage of concepts across a wide range of domains, remarkable accuracy, and an easily handled structure for analysis. Relatedness measurement among concepts is one of the traditional research topics in Wikipedia analysis. The value of relatedness measurement research is widely recognized because of its wide range of applications, such as query expansion in IR and context recognition in WSD (Word Sense Disambiguation). A number of approaches have been proposed, and they proved that there are many features that can be used to measure relatedness among concepts in Wikipedia. In previous research, many features such as categories, co-occurrence of terms (links), inter-page links, and Infoboxes have been used to this aim. What seems lacking, however, is an integrated feature selection model for these dispersed features, since it is still unclear which feature is influential and how we can integrate them in order to achieve higher accuracy. This position paper proposes an SVR (Support Vector Regression) based integrated feature selection model to investigate the influence of each feature and seek a combined model of features that achieves high accuracy and coverage.}}


 * -- align="left" valign=top
 * Nakayama, Kotaro; Ito, Masahiro; Hara, Takahiro & Nishio, Shojiro
 * Wikipedia mining for huge scale Japanese association thesaurus construction
 * 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINA 2008, March 25, 2008 - March 28, 2008 Gino-wan, Okinawa, Japan
 * 2008
 * 


 * -- align="left" valign=top
 * Nazir, Fawad & Takeda, Hideaki
 * Extraction and analysis of tripartite relationships from Wikipedia
 * 2008 IEEE International Symposium on Technology and Society: ISTAS '08 - Citizens, Groups, Communities and Information and Communication Technologies, June 26, 2008 - June 28, 2008 Fredericton, NB, Canada
 * 2008
 * 


 * -- align="left" valign=top
 * Neiat, Azadeh Ghari; Mohsenzadeh, Mehran; Forsati, Rana & Rahmani, Amir Masoud
 * An agent- based semantic web service discovery framework
 * 2009 International Conference on Computer Modeling and Simulation, ICCMS 2009, February 20, 2009 - February 22, 2009 Macau, China
 * 2009
 * 
 * {{hidden||Web services have changed the Web from a database of static documents into a service provider. To improve the automation of Web service interoperation, many technologies have been recommended, such as semantic Web services and agents. In this paper we propose a framework for semantic Web service discovery based on semantic Web services and FIPA multi-agents. The framework provides a broker that enables semantic interoperability between semantic Web service providers and agents by translating WSDL to DF descriptions for semantic Web services and DF descriptions to WSDL for FIPA multi-agents. We describe how the proposed architecture analyzes the request and matches the search query. The ontology management component in the broker creates the user ontology and merges it with a general ontology (e.g., WordNet, Yago, Wikipedia). We also describe the recommendation component, which recommends the WSDL to Web service providers to increase their retrieval probability in related queries.}}


 * -- align="left" valign=top
 * Neiat, Azadeh Ghari; Shavalady, Sajjad Haj; Mohsenzadeh, Mehran & Rahmani, Amir Masoud
 * A new approach for semantic web service discovery and propagation based on agents
 * 5th International Conference on Networking and Services, ICNS 2009, April 20, 2009 - April 25, 2009 Valencia, Spain
 * 2009
 * 
 * {{hidden||[...] for Web-based systems integration has become a timely challenge. To improve the automation of Web service interoperation, many technologies have been recommended, such as semantic Web services and agents. In this paper an approach for semantic Web service discovery and propagation based on semantic Web services and FIPA multi-agents is proposed. A broker is proposed that exposes semantic interoperability between semantic Web service providers and agents by translating WSDL to DF descriptions for semantic Web services and vice versa. We describe how the proposed architecture analyzes the request and then matches or publishes it. The ontology management component in the broker creates the user ontology and merges it with a general ontology (e.g., WordNet, Yago, Wikipedia, ...). We also describe the recommender, which analyzes the created WSDL based on the functional and non-functional requirements and then recommends it to Web service providers to increase their retrieval probability in related queries.}}


 * -- align="left" valign=top
 * Netzer, Yael; Gabay, David; Adler, Meni; Goldberg, Yoav & Elhadad, Michael
 * Ontology evaluation through text classification
 * APWeb/WAIM 2009 International Workshops: WCMT 2009, RTBI 2009, DBIR-ENQOIR 2009, PAIS 2009, April 2, 2009 - April 4, 2009 Suzhou, China
 * 2009
 * 
 * {{hidden||We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of ontology relations by measuring their classification potential over the textual documents. This data-driven method provides concrete feedback to ontology maintainers and a quantitative estimation of the functional adequacy of the ontology relations towards improving the search experience. We specifically evaluate whether an ontology relation can help a semantic search engine support exploratory search. We test this ontology evaluation method on an ontology in the movies domain that was acquired semi-automatically from the integration of multiple semi-structured and textual data sources (e.g., IMDb and Wikipedia). We automatically construct a domain corpus from a set of movie instances by crawling the Web for movie reviews (both professional and user reviews). The 1-1 relation between textual documents (reviews) and movie instances in the ontology enables us to translate ontology relations into text classes. We verify that the text classifiers induced by key ontology relations (genre, keywords, actors) achieve high performance, and we exploit the properties of the learned text classifiers to provide concrete feedback on the ontology. The proposed ontology evaluation method is general and relies on the possibility of automatically aligning textual documents to ontology instances.}}


 * -- align="left" valign=top
 * Newman, David; Noh, Youn; Talley, Edmund; Karimi, Sarvnaz & Baldwin, Timothy
 * Evaluating topic models for digital libraries
 * 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia
 * 2010
 * 
 * {{hidden||Topic models could have a huge impact on improving the ways users find and discover content in digital libraries and search interfaces, through their ability to automatically learn and apply subject tags to each and every item in a collection, and their ability to dynamically create virtual collections on the fly. However, much remains to be done to tap this potential and empirically evaluate the true value of a given topic model to humans. In this work, we sketch out some sub-tasks that we suggest pave the way towards this goal, and present methods for assessing the coherence and interpretability of topics learned by topic models. Our large-scale user study includes over 70 human subjects evaluating and scoring almost 500 topics learned from collections from a wide range of genres and domains. We show how a scoring model, based on pointwise mutual information of word pairs using Wikipedia, Google and MEDLINE as external data sources, performs well at predicting human scores. This automated scoring of topics is an important first step towards integrating topic modeling into digital libraries.}}


 * -- align="left" valign=top
 * Nguyen, Chau Q. & Phan, Tuoi T.
 * Key phrase extraction: A hybrid assignment and extraction approach
 * 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia
 * 2009
 * 
 * {{hidden||Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and it is a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that combines assignment and extraction approaches. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrase recognition phase as well as part-of-speech (POS) tagging. We then propose a method that exploits specific characteristics of the Vietnamese language and uses the Vietnamese Wikipedia as an ontology for key phrase ambiguity resolution. Finally, we show the results of several experiments that examined the impact of the strategies chosen for Vietnamese key phrase extraction.}}


 * -- align="left" valign=top
 * Nguyen, Dong; Overwijk, Arnold; Hauff, Claudia; Trieschnigg, Dolf R. B.; Hiemstra, Djoerd & Jong, Franciska De
 * WikiTranslate: Query translation for cross-lingual information retrieval using only Wikipedia
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts, and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics formulated in Dutch, French and Spanish in an English data collection. The system achieved a performance of 67% compared to the monolingual baseline.}}


 * -- align="left" valign=top
 * Nguyen, Hien T. & Cao, Tru H.
 * Exploring wikipedia and text features for named entity disambiguation
 * 2010 Asian Conference on Intelligent Information and Database Systems, ACIIDS 2010, March 24, 2010 - March 26, 2010 Hue City, Viet Nam
 * 2010
 * 
 * {{hidden||Precisely identifying entities is essential for semantic annotation. This paper addresses the problem of named entity disambiguation, which aims at mapping entity mentions in a text onto the right entities in Wikipedia. The aim of this paper is to explore and evaluate various combinations of features extracted from Wikipedia and texts for the disambiguation task, based on a statistical ranking model of candidate entities. Through experiments, we show which combinations of features are the best choices for disambiguation.}}


 * -- align="left" valign=top
 * Nguyen, Hien T. & Cao, Tru H.
 * Named entity disambiguation on an ontology enriched by Wikipedia
 * RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies, July 13, 2008 - July 17, 2008 Ho Chi Minh City, Viet Nam
 * 2008
 * 


 * -- align="left" valign=top
 * Nguyen, Thanh C.; Le, Hai M. & Phan, Tuoi T.
 * Building knowledge base for Vietnamese information retrieval
 * 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia
 * 2009
 * 
 * {{hidden||At present, the Vietnamese knowledge base (VnKB) is one of the most important focuses of Vietnamese researchers because of its applications in wide areas such as Information Retrieval (IR) and Machine Translation (MT). There have been several separate projects developing the VnKB in various domains. Training the VnKB is the main difficulty because of the quantity and quality of training data, and the lack of an available Vietnamese corpus of acceptable quality. This paper introduces an approach which first extracts semantic information from the Vietnamese Wikipedia (vnWK) and then trains the proposed VnKB by applying support vector machine (SVM) techniques. Experiments with the proposed approach show that it is a potential solution because of its good results, and prove that it can provide further benefits when applied to our Vietnamese Semantic Information Retrieval system.}}


 * -- align="left" valign=top
 * Ochoa, Xavier & Duval, Erik
 * Measuring learning object reuse
 * 3rd European Conference on Technology Enhanced Learning, EC-TEL 2008, September 16, 2008 - September 19, 2008 Maastricht, Netherlands
 * 2008
 * 
 * {{hidden||This paper presents a quantitative analysis of the reuse of learning objects in real-world settings. The data for this analysis was obtained from three sources: Connexions' modules, university courses and presentation components. They represent the reuse of learning objects at different granularity levels. Data from other types of reusable components, such as software libraries, Wikipedia images and Web APIs, were used for comparison purposes. Finally, the paper discusses the implications of the findings for the field of learning object research.}}


 * -- align="left" valign=top
 * Oh, Jong-Hoon; Kawahara, Daisuke; Uchimoto, Kiyotaka; Kazama, Jun'ichi & Torisawa, Kentaro
 * Enriching multilingual language resources by discovering missing cross-language links in Wikipedia
 * 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia
 * 2008
 * 


 * -- align="left" valign=top
 * Ohmori, Kenji & Kunii, Tosiyasu L.
 * The mathematical structure of cyberworlds
 * 2007 International Conference on Cyberworlds, CW'07, October 24, 2007 - October 27, 2007 Hannover, Germany
 * 2007
 * 


 * -- align="left" valign=top
 * Okoli, Chitu
 * A brief review of studies of Wikipedia in peer-reviewed journals
 * 3rd International Conference on Digital Society, ICDS 2009, February 1, 2009 - February 7, 2009 Cancun, Mexico
 * 2009
 * 


 * -- align="left" valign=top
 * Okoli, Chitu
 * Information product creation through open source encyclopedias
 * ICC2009 - International Conference of Computing in Engineering, Science and Information, April 2, 2009 - April 4, 2009 Fullerton, CA, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Okoli, Chitu & Schabram, Kira
 * Protocol for a systematic literature review of research on the Wikipedia
 * 1st ACM International Conference on Management of Emergent Digital EcoSystems, MEDES '09, October 27, 2009 - October 30, 2009 Lyon, France
 * 2009
 * 


 * -- align="left" valign=top
 * Okuoka, Tomoki; Takahashi, Tomokazu; Deguchi, Daisuke; Ide, Ichiro & Murase, Hiroshi
 * Labeling news topic threads with Wikipedia entries
 * 11th IEEE International Symposium on Multimedia, ISM 2009, December 14, 2009 - December 16, 2009 San Diego, CA, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Olleros, F. Xavier
 * Learning to trust the crowd: Some lessons from Wikipedia
 * 2008 International MCETECH Conference on e-Technologies, MCETECH 2008, January 23, 2008 - January 25, 2008 Montreal, QC, Canada
 * 2008
 * 
 * {{hidden||Inspired by the open source software (OSS) movement, Wikipedia has gone further than any OSS project in decentralizing its quality control task. This is seen by many as a fatal flaw. In this short paper, I will try to show that it is rather a shrewd and fertile design choice. First, I will describe the precise way in which Wikipedia is more decentralized than OSS projects. Secondly, I will explain why Wikipedia's quality control can and must be decentralized. Thirdly, I will show why it is wise for Wikipedia to welcome anonymous amateurs. Finally, I will argue that concerns about Wikipedia's quality and sustainable success have to be tempered by the fact that, as disruptive innovations tend to do, Wikipedia is in the process of redefining the pertinent dimensions of quality and value for general encyclopedias.}}


 * -- align="left" valign=top
 * Ortega, Felipe; Gonzalez-Barahona, Jesus M. & Robles, Gregorio
 * The top-ten Wikipedias: A quantitative analysis using WikiXRay
 * 2nd International Conference on Software and Data Technologies, ICSOFT 2007, July 22, 2007 - July 25, 2007 Barcelona, Spain
 * 2007


 * -- align="left" valign=top
 * Otjacques, Benoit; Cornil, Mael & Feltz, Fernand
 * Visualizing cooperative activities with ellimaps: The case of Wikipedia
 * 6th International Conference on Cooperative Design, Visualization, and Engineering, CDVE 2009, September 20, 2009 - September 23, 2009 Luxembourg, Luxembourg
 * 2009
 * 
 * {{hidden||Cooperation has become a key word in the emerging Web 2.0 paradigm. The nature and motivations of the various behaviours related to this type of cooperative activity remain, however, incompletely understood. Information visualization tools can play a crucial role from this perspective in analysing the collected data. This paper presents a prototype for visualizing data about the Wikipedia history with a technique called ellimaps. In this context, the recent CGD algorithm is used in order to increase the scalability of the ellimaps approach.}}


 * -- align="left" valign=top
 * Overell, Simon; Sigurbjornsson, Borkur & Zwol, Roelof Van
 * Classifying tags using open content resources
 * 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain
 * 2009
 * 
 * {{hidden||Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to better understand the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and Wii.}}


 * -- align="left" valign=top
 * Ozyurt, I. Burak
 * A large margin approach to anaphora resolution for neuroscience knowledge discovery
 * 22nd International Florida Artificial Intelligence Research Society Conference, FLAIRS-22, March 19, 2009 - March 21, 2009 Sanibel Island, FL, United states
 * 2009
 * 
 * {{hidden||A discriminative large-margin-classifier-based approach to anaphora resolution for neuroscience abstracts is presented. The system employs both syntactic and semantic features. A support vector machine based word sense disambiguation method, combining evidence from three methods that use WordNet and Wikipedia, is also introduced and used for semantic features. The support vector machine anaphora resolution classifier with probabilistic outputs achieved an almost four-fold improvement in accuracy over the baseline method.}}


 * -- align="left" valign=top
 * Pablo-Sanchez, Cesar De; Martinez-Fernandez, Jose L.; Gonzalez-Ledesma, Ana; Samy, Doaa; Martinez, Paloma; Moreno-Sandoval, Antonio & Al-Jumaily, Harith
 * Combining wikipedia and newswire texts for question answering in spanish
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||This paper describes the adaptations of the MIRACLE group QA system in order to participate in the Spanish monolingual question answering task at QA@CLEF 2007. A system initially developed for the EFE collection was reused for Wikipedia. Answers from both collections were combined using temporal information extracted from questions and collections. Reusing the EFE subsystem has proven not feasible, and questions with answers only in Wikipedia have obtained low accuracy. Besides, a co-reference module based on heuristics was introduced for processing topic-related questions. This module achieves good coverage in different situations, but it is hindered by the moderate accuracy of the base system and the chaining of incorrect answers.}}


 * -- align="left" valign=top
 * Panchal, Jitesh H. & Fathianathan, Mervyn
 * Product realization in the age of mass collaboration
 * 2008 ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC 2008, August 3, 2008 - August 6, 2008 New York City, NY, United states
 * 2009


 * -- align="left" valign=top
 * Panciera, Katherine; Priedhorsky, Reid; Erickson, Thomas & Terveen, Loren
 * Lurking? Cyclopaths? A quantitative lifecycle analysis of user behavior in a geowiki
 * 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010, April 10, 2010 - April 15, 2010 Atlanta, GA, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Pang, Wenbo & Fan, Xiaozhong
 * Inducing gazetteer for Chinese named entity recognition based on local high-frequent strings
 * 2009 2nd International Conference on Future Information Technology and Management Engineering, FITME 2009, December 13, 2009 - December 14, 2009 Sanya, China
 * 2009
 * 
 * {{hidden||Gazetteers, or entity dictionaries, are important for named entity recognition (NER). Although the dictionaries extracted automatically by previous methods from a corpus, the web or Wikipedia are very large, they also miss some entities, especially domain-specific entities. We present a novel method of automatic entity dictionary induction, which is able to construct a dictionary more specific to the text being processed at a much lower computational cost than previous methods. It extracts the locally high-frequent strings in a document as candidate entities, and filters out invalid candidates using accessor variety (AV) as the entity criterion. The experiments show that the obtained dictionary can effectively improve the performance of a high-precision baseline for NER.}}


 * -- align="left" valign=top
 * Paolucci, Alessio
 * Research summary: Intelligent Natural language processing techniques and tools
 * 25th International Conference on Logic Programming, ICLP 2009, July 14, 2009 - July 17, 2009 Pasadena, CA, United states
 * 2009
 * 
 * {{hidden||My research path started with my master thesis (supervisor Prof. Stefania Costantini) about a neurobiologically-inspired proposal in the field of natural language processing. In more detail, we proposed the "Semantic Enhanced DCGs" (for short, SE-DCGs) extension to the well-known DCGs to allow for parallel syntactic and semantic analysis and to generate semantically-based descriptions of the sentence at hand. The analysis carried out through SE-DCGs was called "syntactic-semantic fully informed analysis" and was designed to be as close as possible (at least in principle) to results in the context of neuroscience that I had reviewed and studied. As a proof of concept I implemented a prototype semantic search engine, the Mnemosine system. Mnemosine is able to interact with a user in natural language and to provide contextual answers at different levels of detail. Mnemosine has been applied to a practical case study, i.e., Wikipedia Web pages. A brief overview of this work was presented during CICL 08 [1]. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Pedersen, Claus Vesterager
 * Who are the oracles - Is Web 2.0 the fulfilment of our dreams?: Host lecture at the EUSIDIC Annual Conference 11-13 March 2007 at Roskilde University
 * Information Services and Use
 * 2007


 * -- align="left" valign=top
 * Pei, Minghua; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * Constructing a global ontology by concept mapping using Wikipedia thesaurus
 * 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINA 2008, March 25, 2008 - March 28, 2008 Gino-wan, Okinawa, Japan
 * 2008
 * 
 * {{hidden||Recently, the importance of semantics on the WWW is widely recognized, and a lot of semantic information (RDF, OWL, etc.) is being built and published on the WWW. However, the lack of ontology mappings becomes a serious problem for the Semantic Web, since it needs well-defined relations to retrieve information correctly by inferring the meaning of information. One-to-one mapping is not an efficient method due to the nature of the distributed environment. Therefore, a promising method is to map the concepts by using a large-scale intermediate ontology. On the other hand, Wikipedia is a large-scale concept network covering almost all concepts in the real world. In this paper, we propose an intermediate ontology construction method using Wikipedia Thesaurus, an association thesaurus extracted from Wikipedia. Since Wikipedia Thesaurus provides associated concepts without explicit relation types, we propose an approach of concept mapping using two sub-methods: "name mapping" and "logic-based mapping".}}


 * -- align="left" valign=top
 * Pereira, Francisco; Alves, Ana; Oliveirinha, Joo & Biderman, Assaf
 * Perspectives on semantics of the place from online resources
 * ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Pilato, Giovanni; Augello, Agnese; Scriminaci, Mario; Vassallo, Giorgio & Gaglio, Salvatore
 * Sub-symbolic mapping of cyc microtheories in data-driven conceptual" spaces"
 * 11th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2007, and 17th Italian Workshop on Neural Networks, WIRN 2007, September 12, 2007 - September 14, 2007 Vietri sul Mare, Italy
 * 2007
 * {{hidden||The presented work aims to combine statistical and cognitive-oriented approaches with symbolic ones so that a conceptual similarity relationship layer can be added to a Cyc KB microtheory. Given a specific microtheory, an LSA-inspired conceptual space is inferred from a corpus of texts created using both ad hoc extracted pages from the Wikipedia repository and the built-in comments about the concepts of the specific Cyc microtheory. Each concept is projected into the conceptual space and the desired layer of sub-symbolic relationships between concepts is created. This procedure can help a user in finding the concepts that are sub-symbolically "conceptually related" to a new concept that he wants to insert in the microtheory. Experimental results involving two Cyc microtheories are also reported. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Pinkwart, Niels
 * Applying Web 2.0 design principles in the design of cooperative applications
 * 5th International Conference on Cooperative Design, Visualization, and Engineering, CDVE 2008, September 22, 2008 - September 25, 2008 Calvia, Mallorca, Spain
 * 2008
 * 
 * {{hidden||"Web 2.0" is a term frequently mentioned in the media - apparently, applications such as Wikipedia, Social Network Services, Online Shops with integrated recommender systems, or Sharing Services like flickr, all of which rely on users' activities, contributions and interactions as a central factor, are fascinating for the general public. This leads to a success of these systems that seemingly exceeds the impact of most "traditional" groupware applications that have emerged from CSCW research. This paper discusses differences and similarities between novel Web 2.0 tools and more traditional CSCW applications in terms of technologies, system design and success factors. Based on this analysis, the design of the cooperative learning application LARGO is presented to illustrate how Web 2.0 success factors can be considered in the design of cooperative environments. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Pirrone, Roberto; Pipitone, Arianna & Russo, Giuseppe
 * Semantic sense extraction from Wikipedia pages
 * 3rd International Conference on Human System Interaction, HSI'2010, May 13, 2010 - May 15, 2010 Rzeszow, Poland
 * 2010
 * 


 * -- align="left" valign=top
 * Popescu, Adrian; Borgne, Herve Le & Moellic, Pierre-Alain
 * Conceptual image retrieval over a large scale database
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||Image retrieval in large-scale databases is currently based on a textual-chain matching procedure. However, this approach requires an accurate annotation of images, which is not the case on the Web. To tackle this issue, we propose a reformulation method that reduces the influence of noisy image annotations. We extract a ranked list of related concepts for terms in the query from WordNet and Wikipedia, and use them to expand the initial query. Then some visual concepts are used to re-rank the results for queries containing, explicitly or implicitly, visual cues. First evaluations on a diversified corpus of 150,000 images were convincing, since the proposed system was ranked 4th and 2nd at the WikipediaMM task of the ImageCLEF 2008 campaign [1]. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Popescu, Adrian; Grefenstette, Gregory & Moellic, Pierre-Alain
 * Gazetiki: Automatic creation of a geographical gazetteer
 * 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08, June 16, 2008 - June 20, 2008 Pittsburgh, PA, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Prasarnphanich, Pattarawan & Wagner, Christian
 * Creating critical mass in collaboration systems: Insights from wikipedia
 * 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies, IEEE-DEST 2008, February 26, 2008 - February 29, 2008 Phitsanulok, Thailand
 * 2008
 * 


 * -- align="left" valign=top
 * Prato, Andrea & Ronchetti, Marco
 * Using Wikipedia as a reference for extracting semantic information from a text
 * 3rd International Conference on Advances in Semantic Processing - SEMAPRO 2009, October 11, 2009 - October 16, 2009 Sliema, Malta
 * 2009
 * 


 * -- align="left" valign=top
 * Preminger, Michael; Nordlie, Ragnar & Pharo, Nils
 * OUC's participation in the 2009 INEX book track
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||In this article we describe Oslo University College's participation in the INEX 2009 Book track. This year's tasks have featured complex topics containing aspects, which lend themselves to use in both the book retrieval and the focused retrieval tasks. OUC has submitted retrieval results for both tasks, focusing on using Wikipedia texts for query expansion, as well as utilizing chapter division information in (a number of) the books. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Priedhorsky, Reid; Chen, Jilin; Lam, Shyong K.; Panciera, Katherine; Terveen, Loren & Riedl, John
 * Creating, destroying, and restoring value in wikipedia
 * 2007 International ACM Conference on Supporting Group Work, GROUP'07, November 4, 2007 - November 7, 2007 Sanibel Island, FL, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Pu, Qiang; He, Daqing & Li, Qi
 * Query expansion for effective geographic information retrieval
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||We developed two methods for the monolingual GeoCLEF 2008 task. The GCEC method aims to test the effectiveness of our online geographic coordinates extraction and clustering algorithm, and the WIKIGEO method examines the usefulness of using the geographic coordinate information in Wikipedia for identifying geo-locations. We proposed a measure of topic distance to evaluate these two methods. The experimental results show that: 1) our online geographic coordinates extraction and clustering algorithm is useful for the type of locations that do not have clear corresponding coordinates; 2) the expansion based on the geo-locations generated by GCEC is effective in improving geographic retrieval; 3) Wikipedia can help in finding the coordinates for many geo-locations, but its usage for query expansion still needs further study; 4) query expansion based on the title only obtained better results than that on the title and narrative parts, even though the latter contains more related geographic information. Further study is needed for this part. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Puttaswamy, Krishna P.N.; Marshall, Catherine C.; Ramasubramanian, Venugopalan; Stuedi, Patrick; Terry, Douglas B. & Wobber, Ted
 * Docx2Go: Collaborative editing of fidelity reduced documents on mobile devices
 * 8th Annual International Conference on Mobile Systems, Applications and Services, MobiSys 2010, June 15, 2010 - June 18, 2010 San Francisco, CA, United states
 * 2010
 * 
 * {{hidden||Docx2Go is a new framework to support editing of shared documents on mobile devices. Three high-level requirements influenced its design: the need to adapt content, especially textual content, on the fly according to the quality of the network connection and the form factor of each device; support for concurrent, uncoordinated editing on different devices, whose effects will later be merged on all devices in a convergent and consistent manner without sacrificing the semantics of the edits; and a flexible replication architecture that accommodates both device-to-device and cloud-mediated synchronization. Docx2Go supports on-the-go editing for XML documents, such as documents in Microsoft Word and other commonly used formats. It combines the best practices from content adaptation systems, weakly consistent replication systems, and collaborative editing systems, while extending the state of the art in each of these fields. The implementation of Docx2Go has been evaluated based on a workload drawn from Wikipedia.}}


 * -- align="left" valign=top
 * Qiu, Qiang; Zhang, Yang; Zhu, Junping & Qu, Wei
 * Building a text classifier by a keyword and Wikipedia knowledge
 * 5th International Conference on Advanced Data Mining and Applications, ADMA 2009, August 17, 2009 - August 19, 2009 Beijing, China
 * 2009
 * 
 * {{hidden||Traditional approaches to building text classifiers usually require a lot of labeled documents, which are expensive to obtain. In this paper, we propose a new text classification approach based on a keyword and Wikipedia knowledge, so as to avoid labeling documents manually. Firstly, we retrieve a set of related documents about the keyword from Wikipedia. Then, with the help of the related Wikipedia pages, more positive documents are extracted from the unlabeled documents. Finally, we train a text classifier with these positive documents and unlabeled documents. The experimental results on the 20Newsgroup dataset show that the proposed approach performs very competitively compared with NB-SVM, a PU learner, and NB, a supervised learner. 2009 Springer.}}


 * -- align="left" valign=top
 * Ramanathan, Madhu; Rajagopal, Srikant; Karthik, Venkatesh; Murugeshan, Meenakshi Sundaram & Mukherjee, Saswati
 * A recursive approach to entity ranking and list completion using entity determining terms, qualifiers and prominent n-grams
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||This paper presents our approach for the INEX 2009 Entity Ranking track, which consists of two subtasks, viz. Entity Ranking and List Completion. Retrieving the correct entities according to the user query is a three-step process: extracting the required information from the query and the provided categories; extracting the relevant documents, which may be either prospective entities or intermediate pointers to prospective entities, by making use of the structure available in the Wikipedia corpus; and finally ranking the resultant set of documents. We have extracted the Entity Determining Terms (EDTs), Qualifiers and prominent n-grams from the query, strategically exploited the relation between the extracted terms and the structure and connectedness of the corpus to retrieve links which are highly probable of being entities, and then used a recursive mechanism for retrieving relevant documents through Lucene search. Our ranking mechanism combines various approaches that make use of category information, links, titles and WordNet information, the initial description and the text of the document. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Ramezani, Maryam & Witschel, Hans Friedrich
 * An intelligent system for semi-automatic evolution of ontologies
 * 2010 IEEE International Conference on Intelligent Systems, IS 2010, July 7, 2010 - July 9, 2010 London, United kingdom
 * 2010
 * 


 * -- align="left" valign=top
 * Ramirez, Alex; Ji, Shaobo; Riordan, Rob; Ulbrich, Frank & Hine, Michael J.
 * Empowering business students: Using Web 2.0 tools in the classroom
 * 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain
 * 2010
 * {{hidden||This paper discusses the design of a course to empower business students using Web 2.0 technologies. We explore the learning phenomenon as a way to bring forward a process of continuous improvement supported by social software. We develop a framework to assess the infrastructure against expectations of skill proficiency using Web 2.0 tools, which must emerge as a result of registering in an introductory business information and communication technologies (ICT) course in a business school of a Canadian university. We use Friedman's (2007) thesis that "the world is flat" to discuss issues of globalization and the role of ICT. Students registered in the course are familiar with some of the tools we introduce and use in the course. The students are members of Facebook or MySpace, regularly check YouTube, and use Wikipedia in their studies. They use these tools to socialize. We broaden the students' horizons, explore the potential business benefits of such tools, and empower the students to use Web 2.0 technologies within a business context.}}


 * -- align="left" valign=top
 * Rao, Weixiong; Fu, Ada Wai-Chee; Chen, Lei & Chen, Hanhua
 * Stairs: Towards efficient full-text filtering and dissemination in a DHT environment
 * 25th IEEE International Conference on Data Engineering, ICDE 2009, March 29, 2009 - April 2, 2009 Shanghai, China
 * 2009
 * 
 * {{hidden||Nowadays, contents on the Internet such as weblogs, Wikipedia and news sites have become "live". How to notify users and provide them with the relevant contents becomes a challenge. Unlike conventional Web search technology or RSS feeds, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT). Users can subscribe to contents of interest by specifying terms and threshold values for filtering. Published contents are then disseminated to the associated subscribers. We propose a novel and simple framework of filter registration and content publication, STAIRS. Within this framework, we propose three algorithms (default forwarding, dynamic forwarding, and adaptive forwarding) to reduce the forwarding cost and false dismissal rate; meanwhile, the subscriber can receive the desired contents with no duplicates. In particular, adaptive forwarding utilizes the filter information to significantly reduce the forwarding cost. Experiments based on two real query logs and two real datasets show the effectiveness of the proposed framework.}}


 * -- align="left" valign=top
 * Ray, Santosh Kumar; Singh, Shailendra & Joshi, B.P.
 * World wide web based question answering system - A relevance feedback framework for automatic answer validation
 * 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom
 * 2009
 * 


 * -- align="left" valign=top
 * Razmara, Majid & Kosseim, Leila
 * A little known fact is... Answering other questions using interest-markers
 * 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007, February 18, 2007 - February 24, 2007 Mexico City, Mexico
 * 2007
 * {{hidden||In this paper, we present an approach to answering "Other" questions using the notion of interest-marking terms. "Other" questions have been introduced in the TREC-QA track to retrieve other interesting facts about a topic. To answer these types of questions, our system extracts from Wikipedia articles a list of interest-marking terms related to the topic and uses them to extract and score sentences from the document collection where the answer should be found. Sentences are then re-ranked using universal interest markers that are not specific to the topic. The top sentences are then returned as possible answers. When using the 2004 TREC data for development and the 2005 data for testing, the approach achieved an F-score of 0.265, placing it among the top systems. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Reinoso, Antonio J.; Gonzalez-Barahona, Jesus M.; Robles, Gregorio & Ortega, Felipe
 * A quantitative approach to the use of the wikipedia
 * IEEE Symposium on Computers and Communications 2009, ISCC 2009, July 5, 2009 - July 8, 2009 Sousse, Tunisia
 * 2009
 * 


 * -- align="left" valign=top
 * Ren, Reede; Misra, Hemant & Jose, Joemon M.
 * Semantic based adaptive movie summarisation
 * 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China
 * 2009
 * 
 * {{hidden||This paper proposes a framework for automatic video summarization by exploiting internal and external textual descriptions. The web knowledge base Wikipedia is used as a middle media layer, which bridges the gap between general user descriptions and exact film subtitles. Latent Dirichlet Allocation (LDA) detects as well as matches the distribution of content topics in Wikipedia items and movie subtitles. A saliency-based summarization system then selects perceptually attractive segments from each content topic for summary composition. The evaluation collection consists of six English movies, and a high topic coverage is shown over official trailers from the Internet Movie Database. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Riche, Nathalie Henry; Lee, Bongshin & Chevalier, Fanny
 * IChase: Supporting exploration and awareness of editing activities on Wikipedia
 * International Conference on Advanced Visual Interfaces, AVI '10, May 26, 2010 - May 28, 2010 Rome, Italy
 * 2010
 * 
 * {{hidden||To increase its credibility and preserve the trust of its readers, Wikipedia needs to ensure a good quality of its articles. To that end, it is critical for Wikipedia administrators to be aware of contributors' editing activity to monitor vandalism, encourage reliable contributors to work on specific articles, or find mentors for new contributors. In this paper, we present IChase, a novel interactive visualization tool to provide administrators with better awareness of editing activities on Wikipedia. Unlike the currently used visualizations that provide only page-centric information, IChase visualizes the trend of activities for two entity types, articles and contributors. IChase is based on two heatmaps (one for each entity type) synchronized to one timeline. It allows users to interactively explore the history of changes by drilling down into specific articles, contributors, or time points to access the details of the changes. We also present a case study to illustrate how IChase can be used to monitor editing activities of Wikipedia authors, as well as a usability study. We conclude by discussing the strengths and weaknesses of IChase.}}


 * -- align="left" valign=top
 * Riedl, John
 * Altruism, selfishness, and destructiveness on the social web
 * 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, AH 2008, July 29, 2008 - August 1, 2008 Hannover, Germany
 * 2008
 * 
 * {{hidden||Many online communities are emerging that, like Wikipedia, bring people together to build community-maintained artifacts of lasting value (CALVs). What is the nature of people's participation in building these repositories? What are their motives? In what ways is their behavior destructive instead of constructive? Motivating people to contribute is a key problem because the quantity and quality of contributions ultimately determine a CALV's value. We pose three related research questions: 1) How does intelligent task routing (matching people with work) affect the quantity of contributions? 2) How does reviewing contributions before accepting them affect the quality of contributions? 3) How do recommender systems affect the evolution of a shared tagging vocabulary among the contributors? We will explore these questions in the context of existing CALVs, including Wikipedia, Facebook, and MovieLens. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Roger, Sandra; Vila, Katia; Ferrandez, Antonio; Pardino, Maria; Gomez, Jose Manuel; Puchol-Blasco, Marcel & Peral, Jesus
 * Using AliQAn in monolingual QA@CLEF 2008
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||This paper describes the participation of the AliQAn system in the CLEF 2008 Spanish monolingual QA task. This time, the main goals of the current version of AliQAn were to deal with topic-related questions and to decrease the number of inexact answers. We have also explored the use of the Wikipedia corpora, which have posed some new challenges for the QA task. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Roth, Benjamin & Klakow, Dietrich
 * Combining wikipedia-based concept models for cross-language retrieval
 * 1st Information Retrieval Facility Conference, IRFC 2010, May 31, 2010 - May 31, 2010 Vienna, Austria
 * 2010
 * 
 * {{hidden||As a low-cost resource that is up-to-date, Wikipedia has recently gained attention as a means to provide cross-language bridging for information retrieval. Contrary to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR by simply normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can significantly contribute to retrieval based on machine translation, especially when query translations contain errors. The experiments were performed on the Multext JOC corpus and a CLEF dataset.}}


 * -- align="left" valign=top
 * Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo
 * Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets
 * Third International Atlantic Web Intelligence Conference on Advances in Web Intelligence, AWIC 2005, June 6, 2005 - June 9, 2005 Lodz, Poland
 * 2005
 * {{hidden||We describe an approach for automatically associating entries from an on-line encyclopedia with concepts in an ontology or a lexical semantic network. It has been tested with the Simple English Wikipedia and WordNet, although it can be used with other resources. The accuracy in disambiguating the sense of the encyclopedia entries reaches 91.11% (83.89% for polysemous words). It will be applied to enriching ontologies with encyclopedic knowledge. Springer-Verlag Berlin Heidelberg 2005.}}


 * -- align="left" valign=top
 * Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo
 * Automatic extraction of semantic relationships for wordNet by means of pattern learning from wikipedia
 * 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005: Natural Language Processing and Information Systems, June 15, 2005 - June 17, 2005 Alicante, Spain
 * 2005
 * {{hidden||This paper describes an automatic approach to identify lexical patterns which represent semantic relationships between concepts in an on-line encyclopedia. These patterns can then be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 1200 new relationships that did not appear in WordNet originally. The precision of these relationships ranges between 0.61 and 0.69, depending on the relation. Springer-Verlag Berlin Heidelberg 2005.}}


 * -- align="left" valign=top
 * Sabin, Mihaela & Leone, Jim
 * IT education 2.0
 * 10th ACM Special Interest Group for Information Technology Education, SIGITE 2009, October 22, 2009 - October 24, 2009 Fairfax, VA, United states
 * 2009
 * 
 * {{hidden||Today's networked computing and communications technologies have changed how information, knowledge, and culture are produced and exchanged. People around the world join online communities that are set up voluntarily and use their members' collaborative participation to solve problems, share interests, raise awareness, or simply establish social connections. Two online community examples with significant economic and cultural impact are the open source software movement and Wikipedia. The technological infrastructure of these peer production models uses current Web 2.0 tools, such as wikis, blogs, social networking, semantic tagging, and RSS feeds. With no control exercised by property-based markets or managerial hierarchies, commons-based peer production systems contribute to and serve the public domain and public good. The body of cultural, educational, and scientific work of many online communities is made available to the public for free and legal sharing, use, repurposing, and remixing. Higher education's receptiveness to these transformative trends deserves close examination. In the case of the Information Technology (IT) education community, in particular, we note that the curricular content, research questions, and professional skills the IT discipline encompasses have direct linkages with the Web 2.0 phenomenon. For that reason, IT academic programs should pioneer and lead efforts to cultivate peer production online communities. We state the case that free access and open engagement facilitated by technological infrastructures that support a peer production model benefit IT education. We advocate that these technologies be employed to strengthen IT educational programs, advance IT research, and revitalize the IT education community.}}


 * -- align="left" valign=top
 * Sacarea, C.; Meza, R. & Cimpoi, M.
 * Improving conceptual search results reorganization using term-concept mappings retrieved from wikipedia
 * 2008 IEEE International Conference on Automation, Quality and Testing, Robotics, AQTR 2008 - THETA 16th Edition, May 22, 2008 - May 25, 2008 Cluj-Napoca, Romania
 * 2008
 * 


 * -- align="left" valign=top
 * Safarkhani, Banafsheh; Mohsenzadeh, Mehran & Rahmani, Amir Masoud
 * Improving website user model automatically using a comprehensive lexical semantic resource
 * 2009 International Conference on E-Business and Information System Security, EBISS 2009, May 23, 2009 - May 24, 2009 Wuhan, China
 * 2009
 * 
 * {{hidden||A major component in any web personalization system is its user model. Recently, a number of studies have tried to incorporate the semantics of a web site into the representation of its users. All of these efforts use either a specific manually constructed taxonomy or ontology, or a general-purpose one like WordNet, to map page views into semantic elements. However, building a hierarchy of concepts manually is time consuming and expensive. On the other hand, general-purpose resources suffer from low coverage of domain-specific terms. In this paper we intend to address both these shortcomings. Our contribution is that we introduce a mechanism to automatically improve the representation of the user in the website using a comprehensive lexical semantic resource. We utilize Wikipedia, the largest encyclopedia to date, as a rich lexical resource to enhance the automatic construction of a vector model representation of user interests. We evaluate the effectiveness of the resulting model using concepts extracted from this promising resource.}}


 * -- align="left" valign=top
 * Safarkhani, Banafsheh; Talabeigi, Mojde; Mohsenzadeh, Mehran & Meybodi, Mohammad Reza
 * Deriving semantic sessions from semantic clusters
 * 2009 International Conference on Information Management and Engineering, ICIME 2009, April 3, 2009 - April 5, 2009 Kuala Lumpur, Malaysia
 * 2009
 * 


 * -- align="left" valign=top
 * Saito, Kazumi; Kimura, Masahiro & Motoda, Hiroshi
 * Discovering influential nodes for SIS models in social networks
 * 12th International Conference on Discovery Science, DS 2009, October 3, 2009 - October 5, 2009 Porto, Portugal
 * 2009
 * 


 * -- align="left" valign=top
 * Sallaberry, Arnaud; Zaidi, Faraz; Pich, Christian & Melancon, Guy
 * Interactive visualization and navigation of web search results revealing community structures and bridges
 * 36th Graphics Interface Conference, GI 2010, May 31, 2010 - June 2, 2010 Ottawa, ON, Canada
 * 2010


 * -- align="left" valign=top
 * Santos, Diana & Cardoso, Nuno
 * GikiP: Evaluating geographical answers from wikipedia
 * 5th Workshop on Geographic Information Retrieval, GIR'08, Co-located with the ACM 17th Conference on Information and Knowledge Management, CIKM 2008, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states
 * 2008
 * 
 * {{hidden||This paper describes GikiP, a pilot task that took place in 2008 in CLEF. We present the motivation behind GikiP and the use of Wikipedia as the evaluation collection, detail the task, and list new ideas for its continuation.}}


 * -- align="left" valign=top
 * Santos, Diana; Cardoso, Nuno; Carvalho, Paula; Dornescu, Iustin; Hartrumpf, Sven; Leveling, Johannes & Skalban, Yvonne
 * GikiP at geoCLEF 2008: Joining GIR and QA forces for querying wikipedia
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||This paper reports on the GikiP pilot that took place in 2008 in GeoCLEF. This pilot task requires a combination of methods from geographical information retrieval and question answering to answer queries to Wikipedia. We start with the task description, providing details on topic choice and evaluation measures. Then we offer a brief motivation from several perspectives, and we present results in detail. A comparison of participants' approaches is then presented, and the paper concludes with improvements for the next edition. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Sarrafzadeh, Bahareh & Shamsfard, Mehrnoush
 * Parallel annotation and population: A cross-language experience
 * Proceedings - 2009 International Conference on Computer Engineering and Technology, ICCET 2009
 * 2009
 * 
 * {{hidden||In recent years, automatic Ontology Population (OP) from texts has emerged as a new field of application for knowledge acquisition techniques. In OP, instances of ontology classes are extracted from text and added under the ontology concepts. On the other hand, semantic annotation, which is a key task in moving toward the semantic web, tries to tag instance data in a text with their corresponding ontology classes; the ontology population activity therefore usually accompanies the generation of semantic annotations. In this paper we introduce a cross-lingual population/annotation system called POPTA which annotates Persian texts according to an English lexicalized ontology and populates the English ontology according to the input Persian texts. It exploits a hybrid approach: a combination of statistical and pattern-based methods, as well as techniques founded on the web and search engines, and a novel method of resolving translation ambiguities. POPTA also uses Wikipedia as a vast natural-language encyclopedia to extract new instances to populate the input ontology.}}


 * -- align="left" valign=top
 * Sawaki, M.; Minami, Y.; Higashinaka, R.; Dohsaka, K. & Maeda, E.
 * "Who is this" quiz dialogue system and users' evaluation
 * 2008 IEEE Workshop on Spoken Language Technology, SLT 2008, December 15, 2008 - December 19, 2008 Goa, India
 * 2008
 * 
 * {{hidden||In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's actions on users. This paper describes the "Who is this" quiz dialogue system and its evaluation by users. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed toy (or its CG equivalent). Quizzes are automatically generated from Wikipedia articles rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of the person network and the relationship between user attributes and interest level.}}


 * -- align="left" valign=top
 * Scardino, Giuseppe; Infantino, Ignazio & Gaglio, Salvatore
 * Automated object shape modelling by clustering of web images
 * 3rd International Conference on Computer Vision Theory and Applications, VISAPP 2008, January 22, 2008 - January 25, 2008 Funchal, Madeira, Portugal
 * 2008


 * -- align="left" valign=top
 * Scarpazza, Daniele Paolo & Braudaway, Gordon W.
 * Workload characterization and optimization of high-performance text indexing on the cell broadband enginetm (Cell/B.E.)
 * 2009 IEEE International Symposium on Workload Characterization, IISWC 2009, October 4, 2009 - October 6, 2009 Austin, TX, United states
 * 2009
 * 
 * {{hidden||In this paper we examine text indexing on the Cell Broadband Engine (Cell/B.E.), an emerging workload on an emerging multicore architecture. The Cell Broadband Engine is a microprocessor jointly developed by Sony Computer Entertainment, Toshiba, and IBM (herein, we refer to it simply as the "Cell"). The importance of text indexing is growing not only because it is the core task of commercial and enterprise-level search engines, but also because it appears more and more frequently in desktop and mobile applications and on network appliances. Text indexing is a computationally intensive task. Multi-core processors promise a multiplicative increase in compute power, but this power is fully available only if workloads exhibit the right amount and kind of parallelism. We present the challenges and the results of mapping text indexing tasks to the Cell processor. The Cell has become known as a platform capable of impressive performance, but only when algorithms have been parallelized with attention paid to its hardware peculiarities (expensive branching, wide SIMD units, small local memories). We propose a parallel software design that provides essential text indexing features at a high throughput (161 Mbyte/s per chip on Wikipedia inputs) and we present a performance analysis that details the resources absorbed by each subtask. Not only does this result affect traditional applications, but it also enables new ones, such as live network traffic indexing for security forensics, until now believed to be too computationally demanding to be performed in real time. We conclude that, at the cost of a radical algorithmic redesign, our Cell-based solution delivers a 4x performance advantage over a recent commodity machine like the Intel Q6600. In a per-chip comparison, ours is the fastest text indexer that we are aware of.}}


 * -- align="left" valign=top
 * Scheau, Cristina; Rebedea, Traian; Chiru, Costin & Trausan-Matu, Stefan
 * Improving the relevance of search engine results by using semantic information from Wikipedia
 * 9th RoEduNet IEEE International Conference, RoEduNet 2010, June 24, 2010 - June 26, 2010 Sibiu, Romania
 * 2010


 * -- align="left" valign=top
 * Schonberg, Christian; Pree, Helmuth & Freitag, Burkhard
 * Rich ontology extraction and Wikipedia expansion using language resources
 * 11th International Conference on Web-Age Information Management, WAIM 2010, July 15, 2010 - July 17, 2010 Jiuzhaigou, China
 * 2010
 * 


 * -- align="left" valign=top
 * Schonhofen, Peter
 * Identifying document topics using the Wikipedia category network
 * Web Intelligence and Agent Systems
 * 2009
 * 
 * {{hidden||In the last few years the size and coverage of Wikipedia, a community-edited, freely available on-line encyclopedia, has reached the point where it can be effectively used to identify topics discussed in a document, similarly to an ontology or taxonomy. In this paper we will show that even a fairly simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting categories of Wikipedia articles themselves based on their bodies, and also by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of (or in addition to) their texts. 2009 IOS Press.}}


 * -- align="left" valign=top
 * Schonhofen, Peter
 * Annotating documents by Wikipedia concepts
 * 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia
 * 2008
 * 


 * -- align="left" valign=top
 * Schonhofen, Peter; Benczur, Andras; Biro, Istvan & Csalogany, Karoly
 * Cross-language retrieval with Wikipedia
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation; we combine the Wikipedia-based technique with a method based on bigram statistics of pairs formed by translations of different source language terms. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Shahid, Ahmad R. & Kazakov, Dimitar
 * Automatic multilingual lexicon generation using Wikipedia as a resource
 * 1st International Conference on Agents and Artificial Intelligence, ICAART 2009, January 19, 2009 - January 21, 2009 Porto, Portugal
 * 2009
 * {{hidden||This paper proposes a method for creating a multilingual dictionary by taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages. The creation of such multilingual dictionaries has become possible as a result of the exponential increase in the size of multilingual information on the web. Wikipedia is a prime example of such a multilingual source of information on any conceivable topic in the world, which is edited by its readers. Here, a web crawler has been used to traverse Wikipedia following the links on a given page. The crawler takes out the title along with the titles of the corresponding pages in other targeted languages. The result is a set of words and phrases that are translations of each other. For efficiency, the URLs are organized using hash tables. A lexicon has been constructed which contains 7-tuples corresponding to 7 different languages, namely: English, German, French, Polish, Bulgarian, Greek and Chinese.}}


 * -- align="left" valign=top
 * Shilman, Michael
 * Aggregate documents: Making sense of a patchwork of topical documents
 * 8th ACM Symposium on Document Engineering, DocEng 2008, September 16, 2008 - September 19, 2008 Sao Paulo, Brazil
 * 2008
 * 


 * -- align="left" valign=top
 * Shiozaki, Hitohiro & Eguchi, Koji
 * Entity ranking from annotated text collections using multitype topic models
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in a language modeling framework. However, for the task of retrieving annotated documents when using the LDA-based methods, some post-processing is required outside the model in order to make use of multiple word types that are specified by the annotations. In this paper, we explore new retrieval methods using a 'multitype topic model' that can directly handle multiple word types, such as annotated entities, category labels and other words that are typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from an annotated collection, and show the effectiveness of our methods through experiments on entity ranking using a Wikipedia collection. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * Concept vector extraction from Wikipedia category network
 * 3rd International Conference on Ubiquitous Information Management and Communication, ICUIMC'09, January 15, 2009 - January 16, 2009 Suwon, Korea, Republic of
 * 2009
 * 
 * {{hidden||The availability of machine-readable taxonomies has been demonstrated by various applications such as document classification and information retrieval. One of the main topics of automated taxonomy extraction research is Web mining based on statistical NLP, and a significant number of studies have been conducted. However, existing works on automatic dictionary building have accuracy problems due to the technical limitations of statistical NLP (Natural Language Processing) and noise data on the WWW. To solve these problems, in this work, we focus on mining Wikipedia, a large-scale Web encyclopedia. Wikipedia has high-quality and huge-scale articles and a category system because many users around the world have edited and refined these articles and the category system daily. Using Wikipedia, the decrease in accuracy deriving from NLP can be avoided. However, affiliation relations cannot be extracted by simply descending the category system automatically, since the category system in Wikipedia is not a tree structure but a network structure. We propose concept vectorization methods which are applicable to the category network structure in Wikipedia.}}


 * -- align="left" valign=top
 * Siira, Erkki; Tuikka, Tuomo & Tormanen, Vili
 * Location-based mobile wiki using NFC tag infrastructure
 * 2009 1st International Workshop on Near Field Communication, NFC 2009, February 24, 2009 - February 24, 2009 Hagenberg, Austria
 * 2009
 * 
 * {{hidden||Wikipedia is a widely known encyclopedia on the web updated by volunteers around the world. A mobile and location-based wiki with NFC, however, brings forward the idea of using Near Field Communication tags as an enabler for seeking information content from a wiki. In this paper we briefly address how an NFC infrastructure can be created in a city for the use of a location-based wiki. The users of the system can read local information from the Wikipedia system and also update the location-based content. We present an implementation of such a system. Finally, we evaluate the restrictions of the technological system, and delineate further work.}}


 * -- align="left" valign=top
 * Silva, Lalindra De & Jayaratne, Lakshman
 * Semi-automatic extraction and modeling of ontologies using Wikipedia XML corpus
 * 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom
 * 2009
 * 
 * {{hidden||This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus derived from Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling, and also inspire further research on extracting ontologies from other semi-structured document corpora as well.}}


 * -- align="left" valign=top
 * Silva, Lalindra De & Jayaratne, Lakshman
 * WikiOnto: A system for semi-automatic extraction and modeling of ontologies using Wikipedia XML corpus
 * ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states
 * 2009
 * 
 * {{hidden||This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus of one of the largest knowledge bases in the world, Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling, and also inspire further research on extracting ontologies from other semi-structured document corpora as well.}}


 * -- align="left" valign=top
 * Sipo, Ruben; Bhole, Abhijit; Fortuna, Blaz; Grobelnik, Marko & Mladenic, Dunja
 * Demo: Historyviz - Visualizing events and relations extracted from Wikipedia
 * 6th European Semantic Web Conference, ESWC 2009, May 31, 2009 - June 4, 2009 Heraklion, Crete, Greece
 * 2009
 * 


 * -- align="left" valign=top
 * Sjobergh, Jonas; Sjobergh, Olof & Araki, Kenji
 * What types of translations hide in Wikipedia?
 * 3rd International Conference on Large-Scale Knowledge Resources, LKR 2008, March 3, 2008 - March 5, 2008 Tokyo, Japan
 * 2008
 * 
 * {{hidden||We extend an automatically generated bilingual Japanese-Swedish dictionary with new translations, automatically discovered from the multi-lingual online encyclopedia Wikipedia. Over 50,000 translations, most of which are not present in the original dictionary, are generated, with very high translation quality. We analyze what types of translations can be generated by this simple method. The majority of the words are proper nouns, and other types of (usually) uninteresting translations are also generated. Not counting the less interesting words, about 15,000 new translations are still found. Checking against logs of search queries from the old dictionary shows that the new translations would significantly reduce the number of searches with no matching translation. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Slattery, Shaun
 * "Edit this page": The socio-technological infrastructure of a Wikipedia article
 * 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Sluis, Frans Van Der & Broek, Egon L. Van Den
 * Using complexity measures in Information Retrieval
 * 3rd Information Interaction in Context Symposium, IIiX'10, August 18, 2010 - August 21, 2010 New Brunswick, NJ, United states
 * 2010
 * 
 * {{hidden||Although Information Retrieval (IR) is meant to serve its users, surprisingly little IR research is user-centered. In contrast, this article utilizes the concept of complexity of information as the determinant of the user's comprehension, not as a formal golden measure. Four aspects of the user's comprehension are applied to a database of simple and normal Wikipedia articles and found to distinguish between them. The results underline the feasibility of the principle of parsimony for IR: where two topical articles are available, the simpler one is preferred.}}


 * -- align="left" valign=top
 * Smirnov, Alexander V. & Krizhanovsky, Andrew A.
 * Information filtering based on wiki index database
 * Computational Intelligence in Decision and Control - 8th International FLINS Conference, September 21, 2008 - September 24, 2008 Madrid, Spain
 * 2008


 * -- align="left" valign=top
 * Sood, Sara Owsley & Vasserman, Lucy
 * ESSE: Exploring mood on the web
 * 2009 ICWSM Workshop, May 20, 2009 - May 20, 2009 San Jose, CA, United states
 * 2009
 * {{hidden||Future machines will connect with users on an emotional level in addition to performing complex computations (Norman 2004). In this article, we present a system that adds an emotional dimension to an activity that Internet users engage in frequently: search. ESSE, which stands for Emotional State Search Engine, is a web search engine that goes beyond facilitating a user's exploration of the web by topic, as search engines such as Google or Yahoo! afford. Rather, it enables the user to browse their topically relevant search results by mood, providing the user with a unique perspective on the topic at hand. Consider a user wishing to read opinions about the new president of the United States. Typing "President Obama" into a Google search box will return (among other results) a few recent news stories about Obama, the White House's website, as well as a Wikipedia article about him. Typing "President Obama" into a Google Blog Search box will bring the user a bit closer to their goal, in that all of the results are indeed blogs (typically opinions) about Obama. However, where blog search engines fall short is in providing users with a way to navigate and digest the vastness of the blogosphere: the incredible number of results for the query "President Obama" (approximately 17,335,307 as of 2/24/09) (Google Blog Search 2009). ESSE provides another dimension by which users can take in the vastness of the web or the blogosphere. This article outlines the contributions of ESSE, including a new approach to mood classification. Copyright 2009 Association for the Advancement of Artificial Intelligence (www.aaai.org).}}


 * -- align="left" valign=top
 * Suh, Bongwon; Chi, Ed H.; Kittur, Aniket & Pendleton, Bryan A.
 * Lifting the veil: Improving accountability and social transparency in Wikipedia with WikiDashboard
 * 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, April 5, 2008 - April 10, 2008 Florence, Italy
 * 2008
 * 
 * {{hidden||Wikis are collaborative systems in which virtually anyone can edit anything. Although wikis have become highly popular in many domains, their mutable nature often leads them to be distrusted as a reliable source of information. Here we describe a social dynamic analysis tool called WikiDashboard, which aims to improve social transparency and accountability on Wikipedia articles. Early reactions from users suggest that the increased transparency afforded by the tool can improve the interpretation, communication, and trustworthiness of Wikipedia articles.}}


 * -- align="left" valign=top
 * Suh, Bongwon; Chi, Ed H.; Pendleton, Bryan A. & Kittur, Aniket
 * Us vs. Them: Understanding social dynamics in wikipedia with revert graph visualizations
 * IEEE Symposium on Visual Analytics Science and Technology (VAST) 2007, October 30, 2007 - November 1, 2007 Sacramento, CA, United States
 * 2007
 * 


 * -- align="left" valign=top
 * Swarts, Jason
 * The collaborative construction of 'fact' on wikipedia
 * 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Szomszor, Martin; Alani, Harith; Cantador, Ivan; O'Hara, Kieron & Shadbolt, Nigel
 * Semantic modelling of user interests based on cross-folksonomy analysis
 * 7th International Semantic Web Conference, ISWC 2008, October 26, 2008 - October 30, 2008 Karlsruhe, Germany
 * 2008
 * 


 * -- align="left" valign=top
 * Szymanski, Julian
 * Mining relations between wikipedia categories
 * 2nd International Conference on 'Networked Digital Technologies', NDT 2010, July 7, 2010 - July 9, 2010 Prague, Czech Republic
 * 2010
 * 
 * {{hidden||The paper concerns the problem of automatic category system creation for a set of documents connected by references. The presented approach has been evaluated on the Polish Wikipedia, where two graphs, the Wikipedia category graph and the article graph, have been analyzed. The linkages between Wikipedia articles have been used to create a new category graph with weighted edges. We compare the created category graph with the original Wikipedia category graph, testing its quality in terms of coverage. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Szymanski, Julian
 * WordVenture - Cooperative WordNet editor: Architecture for lexical semantic acquisition
 * 1st International Conference on Knowledge Engineering and Ontology Development, KEOD 2009, October 6, 2009 - October 8, 2009 Funchal, Madeira, Portugal
 * 2009


 * -- align="left" valign=top
 * Tan, Saravadee Sae; Kong, Tang Enya & Sodhy, Gian Chand
 * Annotating wikipedia articles with semantic tags for structured retrieval
 * 2nd ACM Workshop on Social Web Search and Mining, SWSM'09, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 


 * -- align="left" valign=top
 * Taneva, Bilyana; Kacimi, Mouna & Weikum, Gerhard
 * Gathering and ranking photos of named entities with high precision, high recall, and diversity
 * 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010, February 3, 2010 - February 6, 2010 New York City, NY, United States
 * 2010
 * 
 * {{hidden||Knowledge-sharing communities like Wikipedia and automated extraction methods like those of DBpedia enable the construction of large machine-processible knowledge bases with relational facts about entities. These endeavors lack multimodal data like photos and videos of people and places. While photos of famous entities are abundant on the Internet, they are much harder to retrieve for less popular entities such as notable computer scientists or regionally interesting churches. Querying the entity names in image search engines yields large candidate lists, but they often have low precision and unsatisfactory recall. Our goal is to populate a knowledge base with photos of named entities, with high precision, high recall, and diversity of photos for a given entity. We harness relational facts about entities for generating expanded queries to retrieve different candidate lists from image search engines. We use a weighted voting method to determine better rankings of an entity's photos. Appropriate weights are dependent on the type of entity (e.g., scientist vs. politician) and automatically computed from a small set of training entities. We also exploit visual similarity measures based on SIFT features, for higher diversity in the final rankings. Our experiments with photos of persons and landmarks show significant improvements of ranking measures like MAP and NDCG, and also for diversity-aware ranking.}}


 * -- align="left" valign=top
 * Tellez, Alberto; Juarez, Antonio; Hernandez, Gustavo; Denicia, Claudia; Villatoro, Esau; Montes, Manuel & Villasenor, Luis
 * A lexical approach for Spanish question answering
 * 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary
 * 2008
 * 
 * {{hidden||This paper discusses our system's results at the Spanish Question Answering task of CLEF 2007. Our system is centered on a fully data-driven approach that combines information retrieval and machine learning techniques. It mainly relies on the use of lexical information and avoids any complex language processing procedure. Evaluation results indicate that this approach is very effective for answering definition questions from Wikipedia. In contrast, they also reveal that it is very difficult to answer factoid questions from this resource solely based on the use of lexical overlaps and redundancy. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Theng, Yin-Leng; Li, Yuanyuan; Lim, Ee-Peng; Wang, Zhe; Goh, Dion Hoe-Lian; Chang, Chew-Hung; Chatterjea, Kalyani & Zhang, Jun
 * Understanding user perceptions on usefulness and usability of an integrated Wiki-G-Portal
 * 9th International Conference on Asian Digital Libraries, ICADL 2006, November 27, 2006 - November 30, 2006 Kyoto, Japan
 * 2006
 * {{hidden||This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study seem to suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as subjects' attitude and intention to use it. Springer-Verlag Berlin Heidelberg 2006.}}


 * -- align="left" valign=top
 * Thomas, Christopher; Mehra, Pankaj; Brooks, Roger & Sheth, Amit
 * Growing fields of interest using an expand and reduce strategy for domain model extraction
 * 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia
 * 2008
 * 
 * {{hidden||Domain hierarchies are widely used as models underlying information retrieval tasks. Formal ontologies and taxonomies enrich such hierarchies further with properties and relationships but require manual effort; therefore they are costly to maintain, and often stale. Folksonomies and vocabularies lack rich category structure. Classification and extraction require the coverage of vocabularies and the alterability of folksonomies and can largely benefit from category relationships and other properties. With Doozer, a program for building conceptual models of information domains, we want to bridge the gap between the vocabularies and folksonomies on the one side and the rich, expert-designed ontologies and taxonomies on the other. Doozer mines Wikipedia to produce tight domain hierarchies, starting with simple domain descriptions. It also adds relevancy scores for use in automated classification of information. The output model is described as a hierarchy of domain terms that can be used immediately for classifiers and IR systems or as a basis for manual or semi-automatic creation of formal ontologies.}}


 * -- align="left" valign=top
 * Tianyi, Shi; Shidou, Jiao; Junqi, Hou & Minglu, Li
 * Improving keyphrase extraction using wikipedia semantics
 * 2008 2nd International Symposium on Intelligent Information Technology Application, IITA 2008, December 21, 2008 - December 22, 2008 Shanghai, China
 * 2008
 * 


 * -- align="left" valign=top
 * Tran, Tien; Kutty, Sangeetha & Nayak, Richi
 * Utilizing the structure and content information for XML document clustering
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. The construction of a latent semantic kernel involves the computing of singular value decomposition (SVD). On a large feature space matrix, the computation of SVD is very expensive in terms of time and memory requirements. Thus in this clustering approach, the dimension of the document space of a term-document matrix is reduced before performing SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Tran, Tien; Nayak, Richi & Bruza, Peter
 * Document clustering using incremental and pairwise approaches
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||This paper presents the experiments and results of a clustering approach for clustering of the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The clustering approach employed makes use of an incremental clustering method and a pairwise clustering method. The approach enables us to perform the clustering task on a large dataset by first reducing the dimension of the dataset to an undefined number of clusters using the incremental method. The lower-dimension dataset is then clustered to a required number of clusters using the pairwise method. In this way, clustering of the large number of documents is performed successfully and the accuracy of the clustering solution is achieved. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Tsikrika, Theodora & Kludas, Jana
 * Overview of the WikipediaMM task at ImageCLEF 2008
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||The WikipediaMM task provides a testbed for the system-oriented evaluation of ad-hoc retrieval from a large collection of Wikipedia images. It became a part of the ImageCLEF evaluation campaign in 2008 with the aim of investigating the use of visual and textual sources in combination for improving the retrieval performance. This paper presents an overview of the task's resources, topics, assessments, participants' approaches, and main results. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Tsikrika, Theodora; Serdyukov, Pavel; Rode, Henning; Westerveld, Thijs; Aly, Robin; Hiemstra, Djoerd & Vries, Arjen P. De
 * Structured document retrieval, multimedia retrieval, and entity ranking using PF/Tijah
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||CWI and University of Twente used PF/Tijah, a flexible XML retrieval system, to evaluate structured document retrieval, multimedia retrieval, and entity ranking tasks in the context of INEX 2007. For the retrieval of textual and multimedia elements in the Wikipedia data, we investigated various length priors and found that biasing towards longer elements than the ones retrieved by our language modelling approach can be useful. For retrieving images in isolation, we found that their associated text is a very good source of evidence in the Wikipedia collection. For the entity ranking task, we used random walks to model multi-step relevance propagation from the articles describing entities to all related entities and further, and obtained promising results. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Urdaneta, Guido; Pierre, Guillaume & Steen, Maarten Van
 * A decentralized wiki engine for collaborative wikipedia hosting
 * 3rd International Conference on Web Information Systems and Technologies, Webist 2007, March 3, 2007 - March 6, 2007 Barcelona, Spain
 * 2007


 * -- align="left" valign=top
 * Vaishnavi, Vijay K.; Vandenberg, Art; Zhang, Yanqing & Duraisamy, Saravanaraj
 * Towards design principles for effective context-and perspective-based web mining
 * 4th International Conference on Design Science Research in Information Systems and Technology, DESRIST '09, May 7, 2009 - May 8, 2009 Philadelphia, PA, United States
 * 2009
 * 
 * {{hidden||A practical and scalable web mining solution is needed that can assist the user in processing existing web-based resources to discover specific, relevant information content. This is especially important for researcher communities where data deployed on the World Wide Web are characterized by autonomous, dynamically evolving, and conceptually diverse information sources. The paper describes a systematic design research study that is based on prototyping/evaluation and abstraction using existing and new techniques incorporated as plug and play components into a research workbench. The study investigates an approach, DISCOVERY, for using (1) context/perspective information and (2) social networks such as ODP or Wikipedia for designing practical and scalable human-web systems for finding web pages that are relevant and meet the needs and requirements of a user or a group of users. The paper also describes the current implementation of DISCOVERY and its initial use in finding web pages in a targeted web domain. The resulting system arguably meets the common needs and requirements of a group of people based on the information provided by the group in the form of a set of context web pages. The system is evaluated for a scenario in which assistance of the system is sought for a group of faculty members in finding NSF research grant opportunities that they should collaboratively respond to, utilizing the context provided by their recent publications.}}


 * -- align="left" valign=top
 * Vercoustre, Anne-Marie; Pehcevski, Jovan & Naumovski, Vladimir
 * Topic difficulty prediction in entity ranking
 * 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany
 * 2009
 * 
 * {{hidden||Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction where the goal is to tag the names of the entities in documents, entity ranking is primarily focused on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we show that the knowledge of predicted classes of topic difficulty can be used to further improve the entity ranking performance. To predict the topic difficulty, we generate a classifier that uses features extracted from an INEX topic definition to classify the topic into an experimentally pre-determined class. This knowledge is then utilised to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Vercoustre, Anne-Marie; Pehcevski, Jovan & Thom, James A.
 * Using Wikipedia categories and links in entity ranking
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the examples (when provided) to improve the effectiveness of entity ranking. Our experiments on both the training and the testing data sets demonstrate that the use of categories and the link structure of Wikipedia can significantly improve entity retrieval effectiveness. We also use our system for the ad hoc tasks by inferring target categories from the title of the query. The results were worse than when using a full-text search engine, which confirms our hypothesis that ad hoc retrieval and entity retrieval are two different tasks. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Vercoustre, Anne-Marie; Thom, James A. & Pehcevski, Jovan
 * Entity ranking in Wikipedia
 * 23rd Annual ACM Symposium on Applied Computing, SAC'08, March 16, 2008 - March 20, 2008 Fortaleza, Ceara, Brazil
 * 2008
 * 
 * {{hidden||The traditional entity extraction problem, lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named entities; we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX} Wikipedia document collection. Wikipedia offers a number of interesting features for entity identification and ranking that we first introduce. We then describe the principles and the architecture of our entity ranking system, and introduce our methodology for evaluation. Our preliminary results show that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve retrieval effectiveness. ""}}


 * -- align="left" valign=top
 * Viegas, Fernanda B.; Wattenberg, Martin & Mckeon, Matthew M.
 * The hidden order of wikipedia
 * 2nd International Conference on Online Communities and Social Computing, OCSC 2007, July 22, 2007 - July 27, 2007 Beijing, China
 * 2007
 * {{hidden||We examine the procedural side of Wikipedia, the well-known internet encyclopedia. Despite the lack of structure in the underlying wiki technology, users abide by hundreds of rules and follow well-defined processes. Our case study is the Featured Article (FA) process, one of the best established procedures on the site. We analyze the FA process through the theoretical framework of commons governance, and demonstrate how this process blends elements of traditional workflow with peer production. We conclude that rather than encouraging anarchy, many aspects of wiki technology lend themselves to the collective creation of formalized process and policy. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Villarreal, Sara Elena Gaza; Elizalde, Lorena Martinez & Viveros, Adriana Canseco
 * Clustering hyperlinks for topic extraction: An exploratory analysis
 * 8th Mexican International Conference on Artificial Intelligence, MICAI 2009, November 9, 2009 - November 13, 2009 Guanajuato, Guanajuato, Mexico
 * 2009
 * 


 * -- align="left" valign=top
 * Vries, Arjen P. De; Vercoustre, Anne-Marie; Thom, James A.; Craswell, Nick & Lalmas, Mounia
 * Overview of the INEX 2007 entity ranking track
 * 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany
 * 2008
 * 
 * {{hidden||Many realistic user tasks involve the retrieval of specific entities instead of just any type of documents. Examples of information needs include 'Countries where one can pay with the euro' or 'Impressionist art museums in The Netherlands'. The Initiative for the Evaluation of XML Retrieval (INEX) started the XML Entity Ranking track (INEX-XER) to create a test collection for entity retrieval in Wikipedia. Entities are assumed to correspond to Wikipedia entries. The goal of the track is to evaluate how well systems can rank entities in response to a query; the set of entities to be ranked is assumed to be loosely defined either by a generic category (entity ranking) or by some example entities (list completion). This track overview introduces the track setup, and discusses the implications of the new relevance notion for entity ranking in comparison to ad hoc retrieval. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Vries, Christopher M. De; Geva, Shlomo & Vine, Lance De
 * Clustering with random indexing K-tree and XML structure
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Vroom, Regine W.; Vossen, Lysanne E. & Geers, Anoek M.
 * Aspects to motivate users of a design engineering wiki to share their knowledge
 * Proceedings of World Academy of Science, Engineering and Technology
 * 2009
 * {{hidden||Industrial design engineering is an information- and knowledge-intensive job. Although Wikipedia offers a lot of this information, design engineers are better served with a wiki tailored to their job, offering information in a compact manner and functioning as a design tool. For that reason WikID has been developed. However, for the viability of a wiki, an active user community is essential. The main subject of this paper is a study of the influence of the communication and the contents of WikID on users' willingness to contribute. First, theory about a website's first impression, general usability guidelines, and user motivation in online communities is reviewed. Using this theory, the aspects of the current site are analyzed for their suitability. These results have been verified with a questionnaire amongst 66 industrial design engineers (or industrial design engineering students). The main conclusion is that design engineers are enthusiastic about the existence of WikID and its knowledge structure (taxonomy), but this structure does not become clear without any guidance. In other words, the knowledge structure is very helpful for inspiring and guiding design engineers through their tailored knowledge domain in WikID, but this taxonomy has to be better communicated on the main page. In addition, the main page needs to be fitted more to the target group's preferences.}}


 * -- align="left" valign=top
 * Waltinger, Ulli & Mehler, Alexander
 * Who is it? Context sensitive named entity and instance recognition by means of Wikipedia
 * 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia
 * 2008
 * 
 * {{hidden||This paper presents an approach for predicting context sensitive entities, exemplified in the domain of person names. Our approach is based on building a weighted context graph as well as a weighted people graph, and predicting the context entity by extracting the best-fitting subgraph using a spreading activation technique. The results of the experiments show a quite promising F-measure of 0.99.}}


 * -- align="left" valign=top
 * Waltinger, Ulli; Mehler, Alexander & Heyer, Gerhard
 * Towards automatic content tagging - Enhanced web services in digital libraries using lexical chaining
 * WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal
 * 2008


 * -- align="left" valign=top
 * Wang, Gang; Yu, Yong & Zhu, Haiping
 * PORE: Positive-only relation extraction from wikipedia text
 * 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of
 * 2007
 * 
 * {{hidden||Extracting semantic relations is of great importance for the creation of Semantic Web content. It is of great benefit to semi-automatically extract relations from the free text of Wikipedia using the structured content readily available in it. Pattern matching methods that employ information redundancy cannot work well since there is not much redundant information in Wikipedia, compared to the Web. Multi-class classification methods are not reasonable since no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. The core algorithm B-POL extends a state-of-the-art positive-only learning algorithm using bootstrapping, strong negative identification, and transductive inference to work with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL can work effectively given only a small amount of positive training examples and it significantly outperforms the original positive-only learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach for Ontology Population and can be adapted to other domains. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Wang, Jun; Jin, Xin & Wu, Yun-Peng
 * An empirical study of knowledge collaboration networks in virtual community: Based on wiki
 * 2009 16th International Conference on Management Science and Engineering, ICMSE 2009, September 14, 2009 - September 16, 2009 Moscow, Russia
 * 2009
 * 
 * {{hidden||Wikipedia is a typical knowledge-collaboration-oriented virtual community, yet its collaboration mechanism remains unclear. This empirical study explores Wikipedia's archive data and proposes a knowledge collaboration network model. The analysis indicates that the wiki-based knowledge collaboration network is a type of BA scale-free network which obeys a power-law distribution. On the other hand, this network is characterized by a high, stable clustering coefficient and a small average distance, thus presenting an obvious small-world effect. Moreover, the network topology is non-hierarchical because clustering coefficients and degrees do not conform to a power-law distribution. The above results profile the collaboration network and identify its key properties. Thus we can use the model to describe how people interact with each other and to what extent they collaborate on content creation.}}


 * -- align="left" valign=top
 * Wang, Juncheng; Ma, Feicheng & Cheng, Jun
 * The impact of research design on the half-life of the wikipedia category system
 * 2010 International Conference on Computer Design and Applications, ICCDA 2010, June 25, 2010 - June 27, 2010 Qinhuangdao, Hebei, China
 * 2010
 * 


 * -- align="left" valign=top
 * Wang, Li; Yata, Susumu; Atlam, El-Sayed; Fuketa, Masao; Morita, Kazuhiro; Bando, Hiroaki & Aoe, Jun-Ichi
 * A method of building Chinese field association knowledge from Wikipedia
 * 2009 International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2009, September 24, 2009 - September 27, 2009 Dalian, China
 * 2009
 * 
 * {{hidden||Field Association (FA) terms form a limited set of discriminating terms that give us the knowledge to identify document fields. The primary goal of this research is to make a system that can imitate the process whereby humans recognize fields by looking at a few Chinese FA terms in a document. This paper proposes a new approach to building a Chinese FA term dictionary automatically from Wikipedia. 104,532 FA terms are added to the dictionary. The resulting FA terms obtained using this dictionary are applied to recognize the fields of 5,841 documents. The average accuracy in the experiment is 92.04%. The results show that the presented method is effective in building FA terms from Wikipedia automatically.}}


 * -- align="left" valign=top
 * Wang, Qiuyue; Li, Qiushi; Wang, Shan & Du, Xiaoyong
 * Exploiting semantic tags in XML retrieval
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||With the new semantically annotated Wikipedia XML corpus, we attempt to investigate the following two research questions. Do the structural constraints in CAS queries help in retrieving an XML document collection containing semantically rich tags? How can the semantic tag information be exploited to improve CO queries, given that most users prefer to express the simplest forms of queries? In this paper, we describe and analyze the work done on comparing CO and CAS queries over the document collection at the INEX 2009 ad hoc track, and we propose a method to improve the effectiveness of CO queries by enriching the element content representations with semantic tags. Our results show that the approaches of enriching XML element representations with semantic tags are effective in improving early precision, while on average precision, strict interpretation of CAS queries is generally superior. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Wang, Yang; Wang, Haofen; Zhu, Haiping & Yu, Yong
 * Exploit semantic information for category annotation recommendation in Wikipedia
 * 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France
 * 2007
 * {{hidden||Compared with plain-text resources, those in "semi-semantic" web sites such as Wikipedia contain high-level semantic information which will benefit various automatic annotation tasks on them. In this paper we propose a "collaborative annotating" approach to automatically recommend categories for a Wikipedia article by reusing category annotations from its most similar articles and ranking these annotations by their confidence. In this approach, four typical semantic features in Wikipedia, namely incoming link, outgoing link, section heading, and template item, are investigated and exploited as the representation of articles to feed the similarity calculation. The experiment results have not only proven that these semantic features improve the performance of category annotating in comparison to the plain text feature, but also demonstrated the strength of our approach in discovering missing annotations and proper-level ones for Wikipedia articles. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Wannemacher, Klaus
 * Articles as assignments - Modalities and experiences of wikipedia use in university courses
 * 8th International Conference on Web Based Learning, ICWL 2009, August 19, 2009 - August 21, 2009 Aachen, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Wartena, Christian & Brussee, Rogier
 * Topic detection by clustering keywords
 * DEXA 2008, 19th International Conference on Database and Expert Systems Applications, September 1, 2008 - September 5, 2008 Turin, Italy
 * 2008
 * 
 * {{hidden||We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term distribution taking co-occurrence of terms into account gives the best results.}}


 * -- align="left" valign=top
 * Wattenberg, Martin; Viegas, Fernanda B. & Hollenbach, Katherine
 * Visualizing activity on wikipedia with chromograms
 * 11th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2007, September 10, 2007 - September 14, 2007 Rio de Janeiro, Brazil
 * 2007
 * {{hidden||To investigate how participants in peer production systems allocate their time, we examine editing activity on Wikipedia, the well-known online encyclopedia. To analyze the huge edit histories of the site's administrators, we introduce a visualization technique, the chromogram, that can display very long textual sequences through a simple color coding scheme. Using chromograms we describe a set of characteristic editing patterns. In addition to confirming known patterns, such as reacting to vandalism events, we identify a distinct class of organized systematic activities. We discuss how both reactive and systematic strategies shed light on self-allocation of effort in Wikipedia, and how they may pertain to other peer-production systems. IFIP International Federation for Information Processing 2007.}}


 * -- align="left" valign=top
 * Wee, Leong Chee & Hassan, Samer
 * Exploiting Wikipedia for directional inferential text similarity
 * International Conference on Information Technology: New Generations, ITNG 2008, April 7, 2008 - April 9, 2008 Las Vegas, NV, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Weikum, Gerhard
 * Chapter 3: Search for knowledge
 * 1st Workshop on Search Computing Challenges and Directions, SeCo 2009, June 17, 2009 - June 19, 2009 Como, Italy
 * 2010
 * 
 * {{hidden||There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. In addition, Semantic-Web-style ontologies, structured Deep-Web sources, and Social-Web networks and tagging communities can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This vision and position paper discusses opportunities and challenges along this research avenue. The technical issues to be looked into include knowledge harvesting to construct large knowledge bases, searching for knowledge in terms of entities and relationships, and ranking the results of such queries.}}


 * -- align="left" valign=top
 * Weikum, Gerhard
 * Harvesting, searching, and ranking knowledge on the web
 * 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain
 * 2009
 * 
 * {{hidden||There are major trends to advance the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by employing large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base [23, 24] and the NAGA search engine [14] but also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style semantic graph. For further growing YAGO from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25]. NAGA provides graph-template-based search over this data with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need for ranking approximate matches pose efficiency and scalability challenges that are addressed by algorithmic and indexing techniques [15, 17]. YAGO is publicly available and has been imported into various other knowledge-management projects including DBpedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], KnowItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28] (and more). Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities.}}


 * -- align="left" valign=top
 * Weiping, Wang; Peng, Chen & Bowen, Liu
 * A self-adaptive explicit semantic analysis method for computing semantic relatedness using wikipedia
 * 2008 International Seminar on Future Information Technology and Management Engineering, FITME 2008, November 20, 2008 - November 20, 2008 Leicestershire, United kingdom
 * 2008
 * 
 * {{hidden||In recent years, the Explicit Semantic Analysis (ESA) method has achieved good performance in computing semantic relatedness (SR). However, the ESA method fails to consider the given context of the word-pair, and generates the same semantic concepts for one word in different word-pairs. It cannot exactly determine the intended sense of an ambiguous word. In this paper, we propose an improved method for computing semantic relatedness. Our technique, the Self-Adaptive Explicit Semantic Analysis (SAESA), is unique in that it generates corresponding concepts to express the intended meaning of the word, according to the different words being compared and the different context. Experimental results on the WordSimilarity-353 benchmark dataset show that the proposed method is superior to existing methods; the correlation of the computed results with human judgment improves from r = 0.74 to 0.81.}}
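For context on this entry, the baseline ESA relatedness that SAESA refines represents each word as a weighted vector over Wikipedia concepts (articles) and compares words by cosine similarity. The sketch below is illustrative only: the toy concept texts and the simple term-frequency weighting are assumptions for demonstration, not the paper's actual corpus or weighting scheme.

```python
# Minimal sketch of Explicit Semantic Analysis (ESA) relatedness.
# Concept texts here stand in for Wikipedia articles; real ESA uses
# TF-IDF weights over the full article collection.
import math

def esa_vector(word, concept_texts):
    """Weight each concept by the word's term frequency in that concept's text."""
    return {c: text.split().count(word) / max(len(text.split()), 1)
            for c, text in concept_texts.items()}

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def esa_relatedness(w1, w2, concept_texts):
    return cosine(esa_vector(w1, concept_texts), esa_vector(w2, concept_texts))
```

SAESA's refinement, per the abstract, is to make the concept vector for an ambiguous word depend on the word it is compared against, rather than being fixed as above.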


 * -- align="left" valign=top
 * Welker, Andrea L. & Quintiliano, Barbara
 * Information literacy: Moving beyond Wikipedia
 * GeoCongress 2008: Geosustainability and Geohazard Mitigation, March 9, 2008 - March 12, 2008 New Orleans, LA, United states
 * 2008
 * 
 * {{hidden||In the past, finding information was the challenge. Today, the challenge our students face is to sift through and evaluate the incredible amount of information available. This ability to find and evaluate information is sometimes referred to as information literacy. Information literacy relates to a student's ability to communicate, but, more importantly, information literate persons are well-poised to learn throughout life because they have learned how to learn. A series of modules to address information literacy were created in a collaborative effort between faculty in the Civil and Environmental Engineering Department at Villanova and the librarians at Falvey Memorial Library. These modules were integrated throughout the curriculum, from sophomore to senior year. Assessment is based on modified ACRL (Association of College and Research Libraries) outcomes. This paper will document the lessons learned in the implementation of this program and provide concrete examples of how to incorporate information literacy into geotechnical engineering classes. Copyright ASCE 2008.}}


 * -- align="left" valign=top
 * West, Andrew G.; Kannan, Sampath & Lee, Insup
 * Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata?
 * 3rd European Workshop on System Security, EUROSEC'10, April 13, 2010 - April 13, 2010 Paris, France
 * 2010
 * 


 * -- align="left" valign=top
 * Westerveld, Thijs; Rode, Henning; Os, Roel van; Hiemstra, Djoerd; Ramirez, Georgina; Mihajlovic, Vojkan & Vries, Arjen P. De
 * Evaluating structured information retrieval and multimedia retrieval using PF/Tijah
 * 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany
 * 2007
 * {{hidden||We used a flexible XML retrieval system for evaluating structured document retrieval and multimedia retrieval tasks in the context of the INEX 2006 benchmarks. We investigated the differences between article and element retrieval for Wikipedia data as well as the influence of an element's context on its ranking. We found that article retrieval performed well on many tasks and that pinpointing the relevant passages inside an article may hurt more than it helps. We found that for finding images in isolation the associated text is a very good descriptor in the Wikipedia collection, but we were not very successful at identifying relevant multimedia fragments consisting of a combination of text and images. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Winter, Judith & Kuhne, Gerold
 * Achieving high precisions with peer-to-peer is possible
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * 
 * {{hidden||Until recently, centralized stand-alone solutions had no problem coping with the load of storing, indexing and searching the small test collections used for evaluating search results at INEX. However, searching the new large-scale Wikipedia collection of 2009 requires many more resources such as processing power, RAM, and index space. It is hence more important than ever to regard efficiency issues when performing XML-Retrieval tasks on such a big collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and obtain a more efficient search. This paper describes our experiments using distributed search techniques based on XML-Retrieval. Our aim is to improve both effectiveness and efficiency; we have thus submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to gain more efficiency in terms of load balancing when searching large-scale collections. Since the bandwidth consumption between searching peers has to be limited in order to achieve a scalable, efficient system, we exploit XML structure to reduce the number of messages sent between peers. In spite of mainly aiming at efficiency, our search engine SPIRIX achieved quite high precisions and made it into the top-10 systems (focused task). It ranked 7th at the Ad Hoc Track (59%) and came first in terms of precision at the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with the top-10 centralized solutions! 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Witmer, Jeremy & Kalita, Jugal
 * Extracting geospatial entities from Wikipedia
 * ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states
 * 2009
 * 
 * {{hidden||This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. We target for testing a corpus of Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with a very high recall, close to 100%, with an acceptable precision. The set of geospatial NEs is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. As many place names are ambiguous, and do not immediately geocode to a single location, we present a data structure and algorithm to resolve ambiguity based on sentence and article context, so the correct coordinates can be selected. We achieve an f-measure of 82%, and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on and geovisualization of Wikipedia.}}


 * -- align="left" valign=top
 * Wong, Wilson; Liu, Wei & Bennamoun, Mohammed
 * Featureless similarities for terms clustering using tree-traversing ants
 * International Symposium on Practical Cognitive Agents and Robots, PCAR 2006, November 27, 2006 - November 28, 2006 Perth, WA, Australia
 * 2006
 * 
 * {{hidden||Besides being difficult to scale between different domains and to handle knowledge fluctuations, the results of terms clustering presented by existing ontology engineering systems are far from desirable. In this paper, we propose a new version of an ant-based method for clustering terms known as Tree-Traversing Ants (TTA). With the help of the Normalized Google Distance (NGD) and n of Wikipedia (nW) as measures for similarity and distance between terms, we attempt to achieve an adaptable clustering method that is highly scalable across domains. Initial experiments with two datasets show promising results and demonstrate several advantages that are not simultaneously present in standard ant-based and other conventional clustering methods. Copyright held by author.}}
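For context on this entry, the Normalized Google Distance used by the TTA method is the standard Cilibrasi-Vitanyi formula over search-engine page counts. The sketch below implements that formula; the page counts in the usage note are made-up illustrative numbers, not real hit counts from any search engine.

```python
# Sketch of the Normalized Google Distance (NGD) between two terms x and y:
#   NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
#               / (log N - min(log f(x), log f(y)))
# where f(x) and f(y) are the numbers of pages containing each term,
# f(x, y) the number containing both, and N the total pages indexed.
import math

def ngd(fx, fy, fxy, n):
    """Smaller values mean the terms co-occur more than chance predicts."""
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))
```

With hypothetical counts, two terms that always co-occur (f(x) = f(y) = f(x, y)) get distance 0, while terms that rarely co-occur score higher, which is what lets TTA use NGD as a featureless distance between terms.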


 * -- align="left" valign=top
 * Wongboonsin, Jenjira & Limpiyakorn, Yachai
 * Wikipedia customization for organization's process asset management
 * 2008 International Conference on Advanced Computer Theory and Engineering, ICACTE 2008, December 20, 2008 - December 22, 2008 Phuket, Thailand
 * 2008
 * 
 * {{hidden||Mature organizations typically establish various process assets that serve as standards for work operations in their units. Process assets include policies, guidelines, standard process definitions, life cycle models, forms and templates, etc. These assets are placed in a repository called the Organization's Process Asset Library, or OPAL. Projects then utilize these assets and tailor organizational standard processes to suit individual project processes. This research proposes an approach to establishing an organization's process asset library by customizing open source software, Wikipedia. The system is called WikiOPAL. CMMI is used as the referenced process improvement model for the establishment of organization's process assets in this work. We also demonstrate that Wikipedia can be properly used as an approach for constructing a process asset library in a collaborative environment.}}


 * -- align="left" valign=top
 * Woodley, Alan & Geva, Shlomo
 * NLPX at INEX 2006
 * 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany
 * 2007
 * {{hidden||XML information retrieval (XML-IR) systems aim to better fulfil users' information needs than traditional IR systems by returning results lower than the document level. In order to use XML-IR systems users must encapsulate their structural and content information needs in a structured query. Historically, these structured queries have been formatted using formal languages such as NEXI. Unfortunately, formal query languages are very complex and too difficult to be used by experienced - let alone casual - users and are too closely bound to the underlying physical structure of the collection. INEX's NLP task investigates the potential of using natural language to specify structured queries. QUT has participated in the NLP task with our system NLPX since its inception. Here, we discuss the changes we've made to NLPX since last year, including our efforts to port NLPX to Wikipedia. Second, we present the results from the 2006 INEX track where NLPX was the best performing participant in the Thorough and Focused tasks. Springer-Verlag Berlin Heidelberg 2007.}}


 * -- align="left" valign=top
 * Wu, Shih-Hung; Li, Min-Xiang; Yang, Ping-Che & Ku, Tsun
 * Ubiquitous wikipedia on handheld device for mobile learning
 * 6th IEEE International Conference on Wireless, Mobile and Ubiquitous Technologies in Education, WMUTE 2010, April 12, 2010 - April 16, 2010 Kaohsiung, Taiwan
 * 2010
 * 


 * -- align="left" valign=top
 * Xavier, Clarissa Castella & Lima, Vera Lucia Strube De
 * Construction of a domain ontological structure from Wikipedia
 * 7th Brazilian Symposium in Information and Human Language Technology, STIL 2009, September 8, 2009 - September 11, 2009 Sao Carlos, Sao Paulo, Brazil
 * 2010
 * 


 * -- align="left" valign=top
 * Xu, Hongtao; Zhou, Xiangdong; Wang, Mei; Xiang, Yu & Shi, Baile
 * Exploring Flickr's related tags for semantic annotation of web images
 * ACM International Conference on Image and Video Retrieval, CIVR 2009, July 8, 2009 - July 10, 2009 Santorini Island, Greece
 * 2009
 * 


 * -- align="left" valign=top
 * Xu, Jinsheng; Yilmaz, Levent & Zhang, Jinghua
 * Agent simulation of collaborative knowledge processing in Wikipedia
 * 2008 Spring Simulation Multiconference, SpringSim'08, April 14, 2008 - April 17, 2008 Ottawa, ON, Canada
 * 2008
 * 
 * {{hidden||Wikipedia, a User Innovation Community (UIC), is becoming an increasingly influential source of knowledge. The knowledge in Wikipedia is produced and processed collaboratively by the UIC. The results of this collaboration process present various seemingly complex patterns demonstrated by the update histories of different articles in Wikipedia. Agent simulation is a powerful method that is used to study the behaviors of complex systems of interacting and autonomous agents. In this paper, we study the collaborative knowledge processing in Wikipedia using a simple agent-based model. The proposed model considers factors including knowledge distribution among agents, number of agents, behavior of agents and vandalism. We use this model to explain content growth rate, number and frequency of updates, edit wars and vandalism in Wikipedia articles. The results demonstrate that the model captures the important empirical aspects of collaborative knowledge processing in Wikipedia.}}


 * -- align="left" valign=top
 * Yan, Ying; Wang, Chen; Zhou, Aoying; Qian, Weining; Ma, Li & Pan, Yue
 * Efficient indices using graph partitioning in RDF triple stores
 * 25th IEEE International Conference on Data Engineering, ICDE 2009, March 29, 2009 - April 2, 2009 Shanghai, China
 * 2009
 * 
 * {{hidden||With the advance of the Semantic Web, various RDF data are increasingly generated, published, queried, and reused via the Web. For example, DBpedia, a community effort to extract structured data from Wikipedia articles, broke 100 million RDF triples in its latest release. Likewise, the Linking Open Data (LOD) project, initiated by Tim Berners-Lee, has published and interlinked many open licence datasets consisting of over 2 billion RDF triples so far. In this context, fast query response over such large-scale data is one of the challenges for existing RDF data stores. In this paper, we propose a novel triple indexing scheme to help an RDF query engine quickly locate the instances within a small scope. By considering the RDF data as a graph, we partition the graph into multiple subgraph pieces and store them individually, over which a signature tree is built up to index the URIs. When a query arrives, the signature tree index is used to quickly locate the partitions that might include the matches of the query by its constant URIs. Our experiments indicate that the indexing scheme dramatically reduces the query processing time in most cases because many partitions are filtered out early and the expensive exact matching is only performed over a quite small scope of the original dataset.}}


 * -- align="left" valign=top
 * Yang, Jingjing; Li, Yuanning; Tian, Yonghong; Duan, Lingyu & Gao, Wen
 * A new multiple kernel approach for visual concept learning
 * 15th International Multimedia Modeling Conference, MMM 2009, January 7, 2009 - January 9, 2009 Sophia-Antipolis, France
 * 2009
 * 
 * {{hidden||In this paper, we present a novel multiple kernel method to learn the optimal classification function for visual concepts. Although many carefully designed kernels have been proposed in the literature to measure visual similarity, little work has been done on how these kernels really affect learning performance. We propose a Per-Sample Based Multiple Kernel Learning method (PS-MKL) to investigate the discriminative power of each training sample in different basic kernel spaces. The optimal, sample-specific kernel is learned as a linear combination of a set of basic kernels, which leads to a convex optimization problem with a unique global optimum. As illustrated in experiments on the Caltech 101 and Wikipedia MM datasets, the proposed PS-MKL outperforms traditional Multiple Kernel Learning (MKL) methods and achieves results comparable with state-of-the-art methods for learning visual concepts. 2008 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Yang, Kai-Hsiang; Chen, Chun-Yu; Lee, Hahn-Ming & Ho, Jan-Ming
 * EFS: Expert finding system based on wikipedia link pattern analysis
 * 2008 IEEE International Conference on Systems, Man and Cybernetics, SMC 2008, October 12, 2008 - October 15, 2008 Singapore, Singapore
 * 2008
 * 
 * {{hidden||Building an expert finding system is very important for many applications, especially in the academic environment. Previous work uses e-mails or web pages as the corpus to analyze the expertise of each expert. In this paper, we present an Expert Finding System, abbreviated as EFS, which builds experts' profiles from their journal publications. For a given proposal, the EFS first looks up the Wikipedia web site to get related link information, and then lists and ranks all associated experts using that information. In our experiments, we use a real-world dataset comprising 882 people and 13,654 papers, categorized into 9 expertise domains. Our experimental results show that the EFS works well on several expertise domains such as "Artificial Intelligence" and "Image Pattern Recognition".}}


 * -- align="left" valign=top
 * Yap, Poh-Hean; Ong, Kok-Leong & Wang, Xungai
 * Business 2.0: A novel model for delivery of business services
 * 5th International Conference on Service Systems and Service Management, ICSSSM'08, June 30, 2008 - July 2, 2008 Melbourne, Australia
 * 2008
 * 


 * -- align="left" valign=top
 * Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin & Jin, Hai
 * SASL: A semantic annotation system for literature
 * International Conference on Web Information Systems and Mining, WISM 2009, November 7, 2009 - November 8, 2009 Shanghai, China
 * 2009
 * 
 * {{hidden||Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to this problem is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviations and other reasons, it is very difficult to identify entities correctly. The paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantics to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviations, which induces ambiguity. SASL uses regular expressions to extract the mapping between the full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents experimental results, which confirm that SASL achieves good performance.}}


 * -- align="left" valign=top
 * Zacharouli, Polyxeni; Titsias, Michalis & Vazirgiannis, Michalis
 * Web page rank prediction with PCA and em clustering
 * 6th International Workshop on Algorithms and Models for the Web-Graph, WAW 2009, February 12, 2009 - February 13, 2009 Barcelona, Spain
 * 2009
 * 
 * {{hidden||In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different web pages can be exploited. All the methods are evaluated using real data sets obtained from the Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall the system constitutes a set of tools for high-accuracy pagerank prediction which can be used for efficient resource management by search engines. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Zhang, Congle; Xue, Gui-Rong & Yu, Yong
 * Knowledge supervised text classification with no labeled documents
 * 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008, December 15, 2008 - December 19, 2008 Hanoi, Viet nam
 * 2008
 * 
 * {{hidden||In traditional text classification approaches, the semantic meanings of the classes are described by labeled documents. Since labeling documents is often time-consuming and expensive, it is a promising idea to ask users to provide some keywords depicting the classes instead of labeling any documents. However, short pieces of keywords may not contain enough information and may therefore lead to an unreliable classifier. Fortunately, there are large amounts of public data easily available in web directories such as ODP, Wikipedia, etc. We are interested in exploring the enormous crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning" (KSL), which utilizes the knowledge in keywords and the crowd intelligence to learn the classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem. It can optimize the expected prediction risk and build a high-quality classifier. Empirical results verify our claim: our algorithm can achieve above 0.9 Micro-F1 on average, which is much better than the baselines and even comparable with an SVM classifier supervised by labeled documents. 2008 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Zhang, Xu; Song, Yi-Cheng; Cao, Juan; Zhang, Yong-Dong & Li, Jin-Tao
 * Large scale incremental web video categorization
 * 1st International Workshop on Web-Scale Multimedia Corpus, WSMC'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09, October 19, 2009 - October 24, 2009 Beijing, China
 * 2009
 * 
 * {{hidden||With the advent of video sharing websites, the amount of video on the internet grows rapidly. Web video categorization is an efficient methodology for organizing this huge amount of video. In this paper we investigate the characteristics of web videos and make two contributions to large-scale incremental web video categorization. First, we develop an effective semantic feature space, Concept Collection for Web Video with Categorization Distinguishability (CCWV-CD), which consists of concepts with a small semantic gap, whose concept correlations are diffused by a novel Wikipedia Propagation (WP) method. Second, we propose an incremental support vector machine with a fixed number of support vectors (N-ISVM) for large-scale incremental learning. To evaluate the performance of CCWV-CD, WP and N-ISVM, we conduct extensive experiments on a dataset of the 80,021 most representative videos on a video sharing website. The experimental results show that CCWV-CD and WP are more representative for web videos, and that the N-ISVM algorithm greatly improves efficiency in the incremental learning setting.}}


 * -- align="left" valign=top
 * Zhang, Yi; Sun, Aixin; Datta, Anwitaman; Chang, Kuiyu & Lim, Ee-Peng
 * Do wikipedians follow domain experts?: A domain-specific study on wikipedia knowledge building
 * 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia
 * 2010
 * 
 * {{hidden||Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest in a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and studied them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent and did not follow the TKB, despite the open and online availability of the latter, and despite awareness among at least some Wikipedia contributors of the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of the contribution behavior demonstrated by Wikipedians. We found that most Wikipedians each contribute to a relatively small set of articles, with contributions biased towards one or very few articles. At the same time, each article's contributions are often championed by very few active contributors, including the article's creator. We finally arrive at the conjecture that contributions in Wikipedia serve more to cover knowledge at the article level than at the domain level.}}


 * -- align="left" valign=top
 * Zhou, Zhi; Tian, Yonghong; Li, Yuanning; Huang, Tiejun & Gao, Wen
 * Large-scale cross-media retrieval of wikipediaMM images with textual and visual query expansion
 * 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark
 * 2009
 * 
 * {{hidden||In this paper, we present our approaches for the WikipediaMM task at ImageCLEF 2008. We first experimented with a text-based image retrieval approach with query expansion, where the expansion terms were automatically selected from a knowledge base that was semi-automatically constructed from Wikipedia. Encouragingly, the experimental results rank first among all submitted runs. We also implemented a content-based image retrieval approach with query-dependent visual concept detection. Cross-media retrieval was then successfully carried out by independently applying the two meta-search tools and combining the results through a weighted summation of scores. Though not submitted, this approach outperforms our text-based and content-based approaches remarkably. 2009 Springer Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Zirn, Cacilia; Nastase, Vivi & Strube, Michael
 * Distinguishing between instances and classes in the wikipedia taxonomy
 * 5th European Semantic Web Conference, ESWC 2008, June 1, 2008 - June 5, 2008 Tenerife, Canary Islands, Spain
 * 2008
 * 
 * {{hidden||This paper presents an automatic method for differentiating between instances and classes in a large-scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. The approach we present is the first attempt to make this distinction automatically in a large-scale resource. In contrast, this distinction has been made in WordNet and Cyc based on manual annotations. The result of the process is evaluated against ResearchCyc. On the subnetwork shared by our taxonomy and ResearchCyc we report 84.52% accuracy. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Focused Retrieval and Evaluation - 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, Revised and Selected Papers
 * 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia
 * 2010
 * {{hidden||The proceedings contain 42 papers. The topics discussed include: is there something quantum-like about the human mental lexicon?; supporting real-world tasks: producing summaries of scientific articles tailored to the citation context; semantic document processing using wikipedia as a knowledge base; a methodology for producing improved focused elements; use of language model, phrases and wikipedia forward links for INEX 2009; combining language models with NLP and interactive query expansion; exploiting semantic tags in XML retrieval; the book structure extraction competition with the resurgence software at Caen university; ranking and fusion approaches for XML book retrieval; index tuning for efficient proximity-enhanced query processing; fast and effective focused retrieval; combining term-based and category-based representations for entity search; and focused search in books and wikipedia: categories, links and relevance feedback.}}


 * -- align="left" valign=top
 * IEEE Pacific Visualization Symposium 2010, PacificVis 2010 - Proceedings
 * IEEE Pacific Visualization Symposium 2010, PacificVis 2010, March 2, 2010 - March 5, 2010 Taipei, Taiwan
 * 2010


 * -- align="left" valign=top
 * JCDL'10 - Digital Libraries - 10 Years Past, 10 Years Forward, a 2020 Vision
 * 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia
 * 2010


 * -- align="left" valign=top
 * Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR'10
 * 6th Workshop on Geographic Information Retrieval, GIR'10, February 18, 2010 - February 19, 2010 Zurich, Switzerland
 * 2010
 * {{hidden||The proceedings contain 24 papers. The topics discussed include: linkable geographic ontologies; unnamed locations, underspecified regions, and other linguistic phenomena in geographic annotation; an ontology of place and service types to facilitate place-affordance geographic information retrieval; Geotagging: using proximity, sibling, and prominence clues to understand comma groups; evaluation of georeferencing; a GIR architecture with semantic-flavored query reformulation; OGC catalog service for heterogeneous earth observation metadata using extensible search indices; TWinner: understanding news queries with geo-content using Twitter; geographical classification of documents using evidence from Wikipedia; a web platform for the evaluation of vernacular place names in automatically constructed gazetteers; grounding toponyms in an Italian local news corpus; and using the geographic scopes of web documents for contextual advertising.}}


 * -- align="left" valign=top
 * 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009
 * 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009, November 11, 2009 - November 14, 2009 Washington, DC, United states
 * 2009
 * {{hidden||The proceedings contain 68 papers. The topics discussed include: multi-user multi-account interaction in groupware supporting single-display collaboration; supporting collaborative work through flexible process execution; dynamic data services: data access for collaborative networks in a multi-agent systems architecture; integrating external user profiles in collaboration applications; a collaborative framework for enforcing server commitments, and for regulating server interactive behavior in SOA-based systems; CASTLE: a social framework for collaborative anti-phishing databases; VisGBT: visually analyzing evolving datasets for adaptive learning; an IT appliance for remote collaborative review of mechanisms of injury to children in motor vehicle crashes; user contribution and trust in Wikipedia; and a new perspective on experimental analysis of N-tier systems: evaluating database scalability, multi-bottlenecks, and economical operation.}}


 * -- align="left" valign=top
 * Internet and Other Electronic Resources for Materials Education 2007
 * 136th TMS Annual Meeting, 2007, February 25, 2007 - March 1, 2007 Orlando, FL, United States
 * 2007


 * -- align="left" valign=top
 * Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings
 * 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France
 * 2007


 * -- align="left" valign=top
 * Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07
 * 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07, November 6, 2007 - November 9, 2007 Lisboa, Portugal
 * 2007
 * {{hidden||The proceedings contain 20 papers. The topics discussed include: evaluation of datalog extended with an XPath predicate; data allocation scheme based on term weight for P2P information retrieval; distributed monitoring of peer to peer systems; self-optimizing block transfer in web service grids; supporting personalized top-k skyline queries using partial compressed skycube; toward editable web browser: edit-and-propagate operation for web browsing; mining user navigation patterns for personalizing topic directories; an online PPM prediction model for web prefetching; extracting the discussion structure in comments on news articles; pattern detection from web using AFA set theory; using neighbors to date web documents; on improving Wikipedia search using article quality; and SATYA: a reputation-based approach for service discovery and selection in service oriented architectures.}}


 * -- align="left" valign=top
 * Tamagawa, Susumu; Sakurai, Shinya; Tejima, Takuya; Morita, Takeshi; Izumi, Noriaki & Yamaguchi, Takahira
 * Learning a Large Scale of Ontology from Japanese Wikipedia
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010
 * {{hidden||This paper discusses how to learn a large-scale ontology from Japanese Wikipedia. The learned ontology includes the following properties: rdfs:subClassOf (IS-A relationships), rdf:type (class-instance relationships), owl:Object/DatatypeProperty (Infobox triples), rdfs:domain (property domains), and skos:altLabel (synonyms). Experimental case studies show that the learned Japanese Wikipedia Ontology performs better than existing general linguistic ontologies, such as EDR and Japanese WordNet, in terms of building cost and richness of structural information.}}


 * -- align="left" valign=top
 * Jing, Liping; Yun, Jiali; Yu, Jian & Huang, Houkuan
 * Text Clustering via Term Semantic Units
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010
 * {{hidden||How best to represent text data is an important problem in text mining tasks including information retrieval, clustering, and classification. In this paper, we propose a compact document representation based on term semantic units, which are identified from implicit and explicit semantic information. The implicit semantic information is extracted from syntactic content via statistical methods such as latent semantic indexing and the information bottleneck. The explicit semantic information is mined from an external semantic resource (Wikipedia). The proposed compact representation model can map a document collection into a low-dimensional space (term semantic units, which are far fewer than the total number of unique terms). Experimental results on real data sets show that the compact representation efficiently improves the performance of text clustering.}}


 * -- align="left" valign=top
 * Breuing, Alexa
 * Improving Human-Agent Conversations by Accessing Contextual Knowledge from Wikipedia
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010
 * {{hidden||In order to talk to each other meaningfully, conversational partners utilize different types of conversational knowledge. Because speakers often use grammatically incomplete and incorrect sentences in spontaneous language, knowledge about conversational and terminological context turns out to be as important in language understanding as traditional linguistic analysis. In the context of the KnowCIT project we want to improve human-agent conversations by connecting the agent to an adequate representation of such contextual knowledge drawn from the online encyclopedia Wikipedia. Thereby we make use of additional components provided by Wikipedia, which go beyond encyclopedic information, to identify the current dialog topic and to implement human-like look-up abilities.}}


 * -- align="left" valign=top
 * Salahli, M.A.; Gasimzade, T.M. & Guliyev, A.I.
 * Domain specific ontology on computer science
 * Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, 2009. ICSCCW 2009. Fifth International Conference on
 * 2009
 * {{hidden||In this paper we introduce an application system based on a domain-specific ontology. Some design problems of the ontology are discussed. The ontology is based on WordNet's database and consists of Turkish and English terms on computer science and informatics. Second, we present a method for determining a set of words related to a given concept and computing the degree of semantic relatedness between them. The presented method has been used for the semantic search process carried out by our application.}}


 * -- align="left" valign=top
 * Yang, Kai-Hsiang; Kuo, Tai-Liang; Lee, Hahn-Ming & Ho, Jan-Ming
 * A Reviewer Recommendation System Based on Collaborative Intelligence
 * Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
 * 2009


 * -- align="left" valign=top
 * Mishra, Surjeet; Gorai, Amarendra; Oberoi, Tavleen & Ghosh, Hiranmay
 * Efficient Visualization of Content and Contextual Information of an Online Multimedia Digital Library for Effective Browsing
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010


 * -- align="left" valign=top
 * Jinwei, Fu; Jianhong, Sun & Tianqing, Xiao
 * A FAQ online system based on wiki
 * E-Health Networking, Digital Ecosystems and Technologies (EDT), 2010 International Conference on
 * 2010
 * {{hidden||In this paper, we propose an online FAQ system based on a wiki engine. The goal of this system is to reduce the counseling workload at our university; it can also be used in other counseling fields. The proposed system is built on one of the popular wiki engines, TikiWiki. In practical application, the functionality of the proposed system has gone far beyond that of a FAQ platform, thanks to the wiki concept and its characteristics.}}


 * -- align="left" valign=top
 * Martins, A.; Rodrigues, E. & Nunes, M.
 * Information repositories and learning environments: Creating spaces for the promotion of virtual literacy and social responsibility
 * International Association of School Librarianship. Selected Papers from the ... Annual Conference
 * 2007
 * 
 * {{hidden||Information repositories are collections of digital information which can be built in several different ways and with different purposes. They can be collaborative and with a soft control of the contents and authority of the documents, as well as directed to the general public (Wikipedia is an example of this). But they can also have a high degree of control and be conceived in order to promote literacy and responsible learning, as well as directed to special groups of users like, for instance, school students. In the new learning environments built upon digital technologies, the need to promote quality information resources that can support formal and informal e-learning emerges as one of the greatest challenges that school libraries have to face. It is now time that school libraries, namely through their regional and national school library networks, start creating their own information repositories, oriented for school pupils and directed to their specific needs of information and learning. The creation of these repositories implies a huge work of collaboration between librarians, school teachers, pupils, families and other social agents that interact within the school community, which is, in itself, a way to promote cooperative learning and social responsibility between all members of such communities. In our presentation, we will discuss the bases and principles that are behind the construction of the proposed information repositories and learning platforms as well as the need for a constant dialogue between technical and content issues.}}


 * -- align="left" valign=top
 * Lucchese, C.; Orlando, S.; Perego, R.; Silvestri, F. & Tolomei, G.
 * Detecting Task-Based Query Sessions Using Collaborative Knowledge
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010
 * {{hidden||Our research challenge is to provide a mechanism for splitting a long-term log of queries submitted to a Web Search Engine (WSE) into user task-based sessions. The hypothesis is that some query sessions entail the concept of user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider query inter-arrival times and use a novel distance function that takes care of query lexical content and exploits the collaborative knowledge collected by Wiktionary and Wikipedia.}}


 * -- align="left" valign=top
 * Cover Art
 * Computational Aspects of Social Networks, 2009. CASON '09. International Conference on
 * 2009


 * -- align="left" valign=top
 * Liu, Lei & Tan, Pang-Ning
 * A Framework for Co-classification of Articles and Users in Wikipedia
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010


 * -- align="left" valign=top
 * Ohmori, K. & Kunii, T.L.
 * Author Index
 * Cyberworlds, 2007. CW '07. International Conference on
 * 2007


 * -- align="left" valign=top
 * Missen, M.M.S. & Boughanem, M.
 * Sentence-Level Opinion-Topic Association for Opinion Detection in Blogs
 * Advanced Information Networking and Applications Workshops, 2009. WAINA '09. International Conference on
 * 2009
 * {{hidden||Opinion detection from blogs has always been a challenge for researchers. One of the challenges faced is to find documents that specifically contain opinion on users' information needs. This requires text processing at the sentence level rather than at the document level. In this paper, we propose an opinion detection approach. The proposed approach tackles the opinion detection problem by using some document-level heuristics and processing documents at the sentence level, using different semantic similarity relations of WordNet between sentence words and a list of weighted query terms expanded through the encyclopedia Wikipedia. According to initial results, our approach performs well, with a MAP of 0.2177, an improvement of 28.89% over baseline results obtained through the BM25 matching formula. TREC Blog 2006 data is used as the test data collection.}}


 * -- align="left" valign=top
 * Baeza-Yates, R.
 * Keynote Speakers
 * Web Congress, 2009. LE-WEB '09. Latin American
 * 2009
 * {{hidden||There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. These results not only impact the search performance but also the user interface, suggesting new ways of interaction. We will show live demos that find relations in Wikipedia or improve image search, already available at sandbox.yahoo.com, the demo site of Yahoo! Research. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.}}


 * -- align="left" valign=top
 * Alemzadeh, Milad & Karray, Fakhri
 * An Efficient Method for Tagging a Query with Category Labels Using Wikipedia towards Enhancing Search Engine Results
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010


 * -- align="left" valign=top
 * Indrie, Sergiu & Groza, Adrian
 * Towards social argumentative machines
 * Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on
 * 2010
 * {{hidden||This research advocates the idea of combining argumentation theory with social web technology, aiming to enact large-scale or mass argumentation. The proposed framework allows mass-collaborative editing of structured arguments in the style of semantic wikis. The Argnet system was developed based on the Semantic MediaWiki framework and on the Argument Interchange Format ontology.}}


 * -- align="left" valign=top
 * Liu, Ming-Chi; Wen, Dunwei; Kinshuk & Huang, Yueh-Min
 * Learning Animal Concepts with Semantic Hierarchy-Based Location-Aware Image Browsing and Ecology Task Generator
 * Wireless, Mobile and Ubiquitous Technologies in Education (WMUTE), 2010 6th IEEE International Conference on
 * 2010
 * {{hidden||This study first observes that the lack of an overall ecological knowledge structure is one critical reason for learners' failure with keyword search. Therefore, in order to identify the sights they are currently interested in, the dynamic location-aware and semantic hierarchy (DLASH) is presented for learners to browse images. This hierarchy mainly considers that plant and animal species are discontinuously distributed around the planet; hence it combines location information when constructing the semantic hierarchy through WordNet. After learners confirm their information needs, this study also provides three kinds of image-based learning tasks: similar-image comparison, concept-map fill-out and placement-map fill-out. These tasks are designed based on Ausubel's advance organizers and improve on them by integrating three new properties: displaying concept nodes as authentic images, automatically generating the knowledge structure by computer, and interactively integrating new and old knowledge.}}


 * -- align="left" valign=top
 * Takemoto, M.; Yokohata, Y.; Tokunaga, T.; Hamada, M. & Nakamura, T.
 * Demo: Implementation of Information-Provision Service with Smart Phone and Field Trial in Shopping Area
 * Mobile and Ubiquitous Systems: Networking & Services, 2007. MobiQuitous 2007. Fourth Annual International Conference on
 * 2007
 * {{hidden||To achieve the information-provision service, we adopted the social network concept (http://en.wikipedia.org/wiki/Social_network_service), which handles human relationships in networks. We have implemented the information recommendation mechanism, by which users may obtain suitable information from the system based on relationships with other users in the social network service. We believe that information used by people should be handled based on their behavior. We have developed an information-provision service based on our platform. We have been studying and developing the service coordination and provision architecture - ubiquitous service-oriented network (USON) (Takemoto et al., 2002) - for services in ubiquitous computing environments. We have developed an information-provision service using the social network service based on the USON architecture. This demonstration shows the implementation of the information-provision system with the actual information which was used in the field trial.}}


 * -- align="left" valign=top
 * Ayyasamy, Ramesh Kumar; Tahayna, Bashar; Alhashmi, Saadat; Gene, Siew Eu & Egerton, Simon
 * Mining Wikipedia Knowledge to improve document indexing and classification
 * Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on
 * 2010
 * {{hidden||Weblogs are an important source of information that requires automatic techniques to categorize them into “topic-based” content, to facilitate their future browsing and retrieval. In this paper we propose and illustrate the effectiveness of a new tf.idf measure. The proposed Conf.idf and Catf.idf measures are solely based on the mapping of terms-to-concepts-to-categories (TCONCAT) method that utilizes Wikipedia. The knowledge base Wikipedia is considered a large-scale Web encyclopaedia that has high-quality articles and a huge number of categorical indexes. Using this system, our proposed framework consists of two stages to solve the weblog classification problem. The first stage is to find the terms belonging to a unique concept (article), as well as to disambiguate the terms belonging to more than one concept. The second stage is the determination of the categories to which these found concepts belong. Experimental results confirm that the proposed system can distinguish weblogs that belong to more than one category efficiently and has better performance and success than traditional statistical Natural Language Processing (NLP) approaches.}}


 * -- align="left" valign=top
 * Malone, T.W.
 * Collective intelligence
 * Collaborative Technologies and Systems, 2007. CTS 2007. International Symposium on
 * 2007


 * -- align="left" valign=top
 * Zeng, Honglei; Alhossaini, Maher A.; Fikes, Richard & McGuinness, Deborah L.
 * Mining Revision History to Assess Trustworthiness of Article Fragments
 * Collaborative Computing: Networking, Applications and Worksharing, 2006. CollaborateCom 2006. International Conference on
 * 2006


 * -- align="left" valign=top
 * Qdah, Majdi Al & Falzi, Aznan
 * An Educational Game for School Students
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Abrial, J. -R. & Hoang, Thai Son
 * Using Design Patterns in Formal Methods: An Event-B Approach
 * Proceedings of the 5th international colloquium on Theoretical Aspects of Computing
 * 2008
 * 
 * {{hidden||Motivation. Formal Methods users are given sophisticated languages and tools for constructing models of complex systems. But quite often they lack some systematic methodological approaches which could help them. The goal of introducing design patterns within formal methods is precisely to bridge this gap. A design pattern is a general reusable solution to a commonly occurring problem in (software) design . . . It is a description or template for how to solve a problem that can be used in many different situations (Wikipedia on "Design Pattern").}}


 * -- align="left" valign=top
 * Adafre, Sisay Fissaha & de Rijke, Maarten
 * Discovering missing links in Wikipedia
 * Proceedings of the 3rd international workshop on Link discovery
 * 2005
 * 
 * {{hidden||In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.}}


 * -- align="left" valign=top
 * Adams, Catherine
 * Learning Management Systems as sites of surveillance, control, and corporatization: A review of the critical literature
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Al-Senaidi, Said
 * Integrating Web 2.0 in Technology based learning environment
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Allen, Matthew
 * Authentic Assessment and the Internet: Contributions within Knowledge Networks
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Allen, Nancy; Alnaimi, Tarfa Nasser & Lubaisi, Huda Ak
 * Leadership for Technology Adoption in a Reform Community
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Allen, R.B. & Nalluru, S.
 * Exploring history with narrative timelines
 * Human Interface and the Management of Information. Designing Information Environments. Symposium on Human Interface 2009, 19-24 July 2009 Berlin, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Amin, Mohammad Shafkat; Bhattacharjee, Anupam & Jamil, Hasan
 * Wikipedia driven autonomous label assignment in wrapper induced tables with missing column names
 * Proceedings of the 2010 ACM Symposium on Applied Computing
 * 2010
 * 


 * -- align="left" valign=top
 * Ammann, Alexander & Matthies, Herbert K.
 * K-Space DentMed/Visual Library: Generating and Presenting Dynamic Knowledge Spaces for Dental Research, Education, Clinical and Laboratory Practice
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Anderka, M.; Lipka, N. & Stein, B.
 * Evaluating cross-language explicit semantic analysis and cross querying
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This paper describes our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track. The task is to retrieve items from various multilingual collections of library catalog records which are relevant to a user's query. Two different strategies are employed: (i) Cross-Language Explicit Semantic Analysis (CL-ESA), where the library catalog records and the queries are represented in a multilingual concept space that is spanned by aligned Wikipedia articles, and (ii) a Cross Querying approach, where a query is translated into all target languages using Google Translate and the obtained rankings are combined. The evaluation shows that both strategies outperform the monolingual baseline and achieve comparable results. Furthermore, inspired by the Generalized Vector Space Model, we present a formal definition and an alternative interpretation of the CL-ESA model. This interpretation is interesting for real-world retrieval applications since it reveals how the computational effort for CL-ESA can be shifted from the query phase to a preprocessing phase.}}


 * -- align="left" valign=top
 * Angel, Albert; Lontou, Chara; Pfoser, Dieter & Efentakis, Alexandros
 * Qualitative geocoding of persistent web pages
 * Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
 * 2008
 * 


 * -- align="left" valign=top
 * Anma, Fumihiko & Okamoto, Toshio
 * Development of a Participatory Learning Support System based on Social Networking Service
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Antin, Judd & Cheshire, Coye
 * Readers are not free-riders: reading as a form of participation on wikipedia
 * Proceedings of the 2010 ACM conference on Computer supported cooperative work
 * 2010
 * 


 * -- align="left" valign=top
 * Anzai, Yayoi
 * Digital Trends among Japanese University Students: Focusing on Podcasting and Wikis
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Anzai, Yayoi
 * Interactions as the key for successful Web 2.0 integrated language learning: Interactions in a planetary community
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Anzai, Yayoi
 * Introducing a Wiki in EFL Writing Class
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Aoki, Kumiko & Molnar, Pal
 * International Collaborative Learning using Web 2.0: Learning of Foreign Language and Intercultural Understanding
 * Global Learn Asia Pacific
 * 2010
 * 


 * -- align="left" valign=top
 * Arney, David
 * Cooperative e-Learning and other 21st Century Pedagogies
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Ashraf, Bill
 * Teaching the Google–Eyed YouTube Generation
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Atkinson, Tom
 * Cell-Based Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Auer, Sören & Lehmann, Jens
 * What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content
 * Proceedings of the 4th European conference on The Semantic Web: Research and Applications
 * 2007
 * 
 * {{hidden||Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating the by far largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems being currently used.}}


 * -- align="left" valign=top
 * Avgerinou, Maria & Pettersson, Rune
 * How Multimedia Research Can Optimize the Design of Instructional Vodcasts
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Aybar, Hector; Juell, Paul & Shanmugasundaram, Vijayakumar
 * Increased Flexablity in Display of Course Content
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Baeza-Yates, R.
 * Mining the Web 2.0 to improve search
 * 2009 Latin American Web Congress. LA-WEB 2009, 9-11 Nov. 2009 Piscataway, NJ, USA
 * 2009
 * 
 * {{hidden||There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. These results not only impact the search performance but also the user interface, suggesting new ways of interaction. We will show live demos that find relations in the Wikipedia or improve image search, already available at sandbox.yahoo.com, the demo site of Yahoo! Research. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.}}


 * -- align="left" valign=top
 * Baeza-Yates, Ricardo
 * User generated content: how good is it?
 * Proceedings of the 3rd workshop on Information credibility on the web
 * 2009
 * 
 * {{hidden||User Generated Content (UGC) is one of the main current trends in the Web. This trend has allowed all people who can access the Internet to publish content in different media, such as text (e.g. blogs), photos or video. This data can be crucial for many applications, in particular for semantic search. It is too early to say what impact UGC will have and to what extent. However, the impact will be clearly related to the quality of this content. Hence, how good is the content that people generate in the so-called Web 2.0? Clearly it is not as good as editorial content on the Web site of a publisher. However, success stories such as the case of Wikipedia show that it can be quite good. In addition, the quality gap is balanced by volume, as user generated content is much larger than, say, editorial content. In fact, Ramakrishnan and Tomkins estimate that UGC generates from 8 to 10 GB daily while the professional Web only generates 2 GB in the same time. How can we estimate the quality of UGC? One possibility is to directly evaluate the quality, but that is not easy as it depends on the type of content and the availability of human judgments. One example of such an approach is the study of Yahoo! Answers done by Agichtein et al. In this work they start from a judged question/answer collection where good questions usually have good answers. Then they predict good questions and good answers, obtaining an AUC (area under the curve of the precision-recall graph) of 0.76 and 0.88, respectively. A second possibility is obtaining indirect evidence of the quality: for example, use UGC for a given task and then evaluate the quality of the task results. One such example is the extraction of semantic relations done by Baeza-Yates and Tiberi. To evaluate the quality of the results they used the Open Directory Project (ODP), showing that the results had a precision of over 60%.
For the cases that were not found in the ODP, a manually verified sample showed that the real precision was close to 100%. What happened was that the ODP was not specific enough to contain very specific relations, and every day the problem gets worse as we have more data. This example shows the quality of the ODP as well as the semantics encoded in queries. Notice that we can define queries as implicit UGC, because each query can be considered an implicit tag to the Web pages that are clicked for that query, and hence we have an implicit folksonomy. A final alternative is crossing different UGC sources and inferring from there the quality of those sources. An example of this case is the work by Van Zwol et al., where they use collective knowledge (wisdom of crowds) to extend image tags, and prove that almost 70% of the tags can be semantically classified by using WordNet and Wikipedia. This exposes the quality of both Flickr tags and Wikipedia. Our main motivation is that by being able to generate semantic resources automatically from the Web (and in particular the Web 2.0), even with noise, and coupling that with open content resources, we can create a virtuous feedback circuit. In fact, explicit and implicit folksonomies can be used to do supervised machine learning without the need for manual intervention (or at least to drastically reduce it) to improve semantic tagging. After that, we can feed the results back into the process and repeat. Under the right conditions, every iteration should improve the output, obtaining a virtuous cycle. As a side effect, we can also improve Web search, our main goal.}}


 * -- align="left" valign=top
 * Baker, Peter; Xiao, Yun & Kidd, Jennifer
 * Digital Natives and Digital Immigrants: A Comparison across Course Tasks and Delivery Methodologies
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Bakker, A.; Petrocco, R.; Dale, M.; Gerber, J.; Grishchenko, V.; Rabaioli, D. & Pouwelse, J.
 * Online Video Using BitTorrent And HTML5 Applied To Wikipedia
 * 2010 IEEE Tenth International Conference on Peer-to-Peer Computing (P2P 2010), 25-27 Aug. 2010 Piscataway, NJ, USA
 * 2010
 * 
 * {{hidden||Wikipedia started a project in order to enable users to add video and audio on their Wiki pages. The technical downside of this is that its bandwidth requirements will increase manifold. BitTorrent-based peer-to-peer technology from P2P-Next (a European research project) is explored to handle this bandwidth surge. We discuss the impact on the BitTorrent piece picker and outline our tribe protocol for seamless integration of P2P into the HTML5 video and audio elements. Ongoing work on libswift, which uses UDP, an enhanced transport protocol and integrated NAT/firewall puncturing, is also described.}}


 * -- align="left" valign=top
 * Balasuriya, Dominic; Ringland, Nicky; Nothman, Joel; Murphy, Tara & Curran, James R.
 * Named entity recognition in Wikipedia
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 
 * {{hidden||Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corpora or themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text may be a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and, when used as training data, outperforms newswire models by up to 7.7\%.}}


 * -- align="left" valign=top
 * Balmin, Andrey & Curtmola, Emiran
 * WikiAnalytics: disambiguation of keyword search results on highly heterogeneous structured data
 * WebDB '10 Proceedings of the 13th International Workshop on the Web and Databases
 * 2010
 * 
 * {{hidden||Wikipedia infoboxes are an example of a seemingly structured, yet extraordinarily heterogeneous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, the solutions based on keyword search are too imprecise to capture the user's intent. To address these limitations, we propose a system, referred to herein as WikiAnalytics, that utilizes a novel search paradigm in order to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records deciding which are and are not relevant. WikiAnalytics uses three categories of clustering features based on record types, fields, and values that matched the query keywords, respectively. Since the system cannot predict which combination of features will be important to the user, it efficiently generates all possible clusters of records by all sets of features. We utilize a novel data structure, the universal navigational lattice (UNL), that compactly encodes all possible clusters. WikiAnalytics provides a dynamic and intuitive interface that lets the user explore the UNL and construct homogeneous structured tables, which can be further queried and aggregated using conventional tools.}}


 * -- align="left" valign=top
 * Balog-Crisan, Radu; Roxin, Ioan & Smeureanu, Ion
 * e-Learning platforms for Semantic Web
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Banek, M.; Juric, D. & Skocir, Z.
 * Learning Semantic N-ary Relations from Wikipedia
 * Database and Expert Systems Applications. 21st International Conference, DEXA 2010, 30 Aug.-3 Sept. 2010 Berlin, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Barker, Philip
 * Using Wikis and Weblogs to Enhance Human Performance
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Barker, Philip
 * Using Wikis for Knowledge Management
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Baron, Georges-Louis & Bruillard, Eric
 * New learners, Teaching Practices and Teacher Education: Which Synergies? The French case
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Bart, Thurber & Pope, Jack
 * The Humanities in the Learning Space
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Basiel, Anthony Skip
 * The media literacy spectrum: shifting pedagogic design
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Basile, Anthony & Murphy, John
 * The Path to Open Source in Course Management Systems Used in Distance Education Programs
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Basile, Pierpaolo & Semeraro, Giovanni
 * UBA: Using automatic translation and Wikipedia for cross-lingual lexical substitution
 * Proceedings of the 5th International Workshop on Semantic Evaluation
 * 2010
 * 
 * {{hidden||This paper presents the participation of the University of Bari (UBA) at the SemEval-2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task has a strict relation with the task of automatic machine translation, but there are some differences: cross-lingual lexical substitution targets one word at a time and the main goal is to find as many good translations as possible for the given target word. Moreover, there are some connections with Word Sense Disambiguation (WSD) algorithms. Indeed, understanding the meaning of the target word is necessary to find the best substitutions. An important aspect of this kind of task is the possibility of finding synonyms without using a particular sense inventory or a specific parallel corpus, thus allowing the participation of unsupervised approaches. UBA proposes two systems: the former is based on an automatic translation system which exploits Google Translator, the latter is based on a parallel corpus approach which relies on Wikipedia in order to find the best substitutions.}}


 * -- align="left" valign=top
 * Basili, Roberto; Bos, Johan & Copestake, Ann
 * Proceedings of the 2008 Conference on Semantics in Text Processing
 * 2008
 * 
 * {{hidden||Thanks to both statistical approaches and finite state methods, natural language processing (NLP), particularly in the area of robust, open-domain text processing, has made considerable progress in the last couple of decades. It is probably fair to say that NLP tools have reached satisfactory performance at the level of syntactic processing, be the output structures chunks, phrase structures, or dependency graphs. Therefore, the time seems ripe to extend the state of the art and consider deep semantic processing as a serious task in wide-coverage NLP. This is a step that normally requires syntactic parsing, as well as integrating named entity recognition, anaphora resolution, thematic role labelling and word sense disambiguation, and other lower levels of processing for which reasonably good methods have already been developed. The goal of the STEP workshop is to provide a forum for anyone active in semantic processing of text to discuss innovative technologies, representation issues, inference techniques, prototype implementations, and real applications. The preferred processing targets are large quantities of texts, either specialised domains or open domains such as newswire text, blogs, and Wikipedia-like text. Implemented rather than theoretical work is emphasised in STEP. Featured in the STEP 2008 workshop is a "shared task" on comparing semantic representations as output by state-of-the-art NLP systems. Participants were asked to supply a (small) text before the workshop. The test data for the shared task is composed of all the texts submitted by the participants, allowing participants to "challenge" each other. The output of these systems will be judged on a number of aspects by a panel of experts in the field during the workshop.}}


 * -- align="left" valign=top
 * Bataineh, Emad & Abbar, Hend Al
 * New Mobile-based Electronic Grade Management System
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Batista, Carlos Eduardo C. F. & Schwabe, Daniel
 * LinkedTube: semantic information on web media objects
 * Proceedings of the XV Brazilian Symposium on Multimedia and the Web
 * 2009
 * 
 * {{hidden||LinkedTube is a service to create semantic and non-semantic relationships between videos available on services on the Internet (such as YouTube) and external elements (such as Wikipedia, the Internet Movie Database, DBpedia, etc.). The relationships are defined based on semantic entities obtained through an analysis of textual elements related to the video: its metadata, tags, user comments and external related content (such as sites linking to the video). The set of data comprising the extracted entities and the video metadata is used to define semantic relations between the video and the semantic entities from the Linked Data Cloud. Those relationships are defined using a vocabulary extended from MOWL, based on an extensible set of rules for analysis of the video's related content.}}


 * -- align="left" valign=top
 * Battye, Greg
 * Turning the ship around while changing horses in mid-stream: Building a University-wide framework for Online and Blended Learning at the University of Canberra
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Baytiyeh, Hoda & Pfaffman, Jay
 * Why be a Wikipedian
 * Proceedings of the 9th international conference on Computer supported collaborative learning - Volume 1
 * 2009
 * 


 * -- align="left" valign=top
 * Bechet, F. & Charton, E.
 * Unsupervised knowledge acquisition for extracting named entities from speech
 * 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010, 14-19 March 2010 Dallas, TX, USA
 * 2010
 * 
 * {{hidden||This paper presents a Named Entity Recognition (NER) method dedicated to processing speech transcriptions. The main principle behind this method is to collect, in an unsupervised way, lexical knowledge for all entries in the ASR lexicon. This knowledge is gathered with two methods: by automatically extracting NEs on a very large set of textual corpora and by exploiting directly the structure contained in the Wikipedia resource. This lexical knowledge is used to update the statistical models of our NER module, based on a mixed approach with generative models (Hidden Markov Models, HMM) and discriminative models (Conditional Random Fields, CRF). This approach has been evaluated within the French ESTER 2 evaluation program and obtained the best results at the NER task on ASR transcripts.}}


 * -- align="left" valign=top
 * Becker, Katrin
 * Teaching Teachers about Serious Games
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Belz, Anja; Kow, Eric & Viethen, Jette
 * The GREC named entity generation challenge 2009: overview and evaluation results
 * Proceedings of the 2009 Workshop on Language Generation and Summarisation
 * 2009
 * 
 * {{hidden||The GREC-NEG Task at Generation Challenges 2009 required participating systems to select coreference chains for all people entities mentioned in short encyclopaedic texts about people collected from Wikipedia. Three teams submitted six systems in total, and we additionally created four baseline systems. Systems were tested automatically using a range of existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in an intrinsic evaluation involving human judges. This report describes the GREC-NEG Task and the evaluation methods applied, gives brief descriptions of the participating systems, and presents the evaluation results.}}


 * -- align="left" valign=top
 * Belz, Anja; Kow, Eric; Viethen, Jette & Gatt, Albert
 * The GREC challenge: overview and evaluation results
 * Proceedings of the Fifth International Natural Language Generation Conference
 * 2008
 * 
 * {{hidden||The GREC Task at REG '08 required participating systems to select coreference chains to the main subject of short encyclopaedic texts collected from Wikipedia. Three teams submitted a total of 6 systems, and we additionally created four baseline systems. Systems were tested automatically using a range of existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in a reading/comprehension experiment involving human subjects. This report describes the GREC Task and the evaluation methods, gives brief descriptions of the participating systems, and presents the evaluation results.}}


 * -- align="left" valign=top
 * Bernardis, Daniela
 * Education and Pervasive Computing. Didactical Use of the Mobile Phone: Create and Share Information Concerning Artistic Heritages and the Environment.
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Bhattacharya, Madhumita & Dron, Jon
 * Mining Collective Intelligence for Creativity and Innovation: A Research proposal
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Bjelland, Tor Kristian & Nordbotten, Svein
 * A Best Practice Online Course Architect
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Black, Aprille Noe & Falls, Jane
 * The Use of Web 2.0 Tools for Collaboration and the Development of 21st Century Skills
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Blocher, Michael & Tu, Chih-Hsiung
 * Utilizing a Wiki to Construct Knowledge
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Blok, Rasmus & Godsk, Mikkel
 * Podcasts in Higher Education: What Students Want, What They Really Need, and How This Might be Supported
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Bocek, Thomas; Peric, Dalibor; Hecht, Fabio; Hausheer, David & Stiller, Burkhard
 * PeerVote: A Decentralized Voting Mechanism for P2P Collaboration Systems
 * Proceedings of the 3rd International Conference on Autonomous Infrastructure, Management and Security: Scalability of Networks and Services
 * 2009
 * 
 * {{hidden||Peer-to-peer (P2P) systems achieve scalability, fault tolerance, and load balancing with a low-cost infrastructure, characteristics from which collaboration systems, such as Wikipedia, can benefit. A major challenge in P2P collaboration systems is to maintain article quality after each modification in the presence of malicious peers. A way of achieving this goal is to allow modifications to take effect only if a majority of previous editors approve the changes through voting. The absence of a central authority makes voting a challenge in P2P systems. This paper proposes the fully decentralized voting mechanism PeerVote, which enables users to vote on modifications to articles in a P2P collaboration system. Simulations and experiments show the scalability and robustness of PeerVote, even in the presence of malicious peers.}}


 * -- align="left" valign=top
 * Bonk, Curtis
 * The World is Open: How Web Technology Is Revolutionizing Education
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Bouma, Gosse; Duarte, Sergio & Islam, Zahurul
 * Cross-lingual alignment and completion of Wikipedia templates
 * Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies
 * 2009
 * 


 * -- align="left" valign=top
 * Bouma, G.; Fahmi, I.; Mur, J.; van Noord, G.; van der Plas, L. & Tiedemann, J.
 * Using syntactic knowledge for QA*
 * Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany
 * 2007
 * {{hidden||We describe the system of the University of Groningen for the monolingual Dutch and multilingual English to Dutch QA tasks. First, we give a brief outline of the architecture of our QA system, which makes heavy use of syntactic information. Next, we describe the modules that were improved or developed especially for the CLEF tasks, among others the incorporation of syntactic knowledge in IR, incorporation of lexical equivalences and coreference resolution, and a baseline multilingual (English to Dutch) QA system, which uses a combination of Systran and Wikipedia (for term recognition and translation) for question translation. For non-list questions, 31\% (20\%) of the highest ranked answers returned by the monolingual (multilingual) system were correct.}}


 * -- align="left" valign=top
 * Boyles, Michael; Frend, Chauney; Rogers, Jeff; William, Albert; Reagan, David & Wernert, Eric
 * Leveraging Pre-Existing Resources at Institutions of Higher Education for K-12 STEM Engagement
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Bra, Paul De; Smits, David; van der Sluijs, Kees; Cristea, Alexandra & Hendrix, Maurice
 * GRAPPLE: Personalization and Adaptation in Learning Management Systems
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Brachman, Ron
 * Emerging Sciences of the Internet: Some New Opportunities
 * Proceedings of the 4th European conference on The Semantic Web: Research and Applications
 * 2007
 * 
 * {{hidden||Semantic Web technologies have started to make a difference in enterprise settings and have begun to creep into use in limited parts of the World Wide Web. As is common in overview articles, it is easy to imagine scenarios in which the Semantic Web could provide important infrastructure for activities across the broader Internet. Many of these seem to be focused on improvements to what is essentially a search function (e.g., "list the prices of flat screen HDTVs larger than 40 inches with 1080p resolution at shops in the nearest town that are open until 8pm on Tuesday evenings"), and such capabilities will surely be of use to future Internet users. However, if one looks closely at the research agendas of some of the largest Internet companies, it is not clear that the staples of SW thinking will intersect the most important paths of the major broad-spectrum service providers. Some of the emerging trends in the research labs of key industry players indicate that SW goals generally taken for granted may be less central than envisioned, and that the biggest opportunities may come from some less obvious directions. Given the level of investment and the global reach of big players like Yahoo! and Google, it would pay us to look more closely at some of their fundamental investigations.}}


 * -- align="left" valign=top
 * Bradshaw, Daniele; Siko, Kari Lee; Hoffman, William; Talvitie-Siple, June; Fine, Bethann; Carano, Ken; Carlson, Lynne A.; Mixon, Natalie K; Rodriguez, Patricia; Sheffield, Caroline C.; Sullens-Mullican, Carey; Bolick, Cheryl & Berson, Michael J.
 * The Use of Videoconferencing as a Medium for Collaboration of Experiences and Dialogue Among Graduate Students: A Case Study from Two Southeastern Universities
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Bristow, Paul
 * The Digital Divide: an age old question?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Bruckman, Amy
 * Social Support for Creativity and Learning Online
 * Proceedings of the 2008 Second IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning
 * 2008
 * 


 * -- align="left" valign=top
 * Brunetti, Korey & Townsend, Lori
 * Extreme (Class) Makeover: Engaging Information Literacy Students with Web 2.0
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Brunvand, Stein & Bouwman, Jeffrey
 * The Math Boot Camp Wiki: Using a Wiki to Extend the Learning Beyond June
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Brusilovsky, Peter; Yudelson, Michael & Sosnovsky, Sergey
 * Collaborative Paper Exchange
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2005
 * 


 * -- align="left" valign=top
 * Bucur, Johanna
 * Teacher and Student Support Services for eLearning in Higher Education
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Bulkowski, Aleksander; Nawarecki, Edward & Duda, Andrzej
 * Peer-to-Peer Dissemination of Learning Objects for Creating Collaborative Learning Communities
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Bullock, Shawn
 * The Challenge of Digital Technologies to Educational Reform
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Buriol, Luciana S.; Castillo, Carlos; Donato, Debora; Leonardi, Stefano & Millozzi, Stefano
 * Temporal Analysis of the Wikigraph
 * Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
 * 2006
 * 
 * {{hidden||Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the evolution of Wikipedia over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs for which temporal data is usually not available.}}


 * -- align="left" valign=top
 * Buscaldi, D. & Rosso, P.
 * A bag-of-words based ranking method for the Wikipedia question answering task
 * Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany
 * 2007
 * {{hidden||This paper presents a simple approach to the Wikipedia question answering pilot task in CLEF 2006. The approach ranks the snippets, retrieved using the Lucene search engine, by means of a similarity measure based on bags of words extracted from both the snippets and the articles in Wikipedia. Our participation was in the monolingual English and Spanish tasks. We obtained the best results in the Spanish one.}}


 * -- align="left" valign=top
 * Buscaldi, Davide & Rosso, Paolo
 * A comparison of methods for the automatic identification of locations in wikipedia
 * Proceedings of the 4th ACM workshop on Geographical information retrieval
 * 2007
 * 
 * {{hidden||In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia. The methods are a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier trained over a randomly selected subset of the English Wikipedia. This task may be included in the broader task of Named Entity classification, a well-known problem in the field of Natural Language Processing. The experiments were carried out considering both the full text of the articles and only the definition of the entity being described in the article. The obtained results show that the information contained in the page templates and the category labels is more useful than the text of the articles.}}


 * -- align="left" valign=top
 * Butler, Janice W.
 * A Whodunit in Two Acts: An Online Murder Mystery that Enhances Library and Internet Search Skills
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Butnariu, Cristina & Veale, Tony
 * UCD-S1: a hybrid model for detecting semantic relations between noun pairs in text
 * Proceedings of the 4th International Workshop on Semantic Evaluations
 * 2007
 * 
 * {{hidden||We describe a supervised learning approach to categorizing inter-noun relations, based on Support Vector Machines, that builds a different classifier for each of seven semantic relations. Each model uses the same learning strategy, while a simple voting procedure based on five trained discriminators with various blends of features determines the final categorization. The features that characterize each of the noun pairs are a blend of lexical-semantic categories extracted from WordNet and several flavors of syntactic patterns extracted from various corpora, including Wikipedia and the WMTS corpus.}}


 * -- align="left" valign=top
 * Byron, Akilah
 * The Use of Open Source to mitigate the costs of implementing E-Government in the Caribbean
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Bélisle, Claire
 * Academic Use of Online Encyclopedias
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2005
 * 


 * -- align="left" valign=top
 * De la Calzada, Gabriel & Dekhtyar, Alex
 * On measuring the quality of Wikipedia articles
 * Proceedings of the 4th workshop on Information credibility
 * 2010
 * 


 * -- align="left" valign=top
 * Capuano, Nicola; Pierri, Anna; Colace, Francesco; Gaeta, Matteo & Mangione, Giuseppina Rita
 * A mash-up authoring tool for e-learning based on pedagogical templates
 * Proceedings of the first ACM international workshop on Multimedia technologies for distance learning
 * 2009
 * 
 * {{hidden||The purpose of this paper is twofold. On the one hand, it aims at presenting the "pedagogical template" methodology for the definition of didactic activities through the aggregation of atomic learning entities on the basis of pre-defined schemas. On the other hand, it proposes a Web-based authoring tool to build learning resources applying the defined methodology. The authoring tool is inspired by mash-up principles and allows the combination of local learning entities with learning entities coming from external Web 2.0 sources such as Wikipedia, Flickr, YouTube and SlideShare. Finally, the results of a small-scale experimentation inside a university course, aimed both at defining a pedagogical template for "virtual scientific experiments" and at building and deploying learning resources applying such a template, are presented.}}


 * -- align="left" valign=top
 * Carano, Kenneth; Keefer, Natalie & Berson, Michael
 * Mobilizing Social Networking Technology to Empower a New Generation of Civic Activism Among Youth
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Cardoso, N.
 * GikiCLEF Topics and Wikipedia Articles: Did They Blend?
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This paper presents a post-hoc analysis on how the Wikipedia collections fared in providing answers and justifications to GikiCLEF topics. Based on all solutions found by all GikiCLEF participant systems, this paper measures how self-sufficient the particular Wikipedia collections were to provide answers and justifications for the topics, in order to better understand the recall limit that a GikiCLEF system specialised in one single language has.}}


 * -- align="left" valign=top
 * Cardoso, N.; Batista, D.; Lopez-Pellicer, F.J. & Silva, M.J.
 * Where In The Wikipedia Is That Answer? The XLDB At The GikiCLEF 2009 Task
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||We developed a new semantic question analyser for a custom prototype assembled for participating in GikiCLEF 2009, which processes grounded concepts derived from terms, and uses information extracted from knowledge bases to derive answers. We also evaluated a newly developed named-entity recognition module, based on Conditional Random Fields, and a new world geo-ontology, derived from Wikipedia, which is used in the geographic reasoning process.}}


 * -- align="left" valign=top
 * Carter, B.
 * Beyond Google: Improving learning outcomes through digital literacy
 * International Association of School Librarianship. Selected Papers from the ... Annual Conference
 * 2009


 * -- align="left" valign=top
 * Cataltepe, Z.; Turan, Y. & Kesgin, F.
 * Turkish document classification using shorter roots
 * 2007 15th IEEE Signal Processing and Communications Applications, 11-13 June 2007, Piscataway, NJ, USA
 * 2007
 * {{hidden||Stemming is one of the commonly used pre-processing steps in document categorization. Especially when fast and accurate classification of a large number of documents is needed, it is important to have as few and as short roots as possible. This would not only reduce the time it takes to train and test classifiers but would also reduce the storage requirements for each document. In this study, we analyze the performance of classifiers when the longest or shortest roots found by a stemmer are used. We also analyze the effect of using only the consonants in the roots. We use two document data sets, obtained from the Milliyet newspaper and Wikipedia, to analyze the classification accuracy of classifiers when roots obtained under these four conditions are used. We also analyze the classification accuracy when only the first 4, 3 or 2 letters or consonants of the roots are used. Using smaller roots results in lower-dimensional TF-IDF vectors. Especially for small TF-IDF vectors, using only consonants in the roots gives better performance than using all letters in the roots.}}


 * -- align="left" valign=top
 * Chan, Michael; Chan, Stephen Chi-fai & Leung, Cane Wing-ki
 * Online Search Scope Reconstruction by Connectivity Inference
 * Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
 * 2007
 * 


 * -- align="left" valign=top
 * Chan, Peter & Dovchin, Tuul
 * Evaluation Study of the Development of Multimedia Cases for Training Mongolian Medical Professionals
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Charles, Elizabeth S.; Lasry, Nathaniel & Whittaker, Chris
 * Does scale matter: using different lenses to understand collaborative knowledge building
 * Proceedings of the 9th International Conference of the Learning Sciences - Volume 2
 * 2010
 * 
 * {{hidden||Web-based environments for communicating, networking and sharing information, often referred to collectively as Web 2.0, have become ubiquitous - e.g., Wikipedia, Facebook, Flickr, or YouTube. Understanding how such technologies can promote participation, collaboration and co-construction of knowledge, and how such affordances could be used for educational purposes, has become a focus of research in the Learning Sciences and CSCL communities (e.g., Dohn, 2009; Greenhow et al., 2009). One important mechanism is self-organization, which includes the regulation of feedback loops and the flows of information and resources within an activity system (Holland, 1996). But the study of such mechanisms calls for new ways of thinking about the unit of analysis, and the development of analytic tools that allow us to move back and forth through levels of activity systems that are designed to promote learning. Here, we propose that content analysis can focus on the flows of resources (i.e., content knowledge, scientific artifacts, epistemic beliefs) in terms of how they are established and the factors affecting whether they are taken up by members of the community.}}


 * -- align="left" valign=top
 * Charnitski, Christina W. & Harvey, Francis A.
 * The Clash Between School and Corporate Reality
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Chen, Irene L. & Beebe, Ronald
 * Assessing Students’ Wiki Projects: Alternatives and Implications
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Chen, Pearl; Wan, Peiwen & Son, Jung-Eun
 * Web 2.0 and Education: Lessons from Teachers’ Perspectives
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Chen, Jing-Ying
 * Resource-Oriented Computing: Towards a Universal Virtual Workspace
 * Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 02
 * 2007
 * 
 * {{hidden||Emerging popular Web applications such as blogs and Wikipedia are transforming the Internet into a global collaborative environment where most people can participate and contribute. When the resources created by and shared among people are not just content but also software artifacts, a much more accommodating, universal, virtual workspace is foreseeable that can support people with diverse backgrounds and needs. Realizing this goal requires not only the necessary infrastructure support for resource deployment and composition, but also strategies and mechanisms to handle the implied complexity. We propose a service-oriented architecture in which arbitrary resources are associated with syntactical descriptors, called metaphors, based on which runtime services can be instantiated and managed. Furthermore, service composition can be achieved through syntactic metaphor composition. We demonstrate our approach via an E-Science workbench that allows users to access and combine distributed computing and storage resources in a flexible manner.}}


 * -- align="left" valign=top
 * Seals, Cheryl; Zhang, Lei & Gilbert, Juan
 * Human Centered Computing Lab Web Site Redesign Effort
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Chikhi, Nacim Fateh; Rothenburger, Bernard & Aussenac-Gilles, Nathalie
 * A Comparison of Dimensionality Reduction Techniques for Web Structure Mining
 * Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
 * 2007
 * 
 * {{hidden||In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in web hyperlink connectivity. We apply and compare four DRTs, namely Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA) and Random Projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; and the well-known WebKb dataset, used in a large number of works on the analysis of hyperlink connectivity, seems ill-suited for this task, so we suggest instead using the recent Wikipedia dataset, which is better suited.}}


 * -- align="left" valign=top
 * Chin, Alvin; Hotho, Andreas & Strohmaier, Markus
 * Proceedings of the International Workshop on Modeling Social Media
 * 2010
 * 
 * 


 * -- align="left" valign=top
 * Choi, Boreum; Alexander, Kira; Kraut, Robert E. & Levine, John M.
 * Socialization tactics in wikipedia and their effects
 * Proceedings of the 2010 ACM conference on Computer supported cooperative work
 * 2010
 * 


 * -- align="left" valign=top
 * Chong, Ng & Yamamoto, Michihiro
 * Using Many Wikis for Collaborative Writing
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Chou, Chen-Hsiung
 * Multimedia in Higher Education of Tourism
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Choudhury, Monojit; Hassan, Samer; Mukherjee, Animesh & Muresan, Smaranda
 * Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
 * 2009
 * 
 * {{hidden||The last few years have shown a steady increase in applying graph-theoretic models to computational linguistics. In many NLP applications, entities can be naturally represented as nodes in a graph and relations between them can be represented as edges. There has been extensive research showing that graph-based representations of linguistic units such as words, sentences and documents give rise to novel and efficient solutions in a variety of NLP tasks, ranging from part-of-speech tagging, word sense disambiguation and parsing, to information extraction, semantic role labeling, summarization, and sentiment analysis. More recently, complex network theory, a popular modeling paradigm in statistical mechanics and the physics of complex systems, has proven to be a promising tool for understanding the structure and dynamics of languages. Complex network based models have been applied to areas as diverse as language evolution, acquisition, historical linguistics, mining and analyzing the social networks of blogs and emails, link analysis and information retrieval, information extraction, and representation of the mental lexicon. In order to make this field of research more visible, this time the workshop incorporated a special theme on Cognitive and Social Dynamics of Languages in the framework of Complex Networks. Cognitive dynamics of languages includes topics focused primarily on language acquisition, which can be extended to language change (historical linguistics) and language evolution as well. Since the latter phenomena are also governed by social factors, we can further classify them under social dynamics of languages. In addition, social dynamics of languages also includes topics such as mining the social networks of blogs and emails. A collection of articles pertaining to this special theme will be compiled in a special issue of the Computer Speech and Language journal.
This volume contains papers accepted for presentation at TextGraphs-4, the 2009 Workshop on Graph-Based Methods for Natural Language Processing. The event took place on August 7, 2009, in Suntec, Singapore, immediately following ACL/IJCNLP 2009, the joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing. Being the fourth workshop on this topic, we were able to build on the success of the previous TextGraphs workshops, held as part of HLT-NAACL 2006, HLT-NAACL 2007 and Coling 2008. It aimed at bringing together researchers working on problems related to the use of graph-based algorithms for NLP and on pure graph-theoretic methods, as well as those applying complex networks for explaining language dynamics. Like last year, TextGraphs-4 has also been endorsed by SIGLEX. We issued calls for both regular and short papers. Nine regular and three short papers were accepted for presentation, based on the careful reviews of our program committee. Our sincere thanks to all the program committee members for their thoughtful, high quality and elaborate reviews, especially considering our extremely tight time frame for reviewing. The papers appearing in this volume have surely benefited from their expert feedback. This year's workshop attracted papers employing graphs in a wide range of settings, and we are therefore proud to present a very diverse program. We received quite a few papers on discovering semantic similarity through random walks. Daniel Ramage et al. explore random walk based methods to discover semantic similarity in texts, while Eric Yeh et al. attempt to discover semantic relatedness through random walks on Wikipedia. Amaç Herdağdelen et al. describe a method for measuring semantic relatedness with vector space models and random walks.}}


 * -- align="left" valign=top
 * Choulat, Tracey
 * Teacher Education and Internet Safety
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Clauson, Kevin A; Polen, Hyla H; Boulos, Maged N K & Dzenowagis, Joan H
 * Accuracy and completeness of drug information in Wikipedia
 * AMIA Annual Symposium Proceedings / AMIA Symposium
 * 2008
 * 


 * -- align="left" valign=top
 * Clow, Doug
 * Resource Discovery: Heavy and Light Metadata Approaches
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2004
 * 


 * -- align="left" valign=top
 * Colazzo, Luigi; Magagnino, Francesco; Molinari, Andrea & Villa, Nicola
 * From e-learning to Social Networking: a Case Study
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Cook, John
 * Generating New Learning Contexts: Novel Forms of Reuse and Learning on the Move
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Copeland, Nancy & Bednar, Anne
 * Mobilizing Educational Technologists in a Collaborative Online Community to Develop a Knowledge Management System as a Wiki
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Corbeil, Joseph Rene & Valdes-Corbeil, Maria Elena
 * Enhance Your Online Courses by Re-Engineering The Courseware Management System
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Cosley, Dan; Frankowski, Dan; Terveen, Loren & Riedl, John
 * Using intelligent task routing and contribution review to help communities build artifacts of lasting value
 * Proceedings of the SIGCHI conference on Human Factors in computing systems
 * 2006
 * 
 * {{hidden||Many online communities are emerging that, like Wikipedia, bring people together to build community-maintained artifacts of lasting value (CALVs). Motivating people to contribute is a key problem because the quantity and quality of contributions ultimately determine a CALV's value. We pose two related research questions: 1) How does intelligent task routing (matching people with work) affect the quantity of contributions? 2) How does reviewing contributions before accepting them affect the quality of contributions? A field experiment with 197 contributors shows that simple, intelligent task routing algorithms have large effects. We also model the effect of reviewing contributions on the value of CALVs. The model predicts, and experimental data show, that value grows more slowly with review before acceptance. It also predicts, surprisingly, that a CALV will reach the same final value whether contributions are reviewed before or after they are made available to the community.}}


 * -- align="left" valign=top
 * Costa, Luís Fernando
 * Using answer retrieval patterns to answer Portuguese questions
 * Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access
 * 2008
 * 
 * {{hidden||Esfinge is a general-domain Portuguese question answering system which has been participating in QA@CLEF since 2004. It uses the information available in the "official" document collections used in QA@CLEF (newspaper text and Wikipedia) and information from the Web as an additional resource when searching for answers. As regards the use of external tools, Esfinge uses a syntactic analyzer, a morphological analyzer and a named entity recognizer. This year an alternative approach to retrieving answers was tested: whereas in previous years search patterns were used to retrieve relevant documents, this year a new type of search pattern was also used to extract the answers themselves. We also evaluated the second and third best answers returned by Esfinge. This evaluation showed that when Esfinge answers a question correctly, it usually does so with its first answer. Furthermore, the experiments revealed that the answer retrieval patterns created for this participation improve the results, but only for definition questions.}}


 * -- align="left" valign=top
 * Coursey, Kino & Mihalcea, Rada
 * Topic identification using Wikipedia graph centrality
 * Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
 * 2009
 * 


 * -- align="left" valign=top
 * Coutinho, Clara
 * Using Blogs, Podcasts and Google Sites as Educational Tools in a Teacher Education Program
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Coutinho, Clara
 * Web 2.0 technologies as cognitive tools: preparing future k-12 teachers
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Coutinho, Clara & Junior, João Bottentuit
 * Using social bookmarking to enhance cooperation/collaboration in a Teacher Education Program
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Coutinho, Clara & Junior, João Batista Bottentuit
 * Web 2.0 in Portuguese Academic Community: An Exploratory Survey
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Coutinho, Clara & Rocha, Aurora
 * Screencast and Vodcast: An Experience in Secondary Education
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Crawford, Caroline; Smith, Richard A. & Smith, Marion S.
 * Podcasting in the Learning Environment: From Podcasts for the Learning Community, Towards the Integration of Podcasts within the Elementary Learning Environment
 * Society for Information Technology \& Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Crawford, Caroline M. & Thomson, Jennifer
 * Graphic Novels as Visual Human Performance and Training Tools: Towards an Understanding of Information Literacy for Preservice Teachers
 * Society for Information Technology \& Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Cui, Gaoying; Lu, Qin; Li, Wenjie & Chen, Yirong
 * Mining Concepts from Wikipedia for Ontology Construction
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
 * 2009
 * 
 * {{hidden||An ontology is a structured knowledge base of concepts organized by the relations among them. But concepts are usually mixed with their instances in the corpora used for knowledge extraction. Concepts and their corresponding instances share similar features and are difficult to distinguish. In this paper, a novel approach is proposed to comprehensively obtain concepts with the help of definition sentences and category labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wiki pages. The precision of 78.5% makes it an effective approach for mining concepts from Wikipedia for ontology construction.}}


 * -- align="left" valign=top
 * Cummings, Jeff; Massey, Anne P. & Ramesh, V.
 * Web 2.0 proclivity: understanding how personal use influences organizational adoption
 * Proceedings of the 27th ACM international conference on Design of communication
 * 2009
 * 
 * {{hidden||Web 2.0 represents a major shift in how individuals communicate and collaborate with others. While many of these technologies have been used for public, social interactions (e.g., Wikipedia and YouTube), organizations are just beginning to explore their use in day-to-day operations. Due to its relatively recent introduction and public popularity, Web 2.0 has led to a resurgent focus on how organizations can once again leverage technology within the organization for virtual and mass collaboration. In this paper, we explore some of the key questions facing organizations with regard to Web 2.0 implementation and adoption. We develop a model of "Web 2.0 Proclivity," defined as an individual's propensity to use Web 2.0 tools within the organization. Our model and set of associated hypotheses focus on understanding an employee's internal Web 2.0 content behaviors based on non-work personal use behaviors. To test our model and hypotheses, survey-based data were collected from a global engine design and manufacturing company. Our results show that Web 2.0 Proclivity is positively influenced by an employee's external behaviors and that differences exist across both functional departments and employee work roles. We discuss the research implications of our findings as well as how our findings and model of Web 2.0 Proclivity can be used to help guide organizational practice.}}


 * -- align="left" valign=top
 * Cusinato, Alberto; Mea, Vincenzo Della; Salvatore, Francesco Di & Mizzaro, Stefano
 * QuWi: quality control in Wikipedia
 * Proceedings of the 3rd workshop on Information credibility on the web
 * 2009
 * 
 * {{hidden||We propose and evaluate QuWi (Quality in Wikipedia), a framework for quality control in Wikipedia. We build upon a previous proposal by Mizzaro [11], who proposed a method for substituting and/or complementing peer review in scholarly publishing. Since articles in Wikipedia are never finished, and their authors change continuously, we define a modified algorithm that takes into account the different domain, with particular attention to the fact that authors contribute identifiable pieces of information that can be further modified by other authors. The algorithm assigns quality scores to articles and contributors. The scores assigned to articles can be used, e.g., to let readers judge how reliable the articles they are looking at are, or to help contributors identify low-quality articles to be enhanced. The scores assigned to users measure the average quality of their contributions to Wikipedia and can be used, e.g., for conflict resolution policies based on the quality of the users involved. Our proposed algorithm is experimentally evaluated by analyzing the quality scores obtained for articles for deletion and featured articles, also on six temporal Wikipedia snapshots. Preliminary results demonstrate that the proposed algorithm seems to appropriately identify high- and low-quality articles, and that high-quality authors produce more long-lived contributions than low-quality authors.}}


 * -- align="left" valign=top
 * Cuthell, John & Preston, Christina
 * An interactivist e-community of practice using Web 2.0 tools
 * Society for Information Technology \& Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Dale, Michael; Stern, Abram; Deckert, Mark & Sack, Warren
 * System demonstration: Metavid.org: a social website and open archive of congressional video
 * Proceedings of the 10th Annual International Conference on Digital Government Research: Social Networks: Making Connections between Citizens, Data and Government
 * 2009
 * 
 * {{hidden||We have developed Metavid.org, a site that archives video footage of the U.S. Senate and House floor proceedings. Visitors can search for who said what and when, and also download, remix, blog, edit, discuss, and annotate transcripts and metadata. The site has been built with Open Source Software (OSS) and the video is archived in an OSS codec (Ogg Theora). We highlight two aspects of the Metavid design: (1) open standards; and (2) wiki functionality. First, open standards allow Metavid to function both as a platform, on top of which other sites can be built, and as a resource for "mashing" (i.e., semi-automatically assembling custom websites). For example, Voterwatch.org pulls its video from the Metavid archive. Second, Metavid extends the MediaWiki software (which is the foundation of Wikipedia) into the domain of collaborative video authoring. This extension allows closed-captioned text or video sequences to be collectively edited.}}


 * -- align="left" valign=top
 * Dallman, Alicia & McDonald, Michael
 * Upward Bound Success: Climbing the Collegiate Ladder with Web 2.0 Wikis
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Danyaro, K.U.; Jaafar, J.; Lara, R.A.A. De & Downe, A.G.
 * An evaluation of the usage of Web 2.0 among tertiary level students in Malaysia
 * 2010 International Symposium on Information Technology (ITSim 2010), 15-17 June 2010, Piscataway, NJ, USA
 * 2010
 * 
 * {{hidden||Web 2.0 is increasingly becoming a familiar pedagogical tool in higher education, facilitating the process of teaching and learning. But this advancement in information technology has also aggravated problems like plagiarism and other academic misconduct. This paper evaluates the patterns of use and behavior of tertiary-level students towards the use of Web 2.0 as an alternative and supplemental E-Learning Portal. Data from a total of 92 students were collected and analyzed according to Self-Determination Theory (SDT). It was found that students use social websites for chatting, gaming and sharing files. Facebook, YouTube and Wikipedia are ranked as the most popular websites used by college students. It also reveals that students have an inherent desire to express ideas and opinions online openly and independently. This sense of freedom makes students feel more competent, autonomous and participative, and makes learning less tedious. Therefore, this report recommends that educators adopt strategies for acknowledging students' feelings and activities online to reinforce positive behavior and effective learning. Finally, we discuss the implications of Web 2.0 for education.}}


 * -- align="left" valign=top
 * DeGennaro, Donna & Kress, Tricia
 * Looking to Transform Learning: From Social Transformation in the Public Sphere to Authentic Learning in the Classroom
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Dehinbo, Johnson
 * Strategy for progressing from in-house training into e-learning using Activity Theory at a South African university
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Dehinbo, Johnson
 * Suitable research paradigms for social inclusion through enhancement of Web applications development in developing countries
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Desjardins, Francois & vanOostveen, Roland
 * Collaborative Online Learning Environment: Towards a process driven approach and collective knowledge building
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Desmontils, E.; Jacquin, C. & Monceaux, L.
 * Question types specification for the use of specialized patterns in Prodicos system
 * Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany
 * 2007
 * {{hidden||We present the second version of the Prodicos query answering system, which was developed by the TALN team at the LINA institute. The main improvements concern, on the one hand, the use of external knowledge (Wikipedia) to improve the passage selection step and, on the other hand, the answer extraction step, which now applies four different strategies for locating the answer to a question depending on its type. For the passage selection and answer extraction modules, an evaluation is then presented to justify the results obtained.}}


 * -- align="left" valign=top
 * Dicheva, Darina & Dichev, Christo
 * Helping Courseware Authors to Build Ontologies: The Case of TM4L
 * Proceeding of the 2007 conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work
 * 2007
 * 


 * -- align="left" valign=top
 * Diem, Richard
 * Technology and Culture: A Conceptual Framework
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Diplaris, S.; Kompatsiaris, I.; Flores, A.; Escriche, M.; Sigurbjornsson, B.; Garcia, L. & van Zwol, R.
 * Collective Intelligence in Mobile Consumer Social Applications
 * 2010 Ninth International Conference on Mobile Business & 2010 Ninth Global Mobility Roundtable. ICMB-GMR 2010, 13-15 June 2010 Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Dixon, Brian
 * Reflective Video Journals and Adolescent Metacognition: An exploratory study
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Dobrila, T.-A.; Diaconasu, M.-C.; Lungu, I.-D. & Iftene, A.
 * Methods for Classifying Videos by Subject and Detecting Narrative Peak Points
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||2009 marked UAIC's first participation in the VideoCLEF evaluation campaign. Our group built two separate systems for the "Subject Classification" and "Affect Detection" tasks. For the first task we created two resources starting from Wikipedia pages and pages identified with Google, and used two tools for classification: Lucene and Weka. For the second task we extracted the audio component from a given video file using FFmpeg. After that we computed the average amplitude for each word in the transcript by applying the Fast Fourier Transform algorithm in order to analyze the sound. A brief description of our systems' components is given in this paper.}}


 * -- align="left" valign=top
 * Dodge, Bernie & Molebash, Philip
 * Mini-Courses for Teaching with Technology: Thinking Outside the 3-Credit Box
 * Society for Information Technology & Teacher Education International Conference
 * 2005
 * 


 * -- align="left" valign=top
 * Dominik, Magda
 * The Alternate Reality Game: Learning Situated in the Realities of the 21st Century
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Dondio, P.; Barrett, S.; Weber, S. & Seigneur, J.M.
 * Extracting trust from domain analysis: a case study on the Wikipedia project
 * Autonomic and Trusted Computing. Third International Conference, ATC 2006. Proceedings, 3-6 Sept. 2006 Berlin, Germany
 * 2006


 * -- align="left" valign=top
 * Dopichaj, P.
 * The university of Kaiserslautern at INEX 2006
 * Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany
 * 2007
 * {{hidden||Digital libraries offer convenient access to large volumes of text, but finding the information that is relevant for a given information need is hard. The workshops of the Initiative for the Evaluation of XML retrieval (INEX) provide a forum for testing the effectiveness of retrieval strategies. In this paper, we present the current version of our search engine that was used for INEX 2006: Like at INEX 2005, our search engine exploits structural patterns - in particular, automatic detection of titles - in the retrieval results to find the appropriate results among overlapping elements. This year, we examine how we can change this method to work better with the Wikipedia collection, which is significantly larger than the IEEE collection used in previous years. We show that our optimizations both retain the retrieval quality and reduce retrieval time significantly.}}


 * -- align="left" valign=top
 * Dormann, Claire & Biddle, Robert
 * Urban expressions and experiential gaming
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Dornescu, I.
 * Semantic QA for Encyclopaedic Questions: EQUAL in GikiCLEF
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This paper presents a new question answering (QA) approach and a prototype system, EQUAL, which relies on structural information from Wikipedia to answer open-list questions. The system achieved the highest score amongst the participants in the GikiCLEF 2009 task. Unlike the standard textual QA approach, EQUAL does not rely on identifying the answer within a text snippet by using keyword retrieval. Instead, it explores the Wikipedia page graph, extracting and aggregating information from multiple documents and enforcing semantic constraints. The challenges for such an approach and an error analysis are also discussed.}}


 * -- align="left" valign=top
 * Dost, Ascander & King, Tracy Holloway
 * Using large-scale parser output to guide grammar development
 * Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks
 * 2009
 * 


 * -- align="left" valign=top
 * Doucet, A. & Lehtonen, M.
 * Unsupervised classification of text-centric XML document collections
 * Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany
 * 2007
 * {{hidden||This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we have experimented with a couple of feature sets, to discover that a promising direction is to use structural information as a preliminary means to detect and put aside structural outliers. The improvement of the semantic quality of clustering is significantly higher through this approach than through a combination of the structural and textual feature sets. The paper also discusses the problem of the evaluation of XML clustering. Currently, in the INEX mining track, XML clustering techniques are evaluated against semantic categories. We believe there is a mismatch between the task (to exploit the document structure) and the evaluation, which disregards structural aspects. An illustration of this fact is that, over all the clustering track submissions, our text-based runs obtained the 1st rank (Wikipedia collection, out of 7) and 2nd rank (IEEE collection, out of 13).}}


 * -- align="left" valign=top
 * Dovchin, Tuul & Chan, Peter
 * Multimedia Cases for Training Mongolian Medical Professionals -- An Innovative Strategy for Overcoming Pedagogical Challenges
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Dowling, Sherwood
 * Adopting a Long Tail Web Publishing Strategy for Museum Educational Materials at the Smithsonian American Art Museum
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Dron, Jon & Anderson, Terry
 * Collectives, Networks and Groups in Social Software for E-Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Dron, Jon & Bhattacharya, Madhumita
 * A Dialogue on E-Learning and Diversity: the Learning Management System vs the Personal Learning Environment
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Désilets, Alain & Paquet, Sébastien
 * Wiki as a Tool for Web-based Collaborative Story Telling in Primary School: a Case Study
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2005
 * 


 * -- align="left" valign=top
 * Díaz, Francisco; Osorio, Maria & Amadeo, Ana
 * Evolution of the use of Moodle in Argentina, adding Web2.0 features
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Ebner, Martin
 * E-Learning 2.0 = e-Learning 1.0 + Web 2.0?
 * Proceedings of the The Second International Conference on Availability, Reliability and Security
 * 2007
 * 


 * -- align="left" valign=top
 * Ebner, Martin & Nagler, Walther
 * Has Web2.0 Reached the Educated Top?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Ebner, Martin & Taraghi, Behnam
 * Personal Learning Environment for Higher Education – A First Prototype
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Els, Christo J. & Blignaut, A. Seugnet
 * Exploring Teachers’ ICT Pedagogy in the North-West Province, South Africa
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Erlenkötter, Annekatrin; Kühnle, Claas-Michael; Miu, Huey-Ru; Sommer, Franziska & Reiners, Torsten
 * Enhancing the Class Curriculum with Virtual World Use Cases for Production and Logistics
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * van Erp, Marieke; Lendvai, Piroska & van den Bosch, Antal
 * Comparing alternative data-driven ontological vistas of natural history
 * Proceedings of the Eighth International Conference on Computational Semantics
 * 2009
 * 
 * {{hidden||Traditionally, domain ontologies are created manually, based on human experts' views on the classes and relations of the domain at hand. We present ongoing work on two approaches to the automatic construction of ontologies from a flat database of records, and compare them to a manually constructed ontology. The latter CIDOC-CRM ontology focusses on the organisation of classes and relations. In contrast, the first automatic method, based on machine learning, focuses on the mutual predictiveness between classes, while the second automatic method, created with the aid of Wikipedia, stresses meaningful relations between classes. The three ontologies show little overlap; their differences illustrate that a different focus during ontology construction can lead to radically different ontologies. We discuss the implications of these differences, and argue that the two alternative ontologies may be useful in higher-level information systems such as search engines.}}


 * -- align="left" valign=top
 * Erren, Patrick & Keil, Reinhard
 * Enabling new Learning Scenarios in the Age of the Web 2.0 via Semantic Positioning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Every, Vanessa; Garcia, Gna & Young, Michael
 * A Qualitative Study of Public Wiki Use in a Teacher Education Program
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Ewbank, Ann; Carter, Heather & Foulger, Teresa
 * MySpace Dilemmas: Ethical Choices for Teachers using Social Networking
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Eymard, Olivier; Sanchis, Eric & Selves, Jean-Louis
 * A Peer-to-Peer Collaborative Framework Based on Perceptive Reasoning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Farhoodi, M.; Yari, A. & Mahmoudi, M.
 * Combining content-based and context-based methods for Persian web page classification
 * 2009 Second International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), 4-6 Aug. 2009 Piscataway, NJ, USA
 * 2009
 * 


 * -- align="left" valign=top
 * Farkas, Richárd; Szarvas, György & Ormándi, Róbert
 * Improving a state-of-the-art named entity recognition system using the world wide web
 * Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications
 * 2007
 * 
 * {{hidden||The development of highly accurate Named Entity Recognition (NER) systems can be beneficial to a wide range of Human Language Technology applications. In this paper we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and are capable of further improving a state-of-the-art multilingual and domain independent NER system. Moreover we describe our investigations on entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task, and proved to be particularly effective in the speech-to-text scenario.}}


 * -- align="left" valign=top
 * Farley, Alan & Barton, Siew Mee
 * Developing and rewarding advanced teaching expertise in higher education - a different approach
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Feldmann, Birgit & Franzkowiak, Bettina
 * Studying in Web 2.0 - What (Distance) Students Really Want
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Ferguson, Donald F.
 * Autonomic business service management
 * Proceedings of the 6th international conference on Autonomic computing
 * 2009
 * 
 * {{hidden||Medium and large enterprises think of information technology as implementing business services. Examples include online banking or Web commerce. Most systems and application management technology manages individual hardware and software systems. A business service is inherently a composite comprised of multiple HW, SW and logical entities. For example, a Web commerce system may have a Web server, Web application server, database server and messaging system to connect to mainframe inventory management. Each of the systems has various installed software. Businesses want to automate management of the business service, not the individual instances. IT management systems must manage the service, "unwind" the high-level policies and operations and apply them to individual HW and SW elements. SOA makes managing composites more difficult due to dynamic binding and request routing. This presentation describes the design and implementation of a business service management system. The core elements include: a Unified Service Model; a real-time management database that extends the concept of a Configuration Management Database (CMDB) and integrates external management and monitoring systems; rule-based event correlation and rule-based discovery of the structure of a business service; and algorithmic analysis of the composite service to automatically detect and repair availability and end-to-end performance problems. The presentation suggests topics for additional research.}}


 * -- align="left" valign=top
 * Ferres, D. & Rodriguez, H.
 * TALP at GikiCLEF 2009
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This paper describes our experiments in Geographical Information Retrieval with the Wikipedia collection in the context of our participation in the GikiCLEF 2009 Multilingual task in English and Spanish. Our system, called GikiTALP, follows a simple approach that uses standard Information Retrieval with the Sphinx full-text search engine and some Natural Language Processing techniques without Geographical Knowledge.}}


 * -- align="left" valign=top
 * Ferrés, Daniel & Rodríguez, Horacio
 * Experiments adapting an open-domain question answering system to the geographical domain using scope-based resources
 * Proceedings of the Workshop on Multilingual Question Answering
 * 2006
 * 
 * {{hidden||This paper describes an approach to adapt an existing multilingual Open-Domain Question Answering (ODQA) system for factoid questions to a Restricted Domain, the Geographical Domain. The adaptation of this ODQA system involved the modification of some components of our system such as: Question Processing, Passage Retrieval and Answer Extraction. The new system uses external resources like the GNS Gazetteer for Named Entity (NE) Classification and Wikipedia or Google in order to obtain relevant documents for this domain. The system focuses on a Geographical Scope: given a region, or country, and a language we can semi-automatically obtain multilingual geographical resources (e.g. gazetteers, trigger words, groups of place names, etc.) of this scope. The system has been trained and evaluated for Spanish in the scope of the Spanish Geography. The evaluation reveals that the use of scope-based Geographical resources is a good approach to deal with multilingual Geographical Domain Question Answering.}}


 * -- align="left" valign=top
 * Fiaidhi, Jinan & Mohammed, Sabah
 * Detecting Some Collaborative Academic Indicators Based on Social Networks: A DBLP Case Study
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Filatova, Elena
 * Directions for exploiting asymmetries in multilingual Wikipedia
 * Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies
 * 2009
 * 
 * {{hidden||Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and information choice. Keeping these peculiarities in mind is necessary while using multilingual Wikipedia as a corpus for training and testing NLP applications. In this paper we present preliminary results on quantifying Wikipedia multilinguality. Our results support the observation about the substantial variation in descriptions of Wikipedia entries created in different languages. However, we believe that asymmetries in multilingual Wikipedia do not make Wikipedia an undesirable corpus for training NLP applications. On the contrary, we outline research directions that can utilize multilingual Wikipedia asymmetries to bridge the communication gaps in multilingual societies.}}


 * -- align="left" valign=top
 * Fleet, Gregory & Wallace, Peter
 * How could Web 2.0 be shaping web-assisted learning?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Flouris, G.; Fundulaki, I.; Pediaditis, P.; Theoharis, Y. & Christophides, V.
 * Coloring RDF triples to capture provenance
 * Semantic Web - ISWC 2009. 8th International Semantic Web Conference, ISWC 2009, 25-29 Oct. 2009 Berlin, Germany
 * 2009
 * 
 * {{hidden||Recently, the W3C Linking Open Data effort has boosted the publication and inter-linkage of large amounts of RDF datasets on the Semantic Web. Various ontologies and knowledge bases with millions of RDF triples from Wikipedia and other sources, mostly in e-science, have been created and are publicly available. Recording provenance information of RDF triples aggregated from different heterogeneous sources is crucial in order to effectively support trust mechanisms, digital rights and privacy policies. Managing provenance becomes even more important when we consider not only explicitly stated but also implicit triples (through RDFS inference rules) in conjunction with declarative languages for querying and updating RDF graphs. In this paper we rely on colored RDF triples represented as quadruples to capture and manipulate explicit provenance information.}}


 * -- align="left" valign=top
 * Fogarolli, Angela & Ronchetti, Marco
 * A Web 2.0-enabled digital library
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Foley, Brian & Chang, Tae
 * Wiki as a Professional Development Tool
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Forrester, Bruce & Verdon, John
 * Introducing Peer Production into the Department of National Defense
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Forte, Andrea & Bruckman, Amy
 * From Wikipedia to the classroom: exploring online publication and learning
 * Proceedings of the 7th international conference on Learning sciences
 * 2006
 * 


 * -- align="left" valign=top
 * Francke, H. & Sundin, O.
 * An inside view: credibility in Wikipedia from the perspective of editors
 * Information Research
 * 2010
 * {{hidden||Introduction. The question of credibility in participatory information environments, particularly Wikipedia, has been much debated. This paper investigates how editors on Swedish Wikipedia consider credibility when they edit and read Wikipedia articles. Method. The study builds on interviews with 11 editors on Swedish Wikipedia, supported by a document analysis of policies on Swedish Wikipedia. Analysis. The interview transcripts have been coded qualitatively according to the participants' use of Wikipedia and what they take into consideration in making credibility assessments. Results. The participants use Wikipedia for purposes where it is not vital that the information is correct. Their credibility assessments are mainly based on authorship, verifiability, and the editing history of an article. Conclusions. The situations and purposes for which the editors use Wikipedia are similar to other user groups, but they draw on their knowledge as members of the network of practice of wikipedians to make credibility assessments, including knowledge of certain editors and of the MediaWiki architecture. Their assessments have more similarities to those used in traditional media than to assessments springing from the wisdom of crowds.}}


 * -- align="left" valign=top
 * Freeman, Wendy
 * Reflecting on the Culture of Research Using Weblogs
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Futrell-Schilling, Dawn
 * Teaching and Learning in the Conceptual Age: Integrating a Sense of Symphony into the Curriculum
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Gagne, Claude & Fels, Deborah
 * Learning through Weblogs
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Ganeshan, Kathiravelu
 * A Technological Framework for Improving Education in the Developing World
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Ganeshan, Kathiravelu & Komosny, Dan
 * Rojak: A New Paradigm in Teaching and Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Ganjisaffar, Y.; Javanmardi, S. & Lopes, C.
 * Review-based ranking of Wikipedia articles
 * 2009 International Conference on Computational Aspects of Social Networks (CASON), 24-27 June 2009, Piscataway, NJ, USA
 * 2009
 * 
 * {{hidden||Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it has accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied with history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on the studies of the "Wisdom of Crowds" and the effectiveness of the knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article on the quality of rankings of the search results. The extent of review is measured by the number of distinct editors who contributed to the articles and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text-relevancy, PageRank, and extent of review. The results show that the review-based ranking algorithm which combines the extent of review and text-relevancy outperforms the rest; it is more accurate and less computationally expensive compared to PageRank-based rankings.}}


 * -- align="left" valign=top
 * Ganjisaffar, Yasser; Javanmardi, Sara & Lopes, Cristina
 * Review-Based Ranking of Wikipedia Articles
 * Proceedings of the 2009 International Conference on Computational Aspects of Social Networks
 * 2009
 * 
 * {{hidden||Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it has accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied with history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on the studies of the "Wisdom of Crowds" and the effectiveness of the knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article on the quality of rankings of the search results. The extent of review is measured by the number of distinct editors who contributed to the articles and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text-relevancy, PageRank, and extent of review. The results show that the review-based ranking algorithm which combines the extent of review and text-relevancy outperforms the rest; it is more accurate and less computationally expensive compared to PageRank-based rankings.}}


 * -- align="left" valign=top
 * Gantner, Zeno & Schmidt-Thieme, Lars
 * Automatic content-based categorization of Wikipedia articles
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 
 * {{hidden||Wikipedia's article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse -- using text classification methods for predicting the categories of Wikipedia articles -- has attracted less attention so far. We propose to "return the favor" and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments that show that our approach is feasible despite the high number of labels.}}


 * -- align="left" valign=top
 * Gaonkar, Shravan & Choudhury, Romit Roy
 * Micro-Blog: map-casting from mobile phones to virtual sensor maps
 * Proceedings of the 5th international conference on Embedded networked sensor systems
 * 2007
 * 


 * -- align="left" valign=top
 * Gardner, J.; Krowne, A. & Xiong, Li
 * NNexus: towards an automatic linker for a massively-distributed collaborative corpus
 * 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 17-20 Nov. 2006, Piscataway, NJ, USA
 * 2006
 * {{hidden||Collaborative online encyclopedias such as Wikipedia and PlanetMath are becoming increasingly popular. In order to understand an article in a corpus, a user must understand the related and underlying concepts through linked articles. In this paper, we introduce NNexus, a generalization of the automatic linking component of PlanetMath.org and the first system that automates the process of linking encyclopedia entries into a semantic network of concepts. We discuss the challenges, present the conceptual models as well as specific mechanisms of the NNexus system, and discuss some of our ongoing and completed work.}}


 * -- align="left" valign=top
 * Garvoille, Alexa & Buckner, Ginny
 * Writing Wikipedia Pages in the Constructivist Classroom
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Garza, S.E. & Brena, R.F.
 * Graph local clustering for topic detection in Web collections
 * 2009 Latin American Web Congress (LA-WEB 2009), 9-11 Nov. 2009, Piscataway, NJ, USA
 * 2009
 * 
 * {{hidden||In the midst of a developing Web that increases its size with a constant rhythm, automatic document organization becomes important. One way to arrange documents is by categorizing them into topics. Even when there are different forms to consider topics and their extraction, a practical option is to view them as document groups and apply clustering algorithms. An attractive alternative that naturally copes with the Web size and complexity is the one proposed by graph local clustering (GLC) methods. In this paper, we define a formal framework for working with topics in hyperlinked environments and analyze the feasibility of GLC for this task. We performed tests over an important Web collection, namely Wikipedia, and our results, which were validated using various kinds of methods (some of them specific for the information domain), indicate that this approach is suitable for topic discovery.}}


 * -- align="left" valign=top
 * Geiger, R. Stuart & Ribes, David
 * The work of sustaining order in wikipedia: the banning of a vandal
 * Proceedings of the 2010 ACM conference on Computer supported cooperative work
 * 2010
 * 


 * -- align="left" valign=top
 * Gentile, Anna Lisa; Basile, Pierpaolo; Iaquinta, Leo & Semeraro, Giovanni
 * Lexical and Semantic Resources for NLP: From Words to Meanings
 * Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part III
 * 2008
 * 
 * {{hidden||A user expresses her information need through words with a precise meaning, but from the machine's point of view this meaning does not come with the word. A further step is needed to automatically associate it with the words. Techniques that process human language are required, as well as linguistic and semantic knowledge, stored within distinct and heterogeneous resources, which play an important role during all Natural Language Processing (NLP) steps. Resource management is a challenging problem, together with the correct association between URIs coming from the resources and meanings of the words. This work presents a service that, given a lexeme (an abstract unit of morphological analysis in linguistics, which roughly corresponds to a set of words that are different forms of the same word), returns all syntactic and semantic information collected from a list of lexical and semantic resources. The proposed strategy consists of merging data originating from stable resources, such as WordNet, with data collected dynamically from evolving sources, such as the Web or Wikipedia. That strategy is implemented in a wrapper to a set of popular linguistic resources that provides a single point of access to them, in a way transparent to the user, to accomplish the computational linguistic problem of getting a rich set of linguistic and semantic annotations in a compact way.}}


 * -- align="left" valign=top
 * Geraci, Michael
 * Implementing a Wiki as a collaboration tool for group projects
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Ghislandi, Patrizia; Mattei, Antonio; Paolino, Daniela; Pellegrini, Alice & Pisanu, Francesco
 * Designing Online Learning Communities for Higher Education: Possibilities and Limits of Moodle
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Gibson, David; Reynolds-Alpert, Suzanne; Doering, Aaron & Searson, Michael
 * Participatory Media in Informal Learning
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Giza, Brian & McCann, Erin
 * The Use of Free Translation Tools in the Biology Classroom
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Gleim, Rüdiger; Mehler, Alexander & Dehmer, Matthias
 * Web corpus mining by instance of Wikipedia
 * Proceedings of the 2nd International Workshop on Web as Corpus
 * 2006
 * 


 * -- align="left" valign=top
 * Gleim, R.; Mehler, A.; Dehmer, M. & Pustylnikov, O.
 * Aisles through the category forest
 * Third International Conference on Web Information Systems and Technologies, WEBIST 2007, 3-6 March 2007, Setubal, Portugal
 * 2007
 * {{hidden||The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods developed in order to tackle the problem of finding and organising relevant information. It has often been argued that semantic classifications of input documents help in solving this task. But while approaches of supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the Web are much more demanding. In order to successfully develop approaches to Web mining, respective corpora are needed. However, the composition of genre- or domain-specific Web corpora is still an unsolved problem. It is time consuming to build large corpora of good quality because Web pages typically lack reliable meta information. Wikipedia, along with similar approaches of collaborative text production, offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source of corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia category explorer, a tool which supports categorical views to browse through the Wikipedia and to construct domain specific corpora for machine learning.}}


 * -- align="left" valign=top
 * Glogoff, Stuart
 * Channeling Students and Parents: Promoting the University Through YouTube
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Glover, Ian & Oliver, Andrew
 * Hybridisation of Social Networking and Learning Environments
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Glover, Ian; Xu, Zhijie & Hardaker, Glenn
 * Redeveloping an eLearning Annotation System as a Web Service
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2005
 * 


 * -- align="left" valign=top
 * Goh, Hui-Ngo & Kiu, Ching-Chieh
 * Context-based term identification and extraction for ontology construction
 * 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), 21-23 Aug. 2010, Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * González-Martínez, María Dolores & Herrera-Batista, Miguel Angel
 * Habits and preferences of University Students on the use of Information and Communication Technologies in their academic activities and of socialization
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Gool, Luc Van; Breitenstein, Michael D.; Gammeter, Stephan; Grabner, Helmut & Quack, Till
 * Mining from large image sets
 * Proceedings of the ACM International Conference on Image and Video Retrieval
 * 2009
 * 


 * -- align="left" valign=top
 * Gore, David; Lee, Marie & Wassus, Kenny
 * New Possibilities with IT and Print Technologies: Variable Data Printing (VDP)
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Gray, Kathleen
 * Originality and Plagiarism Resources for Academic Staff Development in the Era of New Web Authoring Formats
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Greenberg, Valerie & Carbajal, Darlene
 * Using Convergent Media to Engage Graduate Students in a Digital and Electronic Writing class: Some Surprising Results
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Greene, M.
 * Epidemiological Monitoring for Emerging Infectious Diseases
 * Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense IX, 5-8 April 2010, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Greenhow, Christine
 * What Teacher Education Needs to Know about Web 2.0: Preparing New Teachers in the 21st Century
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Greenhow, Christine; Searson, Michael & Strudler, Neal
 * FWIW: What the Research Says About Engaging the Web 2.0 Generation
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Guerrero, Shannon
 * Web 2.0 in a Preservice Math Methods Course: Teacher Candidates’ Perceptions and Predictions
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Guetl, Christian
 * Context-sensitive and Personalized Concept-based Access to Knowledge for Learning and Training Purposes
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Guo, Zinan & Greer, Jim
 * Connecting E-portfolios and Learner Models
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2005
 * 


 * -- align="left" valign=top
 * Gupta, Priyanka; Seals, Cheryl & Wilson, Dale-Marie
 * Design And Evaluation of SimBuilder
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Gurevych, Iryna & Zesch, Torsten
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 
 * {{hidden||Welcome to the proceedings of the ACL Workshop "The People's Web Meets NLP: Collaboratively Constructed Semantic Resources". The workshop attracted 21 submissions, of which 9 are included in these proceedings. We are gratified by this level of interest. This workshop was motivated by the observation that the NLP community is currently considerably influenced by online resources which are collaboratively constructed by ordinary users on the Web. In many works such resources have been used as semantic resources overcoming the knowledge acquisition bottleneck and coverage problems pertinent to conventional lexical semantic resources. The resource that has gained the greatest popularity in this respect so far is Wikipedia. However, the scope of the workshop deliberately exceeded Wikipedia. We are happy that the proceedings include papers on resources such as Wiktionary, Mechanical Turk, or creating semantic resources through online games. This encourages us in our belief that collaboratively constructed semantic resources are of growing interest for the natural language processing community. We should also add that we hoped to bring together researchers from both worlds: those using collaboratively created resources in NLP applications and those using NLP applications for improving the resources or extracting different types of semantic information from them. This is also reflected in the proceedings, although the stronger interest was taken in using semantic resources for NLP applications.}}


 * -- align="left" valign=top
 * Guru, D. S.; Harish, B. S. & Manjunath, S.
 * Symbolic representation of text documents
 * Proceedings of the Third Annual ACM Bangalore Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Gyarmati, A. & Jones, G.J.F.
 * When to Cross Over? Cross-Language Linking Using Wikipedia for VideoCLEF 2009
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||We describe Dublin City University (DCU)'s participation in the VideoCLEF 2009 Linking Task. Two approaches were implemented using the Lemur information retrieval toolkit. Both approaches first extracted a search query from the transcriptions of the Dutch TV broadcasts. One method first performed search on a Dutch Wikipedia archive, then followed links to corresponding pages in the English Wikipedia. The other method first translated the extracted query using machine translation and then searched the English Wikipedia collection directly. We found that using the original Dutch transcription query for searching the Dutch Wikipedia yielded better results.}}


 * -- align="left" valign=top
 * Hamilton, Margaret & Howell, Sheila
 * Technology Options for Assessment Purposes and Quality Graduate Outcomes
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Hammond, Thomas; Friedman, Adam; Keeler, Christy; Manfra, Meghan & Metan, Demet
 * Epistemology is elementary: Historical thinking as applied epistemology in an elementary social studies methods class
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Haridas, M. & Caragea, D.
 * Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications
 * On the Move to Meaningful Internet Systems: OTM 2009. Confederated International Conferences CoopIS, DOA, IS, and ODBASE 2009, 1-6 Nov. 2009 Berlin, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Harman, D.; Kando, N.; Lalmas, M. & Peters, C.
 * The Four Ladies of Experimental Evaluation
 * Multilingual and Multimodal Information Access Evaluation. International Conference of the Cross-Language Evaluation Forum, CLEF 2010, 20-23 Sept. 2010 Berlin, Germany
 * 2010
 * 
 * {{hidden||The goal of the panel is to present some of the main lessons that we have learned in well over a decade of experimental evaluation and to promote discussion with respect to what the future objectives in this field should be. TREC was started in 1992 in conjunction with the building of a new 2 GB test collection for the DARPA TIPSTER project. Whereas the main task in the early TRECs was the ad hoc retrieval task in English, many other tasks such as question-answering, web retrieval, and retrieval within specific domains have been tried over the years. NTCIR, the Asian version of TREC, started in 1999 and has run in an 18-month cycle. Whereas NTCIR is similar to TREC, there has always been a tighter connection to the NLP community, allowing for some unique tracks. Additionally, NTCIR has done extensive pioneering work with patents, including searching, classification, and translation. The coordination of the European CLIR task moved from TREC to Europe in 2000 and CLEF (Cross-Language Evaluation Forum) was launched. The objective was to expand the European CLIR effort by including more languages and more tasks, and by encouraging more participation from Europe. The INitiative for the Evaluation of XML retrieval (INEX) started in 2002 to provide evaluation of structured document retrieval, in particular to investigate the retrieval of document components that are XML elements of varying granularity. The initiative used 12,107 full-text scientific articles from 18 IEEE Computer Society publications, with each article containing 1,532 XML nodes on average. The collection grew to 16,819 articles in 2005 and moved on to using Wikipedia articles starting in 2006.}}


 * -- align="left" valign=top
 * Hartrumpf, S.; Bruck, T. Vor Der & Eichhorn, C.
 * Detecting duplicates with shallow and parser-based methods
 * 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), 21-23 Aug. 2010, Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Hartrumpf, S. & Leveling, J.
 * Recursive Question Decomposition for Answering Complex Geographic Questions
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This paper describes the GIRSA-WP system and the experiments performed for GikiCLEF 2009, the geographic information retrieval task in the question answering track at CLEF 2009. Three runs were submitted. The first one contained only results from the InSicht QA system; it showed high precision, but low recall. The combination with results from the GIR system GIRSA increased recall considerably, but reduced precision. The second run used a standard IR query, while the third run combined such queries with a Boolean query with selected keywords. The evaluation showed that the third run achieved significantly higher mean average precision (MAP) than the second run. In both cases, integrating GIR methods and QA methods was successful in combining their strengths (high precision of deep QA, high recall of GIR), resulting in the third-best performance of automatic runs in GikiCLEF. The overall performance still leaves room for improvements. For example, the multilingual approach is too simple. All processing is done in only one Wikipedia (the German one); results for the nine other languages are collected by following the translation links in Wikipedia.}}


 * -- align="left" valign=top
 * Hattori, S. & Tanaka, K.
 * Extracting concept hierarchy knowledge from the Web based on property inheritance and aggregation
 * WI 2008. 2008 IEEE/WIC/ACM International Conference on Web Intelligence. IAT 2008. 2008 IEEE/WIC/ACM International Conference on Intelligent Agent Technology. WI-IAT Workshop 2008. 2008 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology Workshops, 9-12 Dec. 2008, Piscataway, NJ, USA
 * 2008
 * 
 * {{hidden||Concept hierarchy knowledge, such as hyponymy and meronymy, is very important for various natural language processing systems. While WordNet and Wikipedia are being manually constructed and maintained as lexical ontologies, many researchers have tackled how to extract concept hierarchies from very large corpora of text documents such as the Web not manually but automatically. However, their methods are mostly based on lexico-syntactic patterns as not necessary but sufficient conditions of hyponymy and meronymy, so they can achieve high precision but low recall when using stricter patterns, or high recall but low precision when using looser patterns. Therefore, we need necessary conditions of hyponymy and meronymy to achieve high recall and not low precision. In this paper, not only "Property Inheritance" from a target concept to its hyponyms but also "Property Aggregation" from its hyponyms to the target concept is assumed to be a necessary and sufficient condition of hyponymy, and we propose a method to extract concept hierarchy knowledge from the Web based on property inheritance and property aggregation.}}


 * -- align="left" valign=top
 * Hauck, Rita
 * Immersion in another Language and Culture through Multimedia and Web Resources
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Hecht, Brent & Gergle, Darren
 * Measuring self-focus bias in community-maintained knowledge repositories
 * Proceedings of the fourth international conference on Communities and technologies
 * 2009
 * 


 * -- align="left" valign=top
 * Hecht, Brent J. & Gergle, Darren
 * On the "localness" of user-generated content
 * Proceedings of the 2010 ACM conference on Computer supported cooperative work
 * 2010
 * 
 * {{hidden||The "localness" of participation in repositories of user-generated content (UGC) with geospatial components has been cited as one of UGC's greatest benefits. However, the degree of localness in major UGC repositories such as Flickr and Wikipedia has never been examined. We show that over 50 percent of Flickr users contribute local information on average, and over 45 percent of Flickr photos are local to the photographer. Across four language editions of Wikipedia, however, we find that participation is less local. We introduce the spatial content production model (SCPM) as a possible factor in the localness of UGC, and discuss other theoretical and applied implications.}}


 * -- align="left" valign=top
 * Heer, Rex
 * My Space in College: Students Use of Virtual Communities to Define their Fit in Higher Education
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Hellmann, S.; Stadler, C.; Lehmann, J. & Auer, S.
 * DBpedia Live Extraction
 * On the Move to Meaningful Internet Systems: OTM 2009. Confederated International Conferences CoopIS, DOA, IS, and ODBASE 2009, 1-6 Nov. 2009 Berlin, Germany
 * 2009
 * 
 * {{hidden||The DBpedia project extracts information from Wikipedia, interlinks it with other knowledge bases, and makes this data available as RDF. So far the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the heavy-weight extraction process has been a drawback. It requires manual effort to produce a new release and the extracted information is not up-to-date. We extended DBpedia with a live extraction framework, which is capable of processing tens of thousands of changes per day in order to consume the constant stream of Wikipedia updates. This allows direct modifications of the knowledge base and closer interaction of users with DBpedia. We also show how the Wikipedia community itself is now able to take part in the DBpedia ontology engineering process and that an interactive roundtrip engineering between Wikipedia and DBpedia is made possible.}}


 * -- align="left" valign=top
 * Hengstler, Julia
 * Exploring Open Source for Educators: We're Not in Kansas Anymore--Entering OS
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Hennis, Thieme; Veen, Wim & Sjoer, Ellen
 * Future of Open Courseware; A Case Study
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Heo, Gyeong Mi; Lee, Romee & Park, Young
 * Blog as a Meaningful Learning Context: Adult Bloggers as Cyworld Users in Korea
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Herbold, Katy & Hsiao, Wei-Ying
 * Online Learning on Steroids: Combining Brain Research with Time Saving Techniques
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Herczeg, Michael
 * Educational Media: From Canned Brain Food to Knowledge Traces
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Herring, Donna & Friery, Kathleen
 * efolios for 21st Century Learners
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Herring, Donna; Hibbs, Roger; Morgan, Beth & Notar, Charles
 * Show What You Know: ePortfolios for 21st Century Learners
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Herrington, Anthony; Kervin, Lisa & Ilias, Joanne
 * Blogging Beginning Teaching
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Herrington, Jan
 * Authentic E-Learning in Higher Education: Design Principles for Authentic Learning Environments and Tasks
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Heuer, Lars
 * Towards converting the internet into topic maps
 * Proceedings of the 2nd international conference on Topic maps research and applications
 * 2006
 * 


 * -- align="left" valign=top
 * Hewitt, Jim & Peters, Vanessa
 * Using Wikis to Support Knowledge Building in a Graduate Education Course
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Hewitt, Jim; Peters, Vanessa & Brett, Clare
 * Using Wiki Technologies as an Adjunct to Computer Conferencing in a Graduate Teacher Education Course
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Higdon, Jude; Miller, Sean & Paul, Nora
 * Educational Gaming for the Rest of Us: Thinking Worlds and WYSIWYG Game Development
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Hoehndorf, R.; Prufer, K.; Backhaus, M.; Herre, H.; Kelso, J.; Loebe, F. & Visagie, J.
 * A proposal for a gene functions wiki
 * On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops Berlin, Germany
 * 2006
 * {{hidden||Large knowledge bases integrating different domains can provide a foundation for new applications in biology such as data mining or automated reasoning. The traditional approach to the construction of such knowledge bases is manual and therefore extremely time consuming. The ubiquity of the Internet now makes large-scale community collaboration for the construction of knowledge bases, such as the successful online encyclopedia Wikipedia, possible. We propose an extension of this model to the collaborative annotation of molecular data. We argue that a semantic wiki provides the functionality required for this project, since this can capitalize on the existing representations in biological ontologies. We discuss the use of a different relationship model than the one provided by RDF and OWL to represent the semantic data. We argue that this leads to a more intuitive and correct way to enter semantic content in the wiki. Furthermore, we show how formal ontologies could be used to increase the usability of the software through type-checking and automatic reasoning.}}


 * -- align="left" valign=top
 * Holcomb, Lori & Beal, Candy
 * Using Web 2.0 to Support Learning in the Social Studies Context
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Holifield, Phil
 * Visual History Project: an Image Map Authoring Tool Assisting Students to Present Project Information
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Holmes, Bryn; Wasty, Shujaat; Hafeez, Khaled & Ahsan, Shakib
 * The Knowledge Box: Can a technology bring schooling to children in crisis?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Honnibal, Matthew; Nothman, Joel & Curran, James R.
 * Evaluating a statistical CCG parser on Wikipedia
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 
 * {{hidden||The vast majority of parser evaluation is conducted on the 1984 Wall Street Journal (WSJ). In-domain evaluation of this kind is important for system development, but gives little indication about how the parser will perform on many practical problems. Wikipedia is an interesting domain for parsing that has so far been under-explored. We present statistical parsing results that for the first time provide information about what sort of performance a user parsing Wikipedia text can expect. We find that the C&C parser's standard model is 4.3% less accurate on Wikipedia text, but that a simple self-training exercise reduces the gap to 3.8%. The self-training also speeds up the parser on newswire text by 20%.}}


 * -- align="left" valign=top
 * Hopson, David & Martland, David
 * Network Web Directories: Do they deliver and to whom?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2004
 * 


 * -- align="left" valign=top
 * Hoven, Debra
 * Networking to learn: blogging for social and collaborative purposes
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Hsu, Yu-Chang; Ching, Yu-Hui & Grabowski, Barbara
 * Bookmarking/Tagging in the Web 2.0 Era: From an Individual Cognitive Tool to a Collaborative Knowledge Construction Tool for Educators
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Hu, Meiqun; Lim, Ee-Peng; Sun, Aixin; Lauw, Hady Wirawan & Vuong, Ba-Quy
 * On improving wikipedia search using article quality
 * Proceedings of the 9th annual ACM international workshop on Web information and data management
 * 2007
 * 


 * -- align="left" valign=top
 * Huang, Hsiang-ling & Hung, Yu-ju
 * An overview of information technology on language education
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Huang, Wenhao & Yoo, Sunjoo
 * How Do Web 2.0 Technologies Motivate Learners? A Regression Analysis based on the Motivation, Volition, and Performance Theory
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Huang, Yin-Fu & Huang, Yu-Yu
 * A framework automating domain ontology construction
 * WEBIST 2008. Fourth International Conference on Web Information Systems and Technologies, 4-7 May 2008 Madeira, Portugal
 * 2008
 * {{hidden||This paper proposed a general framework that could automatically construct domain ontology on a collection of documents with the help of The Free Dictionary, WordNet, and Wikipedia Categories. Both explicit and implicit features of index terms in documents are used to evaluate word correlations and then to construct Is-A relationships in the framework. Thus, the built ontology would consist of 1) concepts, 2) Is-A and Parts-of relationships among concepts, and 3) word relationships. Besides, the built ontology could be further refined by learning from incremental documents periodically. To help users browse the built ontology, an ontology browsing system was implemented and provided different search modes and functionality to facilitate searching a variety of relationships.}}


 * -- align="left" valign=top
 * Huckell, Travis
 * The Academic Exception as Foundation for Innovation in Online Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Hussein, Ramlah; Saeed, Moona; Karim, Nor Shahriza Abdul & Mohamed, Norshidah
 * Instructor’s Perspective on Factors influencing Effectiveness of Virtual Learning Environment (VLE) in the Malaysian Context: Proposed Framework
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Hwang, Jya-Lin
 * University EFL Students’ Learning Strategies On Multimedia YouTube
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Höller, Harald & Reisinger, Peter
 * Wiki Based Teaching and Learning Scenarios at the University of Vienna
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Høivik, Helge
 * An Experimental Player/Editor for Web-based Multi-Linguistic Cooperative Lectures
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Høivik, Helge
 * Read and Write Text and Context - Learning as Poietic Fields of Engagement
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Iftene, A.
 * Identifying Geographical Entities in Users' Queries
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||In 2009 we built a system in order to compete in the LAGI task (Log Analysis and Geographic Query Identification). The system uses an external resource built into GATE in combination with Wikipedia and Tumba in order to identify geographical entities in user's queries. The results obtained with and without Wikipedia resources are comparable. The main advantage of only using GATE resources is the improved run time. In the process of system evaluation we have identified the main problem of our approach: the system has insufficient external resources for the recognition of geographic entities.}}


 * -- align="left" valign=top
 * Iftene, Adrian
 * Building a Textual Entailment System for the RTE3 Competition. Application to a QA System
 * Proceedings of the 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing
 * 2008
 * 


 * -- align="left" valign=top
 * Indrie, S.M. & Groza, A.
 * Enacting argumentative web in semantic wikipedia
 * 2010 9th Roedunet International Conference (RoEduNet), 24-26 June 2010 Piscataway, NJ, USA
 * 2010


 * -- align="left" valign=top
 * Ingram, Richard
 * JMU/Microsoft Partnership for 21st Century Skills: Overview of Goals, Activities, and Challenges
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Inkpen, Kori; Gutwin, Carl & Tang, John
 * Proceedings of the 2010 ACM conference on Computer supported cooperative work
 * 2010
 * 
 * {{hidden||Welcome to the 2010 ACM Conference on Computer Supported Cooperative Work! We hope that this conference will be a place to hear exciting talks about the latest in CSCW research, an opportunity to learn new things, and a chance to connect with friends in the community. We are pleased to see such a strong and diverse program at this year's conference. We have a mix of research areas represented -- some that are traditionally part of our community, and several that have not been frequently seen at CSCW. There are sessions to suit every taste: from collaborative software development, healthcare, and groupware technologies, to studies of Wikipedia, family communications, games, and volunteering. We are particularly interested in a new kind of forum at the conference this year -- the 'CSCW Horizon' -- which will present novel and challenging ideas, and will do so in a more interactive fashion than standard paper sessions. The program is an exciting and topical mix of cutting-edge research and thought in CSCW. A major change for CSCW beginning this year is our move from being a biennial to an annual conference. This has meant a change in the time of the conference (from November to February), and subsequent changes in all of our normal deadlines and procedures. Despite these changes, the community has responded with enormous enthusiasm, and we look forward to the future of yearly meetings under the ACM CSCW banner.}}


 * -- align="left" valign=top
 * Ioannou, Andri
 * Towards a Promising Technology for Online Collaborative Learning: Wiki Threaded Discussion
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Ioannou, Andri & Artino, Anthony
 * Incorporating Wikis in an Educational Technology Course: Ideas, Reflections and Lessons Learned …
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Ion, Radu; Ştefănescu, Dan; Ceauşu, Alexandru & Tufiş, Dan
 * RACAI's QA system at the Romanian-Romanian QA@CLEF2008 main task
 * Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access
 * 2008
 * 
 * {{hidden||This paper describes the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) to the Multiple Language Question Answering Main Task at the CLEF 2008 competition. We present our Question Answering system answering Romanian questions from Romanian Wikipedia documents focusing on the implementation details. The presentation will also emphasize the fact that question analysis, snippet selection and ranking provide a useful basis of any answer extraction mechanism.}}


 * -- align="left" valign=top
 * Iqbal, Muhammad; Barton, Greg & Barton, Siew Mee
 * Internet in the pesantren: A tool to promote or continue autonomous learning?
 * Global Learn Asia Pacific
 * 2010
 * 


 * -- align="left" valign=top
 * Ireland, Alice; Kaufman, David & Sauvé, Louise
 * Simulation and Advanced Gaming Environments (SAGE) for Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Iske, Stefan & Marotzki, Winfried
 * Wikis: Reflexivity, Processuality and Participation
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Jackson, Allen; Gaudet, Laura; Brammer, Dawn & McDaniel, Larry
 * Curriculum, a Change in Theoretical Thinking Theory
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Jacquin, Christine; Desmontils, Emmanuel & Monceaux, Laura
 * French EuroWordNet Lexical Database Improvements
 * Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
 * 2009
 * 
 * {{hidden||Semantic knowledge is often used in the framework of Natural Language Processing (NLP) applications. However, for some languages other than English, such knowledge is not always easily available. For example, French thesauri are not numerous and are not sufficiently developed. In this context, we present two modifications made to the French version of the EuroWordNet Thesaurus in order to improve it. Firstly, we present the French EuroWordNet thesaurus and its limits. Then we explain two improvements we have made. We add missing relationships by using the bilingual capability of the EuroWordNet thesaurus, and definitions by using an external multilingual resource (Wikipedia [1]).}}


 * -- align="left" valign=top
 * Jadidinejad, A.H. & Mahmoudi, F.
 * Cross-language Information Retrieval Using Meta-language Index Construction and Structural Queries
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||Structural Query Language allows expert users to richly represent their information needs but unfortunately, the complexity of SQLs makes them impractical in Web search engines. Automatically detecting the concepts in an unstructured user's information need and generating a richly structured, multilingual equivalent query is an ideal solution. We utilize Wikipedia as a great concept repository and also some state of the art algorithms for extracting Wikipedia's concepts from the user's information need. This process is called "Query Wikification". Our experiments on the TEL corpus at CLEF2009 achieve +23% and +17% improvement in Mean Average Precision and Recall against the baseline. Our approach is unique in that it improves both precision and recall; two pans where improving one often hurts the other.}}


 * -- align="left" valign=top
 * Jamaludin, Rozinah; Annamalai, Subashini & Abdulwahed, Mahmoud
 * Web 1.0, Web 2.0: Implications to move from Education 1.0 to Education 2.0 to enhance collaborative intelligence towards the future of Web 3.0
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * von Jan, Ute; Ammann, Alexander & Matthies, Herbert K.
 * Generating and Presenting Dynamic Knowledge in Medicine and Dentistry
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Jang, Soobaek & Green, T.M.
 * Best practices on delivering a wiki collaborative solution for enterprise applications
 * 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 17-20 Nov. 2006 Piscataway, NJ, USA
 * 2006


 * -- align="left" valign=top
 * Jankowski, Jacek & Decker, Stefan
 * 2LIP: filling the gap between the current and the three-dimensional web
 * Proceedings of the 14th International Conference on 3D Web Technology
 * 2009
 * 


 * -- align="left" valign=top
 * Jansche, Martin & Sproat, Richard
 * Named entity transcription with pair n-gram models
 * Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
 * 2009
 * 
 * {{hidden||We submitted results for each of the eight shared tasks. Except for Japanese name kanji restoration, which uses a noisy channel model, our Standard Run submissions were produced by generative long-range pair n-gram models, which we mostly augmented with publicly available data (either from LDC datasets or mined from Wikipedia) for the Non-Standard Runs.}}


 * -- align="left" valign=top
 * Javanmardi, S. & Lopes, C.V.
 * Modeling trust in collaborative information systems
 * 2007 International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2007), 12 Nov.-15 Nov. 2007 Piscataway, NJ, USA
 * 2007
 * 


 * -- align="left" valign=top
 * Jijkoun, Valentin; Khalid, Mahboob Alam; Marx, Maarten & de Rijke, Maarten
 * Named entity normalization in user generated content
 * Proceedings of the second workshop on Analytics for noisy unstructured text data
 * 2008
 * 
 * {{hidden||Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and entity tracking. In many of these tasks it is important to be able to accurately normalize the recognized entities, i.e., to map surface forms to unambiguous references to real world entities. Within the context of structured databases, this task (known as record linkage and data de-duplication) has been a topic of active research for more than five decades. For edited content, such as news articles, the named entity normalization (NEN) task is one that has recently attracted considerable attention. We consider the task in the challenging context of user generated content (UGC), where it forms a key ingredient of tracking and media-analysis systems. A baseline NEN system from the literature (that normalizes surface forms to Wikipedia pages) performs considerably worse on UGC than on edited news: accuracy drops from 80% to 65% for a Dutch language data set and from 94% to 77% for English. We identify several sources of errors: entity recognition errors, multiple ways of referring to the same entity and ambiguous references. To address these issues we propose five improvements to the baseline NEN algorithm, to arrive at a language independent NEN system that achieves overall accuracy scores of 90% on the English data set and 89% on the Dutch data set. We show that each of the improvements contributes to the overall score of our improved NEN algorithm, and conclude with an error analysis on both Dutch and English language UGC. The NEN system is computationally efficient and runs with very modest computational requirements.}}


 * -- align="left" valign=top
 * Jitkrittum, Wittawat; Haruechaiyasak, Choochart & Theeramunkong, Thanaruk
 * QAST: question answering system for Thai Wikipedia
 * Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions
 * 2009
 * 
 * {{hidden||We propose an open-domain question answering system using Thai Wikipedia as the knowledge base. Two types of information are used for answering a question: (1) structured information extracted and stored in the form of Resource Description Framework (RDF), and (2) unstructured texts stored as a search index. For the structured information, a SPARQL-transformed query is applied to retrieve a short answer from the RDF base. For the unstructured information, a keyword-based query is used to retrieve the shortest text span containing the question's key terms. From the experimental results, the system which integrates both approaches could achieve an average MRR of 0.47 based on 215 test questions.}}


 * -- align="left" valign=top
 * Johnson, Peter C.; Kapadia, Apu; Tsang, Patrick P. & Smith, Sean W.
 * Nymble: anonymous IP-address blocking
 * Proceedings of the 7th international conference on Privacy enhancing technologies
 * 2007
 * 
 * {{hidden||Anonymizing networks such as Tor allow users to access Internet services privately using a series of routers to hide the client's IP address from the server. Tor's success, however, has been limited by users employing this anonymity for abusive purposes, such as defacing Wikipedia. Website administrators rely on IP-address blocking for disabling access to misbehaving users, but this is not practical if the abuser routes through Tor. As a result, administrators block all Tor exit nodes, denying anonymous access to honest and dishonest users alike. To address this problem, we present a system in which (1) honest users remain anonymous and their requests unlinkable; (2) a server can complain about a particular anonymous user and gain the ability to blacklist the user for future connections; (3) this blacklisted user's accesses before the complaint remain anonymous; and (4) users are aware of their blacklist status before accessing a service. As a result of these properties, our system is agnostic to different servers' definitions of misbehavior.}}


 * -- align="left" valign=top
 * Jordan, C.; Watters, C. & Toms, E.
 * Using Wikipedia to make academic abstracts more readable
 * Proceedings of the American Society for Information Science and Technology
 * 2008
 * 


 * -- align="left" valign=top
 * Junior, João Batista Bottentuit & Coutinho, Clara
 * The use of mobile technologies in Higher Education in Portugal: an exploratory survey
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Kabisch, Thomas; Padur, Ronald & Rother, Dirk
 * Using Web Knowledge to Improve the Wrapping of Web Sources
 * Proceedings of the 22nd International Conference on Data Engineering Workshops
 * 2006
 * 
 * {{hidden||During the wrapping of web interfaces ontological knowledge is important in order to support an automated interpretation of information. The development of ontologies is a time consuming issue and not realistic in global contexts. On the other hand, the web provides a huge amount of knowledge, which can be used instead of ontologies. Three common classes of web knowledge sources are: Web Thesauri, search engines and Web encyclopedias. The paper investigates how Web knowledge can be utilized to solve the three semantic problems Parameter Finding for Query Interfaces, Labeling of Values and Relabeling after interface evolution. For the solution of the parameter finding problem an algorithm has been implemented using the web encyclopedia WikiPedia for the initial identification of parameter value candidates and the search engine Google for a validation of label-value relationships. The approach has been integrated into a wrapper definition framework.}}


 * -- align="left" valign=top
 * Kallis, John R. & Patti, Christine
 * Creating an Enhanced Podcast with Section 508
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Kameyama, Shumei; Uchida, Makoto & Shirayama, Susumu
 * A New Method for Identifying Detected Communities Based on Graph Substructure
 * Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops
 * 2007
 * 


 * -- align="left" valign=top
 * Kaminishi, Hidekazu & Murota, Masao
 * Development of Multi-Screen Presentation Software
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Kapur, Manu; Hung, David; Jacobson, Michael; Voiklis, John; Kinzer, Charles K. & Victor, Chen Der-Thanq
 * Emergence of learning in computer-supported, large-scale collective dynamics: a research agenda
 * Proceedings of the 8th International Conference on Computer Supported Collaborative Learning
 * 2007
 * 
 * {{hidden||Seen through the lens of complexity theory, past CSCL research may largely be characterized as small-scale (i.e., small-group) collective dynamics. While this research tradition is substantive and meaningful in its own right, we propose a line of inquiry that seeks to understand computer-supported, large-scale collective dynamics: how large groups of interacting people leverage technology to create emergent organizations (knowledge, structures, norms, values, etc.) at the collective level that are not reducible to any individual, e.g., Wikipedia, online communities, etc. How does learning emerge in such large-scale collectives? Understanding the interactional dynamics of large-scale collectives is a critical and open research question, especially in the increasingly participatory, inter-connected, media-convergent culture of today. Recent CSCL research has alluded to this; we, however, develop the case further in terms of what it means for how one conceives learning, as well as methodologies for seeking understandings of how learning emerges in these large-scale networks. In the final analysis, we leverage complexity theory to advance computational agent-based models (ABMs) as part of an integrated, iteratively-validated Phenomenological-ABM inquiry cycle to understand emergent phenomena "from the bottom up".}}


 * -- align="left" valign=top
 * Karadag, Zekeriya & McDougall, Douglas
 * E-contests in Mathematics: Technological Challenges versus Technological Innovations
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Karakus, Turkan; Sancar, Hatice & Cagiltay, Kursat
 * An Eye Tracking Study: The Effects of Individual Differences on Navigation Patterns and Recall Performance on Hypertext Environments
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Karlsson, Mia
 * Teacher Educators Moving from Learning the Office Package to Learning About Digital Natives' Use of ICT
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Karsenti, Thierry; Goyer, Sophie; Villeneuve, Stephane & Raby, Carole
 * The efficacy of eportfolios: an experiment with pupils and student teachers from Canada
 * Society for Information Technology \& Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Karsenti, Thierry; Villeneuve, Stephane & Goyer, Sophie
 * The Development of an Eportfolio for Student Teachers
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Kasik, Maribeth Montgomery
 * Been there done that: emerged, evolved and ever changing face of e-learning and emerging technologies.
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Kasik, Maribeth Montgomery; Mott, Michael & Wasowski, Robert
 * Cyber Bullies Among the Digital Natives and Emerging Technologies
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Keengwe, Jared
 * Enhancing e-learning through Technology and Constructivist Pedagogy
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2005
 * 


 * -- align="left" valign=top
 * Kennard, Carl
 * Differences in Male and Female Wiki Participation during Educational Group Projects
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Kennard, Carl
 * Wiki Productivity and Discussion Forum Activity in a Postgraduate Online Distance Learning Course
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Kennedy, Ian
 * One Encyclopedia Per Child (OEPC) in Simple English
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Ketterl, Markus & Morisse, Karsten
 * User Generated Web Lecture Snippets to Support a Blended Learning Approach
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Khalid, Mahboob Alam & Verberne, Suzan
 * Passage retrieval for question answering using sliding windows
 * Proceedings of the 2nd workshop on Information Retrieval for Question Answering
 * 2008
 * 
 * {{hidden||The information retrieval (IR) community has investigated many different techniques to retrieve passages from large collections of documents for question answering (QA). In this paper, we specifically examine and quantitatively compare the impact of passage retrieval for QA using sliding windows and disjoint windows. We consider two different data sets, the TREC 2002--2003 QA data set, and 93 why-questions against INEX Wikipedia. We discovered that, compared to disjoint windows, using sliding windows results in improved performance of TREC-QA in terms of TDRR, and in improved performance of Why-QA in terms of success@n and MRR.}}


 * -- align="left" valign=top
 * Kidd, Jennifer; Baker, Peter; Kaufman, Jamie; Hall, Tiffany; O'Shea, Patrick & Allen, Dwight
 * Wikitextbooks: Pedagogical Tool for Student Empowerment
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Kidd, Jennifer; O'Shea, Patrick; Baker, Peter; Kaufman, Jamie & Allen, Dwight
 * Student-authored Wikibooks: Textbooks of the Future?
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Kidd, Jennifer; O'Shea, Patrick; Kaufman, Jamie; Baker, Peter; Hall, Tiffany & Allen, Dwight
 * An Evaluation of Web 2.0 Pedagogy: Student-authored Wikibook vs Traditional Textbook
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Kim, Daesang; Rueckert, Daniel & Hwang, Yeiseon
 * Let’s create a podcast!
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Kim, Youngjun & Baek, Youngkyun
 * Educational uses of HUD in Second Life
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Kimmerle, Joachim; Moskaliuk, Johannes & Cress, Ulrike
 * Learning and knowledge building with social software
 * Proceedings of the 9th international conference on Computer supported collaborative learning - Volume 1
 * 2009
 * 


 * -- align="left" valign=top
 * Kimmons, Royce
 * Digital Play, Ludology, and the Future of Educational Games
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Kimmons, Royce
 * What Does Open Collaboration on Wikipedia Really Look Like?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Kinney, Lance
 * Evidence of Engineering Education in Virtual Worlds
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Kiran, G.V.R.; Shankar, R. & Pudi, V.
 * Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge
 * Knowledge-Based and Intelligent Information and Engineering Systems. 14th International Conference, KES 2010, 8-10 Sept. 2010 Berlin, Germany
 * 2010
 * {{hidden||High dimensionality is a major challenge in document clustering. Some of the recent algorithms address this problem by using frequent itemsets for clustering. But most of these algorithms neglect the semantic relationship between the words. On the other hand, there are algorithms that take care of the semantic relations between the words by making use of external knowledge contained in WordNet, MeSH, Wikipedia, etc., but do not handle the high dimensionality. In this paper we present an efficient solution that addresses both these problems. We propose a hierarchical clustering algorithm using closed frequent itemsets that uses Wikipedia as external knowledge to enhance the document representation. We evaluate our methods based on F-Score on standard datasets and show our results to be better than existing approaches.}}


 * -- align="left" valign=top
 * Kobayashi, Michiko
 * Creating Wikis in the technology class: How do we use Wikis in K-12 classrooms?
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Koh, Elizabeth & Lim, John
 * An Integrated Collaboration System to Manage Student Team Projects
 * Global Learn Asia Pacific
 * 2010
 * 


 * -- align="left" valign=top
 * Kohlhase, Andrea
 * MS PowerPoint Use from a Micro-Perspective
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Kohlhase, Andrea
 * What if PowerPoint became emPowerPoint (through CPoint)?
 * Society for Information Technology \& Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Kolias, C.; Demertzis, S. & Kambourakis, G.
 * Design and implementation of a secure mobile wiki system
 * Seventh IASTED International Conference on Web-Based Education, 17-19 March 2008, Anaheim, CA, USA
 * 2008
 * {{hidden||During the last few years wikis have emerged as one of the most popular tool shells. Wikipedia has boosted their popularity, but they also keep a significant share in e-learning and intranet-based applications such as defect tracking, requirements management, test-case management, and project portals. However, existing wiki systems cannot fully support mobile clients due to several incompatibilities that exist. On top of that, an effective secure mobile wiki system must be lightweight enough to support low-end mobile devices having several limitations. In this paper we analyze the requirements for a novel multi-platform secure wiki implementation. XML Encryption and Signature specifications are employed to realize end-to-end confidentiality and integrity services. Our scheme can be applied selectively and only to sensitive wiki content, thus greatly diminishing the computational resources needed at both ends, the server and the client. To address authentication of wiki clients, a simple one-way authentication and session key agreement protocol is also introduced. The proposed solution can be easily applied to both centralized and forthcoming P2P wiki implementations.}}


 * -- align="left" valign=top
 * Kondo, Mitsumasa; Tanaka, Akimichi & Uchiyama, Tadasu
 * Search your interests everywhere!: wikipedia-based keyphrase extraction from web browsing history
 * Proceedings of the 21st ACM conference on Hypertext and hypermedia
 * 2010
 * 
 * {{hidden||This paper proposes a method that can extract user interests from the user's Web browsing history. Our method allows easy access to multiple content domains such as blogs, movies, QA sites, etc., since the user does not need to input a separate search query in each domain/site. To extract user interests, the method first extracts candidate keyphrases from the user's web browsed documents. Second, important keyphrases, obtained from a link structure analysis of Wikipedia content, are extracted from the main contents of web documents. This technique is based on the idea that important keyphrases in Wikipedia are important keyphrases in the real world. Finally, keyphrases contained in the documents important to the user are set in order as user interests. An experiment shows that our method offers improvements over a conventional method and can recommend interests attractive to the user.}}


 * -- align="left" valign=top
 * Koolen, Marijn; Kazai, Gabriella & Craswell, Nick
 * Wikipedia pages as entry points for book search
 * Proceedings of the Second ACM International Conference on Web Search and Data Mining
 * 2009
 * 


 * -- align="left" valign=top
 * Kowase, Yasufumi; Kaneko, Keiichi & Ishikawa, Masatoshi
 * A Learning System for Related Words based on Thesaurus and Image Retrievals
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Krauskopf, Karsten
 * Developing a psychological framework for teachers’ constructive implementation of digital media in the classroom – media competence from the perspective of socio-cognitive functions of digital tools.
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Krishnan, S. & Bieszczad, A.
 * SEW: the semantic Extensions to Wikipedia
 * 2007 International Conference on Semantic Web \& Web Services (SWWS'07), 25-28 June 2007, Las Vegas, NV, USA
 * 2007
 * {{hidden||The Semantic Web represents the next step in the evolution of the Web. The goal of the Semantic Web initiative is to create a universal medium for data exchange where data can be shared and processed by people as well as by automated tools. The paper presents the research and implementation of an application, SEW (Semantic Extensions to Wikipedia), that uses the Semantic Web technologies to extract information from the user and to store the data along with the semantics. SEW addresses the shortcomings of the existing portal, Wikipedia, through its knowledge extraction and representation techniques. The paper focuses on applying SEW to solving a problem in a real-world domain.}}


 * -- align="left" valign=top
 * Krotzsch, M.; Vrandecic, D. & Volkel, M.
 * Semantic MediaWiki
 * The Semantic Web - ISWC 2006. OTM 2006 Workshops. 5th International Semantic Web Conference, ISWC 2006. Proceedings, 5-9 Nov. 2006 Berlin, Germany
 * 2006
 * {{hidden||Semantic MediaWiki is an extension of MediaWiki, a widely used wiki engine that also powers Wikipedia. Its aim is to make semantic technologies available to a broad community by smoothly integrating them with the established usage of MediaWiki. The software is already used on a number of productive installations world-wide, but the main target remains to establish "Semantic Wikipedia" as an early adopter of semantic technologies on the Web. Thus usability and scalability are as important as powerful semantic features.}}


 * -- align="left" valign=top
 * Krupa, Y.; Vercouter, L.; Hubner, J.F. & Herzig, A.
 * Trust based Evaluation of Wikipedia's Contributors
 * Engineering Societies in the Agents World X. 10th International Workshop, ESAW 2009, 18-20 Nov. 2009 Berlin, Germany
 * 2009
 * 


 * -- align="left" valign=top
 * Kulathuramaiyer, Narayanan & Maurer, Hermann
 * Current Development of Mashups in Shaping Web Applications
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Kulathuramaiyer, Narayanan; Zaka, Bilal & Helic, Denis
 * Integrating Copy-Paste Checking into an E-Learning Ecosystem
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Kumar, Swapna
 * Building a Learning Community using Wikis in Educational Technology Courses
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Kumar, Swapna
 * Can We Model Wiki Use in Technology Courses to Help Teachers Use Wikis in their Classrooms?
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Kumaran, A.; Khapra, Mitesh M. & Li, Haizhou
 * Report of NEWS 2010 transliteration mining shared task
 * Proceedings of the 2010 Named Entities Workshop
 * 2010
 * 
 * {{hidden||This report documents the details of the Transliteration Mining Shared Task that was run as a part of the Named Entities Workshop (NEWS 2010), an ACL 2010 workshop. The shared task featured mining of name transliterations from the paired Wikipedia titles in 5 different language pairs, specifically, between English and one of Arabic, Chinese, Hindi, Russian and Tamil. In total, 5 groups took part in this shared task, participating in multiple mining tasks in different language pairs. The methodology and the data sets used in this shared task are published in the Shared Task White Paper [Kumaran et al., 2010]. We measure and report 3 metrics on the submitted results to calibrate the performance of individual systems on a commonly available Wikipedia dataset. We believe that the significant contribution of this shared task is in (i) assembling a diverse set of participants working in the area of transliteration mining, (ii) creating a baseline performance of transliteration mining systems in a set of diverse languages using commonly available Wikipedia data, and (iii) providing a basis for meaningful comparison and analysis of trade-offs between various algorithmic approaches used in mining. We believe that this shared task would complement the NEWS 2010 transliteration generation shared task, in enabling development of practical systems with a small amount of seed data in a given pair of languages.}}


 * -- align="left" valign=top
 * Kumaran, A.; Khapra, Mitesh M. & Li, Haizhou
 * Whitepaper of NEWS 2010 shared task on transliteration mining
 * Proceedings of the 2010 Named Entities Workshop
 * 2010
 * 
 * {{hidden||Transliteration is generally defined as phonetic translation of names across languages. Machine transliteration is a critical technology in many domains, such as machine translation, cross-language information retrieval/extraction, etc. Recent research has shown that high quality machine transliteration systems may be developed in a language-neutral manner, using a reasonably sized good quality corpus (~15--25K parallel names) between a given pair of languages. In this shared task, we focus on acquisition of such good quality names corpora in many languages, thus complementing the machine transliteration shared task that is concurrently conducted in the same NEWS 2010 workshop. Specifically, this task focuses on mining the Wikipedia paired entities data (aka inter-wiki-links) to produce high-quality transliteration data that may be used for transliteration tasks.}}


 * -- align="left" valign=top
 * Kunnath, Maria Lorna
 * MLAKedusoln eLearnovate's Unified E-Learning Strategy For the Semantic Web
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Kupatadze, Ketevan
 * Conducting chemistry lessons in Georgian schools with computer-educational programs (exemplified by one concrete program)
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Kurhila, Jaakko
 * "Unauthorized" Use of Social Software to Support Formal Higher Education
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Kutty, S.; Tran, Tien; Nayak, R. & Li, Yuefeng
 * Clustering XML documents using frequent subtrees
 * Advances in Focused Retrieval. 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, 15-18 Dec. 2008 Berlin, Germany
 * 2009
 * {{hidden||This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.}}


 * -- align="left" valign=top
 * Lahti, Lauri
 * Guided Generation of Pedagogical Concept Maps from the Wikipedia
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Lahti, L.
 * Educational tool based on topology and evolution of hyperlinks in the Wikipedia
 * 2010 IEEE 10th International Conference on Advanced Learning Technologies (ICALT 2010), 5-7 July 2010 Los Alamitos, CA, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Lai, Alice
 * An Examination of Technology-Mediated Feminist Consciousness-raising in Art Education
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Lapadat, Judith; Atkinson, Maureen & Brown, Willow
 * The Electronic Lives of Teens: Negotiating Access, Producing Digital Narratives, and Recovering From Internet Addiction
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Lara, Sonia & Naval, Concepción
 * Educative proposal of web 2.0 for the encouragement of social and citizenship competence
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Larson, M.; Newman, E. & Jones, G.J.F.
 * Overview of VideoCLEF 2009: new perspectives on speech-based multimedia content enrichment
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the "Beeldenstorm" collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity, and radical visual changes. The Linking Task, also called 'Finding Related Resources Across Languages', involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language "Beeldenstorm" collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation, and methods that targeted proper names.}}


 * -- align="left" valign=top
 * Lau, C.; Tjondronegoro, D.; Zhang, J.; Geva, S. & Liu, Y.
 * Fusing visual and textual retrieval techniques to effectively search large collections of Wikipedia images
 * Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany
 * 2007
 * {{hidden||This paper presents an experimental study that examines the performance of various combination techniques for content-based image retrieval using a fusion of visual and textual search results. The evaluation is comprehensively benchmarked using more than 160,000 samples from the INEX-MM2006 images dataset and the corresponding XML documents. For visual search, we have successfully combined Hough transform, object's color histogram, and texture (H.O.T). For comparison purposes, we used the provided UvA features. Based on the evaluation, our submissions show that the UvA+Text combination performs most effectively, but it is closely followed by our H.O.T (visual only) feature. Moreover, H.O.T+Text performance is still better than UvA (visual) only. These findings show that the combination of effective text and visual search results can improve the overall performance of CBIR in Wikipedia collections, which contain a heterogeneous (i.e. wide) range of genres and topics.}}


 * -- align="left" valign=top
 * Leake, David & Powell, Jay
 * Mining Large-Scale Knowledge Sources for Case Adaptation Knowledge
 * Proceedings of the 7th international conference on Case-Based Reasoning: Case-Based Reasoning Research and Development
 * 2007
 * 
 * {{hidden||Making case adaptation practical is a longstanding challenge for case-based reasoning. One of the impediments to widespread use of automated case adaptation is the adaptation knowledge bottleneck: the adaptation process may require extensive domain knowledge, which may be difficult or expensive for system developers to provide. This paper advances a new approach to addressing this problem, proposing that systems mine their adaptation knowledge as needed from pre-existing large-scale knowledge sources available on the World Wide Web. The paper begins by discussing the case adaptation problem, opportunities for adaptation knowledge mining, and issues for applying the approach. It then presents an initial illustration of the method in a case study of the testbed system WebAdapt. WebAdapt applies the approach in the travel planning domain, using OpenCyc, Wikipedia, and the Geonames GIS database as knowledge sources for generating substitutions. Experimental results suggest the promise of the approach, especially when information from multiple sources is combined.}}


 * -- align="left" valign=top
 * Lee, Jennifer
 * Fads and Facts in Technology-Based Learning Environments
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Lee, Stella & Dron, Jon
 * Giving Learners Control through Interaction Design
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Lee, Zeng-Han
 * Attitude Changes Toward Applying Technology (A case study of Meiho Institute of Technology in Taiwan)
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Lemay, Philippe
 * Game and flow concepts for learning: some considerations
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Leong, Peter; Joseph, Samuel; Ho, Curtis & Fulford, Catherine
 * Learning to learn in a virtual world: An exploratory qualitative study
 * Global Learn Asia Pacific
 * 2010
 * 


 * -- align="left" valign=top
 * Li, Haizhou & Kumaran, A.
 * Proceedings of the 2010 Named Entities Workshop
 * 2010
 * 
 * {{hidden||Named entities play a significant role in Natural Language Processing and Information Retrieval. While identifying and analyzing named entities in a given natural language is a challenging research problem by itself, the phenomenal growth in the Internet user population, especially among the non-English-speaking parts of the world, has extended this problem to the crosslingual arena. We specifically focus on research on all aspects of named entities in our workshop series, the Named Entities Workshop (NEWS). The first of the NEWS workshops (NEWS 2009) was held as a part of the ACL-IJCNLP 2009 conference in Singapore, and the current edition (NEWS 2010) is being held as a part of ACL 2010, in Uppsala, Sweden. The purpose of the NEWS workshop is to bring together researchers across the world interested in identification, analysis, extraction, mining and transformation of named entities in monolingual or multilingual natural language text. The workshop scope includes many interesting specific research areas pertaining to named entities, such as orthographic and phonetic characteristics, corpus analysis, unsupervised and supervised named entity extraction in monolingual or multilingual corpora, transliteration modelling, and evaluation methodologies, to name a few. For this year's edition, 11 research papers were submitted, each of which was reviewed by at least 3 reviewers from the program committee. 7 papers were chosen for publication, covering the main research areas, from named entity recognition, extraction and categorization, to distributional characteristics of named entities, and finally a novel evaluation metric for co-reference resolution. All accepted research papers are published in the workshop proceedings. 
This year, as part of the NEWS workshop, we organized two shared tasks: one on Machine Transliteration Generation, and another on Machine Transliteration Mining, with participation from research teams around the world, including industry, government laboratories and academia. The transliteration generation task was introduced in NEWS 2009. While the focus of the 2009 shared task was on establishing the quality metrics and on baselining the transliteration quality based on those metrics, the 2010 shared task expanded the scope of the transliteration generation task to about a dozen languages, and explored the quality depending on the direction of transliteration between the languages. We collected significantly large, hand-crafted parallel named entity corpora in a dozen different languages from 8 language families, and made them available as a common dataset for the shared task. We published the details of the shared task and the training and development data six months ahead of the conference, which attracted an overwhelming response from the research community. In total, 7 teams participated in the transliteration generation task. The approaches ranged from traditional unsupervised learning methods (such as phrasal SMT-based and Conditional Random Fields approaches) to somewhat unique approaches (such as the DirectTL approach), combined with several model combinations for result re-ranking. A report of the shared task that summarizes all submissions and the original whitepaper are also included in the proceedings, and will be presented in the workshop. The participants in the shared task were asked to submit short system papers (4 pages each) describing their approach, and each such paper was reviewed by at least two members of the program committee to help improve the quality of the content and presentation of the papers. 6 of them were finally accepted for publication in the workshop proceedings (one participating team did not submit their system paper in time). 
NEWS 2010 also featured a second shared task this year, on Transliteration Mining; in this shared task we focus specifically on mining transliterations from a commonly available resource, Wikipedia titles. The objective of this shared task is to identify transliterations from linked Wikipedia titles between English and another language in a non-Latin script. 5 teams participated in the mining task, each participating in multiple languages. The shared task was conducted in 5 language pairs, and the paired Wikipedia titles between English and each of the languages were provided to the participants. The participating systems' output was measured using three specific metrics. All the results are reported in the shared task report. We hope that NEWS 2010 will provide an exciting and productive forum for researchers working in this research area. The technical programme includes 7 research papers and 9 system papers (3 as oral papers, and 6 as poster papers) to be presented in the workshop. Further, we are pleased to have Dr Dan Roth, Professor at the University of Illinois and The Beckman Institute, delivering the keynote speech at the workshop.}}


 * -- align="left" valign=top
 * Li, Yun; Tian, Fang; Ren, F.; Kuroiwa, S. & Zhong, Yixin
 * A method of semantic dictionary construction from online encyclopedia classifications
 * 2007 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE '07), 30 Aug.-1 Sept. 2007 Piscataway, NJ, USA
 * 2007
 * {{hidden||This paper introduces a method of constructing a semantic dictionary automatically from the keywords and classification relations of the web encyclopedia Chinese Wikipedia. Semantic units, which are affixes (core/modifier) shared between many phrased keywords, are selected using statistical methods and string affix matching, together with other units that explain the semantic meanings. The results are then used to mark the semantic explanations for most Wikipedia keywords by analyzing surface text or upper classes. The features, structure, and advantages compared with other semantic resources are also discussed.}}


 * -- align="left" valign=top
 * Liao, Ching-Jung & Sun, Cheng-Chieh
 * A RIA-Based Collaborative Learning System for E-Learning 2.0
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Liao, Ching-Jung & Yang, Jin-Tan
 * The Development of a Pervasive Collaborative LMS 2.0
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Lim, Keol & Park, So Youn
 * An Exploratory Approach to Understanding the Purposes of Computer and Internet Use in Web 2.0 Trends
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Lim, Ee-Peng; Vuong, Ba-Quy; Lauw, Hady Wirawan & Sun, Aixin
 * Measuring Qualities of Articles Contributed by Online Communities
 * Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
 * 2006
 * 


 * -- align="left" valign=top
 * Lin, Hong & Kelsey, Kathleen
 * Do Traditional and Online Learning Environments Impact Collaborative Learning with Wiki?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Lin, S.
 * College students' perceptions, motivations and uses of Wikipedia
 * Proceedings of the American Society for Information Science and Technology
 * 2008
 * 


 * -- align="left" valign=top
 * Lin, Chun-Yi
 * Integrating wikis to support collaborative learning in higher education: A design-based approach to developing the instructional theory
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Lin, Chun-Yi & Lee, Hyunkyung
 * Adult Learners' Motivations in the Use of Wikis: Wikipedia, Higher Education, and Corporate Settings
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Lin, Chun-Yi; Lee, Lena & Bonk, Curtis
 * Teaching Innovations on Wikis: Practices and Perspectives of Early Childhood and Elementary School Teachers
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Lin, Meng-Fen Grace; Sajjapanroj, Suthiporn & Bonk, Curtis
 * Wikibooks and Wikibookians: Loosely-Coupled Community or the Future of the Textbook Industry?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Lindroth, Tomas & Lundin, Johan
 * Students with laptops – the laptop as portfolio
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Linser, Roni; Ip, Albert; Rosser, Elizabeth & Leigh, Elyssebeth
 * On-line Games, Simulations & Role-plays as Learning Environments: Boundary and Role Characteristics
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Lisk, Randy & Brown, Victoria
 * Digital Paper: The Possibilities
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Liu, Leping & Maddux, Cleborne
 * Online Publishing: A New Online Journal on “Social Media in Education”
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Liu, Min; Hamilton, Kurstin & Wivagg, Jennifer
 * Facilitating Pre-Service Teachers’ Understanding of Technology Use With Instructional Activities
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Liu, Sandra Shu-Chao & Lin, Elaine Mei-Ying
 * Using the Internet in Developing Taiwanese Students' English Writing Abilities
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Liu, Xiongyi; Li, Lan & Vonderwell, Selma
 * Digital Ink-Based Engaged Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Liu, X.; Qin, J.; Chen, M. & Park, J.-H.
 * Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia
 * Proceedings of the American Society for Information Science and Technology
 * 2008
 * 


 * -- align="left" valign=top
 * Livingston, Michael; Strickland, Jane & Moulton, Shane
 * Decolonizing Indigenous Web Sites
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Livne, Nava; Livne, Oren & Wight, Charles
 * Automated Error Analysis through Parsing Mathematical Expressions in Adaptive Online Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Llorente, A.; Motta, E. & Ruger, S.
 * Exploring the Semantics behind a Collection to Improve Automated Image Annotation
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||The goal of this research is to explore several semantic relatedness measures that help to refine annotations generated by a baseline non-parametric density estimation algorithm. Thus, we analyse the benefits of performing a statistical correlation using the training set or using the World Wide Web versus approaches based on a thesaurus like WordNet or Wikipedia (considered as a hyperlink structure). Experiments are carried out using the dataset provided by the 2009 edition of the ImageCLEF competition, a subset of the MIR-Flickr 25k collection. Best results correspond to approaches based on statistical correlation as they do not depend on a prior disambiguation phase like WordNet and Wikipedia. Further work needs to be done to assess whether proper disambiguation schemas might improve their performance.}}


 * -- align="left" valign=top
 * Lopes, António; Pires, Bruno; Cardoso, Márcio; Santos, Arnaldo; Peixinho, Filipe; Sequeira, Pedro & Morgado, Leonel
 * System for Defining and Reproducing Handball Strategies in Second Life On-Demand for Handball Coaches’ Education
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Lopes, Rui & Carriço, Luis
 * On the credibility of wikipedia: an accessibility perspective
 * Proceeding of the 2nd ACM workshop on Information credibility on the web
 * 2008
 * 


 * -- align="left" valign=top
 * Lopes, Rui & Carriço, Luís
 * The impact of accessibility assessment in macro scale universal usability studies of the web
 * Proceedings of the 2008 international cross-disciplinary conference on Web accessibility (W4A)
 * 2008
 * 
 * {{hidden||This paper presents a modelling framework, Web Interaction Environments, to express the synergies and differences of audiences, in order to study universal usability of the Web. Based on this framework, we have expressed the implicit model of WCAG and developed an experimental study to assess the Web accessibility quality of Wikipedia at a macro scale. This resulted in the finding that template mechanisms such as those provided by Wikipedia lower the burden of producing accessible contents, but provide no guarantee that hyperlinks to external websites maintain accessibility quality. We discuss the black-boxed nature of guidelines such as WCAG and how formalising audiences helps leverage universal usability studies of the Web at macro scales.}}


 * -- align="left" valign=top
 * Lopez, Patrice & Romary, Laurent
 * HUMB: Automatic key term extraction from scientific articles in GROBID
 * Proceedings of the 5th International Workshop on Semantic Evaluation
 * 2010
 * 
 * {{hidden||The Semeval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.}}


 * -- align="left" valign=top
 * Lops, P.; Basile, P.; de Gemmis, M. & Semeraro, G.
 * Language Is the Skin of My Thought: Integrating Wikipedia and AI to Support a Guillotine Player
 * AI*IA 2009: Emergent Perspectives in Artificial Intelligence. Xlth International Conference of the Italian Association for Artificial Intelligence, 9-12 Dec. 2009 Berlin, Germany
 * 2009
 * 
 * {{hidden||This paper describes OTTHO (On the Tip of my THOught), a system designed for solving a language game, called Guillotine, which demands knowledge covering a broad range of topics, such as movies, politics, literature, history, proverbs, and popular culture. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. The system exploits several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia to realize a knowledge infusion process. The paper describes the process of modeling these sources and the reasoning mechanism to find the solution of the game. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability of interpreting natural language documents and reasoning on their content. Experiments carried out showed promising results. Our feeling is that the presented approach has a great potential for other more practical applications besides solving a language game.}}


 * -- align="left" valign=top
 * Lotzmann, U.
 * Enhancing agents with normative capabilities
 * 24th European Conference on Modelling and Simulation, ECMS 2010, 1-4 June 2010 Nottingham, UK
 * 2010
 * {{hidden||This paper describes the derivation of a software architecture (and its implementation, called EMIL-S) from a logical normative agent architecture (called EMIL-A). After a short introduction to the theoretical background of agent-based normative social simulation, the paper focuses on intra-agent structures and processes. The pivotal element in this regard is a rule-based agent design with a corresponding "generalised intra-agent process" that involves decision making and learning capabilities. The resulting simulation dynamics are illustrated afterwards by means of an application example where agents contribute to a Wikipedia community by writing, editing and discussing articles. Findings and material presented in the paper are part of the results achieved in the FP6 project EMIL (EMergence In the Loop: Simulating the two-way dynamics of norm innovation).}}


 * -- align="left" valign=top
 * Louis, Ellyn St; McCauley, Pete; Breuch, Tyler; Hatten, Jim & Louis, Ellyn St
 * Artscura: Experiencing Art Through Art
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Lowerison, Gretchen & Schmid, Richard F
 * Pedagogical Implications of Using Learner-Controlled, Web-based Tools for Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Lu, Jianguo; Wang, Yan; Liang, Jie; Chen, Jessica & Liu, Jiming
 * An Approach to Deep Web Crawling by Sampling
 * Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
 * 2008
 * 


 * -- align="left" valign=top
 * Lu, Laura
 * Digital Divide: Does the Internet Speak Your Language?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Lucassen, Teun & Schraagen, Jan Maarten
 * Trust in wikipedia: how users trust information from an unknown source
 * Proceedings of the 4th workshop on Information credibility
 * 2010
 * 


 * -- align="left" valign=top
 * Lund, Andreas & Rasmussen, Ingvill
 * Tasks 2.0: Education Meets Social Computing and Mass Collaboration
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Luther, Kurt
 * Supporting and transforming leadership in online creative collaboration
 * Proceedings of the ACM 2009 international conference on Supporting group work
 * 2009
 * 


 * -- align="left" valign=top
 * Luyt, B.; Kwek, Wee Tin; Sim, Ju Wei & York, Peng
 * Evaluating the comprehensiveness of wikipedia: the case of biochemistry
 * Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. 10th International Conference on Asian Digital Libraries, ICADL 2007, 10-13 Dec. 2007 Berlin, Germany
 * 2007


 * -- align="left" valign=top
 * Lykourentzou, Ioanna; Vergados, Dimitrios J. & Loumos, Vassili
 * Collective intelligence system engineering
 * Proceedings of the International Conference on Management of Emergent Digital EcoSystems
 * 2009
 * 
 * {{hidden||Collective intelligence (CI) is an emerging research field which aims at combining human and machine intelligence, to improve community processes usually performed by large groups. CI systems may be collaborative, like Wikipedia, or competitive, like a number of recently established problem-solving companies that attempt to find solutions to difficult R&D or marketing problems drawing on the competition among web users. The benefits that CI systems bring to user communities, combined with the fact that they share a number of basic common characteristics, open up the prospect for the design of a general methodology that will allow the efficient development and evaluation of CI. In the present work, an attempt is made to establish the analytical foundations and main challenges for the design and construction of a generic collective intelligence system. First, collective intelligence systems are categorized into active and passive and specific examples of each category are provided. Then, the basic modeling framework of CI systems is described. This includes concepts such as the set of possible user actions, the CI system state and the individual and community objectives. Additional functions, which estimate the expected user actions, the future state of the system, as well as the level of objective fulfillment, are also established. In addition, certain key issues that need to be considered prior to system launch are also described. The proposed framework is expected to promote efficient CI design, so that the benefit gained by the community and the individuals through the use of CI systems will be maximized.}}


 * -- align="left" valign=top
 * Mach, Nada
 * Gaming, Learning 2.0, and the Digital Divide
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Mach, Nada
 * Reorganizing Schools to Engage Learners through Using Learning 2.0
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Mach, Nada & Bhattacharya, Madhumita
 * Social Learning Versus Individualized Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * MacKenzie, Kathleen
 * Distance Education Policy: A Study of the SREB Faculty Support Policy Construct at Four Virtual College and University Consortia.
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Maddux, Cleborne; Johnson, Lamont & Ewing-Taylor, Jacque
 * An Annotated Bibliography of Outstanding Educational Technology Sites on the Web: A Study of Usefulness and Design Quality
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Mader, Elke; Budka, Philipp; Anderl, Elisabeth; Stockinger, Johann & Halbmayer, Ernst
 * Blended Learning Strategies for Methodology Education in an Austrian Social Science Setting
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Malik, Manish
 * Work In Progress: Use of Social Software for Final Year Project Supervision at a Campus Based University
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Malyn-Smith, Joyce; Coulter, Bob; Denner, Jill; Lee, Irene; Stiles, Joel & Werner, Linda
 * Computational Thinking in K-12: Defining the Space
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Manfra, Meghan; Friedman, Adam; Hammond, Thomas & Lee, John
 * Peering behind the curtain: Digital history, historiography, and secondary social studies methods
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Marenzi, Ivana; Demidova, Elena & Nejdl, Wolfgang
 * LearnWeb 2.0 - Integrating Social Software for Lifelong Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Margaryan, Anoush; Nicol, David; Littlejohn, Allison & Trinder, Kathryn
 * Students’ use of technologies to support formal and informal learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Martin, Philippe; Eboueya, Michel; Blumenstein, Michael & Deer, Peter
 * A Network of Semantically Structured Wikipedia to Bind Information
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Martin, Sylvia S. & Crawford, Caroline M.
 * Special Education Methods Coursework: Information Literacy for Teachers through the Implementation of Graphic Novels
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Martinez-Cruz, C. & Angeletou, S.
 * Folksonomy expansion process using soft techniques
 * 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010), 10-12 Aug. 2010 Piscataway, NJ, USA
 * 2010
 * 
 * {{hidden||The use of folksonomies involves several problems due to the lack of semantics associated with them. The nature of these structures makes it difficult to enrich them semantically by associating meaningful terms of the Semantic Web. This task implies a phase of disambiguation and another of expansion of the initial tagset, returning an enlarged, contextualised set that includes synonyms, hypernyms, gloss terms, etc. In this novel proposal a technique based on confidence and similarity degrees is applied to weight this extended tagset in order to allow the user to obtain a customised resulting tagset. Moreover, a comparison between the two main thesauri, WordNet and Wikipedia, is presented due to their great influence on the disambiguation and expansion process.}}


 * -- align="left" valign=top
 * Martland, David
 * The Development of Web/Learning Communities: Is Technology the Way Forward?
 * Society for Information Technology & Teacher Education International Conference
 * 2004
 * 


 * -- align="left" valign=top
 * Martland, David
 * E-learning: What communication tools does it require?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2003
 * 


 * -- align="left" valign=top
 * Mass, Y.
 * IBM HRL at INEX 06
 * Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany
 * 2007
 * {{hidden||In previous INEX years we presented an XML component ranking algorithm that was based on the separation of nested XML elements into different indices. This worked fine for the IEEE collection, which has a small number of potential component types that can be returned as query results. However, such an assumption doesn't scale to this year's Wikipedia collection, where there is a large set of potential component types that can be returned. We show a new version of the component ranking algorithm that does not assume any knowledge of the set of component types. We then show some preliminary work we did to exploit the connectivity of the Wikipedia collection to improve ranking.}}


 * -- align="left" valign=top
 * Matsuno, Ryoji; Tsutsumi, Yutaka; Matsuo, Kanako & Gilbert, Richard
 * MiWIT: Integrated ESL/EFL Text Analysis Tools for Content Creation in MSWord
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Matthew, Kathryn & Callaway, Rebecca
 * Wiki as a Collaborative Learning Tool
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Matthew, Kathryn; Callaway, Rebecca; Matthew, Christie & Matthew, Josh
 * Online Solitude: A Lack of Student Interaction
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Matthew, Kathryn; Felvegi, Emese & Callaway, Rebecca
 * Collaborative Learning Using a Wiki
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Maurer, Hermann & Kulathuramaiyer, Narayanan
 * Coping With the Copy-Paste-Syndrome
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Maurer, Hermann & Safran, Christian
 * Beyond Wikipedia
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Maurer, Hermann & Schinagl, Wolfgang
 * E-Quiz - A Simple Tool to Enhance Intra-Organisational Knowledge Management eLearning and Edutainment Training
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Maurer, Hermann & Schinagl, Wolfgang
 * Wikis and other E-communities are Changing the Web
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Maurer, Hermann & Zaka, Bilal
 * Plagiarism - A Problem And How To Fight It
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * McCulloch, Allison & Smith, Ryan
 * The Nature of Students’ Collaboration in the Creation of a Wiki
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * McCulloch, Allison; Smith, Ryan; Wilson, P. Holt; McCammon, Lodge; Stein, Catherine & Arias, Cecilia
 * Creating Asynchronous Learning Communities in Mathematics Teacher Education, Part 2
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * McDonald, Roger
 * Using the Secure Wiki for Teaching Scientific Collaborative
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * McGee, Patricia; Carmean, Colleen; Rauch, Ulrich; Noakes, Nick & Lomas, Cyprien
 * Learning in a Virtual World, Part 2
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * McKay, Sean
 * Wiki as CMS
 * Society for Information Technology & Teacher Education International Conference
 * 2005
 * 


 * -- align="left" valign=top
 * McKay, Sean & Headley, Scot
 * Best Practices for the Use of Wikis in Teacher Education Programs
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * McLoughlin, Catherine & Lee, Mark J.W.
 * Listen and learn: A systematic review of the evidence that podcasting supports learning in higher education
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * McNeil, Sara; White, Cameron; Angela, Miller & Behling, Debbie
 * Emerging Web 2.0 Technologies to Enhance Teaching and Learning in American History Classrooms
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Mehdad, Yashar; Moschitti, Alessandro & Zanzotto, Fabio Massimo
 * Syntactic/semantic structures for textual entailment recognition
 * HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||In this paper, we describe an approach based on off-the-shelf parsers and semantic resources for the Recognizing Textual Entailment (RTE) challenge that can be generally applied to any domain. Syntax is exploited by means of tree kernels, whereas lexical semantics is derived from heterogeneous resources, e.g. WordNet or distributional semantics through Wikipedia. The joint syntactic/semantic model is realized by means of tree kernels, which can exploit lexical relatedness to match syntactically similar structures, i.e. structures whose lexical compounds are related. The comparative experiments across different RTE challenges and traditional systems show that our approach consistently and meaningfully achieves high accuracy, without requiring any adaptation or tuning.}}


 * -- align="left" valign=top
 * Meijer, Erik
 * Fooled by expediency, saved by duality: how i denied the fallacies of distributed programming and trivialized the CAP theorem, but found the truth in math
 * Proceedings of the 2010 Workshop on Analysis and Programming Languages for Web Applications and Cloud Applications
 * 2010
 * 
 * {{hidden||Serendipitously, I recently picked up a copy of the book "Leadership and Self-Deception" at my local thrift store. According to Wikipedia, "Self-deception is a process of denying or rationalizing away the relevance, significance, or importance of opposing evidence and logical argument." While reading the book it occurred to me that I unknowingly minimized and repressed the fallacies of distributed programming and the CAP theorem in my own work on "Democratizing the Cloud". Instead of redirecting the forces that distribution imposes on the design of systems to create the simplest possible correct solution, I foolishly tried to attack them directly, thereby making things more difficult than necessary and, of course, ultimately failing. Fortunately, I found redemption for my sins by turning to math. By judicious use of categorical duality, literally reversing the arrows, we show how scalable is the dual of non-scalable. The result is a scalable and compositional approach to building distributed systems that we believe is so simple it can be applied by the average developer.}}


 * -- align="left" valign=top
 * Memmel, Martin; Wolpers, Martin & Tomadaki, Eleftheria
 * An Approach to Enable Collective Intelligence in Digital Repositories
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Meza, R. & Buchmann, R.A.
 * Real-time Social Networking Profile Information Semantization Using Pipes And FCA
 * 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR 2010), 28-30 May 2010, Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Millard, Mark & Essex, Christopher
 * Web 2.0 Technologies for Social and Collaborative E-Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Milne, David; Medelyan, Olena & Witten, Ian H.
 * Mining Domain-Specific Thesauri from Wikipedia: A Case Study
 * Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
 * 2006
 * 


 * -- align="left" valign=top
 * Min, Jinming; Wilkins, P.; Leveling, J. & Jones, G.J.F.
 * Document Expansion for Text-based Image Retrieval at CLEF 2009
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * {{hidden||In this paper, we describe and analyze our participation in the WikipediaMM task at CLEF 2009. Our main efforts concern the expansion of the image metadata from the Wikipedia abstracts collection DBpedia. In our experiments, we use the Okapi feedback algorithm for document expansion. Compared with our text retrieval baseline, our best document expansion run improves MAP by 17.89%. As one of our conclusions, document expansion from an external resource can be an effective factor in the image metadata retrieval task.}}


 * -- align="left" valign=top
 * Missen, Malik Muhammad Saad; Boughanem, Mohand & Cabanac, Guillaume
 * Using passage-based language model for opinion detection in blogs
 * Proceedings of the 2010 ACM Symposium on Applied Computing
 * 2010
 * 
 * {{hidden||In this work, we evaluate the importance of passages in blogs, especially when dealing with the task of opinion detection. We argue that passages are the basic building blocks of blogs; therefore, we use a passage-based language modeling approach for opinion finding in blogs. Our decision to use language modeling (LM) in this work is based on the performance LM has shown in various opinion detection approaches. In addition, we propose a novel method for bi-dimensional query expansion with relevant and opinionated terms, using Wikipedia and a relevance-feedback mechanism respectively. We also compare the impact of two different query term weighting (and ranking) approaches on the final results, as well as the performance of three passage-based document ranking functions (Linear, Avg, Max). For evaluation purposes, we use the TREC Blog06 collection with the 50 topics of TREC 2006, over the best TREC-provided baseline with an opinion finding MAP of 0.3022. Our approach gives a MAP improvement of almost 9.29% over the best TREC-provided baseline (baseline4).}}


 * -- align="left" valign=top
 * Mitsuhara, Hiroyuki; Kanenishi, Kazuhide & Yano, Yoneo
 * Learning Process Sharing for Educational Modification of the Web
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2004
 * 


 * -- align="left" valign=top
 * Moan, Michael
 * Special student contest on a collective intelligence challenge problem
 * Proceedings of the 2009 conference on American Control Conference
 * 2009
 * 
 * {{hidden||Students and observers are cordially invited to join a Student Special Session on Thursday afternoon concerning a "Collective Intelligence Challenge Problem" (snacks and coffee provided). From Wikipedia: "Collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals." Sign up at the registration desk beginning on Tuesday for speaking time slots (5 minutes) and the option to present two PowerPoint slides in a speed session focused on using collective intelligence to understand possible research areas for collaborative control system engineering. For instance, come give us your thoughts on how we can better organize and disseminate controls knowledge, control algorithm objects, and control system building blocks within the open cyber world. How do we solve the problem that control system engineers within industry are overwhelmed by the amount of controls-related information available through cyber discovery, as even a simple search on "control system" gives over 1 billion hits? As control system engineers, how should we organize the knowledge within our area of engineering to facilitate expedient development of control systems in an increasingly systems-of-systems world? Please feel free to share this invitation with your colleagues. There is no fee or peer review for this session, and special session participants will receive a token of appreciation for participating. Registration will be accepted on a first-come, first-served basis until all the available time slots are taken. To register, please stop by the registration desk on Tuesday or Wednesday.}}


 * -- align="left" valign=top
 * Moran, John
 * Mashups - the Web's Collages
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Morneau, Maxime & Mineau, Guy W.
 * Employing a Domain Specific Ontology to Perform Semantic Search
 * Proceedings of the 16th international conference on Conceptual Structures: Knowledge Visualization and Reasoning
 * 2008
 * 
 * {{hidden||Increasing the relevancy of Web search results has been a major concern in research over recent years. Boolean search, metadata, natural language based processing, and various other techniques have been applied to improve the quality of search results sent to a user. Ontology-based methods were proposed to refine the information extraction process, but they have not yet achieved wide adoption by search engines, mainly because the ontology building process is time consuming. An all-inclusive ontology for the entire World Wide Web might be difficult if not impossible to construct, but a specific domain ontology can be automatically built using statistical and machine learning techniques, as done with our tool, SeseiOnto. In this paper, we describe how we adapted the SeseiOnto software to perform Web search on the Wikipedia page on climate change. SeseiOnto, by using conceptual graphs to represent natural language and an ontology to extract links between concepts, manages to properly answer natural language queries about climate change. Our tests show that SeseiOnto has the potential to be used in domain-specific Web search as well as in corporate intranets.}}


 * -- align="left" valign=top
 * Moseley, Warren; Campbell, Brian & Campbell, Melaine
 * OK-RMSP-2006-COP: The Oklahoma Rural Math and Science Partnership’s Community of Practice
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Moseley, Warren; Campbell, Brian; Thomason, Matt & Mengers, Jessica
 * SMART-COP - Legitimate Peripheral Participation in the Science Math Association of Rural Teacher’s Community of Practice
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Moseley, Warren; Campbell, Brian; Thompson, Matt & Mengers, Jessica
 * A Sense of Urgency: Linking the Tom P. Stafford Air and Space Museum to the Science and Math Association of Rural Teacher’s Community Of Practice (SMART-COP)
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Moseley, Warren & Raoufi, Mehdi
 * ROCCA: The Rural Oklahoma Collaborative Computing Alliance
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Moshirnia, Andrew
 * Am I Still Wiki? The Creeping Centralization of Academic Wikis
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Moshirnia, Andrew
 * The Educational Implications of Synchronous and Asynchronous Peer-Tutoring in Video Games
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Moshirnia, Andrew
 * Emergent Features and Reciprocal Innovation in Modding Communities
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Moshirnia, Andrew
 * What do I Press? The Limited role of Collaborative Websites in Teacher Preparation Programs.
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Moshirnia, Andrew & Israel, Maya
 * The Use of Graphic Organizers within E-mentoring Wikis
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Motschnig, Renate & Figl, Kathrin
 * The Effects of Person Centered Education on Communication and Community Building
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Moulin, C.; Barat, C.; Lemaitre, C.; Gery, M.; Ducottet, C. & Largeron, C.
 * Combining Text/Image In WikipediaMM Task 2009
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF Wikipedia task 2009. We extend our previous multimedia model, defined as a vector of textual and visual information based on a bag-of-words approach. We extract additional textual information from the original Wikipedia articles and compute several image descriptors (local colour and texture features). We show that linearly combining textual and visual information significantly improves the results.}}


 * -- align="left" valign=top
 * Muller-Birn, Claudia; Lehmann, Janette & Jeschke, Sabina
 * A Composite Calculation for Author Activity in Wikis: Accuracy Needed
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
 * 2009
 * 


 * -- align="left" valign=top
 * Murakami, Violet
 * The Learning Community Class combining an Introduction to Digital Art class with a Hawaiian Studies Native Plants and Their Uses Class: A Case Study
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Murugeshan, M.S. & Mukherjee, S.
 * An n-gram and initial description based approach for entity ranking track
 * Focused Access to XML Documents. 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, 17-19 Dec. 2007 Berlin, Germany
 * 2008
 * 
 * {{hidden||The most important work that takes center stage in the entity ranking track of INEX is proper query formation. Both of the subtasks, namely entity ranking and list completion, would benefit immensely if the given query could be expanded with more relevant terms, thereby improving the efficiency of the search engine. This paper stresses the correct identification of "meaningful n-grams" from the given title, and the proper selection of the "prominent n-grams" among them, as the most important task for improving query formation and hence the effectiveness of the overall entity ranking tasks. We also exploit the initial descriptions (IDES) of the Wikipedia articles for ranking the retrieved answers based on their similarities with the given topic. The list completion task is further aided by the related Wikipedia articles, which boosted the scores of retrieved answers.}}


 * -- align="left" valign=top
 * Myoupo, D.; Popescu, A.; Borgne, H. Le & Moellic, P.-A.
 * Multimodal image retrieval over a large database
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||We introduce a new multimodal retrieval technique which combines query reformulation and visual image reranking in order to deal with result sparsity and imprecision, respectively. Textual queries are reformulated using Wikipedia knowledge, and results are then reordered using a K-NN based reranking method. We compare textual and multimodal retrieval and show that introducing visual reranking results in a significant improvement of performance.}}


 * -- align="left" valign=top
 * Mödritscher, Felix; Garcia-Barrios, Victor Manuel; Gütl, Christian & Helic, Denis
 * The first AdeLE Prototype at a Glance
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Mödritscher, Felix; Garcia-Barrios, Victor Manuel & Maurer, Hermann
 * The Use of a Dynamic Background Library within the Scope of adaptive e-Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2005
 * 


 * -- align="left" valign=top
 * Möller, Manuel; Regel, Sven & Sintek, Michael
 * RadSem: Semantic Annotation and Retrieval for Medical Images
 * Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications
 * 2009
 * 
 * {{hidden||We present a tool for semantic medical image annotation and retrieval. It leverages the MEDICO ontology, which covers formal background information from various biomedical ontologies such as the Foundational Model of Anatomy (FMA) and terminologies like ICD-10 and RadLex, and covers various aspects of clinical procedures. This ontology is used during several steps of annotation and retrieval: (1) We developed an ontology-driven metadata extractor for the medical image format DICOM. Its output contains, e.g., person name, age, image acquisition parameters, body region, etc. (2) The output from (1) is used to simplify the manual annotation by providing intuitive visualizations and a preselected subset of annotation concepts. Furthermore, the extracted metadata is linked together with anatomical annotations and clinical findings to generate a unified view of a patient's medical history. (3) On the search side we perform query expansion based on the structure of the medical ontologies. (4) Our ontology for clinical data management allows us to link and combine patients, medical images and annotations together in a comprehensive result list. (5) The medical annotations are further extended by links to external sources like Wikipedia to provide additional information.}}


 * -- align="left" valign=top
 * Nabende, Peter
 * Mining transliterations from Wikipedia using pair HMMs
 * Proceedings of the 2010 Named Entities Workshop
 * 2010
 * 


 * -- align="left" valign=top
 * Nagler, Walther & Ebner, Martin
 * Is Your University Ready For the Ne(x)t-Generation?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Nagler, Walther; Huber, Thomas & Ebner, Martin
 * The ABC-eBook System - From Content Management Application to Mashup Landscape
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Najim, Najim Ussiph
 * VLE And Its Impact On Learning Experience Of Students: Echoes From Rural Community School In Ghana
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Nakamura, Carlos; Lajoie, Susanne & Berdugo, Gloria
 * Do Information Systems Actually Improve Problem-Solving and Decision-Making Performance? -- An Analysis of 3 Different Approaches to the Design of Information Systems
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2005
 * 


 * -- align="left" valign=top
 * Nakasaki, H.; Kawaba, M.; Utsuro, T.; Fukuhara, T.; Nakagawa, H. & Kando, N.
 * Cross-lingual blog analysis by cross-lingual comparison of characteristic terms and blog posts
 * 2008 Second International Symposium on Universal Communication, 15-16 Dec. 2008 Piscataway, NJ, USA
 * 2008
 * 


 * -- align="left" valign=top
 * Nakayama, K.; Hara, T. & Nishio, S.
 * A thesaurus construction method from large scale Web dictionaries
 * 21st International Conference on Advanced Information Networking and Applications (AINA '07), 21-23 May 2007 Piscataway, NJ, USA
 * 2007


 * -- align="left" valign=top
 * Nance, Kara; Hay, Brian & Possenti, Karina
 * Communicating Computer Security Issues to K-12 Teachers and Students
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Naumanen, Minnamari & Tukiainen, Markku
 * Discretionary use of computers and Internet among senior-clubbers – communication, writing and information search
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Naumanen, Minnamari & Tukiainen, Markku
 * K-60 - Access to ICT granted but not taken for granted
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Navarro, Emmanuel; Sajous, Franck; Gaume, Bruno; Prévot, Laurent; ShuKai, Hsieh; Tzu-Yi, Kuo; Magistry, Pierre & Chu-Ren, Huang
 * Wiktionary and NLP: improving synonymy networks
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 
 * {{hidden||Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. However, it needs to be processed before it can be used efficiently as an NLP resource. After describing the relevant aspects of Wiktionary for our purposes, we focus on its structural properties. Then, we describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those extracted from traditional resources. Finally, we describe two methods for semi-automatically improving this network by adding missing relations: (i) using a kind of semantic proximity measure; (ii) using translation relations of Wiktionary itself.}}


 * -- align="left" valign=top
 * Newbury, Robert
 * Podcasting: Beyond Fad and Into Reality
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Newman, David; Lau, Jey Han; Grieser, Karl & Baldwin, Timothy
 * Automatic evaluation of topic coherence
 * HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In comparison with human scores for a set of learned topics over two distinct datasets, we show a simple co-occurrence measure based on pointwise mutual information over Wikipedia data is able to achieve results for the task at or nearing the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results. Google produces strong, if less consistent, results, while our results over WordNet are patchy at best.}}


 * -- align="left" valign=top
 * Nguyen, Huong; Nguyen, Thanh; Nguyen, Hoa & Freire, Juliana
 * Querying Wikipedia documents and relationships
 * WebDB '10 Proceedings of the 13th International Workshop on the Web and Databases
 * 2010
 * 


 * -- align="left" valign=top
 * Niemann, Katja & Wolpers, Martin
 * Real World Object Based Access to Architecture Learning Material – the MACE Experience
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Nowak, Stefanie; Llorente, Ainhoa; Motta, Enrico & Rüger, Stefan
 * The effect of semantic relatedness measures on multi-label classification evaluation
 * Proceedings of the ACM International Conference on Image and Video Retrieval
 * 2010
 * 
 * {{hidden||In this paper, we explore different ways of formulating new evaluation measures for multi-label image classification when the vocabulary of the collection adopts the hierarchical structure of an ontology. We apply several semantic relatedness measures based on web-search engines, WordNet, Wikipedia and Flickr to the ontology-based score (OS) proposed in [22]. The final objective is to assess the benefit of integrating semantic distances into the OS measure. Hence, we have evaluated them in a real case scenario: the results (73 runs) provided by 19 research teams during their participation in the ImageCLEF 2009 Photo Annotation Task. Two experiments were conducted with a view to understanding what aspect of the annotation behaviour is more effectively captured by each measure. First, we establish a comparison of system rankings brought about by different evaluation measures. This is done by computing the Kendall τ and Kolmogorov-Smirnov correlation between the rankings of pairs of them. Second, we investigate how stably the different measures react to artificially introduced noise in the ground truth. We conclude that the distributional measures based on image information sources show a promising behaviour in terms of ranking and stability.}}


 * -- align="left" valign=top
 * Nunes, Sérgio; Ribeiro, Cristina & David, Gabriel
 * Term frequency dynamics in collaborative articles
 * Proceedings of the 10th ACM symposium on Document engineering
 * 2010
 * 


 * -- align="left" valign=top
 * O'Bannon, Blanche
 * Using Wikis for Collaboration in Creating A Collection: Perceptions of Pre-service Teachers
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * O'Bannon, Blanche; Baytiyeh, Hoda & Beard, Jeff
 * Using Wikis to Create Collections of Curriculum-based Resources: Perceptions of Pre-service Teachers
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * O'Shea, Patrick
 * Using Voice to Provide Feedback in Online Education
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * O'Shea, Patrick; Curry-Corcoran, Daniel; Baker, Peter; Allen, Dwight & Allen, Douglas
 * A Student-written WikiText for a Foundations Course in Education
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * O'Shea, Patrick; Kidd, Jennifer; Baker, Peter; Kauffman, Jaime & Allen, Dwight
 * Studying the Credibility of a Student-Authored Textbook: Is it worth the effort for the results?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Oh, Sangchul; Kim, Sung-Wan; Choi, Yonghun & Yang, Youjung
 * A Study on the Learning Participation and Communication Process by Learning Task Types in Wiki-Based Collaborative Learning System
 * Global Learn Asia Pacific
 * 2010
 * 


 * -- align="left" valign=top
 * Ohara, Maggie & Armstrong, Robin
 * Managing a Center for Online Testing: Challenges and Successes
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Okamoto, A.; Yokoyama, S.; Fukuta, N. & Ishikawa, H.
 * Proposal of Spatiotemporal Data Extraction and Visualization System Based on Wikipedia for Application to Earth Science
 * 2010 IEEE/ACIS 9th International Conference on Computer and Information Science (ICIS 2010), 18-20 Aug. 2010 Los Alamitos, CA, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Okike, Benjamin
 * Distance/Flexible Learning Education in a Developing Country
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Oliver, Ron & Luca, Joe
 * Using Mobile Technologies and Podcasts to Enhance Learning Experiences in Lecture-Based University Course Delivery
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Olson, J.F.; Howison, J. & Carley, K.M.
 * Paying attention to each other in visible work communities: Modeling bursty systems of multiple activity streams
 * 2010 IEEE Second International Conference on Social Computing (SocialCom 2010). The Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010), 20-22 Aug. 2010 Los Alamitos, CA, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Olsson, Lena & Sandorf, Monica
 * Increase the Professional Use of Digital Learning Resources among Teachers and Students
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Ong, Chorng-Shyong & Day, Min-Yuh
 * An Integrated Evaluation Model Of User Satisfaction With Social Media Services
 * 2010 IEEE International Conference on Information Reuse & Integration (IRI 2010), 4-6 Aug. 2010 Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Or-Bach, Rachel; Vale, Katie Livingston; del Alamo, Jesus & Lerman, Steven
 * Towards a Collaboration Space for Higher Education Teachers – The Case of MIT iLab Project
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2006
 * 


 * -- align="left" valign=top
 * Ouyang, John Ronghua
 * Make Statistic Analysis Simple: Solution with a Simple Click on the Screen
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Overell, S.; Magalhaes, J. & Ruger, S.
 * Forostar: a system for GIR
 * Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany
 * 2007


 * -- align="left" valign=top
 * Ozkan, Betul
 * Current and future trends in Free and Open Source Software
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Ozkan, Betul & McKenzie, Barbara
 * Open social software applications and their impact on distance education
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * de Pablo-Sanchez, C.; Gonzalez-Ledesma, A.; Moreno-Sandoval, A. & Vicente-Diez, M.T.
 * MIRACLE Experiments in QA@CLEF 2006 in Spanish: Main Task, Real-Time QA and Exploratory QA Using Wikipedia (WiQA)
 * Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany
 * 2007
 * {{hidden||We describe the participation of the MIRACLE group in the QA track at CLEF. We participated in three subtasks and presented two systems that work in Spanish. The first system is a traditional QA system and was evaluated in the main task and the Real-Time QA pilot. The system features improved Named Entity recognition and shallow linguistic analysis and achieves moderate performance. In contrast, results obtained in RT-QA show that this approach is promising for providing answers in constrained time. The second system focuses on the WiQA pilot task, which aims at retrieving important snippets to complete a Wikipedia. The system uses collection link structure, cosine similarity and Named Entities to retrieve new and important snippets. Although the experiments have not been exhaustive, it seems that the performance depends on the type of concept.}}


 * -- align="left" valign=top
 * Padula, Marco; Reggiori, Amanda & Capetti, Giovanna
 * Managing Collective Knowledge in the Web 3.0
 * Proceedings of the 2009 First International Conference on Evolving Internet
 * 2009
 * 
 * {{hidden||Knowledge Management (KM) is one of the hottest Internet challenges influencing the design and the architecture of the infrastructures that will be accessed by the future generation. In this paper, we bridge KM to philosophical theories to seek a theoretical foundation for the discussion, today utterly exciting, about the Web's semantics. Man has always tried to organise the knowledge he gained, using lists, encyclopaedias, libraries, etc., in order to make the consultation and the finding of information easier. Nowadays it is possible to get information from the Web, digital archives and databases, but the actual problem is linked to its interpretation, which is now possible only by human beings. The act of interpreting is peculiar to men, not machines. At the moment there are lots of available digital tools which are presented as KM technologies, but languages often do not discern meanings. We shall investigate the meaning of "knowledge" in the digital world, sustaining it with references to the Philosophy of Information and epistemology. After having provided a definition of "knowledge" suitable for the digital environment, it has been extended to "collective knowledge", a very common concept in the area of global information proper to the current process of knowledge production and management. The definition is verified by testing whether a well-known growing phenomenon like Wikipedia can be truly regarded as a knowledge management system.}}


 * -- align="left" valign=top
 * Pan, Shu-Chien & Franklin, Teresa
 * Teacher’s Self-efficacy and the Integration of Web 2.0 Tool/Applications in K-12 Schools
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Panciera, Katherine; Halfaker, Aaron & Terveen, Loren
 * Wikipedians are born, not made: a study of power editors on Wikipedia
 * Proceedings of the ACM 2009 international conference on Supporting group work
 * 2009
 * 
 * {{hidden||Open content web sites depend on users to produce information of value. Wikipedia is the largest and most well-known such site. Previous work has shown that a small fraction of editors -- Wikipedians -- do most of the work and produce most of the value. Other work has offered conjectures about how Wikipedians differ from other editors and how Wikipedians change over time. We quantify and test these conjectures. Our key findings include: Wikipedians' edits last longer; Wikipedians invoke community norms more often to justify their edits; on many dimensions of activity, Wikipedians start intensely, tail off a little, then maintain a relatively high level of activity over the course of their career. Finally, we show that the amount of work done by Wikipedians and Non-Wikipedians differs significantly from their very first day. Our results suggest a design opportunity: customizing the initial user experience to improve retention and channel new users' intense energy.}}


 * -- align="left" valign=top
 * Panke, Stefanie
 * Ingredients of Educational Portals as Infrastructures for Informal Learning Activities
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Pantola, Alexis Velarde; Pancho-Festin, Susan & Salvador, Florante
 * Rating the raters: a reputation system for wiki-like domains
 * Proceedings of the 3rd international conference on Security of information and networks
 * 2010
 * 


 * -- align="left" valign=top
 * Papadakis, Ioannis; Stefanidakis, Michalis; Stamou, Sofia & Andreou, Ioannis
 * A Query Construction Service for Large-Scale Web Search Engines
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
 * 2009
 * 
 * {{hidden||Despite their wide usage, large-scale search engines are not always effective in tracing the best possible information for the user needs. There are times when web searchers spend too much time searching over a large-scale search engine. When (if) they eventually succeed in getting back the anticipated results, they often realize that their successful queries are significantly different from their initial one. In this paper, we introduce a query construction service for assisting web information seekers in specifying precise and unambiguous queries over large-scale search engines. The proposed service leverages the collective knowledge encapsulated mainly in the Wikipedia corpus and provides an intuitive GUI via which web users can determine the semantic orientation of their searches before these are executed by the desired engine.}}


 * -- align="left" valign=top
 * Park, Hyungsung; Baek, Youngkyun & Hwang, Jihyun
 * The effect of learner and game variables on social problem-solving in simulation game
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Parton, Becky Sue; Hancock, Robert; Ennis, Willie; Fulwiler, John & Dawson, John
 * Technology Integration Potential of Physical World Hyperlinks for Teacher Preparation Programs
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Pearce, Jon
 * A System to Encourage Playful Exploration in a Reflective Environment
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Penzhorn, Cecilia & Pienaar, Heila
 * The Academic Library as Partner in Support of Scholarship
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Pesenhofer, Andreas; Edler, Sonja; Berger, Helmut & Dittenbach, Michael
 * Towards a patent taxonomy integration and interaction framework
 * Proceeding of the 1st ACM workshop on Patent information retrieval
 * 2008
 * 


 * -- align="left" valign=top
 * Peterson, Rob; Verenikina, Irina & Herrington, Jan
 * Standards for Educational, Edutainment, and Developmentally Beneficial Computer Games
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Pferdt, Frederik G.
 * Designing Learning Environments with Social Software for the Ne(x)t Generation – New Perspectives and Implications for Effective Research Design
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Plowman, Travis
 * Wikis As a Social Justice Environment
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Pohl, Margit & Wieser, Dietmar
 * Enthusiasm or Skepticism? What Students Think about E-Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Strube, Michael
 * Extracting world and linguistic knowledge from Wikipedia
 * Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts
 * 2009
 * 


 * -- align="left" valign=top
 * Pope, Jack; Thurber, Bart & Meshkaty, Shahra
 * The Classroom as Learning Space: Two Disciplines, Two Views.
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Popescu, Adrian & Grefenstette, Gregory
 * Spatiotemporal mapping of Wikipedia concepts
 * Proceedings of the 10th annual joint conference on Digital libraries
 * 2010
 * 


 * -- align="left" valign=top
 * Popescu, Adrian; Grefenstette, Gregory & Bouamor, Houda
 * Mining a Multilingual Geographical Gazetteer from the Web
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
 * 2009
 * 
 * {{hidden||Geographical gazetteers are necessary in a wide variety of applications. In the past, the construction of such gazetteers has been a tedious, manual process, and only recently have the first attempts to automate gazetteer creation been made. Here we describe our approach for mining accurate but large-scale multilingual geographic information by successively filtering information found in heterogeneous data sources (Flickr, Wikipedia, Panoramio, Web pages indexed by search engines). Statistically cross-checking information found in each site, we are able to identify new geographic objects, and to indicate, for each one, its name, its GPS coordinates, its encompassing regions (city, region, country), the language of the name, its popularity, and the type of the object (church, bridge, etc.). We evaluate our approach by comparing, wherever possible, our multilingual gazetteer to other known attempts at automatically building a geographic database and to Geonames, a manually built gazetteer.}}


 * -- align="left" valign=top
 * Powell, Allison
 * K12 Online Learning: A Global Perspective
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Preiss, Judita; Dehdari, Jon; King, Josh & Mehay, Dennis
 * Refining the most frequent sense baseline
 * Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
 * 2009
 * 


 * -- align="left" valign=top
 * Premchaiswadi, Wichian; Pangma, Sarayoot & Premchaiswadi, Nucharee
 * Knowledge Sharing for an On-Line Test Bank Construction
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Priest, W. Curtiss
 * What is the Common Ground between TCPK (Technological Pedagogical Content Knowledge) and Learning Objects?
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Priest, W. Curtiss
 * A Paradigm Shifting Architecture for Education Technology Systems
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Priest, W. Curtiss & Komoski, P. Kenneth
 * Designing Empathic Learning Games to Improve Emotional Competencies (Intelligence) Using Learning Objects [a work in progress]
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Purwitasari, D.; Okazaki, Y. & Watanabe, K.
 * A study on Web resources' navigation for e-learning: usage of Fourier domain scoring on Web pages ranking method
 * 2007 Second International Conference on Innovative Computing, Information and Control, 5-7 Sept. 2007, Los Alamitos, CA, USA
 * 2007
 * {{hidden||Using existing Web resources for e-learning is a very promising idea especially in reducing the cost of authoring. Envisioned as open-source, completely free, and frequently updated Wikipedia could become a good candidate. Even though Wikipedia has been structured by categories, still sometimes they are not dynamically updated when there are modifications. As Web resources for e-learning, it is a necessity to provide a navigation path in Wikipedia which semantically maps the learning material and is not merely based on the structures. The desired learning material could be provided as a request from search results. We introduce in this paper the usage of Fourier domain scoring (FDS) for a ranking method in searching a certain collection of Wikipedia Web pages. Unlike other methods that would only recognize the occurrence numbers of query terms, FDS could also recognize the spread of query terms throughout the content of Web pages. Based on the experiments, we concluded that the not relevant results retrieved are mainly influenced by the characteristic of Wikipedia. Given that the changes of Wikipedia Web pages could be done in any part by anyone, we concluded that it is possible if only some parts of retrieved Web pages strongly relate to query terms.}}


 * -- align="left" valign=top
 * Qian, Yufeng
 * Meaningful Learning with Wikis: Making a Connection
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Qiu, Yongqiang & Elsayed, Adel
 * Semantic Structures as Cognitive Tools to Support Reading
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Qu, Zehui; Wang, Yong; Wang, Juan; Zhang, Fengli & Qin, Zhiguang
 * A classification algorithm of signed networks based on link analysis
 * 2010 International Conference on Communications, Circuits and Systems (ICCCAS), 28-30 July 2010, Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Quack, Till; Leibe, Bastian & Gool, Luc Van
 * World-scale mining of objects and events from community photo collections
 * Proceedings of the 2008 international conference on Content-based image and video retrieval
 * 2008
 * 


 * -- align="left" valign=top
 * Quinton, Stephen
 * Unlocking the Knowledge Generation and eLearning Potential of Contemporary Universities
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Raaijmakers, S.; Versloot, C. & de Wit, J.
 * A Cocktail Approach to the VideoCLEF'09 Linking Task
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||In this paper, we describe the TNO approach to the Finding Related Resources or linking task of VideoCLEF'09. Our system consists of a weighted combination of off-the-shelf and proprietary modules, including the Wikipedia Miner toolkit of the University of Waikato. Using this cocktail of largely off-the-shelf technology allows for setting a baseline for future approaches to this task.}}


 * -- align="left" valign=top
 * Ramakrishnan, Raghu
 * Community systems: the world online
 * Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
 * 2007
 * 


 * -- align="left" valign=top
 * Ramanathan, K. & Kapoor, K.
 * Creating User Profiles Using Wikipedia
 * Conceptual Modeling - ER 2009. 28th International Conference on Conceptual Modeling, 9-12 Nov. 2009 Berlin, Germany
 * 2009
 * 
 * {{hidden||Creating user profiles is an important step in personalization. Many methods for user profile creation have been developed to date using different representations such as term vectors and concepts from an ontology like DMOZ. In this paper, we propose and evaluate different methods for creating user profiles using Wikipedia as the representation. The key idea in our approach is to map documents to Wikipedia concepts at different levels of resolution: words, key phrases, sentences, paragraphs, the document summary and the entire document itself. We suggest a method for evaluating profile recall by pooling the relevant results from the different methods and evaluate our results for both precision and recall. We also suggest a novel method for profile evaluation by assessing the recall over a known ontological profile drawn from DMOZ.}}


 * -- align="left" valign=top
 * Rapetti, Emanuele; Ciannamea, Samanta; Cantoni, Lorenzo & Tardini, Stefano
 * The Voice of Learners to Understand ICTs Usages in Learning Experiences: a Quanti-qualitative Research Project in Ticino (Switzerland)
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Ratkiewicz, J.; Flammini, A. & Menczer, F.
 * Traffic in Social Media I: Paths Through Information Networks
 * 2010 IEEE Second International Conference on Social Computing (SocialCom 2010). the Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010), 20-22 Aug. 2010, Los Alamitos, CA, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Reiners, Torsten; Holt, Karyn & Reiß, Dirk
 * Google Wave: Unnecessary Hype or Promising Tool for Teachers
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Reiter, Nils; Hartung, Matthias & Frank, Anette
 * A resource-poor approach for linking ontology classes to Wikipedia articles
 * Proceedings of the 2008 Conference on Semantics in Text Processing
 * 2008
 * 


 * -- align="left" valign=top
 * Rejas-Muslera, R.J.; Cuadrado, J.J.; Abran, A. & Sicilia, M.A.
 * Information economy philosophy in universal education. The Open Educational Resources (OER): technical, socioeconomics and legal aspects
 * 2008 IEEE International Professional Communication Conference (IPCC 2008), 13-16 July 2008, Piscataway, NJ, USA
 * 2008
 * {{hidden||According to Dr. B.R. Ambedkar's definition by Deshpande, P.M. (1995), Open Educational Resources (OER) are based on the philosophical view of knowledge as a collective, social product. In consequence, it is also desirable to make it a social property. Terry Foote, one of the Wikipedia project's chairpersons, emphasizes this: "Imagine a world in which every single person is given free access to the sum of all human knowledge." The importance of open educational resources (OERs) has been widely documented and demonstrated, and a high-magnitude impact is to be expected for OERs in the near future. This paper presents an overview of OERs and their current usage. Then, the paper goes into detail on some related aspects: Which is the impact, in socio-economic terms, of OER, especially for the less developed? Which legal aspects influence the diffusion and use of OER? And which are the technical resources needed for them?}}


 * -- align="left" valign=top
 * Repman, Judi; Zinskie, Cordelia & Clark, Kenneth
 * Online Learning, Web 2.0 and Higher Education: A Formula for Reform?
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Repman, Judi; Zinskie, Cordelia & Downs, Elizabeth
 * On the Horizon: Will Web 2.0 Change the Face of Online Learning?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Rezaei, Ali Reza
 * Using social networks for language learning
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Richards, Griff; Lin, Arthur; Eap, Ty Mey & Sehboub, Zohra
 * Where Do They Go? Internet Search Strategies in Grade Five Laptop Classrooms
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Roberts, Cody; Yu, Chien; Brandenburg, Teri & Du, Jianxia
 * The Impact of Webcasting in Major Corporations
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Rodda, Paul
 * Social Constructivism as guiding philosophy for Software Development
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2004
 * 


 * -- align="left" valign=top
 * Rodriguez, Mark; Huang, Marcy & Merrill, Marcy
 * Analysis of Web Hosting Services in Collaborative Online Learning
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Ronda, Natalia Sinitskaya; Owston, Ron & Sanaoui, Razika
 * "Voulez-Vous Jouer?" [Do you want to play?]: Game Development Environments for Literacy Skill Enhancement
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Rosselet, Alan
 * Active Course Notes within a Group Learning Environment
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Roxin, Ioan; Szilagyi, Ioan & Balog-Crisan, Radu
 * Kernel Design for Semantic Learning Platform
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Royer, Regina
 * Educational Blogging: Going Beyond Reporting, Journaling, and Commenting to Make Connections and Support Critical Thinking
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Royer, Regina
 * Using Web 2.0 Tools in an Online Course to Enhance Student Satisfaction
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Rozovskaya, Alla & Sproat, Richard
 * Multilingual word sense discrimination: a comparative cross-linguistic study
 * Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
 * 2007
 * 
 * {{hidden||We describe a study that evaluates an approach to Word Sense Discrimination on three languages with different linguistic structures, English, Hebrew, and Russian. The goal of the study is to determine whether there are significant performance differences for the languages and to identify language-specific problems. The algorithm is tested on semantically ambiguous words using data from Wikipedia, an online encyclopedia. We evaluate the induced clusters against sense clusters created manually. The results suggest a correlation between the algorithm's performance and morphological complexity of the language. In particular, we obtain F-scores of 0.68, 0.66 and 0.61 for English, Hebrew, and Russian, respectively. Moreover, we perform an experiment on Russian, in which the context terms are lemmatized. The lemma-based approach significantly improves the results over the word-based approach, by increasing the F-score by 16%. This result demonstrates the importance of morphological analysis for the task for morphologically rich languages like Russian.}}


 * -- align="left" valign=top
 * Rubens, Neil; Vilenius, Mikko & Okamoto, Toshio
 * Data-driven Group Formation for Informal Collaborative Learning
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Rudak, Leszek
 * Vector Graphics as a Mathematical Tool
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Rueckert, Dan; Kim, Daesang & Yang, Mihwa
 * Using a Wiki as a Communication Tool for Promoting Limited English Proficiency (LEP) Students’ Learning Practices
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Ruiz-Casado, Maria; Alfonseca, Enrique; Okumura, Manabu & Castells, Pablo
 * Information Extraction and Semantic Annotation of Wikipedia
 * Proceeding of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge
 * 2008
 * 


 * -- align="left" valign=top
 * Rus, Vasile; Lintean, Mihai; Graesser, Art & McNamara, Danielle
 * Assessing Student Paraphrases Using Lexical Semantics and Word Weighting
 * Proceeding of the 2009 conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling
 * 2009
 * 
 * {{hidden||We present in this paper an approach to assessing student paraphrases in the intelligent tutoring system iSTART. The approach is based on measuring the semantic similarity between a student paraphrase and a reference text, called the textbase. The semantic similarity is estimated using knowledge-based word relatedness measures. The relatedness measures rely on knowledge encoded in WordNet, a lexical database of English. We also experiment with weighting words based on their importance. The word importance information was derived from an analysis of word distributions in 2,225,726 documents from Wikipedia. Performance is reported for 12 different models which resulted from combining 3 different relatedness measures, 2 word sense disambiguation methods, and 2 word-weighting schemes. Furthermore, comparisons are made to other approaches such as Latent Semantic Analysis and the Entailer.}}


 * -- align="left" valign=top
 * Ruth, Alison & Ruutz, Aaron
 * Four Vignettes of Learning: Wiki Wiki Web or What Went Wrong
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Ryder, Barbara & Hailpern, Brent
 * Proceedings of the third ACM SIGPLAN conference on History of programming languages
 * 2007
 * 
 * 


 * -- align="left" valign=top
 * Safran, Christian; Ebner, Martin; Garcia-Barrios, Victor Manuel & Kappe, Frank
 * Higher Education m-Learning and e-Learning Scenarios for a Geospatial Wiki
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Sagara, T. & Hagiwara, M.
 * Natural Language Processing Neural Network for Recall and Inference
 * Artificial Neural Networks - ICANN 2010. 20th International Conference, 15-18 Sept. 2010 Berlin, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Sagers, Glen; Kasliwal, Shobit; Vila, Joaquin & Lim, Billy
 * Geo-Terra: Location-based Learning Using Geo-Tagged Multimedia
 * Global Learn Asia Pacific
 * 2010
 * 


 * -- align="left" valign=top
 * Sajjapanroj, Suthiporn; Bonk, Curtis; Lee, Mimi & Lin, Grace
 * The Challenges and Successes of Wikibookian Experts and Want-To-Bees
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Salajan, Florin & Mount, Greg
 * Instruction in the Web 2.0 Environment: A Wiki Solution for Multimedia Teaching and Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Saleh, Iman; Darwish, Kareem & Fahmy, Aly
 * Classifying Wikipedia articles into NE's using SVM's with threshold adjustment
 * Proceedings of the 2010 Named Entities Workshop
 * 2010
 * 
 * {{hidden||In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a language independent way. Classification is done using a Support Vector Machine (SVM) classifier at first, and then the threshold of the SVM is adjusted in order to improve the recall scores of classification. Threshold adjustment is performed using the beta-gamma threshold adjustment algorithm, which is a post-learning step that shifts the hyperplane of the SVM. This approach boosted recall with minimal effect on precision.}}


 * -- align="left" valign=top
 * SanJuan, E. & Ibekwe-SanJuan, F.
 * Multi Word Term Queries for Focused Information Retrieval
 * Computational Linguistics and Intelligent Text Processing 11th International Conference, CICling 2010, 21-27 March 2010 Berlin, Germany
 * 2010
 * {{hidden||In this paper, we address both standard and focused retrieval tasks based on comprehensible language models and interactive query expansion (IQE). Query topics are expanded using an initial set of Multi Word Terms (MWTs) selected from top n ranked documents. MWTs are special text units that represent domain concepts and objects. As such, they can better represent query topics than ordinary phrases or ngrams. We tested different query representations: bag-of-words, phrases, flat list of MWTs, subsets of MWTs. We also combined the initial set of MWTs obtained in an IQE process with automatic query expansion (AQE) using language models and a smoothing mechanism. We chose as baseline the Indri IR engine based on the language model using Dirichlet smoothing. The experiment is carried out on two benchmarks: TREC Enterprise track (TRECent) 2007 and 2008 collections; INEX 2008 Adhoc track using the Wikipedia collection.}}


 * -- align="left" valign=top
 * Santos, D. & Cabral, L.M.
 * GikiCLEF: expectations and lessons learned
 * Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||This overview paper is devoted to a critical assessment of GikiCLEF 2009, an evaluation contest specifically designed to expose and investigate cultural and linguistic issues in Wikipedia search, with eight participant systems and 17 runs. After providing a maximally short but self-contained overview of the GikiCLEF task and participation, we present the open source SIGA system, and discuss, for each of the main guiding ideas, the resulting successes or shortcomings, concluding with further work and still unanswered questions.}}


 * -- align="left" valign=top
 * Sanz-Santamaría, Silvia; Vare, Juan A. Pereira; Serrano, Julián Gutiérrez; Fernández, Tomás A. Pérez & Zorita, José A. Vadillo
 * Practicing L2 Speaking in a Collaborative Video-Conversation Environment
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Sarjant, Samuel; Legg, Catherine; Robinson, Michael & Medelyan, Olena
 * "All You Can Eat" Ontology-Building: Feeding Wikipedia to Cyc
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
 * 2009
 * 
 * {{hidden||In order to achieve genuine web intelligence, building some kind of large general machine-readable conceptual scheme (i.e. ontology) seems inescapable. Yet the past 20 years have shown that manual ontology-building is not practicable. The recent explosion of free user-supplied knowledge on the Web has led to great strides in automatic ontology-building, but quality-control is still a major issue. Ideally one should automatically build onto an already intelligent base. We suggest that the long-running Cyc project is able to assist here. We describe methods used to add 35K new concepts mined from Wikipedia to collections in ResearchCyc entirely automatically. Evaluation with 22 human subjects shows high precision both for the new concepts’ categorization, and their assignment as individuals or collections. Most importantly we show how Cyc itself can be leveraged for ontological quality control by ‘feeding’ it assertions one by one, enabling it to reject those that contradict its other knowledge.}}


 * -- align="left" valign=top
 * Schalick, J.A.
 * Technology and changes in the concept of the university: comments on the reinvention of the role of the university east and west
 * Proceedings of PICMET 2006 - Technology Management for the Global Future, 8-13 July 2006, Piscataway, NJ, USA
 * 2007


 * -- align="left" valign=top
 * Schneider, Daniel
 * Edutech Wiki - an all-in-one solution to support whole scholarship?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Scholl, P.; Bohnstedt, D.; Garcia, R.D.; Rensing, C. & Steinmetz, R.
 * Extended explicit semantic analysis for calculating semantic relatedness of web resources
 * Sustaining TEL: From Innovation to Learning and Practice. 5th European Conference on Technology Enhanced Learning, EC-TEL 2010, 28 Sept.-1 Oct. 2010 Berlin, Germany
 * 2010
 * 
 * {{hidden||Finding semantically similar documents is a common task in Recommender Systems. Explicit Semantic Analysis (ESA) is an approach to calculate semantic relatedness between terms or documents based on similarities to documents of a reference corpus. Here, usually Wikipedia is applied as the reference corpus. We propose enhancements to ESA (called Extended Explicit Semantic Analysis) that make use of further semantic properties of Wikipedia, like article link structure and categorization, thus utilizing the additional semantic information that is included in Wikipedia. We show how we apply this approach to recommendation of web resource fragments in a resource-based learning scenario for self-directed, on-task learning with web resources.}}


 * -- align="left" valign=top
 * Schrader, Pg & Lawless, Kimberly
 * Gamer Discretion Advised: How MMOG Players Determine the Quality and Usefulness of Online Resources
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Semeraro, Giovanni; Lops, Pasquale; Basile, Pierpaolo & de Gemmis, Marco
 * Knowledge infusion into content-based recommender systems
 * Proceedings of the third ACM conference on Recommender systems
 * 2009
 * 


 * -- align="left" valign=top
 * Sendurur, Emine; Sendurur, Polat & Gedik, Nuray Temur
 * Communicational, Social, and Educational Aspects of Virtual Communities: Potential Educational Opportunities for In-Service Teacher Training
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Senette, C.; Buzzi, M.C.; Buzzi, M. & Leporini, B.
 * Enhancing Wikipedia Editing with WAI-ARIA
 * HCI and Usability for e-Inclusion. 5th Symposium of the Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society, USAB 2009, 9-10 Nov. 2009 Berlin, Germany
 * 2009
 * {{hidden||Nowadays Web 2.0 applications allow anyone to create, share and edit online content, but accessibility and usability issues still exist. For instance, Wikipedia presents many difficulties for blind users, especially when they want to write or edit articles. In a previous stage of our study we proposed and discussed how to apply the W3C ARIA suite to simplify the Wikipedia editing page when interacting via screen reader. In this paper we present the results of a user test involving totally blind end-users as they interacted with both the original and the modified Wikipedia editing pages. Specifically, the purpose of the test was to compare the editing and formatting process for the original and ARIA-implemented Wikipedia user interfaces, and to evaluate the improvements.}}


 * -- align="left" valign=top
 * Seppala, Mika; Caprotti, Olga & Xambo, Sebastian
 * Using Web Technologies to Teach Mathematics
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Settles, Patty
 * Wikis, Blogs and their use in the Water/Wastewater World
 * Proceedings of the Water Environment Federation
 * 2006
 * doi:10.2175/193864706783779249


 * -- align="left" valign=top
 * Shakshuki, Elhadi; Lei, Helen & Tomek, Ivan
 * Intelligent Agents in Collaborative E-Learning Environments
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2004
 * 


 * -- align="left" valign=top
 * Shakya, Aman; Takeda, Hideaki & Wuwongse, Vilas
 * Consolidating User-Defined Concepts with StYLiD
 * Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web
 * 2008
 * 
 * {{hidden||Information sharing can be effective with structured data. However, there are several challenges for having structured data on the web. Creating structured concept definitions is difficult and multiple conceptualizations may exist due to different user requirements and preferences. We propose consolidating multiple concept definitions into a unified virtual concept and formalize our approach. We have implemented a system called StYLiD to realize this. StYLiD is social software for sharing a wide variety of structured data. Users can freely define their own structured concepts. The system consolidates multiple definitions for the same concept by different users. Attributes of the multiple concept versions are aligned semi-automatically to provide a unified view. It provides a flexible interface for easy concept definition and data contribution. Popular concepts gradually emerge from the cloud of concepts while concepts evolve incrementally. StYLiD supports linked data by interlinking data instances, including external resources like Wikipedia.}}


 * -- align="left" valign=top
 * Shin, Wonsug & Lowes, Susan
 * Analyzing Web 2.0 Users in an Online Discussion Forum
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Siemens, George & Tittenberge, Peter
 * Virtual Learning Commons: Designing a Social University
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Sigurbjornsson, B.; Kamps, J. & de Rijke, M.
 * Focused access to Wikipedia
 * DIR'06. Dutch-Belgian Information Retrieval Workshop. Proceedings, 13-14 March 2006 Enschede, Netherlands
 * 2006
 * {{hidden||Wikipedia is a "free" online encyclopedia. It contains millions of entries in many languages and is growing at a fast pace. Due to its volume, search engines play an important role in giving access to the information in Wikipedia. The "free" availability of the collection makes it an attractive corpus for information retrieval experiments. In this paper we describe the evaluation of a search engine that provides focused search access to Wikipedia, i.e. a search engine which gives direct access to individual sections of Wikipedia pages. The main contributions of this paper are twofold. First, we introduce Wikipedia as a test corpus for information retrieval experiments in general and for semi-structured retrieval in particular. Second, we demonstrate that focused XML retrieval methods can be applied to a wider range of problems than searching scientific journals in XML format, including accessing reference works.}}


 * -- align="left" valign=top
 * Singh, V. K.; Jalan, R.; Chaturvedi, S. K. & Gupta, A. K.
 * Collective Intelligence Based Computational Approach to Web Intelligence
 * Proceedings of the 2009 International Conference on Web Information Systems and Mining
 * 2009
 * 


 * -- align="left" valign=top
 * Sinha, Hansa; Rosson, Mary Beth & Carroll, John
 * The Role of Technology in the Development of Teachers’ Professional Learning Communities
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Slusky, Ludwig & Partow-Navid, Parviz
 * Development for Computer Forensics Course Using EnCase
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Slykhuis, David & Stern, Barbara
 * Whither our Wiki?
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Smith, Jason R.; Quirk, Chris & Toutanova, Kristina
 * Extracting parallel sentences from comparable corpora using document level alignment
 * HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several approaches developed for obtaining parallel sentences from non-parallel, or comparable, data, such as news articles published within the same time period (Munteanu and Marcu, 2005), or web pages with a similar structure (Resnik and Smith, 2003). One resource not yet thoroughly explored is Wikipedia, an online encyclopedia containing linked articles in many languages. We advance the state of the art in parallel sentence extraction by modeling the document-level alignment, motivated by the observation that parallel sentence pairs are often found in close proximity. We also include features which make use of the additional annotation given by Wikipedia, and features using an automatically induced lexicon model. Results for both accuracy in sentence extraction and downstream improvement in an SMT system are presented.}}


 * -- align="left" valign=top
 * Son, Moa
 * The Effects of Debriefing on Improvement of Academic Achievements and Game Skills
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Soriano, Javier; López, Javier; Jiménez, Miguel & Alonso, Fernando
 * Enabling semantics-aware collaborative tagging and social search in an open interoperable tagosphere
 * Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
 * 2008
 * 


 * -- align="left" valign=top
 * Sosin, Adrienne Andi; Pepper-Sanello, Miriam; Eichenholtz, Susan; Buttaro, Lucia & Edwards, Richard
 * Digital Storytelling Curriculum for Social Justice Learners & Leaders
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Speelman, Pamela & Gore, David
 * IT Proposal - Simulation Project as a Higher Order Thinking Technique for Instruction
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Speelman, Pamela; Gore, David & Hyde, Scott
 * Simulation: Gaming and Beyond
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Stefl-Mabry, Joette & William, E. J. Doane
 * Teaching & Learning 2.0: An urgent call to do away with the isolationist practice of educating and retool education as community in the United States.
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Strohmaier, Mahla; Nance, Kara & Hay, Brian
 * Phishing: What Teachers Need to Know for the Home and Classroom
 * Society for Information Technology & Teacher Education International Conference
 * 2006
 * 


 * -- align="left" valign=top
 * Suanpang, Pannee & Kalceff, Walter
 * Suan Dusit Internet Broadcasting (SDIB):-Educational Innovation in Thailand
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Sun, Yanling; Masterson, Carolyn & Kahn, Patricia
 * Implementing ePortfolio among Pre-service Teachers: An Approach to Construct Meaning of NCATE Standards to Students
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Sung, Woonhee
 * Analysis of underlying features allowing educational uses for collaborative learning in Social Networking Sites, Cyworld
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Sutcliffe, R.F.E.; Steinberger, J.; Kruschwitz, U.; Alexandrov-Kabadjov, M. & Poesio, M.
 * Identifying novel information using latent semantic analysis in the WiQA task at CLEF 2006
 * Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany
 * 2007
 * {{hidden||In our two-stage system for the English monolingual WiQA task, snippets were first retrieved if they contained an exact match with the title. Candidates were then passed to the latent semantic analysis component, which judged them novel if their match with the article text was less than a threshold. In Run 1, the ten best snippets were returned and in Run 2 the twenty best. Run 1 was superior, with average yield per topic 2.46 and precision 0.37. Compared to other groups, our performance was in the middle of the range except for precision, where our system was the best. We attribute this to our use of exact title matches in the IR stage. In future work we will vary the approach used depending on the topic type, exploit co-references in conjunction with exact matches and make use of the elaborate hyperlink structure which is a unique and most interesting aspect of Wikipedia.}}


 * -- align="left" valign=top
 * Swenson, Penelope
 * Handheld computer use and the ‘killer application’
 * Society for Information Technology & Teacher Education International Conference
 * 2005
 * 


 * -- align="left" valign=top
 * Switzer, Anne & Lepkowski, Frank
 * Information Literacy and the Returning Masters Student: Observations from the Library Side
 * Society for Information Technology & Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Sánchez, Alejandro Campos; Ureña, José David Flores; Sánchez, Raúl Campos; Gutiérrez, José Alberto Castellanos & Sánchez, Alejandro Campos
 * Knowledge Construction Through ICT's: Social Networks
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Tahayna, B.; Ayyasamy, R.K.; Alhashmi, S. & Eu-Gene, S.
 * A novel weighting scheme for efficient document indexing and classification
 * 2010 International Symposium on Information Technology (ITSim 2010), 15-17 June 2010, Piscataway, NJ, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Takayuki, Furukawa; Hoshi, Kouki; Aida, Aya; Mitsuhashi, Sachiko; Kamoshida, Hiromi & In, Katsuya
 * Do the traditional classroom-based motivational methods work in e-learning community?
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Takeuchi, H.
 * An automatic Web site menu structure evaluation
 * 2009 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 20-24 Aug. 2009, Piscataway, NJ, USA
 * 2009
 * 


 * -- align="left" valign=top
 * Takeuchi, Toshihiko
 * Development of a VBA Macro that Draws Figures in Profile with PowerPoint
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Takeuchi, Toshihiko; Kato, Shogo & Kato, Yuuki
 * Suggestion of a quiz-form learning-style using a paid membership bulletin board system
 * Society for Information Technology & Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Takeuchi, Toshihiko; Kato, Shogo; Kato, Yuuki & Wakui, Tomohiro
 * Manga-Based Beginner-level Textbooks; Proposal of a Website for their Creation
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Tam, Shuk Ying; Wat, Sin Tung & Kennedy, David M
 * An Evaluation of Two Open Source Digital Library Software Systems
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Tamashiro, Roy
 * Transforming Curriculum & Pedagogy for Global Thinking with Social Networking Tools
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Tamashiro, Roy; Rodney, Basiyr D. & Beckmann, Mary
 * Do Student-Authored Wiki Textbook Projects Support 21st Century Learning Outcomes?
 * Society for Information Technology & Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Tamim, Rana; Shaikh, Kamran & Bethel, Edward
 * EDyoutube: Why not?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Tanaka, K.
 * Web Information Credibility
 * Web-Age Information Management. 11th International Conference, WAIM 2010, 15-17 July 2010 Berlin, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Tanaka, Katsumi; Zhou, Xiaofang; Zhang, Min & Jatowt, Adam
 * Proceedings of the 4th workshop on Information credibility
 * 2010
 * 
 * {{hidden||It is our great pleasure to welcome you all to the 4th Workshop on Information Credibility on the Web (WICOW'10), organized in conjunction with the 19th World Wide Web Conference in Raleigh, NC, USA on April 27, 2010. The aim of the workshop is to provide a forum for discussion on various issues related to information credibility criteria on the web. Evaluating and improving information credibility requires a combination of different technologies and backgrounds. Through the series of WICOW workshops we hope to exchange novel ideas and findings as well as promote discussions on various aspects of web information credibility. This year we received 22 full paper submissions from 12 countries: Austria, Brazil, China, Egypt, France, Germany, Ireland, Japan, The Netherlands, Saudi Arabia, UK and USA. After a careful review process, with at least three reviews for each paper, the Program Committee has selected 10 full papers (45% acceptance rate) covering a variety of topics related to information credibility. The accepted papers were grouped into 3 sessions: "Wikipedia Credibility", "Studies of Web Information Credibility" and "Evaluating Information Credibility". We are also pleased to invite Miriam Metzger from University of California Santa Barbara for giving a keynote talk entitled "Understanding Credibility across Disciplinary Boundaries."}}


 * -- align="left" valign=top
 * Tappert, Charles
 * The Interplay of Student Projects and Student-Faculty Research
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Tappert, Charles
 * Pedagogical Issues in Managing Information Technology Projects Conducted by Geographically Distributed Student Teams
 * Society for Information Technology & Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Tappert, Charles & Stix, Allen
 * Assessment of Student Work on Geographically Distributed Information Technology Project Teams
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Tarkowski, Diane; Donovan, Marie; Salwach, Joe; Avgerinou, Maria; Rotenberg, Robert & Lin, Wen-Der
 * Supporting Faculty and Students with Podcast Workshops
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2007
 * 


 * -- align="left" valign=top
 * Theng, Yin-Leng & Jiang, Tao
 * Determinant Factors of Information Use or Misuse in Wikipedia
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Thomas, Christopher & Sheth, Amit P.
 * Semantic Convergence of Wikipedia Articles
 * Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
 * 2007
 * 


 * -- align="left" valign=top
 * Thompson, Nicole
 * ICT and globalization in Education
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Toledo, Cheri
 * Setting the Stage to Use Blogging as a Reflective Tool in Teacher Education
 * Society for Information Technology \& Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Tomuro, Noriko & Shepitsen, Andriy
 * Construction of disambiguated Folksonomy ontologies using Wikipedia
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 


 * -- align="left" valign=top
 * Traina, Michael; Doctor, Denise; Bean, Erik & Wooldridge, Vernon
 * Student Code of Conduct in the Online Classroom: A Consideration of Zero Tolerance Policies
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2005
 * 


 * -- align="left" valign=top
 * Tran, T. & Nayak, R.
 * Evaluating the performance of XML document clustering by structure only
 * Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany
 * 2007
 * {{hidden||This paper reports the results and experiments performed on the INEX 2006 document mining challenge corpus with the PCXSS clustering method. The PCXSS method is a progressive clustering method that computes the similarity between a new XML document and existing clusters by considering the structures within documents. We conducted the clustering task on the INEX and Wikipedia data sets.}}


 * -- align="left" valign=top
 * Tripp, Lisa
 * Teaching Digital Media Production in Online Instruction: Strategies and Recommendations
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Tsikrika, T. & Kludas, J.
 * Overview of the WikipediaMM Task at ImageCLEF 2009
 * Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany
 * 2010
 * 
 * {{hidden||ImageCLEF's WikipediaMM task provides a testbed for the system-oriented evaluation of multimedia information retrieval from a collection of Wikipedia images. The aim is to investigate retrieval approaches in the context of a large and heterogeneous collection of images (similar to those encountered on the Web) that are searched for by users with diverse information needs. This paper presents an overview of the resources, topics, and assessments of the WikipediaMM task at ImageCLEF 2009, summarises the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results.}}


 * -- align="left" valign=top
 * Turek, P.; Wierzbicki, A.; Nielek, R.; Hupa, A. & Datta, A.
 * Learning about the quality of teamwork from Wikiteams
 * 2010 IEEE Second International Conference on Social Computing (SocialCom 2010). The Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010), 20-22 Aug. 2010 Los Alamitos, CA, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Turgut, Yildiz
 * EFL Learners’ Experience of Online Writing by PBWiki
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Tynan, Belinda; Lee, Mark J.W. & Barnes, Cameron
 * Polar bears, black gold, and light bulbs: Creating stable futures for tertiary education through instructor training and support in the use of ICTs
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Téllez, Alberto; Juárez, Antonio; Hernández, Gustavo; Denicia, Claudia; Villatoro, Esaú; Montes, Manuel & Villaseñor, Luis
 * A Lexical Approach for Spanish Question Answering
 * Advances in Multilingual and Multimodal Information Retrieval
 * 2008
 * 
 * {{hidden||This paper discusses our system's results at the Spanish Question Answering task of CLEF 2007. Our system is centered on a full data-driven approach that combines information retrieval and machine learning techniques. It mainly relies on the use of lexical information and avoids any complex language processing procedure. Evaluation results indicate that this approach is very effective for answering definition questions from Wikipedia. In contrast, they also reveal that it is very difficult to respond to factoid questions from this resource based solely on the use of lexical overlaps and redundancy.}}


 * -- align="left" valign=top
 * Udupa, Raghavendra & Khapra, Mitesh
 * Improving the multilingual user experience of Wikipedia using cross-language name search
 * HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||Although Wikipedia has emerged as a powerful collaborative encyclopedia on the Web, it is only partially multilingual, as most of the content is in English and a small number of other languages. In real-life scenarios, non-English users in general, and ESL/EFL users in particular, have a need to search for relevant English Wikipedia articles when no relevant articles are available in their language. The multilingual experience of such users can be significantly improved if they could express their information need in their native language while searching for English Wikipedia articles. In this paper, we propose a novel cross-language name search algorithm and employ it for searching English Wikipedia articles in a diverse set of languages including Hebrew, Hindi, Russian, Kannada, Bangla and Tamil. Our empirical study shows that the multilingual experience of users is significantly improved by our approach.}}


 * -- align="left" valign=top
 * Unal, Zafer & Unal, Aslihan
 * Measuring the Preservice Teachers’ Satisfaction with the use of Moodle Learning Management System during Online Educational Technology Course
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Valencia, Delailah
 * E-Learning Implementation Model for Blended Learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Vallance, Michael & Wiz, Charles
 * The Realities of Working in Virtual Worlds
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Varadharajan, Vijay
 * Evolution and challenges in trust and security in information system infrastructures
 * Proceedings of the 2nd international conference on Security of information and networks
 * 2009
 * 


 * -- align="left" valign=top
 * Vaughan, Norm
 * Supporting Deep Approaches to Learning through the Use of Wikis and Weblogs
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Vegnaduzzo, Stefano
 * Morphological productivity rankings of complex adjectives
 * Proceedings of the Workshop on Computational Approaches to Linguistic Creativity
 * 2009
 * 


 * -- align="left" valign=top
 * Veletsianos, George & Kimmons, Royce
 * Networked Participatory Scholarship: Socio-cultural \& Techno-cultural Pressures on Scholarly Practice
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2010
 * 


 * -- align="left" valign=top
 * Verhaart, Michael & Kinshuk
 * The virtualMe: An integrated teaching and learning framework
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Viana, Windson; Hammiche, Samira; Moisuc, Bogdan; Villanova-Oliver, Marlène; Gensel, Jérôme & Martin, Hervé
 * Semantic keyword-based retrieval of photos taken with mobile devices
 * Proceedings of the 6th International Conference on Advances in Mobile Computing and Multimedia
 * 2008
 * 
 * {{hidden||This paper presents an approach for incorporating contextual metadata in a keyword-based photo retrieval process. We use our mobile annotation system PhotoMap in order to create metadata describing the photo shoot context (e.g., street address, nearby objects, season, lighting, nearby people...). These metadata are then used to generate a set of stamped words for indexing each photo. We adapt the Vector Space Model (VSM) in order to transform these shoot context words into document-vector terms. Furthermore, spatial reasoning is used for inferring new potential indexing terms. We define methods for weighting those terms and for handling query matching. We also detail retrieval experiments carried out by using PhotoMap and Flickr geotagged photos. We illustrate the advantages of using Wikipedia georeferenced objects for indexing photos.}}


 * -- align="left" valign=top
 * Viégas, Fernanda B.; Wattenberg, Martin & Dave, Kushal
 * Studying cooperation and conflict between authors with history flow visualizations
 * Proceedings of the SIGCHI conference on Human factors in computing systems
 * 2004
 * 


 * -- align="left" valign=top
 * Vonrueden, Michael; Hampel, Thorsten & Geissler, Sabrina
 * Collaborative Ontologies in Knowledge Management
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2005
 * 


 * -- align="left" valign=top
 * Vroom, R.W.; Kooijman, A. & Jelierse, R.
 * Efficient community management in an industrial design engineering wiki: distributed leadership
 * 11th International Conference on Enterprise Information Systems. DISI, 6-10 May 2009 Setubal, Portugal
 * 2009
 * {{hidden||Industrial design engineers use a wide variety of research fields when making decisions that will eventually have significant impact on their designs. Obviously, designers cannot master every field, so they are therefore often looking for a simple set of rules of thumb on a particular subject. For this reason a wiki has been set up: www.wikid.eu. Whilst Wikipedia already offers a lot of this information, there is a distinct difference between WikID and Wikipedia; Wikipedia aims to be an encyclopaedia, and therefore tries to be as complete as possible. WikID aims to be a design tool. It offers information in a compact manner tailored to its user group, being the Industrial Designers. The main subjects of this paper are the research on how to create an efficient structure for the community of WikID and the creation of a tool for managing the community. With the new functionality for managing group memberships and viewing information on users, it will be easier to maintain the community. This will also help in creating a better community which will be more inviting to participate in, provided that the assumptions made in this area hold true.}}


 * -- align="left" valign=top
 * Vuong, Ba-Quy; Lim, Ee-Peng; Sun, Aixin; Chang, Chew-Hung; Chatterjea, K.; Goh, Dion Hoe-Lian; Theng, Yin-Leng & Zhang, Jun
 * Key element-context model: an approach to efficient Web metadata maintenance
 * Research and Advanced Technology for Digital Libraries. 11th European Conference, ECDL 2007, 16-21 Sept. 2007 Berlin, Germany
 * 2007
 * {{hidden||In this paper, we study the problem of maintaining metadata for open Web content. In digital libraries such as DLESE, NSDL and G-Portal, metadata records are created for some good quality Web content objects so as to make them more accessible. These Web objects are dynamic, making it necessary to update their metadata records. As Web metadata maintenance involves manual effort, we propose to reduce the effort by introducing the Key Element-Context (KeC) model to monitor only those changes made on Web page content regions that concern metadata attributes while ignoring other changes. We also develop evaluation metrics to measure the number of alerts and the amount of effort in updating Web metadata records. The KeC model has been experimented on metadata records defined for Wikipedia articles, and its performance with different settings is reported. The model is implemented in G-Portal as a metadata maintenance module.}}


 * -- align="left" valign=top
 * Vuong, Ba-Quy; Lim, Ee-Peng; Sun, Aixin; Le, Minh-Tam & Lauw, Hady Wirawan
 * On ranking controversies in wikipedia: models and evaluation
 * Proceedings of the international conference on Web search and web data mining
 * 2008
 * 


 * -- align="left" valign=top
 * Völker, Johanna; Hitzler, Pascal & Cimiano, Philipp
 * Acquisition of OWL DL Axioms from Lexical Resources
 * Proceedings of the 4th European conference on The Semantic Web: Research and Applications
 * 2007
 * 
 * {{hidden||State-of-the-art research on automated learning of ontologies from text currently focuses on inexpressive ontologies. The acquisition of complex axioms involving logical connectives, role restrictions, and other expressive features of the Web Ontology Language OWL remains largely unexplored. In this paper, we present a method and implementation for enriching inexpressive OWL ontologies with expressive axioms which is based on a deep syntactic analysis of natural language definitions. We argue that it can serve as a core for a semi-automatic ontology engineering process supported by a methodology that integrates methods for both ontology learning and evaluation. The feasibility of our approach is demonstrated by generating complex class descriptions from Wikipedia definitions and from a fishery glossary provided by the Food and Agriculture Organization of the United Nations.}}


 * -- align="left" valign=top
 * Wake, Donna & Sain, Nathan
 * Exploring Learning Theory the Wiki Way
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Wald, Mike; Seale, Jane & Draffan, E A
 * Disabled Learners’ Experiences of E-learning
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Walker, J.
 * Collective intelligence: the wisdom of crowds
 * Online Information 2007, 4-6 Dec. 2007 London, UK
 * 2007
 * {{hidden||Web 2.0 technologies can focus the wisdom of crowds that is latent in social networks. Technologies like Wikipedia and blogs demonstrate how the actions of individuals, when aggregated, can lead to enormous value. Of all these new technologies, blogs and wikis are the most successful. Wikis have become as useful as email to many organisations. This phenomenon is about three things: 1. The social dimension: software that aggregates people around an activity. 2. Collective intelligence: software that facilitates building knowledge. 3. Lightweight software: that is very different from traditionally more complex and more expensive software. These technologies are no longer 'bleeding-edge' or risky ventures: SAP hosts a public wiki with 750,000 registered users building knowledge on SAP products. Pixar uses a wiki for all project management of their animated film production. The Los Angeles Fire Department uses Twitter to broadcast emergent activity. IBM's policy on blogging articulates how blogs are critical to their innovation and corporate citizen values. What accommodations need to be made so that these tools produce value? Don't approach these tools as a way to automate business processes in the traditional sense; they are all about the social interaction of knowledge workers. Avoid the myth of accuracy: the fear that Wikipedia, wikis, and blogs are riddled with bad information. Don't be trapped by the illusion of control: letting go allows the social network to produce the value of collective intelligence. Be prepared for more democratisation of information within the bounds of truly confidential information. Be willing to experiment with less complex software that requires less IT support.}}


 * -- align="left" valign=top
 * Wallace, A.
 * Open and Transparent Consensus: a Snapshot of Teachers' use of Wikipedia
 * 8th European Conference on e-Learning, 29-30 Oct. 2009 Reading, UK
 * 2009
 * {{hidden||The title of this paper (Open and Transparent Consensus) is derived from Wikipedia's own description of itself, and reflects its philosophy and approach to collaborative knowledge production and use. Wikipedia is a popular, multi-lingual, web-based, free-content encyclopaedia and is the most well-known of wikis, collaborative websites that can be directly edited by anyone with access to them. Many teachers and students have experience with Wikipedia, and in this survey teachers were asked how Wiki-based practices might contribute to teaching and learning. This study was conducted in England with 133 teachers from a wide range of schools, who have used Wikipedia in some way. The survey was anonymous to protect individuals' and schools' privacy; there was no way of identifying individual responses. The survey was conducted online and respondents were encouraged to be as open and honest as possible. Participation in this survey was entirely voluntary. Many of the questions were based upon descriptions by Wikipedia about itself and these were intended to elicit responses from teachers that reflect how closely their usage relates to the original intention and philosophy of the encyclopaedia. Other questions were intended to probe different ways in which teachers use the website.}}


 * -- align="left" valign=top
 * Wang, Hong
 * Wiki as a Collaborative Tool to Support Faculty in Mobile Teaching and Learning
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * Wang, Huan; Chia, Liang-Tien & Gao, Shenghua
 * Wikipedia-assisted concept thesaurus for better web media understanding
 * Proceedings of the international conference on Multimedia information retrieval
 * 2010
 * 


 * -- align="left" valign=top
 * Wang, Yang; Wang, Haofen; Zhu, Haiping & Yu, Yong
 * Exploit semantic information for category annotation recommendation in Wikipedia
 * Natural Language Processing and Information Systems. 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, 27-29 June 2007 Berlin, Germany
 * 2007


 * -- align="left" valign=top
 * Wang, Shiang-Kwei
 * Effects of Playing a History-Simulation Game: Romance of Three Kingdoms
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Wang, Sy-Chyi & Chern, Jin-Yuan
 * The new era of “School 2.0”—Teaching with Pleasure, not Pressure: An Innovative Teaching Experience in a Software-oriented Course
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Wartena, Christian & Brussee, Rogier
 * Instance-Based Mapping between Thesauri and Folksonomies
 * Proceedings of the 7th International Conference on The Semantic Web
 * 2008
 * 


 * -- align="left" valign=top
 * Watson, Rachel & Boggs, Christine
 * The Virtual Classroom: Student Perceptions of Podcast Lectures in a General Microbiology Classroom
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Watson, Rachel & Boggs, Christine
 * Vodcast Venture: How Formative Evaluation of Vodcasting in a Traditional On-Campus Microbiology Class Led to the Development of a Fully Vodcasted Online Biochemistry Course
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Weaver, Debbi & McIntosh, P. Craig
 * Providing Feedback on Collaboration and Teamwork Amongst Off-Campus Students
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2009
 * 


 * -- align="left" valign=top
 * Weaver, Gabriel; Strickland, Barbara & Crane, Gregory
 * Quantifying the accuracy of relational statements in Wikipedia: a methodology
 * Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
 * 2006
 * 


 * -- align="left" valign=top
 * Weikum, Gerhard
 * Harvesting and organizing knowledge from the web
 * Proceedings of the 11th East European conference on Advances in databases and information systems
 * 2007
 * 
 * {{hidden||Information organization and search on the Web is gaining structure and context awareness and more semantic flavor, for example, in the forms of faceted search, vertical search, entity search, and Deep-Web search. I envision another big leap forward by automatically harvesting and organizing knowledge from the Web, represented in terms of explicit entities and relations as well as ontological concepts. This will be made possible by the confluence of three strong trends: 1) rich Semantic-Web-style knowledge repositories like ontologies and taxonomies, 2) large-scale information extraction from high-quality text sources such as Wikipedia, and 3) social tagging in the spirit of Web 2.0. I refer to the three directions as Semantic Web, Statistical Web, and Social Web (at the risk of some oversimplification), and I briefly characterize each of them.}}


 * -- align="left" valign=top
 * Weiland, Steven
 * Online Abilities for Teacher Education: The Second Subject in Distance Learning
 * Society for Information Technology \& Teacher Education International Conference
 * 2008
 * 


 * -- align="left" valign=top
 * West, Richard; Wright, Geoff & Graham, Charles
 * Blogs, Wikis, and Aggregators: A New Vocabulary for Promoting Reflection and Collaboration in a Preservice Technology Integration Course
 * Society for Information Technology \& Teacher Education International Conference
 * 2005
 * 


 * -- align="left" valign=top
 * Whittier, David & Supavai, Eisara
 * Supporting Knowledge Building Communities with an Online Application
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Wichelhaus, Svenja; Schüler, Thomas; Ramm, Michaela & Morisse, Karsten
 * More than Podcasting - An evaluation of an integrated blended learning scenario
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Wijaya, Senoaji; Spruit, Marco R. & Scheper, Wim J.
 * Webstrategy Formulation: Benefiting from Web 2.0 Concepts to Deliver Business Values
 * Proceedings of the 1st world summit on The Knowledge Society: Emerging Technologies and Information Systems for the Knowledge Society
 * 2008
 * 


 * -- align="left" valign=top
 * Wilks, Yorick
 * Artificial companions as dialogue agents
 * Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue
 * 2009
 * 
 * {{hidden||COMPANIONS is an EU project that aims to change the way we think about the relationships of people to computers and the Internet by developing a virtual conversational 'Companion'. This is intended as an agent or 'presence' that stays with the user for long periods of time, developing a relationship and 'knowing' its owner's preferences and wishes. The Companion communicates with the user primarily through speech. This paper describes the functionality and system modules of the Senior Companion, one of two initial prototypes built in the first two years of the project. The Senior Companion provides a multimodal interface for eliciting and retrieving personal information from the elderly user through a conversation about their photographs. The Companion will, through conversation, elicit their life memories, often prompted by discussion of their photographs; the aim is that the Companion should come to know a great deal about its user, their tastes, likes, dislikes, emotional reactions etc., through long periods of conversation. It is a further assumption that most life information will be stored on the internet (as in the Memories for Life project: http://www.memoriesforlife.org/) and the SC is linked directly to photo inventories in Facebook, to gain initial information about people and relationships, as well as to Wikipedia to enable it to respond about places mentioned in conversations about images. The overall aim of the SC, not yet achieved, is to produce a coherent life narrative for its user from these materials, although its short-term goals are to assist, amuse, entertain and gain the trust of the user. The Senior Companion uses Information Extraction to get content from the speech input, rather than conventional parsing, and retains utterance content, extracted internet information and ontologies all in RDF formalism, over which it does primitive reasoning about people. It has a dialogue manager virtual machine intended to capture mixed initiative between Companion and user, which can be a basis for later replacement by learned components.}}


 * -- align="left" valign=top
 * Williams, Alexandria; Seals, Cheryl; Rouse, Kenneth & Gilbert, Juan E.
 * Visual Programming with Squeak SimBuilder: Techniques for E-Learning in the Creation of Science Frameworks
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Williams, Vicki
 * Assessing the Web 2.0 Technologies: Mission Impossible?
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2009
 * 


 * -- align="left" valign=top
 * Williams, Vicki
 * Educational Gaming as an Instructional Strategy
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2008
 * 


 * -- align="left" valign=top
 * Williams, Vicki & Williams, Barry
 * Way of the Wiki: The Zen of Social Computing
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2006
 * 


 * -- align="left" valign=top
 * Winkler, Thomas; Ide, Martina & Herczeg, Michael
 * Connecting Second Life and Real Life: Integrating Mixed-Reality-Technology into Teacher Education
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Witteman, Holly; Chandrashekar, Sambhavi; Betel, Lisa & O’Grady, Laura
 * Sense-making and credibility of health information on the social web: A multi-method study assessing tagging and tag clouds
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Witten, Ian
 * Wikipedia and How to Use It for Semantic Document Representation
 * Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
 * 2010


 * -- align="left" valign=top
 * Witten, I.H.
 * Semantic Document Processing Using Wikipedia as a Knowledge Base
 * Focused Retrieval and Evaluation. 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, 7-9 Dec. 2009 Berlin, Germany
 * 2010
 * {{hidden||Summary form only given. Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This talk will introduce the process of "wikification"; that is, automatically and judiciously augmenting a plain-text document with pertinent hyperlinks to Wikipedia articles, as though the document were itself a Wikipedia article. This amounts to a new semantic representation of text in terms of the salient concepts it mentions, where "concept" is equated to "Wikipedia article." Wikification is a useful process in itself, adding value to plain text documents. More importantly, it supports new methods of document processing. I first describe how Wikipedia can be used to determine semantic relatedness, and then introduce a new high-performance method of wikification that exploits Wikipedia's 60 M internal hyperlinks for relational information and their anchor texts as lexical information, using simple machine learning. I go on to discuss applications to knowledge-based information retrieval, topic indexing, document tagging and document clustering. Some of these perform at human levels. For example, on CiteULike data, automatically extracted tags are competitive with tag sets assigned by the best human taggers, according to a measure of consistency with other human taggers. Although this work is based on English, it involves no syntactic parsing and the techniques are largely language independent. The talk will include live demos.}}


 * -- align="left" valign=top
 * Wojcik, Isaac
 * The Industrialization of Education: Creating an Open Virtual Mega-University for the Developing World (OVMUDW).
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Wojtanowski, Scott
 * Using Wikis to Build Collaborative Knowing
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Wong, C.; Vrijmoed, L. & Wong, E.
 * Learning environment for digital natives: Web 2.0 meets globalization
 * Hybrid Learning and Education. First International Conference, ICHL 2008, 13-15 Aug. 2008 Berlin, Germany
 * 2008
 * 
 * {{hidden||Web 2.0 services and communities constitute the daily lives of digital natives with online utilities such as Wikipedia and Facebook. Attempts to apply Web 2.0 at the University of Illinois at Urbana-Champaign demonstrated that the transformation to writing exercises could improve students' learning experiences. Inspired by their success, blogging technology was adopted to pilot a writing-across-the-curriculum project via the learning management system at City University of Hong Kong. Instead of promoting peer assessment, one-on-one tutoring interactions were induced by providing feedback on written assignments. Taking advantage of the "flat world", tutors were hired from the United States, Canada, Australia, New Zealand and Spain to experiment with outsourcing and offshoring some of the English enhancement schemes. For the university-wide project deployment in the fall of 2008, a globalized network of online language tutors needs to be built up with support from universities in countries with English as the native language.}}


 * -- align="left" valign=top
 * Wong, Wai-Yat & Wong, Loong
 * Using Wikiweb for Community Information Sharing and e-Governance
 * Society for Information Technology \& Teacher Education International Conference
 * 2005
 * 


 * -- align="left" valign=top
 * Woodman, William & Krier, Dan
 * An Unblinking Eye: Steps for Replacing Traditional With Visual Scholarship
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Wu, Kewen; Zhu, Qinghua; Zhao, Yuxiang & Zheng, Hua
 * Mining the Factors Affecting the Quality of Wikipedia Articles
 * 2010 International Conference of Information Science and Management Engineering. ISME 2010, 7-8 Aug. 2010 Los Alamitos, CA, USA
 * 2010
 * 


 * -- align="left" valign=top
 * Wu, Youzheng & Kashioka, Hideki
 * An Unsupervised Model of Exploiting the Web to Answer Definitional Questions
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
 * 2009
 * 
 * {{hidden||In order to build accurate target profiles, most definition question answering (QA) systems primarily involve utilizing various external resources, such as WordNet, Wikipedia, Biography.com, etc. However, these external resources are not always available or helpful when answering definition questions. In contrast, this paper proposes an unsupervised classification model, called the U-Model, which can liberate definitional QA systems from heavily depending on a variety of external resources by applying sentence expansion (SE) and an SVM classifier. Experimental results from testing on English TREC test sets reveal that the proposed U-Model can not only significantly outperform the baseline system but also require no specific external resources.}}


 * -- align="left" valign=top
 * Wubben, Sander & van den Bosch, Antal
 * A semantic relatedness metric based on free link structure
 * Proceedings of the Eighth International Conference on Computational Semantics
 * 2009
 * 
 * {{hidden||While shortest paths in WordNet are known to correlate well with semantic similarity, an is-a hierarchy is less suited for estimating semantic relatedness. We demonstrate this by comparing two free scale networks (ConceptNet and Wikipedia) to WordNet. Using the Finkelstein-353 dataset, we show that a shortest path metric run on Wikipedia attains a better correlation than WordNet-based metrics. ConceptNet attains a good correlation as well, but suffers from low concept coverage.}}


 * -- align="left" valign=top
 * Yamane, Y.; Ishida, H.; Hattori, F. & Yasuda, K.
 * Conversation support system for people with language disorders - Making topic lists from Wikipedia
 * 2010 9th IEEE International Conference on Cognitive Informatics (ICCI), 7-9 July 2010 Piscataway, NJ, USA
 * 2010
 * 
 * {{hidden||A conversation support system for people with language disorders is proposed. Although the existing conversation support system "Raku-raku Jiyu Kaiwa" (Easy Free Conversation) is effective, it has insufficient topic words and a rigid topic list structure. To solve these problems, this paper proposes a method that makes topic lists from Wikipedia's millions of topic words. Experiments using the proposed topic list showed that subject utterances increased and the variety of spoken topics was expanded.}}


 * -- align="left" valign=top
 * Yan, Y.; Li, Haibo; Matsuo, Y. & Ishizuka, M.
 * Multi-view Bootstrapping for Relation Extraction by Exploring Web Features and Linguistic Features
 * Computational Linguistics and Intelligent Text Processing. 11th International Conference, CICLing 2010, 21-27 March 2010 Berlin, Germany
 * 2010
 * {{hidden||Binary semantic relation extraction from Wikipedia is particularly useful for various NLP and Web applications. Currently, frequent pattern mining-based methods and syntactic analysis-based methods are the two leading types of methods for the semantic relation extraction task. With a novel view on integrating syntactic analysis on Wikipedia text with redundancy information from the Web, we propose a multi-view learning approach for bootstrapping relationships between entities, exploiting the complementarity between the Web view and the linguistic view. On the one hand, from the linguistic view, linguistic features are generated from linguistic parsing on Wikipedia texts by abstracting away from different surface realizations of semantic relations. On the other hand, Web features are extracted from the Web corpus to provide frequency information for relation extraction. Experimental evaluation on a relational dataset demonstrates that linguistic analysis on Wikipedia texts and Web collective information reveal different aspects of the nature of entity-related semantic relationships. It also shows that our multi-view learning method considerably boosts performance compared to learning with only one view of features, with the weaknesses of one view complemented by the strengths of the other.}}


 * -- align="left" valign=top
 * Yang, Junghoon; Han, Jangwhan; Oh, Inseok & Kwak, Mingyung
 * Using Wikipedia technology for topic maps design
 * Proceedings of the 45th annual southeast regional conference
 * 2007
 * 
 * {{hidden||In this paper we present a method for automatically generating a collection of topics from Wikipedia/Wikibooks based on user input. The constructed collection is intended to be displayed through an intuitive interface as assistance to the user creating Topic Maps for a given subject. We discuss the motivation behind the developed tool and outline the technique used for crawling and collecting relevant concepts from Wikipedia/Wikibooks and for building the topic structure to be output to the user.}}


 * -- align="left" valign=top
 * Yang, Qingxiong; Chen, Xin & Wang, Gang
 * Web 2.0 dictionary
 * Proceedings of the 2008 international conference on Content-based image and video retrieval
 * 2008
 * 


 * -- align="left" valign=top
 * Yang, Yin; Bansal, Nilesh; Dakka, Wisam; Ipeirotis, Panagiotis; Koudas, Nick & Papadias, Dimitris
 * Query by document
 * Proceedings of the Second ACM International Conference on Web Search and Data Mining
 * 2009
 * 
 * {{hidden||We are experiencing an unprecedented increase of content contributed by users in forums such as blogs, social networking sites and microblogging services. Such abundance of content complements content on web sites and traditional media forums such as newspapers, news and financial streams, and so on. Given such a plethora of information, there is a pressing need to cross-reference information across textual services. For example, commonly we read a news item and we wonder if there are any blogs reporting related content, or vice versa. In this paper, we present techniques to automate the process of cross-referencing online information content. We introduce methodologies to extract phrases from a given "query document" to be used as queries to search interfaces, with the goal to retrieve content related to the query document. In particular, we consider two techniques to extract and score key phrases. We also consider techniques to complement extracted phrases with information present in external sources such as Wikipedia, and introduce an algorithm called RelevanceRank for this purpose. We discuss both these techniques in detail and provide an experimental study utilizing a large number of human judges from Amazon's Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for the task of automating retrieval of documents related to a query document.}}


 * -- align="left" valign=top
 * Yao, Jian-Min; Sun, Chang-Long; Hong, Yu; Ge, Yun-Dong & Zhu, Qiao-Min
 * Study on Wikipedia for translation mining for CLIR
 * 2010 International Conference on Machine Learning and Cybernetics (ICMLC 2010), 11-14 July 2010, Piscataway, NJ, USA
 * 2010
 * 
 * {{hidden||The query translation of Out of Vocabulary (OOV) terms is one of the key factors that affect the performance of Cross-Language Information Retrieval (CLIR). Based on Wikipedia's data structure and language features, the paper divides the translation environment into target-existence and target-deficit environments. To overcome the difficulty of translation mining in the target-deficit environment, frequency change information and adjacency information are used to extract candidate units, and a mixed translation mining strategy is established based on the frequency-distance model, surface pattern matching model and summary-score model. Search engine based OOV translation mining is taken as the baseline to test performance on TOP1 results. It is verified that the mixed translation mining method based on Wikipedia can achieve a precision rate of 0.6279, an improvement of 6.98% over the baseline.}}


 * -- align="left" valign=top
 * Yatskar, Mark; Pang, Bo; Danescu-Niculescu-Mizil, Cristian & Lee, Lillian
 * For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia
 * HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
 * 2010
 * 


 * -- align="left" valign=top
 * Yeh, Eric; Ramage, Daniel; Manning, Christopher D.; Agirre, Eneko & Soroa, Aitor
 * WikiWalk: random walks on Wikipedia for semantic relatedness
 * Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
 * 2009
 * 
 * {{hidden||Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.}}


 * -- align="left" valign=top
 * Yesilada, Yeliz & Sloan, David
 * Proceedings of the 2008 international cross-disciplinary conference on Web accessibility (W4A)
 * 2008
 * 
 * {{hidden||The World Wide Web (Web) is returning to its origins. Surfers are not just passive readers but content creators. Wikis allow open editing and access, blogs enable personal expression. MySpace, Bebo and Facebook encourage social networking by enabling designs to be 'created' and 'wrapped' around content. Flickr and YouTube are examples of sites that allow sharing of photos, audio and video which, through informal taxonomies, can be discovered and shared in the most efficient ways possible. Template-based tools enable fast, professional-looking Web content creation using automated placement, with templates for blogging, picture sharing, and social networking. The Web is becoming ever more democratised as a publishing medium, regardless of technical ability. But with this change come new challenges for accessibility. New tools, new types of content creator --- where does accessibility fit in this process? The call for participation in W4A 2008 asked you to consider whether the conjugation of authoring tools and user agents represents an opportunity for automatically generated Web Accessibility or yet another problem for Web Accessibility. Will form-based and highly graphical interfaces exclude disabled people from creation, expression and social networking? And what about educating users --- and customers --- in accessible design? How, for example, do we collectively demand that the producers of the next MySpace or Second Life adhere to the W3C Authoring Tool Accessibility Guidelines (ATAG)? What effect will this have on the wider Web? We posed the question: What happens when surfers become authors and designers? We have collected together an excitingly diverse range of papers for W4A 2008, each contributing in its own way to helping provide an answer to this question.
Papers range across topics as diverse as evaluating the accessibility of Wikipedia, one of the most popular user-generated resources on the Web, and considering the accessibility challenges of geo-referenced information often found in user-generated content. We hear about the challenges of raising awareness of accessibility, through experiences of accessibility education in Brazil, the particular challenges of encouraging accessible design to embrace the needs of older Web users, and the challenges of providing appropriate guidance to policymakers and technology developers alike that gives them freedom to provide innovative and holistic accessible Web solutions while building on the technical framework provided by W3C WAI. We also see a continuing focus on Web 2.0; several papers focus directly on making Web 2.0 technologies as accessible as possible, or on adapting assistive technology to cope more effectively with the increasingly interactive behaviour of Web 2.0 Web sites.}}


 * -- align="left" valign=top
 * Yildiz, Ismail; Kursun, Engin; Saltan, Fatih; Gok, Ali & Karaaslan, Hasan
 * Using Wiki in a Collaborative Group Project: Experiences from a Distance Education Course
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Yildiz, Melda & Hao, Yungwei
 * Power of Social Interaction Technologies in Youth Activism and Civic Engagement
 * Society for Information Technology \& Teacher Education International Conference
 * 2009
 * 


 * -- align="left" valign=top
 * Yildiz, Melda; Mongillo, Gerri & Roux, Yvonne
 * Literacy from A to Z: Power of New Media and Technologies in Teacher Education
 * Society for Information Technology \& Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Yildiz, Melda N. & Geldymuradova, Gul
 * Global Positioning System and Social Interaction Software Across Content Areas
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Yildiz, Melda N.; Geldymuradova, Gul & Komekova, Guncha
 * Different Continents Similar Challenges: Integrating Social Media in Teacher Education
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2010
 * 


 * -- align="left" valign=top
 * Yuen, Steve Chi-Yin; Liu, Leping & Maddux, Cleborne
 * Publishing Papers in the International Journal of Technology in Teaching and Learning: Guidelines and Tips
 * Society for Information Technology \& Teacher Education International Conference
 * 2007
 * 


 * -- align="left" valign=top
 * Yun, Jiali; Jing, Liping; Yu, Jian & Huang, Houkuan
 * Semantics-based Representation Model for Multi-layer Text Classification
 * Knowledge-Based and Intelligent Information and Engineering Systems. 14th International Conference, KES 2010, 8-10 Sept. 2010 Berlin, Germany
 * 2010


 * -- align="left" valign=top
 * Zaidi, Faraz; Sallaberry, Arnaud & Melancon, Guy
 * Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing
 * Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
 * 2009
 * 
 * {{hidden||The emergence of scale-free and small-world properties in real-world complex networks has stimulated a lot of activity in the field of network analysis. An example of such a network comes from the field of Content Analysis (CA) and Text Mining, where the goal is to analyze the contents of a set of web pages. The network can be represented by the words appearing in the web pages as nodes, with edges representing a relation between two words if they appear in a document together. In this paper we present a CA system that helps users visually analyze these networks representing the textual contents of a set of web pages. Major contributions include a methodology to cluster complex networks based on duplication of nodes and identification of bridges, i.e., words that might be of user interest but have a low frequency in the document corpus. We have tested this system with a number of data sets and users have found it very useful for the exploration of data. One of the case studies is presented in detail, based on browsing a collection of web pages on Wikipedia.}}


 * -- align="left" valign=top
 * Zarro, M.A. & Allen, R.B.
 * User-contributed descriptive metadata for libraries and cultural institutions
 * Research and Advanced Technology for Digital Libraries. 14th European Conference, ECDL 2010, 6-10 Sept. 2010 Berlin, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Zavitsanos, E.; Tsatsaronis, G.; Varlamis, I. & Paliouras, G.
 * Scalable Semantic Annotation of Text Using Lexical and Web Resources
 * Artificial Intelligence: Theories, Models and Applications. 6th Hellenic Conference on AI (SETN 2010), 4-7 May 2010 Berlin, Germany
 * 2010
 * {{hidden||In this paper we are dealing with the task of adding domain-specific semantic tags to a document, based solely on the domain ontology and generic lexical and Web resources. In this manner, we avoid the need for trained domain-specific lexical resources, which hinder the scalability of semantic annotation. More specifically, the proposed method maps the content of the document to concepts of the ontology, using the WordNet lexicon and Wikipedia. The method comprises a novel combination of measures of semantic relatedness and word sense disambiguation techniques to identify the most related ontology concepts for the document. We test the method on two case studies: (a) a set of summaries accompanying environmental news videos, (b) a set of medical abstracts. The results in both cases show that the proposed method achieves reasonable performance, thus pointing to a promising path for scalable semantic annotation of documents.}}


 * -- align="left" valign=top
 * Zhang, Liming & Li, Dong
 * Web-Based Home School Collaboration System Design and Development
 * World Conference on Educational Multimedia, Hypermedia and Telecommunications
 * 2008
 * 


 * -- align="left" valign=top
 * Zhang, Lei; Liu, QiaoLing; Zhang, Jie; Wang, HaoFen; Pan, Yue & Yu, Yong
 * Semplore: an IR approach to scalable hybrid query of semantic web data
 * Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
 * 2007
 * 
 * {{hidden||As an extension to the current Web, the Semantic Web will not only contain structured data with machine-understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.}}


 * -- align="left" valign=top
 * Zhang, Weiwei & Zhu, Xiaodong
 * Activity Theoretical Framework for Wiki-based Collaborative Content Creation
 * 2010 International Conference on Management and Service Science (MASS 2010), 24-26 Aug. 2010, Piscataway, NJ, USA
 * 2010
 * 
 * {{hidden||Most recently, the use of the collaboration element within information behavior research, namely Collaborative Information Behavior (CIB), has been increasing. In addition, the success of wiki-based, large-scale, open collaborative content creation systems such as Wikipedia has aroused increasing interest in studies of their collaborative model. In contrast to previous related work, this paper focuses on an integrated theoretical framework of collaborative content creation activities in the context of wiki-based systems. An activity-theoretical approach is used to construct an activity system of wiki-based collaborative content creation and analyze its components, mediators, subsystems and dynamic processes. It is argued that collaborative content creation is the most important component of wiki-based CIB. Four stages involved in the dynamic process of collaborative content creation activity are learning, editing, feedback and collaboration, as well as conflicts and coordination. The result of the study is an integrated theoretical framework of collaborative content creation activities which combines almost all elements, such as motive, goal, subject, object, community, tools, rules, roles and collaboration, conflicts, outcome, etc., into one model. It is argued that an activity-theoretical approach to collaborative content creation systems and information behavior research would provide a sound basis for the elaboration of complex collaboration and self-organization mechanisms.}}


 * -- align="left" valign=top
 * Zhang, Ziqi & Iria, José
 * A novel approach to automatic gazetteer generation using Wikipedia
 * Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
 * 2009
 * 
 * {{hidden||Gazetteers or entity dictionaries are important knowledge resources for solving a wide range of NLP problems, such as entity extraction. We introduce a novel method to automatically generate gazetteers from seed lists using an external knowledge resource, Wikipedia. Unlike previous methods, our method exploits the rich content and various structural elements of Wikipedia, and does not rely on language- or domain-specific knowledge. Furthermore, applying the extended gazetteers to an entity extraction task in a scientific domain, we empirically observed a significant improvement in system accuracy when compared with those using seed gazetteers.}}


 * -- align="left" valign=top
 * Zhou, Baoyao; Luo, Ping; Xiong, Yuhong & Liu, Wei
 * Wikipedia-Graph Based Key Concept Extraction towards News Analysis
 * Proceedings of the 2009 IEEE Conference on Commerce and Enterprise Computing
 * 2009
 * 
 * {{hidden||The well-known Wikipedia can serve as a comprehensive knowledge repository to facilitate textual content analysis, due to its abundance, high quality and good structure. In this paper, we propose WikiRank - a Wikipedia-graph based ranking model, which can be used to extract key Wikipedia concepts from a document. These key concepts can be regarded as the most salient terms to represent the theme of the document. Different from other existing graph-based ranking algorithms, the concept graph used for ranking in this model is constructed by leveraging not only the co-occurrence relations within the local context of a document but also the preprocessed hyperlink structure of Wikipedia. We have applied the proposed WikiRank model with the Support Propagation ranking algorithm to analyze news articles, especially enterprise news. Promising applications include Wikipedia Concept Linking and Enterprise Concept Cloud Generation.}}


 * -- align="left" valign=top
 * Zhou, Yunqing; Guo, Zhongqi; Ren, Peng & Yu, Yong
 * Applying Wikipedia-based Explicit Semantic Analysis For Query-biased Document Summarization
 * Advanced Intelligent Computing Theories and Applications. 6th International Conference on Intelligent Computing, ICIC 2010, 18-21 Aug. 2010 Berlin, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Zhu, Shanyuan
 * Games, simulations and virtual environment in education
 * Society for Information Technology \& Teacher Education International Conference
 * 2010
 * 


 * -- align="left" valign=top
 * Zhu, Shiai; Wang, Gang; Ngo, Chong-Wah & Jiang, Yu-Gang
 * On the sampling of web images for learning visual concept classifiers
 * Proceedings of the ACM International Conference on Image and Video Retrieval
 * 2010
 * 
 * {{hidden||Visual concept learning often requires a large set of training images. In practice, nevertheless, acquiring noise-free training labels with sufficient positive examples is always expensive. A plausible solution for training data collection is sampling the largely available user-tagged images from social media websites. With the general belief that the probability of correct tagging is higher than that of incorrect tagging, such a solution often sounds feasible, though it is not without challenges. First, user-tags can be subjective and, to a certain extent, ambiguous. For instance, an image tagged with "whales" may simply be a picture of an ocean museum. Learning the concept "whales" with such training samples will not be effective. Second, user-tags can be overly abbreviated. For instance, an image about the concept "wedding" may be tagged with "love" or simply the couple's names. As a result, crawling sufficient positive training examples is difficult. This paper empirically studies the impact of exploiting tagged images for concept learning, investigating how the quality of pseudo training images affects concept detection performance. In addition, we propose a simple approach named semantic field for predicting the relevance between a target concept and the tag list associated with an image. Specifically, the relevance is determined through concept-tag co-occurrence by exploring external sources such as WordNet and Wikipedia. The proposed approach is shown to be effective in selecting pseudo training examples, exhibiting better performance in concept learning than other approaches such as those based on keyword sampling and tag voting.}}


 * -- align="left" valign=top
 * Zinskie, Cordelia & Repman, Judi
 * Teaching Qualitative Research Online: Strategies, Issues, and Resources
 * World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education
 * 2007
 * 


 * -- align="left" valign=top
 * Lee, M.J.W. & McLoughlin, C.
 * Harnessing the affordances of Web 2.0 and social software tools: Can we finally make "student-centered" learning a reality?
 * EDMEDIA
 * 2008
 * 


 * -- align="left" valign=top
 * Jankowski, Jacek & Kruk, Sebastian Ryszard
 * 2Lip: The step towards the web3D
 * 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||The World Wide Web allows users to create and publish a variety of resources, including multimedia ones. Most of the contemporary best practices for designing web interfaces, however, do not take 3D techniques into account. In this paper we present a novel approach for designing interactive web applications: the 2-Layer Interface Paradigm (2LIP). The background layer of a 2LIP-type user interface is a 3D scene, which the user cannot directly interact with. The foreground layer is HTML content. Only taking an action on this content (e.g. pressing a hyperlink, scrolling a page) can affect the 3D scene. We introduce a reference implementation of 2LIP: Copernicus, the Virtual 3D Encyclopedia, which shows one of the potential paths of the evolution of Wikipedia towards Web 3.0. Based on the evaluation of Copernicus we show that designing web interfaces according to 2LIP provides users a better browsing experience, without harming the interaction.}}


 * -- align="left" valign=top
 * Tjong Kim Sang, Erik
 * A baseline approach for detecting sentences containing uncertainty
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||We apply a baseline approach to the CoNLL-2010 shared task data sets on hedge detection. Weights have been assigned to cue words marked in the training data based on their occurrences in certain and uncertain sentences. New sentences received scores that correspond with those of their best scoring cue word, if present. The best acceptance scores for uncertain sentences were determined using 10-fold cross validation on the training data. This approach performed reasonably on the shared task's biological (F=82.0) and Wikipedia (F=62.8) data sets.}}


 * -- align="left" valign=top
 * Erdmann, Maike; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * A bilingual dictionary extracted from the Wikipedia link structure
 * 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, March 19, 2008 - March 21, 2008 New Delhi, India
 * 2008
 * 
 * {{hidden||Many bilingual dictionaries have been released on the WWW. However, these dictionaries insufficiently cover new and domain-specific terminology. In our demonstration, we present a dictionary constructed by analyzing the link structure of Wikipedia, a huge-scale encyclopedia containing a large number of links between articles in different languages. We analyzed not only these interlanguage links but also extracted further translation candidates from redirect page and link text information. In an experiment, we demonstrated the advantages of our dictionary compared to manually created dictionaries as well as to extracting bilingual terminology from parallel corpora.}}


 * -- align="left" valign=top
 * Tang, Buzhou; Wang, Xiaolong; Wang, Xuan; Yuan, Bo & Fan, Shixi
 * A cascade method for detecting hedges and their scope in natural language text
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||Detecting hedges and their scope in natural language text is very important for information inference. In this paper, we present a system based on a cascade method for the CoNLL-2010 shared task. The system is composed of two components: one for detecting hedges and another for detecting their scope. First, a conditional random field (CRF) model and a large margin-based model are trained respectively. Then, we train another CRF model using the result of the first phase. For detecting the scope of hedges, a CRF model is trained according to the result of the first subtask. The experiments show that our system achieves 86.36% F-measure on the biological corpus and 55.05% F-measure on the Wikipedia corpus for hedge detection, and 49.95% F-measure on the biological corpus for hedge scope detection. Among them, 86.36% is the best result on the biological corpus for hedge detection.}}


 * -- align="left" valign=top
 * Plank, Barbara
 * A comparison of structural correspondence learning and self-training for discriminative parse selection
 * Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
 * 2009
 * 
 * {{hidden||This paper evaluates two semi-supervised techniques for the adaptation of a parse selection model to Wikipedia domains. The techniques examined are Structural Correspondence Learning (SCL) (Blitzer et al., 2006) and Self-training (Abney, 2007; McClosky et al., 2006). A preliminary evaluation favors the use of SCL over the simpler self-training techniques.}}


 * -- align="left" valign=top
 * Gleave, E.; Welser, H.T.; Lento, T.M. & Smith, M.A.
 * A conceptual and operational definition of 'social role' in online community
 * 2009 42nd Hawaii International Conference on System Sciences. HICSS-42, 5-8 Jan. 2009 Piscataway, NJ, USA
 * 2009
 * 


 * -- align="left" valign=top
 * Adler, B. Thomas & de Alfaro, Luca
 * A content-driven reputation system for the wikipedia
 * Proceedings of the 16th international conference on World Wide Web
 * 2007
 * 


 * -- align="left" valign=top
 * Weerkamp, Wouter; Balog, Krisztian & de Rijke, Maarten
 * A generative blog post retrieval model that uses query expansion based on external collections
 * Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
 * 2009
 * 


 * -- align="left" valign=top
 * Ye, Zheng; Huang, Xiangji & Lin, Hongfei
 * A graph-based approach to mining multilingual word associations from Wikipedia
 * 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, July 19, 2009 - July 23, 2009 Boston, MA, United states
 * 2009
 * 
 * {{hidden||In this paper, we propose a graph-based approach to constructing a multilingual association dictionary from Wikipedia, in which we exploit two kinds of links in Wikipedia articles to associate multilingual words and concepts together in a graph. The mined association dictionary is applied in cross-language information retrieval (CLIR) to verify its quality. We evaluate our approach on four CLIR data sets and the experimental results show that it is possible to mine a good multilingual association dictionary from Wikipedia articles.}}


 * -- align="left" valign=top
 * Georgescul, Maria
 * A hedgehop over a max-margin framework using hedge cues
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||In this paper, we describe the experimental settings we adopted in the context of the 2010 CoNLL shared task for detecting sentences containing uncertainty. The classification results reported on are obtained using discriminative learning with features essentially incorporating lexical information. Hyper-parameters are tuned for each domain: using BioScope training data for the biomedical domain and Wikipedia training data for the Wikipedia test set. By allowing an efficient handling of combinations of large-scale input features, the discriminative approach we adopted showed highly competitive empirical results for hedge detection on the Wikipedia dataset: our system is ranked first with an F-score of 60.17%.}}


 * -- align="left" valign=top
 * Kilicoglu, Halil & Bergler, Sabine
 * A high-precision approach to detecting hedges and their scopes
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||We extend our prior work on speculative sentence recognition and speculation scope detection in biomedical text to the CoNLL-2010 Shared Task on Hedge Detection. In our participation, we sought to assess the extensibility and portability of our prior work, which relies on linguistic categorization and weighting of hedging cues and on syntactic patterns in which these cues play a role. For Task 1B, we tuned our categorization and weighting scheme to recognize hedging in biological text. By accommodating a small number of vagueness quantifiers, we were able to extend our methodology to detecting vague sentences in Wikipedia articles. We exploited constituent parse trees in addition to syntactic dependency relations in resolving hedging scope. Our results are competitive with those of closed-domain trained systems and demonstrate that our high-precision oriented methodology is extensible and portable.}}


 * -- align="left" valign=top
 * Halfaker, Aaron; Kittur, Aniket; Kraut, Robert & Riedl, John
 * A jury of your peers: Quality, experience and ownership in Wikipedia
 * 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Milne, David; Witten, Ian H. & Nichols, David M.
 * A knowledge-based search engine powered by Wikipedia
 * 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal
 * 2007
 * 
 * {{hidden||This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide users as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offer significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it, making query entry more efficient, improving the relevance of the documents returned, and narrowing the gap between expert and novice searchers.}}


 * -- align="left" valign=top
 * Kane, Gerald; Majchrzak, Ann; Johnson, Jeremiah & Chenisern, Lily
 * A Longitudinal Model of Perspective Making and Perspective Taking Within Fluid Online Collectives
 * 2009
 * 
 * 


 * -- align="left" valign=top
 * Chen, Lin & Eugenio, Barbara Di
 * A Lucene and maximum entropy model based hedge detection system
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||This paper describes the approach to hedge detection we developed in order to participate in the CoNLL-2010 shared task. A supervised learning approach is employed in our implementation. Hedge cue annotations in the training data are used as the seed to build a reliable hedge cue set. A Maximum Entropy (MaxEnt) model is used as the learning technique to determine uncertainty. By making use of Apache Lucene, we are able to do fuzzy string matching to extract hedge cues and to incorporate part-of-speech (POS) tags in hedge cues. Not only can our system determine the certainty of a sentence, but it is also able to find all the contained hedges. Our system was ranked third on the Wikipedia dataset. In later experiments with different parameters, we further improved our results, with a 0.612 F-score on the Wikipedia dataset and a 0.802 F-score on the biological dataset.}}


 * -- align="left" valign=top
 * Pang, Cheong-Iao & Biuk-Aghai, Robert P.
 * A method for category similarity calculation in wikis
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland
 * 2010
 * 


 * -- align="left" valign=top
 * Webster, David; Xu, Jie; Mundy, Darren & Warren, Paul
 * A practical model for conceptual comparison using a wiki
 * 2009 9th IEEE International Conference on Advanced Learning Technologies, ICALT 2009, July 15, 2009 - July 17, 2009 Riga, Latvia
 * 2009
 * 


 * -- align="left" valign=top
 * He, Jiyin & Rijke, Maarten De
 * A ranking approach to target detection for automatic link generation
 * 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland
 * 2010
 * 


 * -- align="left" valign=top
 * Chu, Eric; Baid, Akanksha; Chen, Ting; Doan, AnHai & Naughton, Jeffrey
 * A relational approach to incrementally extracting and querying structure in unstructured data
 * Proceedings of the 33rd international conference on Very large data bases
 * 2007
 * 


 * -- align="left" valign=top
 * Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * A search engine for browsing the Wikipedia thesaurus
 * 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, March 19, 2008 - March 21, 2008 New Delhi, India
 * 2008
 * 
 * {{hidden||Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In our previous work, we proposed link structure mining algorithms to extract a huge-scale and accurate association thesaurus from Wikipedia. The association thesaurus covers almost 1.3 million concepts and its accuracy has been demonstrated in detailed experiments. To prove its practicality, we implemented three features on top of the association thesaurus: a search engine for browsing the Wikipedia Thesaurus, an XML Web service for the thesaurus, and a Semantic Web support feature. We show these features in this demonstration.}}


 * -- align="left" valign=top
 * Li, Decong; Li, Sujian; Li, Wenjie; Wang, Wei & Qu, Weiguang
 * A semi-supervised key phrase extraction approach: learning from title phrases through a document semantic network
 * Proceedings of the ACL 2010 Conference Short Papers
 * 2010
 * 


 * -- align="left" valign=top
 * Müller, Christof & Gurevych, Iryna
 * A study on the semantic relatedness of query and document terms in information retrieval
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
 * 2009
 * 


 * -- align="left" valign=top
 * Yin, Xiaoshi; Huang, Jimmy Xiangji; Zhou, Xiaofeng & Li, Zhoujun
 * A survival modeling approach to biomedical search result diversification using wikipedia
 * Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval
 * 2010
 * 
 * {{hidden||In this paper, we propose a probabilistic survival model derived from the survival analysis theory for measuring aspect novelty. The retrieved documents' query-relevance and novelty are combined at the aspect level for re-ranking. Experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed approach in promoting ranking diversity for biomedical information retrieval.}}


 * -- align="left" valign=top
 * Poole, Erika Shehan & Grudin, Jonathan
 * A taxonomy of wiki genres in enterprise settings
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland
 * 2010
 * 


 * -- align="left" valign=top
 * Yang, Xintian; Asur, Sitaram; Parthasarathy, Srinivasan & Mehta, Sameep
 * A visual-analytic toolkit for dynamic interaction graphs
 * Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
 * 2008
 * 
 * {{hidden||In this article we describe a visual-analytic tool for the interrogation of evolving interaction network data such as those found in social, bibliometric, WWW and biological applications. The tool we have developed incorporates common visualization paradigms such as zooming, coarsening and filtering while naturally integrating information extracted by a previously described event-driven framework for characterizing the evolution of such networks. The visual front-end provides features that are specifically useful in the analysis of interaction networks, capturing the dynamic nature of both individual entities as well as interactions among them. The tool provides the user with the option of selecting multiple views, designed to capture different aspects of the evolving graph from the perspective of a node, a community or a subset of nodes of interest. Standard visual templates and cues are used to highlight critical changes that have occurred during the evolution of the network. A key challenge we address in this work is that of scalability - handling large graphs both in terms of the efficiency of the back-end, and in terms of the efficiency of the visual layout and rendering. Two case studies based on bibliometric and Wikipedia data are presented to demonstrate the utility of the toolkit for visual knowledge discovery.}}


 * -- align="left" valign=top
 * Potthast, Martin; Stein, Benno & Anderka, Maik
 * A Wikipedia-based multilingual retrieval model
 * 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom
 * 2008
 * 
 * {{hidden||This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document d written in language L, we construct a concept vector d for d, where each dimension i in d quantifies the similarity of d with respect to a document d*_i chosen from the L-subset of Wikipedia. Likewise, for a second document d' written in language L', L' ≠ L, we construct a concept vector d' using the topic-aligned counterparts d'*_i of our previously chosen documents from the L'-subset of Wikipedia. Since the two concept vectors d and d' are collection-relative representations of d and d', they are language-independent, i.e., their similarity can be computed directly with the cosine similarity measure.}}


 * -- align="left" valign=top
 * Anand, Sarabjot Singh; Bunescu, Razvan; Carvalho, Vitor; Chomicki, Jan; Conitzer, Vincent; Cox, Michael T.; Dignum, Virginia; Dodds, Zachary; Dredze, Mark; Furcy, David; Gabrilovich, Evgeniy; Goker, Mehmet H.; Guesgen, Hans; Hirsh, Haym; Jannach, Dietmar; Junker, Ulrich; Ketter, Wolfgang; Kobsa, Alfred; Koenig, Sven; Lau, Tessa; Lewis, Lundy; Matson, Eric; Metzler, Ted; Mihalcea, Rada; Mobasher, Bamshad; Pineau, Joelle; Poupart, Pascal; Raja, Anita; Ruml, Wheeler; Sadeh, Norman; Shani, Guy; Shapiro, Daniel; Smith, Trey; Taylor, Matthew E.; Wagstaff, Kiri; Walsh, William & Zhou, Rong
 * AAAI 2008 workshop reports
 * 445 Burgess Drive, Menlo Park, CA 94025-3496, United States
 * 2009
 * {{hidden||AAAI was pleased to present the AAAI-08 Workshop Program, held Sunday and Monday, July 13-14, in Chicago, Illinois, USA. The program included the following 15 workshops: Advancements in POMDP Solvers; AI Education Workshop Colloquium; Coordination, Organizations, Institutions, and Norms in Agent Systems; Enhanced Messaging; Human Implications of Human-Robot Interaction; Intelligent Techniques for Web Personalization and Recommender Systems; Metareasoning: Thinking about Thinking; Multidisciplinary Workshop on Advances in Preference Handling; Search in Artificial Intelligence and Robotics; Spatial and Temporal Reasoning; Trading Agent Design and Analysis; Transfer Learning for Complex Tasks; What Went Wrong and Why: Lessons from AI Research and Applications; and Wikipedia and Artificial Intelligence: An Evolving Synergy.}}


 * -- align="left" valign=top
 * Huang, Zhiheng; Zeng, Guangping; Xu, Weiqun & Celikyilmaz, Asli
 * Accurate semantic class classifier for coreference resolution
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
 * 2009
 * 
 * {{hidden||There have been considerable attempts to incorporate semantic knowledge into coreference resolution systems: different knowledge sources such as WordNet and Wikipedia have been used to boost performance. In this paper, we propose new ways to extract a WordNet feature. This feature, along with other features such as a named entity feature, can be used to build an accurate semantic class (SC) classifier. In addition, we analyze the SC classification errors and propose to use relaxed SC agreement features. The proposed accurate SC classifier and the relaxation of SC agreement features boost our baseline system on the ACE2 coreference evaluation by 10.4% and 9.7% in MUC score and anaphor accuracy, respectively.}}


 * -- align="left" valign=top
 * Kelly, Colin; Devereux, Barry & Korhonen, Anna
 * Acquiring human-like feature-based conceptual representations from corpora
 * Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics
 * 2010
 * 


 * -- align="left" valign=top
 * Yang, Jeongwon & Shim, J.P.
 * Adoption Factors of Online Knowledge Sharing Service in the Era of Web 2.0
 * 2009
 * 
 * {{hidden||While the topic of online knowledge sharing services based on Web 2.0 has received considerable attention, virtually all studies dealing with online knowledge sharing services have neglected or given cursory attention to users’ perception of the usage of those services and the corresponding level of interaction. This study focuses on users’ different attitudes and expectations toward the domestic online knowledge sharing service represented by Korea’s ‘Jisik IN’ (translation: knowledge IN) of Naver and a foreign counterpart represented by Wikipedia, both often presented as models of Web 2.0 applications. In Korea, the popularity gap between Jisik IN and Wikipedia hints at the necessity of grasping which factors are more important in allowing for greater user engagement and satisfaction with regard to online knowledge sharing services. This study presents an integrated model based on the constructs of WebQual, subjective norms, and cultural dimensions.}}


 * -- align="left" valign=top
 * Advances in Information Retrieval. Proceedings 32nd European Conference on IR Research, ECIR 2010
 * Advances in Information Retrieval. 32nd European Conference on IR Research, ECIR 2010, 28-31 March 2010 Berlin, Germany
 * 2010
 * {{hidden||The following topics are dealt with: natural language processing; multimedia information retrieval; language modeling; temporal information; recover broken Web; attitude identification; PICO element; Web search queries; correlation analysis; automatic system evaluation; spatial diversity; online prediction; image detection; gene sequence; ranking fusion methods; peer-to-peer networks; probabilistic; Wikipedia-based semantic smoothing; collaborative filtering; contextual image retrieval; XML ranked retrieval; filtering documents; multilingual retrieval; machine translation; data analysis.}}


 * -- align="left" valign=top
 * Hoffmann, Raphael; Amershi, Saleema; Patel, Kayur; Wu, Fei; Fogarty, James & Weld, Daniel S.
 * Amplifying community content creation with mixed initiative information extraction
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Strube, Michael
 * An API for measuring the relatedness of words in Wikipedia
 * Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
 * 2007
 * 
 * {{hidden||We present an API for computing the semantic relatedness of words in Wikipedia.}}


 * -- align="left" valign=top
 * Erdmann, Maike; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * An approach for extracting bilingual terminology from Wikipedia
 * 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, March 19, 2008 - March 21, 2008 New Delhi, India
 * 2008
 * 
 * {{hidden||With the demand for bilingual dictionaries covering domain-specific terminology, research in the field of automatic dictionary extraction has become popular. However, the accuracy and coverage of dictionaries created from bilingual text corpora are often not sufficient for domain-specific terms. Therefore, we present an approach to extracting bilingual dictionaries from the link structure of Wikipedia, a huge-scale encyclopedia that contains a vast amount of links between articles in different languages. Our methods analyze not only these interlanguage links but also extract further translation candidates from redirect page and link text information. In an experiment, we demonstrated the advantages of our methods compared to a traditional approach of extracting bilingual terminology from parallel corpora.}}


 * -- align="left" valign=top
 * Gollapudi, Sreenivas & Sharma, Aneesh
 * An axiomatic approach for result diversification
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 


 * -- align="left" valign=top
 * Popovici, Eugen; Marteau, Pierre-François & Ménier, Gildas
 * An effective method for finding best entry points in semi-structured documents
 * Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
 * 2007
 * 
 * {{hidden||Focused structured document retrieval employs the concept of best entry point (BEP), which is intended to provide an optimal starting point from which users can browse to relevant document components [4]. In this paper we describe and evaluate a method for finding BEPs in XML documents. Experiments conducted within the framework of the INEX 2006 evaluation campaign on the Wikipedia XML collection [2] showed the effectiveness of the proposed approach.}}


 * -- align="left" valign=top
 * Ben-Chaim, Yochai; Farchi, Eitan & Raz, Orna
 * An effective method for keeping design artifacts up-to-date
 * 2009 ICSE Workshop on Wikis for Software Engineering, Wikis4SE 2009, May 16, 2009 - May 24, 2009 Vancouver, BC, Canada
 * 2009
 * 
 * {{hidden||A major problem in the software development process is that design documents are rarely kept up-to-date with the implementation, and thus become irrelevant for extracting test plans or reviews. Furthermore, design documents tend to become very long and often impossible to review and comprehend. This paper describes an experimental method conducted in a development group at IBM. The group uses a Wikipedia-like process to maintain design documents, while taking measures to keep them up-to-date and in use, and thus relevant. The method uses a wiki enhanced with hierarchical glossaries of terms to maintain design artifacts. Initial results indicate that these enhancements are successful and assist in the creation of more effective design documents. We maintained a large portion of the group's design documents in use and relevant over a period of three months. Additionally, by archiving artifacts that were not in use, we were able to validate that they were no longer relevant.}}


 * -- align="left" valign=top
 * Milne, David & Witten, Ian H.
 * An effective, low-cost measure of semantic relatedness obtained from wikipedia links
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008


 * -- align="left" valign=top
 * Yu, Xiaofeng & Lam, Wai
 * An integrated probabilistic and logic approach to encyclopedia relation extraction with multiple features
 * Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
 * 2008
 * 
 * {{hidden||We propose a new integrated approach based on Markov logic networks (MLNs), an effective combination of probabilistic graphical models and first-order logic for statistical relational learning, to extracting relations between entities in encyclopedic articles from Wikipedia. The MLNs model entity relations collectively in a unified undirected graph, using multiple features, including contextual, morphological, syntactic, and semantic features as well as Wikipedia-characteristic features, which capture the essential characteristics of the relation extraction task. This model makes simultaneous statistical judgments about the relations for a set of related entities. More importantly, implicit relations can also be identified easily. Our experimental results showed that this integrated probabilistic and logic model significantly outperforms the current state-of-the-art probabilistic model, Conditional Random Fields (CRFs), for relation extraction from encyclopedic articles.}}


 * -- align="left" valign=top
 * Nguyen, Chau Q. & Phan, Tuoi T.
 * An ontology-based approach for key phrase extraction
 * Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
 * 2009
 * 
 * {{hidden||Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and is a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that exploits the Vietnamese Wikipedia as an ontology and exploits specific characteristics of the Vietnamese language in the key phrase selection stage. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrase recognition phase as well as part-of-speech (POS) tagging. Finally, we review the results of several experiments that have examined the impact of the strategies chosen for Vietnamese key phrase extraction.}}


 * -- align="left" valign=top
 * Nothman, Joel; Murphy, Tara & Curran, James R.
 * Analysing Wikipedia and gold-standard corpora for NER training
 * Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
 * 2009
 * 


 * -- align="left" valign=top
 * Lizorkin, Dmitry; Medelyan, Olena & Grineva, Maria
 * Analysis of community structure in Wikipedia
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 
 * {{hidden||We present the results of a community detection analysis of the Wikipedia graph. Distinct communities in Wikipedia contain semantically closely related articles. The central topic of a community can be identified using PageRank. Extracted communities can be organized hierarchically, similarly to the manually created Wikipedia category structure.}}


 * -- align="left" valign=top
 * Zhang, Xinpeng; Asano, Y. & Yoshikawa, M.
 * Analysis of Implicit Relations on Wikipedia: Measuring Strength through Mining Elucidatory Objects
 * Database Systems for Advanced Applications. 15th International Conference, DASFAA 2010, 1-4 April 2010 Berlin, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Muhr, Markus; Kern, Roman & Granitzer, Michael
 * Analysis of structural relationships for hierarchical cluster labeling
 * 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland
 * 2010
 * 
 * {{hidden||Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence labeling accuracy. However, most labeling algorithms ignore such structural properties, and therefore the impact of hierarchical structures on labeling accuracy is as yet unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, χ2 Test, and Information Gain, to make use of those relationships and evaluate their impact on 4 different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed, and the CLEF IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.}}


 * -- align="left" valign=top
 * Gupta, Rahul & Sarawagi, Sunita
 * Answering table augmentation queries from unstructured lists on the web
 * Proceedings of the VLDB Endowment
 * 2009
 * 


 * -- align="left" valign=top
 * Kriplean, Travis; Beschastnikh, Ivan & McDonald, David W.
 * Articulations of wikiwork: uncovering valued work in wikipedia through barnstars
 * Proceedings of the 2008 ACM conference on Computer supported cooperative work
 * 2008
 * 


 * -- align="left" valign=top
 * Wohner, Thomas & Peters, Ralf
 * Assessing the quality of Wikipedia articles with lifecycle based metrics
 * 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Adler, B. Thomas; Chatterjee, Krishnendu; Alfaro, Luca De; Faella, Marco; Pye, Ian & Raman, Vishwanath
 * Assigning trust to Wikipedia content
 * 4th International Symposium on Wikis, WikiSym 2008, September 8, 2008 - September 10, 2008 Porto, Portugal
 * 2008
 * 


 * -- align="left" valign=top
 * Ito, Masahiro; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * Association thesaurus construction methods based on link co-occurrence analysis for wikipedia
 * 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states
 * 2008
 * 
 * {{hidden||Wikipedia, a huge-scale Web-based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because it has various impressive characteristics, such as a huge number of articles, live updates, a dense link structure, brief anchor texts, and URL identification for concepts. We have already shown that Wikipedia can be used to construct a huge-scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy was proved in detailed experiments. However, we still need scalable methods to analyze the huge number of Web pages and hyperlinks among articles in the Web-based encyclopedia. In this paper, we propose a scalable method for constructing an association thesaurus from Wikipedia based on link co-occurrences. Link co-occurrence analysis is more scalable than link structure analysis because it is a one-pass process. We also propose a method that integrates tf-idf with link co-occurrence analysis. Experimental results show that both our proposed methods are more accurate and scalable than conventional methods. Furthermore, the integration of tf-idf achieved higher accuracy than using link co-occurrences alone.}}
 * {{hidden||Wikipedia, a huge scale Web based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief anchor texts and URL} identification for concepts. We have already proved that we can use Wikipedia to construct a huge scale accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts and its accuracy is proved in detailed experiments. However, we still need scalable methods to analyze the huge number of Web pages and hyperlinks among articles in the Web based encyclopedia. In this paper, we propose a scalable method for constructing an association thesaurus from Wikipedia based on link co-occurrences. Link co-occurrence analysis is more scalable than link structure analysis because it is a one-pass process. We also propose integration method of tfidf and link co-occurrence analysis. Experimental results show that both our proposed methods are more accurate and scalable than conventional methods. Furthermore, the integration of tfidf achieved higher accuracy than using only link cooccurrences. }}
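The one-pass link co-occurrence analysis the abstract describes can be sketched roughly as follows; the article names and link sets here are made-up toy data, and the paper additionally weights these raw counts with tf-idf:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(articles):
    """Single pass over articles: count how often two link targets
    appear together in the same article (no link-graph traversal)."""
    pair_counts = defaultdict(int)
    link_counts = defaultdict(int)
    for links in articles.values():
        unique = sorted(set(links))
        for link in unique:
            link_counts[link] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return link_counts, pair_counts

def related(concept, link_counts, pair_counts, top=5):
    """Rank concepts by raw co-occurrence with `concept`; an
    association-thesaurus entry is the top of this ranking."""
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == concept:
            scores[b] = scores.get(b, 0) + n
        elif b == concept:
            scores[a] = scores.get(a, 0) + n
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Because each article is touched exactly once, the cost is linear in the number of article-link pairs, which is what makes the approach more scalable than link structure analysis.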


 * -- align="left" valign=top
 * Chi, Ed H.; Pirolli, Peter; Suh, Bongwon; Kittur, Aniket; Pendleton, Bryan & Mytkowicz, Todd
 * Augmented Social Cognition
 * 2008 AAAI Spring Symposium, March 26, 2008 - March 28, 2008 Stanford, CA, United States
 * 2008


 * -- align="left" valign=top
 * Chi, Ed H.
 * Augmented social cognition: Using social web technology to enhance the ability of groups to remember, think, and reason
 * International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09, June 29, 2009 - July 2, 2009 Providence, RI, United States
 * 2009
 * 
 * {{hidden||We are experiencing a new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by systems like Wikipedia, Twitter, and delicious.com, these environments are turning people into social information foragers and sharers. Groups interact to resolve conflicts and jointly make sense of topic areas from "Obama vs. Clinton" to "Islam." PARC's Augmented Social Cognition researchers -- who come from cognitive psychology, computer science, HCI, CSCW, and other disciplines -- focus on understanding how to "enhance a group of people's ability to remember, think, and reason".}}


 * -- align="left" valign=top
 * Wu, Fei; Hoffmann, Raphael & Weld, Daniel S.
 * Augmenting wikipedia-extraction with results from the web
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United States
 * 2008


 * -- align="left" valign=top
 * Dakka, Wisam & Ipeirotis, Panagiotis G.
 * Automatic Extraction of Useful Facet Hierarchies from Text Databases
 * Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
 * 2008
 * 
 * {{hidden||Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a new powerful paradigm that proved to be a successful complement to searching. Thus far, the identification of the facets was either a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper we present an unsupervised technique for automatic extraction of facets useful for browsing text databases. In particular, we observe through a pilot study that facet terms rarely appear in text documents, showing that we need external resources to identify useful facet terms. For this, we first identify important phrases in each document. Then, we expand each phrase with "context" phrases using external resources, such as WordNet and Wikipedia, causing facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies using the Amazon Mechanical Turk service show that our techniques produce facets with high precision and recall that are superior to existing approaches and help users locate interesting items faster.}}
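The core comparison the abstract describes -- terms that become frequent only after external-resource expansion are good facet candidates -- can be sketched as below. The `HYPERNYMS` map is a hypothetical stand-in for the WordNet/Wikipedia lookups the paper uses:

```python
from collections import Counter

# Hypothetical expansion map standing in for WordNet/Wikipedia lookups.
HYPERNYMS = {
    "oak": ["tree", "plant"],
    "pine": ["tree", "plant"],
    "sparrow": ["bird", "animal"],
}

def facet_candidates(docs, expansions, min_gain=2):
    """Compare term frequencies before and after expansion; terms whose
    frequency rises by at least `min_gain` rarely occur in the documents
    themselves but organize many of them, i.e. facet candidates."""
    original = Counter(t for doc in docs for t in doc)
    expanded = Counter()
    for doc in docs:
        expanded.update(doc)
        for t in doc:
            expanded.update(expansions.get(t, []))
    return {t for t in expanded if expanded[t] - original[t] >= min_gain}
```

On the toy data, "tree" and "plant" never occur in the documents yet cover two of them each, so they surface as browsing facets.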


 * -- align="left" valign=top
 * Balasubramanian, Niranjan & Cucerzan, Silviu
 * Automatic generation of topic pages using query-based aspect models
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 


 * -- align="left" valign=top
 * Gardner, James J. & Xiong, Li
 * Automatic link detection: a sequence labeling approach
 * Proceeding of the 18th ACM conference on Information and knowledge management
 * 2009
 * 
 * {{hidden||The popularity of Wikipedia and other online knowledge bases has recently produced an interest in the machine learning community in the problem of automatic linking. Automatic hyperlinking can be viewed as two subproblems: link detection, which determines the source of a link, and link disambiguation, which determines the destination of a link. Wikipedia is a rich corpus with hyperlink data provided by authors. It is possible to use this data to train classifiers to be able to mimic the authors in some capacity. In this paper, we introduce automatic link detection as a sequence labeling problem. Conditional random fields (CRFs) are a probabilistic framework for labeling sequential data. We show that training a CRF with different types of features from the Wikipedia dataset can be used to automatically detect links with almost perfect precision and high recall.}}
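Casting link detection as sequence labeling means converting author-provided `[[...]]` markup into per-token labels that a CRF can be trained on. A minimal sketch of that data preparation step (BIO labeling is our assumption of the encoding; the paper's exact feature set is richer):

```python
import re

def bio_sequence(wikitext):
    """Turn '[[anchor]]' markup into (token, label) pairs: B-LINK for
    the first token of a link anchor, I-LINK for the rest, O elsewhere.
    The resulting sequences serve as CRF training data."""
    pairs = []
    for piece in re.split(r"(\[\[[^\]]+\]\])", wikitext):
        if piece.startswith("[[") and piece.endswith("]]"):
            tokens = piece[2:-2].split()
            pairs += [(tokens[0], "B-LINK")]
            pairs += [(t, "I-LINK") for t in tokens[1:]]
        else:
            pairs += [(t, "O") for t in piece.split()]
    return pairs
```

A trained sequence labeler then predicts these labels on unmarked text, which is exactly the link detection subproblem (the disambiguation subproblem, choosing the target page, is separate).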


 * -- align="left" valign=top
 * Potthast, Martin; Stein, Benno & Gerling, Robert
 * Automatic vandalism detection in Wikipedia
 * 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United Kingdom
 * 2008
 * 
 * {{hidden||We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature until now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-measure performance by 49% while being faster at the same time.}}
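The feature-plus-logistic-regression setup can be sketched as below. The three features and all weights here are illustrative assumptions of ours (the paper's real model uses a richer feature set and weights learned from its labeled corpus):

```python
import math

VULGAR = {"stupid", "dumb"}  # tiny illustrative lexicon, not the paper's

def features(old, new):
    """Edit-level signals of the kind the paper describes: size change,
    upper-case ratio of the new revision, vulgarisms in inserted text."""
    inserted = [w for w in new.split() if w not in set(old.split())]
    upper_ratio = sum(c.isupper() for c in new) / max(len(new), 1)
    vulgar = sum(w.lower().strip(".,!") in VULGAR for w in inserted)
    return [len(new) - len(old), upper_ratio, vulgar]

def vandalism_score(old, new, weights=(-0.001, 2.0, 3.0), bias=-2.0):
    """Logistic regression with made-up weights: probability-like score
    in (0, 1) that the revision old -> new is vandalism."""
    z = bias + sum(w * f for w, f in zip(weights, features(old, new)))
    return 1.0 / (1.0 + math.exp(-z))
```

An all-caps insult scores far higher than a benign addition, which is the behavior a trained model exploits to beat hand-written rules.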


 * -- align="left" valign=top
 * Smets, Koen; Goethals, Bart & Verdonk, Brigitte
 * Automatic vandalism detection in wikipedia: Towards a machine learning approach
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United States
 * 2008


 * -- align="left" valign=top
 * Sauper, Christina & Barzilay, Regina
 * Automatically generating Wikipedia articles: a structure-aware approach
 * Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
 * 2009
 * 


 * -- align="left" valign=top
 * Wu, Fei & Weld, Daniel S.
 * Automatically refining the wikipedia infobox ontology
 * 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly-structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia's infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia's infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.}}


 * -- align="left" valign=top
 * Wu, Fei & Weld, Daniel S.
 * Autonomously semantifying wikipedia
 * 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal
 * 2007
 * 
 * {{hidden||Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method - creating enough structured data to motivate the development of applications. This paper argues that autonomously "semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous ...}}


 * -- align="left" valign=top
 * Navigli, Roberto & Ponzetto, Simone Paolo
 * BabelNet: building a very large multilingual semantic network
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||In this paper we present BabelNet -- a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.}}


 * -- align="left" valign=top
 * Kittur, Aniket & Kraut, Robert E.
 * Beyond Wikipedia: Coordination and conflict in online production groups
 * 2010 ACM Conference on Computer Supported Cooperative Work, CSCW 2010, February 6, 2010 - February 10, 2010 Savannah, GA, United States
 * 2010
 * 


 * -- align="left" valign=top
 * Oh, Jong-Hoon; Uchimoto, Kiyotaka & Torisawa, Kentaro
 * Bilingual co-training for monolingual hyponymy-relation acquisition
 * Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
 * 2009
 * 


 * -- align="left" valign=top
 * Liu, Xiaojiang; Nie, Zaiqing; Yu, Nenghai & Wen, Ji-Rong
 * BioSnowball: Automated population of wikis
 * 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD-2010, July 25, 2010 - July 28, 2010 Washington, DC, United States
 * 2010
 * 
 * {{hidden||Internet users regularly need to find biographies and facts about people of interest. Wikipedia has become the first stop for celebrity biographies and facts. However, Wikipedia can only provide information for celebrities because of its neutral point of view (NPOV) editorial policy. In this paper we propose an integrated bootstrapping framework named BioSnowball to automatically summarize the Web to generate Wikipedia-style pages for any person with a modest web presence. In BioSnowball, biography ranking and fact extraction are performed together in a single integrated training and inference process using Markov Logic Networks (MLNs) as the underlying statistical model. The bootstrapping framework starts with only a small number of seeds and iteratively finds new facts and biographies. As biography paragraphs on the Web are composed of the most important facts, our joint summarization model can improve the accuracy of both fact extraction and biography ranking compared to decoupled methods in the literature. Empirical results on both a small labeled data set and a real Web-scale data set show the effectiveness of BioSnowball. We also empirically show that BioSnowball outperforms the decoupled methods.}}


 * -- align="left" valign=top
 * Jesus, Rut; Schwartz, Martin & Lehmann, Sune
 * Bipartite networks of Wikipedia's articles and authors: A meso-level approach
 * 5th International Symposium on Wikis and Open Collaboration, WikiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United States
 * 2009
 * 
 * {{hidden||This exploratory study investigates the bipartite network of articles linked by common editors in Wikipedia, 'The Free Encyclopedia that Anyone Can Edit'. We use the articles in the categories (to depth three) of Physics and Philosophy, and extract and focus on significant editors (at least 7 or 10 edits per article). We construct a bipartite network, and from it, overlapping cliques of densely connected articles and editors. We cluster these densely connected cliques into larger modules to study examples of larger groups that display how volunteer editors flock around articles driven by interest, real-world controversies, or the result of coordination in WikiProjects. Our results confirm that topics aggregate editors, and show that highly coordinated efforts result in dense clusters.}}
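The basic construction -- a bipartite article-editor graph, thresholded to significant editors and then projected so that articles sharing an editor are linked -- can be sketched as follows (editor names, articles, and the low threshold are toy assumptions; the paper uses 7-10 edits):

```python
from collections import defaultdict

def article_projection(edits, min_edits=2):
    """From (editor, article, edit_count) triples, keep 'significant'
    editors per article and weight article pairs by shared editors.
    Dense regions of the resulting graph are the co-editing clusters."""
    editors_of = defaultdict(set)
    for editor, article, n in edits:
        if n >= min_edits:
            editors_of[article].add(editor)
    weight = {}
    articles = sorted(editors_of)
    for i, a in enumerate(articles):
        for b in articles[i + 1:]:
            shared = len(editors_of[a] & editors_of[b])
            if shared:
                weight[(a, b)] = shared
    return weight
```

Clique-finding and module clustering, as in the study, then operate on this weighted projection.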


 * -- align="left" valign=top
 * Yardi, Sarita; Golder, Scott A. & Brzozowski, Michael J.
 * Blogging at work and the corporate attention economy
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 
 * {{hidden||The attention economy motivates participation in peer-produced sites on the Web like YouTube and Wikipedia. However, this economy appears to break down at work. We studied a large internal corporate blogging community using log files and interviews and found that employees expected to receive attention when they contributed to blogs, but these expectations often went unmet. As in the external blogosphere, a few people received most of the attention, and many people received little or none. Employees expressed frustration if they invested time and received little or no perceived return on investment. While many corporations are looking to adopt Web-based communication tools like blogs, wikis, and forums, these efforts will fail unless employees are motivated to participate and contribute content. We identify where the attention economy breaks down in a corporate blog community and suggest mechanisms for improvement.}}


 * -- align="left" valign=top
 * Ngomo, Axel-Cyrille Ngonga & Schumacher, Frank
 * BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing
 * Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
 * 2009
 * 
 * {{hidden||In this paper, we introduce BorderFlow, a novel local graph clustering algorithm, and its application to natural language processing problems. For this purpose, we first present a formal description of the algorithm. Then, we use BorderFlow to cluster large graphs and to extract concepts from word similarity graphs. The clustering of large graphs is carried out on graphs extracted from the Wikipedia Category Graph. The subsequent low-bias extraction of concepts is carried out on two data sets consisting of noisy and clean data. We show that BorderFlow efficiently computes clusters of high quality and purity. Therefore, BorderFlow can be integrated into several other natural language processing applications.}}
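A local graph clustering of this flavor can be sketched as a greedy seed expansion: grow a cluster one neighbor at a time while a border-flow ratio (flow from the cluster's border back into the cluster versus out of it, our reading of the criterion -- the paper gives the formal definition) keeps improving:

```python
def border_flow(graph, cluster):
    """Ratio of edges from the cluster's border nodes into the cluster
    versus out of it; higher means a more self-contained cluster."""
    border = {v for v in cluster if set(graph[v]) - cluster}
    inside = sum(len(set(graph[v]) & cluster) for v in border)
    outside = sum(len(set(graph[v]) - cluster) for v in border)
    return inside / outside if outside else float("inf")

def grow_cluster(graph, seed, steps=10):
    """Greedy local expansion from a seed node: repeatedly add the
    neighbor that maximizes the border-flow ratio, stop when no
    addition improves it. Only the seed's neighborhood is visited."""
    cluster = {seed}
    for _ in range(steps):
        frontier = {u for v in cluster for u in graph[v]} - cluster
        if not frontier:
            break
        best = max(frontier, key=lambda u: border_flow(graph, cluster | {u}))
        if border_flow(graph, cluster | {best}) <= border_flow(graph, cluster):
            break
        cluster.add(best)
    return cluster
```

Locality is the point: the algorithm never looks at nodes far from the seed, which is what makes it practical on graphs the size of the Wikipedia Category Graph.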


 * -- align="left" valign=top
 * Mindel, Joshua & Verma, Sameer
 * Building Collaborative Knowledge Bases: An Open Source Approach Using Wiki Software in Teaching and Research
 * 2005
 * 
 * {{hidden||To open-minded students and professors alike, a classroom is an experience in which all participants collaborate to expand their knowledge. The collective knowledge is typically documented via a mix of lecture slides, notes taken by students, writings submitted by individuals or teams, online discussion forums, etc. A Wiki is a collection of hyperlinked web pages that are assembled with Wiki software. It differs from the traditional process of developing a web site in that any registered participant can edit without knowing how to build a web site. It enables a group to asynchronously develop and refine a body of knowledge in full view of all participants. The emergence of Wikipedia and Wikitravel demonstrates that this collaborative process is scalable. In this tutorial, we will provide an overview of the Wiki collaboration process, explain how it can be used in teaching courses, and also how it provides an efficient mechanism for collaborating researchers to document their growing body of knowledge. For teaching, students can collectively post and refine each other's writings. Participants: If possible, please bring a laptop with Wi-Fi capability.}}


 * -- align="left" valign=top
 * DeRose, Pedro; Chai, Xiaoyong; Gao, Byron J.; Shen, Warren; Doan, AnHai; Bohannon, Philip & Zhu, Xiaojin
 * Building community wikipedias: A machine-human partnership approach
 * 2008 IEEE 24th International Conference on Data Engineering, ICDE'08, April 7, 2008 - April 12, 2008 Cancun, Mexico
 * 2008
 * 


 * -- align="left" valign=top
 * Wang, Pu & Domeniconi, Carlotta
 * Building semantic kernels for text classification using wikipedia
 * 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, August 24, 2008 - August 27, 2008 Las Vegas, NV, United States
 * 2008
 * 
 * {{hidden||Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. The traditional document representation is a word-based vector (Bag of Words, or BOW), where each dimension is associated with a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. It is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities of classification algorithms. In this paper, we overcome the shortcomings of the BOW approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents. Our empirical evaluation with real data sets demonstrates that our approach successfully achieves improved classification accuracy with respect to the BOW technique, and to other recently developed methods.}}
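The idea of a semantic kernel -- enriching the BOW vector with concept dimensions so that documents sharing no words can still be similar -- can be sketched as below. The `CONCEPTS` map is a hypothetical stand-in for the term-to-article mapping the paper derives from Wikipedia:

```python
from collections import Counter

# Hypothetical term -> Wikipedia-concept mapping (the paper derives
# this kind of background knowledge from Wikipedia itself).
CONCEPTS = {
    "puck": ["Ice_hockey"],
    "goalie": ["Ice_hockey"],
    "pitcher": ["Baseball"],
}

def enriched_vector(doc):
    """BOW vector augmented with concept dimensions ('C:...' keys)."""
    vec = Counter(doc)
    for term in doc:
        for concept in CONCEPTS.get(term, []):
            vec["C:" + concept] += 1
    return vec

def kernel(d1, d2):
    """Linear kernel (dot product) over the enriched representations;
    this is what a kernel classifier such as an SVM would consume."""
    v1, v2 = enriched_vector(d1), enriched_vector(d2)
    return sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
```

Under plain BOW, ["puck"] and ["goalie"] have zero similarity; the shared Ice_hockey concept dimension gives them a nonzero kernel value, which is the enrichment the abstract claims improves classification.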

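The semantic-kernel idea above can be illustrated with a minimal sketch (not the paper's implementation): bag-of-words vectors are projected through a term-to-concept matrix before comparison, so documents sharing no words can still match. The vocabulary, concept matrix, and function names here are invented for illustration; the paper derives the concept matrix from Wikipedia.

```python
# Hypothetical sketch of a semantic kernel: k(a, b) = (a.S) . (b.S),
# where S maps dictionary terms to background-knowledge concepts.

def bow(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    return [tokens.count(t) for t in vocab]

def enrich(vec, concept_matrix):
    """Project a term vector into concept space: x' = x . S."""
    n_concepts = len(concept_matrix[0])
    return [sum(vec[i] * concept_matrix[i][j] for i in range(len(vec)))
            for j in range(n_concepts)]

def kernel(a, b, concept_matrix):
    """Semantic kernel value between two term vectors."""
    ea, eb = enrich(a, concept_matrix), enrich(b, concept_matrix)
    return sum(x * y for x, y in zip(ea, eb))

vocab = ["car", "automobile", "banana"]
# Toy concept matrix: rows = terms, columns = two Wikipedia-like concepts.
S = [[1.0, 0.0],   # car        -> "Vehicle"
     [1.0, 0.0],   # automobile -> "Vehicle"
     [0.0, 1.0]]   # banana     -> "Fruit"

d1 = bow("car car".split(), vocab)     # [2, 0, 0]
d2 = bow("automobile".split(), vocab)  # [0, 1, 0]
print(kernel(d1, d2, S))  # 2.0: related although they share no words
```

Under a plain BOW dot product these two documents would score zero; the concept projection is what recovers the similarity.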

 * -- align="left" valign=top
 * Yin, Xiaoxin & Shah, Sarthak
 * Building taxonomy of web search intents for name entity queries
 * 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states
 * 2010
 * 
 * {{hidden||A significant portion of web search queries are name entity queries. The major search engines have been exploring various ways to provide better user experiences for name entity queries, such as showing "search tasks" (Bing search) and showing direct answers (Yahoo! Kosmix). In order to provide the search tasks or direct answers that can satisfy most popular user intents we need to capture these intents together with relationships between them. In this paper we propose an approach for building a hierarchical taxonomy of the generic search intents for a class of name entities (e.g. musicians or cities). The proposed approach can find phrases representing generic intents from user queries}}


 * -- align="left" valign=top
 * Blanco, Roi; Bortnikov, Edward; Junqueira, Flavio; Lempel, Ronny; Telloli, Luca & Zaragoza, Hugo
 * Caching search engine results over incremental indices
 * 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Kittur, Aniket; Suh, Bongwon & Chi, Ed H.
 * Can you ever trust a wiki? Impacting perceived trustworthiness in wikipedia
 * 2008 ACM Conference on Computer Supported Cooperative Work, CSCW 08, November 8, 2008 - November 12, 2008 San Diego, CA, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Wang, Haofen; Tran, Thanh & Liu, Chang
 * CE2: towards a large scale hybrid search engine with integrated ranking support
 * Proceeding of the 17th ACM conference on Information and knowledge management
 * 2008
 * 
 * {{hidden||The Web contains a large amount of documents and increasingly, also semantic data in the form of RDF triples. Many of these triples are annotations that are associated with documents. While structured query is the principal means to retrieve semantic data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these formalisms to query both documents and semantic data can address more complex information needs. In this paper, we present CE2, an integrated solution that leverages mature database and information retrieval technologies to tackle challenges in hybrid search on the large scale. For scalable storage, CE2 integrates databases with inverted indices. Hybrid query processing is supported in CE2 through novel algorithms and data structures, which allow for advanced ranking schemes to be integrated more tightly into the process. Experiments conducted on DBpedia and Wikipedia show that CE2 can provide good performance in terms of both effectiveness and efficiency.}}


 * -- align="left" valign=top
 * Cowling, Peter; Remde, Stephen; Hartley, Peter; Stewart, Will; Stock-Brooks, Joe & Woolley, Tom
 * C-Link: Concept linkage in knowledge repositories
 * 2010 AAAI Spring Symposium, March 22, 2010 - March 24, 2010 Stanford, CA, United states
 * 2010
 * {{hidden||When searching a knowledge repository such as Wikipedia or the Internet, the user doesn't always know what they are looking for. Indeed, it is often the case that a user wishes to find information about a concept that was completely unknown to them prior to the search. In this paper we describe C-Link, which provides the user with a method for searching for unknown concepts which lie between two known concepts. C-Link does this by modeling the knowledge repository as a weighted, directed graph where nodes are concepts and arc weights give the degree of "relatedness" between concepts. An experimental study was undertaken with 59 participants to investigate the performance of C-Link compared to standard search approaches. Statistical analysis of the results shows great potential for C-Link as a search tool. © 2009 Association for the Advancement of Artificial Intelligence.}}

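The C-Link abstract describes searching for unknown concepts that lie between two known ones on a weighted, directed concept graph. A minimal sketch of that idea, assuming shortest-path search over invented relatedness weights (the paper's actual graph construction and ranking are more involved):

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: weight}}; returns the node
    sequence from src to dst (lower weight = more related)."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the destination.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy concept graph with invented weights.
graph = {
    "Beethoven": {"Symphony": 1.0, "Vienna": 2.0},
    "Symphony":  {"Orchestra": 1.0},
    "Vienna":    {"Orchestra": 2.5},
    "Orchestra": {},
}
print(shortest_path(graph, "Beethoven", "Orchestra"))
# ['Beethoven', 'Symphony', 'Orchestra']
```

The interior nodes of the cheapest path ("Symphony" here) are the previously unknown in-between concepts such a system would surface to the user.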

 * -- align="left" valign=top
 * Huang, Anna; Milne, David; Frank, Eibe & Witten, Ian H.
 * Clustering documents using a wikipedia-based concept representation
 * 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009, April 27, 2009 - April 30, 2009 Bangkok, Thailand
 * 2009
 * 
 * {{hidden||This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We also developed a similarity measure that evaluates the semantic relatedness between concept sets for two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that although further optimizations could be performed, our approach already improves upon related techniques. © Springer-Verlag Berlin Heidelberg 2009.}}


 * -- align="left" valign=top
 * Huang, Anna; Milne, David; Frank, Eibe & Witten, Ian H.
 * Clustering documents with active learning using wikipedia
 * 8th IEEE International Conference on Data Mining, ICDM 2008, December 15, 2008 - December 19, 2008 Pisa, Italy
 * 2008
 * 


 * -- align="left" valign=top
 * Banerjee, Somnath; Ramanathan, Krishnan & Gupta, Ajay
 * Clustering short texts using wikipedia
 * 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07, July 23, 2007 - July 27, 2007 Amsterdam, Netherlands
 * 2007
 * 


 * -- align="left" valign=top
 * Emigh, William & Herring, Susan C.
 * Collaborative Authoring on the Web: A Genre Analysis of Online Encyclopedias
 * Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 4 - Volume 04
 * 2005
 * 


 * -- align="left" valign=top
 * Ahmadi, Navid; Repenning, Alexander & Ioannidou, Andri
 * Collaborative end-user development on handheld devices
 * 2008 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2008, September 15, 2008 - September 19, 2008 Herrsching am Ammersee, Germany
 * 2008
 * 
 * {{hidden||Web 2.0 has enabled end users to collaborate through their own developed artifacts, moving on from text (e.g., Wikipedia, Blogs) to images (e.g., Flickr) and movies (e.g., YouTube), changing the end-user's role from consumer to producer. But still there is no support for collaboration through interactive end-user developed artifacts, especially for emerging handheld devices, which are the next collaborative platform. Featuring fast always-on networks, Web browsers that are as powerful as their desktop counterparts, and innovative user interfaces, the newest generation of handheld devices can run highly interactive content as Web applications. We have created Ristretto Mobile, a Web-compliant framework for running end-user developed applications on handheld devices. The Web-based Ristretto Mobile includes compiler and runtime components to turn end-user applications into Web applications that can run on compatible handheld devices, including the Apple iPhone and Nokia N800. Our paper reports on the technological and cognitive challenges in creating interactive content that runs efficiently and is user accessible on handheld devices.}}


 * -- align="left" valign=top
 * Shieh, Jyh-Ren; Yeh, Yang-Ting; Lin, Chih-Hung; Lin, Ching-Yung & Wu, Ja-Ling
 * Collaborative knowledge semantic graph image search
 * 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||In this paper, we propose a Collaborative Knowledge Semantic Graphs Image Search (CKSGIS) system. It provides a novel way to conduct image search by utilizing the collaborative nature in Wikipedia and by performing network analysis to form semantic graphs for search-term expansion. The collaborative article editing process used by Wikipedia's contributors is formalized as bipartite graphs that are folded into networks between terms. When a user types in a search term, CKSGIS automatically retrieves an interactive semantic graph of related terms that allows users to easily find related images not limited to a specific search term. The interactive semantic graph then serves as an interface to retrieve images through existing commercial search engines. This method significantly saves users' time by avoiding multiple search keywords that are usually required in generic search engines. It benefits both naive users who do not possess a large vocabulary and professionals who look for images on a regular basis. In our experiments, 85% of the participants favored the CKSGIS system rather than commercial search engines.}}


 * -- align="left" valign=top
 * Kulkarni, Sayali; Singh, Amit; Ramakrishnan, Ganesh & Chakrabarti, Soumen
 * Collective annotation of wikipedia entities in web text
 * 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, June 28, 2009 - July 1, 2009 Paris, France
 * 2009
 * 
 * {{hidden||To take the first step beyond keyword-based search toward entity-based search, suitable token spans ("spots") on documents must be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They are largely based on local compatibility between the text around the spot and textual metadata associated with the entity. Two recent systems exploit inter-label dependencies, but in limited ways. We propose a general collective disambiguation approach. Our premise is that coherent documents refer to entities from one or a few related topics or domains. We give formulations for the trade-off between local spot-to-entity compatibility and measures of global coherence between entities. Optimizing the overall entity assignment is NP-hard. We investigate practical solutions based on local hill-climbing, rounding integer linear programs, and pre-clustering entities followed by local optimization within clusters. In experiments involving over a hundred manually annotated Web pages and tens of thousands of spots, our approaches significantly outperform recently-proposed algorithms.}}


 * -- align="left" valign=top
 * Yao, Limin; Riedel, Sebastian & McCallum, Andrew
 * Collective cross-document relation extraction without labelled data
 * Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
 * 2010
 * 
 * {{hidden||We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an in-domain (Wikipedia) and a more realistic out-of-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.}}


 * -- align="left" valign=top
 * Vibber, Brion
 * Community performance optimization: making your people run as smoothly as your site
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 
 * {{hidden||Collaborative communities such as those building wikis and open source software often discover that their human interactions have just as many scaling problems as their web infrastructure. As the number of people involved in a project grows, key decision-makers often become bottlenecks, and community structure needs to change or a project can become stalled despite the best intentions of all participants. I'll describe some of the community scaling challenges in both Wikipedia's editor community and the development of its underlying MediaWiki software, and how we've overcome -- or are still working to overcome -- decision-making bottlenecks to maximize community "throughput".}}


 * -- align="left" valign=top
 * He, Jinru; Yan, Hao & Suel, Torsten
 * Compact full-text indexing of versioned document collections
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 


 * -- align="left" valign=top
 * Zesch, Torsten; Gurevych, Iryna & Mühlhäuser, Max
 * Comparing Wikipedia and German wordnet by evaluating semantic relatedness on multiple datasets
 * NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
 * 2007
 * 


 * -- align="left" valign=top
 * West, Robert; Precup, Doina & Pineau, Joelle
 * Completing Wikipedia's hyperlink structure through dimensionality reduction
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 


 * -- align="left" valign=top
 * Zhang, Bingjun; Xiang, Qiaoliang; Lu, Huanhuan; Shen, Jialie & Wang, Ye
 * Comprehensive query-dependent fusion using regression-on-folksonomies: a case study of multimodal music search
 * Proceedings of the seventeen ACM international conference on Multimedia
 * 2009
 * 
 * {{hidden||The combination of heterogeneous knowledge sources has been widely regarded as an effective approach to boost retrieval accuracy in many information retrieval domains. While various technologies have been recently developed for information retrieval, multimodal music search has not kept pace with the enormous growth of data on the Internet. In this paper, we study the problem of integrating multiple online information sources to conduct effective query dependent fusion (QDF) of multiple search experts for music retrieval. We have developed a novel framework to construct a knowledge space of users' information need from online folksonomy data. With this innovation, a large number of comprehensive queries can be automatically constructed to train a better generalized QDF system against unseen user queries. In addition, our framework models the QDF problem by regression of the optimal combination strategy on a query. Distinguished from the previous approaches, the regression model of QDF (RQDF) offers superior modeling capability with fewer constraints and more efficient computation. To validate our approach, a large scale test collection has been collected from different online sources, such as Last.fm, Wikipedia, and YouTube. All test data will be released to the public for better research synergy in multimodal music search. Our performance study indicates that the accuracy, efficiency, and robustness of multimodal music search can be improved significantly by the proposed Folksonomy-RQDF approach. In addition, since no human involvement is required to collect training examples, our approach offers great feasibility and practicality in system development.}}


 * -- align="left" valign=top
 * Gabrilovich, Evgeniy & Markovitch, Shaul
 * Computing semantic relatedness using Wikipedia-based explicit semantic analysis
 * Proceedings of the 20th international joint conference on Artifical intelligence
 * 2007
 * 
 * {{hidden||Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.}}

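The ESA pipeline the abstract describes (text → weighted vector of Wikipedia concepts → cosine comparison) can be sketched in a few lines. The word-to-concept index below is invented for illustration; in ESA proper it comes from TF-IDF weights over millions of Wikipedia articles.

```python
import math

def esa_vector(text, inverted_index):
    """Sum per-word concept weights into one sparse concept vector."""
    vec = {}
    for word in text.lower().split():
        for concept, w in inverted_index.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + w
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy word -> {Wikipedia concept: weight} index (hypothetical weights).
index = {
    "mouse":    {"Computer mouse": 0.9, "Mouse (animal)": 0.8},
    "keyboard": {"Computer keyboard": 0.9, "Computer mouse": 0.3},
    "cheese":   {"Cheese": 1.0, "Mouse (animal)": 0.2},
}

print(cosine(esa_vector("mouse keyboard", index),
             esa_vector("mouse cheese", index)))
```

Because both texts activate concepts for the ambiguous word "mouse", the score is positive but well below 1, which is the behavior the concept space is meant to capture.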

 * -- align="left" valign=top
 * Egozi, Ofer; Gabrilovich, Evgeniy & Markovitch, Shaul
 * Concept-based feature generation and selection for information retrieval
 * Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
 * 2008
 * 
 * {{hidden||Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based feature generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High quality feature selection is necessary to maintain high precision, but here we do not have the labeled training data for evaluating features that we have in supervised learning. We present a new feature selection method that is inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.}}


 * -- align="left" valign=top
 * Chang, Jonathan; Boyd-Graber, Jordan & Blei, David M.
 * Connections between the lines: augmenting social networks with text
 * Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
 * 2009
 * 


 * -- align="left" valign=top
 * Wilkinson, Dennis M. & Huberman, Bernardo A.
 * Cooperation and quality in Wikipedia
 * 2007 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA - 2007 International Symposium on Wikis, WikiSym, October 21, 2007 - October 25, 2007 Montreal, QC, Canada
 * 2007
 * 


 * -- align="left" valign=top
 * Krieger, Michel; Stark, Emily Margarete & Klemmer, Scott R.
 * Coordinating tasks on the commons: designing for personal goals, expertise and serendipity
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 


 * -- align="left" valign=top
 * Rossi, Alessandro; Gaio, Loris; Besten, Matthijs Den & Dalle, Jean-Michel
 * Coordination and division of labor in open content communities: The role of template messages in Wikipedia
 * 43rd Annual Hawaii International Conference on System Sciences, HICSS-43, January 5, 2010 - January 8, 2010 Koloa, Kauai, {HI, United states
 * 2010
 * 
 * {{hidden||Though largely spontaneous and loosely regulated, the process of peer production within online communities is also supplemented by additional coordination mechanisms. In this respect, we study an emergent organizational practice of the Wikipedia community, the use of template messages, which seems to act as an effective and parsimonious coordination device to signal quality concerns or other issues that need to be addressed. We focus on the "NPOV" template, which signals breaches of the fundamental policy of neutrality of Wikipedia articles, and we show how and to what extent placing such a template on a page affects the editing process. We notably find that the intensity of editing increases immediately after the "NPOV" template appears, and that controversies about articles which received the attention of a more limited group of editors before being tagged as controversial have a lower chance of being treated quickly.}}


 * -- align="left" valign=top
 * Kittur, Aniket; Lee, Bryant & Kraut, Robert E.
 * Coordination in collective intelligence: the role of team structure and task interdependence
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 


 * -- align="left" valign=top
 * Jankowski, Jacek
 * Copernicus: 3D Wikipedia
 * ACM SIGGRAPH 2008 Posters 2008, SIGGRAPH'08, August 11, 2008 - August 15, 2008 Los Angeles, CA, United states
 * 2008
 * 
 * {{hidden||In this paper we present one of the potential paths of the evolution of Wikipedia towards Web 3.0. We introduce Copernicus - The Virtual 3D Encyclopedia, which was built according to the 2-Layer Interface Paradigm (2LIP). The background layer of the 2LIP-type user interface is a 3D scene, which a user cannot directly interact with. The foreground layer is HTML content. Only taking an action on this content (e.g. pressing a hyperlink) can affect the 3D scene.}}


 * -- align="left" valign=top
 * Zhao, Shubin & Betz, Jonathan
 * Corroborate and learn facts from the web
 * Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
 * 2007
 * 


 * -- align="left" valign=top
 * Sato, Satoshi
 * Crawling English-Japanese person-name transliterations from the web
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 
 * {{hidden||Automatic compilation of lexicons is a dream of lexicon compilers as well as lexicon users. This paper proposes a system that crawls English-Japanese person-name transliterations from the Web, which works as a back-end collector for automatic compilation of a bilingual person-name lexicon. Our crawler collected 561K transliterations in five months. From them, an English-Japanese person-name lexicon with 406K entries has been compiled by automatic post-processing. This lexicon is much larger than other similar resources, including the English-Japanese lexicon of HeiNER obtained from Wikipedia.}}


 * -- align="left" valign=top
 * Lindsay, Brooks
 * Creating the "Wikipedia of pros and cons"
 * 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 
 * {{hidden||Debatepedia founder Brooks Lindsay will host a panel focusing on projects and individuals attempting to build what amounts to "the Wikipedia of debates" or "the Wikipedia of pros and cons". The panel will bring together Debatepedia founder Brooks Lindsay, Debatewise founder David Crane, Opposing Views founder Russell Fine, and ProCon.org editor Kambiz Akhavan. We will discuss our successes and failures over the past three years and the way forward for clarifying public debates via wiki and other technologies.}}


 * -- align="left" valign=top
 * Shachaf, Pnina & Hara, Noriko
 * Cross-Cultural Analysis of the Wikipedia Community
 * 2009
 * 
 * {{hidden||This paper reports a cross-cultural analysis of Wikipedia communities of practice (CoPs). First, this paper argues that Wikipedia communities can be analyzed and understood as CoPs. Second, the similarities and differences in norms of behavior across three different languages (English, Hebrew, and Japanese) and on three types of discussion spaces (Talk, User Talk, and Wikipedia Talk) are identified. These are explained by Hofstede's dimensions of cultural diversity, the size of the community, and the role of each discussion area. This paper expands the research on online CoPs, which has not performed in-depth examinations of cultural variations across multiple languages.}}


 * -- align="left" valign=top
 * Roth, Benjamin & Klakow, Dietrich
 * Cross-language retrieval using link-based language models
 * 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland
 * 2010
 * 
 * {{hidden||We propose a cross-language retrieval model that is solely based on Wikipedia as a training corpus. The main contributions of our work are: 1. A translation model based on linked text in Wikipedia and a term weighting method associated with it. 2. A combination scheme to interpolate the link translation model with retrieval based on Latent Dirichlet Allocation. On the CLEF 2000 data we achieve improvement with respect to the best German-English system at the bilingual track (non-significant) and improvement against a baseline based on machine translation (significant).}}


 * -- align="left" valign=top
 * Hassan, Samer & Mihalcea, Rada
 * Cross-lingual semantic relatedness using encyclopedic knowledge
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
 * 2009
 * 


 * -- align="left" valign=top
 * Potthast, Martin
 * Crowdsourcing a Wikipedia vandalism corpus
 * 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland
 * 2010
 * 
 * {{hidden||We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus compiles 32,452 edits on 28,468 Wikipedia articles, among which 2,391 vandalism edits have been identified. 753 human annotators cast a total of 193,022 votes on the edits, so that each edit was reviewed by at least 3 annotators, and the achieved level of agreement was analyzed in order to label an edit as "regular" or "vandalism". The corpus is available free of charge.}}


 * -- align="left" valign=top
 * Amer-Yahia, Sihem; Markl, Volker; Halevy, Alon; Doan, AnHai; Alonso, Gustavo; Kossmann, Donald & Weikum, Gerhard
 * Databases and Web 2.0 panel at VLDB 2007
 * 2008
 * 
 * {{hidden||Web 2.0 refers to a set of technologies that enables individuals to create and share content on the Web. The types of content that are shared on Web 2.0 are quite varied and include photos and videos (e.g., Flickr, YouTube), encyclopedic knowledge (e.g., Wikipedia), the blogosphere, social bookmarking, and even structured data (e.g., Swivel, Many-eyes). One of the important distinguishing features of Web 2.0 is the creation of communities of users. Online communities such as LinkedIn, Friendster, Facebook, MySpace, and Orkut attract millions of users who build networks of their contacts and utilize them for social and professional purposes. In a nutshell, Web 2.0 offers an architecture of participation and democracy that encourages users to add value to the application as they use it.}}


 * -- align="left" valign=top
 * Nastase, Vivi & Strube, Michael
 * Decoding Wikipedia categories for knowledge acquisition
 * 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, July 13, 2008 - July 17, 2008 Chicago, IL, United states
 * 2008
 * {{hidden||This paper presents an approach to acquiring knowledge from Wikipedia categories and the category network. Many Wikipedia categories have complex names which reflect how humans classify and organize instances, and thus encode knowledge about class attributes, taxonomic relations, and other semantic relations. We decode the names and refer back to the network to induce relations between concepts in Wikipedia represented through pages or categories. The category structure allows us to propagate a relation detected between the constituents of a category name to numerous concept links. The results of the process are evaluated against ResearchCyc, and a subset also by human judges. The results support the idea that Wikipedia category names are a rich source of useful and accurate knowledge.}}


 * -- align="left" valign=top
 * Grishchenko, Victor
 * Deep hypertext with embedded revision control implemented in regular expressions
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland
 * 2010
 * 
 * {{hidden||While text versioning was definitely a part of the original hypertext concept [21, 36, 44], it is rarely considered in this context today. Still, we know that revision control underlies the most exciting social co-authoring projects of today's Internet, namely Wikipedia and the Linux kernel. With the intention of adapting advanced revision control technologies and practices to the conditions of the Web, the paper reconsiders some obsolete assumptions and develops a new versioned text format perfectly processable with standard regular expressions (PCRE [6]). The resulting deep hypertext model allows instant access to past/concurrent versions, authorship, and changes, and enables deep links that reference changing parts of a changing text. Effectively, it allows distributed and real-time revision control on the Web, implementing the vision of co-evolution and mutation exchange among multiple competing versions of the same text.}}


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Strube, Michael
 * Deriving a large scale taxonomy from Wikipedia
 * AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada
 * 2007


 * -- align="left" valign=top
 * Arazy, Ofer & Nov, Oded
 * Determinants of wikipedia quality: The roles of global and local contribution inequality
 * 2010 ACM Conference on Computer Supported Cooperative Work, CSCW 2010, February 6, 2010 - February 10, 2010 Savannah, GA, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Wilson, Shomir
 * Distinguishing use and mention in natural language
 * Proceedings of the NAACL HLT 2010 Student Research Workshop
 * 2010
 * 


 * -- align="left" valign=top
 * Rafiei, Davood; Bharat, Krishna & Shukla, Anand
 * Diversifying web search results
 * 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Reagle, Joseph M., Jr.
 * Do as I do: Authorial leadership in Wikipedia
 * 2007 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA - 2007 International Symposium on Wikis, WikiSym, October 21, 2007 - October 25, 2007 Montreal, QC, Canada
 * 2007
 * 
 * {{hidden||In seemingly egalitarian collaborative on-line communities, like Wikipedia, there is often a paradoxical, or perhaps merely playful, use of the title "Benevolent Dictator" for leaders. I explore discourse around the use of this title so as to address how leadership works in open content communities. I first review existing literature on "emergent leadership" and then relate excerpts from community discourse on how leadership is understood, performed, and discussed by Wikipedians. I conclude by integrating concepts from the existing literature and my own findings into a theory of "authorial" leadership.}}


 * -- align="left" valign=top
 * Stein, Klaus & Hess, Claudia
 * Does it matter who contributes - A study on featured articles in the German Wikipedia
 * Hypertext 2007: 18th ACM Conference on Hypertext and Hypermedia, HT'07, September 10, 2007 - September 12, 2007 Manchester, United kingdom
 * 2007
 * 


 * -- align="left" valign=top
 * U, Leong Hou; Mamoulis, Nikos; Berberich, Klaus & Bedathur, Srikanta
 * Durable top-k search in document archives
 * 2010 International Conference on Management of Data, SIGMOD '10, June 6, 2010 - June 11, 2010 Indianapolis, IN, United states
 * 2010
 * 
 * {{hidden||We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.}}


 * -- align="left" valign=top
 * Sinclair, Patrick A. S.; Martinez, Kirk & Lewis, Paul H.
 * Dynamic link service 2.0: Using wikipedia as a linkbase
 * Hypertext 2007: 18th ACM Conference on Hypertext and Hypermedia, HT'07, September 10, 2007 - September 12, 2007 Manchester, United kingdom
 * 2007
 * 


 * -- align="left" valign=top
 * Nakatani, Makoto; Jatowt, Adam & Tanaka, Katsumi
 * Easiest-first search: Towards comprehension-based web search
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 


 * -- align="left" valign=top
 * Grineva, Maria; Grinev, Maxim & Lizorkin, Dmitry
 * Effective extraction of thematically grouped key terms from text
 * Social Semantic Web: Where Web 2.0 Meets Web 3.0 - Papers from the AAAI Spring Symposium, March 23, 2009 - March 25, 2009 Stanford, CA, United states
 * 2009


 * -- align="left" valign=top
 * Keegan, Brian & Gergle, Darren
 * Egalitarians at the gate: One-sided gatekeeping practices in social media
 * 2010 ACM Conference on Computer Supported Cooperative Work, CSCW 2010, February 6, 2010 - February 10, 2010 Savannah, GA, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Carmel, David; Roitman, Haggai & Zwerdling, Naama
 * Enhancing cluster labeling using wikipedia
 * 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, July 19, 2009 - July 23, 2009 Boston, MA, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Hu, Jian; Fang, Lujun; Cao, Yang; Zeng, Hua-Jun; Li, Hua; Yang, Qiang & Chen, Zheng
 * Enhancing text clustering by leveraging wikipedia semantics
 * 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008, July 20, 2008 - July 24, 2008 Singapore, Singapore
 * 2008
 * 
 * {{hidden||Most traditional text clustering methods are based on a "bag of words" (BOW) representation built from frequency statistics in a set of documents. BOW, however, ignores important information on the semantic relationships between key terms. To overcome this problem, several methods have been proposed to enrich text representation with external resources, such as WordNet. However ...}}


 * -- align="left" valign=top
 * Sorg, Philipp & Cimiano, Philipp
 * Enriching the crosslingual link structure of wikipedia - A classification-based approach
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008
 * {{hidden||The crosslingual link structure of Wikipedia represents a valuable resource which can be exploited for crosslingual natural language processing applications. However, this requires that it has reasonable coverage and is furthermore accurate. For the specific language pair German/English that we consider in our experiments, we show that roughly 50% of the articles are linked from German to English and only 14% from English to German. These figures clearly corroborate the need for an approach to automatically induce new cross-language links, especially in light of such a dynamically growing resource as Wikipedia. In this paper we present a classification-based approach with the goal of inferring new cross-language links. Our experiments show that this approach has a recall of 70% with a precision of 94% for the task of learning cross-language links on a test dataset.}}


 * -- align="left" valign=top
 * Pennacchiotti, Marco & Pantel, Patrick
 * Entity extraction via ensemble semantics
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
 * 2009
 * 


 * -- align="left" valign=top
 * Xu, Yang; Ding, Fan & Wang, Bin
 * Entity-based query reformulation using Wikipedia
 * 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states
 * 2008
 * 
 * {{hidden||Many real-world applications increasingly involve both structured data and text, and entity-based retrieval is an important problem in this realm. In this paper, we present an automatic query reformulation approach based on entities detected in each query. The aim is to utilize the semantics associated with entities to enhance document retrieval. This is done by expanding a query with terms/phrases related to the entities in the query. We exploit Wikipedia as a large repository of entity information. Our reformulation approach consists of three major steps: (1) detect the representative entity in a query; (2) expand the query with entity-related terms/phrases; and (3) incorporate term dependency features. We evaluate our approach on the ad-hoc retrieval task on four TREC collections, including two large web collections. Experimental results show that significant improvement is possible by utilizing information corresponding to entities.}}
 * {{hidden||Many real world applications increasingly involve both structured data and text, and entity based retrieval is an important problem in this realm. In this paper, we present an automatic query reformulation approach based on entities detected in each query. The aim is to utilize semantics associated with entities for enhancing document retrieval. This is done by expanding a query with terms/phrases related to entities in the query. We exploit Wikipedia as a large repository of entity information. Our reformulated approach consists of three major steps : (1) detect representative entity in a query; (2) expand the query with entity related terms/phrases; and (3) facilitate term dependency features. We evaluate our approach in ad-hoc retrieval task on four TREC} collections, including two large web collections. Experiments results show that significant improvement is possible by utilizing entity corresponding information.}}


 * -- align="left" valign=top
 * Bast, Holger; Chitea, Alexandru; Suchanek, Fabian & Weber, Ingmar
 * ESTER: Efficient search on text, entities, and relations
 * 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07, July 23, 2007 - July 27, 2007 Amsterdam, Netherlands
 * 2007
 * 
 * {{hidden||We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB.}}


 * -- align="left" valign=top
 * Moturu, Sai T. & Liu, Huan
 * Evaluating the trustworthiness of Wikipedia articles through quality and credibility
 * 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Cimiano, Philipp; Schultz, Antje; Sizov, Sergej; Sorg, Philipp & Staab, Steffen
 * Explicit versus latent concept models for cross-language information retrieval
 * Proceedings of the 21st international jont conference on Artifical intelligence
 * 2009
 * 
 * {{hidden||The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-words based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such as WordNet, latent topics derived from the data itself - as in Latent Semantic Indexing (LSI) or Latent Dirichlet Allocation (LDA) - to Wikipedia articles as proxies for concepts, as in the recently proposed Explicit Semantic Analysis (ESA) model. A crucial question which has not been answered so far is whether models based on explicitly given concepts (as in the ESA model, for instance) perform inherently better than retrieval models based on "latent" concepts (as in LSI and/or LDA). In this paper we investigate this question more closely in the context of a cross-language setting, which inherently requires concept-based retrieval bridging between different languages. In particular, we compare the recently proposed ESA model with two latent models (LSI and LDA), showing that the former is clearly superior to both. From a general perspective, our results contribute to clarifying the role of explicit vs. implicitly derived or latent concepts in (cross-language) information retrieval research.}}


 * -- align="left" valign=top
 * Sánchez, Liliana Mamani; Li, Baoli & Vogel, Carl
 * Exploiting CCG structures with tree kernels for speculation detection
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||Our CoNLL-2010 speculative sentence detector disambiguates putative keywords based on the following considerations: a speculative keyword may be composed of one or more word tokens; a speculative sentence may have one or more speculative keywords; and if a sentence contains at least one real speculative keyword, it is deemed speculative. A tree kernel classifier is used to assess whether a potential speculative keyword conveys speculation. We exploit information implicit in tree structures. For prediction efficiency, only a segment of the whole tree around a speculation keyword is considered, along with morphological features inside the segment and information about the containing document. A maximum entropy classifier is used for sentences not covered by the tree kernel classifier. Experiments on the Wikipedia data set show that our system achieves 0.55 F-measure (in-domain).}}


 * -- align="left" valign=top
 * Billerbeck, Bodo; Demartini, Gianluca; Firan, Claudiu S.; Iofciu, Tereza & Krestel, Ralf
 * Exploiting click-through data for entity retrieval
 * 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland
 * 2010
 * 
 * {{hidden||We present an approach for answering Entity Retrieval queries using click-through information in query log data from a commercial Web search engine. We compare results using click graphs and session graphs and present an evaluation test set making use of Wikipedia "List of" pages.}}


 * -- align="left" valign=top
 * Hu, Xia; Sun, Nan; Zhang, Chao & Chua, Tat-Seng
 * Exploiting internal and external semantics for the clustering of short texts using world knowledge
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 
 * {{hidden||Clustering of short texts, such as snippets, presents great challenges in existing aggregated search techniques due to the problem of data sparseness and the complex semantics of natural language. As short texts do not provide sufficient term occurrence information, traditional text representation methods, such as the "bag of words" model, have several limitations when directly applied to short text tasks. In this paper we propose a novel framework to improve the performance of short text clustering by exploiting the internal semantics from the original text and external concepts from world knowledge. The proposed method employs a hierarchical three-level structure to tackle the data sparsity problem of original short texts and reconstruct the corresponding feature space with the integration of multiple semantic knowledge bases - Wikipedia and WordNet. Empirical evaluation with Reuters and real web datasets demonstrates that our approach is able to achieve significant improvement as compared to the state-of-the-art methods.}}


 * -- align="left" valign=top
 * Pehcevski, Jovan; Vercoustre, Anne-Marie & Thom, James A.
 * Exploiting locality of wikipedia links in entity ranking
 * 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom
 * 2008
 * 
 * {{hidden||Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In this paper, we describe our approach to ranking entities from the Wikipedia XML document collection. Our approach utilises the known categories and the link structure of Wikipedia, and more importantly, exploits link co-occurrences to improve the effectiveness of entity ranking. Using the broad context of a full Wikipedia page as a baseline, we evaluate two different algorithms for identifying narrow contexts around the entity examples: one that uses predefined types of elements such as paragraphs, lists and tables; and another that dynamically identifies the contexts by utilising the underlying XML document structure. Our experiments demonstrate that the locality of Wikipedia links can be exploited to significantly improve the effectiveness of entity ranking.}}


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Strube, Michael
 * Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution
 * Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
 * 2006
 * 
 * {{hidden||In this paper we present an extension of a machine learning based coreference resolution system which uses features induced from different semantic knowledge sources. These features represent knowledge mined from WordNet and Wikipedia, as well as information about semantic role labels. We show that semantic features indeed improve the performance on different referring expression types such as pronouns and common nouns.}}


 * -- align="left" valign=top
 * Milne, David N.
 * Exploiting web 2.0 for all knowledge-based information retrieval
 * Proceedings of the ACM first Ph.D. workshop in CIKM
 * 2007
 * 


 * -- align="left" valign=top
 * Hu, Xiaohua; Zhang, Xiaodan; Lu, Caimei; Park, E.K. & Zhou, Xiaohua
 * Exploiting wikipedia as external knowledge for document clustering
 * 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, June 28, 2009 - July 1, 2009 Paris, France
 * 2009
 * 
 * {{hidden||In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different collections of core words to represent the same topic, they may be falsely assigned to different clusters due to the lack of shared core words, although the core words they use are probably synonyms or semantically associated in other forms. The most common way to solve this problem is to enrich document representation with the background knowledge in an ontology. There are two major issues for this approach: (1) the coverage of the ontology is limited, even for WordNet or MeSH}}


 * -- align="left" valign=top
 * Winter, Judith
 * Exploiting XML structure to improve information retrieval in peer-to-peer systems
 * Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
 * 2008
 * 
 * {{hidden||With the advent of XML as a standard for representation and exchange of structured documents, a growing amount of XML documents are being stored in Peer-to-Peer (P2P) networks. Current research on P2P search engines proposes the use of Information Retrieval (IR) techniques to perform content-based search, but does not take into account structural features of documents. P2P systems typically have no central index, thus avoiding single points of failure, but distribute all information among participating peers. Accordingly, a querying peer has only limited access to the index information and should select carefully which peers can help answering a given query by contributing resources such as local index information or CPU time for ranking computations. Bandwidth consumption is a major issue. To guarantee scalability, P2P systems have to reduce the number of peers involved in the retrieval process. As a result, the retrieval quality in terms of recall and precision may suffer substantially. In the proposed thesis, document structure is considered as an extra source of information to improve the retrieval quality of XML documents in a P2P environment. The thesis centres on the following questions: how can structural information help to improve the retrieval of XML documents in terms of result quality such as precision, recall, and specificity? Can XML structure support the routing of queries in distributed environments, especially the selection of promising peers? How can XML IR techniques be used in a P2P network while minimizing bandwidth consumption and considering performance aspects? 
To answer these questions and to analyze possible achievements, a search engine is proposed that exploits structural hints expressed explicitly by the user or implicitly by the self-describing structure of XML documents. Additionally, more focused and specific results are obtained by providing ranked retrieval units that can be either XML documents as a whole or the most relevant passages of these documents. XML information retrieval techniques are applied in two ways: to select those peers participating in the retrieval process, and to compute the relevance of documents. The indexing approach includes both content and structural information of documents. To support efficient execution of multi-term queries, index keys consist of rare combinations of (content, structure)-tuples. Performance is increased by using only fixed-sized posting lists: frequent index keys are combined with each other iteratively until the new combination is rare, with a posting list size under a pre-set threshold. All posting lists are sorted by taking into account classical IR measures such as term frequency and inverted term frequency, as well as weights for potential retrieval units of a document, with a slight bias towards documents on peers with good collections regarding the current index key and with good peer characteristics such as online times, available bandwidth, and latency. When extracting the posting list for a specific query, a re-ordering of the posting list is performed that takes into account the structural similarity between key and query. According to this pre-ranking, peers are selected that are expected to hold information about potentially relevant documents and retrieval units. The final ranking is computed in parallel on those selected peers. The computation is based on an extension of the vector space model and distinguishes between weights for different structures of the same content. 
This allows weighting XML elements with respect to their discriminative power, e.g. a title will be weighted much higher than a footnote. Additionally, relevance is computed as a mixture of content relevance and structural similarity between a given query and a potential retrieval unit. Currently, a first prototype for P2P Information Retrieval of XML documents, called SPIRIX, is being implemented. Experiments to evaluate the proposed techniques and use of structural hints will be performed on a distributed version of the INEX Wikipedia Collection.}}


 * -- align="left" valign=top
 * Grineva, Maria; Grinev, Maxim & Lizorkin, Dmitry
 * Extracting key terms from noisy and multitheme documents
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 


 * -- align="left" valign=top
 * Shnarch, Eyal; Barak, Libby & Dagan, Ido
 * Extracting lexical reference rules from Wikipedia
 * Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
 * 2009
 * 
 * {{hidden||This paper describes the extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms. We present extraction methods geared to cover the broad range of the lexical reference relation and analyze them extensively. Most extraction methods yield high precision levels, and our rule-base is shown to perform better than other automatically constructed baselines in a couple of lexical expansion and matching tasks. Our rule-base yields comparable performance to WordNet while providing largely complementary information.}}


 * -- align="left" valign=top
 * Davidov, Dmitry & Rappoport, Ari
 * Extraction and approximation of numerical attributes from the Web
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.}}


 * -- align="left" valign=top
 * Pasca, Marius
 * Extraction of open-domain class attributes from text: building blocks for faceted search
 * Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval
 * 2010
 * 


 * -- align="left" valign=top
 * Li, Chengkai; Yan, Ning; Roy, Senjuti B.; Lisham, Lekhendro & Das, Gautam
 * Facetedpedia: Dynamic generation of query-dependent faceted interfaces for Wikipedia
 * 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states
 * 2010
 * 


 * -- align="left" valign=top
 * Kummerfeld, Jonathan K.; Roesner, Jessika; Dawborn, Tim; Haggerty, James; Curran, James R. & Clark, Stephen
 * Faster parsing by supertagger adaptation
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG} supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG} parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.}}


 * -- align="left" valign=top
 * Kuhlman, C.J.; Kumar, V.S.A.; Marathe, M.V.; Ravi, S.S. & Rosenkrantz, D.J.
 * Finding critical nodes for inhibiting diffusion of complex contagions in social networks
 * Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2010, 20-24 Sept. 2010 Berlin, Germany
 * 2010
 * 
 * {{hidden||We study the problem of inhibiting diffusion of complex contagions such as rumors, undesirable fads and mob behavior in social networks by removing a small number of nodes (called critical nodes) from the network. We show that, in general, for any p > 1, even obtaining a p-approximate solution to these problems is NP-hard. We develop efficient heuristics for these problems and carry out an empirical study of their performance on three well-known social networks, namely Epinions, Wikipedia and Slashdot. Our results show that the heuristics perform well on the three social networks.}}


 * -- align="left" valign=top
 * Ganter, Viola & Strube, Michael
 * Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features
 * Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
 * 2009
 * 


 * -- align="left" valign=top
 * Ollivier, Yann & Senellart, Pierre
 * Finding related pages using green measures: an illustration with Wikipedia
 * AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada
 * 2007


 * -- align="left" valign=top
 * Giuliano, Claudio
 * Fine-grained classification of named entities exploiting latent semantic kernels
 * Proceedings of the Thirteenth Conference on Computational Natural Language Learning
 * 2009
 * 


 * -- align="left" valign=top
 * Zimmer, Christian; Bedathur, Srikanta & Weikum, Gerhard
 * Flood little, cache more: effective result-reuse in P2P IR systems
 * Proceedings of the 13th international conference on Database systems for advanced applications
 * 2008
 * 
 * {{hidden||State-of-the-art Peer-to-Peer Information Retrieval (P2P IR) systems suffer from their lack of response time guarantee, especially with scale. To address this issue, a number of techniques for caching of multi-term inverted list intersections and query results have been proposed recently. Although these enable speedy query evaluations with low network overheads, they fail to consider the potential impact of caching on result quality improvements. In this paper, we propose the use of a cache-aware query routing scheme that not only reduces the response delays for a query, but also presents an opportunity to improve the result quality while keeping the network usage low. In this regard, we make threefold contributions in this paper. First of all, we develop a cache-aware, multi-round query routing strategy that balances between query efficiency and result quality. Next, we propose to aggressively reuse the cached results of even subsets of a query towards an approximate caching technique that can drastically reduce the bandwidth overheads, and study the conditions under which such a scheme can retain good result quality. Finally, we empirically evaluate these techniques over a fully functional P2P IR system, using a large-scale Wikipedia benchmark, and using both synthetic and real-world query workloads. Our results show that our proposal to combine result caching with multi-round, cache-aware query routing can reduce network traffic by more than half while doubling the result quality.}}


 * -- align="left" valign=top
 * Lee, Kangpyo; Kim, Hyunwoo; Jang, Chungsu & Kim, Hyoung-Joo
 * FolksoViz: A subsumption-based folksonomy visualization using Wikipedia texts
 * 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||In this paper, targeting del.icio.us tag data, we propose a method, FolksoViz, for deriving subsumption relationships between tags by using Wikipedia texts, and visualizing a folksonomy. To fulfill this method, we propose a statistical model for deriving subsumption relationships based on the frequency of each tag in the Wikipedia texts, as well as the TSD (Tag Sense Disambiguation) method for mapping each tag to a corresponding Wikipedia text. The derived subsumption pairs are visualized effectively on the screen. The experiment shows that FolksoViz manages to find the correct subsumption pairs with high accuracy.}}


 * -- align="left" valign=top
 * Pentzold, Christian & Seidenglanz, Sebastian
 * Foucault@Wiki: First steps towards a conceptual framework for the analysis of wiki discourses
 * WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark
 * 2006
 * 


 * -- align="left" valign=top
 * Bollacker, Kurt; Cook, Robert & Tufts, Patrick
 * Freebase: A shared database of structured general human knowledge
 * AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada
 * 2007
 * {{hidden||Freebase is a practical, scalable, graph-shaped database of structured general human knowledge, inspired by Semantic Web research and collaborative data communities such as the Wikipedia. Freebase allows public read and write access through an HTTP-based graph-query API for research, the creation and maintenance of structured data, and application building. Access is free and all data in Freebase has a very open (e.g. Creative Commons, GFDL) license.}}


 * -- align="left" valign=top
 * Weikum, Gerhard & Theobald, Martin
 * From information to knowledge: Harvesting entities and relationships from web sources
 * 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6, 2010 - June 11, 2010 Indianapolis, IN, United states
 * 2010
 * 
 * {{hidden||There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project (and others). The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.}}


 * -- align="left" valign=top
 * Bu, Fan; Zhu, Xingwei; Hao, Yu & Zhu, Xiaoyan
 * Function-based question classification for general QA
 * Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
 * 2010
 * 
 * {{hidden||In contrast with the booming increase of internet data, state-of-the-art QA (question answering) systems have been concerned with data from specific domains or resources such as search engine snippets, online forums and Wikipedia in a somewhat isolated way. Users may welcome a more general QA system for its capability to answer questions from various sources, integrated from existing specialized sub-QA engines. In this framework, question classification is the primary task. However, the current paradigms of question classification have focused on specific types of questions, i.e. factoid questions, which are inappropriate for general QA. In this paper, we propose a new question classification paradigm, which includes a question taxonomy suitable for general QA and a question classifier based on MLN (Markov logic network), where rule-based methods and statistical methods are unified into a single framework in a fuzzy discriminative learning approach. Experiments show that our method outperforms traditional question classification approaches.}}


 * -- align="left" valign=top
 * Hecht, Brent; Starosielski, Nicole & Dara-Abrams, Drew
 * Generating educational tourism narratives from Wikipedia
 * 2007 AAAI Fall Symposium, November 9, 2007 - November 11, 2007 Arlington, VA, United states
 * 2007


 * -- align="left" valign=top
 * Aker, Ahmet & Gaizauskas, Robert
 * Generating image descriptions using dependency relational patterns
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||This paper presents a novel approach to automatic captioning of geo-tagged images by summarizing multiple web documents that contain information related to an image's location. The summarizer is biased by dependency pattern models towards sentences which contain features typically provided for different scene types such as those of churches, bridges, etc. Our results show that summaries biased by dependency pattern models lead to significantly higher ROUGE scores than both n-gram language models reported in previous work and also Wikipedia baseline summaries. Summaries generated using dependency patterns also lead to more readable summaries than those generated without dependency patterns.}}


 * -- align="left" valign=top
 * Li, Peng; Jiang, Jing & Wang, Yinglin
 * Generating templates of entity summaries with an entity-aspect model and pattern mining
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary template can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.}}


 * -- align="left" valign=top
 * Overell, Simon E & Ruger, Stefan
 * Geographic co-occurrence as a tool for GIR
 * 4th ACM Workshop on Geographical Information Retrieval, GIR '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal
 * 2007
 * 


 * -- align="left" valign=top
 * Song, Yi-Cheng; Zhang, Yong-Dong; Zhang, Xu; Cao, Juan & Li, Jing-Tao
 * Google challenge: Incremental-learning for web video categorization on robust semantic feature space
 * 17th ACM International Conference on Multimedia, MM'09, with Co-located Workshops and Symposiums, October 19, 2009 - October 24, 2009 Beijing, China
 * 2009
 * 
 * {{hidden||With the advent of video sharing websites, the amount of video on the internet grows rapidly. Web video categorization is an efficient methodology for organizing this huge amount of data. In this paper, we propose an effective web video categorization algorithm for large-scale datasets. It includes two factors: 1) For the great diversity of web videos, we develop an effective semantic feature space called Concept Collection for Web Video Categorization (CCWV-CD) to represent web videos, which consists of concepts with small semantic gap and high distinguishing ability. Meanwhile, the online Wikipedia API is employed to diffuse the concept correlations in this space. 2) We propose an incremental support vector machine with a fixed number of support vectors (n-ISVM) to fit the large-scale incremental learning problem in web video categorization. Extensive experiments conducted on a dataset of the 80,024 most representative videos on YouTube demonstrate that the semantic space with Wikipedia propagation is more representative for web videos, and that n-ISVM outperforms other algorithms in efficiency when performing incremental learning.}}


 * -- align="left" valign=top
 * Curino, Carlo A.; Moon, Hyun J. & Zaniolo, Carlo
 * Graceful database schema evolution: the PRISM workbench
 * Proceedings of the VLDB Endowment VLDB Endowment Hompage
 * 2008
 * 
 * {{hidden||Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database. Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM tools and their ability to support legacy queries.}}


 * -- align="left" valign=top
 * Hajishirzi, Hannaneh; Shirazi, Afsaneh; Choi, Jaesik & Amir, Eyal
 * Greedy algorithms for sequential sensing decisions
 * Proceedings of the 21st international jont conference on Artifical intelligence
 * 2009
 * 
 * {{hidden||In many real-world situations we are charged with detecting change as soon as possible. Important examples include detecting medical conditions, detecting security breaches, and updating caches of distributed databases. In those situations, sensing can be expensive, but it is also important to detect change in a timely manner. In this paper we present tractable greedy algorithms and prove that they solve this decision problem either optimally or approximate the optimal solution in many cases. Our problem model is a POMDP that includes a cost for sensing, a cost for delayed detection, a reward for successful detection, and no-cost partial observations. Making optimal decisions is difficult in general. We show that our tractable greedy approach finds optimal policies for sensing both a single variable and multiple correlated variables. Further, we provide approximations for the optimal solution to multiple hidden or observed variables per step. Our algorithms outperform previous algorithms in experiments over simulated data and live Wikipedia WWW pages.}}


 * -- align="left" valign=top
 * Kittur, Aniket & Kraut, Robert E.
 * Harnessing the wisdom of crowds in Wikipedia: Quality through coordination
 * 2008 ACM Conference on Computer Supported Cooperative Work, CSCW 08, November 8, 2008 - November 12, 2008 San Diego, CA, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Zheng, Yi; Dai, Qifeng; Luo, Qiming & Chen, Enhong
 * Hedge classification with syntactic dependency features based on an ensemble classifier
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||We present our CoNLL-2010 Shared Task system in this paper. The system operates in three steps: sequence labeling, syntactic dependency parsing, and classification. We have participated in Shared Task 1. Our experimental results measured by the in-domain and cross-domain F-scores on the biological domain are 81.11% and 67.99%, and on the Wikipedia domain 55.48% and 55.41%.}}


 * -- align="left" valign=top
 * Clausen, David
 * HedgeHunter: a system for hedge detection and uncertainty classification
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large-scale data analysis. Hedge detection and uncertainty classification are important components of a high-precision IE system. This paper describes a two-part supervised system which classifies words as hedged or non-hedged and sentences as certain or uncertain in biomedical and Wikipedia data. In the first stage, our system trains a logistic regression classifier to detect hedges based on lexical and Part-of-Speech collocation features. In the second stage, we use the output of the hedge classifier to generate sentence-level features based on the number of hedge cues, the identity of hedge cues, and a Bag-of-Words feature vector to train a logistic regression classifier for sentence-level uncertainty. With the resulting classification, an IE system can then discard facts and relations extracted from these sentences or treat them as appropriately doubtful. We present results for in-domain training and testing and cross-domain training and testing based on a simple union of training sets.}}


 * -- align="left" valign=top
 * Kittur, Aniket; Pendleton, Bryan & Kraut, Robert E.
 * Herding the cats: The influence of groups in coordinating peer production
 * 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Kiayias, Aggelos & Zhou, Hong-Sheng
 * Hidden identity-based signatures
 * Proceedings of the 11th International Conference on Financial cryptography and 1st International conference on Usable Security
 * 2007
 * 
 * {{hidden||This paper introduces Hidden Identity-based Signatures (Hidden-IBS), a type of digital signatures that provide mediated signer-anonymity on top of Shamir's Identity-based signatures. The motivation of our new signature primitive is to resolve an important issue with the kind of anonymity offered by "group signatures", where it is required that either the group membership list is public or that the opening authority is dependent on the group manager for its operation. Contrary to this, Hidden-IBS do not require the maintenance of a group membership list and they enable an opening authority that is totally independent of the group manager. As we argue, this makes Hidden-IBS much more attractive than group signatures for a number of applications. In this paper we provide a formal model of Hidden-IBS as well as two efficient constructions that realize the new primitive. Our elliptic curve construction that is based on the SDH/DLDH assumptions produces signatures that are merely 4605 bits long and can be implemented very efficiently. To demonstrate the power of the new primitive we apply it to solve a problem of current onion-routing systems, focusing on the Tor system in particular. Posting through Tor is currently blocked by sites such as Wikipedia due to the real concern that anonymous channels can be used to vandalize online content. By injecting a Hidden-IBS inside the header of an HTTP POST request and requiring the exit-policy of Tor to forward only properly signed POST requests, we demonstrate how sites like Wikipedia may allow anonymous posting while being ensured that the recovery of (say) the IP address of a vandal would be still possible through a dispute resolution system. Using our new Hidden-IBS primitive in this scenario allows to keep the listing of identities (e.g.}}


 * -- align="left" valign=top
 * Scarpazza, Daniele Paolo & Russell, Gregory F.
 * High-performance regular expression scanning on the Cell/B.E. processor
 * Proceedings of the 23rd international conference on Supercomputing
 * 2009
 * 
 * {{hidden||Matching regular expressions (regexps) is a very common work-load. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors' execution time and represents the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators like flex haven't changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of concept, we present a family of optimized kernels that implement our algorithm, providing the features of flex on the Cell/B.E. processor at top performance. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are Non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input: a remarkable performance, comparable to dedicated hardware solutions. Also, our kernels show speedups of 57-81× over flex on the Cell. Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.}}


 * -- align="left" valign=top
 * Beesley, Angela
 * How and why Wikipedia works
 * WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark
 * 2006
 * 


 * -- align="left" valign=top
 * Riehle, Dirk
 * How and why Wikipedia works: An interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko
 * WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark
 * 2006
 * 
 * {{hidden||This article presents an interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko. All three are leading Wikipedia practitioners in the English, German, and Japanese Wikipedias and related projects. The interview focuses on how Wikipedia works and why these three practitioners believe it will keep working. The interview was conducted via email in preparation of WikiSym 2006, the 2006 International Symposium on Wikis, with the goal of furthering Wikipedia research [1]. Interviewer was Dirk Riehle, the chair of WikiSym 2006. An online version of the article provides simplified access to URLs [2].}}


 * -- align="left" valign=top
 * Xu, Sean & Zhang, Xiaoquan
 * How Do Social Media Shape the Information Environment in the Financial Market?
 * 2009
 * 
 * 


 * -- align="left" valign=top
 * Lindholm, Tancred & Kangasharju, Jaakko
 * How to edit gigabyte XML files on a mobile phone with XAS, RefTrees, and RAXS
 * Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services
 * 2008
 * 
 * {{hidden||The Open Source mobility middleware developed in the Fuego Core project provides a stack for efficient XML processing on limited devices. Its components are a persistent map API, advanced XML serialization and out-of-order parsing with byte-level access (XAS), data structures and algorithms for lazy manipulation and random access to XML trees (RefTree), and a component for XML document management (RAXS) such as packaging, versioning, and synchronization. The components provide a toolbox of simple and lightweight XML processing techniques rather than a complete XML database. We demonstrate the Fuego XML stack by building a viewer and multiversion editor capable of processing gigabyte-sized Wikipedia XML files on a mobile phone. We present performance measurements obtained on the phone, and a comparison to implementations based on existing technologies. These show that the Fuego XML stack allows going beyond what is commonly considered feasible on limited devices in terms of XML processing, and that it provides advantages in terms of decreased set-up time and storage space requirements compared to existing approaches.}}


 * -- align="left" valign=top
 * Medelyan, Olena; Frank, Eibe & Witten, Ian H.
 * Human-competitive tagging using automatic keyphrase extraction
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
 * 2009
 * 
 * {{hidden||This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. Next, we demonstrate how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and further improve performance in this new domain using a new algorithm, "Maui", that utilizes semantic information extracted from Wikipedia. Maui outperforms existing approaches and extracts tags that are competitive with those assigned by the best performing human taggers.}}


 * -- align="left" valign=top
 * Yamada, Ichiro; Torisawa, Kentaro; Kazama, Jun'ichi; Kuroda, Kow; Murata, Masaki; Saeger, Stijn De; Bond, Francis & Sumida, Asuka
 * Hypernym discovery based on distributional similarity and hierarchical structures
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
 * 2009
 * 
 * {{hidden||This paper presents a new method of developing a large-scale hyponymy relation database by combining Wikipedia and other Web documents. We attach new words to the hyponymy database extracted from Wikipedia by using distributional similarity calculated from documents on the Web. For a given target word, our algorithm first finds k similar words from the Wikipedia database. Then, the hypernyms of these k similar words are assigned scores by considering the distributional similarities and hierarchical distances in the Wikipedia database. Finally, new hyponymy relations are output according to the scores. In this paper, we tested two distributional similarities. One is based on raw verb-noun dependencies (which we call "RVD") and the other is based on a large-scale clustering of verb-noun dependencies (called "CVD"). Our method achieved an attachment accuracy of 91.0% for the top 10000 relations and an attachment accuracy of 74.5% for the top 100000 relations when using CVD. This was a far better outcome compared to the other baseline approaches. Excluding the region that had very high scores}}


 * -- align="left" valign=top
 * Iftene, Adrian & Balahur-Dobrescu, Alexandra
 * Hypothesis transformation and semantic variability rules used in recognizing textual entailment
 * Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing
 * 2007
 * 
 * {{hidden||Based on the core approach of the tree edit distance algorithm, the system central module is designed to target the scope of TE -- semantic variability. The main idea is to transform the hypothesis making use of extensive semantic knowledge from sources like DIRT, WordNet, Wikipedia, acronyms database. Additionally, we built a system to acquire the extra background knowledge needed and applied complex grammar rules for rephrasing in English.}}


 * -- align="left" valign=top
 * Wen, Dunwei; Liu, Ming-Chi; Huang, Yueh-Min; Kinshuk & Hung, Pi-Hsia
 * Identifying Animals with Dynamic Location-aware and Semantic Hierarchy-Based Image Browsing for Different Cognitive Style Learners
 * Advanced Learning Technologies (ICALT), 2010 IEEE 10th International Conference on
 * 2010
 * {{hidden||Lack of overall ecological knowledge structure is a critical reason for learners' failure in keyword-based search. To address this issue, this paper firstly presents the dynamic location-aware and semantic hierarchy (DLASH) designed for the learners to browse images, which aims to identify learners' current interesting sights and provide adaptive assistance accordingly in ecological learning. The main idea is based on the observation that the species of plants and animals are discontinuously distributed around the planet, and hence their semantic hierarchy, besides its structural similarity with WordNet, is related to location information. This study then investigates how different cognitive styles of the learners influence the use of DLASH in their image browsing. The preliminary results show that the learners perform better when using DLASH-based image browsing than using the Flickr one. In addition, cognitive styles have more effects on image browsing in the DLASH version than in the Flickr one.}}


 * -- align="left" valign=top
 * Lipka, Nedim & Stein, Benno
 * Identifying featured articles in Wikipedia: Writing style matters
 * 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states
 * 2010
 * 
 * {{hidden||Wikipedia provides an information quality assessment model with criteria for human peer reviewers to identify featured articles. For this classification task, "Is an article featured or not?", we present a machine learning approach that exploits an article's character trigram distribution. Our approach differs from existing research in that it aims at writing style rather than evaluating meta features like the edit history. The approach is robust, straightforward to implement, and outperforms existing solutions. We underpin these claims by an experiment design where, among others, the domain transferability is analyzed. The achieved performances in terms of the F-measure for featured articles are 0.964 within a single Wikipedia domain and 0.880 in a domain transfer situation. 2010 Copyright is held by the author/owner(s).}}


 * -- align="left" valign=top
 * Chang, Ming-Wei; Ratinov, Lev; Roth, Dan & Srikumar, Vivek
 * Importance of semantic representation: Dataless classification
 * 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, July 13, 2008 - July 17, 2008 Chicago, IL, United states
 * 2008


 * -- align="left" valign=top
 * Fuxman, Ariel; Kannan, Anitha; Goldberg, Andrew B.; Agrawal, Rakesh; Tsaparas, Panayiotis & Shafer, John
 * Improving classification accuracy using automatically extracted training data
 * Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
 * 2009
 * 
 * {{hidden||Classification is a core task in knowledge discovery and data mining, and there has been substantial research effort in developing sophisticated classification models. In a parallel thread, recent work from the NLP community suggests that for tasks such as natural language disambiguation even a simple algorithm can outperform a sophisticated one, if it is provided with large quantities of high quality training data. In those applications, training data occurs naturally in text corpora, and high quality training data sets running into billions of words have been reportedly used. We explore how we can apply the lessons from the NLP community to KDD tasks. Specifically, we investigate how to identify data sources that can yield training data at low cost and study whether the quantity of the automatically extracted training data can compensate for its lower quality. We carry out this investigation for the specific task of inferring whether a search query has commercial intent. We mine toolbar and click logs to extract queries from sites that are predominantly commercial (e.g., Amazon) and non-commercial (e.g., Wikipedia). We compare the accuracy obtained using such training data against manually labeled training data. Our results show that we can have large accuracy gains using automatically extracted training data at much lower cost.}}


 * -- align="left" valign=top
 * MacKinnon, Ian & Vechtomova, Olga
 * Improving complex interactive question answering with Wikipedia anchor text
 * 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom
 * 2008
 * 
 * {{hidden||When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document will often not be used in the most relevant information nugget in the document. In this paper, a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected automatically. Evaluated with the Nuggeteer automatic scoring software, an increase in the F-scores is found from the TREC Complex Interactive Question Answering task when integrating this expansion into an already high-performing baseline system. 2008 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Wang, Pu; Hu, Jian; Zeng, Hua-Jun; Chen, Lijun & Chen, Zheng
 * Improving text classification by using encyclopedia knowledge
 * 7th IEEE International Conference on Data Mining, ICDM 2007, October 28, 2007 - October 31, 2007 Omaha, NE, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Li, Yinghao; Luk, Wing Pong Robert; Ho, Kei Shiu Edward & Chung, Fu Lai Korris
 * Improving weak ad-hoc queries using Wikipedia as external corpus
 * 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07, July 23, 2007 - July 27, 2007 Amsterdam, Netherlands
 * 2007
 * 


 * -- align="left" valign=top
 * Wan, Stephen & Paris, Cécile
 * In-browser summarisation: generating elaborative summaries biased towards the reading context
 * Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
 * 2008
 * 
 * {{hidden||We investigate elaborative summarisation, where the aim is to identify supplementary information that expands upon a key fact. We envisage such summaries being useful when browsing certain kinds of (hyper-)linked document sets, such as Wikipedia articles or repositories of publications linked by citations. For these collections, an elaborative summary is intended to provide additional information on the linking anchor text. Our contribution in this paper focuses on identifying and exploring a real task in which summarisation is situated, realised as an In-Browser tool. We also introduce a neighbourhood scoring heuristic as a means of scoring matches to relevant passages of the document. In a preliminary evaluation using this method, our summarisation system scores above our baselines and achieves a recall of 57% of annotated gold standard sentences.}}


 * -- align="left" valign=top
 * Wu, Fei; Hoffmann, Raphael & Weld, Daniel S.
 * Information extraction from Wikipedia: Moving down the long tail
 * 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, August 24, 2008 - August 27, 2008 Las Vegas, NV, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Wagner, Christian & Prasarnphanich, Pattarawan
 * Innovating collaborative content creation: The role of altruism and wiki technology
 * 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007 Big Island, HI, United states
 * 2007
 * 


 * -- align="left" valign=top
 * Medelyan, Olena & Legg, Catherine
 * Integrating cyc and wikipedia: Folksonomy meets rigorously defined common-sense
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008


 * -- align="left" valign=top
 * Weld, Daniel S.; Wu, Fei; Adar, Eytan; Amershi, Saleema; Fogarty, James; Hoffmann, Raphael; Patel, Kayur & Skinner, Michael
 * Intelligence in wikipedia
 * 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, July 13, 2008 - July 17, 2008 Chicago, IL, United states
 * 2008
 * {{hidden||The Intelligence in Wikipedia project at the University of Washington is combining self-supervised information extraction (IE) techniques with a mixed initiative interface designed to encourage communal content creation (CCC). Since IE and CCC are each powerful ways to produce large amounts of structured information, they have been studied extensively - but only in isolation. By combining the two methods in a virtuous feedback cycle, we aim for substantial synergy. While previous papers have described the details of individual aspects of our endeavor [25, 26, 24, 13], this report provides an overview of the project's progress and vision. Copyright 2008.}}


 * -- align="left" valign=top
 * Halpin, Harry
 * Is there anything worth finding on the semantic web?
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 


 * -- align="left" valign=top
 * Sato, Issei & Nakagawa, Hiroshi
 * Knowledge discovery of multiple-topic document using parametric mixture model with dirichlet prior
 * KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 12, 2007 - August 15, 2007 San Jose, CA, United states
 * 2007
 * 
 * {{hidden||Documents, such as those seen on Wikipedia and Folksonomy, have tended to be assigned multiple topics as meta-data. Therefore, it is more and more important to analyze the relationship between a document and the topics assigned to it. In this paper, we propose a novel probabilistic generative model of documents with multiple topics as meta-data. By focusing on modeling the generation process of a document with multiple topics, we can extract specific properties of documents with multiple topics. The proposed model is an expansion of an existing probabilistic generative model: the Parametric Mixture Model (PMM). PMM models documents with multiple topics by mixing model parameters of each single topic. Since, however, PMM assigns the same mixture ratio to each single topic, PMM cannot take into account the bias of each topic within a document. To deal with this problem, we propose a model that uses a Dirichlet distribution as a prior distribution of the mixture ratio. We adopt the Variational Bayes method to infer the bias of each topic within a document. We evaluate the proposed model and PMM using the MEDLINE corpus. The results of F-measure, precision and recall show that the proposed model is more effective than PMM on multiple-topic classification. Moreover, we indicate the potential of the proposed model to extract topics and document-specific keywords using information about the assigned topics.}}


 * -- align="left" valign=top
 * Weikum, G.
 * Knowledge on the Web: Robust and Scalable Harvesting of Entity-Relationship Facts
 * Database Systems for Advanced Applications. 15th International Conference, DASFAA 2010, 1-4 April 2010 Berlin, Germany
 * 2010
 * 
 * {{hidden||Summary form only given. The proliferation of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from semistructured and textual Web data have enabled the construction of very large knowledge bases. These knowledge collections contain facts about many millions of entities and relationships between them, and can be conveniently represented in the RDF data model. Prominent examples are DBpedia, YAGO, Freebase, Trueknowledge, and others. These structured knowledge collections can be viewed as "Semantic Wikipedia Databases" and they can answer many advanced questions by SPARQL-like query languages and appropriate ranking models. In addition, the knowledge bases can boost the semantic capabilities and precision of entity-oriented Web search, and they are enablers for value-added knowledge services and applications in enterprises and online communities. The talk discusses recent advances in the large-scale harvesting of entity-relationship facts from Web sources and it points out the next frontiers in building comprehensive knowledge bases and enabling semantic search services.}}


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Navigli, Roberto
 * Knowledge-rich Word Sense Disambiguation rivaling supervised systems
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.}}


 * -- align="left" valign=top
 * Elbassuoni, Shady; Ramanath, Maya; Schenkel, Ralf; Sydow, Marcin & Weikum, Gerhard
 * Language-model-based ranking for queries on RDF-graphs
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 
 * {{hidden||The success of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from textual and Web sources have made it possible to build large "knowledge repositories" such as DBpedia, Freebase and YAGO. These collections can be viewed as graphs of entities and relationships (ER graphs) and can be represented as a set of subject-property-object (SPO) triples in the Semantic-Web data model RDF. Queries can be expressed in the W3C-endorsed SPARQL language or by similarly designed graph-pattern search. However, exact-match query semantics often fall short of satisfying the users' needs by returning too many or too few results. Therefore, IR-style ranking models are crucially needed.}}


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Navigli, Roberto
 * Large-scale taxonomy mapping for restructuring and integrating wikipedia
 * Proceedings of the 21st international joint conference on Artificial intelligence
 * 2009
 * 
 * {{hidden||We present a knowledge-rich methodology for disambiguating Wikipedia categories with WordNet synsets and using this semantic information to restructure a taxonomy automatically generated from the Wikipedia system of categories. We evaluate against a manual gold standard and show that both category disambiguation and taxonomy restructuring perform with high accuracy. Besides, we assess these methods on automatically generated datasets and show that we are able to effectively enrich WordNet with a large number of instances from Wikipedia. Our approach produces an integrated resource, thus bringing together the fine-grained classification of instances in Wikipedia and a well-structured top-level taxonomy from WordNet.}}


 * -- align="left" valign=top
 * Luther, Kurt & Bruckman, Amy
 * Leadership in online creative collaboration
 * Proceedings of the 2008 ACM conference on Computer supported cooperative work
 * 2008
 * 


 * -- align="left" valign=top
 * Wierzbicki, Adam; Turek, Piotr & Nielek, Radoslaw
 * Learning about team collaboration from wikipedia edit history
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland
 * 2010
 * 


 * -- align="left" valign=top
 * Pasternack, Jeff & Roth, Dan
 * Learning better transliterations
 * Proceedings of the 18th ACM conference on Information and knowledge management
 * 2009
 * 


 * -- align="left" valign=top
 * Napoles, Courtney & Dredze, Mark
 * Learning simple Wikipedia: a cogitation in ascertaining abecedarian language
 * Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids
 * 2010
 * 
 * {{hidden||Text simplification is the process of changing vocabulary and grammatical structure to create a more accessible version of the text while maintaining the underlying information and content. Automated tools for text simplification are a practical way to make large corpora of text accessible to a wider audience lacking high levels of fluency in the corpus language. In this work, we investigate the potential of Simple Wikipedia to assist automatic text simplification by building a statistical classification system that discriminates simple English from ordinary English. Most text simplification systems are based on hand-written rules (e.g., PEST (Carroll et al., 1999) and its module SYSTAR (Canning et al., 2000)), and therefore face limitations scaling and transferring across domains. The potential for using Simple Wikipedia for text simplification is significant; it contains nearly 60,000 articles with revision histories and aligned articles to ordinary English Wikipedia. Using articles from Simple Wikipedia and ordinary Wikipedia, we evaluated different classifiers and feature sets to identify the most discriminative features of simple English for use across domains. These findings help further understanding of what makes text simple and can be applied as a tool to help writers craft simple text.}}


 * -- align="left" valign=top
 * Phan, Xuan-Hieu; Nguyen, Le-Minh & Horiguchi, Susumu
 * Learning to classify short and sparse text & web with hidden topics from large-scale data collections
 * 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from large-scale data collections. The main motivation of this work is that many classification tasks working with short segments of text & Web, such as search snippets, forum chat messages, blog news feeds, product reviews, and book & movie summaries, fail to achieve high accuracy due to the data sparseness. We, therefore, come up with an idea of gaining external knowledge to make the data more related as well as expand the coverage of classifiers to handle future data better. The underlying idea of the framework is that for each classification task, we collect a large-scale external data collection called a "universal dataset" and then build a classifier on both a (small) set of labeled training data and a rich set of hidden topics discovered from that data collection. The framework is general enough to be applied to different data domains and genres ranging from Web search results to medical text. We did a careful evaluation on several hundred megabytes of Wikipedia (30M words) and MEDLINE (18M words) with two tasks: "Web search domain disambiguation" and "disease categorization for medical text" and achieved significant quality enhancement.}}


 * -- align="left" valign=top
 * Milne, David & Witten, Ian H.
 * Learning to link with wikipedia
 * 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Druck, Gregory; Miklau, Gerome & McCallum, Andrew
 * Learning to predict the quality of contributions to wikipedia
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008


 * -- align="left" valign=top
 * Banerjee, Somnath; Chakrabarti, Soumen & Ramakrishnan, Ganesh
 * Learning to rank for quantity consensus queries
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 
 * {{hidden||Web search is increasingly exploiting named entities like persons, places, businesses, addresses and dates. Entity ranking is also of current interest at INEX and TREC. Numerical quantities are an important class of entities, especially in queries about prices and features related to products, services and travel. We introduce Quantity Consensus Queries (QCQs), where each answer is a tight quantity interval distilled from evidence of relevance in thousands of snippets. Entity search and factoid question answering have benefited from aggregating evidence from multiple promising snippets, but these do not readily apply to quantities. Here we propose two new algorithms that learn to aggregate information from multiple snippets. We show that typical signals used in entity ranking, like rarity of query words and their lexical proximity to candidate quantities, are very noisy. Our algorithms learn to score and rank quantity intervals directly, combining snippet quantity and snippet text information. We report on experiments using hundreds of QCQs with ground truth taken from TREC QA, Wikipedia Infoboxes, and other sources, leading to tens of thousands of candidate snippets and quantities. Our algorithms yield about 20% better MAP and NDCG compared to the best-known collective rankers, and are 35% better than scoring snippets independent of each other.}}


 * -- align="left" valign=top
 * Navigli, Roberto & Velardi, Paola
 * Learning word-class lattices for definition and hypernym extraction
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches -- mostly focused on lexicosyntactic patterns -- suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature.}}


 * -- align="left" valign=top
 * Ganjisaffar, Yasser; Javanmardi, Sara & Lopes, Cristina
 * Leveraging crowdsourcing heuristics to improve search in Wikipedia
 * 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Kirschenbaum, Amit & Wintner, Shuly
 * Lightly supervised transliteration for machine translation
 * Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
 * 2009
 * 
 * {{hidden||We present a Hebrew to English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms are to be transliterated rather than translated. The training corpus for this purpose includes only positive examples, acquired semi-automatically. Our classifier reduces more than 38% of the errors made by a baseline method. The identified terms are then transliterated. We present an SMT-based transliteration model trained with a parallel corpus extracted from Wikipedia using a fairly simple method which requires minimal knowledge. The correct result is produced in more than 76% of the cases, and in 92% of the instances it is one of the top-5 results. We also demonstrate a small improvement in the performance of a Hebrew-to-English MT system that uses our transliteration module.}}


 * -- align="left" valign=top
 * Kaptein, Rianne; Serdyukov, Pavel & Kamps, Jaap
 * Linking wikipedia to the web
 * 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland
 * 2010
 * 


 * -- align="left" valign=top
 * Weiss, Stephane; Urso, Pascal & Molli, Pascal
 * Logoot: A scalable optimistic replication algorithm for collaborative editing on P2P networks
 * 2009 29th IEEE International Conference on Distributed Computing Systems Workshops, ICDCS '09, June 22, 2009 - June 26, 2009 Montreal, QC, Canada
 * 2009
 * 


 * -- align="left" valign=top
 * Moon, Hyun J.; Curino, Carlo A.; Deutsch, Alin; Hou, Chien-Yi & Zaniolo, Carlo
 * Managing and querying transaction-time databases under schema evolution
 * Proceedings of the VLDB Endowment
 * 2008
 * 
 * {{hidden||The old problem of managing the history of database information is now made more urgent and complex by fast spreading web information systems, such as Wikipedia. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly important for historical queries spanning over potentially hundreds of different schema versions and it is realized in PRIMA by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.}}


 * -- align="left" valign=top
 * Curino, Carlo A.; Moon, Hyun J. & Zaniolo, Carlo
 * Managing the History of Metadata in Support for DB Archiving and Schema Evolution
 * Proceedings of the ER 2008 Workshops (CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM) on Advances in Conceptual Modeling: Challenges and Opportunities
 * 2008
 * 


 * -- align="left" valign=top
 * Hu, Meiqun; Lim, Ee-Peng; Sun, Aixin; Lauw, Hady W. & Vuong, Ba-Quy
 * Measuring article quality in wikipedia: Models and evaluation
 * 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal
 * 2007
 * 
 * {{hidden||Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our basic model is designed based on the mutual dependency between article quality and author authority. The PeerReview model introduces the review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.}}
 * {{hidden||Wikipedia has grown to be the world largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our basic model is designed based on the mutual dependency between article quality and their author authority. The PeerReview} model introduces the review behavior into measuring article quality. Finally, our ProbReview} models extend PeerReview} with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.}}


 * -- align="left" valign=top
 * Adler, B. Thomas; Alfaro, Luca De; Pye, Ian & Raman, Vishwanath
 * Measuring author contributions to the Wikipedia
 * 4th International Symposium on Wikis, WikiSym 2008, September 8, 2008 - September 10, 2008 Porto, Portugal
 * 2008
 * 


 * -- align="left" valign=top
 * Volkovich, Yana; Litvak, Nelly & Zwart, Bert
 * Measuring extremal dependencies in Web graphs
 * 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China
 * 2008
 * 
 * {{hidden||We analyze dependencies in power law graph data (a Web sample, a Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. The well-developed theory of regular variation is widely applied in extreme value theory, telecommunications and mathematical finance, and it provides a natural mathematical formalism for analyzing dependencies between variables with power laws. However, most of the proposed methods have never been used in Web graph data mining. The present work fills this gap. The new insights this yields are striking: the three above-mentioned data sets are shown to have totally different dependence structures between graph parameters such as in-degree and PageRank.}}


 * -- align="left" valign=top
 * Stuckman, Jeff & Purtilo, James
 * Measuring the wikisphere
 * 5th International Symposium on Wikis and Open Collaboration, WikiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 


 * -- align="left" valign=top
 * Roth, Camille; Taraborelli, Dario & Gilbert, Nigel
 * Measuring wiki viability: an empirical assessment of the social dynamics of a large sample of wikis
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 


 * -- align="left" valign=top
 * Alfaro, Luca De & Ortega, Felipe
 * Measuring Wikipedia: A hands-on tutorial
 * 5th International Symposium on Wikis and Open Collaboration, WikiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states
 * 2009
 * 
 * {{hidden||This tutorial is an introduction to the best methodologies, tools and practices for Wikipedia research. The tutorial will be led by Luca de Alfaro (Wiki Lab at UCSC, California, USA) and Felipe Ortega (Libresoft, URJC, Madrid, Spain). Both have accumulated several years of practical experience exploring and processing Wikipedia data [1], [2], [3]. In addition, their respective research groups have led the development of two cutting-edge software tools (WikiTrust and WikiXRay) for analyzing Wikipedia. WikiTrust implements an author reputation system and a text trust system for wikis. WikiXRay is a tool automating the quantitative analysis of any language version of Wikipedia (in general, any wiki based on MediaWiki).}}


 * -- align="left" valign=top
 * Morante, Roser; Asch, Vincent Van & Daelemans, Walter
 * Memory-based resolution of in-sentence scopes of hedge cues
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||In this paper we describe the machine learning systems that we submitted to the CoNLL-2010 Shared Task on Learning to Detect Hedges and Their Scope in Natural Language Text. Task 1, on detecting uncertain information, was performed by an SVM-based system to process the Wikipedia data and by a memory-based system to process the biological data. Task 2, on resolving in-sentence scopes of hedge cues, was performed by a memory-based system that relies on information from syntactic dependencies. This system scored the highest F1 (57.32) of Task 2.}}


 * -- align="left" valign=top
 * Yasuda, Keiji & Sumita, Eiichiro
 * Method for building sentence-aligned corpus from wikipedia
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008
 * {{hidden||We propose the framework of a Machine Translation (MT) bootstrapping method using multilingual Wikipedia articles. This novel method can simultaneously generate a statistical machine translation (SMT) system and a sentence-aligned corpus. In this study, we perform two types of experiments. The aim of the first type of experiments is to verify the sentence alignment performance by comparing the proposed method with a conventional sentence alignment approach. For the first type of experiments, we use JENAAD, which is a sentence-aligned corpus built by the conventional sentence alignment method. The second type of experiments uses actual English and Japanese Wikipedia articles for sentence alignment. The result of the first type of experiments shows that the performance of the proposed method is comparable to that of the conventional sentence alignment method. Additionally, the second type of experiments shows that we can obtain the English translation of 10% of Japanese sentences while maintaining high alignment quality (rank-A ratio of over 0.8). Copyright 2008.}}


 * -- align="left" valign=top
 * Ni, Xiaochuan; Sun, Jian-Tao; Hu, Jian & Chen, Zheng
 * Mining multilingual topics from wikipedia
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 


 * -- align="left" valign=top
 * Witmer, Jeremy & Kalita, Jugal
 * Mining wikipedia article clusters for geospatial entities and relationships
 * Social Semantic Web: Where Web 2.0 Meets Web 3.0 - Papers from the AAAI Spring Symposium, March 23, 2009 - March 25, 2009 Stanford, CA, United states
 * 2009
 * {{hidden||We present in this paper a method to extract geospatial entities and relationships from the unstructured text of the English language Wikipedia. Using a novel approach that applies SVMs trained from purely structural features of text strings, we extract candidate geospatial entities and relationships. Using a combination of further techniques, along with an external gazetteer, the candidate entities and relationships are disambiguated and the Wikipedia article pages are modified to include the semantic information provided by the extraction process. We successfully extracted location entities with an F-measure of 81% and location relations with an F-measure of 54%. Copyright 2009, Association for the Advancement of Artificial Intelligence.}}


 * -- align="left" valign=top
 * Yamangil, Elif & Nelken, Rani
 * Mining wikipedia revision histories for improving sentence compression
 * Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
 * 2008
 * 
 * {{hidden||A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this new-found data, we propose a novel lexicalized noisy channel model for sentence compression, achieving improved results in grammaticality and compression rate criteria with a slight decrease in importance.}}


 * -- align="left" valign=top
 * Nelken, Rani & Yamangil, Elif
 * Mining wikipedia's article revision history for training computational linguistics algorithms
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008


 * -- align="left" valign=top
 * Minitrack Introduction
 * Proceedings of the 41st Annual Hawaii International Conference on System Sciences
 * 2008
 * 
 * {{hidden||This year's minitrack on Open Movements: Open Source Software and Open Content provides a forum for discussion of an increasingly important mode of collaborative content and software development. OSS is a broad term used to embrace software that is developed and released under some sort of open source license (as is free software, a closely related phenomenon). There are thousands of OSS projects spanning a range of applications, Linux and Apache being two of the most visible. Open Content refers to published content (e.g., articles, engineering designs, pictures, etc.) released under a license allowing the content to be freely used and possibly modified and redistributed. Examples of OC are Wikipedia and MIT's Open Courseware.}}


 * -- align="left" valign=top
 * Gan, Daniel Dengyang & Chia, Liang-Tien
 * MobileMaps@sg - Mappedia version 1.1
 * IEEE International Conference on Multimedia and Expo, ICME 2007, July 2, 2007 - July 5, 2007 Beijing, China
 * 2007
 * {{hidden||Technology has always been moving. Throughout the decades, improvements in various technological areas have led to a greater sense of convenience for ordinary people, whether it is cutting down time spent on normal day-to-day activities or getting privileged services. One of the technological areas that has been moving very rapidly is that of mobile computing. The common mobile device now has mobility, provides entertainment via multimedia, connects to the Internet and is powered by intelligent and powerful chips. This paper will touch on an idea that is currently in the works: an integration of a recent technology that has netizens talking all over the world, Google Maps, which provides street and satellite images via the internet to the PC, with Wikipedia's user content support idea, the biggest free-content encyclopedia on the Internet. We will show how it is possible to integrate such a technology with the idea of free-form editing into one application on a small mobile device. The new features provided by this application will work toward supporting the development of multimedia application and computing.}}


 * -- align="left" valign=top
 * Diaz, Oscar & Puente, Gorka
 * Model-aware wiki analysis tools: The case of HistoryFlow
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland
 * 2010
 * 


 * -- align="left" valign=top
 * Chi, Ed H.
 * Model-Driven Research in Human-Centric Computing
 * Visual Languages and Human-Centric Computing (VL/HCC), 2010 IEEE Symposium on
 * 2010
 * {{hidden||How can we build systems that enable users to mix and match tools together? How will we know whether we have done a good job in creating usable visual interactive systems that help users accomplish a wide variety of goals? How can people share the results of their explorations with each other, and how can innovative tools be remixed? Widely-used tools such as Web browsers, wikis, spreadsheets, and analytics environments like R all contain models of how people mix and combine operators and functionalities. In my own research, system developments are very much informed by models such as information scent, sensemaking, information theory, probabilistic models, and more recently, evolutionary dynamic models. These models have been used to understand a wide variety of user behaviors in human-centric computing, from individuals interacting with a search system like MrTaggy.com to groups of people working on articles in Wikipedia. These models range in complexity from a simple set of assumptions to complex equations describing human and group behavior. In this talk, I will attempt to illustrate how a model-driven approach to answering the above questions should help to illuminate the path forward for Human-Centric Computing.}}


 * -- align="left" valign=top
 * Burke, Moira & Kraut, Robert
 * Mopping up: Modeling wikipedia promotion decisions
 * 2008 ACM Conference on Computer Supported Cooperative Work, CSCW 08, November 8, 2008 - November 12, 2008 San Diego, CA, United states
 * 2008
 * 


 * -- align="left" valign=top
 * Sarmento, Luis; Jijkuon, Valentin; de Rijke, Maarten & Oliveira, Eugenio
 * "More like these": growing entity classes from seeds
 * Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
 * 2007
 * 


 * -- align="left" valign=top
 * Chaudhuri, Kamalika; Kakade, Sham M.; Livescu, Karen & Sridharan, Karthik
 * Multi-view clustering via canonical correlation analysis
 * 26th Annual International Conference on Machine Learning, ICML'09, June 14, 2009 - June 18, 2009 Montreal, QC, Canada
 * 2009
 * 
 * {{hidden||Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure). Copyright 2009.}}


 * -- align="left" valign=top
 * Kasneci, Gjergji; Suchanek, Fabian M.; Ifrim, Georgiana; Elbassuoni, Shady; Ramanath, Maya & Weikum, Gerhard
 * NAGA: Harvesting, searching and ranking knowledge
 * 2008 ACM SIGMOD International Conference on Management of Data 2008, SIGMOD'08, June 9, 2008 - June 12, 2008 Vancouver, BC, Canada
 * 2008
 * 
 * {{hidden||The presence of encyclopedic Web sources, such as Wikipedia, the Internet Movie Database (IMDB), World Factbook, etc. calls for new querying techniques that are simple and yet more expressive than those provided by standard keyword-based search engines. Searching for explicit knowledge needs to consider inherent semantic structures involving entities and relationships. In this demonstration proposal, we describe a semantic search system named NAGA. NAGA operates on a knowledge graph, which contains millions of entities and relationships derived from various encyclopedic Web sources, such as the ones above. NAGA's graph-based query language is geared towards expressing queries with additional semantic information. Its scoring model is based on the principles of generative language models, and formalizes several desiderata such as confidence, informativeness and compactness of answers. We propose a demonstration of NAGA which will allow users to browse the knowledge base through a user interface, enter queries in NAGA's query language and tune the ranking parameters to test various ranking aspects.}}


 * -- align="left" valign=top
 * Han, Xianpei & Zhao, Jun
 * Named entity disambiguation by leveraging wikipedia semantic knowledge
 * ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China
 * 2009
 * 
 * {{hidden||Name ambiguity problem has raised an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem of named entity disambiguation is to measure the similarity between occurrences of names. The traditional methods measure the similarity using the bag of words (BOW) model. The BOW, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. So the BOW} cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, can only capture the social relatedness between named entities, and often suffer the limited coverage problem. To overcome the previous methods' deficiencies, this paper proposes to use Wikipedia as the background knowledge for disambiguation, which surpasses other knowledge bases by the coverage of concepts, rich semantic information and up-to-date content. By leveraging Wikipedia's semantic knowledge like social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia, in order that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, a novel similarity measure is proposed to leverage Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS} data sets. Empirical results show that the disambiguation performance of our method gets 10.7\% improvement over the traditional BOW} based methods and 16.7\% improvement over the traditional social network based methods. }}


 * -- align="left" valign=top
 * Maskey, Sameer & Dakka, Wisam
 * Named entity network based on Wikipedia
 * 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, September 6, 2009 - September 10, 2009, Brighton, United Kingdom
 * 2009
 * {{hidden||Named Entities (NEs) play an important role in many natural language and speech processing tasks. A resource that identifies relations between NEs could potentially be very useful. We present such an automatically generated knowledge resource from Wikipedia, the Named Entity Network (NE-NET), which provides a list of related Named Entities (NEs) and the degree of relation for any given NE. Unlike some manually built knowledge resources, NE-NET has wide coverage, consisting of 1.5 million NEs represented as nodes of a graph with 6.5 million arcs relating them. NE-NET also provides the ranks of the related NEs using a simple ranking function that we propose. In this paper, we present NE-NET and our experiments showing how NE-NET can be used to improve the retrieval of spoken (Broadcast News) and text documents.}}


 * -- align="left" valign=top
 * Krioukov, Andrew; Mohan, Prashanth; Alspaugh, Sara; Keys, Laura; Culler, David & Katz, Randy H.
 * NapSAC: design and implementation of a power-proportional web cluster
 * Proceedings of the first ACM SIGCOMM workshop on Green networking
 * 2010
 * 


 * -- align="left" valign=top
 * Brandes, Ulrik; Kenis, Patrick; Lerner, Jürgen & van Raaij, Denise
 * Network analysis of collaboration structure in Wikipedia
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 


 * -- align="left" valign=top
 * Gardner, James; Krowne, Aaron & Xiong, Li
 * NNexus: an automatic linker for collaborative web-based corpora
 * Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
 * 2009
 * 
 * {{hidden||Collaborative online encyclopedias or knowledge bases such as Wikipedia and PlanetMath are becoming increasingly popular. We demonstrate NNexus, a generalization of the automatic linking engine of PlanetMath.org and the first system that automates the process of linking disparate "encyclopedia" entries into a fully-connected conceptual network. The main challenges of this problem space include: 1) linking quality (correctly identifying which terms to link and which entry to link to, with minimal effort on the part of users), 2) efficiency and scalability, and 3) generalization to multiple knowledge bases and web-based information environments. We present NNexus, which utilizes subject classification and other metadata to address these challenges, and demonstrate its effectiveness and efficiency on multiple real-world corpora.}}


 * -- align="left" valign=top
 * von dem Bussche, Franziska; Weiand, Klara; Linse, Benedikt; Furche, Tim & Bry, François
 * Not so creepy crawler: easy crawler generation with standard XML queries
 * Proceedings of the 19th international conference on World wide web
 * 2010
 * 
 * {{hidden||Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far more uniformly structured than in the general Web, and thus crawlers can use the structure of Web pages for more precise data extraction and more expressive analysis. In this demonstration, we present a focused, structure-based crawler generator, the "Not so Creepy Crawler" (nc2). What sets nc2 apart is that all analysis and decision tasks of the crawling process are delegated to an (arbitrary) XML query engine such as XQuery or Xcerpt. Customizing crawlers just means writing (declarative) XML queries that can access the currently crawled document as well as the metadata of the crawl process. We identify four types of queries that together suffice to realize a wide variety of focused crawlers. We demonstrate nc2 with two applications: the first extracts data about cities from Wikipedia with a customizable set of attributes for selecting and reporting these cities. It illustrates the power of nc2 where data extraction from Wiki-style, fairly homogeneous knowledge sites is required. In contrast, the second use case demonstrates how easy nc2 makes even complex analysis tasks on social networking sites, here exemplified by last.fm.}}


 * -- align="left" valign=top
 * Wang, Gang & Forsyth, David
 * Object image retrieval by exploiting online knowledge resources
 * 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, June 23, 2008 - June 28, 2008, Anchorage, AK, United States
 * 2008
 * 
 * {{hidden||We describe a method to retrieve images found on web pages with specified object class labels, using an analysis of text around the image and of image appearance. Our method determines whether an object is both described in text and appears in an image using a discriminative image model and a generative text model. Our models are learnt by exploiting established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). These resources provide rich text and object appearance information. We describe results on two data sets. The first is Berg's collection of ten animal categories; on this data set, we outperform previous approaches [7, 33]. We have also collected five more categories. Experimental results show the effectiveness of our approach on this new data set.}}


 * -- align="left" valign=top
 * Pedro, Vasco Calais; Niculescu, Radu Stefan & Lita, Lucian Vlad
 * Okinet: Automatic extraction of a medical ontology from Wikipedia
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008, Chicago, IL, United States
 * 2008
 * {{hidden||The medical domain provides a fertile ground for the application of ontological knowledge. Ontologies are an essential part of many approaches to medical text processing, understanding and reasoning. However, the creation of large, high-quality medical ontologies is not trivial, requiring the analysis of domain sources, background knowledge, as well as obtaining consensus among experts. Current methods are labor intensive, prone to generate inconsistencies, and often require expert knowledge. Fortunately, semi-structured information repositories, like Wikipedia, provide a valuable resource from which to mine structured information. In this paper we propose a novel framework for automatically creating medical ontologies from semi-structured data. As part of this framework, we present a Directional Feedback Edge Labeling (DFEL) algorithm. We successfully demonstrate the effectiveness of the DFEL algorithm on the task of labeling the relations of Okinet, a Wikipedia-based medical ontology. Current results demonstrate the high performance, utility, and flexibility of our approach. We conclude by describing ROSE, an application that combines Okinet with other medical ontologies.}}


 * -- align="left" valign=top
 * Ortega, Felipe; Izquierdo-Cortazar, Daniel; Gonzalez-Barahona, Jesus M. & Robles, Gregorio
 * On the analysis of contributions from privileged users in virtual open communities
 * 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009, Waikoloa, HI, United States
 * 2009
 * 
 * {{hidden||Collaborative projects built around virtual communities on the Internet have gained momentum over the last decade. Nevertheless, their rapid growth rate raises some questions: which is the most effective approach to manage and organize their content creation process? Can these communities scale, keeping control of their projects as their size continues to grow over time? To answer these questions, we undertake a quantitative analysis of privileged users in FLOSS development projects and in Wikipedia. From our results, we conclude that the inequality level of user contributions in the two types of initiatives is remarkably distinct, even though both communities present almost identical patterns regarding the number of distinct contributors per file (in FLOSS projects) or per article (in Wikipedia). As a result, totally open projects like Wikipedia can effectively deal with faster growth rates, while FLOSS projects may be affected by bottlenecks on committers who play critical roles.}}


 * -- align="left" valign=top
 * Ortega, Felipe; Gonzalez-Barahona, Jesus M. & Robles, Gregorio
 * On the inequality of contributions to Wikipedia
 * 41st Annual Hawaii International Conference on System Sciences 2008, HICSS, January 7, 2008 - January 10, 2008, Big Island, HI, United States
 * 2008
 * 


 * -- align="left" valign=top
 * Suda, Martin; Weidenbach, Christoph & Wischnewski, Patrick
 * On the saturation of YAGO
 * 5th International Joint Conference on Automated Reasoning, IJCAR 2010, July 16, 2010 - July 19, 2010, Edinburgh, United Kingdom
 * 2010
 * 
 * {{hidden||YAGO is an ontology automatically generated from Wikipedia and WordNet. It is represented in a proprietary flat text file format, and its core comprises 10 million facts and formulas. We present a translation of YAGO into the Bernays-Schonfinkel Horn class with equality. A new variant of the superposition calculus is sound, complete and terminating for this class. Together with extended term indexing data structures, the new calculus is implemented in Spass-YAGO. YAGO can be finitely saturated by Spass-YAGO in about 1 hour. We have found 49 inconsistencies in the originally generated ontology, which we have fixed. Spass-YAGO can then prove non-trivial conjectures with respect to the resulting saturated and consistent clause set of about 1.4 GB in less than one second.}}


 * -- align="left" valign=top
 * Wang, Huan; Jiang, Xing; Chia, Liang-Tien & Tan, Ah-Hwee
 * Ontology enhanced web image retrieval: Aided by Wikipedia spreading activation theory
 * 1st International ACM Conference on Multimedia Information Retrieval, MIR2008, Co-located with the 2008 ACM International Conference on Multimedia, MM'08, August 30, 2008 - August 31, 2008 Vancouver, BC, Canada
 * 2008
 * 


 * -- align="left" valign=top
 * Yu, Jonathan; Thom, James A. & Tam, Audrey
 * Ontology evaluation using Wikipedia categories for browsing
 * Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
 * 2007
 * 


 * -- align="left" valign=top
 * Aleahmad, Turadg; Aleven, Vincent & Kraut, Robert
 * Open community authoring of targeted worked example problems
 * 9th International Conference on Intelligent Tutoring Systems, ITS 2008, June 23, 2008 - June 27, 2008 Montreal, QC, Canada
 * 2008
 * 
 * {{hidden||Open collaborative authoring systems such as Wikipedia are growing in use and impact. How well does this model work for the development of educational resources? In particular, can volunteers contribute materials of sufficient quality? Could they create resources that meet students' specific learning needs and engage their personal characteristics? Our experiment explored these questions using a novel web-based tool for authoring worked examples. Participants were professional teachers (math and non-math) and amateurs. Participants were randomly assigned to the basic tool, or to an enhanced version that prompted authors to create materials for a specific (fictitious) student. We find that while there are differences by teaching status, all three groups make contributions of worth, and that targeting a specific student leads contributors to author materials with greater potential to engage students. The experiment suggests that community authoring of educational resources is a feasible model of development and can enable new levels of personalization.}}


 * -- align="left" valign=top
 * Wu, Fei & Weld, Daniel S.
 * Open information extraction using Wikipedia
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.}}


 * -- align="left" valign=top
 * Utiyama, Masao; Tanimura, Midori & Isahara, Hitoshi
 * Organizing English reading materials for vocabulary learning
 * Proceedings of the ACL 2005 on Interactive poster and demonstration sessions
 * 2005
 * 


 * -- align="left" valign=top
 * Gorgeon, Arnaud & Swanson, E. Burton
 * Organizing the vision for web 2.0: a study of the evolution of the concept in Wikipedia
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 


 * -- align="left" valign=top
 * Basile, Pierpaolo; Gemmis, Marco De; Lops, Pasquale & Semeraro, Giovanni
 * OTTHO: On the tip of my THOught
 * European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009, September 7, 2009 - September 11, 2009, Bled, Slovenia
 * 2009
 * 
 * {{hidden||This paper describes OTTHO (On the Tip of my THOught), a system designed for solving a language game called Guillotine. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. The system exploits several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia, to realize a knowledge infusion process. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability of interpreting natural language documents and reasoning on their content. Our feeling is that the approach presented in this work has great potential for other, more practical applications besides solving a language game.}}


 * -- align="left" valign=top
 * Paşca, Marius
 * Outclassing Wikipedia in open-domain information extraction: weakly-supervised acquisition of attributes over conceptual hierarchies
 * Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
 * 2009
 * 


 * -- align="left" valign=top
 * Gabrilovich, E. & Markovitch, S.
 * Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge
 * Twenty-First National Conference on Artificial Intelligence (AAAI-06). Eighteenth Innovative Applications of Artificial Intelligence Conference (IAAI-06), 16-20 July 2006, Menlo Park, CA, USA
 * 2007
 * {{hidden||When humans approach the task of text categorization, they interpret the specific wording of the document in the much larger context of their background knowledge and experience. On the other hand, state-of-the-art information retrieval systems are quite brittle: they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. For instance, given the sentence "Wal-Mart supply chain goes real time," how can a text categorization system know that Wal-Mart manages its stock with RFID technology? And having read that "Ciprofloxacin belongs to the quinolones group," how on earth can a machine know that the drug mentioned is an antibiotic produced by Bayer? In this paper we present algorithms that can do just that. We propose to enrich document representation through automatic use of a vast compendium of human knowledge: an encyclopedia. We apply machine learning techniques to Wikipedia, the largest encyclopedia to date, which surpasses in scope many conventional encyclopedias and provides a cornucopia of world knowledge. Each Wikipedia article represents a concept}}


 * -- align="left" valign=top
 * Ingawale, Myshkin; Roy, Rahul & Seetharaman, Priya
 * Persistence of Cultural Norms in Online Communities: The Curious Case of WikiLove
 * 2009
 * 
 * 


 * -- align="left" valign=top
 * Mimno, David; Wallach, Hanna M.; Naradowsky, Jason; Smith, David A. & McCallum, Andrew
 * Polylingual topic models
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
 * 2009
 * 


 * -- align="left" valign=top
 * Converse, Tim; Kaplan, Ronald M.; Pell, Barney; Prevost, Scott; Thione, Lorenzo & Walters, Chad
 * Powerset's natural language wikipedia search engine
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008
 * {{hidden||This demonstration shows the capabilities and features of Powerset's natural language search engine as applied to the English Wikipedia. Powerset has assembled scalable document retrieval technology to construct a semantic index of the World Wide Web. In order to develop and test our technology, we have released a search product (at http://www.powerset.com) that incorporates all the information from the English Wikipedia. The product also integrates community-edited content from Metaweb's Freebase database of structured information. Users may query the index using keywords, natural language questions or phrases. Retrieval latency is comparable to standard keyword-based consumer search engines. Powerset semantic indexing is based on the XLE, natural language processing technology licensed from the Palo Alto Research Center (PARC). During both indexing and querying, we apply our deep natural language analysis methods to extract "semantic facts" (relations and semantic connections between words and concepts) from all the sentences in Wikipedia. At query time, advanced search-engineering technology makes these facts available for retrieval by matching them against facts or partial facts extracted from the query. In this demonstration we show how retrieved information is presented as conventional search results with links to relevant Wikipedia pages. We also demonstrate how the distilled semantic relations are organized in a browsing format that shows relevant subject/relation/object triples related to the user's query. This makes it easy both to find other relevant pages and to use our Search-Within-The-Page feature to localize additional semantic searches to the text of the selected target page. Together, these features summarize the facts on a page and allow navigation directly to information of interest to individual users. Looking ahead, beyond continuous improvements to core search and scaling to much larger collections of content, Powerset's automatic extraction of semantic facts can be used to create and extend knowledge resources, including lexicons and ontologies.}}


 * -- align="left" valign=top
 * Leskovec, Jure; Huttenlocher, Daniel & Kleinberg, Jon
 * Predicting positive and negative links in online social networks
 * Proceedings of the 19th international conference on World wide web
 * 2010
 * 


 * -- align="left" valign=top
 * Wissner-Gross, Alexander D.
 * Preparation of topical reading lists from the link structure of Wikipedia
 * 6th International Conference on Advanced Learning Technologies, ICALT 2006, July 5, 2006 - July 7, 2006 Kerkrade, Netherlands
 * 2006
 * {{hidden||Personalized reading preparation poses an important challenge for education and continuing education. Using a PageRank derivative and graph distance ordering, we show that personalized background reading lists can be generated automatically from the link structure of Wikipedia. We examine the operation of our new tool in professional, student, and interdisciplinary researcher learning models. Additionally, we present desktop and mobile interfaces for the generated reading lists.}}


 * -- align="left" valign=top
 * Dondio, Pierpaolo & Barrett, Stephen
 * Presumptive selection of trust evidence
 * Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
 * 2007
 * 


 * -- align="left" valign=top
 * Moon, Hyun J.; Curino, Carlo A.; Ham, Myungwon & Zaniolo, Carlo
 * PRIMA: archiving and querying historical data with evolving schemas
 * Proceedings of the 35th SIGMOD international conference on Management of data
 * 2009
 * 
 * {{hidden||Schema evolution poses serious challenges in historical data management. Traditionally, historical data have been archived either by (i) migrating them into the current schema version, which is well understood by users but compromises archival quality, or (ii) maintaining them under the schema version in which the data was originally created, leading to perfect archival quality but forcing users to formulate queries against complex histories of evolving schemas. In the PRIMA system, we achieve the best of both approaches, by (i) archiving historical data under the schema version under which they were originally created, and (ii) letting users express temporal queries using the current schema version. Thus, in PRIMA, the system rewrites the queries to the (potentially many) pertinent versions of the evolving schema. Moreover, the system offers automatic documentation of the schema history, and allows the users to pose temporal queries over the metadata history itself. The proposed demonstration highlights the system features exploiting both a synthetic-educational running example and real-life evolution histories (schemas and data), which include hundreds of schema versions from Wikipedia and Ensembl. The demonstration offers a thorough walk-through of the system features and a hands-on system testing phase, where the audience is invited to directly interact with the advanced query interface of PRIMA.}}


 * -- align="left" valign=top
 * Riehle, Dirk & Noble, James
 * Proceedings of the 2006 international symposium on Wikis
 * HT '06 17th Conference on Hypertext and Hypermedia
 * 2006
 * 
 * {{hidden||It is our great pleasure to welcome you to the 2nd International Symposium on Wikis -- WikiSym 2006. As its name suggests, this is the second meeting of the academic community of research, practice, and use that has built up around lightweight editable web sites. Growing from Ward Cunningham's original wiki -- the Portland Pattern Repository -- up to Wikipedia, one of the largest sites on the Web (over 1.2 million pages in English alone at time of writing), wikis are rapidly becoming commonplace in the experience of millions of people every day. The Wiki Symposium exists to report new developments in wiki technology, to describe the state of practice, and to reflect on that experience. The call for papers for this year's symposium attracted 27 submissions of research papers and practitioner reports, on a wide variety of topics. Ten research papers and one practitioner report were accepted by the symposium programme committee and will be presented at the symposium: these presentations make up the symposium's technical programme. Every submitted paper was reviewed by at least three members of the programme committee, while papers submitted by committee members were reviewed by at least four committee members who had not themselves submitted papers to the symposium. The symposium proceedings also include abstracts for the keynote talks, panels, workshops, and demonstrations, to provide a record of the whole of the symposium, as well as an interview with Angela Beesley, the opening keynoter, on the topic of her talk.}}


 * -- align="left" valign=top
 * Aguiar, Ademar & Bernstein, Mark
 * Proceedings of the 4th International Symposium on Wikis
 * WikiSym08 2008 International Symposium on Wikis
 * 2008
 * 
 * {{hidden||Welcome to the 4th International Symposium on Wikis, Porto, Portugal. The Faculty of Engineering of the University of Porto (FEUP) is honoured to host on its campus this year's WikiSym -- the premier conference devoted to research and practice on wikis. Once again, WikiSym will gather researchers, professionals, writers, scholars and users to share knowledge and experiences on many topics related to wikis and wiki philosophy, ranging from wiki linguistics to graphic visualization in wikis, and from the vast Wikipedia to tiny location-based wikis. A large and diverse program committee exhaustively reviewed more than fifty technical papers, from which the highest-quality ones were selected. During a meeting at Porto, the final structure of the conference program was defined, and later consolidated with workshops, panels, tutorials, posters and demos. For the first time, there will be a Doctoral Space for young researchers, and the WikiFest, devoted to practitioners. More than twenty hours of OpenSpace will be available for you to fill in. After the conference, WikiWalk will take the symposium out into the streetscape, as investigators and users walk through the city of Porto, joined by citizens, journalists, and other leaders. Casual and spontaneous discussions will allow users to share their experiences, concerns and challenges with wiki experts and researchers. Such a diverse conference program reflects the nature of wikis, the tremendous vitality of the wiki spirit, and its ever-widening community.}}


 * -- align="left" valign=top
 * Ayers, Phoebe & Ortega, Felipe
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * WikiSym '10 2010 International Symposium on Wikis and Open Collaboration
 * 2010
 * 
 * {{hidden||Welcome to WikiSym 2010, the 6th International Symposium on Wikis and Open Collaboration! WikiSym 2010 is located in the picturesque city of Gdansk, Poland, at the Dom Muzyka, a historic music academy. The event includes 3 days of cutting-edge research and practice on topics related to open collaboration. These proceedings of WikiSym 2010 are intended to act as a permanent record of the conference activities. This year, for the first time, WikiSym is co-located with Wikimania 2010, the international community conference of the Wikimedia Foundation projects, which is taking place right after WikiSym. The general program of WikiSym 2010 builds on the success of previous years, formally embracing different aspects of open collaboration research and practice. To support this, for the first time the program is divided into 3 complementary tracks, each focusing on a specific area of interest in this field. The Wiki Track includes contributions specifically dealing with research, deployment, use and management of wiki platforms and the communities around them. The Industry Track draws together practitioners, entrepreneurs and industry managers and employees to better understand open collaboration ecosystems in corporate environments. Finally, the Open Collaboration Track comprises all other aspects related to open cooperative initiatives and projects. Related to this, you will find a growing number of contributions dealing with nontechnical perspectives of open collaboration, such as debates on educational resources and sociopolitical aspects. You will also find the traditional technical papers, plus tutorials, workshops, panels and demos. The success of the new broadened scope of WikiSym reflects the very high interest in wikis and open collaboration existing today. Cliff Lampe from Michigan State University will be opening the symposium with a talk on "The Machine in the Ghost: a Socio-Technical Systems Approach to User-Generated Content Research". Likewise, Andrew Lih will be giving the closing keynote session on "What Hath Wikipedia Wrought". These represent only two of the talks and sessions that attendees will find at WikiSym 2010. Forty-one research papers were submitted this year to the academic program and sixteen were accepted, for an acceptance rate of 39%. All papers were reviewed by at least three reviewers, though some had up to five different reviewers. Authors of accepted papers come from 18 different countries.}}


 * -- align="left" valign=top
 * Nack, Frank
 * Proceedings of the ACM workshop on Multimedia for human communication: from capture to convey
 * MHC '05, 1st ACM International Workshop on Multimedia for Human Communication
 * 2005
 * 
 * {{hidden||It gives us great pleasure to welcome you to the 1st ACM International Workshop on Multimedia for Human Communication -- From Capture to Convey (MHC'05). This workshop was inspired by the Dagstuhl meeting 05091 "Multimedia Research -- where do we need to go tomorrow" (http://www.dagstuhl.de/05091/), organised by Susanne Boll, Ramesh Jain, Tat-Seng Chua and Nevenka Dimitrova. Members of the working group were: Lynda Hardman, Brigitte Kerhervé, Stephen Kimani}}


 * -- align="left" valign=top
 * Gil, Yolanda & Noy, Natasha
 * Proceedings of the fifth international conference on Knowledge capture
 * K--CAP '09 Fifth International Conference on Knowledge Capture 2009
 * 2009
 * 
 * {{hidden||In today's knowledge-driven world, effective access to and use of information is a key enabler for progress. Modern technologies not only are themselves knowledge-intensive technologies, but also produce enormous amounts of new information that we must process and aggregate. These technologies require knowledge capture, which involves the extraction of useful knowledge from vast and diverse sources of information as well as its acquisition directly from users. Driven by the demands of knowledge-based applications and the unprecedented availability of information on the Web, the study of knowledge capture has a renewed importance. This volume presents the papers and poster and demo descriptions for the Fifth International Conference on Knowledge Capture (K-CAP 2009). K-CAP 2009 brought together researchers belonging to several distinct research communities, including knowledge engineering, machine learning, natural language processing, human-computer interaction, artificial intelligence and the Semantic Web. This year's conference continues its tradition of being the premier forum for presentation of research results and experience reports on leading-edge issues of knowledge capture. The call for papers attracted 81 submissions from Asia, Europe, and North America. The international program committee accepted 21 papers that cover a variety of topics, including research on knowledge extraction, ontologies and vocabularies, interactive systems, evaluation of knowledge-based systems, and other topics. In addition, this volume includes descriptions of 21 posters and demos that were presented at the conference. The K-CAP 2009 program included two keynote talks: Professor Daniel Weld gave the keynote address entitled "Machine Reading: from Wikipedia to the Web", and Professor Nigel Shadbolt talked about "Web Science: A New Frontier" in his keynote address. Two tutorials and four workshops rounded out the conference program. We hope that these proceedings will serve as a valuable reference for researchers and developers.}}


 * -- align="left" valign=top
 * Proceedings of WikiSym 2010 - The 6th International Symposium on Wikis and Open Collaboration
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland
 * 2010
 * {{hidden||The proceedings contain 35 papers. The topics discussed include: who integrates the networks of knowledge in Wikipedia?; deep hypertext with embedded revision control implemented in regular expressions; semantic search on heterogeneous wiki systems; wikis at work: success factors and challenges for sustainability of enterprise wikis; model-aware wiki analysis tools: the case of HistoryFlow; ThinkFree: using a visual wiki for IT knowledge management in a tertiary institution; openness as an asset: a classification system for online communities based on actor-network theory; the Austrian way of wiki(pedia)! development of a structured wiki-based encyclopedia within a local Austrian context; a wiki-based collective intelligence approach to formulate a body of knowledge (BOK) for a new discipline; project management in the Wikipedia community; a taxonomy of wiki genres in enterprise settings; and towards sensitive information redaction in a collaborative multilevel security environment.}}


 * -- align="left" valign=top
 * Proceedings of WikiSym'06 - 2006 International Symposium on Wikis
 * WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark
 * 2006
 * {{hidden||The proceedings contain 26 papers. The topics discussed include: how and why wikipedia works; how and why wikipedia works: an interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko; intimate information: organic hypertext structure and incremental; the augmented wiki; wiki uses in teaching and learning; the future of wikis; translation the wiki way; the radeox wiki render engine; is there a space for the teacher in a WIKI?; wikitrails: augmenting wiki structure for collaborative, interdisciplinary learning; towards wikis as semantic hypermedia; constrained wiki: an oxymoron?; corporate wiki users: results of a survey; workshop on wikipedia research; wiki markup standard workshop; wiki-based knowledge engineering: second workshop on semantic wikis; semantic wikipedia; and ontowiki: community-driven ontology engineering and ontology usage based on wikis.}}


 * -- align="left" valign=top
 * Ung, Hang & Dalle, Jean-Michel
 * Project management in the Wikipedia community
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 


 * -- align="left" valign=top
 * Yin, Xiaoshi; Huang, Xiangji & Li, Zhoujun
 * Promoting Ranking Diversity for Biomedical Information Retrieval Using Wikipedia
 * Advances in Information Retrieval. 32nd European Conference on IR Research, ECIR 2010, 28-31 March 2010 Berlin, Germany
 * 2010
 * {{hidden||In this paper, we propose a cost-based re-ranking method to promote ranking diversity for biomedical information retrieval. The proposed method concerns finding passages that cover many different aspects of a query topic. First, aspects covered by retrieved passages are detected and explicitly presented by Wikipedia concepts. Then, an aspect filter based on a two-stage model is introduced. It ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, retrieved passages are re-ranked using the proposed cost-based re-ranking method which ranks a passage according to the number of new aspects covered by the passage and the query-relevance of aspects covered by the passage. A series of experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed method in promoting ranking diversity for biomedical information retrieval.}}


 * -- align="left" valign=top
 * Yin, Xiaoshi; Huang, Xiangji & Li, Zhoujun
 * Promoting ranking diversity for biomedical information retrieval using wikipedia
 * 32nd European Conference on Information Retrieval, ECIR 2010, March 28, 2010 - March 31, 2010 Milton Keynes, United Kingdom
 * 2010
 * 
 * {{hidden||In this paper, we propose a cost-based re-ranking method to promote ranking diversity for biomedical information retrieval. The proposed method concerns finding passages that cover many different aspects of a query topic. First, aspects covered by retrieved passages are detected and explicitly presented by Wikipedia concepts. Then, an aspect filter based on a two-stage model is introduced. It ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, retrieved passages are re-ranked using the proposed cost-based re-ranking method which ranks a passage according to the number of new aspects covered by the passage and the query-relevance of aspects covered by the passage. A series of experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed method in promoting ranking diversity for biomedical information retrieval. 2010 Springer-Verlag Berlin Heidelberg.}}

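The cost-based re-ranking idea the two entries above describe can be sketched as a greedy loop: each passage is scored by the query relevance of the aspects it covers, plus a bonus for aspects no higher-ranked passage has covered yet. Everything below (the function name, the data shapes, the `novelty_weight` parameter) is an illustrative assumption, not the paper's actual cost model.

```python
def rerank_by_aspect_coverage(passages, aspect_relevance, novelty_weight=0.5):
    """Greedily re-rank passages for aspect diversity.

    passages: list of (passage_id, set_of_aspects) pairs
    aspect_relevance: {aspect: P(aspect | query)} from an upstream aspect filter
    """
    remaining = list(passages)
    covered, ranking = set(), []
    while remaining:
        def gain(item):
            _, aspects = item
            # base score: total query relevance of the passage's aspects
            rel = sum(aspect_relevance.get(a, 0.0) for a in aspects)
            # novelty bonus: relevance of aspects not yet covered above
            new = sum(aspect_relevance.get(a, 0.0) for a in aspects - covered)
            return rel + novelty_weight * new
        best = max(remaining, key=gain)
        remaining.remove(best)
        ranking.append(best[0])
        covered |= best[1]
    return ranking
```

A passage covering only already-seen aspects loses its novelty bonus, so a lower-relevance but novel passage can overtake it, which is the diversity effect the abstract aims for.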

 * -- align="left" valign=top
 * Luther, Kurt; Flaschen, Matthew; Forte, Andrea; Jordan, Christopher & Bruckman, Amy
 * ProveIt: a new tool for supporting citation in MediaWiki
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 
 * {{hidden||ProveIt is an extension to the Mozilla Firefox browser designed to support editors in citing sources in Wikipedia and other projects that use the MediaWiki platform.}}


 * -- align="left" valign=top
 * Nakatani, Makoto; Jatowt, Adam; Ohshima, Hiroaki & Tanaka, Katsumi
 * Quality evaluation of search results by typicality and speciality of terms extracted from wikipedia
 * 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009, April 21, 2009 - April 23, 2009 Brisbane, QLD, Australia
 * 2009
 * 


 * -- align="left" valign=top
 * Reinoso, Antonio J.; Gonzalez-Barahona, Jesus M.; Ortega, Felipe & Robles, Gregorio
 * Quantitative analysis and characterization of Wikipedia requests
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 


 * -- align="left" valign=top
 * Ortega, Felipe & Barahona, Jesus M. Gonzalez
 * Quantitative analysis of the wikipedia community of users
 * Proceedings of the 2007 international symposium on Wikis
 * 2007
 * 


 * -- align="left" valign=top
 * Xu, Yang; Jones, Gareth J.F. & Wang, Bin
 * Query dependent pseudo-relevance feedback based on wikipedia
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 
 * {{hidden||Pseudo-relevance feedback (PRF) via query-expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.}}


 * -- align="left" valign=top
 * Xu, Yang; Jones, G.J.F. & Wang, Bin
 * Query dependent pseudo-relevance feedback based on Wikipedia
 * 32nd Annual ACM SIGIR Conference. SIGIR 2009, 19-23 July 2009 New York, NY, USA
 * 2009
 * {{hidden||Pseudo-relevance feedback (PRF) via query-expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.}}

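The generic PRF step underlying the two entries above can be sketched as follows: take the top-ranked (pseudo-relevant) documents, pick the most frequent non-query terms, and append them to the query. The paper's actual contribution, Wikipedia-based query classification and three Wikipedia-aware term-selection models, is not reproduced here; all names below are hypothetical.

```python
from collections import Counter

def select_expansion_terms(query, feedback_docs, k=5):
    """Pick k expansion terms from pseudo-relevant documents by raw frequency,
    skipping terms already in the query. This is the baseline PRF idea the
    abstract starts from, not the paper's Wikipedia-based selection."""
    qterms = set(query.lower().split())
    counts = Counter(
        w for doc in feedback_docs for w in doc.lower().split() if w not in qterms
    )
    return [w for w, _ in counts.most_common(k)]

def expand_query(query, feedback_docs, k=5):
    """Append the selected expansion terms to the original query string."""
    return query + " " + " ".join(select_expansion_terms(query, feedback_docs, k))
```

The noise problem the abstract mentions is visible here: if a top-ranked document is off-topic, its frequent terms enter `counts` and can crowd out genuinely relevant expansion terms.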

 * -- align="left" valign=top
 * Kanhabua, Nattiya & Nørvåg, Kjetil
 * QUEST: Query expansion using synonyms over time
 * European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010, September 20, 2010 - September 24, 2010 Barcelona, Spain
 * 2010
 * 
 * {{hidden||A particular problem of searching news archives with named entities is that they are very dynamic in appearance compared to other vocabulary terms, and synonym relationships between terms change with time. In previous work, we proposed an approach to extracting time-based synonyms of named entities from the whole history of Wikipedia. In this paper, we present QUEST (Query Expansion using Synonyms over Time), a system that exploits time-based synonyms in searching news archives. The system takes as input a named entity query, and automatically determines time-based synonyms for a given query w.r.t. time criteria. Query expansion using the determined synonyms can be employed in order to improve the retrieval effectiveness. 2010 Springer-Verlag Berlin Heidelberg.}}

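Time-based synonym expansion as described in the QUEST entry above can be sketched with synonym records carrying validity spans: a synonym is only used for expansion when the query's time criterion falls inside its span. The record shape and function names below are assumptions for illustration; QUEST mines such spans from Wikipedia's edit history.

```python
def synonyms_at(entity, synonym_records, query_time):
    """Return synonyms of `entity` valid at `query_time`.

    synonym_records: {entity: [(synonym, start_year, end_year), ...]}
    (an assumed record shape standing in for QUEST's mined synonym spans).
    """
    return [s for s, start, end in synonym_records.get(entity, [])
            if start <= query_time <= end]

def expand_with_time(query_entity, synonym_records, query_time):
    """The expanded query: the entity plus its time-valid synonyms."""
    return [query_entity] + synonyms_at(query_entity, synonym_records, query_time)
```

The point of the time filter is that a synonym valid in one period (say, a person's earlier name) should expand queries about that period but not queries about a later one.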

 * -- align="left" valign=top
 * Zaragoza, Hugo; Rode, Henning; Mika, Peter; Atserias, Jordi; Ciaramita, Massimiliano & Attardi, Giuseppe
 * Ranking very many typed entities on wikipedia
 * Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
 * 2007
 * 


 * -- align="left" valign=top
 * Heilman, Michael & Smith, Noah A.
 * Rating computer-generated questions with Mechanical Turk
 * Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
 * 2010
 * 


 * -- align="left" valign=top
 * Nguyen, Dat P. T.; Matsuo, Yutaka & Ishizuka, Mitsuru
 * Relation extraction from Wikipedia using subtree mining
 * AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada
 * 2007
 * {{hidden||The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia's English articles, which in turn can serve for intelligent systems to satisfy users' information needs. Our proposed method first anchors the appearance of entities in Wikipedia articles using some heuristic rules that are supported by their encyclopedic style. Therefore, it uses neither the Named Entity Recognizer (NER) nor the Coreference Resolution tool, which are sources of errors for relation extraction. It then classifies the relationships among entity pairs using SVM with features extracted from the web structure and subtrees mined from the syntactic structure of text. The innovations behind our work are the following: a) our method makes use of Wikipedia characteristics for entity allocation and entity classification, which are essential for relation extraction; b) our algorithm extracts a core tree, which accurately reflects a relationship between a given entity pair, and subsequently identifies key features with respect to the relationship from the core tree. We demonstrate the effectiveness of our approach through evaluation of manually annotated data from actual Wikipedia articles. Copyright 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.}}

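The "core tree" notion in the entry above, the minimal syntactic subtree connecting two entity mentions, can be illustrated with a toy parent-pointer tree: walk from each entity to the root, stop at the lowest common ancestor, and join the two paths. The tree encoding and function name are illustrative assumptions; the paper works on full parses and mines subtree patterns from them.

```python
def core_tree_path(parents, a, b):
    """Return the node path from entity a to entity b through their lowest
    common ancestor in a tree given as a {child: parent} map.

    This is the skeleton of the paper's "core tree" idea; raises StopIteration
    if a and b are not in the same tree (acceptable for a sketch).
    """
    def ancestors(n):
        chain = [n]
        while n in parents:
            n = parents[n]
            chain.append(n)
        return chain
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(n for n in up_a if n in up_b)  # lowest common ancestor
    left = up_a[:up_a.index(common) + 1]          # a up to the LCA
    right = up_b[:up_b.index(common)]             # b up to (not incl.) the LCA
    return left + right[::-1]
```

Features mined from this connecting path (words, node order, subtree shapes) are the kind of input an SVM relation classifier like the one in the abstract would consume.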

 * -- align="left" valign=top
 * Utiyama, Masao & Yamamoto, Mikio
 * Relevance feedback models for recommendation
 * Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
 * 2006
 * 
 * {{hidden||We extended language modeling approaches in information retrieval (IR) to combine collaborative filtering (CF) and content-based filtering (CBF). Our approach is based on the analogy between IR and CF, especially between CF and relevance feedback (RF). Both CF and RF exploit users' preference/relevance judgments to recommend items. We first introduce a multinomial model that combines CF and CBF in a language modeling framework. We then generalize the model to another multinomial model that approximates the Polya distribution. This generalized model outperforms the multinomial model by 3.4% for CBF and 17.4% for CF in recommending English Wikipedia articles. The performance of the generalized model for three different datasets was comparable to that of a state-of-the-art item-based CF method.}}

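The "multinomial model that combines CF and CBF in a language modeling framework" from the entry above can be sketched as a linear interpolation of two term distributions, one estimated from users' preferences and one from item content, scored in log space. The interpolation weight `lam` and all names are assumptions; the paper's generalized Polya-approximating model is not reproduced here.

```python
import math

def mixture_score(item_terms, cf_model, cbf_model, lam=0.5, eps=1e-9):
    """Log-probability of an item's terms under a linear interpolation of a
    collaborative term distribution (cf_model) and a content-based one
    (cbf_model), each a {term: probability} dict. `eps` avoids log(0)."""
    score = 0.0
    for t in item_terms:
        p = lam * cf_model.get(t, 0.0) + (1 - lam) * cbf_model.get(t, 0.0)
        score += math.log(p + eps)
    return score
```

Items whose terms are likely under either distribution score higher, which is how the mixture lets content evidence back off to collaborative evidence and vice versa.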

 * -- align="left" valign=top
 * Collins, Allan M.
 * Rethinking Education in the Age of Technology
 * Proceedings of the 9th international conference on Intelligent Tutoring Systems
 * 2008
 * 


 * -- align="left" valign=top
 * Elsas, Jonathan L.; Arguello, Jaime; Callan, Jamie & Carbonell, Jaime G.
 * Retrieval and feedback models for blog feed search
 * 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008, July 20, 2008 - July 24, 2008 Singapore, Singapore
 * 2008
 * 
 * {{hidden||Blog feed search poses different and interesting challenges from traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents, the blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing submissions in the TREC 2007 Blog Distillation task [12]. We also show that typical query expansion techniques such as pseudo-relevance feedback using the blog corpus do not provide any significant performance improvement and in many cases dramatically hurt performance. We perform an in-depth analysis of the behavior of pseudo-relevance feedback for this task and develop a novel query expansion technique using the link structure in Wikipedia. This query expansion technique provides significant and consistent performance improvements for this task, yielding a 22% and 14% improvement in MAP over the unexpanded query for our baseline and federated algorithms respectively.}}


 * -- align="left" valign=top
 * Jordan, Chris & Watters, Carolyn
 * Retrieval of single Wikipedia articles while reading abstracts
 * 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009 Waikoloa, HI, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Jordan, Chris & Watters, Carolyn
 * Retrieval of Single Wikipedia Articles While Reading Abstracts
 * Proceedings of the 42nd Hawaii International Conference on System Sciences
 * 2009
 * 
 * 


 * -- align="left" valign=top
 * Schütt, Thorsten; Schintke, Florian & Reinefeld, Alexander
 * Scalaris: reliable transactional p2p key/value store
 * Proceedings of the 7th ACM SIGPLAN workshop on ERLANG
 * 2008
 * 
 * {{hidden||We present Scalaris, an Erlang implementation of a distributed key/value store. It uses, on top of a structured overlay network, replication for data availability and majority-based distributed transactions for data consistency. In combination, this implements the ACID properties on a scalable structured overlay. By directly mapping the keys to the overlay without hashing, arbitrary key ranges can be assigned to nodes, thereby allowing better load balancing than would be possible with traditional DHTs. Consequently, Scalaris can be tuned for fast data access by taking, e.g., the nodes' geographic location or the regional popularity of certain keys into account. This improves Scalaris' lookup speed in datacenter or cloud computing environments. Scalaris is implemented in Erlang. We describe the Erlang software architecture, including the transactional Java interface to access Scalaris. Additionally, we present a generic design pattern to implement a responsive server in Erlang that serializes update operations on a common state, while concurrently performing fast asynchronous read requests on the same state. As a proof of concept we implemented a simplified Wikipedia frontend and attached it to the Scalaris data store backend. Wikipedia is a challenging application. It requires - besides thousands of concurrent read requests per second - serialized, consistent write operations. For Wikipedia's category and backlink pages, keys must be consistently changed within transactions. We discuss how these features are implemented in Scalaris and show its performance.}}


 * -- align="left" valign=top
 * Forte, Andrea & Bruckman, Amy
 * Scaling consensus: Increasing decentralization in Wikipedia governance
 * 41st Annual Hawaii International Conference on System Sciences 2008, HICSS, January 7, 2008 - January 10, 2008 Big Island, HI, United States
 * 2008
 * 


 * -- align="left" valign=top
 * Ukkonen, Antti; Castillo, Carlos; Donato, Debora & Gionis, Aristides
 * Searching the Wikipedia with contextual information
 * 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United States
 * 2008
 * 
 * {{hidden||We propose a framework for searching the Wikipedia with contextual information. Our framework extends the typical keyword search, by considering queries of the type ⟨q, p⟩, where q is a set of terms (as in classical Web search), and p is a source Wikipedia document. The query terms q represent the information that the user is interested in finding, and the document p provides the context of the query. The task is to rank other documents in Wikipedia with respect to their relevance to the query terms q given the context document p. By associating a context to the query terms, the search results of a search initiated in a particular page can be made more relevant. We suggest a number of features that extend the classical query-search model so that the context document p is considered. We then use RankSVM (Joachims 2002) to learn weights for the individual features given suitably constructed training data. Documents are ranked at query time using the inner product of the feature and the weight vectors. The experiments indicate that the proposed method considerably improves results obtained by a more traditional approach that does not take the context into account.}}


 * -- align="left" valign=top
 * Bast, Holger; Suchanek, Fabian & Weber, Ingmar
 * Semantic full-text search with ESTER: Scalable, easy, fast
 * IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008, December 15, 2008 - December 19, 2008 Pisa, Italy
 * 2008
 * 
 * {{hidden||We present a demo of ESTER, a search engine that combines the ease of use, speed and scalability of full-text search with the powerful semantic capabilities of ontologies. ESTER supports full-text queries, ontological queries and combinations of these, yet its interface is as easy as can be: a standard search field with semantic information provided interactively as one types. ESTER works by reducing all queries to two basic operations: prefix search and join, which can be implemented very efficiently in terms of both processing time and index space. We demonstrate the capabilities of ESTER on a combination of the English Wikipedia with the Yago ontology, with response times below 100 milliseconds for most queries, and an index size of about 4 GB. The system can be run both stand-alone and as a Web application.}}


 * -- align="left" valign=top
 * Völkel, Max; Krötzsch, Markus; Vrandecic, Denny; Haller, Heiko & Studer, Rudi
 * Semantic Wikipedia
 * Proceedings of the 15th international conference on World Wide Web
 * 2006
 * 
 * {{hidden||Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But in spite of its utility, its contents are barely machine-interpretable. Structural knowledge, e.g. about how concepts are interrelated, can neither be formally stated nor automatically processed. Also the wealth of numerical data is only available as plain text and thus cannot be processed by its actual meaning. We provide an extension to be integrated in Wikipedia that allows the typing of links between articles and the specification of typed data inside the articles in an easy-to-use manner. By enabling even casual users to participate in the creation of an open semantic knowledge base, Wikipedia has the chance to become a resource of semantic statements, hitherto unknown regarding size, scope, openness, and internationalisation. These semantic enhancements bring to Wikipedia benefits of today's semantic technologies: more specific ways of searching and browsing. Also, the RDF export, which gives direct access to the formalised knowledge, opens Wikipedia up to a wide range of external applications that will be able to use it as a background knowledge base. In this paper, we present the design, implementation, and possible uses of this extension.}}


 * -- align="left" valign=top
 * Haller, Heiko; Krötzsch, Markus; Völkel, Max & Vrandecic, Denny
 * Semantic Wikipedia
 * Proceedings of the 2006 international symposium on Wikis
 * 2006
 * 
 * {{hidden||Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But its contents are barely machine-interpretable. Structural knowledge, e.g. about how concepts are interrelated, can neither be formally stated nor automatically processed. Also the wealth of numerical data is only available as plain text and thus cannot be processed by its actual meaning. We provide an extension to be integrated in Wikipedia that allows even casual users the typing of links between articles and the specification of typed data inside the articles. Wiki users profit from more specific ways of searching and browsing. Each page has an RDF export, which gives direct access to the formalised knowledge. This allows applications to use Wikipedia as a background knowledge base.}}


 * -- align="left" valign=top
 * Demartini, G.; Firan, C.S.; Iofciu, T. & Nejdl, W.
 * Semantically enhanced entity ranking
 * Web Information Systems Engineering - WISE 2008. 9th International Conference, 1-3 Sept. 2008 Berlin, Germany
 * 2008
 * 
 * {{hidden||Users often want to find entities instead of just documents, i.e., finding documents entirely about specific real-world entities rather than general documents where the entities are merely mentioned. Searching for entities in Web-scale repositories is still an open challenge as the effectiveness of ranking is usually not satisfactory. Semantics can be used in this context to improve the results by leveraging entity-driven ontologies. In this paper we propose three categories of algorithms for query adaptation, using (1) semantic information, (2) NLP techniques, and (3) link structure, to rank entities in Wikipedia. Our approaches focus on constructing queries using not only keywords but also additional syntactic information, while semantically relaxing the query relying on a highly accurate ontology. The results show that our approaches perform effectively, and that the combination of simple NLP, link analysis and semantic techniques improves the retrieval performance of entity search.}}


 * -- align="left" valign=top
 * Filippova, Katja & Strube, Michael
 * Sentence fusion via dependency graph compression
 * Proceedings of the Conference on Empirical Methods in Natural Language Processing
 * 2008
 * 
 * {{hidden||We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay \& McKeown (2005) with respect to readability.}}


 * -- align="left" valign=top
 * Sakai, Tetsuya & Nogami, Kenichi
 * Serendipitous search via Wikipedia: a query log analysis
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 


 * -- align="left" valign=top
 * Ciglan, Marek & Nørvåg, Kjetil
 * SGDB - Simple graph database optimized for activation spreading computation
 * 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010, April 1, 2010 - April 4, 2010 Tsukuba, Japan
 * 2010
 * 
 * {{hidden||In this paper, we present SGDB, a graph database with a storage model optimized for computation of Spreading Activation (SA) queries. The primary goal of the system is to minimize the execution time of the spreading activation algorithm over large graph structures stored on persistent media, without pre-loading the whole graph into memory. We propose a storage model aiming to minimize the number of accesses to the storage media during execution of SA, and we propose a graph query type for the activation spreading operation. Finally, we present the implementation and its performance characteristics in the scope of our pilot application, which uses activation spreading over the Wikipedia link graph.}}


 * -- align="left" valign=top
 * Novotney, Scott & Callison-Burch, Chris
 * Shared task: crowdsourced accessibility elicitation of Wikipedia articles
 * Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
 * 2010
 * 


 * -- align="left" valign=top
 * Parton, Kristen; McKeown, Kathleen R.; Allan, James & Henestroza, Enrique
 * Simultaneous multilingual search for translingual information retrieval
 * Proceeding of the 17th ACM conference on Information and knowledge management
 * 2008
 * 
 * {{hidden||We consider the problem of translingual information retrieval, where monolingual searchers issue queries in a different language than the document language(s) and the results must be returned in the language they know, the query language. We present a framework for translingual IR that integrates document translation and query translation into the retrieval model. The corpus is represented as an aligned, jointly indexed "pseudo-parallel" corpus, where each document contains the text of the document along with its translation into the query language. The queries are formulated as multilingual structured queries, where each query term and its translations into the document language(s) are treated as synonym sets. This model leverages simultaneous search in multiple languages against jointly indexed documents to improve the accuracy of results over search using document translation or query translation alone. For query translation we compared a statistical machine translation (SMT) approach to a dictionary-based approach. We found that using a Wikipedia-derived dictionary for named entities combined with an SMT-based dictionary worked better than SMT alone. Simultaneous multilingual search also has other important features suited to translingual search since it can provide an indication of poor document translation when a match with the source document is found. We show how close integration of CLIR and SMT allows us to improve result translation in addition to IR results.}}


 * -- align="left" valign=top
 * Blumenstock, Joshua E.
 * Size matters: word count as a measure of quality on Wikipedia
 * Proceeding of the 17th international conference on World Wide Web
 * 2008
 * 


 * -- align="left" valign=top
 * Pirolli, Peter; Wollny, Evelin & Suh, Bongwon
 * So you know you're getting the best possible information: a tool that increases Wikipedia credibility
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 
 * {{hidden||An experiment was conducted to study how credibility judgments about Wikipedia are affected by providing users with an interactive visualization (WikiDashboard) of article and author editing history. Overall, users who self-reported higher use of Internet information and higher rates of Wikipedia usage tended to produce lower credibility judgments about Wikipedia articles and authors. However, use of WikiDashboard significantly increased article and author credibility judgments, with effect sizes larger than any other measured effects of background media usage and attitudes on Wikipedia credibility. The results suggest that increased exposure to the editing/authoring histories of Wikipedia increases credibility judgments.}}


 * -- align="left" valign=top
 * Rodrigues, Eduarda Mendes & Milic-Frayling, Natasa
 * Socializing or knowledge sharing?: characterizing social intent in community question answering
 * Proceeding of the 18th ACM conference on Information and knowledge management
 * 2009
 * 


 * -- align="left" valign=top
 * Atzenbeck, Claus & Hicks, David L.
 * Socs: increasing social and group awareness for Wikis by example of Wikipedia
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 


 * -- align="left" valign=top
 * West, Andrew G.; Kannan, Sampath & Lee, Insup
 * Spatio-temporal analysis of Wikipedia metadata and the STiki anti-vandalism tool
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 
 * {{hidden||The bulk of Wikipedia anti-vandalism tools require natural language processing over the article or diff text. However, our prior work demonstrated the feasibility of using spatio-temporal properties to locate malicious edits. STiki is a real-time, on-Wikipedia tool leveraging this technique. The associated poster reviews STiki's methodology and performance. We find competing anti-vandalism tools inhibit maximal performance. However, the tool proves particularly adept at mitigating long-term embedded vandalism. Further, its robust and language-independent nature makes it well-suited for use in less-patrolled wiki installations.}}


 * -- align="left" valign=top
 * Lim, Ee-Peng; Maureen; Ibrahim, Nelman Lubis; Sun, Aixin; Datta, Anwitaman & Chang, Kuiyu
 * SSnetViz: a visualization engine for heterogeneous semantic social networks
 * Proceedings of the 11th International Conference on Electronic Commerce
 * 2009
 * 
 * {{hidden||SSnetViz is an ongoing research effort to design and implement a visualization engine for heterogeneous semantic social networks. A semantic social network is a multi-modal network that contains nodes representing different types of people or object entities, and edges representing relationships among them. When multiple heterogeneous semantic social networks are to be visualized together, SSnetViz provides a suite of functions to store heterogeneous semantic social networks and to integrate them for searching and analysis. We will illustrate these functions using social networks related to terrorism research, one crafted by domain experts and another from Wikipedia.}}


 * -- align="left" valign=top
 * West, Andrew G.; Kannan, Sampath & Lee, Insup
 * STiki: an anti-vandalism tool for Wikipedia using spatio-temporal analysis of revision metadata
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 
 * {{hidden||STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found they perform comparably to NLP efforts while being more efficient, robust to evasion, and language independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood each is vandalism, and (2) a client-side GUI that presents likely vandalism to end-users for definitive classification (and if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code.}}


 * -- align="left" valign=top
 * Stein, Benno; zu Eissen, Sven Meyer & Potthast, Martin
 * Strategies for retrieving plagiarized documents
 * Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
 * 2007
 * 


 * -- align="left" valign=top
 * Plank, Barbara
 * Structural correspondence learning for parse disambiguation
 * Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
 * 2009
 * 
 * {{hidden||The paper presents an application of Structural Correspondence Learning (SCL) (Blitzer et al., 2006) for domain adaptation of a stochastic attribute-value grammar (SAVG). So far, SCL has been applied successfully in NLP for part-of-speech tagging and sentiment analysis (Blitzer et al., 2006; Blitzer et al., 2007). An attempt was made in the CoNLL 2007 shared task to apply SCL to non-projective dependency parsing (Shimizu and Nakagawa, 2007), however, without any clear conclusions. We report on our exploration of applying SCL to adapt a syntactic disambiguation model and show promising initial results on Wikipedia domains.}}


 * -- align="left" valign=top
 * Han, Xianpei & Zhao, Jun
 * Structural semantic relatedness: a knowledge-based method to named entity disambiguation
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 
 * {{hidden||The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) creates new opportunities to enhance named entity disambiguation by developing algorithms which can best exploit these knowledge sources. The problem is that these knowledge sources are heterogeneous and most of the semantic knowledge within them is embedded in complex structures, such as graphs and networks. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. Empirical results show that, in comparison with classical BOW-based methods and social network based methods, our method can significantly improve the disambiguation performance by 8.7% and 14.7%, respectively.}}


 * -- align="left" valign=top
 * Sabel, Mikalai
 * Structuring wiki revision history
 * Proceedings of the 2007 international symposium on Wikis
 * 2007
 * 


 * -- align="left" valign=top
 * Nguyen, Dat P. T.; Matsuo, Yutaka & Ishizuka, Mitsuru
 * Subtree mining for relation extraction from Wikipedia
 * NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
 * 2007
 * 
 * {{hidden||In this study, we address the problem of extracting relations between entities from Wikipedia's English articles. Our proposed method first anchors the appearance of entities in Wikipedia's articles using neither a named entity recognizer (NER) nor a coreference resolution tool. It then classifies the relationships between entity pairs using an SVM with features extracted from the web structure and subtrees mined from the syntactic structure of text. We evaluate our method on manually annotated data from actual Wikipedia articles.}}


 * -- align="left" valign=top
 * Cosley, Dan; Frankowski, Dan; Terveen, Loren & Riedl, John
 * SuggestBot: using intelligent task routing to help people find work in wikipedia
 * Proceedings of the 12th international conference on Intelligent user interfaces
 * 2007
 * 
 * {{hidden||Member-maintained communities ask their users to perform tasks the community needs. From Slashdot, to IMDb, to Wikipedia, groups with diverse interests create community-maintained artifacts of lasting value (CALV) that support the group's main purpose and provide value to others. These communities don't help members find work to do, or do so without regard to individual preferences, such as Slashdot assigning meta-moderation randomly. Yet social science theory suggests that reducing the cost and increasing the personal value of contribution would motivate members to participate more. We present SuggestBot, software that performs intelligent task routing (matching people with tasks) in Wikipedia. SuggestBot uses broadly applicable strategies of text analysis, collaborative filtering, and hyperlink following to recommend tasks. SuggestBot's intelligent task routing increases the number of edits by roughly four times compared to suggesting random articles. Our contributions are: 1) demonstrating the value of intelligent task routing in a real deployment; 2) showing how to do intelligent task routing; and 3) sharing our experience of deploying a tool in Wikipedia, which offered both challenges and opportunities for research.}}


 * -- align="left" valign=top
 * Ye, Shiren; Chua, Tat-Seng & Lu, Jie
 * Summarizing definition from Wikipedia
 * Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
 * 2009
 * 
 * {{hidden||Wikipedia provides a wealth of knowledge, where the first sentence, infobox (and relevant sentences), and even the entire document of a wiki article could be considered as diverse versions of summaries (definitions) of the target topic. We explore how to generate a series of summaries with various lengths based on them. To obtain more reliable associations between sentences, we introduce wiki concepts according to the internal links in Wikipedia. In addition, we develop an extended document concept lattice model to combine wiki concepts and non-textual features such as the outline and infobox. The model can concatenate representative sentences from non-overlapping salient local topics for summary generation. We test our model on our annotated wiki articles whose topics come from the TREC-QA 2004--2006 evaluations. The results show that the model is effective in summarization and definition QA.}}


 * -- align="left" valign=top
 * Meij, Edgar & de Rijke, Maarten
 * Supervised query modeling using wikipedia
 * Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
 * 2010
 * 


 * -- align="left" valign=top
 * Bai, Bing; Weston, Jason; Grangier, David; Collobert, Ronan; Sadamasa, Kunihiko; Qi, Yanjun; Chapelle, Olivier & Weinberger, Kilian
 * Supervised semantic indexing
 * Proceedings of the 18th ACM conference on Information and knowledge management
 * 2009
 * 
 * {{hidden||In this article we propose Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as online advertising placement. Dealing with models on all pairs of word features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low-rank (but diagonal preserving) representations, and correlated feature hashing (CFH). We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.}}


 * -- align="left" valign=top
 * Raisanen, Teppo
 * Supporting the sense-making processes of web users by using a proxy server
 * 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009, Waikoloa, HI, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Ortega, Felipe & Izquierdo-Cortazar, Daniel
 * Survival analysis in open development projects
 * Proceedings of the 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development
 * 2009
 * 
 * {{hidden||Open collaborative projects, like FLOSS development projects and open content creation projects (e.g. Wikipedia), heavily depend on contributions from their respective communities to improve. In this context, an important question for both researchers and practitioners is: what is the expected lifetime of contributors in a community? By answering this question, we will be able to characterize these communities, as an appropriate model can show whether or not users maintain their interest to contribute, for how long we could expect them to collaborate and, as a result, improve the organization and management of the project. In this paper, we demonstrate that survival analysis, a well-known statistical methodology in other research areas such as epidemiology, biology or demographic studies, is a useful methodology to undertake a quantitative comparison of the lifetime of contributors in open collaborative initiatives, like the development of FLOSS projects and Wikipedia, providing insightful answers to this challenging question.}}


 * -- align="left" valign=top
 * Stampouli, Anastasia; Giannakidou, Eirini & Vakali, Athena
 * Tag disambiguation through Flickr and Wikipedia
 * 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010, April 1, 2010 - April 4, 2010, Tsukuba, Japan
 * 2010
 * 
 * {{hidden||Given the popularity of social tagging systems and the limitations these systems have due to lack of any structure, a common issue that arises involves the low retrieval quality in such systems due to ambiguities of certain terms. In this paper, an approach for improving the retrieval in these systems in the case of ambiguous terms is presented that attempts to perform tag disambiguation and, at the same time, provide users with relevant content. The idea is based on a mashup that combines data and functionality of two major web 2.0 sites, namely Flickr and Wikipedia, and aims at enhancing content retrieval for web users. A case study with the ambiguous term "Apple" illustrates the value of the proposed approach.}}


 * -- align="left" valign=top
 * Burke, Moira & Kraut, Robert
 * Taking up the mop: identifying future wikipedia administrators
 * CHI '08 extended abstracts on Human factors in computing systems
 * 2008
 * 
 * {{hidden||As Wikipedia grows, so do the messy byproducts of collaboration. Backlogs of administrative work are increasing, suggesting the need for more users with privileged admin status. This paper presents a model of editors who have successfully passed the peer review process to become admins. The lightweight model is based on behavioral metadata and comments, and does not require any page text. It demonstrates that the Wikipedia community has shifted in the last two years to prioritizing policymaking and organization experience over simple article-level coordination, and that mere edit count does not lead to adminship. The model can be applied as an "AdminFinderBot" to automatically search all editors' histories and pick out likely future admins, as a self-evaluation tool, or as a dashboard of relevant statistics for voters evaluating admin candidates.}}


 * -- align="left" valign=top
 * Viegas, Fernanda B.; Wattenberg, Martin; Kriss, Jesse & Ham, Frank Van
 * Talk before you type: Coordination in Wikipedia
 * 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007, Big Island, HI, United States
 * 2007
 * 
 * {{hidden||Wikipedia, the online encyclopedia, has attracted attention both because of its popularity and its unconventional policy of letting anyone on the internet edit its articles. This paper describes the results of an empirical analysis of Wikipedia and discusses ways in which the Wikipedia community has evolved as it has grown. We contrast our findings with an earlier study [11] and present three main results. First, the community maintains a strong resilience to malicious editing, despite tremendous growth and high traffic. Second, the fastest growing areas of Wikipedia are devoted to coordination and organization. Finally, we focus on a particular set of pages used to coordinate work, the "Talk" pages. By manually coding the content of a subset of these pages, we find that these pages serve many purposes, notably supporting strategic planning of edits and enforcement of standard guidelines and conventions. Our results suggest that despite the potential for anarchy, the Wikipedia community places a strong emphasis on group coordination policy.}}


 * -- align="left" valign=top
 * Konieczny, Piotr
 * Teaching with Wikipedia and other Wikimedia foundation wikis
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 
 * {{hidden||Wikipedia and other wikis operated by the Wikimedia Foundation are finding increasing applications in teaching and learning. This workshop will demonstrate how teachers from academia and beyond can use those wikis in their courses. Wikipedia can be used for various assignments: for example, students can be asked to reference an unreferenced article or create a completely new one. Students can also work on creating a free textbook on Wikibooks, learn about journalism on Wikinews or engage in a variety of media-related projects on Commons. In doing so, students will see that writing an article and related assignments are not a 'tedious assignment' but activities that millions do 'for fun'. They will also gain a deeper understanding of what Wikipedia is, and how (un)reliable it can be. They and the course leaders are assisted by a lively, real-world community. Last, but not least, their work will also benefit -- and be improved upon by -- the entire world. The workshop will focus on English Wikipedia, the WMF wiki where teaching assignments most often take place, but will also discuss the educational opportunities on other WMF wikis, such as Wikibooks. An overview of the Wikipedia School and University Project will be presented. There will be a discussion of Wikipedia policies related to teaching assignments, and a presentation of tools developed to make teaching with Wikipedia easier. The participants will see what kinds of assignments can be done on Wikipedia (from learning simple wiki editing skills, through assignments designed to teach students about proper referencing and source reliability, to paper-writing assignments with the goal of developing Good and Featured Articles), and how they can be implemented most easily and efficiently, avoiding common pitfalls and dealing with common problems (such as how to avoid having your students' articles deleted minutes after creation). Finally, the participants will be given an opportunity to create a draft syllabus for a future course they may want to teach on a WMF wiki (bringing your laptops for that part is highly recommended).}}


 * -- align="left" valign=top
 * Longo, Luca; Dondio, Pierpaolo & Barrett, Stephen
 * Temporal factors to evaluate trustworthiness of virtual identities
 * Third International Conference on Security and Privacy in Communications Networks and the Workshops (SecureComm 2007)
 * 2007


 * -- align="left" valign=top
 * Mousavidin, Elham & Silva, Leiser
 * Testimonial Knowledge and Trust in Virtual Communities: A Research in Progress of the Case of Wikipedia
 * 2009
 * 
 * 


 * -- align="left" valign=top
 * Gupta, Rakesh & Ratinov, Lev
 * Text categorization with knowledge transfer from heterogeneous data sources
 * Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
 * 2008
 * 
 * {{hidden||Multi-category classification of short dialogues is a common task performed by humans. When assigning a question to an expert, a customer service operator tries to classify the customer query into one of N different classes for which experts are available. Similarly, questions on the web (for example, questions at Yahoo Answers) can be automatically forwarded to a restricted group of people with a specific expertise. Typical questions are short and assume background world knowledge for correct classification. With an exponentially increasing amount of knowledge available, with distinct properties (labeled vs. unlabeled, structured vs. unstructured), no single knowledge-transfer algorithm such as transfer learning, multi-task learning or self-taught learning can be applied universally. In this work we show that bag-of-words classifiers perform poorly on noisy short conversational text snippets. We present an algorithm for leveraging heterogeneous data sources and algorithms, with significant improvements over any single algorithm, rivaling human performance. Using different algorithms for each knowledge source, we use mutual information to aggressively prune features. With heterogeneous data sources including Wikipedia, the Open Directory Project (ODP), and Yahoo Answers, we show 89.4% and 96.8% correct classification on the Google Answers corpus and the Switchboard corpus using only 200 features/class. This reflects a huge improvement over bag-of-words approaches and a 48-65% error reduction over the previously published state of the art (Gabrilovich et al. 2006).}}


 * -- align="left" valign=top
 * Wang, Kai; Lin, Chien-Liang; Chen, Chun-Der & Yang, Shu-Chen
 * The Adoption of Wikipedia: A Community- and Information Quality-Based View
 * 2008
 * 
 * 


 * -- align="left" valign=top
 * Trattner, Christoph; Hasani-Mavriqi, Ilire; Helic, Denis & Leitner, Helmut
 * The Austrian way of wiki(pedia)! Development of a structured wiki-based encyclopedia within a local Austrian context
 * 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010, Gdansk, Poland
 * 2010
 * 
 * {{hidden||Although the success of online encyclopedias such as Wikipedia is indisputable, researchers have questioned the usefulness of Wikipedia in educational settings. Problems such as the copy-paste syndrome, unchecked quality, or fragmentation of knowledge have been recognized as serious drawbacks for a widespread application of Wikipedia in universities or high schools. In this paper we present a wiki-based encyclopedia called Austria-Forum that aims to combine the openness and collaboration aspects of Wikipedia with approaches to build a structured, quality-inspected, and context-sensitive online encyclopedia. To ensure tractability of the publishing process, the system focuses on providing information within a local Austrian context. It is our experience that such an approach represents a first step toward a proper application of online encyclopedias in educational settings.}}


 * -- align="left" valign=top
 * Huner, Kai M. & Otto, Boris
 * The effect of using a semantic wiki for metadata management: A controlled experiment
 * 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009, Waikoloa, HI, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Chen, Jilin; Ren, Yuqing & Riedl, John
 * The effects of diversity on group productivity and member withdrawal in online volunteer groups
 * Proceedings of the 28th international conference on Human factors in computing systems
 * 2010
 * 


 * -- align="left" valign=top
 * Khalid, M.A.; Jijkoun, V. & de Rijke, M.
 * The impact of named entity normalization on information retrieval for question answering
 * Advances in Information Retrieval. 30th European Conference on IR Research, ECIR 2008, 30 March-3 April 2008 Berlin, Germany
 * 2008


 * -- align="left" valign=top
 * Kamps, J. & Koolen, M.
 * The importance of link evidence in Wikipedia
 * Advances in Information Retrieval. 30th European Conference on IR Research, ECIR 2008, 30 March-3 April 2008 Berlin, Germany
 * 2008
 * {{hidden||Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: internal links in Wikipedia are typically based on words naturally occurring in a page, and link to another semantically related entry. Our main aim is to find out if Wikipedia's link structure can be exploited to improve ad hoc information retrieval. We first analyse the relation between Wikipedia links and the relevance of pages. We then experiment with the use of link evidence in the focused retrieval of Wikipedia content, based on the test collection of INEX 2006. Our main findings are: First, our analysis of the link structure reveals that the Wikipedia link structure is a (possibly weak) indicator of relevance. Second, our experiments on INEX ad hoc retrieval tasks reveal that if the link evidence is made sensitive to the local context we see a significant improvement of retrieval effectiveness. Hence, in contrast with earlier TREC experiments using crawled Web data, we have shown that Wikipedia's link structure can help improve the effectiveness of ad hoc retrieval.}}


 * -- align="left" valign=top
 * Huang, Wei Che; Trotman, Andrew & Geva, Shlomo
 * The importance of manual assessment in link discovery
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 


 * -- align="left" valign=top
 * Vora, Parul; Komura, Naoko & Stanton Usability Team
 * The n00b Wikipedia Editing Experience
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 


 * -- align="left" valign=top
 * Curino, Carlo A.; Moon, Hyun J.; Ham, MyungWon & Zaniolo, Carlo
 * The PRISM Workwench: Database Schema Evolution without Tears
 * Proceedings of the 2009 IEEE International Conference on Data Engineering
 * 2009
 * 
 * {{hidden||Information Systems are subject to a perpetual evolution, which is particularly pressing in Web Information Systems, due to their distributed and often collaborative nature. Such a continuous adaptation process comes at a very high cost, because of the intrinsic complexity of the task and the serious ramifications of such changes upon database-centric Information System software. Therefore, there is a need to automate and simplify the schema evolution process and to ensure predictability and logical independence upon schema changes. Current relational technology makes it easy to change the database content or to revise the underlying storage and indexes, but does little to support logical schema evolution, which nowadays remains poorly supported by commercial tools. The PRISM system demonstrates a major new advance toward automating schema evolution (including query mapping and database conversion), by improving predictability, logical independence, and auditability of the process. In fact, PRISM exploits recent theoretical results on mapping composition, invertibility and query rewriting to provide DB Administrators with an intuitive, operational workbench usable in their everyday activities, thus enabling graceful schema evolution. In this demonstration, we will show (i) the functionality of PRISM and its supportive AJAX interface, (ii) its architecture built upon a simple SQL-inspired language of Schema Modification Operators, and (iii) we will allow conference participants to directly interact with the system to test its capabilities. Finally, some of the most interesting evolution steps of popular Web Information Systems, such as Wikipedia, will be reviewed in a brief "Saga of Famous Schema Evolutions".}}


 * -- align="left" valign=top
 * Kaisser, Michael
 * The QuALiM question answering demo: supplementing answers with paragraphs drawn from Wikipedia
 * Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session
 * 2008
 * 
 * {{hidden||This paper describes the online demo of the QuALiM Question Answering system. While the system actually gets answers from the web by querying major search engines, during presentation answers are supplemented with relevant passages from Wikipedia. We believe that this additional information improves a user's search experience.}}


 * -- align="left" valign=top
 * Jain, Shaili & Parkes, David C.
 * The role of game theory in human computation systems
 * Proceedings of the ACM SIGKDD Workshop on Human Computation
 * 2009
 * 
 * {{hidden||The paradigm of "human computation" seeks to harness human abilities to solve computational problems or otherwise perform distributed work that is beyond the scope of current AI technologies. One aspect of human computation has become known as "games with a purpose" and seeks to elicit useful computational work in fun (typically) multi-player games. Human computation also encompasses distributed work (or "peer production") systems such as Wikipedia and Question and Answer forums. In this short paper we survey existing game-theoretic models for various human computation designs and outline research challenges in advancing a theory that can enable better design.}}


 * -- align="left" valign=top
 * Suh, Bongwon; Convertino, Gregorio; Chi, Ed H. & Pirolli, Peter
 * The singularity is not near: Slowing growth of Wikipedia
 * 5th International Symposium on Wikis and Open Collaboration, WikiSym 2009, October 25, 2009 - October 27, 2009, Orlando, FL, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Dutta, Amitava; Roy, Rahul; Seetharaman, Priya & Ingawale, Myshkin
 * The Small Worlds of Wikipedia: Implications for Growth, Quality and Sustainability of Collaborative Knowledge Networks
 * 2009
 * 
 * 


 * -- align="left" valign=top
 * Geiger, R. Stuart
 * The social roles of bots and assisted editing programs
 * 5th International Symposium on Wikis and Open Collaboration, WikiSym 2009, October 25, 2009 - October 27, 2009, Orlando, FL, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Viegas, Fernanda B.
 * The visual side of Wikipedia
 * 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007, Big Island, HI, United States
 * 2007
 * 
 * {{hidden||The name "Wikipedia" has been associated with terms such as collaboration, volunteers, reliability, vandalism, and edit-war. Fewer people might think of "images".}}


 * -- align="left" valign=top
 * Wang, Yafang; Zhu, Mingjie; Qu, Lizhen; Spaniol, Marc & Weikum, Gerhard
 * Timely YAGO: harvesting, querying, and visualizing temporal knowledge from Wikipedia
 * Proceedings of the 13th International Conference on Extending Database Technology
 * 2010
 * 
 * {{hidden||Recent progress in information extraction has shown how to automatically build large ontologies from high-quality sources like Wikipedia. But knowledge evolves over time; facts have associated validity intervals. Therefore, ontologies should include time as a first-class dimension. In this paper, we introduce Timely YAGO, which extends our previously built knowledge base YAGO with temporal aspects. This prototype system extracts temporal facts from Wikipedia infoboxes, categories, and lists in articles, and integrates these into the Timely YAGO knowledge base. We also support querying temporal facts, by temporal predicates in a SPARQL-style language. Visualization of query results is provided in order to better understand the dynamic nature of knowledge.}}


 * -- align="left" valign=top
 * Medelyan, Olena; Witten, Ian H. & Milne, David
 * Topic indexing with Wikipedia
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008, Chicago, IL, United States
 * 2008


 * -- align="left" valign=top
 * Wahabzada, Mirwaes; Xu, Zhao & Kersting, Kristian
 * Topic models conditioned on relations
 * European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010, September 20, 2010 - September 24, 2010 Barcelona, Spain
 * 2010
 * 
 * {{hidden||Latent Dirichlet allocation is a fully generative statistical language model that has been proven to be successful in capturing both the content and the topics of a corpus of documents. Recently, it was even shown that relations among documents such as hyperlinks or citations allow one to share information between documents and in turn to improve topic generation. Although fully generative, in many situations we are actually not interested in predicting relations among documents. In this paper, we therefore present a Dirichlet-multinomial nonparametric regression topic model that includes a Gaussian process prior on joint document and topic distributions that is a function of document relations. On networks of scientific abstracts and of Wikipedia documents we show that this approach meets or exceeds the performance of several baseline topic models.}}


 * -- align="left" valign=top
 * Nastase, Vivi
 * Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation
 * Proceedings of the Conference on Empirical Methods in Natural Language Processing
 * 2008
 * 
 * {{hidden||Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic -- a set of one or more sentences or questions. Producing a good summary of the relevant information relies on understanding the query and linking it with the associated set of documents. To "understand" the query we expand it using encyclopedic knowledge in Wikipedia. The expanded query is linked with its associated documents through spreading activation in a graph that represents words and their grammatical connections in these documents. The topic-expanded words and activated nodes in the graph are used to produce an extractive summary. The method proposed is tested on the DUC summarization data. The system implemented ranks high compared to the participating systems in the DUC competitions, confirming our hypothesis that encyclopedic knowledge is a useful addition to a summarization system.}}


 * -- align="left" valign=top
 * Gehres, Peter; Singleton, Nathan; Louthan, George & Hale, John
 * Toward sensitive information redaction in a collaborative, multilevel security environment
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 


 * -- align="left" valign=top
 * Wang, Pu & Domeniconi, Carlotta
 * Towards a universal text classifier: Transfer learning using encyclopedic knowledge
 * 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009, December 6, 2009 - December 6, 2009, Miami, FL, United States
 * 2009
 * 


 * -- align="left" valign=top
 * Ronchetti, Marco & Sant, Joseph
 * Towards automatic syllabi matching
 * Proceedings of the 14th annual ACM SIGCSE conference on Innovation and technology in computer science education
 * 2009
 * 


 * -- align="left" valign=top
 * Kotov, Alexander & Zhai, ChengXiang
 * Towards natural question-guided search
 * 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010, Raleigh, NC, United States
 * 2010
 * 


 * -- align="left" valign=top
 * Veale, Tony
 * Tracking the Lexical Zeitgeist with WordNet and Wikipedia
 * Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 -- September 1, 2006, Riva del Garda, Italy
 * 2006
 * 


 * -- align="left" valign=top
 * Gleich, David F.; Constantine, Paul G.; Flaxman, Abraham D. & Gunawardana, Asela
 * Tracking the random surfer: empirically measured teleportation parameters in PageRank
 * Proceedings of the 19th international conference on World wide web
 * 2010
 * 
 * {{hidden||PageRank computes the importance of each node in a directed graph under a random surfer model governed by a teleportation parameter. Commonly denoted alpha, this parameter models the probability of following an edge inside the graph or, when the graph comes from a network of web pages and links, clicking a link on a web page. We empirically measure the teleportation parameter based on browser toolbar logs and a click trail analysis. For a particular user or machine, such analysis produces a value of alpha. We find that these values nicely fit a Beta distribution with mean edge-following probability between 0.3 and 0.7, depending on the site. Using these distributions, we compute PageRank scores where PageRank is computed with respect to a distribution as the teleportation parameter, rather than a constant teleportation parameter. These new metrics are evaluated on the graph of pages in Wikipedia.}}


 * -- align="left" valign=top
 * Désilets, Alain; Gonzalez, Lucas; Paquet, Sébastien & Stojanovic, Marta
 * Translation the Wiki way
 * Proceedings of the 2006 international symposium on Wikis
 * 2006
 * 
 * {{hidden||This paper discusses the design and implementation of processes and tools to support the collaborative creation and maintenance of multilingual wiki content. A wiki is a website where a large number of participants are allowed to create and modify content using their Web browser. This simple concept has revolutionized collaborative authoring on the web, enabling, among others, the creation of Wikipedia, the world's largest online encyclopedia. On many of the largest and highest profile wiki sites, content needs to be provided in more than one language. Yet, current wiki engines do not support the efficient creation and maintenance of such content. Consequently, most wiki sites deal with the issue of multilingualism by spawning a separate and independent site for each language. This approach leads to much wasted effort since the same content must be researched, tracked and written from scratch for every language. In this paper, we investigate what features could be implemented in wiki engines in order to deal more effectively with multilingual content. We look at how multilingual content is currently managed in more traditional industrial contexts, and show how this approach is not appropriate in a wiki world. We then describe the results of a User-Centered Design exercise performed to explore what a multilingual wiki engine should look like from the point of view of its various end users. We describe a partial implementation of those requirements in our own wiki engine (LizzyWiki), to deal with the special case of bilingual sites. We also discuss how this simple implementation could be extended to provide even more sophisticated features, and in particular, to support the general case of a site with more than two languages. Finally, even though the paper focuses primarily on multilingual content in a wiki context, we argue that translating in this "Wiki Way" may also be useful in some traditional industrial settings as a way of dealing better with the fast and ever-changing nature of our modern internet world.}}


 * -- align="left" valign=top
 * Platt, John C.; Toutanova, Kristina & Yih, Wen-tau
 * Translingual document representations from discriminative projections
 * Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
 * 2010
 * 
 * {{hidden||Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both of these variants start with a basic model of documents (PCA and PLSA). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two discriminative variants, OPCA and CPLSA, significantly outperform their corresponding baselines. The largest differences in performance are observed on the task of retrieval when the documents are only comparable and not parallel. The OPCA method is shown to perform best.}}


 * -- align="left" valign=top
 * Vukovic, Maja; Kumara, Soundar & Greenshpan, Ohad
 * Ubiquitous crowdsourcing
 * Proceedings of the 12th ACM international conference adjunct papers on Ubiquitous computing
 * 2010
 * 


 * -- align="left" valign=top
 * Täckström, Oscar; Velupillai, Sumithra; Hassel, Martin; Eriksson, Gunnar; Dalianis, Hercules & Karlgren, Jussi
 * Uncertainty detection as approximate max-margin sequence labelling
 * Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task
 * 2010
 * 
 * {{hidden||This paper reports experiments for the CoNLL-2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence-level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence-level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO encoding. In addition to surface-level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 for the biological domain are 85.2 F1-score, for the Wikipedia set 55.4 F1-score. For Task 2, our official results are 2.1 for the entire task, with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 86.0, Wikipedia: 58.2; Task 2, scopes: 39.6 and cues: 78.5.}}


 * -- align="left" valign=top
 * Billings, Matt & Watts, Leon A.
 * Understanding dispute resolution online: using text to reflect personal and substantive issues in conflict
 * Proceedings of the 28th international conference on Human factors in computing systems
 * 2010
 * 


 * -- align="left" valign=top
 * Ingawale, Myshkin
 * Understanding the Wikipedia phenomenon: a case for agent based modeling
 * Proceedings of the 2nd PhD workshop on Information and knowledge management
 * 2008
 * 


 * -- align="left" valign=top
 * Hu, Jian; Wang, Gang; Lochovsky, Fred; tao Sun, Jian & Chen, Zheng
 * Understanding user's query intent with Wikipedia
 * Proceedings of the 18th international conference on World wide web
 * 2009
 * 


 * -- align="left" valign=top
 * Tan, Bin & Peng, Fuchun
 * Unsupervised query segmentation using generative language models and Wikipedia
 * Proceedings of the 17th international conference on World Wide Web
 * 2008
 * 
 * {{hidden||In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover a query's underlying concepts that compose its original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm, optimizing the minimum description length objective function on a partial corpus that is specific to the query. To augment this unsupervised learning, we incorporate evidence from Wikipedia. Experiments show that our approach dramatically improves performance over the traditional approach based on mutual information, and produces results comparable with a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual information based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another improvement of 24.3%, adding up to a total improvement of 46% (from 0.530 to 0.774).}}


 * -- align="left" valign=top
 * Yan, Yulan; Okazaki, Naoaki; Matsuo, Yutaka; Yang, Zhenglu & Ishizuka, Mitsuru
 * Unsupervised relation extraction by mining Wikipedia texts using information from the web
 * Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
 * 2009
 * 


 * -- align="left" valign=top
 * Syed, Zareen & Finin, Tim
 * Unsupervised techniques for discovering ontology elements from Wikipedia article links
 * Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading
 * 2010
 * 


 * -- align="left" valign=top
 * de Melo, Gerard & Weikum, Gerhard
 * Untangling the cross-lingual link structure of Wikipedia
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 


 * -- align="left" valign=top
 * Wang, Qihua; Jin, Hongxia & Li, Ninghui
 * Usable access control in collaborative environments: Authorization based on people-tagging
 * 14th European Symposium on Research in Computer Security (ESORICS 2009), September 21-23, 2009, Saint-Malo, France
 * 2009
 * 
 * {{hidden||We study attribute-based access control for resource sharing in collaborative work environments. The goal of our work is to encourage sharing within an organization by striking a balance between usability and security. Inspired by the great success of a number of collaboration-based Web 2.0 systems, such as Wikipedia and Del.icio.us, we propose a novel attribute-based access control framework that acquires information on users' attributes from the collaborative efforts of all users in a system, instead of from a small number of trusted agents. Intuitively, if several users say that someone has a certain attribute, our system believes that the latter indeed has the attribute. In order to allow users to specify and maintain each other's attributes, we employ the mechanism of people-tagging, where users can tag each other with the terms they want, and tags from different users are combined and viewable by all users in the system. In this article, we describe the system framework of our solution, propose a language to specify access control policies, and design an example-based policy specification method that is friendly to ordinary users. We have implemented a prototype of our solution based on a real-world and large-scale people-tagging system at IBM. Experiments have been performed on the data collected by the system.}}


 * -- align="left" valign=top
 * Albertsen, Johannes & Bouvin, Niels Olof
 * User defined structural searches in MediaWiki
 * Proceedings of the nineteenth ACM conference on Hypertext and hypermedia
 * 2008
 * 
 * {{hidden||Wikipedia has been the poster child of user-contributed content, using the space of MediaWiki as the canvas on which to write. While well suited for authoring simple hypermedia documents, MediaWiki does not lend itself easily to letting the author create dynamically assembled documents, or create pages that monitor other pages. While it is possible to create such "special" pages, it requires PHP coding and thus administrative rights to the MediaWiki server. We present in this paper work on a structural query language (MediaWiki Query Language - MWQL) to allow users to add dynamically evaluated searches to ordinary wiki pages.}}


 * -- align="left" valign=top
 * Itakura, Kelly Y. & Clarke, Charles L. A.
 * Using dynamic Markov compression to detect vandalism in the Wikipedia
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 


 * -- align="left" valign=top
 * Coursey, Kino; Mihalcea, Rada & Moen, William
 * Using encyclopedic knowledge for automatic topic identification
 * Proceedings of the Thirteenth Conference on Computational Natural Language Learning
 * 2009
 * 


 * -- align="left" valign=top
 * Irvine, Ann & Klementiev, Alexandre
 * Using Mechanical Turk to annotate lexicons for less commonly used languages
 * Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
 * 2010
 * 
 * {{hidden||In this work we present results from using Amazon's Mechanical Turk (MTurk) to annotate translation lexicons between English and a large set of less commonly used languages. We generate candidate translations for 100 English words in each of 42 foreign languages using Wikipedia and a lexicon induction framework. We evaluate the MTurk annotations by using positive and negative control candidate translations. Additionally, we evaluate the annotations by adding pairs to our seed dictionaries, providing a feedback loop into the induction system. MTurk workers are more successful in annotating some languages than others and are not evenly distributed around the world or among the world's languages. However, in general, we find that MTurk is a valuable resource for gathering cheap and simple annotations for most of the languages that we explored, and these annotations provide useful feedback in building a larger, more accurate lexicon.}}


 * -- align="left" valign=top
 * Shieh, Jyh-Ren; Yeh, Yang-Ting; Lin, Chih-Hung; Lin, Ching-Yung & Wu, Ja-Ling
 * Using Semantic Graphs for Image Search
 * 2008 IEEE International Conference on Multimedia and Expo (ICME 2008), June 23-26, 2008, Hannover, Germany
 * 2008
 * 
 * {{hidden||In this paper, we propose a Semantic Graphs for Image Search (SGIS) system, which provides a novel way to search for images by utilizing collaborative knowledge in Wikipedia and network analysis to form semantic graphs for search-term suggestion. The collaborative article editing process of Wikipedia's contributors is formalized as bipartite graphs that are folded into networks between terms. When a user types in a search term, SGIS automatically retrieves an interactive semantic graph of related terms that lets users easily find related images, not limited to a specific search term. The interactive semantic graph then serves as an interface to retrieve images through existing commercial search engines. This method significantly saves users' time by avoiding the multiple search keywords that are usually required in generic search engines. It benefits both naive users who do not possess a large vocabulary (e.g., students) and professionals who look for images on a regular basis. In our experiments, 85% of the participants favored the SGIS system over commercial search engines.}}


 * -- align="left" valign=top
 * Blohm, Sebastian & Cimiano, Philipp
 * Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction
 * Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
 * 2007
 * 
 * {{hidden||Textual patterns have been used effectively to extract information from large text collections. However, they rely heavily on textual redundancy, in the sense that facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when trying to extract information from hardly redundant sources like corporate intranets, encyclopedic works, or scientific databases. We present results on applying a weakly supervised pattern induction algorithm to Wikipedia to extract instances of arbitrary relations. In particular, we apply different configurations of a basic algorithm for pattern induction on seven different datasets. We show that the lack of redundancy leads to the need for a large amount of training data, but that integrating Web extraction into the process leads to a significant reduction of required training data while maintaining the accuracy of Wikipedia. In particular, we show that, though the use of the Web can have effects similar to those produced by increasing the number of seeds, it leads overall to better results. Our approach thus allows us to combine the advantages of two sources: the high reliability of a closed corpus and the high redundancy of the Web.}}


 * -- align="left" valign=top
 * Kaptein, Rianne; Koolen, Marijn & Kamps, Jaap
 * Using Wikipedia categories for ad hoc search
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 


 * -- align="left" valign=top
 * Wang, Pu; Domeniconi, Carlotta & Hu, Jian
 * Using Wikipedia for co-clustering based cross-domain text classification
 * 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy
 * 2008
 * 


 * -- align="left" valign=top
 * Gabay, David; Ziv, Ben-Eliahu & Elhadad, Michael
 * Using Wikipedia links to construct word segmentation corpora
 * 2008 AAAI Workshop, July 13, 2008, Chicago, IL, United States
 * 2008
 * {{hidden||Tagged corpora are essential for evaluating and training natural language processing tools. The cost of constructing large enough manually tagged corpora is high, even when the annotation level is shallow. This article describes a simple method to automatically create a partially tagged corpus, using Wikipedia hyperlinks. The resulting corpus contains information about the correct segmentation of 523,599 non-consecutive words in 363,090 sentences. We used our method to construct a corpus of Modern Hebrew (which we have made available at http://www.cs.bgu.ac.il/-nlpproj). The method can also be applied to other languages where word segmentation is difficult to determine, such as East and South-East Asian languages.}}


 * -- align="left" valign=top
 * Finin, Tim; Syed, Zareen; Mayfield, James; Mcnamee, Paul & Piatko, Christine
 * Using Wikitology for cross-document entity coreference resolution
 * Learning by Reading and Learning to Read - Papers from the AAAI Spring Symposium, March 23-25, 2009, Stanford, CA, United States
 * 2009
 * {{hidden||We describe the use of the Wikitology knowledge base as a resource for a variety of applications, with special focus on a cross-document entity coreference resolution task. This task involves recognizing when entities and relations mentioned in different documents refer to the same object or relation in the world. Wikitology is a knowledge base system constructed with material from Wikipedia, DBpedia and Freebase that includes both unstructured text and semi-structured information. Wikitology was used to define features that were part of a system implemented by the Johns Hopkins University Human Language Technology Center of Excellence for the 2008 Automatic Content Extraction cross-document coreference resolution evaluation organized by the National Institute of Standards and Technology.}}


 * -- align="left" valign=top
 * Zesch, Torsten; Müller, Christof & Gurevych, Iryna
 * Using Wiktionary for computing semantic relatedness
 * Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
 * 2008
 * 
 * {{hidden||We introduce Wiktionary as an emerging lexical semantic resource that can be used as a substitute for expert-made resources in AI applications. We evaluate Wiktionary on the pervasive task of computing semantic relatedness for English and German by means of correlation with human rankings and solving word choice problems. For the first time, we apply a concept vector based measure to a set of different concept representations like Wiktionary pseudo glosses, the first paragraph of Wikipedia articles, English WordNet glosses, and GermaNet pseudo glosses. We show that: (i) Wiktionary is the best lexical semantic resource in the ranking task and performs comparably to other resources in the word choice task, and (ii) the concept vector based approach yields the best results on all datasets in both evaluations.}}


 * -- align="left" valign=top
 * Missen, Malik Muhammad Saad & Boughanem, Mohand
 * Using WordNet's semantic relations for opinion detection in blogs
 * 31st European Conference on Information Retrieval (ECIR 2009), April 6-9, 2009, Toulouse, France
 * 2009
 * 
 * {{hidden||Opinion detection in blogs has always been a challenge for researchers. One of the challenges is to find documents that specifically contain opinion on a user's information need. This requires text processing on the sentence level rather than on the document level. In this paper, we propose an opinion detection approach. The proposed approach addresses the above problem by processing documents on the sentence level, using different semantic similarity relations of WordNet between sentence words and a list of weighted query words expanded through the Wikipedia encyclopedia. According to initial results, our approach performs well, with a MAP of 0.28 and P@10 of 0.64, an improvement of 27% over the baseline results. TREC Blog 2006 data is used as the test collection.}}


 * -- align="left" valign=top
 * Xu, Yang; Ding, Fan & Wang, Bin
 * Utilizing phrase based semantic information for term dependency
 * Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
 * 2008
 * 
 * {{hidden||Previous work on term dependency has not taken into account the semantic information underlying query phrases. In this work, we study the impact of utilizing phrase based concepts for term dependency. We use Wikipedia to separate important and less important term dependencies, and treat them accordingly as features in a linear feature-based retrieval model. We compare our method with a Markov Random Field (MRF) model on four TREC document collections. Our experimental results show that utilizing phrase based concepts improves the retrieval effectiveness of term dependency, and reduces the size of the feature set to a large extent.}}


 * -- align="left" valign=top
 * Roth, Camille
 * Viable wikis: struggle for life in the wikisphere
 * Proceedings of the 2007 international symposium on Wikis
 * 2007
 * 


 * -- align="left" valign=top
 * Chan, Bryan; Wu, Leslie; Talbot, Justin; Cammarano, Mike & Hanrahan, Pat
 * Vispedia*: Interactive visual exploration of Wikipedia data via search-based integration
 * IEEE Transactions on Visualization and Computer Graphics
 * 2008
 * 


 * -- align="left" valign=top
 * Chan, Bryan; Talbot, Justin; Wu, Leslie; Sakunkoo, Nathan; Cammarano, Mike & Hanrahan, Pat
 * Vispedia: on-demand data integration for interactive visualization and exploration
 * Proceedings of the 35th SIGMOD international conference on Management of data
 * 2009
 * 
 * {{hidden||Wikipedia is an example of the large, collaborative, semi-structured data sets emerging on the Web. Typically, before these data sets can be used, they must be transformed into structured tables via data integration. We present Vispedia, a Web-based visualization system which incorporates data integration into an iterative, interactive data exploration and analysis process. This reduces the upfront cost of using heterogeneous data sets like Wikipedia. Vispedia is driven by a keyword-query-based integration interface implemented using a fast graph search. The search occurs interactively over DBpedia's semantic graph of Wikipedia, without depending on the existence of a structured ontology. This combination of data integration and visualization enables a broad class of non-expert users to more effectively use the semi-structured data available on the Web.}}


 * -- align="left" valign=top
 * Cruz, Pedro & Machado, Penousal
 * Visualizing empires decline
 * ACM SIGGRAPH 2010 Posters (SIGGRAPH '10)
 * 2010
 * 


 * -- align="left" valign=top
 * Athenikos, Sofia J. & Lin, Xia
 * Visualizing intellectual connections among philosophers using the hyperlink & semantic data from Wikipedia
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 


 * -- align="left" valign=top
 * Sundara, Seema; Atre, Medha; Kolovski, Vladimir; Das, Souripriya; Wu, Zhe; Chong, Eugene Inseok & Srinivasan, Jagannathan
 * Visualizing large-scale RDF data using subsets, summaries, and sampling in oracle
 * 26th IEEE International Conference on Data Engineering, ICDE 2010, March 1, 2010 - March 6, 2010 Long Beach, CA, United states
 * 2010
 * 
 * {{hidden||The paper addresses the problem of visualizing large-scale RDF data via a 3-S approach, namely, by using: 1) Subsets: to present only relevant data for visualization; both static and dynamic subsets can be specified; 2) Summaries: to capture the essence of the RDF data being viewed; summarized data can be expanded on demand, thereby allowing users to create hybrid (summary-detail) fisheye views of RDF data; and 3) Sampling: to further optimize visualization of large-scale data where a representative sample suffices. The visualization scheme works with both asserted and inferred triples (generated using RDF(S) and OWL semantics). This scheme is implemented in Oracle by developing a plug-in for the Cytoscape graph visualization tool, which uses functions defined in an Oracle PL/SQL package to provide fast and optimized access to the Oracle Semantic Store containing RDF data. Interactive visualization of a synthesized RDF data set (LUBM, 1 million triples), two native RDF datasets (Wikipedia, 47 million triples, and UniProt, 700 million triples), and an OWL ontology (eClassOwl, with a large class hierarchy including over 25,000 OWL classes, 5,000 properties, and 400,000 class-properties) demonstrates the effectiveness of our visualization scheme.}}


 * -- align="left" valign=top
 * Viégas, Fernanda & Wattenberg, Martin
 * Visualizing the inner lives of texts
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 


 * -- align="left" valign=top
 * Harrer, Andreas; Moskaliuk, Johannes; Kimmerle, Joachim & Cress, Ulrike
 * Visualizing wiki-supported knowledge building: co-evolution of individual and collective knowledge
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 


 * -- align="left" valign=top
 * Sherwani, J.; Yu, Dong; Paek, Tim; Czerwinski, Mary; Ju, Y.C. & Acero, Alex
 * Voicepedia: Towards speech-based access to unstructured information
 * 8th Annual Conference of the International Speech Communication Association, Interspeech 2007, August 27, 2007 - August 31, 2007 Antwerp, Belgium
 * 2007
 * {{hidden||Currently there are no dialog systems that enable purely voice-based access to the unstructured information on websites such as Wikipedia. Such systems could be revolutionary for non-literate users in the developing world. To investigate interface issues in such a system, we developed VoicePedia, a telephone-based dialog system for searching and browsing Wikipedia. In this paper, we present the system, as well as a user study comparing the use of VoicePedia to SmartPedia, a Smartphone GUI-based alternative. Keyword entry through the voice interface was significantly faster, while search result navigation and page browsing were significantly slower. Although users preferred the GUI-based interface, task success rates between both systems were comparable - a promising result for regions where Smartphones and data plans are not viable.}}


 * -- align="left" valign=top
 * Gordon, Jonathan; Durme, Benjamin Van & Schubert, Lenhart
 * Weblogs as a source for extracting general world knowledge
 * Proceedings of the fifth international conference on Knowledge capture
 * 2009
 * 
 * {{hidden||Knowledge extraction (KE) efforts have often used corpora of heavily edited writing and sources written to provide the desired knowledge (e.g., newspapers or textbooks). However, the proliferation of diverse, up-to-date, unedited writing on the Web, especially in weblogs, offers new challenges for KE tools. We describe our efforts to extract general knowledge implicit in this noisy data and examine whether such sources can be an adequate substitute for resources like Wikipedia.}}


 * -- align="left" valign=top
 * Pantel, Patrick; Crestan, Eric; Borkovsky, Arkady; Popescu, Ana-Maria & Vyas, Vishnu
 * Web-scale distributional similarity and entity set expansion
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
 * 2009
 * 
 * {{hidden||Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion and present a large empirical study to quantify the effect on expansion performance of corpus size, corpus quality, seed composition and seed size. We make public an experimental testbed for set expansion analysis that includes a large collection of diverse entity sets extracted from Wikipedia.}}


 * -- align="left" valign=top
 * Jesus, Rut
 * What cognition does for Wikis
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 
 * {{hidden||Theoretical frameworks need to be developed to account for the phenomenon of Wikipedia and writing in wikis. In this paper, a cognitive framework divides processes into the categories of Cognition for Planning and Cognition for Improvising. This distinction is applied to Wikipedia to understand the many small and the few big edits by which Wikipedia's articles grow. The paper relates the distinction to Lessig's Read-Only and Read-Write, to Benkler's modularity and granularity of contributions, and to Turkle and Papert's bricoleurs and planners. It argues that Wikipedia thrives because it harnesses a Cognition for Improvising surplus oriented by kindness and trust towards distant others, and proposes that Cognition for Improvising is a determinant mode for the success of wikis and Wikipedia. The theoretical framework can be a starting point for a cognitive discussion of wikis, peer-produced commons and new patterns of collaboration.}}


 * -- align="left" valign=top
 * Amer-Yahia, Sihem & Halevy, Alon
 * What does web 2.0 have to do with databases?
 * Proceedings of the 33rd international conference on Very large data bases
 * 2007
 * 
 * {{hidden||Web 2.0 is a buzzword we have been hearing for over 2 years. According to Wikipedia, it hints at an improved form of the World Wide Web built on technologies such as weblogs, social bookmarking, RSS feeds, and photo and video sharing, based on an architecture of participation and democracy that encourages users to add value to the application as they use it. Web 2.0 enables social networking on the Web by allowing users to contribute content, share it, rate it, create a network of friends, and decide what they would like to see and how they want it to look.}}


 * -- align="left" valign=top
 * Oxley, Meghan; Morgan, Jonathan T.; Zachry, Mark & Hutchinson, Brian
 * "What I know is...": establishing credibility on Wikipedia talk pages
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 


 * -- align="left" valign=top
 * Kittur, Aniket; Chi, Ed H. & Suh, Bongwon
 * What's in Wikipedia?: mapping topics and conflict using socially annotated category structure
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 


 * -- align="left" valign=top
 * Thom-Santelli, Jennifer; Cosley, Dan R. & Gay, Geri
 * What's mine is mine: territoriality in collaborative authoring
 * Proceedings of the 27th international conference on Human factors in computing systems
 * 2009
 * 


 * -- align="left" valign=top
 * Halatchliyski, Iassen; Moskaliuk, Johannes; Kimmerle, Joachim & Cress, Ulrike
 * Who integrates the networks of knowledge in Wikipedia?
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 


 * -- align="left" valign=top
 * Halim, Felix; Yongzheng, Wu & Yap, Roland
 * Wiki credibility enhancement
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 


 * -- align="left" valign=top
 * Zhang, Yuejiao
 * Wiki means more: hyperreading in Wikipedia
 * Proceedings of the seventeenth conference on Hypertext and hypermedia
 * 2006
 * 


 * -- align="left" valign=top
 * Kumaran, A.; Saravanan, K.; Datha, Naren; Ashok, B. & Dendi, Vikram
 * WikiBABEL: a wiki-style platform for creation of parallel data
 * Proceedings of the ACL-IJCNLP 2009 Software Demonstrations
 * 2009
 * 
 * {{hidden||In this demo, we present a wiki-style platform -- WikiBABEL -- that enables easy collaborative creation of multilingual content in many non-English Wikipedias, by leveraging the relatively larger and more stable content in the English Wikipedia. The platform provides an intuitive user interface that maintains the user focus on the multilingual Wikipedia content creation, by engaging search tools for easy discoverability of related English source material, and a set of linguistic and collaborative tools to make the content translation simple. We present two different usage scenarios and discuss our experience in testing them with real users. Such an integrated content creation platform in Wikipedia may yield as a by-product parallel corpora that are critical for research in statistical machine translation systems in many languages of the world.}}


 * -- align="left" valign=top
 * Gaio, Loris; den Besten, Matthijs; Rossi, Alessandro & Dalle, Jean-Michel
 * Wikibugs: using template messages in open content collections
 * Proceedings of the 5th International Symposium on Wikis and Open Collaboration
 * 2009
 * 
 * {{hidden||In the paper we investigate an organizational practice meant to increase the quality of commons-based peer production: the use of template messages in wiki-collections to highlight editorial bugs and call for intervention. In the context of SimpleWiki, an online encyclopedia of the Wikipedia family, we focus on Complex, a template which is used to flag articles disregarding the overall goals of simplicity and readability. We characterize how this template is placed on and removed from articles, and we use survival analysis to study the emergence and successful treatment of these bugs in the collection.}}


 * -- align="left" valign=top
 * Nunes, Sérgio; Ribeiro, Cristina & David, Gabriel
 * WikiChanges: exposing Wikipedia revision activity
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 
 * {{hidden||Wikis are popular tools commonly used to support distributed collaborative work. Wikis can be seen as virtual scrap-books that anyone can edit without having any specific technical know-how. Wikipedia is a flagship example of a real-world application of wikis. Due to the large scale of Wikipedia, it's difficult to easily grasp much of the information that is stored in this wiki. We address one particular aspect of this issue by looking at the revision history of each article. Plotting the revision activity in a timeline, we expose the complete article's history in an easily understandable format. We present WikiChanges, a web-based application designed to plot an article's revision timeline in real time. WikiChanges also includes a web browser extension that incorporates activity sparklines in the real Wikipedia. Finally, we introduce a revisions summarization task that addresses the need to understand what occurred during a given set of revisions. We present a first approach to this task using tag clouds to present the revisions made.}}


 * -- align="left" valign=top
 * Mihalcea, Rada & Csomai, Andras
 * Wikify!: linking documents to encyclopedic knowledge
 * Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
 * 2007
 * 


 * -- align="left" valign=top
 * Frisa, Raquel; Anglés, Rosana & Puyal, Óscar
 * WikiMob: wiki mobile interaction
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 


 * -- align="left" valign=top
 * Wikipedia and Artificial Intelligence: An Evolving Synergy - Papers from the 2008 AAAI Workshop
 * 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states
 * 2008


 * -- align="left" valign=top
 * Dooley, Patricia L.
 * Wikipedia and the two-faced professoriate
 * Proceedings of the 6th International Symposium on Wikis and Open Collaboration
 * 2010
 * 
 * {{hidden||A primary responsibility of university teachers is to guide their students in the process of using only the most accurate research resources in their completion of assignments. Thus, it is not surprising to hear that faculty routinely coach their students to use Wikipedia carefully. Even more pronounced anti-Wikipedia backlashes have developed on some campuses, leading faculty to forbid their students to use the popular on-line compendium of information. Within this context, but directing the spotlight away from students, this pilot study uses survey and content analysis research methods to explore how faculty at U.S. universities and colleges regard Wikipedia's credibility as an information source, as well as how they use Wikipedia in their academic work. The results of the survey reveal that while none of the university faculty who completed it regard Wikipedia as an extremely credible source of information, more than half stated it has moderate to high credibility, and many use it in both their teaching and research. The results of the content analysis component of the study demonstrate that academic researchers from across the disciplines are citing Wikipedia as a source of scholarly information in their peer-reviewed research reports. Although the study's research findings are not generalizable, they are surprising considering the professoriate's oft-stated lack of trust in Wikipedia.}}


 * -- align="left" valign=top
 * Tonelli, Sara & Giuliano, Claudio
 * Wikipedia as frame information repository
 * Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
 * 2009
 * 
 * {{hidden||In this paper, we address the issue of automatically extending lexical resources by exploiting existing knowledge repositories. In particular, we deal with the new task of linking FrameNet and Wikipedia using a word sense disambiguation system that, for a given pair frame -- lexical unit (F, l), finds the Wikipage that best expresses the meaning of l. The mapping can be exploited to straightforwardly acquire new example sentences and new lexical units, both for English and for all languages available in Wikipedia. In this way, it is possible to easily acquire good-quality data as a starting point for the creation of FrameNet in new languages. The evaluation reported both for the monolingual and the multilingual expansion of FrameNet shows that the approach is promising.}}


 * -- align="left" valign=top
 * Hansen, Sean; Berente, Nicholas & Lyytinen, Kalle
 * Wikipedia as rational discourse: An illustration of the emancipatory potential of information systems
 * 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007 Big Island, HI, United States
 * 2007
 * 


 * -- align="left" valign=top
 * Santamaría, Celina; Gonzalo, Julio & Artiles, Javier
 * Wikipedia as sense inventory to improve diversity in Web search results
 * Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
 * 2010
 * 


 * -- align="left" valign=top
 * Wales, Jimmy
 * Wikipedia in the free culture revolution
 * OOPSLA '05 Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
 * 2005
 * 
 * {{hidden||Jimmy "Jimbo" Wales is the founder of Wikipedia.org, the free encyclopedia project, and Wikicities.com, which extends the social concepts of Wikipedia into new areas. Jimmy was formerly a futures and options trader in Chicago and currently travels the world evangelizing the success of Wikipedia and the importance of free culture. When not traveling ...}}


 * -- align="left" valign=top
 * Potthast, Martin
 * Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search
 * Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
 * 2007
 * 
 * {{hidden||We develop and implement a new indexing technology which allows us to use complete (and possibly very large) documents as queries, while having a retrieval performance comparable to a standard term query. Our approach aims at retrieval tasks such as near-duplicate detection and high similarity search. To demonstrate the performance of our technology we have compiled the search index "Wikipedia in the Pocket", which contains about 2 million English and German Wikipedia articles. This index -- along with a search interface -- fits on a conventional CD (0.7 gigabyte). The ingredients of our indexing technology are similarity hashing and minimal perfect hashing.}}


 * -- align="left" valign=top
 * Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro
 * Wikipedia mining for an association web thesaurus construction
 * 8th International Conference on Web Information Systems Engineering, WISE 2007, December 3, 2007 - December 7, 2007 Nancy, France
 * 2007
 * {{hidden||Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and the extension method "forward / backward link weighting (FB weighting)", in order to construct a huge-scale association thesaurus. We proved the effectiveness of our proposed methods compared with other conventional methods such as co-occurrence analysis and TF-IDF.}}


 * -- align="left" valign=top
 * Sunercan, Omer & Birturk, Aysenur
 * Wikipedia missing link discovery: A comparative study
 * 2010 AAAI Spring Symposium, March 22, 2010 - March 24, 2010 Stanford, CA, United states
 * 2010


 * -- align="left" valign=top
 * Dutta, Amitava; Roy, Rahul & Seetharaman, Priya
 * Wikipedia Usage Patterns: The Dynamics of Growth
 * 2008
 * 
 * 


 * -- align="left" valign=top
 * Tu, Xinhui; He, Tingting; Chen, Long; Luo, Jing & Zhang, Maoyuan
 * Wikipedia-based semantic smoothing for the language modeling approach to information retrieval
 * 32nd European Conference on Information Retrieval, ECIR 2010, March 28, 2010 - March 31, 2010 Milton Keynes, United Kingdom
 * 2010
 * 
 * {{hidden||Semantic smoothing for the language modeling approach to information retrieval is significant and effective to improve retrieval performance. In previous methods such as the translation model, individual terms or phrases are used to do semantic mapping. These models are not very efficient when faced with ambiguous words and phrases because they are unable to incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous Wikipedia concepts into query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated through the EM algorithm. Document models based on Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track (Disks 1, 2, and 3) collections. Experiments show significant improvements over the two-stage language model, as well as the language model with translation-based semantic smoothing. 2010 Springer-Verlag Berlin Heidelberg.}}


 * -- align="left" valign=top
 * Gray, D.; Kozintsev, I.; Wu, Yi & Haussecker, H.
 * Wikireality: augmenting reality with community driven Websites
 * 2009 IEEE International Conference on Multimedia and Expo (ICME), 28 June-3 July 2009 Piscataway, NJ, USA
 * 2009
 * 
 * {{hidden||We present a system for making community driven websites easily accessible from the latest mobile devices. Many of these new devices contain an ensemble of sensors such as cameras, GPS and inertial sensors. We demonstrate how these new sensors can be used to bring the information contained in sites like Wikipedia to users in a much more immersive manner than text or maps. We have collected a large database of images and articles from Wikipedia and show how a user can query this database by simply snapping a photo. Our system uses the location sensors to assist with image matching and the inertial sensors to provide a unique and intuitive user interface for browsing results.}}


 * -- align="left" valign=top
 * Strube, Michael & Ponzetto, Simone Paolo
 * WikiRelate! computing semantic relatedness using Wikipedia
 * AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
 * 2006
 * 
 * {{hidden||Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.}}


 * -- align="left" valign=top
 * Toms, Elaine G.; Mackenzie, Tayze; Jordan, Chris & Hall, Sam
 * wikiSearch: enabling interactivity in search
 * Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
 * 2009
 * 


 * -- align="left" valign=top
 * Ferrari, Luna De; Aitken, Stuart; van Hemert, Jano & Goryanin, Igor
 * WikiSim: simulating knowledge collection and curation in structured wikis
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 
 * {{hidden||The aim of this work is to model quantitatively one of the main properties of wikis: how high quality knowledge can emerge from the individual work of independent volunteers. The approach chosen is to simulate knowledge collection and curation in wikis. The basic model represents the wiki as a set of true/false values, added and edited at each simulation round by software agents (users) following a fixed set of rules. The resulting WikiSim simulations already manage to reach distributions of edits and user contributions very close to those reported for Wikipedia. WikiSim can also span conditions not easily measurable in real-life wikis, such as the impact of various amounts of user mistakes. WikiSim could be extended to model wiki software features, such as discussion pages and watch lists, while monitoring the impact they have on user actions and consensus, and their effect on knowledge quality. The method could also be used to compare wikis with other curation scenarios based on centralised editing by experts. The future challenges for WikiSim will be to find appropriate ways to evaluate and validate the models and to keep them simple while still capturing relevant properties of wiki systems.}}


 * -- align="left" valign=top
 * West, Robert; Pineau, Joelle & Precup, Doina
 * Wikispeedia: an online game for inferring semantic distances between concepts
 * Proceedings of the 21st international joint conference on Artificial intelligence
 * 2009
 * 


 * -- align="left" valign=top
 * Ponzetto, Simone Paolo & Strube, Michael
 * WikiTaxonomy: A Large Scale Knowledge Resource
 * Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
 * 2008
 * 


 * -- align="left" valign=top
 * Mazur, Paweł & Dale, Robert
 * WikiWars: a new corpus for research on temporal expressions
 * Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
 * 2010
 * 
 * {{hidden||The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of temporally-rich documents sourced from English Wikipedia, which we have annotated with TIMEX2 tags. The corpus contains around 120000 tokens, and 2600 TIMEX2 expressions, thus comparing favourably in size to other existing corpora used in these areas. We describe the preparation of the corpus, and compare the profile of the data with other existing temporally annotated corpora. We also report the results obtained when we use DANTE, our temporal expression tagger, to process this corpus, and point to where further work is required. The corpus is publicly available for research purposes.}}


 * -- align="left" valign=top
 * Sarini, Marcello; Durante, Federica & Gabbiadini, Alessandro
 * Workflow management social systems: A new socio-psychological perspective on process management
 * Business Process Management Workshops - BPM 2009 International Workshops, September 7, 2009 - September 7, 2009 Ulm, Germany
 * 2010
 * 


 * -- align="left" valign=top
 * Ortega, Felipe; Reagle, Joseph; Reinoso, Antonio J. & Jesus, Rut
 * Workshop on interdisciplinary research on Wikipedia and wiki communities
 * Proceedings of the 4th International Symposium on Wikis
 * 2008
 * 


 * -- align="left" valign=top
 * Voss, Jakob
 * Workshop on Wikipedia research
 * Proceedings of the 2006 international symposium on Wikis
 * 2006
 * 


 * -- align="left" valign=top
 * Amer-Yahia, Sihem; Baeza-Yates, Ricardo; Consens, Mariano P. & Lalmas, Mounia
 * XML Retrieval: DB/IR in theory, web in practice
 * Proceedings of the 33rd international conference on Very large data bases
 * 2007
 * 
 * {{hidden||The world of data has been developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web with its semi-structured XML model. Data in Digital Libraries and in Enterprise Environments also shares many of the semi-structured characteristics of web data. As web-style searching becomes a ubiquitous tool, the need for integrating these two viewpoints becomes even more important. In particular, we consider the application of DB and IR research to querying Web data in the context of online communities. With Web 2.0, the question arises: how can search interfaces remain simple when users are allowed to contribute content (Wikipedia), share it (Flickr), and rate it (YouTube)? When they can decide who their friends are (del.icio.us), what they like to see, and how they want it to look like (MySpace)? While we want to keep the user interface simple (keyword search), we would like to study the applicability of querying structure and content to a context where new forms of data-driven dynamic web content (e.g. user feedback, tags, contributed multimedia) are provided. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities and survey the DB-IR integration efforts as they focus on the problem of retrieval from XML content. In particular, the context of querying content in online communities is an excellent example of such an application. Both earlier proposals as well as recent ones will be discussed. A variety of application scenarios for XML Retrieval will be covered, including examples of current tools and techniques.}}


 * -- align="left" valign=top
 * Suchanek, Fabian M.; Kasneci, Gjergji & Weikum, Gerhard
 * Yago: a core of semantic knowledge
 * Proceedings of the 16th international conference on World Wide Web
 * 2007
 * 
 * {{hidden||We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASONEPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships - and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.}}