User:Ang li/sandbox

Tag Navigation and Visualization
People seek useful information every day, and there are two main strategies for exploring and discovering an information space. The first is regular search, where people know what they are looking for. In this setting, users have a target piece of information in mind: they first formulate a query and then submit it to a search engine such as Google. The second strategy is navigation, where people do not need a specific target in mind but instead explore pieces of information by following hyperlinks.

Navigation is usually considered to have an advantage over search, since recognizing what we are looking for is much easier than formulating and describing the information we need; the latter difficulty is known as the "vocabulary problem". Social tagging emerged as a new, social way of organizing a set of resources, and this free-form annotation approaches the "vocabulary problem" from a social angle. Social tagging systems allow people to annotate a set of resources according to their own needs with freely chosen words (tags) and to share them with other users of the system. The result of this human-based annotation of resources is called a folksonomy, that is, a folk-generated taxonomy. Examples of such social tagging systems are BibSonomy, CiteULike, Flickr, and Delicious.

Tag-based navigation is the process of finding a path between information resources in a tag-based information system. The navigation process is usually supported by either a tag cloud or a tag hierarchy.

Tag Cloud
A tag cloud is a textual representation of the topic or subject of a resource as collectively seen by the users; it captures the "aboutness" of the resource.

On the one hand, tag clouds have many benefits: they are simple to build, intuitive to understand, and widely used, and they can represent the three types of relationship among users, tags, and resources in a tagging system. On the other hand, only a limited number of tags fit on the screen, so selecting the best tags and structuring the information space to present the relationships in the tag cloud is an important issue.

Although tag clouds are very simple, they can support the user in multiple ways. Research finds that tag clouds are particularly useful for the following four tasks, as described by Rivadeneira et al.:
 * Search: finding the presence or absence of a given target
 * Browsing: exploring the cloud without a particular target in mind
 * Gaining visual impression about a topic
 * Recognition and matching: recognizing the tag cloud as data describing a specific topic

Research has also found that different layouts are useful for different tasks. In addition, tag cloud typography (font size and position) matters: font size has a bigger impact on finding a tag than other visual features such as color, tag string length, and tag location.
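Because font size is the dominant visual cue, a tag cloud has to decide how tag frequency maps to font size. The following is a minimal sketch of one common choice, linear interpolation between a smallest and a largest size. The function name `font_sizes`, the point-size bounds, and the sample counts are illustrative assumptions and are not taken from the studies cited above.

```python
def font_sizes(tag_counts, min_pt=10, max_pt=32):
    """Map tag frequencies to font sizes by linear interpolation.

    min_pt and max_pt are assumed display bounds, not values from the literature.
    """
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    if lo == hi:
        # All tags are equally frequent: give every tag a mid-range size.
        return {tag: (min_pt + max_pt) / 2 for tag in tag_counts}
    return {
        tag: min_pt + (count - lo) * (max_pt - min_pt) / (hi - lo)
        for tag, count in tag_counts.items()
    }


# Hypothetical tag frequencies.
sizes = font_sizes({"python": 50, "web": 30, "css": 10})
print(sizes)
```

Logarithmic scaling is another option when a few tags are far more frequent than the rest, since linear scaling then pushes most tags toward the minimum size.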

Tag cloud evaluation: previous research has evaluated tag clouds in several ways:
 * Evaluation metrics for tag clouds with respect to coverage, overlap and selectivity
 * A user navigation model that, combined with the evaluation metrics, allows a tag cloud to be evaluated with respect to navigation
 * User studies that evaluate tag-based information access in image collections
 * Examination of the navigability assumption (the widely adopted belief that tag clouds are useful for navigation), which found that the assumption does not hold for every social tagging system
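To make the metric-based evaluation more concrete, the sketch below computes two of the metrics named above under assumed definitions: coverage as the fraction of resources reachable through at least one tag in the cloud, and overlap as the average pairwise overlap between the resource sets of the cloud's tags. These definitions are one plausible reading and may differ in detail from those used in the literature; selectivity is left out because its definition is not given here. All names and data are hypothetical.

```python
from itertools import combinations


def coverage(cloud_tags, tag_to_resources, all_resources):
    """Fraction of resources annotated with at least one tag in the cloud (assumed definition)."""
    covered = set()
    for tag in cloud_tags:
        covered |= tag_to_resources.get(tag, set())
    return len(covered & all_resources) / len(all_resources)


def overlap(cloud_tags, tag_to_resources):
    """Average pairwise overlap of the tags' resource sets (assumed definition)."""
    ratios = []
    for a, b in combinations(cloud_tags, 2):
        res_a = tag_to_resources.get(a, set())
        res_b = tag_to_resources.get(b, set())
        smaller = min(len(res_a), len(res_b))
        if smaller:
            ratios.append(len(res_a & res_b) / smaller)
    return sum(ratios) / len(ratios) if ratios else 0.0


# Hypothetical annotations: tag -> set of resource ids.
tag_to_resources = {"python": {1, 2, 3}, "web": {3, 4}, "css": {4}}
all_resources = {1, 2, 3, 4, 5}
cloud = ["python", "web"]

cov = coverage(cloud, tag_to_resources, all_resources)
ovl = overlap(cloud, tag_to_resources)
print(cov, ovl)
```

Under these definitions a good cloud has high coverage and low overlap: its tags together reach most resources while each tag leads to a different part of the collection.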

Tag Clustering
One of the main issues with social tagging data is the lack of structure. Synonymy, polysemy, and homonymy, that is, problems regarding the semantics of the tags, are additional issues. Previous research has proposed different algorithms for clustering tagging data, which can tackle these problems by organizing the tags according to a classification schema. Depending on the classification schema, there are two main categories: flat and hierarchical clustering algorithms.

Flat classification can refer to three main methods:
 * 1) Content-based method: a widely adopted algorithm for tag cloud selection is the TopN algorithm proposed by Venetis et al.
 * 2) Network-based method: splits a graph of connected tags into clusters, drawing on the concept of modularity.
 * 3) Machine learning method: considers the semantic relationships between tags, similar in idea to the latent Dirichlet allocation (LDA) model.

Hierarchical tag clustering creates a hierarchical structure out of unstructured tagging data. The resulting hierarchy can be seen as the users' mental map of the information space, so it can be used as a navigational aid in different ways. Hierarchical tag clustering can also refer to three main methods:
 * Hierarchical K-Means adapts the K-Means algorithm to work with textual data and creates a tag hierarchy in a top-down manner
 * Affinity Propagation characterizes each data sample by its "responsibility" and "availability" values. The input of the algorithm is a matrix of similarities between data samples, and the output is a hierarchy in which each node represents a unique tag
 * The Generality in Tag Similarity Graph method consists of the following steps:
   * The input of the algorithm is a tag similarity graph
   * The most general node (the node with the highest centrality) becomes the root of the hierarchy
   * All other nodes are added to the hierarchy in descending order of their centrality in the similarity graph
   * For each candidate node, the similarity between the candidate and every node currently in the hierarchy is calculated
   * If the highest similarity is above a given threshold, the candidate node is added as a child of the most similar node in the hierarchy
   * Otherwise, the candidate node is added as a child of the root node
 * Typical versions of this method:
   * Degree centrality as centrality measure and co-occurrence as similarity measure (DegCen/Cooc)
   * Closeness centrality and cosine similarity (CloCen/Cos)
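The steps of the generality-based method can be sketched as follows for the DegCen/Cooc variant. This is a minimal illustration under stated assumptions, not the reference implementation: the similarity graph is given as a symmetric dictionary of co-occurrence counts, degree centrality is taken as the number of distinct co-occurring tags, ties are broken alphabetically, and the function name `build_hierarchy`, the threshold value, and the sample data are hypothetical.

```python
def build_hierarchy(cooc, threshold):
    """Build a tag hierarchy from a co-occurrence graph (DegCen/Cooc sketch).

    cooc: dict mapping each tag to a dict {neighbor_tag: co-occurrence count}.
    Returns a dict mapping each tag to its parent (None for the root).
    """
    # Degree centrality: number of distinct tags a tag co-occurs with.
    centrality = {tag: len(neighbors) for tag, neighbors in cooc.items()}
    # Descending centrality; alphabetical order breaks ties (an assumption).
    ordered = sorted(cooc, key=lambda t: (-centrality[t], t))

    root = ordered[0]
    parent = {root: None}
    for candidate in ordered[1:]:
        # Most similar node among those already placed in the hierarchy.
        best = max(parent, key=lambda node: cooc[candidate].get(node, 0))
        if cooc[candidate].get(best, 0) > threshold:
            parent[candidate] = best
        else:
            parent[candidate] = root
    return parent


# Hypothetical symmetric co-occurrence counts.
cooc = {
    "programming": {"python": 5, "java": 4, "web": 3},
    "python": {"programming": 5, "django": 4, "web": 1},
    "java": {"programming": 4},
    "web": {"programming": 3, "python": 1, "django": 2},
    "django": {"python": 4, "web": 2},
}

hierarchy = build_hierarchy(cooc, threshold=2)
print(hierarchy)
```

In this example "programming" becomes the root, and "django" is attached below "python" because their co-occurrence count exceeds the threshold. The CloCen/Cos variant would swap in closeness centrality and cosine similarity while keeping the same construction loop.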

Modeling Navigation in Social Tagging Systems
Modeling tag-based navigation is important for understanding the processes taking place in a social tagging system and how the system is used. Two factors are essential for modeling tag-based navigation in social tagging systems: a basic modeling framework for navigation, and a theoretical understanding of the ability of folksonomies to guide navigation.

Markov chain models:
 * Navigation on the Web can be seen as the process of following links between web pages
 * Markov chain models assign transition probabilities between web pages (also called states)
 * First-order Markov chains (the transition probability between states depends only on the current state) are more commonly used

Decentralized search:
 * Navigation in a network can be modeled by decentralized search, a message-passing algorithm
 * The message holder passes the message to one of its immediate neighbor nodes, and this repeats until the target node is found
 * That is, at each step, the decision of where to go next is made using only local knowledge of the network
 * Finding a path to a node (already realized in web navigation)
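As an illustration of the first-order Markov chain view, transition probabilities can be estimated from observed navigation paths by counting how often each state is followed by each other state. The sketch below assumes maximum-likelihood estimation from hypothetical click paths; the function name and the data are illustrative only.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Estimate first-order transition probabilities from navigation paths.

    paths: list of state sequences (for example, visited pages or tags).
    Returns a nested dict: probabilities[current][next] = P(next | current).
    """
    counts = defaultdict(Counter)
    for path in paths:
        # Each consecutive pair of states is one observed transition.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical navigation paths.
paths = [
    ["home", "python", "django"],
    ["home", "python", "flask"],
    ["home", "web", "django"],
]

probabilities = transition_probabilities(paths)
print(probabilities["home"])
```

Because the chain is first-order, the estimate for leaving "python" ignores how the user reached "python"; a higher-order model would condition on more of the preceding path.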

Theoretic Suitability for Search

Scholars have also provided theoretical support for the suitability of folksonomies as a navigational aid. There are four main perspectives:
 * Network-theoretic perspective, which has two aspects: the general navigability of a folksonomy as a graph, and the ability of tag hierarchies to guide navigation in such a graph
 * Information-theoretic perspective: suggests viewing social tagging as the collective effort of creating a mental map that summarizes an information space
 * Information foraging perspective: describes human information seeking in a digital environment
 * Tagging vs. library approach: discusses the pros and cons of tagging systems. This line of work proposes a definition of a controlled vocabulary and compares the unrestricted free-form vocabularies that emerge in social tagging systems with controlled vocabularies

Pragmatic Folksonomies Evaluation
The evaluation method introduced in this section is based on the paper "Pragmatic Evaluation of Folksonomies" by Helic et al.

The authors propose that the output produced by folksonomy algorithms (hierarchical structures) can be used as input (background knowledge) for decentralized search, for the following reasons:

1) The performance of decentralized search depends heavily on the quality of the hierarchical clustering results that were developed to facilitate navigation.

2) The performance of the decentralized search algorithm therefore depends on the suitability of the folksonomy.

3) Consequently, decentralized search can be simulated to evaluate the suitability of folksonomies.

The specific evaluation steps in the paper are as follows:

Step 1. Generate tag hierarchy structures using different algorithms

Step 2. Classification of searchable networks
 * Calculate the distribution of the distances in the tag hierarchy between tags that are connected in the tag-tag network
 * Analyze the distribution to see how it compares to the distance distribution of the class of theoretically searchable networks
 * Depending on the comparison results, assess the theoretical suitability of folksonomies for decentralized search

Step 3. Modeling user navigation tasks
 * Select the start nodes in the network (uniformly at random)
 * Select the target nodes in the network (uniformly at random)
 * A start node together with its set of target nodes is called a search pair
 * The goal is to find a short path between the start node and one of the target nodes

Step 4. Defining evaluation metrics
 * The usual metric is the length of the shortest path for each search pair

Step 5. Simulating user behavior using the search pairs
 * Simulate exploratory navigation by performing decentralized search using a greedy search strategy on the search pairs (defined in Step 3)
 * The folksonomy is applied as background knowledge
 * Search is considered successful if the algorithm finds at least one of the target tags
 * The success rate provides an answer to the first question: the pragmatic suitability of a folksonomy to support navigation

Step 6. Evaluation
 * Compare the decentralized search steps (Step 5) with the evaluation metrics (Step 4)
 * The simulation results for different folksonomies are compared to each other
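The simulation in Step 5 can be sketched as a greedy decentralized search in which the tag hierarchy supplies the background knowledge: at each step the searcher moves to the unvisited neighbor whose distance to the target in the hierarchy is smallest. The code below is a simplified illustration under several assumptions that are not taken from the paper: each search pair has a single target instead of a set, the hierarchy is a parent mapping, visited nodes are never revisited, and all names, the step limit, and the sample network are hypothetical.

```python
import random


def ancestors(node, parent):
    """Return the chain from a node up to the root of the hierarchy."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def tree_distance(a, b, parent):
    """Number of edges between two nodes in the hierarchy."""
    depth_from_b = {node: i for i, node in enumerate(ancestors(b, parent))}
    for i, node in enumerate(ancestors(a, parent)):
        if node in depth_from_b:
            return i + depth_from_b[node]
    raise ValueError("nodes are not in the same hierarchy")


def greedy_search(network, parent, start, target, max_steps=20):
    """Return the number of hops needed to reach the target, or None on failure."""
    current, visited = start, {start}
    for step in range(max_steps + 1):
        if current == target:
            return step
        candidates = [n for n in network[current] if n not in visited]
        if not candidates:
            return None  # dead end: only local knowledge, no backtracking
        # Greedy choice guided by the hierarchy (the background knowledge).
        current = min(candidates, key=lambda n: tree_distance(n, target, parent))
        visited.add(current)
    return None


def success_rate(network, parent, pairs, max_steps=20):
    """Fraction of search pairs for which greedy search reaches the target."""
    hops = [greedy_search(network, parent, s, t, max_steps) for s, t in pairs]
    return sum(h is not None for h in hops) / len(pairs)


# Hypothetical tag-tag network and tag hierarchy.
network = {
    "programming": ["python", "java", "web"],
    "python": ["programming", "django", "web"],
    "java": ["programming"],
    "web": ["programming", "python", "django"],
    "django": ["python", "web"],
}
parent = {"programming": None, "python": "programming", "web": "programming",
          "django": "python", "java": "programming"}

# Search pairs drawn uniformly at random, as in Step 3.
rng = random.Random(0)
pairs = [tuple(rng.sample(sorted(network), 2)) for _ in range(10)]

hops = greedy_search(network, parent, "java", "django")
rate = success_rate(network, parent, pairs)
print(hops, rate)
```

Running the same search pairs against hierarchies produced by different folksonomy algorithms, and comparing the resulting success rates and hop counts with the shortest-path lengths from Step 4, corresponds to the comparison described in Step 6.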