User talk:Xiaoju zheng

Name: Xiaoju Zheng

Social Network Analysis Project Proposal

This project examines how different network structures among people influence the language conventions they adopt or, more specifically, the pattern of collective vocabulary shared by people within a given network structure. To this end, the tagging vocabulary and social network structures of users on Flickr.com will be examined, to shed light on language use in another dimension of the world: the internet. Flickr.com is a public photo-sharing website where people can freely upload pictures and create semantic metadata, i.e., tags, to describe them. Thanks to the boom in the digital camera industry and the increasing accessibility of the internet, more than 2,000,000,000 photos had been uploaded to Flickr.com by the end of 2007, and more than 40,000,000 people visited the website every month. User participation, which Flickr.com encourages, is an essential aspect of tagging. Users of Flickr.com can also add other users as friends, and in this way a type of friendship relation is formed on the site. A user who adds another user as a friend is notified when that friend uploads a new picture, which makes it easy to view the new picture and, at the same time, exposes the user to the tags the friend has annotated it with.

Since the photos a person takes are a rather candid manifestation of that person’s life (where they live, what their lifestyle is, what their aesthetic views are, etc.), and the words people use as tags are very loosely controlled, the tagging vocabulary of a given user can be highly individualized in its word choices. However, languages are established, maintained, and acquired primarily through one-to-one interactions, which on Flickr.com take the form of picture viewing. As described above, users can keep up to date with their friends’ photo streams and hence stay well informed of the tagging vocabulary their friends use. Aggregation of tagging vocabulary could indicate aggregation of interests, and by forming closely connected friendship networks, people also merge their highly individualized tagging vocabularies into a more homogeneous collective tagging vocabulary.

Besides, people’s main motivations for tagging their own pictures on Flickr.com are search and self-retrieval, and making the photos searchable by other contributing users, since tags enable users to discover other users’ photos. In other words, the traditional use of tags is for annotation, personal organization, and retrieval of information. At the same time, adding tags to photos increases their searchability and, hence, the number of views they receive. In this way, when users view pictures on Flickr.com, they are also exposed to a variety of tagging vocabularies associated with those pictures.

Based on these observations, three hypotheses are raised:

Hypothesis 1: More tagging vocabulary will be shared among people who are friends with each other. In other words, tagging vocabulary overlaps more among people who are friends with each other than among those who are not. Another way to frame this hypothesis is that people who share more tags tend to be friends with each other.
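
One way Hypothesis 1 could be checked is sketched below. The proposal does not name an overlap measure, so the Jaccard index used here is an assumption, as are the function names `jaccard` and `mean_overlap_by_friendship`. Friendship is treated as undirected for simplicity, which may not match how contacts work on Flickr.com.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two tag collections (assumed overlap measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _mean(values):
    """Arithmetic mean, or NaN when there are no values."""
    return sum(values) / len(values) if values else float("nan")


def mean_overlap_by_friendship(tags, friendships):
    """Return (mean overlap of friend pairs, mean overlap of non-friend pairs).

    tags:        dict mapping each actor to an iterable of tags
    friendships: iterable of (actor, actor) pairs, treated as undirected
    """
    friend_pairs = {frozenset(pair) for pair in friendships}
    friend_scores, other_scores = [], []
    for u, v in combinations(sorted(tags), 2):
        score = jaccard(tags[u], tags[v])
        if frozenset((u, v)) in friend_pairs:
            friend_scores.append(score)
        else:
            other_scores.append(score)
    return _mean(friend_scores), _mean(other_scores)
```

Under the hypothesis, the first returned value would be expected to exceed the second; whether the difference is meaningful would still need a proper statistical test on the crawled data.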

Hypothesis 2: The degree of tagging vocabulary overlap will be associated with features of social network structure such as centrality, transitivity, homophily, and assimilation. For instance, people who have common interests tend to form friendship relations, and the shared interests may be manifested as overlapping tagging vocabulary; after people have become friends and view their friends’ photo streams regularly, they merge their interests or get more exposure to others’ tagging vocabulary, and come to an agreement on using certain words that serve annotation functions better; and subgroups with higher clustering would have a higher degree of tagging vocabulary overlap.

Hypothesis 3 (of most interest): Based on hypotheses 1 and 2, a further proposal is that clustering of tags will be highly correlated with clustering of people in the network. Clustering of tags, or co-occurrence of certain sets of tags, can serve as a way to classify people into groups or subgroups within a network. The N-gram is a similar concept used in document classification, and it can be adapted to the scenario of tagging in a friendship network. In other words, shared tags can serve as N-grams to measure the similarity of people in the network: e.g., larger sets of shared tags indicate smaller geodesic distance.
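
The relation between shared tags and geodesic distance in Hypothesis 3 could be explored along the following lines. This is only a sketch: the breadth-first search assumes an unweighted, undirected friendship network given as an adjacency dictionary, and the names `geodesic_distances` and `shared_tags_vs_distance` are illustrative, not part of any existing tool.

```python
from collections import deque


def geodesic_distances(adjacency, source):
    """Shortest path lengths (in hops) from source to every reachable actor."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def shared_tags_vs_distance(adjacency, tags):
    """List (actor, actor, geodesic distance, number of shared tags).

    Pairs with no connecting path are skipped, since their geodesic
    distance is undefined.
    """
    actors = sorted(tags)
    rows = []
    for i, u in enumerate(actors):
        distance = geodesic_distances(adjacency, u)
        for v in actors[i + 1:]:
            if v in distance:
                shared = len(set(tags[u]) & set(tags[v]))
                rows.append((u, v, distance[v], shared))
    return rows
```

The resulting rows could then be used to test whether the number of shared tags falls as geodesic distance grows, for example with a rank correlation.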

Method: An adapted FOAF (Friend-of-a-friend) algorithm will be used to crawl friendship relations on Flickr.com, yielding a network of people represented as an actor-by-actor matrix. For instance, a network of 1000 actors could be obtained in this way. The tags used by these 1000 actors would then be crawled from the website. Since a single actor may use well over 100 distinct tags, only the 20 most frequent tags of each actor would be selected. In this way, an actor-by-tag matrix would be obtained. It is very likely that these 1000 actors would share some tags, so the overall number of distinct tags in this matrix would be much less than 20*1000. Both matrices are binary, containing only 0 or 1 values. Further analysis would be carried out based on these two matrices.
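
The construction of the binary actor-by-tag matrix could look roughly like the sketch below. It assumes the crawl has already produced, for each actor, a dictionary of tag frequencies; the crawling step itself is not shown, and the names `top_tags` and `actor_by_tag_matrix` are hypothetical. Ties in frequency are broken alphabetically, which is an arbitrary choice made here for reproducibility.

```python
def top_tags(tag_counts, k=20):
    """Return the k most frequent tags of one actor.

    tag_counts maps tag -> number of uses; ties are broken alphabetically.
    """
    ranked = sorted(tag_counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def actor_by_tag_matrix(tags_by_actor, k=20):
    """Build a binary actor-by-tag matrix from per-actor tag frequencies.

    Returns (actors, vocabulary, matrix), where matrix[i][j] is 1 if
    vocabulary[j] is among the k most frequent tags of actors[i], else 0.
    """
    actors = sorted(tags_by_actor)
    selected = {a: set(top_tags(tags_by_actor[a], k)) for a in actors}
    # Tags shared by several actors appear only once in the column list,
    # so the number of columns is at most k * len(actors).
    vocabulary = sorted(set().union(*selected.values()))
    matrix = [[1 if tag in selected[a] else 0 for tag in vocabulary]
              for a in actors]
    return actors, vocabulary, matrix
```

For example, with `k=2`, an actor whose counts are `{"sunset": 5, "beach": 3, "cat": 1}` contributes only `sunset` and `beach` to the matrix.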

Fuller citations for Self-organization in linguistics
Hi Xiaoju. If you are still active on Wikipedia, several editors would appreciate it if you would come back to the article on Self-organization and add more complete citations to the Linguistics section. It appears you wrote a lot of this material in 2008, but the citations are not sufficiently complete so they can be verified by other editors. Cheers. N2e (talk) 14:35, 24 January 2010 (UTC)