User:Linzhuoli/sandbox

Word Embedding

Development of Word Embedding Technique
Word embedding techniques have been developing since 2000. In a series of papers, Bengio et al. introduced "neural probabilistic language models", which reduce the high dimensionality of word representations in context by "learning a distributed representation for words" (Bengio et al., 2003). Roweis and Saul published in Science how to use "locally linear embedding" (LLE) to discover low-dimensional representations of high-dimensional data. The field developed gradually and took off after 2010, partly because of important advances since then in the quality of the vectors and the training speed of the models. Word embedding now has many branches and many research groups working on it. Probably the best-known group is the one led by Tomas Mikolov at Google (Mikolov et al.). In 2013 they released the word2vec toolkit, whose learned vectors can be used to solve word analogies through simple arithmetic in the vector space. Most newer word embedding techniques rely on neural network architectures trained with unsupervised learning rather than on more traditional "n-gram" models.
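The analogy property mentioned above is usually tested with a vector-offset approach: to answer "a is to b as c is to ?", compute b - a + c and pick the vocabulary word whose vector is most similar to the result by cosine similarity. The sketch below illustrates that idea only. It does not use the word2vec toolkit itself, and the tiny 3-dimensional vectors in `EMBEDDINGS` are hypothetical, hand-picked values rather than trained embeddings. The helper names `cosine` and `analogy` are likewise illustrative.

```python
import math

# Hypothetical 3-dimensional toy vectors, hand-picked for illustration only.
# Real word2vec vectors are learned from large corpora and have far more dimensions.
EMBEDDINGS = {
    "king":  [0.8, 0.7, 0.1],
    "queen": [0.8, 0.1, 0.7],
    "man":   [0.2, 0.8, 0.1],
    "woman": [0.2, 0.2, 0.7],
    "apple": [0.1, 0.3, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the three query words, as is common in analogy evaluations.
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


if __name__ == "__main__":
    # "man is to king as woman is to ?"
    print(analogy("man", "king", "woman", EMBEDDINGS))  # → queen
```

With these toy values, king - man + woman lands exactly on the vector chosen for "queen". With trained embeddings the result is only approximately close, which is why the nearest neighbour is taken instead of requiring an exact match.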