User:Linaist/Looking for topics

Over the last week I did some further study on probabilistic modeling. Although I noticed a couple of things that could be done in this area, I do not think they are worth trying: 1. Substitute the third layer of LDA with a long-distance language model. 2. Incorporate entities as a feature in a topic model based on LDA. I think neither idea is appropriate. For the first, I consulted several language modeling papers and came away feeling that long-distance language models are not accurate, and so not as promising as expected. For the second, I'm afraid it is a trivial extension. I also talked with Bingjun; according to him, the entities in the Chemistry domain are very dirty, so this approach might not produce good results.

I now realize that the probabilistic model is not suitable for ChemXseer, and that there is really not much to do on ChemXseer. With this in mind, I went through the papers published at SIGIR07 and SIGIR08. I think the following topics are interesting, but I don't know whether I should go further with any of them; at this point, I need guidance: 1. People/author search (the name disambiguation problem). 2. Using user logs to enhance ranking, search, and related issues. 3. Detecting negative citations of a paper (this idea is inspired by the current evaluation metrics for journals and individuals, such as IF and H-index; both consider only the number of citations and neglect the possibility that a paper might be cited negatively hundreds of times because of flawed experiments or a wrong theory). 4. A QA system for the Chemistry domain.

I am sorry for not being very productive this week. I am still in the middle of choosing an appropriate topic, and I need guidance.