User:Michal.s.wojcik/sandbox

$$Q = (q_1, \dots, q_n)$$ - the XML log used as the query, where $$q_i$$ is its $$i$$-th line

$$D$$ - an XML log chosen from the database

$$RANK(Q,D) = \sum_{i=1}^n IDF_{tr1}^*(q_i) \cdot \left(\sum_{j=1}^{len(D)} sim_{tr2}^*(q_i, D(j))\right)$$

Where:

$$IDF_{tr1}^{*}(q) = \log\left(\frac{N - n_{tr1}^*(q) + 0.5}{n_{tr1}^*(q) + 0.5}\right) - \log\left(\frac{N + 0.5}{0.5}\right)$$

$$n_{tr1}^*(q)$$ - the number of documents in the database that contain at least one row $$r$$ with $$sim(q,r) \geq tr1$$. We use $$tr1 = 0.9$$.

$$sim_{tr2}^*(q,r) = \begin{cases} 0 \text{ if } sim(q,r) < tr2,\\ sim(q,r) \text{ if } sim(q,r)\geq tr2 \end{cases}$$. We use $$tr2 = 0.8$$

$$sim(q,r)$$ - the chosen similarity function between two lines
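
The thresholding step above can be sketched in Python as follows. The text leaves $$sim$$ unspecified, so the `sim` function here is only a stand-in: it uses `difflib.SequenceMatcher` from the standard library as an assumed line similarity in the range [0, 1]. The names `sim`, `sim_thresholded` and `TR2` are illustrative and do not come from the original.

```python
from difflib import SequenceMatcher

TR2 = 0.8  # threshold tr2 from the text


def sim(q: str, r: str) -> float:
    """Assumed line similarity in [0, 1]; the text does not fix this choice."""
    return SequenceMatcher(None, q, r).ratio()


def sim_thresholded(q: str, r: str, tr: float = TR2) -> float:
    """Return sim(q, r) if it reaches the threshold, otherwise 0."""
    s = sim(q, r)
    return s if s >= tr else 0.0
```

With this stand-in, identical lines score 1 and lines with no characters in common score 0, so weak matches below $$tr2$$ do not contribute to the inner sum of $$RANK$$.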

$$len(D)$$ - the number of lines in $$D$$

$$D(j)$$ - the $$j$$-th line of document $$D$$

$$N$$ - the number of documents in the database
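
A minimal sketch of how the $$RANK$$ and logarithmic $$IDF^*$$ formulas above could be computed, assuming each log is a list of text lines and the database is a list of such logs. The similarity function is passed in as a parameter because the text does not fix it; `exact_sim` is a hypothetical placeholder used only for the usage example, and all function names are illustrative. Note that the IDF term, taken literally, is never positive: it equals 0 when no document matches the line and decreases as more documents match.

```python
import math
from typing import Callable, Sequence

Line = str
Log = Sequence[Line]
Sim = Callable[[Line, Line], float]


def n_matching(q: Line, database: Sequence[Log], sim: Sim, tr1: float = 0.9) -> int:
    """Count documents with at least one row r such that sim(q, r) >= tr1."""
    return sum(1 for doc in database if any(sim(q, r) >= tr1 for r in doc))


def idf_star(q: Line, database: Sequence[Log], sim: Sim, tr1: float = 0.9) -> float:
    """Logarithmic IDF weight, shifted by its value at n = 0."""
    N = len(database)
    n = n_matching(q, database, sim, tr1)
    return math.log((N - n + 0.5) / (n + 0.5)) - math.log((N + 0.5) / 0.5)


def rank(query: Log, doc: Log, database: Sequence[Log], sim: Sim,
         tr1: float = 0.9, tr2: float = 0.8) -> float:
    """Sum over query lines of IDF weight times thresholded similarity mass in doc."""
    total = 0.0
    for q in query:
        matched = sum(s for s in (sim(q, r) for r in doc) if s >= tr2)
        total += idf_star(q, database, sim, tr1) * matched
    return total


def exact_sim(q: Line, r: Line) -> float:
    """Hypothetical placeholder similarity: 1 for identical lines, else 0."""
    return 1.0 if q == r else 0.0


database = [["a", "b"], ["a", "c"], ["d"]]
score = rank(["a"], database[0], database, exact_sim)
```

This sketch recomputes $$n_{tr1}^*(q)$$ for every call, which costs one pass over the whole database per query line; caching the counts would be a natural refinement for larger databases.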

Alternative form of the IDF weight:

$$IDF_{tr1}^{*}(q) = \begin{cases} 0 \text{ if } n_{tr1}^*(q) = N, \\ \log(\frac{N}{n_{tr1}^*(q)}) \text{ otherwise} \end{cases}$$
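
The case-based definition of $$IDF^*$$ above can be sketched as a small Python function that takes the counts $$n_{tr1}^*(q)$$ and $$N$$ directly. The formula does not say what happens when no document matches ($$n_{tr1}^*(q) = 0$$), so this sketch assumes that case is an error and raises `ValueError`; the name `idf_cases` is illustrative.

```python
import math


def idf_cases(n: int, N: int) -> float:
    """Case-based IDF: 0 when every document matches, log(N / n) otherwise."""
    if n <= 0 or n > N:
        # Assumption: the formula is undefined outside 1 <= n <= N.
        raise ValueError("n must satisfy 1 <= n <= N")
    if n == N:
        return 0.0
    return math.log(N / n)
```

Unlike the logarithmic form with the 0.5 offsets, this weight is never negative: it is 0 for a line matched in every document and grows as the line becomes rarer.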