Bayesian knowledge tracing

Bayesian knowledge tracing is an algorithm used in many intelligent tutoring systems to model each learner's mastery of the knowledge being tutored.

It models student knowledge as a latent variable in a hidden Markov model, updating the estimate from the observed correctness of each interaction in which the student applies the skill in question.

BKT assumes that student knowledge is represented as a set of binary variables, one per skill, where the skill is either mastered by the student or not. Observations in BKT are also binary: a student gets a problem/step either right or wrong. Intelligent tutoring systems often use BKT for mastery learning and problem sequencing. In its most common implementation, BKT has only skill-specific parameters.
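As an illustration of mastery learning with binary per-skill estimates, a tutoring system might keep one mastery probability per skill and continue presenting practice for a skill until its estimate crosses a cutoff. The sketch below is illustrative only; the 0.95 threshold is a common convention in the literature, not something this article prescribes, and the skill names and probabilities are made up:

```python
# Illustrative sketch: per-skill mastery probabilities with a mastery cutoff.
# The 0.95 threshold and the example skills/values are assumptions.

MASTERY_THRESHOLD = 0.95

def skills_needing_practice(mastery):
    """Return the skills whose estimated mastery is still below the cutoff."""
    return [skill for skill, p in mastery.items() if p < MASTERY_THRESHOLD]

# One latent mastery estimate per skill, as in BKT's binary-skill assumption.
mastery = {"fraction-addition": 0.97, "fraction-division": 0.40}
print(skills_needing_practice(mastery))  # only fraction-division still needs practice
```

In a running tutor, each estimate would be updated after every observed response using the equations in the Method section below.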

Method
BKT uses four model parameters per skill:
 * $$p(L_0)$$ or $$p\text{-init}$$, the probability that the student knows the skill beforehand
 * $$p(T)$$ or $$p\text{-transit}$$, the probability that the student learns the skill at each opportunity to apply it
 * $$p(S)$$ or $$p\text{-slip}$$, the probability that the student makes a mistake when applying a known skill
 * $$p(G)$$ or $$p\text{-guess}$$, the probability that the student correctly applies an unknown skill (a lucky guess)

Assuming these parameters have been estimated for every skill, the model proceeds as follows. The initial probability that student $$u$$ has mastered skill $$k$$ is set to the skill's p-init parameter (equation a). After student $$u$$ applies skill $$k$$, the conditional probability of mastery given the observation is computed with equation (b) for a correct application or with equation (c) for an incorrect one. This conditional probability is then used to update the estimate of skill mastery with equation (d). Finally, the probability that the student will apply the skill correctly on a future practice opportunity is given by equation (e).

Equation (a):


 * $$p(L_1)^k_u=p(L_0)^k$$

Equation (b):


 * $$p(L_t \mid \text{obs}=\text{correct})^k_u=\frac{p(L_t)^k_u\cdot(1-p(S)^k)}{p(L_t)^k_u\cdot(1-p(S)^k)+(1-p(L_t)^k_u)\cdot p(G)^k}$$

Equation (c):


 * $$p(L_t\mid \text{obs}= \text{wrong})^k_u=\frac{p(L_t)^k_u\cdot p(S)^k}{p(L_t)^k_u\cdot p(S)^k+(1-p(L_t)^k_u) \cdot (1-p(G)^k)}$$

Equation (d):


 * $$p(L_{t+1})^k_u=p(L_t\mid \text{obs})^k_u+(1-p(L_t\mid \text{obs})^k_u)\cdot p(T)^k$$

Equation (e):


 * $$p(C_{t+1})^k_u=p(L_{t+1})^k_u\cdot(1-p(S)^k)+(1-p(L_{t+1})^k_u)\cdot p(G)^k$$
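The update cycle defined by equations (a) through (e) can be sketched for a single skill as follows. The parameter values in the example are illustrative assumptions, not fitted estimates:

```python
# Sketch of the BKT update cycle for one skill (equations a-e).
# The parameter values used below are illustrative assumptions.

def bkt_update(p_L, correct, p_T, p_S, p_G):
    """Update the mastery estimate p(L_t) after one observation.

    Applies equation (b) or (c) to get the conditional probability of
    mastery given the observation, then equation (d) to account for
    learning during the opportunity.
    """
    if correct:
        # Equation (b): posterior mastery given a correct application
        cond = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
    else:
        # Equation (c): posterior mastery given an incorrect application
        cond = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
    # Equation (d): chance the skill was learned at this opportunity
    return cond + (1 - cond) * p_T

def p_correct(p_L, p_S, p_G):
    """Equation (e): probability of a correct application at the next practice."""
    return p_L * (1 - p_S) + (1 - p_L) * p_G

# Equation (a): initialize the estimate to the skill's p-init parameter.
p_L = 0.3
for observed_correct in [True, True, False, True]:
    p_L = bkt_update(p_L, observed_correct, p_T=0.2, p_S=0.1, p_G=0.25)
```

Note that a correct answer raises the mastery estimate and an incorrect one lowers it (before the p-transit term is applied), while equation (d) guarantees the estimate never decreases below the learning-rate floor at each opportunity.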