Learning analytics

Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. The growth of online learning since the 1990s, particularly in higher education, has contributed to the advancement of Learning Analytics as student data can be captured and made available for analysis. When learners use an LMS, social media, or similar online tools, their clicks, navigation patterns, time on task, social networks, information flow, and concept development through discussions can be tracked. The rapid development of massive open online courses (MOOCs) offers additional data for researchers to evaluate teaching and learning in online environments.
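As an illustration of how one of the tracked signals above, time on task, might be derived from raw click data, the following is a minimal sketch. The event log, the learner names, and the 30-minute idle cutoff are assumptions made for the example and do not come from any particular LMS.

```python
from collections import defaultdict

# Hypothetical click log: (learner_id, timestamp_in_seconds). Not real data.
events = [
    ("ana", 0), ("ana", 40), ("ana", 100), ("ana", 4000), ("ana", 4030),
    ("ben", 10), ("ben", 25),
]

def time_on_task(events, idle_cutoff=1800):
    """Sum the gaps between consecutive clicks per learner.

    Gaps longer than idle_cutoff seconds (an assumed threshold) are treated
    as breaks and are not counted as time on task.
    """
    by_learner = defaultdict(list)
    for learner, timestamp in events:
        by_learner[learner].append(timestamp)

    totals = {}
    for learner, stamps in by_learner.items():
        stamps.sort()
        totals[learner] = sum(
            later - earlier
            for earlier, later in zip(stamps, stamps[1:])
            if later - earlier <= idle_cutoff
        )
    return totals

totals = time_on_task(events)
```

The choice of cutoff strongly affects the result, which is one reason time-on-task estimates from click logs are usually treated as approximations.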

Definition
Although a majority of Learning Analytics literature has started to adopt the aforementioned definition, the definition and aims of Learning Analytics are still contested.

Learning Analytics as a prediction model
One earlier definition discussed by the community suggested that Learning Analytics is the use of intelligent data, learner-produced data, and analysis models to discover information and social connections for predicting and advising people's learning. But this definition has been criticised by George Siemens and Mike Sharkey.

Learning Analytics as a generic design framework
Dr. Wolfgang Greller and Dr. Hendrik Drachsler defined learning analytics holistically as a framework. They proposed that it is a generic design framework that can act as a useful guide for setting up analytics services in support of educational practice and learner guidance, in quality assurance, curriculum development, and in improving teacher effectiveness and efficiency. It uses a general morphological analysis (GMA) to divide the domain into six "critical dimensions".

Learning Analytics as data-driven decision making
The broader term "Analytics" has been defined as the science of examining data to draw conclusions and, when used in decision-making, to present paths or courses of action. From this perspective, Learning Analytics has been defined as a particular case of Analytics, in which decision-making aims to improve learning and education. During the 2010s, this definition of analytics went further to incorporate elements of operations research such as decision trees and strategy maps to establish predictive models and to determine probabilities for certain courses of action.

Learning Analytics as an application of analytics
Another approach for defining Learning Analytics is based on the concept of Analytics interpreted as the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data. From this point of view, Learning Analytics emerges as a type of Analytics (as a process), in which the data, the problem definition and the insights are learning-related.

In 2016, research jointly conducted by the New Media Consortium (NMC) and the EDUCAUSE Learning Initiative (ELI), an EDUCAUSE program, described six areas of emerging technology expected to have a significant impact on higher education and creative expression by the end of 2020. As a result of this research, learning analytics was defined as an educational application of web analytics aimed at learner profiling, a process of gathering and analyzing details of individual student interactions in online learning activities.



Learning analytics as an application of data science
In 2017, Gašević, Kovanović, and Joksimović proposed a consolidated model of learning analytics. The model posits that learning analytics is defined at the intersection of three disciplines: data science, theory, and design. Data science offers computational methods and techniques for data collection, pre-processing, analysis, and presentation. Theory is typically drawn from the literature in the learning sciences, education, psychology, sociology, and philosophy. The design dimension of the model includes: learning design, interaction design, and study design. In 2015, Gašević, Dawson, and Siemens argued that computational aspects of learning analytics need to be linked with the existing educational research in order for Learning Analytics to deliver its promise to understand and optimize learning.

Learning analytics versus educational data mining
Differentiating the fields of educational data mining (EDM) and learning analytics (LA) has been a concern of several researchers. George Siemens takes the position that educational data mining encompasses both learning analytics and academic analytics, the latter of which is aimed at governments, funding agencies, and administrators instead of learners and faculty. Baepler and Murdoch define academic analytics as an area that "...combines select institutional data, statistical analysis, and predictive modeling to create intelligence upon which learners, instructors, or administrators can change academic behavior". They go on to attempt to disambiguate educational data mining from academic analytics based on whether the process is hypothesis driven or not, though Brooks questions whether this distinction exists in the literature. Brooks instead proposes that a better distinction between the EDM and LA communities lies in where each community originated, with authorship in the EDM community being dominated by researchers coming from intelligent tutoring paradigms, and learning analytics researchers being more focused on enterprise learning systems (e.g. learning content management systems).

Regardless of the differences between the LA and EDM communities, the two areas overlap significantly, both in the objectives of investigators and in the methods and techniques that are used in the investigation. In the MS program offering in learning analytics at Teachers College, Columbia University, students are taught both EDM and LA methods.

Historical contributions
Learning Analytics, as a field, has multiple disciplinary roots. While the fields of artificial intelligence (AI), statistical analysis, machine learning, and business intelligence offer an additional narrative, the main historical roots of analytics are the ones directly related to human interaction and the education system. More specifically, the history of Learning Analytics is tightly linked to the development of four social science fields that have converged over time. These fields pursued, and still pursue, four goals:


 * 1) Definition of Learner, in order to cover the need of defining and understanding a learner.
 * 2) Knowledge trace, addressing how to trace or map the knowledge that occurs during the learning process.
 * 3) Learning efficiency and personalization, which refers to how to make learning more efficient and personal by means of technology.
 * 4) Learner – content comparison, in order to improve learning by comparing the learner's level of knowledge with the actual content that needs to be mastered.

A diversity of disciplines and research activities have influenced these four aspects over the last decades, contributing to the gradual development of learning analytics. Some of the most influential disciplines are Social Network Analysis, User Modelling, Cognitive Modelling, Data Mining and E-Learning. The history of Learning Analytics can be understood through the rise and development of these fields.

Social Network Analysis
Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them. Social network analysis is prominent in Sociology, and its development has had a key role in the emergence of Learning Analytics.

One of the first attempts to provide a deeper understanding of interactions was made by Austrian-American sociologist Paul Lazarsfeld. In 1944, Lazarsfeld framed the question of "who talks to whom about what and to what effect". That statement still describes the area of interest of social network analysis today, which tries to understand how people are connected and what insights can be derived as a result of their interactions, a core idea of Learning Analytics.

Citation analysis

American linguist Eugene Garfield was an early pioneer in analytics in science. In 1955, Garfield led the first attempt to analyse the structure of science, examining how developments in science can be better understood by tracking the associations (citations) between articles (how they reference one another, the importance of the resources that they include, citation frequency, etc.). Through tracking citations, scientists can observe how research is disseminated and validated. This was the basic idea of what eventually became "page rank", which in the early days of Google (beginning of the 21st century) was one of the key ways of understanding the structure of a field by looking at page connections and the importance of those connections. The PageRank algorithm, the first search algorithm used by Google, was based on this principle. American computer scientist Larry Page, Google's co-founder, defined PageRank as "an approximation of the importance" of a particular resource. Educationally, citation or link analysis is important for mapping knowledge domains.
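The principle of ranking a resource by the links or citations pointing to it can be illustrated with a simplified power-iteration sketch. This is a textbook-style approximation, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the tiny citation graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate importance scores by repeated redistribution of rank.

    links maps each node to the list of nodes it cites (links to).
    Nodes with no outgoing links spread their rank evenly over all nodes.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical citation graph: articles A and B both cite C, and C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
ranks = pagerank(citations)
```

In this toy graph, article C ends up with a higher score than A or B because it is cited by both, which mirrors the idea of treating citations as a signal of importance.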

The essential idea behind these attempts is the realization that, as data increases, individuals, researchers or business analysts need to understand how to track the underlying patterns behind the data and how to gain insight from them. This is also a core idea in Learning Analytics.

Digitalization of Social network analysis

During the early 1970s, pushed by the rapid evolution in technology, Social network analysis transitioned into analysis of networks in digital settings.


 * 1) Milgram's 6 degrees experiment. In 1967, American social psychologist Stanley Milgram and other researchers examined the average path length for social networks of people in the United States, suggesting that human society is a small-world-type network characterized by short path-lengths.
 * 2) Weak ties. American Sociologist Mark Granovetter's work on the strength of what is known as weak ties; his 1973 article "The Strength of Weak Ties" is one of the most influential and most cited articles in Social Sciences.
 * 3) Networked individualism. Towards the end of the 20th century, sociologist Barry Wellman's research contributed extensively to the theory of social network analysis. In particular, Wellman observed and described the rise of "networked individualism" – the transformation from group-based networks to individualized networks.
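The notion of average path length used in small-world studies can be made concrete with a short sketch based on breadth-first search. The two small graphs are invented for illustration and are not data from the studies mentioned above.

```python
from collections import deque

def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of connected nodes.

    graph maps each node to the set of its neighbours (undirected ties).
    """
    total, pairs = 0, 0
    for source in graph:
        # Breadth-first search gives shortest distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0

# Hypothetical networks: a chain a-b-c-d, and the same chain with one extra tie a-d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}

chain_length = average_path_length(chain)
ring_length = average_path_length(ring)
```

Adding a single tie between the two ends of the chain shortens the average path, which is the kind of effect that a few long-range connections have in small-world networks.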

During the first decade of the 21st century, Professor Caroline Haythornthwaite explored the impact of media type on the development of social ties, observing that human interactions can be analyzed to gain novel insight not from strong interactions (i.e. people that are strongly related to the subject) but, rather, from weak ties. This provides Learning Analytics with a central idea: apparently unrelated data may hide crucial information. As an example of this phenomenon, an individual looking for a job will have a better chance of finding new information through weak connections rather than strong ones.

Her research also focused on the way that different types of media can impact the formation of networks. Her work contributed greatly to the development of social network analysis as a field. Learning Analytics inherited important ideas from it, such as the use of a range of metrics and approaches to define the importance of a particular node, the value of information exchange, the way that clusters are connected to one another, and the structural gaps that might exist within those networks.

The application of social network analysis in digital learning settings has been pioneered by Professor Shane P. Dawson. He has developed a number of software tools, such as Social Networks Adapting Pedagogical Practice (SNAPP), for evaluating the networks that form in learning management systems when students engage in forum discussions.
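A minimal sketch of how a forum reply network could be turned into a simple node-importance metric is given below. It computes normalized degree centrality only; it is not a description of how SNAPP or any other tool works, and the reply pairs are invented.

```python
from collections import Counter

# Hypothetical forum replies as (author, replied_to) pairs.
replies = [
    ("ana", "ben"), ("cy", "ben"), ("ben", "ana"), ("dee", "ben"), ("cy", "ana"),
]

def degree_centrality(replies):
    """Fraction of the other participants each learner is directly tied to.

    Replies are treated as undirected ties and repeated exchanges between
    the same two learners are counted once.
    """
    nodes = {name for pair in replies for name in pair}
    ties = {frozenset(pair) for pair in replies if pair[0] != pair[1]}

    degree = Counter()
    for tie in ties:
        for name in tie:
            degree[name] += 1

    denominator = max(len(nodes) - 1, 1)
    return {name: degree[name] / denominator for name in nodes}

centrality = degree_centrality(replies)
```

Here the learner who exchanged posts with every other participant receives the maximum score, while a learner with a single tie receives the lowest.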

User modelling
The main goal of user modelling is the customization and adaptation of systems to the user's specific needs, especially in their interaction with computing systems. The importance of computers being able to respond individually to people was starting to be understood in the 1970s. Dr Elaine Rich in 1979 predicted that "computers are going to treat their users as individuals with distinct personalities, goals, and so forth". This is a central idea not only educationally but also in general web use activity, in which personalization is an important goal.

User modelling has become important in research in human-computer interactions as it helps researchers to design better systems by understanding how users interact with software. Recognizing unique traits, goals, and motivations of individuals remains an important activity in learning analytics.

Personalization and adaptation of learning content is an important present and future direction of learning sciences, and its history within education has contributed to the development of learning analytics. Hypermedia is a nonlinear medium of information that includes graphics, audio, video, plain text and hyperlinks. The term was first used in a 1965 article written by American Sociologist Ted Nelson. Adaptive hypermedia builds on user modelling by increasing personalization of content and interaction. In particular, adaptive hypermedia systems build a model of the goals, preferences and knowledge of each user, in order to adapt to the needs of that user. From the end of the 20th century onwards, the field grew rapidly, mainly because the internet boosted research into adaptivity and, secondly, because of the accumulation and consolidation of research experience in the field. In turn, Learning Analytics has been influenced by this strong development.

Education/cognitive modelling
Education/cognitive modelling has been applied to tracing how learners develop knowledge. Since the late 1980s and early 1990s, computers have been used in education as learning tools. In 1989, Hugh Burns argued for the adoption and development of intelligent tutor systems that ultimately would pass three levels of "intelligence": domain knowledge, learner knowledge evaluation, and pedagogical intervention. During the 21st century, these three levels have remained relevant for researchers and educators.

In the 1990s, academic activity around cognitive models focused on developing systems with a computational model capable of solving the problems given to students in the ways students are expected to solve them. Cognitive modelling has contributed to the rise in popularity of intelligent or cognitive tutors. Once cognitive processes can be modelled, software (tutors) can be developed to support learners in the learning process. The research base in this field eventually became highly relevant for learning analytics during the 21st century.

Epistemic Frame Theory
While big data analytics has been more and more widely applied in education, Wise and Shaffer addressed the importance of a theory-based approach to the analysis. Epistemic Frame Theory conceptualizes the "ways of thinking, acting, and being in the world" in a collaborative learning environment. Specifically, the framework is based on the context of a Community of Practice (CoP), which is a group of learners with common goals, standards, and prior knowledge and skills, working to solve a complex problem. Due to the nature of a CoP, it is important to study the connections between elements (learners, knowledge, concepts, skills and so on). To identify the connections, the co-occurrences of elements in learners' data are identified and analyzed.

Shaffer and Ruis pointed out the concept of closing the interpretive loop, emphasizing the transparency and validation of the model, the interpretation and the original data. The loop can be closed by a theoretically sound analytics approach such as Epistemic Network Analysis.

Other contributions
In a discussion of the history of analytics, Adam Cooper highlights a number of communities from which learning analytics has drawn techniques, mainly during the first decades of the 21st century, including:


 * 1) Statistics, which are a well established means to address hypothesis testing.
 * 2) Business intelligence, which has similarities with learning analytics, although it has historically been targeted at making the production of reports more efficient through enabling data access and summarising performance indicators.
 * 3) Web analytics, in which tools such as Google Analytics report on web page visits and references to websites, brands and other key terms across the internet. The more fine-grained of these techniques can be adopted in learning analytics for the exploration of student trajectories through learning resources (courses, materials, etc.).
 * 4) Operational research, which aims at highlighting design optimisation for maximising objectives through the use of mathematical models and statistical methods. Such techniques are implicated in learning analytics which seek to create models of real world behaviour for practical application.
 * 5) Artificial intelligence methods (combined with machine learning techniques built on data mining) are capable of detecting patterns in data. In learning analytics such techniques can be used for intelligent tutoring systems, classification of students in more dynamic ways than simple demographic factors, and resources such as "suggested course" systems modelled on collaborative filtering techniques.
 * 6) Information visualization, which is an important step in many analytics for sensemaking around the data provided, and is used across most techniques (including those above).
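The "suggested course" idea mentioned in the list above can be sketched as a very small user-based collaborative filter. The enrolment data and the use of Jaccard similarity are assumptions for illustration; real recommender systems are considerably more elaborate.

```python
from collections import Counter

# Hypothetical enrolment records: learner -> set of courses taken.
enrolments = {
    "ana": {"stats", "python", "ml"},
    "ben": {"stats", "python"},
    "cy": {"stats", "ml"},
    "dee": {"history", "python"},
}

def suggest_courses(learner, enrolments, top_n=2):
    """Suggest courses taken by learners with similar enrolment histories.

    Similarity is the Jaccard overlap between two learners' course sets;
    each untaken course is scored by the summed similarity of its takers.
    """
    taken = enrolments[learner]
    scores = Counter()
    for other, courses in enrolments.items():
        if other == learner:
            continue
        overlap = len(taken & courses)
        if overlap == 0:
            continue
        similarity = overlap / len(taken | courses)
        for course in courses - taken:
            scores[course] += similarity
    return [course for course, _ in scores.most_common(top_n)]

suggestions = suggest_courses("ben", enrolments)
```

For the learner "ben", the course taken by the two most similar learners is ranked first, ahead of a course taken by only one weakly similar learner.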

Learning analytics programs
The first graduate program focused specifically on learning analytics was created by Ryan S. Baker and launched in the Fall 2015 semester at Teachers College, Columbia University. The program description states that "(...) data about learning and learners are being generated today on an unprecedented scale. The fields of learning analytics (LA) and educational data mining (EDM) have emerged with the aim of transforming this data into new insights that can benefit students, teachers, and administrators. As one of world's leading teaching and research institutions in education, psychology, and health, we are proud to offer an innovative graduate curriculum dedicated to improving education through technology and data analysis."

Master's programs are now offered at several other universities as well, including the University of Texas at Arlington, the University of Wisconsin, and the University of Pennsylvania.

Analytic methods
Methods for learning analytics include:
 * Content analysis, particularly of resources which students create (such as essays).
 * Discourse analytics, which aims to capture meaningful data on student interactions and (unlike social network analytics) explores the properties of the language used, as opposed to just the network of interactions, forum-post counts, etc.
 * Social learning analytics, which is aimed at exploring the role of social interaction in learning, the importance of learning networks, discourse used to sensemake, etc.
 * Disposition analytics, which seeks to capture data regarding students' dispositions to their own learning, and the relationship of these to their learning. For example, "curious" learners may be more inclined to ask questions, and this data can be captured and analysed for learning analytics.
 * Epistemic Network Analysis, which is an analytics technique that models the co-occurrence of different concepts and elements in the learning process. For example, online discourse data can be segmented into turns of talk. By coding students' different collaborative learning behaviors, ENA can be applied to identify and quantify the co-occurrence of different behaviors for any individual in the group.
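The co-occurrence counting that underlies approaches such as Epistemic Network Analysis can be sketched as follows. This covers only a simplified counting step over a moving window of turns; it omits the normalization, dimensional reduction and visualization that a full ENA model involves, and the codes and window size are assumptions for the example.

```python
from collections import Counter

# Hypothetical coded discourse: each turn of talk is the set of codes assigned to it.
turns = [
    {"question"},
    {"explanation"},
    {"question", "agreement"},
    {"agreement"},
]

def cooccurrence_counts(turns, window=2):
    """Count connections between codes in each turn and codes in its recent context.

    For every turn, the context is that turn plus the preceding window - 1 turns.
    Each distinct pair linking the current turn to its context is counted once per turn.
    """
    counts = Counter()
    for index, current in enumerate(turns):
        start = max(0, index - window + 1)
        context = set().union(*turns[start:index + 1])
        pairs = {
            tuple(sorted((code, other)))
            for code in current
            for other in context
            if code != other
        }
        counts.update(pairs)
    return counts

counts = cooccurrence_counts(turns)
```

A larger window connects codes that are further apart in the conversation, so the window size is a modelling decision rather than a fixed property of the data.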

Applications
Learning analytics can be and has been applied in a number of contexts.

General purposes
Analytics have been used for:
 * Prediction purposes, for example to identify "at risk" students in terms of drop out or course failure.
 * Personalization & adaptation, to provide students with tailored learning pathways, or assessment materials.
 * Intervention purposes, providing educators with information to intervene to support students.
 * Information visualization, typically in the form of so-called learning dashboards which provide overview learning data through data visualisation tools.
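The prediction use case listed above can be illustrated with a deliberately small sketch: a one-feature logistic regression fitted by gradient descent. The data (weekly logins and a dropout flag), the learning rate, and the number of epochs are all invented for the example; real early-warning models use many more features and careful validation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Hypothetical data: weekly logins per student and whether the student dropped out (1) or not (0).
logins = [0, 1, 2, 3, 6, 7, 8, 9]
dropped = [1, 1, 1, 0, 0, 0, 0, 0]

w, b = train_logistic(logins, dropped)
risk_low_activity = sigmoid(w * 0 + b)
risk_high_activity = sigmoid(w * 9 + b)
```

With this toy data the fitted weight is negative, so the estimated dropout risk falls as login activity rises. Such a score would typically be used to prompt human follow-up rather than to make decisions automatically.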

Benefits for stakeholders
There is a broad awareness of analytics across educational institutions for various stakeholders, but the way learning analytics is defined and implemented may vary, including:


 * 1) for individual learners to reflect on their achievements and patterns of behaviour in relation to others. Particularly, the following areas can be set out for measuring, monitoring, analyzing and changing to optimize student performance:
    * a) Monitoring individual student performance
    * b) Disaggregating student performance by selected characteristics such as major, year of study, ethnicity, etc.
    * c) Identifying outliers for early intervention
    * d) Predicting potential so that all students achieve optimally
    * e) Preventing attrition from a course or program
    * f) Identifying and developing effective instructional techniques
    * g) Analyzing standard assessment techniques and instruments (i.e. departmental and licensing exams)
    * h) Testing and evaluation of curricula.
 * 2) as predictors of students requiring extra support and attention;
 * 3) to help teachers and support staff plan supporting interventions with individuals and groups;
 * 4) for functional groups such as course teams seeking to improve current courses or develop new curriculum offerings; and
 * 5) for institutional administrators taking decisions on matters such as marketing and recruitment or efficiency and effectiveness measures.
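Two of the areas listed above, disaggregating performance by a selected characteristic and identifying outliers for early intervention, can be sketched with the standard library alone. The records and the z-score threshold of 1.5 are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (student, major, score).
records = [
    ("ana", "math", 80),
    ("ben", "math", 70),
    ("cy", "bio", 75),
    ("dee", "bio", 85),
    ("eli", "bio", 30),
]

def mean_by_group(records):
    """Average score per major (disaggregation by one characteristic)."""
    groups = defaultdict(list)
    for _, major, score in records:
        groups[major].append(score)
    return {major: mean(scores) for major, scores in groups.items()}

def flag_low_outliers(records, threshold=1.5):
    """Students whose score lies more than threshold standard deviations below the mean."""
    scores = [score for _, _, score in records]
    average, spread = mean(scores), pstdev(scores)
    if spread == 0:
        return []
    return [
        student
        for student, _, score in records
        if (score - average) / spread < -threshold
    ]

group_means = mean_by_group(records)
flagged = flag_low_outliers(records)
```

A flag of this kind only indicates that a score is unusual relative to the cohort; deciding whether and how to intervene remains a judgement for teachers and support staff.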

Some motivations and implementations of analytics may come into conflict with others; for example, analytics intended for individual learners may conflict with analytics intended for organisational stakeholders.

Software
Much of the software that is currently used for learning analytics duplicates functionality of web analytics software, but applies it to learner interactions with content. Social network analysis tools are commonly used to map social connections and discussions. Some examples of learning analytics software tools include:
 * BEESTAR INSIGHT: a real-time system that automatically collects student engagement and attendance, and provides analytics tools and dashboards for students, teachers and management
 * LOCO-Analyst: a context-aware learning tool for analytics of learning processes taking place in a web-based learning environment
 * SAM: a Student Activity Monitor intended for personal learning environments
 * SNAPP: a learning analytics tool that visualizes the network of interactions resulting from discussion forum posts and replies
 * Solutionpath StREAM: a UK-based real-time system that leverages predictive models to determine all facets of student engagement using structured and unstructured sources for all institutional roles
 * Student Success System: a predictive learning analytics tool that predicts student performance and plots learners into risk quadrants based upon engagement and performance predictions, and provides indicators to develop understanding as to why a learner is not on track through visualizations such as the network of interactions resulting from social engagement (e.g. discussion posts and replies), performance on assessments, engagement with content, and other indicators
 * Epistemic Network Analysis (ENA) web tool: an interactive online tool that allows researchers to upload a coded dataset and create a model by specifying units, conversations and codes. Useful functions within the online tool include mean rotation for comparison between two groups, specifying the sliding window size for connection accumulation, weighted or unweighted models, and parametric and non-parametric statistical tests with suggested write-ups. The web tool is stable and open source.

Ethics and privacy
The ethics of data collection, analytics, reporting and accountability have been raised as a potential concern for learning analytics, particularly regarding:
 * Data ownership
 * Communications around the scope and role of learning analytics
 * The necessary role of human feedback and error-correction in learning analytics systems
 * Data sharing between systems, organisations, and stakeholders
 * Trust in data clients

As Kay, Korn and Oppenheim point out, the range of data is wide, potentially derived from:
 * Recorded activity: student records, attendance, assignments, researcher information (CRIS)
 * Systems interactions: VLE, library / repository search, card transactions
 * Feedback mechanisms: surveys, customer care
 * External systems that offer reliable identification such as sector and shared services and social networks

Thus the legal and ethical situation is challenging and different from country to country, raising implications for:
 * Variety of data: principles for collection, retention and exploitation
 * Education mission: underlying issues of learning management, including social and performance engineering
 * Motivation for development of analytics: mutuality, a combination of corporate, individual and general good
 * Customer expectation: effective business practice, social data expectations, cultural considerations of a global customer base.
 * Obligation to act: duty of care arising from knowledge and the consequent challenges of student and employee performance management

In some prominent cases, such as the inBloom disaster, even fully functional systems have been shut down due to a lack of trust in the data collection on the part of governments, stakeholders and civil rights groups. Since then, the learning analytics community has extensively studied the legal conditions for the trusted use of learning analytics in a series of expert workshops on "Ethics & Privacy 4 Learning Analytics". Drachsler & Greller released an 8-point checklist named DELICATE, based on the intensive studies in this area, to demystify the ethics and privacy discussions around learning analytics.
 * 1) D-etermination: Decide on the purpose of learning analytics for your institution.
 * 2) E-xplain: Define the scope of data collection and usage.
 * 3) L-egitimate: Explain how you operate within the legal frameworks, refer to the essential legislation.
 * 4) I-nvolve: Talk to stakeholders and give assurances about the data distribution and use.
 * 5) C-onsent: Seek consent through clear consent questions.
 * 6) A-nonymise: De-identify individuals as much as possible
 * 7) T-echnical aspects: Monitor who has access to data, especially in areas with high staff turn-over.
 * 8) E-xternal partners: Make sure externals provide highest data security standards

The checklist shows ways to design and provide privacy-conformant learning analytics that can benefit all stakeholders. The full DELICATE checklist is publicly available.
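As one possible illustration of the "A-nonymise" point, student identifiers can be replaced with keyed pseudonyms before analysis. The sketch below uses HMAC-SHA256 from the Python standard library; the key, the identifier format, and the 16-character truncation are assumptions for the example. This is pseudonymisation rather than full anonymisation, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

def pseudonymise(student_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a student identifier.

    The same identifier and key always give the same pseudonym, so records
    can still be linked across datasets without exposing the original ID.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Hypothetical key and identifiers; a real key would be stored securely, not in code.
key = b"example-key-not-for-production"
alias_one = pseudonymise("s1234567", key)
alias_two = pseudonymise("s7654321", key)
```

Other attributes in a dataset, such as a rare combination of major and year of study, may still allow re-identification, so replacing identifiers is only one part of de-identification.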

Studies of students' privacy management practices have shown discrepancies between their privacy beliefs and their privacy-related actions. Learning analytics systems can have default settings that allow data collection from students unless they choose to opt out. Some online education systems such as edX or Coursera do not offer a choice to opt out of data collection. In order for certain learning analytics to function properly, these systems utilize cookies to collect data.

Open learning analytics
In 2012, a systematic overview on learning analytics and its key concepts was provided by Professor Mohamed Chatti and colleagues through a reference model based on four dimensions, namely:
 * data, environments, context (what?),
 * stakeholders (who?),
 * objectives (why?), and
 * methods (how?).

Chatti, Muslim and Schroeder note that the aim of open learning analytics (OLA) is to improve learning effectiveness in lifelong learning environments. The authors refer to OLA as an ongoing analytics process that encompasses diversity at all four dimensions of the learning analytics reference model.