Chapter 5: Predictive Modelling in Teaching and Learning


Christopher Brooks¹, Craig Thompson²

¹ School of Information, University of Michigan, USA
² Department of Computer Science, University of Saskatchewan, Canada

DOI: 10.18608/hla17.005

ABSTRACT

This article describes the process, practice, and challenges of using predictive modelling in teaching and learning. In both the fields of educational data mining (EDM) and learning analytics (LA) predictive modelling has become a core practice of researchers, largely with a focus on predicting student success as operationalized by academic achievement. In this chapter, we provide a general overview of considerations when using predictive modelling, the steps that an educational data scientist must consider when engaging in the process, and a brief overview of the most popular techniques in the field.

Keywords: Predictive modeling, machine learning, educational data mining (EDM), feature selection, model evaluation

Predictive analytics are a group of techniques used to make inferences about uncertain future events. In the educational domain, one may be interested in predicting a measurement of learning (e.g., student academic success or skill acquisition), teaching (e.g., the impact of a given instructional style or specific instructor on an individual), or other proxy metrics of value for administrations (e.g., predictions of retention or course registration). Predictive analytics in education is a well-established area of research, and several commercial products now incorporate predictive analytics in the learning content management system (e.g., D2L,¹ Starfish Retention Solutions,² Ellucian,³ and Blackboard⁴). Furthermore, specialized companies (e.g., Blue Canary,⁵ Civitas Learning) now provide predictive analytics consulting and products for higher education.

In this chapter, we introduce the terms and workflow related to predictive modelling, with a particular emphasis on how these techniques are being applied in teaching and learning. While a full review of the literature is beyond the scope of this chapter, we encourage readers to consider the conference proceedings and journals associated with the Society for Learning Analytics Research (SoLAR) and the International Educational Data Mining Society (IEDMS) for more examples of applied educational predictive modelling.

First, it is important to distinguish predictive modelling from explanatory modelling.⁷ In explanatory modelling, the goal is to use all available evidence to provide an explanation for a given outcome. For instance, observations of age, gender, and socioeconomic status of a learner population might be used in a regression model to explain how they contribute to a given student achievement result. The intent of these explanations is generally to be causal (versus correlative alone), though results presented using these approaches often eschew experimental studies and rely on theoretical interpretation to imply causation (as described well by Shmueli, 2010). In predictive modelling, the purpose is to create a model that will predict the values (or class if the prediction does not deal with numeric data) of new data based on observations. Unlike explanatory modelling, predictive modelling is based on the assumption that a set of known data (referred to as training instances in data mining literature) can be used to predict the value or class of new data based on observed variables (referred to as features in predictive modelling literature). Thus the principal difference between explanatory modelling and predictive modelling lies in the application of the model to future events: explanatory modelling does not aim to make any claims about the future, while predictive modelling does.

More casually, explanatory modelling and predictive modelling often have a number of pragmatic differences when applied to educational data. Explanatory modelling is a post-hoc and reflective activity aimed at generating an understanding of a phenomenon. Predictive modelling is an in situ activity intended to make systems responsive to changes in the underlying data. It is possible to apply both forms of modelling to technology in higher education. For instance, Lonn and Teasley (2014) describe a student-success system built on explanatory models, while Brooks, Thompson, and Teasley (2015) describe an approach based upon predictive modelling. While both methods intend to inform the design of intervention systems, the former does so by building software based on theory developed during the review of explanatory models by experts, while the latter does so using data collected from historical log files (in this case, clickstream data).

The largest methodological difference between the two modelling approaches is in how they address the issue of generalizability. In explanatory modelling, all of the data collected from a sample (e.g., students enrolled in a given course) is used to describe a population more generally (e.g., all students who could or might enroll in a given course). The issues related to generalizability are largely based on sampling techniques: ensuring the sample represents the general population by reducing selection bias, often through random or stratified sampling, and determining the amount of power needed to ensure an appropriate sample, through an analysis of population size and levels of error the investigator is willing to accept. In a predictive model, a hold out dataset is used to evaluate the suitability of a model for prediction, and to protect against the overfitting of models to data being used for training. There are several different strategies for producing hold out datasets, including k-fold cross validation, leave-one-out cross validation, randomized subsampling, and application-specific strategies.

With these comparisons made, the remainder of this chapter will focus on how predictive modelling is being used in the domain of teaching and learning, and provide an overview of how researchers engage in the predictive modelling process.

² http://www.starfishsolutions.com

⁷ Shmueli (2010) notes a third form of modelling, descriptive modelling, which is similar to explanatory modelling but in which there are no claims of causation. In the higher education literature, we would suggest that causation is often implied, and the majority of descriptive analyses are actually intended to be used as causal evidence to influence decision making.



PREDICTIVE MODELLING WORKFLOW

Problem Identification

In the domain of teaching and learning, predictive modelling tends to sit within a larger action-oriented educational policy and technology context, where institutions use these models to react to student needs in real-time. The intent of the predictive modelling activity is to set up a scenario that would accurately describe the outcomes of a given student assuming no new intervention. For instance, one might use a predictive model to determine when a given individual is likely to complete their academic degree. Applying this model to individual students will provide insight into when they might complete their degrees assuming no intervention strategy is employed. Thus, while it is important for a predictive model to generate accurate scenarios, these models are not generally deployed without an intervention or remediation strategy in mind.

Strong candidate problems for a successful predictive modelling approach are those in which there are quantifiable characteristics of the subject being modelled, a clear outcome of interest, the ability to intervene in situ, and a large set of data. Most importantly, there must be a recurring need, such as a class being offered year after year, where the historical data on learners (the training set) is indicative of future learners (the testing set).

Conversely, several factors make predictive modelling more difficult or less appropriate. For example, both sparse and noisy data present challenges when trying to create accurate predictive models. Data sparsity, or missing data, can occur for a variety of reasons, such as students choosing not to provide optional information. Noisy data occurs when a measurement fails to capture the intended data accurately, such as determining a student’s location from their IP address when some students are using virtual private networks (proxies used to circumvent region restrictions, a not uncommon practice in countries such as China). Finally, in some domains, inferences produced by predictive models may be at odds with ethical or equitable practice, such as using models of student at-risk predictions to limit the admissions of said students (exemplified in Stripling et al.).

Data Collection

In predictive modelling, historical data is used to generate models of relationships between features. One of the first activities for a researcher is to identify the outcome variable (e.g., grade or achievement level) as well as the suspected correlates of this variable (e.g., gender, ethnicity, access to given resources). Given the situational nature of the modelling activity, it is important to choose only those correlates available at or before the time in which an intervention might be employed. For instance, a midterm examination grade might be predictive of a final grade in the course, but if the intent is to intervene before the midterm, this data value should be left out of the modelling activity.

In time-based modelling activities, such as the prediction of a student final grade, it is common for multiple models to be created (e.g., Barber & Sharkey, 2012), each corresponding to a different time period and set of observed variables. For instance, one might generate predictive models for each week of the course, incorporating into each model the results of weekly quizzes, student demographics, and the amount of engagement the students have had with respect to digital resources to date in the course.

While state-based data, such as data about demographics (e.g., gender, ethnicity), relationships (e.g., course enrollments), psychological measures (e.g., grit, as in Duckworth, Peterson, Matthews, & Kelly, and aptitude tests), and performance (e.g., standardized test scores, grade point averages) are important for educational predictive models, it is the recent rise of big event-driven data collections that has been a particularly powerful enabler of predictive models (see Alhadad et al., 2015 for a deeper discussion). Event-data is largely student activity-based, and is derived from the learning technologies that students interact with, such as learning content management systems, discussion forums, active learning technologies, and video-based instructional tools. This data is large and complex (often in the order of millions of database rows for a single course), and requires significant effort to convert into meaningful features for machine learning.

Of pragmatic consideration to the educational researcher is obtaining access to event data and creating the necessary features required for the predictive modelling process. The issue of access is highly context-specific and depends on institutional policies and processes as well as governmental restrictions (such as FERPA in the United States). The issue of converting complex data (as is the case with event-based data) into features suitable for predictive modelling is referred to as feature engineering, and is a broad area of research itself.
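As a rough illustration of feature engineering under the timing constraint described in this section, the following Python sketch turns a small, made-up event log into per-student counts for a model built at the end of a given week. The event fields (`student`, `week`, `kind`), the feature names, and the function `weekly_features` are all assumptions made for this example; real learning platforms log far richer data.

```python
from collections import defaultdict

# Hypothetical event log: one dict per logged student action.
EVENTS = [
    {"student": "s1", "week": 1, "kind": "video_view"},
    {"student": "s1", "week": 1, "kind": "forum_post"},
    {"student": "s1", "week": 2, "kind": "video_view"},
    {"student": "s2", "week": 1, "kind": "video_view"},
    {"student": "s2", "week": 3, "kind": "quiz_attempt"},
]


def weekly_features(events, as_of_week):
    """Count events per student and kind, using only weeks <= as_of_week."""
    features = defaultdict(lambda: defaultdict(int))
    for event in events:
        # Skip anything that would not yet be known at prediction time.
        if event["week"] > as_of_week:
            continue
        features[event["student"]]["n_" + event["kind"]] += 1
        features[event["student"]]["n_events"] += 1
    return {student: dict(counts) for student, counts in features.items()}


# One feature table per week, mirroring the "one model per week" approach.
tables = {week: weekly_features(EVENTS, week) for week in (1, 2, 3)}
```

Filtering on `as_of_week` before counting is the design choice that matters here: a model meant to drive an intervention in week 2 never sees the week 3 quiz attempt.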

Classification and Regression

In statistical modelling, there are generally four types of data considered: categorical, ordinal, interval, and ratio. Each type of data differs with respect to the kinds of relationships, and thus mathematical operations, which can be derived from individual elements. In practice, ordinal variables are often treated as categorical, and interval and ratio are considered as numeric. Categorical values may be binary (such as predicting whether a student will pass or fail a course) or multivalued (such as predicting which of a given set of possible practice questions would be most appropriate for a student). Two distinct classes of algorithms exist for these applications: classification algorithms are used to predict categorical values, while regression algorithms are used to predict numeric values.

Feature Selection

In order to build and apply a predictive model, features that correlate with the value to predict must be created. When choosing what data to collect, the practitioner should err on the side of collecting more information rather than less, as it may be difficult or impossible to add additional data later, but removing information is typically much easier. Ideally, there would be some single feature that perfectly correlates with the chosen outcome prediction. However, this rarely occurs in practice. Some learning algorithms make use of all available attributes to make predictions, whether they are highly informative or not, whereas others apply some form of variable selection to eliminate the uninformative attributes from the model.

Depending on the algorithm used to build a predictive model, it can be beneficial to examine the correlation between features, and either remove highly correlated attributes (the multicollinearity problem in regression analyses), or apply a transformation to the features to eliminate the correlation. Applying a learning algorithm that naively assumes independence of the attributes can result in predictions with an over-emphasis on the repeated or correlated features. For instance, if one is trying to predict the grade of a student in a class and uses an attribute of both attendance in-class on a given day as well as whether a student asked a question on a given day, it is important for the researcher to acknowledge that the two features are not independent (e.g., a student could not ask a question if they were not in attendance). In practice, the dependencies between features are often ignored, but it is important to note that some techniques used to clean and manipulate data may rely upon an assumption of independence.⁸

By determining an informative subset of the features, one can reduce the computational complexity of the predictive model, reduce data storage and collection requirements, and aid in simplifying predictive models for explanation.

⁸ The authors share an anecdote of an analysis that fell prey to the dangers of assuming independence of attributes when using resampling techniques to boost certain classes of data when applying the synthetic minority over-sampling technique (Chawla, Bowyer, Hall, & Kegelmeyer, 2002). In that case, missing data with respect to city and province resulted in a dataset containing geographically impossible combinations, reducing the effectiveness of the attributes and lowering the accuracy of the model.
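The correlation check discussed in this section can be sketched in a few lines of Python. The feature names, the sample values, and the 0.9 threshold below are invented for illustration; the chapter does not prescribe a particular cut-off, and a suitable one depends on the learning algorithm and the data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant feature; report 0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical per-student features (one list entry per student).
FEATURES = {
    "days_attended": [10, 8, 3, 9, 1],
    "questions_asked": [5, 4, 1, 5, 0],  # only possible when attending
    "forum_posts": [2, 7, 1, 0, 4],
}

flagged = correlated_pairs(FEATURES, threshold=0.9)
```

With this toy data only the attendance and question-asking features are flagged, which matches the dependency described in the text. Whether to drop one of them or transform the pair is left to the analyst.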



Missing values in a dataset may be dealt with in several ways, and the approach used depends on whether data is missing because it is unknown or because it is not applicable. The simplest approach is to remove either the attributes (columns) or the instances (rows) that have missing values. There are drawbacks to both of these techniques. For example, in domains where the total amount of data is quite small, the impact of removing even a small portion of the dataset can be significant, especially if the removal of some data exacerbates an existing class imbalance. Likewise, if all attributes have a small handful of missing values, then attribute removal will remove all of the data, which would not be useful. Instead of deleting rows or columns with missing data, one can also infer the missing values from the other known data. One approach is to replace missing values with a “normal” value, such as the mean of the known values. A second approach is to fill in missing values in records by finding other similar records in the dataset, and copying the missing values from their records.

The impact of missing data is heavily tied to the choice of learning algorithm. Some algorithms, such as the naïve Bayes classifier, can make predictions even when some attributes are unknown; the missing attributes are simply not used in making a prediction. The nearest neighbour classifier relies on computing the distance between two data points, and in some implementations the assumption is made that the distance between a known value and a missing value is the largest possible distance for that attribute. Finally, when the C4.5 decision tree algorithm encounters a test on an instance with a missing value, the instance is divided into fractional parts that are propagated down the tree and used for a weighted voting. In short, missing data is an important consideration that both regularly occurs and is handled differently depending upon the machine learning method and toolkit employed.
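The two inference strategies mentioned in this section (filling in a mean, and copying from a similar record) might look as follows in Python. This is a minimal sketch: the column names and values are made up, missing values are represented as `None`, and the similarity measure (Euclidean distance over the other, fully observed columns) is one assumption among many reasonable choices.

```python
import math


def mean_impute(rows, column):
    """Replace None in one column with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    mean = sum(known) / len(known)
    return [
        dict(row, **{column: mean if row[column] is None else row[column]})
        for row in rows
    ]


def nearest_record_impute(rows, column, other_columns):
    """Copy a missing value from the most similar record that has one.

    Similarity is Euclidean distance over other_columns, which are
    assumed to be fully observed in every row.
    """
    donors = [row for row in rows if row[column] is not None]
    filled = []
    for row in rows:
        if row[column] is None:
            here = [row[c] for c in other_columns]
            donor = min(
                donors,
                key=lambda d: math.dist(here, [d[c] for c in other_columns]),
            )
            row = dict(row, **{column: donor[column]})
        filled.append(row)
    return filled


# Hypothetical student records; the third student has no quiz2 score.
ROWS = [
    {"quiz1": 80, "logins": 30, "quiz2": 85},
    {"quiz1": 40, "logins": 5, "quiz2": 35},
    {"quiz1": 78, "logins": 28, "quiz2": None},
]

by_mean = mean_impute(ROWS, "quiz2")
by_neighbour = nearest_record_impute(ROWS, "quiz2", ["quiz1", "logins"])
```

Both functions return new records and leave the input untouched, which makes it easy to compare the effect of each strategy on the same dataset.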

Methods for Building Predictive Models

After collecting a dataset and performing attribute selection, a predictive model can be built from historical data. In the most general terms, the purpose of a predictive model is to make a prediction of some unknown quantity or attribute, given some related known information. This section will briefly introduce several such methods for building predictive models.

A fundamental assumption of predictive modelling is that the relationships that exist in the data gathered in the past will still exist in the future. However, this assumption may not hold up in practice. For example, it may be the case that (according to the historical data collected) a student’s grade in Introductory Calculus is highly correlated with their likelihood of completing a degree within 4 years. However, if there is a change in the instructor of the course, the pedagogical technique employed, or the degree programs requiring the course, this course may no longer be as predictive of degree completion as was originally thought. The practitioner should always consider whether patterns discovered in historical data should be expected in future data.

A number of different algorithms exist for building predictive models. With educational data, it is common to see models built using methods such as these:

1. Linear Regression predicts a continuous numeric output from a linear combination of attributes.

2. Logistic Regression predicts the odds of two or more outcomes, allowing for categorical predictions.

3. Nearest Neighbours Classifiers use only the closest labelled data points in the training dataset to determine the appropriate predicted labels for new data.

4. Decision Trees (e.g., C4.5 algorithm) are repeated partitions of the data based on a series of single attribute “tests.” Each test is chosen algorithmically to maximize the purity of the classifications in each partition.

5. Naïve Bayes Classifiers assume the statistical independence of each attribute given the classification, and provide probabilistic interpretations of classifications.

6. Bayesian Networks feature manually constructed graphical models and provide probabilistic interpretations of classifications.

7. Support Vector Machines use a high dimensional data projection in order to find a hyperplane of greatest separation between the various classes.

8. Neural Networks are biologically inspired algorithms that propagate data input through a series of sparsely interconnected layers of computational nodes (neurons) to produce an output. Increased interest has been shown in neural network approaches under the label of deep learning.

9. Ensemble Methods use a voting pool of either homogeneous or heterogeneous classifiers. Two prominent techniques are bootstrap aggregating, in which several predictive models are built from random sub-samples of the dataset, and boosting, in which successive predictive models are designed to account for the misclassifications of the prior models.



Most of these methods, and their underlying software implementations, have tunable parameters that change the way the algorithm works depending upon expectations of the dataset. For instance, when building decision trees, a researcher might set a minimum leaf size or maximum depth of tree parameter in order to ensure some level of generalizability.

Numerous software packages are available for building predictive models, and choosing the right package depends highly on the researcher’s experience, the desired classification or regression approach, and the amount of data and data cleaning required. While a comprehensive discussion of these platforms is outside the scope of this chapter, the freely available and open-source package Weka (Hall et al., 2009) provides implementations of a number of the previously mentioned modelling methods, does not require programming knowledge to use, and has associated educational materials including a textbook (Witten, Frank, & Hall, 2011) and series of free online courses (Witten).

While the breadth of techniques covered within a given software package has led to it being commonplace for researchers (including educational data scientists) to publish tables of classification accuracies for a number of different methods, the authors caution against this. Once a given technique has shown promise, time is better spent reflecting on the fundamental assumptions of classifiers (e.g., with respect to missing data or dataset imbalance), exploring ensembles of classifiers, or tuning the parameters of particular methods being employed. Unless the intent of the research activity is to compare two statistical modelling approaches specifically, educational data scientists are better off tying their findings to new or existing theoretical constructs, leading to a deepening of understanding of a given phenomenon. Sharing data and analysis scripts in an open science fashion provides better opportunity for small technique iterations than cluttering a publication with tables of (often) uninteresting precision and recall values.
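To make the idea of a tunable parameter concrete, here is a small Python sketch of a nearest neighbours classifier, one of the methods named in this section, in which the number of neighbours `k` is the parameter being tuned. The training data, the feature choice (quiz average and weekly logins), and the function name `knn_predict` are hypothetical, and the features are deliberately left unscaled to keep the example short.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training points.

    train: list of (feature_vector, label) pairs; query: a feature_vector.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (quiz average, weekly logins) -> outcome.
TRAIN = [
    ((85, 12), "pass"), ((78, 9), "pass"), ((90, 15), "pass"),
    ((40, 2), "fail"), ((55, 3), "fail"), ((48, 1), "fail"),
    ((62, 4), "pass"),
]

query = (60, 4)
prediction_k1 = knn_predict(TRAIN, query, k=1)  # follows the single closest point
prediction_k3 = knn_predict(TRAIN, query, k=3)  # smooths over three neighbours
```

For this query the two settings disagree: with `k=1` the prediction follows the one nearby passing student, while `k=3` is outvoted by two failing neighbours. This is the kind of sensitivity that makes parameter tuning worth the effort.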

Evaluating a Model

In order to assess the quality of a predictive model, a test dataset with known labels is required. The predictions made by the model on the test set can be compared to the known true labels of the test set in order to assess the model. A wide variety of measures is available to compare the similarity of the known true labels and the predicted labels. Some examples include prediction accuracy (the raw fraction of test instances correctly classified), precision, and recall.

Often, when approaching a predictive modelling problem, only one omnibus set of data is available for building. While it may be tempting to reuse this same dataset as a test set to assess model quality, the performance of the predictive model will be significantly higher on this dataset than would be seen on a novel dataset (referred to as overfitting the model). Instead, it is common practice to “hold out” some fraction of the dataset and use it solely as a test set to assess model quality.

The simplest approach is to remove half of the data and reserve it for testing. However, there are two drawbacks to this approach. First, by reserving half of the data for testing, the predictive model will only be able to make use of half of the data for model fitting. Generally, model accuracy increases as the amount of available data increases. Thus, training using only half of the available data may result in predictive models with poorer performance than if all the data had been used. Second, our assessment of model quality will only be based on predictions made for half of the available data. Generally, increasing the number of instances in the test set would increase the reliability of the results.

Instead of simply dividing the data into training and testing partitions, it is common to use a process of k-fold cross validation in which the dataset is partitioned at random into k segments; k distinct predictive models are constructed, with each model training on all but one of the segments, and testing on the single held out segment. The test results are then pooled from all k test segments, and an assessment of model quality can be performed. The important benefits of k-fold cross validation are that every available data point can be used as part of the test set, no single data point is ever used in both the training set and test set of the same classifier at the same time, and the training sets used are nearly as large as all of the available data.

An important consideration when putting predictive modelling into practice is the similarity between the data used for training the model and the data available when predictions need to be made. Often in the educational domain, predictive models are constructed using data from one or more time periods (e.g., semesters or years), and then applied to student data from the next time period. If the features used to construct the predictive model include factors such as students’ grades on individual assignments, then the accuracy of the model will depend on how similar the assignments are from one year to the next. To get an accurate assessment of model performance, it is important to assess the model in the same manner as will be used in situ. Build the predictive model using data available from one year, and then construct a testing set consisting of data from the following year, instead of dividing data from a single year into training and testing sets.
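The k-fold procedure and the measures named in this section can be sketched as follows. This is an illustrative outline rather than a reference implementation: the `fit` argument is assumed to be any function that takes training rows and labels and returns a prediction function, and `fit_majority` is a deliberately trivial stand-in model used only to keep the example self-contained.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Randomly partition range(n) into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def score(actual, predicted, positive):
    """Accuracy, precision, and recall with respect to one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def cross_validate(rows, labels, fit, k, positive):
    """Train k models, each tested on its held-out fold; pool and score."""
    predicted = [None] * len(rows)
    for fold in k_fold_indices(len(rows), k):
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        for i in fold:
            predicted[i] = predict(rows[i])
    return score(labels, predicted, positive)


def fit_majority(train_rows, train_labels):
    """Stand-in model: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


# Hypothetical data: ten students, eight of whom passed.
ROWS = [[score_] for score_ in (90, 85, 80, 75, 70, 65, 60, 55, 40, 30)]
LABELS = ["pass"] * 8 + ["fail"] * 2
result = cross_validate(ROWS, LABELS, fit_majority, k=5, positive="pass")
```

The stand-in model reaches 80% accuracy simply by always predicting "pass", a reminder that accuracy alone can flatter a model on imbalanced data. To follow the year-over-year evaluation described above, the random folds would be replaced by a single split in which one year's records form the training set and the following year's records form the test set.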



PREDICTIVE ANALYTICS IN PRACTICE

Predictive analytics are being used within the field of teaching and learning for many purposes, with one significant body of work aimed at identifying students at risk in their academic programming. For instance, Aguiar et al. (2015) describe the use of predictive models to determine whether students will graduate from secondary school on time, demonstrating how the accuracy of predictions changes as students advance from primary school through into secondary school. Predicted outcomes vary widely, and might include a specific summative grade or grade distribution for a student or class of achievement (Brooks et al., 2015) in a course. Baker, Gowda, and Corbett describe a method that predicts a formative achievement for a student based on their previous interactions with an intelligent tutoring system. In lower-risk and semi-formal settings such as massive open online courses (MOOCs), the chance that a learner might disengage from the learning activity mid-course is another heavily studied outcome (Xing, Chen, Stein, & Marcinkowski; Taylor, Veeramachaneni, & O’Reilly, 2014).

Beyond performance measures, predictive models have been used in teaching and learning to detect learners who are engaging in off-task behaviour (Xing and Goggins; Baker), such as “gaming the system” in order to answer questions correctly without learning (Baker, Corbett, Koedinger, & Wagner). Psychological constructs such as affective and emotional states have also been predictively modelled (D’Mello, Craig, Witherspoon, McDaniel, & Graesser, 2007; Wang, Heffernan, & Heffernan, 2015), using a variety of underlying data as features, such as textual discourse or facial characteristics. More examples of some of the ways predictive modelling has been used in Educational Data Mining in particular can be found in Koedinger, D’Mello, McLaughlin, Pardos, and Rosé (2015).

CHALLENGES AND OPPORTUNITIES

Computational and statistical methods for predictive modelling are mature, and over the last decade, a number of robust tools have been made available for educational researchers to apply predictive modelling to teaching and learning data. Yet a number of challenges and opportunities face the learning analytics community when building, validating, and applying predictive models. We identify three areas that could use investment in order to increase the impact that predictive modelling techniques can have:




1. Supporting non-computer scientists in predictive modelling activities. The learning analytics field is highly interdisciplinary and educational researchers, psychometricians, cognitive and social psychologists, and policy experts tend to have strong backgrounds in explanatory modelling. Providing support in the application of predictive modelling techniques, whether through the innovation of user-friendly tools or the development of educational resources on predictive modelling, could further diversify the set of educational researchers using these techniques.

2. Creating community-led educational data science challenge initiatives. It is not uncommon for researchers to address the same general theme of work but use slightly different datasets, implementations, and outcomes and, as such, have results that are difficult to compare. This is exemplified in recent predictive modelling research regarding dropout in massive open online courses, where a number of different authors (e.g., Brooks et al.; Xing et al.; Taylor et al.; Whitehill, Williams, Lopez, Coleman, & Reich) have all done work with different datasets, outcome variables, and approaches. Moving towards a common and clear set of outcomes, open data, and shared implementations in order to compare the efficacy of techniques and the suitability of modelling methods for given problems could be beneficial for the community. This approach has been valuable in similar research fields and the broader data science community, and we believe that educational data science challenges could help to disseminate predictive modelling knowledge throughout the educational research community while also providing an opportunity for the development of novel interdisciplinary methods, especially related to feature engineering.

3. Engaging in second order predictive modelling. In the context of learning analytics, we define second order predictive models as those that include historical knowledge as to the effects of an intervention in the model itself. Thus a predictive model that used student interactions with content to determine drop out (for instance) would be an example of first order predictive modelling, while a model that also includes historical data as to the effect of an intervention (such as an email prompt or nudge) would be considered a second order predictive model. Moving towards the modelling of intervention effectiveness is important when multiple interventions are available and personalized learning paths are desired.

Despite the multidisciplinary nature of the learning analytics and educational data mining communities, there is still a significant need for bridging understanding between the diverse scholars involved. An interesting thematic undercurrent at learning analytics conferences is the (sometimes-heated) discussion of the roles of theory and data as drivers of educational research. Have we reached the point of “the end of theory” (Anderson) in educational research? Unlikely, but this question is most salient within the subfield of predictive modelling in teaching and learning: while for some researchers the goal is understanding cognition and learning processes, others are interested in predicting future events and success as accurately as possible. With predictive models becoming increasingly complex and incomprehensible by an individual (essentially black boxes), it is important to start discussing more explicitly the goals of research agendas in the field, to better drive methodological choices between explanatory and predictive modelling techniques.

