The authors report findings from a massive national postsecondary data initiative that established common data definitions, openly published and openly licensed to encourage broad use. The paper then reviews aggregated findings from using predictive modeling to identify students at risk, as well as results achieved by individual institutions that used student risk scores diagnostically and linked those scores to interventions empirically shown to mitigate risk. The authors cite a number of reasons for caution when developing recommendations from student data to support educational decision-making, with particular attention to issues of privacy. The paper closes by reiterating the importance of creating a culture of evidence to realize the greatest benefits for educational decision-making at each point in the learning-to-life value chain. Decision-makers are encouraged to move beyond the small-n study and use diagnostic, predictive, and prescriptive analytics (Lowendahl, 2014) to support their educational transformation efforts.
PAR member institutions provided anonymized data to the PAR core data team for all credential-seeking students who began taking courses at the institution from August 2009 onward. The resulting dataset contains more than 2 million student records and more than 20 million course-level records. The PAR sample included:
- student demographic information, including age, gender, race/ethnicity, military and veteran status, permanent residence zip code, and Pell eligibility.
- prior academic information, including high school GPA, transfer GPA, and the amount and type of college credits previously earned.
- student course information for all courses taken, including specific course titles, course length, course size, outcomes, and delivery mode.
- other student academic information, such as majors pursued, specific credentials sought, transfer credits brought in after enrollment, and credentials earned.
(Ice et al., 2012)
PAR created and used openly published, openly licensed common data definitions, which all member institutions used to normalize their data. Because all data provided by PAR member institutions follow these common definitions, both aggregated and cross-institutional comparative analyses of the combined datasets are possible. Having relatively comprehensive, detailed data for all credential-seeking students, rather than a sample from each institution, enables a more accurate understanding of the student-level and institution-level factors that affect risk and success. It also makes it possible to control more effectively for confounding variables that might contribute to observed differences between student groups.
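To illustrate how common data definitions make pooled analysis possible, the following is a minimal sketch in Python. It is not the PAR implementation: the institution names, local codes, field names, and the three-value delivery-mode vocabulary are all hypothetical, chosen only to show local codes being mapped to a shared definition before records are combined.

```python
from dataclasses import dataclass

# Hypothetical per-institution mappings from local delivery-mode codes to a
# shared vocabulary. The actual PAR definitions are not reproduced here.
LOCAL_TO_COMMON = {
    "inst_a": {"OL": "online", "F2F": "onground", "HYB": "blended"},
    "inst_b": {"web": "online", "campus": "onground", "mixed": "blended"},
}


@dataclass(frozen=True)
class CourseRecord:
    """One course-level record expressed in the (assumed) common definitions."""
    institution: str
    student_id: str      # anonymized identifier
    delivery_mode: str   # value from the shared vocabulary
    passed: bool


def normalize(institution, raw):
    """Translate one raw institutional row into the common record format."""
    mapping = LOCAL_TO_COMMON[institution]
    return CourseRecord(
        institution=institution,
        student_id=raw["sid"],
        delivery_mode=mapping[raw["mode"]],  # KeyError flags an unmapped code
        passed=bool(raw["passed"]),
    )


def pass_rate_by_mode(records):
    """Pooled pass rate per delivery mode across all institutions."""
    totals = {}
    for r in records:
        passed, n = totals.get(r.delivery_mode, (0, 0))
        totals[r.delivery_mode] = (passed + int(r.passed), n + 1)
    return {mode: passed / n for mode, (passed, n) in totals.items()}
```

Once every row is expressed in the same vocabulary, a record from one institution coded `OL` and a record from another coded `web` fall into the same `online` group, so a single aggregate or cross-institutional comparison can be computed over the combined records.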