LISER, Esch-Belval, Luxembourg
3-4 June 2019

Secondary data analysis of large-scale assessment using R

Workshop

In this workshop, attendees are introduced to the use of R for the analysis of large-scale assessment data (Rutkowski, von Davier, & Rutkowski, 2013). First, attendees install the necessary software and libraries, and then work on importing and merging data using libraries such as `haven`, `dplyr` and `tidyr` from the ‘tidyverse’ framework (Wickham & Grolemund, 2016). Basic routines are introduced to generate variables from the original records, such as removing non-valid values, creating dummy variables and centering variables. Finally, traditional examples of educational gaps are modelled using regressions and mixed models.
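As an illustration of the variable-generation routines mentioned above, the following is a minimal sketch using `dplyr`. It is not the workshop's own material: the data, the variable names (`school`, `sex`, `score`) and the missing-value code 99 are hypothetical and chosen only for the example.

```r
library(dplyr)

# Hypothetical toy data; 99 is an assumed code for a non-valid response.
students <- tibble(
  school = c("A", "A", "A", "B", "B"),
  sex    = c("girl", "boy", "girl", "boy", "girl"),
  score  = c(500, 540, 99, 440, 480)
)

students_clean <- students %>%
  mutate(
    # Remove non-valid values by turning the assumed code into NA.
    score = na_if(score, 99),
    # Dummy variable: 1 for girls, 0 otherwise.
    girl = as.integer(sex == "girl"),
    # Grand mean centering and z scores, ignoring missing values.
    score_grand = score - mean(score, na.rm = TRUE),
    score_z = score_grand / sd(score, na.rm = TRUE)
  ) %>%
  # Group mean centering: subtract each school's own mean.
  group_by(school) %>%
  mutate(score_group = score - mean(score, na.rm = TRUE)) %>%
  ungroup()

print(students_clean)
```

The grand-mean-centred score expresses each student relative to the whole sample, while the group-mean-centred score expresses each student relative to their own school.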

On the second day of the workshop, different modelling routines are presented in R, including regression and multilevel models. Attendees see how to fit models in steps, how to retrieve and compare the resulting estimates, and how to extract these estimates in a custom way to build tables. The last part of the workshop shows how to integrate RStudio with GitHub to share code with other users, and how to use R to interact with other software for statistical analysis.
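To give a flavour of fitting regressions in steps and collecting their estimates into a table, here is a minimal sketch in base R. It is only one possible approach, not necessarily the one used in the workshop; the simulated data and the variable names (`score`, `ses`, `girl`) are hypothetical.

```r
# Simulated toy data (hypothetical variables, for illustration only).
set.seed(1)
n <- 200
ses <- rnorm(n)
girl <- rbinom(n, 1, 0.5)
score <- 500 + 30 * ses + 10 * girl + rnorm(n, sd = 50)
dat <- data.frame(score, ses, girl)

# Fit the models in steps: each step adds one predictor.
m1 <- lm(score ~ ses, data = dat)
m2 <- update(m1, . ~ . + girl)
models <- list(step1 = m1, step2 = m2)

# Retrieve the estimates of every model into one long table.
estimates <- do.call(rbind, lapply(names(models), function(nm) {
  co <- summary(models[[nm]])$coefficients
  data.frame(
    model = nm,
    term = rownames(co),
    estimate = co[, "Estimate"],
    se = co[, "Std. Error"],
    row.names = NULL
  )
}))

print(estimates)
```

Keeping the estimates in a long table (one row per model and term) makes it straightforward to reshape them into a comparison table or to plot them.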

The general approach of the workshop is a hands-on, guided exercise. That is, attendees are provided with code that they can run on their own laptops, while the instructor comments on and explains these routines. As this is an introductory workshop, the emphasis is on supporting the use of the software through commented examples. The code style and workflow in this workshop are opinionated, in the sense that they are not the only possible way to produce the presented results. However, the presented workflow has many advantages for reproducibility and for the development of data analyses.

Rutkowski, L., von Davier, M., & Rutkowski, D. (Eds.). (2013). Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Boca Raton, London, New York: Chapman and Hall/CRC.

Wickham, H., & Grolemund, G. (2016). R for Data Science. Sebastopol, CA: O’Reilly Media, Inc.


Diego Carrasco is a researcher at the Centro de Medición MIDE UC, at the Pontificia Universidad Católica, Chile. He holds an MRes in Psychological Methods and a PhD from the University of Sussex, UK. His research focuses on the estimation of contextual effects involving measurement and inferential problems present in national and large-scale assessments; in particular, methodological problems involving the comparison of learning environments. He has long experience as a secondary data analyst for national and large-scale assessments, and has been conducting statistical analysis workshops since 2015. The topics covered in these workshops include multilevel models, measurement models, path analysis, and other applications of the generalized latent variable modelling framework, applied to large-scale assessment data.

Programme

Day 1: Introduction to R, 3 June (7 hours)

11:00-12:00

Problem 1: computer setup

  • Installing R and RStudio
  • Installing Sublime Text
  • Installing R-Box
  • Installing SendCode

12:00-13:00

Problem 2: setting up your data

  • Import data
  • Merge data
  • Filter data
  • Structure data
  • Save data in R, Excel, Stata and Mplus

13:00-14:00

Lunch break


14:00-15:30

Problem 3: generating variables

  • recoding (dummy coding, reverse-coded responses)
  • composite scores (sum scores, mean scores)
  • centering variables (group mean centering, grand mean centering, z scores)

Problem 4: basic descriptives

  • frequencies
  • percentages
  • means
  • quick descriptives (with skimr, with rsda)

15:30-16:00

Coffee break


16:00-18:00

Problem 5: fit models

  • regressions
  • mixed models

Day 2: Data analysis with R, 4 June (9 hours)

09:00-11:00

Problem 6: many models for a group

  • regressions in steps


11:00-11:30

Coffee break


11:30-13:00

Problem 7: many models for a group

  • mixed models in steps


13:00-14:00

Lunch break


14:00-15:30

Problem 8: single model for many groups

  • single model for many groups
  • running models in batches
  • retrieve estimates
  • tidying up estimates into tables
  • compare estimates with plots

15:30-16:00

Coffee break


16:00-18:00

Problem 9: integrating R with other software

  • version control tools to share code (uploading functions, creating libraries)
  • integration with other software (R and Stata, R and Mplus)