The Fragile Families Challenge: A Scientific Mass Collaboration

Matthew Salganik, Princeton University
Fine Hall 214

Sociologists have long theorized about the processes through which childhood experiences shape life outcomes. However, statistical models that use data on family background and childhood experiences to predict life outcomes often have poor predictive performance. These empirical results have led pioneers of the field to muse that random chance must play an important role. In this talk, we present results from the Fragile Families Challenge, a scientific mass collaboration based on the Fragile Families and Child Wellbeing Study, a birth cohort study of about 5,000 families from large US cities. The study began with a probability sample of newborns, and for more than 15 years researchers have followed these families to collect information on child and family development, as reported by the child as well as by the child's mother, father, primary caregiver, and teachers. During the Fragile Families Challenge, 159 research teams from 68 institutions in 7 countries used this high-dimensional survey data to build statistical and machine learning models to predict six life outcomes. Results suggest that (a) modern machine learning methods achieved better predictive performance than approaches more common in social science, but (b) overall predictive performance was poor. The talk will include a discussion of potential reasons for poor predictive performance in this setting, open methodological questions raised by our results, and the potential for mass collaboration to address other scientific and policy questions.