Combining disparate data sources for improved poverty prediction and mapping

  Damien Christophe Jacques (b, 1)

  a. Computer Science and Engineering, State University of New York, Buffalo, NY 14221
  b. Earth and Life Institute–Environment, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium

  Edited by Anthony J. Bebbington, Clark University, Worcester, MA, and approved September 26, 2017 (received for review January 9, 2017)

Significance

Spatially fine-grained poverty maps are essential for improved diagnosis and policy planning, especially in view of the Sustainable Development Goals. “Big Data” sources such as call data records and satellite imagery have shown promise in providing intercensal statistics. This study outlines a computational framework that efficiently combines disparate data sources, such as environmental data and mobile data, to provide more accurate predictions of poverty and its individual dimensions for the finest spatial microregions in Senegal. These predictions are validated using concurrent census data.

Abstract

More than 330 million people are still living in extreme poverty in Africa. Timely, accurate, and spatially fine-grained baseline data are essential for setting policies that reduce poverty. The potential of “Big Data” to estimate socioeconomic factors in Africa has been demonstrated. However, most current studies are limited to a single data source. We propose a computational framework to accurately predict the Global Multidimensional Poverty Index (MPI) at the finest spatial granularity, covering 552 communes in Senegal, using environmental data (related to food security, economic activity, and accessibility to facilities) and call data records (capturing individualistic, spatial, and temporal aspects of people). Our framework is based on Gaussian Process regression, a Bayesian learning technique that provides the uncertainty associated with each prediction. We perform model selection using elastic net regularization to prevent overfitting. Our results empirically demonstrate that combining disparate data yields superior accuracy (Pearson correlation of 0.91). Our approach also accurately predicts important dimensions of poverty: health, education, and standard of living (Pearson correlation of 0.84–0.86). All predictions are validated using deprivations calculated from the census. Our approach can be used to generate poverty maps frequently, and its diagnostic nature is likely to assist policy makers in designing better interventions for poverty eradication.
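To make the prediction-with-uncertainty idea in the abstract concrete, the following is a minimal sketch of Gaussian Process regression followed by a Pearson-correlation check on held-out data. It is not the authors' implementation: the data are synthetic stand-ins for commune-level features and a poverty score, and the squared-exponential kernel, the fixed hyperparameters (`length_scale`, `noise_var`), and the function names `rbf_kernel` and `gp_predict` are all assumptions made for illustration. The elastic net model-selection step is not shown.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise_var=0.01, length_scale=1.0):
    """Posterior mean and latent variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, length_scale)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test, length_scale))

    # Solve with a Cholesky factor rather than an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, var


# Synthetic stand-in data: 80 "regions" with 2 features each (hypothetical).
rng = np.random.default_rng(0)
features = rng.uniform(-2.0, 2.0, size=(80, 2))
target = np.sin(features[:, 0]) + 0.5 * features[:, 1] + rng.normal(0.0, 0.1, 80)

x_train, y_train = features[:60], target[:60]
x_test, y_test = features[60:], target[60:]

# Center the targets so the zero-mean prior is reasonable, then add the mean back.
y_mean = y_train.mean()
mean, var = gp_predict(x_train, y_train - y_mean, x_test)
mean = mean + y_mean

# Validate predictions against held-out values with the Pearson correlation.
pearson_r = np.corrcoef(mean, y_test)[0, 1]
print(f"Pearson r on held-out regions: {pearson_r:.2f}")
print(f"Largest predictive std. dev.: {np.sqrt(var.max()):.2f}")
```

The per-region variance is what distinguishes this approach from a plain point estimate: regions whose features lie far from the training data receive a larger predictive variance, which flags where a predicted poverty value should be trusted less.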

Footnotes

  • 1. N.P. and D.C.J. contributed equally to this work.

  • 2. To whom correspondence should be addressed. Email: neetipok@buffalo.edu.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).
