• PNAS Sustainability Science

Combining disparate data sources for improved poverty prediction and mapping

Damien Christophe Jacques^{b,1}

^{a}Computer Science and Engineering, State University of New York, Buffalo, NY 14221; ^{b}Earth and Life Institute–Environment, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium

Edited by Anthony J. Bebbington, Clark University, Worcester, MA, and approved September 26, 2017 (received for review January 9, 2017)

Significance

Poverty maps at the finest spatial resolution are essential for improved diagnosis and policy planning, especially in view of the Sustainable Development Goals. "Big Data" sources such as call data records and satellite imagery have shown promise in providing intercensal statistics. This study outlines a computational framework that efficiently combines disparate data sources, such as environmental data and mobile phone data, to provide more accurate predictions of poverty and its individual dimensions for the finest spatial microregions in Senegal. The predictions are validated using the concurrent census data.

Abstract

More than 330 million people are still living in extreme poverty in Africa. Timely, accurate, and spatially fine-grained baseline data are essential to determining policy aimed at reducing poverty. The potential of "Big Data" to estimate socioeconomic factors in Africa has been proven. However, most current studies are limited to using a single data source. We propose a computational framework to accurately predict the Global Multidimensional Poverty Index (MPI) at the finest spatial granularity and coverage of 552 communes in Senegal, using environmental data (related to food security, economic activity, and accessibility to facilities) and call data records (capturing individualistic, spatial, and temporal aspects of people). Our framework is based on Gaussian Process regression, a Bayesian learning technique that provides the uncertainty associated with each prediction. We perform model selection using elastic net regularization to prevent overfitting. Our results empirically demonstrate the superior accuracy obtained by combining disparate data (Pearson correlation of 0.91). Our approach also accurately predicts important dimensions of poverty: health, education, and standard of living (Pearson correlation of 0.84–0.86). All predictions are validated using deprivations calculated from the census. Our approach can be used to generate poverty maps frequently, and its diagnostic nature is likely to assist policy makers in designing better interventions for poverty eradication.

More than 330 million people are still living in extreme poverty in Africa (1). Consequently, the goal to “eradicate extreme poverty for all people everywhere by 2030” tops the list of the 17 Sustainable Development Goals adopted by world leaders at the United Nations summit in September 2015. The lack of good-quality and fine-grained data to assess poverty regularly features in discussions of the development agenda for Africa (2, 3). Timely measurement and availability of data are vital in ending poverty.

Whatever the nature of the strategies used to reduce poverty, governments and development agencies need a baseline depiction of it. Poverty maps provide such a spatial distribution of socioeconomic deprivations and help policy makers assess the impact of interventions. For efficient targeting of policies at microregions and specific demographics, poverty maps should be made available at the finest administrative unit of planning. These values should also be disaggregated into individual dimensions of poverty, such as deprivations in education, standard of living, health, and so forth (4).

Currently, the most reliable way to estimate poverty is through intensive socioeconomic household surveys. However, this approach is costly and time consuming and can only realistically be carried out for a small sample of households. The extrapolation of local poverty estimates to a larger scale is traditionally done by exploiting links between census (wide area) and survey (smaller area coverage) data through small area estimation methods (5, 6). These techniques depend on the timely availability of census data, which are typically collected every 10 y and whose analysis, for poorer economies, is delayed by years, making timely updates of poverty estimates challenging.

Recently, there has been growing interest in realizing the potential of "Big Data" to understand societal development in Africa. However, most current studies are limited to single-source datasets, such as mobile phone data (7) or satellite imagery (8). Since poverty is a complex phenomenon, understanding it through the multiple lenses provided by diverse datasets will help chart more accurate maps of poverty.

Several studies highlight that significant spatial variation of poverty may be due to a variety of geographic factors, including agrometeorological conditions, accessibility and proximity to markets, access to land, and so forth (9, 10) (see Table S3). Earth Observation Satellites collect data on metrics such as nighttime lights, vegetation cover, and meteorological conditions. The unique features of such datasets are their global coverage, high revisit capability, and free availability. A complementary resource lies in Geographic Information Systems (GIS) analysis. In particular, proximity to important services (schools, hospitals) and density of infrastructure (such as roads) are all factors that might contribute to alleviating poverty (11).

Table S3.

Brief review of poverty estimation methods based on environmental data

While satellite and GIS data are apt to observe and understand the availability of and access to natural resources and manmade structures, they lack information about population structure, especially the socioeconomic ties, cultural interactions, and micro- and macrobehavior that are essential to understanding poverty. One way to study societal interactions is provided by the widespread use of digital technologies (12). The Internet is still finding ground in sub-Saharan Africa. However, mobile phones are a prevalent technology, with adoption rates of more than 70%, even though 43% of the population lives in abject poverty (13). Such widespread use of mobile phones generates an unprecedented volume of data called call data records (CDRs). CDRs capture how, when, where, and with whom individuals communicate. These data, traditionally used by telecommunication companies for billing purposes, capture both micro- and macropatterns of human interaction, while preserving individual anonymity via spatial and temporal aggregation.

Poverty has traditionally been measured in one dimension, usually income or consumption, called income poverty. Another internationally comparable measure is the Global Multidimensional Poverty Index (MPI), which is used in this study. Global MPI is a composite of 10 indicators across three critical dimensions—education (years of schooling, school enrollment), health (malnutrition, child mortality), and standard of living conditions (see Global MPI). Throughout the paper, “poverty” refers to the Global MPI, and “dimensions” refers to education, health, and standard of living. MPI is calculated as a product of the incidence or headcount of poverty (H) and the average intensity (A) across the poor. H is the proportion of the population that is multidimensionally poor. A is the average proportion of indicators in which poor people are deprived.
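As a minimal illustration (with hypothetical numbers, not values from the study), the product structure of the MPI can be written as:

```python
def mpi(H: float, A: float) -> float:
    """Global MPI = H (headcount ratio: share of people who are MPI poor)
    times A (intensity: average share of weighted indicators in which
    the poor are deprived)."""
    if not (0.0 <= H <= 1.0 and 0.0 <= A <= 1.0):
        raise ValueError("H and A must lie in [0, 1]")
    return H * A

# Hypothetical commune: 60% of people are multidimensionally poor (H = 0.6),
# deprived on average in half of the weighted indicators (A = 0.5).
print(mpi(0.6, 0.5))  # 0.3
```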

The study focuses on Senegal, a sub-Saharan country that suffers from persistently high poverty. This study uses mobile phone data in the form of CDRs, together with data related to food security (availability and access components), economic activity, and access to services, which are grouped together as environmental data (Table 1). The CDR variables capture not only the basic phone use statistics of a user but also the regularity, diversity, and spatiotemporal variability in the user's mobile interactions. Tables S1 and S2 detail the variables extracted from the CDR and environmental data, respectively. The poverty maps are produced at the spatially finest level of policy planning, called "communes," and validated at that level using the concurrent census data. Current poverty maps, based on the Global MPI (see Fig. 1) and consumption-based measures (14), do not exist uniformly for all communes of Senegal. The map produced by our analysis is available for all 552 communes (see Fig. 2). Such maps can be generated frequently in between cycles of surveys and censuses, since CDR and environmental data are available at fine temporal granularity.

Table 1.

Summary statistics and characteristics of the data used—CDRs, environment, census, and MPI

Fig. 2.

Quantiles of predicted (Left) and actual (Right) MPI at the commune level. The urban centers are depicted by small circles on the map. The communes in the Dakar and Thiès regions are shown enlarged.

Table S1.

Source, unit, and expected relationship to poverty of each environmental variable used in this study

Table S2.

List of core features extracted for each individual from CDR data using the Bandicoot toolbox (31)

Our objective is to present a computational framework that integrates disparate data sources to accurately predict the Global MPI and its individual dimensions at the finest level of spatial granularity. This framework consists of models trained independently on each data source. Each source-specific model uses Gaussian process (GP) regression (GPR) (15) to infer poverty values. GP falls under the class of kernel methods, where the choice of different kernel functions enables one to learn different nonlinear relationships between the independent and target variables. Each GP-based model provides a probabilistic estimate of poverty for a given commune, including the mean and variance of the estimates. The variance provides a measure of uncertainty, which allows us to combine the predictions from the multiple data sources. An important advantage of this methodology is that the different data ecosystems need not share any data between them. The individual datasets remain private within their specific ecosystems, and only the output predictions and the associated variances are shared.

Global MPI

Poverty has traditionally been measured in one dimension, usually income (or consumption)—also known as income poverty. Another internationally comparable measure is the Global MPI, which complements income poverty and is created from nationally representative Demographic and Health Surveys and Multiple Indicator Cluster Survey (DHS-MICS) (42). It was developed by OPHI and the United Nations Development Program. It is a composite of 10 indicators across three critical dimensions—education (years of schooling, school enrollment), health (malnutrition, child mortality), and living conditions (cooking fuel, sanitation, access to drinking water, electricity, floor, asset ownership).

MPI is defined as the percentage of people who are MPI poor (H, headcount of poverty) multiplied by the average intensity of MPI poverty across the poor (A, intensity of poverty). The MPI data for Senegal, used in this study, were downloaded from www.ophi.org.uk/wp-content/uploads/Senegal-2013.pdf.

MPI is robust to decomposition within relevant subgroups of populations, such as urban vs. rural, geographic regions (districts/provinces/states), and gender, so that targeted policies can be planned for specific demographics. Countries can also adapt the multidimensional poverty approach, selecting different indicators and/or updating weights to align better with their national poverty measure. Countries such as Mexico, Colombia, and Chile have implemented their own versions of a national MPI using dimensions beyond the Global MPI, such as employment and social protection, where data are available (www.mppn.org/).

The MPI has some limitations. Because it is defined from the variables available in global surveys (DHS-MICS), some potential dimensions of poverty (such as gender, income, and employment) are not directly incorporated. However, due to the wide availability of these surveys, the Global MPI can easily be estimated in more than 100 countries covering 5.2 billion people (42). Consequently, it represents a benchmark index, more suitable than a single-dimension poverty line, for replicating this study in another country.

Results

GP Model for Predicting Poverty from a Single Data Source.

To predict poverty for a commune from a single data source (CDR or environment), the following model is assumed:

$$y_i = \boldsymbol{\beta}^\top \mathbf{x}_i + f(\mathbf{x}_i) + \epsilon \quad [1]$$

where $y_i$ is the target poverty value and $\mathbf{x}_i$ is a vector of independent variables derived from the particular data source for the $i$th commune. The first term is a linear combination of the independent variables. The function $f(\cdot)$ models the nonlinear relationship between $y_i$ and $\mathbf{x}_i$. The residual term, $\epsilon$, models the remaining unexplained noise and is modeled as a zero-mean Gaussian random variable, that is, $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$.

Without the nonlinear term $f(\cdot)$ in Eq. 1, the model is equivalent to ordinary linear regression. However, a linear model is not rich enough to capture the relationships between the target and the independent variables (see Fig. S6), thus motivating the need for a nonlinear term.

Fig. S6.

Residual vs. fit plots for predicting incidence of poverty (H) using CDR (Top) and environmental (Bottom) data. (Left) Linear (elastic net regression). (Right) Nonlinear (GPR). The linear model fits indicate nonlinearity in the data. The residuals for GPR are normally distributed. Shapiro–Wilk test statistic: CDR, 0.97 ($P < 10^{-9}$); environmental, 0.95 ($P < 10^{-9}$).

Instead of assuming a fixed parametric form for $f(\cdot)$, we adopt a nonparametric approach, by assuming a GP prior on $f(\cdot)$. The generative process thus becomes:

$$f(\mathbf{x}) \sim GP\big(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big) \quad [2]$$

$$y_i \sim \mathcal{N}\big(\boldsymbol{\beta}^\top \mathbf{x}_i + f(\mathbf{x}_i),\, \sigma_n^2\big), \quad \forall i \quad [3]$$

A GP is a stochastic process, indexed by $\mathbf{x} \in \mathbb{R}^d$. Any finite sample generated from it is jointly multivariate normal (15). $m(\mathbf{x})$ is the mean of $f(\mathbf{x})$, and $k(\mathbf{x}, \mathbf{x}')$ is a kernel function that defines the covariance between any two evaluations of $f(\mathbf{x})$; that is, $m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})]$ and $k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\big[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))\big]$. For model simplicity, we assume that $m(\mathbf{x}) = 0$, which is a standard practice in GP-based methods (15).

Given a training set of examples, $\mathcal{D} = \{\mathbf{x}_i, y_i\}_{i=1}^{N}$, the GP prior on $f(\cdot)$, and the other terms in Eq. 1, the posterior distribution of $y_*$ (for an unseen input vector, $\mathbf{x}_*$) is a Gaussian distribution, with the following mean and variance (see GP Regression Model for details):

$$\bar{y}_* := \mathbb{E}[y_*] = \boldsymbol{\beta}^\top \mathbf{x}_* + \mathbf{k}^\top (K + \sigma_n^2 I)^{-1} \mathbf{y} \quad [4]$$

$$\sigma_*^2 := \mathrm{var}[y_*] = k_* - \mathbf{k}^\top (K + \sigma_n^2 I)^{-1} \mathbf{k} + \sigma_n^2 \quad [5]$$

Here, $\mathbf{y} = [y_1, y_2, \ldots]^\top$; $K$ is a matrix that contains the kernel function evaluated on each pair of training inputs, that is, $K[i,j] = k(\mathbf{x}_i, \mathbf{x}_j)$; $\mathbf{k}$ is a vector of the kernel computations between each training input and the test input, that is, $\mathbf{k}[i] = k(\mathbf{x}_*, \mathbf{x}_i)$; $k_* = k(\mathbf{x}_*, \mathbf{x}_*)$; and $I$ is an identity matrix.
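These posterior formulas can be sketched in a few lines of numpy. This is a toy illustration with a plain RBF kernel and made-up hyperparameters, not the paper's fitted model; $\beta$ is set to zero in the example so the GP term does all the work:

```python
import numpy as np

def rbf(A, B, sigma_f=1.0, length=1.0):
    """Squared-exponential kernel: k(x, x') = sigma_f^2 exp(-||x - x'||^2 / (2 l^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-sq / (2 * length**2))

def gp_posterior(X, y, x_star, beta, sigma_n=1e-3):
    """Posterior mean (Eq. 4) and variance (Eq. 5) at one test input x_star."""
    Kn = rbf(X, X) + sigma_n**2 * np.eye(len(X))      # K + sigma_n^2 I
    k = rbf(X, x_star[None, :])[:, 0]                 # k[i] = k(x*, x_i)
    k_star = rbf(x_star[None, :], x_star[None, :])[0, 0]
    mean = x_star @ beta + k @ np.linalg.solve(Kn, y)
    var = k_star - k @ np.linalg.solve(Kn, k) + sigma_n**2
    return mean, var

# Toy data on the curve y = x^2 (beta = 0, so only the GP term is active).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
beta = np.zeros(1)
mean, var = gp_posterior(X, y, np.array([1.0]), beta)  # query near a training point
```

Near a training point the posterior mean reproduces the observed value and the variance collapses toward $\sigma_n^2$; far from all training points the variance reverts to the prior variance $\sigma_f^2 + \sigma_n^2$, which is exactly the uncertainty signal the combination step exploits.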

Choice of Kernel Function.

The role of the kernel function is to specify how the function values $f(\mathbf{x})$ and $f(\mathbf{x}')$ vary as a function of their corresponding inputs $\mathbf{x}$ and $\mathbf{x}'$. We use the following kernel function:

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right) \exp\!\left(-\frac{\|\mathbf{x}_s - \mathbf{x}'_s\|^2}{2\ell_s^2}\right) \quad [6]$$

where $\mathbf{x}_s$ and $\mathbf{x}'_s$ are the spatial coordinates (latitude, longitude) of the commune centers corresponding to $\mathbf{x}$ and $\mathbf{x}'$, respectively. The first exponential term captures nonlinear dependencies in the feature space. The second exponential term plays the same role in the geographic space and models the spatial autocorrelation as a continuous function, as in Kriging, a widely used method in geostatistics (16). The parameter $\sigma_f^2$ is the variance of the stochastic process $f$, $\ell$ is the process length scale for the feature-space part, and $\ell_s$ is the process length scale for the spatial part.
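This composite kernel can be written directly (illustrative hyperparameters; here `x` holds the feature vector and `xs` the commune-center coordinates, both hypothetical):

```python
import numpy as np

def composite_kernel(x, x_prime, xs, xs_prime,
                     sigma_f=1.0, length=1.0, length_s=1.0):
    """Eq. 6: an RBF term over the feature space multiplied by an RBF term
    over geographic space (the latter acting like a Kriging covariance)."""
    feat = np.exp(-np.sum((x - x_prime) ** 2) / (2 * length**2))
    spat = np.exp(-np.sum((xs - xs_prime) ** 2) / (2 * length_s**2))
    return sigma_f**2 * feat * spat

x = np.array([0.2, 0.5])          # hypothetical commune features
xs = np.array([14.7, -17.4])      # hypothetical commune-center lat/lon
print(composite_kernel(x, x, xs, xs))  # sigma_f^2 = 1.0 at zero distance
```

Because the product of two valid kernels is itself a valid kernel, this construction keeps the resulting covariance matrix positive semidefinite while letting feature similarity and geographic proximity each contribute their own length scale.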

The quantities $\boldsymbol{\beta}$, $\ell$, $\ell_s$, $\sigma_n^2$, and $\sigma_f^2$ are estimated by maximizing the marginalized log-likelihood of the training data, as discussed in Materials and Methods. To remove the effect of spurious features, we couple the GP model with elastic net regularization (17) during the model learning phase. This allows for automatic selection of relevant features and learning a parsimonious model that improves interpretability.

Combining Source-Specific Models.

To predict poverty for a commune, we use two independently trained models specified in Eq. 1, corresponding to the two data sources of CDRs and environmental data. Each model produces a posterior Gaussian distribution, denoted by <mml:math><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>~</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>σ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>yic~N(yˉic,σic2) and <mml:math><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>~</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">ˉ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>σ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>yie~N(yˉie,σie2) for the CDR and environmental data, respectively. 
The combined poverty estimate, $y_i$, is assumed to follow a mixture distribution of the two Gaussians defined above, with the mixing weights defined as:
$$w_{ic} = \frac{1/\sigma_{ic}^{2}}{1/\sigma_{ic}^{2} + 1/\sigma_{ie}^{2}}, \qquad w_{ie} = \frac{1/\sigma_{ie}^{2}}{1/\sigma_{ic}^{2} + 1/\sigma_{ie}^{2}}$$ [7]
The weights assign greater importance to the source that provides the smaller predictive variance, signifying higher confidence in the prediction for that particular commune.
The mean and the variance of the combined poverty estimate are (see Estimating Moments of a Mixture Distribution):
$$\mathbb{E}[y_i] = w_{ic}\,\bar{y}_{ic} + w_{ie}\,\bar{y}_{ie}, \qquad \operatorname{var}[y_i] = w_{ic}\,\sigma_{ic}^{2} + w_{ie}\,\sigma_{ie}^{2} + w_{ic}\,w_{ie}\,(\bar{y}_{ic} - \bar{y}_{ie})^{2}$$ [8]
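Eqs. 7 and 8 translate directly into code; the function below combines two per-commune Gaussian predictions by inverse-variance weighting (the function and variable names are illustrative):

```python
def combine_predictions(mean_c, var_c, mean_e, var_e):
    """Combine CDR and environmental Gaussian predictions (Eqs. 7 and 8).

    Mixing weights are proportional to each source's inverse
    predictive variance, so the more confident source dominates.
    """
    w_c = (1.0 / var_c) / (1.0 / var_c + 1.0 / var_e)
    w_e = 1.0 - w_c
    mean = w_c * mean_c + w_e * mean_e
    # Mixture variance: within-component spread plus a term for the
    # disagreement between the two source means.
    var = w_c * var_c + w_e * var_e + w_c * w_e * (mean_c - mean_e) ** 2
    return mean, var

# Example: the CDR model is more confident (smaller variance),
# so the combined mean is pulled toward its estimate.
m, v = combine_predictions(0.40, 0.01, 0.60, 0.04)
print(m, v)
```

With these numbers the CDR source gets weight 0.8, giving a combined mean of 0.44 and a combined variance of 0.0224.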

Predicted MPI Poverty Values.

The predicted map of MPI for microregions—that is, 552 communes of Senegal—is depicted in Fig. 2, Left. Compared with the current poverty map in Fig. 1, our map highlights the heterogeneity of poverty within each macroregion. The communes toward the interior of the country are poorer than the rest. The western regions, containing the capital city Dakar, and the communes along the coastal boundary are less poor than the rest of the country. Of special interest is the spatially large division in the south, consisting of the regions of Tambacounda, Kedougou, and Kolda, which is depicted as one color on the current map in Fig. 1 but has communes of varying poverty values spread throughout. Interestingly, the communes in the Kedougou region in the extreme southeast corner of Senegal are predicted as wealthier than other communes within the region. The communes in the region of Ziguinchor, in the southwest corner, are wealthier than other communes in the south. This is attributed to the fact that Ziguinchor is the second largest city in Senegal, with the economic advantage of being a port and a tourist center.

The 121 urban centers are shown as small circles on the map and, in general, have lower poverty values than rural communes. The population in urban centers is generally richer than the population living in adjacent rural communes. This is true even for the very poor communes of Senegal in the regions of Kaffrine and Tambacounda in the center, for which the contrast is even stronger. The urban centers bordering the neighboring country of Mauritania, in the northeast, are wealthier; this could be attributed to the economy of the Senegal river basin and to cross-border trade. The predominantly urban areas in Dakar are shown enlarged in the map. All communes in Dakar are better off than the rest of Senegal because of the concentration of economic activity there over the years.

A quantitative validation of the predictions is provided against commune-level poverty values estimated from census data (see Fig. 2, Right) using cross-validation (CV) procedures (details in Materials and Methods). A standard CV is often performed to ensure that the model generalizes to out-of-sample data. We performed a standard 10-fold CV, where the data are randomly split into 10 folds. Each time, nine folds are used for training and one fold for evaluation; that is, we randomly assign 90% of communes to the training set and evaluate on the remaining 10%. This procedure is repeated 250 times to provide a robust assessment of the variability of model parameters and prediction statistics. Using standard CV, the model gives a Pearson’s correlation of 0.94, with a P value $< 0.0001$. Although training and evaluation data are selected randomly, this method of validation may prove insufficient, as poverty deprivations tend to be spatially correlated. A model may thus appear to perform well when evaluated this way, even though it has poor extrapolation power in the spatial sense. The above results are provided for comparison.

To measure the extrapolation ability of the model to spatial areas that were not represented in the training data, we use a spatial CV procedure (18) (details in Materials and Methods). Here, the training and evaluation sets are sampled from geographically distinct regions, ensuring that the model is tested rigorously. The experiments were repeated 250 times with random samples of training and evaluation sets, while ensuring that all communes are represented in the evaluation. We report Pearson’s and Spearman’s correlations, and rms error (RMSE), averaged over the multiple CV runs. The predictions in Fig. 2, Left have a spatially cross-validated Pearson’s correlation of 0.91 and a rank correlation of 0.87, with P values less than $10^{-20}$ for both tests, indicating strong significance. This emphasizes the efficacy of our model in predicting poverty values accurately at the finest spatial granularity, using multisource data.
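A spatial CV of this kind can be sketched with scikit-learn's GroupKFold, using region labels as the grouping variable so that evaluation communes come from regions absent from the training set; the region assignments and fold counts below are illustrative, not the exact procedure of ref. 18:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 4))          # toy commune-level features
regions = np.repeat(np.arange(6), 10)  # 6 regions, 10 communes each

# GroupKFold keeps every commune of a region in the same fold, so each
# evaluation fold is geographically disjoint from its training folds.
folds = list(GroupKFold(n_splits=3).split(X, groups=regions))
for train_idx, test_idx in folds:
    assert set(regions[train_idx]).isdisjoint(regions[test_idx])
print(len(folds), "spatially disjoint folds")
```

Repeating such splits many times with reshuffled region-to-fold assignments yields the averaged correlations and RMSE reported above.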

As a comparative study of how our model performs using multisource and single-source data, we experimented with three datasets—Multisource, CDR, and Environment—to predict H, A, and MPI at the commune level (see Table 2). We report highly accurate results for all three targets (H, A, and MPI). Rank correlations are preserved, with a Spearman’s correlation of 0.85 for both H and A. The values of Pearson’s r are much higher than the rank correlations across all prediction tasks, indicating a linear correspondence between the actual and predicted poverty values. We report very low P values ($<10^{-34}$) for spatial CV compared with standard CV, signifying more stable performance. For detailed results, see Table S5. Table 2 shows that combining multiple data sources (CDRs and environmental data) results in a consistent improvement in accuracy over using the individual data sources. The improvement is even more pronounced in the detailed results for all of the indicators of poverty, given in Table S5.

Table 2.

Spatially cross-validated results of the predictions of MPI, headcount of poverty (H), and intensity of poverty (A), along with the individual indicators for poverty given by our model using disparate datasets

Table S4.

Brief review of poverty estimation methods based on CDR data

Table S5.

Spatially cross-validated results of the predictions of MPI, incidence of poverty (H), and intensity of poverty (A), along with the individual indicators for poverty given by our model using disparate datasets

Fig. 3, Left plots the relationship between the MPI values predicted by our model and those estimated from the census. We observe, in general, a linear relationship for MPI, with lower values for urban areas (shown in red) and higher values for rural areas (shown in blue). Poverty is underestimated for the predominantly urban communes of Dakar and a few urban centers (i.e., they are predicted richer than they are). Likewise, poverty is overestimated for a very few rural communes. We also observe that the predicted variance is comparatively higher for communes with lower population densities than for those with higher densities, signifying that fewer data points in the vicinity of a given commune contribute to its higher variance (see Fig. S5).

Fig. 3.

Predictive power of the Gaussian process model. Left denotes the comparison of actual and predicted MPI values for all communes and urban areas of Senegal. The rural and urban areas are differentiated using blue and red colors, respectively. The size of the circle denotes the variance of the MPI prediction for that commune. Top Right shows how the actual and predicted values compare for asset ownership, while Bottom Right shows the comparison for years of schooling.

Fig. S5.

Relationship between precision of estimates of poverty and the population density of each commune.

Predicted Values for the Dimensions of Poverty.

Global MPI consists of 10 individual deprivation indicators grouped along three dimensions: (i) education (indicators—years of schooling and school attendance), (ii) health (indicators—child mortality and nutrition), and (iii) standard of living (indicators—cooking fuel, sanitation, access to drinking water, electricity, and floor and asset ownership). Each individual deprivation indicator is taken as the target of our model, and the averaged spatially cross-validated results, along the three dimensions, are reported in Table 2. Detailed results for each of the 10 indicators are given in Table S5.
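Under the Alkire–Foster methodology that the Global MPI follows, H, A, and MPI can be computed directly from a household deprivation matrix. The sketch below uses the standard OPHI nested weights and the 1/3 poverty cutoff; the household data are made up for illustration:

```python
import numpy as np

# Rows: households; columns: the 10 deprivation indicators (1 = deprived).
# Standard OPHI nested weights: the four health and education indicators
# weigh 1/6 each; the six living-standard indicators weigh 1/18 each.
weights = np.array([1/6] * 4 + [1/18] * 6)
deprivations = np.array([
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],   # heavily deprived household
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],   # mildly deprived household
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # non-deprived household
])

score = deprivations @ weights   # weighted deprivation score per household
poor = score >= 1/3              # MPI poverty cutoff
H = poor.mean()                  # headcount ratio (incidence)
A = score[poor].mean() if poor.any() else 0.0  # intensity among the poor
MPI = H * A
print(H, A, MPI)
```

Here only the first household is identified as poor, so H = 1/3, A equals that household's deprivation score, and MPI is their product; our model takes each such commune-level quantity as a regression target.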

Referring to Table S5, we note that the accuracy of the model is high for some deprivations and good for most. All deprivations are better predicted using CDR data, probably because these data characterize individual behavior, whereas environmental data depict conditions that might influence poverty (see Tables S1 and S2). Fig. 3, Top Right compares our predictions for asset ownership with those estimated from the census. Rural communes (depicted in blue) are clustered closely toward high deprivation. Urban areas generally have lower deprivation than rural areas, though their values are more spread out.

Indicators related to education—years of schooling and school attendance—are predicted well, because use of short message service (SMS) is indicative of literacy (19). The environmental data also perform well, because they capture the distance to schools, main roads, and urban centers, all of which facilitate educational attainment. Fig. 3, Bottom Right shows that all areas of Senegal are deprived in education, as the rural (blue) and urban (red) points are spread evenly on the plot. However, rural areas tend to dominate at the very high end of the deprivation index, while the areas with very low deprivation are urban.

The model performs poorly for the indicators within the health dimension—that is, child mortality and nutrition. This is attributed to the fact that our data are not representative of the child population, and thus the features extracted from CDR data do not capture this deprivation. A similar inference can be drawn for the poorer correlations for nutrition. Moreover, the deprivation values computed from the census for the nutrition indicator are based on two hunger-related questions, as detailed nutritional information is not available to us (see Table S7 for details).

Table S7.

A summary of poverty indicators and associated deprivations, with emphasis on how our methodology calculates them using the RGPHAE census data, keeping in view the OPHI guidelines

Dimensions of Poverty—Interpretation of Weights.

Figs. S2 and S3 display the features deemed important by our model for the environment and CDR data, respectively. The important features are those for which the corresponding entries in the coefficient vector, <mml:math><mml:mi>β</mml:mi></mml:math>β, are high in magnitude. We ignore child mortality and nutrition, as our model does not perform very accurately for these two indicators. The following interpretations are given for information purposes. These are, by no means, indicators of causality.

Fig. S1.

Spearman correlation matrix between individual deprivations, H (headcount of poverty), A (intensity of poverty), and MPI at the commune level.

Fig. S2.

Visualization of the features selected using elastic net regularization on environmental data for the prediction of selected deprivations. The rows represent the features, which are ranked according to their weights from positive (marked green) to negative (marked red). Different feature groups are color-coded. Features related to food availability are given in black, whereas those related to food accessibility are colored green. The land cover features are colored yellow, and the features detailing economic activity are in red. Finally, features depicting access to services are shown in blue. The cells in white were given 0 weights by our model.

Fig. S3.

Visualization of the features selected using elastic net regularization on CDR data for the prediction of selected deprivations. The rows represent the features, which are ranked according to their weights from positive (marked green) to negative (marked red). The columns are the various deprivations. The feature groups are color-coded. Features related to diversity are colored blue. Those related to spatial aspects are colored yellow. The features related to active behavior are marked in black. The features related to basic phone use are in red, and those related to regularity are in green. The cells in white were given 0 weights by our model. The legend in parentheses corresponds to the variation in weights: H and A weights vary between 1.85 and −1.85, and for the others the weights vary between 5.5 and −5.5.

Referring to Fig. S2, nighttime lights appear to be the most important feature regardless of the predicted dimension, in line with current research (8, 20). Nighttime lights show a strong correlation with MPI (Spearman correlation of −0.66). Urban areas and road density, two other important indicators of economic activity, are relevant but to a lesser extent. Even though the coefficient values of each dimension are not directly comparable, since each dimension was taken as a separate target, it is interesting to note that the weights of nighttime lights intensity are highest for the electricity and asset ownership deprivations. This result confirms previous findings (21) that access to electricity is correlated with nighttime lights (Spearman correlation of −0.67). Additional observations regarding water deprivation, food security (access component), and climate are given in Interpretation of Weights—Along the Dimensions of Poverty.

A similar analysis for the CDR features reveals several interesting insights regarding the relationship between poverty and the individual characteristics captured in CDR features. While we considered CDR features for each month individually, for the ease of visualization (see Fig. S3), we average the monthly values of the weights associated with each feature.

Here we discuss the CDR features that were selected by the model as the strongest predictors for the various targets. These features are listed in Table S6. One of the strongest negative predictors for most of the targets is the number of active days (for call and text), indicating that individuals in wealthier communes have the monetary resources to recharge their phones and make/receive calls. The ratio of calls to texts shows the preference for calls and emerges as an important factor in predicting education-based deprivations. The feature “interevent time call” measures the irregularity in responding to calls/texts and emerges as a positive predictor for deprivations. Features that indicate diversity in communication, such as entropy of contacts and interactions per contact (call and text), show a negative relationship to poverty. These results confirm previous findings (7, 22, 23) that the diversity of an individual’s relationships is positively correlated with his or her economic wellbeing. However, for features such as percent pareto interactions and balance of contacts, which are proportional to an individual’s diversity in communication, we report a positive relationship with poverty. This counterintuitive relationship needs to be studied further in the context of telecommunication patterns in Senegal.

Table S6.

List of the important features chosen by our model to predict each of H, A, schooling, school attendance, cooking fuel, sanitation, water, electricity, floor, and assets

We observe a negative relationship between the “activeness” of an individual in his or her mobile interactions and poverty. For instance, the delay in responding to texts has a positive relationship to poverty. Interestingly, the feature of percent initiated interactions (calls) also has a positive relationship to poverty, signifying that in Senegal individuals living in more deprived communes are more likely to initiate calls (to request resources, etc.) than those living in less deprived communes. The mobility patterns of individuals, captured using spatial features such as the number of frequent antennas, the entropy of antennas, and the total number of antennas used by an individual, indicate a negative relationship to poverty. Thus, individuals living in more deprived communes tend to move among fewer antennas than those living in less deprived communes. This observation should be viewed cautiously because of the sparse antenna density in rural communes.

GP Regression Model

The following model is assumed to predict poverty for a commune from a single data source (CDR or environment):
$$y_i = \beta^{\top}\mathbf{x}_i + f(\mathbf{x}_i) + \epsilon$$ [S1]
where $y_i$ is the target poverty value and $\mathbf{x}_i$ is a vector of independent variables derived from the particular view for the $i$th commune. Instead of assuming a fixed parametric form for $f(\cdot)$, we adopt a nonparametric approach by assuming a GP prior on $f(\cdot)$, with zero mean function and kernel function $k(\cdot,\cdot)$.
The generative process thus becomes:
$$f(\mathbf{x}) \sim \mathrm{GP}\bigl(0, k(\mathbf{x}, \mathbf{x}')\bigr), \qquad y_i \sim \mathcal{N}\bigl(\beta^{\top}\mathbf{x}_i + f(\mathbf{x}_i), \sigma_n^{2}\bigr), \quad \forall i$$
A GP is a stochastic process such that any finite sample generated from it is jointly multivariate normal (15).
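A model of the form in Eq. S1 can be sketched with scikit-learn, where a DotProduct kernel plays the role of the linear term $\beta^{\top}\mathbf{x}$, an RBF kernel the nonparametric $f(\cdot)$, and a WhiteKernel the noise $\epsilon$; this kernel composition is an illustrative assumption, not the paper's exact specification:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([0.7, -0.3, 0.1]) + np.sin(X[:, 0]) \
    + rng.normal(scale=0.05, size=80)

# Linear trend + smooth nonparametric residual + observation noise,
# mirroring y_i = beta^T x_i + f(x_i) + eps in Eq. S1.
kernel = DotProduct() + RBF() + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predictions come with a posterior standard deviation, i.e., the
# per-commune uncertainty used later when combining data sources.
mean, std = gp.predict(X[:5], return_std=True)
print(mean.shape, std.shape)
```

The posterior standard deviations returned here are what supply the per-source variances $\sigma_{ic}^{2}$ and $\sigma_{ie}^{2}$ used in the combination step.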

The posterior distribution of $f(\mathbf{x}_*)$ at a test input, $\mathbf{x}_*$, can be computed given a training set of examples, $\{\mathbf{x}_i, f(\mathbf{x}_i)\}_{i=1}^{N}$. The joint distribution of the training outputs, $f(\mathbf{x}_1), f(\mathbf{x}_2), \ldots$, and the test output, $f(\mathbf{x}_*)$, according to the GP prior is:
$$\begin{bmatrix} f(\mathbf{x}_1) \\ f(\mathbf{x}_2) \\ \vdots \\ f(\mathbf{x}_N) \\ f(\mathbf{x}_*) \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} k(\mathbf{x}_1,\mathbf{x}_1) & \ldots & k(\mathbf{x}_1,\mathbf{x}_N) & k(\mathbf{x}_1,\mathbf{x}_*) \\ k(\mathbf{x}_2,\mathbf{x}_1) & \ldots & k(\mathbf{x}_2,\mathbf{x}_N) & k(\mathbf{x}_2,\mathbf{x}_*) \\ \vdots & \vdots & \vdots & \vdots \\ k(\mathbf{x}_N,\mathbf{x}_1) & \ldots & k(\mathbf{x}_N,\mathbf{x}_N) & k(\mathbf{x}_N,\mathbf{x}_*) \\ k(\mathbf{x}_*,\mathbf{x}_1) & \ldots & k(\mathbf{x}_*,\mathbf{x}_N) & k(\mathbf{x}_*,\mathbf{x}_*) \end{bmatrix}\right)$$
For notational simplicity, let $K$ denote the $N \times N$ matrix containing the kernel computation on each pair of training inputs—that is, $K[i,j] = k(\mathbf{x}_i, \mathbf{x}_j)$; let $\mathbf{k}$ be a vector of the kernel computation between each training input and the test input—that is, $\mathbf{k}[i] = k(\mathbf{x}_i, \mathbf{x}_*)$—and let $k_*$ be the self-covariance for $\mathbf{x}_*$—that is, $k_* = k(\mathbf{x}_*, \mathbf{x}_*)$. Moreover, let $\mathbf{f}$ be a $N \times 1$ vector, such that $\mathbf{f}[i] = f(\mathbf{x}_i)$.
The above equation can be written as:
$$\begin{bmatrix} \mathbf{f} \\ f(\mathbf{x}_*) \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K & \mathbf{k} \\ \mathbf{k}^{\top} & k_* \end{bmatrix}\right)$$
Since $\mathbf{f}$ and $f(\mathbf{x}_*)$
are jointly Gaussian, one can make use of the well-known Gaussian identity (43) for the conditional distribution of <mml:math><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>f(???)—that is:<mml:math display="block"><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mtext>??</mml:mtext><mml:mo>~</mml:mo><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msup><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mtext>??</mml:mtext><mml:mo>,</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mo>?</mml:mo></mml:msub><mml:mo>?</mml:mo><mml:msup><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msup><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mtext>??</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>f(???)|??~N(???K?1??,k?????K?1??)[S2]We assume that the observed poverty for the <mml:math><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math>ith commune, <mml:math><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math>yi, is equal to the sum of the linear term, the latent function value, with zero mean GP prior, and an independent and identically distributed Gaussian noise (<mml:math><mml:mrow><mml:mo>~</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>~N(0,σn2)). Thus, the prior on the observed data will be:<mml:math display="block"><mml:mrow><mml:mrow><mml:mi>??</mml:mi><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>β</mml:mi><mml:mo>?</mml:mo></mml:msup><mml:msub><mml:mtext>??</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>?</mml:mo><mml:mtext>cov</mml:mtext><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mtext>??</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mtext>??</mml:mtext><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mi>δ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:mrow></mml:math>??[yi]=β???i?cov[yi,yj]=k(??i,??j)+δijσn2where <mml:math><mml:msub><mml:mi>δ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math>δij is the Kronecker delta, such that <mml:math><mml:mrow><mml:msub><mml:mi>δ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>δij=1, if <mml:math><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>(i=j), and 0 otherwise. For the entire training dataset:<mml:math display="block"><mml:mrow><mml:mrow><mml:mi>??</mml:mi><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mtext>??</mml:mtext><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="normal">??</mml:mi><mml:mo>?</mml:mo><mml:mtext>cov</mml:mtext><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mtext>??</mml:mtext><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mi>I</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math>??[??]=???cov[??]=K+σn2Iwhere b is a <mml:math><mml:mi>N</mml:mi></mml:math>N length vector, such that <mml:math><mml:mrow><mml:mrow><mml:mtext>??</mml:mtext><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msup><mml:mi>β</mml:mi><mml:mo>?</mml:mo></mml:msup><mml:msub><mml:mtext>??</mml:mtext><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math>??[i]=β???i, and <mml:math><mml:mi>I</mml:mi></mml:math>I is the <mml:math><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math>N×N identity matrix. The joint distribution of y and <mml:math><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>f(???) 
can be written as:<mml:math display="block"><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable displaystyle="true"><mml:mtr><mml:mtd columnalign="center"><mml:mtext>??</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>~</mml:mo><mml:mrow><mml:mi mathvariant="script">N</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable displaystyle="true"><mml:mtr><mml:mtd columnalign="center"><mml:mtext>??</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:msup><mml:mi>β</mml:mi><mml:mo>?</mml:mo></mml:msup><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable displaystyle="true"><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mi>K</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>N</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mi>I</mml:mi></mml:mrow></mml:mrow></mml:mtd><mml:mtd columnalign="center"><mml:mi>k</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:msup><mml:mi>k</mml:mi><mml:mo>?</mml:mo></mml:msup></mml:mtd><mml:mtd columnalign="center"><mml:msub><mml:mi>k</mml:mi><mml:mo>?</mml:mo></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math>[??f(???)]~N([??β????],[K+σN2Ikk?k?])Using the conditional Gaussian result, similar to Eq. 2, and noting the relation between <mml:math><mml:msub><mml:mi>y</mml:mi><mml:mo>?</mml:mo></mml:msub></mml:math>y? 
and <mml:math><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>f(???) from Eq. 1, the conditional distribution for the prediction <mml:math><mml:msub><mml:mi>y</mml:mi><mml:mo>?</mml:mo></mml:msub></mml:math>y? becomes:<mml:math display="block"><mml:mrow><mml:mrow><mml:mi>??</mml:mi><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>β</mml:mi><mml:mo>?</mml:mo></mml:msup><mml:msub><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msup><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msup><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>K</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mi>I</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo><mml:mtext>??</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>?</mml:mo><mml:mtext>var</mml:mtext><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mo>?</mml:mo></mml:msub></mml:mrow><mml:mo>?</mml:mo><mml:mrow><mml:msup><mml:mtext>??</mml:mtext><mml:mo>?</mml:mo></mml:msup><mml:msup><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>K</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mi>I</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>?</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mtext>??</mml:mtext></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mrow></mml:math>??[y?]=β????+???(K+σn2I)?1(?????)?var[y?]=k?????(K+σn2I)?1??+σn2
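As a concrete illustration, the predictive equations for $\mathbb{E}[y_*]$ and $\operatorname{var}[y_*]$ can be implemented in a few lines. The snippet below is a minimal sketch, assuming a squared-exponential kernel with fixed hyperparameters; the paper's actual kernel and hyperparameter choices are not specified here:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of A (n x d) and B (m x d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * sq / length_scale**2)

def gp_predict(X, y, x_star, beta, noise_var=0.1, **kern):
    """Predictive mean and variance of y_* at a single test input x_star:
    E[y_*]   = beta^T x_* + k^T (K + sigma_n^2 I)^{-1} (y - b)
    var[y_*] = k_* - k^T (K + sigma_n^2 I)^{-1} k + sigma_n^2
    """
    N = X.shape[0]
    K = rbf_kernel(X, X, **kern) + noise_var * np.eye(N)
    k = rbf_kernel(X, x_star[None, :], **kern).ravel()      # k[i] = k(x_i, x_*)
    k_star = rbf_kernel(x_star[None, :], x_star[None, :], **kern)[0, 0]
    b = X @ beta                                            # b[i] = beta^T x_i
    alpha = np.linalg.solve(K, y - b)                       # (K + sigma^2 I)^{-1}(y - b)
    mean = beta @ x_star + k @ alpha
    var = k_star - k @ np.linalg.solve(K, k) + noise_var
    return mean, var
```

Note that if the training residuals $\mathbf{y} - \mathbf{b}$ are all zero, the prediction collapses to the linear term $\boldsymbol{\beta}^\top \mathbf{x}_*$, while the variance still reflects the distance of the test input from the training inputs.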

Estimating Moments of a Mixture Distribution

Let the random variable $y$ represent a mixture of two unimodal normal distributions, $y_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $y_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$, with mixing probabilities $w_1$ and $w_2$ such that $w_1 + w_2 = 1$—that is:

$$
y = w_1 y_1 + w_2 y_2
$$

Any moment of $y$ can be computed as (44):

$$
\mathbb{E}[y^k] = w_1 \mathbb{E}[y_1^k] + w_2 \mathbb{E}[y_2^k]
$$

which directly gives:

$$
\mathbb{E}[y] = w_1 \mu_1 + w_2 \mu_2
$$

The expression for the variance of $y$ can be derived as follows:

$$
\begin{aligned}
\operatorname{var}[y] &= \mathbb{E}[y^2] - (\mathbb{E}[y])^2 \\
&= w_1 \mathbb{E}[y_1^2] + w_2 \mathbb{E}[y_2^2] - (w_1\mu_1 + w_2\mu_2)^2 \\
&= w_1\left(\operatorname{var}[y_1] + \mu_1^2\right) + w_2\left(\operatorname{var}[y_2] + \mu_2^2\right) - (w_1\mu_1 + w_2\mu_2)^2 \\
&= w_1\sigma_1^2 + w_2\sigma_2^2 + w_1\mu_1^2 + w_2\mu_2^2 - w_1^2\mu_1^2 - w_2^2\mu_2^2 - 2w_1w_2\mu_1\mu_2 \\
&= w_1\sigma_1^2 + w_2\sigma_2^2 + w_1w_2\mu_1^2 + w_1w_2\mu_2^2 - 2w_1w_2\mu_1\mu_2 \\
&= w_1\sigma_1^2 + w_2\sigma_2^2 + w_1w_2(\mu_1 - \mu_2)^2
\end{aligned}
$$

The last two steps make use of the fact that $w_1 + w_2 = 1$, so that $w_1 - w_1^2 = w_1 w_2$ and $w_2 - w_2^2 = w_1 w_2$.
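These moment formulas are straightforward to verify numerically. The following sketch computes the mixture mean and variance both from the raw second moment and from the closed form derived above, and checks that the two agree:

```python
def mixture_moments(w1, mu1, var1, mu2, var2):
    """Mean and variance of a two-component Gaussian mixture, using
    E[y^k] = w1 E[y1^k] + w2 E[y2^k]."""
    w2 = 1.0 - w1
    mean = w1 * mu1 + w2 * mu2
    second = w1 * (var1 + mu1**2) + w2 * (var2 + mu2**2)    # E[y^2]
    var = second - mean**2                                  # var[y] = E[y^2] - (E[y])^2
    # closed form derived in the text:
    var_closed = w1 * var1 + w2 * var2 + w1 * w2 * (mu1 - mu2)**2
    assert abs(var - var_closed) < 1e-9
    return mean, var
```

For example, a mixture of $\mathcal{N}(1, 0.5)$ and $\mathcal{N}(-2, 2)$ with weights $0.3$ and $0.7$ has mean $-1.1$ and variance $0.15 + 1.4 + 0.21 \cdot 9 = 3.44$.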

Interpretation of Weights—Along the Dimensions of Poverty

This section complements Dimensions of Poverty—Interpretation of Weights and refers to Fig. S2. These interpretations are given for information purposes and are by no means indicators of causality.

Several features related to the presence of water in the commune (water bodies, water, mudflat soil, hydromorphic soil, and elevation) are positively correlated with water deprivation, whereas the opposite is observed for the other dimensions. One interpretation is that natural, nonpotable water is used for drinking in these areas. On the other hand, access to water is needed for irrigated agriculture, watering livestock, and fishing, all of which can increase income and quality of life, which explains the negative relationship for the other dimensions. Interestingly, the distance to a water tower is only weakly correlated with this deprivation; the proximity to a borehole would probably have been a more informative feature.

The food security (access) features (like distance to main roads and urban centers) are also prominent, stressing their importance for development. Millet price shows mixed behavior: depending on the dimension, its coefficient is sometimes positive and sometimes negative, without any evident explanation.

The effect of temperature is clear: the higher the maximum temperature and the temperature range, the higher the poverty. Temperature plays a role in crop growth, but it also degrades the quality of the living environment in areas that are hot during the day and cold at night.

The effect of precipitation is less obvious. The amount and timing of rainfall affect the availability of water, which is the main limiting factor for crop and forage production in the Sahel. The precipitation seasonality, which describes the period during which water is available, and the precipitation of the warmest quarter (a critical period) are, logically, negatively correlated with poverty. However, annual precipitation and the precipitation of the wettest month and quarter have positive coefficients (except for education deprivation). In other words, the more it rains in an area, the poorer it is, whereas intuition suggests the opposite. Looking more closely, it appears that several features related to agriculture (groundnut production, cassava production, rain-fed croplands) show the same pattern. We interpret these features as defining an environment suitable for agriculture, which is itself linked to the presence of rural communities that tend to be poorer than urban populations.

Discussion

Technological advances over the past decade have led to the development of communication devices (like phones) and sensors (like satellites and weather and ground sensors) that produce and store a myriad of data. In this work, we show how these novel data sources, characterized by their volume, variety, and associated uncertainty, can be used to generate accurate poverty maps.

We outline several challenges that arise in establishing relationships between auxiliary data sources (which are not collected to directly measure socioeconomic deprivations) and poverty. The first challenge is the varying spatial granularity at which the different datasets are available, which requires an aggregation mechanism to link them. CDR data are available for each subscriber, while environmental data have mixed spatial resolutions, from very accurate vector data to low-resolution (1-km) satellite imagery. Census data, on the other hand, are available for individuals or households, depending on the response variable. However, because individual information is anonymized in both the CDRs and the census data, there is no obvious way to link records across the two datasets. In this work, we localize individuals and/or households to their respective communes, or urban centers, using their census information (details in Materials and Methods). This lets us calculate commune-level deprivations. For CDRs, individuals are localized to their home antennas based on their most frequent night-time location. The CDR and environmental data are then aggregated to the commune level. Though we have taken the commune as the level of aggregation, the framework allows the same analysis at even finer spatial resolutions.
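For illustration, the home-antenna assignment step for CDRs can be sketched as follows. The record format and the night-time window used here are assumptions for the sketch, not the paper's precise definitions:

```python
from collections import Counter

# Illustrative night-time window: 7 pm through 6 am (an assumption,
# not the paper's exact definition).
NIGHT_HOURS = frozenset((19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6))

def home_antenna(call_records, night_hours=NIGHT_HOURS):
    """Assign each subscriber a home antenna: the antenna used most often
    during night-time activity. call_records is an iterable of
    (subscriber_id, antenna_id, hour_of_day) tuples."""
    counts = {}
    for user, antenna, hour in call_records:
        if hour in night_hours:
            counts.setdefault(user, Counter())[antenna] += 1
    # most_common(1) returns the single most frequent (antenna, count) pair
    return {u: c.most_common(1)[0][0] for u, c in counts.items()}
```

Once each subscriber has a home antenna, subscribers (and hence their CDR-derived features) can be aggregated to the commune containing that antenna.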

A key concern with using CDR data for population-level analyses is the selection bias arising from mobile phone ownership. In Senegal, however, there were 92.93 mobile phone subscriptions per 100 inhabitants in 2013, which implies that most of the population owns cell phones (24). A second concern is the bias arising from using data from only one provider; however, the provider of the data used here, Sonatel, held nearly 62% of the cell phone market in 2013 (25). A third concern is that some demographic subgroups, like children and the ultra-poor, are left out of the analysis when using only CDR data. Results may also be biased toward urban rather than rural regions because of factors like the lack of electricity in rural areas.

Here, we used two distinct types of environmental data. The first type comprises static natural/physical environment variables (like elevation and soil types) or long-term dynamic phenomena (like climate). The second type comprises human-induced aspects, like urban areas, roads, and access to facilities. Though the natural environment acts as a constraint on the design of poverty eradication plans, effective policies and sustainable approaches should be an integral part of policy planning. Environmental features derived from satellite images (nighttime lights, NDVI, etc.) have the potential to be computed in near real time to monitor the impact of shocks, such as natural hazards, armed conflicts, or crop pests, that can rapidly cause serious deprivations. For reliability, however, these variables need to be aggregated over a longer period, typically annually for nighttime lights and over the growing season for NDVI. OpenStreetMap (OSM) data, used here to map facilities and roads, are crowd-sourced and therefore have the (theoretical) potential to be updated in near real time, although this capability may be limited in African countries. Given these constraints, 1 y is probably the relevant period for consistent monitoring of poverty with our method (compared with 3–5 y for a detailed and costly census).

Another challenge is data availability. Environmental datasets are freely available to researchers and typically carry no privacy constraints, especially at the resolution analyzed here. CDR data, by contrast, are collected by commercial telecommunication entities and may be inaccessible to researchers because of sharing constraints between organizations. However, our methodology requires no raw data to be shared between data-owning entities; only the output predictions from each individual model and their associated uncertainties are combined at the final step.
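As an illustration of how per-source predictions might be fused without sharing raw data, the sketch below combines two Gaussian predictions as a two-component mixture, reusing the moment formulas derived in the supplementary derivation. Weighting each source by its inverse predictive variance is an illustrative choice, not necessarily the weighting used in the paper:

```python
def combine_predictions(mu1, var1, mu2, var2):
    """Fuse two model outputs (each a Gaussian mean/variance) as a
    two-component mixture. Only (mean, variance) pairs cross the boundary
    between data owners; no raw data are exchanged."""
    # Illustrative weighting: weight each source by its inverse variance.
    w1 = (1.0 / var1) / (1.0 / var1 + 1.0 / var2)
    w2 = 1.0 - w1
    mean = w1 * mu1 + w2 * mu2
    # Mixture variance: w1*s1^2 + w2*s2^2 + w1*w2*(mu1 - mu2)^2
    var = w1 * var1 + w2 * var2 + w1 * w2 * (mu1 - mu2) ** 2
    return mean, var
```

The mixture variance grows when the two sources disagree (through the $(\mu_1 - \mu_2)^2$ term), so a large fused uncertainty can itself flag communes where the data sources conflict.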

An important consideration is the number of features extracted from the data. Recent work (20) used four features—namely, call volume and mobile ownership per capita, nightlights, and population density—to estimate the MPI of sectors in Rwanda with a linear regression model. As a baseline for our model, we used the same features and model to predict MPI values at the commune level in Senegal, achieving a spatially cross-validated Pearson correlation of 0.84 with a significant P value (<0.0001) (see Table S8 for comparison). Although fewer features make the analysis computationally tractable, they offer no insight into other features that could be useful in understanding poverty. Linear models are also limited by their linearity assumption and sensitivity to outliers.

Table S8.

Comparative table showing how our model performs relative to using only nightlights and relative to a previous work (used as a baseline) that used only four features—namely, call volume and mobile ownership per capita, nightlights, and population density

An important advantage of our GPR model is that each predicted poverty value is associated with an uncertainty (generated by the model). This highlights the strength of confidence in the predictions and can be used as guidance by policy makers. Comparing these source-specific uncertainties can reveal which data hold a better signal for a specific prediction (see Fig. 4). We note that for predicting A, the predictions of CDRs and environment data are comparable for most of the communes. For predicting H, CDRs perform with lower uncertainties than environmental data. These variations may be attributed to multiple reasons, including resolution and concurrency of data, demographics and mobile penetration of the cellular provider, and spatial heterogeneity of poverty deprivations.

Fig. 4.

The uncertainty associated with each dataset, evidenced by the most accurate one (denoted as CDR and ENV), for the average intensity of poverty (A) (Left) and the prediction of the headcount of poverty (H) (Right).

Though we have discussed the methodology for predictions at the commune level, our predictions of MPI and associated dimensions can be successfully aggregated to coarser administrative units, if needed, for policy planning. Since we use global MPI as the poverty index, its limitations, as noted by global MPI researchers (26), are applicable to our study as well. In particular, global MPI does not include characteristics such as parents’ education, social norms and beliefs, empowerment, etc.

Additionally, it will be interesting to see how well this methodology can predict other indicators of deprivation and inequality, like the Gini index, at the microregional level. Apart from being useful in producing interim statistics between long cycles of censuses and surveys, such a methodology can also be extended to conflict zones or remote areas that are difficult to reach by census takers.

As described in the results, the interpretation of the model coefficients provides some insight into the dimensions of MPI. However, given the number of variables, this interpretation is complex and not necessarily straightforward to translate into policy intervention. Conversely, the MPI dimensions are well-known factors for which policy planning is feasible (26). As an illustration, Fig. S4 shows the highest predicted deprivation for each commune within each dimension.

Fig. S4.

The highest deprivation by commune as predicted by our model for each dimension of global MPI (from top to bottom: education, health, and standard of living).

Lastly, though the GPR model uncertainty is impacted by the bias and inaccuracy of each data source (quality of the soil type map, interpolation of climatic data, missing facilities, the mobile operator's market share), higher resolution and accuracy of the input data should improve the relevance and quality of the modeling.

Materials and Methods

Target Country.

Senegal is a sub-Saharan country that ranks 170 on the Human Development Index with a score of 0.466 and a population of 14.5 million (with 43.5% urban population) (27). As one of the poorest countries in the world, it has 52% of the population living in multidimensional poverty (27). On the other hand, there are 98.8 mobile phone subscriptions per 100 people (24). Senegal is composed of 14 coarsest administrative units called regions, which are further divided into 45 administrative units called departments. The finest level of administrative units is called a commune. There are 552 communes (121 as urban centers and 431 rural) (Fig. 1).

Data Sources.

CDRs.

A CDR consists of identifiers for the caller and callee, the antenna location of the caller, the time of the call, the duration of the call, and a flag indicating whether the record is a text or a call. A CDR is generated each time a call or text is placed. The data belong to the subscribers of Sonatel (Orange), the dominant telecom provider in Senegal. The data are anonymized and span the period from January 1 to December 31, 2013. They contain more than 9.54 million unique aliased mobile phone subscribers. The population of Senegal in 2013 was 14.13 million. Additionally, the geographical coordinates of the mobile antennas are known and are shown in Fig. 1.

Environmental Features.

Based on the literature, several environmental features that may have a relationship with poverty were explored (see Table S1). They are based on Geographical Information System (GIS) data, Earth Observation data, or weather stations.

Census.

The Agence Nationale de la Statistique et de la Demographie (ANSD), which is the National Statistics Office of Senegal, provided us with a 10% sample of the 2013 census [called RGPHAE (Recensement General de la Population de l’Habitat de l’Agriculture et de l’Elevage)]. The data are evenly sampled across the entire population of Senegal and are from 1.4 million individuals, spread across 150,000 households, characterizing information related to demographic statistics (mortality, fertility, migration, literacy, occupation, etc.), along with habitat features, such as type of roof, floor, access to drinking water, sanitation, and agriculture practices. The advantage of the census is that it represents important national statistics at the level of individuals. Brief statistics of the data sources are given in Table 1.

The mobile phone data used in this study can be obtained for replication purposes by contacting Zbigniew Smoreda ([email protected]).

Feature Extraction.

CDRs.

We have access to more than 11 billion mobile phone transactions involving calls and texts for a year in Senegal. Each time a call or text is placed, it is logged as a transaction. Missed, forwarded, and other undelivered calls were removed from the logs.

To extract important features that quantify the mobile use pattern of a subscriber, we focus on well-studied metrics capturing the individualistic, spatial, and temporal patterns of the subscriber (28–30). The individual aspects quantify the typical use pattern of an individual; metrics in this category include the number of active days, the number of contacts, the average call duration, percent nocturnal, and so forth. Spatial metrics quantify the typical movement pattern of an individual; examples include the radius of gyration, the entropy of antennas, and so forth. There are 43 core features (briefly described in Table S2), extracted using the Bandicoot toolbox (31). All features were calculated at monthly granularity, capturing the temporal aspect of a subscriber and resulting in 43 × 12 CDR-based features.

The second step is to localize each subscriber, $i$, to his or her home antenna. A home antenna, $h_i$, is calculated as the one from which the subscriber makes the most nocturnal calls (from 7 PM to 7 AM) during each month. We filtered out individuals who made fewer than five calls during each month and who were not active for at least half of the year within the range of their home antennas. This ensures that individuals are reliably allocated to their home antennas. After the filtering step, the sample contained 6.19 million individuals (65% of the original subscriber population).
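A minimal sketch of the home-antenna assignment for one subscriber-month might look as follows; the `(hour, antenna_id)` record layout, the function name, and the return convention are hypothetical.

```python
from collections import Counter

# 7 PM to 7 AM, as defined in the text
NOCTURNAL_HOURS = set(range(19, 24)) | set(range(0, 7))

def home_antenna(calls):
    """calls: list of (hour, antenna_id) tuples for one subscriber-month.
    Returns the antenna with the most nocturnal calls, or None when the
    subscriber made fewer than five calls that month (filtered out)."""
    if len(calls) < 5:
        return None
    nocturnal = Counter(a for h, a in calls if h in NOCTURNAL_HOURS)
    return nocturnal.most_common(1)[0][0] if nocturnal else None
```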

We then computed the average feature value for each antenna site by averaging the feature values of all individuals who consider that antenna their home:

$$m_a^{(f)} = \frac{1}{N_a} \sum_{i:\, h_i = a} m_i^{(f)} \qquad [9]$$

where $m_i^{(f)}$ is the $f$th feature value of individual $i$ and $N_a$ is the number of individuals whose home antenna is $a$.

Finally, we compute the feature value for each commune as the weighted average over all antennas whose Voronoi polygon intersects the commune boundary:

$$m_c^{(f)} = \frac{1}{\sum_a w_{c,a}} \sum_a w_{c,a}\, m_a^{(f)} \qquad [10]$$

The weight $w_{c,a} = \mathrm{Area}(c \cap a)/\mathrm{Area}(a)$ measures how much of the Voronoi cell of antenna $a$ falls within the boundary of commune $c$.
To study how well the Voronoi-based approach performs in assigning people to their communes, we correlated the commune population estimated by this approach with that calculated from the census. The Pearson's correlation is 0.85 with a P value of <0.00001, supporting the validity of our approach.
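Eqs. 9 and 10 can be sketched as follows; the dictionary-based data layouts are illustrative assumptions.

```python
# Sketch of Eqs. 9-10: per-antenna feature means, then area-weighted
# commune averages over intersecting Voronoi cells.
def antenna_means(feature_by_user, home):
    """Eq. 9: m_a = mean feature value over users with home antenna a.
    feature_by_user: {user_id: feature value}; home: {user_id: antenna_id}."""
    groups = {}
    for user, f in feature_by_user.items():
        groups.setdefault(home[user], []).append(f)
    return {a: sum(v) / len(v) for a, v in groups.items()}

def commune_value(m_a, weights):
    """Eq. 10: weights[a] = Area(c ∩ a) / Area(a) for each antenna a
    whose Voronoi cell intersects the commune."""
    total = sum(weights.values())
    return sum(w * m_a[a] for a, w in weights.items()) / total
```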

Environmental features.

In this study, we focus on three broad categories of environmental features: food security (divided into the availability and access components), economic activity, and access to services (see Table S1). These three categories cover most of the features that have been shown to be significantly related to poverty in the literature (see Table S3).

Food security is mainly described by agrometeorological measurements (temperature, precipitation, slope, elevation, soil type) that drive crop production, one of the most important inputs, along with livestock and fishing, to food availability in the country. Access to staple food, on the other hand, can be approximated by the average millet prices observed in the markets (retail prices in 56 local markets). Millet serves as the main staple food crop in the country, making it a potentially good indicator of poverty. In addition, proximity to main roads and urban centers was computed to describe the connectivity to major markets.

The economic activity corresponds to the intensity of urbanization. Among the studied features, the nighttime lights are the most frequently used to describe poverty using remote-sensing data (20). Moreover, a clear link between household wealth and the level of night light emissions has been shown before (32). The underlying hypothesis is that economic activity and urbanization are strong indicators of living standards.

Finally, the access to services can help to predict some of the individual indicators of poverty. In particular, the proximity to school, water towers, and hospitals can be used to determine the deprivation in education, water, and health, respectively.

The raw environmental data are available either in raster grid (at different spatial resolutions) or in vector format. As a first step, all vector data were converted into raster grid format. Then, all data layers were resampled (using the nearest neighbor approach) at a spatial resolution of 100 m. Pixel values falling within each commune’s boundary were averaged to give a unique value for that commune.

All environmental data are available at high spatial resolution, with the exception of crop production and millet prices (see Table S1 for the data sources). Millet prices were available in 56 local markets, potentially missing some of the local heterogeneity. Production estimation features were derived from the Direction de l'Analyse, de la Prévision et des Statistiques Agricoles (DAPSA) database. The granularity of these features is at the department level. Cultivated areas were masked using the 2005 1:100,000 Scale Senegal Land Cover Map produced by the Global Land Cover Network based on the GlobCover 2005 map (33), which is the most accurate map for Senegal (34). Since reliable information on the spatial distribution of each crop is unavailable, we assumed that crops were grown evenly within the cultivated areas of a given department. Therefore, the production of a department was distributed evenly among all of the 100-m pixels that fell within its cropland. This raster was then used to aggregate the production estimations by commune.
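The even-distribution assumption can be sketched as follows; the raster layers (`dept_id`, `cropland`, `commune_id`) are hypothetical stand-ins for the actual GIS data.

```python
import numpy as np

def distribute_production(dept_id, cropland, production_by_dept):
    """Spread each department's production total uniformly over its
    cropland pixels. dept_id: int raster; cropland: boolean mask."""
    out = np.zeros(dept_id.shape)
    for d, total in production_by_dept.items():
        mask = (dept_id == d) & cropland
        n = mask.sum()
        if n:
            out[mask] = total / n          # even share per cropland pixel
    return out

def aggregate_by_commune(pixel_values, commune_id):
    """Re-aggregate the per-pixel raster by commune boundary."""
    return {c: float(pixel_values[commune_id == c].sum())
            for c in np.unique(commune_id)}
```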

The Normalized Difference Vegetation Index (NDVI) is used as a proxy of potential agricultural production within a department. The NDVI, defined as the difference between the near-infrared and red reflectances normalized by their sum, is a useful yield proxy in regions where water or soil fertility is the main limiting factor, such as the Sahel (35, 36). For each pixel within cultivated areas, NDVI values above 0.2 during the growing season (July to November) were integrated (TNDVI), which limited the contribution of bare soil to the signal.
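A minimal sketch of the TNDVI computation for a single cultivated pixel, assuming the seasonal NDVI observations (July to November) have already been extracted; the function name is an assumption.

```python
import numpy as np

def tndvi(ndvi_series, threshold=0.2):
    """Integrate NDVI values above the bare-soil threshold over the
    growing season for one cultivated pixel.
    ndvi_series: sequence of NDVI observations for that season."""
    v = np.asarray(ndvi_series, dtype=float)
    return float(v[v > threshold].sum())   # values <= 0.2 treated as bare soil
```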

Model Training.

The unknown parameters of each source-specific model in Eq. 1 are as follows: the parameter $\beta$ of the linear component, the hyperparameters $\ell, \ell_s, \sigma_f^2$ of the kernel function, and the variance $\sigma_n^2$ of the error term. These are estimated by maximizing the marginalized likelihood of the target poverty values $\mathbf{y}$ in the training data. The marginalized likelihood is obtained by integrating the likelihood times the prior:

$$p(\mathbf{y}|\mathbf{X}) = \int p(\mathbf{y}|\mathbf{f}, \mathbf{X})\, p(\mathbf{f}|\mathbf{X})\, d\mathbf{f} \qquad [11]$$

where the matrix $\mathbf{X}$ contains the training input vectors as rows and $\mathbf{f}$ is a vector containing the latent function values for the inputs in $\mathbf{X}$.
The GP prior means that $p(\mathbf{f}|\mathbf{X}) \sim \mathcal{N}(\mathbf{0}, K)$, and the likelihood is Gaussian, that is, $p(\mathbf{y}|\mathbf{f}, \mathbf{X}) \sim \mathcal{N}(\mathbf{X}\beta + \mathbf{f}, \sigma_n^2 I)$. The integration of Eq. 11 yields the following marginalized log likelihood (15) of the training data:

$$\log p(\mathbf{y}|\mathbf{X}) = -\frac{1}{2}(\mathbf{y} - \mathbf{X}\beta)^\top (K + \sigma_n^2 I)^{-1}(\mathbf{y} - \mathbf{X}\beta) - \frac{1}{2}\log|K + \sigma_n^2 I| - \frac{N}{2}\log 2\pi \qquad [12]$$

where $N$ is the number of training examples.
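Eq. 12 translates directly into code. The sketch below assumes the kernel matrix $K$ has already been evaluated from its hyperparameters; in practice, one would use a Cholesky factorization for numerical stability.

```python
import numpy as np

def gp_log_marginal_likelihood(X, y, K, beta, sigma_n2):
    """Eq. 12: log marginal likelihood of a GP with linear mean X @ beta,
    kernel matrix K, and noise variance sigma_n2."""
    N = len(y)
    C = K + sigma_n2 * np.eye(N)           # K + sigma_n^2 I
    r = y - X @ beta                       # residual of the linear mean
    _, logdet = np.linalg.slogdet(C)       # log |K + sigma_n^2 I|
    return float(-0.5 * r @ np.linalg.solve(C, r)
                 - 0.5 * logdet
                 - 0.5 * N * np.log(2 * np.pi))
```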

To regularize the coefficients in $\beta$, we apply elastic net regularization to the marginalized log likelihood to obtain the following objective function:

$$J(\beta, \ell, \ell_s, \sigma_n^2, \sigma_f^2) = \log p(\mathbf{y}|\mathbf{X}) - \left(\alpha\lambda \|\beta\|_2^2 + (1 - \alpha)\lambda \|\beta\|_1\right) \qquad [13]$$

The function $J$ is maximized to estimate the parameters and hyperparameters using conjugate gradient descent (37).

All code used to replicate the results can be obtained by writing to the corresponding author.

Regularization.

Regularization techniques, such as those used in Lasso (38) or Ridge regression (39), are often used to improve model performance, especially when the data contain several irrelevant features. The $L_2$ penalty, imposed by Ridge regression, shrinks the regression coefficients to avoid overfitting. On the other hand, the $L_1$ penalty imposed by Lasso forces the coefficients to be sparse, thereby providing feature selection. However, neither of the two regularization methods has been found to universally dominate the other (38). For instance, in the presence of groups of correlated features, Lasso tends to select only one feature within each group, which leads to poor interpretability of the estimated coefficients. Elastic net regularization (17) is a weighted addition of the $L_1$ and $L_2$ penalties and combines the strengths of both Lasso and Ridge regression. It is known to select a greater number of influential features than Lasso and has a lower false-positive rate than Ridge regression.

We used elastic net regularization to penalize the complexity of the solution and to avoid overfitting on the limited training dataset. The elastic net penalty is computed as:

$$\alpha\lambda \|\beta\|_2^2 + (1 - \alpha)\lambda \|\beta\|_1 \qquad [14]$$

Our empirical results show that elastic net regularization yields better prediction accuracy than ordinary least squares, Ridge, and Lasso regression.
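A direct transcription of Eq. 14; note that this paper's convention puts $\alpha$ on the $L_2$ term, whereas some libraries weight the $L_1$ term instead (e.g., scikit-learn's `l1_ratio`).

```python
def elastic_net_penalty(beta, lam, alpha):
    """Eq. 14: alpha weights the L2 (Ridge) term, (1 - alpha) the L1 (Lasso) term."""
    l2 = sum(b * b for b in beta)          # ||beta||_2^2
    l1 = sum(abs(b) for b in beta)         # ||beta||_1
    return alpha * lam * l2 + (1 - alpha) * lam * l1
```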

Model Validation.

This section details the steps followed to validate our model, namely creating commune-level poverty statistics from census data and methodology for spatial CV.

Creating commune poverty statistics from census.

The 10% sample of the 2013 RGPHAE census, used here, has survey responses for 150,000 households and 1.4 million individuals pertaining to their socioeconomic indicators (literacy, birth and death in the family, etc.) and habitat (type of house, access to electricity and drinking water, etc.). Some survey responses are individualistic (like literacy and profession), while others are associated with the entire household (like type of roof, sanitation, electricity).

The first step is to assign the individuals to their respective households using information from the fields in the census. The second step is to calculate per-household deprivations in the poverty indicators of interest. Global MPI computation (26) requires deprivations along three dimensions (with 10 indicators)—namely, health (child mortality, nutrition), education (child school attendance, years of schooling), and standard of living (electricity, sanitation, drinking water, flooring, cooking fuel, assets).

We follow a procedure similar to the widely used Alkire–Foster methodology for computing MPI (40). First, we create a deprivation vector $depvec_{i,d}$ corresponding to each household $i$ in poverty indicators $d = 1, \ldots, D$. Each vector entry is 1 if $y_{i,d} \leq z_d$, where $y_{i,d}$ is the achievement of household $i$ in indicator $d$ and $z_d$ is the cutoff score in indicator $d$, and 0 otherwise. A value of 0 for $depvec_{i,d}$ implies nondeprivation of the household in that particular indicator. For the cutoff scores of the different indicators, see Table S7.
We aggregate all households that are deprived in a particular indicator, for each commune, and divide by the total number of households in that commune. This score gives the proportion of households deprived in a particular indicator within a commune.

Since MPI is a multiplicative combination of H and A, that is, $\mathrm{MPI} = \mathrm{H} \times \mathrm{A}$, we first calculate H and A. For H, we introduce a weight, $w_d$, for each indicator $d$. For each household, we compute a weighted deprivation score, $c_i = \sum_{d=1}^{D} w_d\, depvec_{i,d}$. The weights $w_d$ are assigned as follows: each of the education- and health-related indicators is given a weight of $\frac{1}{6}$, while each of the six standard of living indicators is given a weight of $\frac{1}{18}$. Thus, each dimension has a weight of $\frac{1}{3}$.

$H_j$, the relative headcount of poor households in commune $j$, is calculated as:

$$H_j = \frac{1}{N_j} \sum_{i=1}^{N_j} I(c_i > \theta) \qquad [15]$$

where $\theta$ is a cutoff, whose higher values mean a higher cutoff for household achievement; $I(c_i > \theta)$ is the indicator function; and $N_j$ is the total number of households in the $j$th commune.

To calculate A, we count only the poor households, and their deprivations, as follows:

$$A_j = \frac{1}{\sum_{i=1}^{N_j} I(c_i > \theta)} \sum_{i=1}^{N_j} I(c_i > \theta)\, c_i \qquad [16]$$

The value of the threshold $\theta$ is taken as 0.3. We varied $\theta$ from 0.2 to 0.75, and the H and A values obtained in each run were correlated with region-level H and A available from the University of Oxford's MPI calculation [Oxford Poverty & Human Development Initiative (OPHI)]. The correlations were stable and peaked at 0.3, which is also the threshold value used by OPHI in its calculations.
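Eqs. 15 and 16 for a single commune can be sketched as follows, assuming a binary household-by-indicator deprivation matrix; the indicator ordering in the weight vector is an assumption noted in the comments.

```python
import numpy as np

# Alkire-Foster sketch for one commune: binary deprivation matrix ->
# weighted scores c_i -> headcount H_j (Eq. 15) and intensity A_j (Eq. 16).
# Assumed column order: 2 health + 2 education (weight 1/6 each),
# then 6 standard-of-living indicators (weight 1/18 each).
WEIGHTS = np.array([1 / 6] * 4 + [1 / 18] * 6)

def commune_h_a(depvec, theta=0.3):
    """depvec: (n_households, 10) matrix of 0/1 deprivations."""
    c = depvec @ WEIGHTS                   # weighted deprivation score per household
    poor = c > theta                       # indicator I(c_i > theta)
    H = poor.mean()                        # headcount ratio, Eq. 15
    A = c[poor].mean() if poor.any() else 0.0   # average intensity, Eq. 16
    return float(H), float(A)
```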

Spatial CV.

To measure the extrapolation capacity of the model on out-of-sample data, spatial CV techniques, in which training and evaluation sets are sampled from geographically distinct regions, are more robust than random splits (18, 41). The following spatial CV strategy was adopted: For each CV run, we first randomly sampled a region $r$ from the set of 14 regions and then randomly sampled a commune $c$ belonging to $r$. All communes that lie within distance $d$ of commune $c$ are included in the training dataset. The remaining communes form the evaluation dataset.

This strategy ensures that communes from all regions of Senegal are represented in the training and evaluation datasets during CV. To ensure that the training dataset has enough examples, we forced at least 40% of the communes (225) to be included in the training dataset. To achieve this, $d$ is initially set to 100 km and is increased in steps of 50 km until the size of the training dataset meets the threshold. CV is repeated 250 times. We report the mean predictive performance (using Pearson's and Spearman's correlations and RMSE values) on the evaluation dataset, along with the SD across runs.
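One split of this spatial CV procedure can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: commune positions are assumed to be planar coordinates in km, and the coordinates and region labels in the usage example are randomly generated stand-ins.

```python
import numpy as np

def spatial_cv_split(coords, regions, rng, min_frac=0.4):
    """One spatial CV split: sample a random region, then a random seed
    commune within it; train on all communes within distance d of the
    seed, growing d from 100 km in 50 km steps until at least min_frac
    of the communes are in the training set."""
    n = len(coords)
    r = rng.choice(np.unique(regions))                    # random region
    seed = rng.choice(np.flatnonzero(regions == r))       # random commune in r
    dist = np.linalg.norm(coords - coords[seed], axis=1)  # distances in km
    d = 100.0
    train = dist <= d
    while train.sum() < min_frac * n:
        d += 50.0
        train = dist <= d
    return np.flatnonzero(train), np.flatnonzero(~train)

# Hypothetical inputs: 552 communes in a ~400 km square, 14 regions
rng = np.random.default_rng(0)
coords = rng.uniform(0, 400, size=(552, 2))
regions = rng.integers(0, 14, size=552)
train_idx, eval_idx = spatial_cv_split(coords, regions, rng)
```

Repeating this 250 times with fresh random draws and averaging the evaluation metrics reproduces the reported CV protocol.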

Understanding Model Uncertainty

The predictive variance associated with the GP model, as calculated using Eq. 4, indicates the model uncertainty for a test target. The variance does not depend on the observed target values, only on the inputs. The variance at a given test commune is directly related to how many similar communes (in terms of the CDR, environmental, and spatial features) are available in the training data: if few training feature vectors lie in the proximity of the test commune's feature vector, the GP model yields a higher predictive variance. This could explain the higher variance for MPI predictions observed for rural communes compared with urban communes.
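This input-density effect can be demonstrated with a small example. The sketch below uses scikit-learn's GaussianProcessRegressor with a fixed RBF kernel rather than the authors' own GP implementation, and the 1-D training inputs are hypothetical: dense near 0-2 (analogous to urban communes with many similar neighbors in feature space) and sparse elsewhere.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical 1-D features: 20 training points packed in [0, 2],
# plus a single isolated point at 10
X_train = np.concatenate([np.linspace(0, 2, 20), [10.0]]).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

# Fixed kernel (optimizer=None) so the variance pattern is easy to read
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              optimizer=None, alpha=1e-3)
gp.fit(X_train, y_train)

# Predictive SD is small where training inputs are dense,
# large where they are sparse
_, sd_dense = gp.predict(np.array([[1.0]]), return_std=True)
_, sd_sparse = gp.predict(np.array([[6.0]]), return_std=True)
```

Here `sd_sparse` exceeds `sd_dense`, mirroring the higher predictive variance reported for rural communes, whose feature vectors have fewer close matches in the training data.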

Acknowledgments

We thank Nicolas de Cordes and Stephanie de Prevoisin (Orange) and the other winning teams of the Data 4 Development Challenge (D4D), and we thank the two anonymous reviewers for their useful comments. We thank Orange Sonatel, Senegal, and Orange Laboratories, Paris, for providing the raw CDRs and the National Statistics Office of Senegal (ANSD) for the census data. N.P. and D.C.J. are both funded by Bill & Melinda Gates Foundation Grant OPP1114791. D.C.J.'s work is also partly funded by the Belgian National Fund for Scientific Research through Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture Grant 5211815F.

Footnotes

  • 1N.P. and D.C.J. contributed equally to this work.

  • 2To whom correspondence should be addressed. Email: neetipok{at}buffalo.edu.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

References

