Estimating mobility using sparse data: Application to human genetic variation

1. aResearch Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom;
2. bResearch Laboratory for Archaeology & the History of Art, University of Oxford, Oxford OX1 3QY, United Kingdom;
3. cDepartment of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom;
4. dManchester Institute of Biotechnology, School of Earth and Environmental Sciences, University of Manchester, Manchester M1 7DN, United Kingdom;
5. eLeverhulme Centre for Human Evolutionary Studies, Department of Archaeology and Anthropology, University of Cambridge, Cambridge CB2 1QH, United Kingdom;
6. fCentre for Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, London WC1E 6BT, United Kingdom;
7. gDepartment of Medical and Molecular Genetics, King’s College London, London SE1 9RT, United Kingdom
1. Edited by Eske Willerslev, University of Copenhagen, Copenhagen, Denmark, and approved September 28, 2017 (received for review March 3, 2017)

Significance

Migratory activity is a critical factor in shaping processes of biological and cultural change through time. We introduce a method to estimate changes in underlying migratory activity that can be applied to genetic, morphological, or cultural data and is well-suited to samples that are sparsely distributed in space and through time. By applying this method to ancient genome data, we infer a number of changes in human mobility in Western Eurasia, including higher mobility in pre- than post-Last Glacial Maximum hunter–gatherers, and oscillations in Holocene mobility with peaks centering on the Neolithic transition and the beginnings of the Bronze Age and the Late Iron Age.

Abstract

Mobility is one of the most important processes shaping spatiotemporal patterns of variation in genetic, morphological, and cultural traits. However, current approaches for inferring past migration episodes in the fields of archaeology and population genetics lack either temporal resolution or formal quantification of the underlying mobility, are poorly suited to spatially and temporally sparsely sampled data, and permit only limited systematic comparison between different time periods or geographic regions. Here we present an estimator of past mobility that addresses these issues by explicitly linking trait differentiation in space and time. We demonstrate the efficacy of this estimator using spatiotemporally explicit simulations and apply it to a large set of ancient genomic data from Western Eurasia. We identify a sequence of changes in human mobility from the Late Pleistocene to the Iron Age. We find that mobility among European Holocene farmers was significantly higher than among European hunter–gatherers both pre- and postdating the Last Glacial Maximum. We also infer that this Holocene rise in mobility occurred in at least three distinct stages: the first centering on the well-known population expansion at the beginning of the Neolithic, and the second and third centering on the beginning of the Bronze Age and the late Iron Age, respectively. These findings suggest a strong link between technological change and human mobility in Holocene Western Eurasia and demonstrate the utility of this framework for exploring changes in mobility through space and time.

One of the major goals of population history inference is to assess the role played by past mobility in shaping patterns of genetic, phenotypic, and cultural variation. It is well recognized that the past movement of people shapes geographic patterns of genetic variation (1) and the subsequent ecological and evolutionary properties of populations (2). This is due to the fact that gene flow changes allele frequencies, shapes genetic drift, and can affect (3) or even mimic (4) natural selection processes. It is also recognized that migration activity can influence cultural evolutionary processes (5, 6). However, despite the general agreement that mobility has played an important role in shaping past and present patterns of genetic, phenotypic, and cultural variation among humans, relatively little is known about its temporal and geographic variation in the past (7).

Inferring past mobility is challenged by the sparseness and unevenness of sampling in time and space. As a result, studies of prehistorical mobility are typically limited to descriptive approaches, where major attested migration episodes or events are used as a proxy for general mobility. Data sources such as stable isotopes have enabled some quantification of mobility by allowing researchers to identify individuals within an archaeological community who have migrated into a region during their lifetime (e.g., ref. 8). The underlying logic behind this approach is that differences between isotope ratios—particularly strontium—within organisms reflect the isotope ratios acquired from the local environment (as a result of variation in underlying geology) (9). However, it is challenging to extrapolate within-community mobility rates to migration rates across larger geographic regions or over long time periods. Furthermore, isoscapes are still often poorly characterized, and isotope ratios can be relatively constant over large areas (9, 10) and so are not always informative.

Most standard population genetic tools used for quantifying population structure, such as ADMIXTURE analysis (11), f statistics (12), and TREEMIX (11), are poorly suited for estimating underlying mobility change through time. In classical population genetic analysis, estimators of migration rates between hypothesized subpopulations have been developed, including statistics such as FST (13). Some of these statistics have also been applied to large sets of quantitative trait data, such as variation in craniometric morphology (e.g., refs. 14 and 15). However, such statistics quantify differentiation among a set of contemporaneous samples and only inform on migration rates under idealized demographic scenarios—such as gene flow between discrete subpopulations—and are also influenced by other factors, such as subpopulation split times and population size fluctuations. Furthermore, these estimators reflect past migration between hypothesized subpopulations over large periods of time and therefore lack temporal resolution. Some researchers interpret the estimated ages and geographic distribution of clades on a phylogenetic tree of uniparental genetic systems (mtDNA or the Y chromosome) as proxies for the rate of spread of populations (e.g., ref. 16). However, such approaches do not permit a formal quantification of mobility and have been criticized as a tool for demographic inference (17?19).

Thus, existing methods allow us to identify migration episodes to some extent but lack the temporal resolution and formal quantification of underlying mobility, are poorly suited to spatially and temporarily sparsely sampled data, and do not permit systematic comparison between different time periods or geographic regions. To overcome these problems, we present an estimator of past mobility that is particularly suited to sparsely distributed morphological, cultural, or genetic variation data and provide a first application to a large set of genome-wide data from ancient individuals from across Western Eurasia. We define mobility as the average distance moved by entities in a given time period.

Estimating Past Migration Rates

Under a general model of identity-by-descent with modification and isolation by distance (IBD) (20, 21), trait (genetic, morphological, or cultural) differences between any two entities (individuals or populations) increase monotonically as a function of both the temporal and spatial distance between them. We therefore expect that trait differences between entities correlate with temporal as well as spatial distances. However, the extent to which spatial and temporal differences explain observed trait variation depends on the level of spatial population structure and therefore on the level of mobility. If mobility was low (i.e., strong spatial structure), then we would expect differences between entities to be more strongly correlated with space, relative to time, whereas if mobility was high we would expect time to explain a relatively larger proportion of differences between entities (because of the homogenizing effects of high mobility across space).

Given that both spatial and temporal distances are expected to correlate with trait differences among entities, a matrix combining both spatial and temporal distance information should give a stronger correlation than either matrix alone (extra correlation, EC). However, since spatial and temporal distances are measured in different units (e.g., kilometers and years), combining them requires a scaling factor (S). Here, we show that the scaling factor value (Smax) that maximizes the correlation between a trait difference matrix and a Euclidian distance matrix combining the spatial and temporal distance matrices provides an estimator of mobility over the period and region covered by the data (see Materials and Methods and Fig. 1). For convenience, we use a geometric interpretation of the scaling factor Smax as an angle, α, in the plane defined by the spatial and temporal distances [α = atan(Smax), illustrated in Fig. 1, Inset; see Materials and Methods].

To test the reliability and the robustness of Smax in recovering information about past mobility, we simulated data under a spatiotemporally explicit 2D model, which includes simple population dynamics with population growth, density dependence, and mobility (modeled as a Gaussian random walk), and generated variation data under different mobility parameter values (see Materials and Methods). We assessed the ability of Smax to infer simulated mobility values by correlation across simulations. We found a strong, positive linear relationship between the simulated average migration distance (i.e., mobility) and values of Smax (Fig. 2, R2 = 0.8), thus demonstrating the utility of this statistic as an estimator for relative mobility. However, for this result to hold, it is important that the trait differences are generated under an approximately constant mutation rate and vary neutrally within a population.

View larger version:
Fig. 2.

Correlation between simulated movement rate (dmig) and estimated scaling factor (Smax). Each black circle represents a single simulation. The colors correspond to the density of circles (see the color scale bar). The black line shows the best linear fit between dmig and Smax (R2 = 0.8), demonstrating that the scaling factor captures the underlying mobility in the simulated world.

Migration Rates Among Pleistocene Hunter–Gatherers and Early Farmers

Recent advances in sequencing technologies have allowed genomic data retrieval from a large sample of past individuals (e.g., refs. 22???26). Although these studies have not explicitly quantified underlying mobility in the past, they have suggested several periods of large-scale population turnover in Western Eurasia.

Given that the Smax statistic is able to recover information on past mobility in simulated data, we applied the method to a sample (n = 329) of previously published genome-wide genotype data covering a time period from the beginning of the Upper Paleolithic to the Iron Age to explore changes in past human mobility in Western Eurasia (see Materials and Methods). We also constructed nonparametric confidence intervals to account for date and sampling uncertainty and estimated P values for the Smax statistic by permutation under the null hypothesis of no IBD in space and time, which allowed us to quantify the robustness of our estimates and identify time periods during which data are too sparse for the Smax statistic to be informative (see Material and Methods). First, we explored the extent to which mobility differed between pre- and post-Last Glacial Maximum (LGM) hunter–gatherers (Fig. 3). We found the average (median) mobility rates to be higher (α = 18.1; 95% CI: 14.9–87.7; P = 0.08) among pre-LGM hunter–gatherers, temporally ranging from 37,000–26,000 y ago, compared with post-LGM hunter–gatherers (α = 9.9; 95% CI: 9.5–10.9; P = 0.03), temporally ranging from 19,000–5,000 y ago. We also estimated mobility rates for Holocene farmers, temporally ranging from 10,000–1,000 y ago, and found even higher values (α = 34.8; 95% CI: 33.9–35.3; P < 0.0001) than for both hunter–gatherer groups (see SI Appendix, Table S2 for full results).

View larger version:
Fig. 3.

Boxplot showing the mobility rate estimates (from jackknifing and date resampling) among pre-LGM hunter–gatherers temporally ranging from 37,000–26,000 y ago (n = 19), post-LGM hunter–gatherers temporally ranging from 19,000–5,000 y ago (n = 47), and Holocene farmers temporally ranging from 10,000–1,000 y ago (n = 263). The black solid lines are the medians of the distributions. The boxes represent the interquartile ranges, and the whiskers show the spans of the distributions.

Because Holocene western Eurasia is particularly well sampled for ancient genomic DNA, we performed a sliding window analysis to explore changes in mobility over the last 14,000 y in more detail (Fig. 4), using 4,000 y-wide windows to ensure sufficient temporal signal within each window. We inferred a reduction in mobility rate between 14,000 and 9,000 y ago, before the start of the Neolithic transition (Fig. 4A). However, throughout most of this period, the P values are not significant (Fig. 4B). Because of the small sample size in the windows covering this time period (Fig. 4C), there is no significant correlation between genetic and temporal distances, and as a result, we do not observe any EC and so lack power to estimate mobility (see Materials and Methods and SI Appendix, Fig. S3). We consequently treat the inferred decline in mobility in this time range with caution. Second, we infer a substantial increase in mobility centered on the beginning of the Neolithic, with a peak centered around 7,500 y ago (Fig. 4A). Notably, the inferred mobility rate does not remain at this level throughout the Holocene. Instead, we infer a Late Neolithic drop in mobility before a second increase centered on the beginning of the Bronze Age, around 5,000 y ago, then a decline in the Late Bronze Age and Early Iron Age, before a final increase centered on the Late Iron Age (Fig. 4A and SI Appendix, Table S3 for full results for each window).

View larger version:
Fig. 4.

Estimation of mobility through time from empirical data. (A) Relative mobility rate estimates in Western Eurasia over the last 14,000 y, using a 4,000-y sliding window (121 windows). The solid black line represents the mean α value from 10,000 date resampled iterations; the colored area represents the 95% confidence intervals of the jackknife distribution. (B) P values for each 4,000-y window under the null hypothesis of no EC, constructed by calculating the proportion of permuted datasets where the calculated EC value was as high or higher than the average EC value from the empirical dataset (see Material and Methods). The red dotted line represents the level above which 5% or more of the permuted datasets result in EC values as high or higher than the empirical dataset. (C) Sample size for each 4,000-y window, averaged over 10,000 date resampled iterations.

To validate the efficacy of our method to identify changes in migration rate on the time scales found in the empirical dataset (Fig. 4), we modified our simulations to represent a population experiencing two changes in migration rate, resulting in three episodes of constant migration rate. We observe a good correspondence between changes in Smax and the simulated migration rate (SI Appendix, Fig. S4), supporting our interpretation of the empirical results in Fig. 4.

Finally, we compare the performance of the Smax statistic to a simple IBD through time approach, where (the slope of) the linear relation between the genetic distances and geographic distances is used as an indicator of the level of past migratory activity: High levels of migration correspond to shallow IBD patterns. We observe a trend of decreasing spatial structure, consistent with the cumulative effects of a series of high migratory activity episodes over this period. However, this approach fails to recover the timing of those changes in migratory activity in specific periods (SI Appendix, Fig. S5). Our method overcomes this lack of power to identify changes in migratory activity by explicitly considering the temporal dimension of the data.

Discussion

Through spatiotemporally explicit simulations, we have shown that the Smax statistic can be used as a reliable proxy for the underlying relative mobility of individuals within a given time period and geographic region. Because our statistic is based on correlations, it is well suited for analyzing data from archeological and paleontological contexts, where preservation can vary significantly across different geographical areas and temporal ranges, and samples are commonly sparsely distributed across space and time. Nevertheless, in the extreme case of just a small number of sites from different geographic locations or temporal periods, spurious estimates of migratory activity may arise. The permutation procedure introduced in this study can be used to identify when the Smax estimator is uninformative. We choose only to consider relative changes in the value of the Smax estimator and do not attempt to interpret its values in absolute terms. This is because, while our intuition is that mutation rate and population size will not affect the relationship between absolute values of the Smax estimator and the true mobility rate, we admit the possibility that other factors may. Selection in response to ecological and environmental factors could also reduce the utility of the Smax statistic as a proxy for mobility because local selection can create confounding spatial or temporal population structures. However, this is a common problem for any analysis assuming neutral evolution and can be dealt with by focusing on putatively neutrally varying traits or loci.

The Smax statistic offers a robust alternative to existing methods for the quantification of IBD patterns in temporally heterogeneous datasets. In population genetics, correlations between trait differences and geographic distances are commonly used to infer past population structure and connectivity between populations (27). In such approaches, temporal structures in data are usually either ignored or mathematically controlled for by using partial least squares (e.g., ref. 28), but both of these practices have been criticized (29?31), and we show that while such approaches can inform the cumulative effects of migration in terms of structure reduction, they are unable to recover temporal changes in migratory activity. Partial least squares analysis assumes that the effect of time on genetic differences can be decoupled from the effect of space, which is generally not the case. We avoid this problem by integrating space and time into a single distance measure. Finally, because the statistic contains information about both spatial and temporal structuring of the populations, it can be used as a potentially informative summary statistic in quantitative model-fitting frameworks such as Approximate Bayesian Computation (32).

Using the Smax statistic on ancient genomic data, we identified a sequence of changes in human mobility from the late Pleistocene to the Iron Age in western Eurasia. We find some support for reduced mobility in west Eurasian post-LGM hunter–gatherers compared with pre-LGM populations. The reasons for this result are, as yet, unclear, although possible explanations include reduced resource availability in pre-LGM Western Eurasia, requiring larger foraging ranges compared with post-LGM conditions (33, 34) and/or residual post-LGM population structure following recolonization of northern latitudes from LGM southern refugia (35). Using a sliding window analysis, we find some suggestion of a decline in post-LGM hunter–gatherer mobility leading up to the Neolithic transition. However, we caution against overinterpretation of this result as the estimated P values for the Smax statistic under the null hypothesis of no EC are mostly not significant. We find strong support for a rise in mobility during the Neolithic transition in western Eurasia, likely corresponding to a well-established demic expansion of farmers, originating in the Middle East and resulting in the spread of farming technologies throughout most of Western Eurasia (36?38). This is followed by an inferred mobility decline toward the end of the Neolithic, possibly related to the terminal phase of the spread of farming across most of Western Eurasia, and increased sedentism (39, 40). We also find strong support for a rise in mobility centered on the onset of the Bronze Age. From previous ancient DNA studies, this period has been associated with large-scale migration of Eurasian steppe populations, particularly those related to the Yamnaya culture, into Central and Northern Europe (22, 23). However, the emergence of the first civilizations and the concomitant establishment of far-reaching trade networks, as well as technological innovations such as horse-based transport (41), could also contribute to this increase in mobility (42). Finally, our sliding window analysis indicates a mobility reduction centered on the Late Bronze Age and Early Iron Age, starting around 3,000 y ago, before a final increase centered on the Late Iron Age in Western Eurasia (Fig. 4A). One possible explanation for this pattern is a significant increase in trade and warfare during that period (43?45). Overall, our analysis suggests a strong link between technological change and human mobility in Holocene Western Eurasia. However, it should be noted that we have used wide windows (4,000 y), which necessarily reduces chronological resolution.

A major strength of our method is its applicability to any set of neutrally evolving heritable traits where differences between individuals can be quantified and increase monotonically with geographic distance and temporal difference. This means that, in principle, the Smax statistics could allow the quantification of migratory activity in temporal and environmental contexts where obtaining ancient genetic data is not feasible, by using phenotypic data such as variation in cranial morphology, which has been shown to fit the pattern of neutral evolution and closely follow the patterns observed in analyses of neutral genetic data in humans (46, 47). Another exciting possibility is the quantification of movement based on cultural variation data, provided that appropriate near-neutral traits are used (e.g., refs. 48?50). While it should not be assumed that the movement of artifacts always coincides with that of people, contrasting measures of movement based on genetics and cultural artifacts obtained under the same conceptual framework would allow quantification of demic vs. cultural diffusion processes. This might permit identification periods and regions where genetic, phenotypic, and cultural processes are coupled, or decoupled. Given its robustness and flexibility, we anticipate that the Smax estimator will be applicable to a wide range of genetic, phenotypic, and cultural traits, allowing the quantification of mobility in a wide variety of scenarios in which this type of analysis has previously been challenging.

Materials and Methods

The proposed migration rate estimator, Smax, is the value of a scaling factor combining spatial and temporal distance matrices into a single distance matrix that maximizes its correlation with a matrix of trait distances. To estimate that value, the geographical, temporal, and trait distance matrices are calculated as described below.

Geographic, Temporal, and Trait Distances.

The geographic distance between all sample pairs was calculated in kilometers using the Haversine Eq. (51) to account for the curvature of the Earth as follows:<mml:math display="block"><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mi>r</mml:mi><mml:mo>?</mml:mo><mml:mi>arcsin</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msqrt><mml:mrow><mml:mi>sin</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>φ</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>φ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mi>cos</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>φ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>cos</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>φ</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>sin</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>λ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>λ</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math>Gij=2r?arcsin(sin((φj?φi)/2)2+cos(φi)cos(φj)sin((λi?λj)/2)2),[1]where G is the distance in kilometers between individuals i and j; φi and φj are the latitude coordinates of individuals i and j, respectively; λi and λj are the longitude coordinates of individuals i and j, respectively; and r is the radius of the earth in kilometers.

Temporal distances between samples were calculated as time in years between sample pairs. Previously reported date ranges based on stratigraphy or direct radiocarbon dating were used for all individuals (SI Appendix, Table S1). In all analyses, sample dates were randomly drawn from a uniform distribution corresponding to the upper and lower bounds of a time period for a given specimen.

Genetic distances were calculated as pairwise proportion of alleles that are not identical by state (pairwise heterozygosity), using the function ibs.dist from the Bioconductor package SNPstats v.1.18.0 (52) in the R statistical analysis environment v3.2.2 (53).

The Smax Estimator.

To consider the full range of scaling factors on a finite interval, we choose to represent S as the tangent of an angle α between 0° and 90°, where α = 0 corresponds to S = 0 (geographic variation alone explains the observed trait distances between entities) and α = π/2 corresponds to S = ∞ (temporal variation alone explains the observed trait distances between entities). Formally, the time–space product matrix (D) was calculated as follows:<mml:math display="block"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>S</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math>Dij=Gij2+(STij)2,[2]where i and j are the specimens considered, D is the time–space product matrix, G is the geographical distance matrix, T the temporal distance matrix (given by the difference in age of the samples), and S is the scaling factor [S = tan(α)].

To find the scaling factor, Smax, that maximizes the correlation between the trait distance matrix and D, the time–space product matrix, we calculated the Pearson correlation coefficient between these matrices for 200 (500 for the simulated data) scaling factor values (Fig. 1). The scaling factor value in the time–space product matrix that produced the strongest correlation with the trait distance matrix is recorded as Smax, the mobility estimator.

Simulation Tests.

The reliability and the robustness of the Smax statistic in recovering information about past mobility was explored using a spatiotemporally explicit simulation model. The simulation world consists of a grid of 8,000 by 8,000 demes. Each simulation starts with one entity placed in a randomly chosen deme and lasts 20,000 generations. The model simulated exponential population growth to a carrying capacity of 10,000 entities, followed by a stochastic birth–death process (54) mobility, and trait mutation. We generated spatiotemporal trait variation data under different mobility parameter values using the same Smax estimation protocols as described above for each dataset. A total of 10,000 independent replicates of the simulations and analyses were generated, and the utility of the Smax statistic in recovering information about mobility was assessed by correlation.

The migratory process was modeled as Gaussian random walks: In each generation, each entity moves independently in the x and y directions by distances picked randomly from a normal distribution with mean = 0 and SD = σmig. This corresponds to the average distance moved in a single step (dmig) of <mml:math><mml:mrow><mml:msqrt><mml:mrow><mml:mrow><mml:mi>π</mml:mi><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:math>π/2 σmig = 1.2533 σmig. Thus, dmig is the parameter of interest. We choose 1,000 random values of dmig from a uniform distribution with a range of 1–100. We modeled drift as a Moran-type birth–death process (54). At each generation, each entity undergoes binary fission with probability P = 0.1, creating a duplicate of itself at the same location. The two entities subsequently move and evolve independent of each other. When the number of entities reaches or exceeds the carrying capacity (10,000), excess entities are deleted at random among all entities present in that generation. Mutation was modeled as a one-dimensional Gaussian random walk for each trait (Ntraits = 50). Each trait was assigned an initial value of 1,000, and new (mutated) values were picked from a random normal distribution with mean equal to the current value and SD fixed at 0.05.

Following a burn-in period of 10,000 generations, entities were sampled from simulations with a probability of 0.00001 at each generation. The x and y coordinates, time of sampling in generations, and the values for the 50 traits were recorded for all sampled entities.

Pairwise trait distances between all sampled entities in each of the simulated datasets were calculated using the Euclidean distance formula as follows:<mml:math display="block"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover><mml:mo>∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover></mml:mstyle><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>?</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math>Mij=∑k=1n(dik?djk)2,[3]

where Mij is the distance between the two entities i and j; dik and djk are the values of the trait k for individuals i and j respectively; and n is the number of recorded traits.

Out of 10,000 simulations, 9,866 (98.66%) resulted in an EC greater than zero. To match the simulated data with the empirical data, we filtered the simulated data based on the measured EC values and removed all simulations that produced an EC value smaller than 0.001. This resulted in 9,155 simulations being used in the correlation analysis.

To assess the reliability of the Smax statistic in recovering information about mobility, R2 values were calculated for the correlation between the simulated dmig values and their corresponding Smax values.

Human Mobility in Late Pleistocene and Holocene.

We considered genome-wide data comprising 354,199 SNPs typed in 329 West Eurasian (i.e., west of the Ural mountains) individuals (SI Appendix, Fig. S2) temporally ranging from ～39,000–1,000 y before present (see SI Appendix, Fig. S1). We merged the overlapping SNPs typed in archaeological samples published in refs. 22???26 and 55?????61 (see SI Appendix, Table S1 for list of samples and references) that met the geographic and temporal criteria described above. No additional bioinformatic processing of the data were carried out for this study.

The 329 individuals were assigned to one of following three groups based on their estimated age and subsistence strategy based on their archaeological context: pre-LGM hunter–gatherers n = 19 (temporally ranging from 39,000 y B.P. to 26,000 y B.P.), post-LGM hunter–gatherers n = 47 (temporally ranging from 19,000 y B.P. to 5,000 y B.P.), and Holocene farmers n = 263 (temporally ranging from 10,000 y B.P. to 500 y B.P.).

Sliding window analysis was performed on all individuals in the dataset postdating 16,000 y B.P. The Smax statistic was estimated for 121 overlapping 4,000-y windows, each differing by 100 y.

To take age uncertainty into account, we report the mean scaling factor angle from 10,000 replicates with sample dates randomly resampled from their age ranges. We estimated 95% confidence intervals through a jackknifing procedure in which a randomly chosen sample in each window was removed from analysis, and the 0.025 and 0.095 quantiles were calculated from the resulting distribution.

To estimate the IBD signal through time, we fitted a linear model of genetic distances as a function of geographic distances in each time window (with sample jackknifing and age resampling as before, using the lm function from the R package base version 3.2.2) (53) and reported the slope of the line.

Confidence Intervals and Robustness of Smax Estimator.

We tested the assumption that there is an IBD pattern by correlating the genetic (trait) distance matrices in all time bins and in all windows with the respective geographic distance matrices and the date-resampled temporal distance matrices and calculated the P values for these correlations. We find a positive and statistically significant IBD pattern in space in all windows (SI Appendix, Fig. S3 A and B, respectively and SI Appendix, Fig. S6). The isolation by temporal distance pattern is positive and significant for most windows, but some windows show negative correlations or are not significant. We find that these windows correspond to time periods where we observe low EC (SI Appendix, Fig. S3C) and also low P values for the EC (Fig. 4B).

To account for the uncertainty in sample ages, we calculated the scaling factor angle 10,000 times using dates resampled at random from a uniform distribution for each sample, as described above, and report the average of the scaling factor angle of the given distribution as a point estimate.

We also performed a leave-one-out analysis (10,000 replicates, combined with sample date resampling) to explore the combined effect of sampling and dating uncertainty and constructed approximate equal-tailed 95% confidence intervals for all groups and windows.

To assess the statistical significance of Smax estimates, we consider the EC—defined as the Pearson correlation coefficient between the trait difference matrix and the time–space product matrix when S = Smax, minus the Pearson correlation coefficient between the trait difference matrix and either the temporal or geographical distance matrix alone, whichever is higher.

To obtain a null-distribution of EC, we permuted trait data for individuals among the spatiotemporal sample locations 10,000 times and calculated EC for each permutation, as described above. Finally, we calculate the proportion of EC values from the permuted datasets that are equally high or higher than that obtained from the observed data. This permutation test permits assessment of how frequently the EC for the observed data are produced by chance alone or, alternatively, as the result of the method used for estimating the Smax statistic. The resultant P value is the probability of observing an equally high or higher EC value in permuted, supposedly signal-less data, and provides an indication of the information content of each dataset.

Simulated Scenario of Changing Migration Rate.

We modified our simulations to represent a population experiencing two changes in migration rate, resulting in three episodes of constant migration rate. We assumed a generation time of 25 y and chose the effective population size to be 2Ne = 10,000, standard figures in population genetic models of European populations (62). We next chose three levels of migration with relative magnitude on par with what was inferred from the empirical data: m1 = 0.0002, m2 = 0.01, and m3 = 0.05. To ensure equilibrium conditions during the start of the sampling period, we discarded the first 10,000 steps of the simulation (using migration rate m1). We then simulated a time period of 20,000 y, divided into three episodes with constant migration rate: m1 for 25,000–15,000 y ago, m2 for 15,000–10,000 y ago, and m3 for the last 5,000 y of the simulation. This roughly corresponds to the time spans associated with Mesolithic hunter–gatherers, Neolithic farmers, and post-Neolithic cultures in our empirical dataset. From a population genetic point of view, whole genome data as used in the empirical estimates correspond to a large number of approximately independent replicates. Because our model does not include recombination, we accounted for this effect by increasing the sample size to 10,000 individuals. SI Appendix, Fig. S4 shows the migration rate estimation using the Smax statistic using a 4,000 y-wide sliding window.

R version 3.2.2 (53) was used for analyses throughout this manuscript. The correlations between temporal, geographic, and trait distance matrices were calculated using the mantel (method = “pearson”) function in R package Vegan version 2.3.0 (63). The permutation and bootstrap tests were performed using the function sample in the R package base version 3.2.2 (53).

Acknowledgments

We are very grateful to Robert Foley for valuable discussions during the formulation of the approach, to Mike Parker-Pearson for advice on Holocene migration processes, and to Tamsin O’Connell for advice on stable isotopes. L.L. was supported by Natural Environment Research Council, UK Grants NE/K005243/1 and NE/K003259/1 and European Research Council Grant 339941-ADAPT. M.G.T. was supported by Wellcome Trust Senior Investigator Award Grant 100719/Z/12/Z and Leverhulme Trust Grant RP2011-R-045. A.M. and A.E. were supported by European Research Council Consolidator Grant 647787-LocalAdaptation. M.M.L. was supported by European Research Council Advanced Grant 295907, In-Africa. M.K. was funded by the Engineering and Physical Sciences Research Council through the Centre for Mathematics and Physics in the Life Sciences and Experimental Biology.

Footnotes

• ?1To whom correspondence may be addressed. Email: LiisaLoog{at}gmail.com or m.thomas{at}ucl.ac.uk.
• Author contributions: M.G.T. devised the approach in discussion with M.M.L.; L.L. and M.G.T. developed the method with input from A.E.; M.K., A.E., and M.G.T. developed the simulation code with input from L.L.; L.L. performed the analyses with input from A.E. and M.G.T.; and L.L., A.E., and M.G.T. wrote the paper with input from M.M.L, M.K., and A.M.

• The authors declare no conflict of interest.

• Data deposition: The R code used for analyses is available from the GitHub repository (http://www.danielhellerman.com/LiisaLoog/SpaceTime-Framework.git).

References

1. ?
.
2. ?
.
3. ?
.
4. ?
.
5. ?
.
6. ?
.
7. ?
.
8. ?
.
9. ?
.
10. ?
.
11. ?
.
12. ?
.
13. ?
.
14. ?
.
15. ?
.
16. ?
.
17. ?
.
18. ?
.
19. ?
.
20. ?
.
21. ?
.
22. ?
.
23. ?
.
24. ?
.
25. ?
.
26. ?
.
27. ?
.
28. ?
.
29. ?
.
30. ?
.
31. ?
.
32. ?
.
33. ?
.
34. ?
.
35. ?
.
36. ?
.
37. ?
.
38. ?
.
39. ?
.
40. ?
.
41. ?
.
42. ?
.
43. ?
.
44. ?
.
45. ?
.
46. ?
.
47. ?
.
48. ?
.
49. ?
.
50. ?
.
51. ?
.
52. ?
.
53. ?
.
54. ?
.
55. ?
.
56. ?
.
57. ?
.
58. ?
.
59. ?
.
60. ?
.
61. ?
.
62. ?
.
63. ?
.

Online Impact

• 8686301327 2018-02-22
• 1879481326 2018-02-22
• 9332351325 2018-02-22
• 7384141324 2018-02-22
• 8918371323 2018-02-22
• 7638311322 2018-02-22
• 9654151321 2018-02-22
• 1588961320 2018-02-22
• 5712971319 2018-02-22
• 5536211318 2018-02-22
• 4417061317 2018-02-22
• 3024201316 2018-02-21
• 4658931315 2018-02-21
• 3216561314 2018-02-21
• 1965251313 2018-02-21
• 970811312 2018-02-21
• 609011311 2018-02-21
• 3219131310 2018-02-21
• 613261309 2018-02-21
• 6972481308 2018-02-21