The emergence of spatial cyberinfrastructure

  1. Dawn J. Wrighta,1 and
  2. Shaowen Wangb
  1. aDepartment of Geosciences, Oregon State University, Corvallis, OR 97331-5506; and
  2. bDepartment of Geography and National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801

Abstract

Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it now represents a paradigm shift in scientific research that has facilitated easy access to computational utilities and streamlined collaboration across distance and disciplines, thereby enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding complex problems of handling and analyzing massive and heterogeneous spatial datasets as well as the necessity and benefits of sharing spatial data flexibly and securely. This article provides an overview and potential future directions of spatial cyberinfrastructure. The remaining four articles of the special feature are introduced and situated in the context of providing empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice for improved synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analyses using distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge.

The term cyberinfrastructure (CI) was first coined by a National Science Foundation Blue-Ribbon Committee (1) to reflect how the traditional modes of scientific research (e.g., experimentation in the laboratory, observation in the field, processing/analyzing on a single calculator or computer, and even calculations on the back of an envelope) are being enhanced and even revolutionized by the integrative capabilities of high-performance computers, storage and visualization tools for very large datasets, digitally enabled sensors and instruments in the environment, virtual organizations for collaborative problem solving, and interoperable suites of software services and tools (2). The world of scientific publishing is being transformed as part of CI evolution (3). CI, therefore, represents a paradigm shift in scientific research that has facilitated collaboration across distance and disciplines, thus enabling quick and efficient scientific breakthroughs that might not be possible otherwise.

Examples include the discovery of abrupt transitions in Earth's climate and ecosystem dynamics, previously unknown properties of minerals at extreme temperatures and pressures deep within the Earth, simulations of the development of the early universe, discoveries and insights through improved ocean models, understandings of individual and group behavior and its relationship to social, economic, and political structures, and creation of a human linkage genetic map (2, 4, 5). As Benioff et al. (6) note, computation, along with theory and experiment, has become “the third pillar” of science and engineering. Additionally, making scientific discoveries requires the computational ability to synthesize and analyze very large datasets that are integrated across biological, physical, and social sciences and engineering and across the science–technology interface, which Hey et al. (5) call “data-intensive science,” the “fourth paradigm.” Indeed, CI has become not just hardware and software but its own evolving area of research in the realm of data-intensive science and digital libraries (5–9), with many countries investing hundreds of millions of dollars in CI research and development (10, 11) and calls coming from diverse scientific communities arguing the urgent need for further levels of CI investment (12, 13). Hey et al. (5) point out that, although we have attained high-performance computing at affordable cost and have made good progress on simulation tools, many challenges remain in effectively integrating multiple field observatories containing thousands of instruments, involving millions of users and petabytes of data, built on a true data grid with the ability to analyze data on that grid with sophisticated data analysis.

Spatial CI is an emerging term in the literature (14–16), and it is defined as a specific type of CI that synergistically integrates the capabilities of CI, geographic information systems (GIS) (17, 18), and spatial analysis (19, 20) for geospatial problem solving and decision making. By spatial or space, we mean both real, physical space (i.e., on the surface of the Earth, in the atmosphere, or under the ocean) and virtual space (e.g., digital worlds or understanding how and where computers are connected worldwide). Nearly all of our knowledge about the world can be classified according to space (location, area, distance, or spatial interaction) as well as time. However, although time is divided into the globally understood units of seconds, hours, years, and so forth, spatial units and associated relationships are much more complex, multidimensional (e.g., x, y, and z), at multiple scales and resolutions, often heterogeneous (even in the representation of a single variable), and always changing over time. Without a clear understanding of space, any associated models, structures, and hypotheses may be erroneous (especially those about relationships among variables).

In particular, the complexity of geographic space poses significant computational and intellectual challenges in distributed spatial data access, sharing, and analysis, government-sponsored spatial data information infrastructures (21), and the geospatial semantic web (22) (i.e., locating and integrating information without human intervention, including providing the ability to search for geographic information within web pages), all of which are part of a spatial CI. However, many of these challenges are already well-known to those working on spatial data, and a variety of approaches not involving spatial CI have arisen to address these challenges. Spatial CI is going beyond these existing approaches by anchoring solutions in more sophisticated thinking about the representation and implications of space coupled with the latest in sophisticated mathematical and statistical models (23–26) and forging more intimate collaborations between computer and information science and the domain disciplines of geography, geology and geophysics, oceanography, ecology, environmental engineering and sciences, and social sciences to name a few (5, 8, 27, 28). Such cross-disciplinary collaborations are making possible new knowledge systems that are leading to, at long last, a partial realization of a “Digital Earth,” as first envisioned by Vice President Al Gore (29) and now epitomized in products such as Google Earth, Microsoft Bing Maps, and National Aeronautics and Space Administration (NASA) World Wind.

The deluge of spatial data, which will be collected at an accelerating pace for the foreseeable future from sensor networks, satellites, and even cell phones, continues to be driven by the tremendous needs of the aforementioned domains and cannot be well used or well understood unless it can be properly managed, analyzed, and shared through spatial CI. The dynamic nature of the Earth system (e.g., waves, tides, atmospheric turbulence, and movements in the Earth's crust) further complicates our efforts to accurately and precisely measure the system. Massive datasets are common in the spatial analysis of human systems as well, including population and transportation systems, risk assessment, disease vectors, human mobility, and much more. Spatial analysis (broadly including spatial modeling) itself has traditionally encompassed a variety of approaches, including but not limited to spatial statistics (30, 31), heuristics and optimization (32, 33), and simulation for spatial problem solving and decision making (34, 35). These methods have been extensively applied in many fields (36–39) but have been difficult to implement for large- and multiscale problems that are computationally intensive and require collaborative input. This is a limitation that has existed despite the advances already made to deal with the challenges associated with the complexity of geographic space mentioned earlier. However, spatial CI promises to remove this limitation and thus, transform spatial analyses into powerful and accessible computational utilities for enabling widespread scientific breakthroughs. Spatial CI is also proving invaluable in the estimation of errors that propagate from measurements through the analyses, and it is facilitating the development of better models for error representation, propagation, and management throughout large distributed computational networks (40).

The articles in this Special Feature address how the coupling of CI with spatial thinking and geographic approaches offers a promising path forward for solving scientific problems and improving decision-making practices of significant societal impact (e.g., assessing impacts of global climate change, understanding the complexity of coupled human–natural systems, sustaining ecosystem services, preserving and accessing digital resources in humanities and social sciences, and managing transportation infrastructure). They are far from inclusive of all aspects and current interests of spatial CI, because the field is growing quickly. However, they are representative of current research addressing longstanding problems of the complexity of spatial datasets and spatial analysis as well as the necessity and benefits of sharing spatial data flexibly and securely. This research highlights some of the discoveries and insights that can be gained, and these results could not have readily occurred without spatial CI.

Spatial Principles

The Special Feature begins with a technical treatment by Yang et al. (41) that examines the spatial principles governing the interaction of different parameters and phenomena in a variety of physical geographic studies (e.g., of the Earth's lithosphere, hydrosphere, atmosphere, pedosphere, and global flora and fauna patterns). Chief among them is the development of architecture and algorithms for distributed geographic information processing within a spatial CI (drawing in part on spatial CI theory introduced by Wang and Armstrong) (24) to enhance the understanding of ecosystem dynamics and improve the forecasting of the onset and extent of dust storms in the US southwest. As a result of the experiments, scientists were able to predict the onset of dust storms at higher resolutions (3 × 3 km) over longer time periods (5–10 d).

Physical Science Applications

Helly et al. (40) describe the evolution of a set of methods and software tools to integrate multiscale, -source, and -disciplinary oceanographic data over several recent research cruises to the Antarctic. Their initial goal was to investigate several scientific hypotheses about the movement of sea ice and meltwater plumes from icebergs, but an important parallel effort was the creation of a near real-time geospatial decision-support framework. As they constructed a spatial CI to support this framework, they were led to the development of a sampling scheme that was optimized to capture smaller scales of interest with respect to the broader scale of the study area. This sampling strategy overcame the limitations of the conventional sampling methods used previously (i.e., using a research ship as a static platform for sampling a single parameter on a station by station basis), thereby allowing for more rapid characterization of the surface of the ocean using multiple data streams at sea and in outer space and simultaneously over multiple spatial and temporal scales. Thus, without the spatial CI, Helly et al. (40) would not have been able to make direct observation and characterization of meltwater plumes from individual icebergs and would not have been able to effectively integrate these individual results with regional- and global-scale data. The results lend insights as to the influence of meltwater from icebergs on carbon flux from the surface of the ocean to sediments on the ocean floor as well as to the role that icebergs play in controlling biological productivity in the Weddell Sea. Their results also illustrate the importance of spatial CI in the overall scientific enterprise and identify key architectural and design considerations in the development of current and future Earth-observing systems, especially as oceanographers and other Earth scientists move into an era of petascale computing.

From Physical to Social Sciences and the Humanities

A goal of this Special Feature is to show that spatial CI is not only about using hardware and software or enabling the physical sciences but about distributed knowledge communities that serve the needs of the social sciences and humanities as well as the multiple stakeholders and decision makers of citizen groups from differing social, economic, and political backgrounds. Building a CI is also very much a social as well as a scientific endeavor. As such, Sieber et al. (42) report on a spatial CI incorporating the China Biographical Database (the largest in the world), the China Historical Geographical Information System (part of China's original Electronic Cultural Atlas Initiative), and the McGill-Harvard-Yenching Library Ming Qing Women's Writings database. The study focuses in general on a CI for humanities data, and specifically, on a spatial CI that aids research on Chinese women writers, their kinship networks, their publishing venues, and their literary and social communities. The article provides a critical examination of and recommendations on related issues of conflicting data that researchers may not necessarily want to eliminate from differing data models and geographic scales. This case study shows the value of spatial CI in removing difficulties arising from spatial and also multilingual, biographical, and temporal ambiguities in these databases, solutions that, again, would not be possible without spatial CI.

Buetow (4) notes that, although team or big science will continue to be necessary to achieve research goals, the small independent investigator is still “the engine of innovative research” and the widespread adoption of CI will allow the two approaches to blend harmoniously. Poore (43) expands on this theme in a final perspectives article on the needs and contributions of individual users within a spatial CI. Poore (43) notes that, in particular, as human geographers and other social scientists as well as geographic information scientists actively participate in spatial CIs as users, there is a great opportunity to make spatial CI a truly user-centered enterprise. Spatial CI should make room for not only the scientists who will use cybertools to collaborate at a distance but also the educators who will teach with CIs. This also applies to citizen scientist users who will contribute data and insights to CI projects on some of the most important scientific questions of the day, such as global climate change.

Concluding Perspective

Citizen scientists may, along with professional scientists, increasingly participate in the now ubiquitous cloud computing, which uses service-oriented architecture to control the life cycle of virtual machines and data archives for everything from one's personal address book to the largest of multidimensional, multidisciplinary scientific modeling systems. However, rather than federating autonomous entities (computing centers) into virtual organizations as computational grids do, clouds (Microsoft, Amazon, and Google) instead focus on delivering infrastructure as a service, software as a service, and so on. Huge commercial investments in clouds make it likely that these systems will dominate large-scale computing hardware and software in the next decade (44, 45). Spatial CI is an important subset of the more general CI, spanning both the computationally intense and interdisciplinary use requirements such as service hosting, virtual computing environments, and virtual datasets. The special requirements of spatial CI are a good match for the many common capabilities of clouds, thus warranting further fundamental and empirical research.

Indeed, the notion that spatial is special within CI introduces several interesting research challenges for physical and social scientists alike. Many geographic applications are interdisciplinary and involve multiple stakeholders and decision-makers who have diverse social, economic, and political backgrounds, thereby making collaboration critical but challenging. For example, how do we effectively and securely share and integrate spatial data, information, and analytical methods to develop and sustain evolving geographic knowledge? How do we facilitate collaborative spatial problem solving and decision making through virtual organizations?

Despite the promise of spatial CI, for some, the effort in mastering it may still not be balanced by the apparent benefits, suggesting that the technology will always be the reserve of a highly technical group of experts. What will it take to popularize spatial CI beyond these experts, especially if it is to benefit the social sciences and humanities? Perhaps spatial CI will follow the path of GIS and eventually become as transparent as GIS is becoming in the world of Google Maps and Google Earth. Studies such as those by Yang et al. (41) and Poore (43) seek to distill the principles of spatial CI into simpler concepts that lend more obvious value to a broader range of users. Another approach may be to deal with conceptually and computationally unmanageable problems by dividing them spatially, understanding the resulting pieces, and then stitching the results back together. This divide and conquer approach, initially popularized in the literature of computational geometry (46), mirrors the way that society often solves its spatial problems. In the context of spatial CI, this implies spatially heterogeneous data and spatially explicit consideration for parallel and distributed processing within individual high-performance computers and/or across the grid as well as clouds.
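The divide and conquer idea described above can be illustrated with a minimal Python sketch. It is not drawn from any of the articles in this Special Feature: the function names (split_into_tiles, summarize_tile, grid_mean), the tile sizes, and the small elevation grid are all hypothetical, and a thread pool stands in for the high-performance, grid, or cloud resources that a real spatial CI would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(grid, tile_rows, tile_cols):
    """Divide: yield rectangular tiles that together cover the whole grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(0, n_rows, tile_rows):
        for c in range(0, n_cols, tile_cols):
            # Slicing past the edge is safe, so edge tiles are simply smaller.
            yield [row[c:c + tile_cols] for row in grid[r:r + tile_rows]]


def summarize_tile(tile):
    """Conquer: compute a partial result (sum and cell count) for one tile."""
    values = [v for row in tile for v in row]
    return sum(values), len(values)


def grid_mean(grid, tile_rows=2, tile_cols=2, workers=4):
    """Stitch: merge the per-tile partial results into one grid-wide mean."""
    tiles = list(split_into_tiles(grid, tile_rows, tile_cols))
    # Threads are an illustrative stand-in for distributed workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_tile, tiles))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Hypothetical 3 x 4 raster of values (e.g., elevations).
elevation = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
print(grid_mean(elevation))  # prints 6.5
```

Each tile carries its cell count along with its sum so that unevenly sized edge tiles are weighted correctly when the pieces are stitched back together. A statistic that depends on neighboring cells would additionally need overlapping tile borders, which this sketch does not attempt.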

Although this Special Feature provides a small sampling of a much broader scientific and engineering enterprise, we hope that it will help to elucidate some important issues and research questions, thereby accelerating scientific progress in this emerging area. As the size of spatial datasets and the complexity of spatial analysis and modeling continue to increase and the need for virtual collaboration in scientific research becomes compelling, the transformative research to establish user-centric, efficient, and extensible spatial CI becomes ever more important and timely. The intellectual merits of spatial CI stem from the complexity of the challenges, the dangers inherent in not fixing the errors that may propagate, the profound need to develop solutions that will benefit many fields of societal relevance, the continuing vision of achieving access to a complete Digital Earth, and the next generation of GIS (CyberGIS) with integrative high-performance, distributed, and collaborative capabilities (25). We have sought to make the case that spatial CI leads to discoveries in science. It is our hope that articles in this Special Feature have shown that spatial CI has facilitated such advances and made them more replicable, more readily distributed, and certainly, better visualized. It is only by advocating spatial CI that we will see the cyber-enabled approaches emerge that can make further scientific advances possible. We urge the scientific community to join in that effort.

Acknowledgments

We thank all of the contributors to this special feature for their enthusiasm and skill in authoring these articles. We also thank the many reviewers for their thoughtful insights, which improved the manuscripts. We are grateful to our many colleagues in the Association of American Geographers (AAG) Cyberinfrastructure Specialty Group and the University Consortium for Geographic Information Science (UCGIS) for valuable discussions and inspiration as well as the PNAS editorial board member Susan Hanson. Finally, we thank editors Michael Goodchild and David Stopak for their encouragement, assistance, and helpful reviews. This material is based in part on work supported by National Science Foundation Grant BCS-0846655.

Footnotes

  • 1To whom correspondence should be addressed. E-mail: dawn@dusk.geo.orst.edu.
  • Author contributions: D.J.W. and S.W. wrote the paper.

  • The authors declare no conflict of interest.
