Hello World: Introducing Spatial Data
What is Spatial Analytics?
Spatial and spatio-temporal data are all around us. On television and in newspapers, reporters regularly answer questions such as: What are the chances of rain today? How do the weather conditions vary between Karachi and Islamabad? Beyond answering simple questions, spatial data is needed to make maps and visualizations that show how conditions vary across regions. Put simply, geospatial data combines location information, such as regions, streets, or census blocks, with attribute information stored in a database. Spatial Data Analysis is concerned with questions not directly answered by looking at the data themselves. Statistical inference for such questions is often challenging since it requires spatial data handling, spatial analysis, and spatial modeling.
In this blog series, we will learn and understand the fundamentals of Spatial Data Science.
Why use R for Spatial Analysis?
For over 20 years, the R community has developed a number of R packages for handling and analyzing spatial data. Up to 2003, these packages made different assumptions about how spatial data were organized. After some joint effort, community developers wrote the R package sp, which extends base R classes and methods for spatial data. Classes specify a structure and define how spatial data is organized and stored, while methods are instances of functions specialized for a particular class. Thanks to the sp package, the R community gained a shared set of tools for working with spatial data. Therefore, in this blog series, we will use the sp package and learn its different classes and methods.
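To make the distinction between classes and methods concrete, here is a minimal sketch in base R using the S4 system (the system sp itself builds on). The class ToyPoints and the function n_points are hypothetical names invented for this illustration and are not part of sp; the coordinates are rough, illustrative values.

```r
library(methods)

# A class specifies structure: which slots an object has and their types.
# "ToyPoints" is a made-up class for illustration, not an sp class.
setClass("ToyPoints",
         representation(coords = "matrix", reference = "character"))

# A generic function declares an operation without saying how to do it.
setGeneric("n_points", function(obj) standardGeneric("n_points"))

# A method is the version of that function specialized for one class.
setMethod("n_points", "ToyPoints", function(obj) nrow(obj@coords))

# Rough, illustrative longitude/latitude pairs for two cities.
cities <- new("ToyPoints",
              coords = cbind(lon = c(67.0, 73.0), lat = c(24.9, 33.7)),
              reference = "longitude/latitude in decimal degrees")

n_points(cities)
```

The same pattern, at a larger scale, is what lets functions such as plot or summary behave sensibly for each kind of spatial object.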
What is GIS?
Spatial data storage and analysis is traditionally done in a Geographical Information System (GIS). According to Burrough and McDonnell (1998), a GIS is ‘... a powerful set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes’.
For many spatial projects, a GIS alone is enough to meet the requirements, but R is a good choice for spatial data because of its capabilities in data analysis and visualization. Nowadays, R scripts are used alongside GIS software and possibly GIS databases as well.
Types of Spatial Data
Before we move to spatial analysis and the corresponding classes, we must be clear about the types of spatial data. Bivand et al. (2013) give a broad description of spatial data as follows:
Spatial data have spatial reference: they have coordinate values and a system of reference for these coordinates.
As a simple example, suppose we are interested in the volcanoes that have shown activity between 1980 and 2000. We could list the locations of these volcanoes as pairs of longitude/latitude decimal degree values with respect to the prime meridian at Greenwich and zero latitude at the equator.
If we also have the magnitude of the last observed eruption at each volcano, this information is called a nonspatial attribute: it is not itself spatial, but a value exists for each spatial entity (volcano).
Nonspatial attributes characterize the nonspatial features of an object. Other examples of nonspatial attributes would be the population of a city and its unemployment rate.
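The volcano example can be sketched in a few lines of base R. The names, coordinates, and magnitudes below are invented placeholders, not real observations; the point is only to show the spatial part (coordinates plus a reference system) kept alongside the nonspatial attribute.

```r
# Hypothetical records: every value here is invented for illustration.
volcanoes <- data.frame(
  name      = c("Volcano A", "Volcano B", "Volcano C"),
  lon       = c(-122.2, 120.4, 15.0),  # decimal degrees, relative to Greenwich
  lat       = c(46.2, 15.1, 37.7),     # decimal degrees, relative to the equator
  magnitude = c(5, 6, 3)               # nonspatial attribute, one per volcano
)

# Spatial part: a two-column matrix of coordinate pairs.
coords <- as.matrix(volcanoes[, c("lon", "lat")])

# Coordinates only have meaning together with their reference system,
# so we record it explicitly (here just as a descriptive label).
reference <- "longitude/latitude in decimal degrees"

# Nonspatial part: one row of attributes per spatial entity.
attrs <- volcanoes[, c("name", "magnitude")]

# Each volcano has exactly one coordinate pair and one attribute row.
stopifnot(nrow(coords) == nrow(attrs))
```

In sp, the same pairing of coordinates and attributes is handled by dedicated classes, which later posts in the series cover.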
It is important to understand that even without such explicit attributes, points on a map carry implicit attributes: in the volcano list, for example, every point implicitly marks the location of a volcano. We represent the spatial information of entities with data models. The data models defined in Bivand et al. (2013) are as follows:
Point: a single point location, such as a GPS reading or a geocoded address.
Line: a set of ordered points, connected by straight line segments.
Polygon: an area, marked by one or more enclosing lines, possibly containing holes.
Grid: a collection of points or rectangular cells, organized in a regular lattice.
The point, line, and polygon are vector data models and represent entities as closely as possible, while the grid is a raster data model that represents continuous surfaces using a regular tessellation (a pattern of shapes that fit together perfectly).
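As a rough preview, the four data models can be built with constructors from the sp package. This is a minimal sketch that assumes sp is installed; all coordinates are arbitrary illustrative values, and no coordinate reference system is set.

```r
library(sp)  # assumes the sp package is installed

# Point: single locations given as coordinate pairs.
pts <- SpatialPoints(cbind(x = c(1, 2, 3), y = c(1, 3, 2)))

# Line: an ordered set of points joined by straight segments.
path <- Line(cbind(c(0, 1, 2), c(0, 1, 0)))
lns  <- SpatialLines(list(Lines(list(path), ID = "line1")))

# Polygon: a closed ring (first vertex repeated at the end),
# here with a smaller ring flagged as a hole.
outer <- Polygon(cbind(c(0, 0, 4, 4, 0), c(0, 4, 4, 0, 0)), hole = FALSE)
inner <- Polygon(cbind(c(1, 2, 2, 1, 1), c(1, 1, 2, 2, 1)), hole = TRUE)
pol   <- SpatialPolygons(list(Polygons(list(outer, inner), ID = "poly1")))

# Grid: a regular lattice of 4 x 4 cells, each 1 x 1 unit,
# described by the centre of the first cell, cell size, and dimensions.
topo <- GridTopology(cellcentre.offset = c(0.5, 0.5),
                     cellsize = c(1, 1),
                     cells.dim = c(4, 4))
grd  <- SpatialGrid(topo)

# Each data model has its own class.
class(pts)
class(lns)
class(pol)
class(grd)
```

The first three objects store explicit vertex coordinates (vector models), while the grid stores only the lattice description (raster model).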
Ending Remarks
Spatial Data Mining is a huge field that helps us analyze how certain variables geographically impact our lives. Why do certain spatial relationships exist? Why are certain locations popular travel destinations? Why does a brand succeed in one country and not in another?
In this article, I explained the fundamental building blocks of spatial analysis, including spatial and nonspatial attributes. Next week, we will dive into exploring points, which form a fundamental component of maps.
References
Burrough, P. A., and R. A. McDonnell. 1998. Principles of Geographical Information Systems. Oxford University Press.
Bivand, R. S., E. Pebesma, and V. Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. Springer, New York.