# Hello World: Introducing Spatial Data

## What is Spatial Analytics?

Spatial and Spatio-temporal data is all around us. In television and newspapers, we repeatedly see reporters informing the viewers *What are the chances of rain today?* *How do the weather conditions vary between Karachi and Islamabad?* Beyond asking simple questions, spatial data is necessary for making maps and visualizations to understand how the variations occur between the underlying regions. To make it simple, Geospatial data combines *locations* such as regions, streets, or census blocks with information in the database. *Spatial Data Analysis* is concerned with questions not directly answered by looking at the data themselves. Statistical inference for such a hypothesis is often challenging since it needs spatial data handling, spatial analysis, and spatial modeling.

In this blog series, we will learn and understand the fundamentals of Spatial Data Science.

# Why use R for Spatial Analysis?

For over 20 years, the **R** community has developed a number of R packages for handling and analyzing spatial data. Up to 2003, these packages were used to make different assumptions about how spatial data were organized. After some joint effort, the community developers wrote **R** package **sp** which *extends* base R classes and methods for spatial data.* Classes* specify a structure and define how spatial data is organized and stored while *Methods* are instances of functions specialized for a particular class. Due to the **sp** package, the R community got enhanced tools for understanding Spatial data. Therefore, for our blog series, we will use the **sp** package and understand its different methods and classes of it.

# What is GIS

Spatial data storage and analysis is traditionally done in Geographical Information System (GIS). According to Burrough and McDonnell (1998), a GIS is ‘. . . a powerful set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes’.

For many spatial projects, using GIS is enough for the project requirements, but **R** makes a good choice for spatial data because of its capability in data analysis and visualization. Nowadays **R** scripts are used with GIS software and possibly GIS databases as well.

# Types of Spatial Data

Before we move to spatial analysis and its equivalent classes, we must be clear in types of Spatial Data types. A broad understanding of spatial data by Roger S. Bivand (2013) is as follows:

**Spatial data have spatial reference: they have coordinate values and a system of reference for these coordinates.**

As a simple example, we are interested in the volcanoes that have shown activity between 1980 and 2000. We could list the locations of these volcanoes as pairs of longitude/latitude decimal degree values with respect to the prime meridian at Greenwich and zero latitudes at the equator.

If we also have the magnitude of the last observed eruption at the volcano this information is called a **Nonspatial attribute**:

A nonspatial entity but the information will exist for each spatial entity (volcano).

The nonspatial attributes are utilized for characterizing the nonspatial features of the object. Other similar examples of nonspatial attributes would be the population of the city and the unemployment rate in the city.

It is important to understand that without such *explicit* spatial attributes, points in the map have *implicit* attributes. The implicit attributes are also known as **Spatial attributes**. We represent such *spatial* information of entities by *data models*. The different types of different data models are as follows as defined in Roger S. Bivand (2013):

**Point:** a single point location, such as a GPS reading or a geocoded address.

**Line:** a set of ordered points, connected by straight line segments.

**Polygon:** an area, marked by one or more enclosing lines, possibly containing holes.

**Grid:** a collection of points or rectangular cells, organized in a regular lattice.

The point, line, and Polygon are **vector data models** and represented entities as closely as possible, while the Grid is a **raster data model** representing continuous surfaces using regular tessellation (pattern of shapes that fit together perfectly.)

# Ending Remarks

Spatial Data Mining is a huge field that helps us analyze how certain variables geographically impact our lives. Why do certain spatial relationships exist? Why are certain locations popular travel destinations? Why does a brand do this successfully in one country and not in another?

In this article, I explained the fundamental blocks of spatial analysis including the spatial and nonspatial attributes. Next week, we will dive into exploring *Points* that form a fundamental component of maps

# References

Burrough, P. A., and R. A. McDonnell. 1998. *Principles of Geographical Information Systems*. Oxford University Press.

Roger S. Bivand, Virgilio Gómez-Rubio, Edzer Pebesma. 2013. *Applied Spatial Data Analysis with R*. Springer New York, NY.