Quarterly Review: The Basics of Spatial Data Analysis
Interactive Spatial Data Analysis by Trevor C. Bailey and Anthony C. Gatrell (Addison Wesley Longman, 1995, $60.75) Reviewed by Keith D. Holler
Spatial data analysis is essentially the study of data in which the relative location of events influences the process under review. Interactive Spatial Data Analysis by Bailey and Gatrell is an applied introductory guide through the topic. The text is organized around four general types of analysis.
Emphasizing the applied nature of the book, the authors work through more than two dozen individual data sets. A basic software package is included that allows the reader to step through the techniques discussed using the actual data.
The second chapter discusses the key components of computing systems and distinguishes among mapping, database maintenance, and statistical analysis tools. Various software products, including ARC/INFO, IDRISI, MapInfo, and S-PLUS, are discussed.
Chapter three takes up the first of the four general problems, the analysis of point patterns. The data consist of the locations of a specific random event over a study region, R. Of interest is whether the events are scattered at random or follow a systematic pattern. Systematic patterns may be clusters, perhaps as in the location of fraudulent or staged auto accidents, or unnaturally regular spacings, as in the location of cell nuclei in a tissue sample.
The null hypothesis of randomness, or complete spatial randomness (CSR), is defined as a homogeneous Poisson process. One hypothesis test discussed uses a test statistic based on quadrat counts. Quadrat counts are obtained by dropping small shapes, or quadrats, on the region R and counting the number of events contained in each quadrat. Another test is based on the nearest neighbor distances between events. The tests mentioned use approximate distributions for the test statistics. They can be considerably improved by simulating the distribution of the test statistic for the given region under the CSR assumption. The tests can also be adjusted for the influence of various covariates, such as the underlying population density of the region.
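The simulation idea can be sketched in a few lines of Python. This is only an illustration, not the book's software: it assumes the study region R is the unit square, uses a regular grid of quadrats, and takes the variance-to-mean ratio of the quadrat counts as the test statistic (one common choice, since it is near 1 under CSR and larger for clustered patterns). All function names are hypothetical.

```python
import random

def quadrat_counts(points, nx, ny):
    """Count events in each cell of an nx-by-ny grid over the unit square."""
    counts = [0] * (nx * ny)
    for x, y in points:
        i = min(int(x * nx), nx - 1)  # clamp x == 1.0 into the last column
        j = min(int(y * ny), ny - 1)  # clamp y == 1.0 into the last row
        counts[j * nx + i] += 1
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio of the quadrat counts."""
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    return var / mean

def csr_monte_carlo_pvalue(points, nx=4, ny=4, n_sim=999, seed=0):
    """One-sided Monte Carlo test for clustering against CSR.

    Conditional on the number of events, CSR places them independently and
    uniformly over the region, so each simulation draws uniform points.
    """
    rng = random.Random(seed)
    n = len(points)
    observed = dispersion_index(quadrat_counts(points, nx, ny))
    exceed = 0
    for _ in range(n_sim):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        if dispersion_index(quadrat_counts(sim, nx, ny)) >= observed:
            exceed += 1
    # Rank of the observed statistic among the simulated ones.
    return observed, (exceed + 1) / (n_sim + 1)
```

A small p-value suggests more clustering than CSR would produce. For an irregular region, the uniform draws would have to be restricted to R, which is the point of simulating "for the given region."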
Chapter five begins the discussion of the second type of problem, that of spatially continuous data. Examples of spatially continuous data include temperature, rainfall, and ore concentration. Generally, in this type of problem, one is concerned with modeling the process at points other than those included in the observed data.
Exploration of spatially continuous data begins with some smoothing or interpolation techniques. Although they may have intimidating names, these techniques are actually easy to understand. The first set of techniques divides the region into tiles or triangles built from the observed data points and interpolates within them. These techniques include the Dirichlet tessellation (also known as Voronoi polygons) and the triangulated irregular network (TIN). The interpolations are used to construct contour maps over the entire region.
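To show how simple the triangle-based idea is, here is a minimal sketch of linear interpolation inside a single triangle of observed points, using barycentric weights. It assumes the triangulation itself has already been built by some other tool; the function name and the sample values are hypothetical.

```python
def interpolate_in_triangle(triangle, values, point):
    """Linearly interpolate a value at `point` inside one triangle.

    triangle: three (x, y) vertices with observed data
    values:   the observed value at each vertex, in the same order
    """
    (x1, y1), (x2, y2), (x3, y3) = triangle
    v1, v2, v3 = values
    px, py = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("vertices are collinear; not a triangle")
    # Barycentric weights: each is 1 at its own vertex and 0 at the others.
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * v1 + w2 * v2 + w3 * v3

# Hypothetical rainfall readings at three stations.
estimate = interpolate_in_triangle([(0, 0), (1, 0), (0, 1)], [10, 20, 30], (0.25, 0.25))
```

Repeating this over every triangle in the network gives a piecewise-linear surface from which contours can be drawn.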
The more advanced models begin with a discussion of the variogram. The variogram can be used to estimate the covariance function between observations. The last half of chapter five and the beginning of chapter six are devoted to the prediction technique called Kriging. The expected value at a new location is estimated as a linear combination of the observed values. Kriging supplies the weights to use and allows one to compute confidence intervals about the predicted value. Kriging methods are actually somewhat dated. Chapter six also contains a very brief discussion of some other traditional multivariate methods, such as principal components and factor analysis.
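As a rough illustration of the first step, the sketch below computes an empirical semivariogram: for each distance bin, half the average squared difference between pairs of observations that far apart. The binning scheme and the function name are assumptions made for this example, and fitting a variogram model and solving for the Kriging weights are not shown.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Return (bin center, semivariance, pair count) for each non-empty bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])  # separation distance
            b = int(h // bin_width)              # which distance bin
            if b < n_bins:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    return [
        ((b + 0.5) * bin_width, sums[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]
```

When nearby observations are more alike than distant ones, the semivariance rises with distance before leveling off, and that shape is what a fitted variogram model tries to capture.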
Area data analysis is the third type of analysis discussed in the text. Area data is data that has been aggregated to a subregion level. Examples include voting data by state, disease incidence by town, or claim counts by zip code. With area data one is primarily interested in detecting and explaining patterns or trends.
The data visualization and modeling techniques are similar to those used in spatially continuous analysis. The exploratory techniques include simple mapping, weighted averaging, median polishing, and kernel smoothing. The more advanced models begin with generalized least squares (GLS), in which the covariance matrix is estimated via the variogram. The end of chapter seven discusses simultaneous autoregressive (SAR) and conditional autoregressive (CAR) models. Chapter eight is devoted to the use of empirical Bayes analysis, generalized linear models, and image analysis in area data analysis.
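Of the exploratory techniques listed, median polishing is perhaps the easiest to sketch. The version below assumes the area data have been arranged on a rectangular grid of rows and columns, and it alternately sweeps out row and column medians to leave an overall level, row effects, column effects, and residuals. It runs a fixed number of sweeps rather than testing for convergence, and the function name is hypothetical.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Decompose table[i][j] into overall + row[i] + col[j] + residual[i][j]."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid
```

Because medians are used instead of means, a single unusual subregion ends up in the residuals rather than distorting the estimated trend.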
The last type of analysis is the analysis of spatial interaction data. The flows of items, typically people, between 'origins' and 'destinations' are modeled. An example would be modeling the flow of shoppers to area supermarkets. Given such a model, a supermarket company could then evaluate the expected impact of constructing a new store in various locations. Adjustments would also be made for covariates such as the attractiveness of the new facility. Other examples of this type of analysis include animal migration studies, transportation planning, and the location of company-sponsored auto repair facilities.
The text concentrates on modeling flows with gravity models. Gravity models get their name in part because, within the model framework, the attraction between a specific origin and destination increases as the distance between them decreases.
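A minimal sketch of the idea follows. It assumes one common unconstrained form, in which the predicted flow is proportional to the size of the origin and the attractiveness of the destination and inversely proportional to a power of the distance. The constant k, the exponent beta, and the function name are illustrative assumptions; in practice the parameters would be estimated from observed flows.

```python
def gravity_flows(origin_sizes, dest_attractions, distances, k=1.0, beta=2.0):
    """Predicted flow from origin i to destination j.

    flow[i][j] = k * origin_sizes[i] * dest_attractions[j] / distances[i][j] ** beta
    """
    return [
        [
            k * o * a / distances[i][j] ** beta
            for j, a in enumerate(dest_attractions)
        ]
        for i, o in enumerate(origin_sizes)
    ]

# One neighborhood of 100 shoppers and two equally attractive stores,
# the second twice as far away as the first (hypothetical figures).
flows = gravity_flows([100], [50, 50], [[1.0, 2.0]])
```

With beta set to 2, doubling the distance cuts the predicted flow to one quarter. Evaluating a proposed new store amounts to adding a destination and recomputing the flows.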
This book is written at an introductory level. While the techniques may sound intimidating, the text makes a concerted effort to describe the underlying ideas in a common-sense fashion. Armed with an understanding of these basic premises and a good software package, such as S-PLUS, the average actuary should be able to conduct the general types of analysis described.