Monthly Archives: October 2013

Google Circles and PostGIS Circles

When the user draws a circle on the map and then clicks Get Draw Report, a report appears in a dialog that shows the NEPP seminars that have taken place within the area of the circle. How does this magic happen?
[Image: Google circle and Draw Report]

The first thing to understand is that two pieces of software are working together here, Google Maps and the PostGIS extension of the PostgreSQL database, and that they don’t automatically know about each other. This is different from the Microsoft environment, where software products are connected to each other and thus “know” about each other. That kind of integration may seem like a good thing, but in reality it adds complexity that often causes problems.

For example, at my place of work an install of SQL Server 2008 Management Studio recently caused Visual Studio 2008 to fail. We then installed Management Studio on a virtual machine that had no other Microsoft software, and it worked as it should. Having software “know” about other software violates the OO principles of “loose coupling” and “high cohesion”. Software functions best when it is independent of other software.

In this case, Google Maps must expose enough information about its circle so that PostGIS can recreate the circle and query the database for the seminars that fall within it. Google Maps exposes two facts about its circle: the radius and the center point. With those two facts, the browser uses the Google geometry library to find the four points on the circle that lie due north, east, south, and west of the center:

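// Assumes selectedShape is the google.maps.Circle that the user drew.
const spherical = google.maps.geometry.spherical;
const center = selectedShape.getCenter(); // LatLng of the circle's center
const radius = selectedShape.getRadius(); // radius in meters
// The third argument is the heading in degrees clockwise from north.
const north = spherical.computeOffset(center, radius, 0);
const east = spherical.computeOffset(center, radius, 90);
const south = spherical.computeOffset(center, radius, 180);
const west = spherical.computeOffset(center, radius, 270);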
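
For readers without the Google library at hand, the following is a minimal sketch of the kind of spherical "destination point" calculation that computeOffset performs. It is not Google's code. The function name offsetPoint, the Earth radius constant, and the sample center and radius (chosen to resemble the coordinates in the query further down) are all assumptions made for illustration.

```javascript
// Sketch of a spherical "destination point" calculation, similar in spirit
// to google.maps.geometry.spherical.computeOffset. Not Google's code.
const EARTH_RADIUS_M = 6378137; // assumed sphere radius in meters

const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// center: {lat, lng} in degrees; distance in meters; heading in degrees
// clockwise from north. Returns a new {lat, lng} in degrees.
function offsetPoint(center, distance, heading) {
  const d = distance / EARTH_RADIUS_M; // angular distance in radians
  const h = toRad(heading);
  const lat1 = toRad(center.lat);
  const lng1 = toRad(center.lng);

  const lat2 = Math.asin(
    Math.sin(lat1) * Math.cos(d) + Math.cos(lat1) * Math.sin(d) * Math.cos(h)
  );
  const lng2 =
    lng1 +
    Math.atan2(
      Math.sin(h) * Math.sin(d) * Math.cos(lat1),
      Math.cos(d) - Math.sin(lat1) * Math.sin(lat2)
    );
  return { lat: toDeg(lat2), lng: toDeg(lng2) };
}

// Hypothetical circle: center and radius are assumed values.
const center = { lat: 49.2662, lng: -123.1098 };
const radius = 725; // meters (assumed)

const north = offsetPoint(center, radius, 0);
const east = offsetPoint(center, radius, 90);
const south = offsetPoint(center, radius, 180);
const west = offsetPoint(center, radius, 270);
console.log({ north, east, south, west });
```

Note that the east and west points are farther from the center in degrees of longitude than the north and south points are in degrees of latitude, because a degree of longitude is shorter at this latitude.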

The browser now makes an AJAX request that sends these four points to the server-side PHP program. PHP uses the four points to construct a spatial query against the database. The WHERE clause of the query illustrates the complex spatial operations of which PostGIS is capable:

WHERE 
ST_WITHIN
(
  geometry,
  ST_MAKEPOLYGON
  (
    ST_CURVETOLINE
    (
      ST_GEOMFROMTEXT
      (
       'SRID=4326;
        CIRCULARSTRING
        (
           -123.1199 49.2662, 
           -123.1098 49.2727, 
           -123.0998 49.2662, 
           -123.1098 49.2596,  
           -123.1199 49.2662
        )' -- end CIRCULARSTRING
      ) -- end ST_GEOMFROMTEXT
    ) -- end ST_CURVETOLINE
  ) -- end ST_MAKEPOLYGON
) -- end ST_WITHIN
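
The text passed to ST_GEOMFROMTEXT has to be assembled from the four points. The post does this in PHP, and that code is not shown, so the following is only a sketch of the idea in JavaScript. The function names wktCoord and circleEwkt are hypothetical. The sketch rejects anything that is not a finite number, since the values arrive from the browser; passing the finished string to the database as a bound parameter rather than splicing it into the SQL would be the safer design.

```javascript
// Format one {lat, lng} point as WKT "x y", i.e. longitude first.
function wktCoord(p) {
  return `${p.lng.toFixed(4)} ${p.lat.toFixed(4)}`;
}

// Build a closed CIRCULARSTRING: west -> north -> east -> south -> west.
// The first and last points must match for the curve to close.
function circleEwkt(west, north, east, south, srid = 4326) {
  const pts = [west, north, east, south, west];
  for (const p of pts) {
    if (!Number.isFinite(p.lat) || !Number.isFinite(p.lng)) {
      throw new TypeError("coordinates must be finite numbers");
    }
  }
  return `SRID=${srid};CIRCULARSTRING(${pts.map(wktCoord).join(", ")})`;
}

// The four points used in the query above.
const ewkt = circleEwkt(
  { lat: 49.2662, lng: -123.1199 }, // west
  { lat: 49.2727, lng: -123.1098 }, // north
  { lat: 49.2662, lng: -123.0998 }, // east
  { lat: 49.2596, lng: -123.1098 }  // south
);
console.log(ewkt);
```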

This query first creates a CIRCULARSTRING, a curve made of circular arcs. Here it begins and ends on the same point, so it traces a full circle through the four compass points. The CIRCULARSTRING acts as input to ST_GEOMFROMTEXT, which makes a valid geometry object, using the spatial reference id (SRID) 4326 for WGS84, which is what Google Maps uses.

The geometry object then acts as input to ST_CURVETOLINE. This function converts the CIRCULARSTRING to a LINESTRING, by default using 32 segments per quarter circle, or 128 for the full circle. The conversion is needed because PostGIS does not actually do a spatial query with a circle, but rather with an approximation of a circle.

ST_MAKEPOLYGON then turns the closed linestring into a polygon. Finally, the polygon acts as input to ST_WITHIN, which PostGIS uses to retrieve the features that lie within the polygon.
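
To get a feel for how close a 128-segment polygon is to a true circle, here is a small planar sketch. It is not how PostGIS is implemented internally; the function names are hypothetical, and the point-in-polygon test is the common ray-casting method, used here only to illustrate the idea of a "within" test against an approximated circle.

```javascript
// Vertices of a regular polygon inscribed in a circle (planar coordinates).
function circleToPolygon(cx, cy, r, segments = 128) {
  const ring = [];
  for (let i = 0; i < segments; i++) {
    const a = (2 * Math.PI * i) / segments;
    ring.push([cx + r * Math.cos(a), cy + r * Math.sin(a)]);
  }
  return ring;
}

// Shoelace formula for the area of a simple polygon.
function polygonArea(ring) {
  let sum = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[(i + 1) % ring.length];
    sum += x1 * y2 - x2 * y1;
  }
  return Math.abs(sum) / 2;
}

// Ray casting: count how many polygon edges a horizontal ray from the
// point crosses; an odd count means the point is inside.
function pointInPolygon([px, py], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

const ring = circleToPolygon(0, 0, 1); // unit circle, 128 segments
const areaRatio = polygonArea(ring) / Math.PI;
console.log("polygon area / circle area:", areaRatio);
console.log(pointInPolygon([0.5, 0.5], ring)); // a point inside the circle
console.log(pointInPolygon([1.0, 1.0], ring)); // a point outside the circle
```

The area ratio comes out just below 1, which is why the approximation is good enough for a query like this one.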

PHP retrieves the results of the spatial query, formats them as an HTML table, and sends the table back to the browser. The browser opens a jQuery dialog and writes the HTML table into it.

In conclusion, a Google circle is not the same as a PostGIS circle. The two pieces of software are independent of each other, but each must be able to communicate in a neutral way with any other software. This they do extremely well.

Principal Component Analysis

Part of my thesis involves a statistical technique known as Principal Component Analysis (PCA). PCA takes a group of statistics and, well, makes sense of them.

Actually, that’s not quite correct. PCA gives you columns of numbers called eigenvectors. The numbers are just numbers; it’s up to you to make sense of them.
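
To make "columns of numbers called eigenvectors" a little more concrete, here is a minimal sketch of one way to get the first such column. It is not the method used in the thesis. The data are invented for illustration, the function names are hypothetical, and the sketch assumes PCA on the correlation matrix, using power iteration to approach the eigenvector with the largest eigenvalue.

```javascript
// Hypothetical data: each row is an area of a city, and the columns are
// percent renters, average family size, and percent aged 14 and under.
const names = ["Renter", "Family size", "Age 14 and under"];
const data = [
  [80, 2.1, 8],
  [65, 2.4, 11],
  [40, 3.0, 17],
  [30, 3.3, 20],
  [55, 2.6, 13],
];

// Standardize each column to mean 0 and sample standard deviation 1.
function standardize(rows) {
  const n = rows.length;
  const m = rows[0].length;
  const out = rows.map((r) => r.slice());
  for (let j = 0; j < m; j++) {
    const mean = rows.reduce((s, r) => s + r[j], 0) / n;
    const variance = rows.reduce((s, r) => s + (r[j] - mean) ** 2, 0) / (n - 1);
    const sd = Math.sqrt(variance);
    for (let i = 0; i < n; i++) out[i][j] = (rows[i][j] - mean) / sd;
  }
  return out;
}

// Correlation matrix of standardized data: Z^T Z / (n - 1).
function correlation(z) {
  const n = z.length;
  const m = z[0].length;
  return Array.from({ length: m }, (_, a) =>
    Array.from({ length: m }, (_, b) =>
      z.reduce((s, r) => s + r[a] * r[b], 0) / (n - 1)
    )
  );
}

// Power iteration: multiply by the matrix and normalize, repeatedly.
// The vector approaches the eigenvector with the largest eigenvalue.
function firstEigenvector(matrix, iterations = 200) {
  let v = matrix.map((_, i) => (i === 0 ? 1 : 0));
  for (let k = 0; k < iterations; k++) {
    const w = matrix.map((row) => row.reduce((s, x, j) => s + x * v[j], 0));
    const norm = Math.hypot(...w);
    v = w.map((x) => x / norm);
  }
  return v;
}

const loadings = firstEigenvector(correlation(standardize(data)));
names.forEach((name, i) => console.log(name, loadings[i].toFixed(3)));
```

The overall sign of an eigenvector is arbitrary, so only the relative signs of the entries carry meaning.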

To illustrate the value of PCA, consider the following list. Each item in the list is a measure, or indicator, of the vulnerability that a person has in the event of an earthquake. Do they make sense?

  • Population – 65 years and older
  • Family – Single parent
  • Income – low
  • Education – High school or less
  • Unemployment
  • Renter
  • English – not spoken at home
  • Occupation – arts or service
  • Social dependence
  • Population growth
  • Family – Average size
  • Population – 14 years and under
  • Aboriginal identity

Okay, they make sense by themselves, but do they make sense as a group? Which is the most important? Are some related to each other, in the sense that when one indicator is high, another indicator is also high?

PCA takes these indicators and gives you columns of numbers. Here is one such column:

Renter                             -0.517039016
Family – Average size               0.421954814
Population – 14 years and under     0.409067414
Income – low                       -0.385866351
Occupation – arts or service       -0.242320211
Unemployment                       -0.218579446
English – not spoken at home        0.204608869
Aboriginal identity                -0.192174588
Population – 65 years and older     0.150799633
Social dependence                  -0.147162482
Education – High school or less    -0.057844989
Family – Single parent             -0.055617301
Population growth                  -0.047615417

Notice that I sorted the rows by absolute value, so that the largest values, whether positive or negative, are at the top and the smallest are at the bottom. The top numbers show us the most important indicators: the larger the absolute value, the more important the indicator.

So what do these numbers tell us? Look at the first four indicators. Indicators with the same sign tend to rise and fall together, while indicators with opposite signs move in opposite directions. Renter and Income – low are both negative, while Family – Average size and Population – 14 years and under are both positive. So areas of the city that have high percentages of renters tend to be low in income and do not have large families or a large population 14 and under. Conversely, large families and population 14 and under go together, but not with renters.
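
The same reading can be done mechanically. The sketch below copies the values from the column above, ranks them by absolute value, and splits the top four by sign. The variable names are my own, and the choice of four as the cut-off simply follows the paragraph above.

```javascript
// Values copied from the column of numbers above.
const column = [
  ["Renter", -0.517039016],
  ["Family – Average size", 0.421954814],
  ["Population – 14 years and under", 0.409067414],
  ["Income – low", -0.385866351],
  ["Occupation – arts or service", -0.242320211],
  ["Unemployment", -0.218579446],
  ["Aboriginal identity", -0.192174588],
  ["Social dependence", -0.147162482],
  ["Education – High school or less", -0.057844989],
  ["Family – Single parent", -0.055617301],
  ["Population growth", -0.047615417],
  ["Population – 65 years and older", 0.150799633],
  ["English – not spoken at home", 0.204608869],
];

// Rank by absolute value, largest first, without mutating the input.
const ranked = [...column].sort((a, b) => Math.abs(b[1]) - Math.abs(a[1]));

// Split the four strongest indicators by sign: indicators that share a
// sign move together, and the two groups move in opposite directions.
const top = ranked.slice(0, 4);
const positive = top.filter(([, v]) => v > 0).map(([name]) => name);
const negative = top.filter(([, v]) => v < 0).map(([name]) => name);

console.log("positive group:", positive);
console.log("negative group:", negative);
```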

Are these patterns true? Let’s look at the map in several views.

[Image: Renters]
First, look at the Renters layer. This layer shows us that many people in the West End (the red part in top left), a densely packed area of 100,000 people in 100 square blocks, rent their dwelling units.

[Image: Population 14 and under]
Next, look at the Population – 14 years and under layer. This layer tells us that in the south, Population 14 years and under is high, while in the West End it is low.

[Image: Large family]
Finally, look at the Family – average size layer. This layer is similar to the Population – 14 years and under layer; the south part of Vancouver has large families, while in the West End, families are not large.

These layers confirm the patterns that the eigenvector indicated. PCA “makes sense” of data by showing us patterns in the data. That is its value.

Now if you have been paying attention, you might say, “Whoa! Why do we need PCA when we can look at a map and get an even more accurate picture?” Well, that’s another question, one that I don’t have time to answer here. Gotta get back to my thesis.

Now where are those eigenvectors again…?