Using Python to geocode data in SPSS

By Archive User posted Fri March 14, 2014 10:42 AM

  


This is the first time since I started using SPSS that I have regular access to Python and R programmability everywhere I use SPSS (at home and on multiple work computers). So I've been exploring ways to use these tools in regular data analysis and workflows, mainly to accomplish things that cannot be done directly in native SPSS code.


The example I am going to show today uses geopy, a Python library that wraps several geocoding APIs in one convenient set of scripts. Once geopy is installed, you can call Python code within SPSS by placing it between BEGIN PROGRAM and END PROGRAM statements. Here is an example modified from geopy's tutorial.



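BEGIN PROGRAM.
from geopy import geocoders
# Geocoder backed by the Google Geocoding API (V3).
g = geocoders.GoogleV3()
# geocode gives back the matched place name and a (latitude, longitude) pair.
place, (lat, lng) = g.geocode("135 Western Ave. Albany, NY")
a = [place, lat, lng]
print(a)
END PROGRAM.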



Now what we want to do is to geocode some address data that is currently stored in SPSS case data. So here is an example dataset with some addresses in Albany.



DATA LIST LIST ("|") / Address (A100).
BEGIN DATA
135 Western Ave. Albany, NY
Western Ave. and Quail St Albany, NY
325 Western Ave. Albany, NY
END DATA.
DATASET NAME Add.



Here I will use the handy SPSSINC TRANS extension command (provided when installing Python programmability, and as of SPSS 22 installed by default with SPSS) to return the geocoded coordinates using the Google API. The geocode function from geopy does not return the data in exactly the shape I want, so I create my own function, named g, which collects the individual objects (place, lat and lng) into a list and returns that.



BEGIN PROGRAM.
from geopy import geocoders
def g(a):
    # Look up address a and return [place, lat, lng] as a flat list.
    geocoder = geocoders.GoogleV3()
    place, (lat, lng) = geocoder.geocode(a)
    return [place, lat, lng]
print(g("135 Western Ave. Albany, NY"))
END PROGRAM.



Now I can use SPSSINC TRANS to return the associated place string, as well as the latitude and longitude coordinates from Google. In the command, TYPE=100 0 0 declares Place as a 100-character string and Lat and Lng as numeric variables.



SPSSINC TRANS RESULT=Place Lat Lng TYPE=100 0 0
/FORMULA g(Address).
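
One case the formula above does not cover is an address that cannot be matched. geopy's geocode returns None when nothing is found, so the tuple unpacking inside g would raise an error partway through the dataset. Below is a minimal sketch of a more defensive version. The names safe_g, lookup and fake_lookup are hypothetical: lookup stands in for a geocoder's geocode method so the sketch can run offline, and the place and coordinates in fake_lookup are dummy values, not real geocoding results. It also assumes that returning an empty string and None leaves the string blank and the numeric variables system-missing in SPSSINC TRANS, which is worth checking in your own installation.

```python
def safe_g(address, lookup):
    """Return [place, lat, lng]; blank/None values if the lookup fails.

    `lookup` is any callable that behaves like geopy's geocode method
    (a hypothetical parameter, passed in so the sketch runs offline).
    """
    try:
        result = lookup(address)
    except Exception:
        # Network or quota errors are treated the same as "no match".
        result = None
    if result is None:
        return ["", None, None]
    place, (lat, lng) = result
    return [place, lat, lng]


def fake_lookup(address):
    # Offline stand-in for a geocoder: dummy values, not real coordinates.
    known = {"135 Western Ave. Albany, NY": ("matched place", (1.0, 2.0))}
    return known.get(address)


print(safe_g("135 Western Ave. Albany, NY", fake_lookup))
print(safe_g("No such place", fake_lookup))
```

Inside a BEGIN PROGRAM block, lookup would be the geocode method of a real geopy geocoder instead of fake_lookup.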



Pretty easy. Note that (I believe) the Google geocoding API has a limit of 2,500 requests per day, so don't go submitting a million cases to be geocoded (use an offline solution for that). Also, a mandatory mention should be made of the variable reliability of online geocoding services.
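
One way to stay further from that limit is to avoid geocoding the same address twice and to space out the requests that do go out. The sketch below is one possible approach under stated assumptions, not something geopy or SPSS requires: make_throttled_cached, counting_lookup and the min_delay value are hypothetical names and settings chosen for illustration, and the delay is not a documented Google requirement. The lookup is again a stand-in so the example runs without a network connection.

```python
import time


def make_throttled_cached(lookup, min_delay=1.0, sleep=time.sleep, clock=time.time):
    """Wrap `lookup` so repeated addresses come from a cache and real
    calls are spaced at least `min_delay` seconds apart."""
    cache = {}
    last_call = [None]  # a list so the nested function can update it

    def wrapped(address):
        if address in cache:
            return cache[address]  # no request is made for a repeat
        if last_call[0] is not None:
            wait = min_delay - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        cache[address] = lookup(address)
        return cache[address]

    return wrapped


calls = []


def counting_lookup(address):
    # Offline stand-in that records how many real lookups happen.
    calls.append(address)
    return [address.upper(), 0.0, 0.0]


geocode_once = make_throttled_cached(counting_lookup, min_delay=0.0)
for addr in ["a st", "b st", "a st"]:
    geocode_once(addr)
print(len(calls))  # 2: the repeated address is served from the cache
```

With real data the saving depends on how many duplicate addresses the file contains, so sorting out duplicates before geocoding may do just as well.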










#data-manipulation
#geocoding
#mapping
#python
#SPSS
#SPSSStatistics
1 comment
Comments

Sat March 15, 2014 07:48 AM

Nice.

Jon Peck (no "h") aka Kim Senior Software Engineer, IBM peck@us.ibm.com phone: 720-342-5621