Using the New York State Online Geocoding API with Python

By Archive User posted Thu April 02, 2015 08:20 AM

  
I've been very lucky doing geographic analysis in New York State, as the majority of base map layers I need, and in particular street centerline files for geocoding, are available statewide at the NYS GIS Clearinghouse. I've written in the past about how to use various Google APIs for geo data, and here I will show how one can use the NYS SAM Address database and their ESRI online geocoding service. I explored this because Google's terms of service are restrictive, and the NYS composite locator should (in theory) be more comprehensive and up to date in its matches.

So first, this works like most online APIs (at least in my limited experience): submit a particular URL and get JSON in return. You then just need to parse the JSON for whatever info you need. This is meant to be used within SPSS, but the function works with just a single-field address string and returns the single top hit as a list of length 3: the unicode address string, then the x and y coordinates. (The function is of course a valid Python function, so you could use it in any environment you want.) The coordinates are specified using ESRI's WKID (see the list for projected and geographic coordinate systems). In the code the output is set to WKID 4326, which is WGS 1984, so it returns the longitude and latitude for the address. When the search returns no hits, it just returns the list [None,None,None].
*Function to use NYS geocoding API.
BEGIN PROGRAM Python.
import urllib, json

def ParsNYGeo(jBlob):
  # Return [address, x, y] for the top candidate, or three Nones when nothing matched
  if not jBlob['candidates']:
    data = [None,None,None]
  else:
    add = jBlob['candidates'][0]['address']
    y = jBlob['candidates'][0]['location']['y']
    x = jBlob['candidates'][0]['location']['x']
    data = [add,x,y]
  return data

def NYSGeo(Add, WKID=4326):
  base = "http://gisservices.dhses.ny.gov/arcgis/rest/services/Locators/SAM_composite/GeocodeServer/findAddressCandidates?SingleLine="
  # WKID sets the output spatial reference (default 4326 = WGS 1984, lon/lat)
  wkid = "&maxLocations=1&outSR=" + str(WKID)
  end = "&f=pjson"
  # Spaces in the address become '+' so the string is usable in a URL
  mid = Add.replace(' ','+')
  MyUrl = base + mid + wkid + end
  soup = urllib.urlopen(MyUrl)
  jsonRaw = soup.read()
  jsonData = json.loads(jsonRaw)
  MyDat = ParsNYGeo(jsonData)
  return MyDat

t1 = "100 Washington Ave, Albany, NY"
t2 = "100 Washington Ave, Poop"

Out = NYSGeo(t1)
print Out

Empt = NYSGeo(t2)
print Empt
END PROGRAM.
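
The function above is written for Python 2 (print statements, urllib.urlopen). For anyone running Python 3, here is a rough sketch of the same idea, not a tested replacement. The names build_url, parse_top_hit and nys_geo are my own, the sample reply is made up (only its shape follows the JSON the code above reads), and I have not checked whether the endpoint is still live. It uses urlencode to escape the address instead of swapping spaces for plus signs by hand.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the post; not verified to still be available.
BASE = ("http://gisservices.dhses.ny.gov/arcgis/rest/services/Locators/"
        "SAM_composite/GeocodeServer/findAddressCandidates")

def build_url(address, wkid=4326):
    """Assemble the request URL, letting urlencode escape the address."""
    params = {"SingleLine": address, "maxLocations": 1,
              "outSR": wkid, "f": "pjson"}
    return BASE + "?" + urlencode(params)

def parse_top_hit(blob):
    """Return [address, x, y] for the first candidate, or three Nones."""
    candidates = blob.get("candidates") or []
    if not candidates:
        return [None, None, None]
    top = candidates[0]
    return [top["address"], top["location"]["x"], top["location"]["y"]]

def nys_geo(address, wkid=4326, timeout=10):
    """Query the service and parse the reply (needs network access)."""
    with urlopen(build_url(address, wkid), timeout=timeout) as resp:
        return parse_top_hit(json.loads(resp.read().decode("utf-8")))

# Offline check using a made-up reply (hypothetical coordinates).
sample = {"candidates": [{"address": "100 Washington Ave, Albany, NY",
                          "location": {"x": -73.76, "y": 42.65},
                          "score": 100}]}
print(build_url("100 Washington Ave, Albany, NY"))
print(parse_top_hit(sample))
print(parse_top_hit({"candidates": []}))
```

Splitting the URL building and the parsing away from the network call makes both pieces easy to check without hitting the service.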

So you can see in the code sample that you need both the street address and the city in one field. And here is a quick example with some data in SPSS. Just the zip code doesn't return any results. There are some funny results in this test run though, and yes, that Washington Ave. Extension has caused me geocoding headaches in the past.
*Example using with SPSS data.
DATA LIST FREE / MyAdd (A100).
BEGIN DATA
"100 Washington Ave, Albany"
"100 Washinton Ave, Albany"
"100 Washington Ave, Albany, NY 12203"
"100 Washington Ave, Albany, NY, 12203"
"100 Washington Ave, Albany, NY 12206"
"100 Washington Ave, Poop"
"12222"
END DATA.
DATASET NAME NY_Add.

SPSSINC TRANS RESULT=GeoAdd lon lat TYPE=100 0 0
/FORMULA NYSGeo(Add=MyAdd).

LIST ALL.
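
The SPSSINC TRANS call above runs the function once per case and spreads the three returned values over GeoAdd, lon and lat. Outside of SPSS the same row-by-row pattern could look roughly like the sketch below. Here geocode_rows and fake_geocode are hypothetical names, and fake_geocode is only a stand-in with made-up coordinates so the example runs without network access; in practice you would pass in a real geocoding function.

```python
def geocode_rows(addresses, geocoder):
    """Apply geocoder to each address; return (input, match, lon, lat) rows."""
    rows = []
    for raw in addresses:
        addr = raw.strip()  # drop padding, in case the strings are space-padded
        match, lon, lat = geocoder(addr)
        rows.append((addr, match, lon, lat))
    return rows

def fake_geocode(address):
    """Hypothetical stand-in: 'matches' only addresses that mention Albany."""
    if "Albany" in address:
        return [address.upper(), -73.76, 42.65]  # made-up coordinates
    return [None, None, None]

rows = geocode_rows(["100 Washington Ave, Albany   ", "12222"], fake_geocode)
for row in rows:
    print(row)
```

Unmatched addresses keep their row with None in the three result fields, which mirrors the [None,None,None] the function returns when there are no hits.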








#geocoding
#geospatial
#Programmability
#python
#SPSS
#SPSSStatistics