I've been very lucky doing geographic analysis in New York State, as the majority of the base map layers I need, in particular street centerline files for geocoding, are available statewide at the NYS GIS Clearinghouse. I've written in the past about how to use various Google APIs for geo data, and here I will show how to use the NYS SAM Address database and its ESRI online geocoding service. I explored this because Google's terms of service are restrictive, and the NYS composite locator should (in theory) be more comprehensive and up to date in its matches.
This works basically the same as most online APIs (at least in my limited experience): submit a particular URL and get JSON in return. You then just need to parse the JSON for whatever info you need. This is meant to be used within SPSS, but the function takes a single address string and returns the single top hit as a list of length 3: the unicode address string, then the x and y coordinates. (The function is of course a valid Python function, so you could use it in any environment you want.) The coordinates are specified using ESRI's WKID (see the lists for projected and geographic coordinate systems). The code requests WKID 4326, which is WGS 1984, and so returns the longitude and latitude for the address. When the search returns no hits, it just returns the list [None,None,None].
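To make the shape of the JSON concrete, here is a small offline sketch of the parsing step. The sample text is hand-written to mimic the reply layout (a "candidates" list whose items carry "address" and "location"); the address and coordinates in it are placeholders, not real service output, and the helper name top_hit is hypothetical.

```python
import json

# Hand-written sample shaped like a findAddressCandidates reply.
# The address text and coordinates are placeholders, NOT real service output.
SAMPLE = '''
{
  "spatialReference": {"wkid": 4326},
  "candidates": [
    {"address": "100 Example St, Sampletown, NY",
     "location": {"x": -73.5, "y": 42.5},
     "score": 100}
  ]
}
'''
# A reply with no matches still has the key, just with an empty list.
EMPTY = '{"spatialReference": {"wkid": 4326}, "candidates": []}'

def top_hit(raw):
    """Return [address, x, y] for the first candidate, or three Nones."""
    blob = json.loads(raw)
    candidates = blob.get("candidates") or []
    if not candidates:
        return [None, None, None]
    first = candidates[0]
    return [first["address"], first["location"]["x"], first["location"]["y"]]

print(top_hit(SAMPLE))
print(top_hit(EMPTY))
```

With outSR set to 4326, x is the longitude and y is the latitude, which is why the list is ordered address, x, y.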
*Function to use NYS geocoding API.
BEGIN PROGRAM Python.
import json
try:
    from urllib import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
def ParsNYGeo(jBlob):
    # Return [address, x, y] for the top candidate, or three Nones if no match
    if not jBlob['candidates']:
        data = [None,None,None]
    else:
        add = jBlob['candidates'][0]['address']
        y = jBlob['candidates'][0]['location']['y']
        x = jBlob['candidates'][0]['location']['x']
        data = [add,x,y]
    return data
def NYSGeo(Add, WKID=4326):
    base = "http://gisservices.dhses.ny.gov/arcgis/rest/services/Locators/SAM_composite/GeocodeServer/findAddressCandidates?SingleLine="
    # outSR is the spatial reference of the returned coordinates (default WGS 1984)
    wkid = "&maxLocations=1&outSR=" + str(WKID)
    end = "&f=pjson"
    mid = Add.replace(' ','+')
    MyUrl = base + mid + wkid + end
    soup = urlopen(MyUrl)
    jsonRaw = soup.read()
    jsonData = json.loads(jsonRaw)
    MyDat = ParsNYGeo(jsonData)
    return MyDat
# One address that should match and one that should not
t1 = "100 Washington Ave, Albany, NY"
t2 = "100 Washington Ave, Poop"
Out = NYSGeo(t1)
print(Out)
Empt = NYSGeo(t2)
print(Empt)
END PROGRAM.
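The function above only swaps spaces for plus signs, so commas and other special characters in an address go into the URL unescaped. A possible alternative, sketched here under the assumption that the service accepts standard percent-encoded query strings, is to let the standard library build the query. The name build_geocode_url is hypothetical, and the sketch only constructs the URL, it does not call the service.

```python
# Python 3 import; on Python 2 the same function lives at urllib.urlencode.
from urllib.parse import urlencode

BASE = ("http://gisservices.dhses.ny.gov/arcgis/rest/services/Locators/"
        "SAM_composite/GeocodeServer/findAddressCandidates")

def build_geocode_url(address, wkid=4326, max_locations=1):
    """Build the request URL with every parameter percent-encoded."""
    params = [
        ("SingleLine", address),          # full address in one field
        ("maxLocations", max_locations),  # only ask for the top hit
        ("outSR", wkid),                  # spatial reference of the output
        ("f", "pjson"),                   # JSON reply
    ]
    return BASE + "?" + urlencode(params)

print(build_geocode_url("100 Washington Ave, Albany, NY"))
```

Spaces still become plus signs, but commas are escaped as %2C, and a different WKID can be passed through the wkid argument, assuming the service supports that output reference.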
As the code sample shows, you need both the street address and the city in one field. Here is a quick example with some data in SPSS. The zip code on its own doesn't return any results. There are some funny results in this test run, though, and yes, that Washington Ave. Extension has caused me geocoding headaches in the past.
*Example using with SPSS data.
DATA LIST FREE / MyAdd (A100).
BEGIN DATA
"100 Washington Ave, Albany"
"100 Washinton Ave, Albany"
"100 Washington Ave, Albany, NY 12203"
"100 Washington Ave, Albany, NY, 12203"
"100 Washington Ave, Albany, NY 12206"
"100 Washington Ave, Poop"
"12222"
END DATA.
DATASET NAME NY_Add.
SPSSINC TRANS RESULT=GeoAdd lon lat TYPE=100 0 0
/FORMULA NYSGeo(Add=MyAdd).
LIST ALL.
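Outside SPSS, the same row-by-row pattern can be sketched in plain Python. The names geocode_all and fake_geocoder are hypothetical; the fake stands in for NYSGeo so the example runs without a network connection, and its output is made up. Catching errors per row is my own assumption about how you would want a failed request handled, so one bad address does not stop the whole run.

```python
def geocode_all(addresses, geocoder):
    """Apply geocoder to each address and return [input, address, x, y] rows."""
    rows = []
    for add in addresses:
        try:
            # strip() guards against addresses padded with blanks
            hit = geocoder(add.strip())
        except Exception:
            # Treat any failed request like a non-match
            hit = [None, None, None]
        rows.append([add] + list(hit))
    return rows

def fake_geocoder(address):
    # Stand-in for NYSGeo: "matches" anything mentioning Albany, with dummy coordinates
    if "Albany" in address:
        return [address.upper(), 0.0, 0.0]
    return [None, None, None]

test_adds = ["100 Washington Ave, Albany", "100 Washington Ave, Poop", "12222"]
for row in geocode_all(test_adds, fake_geocoder):
    print(row)
```

Passing NYSGeo in place of fake_geocoder would send one request per address to the live service.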