SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers!

View Only

Back to discussions

Expand all | Collapse all

Address Matching

1. Address Matching

Like
Art Jack
Posted Mon March 14, 2022 12:28 PM

Reply
Morning,
I have two files that I need to match on address. One file has address in one field with varying degrees of format. The second file has both parsed out and single fields with a pretty standard format. I'm trying to figure out what would be an efficient way to merge the two. Thanks Arthur.

------------------------------
Art Jack
------------------------------

#SPSSStatistics
2. RE: Address Matching

Like
Jon Peck

IBM Champion
Posted Mon March 14, 2022 12:37 PM

Reply
You will need to make a variable or set of variables that match in the two files (and then use MATCH FILES). The details on how to do that depend on the formats of the address items. If you can supply samples from the two files, we may be able to help.

--
Jon K Peck
jkpeck@gmail.com

Original Message
3. RE: Address Matching

Like
Rick Marcantonio
Posted Mon March 14, 2022 12:38 PM

Reply
Hi. So in one file you have one variable, ADDRESS, that has something like: "123 Main Street Chicago IL 60014" and in the other you have maybe 4 fields, for example, "123 Main Street" "Chicago" , "IL", "60014"?

If so, then my advice would be in the 2nd file, use CONCAT to create a new string variable named ADDRESS with a consistent format (the A width of the string in the first file), then MATCH the files on that key. If case is a problem, then in both files, create a new key variable based on ADDRESS that is in all lower or upper case.

------------------------------
Rick Marcantonio
Quality Assurance
IBM
------------------------------

Original Message
4. RE: Address Matching

Like
Art Jack
Posted Mon March 14, 2022 01:14 PM

Reply
Below is sort of example. In one it's standard with parsed fields & concat. The second file is one field but formatting is not standard.

file_1_address file_1A file_1b file_1c file_1d file_1e file_2_address

123 Main Dr, 265, 37244 123 Main Dr 265 37244 The Apartments at Chicago Land, 123 Main DR, APT 265

------------------------------
Art Jack
------------------------------

Original Message
5. RE: Address Matching

Like
Rick Marcantonio
Posted Mon March 14, 2022 01:39 PM

Reply
Interesting. Clearly a 1-to-1 match is out, case issues aside.

Looks like multiple keys are needed. You need to create separate string variables in file2 that are the same as those in file1, and like file1 they have to be the same "things": street_number, street, city, state, zip, etc.

I guess I'd try breaking up the file 2 addresses into those variables using (CHAR) SUBSTR and INDEX. I'm not sure what to break them up by; not spaces, maybe commas. I can't know without looking at a lot of the addresses. Obviously, there may be parts of the file 2 address that you won't be able to use; that "The Apartments at Chicago Land" part of file2 address will have no match in file1, at least for this case.

------------------------------
Rick Marcantonio
Quality Assurance
IBM
------------------------------

Original Message
6. RE: Address Matching

Like
Jon Peck

IBM Champion
Posted Mon March 14, 2022 01:45 PM

Reply
It's really going to depend on how much variation there is in the formats. I would be inclined to use Python pattern match along with the SPSSINC TRANS extension command, but the pattern variations need to be defined. It might not be possible to get 100%, but a high percentage is conceivable.

--
Jon K Peck
jkpeck@gmail.com

Original Message
7. RE: Address Matching

Like
Rick Marcantonio
Posted Mon March 14, 2022 01:54 PM

Reply
Jon - I guess I don't know enough about pattern matching in Python, but wouldn't the files have to be already joined to do that? What are you pattern-matching in each of the two separate files before you MATCH them? I was thinking of using Python to create a "checksum" variable on the address for each case, but I still don't know what the resulting MATCH on that checksum var would look like. Maybe I'll experiment when I find the time. ;)

------------------------------
Rick Marcantonio
Quality Assurance
IBM
------------------------------

Original Message
8. RE: Address Matching

Like
Jon Peck

IBM Champion
Posted Mon March 14, 2022 02:03 PM

Reply
What I would do would be to "canonicalize" the addresses in both files. For example, you might discard everything before the first numerical sequence and take those numbers as the street number and the following items up to a comma as the street name (and lower case), etc.

--
Jon K Peck
jkpeck@gmail.com

Original Message
9. RE: Address Matching

Like
Rick Marcantonio
Posted Mon March 14, 2022 02:07 PM

Reply
OK, I see.

------------------------------
Rick Marcantonio
Quality Assurance
IBM
------------------------------

Original Message
10. RE: Address Matching

Like
Art Jack
Posted Mon March 14, 2022 02:33 PM

Reply
Is there an example of that somewhere? Thanks.

------------------------------
Art Jack
------------------------------

Original Message
11. RE: Address Matching

Like
Jon Peck

IBM Champion
Posted Mon March 14, 2022 02:43 PM

Reply
It would need some Python tweaking. If you can send me samples of the two files, I'll see if I can come up with something (jkpeck@gmail.com).

--
Jon K Peck
jkpeck@gmail.com

Original Message
12. RE: Address Matching

Like
Jeannette Magne
Posted Tue March 15, 2022 11:27 AM

Reply
Hello,

I'm saler on the IBM SPSS

------------------------------

Jeannette MAGNE
Business Intelligence & Predictive Analytics
IBM Hybrid Cloud
Tel. +33 (0)6 31 05 61 04
Cognos : https://youtu.be/NvKjf6JM-ZU
Contact Support : https://www.ibm.com/analytics/us/en/spss/spss-support/index.htmlor https://www.ibm.com/mysupport/s/

------------------------------

Original Message

SPSS Statistics

SPSS Statistics

Address Matching

Art JackMon March 14, 2022 12:28 PM

Jon PeckMon March 14, 2022 12:37 PM

Rick MarcantonioMon March 14, 2022 12:38 PM

Art JackMon March 14, 2022 01:14 PM

Rick MarcantonioMon March 14, 2022 01:39 PM

Jon PeckMon March 14, 2022 01:45 PM

Rick MarcantonioMon March 14, 2022 01:54 PM

Jon PeckMon March 14, 2022 02:03 PM

Rick MarcantonioMon March 14, 2022 02:07 PM

Art JackMon March 14, 2022 02:33 PM

Jon PeckMon March 14, 2022 02:43 PM

Jeannette MagneTue March 15, 2022 11:27 AM

1. Address Matching

2. RE: Address Matching

3. RE: Address Matching

4. RE: Address Matching

5. RE: Address Matching

6. RE: Address Matching

7. RE: Address Matching

8. RE: Address Matching

9. RE: Address Matching

10. RE: Address Matching

11. RE: Address Matching

12. RE: Address Matching

Additional
Resources

Office

Quick Links

file_1_address	file_1A	file_1b	file_1c	file_1d	file_1e	file_2_address
123 Main Dr, 265, 37244	123	Main	Dr	265	37244	The Apartments at Chicago Land, 123 Main DR, APT 265

SPSS Statistics

SPSS Statistics

Address Matching

Art JackMon March 14, 2022 12:28 PM

Jon PeckMon March 14, 2022 12:37 PM

Rick MarcantonioMon March 14, 2022 12:38 PM

Art JackMon March 14, 2022 01:14 PM

Rick MarcantonioMon March 14, 2022 01:39 PM

Jon PeckMon March 14, 2022 01:45 PM

Rick MarcantonioMon March 14, 2022 01:54 PM

Jon PeckMon March 14, 2022 02:03 PM

Rick MarcantonioMon March 14, 2022 02:07 PM

Art JackMon March 14, 2022 02:33 PM

Jon PeckMon March 14, 2022 02:43 PM

Jeannette MagneTue March 15, 2022 11:27 AM

1. Address Matching

2. RE: Address Matching

3. RE: Address Matching

4. RE: Address Matching

5. RE: Address Matching

6. RE: Address Matching

7. RE: Address Matching

8. RE: Address Matching

9. RE: Address Matching

10. RE: Address Matching

11. RE: Address Matching

12. RE: Address Matching

Additional Resources

Office

Quick Links

Additional
Resources