Data Wrangling in Python – Geocoding and Reverse Geocoding

Geocoding and Reverse Geocoding

Geocoding (converting a physical address or location into latitude/longitude) and reverse geocoding (converting a lat/long to a physical address or location) are common tasks when working with geo-data.

Python offers a number of packages that make these tasks easy. In the tutorial below, I use pygeocoder, a wrapper for Google's Geocoding API, to both geocode and reverse geocode.
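Under the hood, a wrapper like pygeocoder talks to Google's Geocoding web service, an HTTPS endpoint that accepts either an address parameter (forward geocoding) or a latlng parameter (reverse geocoding), plus an API key. The sketch below only builds those request URLs and sends nothing. The helper names geocode_url and reverse_geocode_url are hypothetical, "YOUR_API_KEY" is a placeholder, and pygeocoder may assemble its requests differently.

```python
from urllib.parse import urlencode

# Base endpoint of Google's Geocoding web service (JSON output)
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str) -> str:
    """Build a forward-geocoding request URL for a free-text address (hypothetical helper)."""
    return f"{GEOCODE_URL}?{urlencode({'address': address, 'key': api_key})}"


def reverse_geocode_url(lat: float, lon: float, api_key: str) -> str:
    """Build a reverse-geocoding request URL for a lat/long pair (hypothetical helper)."""
    return f"{GEOCODE_URL}?{urlencode({'latlng': f'{lat},{lon}', 'key': api_key})}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key
    print(geocode_url("4207 N Washington Ave, Douglas, AZ 85607", "YOUR_API_KEY"))
    print(reverse_geocode_url(31.336968, -109.560959, "YOUR_API_KEY"))
```

urlencode percent-escapes the comma and turns spaces into +, which is why building the query string this way is safer than pasting the address into the URL by hand.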

Preliminaries

First we load the packages we will use in the script: pygeocoder for its geocoding functionality, pandas for its dataframe structures, and numpy for its missing value (np.nan).


# Load packages
from pygeocoder import Geocoder
import pandas as pd
import numpy as np

Create some simulated geo data

Geo-data comes in a wide variety of forms. In this case we have a Python dictionary of five latitude/longitude strings, each holding one coordinate pair separated by a comma.

# Create a dictionary of raw data
data = {'Site 1': '31.336968, -109.560959',
        'Site 2': '31.347745, -108.229963',
        'Site 3': '32.277621, -107.734724',
        'Site 4': '31.655494, -106.420484',
        'Site 5': '30.295053, -104.014528'}

This step is technically unnecessary, but because I originally come from R, I am a big fan of dataframes, so let us turn the dictionary of simulated data into one.

# Convert the dictionary into a pandas dataframe
df = pd.DataFrame.from_dict(data, orient='index')
# View the dataframe
df
                             0
Site 1  31.336968, -109.560959
Site 2  31.347745, -108.229963
Site 3  32.277621, -107.734724
Site 4  31.655494, -106.420484
Site 5  30.295053, -104.014528
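The single column above is labelled 0 because the dictionary values have no name. A minimal sketch, assuming pandas 0.23 or later (where from_dict accepts columns together with orient='index'), gives that column a readable name; the label coords is my own choice, not part of the original tutorial.

```python
import pandas as pd

# Same raw coordinate strings as the tutorial's dictionary
data = {'Site 1': '31.336968, -109.560959',
        'Site 2': '31.347745, -108.229963',
        'Site 3': '32.277621, -107.734724',
        'Site 4': '31.655494, -106.420484',
        'Site 5': '30.295053, -104.014528'}

# orient='index' turns the dict keys into the row index; columns= names the
# single value column instead of leaving the default label 0
df = pd.DataFrame.from_dict(data, orient='index', columns=['coords'])
print(df)
```

With this naming, later steps would read df['coords'] instead of df[0].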

You can see that we now have a dataframe with five rows, each containing a string of latitude and longitude. Before we can work with the data, we need to 1) split each string into latitude and longitude and 2) convert both parts to floats. The loop below does just that.

# Create two lists to hold the loop results
lat = []
lon = []

# For each row in the coordinate column,
for row in df[0]:
    # try to
    try:
        # split the row on the comma into exactly two parts
        lat_str, lon_str = row.split(',')
        # and convert both parts to floats
        lat_val, lon_val = float(lat_str), float(lon_str)
    # but if the row is not a string or does not hold two numbers,
    except (AttributeError, ValueError):
        # use a missing value for both coordinates
        lat_val, lon_val = np.nan, np.nan
    # Append the pair together so lat and lon always stay the same length
    lat.append(lat_val)
    lon.append(lon_val)

# Create two new columns from lat and lon
df['latitude'] = lat
df['longitude'] = lon

Let's take a look at what we have now.

# View the dataframe
df
                             0   latitude    longitude
Site 1  31.336968, -109.560959  31.336968  -109.560959
Site 2  31.347745, -108.229963  31.347745  -108.229963
Site 3  32.277621, -107.734724  32.277621  -107.734724
Site 4  31.655494, -106.420484  31.655494  -106.420484
Site 5  30.295053, -104.014528  30.295053  -104.014528

Awesome. This is exactly what we want to see, one column of floats for latitude and one column of floats for longitude.
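For larger tables, pandas can do the split and the conversion without an explicit Python loop. A minimal sketch, assuming a reasonably recent pandas: Series.str.split with expand=True produces one column per part, and pd.to_numeric with errors='coerce' turns anything unparseable into NaN, mirroring the try/except fallback in the loop. Site 3 here is a deliberately malformed, made-up row.

```python
import pandas as pd

data = {'Site 1': '31.336968, -109.560959',
        'Site 2': '31.347745, -108.229963',
        'Site 3': 'not a coordinate'}  # deliberately malformed row

df = pd.DataFrame.from_dict(data, orient='index')

# Split every string on the comma into two columns in one vectorized call;
# a row without a comma gets a missing value in the second column
parts = df[0].str.split(',', expand=True)

# Strip surrounding spaces, then convert; errors='coerce' yields NaN on failure
df['latitude'] = pd.to_numeric(parts[0].str.strip(), errors='coerce')
df['longitude'] = pd.to_numeric(parts[1].str.strip(), errors='coerce')
print(df)
```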

Reverse Geocoding

To reverse geocode, we pass a specific latitude and longitude pair, in this case the first row (position 0), to pygeocoder's reverse_geocode function.

# Convert latitude and longitude to a location;
# .iloc[0] selects the first row by position, since the index labels are 'Site 1' to 'Site 5'
results = Geocoder.reverse_geocode(df['latitude'].iloc[0], df['longitude'].iloc[0])
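The call above looks up a single site. To look up every site, one option is to loop over the rows and pass each pair to the same kind of two-argument function. A minimal sketch: reverse_geocode_all, fake_reverse, and the pause parameter are hypothetical names of my own; in real use you would pass Geocoder.reverse_geocode as reverse_fn, and the fake lookup exists only so the example runs offline.

```python
import math
import time

import pandas as pd


def reverse_geocode_all(df, reverse_fn, pause=0.0):
    """Call reverse_fn(lat, lon) for every row and return one result per row.

    Rows with missing coordinates, or whose lookup raises, get None.
    pause (seconds) spaces out requests to stay under API rate limits.
    """
    results = []
    for lat, lon in zip(df['latitude'], df['longitude']):
        if math.isnan(lat) or math.isnan(lon):
            results.append(None)
            continue
        try:
            results.append(reverse_fn(lat, lon))
        except Exception:  # network or API error: record a miss and keep going
            results.append(None)
        time.sleep(pause)
    return results


# Offline stand-in for a real lookup such as Geocoder.reverse_geocode
def fake_reverse(lat, lon):
    return f"place near ({lat:.2f}, {lon:.2f})"


demo = pd.DataFrame({'latitude': [31.336968, float('nan')],
                     'longitude': [-109.560959, -108.229963]},
                    index=['Site 1', 'Site 2'])
places = reverse_geocode_all(demo, fake_reverse)
demo['place'] = places
print(demo)
```

Catching every exception keeps a long batch running when one request fails; if you prefer failures to surface, narrow the except clause to the error types your geocoder raises.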

Now we can start pulling out the data that we want.

# Print the lat/long
results.coordinates
(31.3372728, -109.5609559)
# Print the city
results.city
'Douglas'
# Print the country
results.country
'United States'
# Print the street address (if applicable; nothing is displayed here because none was returned)
results.street_address
# Print the admin1 level (state)
results.administrative_area_level_1
'Arizona'
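When you want these fields side by side (for example, one row per site), it can help to copy them into a plain dict. A minimal sketch: result_to_record and FIELDS are hypothetical names, and the SimpleNamespace object is an offline stand-in that holds the values printed above, not a live API result. getattr with a default of None keeps every record the same shape even if an attribute is missing.

```python
from types import SimpleNamespace

# Attribute names used in the tutorial above
FIELDS = ['coordinates', 'city', 'country', 'street_address',
          'administrative_area_level_1']


def result_to_record(result, fields=FIELDS):
    """Copy the named attributes of a geocoding result into a dict, defaulting to None."""
    return {name: getattr(result, name, None) for name in fields}


# Offline stand-in holding the values shown above (not a real API call)
fake_result = SimpleNamespace(coordinates=(31.3372728, -109.5609559),
                              city='Douglas', country='United States',
                              street_address=None,
                              administrative_area_level_1='Arizona')
print(result_to_record(fake_result))
```

A list of such records, one per site, can then be passed straight to pd.DataFrame to get a tidy table.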

Geocoding

For geocoding, we pass a string containing an address or location (such as a city) to the geocode function. However, not every string is formatted in a way that Google's Geocoding API can interpret. We can test whether an input is valid with the valid_address attribute of the object that .geocode() returns.

# Verify that an address is valid (i.e. recognized by Google's system)
Geocoder.geocode("4207 N Washington Ave, Douglas, AZ 85607").valid_address
True

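Because the output was True, we know that this is a valid address. Note that the snippet below reads results, the object returned by the reverse geocoding step, so the coordinates it prints are those returned for Site 1; to get the coordinates of the address itself, store the return value of Geocoder.geocode() and read its .coordinates attribute in the same way.

# Print the lat/long held in results (the reverse geocoding result for Site 1)
results.coordinates
(31.3372728, -109.5609559)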

Even more interesting, once the Google Geocoding API has processed an address, we can parse the result and easily separate street numbers, street names, and other components.

# Geocode a specific address
result = Geocoder.geocode("7250 South Tucson Boulevard, Tucson, AZ 85756")
# Print the street number
result.street_number
'7250'
# Print the street name
result.route
'South Tucson Boulevard'
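Roughly speaking, attributes such as street_number and route correspond to entries in the address_components list of a Geocoding API JSON result, where each component carries long_name, short_name, and a list of types. Whether pygeocoder looks them up exactly this way is an assumption. The sketch below works on a hand-written sample dict shaped like one API result; it is not a real response, and the component helper name is hypothetical.

```python
def component(result, wanted_type, key='long_name'):
    """Return the first address component whose 'types' include wanted_type, else None."""
    for comp in result.get('address_components', []):
        if wanted_type in comp.get('types', []):
            return comp.get(key)
    return None


# Hand-written sample shaped like one entry of the API's "results" list,
# trimmed to two components; NOT copied from a real response
sample = {
    'address_components': [
        {'long_name': '7250', 'short_name': '7250', 'types': ['street_number']},
        {'long_name': 'South Tucson Boulevard', 'short_name': 'S Tucson Blvd',
         'types': ['route']},
    ]
}
print(component(sample, 'street_number'))
print(component(sample, 'route'))
```

Matching on types rather than on list position matters because the number and order of components vary from one address to the next.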

And there you have it. Python makes this entire process easy, and adding it to an analysis only takes a few minutes. Good luck!


Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.  

Google –> SETScholars