Buffalo Car Thefts

Animated Buffalo Car Theft Geospatial Heatmap Using Buffalo OpenData API

September 10, 2023 · 480 mins read

Summary

This project demonstrates the power of geospatial heatmaps in data science, using Folium to visualize monthly car theft patterns in Buffalo from January 2009 to the present. Geospatial heatmaps have become a pivotal tool in the data science toolkit, enabling the exploration and communication of intricate spatial trends.

We use Folium, a robust Python mapping library, to build an interactive geospatial heatmap. The heatmap employs a gradient color scheme: areas shaded blue have lower incidences of car theft, while red areas indicate higher concentrations. This color coding makes complex data accessible, letting viewers grasp patterns and variations at a glance.

What sets this project apart is the animation. The heatmap plays through each month from January 2009 to September 2023, providing a chronological perspective and highlighting temporal fluctuations in car theft occurrences over the full time frame.

In the realm of data science, geospatial heatmaps serve as indispensable tools for spatial analysis, allowing practitioners to discern geographical patterns, hotspots, and spatial relationships within their datasets. Whether investigating crime trends, urban planning, or resource allocation, geospatial heatmaps empower data scientists to make data-driven decisions with a profound understanding of geographical dynamics.

Folium, with its user-friendly interface and seamless integration with Python, has become a preferred choice for creating interactive and aesthetically pleasing geospatial visualizations. Its versatility extends beyond heatmaps, enabling the incorporation of various layers, markers, and tooltips for comprehensive spatial storytelling.

In conclusion, this project not only offers an insightful view of car theft patterns in Buffalo but also underscores the significance of geospatial heatmaps and tools like Folium in the data science landscape. It showcases how geospatial visualizations can enhance data exploration, communication, and decision-making, paving the way for informed and impactful insights in diverse domains.

Imports

import requests
import pandas as pd
import math
import datetime
import urllib.request
import json
import time
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import geopandas as gpd
import folium
from folium.plugins import HeatMap, HeatMapWithTime

plt.style.use('seaborn-v0_8-darkgrid')

# silence warnings and pandas chained-assignment messages
import warnings
warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None  # default='warn'
# bring the Open Data Buffalo API key into Google Colab
from google.colab import files

uploaded = files.upload()

API Request

# read the API key (uploaded above) and request BuffaloOpenData crime data
with open('api_key.txt', 'r') as f:
    app_token = f.read().strip()

limit = 500000
uri = f"https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit={limit}&$$app_token={app_token}&$where=incident_datetime>'2009-01-10T12:00:00'"

# Send the HTTP GET request
r = requests.get(uri)


print('Status code ', r.status_code)
print('Number of rows returned ', len(r.json()))
print('Encoded URI with params ', r.url)
new_json = r.json()
#new_json
Status code  200
Number of rows returned  245204
Encoded URI with params  https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit=500000&$$app_token=XXXXXXXXXXXXXXXXXXXXXXXXX&$where=incident_datetime%3E'2009-01-10T12:00:00'
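The query string above is built with an f-string; an equivalent way to construct the same Socrata request is to let `urllib.parse.urlencode` handle the percent-encoding explicitly. A sketch, using a placeholder token rather than a real key:

```python
from urllib.parse import urlencode

# Build the same SoQL query with explicit encoding.
# 'demo_token' is a placeholder, not a real app token.
base = "https://data.buffalony.gov/resource/d6g9-xbgu.json"
params = {
    "$limit": 500000,
    "$$app_token": "demo_token",
    "$where": "incident_datetime>'2009-01-10T12:00:00'",
}
uri = f"{base}?{urlencode(params)}"
```

Alternatively, `requests.get(base, params=params)` performs the same encoding without building the string by hand.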
df=pd.DataFrame(new_json)
print(df.shape)
df.head()
(245204, 27)
case_number incident_datetime incident_type_primary incident_description parent_incident_type hour_of_day day_of_week address_1 city state ... census_tract census_block census_block_group neighborhood_1 police_district council_district tractce20 geoid20_tract geoid20_blockgroup geoid20_block
0 09-0100387 2009-01-10T12:19:00.000 BURGLARY Buffalo Police are investigating this report o... Breaking & Entering 12 Saturday 2700 Block BAILEY Buffalo NY ... 51 1013 1 North Park District D DELAWARE 005100 36029005100 360290001101 360290002001013
1 09-0100389 2009-01-10T12:21:00.000 BURGLARY Buffalo Police are investigating this report o... Breaking & Entering 12 Saturday 800 Block EGGERT RD Buffalo NY ... 41 1009 1 Kenfield District E UNIVERSITY 004100 36029004100 360290001101 360290002001009
2 09-0270361 2009-01-10T12:27:00.000 UUV Buffalo Police are investigating this report o... Theft of Vehicle 12 Saturday 1600 Block MAIN ST Buffalo NY ... 168.02 1017 1 Masten Park District E MASTEN 016802 36029016802 360290001101 360290165001017
3 09-0100435 2009-01-10T12:30:00.000 ASSAULT Buffalo Police are investigating this report o... Assault 12 Saturday JEFFERSON AV & E FERRY ST Buffalo NY ... 168.02 2000 2 Masten Park District E MASTEN 016802 36029016802 360290001102 360290046012000
4 09-0100421 2009-01-10T12:30:00.000 BURGLARY Buffalo Police are investigating this report o... Breaking & Entering 12 Saturday 100 Block URBAN ST Buffalo NY ... 35.02 2000 2 MLK Park District C MASTEN 003502 36029003502 360290001102 360290046012000

5 rows × 27 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 245204 entries, 0 to 245203
Data columns (total 27 columns):
 #   Column                   Non-Null Count   Dtype 
---  ------                   --------------   ----- 
 0   case_number              245204 non-null  object
 1   incident_datetime        245204 non-null  object
 2   incident_type_primary    245204 non-null  object
 3   incident_description     245204 non-null  object
 4   parent_incident_type     245204 non-null  object
 5   hour_of_day              245204 non-null  object
 6   day_of_week              245204 non-null  object
 7   address_1                245187 non-null  object
 8   city                     245204 non-null  object
 9   state                    245204 non-null  object
 10  location                 240127 non-null  object
 11  latitude                 240127 non-null  object
 12  longitude                240127 non-null  object
 13  created_at               245204 non-null  object
 14  census_tract_2010        242785 non-null  object
 15  census_block_group_2010  242785 non-null  object
 16  census_block_2010        242785 non-null  object
 17  census_tract             242785 non-null  object
 18  census_block             242785 non-null  object
 19  census_block_group       242785 non-null  object
 20  neighborhood_1           242785 non-null  object
 21  police_district          242785 non-null  object
 22  council_district         242785 non-null  object
 23  tractce20                242922 non-null  object
 24  geoid20_tract            242922 non-null  object
 25  geoid20_blockgroup       242922 non-null  object
 26  geoid20_block            242922 non-null  object
dtypes: object(27)
memory usage: 50.5+ MB
# check for null
df.isnull().sum()
case_number                   0
incident_datetime             0
incident_type_primary         0
incident_description          0
parent_incident_type          0
hour_of_day                   0
day_of_week                   0
address_1                    17
city                          0
state                         0
location                   5077
latitude                   5077
longitude                  5077
created_at                    0
census_tract_2010          2419
census_block_group_2010    2419
census_block_2010          2419
census_tract               2419
census_block               2419
census_block_group         2419
neighborhood_1             2419
police_district            2419
council_district           2419
tractce20                  2282
geoid20_tract              2282
geoid20_blockgroup         2282
geoid20_block              2282
dtype: int64
def null_nonnull_ratios(dataframe):
    """
    Calculate the ratios of null and non-null data in a pandas DataFrame.

    Parameters:
    dataframe (pd.DataFrame): The DataFrame for which you want to calculate null and non-null ratios.

    Returns:
    pd.DataFrame: A DataFrame containing columns for null and non-null ratios for each column.
    """
    total_rows = len(dataframe)
    null_counts = dataframe.isnull().sum()
    nonnull_counts = total_rows - null_counts
    null_ratios = null_counts / total_rows
    nonnull_ratios = nonnull_counts / total_rows
    result_df = pd.DataFrame({'null': null_ratios, 'non-null': nonnull_ratios})
    return result_df


ratios = null_nonnull_ratios(df)
print(ratios)
                             null  non-null
case_number              0.000000  1.000000
incident_datetime        0.000000  1.000000
incident_type_primary    0.000000  1.000000
incident_description     0.000000  1.000000
parent_incident_type     0.000000  1.000000
hour_of_day              0.000000  1.000000
day_of_week              0.000000  1.000000
address_1                0.000069  0.999931
city                     0.000000  1.000000
state                    0.000000  1.000000
location                 0.020705  0.979295
latitude                 0.020705  0.979295
longitude                0.020705  0.979295
created_at               0.000000  1.000000
census_tract_2010        0.009865  0.990135
census_block_group_2010  0.009865  0.990135
census_block_2010        0.009865  0.990135
census_tract             0.009865  0.990135
census_block             0.009865  0.990135
census_block_group       0.009865  0.990135
neighborhood_1           0.009865  0.990135
police_district          0.009865  0.990135
council_district         0.009865  0.990135
tractce20                0.009307  0.990693
geoid20_tract            0.009307  0.990693
geoid20_blockgroup       0.009307  0.990693
geoid20_block            0.009307  0.990693
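A tiny worked example of what `null_nonnull_ratios` computes (the helper is redefined here so the snippet is self-contained; the demo frame is made up):

```python
import pandas as pd
import numpy as np

def null_nonnull_ratios(dataframe):
    """Ratios of null and non-null values per column."""
    total_rows = len(dataframe)
    null_counts = dataframe.isnull().sum()
    return pd.DataFrame({
        "null": null_counts / total_rows,
        "non-null": (total_rows - null_counts) / total_rows,
    })

# One of four values in 'b' is missing -> null ratio 0.25
demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1.0, np.nan, 3.0, 4.0]})
ratios_demo = null_nonnull_ratios(demo)
```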
print(df['parent_incident_type'].value_counts())
#print(df['incident_type_primary'].value_counts())
Theft                   108718
Assault                  50190
Breaking & Entering      43816
Theft of Vehicle         22122
Robbery                  15154
Sexual Assault            1844
Other Sexual Offense      1695
Homicide                   867
Sexual Offense             797
SODOMY                       1
Name: parent_incident_type, dtype: int64

Format Buffalo Open Data

df.index = pd.DatetimeIndex(df['incident_datetime'])

df['Year'] = df.index.year
df['Month'] = df.index.month
df['dayOfWeek'] = df.index.dayofweek
df['dayOfMonth'] = df.index.day
df['dayOfYear'] = df.index.dayofyear
df['weekOfMonth'] = df.dayOfMonth.apply(lambda d: (d - 1) // 7 + 1)

dayOfYear = list(df.index.dayofyear)

weekOfYear = [math.ceil(i/7) for i in dayOfYear]
df['weekOfYear'] = weekOfYear
df['latitude'] = pd.to_numeric(df['latitude'])
df['longitude'] = pd.to_numeric(df['longitude'])
df['hour_of_day'] = pd.to_numeric(df['hour_of_day'])
# remove points that fall outside Buffalo city limits
df = df[df['longitude'] < -78.80]
df = df[df['latitude'] < 43]
# drop rows with an unknown neighborhood
df = df[df['neighborhood_1'] != 'UNKNOWN']
# car thefts only; .copy() avoids chained-assignment issues when adding columns later
kiaBoys = df[df['parent_incident_type'] == 'Theft of Vehicle'].copy()

#print(df['parent_incident_type'].value_counts())

kiaBoys.shape
(21354, 34)
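The week-of-month and week-of-year arithmetic used in the feature engineering above can be spot-checked in isolation:

```python
import math

# day of month -> week of month (1..5)
week_of_month = lambda d: (d - 1) // 7 + 1
# day of year -> week of year as simple 7-day buckets (1..53), not ISO weeks
week_of_year = lambda doy: math.ceil(doy / 7)

# days 1-7 are week 1, day 8 starts week 2, day 31 falls in week 5
assert week_of_month(1) == 1 and week_of_month(7) == 1
assert week_of_month(8) == 2 and week_of_month(31) == 5
assert week_of_year(1) == 1 and week_of_year(365) == 53
```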
# check new car theft df for missing values

# reuse the null_nonnull_ratios() helper defined above

ratios = null_nonnull_ratios(kiaBoys)
print(ratios)
                             null  non-null
case_number              0.000000  1.000000
incident_datetime        0.000000  1.000000
incident_type_primary    0.000000  1.000000
incident_description     0.000000  1.000000
parent_incident_type     0.000000  1.000000
hour_of_day              0.000000  1.000000
day_of_week              0.000000  1.000000
address_1                0.000047  0.999953
city                     0.000000  1.000000
state                    0.000000  1.000000
location                 0.000000  1.000000
latitude                 0.000000  1.000000
longitude                0.000000  1.000000
created_at               0.000000  1.000000
census_tract_2010        0.000000  1.000000
census_block_group_2010  0.000000  1.000000
census_block_2010        0.000000  1.000000
census_tract             0.000000  1.000000
census_block             0.000000  1.000000
census_block_group       0.000000  1.000000
neighborhood_1           0.000000  1.000000
police_district          0.000000  1.000000
council_district         0.000000  1.000000
tractce20                0.000000  1.000000
geoid20_tract            0.000000  1.000000
geoid20_blockgroup       0.000000  1.000000
geoid20_block            0.000000  1.000000
Year                     0.000000  1.000000
Month                    0.000000  1.000000
dayOfWeek                0.000000  1.000000
dayOfMonth               0.000000  1.000000
dayOfYear                0.000000  1.000000
weekOfMonth              0.000000  1.000000
weekOfYear               0.000000  1.000000
#create monthyear column and groupby monthyear

# Convert 'incident_datetime' to a datetime format if it's not already
kiaBoys['incident_datetime'] = pd.to_datetime(kiaBoys['incident_datetime'])

# Extract the month and year separately
kiaBoys['month'] = kiaBoys['incident_datetime'].dt.month
kiaBoys['year'] = kiaBoys['incident_datetime'].dt.year

# Create a new column with the month names
kiaBoys['month_name'] = kiaBoys['month'].apply(lambda x: datetime.date(1900, x, 1).strftime('%B'))

# Create a new column with the month and year as a formatted string
kiaBoys['monthYear'] = kiaBoys['incident_datetime'].dt.strftime('%Y %B')

# Display the result
#print(kiaBoys.head())
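For reference, the `'%Y %B'` and `'%B'` formats above produce strings like the ones below. Note that such month-name strings sort alphabetically rather than chronologically ("2009 April" before "2009 February"), which matters if they are used to order animation frames:

```python
import datetime

# Sample of the monthYear / month_name formatting (C-locale month names)
d = datetime.datetime(2009, 1, 15)
month_year = d.strftime('%Y %B')                           # '2009 January'
month_name = datetime.date(1900, d.month, 1).strftime('%B')  # 'January'
```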
mean_latitude = kiaBoys['latitude'].mean()
mean_longitude = kiaBoys['longitude'].mean()
print(mean_latitude)
print(mean_longitude)
42.91205881802004
-78.84833113234055
# Create an empty list to store the per-month frames
weight_list = []

# Each incident counts once
kiaBoys['account'] = 1

# Build one frame per month, in chronological order. Note: sorting the
# '%Y %B' strings directly would order months alphabetically within each
# year, so sort by the underlying dates instead.
for period in kiaBoys['incident_datetime'].dt.to_period('M').sort_values().unique():
    label = period.strftime('%Y %B')
    weight_list.append(
        kiaBoys.loc[kiaBoys['monthYear'] == label, ['latitude', 'longitude', 'account']]
        .groupby(['latitude', 'longitude'])
        .sum()
        .reset_index()
        .values.tolist()
    )
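`HeatMapWithTime` expects one list per time step, each holding `[lat, lon, weight]` rows. The grouping logic above can be illustrated on toy data (made-up points, not project data):

```python
import pandas as pd

# Toy frame: two identical points in January, one point in February
df_demo = pd.DataFrame({
    "monthYear": ["2009 January", "2009 January", "2009 February"],
    "latitude": [42.91, 42.91, 42.95],
    "longitude": [-78.85, -78.85, -78.80],
    "account": [1, 1, 1],
})

frames = []
for m in ["2009 January", "2009 February"]:
    frames.append(
        df_demo.loc[df_demo["monthYear"] == m, ["latitude", "longitude", "account"]]
        .groupby(["latitude", "longitude"]).sum().reset_index()
        .values.tolist()
    )

# frames[0] holds one point with summed weight 2; frames[1] one point with weight 1
```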

Geospatial Heatmap

# create geospatial heatmap

# Assuming you have a DataFrame called kiaBoys and weight_list is defined.

# Calculate the mean latitude and longitude
mean_latitude = kiaBoys['latitude'].mean()
mean_longitude = kiaBoys['longitude'].mean()

# Create a map centered at the mean latitude and longitude
buffaloMap = folium.Map(location=[mean_latitude, mean_longitude], control_scale=True, zoom_start=12, tiles='Cartodbdark_matter')

# Now, use HeatMapWithTime
HeatMapWithTime(weight_list, radius=12, auto_play=True, min_opacity=0.5, max_opacity=1, use_local_extrema=True).add_to(buffaloMap)

# Display the map
buffaloMap

(The animated heatmap renders here; in Jupyter, trust the notebook via File -> Trust Notebook to load the map.)