This project offers a captivating demonstration of the power of geospatial heatmaps in data science, using Folium to visualize monthly car theft patterns in Buffalo from January 2009 through September 2023. Geospatial heatmaps have emerged as a pivotal tool in the data science toolkit, enabling the exploration and communication of intricate spatial data trends.
In this endeavor, we leverage Folium, a robust Python library, to craft an engaging and informative geospatial heatmap. The heatmap employs a gradient color scheme that intuitively conveys insights: areas shaded in blue represent lower incidences of car theft, while those in red indicate higher concentrations of such incidents. This color-coded approach enhances the accessibility of complex data, allowing viewers to grasp patterns and variations at a glance.
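In Folium, such a gradient is simply a dictionary mapping normalized intensities (0 to 1) to colors. As a minimal, self-contained sketch (the points below are made-up placeholders, not Buffalo data; the project's real map is built at the end of this post):

import folium
from folium.plugins import HeatMap

# Illustrative placeholder points ([lat, lon]), not real incident data
demo_points = [[42.91, -78.85], [42.92, -78.84], [42.90, -78.86]]
demo_map = folium.Map(location=[42.91, -78.85], zoom_start=12)
# gradient maps normalized intensity to color: low -> blue, high -> red
HeatMap(demo_points, gradient={0.2: 'blue', 0.5: 'lime', 0.8: 'orange', 1.0: 'red'}).add_to(demo_map)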
What sets this project apart is the dynamic aspect introduced through animation. The heatmap unfolds sequentially, commencing its journey in January 2009 and traversing each subsequent month until September 2023. This animation not only provides a chronological perspective but also highlights temporal fluctuations in car theft occurrences. It serves as a compelling visual narrative, unveiling evolving trends over the extensive time frame.
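The animation is driven by folium's HeatMapWithTime plugin, which takes one list of points per frame plus an optional index of labels for the time slider. A minimal sketch with placeholder data (the real per-month frames for this project are assembled later in the post):

import folium
from folium.plugins import HeatMapWithTime

# One inner list of [lat, lon] points per animation frame (placeholder values)
frames = [[[42.91, -78.85]], [[42.92, -78.84], [42.90, -78.86]]]
labels = ['2009 January', '2009 February']  # slider labels, one per frame
m = folium.Map(location=[42.91, -78.85], zoom_start=12)
HeatMapWithTime(frames, index=labels, auto_play=True).add_to(m)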
In the realm of data science, geospatial heatmaps serve as indispensable tools for spatial analysis, allowing practitioners to discern geographical patterns, hotspots, and spatial relationships within their datasets. Whether investigating crime trends, urban planning, or resource allocation, geospatial heatmaps empower data scientists to make data-driven decisions with a profound understanding of geographical dynamics.
Folium, with its user-friendly interface and seamless integration with Python, has become a preferred choice for creating interactive and aesthetically pleasing geospatial visualizations. Its versatility extends beyond heatmaps, enabling the incorporation of various layers, markers, and tooltips for comprehensive spatial storytelling.
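For instance, a marker with a tooltip and popup takes only a few lines (a generic illustration, separate from this project's pipeline):

import folium

m = folium.Map(location=[42.8864, -78.8784], zoom_start=13)  # approximate downtown Buffalo
folium.Marker(
    location=[42.8864, -78.8784],
    tooltip='Hover text',
    popup='Click for details',
).add_to(m)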
In conclusion, this project not only offers an insightful view of car theft patterns in Buffalo but also underscores the significance of geospatial heatmaps and tools like Folium in the data science landscape. It showcases how geospatial visualizations can enhance data exploration, communication, and decision-making, paving the way for informed and impactful insights in diverse domains.
import requests
import pandas as pd
import math
import datetime
import urllib.request
import json
import time
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import geopandas as gpd
import folium
from folium.plugins import HeatMap, HeatMapWithTime
plt.style.use('seaborn-v0_8-darkgrid')
# Silence warnings and pandas chained-assignment messages
import warnings
warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None  # default='warn'
# Upload the Open Data Buffalo API key file (api_key.txt) into the Colab session
from google.colab import files
uploaded = files.upload()
# Read the API key from the uploaded file
app_token = open('api_key.txt', 'r').read()
# Query the Buffalo OpenData crime dataset, passing the app token privately
limit = 500000
uri = f"https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit={limit}&$$app_token={app_token}&$where=incident_datetime>'2009-01-10T12:00:00'"
# Send the HTTP GET request
r = requests.get(uri)
print('Status code:', r.status_code)
print('Number of rows returned:', len(r.json()))
print('Encoded URI with params:', r.url)
new_json = r.json()
Status code: 200
Number of rows returned: 245204
Encoded URI with params: https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit=500000&$$app_token=REDACTED&$where=incident_datetime%3E'2009-01-10T12:00:00'
df=pd.DataFrame(new_json)
print(df.shape)
df.head()
(245204, 27)
| | case_number | incident_datetime | incident_type_primary | incident_description | parent_incident_type | hour_of_day | day_of_week | address_1 | city | state | ... | census_tract | census_block | census_block_group | neighborhood_1 | police_district | council_district | tractce20 | geoid20_tract | geoid20_blockgroup | geoid20_block |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 09-0100387 | 2009-01-10T12:19:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 2700 Block BAILEY | Buffalo | NY | ... | 51 | 1013 | 1 | North Park | District D | DELAWARE | 005100 | 36029005100 | 360290001101 | 360290002001013 |
| 1 | 09-0100389 | 2009-01-10T12:21:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 800 Block EGGERT RD | Buffalo | NY | ... | 41 | 1009 | 1 | Kenfield | District E | UNIVERSITY | 004100 | 36029004100 | 360290001101 | 360290002001009 |
| 2 | 09-0270361 | 2009-01-10T12:27:00.000 | UUV | Buffalo Police are investigating this report o... | Theft of Vehicle | 12 | Saturday | 1600 Block MAIN ST | Buffalo | NY | ... | 168.02 | 1017 | 1 | Masten Park | District E | MASTEN | 016802 | 36029016802 | 360290001101 | 360290165001017 |
| 3 | 09-0100435 | 2009-01-10T12:30:00.000 | ASSAULT | Buffalo Police are investigating this report o... | Assault | 12 | Saturday | JEFFERSON AV & E FERRY ST | Buffalo | NY | ... | 168.02 | 2000 | 2 | Masten Park | District E | MASTEN | 016802 | 36029016802 | 360290001102 | 360290046012000 |
| 4 | 09-0100421 | 2009-01-10T12:30:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 100 Block URBAN ST | Buffalo | NY | ... | 35.02 | 2000 | 2 | MLK Park | District C | MASTEN | 003502 | 36029003502 | 360290001102 | 360290046012000 |

5 rows × 27 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 245204 entries, 0 to 245203
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 case_number 245204 non-null object
1 incident_datetime 245204 non-null object
2 incident_type_primary 245204 non-null object
3 incident_description 245204 non-null object
4 parent_incident_type 245204 non-null object
5 hour_of_day 245204 non-null object
6 day_of_week 245204 non-null object
7 address_1 245187 non-null object
8 city 245204 non-null object
9 state 245204 non-null object
10 location 240127 non-null object
11 latitude 240127 non-null object
12 longitude 240127 non-null object
13 created_at 245204 non-null object
14 census_tract_2010 242785 non-null object
15 census_block_group_2010 242785 non-null object
16 census_block_2010 242785 non-null object
17 census_tract 242785 non-null object
18 census_block 242785 non-null object
19 census_block_group 242785 non-null object
20 neighborhood_1 242785 non-null object
21 police_district 242785 non-null object
22 council_district 242785 non-null object
23 tractce20 242922 non-null object
24 geoid20_tract 242922 non-null object
25 geoid20_blockgroup 242922 non-null object
26 geoid20_block 242922 non-null object
dtypes: object(27)
memory usage: 50.5+ MB
# check for null
df.isnull().sum()
case_number 0
incident_datetime 0
incident_type_primary 0
incident_description 0
parent_incident_type 0
hour_of_day 0
day_of_week 0
address_1 17
city 0
state 0
location 5077
latitude 5077
longitude 5077
created_at 0
census_tract_2010 2419
census_block_group_2010 2419
census_block_2010 2419
census_tract 2419
census_block 2419
census_block_group 2419
neighborhood_1 2419
police_district 2419
council_district 2419
tractce20 2282
geoid20_tract 2282
geoid20_blockgroup 2282
geoid20_block 2282
dtype: int64
def null_nonnull_ratios(dataframe):
"""
Calculate the ratios of null and non-null data in a pandas DataFrame.
Parameters:
dataframe (pd.DataFrame): The DataFrame for which you want to calculate null and non-null ratios.
Returns:
pd.DataFrame: A DataFrame containing columns for null and non-null ratios for each column.
"""
total_rows = len(dataframe)
null_counts = dataframe.isnull().sum()
nonnull_counts = total_rows - null_counts
null_ratios = null_counts / total_rows
nonnull_ratios = nonnull_counts / total_rows
result_df = pd.DataFrame({'null': null_ratios, 'non-null': nonnull_ratios})
return result_df
ratios = null_nonnull_ratios(df)
print(ratios)
null non-null
case_number 0.000000 1.000000
incident_datetime 0.000000 1.000000
incident_type_primary 0.000000 1.000000
incident_description 0.000000 1.000000
parent_incident_type 0.000000 1.000000
hour_of_day 0.000000 1.000000
day_of_week 0.000000 1.000000
address_1 0.000069 0.999931
city 0.000000 1.000000
state 0.000000 1.000000
location 0.020705 0.979295
latitude 0.020705 0.979295
longitude 0.020705 0.979295
created_at 0.000000 1.000000
census_tract_2010 0.009865 0.990135
census_block_group_2010 0.009865 0.990135
census_block_2010 0.009865 0.990135
census_tract 0.009865 0.990135
census_block 0.009865 0.990135
census_block_group 0.009865 0.990135
neighborhood_1 0.009865 0.990135
police_district 0.009865 0.990135
council_district 0.009865 0.990135
tractce20 0.009307 0.990693
geoid20_tract 0.009307 0.990693
geoid20_blockgroup 0.009307 0.990693
geoid20_block 0.009307 0.990693
print(df['parent_incident_type'].value_counts())
#print(df['incident_type_primary'].value_counts())
Theft 108718
Assault 50190
Breaking & Entering 43816
Theft of Vehicle 22122
Robbery 15154
Sexual Assault 1844
Other Sexual Offense 1695
Homicide 867
Sexual Offense 797
SODOMY 1
Name: parent_incident_type, dtype: int64
# Index by incident datetime and derive calendar features
df.index = pd.DatetimeIndex(df['incident_datetime'])
df['Year'] = df.index.year
df['Month'] = df.index.month
df['dayOfWeek'] = df.index.dayofweek
df['dayOfMonth'] = df.index.day
df['dayOfYear'] = df.index.dayofyear
# Week of month: days 1-7 -> week 1, days 8-14 -> week 2, etc.
df['weekOfMonth'] = df.dayOfMonth.apply(lambda d: (d - 1) // 7 + 1)
# Week of year, computed the same way from the day of year
df['weekOfYear'] = [math.ceil(d / 7) for d in df.index.dayofyear]
# Coordinates and hour arrive as strings; convert them to numeric
df['latitude'] = pd.to_numeric(df['latitude'])
df['longitude'] = pd.to_numeric(df['longitude'])
df['hour_of_day'] = pd.to_numeric(df['hour_of_day'])
# Remove outlier points that fall outside Buffalo city limits
df = df[df['longitude'] < -78.80]
df = df[df['latitude'] < 43]
# Drop records with an unknown neighborhood
df = df[df['neighborhood_1'] != 'UNKNOWN']
# Subset to vehicle thefts only
kiaBoys = df[df['parent_incident_type'] == 'Theft of Vehicle']
kiaBoys.shape
(21354, 34)
# Check the new car theft df for missing values, reusing null_nonnull_ratios defined above
ratios = null_nonnull_ratios(kiaBoys)
print(ratios)
null non-null
case_number 0.000000 1.000000
incident_datetime 0.000000 1.000000
incident_type_primary 0.000000 1.000000
incident_description 0.000000 1.000000
parent_incident_type 0.000000 1.000000
hour_of_day 0.000000 1.000000
day_of_week 0.000000 1.000000
address_1 0.000047 0.999953
city 0.000000 1.000000
state 0.000000 1.000000
location 0.000000 1.000000
latitude 0.000000 1.000000
longitude 0.000000 1.000000
created_at 0.000000 1.000000
census_tract_2010 0.000000 1.000000
census_block_group_2010 0.000000 1.000000
census_block_2010 0.000000 1.000000
census_tract 0.000000 1.000000
census_block 0.000000 1.000000
census_block_group 0.000000 1.000000
neighborhood_1 0.000000 1.000000
police_district 0.000000 1.000000
council_district 0.000000 1.000000
tractce20 0.000000 1.000000
geoid20_tract 0.000000 1.000000
geoid20_blockgroup 0.000000 1.000000
geoid20_block 0.000000 1.000000
Year 0.000000 1.000000
Month 0.000000 1.000000
dayOfWeek 0.000000 1.000000
dayOfMonth 0.000000 1.000000
dayOfYear 0.000000 1.000000
weekOfMonth 0.000000 1.000000
weekOfYear 0.000000 1.000000
# Create month/year columns for grouping by month
# Ensure 'incident_datetime' is a true datetime dtype
kiaBoys['incident_datetime'] = pd.to_datetime(kiaBoys['incident_datetime'])
# Extract the month and year separately
kiaBoys['month'] = kiaBoys['incident_datetime'].dt.month
kiaBoys['year'] = kiaBoys['incident_datetime'].dt.year
# Month name, e.g. 'January'
kiaBoys['month_name'] = kiaBoys['month'].apply(lambda x: datetime.date(1900, x, 1).strftime('%B'))
# Month and year as a formatted string, e.g. '2009 January'
kiaBoys['monthYear'] = kiaBoys['incident_datetime'].dt.strftime('%Y %B')
mean_latitude = kiaBoys['latitude'].mean()
mean_longitude = kiaBoys['longitude'].mean()
print(mean_latitude)
print(mean_longitude)
42.91205881802004
-78.84833113234055
# Build one weighted point list per month for HeatMapWithTime
weight_list = []
# Add a constant count column so the groupby sum yields incident counts per location
kiaBoys['account'] = 1
# Iterate over months in true chronological order; sorting the '2009 January'-style
# strings directly would order months alphabetically within each year, not by date
month_order = kiaBoys.sort_values('incident_datetime')['monthYear'].unique()
for x in month_order:
    weight_list.append(
        kiaBoys.loc[kiaBoys['monthYear'] == x, ['latitude', 'longitude', 'account']]
        .groupby(['latitude', 'longitude'])
        .sum()
        .reset_index()
        .values.tolist()
    )
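Each entry of weight_list is one animation frame: a list of [latitude, longitude, count] triples for a single month, which is exactly the nested shape HeatMapWithTime expects. A quick structural sanity check:

print(len(weight_list))    # number of monthly frames in the animation
print(weight_list[0][:3])  # first few [latitude, longitude, count] triples for the first month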
# Create the geospatial heatmap
# Center the map on the mean latitude and longitude of the thefts
mean_latitude = kiaBoys['latitude'].mean()
mean_longitude = kiaBoys['longitude'].mean()
# Dark-themed basemap centered on Buffalo
buffaloMap = folium.Map(location=[mean_latitude, mean_longitude], control_scale=True, zoom_start=12, tiles='Cartodbdark_matter')
# Now, use HeatMapWithTime
HeatMapWithTime(weight_list, radius=12, auto_play=True, min_opacity=0.5, max_opacity=1, use_local_extrema=True).add_to(buffaloMap)
# Display the map
buffaloMap
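As an optional refinement (not part of the map above), HeatMapWithTime also accepts an index argument; passing the chronologically ordered month labels built earlier makes the time slider read '2009 January', '2009 February', and so on instead of bare frame numbers:

# Same layer as above, with labeled animation frames
HeatMapWithTime(
    weight_list,
    index=list(month_order),  # slider labels, one per monthly frame
    radius=12,
    auto_play=True,
    min_opacity=0.5,
    max_opacity=1,
    use_local_extrema=True,
).add_to(buffaloMap)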