This project offers a captivating demonstration of the power of geospatial heatmaps in data science, using Folium to visualize monthly car theft patterns in Buffalo from January 2009 through September 2023. Geospatial heatmaps have emerged as a pivotal tool in the data science toolkit, enabling the exploration and communication of intricate spatial data trends.
In this endeavor, we leverage Folium, a robust Python library, to craft an engaging and informative geospatial heatmap. The heatmap employs a gradient color scheme that intuitively conveys insights: areas shaded in blue represent lower incidences of car theft, while those in red indicate higher concentrations of such incidents. This color-coded approach enhances the accessibility of complex data, allowing viewers to grasp patterns and variations at a glance.
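In Folium, such a gradient is simply a dictionary mapping normalized intensities (0 to 1) to colors. As a minimal, self-contained sketch (the points below are made-up placeholders, not Buffalo data; the project's real map is built at the end of this post):

import folium
from folium.plugins import HeatMap

# Illustrative placeholder points ([lat, lon]), not real incident data
demo_points = [[42.91, -78.85], [42.92, -78.84], [42.90, -78.86]]
demo_map = folium.Map(location=[42.91, -78.85], zoom_start=12)
# gradient maps normalized intensity to color: low -> blue, high -> red
HeatMap(demo_points, gradient={0.2: 'blue', 0.5: 'lime', 0.8: 'orange', 1.0: 'red'}).add_to(demo_map)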
What sets this project apart is the dynamic aspect introduced through animation. The heatmap unfolds sequentially, commencing its journey in January 2009 and traversing each subsequent month until September 2023. This animation not only provides a chronological perspective but also highlights temporal fluctuations in car theft occurrences. It serves as a compelling visual narrative, unveiling evolving trends over the extensive time frame.
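The animation is driven by folium's HeatMapWithTime plugin, which takes one list of points per frame plus an optional index of labels for the time slider. A minimal sketch with placeholder data (the real per-month frames for this project are assembled later in the post):

import folium
from folium.plugins import HeatMapWithTime

# One inner list of [lat, lon] points per animation frame (placeholder values)
frames = [[[42.91, -78.85]], [[42.92, -78.84], [42.90, -78.86]]]
labels = ['2009 January', '2009 February']  # slider labels, one per frame
m = folium.Map(location=[42.91, -78.85], zoom_start=12)
HeatMapWithTime(frames, index=labels, auto_play=True).add_to(m)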
In the realm of data science, geospatial heatmaps serve as indispensable tools for spatial analysis, allowing practitioners to discern geographical patterns, hotspots, and spatial relationships within their datasets. Whether investigating crime trends, urban planning, or resource allocation, geospatial heatmaps empower data scientists to make data-driven decisions with a profound understanding of geographical dynamics.
Folium, with its user-friendly interface and seamless integration with Python, has become a preferred choice for creating interactive and aesthetically pleasing geospatial visualizations. Its versatility extends beyond heatmaps, enabling the incorporation of various layers, markers, and tooltips for comprehensive spatial storytelling.
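For instance, a marker with a tooltip and popup takes only a few lines (a generic illustration, separate from this project's pipeline):

import folium

m = folium.Map(location=[42.8864, -78.8784], zoom_start=13)  # approximate downtown Buffalo
folium.Marker(
    location=[42.8864, -78.8784],
    tooltip='Hover text',
    popup='Click for details',
).add_to(m)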
In conclusion, this project not only offers an insightful view of car theft patterns in Buffalo but also underscores the significance of geospatial heatmaps and tools like Folium in the data science landscape. It showcases how geospatial visualizations can enhance data exploration, communication, and decision-making, paving the way for informed and impactful insights in diverse domains.
import requests
import pandas as pd
import math
import datetime
import urllib.request
import json
import time
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import geopandas as gpd
import folium
from folium.plugins import HeatMap, HeatMapWithTime
plt.style.use('seaborn-v0_8-darkgrid')
# Silence warnings and pandas chained-assignment messages
import warnings
warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None  # default='warn'
# Upload the Open Data Buffalo API key file (api_key.txt) into the Colab session
from google.colab import files
uploaded = files.upload()
# Read the API key from the uploaded file
app_token = open('api_key.txt', 'r').read()
# Query the Buffalo OpenData crime dataset, passing the app token privately
limit = 500000
uri = f"https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit={limit}&$$app_token={app_token}&$where=incident_datetime>'2009-01-10T12:00:00'"
# Send the HTTP GET request
r = requests.get(uri)
print('Status code:', r.status_code)
print('Number of rows returned:', len(r.json()))
print('Encoded URI with params:', r.url)
new_json = r.json()
Status code: 200
Number of rows returned: 245204
Encoded URI with params: https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit=500000&$$app_token=REDACTED&$where=incident_datetime%3E'2009-01-10T12:00:00'
df=pd.DataFrame(new_json)
print(df.shape)
df.head()
(245204, 27)
| | case_number | incident_datetime | incident_type_primary | incident_description | parent_incident_type | hour_of_day | day_of_week | address_1 | city | state | ... | census_tract | census_block | census_block_group | neighborhood_1 | police_district | council_district | tractce20 | geoid20_tract | geoid20_blockgroup | geoid20_block |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 09-0100387 | 2009-01-10T12:19:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 2700 Block BAILEY | Buffalo | NY | ... | 51 | 1013 | 1 | North Park | District D | DELAWARE | 005100 | 36029005100 | 360290001101 | 360290002001013 |
| 1 | 09-0100389 | 2009-01-10T12:21:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 800 Block EGGERT RD | Buffalo | NY | ... | 41 | 1009 | 1 | Kenfield | District E | UNIVERSITY | 004100 | 36029004100 | 360290001101 | 360290002001009 |
| 2 | 09-0270361 | 2009-01-10T12:27:00.000 | UUV | Buffalo Police are investigating this report o... | Theft of Vehicle | 12 | Saturday | 1600 Block MAIN ST | Buffalo | NY | ... | 168.02 | 1017 | 1 | Masten Park | District E | MASTEN | 016802 | 36029016802 | 360290001101 | 360290165001017 |
| 3 | 09-0100435 | 2009-01-10T12:30:00.000 | ASSAULT | Buffalo Police are investigating this report o... | Assault | 12 | Saturday | JEFFERSON AV & E FERRY ST | Buffalo | NY | ... | 168.02 | 2000 | 2 | Masten Park | District E | MASTEN | 016802 | 36029016802 | 360290001102 | 360290046012000 |
| 4 | 09-0100421 | 2009-01-10T12:30:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 100 Block URBAN ST | Buffalo | NY | ... | 35.02 | 2000 | 2 | MLK Park | District C | MASTEN | 003502 | 36029003502 | 360290001102 | 360290046012000 |

5 rows × 27 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 245204 entries, 0 to 245203
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 case_number 245204 non-null object
1 incident_datetime 245204 non-null object
2 incident_type_primary 245204 non-null object
3 incident_description 245204 non-null object
4 parent_incident_type 245204 non-null object
5 hour_of_day 245204 non-null object
6 day_of_week 245204 non-null object
7 address_1 245187 non-null object
8 city 245204 non-null object
9 state 245204 non-null object
10 location 240127 non-null object
11 latitude 240127 non-null object
12 longitude 240127 non-null object
13 created_at 245204 non-null object
14 census_tract_2010 242785 non-null object
15 census_block_group_2010 242785 non-null object
16 census_block_2010 242785 non-null object
17 census_tract 242785 non-null object
18 census_block 242785 non-null object
19 census_block_group 242785 non-null object
20 neighborhood_1 242785 non-null object
21 police_district 242785 non-null object
22 council_district 242785 non-null object
23 tractce20 242922 non-null object
24 geoid20_tract 242922 non-null object
25 geoid20_blockgroup 242922 non-null object
26 geoid20_block 242922 non-null object
dtypes: object(27)
memory usage: 50.5+ MB
# check for null
df.isnull().sum()
case_number 0
incident_datetime 0
incident_type_primary 0
incident_description 0
parent_incident_type 0
hour_of_day 0
day_of_week 0
address_1 17
city 0
state 0
location 5077
latitude 5077
longitude 5077
created_at 0
census_tract_2010 2419
census_block_group_2010 2419
census_block_2010 2419
census_tract 2419
census_block 2419
census_block_group 2419
neighborhood_1 2419
police_district 2419
council_district 2419
tractce20 2282
geoid20_tract 2282
geoid20_blockgroup 2282
geoid20_block 2282
dtype: int64
def null_nonnull_ratios(dataframe):
"""
Calculate the ratios of null and non-null data in a pandas DataFrame.
Parameters:
dataframe (pd.DataFrame): The DataFrame for which you want to calculate null and non-null ratios.
Returns:
pd.DataFrame: A DataFrame containing columns for null and non-null ratios for each column.
"""
total_rows = len(dataframe)
null_counts = dataframe.isnull().sum()
nonnull_counts = total_rows - null_counts
null_ratios = null_counts / total_rows
nonnull_ratios = nonnull_counts / total_rows
result_df = pd.DataFrame({'null': null_ratios, 'non-null': nonnull_ratios})
return result_df
ratios = null_nonnull_ratios(df)
print(ratios)
null non-null
case_number 0.000000 1.000000
incident_datetime 0.000000 1.000000
incident_type_primary 0.000000 1.000000
incident_description 0.000000 1.000000
parent_incident_type 0.000000 1.000000
hour_of_day 0.000000 1.000000
day_of_week 0.000000 1.000000
address_1 0.000069 0.999931
city 0.000000 1.000000
state 0.000000 1.000000
location 0.020705 0.979295
latitude 0.020705 0.979295
longitude 0.020705 0.979295
created_at 0.000000 1.000000
census_tract_2010 0.009865 0.990135
census_block_group_2010 0.009865 0.990135
census_block_2010 0.009865 0.990135
census_tract 0.009865 0.990135
census_block 0.009865 0.990135
census_block_group 0.009865 0.990135
neighborhood_1 0.009865 0.990135
police_district 0.009865 0.990135
council_district 0.009865 0.990135
tractce20 0.009307 0.990693
geoid20_tract 0.009307 0.990693
geoid20_blockgroup 0.009307 0.990693
geoid20_block 0.009307 0.990693
print(df['parent_incident_type'].value_counts())
#print(df['incident_type_primary'].value_counts())
Theft 108718
Assault 50190
Breaking & Entering 43816
Theft of Vehicle 22122
Robbery 15154
Sexual Assault 1844
Other Sexual Offense 1695
Homicide 867
Sexual Offense 797
SODOMY 1
Name: parent_incident_type, dtype: int64
# Index by incident datetime and derive calendar features
df.index = pd.DatetimeIndex(df['incident_datetime'])
df['Year'] = df.index.year
df['Month'] = df.index.month
df['dayOfWeek'] = df.index.dayofweek
df['dayOfMonth'] = df.index.day
df['dayOfYear'] = df.index.dayofyear
# Week of month: days 1-7 -> week 1, days 8-14 -> week 2, etc.
df['weekOfMonth'] = df.dayOfMonth.apply(lambda d: (d - 1) // 7 + 1)
# Week of year, computed the same way from the day of year
df['weekOfYear'] = [math.ceil(d / 7) for d in df.index.dayofyear]
# Coordinates and hour arrive as strings; convert them to numeric
df['latitude'] = pd.to_numeric(df['latitude'])
df['longitude'] = pd.to_numeric(df['longitude'])
df['hour_of_day'] = pd.to_numeric(df['hour_of_day'])
# Remove outlier points that fall outside Buffalo city limits
df = df[df['longitude'] < -78.80]
df = df[df['latitude'] < 43]
# Drop records with an unknown neighborhood
df = df[df['neighborhood_1'] != 'UNKNOWN']
# Subset to vehicle thefts only
kiaBoys = df[df['parent_incident_type'] == 'Theft of Vehicle']
kiaBoys.shape
(21354, 34)
# Check the new car theft df for missing values, reusing null_nonnull_ratios defined above
ratios = null_nonnull_ratios(kiaBoys)
print(ratios)
null non-null
case_number 0.000000 1.000000
incident_datetime 0.000000 1.000000
incident_type_primary 0.000000 1.000000
incident_description 0.000000 1.000000
parent_incident_type 0.000000 1.000000
hour_of_day 0.000000 1.000000
day_of_week 0.000000 1.000000
address_1 0.000047 0.999953
city 0.000000 1.000000
state 0.000000 1.000000
location 0.000000 1.000000
latitude 0.000000 1.000000
longitude 0.000000 1.000000
created_at 0.000000 1.000000
census_tract_2010 0.000000 1.000000
census_block_group_2010 0.000000 1.000000
census_block_2010 0.000000 1.000000
census_tract 0.000000 1.000000
census_block 0.000000 1.000000
census_block_group 0.000000 1.000000
neighborhood_1 0.000000 1.000000
police_district 0.000000 1.000000
council_district 0.000000 1.000000
tractce20 0.000000 1.000000
geoid20_tract 0.000000 1.000000
geoid20_blockgroup 0.000000 1.000000
geoid20_block 0.000000 1.000000
Year 0.000000 1.000000
Month 0.000000 1.000000
dayOfWeek 0.000000 1.000000
dayOfMonth 0.000000 1.000000
dayOfYear 0.000000 1.000000
weekOfMonth 0.000000 1.000000
weekOfYear 0.000000 1.000000
# Create month/year columns for grouping by month
# Ensure 'incident_datetime' is a true datetime dtype
kiaBoys['incident_datetime'] = pd.to_datetime(kiaBoys['incident_datetime'])
# Extract the month and year separately
kiaBoys['month'] = kiaBoys['incident_datetime'].dt.month
kiaBoys['year'] = kiaBoys['incident_datetime'].dt.year
# Month name, e.g. 'January'
kiaBoys['month_name'] = kiaBoys['month'].apply(lambda x: datetime.date(1900, x, 1).strftime('%B'))
# Month and year as a formatted string, e.g. '2009 January'
kiaBoys['monthYear'] = kiaBoys['incident_datetime'].dt.strftime('%Y %B')
mean_latitude = kiaBoys['latitude'].mean()
mean_longitude = kiaBoys['longitude'].mean()
print(mean_latitude)
print(mean_longitude)
42.91205881802004
-78.84833113234055
# Build one weighted point list per month for HeatMapWithTime
weight_list = []
# Add a constant count column so the groupby sum yields incident counts per location
kiaBoys['account'] = 1
# Iterate over months in true chronological order; sorting the '2009 January'-style
# strings directly would order months alphabetically within each year, not by date
month_order = kiaBoys.sort_values('incident_datetime')['monthYear'].unique()
for x in month_order:
    weight_list.append(
        kiaBoys.loc[kiaBoys['monthYear'] == x, ['latitude', 'longitude', 'account']]
        .groupby(['latitude', 'longitude'])
        .sum()
        .reset_index()
        .values.tolist()
    )
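Each entry of weight_list is one animation frame: a list of [latitude, longitude, count] triples for a single month, which is exactly the nested shape HeatMapWithTime expects. A quick structural sanity check:

print(len(weight_list))    # number of monthly frames in the animation
print(weight_list[0][:3])  # first few [latitude, longitude, count] triples for the first month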
# Create the geospatial heatmap
# Center the map on the mean latitude and longitude of the thefts
mean_latitude = kiaBoys['latitude'].mean()
mean_longitude = kiaBoys['longitude'].mean()
# Dark-themed basemap centered on Buffalo
buffaloMap = folium.Map(location=[mean_latitude, mean_longitude], control_scale=True, zoom_start=12, tiles='Cartodbdark_matter')
# Now, use HeatMapWithTime
HeatMapWithTime(weight_list, radius=12, auto_play=True, min_opacity=0.5, max_opacity=1, use_local_extrema=True).add_to(buffaloMap)
# Display the map
buffaloMap
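As an optional refinement (not part of the map above), HeatMapWithTime also accepts an index argument; passing the chronologically ordered month labels built earlier makes the time slider read '2009 January', '2009 February', and so on instead of bare frame numbers:

# Same layer as above, with labeled animation frames
HeatMapWithTime(
    weight_list,
    index=list(month_order),  # slider labels, one per monthly frame
    radius=12,
    auto_play=True,
    min_opacity=0.5,
    max_opacity=1,
    use_local_extrema=True,
).add_to(buffaloMap)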