This project is a comprehensive demonstration of crime analysis in Buffalo, using an API connected to Buffalo's open data portal. Because the Buffalo Open Data website notes potential reliability issues in records prior to 2009, the analysis focuses exclusively on data from 2009 to the present.
The project pursues three primary objectives:
Data Acquisition Through APIs: The project commences by harnessing the power of Application Programming Interfaces (APIs) to efficiently collect and retrieve crime-related data from Buffalo’s open data repository. This process ensures access to up-to-date and reliable information, essential for subsequent analysis.
Exploratory Data Analysis (EDA): Following data acquisition, an initial exploratory analysis phase ensues. During this stage, the project aims to uncover valuable insights and trends within the crime data. This involves examining patterns by year, neighborhood, and crime type, shedding light on key factors influencing Buffalo’s crime landscape.
Forecasting Techniques: Building upon the EDA findings, the project delves into advanced forecasting techniques to enhance our understanding of future crime trends. Three primary forecasting methods are employed:
Simple Moving Averages: This technique applies a straightforward moving average approach to predict future crime rates. It involves calculating the average of crime occurrences over a defined period, such as months or weeks, providing a basic yet valuable forecasting tool.
Weighted Moving Averages: In this approach, a weighted average is employed, assigning different levels of importance to data points based on their proximity to the prediction point. This method accommodates the potential significance of recent crime data in making forecasts.
Exponential Moving Averages: Recognizing the exponential decay of relevance in historical data, exponential moving averages assign greater weight to recent data points. This technique is particularly useful for capturing short-term fluctuations and trends in crime rates.
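To make the three estimators concrete before they appear in code later on, here is a minimal toy sketch (the series, window, weights, and alpha are arbitrary illustration values, not taken from the crime data):
import numpy as np
history = [12, 15, 11, 18, 20]  # hypothetical monthly crime counts
window = 3
# simple moving average: unweighted mean of the last `window` observations
sma = np.mean(history[-window:])                        # (11 + 18 + 20) / 3 -> 16.33
# weighted moving average: linear weights favoring recent observations
wma = np.average(history[-window:], weights=[1, 2, 3])  # (11*1 + 18*2 + 20*3) / 6 -> 17.83
# exponential moving average: recursive smoothing with factor alpha
alpha = 0.6
ema = history[0]
for value in history[1:]:
    ema = alpha * value + (1 - alpha) * ema             # -> 18.26 after the last point
print(sma, wma, ema)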
Through this multifaceted approach, the project contributes to a data-driven understanding of crime dynamics in Buffalo and supports informed decision-making for a safer future.
# import packages
import requests
import pandas as pd
import math
import datetime
import urllib.request
import json
import time
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from folium.plugins import HeatMap
import folium
plt.style.use('seaborn-v0_8-darkgrid')
# warnings ignore
import warnings
# set warnings to ignore
#warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None # default='warn'
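Note that suppressing the chained-assignment warning is purely a convenience; the warning-free alternative is an explicit .copy() when slicing. A minimal sketch (df, subset, and safe are illustrative names, not part of this notebook's data):
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
subset = df[df['a'] > 1]        # view-like slice; assigning into it normally warns
safe = df[df['a'] > 1].copy()   # explicit copy: no warning, intent is clear
safe['c'] = 0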
# bring API key into Google Colab
from google.colab import files
#import io
uploaded = files.upload()
# open api key
app_token = open('api_key.txt', 'r').read()
# app_token
# hide api token & return BuffaloOpenData crime data
limit = 500000
app_token = open('api_key.txt', 'r').read()
uri = f"https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit={limit}&$$app_token={app_token}&$where=incident_datetime>'2009-01-10T12:00:00'"
# send the HTTP GET request
r = requests.get(uri)
# check the response status code and process the data if it's successful
if r.status_code == 200:
print('Status code:', r.status_code)
print('Number of rows returned:', len(r.json()))
print('Encoded URI with params:', r.url)
new_json = r.json()
# Process the new_json data as needed
else:
print('Failed to fetch data. Status code:', r.status_code)
Status code: 200
Number of rows returned: 239722
Encoded URI with params: https://data.buffalony.gov/resource/d6g9-xbgu.json?$limit=500000&$$app_token=[REDACTED]&$where=incident_datetime%3E'2009-01-10T12:00:00'
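The single request above leans on a generous $limit. Should the dataset ever outgrow it, Socrata's SODA API also accepts an $offset parameter, so the pull can be paginated. A minimal sketch (fetch_all and the page size are illustrative, not part of the code above):
def fetch_all(base_url, app_token, page_size=50000):
    """Page through a Socrata dataset using $limit/$offset."""
    rows, offset = [], 0
    while True:
        uri = (f"{base_url}?$limit={page_size}&$offset={offset}"
               f"&$$app_token={app_token}")
        batch = requests.get(uri).json()
        rows.extend(batch)
        if len(batch) < page_size:  # short page means we've reached the end
            break
        offset += page_size
    return rows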
data=pd.DataFrame(new_json)
print(data.shape)
data.head()
(239722, 27)
case_number | incident_datetime | incident_type_primary | incident_description | parent_incident_type | hour_of_day | day_of_week | address_1 | city | state | ... | census_tract | census_block | census_block_group | neighborhood_1 | police_district | council_district | tractce20 | geoid20_tract | geoid20_blockgroup | geoid20_block | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 09-0100387 | 2009-01-10T12:19:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 2700 Block BAILEY | Buffalo | NY | ... | 51 | 1013 | 1 | North Park | District D | DELAWARE | 005100 | 36029005100 | 360290001101 | 360290002001013 |
1 | 09-0100389 | 2009-01-10T12:21:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 800 Block EGGERT RD | Buffalo | NY | ... | 41 | 1009 | 1 | Kenfield | District E | UNIVERSITY | 004100 | 36029004100 | 360290001101 | 360290002001009 |
2 | 09-0270361 | 2009-01-10T12:27:00.000 | UUV | Buffalo Police are investigating this report o... | Theft of Vehicle | 12 | Saturday | 1600 Block MAIN ST | Buffalo | NY | ... | 168.02 | 1017 | 1 | Masten Park | District E | MASTEN | 016802 | 36029016802 | 360290001101 | 360290165001017 |
3 | 09-0100435 | 2009-01-10T12:30:00.000 | ASSAULT | Buffalo Police are investigating this report o... | Assault | 12 | Saturday | JEFFERSON AV & E FERRY ST | Buffalo | NY | ... | 168.02 | 2000 | 2 | Masten Park | District E | MASTEN | 016802 | 36029016802 | 360290001102 | 360290046012000 |
4 | 09-0100421 | 2009-01-10T12:30:00.000 | BURGLARY | Buffalo Police are investigating this report o... | Breaking & Entering | 12 | Saturday | 100 Block URBAN ST | Buffalo | NY | ... | 35.02 | 2000 | 2 | MLK Park | District C | MASTEN | 003502 | 36029003502 | 360290001102 | 360290046012000 |
5 rows × 27 columns
# check data types and switch to ints, floats and strings
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239722 entries, 0 to 239721
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 case_number 239722 non-null object
1 incident_datetime 239722 non-null object
2 incident_type_primary 239722 non-null object
3 incident_description 239722 non-null object
4 parent_incident_type 239722 non-null object
5 hour_of_day 239722 non-null object
6 day_of_week 239722 non-null object
7 address_1 239705 non-null object
8 city 239722 non-null object
9 state 239722 non-null object
10 location 235055 non-null object
11 latitude 235055 non-null object
12 longitude 235055 non-null object
13 created_at 239722 non-null object
14 census_tract_2010 237713 non-null object
15 census_block_group_2010 237713 non-null object
16 census_block_2010 237713 non-null object
17 census_tract 237713 non-null object
18 census_block 237713 non-null object
19 census_block_group 237713 non-null object
20 neighborhood_1 237713 non-null object
21 police_district 237713 non-null object
22 council_district 237713 non-null object
23 tractce20 237850 non-null object
24 geoid20_tract 237850 non-null object
25 geoid20_blockgroup 237850 non-null object
26 geoid20_block 237850 non-null object
dtypes: object(27)
memory usage: 49.4+ MB
# check for null
data.isnull().sum()
case_number 0
incident_datetime 0
incident_type_primary 0
incident_description 0
parent_incident_type 0
hour_of_day 0
day_of_week 0
address_1 17
city 0
state 0
location 4667
latitude 4667
longitude 4667
created_at 0
census_tract_2010 2009
census_block_group_2010 2009
census_block_2010 2009
census_tract 2009
census_block 2009
census_block_group 2009
neighborhood_1 2009
police_district 2009
council_district 2009
tractce20 1872
geoid20_tract 1872
geoid20_blockgroup 1872
geoid20_block 1872
dtype: int64
# chatgpt code for function displaying null & non-null column ratios
def null_nonnull_ratios(dataframe):
"""
Calculate the ratios of null and non-null data in a pandas DataFrame.
Parameters:
dataframe (pd.DataFrame): The DataFrame for which you want to calculate null and non-null ratios.
Returns:
pd.DataFrame: A DataFrame containing columns for null and non-null ratios for each column.
"""
total_rows = len(dataframe)
null_counts = dataframe.isnull().sum()
nonnull_counts = total_rows - null_counts
null_ratios = null_counts / total_rows
nonnull_ratios = nonnull_counts / total_rows
result_df = pd.DataFrame({'null': null_ratios, 'non-null': nonnull_ratios})
return result_df
ratios = null_nonnull_ratios(data)
print(ratios)
null non-null
case_number 0.000000 1.000000
incident_datetime 0.000000 1.000000
incident_type_primary 0.000000 1.000000
incident_description 0.000000 1.000000
parent_incident_type 0.000000 1.000000
hour_of_day 0.000000 1.000000
day_of_week 0.000000 1.000000
address_1 0.000071 0.999929
city 0.000000 1.000000
state 0.000000 1.000000
location 0.019468 0.980532
latitude 0.019468 0.980532
longitude 0.019468 0.980532
created_at 0.000000 1.000000
census_tract_2010 0.008381 0.991619
census_block_group_2010 0.008381 0.991619
census_block_2010 0.008381 0.991619
census_tract 0.008381 0.991619
census_block 0.008381 0.991619
census_block_group 0.008381 0.991619
neighborhood_1 0.008381 0.991619
police_district 0.008381 0.991619
council_district 0.008381 0.991619
tractce20 0.007809 0.992191
geoid20_tract 0.007809 0.992191
geoid20_blockgroup 0.007809 0.992191
geoid20_block 0.007809 0.992191
# make new date columns to groupby for EDA
data.index = pd.DatetimeIndex(data['incident_datetime'])
data['Year'] = data.index.year
data['Month'] = data.index.month
data['dayOfWeek'] = data.index.dayofweek
data['dayOfMonth'] = data.index.day
data['dayOfYear'] = data.index.dayofyear
data['weekOfMonth'] = data.dayOfMonth.apply(lambda d: (d - 1) // 7 + 1)
dayOfYear = list(data.index.dayofyear)
weekOfYear = [math.ceil(i/7) for i in dayOfYear]
data['weekOfYear'] = weekOfYear
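The weekOfYear above is built by ceiling day-of-year into 7-day buckets. pandas (>= 1.1) can also supply ISO calendar weeks directly, which differ slightly around year boundaries; a sketch of the alternative, not used below:
# ISO week numbers; early-January dates can fall into week 52/53 of the prior ISO year
data['isoWeek'] = data.index.isocalendar().week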
# code for color selection on graphs / comment out later
import math
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
def plot_colortable(colors, *, ncols=4, sort_colors=True):
cell_width = 212
cell_height = 22
swatch_width = 48
margin = 12
# Sort colors by hue, saturation, value and name.
if sort_colors is True:
names = sorted(
colors, key=lambda c: tuple(mcolors.rgb_to_hsv(mcolors.to_rgb(c))))
else:
names = list(colors)
n = len(names)
nrows = math.ceil(n / ncols)
width = cell_width * ncols + 2 * margin
height = cell_height * nrows + 2 * margin
dpi = 72
fig, ax = plt.subplots(figsize=(width / dpi, height / dpi), dpi=dpi)
fig.subplots_adjust(margin/width, margin/height,
(width-margin)/width, (height-margin)/height)
ax.set_xlim(0, cell_width * ncols)
ax.set_ylim(cell_height * (nrows-0.5), -cell_height/2.)
ax.yaxis.set_visible(False)
ax.xaxis.set_visible(False)
ax.set_axis_off()
for i, name in enumerate(names):
row = i % nrows
col = i // nrows
y = row * cell_height
swatch_start_x = cell_width * col
text_pos_x = cell_width * col + swatch_width + 7
ax.text(text_pos_x, y, name, fontsize=14,
horizontalalignment='left',
verticalalignment='center')
ax.add_patch(
Rectangle(xy=(swatch_start_x, y-9), width=swatch_width,
height=18, facecolor=colors[name], edgecolor='0.7')
)
return fig
# available colors for graphs / comment out later
plt.style.use('dark_background') # set the background to black
plot_colortable(mcolors.CSS4_COLORS)
plt.show()
# yearly analysis on crime count
# plt.style.use('dark_background') # set the background to black
# once plt.style is set there is no need to include the code setting in future plots
ax = data.groupby([data.Year]).size().plot(legend=False, color='yellowgreen', kind='barh')
plt.ylabel('Year', color='white')
plt.xlabel('Number of crimes', color='white')
plt.title('Number of crimes by year', color='white')
plt.tick_params(axis='both', colors='white') # Set tick color
ax.spines['bottom'].set_color('white') # Set x-axis color
ax.spines['left'].set_color('white') # Set y-axis color
plt.show()
The graph presented above illustrates a noteworthy annual decline in the total number of crimes since the year 2009.
Furthermore, as depicted in the chart below, the year 2022 accounts for a relatively modest 3.95% of the total crimes recorded in the dataset spanning from 2009 to the present day.
# above graph data in chart form
print(f'Percentage of total crimes in dataset(2009-2023) per year:\n\n{data.Year.value_counts(normalize=True)}')
Percentage of total crimes in dataset(2009-2023) per year:
2010 0.090559
2009 0.088761
2012 0.085991
2011 0.085399
2013 0.077807
2014 0.073097
2015 0.072033
2016 0.068629
2018 0.064516
2017 0.064262
2019 0.057020
2020 0.050571
2021 0.049011
2022 0.039533
2023 0.032809
Name: Year, dtype: float64
#crimes by day of week
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
ax = data.groupby([data.dayOfWeek]).size().plot(legend=False, color='yellowgreen', kind='barh')
#ax = data.groupby([data.Year]).size().plot(legend=False, color='yellowgreen', kind='barh')
plt.ylabel('Day of week', color='white')
plt.yticks(np.arange(7), days)
plt.xlabel('Number Of Crimes', color='white')
plt.title('Number Of Crimes By Day Of Week', color='white')
plt.tick_params(axis='both', colors='white') # Set tick color
ax.spines['bottom'].set_color('white') # Set x-axis color
ax.spines['left'].set_color('white') # Set y-axis color
plt.show()
Friday appears to exhibit a slightly higher incidence of crimes when compared to other days, although this difference is not markedly significant.
# crimes by month
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
data.groupby([data.Month]).size().plot(kind='barh', color='yellowgreen')
plt.ylabel('Months Of The Year')
plt.yticks(np.arange(12), months)
plt.xlabel('Number Of Crimes')
plt.title('Number Of Crimes By Month Of The Year')
plt.show()
# define a dictionary to map numeric month values to month names
month_names = {
1: 'January',
2: 'February',
3: 'March',
4: 'April',
5: 'May',
6: 'June',
7: 'July',
8: 'August',
9: 'September',
10: 'October',
11: 'November',
12: 'December'
}
# map the numeric month values to month names
data['MonthNames'] = data['Month'].map(month_names)
# calculate the counts of each month and normalize the results
month_counts = data['MonthNames'].value_counts(normalize=True)
print(f'Percentage of Crime Per Month:\n\n{month_counts}')
Percentage of Crime Per Month:
August 0.100879
July 0.100212
June 0.091752
May 0.089896
September 0.088649
October 0.086217
April 0.078445
November 0.076042
January 0.075567
December 0.074728
March 0.074190
February 0.063423
Name: MonthNames, dtype: float64
The graphical representations above provide a clear depiction of February consistently registering the lowest number of crimes per month.
Moreover, the chart underscores a pronounced disparity in crime rates between the sweltering summer months and the frigid winter months.
plt.figure(figsize=(11,5))
data.resample('M').size().plot(legend=False, color='yellowgreen')
plt.title('Number Of Crimes Per Month (2009 - 2023)')
plt.xlabel('Months')
plt.ylabel('Number Of Crimes')
plt.show()
The chart presented above vividly illustrates a declining trend in annual crime rates.
Furthermore, it unveils a distinctive zigzag pattern, with crime receding during the colder seasons and resurging during the hotter months.
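One way to make that underlying trend explicit beneath the seasonal zigzag is to overlay a 12-month rolling mean on the monthly counts; a minimal sketch:
monthly = data.resample('M').size()
plt.figure(figsize=(11, 5))
monthly.plot(color='yellowgreen', label='Monthly counts')
monthly.rolling(12, center=True).mean().plot(color='steelblue', label='12-month rolling mean')
plt.title('Monthly Crimes With 12-Month Rolling Mean')
plt.xlabel('Months')
plt.ylabel('Number Of Crimes')
plt.legend()
plt.show()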
data.groupby([data.dayOfMonth]).size().plot(kind='barh',legend=False, color='yellowgreen')
plt.ylabel('Day of the month')
plt.xlabel('Number of crimes')
plt.title('Number of crimes by day of the month')
plt.show()
print(f'Percentage Of Crime Per Day Of Month:\n\n{data.dayOfMonth.value_counts(normalize=True)}')
Percentage Of Crime Per Day Of Month:
1 0.041590
20 0.033643
23 0.033635
15 0.033626
10 0.033606
24 0.033547
21 0.033159
22 0.033080
28 0.032884
27 0.032880
3 0.032792
4 0.032667
18 0.032596
16 0.032529
11 0.032529
17 0.032521
14 0.032475
12 0.032408
25 0.032287
13 0.032246
19 0.032237
26 0.032033
7 0.031916
5 0.031758
8 0.031566
9 0.031411
6 0.031257
2 0.031015
30 0.030577
29 0.030310
31 0.019218
Name: dayOfMonth, dtype: float64
The data suggests that the first day of each month consistently records the highest incidence of criminal activities; this spike may partly be a recording artifact, since incidents with an unknown or approximate date are sometimes logged to the first of the month.
# crimes plotted per day
plt.figure(figsize=(11,5))
data.resample('D').size().plot(legend=False, color='yellowgreen')
plt.title('Number Of Crimes Per Day (2009 - 2023)')
plt.xlabel('Days')
plt.ylabel('Number Of Crimes')
plt.show()
# crimes plotted by week of month
data.groupby([data.weekOfMonth]).size().plot(kind='barh', color='yellowgreen')
plt.ylabel('Week Of The Month')
plt.xlabel('Number Of Crimes')
plt.title('Number Of Crimes By Week Of The Month')
plt.show()
print(f'Percentage Of Crime Per Week Of Month:\n\n{data.weekOfMonth.value_counts(normalize=True)}')
#data.weekOfMonth.value_counts(normalize=True)
Percentage Of Crime Per Week Of Month:
1 0.232995
4 0.230346
3 0.230313
2 0.226241
5 0.080105
Name: weekOfMonth, dtype: float64
Based on the insights gleaned from the preceding graph and chart, it becomes evident that the specific week within a month may not significantly impact crime rates. Notably, the observation that the fifth week records fewer incidents can be attributed to its shorter duration.
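Normalizing each bucket by the number of calendar days it can contain confirms that the fifth week's low raw count is a duration artifact; a quick sketch:
# weeks 1-4 each span 7 possible days of the month; week 5 spans at most 3 (days 29-31)
days_per_bucket = data.groupby('weekOfMonth')['dayOfMonth'].nunique()
crimes_per_day = data.groupby('weekOfMonth').size() / days_per_bucket
print(crimes_per_day)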
# week of year
plt.figure(figsize=(8,10))
data.groupby([data.weekOfYear]).size().sort_values().plot(kind='barh', color='yellowgreen')
plt.ylabel('weeks of the year')
plt.xlabel('Number of crimes')
plt.title('Number of crimes by week of the year')
plt.show()
The graph above serves as an additional perspective, reaffirming the correlation between warmer months and their respective weeks, which consistently exhibit higher crime rates when contrasted with the colder months.
# number of crimes per week
plt.figure(figsize=(11,5))
data.resample('W').size().plot(legend=False,color='yellowgreen')
plt.title('Number Of Crimes Per Week (2009 - 2023)')
plt.xlabel('Weeks')
plt.ylabel('Number Of Crimes')
plt.show()
The graph displayed above offers yet another illustrative trendline, dissected on a weekly basis, spanning from 2009 to the present day.
Now, let’s delve into the substantial decline at the outset of 2023 and investigate whether it can indeed be attributed to the blizzard event.
# grab the dec 2022 and jan 2023 data only
blizzard2022 = data[(data['Year'] == 2022) & (data['Month'] == 12)]
blizzard2023 = data[(data['Year'] == 2023) & (data['Month'] == 1)]
# concatenate the two DataFrames
blizzard_combined = pd.concat([blizzard2022, blizzard2023], ignore_index=True)
#blizzard_combined
# convert the 'incident_datetime' column to a datetime type if it's not already
blizzard_combined['incident_datetime'] = pd.to_datetime(blizzard_combined['incident_datetime'])
# set the 'incident_datetime' column as the index
blizzard_combined.set_index('incident_datetime', inplace=True)
# plot the number of crimes using resample
plt.figure(figsize=(11, 5))
blizzard_combined.resample('W').size().plot(legend=False, color='yellowgreen')
plt.title('Number Of Crimes Around the Blizzard (Dec 2022-Jan 2023)')
plt.xlabel('Weeks')
plt.ylabel('Number Of Crimes')
plt.show()
My initial hypothesis has been disproven; the decrease in crime can be attributed to February’s weather conditions rather than the blizzard event.
# week of year per neigborhood
listOfNeighborhoods = list(data['neighborhood_1'].unique())
for neighborhood in listOfNeighborhoods:
df = data[data['neighborhood_1'] == neighborhood]
# Check if df is empty before resampling and plotting
if not df.empty:
plt.figure(figsize=(11, 5))
df.resample('W').size().plot(legend=False, color='yellowgreen')
plt.title('Number Of Crimes Per Week (2009 - 2023) For Neighborhood {}'.format(neighborhood))
plt.xlabel('Weeks')
plt.ylabel('Number Of Crimes')
plt.show()
else:
print(f"No data for neighborhood {neighborhood}")
No data for neighborhood nan
# bar chart of crimes
plt.figure(figsize=(8,10))
data.groupby([data['incident_type_primary']]).size().sort_values(ascending=True).plot(kind='barh', color='yellowgreen')
plt.title('Number of crimes by type')
plt.ylabel('Crime Type')
plt.xlabel('Number of crimes')
plt.show()
# chart of crimes
print(f'Percentage of Crimes by types:\n\n{data.incident_type_primary.value_counts(normalize=True)}')
Percentage of Crimes by types:
LARCENY/THEFT 0.438012
ASSAULT 0.203365
BURGLARY 0.180000
UUV 0.086375
ROBBERY 0.062623
RAPE 0.009090
SEXUAL ABUSE 0.008685
THEFT OF SERVICES 0.006916
MURDER 0.003216
Assault 0.000480
Breaking & Entering 0.000346
AGGR ASSAULT 0.000321
CRIM NEGLIGENT HOMICIDE 0.000271
Theft 0.000138
MANSLAUGHTER 0.000046
AGG ASSAULT ON P/OFFICER 0.000042
Robbery 0.000025
Sexual Assault 0.000021
Theft of Vehicle 0.000013
Other Sexual Offense 0.000008
Homicide 0.000004
SODOMY 0.000004
Name: incident_type_primary, dtype: float64
print('Current rows:', data.shape[0])
data['incident_type_primary'] = data['incident_type_primary'].astype(str)
# drop rare / duplicate category labels (each well under 0.1% of rows)
rare_types = ['SODOMY', 'Homicide', 'Other Sexual Offense', 'Theft of Vehicle',
              'Sexual Assault', 'Robbery', 'AGG ASSAULT ON P/OFFICER', 'Theft',
              'CRIM NEGLIGENT HOMICIDE', 'AGGR ASSAULT', 'Breaking & Entering',
              'Assault', 'MANSLAUGHTER']
data = data[~data['incident_type_primary'].isin(rare_types)]
print('Rows after removing primary type outliers:', data.shape[0])
Current rows: 239722
Rows after removing primary type outliers: 239310
plt.figure(figsize=(8,10))
data.groupby([data['neighborhood_1']]).size().sort_values(ascending=True)[-70:].plot(kind='barh', color='yellowgreen')
plt.title('Number of crimes by locations')
plt.ylabel('neighborhood_1')
plt.xlabel('Number of crimes')
plt.show()
# Show 2022 vs 2009
# possible show ratio
# grab 2009 data and 2022 data to compare crime charts
data2009 = data[(data['Year'] == 2009)]
data2022 = data[(data['Year'] == 2022)]
# 2009 crimes by location
plt.figure(figsize=(8,10))
data2009.groupby([data2009['neighborhood_1']]).size().sort_values(ascending=True)[-70:].plot(kind='barh', color='yellowgreen')
plt.title('Number Of Crimes By Locations In 2009')
plt.ylabel('Neighborhood')
plt.xlabel('Number Of Crimes')
plt.show()
# 2022 crimes by location
plt.figure(figsize=(8,10))
data2022.groupby([data2022['neighborhood_1']]).size().sort_values(ascending=True)[-70:].plot(kind='barh', color='yellowgreen')
plt.title('Number Of Crimes By Locations In 2022')
plt.ylabel('Neighborhood')
plt.xlabel('Number of crimes')
plt.show()
import plotly.graph_objects as go
# Filter data for 2009 and 2022
data2009 = data[data['Year'] == 2009]
data2022 = data[data['Year'] == 2022]
# Create subplots
fig = go.Figure()
# Subplot 1: 2009 crimes by location
fig.add_trace(go.Bar(
y=data2009.groupby([data2009['neighborhood_1']]).size().sort_values(ascending=True)[-70:].index,
x=data2009.groupby([data2009['neighborhood_1']]).size().sort_values(ascending=True)[-70:],
orientation='h',
marker=dict(color='deepskyblue'),
name='2009'
))
# Subplot 2: 2022 crimes by location
fig.add_trace(go.Bar(
y=data2022.groupby([data2022['neighborhood_1']]).size().sort_values(ascending=True)[-70:].index,
x=data2022.groupby([data2022['neighborhood_1']]).size().sort_values(ascending=True)[-70:],
orientation='h',
marker=dict(color='orchid'),
name='2022'
))
# Update layout for dark theme
fig.update_layout(
title='Number of Crimes by Locations (2009 and 2022)',
yaxis_title='Neighborhood',
xaxis_title='Number of Crimes',
barmode='group',
width=1000,
height=500,
plot_bgcolor='black', # Set background color to black
paper_bgcolor='black', # Set paper color to black
font=dict(color='white') # Set text color to white
)
# Show plot
fig.show()
# make new data frame with map data
buffalo_map = data[['neighborhood_1','incident_type_primary', 'latitude', 'longitude', 'incident_datetime', 'hour_of_day']]
buffalo_map['latitude'] = pd.to_numeric(buffalo_map['latitude'])
buffalo_map['longitude'] = pd.to_numeric(buffalo_map['longitude'])
buffalo_map['hour_of_day'] = pd.to_numeric(buffalo_map['hour_of_day'])
buffalo_map['incident_datetime'] = pd.to_datetime(buffalo_map['incident_datetime'])
buffalo_map['Year'] = buffalo_map['incident_datetime'].dt.year
buffalo_map['Month'] = buffalo_map['incident_datetime'].dt.month
buffalo_map.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 239310 entries, 2009-01-10 12:19:00 to 2023-09-11 11:12:45
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 neighborhood_1 237303 non-null object
1 incident_type_primary 239310 non-null object
2 latitude 234651 non-null float64
3 longitude 234651 non-null float64
4 incident_datetime 239310 non-null datetime64[ns]
5 hour_of_day 239310 non-null int64
6 Year 239310 non-null int64
7 Month 239310 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(3), object(2)
memory usage: 16.4+ MB
# buffalo lat and lon mean
mean_latitude = buffalo_map['latitude'].mean()
print(mean_latitude)
mean_longitude = buffalo_map['longitude'].mean()
print(mean_longitude)
42.911893612215586
-78.84912654111854
# remove outliers that are not in the city limits
buffalo_map = buffalo_map[(buffalo_map['longitude'] < -78.80)]
buffalo_map = buffalo_map[(buffalo_map['latitude'] < 43)]
#buffalo_map.sort_values('Latitude', ascending=False)
#ignoring unknown neighborhoods
buffalo_map = buffalo_map[buffalo_map['neighborhood_1'] != 'UNKNOWN']
# all crimes per neighborhood
sns.lmplot(x = 'longitude',
y = 'latitude',
data=buffalo_map[:],
fit_reg=False,
hue="neighborhood_1",
palette='Dark2',
height=10,
ci=2,
scatter_kws={"marker": "D",
"s": 10})
ax = plt.gca()
ax.set_title("All Crime Distribution Per Neighborhood")
Text(0.5, 1.0, 'All Crime Distribution Per Neighborhood')
# show most common crime per neighborhood
# preprocessing to group most common crime per neighborhood
sdf = buffalo_map.groupby(['neighborhood_1', 'incident_type_primary']).size().reset_index(name='counts')
idx = sdf.groupby(['neighborhood_1'])['counts'].transform('max') == sdf['counts']
sdf = sdf[idx]
other = buffalo_map.groupby('neighborhood_1')[['longitude', 'latitude']].mean()
sdf = sdf.set_index('neighborhood_1').join(other)
sdf = sdf.reset_index().sort_values("counts",ascending=False)
#sns.lmplot(x='longitude', y='latitude',height=10, hue=incident_type_primary', data=sdf,scatter_kws={"s": sdf['counts'].apply(lambda x: x/100.0)}, fit_reg=False)
# scatter plot
sns.lmplot(x='longitude', y='latitude', height=10, hue='incident_type_primary', data=sdf, fit_reg=False, scatter=True)
# Annotation code...
for r in sdf.reset_index().to_numpy():
neighborhood_ = "neighborhood_1: {0}, Count: {1}".format(r[1], int(r[3]))
#neighborhood_ = "neighborhood_1 {0}, Count : {1}".format(int(r[1]), int(r[3]))
x = r[4]
y = r[5]
plt.annotate(
neighborhood_,
xy=(x, y), xytext=(-15, 15),
textcoords='offset points', ha='right', va='bottom',
bbox=dict(boxstyle='round,pad=0.5', fc='grey', alpha=0.3),
arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
plt.show()
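As an aside, the transform-based filter used to build sdf keeps, for each neighborhood, the rows whose count equals that neighborhood's maximum. An equivalent and arguably clearer idiom is a sort plus drop_duplicates, which also resolves ties to a single row; a sketch (sdf_alt is an illustrative name):
# one row per neighborhood: the incident type with the highest count
sdf_alt = (buffalo_map.groupby(['neighborhood_1', 'incident_type_primary'])
           .size().reset_index(name='counts')
           .sort_values('counts', ascending=False)
           .drop_duplicates('neighborhood_1'))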
The graph above distinctly highlights that, across Buffalo neighborhoods, the prevailing type of crime is predominantly larceny or theft. However, a notable exception to this pattern is the Delavan Grider neighborhood, where the dominant crime category is assault.
# buffalo lat and lon mean
mean_latitude = buffalo_map['latitude'].mean()
print(mean_latitude)
mean_longitude = buffalo_map['longitude'].mean()
print(mean_longitude)
42.91184928528912
-78.84964614694492
# interactive map of buffalo showing crime amount per neighborhood
sdf = buffalo_map.groupby(['neighborhood_1', 'incident_type_primary']).size().reset_index(name='counts')
idx = sdf.groupby(['neighborhood_1'])['counts'].transform('max') == sdf['counts']
sdf = sdf[idx]
other = buffalo_map.groupby('neighborhood_1')[['longitude', 'latitude']].mean()
sdf = sdf.set_index('neighborhood_1').join(other)
sdf = sdf.reset_index().sort_values("counts", ascending=False)
# Create a Folium map centered around Buffalo, New York
m = folium.Map(location=[mean_latitude, mean_longitude], zoom_start=12)
# Create the scatter plot
for _, row in sdf.iterrows():
district = f"neighborhood_1: {row['neighborhood_1']}, Count: {int(row['counts'])}"
x = row['latitude']
y = row['longitude']
# Add a marker for each point on the map
folium.Marker([x, y], tooltip=district).add_to(m)
m
"""
This function generates a folium map with Buffalo location and given zoom value.
"""
def generateBaseMap(default_location=[mean_latitude, mean_longitude], default_zoom_start=12):
base_map = folium.Map(location=default_location, control_scale=True, zoom_start=default_zoom_start)
return base_map
buffalo_map.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 231842 entries, 2009-01-10 12:19:00 to 2023-09-11 11:12:45
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 neighborhood_1 231842 non-null object
1 incident_type_primary 231842 non-null object
2 latitude 231842 non-null float64
3 longitude 231842 non-null float64
4 incident_datetime 231842 non-null datetime64[ns]
5 hour_of_day 231842 non-null int64
6 Year 231842 non-null int64
7 Month 231842 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(3), object(2)
memory usage: 15.9+ MB
buffalo_map.head()
neighborhood_1 | incident_type_primary | latitude | longitude | incident_datetime | hour_of_day | Year | Month | |
---|---|---|---|---|---|---|---|---|
incident_datetime | ||||||||
2009-01-10 12:19:00 | North Park | BURGLARY | 42.955 | -78.857 | 2009-01-10 12:19:00 | 12 | 2009 | 1 |
2009-01-10 12:21:00 | Kenfield | BURGLARY | 42.928 | -78.818 | 2009-01-10 12:21:00 | 12 | 2009 | 1 |
2009-01-10 12:27:00 | Masten Park | UUV | 42.917 | -78.863 | 2009-01-10 12:27:00 | 12 | 2009 | 1 |
2009-01-10 12:30:00 | Masten Park | ASSAULT | 42.915 | -78.854 | 2009-01-10 12:30:00 | 12 | 2009 | 1 |
2009-01-10 12:30:00 | MLK Park | BURGLARY | 42.910 | -78.835 | 2009-01-10 12:30:00 | 12 | 2009 | 1 |
# make night & day column
buffalo_map['dayType'] = buffalo_map['hour_of_day'].apply(lambda x: 'Day' if (x >= 6 and x < 18) else 'Night')
# grab summer 2023 data
summer_2023 = buffalo_map.loc[(buffalo_map['Year'] == 2023) & (buffalo_map['Month'] > 5) & (buffalo_map['Month'] < 9)]
# grab summer 2009 data
summer_2009 = buffalo_map.loc[(buffalo_map['Year'] == 2009) & (buffalo_map['Month'] > 5) & (buffalo_map['Month'] < 9)]
print(type(summer_2023))
print(type(summer_2009))
print(summer_2023.shape)
print(summer_2009.shape)
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
(2835, 9)
(5811, 9)
# make day and night data for summer 2023 & summer 2009
summer_2023_day = summer_2023[summer_2023['dayType'] == 'Day']
summer_2023_night = summer_2023[summer_2023['dayType'] == 'Night']
summer_2009_day = summer_2009[summer_2009['dayType'] == 'Day']
summer_2009_night = summer_2009[summer_2009['dayType'] == 'Night']
# Heatmap --> 2023 Summer Days
base_map = generateBaseMap()
HeatMap(data=summer_2023_day[['latitude', 'longitude']].\
groupby(['latitude', 'longitude']).sum().reset_index().values.tolist(), radius=8, max_zoom=12).add_to(base_map)
base_map
# Heatmap --> 2023 Summer Nights
base_map = generateBaseMap()
HeatMap(data=summer_2023_night[['latitude', 'longitude']].\
groupby(['latitude', 'longitude']).sum().reset_index().values.tolist(), radius=8, max_zoom=12).add_to(base_map)
base_map
Upon comparing the day and night heatmaps for Summer 2023, it becomes evident that there is a higher incidence of crime during daylight hours compared to nighttime.
# Heatmap --> 2009 Summer Days
base_map = generateBaseMap()
HeatMap(data=summer_2009_day[['latitude', 'longitude']].\
groupby(['latitude', 'longitude']).sum().reset_index().values.tolist(), radius=8, max_zoom=12).add_to(base_map)
base_map
# Heatmap --> 2009 Summer Nights
base_map = generateBaseMap()
HeatMap(data=summer_2009_night[['latitude', 'longitude']].\
groupby(['latitude', 'longitude']).sum().reset_index().values.tolist(), radius=8, max_zoom=12).add_to(base_map)
base_map
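The heatmaps above pass each unique coordinate pair once, so every location carries equal weight. Weighting each point by its incident count is a small variation; a sketch for the 2023 summer days, using .size() as the weight:
base_map = generateBaseMap()
weighted = (summer_2023_day.groupby(['latitude', 'longitude'])
            .size().reset_index(name='weight')
            .values.tolist())  # [lat, lon, weight] triples
HeatMap(data=weighted, radius=8, max_zoom=12).add_to(base_map)
base_map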
import warnings
#warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
from numpy import mean
from numpy import array
from prettytable import PrettyTable
from tqdm import tqdm_notebook
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Bidirectional
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from sklearn.metrics import mean_squared_error
data['latitude'] = pd.to_numeric(data['latitude'])
data['longitude'] = pd.to_numeric(data['longitude'])
data['hour_of_day'] = pd.to_numeric(data['hour_of_day'])
#ignoring unknown neighborhoods
data = data[data['neighborhood_1'] != 'UNKNOWN']
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 236726 entries, 2009-01-10 12:19:00 to 2023-09-11 11:12:45
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 case_number 236726 non-null object
1 incident_datetime 236726 non-null object
2 incident_type_primary 236726 non-null object
3 incident_description 236726 non-null object
4 parent_incident_type 236726 non-null object
5 hour_of_day 236726 non-null int64
6 day_of_week 236726 non-null object
7 address_1 236710 non-null object
8 city 236726 non-null object
9 state 236726 non-null object
10 location 234291 non-null object
11 latitude 234291 non-null float64
12 longitude 234291 non-null float64
13 created_at 236726 non-null object
14 census_tract_2010 234719 non-null object
15 census_block_group_2010 234719 non-null object
16 census_block_2010 234719 non-null object
17 census_tract 234719 non-null object
18 census_block 234719 non-null object
19 census_block_group 234719 non-null object
20 neighborhood_1 234719 non-null object
21 police_district 234719 non-null object
22 council_district 234719 non-null object
23 tractce20 234856 non-null object
24 geoid20_tract 234856 non-null object
25 geoid20_blockgroup 234856 non-null object
26 geoid20_block 234856 non-null object
27 Year 236726 non-null int64
28 Month 236726 non-null int64
29 dayOfWeek 236726 non-null int64
30 dayOfMonth 236726 non-null int64
31 dayOfYear 236726 non-null int64
32 weekOfMonth 236726 non-null int64
33 weekOfYear 236726 non-null int64
34 MonthNames 236726 non-null object
dtypes: float64(2), int64(8), object(25)
memory usage: 65.0+ MB
# function to split training and test data
def split_sequence(sequence, n_steps):
X, y = list(), list()
for i in range(len(sequence)):
# find the end of this pattern
end_ix = i + n_steps
# check if we are beyond the sequence
if end_ix > len(sequence)-1:
break
# gather input and output parts of the pattern
seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
X.append(seq_x)
y.append(seq_y)
return array(X), array(y)
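split_sequence is not exercised by the moving-average models below (it frames a series for the sequence models imported above), but a quick toy illustration shows what it produces:
# illustrative only: split a toy sequence into supervised (X, y) pairs
X, y = split_sequence([10, 20, 30, 40, 50], n_steps=3)
print(X)  # [[10 20 30], [20 30 40]]
print(y)  # [40 50]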
# decide on the training and test set by using dates
data_tr = data.loc['2011-01-01':'2022-12-31']
data_test = data.loc['2023-01-01':'2023-09-01']
listOfNeigh = list(data['neighborhood_1'].unique())
train_d = []
for neigh in listOfNeigh:
df = data_tr[data_tr['neighborhood_1'] == neigh]
df_gr = df.groupby(['Year', 'Month']).count()
train_d.append(list(df_gr['incident_datetime'].values))
test_d = []
for neigh in listOfNeigh:
df = data_test[data_test['neighborhood_1'] == neigh]
df_gr = df.groupby(['Month']).count()
test_d.append(list(df_gr['incident_datetime'].values))
data_test['neighborhood_1'].unique()
array(['South Park', 'Hopkins-Tifft', 'Lower West Side', 'Central',
'Lovejoy', 'North Park', 'Kensington-Bailey', 'Elmwood Bryant',
'Pratt-Willert', 'Masten Park', 'West Hertel',
'University Heights', 'Broadway Fillmore', 'Elmwood Bidwell',
'Genesee-Moselle', 'Upper West Side', 'West Side', 'Hamlin Park',
'Ellicott', 'Seneca Babcock', 'Kenfield', nan, 'First Ward',
'Allentown', 'Black Rock', 'Delavan Grider', 'Schiller Park',
'Riverside', 'Fruit Belt', 'Central Park', 'MLK Park', 'Parkside',
'Kaisertown', 'Seneca-Cazenovia', 'Grant-Amherst',
'Fillmore-Leroy'], dtype=object)
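Note that the unique() output above still contains nan, which the loops below treat as a neighborhood of its own (see "Neighborhood: nan" in the results). Filtering it out first would be a one-liner; a sketch, not applied here so the outputs stay as generated:
listOfNeigh = [n for n in data['neighborhood_1'].unique() if pd.notna(n)]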
# Simple Moving Average
window = 5
predTot = list()
testTot = list()
# get unique neighborhood names
# NOTE: train_d/test_d were built in the order of data['neighborhood_1'].unique();
# this loop assumes data_test['neighborhood_1'].unique() yields the same ordering
unique_neighborhoods = data_test['neighborhood_1'].unique()
# walk forward over time steps in test
for neighNum, neighborhood in enumerate(unique_neighborhoods):
history = train_d[neighNum]
test = test_d[neighNum]
# check if there is test data for this neighborhood
if len(test) == 0:
continue # skip neighborhoods with no test data
preds = []
for t in range(len(test)):
length = len(history)
yhat = mean([history[i] for i in range(length - window, length)])
obs = test[t]
preds.append(yhat)
history.append(obs)
print('Neighborhood: {}'.format(neighborhood))
print('Actuals: {}'.format(test))
print('Predictions: {}'.format(preds))
# plot
plt.plot(test, color='yellowgreen')
plt.plot(preds, color='steelblue')
# Add neighborhood name as annotation
plt.annotate(neighborhood, (0.02, 0.9), xycoords='axes fraction', fontsize=12, color='black')
plt.title(f'Simple Moving Average - {neighborhood}')
plt.xlabel('Months Starting in Jan')
plt.ylabel('Number Of Crimes')
plt.legend(['Test Data', 'Predictions'])
plt.show()
testTot = testTot + test
predTot = predTot + preds
error = mean_squared_error(predTot, testTot) ** .5
print('Test RMSE: %.3f' % error)
Neighborhood: South Park
Actuals: [67, 50, 72, 65, 63, 58, 45, 55, 1]
Predictions: [41.2, 43.8, 46.0, 50.2, 59.8, 63.4, 61.6, 60.6, 57.2]
Neighborhood: Hopkins-Tifft
Actuals: [30, 16, 17, 35, 16, 27, 18, 33, 2]
Predictions: [22.6, 24.2, 21.2, 18.8, 23.0, 22.8, 22.2, 22.6, 25.8]
Neighborhood: Lower West Side
Actuals: [28, 16, 24, 25, 31, 30, 35, 27, 1]
Predictions: [23.6, 23.6, 21.6, 21.2, 24.2, 24.8, 25.2, 29.0, 29.6]
Neighborhood: Central
Actuals: [16, 11, 7, 15, 16, 20, 16, 13, 2]
Predictions: [11.4, 12.4, 11.8, 9.6, 11.4, 13.0, 13.8, 14.8, 16.0]
Neighborhood: Lovejoy
Actuals: [23, 13, 23, 21, 28, 17, 15, 23, 3]
Predictions: [18.8, 17.8, 15.8, 16.0, 18.4, 21.6, 20.4, 20.8, 20.8]
Neighborhood: North Park
Actuals: [18, 19, 21, 24, 19, 28, 34, 33, 1]
Predictions: [24.8, 20.4, 19.4, 18.2, 19.4, 20.2, 22.2, 25.2, 27.6]
Neighborhood: Kensington-Bailey
Actuals: [32, 27, 30, 36, 40, 34, 34, 41]
Predictions: [28.2, 28.8, 26.6, 27.2, 30.8, 33.0, 33.4, 34.8]
Neighborhood: Elmwood Bryant
Actuals: [41, 39, 52, 44, 45, 45, 42, 53]
Predictions: [40.0, 39.2, 36.8, 38.4, 41.8, 44.2, 45.0, 45.6]
Neighborhood: Pratt-Willert
Actuals: [8, 10, 14, 13, 17, 20, 18, 29, 1]
Predictions: [13.4, 11.4, 9.4, 9.4, 11.0, 12.4, 14.8, 16.4, 19.4]
Neighborhood: Masten Park
Actuals: [33, 21, 20, 10, 19, 15, 14, 24]
Predictions: [16.0, 19.2, 19.0, 18.4, 18.0, 20.6, 17.0, 15.6]
Neighborhood: West Hertel
Actuals: [39, 28, 26, 34, 39, 47, 50, 62, 1]
Predictions: [35.6, 33.6, 30.4, 28.0, 32.0, 33.2, 34.8, 39.2, 46.4]
Neighborhood: University Heights
Actuals: [45, 46, 43, 41, 43, 60, 90, 58, 2]
Predictions: [41.0, 39.4, 40.8, 38.8, 42.0, 43.6, 46.6, 55.4, 58.4]
Neighborhood: Broadway Fillmore
Actuals: [56, 21, 27, 39, 53, 60, 51, 48, 1]
Predictions: [33.0, 34.2, 30.6, 28.4, 33.0, 39.2, 40.0, 46.0, 50.2]
Neighborhood: Elmwood Bidwell
Actuals: [34, 22, 19, 24, 24, 30, 43, 43, 3]
Predictions: [22.4, 23.8, 22.8, 21.4, 22.8, 24.6, 23.8, 28.0, 32.8]
Neighborhood: Genesee-Moselle
Actuals: [32, 26, 29, 29, 34, 26, 33, 38]
Predictions: [31.0, 29.8, 28.0, 24.8, 27.6, 30.0, 28.8, 30.2]
Neighborhood: Upper West Side
Actuals: [18, 7, 18, 14, 10, 19, 13, 27]
Predictions: [15.8, 16.4, 13.6, 13.4, 15.0, 13.4, 13.6, 14.8]
Neighborhood: West Side
Actuals: [36, 24, 40, 40, 52, 60, 51, 52, 5]
Predictions: [35.8, 34.0, 29.6, 29.0, 33.2, 38.4, 43.2, 48.6, 51.0]
Neighborhood: Hamlin Park
Actuals: [30, 15, 11, 13, 30, 19, 20, 15, 1]
Predictions: [17.6, 19.0, 17.6, 16.2, 16.6, 19.8, 17.6, 18.6, 19.4]
Neighborhood: Ellicott
Actuals: [27, 15, 17, 16, 21, 20, 35, 21, 1]
Predictions: [17.6, 19.0, 17.0, 16.6, 18.8, 19.2, 17.8, 21.8, 22.6]
Neighborhood: Seneca Babcock
Actuals: [33, 19, 40, 34, 37, 40, 44, 30, 1]
Predictions: [26.0, 25.2, 23.2, 25.6, 30.6, 32.6, 34.0, 39.0, 37.0]
Neighborhood: Kenfield
Actuals: [13, 14, 16, 14, 20, 16, 17, 14]
Predictions: [14.2, 13.0, 13.6, 14.2, 14.8, 15.4, 16.0, 16.6]
Neighborhood: nan
Actuals: [14, 18, 22, 25, 30, 22, 16, 11]
Predictions: [18.6, 17.0, 17.4, 18.2, 21.2, 21.8, 23.4, 23.0]
Neighborhood: First Ward
Actuals: [29, 26, 29, 39, 30, 36, 37, 26]
Predictions: [25.6, 25.2, 24.2, 23.4, 28.2, 30.6, 32.0, 34.2]
Neighborhood: Allentown
Actuals: [33, 27, 26, 37, 39, 30, 32, 41]
Predictions: [31.2, 33.0, 28.6, 27.4, 30.8, 32.4, 31.8, 32.8]
Neighborhood: Black Rock
Actuals: [12, 15, 14, 24, 21, 25, 23, 15]
Predictions: [13.6, 12.6, 13.2, 11.4, 15.2, 17.2, 19.8, 21.4]
Neighborhood: Delavan Grider
Actuals: [16, 5, 13, 16, 24, 18, 18, 20]
Predictions: [14.8, 15.2, 12.4, 11.2, 13.2, 14.8, 15.2, 17.8]
Neighborhood: Schiller Park
Actuals: [12, 12, 11, 20, 9, 13, 11, 14, 1]
Predictions: [8.4, 7.6, 7.6, 9.0, 12.0, 12.8, 13.0, 12.8, 13.4]
Neighborhood: Riverside
Actuals: [17, 12, 14, 6, 7, 15, 12, 7]
Predictions: [8.2, 9.6, 9.4, 10.8, 11.2, 11.2, 10.8, 10.8]
Neighborhood: Fruit Belt
Actuals: [12, 8, 4, 7, 13, 7, 8, 10]
Predictions: [7.6, 7.4, 8.2, 6.4, 7.0, 8.8, 7.8, 7.8]
Neighborhood: Central Park
Actuals: [19, 16, 9, 17, 14, 18, 22, 17, 1]
Predictions: [14.0, 14.6, 14.4, 13.2, 14.4, 15.0, 14.8, 16.0, 17.6]
Neighborhood: MLK Park
Actuals: [10, 9, 7, 14, 14, 7, 4, 7]
Predictions: [8.4, 8.8, 8.4, 8.2, 10.2, 10.8, 10.2, 9.2]
Neighborhood: Parkside
Actuals: [41, 17, 30, 32, 28, 24, 22, 27, 2]
Predictions: [26.4, 27.6, 24.6, 25.4, 28.8, 29.6, 26.2, 27.2, 26.6]
Neighborhood: Kaisertown
Actuals: [5, 2, 4, 2, 5, 5, 2, 4]
Predictions: [3.4, 3.2, 3.0, 3.4, 3.4, 3.6, 3.6, 3.6]
Neighborhood: Seneca-Cazenovia
Actuals: [25, 13, 20, 21, 22, 15, 22, 19, 1]
Predictions: [17.0, 18.8, 15.8, 14.8, 17.0, 20.2, 18.2, 20.0, 19.8]
Neighborhood: Grant-Amherst
Actuals: [11, 9, 7, 8, 7, 8, 12, 6]
Predictions: [5.6, 6.8, 6.8, 7.0, 7.8, 8.4, 7.8, 8.4]
Test RMSE: 11.191
# Weighted Moving Average
window = 5
predTot = list()
testTot = list()
# get unique neighborhood names
unique_neighborhoods = data_test['neighborhood_1'].unique()
# walk forward over time steps in test
#for neighNum in range(len(train_d)):
for neighNum, neighborhood in enumerate(unique_neighborhoods):
history = train_d[neighNum]
test = test_d[neighNum]
# Check if there is test data for this neighborhood
if len(test) == 0:
continue # Skip neighborhoods with no test data
preds = []
for t in range(len(test)):
length = len(history)
yhat = np.average([history[i] for i in range(length - window, length)], weights=[1,2,3,4,5])
obs = test[t]
preds.append(yhat)
history.append(obs)
#print('Neighborhood: {}'.format(neighNum+1))
print('Neighborhood: {}'.format(neighborhood))
print('Actuals: {}'.format(test))
print('Predictions: {}'.format(preds))
# plot
plt.plot(test, color='yellowgreen')
plt.plot(preds, color='steelblue')
# Add neighborhood name as annotation
plt.annotate(neighborhood, (0.02, 0.9), xycoords='axes fraction', fontsize=12, color='black')
plt.title(f'Weighted Moving Average - {neighborhood}')
plt.xlabel('Months Starting in Jan')
plt.ylabel('Number Of Crimes')
plt.legend(['Test Data', 'Predictions'])
plt.show()
testTot = testTot + test
predTot = predTot + preds
error = mean_squared_error(predTot, testTot) ** .5
print('Test RMSE: %.3f' % error)
Neighborhood: South Park
Actuals: [67, 50, 72, 65, 63, 58, 45, 55, 1]
Predictions: [35.93333333333333, 43.46666666666667, 45.06666666666667, 54.53333333333333, 59.86666666666667, 63.86666666666667, 62.06666666666667, 56.53333333333333, 54.666666666666664]
Neighborhood: Hopkins-Tifft
Actuals: [30, 16, 17, 35, 16, 27, 18, 33, 2]
Predictions: [17.733333333333334, 21.333333333333332, 19.333333333333332, 18.4, 23.533333333333335, 22.2, 23.6, 22.2, 25.666666666666668]
Neighborhood: Lower West Side
Actuals: [28, 16, 24, 25, 31, 30, 35, 27, 1]
Predictions: [20.6, 21.666666666666668, 18.933333333333334, 19.8, 21.733333333333334, 25.8, 27.533333333333335, 30.8, 30.133333333333333]
Neighborhood: Central
Actuals: [16, 11, 7, 15, 16, 20, 16, 13, 2]
Predictions: [11.066666666666666, 11.933333333333334, 11.133333333333333, 9.6, 11.333333333333334, 13.266666666666667, 15.6, 16.333333333333332, 15.733333333333333]
Neighborhood: Lovejoy
Actuals: [23, 13, 23, 21, 28, 17, 15, 23, 3]
Predictions: [14.266666666666667, 16.2, 15.133333333333333, 17.666666666666668, 19.0, 22.8, 21.266666666666666, 19.466666666666665, 20.2]
Neighborhood: North Park
Actuals: [18, 19, 21, 24, 19, 28, 34, 33, 1]
Predictions: [20.933333333333334, 19.266666666666666, 18.0, 18.0, 19.866666666666667, 20.666666666666668, 23.266666666666666, 27.2, 29.8]
Neighborhood: Kensington-Bailey
Actuals: [32, 27, 30, 36, 40, 34, 34, 41]
Predictions: [37.266666666666666, 35.6, 32.53333333333333, 31.333333333333332, 32.4, 34.666666666666664, 35.0, 35.2]
Neighborhood: Elmwood Bryant
Actuals: [41, 39, 52, 44, 45, 45, 42, 53]
Predictions: [46.8, 45.2, 43.13333333333333, 45.8, 45.333333333333336, 45.06666666666667, 45.333333333333336, 44.333333333333336]
Neighborhood: Pratt-Willert
Actuals: [8, 10, 14, 13, 17, 20, 18, 29, 1]
Predictions: [15.466666666666667, 12.466666666666667, 10.733333333333333, 11.0, 11.2, 13.8, 16.333333333333332, 17.4, 21.6]
Neighborhood: Masten Park
Actuals: [33, 21, 20, 10, 19, 15, 14, 24]
Predictions: [17.933333333333334, 23.466666666666665, 23.466666666666665, 23.0, 18.866666666666667, 18.0, 16.133333333333333, 15.133333333333333]
Neighborhood: West Hertel
Actuals: [39, 28, 26, 34, 39, 47, 50, 62, 1]
Predictions: [35.733333333333334, 35.46666666666667, 31.533333333333335, 28.2, 29.133333333333333, 33.6, 38.2, 43.266666666666666, 50.86666666666667]
Neighborhood: University Heights
Actuals: [45, 46, 43, 41, 43, 60, 90, 58, 2]
Predictions: [45.0, 43.13333333333333, 41.46666666666667, 39.733333333333334, 40.46666666666667, 43.0, 48.46666666666667, 62.93333333333333, 63.8]
Neighborhood: Broadway Fillmore
Actuals: [56, 21, 27, 39, 53, 60, 51, 48, 1]
Predictions: [34.86666666666667, 39.333333333333336, 31.933333333333334, 29.133333333333333, 31.933333333333334, 40.0, 46.93333333333333, 50.6, 51.266666666666666]
Neighborhood: Elmwood Bidwell
Actuals: [34, 22, 19, 24, 24, 30, 43, 43, 3]
Predictions: [26.666666666666668, 28.466666666666665, 25.6, 22.266666666666666, 22.2, 23.4, 25.2, 31.6, 36.6]
Neighborhood: Genesee-Moselle
Actuals: [32, 26, 29, 29, 34, 26, 33, 38]
Predictions: [33.13333333333333, 33.13333333333333, 30.933333333333334, 30.266666666666666, 29.4, 30.466666666666665, 29.133333333333333, 30.533333333333335]
Neighborhood: Upper West Side
Actuals: [18, 7, 18, 14, 10, 19, 13, 27]
Predictions: [18.533333333333335, 19.0, 15.533333333333333, 15.933333333333334, 15.066666666666666, 12.8, 14.666666666666666, 14.466666666666667]
Neighborhood: West Side
Actuals: [36, 24, 40, 40, 52, 60, 51, 52, 5]
Predictions: [37.2, 34.53333333333333, 28.933333333333334, 31.066666666666666, 33.93333333333333, 41.6, 48.8, 51.4, 52.53333333333333]
Neighborhood: Hamlin Park
Actuals: [30, 15, 11, 13, 30, 19, 20, 15, 1]
Predictions: [12.866666666666667, 17.2, 16.533333333333335, 14.8, 14.333333333333334, 19.666666666666668, 19.4, 20.2, 19.0]
Neighborhood: Ellicott
Actuals: [27, 15, 17, 16, 21, 20, 35, 21, 1]
Predictions: [17.0, 19.466666666666665, 17.533333333333335, 16.6, 16.533333333333335, 18.466666666666665, 18.733333333333334, 24.466666666666665, 24.2]
Neighborhood: Seneca Babcock
Actuals: [33, 19, 40, 34, 37, 40, 44, 30, 1]
Predictions: [24.933333333333334, 25.8, 22.266666666666666, 27.133333333333333, 30.266666666666666, 34.13333333333333, 36.6, 39.93333333333333, 36.93333333333333]
Neighborhood: Kenfield
Actuals: [13, 14, 16, 14, 20, 16, 17, 14]
Predictions: [16.0, 14.933333333333334, 14.266666666666667, 14.666666666666666, 14.4, 16.333333333333332, 16.533333333333335, 16.866666666666667]
Neighborhood: nan
Actuals: [14, 18, 22, 25, 30, 22, 16, 11]
Predictions: [18.0, 15.733333333333333, 15.533333333333333, 17.466666666666665, 20.4, 24.4, 24.466666666666665, 22.0]
Neighborhood: First Ward
Actuals: [29, 26, 29, 39, 30, 36, 37, 26]
Predictions: [32.333333333333336, 30.8, 28.933333333333334, 28.333333333333332, 31.533333333333335, 31.6, 33.4, 35.06666666666667]
Neighborhood: Allentown
Actuals: [33, 27, 26, 37, 39, 30, 32, 41]
Predictions: [35.86666666666667, 34.93333333333333, 32.266666666666666, 30.066666666666666, 31.8, 33.86666666666667, 33.06666666666667, 33.13333333333333]
Neighborhood: Black Rock
Actuals: [12, 15, 14, 24, 21, 25, 23, 15]
Predictions: [20.533333333333335, 17.333333333333332, 15.933333333333334, 14.6, 17.333333333333332, 19.0, 21.6, 22.666666666666668]
Neighborhood: Delavan Grider
Actuals: [16, 5, 13, 16, 24, 18, 18, 20]
Predictions: [19.333333333333332, 18.266666666666666, 13.533333333333333, 12.733333333333333, 13.266666666666667, 16.6, 17.666666666666668, 18.6]
Neighborhood: Schiller Park
Actuals: [12, 12, 11, 20, 9, 13, 11, 14, 1]
Predictions: [8.6, 9.4, 10.0, 10.333333333333334, 13.666666666666666, 12.933333333333334, 13.0, 12.333333333333334, 12.733333333333333]
Neighborhood: Riverside
Actuals: [17, 12, 14, 6, 7, 15, 12, 7]
Predictions: [9.866666666666667, 12.4, 12.533333333333333, 13.0, 10.866666666666667, 9.466666666666667, 10.733333333333333, 11.133333333333333]
Neighborhood: Fruit Belt
Actuals: [12, 8, 4, 7, 13, 7, 8, 10]
Predictions: [9.066666666666666, 10.066666666666666, 9.4, 7.733333333333333, 7.266666666666667, 8.866666666666667, 8.266666666666667, 8.333333333333334]
Neighborhood: Central Park
Actuals: [19, 16, 9, 17, 14, 18, 22, 17, 1]
Predictions: [12.6, 14.133333333333333, 14.333333333333334, 12.333333333333334, 13.866666666666667, 14.4, 15.4, 17.8, 18.133333333333333]
Neighborhood: MLK Park
Actuals: [10, 9, 7, 14, 14, 7, 4, 7]
Predictions: [7.6, 7.866666666666666, 8.066666666666666, 7.933333333333334, 10.133333333333333, 11.666666666666666, 10.4, 8.333333333333334]
Neighborhood: Parkside
Actuals: [41, 17, 30, 32, 28, 24, 22, 27, 2]
Predictions: [17.333333333333332, 24.133333333333333, 22.066666666666666, 24.8, 27.666666666666668, 28.866666666666667, 27.0, 25.6, 25.533333333333335]
Neighborhood: Kaisertown
Actuals: [5, 2, 4, 2, 5, 5, 2, 4]
Predictions: [3.6666666666666665, 4.133333333333334, 3.4, 3.533333333333333, 3.066666666666667, 3.6, 4.066666666666666, 3.533333333333333]
Neighborhood: Seneca-Cazenovia
Actuals: [25, 13, 20, 21, 22, 15, 22, 19, 1]
Predictions: [13.266666666666667, 16.333333333333332, 15.2, 16.533333333333335, 18.333333333333332, 20.333333333333332, 18.6, 19.866666666666667, 19.533333333333335]
Neighborhood: Grant-Amherst
Actuals: [11, 9, 7, 8, 7, 8, 12, 6]
Predictions: [8.266666666666667, 9.2, 9.266666666666667, 8.533333333333333, 8.2, 7.8, 7.666666666666667, 9.066666666666666]
Test RMSE: 11.405
# Exponential Moving Average
predTot = list()
testTot = list()
alpha = 0.6
# Get unique neighborhood names
unique_neighborhoods = data_test['neighborhood_1'].unique()
# Walk forward over time steps in test
for neighNum, neighborhood in enumerate(unique_neighborhoods):
history = train_d[neighNum]
test = test_d[neighNum]
# Check if there is test data for this neighborhood
if len(test) == 0:
continue # Skip neighborhoods with no test data
preds = []
lastPred = 0  # zero initialization biases each neighborhood's first forecast toward 0 (visible in the first predictions below)
for t in range(len(test)):
yhat = ((1-alpha)*lastPred + (alpha*history[-1]))
lastPred = yhat
obs = test[t]
preds.append(yhat)
history.append(obs)
# Plot
plt.figure(figsize=(8, 4)) # Adjust figure size
plt.plot(test, color='yellowgreen')
plt.plot(preds, color='steelblue')
# Add neighborhood name as annotation
plt.annotate(neighborhood, (0.02, 0.9), xycoords='axes fraction', fontsize=12, color='black')
plt.title(f'Exponential Moving Average - {neighborhood}')
plt.xlabel('Months Starting in Jan')
plt.ylabel('Number Of Crimes')
plt.legend(['Test Data', 'Predictions'])
plt.show()
#print('Neighborhood: {}'.format(neighNum+1))
print('Neighborhood: {}'.format(neighborhood))
print('Actuals: {}'.format(test))
print('Predictions: {}'.format(preds))
testTot = testTot + test
predTot = predTot + preds
error = mean_squared_error(predTot, testTot) ** .5
print('Test RMSE: %.3f' % error)
Neighborhood: South Park
Actuals: [67, 50, 72, 65, 63, 58, 45, 55, 1]
Predictions: [0.6, 40.44, 46.176, 61.6704, 63.66816, 63.267264, 60.1069056, 51.04276224, 53.417104896]
Neighborhood: Hopkins-Tifft
Actuals: [30, 16, 17, 35, 16, 27, 18, 33, 2]
Predictions: [1.2, 18.48, 16.992, 16.9968, 27.79872, 20.719488, 24.4877952, 20.59511808, 28.038047232]
Neighborhood: Lower West Side
Actuals: [28, 16, 24, 25, 31, 30, 35, 27, 1]
Predictions: [0.6, 17.04, 16.416, 20.9664, 23.386560000000003, 27.954624, 29.1818496, 32.67273984, 29.269095936]
Neighborhood: Central
Actuals: [16, 11, 7, 15, 16, 20, 16, 13, 2]
Predictions: [1.2, 10.08, 10.632, 8.4528, 12.38112, 14.552448, 17.8209792, 16.72839168, 14.491356672000002]
Neighborhood: Lovejoy
Actuals: [23, 13, 23, 21, 28, 17, 15, 23, 3]
Predictions: [1.7999999999999998, 14.52, 13.608, 19.2432, 20.29728, 24.918912, 20.1675648, 17.06702592, 20.626810368]
Neighborhood: North Park
Actuals: [18, 19, 21, 24, 19, 28, 34, 33, 1]
Predictions: [0.6, 11.04, 15.815999999999999, 18.9264, 21.97056, 20.188223999999998, 24.875289600000002, 30.35011584, 31.940046336]
Neighborhood: Kensington-Bailey
Actuals: [32, 27, 30, 36, 40, 34, 34, 41]
Predictions: [24.599999999999998, 29.04, 27.816, 29.1264, 33.25056, 37.300224, 35.3200896, 34.52803584]
Neighborhood: Elmwood Bryant
Actuals: [41, 39, 52, 44, 45, 45, 42, 53]
Predictions: [31.799999999999997, 37.31999999999999, 38.327999999999996, 46.5312, 45.01248, 45.004992, 45.0019968, 43.20079872]
Neighborhood: Pratt-Willert
Actuals: [8, 10, 14, 13, 17, 20, 18, 29, 1]
Predictions: [0.6, 5.04, 8.016, 11.6064, 12.44256, 15.177024, 18.0708096, 18.02832384, 24.611329536]
Neighborhood: Masten Park
Actuals: [33, 21, 20, 10, 19, 15, 14, 24]
Predictions: [14.399999999999999, 25.560000000000002, 22.824, 21.129600000000003, 14.451840000000002, 17.180736000000003, 15.872294400000001, 14.748917760000001]
Neighborhood: West Hertel
Actuals: [39, 28, 26, 34, 39, 47, 50, 62, 1]
Predictions: [0.6, 23.639999999999997, 26.256, 26.102400000000003, 30.840960000000003, 35.736384, 42.4945536, 46.99782144, 55.999128576]
Neighborhood: University Heights
Actuals: [45, 46, 43, 41, 43, 60, 90, 58, 2]
Predictions: [1.2, 27.48, 38.592, 41.2368, 41.094719999999995, 42.237888, 52.895155200000005, 75.15806208000001, 64.863224832]
Neighborhood: Broadway Fillmore
Actuals: [56, 21, 27, 39, 53, 60, 51, 48, 1]
Predictions: [0.6, 33.84, 26.136000000000003, 26.654400000000003, 34.06176, 45.424704, 54.1698816, 52.26795264, 49.707181055999996]
Neighborhood: Elmwood Bidwell
Actuals: [34, 22, 19, 24, 24, 30, 43, 43, 3]
Predictions: [1.7999999999999998, 21.119999999999997, 21.647999999999996, 20.059199999999997, 22.423679999999997, 23.369472, 27.3477888, 36.73911552, 40.495646208]
Neighborhood: Genesee-Moselle
Actuals: [32, 26, 29, 29, 34, 26, 33, 38]
Predictions: [22.8, 28.32, 26.928, 28.1712, 28.66848, 31.867392, 28.3469568, 31.138782720000002]
Neighborhood: Upper West Side
Actuals: [18, 7, 18, 14, 10, 19, 13, 27]
Predictions: [16.2, 17.28, 11.112000000000002, 15.2448, 14.49792, 11.799168000000002, 16.119667200000002, 14.24786688]
Neighborhood: West Side
Actuals: [36, 24, 40, 40, 52, 60, 51, 52, 5]
Predictions: [3.0, 22.799999999999997, 23.519999999999996, 33.408, 37.3632, 46.14528, 54.458112, 52.3832448, 52.15329792]
Neighborhood: Hamlin Park
Actuals: [30, 15, 11, 13, 30, 19, 20, 15, 1]
Predictions: [0.6, 18.24, 16.296, 13.1184, 13.047360000000001, 23.218944, 20.6875776, 20.275031040000002, 17.110012416000004]
Neighborhood: Ellicott
Actuals: [27, 15, 17, 16, 21, 20, 35, 21, 1]
Predictions: [0.6, 16.439999999999998, 15.576, 16.4304, 16.172159999999998, 19.068863999999998, 19.627545599999998, 28.85101824, 24.140407296]
Neighborhood: Seneca Babcock
Actuals: [33, 19, 40, 34, 37, 40, 44, 30, 1]
Predictions: [0.6, 20.04, 19.416, 31.7664, 33.10656, 35.442624, 38.177049600000004, 41.67081984, 34.668327936]
Neighborhood: Kenfield
Actuals: [13, 14, 16, 14, 20, 16, 17, 14]
Predictions: [8.4, 11.16, 12.864, 14.7456, 14.29824, 17.719296, 16.6877184, 16.875087360000002]
Neighborhood: nan
Actuals: [14, 18, 22, 25, 30, 22, 16, 11]
Predictions: [6.6, 11.040000000000001, 15.216, 19.2864, 22.71456, 27.085824000000002, 24.0343296, 19.21373184]
Neighborhood: First Ward
Actuals: [29, 26, 29, 39, 30, 36, 37, 26]
Predictions: [15.6, 23.64, 25.056, 27.4224, 34.36896, 31.747584000000003, 34.2990336, 35.91961344]
Neighborhood: Allentown
Actuals: [33, 27, 26, 37, 39, 30, 32, 41]
Predictions: [24.599999999999998, 29.64, 28.056, 26.822400000000002, 32.928960000000004, 36.571584, 32.6286336, 32.25145344]
Neighborhood: Black Rock
Actuals: [12, 15, 14, 24, 21, 25, 23, 15]
Predictions: [9.0, 10.799999999999999, 13.32, 13.728000000000002, 19.891199999999998, 20.55648, 23.222592, 23.0890368]
Neighborhood: Delavan Grider
Actuals: [16, 5, 13, 16, 24, 18, 18, 20]
Predictions: [12.0, 14.4, 8.760000000000002, 11.304, 14.1216, 20.04864, 18.819456, 18.327782399999997]
Neighborhood: Schiller Park
Actuals: [12, 12, 11, 20, 9, 13, 11, 14, 1]
Predictions: [0.6, 7.4399999999999995, 10.175999999999998, 10.670399999999999, 16.26816, 11.907264000000001, 12.5629056, 11.62516224, 13.050064896]
Neighborhood: Riverside
Actuals: [17, 12, 14, 6, 7, 15, 12, 7]
Predictions: [4.2, 11.879999999999999, 11.951999999999998, 13.1808, 8.87232, 7.748928, 12.0995712, 12.03982848]
Neighborhood: Fruit Belt
Actuals: [12, 8, 4, 7, 13, 7, 8, 10]
Predictions: [6.0, 9.6, 8.64, 5.856, 6.542400000000001, 10.41696, 8.366783999999999, 8.1467136]
Neighborhood: Central Park
Actuals: [19, 16, 9, 17, 14, 18, 22, 17, 1]
Predictions: [0.6, 11.64, 14.256, 11.1024, 14.64096, 14.256384, 16.5025536, 19.80102144, 18.120408576]
Neighborhood: MLK Park
Actuals: [10, 9, 7, 14, 14, 7, 4, 7]
Predictions: [4.2, 7.68, 8.472, 7.5888, 11.43552, 12.974208, 9.3896832, 6.15587328]
Neighborhood: Parkside
Actuals: [41, 17, 30, 32, 28, 24, 22, 27, 2]
Predictions: [1.2, 25.08, 20.232, 26.0928, 29.63712, 28.654848, 25.861939200000002, 23.54477568, 25.617910272]
Neighborhood: Kaisertown
Actuals: [5, 2, 4, 2, 5, 5, 2, 4]
Predictions: [2.4, 3.96, 2.784, 3.5136, 2.6054399999999998, 4.0421759999999995, 4.6168704, 3.04674816]
Neighborhood: Seneca-Cazenovia
Actuals: [25, 13, 20, 21, 22, 15, 22, 19, 1]
Predictions: [0.6, 15.24, 13.896, 17.5584, 19.623359999999998, 21.049343999999998, 17.419737599999998, 20.167895039999998, 19.467158016]
Neighborhood: Grant-Amherst
Actuals: [11, 9, 7, 8, 7, 8, 12, 6]
Predictions: [3.5999999999999996, 8.04, 8.616, 7.6464, 7.85856, 7.343424000000001, 7.7373696, 10.294947839999999]
Test RMSE: 13.853
In conclusion, the graphs and charts presented throughout the project have been instrumental in conveying critical insights:
We observed a remarkable annual decline in the total number of crimes since 2009.
The year 2022 accounted for a relatively modest 3.95% of the total crimes recorded in the dataset spanning from 2009 to the present day.
While Fridays appeared to exhibit a slightly higher incidence of crimes when compared to other days, the difference was not markedly significant.
February consistently registered the lowest number of crimes per month, as evident from the graphical representations.
The annual crime rate displayed a declining trend, characterized by a distinctive zigzag pattern, with crime receding during colder seasons and resurging during hotter months.
The specific week within a month appeared to have minimal impact on crime rates; the fifth week recorded fewer incidents simply because it spans fewer days.
Our hypothesis that the blizzard caused the early-2023 decrease in crime was disproven; the dip is attributable to typical February weather and the usual dampening effect of freezing conditions.
The predominant type of crime across Buffalo neighborhoods was larceny or theft, with the noteworthy exception of the Delavan Grider neighborhood, where assault was the dominant category.
In terms of forecasting accuracy, the Root Mean Square Errors (RMSE) for the monthly per-neighborhood crime predictions were 11.191 for the simple moving average, 11.405 for the weighted moving average, and 13.853 for the exponential moving average. The simple moving average performed best; the exponential model's higher error likely reflects, in part, its zero-initialized first forecast for each neighborhood.