Erie County Redfin Housing Sales

Erie County Housing Sales Analysis using Refin API, Pandas and Tableau

November 02, 2023 · 60 mins read

Table of Contents

Summary

In this post, we explore real estate market trends in Erie County using Python and Tableau. We leverage various Python libraries such as pandas, numpy, matplotlib, seaborn, and others to analyze and visualize the data. Our primary data source is a Redfin market tracker dataset for zip codes, which we load and process to gain insights into the real estate market.

Data Loading and Exploration: We start by importing necessary packages and loading the dataset from the Redfin market tracker. The dataset is a comprehensive source of information, and we print the number of rows, columns, and display the first few rows to get a glimpse of the data.

Geospatial Data: To enhance our analysis, we integrate geospatial data that includes information about states, counties, and zip codes. We use this data to filter and focus on a specific state, in this case, New York. Visualizing the geographical distribution of data adds context to our analysis, and we display an image of Erie County for reference.

Data Cleaning: To ensure the accuracy of our analysis, we check the validity of zip codes in the geospatial data. Invalid zip codes are identified and reported. We create a cleaned dataset containing only valid zip codes for further analysis.

Merging Datasets: We merge the Redfin market data with the cleaned geospatial data, combining relevant information such as zip codes, cities, and counties. This merged dataset sets the foundation for our subsequent analysis.

Exploratory Data Analysis (EDA): We perform exploratory data analysis by selecting a specific zip code and property type. This allows us to analyze trends over time, specifically focusing on metrics such as median days on market. Additionally, we correct an error in the calculation of month-over-month changes in median days on market.

Feature Engineering: We enhance our dataset by adding features like the most recent date flag and correcting the median days on market based on chronological order. These features contribute to a more comprehensive analysis of real estate market trends.

Conclusion and Download: In conclusion, we provide a downloadable CSV file containing the processed dataset to further explore the data in Tableau.

Python Code

# import packages

import pandas as pd
import numpy as np

import time
from datetime import datetime

import matplotlib.pyplot as plt
import seaborn as sns

import os


plt.style.use('seaborn-v0_8-darkgrid')
# warnings ignore
import warnings
# set warnings to ignore
#warnings.filterwarnings('ignore')

pd.options.mode.chained_assignment = None  # default='warn'
pd.set_option('display.max_columns', None)

url = 'https://redfin-public-data.s3.us-west-2.amazonaws.com/redfin_market_tracker/zip_code_market_tracker.tsv000.gz'

# start time to read file
startTime = time.time()
df = pd.read_csv(url, compression='gzip', sep='\t', on_bad_lines='skip')
# end time
executionTime = (time.time() - startTime)
print('Execution time in minutes: ' + str(round(executionTime / 60, 2)))
print('Num of rows:', len(df))
print('Num of cols:', len(df.columns))
df.head()
Execution time in minutes: 3.0
Num of rows: 7476840
Num of cols: 58
period_begin period_end period_duration region_type region_type_id table_id is_seasonally_adjusted region city state state_code property_type property_type_id median_sale_price median_sale_price_mom median_sale_price_yoy median_list_price median_list_price_mom median_list_price_yoy median_ppsf median_ppsf_mom median_ppsf_yoy median_list_ppsf median_list_ppsf_mom median_list_ppsf_yoy homes_sold homes_sold_mom homes_sold_yoy pending_sales pending_sales_mom pending_sales_yoy new_listings new_listings_mom new_listings_yoy inventory inventory_mom inventory_yoy months_of_supply months_of_supply_mom months_of_supply_yoy median_dom median_dom_mom median_dom_yoy avg_sale_to_list avg_sale_to_list_mom avg_sale_to_list_yoy sold_above_list sold_above_list_mom sold_above_list_yoy price_drops price_drops_mom price_drops_yoy off_market_in_two_weeks off_market_in_two_weeks_mom off_market_in_two_weeks_yoy parent_metro_region parent_metro_region_metro_code last_updated
0 2017-02-01 2017-04-30 90 zip code 2 2921 f Zip Code: 08037 NaN New Jersey NJ Multi-Family (2-4 Unit) 4 233000.0 0.000000 NaN 149900.0 0.000000 NaN 104.719101 0.000000 NaN 78.693331 0.000000 NaN 1.0 0.000000 NaN NaN NaN NaN 2.0 0.000000 NaN 1.0 0.000000 NaN NaN NaN NaN 135.0 0.0 NaN 0.913725 0.000000 NaN 0.000000 0.000000 NaN NaN NaN NaN NaN NaN NaN Atlantic City, NJ 12100 2023-10-29 14:25:50
1 2016-12-01 2017-02-28 90 zip code 2 36342 f Zip Code: 85303 NaN Arizona AZ All Residential -1 182750.0 0.065598 0.068713 189900.0 0.026486 0.085143 110.310638 0.002052 0.089112 117.628681 0.005186 0.124287 68.0 -0.190476 -0.252747 35.0 0.346154 0.166667 107.0 0.000000 0.019048 82.0 -0.136842 0.138889 NaN NaN NaN 53.0 8.0 12.0 0.983675 -0.004671 0.004752 0.205882 -0.044118 0.030058 NaN NaN NaN 0.257143 0.026374 -0.109524 Phoenix, AZ 38060 2023-10-29 14:25:50
2 2018-07-01 2018-09-30 90 zip code 2 5832 f Zip Code: 15644 NaN Pennsylvania PA Single Family Residential 6 116500.0 0.030973 0.165583 99450.0 -0.074022 0.050158 106.165882 0.099922 0.059423 86.666579 -0.013773 -0.115633 58.0 -0.016949 -0.033333 10.0 -0.285714 -0.285714 61.0 -0.217949 -0.298851 104.0 -0.009524 -0.111111 NaN NaN NaN 74.0 -16.0 -13.5 0.959806 -0.005870 0.001646 0.103448 -0.082992 -0.096552 NaN NaN NaN 0.000000 -0.285714 -0.071429 Pittsburgh, PA 38300 2023-10-29 14:25:50
3 2014-09-01 2014-11-30 90 zip code 2 17223 f Zip Code: 40507 NaN Kentucky KY Condo/Co-op 3 238000.0 0.012766 -0.004184 196950.0 -0.083527 -0.218452 188.000000 0.026643 0.031127 194.026540 0.040589 -0.002149 9.0 -0.181818 0.000000 NaN NaN NaN 6.0 -0.454545 1.000000 19.0 -0.173913 -0.441176 NaN NaN NaN 104.5 -21.5 -68.5 0.979014 0.017979 0.034035 0.111111 0.111111 0.111111 NaN NaN NaN NaN NaN NaN Lexington, KY 30460 2023-10-29 14:25:50
4 2021-02-01 2021-04-30 90 zip code 2 36264 f Zip Code: 85212 NaN Arizona AZ Single Family Residential 6 454500.0 0.041822 0.218662 451275.0 0.005067 0.236370 209.806157 0.039224 0.242289 212.842720 0.013018 0.225578 265.0 0.019231 0.031128 92.0 -0.041667 0.150000 294.0 0.069091 -0.023256 68.0 0.096774 -0.595238 NaN NaN NaN 12.0 -6.0 -27.0 1.025972 0.007849 0.031341 0.630189 0.084035 0.392835 NaN NaN NaN 0.630435 0.026268 0.242935 Phoenix, AZ 38060 2023-10-29 14:25:50
# filter on state code
state_code = 'NY'
df_filter =df.loc[df['state_code'] == state_code]
print('Num of rows:', len(df_filter))
df_filter.head()
Num of rows: 483632
period_begin period_end period_duration region_type region_type_id table_id is_seasonally_adjusted region city state state_code property_type property_type_id median_sale_price median_sale_price_mom median_sale_price_yoy median_list_price median_list_price_mom median_list_price_yoy median_ppsf median_ppsf_mom median_ppsf_yoy median_list_ppsf median_list_ppsf_mom median_list_ppsf_yoy homes_sold homes_sold_mom homes_sold_yoy pending_sales pending_sales_mom pending_sales_yoy new_listings new_listings_mom new_listings_yoy inventory inventory_mom inventory_yoy months_of_supply months_of_supply_mom months_of_supply_yoy median_dom median_dom_mom median_dom_yoy avg_sale_to_list avg_sale_to_list_mom avg_sale_to_list_yoy sold_above_list sold_above_list_mom sold_above_list_yoy price_drops price_drops_mom price_drops_yoy off_market_in_two_weeks off_market_in_two_weeks_mom off_market_in_two_weeks_yoy parent_metro_region parent_metro_region_metro_code last_updated
21 2018-08-01 2018-10-31 90 zip code 2 3452 f Zip Code: 10509 NaN New York NY Townhouse 13 250000.0 0.000000 NaN 314000.0 0.000000 NaN 156.250000 0.000000 NaN 179.985868 0.000000 NaN 1.0 0.000000 NaN NaN NaN NaN 2.0 0.000000 NaN 2.0 0.000000 NaN NaN NaN NaN 139.0 0.0 NaN 0.980392 0.000000 NaN 0.000000 0.000000 NaN NaN NaN NaN NaN NaN NaN New York, NY 35614 2023-10-29 14:25:50
61 2016-07-01 2016-09-30 90 zip code 2 4280 f Zip Code: 12547 NaN New York NY Multi-Family (2-4 Unit) 4 155000.0 0.000000 NaN 619949.5 1.584200 NaN 184.523810 0.000000 NaN 134.106454 0.233176 NaN 1.0 0.000000 NaN NaN NaN NaN 2.0 1.000000 NaN 3.0 0.000000 NaN NaN NaN NaN 154.0 0.0 NaN 0.837838 0.000000 NaN 0.000000 0.000000 NaN NaN NaN NaN NaN NaN NaN Kingston, NY 28740 2023-10-29 14:25:50
70 2012-08-01 2012-10-31 90 zip code 2 3790 f Zip Code: 11572 NaN New York NY Single Family Residential 6 396250.0 0.031901 -0.016749 399499.5 -0.110246 -0.076300 231.925656 -0.021452 -0.022505 238.468989 -0.029403 -0.069139 50.0 -0.019608 0.111111 5.0 -0.545455 -0.375000 70.0 -0.135802 0.129032 142.0 0.028986 -0.027397 NaN NaN NaN 104.5 -0.5 20.5 0.945461 0.000605 0.001794 0.040000 0.000784 -0.071111 NaN NaN NaN 0.00 -0.090909 0.000000 Nassau County, NY 35004 2023-10-29 14:25:50
77 2019-05-01 2019-07-31 90 zip code 2 4440 f Zip Code: 12878 NaN New York NY Single Family Residential 6 158000.0 -0.363417 0.166052 164900.0 0.157193 -0.586717 107.513499 0.020027 -0.997177 93.163842 -0.035048 -0.512194 4.0 1.000000 1.000000 1.0 NaN NaN 5.0 1.500000 0.250000 5.0 0.000000 -0.166667 NaN NaN NaN 230.0 -74.0 -200.0 0.917599 -0.049489 -0.100128 0.000000 0.000000 -0.500000 NaN NaN NaN 0.00 NaN NaN Glens Falls, NY 24020 2023-10-29 14:25:50
86 2021-09-01 2021-11-30 90 zip code 2 3818 f Zip Code: 11706 NaN New York NY Single Family Residential 6 465000.0 0.015284 0.134146 450000.0 -0.020664 0.058824 302.572385 -0.032961 0.248304 319.148936 0.022469 0.153220 122.0 0.079646 -0.122302 25.0 -0.264706 -0.074074 105.0 -0.062500 -0.355828 63.0 -0.030769 -0.343750 NaN NaN NaN 21.0 0.0 -3.0 1.040465 0.002822 0.036925 0.737705 0.029740 0.226914 NaN NaN NaN 0.44 0.087059 0.291852 Nassau County, NY 35004 2023-10-29 14:25:50
df_filter.describe()
period_duration region_type_id table_id city property_type_id median_sale_price median_sale_price_mom median_sale_price_yoy median_list_price median_list_price_mom median_list_price_yoy median_ppsf median_ppsf_mom median_ppsf_yoy median_list_ppsf median_list_ppsf_mom median_list_ppsf_yoy homes_sold homes_sold_mom homes_sold_yoy pending_sales pending_sales_mom pending_sales_yoy new_listings new_listings_mom new_listings_yoy inventory inventory_mom inventory_yoy months_of_supply months_of_supply_mom months_of_supply_yoy median_dom median_dom_mom median_dom_yoy avg_sale_to_list avg_sale_to_list_mom avg_sale_to_list_yoy sold_above_list sold_above_list_mom sold_above_list_yoy price_drops price_drops_mom price_drops_yoy off_market_in_two_weeks off_market_in_two_weeks_mom off_market_in_two_weeks_yoy parent_metro_region_metro_code
count 483632.0 483632.0 483632.000000 0.0 483632.000000 4.836310e+05 458476.000000 411219.000000 4.411990e+05 416001.000000 381840.000000 4.725180e+05 447056.000000 400187.000000 4.349210e+05 409627.000000 375851.000000 483632.000000 458478.000000 411221.000000 307990.000000 249870.000000 236135.000000 441428.000000 416233.000000 382079.000000 447182.000000 423817.000000 388225.000000 0.0 0.0 0.0 481521.00000 456316.000000 409097.000000 481160.000000 455978.000000 408895.000000 483632.000000 458478.000000 411221.000000 0.0 0.0 0.0 307990.000000 249870.000000 236135.000000 483632.000000
mean 90.0 2.0 4344.397916 NaN 3.440525 4.171076e+05 0.069758 0.218782 4.482294e+05 0.045579 0.146212 2.442654e+02 0.135518 0.457926 2.669602e+02 0.053183 1.116265 18.299653 0.074915 0.243026 6.713721 0.257825 0.368467 25.987049 0.068378 0.176109 32.909323 0.031351 0.062185 NaN NaN NaN 124.88643 -0.807652 -10.342834 0.964606 0.000505 0.005243 0.213572 0.002272 0.025362 NaN NaN NaN 0.235077 0.000291 0.026445 31491.533658
std 0.0 0.0 1548.475951 NaN 3.950440 9.143662e+05 7.179519 14.369080 9.020977e+05 2.910467 5.015077 4.023996e+03 24.710078 62.409171 4.333815e+03 5.245511 231.303425 27.660098 0.457406 1.051684 8.744142 1.000105 1.334226 36.932721 0.467056 0.962321 52.466888 0.350431 0.930395 NaN NaN NaN 215.24094 137.166799 223.626941 0.060563 0.034060 0.062259 0.256130 0.144508 0.260508 NaN NaN NaN 0.307035 0.325926 0.327081 10856.871769
min 90.0 2.0 1.000000 NaN -1.000000 1.000000e+00 -0.999454 -0.999974 1.230000e+02 -0.999381 -0.996364 2.666667e-04 -0.999640 -0.999986 1.500000e-03 -0.999947 -0.999612 1.000000 -0.900000 -0.967742 1.000000 -0.968750 -0.969697 1.000000 -0.937500 -0.969697 1.000000 -0.937500 -0.982759 NaN NaN NaN 1.00000 -9536.000000 -9961.000000 0.500000 -0.902985 -1.158557 0.000000 -1.000000 -1.000000 NaN NaN NaN 0.000000 -2.000000 -1.666667 10580.000000
25% 90.0 2.0 3759.000000 NaN -1.000000 1.310000e+05 -0.044776 -0.095863 1.424000e+05 -0.049774 -0.081622 8.241040e+01 -0.036262 -0.074294 9.030120e+01 -0.038677 -0.060868 2.000000 -0.142857 -0.250000 1.000000 -0.333333 -0.320000 4.000000 -0.168224 -0.250000 5.000000 -0.111111 -0.310345 NaN NaN NaN 43.00000 -9.500000 -41.000000 0.941320 -0.007863 -0.017047 0.000000 -0.020202 -0.052632 NaN NaN NaN 0.000000 -0.111888 -0.079365 24100.000000
50% 90.0 2.0 4213.000000 NaN 4.000000 2.275000e+05 0.000000 0.053459 2.490000e+05 0.000000 0.044413 1.354059e+02 0.000000 0.051813 1.460769e+02 0.000000 0.044410 7.000000 0.000000 0.000000 3.000000 0.000000 0.000000 11.000000 0.000000 0.000000 14.000000 0.000000 -0.076923 NaN NaN NaN 86.00000 0.000000 -6.000000 0.968767 0.000000 0.004262 0.138889 0.000000 0.000000 NaN NaN NaN 0.090909 0.000000 0.000000 35614.000000
75% 90.0 2.0 4853.000000 NaN 6.000000 4.600000e+05 0.055650 0.226415 4.999000e+05 0.058881 0.188413 2.661241e+02 0.047213 0.199768 2.911184e+02 0.047801 0.165407 22.000000 0.166667 0.400000 8.000000 0.500000 0.625000 32.000000 0.200000 0.322581 38.000000 0.100000 0.187500 NaN NaN NaN 146.50000 9.000000 22.000000 0.994012 0.008618 0.027534 0.333333 0.025000 0.119048 NaN NaN NaN 0.400000 0.111111 0.160714 39100.000000
max 90.0 2.0 41778.000000 NaN 13.000000 6.500000e+07 3179.500000 6379.000000 5.850000e+07 1733.347609 1674.806452 1.450000e+06 11316.924528 18459.106814 1.600000e+06 1723.068340 65209.916515 322.000000 28.000000 57.000000 117.000000 30.000000 87.000000 527.000000 26.000000 111.000000 783.000000 39.000000 100.000000 NaN NaN NaN 10647.00000 9953.000000 10062.000000 1.979950 0.833108 1.009360 1.000000 1.000000 1.000000 NaN NaN NaN 2.000000 1.666667 1.250000 48060.000000
geo_data_url = 'https://raw.githubusercontent.com/scpike/us-state-county-zip/master/geo-data.csv'
df_geo = pd.read_csv(geo_data_url)
print('Num of rows:' , len(df_geo))
print('Num of columns:' , len(df_geo.columns))
df_geo.head()
Num of rows: 33103
Num of columns: 6
state_fips state state_abbr zipcode county city
0 1 Alabama AL 35004 St. Clair Acmar
1 1 Alabama AL 35005 Jefferson Adamsville
2 1 Alabama AL 35006 Jefferson Adger
3 1 Alabama AL 35007 Shelby Keystone
4 1 Alabama AL 35010 Tallapoosa New site
from IPython.display import Image, display
display(Image('https://www3.erie.gov/dhses/sites/www3.erie.gov.dhses/files/2022-06/psap_map.jpg'))

jpeg

# filter on erie county
df_geo_county = df_geo.loc[(df_geo['state_abbr'] == 'NY') &
                           (df_geo['county'].isin(['Erie']))]

# create function to check if zip code is valid
df_geo_county['valid_zip_code'] = df_geo_county.apply(lambda x: x['zipcode'].isnumeric(), axis =1)
print('Dataset:')
print(df_geo_county.groupby(['valid_zip_code', 'county'])['zipcode'].count())
print(' ')
print('Invalid zip codes:', df_geo_county.loc[df_geo_county['valid_zip_code'] == False]['zipcode'].unique())
Dataset:
valid_zip_code  county
False           Erie       1
True            Erie      61
Name: zipcode, dtype: int64
 
Invalid zip codes: ['142HH']
# filter on only valid zip codes
df_geo_county_valid = df_geo_county.loc[df_geo_county['valid_zip_code'] == True]
print('Num of valid zip codes:', len(df_geo_county_valid))
df_geo_county_valid.head(1)
Num of valid zip codes: 61
state_fips state state_abbr zipcode county city valid_zip_code
20220 36 New york NY 14001 Erie Akron True
# get zip code from region field
df_filter['zipcode'] = df_filter.apply(lambda x: x['region'].split(':')[1].strip(), axis=1)
# merge market data & geo zip codes
df_merge = pd.merge(
    df_filter,
    df_geo_county_valid[['zipcode', 'city', 'county']],
    how = 'inner', # only return zipcodes in both tables
    on = ['zipcode'] # column to join on
)

print('Num of rows:', len(df_merge))
df_merge.head()
Num of rows: 25811
period_begin period_end period_duration region_type region_type_id table_id is_seasonally_adjusted region city_x state state_code property_type property_type_id median_sale_price median_sale_price_mom median_sale_price_yoy median_list_price median_list_price_mom median_list_price_yoy median_ppsf median_ppsf_mom median_ppsf_yoy median_list_ppsf median_list_ppsf_mom median_list_ppsf_yoy homes_sold homes_sold_mom homes_sold_yoy pending_sales pending_sales_mom pending_sales_yoy new_listings new_listings_mom new_listings_yoy inventory inventory_mom inventory_yoy months_of_supply months_of_supply_mom months_of_supply_yoy median_dom median_dom_mom median_dom_yoy avg_sale_to_list avg_sale_to_list_mom avg_sale_to_list_yoy sold_above_list sold_above_list_mom sold_above_list_yoy price_drops price_drops_mom price_drops_yoy off_market_in_two_weeks off_market_in_two_weeks_mom off_market_in_two_weeks_yoy parent_metro_region parent_metro_region_metro_code last_updated zipcode city_y county
0 2021-06-01 2021-08-31 90 zip code 2 5005 f Zip Code: 14043 NaN New York NY All Residential -1 220000.0 0.000000 0.157895 182450.0 -0.039484 0.035471 145.823035 -0.038151 0.200821 136.376569 0.007440 0.101305 73.0 -0.026667 0.280702 38.0 0.407407 0.151515 115.0 0.116505 0.064815 31.0 -0.060606 0.068966 NaN NaN NaN 9.0 1.0 -2.0 1.116532 -0.009422 0.107955 0.794521 -0.032146 0.320836 NaN NaN NaN 0.684211 -0.130604 0.078150 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie
1 2023-01-01 2023-03-31 90 zip code 2 5005 f Zip Code: 14043 NaN New York NY Single Family Residential 6 223900.0 -0.013656 0.066190 199900.0 0.025391 0.057672 153.019024 -0.036131 0.012689 152.173913 0.021311 0.021061 37.0 -0.119048 -0.195652 25.0 1.777778 1.500000 45.0 0.730769 0.363636 9.0 -0.250000 1.250000 NaN NaN NaN 11.0 -4.5 0.0 1.018800 -0.024009 -0.070206 0.378378 -0.097812 -0.317274 NaN NaN NaN 0.840000 -0.160000 -0.160000 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie
2 2017-11-01 2018-01-31 90 zip code 2 5005 f Zip Code: 14043 NaN New York NY All Residential -1 145000.0 -0.003436 0.132812 139900.0 -0.060759 0.076572 102.077364 -0.030680 0.092081 99.483806 0.038098 0.078621 77.0 -0.114943 0.184615 15.0 0.071429 0.875000 46.0 -0.303030 -0.080000 31.0 -0.031250 -0.380000 NaN NaN NaN 21.0 2.0 -25.0 0.991330 -0.000183 0.022958 0.376623 -0.002687 0.068931 NaN NaN NaN 0.333333 0.047619 0.208333 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie
3 2021-07-01 2021-09-30 90 zip code 2 5005 f Zip Code: 14043 NaN New York NY Townhouse 13 220000.0 -0.015660 0.156677 219900.0 -0.011241 0.024220 148.648649 0.264239 0.250711 151.043956 -0.021620 0.126584 5.0 1.500000 -0.285714 2.0 0.000000 -0.500000 10.0 0.428571 0.000000 2.0 1.000000 NaN NaN NaN NaN 8.0 -22.5 -1.0 1.076753 0.060382 0.068323 0.800000 0.300000 0.228571 NaN NaN NaN 0.500000 -0.500000 0.500000 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie
4 2019-07-01 2019-09-30 90 zip code 2 5005 f Zip Code: 14043 NaN New York NY Single Family Residential 6 151700.0 0.000000 0.039041 144900.0 -0.000690 -0.033356 119.293078 -0.036400 0.156225 105.507893 -0.068096 -0.066909 71.0 0.290909 0.126984 14.0 -0.363636 -0.263158 64.0 -0.189873 -0.228916 25.0 0.041667 -0.305556 NaN NaN NaN 11.0 1.0 -6.0 1.024315 0.000577 0.010307 0.619718 -0.016645 -0.015202 NaN NaN NaN 0.357143 -0.097403 -0.063910 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie
# choose a zip code & property type
df_merge.loc[(df_merge['zipcode'] == '14213') &
             (df_merge['property_type'] == 'Single Family Residential')]\
             .sort_values(by=['period_begin']).tail(2)\
             [['period_begin', 'median_dom', 'median_dom_mom', 'median_dom_yoy']]
period_begin median_dom median_dom_mom median_dom_yoy
19944 2023-06-01 14.5 -0.5 4.5
20047 2023-07-01 16.0 1.5 6.5
# test to correct the error
df_test = df_merge.copy()
df_test['median_dom_mom_correction'] = df_test.sort_values(by=['period_begin'])['median_dom'].pct_change()
df_test.loc[(df_test['zipcode'] == '14213') &
            (df_test['property_type'] == 'Single Family Residential')] \
            .sort_values(by=['period_begin']).tail(2)\
            [['period_begin', 'median_dom', 'median_dom_mom', 'median_dom_yoy', 'median_dom_mom_correction']]
period_begin median_dom median_dom_mom median_dom_yoy median_dom_mom_correction
19944 2023-06-01 14.5 -0.5 4.5 -0.500000
20047 2023-07-01 16.0 1.5 6.5 0.333333
# add features
df_features = df_merge.copy()
print('Max begin date:', df_features['period_begin'].max())
print('Max end date:', df_features['period_end'].max())
max_end_date = df_features['period_end'].max()

# flag the most recent date
df_features['latest_period'] = df_features.apply(
    lambda x: True if x['period_end'] == max_end_date else False, axis=1)

# remove Redfin city and keep Geo city
df_features = df_features.drop(columns=['city_x'])
df_features = df_features.rename(columns={'city_y': 'city'})

# fix median days on market
df_features['median_dom_mom'] = df_features.sort_values(by=['period_begin'])['median_dom'].pct_change()

df_features.head()
Max begin date: 2023-07-01
Max end date: 2023-09-30
period_begin period_end period_duration region_type region_type_id table_id is_seasonally_adjusted region state state_code property_type property_type_id median_sale_price median_sale_price_mom median_sale_price_yoy median_list_price median_list_price_mom median_list_price_yoy median_ppsf median_ppsf_mom median_ppsf_yoy median_list_ppsf median_list_ppsf_mom median_list_ppsf_yoy homes_sold homes_sold_mom homes_sold_yoy pending_sales pending_sales_mom pending_sales_yoy new_listings new_listings_mom new_listings_yoy inventory inventory_mom inventory_yoy months_of_supply months_of_supply_mom months_of_supply_yoy median_dom median_dom_mom median_dom_yoy avg_sale_to_list avg_sale_to_list_mom avg_sale_to_list_yoy sold_above_list sold_above_list_mom sold_above_list_yoy price_drops price_drops_mom price_drops_yoy off_market_in_two_weeks off_market_in_two_weeks_mom off_market_in_two_weeks_yoy parent_metro_region parent_metro_region_metro_code last_updated zipcode city county latest_period
0 2021-06-01 2021-08-31 90 zip code 2 5005 f Zip Code: 14043 New York NY All Residential -1 220000.0 0.000000 0.157895 182450.0 -0.039484 0.035471 145.823035 -0.038151 0.200821 136.376569 0.007440 0.101305 73.0 -0.026667 0.280702 38.0 0.407407 0.151515 115.0 0.116505 0.064815 31.0 -0.060606 0.068966 NaN NaN NaN 9.0 -0.280000 -2.0 1.116532 -0.009422 0.107955 0.794521 -0.032146 0.320836 NaN NaN NaN 0.684211 -0.130604 0.078150 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie False
1 2023-01-01 2023-03-31 90 zip code 2 5005 f Zip Code: 14043 New York NY Single Family Residential 6 223900.0 -0.013656 0.066190 199900.0 0.025391 0.057672 153.019024 -0.036131 0.012689 152.173913 0.021311 0.021061 37.0 -0.119048 -0.195652 25.0 1.777778 1.500000 45.0 0.730769 0.363636 9.0 -0.250000 1.250000 NaN NaN NaN 11.0 -0.541667 0.0 1.018800 -0.024009 -0.070206 0.378378 -0.097812 -0.317274 NaN NaN NaN 0.840000 -0.160000 -0.160000 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie False
2 2017-11-01 2018-01-31 90 zip code 2 5005 f Zip Code: 14043 New York NY All Residential -1 145000.0 -0.003436 0.132812 139900.0 -0.060759 0.076572 102.077364 -0.030680 0.092081 99.483806 0.038098 0.078621 77.0 -0.114943 0.184615 15.0 0.071429 0.875000 46.0 -0.303030 -0.080000 31.0 -0.031250 -0.380000 NaN NaN NaN 21.0 -0.688889 -25.0 0.991330 -0.000183 0.022958 0.376623 -0.002687 0.068931 NaN NaN NaN 0.333333 0.047619 0.208333 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie False
3 2021-07-01 2021-09-30 90 zip code 2 5005 f Zip Code: 14043 New York NY Townhouse 13 220000.0 -0.015660 0.156677 219900.0 -0.011241 0.024220 148.648649 0.264239 0.250711 151.043956 -0.021620 0.126584 5.0 1.500000 -0.285714 2.0 0.000000 -0.500000 10.0 0.428571 0.000000 2.0 1.000000 NaN NaN NaN NaN 8.0 -0.619048 -1.0 1.076753 0.060382 0.068323 0.800000 0.300000 0.228571 NaN NaN NaN 0.500000 -0.500000 0.500000 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie False
4 2019-07-01 2019-09-30 90 zip code 2 5005 f Zip Code: 14043 New York NY Single Family Residential 6 151700.0 0.000000 0.039041 144900.0 -0.000690 -0.033356 119.293078 -0.036400 0.156225 105.507893 -0.068096 -0.066909 71.0 0.290909 0.126984 14.0 -0.363636 -0.263158 64.0 -0.189873 -0.228916 25.0 0.041667 -0.305556 NaN NaN NaN 11.0 -0.333333 -6.0 1.024315 0.000577 0.010307 0.619718 -0.016645 -0.015202 NaN NaN NaN 0.357143 -0.097403 -0.063910 Buffalo, NY 15380 2023-10-29 14:25:50 14043 Depew Erie False
# Export new data set
date_str = datetime.today().strftime('%Y-%m')
print('Current year/month:', date_str)
Current year/month: 2023-11
from google.colab import files
# download file
df_features.to_csv('zip_realtor_{0}_{1}.csv'.format(state_code, date_str), index=False)
files.download('zip_realtor_{0}_{1}.csv'.format(state_code, date_str))
<IPython.core.display.Javascript object>



<IPython.core.display.Javascript object>

Tableau Screenshots

No filters all of Erie County

png

Filtered by the City of Buffalo

png

Filtered by Multi-Family in the City of Buffalo

png