Table of Contents
Summary
In this post, we explore real estate market trends in Erie County using Python and Tableau. We leverage various Python libraries such as pandas, numpy, matplotlib, seaborn, and others to analyze and visualize the data. Our primary data source is a Redfin market tracker dataset for zip codes, which we load and process to gain insights into the real estate market.
Data Loading and Exploration:
We start by importing necessary packages and loading the dataset from the Redfin market tracker. The dataset is a comprehensive source of information, and we print the number of rows, columns, and display the first few rows to get a glimpse of the data.
Geospatial Data:
To enhance our analysis, we integrate geospatial data that includes information about states, counties, and zip codes. We use this data to filter and focus on a specific state, in this case, New York. Visualizing the geographical distribution of data adds context to our analysis, and we display an image of Erie County for reference.
Data Cleaning:
To ensure the accuracy of our analysis, we check the validity of zip codes in the geospatial data. Invalid zip codes are identified and reported. We create a cleaned dataset containing only valid zip codes for further analysis.
Merging Datasets:
We merge the Redfin market data with the cleaned geospatial data, combining relevant information such as zip codes, cities, and counties. This merged dataset sets the foundation for our subsequent analysis.
Exploratory Data Analysis (EDA):
We perform exploratory data analysis by selecting a specific zip code and property type. This allows us to analyze trends over time, specifically focusing on metrics such as median days on market. Additionally, we correct an error in the calculation of month-over-month changes in median days on market.
Feature Engineering:
We enhance our dataset by adding features like the most recent date flag and correcting the median days on market based on chronological order. These features contribute to a more comprehensive analysis of real estate market trends.
Conclusion and Download:
In conclusion, we provide a downloadable CSV file containing the processed dataset to further explore the data in Tableau.
Python Code
# import packages
import pandas as pd
import numpy as np
import time
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import os
plt.style.use('seaborn-v0_8-darkgrid')
# warnings ignore
import warnings
# set warnings to ignore
#warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None # default='warn'
pd.set_option('display.max_columns', None)
url = 'https://redfin-public-data.s3.us-west-2.amazonaws.com/redfin_market_tracker/zip_code_market_tracker.tsv000.gz'
# start time to read file
startTime = time.time()
df = pd.read_csv(url, compression='gzip', sep='\t', on_bad_lines='skip')
# end time
executionTime = (time.time() - startTime)
print('Execution time in minutes: ' + str(round(executionTime / 60, 2)))
print('Num of rows:', len(df))
print('Num of cols:', len(df.columns))
df.head()
Execution time in minutes: 3.0
Num of rows: 7476840
Num of cols: 58
|
period_begin |
period_end |
period_duration |
region_type |
region_type_id |
table_id |
is_seasonally_adjusted |
region |
city |
state |
state_code |
property_type |
property_type_id |
median_sale_price |
median_sale_price_mom |
median_sale_price_yoy |
median_list_price |
median_list_price_mom |
median_list_price_yoy |
median_ppsf |
median_ppsf_mom |
median_ppsf_yoy |
median_list_ppsf |
median_list_ppsf_mom |
median_list_ppsf_yoy |
homes_sold |
homes_sold_mom |
homes_sold_yoy |
pending_sales |
pending_sales_mom |
pending_sales_yoy |
new_listings |
new_listings_mom |
new_listings_yoy |
inventory |
inventory_mom |
inventory_yoy |
months_of_supply |
months_of_supply_mom |
months_of_supply_yoy |
median_dom |
median_dom_mom |
median_dom_yoy |
avg_sale_to_list |
avg_sale_to_list_mom |
avg_sale_to_list_yoy |
sold_above_list |
sold_above_list_mom |
sold_above_list_yoy |
price_drops |
price_drops_mom |
price_drops_yoy |
off_market_in_two_weeks |
off_market_in_two_weeks_mom |
off_market_in_two_weeks_yoy |
parent_metro_region |
parent_metro_region_metro_code |
last_updated |
0 |
2017-02-01 |
2017-04-30 |
90 |
zip code |
2 |
2921 |
f |
Zip Code: 08037 |
NaN |
New Jersey |
NJ |
Multi-Family (2-4 Unit) |
4 |
233000.0 |
0.000000 |
NaN |
149900.0 |
0.000000 |
NaN |
104.719101 |
0.000000 |
NaN |
78.693331 |
0.000000 |
NaN |
1.0 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
2.0 |
0.000000 |
NaN |
1.0 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
135.0 |
0.0 |
NaN |
0.913725 |
0.000000 |
NaN |
0.000000 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Atlantic City, NJ |
12100 |
2023-10-29 14:25:50 |
1 |
2016-12-01 |
2017-02-28 |
90 |
zip code |
2 |
36342 |
f |
Zip Code: 85303 |
NaN |
Arizona |
AZ |
All Residential |
-1 |
182750.0 |
0.065598 |
0.068713 |
189900.0 |
0.026486 |
0.085143 |
110.310638 |
0.002052 |
0.089112 |
117.628681 |
0.005186 |
0.124287 |
68.0 |
-0.190476 |
-0.252747 |
35.0 |
0.346154 |
0.166667 |
107.0 |
0.000000 |
0.019048 |
82.0 |
-0.136842 |
0.138889 |
NaN |
NaN |
NaN |
53.0 |
8.0 |
12.0 |
0.983675 |
-0.004671 |
0.004752 |
0.205882 |
-0.044118 |
0.030058 |
NaN |
NaN |
NaN |
0.257143 |
0.026374 |
-0.109524 |
Phoenix, AZ |
38060 |
2023-10-29 14:25:50 |
2 |
2018-07-01 |
2018-09-30 |
90 |
zip code |
2 |
5832 |
f |
Zip Code: 15644 |
NaN |
Pennsylvania |
PA |
Single Family Residential |
6 |
116500.0 |
0.030973 |
0.165583 |
99450.0 |
-0.074022 |
0.050158 |
106.165882 |
0.099922 |
0.059423 |
86.666579 |
-0.013773 |
-0.115633 |
58.0 |
-0.016949 |
-0.033333 |
10.0 |
-0.285714 |
-0.285714 |
61.0 |
-0.217949 |
-0.298851 |
104.0 |
-0.009524 |
-0.111111 |
NaN |
NaN |
NaN |
74.0 |
-16.0 |
-13.5 |
0.959806 |
-0.005870 |
0.001646 |
0.103448 |
-0.082992 |
-0.096552 |
NaN |
NaN |
NaN |
0.000000 |
-0.285714 |
-0.071429 |
Pittsburgh, PA |
38300 |
2023-10-29 14:25:50 |
3 |
2014-09-01 |
2014-11-30 |
90 |
zip code |
2 |
17223 |
f |
Zip Code: 40507 |
NaN |
Kentucky |
KY |
Condo/Co-op |
3 |
238000.0 |
0.012766 |
-0.004184 |
196950.0 |
-0.083527 |
-0.218452 |
188.000000 |
0.026643 |
0.031127 |
194.026540 |
0.040589 |
-0.002149 |
9.0 |
-0.181818 |
0.000000 |
NaN |
NaN |
NaN |
6.0 |
-0.454545 |
1.000000 |
19.0 |
-0.173913 |
-0.441176 |
NaN |
NaN |
NaN |
104.5 |
-21.5 |
-68.5 |
0.979014 |
0.017979 |
0.034035 |
0.111111 |
0.111111 |
0.111111 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Lexington, KY |
30460 |
2023-10-29 14:25:50 |
4 |
2021-02-01 |
2021-04-30 |
90 |
zip code |
2 |
36264 |
f |
Zip Code: 85212 |
NaN |
Arizona |
AZ |
Single Family Residential |
6 |
454500.0 |
0.041822 |
0.218662 |
451275.0 |
0.005067 |
0.236370 |
209.806157 |
0.039224 |
0.242289 |
212.842720 |
0.013018 |
0.225578 |
265.0 |
0.019231 |
0.031128 |
92.0 |
-0.041667 |
0.150000 |
294.0 |
0.069091 |
-0.023256 |
68.0 |
0.096774 |
-0.595238 |
NaN |
NaN |
NaN |
12.0 |
-6.0 |
-27.0 |
1.025972 |
0.007849 |
0.031341 |
0.630189 |
0.084035 |
0.392835 |
NaN |
NaN |
NaN |
0.630435 |
0.026268 |
0.242935 |
Phoenix, AZ |
38060 |
2023-10-29 14:25:50 |
# filter on state code
state_code = 'NY'
df_filter =df.loc[df['state_code'] == state_code]
print('Num of rows:', len(df_filter))
df_filter.head()
|
period_begin |
period_end |
period_duration |
region_type |
region_type_id |
table_id |
is_seasonally_adjusted |
region |
city |
state |
state_code |
property_type |
property_type_id |
median_sale_price |
median_sale_price_mom |
median_sale_price_yoy |
median_list_price |
median_list_price_mom |
median_list_price_yoy |
median_ppsf |
median_ppsf_mom |
median_ppsf_yoy |
median_list_ppsf |
median_list_ppsf_mom |
median_list_ppsf_yoy |
homes_sold |
homes_sold_mom |
homes_sold_yoy |
pending_sales |
pending_sales_mom |
pending_sales_yoy |
new_listings |
new_listings_mom |
new_listings_yoy |
inventory |
inventory_mom |
inventory_yoy |
months_of_supply |
months_of_supply_mom |
months_of_supply_yoy |
median_dom |
median_dom_mom |
median_dom_yoy |
avg_sale_to_list |
avg_sale_to_list_mom |
avg_sale_to_list_yoy |
sold_above_list |
sold_above_list_mom |
sold_above_list_yoy |
price_drops |
price_drops_mom |
price_drops_yoy |
off_market_in_two_weeks |
off_market_in_two_weeks_mom |
off_market_in_two_weeks_yoy |
parent_metro_region |
parent_metro_region_metro_code |
last_updated |
21 |
2018-08-01 |
2018-10-31 |
90 |
zip code |
2 |
3452 |
f |
Zip Code: 10509 |
NaN |
New York |
NY |
Townhouse |
13 |
250000.0 |
0.000000 |
NaN |
314000.0 |
0.000000 |
NaN |
156.250000 |
0.000000 |
NaN |
179.985868 |
0.000000 |
NaN |
1.0 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
2.0 |
0.000000 |
NaN |
2.0 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
139.0 |
0.0 |
NaN |
0.980392 |
0.000000 |
NaN |
0.000000 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
New York, NY |
35614 |
2023-10-29 14:25:50 |
61 |
2016-07-01 |
2016-09-30 |
90 |
zip code |
2 |
4280 |
f |
Zip Code: 12547 |
NaN |
New York |
NY |
Multi-Family (2-4 Unit) |
4 |
155000.0 |
0.000000 |
NaN |
619949.5 |
1.584200 |
NaN |
184.523810 |
0.000000 |
NaN |
134.106454 |
0.233176 |
NaN |
1.0 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
2.0 |
1.000000 |
NaN |
3.0 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
154.0 |
0.0 |
NaN |
0.837838 |
0.000000 |
NaN |
0.000000 |
0.000000 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Kingston, NY |
28740 |
2023-10-29 14:25:50 |
70 |
2012-08-01 |
2012-10-31 |
90 |
zip code |
2 |
3790 |
f |
Zip Code: 11572 |
NaN |
New York |
NY |
Single Family Residential |
6 |
396250.0 |
0.031901 |
-0.016749 |
399499.5 |
-0.110246 |
-0.076300 |
231.925656 |
-0.021452 |
-0.022505 |
238.468989 |
-0.029403 |
-0.069139 |
50.0 |
-0.019608 |
0.111111 |
5.0 |
-0.545455 |
-0.375000 |
70.0 |
-0.135802 |
0.129032 |
142.0 |
0.028986 |
-0.027397 |
NaN |
NaN |
NaN |
104.5 |
-0.5 |
20.5 |
0.945461 |
0.000605 |
0.001794 |
0.040000 |
0.000784 |
-0.071111 |
NaN |
NaN |
NaN |
0.00 |
-0.090909 |
0.000000 |
Nassau County, NY |
35004 |
2023-10-29 14:25:50 |
77 |
2019-05-01 |
2019-07-31 |
90 |
zip code |
2 |
4440 |
f |
Zip Code: 12878 |
NaN |
New York |
NY |
Single Family Residential |
6 |
158000.0 |
-0.363417 |
0.166052 |
164900.0 |
0.157193 |
-0.586717 |
107.513499 |
0.020027 |
-0.997177 |
93.163842 |
-0.035048 |
-0.512194 |
4.0 |
1.000000 |
1.000000 |
1.0 |
NaN |
NaN |
5.0 |
1.500000 |
0.250000 |
5.0 |
0.000000 |
-0.166667 |
NaN |
NaN |
NaN |
230.0 |
-74.0 |
-200.0 |
0.917599 |
-0.049489 |
-0.100128 |
0.000000 |
0.000000 |
-0.500000 |
NaN |
NaN |
NaN |
0.00 |
NaN |
NaN |
Glens Falls, NY |
24020 |
2023-10-29 14:25:50 |
86 |
2021-09-01 |
2021-11-30 |
90 |
zip code |
2 |
3818 |
f |
Zip Code: 11706 |
NaN |
New York |
NY |
Single Family Residential |
6 |
465000.0 |
0.015284 |
0.134146 |
450000.0 |
-0.020664 |
0.058824 |
302.572385 |
-0.032961 |
0.248304 |
319.148936 |
0.022469 |
0.153220 |
122.0 |
0.079646 |
-0.122302 |
25.0 |
-0.264706 |
-0.074074 |
105.0 |
-0.062500 |
-0.355828 |
63.0 |
-0.030769 |
-0.343750 |
NaN |
NaN |
NaN |
21.0 |
0.0 |
-3.0 |
1.040465 |
0.002822 |
0.036925 |
0.737705 |
0.029740 |
0.226914 |
NaN |
NaN |
NaN |
0.44 |
0.087059 |
0.291852 |
Nassau County, NY |
35004 |
2023-10-29 14:25:50 |
|
period_duration |
region_type_id |
table_id |
city |
property_type_id |
median_sale_price |
median_sale_price_mom |
median_sale_price_yoy |
median_list_price |
median_list_price_mom |
median_list_price_yoy |
median_ppsf |
median_ppsf_mom |
median_ppsf_yoy |
median_list_ppsf |
median_list_ppsf_mom |
median_list_ppsf_yoy |
homes_sold |
homes_sold_mom |
homes_sold_yoy |
pending_sales |
pending_sales_mom |
pending_sales_yoy |
new_listings |
new_listings_mom |
new_listings_yoy |
inventory |
inventory_mom |
inventory_yoy |
months_of_supply |
months_of_supply_mom |
months_of_supply_yoy |
median_dom |
median_dom_mom |
median_dom_yoy |
avg_sale_to_list |
avg_sale_to_list_mom |
avg_sale_to_list_yoy |
sold_above_list |
sold_above_list_mom |
sold_above_list_yoy |
price_drops |
price_drops_mom |
price_drops_yoy |
off_market_in_two_weeks |
off_market_in_two_weeks_mom |
off_market_in_two_weeks_yoy |
parent_metro_region_metro_code |
count |
483632.0 |
483632.0 |
483632.000000 |
0.0 |
483632.000000 |
4.836310e+05 |
458476.000000 |
411219.000000 |
4.411990e+05 |
416001.000000 |
381840.000000 |
4.725180e+05 |
447056.000000 |
400187.000000 |
4.349210e+05 |
409627.000000 |
375851.000000 |
483632.000000 |
458478.000000 |
411221.000000 |
307990.000000 |
249870.000000 |
236135.000000 |
441428.000000 |
416233.000000 |
382079.000000 |
447182.000000 |
423817.000000 |
388225.000000 |
0.0 |
0.0 |
0.0 |
481521.00000 |
456316.000000 |
409097.000000 |
481160.000000 |
455978.000000 |
408895.000000 |
483632.000000 |
458478.000000 |
411221.000000 |
0.0 |
0.0 |
0.0 |
307990.000000 |
249870.000000 |
236135.000000 |
483632.000000 |
mean |
90.0 |
2.0 |
4344.397916 |
NaN |
3.440525 |
4.171076e+05 |
0.069758 |
0.218782 |
4.482294e+05 |
0.045579 |
0.146212 |
2.442654e+02 |
0.135518 |
0.457926 |
2.669602e+02 |
0.053183 |
1.116265 |
18.299653 |
0.074915 |
0.243026 |
6.713721 |
0.257825 |
0.368467 |
25.987049 |
0.068378 |
0.176109 |
32.909323 |
0.031351 |
0.062185 |
NaN |
NaN |
NaN |
124.88643 |
-0.807652 |
-10.342834 |
0.964606 |
0.000505 |
0.005243 |
0.213572 |
0.002272 |
0.025362 |
NaN |
NaN |
NaN |
0.235077 |
0.000291 |
0.026445 |
31491.533658 |
std |
0.0 |
0.0 |
1548.475951 |
NaN |
3.950440 |
9.143662e+05 |
7.179519 |
14.369080 |
9.020977e+05 |
2.910467 |
5.015077 |
4.023996e+03 |
24.710078 |
62.409171 |
4.333815e+03 |
5.245511 |
231.303425 |
27.660098 |
0.457406 |
1.051684 |
8.744142 |
1.000105 |
1.334226 |
36.932721 |
0.467056 |
0.962321 |
52.466888 |
0.350431 |
0.930395 |
NaN |
NaN |
NaN |
215.24094 |
137.166799 |
223.626941 |
0.060563 |
0.034060 |
0.062259 |
0.256130 |
0.144508 |
0.260508 |
NaN |
NaN |
NaN |
0.307035 |
0.325926 |
0.327081 |
10856.871769 |
min |
90.0 |
2.0 |
1.000000 |
NaN |
-1.000000 |
1.000000e+00 |
-0.999454 |
-0.999974 |
1.230000e+02 |
-0.999381 |
-0.996364 |
2.666667e-04 |
-0.999640 |
-0.999986 |
1.500000e-03 |
-0.999947 |
-0.999612 |
1.000000 |
-0.900000 |
-0.967742 |
1.000000 |
-0.968750 |
-0.969697 |
1.000000 |
-0.937500 |
-0.969697 |
1.000000 |
-0.937500 |
-0.982759 |
NaN |
NaN |
NaN |
1.00000 |
-9536.000000 |
-9961.000000 |
0.500000 |
-0.902985 |
-1.158557 |
0.000000 |
-1.000000 |
-1.000000 |
NaN |
NaN |
NaN |
0.000000 |
-2.000000 |
-1.666667 |
10580.000000 |
25% |
90.0 |
2.0 |
3759.000000 |
NaN |
-1.000000 |
1.310000e+05 |
-0.044776 |
-0.095863 |
1.424000e+05 |
-0.049774 |
-0.081622 |
8.241040e+01 |
-0.036262 |
-0.074294 |
9.030120e+01 |
-0.038677 |
-0.060868 |
2.000000 |
-0.142857 |
-0.250000 |
1.000000 |
-0.333333 |
-0.320000 |
4.000000 |
-0.168224 |
-0.250000 |
5.000000 |
-0.111111 |
-0.310345 |
NaN |
NaN |
NaN |
43.00000 |
-9.500000 |
-41.000000 |
0.941320 |
-0.007863 |
-0.017047 |
0.000000 |
-0.020202 |
-0.052632 |
NaN |
NaN |
NaN |
0.000000 |
-0.111888 |
-0.079365 |
24100.000000 |
50% |
90.0 |
2.0 |
4213.000000 |
NaN |
4.000000 |
2.275000e+05 |
0.000000 |
0.053459 |
2.490000e+05 |
0.000000 |
0.044413 |
1.354059e+02 |
0.000000 |
0.051813 |
1.460769e+02 |
0.000000 |
0.044410 |
7.000000 |
0.000000 |
0.000000 |
3.000000 |
0.000000 |
0.000000 |
11.000000 |
0.000000 |
0.000000 |
14.000000 |
0.000000 |
-0.076923 |
NaN |
NaN |
NaN |
86.00000 |
0.000000 |
-6.000000 |
0.968767 |
0.000000 |
0.004262 |
0.138889 |
0.000000 |
0.000000 |
NaN |
NaN |
NaN |
0.090909 |
0.000000 |
0.000000 |
35614.000000 |
75% |
90.0 |
2.0 |
4853.000000 |
NaN |
6.000000 |
4.600000e+05 |
0.055650 |
0.226415 |
4.999000e+05 |
0.058881 |
0.188413 |
2.661241e+02 |
0.047213 |
0.199768 |
2.911184e+02 |
0.047801 |
0.165407 |
22.000000 |
0.166667 |
0.400000 |
8.000000 |
0.500000 |
0.625000 |
32.000000 |
0.200000 |
0.322581 |
38.000000 |
0.100000 |
0.187500 |
NaN |
NaN |
NaN |
146.50000 |
9.000000 |
22.000000 |
0.994012 |
0.008618 |
0.027534 |
0.333333 |
0.025000 |
0.119048 |
NaN |
NaN |
NaN |
0.400000 |
0.111111 |
0.160714 |
39100.000000 |
max |
90.0 |
2.0 |
41778.000000 |
NaN |
13.000000 |
6.500000e+07 |
3179.500000 |
6379.000000 |
5.850000e+07 |
1733.347609 |
1674.806452 |
1.450000e+06 |
11316.924528 |
18459.106814 |
1.600000e+06 |
1723.068340 |
65209.916515 |
322.000000 |
28.000000 |
57.000000 |
117.000000 |
30.000000 |
87.000000 |
527.000000 |
26.000000 |
111.000000 |
783.000000 |
39.000000 |
100.000000 |
NaN |
NaN |
NaN |
10647.00000 |
9953.000000 |
10062.000000 |
1.979950 |
0.833108 |
1.009360 |
1.000000 |
1.000000 |
1.000000 |
NaN |
NaN |
NaN |
2.000000 |
1.666667 |
1.250000 |
48060.000000 |
geo_data_url = 'https://raw.githubusercontent.com/scpike/us-state-county-zip/master/geo-data.csv'
df_geo = pd.read_csv(geo_data_url)
print('Num of rows:' , len(df_geo))
print('Num of columns:' , len(df_geo.columns))
df_geo.head()
Num of rows: 33103
Num of columns: 6
|
state_fips |
state |
state_abbr |
zipcode |
county |
city |
0 |
1 |
Alabama |
AL |
35004 |
St. Clair |
Acmar |
1 |
1 |
Alabama |
AL |
35005 |
Jefferson |
Adamsville |
2 |
1 |
Alabama |
AL |
35006 |
Jefferson |
Adger |
3 |
1 |
Alabama |
AL |
35007 |
Shelby |
Keystone |
4 |
1 |
Alabama |
AL |
35010 |
Tallapoosa |
New site |
from IPython.display import Image, display
display(Image('https://www3.erie.gov/dhses/sites/www3.erie.gov.dhses/files/2022-06/psap_map.jpg'))

# filter on erie county
df_geo_county = df_geo.loc[(df_geo['state_abbr'] == 'NY') &
(df_geo['county'].isin(['Erie']))]
# create function to check if zip code is valid
df_geo_county['valid_zip_code'] = df_geo_county.apply(lambda x: x['zipcode'].isnumeric(), axis =1)
print('Dataset:')
print(df_geo_county.groupby(['valid_zip_code', 'county'])['zipcode'].count())
print(' ')
print('Invalid zip codes:', df_geo_county.loc[df_geo_county['valid_zip_code'] == False]['zipcode'].unique())
Dataset:
valid_zip_code county
False Erie 1
True Erie 61
Name: zipcode, dtype: int64
Invalid zip codes: ['142HH']
# filter on only valid zip codes
df_geo_county_valid = df_geo_county.loc[df_geo_county['valid_zip_code'] == True]
print('Num of valid zip codes:', len(df_geo_county_valid))
df_geo_county_valid.head(1)
Num of valid zip codes: 61
|
state_fips |
state |
state_abbr |
zipcode |
county |
city |
valid_zip_code |
20220 |
36 |
New york |
NY |
14001 |
Erie |
Akron |
True |
# get zip code from region field
df_filter['zipcode'] = df_filter.apply(lambda x: x['region'].split(':')[1].strip(), axis=1)
# merge market data & geo zip codes
df_merge = pd.merge(
df_filter,
df_geo_county_valid[['zipcode', 'city', 'county']],
how = 'inner', # only return zipcodes in both tables
on = ['zipcode'] # column to join on
)
print('Num of rows:', len(df_merge))
df_merge.head()
|
period_begin |
period_end |
period_duration |
region_type |
region_type_id |
table_id |
is_seasonally_adjusted |
region |
city_x |
state |
state_code |
property_type |
property_type_id |
median_sale_price |
median_sale_price_mom |
median_sale_price_yoy |
median_list_price |
median_list_price_mom |
median_list_price_yoy |
median_ppsf |
median_ppsf_mom |
median_ppsf_yoy |
median_list_ppsf |
median_list_ppsf_mom |
median_list_ppsf_yoy |
homes_sold |
homes_sold_mom |
homes_sold_yoy |
pending_sales |
pending_sales_mom |
pending_sales_yoy |
new_listings |
new_listings_mom |
new_listings_yoy |
inventory |
inventory_mom |
inventory_yoy |
months_of_supply |
months_of_supply_mom |
months_of_supply_yoy |
median_dom |
median_dom_mom |
median_dom_yoy |
avg_sale_to_list |
avg_sale_to_list_mom |
avg_sale_to_list_yoy |
sold_above_list |
sold_above_list_mom |
sold_above_list_yoy |
price_drops |
price_drops_mom |
price_drops_yoy |
off_market_in_two_weeks |
off_market_in_two_weeks_mom |
off_market_in_two_weeks_yoy |
parent_metro_region |
parent_metro_region_metro_code |
last_updated |
zipcode |
city_y |
county |
0 |
2021-06-01 |
2021-08-31 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
NaN |
New York |
NY |
All Residential |
-1 |
220000.0 |
0.000000 |
0.157895 |
182450.0 |
-0.039484 |
0.035471 |
145.823035 |
-0.038151 |
0.200821 |
136.376569 |
0.007440 |
0.101305 |
73.0 |
-0.026667 |
0.280702 |
38.0 |
0.407407 |
0.151515 |
115.0 |
0.116505 |
0.064815 |
31.0 |
-0.060606 |
0.068966 |
NaN |
NaN |
NaN |
9.0 |
1.0 |
-2.0 |
1.116532 |
-0.009422 |
0.107955 |
0.794521 |
-0.032146 |
0.320836 |
NaN |
NaN |
NaN |
0.684211 |
-0.130604 |
0.078150 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
1 |
2023-01-01 |
2023-03-31 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
NaN |
New York |
NY |
Single Family Residential |
6 |
223900.0 |
-0.013656 |
0.066190 |
199900.0 |
0.025391 |
0.057672 |
153.019024 |
-0.036131 |
0.012689 |
152.173913 |
0.021311 |
0.021061 |
37.0 |
-0.119048 |
-0.195652 |
25.0 |
1.777778 |
1.500000 |
45.0 |
0.730769 |
0.363636 |
9.0 |
-0.250000 |
1.250000 |
NaN |
NaN |
NaN |
11.0 |
-4.5 |
0.0 |
1.018800 |
-0.024009 |
-0.070206 |
0.378378 |
-0.097812 |
-0.317274 |
NaN |
NaN |
NaN |
0.840000 |
-0.160000 |
-0.160000 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
2 |
2017-11-01 |
2018-01-31 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
NaN |
New York |
NY |
All Residential |
-1 |
145000.0 |
-0.003436 |
0.132812 |
139900.0 |
-0.060759 |
0.076572 |
102.077364 |
-0.030680 |
0.092081 |
99.483806 |
0.038098 |
0.078621 |
77.0 |
-0.114943 |
0.184615 |
15.0 |
0.071429 |
0.875000 |
46.0 |
-0.303030 |
-0.080000 |
31.0 |
-0.031250 |
-0.380000 |
NaN |
NaN |
NaN |
21.0 |
2.0 |
-25.0 |
0.991330 |
-0.000183 |
0.022958 |
0.376623 |
-0.002687 |
0.068931 |
NaN |
NaN |
NaN |
0.333333 |
0.047619 |
0.208333 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
3 |
2021-07-01 |
2021-09-30 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
NaN |
New York |
NY |
Townhouse |
13 |
220000.0 |
-0.015660 |
0.156677 |
219900.0 |
-0.011241 |
0.024220 |
148.648649 |
0.264239 |
0.250711 |
151.043956 |
-0.021620 |
0.126584 |
5.0 |
1.500000 |
-0.285714 |
2.0 |
0.000000 |
-0.500000 |
10.0 |
0.428571 |
0.000000 |
2.0 |
1.000000 |
NaN |
NaN |
NaN |
NaN |
8.0 |
-22.5 |
-1.0 |
1.076753 |
0.060382 |
0.068323 |
0.800000 |
0.300000 |
0.228571 |
NaN |
NaN |
NaN |
0.500000 |
-0.500000 |
0.500000 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
4 |
2019-07-01 |
2019-09-30 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
NaN |
New York |
NY |
Single Family Residential |
6 |
151700.0 |
0.000000 |
0.039041 |
144900.0 |
-0.000690 |
-0.033356 |
119.293078 |
-0.036400 |
0.156225 |
105.507893 |
-0.068096 |
-0.066909 |
71.0 |
0.290909 |
0.126984 |
14.0 |
-0.363636 |
-0.263158 |
64.0 |
-0.189873 |
-0.228916 |
25.0 |
0.041667 |
-0.305556 |
NaN |
NaN |
NaN |
11.0 |
1.0 |
-6.0 |
1.024315 |
0.000577 |
0.010307 |
0.619718 |
-0.016645 |
-0.015202 |
NaN |
NaN |
NaN |
0.357143 |
-0.097403 |
-0.063910 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
# choose a zip code & property type
df_merge.loc[(df_merge['zipcode'] == '14213') &
(df_merge['property_type'] == 'Single Family Residential')]\
.sort_values(by=['period_begin']).tail(2)\
[['period_begin', 'median_dom', 'median_dom_mom', 'median_dom_yoy']]
|
period_begin |
median_dom |
median_dom_mom |
median_dom_yoy |
19944 |
2023-06-01 |
14.5 |
-0.5 |
4.5 |
20047 |
2023-07-01 |
16.0 |
1.5 |
6.5 |
# test to correct the error
df_test = df_merge.copy()
df_test['median_dom_mom_correction'] = df_test.sort_values(by=['period_begin'])['median_dom'].pct_change()
df_test.loc[(df_test['zipcode'] == '14213') &
(df_test['property_type'] == 'Single Family Residential')] \
.sort_values(by=['period_begin']).tail(2)\
[['period_begin', 'median_dom', 'median_dom_mom', 'median_dom_yoy', 'median_dom_mom_correction']]
|
period_begin |
median_dom |
median_dom_mom |
median_dom_yoy |
median_dom_mom_correction |
19944 |
2023-06-01 |
14.5 |
-0.5 |
4.5 |
-0.500000 |
20047 |
2023-07-01 |
16.0 |
1.5 |
6.5 |
0.333333 |
# add features
df_features = df_merge.copy()
print('Max begin date:', df_features['period_begin'].max())
print('Max end date:', df_features['period_end'].max())
max_end_date = df_features['period_end'].max()
# flag the most recent date
df_features['latest_period'] = df_features.apply(
lambda x: True if x['period_end'] == max_end_date else False, axis=1)
# remove Redfin city and keep Geo city
df_features = df_features.drop(columns=['city_x'])
df_features = df_features.rename(columns={'city_y': 'city'})
# fix median days on market
df_features['median_dom_mom'] = df_features.sort_values(by=['period_begin'])['median_dom'].pct_change()
df_features.head()
Max begin date: 2023-07-01
Max end date: 2023-09-30
|
period_begin |
period_end |
period_duration |
region_type |
region_type_id |
table_id |
is_seasonally_adjusted |
region |
state |
state_code |
property_type |
property_type_id |
median_sale_price |
median_sale_price_mom |
median_sale_price_yoy |
median_list_price |
median_list_price_mom |
median_list_price_yoy |
median_ppsf |
median_ppsf_mom |
median_ppsf_yoy |
median_list_ppsf |
median_list_ppsf_mom |
median_list_ppsf_yoy |
homes_sold |
homes_sold_mom |
homes_sold_yoy |
pending_sales |
pending_sales_mom |
pending_sales_yoy |
new_listings |
new_listings_mom |
new_listings_yoy |
inventory |
inventory_mom |
inventory_yoy |
months_of_supply |
months_of_supply_mom |
months_of_supply_yoy |
median_dom |
median_dom_mom |
median_dom_yoy |
avg_sale_to_list |
avg_sale_to_list_mom |
avg_sale_to_list_yoy |
sold_above_list |
sold_above_list_mom |
sold_above_list_yoy |
price_drops |
price_drops_mom |
price_drops_yoy |
off_market_in_two_weeks |
off_market_in_two_weeks_mom |
off_market_in_two_weeks_yoy |
parent_metro_region |
parent_metro_region_metro_code |
last_updated |
zipcode |
city |
county |
latest_period |
0 |
2021-06-01 |
2021-08-31 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
New York |
NY |
All Residential |
-1 |
220000.0 |
0.000000 |
0.157895 |
182450.0 |
-0.039484 |
0.035471 |
145.823035 |
-0.038151 |
0.200821 |
136.376569 |
0.007440 |
0.101305 |
73.0 |
-0.026667 |
0.280702 |
38.0 |
0.407407 |
0.151515 |
115.0 |
0.116505 |
0.064815 |
31.0 |
-0.060606 |
0.068966 |
NaN |
NaN |
NaN |
9.0 |
-0.280000 |
-2.0 |
1.116532 |
-0.009422 |
0.107955 |
0.794521 |
-0.032146 |
0.320836 |
NaN |
NaN |
NaN |
0.684211 |
-0.130604 |
0.078150 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
False |
1 |
2023-01-01 |
2023-03-31 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
New York |
NY |
Single Family Residential |
6 |
223900.0 |
-0.013656 |
0.066190 |
199900.0 |
0.025391 |
0.057672 |
153.019024 |
-0.036131 |
0.012689 |
152.173913 |
0.021311 |
0.021061 |
37.0 |
-0.119048 |
-0.195652 |
25.0 |
1.777778 |
1.500000 |
45.0 |
0.730769 |
0.363636 |
9.0 |
-0.250000 |
1.250000 |
NaN |
NaN |
NaN |
11.0 |
-0.541667 |
0.0 |
1.018800 |
-0.024009 |
-0.070206 |
0.378378 |
-0.097812 |
-0.317274 |
NaN |
NaN |
NaN |
0.840000 |
-0.160000 |
-0.160000 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
False |
2 |
2017-11-01 |
2018-01-31 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
New York |
NY |
All Residential |
-1 |
145000.0 |
-0.003436 |
0.132812 |
139900.0 |
-0.060759 |
0.076572 |
102.077364 |
-0.030680 |
0.092081 |
99.483806 |
0.038098 |
0.078621 |
77.0 |
-0.114943 |
0.184615 |
15.0 |
0.071429 |
0.875000 |
46.0 |
-0.303030 |
-0.080000 |
31.0 |
-0.031250 |
-0.380000 |
NaN |
NaN |
NaN |
21.0 |
-0.688889 |
-25.0 |
0.991330 |
-0.000183 |
0.022958 |
0.376623 |
-0.002687 |
0.068931 |
NaN |
NaN |
NaN |
0.333333 |
0.047619 |
0.208333 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
False |
3 |
2021-07-01 |
2021-09-30 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
New York |
NY |
Townhouse |
13 |
220000.0 |
-0.015660 |
0.156677 |
219900.0 |
-0.011241 |
0.024220 |
148.648649 |
0.264239 |
0.250711 |
151.043956 |
-0.021620 |
0.126584 |
5.0 |
1.500000 |
-0.285714 |
2.0 |
0.000000 |
-0.500000 |
10.0 |
0.428571 |
0.000000 |
2.0 |
1.000000 |
NaN |
NaN |
NaN |
NaN |
8.0 |
-0.619048 |
-1.0 |
1.076753 |
0.060382 |
0.068323 |
0.800000 |
0.300000 |
0.228571 |
NaN |
NaN |
NaN |
0.500000 |
-0.500000 |
0.500000 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
False |
4 |
2019-07-01 |
2019-09-30 |
90 |
zip code |
2 |
5005 |
f |
Zip Code: 14043 |
New York |
NY |
Single Family Residential |
6 |
151700.0 |
0.000000 |
0.039041 |
144900.0 |
-0.000690 |
-0.033356 |
119.293078 |
-0.036400 |
0.156225 |
105.507893 |
-0.068096 |
-0.066909 |
71.0 |
0.290909 |
0.126984 |
14.0 |
-0.363636 |
-0.263158 |
64.0 |
-0.189873 |
-0.228916 |
25.0 |
0.041667 |
-0.305556 |
NaN |
NaN |
NaN |
11.0 |
-0.333333 |
-6.0 |
1.024315 |
0.000577 |
0.010307 |
0.619718 |
-0.016645 |
-0.015202 |
NaN |
NaN |
NaN |
0.357143 |
-0.097403 |
-0.063910 |
Buffalo, NY |
15380 |
2023-10-29 14:25:50 |
14043 |
Depew |
Erie |
False |
date_str = datetime.today().strftime('%Y-%m')
print('Current year/month:', date_str)
Current year/month: 2023-11
from google.colab import files
# download file
df_features.to_csv('zip_realtor_{0}_{1}.csv'.format(state_code, date_str), index=False)
files.download('zip_realtor_{0}_{1}.csv'.format(state_code, date_str))
<IPython.core.display.Javascript object>
<IPython.core.display.Javascript object>
Tableau Screenshots
No filters all of Erie County

Filtered by the City of Buffalo

Filtered by Multi-Family in the City of Buffalo
