Improving Neural Network Models for House Price Prediction
One of the key steps in building an effective neural network model for predicting house prices is visualizing the distribution of the target variable, price, as it often contains outliers.
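As a minimal sketch of that visualization step, the snippet below draws a histogram and a box plot of the target. It assumes matplotlib is available; the helper name plot_price_distribution and the synthetic toy_prices series are hypothetical and only stand in for your own df['price'] column.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_price_distribution(prices, bins=50):
    """Draw a histogram and a box plot of the target side by side."""
    values = np.asarray(prices)
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    # Histogram: a long right tail suggests a few very expensive houses
    ax_hist.hist(values, bins=bins)
    ax_hist.set_xlabel("price")
    ax_hist.set_ylabel("number of houses")
    # Box plot: points beyond the whiskers are candidate outliers
    ax_box.boxplot(values)
    ax_box.set_ylabel("price")
    fig.tight_layout()
    return fig


# Synthetic, right-skewed prices used only for illustration (not real data)
rng = np.random.default_rng(0)
toy_prices = pd.Series(rng.lognormal(mean=13, sigma=0.5, size=1000), name="price")

fig = plot_price_distribution(toy_prices)
# plt.show()  # uncomment in an interactive session
```

With real data, you would call plot_price_distribution(df['price']) instead of passing the toy series.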
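Extreme prices can dominate the training loss, so removing them often improves model accuracy, although the step is optional. To remove outliers from the dataset, filter on the Z-score of the price column: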
# Outlier removal technique (there are many, but this one is my favorite)
# OPTIONAL: you don't have to remove the outliers, but you can
import numpy as np
from scipy import stats

# Calculate absolute Z-scores for the 'price' column
df['price_z'] = np.abs(stats.zscore(df['price']))

# Keep only rows whose Z-score is at most 3
df = df[df['price_z'] <= 3]

# Drop the helper 'price_z' column after filtering
df = df.drop(columns=['price_z'])
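To see what the Z-score filter does, here is a small self-contained sketch on made-up data. The function name remove_price_outliers and the toy DataFrame are assumptions for illustration only; the filter itself is the same rule as above (keep rows whose absolute Z-score is at most 3).

```python
import numpy as np
import pandas as pd
from scipy import stats


def remove_price_outliers(df, column="price", threshold=3.0):
    """Return a copy of df without rows whose |Z-score| exceeds threshold."""
    z = np.abs(stats.zscore(df[column].to_numpy()))
    return df[z <= threshold].copy()


# Made-up prices around 500,000 with one injected extreme value
rng = np.random.default_rng(42)
toy = pd.DataFrame({"price": rng.normal(500_000, 50_000, size=200)})
toy.loc[0, "price"] = 5_000_000

cleaned = remove_price_outliers(toy)
print(len(toy), "->", len(cleaned))  # 200 -> 199
```

Note that the Z-score uses the mean and standard deviation of the whole column, so a handful of very large values inflates the standard deviation and can mask milder outliers. Inspecting the distribution before and after filtering helps confirm the threshold is doing what you expect.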
To prepare the data for neural network training, scale the numeric data and encode any categorical variables with the following steps:
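from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler

# Initialize the column transformer: scale the listed numeric columns
# to the [0, 1] range and pass all other columns through unchanged
transformer = make_column_transformer(
    (MinMaxScaler(), ['sqft_living', 'sqft_lot', 'sqft_above', 'sqft_basement']),
    remainder='passthrough'
)

# Separate features from the target variable
X = df.drop(columns=['price'])
y = df['price']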
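The transformer above is only defined, not yet fitted. One common way to apply it, sketched here under the assumption that you split the data with scikit-learn's train_test_split (the split ratio, random_state, and toy values are illustrative choices, not taken from the article), is to fit on the training rows only and reuse the learned ranges on the test rows:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

numeric_cols = ["sqft_living", "sqft_lot", "sqft_above", "sqft_basement"]

# Toy frame with made-up values, standing in for the real dataset
rng = np.random.default_rng(1)
toy = pd.DataFrame(rng.integers(0, 5000, size=(100, 4)), columns=numeric_cols)
toy["price"] = rng.integers(100_000, 1_000_000, size=100)

X = toy.drop(columns=["price"])
y = toy["price"]

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

transformer = make_column_transformer(
    (MinMaxScaler(), numeric_cols),
    remainder="passthrough",
)

# Learn min/max from the training rows only, then apply to both sets
X_train_scaled = transformer.fit_transform(X_train)
X_test_scaled = transformer.transform(X_test)
```

Fitting on the training set alone keeps information about the test rows out of the scaling step. As a consequence, scaled test values can fall slightly outside the [0, 1] range.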
Note: if your dataset has categorical values, you must encode them using the following method:
# from sklearn.compose import make_column_transformer
# from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
# transformer = make_column_transformer(
#     (MinMaxScaler(), ['sqft_living', 'sqft_lot', 'sqft_above', 'sqft_basement', 'house_age']),
#     (OneHotEncoder(handle_unknown='ignore'), ['bedrooms', 'bathrooms', 'floors', 'view', 'condition'])
# )
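To illustrate what handle_unknown='ignore' does, here is a tiny sketch on made-up rows with one numeric and one categorical column. The toy values are assumptions, and sparse_threshold=0 is set only to force a dense array so the result is easy to read.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up training rows: 'condition' takes the values 3, 4 and 5
train = pd.DataFrame({"sqft_living": [1000, 2000, 3000], "condition": [3, 4, 5]})
# A new row whose 'condition' value (1) was never seen during fitting
new = pd.DataFrame({"sqft_living": [1500], "condition": [1]})

transformer = make_column_transformer(
    (MinMaxScaler(), ["sqft_living"]),
    (OneHotEncoder(handle_unknown="ignore"), ["condition"]),
    sparse_threshold=0,  # always return a dense NumPy array
)

train_enc = transformer.fit_transform(train)  # 1 scaled column + 3 one-hot columns
new_enc = transformer.transform(new)

print(new_enc)  # [[0.25 0.   0.   0.  ]]
```

The unseen category is encoded as all zeros instead of raising an error, which is why this option is convenient when the test data may contain values absent from the training data.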