Featured Project
Real Estate Pricing with Machine Learning
An app that merges data science and web development to predict the value of homes in Austin, TX.
The machine learning model learns pricing patterns from 15,175 real estate listings in the Austin, Texas, area and uses them to estimate the price of a property. It takes a supervised learning approach based on regression, since the target (price) is a continuous value.
A listing can contain far more detail than the model needs, and real estate prices also rise over time. Data cleaning and feature engineering were applied to address both issues and improve the results.
Nulls, duplicates, and outliers that represented erroneous data were removed, and the remaining prices were appreciated to the time of development to provide a more accurate prediction. Once the dataset was prepared, an 80/20 train/test split was applied so that model performance could be validated on held-out data.
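The cleaning, appreciation, and splitting steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the field names (`price`, `sale_year`, `living_area`), the 5% annual appreciation rate, the reference year, the price bounds, and all function names are hypothetical.

```python
import random

# Hypothetical values, chosen only for illustration.
ANNUAL_APPRECIATION = 0.05   # assumed yearly price growth
REFERENCE_YEAR = 2021        # assumed "time of development"


def clean(listings, min_price=10_000, max_price=10_000_000):
    """Drop rows with nulls, exact duplicates, and implausible prices."""
    seen = set()
    cleaned = []
    for row in listings:
        if any(value is None for value in row.values()):
            continue  # null field
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate listing
        seen.add(key)
        if not (min_price <= row["price"] <= max_price):
            continue  # outlier treated as erroneous data
        cleaned.append(row)
    return cleaned


def appreciate(row):
    """Scale a historical price forward to REFERENCE_YEAR."""
    years = REFERENCE_YEAR - row["sale_year"]
    adjusted = dict(row)
    adjusted["price"] = round(row["price"] * (1 + ANNUAL_APPRECIATION) ** years, 2)
    return adjusted


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle deterministically, then hold out test_fraction of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(round(len(shuffled) * test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the shuffle seed keeps the split reproducible, so changes in the measured performance reflect changes to the model rather than a different partition of the data.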
FRONTEND
React, Semantic UI, and Styled Components were used to build the responsive view layer of the application. Collecting all the data required for the price prediction in a single form could overwhelm the user, so the form was split into multiple steps. Drop-downs take standard inputs such as home type and the number of parking/garage spaces, bedrooms, and bathrooms. Input fields take user-typed values such as the lot size and living area.
To ensure that only valid values reach the server, form validation was implemented with HTML5 and JavaScript. If validation fails, the form displays an error message and does not proceed to the next step. A back button on each step allows navigation to the previous step to edit inputs. Finally, the form data is submitted by clicking the 'Predict' button.
Additionally, D3.js, a library for data-driven documents, was used to present the results as scatterplot and line charts built from data fetched from the server.
BACKEND
The backend was built with Python. Python has an extensive set of libraries, such as NumPy and scikit-learn, that support multi-dimensional arrays, matrices, and the high-level mathematical functions used in machine learning.
Flask, a micro-framework for Python that also ships with a development server, was used to implement the API endpoints. These endpoints request latitude and longitude data from a third-party geolocation service based on the address entered, and serve the prediction along with the data for the charts. In production, a WSGI server is used in place of the development server.
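A prediction endpoint of this kind might look roughly like the following. It is a sketch only: the route `/api/predict`, the payload fields, and the `geocode` and `predict_price` helpers are hypothetical stand-ins, with the third-party geolocation call and the trained model replaced by fixed stubs so the example runs offline.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def geocode(address):
    """Hypothetical stand-in for the third-party geolocation call.

    A real implementation would send the address to the geocoding
    service; fixed Austin coordinates are returned here instead.
    """
    return {"lat": 30.2672, "lng": -97.7431}


def predict_price(features):
    """Hypothetical stand-in for the trained model's prediction."""
    return 150.0 * features["living_area"]


@app.route("/api/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [k for k in ("address", "living_area") if k not in payload]
    if missing:
        return jsonify(error=f"missing fields: {', '.join(missing)}"), 400

    try:
        living_area = float(payload["living_area"])
    except (TypeError, ValueError):
        return jsonify(error="living_area must be a number"), 400

    # Enrich the form data with coordinates before calling the model.
    location = geocode(payload["address"])
    features = {"living_area": living_area, **location}
    return jsonify(location=location, price=predict_price(features))
```

Validating the payload again on the server is a deliberate choice in this sketch: client-side checks improve the user experience, but the API can also be called directly, so it should not rely on them.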
Swagger UI, which renders an OpenAPI specification, is also integrated into the backend so that API resources can be viewed and exercised from the web without going through the main website. This demonstrates forethought for team collaboration and hosts the API documentation securely on the web.
DEPLOYMENT
The application is deployed on an AWS EC2 instance running a Linux AMI. Nginx acts as both web server and reverse proxy: it serves static content and forwards other requests over the uwsgi protocol to uWSGI, the WSGI server that runs the Python application. AWS Secrets Manager was also explored as a way to store the API key for the geolocation service securely.
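The handoff from the WSGI server to the Python code follows the WSGI interface (PEP 3333): the server calls an application callable with the request environment and a `start_response` function. The sketch below shows that contract using only the standard library. In the project, the Flask application object would play this role; the `/health` path, the response bodies, and the `call` helper are hypothetical.

```python
import json
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable of the kind a WSGI server such as uWSGI loads."""
    if environ.get("PATH_INFO") == "/health":
        status, body = "200 OK", {"status": "ok"}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call(path):
    """Invoke the app the way a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

Because the callable never touches a socket itself, the same application can run behind the development server locally and behind Nginx and uWSGI in production without code changes.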
Technologies
- React
- D3.js
- Semantic UI
- Python
- Flask
- Geocoding API
- OpenAPI
- Nginx
- uWSGI
- AWS EC2
- Webpack