
Real Estate Web Scraper



Overview & Purpose

This project showcases my ability to build a functional and user-centric web application that leverages web scraping techniques to gather and filter real estate listings. The core goal was to create a tool that allows users to dynamically search for properties based on specific criteria they select on the front end. This demonstrates my proficiency in both front-end and back-end development, including handling user input, implementing complex filtering logic, and managing data retrieved from external sources.



What Was Used

To accomplish this, I leveraged a combination of powerful tools and technologies:

• React with Next.js: Used for building the dynamic and interactive user interface, allowing users to select search parameters and view the filtered listings.

• FastAPI (Python): Employed as the back-end framework to handle API requests, implement the web scraping logic, and filter the scraped data based on the user's selected parameters.

• Beautiful Soup (Python): Used to parse the HTML content retrieved from the target real estate website, making it straightforward to extract the relevant data points from unstructured markup.

• Requests (Python): Used to make HTTP requests to the target website and fetch the HTML content to be scraped.

• Axios (JavaScript): Used on the front end to make asynchronous HTTP requests to the back-end API, sending the user's search parameters and receiving the filtered listings.
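As a sketch of how the scraping pieces fit together, the snippet below fetches a listings page with Requests and parses it with Beautiful Soup. The URL, CSS classes, and field names are hypothetical placeholders, not the actual site's markup:

```python
import requests
from bs4 import BeautifulSoup

def parse_listings(html: str) -> list[dict]:
    """Extract listing dicts from raw HTML (selectors are hypothetical)."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for card in soup.select("div.listing-card"):
        title = card.select_one("h2.short-title")
        price = card.select_one("span.price")
        bedbath = card.select_one("span.bedbath")
        listings.append({
            "short_title": title.get_text(strip=True) if title else "",
            "price": price.get_text(strip=True) if price else "",
            "bedbath": bedbath.get_text(strip=True) if bedbath else "",
        })
    return listings

def fetch_listings(url: str) -> list[dict]:
    """Fetch a page and parse it; raises on HTTP errors."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return parse_listings(resp.text)
```

Separating fetching from parsing keeps the parser testable on saved HTML without hitting the live site.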




Challenges

This project presented several challenges, each contributing to a deeper understanding of web scraping and application development:


Initial Filtering Inaccuracies: The primary challenge was ensuring the filtering logic accurately reflected the user's selections. Initially, filtering by property type (e.g., Villa/House) misclassified listings, so irrelevant results (such as bare land) appeared alongside houses. This required careful analysis of the scraped data and iterative refinement of the back-end filtering logic.
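A simplified version of that refined filter might look like the sketch below; the field names and the land-exclusion heuristic are assumptions for illustration, not the production logic:

```python
def filter_by_type(listings: list[dict], wanted: str) -> list[dict]:
    """Keep only listings whose inferred type matches the user's selection.

    Heuristic (assumed): land parcels typically have no bed/bath data,
    so listings without it are excluded when a house type is requested.
    """
    wanted = wanted.lower()
    result = []
    for listing in listings:
        title = listing.get("short_title", "").lower()
        bedbath = listing.get("bedbath", "")
        if wanted in ("villa", "house"):
            # Match on the title, or fall back to bed/bath presence.
            if wanted in title or (bedbath and "land" not in title):
                result.append(listing)
        elif wanted in title:
            result.append(listing)
    return result
```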


Dynamic and Varied Website Structure: Websites can have inconsistent HTML structures, making it challenging to reliably extract data. I had to identify stable CSS selectors and implement robust parsing logic to handle potential variations in the target website's HTML.
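One way to harden the parsing against markup variations is to try a list of candidate selectors in order and take the first match; the selectors shown are illustrative only:

```python
from bs4 import BeautifulSoup

def first_match_text(node, selectors: list[str], default: str = "") -> str:
    """Return the text of the first CSS selector that matches, else a default."""
    for selector in selectors:
        match = node.select_one(selector)
        if match:
            return match.get_text(strip=True)
    return default

# Usage: the price markup might appear under several class names.
# price = first_match_text(card, ["span.price", "div.listing-price", ".price-tag"])
```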


Accurate Property Type Identification: Determining the correct property type from the scraped data proved difficult, as the target website didn't always use consistent terminology in the listing titles. I had to adapt the back-end filtering to analyze other data points, like the bedbath field, to improve accuracy.
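To sketch that approach: a listing whose bedbath field contains bedroom/bathroom counts is almost certainly a dwelling, while an empty field suggests land. The regex and the returned labels are assumptions about the scraped format:

```python
import re

def infer_property_type(short_title: str, bedbath: str) -> str:
    """Infer a coarse property type, preferring the bedbath field
    over the often-inconsistent listing title."""
    title = short_title.lower()
    # A pattern like "3 bd 2 ba" strongly implies a dwelling.
    if re.search(r"\d+\s*(bd|bed)", bedbath, re.IGNORECASE):
        return "villa/house" if ("villa" in title or "house" in title) else "residential"
    if "land" in title:
        return "land"
    return "unknown"
```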



Lessons Learned

This project reinforced the critical importance of data analysis and iterative development in web scraping and full-stack development. Key lessons learned include:


The Importance of Understanding the Data: Initially, I relied on the short_title for filtering property types, but analyzing the scraped data revealed that the bedbath field was a more reliable source. This highlighted the necessity of thoroughly understanding the data being scraped before implementing filtering logic.


Adaptability to Website Structure: Web scraping requires adaptability. Understanding how to identify and utilize different data points when the primary ones are insufficient is a crucial skill.


Front-End and Back-End Integration: This project provided valuable experience in integrating front-end user interactions with back-end data processing, highlighting the importance of clear API design and communication between the two layers.