Automated Classification of Municipal Zoning Codes
Can we use natural language processing to label the allowed residential uses of zoning districts in California? Data science project for INFO 259 Natural Language Processing with Prof. David Bamman.
The Problem
Zoning districts define which kinds of structures and uses are allowed on each parcel of land. Cities establish these districts officially in the text of their municipal codes. Our task is to sort cities’ highly granular zoning districts into three categories (single-family, multifamily, and non-residential) based on the purpose statement in each district’s definition.
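To make the task's input and output concrete, here is a minimal sketch of a keyword baseline that maps one purpose statement to one of the three labels. It is not the project's model; the patterns, the function name `label_purpose`, and the rule that multifamily cues win over single-family cues are all assumptions chosen for illustration.

```python
import re

# The three target categories named in the problem statement.
LABELS = ("single-family", "multifamily", "non-residential")

# Hypothetical cue patterns; a real code will use wording these miss.
MULTI = re.compile(r"\bmulti(ple)?[- ]?family\b|\bapartments?\b", re.IGNORECASE)
SINGLE = re.compile(r"\b(single|one)[- ]family\b", re.IGNORECASE)


def label_purpose(text):
    """Return a label for one purpose statement using keyword rules only."""
    # Assumption: a zone that mentions multifamily housing is treated as
    # multifamily even if it also mentions single-family dwellings.
    if MULTI.search(text):
        return "multifamily"
    if SINGLE.search(text):
        return "single-family"
    # No residential cue found.
    return "non-residential"
```

A rule set like this fails on negation (for example, a commercial zone whose purpose statement says apartments are prohibited), which is one reason to look at learned text classifiers instead.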
The Approach
- Scrape data from publicly hosted municipal code websites.
- Within the article that defines each zone, locate one particular paragraph: the “intent” or “purpose” paragraph.
Currently working on
The scraping strategy: the structure of municipal code pages varies from site to site. How can we identify these intent/purpose paragraphs efficiently?
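One possible starting point, sketched under assumptions: ignore site-specific page layout, flatten each page into an ordered list of text blocks, and keep a block if it opens with "purpose" or "intent", or if it directly follows a short heading that does. The tag set, the length thresholds, and the names `BlockCollector` and `find_purpose_paragraphs` are hypothetical; only Python's standard `html.parser` and `re` modules are used.

```python
import re
from html.parser import HTMLParser

# Assumed set of tags that hold headings and paragraphs on code pages.
BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
CUE = re.compile(r"\b(purpose|intent)\b", re.IGNORECASE)


class BlockCollector(HTMLParser):
    """Collect the text of each block element, in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._open = None  # tag of the block currently being read

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS and self._open is None:
            self._open = tag
            self._buf = []

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append(text)
            self._open = None


def find_purpose_paragraphs(html, min_len=60, head_chars=80):
    """Return candidate intent/purpose paragraphs from one HTML page."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    blocks = collector.blocks
    found = []
    for i, text in enumerate(blocks):
        # Only look for the cue word near the start of the block.
        if not CUE.search(text[:head_chars]):
            continue
        if len(text) >= min_len:
            candidate = text  # cue and body share one paragraph
        elif i + 1 < len(blocks) and len(blocks[i + 1]) >= min_len:
            candidate = blocks[i + 1]  # short heading, body follows
        else:
            continue
        if candidate not in found:
            found.append(candidate)
    return found
```

Because the heuristic depends only on wording and block order, it should carry across sites with different markup, at the cost of missing pages that put paragraphs in tags outside the assumed set or that phrase the section title differently.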