Automated Classification of Municipal Zoning Codes

Can we use natural language processing to label the allowed residential uses of zoning districts in California? Data science project for INFO 259 Natural Language Processing with Prof. David Bamman.


The Problem

Zoning districts define what kinds of structures and uses are allowed on different parcels of land. Cities officially establish these districts in the text of their municipal codes. Our task is to sort cities’ highly granular zoning districts into three categories (single-family, multifamily, and non-residential) based on the purpose statement in each district's definition.
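To make the label scheme concrete, here is a minimal keyword baseline, not the project's model. The cue phrases, the function name `baseline_label`, and the example statements are all hypothetical, chosen only to illustrate mapping a purpose statement to one of the three categories.

```python
# The three target categories from the problem statement.
LABELS = ("single-family", "multifamily", "non-residential")

# Hypothetical cue phrases (assumptions for illustration, not a vetted lexicon).
MULTIFAMILY_CUES = ("multifamily", "multi-family", "multiple-family", "apartment", "duplex")
SINGLE_FAMILY_CUES = ("single-family", "single family", "one-family")


def baseline_label(purpose_statement):
    """Assign one of LABELS to a purpose statement using simple substring cues."""
    text = purpose_statement.lower()
    # Multifamily cues are checked first, on the assumption that a statement
    # mentioning both kinds of housing describes a district that allows multifamily.
    if any(cue in text for cue in MULTIFAMILY_CUES):
        return "multifamily"
    if any(cue in text for cue in SINGLE_FAMILY_CUES):
        return "single-family"
    # No residential cue found: fall back to non-residential.
    return "non-residential"


# Made-up purpose statements, for illustration only.
examples = [
    "The R-1 district is intended for single-family detached homes.",
    "The R-3 district provides for apartments and other multi-family housing.",
    "The M-1 district provides for light industrial uses.",
]
predicted = [baseline_label(s) for s in examples]
```

A rule like this is mainly useful as a sanity check to compare a learned classifier against.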

The Approach

Currently working on

The scraping strategy: the structure of municipal code pages varies from city to city. How can we identify the intent/purpose paragraphs efficiently?
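One possible starting point is a cue-word heuristic, sketched below. It assumes the page text has already been split into paragraphs. The pattern, the 120-character window, and the function name `find_purpose_paragraphs` are hypothetical choices, and the sample text is made up.

```python
import re

# Hypothetical cue pattern: words that often introduce a purpose statement.
PURPOSE_CUE = re.compile(r"\b(purpose|intent)\b", re.IGNORECASE)


def find_purpose_paragraphs(paragraphs, window=120):
    """Return paragraphs that mention a cue word within their first `window` characters."""
    # Restricting the search to the opening of each paragraph favors section
    # headings such as "Purpose." over incidental mentions deep in the text.
    return [p for p in paragraphs if PURPOSE_CUE.search(p[:window])]


# Made-up paragraphs standing in for scraped municipal code text.
sample = [
    "17.20.010 Purpose. The R-1 district is intended to provide for single-family homes.",
    "17.20.020 Permitted uses. Single-family dwellings; home occupations.",
    "Intent: The C-2 district provides areas for general commercial activity.",
]
matches = find_purpose_paragraphs(sample)
```

A heuristic like this will miss cities that phrase their purpose sections differently, so it is best treated as a first pass to be checked by hand on a handful of cities.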