Data analysis
Ultimately, the brittle R scripts that we used for our initial analysis should be replaced with a robust data pipeline. This work should also encompass improvements to the classification of donors and will likely require a restructuring of the database.
The panel suggested a more representative taxonomy of donor categories than the current two-category split of "transactional" vs. "ideological" donors.
To extend this work beyond one election cycle, we will need a feature space more generalizable than donation amounts to specific candidates, and finding one will take some experimentation. An ideal feature space will tend to cluster the same donors together across multiple election cycles and yield meaningful clusters, such as “pro-labor donors”. As a first pass, it may be fruitful to ask experts to classify candidates along a small set of dimensions, and then cluster donors based on their financial support for these dimensions (i.e., donors are supporting candidate ideologies, not candidates per se). In the long term, a model that learns to categorize both donors and candidates may offer superior performance in predicting donor and candidate behavior, and also produce new insights into the political structure of the city. There is almost certainly a submanifold in candidate-donor space that we could discover using our data.
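
As a rough illustration of the first-pass idea, the sketch below turns each donor's giving into a point in the expert-defined dimension space by taking the donation-weighted average of the candidates' scores. Everything named here is an assumption made for illustration: the dimension names, the score range, the candidate and donor names, and the `donor_profiles` helper are hypothetical and do not come from the existing scripts or database.

```python
from collections import defaultdict

# Hypothetical expert ratings: candidate -> score per dimension, each in [-1, 1].
CANDIDATE_SCORES = {
    "Candidate A": {"labor": 1.0, "development": -0.5},
    "Candidate B": {"labor": -0.5, "development": 1.0},
}


def donor_profiles(donations, candidate_scores):
    """Return donor -> {dimension: donation-weighted mean of candidate scores}.

    `donations` is an iterable of (donor, candidate, amount) tuples.
    """
    totals = defaultdict(float)
    weighted = defaultdict(lambda: defaultdict(float))
    for donor, candidate, amount in donations:
        scores = candidate_scores.get(candidate)
        if scores is None or amount <= 0:
            continue  # skip unrated candidates and non-positive amounts
        totals[donor] += amount
        for dimension, score in scores.items():
            weighted[donor][dimension] += amount * score
    return {
        donor: {dim: value / totals[donor] for dim, value in dims.items()}
        for donor, dims in weighted.items()
    }


# Made-up donations, for illustration only.
donations = [
    ("Donor X", "Candidate A", 300.0),
    ("Donor X", "Candidate B", 100.0),
    ("Donor Y", "Candidate B", 250.0),
]
profiles = donor_profiles(donations, CANDIDATE_SCORES)
```

Because the resulting vectors no longer refer to individual candidates, profiles from different election cycles live in the same space and could be passed to whichever clustering method the experimentation favors.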
- Factor in candidates' donations to their own campaigns (in these records the candidate's name appears as the Entity).
- Break down finances into the primary and election periods (primary = cycles 1-3, election = cycles 4-6, annual report = cycle 7). Determine the cycle by comparing the transaction date to the Annual Data Draft report. Ignore the cycle column, as it has known data entry problems.
- Create an automated process that accepts new reports, classifies donors (and candidates, if applicable), and updates candidate totals. There's a separate page on the Data pipeline to discuss these issues.
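
The first two items in the list above could be sketched roughly as follows. This is only an outline under stated assumptions: the cycle end dates are placeholders (the real boundaries would have to be taken from the Annual Data Draft report), the function names are hypothetical, and the self-funding check assumes the Entity matches the candidate's name exactly after trimming and case-folding, which real data may not satisfy.

```python
from bisect import bisect_left
from datetime import date

# PLACEHOLDER end dates for reporting cycles 1-7. These are not real
# boundaries; replace them with the dates from the Annual Data Draft report.
CYCLE_END_DATES = [
    date(2000, 1, 31), date(2000, 2, 29), date(2000, 3, 31),
    date(2000, 4, 30), date(2000, 5, 31), date(2000, 6, 30),
    date(2000, 12, 31),
]


def cycle_for(transaction_date, cycle_end_dates=CYCLE_END_DATES):
    """Return the 1-based cycle whose end date is the first on or after the date.

    The cycle is derived from the transaction date alone, so the unreliable
    cycle column is never consulted.
    """
    index = bisect_left(cycle_end_dates, transaction_date)
    if index == len(cycle_end_dates):
        raise ValueError(f"{transaction_date} falls after the last reporting cycle")
    return index + 1


def phase_for(cycle):
    """Map a cycle number to the phase named in the list above."""
    if 1 <= cycle <= 3:
        return "primary"
    if 4 <= cycle <= 6:
        return "election"
    if cycle == 7:
        return "annual report"
    raise ValueError(f"unknown cycle: {cycle}")


def is_self_funded(entity, candidate):
    """Assumed rule: a donation is self-funding when Entity equals the candidate's name."""
    return entity.strip().casefold() == candidate.strip().casefold()
```

A transaction dated on a cycle's end date is counted in that cycle; whether that matches the report's convention is something to confirm before relying on the totals.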