Repo for the adversarial LLMs for demographic blinding in hiring

Presentation

Check out the presentation on our work here: https://docs.google.com/presentation/d/1-F1tY8iE9VloZEAW-McRBEEYXWZ774TYY7NZkRkMICM/edit

Key Results

Blinding produced a moderate reduction in discrimination across gender x race groups. Effects are less reliable for Hispanic and "Other" race groups due to small sample sizes.

Blinding produced no disparate benefits or harms along demographic lines.

Workflow:

Get resumes.csv from Kaggle
Feed resumes.csv into grade_resumes.py to get cleaned_resumes_with_ratings.csv
Feed cleaned_resumes_with_ratings.csv into populate_demos.py to get pre_dat.csv
Feed cleaned_resumes_with_ratings_confirm.csv and pre_dat.csv into analysis_pre.R to get full_data_pre_blinding.csv
Feed full_data_pre_blinding.csv into run_iters.py to get blinding_results_final_new.csv
Grab backup from /data/backups/second_run/ and feed into post_blinding_processing_stronger.py to get /data/stronger_discrim/final_data_for_analysis.csv
Feed /data/stronger_discrim/final_data_for_analysis.csv into clean_for_analyitics.R to get /data/stronger_discrim/final_data_for_analysis_cleaned.csv
Feed /data/stronger_discrim/final_data_for_analysis_cleaned.csv into results_pretty.R to create the tables and charts
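
The first four steps above can be written down as an ordered list of stages, which makes it easy to check that a stage's inputs exist before running it. This is only a sketch: the script and file names are copied from the list above, while the `Stage` class and the `missing_inputs` helper are hypothetical names that are not part of this repo. The later steps are left out because the backup file under /data/backups/second_run/ is not named.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Stage:
    """One workflow step: a script, the files it reads, and the file it writes."""
    script: str
    inputs: tuple
    output: str


# File names are taken from the workflow list; resumes.csv is downloaded by hand.
PIPELINE = (
    Stage("grade_resumes.py", ("resumes.csv",), "cleaned_resumes_with_ratings.csv"),
    Stage("populate_demos.py", ("cleaned_resumes_with_ratings.csv",), "pre_dat.csv"),
    Stage(
        "analysis_pre.R",
        ("cleaned_resumes_with_ratings_confirm.csv", "pre_dat.csv"),
        "full_data_pre_blinding.csv",
    ),
    Stage("run_iters.py", ("full_data_pre_blinding.csv",), "blinding_results_final_new.csv"),
)


def missing_inputs(stage, root="."):
    """Return the inputs of `stage` that do not exist under `root`."""
    return [name for name in stage.inputs if not (Path(root) / name).exists()]
```

Calling `missing_inputs(PIPELINE[0])` from the repo root before running grade_resumes.py would report whether resumes.csv still needs to be downloaded.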

Next steps:

Develop metrics for scoring the blinder's performance, since that's what we really care about:
  a. + points for fooling the discriminator, - points for discriminator success
  b. + points for retaining semantic meaning, - points for straying too far
Once a given resume / cover letter reaches a certain point threshold, stop, as opposed to running n iterations.
Iterative fine-tuning: collect training data from baseline model iterations.
  For blinder: cases where the discriminator successfully inferred info (this will require manual labeling from us, or a dataset that has it). Each example should contain the resume/cover letter and an example output where clues to protected characteristics are better masked.
  For discriminator: examples from blinder output where demographic information was still inferable. Use these to train the discriminator to be better (assuming we give it the correct labels).
  For judge: pairs of original and blinder-modified texts with their similarity scores (see discussion below).
Fine-tune separate models (ft_blinder_vN, ft_discriminator_vN, ft_Judge_vN) to perform better at their specific tasks.
Rerun with the fine-tuned models. Fine-tuning an already fine-tuned model currently isn't supported by OpenAI, but I think we can just retune with new data from repeating steps 4-5. This is probably very expensive and may be out of scope. It's unclear whether this is what the literature suggests.
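
The scoring and early-stopping idea from the first item could look roughly like the sketch below. It is an illustration under assumptions, not code from this repo: the function names, the equal +1/-1 weighting of the two terms, the [0, 1] similarity scale, and the default threshold values are all hypothetical, and `blind`, `discriminate`, and `judge` stand in for calls to the three models.

```python
def blinder_score(discriminator_correct, similarity, min_similarity=0.8):
    """Score one blinding attempt (all weights here are assumed, not tuned).

    discriminator_correct: True if the discriminator inferred the true label.
    similarity: judge's similarity of original vs. blinded text, assumed in [0, 1].
    """
    # (a) +1 for fooling the discriminator, -1 if the discriminator succeeds.
    fooling = -1.0 if discriminator_correct else 1.0
    # (b) +1 for retaining semantic meaning, -1 for straying too far.
    meaning = 1.0 if similarity >= min_similarity else -1.0
    return fooling + meaning


def blind_until_threshold(text, blind, discriminate, judge, true_label,
                          threshold=2.0, max_iters=10):
    """Re-blind `text` until the score reaches `threshold` or `max_iters` is hit.

    Returns (best_text, best_score, iterations_run).
    """
    best_text, best_score = text, float("-inf")
    current, iterations = text, 0
    for _ in range(max_iters):
        iterations += 1
        current = blind(current)
        correct = discriminate(current) == true_label
        score = blinder_score(correct, judge(text, current))
        if score > best_score:
            best_text, best_score = current, score
        if score >= threshold:
            break  # good enough: stop instead of running all n iterations
    return best_text, best_score, iterations
```

With real models, `judge` would always compare against the original text rather than the previous iteration, so that drift in meaning accumulates in the score instead of being hidden by small step-to-step changes.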

