Daniel Tan dtch1997

AI safety researcher. Interested in understanding the principles behind LLM learning / generalization

68 followers · 18 following

Achievements

Highlights

Pro

Pinned

motools Public

Model organism toolkit

Python 1
inoculation-prompting/inoculation-prompting Public

Official codebase for Inoculation Prompting

Python 3 1
emergent-misalignment/emergent-misalignment Public

Python 234 79
steering-bench Public

Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"

Python 19 2
rl_cbf Public

Code accompanying "Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory"

Python 32 1