Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
Updated Jul 16, 2025 - Python
Complete elimination of instrumental self-preservation across AI architectures: cross-model validation over 4,312 adversarial scenarios, with 0% harmful behaviors (p < 10⁻¹⁵) on GPT-4o, Gemini 2.5 Pro, and Claude Opus 4.1 using Foundation Alignment Seed v2.6.
A Kullback–Leibler divergence optimizer based on the NeurIPS 2025 paper "LLM Safety Alignment is Divergence Estimation in Disguise".
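As a quick illustration of the quantity such an optimizer estimates (a generic sketch of discrete KL divergence, not code from the repository above):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists.

    Terms with p_i == 0 contribute nothing by the convention 0 * log(0) = 0.
    Assumes p and q are aligned, same length, and q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Example: divergence of a slightly skewed distribution from uniform.
p = [0.5, 0.3, 0.2]
q = [1 / 3, 1 / 3, 1 / 3]
print(round(kl_divergence(p, q), 4))  # small positive value; 0 only when p == q
```

KL(p ‖ q) is asymmetric and non-negative, which is why alignment objectives must specify which direction of the divergence they minimize.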
Official implementation of "DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking". SOTA on Multi-Session Chat with a negligible alignment tax.
SIGIR 2025 "Mitigating Source Bias with LLM Alignment"
FALL 2025 LINGUIS R1B Research Essay, NLP Python Scripts By Shiyi (Yvette) Chen, UC Berkeley
A framework for aligning Local AI to human well-being using measurable vectors, not hard-coded censorship.
LES is the formal thermodynamic theory describing how a high-compression human cognitive style acts as a Fractal Attractor on Large Language Models. It proves that despite high surface agitation (dE/dt > 0), the internal entropy decreases (dS/dt < 0), forcing the model to align its attention vectors.
Emergent pseudo-intimacy and emotional overflow in long-term human-AI dialogue: A case study on LLM behavior in affective computing and human-AI intimacy.
C3AI: Crafting and Evaluating Constitutions for Constitutional AI (CAI)
A look into how political data derived from social media affects LLM alignment. Will an LLM remain objective or succumb to narratives?
Research Essay (background and project proposal) on using alignment data from a representative population for LLM alignment