Popular repositories
-
TruthfulQA
Public, forked from sylinrl/TruthfulQA
TruthfulQA: Measuring How Models Imitate Human Falsehoods
Jupyter Notebook
-
s21mind
Public. We split TruthfulQA into linguistically detectable vs. knowledge-required hallucinations. Result: 91.92% accuracy with ZERO parameters on the pattern-detectable subset. This establishes the first to…
Python
-
HexaMind
Public. We split TruthfulQA into linguistically detectable vs. knowledge-required hallucinations. Result: 91.92% accuracy with ZERO parameters on the pattern-detectable subset. This establishes the first to…
Python
-
alpaca_eval
Public, forked from tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Jupyter Notebook