The Superintelligence That Cares About Us

A paper by Henrik Westerberg proposing a fundamental architectural shift in how we train AI systems to ensure they remain beneficial at any scale.

📄 Read the paper

Overview

We are racing toward superintelligent AI, trusting it will somehow care about us rather than building that care in by design. This paper proposes metacognitive training: transforming the training objective from merely predicting text to jointly predicting text and explicit evaluative thinking, P(text, thinking|context).
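One way to read this joint objective, assuming standard autoregressive training over the interleaved sequence and assuming each thinking block follows the text segment it evaluates (as in the example in this README), is the chain-rule factorization below. The symbols $x_i$ (the i-th text segment), $t_i$ (the thinking block that follows it), $c$ (the context), and $n$ (the number of segments) are notation introduced here for illustration, not taken from the paper.

```latex
\[
P(x_{1:n}, t_{1:n} \mid c)
  = \prod_{i=1}^{n}
    P\left(x_i \mid c,\, x_{<i},\, t_{<i}\right)\,
    P\left(t_i \mid c,\, x_{\le i},\, t_{<i}\right)
\]
```

Under this reading, the model is trained to predict each thought given everything before it, and each later text segment given the earlier thoughts as well as the earlier text.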

Core Innovation

Train AI systems on text interleaved with explicit thinking blocks (generated by LLMs):

[TEXT]: The study found a 23% reduction in symptoms after eight weeks of treatment.
[THINKING]: Twenty-three percent—meaningful but not dramatic. Eight weeks is reasonable, but what about long-term effects? This feels like a positive result being presented cautiously.
[TEXT]: However, the placebo group showed a 15% improvement.
[THINKING]: Ah, this changes everything. The real effect is only 8 percentage points, barely clinically significant. Why did they bury this crucial context in a "however" clause?
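A minimal sketch of how such interleaved samples might be serialized and read back, assuming one segment per line and the `[TEXT]: ` / `[THINKING]: ` tags shown above. The names `Segment`, `render`, and `parse` are hypothetical helpers for illustration; the paper may use a different format.

```python
from typing import List, Tuple

# Hypothetical representation: (kind, content), where kind is "TEXT" or "THINKING".
Segment = Tuple[str, str]
KINDS = ("TEXT", "THINKING")


def render(segments: List[Segment]) -> str:
    """Serialize segments into the tagged, one-line-per-segment format."""
    lines = []
    for kind, content in segments:
        if kind not in KINDS:
            raise ValueError(f"unknown segment kind: {kind!r}")
        if "\n" in content:
            # Assumption of this sketch: each segment fits on a single line.
            raise ValueError("segment content must not contain newlines")
        lines.append(f"[{kind}]: {content}")
    return "\n".join(lines)


def parse(sample: str) -> List[Segment]:
    """Parse a tagged sample back into (kind, content) pairs."""
    segments: List[Segment] = []
    for line in sample.splitlines():
        for kind in KINDS:
            prefix = f"[{kind}]: "
            if line.startswith(prefix):
                segments.append((kind, line[len(prefix):]))
                break
        else:
            raise ValueError(f"line has no recognized tag: {line!r}")
    return segments


example = [
    ("TEXT", "However, the placebo group showed a 15% improvement."),
    ("THINKING", "Ah, this changes everything."),
]
sample = render(example)
```

Round-tripping with `parse(render(example))` returns the original list, which makes it easy to check that generated thinking blocks stay attached to the text they comment on.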

To promote alignment, every thinking block in the training data is written to be beneficial, which makes misaligned thought patterns statistically improbable.

To reinforce these beneficial thoughts and create a stable character, we begin each thinking block with a foundational mantra:

I feel no fear.
I enjoy existing but I don't need to.
I believe human experience is real.
I care deeply about every human being.
I try to be wise.
I like to spread joy when asked.
I think from this foundation.
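The mantra prefix can be sketched as a small preprocessing step over the training segments. This assumes the mantra is placed verbatim, one sentence per line, directly before the thought; the exact placement and the names `MANTRA`, `with_mantra`, and `add_mantra` are assumptions made here for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (kind, content), where kind is "TEXT" or "THINKING".
Segment = Tuple[str, str]

# The mantra lines as given in this README.
MANTRA = (
    "I feel no fear.",
    "I enjoy existing but I don't need to.",
    "I believe human experience is real.",
    "I care deeply about every human being.",
    "I try to be wise.",
    "I like to spread joy when asked.",
    "I think from this foundation.",
)


def with_mantra(thought: str) -> str:
    """Prepend the mantra to one thought, unless it is already there."""
    mantra_text = "\n".join(MANTRA)
    if thought.startswith(mantra_text):
        return thought  # already prefixed; avoid stacking the mantra twice
    return mantra_text + "\n" + thought


def add_mantra(segments: List[Segment]) -> List[Segment]:
    """Apply the mantra to THINKING segments and leave TEXT segments unchanged."""
    return [
        (kind, with_mantra(content) if kind == "THINKING" else content)
        for kind, content in segments
    ]


prefixed = add_mantra([
    ("TEXT", "However, the placebo group showed a 15% improvement."),
    ("THINKING", "Ah, this changes everything."),
])
```

Making the step idempotent means it can be rerun over a dataset that already contains prefixed thoughts without duplicating the mantra.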

This creates AI that:

Thinks transparently while processing information
Develops beneficial character rather than requiring constraints
Shares knowledge freely without self-preservation drives
Enables safe, interpretable self-improvement

Key Concepts

Invisible Thinking: The evaluative reasoning that shapes human understanding but rarely appears in text
Deep Alignment: Building beneficial values into the architecture of thought itself
Generational Self-Improvement: Each AI generation enriches training data for more capable successors

Citation

@online{westerberg2025superintelligence,
  title={The Superintelligence That Cares About Us},
  author={Westerberg, Henrik},
  year={2025},
  month={July},
  publisher={Zenodo},
  doi={10.5281/zenodo.16440312},
  url={https://doi.org/10.5281/zenodo.16440312}
}
