# Baby Dragon Hatchling Continual Learning (BDH-CL)

**Fork of**: [pathwaycom/bdh](https://github.com/pathwaycom/bdh)

***

## Introduction

This repository extends the original Baby Dragon Hatchling (BDH) architecture, a biologically inspired large language model that bridges transformers and models of neural computation, with **continual learning** mechanisms inspired by biological synaptic plasticity.

The key contribution of this fork is **Adaptive Synaptic Consolidation**, which lets BDH learn multiple tasks sequentially without catastrophic forgetting, in the spirit of Zenke et al.'s *Continual Learning Through Synaptic Intelligence* (ICML 2017).

***

## Highlights of Changes and Improvements

### Continual Learning Integration

- Added **Elastic Weight Consolidation (EWC)** with Fisher information estimation to protect important weights from being overwritten while training on new tasks.
- Implemented **adaptive synaptic gates** that regulate plasticity at the neuron level, inspired by biological metaplasticity.
- Integrated **path-integral online importance measures** for efficient tracking of weight significance during training (a minimal sketch follows this list).
- Added support for **multi-task sequential training**, enabling scalable lifelong learning.
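
Below is a minimal sketch of the consolidation idea in the spirit of EWC and Synaptic Intelligence: a running path integral of gradient times parameter change is converted into a per-weight importance at each task boundary, and a quadratic penalty then keeps important weights near their consolidated values. The class and method names (`SynapticConsolidation`, `accumulate`, `consolidate`, `penalty`) are illustrative placeholders, not the exact API of this repository:

```python
import torch

class SynapticConsolidation:
    """Path-integral importance tracking with a quadratic consolidation penalty.
    Illustrative sketch only (Synaptic Intelligence / EWC style)."""

    def __init__(self, model, damping=1e-3):
        self.model = model
        self.damping = damping
        params = dict(model.named_parameters())
        self.path_integral = {n: torch.zeros_like(p) for n, p in params.items()}  # running omega
        self.importance = {n: torch.zeros_like(p) for n, p in params.items()}     # consolidated Omega
        self.anchors = {n: p.detach().clone() for n, p in params.items()}         # reference weights
        self._prev = {n: p.detach().clone() for n, p in params.items()}

    def accumulate(self):
        """Call after every optimizer step: omega += -grad * (theta_new - theta_old)."""
        for n, p in self.model.named_parameters():
            if p.grad is not None:
                step = p.detach() - self._prev[n]
                self.path_integral[n] += (-p.grad.detach()) * step
            self._prev[n] = p.detach().clone()

    def consolidate(self):
        """Call at the end of each task: convert the path integral into importance."""
        for n, p in self.model.named_parameters():
            task_drift = p.detach() - self.anchors[n]
            self.importance[n] += self.path_integral[n] / (task_drift.pow(2) + self.damping)
            self.anchors[n] = p.detach().clone()
            self.path_integral[n].zero_()

    def penalty(self, strength=1.0):
        """Regularizer that keeps important weights close to their anchors."""
        loss = 0.0
        for n, p in self.model.named_parameters():
            loss = loss + (self.importance[n] * (p - self.anchors[n]).pow(2)).sum()
        return strength * loss
```

A usage sketch for sequential training appears under *How to Use* below.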

## Benchmarking Suite

| ![Permuted MNIST](res/PERMUTED_MNIST.PNG) | ![Rotated MNIST](res/ROTATED_MNIST.PNG) |
|:----------------------------------------------------:|:-------------------------------------------------:|
| Permuted MNIST (Simple) | Rotated MNIST (Simple) |

| ![Split CIFAR](res/SPLIT_CIFAR.PNG) | ![Sequence](res/SEQUENCE.PNG) |
|:----------------------------------------------------:|:-------------------------------------------------:|
| Split CIFAR (Simple) | Sequence (Simple) |
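
For reference, a permuted-MNIST task sequence of the kind shown above is typically built by applying one fixed random pixel permutation per task. The sketch below uses `torchvision` and is illustrative only; the repository's `simple_benchmark.py` may construct its tasks differently:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms

class PermutedMNIST(Dataset):
    """MNIST with a fixed pixel permutation; each permutation defines one task."""
    def __init__(self, base, perm):
        self.base, self.perm = base, perm

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, label = self.base[idx]               # img: 1x28x28 tensor
        return img.view(-1)[self.perm], label     # flattened, permuted pixels

def make_permuted_mnist_tasks(num_tasks=5, batch_size=128):
    base = datasets.MNIST("./data", train=True, download=True,
                          transform=transforms.ToTensor())
    return [DataLoader(PermutedMNIST(base, torch.randperm(28 * 28)),
                       batch_size=batch_size, shuffle=True)
            for _ in range(num_tasks)]
```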

<img src="figs/architecture.png" width="600"/>

***

## How to Use

<img src="figs/vocab.png" width="600"/>
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Train BDH-CL with continual learning enabled:

```bash
python train.py --continual_learning
```

<img src="figs/bdh_scaling.png" width="600"/>
- Run simple benchmarks:

```bash
python simple_benchmark.py --benchmark permuted_mnist --num_tasks 5 --epochs 10

python simple_benchmark.py --benchmark split_cifar --num_tasks 5 --epochs 10

python simple_benchmark.py --benchmark rotated_mnist --num_tasks 10 --epochs 10

python simple_benchmark.py --benchmark sequence --num_tasks 5 --epochs 10
```
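
Programmatically, sequential training with the consolidation penalty might look like the sketch below, which reuses the `SynapticConsolidation` and `make_permuted_mnist_tasks` sketches from earlier sections. The function and argument names here are placeholders, not this repository's actual training API:

```python
import torch
import torch.nn.functional as F

def train_sequentially(model, tasks, epochs=10, reg_strength=1.0, lr=1e-3):
    """Train one task after another, penalizing drift of consolidated weights."""
    consolidation = SynapticConsolidation(model)     # sketch class shown earlier
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for task_loader in tasks:                         # e.g. make_permuted_mnist_tasks()
        for _ in range(epochs):
            for inputs, targets in task_loader:
                optimizer.zero_grad()
                loss = F.cross_entropy(model(inputs), targets)
                loss = loss + consolidation.penalty(reg_strength)
                loss.backward()
                optimizer.step()
                consolidation.accumulate()            # update path-integral importance
        consolidation.consolidate()                   # freeze anchors at the task boundary
    return model
```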

***

## Credits

This project builds upon and extends the original [Baby Dragon Hatchling repository by Pathway](https://github.com/pathwaycom/bdh).

The original authors' foundational work on biologically inspired neural architectures underpins this extension.

*Note:* parts of this fork were written with AI assistance ("vibe coding"), and Python is not the author's primary language.

***

## References

- F. Zenke, B. Poole, S. Ganguli. *Continual Learning Through Synaptic Intelligence*, ICML 2017.
- A. Kosowski, P. Uznański, J. Chorowski, Z. Stamirowska, M. Bartoszkiewicz. [*The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain*](https://doi.org/10.48550/arXiv.2509.26507), arXiv, 2025.

***

## Summary

BDH-CL introduces practical, biologically inspired continual learning into the BDH architecture, enabling lifelong learning over task sequences rather than the single-task training of the original. It combines ideas from neuroscience with modern machine-learning regularization techniques in a next-generation language model architecture.

***