"Arabic is not a sequence of letters; it is a mathematical matrix of meaning."
ARAI is a Research Repository dedicated to exploring the hypothesis that Classical Arabic can be treated as a computational system. This project investigates Morphological Algebra, a method of decomposing and synthesizing Semitic languages into their constituent structural DNA using Semitic Root Encoding (SRE).
Warning
Status: Experimental Research. This is not a production-ready model. The current implementations are exploratory and designed to validate the theoretical framework of SRE-based architectures.
Standard Large Language Models (LLMs) treat Arabic as a sequence of opaque tokens (BPE), identical to how they treat English or French. However, Classical Arabic is inherently nonlinear. It operates on a multi-dimensional grid where semantic intent and grammatical function are distinct, orthogonal layers.
ARAI investigates treating Arabic as "Code" rather than "Text."
In the SRE framework, we move away from 1D tokenization. Instead, every "word" is viewed as a result of a mathematical operation:
- The Root (The Semantic Core): Usually a three-letter consonant cluster (e.g., ك.ت.ب, K-T-B). This is the "constant" that carries the abstract concept of Writing.
- The Pattern (The Functional Template): A specific template (e.g., م123ة, Ma123a) that carries the concept of Space/Location.
When you apply the Pattern of Location to the Root of Writing, the language "calculates" the word Maktaba (Library/Office). This project explores whether a Transformer architecture can learn this "Morphological Algebra" directly, allowing it to generalize to new root-pattern combinations that it has never seen before.
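The root-times-pattern "calculation" described above can be sketched in a few lines. This is an illustrative toy, not the repository's actual API; the function name, the transliterated root `ktb`, and the slot notation `ma12a3a` (digits mark where the root's radicals are inserted) are assumptions for the example.

```python
# Toy sketch of "Morphological Algebra": interleave a root's radicals
# into a pattern template. Illustrative only, not ARAI's real API.

def apply_pattern(root: str, pattern: str) -> str:
    """Substitute numbered slots 1..3 in `pattern` with the root's
    consonants, e.g. root 'ktb' + pattern 'ma12a3a' -> 'maktaba'."""
    out = []
    for ch in pattern:
        if ch.isdigit():                 # slot: pull the i-th radical
            out.append(root[int(ch) - 1])
        else:                            # fixed template material
            out.append(ch)
    return "".join(out)

# Root K-T-B ("writing") + pattern of location -> Maktaba (library)
print(apply_pattern("ktb", "ma12a3a"))  # -> maktaba
# The same pattern generalizes to an unseen root, D-R-S ("studying"):
print(apply_pattern("drs", "ma12a3a"))  # -> madrasa (school)
```

The second call is the point of the hypothesis: once the pattern is learned as an operator, it composes with roots the model has never seen paired with it.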
Because the language is so highly structured, it mirrors formal logic. A single root can generate hundreds of words across different patterns, yet the semantic "DNA" of the root remains constant. Standard LLMs must learn these relationships statistically; ARAI explores encoding them architecturally.
SRE is our experimental approach to sparse embeddings. By feeding the model two distinct input streams, one for the Root and one for the Pattern, we allow the attention mechanism to track semantic flow and grammatical consistency on separate but synchronized channels.
This repository contains the following experimental modules used to audit the SRE hypothesis:
- Experimental SRE Transformer: A dual-input architecture designed to investigate the structural interaction between semantic and functional embeddings.
- Morphological Algebra Benchmarks: Vector-space tools for testing semantic analogies, such as `Root(Justice) + (Pattern(Agent) - Pattern(Abstract))`, to see if the resulting vector clusters near the expected morphological state.
- Corpus Ingestion Pipeline: Tools for extracting morphological primitives from classical linguistic datasets, preserving the register and precision of the classical language.
- Edge Feasibility (WIP): A prototype TensorFlow.js implementation exploring whether morphological logic can enable "Titan-class" reasoning on lightweight hardware by reducing dependency on massive parameter counts.
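The analogy benchmark above reduces to vector arithmetic plus a similarity check. Here is a hedged sketch with synthetic stand-in vectors (the variable names mirror the `Root(Justice)`/`Pattern(Agent)` example; none of this is the repository's benchmark code):

```python
# Toy version of a Morphological Algebra analogy test: in an ideal
# SRE space, word vectors compose additively from root + pattern,
# so swapping the pattern component should land near the target word.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(42)
root_justice = rng.normal(size=8)       # semantic core
pattern_agent = rng.normal(size=8)      # "doer" template
pattern_abstract = rng.normal(size=8)   # abstract-noun template

word_agent_of_justice = root_justice + pattern_agent     # "the just one"
word_abstract_justice = root_justice + pattern_abstract  # "justice"

# Analogy: start from the abstract word and swap the pattern.
predicted = word_abstract_justice - pattern_abstract + pattern_agent
print(cosine(predicted, word_agent_of_justice))  # -> ~1.0 by construction
```

With learned embeddings the similarity will be well below 1.0; the benchmark's question is how close a trained SRE model gets to this additive ideal.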
This project is currently in the Discovery Phase. Our objective is to audit the efficiency of SRE as a data structure and to understand the limitations of current Transformer architectures in capturing nonlinear morphological relationships.
```
├── src/
│   ├── python/arai/         # Core Research Engine (PyTorch)
│   └── javascript/          # Experimental Edge Implementation
├── scripts/
│   ├── training/            # Exploratory training pipelines
│   ├── evaluation/          # Logical audits and SRE benchmarks
│   └── preprocessing/       # Morphological extraction tools
├── research/                # Historical logs and experimental audits
└── docs/                    # Technical specifications and research notes
```

Note: This repository requires an environment capable of running PyTorch and Camel-Tools.
```bash
# Install research dependencies
pip install -e .
npm install

# Run the SRE ingestion pipeline
python3 scripts/preprocessing/preprocess.py
```

Advanced Agentic Coding Project | Exploring the frontiers of Morphological AGI.