adshaa/arai

🧬 ARAI (Arabic Root-and-Pattern AI)

"Arabic is not a sequence of letters; it is a mathematical matrix of meaning."

ARAI is a research repository exploring the hypothesis that Classical Arabic can be treated as a computational system. This project investigates Morphological Algebra: a method of decomposing words in Semitic languages into their structural "DNA" (roots and patterns) and synthesizing words from those parts, using Semitic Root Encoding (SRE).

Warning

Status: Experimental Research. This is not a production-ready model. The current implementations are exploratory and designed to validate the theoretical framework of SRE-based architectures.


🔬 The Research Hypothesis: Arabic as a Formal System

Standard Large Language Models (LLMs) split Arabic into opaque subword tokens (e.g. BPE), exactly as they do English or French. Classical Arabic morphology, however, is inherently nonlinear (nonconcatenative): a word is built by interleaving a root with a pattern, not by stringing morphemes end to end. Semantic intent and grammatical function therefore sit on distinct, largely orthogonal layers.

ARAI investigates treating Arabic as "Code" rather than "Text."

1. The Morphological Matrix (Root x Pattern)

In the SRE framework, we move away from 1D tokenization. Instead, every "word" is viewed as a result of a mathematical operation:

$$Result = Root \oplus Pattern$$

  • The Root (The Semantic Core): Usually a sequence of three consonants (e.g., ك.ت.ب - K-T-B). This is the "constant" that carries the abstract concept of Writing.
  • The Pattern (The Functional Template): A specific template (e.g., م123َ - Ma123a) that carries the concept of Space/Location.

When you apply the Pattern of Location to the Root of Writing, the language "calculates" the word maktaba (library). This project explores whether a Transformer architecture can learn this "Morphological Algebra" directly, allowing it to generalize to root-pattern combinations it has never seen.
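As a rough illustration of the Root ⊕ Pattern operation, the sketch below fills the numbered slots of a template with the consonants of a root. It works on Latin transliteration, and the function name `apply_pattern` and the template spelling `ma12a3a` are assumptions made for this example: the template writes out the vowel between slots 2 and 3 that the shorthand Ma123a leaves implicit. It is not code from this repository.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill slots 1, 2, 3 of `pattern` with the consonants of `root`.

    `root` is a hyphen-separated triliteral root such as "k-t-b".
    `pattern` is a transliterated template such as "ma12a3a".
    Both notations are illustrative assumptions.
    """
    consonants = root.split("-")
    if len(consonants) != 3:
        raise ValueError(f"expected a triliteral root, got {root!r}")

    out = []
    for ch in pattern:
        if ch in "123":
            # A digit marks a slot: substitute the matching root consonant.
            out.append(consonants[int(ch) - 1])
        else:
            # Everything else (vowels, affixes) belongs to the pattern itself.
            out.append(ch)
    return "".join(out)


print(apply_pattern("k-t-b", "ma12a3a"))  # maktaba
print(apply_pattern("d-r-s", "ma12a3a"))  # madrasa
```

The same place-noun template applied to a different root (d-r-s, studying) gives madrasa (school), which is the kind of unseen root-pattern combination the hypothesis is about.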

2. Syllogistic Semantics

Because derivation is so regular, the language behaves much like a formal system. A single root can generate hundreds of words across different patterns, yet the semantic "DNA" of the root remains constant. Standard LLMs must learn these relationships statistically; ARAI explores encoding them architecturally.
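The reverse direction, decomposition, can be sketched the same way: given a word and its known root, replace the root consonants with slot numbers to recover the pattern. The helper `extract_pattern` is hypothetical, works on single-character transliterated consonants only, and uses a greedy left-to-right match that a real morphological analyzer would not rely on.

```python
def extract_pattern(word: str, root: str) -> str:
    """Replace the consonants of `root`, in order, with slot numbers 1..n."""
    consonants = root.split("-")
    out = []
    next_slot = 0  # index of the next root consonant we expect to find
    for ch in word:
        if next_slot < len(consonants) and ch == consonants[next_slot]:
            next_slot += 1
            out.append(str(next_slot))
        else:
            out.append(ch)
    if next_slot != len(consonants):
        raise ValueError(f"{word!r} does not contain root {root!r}")
    return "".join(out)


# A small family derived from the root k-t-b (writing):
# he wrote, writer, written, book, library.
family = ["kataba", "kaatib", "maktuub", "kitaab", "maktaba"]
patterns = {word: extract_pattern(word, "k-t-b") for word in family}
for word, pattern in patterns.items():
    print(f"{word:8} = k-t-b + {pattern}")
```

Every word in the family shares the root and differs only in the pattern, and a pattern such as `1aa2i3` (agent) recurs across roots, e.g. `extract_pattern("raakib", "r-k-b")` for "rider".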

3. SRE: Semitic Root Encoding

SRE is our experimental approach to sparse embeddings. By feeding the model two distinct input streams (one for the Root and one for the Pattern), we allow the attention mechanism to track semantic flow and grammatical consistency on separate but synchronized channels.
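A minimal sketch of what "two synchronized input streams" could look like at the data level: each word is mapped to a (root ID, pattern ID) pair, producing two ID sequences of equal length. The toy lexicon, the names `LEXICON`, `build_vocab` and `encode`, and the ID assignment are all assumptions for illustration. In a model, each stream would typically index its own embedding table; how ARAI combines the two is not specified here.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    root: str
    pattern: str


# Hypothetical toy lexicon in Latin transliteration.
LEXICON = {
    "kataba": Morpheme("k-t-b", "1a2a3a"),  # he wrote
    "kaatib": Morpheme("k-t-b", "1aa2i3"),  # writer
    "kitaab": Morpheme("k-t-b", "1i2aa3"),  # book
    "darasa": Morpheme("d-r-s", "1a2a3a"),  # he studied
}


def build_vocab(items):
    """Assign consecutive integer IDs in order of first appearance."""
    vocab = {}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


ROOT_IDS = build_vocab(m.root for m in LEXICON.values())
PATTERN_IDS = build_vocab(m.pattern for m in LEXICON.values())


def encode(words):
    """Return two aligned ID streams: one for roots, one for patterns."""
    root_stream = [ROOT_IDS[LEXICON[w].root] for w in words]
    pattern_stream = [PATTERN_IDS[LEXICON[w].pattern] for w in words]
    return root_stream, pattern_stream


roots, patterns = encode(["kataba", "kaatib", "darasa"])
print(roots)     # [0, 0, 1]
print(patterns)  # [0, 1, 0]
```

Position i in both streams refers to the same word, so "kataba" and "kaatib" share a root ID while "kataba" and "darasa" share a pattern ID.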


🛠️ Current Explorations

This repository contains the following experimental modules used to audit the SRE hypothesis:

  • Experimental SRE Transformer: A dual-input architecture designed to investigate the structural interaction between semantic and functional embeddings.
  • Morphological Algebra Benchmarks: Vector-space tools for testing semantic analogies, for example computing Root(Justice) + (Pattern(Agent) - Pattern(Abstract)) and checking whether the resulting vector lands near the embedding of the expected form.
  • Corpus Ingestion Pipeline: Tools for extracting morphological primitives from classical linguistic datasets, preserving the register and precision of the classical language.
  • Edge Feasibility (WIP): A prototype TensorFlow.js implementation exploring whether morphological logic can enable "Titan-class" reasoning on lightweight hardware by reducing dependency on massive parameter counts.
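The analogy test from the benchmarks bullet can be sketched with toy vectors. The vectors below are hand-made for illustration, not trained embeddings, and the example assumes a purely additive model (word = root + pattern). It also reads the Root(Justice) term as the embedding of the abstract-noun form of that root, which is an interpretation rather than something the benchmark description states. Under these assumptions the analogy holds by construction; the research question is how closely a trained model approximates it.

```python
import math

# Hand-made toy vectors (illustrative assumption, not trained embeddings).
ROOTS = {"justice": [1.0, 0.0, 0.0, 0.0], "writing": [0.0, 1.0, 0.0, 0.0]}
PATTERNS = {"agent": [0.0, 0.0, 1.0, 0.0], "abstract": [0.0, 0.0, 0.0, 1.0]}


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


# Additive composition: every word vector is root vector + pattern vector.
WORDS = {
    (root, pattern): add(root_vec, pattern_vec)
    for root, root_vec in ROOTS.items()
    for pattern, pattern_vec in PATTERNS.items()
}

# Start from justice/abstract, then swap the pattern: + agent - abstract.
query = add(
    WORDS[("justice", "abstract")],
    sub(PATTERNS["agent"], PATTERNS["abstract"]),
)

# The nearest word by cosine similarity should be justice/agent.
nearest = max(WORDS, key=lambda key: cosine(query, WORDS[key]))
print(nearest)  # ('justice', 'agent')
```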

🏗️ Repository Roadmap

This project is currently in the Discovery Phase. Our objective is to audit the efficiency of SRE as a data structure and to understand the limitations of current Transformer architectures in capturing nonlinear morphological relationships.

├── src/
│   ├── python/arai/     # Core Research Engine (PyTorch)
│   └── javascript/      # Experimental Edge Implementation
├── scripts/
│   ├── training/        # Exploratory training pipelines
│   ├── evaluation/      # Logical audits and SRE benchmarks
│   └── preprocessing/   # Morphological extraction tools
├── research/            # Historical logs and experimental audits
└── docs/                # Technical specifications and research notes

🧪 Getting Started

Note: This repository requires an environment capable of running PyTorch and CAMeL Tools.

# Install research dependencies
pip install -e .
npm install

# Run the SRE ingestion pipeline
python3 scripts/preprocessing/preprocess.py

Advanced Agentic Coding Project | Exploring the frontiers of Morphological AGI.
