Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Survey: https://arxiv.org/pdf/2507.20198
A High-Efficiency System of Large Language Model Based Search Agents
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Official PyTorch implementation of the paper "Dataset Distillation via the Wasserstein Metric" (ICCV 2025).
TinyML and Efficient Deep Learning Computing | MIT 6.S965/6.5940
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.
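A minimal sketch of the general idea (not DAM's actual code; the window sizes and function names below are illustrative): build a per-head sparse mask and apply it when scoring attention, so each query only attends to a subset of keys.

```python
import torch

def local_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len x seq_len) mask: True where attention is allowed.
    A fixed causal window stands in for an adaptive, learned mask."""
    idx = torch.arange(seq_len)
    dist = idx[:, None] - idx[None, :]          # query index minus key index
    return (dist >= 0) & (dist < window)        # causal + local window

def masked_attention(q, k, v, mask):
    # q, k, v: (batch, heads, seq, dim); mask broadcasts over batch and heads
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Give each head its own window size as a crude proxy for per-head adaptivity.
B, H, L, D = 1, 4, 1024, 64
q = k = v = torch.randn(B, H, L, D)
masks = torch.stack([local_window_mask(L, w) for w in (64, 128, 256, 512)])  # (H, L, L)
out = masked_attention(q, k, v, masks)  # (1, 4, 1024, 64)
```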
Official PyTorch implementation of the paper "Towards Adversarially Robust Dataset Distillation by Curvature Regularization" (AAAI 2025).
A deep learning framework that implements Early Exit strategies in Convolutional Neural Networks (CNNs) using Deep Q-Learning (DQN). This project enhances computational efficiency by dynamically determining the optimal exit point in a neural network for image classification tasks on CIFAR-10.
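A minimal sketch of the early-exit pattern under simplified assumptions: auxiliary classifier heads are attached to intermediate CNN stages, and a plain confidence threshold stands in for the DQN exit policy described above; all module names are illustrative.

```python
import torch
import torch.nn as nn

class EarlyExitCNN(nn.Module):
    """Two-stage CNN with an auxiliary classifier after the first stage."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.exit1 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.exit2 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))

    def forward(self, x, threshold: float = 0.9):
        h = self.stage1(x)
        logits1 = self.exit1(h)
        # Take the early exit if the intermediate head is confident enough
        # (the project above instead lets a DQN agent make this decision).
        if torch.softmax(logits1, dim=-1).max() > threshold:
            return logits1, "exit1"
        return self.exit2(self.stage2(h)), "exit2"

model = EarlyExitCNN().eval()
with torch.no_grad():
    logits, exit_used = model(torch.randn(1, 3, 32, 32))  # CIFAR-10-sized input
```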
Ground-Truthing AI Energy Consumption: Validating CodeCarbon Against External Measurements
MOCA-Net: a novel neural architecture with sparse MoE, external memory, and budget-aware computation. Integrates the Stanford SST-2 dataset, runs at O(L) complexity, and reaches 96.40% accuracy. Built for efficient sequence modeling.
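A minimal sketch of the sparse-MoE ingredient only (illustrative code, not MOCA-Net's implementation): a top-k router that evaluates a fixed budget of experts per token.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Sparse mixture of experts: each token is routed to k of E experts."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k  # compute budget: experts evaluated per token

    def forward(self, x):                              # x: (tokens, dim)
        logits = self.router(x)                        # (tokens, E)
        weights, idx = logits.topk(self.k, dim=-1)     # keep only the top-k experts
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE(dim=64)
y = moe(torch.randn(16, 64))  # only 2 of 8 experts run per token
```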
An open and practical guide to Edge Language Models
"TRM (Tiny Recursive Model) integration architecture for Symbion.space ecosystem"
In this repo you will understand the process of reducing the precision of a model's parameters and/or activations (e.g., from 32-bit floating point to 8-bit integers) to make neural networks smaller, faster, and more energy-efficient with minimal accuracy loss.
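A minimal example of the idea using PyTorch's built-in dynamic post-training quantization (the toy model here is illustrative, not taken from the repo): Linear weights are stored as 8-bit integers and dequantized on the fly at inference time.

```python
import torch
import torch.nn as nn

# Toy FP32 model; quantization shrinks the Linear weights from 32-bit floats
# to 8-bit integers while keeping the same forward interface.
model_fp32 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(model_fp32(x).shape, model_int8(x).shape)  # same outputs shape, smaller weights
```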
A production-grade GPT transformer implemented from scratch in C++. Runs on modest hardware, and ships with complete mathematical derivations and optimized tensor operations.
QuantLab-8bit is a reproducible benchmark of 8-bit quantization on compact vision backbones. It includes FP32 baselines, PTQ (dynamic & static), QAT, ONNX exports, parity checks, ORT CPU latency, and visual diagnostics.
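As a rough sketch of the latency-measurement step such a benchmark performs (the model path below is a hypothetical placeholder; any classification ONNX export works), ONNX Runtime's CPU provider can be timed directly:

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical exported model path; substitute your own ONNX file.
sess = ort.InferenceSession("model_int8.onnx", providers=["CPUExecutionProvider"])
name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up, then time repeated runs for a rough CPU latency figure.
for _ in range(10):
    sess.run(None, {name: x})
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {name: x})
print(f"mean latency: {(time.perf_counter() - t0) / 100 * 1e3:.2f} ms")
```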
Task-Aware Dynamic Model Optimization for Multi-Task Learning (IEEE Access 2023)
🔬 Curiosity-Driven Quantized Mixture of Experts
Code for paper "Automated Design for Hardware-aware Graph Neural Networks on Edge Devices"