👩‍🎓 DL Searcher | 🔧 ML Engineering | 🤖 Agent Builder…etc. Welcome, everyone 🤗
SuleynanAuir/README.md

👋 Hi, this is Aiur Suleynan

🧑‍💻 AI Engineer • Agents / Computer Vision / NLP / Multimodal / Deep Search
Building grounded, explainable, and product-ready AI systems together.

GitHub stars

Repositories Featured Cards LinkedIn

中文 English

📚 TOC

Table of Contents (Chinese)


⭐ Featured Project Cards

🎯 Targeting
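Addresses weaknesses of MiroFish public-opinion analysis: correct-but-empty, officialese answers and one-size-fits-all template conclusions + stale content + high token usage and long run time + difficulty analyzing in depth around what the user actually cares about (repeated, ineffective reasoning during simulation)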

✨ Highlight
Attn Anchors + Data Augmentation + MultiSource High-Fidelity Retrieval + GraphRAG-Enhanced Digital Cognitive Twin World(Canyon)

🔧 Stack

🏆 Outcome
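From "generic answers" → "user-specific cognitive inference": prior-anchor focusing and multi-agent collaboration substantially reduce ineffective, repeated reasoning and improve evidence consistency and trend explainability; inference is faster and cheaper (by the project's own estimate, overall cost drops by roughly 40%~60% and analysis time shrinks to about 1/3); supports high-fidelity decision simulation from policy, market, public-opinion, and research perspectives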

🎯 Targeting
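Low OCR accuracy on low-quality scans and photographed documents such as enterprise tax receipts and contracts + the resulting impact on downstream automated form processing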

✨ Highlight
Prior-Enhanced Attention OCR for Structure Parsing + Structure-SR + Image2Structure Pipeline

🔧 Stack

🏆 Outcome
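Improves document-image content extraction and structured parsing: optimized text and structure super-resolution yields higher-accuracy document understanding and structured content extraction (average improvement of more than 5%⭡ on the quantitative metrics of the official TextZoom dataset)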

🎯 Targeting
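Addresses the "memory bias accumulation over long sequences" problem of the original SAM2 (Meta, 2024) framework + lightweight deployment that is friendly to user laptop hardware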

✨ Highlight
Prompt-Enhanced + Conditional Filtering Memory Block + HSA + Optical-Flow Guide

🔧 Stack

🏆 Outcome
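Stronger spatiotemporal and semantic consistency in video: every quantitative metric improves by more than 6%⭡ over the original SAM-based architecture + on a test user's laptop (RTX 4070 Laptop), video semantic restoration with the best parameter configuration runs at 5.526 s/vsec (30fps), 12.839 s/vsec (60fps)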

Browse All Repositories More Projects (Chinese) More Projects (EN)


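🇨🇳 Chinese Version

Core Directions Featured Projects Activities

🎯 Core Directions

I focus on engineering Machine Learning / Deep Learning capabilities into deployed systems. Core directions:

  • 🤖 Agents: deep search, uncertainty awareness, reflective iteration, Agent Collaboration, mitigating conflicts and hallucinations in RAG-style retrieval augmentation
  • 💻 Computer Vision: video restoration and video understanding, super-resolution, OCR, medical image analysis
  • 📝 NLP: text understanding and task-oriented generation, implicit-malice detection in social-media discourse, sentiment analysis
  • 🎨 Multimodal: joint vision-language modeling and cross-modal reasoning, modeling image-text feature relationships
  • 🔍 Deep Search algorithms: paragraph-level retrieval, reranking, reasoning chains, and evidence alignment
  • 🏗️ 3D Reconstruction: ongoing research on real-world 3D reconstruction and digital twin applications

I care not only about benchmark performance but also about reproducibility, explainability, and fit for production environments.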

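🧩 Featured Projects (Direct Links)

🤖 UPAIRS-Agents | Uncertainty-Aware Paragraph-Level Iterative Reflective Deep Search | Optimization of Bettafish (微舆)

Version 2.0 | Deep Search + Agents | Feb 2026

Core Innovations

  • Pressure-driven paragraph-level iterative reflective search architecture with a Dynamic Reflection Stress mechanism
  • 5-Factor Uncertainty Quantification model (5-Factor UQ Model) that adaptively allocates retrieval budget and reasoning depth
  • Collaboration workflow of 8 specialized role Agents, optimizing search decomposition, parsing, and coordination
  • Diversity-aware denoising reranking + NDCG/MRR ranking metrics to evaluate the quality of query sources
  • Asynchronous high-concurrency execution: asyncio + aiohttp for concurrent retrieval and scheduling across sections

Outcomes

  • ✨ Addresses content hallucination (algorithmically quantified certainty: 88%⭡)
  • 💴 Token cost optimization (about ¥0.1 per deep search)
  • 🎯 Stronger reasoning stability, critical rigor, and evidence consistency

Tech Stack: Python | LLM Orchestration | asyncio | RAG | Uncertainty Quantification | MAS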


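🧾 P-ADONIS | Prior-Enhanced Attention Doc-OCR Network for Image2Structure | Internship engineering project

Version 1.0 | OCR + Super-Resolution + Document Structuring | Oct 2025

Problem: Low OCR accuracy on low-quality scans and photographed documents such as tax receipts and contracts

Core Innovations

  • Prior-Enhanced Attention module incorporating a visual + semantic prior weighting mechanism
  • End-to-end automated deep learning pipeline: text-prior super-resolution → text detection → text recognition → document structuring
  • Recognizes and organizes the structure of paragraphs, characters, symbols, and field labels in scanned images, and outputs structured JSON

Outcomes

  • 📈 Average improvement of 5%⭡ on the quantitative metrics of the official TextZoom dataset (PSNR, SSIM, Lrec, L1, CTCL)
  • 🔝 Improved robustness: markedly better recognition of blurred edges and complex backgrounds
  • 🚀 Downstream automation support: enterprise form processing, automatic document summarization, etc.

Tech Stack: Python | PyTorch | Super Resolution | OCR | Computer Vision | Prior-Enhanced Attention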

📄 Report Paper


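🎬 PEANUT | Multimodal End2End Framework for Semantic Image2Video Understanding & Restoration | Personal research practice project, 2025

Version 3.0 | Video Restoration + Deep Learning | 2025

Problem: Memory bias accumulation over long sequences in the SAM2 (Meta, 2024) framework; need for lightweight deployment

Core Innovations

  • Conditional Memory Encoder + Hierarchical Selective Attention (HSA) to support multi-level semantic understanding
  • End-to-end trainable, semantics-driven, optical-flow-guided video restoration framework that improves on E2FGVI (CVPR 2022)
  • NOF-Eraser module: uses only lightweight CNN feature extraction + Deformable Convolution + optical-flow-assisted feature propagation
  • Mask-Agent module: parses text semantics to identify the user's regions of interest automatically and generate high-quality masks

Outcomes

  • 📊 Quantitative metrics (PSNR, SSIM, Sharpness, artifacts) all improve by 6%⭡ over the original SAM-based architecture
  • ⚡ Lightweight deployment: 5.526 s/vsec (30fps), 12.839 s/vsec (60fps) on an RTX 4070 Laptop
  • 🎨 Personalized effects: AI-driven mask generation improves the user interaction experience

Tech Stack: Python | PyTorch | Video Restoration | Optical Flow | Attention | Deformable Conv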

📄 Report Paper


👁️ Hateful Memes Detection & Innovation | Facebook AI × NeurIPS 2020 Track

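Team Competition Track | Multimodal Semantic Understanding + Implicit Malice Detection | Sep 2025 to present

Problem: Image-text semantic inconsistency, implicit references, and ironic hate expression on social media

Core Innovations

  • CLIP + LLMs joint image-text framework, with PE optimizing LLM outputs
  • Embeds cross-validated results from multiple LLMs as features to strengthen social-prior modeling (cultural background, ironic internet memes)
  • Cross-Modal Inference Graph: introduces an attention matrix to model image-text feature relationships explicitly
  • GAT encodes the cross-modal graph structure, enabling multi-node semantic reasoning

Outcomes

  • 🎯 Multimodal semantic enhancement and implicit-malice detection system
  • 📈 Better high-order semantic association, mitigating the information loss of conventional concatenation-based fusion
  • 🔗 Explicit modeling of entity-semantic-emotion inference paths

Tech Stack: Python | CLIP | LLMs | Multimodal Fusion | GAT | Graph Neural Networks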


🧠 Clinical Brain Tumor Detection | Brain Tumour Research Competition (Canada, 2023)

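Project | Machine Learning + Medical Imaging | 2024

Problem: Atypical, small-sample data in multi-class MRI brain tumor classification; low sensitivity to fine-grained lesion differences

Core Innovations

  • Systematic deployment of several classic ML algorithms: SVM, XGBoost, Random Forest, KNN, LR, MLP
  • Feature engineering + a combination strategy over traditional models to use small-sample data more efficiently
  • HOG + statistical texture features to extract highly discriminative features and reduce dependence on large-scale data
  • Data augmentation, feature normalization, and dimensionality reduction to mitigate noise and blur in medical images

Outcomes

  • 📊 Performance metrics: Acc=95.38%, F1=0.9553, Prec=96.07%, Rec=0.9551
  • 💪 Small-sample learning: strong recognition with a dataset of only 1,200 samples
  • 🏥 Clinical applicability: high-accuracy recognition of rare disease presentations

Tech Stack: Python | XGBoost | Random Forest | Feature Engineering | Medical Imaging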

📄 Analysis Paper



UPAIRS-Agents P-ADONIS PEANUT Clinical-BT E2FGVI-PLUS

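🏅 Activities

  • DataFun Conference 2025 (Beijing + Shenzhen) | Attendee | Jul & Nov 2025
    Attended the DACon conference hosted by DataFunTalk, with a systematic focus on Data + AI, large-model technology, data intelligence, and industrial deployment. Concentrated on large-model applications, search and recommendation systems, NLP, AI Agents, advertising algorithms, and data architecture, and learned about enterprise solutions such as RAG, LLM applications, and ChatBI / Data Agent.

  • NVIDIA Technology Conference AI Summit (Taiwan) | Attendee | Jun 2024
    Invited to take part in NVIDIA GTC-related AI technical forums and exchanges, focusing on AI applications in real-world scenarios such as medical image analysis, 3D reconstruction, and digital twins; continuing to track VLM, GenAI, Robotics, and Physical AI.

  • Wanxi Xiaobing (Zhongshan) Technology Co., Ltd. | Zhuhai Project Manager | 2024
    Responsible for promoting and optimizing Xiaobing digital-human application solutions in the Zhuhai area and driving AI product adoption; proposed customized C2C solutions based on client needs to improve fit and satisfaction.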


🇺🇸 English Version

Core Directions Featured Projects Activities

🎯 Core Directions

I focus on turning Machine Learning / Deep Learning into deployable systems, with strong interests in:

  • Agents: uncertainty-aware planning, iterative reflection, tool orchestration
  • Computer Vision: OCR, medical imaging, video restoration and understanding
  • NLP: BERT-based modeling, language understanding and generation
  • Multimodal: vision-language modeling and cross-modal reasoning
  • Deep Search Algorithms: paragraph-level retrieval, reranking, evidence-aligned reasoning

I care not only about benchmark scores, but also reproducibility, explainability, and production readiness.

🧩 Featured Projects (Direct Links)

🤖 UPAIRS-Agents | Uncertainty-Aware Paragraph-Level Iterative Reflective Deep Search

Version 2.0 | Agentic AI + Deep Search | Feb 2026

Core Innovations:

  • Pressure-Driven Paragraph-Level Iterative Reflection Search Architecture with Dynamic Reflection Stress mechanism
  • 5-Factor Uncertainty Quantification (5-Factor UQ Model) for adaptive retrieval budget & reasoning depth allocation
  • 8 specialized role-based Agent collaboration workflows optimizing search decomposition & coordination
  • Diversity-aware denoising reranking + NDCG/MRR metrics for source quality assessment
  • Async high-concurrency execution: asyncio + aiohttp for cross-section parallel retrieval
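
The NDCG/MRR metrics named above have standard definitions, sketched below in plain Python. This is a minimal illustration, not the project's evaluation code: the function names, the linear-gain form of DCG, and the sample relevance lists are assumptions made for the example.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain (linear gain) for graded relevances in rank order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr(rankings):
    """Mean reciprocal rank; each ranking is a list of 0/1 relevance flags."""
    total = 0.0
    for flags in rankings:
        for rank, hit in enumerate(flags, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(rankings) if rankings else 0.0


# Hypothetical relevance judgments for retrieved sources.
print(ndcg([3, 2, 1]))                      # already in ideal order
print(ndcg([1, 2, 3]))                      # worst order of the same items
print(mrr([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
```

A reranker that moves the most relevant sources to the top raises both scores, which is why the two metrics are a common pair for judging source quality.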

Outcomes:

  • ✨ Addresses content hallucination (algorithmically quantified certainty: 88%⭡)
  • 💴 Token cost optimization (~¥0.1 per deep search query)
  • 🎯 Stronger reasoning stability, critical rigor, and evidence consistency

Tech Stack: Python | LLM Orchestration | asyncio | RAG | Uncertainty Quantification | MAS


🧾 P-ADONIS | Prior-Enhanced Attention Doc-OCR Network for Image2Structure

Version 1.0 | OCR + Super-Resolution + Document Structuring | Internship Engineering | Oct 2025

Problem: Low OCR accuracy on low-quality scans and photographed documents such as enterprise tax receipts and contracts

Core Innovations:

  • Prior-Enhanced Attention Module incorporating visual + semantic prior weighting
  • End-to-end automatic DL pipeline: text super-resolution → text detection → recognition → structure output
  • Structure-aware parsing for paragraphs, characters, symbols, field labels; JSON-formatted structured output
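
To make the "JSON-formatted structured output" concrete, here is a small sketch of what such a record could look like. The schema is hypothetical: the class names, the field names (`label`, `text`, `bbox`, `confidence`), and the sample values are assumptions for illustration and are not taken from the P-ADONIS repository.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FieldEntry:
    label: str         # hypothetical field label, e.g. "invoice_no"
    text: str          # recognized text for this field
    bbox: list         # [x0, y0, x1, y1] in pixels (assumed convention)
    confidence: float  # recognition confidence in [0, 1]


@dataclass
class StructuredDocument:
    source: str
    paragraphs: list = field(default_factory=list)
    fields: list = field(default_factory=list)


def to_json(doc: StructuredDocument) -> str:
    """Serialize the nested dataclasses to a JSON string."""
    return json.dumps(asdict(doc), ensure_ascii=False, indent=2)


doc = StructuredDocument(source="scan_001.png")
doc.paragraphs.append("Sample paragraph text")
doc.fields.append(FieldEntry("invoice_no", "A-1024", [40, 32, 210, 58], 0.97))
print(to_json(doc))
```

Keeping labels, text, and bounding boxes in one record is what lets downstream form processing consume the OCR result without re-parsing the image.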

Outcomes:

  • 📈 Average improvement of 5%⭡ on the quantitative metrics of the official TextZoom dataset (PSNR, SSIM, Lrec, L1, CTCL)
  • 🔝 Enhanced robustness: improved recognition on blurred edges & complex backgrounds
  • 🚀 Downstream automation support: enterprise form processing, automatic document summarization

Tech Stack: Python | PyTorch | Super Resolution | OCR | Computer Vision | Prior-Enhanced Attention

📄 Report Paper


🎬 PEANUT | Multimodal End2End Framework for Semantic Image2Video Understanding & Restoration

Version 3.0 | Video Restoration + Deep Learning | 2025

Problem: SAM2 (Meta, 2024) memory bias accumulation in long sequences; lightweight deployment requirements

Core Innovations:

  • Conditional Memory Encoder + Hierarchical Selective Attention (HSA) for multi-level semantic understanding
  • End-to-end trainable semantic-driven optical-flow-guided video restoration framework; improved upon E2FGVI (CVPR 2022)
  • NOF-Eraser module: lightweight CNN feature extraction + Deformable Convolution + optical-flow-assisted propagation
  • Mask-Agent module: text semantic parsing automatically identifies user focus areas & generates high-quality masks

Outcomes:

  • 📊 Metrics (PSNR, SSIM, Sharpness, Artifacts) improved by 6%⭡ over the SAM-based baseline
  • ⚡ Lightweight deployment: 5.526 s/vsec (30fps), 12.839 s/vsec (60fps) on RTX 4070 Laptop
  • 🎨 Personalized effects: AI-driven mask generation enhances user interaction
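
The throughput figures above can be converted into per-frame and per-clip times. The sketch below assumes that "s/vsec" means seconds of processing per second of video, which the README does not define explicitly; the helper names and the 10-second clip length are illustrative.

```python
def seconds_per_frame(sec_per_video_sec: float, fps: int) -> float:
    """Processing time per frame, given processing seconds per second of video."""
    return sec_per_video_sec / fps


def total_processing_seconds(clip_seconds: float, sec_per_video_sec: float) -> float:
    """Estimated processing time for a clip of the given duration."""
    return clip_seconds * sec_per_video_sec


# Reported figures from the README: (fps, s/vsec).
for fps, rate in [(30, 5.526), (60, 12.839)]:
    print(f"{fps} fps: {seconds_per_frame(rate, fps):.3f} s/frame, "
          f"10 s clip takes {total_processing_seconds(10, rate):.2f} s")
```

Under that reading, the per-frame cost is about 0.184 s at 30fps and about 0.214 s at 60fps, so doubling the frame rate slightly more than doubles the total processing time.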

Tech Stack: Python | PyTorch | Video Restoration | Optical Flow | Attention | Deformable Conv

📄 Report Paper


👁️ Hateful Memes Detection & Innovation | Facebook AI × NeurIPS 2020 Track

Team Competition | Multimodal Semantic Understanding + Implicit Hate Detection | Sep 2025 - Present

Problem: Social media image-text semantic mismatch, implicit references, ironic hate speech

Core Innovations:

  • CLIP + LLMs joint image-text framework with PE optimization of LLM outputs
  • Cross-verified results from multiple LLMs are embedded as features, strengthening social-prior modeling (cultural context, ironic internet memes)
  • Cross-Modal Inference Graph: an attention matrix explicitly models image-text feature relationships
  • GAT encoding of cross-modal graph structures enabling multi-node semantic reasoning
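
One way an attention matrix can define the edges of a cross-modal graph is sketched below. It is an illustrative assumption, not the project's implementation: the scaled dot-product scoring, the thresholding rule, and the toy feature vectors are chosen for the example, and the GAT encoder that would consume the edges is not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_feats, image_feats):
    """Row i holds the attention weights of text token i over image regions."""
    scale = math.sqrt(len(text_feats[0]))
    return [
        softmax([sum(t * v for t, v in zip(tok, reg)) / scale
                 for reg in image_feats])
        for tok in text_feats
    ]


def attention_edges(attn, threshold):
    """Keep (text_idx, image_idx, weight) pairs whose weight reaches the threshold."""
    return [(i, j, w)
            for i, row in enumerate(attn)
            for j, w in enumerate(row)
            if w >= threshold]


# Toy 2-d features: two text tokens and two image regions (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[4.0, 0.0], [0.0, 4.0]]
attn = cross_attention(text, image)
print(attention_edges(attn, threshold=0.5))
```

Each surviving pair becomes an edge between a text node and an image node, which gives a graph encoder an explicit structure to reason over instead of a flat concatenation of features.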

Outcomes:

  • 🎯 Multimodal semantic enhancement & implicit hate detection system
  • 📈 Better high-order semantic association, mitigating the information loss of conventional concatenation-based fusion
  • 🔗 Explicit entity-semantic-emotion inference path modeling

Tech Stack: Python | CLIP | LLMs | Multimodal Fusion | GAT | Graph Neural Networks


🧠 Clinical Brain Tumor Detection | Optimized ML Framework for MRI Diagnosis

Project | Machine Learning + Medical Imaging | 2024

Problem: Atypical, small-sample data in multi-class MRI brain tumor classification; low sensitivity to fine-grained lesion differences

Core Innovations:

  • Systematic deployment of classic ML algorithms: SVM, XGBoost, Random Forest, KNN, LR, MLP
  • Feature engineering + model combination strategy improving small-sample data efficiency
  • HOG + statistical texture features to extract highly discriminative features and reduce dependence on large-scale data
  • Data augmentation, feature normalization, and dimensionality reduction to mitigate noise and blur in medical images

Outcomes:

  • 📊 Performance metrics: Acc=95.38%, F1=0.9553, Prec=96.07%, Rec=0.9551
  • 💪 Small-sample learning: excellent recognition on just 1200 samples
  • 🏥 Clinical applicability: high-precision rare disease detection capability
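
For a multi-class task, accuracy, precision, recall, and F1 are usually derived from a confusion matrix and then averaged over classes. The sketch below shows macro averaging; the README does not state which averaging scheme produced the reported numbers, so the scheme, the function name, and the toy matrix are assumptions.

```python
def macro_metrics(confusion):
    """Macro-averaged metrics; rows are true classes, columns are predictions."""
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy two-class confusion matrix (hypothetical counts).
metrics = macro_metrics([[8, 2], [1, 9]])
print(metrics)
```

Under macro averaging, F1 is the mean of per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall. That would be consistent with the reported F1 of 0.9553 sitting slightly below the harmonic mean of the reported precision and recall (about 0.958), though the source does not confirm this.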

Tech Stack: Python | XGBoost | Random Forest | Feature Engineering | Medical Imaging

📄 Analysis Paper


📚 HuggingFace-TA-Material | BERT-Centered NLP Training Framework

Repository | NLP Foundation Framework | Teaching & Practice

Content:

  • Complete BERT pretraining & fine-tuning workflows
  • Hugging Face ecosystem applications
  • Modular NLP processing pipeline design
  • Task implementations: text classification, named entity recognition, etc.

Tech Stack: Python | PyTorch | BERT | Hugging Face | NLP


UPAIRS-Agents P-ADONIS PEANUT Clinical-BT E2FGVI-PLUS

🏅 Activities

  • DataFun Conference 2025 (Beijing + Shenzhen) | Attendee | Jul & Nov 2025
    Participated in DACon by DataFunTalk with a structured focus on Data + AI, large models, data intelligence, and industrial deployment. Followed key topics including LLM applications, search/recommendation systems, NLP, AI Agents, ad algorithms, and data architecture, with practical exposure to RAG, ChatBI, and Data Agent solutions.

  • NVIDIA Technology Conference AI Summit (Taiwan) | Attendee | Jun 2024
    Joined NVIDIA GTC-related AI technical sessions and exchanges, focusing on real-world AI applications in medical imaging, 3D reconstruction, and digital twins, while tracking VLM, GenAI, Robotics, and Physical AI trends.

  • Wanxi Xiaobing (Zhongshan) Technology Co., Ltd. | Zhuhai Project Manager | 2024
    Led regional promotion and optimization of Xiaobing digital-human solutions in Zhuhai, with hands-on experience in AI product adoption. Worked directly with clients to propose customized C2C solutions and improve product-fit satisfaction.


🛠️ Technical Strengths

  • Modeling: ML/DL pipelines, transformer-style methods, multimodal fusion
  • Agent Engineering: planning, reflection, retrieval augmentation, evaluation loops
  • CV + NLP Integration: OCR, classification, video processing, language reasoning
  • Implementation: Python, PyTorch ecosystem, experiment-driven iteration

🤝 Connect


Building AI systems that can search deeply, reason clearly, and deliver reliably.

Pinned

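  1. NEXUS-Navigating-Emergent-X-agent-Universe-Simulator-with-Unprecedented-Insight Public

    NEXUS (Networked Emergent X-agent Universe Simulator) is a multi-agent AI framework for analyzing complex information environments. It aims to understand and simulate the continuously evolving information flows of the real world in depth, through multi-agent collaboration and knowledge-driven reasoning. The system integrates domain-adapted fine-tuned models, customized deep-search Agents, user-attention-driven Agents, and the digital-twin simulation environment Canyon, building one that can continuously acquire, organize, and analyze…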

    Python 7 2

  2. PEANUT--Prompt-Enhanced-Ablation-with-Optical-Flow-Based-Neural-Unit Public

    PEANUT (Prompt-Enhanced Ablation with Optical Flow-Based Neural Unit) designed to enhance video restoration by combining spatial and temporal consistency with clarity optimization. The core innovat…

    Python 6 3

  3. UPAIRS-Agents Public

    UPARIS-DS (Uncertainty-Aware Paragraph-Level Iterative Reflective Deep Search Agents) is an uncertainty-aware paragraph-level iterative reflective framework that enhances deep search agents through…

    Python 5

  4. P-ADONIS Public

    P-ADONIS (Prior-enhanced Attention Document OCR Network for Image-to-Structure): This project integrates the Prior-Enhanced Attention text image super-resolution model (ours-PADNet) with an OCR pip…

    Python 6

  5. Hateful-Image-Project Public

    This project focuses on social media multimodal hate detection, addressing challenges such as image–text semantic inconsistency, implicit references, and sarcastic/ironic expressions. We develop a …

    Jupyter Notebook 6

  6. Clinical-Brain-Tumor-Detection-Optimized-ML-Frame4MRI-Diagnosis Public

    This project builds an optimized ML framework for brain tumor detection using only basic libraries (NumPy, Pandas, Matplotlib). It implements SVM, MLP, XGBoost, KNN, and logistic regression, along …

    5