🧑💻 AI Engineer • Agents / Computer Vision / NLP / Multimodal / Deep Search
Building grounded, explainable, and product-ready AI systems together.
🎯 Targeting ✨ Highlight 🏆 Outcome
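I focus on engineering Machine Learning / Deep Learning capabilities into deployable systems. Core directions:
- 🤖 Agents: deep search, uncertainty awareness, reflective iteration, Agent Collaboration, and mitigating conflicts and hallucinations in RAG-style retrieval augmentation
- 💻 Computer Vision: video restoration and video understanding, super-resolution, OCR, medical image analysis
- 📝 NLP: text understanding and task-oriented generation, implicit malice detection in social media discourse, sentiment analysis
- 🎨 Multimodal: joint vision-language modeling and cross-modal reasoning, modeling image-text feature relationships
- 🔍 Deep Search algorithms: paragraph-level retrieval, reranking, reasoning chains, and evidence alignment
- 🏗️ 3D Reconstruction: research on real-world 3D reconstruction and digital twin applications
I care not only about benchmark performance but also about reproducibility, explainability, and fitness for production environments.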
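🤖 UPAIRS-Agents | Uncertainty-Aware Paragraph-Level Iterative Reflective Deep Search | An optimization of Bettafish (微舆)
Version 2.0 | Deep Search + Agents | Feb 2026
Core Innovations:
- Pressure-Driven paragraph-level iterative reflective search architecture with a Dynamic Reflection Stress mechanism
- **5-Factor Uncertainty Quantification model (5-Factor UQ Model)** that adaptively allocates retrieval budget and reasoning depth
- Collaborative workflow of 8 specialized role-based Agents, optimizing search decomposition, parsing, and efficient coordination
- Diversity-aware denoising reranking, with NDCG/MRR ranking metrics to evaluate the quality of query sources
- Asynchronous high-concurrency execution: asyncio + aiohttp for concurrent cross-section retrieval and scheduling
Outcomes:
- ✨ Addresses content hallucination (algorithmically quantified certainty: 88%⭡)
- 💴 Token cost optimization (about ¥0.1 per deep search)
- 🎯 Stronger reasoning stability, critical evaluation, and evidence consistency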
技术栈:Python | LLM Orchestration | asyncio | RAG | Uncertainty Quantification | MAS
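Version 1.0 | OCR + Super-Resolution + Document Structuring | Oct 2025
Problem: Low OCR recognition accuracy on low-quality scans and photographed documents such as tax receipts and contracts
Core Innovations:
- Prior-Enhanced Attention module that integrates a visual + semantic prior weighting mechanism
- End-to-end automated deep learning pipeline: text-prior super-resolution → text detection → text recognition → document structuring
- Structural recognition and organization of paragraphs, characters, symbols, and field labels in scanned images, with structured JSON output
Outcomes:
- 📈 Average improvement of 5%⭡ on quantitative metrics on the official TextZoom dataset (PSNR, SSIM, Lrec, L1, CTCL)
- 🔝 Enhanced robustness: markedly better recognition on blurred edges and complex backgrounds
- 🚀 Downstream automation support: enterprise form processing, automatic document summarization, and similar tasks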
技术栈:Python | PyTorch | Super Resolution | OCR | Computer Vision | Prior-Enhanced Attention
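🎬 PEANUT | Multimodal End2End Framework for Semantic Image2Video Understanding & Restoration | Personal research project, 2025
Version 3.0 | Video Restoration + Deep Learning | 2025
Problem: Memory bias accumulation over long sequences in the SAM2 (Meta, 2024) framework; lightweight deployment requirements
Core Innovations:
- Conditional Memory Encoder + Hierarchical Selective Attention (HSA) for multi-level semantic understanding
- End-to-end trainable, semantics-driven, optical-flow-guided video restoration framework that improves on E2FGVI (CVPR 2022)
- NOF-Eraser module: uses only lightweight CNN feature extraction + Deformable Convolution + optical-flow-assisted feature propagation
- Mask-Agent module: parses text semantics to automatically identify the user's regions of interest and generate high-quality masks
Outcomes:
- 📊 PSNR, SSIM, sharpness, and artifact metrics all improved by 6%⭡ over the original SAM-based architecture
- ⚡ Lightweight deployment: 5.526 s/vsec (30fps) and 12.839 s/vsec (60fps) on an RTX 4070 Laptop
- 🎨 Personalized effects: AI-driven mask generation improves the user interaction experience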
技术栈:Python | PyTorch | Video Restoration | Optical Flow | Attention | Deformable Conv
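Team Competition Track | Multimodal Semantic Understanding + Implicit Malice Detection | Sep 2025 - Present
Problem: Image-text semantic inconsistency, implicit references, and sarcastic hate expression on social media
Core Innovations:
- CLIP + LLMs joint image-text framework, with PE used to optimize LLM outputs
- Embedding features built from results cross-validated across multiple LLMs, strengthening social prior modeling (cultural background, internet memes, and sarcasm)
- Cross-Modal Inference Graph: introduces an attention matrix to explicitly model image-text feature relationships
- GAT encoding of the cross-modal graph structure, enabling multi-node semantic reasoning
Outcomes:
- 🎯 Multimodal semantic enhancement and implicit malice detection system
- 📈 Optimized high-order semantic associations, mitigating the information loss of traditional concatenation-based fusion
- 🔗 Explicit characterization of entity-semantic-emotion inference paths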
技术栈:Python | CLIP | LLMs | Multimodal Fusion | GAT | Graph Neural Networks
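Project | Machine Learning + Medical Imaging | 2024
Problem: Atypical small-sample data in multi-class MRI brain tumor classification; low sensitivity to fine-grained lesion differences
Core Innovations:
- Systematic deployment of multiple classic ML algorithms: SVM, XGBoost, Random Forest, KNN, LR, MLP
- Feature engineering combined with traditional models to improve small-sample data efficiency
- HOG + statistical texture features to extract highly discriminative features, reducing dependence on large-scale data
- Data augmentation, feature normalization, and dimensionality reduction to mitigate noise and blur in medical images
Outcomes:
- 📊 Performance metrics: Acc=95.38%, F1=95.53%, Prec=96.07%, Rec=95.51%
- 💪 Small-sample learning: strong recognition performance on a dataset of only 1,200 samples
- 🏥 Clinical applicability: high-precision recognition of minority conditions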
技术栈:Python | XGBoost | Random Forest | Feature Engineering | Medical Imaging
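- DataFun Conference 2025 (Beijing + Shenzhen) | Attendee | Jul & Nov 2025
  Attended the DACon conference hosted by DataFunTalk, with a structured focus on Data + AI, large model technology, data intelligence, and industrial deployment. Concentrated on large model applications, search and recommendation systems, NLP, AI Agents, advertising algorithms, and data architecture, and learned about enterprise solutions such as RAG, LLM applications, and ChatBI / Data Agent.
- NVIDIA Technology Conference AI Summit (Taiwan) | Attendee | Jun 2024
  Invited to NVIDIA GTC-related AI technical forums and exchanges, focusing on real-world AI applications in medical image analysis, 3D reconstruction, and digital twins; continuing to track VLM, GenAI, Robotics, and Physical AI.
- Wanxi Xiaobing (Zhongshan) Technology Co., Ltd. | Zhuhai Project Manager | 2024
  Responsible for promoting and optimizing Xiaobing digital-human application solutions in the Zhuhai area and for driving AI product adoption; proposed customized C2C solutions based on client needs to improve fit and satisfaction.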
I focus on turning Machine Learning / Deep Learning into deployable systems, with strong interests in:
- Agents: uncertainty-aware planning, iterative reflection, tool orchestration
- Computer Vision: OCR, medical imaging, video restoration and understanding
- NLP: BERT-based modeling, language understanding and generation
- Multimodal: vision-language modeling and cross-modal reasoning
- Deep Search Algorithms: paragraph-level retrieval, reranking, evidence-aligned reasoning
I care not only about benchmark scores, but also reproducibility, explainability, and production readiness.
Version 2.0 | Agentic AI + Deep Search | Feb 2026
Core Innovations:
- Pressure-Driven Paragraph-Level Iterative Reflection Search Architecture with Dynamic Reflection Stress mechanism
- 5-Factor Uncertainty Quantification (5-Factor UQ Model) for adaptive retrieval budget & reasoning depth allocation
- 8 specialized role-based Agent collaboration workflows optimizing search decomposition & coordination
- Diversity-aware denoising reranking + NDCG/MRR metrics for source quality assessment
- Async high-concurrency execution: asyncio + aiohttp for cross-section parallel retrieval
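
The NDCG/MRR source-quality metrics named above can be illustrated with a minimal sketch. This is not the project's code: the function names are hypothetical, the gain is linear, and the ideal ranking is approximated by re-sorting the retrieved list itself.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the same list sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(ranked_hits: Sequence[Sequence[bool]]) -> float:
    """Average of 1/rank of the first relevant result for each query."""
    total = 0.0
    for hits in ranked_hits:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_hits) if ranked_hits else 0.0
```

Here `relevances` holds graded relevance scores in ranked order for one query, and `ranked_hits` holds one list of relevant/not-relevant flags per query; a query with no relevant result contributes zero to the MRR.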
Outcomes:
- ✨ Resolved content hallucination (88% algorithmic certainty⭡)
- 💴 Token cost optimization (~¥0.1 per deep search query)
- 🎯 Reinforced reasoning stability, critical evaluation, and evidence consistency
Tech Stack: Python | LLM Orchestration | asyncio | RAG | Uncertainty Quantification | MAS
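
Cross-section parallel retrieval with asyncio might look like the sketch below. It rests on assumptions: `fetch_section` and `retrieve_all` are hypothetical names, and the network call is simulated with `asyncio.sleep` so the example runs without third-party packages. A real version would presumably issue the request through an aiohttp session at that point.

```python
import asyncio
from typing import Dict, List, Tuple


async def fetch_section(section: str, limiter: asyncio.Semaphore) -> Tuple[str, str]:
    """Retrieve evidence for one section while holding a concurrency slot."""
    async with limiter:
        await asyncio.sleep(0.01)  # stand-in for an HTTP request (assumption)
        return section, f"evidence for {section}"


async def retrieve_all(sections: List[str], max_concurrency: int = 4) -> Dict[str, str]:
    """Fetch all sections concurrently, with at most max_concurrency in flight."""
    limiter = asyncio.Semaphore(max_concurrency)
    pairs = await asyncio.gather(*(fetch_section(s, limiter) for s in sections))
    return dict(pairs)
```

Calling `asyncio.run(retrieve_all(["background", "methods", "risks"]))` returns a mapping from section name to retrieved text. The semaphore bounds the number of simultaneous requests, which is one common way to keep high concurrency from overwhelming a search backend.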
Version 1.0 | OCR + Super-Resolution + Document Structuring | Internship Engineering | Oct 2025
Problem: Low OCR accuracy on low-quality document scans, receipts, contracts in enterprise scenarios
Core Innovations:
- Prior-Enhanced Attention Module incorporating visual + semantic prior weighting
- End-to-end automatic DL pipeline: text super-resolution → text detection → recognition → structure output
- Structure-aware parsing for paragraphs, characters, symbols, field labels; JSON-formatted structured output
Outcomes:
- 📈 Official TextZoom benchmark improvement: 5%⭡ across metrics (PSNR, SSIM, Lrec, L1, CTCL)
- 🔝 Enhanced robustness: improved recognition on blurred edges & complex backgrounds
- 🚀 Downstream automation support: enterprise form processing, automatic document summarization
Tech Stack: Python | PyTorch | Super Resolution | OCR | Computer Vision | Prior-Enhanced Attention
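
PSNR, one of the metrics reported above, has a standard definition that can be sketched in a few lines. The sketch assumes flattened pixel sequences and an 8-bit range (`max_value=255`); it is illustrative and not the project's evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the restored image is closer to the reference; identical inputs give infinity because the mean squared error is zero.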
Version 3.0 | Video Restoration + Deep Learning | 2025
Problem: SAM2 (Meta, 2024) memory bias accumulation in long sequences; lightweight deployment requirements
Core Innovations:
- Conditional Memory Encoder + Hierarchical Selective Attention (HSA) for multi-level semantic understanding
- End-to-end trainable semantic-driven optical-flow-guided video restoration framework; improved upon E2FGVI (CVPR 2022)
- NOF-Eraser module: lightweight CNN feature extraction + Deformable Convolution + optical-flow-assisted propagation
- Mask-Agent module: text semantic parsing automatically identifies user focus areas & generates high-quality masks
Outcomes:
- 📊 Metrics (PSNR, SSIM, Sharpness, Artifacts) improved 6%⭡ over SAM-Based baseline
- ⚡ Lightweight deployment: 5.526 s/vsec (30fps), 12.839 s/vsec (60fps) on RTX 4070 Laptop
- 🎨 Personalized effects: AI-driven mask generation enhances user interaction
Tech Stack: Python | PyTorch | Video Restoration | Optical Flow | Attention | Deformable Conv
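
The throughput figures above can be converted to a per-frame cost with simple arithmetic. This assumes that "s/vsec" means seconds of processing per second of video, which the text does not define explicitly; the helper name is hypothetical.

```python
def seconds_per_frame(seconds_per_video_second: float, fps: int) -> float:
    """Processing time per frame, assuming s/vsec = compute seconds per video second."""
    return seconds_per_video_second / fps


# Figures quoted for the RTX 4070 Laptop runs.
cost_30fps = seconds_per_frame(5.526, 30)
cost_60fps = seconds_per_frame(12.839, 60)
```

Under that assumption, the 30fps run costs about 0.184 s per frame and the 60fps run about 0.214 s per frame.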
Team Competition | Multimodal Semantic Understanding + Implicit Hate Detection | Sep 2025 - Present
Problem: Social media image-text semantic mismatch, implicit references, ironic hate speech
Core Innovations:
- CLIP + LLMs joint image-text framework with PE optimization of LLM outputs
- Cross-verification results embedding from multiple LLMs enhancing social prior modeling (cultural context, memes)
- Cross-Modal Inference Graph: explicit attention matrix modeling image-text relationships
- GAT encoding of cross-modal graph structures enabling multi-node semantic reasoning
Outcomes:
- 🎯 Multimodal semantic enhancement & implicit hate detection system
- 📈 High-order semantic association optimization addressing fusion information loss
- 🔗 Explicit entity-semantic-emotion inference path modeling
Tech Stack: Python | CLIP | LLMs | Multimodal Fusion | GAT | Graph Neural Networks
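
An explicit image-text attention matrix can be sketched as scaled dot-product attention between text-token and image-region embeddings. This is only an illustration: the function names and toy vectors are hypothetical, and the actual Cross-Modal Inference Graph may be constructed differently. In a setup like this, each row could serve as edge weights from one text node to the image nodes before a GAT encodes the graph.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text_tokens: List[Vector], image_regions: List[Vector]) -> List[List[float]]:
    """Row i holds the attention of text token i over all image regions."""
    scale = math.sqrt(len(text_tokens[0]))
    matrix = []
    for token in text_tokens:
        scores = [sum(t * r for t, r in zip(token, region)) / scale for region in image_regions]
        matrix.append(softmax(scores))
    return matrix
```

Each row sums to 1, so the weights can be read as how strongly a text token relates to each image region.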
Project | Machine Learning + Medical Imaging | 2024
Problem: Small-sample atypical data in MRI brain tumor multi-classification; low sensitivity to lesion fine-grained differences
Core Innovations:
- Systematic deployment of classic ML algorithms: SVM, XGBoost, Random Forest, KNN, LR, MLP
- Feature engineering + model combination strategy improving small-sample data efficiency
- HOG + statistical texture features for high-discriminative feature extraction, reducing large-scale data dependency
- Data augmentation, feature normalization & dimensionality reduction mitigating medical imaging noise
Outcomes:
- 📊 Performance metrics: Acc=95.38%, F1=95.53%, Prec=96.07%, Rec=95.51%
- 💪 Small-sample learning: excellent recognition on just 1200 samples
- 🏥 Clinical applicability: high-precision rare disease detection capability
Tech Stack: Python | XGBoost | Random Forest | Feature Engineering | Medical Imaging
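
For multi-class results like those above, accuracy, precision, recall, and F1 are often macro-averaged across classes. The sketch below shows that computation in plain Python. Macro averaging is an assumption here, since the averaging scheme behind the reported numbers is not stated, and `macro_metrics` is a hypothetical name.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Because macro F1 averages the per-class F1 scores, it generally differs from the harmonic mean of macro precision and macro recall.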
Repository | NLP Foundation Framework | Teaching & Practice
Content:
- Complete BERT pretraining & fine-tuning workflows
- Hugging Face ecosystem applications
- Modular NLP processing pipeline design
- Task implementations: text classification, named entity recognition, etc.
Tech Stack: Python | PyTorch | BERT | Hugging Face | NLP
- DataFun Conference 2025 (Beijing + Shenzhen) | Attendee | Jul & Nov 2025
  Participated in DACon by DataFunTalk with a structured focus on Data + AI, large models, data intelligence, and industrial deployment. Followed key topics including LLM applications, search/recommendation systems, NLP, AI Agents, ad algorithms, and data architecture, with practical exposure to RAG, ChatBI, and Data Agent solutions.
- NVIDIA Technology Conference AI Summit (Taiwan) | Attendee | Jun 2024
  Joined NVIDIA GTC-related AI technical sessions and exchanges, focusing on real-world AI applications in medical imaging, 3D reconstruction, and digital twins, while tracking VLM, GenAI, Robotics, and Physical AI trends.
- Wanxi Xiaobing (Zhongshan) Technology Co., Ltd. | Zhuhai Project Manager | 2024
  Led regional promotion and optimization of Xiaobing digital-human solutions in Zhuhai, with hands-on experience in AI product adoption. Worked directly with clients to propose customized C2C solutions and improve product-fit satisfaction.
- Modeling: ML/DL pipelines, transformer-style methods, multimodal fusion
- Agent Engineering: planning, reflection, retrieval augmentation, evaluation loops
- CV + NLP Integration: OCR, classification, video processing, language reasoning
- Implementation: Python, PyTorch ecosystem, experiment-driven iteration
- GitHub: https://github.com/SuleynanAuir
- Email (welcome to connect 😊):
  - Daily: suleynanaiur@gmail.com
  - Research & Academic: t330034027@mail.uic.edu.cn
  - Domestic: 2925795986@qq.com
- LinkedIn: https://www.linkedin.com/in/aiur-suleynan-1a58872b9/
Building AI systems that can search deeply, reason clearly, and deliver reliably.