saadyaq/README.md

👋 Hello, Data World!

"In data we trust, but first, we clean it."

Portfolio LinkedIn Email


🧑‍💻 Who Am I?

I'm Saad Yaqine, a Data Scientist passionate about transforming raw data into measurable business value.
From NLP pipelines to scalable MLOps, I specialize in bringing machine learning solutions to life: clean, efficient, and production-ready.

  • 🔍 Currently seeking full-time opportunities as a Data Scientist or ML Engineer in Europe
  • 🚀 Recently completed my mission @ Bouygues Telecom with measurable business impact
  • 💡 Passionate about LLM agents, NLP, and MLOps
  • 📍 Based in Morocco | Open to France, Netherlands, Spain

🚀 Featured Projects

🔍 Citation Verifier

AI agent that verifies citations in documents to stop hallucinations. Uses Claude/GPT-4o + RAG.

💰 FinSentBot

Financial sentiment analysis tool using transformer models for market intelligence.


🛠️ My Data Science Toolbox

Languages & Core

Python, R, SQL

ML & AI

Scikit-Learn, PyTorch, TensorFlow, Transformers, LangChain

Big Data & Streaming

PySpark, Kafka, Airflow

MLOps & Cloud

Docker, FastAPI, AWS, Azure, GCP

Databases

PostgreSQL, SQL Server, Teradata

Visualization

Streamlit, Tableau, Power BI


💼 My Data Science Journey

🚀 Bouygues Telecom, Data Scientist (2024 - 2025)

  • 🎯 Designed a call classification model (94% accuracy) → reduced inefficient interactions by 46%.
  • ⚡ Built MLOps pipelines (Docker, AWS Lambda, SageMaker) → 30% faster deployments.
  • 📊 Processed large customer datasets (PySpark, Dataiku) → +40% ingestion performance.

📱 LeBonCoin, Machine Learning Engineer (2023)

  • 🧹 Cleaned and normalized user-generated text with Azure Data Factory.
  • 🏷️ Trained a classifier (92% accuracy) with TF-IDF + embeddings.
  • 🚀 Deployed a FastAPI microservice → automated text analysis of live ads.

🎓 Universidad Politécnica de Valencia, AI/NLP Research Intern (2022 - 2023)

  • 🌍 Built multilingual corpora (news, tweets) for LLM pretraining.
  • 🔤 Developed a language detection model (>90% accuracy) to enhance preprocessing.
  • 🛠️ Created robust NLP pipelines: cleaning → vectorization → encoding.

🎯 My Current Mission

🚀 AVAILABLE IMMEDIATELY for new opportunities!
📍 Actively seeking full-time roles as a Data Scientist or ML Engineer where I can drive real-world impact with machine learning.
✅ Recently completed my mission @ Bouygues Telecom with measurable business impact.


🤝 Let's Connect!

Open to opportunities in Data Science & ML Engineering

Portfolio LinkedIn GitHub Email

💡 Always happy to discuss ML projects, collaborate on research, or chat about the latest in AI!


Pinned

  1. emotion-detector (Public)

    Speech Emotion Recognition (SER) pipeline with RAVDESS dataset - 94% accuracy Random Forest classifier using MFCC, chroma, and spectral features with Streamlit demo app

    Jupyter Notebook 1

  2. FinSentBot (Public)

    AI-powered financial sentiment analysis bot with automated trading signals. Analyzes market sentiment using NLP and generates trading recommendations.

    Python

  3. rapidoPDF (Public)

    Versatile Python PDF toolkit with CLI and Streamlit web interface for compression, PDF/Word conversion, and merging documents using Ghostscript and LibreOffice

    Python

  4. Web_Scraper_news_content (Public)

    Automated news content extractor for media monitoring and NLP datasets - scrapes article titles, dates, URLs, and full text from multiple news websites

    Python 1

  5. documind (Public)

    Intelligent RAG system with FAISS semantic search and Streamlit interface - 123 documents indexed using Google EmbeddingGemma-300M with sub-100ms retrieval

    Python

  6. code_review_ai (Public)

    Automated Python code review with AST and Claude AI - finds bugs and suggests fixes via GitHub PRs.

    Python 1