Hi, I'm Robin (Duy Anh) 👋

Data Scientist | Business Analytics Graduate | Building ML Systems That Solve Real Problems

I turn messy, real-world data into production-ready machine learning systems. My edge? A business background that helps me translate technical solutions into stakeholder value, not just optimized metrics.

🎓 Master's in Business Analytics @ Sunway University (Graduating January 2026)
🌏 Seeking roles in: Malaysia | Singapore | Vietnam
🚀 Available: January 2026


💡 What Makes Me Different

I didn't start in computer science. I came from International Business, taught myself data analytics in 2021, and then pursued a Master's in Business Analytics. That unconventional path means I don't just build models: I solve problems that matter to stakeholders and communicate insights people can actually use.


🚀 What I'm Working On

🏠 Property Price Prediction System – ML model (R² = 0.97) for Malaysia's Klang Valley, built on geospatial features and DBSCAN clustering
☁️ AWS Cloud Certifications – Expanding MLOps capabilities for scalable model deployment
📱 Building in Public – Sharing my data science journey on LinkedIn


💼 Featured Projects

The Challenge: Property valuations in Malaysia take days of manual research and cost RM 400-2,000+ per property.

My Solution: Built an end-to-end ML system that produces a price estimate in under 5 minutes, with R² = 0.97 (the model explains 97% of the variance in sale prices). The breakthrough wasn't just the algorithm; it was solving a data quality nightmare.
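R² here is the coefficient of determination, computed on held-out data as below, where y_i is an observed sale price, ŷ_i the model's prediction and ȳ the mean observed price. A value of 0.97 means the remaining squared error is 3% of the total variance around the mean; it is not the same as saying 97% of individual valuations are correct.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```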

Key Innovation:
Consolidated 18,000+ inconsistent location labels (misspelled road names, duplicate schemes, manual entry errors) into 238 spatial market segments using DBSCAN clustering. This single feature engineering step raised R² from 0.84 to 0.97.
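The consolidation idea can be sketched as: geocode each raw label to coordinates, then let DBSCAN group nearby points so that spelling variants of the same road fall into one segment. The README doesn't describe the geocoding step or the DBSCAN settings, so the coordinates, `eps_km=0.5`, `min_samples=3` and the helper names below are hypothetical, and the toy records are invented. To stay dependency-free, the sketch implements DBSCAN from scratch with haversine distance; with scikit-learn the equivalent would be `DBSCAN(eps=eps_km / 6371.0088, min_samples=3, metric="haversine")` fitted on `[lat, lon]` converted to radians.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_samples):
    """Plain O(n^2) DBSCAN. Returns one cluster id per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Includes i itself, so min_samples counts the point (as scikit-learn does).
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # j is a core point: keep growing
                queue.extend(j_seeds)
    return labels


def consolidate(records, eps_km=0.5, min_samples=3):
    """Hypothetical helper: map each (raw_label, lat, lon) record to a segment id."""
    segments = dbscan([(lat, lon) for _, lat, lon in records], eps_km, min_samples)
    return [(label, seg) for (label, _, _), seg in zip(records, segments)]


# Toy records: spelling variants of the same roads, plus one isolated listing.
records = [
    ("Jalan SS 2/24", 3.1180, 101.6220),
    ("Jln SS2/24", 3.1182, 101.6223),
    ("JALAN SS 2/24", 3.1179, 101.6218),
    ("Persiaran Puchong Jaya", 3.0330, 101.6170),
    ("Psn Puchong Jaya", 3.0331, 101.6172),
    ("Persiaran Puchong Jya", 3.0329, 101.6169),
    ("Jalan Bukit Bintang", 3.1466, 101.7100),
]
for label, seg in consolidate(records):
    print(f"{seg:>3}  {label}")
```

Here the three "SS 2/24" variants share one segment id, the three Puchong Jaya variants share another, and the isolated listing comes back as -1 (noise), which a real pipeline would need to route to a fallback segment.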

Tech Stack: Python • scikit-learn • DBSCAN • Random Forest • OpenStreetMap • Geospatial Analysis • pandas

Business Impact:
✅ Reduces valuation time from days → minutes (99% faster)
✅ Maintains R² = 0.97 on unseen 2025 data (temporal validation)
✅ Production-ready Python package with 50+ unit tests
✅ Potential cost savings: RM 150,000/month for high-volume agencies
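Temporal validation means fitting only on older transactions and scoring on a later period the model never saw. A minimal sketch, assuming each record carries a `year`, a DBSCAN `segment` id and a `price` (hypothetical field names, invented toy data): the split is by year and the score is R². To keep it dependency-free, a per-segment median stands in for the project's Random Forest; with scikit-learn, `RandomForestRegressor().fit(X_train, y_train)` and `sklearn.metrics.r2_score` would fill the same roles.

```python
from statistics import mean, median


def temporal_split(rows, cutoff_year):
    """Train on sales before `cutoff_year`; test on sales from that year onward."""
    train = [r for r in rows if r["year"] < cutoff_year]
    test = [r for r in rows if r["year"] >= cutoff_year]
    return train, test


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fit_segment_median(train):
    """Hypothetical stand-in model: median training price per spatial segment."""
    by_segment = {}
    for r in train:
        by_segment.setdefault(r["segment"], []).append(r["price"])
    table = {seg: median(prices) for seg, prices in by_segment.items()}
    fallback = median([r["price"] for r in train])  # for segments unseen in training
    return lambda r: table.get(r["segment"], fallback)


# Toy records, invented for illustration only.
rows = [
    {"year": 2023, "segment": "A", "price": 500_000},
    {"year": 2024, "segment": "A", "price": 520_000},
    {"year": 2023, "segment": "B", "price": 900_000},
    {"year": 2024, "segment": "B", "price": 950_000},
    {"year": 2025, "segment": "A", "price": 510_000},
    {"year": 2025, "segment": "B", "price": 930_000},
]
train, test = temporal_split(rows, cutoff_year=2025)
predict = fit_segment_median(train)
print(r2_score([r["price"] for r in test], [predict(r) for r in test]))
```

Splitting by time rather than at random keeps later prices out of training, so the score reflects how the model would behave on genuinely new listings.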

What I Learned: Feature engineering > hyperparameter tuning. I reached R² = 0.97 with default Random Forest parameters, which suggests that, on this problem, careful data preparation mattered more than tuning or algorithmic complexity.

📂 View Full Project | 📊 Technical Deep Dive


The Problem: Brands need to understand how they're perceived on social media, but manual analysis doesn't scale.

My Solution: Built an NLP pipeline that processes 10,000+ social media posts to extract brand perception insights and competitive positioning.

Tech Stack: Python • NLP • Sentiment Analysis • pandas • Text Processing

Business Value:
✅ Automated sentiment tracking across platforms
✅ Comparative brand analysis (Uniqlo vs Muji positioning)
✅ Network analysis revealing influencer patterns
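The README doesn't say how posts are scored, so as an illustration only, here is a minimal lexicon-based sketch of per-brand sentiment tracking: each post gets the mean polarity of the lexicon words it contains, and each brand's score is the mean over the posts that mention it. The lexicon, its weights, the brand list and the sample posts are all hypothetical; a production pipeline would more likely use an established lexicon such as VADER or a trained model.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon for illustration; not the project's actual resource.
LEXICON = {"love": 1.0, "great": 1.0, "comfy": 0.5,
           "expensive": -0.5, "boring": -1.0, "bad": -1.0}
BRANDS = ("uniqlo", "muji")
TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def score_post(text):
    """Mean polarity of lexicon words in the post; 0.0 if none are found."""
    hits = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def brand_sentiment(posts):
    """Average post score per brand, counting only posts that mention the brand."""
    scores = defaultdict(list)
    for text in posts:
        mentioned = set(tokenize(text))
        s = score_post(text)
        for brand in BRANDS:
            if brand in mentioned:
                scores[brand].append(s)
    return {brand: sum(v) / len(v) for brand, v in scores.items()}


posts = [
    "Love my new Uniqlo jacket, great value",
    "Muji is minimal but expensive",
    "Uniqlo vs Muji: both great",
]
print(brand_sentiment(posts))  # {'uniqlo': 1.0, 'muji': 0.25}
```

Posts that mention both brands count toward both averages, which is one simple way to support side-by-side positioning comparisons.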

📂 View Project


The Problem: Logistics companies need efficient routing to minimize delivery time and fuel costs.

My Solution: Implemented a genetic algorithm solution for the Traveling Salesman Problem, optimizing delivery routes across 150+ locations in Subang, Malaysia.

Tech Stack: Python • Genetic Algorithms • Optimization • Evolutionary Computing

Impact:
✅ Reduces total route distance by 20-30%
✅ Scalable to real-world logistics scenarios
✅ Demonstrates algorithmic problem-solving
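The README doesn't list the genetic operators used, so the following is a minimal, hypothetical genetic algorithm for the Traveling Salesman Problem, assuming straight-line (Euclidean) distances between stops rather than road distances. It uses common textbook choices: tournament selection, ordered crossover (OX) so every child remains a valid tour, swap mutation, and elitism so the best tour never gets worse from one generation to the next. Population size, mutation rate and the other parameters are illustrative, not the project's settings.

```python
import math
import random


def route_length(route, pts):
    """Length of a closed tour that returns to its starting stop."""
    # i = 0 pairs the first stop with route[-1], closing the loop.
    return sum(math.dist(pts[route[i]], pts[route[i - 1]]) for i in range(len(route)))


def tournament(pop, pts, k, rng):
    """Pick k random tours and return the shortest one."""
    return min(rng.sample(pop, k), key=lambda r: route_length(r, pts))


def ordered_crossover(p1, p2, rng):
    """OX: copy a random slice of p1, then fill the gaps with p2's remaining stops in order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(child[a:b + 1])
    fill = iter(c for c in p2 if c not in taken)
    return [c if c is not None else next(fill) for c in child]


def swap_mutation(route, rate, rng):
    """With probability `rate` per position, swap it with a random position."""
    route = route[:]
    for i in range(len(route)):
        if rng.random() < rate:
            j = rng.randrange(len(route))
            route[i], route[j] = route[j], route[i]
    return route


def genetic_tsp(pts, pop_size=60, generations=200, elite=2, k=3, mut_rate=0.02, seed=0):
    """Evolve tours over `pts`; returns (best_route, best_length, best_length_per_generation)."""
    rng = random.Random(seed)
    n = len(pts)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda r: route_length(r, pts))
        history.append(route_length(pop[0], pts))
        nxt = pop[:elite]  # elitism: the best tours survive unchanged
        while len(nxt) < pop_size:
            p1 = tournament(pop, pts, k, rng)
            p2 = tournament(pop, pts, k, rng)
            nxt.append(swap_mutation(ordered_crossover(p1, p2, rng), mut_rate, rng))
        pop = nxt
    best = min(pop, key=lambda r: route_length(r, pts))
    return best, route_length(best, pts), history
```

For example, `genetic_tsp([(random.random(), random.random()) for _ in range(30)])` returns the best tour found, its length, and the best length per generation; because of elitism, that history never increases.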

📂 View Project


🛠️ Tech Stack

Core Skills:
Python • SQL • Machine Learning • Statistical Analysis • Data Visualization

ML & Data Science:
scikit-learn • pandas • NumPy • TensorFlow • DBSCAN • Random Forest • Feature Engineering

Visualization & BI:
Tableau • Power BI • Matplotlib • Seaborn • Plotly

Cloud & DevOps:
AWS (learning) • Git • Jupyter • VS Code • Google Colab

Domain Expertise:
Geospatial Analysis • NLP • Sentiment Analysis • Optimization Algorithms • Time Series Analysis


📊 GitHub Activity

Anh's GitHub Stats

Top Languages


🎯 What I Bring to Your Team

✅ End-to-end ML execution – From messy data to production-ready models
✅ Business acumen – I understand stakeholder needs and translate technical insights into action
✅ Communication skills – I explain complex concepts to non-technical audiences (as demonstrated in my LinkedIn content)
✅ Production mindset – I write clean, tested, documented code (see my 50+ unit tests)
✅ Continuous learning – Currently expanding into AWS/MLOps to enhance deployment capabilities


📫 Let's Connect

💼 LinkedIn: linkedin.com/in/phan-đức-duy-anh
📧 Email: duyanh.phanduc@gmail.com
🌐 GitHub: github.com/anhpdd

Currently seeking: Data Scientist | Business Intelligence Analyst roles
Available: January 2026
Locations: Malaysia | Singapore | Vietnam
Work Authorization: Graduate Pass sponsorship required


💬 Recent Highlights

📱 Building in Public: Sharing my data science journey on LinkedIn, posting three times a week about ML, career lessons, and technical deep dives

🎓 Academic Recognition: Capstone project supervised by Dr. Norman Arshed & Dr. Mubbasher Munir (Sunway University)

🌱 Current Learning: AWS Cloud Practitioner certification, LLM integration with Gemini API, MLOps best practices


🔥 Fun Facts

  • 🌏 Originally from International Business → self-taught analytics → Master's in Business Analytics
  • 📚 Started learning data science on DataCamp in 2021
  • 🗺️ Fascinated by geospatial analytics and how location data shapes decisions
  • ☕ Best ideas come at 2 AM during debugging sessions

"Data science isn't just about algorithms; it's about solving real problems end-to-end."


⭐ If you found my work interesting, consider giving my repos a star!

💼 Open to collaboration, mentorship, and full-time opportunities starting January 2026.
