Data Scientist | Business Analytics Graduate | Building ML Systems That Solve Real Problems
I turn messy, real-world data into production-ready machine learning systems. My edge? A business background that helps me translate technical solutions into stakeholder value, not just optimize metrics.
Master's in Business Analytics @ Sunway University (Graduating January 2026)
Seeking roles in: Malaysia | Singapore | Vietnam
Available: January 2026
I didn't start in computer science. I came from International Business, taught myself data analytics in 2021, and pursued a Master's in Business Analytics. That unconventional path means I don't just build models; I solve problems that matter to stakeholders and communicate insights people can actually use.
Property Price Prediction System: 97% accurate ML model for Malaysia's Klang Valley using geospatial features and DBSCAN clustering
AWS Cloud Certifications: Expanding MLOps capabilities for scalable model deployment
Building in Public: Sharing my data science journey on LinkedIn
The Challenge: Property valuations in Malaysia take days of manual research and cost RM 400-2,000+ per property.
My Solution: Built an end-to-end ML system that predicts prices in under 5 minutes with 97% accuracy (R² = 0.97). The breakthrough wasn't just the algorithm; it was solving a data quality nightmare.
Key Innovation:
Consolidated 18,000+ inconsistent location labels (misspelled road names, duplicate schemes, manual entry errors) into 238 spatial market segments using DBSCAN clustering. This single feature engineering step improved model accuracy from 84% to 97%.
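The consolidation step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each listing has already been geocoded to a latitude and longitude, and the function names, the haversine metric, the 0.5 km radius, `min_samples`, and the synthetic coordinates are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088


def assign_market_segments(lat_lon_deg, eps_km=0.5, min_samples=5):
    """Cluster (lat, lon) points given in degrees.

    Returns one integer label per point; -1 marks points DBSCAN treats as noise.
    """
    # The haversine metric expects [lat, lon] in radians.
    coords_rad = np.radians(np.asarray(lat_lon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # convert km to radians on the unit sphere
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)


def demo_points(seed=0):
    """Synthetic data: two tight groups of listings plus one isolated point."""
    rng = np.random.default_rng(seed)
    centre_a = np.array([3.1390, 101.6869])  # illustrative coordinates only
    centre_b = np.array([3.0738, 101.5183])
    group_a = centre_a + rng.uniform(-0.001, 0.001, size=(10, 2))
    group_b = centre_b + rng.uniform(-0.001, 0.001, size=(10, 2))
    outlier = np.array([[2.9000, 101.8000]])
    return np.vstack([group_a, group_b, outlier])


if __name__ == "__main__":
    labels = assign_market_segments(demo_points(), eps_km=0.5, min_samples=3)
    print(labels)
```

The resulting label can replace the free-text location field as a categorical feature, so two spellings of the same road end up in the same segment as long as their coordinates fall close together.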
Tech Stack: Python • scikit-learn • DBSCAN • Random Forest • OpenStreetMap • Geospatial Analysis • pandas
Business Impact:
- Reduces valuation time from days to minutes (99% faster)
- Maintains 97% accuracy on unseen 2025 data (temporal validation)
- Production-ready Python package with 50+ unit tests
- Potential cost savings: RM 150,000/month for high-volume agencies
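As a rough illustration of what temporal validation and the R² score involve, the sketch below splits records by sale date and computes R² by hand. The record fields (`sold_on`, `price`) and function names are hypothetical, and the sample records are made up for the example.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Train on sales before `cutoff`; test on sales on or after it."""
    train = [r for r in records if r["sold_on"] < cutoff]
    test = [r for r in records if r["sold_on"] >= cutoff]
    return train, test


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        {"sold_on": date(2023, 5, 1), "price": 450_000},
        {"sold_on": date(2024, 8, 15), "price": 520_000},
        {"sold_on": date(2025, 2, 10), "price": 610_000},
    ]
    train, test = temporal_split(records, cutoff=date(2025, 1, 1))
    print(len(train), "training rows;", len(test), "test rows")
```

Splitting by date rather than at random keeps future sales out of the training set, which is what makes a score on 2025 data a meaningful check.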
What I Learned: Feature engineering > hyperparameter tuning. I achieved 97% with default Random Forest parameters, which suggests that smart data preparation can matter more than complex algorithms.
View Full Project | Technical Deep Dive
The Problem: Brands need to understand how they're perceived on social media, but manual analysis doesn't scale.
My Solution: Built an NLP pipeline that processes 10,000+ social media posts to extract brand perception insights and competitive positioning.
Tech Stack: Python • NLP • Sentiment Analysis • pandas • Text Processing
Business Value:
- Automated sentiment tracking across platforms
- Comparative brand analysis (Uniqlo vs Muji positioning)
- Network analysis revealing influencer patterns
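To give a feel for the comparative brand analysis, here is a deliberately simple lexicon-based sketch. It is not the pipeline described above: the word lists, function names, and sample posts are all invented for illustration, and a real pipeline would likely use a trained sentiment model.

```python
import re
from collections import defaultdict

# Toy lexicons, assumed for this example only.
POSITIVE = {"love", "great", "comfortable", "affordable", "quality"}
NEGATIVE = {"hate", "poor", "expensive", "boring", "disappointed"}


def score_post(text):
    """Count positive words minus negative words in one post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def brand_sentiment(posts, brands):
    """Average post score for each brand mentioned in at least one post."""
    scores = defaultdict(list)
    for post in posts:
        lowered = post.lower()
        for brand in brands:
            if brand.lower() in lowered:
                scores[brand].append(score_post(post))
    return {brand: sum(vals) / len(vals) for brand, vals in scores.items()}


if __name__ == "__main__":
    # Made-up posts, for illustration only.
    sample_posts = [
        "I love my Uniqlo jacket, great quality",
        "Muji is too expensive",
        "Uniqlo was boring today",
    ]
    print(brand_sentiment(sample_posts, ["Uniqlo", "Muji"]))
```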
The Problem: Logistics companies need efficient routing to minimize delivery time and fuel costs.
My Solution: Implemented a genetic algorithm solution for the Traveling Salesman Problem, optimizing delivery routes across 150+ locations in Subang, Malaysia.
Tech Stack: Python • Genetic Algorithms • Optimization • Evolutionary Computing
Impact:
- Reduces total route distance by 20-30%
- Scalable to real-world logistics scenarios
- Demonstrates algorithmic problem-solving
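For readers unfamiliar with the approach, a genetic algorithm for the Traveling Salesman Problem might look like the sketch below. It is a minimal standard-library version under stated assumptions, not the project's implementation: the operators (tournament selection, ordered crossover, swap mutation, elitism), the parameter defaults, and the use of straight-line distances on planar coordinates are all choices made for this example.

```python
import math
import random


def route_length(route, coords):
    """Total length of the closed tour visiting coords in `route` order."""
    n = len(route)
    return sum(math.dist(coords[route[i]], coords[route[(i + 1) % n]]) for i in range(n))


def ordered_crossover(parent_a, parent_b, rng):
    """Copy a slice from parent_a, then fill the gaps in parent_b's order."""
    n = len(parent_a)
    start, end = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[start:end + 1] = parent_a[start:end + 1]
    kept = set(parent_a[start:end + 1])
    fill = iter(stop for stop in parent_b if stop not in kept)
    return [stop if stop is not None else next(fill) for stop in child]


def swap_mutation(route, rate, rng):
    """With probability `rate`, swap two stops in a copy of the route."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route


def tournament(population, coords, k, rng):
    """Pick the shortest of k randomly chosen routes."""
    return min(rng.sample(population, k), key=lambda r: route_length(r, coords))


def solve_tsp(coords, pop_size=60, generations=200, mutation_rate=0.2, seed=0):
    """Return (best_route, best_length) found by the genetic algorithm."""
    rng = random.Random(seed)
    n = len(coords)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(population, key=lambda r: route_length(r, coords))
    for _ in range(generations):
        next_gen = [best]  # elitism: the best route so far always survives
        while len(next_gen) < pop_size:
            a = tournament(population, coords, 3, rng)
            b = tournament(population, coords, 3, rng)
            next_gen.append(swap_mutation(ordered_crossover(a, b, rng), mutation_rate, rng))
        population = next_gen
        best = min(population, key=lambda r: route_length(r, coords))
    return best, route_length(best, coords)


if __name__ == "__main__":
    rng = random.Random(42)
    stops = [(rng.random(), rng.random()) for _ in range(15)]  # synthetic stops
    route, length = solve_tsp(stops)
    print(route, round(length, 3))
```

Because the best route is carried into each new generation, the reported length never gets worse over time. For real delivery routing, road distances or travel times would replace the straight-line distance used here.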
Core Skills:
Python • SQL • Machine Learning • Statistical Analysis • Data Visualization
ML & Data Science:
scikit-learn • pandas • NumPy • TensorFlow • DBSCAN • Random Forest • Feature Engineering
Visualization & BI:
Tableau • Power BI • Matplotlib • Seaborn • Plotly
Cloud & DevOps:
AWS (learning) • Git • Jupyter • VS Code • Google Colab
Domain Expertise:
Geospatial Analysis • NLP • Sentiment Analysis • Optimization Algorithms • Time Series Analysis
- End-to-end ML execution: From messy data to production-ready models
- Business acumen: I understand stakeholder needs and translate technical insights into action
- Communication skills: I explain complex concepts to non-technical audiences (demonstrated through my LinkedIn content)
- Production mindset: I write clean, tested, documented code (see my 50+ unit tests)
- Continuous learning: Currently expanding into AWS/MLOps to enhance deployment capabilities
LinkedIn: linkedin.com/in/phan-đức-duy-anh
Email: duyanh.phanduc@gmail.com
GitHub: github.com/anhpdd
Currently seeking: Data Scientist | Business Intelligence Analyst roles
Available: January 2026
Locations: Malaysia | Singapore | Vietnam
Work Authorization: Graduate Pass sponsorship required
Building in Public: Sharing my data science journey on LinkedIn with 3x weekly posts about ML, career lessons, and technical deep-dives
Academic Recognition: Capstone project supervised by Dr. Norman Arshed & Dr. Mubbasher Munir (Sunway University)
Current Learning: AWS Cloud Practitioner certification, LLM integration with Gemini API, MLOps best practices
- Originally from International Business → self-taught analytics → Master's in Business Analytics
- Started learning data science on DataCamp in 2021
- Fascinated by geospatial analytics and how location data shapes decisions
- Best ideas come at 2 AM during debugging sessions
"Data science isn't just about algorithms; it's about solving real problems end-to-end."
If you found my work interesting, consider giving my repos a star!
Open to collaboration, mentorship, and full-time opportunities starting January 2026.