A personal library on machine learning, data engineering, system design, and engineering craft.
A collection of 50+ resources built over years of study and hands-on work in data and ML engineering.
Topics range from machine learning, generative AI, data engineering, data analysis, and system design to engineering craft: what it means to grow as a senior engineer and eventually a team lead.
These resources come from university courses, work experience, and a genuine curiosity about the field. I keep adding to this list as I find things worth keeping track of.
The collection covers books, newsletters, courses, articles, research papers, and tools. I also maintain hands-on material in this repo: google-cloud/ with BigQuery SQL patterns and Google Cloud Workflows building blocks, and oauth/ with REST protocols and OAuth2 notes.
There are a lot of resources here and your time is limited. Don't try to consume them all at once — pick a few from the categories where you need to grow most, and do a focused deep dive.
Before choosing, it helps to reflect on your current skill set and identify the real gaps. Then come back and pick 2–3 resources that address those gaps directly.
- Designing Machine Learning Systems by Chip Huyen
- Applied Time Series Analysis by Terence C. Mills
- Fluent Python by Luciano Ramalho
- Learning SQL by Alan Beaulieu
- Data Science on the Google Cloud Platform by Valliappa Lakshmanan
- Introduction to Machine Learning Systems by Vijay Janapa Reddi
- Big Book of MLOps by Databricks
- Designing Data-Intensive Applications by Martin Kleppmann
- Building Microservices: Designing Fine-Grained Systems by Sam Newman
- Fundamentals of Software Architecture by Neal Ford and Mark Richards
- Fundamentals of Data Engineering by Joe Reis and Matt Housley
- Data Engineering with Python by Paul Crickard
- AI Engineering: Building Applications with Foundation Models by Chip Huyen
- Building Applications with AI Agents by Michael Albada
- AI Systems Performance Engineering by Chris Fregly
- Prompt Engineering for LLMs by John Berryman and Albert Ziegler
- Machine Learning at Scale by Ludovico Bessi
- The Batch by DeepLearning.AI
- The AI Edge by Damien Benveniste
- Sebastian Raschka's Blog by Sebastian Raschka
- HuggingFace Blog by HuggingFace
- DeepMind Blog by Google DeepMind
- Hacker News by Y Combinator
- ByteByteGo by Alex Xu
- Byte-Sized Design by Byte-Sized Design
- System Design Classroom by Raul Junco
- Make Me a CTO by Sergio Visinoni
- Addy Osmani's Newsletter by Addy Osmani
- Engineering Leadership by Gregor Ojstersek
- X Engineering Blog by X
- Google Research Blog by Google
- Netflix Tech Blog by Netflix
- AWS Architecture Blog by AWS
- Meta Engineering by Meta
- Slack Engineering by Slack
- Microsoft Engineering by Microsoft
- Airbnb Engineering by Airbnb
- Instagram Engineering by Instagram
- GitHub Engineering by GitHub
- Spotify Engineering by Spotify
- KAN: Kolmogorov-Arnold Networks by Ziming Liu et al.
- Attention Is All You Need by Ashish Vaswani et al.
- Agentic Workflow for BigQuery with LangGraph and Gemini by Google Cloud
- Leveraging GenAI to Superpower Analytics Platforms by Sightfull
- Prompting Guide by DAIR.AI
- LangGraph Multi-Agent Discussion by LangChain
- How to Build a Data Lake by LinkedIn Advice
- How to Use ML for Time Series Forecasting by Vegard Flovik
- 4-bit Transformers with bitsandbytes by HuggingFace
- HuggingFace bitsandbytes Integration by HuggingFace
- OpenAI API Documentation by OpenAI
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by LLMs by Lei Wang et al.
- Self-Discover: Large Language Models Self-Compose Reasoning Structures by Pei Zhou et al.
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models by Binfeng Xu et al.
- Conversational User-AI Intervention: A Study on Prompt Rewriting for Improved LLM Response Generation by Rupak Sarkar et al.
- Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning by Can Jin et al.
- X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs by Rui Ye et al.
- Eliciting Reasoning in Language Models with Cognitive Tools by Brown Ebouky et al.
- Scalable Extraction of Training Data from (Production) Language Models by Milad Nasr et al.
- Online Advertising Revenue Forecasting: An Interpretable Deep Learning Approach by Max Würfel et al.
- Learn System Design in a Hurry by Evan King
- Engineering Leader's Guide: How to Become a Great Coach and Mentor by Gregor Ojstersek
- This Is Holding Most Engineers Back from Lead Roles by Gregor Ojstersek
- Prompt Engineering Guide by Brex
- Agentic Data Analysis by William White
- LobeHub by LobeHub Team — open-source AI chat platform
- Cheshire Cat AI by Piero Savastano — open-source AI assistant framework
- Google Cloud Workflows Demos by Google Cloud
- OpenAI Tokenizer by OpenAI — visualize token splits for any text
- Code Wiki by Google — AI-powered wiki for code repositories
- GitNexus by Abhigyan Patwari — client-side knowledge graph for code repos
- Artificial Analysis by Artificial Analysis — independent benchmarks for AI models and APIs
- RAGAS — LLM Evaluation Framework by Exploding Gradients
- DeepLearning.AI Courses by Andrew Ng
- Google Machine Learning by Google
- Kaggle — Prompt Engineering Whitepaper by Google
- DS Cheatsheets by Favio Vazquez
- Developer Roadmap by Kamran Ahmed
- Stanford CRFM Ecosystem Graph by Stanford CRFM
- Feature Store by featurestore.org community