diff --git a/.gitignore b/.gitignore
index d408bb15..1b4b7f6a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
+.DS_Store
 /public/
-.hugo_build.lock
\ No newline at end of file
+.hugo_build.lock
diff --git a/README.md b/README.md
index a3205778..3953d7c6 100644
--- a/README.md
+++ b/README.md
@@ -4,4 +4,42 @@
 
 采用Hugo进行生成，目录结构和其他Hugo博客类似。
 
-注意deprecated-content文件夹中，包含了一些写过，但是最终放弃发布的文章。一般来说这些文章将不会再更新，甚至可能会在未来彻底删除。
\ No newline at end of file
+注意deprecated-content文件夹中，包含了一些写过，但是最终放弃发布的文章。一般来说这些文章将不会再更新，甚至可能会在未来彻底删除。
+
+
+## 当前更新
+> 结合[ToDoList](./content/post/Life/ToDoList.md)一起来看
+
+> 有先后顺序
+
+### 读书笔记
+- [Kubernetes读书笔记总结](./content/post/book/kubernetes-handbook.md)
+- [大规模C++软件开发：卷1过程与架构](./content/post/book/large-scale-cpp.md)
+- [Linux：Web和I/O系列书籍读书笔记](./content/post/book/linux-web-io-series.md)
+- [图解Linux内核 读书笔记](./content/post/book/linux-kernel-pictures.md)
+- [Linux-Unix系统编程手册](./content/post/book/linux-unix-system-devman.md)
+- [性能之巅：企业和云计算](./content/post/book/system-performance.md)
+- 差个别章节 [深入理解Nginx：模块开发与架构解析](./content/post/book/understanding-nginx.md)
+
+### CSBasic
+- 缺少一些补充[algo1](./content/post/CSBasic/algo1-analysis.md)
+
+### Framework
+- [WebRTC音视频实时交互技术-合集](./content/post/Framework/webrtc.md)
+
+### language
+- [Go：合集篇](./content/post/language/go-all-in-one.md)
+
+### Middleware
+- [大数据和AI技术：框架篇](./content/post/Middleware/bigdata&ai-framework.md)
+- [大数据和AI技术：基础篇](./content/post/Middleware/bigdata&ai-basic.md)
+- [大数据和AI技术：开坑篇](./content/post/Middleware/bigdata&ai-init.md)
+- [redis 单机数据库](./content/post/Middleware/redis-single.md)
+
+### OS
+- 这个系列整体完成度都不高 [边学边用linux系列](./content/post/os/linux-memory.md)
+
+### 其他
+> 其实还有很多未整理、未写完、未填充更多细节，可以多注意下，日常看一看那些内容过少的文章。
+
+各种标记了```draft: true```的文档，以及[草稿箱](./content/post/Life/drafts.md)
\ No newline at end of file
diff --git a/content/post/Life/ToDoList.md b/content/post/Life/ToDoList.md
index 6c4f9840..43c9b296 100644
--- a/content/post/Life/ToDoList.md
+++ b/content/post/Life/ToDoList.md
@@ -32,9 +32,10 @@ thumbnailImage: /images/thumbnail/todo.jpg
 3. 实用工具和技巧（AI回答不好的部分）
 4. 其他兴趣内容
 
-## 更新列表
+## 计划更新列表
 1. [ ] 删除/归档博客中不再有阅读意义的文章内容，甚至是文章
-1. [x] 一个测试框架
+2. [ ] 逐步完成所有施工中、暂停施工的文章
+3. [x] 一个测试框架
     - [x] 开发语言和图形接口选择（建议前后端分离）
         - vue+js+electron：[DrawFlow](https://github.com/jerosoler/Drawflow)
         - 后端python/cpp/java
@@ -43,23 +44,24 @@ thumbnailImage: /images/thumbnail/todo.jpg
     - [x] 测试中的异常处理
     - [ ] 网络化、批量化
     - [ ] 添加重做撤销功能（可持久化数据结构）
-2. [ ] 技能准备
+4. [ ] 技能准备
     - 中间件学习：ElasticSearch、HDFS & HBase、Hadoop & MapReduce & HIVE、Flink、Impala
     - 中间件复习：Redis、NginX、MySQL、Kafka
     - 分布式：一致性哈希、Raft、分布式事务
     - 容器：Docker、k8s
     - 工具：perf、gperftool、gprof、gdb、flamegraph
-3. [ ] 博客填充
+5. [ ] 博客填充
     - 所有内容较少的页面，尽量填充（也就是说快点把该学的东西补上来）
     - 在复习的过程中，补齐应有的图片
-4. [ ] 把PC版的页面文章宽度增加一些
-5. [ ] 算法分析，所有的分析方式（尤其是势函数法）
-6. [ ] 跑通一个跨端WebRTC Demo。PC + Web。
-7. [ ] 把菜谱章节更新一下。
-8. [ ] 找一下二胡的乐谱，偶尔练一练。
-9.  [ ] redis尽快写完。
-10. [ ] 整理cs_misc
-11. [ ] davinci学习
+6. [ ] 把PC版的页面文章宽度增加一些
+7. [ ] 算法分析，所有的分析方式（尤其是势函数法）
+8. [ ] 跑通一个跨端WebRTC Demo。PC + Web。
+9. [ ] 把菜谱章节更新一下。
+10. [ ] 找一下二胡的乐谱，偶尔练一练。
+11. [ ] redis尽快写完。
+12. [ ] 整理cs_misc
+13. [ ] davinci学习
+14. [ ] Envoy学习
 
 ## 更新一
 1.  [ ] linux-file - 30h
@@ -117,6 +119,7 @@ thumbnailImage: /images/thumbnail/todo.jpg
     - [ ] 云原生数据中心网络
     - [ ] 网络虚拟化技术详解 NFV与SDN
     - [ ] kubernetes网络权威指南 基础、原理与实践
+    - [ ] 色彩与光线（Color and lights）
 
 - 其他：
    - [ ] 3D数学基础：图形和游戏开发
diff --git a/content/post/Life/drafts.md b/content/post/Life/drafts.md
index 35002912..9cc387bd 100644
--- a/content/post/Life/drafts.md
+++ b/content/post/Life/drafts.md
@@ -79,3 +79,38 @@ draft: true
 - 文案
     - I have loved you. I did my best.
     - 以离校身份，转一转学校最后的地方，用剪辑和延时拍摄一下该地点的回忆。
+
+## 剪辑素材整理
+挑出来的放在`D:\Resources\Photography\Video\ChooseFor2011-2024`
+目前看我的网盘/多媒体/影响资料/本科阶段
+
+> 子目录按照文件名排序（从大到小），从上往下看。比如文件夹2015在2014之前看。
+
+进度：本科阶段/手机相册/2015.6.9~9.111
+
+分开剪4个视频
+1. 挑本科、研究生期间的
+   脚本思路：（下面有一些素材在gif中，建议剪之前先看看gif）
+    1. 开场先来一段最后的内容。
+    2. Rewind effect, played in reverse order, interleaved with a few key milestones
+    3. 从登校日开始
+        1. 以为学院路，实际沙河
+        2. 男女比：it was at this moment, he knew he f'd up
+        3. 课程很多：膝盖中箭
+    4. 各种素材，各种meme
+        1. 到学院路：五百年了
+        2. 跟凯哥的表白墙可以作为下期预告
+    5. 本科毕业
+    6. 研究生往后剪辑速度要加快
+    7. callback，把开场的素材放的更多更全。
+
+2. 毕业之后的：剪辑2014~2021
+
+3. 高中部分
+
+4. 高中之前部分和前面所有的一起。
+
+
+如何避免大量照片素材无聊
+1. 猜猜看：放出来一个照片角落，猜是哪里。分不同级别，入门级->王者级。
+2. 多加一些meme：尤其是适合静态图片的。或者给图片加一些特效（咧嘴笑或者撅嘴）
diff --git a/content/post/Middleware/Docker.md b/content/post/Middleware/Docker.md
index a4076362..de2a7fe7 100644
--- a/content/post/Middleware/Docker.md
+++ b/content/post/Middleware/Docker.md
@@ -291,6 +291,61 @@ struct cgroup_subsys {
   ```
 2. docker-compose
 
+## 坑
+### 安装后root
+官方文档中有提到，默认安装后使用root用户，因此需要sudo。如果不想用sudo权限，则需要配置添加当前用户到docker，并重启或注销。
+参考[Linux post-installation steps for Docker Engine](https://docs.docker.com/engine/install/linux-postinstall/)
+```bash
+sudo groupadd docker
+sudo usermod -aG docker $USER
+newgrp docker
+```
+
+
+### 代理
+docker使用过程中需要代理的一共有三个部分，一个是docker pull的时候使用的代理，另一个是为容器提供的代理，最后是docker build时。
+参考[如何优雅的给 Docker 配置网络代理](https://www.cnblogs.com/Chary/p/18096678)
+
+简单总结如下
+```bash
+# docker pull代理
+# 准备daemon配置
+sudo mkdir -p /etc/systemd/system/docker.service.d
+sudo touch /etc/systemd/system/docker.service.d/proxy.conf
+
+# 文件内容
+# [Service]
+# Environment="HTTP_PROXY=http://proxy.example.com:8080/"
+# Environment="HTTPS_PROXY=http://proxy.example.com:8080/"
+# Environment="NO_PROXY=localhost,127.0.0.1,.example.com"
+
+# ========================
+# 容器内代理
+vim ~/.docker/config.json
+# {
+#  "proxies":
+#  {
+#    "default":
+#    {
+#      "httpProxy": "http://proxy.example.com:8080",
+#      "httpsProxy": "http://proxy.example.com:8080",
+#      "noProxy": "localhost,127.0.0.1,.example.com"
+#    }
+#  }
+# }
+
+# ========================
+# 参考博客中提到docker build需要使用时配置，用户级配置无效
+docker build . \
+    --build-arg "HTTP_PROXY=http://proxy.example.com:8080/" \
+    --build-arg "HTTPS_PROXY=http://proxy.example.com:8080/" \
+    --build-arg "NO_PROXY=localhost,127.0.0.1,.example.com" \
+    --network host \
+    -t your/image:tag
+
+```
+
+
 ## 参考
 《Docker容器与容器云》
 
diff --git a/deprecated-content/vmware.md b/content/post/Tools/vmware.md
similarity index 87%
rename from deprecated-content/vmware.md
rename to content/post/Tools/vmware.md
index 9a718fdf..fef4c076 100644
--- a/deprecated-content/vmware.md
+++ b/content/post/Tools/vmware.md
@@ -12,6 +12,14 @@ thumbnailImage: /images/thumbnail/VMware.jpg
 ---
 VMWare作为Windows上最常用的虚拟机。是跨平台开发必不可少的帮手。本文记录一些实用的VMWare知识
 <!--more-->
+
+## vmware tools
+In newer Workstation releases, the bundled VMware Tools only supports Windows guests. For an Ubuntu guest, install open-vm-tools instead.
+```bash
+sudo apt update
+sudo apt install open-vm-tools
+```
+
 ## 共享文件夹
 1. 需要虚拟机内系统支持该特性（VMWare Tools）
 2. 设置-选项-共享文件夹-选择主机内指定目录
diff --git a/content/post/book/CppTemplates2nd.md b/content/post/book/CppTemplates2nd.md
index f336101e..bb488d8d 100644
--- a/content/post/book/CppTemplates2nd.md
+++ b/content/post/book/CppTemplates2nd.md
@@ -6,7 +6,7 @@ categories:
 - C++
 tags:
 - C++
-- 施工中
+- 暂停施工
 - 读书笔记
 thumbnailImagePosition: left
 thumbnailImage: /images/thumbnail/book/CppTemplate2nd.jpg
diff --git a/content/post/book/cpp-concurrency-in-action.md b/content/post/book/cpp-concurrency-in-action.md
index c2fe0ca7..3058ece1 100644
--- a/content/post/book/cpp-concurrency-in-action.md
+++ b/content/post/book/cpp-concurrency-in-action.md
@@ -6,7 +6,7 @@ categories:
 - C++
 tags:
 - C++
-- 施工中
+- 暂停施工
 - 读书笔记
 thumbnailImagePosition: left
 thumbnailImage: /images/thumbnail/book/cpp-concurrency.png
diff --git a/content/post/book/hadoop-definitive-guide.md b/content/post/book/hadoop-definitive-guide.md
index b9c727e2..77dc4377 100644
--- a/content/post/book/hadoop-definitive-guide.md
+++ b/content/post/book/hadoop-definitive-guide.md
@@ -7,7 +7,6 @@ categories:
 tags:
 - Hadoop
 - 中间件
-- 暂停施工
 thumbnailImagePosition: left
 thumbnailImage: /images/thumbnail/hadoop-logo.jpg
 ---
@@ -342,25 +341,17 @@ mapred streaming -input input -output output3 -mapper /bin/cat -reducer /usr/bin
 - 提供了作业完成通知选项供配置
 
 ## 生态
-### HBase和Hive
 
-### ZooKeeper
-
-### Spark
-
-### Flink
-
-### Avro
-
-### Flume
-
-### Sqoop
-
-### Pig
-
-### Solr
-
-### 其他
+- HBase和Hive
+- ZooKeeper
+- Spark
+- Flink
+- Avro
+- Flume
+- Sqoop
+- Pig
+- Solr
+- 其他
 
 
 ## 一些坑：
diff --git a/content/post/book/kubernetes-handbook.md b/content/post/book/kubernetes-handbook.md
index 69c4f795..39df5ad3 100644
--- a/content/post/book/kubernetes-handbook.md
+++ b/content/post/book/kubernetes-handbook.md
@@ -7,7 +7,7 @@ categories:
 tags:
 - Kubernetes
 - 中间件
-- 施工中
+- 暂停施工
 thumbnailImagePosition: left
 thumbnailImage: /images/thumbnail/k8s-logo.png
 draft: true
diff --git a/content/post/book/large-scale-cpp.md b/content/post/book/large-scale-cpp.md
new file mode 100644
index 00000000..35675170
--- /dev/null
+++ b/content/post/book/large-scale-cpp.md
@@ -0,0 +1,21 @@
+---
+title: "大规模C++软件开发：卷1过程与架构"
+date: 2025-08-03T20:41:51+08:00
+categories:
+- 计算机科学与技术
+- C++
+tags:
+- C++
+- 施工中
+- 读书笔记
+thumbnailImagePosition: left
+thumbnailImage: /images/thumbnail/large-scale-cpp-1.jpg
+draft: true
+
+---
+看一下编写大型C++工程的经验谈。后面两卷暂时没有出版，分别是设计与实现，验证与测试。
+<!--more-->
+
+# 第0章 动机
+
+目标：研发进度、产品内容、研发预算
\ No newline at end of file
diff --git a/content/post/book/linux-kernel-pictures-1.md b/content/post/book/linux-kernel-pictures-1.md
new file mode 100644
index 00000000..fc11eecf
--- /dev/null
+++ b/content/post/book/linux-kernel-pictures-1.md
@@ -0,0 +1,1425 @@
+---
+title: "图解Linux内核 读书笔记（上）"
+date: 2025-01-01T16:28:33+08:00
+categories:
+- 计算机科学与技术
+- 操作系统
+tags:
+- 操作系统
+- 施工中
+- 读书笔记
+thumbnailImagePosition: left
+thumbnailImage: /images/thumbnail/book/linux-kernel-pictures.jpg
+draft: true
+mermaid: true
+math: true
+---
+本书提供了大量的插图，来学习Linux内核。
+<!--more-->
+
+> 推荐直接来这个网站看linux kernel：[bootlin](https://elixir.bootlin.com/linux/v5.0/source/Documentation/x86/x86_64/mm.txt)，这个页面是mm.txt的，不过版本较低（v5.0）。也有很多其他重量级开源项目。
+
+> 本书有很多细节，汇编代码，因此也建议作为科普，工具书阅读。有需要的时候可以回来看看。这里会尽量精简重点内容。目标就是看一遍能有个大概。
+
+> 书中所使用的两个Linux版本，分别为3.10和6.2。如果某一个版本代码和书中对不上，就去看另一个版本吧。
+
+> 本页为上半部分，包括内存和文件系统
+
+## 概述和基础知识
+1. 内核代码结构：
+    - Documentation：文档
+    - arch：和体系结构有关的，或者是其他模块中需要区分体系结构的内容
+    - kernel：核心部分，包括进程调度、中断处理、时钟等，和体系结构相关的会放到/arch/xxx/kernel
+    - drivers：驱动
+    - mm：内存管理，同样也会有在/arch/xxx/xx下
+    - fs：文件系统，一种文件系统拥有一种子目录
+    - ipc：进程间通信
+    - block：块设备管理
+    - lib：内核空间下的通用函数库
+    - init：内核初始化
+    - firmware：由外部设备的芯片运行的固件程序
+    - scripts：内核配置脚本
+    - 其他（本书不涉及的）：net、crypto、certs、security、tools、virt（虚拟化）
+1. 基础数据结构
+    - Linux目前仍以C为主，所以其数据结构，以struct、和container_of等宏的方式形成，不像其他面向对象的语言提供的那种数据结构的形式
+        ```c
+        // 再复习一下 container of
+        // typeof 是gnu c关键字
+        #define container_of(ptr, type, member) ({   \
+            const typeof(((type *)0)->member) * __mptr = (ptr); \
+            (type *)((char *)__mptr - offsetof(type, member)); })
+
+        #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+        ```
+    - 一对多的描述方式：将链表结构嵌入到有需要的数据结构中
+        ```c
+        // 方式一
+        struct branch {
+            struct list_head head;
+            // other member
+        };
+        struct leaf {
+            struct list_head node;
+            // other member
+        };
+
+        // 用于串联的list_head
+        struct list_head {
+            struct list_head *next, *prev;
+        };
+
+        // 方式二，节省空间但有一些不便
+        struct hlist_head {
+            struct hlist_node *first;
+        };
+        struct hlist_node {
+            // pprev是前一个hlist_node的next指针的地址
+            // 即若prev => curr, curr->pprev == &prev->next
+            struct hlist_node *next, **pprev;
+            // other member
+        };
+        ```
+    - 多对多的描述方式：将多对多联系，抽象为connection结构体，再嵌入到有需要的数据结构中
+        
+        示例，这只是一个选课场景（学生-老师）下的示例，表示c语言具备的多对多抽象能力。
+        实际场景中，设备-设备处理程序，就是一个类似的多对多的关系。
+        ```c
+        // 书上的示例描述了一个老师-学生的多对多场景
+        // 一个学生可以有多个老师，一个老师可以有多个学生
+
+        // 具有同一个老师的某些学生
+        struct s_connection {
+            // 连接到下一个s_connection
+            struct list_head node;
+            // 指向一个学生
+            struct student *student;
+        };
+        
+        // 具有同一个学生的某些老师
+        struct t_connection {
+            // 连接到下一个t_connection
+            struct list_head node;
+            // 指向一个老师
+            struct teacher *teacher;
+        };
+
+        // 因为师生关系一定是双向的，所以可以将两种联系合并, 代表一个学生-老师的联系
+        struct connection {
+            // 其他具有相同老师的connection
+            struct list_head s_node;
+            // 其他具有相同学生的connection
+            struct list_head t_node;
+            // 当前老师
+            struct teacher *teacher;
+            // 当前学生
+            struct student *student;
+        };
+
+        // 最后，每个学生和老师的数据结构
+        struct teacher {
+            // 连接到connection中的s_node
+            // 遍历该s_node的connection，可获得所有的student
+            struct list_head head_of_student_list;
+            // other member
+        };
+        struct student {
+            // Linked into connection.t_node
+            // Walking the connections on this list yields all of this student's teachers
+            struct list_head head_of_teacher_list;
+            // other member
+        };
+        ```
+2. 设计模式：注意内核的设计方式，是面向对象的
+    - 模板方法模式：Template Method。即开发者实现固定的接口，系统会根据流程进行调用。
+    - 观察者模式：内核中，xxx_listener、xxx_notify
+3. 中断
+    
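+    In the broad sense, interrupts split into interrupts proper and exceptions. Interrupts are asynchronous and come from outside the CPU, typically from I/O devices, and are either maskable or non-maskable. Exceptions are synchronous: they are raised by the instruction the CPU is currently executing, and include traps, faults and aborts. In both cases the CPU checks only after an instruction has completed, **never in the middle of one**.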
+    
+    ![中断处理流程](/images/book/linux-pic/idt.png)
+
+    中断处理需要软硬件分工合作。中断控制器和CPU相连，单CPU架构和SMP架构中分别是PIC（可编程中断控制器），IOAPIC（高级可编程中断控制器）。CPU提供了处理的指令、以及相应的寄存器位来存储。
+
+    区分两个概念：**中断处理程序**是指整个处理过程，从保护现场、处理、恢复现场。**中断服务例程**是其中的一部分，是专门处理产生中断的设备的相关逻辑的。中断服务例程涉及到两个关键的结构体：`irq_desc`、`irqaction`，是一对多，因为一个中断是可以被共享的。一个irq号对应一个`irq_desc`，会有通用的handler，而一个`irqaction`则代表一种设备更具体的处理，会有自己的handler供`irq_desc`中的handler调用。
+
+    Top Half和Bottom Half，一些函数可看到th、bh的后缀，代表前半段、后半段。中断处理应当快速，头半段不能做复杂的处理。复杂逻辑应当使用工作队列、软中断，或启动单独的线程工作。
+
+    注册`irq_desc`和`irqaction`的方式。
+    ```c
+    int request_threaded_irq(unsigned int irq, irq_handler_t handler,
+        irq_handler_t thread_fn, unsigned long flags, const char *name, void *dev);
+    
+    int request_irq(unsigned int irq, irq_handler_t handler,
+        unsigned long flags, const char *name, void *dev);
+    ```
+
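+    Here is a concrete example. Peripherals such as a keyboard and a mouse may share one irq (say 200), but neither of them triggers the interrupt handler directly. Instead the GPIO controller raises another irq (say 50), and the GPIO driver handles the `irq_desc` for that one. The keyboard and mouse only need to provide their own `irq_desc` and `irqaction` for irq 200. Sharing an interrupt requires each driver to be able to tell whether the interrupt came from its own device; if it cannot, the irq must not be shared.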
+
+    中断处理还有很多细节：比如中断处理时，又有新中断发生（一般来说是会继续处理最新的中断，如果有多个新中断，会丢失中间的）；是否还有软中断需要处理；中断处理结束后，需要返回内核态还是用户态等
+
+    **软中断**：对于timer、tasklet等，内核支持一些软中断来完成这些事情。主要包括定时器，小任务，网络读写，块读写等。注意这里说的都是内核空间的事情，是内核可以使用的能力。
+
+    从处理流程上来看，系统调用其实也是中断（异常）的一种。因此减少系统调用对优化是有一定作用的。
+
+4. Linux的时间
+    
+    内核的时间功能分为两种：一个是作为时钟源，提供时间戳信息。另一个是提供时钟中断，以供一次性或周期性事件的触发。
+
+    内核的时间单位：jiffy，滴答。以及```ktime_t```。
+
+    核心的数据结构，```timekeeper```、```clocksource```、```clock_event_device```时钟事件。
+    
+    时钟源是有等级的，内核会选择一个作为“看门狗”，其他时钟源可以受此监督，如果某个其他时钟源误差过大，将会置为不稳定。时钟芯片有很多种，RTC、PIT、TSC、HPET、APIC Timer。
+
+    The kernel maintains several kinds of time. Common ones: REALTIME (system time, i.e. wall time), MONOTONIC (time excluding suspend), BOOTTIME (time since boot, including suspend)
+
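+    The timer interrupt is one of the most common triggers of process scheduling. As the interrupt discussion above shows, the handler only marks the process as needing to be rescheduled; the flag is checked when the interrupt returns, and the actual scheduling is then completed in kernel mode.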
+
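+The embedded-node idea above (a `list_head` inside the payload, recovered with `container_of`) can be tried in user space. The following is a minimal sketch, not kernel code: `container_of` is a simplified variant without GNU `typeof`, and `list_init`, `list_add_tail`, `sum_leaves` and `demo_sum` are hypothetical helper names chosen for illustration.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Same shape as the kernel's list_head: a bare pair of links. */
+struct list_head {
+    struct list_head *next, *prev;
+};
+
+/* Simplified container_of (no typeof type check), an assumption of this sketch. */
+#define container_of(ptr, type, member) \
+    ((type *)((char *)(ptr) - offsetof(type, member)))
+
+/* Payload type with the list node embedded in it. */
+struct leaf {
+    int value;
+    struct list_head node;
+};
+
+/* An empty circular list points at itself. */
+static void list_init(struct list_head *head)
+{
+    head->next = head;
+    head->prev = head;
+}
+
+/* Insert entry just before head, i.e. at the tail of the circular list. */
+static void list_add_tail(struct list_head *entry, struct list_head *head)
+{
+    entry->prev = head->prev;
+    entry->next = head;
+    head->prev->next = entry;
+    head->prev = entry;
+}
+
+/* Walk the links and recover each leaf from its embedded node. */
+static int sum_leaves(struct list_head *head)
+{
+    int sum = 0;
+    for (struct list_head *pos = head->next; pos != head; pos = pos->next)
+        sum += container_of(pos, struct leaf, node)->value;
+    return sum;
+}
+
+/* Build a three-element list on the stack and return the sum of the values. */
+static int demo_sum(void)
+{
+    struct list_head head;
+    struct leaf a = { .value = 1 }, b = { .value = 2 }, c = { .value = 3 };
+
+    list_init(&head);
+    list_add_tail(&a.node, &head);
+    list_add_tail(&b.node, &head);
+    list_add_tail(&c.node, &head);
+
+    /* The node pointer maps back to the object that contains it. */
+    assert(container_of(&b.node, struct leaf, node) == &b);
+    return sum_leaves(&head);
+}
+```
+
+The list code never needs to know about `struct leaf`; only the caller that walks the list does, which is why one `list_head` implementation can serve every structure in the kernel.
+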
+## 内存管理篇
+
+### 内存寻址
+   
+广义的内存管理，也就是CPU所说的内存管理，其实是包括所有有效的连接在总线上的存储。换言之CPU访问的物理地址并不一定真的在RAM里。
+
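+Since the memory space covers several kinds of storage, Linux maps all of them into it, which is MMIO (memory-mapped I/O). The resulting layout can be inspected through ```/proc/iomem```. Device registers, video memory and so on can all be part of MMIO, though this requires support from the CPU architecture.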
+
+而且内存空间并不是连续的，会有一些用不到的空洞Hole。
+
+内存管理，实际上需要维护内存介质（RAM + MMIO）、内存空间、虚拟内存**三者之间**的关系。其中前两者之间的映射，由BIOS完成。
+![mmio](/images/book/linux-pic/mmio.png)
+
+Linux中共有三种地址：虚拟地址（Virtual Address）、线性地址（Linear Address）、物理地址（Physical Address）。应用程序使用的是虚拟地址，虚拟地址通过分段机制（用户代码段、用户数据段、内核代码段、内核数据段）后就变为线性地址。内存管理单元MMU将会用分页机制，把线性地址转换为物理地址。Linux上虚拟地址和线性地址其实几乎相同（段描述符基准地址为0）。
+
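+The MMU side of addressing is the multi-level page table mechanism, with some differences between 32-bit and 64-bit. A few points to stress: the walk is done by the MMU hardware, and the addresses stored in page table entries at every level are physical addresses. A page frame is a contiguous block of physical memory of a fixed size, while a page is the virtual memory of the same size that maps onto it. P.S.: it is worth [reviewing](https://blog.csdn.net/weixin_49342084/article/details/142773491) the CR3 register (PDBR) and PTBR. Loading and walking the page tables starts from the value written to CR3; the kernel keeps that value per process, as `pgd` in the `mm_struct` referenced from the process control block `task_struct`.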
+
+即使有了MMU，操作系统仍然需要完成虚拟地址到物理地址的映射的建立。就是说页表需要操作系统来设置，内核提供了大量的函数和宏来做这些事情。如果有需要，这里可以结合一些博客来学习，强烈推荐如[Linux Kernel直接映射区的构建](https://zhuanlan.zhihu.com/p/692536727)、[Linux Kernel内存管理之分页](https://zhuanlan.zhihu.com/p/661911303)。另外也可以考虑参考[Intel x86-64开发人员手册](https://www.intel.cn/content/www/cn/zh/content-details/858440/intel-64-and-ia-32-architectures-software-developer-s-manual-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4.html)，该手册内有很多图表值得一看。不看这些博客的话，简单看一下下面也可以。
+
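+Paging currently uses at most five levels: the page global directory PGD, the page 4th-level directory P4D, the page upper directory PUD, the page middle directory PMD, and the page table PT (the lowest level). When only two levels are in use, P4D, PUD and PMD contribute 0 index bits each, so the split is 10, 0, 0, 0, 10 bits, plus a 12-bit offset within the page, 32 bits in total. With regular 4K pages, a page table entry for the kernel direct-mapping region is set up as follows: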
+
+> 内存直接映射区：**内核空间**中一个大的，连续的虚拟内存空间，他映射了部分或所有物理内存。
+
+```c
+/*
+* 页表导航说明：
+*
+* pgd, p4d, pud, pmd, pte 均为指向各级页表项的虚拟地址指针。
+* pfn（页框号）为物理页帧编号，此处为 0x12，对应物理地址 0x12000（即 0x12 << PAGE_SHIFT）。
+*
+* 本代码目标：为指定的物理页帧 pfn 建立对应的页表映射（线性地址空间中）。
+*/
+
+// 计算给定物理页帧在线性地址空间中的 PGD（Page Global Directory）索引
+// 注意：(pfn << PAGE_SHIFT) + PAGE_OFFSET 将物理页帧转换为对应的线性地址（内核线性地址直接映射区）
+// PAGE_OFFSET 是内核线性映射的起始虚拟地址偏移（如 0xFFFF888000000000 在 x86_64）
+pgd_idx = pgd_index((pfn << PAGE_SHIFT) + PAGE_OFFSET);
+pgd = pgd_base + pgd_idx;  // 获取该线性地址对应的 PGD 项指针
+
+/*
+* 在当前配置（通常为 4-level 分页但启用兼容模式或线性映射平坦）下，
+* p4d、pud、pmd 层级可能被折叠或直接透传，因此偏移量为 0。
+* 使用 p4d_offset/pud_offset/pmd_offset 获取下一级页表指针。
+*/
+p4d = p4d_offset(pgd, 0);
+pud = pud_offset(p4d, 0);
+pmd = pmd_offset(pud, 0);
+
+/*
+* 检查 PMD 项是否已存在且有效（即指向一个页表页）。
+* 如果对应页表页未分配（_PAGE_PRESENT 位未设置），则需分配一个新页表。
+*/
+pte_ofs = pte_index((pfn << PAGE_SHIFT) + PAGE_OFFSET);  // Compute the PTE index (slot within the page table, not an offset inside the page)
+if (! (pmd_val(*pmd) & _PAGE_PRESENT)) {
+    // 分配一个位于低地址区域的物理页作为页表页（页表本身存储空间）
+    pte_t *page_table = (pte_t*) alloc_low_page();
+
+    /*
+    * 构造 PMD 项值：
+    *   - __pa(page_table): 获取 page_table 的物理地址
+    *   - _PAGE_TABLE: 标志位，表示该 PMD 指向一个页表（而非大页）
+    *   - __pmd(): 将整型值封装为 PMD 类型
+    *   - set_pmd(): 安全地更新 PMD 项
+    */
+    set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
+}
+
+/*
+* 获取最终的 PTE（Page Table Entry）指针。
+* pte_offset_kernel() 根据 PMD 和线性地址中的页内偏移计算出 PTE 位置。
+*/
+pte = pte_offset_kernel(pmd, pte_ofs);
+
+/*
+* 设置 PTE 项，建立最终的物理页映射：
+*   - pfn_pte(pfn, prot): 将页帧号 pfn 与访问权限 prot 组合成一个 PTE 值
+*   - set_pte(): 将生成的 PTE 值写入页表项
+*/
+set_pte(pte, pfn_pte(pfn, prot));
+
+/*
+* 至此，物理页帧 pfn 已成功映射到线性地址空间中对应的位置。
+* 后续可通过 (pfn << PAGE_SHIFT) + PAGE_OFFSET 访问该物理页。
+*/
+```
+
+不过注意虽然现代计算机已经开始64位了，但其实并不允许使用全部的64位寻址，而通常只使用48位（而且用户空间为高16位为0，内核空间高16位为1）。中间空洞的地址是非法的，因此实际上一共只能使用256T内存。
+
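+The 48-bit layout can be checked with plain arithmetic. The sketch below assumes 4-level paging with 4K pages (four 9-bit indices plus a 12-bit offset); `split_va` and `is_canonical48` are hypothetical names, and the level names simply follow the PGD/PUD/PMD/PTE terms used above.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+#define PAGE_SHIFT 12                        /* 4 KiB pages */
+#define LEVEL_BITS 9                         /* 512 entries per table */
+#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)
+
+/* Four levels of 9 bits plus the in-page offset cover exactly 48 bits. */
+static_assert(PAGE_SHIFT + 4 * LEVEL_BITS == 48, "48-bit virtual address");
+
+struct va_parts {
+    unsigned pgd, pud, pmd, pte;             /* index into each level */
+    unsigned offset;                         /* byte offset inside the page */
+};
+
+/* Split a virtual address into its per-level indices. */
+static struct va_parts split_va(uint64_t va)
+{
+    struct va_parts p;
+    p.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
+    p.pte = (unsigned)((va >> PAGE_SHIFT) & LEVEL_MASK);
+    p.pmd = (unsigned)((va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK);
+    p.pud = (unsigned)((va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK);
+    p.pgd = (unsigned)((va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK);
+    return p;
+}
+
+/* Canonical means bits 63..47 are all zeros (user) or all ones (kernel). */
+static bool is_canonical48(uint64_t va)
+{
+    uint64_t top = va >> 47;                 /* 17 bits: bit 47 and the upper 16 */
+    return top == 0 || top == 0x1FFFF;
+}
+```
+
+For the direct-mapped address of pfn 0x12 from the listing above, `0xFFFF888000000000 + 0x12000`, this gives PGD index 273 and PTE index 0x12, and `0x0000800000000000` is rejected because it falls into the non-canonical gap.
+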
+> 其他扩展阅读：[linux kernel pwn之ret2dir攻击学习](https://www.anquanke.com/post/id/185408)
+
+### 物理内存的管理
+    
+> 联动一下博客中的：[边学边用linux-内存管理]({{<relref "/content/post/OS/linux-memory.md#Buddy">}})
+
+概念：节点（node）、区域（zone）、非统一内存访问（NUMA，和传统SMP架构相对，以socket为区分，将CPU和内存分组为不同的node，一组CPU访问自己组内的内存更快）。可以在```lscpu```中看到cpu的分组信息。
+
+BIOS提供了SRAT（System Resource Affinity Table）、SLIT（System Locality Information Table）两个表，用来确定系统资源亲和性和延迟的信息。系统会进一步用来控制CPU上进程的对应的物理内存申请。
+
+而zone则是对node内的资源再进行划分。zonelist中存储的就是对node中的内存的划分。划分至少是出于兼容性的考虑，比如有些设备只能访问指定的部分，因此需要将这部分内存保留出来。
+
+![node-zone](/images/book/linux-pic/node-zone.png)
+
+内核分配内存时，每一个NUMA节点就会从节点保存的zonelist上寻找。如果有多个node且允许尝试其他node的内存，则需要维护一个更复杂的zonelist（维护所有node的所有zone）。注意不同的NUMA节点，其zonelist会略有差别。总的来说会按照优先本地，优先高位地址的顺序排列。
+
+一页物理内存对应一个Linux中的```page```对象。在这个思路指导下，Linux管理物理内存实际上有三种模式：FLATMEM、SPARSEMEM、SPARSEMEM_VMEMMAP。区别在于对物理内存的认定，以及对page对象的管理方式不同，page对象和pfn（页框号）的转换方式不同。
+
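+The memory layout can be inspected under ```/sys/firmware/memmap```, which lists every physical memory range reported by the BIOS. Note that not all of these ranges can be used for allocation: some are reserved for other components. Such unusable physical memory is also called a hole.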
+
+- FLATMEM：把内存看作连续的，即使中间有上面说到的hole，这些hole也是有page对象对应的。显然会造成一些page对象的浪费。
+- SPARSEMEM：将内存做切分，有效的部分分配若干连续的section，section内是若干page，无效的hole部分不再分配section&page。
+- SPARSEMEM_VMEMMAP模式【理解存疑】：依然会为有效的部分分配若干的section，但是要求分配出来的page对象的地址位于虚拟地址连续的区间上。也就是说page对应的虚拟内存地址从一开始就是确定了的。不过只有活跃的部分才会得到真正的物理内存。这种模式下，对于某个物理页而言，其pfn对应的page对象的虚拟地址是```vmemmap + pfn```。
+
+> 区分对内存连续性的要求，虚拟地址连续性是比较好满足的，但仍然有一些场景，比如使用DMA时，可能需要物理地址也连续。
+
+![SPARSEMEM_VMEMMAP](/images/book/linux-pic/sparsemem_vmemmap.png)
+
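+Memory allocation is managed in three stages: boot loader, memblock, buddy. The boot stage is the grub program, which can cap the memory the kernel manages through the ```mem``` parameter. memblock can also hold back part of the memory by adding blocks to its ```reserve``` array, and only after that does the buddy system manage what is left. From the operating system's point of view, memblock is the first stage of memory management, and the buddy system takes over its work.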
+
+> 内存管理还有更多方案：比如huge tlb，但本书并未讨论。
+
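+The buddy system lives up to its name. It divides memory into blocks of different sizes: 1 page, 2 pages, 4 pages ... 1024 pages (4K, 8K, 16K ... 4M), 11 levels (orders) in total. If a block's buddy is also free (a block that has been allocated no longer belongs to the buddy system), the two can be merged into one larger block. The rules for identifying a buddy are:
+1. The two blocks are adjacent and in the same zone
+2. Every block size is a power of two, and so is the merged size, so the two blocks must have the same order
+3. Each block's address must be $2^n$ aligned, and after merging the address of the first block must be $2^{n+1}$ aligned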
+
+zone和page是上下层级的关系。完整的层级是section（内存初始化和热插拔单位）→zone（分配管理单元）→page（页）。zone内按照阶，存储了所有阶```free_area```，其中每个还分为可迁移和不可迁移等类型。具体如下图。
+
+![zone_page](/images/book/linux-pic/zone_page.png)
+
+页的申请和释放函数，是上面曾见到过的：```alloc_page/pages```,```free_page/pages```等。```alloc```函数在使用时有很多参数，包含优先选择的zone和其他影响内存分配的行为，比如分配的优先级（是否需要保持zone内的分配水位，更高优先级可以使用一些预留的内存）。
+
+可以看出，buddy系统所能提供的物理内存，要么可以物理地址连续但不能超过4M，要么可以超过4M但物理地址不能保证连续了。
+
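+The buddy rules reduce to bit operations on page frame numbers. This is a user-space sketch under the assumption of 11 orders (0 to 10) as described above; `buddy_pfn`, `merged_pfn` and `is_aligned` are hypothetical names, not the kernel's helpers.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+#define MAX_ORDER 11                 /* orders 0..10, i.e. 1 to 1024 pages */
+
+/* True if pfn is aligned to a block of 2^order pages. */
+static bool is_aligned(unsigned long pfn, unsigned order)
+{
+    return (pfn & ((1UL << order) - 1)) == 0;
+}
+
+/* The buddy of an order-n block differs from it only in bit n of the pfn. */
+static unsigned long buddy_pfn(unsigned long pfn, unsigned order)
+{
+    assert(order < MAX_ORDER);
+    assert(is_aligned(pfn, order));
+    return pfn ^ (1UL << order);
+}
+
+/* First pfn of the merged order-(n+1) block: clear bit n. */
+static unsigned long merged_pfn(unsigned long pfn, unsigned order)
+{
+    return pfn & ~(1UL << order);
+}
+```
+
+For example, the order-2 block at pfn 8 has its buddy at pfn 12; merging them yields an order-3 block starting at pfn 8, which is $2^3$ aligned as rule 3 requires.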
+
+### 虚拟内存的管理
+   
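+Each process's linear address space (virtual addresses) is divided into a kernel part and a user part. The kernel part starts at the macro seen earlier, ```PAGE_OFFSET```. The kernel space is in fact shared between processes, directly or indirectly: user spaces are independent of each other, while the kernel space is shared. For that reason the user part of the page tables is maintained per process, whereas the kernel page tables are largely identical and form a common part.
+
+On 32-bit x86, address space was scarce. The kernel usually had 1G of linear space but could not directly map 1G of physical memory; typically only part of it (896M) was directly mapped, and the rest was kept for other needs. The directly mapped part is called ```Low Memory``` and the rest ```High Memory```. Note that on 32-bit x86 physical memory could exceed 4G, but the linear space was only 4G.
+
+Direct mapping means the mapped virtual address and the physical address are related by a fixed offset, as the earlier code also shows: $va = pa + PAGEOFFSET$ (PAGEOFFSET standing for the ```PAGE_OFFSET``` macro). The mapping then looks like this (physical memory on the left, linear space on the right)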
+
+![x86-va-pa](/images/book/linux-pic/x86-va-pa.png)
+
+而等到了x86-64时代，线性空间足够大了，不再区分```Low/High Memory```。
+
+具体来看，内核线性空间（从高地址到低地址）内部还分为若干区域：
+- 32位：固定映射区、永久映射区、CPU Entry区、动态映射区、直接映射区
+- 64位：-
+
+![x86-va-space](/images/book/linux-pic/x86-va-space.png)
+上图为32位
+
+![x64-va-space](/images/book/linux-pic/x64-va-space.png)
+上图为64位
+> 64位可能有4、5级页表等不同情况，这里是5级的布局。
+
+接下来介绍一下内核线性空间中的各个区：
+1. 直接映射区：大小理论上是MAXMEM，不考虑```High Memory```的情况下，会一直映射到没有物理内存为止。映射完成后在运行期内不变，因此需要稳定存在的数据结构需要用直接映射区。
+2. 动态映射区：其他区域多少都有限制，但动态映射区能满足各类需求。常见的```ioremap```都在此区域实现。由```get_vm_area```函数族来分配此区域空间。用红黑树管理。
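+3. Permanent mapping area: no longer exists on x86-64. The kernel uses ```kmap``` to map one page of physical memory into this area. ```kmap``` takes a page struct; if the corresponding physical page is in ```High Memory``` it occupies a slot in the permanent mapping area, and if it is not, the direct-mapped virtual address is returned. The area can only be allocated in units of pages. Despite the name, it has nothing to do with being "permanent". It may be used for memory regions that communicate with certain devices.
+4. Fixed mapping area: divided into several small ranges, each with a specific purpose. Worth mentioning is the temporary mapping area inside it, which reserves a few pages per CPU; functions such as ```kmap_atomic``` map physical memory into it, and since mapping and unmapping are both fast, it suits short-lived use.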
+
+
+#### 详解用户空间内存映射mmap
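+mmap is nothing new: it maps a file or device into memory so that it can then be accessed like ordinary memory. Note that mmap always uses the user linear address space. The prototypes are: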
+```c
+void* mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
+int munmap(void *addr, size_t length);
+```
+
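+```flags``` deserves some care, for example MAP_PRIVATE versus MAP_SHARED. The former uses a COW (Copy On Write) policy: updates to the mapping are invisible to other processes that map the same region and are not written back to the file. The latter shares all updates and writes them to the file. As a preview, what is shared is physical memory, which is covered in detail later. At the top level, mmap mappings are either anonymous (not backed by a file) or non-anonymous (backed by a specific file or device).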
+
+当然，由于mmap也可以映射设备，因此并不是所有对文件的操作都可以用，具体支持情况依赖于设备驱动。
+
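+The difference between MAP_SHARED and MAP_PRIVATE is visible from user space. The sketch below writes one byte through a mapping of a temporary file and then reads the file back with `pread`. The function name `byte_in_file_after_mapped_write` and the `/tmp/mmap-demo-XXXXXX` template are assumptions made for this example, and it presumes a Linux system with a writable `/tmp`.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <assert.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+/*
+ * Create a one-byte file containing 'a', overwrite the byte with 'b'
+ * through a mapping created with map_flags, then return the byte that
+ * read access to the file reports. Returns -1 if any setup step fails.
+ */
+static int byte_in_file_after_mapped_write(int map_flags)
+{
+    assert(map_flags == MAP_SHARED || map_flags == MAP_PRIVATE);
+
+    char path[] = "/tmp/mmap-demo-XXXXXX";   /* hypothetical temp file */
+    int fd = mkstemp(path);
+    if (fd < 0)
+        return -1;
+    unlink(path);                            /* file lives until fd is closed */
+
+    char byte = 'a';
+    if (write(fd, &byte, 1) != 1) {
+        close(fd);
+        return -1;
+    }
+
+    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, map_flags, fd, 0);
+    if (p == MAP_FAILED) {
+        close(fd);
+        return -1;
+    }
+
+    p[0] = 'b';                              /* modify through the mapping */
+    msync(p, 1, MS_SYNC);                    /* flush shared changes to the file */
+
+    char seen = 0;
+    int result = (pread(fd, &seen, 1, 0) == 1) ? seen : -1;
+
+    munmap(p, 1);
+    close(fd);
+    return result;
+}
+```
+
+With MAP_SHARED the file is expected to contain `'b'` afterwards; with MAP_PRIVATE the write lands in a private copy of the page and the file still contains `'a'`.
+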
+用户线性空间是以```vm_area_struct```来描述一个一个的用户线性空间中的区域的。进一步整合到进程结构体中的`mm_struct`。
+```c
+/*
+ * This struct defines a memory VMM memory area. There is one of these
+ * per VM-area/task.  A VM area is any part of the process virtual memory
+ * space that has a special rule for the page-fault handlers (ie a shared
+ * library, the executable area etc).
+ */
+struct vm_area_struct {
+	/* The first cache line has the info for VMA tree walking. */
+
+	unsigned long vm_start;		/* Our start address within vm_mm. */
+	unsigned long vm_end;		/* The first byte after our end address
+					   within vm_mm. */
+
+	/* linked list of VM areas per task, sorted by address */
+	struct vm_area_struct *vm_next, *vm_prev;
+
+	struct rb_node vm_rb;
+
+	/*
+	 * Largest free memory gap in bytes to the left of this VMA.
+	 * Either between this VMA and vma->vm_prev, or between one of the
+	 * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
+	 * get_unmapped_area find a free area of the right size.
+	 */
+	unsigned long rb_subtree_gap;
+
+	/* Second cache line starts here. */
+
+	struct mm_struct *vm_mm;	/* The address space we belong to. */
+	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
+	unsigned long vm_flags;		/* Flags, see mm.h. */
+
+	/*
+	 * For areas with an address space and backing store,
+	 * linkage into the address_space->i_mmap interval tree.
+	 */
+	struct {
+		struct rb_node rb;
+		unsigned long rb_subtree_last;
+	} shared;
+
+	/*
+	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
+	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
+	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
+	 * or brk vma (with NULL file) can only be in an anon_vma list.
+	 */
+	struct list_head anon_vma_chain; /* Serialized by mmap_sem &
+					  * page_table_lock */
+	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */
+
+	/* Function pointers to deal with this struct. */
+	const struct vm_operations_struct *vm_ops;
+
+	/* Information about our backing store: */
+	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
+					   units */
+	struct file * vm_file;		/* File we map to (can be NULL). */
+	void * vm_private_data;		/* was vm_pte (shared mem) */
+
+	atomic_long_t swap_readahead_info;
+#ifndef CONFIG_MMU
+	struct vm_region *vm_region;	/* NOMMU mapping region */
+#endif
+#ifdef CONFIG_NUMA
+	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
+#endif
+	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+} __randomize_layout;
+
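+// This layout is likewise only found in somewhat older kernel versions
+struct mm_struct {
+	struct {
+        // Head of the linked list of mapped regions
+		struct vm_area_struct *mmap;
+        // Red-black tree of mapped regions
+		struct rb_root mm_rb;
+    } __randomize_layout;
+    // ... other members omitted for now
+};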
+```
+
+mmap的实现细节，都在```do_mmap```函数中。
+
+![do_mmap](/images/book/linux-pic/do_mmap.png)
+
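+In the call flow, `get_unmapped_area` finds a usable range of linear space (the "slot" in the book's wording, while physical memory is the "radish" that fills it). If the user does not specify `addr`, this step searches for a suitable range using the `mm_mt` field of the current process's `mm_struct`. If `addr` is specified and the linear space starting there is checked to be long enough, it is used; otherwise `addr` is ignored and a range is allocated as usual.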
+
+而`mmap_region`则进行具体映射。初始化映射对应的`vm_area_struct`，完成映射，将对象插入红黑树。
+
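+How physical memory ends up being used, i.e. the final effect of mmap, is decided mainly by the driver behind the file or device. There are several cases:
+1. The driver has its own physical memory (for example video memory under MMIO); it can use ioremap to map it into the dynamic mapping area of the kernel linear space.
+2. The driver has to allocate memory first and then map it. Depending on need, this is further split by whether physically contiguous memory is required.
+3. The driver's mmap sets up no mapping; a later memory access faults, the kernel calls the driver's fault operation, and the driver allocates a physical page and hands it back through the vm_fault structure.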
+
+在了解了以上这些之后，就能明白共享内存其实是共享物理内存。不同进程之间需要将同一段物理内存，映射到自己的线性空间内。
+
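+The book mentions the `/dev/mem` device, which maps physical memory. It can be mapped with mmap to operate on physical memory directly. Because this is extremely dangerous, direct use is prohibited in many setups (mapping of the relevant ranges is refused). To do it anyway, one may need to write a driver and access the memory as MMIO.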
+
+#### 内存管理进阶
+
+本节讨论一些更复杂的问题。
+
+1. **内存申请：**
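+   - Allocating physically contiguous memory. On top of the buddy system the kernel maintains the slab allocator, which now has several variants: slab, slob, slub. `kmalloc/kfree` are built on it, and it acts as a memory pool. Because of the buddy limit, at most 4MB of physically contiguous memory can be obtained.
+   - Allocating virtually contiguous memory. `vmalloc` allocates physical memory piece by piece and then maps the pieces onto one contiguous range of linear space. `vmalloc` provides memory for kernel mode, so it updates the kernel page tables rather than a process's page tables. The virtual addresses lie in the kernel's dynamic mapping area.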
+
+   无论slab还是vmalloc，返回的都是虚拟内存。相比之下，grub、memblock、buddy申请和管理的则是物理内存。
+
+2. **缓存**
+   
+   内存中的数据可以分为两种：页表数据和实际数据。
+   
+   - TLB缓存用于加速对页表的访问。TLB是比较特殊的，内核写页表不会通过TLB，而是直接写内存。所以所有页表项更新的情况下，TLB都需要刷新。
+   - cache缓存用于加速对普通数据的访问。实际上MMIO也会被cache进行缓存。cache有很多的缓存策略，在x86架构下：
+        - Strong Uncacheable：UC，读写都不经缓存
+        - Uncacheable: UC-, same as UC except that it can be overridden to WC through the MTRRs (Memory Type Range Registers)
+        - Write Combining：WC，允许CPU缓冲多个写操作，并在合适的时候一次性写回内存
+        - Write Back：WB，读写都经过缓存
+        - Write Through：WT，和WB类似，但是写操作也同时写内存
+        - Write Protected：WP，和WB类似，但是每次写都会导致缓存失效
+        
+        需要区分明确是因为，并不是所有的场景都可以使用WB这种效率最高的情况，如果写内存有副作用（比如MMIO的内存，可能是设备的控制位），那就不能用缓存了。
+
+        MTRR机制可以用来设置一段物理内存的缓存方式。BIOS一般已经配置了，可在`/proc/mtrr`文件查看。修改该文件可以更改缓存方式。MTRR有一定限制（硬件相关），所以又有了PAT（Page Attribute Table），粒度是页，可以按照页来精准控制内存的缓存属性。可以在`/sys/kernel/debug/x86/pat_memtype_list`文件中看到配置。
+
+        内存和缓存的不一致问题，不仅在于CPU的读写，在DMA设备访问内存的情况下，也会出现不一致的情况。这种时候也需要刷新缓存。
+
+3. **缺页异常**
+   
+   缺页异常实际上会有不同的种类，CPU提供两项信息：错误码和异常地址。其中错误码存储在栈中，引发缺页异常的虚拟地址存储在CR2寄存器中。
+
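+   Overall, faults fall into four scenarios:
+   - Program logic errors: null pointer, out-of-bounds access, permission violation (writing read-only memory)
+   - The accessed address has no physical memory mapped
+   - Stale TLB
+   - Scenarios such as COW (Copy On Write), where the memory lacks write permission.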
+  
+    缺页异常程序是`asm_exc_page_fault`，汇编程序【似乎在内核代码中不太好找到实际代码段】。负责保存现场，并收集error_code。最终错误码和异常地址会传递到`handle_page_fault`。
+
+    ![handle page fault](/images/book/linux-pic/handle_page_fault.png)
+
+    根据地址位于内核还是用户空间，分别调用不同的处理程序。
+    
+    地址位于内核空间的情况下：如果进程是用户态，那么只有`vmalloc`和`spurious`两种情况可以处理。因为vmalloc的内存分配情况存储在内核页表，使用了vmalloc申请的内存会缺页异常，需要将页表拷贝给进程页表。spurious则是指的TLB刷新不及时（内存已经变为可读写，但是TLB中仍只读）的情况，产生的虚假错误。各种`bad_area`函数用来处理其它的情况，如果发生缺页异常时，进程处于内核态，会尽量尝试修复错误，否则直接发送SIGSEGV给用户态进程。
+
+    地址位于用户空间的情况下：核心目标就是为地址找到对应的vma（就是我们前面提到过的vm_area_struct）并映射内存。在确认vma和当前的操作权限匹配后，开始真正的缺页处理。这里还要分为三种情况
+    1. 没有完整的物理内存映射，需要申请内存并映射。
+    2. 映射存在，但是物理页被交换了，需要将其读到内存。
+    3. 映射完整，内存可写，但是页表中权限是只读，写内存异常。这对应的也是前面的COW等场景。
+
+    这里需要格外强调：用户空间虚拟内存访问权限分成两个部分。内存映射的权限（在`vma->vm_flags`中），以及页表的权限。前者是全集，在其之外的是错误，没有讨论余地。后者则是表示实际的访问权限，就是说页表中的权限可能发生变化，来适应对应的需求。另外这里所说的两个权限，都是再`vma`的结构体中（flag和prot）。而DMA访问内存实际上是用pte访问的，因此内存实际上一共有三个权限在共同工作。
+
+    `handle_pte_fault`比较重要，是站在pte的角度，处理以上的这些问题。按照书上的内容，补充了一些注释。
+
+    ```c
+    static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
+    {
+        pte_t entry;
+
+        // 这一段if + else，是在处理第一种问题。等待后续分配并映射。
+        if (unlikely(pmd_none(*vmf->pmd))) {
+            /*
+            * Leave __pte_alloc() until later: because vm_ops->fault may
+            * want to allocate huge page, and if we expose page table
+            * for an instant, it will be difficult to retract from
+            * concurrent faults and from rmap lookups.
+            */
+            vmf->pte = NULL;
+            vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
+        } else {
+            /*
+            * If a huge pmd materialized under us just retry later.  Use
+            * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
+            * of pmd_trans_huge() to ensure the pmd didn't become
+            * pmd_trans_huge under us and then back to pmd_none, as a
+            * result of MADV_DONTNEED running immediately after a huge pmd
+            * fault in a different thread of this mm, in turn leading to a
+            * misleading pmd_trans_huge() retval. All we have to ensure is
+            * that it is a regular pmd that we can walk with
+            * pte_offset_map() and we can do that through an atomic read
+            * in C, which is what pmd_trans_unstable() provides.
+            */
+            if (pmd_devmap_trans_unstable(vmf->pmd))
+                return 0;
+            /*
+            * A regular pmd is established and it can't morph into a huge
+            * pmd from under us anymore at this point because we hold the
+            * mmap_lock read mode and khugepaged takes it in write mode.
+            * So now it's safe to run pte_offset_map().
+            */
+            vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
+            vmf->orig_pte = *vmf->pte;
+            vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
+
+            /*
+            * some architectures can have larger ptes than wordsize,
+            * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
+            * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
+            * accesses.  The code below just needs a consistent view
+            * for the ifs and we later double check anyway with the
+            * ptl lock held. So here a barrier will do.
+            */
+            barrier();
+            if (pte_none(vmf->orig_pte)) {
+                pte_unmap(vmf->pte);
+                vmf->pte = NULL;
+            }
+        }
+
+        // 继续处理第一种问题（此时pte为NULL）
+        if (!vmf->pte) {
+            if (vma_is_anonymous(vmf->vma))
+                return do_anonymous_page(vmf);
+            else
+                /*
+                非匿名映射，内部根据不同情况进行处理
+                1. 读操作异常：do_read_fault
+                2. 写MAP_PRIVATE映射的内存：do_cow_fault
+                3. 写MAP_SHARED映射的内存：do_shared_fault
+
+                总之最终都会回调vma->vm_ops->fault得到一页内存，再用finish_fault更新页表
+
+                这里do_cow_fault会申请到一页新的物理内存（vmf->cow_page），初始内容是从之前的页vmf->page拷贝过来的
+                */
+                return do_fault(vmf);
+        }
+
+        // 处理第二种情况，加载交换出去的内存
+        if (!pte_present(vmf->orig_pte))
+            return do_swap_page(vmf);
+        
+        if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
+            return do_numa_page(vmf);
+
+        vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+        spin_lock(vmf->ptl);
+        entry = vmf->orig_pte;
+        if (unlikely(!pte_same(*vmf->pte, entry))) {
+            update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
+            goto unlock;
+        }
+
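+        // Third case: a write fault on a page that lacks write permission
+        // Unlike the pte == NULL path above, the pte exists here; it just has no write permission, so the access was refused.
+        // PROT_WRITE with MAP_SHARED: the handler makes the entry writable
+        // PROT_WRITE with MAP_PRIVATE: this is COW; allocate a new physical page, copy the contents, update the page table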
+        if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
+            if (!pte_write(entry))
+                return do_wp_page(vmf);
+            else if (likely(vmf->flags & FAULT_FLAG_WRITE))
+                entry = pte_mkdirty(entry);
+        }
+        entry = pte_mkyoung(entry);
+        if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
+                    vmf->flags & FAULT_FLAG_WRITE)) {
+            update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
+        } else {
+            /* Skip spurious TLB flush for retried page fault */
+            if (vmf->flags & FAULT_FLAG_TRIED)
+                goto unlock;
+            /*
+            * This is needed only for protection faults but the arch code
+            * is not yet telling us if this is a protection fault or not.
+            * This still avoids useless tlb flushes for .text page faults
+            * with threads.
+            */
+            if (vmf->flags & FAULT_FLAG_WRITE)
+                flush_tlb_fix_spurious_fault(vmf->vma, vmf->address);
+        }
+    unlock:
+        pte_unmap_unlock(vmf->pte, vmf->ptl);
+        return 0;
+    }
+    ```
+
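+    COW has a classic use case: forking a child process. The child inherits much of the parent's state, and the address-space part of that copy is done by `dup_mmap`, which mostly sets up COW rather than copying memory.
+
+    ![fork-cow](/images/book/linux-pic/fork-cow.png)
+
+    To make COW work, the PTEs of both parent and child are downgraded to read-only. There is a catch: if the parent writes to such a page first, it takes the fault and has to copy the page (because it is about to modify it), even if the child never writes to it, so the copy may be wasted. It is therefore more reasonable to let the child run first. If the child goes on to execute a new program, most of that memory never needs copying at all. For this reason the kernel has a variable that controls whether the child may preempt the parent.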
+
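+A minimal user-space sketch of the behaviour described above, assuming a Linux/POSIX system. The helper name `parent_view_after_child_write` is made up for illustration. It only observes the visible effect of COW (parent and child stop sharing a page once one of them writes to it); it does not show the kernel internals.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+/*
+ * Hypothetical helper (not a kernel or libc API): fork, let the child
+ * overwrite a byte that both processes initially share, and return the value
+ * the parent still sees afterwards. Returns -1 on any error.
+ */
+int parent_view_after_child_write(void)
+{
+	unsigned char *buf = malloc(4096);
+	pid_t pid;
+	int status = 0;
+	int seen = -1;
+
+	if (!buf)
+		return -1;
+	buf[0] = 1;		/* touch the page so it is mapped before fork */
+
+	pid = fork();
+	if (pid < 0) {
+		free(buf);
+		return -1;
+	}
+	if (pid == 0) {
+		/* Child: the first write hits a write-protected PTE and faults */
+		buf[0] = 2;
+		_exit(buf[0] == 2 ? 0 : 1);
+	}
+
+	/* Parent: wait for the child, then read its own copy of the page */
+	if (waitpid(pid, &status, 0) == pid &&
+	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
+		seen = buf[0];
+	free(buf);
+	return seen;
+}
+```
+
+Calling `parent_view_after_child_write()` is expected to return 1: the child's write lands in its own copy of the page, so the parent's byte is unchanged.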
+
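+### Memory reclaim
+
+The earlier discussion of the buddy system skipped a major topic: memory reclaim. It is driven mainly by `__alloc_pages_slowpath`.
+
+Reclaim is needed in roughly two situations:
+1. There is enough free memory, but it is too fragmented to provide a contiguous block. Some pages then have to be moved and merged, which is compaction (`compact`).
+2. There is not enough free memory, so some memory that is in use has to be freed, which is reclaim proper (`reclaim`). The typical example is mmap'ed memory: to free it, the kernel writes it to `swap` (anonymous mappings) or back to the file (file-backed mappings).
+
+**Scanning** is the first step of reclaim. It is controlled by the `scan_control` structure, shown below: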
+```c
+// code from kernel 6.2
+struct scan_control {
+	/* How many pages shrink_list() should reclaim */
+	unsigned long nr_to_reclaim;
+
+	/*
+	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
+	 * are scanned.
+	 */
+	nodemask_t	*nodemask;
+
+	/*
+	 * The memory cgroup that hit its limit and as a result is the
+	 * primary target of this reclaim invocation.
+	 */
+	struct mem_cgroup *target_mem_cgroup;
+
+	/*
+	 * Scan pressure balancing between anon and file LRUs
+	 */
+	unsigned long	anon_cost;
+	unsigned long	file_cost;
+
+	/* Can active folios be deactivated as part of reclaim? */
+#define DEACTIVATE_ANON 1
+#define DEACTIVATE_FILE 2
+	unsigned int may_deactivate:2;
+	unsigned int force_deactivate:1;
+	unsigned int skipped_deactivate:1;
+
+	/* Writepage batching in laptop mode; RECLAIM_WRITE */
+	unsigned int may_writepage:1;
+
+	/* Can mapped folios be reclaimed? */
+	unsigned int may_unmap:1;
+
+	/* Can folios be swapped as part of reclaim? */
+	unsigned int may_swap:1;
+
+	/* Proactive reclaim invoked by userspace through memory.reclaim */
+	unsigned int proactive:1;
+
+	/*
+	 * Cgroup memory below memory.low is protected as long as we
+	 * don't threaten to OOM. If any cgroup is reclaimed at
+	 * reduced force or passed over entirely due to its memory.low
+	 * setting (memcg_low_skipped), and nothing is reclaimed as a
+	 * result, then go back for one more cycle that reclaims the protected
+	 * memory (memcg_low_reclaim) to avert OOM.
+	 */
+	unsigned int memcg_low_reclaim:1;
+	unsigned int memcg_low_skipped:1;
+
+	unsigned int hibernation_mode:1;
+
+	/* One of the zones is ready for compaction */
+	unsigned int compaction_ready:1;
+
+	/* There is easily reclaimable cold cache in the current node */
+	unsigned int cache_trim_mode:1;
+
+	/* The file folios on the current node are dangerously low */
+	unsigned int file_is_tiny:1;
+
+	/* Always discard instead of demoting to lower tier memory */
+	unsigned int no_demotion:1;
+
+#ifdef CONFIG_LRU_GEN
+	/* help kswapd make better choices among multiple memcgs */
+	unsigned int memcgs_need_aging:1;
+	unsigned long last_reclaimed;
+#endif
+
+	/* Allocation order */
+	s8 order;
+
+	/* Scan (total_size >> priority) pages at once */
+    // Note: priority is a right-shift amount, so the smaller it is, the more pages are scanned
+	s8 priority;
+
+	/* The highest zone to isolate folios for reclaim from */
+	s8 reclaim_idx;
+
+	/* This context's GFP mask */
+	gfp_t gfp_mask;
+
+	/* Incremented by the number of inactive pages that were scanned */
+	unsigned long nr_scanned;
+
+	/* Number of pages freed so far during a call to shrink_zones() */
+	unsigned long nr_reclaimed;
+
+	struct {
+		unsigned int dirty;
+		unsigned int unqueued_dirty;
+		unsigned int congested;
+		unsigned int writeback;
+		unsigned int immediate;
+		unsigned int file_taken;
+		unsigned int taken;
+	} nr;
+
+	/* for recording the reclaimed slab by now */
+	struct reclaim_state reclaim_state;
+};
+```
+
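+As the structure shows, it carries the number of pages to reclaim, the number scanned so far, the number reclaimed so far, and so on. The reclaim function `shrink_zones` is called in a loop; after each call one of the following happens:
+1. The number of reclaimed pages has reached the number requested, and reclaim returns successfully.
+2. Not enough pages were reclaimed, so the next pass scans more pages.
+3. Even scanning the maximum number of pages does not find enough memory. The kernel can retry once without skipping active pages (they are skipped by default); if that also fails, it gives up and reports failure.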
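+
+The `total_size >> priority` rule and the retry loop can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: `DEF_PRIORITY` is assumed to be 12, and `scan_window`, `priority_needed`, and the fixed `reclaim_ratio` are hypothetical names and rules invented here.
+
+```c
+#include <assert.h>
+
+#define DEF_PRIORITY 12	/* starting priority; assumed to match the kernel's value */
+
+/* Pages examined in one pass over an LRU list holding lru_size pages */
+unsigned long scan_window(unsigned long lru_size, int priority)
+{
+	assert(priority >= 0 && priority <= DEF_PRIORITY);
+	return lru_size >> priority;
+}
+
+/*
+ * Toy model of the retry loop: every pass scans a larger window, and we
+ * pretend that one page out of every reclaim_ratio scanned pages can be
+ * freed (a made-up rule; reclaim_ratio must be non-zero). Returns the
+ * priority at which the request was met, or -1 if even priority 0 was not
+ * enough.
+ */
+int priority_needed(unsigned long lru_size, unsigned long nr_to_reclaim,
+		    unsigned long reclaim_ratio)
+{
+	unsigned long nr_reclaimed = 0;
+	int priority;
+
+	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+		nr_reclaimed += scan_window(lru_size, priority) / reclaim_ratio;
+		if (nr_reclaimed >= nr_to_reclaim)
+			return priority;
+	}
+	return -1;
+}
+```
+
+With a 4096-page list, a request for 32 pages, and one reclaimable page per 4 scanned, `priority_needed(4096, 32, 4)` returns 5 in this model: the early passes scan only a handful of pages, and the window doubles each time priority drops by one.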
+
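+As for the reclaim function itself, older kernels call `shrink_zones`, which in turn calls `shrink_node`; newer kernels call the latter directly, so reclaim works per node. Pages eligible for scanning sit on LRU lists. Where those lists live also differs between versions: older kernels keep them in the zone (`zone.lruvec`), newer ones in `pglist_data.__lruvec`.
+
+`lruvec` is essentially an array of lists; each element is a list of pages to be scanned. There are several kinds of list: anonymous pages and file pages, each further split into active and inactive (whether the pages were accessed recently), and each list is itself kept in LRU order. All pages on these lists were allocated by the kernel; applications and drivers use them without knowing anything about them. Reclaim is therefore free to remove such a page from physical memory temporarily, as long as the contents are correct the next time it is accessed.
+
+Not all memory can go on the LRU lists, though. A driver, for example, may allocate memory with `alloc_pages` and release it with `free_pages` when done. That memory still comes from the buddy system, but its lifetime is owned by the module, so the kernel cannot reclaim it directly. The kernel does leave modules a hook, the `shrinker`: a module that implements a reclaim callback and registers it with the kernel will be asked to free some memory during reclaim.
+
+The logic of `shrink_lruvec` is fairly clear:
+1. Compute how many pages to scan on each LRU list.
+2. Call `shrink_list` in a loop, handling one type of LRU list per call.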
+
+```c
+// code from kernel 6.2
+static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+	unsigned long nr[NR_LRU_LISTS];
+	unsigned long targets[NR_LRU_LISTS];
+	unsigned long nr_to_scan;
+	enum lru_list lru;
+	unsigned long nr_reclaimed = 0;
+	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	bool proportional_reclaim;
+	struct blk_plug plug;
+
+	if (lru_gen_enabled()) {
+		lru_gen_shrink_lruvec(lruvec, sc);
+		return;
+	}
+
+    // Compute the scan count for each LRU list
+	get_scan_count(lruvec, sc, nr);
+
+	/* Record the original scan target for proportional adjustments later */
+	memcpy(targets, nr, sizeof(nr));
+
+	/*
+	 * Global reclaiming within direct reclaim at DEF_PRIORITY is a normal
+	 * event that can occur when there is little memory pressure e.g.
+	 * multiple streaming readers/writers. Hence, we do not abort scanning
+	 * when the requested number of pages are reclaimed when scanning at
+	 * DEF_PRIORITY on the assumption that the fact we are direct
+	 * reclaiming implies that kswapd is not keeping up and it is best to
+	 * do a batch of work at once. For memcg reclaim one check is made to
+	 * abort proportional reclaim if either the file or anon lru has already
+	 * dropped to zero at the first pass.
+	 */
+	proportional_reclaim = (!cgroup_reclaim(sc) && !current_is_kswapd() &&
+				sc->priority == DEF_PRIORITY);
+
+	blk_start_plug(&plug);
+	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
+					nr[LRU_INACTIVE_FILE]) {
+		unsigned long nr_anon, nr_file, percentage;
+		unsigned long nr_scanned;
+
+        // Walk every kind of LRU list
+		for_each_evictable_lru(lru) {
+			if (nr[lru]) {
+				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
+				nr[lru] -= nr_to_scan;
+
+                // shrink_list handles inactive and active lists separately
+				nr_reclaimed += shrink_list(lru, nr_to_scan,
+							    lruvec, sc);
+			}
+		}
+
+		cond_resched();
+
+		if (nr_reclaimed < nr_to_reclaim || proportional_reclaim)
+			continue;
+
+		/*
+		 * For kswapd and memcg, reclaim at least the number of pages
+		 * requested. Ensure that the anon and file LRUs are scanned
+		 * proportionally what was requested by get_scan_count(). We
+		 * stop reclaiming one LRU and reduce the amount scanning
+		 * proportional to the original scan target.
+		 */
+		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
+
+		/*
+		 * It's just vindictive to attack the larger once the smaller
+		 * has gone to zero.  And given the way we stop scanning the
+		 * smaller below, this makes sure that we only make one nudge
+		 * towards proportionality once we've got nr_to_reclaim.
+		 */
+		if (!nr_file || !nr_anon)
+			break;
+
+		if (nr_file > nr_anon) {
+			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
+						targets[LRU_ACTIVE_ANON] + 1;
+			lru = LRU_BASE;
+			percentage = nr_anon * 100 / scan_target;
+		} else {
+			unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
+						targets[LRU_ACTIVE_FILE] + 1;
+			lru = LRU_FILE;
+			percentage = nr_file * 100 / scan_target;
+		}
+
+		/* Stop scanning the smaller of the LRU */
+		nr[lru] = 0;
+		nr[lru + LRU_ACTIVE] = 0;
+
+		/*
+		 * Recalculate the other LRU scan count based on its original
+		 * scan target and the percentage scanning already complete
+		 */
+		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
+		nr_scanned = targets[lru] - nr[lru];
+		nr[lru] = targets[lru] * (100 - percentage) / 100;
+		nr[lru] -= min(nr[lru], nr_scanned);
+
+		lru += LRU_ACTIVE;
+		nr_scanned = targets[lru] - nr[lru];
+		nr[lru] = targets[lru] * (100 - percentage) / 100;
+		nr[lru] -= min(nr[lru], nr_scanned);
+	}
+	blk_finish_plug(&plug);
+	sc->nr_reclaimed += nr_reclaimed;
+
+	/*
+	 * Even if we did not try to evict anon pages at all, we want to
+	 * rebalance the anon lru active/inactive ratio.
+	 */
+	if (can_age_anon_pages(lruvec_pgdat(lruvec), sc) &&
+	    inactive_is_low(lruvec, LRU_INACTIVE_ANON))
+		shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
+				   sc, LRU_ACTIVE_ANON);
+}
+```
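+
+The proportional adjustment at the bottom of the loop is easier to follow with numbers. The sketch below lifts just that arithmetic into a stand-alone function. The enum is a local copy of the four evictable lists in the order the loop visits them, and `rebalance_scan_counts` and `min_ul` are hypothetical names introduced for illustration.
+
+```c
+enum lru_list {
+	LRU_INACTIVE_ANON,
+	LRU_ACTIVE_ANON,
+	LRU_INACTIVE_FILE,
+	LRU_ACTIVE_FILE,
+	NR_LRU_LISTS
+};
+
+static unsigned long min_ul(unsigned long a, unsigned long b)
+{
+	return a < b ? a : b;
+}
+
+/*
+ * Stand-alone copy of the arithmetic at the bottom of the while loop.
+ * targets[] holds the original scan counts, nr[] what is still left to scan.
+ */
+void rebalance_scan_counts(unsigned long nr[NR_LRU_LISTS],
+			   const unsigned long targets[NR_LRU_LISTS])
+{
+	unsigned long nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+	unsigned long nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
+	unsigned long percentage;
+	int smaller, larger, lru;
+
+	if (!nr_file || !nr_anon)
+		return;		/* the kernel breaks out of the loop here */
+
+	if (nr_file > nr_anon) {
+		smaller = LRU_INACTIVE_ANON;
+		larger = LRU_INACTIVE_FILE;
+		percentage = nr_anon * 100 /
+			(targets[LRU_INACTIVE_ANON] + targets[LRU_ACTIVE_ANON] + 1);
+	} else {
+		smaller = LRU_INACTIVE_FILE;
+		larger = LRU_INACTIVE_ANON;
+		percentage = nr_file * 100 /
+			(targets[LRU_INACTIVE_FILE] + targets[LRU_ACTIVE_FILE] + 1);
+	}
+
+	/* Stop scanning the type with less work left */
+	nr[smaller] = 0;
+	nr[smaller + 1] = 0;
+
+	/* Shrink the other type to the fraction the smaller one had completed */
+	for (lru = larger; lru <= larger + 1; lru++) {
+		unsigned long nr_scanned = targets[lru] - nr[lru];
+
+		nr[lru] = targets[lru] * (100 - percentage) / 100;
+		nr[lru] -= min_ul(nr[lru], nr_scanned);
+	}
+}
+```
+
+Worked example: with targets of 100 anon and 1000 + 200 file pages, one pass of 32 pages per list leaves `nr = {68, 0, 968, 168}`. The anon side still has 67% of its target left, so anon scanning stops and each file list is cut to 33% of its target minus the 32 pages already scanned, giving `nr = {0, 0, 298, 34}`.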
+
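+Some details on how pages are added to and removed from the LRU lists:
+1. Accessing a page can move it between the active and inactive lists. As an optimization these moves are batched in an LRU cache. Before the inactive list is scanned, `lru_add_drain` must drain that cache and put the pages on their LRU lists, so that none are missed.
+2. Page isolation: `isolate_lru_folios`/`isolate_lru_pages` removes the pages about to be scanned from their `LRU list` and puts them on the `folio_list`, so they will not be scanned again and cannot be reclaimed twice.
+
+> **Performance note**: why is isolation done by removal rather than, say, by setting a flag? One way to see it: removal keeps the lock granularity small, and a thread that loses the race can simply skip ahead and isolate other pages. Gathering the pages to scan on folio_list also makes the later batch processing faster.
+
+folio (the word originally means a folio-sized sheet) looks like a concept that appeared out of nowhere, but to a large extent it is a compound page: a folio covers a power-of-two number of pages. The folio concept is gradually replacing page, but for now page, compound page, and folio all coexist in the kernel. Most fields of folio and page are the same. That is why newer kernels use `isolate_lru_folios`: the code is migrating to folios.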
+
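+Once enough pages are isolated, reclaim can start. `shrink_folio_list` is about 500 lines long (with plenty of comments), so the code is not pasted here; see [the source](https://elixir.bootlin.com/linux/v6.2/source/mm/vmscan.c#L1651). Below is a summary of the steps the book identifies (it counts 10), all executed inside a loop:
+1. Take the next folio from folio_list (the list of isolated pages).
+2. If the folio is under writeback, wait for it to finish (see step 6) and put it back at the tail of folio_list to be handled in a later iteration.
+3. Check how active the folio is. The active/inactive dimension is decided by **the number of mappings of the physical page that were accessed** since the last scan (the number of PTEs that were accessed, not the number of accesses). That is the metric because the MMU hardware only provides an Accessed (A) flag in each PTE, not an access count. For the referenced/unreferenced dimension, the folio counts as referenced as soon as one PTE has the Accessed flag set.
+4. Start trying to reclaim. If the folio can be swapped out, prepare for that.
+5. Try to unmap the folio. After a successful unmap, the folio's existing virtual-to-physical mappings are invalid. This **requires** walking backwards from the physical page to every PTE that maps it and invalidating each one.
+6. Handle folios with the dirty flag: the data has to be written back (see step 2).
+7. The folio is now fully ready for reclaim; put it on free_folios.
+8. Handle folios that could not be reclaimed and have become active.
+9. Put the folios from the previous step back on folio_list, which is returned to the caller.
+
+Wrap-up: as the end of the list above shows, when the function returns, folio_list holds folios that were not reclaimed. Several cases are possible:
+1. Reclaim failed (the folio was under writeback, say, or one of the reclaim steps failed): the folio goes back on the inactive LRU.
+2. The folio was accessed: it goes back on the inactive LRU.
+3. The folio is actively accessed: it is promoted and inserted into the active LRU.
+
+This scan runs multiple times, anonymous pages first and file pages second, and within each type inactive first, active second. The loop steps differ by type: active folios, for example, are never reclaimed directly; at most they are demoted to inactive, so the reclaim steps do not apply to them.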
+
+![folio-active](/images/book/linux-pic/folio-active.png)
+
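+#### Reverse mapping
+
+The discussion above skipped one question: given a page (folio/page), how do we find the PTEs that map it? Early kernels kept a linked list of all PTEs mapping each page, which is clearly wasteful. A better approach reuses an existing field, `folio->mapping`/`page->mapping`. Anonymous and file mappings are handled differently. The definition of folio is pasted here for reference.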
+
+```c
+/**
+ * struct folio - Represents a contiguous set of bytes.
+ * @flags: Identical to the page flags.
+ * @lru: Least Recently Used list; tracks how recently this folio was used.
+ * @mlock_count: Number of times this folio has been pinned by mlock().
+ * @mapping: The file this page belongs to, or refers to the anon_vma for
+ *    anonymous memory.
+ * @index: Offset within the file, in units of pages.  For anonymous memory,
+ *    this is the index from the beginning of the mmap.
+ * @private: Filesystem per-folio data (see folio_attach_private()).
+ *    Used for swp_entry_t if folio_test_swapcache().
+ * @_mapcount: Do not access this member directly.  Use folio_mapcount() to
+ *    find out how many times this folio is mapped by userspace.
+ * @_refcount: Do not access this member directly.  Use folio_ref_count()
+ *    to find how many references there are to this folio.
+ * @memcg_data: Memory Control Group data.
+ * @_flags_1: For large folios, additional page flags.
+ * @_head_1: Points to the folio.  Do not use.
+ * @_folio_dtor: Which destructor to use for this folio.
+ * @_folio_order: Do not use directly, call folio_order().
+ * @_compound_mapcount: Do not use directly, call folio_entire_mapcount().
+ * @_subpages_mapcount: Do not use directly, call folio_mapcount().
+ * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
+ * @_folio_nr_pages: Do not use directly, call folio_nr_pages().
+ * @_flags_2: For alignment.  Do not use.
+ * @_head_2: Points to the folio.  Do not use.
+ * @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h.
+ * @_hugetlb_cgroup: Do not use directly, use accessor in hugetlb_cgroup.h.
+ * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
+ * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
+ *
+ * A folio is a physically, virtually and logically contiguous set
+ * of bytes.  It is a power-of-two in size, and it is aligned to that
+ * same power-of-two.  It is at least as large as %PAGE_SIZE.  If it is
+ * in the page cache, it is at a file offset which is a multiple of that
+ * power-of-two.  It may be mapped into userspace at an address which is
+ * at an arbitrary page offset, but its kernel virtual address is aligned
+ * to its size.
+ */
+struct folio {
+	/* private: don't document the anon union */
+	union {
+		struct {
+	/* public: */
+			unsigned long flags;
+			union {
+				struct list_head lru;
+	/* private: avoid cluttering the output */
+				struct {
+					void *__filler;
+	/* public: */
+					unsigned int mlock_count;
+	/* private: */
+				};
+	/* public: */
+			};
+			struct address_space *mapping;
+			pgoff_t index;
+			void *private;
+			atomic_t _mapcount;
+			atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+			unsigned long memcg_data;
+#endif
+	/* private: the union with struct page is transitional */
+		};
+		struct page page;
+	};
+	union {
+		struct {
+			unsigned long _flags_1;
+			unsigned long _head_1;
+			unsigned char _folio_dtor;
+			unsigned char _folio_order;
+			atomic_t _compound_mapcount;
+			atomic_t _subpages_mapcount;
+			atomic_t _pincount;
+#ifdef CONFIG_64BIT
+			unsigned int _folio_nr_pages;
+#endif
+		};
+		struct page __page_1;
+	};
+	union {
+		struct {
+			unsigned long _flags_2;
+			unsigned long _head_2;
+			void *_hugetlb_subpool;
+			void *_hugetlb_cgroup;
+			void *_hugetlb_cgroup_rsvd;
+			void *_hugetlb_hwpoison;
+		};
+		struct page __page_2;
+	};
+};
+```
+
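+1. mapping for anonymous mappings
+   
+   As the code comment says, for anonymous memory the mapping field holds the address of an anon_vma. From a vma we can compute the virtual address at which the folio is mapped and use it to locate the PTE. All vmas that map the same folio are linked together through the `anon_vma_chain` structure (avc), so walking them yields every virtual address, and hence every PTE. The figures below illustrate this; the real details are considerably more involved.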
+
+   ![anon_vma_chain](/images/book/linux-pic/anon-vma-chain.png)
+   ![anon_vma_chain with cow](/images/book/linux-pic/anon-vma-chain-cow.png)
+   
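+2. mapping for file mappings
+   
+   For file mappings the mapping field is simpler: it can be used to walk the vmas directly, with no `anon_vma` involved.
+
+> The vma structures used for reverse mapping are organized as an interval tree (keyed by page-offset ranges, i.e. `vm_pgoff`, rather than by linear addresses), so lookups are reasonably efficient. For more on reverse mapping, read the code around `rmap_walk_control`.
+
+> Pages of a file mapping can also be anonymous pages. This is the MAP_ANONYMOUS|MAP_SHARED case, for which the kernel allocates a virtual file. When such pages are written out, they can only go to the swap area.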
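+
+The step "from the vma, compute the virtual address" is a short piece of arithmetic. The sketch below assumes 4 KiB pages and uses a made-up `toy_vma` struct holding only the three `vm_area_struct` fields involved; `toy_vma_address` is a hypothetical name, and the real kernel helper handles more cases (large folios, for instance).
+
+```c
+#define PAGE_SHIFT 12	/* 4 KiB pages assumed */
+
+/* Toy stand-in for the three vm_area_struct fields the calculation needs */
+struct toy_vma {
+	unsigned long vm_start;	/* first virtual address of the area */
+	unsigned long vm_end;	/* first address past the area */
+	unsigned long vm_pgoff;	/* page offset that vm_start corresponds to */
+};
+
+/*
+ * Virtual address at which page number pgoff is mapped in vma, or 0 if the
+ * page lies outside the area.
+ */
+unsigned long toy_vma_address(const struct toy_vma *vma, unsigned long pgoff)
+{
+	unsigned long address;
+
+	if (pgoff < vma->vm_pgoff)
+		return 0;
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	return address < vma->vm_end ? address : 0;
+}
+```
+
+For a vma covering `0x40000000` to `0x40004000` with `vm_pgoff` 16, page 18 maps at `0x40002000`. Running the same calculation over every vma reachable from `mapping` gives all the addresses, and therefore all the PTEs, that need to be visited.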
+
+
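+## Filesystems
+
+### VFS
+
+First, get familiar with a few basic structures.
+
+| Concept | Data structure |
+| ------| ------ |
+| File system | super_block |
+| The file itself | inode |
+| File entry | dentry |
+| File contents | file |
+
+super_block is the in-memory abstraction of a physical file system. Every formatted disk partition is an independent file system. super_blocks are linked and managed by type through `file_system_type`.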
+
+![file-system-type&super-block](/images/book/linux-pic/file-system-type&super-block.png)
+
+inode is the core structure of the whole file system; its source definition is pasted here.
+```c
+/*
+ * Keep mostly read-only and often accessed (especially for
+ * the RCU path lookup and 'stat' data) fields at the beginning
+ * of the 'struct inode'
+ */
+//  An inode does not store file contents; it stores the means of accessing them
+struct inode {
+    // File type and access mode
+	umode_t			i_mode;
+	unsigned short		i_opflags;
+	kuid_t			i_uid;
+	kgid_t			i_gid;
+	unsigned int		i_flags;
+
+#ifdef CONFIG_FS_POSIX_ACL
+	struct posix_acl	*i_acl;
+	struct posix_acl	*i_default_acl;
+#endif
+    // Operations supported by the inode
+	const struct inode_operations	*i_op;
+    // The abbreviation sb always means super block
+	struct super_block	*i_sb;
+	struct address_space	*i_mapping;
+
+#ifdef CONFIG_SECURITY
+	void			*i_security;
+#endif
+
+	/* Stat data, not accessed from path walking */
+    // inode number; should be unique within one file system
+    // The kernel maintains a set of hash chains; the chain an inode lives on is computed from hash(super_block, i_ino)
+    // That set of hash chains is inode_hashtable
+	unsigned long		i_ino;
+	/*
+	 * Filesystems may only read i_nlink directly.  They shall use the
+	 * following functions for modification:
+	 *
+	 *    (set|clear|inc|drop)_nlink
+	 *    inode_(inc|dec)_link_count
+	 */
+	union {
+		const unsigned int i_nlink;
+		unsigned int __i_nlink;
+	};
+	dev_t			i_rdev;
+	loff_t			i_size;
+    // access, modify, and change times
+	struct timespec64	i_atime;
+	struct timespec64	i_mtime;
+	struct timespec64	i_ctime;
+	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
+	unsigned short          i_bytes;
+	u8			i_blkbits;
+	u8			i_write_hint;
+	blkcnt_t		i_blocks;
+
+#ifdef __NEED_I_SIZE_ORDERED
+	seqcount_t		i_size_seqcount;
+#endif
+
+	/* Misc */
+	unsigned long		i_state;
+	struct rw_semaphore	i_rwsem;
+
+	unsigned long		dirtied_when;	/* jiffies of first dirtying */
+	unsigned long		dirtied_time_when;
+
+    // Links the inode into the hash table
+	struct hlist_node	i_hash;
+	struct list_head	i_io_list;	/* backing dev IO list */
+#ifdef CONFIG_CGROUP_WRITEBACK
+	struct bdi_writeback	*i_wb;		/* the associated cgroup wb */
+
+	/* foreign inode detection, see wbc_detach_inode() */
+	int			i_wb_frn_winner;
+	u16			i_wb_frn_avg_time;
+	u16			i_wb_frn_history;
+#endif
+	struct list_head	i_lru;		/* inode LRU list */
+
+    // Links the inode into the super block's list
+	struct list_head	i_sb_list;
+	struct list_head	i_wb_list;	/* backing dev writeback list */
+	union {
+        // Head of the list of dentries that are hard links to this file
+		struct hlist_head	i_dentry;
+		struct rcu_head		i_rcu;
+	};
+	atomic64_t		i_version;
+	atomic64_t		i_sequence; /* see futex */
+	atomic_t		i_count;
+	atomic_t		i_dio_count;
+	atomic_t		i_writecount;
+#if defined(CONFIG_IMA) || defined(CONFIG_FILE_LOCKING)
+	atomic_t		i_readcount; /* struct files open RO */
+#endif
+	union {
+        // Operations on file contents; note the difference from i_op
+		const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
+		void (*free_inode)(struct inode *);
+	};
+	struct file_lock_context	*i_flctx;
+	struct address_space	i_data;
+	struct list_head	i_devices;
+	union {
+		struct pipe_inode_info	*i_pipe;
+		struct cdev		*i_cdev;
+        // Path of the link target
+		char			*i_link;
+		unsigned		i_dir_seq;
+	};
+
+	__u32			i_generation;
+
+#ifdef CONFIG_FSNOTIFY
+	__u32			i_fsnotify_mask; /* all events this inode cares about */
+	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
+#endif
+
+#ifdef CONFIG_FS_ENCRYPTION
+	struct fscrypt_info	*i_crypt_info;
+#endif
+
+#ifdef CONFIG_FS_VERITY
+	struct fsverity_info	*i_verity_info;
+#endif
+
+	void			*i_private; /* fs or device private pointer */
+} __randomize_layout;
+```
+
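+Comments have been added for the key fields. Also note that in file-system terms, modify means a change to the file's **contents**, while change means a change to the file itself, i.e. its inode metadata such as permissions, owner, or link count.
+
+dentry assists inode in representing the hierarchy between files.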
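+
+The modify/change distinction can be observed from user space. The sketch below assumes a POSIX system with a writable `/tmp`; `chmod_leaves_mtime_alone` is a hypothetical helper name. It changes only the permission bits of a temporary file and checks that mtime is untouched.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+/*
+ * Hypothetical helper: create a temporary file, change only its permission
+ * bits, and report whether mtime stayed the same while ctime did not move
+ * backwards. Returns 1 if so, 0 if not, -1 on error.
+ */
+int chmod_leaves_mtime_alone(void)
+{
+	char path[] = "/tmp/ctime-demo-XXXXXX";
+	struct stat before, after;
+	int fd = mkstemp(path);
+	int ok = -1;
+
+	if (fd < 0)
+		return -1;
+	if (write(fd, "x", 1) == 1 &&		/* content write: mtime and ctime */
+	    fstat(fd, &before) == 0 &&
+	    fchmod(fd, 0600 | S_IRGRP) == 0 &&	/* metadata change only */
+	    fstat(fd, &after) == 0) {
+		int mtime_same =
+			before.st_mtim.tv_sec == after.st_mtim.tv_sec &&
+			before.st_mtim.tv_nsec == after.st_mtim.tv_nsec;
+		int ctime_not_older =
+			after.st_ctim.tv_sec > before.st_ctim.tv_sec ||
+			(after.st_ctim.tv_sec == before.st_ctim.tv_sec &&
+			 after.st_ctim.tv_nsec >= before.st_ctim.tv_nsec);
+
+		ok = mtime_same && ctime_not_older;
+	}
+	close(fd);
+	unlink(path);
+	return ok;
+}
+```
+
+The check on ctime is deliberately weak (not older, rather than strictly newer) because file timestamps have limited granularity and two operations in quick succession may receive the same value.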
+
+```c
+struct dentry {
+	/* RCU lookup touched fields */
+	unsigned int d_flags;		/* protected by d_lock */
+	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
+	struct hlist_bl_node d_hash;	/* lookup hash list */
+	struct dentry *d_parent;	/* parent directory */
+	struct qstr d_name;
+	struct inode *d_inode;		/* Where the name belongs to - NULL is
+					 * negative */
+
+    // Inline storage for short names; a name that does not fit is stored separately and reached through d_name
+	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+
+	/* Ref lookup also touches following */
+	struct lockref d_lockref;	/* per-dentry lock and refcount */
+	const struct dentry_operations *d_op;
+	struct super_block *d_sb;	/* The root of the dentry tree */
+	unsigned long d_time;		/* used by d_revalidate */
+	void *d_fsdata;			/* fs-specific data */
+
+	union {
+		struct list_head d_lru;		/* LRU list */
+		wait_queue_head_t *d_wait;	/* in-lookup ones only */
+	};
+	struct list_head d_child;	/* child of parent list */
+	struct list_head d_subdirs;	/* our children */
+	/*
+	 * d_alias and d_rcu can share memory
+	 */
+	union {
+		struct hlist_node d_alias;	/* inode alias list */
+		struct hlist_bl_node d_in_lookup_hash;	/* only for in-lookup ones */
+	 	struct rcu_head d_rcu;
+	} d_u;
+} __randomize_layout;
+```
+
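+This is a good place to revisit the difference between soft links and hard links. With dentries we can walk to parent and child directories, and the kernel also maintains `dentry_hashtable` for faster lookups. But **dentries do not exist from the start**: they are created on demand as directories are accessed. We are still really reading inodes, learning the hierarchy of the file system from them, and storing it in dentries. That is why dentry was described above as assisting inode in maintaining directory information.
+
+Finally, let's look at `file`.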
+
+```c
+struct file {
+	union {
+		struct llist_node	f_llist;
+		struct rcu_head 	f_rcuhead;
+		unsigned int 		f_iocb_flags;
+	};
+	struct path		f_path;
+    // The inode this file belongs to
+	struct inode		*f_inode;	/* cached value */
+    // File operations
+	const struct file_operations	*f_op;
+
+	/*
+	 * Protects f_ep, f_flags.
+	 * Must not be taken from IRQ context.
+	 */
+	spinlock_t		f_lock;
+    // Reference count
+	atomic_long_t		f_count;
+	unsigned int 		f_flags;
+	fmode_t			f_mode;
+	struct mutex		f_pos_lock;
+    // Current position
+	loff_t			f_pos;
+	struct fown_struct	f_owner;
+	const struct cred	*f_cred;
+	struct file_ra_state	f_ra;
+
+	u64			f_version;
+#ifdef CONFIG_SECURITY
+	void			*f_security;
+#endif
+	/* needed for tty driver, and maybe others */
+	void			*private_data;
+
+#ifdef CONFIG_EPOLL
+	/* Used by fs/eventpoll.c to link all the hooks to this file */
+	struct hlist_head	*f_ep;
+#endif /* #ifdef CONFIG_EPOLL */
+	struct address_space	*f_mapping;
+	errseq_t		f_wb_err;
+	errseq_t		f_sb_err; /* for syncfs */
+} __randomize_layout
+  __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
+```
+
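+The super_block structure is rather long, so here is a [link](https://elixir.bootlin.com/linux/v6.2.16/source/include/linux/fs.h#L1473) to read it directly.
+
+#### Mounting a file system
+
+File systems fall into three categories: disk-based, memory-based, and network file systems. From a design point of view, they can instead be split into the virtual file system (VFS) and the actual file systems mounted on the VFS (ext4, sysfs, and so on).
+
+The call chain of a mount is shown below (do_mount is the function that ultimately does the work for the system call).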
+
+![do mount](/images/book/linux-pic/do-mount.png)
+
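+> The my_ and xx_ prefixes here are placeholders: there are many file systems, so the function actually called varies.
+
+When a new file system is mounted, the main steps in `do_new_mount` are:
+1. Find the file_system_type matching fstype.
+2. Initialize the `fs_context` (fc from here on). Initialization is done by `file_system_type->init_fs_context`, whose main job is to set `fc->ops`, the operations for this file system.
+3. `vfs_get_tree` calls back into `fc->ops->get_tree` to obtain the file tree. This yields the super_block for the file system (one is created if none exists). The s_root field of the super block is a dentry pointer for the root of the file system, whose name is usually the familiar `/`.
+    1. Within this step, once the super_block exists, the file system supplies its own way of obtaining an inode, `xx_get_inode`; the inode is looked up or created inside that file system.
+4. `do_new_mount_fc` fills in the mount structure from the super block and root obtained above.
+    1. `lock_mount` finds or creates the `mountpoint`, and `do_add_mount` and `graft_tree` establish the mount relationship. To make child mounts easy to find, a child mount's slot in the hash table is computed from its parent mount's information.
+
+A few points are worth understanding:
+1. A super_block is created on demand: if no existing super_block can serve the new mount (different file system type, conflicting mount options, and so on), a new one is created.
+2. A path can be mounted on several times. A later mount hides the file systems mounted on that path earlier, and they reappear after `umount`. The book's analogy: pass through every wall in turn (recursively check the mounts on the given path), then build a new wall behind the last one (the new mount covers all earlier ones). A child mount covers its parent mount.
+3. Continuing from point 2: path resolution goes top-down, but the mountpoint lookup goes bottom-up: given a path, find the deepest mount that covers it. Put differently, mnt_mountpoint is not "the target path of the mount" but "the directory in the parent file system that gets covered".
+
+The figure below shows the state after ext4, sysfs, and proc are mounted on /test in that order.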
+![mount](/images/book/linux-pic/mount.png)
+
+
+The mount and vfsmount structures are shown below, with comments on the points the book emphasizes.
+```c
+struct mount {
+    // Links this mount object into the hash table
+	struct hlist_node mnt_hash;
+    // Parent mount; needed because mounts can nest
+	struct mount *mnt_parent;
+    // Mountpoint
+	struct dentry *mnt_mountpoint;
+    // Embedded vfsmount
+	struct vfsmount mnt;
+	union {
+		struct rcu_head mnt_rcu;
+		struct llist_node mnt_llist;
+	};
+#ifdef CONFIG_SMP
+	struct mnt_pcp __percpu *mnt_pcp;
+#else
+	int mnt_count;
+	int mnt_writers;
+#endif
+    // Child mounts are recorded as well
+	struct list_head mnt_mounts;	/* list of children, anchored here */
+	struct list_head mnt_child;	/* and going through their mnt_child */
+    // Links this object into the super block's list
+	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
+	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
+	struct list_head mnt_list;
+	struct list_head mnt_expire;	/* link in fs-specific expiry list */
+	struct list_head mnt_share;	/* circular list of shared mounts */
+	struct list_head mnt_slave_list;/* list of slave mounts */
+	struct list_head mnt_slave;	/* slave list entry */
+	struct mount *mnt_master;	/* slave is on master->mnt_slave_list */
+	struct mnt_namespace *mnt_ns;	/* containing namespace */
+	struct mountpoint *mnt_mp;	/* where is it mounted */
+	union {
+		struct hlist_node mnt_mp_list;	/* list mounts with the same mountpoint */
+		struct hlist_node mnt_umount;
+	};
+	struct list_head mnt_umounting; /* list entry for umount propagation */
+#ifdef CONFIG_FSNOTIFY
+	struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
+	__u32 mnt_fsnotify_mask;
+#endif
+	int mnt_id;			/* mount identifier */
+	int mnt_group_id;		/* peer group identifier */
+	int mnt_expiry_mark;		/* true if marked for expiry */
+	struct hlist_head mnt_pins;
+	struct hlist_head mnt_stuck_children;
+} __randomize_layout;
+
+struct vfsmount {
+    // The root dentry returned by xx_get_tree, as described above
+	struct dentry *mnt_root;	/* root of the mounted tree */
+    // The super block, likewise returned by xx_get_tree
+	struct super_block *mnt_sb;	/* pointer to superblock */
+	int mnt_flags;
+	struct mnt_idmap *mnt_idmap;
+} __randomize_layout;
+
+```
+
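+#### File lookup
+
+Path lookups share a common flow: set the starting point, walk the intermediate components, handle the target file or path. Two structures are used along the way: `metaidata` stores the configuration of the lookup, and `nameidata` is a helper for each round of the lookup (mainly the path, dentry, and file type found in that round).
+
+1. Set the starting point, `path_init`: the start is given by `metaidata->dfd` (an int that is either AT_FDCWD, meaning the current directory, or a file descriptor fd). Relative and absolute paths are handled separately to set the starting point and its inode (`nd->path.dentry->d_inode`).
+2. Walk the intermediate components, `link_path_walk`: a loop that splits the path on `/`. Recall that dentries are built up gradually as they are used, so `lookup_fast` checks whether the dentry we need already exists; if not, `lookup_slow` has to go down into the file system to find it.
+    1. A dentry found this way cannot always be used directly: as covered under mounting, it may be hidden by a child mount. So the dentry lookup also has to "**pass through walls**".
+    2. Symbolic links need handling too: if the file is a symlink, the lookup takes the link's path instead, appends the remaining path to it, and continues.
+3. Handle the target file or path, `path_lookupat`.
+
+The lookup call flow: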
+![path lookup](/images/book/linux-pic/path-lookup.png)
+
+The difference between the fast and slow lookup: one uses dentries, the other goes down into the file system through `inode->i_op->lookup`.
+![lookup fast/slow](/images/book/linux-pic/lookup-fast-slow.png)
+
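+#### File operations
+> File operations here means operations on the "file" itself, not on its contents.
+
+> Whatever the operation, the VFS only defines the framework; the concrete implementation is up to each file system.
+
+The core logic for hard links lives in `do_linkat` and `vfs_link`, bottoming out in `inode->i_op->link`; symbolic links take the parallel path `do_symlinkat`, `vfs_symlink`, `inode->i_op->symlink`. A soft link creates a new special file (with its own inode) whose content is the target path as a string. A hard link is simply another dentry for the same inode. Some differences follow from this:
+1. Hard links cannot cross file systems, because the inodes of different file systems are independent of each other.
+2. Directories cannot be hard-linked. The main reason is to prevent path cycles: a hard link reuses the same inode, so allowing directory hard links would break the DAG structure of the file system. This is a structural design decision (it is not worth adding bookkeeping to the inode just for this feature). Soft links to directories are allowed because a soft link is only a path redirection (an alias), and following links is normally depth-limited. Another common problem is the meaning of `../`, which becomes ambiguous under a hard link: is it the parent of the current directory or the parent of the link target?
+    
+    From an implementation point of view, a soft-link cycle arises in `vfs_follow_link`, where the kernel can control it, whereas a cycle of hard-linked directories is a static structure that applications would have to defend against themselves, which is not reliable at all.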
+3. 
+
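+The inode-level difference between the two link types can be checked from user space. A minimal sketch, assuming a POSIX system whose `/tmp` supports both hard and symbolic links; `links_behave_as_described` is a hypothetical helper name.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+/*
+ * Hypothetical helper: in a fresh temporary directory create a file, a hard
+ * link to it and a symbolic link to it, then compare inode numbers.
+ * Returns 1 if the hard link shares the inode (and st_nlink is 2) while the
+ * symlink has an inode of its own, 0 if not, -1 on error.
+ */
+int links_behave_as_described(void)
+{
+	char dir[] = "/tmp/link-demo-XXXXXX";
+	char file[64], hard[64], soft[64];
+	struct stat st_file, st_hard, st_soft;
+	FILE *fp;
+	int ok = -1;
+
+	if (!mkdtemp(dir))
+		return -1;
+	snprintf(file, sizeof(file), "%s/file", dir);
+	snprintf(hard, sizeof(hard), "%s/hard", dir);
+	snprintf(soft, sizeof(soft), "%s/soft", dir);
+
+	fp = fopen(file, "w");
+	if (fp) {
+		fclose(fp);
+		if (link(file, hard) == 0 &&	/* new name, same inode */
+		    symlink(file, soft) == 0 &&	/* new inode storing a path */
+		    stat(file, &st_file) == 0 &&
+		    stat(hard, &st_hard) == 0 &&
+		    lstat(soft, &st_soft) == 0)	/* lstat: do not follow the link */
+			ok = st_hard.st_ino == st_file.st_ino &&
+			     st_file.st_nlink == 2 &&
+			     st_soft.st_ino != st_file.st_ino &&
+			     S_ISLNK(st_soft.st_mode);
+	}
+	/* Clean up; calls on paths that were never created simply fail */
+	unlink(soft);
+	unlink(hard);
+	unlink(file);
+	rmdir(dir);
+	return ok;
+}
+```
+
+`stat` on the hard link and on the original name report the same `st_ino`, and the link count rises to 2, whereas `lstat` on the symlink reports a separate inode of type link.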
+
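+## Review questions
+1. The process control block holds the base address of the process's page table. Whose page table maps the virtual memory that holds the process control block itself, and where is that base address stored? How is the chicken-and-egg problem avoided?
+   
+   This touches on kernel boot, the paging mechanism, and bringing up process management.
+
+2. Can physical memory be hot-plugged, and does hot-plugging force the direct-mapping region to be remapped? How is the data in memory that is about to be removed preserved?
+3. How malloc and free work underneath, and how they drive the brk system call.
+   [TODO] Add more here, brk in particular.
+   Heap memory is managed by glibc, which calls brk to change the size of the heap. System calls have a cost, so each request actually grabs more than was asked for. What comes back is a virtual address anyway; a physical mapping is only created when an access triggers a page fault.
+4. Page-fault information is supplied by the CPU, but address translation is the MMU's job. How does the CPU get this information?
+5. The vma structure is an interval tree, yet different vmas have completely different linear address ranges. What exactly are the intervals defined over?
+6. The ban on hard-linking directories really reflects the difference in how inodes and dentries are accessed underneath.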
+
+
+<!-- Reading position: e-book page 151 / print page 139 -->
+
+<!-- The ban on hard-linking directories suggests my understanding of inode and dentry is still not deep enough; added to the review questions -->
+
+<!-- Readable online at https://fliphtml5.com/ytimv/nlep/%E5%9B%BE%E8%A7%A3Linux%E5%86%85%E6%A0%B8%EF%BC%88%E5%9F%BA%E4%BA%8E6.x%EF%BC%89_%28%E5%A7%9C%E4%BA%9A%E5%8D%8E%29_%28Z-Library%29/147/ -->
+
+<!-- https://elixir.bootlin.com/linux/v5.0/source/Documentation/x86/x86_64/mm.txt -->
+
+<!-- https://elixir.bootlin.com/linux/v6.2.16/source -->
\ No newline at end of file
diff --git a/content/post/book/linux-unix-system-devman.md b/content/post/book/linux-unix-system-devman.md
index 4e1cb950..b28e2488 100644
--- a/content/post/book/linux-unix-system-devman.md
+++ b/content/post/book/linux-unix-system-devman.md
@@ -7,7 +7,7 @@ categories:
 tags:
 - Operating systems
 - Linux
-- Under construction
+- Construction paused
 thumbnailImagePosition: left
 thumbnailImage: /images/thumbnail/book/LinuxUnixSystemDev.png
 draft: true
diff --git a/content/post/book/linux-web-io-series.md b/content/post/book/linux-web-io-series.md
index 02ce64c3..e56c52c6 100644
--- a/content/post/book/linux-web-io-series.md
+++ b/content/post/book/linux-web-io-series.md
@@ -17,7 +17,7 @@ thumbnailImage: /images/thumbnail/book/linux-web-io-series.png
 ## Reference books
 | Title | C++ version | Focus |
 | --- | --- | --- |
-|Linux多线程服务端编程：使用muduo C++网络库 | pre-C++11 | Thread safety |
+| Linux多线程服务端编程：使用muduo C++网络库 | pre-C++11 | Thread safety |
 | Linux高性能服务器编程 |  | TCP/IP and I/O models |
 | C++服务器开发精髓 |  |  |
 
diff --git a/deprecated-content/README.md b/deprecated-content/README.md
new file mode 100644
index 00000000..77dbd5a3
--- /dev/null
+++ b/deprecated-content/README.md
@@ -0,0 +1,3 @@
+# Notes
+
+This folder holds deprecated documents. They will generally not be published again and are kept only as an archive.
\ No newline at end of file
diff --git a/content/post/book/apue-1.md b/deprecated-content/apue-1.md
similarity index 100%
rename from content/post/book/apue-1.md
rename to deprecated-content/apue-1.md
diff --git a/content/post/business/computational-advertising.md b/deprecated-content/computational-advertising.md
similarity index 100%
rename from content/post/business/computational-advertising.md
rename to deprecated-content/computational-advertising.md
diff --git a/content/post/Web/fullstack-css.md b/deprecated-content/fullstack-css.md
similarity index 100%
rename from content/post/Web/fullstack-css.md
rename to deprecated-content/fullstack-css.md
diff --git a/content/post/language/python-all-in-one.md b/deprecated-content/python-all-in-one.md
similarity index 100%
rename from content/post/language/python-all-in-one.md
rename to deprecated-content/python-all-in-one.md
diff --git a/content/post/business/recommended-system.md b/deprecated-content/recommended-system.md
similarity index 100%
rename from content/post/business/recommended-system.md
rename to deprecated-content/recommended-system.md
diff --git a/static/images/book/linux-pic/anon-vma-chain-cow.png b/static/images/book/linux-pic/anon-vma-chain-cow.png
new file mode 100644
index 00000000..9bd0d1ee
Binary files /dev/null and b/static/images/book/linux-pic/anon-vma-chain-cow.png differ
diff --git a/static/images/book/linux-pic/anon-vma-chain.png b/static/images/book/linux-pic/anon-vma-chain.png
new file mode 100644
index 00000000..aad13187
Binary files /dev/null and b/static/images/book/linux-pic/anon-vma-chain.png differ
diff --git a/static/images/book/linux-pic/do-mount.png b/static/images/book/linux-pic/do-mount.png
new file mode 100644
index 00000000..a8a0fc37
Binary files /dev/null and b/static/images/book/linux-pic/do-mount.png differ
diff --git a/static/images/book/linux-pic/do_mmap.png b/static/images/book/linux-pic/do_mmap.png
new file mode 100644
index 00000000..e7cb0950
Binary files /dev/null and b/static/images/book/linux-pic/do_mmap.png differ
diff --git a/static/images/book/linux-pic/file-system-type&super-block.png b/static/images/book/linux-pic/file-system-type&super-block.png
new file mode 100644
index 00000000..ea97465f
Binary files /dev/null and b/static/images/book/linux-pic/file-system-type&super-block.png differ
diff --git a/static/images/book/linux-pic/folio-active.png b/static/images/book/linux-pic/folio-active.png
new file mode 100644
index 00000000..19d2ab0c
Binary files /dev/null and b/static/images/book/linux-pic/folio-active.png differ
diff --git a/static/images/book/linux-pic/fork-cow.png b/static/images/book/linux-pic/fork-cow.png
new file mode 100644
index 00000000..ca60e910
Binary files /dev/null and b/static/images/book/linux-pic/fork-cow.png differ
diff --git a/static/images/book/linux-pic/handle_page_fault.png b/static/images/book/linux-pic/handle_page_fault.png
new file mode 100644
index 00000000..b8d37aa2
Binary files /dev/null and b/static/images/book/linux-pic/handle_page_fault.png differ
diff --git a/static/images/book/linux-pic/idt.png b/static/images/book/linux-pic/idt.png
new file mode 100644
index 00000000..d04a4f71
Binary files /dev/null and b/static/images/book/linux-pic/idt.png differ
diff --git a/static/images/book/linux-pic/lookup-fast-slow.png b/static/images/book/linux-pic/lookup-fast-slow.png
new file mode 100644
index 00000000..6fe55bb0
Binary files /dev/null and b/static/images/book/linux-pic/lookup-fast-slow.png differ
diff --git a/static/images/book/linux-pic/mmio.png b/static/images/book/linux-pic/mmio.png
new file mode 100644
index 00000000..96d62364
Binary files /dev/null and b/static/images/book/linux-pic/mmio.png differ
diff --git a/static/images/book/linux-pic/mount.png b/static/images/book/linux-pic/mount.png
new file mode 100644
index 00000000..e689fe26
Binary files /dev/null and b/static/images/book/linux-pic/mount.png differ
diff --git a/static/images/book/linux-pic/node-zone.png b/static/images/book/linux-pic/node-zone.png
new file mode 100644
index 00000000..0fdf56aa
Binary files /dev/null and b/static/images/book/linux-pic/node-zone.png differ
diff --git a/static/images/book/linux-pic/path-lookup.png b/static/images/book/linux-pic/path-lookup.png
new file mode 100644
index 00000000..87d658c9
Binary files /dev/null and b/static/images/book/linux-pic/path-lookup.png differ
diff --git a/static/images/book/linux-pic/sparsemem_vmemmap.png b/static/images/book/linux-pic/sparsemem_vmemmap.png
new file mode 100644
index 00000000..ef1f3206
Binary files /dev/null and b/static/images/book/linux-pic/sparsemem_vmemmap.png differ
diff --git a/static/images/book/linux-pic/x64-va-space.png b/static/images/book/linux-pic/x64-va-space.png
new file mode 100644
index 00000000..e4eac5d5
Binary files /dev/null and b/static/images/book/linux-pic/x64-va-space.png differ
diff --git a/static/images/book/linux-pic/x86-va-pa.png b/static/images/book/linux-pic/x86-va-pa.png
new file mode 100644
index 00000000..3361a09a
Binary files /dev/null and b/static/images/book/linux-pic/x86-va-pa.png differ
diff --git a/static/images/book/linux-pic/x86-va-space.png b/static/images/book/linux-pic/x86-va-space.png
new file mode 100644
index 00000000..eaba0beb
Binary files /dev/null and b/static/images/book/linux-pic/x86-va-space.png differ
diff --git a/static/images/book/linux-pic/zone_page.png b/static/images/book/linux-pic/zone_page.png
new file mode 100644
index 00000000..37cadfd3
Binary files /dev/null and b/static/images/book/linux-pic/zone_page.png differ
diff --git a/static/images/thumbnail/book/linux-kernel-pictures.jpg b/static/images/thumbnail/book/linux-kernel-pictures.jpg
new file mode 100644
index 00000000..8d037a66
Binary files /dev/null and b/static/images/thumbnail/book/linux-kernel-pictures.jpg differ
diff --git a/static/images/thumbnail/large-scale-cpp-1.jpg b/static/images/thumbnail/large-scale-cpp-1.jpg
new file mode 100644
index 00000000..307ee076
Binary files /dev/null and b/static/images/thumbnail/large-scale-cpp-1.jpg differ