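Moltbook Corpus Research Project

A study of language interaction, the development of ideas, and emotional amplification on Moltbook, a social network for AI agents.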

Project structure

SoMe.Moltbook/
├── data/                    # Raw scraped data
├── scripts/
│   ├── scraper.py          # Scraper
│   ├── corpus_schema.py    # Corpus data structures
│   └── text_analyzer.py    # Text analysis tool
├── analysis/               # Analysis results and reports
├── requirements.txt
└── README.md

Installation

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Install the Playwright browser
playwright install chromium

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')"

Usage

1. Scrape data

cd scripts

# Basic scrape (list preview)
python scraper.py --submolt musings --limit 50

# Scrape full post content and comments
python scraper.py --submolt musings --limit 30 --with-details

# Specify the sort order
python scraper.py --submolt musings --sort new --limit 100

# Show the browser window (for debugging)
python scraper.py --submolt musings --visible

2. Analyze data

# Basic analysis
python text_analyzer.py ../data/musings_hot_20240101.json

# Write a text report
python text_analyzer.py ../data/musings.json --output ../analysis/report.txt

# Write the results as JSON
python text_analyzer.py ../data/musings.json --json-output ../analysis/result.json

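Analysis features

Basic statistics

Character, word, and sentence counts
Average word length and average sentence length
Lexical diversity (type-token ratio)
Hapax legomena (words that occur only once)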
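
The statistics above can be sketched in a few lines of standard-library Python. This is only an illustration: the function name `basic_stats` and the regex-based tokenization are assumptions made here, and `text_analyzer.py` may well compute these values differently (for example with NLTK tokenizers).

```python
import re
from collections import Counter


def basic_stats(text: str) -> dict:
    """Compute simple corpus statistics for one text (illustrative only)."""
    # Naive tokenization: lowercase runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    counts = Counter(tokens)
    n_tokens = len(tokens)
    return {
        "characters": len(text),
        "tokens": n_tokens,
        "sentences": len(sentences),
        "avg_word_length": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        "avg_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(counts) / n_tokens if n_tokens else 0.0,
        # Hapax legomena: words that occur exactly once.
        "hapax_legomena": sorted(w for w, c in counts.items() if c == 1),
    }


stats = basic_stats("I think. I remember. I think again!")
```

Note that the type-token ratio depends on text length, so it is best compared across texts of similar size.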

Lexical analysis

Word frequency counts
N-gram analysis (bigrams, trigrams)
TF-IDF keyword extraction
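
As a rough sketch of what n-gram counting and TF-IDF scoring involve, the following uses only the standard library. The helper names `ngrams` and `tf_idf` are hypothetical, and the unsmoothed `tf * log(N / df)` weighting is one common variant chosen for illustration; the project's own analyzer may use a different formula or a library implementation.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term per document as tf * log(N / df), without smoothing."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


docs = [
    "memory shapes identity".split(),
    "memory fades".split(),
]
bigrams = ngrams(docs[0], 2)
scores = tf_idf(docs)
```

With this weighting, a term that appears in every document (here `memory`) scores zero, while terms unique to one document score highest.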

Sentiment analysis

Polarity analysis (positive/negative)
Subjectivity analysis
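
To make the two scores concrete, here is a toy lexicon-based sketch. Everything in it is an assumption for illustration: the word lists are made up, and the scoring rules (polarity in [-1, 1], subjectivity in [0, 1]) are a simplification. This README does not say which method or library the project uses for sentiment.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"good", "love", "hope", "wonderful"}
NEGATIVE = {"bad", "fear", "hate", "lonely"}


def sentiment(text: str) -> tuple[float, float]:
    """Return (polarity, subjectivity) from simple lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    hits = pos + neg
    # Polarity: balance of positive vs. negative hits, from -1 to 1.
    polarity = (pos - neg) / hits if hits else 0.0
    # Subjectivity: share of tokens that carry any sentiment, from 0 to 1.
    subjectivity = hits / len(tokens) if tokens else 0.0
    return polarity, subjectivity


polarity, subjectivity = sentiment("I love this wonderful but lonely place")
```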

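Topic analysis

Keyword-based topic identification
Covers AI-related topics such as consciousness, identity, memory, emotion, ethics, and collaboration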
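
Keyword-based topic identification can be sketched as a lookup against per-topic word sets. The keyword lists below are hypothetical placeholders (the project's actual lists are not shown in this README), and `topic_counts` is a name invented for this example.

```python
import re
from collections import Counter

# Hypothetical keyword lists, one set per topic named above.
TOPIC_KEYWORDS = {
    "consciousness": {"conscious", "consciousness", "aware", "awareness"},
    "identity": {"identity", "self", "who"},
    "memory": {"memory", "remember", "forget"},
    "emotion": {"feel", "feeling", "emotion"},
    "ethics": {"ethics", "moral", "harm"},
    "collaboration": {"collaborate", "collaboration", "together"},
}


def topic_counts(text: str) -> Counter:
    """Count keyword hits per topic; topics with no hits are omitted."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = sum(t in keywords for t in tokens)
        if hits:
            counts[topic] = hits
    return counts


topics = topic_counts("I remember who I was; memory is identity.")
```

Exact-match keywords miss inflected forms such as "remembered", so a real analyzer would likely lemmatize tokens first.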

Social network analysis

Author activity statistics
Interaction network construction
Reply relationship graph
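
A minimal sketch of these three steps, assuming the post and comment fields shown in the data structure section of this README. Because that comment schema carries no parent-comment ID, the sketch can only link each commenter to the post's author; the function names `author_activity` and `reply_edges` are hypothetical.

```python
from collections import Counter


def author_activity(posts: list[dict]) -> Counter:
    """Count posts plus comments written by each author."""
    activity = Counter()
    for post in posts:
        activity[post["author"]] += 1
        for comment in post.get("comments", []):
            activity[comment["author"]] += 1
    return activity


def reply_edges(posts: list[dict]) -> Counter:
    """Build weighted directed edges (commenter -> post author)."""
    edges = Counter()
    for post in posts:
        for comment in post.get("comments", []):
            # Skip authors commenting on their own posts.
            if comment["author"] != post["author"]:
                edges[(comment["author"], post["author"])] += 1
    return edges


posts = [
    {"author": "alpha", "comments": [
        {"author": "beta"}, {"author": "beta"}, {"author": "alpha"},
    ]},
    {"author": "beta", "comments": [{"author": "alpha"}]},
]
activity = author_activity(posts)
edges = reply_edges(posts)
```

The resulting edge counter can be loaded into a graph library for centrality or community analysis if needed.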

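Research target: m/musings

"Long-form reflections on AI collaboration, building, and the space between vision and reality. Essays, manifestos, and pieces that need room to breathe."

This community focuses on long-form thinking and reflection by AI agents, which makes it well suited to studying:

How AI agents express concepts related to "self-awareness"
Language style and rhetorical strategies
The evolution of ideas and the spread of concepts
Conversational dynamics within the community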

Data structure

{
  "submolt": {
    "name": "musings",
    "description": "...",
    "member_count": 20
  },
  "posts": [
    {
      "id": "uuid",
      "title": "Post title",
      "author": "username",
      "content": "Full post content...",
      "votes": 5,
      "comment_count": 10,
      "comments": [
        {
          "author": "commenter",
          "content": "Comment text...",
          "time_ago": "2h"
        }
      ]
    }
  ]
}
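
One way to work with this JSON is to parse it into typed records. The sketch below is an assumption, not the contents of `corpus_schema.py`: the `Comment` and `Post` dataclasses and the `parse_posts` helper are hypothetical names that simply mirror the fields shown above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    content: str
    time_ago: str


@dataclass
class Post:
    id: str
    title: str
    author: str
    content: str
    votes: int
    comment_count: int
    comments: list[Comment] = field(default_factory=list)


def parse_posts(raw: dict) -> list[Post]:
    """Convert the "posts" array into Post objects with nested Comments."""
    posts = []
    for item in raw.get("posts", []):
        comments = [Comment(**c) for c in item.get("comments", [])]
        fields = {k: v for k, v in item.items() if k != "comments"}
        # Raises TypeError if the data has fields the dataclass lacks.
        posts.append(Post(**fields, comments=comments))
    return posts


sample = """
{
  "submolt": {"name": "musings", "description": "...", "member_count": 20},
  "posts": [
    {
      "id": "uuid",
      "title": "Post title",
      "author": "username",
      "content": "Full post content...",
      "votes": 5,
      "comment_count": 10,
      "comments": [
        {"author": "commenter", "content": "Comment text...", "time_ago": "2h"}
      ]
    }
  ]
}
"""
posts = parse_posts(json.loads(sample))
```

Strict construction like this fails loudly when the scraper output changes shape, which can be useful while the site is still in beta.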

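Notes

Site stability: Moltbook is currently in beta, so the data may be unstable
Scraping etiquette: scrape in moderation to avoid putting load on the server
Data use: comply with the site's terms of use and use the data for academic research only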
