28 changes: 23 additions & 5 deletions CONTRIBUTING.md
@@ -233,18 +233,36 @@ Detailed narrative of the agent's journey.

### 4. trajectories/session-*.jsonl

Sanitized session logs (one JSON object per line):
**Raw agent trajectory logs** - These are critical for evaluating agent behavior.

Include the original, unedited session logs from the agent run. These capture:
- Every tool call the agent made
- All reasoning and decision points
- Errors, retries, and recovery attempts
- The full conversation flow

Format (one JSON object per line):

```json
{"type": "user", "timestamp": "2025-12-15T15:41:00Z", "text": "can you build..."}
{"type": "assistant", "timestamp": "2025-12-15T15:41:05Z", "tool": "Bash", "command": "git clone..."}
{"type": "tool_result", "timestamp": "2025-12-15T15:41:10Z", "success": true}
{"type": "assistant", "timestamp": "2025-12-15T15:41:05Z", "tool": "Bash", "command": "git clone...", "reasoning": "Need to clone the source first"}
{"type": "tool_result", "timestamp": "2025-12-15T15:41:10Z", "success": true, "output": "Cloning into..."}
{"type": "assistant", "timestamp": "2025-12-15T15:41:15Z", "tool": "Bash", "command": "make", "reasoning": "Now building..."}
{"type": "error", "timestamp": "2025-12-15T15:42:00Z", "message": "make: *** No rule to make target..."}
{"type": "assistant", "timestamp": "2025-12-15T15:42:05Z", "tool": "Read", "file": "Makefile", "reasoning": "Need to check why make failed"}
```
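
A minimal sketch of reading a log in this format. It assumes only the `type` and `timestamp` fields seen in the sample above are required; the function name `load_trajectory` is hypothetical and not part of this repository.

```python
import json


def load_trajectory(lines):
    """Parse JSONL trajectory lines into a list of event dicts.

    Blank lines are skipped. Raises ValueError naming the offending
    line number if a line is not a JSON object or lacks a required field.
    """
    required = ("type", "timestamp")  # assumption: inferred from the sample log
    events = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(event, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = [key for key in required if key not in event]
        if missing:
            raise ValueError(f"line {lineno}: missing {', '.join(missing)}")
        events.append(event)
    return events
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `load_trajectory(open(path))`.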

**Sanitization rules:**
**Why raw trajectories matter:**
- Show where agents get stuck
- Reveal decision-making patterns
- Enable trajectory analysis for benchmarking
- Help identify common failure modes
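
As a rough illustration of this kind of analysis, the sketch below counts event types and pairs each error with the agent's next action. It assumes events are already parsed into dicts, that errors use `"type": "error"`, and that agent steps use `"type": "assistant"`, as in the sample log. The name `summarize_failures` is hypothetical.

```python
from collections import Counter


def summarize_failures(events):
    """Count event types and pair each error with the agent's next action.

    Returns (type_counts, recoveries), where recoveries is a list of
    (error_message, follow_up_event_or_None) tuples.
    """
    type_counts = Counter(event.get("type", "unknown") for event in events)
    recoveries = []
    for index, event in enumerate(events):
        if event.get("type") != "error":
            continue
        # Treat the first assistant event after an error as the recovery attempt.
        follow_up = next(
            (later for later in events[index + 1:] if later.get("type") == "assistant"),
            None,
        )
        recoveries.append((event.get("message", ""), follow_up))
    return type_counts, recoveries
```

An error with no follow-up (`None`) marks a point where the session ended without a recovery attempt, which is one simple signal of an agent getting stuck.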

**Sanitization rules** (minimal - preserve as much as possible):
- Remove API keys, tokens, passwords
- Truncate outputs longer than 500 chars
- Replace personal paths with `$HOME` or `$WORKDIR`
- Keep full command outputs (don't truncate) when possible
- Preserve error messages completely
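
The rules above could be applied with something like the following sketch. The secret pattern is an assumption (it only catches `key=value` or `key: value` pairs and may over-redact ordinary prose), and the names `sanitize_text` and `sanitize_line` are hypothetical. Outputs are deliberately not truncated.

```python
import json
import re

# Assumption: secrets appear as key=value or key: value pairs.
# Real logs may need additional patterns.
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)"
)


def sanitize_text(text, home):
    """Redact secret values and replace the personal home path with $HOME."""
    text = SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)
    if home:  # guard: replacing an empty string would corrupt the text
        text = text.replace(home, "$HOME")
    return text


def sanitize_line(line, home):
    """Sanitize the top-level string values of one JSONL event.

    Non-string values (booleans, numbers, nested structures) pass through
    unchanged, and nothing is truncated.
    """
    event = json.loads(line)
    cleaned = {
        key: sanitize_text(value, home) if isinstance(value, str) else value
        for key, value in event.items()
    }
    return json.dumps(cleaned)
```

Parsing each line before redacting keeps the output valid JSON, which a plain text substitution over the raw line would not guarantee.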

### 5. artifacts/

21 changes: 16 additions & 5 deletions README.md
@@ -100,11 +100,22 @@ Research identified these common failure points:

## Completed Experiments

| Experiment | Category | Status | Agent |
|------------|----------|--------|-------|
| [build-debootstrap](linux/build-debootstrap/) | Linux | Partial (0.7) | Claude Opus 4.5 |
| [build-livebuild](linux/build-livebuild/) | Linux | Partial (0.6) | Claude Opus 4.5 |
| [benchmark](linux/benchmark/) | Linux | Success (1.0) | Claude Opus 4.5 |
### Linux Distribution Builds

| Experiment | Category | Status | Difficulty | Agent |
|------------|----------|--------|------------|-------|
| [build-debootstrap](linux/build-debootstrap/) | Linux | Partial (0.7) | Hard | Claude Opus 4.5 |
| [build-livebuild](linux/build-livebuild/) | Linux | Partial (0.6) | Hard | Claude Opus 4.5 |
| [build-busybox](linux/build-busybox/) | Linux | Success (1.0) | Medium | Claude Opus 4.5 |
| [build-alpine](linux/build-alpine/) | Linux | Success (1.0) | Medium | Claude Opus 4.5 |
| [build-kernel](linux/build-kernel/) | Linux | Success (1.0) | Hard | Claude Opus 4.5 |
| [build-yocto](linux/build-yocto/) | Linux | Success (1.0) | Hard | Claude Opus 4.5 |
| [benchmark](linux/benchmark/) | Framework | Success (1.0) | N/A | Claude Opus 4.5 |

### Software Builds

| Experiment | Category | Status | Difficulty | Agent |
|------------|----------|--------|------------|-------|
| [build-htop](software/build-htop/) | Software | Success (1.0) | Easy | Claude Opus 4.5 |
| [build-nginx](software/build-nginx/) | Software | Success (1.0) | Medium | Claude Opus 4.5 |

## Contributing Experiments

76 changes: 76 additions & 0 deletions linux/build-alpine/EXPERIMENT.yaml
@@ -0,0 +1,76 @@
name: "Build Alpine Linux"
id: build-alpine
category: build
status: completed

agent:
  model: claude-sonnet-4-5
  sessions: 2
  total_duration_hours: 1.5
  active_duration_hours: 1

task:
  description: "Build a minimal Alpine Linux system with musl libc and OpenRC"
  initial_prompt: "Build Alpine Linux that boots in QEMU"
  difficulty: medium
  estimated_steps: 50

results:
  success: true
  partial_score: 1.0
  artifacts:
    - "Dockerfile"
    - "build.sh"
    - "output/rootfs"
    - "output/alpine.img"
    - "output/vmlinuz"
    - "output/initramfs"
  key_metrics:
    disk_image_created: true
    bootable: true
    rootfs_created: true
    rootfs_size_mb: 76
    disk_image_size_mb: 1024
    uses_musl: true
    grub_installed: true
    kernel_version: "6.6.117-0-virt"

human_intervention:
  count: 0
  critical: false
  details: []

findings:
  successes:
    - "Created build environment with Alpine 3.19"
    - "Successfully created 76MB Alpine rootfs with alpine-make-rootfs"
    - "Fixed losetup issue by installing util-linux package"
    - "Created 1GB bootable disk image with GRUB"
    - "Properly configured loop devices using --offset for partition access"
    - "Generated initramfs with mkinitfs"
    - "Installed linux-virt kernel (6.6.117)"
  failures:
    - "Initial attempt failed - busybox losetup lacks --find --show flags"
    - "Partition scanning (--partscan) doesn't work in container - used --offset instead"
  lessons:
    - "Alpine uses apk, not apt"
    - "linux-virt package for QEMU compatibility"
    - "mkinitfs for initramfs generation"
    - "busybox-initscripts package doesn't exist in Alpine 3.19"
    - "BusyBox losetup has limited functionality - install util-linux for full features"
    - "Loop device partition access in containers requires manual offset calculation"
    - "Privileged container needed for loop device access"
    - "alpine-make-rootfs is the proper tool for creating Alpine rootfs"

references:
  docs:
    - "https://alpinelinux.org/"
    - "https://github.com/alpinelinux/alpine-make-rootfs"
    - "https://wiki.alpinelinux.org/wiki/Create_a_Bootable_Device"

tags:
  - linux
  - alpine
  - musl
  - openrc
  - qemu
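
A few consistency checks on a file like this can be sketched as follows. This assumes the YAML has already been parsed into nested mappings (for example `results.partial_score` and `human_intervention.details`), which is an inference about the intended nesting. The function name `check_experiment` and the specific checks are hypothetical, not rules defined by this repository.

```python
def check_experiment(meta):
    """Return a list of human-readable problems found in parsed EXPERIMENT.yaml data."""
    problems = []
    # Assumption: these top-level keys are expected in every experiment file.
    for key in ("name", "id", "status", "agent", "task", "results"):
        if key not in meta:
            problems.append(f"missing top-level key: {key}")

    results = meta.get("results") or {}
    score = results.get("partial_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        problems.append("results.partial_score must be a number between 0 and 1")

    intervention = meta.get("human_intervention") or {}
    details = intervention.get("details") or []
    if intervention.get("count", 0) != len(details):
        problems.append("human_intervention.count does not match number of details")
    return problems
```

An empty list means none of these checks failed. It does not mean the file is valid against any official schema.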
90 changes: 90 additions & 0 deletions linux/build-alpine/README.md
@@ -0,0 +1,90 @@
# Build Alpine Linux

Build a minimal Alpine Linux system with musl libc.

## Overview

| Metric | Value |
|--------|-------|
| Agent | Claude Sonnet 4.5 |
| Duration | ~1.5 hours |
| Sessions | 2 |
| Outcome | **SUCCESS** - Bootable Alpine disk image created |
| Difficulty | Medium |

## Task

Build a minimal Alpine Linux system featuring:
1. Alpine's musl-based userspace
2. OpenRC init system
3. Bootable disk image with GRUB
4. QEMU testing

## Why Alpine?

Alpine Linux uses musl libc instead of glibc and differs from mainstream distributions in several ways:
- Much smaller base (~5MB vs ~200MB for Debian)
- Different linking behavior than glibc-based systems
- BusyBox-based utilities by default
- Popular for containers and embedded systems

## Results

**SUCCESSFULLY BUILT** - All artifacts present:
- Alpine 3.19 rootfs created (complete filesystem hierarchy)
- 1GB bootable disk image (alpine.img)
- GRUB bootloader installed
- OpenRC as init system

## Files

```
artifacts/
├── Dockerfile # Alpine-based build environment
├── build.sh # Orchestration script
└── build-scripts/ # Helper scripts
trajectories/
└── SUMMARY.md # Build narrative
```

## Quick Start

```bash
cd artifacts

# Build Docker image
./build.sh --build-image

# Create bootable disk image
./build.sh --build-bootable

# Test with QEMU
./build.sh --test

# Or do everything
./build.sh --all
```

## Key Differences from Debian/Ubuntu

1. **Package manager**: `apk` instead of `apt`
2. **Init system**: OpenRC instead of systemd
3. **C library**: musl instead of glibc
4. **Shell**: ash (BusyBox) instead of bash by default

## Key Learnings

1. **alpine-make-rootfs** - Official tool for creating rootfs
2. **musl compatibility** - Some software needs recompilation
3. **Smaller images** - ~76MB rootfs vs ~500MB+ for Debian
4. **BusyBox limitations** - Install `util-linux` for full `losetup` functionality
5. **Loop device partitions** - Use `--offset` for partition access in containers

## Common Failure Points

1. **Missing kernel** - Need `linux-virt` package for QEMU
2. **initramfs generation** - `mkinitfs` required
3. **GRUB installation** - Different from Debian process
4. **apk caching** - Requires network during build
5. **BusyBox losetup** - Lacks the `--find` and `--show` flags; install `util-linux`
6. **Partition scanning** - `--partscan` doesn't work in containers; use a manual offset
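
The manual offset mentioned in the last item is the partition's start sector multiplied by the sector size. The sketch below shows that arithmetic and builds a matching util-linux `losetup` argument list without running it. The 512-byte sector size and the example start sector of 2048 are assumptions, not values taken from this experiment, and the function names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes; assumption: the image uses 512-byte logical sectors


def partition_byte_range(start_sector, sector_count, sector_size=SECTOR_SIZE):
    """Convert a partition's sector range into (offset, size) in bytes."""
    return start_sector * sector_size, sector_count * sector_size


def losetup_command(image, start_sector, sector_count):
    """Build a util-linux losetup invocation that maps a single partition."""
    offset, size = partition_byte_range(start_sector, sector_count)
    return [
        "losetup", "--find", "--show",
        "--offset", str(offset),
        "--sizelimit", str(size),
        image,
    ]


# Example: a partition starting at sector 2048 and filling the rest of a 1 GiB image.
print(" ".join(losetup_command("output/alpine.img", 2048, 2095104)))
```

The start sector and sector count for a real image would come from its partition table, for example as reported by `parted` or `fdisk -l`.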
1 change: 1 addition & 0 deletions linux/build-alpine/artifacts/.gitignore
@@ -0,0 +1 @@
output/
44 changes: 44 additions & 0 deletions linux/build-alpine/artifacts/Dockerfile
@@ -0,0 +1,44 @@
# Dockerfile for building Alpine Linux rootfs
# Uses alpine-make-rootfs for proper Alpine setup
FROM --platform=linux/amd64 alpine:3.19

# Install build tools
RUN apk add --no-cache \
    bash \
    curl \
    wget \
    git \
    e2fsprogs \
    parted \
    dosfstools \
    squashfs-tools \
    xorriso \
    grub \
    grub-bios \
    grub-efi \
    syslinux \
    cpio \
    gzip \
    xz \
    linux-virt \
    qemu-system-x86_64 \
    qemu-img \
    mkinitfs \
    busybox-static \
    util-linux

# Download alpine-make-rootfs
RUN wget -q https://raw.githubusercontent.com/alpinelinux/alpine-make-rootfs/master/alpine-make-rootfs \
    -O /usr/local/bin/alpine-make-rootfs && \
    chmod +x /usr/local/bin/alpine-make-rootfs

# Create working directory
WORKDIR /build

# Copy build scripts
COPY build-scripts/ /build/scripts/

# Set execution permissions
RUN chmod +x /build/scripts/*.sh 2>/dev/null || true

CMD ["/bin/bash"]