Merged
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -19,3 +19,4 @@ coverage.html
# vendor/
/cmd/alterx/alterx
/alterx
/.testing
119 changes: 104 additions & 15 deletions CLAUDE.md
@@ -43,28 +43,41 @@ make run
**Single test execution:**
```bash
go test -v -run TestFunctionName ./path/to/package

# Run specific test at package root
go test -v -run TestMutator
go test -v -run TestInput

# Run with race detector
go test -race -v ./...
```

## Architecture

### Core Components

**1. Entry Point** (`cmd/alterx/main.go`)
- CLI argument parsing via `runner.ParseFlags()`
- Mode selection logic (default/discover/both)
- Pattern mining flow orchestration
- Deduplication between mined and user-defined patterns
- CLI argument parsing via `runner.ParseFlags()` using goflags library
- Mode selection logic (default/discover/both) passed to `alterx.Options`
- Pattern mining flow orchestration in `Mutator.Execute()` via goroutines
- Output writing with `getOutputWriter()` (file or stdout)
- Rules saving via `Mutator.SaveRules()` after execution completes

**2. Mutator Engine** (`mutator.go`, `algo.go`)
- `Mutator` struct: Core permutation generator
- `ClusterBomb` algorithm: Nth-order payload combination using recursion
- `Mutator` struct: Core permutation generator with concurrent execution
- `Execute()` method: Runs default and/or mining modes in parallel goroutines
- `ClusterBomb` algorithm: Recursive Nth-order payload combination (cartesian product)
- `IndexMap`: Maintains deterministic ordering for payload iteration
- Template replacement using variables extracted from input domains
- Template replacement using `fasttemplate` library with `{{var}}` syntax
- Deduplication via `dedupe.NewDedupe()` with configurable memory limits
- Smart optimization: Skips words already present in leftmost subdomain
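
The recursive cartesian product described for `ClusterBomb` can be sketched as follows. This is a minimal illustration, not the code in `mutator.go`: `clusterBomb` is a hypothetical helper, sorted keys stand in for `IndexMap`, and `strings.NewReplacer` stands in for `fasttemplate`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// clusterBomb is a hypothetical stand-in for the real algorithm. It binds one
// variable per recursion level and emits the filled template once every
// variable has a value, which yields the cartesian product of all payloads.
func clusterBomb(template string, payloads map[string][]string) []string {
	// Go maps are unordered, so sort the keys to get a deterministic order
	// (assumption: this plays the role IndexMap has in alterx).
	keys := make([]string, 0, len(payloads))
	for k := range payloads {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var out []string
	var recurse func(depth int, bound []string)
	recurse = func(depth int, bound []string) {
		if depth == len(keys) {
			// Every variable is bound: substitute the pairs into the template.
			out = append(out, strings.NewReplacer(bound...).Replace(template))
			return
		}
		for _, word := range payloads[keys[depth]] {
			recurse(depth+1, append(bound, "{{"+keys[depth]+"}}", word))
		}
	}
	recurse(0, nil)
	return out
}

func main() {
	payloads := map[string][]string{
		"word":   {"dev", "prod"},
		"number": {"1", "2"},
	}
	for _, host := range clusterBomb("{{word}}-{{number}}.example.com", payloads) {
		fmt.Println(host)
	}
}
```

With two variables of two words each, the sketch emits four hostnames; the output size is the product of the payload list lengths, which is why large payload sets grow quickly.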

**3. Input Processing** (`inputs.go`)
- `Input` struct: Parses domains into components (sub, suffix, tld, etld, etc.)
- Uses `publicsuffix` library to extract eTLD and root domain correctly
- Variable extraction: `{{sub}}`, `{{sub1}}`, `{{suffix}}`, `{{root}}`, `{{sld}}`, etc.
- Multi-level subdomain support (e.g., `cloud.api.example.com` → `sub=cloud`, `sub1=api`)
- `getNValidateRootDomain()`: Validates homogeneous domains for pattern mining
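
The multi-level variable naming above can be illustrated with a short sketch. It assumes the root domain is already known and passed in by the caller; alterx itself derives it with `publicsuffix`. `splitLevels` is a hypothetical helper, not a function in `inputs.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLevels is a hypothetical helper. It maps each subdomain label to a
// variable name: the leftmost label becomes "sub", the next "sub1", and so on.
// Assumption: root is supplied by the caller instead of being derived from
// the public suffix list.
func splitLevels(host, root string) map[string]string {
	vars := map[string]string{"root": root}
	prefix := strings.TrimSuffix(host, "."+root)
	if prefix == host || prefix == "" {
		// host is the root itself or is not under root: no subdomain labels.
		return vars
	}
	for i, label := range strings.Split(prefix, ".") {
		name := "sub"
		if i > 0 {
			name = fmt.Sprintf("sub%d", i)
		}
		vars[name] = label
	}
	return vars
}

func main() {
	vars := splitLevels("cloud.api.example.com", "example.com")
	fmt.Println(vars["sub"], vars["sub1"], vars["root"])
}
```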

**4. Pattern Mining** (`internal/patternmining/`)
- **Three-phase discovery algorithm:**
@@ -124,13 +137,14 @@ Generates all combinations of payloads across variables:
- Early exit when no variables present in template

### Pattern Mining Workflow
1. **Validate input:** Ensure domains share common target (e.g., `.example.com`)
2. **Build distance table:** Compute pairwise Levenshtein distances
3. **Phase 1 - Edit clustering:** Group by edit distance (min to max)
1. **Validate input:** `getNValidateRootDomain()` ensures domains share common root
2. **Build distance table:** Compute pairwise Levenshtein distances with memoization
3. **Phase 1 - Edit clustering:** Group by edit distance (min to max) without prefix enforcement
4. **Phase 2 - N-grams:** Generate unigrams/bigrams, cluster by prefix
5. **Phase 3 - Prefix clustering:** Apply edit distance within prefix groups
6. **Quality validation:** Filter patterns using threshold and ratio metrics
7. **Generate subdomains:** Use DFA to produce strings from patterns
5. **Phase 3 - Prefix clustering:** Apply edit distance within prefix groups for refinement
6. **Quality validation:** `isGoodRule()` filters patterns using threshold and ratio metrics
7. **Regex generation:** Convert clusters to regex with alternations `(a|b)` and optional groups
8. **Generate subdomains:** DFA engine produces fixed-length strings from patterns
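
The memoized distance table from step 2 might look like the sketch below. It is an illustration under assumptions: `distanceTable` and its fields are hypothetical names, and the real cache is described only as a `Miner.memo` map.

```go
package main

import "fmt"

// pair is the cache key; the two strings are stored in sorted order because
// Levenshtein distance is symmetric.
type pair struct{ a, b string }

// distanceTable is a hypothetical type that caches pairwise distances.
type distanceTable struct {
	memo map[pair]int
}

func newDistanceTable() *distanceTable {
	return &distanceTable{memo: make(map[pair]int)}
}

// distance returns the cached value when present and computes it otherwise.
func (t *distanceTable) distance(a, b string) int {
	if a > b {
		a, b = b, a
	}
	key := pair{a, b}
	if d, ok := t.memo[key]; ok {
		return d
	}
	d := levenshtein(a, b)
	t.memo[key] = d
	return d
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein is the classic dynamic-programming edit distance, keeping only
// the previous row of the matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

func main() {
	table := newDistanceTable()
	fmt.Println(table.distance("api1", "api2"))
	fmt.Println(table.distance("api2", "api1")) // served from the cache
}
```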

## Pattern Mining Modes

@@ -155,6 +169,46 @@ Generates all combinations of payloads across variables:
- `-quality-ratio 25`: Max ratio of synthetic/observed subdomains
- `-save-rules output.json`: Save discovered patterns and metadata to JSON file

## Execution Flow

### Mode-Based Execution
The `Mutator.Execute()` method orchestrates parallel execution based on mode:

**Default Mode:**
1. Parse inputs → Extract variables → Validate patterns
2. Optionally enrich payloads from input subdomains
3. For each input × pattern combination:
   - Replace input variables (e.g., `{{sub}}`, `{{suffix}}`)
   - Execute ClusterBomb for payload permutations
   - Skip patterns with missing variables
4. Deduplicate results and write to output

**Discover Mode:**
1. Validate homogeneous domains (must share root)
2. Initialize `Miner` with distance/quality parameters
3. Run three-phase clustering algorithm
4. Generate regex patterns from clusters
5. Use DFA engine to produce subdomains
6. Skip input domains from output (avoid duplicates)

**Both Mode:**
- Runs default and discover in parallel goroutines
- Deduplication happens at channel level
- Results combined before writing

### Key Variables & Utilities

**Variable Extraction (`util.go`):**
- `getAllVars()`: Extract variable names from template using regex
- `checkMissing()`: Validate all variables have values before execution
- `getSampleMap()`: Merge input variables with payload variables for validation
- `unsafeToBytes()`: Zero-allocation string→byte conversion for performance
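
As a rough illustration of regex-based variable extraction and the missing-variable check, consider the sketch below. `extractVars` and `missingVars` are hypothetical names, and the regular expression is an assumption; the actual `getAllVars()` and `checkMissing()` in `util.go` may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// varPattern matches {{name}} placeholders. The allowed character set is an
// assumption made for this sketch.
var varPattern = regexp.MustCompile(`\{\{([a-zA-Z0-9_]+)\}\}`)

// extractVars returns the variable names in the order they appear.
func extractVars(template string) []string {
	var names []string
	for _, m := range varPattern.FindAllStringSubmatch(template, -1) {
		names = append(names, m[1])
	}
	return names
}

// missingVars reports the template variables that have no value.
func missingVars(template string, values map[string]string) []string {
	var missing []string
	for _, name := range extractVars(template) {
		if _, ok := values[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	tmpl := "{{word}}-{{sub}}.{{suffix}}"
	values := map[string]string{"sub": "api", "suffix": "example.com"}
	fmt.Println(extractVars(tmpl))
	fmt.Println(missingVars(tmpl, values))
}
```

A pattern with a non-empty missing list would be skipped rather than expanded, which matches the "Skip patterns with missing variables" step in default mode.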

**Deduplication:**
- Enabled by default (`DedupeResults = true` in `mutator.go`)
- Uses memory-efficient dedupe from projectdiscovery/utils
- Estimates required memory: `count * maxkeyLenInBytes`

## Common Patterns

### Adding New CLI Flags
@@ -171,6 +225,14 @@ Generates all combinations of payloads across variables:
- **Clustering logic:** `internal/patternmining/clustering.go`
- **Tokenization rules:** `tokenize()` in `internal/patternmining/regex.go`
- **Quality metrics:** `isGoodRule()` in `internal/patternmining/patternmining.go`
- **DFA operations:** `internal/dank/dank.go` (minimize, generate strings)

### Working with Modes
When adding features that interact with modes:
1. Check `opts.Mode` in `New()` to conditionally initialize components
2. Use goroutines in `Execute()` for parallel execution (default + discover)
3. Remember to close channels properly in `Execute()` cleanup goroutine
4. Mode validation happens in `Options.Validate()` with backwards-compatible defaults
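
The goroutine and channel-closing pattern in steps 2 and 3 can be sketched generically. This is not the body of `Execute()`: `runBoth` and the producer functions are hypothetical, and a plain map stands in for the dedupe utility.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runBoth starts every producer in its own goroutine, closes the shared
// channel once all of them have finished, and returns the unique results.
func runBoth(producers ...func(out chan<- string)) []string {
	results := make(chan string)
	var wg sync.WaitGroup

	for _, p := range producers {
		wg.Add(1)
		go func(produce func(chan<- string)) {
			defer wg.Done()
			produce(results)
		}(p)
	}

	// Cleanup goroutine: closing only after wg.Wait() guarantees that no
	// producer sends on a closed channel.
	go func() {
		wg.Wait()
		close(results)
	}()

	seen := make(map[string]struct{})
	var unique []string
	for r := range results {
		if _, dup := seen[r]; dup {
			continue
		}
		seen[r] = struct{}{}
		unique = append(unique, r)
	}
	sort.Strings(unique) // sorted only to make this demo's output stable
	return unique
}

func main() {
	defaultMode := func(out chan<- string) {
		out <- "api.example.com"
		out <- "dev.example.com"
	}
	discoverMode := func(out chan<- string) {
		out <- "dev.example.com"
		out <- "stage.example.com"
	}
	fmt.Println(runBoth(defaultMode, discoverMode))
}
```

Forgetting the cleanup goroutine would leave the consumer loop blocked forever, which is the failure step 3 warns about.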

## Testing Strategy

@@ -182,11 +244,14 @@ Generates all combinations of payloads across variables:
## Important Notes

- **Dedupe enabled by default:** `DedupeResults = true` in `mutator.go`
- **Prefix optimization:** ClusterBomb skips words already in leftmost subdomain
- **Prefix optimization:** ClusterBomb skips words already in leftmost subdomain (lines 378-387 in `mutator.go`)
- **Pattern quality critical:** Low thresholds generate millions of subdomains
- **Distance memoization:** Pattern mining caches Levenshtein distances for performance
- **Distance memoization:** Pattern mining caches Levenshtein distances for performance in `Miner.memo` map
- **DFA minimization:** Three-pass Brzozowski ensures minimal automaton
- **No breaking changes:** All pattern mining is additive; default behavior unchanged
- **SaveRules timing:** Must be called AFTER `Execute()` to ensure mining completes (lines 68-72 in `cmd/alterx/main.go`)
- **Homogeneous domains required:** Discover/both modes validate all domains share same root via `getNValidateRootDomain()`
- **Goroutine-safe:** Pattern mining and default mode run in separate goroutines with WaitGroup coordination

## Credits

@@ -202,3 +267,27 @@ Generates all combinations of payloads across variables:
- Use `gologger` for all logging (not fmt.Println)
- Follow Go naming conventions and project structure
- Add tests for new features
- Use `fasttemplate` for variable replacement (already integrated)
- Respect memory limits via `MaxSize` option in output writing

## Common Gotchas & Troubleshooting

### Pattern Mining Issues
- **"domains do not have the same root"**: All input domains must share a common root (e.g., all under `.example.com`). Use `getNValidateRootDomain()` to validate.
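- **Too many patterns generated**: Decrease `-max-distance`, or tighten the quality filters (`-pattern-threshold`, `-quality-ratio`). Because `-quality-ratio` is the maximum allowed synthetic/observed ratio, lowering it rejects more patterns.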
- **No patterns discovered**: Increase `-max-distance` or decrease `-min-distance` to allow more clustering

### ClusterBomb Performance
- **Memory exhaustion**: Reduce payload sizes or use `-limit` to cap output
- **Slow execution**: Check that prefix optimization is working (should skip redundant words)
- **Expected combinations not appearing**: Verify variables exist in pattern template and payload map

### Mode Selection
- **Default mode** works without any special validation (backwards compatible)
- **Discover/Both modes** require homogeneous domains (same root)
- **SaveRules only works** with discover/both modes after execution completes

### Testing Tips
- Use `DryRun()` or `EstimateCount()` to validate logic without generating output
- Test pattern mining with small domain sets first (5-10 domains)
- For ClusterBomb testing, use simple 2-variable patterns to verify cartesian product logic
150 changes: 26 additions & 124 deletions cmd/alterx/main.go
@@ -3,101 +3,34 @@ package main
import (
"io"
"os"
"strings"

"github.com/projectdiscovery/alterx"
"github.com/projectdiscovery/alterx/internal/patternmining"
"github.com/projectdiscovery/alterx/internal/runner"
"github.com/projectdiscovery/gologger"
"golang.org/x/net/publicsuffix"
)

func main() {

cliOpts := runner.ParseFlags()

// Validate mode
if cliOpts.Mode != "default" && cliOpts.Mode != "discover" && cliOpts.Mode != "both" {
gologger.Fatal().Msgf("invalid mode: %s (must be 'default', 'discover', or 'both')", cliOpts.Mode)
}

// Write output with deduplication
output := getOutputWriter(cliOpts.Output)
defer closeOutput(output, cliOpts.Output)
// we intentionally remove all known subdomains from the output
// that way only the discovered subdomains are included in the output
dedupWriter := alterx.NewDedupingWriter(output, cliOpts.Domains...)
defer func() {
if err := dedupWriter.Close(); err != nil {
gologger.Error().Msgf("failed to close dedup writer: %v", err)
}
}()

var estimatedDiscoverOutputs = 0

// Handle pattern mining modes (discover or both)
var minedPatterns []string
if cliOpts.Mode == "discover" || cliOpts.Mode == "both" {
target := getNValidateRootDomain(cliOpts.Domains)
if target == "" {
gologger.Fatal().Msgf("pattern mining requires domains with a common target (e.g., sub.example.com)")
}
gologger.Info().Msgf("Target domain: %s", target)

miner := patternmining.NewMiner(&patternmining.Options{
Domains: cliOpts.Domains,
Target: target,
MinDistance: cliOpts.MinDistance,
MaxDistance: cliOpts.MaxDistance,
PatternThreshold: cliOpts.PatternThreshold,
QualityRatio: float64(cliOpts.QualityRatio),
MaxLength: 1000,
NgramsLimit: cliOpts.NgramsLimit,
})

result, err := miner.Mine()
if err != nil {
gologger.Fatal().Msgf("pattern mining failed: %v", err)
}

// Save rules if requested
if cliOpts.SaveRules != "" {
if err := miner.SaveRules(result, cliOpts.SaveRules); err != nil {
gologger.Error().Msgf("failed to save rules: %v", err)
} else {
gologger.Info().Msgf("Saved %d patterns to %s", len(result.Patterns), cliOpts.SaveRules)
}
}

estimatedDiscoverOutputs = int(miner.EstimateCount(result.Patterns))

// Generate subdomains from discovered patterns
// and exit early
if cliOpts.Mode == "discover" {
// In discover mode, only use mined patterns
generated := miner.GenerateFromPatterns(result.Patterns)
for _, subdomain := range generated {
if _, err := dedupWriter.Write([]byte(subdomain + "\n")); err != nil {
gologger.Error().Msgf("failed to write subdomain: %v", err)
}
}
gologger.Info().Msgf("Generated %d unique subdomains from discovered patterns", dedupWriter.Count())
return
}

// In 'both' mode, collect mined patterns for combination
minedPatterns = result.Patterns
gologger.Info().Msgf("Discovered %d patterns, combining with user-defined patterns", len(minedPatterns))
}

// Handle default mode or 'both' mode
// Build alterx options with all modes supported
alterOpts := alterx.Options{
Domains: cliOpts.Domains,
Patterns: cliOpts.Patterns,
Payloads: cliOpts.Payloads,
Limit: cliOpts.Limit,
Enrich: cliOpts.Enrich,
MaxSize: cliOpts.MaxSize,
Domains: cliOpts.Domains,
Patterns: cliOpts.Patterns,
Payloads: cliOpts.Payloads,
Limit: cliOpts.Limit,
Enrich: cliOpts.Enrich,
MaxSize: cliOpts.MaxSize,
Mode: cliOpts.Mode,
MinDistance: cliOpts.MinDistance,
MaxDistance: cliOpts.MaxDistance,
PatternThreshold: cliOpts.PatternThreshold,
QualityRatio: float64(cliOpts.QualityRatio),
NgramsLimit: cliOpts.NgramsLimit,
MaxLength: 1000,
}

if cliOpts.PermutationConfig != "" {
@@ -115,20 +48,26 @@ func main() {

m, err := alterx.New(&alterOpts)
if err != nil {
gologger.Fatal().Msgf("failed to parse alterx config got %v", err)
gologger.Fatal().Msgf("failed to initialize alterx: %v", err)
}

if cliOpts.Estimate {
estimated := m.EstimateCount() + estimatedDiscoverOutputs
estimated := m.EstimateCount()
gologger.Info().Msgf("Estimated Payloads (including duplicates): %v", estimated)
return
}
// Write alterx results to same dedupWriter (automatic deduplication)
if err = m.ExecuteWithWriter(dedupWriter); err != nil {
gologger.Error().Msgf("failed to write output to file got %v", err)

// Execute mutator (handles all modes internally)
if err = m.ExecuteWithWriter(output); err != nil {
gologger.Error().Msgf("failed to execute alterx: %v", err)
}

gologger.Info().Msgf("Generated %d total unique subdomains (both modes)", dedupWriter.Count())
// Save rules if requested (must be after Execute to ensure mining is complete)
if cliOpts.SaveRules != "" {
if err := m.SaveRules(cliOpts.SaveRules); err != nil {
gologger.Error().Msgf("failed to save rules: %v", err)
}
}
}

// getOutputWriter returns the appropriate output writer
@@ -142,40 +81,3 @@ func getOutputWriter(outputPath string) io.Writer {
}
return os.Stdout
}

// closeOutput closes the output writer if it's a file
func closeOutput(output io.Writer, outputPath string) {
if outputPath != "" {
if closer, ok := output.(io.Closer); ok {
if err := closer.Close(); err != nil {
gologger.Error().Msgf("failed to close output file: %v", err)
}
}
}
}

func getNValidateRootDomain(domains []string) string {
if len(domains) == 0 {
return ""
}

var rootDomain string
// parse root domain from publicsuffix for first entry
for _, domain := range domains {
if strings.TrimSpace(domain) == "" {
continue
}
if rootDomain == "" {
root, err := publicsuffix.EffectiveTLDPlusOne(domain)
if err != nil || root == "" {
gologger.Fatal().Msgf("failed to derive root domain from %v: %v", domain, err)
}
rootDomain = root
} else {
if domain != rootDomain && !strings.HasSuffix(domain, "."+rootDomain) {
gologger.Fatal().Msgf("domain %v does not have the same root domain as %v, only homogeneous domains are supported in discover mode", domain, rootDomain)
}
}
}
return rootDomain
}
20 changes: 20 additions & 0 deletions internal/patternmining/patternmining.go
@@ -277,6 +277,26 @@ func (m *Miner) validateDomains() []string {
gologger.Verbose().Msgf("Rejecting malformed input: %s", host)
continue
}
// see: https://github.com/projectdiscovery/alterx/issues/285
// to work around known blocking issues, add safety checks that skip certain domains;
// this isn't a silver bullet, but it avoids the known blocking cases
if len(tokens[0]) > 5 {
// if subdomain has more than 5 levels then skip it
// ex: service.api.dev.home.us1.americas.example.com
gologger.Verbose().Msgf("Rejecting input: %s since it has more than 5 levels", host)
continue
}
// to avoid expensive computation skip any subdomain that can be tokenized into more than 10 tokens
tokenCount := 0
for _, tokenGroup := range tokens[0] {
tokenCount += len(tokenGroup)
}
if tokenCount > 10 {
// ex: api1dev-home-us1...: even with 5 or fewer levels, a subdomain with too many
// separators produces a token vector that is too long
gologger.Verbose().Msgf("Rejecting input: %s since it has more than 10 tokens (found %d)", host, tokenCount)
continue
}
knownHosts = append(knownHosts, host)
}
return m.removeDuplicatesAndSort(knownHosts)
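
The two guards added in this hunk can be exercised in isolation with the sketch below. It rests on assumptions: `rejectReason` is a hypothetical helper, levels are split on `.` and tokens on `-` only, whereas the real `tokenize()` in `internal/patternmining/regex.go` may split on more boundaries.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	maxLevels = 5  // same limit as the level check in the hunk above
	maxTokens = 10 // same limit as the token-count check in the hunk above
)

// rejectReason returns an empty string when the subdomain passes both guards,
// or a short explanation when it should be skipped. The tokenization here is
// a simplification made for this sketch.
func rejectReason(sub string) string {
	levels := strings.Split(sub, ".")
	if len(levels) > maxLevels {
		return fmt.Sprintf("more than %d levels", maxLevels)
	}
	tokenCount := 0
	for _, level := range levels {
		tokenCount += len(strings.Split(level, "-"))
	}
	if tokenCount > maxTokens {
		return fmt.Sprintf("more than %d tokens (found %d)", maxTokens, tokenCount)
	}
	return ""
}

func main() {
	for _, sub := range []string{
		"api.dev",
		"service.api.dev.home.us1.americas",
		"a-b-c-d.e-f-g-h.i-j-k",
	} {
		fmt.Printf("%q: %q\n", sub, rejectReason(sub))
	}
}
```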