117 changes: 117 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,123 @@

All notable changes to this project will be documented in this file.

## Unreleased

### RPKIviews Historical Data Support

* **Added RPKIviews as a historical RPKI data source**: Users can now load historical RPKI data from RPKIviews collectors in addition to RIPE NCC archives
- New `RpkiViewsCollector` enum with four collectors: SoborostNet (default), MassarsNet, AttnJp, and KerfuffleNet
- Added `RpkiTrie::from_rpkiviews(collector, date)` method for loading from a specific collector
- Added `RpkiTrie::from_rpkiviews_file(url, date)` and `from_rpkiviews_files(urls, date)` for loading from specific archive URLs
- Added `list_rpkiviews_files(collector, date)` function to discover available archives for a given date
- New `HistoricalRpkiSource` enum to explicitly select between RIPE and RPKIviews sources
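
As a rough illustration of the source selection described above, the following self-contained sketch models the two enums with the standard library only. The variant names come from this changelog entry; the `describe` helper and the exact shape of both enums are assumptions, and the crate's real definitions may differ.

```rust
// Hypothetical model of the enums named above; the real definitions live in
// the crate's `rpki` module and may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum RpkiViewsCollector {
    #[default]
    SoborostNet,
    MassarsNet,
    AttnJp,
    KerfuffleNet,
}

impl RpkiViewsCollector {
    /// Every collector, in the order the changelog lists them.
    fn all() -> [RpkiViewsCollector; 4] {
        [
            Self::SoborostNet,
            Self::MassarsNet,
            Self::AttnJp,
            Self::KerfuffleNet,
        ]
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HistoricalRpkiSource {
    Ripe,
    RpkiViews(RpkiViewsCollector),
}

/// Describe a source in words (assumed helper); stands in for the dispatch
/// a loader would perform on the selected source.
fn describe(source: HistoricalRpkiSource) -> String {
    match source {
        HistoricalRpkiSource::Ripe => "RIPE NCC archive".to_string(),
        HistoricalRpkiSource::RpkiViews(c) => format!("RPKIviews collector {:?}", c),
    }
}

fn main() {
    println!("{}", describe(HistoricalRpkiSource::Ripe));
    for c in RpkiViewsCollector::all() {
        println!("{}", describe(HistoricalRpkiSource::RpkiViews(c)));
    }
}
```

Wrapping the collector inside the `RpkiViews` variant keeps the choice of archive family and the choice of collector in a single value, so a caller cannot pass a collector together with the RIPE source.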

* **Streaming optimization for .tgz archives**: RPKIviews archives are streamed efficiently without downloading the entire file
- `rpki-client.json` is typically the third or fourth entry in the archive, so the download can stop after ~80 MB instead of fetching the full 300+ MB
- New `extract_file_from_tgz(url, target_path)` function for streaming extraction of specific files
- New `list_files_in_tgz(url, max_entries)` function for listing archive contents with early termination
- New `tgz_contains_file(url, target_path)` function for checking file existence
- Uses `reqwest` for HTTP streaming and external `gunzip` for decompression
- Test completion time reduced from several minutes to ~8 seconds
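
The early-termination idea can be sketched without any network or compression code. The example below is an assumption-laden illustration, not the PR's implementation: it parses an already-decompressed tar stream by hand (the PR itself uses `reqwest`, `gunzip`, and the `tar` crate), and the names `extract_from_tar`, `tar_entry`, and `CountingReader` are hypothetical. It shows that once the target entry is found, the remainder of the stream is never read.

```rust
use std::io::{self, Read};

const BLOCK: usize = 512;

/// Scan an uncompressed tar stream and return the bytes of `target`,
/// stopping as soon as that entry has been read.
fn extract_from_tar<R: Read>(mut reader: R, target: &str) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; BLOCK];
    loop {
        // A short read or an all-zero block is treated as the end of the archive.
        if reader.read_exact(&mut header).is_err() || header.iter().all(|&b| b == 0) {
            return Ok(None);
        }
        let name = field_str(&header[0..100]);
        let size = usize::from_str_radix(field_str(&header[124..136]).trim(), 8)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        if name == target {
            let mut data = vec![0u8; size];
            reader.read_exact(&mut data)?;
            return Ok(Some(data)); // early termination: the rest is never read
        }
        // Not the entry we want: skip its data, which is padded to whole blocks.
        let padded = (size + BLOCK - 1) / BLOCK * BLOCK;
        io::copy(&mut reader.by_ref().take(padded as u64), &mut io::sink())?;
    }
}

/// Interpret a NUL-terminated header field as text.
fn field_str(field: &[u8]) -> &str {
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    std::str::from_utf8(&field[..end]).unwrap_or("")
}

/// Build a minimal tar entry (header + padded data) for the demo.
/// The header checksum is left empty because the parser above ignores it.
fn tar_entry(name: &str, data: &[u8]) -> Vec<u8> {
    let mut header = [0u8; BLOCK];
    header[..name.len()].copy_from_slice(name.as_bytes());
    let size = format!("{:011o}", data.len());
    header[124..124 + size.len()].copy_from_slice(size.as_bytes());
    let mut out = header.to_vec();
    out.extend_from_slice(data);
    out.resize(out.len() + (BLOCK - data.len() % BLOCK) % BLOCK, 0);
    out
}

/// Wrapper that counts how many bytes were pulled from the inner reader.
struct CountingReader<R> {
    inner: R,
    bytes_read: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes_read += n;
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let mut archive = Vec::new();
    archive.extend(tar_entry("meta/info.txt", b"snapshot"));
    archive.extend(tar_entry("output/rpki-client.json", b"{\"roas\":[]}"));
    archive.extend(tar_entry("output/big.bin", &vec![0u8; 1_000_000]));
    archive.extend_from_slice(&[0u8; 2 * BLOCK]);

    let mut counting = CountingReader { inner: archive.as_slice(), bytes_read: 0 };
    let json = extract_from_tar(&mut counting, "output/rpki-client.json")?;
    println!("found: {:?}", json.map(|b| String::from_utf8_lossy(&b).into_owned()));
    println!("read {} of {} bytes", counting.bytes_read, archive.len());
    Ok(())
}
```

Because the reader is consumed lazily, the same loop would stop an HTTP download early if `reader` were a decompressing wrapper around a response body, which is the effect the changelog entry describes.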

* **Unified rpki-client JSON parsing**: Extracted shared parsing logic for rpki-client JSON format
- New internal `rpki_client.rs` module with `RpkiClientData` struct and robust deserializers
- Handles variations in ASN formats (numeric `12345` vs string `"AS12345"`)
- Handles variations in ASPA field names (`customer_asid` vs `customer`)
- Handles provider arrays as both numbers and strings
- Used by Cloudflare, RIPE historical, and RPKIviews sources
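
To make the ASN-format tolerance concrete, here is a minimal std-only sketch. It assumes the values have already been turned into strings; the module described above works through serde deserializers on JSON values instead, and the function names `parse_asn` and `parse_providers` are hypothetical.

```rust
/// Parse an ASN written either as a bare number ("12345") or with an
/// "AS" prefix ("AS12345"); the prefix is matched case-insensitively.
fn parse_asn(raw: &str) -> Option<u32> {
    let s = raw.trim();
    let digits = match s.get(..2) {
        Some(prefix) if prefix.eq_ignore_ascii_case("AS") => &s[2..],
        _ => s,
    };
    digits.parse().ok()
}

/// Parse a provider list whose entries may mix both spellings.
/// Returns `None` if any single entry is not a valid ASN.
fn parse_providers(raw: &[&str]) -> Option<Vec<u32>> {
    raw.iter().map(|p| parse_asn(p)).collect()
}

fn main() {
    println!("{:?}", parse_asn("AS12345"));
    println!("{:?}", parse_asn("12345"));
    println!("{:?}", parse_providers(&["AS64500", "64501"]));
}
```

Normalizing every spelling to a plain `u32` at the parsing boundary means the rest of the code never has to know which upstream format a record came from.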

* **Public ROA and ASPA structs**: Added stable public API types
- New `Roa` struct with fields: `prefix`, `asn`, `max_length`, `not_before`, `not_after`
- New `Aspa` struct with fields: `customer_asn`, `providers`
- Internal rpki-client format structs are now `pub(crate)` only

* **Updated RIPE historical to use JSON format**: Changed from CSV to `output.json.xz` for consistency
- Requires the `xz` feature in `oneio` (now enabled by default for the `rpki` feature)
- Provides richer data including expiry timestamps

* **New BgpkitCommons methods**:
- `load_rpki_historical(source, date)` - Load historical RPKI data from specified source
- `list_rpki_files(source, date)` - List available RPKI files for a date from specified source
- `load_rpki_from_files(urls, date)` - Load and merge RPKI data from multiple file URLs

* **New example**: Added `examples/rpki_historical.rs` demonstrating historical RPKI data loading

* **Updated example**: `examples/list_aspas.rs` now counts ASPA objects on the first day of each year from 2020 to 2025

### Dependencies

* Added `reqwest` (with blocking feature) for HTTP streaming
* Added `tar` crate for reading tar archives
* Enabled `xz` feature in `oneio` for RIPE historical JSON support

### Crate Consolidation

* **Migrated `as2org-rs` into bgpkit-commons**: The CAIDA AS-to-Organization mapping functionality previously provided by the external `as2org-rs` crate has been fully integrated into the `asinfo` module
- New `src/asinfo/as2org.rs` module provides `As2org` struct with `new()`, `get_as_info()`, `get_siblings()`, and `are_siblings()` methods
- Removed external `as2org-rs` dependency from Cargo.toml
- Single codebase simplifies maintenance and patch application

* **Migrated `peeringdb-rs` into bgpkit-commons**: The PeeringDB API access functionality previously provided by the external `peeringdb-rs` crate has been fully integrated into the `asinfo` module
- Updated `src/asinfo/peeringdb.rs` with full PeeringDB API client implementation
- Includes `PeeringdbNet` struct and `load_peeringdb_net()` function for direct API access
- Removed external `peeringdb-rs` dependency from Cargo.toml

* **Updated feature flags**: The `asinfo` feature now uses `regex` instead of external crate dependencies
- Before: `asinfo = ["as2org-rs", "peeringdb-rs", "oneio", "serde_json", "tracing", "chrono"]`
- After: `asinfo = ["oneio", "serde_json", "tracing", "chrono", "regex"]`

### API Improvements

* **AsInfoBuilder**: Added a new builder pattern for loading AS information with specific data sources
- New `AsInfoBuilder` struct with fluent API methods: `with_as2org()`, `with_population()`, `with_hegemony()`, `with_peeringdb()`, `with_all()`
- Added `asinfo_builder()` method to `BgpkitCommons` for creating builders
- Added `load_asinfo_with(builder)` method to `BgpkitCommons` for loading with builder configuration
- The existing `load_asinfo(bool, bool, bool, bool)` method is preserved for backward compatibility

**Before (confusing boolean parameters):**
```rust
commons.load_asinfo(true, false, true, false)?;
```

**After (clear builder pattern):**
```rust
let builder = commons.asinfo_builder()
    .with_as2org()
    .with_hegemony();
commons.load_asinfo_with(builder)?;
```
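
For readers curious how such a builder might be structured, the following is a self-contained sketch rather than the crate's code. The method names match the list above; the struct fields, the `new()` constructor, the `as_flags()` helper, and the assumed positional order (as2org, population, hegemony, peeringdb) are illustrative assumptions.

```rust
/// Hypothetical stand-in for the builder: one flag per data source.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct AsInfoBuilder {
    as2org: bool,
    population: bool,
    hegemony: bool,
    peeringdb: bool,
}

impl AsInfoBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn with_as2org(mut self) -> Self {
        self.as2org = true;
        self
    }

    fn with_population(mut self) -> Self {
        self.population = true;
        self
    }

    fn with_hegemony(mut self) -> Self {
        self.hegemony = true;
        self
    }

    fn with_peeringdb(mut self) -> Self {
        self.peeringdb = true;
        self
    }

    fn with_all(self) -> Self {
        self.with_as2org()
            .with_population()
            .with_hegemony()
            .with_peeringdb()
    }

    /// Flatten to the positional form a four-boolean loader would take.
    /// The order used here is an assumption.
    fn as_flags(&self) -> (bool, bool, bool, bool) {
        (self.as2org, self.population, self.hegemony, self.peeringdb)
    }
}

fn main() {
    let builder = AsInfoBuilder::new().with_as2org().with_hegemony();
    println!("{:?}", builder.as_flags());
    println!("{:?}", AsInfoBuilder::new().with_all().as_flags());
}
```

Each `with_*` method takes and returns the builder by value, which is what allows the calls to be chained and keeps every enabled source named at the call site.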

### Public API Enhancements

* **asinfo module**: Added `PeeringdbData` to public exports for direct module access
* All modules now consistently support both:
- Central access via `BgpkitCommons` instance
- Direct module access (e.g., `bgpkit_commons::bogons::Bogons::new()`)

### Testing Improvements

* **Comprehensive as2org module tests**: Added extensive unit tests for the migrated CAIDA AS-to-Organization functionality
- JSON deserialization tests for `As2orgJsonOrg` and `As2orgJsonAs` structures
- Tests for optional fields and default values
- `As2orgAsInfo` struct creation and serialization round-trip tests
- `fix_latin1_misinterpretation` function tests for edge cases
- Integration tests (ignored by default) for `As2org::new()`, `get_as_info()`, `get_siblings()`, and `are_siblings()` methods
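
The changelog names `fix_latin1_misinterpretation` without showing its body. As background, one common way to repair text whose UTF-8 bytes were wrongly decoded as Latin-1 is sketched below; this is an assumption about the general technique, not the crate's implementation, and the name `fix_latin1` is hypothetical.

```rust
/// If every char of `s` fits in one byte, reinterpret those bytes as UTF-8.
/// The input is returned unchanged when it is plain ASCII, contains a char
/// above U+00FF, or does not form valid UTF-8 after reinterpretation.
fn fix_latin1(s: &str) -> String {
    if s.is_ascii() {
        return s.to_string();
    }
    // Map each char back to the single byte it would have had in Latin-1.
    let bytes: Option<Vec<u8>> = s
        .chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect();
    bytes
        .and_then(|b| String::from_utf8(b).ok())
        .unwrap_or_else(|| s.to_string())
}

fn main() {
    // U+00C3 U+00B3 is how the UTF-8 bytes of U+00F3 look when read as Latin-1.
    let garbled = "Telef\u{00C3}\u{00B3}nica";
    println!("{} -> {}", garbled, fix_latin1(garbled));
}
```

Falling back to the original string whenever the reinterpreted bytes are not valid UTF-8 is what keeps correctly encoded names from being damaged, which is the kind of edge case the tests above are described as covering.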

* **Comprehensive peeringdb module tests**: Added extensive unit tests for the migrated PeeringDB functionality
- `PeeringdbData` struct creation, serialization, and deserialization tests
- `PeeringdbNet` struct tests with all optional fields
- `PeeringdbNetResponse` API response deserialization tests
- `Peeringdb` struct tests for `get_data()`, `contains()`, `len()`, `is_empty()`, and `get_all_asns()` methods
- Empty database edge case tests
- Integration tests (ignored by default) for live API access

* **New Peeringdb helper methods**: Added utility methods to the `Peeringdb` struct for better usability
- `len()`: Get the number of networks in the database
- `is_empty()`: Check if the database is empty
- `contains(asn)`: Check if an ASN exists in PeeringDB
- `get_all_asns()`: Get all ASNs in the database
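
A small model of these helpers, assuming the database is a map keyed by ASN, might look like the sketch below. The `PeeringdbModel` type, its `nets` field, and the return types are hypothetical; only the method names are taken from the list above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `Peeringdb` struct: ASN -> network name.
struct PeeringdbModel {
    nets: HashMap<u32, String>,
}

impl PeeringdbModel {
    fn len(&self) -> usize {
        self.nets.len()
    }

    fn is_empty(&self) -> bool {
        self.nets.is_empty()
    }

    fn contains(&self, asn: u32) -> bool {
        self.nets.contains_key(&asn)
    }

    /// Sorted here so the output is deterministic; the real ordering is not stated.
    fn get_all_asns(&self) -> Vec<u32> {
        let mut asns: Vec<u32> = self.nets.keys().copied().collect();
        asns.sort_unstable();
        asns
    }
}

fn main() {
    let mut nets = HashMap::new();
    nets.insert(64500, "Example Net A".to_string());
    nets.insert(64496, "Example Net B".to_string());
    let db = PeeringdbModel { nets };
    println!("{} networks, empty: {}", db.len(), db.is_empty());
    println!("has AS64500: {}", db.contains(64500));
    println!("all ASNs: {:?}", db.get_all_asns());
}
```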

## v0.9.6 - 2025-10-29

### Maintenance
16 changes: 10 additions & 6 deletions Cargo.toml
@@ -18,32 +18,32 @@ keywords = ["bgp", "bgpkit"]
 thiserror = "2.0"
 serde = { version = "1.0", features = ["derive"] }

-as2org-rs = { version = "1.1.1", optional = true }
 chrono = { version = "0.4", features = ["serde"], optional = true }
 ipnet = { version = "2.9", features = ["serde"], optional = true }
 ipnet-trie = { version = "0.3.0", optional = true }
-oneio = { version = "0.20.0", optional = true, features = ["json"] }
-peeringdb-rs = { version = "0.1.3", optional = true }
+oneio = { version = "0.20.0", optional = true, features = ["json", "xz"] }
 regex = { version = "1", optional = true }
 serde_json = { version = "1", optional = true }
 tracing = { version = "0.1", optional = true }
+reqwest = { version = "0.12", optional = true, features = ["blocking"] }
+tar = { version = "0.4", optional = true }

 [dev-dependencies]
 tracing-subscriber = "0.3"
 serde_json = "1"
-oneio = { version = "0.20.0", features = ["json"] }
+oneio = { version = "0.20.0", features = ["json", "xz"] }


 [features]
 default = ["all"]

 # Module features
-asinfo = ["as2org-rs", "peeringdb-rs", "oneio", "serde_json", "tracing", "chrono"]
+asinfo = ["oneio", "serde_json", "tracing", "chrono", "regex"]
 as2rel = ["oneio", "serde_json", "tracing"]
 bogons = ["oneio", "ipnet", "regex", "chrono"]
 countries = ["oneio"]
 mrt_collectors = ["oneio", "chrono"]
-rpki = ["oneio", "ipnet", "ipnet-trie", "chrono", "tracing"]
+rpki = ["oneio", "ipnet", "ipnet-trie", "chrono", "tracing", "reqwest", "tar", "serde_json"]

 # Convenience feature to enable all modules
 all = ["asinfo", "as2rel", "bogons", "countries", "mrt_collectors", "rpki"]
@@ -61,6 +61,10 @@ required-features = ["mrt_collectors"]
 name = "list_aspas"
 required-features = ["rpki"]

+[[example]]
+name = "rpki_historical"
+required-features = ["rpki"]
+
 [lints.clippy]
 uninlined_format_args = "allow"
 collapsible_if = "allow"
42 changes: 36 additions & 6 deletions examples/list_aspas.rs
@@ -1,9 +1,39 @@
-use serde_json::json;
+use bgpkit_commons::rpki::{RpkiTrie, RpkiViewsCollector};
+use chrono::NaiveDate;

 fn main() {
-    let cf_data = bgpkit_commons::rpki::CfData::new().unwrap();
-    println!(
-        "{}",
-        serde_json::to_string_pretty(&json!(cf_data.aspas)).unwrap()
-    );
+    println!("Counting ASPA objects on the first day of each year (2020-2025)");
+    println!("{}", "=".repeat(60));
+
+    for year in 2020..=2025 {
+        let date = NaiveDate::from_ymd_opt(year, 1, 1).unwrap();
+
+        // Try RIPE historical first
+        match RpkiTrie::from_ripe_historical(date) {
+            Ok(trie) => {
+                println!("{}-01-01: {} ASPAs (from RIPE)", year, trie.aspas.len());
+                continue;
+            }
+            Err(_) => {
+                // RIPE failed, try RPKIviews
+            }
+        }
+
+        // Fallback to RPKIviews
+        match RpkiTrie::from_rpkiviews(RpkiViewsCollector::default(), date) {
+            Ok(trie) => {
+                println!(
+                    "{}-01-01: {} ASPAs (from RPKIviews)",
+                    year,
+                    trie.aspas.len()
+                );
+            }
+            Err(_) => {
+                println!("{}-01-01: No data available", year);
+            }
+        }
+    }
+
+    println!("{}", "=".repeat(60));
+    println!("Done!");
 }
64 changes: 64 additions & 0 deletions examples/rpki_historical.rs
@@ -0,0 +1,64 @@
//! Example demonstrating historical RPKI data loading from RIPE NCC and RPKIviews
//!
//! Run with: cargo run --example rpki_historical --features rpki

use bgpkit_commons::BgpkitCommons;
use bgpkit_commons::rpki::{HistoricalRpkiSource, RpkiViewsCollector};
use chrono::NaiveDate;

fn main() {
    // Initialize tracing for debug output
    tracing_subscriber::fmt::init();

    let date = NaiveDate::from_ymd_opt(2024, 1, 4).unwrap();
    let commons = BgpkitCommons::new();

    println!("=== Listing available RPKI files for {} ===\n", date);

    // List files from RIPE NCC (one file per RIR)
    println!("RIPE NCC files:");
    match commons.list_rpki_files(date, HistoricalRpkiSource::Ripe) {
        Ok(files) => {
            for file in &files {
                println!(
                    " - {} (RIR: {})",
                    file.url,
                    file.rir
                        .map(|r| r.to_string())
                        .unwrap_or_else(|| "N/A".to_string())
                );
            }
        }
        Err(e) => println!(" Error listing RIPE files: {}", e),
    }
    println!();

    // List files from RPKIviews (multiple snapshots per day)
    println!("RPKIviews files (KerfuffleNet collector):");
    let source = HistoricalRpkiSource::RpkiViews(RpkiViewsCollector::KerfuffleNet);
    match commons.list_rpki_files(date, source) {
        Ok(files) => {
            println!(" Found {} files for {}", files.len(), date);
            // Show first 5 files
            for file in files.iter().take(5) {
                println!(
                    " - {} ({} bytes, timestamp: {})",
                    file.url,
                    file.size.unwrap_or(0),
                    file.timestamp
                );
            }
            if files.len() > 5 {
                println!(" ... and {} more files", files.len() - 5);
            }
        }
        Err(e) => println!(" Error listing RPKIviews files: {}", e),
    }
    println!();

    // Show available collectors
    println!("Available RPKIviews collectors:");
    for collector in RpkiViewsCollector::all() {
        println!(" - {} ({})", collector, collector.base_url());
    }
}