Merged
23 changes: 17 additions & 6 deletions .github/workflows/rust.yaml
@@ -10,18 +10,29 @@ env:
  CARGO_TERM_COLOR: always

jobs:
  build:

  format-and-docs:
    name: Format and Documentation Checks
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - uses: actions/checkout@v4

      - name: Run format check
        run: cargo fmt --check

      - name: Run cargo-readme check
        run: cargo install cargo-readme && cargo readme > TMP_README.md && diff -b TMP_README.md README.md

      - name: Run format check
        run: cargo fmt --check
  build-and-test:
    name: Build, Test, and Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build
        run: cargo build --verbose

      - name: Run tests
        run: cargo test --verbose

      - name: Run clippy
        run: cargo clippy --all-features -- -D warnings
47 changes: 47 additions & 0 deletions .github/workflows/scheduled-tests.yaml
@@ -0,0 +1,47 @@
name: Scheduled Tests

on:
  schedule:
    # Run daily at 3:00 AM UTC
    - cron: '0 3 * * *'
  workflow_dispatch:  # Allow manual trigger

env:
  CARGO_TERM_COLOR: always

jobs:
  daily-test:
    name: Daily Integration Test
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Cache cargo registry
        uses: actions/cache@v4
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

      - name: Cache cargo index
        uses: actions/cache@v4
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}

      - name: Cache cargo build
        uses: actions/cache@v4
        with:
          path: target
          key: ${{ runner.os }}-cargo-build-target-${{ hashFiles('**/Cargo.lock') }}

      - name: Build
        run: cargo build --verbose

      - name: Run tests
        run: cargo test --verbose
        timeout-minutes: 15

      - name: Run example
        run: cargo run --example find_siblings 15169
        timeout-minutes: 10
32 changes: 31 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,36 @@

All notable changes to this project will be documented in this file.

## v1.1.0 -- 2025-10-29

### New Features

* Added `As2org::get_all_files_with_dates()` method to list all available dataset files with their dates
* Added `As2org::get_latest_file_url()` method to get the URL for the latest dataset file
* Introduced `BASE_URL` constant for CAIDA dataset location, improving code maintainability

### Improvements

* Enhanced rustdoc with comprehensive examples and usage patterns
* Marked rustdoc examples with `no_run` to prevent unnecessary network calls during doc tests
* Significantly expanded test suite with 15 comprehensive unit tests covering:
  - Database initialization and loading
  - ASN information retrieval (existing and non-existent)
  - Sibling ASN lookups and consistency checks
  - Organization mapping validation
  - Helper function testing
  - Internal data structure consistency
* Optimized test suite to minimize data fetching by sharing database instance across tests
* Improved code documentation and inline comments
* Updated GitHub CI workflows:
  - Separated format/documentation checks from build/test/lint into parallel jobs
  - Added scheduled daily tests to continuously verify library compatibility with CAIDA data

### Bug Fixes

* Fixed regex pattern in `get_most_recent_data()` to use proper digit matching (`\d{8}`)
* Refactored URL construction to use `BASE_URL` constant for consistency
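
The `\d{8}` fix and the new file-listing helpers both revolve around the `YYYYMMDD` prefix of the dataset file names. As a rough, std-only illustration (not the crate's actual code, which relies on the `regex` crate), the sketch below shows what an eight-digit date check and a "latest file" selection might look like. The names `date_prefix`, `latest_file`, `latest_file_url`, and `SUFFIX` are hypothetical, and the `BASE_URL` value is an assumption inferred from the example URL in the README.

```rust
// Assumed value, inferred from the README's example URL; the crate's real
// `BASE_URL` constant may differ.
const BASE_URL: &str = "https://publicdata.caida.org/datasets/as-organizations";

// File-name suffix as it appears in the README examples.
const SUFFIX: &str = ".as-org2info.jsonl.gz";

/// Hypothetical helper: return the leading `YYYYMMDD` part of a dataset file
/// name, or `None` if the name does not start with exactly eight ASCII digits
/// followed by the expected suffix.
fn date_prefix(file_name: &str) -> Option<&str> {
    let (date, rest) = (file_name.get(..8)?, file_name.get(8..)?);
    if date.bytes().all(|b| b.is_ascii_digit()) && rest == SUFFIX {
        Some(date)
    } else {
        None
    }
}

/// Hypothetical helper: pick the most recent dataset file from a listing.
/// The date prefix is fixed-width and zero-padded, so the lexicographic
/// maximum of the valid names is also the chronologically latest one.
fn latest_file<'a>(names: &[&'a str]) -> Option<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| date_prefix(name).is_some())
        .max()
}

/// Hypothetical helper: build the download URL for the latest file.
fn latest_file_url(names: &[&str]) -> Option<String> {
    latest_file(names).map(|name| format!("{}/{}", BASE_URL, name))
}
```

Names that do not match the dated pattern (for example an index page entry) are simply skipped, and an empty or non-matching listing yields `None` rather than an error.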

## v1.0.0 -- 2025-04-04

This crate is now being used in several production systems, and we now consider this crate stable.
@@ -24,7 +54,7 @@ Initial release of `as2org-rs`.
The main returning data structure is `As2orgAsInfo`, which contains the following fields:

* `asn`: the AS number
* `name`: the name provide for the individual AS number
* `name`: the name provided for the individual AS number
* `country_code`: the country code of the organization's registration country
* `org_id`: maps to an organization entry
* `org_name`: the name of the organization
5 changes: 3 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "as2org-rs"
version = "1.0.0"
version = "1.1.0"
authors = ["Mingwei Zhang <mingwei@bgpkit.com>"]
edition = "2021"
readme = "README.md"
@@ -13,8 +13,9 @@ A library helps accessing CAIDA's as-to-organization mapping data.
keywords = ["bgp", "bgpkit", "caida", "as2org"]

[dependencies]
oneio = { version = "0.17.0", default-features = false, features = ["remote", "gz", "rustls"] }
oneio = { version = "0.19.2", default-features = false, features = ["https", "gz"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
regex = "1.10.5"
chrono = { version = "0.4" }
91 changes: 78 additions & 13 deletions README.md
@@ -6,32 +6,97 @@
[![Docs.rs](https://docs.rs/as2org-rs/badge.svg)](https://docs.rs/as2org-rs)
[![License](https://img.shields.io/crates/l/as2org-rs)](https://raw.githubusercontent.com/bgpkit/as2org-rs/main/LICENSE)

## CAIDA as2org utility.
as2org-rs: Access CAIDA AS-to-Organization mappings in Rust

This crate provides a small, dependency-light helper for reading and querying
CAIDA's AS Organizations dataset. It downloads (or opens a local/remote path)
the newline-delimited JSON (JSONL) files published by CAIDA and exposes a
simple API to:

- Fetch the latest dataset URL from CAIDA
- Load the dataset into memory
- Look up information for a given ASN
- Find all "sibling" ASNs that belong to the same organization
- Test whether two ASNs are siblings (belong to the same org)

The crate supports local files, HTTP(S) URLs, and gz-compressed inputs via
the `oneio` crate.

### Installation

Add the dependency to your `Cargo.toml`:

```toml
[dependencies]
as2org-rs = "1"
```

### Data source
* The CAIDA [AS Organizations Dataset](http://www.caida.org/data/as-organizations).
- CAIDA AS Organizations Dataset: <http://www.caida.org/data/as-organizations>

### Data model

Public return type:

### Data structure
`As2orgAsInfo` contains:
- `asn`: the AS number
- `name`: the name provided for the individual AS number
- `country_code`: the registration country code of the organization
- `org_id`: the CAIDA/WHOIS organization identifier
- `org_name`: the organization's name
- `source`: the RIR or NIR database that contained this entry

`As2orgAsInfo`:
* `asn`: the AS number
* `name`: the name provide for the individual AS number
* `country_code`: the country code of the organization's registration country
* `org_id`: maps to an organization entry
* `org_name`: the name of the organization
* `source`: the RIR or NIR database which was contained this entry
### Quickstart

### Examples
Load the most recent dataset and run typical queries:

```rust
use as2org_rs::As2org;

// Construct from the latest public dataset (requires network access)
let as2org = As2org::new(None).unwrap();
dbg!(as2org.get_as_info(400644).unwrap());
dbg!(as2org.get_siblings(15169).unwrap());

// Look up one ASN
let info = as2org.get_as_info(15169).unwrap();
assert!(!info.org_id.is_empty());

// List all siblings for an ASN (ASNs under the same org)
let siblings = as2org.get_siblings(15169).unwrap();
assert!(siblings.iter().any(|s| s.asn == 36040));

// Check whether two ASNs are siblings
assert!(as2org.are_siblings(15169, 36040));
```

### Offline and custom input

You can also point to a local file path or a remote URL (HTTP/HTTPS), gzipped
or plain:

```rust
use as2org_rs::As2org;

// From a local jsonl.gz file
let as2org = As2org::new(Some("/path/to/20250101.as-org2info.jsonl.gz".into())).unwrap();

// From an explicit HTTPS URL
let as2org = As2org::new(Some("https://publicdata.caida.org/datasets/as-organizations/20250101.as-org2info.jsonl.gz".into())).unwrap();
```

### Errors

Constructors and helper functions return `anyhow::Result<T>`. For lookups,
the API returns `Option<_>` when a requested ASN or organization is missing.

### Notes

- Network access is only required when you pass `None` to `As2org::new` so the
crate can discover and fetch the latest dataset URL.
- Dataset files can be large; loading them will allocate in-memory maps for
fast queries.
- This crate is not affiliated with CAIDA. Please review CAIDA's data usage
policies before redistribution or heavy automated access.
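
The `Option`-returning lookups and the in-memory maps described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: `SiblingIndex`, `from_pairs`, and the organization IDs `ORG-A` and `ORG-B` are hypothetical and do not reflect the crate's internal layout or real CAIDA data.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an AS-to-organization index; not the crate's
/// actual data structure.
struct SiblingIndex {
    asn_to_org: HashMap<u32, String>,
    org_to_asns: HashMap<String, Vec<u32>>,
}

impl SiblingIndex {
    /// Build both maps from `(asn, org_id)` pairs. Each ASN is assumed to
    /// appear at most once in the input.
    fn from_pairs(pairs: &[(u32, &str)]) -> Self {
        let mut asn_to_org = HashMap::new();
        let mut org_to_asns: HashMap<String, Vec<u32>> = HashMap::new();
        for &(asn, org) in pairs {
            asn_to_org.insert(asn, org.to_string());
            org_to_asns.entry(org.to_string()).or_default().push(asn);
        }
        Self { asn_to_org, org_to_asns }
    }

    /// All ASNs registered under the same organization as `asn`
    /// (including `asn` itself), or `None` if the ASN is unknown.
    fn siblings(&self, asn: u32) -> Option<&[u32]> {
        let org = self.asn_to_org.get(&asn)?;
        self.org_to_asns.get(org).map(Vec::as_slice)
    }

    /// `true` only when both ASNs are known and map to the same organization.
    fn are_siblings(&self, a: u32, b: u32) -> bool {
        match (self.asn_to_org.get(&a), self.asn_to_org.get(&b)) {
            (Some(x), Some(y)) => x == y,
            _ => false,
        }
    }
}
```

In this model a missing ASN surfaces as `None` from `siblings` and as `false` from `are_siblings`, which mirrors the "missing means `Option`" convention described in the Errors section without requiring any error type.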

## License

MIT