ovipaul/BathyNet

BathyNet

Standalone point cloud semantic segmentation with PointCNN (PyTorch).

This repository provides an end-to-end pipeline for classifying LAS/LAZ point clouds:

  1. Convert LAS to H5 training blocks
  2. Train a PointCNN model
  3. Run inference on new point clouds

No proprietary dependencies are required.

Features

  • Full pipeline: LAS → H5 → training → inference
  • PointCNN segmentation model with hierarchical X-Conv layers
  • Clean, configurable CLIs for preprocessing, training, and inference
  • Works with standard LAS classes and optional features (intensity, num_returns)
  • JSON-config support for reproducible runs

Repository structure

BathyNet/
├── 01_prepare_data.py              # Convert LAS to H5 blocks
├── 02_train_model.py               # Train PointCNN on H5 data
├── 03_run_inference.py             # Inference on LAS/LAZ with trained model
├── config_example.json             # Example configuration
├── pointcnn_default_config.json    # Default hyperparameters (reference)
├── requirements.txt                # Python dependencies
├── models/
│   ├── pointcnn_core.py            # PointCNN network + dataset + inference utils
│   ├── pointcnn_segmentation_trainer.py
│   └── trainer.py                  # Training entry used by 02_train_model.py
└── utilities/
    └── data_converter.py           # LAS → H5 converter

Requirements

  • Python 3.8+ recommended
  • PyTorch 1.8+ (with CUDA if using GPU)
  • See requirements.txt for the full list:
    • torch, torchvision, torchaudio
    • numpy, h5py, laspy
    • scikit-learn, scipy, tqdm, matplotlib, seaborn

Install dependencies:

pip install -r requirements.txt

1) Data preparation (LAS → H5)

Script: 01_prepare_data.py

This script converts raw LAS files into H5 blocks suitable for training. It creates train/ and val/ subfolders under the output path and writes a meta.json file describing the classes and settings.

Basic usage:

python 01_prepare_data.py --input_dir ./data/las_files --output_dir ./data/h5_files

Advanced options (see --help for all):

python 01_prepare_data.py \
  --input_dir ./data/las_files \
  --output_dir ./data/h5_files \
  --block_size 50 \
  --max_points 8192 \
  --train_split 0.8 \
  --intensity_range 0 5000 \
  --returns_range 1 5

Config-file driven (reproducible):

python 01_prepare_data.py --config preprocessing_config.json

Example preprocessing_config.json:

{
  "input_dir": "./data/las_files",
  "output_dir": "./data/h5_files",
  "block_size": 50.0,
  "max_points": 8192,
  "train_split": 0.8,
  "augment": false,
  "workers": 1,
  "intensity_range": [0, 5000],
  "returns_range": [1, 5]
}
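The exact meaning of intensity_range and returns_range is not spelled out here. One plausible reading, and only an assumption, is that each pair gives the lower and upper bounds used to clip a feature and rescale it to [0, 1]. The helper name scale_to_unit below is hypothetical and is not taken from the repository:

```python
import numpy as np


def scale_to_unit(values, value_range):
    """Clip values to [low, high] and rescale them linearly to [0, 1].

    Assumed behaviour for illustration; the converter may scale differently.
    """
    low, high = value_range
    if high <= low:
        raise ValueError("value_range must be (low, high) with high > low")
    clipped = np.clip(np.asarray(values, dtype=np.float64), low, high)
    return (clipped - low) / (high - low)


# Intensities above the upper bound saturate at 1.0.
intensity = scale_to_unit([0, 2500, 5000, 9000], (0, 5000))
# Return counts 1..5 map onto 0..1.
num_returns = scale_to_unit([1, 3, 5], (1, 5))
```

Under this reading, bounds that are too narrow for your sensor would saturate many points at 0 or 1, so it may be worth checking the actual value ranges in your LAS files before choosing them.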

Notes

  • The converter reads XYZ, labels (Classification), and optional features (intensity, num_returns) from LAS.
  • Coordinates are normalized per block. Points are stored internally in XZY axis order, and the model handles this ordering consistently.
  • meta.json records class IDs discovered in your dataset.
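To make the block terminology in these notes concrete, here is a minimal NumPy sketch of tiling a cloud into square XY blocks, capping each block at max_points, normalizing per block, and reordering axes to XZY. All function names are hypothetical, and the normalization convention (center on the block mean, divide by half the block size) is an assumption; utilities/data_converter.py may do this differently.

```python
import numpy as np


def split_into_blocks(points, block_size=50.0):
    """Group points into square XY tiles; returns {(ix, iy): point indices}."""
    origin = points[:, :2].min(axis=0)
    tile = np.floor((points[:, :2] - origin) / block_size).astype(np.int64)
    blocks = {}
    for key in np.unique(tile, axis=0):
        mask = np.all(tile == key, axis=1)
        blocks[(int(key[0]), int(key[1]))] = np.flatnonzero(mask)
    return blocks


def sample_block(indices, max_points=8192, rng=None):
    """Randomly subsample a block that holds more than max_points points."""
    rng = np.random.default_rng(0) if rng is None else rng
    if len(indices) > max_points:
        return rng.choice(indices, size=max_points, replace=False)
    return indices


def normalize_block(block_xyz, block_size=50.0):
    """Center a block on its mean and scale by half the block size (assumed)."""
    centered = block_xyz - block_xyz.mean(axis=0)
    return centered / (block_size / 2.0)


def to_xzy(xyz):
    """Reorder columns from XYZ to XZY."""
    return xyz[:, [0, 2, 1]]


points = np.array([
    [0.0, 0.0, 1.0],
    [10.0, 10.0, 2.0],
    [60.0, 5.0, 3.0],
    [120.0, 70.0, 4.0],
])
blocks = split_into_blocks(points, block_size=50.0)
first_block = normalize_block(points[blocks[(0, 0)]], block_size=50.0)
```

With a 50 m block size, the four sample points fall into three tiles, and the two points in tile (0, 0) end up centered on zero after normalization.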

2) Train the model

Script: 02_train_model.py

Train a PointCNN segmentation model using the preprocessed H5 dataset.

Basic usage:

python 02_train_model.py --data_dir ./data/h5_files --output_dir ./models/output

Common options:

python 02_train_model.py \
  --data_dir ./data/h5_files \
  --output_dir ./models/output \
  --epochs 100 \
  --batch_size 8 \
  --learning_rate 0.001 \
  --num_classes 4 \
  --num_points 8192

Resume training from a checkpoint:

python 02_train_model.py --data_dir ./data/h5_files --output_dir ./models/output \
  --resume ./models/output/checkpoint_epoch_50.pth

You can also provide a JSON config:

python 02_train_model.py --config training_config.json

Example training_config.json:

{
  "data_dir": "./data/h5_files",
  "output_dir": "./models/output",
  "epochs": 100,
  "batch_size": 8,
  "learning_rate": 0.001,
  "weight_decay": 0.0001,
  "num_classes": 4,
  "num_points": 8192,
  "feature_dim": 5,
  "validate_every": 5,
  "save_every": 10,
  "early_stopping": 20,
  "workers": 4,
  "device": "auto"
}
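How values from a JSON config combine with command-line flags is not stated here. A common pattern, sketched below with argparse, is defaults first, then the JSON file, then any flags given explicitly. This precedence, the resolve_settings helper, and the placeholder defaults are assumptions for illustration, not the behaviour of 02_train_model.py:

```python
import argparse
import json

# Placeholder defaults for this sketch only.
DEFAULTS = {"epochs": 100, "batch_size": 8, "learning_rate": 0.001}


def build_parser():
    parser = argparse.ArgumentParser(description="Config-merging sketch")
    parser.add_argument("--config", default=None)
    parser.add_argument("--data_dir", default=None)
    parser.add_argument("--epochs", type=int, default=None)
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--learning_rate", type=float, default=None)
    return parser


def resolve_settings(argv, defaults=DEFAULTS):
    """Merge settings with precedence: defaults < JSON config < CLI flags."""
    args = vars(build_parser().parse_args(argv))
    config_path = args.pop("config")
    settings = dict(defaults)
    if config_path:
        with open(config_path, "r", encoding="utf-8") as handle:
            settings.update(json.load(handle))
    # Only flags the user actually passed (non-None) override the config.
    settings.update({k: v for k, v in args.items() if v is not None})
    return settings
```

For example, resolve_settings(["--config", "training_config.json", "--batch_size", "4"]) would take epochs from the file and batch_size from the flag. Check --help or the script source to confirm how the real CLI resolves conflicts.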

Outputs

  • Model checkpoints and the final model under --output_dir
  • A copy of the training configuration for reproducibility

3) Run inference on LAS/LAZ

Script: 03_run_inference.py

Use a trained model to classify new point clouds. The script expects the dataset's meta.json (used for class mapping) from your training data directory, which is passed as --data_path in the examples below.

Single-file inference:

python 03_run_inference.py \
  --model_path ./models/output/pointcnn_best_model.pth \
  --data_path ./data/h5_files \
  --las_file ./data/test/sample.las \
  --output_dir ./results

Batch inference:

python 03_run_inference.py \
  --model_path ./models/output/pointcnn_best_model.pth \
  --data_path ./data/h5_files \
  --las_dir ./data/test \
  --output_dir ./results/batch

Useful options:

  • --block_size (default 50.0), --max_points (default 8192)
  • --save_h5 to store intermediate H5 outputs
  • --detailed_metrics to compute metrics if ground-truth labels are present
  • --selective_classify <ids...>: only reclassify specific input classes
  • --preserve_classes <ids...>: never change these classes
  • --remap_classes old:new old2:new2: remap classes before writing
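The three class-filtering options interact, so a small sketch may help. It assumes that preserved classes always keep their input label, that selective classification limits changes to the listed input classes, and that remapping is applied last to every point. That order is an assumption, and the helper names are hypothetical; the script may combine the options differently.

```python
def parse_remap(pairs):
    """Parse ['5:9', '7:1'] into {5: 9, 7: 1}."""
    mapping = {}
    for pair in pairs:
        old, new = pair.split(":")
        mapping[int(old)] = int(new)
    return mapping


def merge_labels(original, predicted, selective=None, preserve=None, remap=None):
    """Combine input and predicted class codes point by point."""
    selective = set(selective) if selective else None
    preserve = set(preserve or ())
    remap = remap or {}
    merged = []
    for old, new in zip(original, predicted):
        # Keep the input label if it is protected or not selected for change.
        keep_old = old in preserve or (selective is not None and old not in selective)
        label = old if keep_old else new
        merged.append(remap.get(label, label))
    return merged


result = merge_labels(
    original=[1, 2, 6, 1],
    predicted=[2, 2, 2, 5],
    selective=[1],
    preserve=[6],
    remap=parse_remap(["5:9"]),
)
```

Here only the points that were class 1 in the input are reclassified, the class 6 point is left untouched, and the predicted class 5 is written as 9.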

Model and data details

  • Network: PointCNN encoder–decoder with X-Conv layers and per-point classification head.
  • Dataset: H5 files contain normalized blocks; features include XYZ (XZY ordering internally), intensity, and num_returns when available.
  • Classes: Derived from LAS Classification values present in your data (see meta.json).
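Because classes come from whatever Classification values appear in the data, the network's output indices (0 to num_classes - 1) need a mapping back to LAS codes. The sketch below shows one way to build such a mapping with NumPy. The codes are arbitrary examples, the function names are hypothetical, and the actual layout of meta.json is not described here.

```python
import numpy as np


def build_class_mapping(las_codes):
    """Return (sorted unique LAS codes, contiguous index per point)."""
    class_ids, indices = np.unique(np.asarray(las_codes), return_inverse=True)
    return class_ids, indices


def to_las_codes(class_ids, predicted_indices):
    """Map contiguous model outputs back to the original LAS codes."""
    return class_ids[np.asarray(predicted_indices)]


codes = np.array([2, 9, 2, 40, 41, 9])
class_ids, train_labels = build_class_mapping(codes)
restored = to_las_codes(class_ids, train_labels)
```

Four distinct codes yield four training labels (0 to 3), which is the kind of count that --num_classes would need to match.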

Troubleshooting

  • “No H5 files found” during training: Run the preprocessing step and verify that train/ and val/ contain .h5 files and that meta.json exists under --data_dir.
  • “Model file not found” during inference: Make sure you pass the correct --model_path to a .pth file saved by training.
  • Large memory usage: Reduce --num_points (training) or --max_points (inference) and/or decrease --batch_size.
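The first check above can be automated. This short standalone sketch uses only the standard library to verify the layout described in this README (train/ and val/ folders containing .h5 files, plus meta.json). The function name check_dataset_dir is hypothetical and is not part of the repository.

```python
from pathlib import Path


def check_dataset_dir(data_dir):
    """Return a list of problems found in a preprocessed dataset folder."""
    root = Path(data_dir)
    problems = []
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
        elif not any(split_dir.glob("*.h5")):
            problems.append(f"no .h5 files in: {split_dir}")
    if not (root / "meta.json").is_file():
        problems.append(f"missing file: {root / 'meta.json'}")
    return problems


if __name__ == "__main__":
    for problem in check_dataset_dir("./data/h5_files"):
        print(problem)
```

An empty list means the folder has the expected structure. It does not validate the contents of the H5 files themselves.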
