Standalone point cloud semantic segmentation with PointCNN (PyTorch).
This repository provides an end-to-end pipeline for classifying LAS/LAZ point clouds:
1) convert LAS to H5 training blocks, 2) train a PointCNN model, and 3) run inference on new point clouds. No proprietary dependencies are required.
- Full pipeline: LAS → H5 → training → inference
- PointCNN segmentation model with hierarchical X-Conv layers
- Clean, configurable CLIs for preprocessing, training, and inference
- Works with standard LAS classes and optional features (intensity, num_returns)
- JSON-config support for reproducible runs
```
BathyNet/
├── 01_prepare_data.py              # Convert LAS to H5 blocks
├── 02_train_model.py               # Train PointCNN on H5 data
├── 03_run_inference.py             # Inference on LAS/LAZ with trained model
├── config_example.json             # Example configuration
├── pointcnn_default_config.json    # Default hyperparameters (reference)
├── requirements.txt                # Python dependencies
├── models/
│   ├── pointcnn_core.py            # PointCNN network + dataset + inference utils
│   ├── pointcnn_segmentation_trainer.py
│   └── trainer.py                  # Training entry used by 02_train_model.py
└── utilities/
    └── data_converter.py           # LAS → H5 converter
```
- Python 3.8+ recommended
- PyTorch 1.8+ (with CUDA if using GPU)
- See `requirements.txt` for the full list:
  - torch, torchvision, torchaudio
  - numpy, h5py, laspy
  - scikit-learn, scipy, tqdm, matplotlib, seaborn
Install dependencies:
```bash
pip install -r requirements.txt
```

Script: 01_prepare_data.py
This script converts raw LAS files into H5 blocks suitable for training. It creates `train/` and `val/` subfolders under the output path and writes a `meta.json` file describing classes and settings.
Basic usage:
```bash
python 01_prepare_data.py --input_dir ./data/las_files --output_dir ./data/h5_files
```

Advanced options (see `--help` for all):
```bash
python 01_prepare_data.py \
    --input_dir ./data/las_files \
    --output_dir ./data/h5_files \
    --block_size 50 \
    --max_points 8192 \
    --train_split 0.8 \
    --intensity_range 0 5000 \
    --returns_range 1 5
```

Config-file driven (reproducible):
```bash
python 01_prepare_data.py --config preprocessing_config.json
```

Example `preprocessing_config.json`:
```json
{
  "input_dir": "./data/las_files",
  "output_dir": "./data/h5_files",
  "block_size": 50.0,
  "max_points": 8192,
  "train_split": 0.8,
  "augment": false,
  "workers": 1,
  "intensity_range": [0, 5000],
  "returns_range": [1, 5]
}
```
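The `intensity_range` and `returns_range` entries describe the expected raw value ranges of the optional features. One plausible interpretation, shown below purely as an assumption (check `utilities/data_converter.py` for the actual behavior), is clipping and scaling each feature into [0, 1]:

```python
import numpy as np

def normalize_feature(values, vmin, vmax):
    """Clip a raw LAS feature (e.g. intensity) to [vmin, vmax] and scale to [0, 1].
    This mirrors one plausible reading of --intensity_range / --returns_range;
    the real converter may handle these ranges differently."""
    values = np.clip(values.astype(np.float32), vmin, vmax)
    return (values - vmin) / max(vmax - vmin, 1e-6)

intensity = np.array([0, 1200, 4800, 9000])   # raw LAS intensities
print(normalize_feature(intensity, 0, 5000))  # approximately [0.0, 0.24, 0.96, 1.0]
```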
Notes
- The converter reads XYZ, labels (Classification), and optional features (intensity, num_returns) from LAS.
- Coordinates are normalized per block; XZY layout is used internally and handled consistently in the model.
- `meta.json` records the class IDs discovered in your dataset.
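To sanity-check the converted data, you can open one of the generated blocks with `h5py` and read `meta.json`. The file name and dataset keys below are hypothetical; inspect your own output to see the actual layout:

```python
import json
import h5py

# Inspect one converted block; the file name and dataset keys here are
# assumptions, not guaranteed to match utilities/data_converter.py output.
with h5py.File("./data/h5_files/train/block_0000.h5", "r") as f:
    for name, dset in f.items():
        print(name, dset.shape, dset.dtype)

# meta.json records the class IDs discovered during conversion.
with open("./data/h5_files/meta.json") as f:
    meta = json.load(f)
print(meta)
```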
Script: 02_train_model.py
Train a PointCNN segmentation model using the preprocessed H5 dataset.
Basic usage:
```bash
python 02_train_model.py --data_dir ./data/h5_files --output_dir ./models/output
```

Common options:
```bash
python 02_train_model.py \
    --data_dir ./data/h5_files \
    --output_dir ./models/output \
    --epochs 100 \
    --batch_size 8 \
    --learning_rate 0.001 \
    --num_classes 4 \
    --num_points 8192
```

Resume training from a checkpoint:
```bash
python 02_train_model.py --data_dir ./data/h5_files --output_dir ./models/output \
    --resume ./models/output/checkpoint_epoch_50.pth
```

You can also provide a JSON config:
```bash
python 02_train_model.py --config training_config.json
```

Example `training_config.json`:
```json
{
  "data_dir": "./data/h5_files",
  "output_dir": "./models/output",
  "epochs": 100,
  "batch_size": 8,
  "learning_rate": 0.001,
  "weight_decay": 0.0001,
  "num_classes": 4,
  "num_points": 8192,
  "feature_dim": 5,
  "validate_every": 5,
  "save_every": 10,
  "early_stopping": 20,
  "workers": 4,
  "device": "auto"
}
```

Outputs
- Model checkpoints and the final model under `--output_dir`
- A copy of the training configuration for reproducibility
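If a resume fails or you want to confirm what a checkpoint contains before reusing it, you can inspect the file directly. The structure assumed below (a dict bundling the model weights with metadata such as the epoch) is a guess about how the trainer saves checkpoints, not a documented format:

```python
import torch

# Load a checkpoint on CPU just to inspect its contents.
ckpt = torch.load("./models/output/checkpoint_epoch_50.pth", map_location="cpu")

if isinstance(ckpt, dict):
    # Many trainers store entries like "epoch" or "model_state_dict";
    # the exact key names here are assumptions.
    for key in ckpt:
        print(key)
else:
    # Some scripts save the bare state_dict or a pickled module instead.
    print(type(ckpt))
```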
Script: 03_run_inference.py
Use a trained model to classify new point clouds. The script expects the dataset `meta.json` (for class mapping) from your training data directory.
Single-file inference:
```bash
python 03_run_inference.py \
    --model_path ./models/output/pointcnn_best_model.pth \
    --data_path ./data/h5_files \
    --las_file ./data/test/sample.las \
    --output_dir ./results
```

Batch inference:
```bash
python 03_run_inference.py \
    --model_path ./models/output/pointcnn_best_model.pth \
    --data_path ./data/h5_files \
    --las_dir ./data/test \
    --output_dir ./results/batch
```

Useful options:
- `--block_size` (default 50.0), `--max_points` (default 8192)
- `--save_h5` to store intermediate H5 outputs
- `--detailed_metrics` to compute metrics if ground-truth labels are present
- `--selective_classify <ids...>`: only reclassify specific input classes
- `--preserve_classes <ids...>`: never change these classes
- `--remap_classes old:new old2:new2`: remap classes before writing
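As a rough illustration of how the class-handling options interact, the sketch below mimics their semantics with NumPy. It is a conceptual example under an assumed order of operations, not the actual logic in `03_run_inference.py`:

```python
import numpy as np

def apply_class_options(original, predicted,
                        selective=None, preserve=None, remap=None):
    """Conceptual sketch of --selective_classify / --preserve_classes /
    --remap_classes semantics. Not the script's actual implementation."""
    out = predicted.copy()

    # --selective_classify: only points whose ORIGINAL class is listed
    # receive a new label; everything else keeps its original class.
    if selective:
        keep_original = ~np.isin(original, selective)
        out[keep_original] = original[keep_original]

    # --preserve_classes: never change points with these original classes.
    if preserve:
        locked = np.isin(original, preserve)
        out[locked] = original[locked]

    # --remap_classes old:new ...: remap labels before writing the LAS.
    if remap:
        for old, new in remap.items():
            out[out == old] = new

    return out

# Example: only reclassify classes 2 and 7, keep class 9 untouched,
# and remap predicted class 3 to 5 before writing.
original = np.array([2, 2, 7, 9, 5])
predicted = np.array([4, 3, 3, 4, 4])
print(apply_class_options(original, predicted,
                          selective=[2, 7], preserve=[9], remap={3: 5}))
# -> [4 5 5 9 5]
```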
- Network: PointCNN encoder–decoder with X-Conv layers and a per-point classification head.
- Dataset: H5 files contain normalized blocks; features include XYZ (XZY ordering internally), intensity, and num_returns when available.
- Classes: Derived from LAS Classification values present in your data (see `meta.json`).
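For intuition, a single X-Conv step can be sketched as follows: lift the local neighbor coordinates into point-wise features, learn a K×K transformation from the local geometry, apply it to the neighbor features, then aggregate. This is a simplified, self-contained illustration; the real layers live in `models/pointcnn_core.py` and differ in detail:

```python
import torch
import torch.nn as nn

class XConvSketch(nn.Module):
    """Conceptual single X-Conv step (simplified; not the repo's implementation).
    Input: neighbor coords (B, K, 3) relative to each representative point and
    neighbor features (B, K, C_in). Output: (B, C_out) per representative point."""

    def __init__(self, c_in, c_out, k, c_delta=16):
        super().__init__()
        self.k = k
        # Lift local coordinates into point-wise features.
        self.mlp_delta = nn.Sequential(nn.Linear(3, c_delta), nn.ReLU(),
                                       nn.Linear(c_delta, c_delta), nn.ReLU())
        # Learn a K x K transformation from the local geometry.
        self.mlp_x = nn.Sequential(nn.Linear(3 * k, k * k), nn.ReLU(),
                                   nn.Linear(k * k, k * k))
        # Final "convolution": aggregate the K transformed neighbors.
        self.conv = nn.Linear(k * (c_delta + c_in), c_out)

    def forward(self, local_xyz, feats):
        b, k, _ = local_xyz.shape
        f_delta = self.mlp_delta(local_xyz)              # (B, K, c_delta)
        f_star = torch.cat([f_delta, feats], dim=-1)     # (B, K, c_delta + C_in)
        x_mat = self.mlp_x(local_xyz.reshape(b, -1))     # (B, K*K)
        x_mat = x_mat.view(b, k, k)                      # (B, K, K)
        f_x = torch.bmm(x_mat, f_star)                   # weight/permute neighbors
        return self.conv(f_x.reshape(b, -1))             # (B, c_out)

# Toy usage: 4 representative points, 8 neighbors each, 2 extra features.
xconv = XConvSketch(c_in=2, c_out=32, k=8)
out = xconv(torch.randn(4, 8, 3), torch.randn(4, 8, 2))
print(out.shape)  # torch.Size([4, 32])
```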
- “No H5 files found” during training: Run the preprocessing step and verify that `train/` and `val/` contain `.h5` files and that `meta.json` exists under `--data_dir`.
- “Model file not found” during inference: Make sure you pass the correct `--model_path` to a `.pth` file saved by training.
- Large memory usage: Reduce `--num_points` (training) or `--max_points` (inference) and/or decrease `--batch_size`.