A toolbox for large-scale, single-image semantic segmentation.
To install this package, we recommend using Conda to create a virtual environment and install the dependencies:
```bash
conda env create -f environment.yml
```

Once the installation is done, you should be able to run the tests:

```bash
pytest tests
```
- Data preprocessing. An example configuration is provided in `configs/config_preprocess.yaml`. Make sure to adapt it to your folder organization. Then run the following command:

```bash
python scripts/main_preprocessing.py -c /path/to/your_config.yaml
```
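For reference, a preprocessing configuration might look roughly like the sketch below. The keys shown here are hypothetical (only `patch_size` appears in this documentation); treat the shipped `configs/config_preprocess.yaml` as the authoritative template.

```yaml
# Hypothetical sketch only — adapt from configs/config_preprocess.yaml
input_fits: /path/to/galactic_plane.fits   # source .fits file
output_h5: /path/to/patches.h5             # destination patch file
patch_size: 128                            # side length of each square patch
```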
- Training pipeline. Two possibilities: use `scripts/main_train.py` for a standard training, or `scripts/main_training_k_fold.py` for a k-fold training. Make sure to use the example configuration `configs/training.yaml` or `config_kfolds.yaml` according to your use case, with the right paths.

```bash
python scripts/main_train.py -c /path/to/your_config.yaml
```
- Run segmentation.
Details about the different steps are given below.
In this phase, the data representing the Galactic plane, contained in a `.fits` file, is transformed into an HDF5 file of patches. Each patch has a fixed size `patch_size`. If a patch is empty, it is discarded.
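To make the transformation concrete, here is a minimal sketch of the idea, assuming a single-HDU FITS image and a simple `patches.h5` layout (the file names and dataset layout are illustrative, not the script's actual ones):

```python
import h5py
import numpy as np
from astropy.io import fits

patch_size = 128  # assumption: taken from the preprocessing config

with fits.open("galactic_plane.fits") as hdul:  # illustrative path
    image = np.nan_to_num(hdul[0].data)

patches = []
for i in range(0, image.shape[0] - patch_size + 1, patch_size):
    for j in range(0, image.shape[1] - patch_size + 1, patch_size):
        patch = image[i:i + patch_size, j:j + patch_size]
        if np.any(patch):  # discard empty patches
            patches.append(patch)

with h5py.File("patches.h5", "w") as f:  # illustrative layout
    f.create_dataset("patches", data=np.stack(patches))
```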
The training pipeline covers two use cases: standard and k-fold. The standard use case is a special case of k-fold with k=1, so we won't detail it here, as it can be deduced from the k-fold case.
We now describe each step of the training pipeline:
1- Initialization.
During the init phase, the fold_controller is initialized according to its section of the configuration file. This controller loads the `patches.h5` file created during the previous phase. Then it generates the k-fold splits according to the configuration; for example, if k=4 and k_train=2:
```python
splits = [[[1, 2], [3], [4]], [[3, 4], [1], [2]]]
```
with two folds in the train set, and the remaining folds separated into validation and test sets.
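As an illustration, the following sketch (a hypothetical reimplementation inferred from the example above, not the library's actual code) reproduces this split structure by rotating the fold ids by k_train between splits:

```python
def make_splits(k: int, k_train: int):
    """Sketch: each split takes k_train consecutive folds for training,
    then one fold for validation and one for testing."""
    folds = list(range(1, k + 1))
    splits = []
    for i in range(k // k_train):
        start = i * k_train
        train = [folds[(start + j) % k] for j in range(k_train)]
        valid = [folds[(start + k_train) % k]]
        test = [folds[(start + k_train + 1) % k]]
        splits.append([train, valid, test])
    return splits

print(make_splits(4, 2))
# [[[1, 2], [3], [4]], [[3, 4], [1], [2]]]
```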
Then, the controller assigns each patch to an area, and each area to a fold, according to different strategies. This grouping into areas preserves the spatial continuity of the data, which some normalization steps require. The default strategy, called random, corresponds to a round-robin assignment. A naive strategy can also be used, where the image is divided into k equal parts and each part is treated as a fold.
The indices describing how the areas are distributed across the folds are stored, so they don't have to be recomputed at each run.
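A minimal sketch of the two assignment strategies, assuming areas are identified by consecutive integer ids (hypothetical code, not the library's implementation):

```python
import numpy as np

def assign_round_robin(n_areas: int, k: int, seed: int = 0):
    # "random" strategy: shuffle the area ids, then deal them out
    # round-robin so every fold receives a balanced share
    order = np.random.default_rng(seed).permutation(n_areas)
    return {int(area): i % k for i, area in enumerate(order)}

def assign_naive(n_areas: int, k: int):
    # naive strategy: split the image into k contiguous blocks of areas,
    # one block per fold
    block = -(-n_areas // k)  # ceiling division
    return {area: area // block for area in range(n_areas)}
```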
2- Training
For each split, the train, validation, and test sets are loaded according to the fold assignments. The model to train is instantiated according to the configuration, and then k different versions of this model are trained. Results are saved in log files.
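Schematically, the loop looks like the sketch below; the helper names and signatures are assumptions for illustration (the actual orchestration lives in `scripts/main_training_k_fold.py`):

```python
# Hypothetical sketch of the per-split training loop
for split_id, split in enumerate(splits):
    # assumed helper: materialize datasets from the fold assignments
    train_set, valid_set, test_set = fold_controller.get_datasets(split)
    # fresh model instance for each split
    model = BaseModel.from_config(config['model'])
    trainer = Trainer.from_config(config['trainer'])
    trainer.train(model, train_set, valid_set)  # assumed signature
    trainer.test(model, test_set)               # assumed signature
```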
The following sections give details about how to use the library for your own use case.
The LIS2 package lets you add or modify any component in a modular way, thanks to an architecture based on the Configurable and TypedConfigurable base classes and their configuration schemas. All components (models, datasets, optimizers, metrics, etc.) follow this principle, which allows for flexible, extensible, and standardized configuration.
Configurable is a base class that uses schemas to dynamically validate configurations.
It allows:
- Validation: each parameter is validated by type and constraint before instantiation, using the `Schema` class.
- Flexibility: configurations can be loaded from Python dictionaries or YAML files, and they are dynamic because the available parameters depend on the type of object/class requested.
Example:

```python
from configurable import Configurable, Schema

class MyComponent(Configurable):
    config_schema = {
        'learning_rate': Schema(float, default=0.01),
        'batch_size': Schema(int, default=32),
    }

config = {'learning_rate': 0.001, 'batch_size': 64}
component = MyComponent.from_config(config)
print(component.learning_rate)  # 0.001
print(component.batch_size)     # 64
```

TypedConfigurable extends Configurable by adding the ability to dynamically choose a subclass to instantiate based on a type parameter.
Example with Models:

```python
from configurable import TypedConfigurable, Schema

class BaseModel(TypedConfigurable):
    aliases = ['base_model']

class CNNModel(BaseModel):
    aliases = ['cnn']
    config_schema = {
        'filters': Schema(int, default=32),
        'kernel_size': Schema(int, default=3),
    }

config = {'type': 'cnn', 'filters': 64, 'kernel_size': 5}
model = BaseModel.from_config(config)
print(model.filters)      # 64
print(model.kernel_size)  # 5
```

Schema defines the expected structure for each configuration parameter. It plays a central role in validating and applying default values when instantiating objects.
Main Schema attributes:
- type: the expected type (e.g., int, float, str).
- default: a default value used if the parameter is not provided.
- optional: indicates whether the parameter is optional.
- aliases: allows alternative names for the same parameter.
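For instance, a schema combining these attributes might look as follows (a sketch assuming Schema accepts these keyword arguments, as the attribute list above suggests):

```python
from configurable import Schema

config_schema = {
    # required string, also reachable under the alternative name 'name'
    'identifier': Schema(str, aliases=['name']),
    # float with a default value
    'threshold': Schema(float, default=0.5),
    # parameter that may be omitted entirely
    'comment': Schema(str, optional=True),
}
```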
To add a new component:

- Define the class: inherit from the appropriate base class (e.g., BaseModel, Metric, BaseDataset, etc.) or directly from Configurable, and implement the necessary logic.
```python
from configurable import Configurable, Schema

class NewComponent(Configurable):
    config_schema = {
        'param1': Schema(str),
        'param2': Schema(int, default=10),
    }
    def __init__(self):
        print(self.param1)
        print(self.param2)
```

- Add in YAML configuration: reference the new component with its parameters.
```yaml
component:
  param1: "example"
```

- Use in pipeline: dynamically load and integrate the component via from_config.
```python
# Import NewComponent from wherever you defined it (module name is
# project-specific)
from new_component import NewComponent

component = NewComponent.from_config(config['component'])
```

- Define subclasses: create subclasses for each component type.
```python
from configurable import TypedConfigurable, Schema

class NewComponent(TypedConfigurable):
    config_schema = {
        'param1': Schema(str, default="default"),
        'param2': Schema(int, default=10),
    }
    def __init__(self):
        print(self.param1)  # default
        print(self.param2)  # 10

class NewComponentNeg(NewComponent):
    config_schema = {
        'param1': Schema(str, default="Neg"),
        'param2': Schema(int, default=-10),
    }
    def __init__(self):
        print(self.param1)  # Neg
        print(self.param2)  # -10
```

```yaml
component:
  type: NewComponentNeg
  param1: "example"
```

```python
# NewComponent inherits the type-based dispatch from TypedConfigurable,
# so from_config instantiates the subclass named by the 'type' key
component = NewComponent.from_config(config['component'])
# Output:
# example
# -10
```

Additional components can be added using the same process, as long as they inherit from Configurable.
In addition, for some classes, such as models, base classes are already defined to facilitate the addition of new components:
- BaseModel: base class for models (`models/base_model.py`).
- BaseDataset: base class for datasets (`datasets/dataset.py`).
- BaseOptimizer: base class for optimizers (`core/optim.py`).
- BaseScheduler: base class for schedulers (`core/scheduler.py`).
- Metric: base class for metrics (`core/metrics.py`).
- Encoder: base class for encoders (`models/encoders/encoder.py`).
- EarlyStopping: base class for early stopping (`core/early_stopping.py`).
Unit tests are included in the package to ensure the components function properly. To run them, execute `pytest tests` from the package root.
The LIS² package provides a ready-to-use Trainer to orchestrate model training, validation, and testing. The Trainer dynamically integrates with all the components configured in a YAML file.
Example configuration:

```yaml
trainer:
  output_dir: ./results
  run_name: experiment_1
  model:
    type: unet
    in_channels: 1
    out_channels: 1
  optimizer:
    type: adam
    lr: 0.001
  epochs: 50
  batch_size: 16
  metrics:
    - type: dice
```

- Results: metrics and losses are saved in the directory defined by `output_dir`.
- Snapshots: snapshot files contain:
  - the model state (`MODEL_STATE`);
  - the state of optimizers and schedulers;
  - the global configuration (`GLOBAL_CONFIG`).
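Launching a training run from this configuration presumably follows the same from_config pattern as the other components; the following is a hedged sketch (Trainer.from_config is an assumption here, while trainer.train() is shown below):

```python
import yaml
from core.trainer import Trainer

# Load the YAML file containing the 'trainer' section shown above
with open("configs/training.yaml") as f:
    config = yaml.safe_load(f)

# Assumption: Trainer follows the same from_config pattern as the
# other Configurable components in the package
trainer = Trainer.from_config(config["trainer"])
trainer.train()
```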
Resuming from a Snapshot:
```python
from core.trainer import Trainer

trainer = Trainer.from_snapshot("./results/experiment_1/best.pt")
trainer.train()
```