
Decentralized Text-to-Text Training Framework

This framework enables distributed training of text-to-text models (like T5 and BART) across multiple machines using federated learning. It consists of three main components:

  1. Coordinator: Manages the training process and aggregates model updates (see the aggregation sketch after this list)
  2. Worker: Performs local training on data and sends updates to the coordinator
  3. Monitor: Web-based dashboard to track training progress
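
The aggregation step the coordinator performs is standard federated averaging: a parameter-wise weighted mean of the workers' updates. Below is a minimal sketch, assuming updates arrive as dicts of NumPy arrays keyed by parameter name and that each node reports how many examples it trained on; the function name and wire format are illustrative, not this framework's actual API:

import numpy as np

def federated_average(node_updates, sample_counts):
    """Parameter-wise weighted mean of worker updates (FedAvg).

    node_updates:  list of dicts mapping parameter name -> np.ndarray
    sample_counts: number of examples each node trained on
    """
    total = float(sum(sample_counts))
    return {
        name: sum(n * update[name]
                  for n, update in zip(sample_counts, node_updates)) / total
        for name in node_updates[0]
    }

Weighting by sample count keeps nodes with more data from being underrepresented; with equal counts this reduces to a plain mean.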

Features

  • Support for multiple text-to-text models (T5, BART)
  • Real-time training monitoring via web dashboard
  • Automatic model aggregation using federated averaging
  • GPU/CPU support
  • Fault tolerance and automatic node recovery (see the heartbeat sketch after this list)
  • Easy to extend with new models
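
Fault tolerance rests on heartbeats: each worker periodically tells the coordinator it is alive, and nodes that stay silent past NODE_TIMEOUT are dropped from the current round (see Configuration below). A minimal sketch of the worker side, assuming a hypothetical HTTP /heartbeat endpoint; the transport this repo actually uses may differ:

import threading
import requests

def start_heartbeat(coordinator_address, node_id, interval=10):
    """Ping the coordinator every `interval` seconds on a background timer."""
    def beat():
        try:
            requests.post(f"http://{coordinator_address}/heartbeat",
                          json={"node_id": node_id}, timeout=5)
        except requests.RequestException:
            pass  # transient failure; the coordinator tolerates a missed beat
        threading.Timer(interval, beat).start()  # reschedule the next beat
    beat()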

Installation

  1. Clone the repository:
git clone https://github.com/AltayDev/decentralized-machine-learning-framework.git
cd decentralized-machine-learning-framework
  2. Install dependencies:
pip install -r requirements.txt

Usage

1. Start the Coordinator

python coordinator.py --host 0.0.0.0 --port 5000 --model t5-small

2. Start Worker Nodes

On each machine that will participate in training:

python worker.py --coordinator-address <coordinator-ip>:5000 --model t5-small [--gpu]

3. Start the Monitor

python web_monitor.py --coordinator-address <coordinator-ip>:5000 --port 8080

Then open your browser and navigate to http://localhost:8080 to view the training dashboard.

Configuration

The framework can be configured through config.py. Key settings include:

  • BATCH_SIZE: Batch size for training
  • LEARNING_RATE: Learning rate for model updates
  • LOCAL_EPOCHS: Number of local training epochs
  • MIN_NODES_TO_START: Minimum number of nodes required to start training
  • NODE_TIMEOUT: How long a node may go without reporting in before it is treated as inactive
  • NODE_HEARTBEAT_INTERVAL: Interval between node heartbeats (keep it well below NODE_TIMEOUT)
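
For example, a config.py for a small two-node run might look like the following. The setting names come from the framework; the values, and the assumption that the timing settings are in seconds, are only illustrative:

# config.py -- illustrative values, tune for your cluster
BATCH_SIZE = 8                 # per-worker training batch size
LEARNING_RATE = 3e-4           # local optimizer step size
LOCAL_EPOCHS = 1               # epochs each worker trains before sending an update
MIN_NODES_TO_START = 2         # coordinator waits for this many workers
NODE_TIMEOUT = 60              # silence before a node is dropped (assumed seconds)
NODE_HEARTBEAT_INTERVAL = 10   # time between heartbeats; keep well below NODE_TIMEOUT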

Adding New Models

To add a new model:

  1. Add the model configuration to model_registry.py:
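# Register the model class, tokenizer, and generation defaults under a lookup name.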
model_registry.register_model(
    name="your-model-name",
    model_class=YourModelClass,
    tokenizer_class=YourTokenizerClass,
    pretrained_name="pretrained-model-name",
    config={
        "max_length": 512,
        "num_beams": 4,
        "early_stopping": True
    }
)
  2. Update the worker to handle the new model's data format and training process.
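
For step 2, the main work is usually turning raw examples into the tensors the new model expects. Here is a sketch of what such a hook might look like using the Hugging Face tokenizer API; prepare_batch and the "source"/"target" keys are hypothetical, since the actual extension point in worker.py may be named differently:

def prepare_batch(tokenizer, examples, max_length=512):
    """Tokenize source/target pairs into model-ready training tensors."""
    batch = tokenizer(
        [ex["source"] for ex in examples],
        max_length=max_length, padding=True, truncation=True,
        return_tensors="pt",
    )
    # Target ids go in `labels`; seq2seq models compute the training loss from them.
    batch["labels"] = tokenizer(
        [ex["target"] for ex in examples],
        max_length=max_length, padding=True, truncation=True,
        return_tensors="pt",
    ).input_ids
    return batch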

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
