Towards Foundation Models for Mixed Integer Linear Programming

Homepage: The OptiGuide Project
Repository: MILP-Evolve
Dataset: Hugging Face
Paper: ArXiv, openreview

Citation Information

Please cite the following paper when using our code or dataset:

@inproceedings{li2024towards,
  author    = {Li, Sirui and Kulkarni, Janardhan and Wu, Cathy and Menache, Ishai and Li, Beibin},
  title     = {Towards Foundation Models for Mixed Integer Linear Programming},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025}
}

Setup for the OptiGuide Code

Please refer to setup.md for the detailed setup instructions.

MILP-Evolve Dataset Summary

MILP-Evolve is a large-scale dataset of Mixed Integer Linear Programming (MILP) problem classes and instances. It is generated using an LLM-based evolutionary framework that produces a diverse set of MILP classes, each of which can generate an unlimited number of instances. The dataset is designed to facilitate research on foundation models for MILP that generalize across problem classes. It supports multiple learning tasks, including integrality gap prediction, learning to branch, and aligning MILP instances with natural language descriptions. You can find our dataset on Hugging Face.

Supported Tasks

Integrality Gap Prediction: This task involves predicting the integrality gap of MILP instances, which measures the difference between the optimal objective value of the integer program and that of its linear programming (LP) relaxation. Success is typically measured by a low mean squared error or a high correlation between predicted and actual gaps. Regression models such as neural networks can be trained on this dataset.
Learning to Branch: The dataset can be used to train models that learn effective branching strategies within MILP solvers. Performance is measured by metrics such as reduced solve time, smaller branch-and-bound trees, or fewer nodes explored. Reinforcement learning models or imitation learning approaches are commonly used for this task.
Language-MILP Alignment: This new task involves aligning MILP instances with their natural language descriptions. Success is measured by retrieval accuracy or alignment scores. Models like cross-modal transformers or contrastive learning frameworks can be applied.
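
As a rough illustration of the evaluation metrics named above, the following minimal sketch computes mean squared error and Pearson correlation for gap prediction, and top-1 retrieval accuracy for language-MILP alignment. The function names and the use of cosine similarity over embedding vectors are assumptions made for this example; they are not taken from the MILP-Evolve code.

```python
import math


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and actual gaps."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(predicted, actual):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top1_retrieval_accuracy(milp_embeddings, text_embeddings):
    """Fraction of MILP embeddings whose most similar text embedding
    sits at the same index (assumed to be the matching description)."""
    hits = 0
    for i, m in enumerate(milp_embeddings):
        best = max(
            range(len(text_embeddings)),
            key=lambda j: cosine(m, text_embeddings[j]),
        )
        hits += int(best == i)
    return hits / len(milp_embeddings)
```

For example, `top1_retrieval_accuracy([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]])` counts both instances as correctly matched, because each MILP embedding is closest to the description at its own index.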

Dataset Creation

Curation Rationale

The dataset was created to overcome the limitations of existing MILP datasets, which often lack diversity and volume, hindering the generalization of deep learning models across different problem classes. MILP-Evolve aims to provide a diverse and extensive dataset to facilitate the development of foundation models in MILP.

Source Data

Initial Data Collection and Normalization

MILP-Evolve uses an LLM-based evolutionary framework to generate MILP code iteratively. Starting from eight seed classes taken from prior literature, the framework uses OpenAI's GPT-4 to generate new MILP classes by applying transformations such as addition, mutation, and crossover. Each generated class then undergoes parameter tuning and filtering to ensure computational feasibility and diversity.
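
The filtering step described above can be pictured with a small sketch. This is not the framework's actual filter: the `Candidate` record, the solve-time window, the embedding vectors, and the similarity threshold are all hypothetical choices made for illustration. The idea shown is simply to discard classes that are too trivial or too hard, and classes that nearly duplicate one already accepted.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical identifier for a generated MILP class
    solve_time: float  # seconds a solver needed on a sample instance
    embedding: list    # hypothetical feature vector describing the class


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def filter_candidates(candidates, min_time=1.0, max_time=60.0, max_similarity=0.95):
    """Keep candidates inside the solve-time window that are not
    near-duplicates of an already accepted candidate.
    All thresholds are illustrative assumptions."""
    accepted = []
    for cand in candidates:
        # Feasibility check: skip classes that are trivial or too hard.
        if not (min_time <= cand.solve_time <= max_time):
            continue
        # Diversity check: skip classes too similar to one already kept.
        if any(cosine(cand.embedding, kept.embedding) > max_similarity
               for kept in accepted):
            continue
        accepted.append(cand)
    return accepted
```

The greedy acceptance order means earlier candidates take priority over later near-duplicates; a real pipeline might rank candidates first or use a different similarity measure.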

Who are the source language producers?

The source data is machine-generated by the MILP-Evolve framework using GPT-4. The initial seed classes are standard MILP problems from established literature, reformatted into a modular code structure.

Annotations

Annotation process

Annotations such as integrality gaps and branching decisions are generated automatically. For integrality gaps, instances are solved using MILP solvers with specified time limits, and gaps are calculated based on the optimal solutions. For learning to branch, data is collected by solving instances and recording branching decisions, sometimes employing expert strategies like Strong Branching.
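
To make the gap annotation concrete, here is a minimal sketch on a toy 0-1 knapsack problem. It assumes one common relative definition of the gap, |z_LP - z_IP| / |z_IP|, which may differ from the exact formula used for the dataset. The dataset itself relies on MILP solvers with time limits; the brute-force and greedy routines below are stand-ins that only work for tiny knapsack instances.

```python
from itertools import product


def knapsack_ip(values, weights, capacity):
    """Optimal 0-1 knapsack value by brute-force enumeration (tiny instances only)."""
    best = 0
    for choice in product((0, 1), repeat=len(values)):
        weight = sum(w * x for w, x in zip(weights, choice))
        if weight <= capacity:
            best = max(best, sum(v * x for v, x in zip(values, choice)))
    return best


def knapsack_lp(values, weights, capacity):
    """Optimal value of the LP relaxation (0 <= x_i <= 1), obtained by
    filling the knapsack greedily in order of value-to-weight ratio."""
    total, remaining = 0.0, float(capacity)
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        take = min(1.0, remaining / weights[i])  # fraction of item i taken
        total += take * values[i]
        remaining -= take * weights[i]
    return total


def integrality_gap(ip_value, lp_value):
    """Relative gap between the LP relaxation and the integer optimum
    (an assumed definition, guarded against division by zero)."""
    return abs(lp_value - ip_value) / max(abs(ip_value), 1e-9)


values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
z_ip = knapsack_ip(values, weights, capacity)
z_lp = knapsack_lp(values, weights, capacity)
gap = integrality_gap(z_ip, z_lp)
```

In this instance the integer optimum takes the second and third items, while the relaxation takes the first two items plus two thirds of the third, so the relaxation bound is strictly larger and the gap is positive.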

Who are the annotators?

Annotations are produced by computational processes and solvers without human intervention.

Personal and Sensitive Information

The dataset does not contain personal or sensitive information. All data is synthetic and generated for research purposes.

Considerations for Using the Data

Social Impact of Dataset

MILP-Evolve has the potential to significantly advance optimization and machine learning research. By enabling models that generalize across a wide range of MILP problems, it can lead to more efficient solutions in industries like logistics, supply chain, healthcare, and environmental planning. This can result in cost savings, improved resource utilization, and better decision-making processes.

This dataset is being released to facilitate further machine learning research and not for any real-world application. Users are responsible for developing, testing, and validating any models trained on this dataset before any implementation in the real world.

Discussion of Biases

While efforts were made to ensure diversity, the dataset may still reflect biases inherent in the LLM's training data or the initial seed classes. Certain problem types or formulations might be overrepresented, and users should be cautious when generalizing results.

Other Known Limitations

Generative Errors: As the MILP classes are generated by an LLM, there might be syntactic or logical errors that passed the filtering process.
Computational Feasibility: Some instances might be trivial or extremely hard to solve despite filtering for problem size and solve time.
Representation Gaps: Despite the dataset's size, it may not cover all real-world MILP applications.

Additional Information

Dataset Curators

The dataset was curated by the research team behind the MILP-Evolve framework. Specific names and affiliations will be provided upon publication.

Licensing Information

This dataset is licensed under the CDLA-2.0.

Contributions

Thanks to the entire MILP-Evolve team for their efforts in creating and releasing this dataset.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
