- Homepage: The OptiGuide Project
- Repository: MILP-Evolve
- Dataset: Hugging Face
- Paper: ArXiv, openreview
Please cite the following paper when using our code or dataset:
@article{li2024towards,
author = {Li, Sirui and Kulkarni, Janardhan and Wu, Cathy and Menache, Ishai and Li, Beibin},
title = {Towards Foundation Models for Mixed Integer Linear Programming},
booktitle = {The Thirteenth International Conference on Learning Representations},
year = {2025}
}
Please refer to setup.md for the detailed setup instructions.
MILP-Evolve Dataset Summary
MILP-Evolve is a large-scale dataset of Mixed Integer Linear Programming (MILP) problem classes and instances. It is generated using an LLM-based evolutionary framework that can produce a diverse set of MILP classes, each with an unlimited number of instances. The dataset is designed to facilitate research on foundation models for MILP that generalize across problem classes. It supports multiple learning tasks, including integrality gap prediction, learning to branch, and aligning MILP instances with natural language descriptions. You can find our dataset on Hugging Face.
- Integrality Gap Prediction: This task involves predicting the integrality gap of a MILP instance, which measures the difference between the optimal objective value of the integer problem and that of its linear programming relaxation. Success is typically measured by a low mean squared error or a high correlation between predicted and actual gaps. Regression models such as neural networks can be trained on this dataset for the task.
- Learning to Branch: The dataset can be used to train models that learn effective branching strategies within MILP solvers. Performance is measured by metrics such as reduced solve time or smaller branch-and-bound trees (fewer nodes explored). Reinforcement learning and imitation learning approaches are commonly used for this task.
- Language-MILP Alignment: This new task involves aligning MILP instances with their natural language descriptions. Success is measured by retrieval accuracy or alignment scores. Models such as cross-modal transformers or contrastive learning frameworks can be applied.
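The evaluation metrics named above can be sketched in a few lines of plain Python. This is only an illustration, not the evaluation code shipped with the repository: the function names are hypothetical, and the retrieval metric assumes a square similarity matrix in which instance `i` is paired with description `i`.

```python
import math


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and actual gaps."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(predicted, actual):
    """Pearson correlation between predicted and actual gaps."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (std_p * std_a)


def top1_retrieval_accuracy(similarity):
    """Fraction of MILP instances whose best-scoring description is the paired one.

    similarity[i][j] is the score between instance i and description j;
    the correct description for instance i is assumed to be j == i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        best_j = max(range(len(row)), key=row.__getitem__)
        hits += int(best_j == i)
    return hits / len(similarity)
```

For example, `top1_retrieval_accuracy([[0.9, 0.1], [0.8, 0.2]])` counts the first instance as a hit and the second as a miss.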
The dataset was created to overcome the limitations of existing MILP datasets, which often lack diversity and volume, hindering the generalization of deep learning models across different problem classes. MILP-Evolve aims to provide a diverse and extensive dataset to facilitate the development of foundation models in MILP.
MILP-Evolve uses an LLM-based evolutionary framework to generate MILP code iteratively. Starting from 8 seed classes from previous literature, the framework employs OpenAI's GPT-4 to generate new MILP classes by applying transformations such as addition, mutation, and crossover. Each generated class is then subjected to parameter tuning and filtering to ensure computational feasibility and diversity.
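The generate-then-filter loop described above can be pictured roughly as follows. This is a simplified sketch and not the framework's actual implementation: `evolve`, `propose`, and `passes_filter` are hypothetical names, the LLM call is replaced by a stub that only builds a label, and the parameter tuning step is folded into the filter.

```python
import random

# Transformation types named in the description above.
OPERATIONS = ("add", "mutate", "crossover")


def evolve(seed_classes, propose, passes_filter, rounds, rng):
    """Grow a population of MILP classes from a list of seed classes.

    propose(op, parents) stands in for the LLM call that writes a new class;
    passes_filter(candidate) stands in for the tuning and filtering step.
    """
    population = list(seed_classes)
    for _ in range(rounds):
        op = rng.choice(OPERATIONS)
        # Crossover combines two parents; the other operations use one.
        n_parents = 2 if op == "crossover" else 1
        parents = rng.sample(population, n_parents)
        candidate = propose(op, parents)
        # Keep only new candidates that survive filtering.
        if passes_filter(candidate) and candidate not in population:
            population.append(candidate)
    return population


# Stubs for illustration only: a real run would call an LLM and a solver.
def stub_propose(op, parents):
    return f"{op}({', '.join(parents)})"


def stub_filter(candidate):
    return True


seeds = [f"seed_{i}" for i in range(8)]
population = evolve(seeds, stub_propose, stub_filter, rounds=5, rng=random.Random(0))
```

Passing the proposal and filter steps in as functions keeps the loop itself independent of any particular LLM or solver.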
The source data is machine-generated by the MILP-Evolve framework using GPT-4. The initial seed classes are standard MILP problems from established literature, reformatted into a modular code structure.
Annotations such as integrality gaps and branching decisions are generated automatically. For integrality gaps, instances are solved using MILP solvers with specified time limits, and gaps are calculated based on the optimal solutions. For learning to branch, data is collected by solving instances and recording branching decisions, sometimes employing expert strategies like Strong Branching.
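As a rough illustration of the gap annotation, the sketch below computes a relative integrality gap from two objective values. The normalization used here (dividing by the magnitude of the integer optimum) is one common convention and is an assumption; the dataset's exact definition may differ, and `integrality_gap` is a hypothetical helper name.

```python
def integrality_gap(ip_objective, lp_objective, eps=1e-9):
    """Relative gap between the integer optimum and the LP relaxation optimum.

    Assumed convention: |z_IP - z_LP| / max(|z_IP|, eps). The eps guard
    avoids division by zero when the integer optimum is 0.
    """
    return abs(ip_objective - lp_objective) / max(abs(ip_objective), eps)


# A minimization instance whose LP relaxation gives a bound of 8.0
# while the best integer solution has objective 10.0.
gap = integrality_gap(ip_objective=10.0, lp_objective=8.0)
```

In practice both objective values would come from a MILP solver run within the time limit mentioned above.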
Annotations are produced by computational processes and solvers without human intervention.
The dataset does not contain personal or sensitive information. All data is synthetic and generated for research purposes.
MILP-Evolve has the potential to significantly advance optimization and machine learning research. By enabling models that generalize across a wide range of MILP problems, it can lead to more efficient solutions in industries like logistics, supply chain, healthcare, and environmental planning. This can result in cost savings, improved resource utilization, and better decision-making processes.
This dataset is being released to facilitate further machine learning research and not for any real-world application. Users are responsible for developing, testing, and validating any models trained on this dataset before any implementation in the real world.
While efforts were made to ensure diversity, the dataset may still reflect biases inherent in the LLM's training data or the initial seed classes. Certain problem types or formulations might be overrepresented, and users should be cautious when generalizing results.
- Generative Errors: As the MILP classes are generated by an LLM, there might be syntactic or logical errors that passed the filtering process.
- Computational Feasibility: Some instances might be trivial or extremely hard to solve despite filtering for problem size and solve time.
- Representation Gaps: Despite the dataset's size, it may not cover all real-world MILP applications.
The dataset was curated by the research team behind the MILP-Evolve framework. Specific names and affiliations will be provided upon publication.
This dataset is licensed under the CDLA-2.0.
Thanks to the entire MILP-Evolve team for their efforts in creating and releasing this dataset.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.