OptimouseQuest

This repository contains the dataset from the paper "Synthetic Dialogue Dataset Generation using LLM Agents" by Abdullin et al.

Abstract

Linear programming (LP) problems are ubiquitous in various real-world scenarios. OptimouseQuest aims to facilitate the development of conversational agents that assist users in formulating linear models for these problems. Using prompt engineering, two agents are developed: one simulates a conversational agent and the other a user. The agents engage in dialogues based on text descriptions of linear problems from NLP4Opt, generating sample conversations that can serve as a baseline for future work.

Dataset Structure

Human Annotated folder - dialogues from the DEV subset of NLP4Opt with human evaluation (28 dialogues).
Dev folder - dialogues from the DEV subset of NLP4Opt.
Train folder - dialogues from the TRAIN subset of NLP4Opt.

data/ 
  generated
    human_annotated/* 
    dev/* 
    train/*
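
As a quick sanity check after cloning, the layout above can be walked with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: the function name `count_files_per_subset` is hypothetical, the repository is assumed to be cloned into a directory named `optimouse-quest`, and because the file format inside each folder is not described here, the sketch only counts files and does not parse them.

```python
from pathlib import Path

# Subset folder names taken from the layout above.
SUBSETS = ("human_annotated", "dev", "train")


def count_files_per_subset(repo_root):
    """Return {subset: number of files} under <repo_root>/data/generated."""
    generated = Path(repo_root) / "data" / "generated"
    counts = {}
    for subset in SUBSETS:
        folder = generated / subset
        if folder.is_dir():
            # rglob("*") also walks nested folders; only regular files count.
            counts[subset] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            # A missing folder is reported as 0 rather than raising.
            counts[subset] = 0
    return counts


if __name__ == "__main__":
    # Assumption: "optimouse-quest" is the directory created by git clone.
    print(count_files_per_subset("optimouse-quest"))
```

If the counts come back as 0 for every subset, the path passed in is most likely not the repository root.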

Evaluation

The dataset includes extrinsic evaluation metrics, assessing the quality of dialogue summaries against original problem descriptions. Both human and automatic evaluations have been conducted, including a GPT-4-based approach to mimic human evaluation metrics.

Human Evaluation

Four evaluators scored how well the summary generated at the end of the dialogue matches the problem statement. For every pair of a problem statement and a generated summary, each evaluator assigned a score for each of the following four metrics.

Information recall (IR) (1-5) -- All the necessary information is in the generated summary.
Information precision (IP) (1-5) -- No irrelevant information is generated.
Information repetition (IRep) (1-5) -- The generated summary does not repeat the same information multiple times.
Readability (Read) (1-5) -- The generated summary is easily readable and fluent.
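
The four metrics above can be represented and aggregated as in the sketch below. It is illustrative only: the `SummaryScores` class, the `average_scores` function, and the example scores are all hypothetical, the on-disk annotation format is not described here, and a plain per-metric mean over evaluators is just one simple aggregation, not necessarily the one used in the paper.

```python
from dataclasses import dataclass
from statistics import mean

# Metric abbreviations as listed above.
METRICS = ("IR", "IP", "IRep", "Read")


@dataclass(frozen=True)
class SummaryScores:
    """One evaluator's scores for one (problem statement, summary) pair."""
    IR: int
    IP: int
    IRep: int
    Read: int

    def __post_init__(self):
        # Every metric is scored on the 1-5 scale described above.
        for name in METRICS:
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")


def average_scores(scores):
    """Average each metric over the evaluators' scores for a single pair."""
    return {name: mean(getattr(s, name) for s in scores) for name in METRICS}


# Made-up scores from four evaluators, for illustration only.
example = [
    SummaryScores(IR=5, IP=4, IRep=5, Read=4),
    SummaryScores(IR=4, IP=4, IRep=5, Read=5),
    SummaryScores(IR=5, IP=5, IRep=5, Read=4),
    SummaryScores(IR=4, IP=3, IRep=5, Read=5),
]
print(average_scores(example))
```

Validating the range at construction time means an out-of-scale score fails early instead of silently skewing the averages.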

Usage

To use this dataset in your project, please follow these steps:

Clone the repository:

git clone https://github.com/eabdullin/optimouse-quest.git

Navigate to the data directory:

cd optimouse-quest/data

Citation

@misc{abdullin2023synthetic,
  title={Synthetic Dialogue Dataset Generation using LLM Agents},
  author={Yelaman Abdullin and Diego Molla-Aliod and Bahadorreza Ofoghi and John Yearwood and Qingyang Li},
  year={2023},
  eprint={2401.17461},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2401.17461}
}

Contributing

If you find any errors or have suggestions, please open an issue or submit a pull request.

License

This dataset is released under the MIT License.
