Answering natural language (NL) questions about tables, known as Tabular Question Answering (TQA), is crucial because it allows users to quickly and efficiently extract meaningful insights from structured data, effectively bridging the gap between human language and machine-readable formats. Many of these tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses. However, preparing such tables for NL questions introduces new requirements that extend beyond traditional data preparation. This question-aware data preparation involves specific tasks such as column augmentation and filtering tailored to particular questions, as well as question-aware value normalization or conversion, highlighting the need for a more nuanced approach. Because each of these tasks is unique, a single model (or agent) may not perform effectively across all scenarios. In this paper, we propose AutoPrep, a large language model (LLM)-based multi-agent framework that leverages the strengths of multiple agents, each specialized in a certain type of data prep, ensuring more accurate and contextually relevant responses. Given an NL question over a table, AutoPrep performs data prep through three key components:

- Planner: determines a logical plan, outlining a sequence of high-level operations.
- Programmer: translates the logical plan into a physical plan by generating the corresponding low-level code.
- Executor: executes the generated code to process the table.

To support this multi-agent framework, we design a novel Chain-of-Clauses reasoning mechanism for high-level operation suggestion, and a tool-augmented method for low-level code generation. Extensive experiments on real-world TQA datasets demonstrate that AutoPrep can significantly improve state-of-the-art TQA solutions through question-aware data preparation.
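For intuition only, below is a minimal sketch of how such a Planner → Programmer → Executor pipeline could be wired together. All names here (`call_llm`, `Planner`, `Programmer`, `Executor`) are illustrative assumptions, not AutoPrep's actual API; please refer to the source code for the real implementation.

```python
# Illustrative sketch of a Planner -> Programmer -> Executor flow (not AutoPrep's real API).
from typing import List

import pandas as pd


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an OpenAI-compatible chat endpoint)."""
    raise NotImplementedError


class Planner:
    """Suggests a sequence of high-level data-prep operations for a given question."""

    def plan(self, question: str, table: pd.DataFrame) -> List[str]:
        prompt = (
            f"Suggest data-prep operations for the question: {question}\n"
            f"Table columns: {list(table.columns)}"
        )
        return call_llm(prompt).splitlines()


class Programmer:
    """Translates each high-level operation into an executable code snippet."""

    def program(self, operations: List[str], table: pd.DataFrame) -> List[str]:
        return [call_llm(f"Write pandas code over `df` for: {op}") for op in operations]


class Executor:
    """Runs the generated code snippets to transform the table step by step."""

    def execute(self, snippets: List[str], table: pd.DataFrame) -> pd.DataFrame:
        for code in snippets:
            scope = {"df": table, "pd": pd}
            exec(code, scope)  # each snippet is expected to update `df`
            table = scope["df"]
        return table
```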
```bash
conda create -n ap python=3.9.15
conda activate ap
pip install -r requirements.txt
```
- Download the datasets with token `tllm` and unzip them to any path. The constructed TransTQ dataset can be accessed via TransTQ with token `tllm`.
- Modify `DATA_PATH` in `global_values.py` to the root path of your downloaded datasets (see the illustrative example below).
- Create a key file named `keys.txt` in the root and put your API keys in it, one key per line (see the example below).
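For concreteness, the two configuration steps might look as follows. The dataset path and key values are placeholders, and the assumption that `DATA_PATH` is a plain string assignment in `global_values.py` is ours.

```python
# global_values.py (illustrative): point DATA_PATH at the folder where you unzipped the datasets.
DATA_PATH = "/home/user/autoprep_datasets"
```

```text
sk-xxxxxxxxxxxxxxxxxxxxxxxx
sk-yyyyyyyyyyyyyyyyyyyyyyyy
```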
Main Takeaway: AutoPrep improves the accuracy of all TQA baselines, especially code-generation baselines.
Main Takeaway: AutoPrep integrated with NL2SQL achieves SOTA performance on two TQA datasets across four different LLM backbones.
❗ Please refer to Developer Guides when committing.
If you find our work helpful, please cite as:
```bibtex
@article{fan2024autoprep,
  title={AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework},
  author={Fan, Meihao and Fan, Ju and Tang, Nan and Cao, Lei and Li, Guoliang and Du, Xiaoyong},
  journal={arXiv preprint arXiv:2412.10422},
  year={2024}
}
```


