Controllable Table Data Synthesis with Natural Language and Knowledge Base

This project provides a pipeline for generating structured table datasets using natural language descriptions and an optional knowledge base for controlled constraints. The workflow consists of three sequential steps:

1. generate_json.py - Accepts a natural language description of the desired table dataset and generates a JSON file representing column relationships.
2. generate_python.py - Takes the JSON file as input, optionally incorporating a knowledge base to enforce controlled constraints on the synthesized data.
3. generate_dataset.py - Uses the processed JSON and constraints to generate the final dataset.

Workflow

1. Define the dataset: Run generate_json.py with a textual description of the table data you want to synthesize.
2. Process constraints: Use generate_python.py to process the JSON output, optionally integrating domain-specific knowledge for enhanced control.
3. Generate the dataset: Execute generate_dataset.py to synthesize the structured data.
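
The three steps can also be chained from a single Python driver. The sketch below is only an illustration, not part of the project: the helper names `build_pipeline_commands` and `run_pipeline` are hypothetical, the flags and file names (`generated_structure.json`, `processed_structure.json`) are copied from the usage lines in this README, and it assumes the scripts sit in the current working directory and actually write their output to those paths.

```python
import subprocess
import sys
from typing import List, Optional


def build_pipeline_commands(
    description: str, knowledge_base: Optional[str] = None
) -> List[List[str]]:
    """Return the argument lists for the three steps, in execution order.

    File names are taken from the README usage lines (assumed, not verified).
    """
    step2 = [sys.executable, "generate_python.py",
             "--json_file", "generated_structure.json"]
    if knowledge_base is not None:
        # The knowledge base is optional, so the flag is added only on request.
        step2 += ["--knowledge_base", knowledge_base]
    return [
        [sys.executable, "generate_json.py", "--description", description],
        step2,
        [sys.executable, "generate_dataset.py",
         "--json_file", "processed_structure.json"],
    ]


def run_pipeline(description: str, knowledge_base: Optional[str] = None) -> None:
    """Run each step in order and stop at the first failing script."""
    for cmd in build_pipeline_commands(description, knowledge_base):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("Your dataset description here", "knowledge_base.json")` would run the steps in order; `check=True` makes a failing step raise `subprocess.CalledProcessError` so later steps never run on missing input.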

File Descriptions

`generate_json.py`

Functionality:

Accepts a text description of the target dataset.
Produces a JSON file defining column relationships.

Usage:

python generate_json.py --description "Your dataset description here"
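
Before passing the generated file to the next step, it can help to check what it contains. The sketch below makes no assumption about the schema that generate_json.py produces; it only reports the top-level shape of whatever JSON file it is given. The function name `summarize_structure` is hypothetical.

```python
import json
from pathlib import Path


def summarize_structure(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        return {"type": "object", "keys": sorted(data.keys())}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    # Scalars (string, number, boolean, null) are reported by Python type name.
    return {"type": type(data).__name__}
```

For example, `summarize_structure("generated_structure.json")` would list the top-level keys if the file holds a JSON object, assuming that is the file name the script wrote.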

`generate_python.py`

Functionality:

Processes the JSON file generated in the first step.
Optionally integrates a knowledge base to impose constraints on the synthesized data.

Usage:

python generate_python.py --json_file generated_structure.json [--knowledge_base knowledge_base.json]

`generate_dataset.py`

Functionality:

Generates the final table dataset based on the structured JSON and optional constraints.

Usage:

python generate_dataset.py --json_file processed_structure.json

Dependencies

The required Python libraries are listed in requirements.txt. Python 3.8 or later is recommended. Install the dependencies with:

pip install -r requirements.txt

Ensure all dependencies are installed before running the scripts.
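
The recommended interpreter version can be checked up front. This is a minimal sketch, assuming the 3.8 recommendation above is treated as a lower bound; `MIN_PYTHON` and `python_is_supported` are hypothetical names, not part of the project.

```python
import sys

# Minimum version recommended by this README (treated here as a lower bound).
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level is irrelevant here.
    return tuple(version_info)[:2] >= MIN_PYTHON
```

A script could call `python_is_supported()` at startup and exit with a clear message when it returns False.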

File Structure

Project Root
│   README.md
│   requirements.txt
│
└───scripts
    │   generate_json.py
    │   generate_python.py
    │   generate_dataset.py

Notes

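Run the scripts in order (generate_json.py, then generate_python.py, then generate_dataset.py), since each step consumes the output of the previous one.
The knowledge base is optional; supplying one lets generate_python.py impose domain-specific constraints on the synthesized data.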
Refer to inline comments within each script for additional details on parameters and configurations.

Contributors

For any issues or suggestions, feel free to reach out to the project maintainers.

Thank you for using this project! We hope it helps with your work.
