Best Prompts for Text-to-Image Models and How to Find Them

This repository contains code and data for Best Prompts for Text-to-Image Models and How to Find Them paper.

Code

To run the prompt optimization, you need to create a class that gets image generation queries and returns images generated with Stable Diffusion. We use the following interface:

class DiffusionApi:
    def generate(self, prompt, steps=50, scale=7.5, seed=0, height=512, width=512):
        pass

Here you pass the prompt string, the number of steps, guidance scale, seed number, and shape of the image, and the generate function returns a Pillow Image object.

To run a genetic optimization, you need to use optimize.py script that has the following arguments:

--toloka-token. Toloka API token
--aws-access-key-id. AWS secret key ID. Here we use it to store generated images in an AWS bucket to get direct links that will be embedded into annotation tasks
--aws-secret-access-key. AWS secret access key
--endpoint-url. Base URL at which your images will be stored. In other words, a URL to your AWS bucket
--bucket. Name of the AWS bucket
--base-pool-id. An ID of configured Toloka pool that will be cloned on every optimization iteration

Data

The annotation.csv contains a CSV file with the results of the pairwise comparisons. It has five columns: prompt_id is an ID of an image description, left_uid is a UID of four left images (more details below), right_uid is a UID of four right images, worker is the worker's ID, and label is a worker's preference (left or right)
The uid_to_keywords.csv contains a mapping of image UID to keywords that it was obtained with
prompts.csv contains image descriptions. Here index of a prompt is prompt_id in annotation.csv
keywords.csv contains keywords and their occurrences in the Stable Diffusion Discord

To obtain generated images, you need to use https://storage.yandexcloud.net/diffusion/ as a base URL and append {UID}_{0-3}.png to it. For example,

https://storage.yandexcloud.net/diffusion/0000298d546d4a6299774ca323fa7f34_0.png

Cite

Use this BibTeX citation in case you want to cite our work.

@misc{https://doi.org/10.48550/arxiv.2209.11711,
  doi = {10.48550/ARXIV.2209.11711},
  url = {https://arxiv.org/abs/2209.11711},
  author = {Pavlichenko, Nikita and Zhdanov, Fedor and Ustalov, Dmitry},
  keywords = {Human-Computer Interaction (cs.HC), Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences, H.5.2; H.3.3},
  title = {Best Prompts for Text-to-Image Models and How to Find Them},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE-APACHE.txt		LICENSE-APACHE.txt
LICENSE-CC-BY.txt		LICENSE-CC-BY.txt
README.md		README.md
annotation.csv		annotation.csv
comp_hp.csv		comp_hp.csv
keywords.csv		keywords.csv
optimize.py		optimize.py
prompts.csv		prompts.csv
uid_to_keywords.csv		uid_to_keywords.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Best Prompts for Text-to-Image Models and How to Find Them

Code

Data

Cite

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Best Prompts for Text-to-Image Models and How to Find Them

Code

Data

Cite

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages