Text2Scene: Generating Compositional Scenes from Textual Descriptions

Albert Nasybullin (a.nasibullin@innopolis.university), Marina Lisnichenko (m.lisnichenko@innopolis.university)

Project idea: to rebuild text into image

Dataset: Abstract Scene Dataset v1.1

Timeline with each member individual task:

1st week: literature review and code investigation
2nd week: object detection
3rd week: code creating
4th week: cartoonizer
5th week: presentation

References:

Original paper, their previous paper

What do we want:

As far as some code was found in the GitHub, firstly we want to test it and find out either it is workable or not. If there is no problem, we can suggest to do the following work: on synthesized image find objects, image of which separately are saved in the dataset (just pattern recognition).

Then there is an idea to move some objects. For example, when sentence is "Mike throws a ball to Jenny" ball is moving from Mike to Jenny.

Final step is to generate from the output image a comic-book style picture. We propose to creater another GAN for this task.

In the end it could be done text-to-image work. For example, when ball appears in the Jenny's hands the output sentence is "Now Jenny holds ball".

But to do all of this work we firstly have to be sure, that primary scripts from Git are workable.

Final Update (26.04.2020)

What do we have till now:

literature review
Text2Scene algorithm (NOT from paper)
Comic-book style picture

Problems

Unfortunately, we couldn't run git from source paper. It happened because of (1) too slow internet to download around 100 Gb of info and (2) there is no instruction about how to run the code.

Text-to-scene algorithm was done from scratch by Marina. There is no Neural Networks, but, was done attempt to save the idea of authors of paper (make a code from text -> from code take image -> place images in the background). Because of this unexpected task, following moving objects wasn't done yet.

However, by Albert was done GAN, making comic-book style of pictures. Till now, it needs some debugging process, but already we have some good results.

Results

The code is there: https://colab.research.google.com/drive/10a2mwlpcp9hURDLwGuWdUHrRVY88vnJO#scrollTo=Z3B6G3xFBbRm

Also you can check our git: https://github.com/UralmashFox/CV_Project there is "CV_project.ipynb" file. All you need is to open it (for example in collab), in the last cell input the desired text:

For example, input

It was sunny day. Happy Mike was sitting in the park and playing with ball. There was a snake and Jenny was worried, she was wearing silly hat

will give you a following picture:

if you are going to change text, from second time run ONLY the last cell, not all the program.

Important issues

try to build picture in following steps for each sentence: sky objects -> ground objects -> person + mood + hat + activity -> animals. In case to not overlap small objects by big
follow syntax rules of English
you can input any amount of sentences with any amount of words
NOT input sentence with only hat or activity

Some examples

There is a bear

The rocket was flying in the sky and Jenny was excited

It was a storm. Despite of this, kids were in the park and playing soccer. Mike was sitting near sand. Jenny was a happy princes, she was kicking soccer

On the table there is a drink and hotdog

Also when you run the code, in the end there is a cartoonized image. For example:

Part2: Cartoonizing filter:

In the beginning, we wanted to create GAN, which will generate comics-like images from the images we get from the text2image part. Regretfully, we failed on this part. Mostly because of computation complexity and my personal (Albert's) lack of expertise in Generative Adversarial Network. Also, we realized a GAN approach is overcomplicated for our goals. So, we decided to create a filter based on classical Computer Vision methods.

We could've to use a pre-trained model for cartoonizing, but it doesn't sound like fun. Isn't it? So, we have spent most of our time trying to make the GAN works but failed. I will be happy to try it one more time during the summer holidays.

How cartoonizing filter works?

Papers and articles we found useful in this work:

Step by step:

to create histogram for given image,
to find centroid,
to update centroids as long as they won't stop changing,
to choose minimum group size, alpha value to get the best value of k,
to calculate values of H, S and V,
to extract contours,
to fill contours with calculated H, S, V;

Given results of cartoonizng filter:

Regretfully, we didn't get enough attention to a fact, that Abstract Scene Dataset is very abstract itself. So, there is no real sense in cartoonizing abstract images. But our filter works pretty good for another type of image. For our image this filter gives us a way more shape contours. You can see several examples here:

Another examples:

Due to computation limits cartoonizer part represented as temporary approach required for demo. It is supposed to be developed in the next time

We can see that after cartoonizing image looks more painted by hand, the result that we wanted is achieved.

Conclusion

Due to doing this project we met some problems in case to modify the goal and some tasks. In the end we have workable code from scratch that realizes most tasks which we planed.

If you're going to modify the text, PLEASE run run all only ONCE! When modifying text - change it and run only LAST cell! Thanks.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
cartoonizer_examples		cartoonizer_examples
CV_project.ipynb		CV_project.ipynb
README.md		README.md
Text2Scene_ Generating Compositional Scenes from Textual Descriptions.pdf		Text2Scene_ Generating Compositional Scenes from Textual Descriptions.pdf
book.csv		book.csv
pngs.rar		pngs.rar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text2Scene: Generating Compositional Scenes from Textual Descriptions

Albert Nasybullin (a.nasibullin@innopolis.university), Marina Lisnichenko (m.lisnichenko@innopolis.university)

Project idea: to rebuild text into image

Dataset: Abstract Scene Dataset v1.1

Timeline with each member individual task:

References:

What do we want:

Final Update (26.04.2020)

Problems

Results

Important issues

Some examples

Part2: Cartoonizing filter:

How cartoonizing filter works?

Given results of cartoonizng filter:

Conclusion

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

UralmashFox/CV_Project

Folders and files

Latest commit

History

Repository files navigation

Text2Scene: Generating Compositional Scenes from Textual Descriptions

Albert Nasybullin (a.nasibullin@innopolis.university), Marina Lisnichenko (m.lisnichenko@innopolis.university)

Project idea: to rebuild text into image

Dataset: Abstract Scene Dataset v1.1

Timeline with each member individual task:

References:

What do we want:

Final Update (26.04.2020)

Problems

Results

Important issues

Some examples

Part2: Cartoonizing filter:

How cartoonizing filter works?

Given results of cartoonizng filter:

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages