The dalle-flow project generates images from text prompts.

Project Address

This article demonstrates what the project can do and briefly describes the algorithm behind it.

1. Effect Demo

Let's start with a simple example.

Suppose we want to generate an image for the text "a teddy bear on a skateboard in Times Square".

After typing it into dalle-flow, we get the following image:


Isn't it amazing!

I'll show you how to use this project with a few lines of Python code.

First, install docarray and jina:

pip install "docarray[common]>=0.13.5" jina

Define a server_url variable to store the address of the dalle-flow server:

server_url = 'grpc://'

server_url is the address of the officially provided service. We can also follow the documentation and deploy the model on our own server (a GPU is required).

Submit the text to the server and get the candidate images.

from docarray import Document

prompt = 'a teddy bear on a skateboard in Times Square'

# Send the prompt to the server; the generated candidates come back as matches
da = Document(text=prompt).post(server_url, parameters={'num_images': 2}).matches

After the text is submitted, the server calls the DALL·E Mega model to generate candidate images and then calls CLIP-as-service to rank the candidates against the prompt.
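The ranking step can be pictured with a small sketch. CLIP maps text and images into a shared embedding space, so candidates can be ordered by how similar their embeddings are to the prompt's embedding. The code below is only an illustration under that assumption: the function names (cosine_similarity, rank_candidates) and the three-dimensional vectors are hypothetical stand-ins, not the CLIP-as-service API and not real CLIP output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(text_embedding, image_embeddings):
    """Return candidate indices sorted from best to worst match."""
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy, hand-written embeddings (hypothetical values, not real CLIP output).
text_embedding = [1.0, 0.0, 1.0]
image_embeddings = [
    [0.0, 1.0, 0.0],  # unrelated to the prompt
    [1.0, 0.1, 0.9],  # close to the prompt
    [0.5, 0.5, 0.5],  # partly related
]

ranking = rank_candidates(text_embedding, image_embeddings)
print(ranking)  # → [1, 2, 0]
```

In the real service the embeddings come from the CLIP model itself; only the idea of sorting candidates by similarity to the prompt carries over.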

We set num_images to 2, and 4 images are eventually returned: 2 from the DALL·E Mega model and 2 from the GLID-3 XL model. Since the server behind server_url is hosted overseas, the request may take a long time, so be patient when running it.

After the program finishes, we display these 4 images:

da.plot_image_sprites(fig_size=(10,10), show_index=True)


We can select one of them and continue to submit it to the server for diffusion.

Each image has a number in its upper-left corner. Here I select the image numbered 2:

fav_id = 2
fav = da[fav_id]

# Send the selected image back to the server and route it to the diffusion step
diffused = fav.post(server_url, parameters={'skip_rate': 0.5, 'num_images': 36}, target_executor='diffusion').matches

Diffusion takes the selected image and feeds it into the GLID-3 XL model to enrich its texture and background.

The returned results are as follows.


We can then choose a satisfactory image from among them as the final result.

fav = diffused[6]

2. The Algorithm Behind dalle-flow

The dalle-flow project is simple to use, but the DALL-E algorithm behind it is complex, so it is only briefly described here.

DALL-E treats text tokens and image tokens as a single sequence of data and models that sequence autoregressively with a Transformer.


This process is somewhat similar to machine translation. Machine translation turns English text into Chinese text, while DALL-E turns English text into an image. A token in the text is a word or word piece, and a token in the image is a discrete code that stands for a small patch of the image rather than a single raw pixel.
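The "translation" idea can be sketched as a loop that starts from the text tokens and appends image tokens one at a time, each chosen from a distribution that depends on everything generated so far. The code below is a toy illustration only: the vocabulary, the codebook size, and next_token_distribution (a pseudo-random stand-in for the Transformer) are all hypothetical and are not taken from DALL-E or dalle-flow.

```python
import random

TEXT_VOCAB = {"a": 0, "teddy": 1, "bear": 2}  # hypothetical toy text vocabulary
IMAGE_VOCAB_SIZE = 8   # hypothetical number of distinct image codes
NUM_IMAGE_TOKENS = 4   # a toy "image" made of 4 tokens


def next_token_distribution(sequence):
    """Stand-in for a Transformer: maps the sequence so far to probabilities
    over image tokens. Here it is a fixed pseudo-random function of the prefix."""
    seed = sum((i + 1) * tok for i, tok in enumerate(sequence))
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(IMAGE_VOCAB_SIZE)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_image_tokens(text_tokens, rng):
    """Autoregressive sampling: each new image token is drawn conditioned on
    the text tokens plus all image tokens generated before it."""
    # For simplicity, text and image tokens share one id space here; a real
    # model would keep the two ranges separate.
    sequence = list(text_tokens)
    for _ in range(NUM_IMAGE_TOKENS):
        probs = next_token_distribution(sequence)
        token = rng.choices(range(IMAGE_VOCAB_SIZE), weights=probs, k=1)[0]
        sequence.append(token)
    return sequence[len(text_tokens):]


text_tokens = [TEXT_VOCAB[word] for word in "a teddy bear".split()]
image_tokens = generate_image_tokens(text_tokens, random.Random(0))
print(text_tokens, image_tokens)
```

In a real system the sampled image tokens would then be passed to a separate decoder that turns the codes back into pixels; that step is left out of this sketch.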

Readers who are interested in the dalle-flow project can run the code above and try deploying the model themselves.
