FLUX Image Generation with DigitalOcean


We have talked a lot about the capabilities and potential of Deep Learning Image Generation here on the Paperspace by DigitalOcean Blog. Not only are image generation tools fun and intuitive to use, but they are also among the most widely democratized and distributed AI models available to the public. Really, the only Deep Learning technology with a larger social footprint is the Large Language Model.

For the last two years, Stable Diffusion, the first image synthesis model to be both publicly distributed and genuinely usable, has completely dominated the scene. We have written about competitors like PixArt Alpha/Sigma and done research into others like AuraFlow, but, at the time of each release, nothing has set the tone like the Stable Diffusion models. Stable Diffusion 3 remains one of the best open source models out there, and many are still trying to emulate its success.

Last week, this paradigm changed with the release of FLUX from Black Forest Labs. FLUX represents a palpable step forward in image synthesis technology in terms of prompt understanding, object recognition, vocabulary, writing capability, and much more. In this tutorial, we are going to discuss what little is publicly known about the two open-source FLUX models, FLUX.1 schnell and FLUX.1 dev, ahead of any FLUX-related paper from the research team. Afterwards, we will show how to run FLUX on a DigitalOcean GPU Droplet powered by an NVIDIA H100 GPU.

  • Python: The content of this article is highly technical. We recommend this piece to readers experienced with both Python and basic concepts in Deep Learning. Newer users may want to start with a more introductory Deep Learning resource before diving in.
  • Cloud GPU: Running FLUX.1 will require a sufficiently powerful GPU. We recommend a machine with at least 40 GB of VRAM.

FLUX was created by the Black Forest Labs team, which is composed largely of former Stability AI staffers. Engineers on the team were directly responsible for the invention of both VQGAN and Latent Diffusion, in addition to the Stable Diffusion model suite.

Very little has been made public about the development of the FLUX models, but we do know the following from the release announcement:

  • All FLUX.1 variants are built on a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled to roughly 12 billion parameters.
  • The models are trained with flow matching, a general framing of generative model training that includes diffusion as a special case.
  • The team incorporates rotary positional embeddings and parallel attention layers to improve model performance and hardware efficiency.

That is most of what we know about the improvements over typical Latent Diffusion Modeling techniques added for FLUX.1. Fortunately, the team plans to release an official tech report in the near future. In the meantime, they do provide a bit more qualitative and comparative information in the rest of their release statement.

Let’s dig a bit deeper and discuss what information was made available in their official blog post:

Comparison of leading Image Synthesis models based on ELO (Source)

The release of FLUX is meant to “define a new state-of-the-art in image detail, prompt adherence, style diversity and scene complexity for text-to-image synthesis” (Source). To better achieve this, they have released three versions of FLUX: Pro, Dev, and Schnell.

The first is only available via API, while the latter two are open-sourced to varying degrees. As we can see from the plot above, each of the FLUX models performs comparably to the top closed and open source models in terms of output quality (ELO score). From this, we can infer that each of the FLUX models offers peak-quality image generation, both in its understanding of the text input and in the scene complexity it can render.

Let’s look at the differences between these versions more closely:

  • FLUX.1 pro: This is their best-performing version of the model. It offers state-of-the-art image synthesis that outmatches even Stable Diffusion 3 Ultra and Ideogram in terms of prompt following, detail, quality, and output diversity. (Source)
  • FLUX.1 dev: FLUX.1 dev is an “open-weight, guidance-distilled model for non-commercial applications” (Source). It was distilled directly from the FLUX.1 pro model and offers nearly the same level of image generation performance in a significantly more efficient package. This makes FLUX.1 dev the most powerful open source model available for image synthesis. FLUX.1 dev weights are available on HuggingFace, but remember that the license is restricted to non-commercial use only.
  • FLUX.1 schnell: Their fastest model, schnell is designed for local development and personal use. It is capable of generating high-quality images in as few as 4 steps, making it one of the fastest image generation models ever released. Like dev, schnell is available on HuggingFace, and inference code can be found on GitHub.

ELO comparison of leading image generation models across the five measured traits (Source)

The researchers have identified five traits on which to measure image generation models more specifically: Visual Quality, Prompt Following, Size/Aspect Variability, Typography, and Output Diversity. The above plot shows how each major image generation model compares, according to the Black Forest Labs team, in terms of ELO measured for each trait. They assert that the pro and dev versions of the model outperform Ideogram, Stable Diffusion 3 Ultra, and Midjourney v6 in every category. Additionally, they show in the blog that the model is capable of a diverse range of resolutions and aspect ratios.

Altogether, the release blog paints a picture of an incredibly powerful image generation model. Now that we have seen their claims, let’s run the Gradio demo they provide on an NVIDIA H100 and see how the model holds up.

To run the FLUX demos for schnell and dev, we first need to create a GPU Droplet on DigitalOcean, or on whichever cloud provider you prefer. We recommend an H100 or A100-80G GPU for this task, but an A6000 should also handle the models without issue. See the DigitalOcean Documentation for details on getting started with GPU Droplets and setting up SSH.
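
If you prefer working from the command line, the Droplet can also be created with doctl. The region, size slug, and image below are placeholders we are assuming for a single-H100 GPU Droplet; confirm the exact values for your account with doctl compute size list and the GPU Droplet documentation before running it.

# authenticate first with: doctl auth init
# the size and image slugs below are assumptions; verify them with
#   doctl compute size list
doctl compute droplet create flux-demo \
  --region nyc2 \
  --size gpu-h100x1-80gb \
  --image gpu-h100x1-base \
  --ssh-keys <your-ssh-key-id>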

Setup

To get started, we will clone the FLUX GitHub repository onto our machine and move into the new directory.

# clone the repository into the home directory (any working directory will do)
cd ~
git clone https://github.com/black-forest-labs/flux
cd flux

Once the repository is cloned and we’re inside, we can begin setting up the demo itself. First, we will create a new virtual environment, and install all the requirements for FLUX to run.

python3.10 -m venv .venv
source .venv/bin/activate
pip install -e '.[all]'

This will take a few moments, but once it is completed, we are almost ready to run our demo. All that is left is to log in to HuggingFace and navigate to the FLUX.1 dev page, where we will need to agree to the licensing terms in order to access the model. Skip this step if you plan to use only schnell.

Next, go to the HuggingFace tokens page and create a new Read token (or refresh an existing one). We are going to take this token and run

huggingface-cli login

in our terminal to give the access token to the HuggingFace cache. This will ensure that we can download our models when we run the demo in a moment.
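
If you would rather not use the interactive prompt, for example when provisioning the Droplet with a script, the same token can be supplied directly; the value shown below is a placeholder for your own token.

# pass the token non-interactively (placeholder value)
huggingface-cli login --token hf_xxxxxxxxxxxxxxxx
# or export it for libraries that read the environment
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

Either way, the goal is the same: downloads from the gated FLUX.1 dev repository will be authenticated when the demo starts.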

Starting the Demo
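
The repository ships with a Gradio demo script. At the time of writing, it can be launched as shown below; these flags come from the repository README, so double-check them against the current version of the repo, and swap in flux-dev if you have agreed to the dev license.

# launch the Gradio demo with the schnell model on the GPU
python demo_gr.py --name flux-schnell --device cuda

Once the model weights finish downloading, Gradio prints a local URL (port 7860 by default) that you can open in a browser, or forward over SSH if you are working on a remote Droplet.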

Running the Demo

Prompting for text

General Prompt Engineering

Aspect Ratios
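
As a programmatic alternative to the Gradio interface, the sketch below uses the FluxPipeline from HuggingFace diffusers to generate an image at a wide aspect ratio. It assumes a recent diffusers release with FLUX support is installed (pip install -U diffusers transformers accelerate), and the prompt, resolution, and seed are only illustrative; treat it as a starting point rather than the official demo code.

import torch
from diffusers import FluxPipeline

# load the open-weight schnell checkpoint; dev works the same way but requires accepting its license
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # use pipe.enable_model_cpu_offload() instead if VRAM is tight

# a wide canvas; swap height and width for a portrait composition
image = pipe(
    prompt="a lighthouse on a rocky coast at dusk, long exposure photograph",
    height=768,
    width=1360,
    num_inference_steps=4,   # schnell is distilled to produce images in as few as 4 steps
    guidance_scale=0.0,      # schnell is typically run without classifier-free guidance
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]

image.save("flux_wide.png")

Keeping the total pixel count roughly constant while varying height and width is a simple way to explore different aspect ratios without changing memory requirements much.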
