Flux.1 is an open source AI image generation model developed by Black Forest Labs. Grok uses this model to implement its image generation function.
Flux.1 is a powerful text-to-image model with 12 billion parameters, making it one of the largest open-source text-to-image models currently available. It comes in three variants:
Flux.1 [pro]: Closed-source version with the best performance; suited to commercial use and accessed through an API.
Flux.1 [dev]: Open-weight version licensed for non-commercial use; suited to developers and individual users.
Flux.1 [schnell]: Open-source version that permits commercial use; the fastest variant, suited to local development and personal use.
Grok's image generation is mainly based on the Flux.1 model (the exact version depends on Grok's implementation, but it is usually the open [dev] or [schnell] version). According to user feedback on X, Grok's image generation is excellent, especially in detail rendering and prompt adherence.
As a chatbot, Grok integrates Flux.1's image generation, so you can generate images directly in conversation. The steps are as follows:
Open Grok's chat interface (for example, on X).
Make sure you have permission to use the image generation feature. According to feedback from users on X, free users can send 10 messages every 2 hours, including image generation requests.
Describe the image you want to Grok in natural language. For example:
“Generate a post-apocalyptic wasteland painting with robots, humans, desolation, ruin, and technology, from a bird's-eye view.”
Prompts in Chinese also work, but English is recommended because Flux.1 interprets English prompts more accurately.
Prompt tips:
Be as specific as possible: describe the scene, style, perspective, colors, and other details.
If you need a specific art style, you can add "in the style of [style]", such as "in the style of a cyberpunk painting".
If the result is not what you expected, adjust the prompt or add more details.
Grok calls the Flux.1 model to generate the image, which usually takes from a few seconds to tens of seconds (the exact time depends on server load and prompt complexity).
Once generation completes, Grok returns the image directly, and you can view and download it.
If you are not satisfied with the result, you can refine it in the conversation. For example:
“Make the scene darker and add more robots.”
“Change the perspective to a ground-level view.”
Grok will regenerate the image based on your feedback.
Limitations of use
Free users have a message limit (10 messages per 2 hours). If you need more generations, consider upgrading to a paid account.
The Flux.1 [dev] version is not licensed for commercial use. If you plan to use generated images commercially, confirm which Flux.1 version Grok uses (the [schnell] version permits commercial use).
If you want to use Flux.1 in more depth, or if Grok's generation limits get in your way, you can deploy the Flux.1 model locally and carry the prompts you used in Grok over to your local workflow. Here are the steps for local deployment:
Hardware requirements:
GPU: 16GB or more of video memory (VRAM) is recommended (e.g. NVIDIA RTX 3090); 12GB is the minimum, and only with a quantized version.
Memory: At least 32GB of system RAM.
Storage: The Flux.1 model file is large (about 23GB; the quantized version is about 11GB), so reserve enough disk space.
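The VRAM guidance above can be turned into a small check. This is only a sketch: the function name and the three return labels are made up for illustration, and the thresholds are simply the 16GB and 12GB figures quoted in this guide, not an official requirement.

```python
def pick_flux_weights(vram_gb: float) -> str:
    """Suggest which Flux.1 weights to try for a given amount of VRAM.

    Thresholds follow this guide: 16 GB recommended for the full model,
    12 GB minimum when using a quantized (FP8) version.
    """
    if vram_gb >= 16:
        return "full"        # full-size model file, about 23 GB on disk
    if vram_gb >= 12:
        return "fp8"         # quantized model file, about 11 GB on disk
    return "use-online"      # below the stated minimum; try a hosted platform


for vram in (24, 12, 8):
    print(f"{vram} GB -> {pick_flux_weights(vram)}")
```

Treat the result as a starting point only; actual memory use also depends on resolution and on which text encoder files you load.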
Software requirements:
Operating system: Windows, Linux, or macOS.
Python 3.10+.
Git (for cloning repositories).
ComfyUI (an image generation interface that supports Flux.1).
1. Install ComfyUI: see the ComfyUI installation guide.
2. Download the Flux.1 model:
Access the Flux.1 model repository on Hugging Face:
Flux.1 [dev]: https://huggingface.co/black-forest-labs/FLUX.1-dev
Flux.1 [schnell]: https://huggingface.co/black-forest-labs/FLUX.1-schnell
Download the model file (such as flux1-dev.safetensors or flux1-schnell.safetensors).
If you do not have enough video memory, download the quantized (FP8) version instead, such as flux1-dev-fp8.safetensors.
Put the downloaded model file into the ComfyUI/models/unet directory.
3. Download the CLIP and VAE models:
Flux.1 needs additional CLIP (text encoder) and VAE models to process the prompt text and produce the final image.
Download from Hugging Face:
CLIP: https://huggingface.co/comfyanonymous/flux_text_encoders
The files include clip_l.safetensors and t5xxl_fp16.safetensors (or the FP8 version).
Put the text encoder files into the ComfyUI/models/clip directory and the VAE file into the ComfyUI/models/vae directory.
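Before starting ComfyUI, it can help to confirm that the files ended up where the steps above say they should. The following is a minimal sketch: the function name is hypothetical, the file names are the examples used in this guide (change them if you downloaded the [schnell] or FP8 files), and because this guide does not name the VAE file, the check accepts any .safetensors file in models/vae.

```python
from pathlib import Path

# Example file names from this guide; edit to match what you downloaded.
EXPECTED = {
    "models/unet": ["flux1-dev.safetensors"],
    "models/clip": ["clip_l.safetensors", "t5xxl_fp16.safetensors"],
}


def missing_model_files(comfy_root: str) -> list[str]:
    """Return the relative paths of expected files that are not present."""
    root = Path(comfy_root)
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    # The VAE file name is not given in this guide, so accept any match.
    vae_dir = root / "models" / "vae"
    if not (vae_dir.is_dir() and any(vae_dir.glob("*.safetensors"))):
        missing.append("models/vae/*.safetensors")
    return missing
```

Calling missing_model_files("ComfyUI") from the folder that contains your ComfyUI checkout returns an empty list when everything is in place.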
1. Start ComfyUI.
2. The browser opens the ComfyUI interface automatically (usually at http://localhost:8188).
3. Load a Flux.1 workflow in ComfyUI:
You can download ready-made workflows (JSON files) from the Internet or build one manually.
4. A basic workflow includes:
A Load Diffusion Model node, which loads the Flux.1 model.
A CLIP Text Encode node, where you enter the prompt.
The generation parameters (resolution, sampling steps, etc.).
A Save Image node, which outputs the image.
In the ComfyUI interface, enter the prompt you used with Grok (or your own prompt) into the workflow.
Click the "Queue Prompt" button and wait for the generation to complete.
Generation speed depends on your hardware. For example, one image takes about 20 seconds on an RTX 3090, while an RTX 3060 (12GB of video memory) may take several minutes.
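If you want to reuse many prompts from Grok, clicking "Queue Prompt" for each one gets tedious. The sketch below queues a workflow from a script instead. It rests on assumptions not covered in this guide: that the workflow was exported from ComfyUI in its API format (a JSON object mapping node IDs to nodes with "class_type" and "inputs"), and that the local server accepts a POST to /prompt, as in the API example script bundled with ComfyUI. The function names are made up for illustration.

```python
import json
import urllib.request


def set_prompt_text(workflow: dict, text: str) -> dict:
    """Return a copy of an API-format workflow with `text` in its prompt nodes.

    Note: this overwrites every CLIPTextEncode node. If your workflow also
    has a negative-prompt node of that type, filter by node ID instead.
    """
    updated = json.loads(json.dumps(workflow))  # simple deep copy
    for node in updated.values():
        if node.get("class_type") == "CLIPTextEncode":
            node["inputs"]["text"] = text
    return updated


def build_queue_request(workflow: dict, server: str = "http://localhost:8188"):
    """Build (but do not send) the POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def queue_workflow(workflow: dict, server: str = "http://localhost:8188") -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_queue_request(workflow, server)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To use it, load your exported workflow with json.load, pass it through set_prompt_text with the prompt you used in Grok, and hand the result to queue_workflow while ComfyUI is running. The image is saved by the workflow's Save Image node, just as when you queue it from the interface.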
If a locally generated image needs further adjustment, you can upload it to Grok and ask for more edits. For example:
“I generated this image [upload image], can you make the sky darker and add more ruins?”
If your hardware cannot support local deployment, you can try Flux.1 on an online platform and reuse the prompts you wrote for Grok. The following platforms are recommended:
Replicate:
Visit https://replicate.com/black-forest-labs/flux-pro.
New users get a free trial quota. Flux.1 [pro], [dev], and [schnell] are all supported.
Enter a prompt and click "Run" to generate an image.
fal.ai:
Registering gives you $1 in free credits, which is enough to generate multiple images.
Price: Flux.1 [pro] costs about $0.055 per image, and [schnell] about $0.003 per image.
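To see how far the free credits go, the prices above can be plugged into a short calculation. This is a rough sketch using only the approximate per-image prices quoted here; the function names are hypothetical, and actual billing may differ (for example, by resolution).

```python
from decimal import Decimal

# Approximate per-image prices quoted above, in USD.
PRICE_PER_IMAGE = {"pro": Decimal("0.055"), "schnell": Decimal("0.003")}


def images_for_budget(budget_usd: str, variant: str) -> int:
    """How many whole images a budget covers at the quoted price."""
    return int(Decimal(budget_usd) // PRICE_PER_IMAGE[variant])


def cost_of(n_images: int, variant: str) -> Decimal:
    """Total cost in USD for n_images at the quoted price."""
    return PRICE_PER_IMAGE[variant] * n_images


print(images_for_budget("1", "pro"))      # 18
print(images_for_budget("1", "schnell"))  # 333
```

At these prices, the $1 credit covers about 18 [pro] images or about 333 [schnell] images. Decimal is used instead of float so that the division is exact.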
Hugging Face:
Visit https://huggingface.co/black-forest-labs/FLUX.1-dev.
Enter a prompt directly on the web page to generate an image (login may be required).
How you write the prompt matters a great deal if you want Flux.1 to produce the image you have in mind. Here are some tips:
Specify the style: name an artistic style, such as "cyberpunk style" or "oil painting style".
Describe the details: include color, lighting, viewing angle, and so on, such as "a bird's-eye view, muted colors, dramatic lighting".
Avoid vagueness: do not use overly abstract descriptions like "something cool"; be concrete, such as "a futuristic city with neon lights and flying cars".
Negative prompts: on some platforms (such as ComfyUI), you can set a negative prompt to exclude unwanted elements, such as "no blurry details, no extra limbs".
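The tips above amount to a simple recipe: a concrete subject, a few details, and an optional style. As an illustration only, the helper below assembles a prompt in that order. The function name and argument layout are made up, and the "in the style of" wording follows the earlier suggestion in this guide.

```python
def build_prompt(subject: str, style: str = "", details: tuple[str, ...] = ()) -> str:
    """Join a concrete subject, optional details, and an optional style."""
    parts = [subject.strip()]
    # Keep only non-empty details (color, lighting, viewing angle, ...).
    parts.extend(d.strip() for d in details if d.strip())
    if style.strip():
        parts.append(f"in the style of {style.strip()}")
    return ", ".join(parts)


print(build_prompt(
    "a futuristic city with neon lights and flying cars",
    style="a cyberpunk painting",
    details=("a bird's-eye view", "muted colors", "dramatic lighting"),
))
```

The resulting string can be pasted into Grok, ComfyUI, or any of the online platforms mentioned earlier.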
Using Flux.1 through Grok is the simplest approach: you just enter a prompt. Advanced users can work with Flux.1 more flexibly by deploying it locally with ComfyUI or by using online platforms such as Replicate. Either way, Flux.1 can generate high-quality, detailed images.