Current location: Home> Ai Course> AI Basics

How to reduce memory usage and optimal optimization solutions for ComfyUI

Author: LoRA Time: 10 Mar 2025 1015

ComfyUI is a visual workflow tool based on Stable Diffusion, but it can consume a lot of VRAM (video memory) and RAM (system memory) when running. If your device has low memory (such as a 4GB-8GB GPU), or encounters an OOM (Out of Memory) error while running, you can reduce memory usage by following optimization methods.

1. The main factors affecting memory usage

In ComfyUI, the main factors that affect memory usage include:

  1. The size of the model itself

  2. Image resolution

  3. Sample and sampling steps

  4. Additional models such as ControlNet, LoRA, etc.

  5. ComfyUI's running mode

  6. Computational optimization (such as xFormers, Torch 2.0)

2. Start parameter optimization

When starting ComfyUI, you can enable some memory-saving modes through command line parameters:

 python main.py --lowvram --xformers --enable-torch2 

880376d760c9b15d246281a895a9a42d.png

Parameter description:

  • --lowvram low memory mode, suitable for GPUs with 4GB video memory, slows down but reduces memory usage.

  • --medvram medium video memory mode, suitable for GPUs with 6GB-8GB video memory, reducing occupancy but not affecting too much performance.

  • --xformers Enable xFormers to optimize Transformer computing and improve memory utilization.

  • --disable-metadata Turn off metadata storage and reduce RAM usage.

  • --enable-torch2 enables Torch 2.0 computing optimization to improve memory management efficiency.

3. Reduce the model video memory usage

Different Stable Diffusion models have different sizes and occupy different video memory:

  • SD 1.5 (fp16) → Approximately 3GB VRAM

  • SDXL (fp16) → Approximately 7GB VRAM

  • SDXL (fp32) → Approximately 12GB VRAM

  • SDXL-Turbo → Larger video memory requirements

  • 4-bit quantitative model (GPTQ, bitsandbytes) → can reduce video memory usage by 30%-50%

If you have insufficient video memory, it is recommended to use fp16 or 4-bit quantization model:

  • Stable Diffusion 1.5 (fp16)

  • Stable Diffusion XL (fp16)

  • 4-bit quantization SDXL (such as GPTQ quantization)

4. Reduce image resolution

When generating high-resolution images, the memory usage will increase dramatically. For example:

  • 512x512 → SD 1.5 about 2GB, SDXL about 4GB

  • 768x768 → SD 1.5 about 4GB, SDXL about 6GB

  • 1024x1024 → SD 1.5 about 6GB, SDXL about 10GB

Optimization suggestions:

  • Generate small-sized images ( 512x512 or 768x768 ) and zoom in with super resolution ( Upcaler ).

  • Using FreeU technology ( in Stable Diffusion's KSampler (sampler) or U-Net structures, image details can be enhanced by adjusting FreeU-related parameters ) to reduce computational requirements while maintaining quality.

5. Sampler and step optimization

The more sampling steps, the higher the memory usage and the longer the calculation time.

  • DPM++ SDE Karras → High Quality & Low Video Memory Essence (Recommended)

  • Euler a → Fast speed, suitable for low-video devices

  • DDIM → Low occupancy, suitable for quick picture output

Optimization suggestions:

  • Set the sampling step number to 20-30 to avoid high steps of 50+.

  • Use high-efficiency samplers such as DPM++ SDE Karras.

20250310-162722.jpg

6. Reduce ControlNet and LoRA usage

ControlNet and LoRA will significantly increase memory demand.

  • ControlNet takes up an additional 1GB-3GB

  • Single LoRA takes up an additional 500MB-1GB

  • Multiple LoRAs may exceed video memory

Optimization method:

  • Avoid loading multiple ControlNets at the same time, and 1-2 control nodes are recommended.

  • Use low-memory LoRA (such as 4-bit LoRA).

  • Load LoRA dynamically, enable and close on demand.

7. Let the CPU handle some tasks

Part of the calculation can be handed over to the CPU to reduce GPU pressure:

  • VAE decoding (VAEDecode)

  • CLIP processing

  • Preprocessing (such as canny, depth)

In ComfyUI, you can manually set some tasks to run on the CPU, or use the --cpu startup parameter (but it will slow down).

8. Use swap to swap memory

If you have insufficient video memory, Windows/Linux can enable virtual memory (Swap):

Windows Setting Method:

  1. Open System PropertiesAdvancedPerformanceAdvanced

  2. In virtual memory , manually set more than 16GB

Linux settings:

 sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

9. Limit parallel tasks

If you run multiple tasks at the same time in ComfyUI, the memory will be full.
suggestion:

  • Avoid running multiple graphs at once

  • Close unnecessary nodes

  • Clean up occupied processes in Task Manager/System Monitor

Summary: The best optimization solution

If the video memory is insufficient (4GB-8GB), the following settings are recommended:

 python main.py --lowvram --xformers --enable-torch2

at the same time:

  • Quantitative model using FP16 or 4-bit

  • Reduce resolution (recommended 512x512 or 768x768)

  • Set the sampling step number to 20-30, use DPM++ SDE Karras

  • Turn off unnecessary ControlNet/LoRA

  • Let the CPU handle some tasks (such as VAE decoding)

  • Turn on swap to swap memory to prevent overflow

This can effectively reduce memory usage and improve the operation efficiency of ComfyUI on low-video memory devices.