CFG Scale, short for Classifier-Free Guidance Scale, is a key parameter in Stable Diffusion and other diffusion models. It controls how closely the model follows the prompt you provide. In effect, it balances the influence of the text prompt against the model's own learned prior knowledge of images.
Here's a detailed explanation:
Classifier-Free Guidance: This is the underlying technique. An earlier approach, classifier guidance, relied on a separate classifier to steer the denoising process toward images that match the prompt. Classifier-free guidance removes that extra classifier by training a single model to make predictions both with and without the prompt.
How it works: At each denoising step during image generation, the model makes two predictions:
One prediction is guided by the text prompt.
The other prediction is unguided (no prompt).
The final prediction is then calculated as a weighted combination of these two predictions.
What CFG Scale does: The CFG Scale value sets the weight given to the difference between the guided and unguided predictions. The higher the value, the further the final prediction is pushed in the direction of the prompt.
Low CFG Scale (e.g., 1-3): The model relies more on its own internal image knowledge. The resulting images tend to be more diverse and creative, but they may match the prompt less closely. You'll often see more artistic interpretations and unexpected elements.
Medium CFG Scale (e.g., 5-10): This is generally considered the best range for most uses. The model balances following the prompt against creative freedom. The resulting images usually match the prompt well while still retaining some artistic flair.
High CFG Scale (e.g., 12-20 or higher): The model follows the prompt very strictly. The resulting images will be close to a literal interpretation of the prompt, but they can look over-processed or uncreative and may contain artifacts. Very high values may also reduce image quality.
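The weighted combination described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly used formulation in which the final prediction starts from the unguided prediction and moves along the (guided minus unguided) direction. The function name cfg_combine and the toy numbers are made up for this example; real implementations apply the same arithmetic to large tensors at every denoising step.

```python
def cfg_combine(uncond, cond, scale):
    """Blend two predictions using classifier-free guidance.

    uncond: prediction made without the prompt (list of floats)
    cond:   prediction made with the prompt (same length)
    scale:  the CFG Scale value
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    # Start from the unguided prediction and move along the
    # (guided - unguided) direction, scaled by the CFG Scale.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the two model outputs (not real model data).
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

for scale in (0.0, 1.0, 2.0, 7.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

Under this formulation, a scale of 0 ignores the prompt entirely, a scale of 1 reproduces the guided prediction unchanged, and larger values extrapolate past it, which is why very high settings can push results toward over-processed images. Exact conventions may differ between implementations.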
Analogy:
Imagine that you ask a painter to paint a picture of "red apples on a table."
Low CFG Scale: The painter might paint a vague, reddish round object on a surface, but it might not look much like an apple, and the table might be abstract.
Medium CFG Scale: The painter paints a recognizable red apple on a clearly rendered table.
High CFG Scale: The painter meticulously recreates a realistic red apple on a perfectly rendered table, perhaps even adding unnecessary details based on a stereotyped idea of what a "table" looks like.
Summary:
CFG Scale is a key parameter that controls the balance between prompt adherence and creative freedom in Stable Diffusion. Experimenting with different values is the most reliable way to find the best setting for the result you want. Generally, it's a good idea to start around 7 and adjust up or down based on the results.
Some additional notes and tips:
Prompt quality: Even if the CFG Scale is set high, the results may still be unsatisfactory if your prompt is vague or unclear. A clear, specific prompt is the basis for good results.
Sampling method: Different sampling methods may respond differently to CFG Scale. Some samplers may perform better at high CFG Scale, while others may produce better results at low CFG Scale.
Negative prompts: Adding a negative prompt (text that describes what you don't want to see in the image) can further improve image quality and prompt adherence.
Dynamic CFG: Some advanced Stable Diffusion implementations provide the option of "dynamic CFG", which automatically adjusts the CFG Scale during the generation process to obtain better results.
Experiment and Observe: The best way to understand CFG Scale is to conduct lots of experiments and observe the effect of different values on the resulting images. With practice, you'll get a better grasp of how to use this parameter to get the results you want.
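Two of the notes above, negative prompts and dynamic CFG, can be illustrated with toy numbers. The sketch below rests on two assumptions that are not stated in the text: first, that the negative prompt's prediction takes the place of the unguided prediction in the weighted combination, as many Stable Diffusion implementations do; second, that "dynamic CFG" means a simple linear change of the scale across steps, which is only one hypothetical schedule among many. The function names guided and linear_cfg_schedule are invented for this example.

```python
def guided(base, cond, scale):
    """Move from the base prediction toward (and past) the prompt prediction."""
    return [b + scale * (c - b) for b, c in zip(base, cond)]


def linear_cfg_schedule(start, end, num_steps):
    """Hypothetical dynamic CFG: change the scale linearly over the steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [start]
    return [start + (end - start) * i / (num_steps - 1) for i in range(num_steps)]


# Toy predictions: the second component stands for an unwanted feature.
cond = [1.0, 0.5]        # prediction with the prompt
empty = [0.0, 0.0]       # prediction with no prompt
negative = [0.0, 1.0]    # prediction with the negative prompt

print(guided(empty, cond, 2.0))      # unwanted component is amplified
print(guided(negative, cond, 2.0))   # unwanted component is pushed down

# A scale that starts high and relaxes toward the end of generation.
print(linear_cfg_schedule(10.0, 5.0, 6))
```

In this toy case, using the negative prediction as the base pushes the result away from the unwanted component instead of amplifying it, and the schedule shows one way a per-step scale could be produced. Real dynamic CFG options may use quite different rules.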
Hope the above information is helpful to you!