This guide explains how to set up Z-Image Turbo on your local machine. The model uses a 6B-parameter architecture to generate high-quality images with exceptional text rendering capabilities.
🚀 No GPU? No Problem.
If you don’t have a high-end graphics card or want to skip the installation process, you can use the online version immediately:
Z-Image Online: Free AI Generator with Perfect Text
Generate 4K photorealistic AI art with accurate text in 20+ languages. Fast, free, and no GPU needed. Experience the best multilingual Z-Image tool now.
1. Hardware Requirements
To run the model effectively on local hardware, your system needs to meet the following requirements:
- GPU: A graphics card with 16 GB of VRAM is recommended. Recent consumer cards (like the RTX 3090/4090) or data center cards work best. Lower memory devices may work with offloading but will be significantly slower.
- Python: Version 3.9 or newer.
- CUDA: Ensure you have a working installation of CUDA compatible with your GPU drivers.
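If you are unsure what your machine has, nvidia-smi (installed alongside the NVIDIA driver) prints your GPU model, driver version, and total VRAM:
# Quick hardware sanity check (requires the NVIDIA driver)
nvidia-smi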
2. Create a Virtual Environment
It is best practice to isolate your project dependencies to prevent conflicts with other Python projects.
- Open your terminal application.
- Run the command below to create a new environment named zimage-env:
python -m venv zimage-env
- Activate the environment:
# On Linux or macOS
source zimage-env/bin/activate
# On Windows
zimage-env\Scripts\activate
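To confirm activation worked, check that python now resolves to the interpreter inside the environment:
# Should print a path inside zimage-env (bin/ on Linux/macOS, Scripts\ on Windows)
python -c "import sys; print(sys.executable)"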
3. Install PyTorch and Libraries
You must install a version of PyTorch that supports your GPU. The commands below target CUDA 12.4.
- Note: Adjust the index URL if you require a different CUDA version.
- We install diffusers directly from source to ensure compatibility with the latest Z-Image features.
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate safetensors
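Once the installs finish, a quick check confirms that PyTorch can see your GPU and that you meet the VRAM recommendation from section 1 (the 16 GB threshold below mirrors that guidance):
import torch

# Verify that PyTorch was installed with working CUDA support
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 16:
        print("Less than 16 GB VRAM: consider the CPU offload option in section 6.")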
4. Load the Z-Image Turbo Pipeline
Create a Python script (e.g., generate.py) to load the model. We use the ZImagePipeline class with bfloat16 precision to save memory without sacrificing quality.
import torch
from diffusers import ZImagePipeline
# Load model from Hugging Face
pipe = ZImagePipeline.from_pretrained(
"Tongyi-MAI/Z-Image-Turbo",
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=False,
)
# Move pipeline to GPU
pipe.to("cuda")
5. Generate an Image
You can now generate an image. This model is optimized for speed and works well with just 9 inference steps and a guidance scale of 0.0.
Copy the following code into your script:
prompt = "City street at night with clear bilingual store signs, warm lighting, and detailed reflections on wet pavement."
image = pipe(
prompt=prompt,
height=1024,
width=1024,
num_inference_steps=9,
guidance_scale=0.0,
generator=torch.Generator("cuda").manual_seed(123),
).images[0]
image.save("z_image_turbo_city.png")
print("Image saved successfully!")
6. Optimization Options
Performance Tuning
If you have supported hardware, you can enable Flash Attention 2 or compile the transformer to speed up generation:
# Switch attention backend to Flash Attention 2
pipe.transformer.set_attention_backend("flash")
# Optional: Compile the transformer (requires PyTorch 2.0+)
# pipe.transformer.compile()
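To check whether these tweaks actually help on your hardware, you can time a generation before and after enabling them. A minimal sketch using the standard library, reusing the prompt from step 5 (if you compiled the transformer, discard the first timed run, which includes compilation overhead):
import time

# Synchronize so the timer measures actual GPU work, not just kernel launches
torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt=prompt, num_inference_steps=9, guidance_scale=0.0)
torch.cuda.synchronize()
print(f"Generation took {time.perf_counter() - start:.2f} s")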
Low Memory Mode (CPU Offload)
If your GPU has limited VRAM (less than 16 GB), you can use CPU offloading. This moves parts of the model to system RAM when they are not in use.
- Note: This allows the model to run on smaller GPUs, but generation will take longer. Call it instead of pipe.to("cuda") from step 4.
pipe.enable_model_cpu_offload()
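If model-level offloading still exceeds your VRAM, diffusers also offers sequential offloading, which shuttles individual submodules to the GPU only when needed. It is the slowest option but has the smallest footprint (this assumes the Z-Image pipeline inherits the standard diffusers offload hooks, as pipelines generally do):
# Most aggressive option: offload submodules one at a time (slowest, lowest VRAM)
pipe.enable_sequential_cpu_offload()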
