Dec 26, 2025 | The Tongyi Weekly: Your weekly dose of cutting-edge AI from Tongyi Lab

🎄 Merry Christmas and Happy New Year!
As 2025 comes to a close, we want to extend our deepest gratitude to each of you for your creativity and support this year. Your experiments, feedback, and brilliant creations have been the heartbeat of our open ecosystem.
As a final gift of the year, we’re excited to share the newest models and tools born in this last week of 2025.
Let’s take a look at what’s just landed.

👉 Subscribe to The Tongyi Weekly and never miss a release:
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345

📣 Model Release & Updates

Introducing Qwen-Image-Layered: native image decomposition, fully open-sourced

Why it stands out

Photoshop-grade layering: Physically isolated RGBA layers with true native editability
Prompt-controlled structure: Explicitly specify 3–10 layers — from coarse layouts to fine-grained details

Recursive decomposition: Any layer can be decomposed again into sub-layers, to whatever depth of detail you need
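To make "RGBA layers" concrete, here is a minimal sketch of how a stack of such layers recombines into a flat image using the standard "over" alpha-compositing operator. It is an illustration only and does not call Qwen-Image-Layered: the `over` and `flatten` helpers and the two-pixel sample data are hypothetical, and pixels are assumed to be straight-alpha RGBA tuples with values in [0, 1].

```python
from typing import List, Tuple

# Straight-alpha RGBA pixel, each channel in [0, 1] (assumption for this sketch).
Pixel = Tuple[float, float, float, float]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one pixel over another with the standard 'over' operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: nothing to show.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[List[Pixel]]) -> List[Pixel]:
    """Flatten layers ordered bottom to top; all layers share one pixel count."""
    result = list(layers[0])
    for layer in layers[1:]:
        result = [over(t, b) for t, b in zip(layer, result)]
    return result


# Hypothetical two-pixel image: a blue background and a red subject layer
# that covers only the first pixel.
background = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
subject = [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]
flat = flatten([background, subject])
```

Because each layer carries its own alpha channel, editing or removing `subject` and calling `flatten` again leaves the background pixels untouched, which is the practical benefit of layered output.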

🔗 Get started:

New Open-Source End-to-End Voice Model: Fun-Audio-Chat
We’re open-sourcing Fun-Audio-Chat — an end-to-end voice model that’s more than just a chatbot.
It’s your AI voice partner:

Empathetic: Understands emotion, tone, and intent
Action-oriented: Follows voice commands to complete tasks
End-to-end speech-to-speech (S2S) architecture: lower latency, higher efficiency
Dual-resolution design: ~50% lower GPU cost
Leads multiple benchmarks, including OpenAudioBench and MMAU
Open, efficient, and deeply useful.
🔗 Try it:
GitHub
Hugging Face
ModelScope
Demo

New Qwen3-TTS Lineup: VoiceDesign & VoiceClone
Create, control, and clone voices—faster and more expressive than ever.

VoiceDesign-VD-Flash

Fully controllable speech via free-form text instructions — tone, rhythm, emotion, persona
No preset voices. Design your own unique vocal identity
Outperforms GPT-4o-mini-tts & Gemini-2.5-pro on role-play benchmarks

VoiceClone-VC-Flash

Clone any voice from just 3 seconds of audio
Generate speech in 10 languages (CN / EN / JP / ES + more)
15% lower word error rate (WER) than ElevenLabs & GPT-4o-Audio in multilingual tests
Context-aware cadence for more natural delivery
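For readers unfamiliar with the WER figure above: word error rate is conventionally the word-level edit distance between a reference transcript and the recognized transcript of the generated speech, divided by the number of reference words. The sketch below shows that textbook definition only. The function names are hypothetical, and the text normalization (lowercasing, whitespace splitting) is an assumption, not the evaluation setup behind the reported numbers.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


# One dropped word out of six reference words.
score = wer("the cat sat on the mat", "the cat sat on mat")
```

Lower is better: a system that reproduces every reference word scores 0.0.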

🔗 Try it now

Qwen-Image-Edit-2511: Stronger Consistency & Real-World Image Editing
What’s new in 2511:

Stronger multi-person consistency for group photos and complex scenes
Built-in popular community LoRAs — no extra tuning required
Enhanced industrial & product design generation
Reduced image drift with dramatically improved character & identity consistency
Improved geometric reasoning, including construction lines and structural edits

From identity-preserving portrait edits to high-fidelity multi-person fusion and practical engineering & design workflows, 2511 pushes image editing to the next level.
🔗 Try it now

🧩 Ecosystem Highlights

Z-Image Turbo: #1 Open-Weight Text-to-Image Model in the Artificial Analysis Image Arena
Z-Image Turbo now ranks #1 among all open-weight text-to-image models in the Artificial Analysis Image Arena.
Why it leads:

Only $5/1k images on Alibaba Cloud
Runs on consumer hardware with just 16GB of memory
Apache 2.0 open source license
A 6B powerhouse that proves: high quality doesn’t require high cost.
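As a rough sanity check on the numbers above, the sketch below works out the per-image price implied by $5 per 1,000 images and the memory needed just to hold 6B parameters at two common precisions. The helper names are hypothetical, and the memory figure is an assumption-laden lower bound: it counts the 6B weights only and ignores activations and any other components a full pipeline loads.

```python
# Bytes per parameter for two common storage formats.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def generation_cost(num_images: int, price_per_thousand: float = 5.0) -> float:
    """Cost in USD at a flat price per 1,000 images."""
    return price_per_thousand * num_images / 1000


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


per_image = generation_cost(1)           # 0.005 USD per image
bf16_gb = weight_memory_gb(6e9, "bf16")  # 12.0 GB of weights
fp8_gb = weight_memory_gb(6e9, "fp8")    # 6.0 GB of weights
```

Under these assumptions, BF16 weights take about 12 GB, which is consistent with the 16GB claim, and an FP8 copy of the same weights would take about 6 GB.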

✨ Community Spotlights

Portrait Photography: BEYOND REALITY Z IMAGE 1.0 from Nurburgring
This model, fine-tuned from Z-Image-Turbo, optimizes skin textures and environmental details while maintaining analog film aesthetics. It is available in both BF16 and FP8 versions, the latter being compatible with 8GB VRAM hardware.
👉 Try it here

📬 Want More? Stay Updated.

Every week, we bring you:
● New model releases & upgrades
● AI research breakthroughs
● Open-source tools you can use today
● Community highlights that inspire

👉 Subscribe to The Tongyi Weekly and never miss a release.
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345

Thank you for being part of this journey.

Tongyi Lab is a research institution under Alibaba Group dedicated to artificial intelligence and foundation models, focusing on the research, development, and innovative applications of AI models across diverse domains. Its research spans large language models (LLMs), multimodal understanding and generation, visual AIGC, speech technologies, and more.
