Hello, creators and builders,
This week was a harvest of breakthroughs in voice and video AI.
From Wan2.6, our cinematic multimodal generation model that brings characters to life with consistent appearance, voice, and storytelling, to Fun-ASR and Fun-CosyVoice 3, our speech models now available in open-source versions, the future of expressive AI has never felt closer.
Let’s dive in.
👉 Subscribe to The Tongyi Weekly and never miss a release:
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345
📣 Model Release & Updates
Introducing Wan2.6: The Cinematic Multimodal Generation Model
- Starring: Cast characters from reference videos into new scenes. Supports human and human-like figures, enabling complex multi-person and human-object interactions with appearance and voice consistency.
- Intelligent Multi-shot Narrative: Turn simple prompts into auto-storyboarded, multi-shot videos. Maintain visual consistency and upgrade storytelling from single shots to rich narratives.
- Native A/V Sync: Generate multi-speaker dialogue with natural lip-sync and studio-quality audio. It doesn’t just look real – it sounds real.
- Cinematic Quality: 15s 1080p HD generation with comprehensive upgrades to instruction adherence, motion physics, and aesthetic control.
- Advanced Image Synthesis and Editing: Delivers cinematic photorealism with precise control over lens and lighting. Supports multi-image referencing for commercial-grade consistency and faithful aesthetic transfer.
- Storytelling with Structure: Generate interleaved text and images powered by real-world knowledge and reasoning capabilities, enabling hierarchical and structured visual narratives.
🔗 Try Wan 2.6 yourself (150 Free Credits Every Day!)
🔗 API
Fun-ASR Upgrade: Noise-robust, Multilingual, Customizable ASR
We’re thrilled to unveil the newest evolution of Fun-ASR, our enterprise-grade end-to-end Automatic Speech Recognition model — now more noise-robust, more multilingual, and more customizable than ever. We’re also releasing the lightweight Fun-ASR-Nano (0.8B) model as open source.
Major Upgrades in Fun-ASR
- Achieves 93% accuracy in real-world noisy environments such as conferences, metro stations, and in-car speech.
- Breakthrough in lyric recognition: accurately transcribes vocals even with strong background music or rap-style delivery.
- Supports 31 languages, with enhanced performance for East Asian & Southeast Asian languages including Japanese and Vietnamese.
- Covers 7 major Chinese dialect groups and 26 regional accents with high precision.
- The RAG-based solution boosts enterprise-grade customization by raising the hotword limit from 1,000 to 10,000 without compromising accuracy.
Fun-ASR-Nano (0.8B) Released as Open Source
A lightweight yet highly noise-resistant ASR model optimized for compute-sensitive scenarios, edge devices, and low-latency real-time recognition.
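If you want a feel for how the open model and the hotword customization above could be used, here is a minimal sketch assuming Fun-ASR-Nano loads through the existing FunASR toolkit's AutoModel interface; the model identifier, audio file, and hotwords below are placeholders, not confirmed names.

# Hedged sketch: assumes Fun-ASR-Nano is served via the FunASR toolkit's
# AutoModel interface; model ID, audio file, and hotwords are placeholders.
from funasr import AutoModel

asr = AutoModel(model="Fun-ASR-Nano")   # placeholder model identifier
result = asr.generate(
    input="noisy_meeting.wav",          # e.g. conference or in-car recording
    hotword="Tongyi Wan CosyVoice",     # space-separated custom hotwords
)
print(result[0]["text"])

The hotword argument mirrors the customization feature described above; check the release repo for the exact model name and loading options.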
🔗 Now available on:
Fun-CosyVoice 3: The Next-Generation Text-to-Speech Model
Meet Fun-CosyVoice 3, our next-generation text-to-speech model: now faster, more expressive, and officially open-sourced.
What’s New in Fun-CosyVoice 3:
- 50% lower first-token latency with full bidirectional streaming TTS, enabling true real-time “type-to-speech” experiences.
- Significant improvement in Chinese–English code-switching, with WER (Word Error Rate) reduced by 56.4%.
- Enhanced zero-shot voice cloning: replicate a voice using only 3 seconds of audio, now with improved consistency and emotion control.
- Support for 30+ timbres, 9 languages, 18 Chinese dialect accents, and 9 emotion styles, with cross-lingual voice cloning capability.
- Achieves significant improvements across multiple standard benchmarks, with a 26% relative reduction in character error rate (CER) on challenging scenarios (test-hard), and certain metrics approaching those of human-recorded speech.
Fun-CosyVoice 3 (0.5B) Now Open Source
We’re releasing a lightweight yet powerful 0.5B-parameter version with:
- Zero-shot voice cloning
- Local deployment support
- Outperforms popular open-source TTS models across evaluated metrics
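As a rough sketch of what zero-shot cloning could look like with the open release, assuming Fun-CosyVoice 3 keeps the interface of the earlier open-source CosyVoice repositories (the class name, model path, and audio files below are assumptions and placeholders):

# Hedged sketch: assumes Fun-CosyVoice 3 follows the earlier CosyVoice
# open-source interface; model path and file names are placeholders.
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

tts = CosyVoice2("pretrained_models/Fun-CosyVoice3-0.5B")  # placeholder path
prompt = load_wav("reference_3s.wav", 16000)  # ~3-second clip of the voice to clone

# Zero-shot cloning: speak new text in the reference speaker's voice.
for i, out in enumerate(tts.inference_zero_shot(
        "Hello from the cloned voice.",        # text to synthesize
        "Transcript of the reference clip.",   # what the 3-second clip says
        prompt, stream=False)):
    torchaudio.save(f"cloned_{i}.wav", out["tts_speech"], tts.sample_rate)

With stream=True the same call would yield audio chunks incrementally, which is how the streaming, low-first-token-latency behavior described above would typically be consumed.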
🔗 Explore & Download:
Qwen Code v0.5.0: Smarter AI coding assistant
What’s new:
- VSCode Integration: Bundled CLI into VSCode release package with improved cross-platform compatibility
- Native TypeScript SDK: Seamlessly integrate with Node/TS
- Smart Session Management: Auto-saves sessions so you can continue conversations later
- Support for OpenAI-compatible reasoning models, including DeepSeek V3.2, Kimi-K2, and more
- Control custom tools via SDK-hosted servers
- Russian Language Support: Added internationalization with Russian language option
- Enhanced User Experience: Terminal bell setting for audio notifications and session resume command display
- Testing & Stability: Better Ubuntu shell support, faster SDK timeouts, and rock-solid test stability
👉 Get started in your terminal:
npm install -g @qwen-code/qwen-code
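Once installed, launch the assistant from a project directory; the package installs a qwen command on current releases (run it with --help to confirm the options available on your setup):

cd your-project
qwen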
🔗 Check out the full changelog
✨ Community Spotlights
Children’s Storytelling: COOLKIDS LoRA from Clumsy_Trainer
This Z-Image-Turbo LoRA captures the whimsy, warmth, and visual charm of children’s illustration — perfect for picture books, educational content, or animated shorts.
The generations feel like pages from a beloved storybook.
👉 Try it here
Portrait Polisher: AWPortrait-Z from Shakker-Labs
AWPortrait-Z is a native noise-reduction LoRA that polishes Z-Image’s portrait capabilities. From “relit” lighting to authentic skin texture, it is a massive quality-of-life upgrade for character generation.
👉 Try it here
Z-Image Workflow Masterpiece from luneva
This Z-Image workflow generates pixel-level realistic details for both foregrounds and backgrounds at incredible speeds.
No brute force, no upscaling needed—just pure, high-density realism. A must-try for the community.
👉 Try it here
🔥 Upcoming Events
WAN MUSE+ Season 3 “IN CHARACTER” Now Live
We’re thrilled to launch WAN MUSE+ Season 3: “IN CHARACTER” — a global creative challenge inviting you to explore identity, narrative, and AI expression.
Prize Pool: Up to $14,000
- Best Narrative / Best Animated Short Award / Best Visual / Best PSA Award
- Nomination & Special Inspiration Awards
How to Enter:
- Post on TikTok / IG / X / YouTube with hashtags: #incharacter #wanmuse #wan
- AIGC Platforms: SeaArt.Ai, WaveSpeedAI, Tensor.Art
🔗 Full details
📬 Want More? Stay Updated.
Every week, we bring you:
- New model releases & upgrades
- AI research breakthroughs
- Open-source tools you can use today
- Community highlights that inspire
👉 Subscribe to The Tongyi Weekly and never miss a release.
Subscribe Now → https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7392460924453945345
Thank you for being part of this journey.
Tongyi Lab is a research institution under Alibaba Group dedicated to artificial intelligence and foundation models, focusing on the research, development, and innovative applications of AI models across diverse domains. Its research spans large language models (LLMs), multimodal understanding and generation, visual AIGC, speech technologies, and more.



