Anonymous · Arena · 04 · 07 · 2026

An anonymous model showed up on the leaderboard.

No press release. No team attribution. Within days, it took #2 in both Text-to-Video and Image-to-Video on the Artificial Analysis Video Arena. Three days later, Alibaba ATH claimed it. Meet HappyHorse-1.0 — a 15B-parameter unified video + audio model.

#2 TTV · 1444 Elo
#2 ITV · 1444 Elo
15B params
Alibaba ATH



Origin story

A silent launch, then a confession.

For three days the AI community tried to guess who built it. The answer was unconventional enough to be its own product strategy.

2026 · 04 · 07

Appeared anonymously.

A pseudonymous entry quietly lands on the Artificial Analysis Video Arena leaderboard. No announcement, no team, no paper. Within 48 hours it is trading punches with Veo-3.1 and Grok-Imagine at the top of both video tracks.

blind arena · no attribution

2026 · 04 · 10

Alibaba ATH claims it.

Three days after the mystery landing, Alibaba confirms HappyHorse-1.0 as a project from their ATH AI Innovation Unit — a 15B-parameter unified multimodal Transformer that generates cinematic video with natively synchronised audio in a single inference pass.

ath ai innovation unit · confirmed

Arena ELO · April 2026

#2 on both tracks, comfortably ahead of the next model.

Blind-tested pairwise on the Artificial Analysis Video Arena. Numbers below reflect public Elo as of mid-April 2026.

Text-to-Video Elo: 1444
Image-to-Video Elo: 1444
Delta to next model: +69
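For a sense of what a +69 gap means in a blind pairwise vote, the sketch below converts an Elo difference into an expected head-to-head win rate. It assumes the textbook logistic Elo curve with a 400-point scale. Artificial Analysis's exact rating method is not described here, so `expected_win_rate` is a hypothetical helper and the result is a rough guide, not the arena's own figure.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected share of pairwise wins for the higher-rated model.

    Assumption: the standard logistic Elo curve. The arena may use a
    different rating model, so treat this as an approximation.
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / scale))


gap_to_next_model = 69  # "Delta to next model" from the stats above
rate = expected_win_rate(gap_to_next_model)
print(f"+{gap_to_next_model} Elo -> about {rate:.0%} expected win rate")
```

Under that assumption, a 69-point lead works out to winning roughly 60% of blind comparisons against the next model, and a gap of zero is a coin flip.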

Figure: Artificial Analysis Video Arena leaderboard snapshot showing HappyHorse-1.0 at rank 2 on both text-to-video and image-to-video tracks. Source: Artificial Analysis Video Arena · snapshot 2026 · 04 · 13

Four technical breakthroughs

What the benchmarks are actually measuring.

Audio. Consistency. Motion. Instruction-following. HappyHorse moves each of these meaningfully — watch each demo and the pattern shows up in a few seconds.

01 / Unified architecture

Video and audio in a single pass.

Most AI video pipelines generate video first and then bolt audio on — a two-stage process that leaks latency and lip-sync drift. HappyHorse-1.0 uses a unified multimodal Transformer that emits synchronised video + audio in one inference pass.

Native audio with 7-language multilingual lip-sync
Text-to-video + image-to-video transformation
Up to 1080p output resolution
No cross-attention bridge between modalities
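Alibaba ATH has not published how the unified Transformer is wired, so the following is only a toy illustration of the general idea. One common way to avoid a cross-attention bridge is to place video and audio tokens in a single sequence and run ordinary self-attention over all of them. Every name and number below is hypothetical, and the projections are left as identity to keep the sketch short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Illustrative only: a real model learns the projections and stacks
    many layers. Returns the mixed tokens and the attention weights.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    mixed, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        mixed.append(
            [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        )
    return mixed, weights


# Hypothetical toy embeddings: two video tokens and one audio token.
video_tokens = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4]]
audio_tokens = [[0.2, 1.0, 0.3]]

# One shared sequence: every token can attend to every other token,
# so no separate cross-attention module sits between the modalities.
sequence = video_tokens + audio_tokens
mixed, weights = self_attention(sequence)
print("video token 0 attends to audio token with weight", round(weights[0][2], 3))
```

Because both modalities share one attention pass, each video token's output already mixes in audio information, which is the property the single-pass claim relies on.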


02 / Consistency

87% cross-clip stability.

Characters, lighting, styles and environments hold across every cut — the highest cross-clip consistency reported by any video model in 2026. That stability is what lets a 20-second scene feel like a film rather than a slideshow of related frames.

Stable character identity
Persistent lighting & colour grading
Coherent environmental continuity
Style locked across cuts


03 / Motion & physics

Motion that stops feeling AI-generated.

Fewer floaty gestures. Cleaner scene transitions. Physics that holds under scrutiny — objects with weight, fabrics that fold, characters that don't morph between frames. The kind of baseline that professional workflows actually need.


04 / Prompt following

Reads the brief. Executes it.

Complex camera moves, specific lighting conditions, nuanced character interactions — HappyHorse-1.0 lands multi-part prompts with notably less re-rolling. Fewer regenerations mean less waste, faster iteration, closer-to-intent outputs.

Camera path & lens specifications
Lighting & atmosphere direction
Character interaction & blocking
Negative prompts & exclusions


Technical specifications

The numbers, in one place.

Everything Alibaba ATH has disclosed about HappyHorse-1.0 as of April 2026. Weights, pricing and API surface are still pending the official release.

Model size

15 billion parameters

Architecture

40-layer self-attention Transformer · unified multimodal · no cross-attention

Developer

Alibaba — ATH AI Innovation Unit

Output resolution

Up to 1080p

Audio

Native, synchronised with video

Multilingual lip-sync

7 languages

Input modes

Text-to-video · Image-to-video · Video-to-video · Image + Text-to-video

Arena position

#2 Text-to-Video & Image-to-Video (Artificial Analysis, April 2026)

Head-to-head

How the silent arenas actually rank.

Artificial Analysis maintains four leaderboards: text-to-video and image-to-video, each with and without audio. As of noon on 13 April 2026, HappyHorse leads both no-audio tracks, with the highest image-to-video Elo the platform has recorded.

Text-to-Video · No Audio

Apr 13 · 2026

Rank · Model · Developer · Elo
01 · HappyHorse-1.0 · Alibaba-ATH · 1,384
02 · Dreamina Seedance 2.0 720p · ByteDance Seed · 1,273
03 · SkyReels V4 · Skywork AI · 1,243
04 · Kling 3.0 1080p (Pro) · KlingAI · 1,240
05 · grok-imagine-video · xAI · 1,228
06 · Runway Gen-4.5 · Runway · 1,223
07 · Vidu Q3 Pro · Vidu · 1,222

TTV · no audio · 1,384 Elo · +111 vs Seedance 2.0

Image-to-Video · No Audio

Apr 13 · 2026

Rank · Model · Developer · Elo
01 · HappyHorse-1.0 · Alibaba-ATH · 1,413 (platform high)
02 · Dreamina Seedance 2.0 720p · ByteDance Seed · 1,357
03 · grok-imagine-video · xAI · 1,331
04 · PixVerse V6 · PixVerse · 1,308
05 · Kling 3.0 Omni 1080p (Pro) · KlingAI · 1,298
06 · SkyReels V4 · Skywork AI · 1,295
07 · Kling 2.5 Turbo 1080p · KlingAI · 1,293
08 · Veo 3.1 Fast · Google · 1,290

Highest Elo ever recorded on the platform: +56 over Seedance 2.0, on 14,587 samples.

ITV · no audio · 1,413 Elo · platform high
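The margins quoted above can be checked directly from the two no-audio boards. The sketch below copies the 13 April scores listed in this section and computes the leader's gap over the runner-up, along with the spread across the rest of the listed field for comparison. The helper names are illustrative only.

```python
# Scores as listed in this section (no-audio tracks, 13 April 2026).
ttv_no_audio = [
    ("HappyHorse-1.0", 1384),
    ("Dreamina Seedance 2.0 720p", 1273),
    ("SkyReels V4", 1243),
    ("Kling 3.0 1080p (Pro)", 1240),
    ("grok-imagine-video", 1228),
    ("Runway Gen-4.5", 1223),
    ("Vidu Q3 Pro", 1222),
]
itv_no_audio = [
    ("HappyHorse-1.0", 1413),
    ("Dreamina Seedance 2.0 720p", 1357),
    ("grok-imagine-video", 1331),
    ("PixVerse V6", 1308),
    ("Kling 3.0 Omni 1080p (Pro)", 1298),
    ("SkyReels V4", 1295),
    ("Kling 2.5 Turbo 1080p", 1293),
    ("Veo 3.1 Fast", 1290),
]


def lead_over_runner_up(board):
    """Elo gap between the top two entries of a board."""
    scores = sorted((elo for _, elo in board), reverse=True)
    return scores[0] - scores[1]


def chasing_pack_spread(board):
    """Elo spread from the runner-up down to the last listed entry."""
    scores = sorted((elo for _, elo in board), reverse=True)
    return scores[1] - scores[-1]


for label, board in (("TTV", ttv_no_audio), ("ITV", itv_no_audio)):
    print(
        f"{label} no audio: lead {lead_over_runner_up(board)}, "
        f"pack spread {chasing_pack_spread(board)}"
    )
```

On the text-to-video board the 111-point lead is wider than the 51 points separating ranks 2 through 7. On image-to-video the lead is 56 points against a 67-point spread across ranks 2 through 8.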

HappyHorse advantages

Unified architecture — video + audio in one pass
Native audio beats bolted-on TTS pipelines
87% cross-clip consistency (highest in class)
7-language multilingual lip-sync out of the box
Backed by Alibaba ATH AI Innovation Unit

What the silent launch signals

Shipping an unattributed model to the top of a blind arena — and waiting three days to claim it — is a confident product strategy. It says: the technology is ready to be judged without the logo. Whether Alibaba repeats that playbook or not, the bar for every other video lab just moved.

Where it earns its keep

Four kinds of team waiting for this drop.

Creators & marketers

Cinematic clips at scale without a crew. Social-ready output, native audio, brand-consistent style.

Film & entertainment

Pre-vis, storyboards, final-tier B-roll. 87% cross-clip stability survives a timeline cut.

E-commerce

Product videos with multilingual voiceovers from one prompt — ship localised creative globally.

Education & training

Learning content with accurate lip-sync across seven languages — accessibility by default.

Availability

Not public yet. We're waiting too.

No public API, no downloadable weights, no confirmed pricing. HappyHorse-1.0 is on top of the arena while its launch surface catches up.

Public API: Pending — no documented endpoint as of 22 April 2026.
Model weights: Not released. Closed, hosted-inference only at launch.
Yihook integration: The moment the HappyHorse API opens up, Yihook will wire it into the video pipeline — waitlist first.

Coming to Yihook at launch

When HappyHorse opens up, you don't want to refresh the leaderboard.

Unified video + audio. 87% cross-clip consistency. Seven-language lip-sync. Available inside Yihook the moment the API goes live — waitlist members get access first.

Join Yihook today
No credit card required · Free credits on signup