Anonymous · Arena · 04 · 07 · 2026

An anonymous model showed up on the leaderboard.

No press release. No team attribution. Within days, it took #2 in both Text-to-Video and Image-to-Video on the Artificial Analysis Video Arena. Three days after it appeared, Alibaba ATH claimed it. Meet HappyHorse-1.0, a 15B-parameter unified video + audio model.

  • #2 TTV · 1444 Elo
  • #2 ITV · 1444 Elo
  • 15B params
  • Alibaba ATH

Origin story

A silent launch, then a confession.

For three days the AI community tried to guess who built it. The answer was unconventional enough to be its own product strategy.

2026 · 04 · 07

Appeared anonymously.

A pseudonymous entry quietly lands on the Artificial Analysis Video Arena leaderboard. No announcement, no team, no paper. Within 48 hours it is trading punches with Veo-3.1 and Grok-Imagine at the top of both video tracks.

blind arena · no attribution

2026 · 04 · 10

Alibaba ATH claims it.

Three days after the mystery landing, Alibaba confirms HappyHorse-1.0 as a project from their ATH AI Innovation Unit — a 15B-parameter unified multimodal Transformer that generates cinematic video with natively synchronised audio in a single inference pass.

ath ai innovation unit · confirmed

Arena Elo · April 2026

#2 on both tracks, ahead of #3 by 69 and 23 points.

Blind-tested pairwise on the Artificial Analysis Video Arena. Numbers below reflect public Elo as of mid-April 2026.

Text-to-Video Elo
1444

#2 — leading #3 Veo-3.1 (with audio) by +69 points.

Image-to-Video Elo
1444

#2 — leading #3 Grok-Imagine-Video-720p by +23 points.

Delta to next model
+69

Gap to the closest head-to-head contender on TTV.

Artificial Analysis Video Arena leaderboard snapshot showing HappyHorse-1.0 at rank 2 on both text-to-video and image-to-video tracks
source · artificial analysis video arena · snapshot · 2026 · 04 · 13
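
To put those gaps in perspective, an Elo lead can be converted into an expected head-to-head preference rate. The sketch below assumes the conventional Elo expected-score formula on a 400-point scale; this page does not describe how Artificial Analysis actually fits its ratings, so treat the numbers as a rough reading rather than the arena's own statistics. The function name `expected_win_rate` is illustrative.

```python
def expected_win_rate(elo_gap: float) -> float:
    """Standard Elo expected score for the higher-rated side, given its lead.

    Assumes the conventional 400-point logistic scale; ties are ignored.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Leads quoted above: +69 over #3 on text-to-video, +23 over #3 on image-to-video.
for label, gap in [("TTV vs #3", 69), ("ITV vs #3", 23)]:
    rate = expected_win_rate(gap)
    print(f"{label}: +{gap} Elo -> {rate:.1%} expected preference rate")
```

Under that assumption, a +69 lead corresponds to being preferred in roughly 60% of pairwise votes against the #3 model, and a +23 lead to roughly 53%.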

Four technical breakthroughs

What the benchmarks are actually measuring.

Audio. Consistency. Motion. Instruction-following. HappyHorse moves each of these meaningfully — watch each demo and the pattern shows up in a few seconds.

01 / Unified architecture

Video and audio in a single pass.

Most AI video pipelines generate video first and then bolt audio on — a two-stage process that leaks latency and lip-sync drift. HappyHorse-1.0 uses a unified multimodal Transformer that emits synchronised video + audio in one inference pass.

  • Native audio with 7-language multilingual lip-sync
  • Text-to-video + image-to-video transformation
  • Up to 1080p output resolution
  • No cross-attention bridge between modalities
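
The idea of "one pass, no cross-attention bridge" can be illustrated with a toy example: video and audio tokens are concatenated into one sequence and mixed by ordinary self-attention, so each modality attends to the other without a separate bridging module. This is a minimal sketch under strong assumptions (single head, identity projections, random toy tokens), not HappyHorse-1.0's disclosed implementation; `self_attention` and `joint_pass` are hypothetical names.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Toy single-head self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        mixed.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return mixed


def joint_pass(video_tokens, audio_tokens):
    """Run one attention pass over the concatenated sequence, then split it back."""
    sequence = video_tokens + audio_tokens  # one shared sequence, no bridge module
    mixed = self_attention(sequence)
    n_video = len(video_tokens)
    return mixed[:n_video], mixed[n_video:]


random.seed(0)
DIM = 4
video = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]  # 6 toy video tokens
audio = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]  # 3 toy audio tokens

video_out, audio_out = joint_pass(video, audio)
print(len(video_out), len(audio_out))
```

Because both modalities share one sequence, changing the audio tokens changes the video outputs in the same pass, which is the property a two-stage pipeline lacks.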

02 / Consistency

87% cross-clip stability.

Characters, lighting, styles and environments hold across cuts, with the highest cross-clip consistency reported by any video model in 2026. That stability is what lets a 20-second scene feel like a film rather than a slideshow of related frames.

  • Stable character identity
  • Persistent lighting & colour grading
  • Coherent environmental continuity
  • Style locked across cuts

03 / Motion & physics

Motion that stops feeling AI-generated.

Fewer floaty gestures. Cleaner scene transitions. Physics that holds under scrutiny — objects with weight, fabrics that fold, characters that don't morph between frames. The kind of baseline that professional workflows actually need.


04 / Prompt following

Reads the brief. Executes it.

Complex camera moves, specific lighting conditions, nuanced character interactions — HappyHorse-1.0 lands multi-part prompts with notably less re-rolling. Fewer regenerations mean less waste, faster iteration, closer-to-intent outputs.

  • Camera path & lens specifications
  • Lighting & atmosphere direction
  • Character interaction & blocking
  • Negative prompts & exclusions
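
One way to keep a multi-part brief organised is to hold each of the four parts above in its own field and render them into a single prompt string. HappyHorse-1.0 has no documented prompt format or API yet, so everything here is hypothetical: the `ShotBrief` class, its field names and the rendered layout are illustrative assumptions, not a supported schema.

```python
from dataclasses import dataclass, field


@dataclass
class ShotBrief:
    """Hypothetical container for a multi-part video prompt (not an official schema)."""

    subject: str
    camera: str = ""       # camera path and lens
    lighting: str = ""     # lighting and atmosphere
    blocking: str = ""     # character interaction and blocking
    exclude: list[str] = field(default_factory=list)  # negative prompts

    def render_prompt(self) -> str:
        """Join the non-empty parts into one labelled prompt string."""
        parts = [self.subject.rstrip(".")]
        labelled = (
            ("Camera", self.camera),
            ("Lighting", self.lighting),
            ("Blocking", self.blocking),
        )
        for label, value in labelled:
            if value:
                parts.append(f"{label}: {value.rstrip('.')}")
        if self.exclude:
            parts.append("Avoid: " + ", ".join(self.exclude))
        return ". ".join(parts) + "."


brief = ShotBrief(
    subject="Two chefs plating dessert in a narrow kitchen",
    camera="slow dolly-in, 35mm lens",
    lighting="warm tungsten practicals, light steam haze",
    blocking="one chef passes a plate across the pass to the other",
    exclude=["text overlays", "extra hands"],
)
print(brief.render_prompt())
```

Keeping the parts separate makes it easy to change one instruction, such as the lens, without retyping the rest of the brief.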

Technical specifications

The numbers, in one place.

Everything Alibaba ATH has disclosed about HappyHorse-1.0 as of April 2026. Weights, pricing and API surface are still pending the official release.

Model size

15 billion parameters

Architecture

40-layer self-attention Transformer · unified multimodal · no cross-attention

Developer

Alibaba — ATH AI Innovation Unit

Output resolution

Up to 1080p

Audio

Native, synchronised with video in a single inference pass

Multilingual lip-sync

7 languages

Input modes

Text-to-video · Image-to-video · Video-to-video · Image + Text-to-video

Arena position

#2 Text-to-Video & Image-to-Video (Artificial Analysis, April 2026)

Head-to-head

How the silent arenas actually rank.

Artificial Analysis maintains four leaderboards: text-to-video and image-to-video, each with and without audio. As of noon on 13 April 2026, HappyHorse leads both no-audio tracks, with a record image-to-video score.

Text-to-Video · No Audio

Apr 13 · 2026
  1. HappyHorse-1.0 · Alibaba-ATH · 1,384
  2. Dreamina Seedance 2.0 720p · ByteDance Seed · 1,273
  3. SkyReels V4 · Skywork AI · 1,243
  4. Kling 3.0 1080p (Pro) · KlingAI · 1,240
  5. grok-imagine-video · xAI · 1,228
  6. Runway Gen-4.5 · Runway · 1,223
  7. Vidu Q3 Pro · Vidu · 1,222
TTV · no audio · 1384 elo · +111 vs seedance 2.0

Image-to-Video · No Audio

Apr 13 · 2026
Rank 01 · Alibaba-ATH · Platform high

HappyHorse-1.0

1413

Highest Elo ever recorded on the platform: +56 over Seedance 2.0, on 14,587 samples.

  2. Dreamina Seedance 2.0 720p · ByteDance Seed · 1,357
  3. grok-imagine-video · xAI · 1,331
  4. PixVerse V6 · PixVerse · 1,308
  5. Kling 3.0 Omni 1080p (Pro) · KlingAI · 1,298
  6. SkyReels V4 · Skywork AI · 1,295
  7. Kling 2.5 Turbo 1080p · KlingAI · 1,293
  8. Veo 3.1 Fast · Google · 1,290
ITV · no audio · 1413 elo · platform high
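
The leads quoted for the two no-audio tracks are plain subtraction on the scores listed above, and the same arithmetic shows how tightly the rest of each field is packed. The snippet below is a small sketch using only the numbers on this page; the helper names `lead_over_runner_up` and `spread_of_rest` are illustrative.

```python
# Scores transcribed from the two no-audio leaderboards above (13 April 2026).
TTV_NO_AUDIO = [
    ("HappyHorse-1.0", 1384),
    ("Dreamina Seedance 2.0 720p", 1273),
    ("SkyReels V4", 1243),
    ("Kling 3.0 1080p (Pro)", 1240),
    ("grok-imagine-video", 1228),
    ("Runway Gen-4.5", 1223),
    ("Vidu Q3 Pro", 1222),
]
ITV_NO_AUDIO = [
    ("HappyHorse-1.0", 1413),
    ("Dreamina Seedance 2.0 720p", 1357),
    ("grok-imagine-video", 1331),
    ("PixVerse V6", 1308),
    ("Kling 3.0 Omni 1080p (Pro)", 1298),
    ("SkyReels V4", 1295),
    ("Kling 2.5 Turbo 1080p", 1293),
    ("Veo 3.1 Fast", 1290),
]


def lead_over_runner_up(board):
    """Elo gap between rank 1 and rank 2."""
    return board[0][1] - board[1][1]


def spread_of_rest(board):
    """Elo gap between rank 2 and the last listed model."""
    return board[1][1] - board[-1][1]


for name, board in [("TTV", TTV_NO_AUDIO), ("ITV", ITV_NO_AUDIO)]:
    print(f"{name}: lead {lead_over_runner_up(board):+d}, "
          f"ranks 2-{len(board)} span {spread_of_rest(board)} points")
# TTV: lead +111, ranks 2-7 span 51 points
# ITV: lead +56, ranks 2-8 span 67 points
```

On text-to-video the leader's margin is more than twice the spread of the six models behind it, while on image-to-video the field below rank 1 is more spread out than the lead itself.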

HappyHorse advantages

  • Unified architecture — video + audio in one pass
  • Native audio beats bolted-on TTS pipelines
  • 87% cross-clip consistency (highest in class)
  • 7-language multilingual lip-sync out of the box
  • Backed by Alibaba ATH AI Innovation Unit

What the silent launch signals

Shipping an unattributed model to the top of a blind arena — and waiting three days to claim it — is a confident product strategy. It says: the technology is ready to be judged without the logo. Whether Alibaba repeats that playbook or not, the bar for every other video lab just moved.

Where it earns its keep

Four kinds of team waiting for this drop.

Creators & marketers

Cinematic clips at scale without a crew. Social-ready output, native audio, brand-consistent style.

Film & entertainment

Pre-vis, storyboards, final-tier B-roll. 87% cross-clip stability survives a timeline cut.

E-commerce

Product videos with multilingual voiceovers from one prompt — ship localised creative globally.

Education & training

Learning content with accurate lip-sync across seven languages — accessibility by default.

Availability

Not public yet. We're waiting too.

No public API, no downloadable weights, no confirmed pricing. HappyHorse-1.0 is on top of the arena while its launch surface catches up.

Public API
Pending — no documented endpoint as of 22 April 2026.
Model weights
Not released. Closed, hosted-inference only at launch.
Yihook integration
The moment the HappyHorse API opens up, Yihook will wire it into the video pipeline — waitlist first.

Coming to Yihook at launch

When HappyHorse opens up, you don't want to refresh the leaderboard.

Unified video + audio. 87% cross-clip consistency. Seven-language lip-sync. Available inside Yihook the moment the API goes live — waitlist members get access first.

HappyHorse-1.0 — Alibaba ATH’s Anonymous #2 AI Video Model, Explained | Yihook