Voyager is a video-generation model built on Tencent's HunyuanWorld 1.0 and integrated into the Hunyuan ecosystem alongside Hunyuan3D-2 and HunyuanVideo. Training relied on automated software that estimated camera motion and per-frame depth across more than 100,000 clips from real-world footage and Unreal Engine renders, removing the need for manual labeling. The model requires substantial GPU memory (at least 60GB at 540p; 80GB recommended) and is released on Hugging Face with weights and code for single- and multi-GPU setups. The license prohibits use in the EU, UK, and South Korea and requires separate licensing for very large commercial deployments. Voyager reportedly achieved the top overall WorldScore result but faces deployment challenges due to its high computational demands.
To train Voyager, researchers developed software that automatically analyzes existing videos to estimate camera movements and calculate depth for every frame, eliminating the need for humans to manually label thousands of hours of footage. The system processed over 100,000 video clips from both real-world recordings and Unreal Engine renders. Running the model demands serious computing power: at least 60GB of GPU memory for 540p resolution, though Tencent recommends 80GB for better results. Tencent published the model weights on Hugging Face and included code that works with both single- and multi-GPU setups.
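For readers who want to gauge their own hardware against those memory figures, the following is a minimal sketch rather than anything shipped with Voyager. The function names `classify_gpu_memory` and `local_gpu_memory_gb` are hypothetical, the use of PyTorch to query devices is an assumption, and reading the quoted figures as decimal gigabytes is also an assumption, since the article does not say which unit is meant.

```python
"""Rough pre-flight check against the GPU memory figures quoted in the article."""

MIN_GB_540P = 60.0      # minimum quoted for 540p generation
RECOMMENDED_GB = 80.0   # figure Tencent recommends for better results


def classify_gpu_memory(total_gb: float) -> str:
    """Map a GPU's total memory (decimal GB, an assumed unit) to a coarse tier."""
    if total_gb >= RECOMMENDED_GB:
        return "recommended"
    if total_gb >= MIN_GB_540P:
        return "minimum"
    return "insufficient"


def local_gpu_memory_gb() -> list[float]:
    """Return total memory per visible CUDA device in decimal GB (empty if none)."""
    try:
        import torch  # optional dependency, assumed here only for the device query
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory / 1e9
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    # Prints one line per detected GPU; prints nothing if no CUDA device is found.
    for index, gb in enumerate(local_gpu_memory_gb()):
        print(f"GPU {index}: {gb:.1f} GB -> {classify_gpu_memory(gb)}")
```

The check looks only at total capacity per device. It says nothing about how the released multi-GPU code distributes memory, so treat the result as a rough guide.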
The model comes with notable licensing restrictions. Like other Hunyuan models from Tencent, the license prohibits use in the European Union, the United Kingdom, and South Korea. Additionally, commercial deployments serving more than 100 million monthly active users require separate licensing from Tencent.
On the WorldScore benchmark developed by Stanford University researchers, Voyager reportedly achieved the highest overall score of 77.62, compared with 72.69 for WonderWorld and 62.15 for CogVideoX-I2V. The model reportedly excelled in object control (66.92), style consistency (84.89), and subjective quality (71.09), though it placed second in camera control (85.95) behind WonderWorld's 92.98. WorldScore evaluates world-generation approaches across multiple criteria, including 3D consistency and content alignment.
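To put the reported gaps in concrete terms, here is a small sketch that ranks the quoted scores and computes the margins between models. The numbers are copied from the figures above; the helper names `ranking` and `margin` are illustrative and are not part of WorldScore or any Tencent tooling.

```python
# Scores as quoted in the article; nothing here is re-measured.
overall = {"Voyager": 77.62, "WonderWorld": 72.69, "CogVideoX-I2V": 62.15}
camera_control = {"Voyager": 85.95, "WonderWorld": 92.98}


def ranking(scores: dict[str, float]) -> list[str]:
    """Return model names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def margin(scores: dict[str, float], leader: str, other: str) -> float:
    """Return the score difference leader - other, rounded to two decimals."""
    return round(scores[leader] - scores[other], 2)


if __name__ == "__main__":
    print(ranking(overall))
    print(margin(overall, "Voyager", "WonderWorld"))
    print(margin(camera_control, "WonderWorld", "Voyager"))
```

On these figures, Voyager leads WonderWorld by 4.93 points overall and trails it by 7.03 points on camera control.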