We introduce Cosmos3, a family of omnimodal world models designed to jointly process and generate language,
image, video, audio, and action sequences within a unified mixture-of-transformers architecture.
By supporting highly flexible input-output configurat...