UniMuMo
UniMuMo generates music, motion, and text within a single model, catering to creators in music, gaming, and VR.
What is UniMuMo?
UniMuMo is a multimodal model that generates cross-modal outputs from text, music, and motion data. It uses a unified encoder-decoder transformer architecture and bridges the three modalities by converting each of them into a token-based representation.
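To make the idea of a shared token-based representation concrete, here is a minimal sketch of one common way to place several modalities in a single vocabulary: each modality keeps its own discrete codes, and a fixed offset maps them into disjoint ranges of one token space. This is an illustration only, not UniMuMo's actual tokenizer; the codebook sizes and the names `CODEBOOK_SIZES`, `to_shared`, and `from_shared` are hypothetical.

```python
# Illustrative only: codebook sizes are made-up values, not UniMuMo's.
CODEBOOK_SIZES = {"text": 1000, "music": 2048, "motion": 512}


def build_offsets(sizes):
    """Assign each modality a disjoint range in one shared vocabulary."""
    offsets = {}
    start = 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, VOCAB_SIZE = build_offsets(CODEBOOK_SIZES)


def to_shared(modality, codes):
    """Map modality-local code indices to shared-vocabulary token ids."""
    size = CODEBOOK_SIZES[modality]
    for code in codes:
        if not 0 <= code < size:
            raise ValueError(f"code {code} out of range for {modality}")
    return [OFFSETS[modality] + code for code in codes]


def from_shared(token):
    """Recover (modality, local code) from a shared-vocabulary token id."""
    for name, start in OFFSETS.items():
        if start <= token < start + CODEBOOK_SIZES[name]:
            return name, token - start
    raise ValueError(f"token {token} is outside the shared vocabulary")


# A mixed sequence: two music codes followed by one motion code.
sequence = to_shared("music", [0, 5]) + to_shared("motion", [3])
decoded = [from_shared(t) for t in sequence]
```

With disjoint ranges like these, a single transformer can read and emit tokens from any modality, and the decoder side can tell from the token id alone which modality-specific decoder should turn it back into text, audio, or motion.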