Stability AI is best known for Stable Diffusion, its text-to-image model. The company recently partnered with the semiconductor company Arm to bring generative audio to mobile devices. The collaboration enables the Stable Audio Open model to run entirely on Arm CPUs, so users can quickly generate sound effects, audio samples and production elements on the device without an internet connection.
Stability AI says that as generative AI is adopted more widely by enterprises and professional creators, it is particularly important that its models and workflows be easy to use in every creative field. According to the company, this not only improves creative efficiency but also helps integrate these technologies seamlessly into visual media production.
To meet growing demand, the company aims to make its models run more efficiently on edge devices. When it began optimizing Stable Audio Open for mobile devices, initial tests took 240 seconds to generate audio on an Arm CPU. By distilling the model and leveraging Arm's software stack, in particular the int8 matrix multiplication kernels from KleidiAI used through XNNPACK, the company reduced the time to generate an 11-second audio clip to 8 seconds, a 30-fold speedup.
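The reported figures can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes the 240-second baseline run produced a clip of the same 11-second length, which the article does not state explicitly.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


def real_time_factor(clip_seconds: float, generation_seconds: float) -> float:
    """Return seconds of audio produced per second of compute.

    A value above 1 means audio is generated faster than it plays back.
    """
    return clip_seconds / generation_seconds


# Figures reported in the article.
BASELINE_SECONDS = 240.0   # initial generation time on an Arm CPU
OPTIMIZED_SECONDS = 8.0    # generation time after distillation and int8 kernels
CLIP_SECONDS = 11.0        # length of the generated clip

if __name__ == "__main__":
    print(f"speedup: {speedup(BASELINE_SECONDS, OPTIMIZED_SECONDS):.0f}x")
    # Assumption: the baseline run also generated an 11-second clip.
    print(f"real-time factor before: {real_time_factor(CLIP_SECONDS, BASELINE_SECONDS):.3f}")
    print(f"real-time factor after:  {real_time_factor(CLIP_SECONDS, OPTIMIZED_SECONDS):.3f}")
```

With these numbers the speedup works out to 240 / 8 = 30, and the optimized real-time factor to 11 / 8 = 1.375, meaning the clip is generated somewhat faster than it takes to play.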
Users need a compatible mobile device to use this feature. Because most smartphones today use Arm-based CPUs, the technology is accessible to a wide range of users. Stability AI also plans to bring all of its image, video and 3D models to edge devices, aiming to revolutionize the way visual media is created on mobile devices.