Kimi visual model
Kimi visual model is an advanced image understanding technology provided by the Moonshot AI open platform, which can accurately identify and understand image content, including text, color, object shape, etc. It is efficient and accurate and suitable for various scenarios such as image content description and visual question and answer. The pricing is consistent with the moonshot-v1 series model. It is billed based on the total Tokens inferred by the model, and each picture consumes 1024 Tokens.
target users
Developers, researchers, and businesses requiring image understanding capabilities. Developers can easily integrate its powerful API interface; researchers can use it for image analysis and research; enterprises can improve business efficiency and user experience.
Usage scenario examples
Developer develops image question and answer application
Enterprises conduct automated image content review
Researchers conduct image recognition study
Product features
Support multiple rounds of conversations, understand context and answer questions
Provide streaming output and return results in real time
Tool calls can be made to expand the scope of application
Support JSON mode to facilitate data interaction
Support partial processing and response to improve efficiency
Internet search is not supported to ensure data security
Creating caches with image content is not supported, but already created caches can be used
Only supports base64 encoded image content
Tutorial
1 Get the Moonshot API key
2 Select the appropriate Kimi vision model, such as moonshot-v1-8k-vision-preview
3 Convert the image to base64 encoded string
4 Build an API request, including model name, image content and instructions
5 Send a request to the Moonshot AI open platform
6 Parse the response results and perform subsequent processing