InternViT-300M-448px-V2_5
An enhanced InternViT vision encoder with stronger visual feature extraction on complex and underrepresented data, aimed at researchers and developers.
What is InternViT-300M-448px-V2_5?
InternViT-300M-448px-V2_5 is a vision encoder (a Vision Transformer, or ViT, that takes 448px inputs) with improved feature extraction, particularly in domains that are underrepresented in web-scale training data, such as multilingual OCR and mathematical diagrams. It was further trained with ViT incremental learning using a next-token prediction (NTP) loss, which improves its performance on this rare data. The encoder is designed to be paired with a pre-trained LLM in a multimodal model that handles images and videos, making it useful to researchers and developers working on image classification and text recognition tasks.
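The description mentions NTP loss without defining it. Below is a minimal sketch, assuming the standard autoregressive formulation: at each position the model scores every vocabulary entry, and the loss is the mean cross-entropy of the *next* token. The function name `next_token_prediction_loss` and the `target_mask` parameter are hypothetical and used only for illustration. This is not the InternVL training code. The mask is included because multimodal training commonly scores only some positions (for example, text tokens rather than image tokens).

```python
import math
from typing import List, Optional


def next_token_prediction_loss(
    logits: List[List[float]],
    tokens: List[int],
    target_mask: Optional[List[bool]] = None,
) -> float:
    """Hypothetical sketch of a next-token prediction (NTP) loss.

    logits[t] holds unnormalized vocabulary scores at position t and is used
    to predict tokens[t + 1]. target_mask[i] (assumed, illustrative) says
    whether tokens[i] counts as a prediction target.
    """
    n = len(tokens)
    if len(logits) != n or n < 2:
        raise ValueError("logits and tokens must match in length (>= 2)")
    if target_mask is None:
        target_mask = [True] * n

    total, count = 0.0, 0
    for t in range(n - 1):
        if not target_mask[t + 1]:
            continue  # skip positions excluded from the loss
        row = logits[t]
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[tokens[t + 1]]  # -log softmax(row)[target]
        count += 1

    if count == 0:
        raise ValueError("no positions selected by target_mask")
    return total / count


# Uniform scores over a 4-token vocabulary give a loss of log(4).
print(next_token_prediction_loss([[0.0] * 4] * 3, [0, 1, 2]))
```

With uniform logits every next token has probability 1/4, so the loss is log 4 (about 1.386). Confident, correct predictions push the loss toward 0.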