watsonx.ai


🔥 New Multimodal Model - Welcome Pixtral 12B, the first-ever multimodal Mistral model

By NICK PLOWDEN posted 21 days ago

  

Here's all you need to know about it:

1. This model is capable of understanding both images and text.
2. It handles variable image resolutions, supporting images of arbitrary size.
3. It can process large documents with interleaved text and images.
4. It has a 128k-token context window.
5. And it is open source: open weights are available under the Apache 2.0 license.
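To make "interleaved text and images" concrete, here is a minimal sketch of building a chat message that mixes a text prompt with a base64-encoded image. The field names follow the widely used OpenAI-style chat schema; the exact request format for Pixtral 12B on watsonx.ai may differ, so treat the structure below as an assumption to verify against the API docs:

```python
import base64


def build_multimodal_message(prompt: str, image_bytes: bytes) -> dict:
    """Build one user chat message interleaving text and an image.

    The "image_url" / "data:" convention here is the common
    OpenAI-style schema (assumption); check the watsonx.ai or
    Mistral API reference for the exact field names before use.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }


# Example: pair a question with (placeholder) image bytes
msg = build_multimodal_message("Describe this image.", b"\x89PNG placeholder")
```

Because text and image parts are just entries in the `content` list, a large document can be represented as alternating text and image parts in a single message.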

On both multimodal and text benchmarks, it leads other open-source multimodal models such as Phi-3 Vision, LLaVA-OV 7B, and Qwen2-VL 7B, and it even outperforms the closed-source Claude-3 Haiku.

The best part? This 12B open-weights model beats closed commercial models of similar size, and it is competitive with much larger closed models such as GPT-4o and Claude-3.5 Sonnet.

See the LinkedIn post by Armand Ruiz, VP of AI Platform at IBM.

Bye for now,

Nick


#watsonx.ai
#GenerativeAI
