My Makerspace has started using a local vision LLM on an Nvidia RTX laptop for a people-counting application at our non-profit, with favorable results.
Initial testing was with cloud-based CV and LLM services, with mixed results. Everybody kept telling us that the only way to do people counting is with YOLO and a purpose-trained CV model.
For the local LLM (an RTX 2070 with 8 GB of VRAM), we tested three different models. Granite Vision provided the most accurate and consistent results, giving the correct count for our workspaces (chaotic, full of machine tools and safety gear, and with highly variable lighting) about 90% of the time, and off by only ±1 on the remaining 10% of the sample images. Speed was acceptable -- qwen was faster, but gave almost random answers.
NAME                       SIZE / LOADED SIZE   ERROR   SECONDS/FRAME
qwen2.5vl:3b               3.2 GB / 6.0 GB      +3      1
gemma3:latest              3.3 GB / 6.2 GB      ±2      9
granite3.2-vision:latest   2.4 GB / 7.5 GB      ±1      3
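
For anyone who wants to run the same sort of comparison on their own hardware, a rough sketch along these lines is enough. This is not our exact harness -- it assumes the Ollama HTTP API on its default port, and the sample file names and ground-truth counts are placeholders:

# model_comparison.py -- rough sketch for comparing vision models on labeled snapshots.
# Not our exact harness; assumes Ollama on localhost:11434 with the models already pulled.
import base64, json, time, urllib.request

MODELS = ["qwen2.5vl:3b", "gemma3:latest", "granite3.2-vision:latest"]
PROMPT = ("Return as an integer the number of people in this image. "
          "The output should be just the numerical value alone")

# Placeholder ground truth: snapshot file -> actual number of people.
SAMPLES = {"floor_cam_01.jpg": 3, "woodshop_02.jpg": 0, "laser_bay_03.jpg": 2}

def count_people(model: str, path: str) -> int:
    with open(path, "rb") as f:
        img = base64.b64encode(f.read()).decode()
    body = json.dumps({"model": model, "prompt": PROMPT,
                       "images": [img], "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return int(json.loads(resp.read())["response"].strip())

for model in MODELS:
    errs, t0 = [], time.monotonic()
    for path, actual in SAMPLES.items():
        try:
            errs.append(abs(count_people(model, path) - actual))
        except ValueError:
            errs.append(None)   # non-numeric reply counts as a miss
    secs = (time.monotonic() - t0) / len(SAMPLES)
    print(f"{model:26s} abs-errors={errs} sec/frame={secs:.1f}")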
A listener service loads Granite Vision into Ollama and supplies a system prompt of "You are a terse digital assistant. You respond with short, simple answers, preferring to return integer numeric responses rather than sentences".
The service ingests near-real-time snapshot-on-motion events (usually 704x480 px JPEGs) from 19 different public workspace cameras, then makes a loopback API call to Ollama with a prompt of "Return as an integer the number of people in this image. The output should be just the numerical value alone". The answer comes back as a bare integer (e.g. "3"), which feeds real-time occupancy trackers as well as a time-series database (InfluxDB).
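
For anyone curious about the plumbing, here is a minimal sketch of that loopback call plus the InfluxDB write. It assumes Ollama on its default port and an InfluxDB 2.x line-protocol endpoint; the measurement name, tag, bucket, org, and token below are placeholders rather than our actual schema:

# occupancy_listener.py -- sketch of the per-snapshot handling, not the exact service.
import base64, json, time, urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
SYSTEM = ("You are a terse digital assistant. You respond with short, simple answers, "
          "preferring to return integer numeric responses rather than sentences")
PROMPT = ("Return as an integer the number of people in this image. "
          "The output should be just the numerical value alone")

# Placeholder InfluxDB 2.x settings -- substitute your own org/bucket/token.
INFLUX_URL = "http://localhost:8086/api/v2/write?org=makerspace&bucket=occupancy&precision=s"
INFLUX_TOKEN = "changeme"

def handle_snapshot(camera_id: str, jpeg_bytes: bytes) -> int:
    """Ask the vision model for a head count and push it to the time-series DB."""
    payload = json.dumps({
        "model": "granite3.2-vision:latest",
        "system": SYSTEM,
        "prompt": PROMPT,
        "images": [base64.b64encode(jpeg_bytes).decode()],
        "stream": False,
    }).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        count = int(json.loads(resp.read())["response"].strip())

    # InfluxDB line protocol: occupancy,camera=<id> people=<n>i <unix-seconds>
    line = f"occupancy,camera={camera_id} people={count}i {int(time.time())}"
    influx_req = urllib.request.Request(
        INFLUX_URL, data=line.encode(),
        headers={"Authorization": f"Token {INFLUX_TOKEN}"})
    urllib.request.urlopen(influx_req)
    return count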
Has anybody done any benchmarking or effectiveness testing with different image resolutions? Some of our cameras naturally provide snapshots at 1280x720, and these take twice as long to upload and analyze (with any of the three models we tried) with only a slight improvement in error rate.
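
In case anyone wants to try this on their own snapshots, here is a hedged sketch of downscaling a 1280x720 frame before the Ollama call and comparing latency; it assumes Pillow is installed, and the file name and the 704x480 target size are just illustrative:

# resolution_test.py -- crude latency comparison of full-size vs downscaled snapshots.
# Assumes Pillow and a local Ollama with granite3.2-vision pulled; file name is illustrative.
import base64, io, json, time, urllib.request
from PIL import Image

PROMPT = ("Return as an integer the number of people in this image. "
          "The output should be just the numerical value alone")

def ask(jpeg_b64: str) -> str:
    body = json.dumps({"model": "granite3.2-vision:latest", "prompt": PROMPT,
                       "images": [jpeg_b64], "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

def downscale(jpeg_bytes: bytes, size=(704, 480)) -> bytes:
    img = Image.open(io.BytesIO(jpeg_bytes))
    img.thumbnail(size)                       # shrinks in place, keeps aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return buf.getvalue()

raw = open("snapshot_1280x720.jpg", "rb").read()
for label, data in [("1280x720", raw), ("~704x480", downscale(raw))]:
    t0 = time.monotonic()
    answer = ask(base64.b64encode(data).decode())
    print(f"{label}: count={answer} latency={time.monotonic() - t0:.1f}s")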
#LLM
Kevin Kadow