From Zero to RAG-Hero: Building an AI Chatbot Using Oracle 23ai and IBM Granite LLM (on a Mac!)

By Naved Afroz

Before RAG becomes the most laughable chapter in AI innovation, let’s get our hands dirty — No RAGrets later. Jump-start kit from git 👉

Getting Started the RAG Way

Here’s a practical walkthrough for building a Retrieval-Augmented Generation (RAG) chatbot using Oracle 23ai and IBM’s Granite Code LLM — all running locally on a Mac M3 Pro.

Architecture diagram: the full RAG stack running locally on a Mac

A bold move — and a replicable one.

If you’ve got a Mac and a bit of curiosity, this guide is for you. It’s part how-to, part weekend build, and practical enough to actually finish.

Step 0: What Even is RAG?

Let’s start with a little analogy. If you’ve ever been on an interview panel, you’ll relate.

Let’s imagine a job interview scenario. A chatbot walks in as the candidate — dressed sharp, speaks fluently, and oozes confidence. Each time the panel asks a question, the chatbot replies with stunningly incorrect answers — classic LLM hallucinations, polished and poised.

Now imagine we give the same chatbot a magical device — let’s call it The RAG Implant. Think of it as something straight out of a Black Mirror episode or Neuralink’s brain-computer interface (BCI) — except instead of a chip in your brain, it’s a direct neural feed of knowledge into the chatbot.

This “implant” connects seamlessly to a magical GitHub repo filled with the company’s internal docs, policies, codebases, and even watercooler gossip.

Now, instead of hallucinating nonsense, the chatbot pauses, pulls up the exact relevant answer from the repo, and responds like a well-informed genius.

That, dear reader, is the power of Retrieval-Augmented Generation — a method where a language model is “fed” context retrieved from a database, then asked to generate a response. Think of it as:

Query → Search → Inject relevant data → LLM answers smartly

In our case:

  • The retrieval comes from Oracle Database 23ai (with embeddings stored as vectors)
  • The generation is done using IBM's Granite Code 8B model, running locally via Ollama
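
In plain Python terms, the loop looks roughly like this. This is a conceptual sketch of my own; retrieve_from_oracle and generate_with_granite are illustrative names for the pieces we build in the steps below:

def rag_answer(question: str) -> str:
    # Search: embed the question and pull the closest chunks from Oracle 23ai
    context = retrieve_from_oracle(question)   # built in Steps 1-7
    # Inject + generate: hand the retrieved context to the model
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate_with_granite(prompt)       # built in Steps 8-9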

Step 1: Install Oracle 23ai Free Edition in a Docker Container (on a Mac!)

This part is where magic meets muscle — getting a full Oracle AI-enabled database up and running locally.

1.1 Prerequisites

Ensure you have:

  • A Mac with M1/M2/M3 (Intel also works, with a performance hit)
  • Homebrew
  • Docker + Colima
brew install docker docker-compose colima rust protobuf cmake python
colima start --cpu 4 --memory 8 --arch x86_64

1.2 Pull and Run Oracle 23ai

Place all your scripts on your local machine and mount the directory /Users/nava/Documents/data/scripts when starting the container. This mount enables seamless file sharing between your local machine and the adb-free container.

docker run -d \
-p 1521:1521 \
-p 1522:1522 \
-p 8443:8443 \
-p 27017:27017 \
-e WORKLOAD_TYPE=ATP \
-e WALLET_PASSWORD=Welcome321#_ \
-e ADMIN_PASSWORD=Welcome321#_ \
-v /Users/nava/Documents/data/scripts:/mnt/scripts \
--cap-add SYS_ADMIN \
--device /dev/fuse \
--volume adb_container_volume:/Users/nava/Documents/data \
--name adb-free \
container-registry.oracle.com/database/free:latest

docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
701c13cef9df container-registry.oracle.com/database/free:latest "/bin/bash -c $ORACL…" 44 hours ago Up 44 hours (healthy) 0.0.0.0:1521-1522->1521-1522/tcp, [::]:1521-1522->1521-1522/tcp, 0.0.0.0:8443->8443/tcp, [::]:8443->8443/tcp, 0.0.0.0:27017->27017/tcp, [::]:27017->27017/tcp adb-free

If the container does not start, review the logs by running docker logs -f adb-free and take action as needed. Once it is up, Oracle Database 23ai is running. Let's peek inside.

Step 2: Oracle Internals — Say Hello to FREEPDB1

First, get a shell inside the running container, optionally change the passwords, and validate the connection. You should see FREEPDB1 - READ WRITE.

docker exec -it adb-free /bin/bash

# Optionally change the password for SYS/SYSTEM and PDBADMIN in the PDB
cd /home/oracle
./setPassword.sh Welcome321#_

# Validate the connection:
sqlplus / as sysdba
SHOW PDBS;
SELECT name, open_mode FROM v$pdbs;

-- You should see: FREEPDB1 - READ WRITE
-- Also check parameters:
SHOW PARAMETER vector_memory_size;

-- If 0, you’ll need to bounce the DB and set:
ALTER SYSTEM SET VECTOR_MEMORY_SIZE = 1G SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;

# Log in to the PDB as SYS
sqlplus sys/Welcome321#_@localhost:1521/FREEPDB1 as sysdba

Step 3: Create Vector-Ready User

Create a dedicated user and tablespace in PDB.

CREATE BIGFILE TABLESPACE TBS_VECTOR DATAFILE '/opt/oracle/oradata/FREE/FREEPDB1/vector01.dbf' SIZE 256M AUTOEXTEND ON MAXSIZE 2G;

CREATE USER vector_user IDENTIFIED BY "Oracle_4U" DEFAULT TABLESPACE TBS_VECTOR TEMPORARY TABLESPACE TEMP QUOTA UNLIMITED ON TBS_VECTOR;

GRANT DB_DEVELOPER_ROLE TO vector_user;
GRANT CREATE MINING MODEL TO vector_user;

Check vector_user connectivity from your local machine; you should see Connected!

With the container running, open a Python REPL on your local machine (install the python-oracledb driver first with pip install oracledb if needed). This drops you into interactive mode at the >>> prompt.

python3 
# Then:
import oracledb
conn = oracledb.connect(user="vector_user", password="Oracle_4U", dsn="localhost:1521/FREEPDB1")
print("Connected!")

Step 4: Load the ONNX Model for Embedding

Load Oracle's pre-built embedding-generation model for Oracle Database 23ai into your FREEPDB1 instance.

Download the ONNX model and place it in /mnt/scripts.

curl -O https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
unzip all_MiniLM_L12_v2_augmented.zip

Log in to the PDB as SYS, create the directory, then load the model as vector_user.


-- In the PDB as sys:
CREATE OR REPLACE DIRECTORY DM_DUMP AS '/mnt/scripts';
GRANT READ, WRITE ON DIRECTORY DM_DUMP TO vector_user;

# Load the model as vector_user:
sqlplus VECTOR_USER/Oracle_4U@FREEPDB1

-- Optionally drop the model first if a model with the same name already exists in the database.
exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'ALL_MINILM_L12_V2', force => true);

BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'DM_DUMP',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2'
  );
END;
/

-- Verify:
col model_name format a20
col mining_function format a12
col algorithm format a12
col attribute_name format a20
col data_type format a20
col vector_info format a30
col attribute_type format a20
set lines 120

SELECT model_name, mining_function, algorithm,
algorithm_type, model_size
FROM user_mining_models
WHERE model_name = 'ALL_MINILM_L12_V2'
ORDER BY model_name;

MODEL_NAME MINING_FUNCT ALGORITHM ALGORITHM_ MODEL_SIZE
-------------------- ------------ ------------ ---------- ----------
ALL_MINILM_L12_V2 EMBEDDING ONNX NATIVE 133322334

SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'ALL_MINILM_L12_V2'
ORDER BY attribute_name;

MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- -------------------- -------------------- ------------------------------
ALL_MINILM_L12_V2 DATA TEXT VARCHAR2
ALL_MINILM_L12_V2 ORA$ONNXTARGET VECTOR VECTOR VECTOR(384,FLOAT32)

Step 5: Store and Embed Text in Oracle

Generate embedding vectors using the VECTOR_EMBEDDING SQL scoring function.

-- Create table and embed strings:
CREATE TABLE my_table (id INT, my_data VARCHAR2(4000), v VECTOR);
INSERT INTO my_table VALUES (1, 'Nava – The Lazy King', NULL);

-- Now vectorize it:
-- Single row:
BEGIN
  UPDATE my_table
  SET v = (
    SELECT TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING my_data AS data))
    FROM DUAL
  )
  WHERE id = 1;
END;
/

-- Whole table:
BEGIN
  FOR rec IN (SELECT * FROM my_table) LOOP
    UPDATE my_table SET v = (
      SELECT TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING my_data AS data))
      FROM DUAL
    ) WHERE id = rec.id;
  END LOOP;
END;
/

-- Search similar entries:
SELECT my_data FROM my_table
ORDER BY vector_distance(
TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'who is lazy' AS data)),
v
)
FETCH FIRST 1 ROWS ONLY;

MY_DATA
--------------------------------------------------------------------------------
Nava – The Lazy King

Step 6: PDF to Oracle — Embedding Full Docs

Create a PDF from a Word document with your desired content and place it in the mounted directory /Users/nava/Documents/data/scripts. Start with a one-pager, sized to your system resources.

cd /mnt/scripts
ls -lrt
-rw-r--r-- 1 501 games 107363 Jun 12 12:30 Input.pdf

-- Create table for PDF:
CREATE TABLE pdf_tab (id NUMBER, data BLOB);
INSERT INTO pdf_tab VALUES (1, TO_BLOB(BFILENAME('DM_DUMP', 'Input.pdf')));

-- Chunk & embed:
INSERT INTO my_table
SELECT chunk_id, chunk_data, '[1]'
FROM pdf_tab dt,
dbms_vector_chain.utl_to_chunks(
dbms_vector_chain.utl_to_text(dt.data),
json('{"by":"words", "max":"30", "split":"recursively"}')
) t,
JSON_TABLE(t.column_value, '$[*]' COLUMNS (
chunk_id NUMBER PATH '$.chunk_id',
chunk_data VARCHAR2(4000) PATH '$.chunk_data'
)) et;

This can get tricky. Review the Oracle documentation and adjust the dbms_vector_chain.utl_to_chunks parameters to match your chunking requirements; a quick way to experiment is sketched below.
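
You can preview the chunker's output from Python before committing to an insert. This is a sketch of my own, not from the original post: it assumes the pdf_tab row above and the vector_user connection from Step 3, and the JSON parameter values ("by": "chars", "max": "250", "split": "sentence") are just examples to tune per the DBMS_VECTOR_CHAIN documentation.

import oracledb

oracledb.defaults.fetch_lobs = False  # return the CLOB chunks as plain strings

conn = oracledb.connect(user="vector_user", password="Oracle_4U",
                        dsn="localhost:1521/FREEPDB1")
cur = conn.cursor()

# Preview the first few chunks with a character-based, sentence-aware split.
cur.execute("""
    SELECT t.column_value
    FROM pdf_tab dt,
         dbms_vector_chain.utl_to_chunks(
             dbms_vector_chain.utl_to_text(dt.data),
             json('{"by":"chars", "max":"250", "split":"sentence"}')
         ) t
    WHERE dt.id = 1
    FETCH FIRST 5 ROWS ONLY""")
for (chunk,) in cur:
    print(chunk)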

Query my_table to confirm the chunks landed; I have inserted an anomaly at chunk #10 as a grounding test for later.

Step 7: Create a Vector Index for Fast Search

To enable efficient similarity search over vector embeddings, you’ll need to create a vector index. Oracle 23ai supports HNSW (Hierarchical Navigable Small World) indexing for fast approximate nearest neighbor searches.

Before that, make sure your database has enough memory allocated for vector operations. The statement below tells Oracle to build an in-memory HNSW index optimized for cosine distance with a target accuracy of 95%, which dramatically speeds up vector queries on large datasets.

-- If vector memory is low:
ALTER SYSTEM SET VECTOR_MEMORY_SIZE = 1G SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;

-- Then:
CREATE VECTOR INDEX docs_hnsw_idx ON my_table(v)
ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE COSINE WITH TARGET ACCURACY 95;
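
To see the index at work, you can run an approximate similarity search from Python. This small sketch is my addition rather than the author's code, assuming the vector_user connection and the my_table data from earlier steps; the APPROX keyword lets Oracle answer from the HNSW index instead of an exact scan:

import oracledb

conn = oracledb.connect(user="vector_user", password="Oracle_4U",
                        dsn="localhost:1521/FREEPDB1")
cur = conn.cursor()

# APPROX permits an approximate nearest-neighbor answer served by the
# in-memory HNSW index created above.
cur.execute("""
    SELECT my_data FROM my_table
    ORDER BY vector_distance(
        TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :q AS data)),
        v, COSINE)
    FETCH APPROX FIRST 3 ROWS ONLY""", q="who is lazy")
for (text,) in cur:
    print(text)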

Step 8: Set Up IBM Granite LLM with Ollama

To handle the generation part of RAG, we’ll run a local LLM — in this case, IBM’s Granite Code 8B model — using Ollama, a simple tool that lets you run models on your machine with zero setup overhead. You can customize how the model behaves using a Modelfile. Here’s an example that gives your LLM a system persona and sets some useful parameters:

brew install ollama
ollama serve &
ollama pull granite-code:8b

# Create a file named Modelfile with this content:
FROM granite-code:8b
PARAMETER temperature 1
PARAMETER num_ctx 8192
SYSTEM You are Nava, an AI assistant using Oracle DB as your brain.

# Build the custom model, then run it:
ollama create autopilot -f Modelfile
ollama run autopilot

The LLM is now running locally.
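
Your backend will need to talk to this model programmatically. Here is a minimal sketch, assuming Ollama's default port 11434 and the autopilot model built above; it uses Ollama's standard /api/generate endpoint:

import json
import urllib.request

def ask_llm(prompt: str, model: str = "autopilot") -> str:
    # POST to the local Ollama server; stream=False returns one JSON object
    # whose "response" field holds the full generated text.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_llm("Say hello in one sentence."))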

Step 9: Web App with FastAPI and HTML

The app has two parts; create both files, then start the app.

Part 1: Backend — FastAPI app.py. Build a simple web-based UI similar to ChatGPT or Gemini. Since Oracle Database 23ai has a built-in VECTOR_EMBEDDING() function, you do not need to handle tokenization, ONNX model inference, or any manual embedding computation in Python. Oracle does all of that internally using its registered model (e.g., ALL_MINILM_L12_V2).
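
Here is a minimal sketch of what app.py could look like. This is my illustration rather than the author's exact file; it assumes my_table from Steps 5-6, the registered ALL_MINILM_L12_V2 model, the vector_user credentials, Ollama on its default port 11434, and an /ask route name of my own choosing:

import json
import urllib.request

import oracledb
from fastapi import FastAPI
from fastapi.responses import FileResponse
from pydantic import BaseModel

app = FastAPI()
conn = oracledb.connect(user="vector_user", password="Oracle_4U",
                        dsn="localhost:1521/FREEPDB1")

class Question(BaseModel):
    text: str

@app.get("/")
def home():
    # Serve the chat UI built in Part 2.
    return FileResponse("index.html")

@app.post("/ask")
def ask(q: Question):
    # Retrieval: Oracle embeds the question in-database and returns the
    # closest chunks by vector distance.
    cur = conn.cursor()
    cur.execute("""
        SELECT my_data FROM my_table
        ORDER BY vector_distance(
            TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :q AS data)), v)
        FETCH FIRST 3 ROWS ONLY""", q=q.text)
    context = "\n".join(row[0] for row in cur)

    # Generation: hand the retrieved context to the local Granite model.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {q.text}"
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": "autopilot", "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return {"answer": json.loads(resp.read())["response"]}

The frontend then simply POSTs the user's question to /ask and renders the returned answer.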

Part 2: Frontend — index.html, a simple HTML + JS chat UI that posts questions to the backend and displays the answers.

# Run the app from the directory where app.py exists:
cd /mnt/scripts
uvicorn app:app --reload --port 8000

Wrapping Up: Why This RAG Stack Matters

By now, you’ve got a full RAG pipeline running entirely on your Mac:

✅ Oracle 23ai with vector embeddings + ONNX support

✅ IBM Granite LLM via Ollama

✅ FastAPI backend and a clean chatbot UI

✅ All local — no cloud APIs, no vendor lock-in

This isn’t just a cool demo — it’s a blueprint for:

🔒 Building privacy-first, data governance–compliant LLM apps

💻 Exploring RAG locally without relying on external APIs

🧠 Understanding how databases and language models can work together

Just you, your Mac, and your curiosity — for free. No RAGrets.

Things to Try

More importantly: This is your permission slip to explore.

🌟 Ask questions using the chat UI — specific to your uploaded PDF. Try embedding something unique that doesn’t exist on the internet, and see if it can retrieve it correctly. That’s how you test grounding.

🌟 Simulate failure — stop one component (like the DB, the app, or the LLM) and observe how the system responds. This helps you understand the role and resilience of each piece in the pipeline.

🌟 Experiment with different models — try swapping Granite with Mistral or LLaMA2, and compare how they interpret the same vector context.

🌟 Scale it — this architecture is already being used by enterprises internally to stay compliant with data governance and run LLMs within secure boundaries. 
