RAG with LlamaIndex and Ollama

If you want to build a RAG system locally, you can use Ollama to serve the models and LlamaIndex to build the agent.

Since LlamaIndex defaults to OpenAI, we first need to override the default embedding model and the default LLM.

  Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1])
  Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0)

Replace base_url with the address of your own Ollama instance, such as http://localhost:11434. Here sdmicl is a pair holding the LLM model name (sdmicl[0]) and that base URL (sdmicl[1]).
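Before pointing LlamaIndex at an instance, it can help to confirm that the server is reachable and that the models you plan to use have been pulled. The sketch below is one way to do that with only the standard library. It assumes the instance exposes Ollama's `/api/tags` endpoint for listing local models; the helper names (`tags_url`, `model_names`, `list_models`) are made up for illustration and are not part of LlamaIndex or Ollama.

```python
import json
from urllib.request import urlopen


def tags_url(base_url: str) -> str:
    """Build the model-listing URL, tolerating a trailing slash in base_url."""
    return base_url.rstrip("/") + "/api/tags"


def model_names(payload: dict) -> list[str]:
    """Extract model names from a payload shaped like {"models": [{"name": ...}]}."""
    return [m["name"] for m in payload.get("models", [])]


def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Ask the Ollama instance which models are available locally.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urlopen(tags_url(base_url), timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `list_models("http://localhost:11434")` against a running instance should return the locally pulled models, so you can check that both the LLM and the embedding model are present before building the index.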

If every file in the directory is plain text (txt or md), you can load the data directly with SimpleDirectoryReader.

  # Create a RAG tool using LlamaIndex
  documents = SimpleDirectoryReader("data").load_data()
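The "all txt or md" condition is easy to check before loading. The following is a minimal sketch, not part of LlamaIndex: the helper `non_text_files` and the `TEXT_EXTS` set are hypothetical names, and the check only looks at file extensions.

```python
from pathlib import Path

# Extensions treated as plain text here (an assumption matching the txt/md case above)
TEXT_EXTS = {".txt", ".md"}


def non_text_files(data_dir: str) -> list[Path]:
    """Return files under data_dir whose extension is not txt or md, sorted by path."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in TEXT_EXTS
    )
```

If `non_text_files("data")` returns an empty list, the directory contains only txt and md files and matches the simple case described above.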

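import asyncio

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# (LLM model name, Ollama base URL). Both values are placeholders: use a
# tool-calling model you have pulled and the address of your own instance.
sdmicl = ("llama3.1", "http://localhost:11434")

# multiply was referenced but not defined in the original listing;
# this is a minimal stand-in.
def multiply(a: float, b: float) -> float:
  """Useful for multiplying two numbers."""
  return a * b

def get_agent(model_name: str):
  # Override the OpenAI defaults with models served by Ollama
  Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1])
  Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0)

  # Create a RAG tool using LlamaIndex
  documents = SimpleDirectoryReader("data").load_data()
  index = VectorStoreIndex.from_documents(documents)
  query_engine = index.as_query_engine()

  async def search_documents(query: str) -> str:
    """Useful for answering natural language questions about a personal essay written by Paul Graham."""
    # aquery is the awaitable counterpart of query
    response = await query_engine.aquery(query)
    return str(response)

  agent = FunctionAgent(
    name="Agent",
    description="Useful for multiplying two numbers and searching documents",
    tools=[multiply, search_documents],
    llm=Settings.llm,  # reuse the Ollama LLM configured above
    system_prompt="You are a helpful assistant that can multiply two numbers and search documents to answer questions",
  )
  return agent

async def main():
  # Embedding models to compare; the index is rebuilt for each one
  models = ('bge-m3', 'nomic-embed-text')

  for model_name in models:
    print(f'model: {model_name}')
    agent = get_agent(model_name=model_name)
    response = await agent.run("What did Paul Graham do in college? Also, what's 7 * 8?")
    print(str(response))
    print("Done.")
    print('-' * 100)

# In a notebook, call `await main()` instead
if __name__ == "__main__":
  asyncio.run(main())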