
Build RAG with Milvus and Ollama

Ollama is an open-source platform that simplifies running and customizing large language models (LLMs) locally. It provides a user-friendly, cloud-free experience, enabling effortless model downloads, installation, and interaction without requiring advanced technical skills. With a growing library of pre-trained LLMs, from general-purpose to domain-specific, Ollama makes it easy to manage and customize models for various applications. It ensures data privacy and flexibility, empowering users to fine-tune, optimize, and deploy AI-driven solutions entirely on their own machines.

In this guide, we'll show you how to leverage Ollama and Milvus to build a RAG (Retrieval-Augmented Generation) pipeline efficiently and securely.

Preparation

Dependencies and environment

$ pip install pymilvus ollama

If you are using Google Colab, to enable the dependencies just installed, you may need to restart the runtime (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).

Prepare the data

We use the FAQ pages from the Milvus Documentation 2.4.x as the private knowledge in our RAG, which is a good data source for a simple RAG pipeline.

Download the zip file and extract the documents to the folder milvus_docs.

$ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
$ unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs

--2024-11-26 21:47:19--  https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/... [following]
--2024-11-26 21:47:20--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/...
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 613094 (599K) [application/octet-stream]
Saving to: 'milvus_docs_2.4.x_en.zip'

milvus_docs_2.4.x_e 100%[===================>] 598.72K 1.20MB/s in 0.5s

2024-11-26 21:47:20 (1.20 MB/s) - 'milvus_docs_2.4.x_en.zip' saved [613094/613094]

We load all markdown files from the folder milvus_docs/en/faq. For each document, we simply use "# " to separate the content in the file, which roughly separates the content of each main part of the markdown file.

from glob import glob

text_lines = []

for file_path in glob("milvus_docs/en/faq/*.md", recursive=True):
    with open(file_path, "r") as file:
        file_text = file.read()

    text_lines += file_text.split("# ")
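
As an optional sanity check (not part of the original tutorial), you can inspect how many chunks the split produced and preview one of them:

# Hypothetical quick check: count the chunks and preview the start of one of them.
print(len(text_lines))
print(text_lines[1][:200])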

Prepare the LLM and embedding model

Ollama supports a variety of models for both LLM-based tasks and embedding generation, making it easy to develop retrieval-augmented generation (RAG) applications. For this setup:

  • We will use Llama 3.2 (3B) as our LLM for text generation tasks.
  • For embedding generation, we will use mxbai-embed-large, a 334M-parameter model optimized for semantic similarity.

Before starting, make sure both models are pulled locally:

! ollama pull mxbai-embed-large

pulling manifest
pulling 819c2adf5ce6... 100% ▕████████████████▏ 669 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling b837481ff855... 100% ▕████████████████▏  16 B
pulling 38badd946f91... 100% ▕████████████████▏ 408 B
verifying sha256 digest
writing manifest
success

! ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏  96 B
pulling 34bb5ab01051... 100% ▕████████████████▏ 561 B
verifying sha256 digest
writing manifest
success
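
Optionally, you can list the models available locally to confirm that both downloads succeeded; this is an extra check rather than a step from the original tutorial.

! ollama list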

With these models ready, we can proceed to implement the LLM-driven generation and embedding-based retrieval workflows.

import ollama


def emb_text(text):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=text)
    return response["embedding"]

Generate a test embedding and print its dimension and first few elements.

test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

1024
[0.23276396095752716, 0.4257211685180664, 0.19724100828170776, 0.46120673418045044, -0.46039995551109314, -0.1413791924715042, -0.18261606991291046, -0.07602324336767197, 0.39991313219070435, 0.8337644338607788]

Load data into Milvus

Create the collection

from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"

As for the argument of MilvusClient:

  • Setting the uri as a local file, e.g. ./milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file.
  • If you have a large scale of data, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, please use the server uri, e.g. http://localhost:19530, as your uri.
  • If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token, which correspond to the Public Endpoint and API key in Zilliz Cloud. Both alternatives are sketched below.
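
For reference, here is a minimal sketch of what the two alternative connection setups might look like; the endpoint and token values are placeholders, not real credentials.

from pymilvus import MilvusClient

# Self-hosted Milvus server (Docker/Kubernetes); placeholder address.
# milvus_client = MilvusClient(uri="http://localhost:19530")

# Zilliz Cloud; both values below are hypothetical placeholders.
# milvus_client = MilvusClient(
#     uri="https://<your-public-endpoint>",
#     token="<your-api-key>",
# )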

Check if the collection already exists and drop it if it does.

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Create a new collection with the specified parameters.

If we don't specify any field information, Milvus will automatically create a default id field for the primary key and a vector field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Supported values are (`"Strong"`, `"Session"`, `"Bounded"`, `"Eventually"`). See https://milvus.io/docs/consistency.md#Consistency-Level for details.
)
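
If you want to confirm what was created, you can describe the collection; the exact contents of the returned dictionary depend on your pymilvus version, so treat this as an optional sketch.

# Inspect the auto-generated schema: a default "id" primary key and a "vector" field,
# with dynamic fields enabled for non-schema-defined data.
print(milvus_client.describe_collection(collection_name))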

Insert data

Iterate through the text lines, create embeddings, and then insert the data into Milvus.

Here is a new field text, which is a non-defined field in the collection schema. It will be automatically added to the reserved JSON dynamic field, which can be treated as a normal field at a high level.

from tqdm import tqdm

data = []

text_embeddings = [emb_text(line) for line in tqdm(text_lines, desc="Creating embeddings")]

for i, (line, embedding) in enumerate(zip(text_lines, text_embeddings)):
    data.append({"id": i, "vector": embedding, "text": line})

milvus_client.insert(collection_name=collection_name, data=data)

Creating embeddings: 100%|██████████| 72/72 [03:22<00:00, 2.81s/it]

{'insert_count': 72, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'cost': 0}
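
To see that the dynamic text field behaves like a normal field, you can optionally query an entity back by its primary key (an extra check, not part of the original tutorial):

# Fetch one inserted entity by primary key; "text" is served from the dynamic JSON field.
res = milvus_client.query(
    collection_name=collection_name,
    filter="id == 0",
    output_fields=["text"],
)
print(res[0]["text"][:200])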

Build RAG

Retrieve data for a query

Let's specify a frequent question about Milvus.

question = "How is data stored in milvus?"

Search for the question in the collection and retrieve the semantic top-3 matches.

search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question)],  # Convert the question to an embedding vector
    limit=3,  # Return the top 3 results
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)

Let's take a look at the search results of the query.

import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=2))

[ [ "# How is data stored in Milvus?\n\nMilvus supports multiple storage engines, allowing you to choose the one that best fits your use case:\n\n- Faiss: Primarily used for CPU-based vector similarity search. It's efficient for smaller datasets and provides various indexing algorithms.\n\n- Annoy: Suitable for memory-constrained environments. It builds tree-based indexes that are compact and fast for approximate nearest neighbor search.\n\n- HNSW (Hierarchical Navigable Small World): Great for high-dimensional data and provides a good balance between search accuracy and speed.\n\n- IVF (Inverted File): Suitable for large-scale data. It partitions vectors into clusters, making search more efficient.\n\n- RHNSW (Refined HNSW): An optimized version of HNSW, providing better performance for high-dimensional vector search.\n\nMilvus also supports scalar data storage alongside vector data, enabling hybrid queries that combine vector similarity search with traditional filtering.", 0.8488019108772278 ], [ "# What is the maximum dataset size Milvus can handle?\n\nMilvus is designed to handle large-scale vector data, but the maximum dataset size depends on several factors:\n\n- Hardware Resources: The amount of available memory, storage, and computational power.\n- Data Characteristics: The dimensionality of vectors, the number of vectors, and the complexity of queries.\n- Configuration: Milvus settings, such as indexing methods and memory allocation.\n\nIn practice, Milvus can handle datasets ranging from millions to billions of vectors. For example:\n- A cluster with sufficient resources can manage billions of high-dimensional vectors.\n- Proper indexing and configuration are crucial for optimal performance with large datasets.\n\nFor specific use cases, it's recommended to conduct performance testing with your data to determine the optimal setup.", 0.5329589247703552 ], [ "# How does Milvus handle data consistency?\n\nMilvus provides configurable consistency levels to balance between consistency and performance:\n\n- Strong: Guarantees that all reads receive the most recent write. Queries are performed on the most up-to-date data view.\n\n- Session: Ensures read-your-writes consistency within the same session. A client can always read data that it has written.\n\n- Bounded Staleness: Allows reads to be slightly behind the most recent writes, but within a bounded time window.\n\n- Eventually: Provides the weakest consistency level. Reads might not immediately reflect recent writes, but the system will eventually converge.\n\nYou can specify the consistency level when creating a collection or performing queries, depending on your application's requirements.", 0.4895109236240387 ] ]

Use the LLM to generate a response

Convert the retrieved documents into a string format.

context = "\n".join([line_with_distance[0] for line_with_distance in retrieved_lines_with_distances])

Define the system and user prompts for Ollama. This prompt is assembled with the retrieved documents.

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are given a user question, and please write clean, concise and accurate answer to the question. You will be given a set of related contexts to the question, please answer based on the context. Please say "information is not available" if the question cannot be answered based on the context.
"""

USER_PROMPT = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

Use Ollama to generate a response based on the prompts.

response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": USER_PROMPT.format(context=context, question=question),
        },
    ],
)

print(response["message"]["content"])

Based on the provided context, here's how data is stored in Milvus:

Milvus supports multiple storage engines, allowing you to choose the one that best fits your use case:

  1. Faiss: Primarily used for CPU-based vector similarity search. It's efficient for smaller datasets and provides various indexing algorithms.

  2. Annoy: Suitable for memory-constrained environments. It builds tree-based indexes that are compact and fast for approximate nearest neighbor search.

  3. HNSW (Hierarchical Navigable Small World): Great for high-dimensional data and provides a good balance between search accuracy and speed.

  4. IVF (Inverted File): Suitable for large-scale data. It partitions vectors into clusters, making search more efficient.

  5. RHNSW (Refined HNSW): An optimized version of HNSW, providing better performance for high-dimensional vector search.

Additionally, Milvus supports scalar data storage alongside vector data, enabling hybrid queries that combine vector similarity search with traditional filtering.

The choice of storage engine depends on factors like dataset size, memory constraints, search accuracy requirements, and performance needs.

Great! We have built a RAG pipeline with Milvus and Ollama.
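
If you want to reuse the pipeline, the retrieval and generation steps above can be wrapped into a single helper. The sketch below simply restates those steps under the same assumptions (local Ollama with llama3.2 and mxbai-embed-large, and the collection built earlier); the function name is our own, not part of the original tutorial.

def answer_question(question: str, top_k: int = 3) -> str:
    # Retrieve the top-k most similar chunks from Milvus.
    search_res = milvus_client.search(
        collection_name=collection_name,
        data=[emb_text(question)],
        limit=top_k,
        search_params={"metric_type": "IP", "params": {}},
        output_fields=["text"],
    )
    context = "\n".join(res["entity"]["text"] for res in search_res[0])

    # Ask the local LLM to answer based only on the retrieved context.
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": USER_PROMPT.format(context=context, question=question),
            },
        ],
    )
    return response["message"]["content"]


print(answer_question("How is data stored in milvus?"))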

Quick deploy

To learn how to start an online demo with this tutorial, please refer to the example application.