Commit 1069dbe: Add LLM-KG integration examples

src/content/docs/guide/KG_llms.md
---
title: Knowledge Graph Applications with LLMs
description: Examples of integrating Knowledge Graphs with LLMs using Virtuoso, Apache Jena, and other tools.
sidebar:
  order: 3
---
Knowledge Graphs (KGs) combined with Large Language Models (LLMs) offer powerful solutions for data-driven applications. This guide showcases practical examples of integrating LLMs with Knowledge Graphs using tools such as Virtuoso and Apache Jena.

---

## Example 1: Querying Knowledge Graphs with LLMs

<details>

### Overview

This example demonstrates how to query a Virtuoso Knowledge Graph with a Large Language Model (LLM) to retrieve meaningful insights from structured data. The core idea is to bridge the gap between natural-language queries and RDF data stored in Virtuoso. The integration leverages `llama_index`, an interface that connects LLMs to structured data sources such as SPARQL endpoints.

---

### Prerequisites

#### System Requirements

- **Python 3.x** installed.
- **Virtuoso Server** running with SPARQL authentication enabled.

#### Required Installations

1. **Uninstall any existing LlamaIndex:**

   ```bash
   pip uninstall llama_index -y
   ```

2. **Install OpenLink's fork of LlamaIndex:**

   ```bash
   pip install git+https://github.com/OpenLinkSoftware/llama_index
   ```

3. **Set your OpenAI API key:**

   ```bash
   export OPENAI_API_KEY=your_openai_api_key_here
   ```

4. **Create a directory for graph data storage:**

   ```bash
   mkdir llama_storage_graph
   ```
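
Before running the example, it can help to confirm that the key and the storage directory are in place. A minimal stdlib sketch (the function name `check_setup` is illustrative, not part of LlamaIndex):

```python
import os

def check_setup(env, storage_dir):
    """Return a list of human-readable problems with the local setup."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not os.path.isdir(storage_dir):
        problems.append(f"missing storage directory: {storage_dir}")
    return problems

# A complete environment reports no problems
print(check_setup({"OPENAI_API_KEY": "sk-..."}, "."))  # []
```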
---

### Configuration

#### SPARQL Endpoint Details

Update the following connection details in your Python script:

```python
ENDPOINT = 'http://localhost:8890/sparql-auth/'
GRAPH = 'http://purl.org/stuff/guardians'
BASE_URI = 'http://purl.org/stuff/data'
USER = 'dba'
PASSWORD = 'dba'
```
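
Before wiring these values into LlamaIndex, a quick sanity check on the endpoint URL can catch typos early. A sketch using only the standard library (`validate_endpoint` is an illustrative helper, not part of any library):

```python
from urllib.parse import urlparse

def validate_endpoint(url):
    """Check that a SPARQL endpoint URL has an http(s) scheme, a host, and a path."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and bool(parts.path)

print(validate_endpoint("http://localhost:8890/sparql-auth/"))  # True
```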
#### OpenAI API Configuration

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```

---

### Full Python Code (`llama_test.py`)

```python
import os

import openai
from llama_index import KnowledgeGraphIndex, ServiceContext, load_index_from_storage
from llama_index.graph_stores import SparqlGraphStore
from llama_index.llms import OpenAI
from llama_index.storage.storage_context import StorageContext

# OpenAI API key
openai.api_key = os.environ["OPENAI_API_KEY"]

# Initialize the LLM
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)

# Virtuoso SPARQL endpoint configuration
ENDPOINT = 'http://localhost:8890/sparql-auth/'
GRAPH = 'http://purl.org/stuff/guardians'
BASE_URI = 'http://purl.org/stuff/data'
USER = 'dba'
PASSWORD = 'dba'

# Connect to the Virtuoso SPARQL graph store
graph_store = SparqlGraphStore(
    sparql_endpoint=ENDPOINT,
    sparql_graph=GRAPH,
    sparql_base_uri=BASE_URI,
    create_graph=False,
    user_name=USER,
    user_password=PASSWORD,
)

# Load the index from storage
storage_context = StorageContext.from_defaults(persist_dir='./llama_storage_graph', graph_store=graph_store)
kg_index = load_index_from_storage(
    storage_context=storage_context,
    service_context=service_context,
    max_triplets_per_chunk=10,
    sparql_endpoint=ENDPOINT,
    sparql_graph=GRAPH,
    sparql_base_uri=BASE_URI,
    include_embeddings=True,
    verbose=True,
)

# Query engine setup
kg_rag_query_engine = kg_index.as_query_engine(
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)

# Example query
response_graph_rag = kg_rag_query_engine.query("In the movie, what does Ken think about?")

# Display the response
print(str(response_graph_rag))
```

---

### Running the Code

Execute the script in your terminal:

```bash
python llama_test.py
```

---

### Expected Output

When the script runs, it should return an answer grounded in the Knowledge Graph:

```text
Ken thinks about his identity, purpose, and the meaning of life, reflecting on his role beyond just being a supporting character.
```

This response is generated from the RDF triples retrieved from the Virtuoso knowledge graph.

---

### Key Concepts

- **Virtuoso Integration:** The example connects to a Virtuoso SPARQL endpoint to query RDF data.
- **LLM Query Processing:** The LLM adds natural-language understanding, so queries can be asked in plain English.
- **Knowledge Graph Indexing:** The Knowledge Graph Index improves retrieval efficiency by organizing data into meaningful chunks.
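
As a simplified illustration of what `chunk_size` controls (the real LlamaIndex splitter works on tokens and adds overlap, so this is only a sketch):

```python
def chunk(text, size=512):
    """Split text into fixed-size character chunks (a simplified stand-in for chunk_size)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

print(len(chunk("x" * 1200)))  # 3 chunks of up to 512 characters
```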
---

### Troubleshooting Tips

- **Connection errors:** Ensure Virtuoso is running and reachable at the configured SPARQL endpoint.
- **Authentication issues:** Verify that the configured `USER` and `PASSWORD` have the necessary SPARQL access rights.
- **API key errors:** Confirm that the OpenAI API key is correctly set in the environment variables.

---

### Expanding the Dataset

You can pose further natural-language queries to explore more of the data. For example:

```python
response_graph_rag = kg_rag_query_engine.query("Who is Barbie?")
print(str(response_graph_rag))
```

#### Expected Output

```text
Barbie is a character who thinks about becoming human and living in the real world. She also contemplates what it means to be human.
```

</details>

## Example 2: Extracting Triples from Text Using LLMs

<details>

### Overview

This example shows how to automatically extract structured knowledge, in the form of (subject, relation, object) triples, from unstructured text using a Large Language Model (LLM). The goal is to transform plain text into a format suitable for building knowledge graphs, which can later be queried with SPARQL or loaded into systems like Virtuoso or Apache Jena.

### Full Python Code (`kg_generator.py`)

```python
import csv
import os

from openai import OpenAI

# Set up the OpenAI client using the API key from the environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample text for extracting entities and relationships
text = """
Barack Obama was born in Hawaii. He was the 44th President of the United States.
Michelle Obama is his wife. They have two daughters, Malia and Sasha.
"""

# Function to extract entities and relationships
def extract_entities_relations(text):
    prompt = f"""
Extract entities and relationships from the following text in the form of triples (subject, relation, object):

Text: {text}

Format:
(Subject, Relation, Object)
"""

    # Chat Completions API call using the instantiated client
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an assistant that extracts entities and relationships from text."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.5,
        max_tokens=200
    )

    return response.choices[0].message.content.strip()

# Extract entities and relationships from the sample text
extracted_triples = extract_entities_relations(text)

# Display the extracted triples
print("Extracted Triples:")
print(extracted_triples)

# Save triples to a CSV file, skipping any lines that are not "(...)" triples
csv_filename = "extracted_triples.csv"
with open(csv_filename, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(["Subject", "Predicate", "Object"])  # CSV header
    for line in extracted_triples.split('\n'):
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            writer.writerow(line.strip("()").split(", "))

print(f"\nTriples successfully saved to {csv_filename}")
```
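
The `strip("()").split(", ")` parsing above is deliberately simple; a stricter, reusable parser that ignores non-triple lines (such as the `Extracted Triples:` header) might look like this sketch (`parse_triple` is an illustrative helper, and it assumes the subject itself contains no comma):

```python
import re

# Matches lines of the form "(Subject, Relation, Object)"
TRIPLE_RE = re.compile(r"^\((.+?),\s*(.+?),\s*(.+)\)$")

def parse_triple(line):
    """Return a (subject, relation, object) tuple, or None for non-triple lines."""
    match = TRIPLE_RE.match(line.strip())
    return match.groups() if match else None

print(parse_triple("(Barack Obama, was born in, Hawaii)"))
print(parse_triple("Extracted Triples:"))  # None
```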

---

### Running the Code

Execute the script:

```bash
python kg_generator.py
```

---

### Expected Output

```text
Extracted Triples:
(Barack Obama, was born in, Hawaii)
(Barack Obama, was, 44th President of the United States)
(Michelle Obama, is wife of, Barack Obama)
(Barack Obama, has daughters, Malia)
(Barack Obama, has daughters, Sasha)

Triples successfully saved to extracted_triples.csv
```

---

### CSV Output

```csv
Subject,Predicate,Object
Barack Obama,was born in,Hawaii
Barack Obama,was,44th President of the United States
Michelle Obama,is wife of,Barack Obama
Barack Obama,has daughters,Malia
Barack Obama,has daughters,Sasha
```

This file can now be imported into Virtuoso or Apache Jena as part of a knowledge graph.
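
One way to prepare the CSV for an RDF store is to serialize each row as N-Triples first. The sketch below mints URIs under a made-up namespace (`http://example.org/kg/`, purely illustrative) and keeps objects as plain literals; a real pipeline would map predicates to an ontology instead:

```python
from urllib.parse import quote

NS = "http://example.org/kg/"  # illustrative namespace, not from the dataset

def to_uri(label):
    """Mint a URI by underscoring and percent-encoding a label (naive scheme)."""
    return f"<{NS}{quote(label.replace(' ', '_'))}>"

def rows_to_ntriples(rows):
    """Serialize (subject, predicate, object) rows as N-Triples with literal objects."""
    return "\n".join(f'{to_uri(s)} {to_uri(p)} "{o}" .' for s, p, o in rows)

print(rows_to_ntriples([("Barack Obama", "was born in", "Hawaii")]))
```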

---

</details>

## Summary

Using Virtuoso with LlamaIndex for retrieval-augmented generation (RAG) significantly improves the quality and reliability of LLM-generated content by grounding responses in factual data. This approach is particularly effective for minimizing hallucinations in knowledge-driven applications.