Commit 7e54ce6

Implement QnA fine-tune
0 parents  commit 7e54ce6

5 files changed: +54 -0 lines changed

.gitignore

+1
@@ -0,0 +1 @@
.env

documents/amaterasu.txt

+1
@@ -0,0 +1 @@
Amaterasu, also known as Amaterasu Omikami or Ohirume no Muchi no Kami, is the goddess of the sun in Japanese mythology. One of the major deities (kami) of Shinto, she is also portrayed in Japan's earliest literary texts, the Kojiki (c. 712 CE) and the Nihon Shoki (720 CE), as the ruler (or one of the rulers) of the heavenly realm Takamagahara and the mythical ancestress of the Imperial House of Japan via her grandson Ninigi. Along with her siblings, the moon deity Tsukuyomi and the impetuous storm god Susanoo, she is considered to be one of the "Three Precious Children" (mihashira no uzu no miko / sankishi), the three most important offspring of the creator god Izanagi.

documents/jiraiya.txt

+1
@@ -0,0 +1 @@
Jiraiya, originally known as Ogata Shuma Hiroyuki, is the toad-riding protagonist of the Japanese folk tale Katakiuchi Kidan Jiraiya Monogatari ("The Tale of the Gallant Jiraiya"). The tale was originally a Yomihon that was published in 1806–1807, and was adapted into a serialized novel that was written by different authors and published in 43 installments from 1839 to 1868; one of its illustrators was woodblock artist Kunisada. Kawatake Mokuami then wrote a kabuki drama based on the first ten parts of the novel, which premiered in Edo in 1852, starring Ichikawa Danjuro VIII in the leading role. Since then the story has been adapted into several films, video games, and manga, and has also influenced various other works.

documents/susanoo.txt

+1
@@ -0,0 +1 @@
Susanoo (historical orthography: 'Susanowo') is a kami in Japanese mythology. The younger brother of Amaterasu, goddess of the sun and mythical ancestress of the Japanese imperial line, he is a multifaceted deity with contradictory characteristics (both good and bad), being portrayed in various stories either as a wild, impetuous god associated with the sea and storms, as a heroic figure who killed a monstrous serpent, or as a local deity linked with the harvest and agriculture. Syncretic beliefs that arose after the introduction of Buddhism to Japan also saw Susanoo becoming conflated with deities of pestilence and disease.

qna.py

+50
@@ -0,0 +1,50 @@
import time
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain import OpenAI

def process_data():
    # Load every .txt file under documents/ (searched recursively).
    print("Getting data...")
    loader = DirectoryLoader("documents", glob="**/*.txt")
    documents = loader.load()
    print("Documents loaded.")

    # Split the documents into chunks of roughly 1000 characters, no overlap.
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    print("Text split.")

    print(texts)

    return texts

def get_embeddings(texts):
    # The embedding model is used by Chroma to vectorize each chunk.
    print("Getting embeddings...")
    embeddings = OpenAIEmbeddings()
    print("Embeddings loaded.")

    print("Creating vector store...")
    vector_store = Chroma.from_documents(texts, embeddings)
    print("Vector store created.")

    # The "stuff" chain type puts all retrieved chunks into a single prompt.
    qna = RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="stuff",
        retriever=vector_store.as_retriever()
    )

    return qna

def query(prompt, qna_retriever):
    print(f"User prompt: {prompt}")
    print(f"Answer: {qna_retriever.run(prompt)}")

if __name__ == '__main__':
    # Read environment variables (such as the API key) from the local .env file.
    load_dotenv()
    texts = process_data()
    qna_retriever = get_embeddings(texts)
    query("What did Ichikawa Danjuro star in?", qna_retriever)
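
The flow in qna.py (split, embed, retrieve, then "stuff" the retrieved text into one prompt) can be illustrated without an API key. The following is only a rough, dependency-free sketch of that idea, not how LangChain, Chroma, or the OpenAI embedding model work internally: word-count vectors stand in for learned embeddings, and the names `split_text`, `embed`, `cosine`, `retrieve`, and `build_stuff_prompt` are hypothetical helpers introduced here for illustration. The sample chunks are shortened excerpts of the three committed documents.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters; a single oversized piece is kept whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy 'embedding': lowercase word counts (assumption: a stand-in
    for the learned vectors a real embedding model would return)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_stuff_prompt(question, context_chunks):
    """'Stuff' all retrieved chunks into a single prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Shortened excerpts of the committed documents, one chunk each.
chunks = [
    "Amaterasu is the goddess of the sun in Japanese mythology.",
    "Kawatake Mokuami wrote a kabuki drama that premiered in Edo in 1852, "
    "starring Ichikawa Danjuro VIII in the leading role.",
    "Susanoo is a kami associated with the sea and storms.",
]
question = "What did Ichikawa Danjuro star in?"
top = retrieve(question, chunks, k=1)
prompt = build_stuff_prompt(question, top)
print(prompt)
```

In the real script, the prompt assembled in the last step would be sent to the language model. Here it is only printed, so the retrieval step can be inspected on its own.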
