"""
Example of Search Graph
"""
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SearchGraph
from china_unis import universities
os.environ.clear()
load_dotenv()
# ************************************************
# Define the configuration for the graph
# ************************************************
openai_key = os.getenv("OPENAI_API_KEY")
graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o-2024-08-06",
    },
    "max_results": 2,
    "verbose": True,
}
prompt = f"""
Get me the contact email addresses of the following universities:
{universities[:10]}
"""
# ************************************************
# Create the SearchGraph instance and run it
# ************************************************
search_graph = SearchGraph(
    prompt=prompt, config=graph_config
)
result = search_graph.run()
print(result)
# Save results to both JSON and TXT formats for flexibility
import json
from pathlib import Path
from datetime import datetime
# Create output directory if it doesn't exist
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)
# Generate timestamp for unique filenames
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# Save as JSON
json_path = output_dir / f"university_contacts_{timestamp}.json"
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)
# Save as TXT
txt_path = output_dir / f"university_contacts_{timestamp}.txt"
with open(txt_path, "w", encoding="utf-8") as f:
    f.write(str(result))
print("\nResults saved to:")
print(f"JSON: {json_path}")
print(f"TXT: {txt_path}")
input:
universities = [
"Beijing Foreign Studies University",
"Beijing Jiaotong University",
"Beijing Language and Culture University",
"Beijing Radio and Television University",
"Beijing University of Chinese Medicine",
"Beijing University of Posts and Telecommunications",
"Central China Normal University",
"Chong Qing University",
"Donghua University",
"East China Normal University",
"Harbin Engineering University",
"Harbin Institute of Technology Shenzhen Graduate School",
"Henan University",
"Hubei University",
"Jiangxi Normal University",
"Jilin University",
"Nanjing University",
"Ningbo University",
"Northeast Normal University",
"Northwest University",
"Northwestern Polytechnical University",
"Ocean University of China",
"Peking University",
"Renmin University of China",
"Shaanxi Normal University",
"Shanghai International Studies University",
"Shanghai Normal University",
"Shanghai University",
"Shanghai University of Traditional Chinese Medicine",
"Sichuan Normal University",
"Sichuan University",
"Sun Yat-sen University",
"The Central Academy of Drama",
"Tianjin University",
"Tianjin University of Finance and Economics",
"Tsinghua University",
"Wuhan University",
"Yanbian University",
"Yangzhou University",
"Zhejiang University",
"Zhongnan University of Economics and Law",
"Zhuhai College of Jilin University",
"Shanghai University",
"Sichuan Normal University",
"Chong Qing University",
"Shanghai University of Finance & Economics",
"Beijing Institute of Technology",
"North China University of Technology",
"Beijing University of Chemical Technology",
"Shantou University",
"China Medical University",
"Chinese Culture University",
"Dharma Drum Buddhist College",
"Feng Chia University",
"Fo Guang University",
"Nanhua University",
"National Central University",
"National Cheng Kung University",
"National Chengchi University",
"National Taipei University",
"National Taipei University of Technology",
"National Taiwan Normal University",
"National Taiwan University",
"Shih Chien University",
"Tatung University",
"Tzu Chi University",
"Chung Yuan Christian University",
"Southern Taiwan University of Science and Technology",
"National Taiwan University",
"National University of Kaohsiung",
"Asia University",
"University of Taipei",
"Lingnan University",
"The Hong Kong Institute of Education"
]
gets me only:
{
  "Beijing Foreign Studies University": [
    "[email protected]",
    "[email protected]"
  ],
  "Beijing Jiaotong University": "NA",
  "Beijing Language and Culture University": "NA",
  "Beijing Radio and Television University": "NA",
  "Beijing University of Chinese Medicine": "NA",
  "Beijing University of Posts and Telecommunications": "NA",
  "Central China Normal University": "NA",
  "Chong Qing University": "NA",
  "Donghua University": "NA",
  "East China Normal University": "NA",
  "sources": [
    "https://iss.bfsu.edu.cn/notice_intro.php?id=84",
    "https://osao.bfsu.edu.cn/info/1042/2097.htm",
    "https://greatyop.com/chinese-universities-agency-no-province/",
    "https://freestudyinchina.com/silk-road-scholarship-beijing-jiaotong-university/"
  ]
}
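To see at a glance which universities came back empty, the result can be split into found and missing entries. This is only a minimal sketch: it assumes the result is a plain dict shaped like the output above, with "NA" marking a missing entry and a "sources" key holding URLs. `split_results` is a hypothetical helper, not part of scrapegraphai, and the addresses in the sample are placeholders.

```python
def split_results(result, missing_marker="NA", meta_keys=("sources",)):
    """Split a result dict into (found, missing).

    Assumes the shape shown above: university name -> list of emails,
    or the string "NA" when nothing was found. Keys in meta_keys are skipped.
    """
    found, missing = {}, []
    for name, value in result.items():
        if name in meta_keys:
            continue
        if value == missing_marker or not value:
            missing.append(name)
        else:
            # Normalize a single string into a one-element list.
            found[name] = [value] if isinstance(value, str) else list(value)
    return found, missing


# Placeholder data in the same shape as the output above.
sample = {
    "Beijing Foreign Studies University": ["office@example.edu"],
    "Beijing Jiaotong University": "NA",
    "sources": ["https://example.edu/contact"],
}
found, missing = split_results(sample)
print("found:", found)
print("missing:", missing)
```

The `missing` list is what would be worth retrying, for example with a narrower prompt.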
The issue with receiving "NA" for many universities in your SearchGraph results could be due to several factors:
Model Limitations: The language model specified in your configuration, "openai/gpt-4o-2024-08-06", might not reliably extract contact emails from the web pages it is given. Consider experimenting with different models or configurations to see if results improve.
Search Limitations: The max_results parameter is set to 2, which limits the number of search results the graph can process. With ten universities in a single prompt, so few results are unlikely to cover every institution. Increasing this number might provide more opportunities to find the required information.
Prompt Specificity: The prompt might not be specific enough to guide the model in finding the correct information. You could try refining the prompt to be more explicit about the type of contact information you are seeking.
Data Availability: The contact emails for some universities might not be readily available or easily accessible on the web, leading to "NA" results.
Error Handling: There might be no specific error handling or logging implemented to provide insights into why results are "NA". Implementing logging could help diagnose issues.
Consider adjusting these factors to potentially improve the quality of your results.
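Two of the suggestions above, a more explicit prompt and more search coverage per university, can be combined by querying one university at a time instead of ten in one prompt. The sketch below is an assumption-heavy illustration, not the library's recommended approach: `run_search` is a hypothetical callable that would wrap `SearchGraph(prompt=prompt, config=graph_config).run()`, and `dedupe`, `build_prompt`, and `collect_contacts` are illustrative names. Whether the model honors the requested `emails` key is also an assumption. The input is deduplicated first, since the list in the issue repeats several names (for example "Shanghai University").

```python
import logging

logger = logging.getLogger("contacts")


def dedupe(names):
    """Drop repeated names while keeping the original order."""
    return list(dict.fromkeys(names))


def build_prompt(university):
    """Build an explicit, single-university prompt (wording is illustrative)."""
    return (
        "Find the official contact email addresses published on the website of "
        f"{university}. Return a JSON object with the key 'emails' holding a "
        "list of addresses, or an empty list if none are found."
    )


def collect_contacts(universities, run_search):
    """Query each university separately and log the ones that come back empty.

    run_search is a hypothetical callable: prompt -> dict.
    """
    contacts = {}
    for name in dedupe(universities):
        try:
            answer = run_search(build_prompt(name))
        except Exception:
            # Record the failure and continue with the next university.
            logger.exception("Search failed for %s", name)
            contacts[name] = []
            continue
        emails = []
        if isinstance(answer, dict):
            emails = answer.get("emails") or []
        if isinstance(emails, str):
            # Treat the "NA" marker as empty; wrap any other string in a list.
            emails = [] if emails == "NA" else [emails]
        if not emails:
            logger.warning("No emails found for %s", name)
        contacts[name] = list(emails)
    return contacts


# Stub standing in for the real search so the sketch runs offline.
def fake_search(prompt):
    if "Peking University" in prompt:
        return {"emails": ["info@example.edu"]}
    return {"emails": "NA"}


result = collect_contacts(
    ["Peking University", "Shanghai University", "Shanghai University"],
    fake_search,
)
print(result)
```

With one prompt per university, the search results for each run are presumably spent on a single institution, and the warnings give a per-university record of what came back empty.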