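Event Graph Construction and Visualization

Crawler

Crawler code path: ./crawler

Crawler Interface

Path: ./crawler/crawler_func.py

Make sure ./crawler/config.yaml defines the following configuration: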

  - name: "talos_blog"   # name of the output folder
    homepage: "https://blog.talosintelligence.com"   # homepage (only subpages of this page are crawled)
    subpage_strategy: "sitemap"   # how subpages are discovered
    start_url: ["https://blog.talosintelligence.com/sitemap-posts.xml"]  # sitemap
    target_lang: "en" # target language
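To make the "sitemap" strategy concrete, here is a minimal sketch of how subpage URLs could be pulled from a sitemap and restricted to the configured homepage. The function name `subpages_from_sitemap` and the sample post URL are hypothetical and not taken from ./crawler; the actual crawler may work differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def subpages_from_sitemap(sitemap_xml: str, homepage: str) -> list:
    """Return the <loc> URLs of a sitemap that are subpages of `homepage`.

    Hypothetical helper for illustration; not part of crawler_func.py.
    """
    root = ET.fromstring(sitemap_xml)
    prefix = homepage.rstrip("/") + "/"
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Keep only URLs below the homepage, mirroring the config comment.
        if url.startswith(prefix):
            urls.append(url)
    return urls


# Placeholder sitemap content; the post URL is made up for the example.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://blog.talosintelligence.com/example-post/</loc></url>
  <url><loc>https://example.org/other/</loc></url>
</urlset>"""

print(subpages_from_sitemap(sample, "https://blog.talosintelligence.com"))
# → ['https://blog.talosintelligence.com/example-post/']
```

In practice the XML would be downloaded from the `start_url` entry rather than hard-coded.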

Import the function directly to start crawling:

from crawler.crawler_func import *

crawler_for_talos_blog(base_dir,date_str)

"""
base_dir: directory for the raw data; files are saved under {base_dir}/{date_str}
date_str: date of the crawl, in 'YYYY-MM-DD' format, e.g. '2025-09-22'
"""

Data Storage

Data is currently saved under ./data/raw. Calling the raw2sqlite method converts it into ./data/sqlite/data.db.
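The schema that raw2sqlite produces is not documented here, so one way to explore the resulting file is to list its tables first. The following is a minimal sketch using only the standard library; `list_tables` is a hypothetical helper, not a function from this repository.

```python
import sqlite3


def list_tables(db_path: str) -> list:
    """Return the names of all tables in a SQLite database file.

    Note: sqlite3.connect creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example call, assuming raw2sqlite has already been run:
# print(list_tables("./data/sqlite/data.db"))
```

Once the table names are known, `PRAGMA table_info(<table>)` shows the columns of each one.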

Database Backend

Startup

python ./event_graph/db_app.py --port 5001 --base_dir {base_dir} --db_dir {db_dir}

API

{BASE_URL}/meta_data
{BASE_URL}/search
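The request parameters and HTTP methods of these endpoints are not documented here. As a rough sketch, the helper below only builds endpoint URLs; it assumes the backend runs locally on port 5001 as in the startup command, and the `q` query parameter is a made-up placeholder rather than a confirmed parameter of /search.

```python
from typing import Optional
from urllib.parse import urlencode


def endpoint_url(base_url: str, endpoint: str, params: Optional[dict] = None) -> str:
    """Join BASE_URL and an endpoint name, optionally adding a query string.

    Hypothetical helper for illustration; not part of db_app.py.
    """
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url


BASE_URL = "http://localhost:5001"  # assumed host; port from the startup command

print(endpoint_url(BASE_URL, "meta_data"))
# → http://localhost:5001/meta_data
print(endpoint_url(BASE_URL, "search", {"q": "ransomware"}))  # 'q' is a placeholder
# → http://localhost:5001/search?q=ransomware

# With the backend running, a GET request (an assumption) could be sent with:
# from urllib.request import urlopen
# body = urlopen(endpoint_url(BASE_URL, "meta_data")).read()
```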

Event Graph Visualization

Document Preprocessing + Graph Construction

./graph_service

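Resources

tool: repair for invalid JSON (repair)
cti: openwall mailing list (openwall)
cti: seclists mailing list (seclists); vulnerabilities are mainly in fulllist
cti: open-source tool (cti-tool); vulnerabilities are mainly under "vulnerabilities"
- security incidents (incident-report)
cti: RSS list of Chinese-language CTI sources (cn-rss)
Malpedia's aggregated news library (malpedia)
IOC lookup site that may be useful: AlienVault OTX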

