export to KùzuDB #5

Merged: 3 commits, Jan 16, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
chromedriver
lemma.json
lemma.ttl
lemma.zip
examples/tmp.*.html
paper.md
vis.html
4 changes: 0 additions & 4 deletions NOTES.md
@@ -1,9 +1,5 @@
TODO:

* load RDF, to bootstrap/iterate analysis
- use `skos:broader` for structural representation of synonyms

* download ZIP: KuzuDB node-link
- https://neo4j.com/docs/getting-started/data-import/csv-import/

* link entities for lemmas, noun chunks using MediaWiki lookups?
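The *node-link* format named in the TODO above (and linked from the app's help text) is the JSON layout produced by NetworkX's `node_link_data()`: one object with `directed`, `multigraph` and `graph` keys, a `nodes` list where each node carries an `id`, and a `links` list naming each edge's `source` and `target`. A minimal stdlib-only sketch of that layout follows; the helper `to_node_link` and the attribute names `label`, `kind` and `rel` are hypothetical, the node ids are borrowed from the RDF output in the notebook diff, and the key names follow NetworkX's long-standing defaults, so check the documentation for the NetworkX version you target.

```python
import json
from typing import Any, Dict, List, Tuple


def to_node_link(
    nodes: Dict[str, Dict[str, Any]],
    edges: List[Tuple[str, str, Dict[str, Any]]],
    directed: bool = True,
) -> Dict[str, Any]:
    """Build a dict in the node-link layout (hypothetical helper, not textgraphs code)."""
    return {
        "directed": directed,
        "multigraph": False,
        "graph": {},
        # each node carries its "id" plus any attributes
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in nodes.items()],
        # each link names its endpoints by node id
        "links": [{"source": src, "target": dst, **attrs} for src, dst, attrs in edges],
    }


nodes = {
    "werner_PROPN_herzog_PROPN": {"label": "Werner Herzog", "kind": "dbo:Person"},
    "germany_PROPN": {"label": "Germany", "kind": "dbo:Country"},
}
edges = [("werner_PROPN_herzog_PROPN", "germany_PROPN", {"rel": "schema:nationality"})]

data = to_node_link(nodes, edges)
print(json.dumps(data, indent=2, ensure_ascii=False))
```

Because the result is plain JSON, any consumer that understands this layout can rebuild the graph without knowing how it was produced.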
14 changes: 12 additions & 2 deletions app.py
@@ -397,11 +397,14 @@
Download a serialized <em>lemma graph</em> in multiple formats:
<ul>
<li>
<a href="https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.json_graph.node_link_data.html" target="_blank"><em>node-link</em></a>: suitable for import to Neo4j, NetworkX, KùzuDB, etc.
<a href="https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.json_graph.node_link_data.html" target="_blank"><em>node-link</em></a>: JSON data suitable for import to Neo4j, NetworkX, etc.
</li>
<li>
<a href="https://www.w3.org/TR/turtle/" target="_blank"><em>Turtle/N3</em></a>: W3C semantic graph representation, based on RDF, OWL, SKOS, etc.
</li>
<li>
<a href="https://opencypher.org/" target="_blank"><em>openCypher</em></a>: ZIP file of a labeled property graph in <a href="https://kuzudb.com/" target="_blank"><em>KùzuDB</em></a>
</li>
</ul>
""",
unsafe_allow_html = True,
@@ -416,11 +419,18 @@

st.download_button(
label = "download RDF",
data = tg.extract_rdf(),
data = tg.export_rdf(),
file_name = "lemma_graph.ttl",
mime = "text/turtle",
)

st.download_button(
label = "download KùzuDB",
data = tg.export_kuzu(zip_name = "lemma.zip"),
file_name = "lemma.zip",
mime = "application/x-zip-compressed",
)


## WIP
st.divider()
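The new button hands `st.download_button` the bytes returned by `tg.export_kuzu(zip_name = "lemma.zip")`. What such an archive might contain can be sketched with the standard library: a CSV of nodes, a CSV of edges, and a Cypher script that creates the KùzuDB tables and bulk-loads the CSVs. This is an illustration under stated assumptions, not the `textgraphs` implementation: the function `build_kuzu_zip`, the table names `Entity` and `Link`, the file names and the column layout are all hypothetical, and the DDL reflects our reading of Kùzu's `CREATE NODE TABLE` / `CREATE REL TABLE` / `COPY ... FROM` syntax, including its default of CSV files without a header row.

```python
import csv
import io
import zipfile
from typing import Iterable, List, Sequence, Tuple


def _to_csv(rows: Iterable[Sequence[str]]) -> str:
    """Render rows as CSV text, without a header row."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def build_kuzu_zip(
    nodes: List[Tuple[str, str]],
    edges: List[Tuple[str, str, str]],
) -> bytes:
    """Pack node/edge CSVs and a Cypher load script into an in-memory ZIP.

    Hypothetical layout: nodes are (id, label); edges are (src_id, dst_id, rel).
    """
    # Kùzu DDL plus bulk-load statements; table and file names are illustrative
    script = "\n".join([
        "CREATE NODE TABLE Entity(id STRING, label STRING, PRIMARY KEY (id));",
        "CREATE REL TABLE Link(FROM Entity TO Entity, rel STRING);",
        'COPY Entity FROM "nodes.csv";',
        'COPY Link FROM "edges.csv";',
    ])
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("nodes.csv", _to_csv(nodes))
        zf.writestr("edges.csv", _to_csv(edges))
        zf.writestr("load.cypher", script)
    return zip_buf.getvalue()


zip_bytes = build_kuzu_zip(
    nodes=[("werner_PROPN_herzog_PROPN", "Werner Herzog"), ("germany_PROPN", "Germany")],
    edges=[("werner_PROPN_herzog_PROPN", "germany_PROPN", "nationality")],
)
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    print(zf.namelist())  # ['nodes.csv', 'edges.csv', 'load.cypher']
```

Building the archive in a `BytesIO` buffer avoids temporary files, and the returned bytes are exactly what `st.download_button` accepts as `data`. For the `mime` argument, `application/zip` is the IANA-registered media type for ZIP archives; the app passes `application/x-zip-compressed`.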
255 changes: 99 additions & 156 deletions docs/ex0_0.md
@@ -37,7 +37,7 @@ import textgraphs
%watermark
```

Last updated: 2024-01-11T19:52:45.516127-08:00
Last updated: 2024-01-14T17:42:23.566161-08:00

Python implementation: CPython
Python version : 3.10.11
@@ -60,10 +60,10 @@ import textgraphs

pyvis : 0.3.2
textgraphs: 0.3.2.dev3+gaea63b7.d20240108
spacy : 3.7.2
sys : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]
matplotlib: 3.8.2
pandas : 2.1.4
sys : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]
spacy : 3.7.2



@@ -551,115 +551,64 @@ Extract the nodes and edges which have IRIs, to create an "abstraction layer" as

```python
triples: str = tg.extract_rdf()
ic(triples);
print(triples)
```

ic| triples: ('@base <https://github.com/DerwenAI/textgraphs/ns/> .
'
'@prefix dbo: <http://dbpedia.org/ontology/> .
'
'@prefix dbr: <http://dbpedia.org/resource/> .
'
'@prefix schema: <https://schema.org/> .
'
'@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
'
'@prefix wd_ent: <http://www.wikidata.org/entity/> .
'
'
'
'dbr:Germany skos:prefLabel "Germany (German: Deutschland, German '
'pronunciation: [ˈdɔʏtʃlant]), constitutionally the Federal"@en .
'
'
'
'dbr:United_States skos:prefLabel "The United States of America (USA), '
'commonly known as the United States (U.S. or US) or America"@en .
'
'
'
'dbr:Werner_Herzog skos:prefLabel "Werner Herzog (German: [ˈvɛɐ̯nɐ '
'ˈhɛɐ̯tsoːk]; born 5 September 1942) is a German film director"@en .
'
'
'
'wd_ent:Q183 skos:prefLabel "country in Central Europe"@en .
'
'
'
'wd_ent:Q44131 skos:prefLabel "German film director, producer, screenwriter, '
'actor and opera director"@en .
'
'
'
'<entity/america_propn> a dbo:Country ;
'
' skos:prefLabel "America"@en ;
'
' schema:event <entity/war_noun> .
'
'
'
'<entity/become_verb> skos:prefLabel "become"@en .
'
'
'
'<entity/dietrich_propn_herzog_propn> a dbo:Person ;
'
' skos:prefLabel "Dietrich Herzog"@en ;
'
' schema:children <entity/werner_propn_herzog_propn> .
'
'
'
'<entity/filmmaker_noun> skos:prefLabel "filmmaker"@en .
'
'
'
'<entity/flee_verb> skos:prefLabel "fled"@en .
'
'
'
'<entity/intellectual_noun> skos:prefLabel "intellectual"@en .
'
'
'
'<entity/son_noun> skos:prefLabel "son"@en .
'
'
'
'<entity/werner_propn> a dbo:Person ;
'
' skos:prefLabel "Werner"@en .
'
'
'
'<entity/germany_propn> a dbo:Country ;
'
' skos:prefLabel "Germany"@en .
'
'
'
'<entity/war_noun> skos:prefLabel "war"@en .
'
'
'
'<entity/werner_propn_herzog_propn> a dbo:Person ;
'
' skos:prefLabel "Werner Herzog"@en ;
'
' schema:nationality <entity/germany_propn> .
'
'
'
'dbo:Country skos:prefLabel "Countries, cities, states"@en .
'
'
'
'dbo:Person skos:prefLabel "People, including fictional"@en .
'
'
')
@base <https://github.com/DerwenAI/textgraphs/ns/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix schema: <https://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wd_ent: <http://www.wikidata.org/entity/> .

dbr:Germany skos:definition "Germany (German: Deutschland, German pronunciation: [ˈdɔʏtʃlant]), constitutionally the Federal"@en ;
skos:prefLabel "Germany"@en .

dbr:United_States skos:definition "The United States of America (USA), commonly known as the United States (U.S. or US) or America"@en ;
skos:prefLabel "United States"@en .

dbr:Werner_Herzog skos:definition "Werner Herzog (German: [ˈvɛɐ̯nɐ ˈhɛɐ̯tsoːk]; born 5 September 1942) is a German film director"@en ;
skos:prefLabel "Werner Herzog"@en .

wd_ent:Q183 skos:definition "country in Central Europe"@en ;
skos:prefLabel "Germany"@en .

wd_ent:Q44131 skos:definition "German film director, producer, screenwriter, actor and opera director"@en ;
skos:prefLabel "Werner Herzog"@en .

<entity/america_PROPN> a dbo:Country ;
skos:prefLabel "America"@en ;
schema:event <entity/war_NOUN> .

<entity/dietrich_PROPN_herzog_PROPN> a dbo:Person ;
skos:prefLabel "Dietrich Herzog"@en ;
schema:children <entity/werner_PROPN_herzog_PROPN> .

<entity/filmmaker_NOUN> skos:prefLabel "filmmaker"@en .

<entity/intellectual_NOUN> skos:prefLabel "intellectual"@en .

<entity/son_NOUN> skos:prefLabel "son"@en .

<entity/werner_PROPN> a dbo:Person ;
skos:prefLabel "Werner"@en .

<entity/germany_PROPN> a dbo:Country ;
skos:prefLabel "Germany"@en .

<entity/war_NOUN> skos:prefLabel "war"@en .

<entity/werner_PROPN_herzog_PROPN> a dbo:Person ;
skos:prefLabel "Werner Herzog"@en ;
schema:nationality <entity/germany_PROPN> .

dbo:Country skos:definition "Countries, cities, states"@en ;
skos:prefLabel "country"@en .

dbo:Person skos:definition "People, including fictional"@en ;
skos:prefLabel "person"@en .
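In the updated output each term keeps a short `skos:prefLabel` while the longer description moves into `skos:definition`, and all predicates of one subject are grouped with `;` and closed with `.`. That grouping rule can be illustrated with a small stdlib-only serializer. The helpers `to_turtle` and `literal` are hypothetical, terms are assumed to be already abbreviated as prefixed names, and a real exporter would normally rely on an RDF library such as rdflib for escaping and validation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def literal(text: str, lang: str = "en") -> str:
    """Quote text as a short Turtle string literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def to_turtle(prefixes: Dict[str, str], triples: List[Triple]) -> str:
    """Serialize pre-abbreviated triples, one block per subject."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    # group predicate/object pairs by subject, keeping first-seen order
    by_subject: Dict[str, List[Tuple[str, str]]] = {}
    for subj, pred, obj in triples:
        by_subject.setdefault(subj, []).append((pred, obj))
    for subj, pairs in by_subject.items():
        # ';' separates predicates that share the same subject
        body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
        lines.append(f"{subj} {body} .")
        lines.append("")
    return "\n".join(lines)


prefixes = {
    "dbo": "http://dbpedia.org/ontology/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
triples = [
    ("dbo:Person", "skos:definition", literal("People, including fictional")),
    ("dbo:Person", "skos:prefLabel", literal("person")),
]
ttl = to_turtle(prefixes, triples)
print(ttl)
```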




## statistical stack profile instrumentation
@@ -672,7 +621,7 @@ profiler.stop()



<pyinstrument.session.Session at 0x1548b8100>
<pyinstrument.session.Session at 0x162c12d70>



@@ -682,56 +631,50 @@ profiler.print()
```


_ ._ __/__ _ _ _ _ _/_ Recorded: 19:52:45 Samples: 10764
/_//_/// /_\ / //_// / //_'/ // Duration: 59.853 CPU time: 72.021
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:42:23 Samples: 12907
/_//_/// /_\ / //_// / //_'/ // Duration: 192.261 CPU time: 84.960
/ _/ v4.6.1

Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-21c48172-c498-4e47-889b-254035b61b7d.json

59.853 _UnixSelectorEventLoop._run_once asyncio/base_events.py:1832
└─ 59.852 Handle._run asyncio/events.py:78
192.262 _UnixSelectorEventLoop._run_once asyncio/base_events.py:1832
└─ 192.257 Handle._run asyncio/events.py:78
[12 frames hidden] asyncio, ipykernel, IPython
42.909 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394
├─ 20.530 <module> ../ipykernel_64445/1708547378.py:1
│ ├─ 16.121 InferRel_Rebel.__init__ textgraphs/rel.py:121
│ │ └─ 16.007 pipeline transformers/pipelines/__init__.py:531
│ │ [39 frames hidden] transformers, torch, <built-in>, json
│ ├─ 2.905 PipelineFactory.__init__ textgraphs/pipe.py:434
│ │ └─ 2.890 load spacy/__init__.py:27
│ │ [15 frames hidden] spacy, en_core_web_sm, catalogue, imp...
│ ├─ 0.816 InferRel_OpenNRE.__init__ textgraphs/rel.py:33
│ │ └─ 0.808 get_model opennre/pretrain.py:126
│ └─ 0.688 TextGraphs.create_pipeline textgraphs/doc.py:96
│ └─ 0.688 PipelineFactory.create_pipeline textgraphs/pipe.py:508
│ └─ 0.688 Pipeline.__init__ textgraphs/pipe.py:216
│ └─ 0.688 English.__call__ spacy/language.py:1016
│ [11 frames hidden] spacy, spacy_dbpedia_spotlight, reque...
└─ 20.513 <module> ../ipykernel_64445/1245857438.py:1
└─ 20.513 TextGraphs.perform_entity_linking textgraphs/doc.py:526
└─ 20.513 KGWikiMedia.perform_entity_linking textgraphs/kg.py:288
├─ 10.783 KGWikiMedia._link_kg_search_entities textgraphs/kg.py:914
│ └─ 10.782 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:623
│ └─ 10.700 get requests/api.py:62
│ [37 frames hidden] requests, urllib3, http, socket, ssl,...
├─ 8.960 KGWikiMedia._link_spotlight_entities textgraphs/kg.py:833
│ └─ 8.959 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:623
│ └─ 8.908 get requests/api.py:62
│ [34 frames hidden] requests, urllib3, http, socket, ssl,...
└─ 0.769 KGWikiMedia._secondary_entity_linking textgraphs/kg.py:1045
└─ 0.769 KGWikiMedia.wikidata_search textgraphs/kg.py:557
└─ 0.768 KGWikiMedia._wikidata_endpoint textgraphs/kg.py:426
└─ 0.768 get requests/api.py:62
[7 frames hidden] requests, urllib3
16.186 InferRel_Rebel.gen_triples_async textgraphs/pipe.py:188
├─ 15.446 InferRel_Rebel.gen_triples textgraphs/rel.py:259
│ ├─ 14.153 InferRel_Rebel.tokenize_sent textgraphs/rel.py:145
│ │ └─ 14.151 TranslationPipeline.__call__ transformers/pipelines/text2text_generation.py:341
│ │ [44 frames hidden] transformers, torch, <built-in>
│ └─ 1.289 KGWikiMedia.resolve_rel_iri textgraphs/kg.py:352
│ └─ 0.799 get_entity_dict_from_api qwikidata/linked_data_interface.py:21
│ [8 frames hidden] qwikidata, requests, urllib3
└─ 0.740 InferRel_OpenNRE.gen_triples textgraphs/rel.py:58
└─ 0.695 KGWikiMedia.resolve_rel_iri textgraphs/kg.py:352
162.938 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394
├─ 136.946 <module> ../ipykernel_85826/1245857438.py:1
│ └─ 136.946 TextGraphs.perform_entity_linking textgraphs/doc.py:529
│ └─ 136.945 KGWikiMedia.perform_entity_linking textgraphs/kg.py:306
│ ├─ 73.530 KGWikiMedia._link_kg_search_entities textgraphs/kg.py:932
│ │ └─ 73.521 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:641
│ │ └─ 73.301 get requests/api.py:62
│ │ [34 frames hidden] requests, urllib3, http, socket, ssl,...
│ │ 61.888 _SSLSocket.read <built-in>
│ ├─ 59.665 KGWikiMedia._link_spotlight_entities textgraphs/kg.py:851
│ │ └─ 59.662 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:641
│ │ └─ 59.598 get requests/api.py:62
│ │ [34 frames hidden] requests, urllib3, http, socket, ssl,...
│ │ 48.631 _SSLSocket.read <built-in>
│ └─ 3.746 KGWikiMedia._secondary_entity_linking textgraphs/kg.py:1060
│ └─ 3.745 KGWikiMedia.wikidata_search textgraphs/kg.py:575
│ └─ 3.744 KGWikiMedia._wikidata_endpoint textgraphs/kg.py:444
│ └─ 3.743 get requests/api.py:62
│ [12 frames hidden] requests, urllib3, <built-in>
└─ 22.367 <module> ../ipykernel_85826/1708547378.py:1
├─ 16.426 InferRel_Rebel.__init__ textgraphs/rel.py:121
│ └─ 16.299 pipeline transformers/pipelines/__init__.py:531
│ [29 frames hidden] transformers, torch, <built-in>
├─ 3.130 PipelineFactory.__init__ textgraphs/pipe.py:434
│ └─ 3.111 load spacy/__init__.py:27
│ [6 frames hidden] spacy, en_core_web_sm
└─ 1.926 TextGraphs.create_pipeline textgraphs/doc.py:98
└─ 1.926 PipelineFactory.create_pipeline textgraphs/pipe.py:508
└─ 1.926 Pipeline.__init__ textgraphs/pipe.py:216
└─ 1.926 English.__call__ spacy/language.py:1016
27.979 InferRel_Rebel.gen_triples_async textgraphs/pipe.py:188
└─ 26.858 InferRel_Rebel.gen_triples textgraphs/rel.py:259
└─ 24.972 InferRel_Rebel.tokenize_sent textgraphs/rel.py:145
└─ 24.952 TranslationPipeline.__call__ transformers/pipelines/text2text_generation.py:341
[41 frames hidden] transformers, torch, <built-in>
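The figures in the new profiler report can be put in proportion with a little arithmetic. The values below are copied from the report; reading "wall time minus CPU time" as time spent waiting is only a rough indicator, since CPU time can exceed wall time when several threads run, as in the earlier report (59.853 s wall, 72.021 s CPU).

```python
# Figures copied from the new pyinstrument report, in seconds.
wall_s = 192.261            # "Duration"
cpu_s = 84.960              # "CPU time"
entity_linking_s = 136.946  # TextGraphs.perform_entity_linking
# the two _SSLSocket.read frames under the DBPedia entity searches
ssl_read_s = 61.888 + 48.631

linking_share = entity_linking_s / wall_s
ssl_share = ssl_read_s / wall_s
off_cpu_s = wall_s - cpu_s

print(f"entity linking: {linking_share:.1%} of wall time")  # 71.2%
print(f"SSL socket reads: {ssl_share:.1%} of wall time")    # 57.5%
print(f"wall time minus CPU time: {off_cpu_s:.3f} s")       # 107.301 s
```

By these figures, entity linking accounts for roughly 71% of the recorded wall time and SSL socket reads under the DBPedia lookups for roughly 57%, which points at remote endpoint latency rather than local computation as the main cost of this run. The same `perform_entity_linking` call took 20.513 s in the earlier report.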


