Commit 6d14111

authored Sep 24, 2024
Prepare v0.2.0 (#5)
1 parent 81649c3 commit 6d14111

2 files changed, +20 -20 lines changed

README.md (+19 -19)

@@ -92,37 +92,37 @@ Represents a page in the document:
 
 This node represent a paragraph, a heading or any text within the document.
 
-- `category`: The classification of the text within the document.
+- `category`: The [category](#category) of the text within the document, e.g. `heading`, `title`
 - `content`: A string representing the textual content.
 - `marks`: List of [marks](#marks) applied to the text, such as bold, italic, etc.
 - `attributes`: Can contain metadata like the bounding box representing where this portion of text is located in the page.
 
 ### Category
-Below are the various categories of text that may be found within a document:
 
-**Category Type**
-- `page-header`: Represents the header of the page.
-- `footer`: Represents the footer of the page.
-- `heading`: Any heading within the document.
-- `figure`: Represents a figure or an image.
-- `other`: Any other unclassified text.
-- `appendix`: Text within an appendix.
-- `keywords`: List of keywords.
+Each block of text is assigned a _category_.
+
+- `abstract`: The abstract of the document.
 - `acknowledgments`: Section acknowledging contributors.
+- `affiliation`: Author's institutional affiliation.
+- `appendix`: Text within an appendix.
+- `authors`: List of authors.
+- `body`: Main body text of the document.
 - `caption`: Caption associated with a figure or table.
-- `toc`: Table of contents.
-- `abstract`: The abstract of the document.
+- `categories`: Categories or topics listed in the document.
+- `figure`: Represents a figure or an image.
+- `footer`: Represents the footer of the page.
 - `footnote`: Text at the bottom of the page providing additional information.
-- `body`: Main body text of the document.
+- `formula`: Mathematical formula or equation.
+- `general-terms`: General terms section.
+- `heading`: Any heading within the document.
+- `keywords`: List of keywords.
 - `itemize-item`: Item in a list or bullet point.
-- `title`: The title of the document.
+- `other`: Any other unclassified text.
+- `page-header`: Represents the header of the page.
 - `reference`: References or citations within the document.
-- `affiliation`: Author's institutional affiliation.
-- `general-terms`: General terms section.
-- `formula`: Mathematical formula or equation.
-- `categories`: Categories or topics listed in the document.
 - `table`: Represents a table.
-- `authors`: List of authors.
+- `title`: The title of the document.
+- `toc`: Table of contents.
 
 ### Marks
 

setup.py (+1 -1)

@@ -10,7 +10,7 @@
 
 setup(
     name='parse-document-model',
-    version='0.1.0',
+    version='0.2.0',
     description='Pydantic models for representing a text document as a hierarchical structure.',
     long_description=long_description,
     long_description_content_type='text/markdown',
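
The README hunk in this commit describes a text node with four fields (`category`, `content`, `marks`, `attributes`) and a fixed list of category values. As a rough illustration of that shape, here is a minimal sketch using only the Python standard library. It is not the package's actual Pydantic model: the class names `Category` and `Text`, the field defaults, and the validation behaviour are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Category(str, Enum):
    """Category values as listed in the updated README (names are hypothetical)."""
    ABSTRACT = "abstract"
    ACKNOWLEDGMENTS = "acknowledgments"
    AFFILIATION = "affiliation"
    APPENDIX = "appendix"
    AUTHORS = "authors"
    BODY = "body"
    CAPTION = "caption"
    CATEGORIES = "categories"
    FIGURE = "figure"
    FOOTER = "footer"
    FOOTNOTE = "footnote"
    FORMULA = "formula"
    GENERAL_TERMS = "general-terms"
    HEADING = "heading"
    KEYWORDS = "keywords"
    ITEMIZE_ITEM = "itemize-item"
    OTHER = "other"
    PAGE_HEADER = "page-header"
    REFERENCE = "reference"
    TABLE = "table"
    TITLE = "title"
    TOC = "toc"


@dataclass
class Text:
    """Hypothetical stand-in for the text node described in the README."""
    category: Category
    content: str
    # Assumed defaults: the README does not say whether these fields are optional.
    marks: List[Any] = field(default_factory=list)
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Accept plain strings such as "heading"; unknown values raise ValueError.
        self.category = Category(self.category)


heading = Text(category="heading", content="1. Introduction")
print(heading.category.value)  # prints "heading"
```

Modelling the category as an enumeration means a misspelled or unsupported value fails when the node is built rather than surfacing later in downstream processing.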
