@@ -92,37 +92,37 @@ Represents a page in the document:

  This node represents a paragraph, a heading or any text within the document.

- - `category`: The classification of the text within the document.
+ - `category`: The [category](#category) of the text within the document, e.g. `heading`, `title`.
  - `content`: A string representing the textual content.
  - `marks`: List of [marks](#marks) applied to the text, such as bold, italic, etc.
  - `attributes`: Can contain metadata like the bounding box representing where this portion of text is located in the page.

  ### Category
- Below are the various categories of text that may be found within a document:

- **Category Type**
- - `page-header`: Represents the header of the page.
- - `footer`: Represents the footer of the page.
- - `heading`: Any heading within the document.
- - `figure`: Represents a figure or an image.
- - `other`: Any other unclassified text.
- - `appendix`: Text within an appendix.
- - `keywords`: List of keywords.
+ Each block of text is assigned a _category_.
+
+ - `abstract`: The abstract of the document.
  - `acknowledgments`: Section acknowledging contributors.
+ - `affiliation`: Author's institutional affiliation.
+ - `appendix`: Text within an appendix.
+ - `authors`: List of authors.
+ - `body`: Main body text of the document.
  - `caption`: Caption associated with a figure or table.
- - `toc`: Table of contents.
- - `abstract`: The abstract of the document.
+ - `categories`: Categories or topics listed in the document.
+ - `figure`: Represents a figure or an image.
+ - `footer`: Represents the footer of the page.
  - `footnote`: Text at the bottom of the page providing additional information.
- - `body`: Main body text of the document.
+ - `formula`: Mathematical formula or equation.
+ - `general-terms`: General terms section.
+ - `heading`: Any heading within the document.
+ - `keywords`: List of keywords.
  - `itemize-item`: Item in a list or bullet point.
- - `title`: The title of the document.
+ - `other`: Any other unclassified text.
+ - `page-header`: Represents the header of the page.
  - `reference`: References or citations within the document.
- - `affiliation`: Author's institutional affiliation.
- - `general-terms`: General terms section.
- - `formula`: Mathematical formula or equation.
- - `categories`: Categories or topics listed in the document.
  - `table`: Represents a table.
- - `authors`: List of authors.
+ - `title`: The title of the document.
+ - `toc`: Table of contents.

  ### Marks

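To make the node shape in this hunk concrete, here is a minimal TypeScript sketch of a text node and its category field, assuming a JSON-style representation. The category values are copied from the documented list, which this change reorders alphabetically without adding or removing any entries. Everything else is hypothetical and not taken from the project: the names `CATEGORIES`, `Category`, `Mark`, `BoundingBox`, `TextNode`, `isCategory` and `example`, the `type` field on marks, and the bounding-box field names (the docs only say `attributes` *can* carry a bounding box).

```typescript
// Hypothetical sketch of the documented text node; not the project's own types.
// Category values are copied verbatim from the (alphabetically sorted) list.
const CATEGORIES = [
  "abstract",
  "acknowledgments",
  "affiliation",
  "appendix",
  "authors",
  "body",
  "caption",
  "categories",
  "figure",
  "footer",
  "footnote",
  "formula",
  "general-terms",
  "heading",
  "itemize-item",
  "keywords",
  "other",
  "page-header",
  "reference",
  "table",
  "title",
  "toc",
] as const;

// Union of the string literals above, e.g. "heading" | "title" | ...
type Category = (typeof CATEGORIES)[number];

// Assumed shape: the docs only say marks are things like bold or italic.
interface Mark {
  type: string;
}

// Assumed field names: the docs mention a bounding box but not its layout.
interface BoundingBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface TextNode {
  category: Category;
  content: string;
  marks: Mark[];
  // Optional, open-ended metadata; a bounding box is one possible entry.
  attributes?: { boundingBox?: BoundingBox; [key: string]: unknown };
}

// Runtime guard for category values coming from untyped JSON.
function isCategory(value: unknown): value is Category {
  return (
    typeof value === "string" &&
    (CATEGORIES as readonly string[]).includes(value)
  );
}

// Illustrative node with made-up values.
const example: TextNode = {
  category: "heading",
  content: "Introduction",
  marks: [{ type: "bold" }],
  attributes: { boundingBox: { left: 72, top: 90, width: 200, height: 14 } },
};
```

Deriving `Category` from a single `as const` array keeps the compile-time union and the runtime `isCategory` check in sync, so a category added to the list is picked up by both.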