diff --git a/doc/widgets/SciHubator.rst b/doc/widgets/SciHubator.rst
new file mode 100644
index 00000000..f2ee005c
--- /dev/null
+++ b/doc/widgets/SciHubator.rst
@@ -0,0 +1,179 @@
+.. meta::
+ :description: Orange3 Textable Prototypes documentation, SciHubator widget
+ :keywords: Orange3, Textable, Prototypes, documentation, SciHubator, widget
+
+.. _SciHubator:
+
+SciHubator
+==============
+
+.. image:: https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/orangecontrib/textable_prototypes/widgets/icons/scihubator.png
+
+Download PDF files from `Sci-HUB `_ and extract their textual content into a segmentation.
+
+Authors
+-------
+Peretti-Poix Sarah, Borgeaud Matthias, Chétioui Orsowen, Luginbühl Colin
+
+Signals
+-------
+
+Inputs: ``None``
+
+ None
+
+
+Outputs: ``Text data``
+
+ Segmentation covering the content of downloaded pdf files
+
+Requirements
+------------
+
+* Orange 3.38.1
+* Orange Textable 3.2.2
+* scidownl (provides ``scihub_download``)
+* pdfplumber
+
+Description
+-----------
+
+This widget downloads PDF files from the Sci-Hub project and outputs their content
+as an annotated text segmentation.
+
+
+Basic interface
+~~~~~~~~~~~~~~~
+
+In its basic version,
+the **SciHubator** widget is limited to the import of a single DOI.
+The interface contains a **Source** section enabling the user to type the DOI.
+
+.. _SciHubator_basicinterface:
+
+.. figure:: https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/specs/images/scihubator_minimal.png
+ :align: center
+ :alt: Basic interface of the SciHubator widget
+
+ Figure 1: **SciHubator** widget (basic interface).
+
+Note that pdfplumber might not work properly with non-Latin alphabets
+and serif typefaces.
+
+The **Send** button triggers the emission of a segmentation to the output
+connection(s). When it is selected, the **Send automatically** checkbox
+disables the button and the widget attempts to automatically emit a
+segmentation at every modification of its interface.
+
+The text below the **Send** button indicates the number of characters in the single
+segment contained in the output segmentation, or the reasons why no
+segmentation is emitted (no input data, encoding issue, etc.).
+
+Advanced interface
+~~~~~~~~~~~~~~~~~~
+
+The advanced version of **SciHubator** allows the user to type several DOIs
+in a determined order; each output text can moreover be divided into
+specific sections (introduction, main body and bibliography) with specific
+annotations. The emitted segmentation contains a segment
+for each imported file.
+
+.. _scihubator_advancedinterface:
+
+.. figure:: https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/specs/images/scihubator_principal.png
+ :align: center
+ :alt: Advanced interface of the SciHubator widget
+ :scale: 80%
+
+ Figure 2: **SciHubator** widget (advanced interface).
+
+The advanced interface presents similarities with that of the **URLs** and **Segment**
+widgets. The **Sources** section allows the user to select the input
+DOI(s). The list
+of imported files appears at the top of the window; the columns of this list
+indicate (a) the name of each file, (b) the corresponding annotation (if any),
+and (c) the encoding with which each is associated.
+
+The first buttons on the right of the imported files' list enable the user to
+modify the order in which they appear in the output segmentation (**Move Up**
+and **Move Down**), to delete a file from the list (**Remove**) or to
+completely empty it (**Clear All**). Except for **Clear All**, all these
+buttons require the user to previously select an entry from the list.
+
+The **Send** button triggers the emission of a segmentation to the output
+connection(s). When it is selected, the **Send automatically** checkbox
+disables the button and the widget attempts to automatically emit a
+segmentation at every modification of its interface.
+
+The text below the **Send** button indicates the length of the output segmentation in
+characters, or the reasons why no segmentation is emitted (no selected file,
+encoding issue, etc.). In the example, the two segments corresponding to the
+imported files thus total up to 1'262'145 characters.
+
+Messages
+--------
+
+Information
+~~~~~~~~~~~
+
+*Data correctly sent to output: segments ( characters).*
+ This confirms that the widget has operated properly.
+
+*Settings were* (or *Input has*) *changed, please click 'Send' when ready.*
+ Settings and/or input have changed but the **Send automatically** checkbox
+ has not been selected, so the user is prompted to click the **Send**
+ button (or equivalently check the box) in order for computation and data
+ emission to proceed.
+
+*No data sent to output yet: no DOI selected.*
+ The widget instance is not able to emit data to output because no input
+ DOI has been selected.
+
+*No data sent to output yet, see 'Widget state' below.*
+ A problem with the instance's parameters and/or input data prevents it
+ from operating properly, and additional diagnostic information can be
+ found in the **Widget state** box at the bottom of the instance's
+ interface (see `Warnings`_ and `Errors`_ below).
+
+*Duplicate DOI(s) found and deleted.*
+ A duplicate DOI was found in the DOI list.
+ The add operation is halted so that no duplicates appear.
+
+
+Warnings
+~~~~~~~~
+
+*Please enter one or many valid DOIs.*
+ A valid DOI is required for processing by Sci-Hub.
+ This warning indicates that nothing was typed in the DOI field.
+
+*Not all sections were segmented*
+ The regex was not able to segment the content of certain DOIs.
+
+*Step 1/3: Pre-processing...*
+ The PDF is being downloaded.
+*Step 2/3: Processing...*
+ The PDF is being converted into raw text.
+*Step 3/3: Post-processing...*
+ Segmentations are applied to the text.
+
+
+
+
+Errors
+~~~~~~
+
+*SciHub inaccessible - verify your connexion.*
+ Please verify your internet connection or check whether `Sci-HUB `_ is down.
+
+*An error occurred when downloading.*
+ Downloading the PDF did not work; please try again.
+
+*Error occurred when reading PDF:*
+ An unexpected error occurred when reading the downloaded PDF. Please try again; if the error persists, the document behind your DOI may not be compatible.
+
+*Download failed. Please, verify DOI or connexion.*
+ Sci-Hub is accessible but SciHubator could not download the PDF. Your connection may have dropped during the download, or the DOI provided is not valid.
+
+
+
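The duplicate and empty-input messages documented in SciHubator.rst imply that the DOI list is normalised before anything is downloaded. A minimal sketch of such a step, assuming comma-separated input as in the demo widget; `parse_dois` is a hypothetical helper and not part of the widget code in this patch:

```python
def parse_dois(raw):
    """Split a comma-separated string into DOIs, dropping blanks and duplicates.

    Returns (unique_dois, duplicates_found); order of first appearance is kept.
    """
    seen = set()
    unique = []
    duplicates_found = False
    for item in raw.split(","):
        doi = item.strip()
        if not doi:
            continue  # ignore empty entries such as a trailing comma
        key = doi.lower()  # DOIs are compared case-insensitively
        if key in seen:
            duplicates_found = True
            continue
        seen.add(key)
        unique.append(doi)
    return unique, duplicates_found


dois, had_duplicates = parse_dois("10.1000/abc, 10.1000/ABC, ,10.1000/xyz")
print(dois, had_duplicates)
```

An empty result would correspond to the "Please enter one or many valid DOIs" warning, and a true second value to the duplicate message.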
diff --git a/orangecontrib/textable_prototypes/widgets/DemoSciHub.py b/orangecontrib/textable_prototypes/widgets/DemoSciHub.py
new file mode 100644
index 00000000..84488f38
--- /dev/null
+++ b/orangecontrib/textable_prototypes/widgets/DemoSciHub.py
@@ -0,0 +1,372 @@
+#/(?<=\n)\n((biblio|r(e|é)f)\w*\W*\n)(.|\n)*/
+#/(Abstract.+?\n{1,})((.|\n)*)(?=\n\n)/gmi
+"""
+Class DemoSciHUB
+Copyright 2025 University of Lausanne
+-----------------------------------------------------------------------------
+This file is part of the Orange3-Textable-Prototypes package.
+
+Orange3-Textable-Prototypes is free software: you can redistribute
+it and/or modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation, either version 3 of the
+License, or (at your option) any later version.
+
+Orange3-Textable-Prototypes is distributed in the hope that it will
+be useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Orange3-Textable-Prototypes. If not, see
+ .
+"""
+
+__version__ = u"0.0.1"
+__author__ = "Sarah Peretti-Poix, Borgeaud Matthias, Chétioui Orsowen, Luginbühl Colin"
+__maintainer__ = "Aris Xanthos"
+__email__ = "aris.xanthos@unil.ch"
+
+
+from functools import partial
+import time
+import tempfile
+from scidownl import scihub_download
+import pdfplumber
+import os
+import requests
+
+from _textable.widgets.TextableUtils import (
+ OWTextableBaseWidget, VersionedSettingsHandler, ProgressBar,
+ InfoBox, SendButton, pluralize, Task
+)
+
+from LTTL.Segmentation import Segmentation
+#from LTTL.Input import Input
+
+# Using the threaded version of LTTL.Segmenter to create
+# a "responsive" widget.
+import LTTL.SegmenterThread as Segmenter
+
+from Orange.widgets import widget, gui, settings
+from Orange.widgets.utils.widgetpreview import WidgetPreview
+from LTTL.Input import Input
+
+
+class DemoSciHUB(OWTextableBaseWidget):
+ """Demo Orange3-Textable widget"""
+
+ name = "Demo Scihub"
+ description = "Export a text segmentation from a DOI or URL"
+ icon = "icons/scihubator.png"
+ priority = 99
+
+ # Input and output channels (remove if not needed)...
+ #inputs = [("Segmentation", Segmentation, "inputData")]
+ outputs = [("New segmentation", Segmentation)]
+
+ # Copied verbatim in every Textable widget to facilitate
+ # settings management.
+ settingsHandler = VersionedSettingsHandler(
+ version=__version__.rsplit(".", 1)[0]
+ )
+
+ # Settings...
+ DOIContent = settings.Setting("")
+ #numberOfSegments = settings.Setting("10")
+
+ want_main_area = False
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+ self.inputSegmentationLength = 0
+ # The following attribute is required by every widget
+ # that imports new strings into Textable.
+ self.createdInputs = list()
+
+ self.infoBox = InfoBox(widget=self.controlArea)
+ self.sendButton = SendButton(
+ widget=self.controlArea,
+ master=self,
+ callback=self.sendData,
+ cancelCallback=self.cancel_manually,
+ infoBoxAttribute="infoBox",
+ )
+
+ # GUI...
+ # Top-level GUI boxes are created using method
+ # create_widgetbox(), so that they are automatically
+ # enabled/disabled when processes are running.
+ sourceBox = self.create_widgetbox(
+ box=u'Options',
+ orientation='vertical',
+ addSpace=False,
+ )
+
+ # GUI elements can be assigned to variables or even
+ # attributes (e.g. self.DOIContentLineEdit) if
+ # they must be referred to elsewhere, e.g., to enable
+ # or disable them, etc. It is not the case below.
+ gui.lineEdit(
+ widget=sourceBox,
+ master=self,
+ value="DOIContent",
+ orientation="horizontal",
+ label="DOI:",
+ labelWidth=130,
+ # self.sendButton.settingsChanged should be used in
+ # cases where using a GUI element should result
+ # in sending data to output. If it should result in
+ # other operations being done, use a custom method
+ # instead, and at the end of it, if data should be
+ # sent to output, call self.sendButton.settingsChanged().
+ # If using the GUI element should not result in
+ # anything at that moment, delete the "callback"
+ # parameter.
+ callback=self.sendButton.settingsChanged,
+ tooltip=(
+ "One or more DOIs, separated by commas, "
+ "whose content will be imported."
+ ),
+ )
+
+ # Stretchable vertical spacing between "options"
+ # and Send button etc.
+ gui.rubber(self.controlArea)
+
+ # Draw send button & Info box...
+ self.sendButton.draw()
+ self.infoBox.draw()
+
+ # Send data if needed.
+ self.sendButton.settingsChanged()
+
+ def sendData(self):
+ """Perform every required check and operation
+ before calling the method that does the actual
+ processing.
+ """
+
+ if self.DOIContent == "":
+ # Use mode "warning" when user needs to do some
+ # action or provide some information; use mode "error"
+ # when invalid parameters have been provided;
+ # for notifications that don't require user action,
+ # don't use a mode. Use formulations that emphasize
+ # what should be done rather than what is wrong or
+ # missing.
+ self.infoBox.setText("Please type a valid DOI.",
+ "warning")
+ # Make sure to send None and return if the widget
+ # cannot operate properly at this point.
+ self.send("New segmentation", None)
+ return
+
+ # If the widget creates new LTTL.Input objects (i.e.
+ # if it imports new strings in Textable), make sure to
+ # clear previously created Inputs with this method.
+ self.clearCreatedInputs()
+
+ # Notify processing in infobox. Typically, there should
+ # always be a "processing" step, with optional "pre-
+ # processing" and "post-processing" steps before and
+ # after it. If there are no optional steps, notify
+ # "Preprocessing...".
+ self.infoBox.setText("Step 1/2: Pre-processing...", "warning")
+
+ # Progress bar should be initialized at this point.
+ self.progressBarInit()
+
+ # Create a threaded function to do the actual processing
+ # and specify its arguments (here there are none).
+ threaded_function = partial(
+ self.processData,
+ # argument1,
+ # argument2,
+ # ...
+ )
+
+ # Run the threaded function...
+ self.threading(threaded_function)
+
+ def processData(self):
+ """Actual processing takes place in this method,
+ which is run in a worker thread so that GUI stays
+ responsive and operations can be cancelled
+ """
+
+ # At start of processing, set progress bar to 1%.
+ # Within this method, this is done using the following
+ # instruction.
+ self.signal_prog.emit(1, False)
+
+ DOIList = [doi.strip() for doi in self.DOIContent.split(",")]
+ #DOIList.append(self.DOIContent)
+
+ # Indicate the total number of iterations that the
+ # progress bar will go through (e.g. number of input
+ # segments, number of selected files, etc.), then
+ # set current iteration to 1.
+ max_itr = len(DOIList)
+ cur_itr = 1
+
+ # Check that Sci-Hub is reachable before downloading
+ if not test_scihub_accessible():
+ self.sendNoneToOutputs()
+ self.infoBox.setText("SciHub inaccessible - verify your connexion", 'error')
+ return
+ # Actual processing...
+
+ # For each progress bar iteration...
+ tempdir = tempfile.TemporaryDirectory()
+ for DOI in DOIList:
+
+ # Update progress bar manually...
+ self.signal_prog.emit(int(100*cur_itr/max_itr), False)
+ cur_itr += 1
+
+ # Code added here
+ paper = DOI
+ paper_type = "doi"
+ out = f"{tempdir.name}/{DOIList.index(DOI)}"
+ try:
+ scihub_download(paper, paper_type=paper_type, out=out)
+ except Exception as ex:
+ print(ex)
+ self.sendNoneToOutputs()
+ self.infoBox.setText("An error occurred when downloading", 'error')
+ return
+ # Cancel operation if requested by user...
+ time.sleep(0.00001) # Needed somehow!
+ if self.cancel_operation:
+ self.signal_prog.emit(100, False)
+ return
+
+ # Update infobox and reset progress bar...
+ self.signal_text.emit("Step 2/2: Processing...",
+ "warning")
+ cur_itr = 0
+ self.signal_prog.emit(0, True)
+ for DOI in DOIList:
+ DOIText = ""
+ if os.path.exists(f"{tempdir.name}/{DOIList.index(DOI)}.pdf"):
+ try:
+ with pdfplumber.open(f"{tempdir.name}/{DOIList.index(DOI)}.pdf") as pdf:
+ for page in pdf.pages:
+ self.signal_prog.emit(int(100 * cur_itr / max_itr), False)
+ cur_itr += (1 / len(pdf.pages))
+ DOIText += page.extract_text() or ""
+ except Exception as e:
+ self.sendNoneToOutputs()
+ self.infoBox.setText(f"Error occurred when reading PDF: {str(e)}", 'error')
+ return
+ else:
+ self.sendNoneToOutputs()
+ self.infoBox.setText("Download failed. Please, verify DOI or connexion", 'error')
+ return
+ ########
+
+ # Create an LTTL.Input...
+ if len(DOIList) == 1:
+ # self.captionTitle is the name of the widget,
+ # which will become the label of the output
+ # segmentation.
+ label = self.captionTitle
+ else:
+ label = None # will be set later.
+ print(DOIText)
+ myInput = Input(DOIText, label)
+
+ # Extract the first (and single) segment in the
+ # newly created LTTL.Input and annotate it with
+ # the DOI it was downloaded from.
+ segment = myInput[0]
+ segment.annotations["DOI"] \
+ = DOI
+ # For the annotation to be saved in the LTTL.Input,
+ # the extracted and annotated segment must be re-assigned
+ # to the first (and only) segment of the LTTL.Input.
+ myInput[0] = segment
+
+ # Add the LTTL.Input to self.createdInputs.
+ self.createdInputs.append(myInput)
+
+ # Cancel operation if requested by user...
+ time.sleep(0.00001) # Needed somehow!
+ if self.cancel_operation:
+ self.signal_prog.emit(100, False)
+ return
+ tempdir.cleanup()
+
+
+ # If there's only one LTTL.Input created, it is the
+ # widget's output...
+ if len(DOIList) == 1:
+ return self.createdInputs[0]
+
+ # Otherwise the widget's output is a concatenation...
+ else:
+ return Segmenter.concatenate(
+ caller=self,
+ segmentations=self.createdInputs,
+ label=self.captionTitle,
+ import_labels_as=None,
+ )
+
+ @OWTextableBaseWidget.task_decorator
+ def task_finished(self, f):
+ """All operations following the successful termination
+ of self.processData
+ """
+
+ # Get the result value of self.processData.
+ processed_data = f.result()
+
+ # If it is not None...
+ if processed_data:
+ message = "text sent to output "
+ message = pluralize(message, len(processed_data))
+ """numChars = 0
+ for segment in processed_data:
+ segmentLength = len(Segmentation.get_data(segment.str_index))
+ numChars += segmentLength
+ message += f"({numChars} character@p)."
+ message = pluralize(message, numChars)"""
+ self.infoBox.setText(message)
+ self.send("New segmentation", processed_data)
+
+ # The following method should be copied verbatim in
+ # every Textable widget.
+ def setCaption(self, title):
+ """Register captionTitle changes and send if needed"""
+ if 'captionTitle' in dir(self):
+ changed = title != self.captionTitle
+ super().setCaption(title)
+ if changed:
+ self.cancel() # Cancel current operation
+ self.sendButton.settingsChanged()
+ else:
+ super().setCaption(title)
+
+ # The following two methods should be copied verbatim in
+ # every Textable widget that creates LTTL.Input objects.
+
+ def clearCreatedInputs(self):
+ """Clear created inputs"""
+ for i in self.createdInputs:
+ Segmentation.set_data(i[0].str_index, None)
+ del self.createdInputs[:]
+
+ def onDeleteWidget(self):
+ """Clear created inputs on widget deletion"""
+ self.clearCreatedInputs()
+
+
+def test_scihub_accessible():
+ try:
+ response = requests.get("https://sci-hub.se", timeout=10)
+ return response.status_code == 200
+ except requests.RequestException:
+ return False
+
+if __name__ == '__main__':
+ WidgetPreview(DemoSciHUB).run()
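The per-page extraction loop in `processData` can be isolated into a small function that is easy to test without a real PDF. This is a sketch only: `collect_page_text` and `FakePage` are hypothetical names, the `or ""` guard assumes `extract_text()` may return `None` for a page without a text layer, and pages are joined with a newline here whereas the loop above concatenates them directly.

```python
def collect_page_text(pages):
    """Concatenate the text of page-like objects exposing extract_text()."""
    parts = []
    for page in pages:
        text = page.extract_text()
        parts.append(text or "")  # treat a missing text layer as empty
    return "\n".join(parts)


class FakePage:
    """Hypothetical stand-in for a pdfplumber page, for illustration only."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


pages = [FakePage("Abstract"), FakePage(None), FakePage("References")]
print(collect_page_text(pages))
```

With pdfplumber installed, the same function could presumably be called on `pdf.pages` inside a `with pdfplumber.open(path) as pdf:` block.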
diff --git a/orangecontrib/textable_prototypes/widgets/DemoTextableWidget.py b/orangecontrib/textable_prototypes/widgets/DemoTextableWidget.py
new file mode 100644
index 00000000..30c058bc
--- /dev/null
+++ b/orangecontrib/textable_prototypes/widgets/DemoTextableWidget.py
@@ -0,0 +1,340 @@
+"""
+Class DemoTextableWidget
+Copyright 2025 University of Lausanne
+-----------------------------------------------------------------------------
+This file is part of the Orange3-Textable-Prototypes package.
+
+Orange3-Textable-Prototypes is free software: you can redistribute
+it and/or modify it under the terms of the GNU General Public License
+as published by the Free Software Foundation, either version 3 of the
+License, or (at your option) any later version.
+
+Orange3-Textable-Prototypes is distributed in the hope that it will
+be useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Orange3-Textable-Prototypes. If not, see
+ .
+"""
+
+__version__ = '0.0.1'
+__author__ = "Aris Xanthos"
+__maintainer__ = "Aris Xanthos"
+__email__ = "aris.xanthos@unil.ch"
+
+
+from functools import partial
+import time
+
+from _textable.widgets.TextableUtils import (
+ OWTextableBaseWidget, VersionedSettingsHandler, ProgressBar,
+ InfoBox, SendButton, pluralize, Task
+)
+
+from LTTL.Segmentation import Segmentation
+from LTTL.Input import Input
+
+# Using the threaded version of LTTL.Segmenter to create
+# a "responsive" widget.
+import LTTL.SegmenterThread as Segmenter
+
+from Orange.widgets import widget, gui, settings
+from Orange.widgets.utils.widgetpreview import WidgetPreview
+
+
+class DemoTextableWidget(OWTextableBaseWidget):
+ """Demo Orange3-Textable widget"""
+
+ name = "Demo widget"
+ description = "Illustrates common code behind Textable widgets"
+ icon = "icons/someIcon.svg"
+ priority = 99
+
+ # Input and output channels (remove if not needed)...
+ inputs = [("Segmentation", Segmentation, "inputData")]
+ outputs = [("New segmentation", Segmentation)]
+
+ # Copied verbatim in every Textable widget to facilitate
+ # settings management.
+ settingsHandler = VersionedSettingsHandler(
+ version=__version__.rsplit(".", 1)[0]
+ )
+
+ # Settings...
+ segmentContent = settings.Setting("sample text")
+ numberOfSegments = settings.Setting("10")
+
+ want_main_area = False
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ # Attributes...
+ self.inputSegmentationLength = 0
+
+ # The following attribute is required by every widget
+ # that imports new strings into Textable.
+ self.createdInputs = list()
+
+ self.infoBox = InfoBox(widget=self.controlArea)
+ self.sendButton = SendButton(
+ widget=self.controlArea,
+ master=self,
+ callback=self.sendData,
+ cancelCallback=self.cancel_manually,
+ infoBoxAttribute="infoBox",
+ )
+
+ # GUI...
+
+ # Top-level GUI boxes are created using method
+ # create_widgetbox(), so that they are automatically
+ # enabled/disabled when processes are running.
+ optionsBox = self.create_widgetbox(
+ box=u'Options',
+ orientation='vertical',
+ addSpace=False,
+ )
+
+ # GUI elements can be assigned to variables or even
+ # attributes (e.g. self.segmentContentLineEdit) if
+ # they must be referred to elsewhere, e.g., to enable
+ # or disable them, etc. It is not the case below.
+ gui.lineEdit(
+ widget=optionsBox,
+ master=self,
+ value="segmentContent",
+ orientation="horizontal",
+ label="Segment text:",
+ labelWidth=130,
+ # self.sendButton.settingsChanged should be used in
+ # cases where using a GUI element should result
+ # in sending data to output. If it should result in
+ # other operations being done, use a custom method
+ # instead, and at the end of it, if data should be
+ # sent to output, call self.sendButton.settingsChanged().
+ # If using the GUI element should not result in
+ # anything at that moment, delete the "callback"
+ # parameter.
+ callback=self.sendButton.settingsChanged,
+ tooltip=(
+ "A string that defines the content "
+ "of each segment."
+ ),
+ )
+
+ gui.comboBox(
+ widget=optionsBox,
+ master=self,
+ value="numberOfSegments",
+ items=["1", "10", "100", "1000", "10000"],
+ sendSelectedValue=True,
+ orientation='horizontal',
+ label="Number of segments:",
+ labelWidth=130,
+ callback=self.sendButton.settingsChanged,
+ tooltip="Number of segments to create.",
+ )
+
+ # Stretchable vertical spacing between "options"
+ # and Send button etc.
+ gui.rubber(self.controlArea)
+
+ # Draw send button & Info box...
+ self.sendButton.draw()
+ self.infoBox.draw()
+
+ # Send data if needed.
+ self.sendButton.settingsChanged()
+
+ def inputData(self, segmentation):
+ """Handle segmentation on input connection"""
+
+ # If the input is None and it is needed for the widget
+ # to operate, send None to output(s) then return.
+ # Here, the widget can still operate without input.
+ if segmentation is None:
+ self.inputSegmentationLength = 0
+ else:
+ self.inputSegmentationLength = len(segmentation)
+
+ # Display the standard message for "input changed".
+ self.infoBox.inputChanged()
+
+ def sendData(self):
+ """Perform every required check and operation
+ before calling the method that does the actual
+ processing.
+ """
+
+ if self.segmentContent == "":
+ # Use mode "warning" when user needs to do some
+ # action or provide some information; use mode "error"
+ # when invalid parameters have been provided;
+ # for notifications that don't require user action,
+ # don't use a mode. Use formulations that emphasize
+ # what should be done rather than what is wrong or
+ # missing.
+ self.infoBox.setText("Please type segment content.",
+ "warning")
+ # Make sure to send None and return if the widget
+ # cannot operate properly at this point.
+ self.send("New segmentation", None)
+ return
+
+ # If the widget creates new LTTL.Input objects (i.e.
+ # if it imports new strings in Textable), make sure to
+ # clear previously created Inputs with this method.
+ self.clearCreatedInputs()
+
+ # Notify processing in infobox. Typically, there should
+ # always be a "processing" step, with optional "pre-
+ # processing" and "post-processing" steps before and
+ # after it. If there are no optional steps, notify
+ # "Preprocessing...".
+ self.infoBox.setText("Step 1/2: Processing...", "warning")
+
+ # Progress bar should be initialized at this point.
+ self.progressBarInit()
+
+ # Create a threaded function to do the actual processing
+ # and specify its arguments (here there are none).
+ threaded_function = partial(
+ self.processData,
+ # argument1,
+ # argument2,
+ # ...
+ )
+
+ # Run the threaded function...
+ self.threading(threaded_function)
+
+ def processData(self):
+ """Actual processing takes place in this method,
+ which is run in a worker thread so that GUI stays
+ responsive and operations can be cancelled
+ """
+
+ # At start of processing, set progress bar to 1%.
+ # Within this method, this is done using the following
+ # instruction.
+ self.signal_prog.emit(1, False)
+
+ # Indicate the total number of iterations that the
+ # progress bar will go through (e.g. number of input
+ # segments, number of selected files, etc.), then
+ # set current iteration to 1.
+ max_itr = int(self.numberOfSegments)
+ cur_itr = 1
+
+ # Actual processing...
+
+ # For each progress bar iteration...
+ for _ in range(int(self.numberOfSegments)):
+
+ # Update progress bar manually...
+ self.signal_prog.emit(int(100*cur_itr/max_itr), False)
+ cur_itr += 1
+
+ # Create an LTTL.Input...
+ if int(self.numberOfSegments) == 1:
+ # self.captionTitle is the name of the widget,
+ # which will become the label of the output
+ # segmentation.
+ label = self.captionTitle
+ else:
+ label = None # will be set later.
+ myInput = Input(self.segmentContent, label)
+
+ # Extract the first (and single) segment in the
+ # newly created LTTL.Input and annotate it with
+ # the length of the input segmentation.
+ segment = myInput[0]
+ segment.annotations["demo_annotation"] \
+ = self.inputSegmentationLength
+ # For the annotation to be saved in the LTTL.Input,
+ # the extracted and annotated segment must be re-assigned
+ # to the first (and only) segment of the LTTL.Input.
+ myInput[0] = segment
+
+ # Add the LTTL.Input to self.createdInputs.
+ self.createdInputs.append(myInput)
+
+ # Cancel operation if requested by user...
+ time.sleep(0.00001) # Needed somehow!
+ if self.cancel_operation:
+ self.signal_prog.emit(100, False)
+ return
+
+ # Update infobox and reset progress bar...
+ self.signal_text.emit("Step 2/2: Post-processing...",
+ "warning")
+ self.signal_prog.emit(1, True)
+
+ # If there's only one LTTL.Input created, it is the
+ # widget's output...
+ if int(self.numberOfSegments) == 1:
+ return self.createdInputs[0]
+
+ # Otherwise the widget's output is a concatenation...
+ else:
+ return Segmenter.concatenate(
+ caller=self,
+ segmentations=self.createdInputs,
+ label=self.captionTitle,
+ import_labels_as=None,
+ )
+
+ @OWTextableBaseWidget.task_decorator
+ def task_finished(self, f):
+ """All operations following the successful termination
+ of self.processData
+ """
+
+ # Get the result value of self.processData.
+ processed_data = f.result()
+
+ # If it is not None...
+ if processed_data:
+ message = f"{len(processed_data)} segment@p sent to output "
+ message = pluralize(message, len(processed_data))
+ numChars = 0
+ for segment in processed_data:
+ segmentLength = len(Segmentation.get_data(segment.str_index))
+ numChars += segmentLength
+ message += f"({numChars} character@p)."
+ message = pluralize(message, numChars)
+ self.infoBox.setText(message)
+ self.send("New segmentation", processed_data)
+
+ # The following method should be copied verbatim in
+ # every Textable widget.
+ def setCaption(self, title):
+ """Register captionTitle changes and send if needed"""
+ if 'captionTitle' in dir(self):
+ changed = title != self.captionTitle
+ super().setCaption(title)
+ if changed:
+ self.cancel() # Cancel current operation
+ self.sendButton.settingsChanged()
+ else:
+ super().setCaption(title)
+
+ # The following two methods should be copied verbatim in
+ # every Textable widget that creates LTTL.Input objects.
+
+ def clearCreatedInputs(self):
+ """Clear created inputs"""
+ for i in self.createdInputs:
+ Segmentation.set_data(i[0].str_index, None)
+ del self.createdInputs[:]
+
+ def onDeleteWidget(self):
+ """Clear created inputs on widget deletion"""
+ self.clearCreatedInputs()
+
+
+if __name__ == '__main__':
+ WidgetPreview(DemoTextableWidget).run()
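The loop in `processData` combines three concerns: doing the work, reporting a percentage, and stopping early when the user cancels. The sketch below shows that pattern outside Orange. It is an illustration under stated assumptions: `process_items` is a hypothetical function, a `threading.Event` stands in for the widget's `cancel_operation` flag, and a plain callback stands in for `signal_prog.emit`.

```python
import threading


def process_items(items, cancel_event, report_progress):
    """Upper-case each item, reporting progress and honouring cancellation."""
    results = []
    total = len(items)
    for index, item in enumerate(items, start=1):
        if cancel_event.is_set():
            return None  # mirror the template: a cancelled run yields no output
        results.append(item.upper())
        report_progress(int(100 * index / total))
    return results


progress = []
cancel = threading.Event()
print(process_items(["a", "b"], cancel, progress.append), progress)
```

Returning `None` on cancellation matches how `task_finished` only sends data when the result is truthy.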
diff --git a/orangecontrib/textable_prototypes/widgets/SciHubator.py b/orangecontrib/textable_prototypes/widgets/SciHubator.py
new file mode 100644
index 00000000..8ac497cd
--- /dev/null
+++ b/orangecontrib/textable_prototypes/widgets/SciHubator.py
@@ -0,0 +1,585 @@
+"""
+Class SciHubator
+Copyright 2020-2025 University of Lausanne
+-----------------------------------------------------------------------------
+This file is part of the Orange3-Textable-Prototypes package and based on the
+file OWTextableTextFiles of the Orange3-Textable package.
+
+Orange3-Textable-Prototypes is free software: you can redistribute it
+and/or modify it under the terms of the GNU General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+Orange3-Textable-Prototypes is distributed in the hope that it will be
+useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Orange-Textable-Prototypes. If not, see
+.
+"""
+
+__version__ = "0.0.1"
+__author__ = "Sarah Peretti-Poix, Borgeaud Matthias, Chétioui Orsowen, Luginbühl Colin"
+__maintainer__ = "Aris Xanthos"
+__email__ = "aris.xanthos@unil.ch"
+
+# Standard imports...
+import re
+import time
+import tempfile
+import os
+
+from functools import partial
+import pdfplumber
+import requests
+from scidownl import scihub_download
+from _textable.widgets.TextableUtils import (
+ OWTextableBaseWidget,
+ InfoBox, SendButton, pluralize
+)
+import LTTL.SegmenterThread as Segmenter
+from LTTL.Segmenter import tokenize
+from LTTL.Segmentation import Segmentation
+from LTTL.Input import Input
+from Orange.widgets import gui, settings
+from Orange.widgets.utils.widgetpreview import WidgetPreview
+from Orange.widgets.settings import Setting
+from PyQt5.QtWidgets import QMessageBox
+
+class SciHubator(OWTextableBaseWidget):
+ """
+ Orange widget for importing and segmenting text from DOIs using Sci-Hub.
+
+ Attributes :
+ URLLabel (list) : List of labels for the DOIs.
+ selectedURLLabel (list) : List of selected labels from the URL list.
+ newDOI (str) : DOI entered by the user for addition.
+ extractedText (str) : Extracted text from the downloaded PDF
+ DOI (str) : Single DOI value.
+ DOIs (list) : List of DOIs added by the user
+ createdInputs (list) : List of created LTTL.Inputs
+ """
+
+ # Minimal version
+
+ # ----------------------------------------------------------------------
+ # Widget's metadata...
+
+ name = "Sci-Hubator"
+ description = "Export a text segmentation from a DOI or URL"
+ icon = "icons/scihubator.svg"
+ priority = 10
+
+ # ----------------------------------------------------------------------
+ # Channel definitions (NB: no input in this case)...
+
+ outputs = [('Segmentation', Segmentation)]
+
+ # ----------------------------------------------------------------------
+ # GUI layout parameters...
+
+ want_main_area = False
+ resizing_enabled = True
+
+ # ----------------------------------------------------------------------
+ # Settings declaration and initializations (default values)...
+
+ DOIs = Setting([])
+ encoding = Setting('(auto-detect)')
+ autoNumber = Setting(False)
+ autoNumberKey = Setting('num')
+ autoSend = settings.Setting(False)
+ importDOIs = Setting(True)
+ importDOIsKey = Setting('url')
+ lastLocation = Setting('.')
+ DOI = Setting('')
+
+ # The settings below were not copied from another widget; they are specific to SciHubator.
+ importAllorBib = Setting(0)
+
+ def __init__(self):
+ """
+ Initializes the SciHubator widget, including the GUI components and settings
+ """
+ super().__init__()
+ self.URLLabel = self.DOIs[:]
+ self.selectedURLLabel = []
+ self.newDOI = ''
+ self.extractedText = ''
+ self.DOI = ''
+ self.createdInputs = []
+
+ self.infoBox = InfoBox(widget=self.controlArea)
+ self.sendButton = SendButton(
+ widget=self.controlArea,
+ master=self,
+ callback=self.sendData,
+ cancelCallback=self.cancel_manually,
+ infoBoxAttribute="infoBox",
+ )
+ # ----------------------------------------------------------------------
+ # User interface...
+
+ # ADVANCED GUI...
+
+ # URL box
+ URLBox = gui.widgetBox(
+ widget=self.controlArea,
+ box='Sources',
+ orientation='vertical',
+ addSpace=False,
+ )
+ URLBoxLine1 = gui.widgetBox(
+ widget=URLBox,
+ box=False,
+ orientation='horizontal',
+ addSpace=True,
+ )
+ self.fileListbox = gui.listBox(
+ widget=URLBoxLine1,
+ master=self,
+ value='selectedURLLabel',
+ labels='URLLabel',
+ callback=self.updateURLBoxButtons,
+ tooltip=(
+ "The list of DOIs whose content will be imported.\n"
+ "\nIn the output segmentation, the content of each\n"
+ "DOI appears in the same position as in the list.\n"
+ ),
+ )
+ URLBoxCol2 = gui.widgetBox(
+ widget=URLBoxLine1,
+ orientation='vertical',
+ )
+ self.removeButton = gui.button(
+ widget=URLBoxCol2,
+ master=self,
+ label='Remove',
+ callback=self.remove,
+ tooltip=(
+ "Remove the selected DOI from the list."
+ ),
+ disabled = True,
+ )
+ self.clearAllButton = gui.button(
+ widget=URLBoxCol2,
+ master=self,
+ label='Clear All',
+ callback=self.clearAll,
+ tooltip=(
+ "Remove all DOIs from the list."
+ ),
+ disabled = True,
+ )
+ URLBoxLine2 = gui.widgetBox(
+ widget=URLBox,
+ box=False,
+ orientation='vertical',
+ )
+ # Add URL box
+ addURLBox = gui.widgetBox(
+ widget=URLBoxLine2,
+ box=True,
+ orientation='vertical',
+ addSpace=False,
+ )
+ gui.lineEdit(
+ widget=addURLBox,
+ master=self,
+ value='newDOI',
+ orientation='horizontal',
+ label='DOI(s):',
+ labelWidth=101,
+ callback=self.updateURLBoxButtons,
+ tooltip=(
+ "The DOI(s) that will be added to the list when\n"
+ "button 'Add' is clicked.\n\n"
+ "Successive DOIs must be separated by commas.\n"
+ "Their order in the list will be the same as\n"
+ "in this field."
+ ),
+ )
+ advOptionsBox = gui.widgetBox(
+ widget=self.controlArea,
+ box='Options',
+ orientation='vertical',
+ addSpace=False,
+ )
+ gui.separator(widget=advOptionsBox, height=3)
+ gui.radioButtonsInBox(
+ widget=advOptionsBox,
+ master=self,
+ value='importAllorBib',
+ btnLabels=['All in one Segment', 'Bibliography'],
+ label='Choose what to import',
+ callback=self.sendButton.settingsChanged,
+ tooltips=[
+ "Import all article's content in one segment", "Import only bibliography (if found)"
+ ]
+ )
+ gui.separator(widget=addURLBox, height=3)
+ self.addButton = gui.button(
+ widget=addURLBox,
+ master=self,
+ label='Add',
+ callback=self.add,
+ tooltip=(
+ "Add the DOI(s) currently displayed in the 'DOI'\n"
+ "text field to the list."
+ ),
+ disabled = True,
+ )
+ gui.rubber(self.controlArea)
+ self.URLLabel = self.URLLabel
+ self.updateURLBoxButtons()
+ self.sendButton.draw()
+ self.infoBox.draw()
+ self.sendButton.sendIf()
+
+ def sendData(self):
+ """
+ Trigger the data processing workflow from user-provided DOIs.
+
+ This method:
+ - Validates the presence of at least one DOI.
+ - Displays a warning if no DOI is provided.
+ - Clears any previously created inputs.
+ - Updates the UI to indicate the start of preprocessing.
+ - Launches the processing asynchronously using a background thread
+ """
+ # Verify DOIs
+ if not self.DOIs:
+ self.infoBox.setText("Please enter one or more valid DOIs.", "warning")
+ self.send("Segmentation", None)
+ return
+
+ self.clearCreatedInputs()
+
+ # Notify processing in infobox. Typically, there should
+ # always be a "processing" step, with optional "pre-
+ # processing" and "post-processing" steps before and
+ # after it. If there are no optional steps, notify
+ # "Preprocessing...".
+ self.infoBox.setText("Step 1/3: Pre-processing...", "warning")
+
+ # Progress bar should be initialized at this point.
+ self.progressBarInit()
+
+ # Create a threaded function to do the actual processing
+ # and specify its arguments (here there are none).
+ threaded_function = partial(
+ self.processData,
+ # argument1,
+ # argument2,
+ # ...
+ )
+
+ # Run the threaded function...
+ self.threading(threaded_function)
+
+ def processData(self):
+ """
+ Download and process academic articles from DOIs using Sci-Hub.
+
+ This method handles the full pipeline for downloading PDFs via Sci-Hub,
+ extracting their text content, and converting them into LTTL-compatible
+ input segmentations.
+
+ Steps:
+ 1. Verifies Sci-Hub accessibility.
+ 2. Downloads PDFs for each DOI.
+ 3. Extracts text from each PDF using pdfplumber.
+ 4. Wraps extracted text into LTTL.Inputs with DOI annotations.
+ 5. Concatenates inputs if multiple DOIs are processed.
+
+ Returns :
+ Segmentation: A single or concatenated segmentation(s) ready for output.
+
+ Raises:
+ Emits error messages and halts processing if:
+ - Sci-Hub is unreachable.
+ - A download fails.
+ - A PDF cannot be parsed.
+ """
+
+ # At start of processing, set progress bar to 1%.
+ # Within this method, this is done using the following
+ # instruction.
+ self.signal_prog.emit(1, False)
+
+ # DOIList.append(self.DOIContent)
+
+ # Indicate the total number of iterations that the
+ # progress bar will go through (e.g. number of input
+ # segments, number of selected files, etc.), then
+ # set current iteration to 1.
+ max_itr = len(self.DOIs)
+ cur_itr = 1
+
+ # Check that Sci-Hub is reachable before downloading anything.
+ if not test_scihub_accessible():
+ self.sendNoneToOutputs()
+ self.infoBox.setText("Sci-Hub is unreachable. Please check your connection.", 'error')
+ return
+ # Actual processing...
+
+ # For each progress bar iteration...
+ tempdir = tempfile.TemporaryDirectory()
+ for DOI in self.DOIs:
+
+ # Update progress bar manually...
+ self.signal_prog.emit(int(100 * cur_itr / max_itr), False)
+ cur_itr += 1
+
+ # Download the PDF for this DOI into the temporary directory.
+ paper = DOI
+ paper_type = "doi"
+ out = f"{tempdir.name}/{self.DOIs.index(DOI)}"
+ try:
+ scihub_download(paper, paper_type=paper_type, out=out)
+ except Exception as ex:
+ print(ex)
+ self.sendNoneToOutputs()
+ self.infoBox.setText("An error occurred when downloading", 'error')
+ return
+ # Cancel operation if requested by user...
+ time.sleep(0.00001) # Needed somehow!
+ if self.cancel_operation:
+ self.signal_prog.emit(100, False)
+ return
+
+ # Update infobox and reset progress bar...
+ self.signal_text.emit("Step 2/3: Processing...",
+ "warning")
+ cur_itr = 0
+ cur_itr_p3 = 0
+ self.signal_prog.emit(0, True)
+ empty_re = False
+ for DOI in self.DOIs:
+ DOIText = ""
+ if os.path.exists(f"{tempdir.name}/{self.DOIs.index(DOI)}.pdf"):
+ try:
+ with pdfplumber.open(f"{tempdir.name}/{self.DOIs.index(DOI)}.pdf") as pdf:
+ for page in pdf.pages:
+ self.signal_prog.emit(int(100 * cur_itr / max_itr), False)
+ cur_itr += (1 / len(pdf.pages))
+ # Guard against pages for which no text is returned.
+ DOIText += page.extract_text() or ""
+ except Exception as e:
+ self.sendNoneToOutputs()
+ self.infoBox.setText(f"Error occurred when reading PDF: {str(e)}", 'error')
+ return
+ else:
+ self.sendNoneToOutputs()
+ self.infoBox.setText("Download failed. Please check the DOI or your connection.", 'error')
+ return
+ ########
+
+ # Create an LTTL.Input...
+ if len(self.DOIs) == 1:
+ # self.captionTitle is the name of the widget,
+ # which will become the label of the output
+ # segmentation.
+ label = self.captionTitle
+ else:
+ label = None # will be set later.
+
+ myInput = Input(DOIText, label)
+
+ self.signal_text.emit("Step 3/3: Post-processing...",
+ "warning")
+ max_itr = 2*len(self.DOIs) #+ int(self.importText)
+ if self.importAllorBib == 0:
+ cur_itr_p3 += 1
+ # Extract the first (and single) segment in the
+ # newly created LTTL.Input and annotate it with
+ # the DOI it was downloaded from.
+ segment = myInput[0]
+ segment.annotations["DOI"] = DOI
+ # For the annotation to be saved in the LTTL.Input,
+ # the extracted and annotated segment must be re-assigned
+ # to the first (and only) segment of the LTTL.Input.
+ myInput[0] = segment
+ # Add the LTTL.Input to self.createdInputs.
+ self.createdInputs.append(myInput)
+ if self.importAllorBib == 1:
+ cur_itr_p3 += 1
+ ma_regex = re.compile(r'(?<=\n)\n?(([Bb]iblio|[Rr][eé]f)\w*\W*\n)(.|\n)*')
+ regexes = [(ma_regex, 'tokenize')]
+ self.signal_prog.emit(int(100 * cur_itr_p3 / max_itr), False)
+ new_segmentation = tokenize(myInput, regexes)
+ if len(new_segmentation) == 0:
+ empty_re = True
+ new_input = Input(
+ f"No bibliography found for DOI: {DOI}", "Empty Bibliography section"
+ )
+ else:
+ new_input = Input(new_segmentation.to_string(), "Bibliographies")
+ segment = new_input[0]
+ segment.annotations["part"] = "Bibliography"
+ segment.annotations["DOI"] = DOI
+ new_input[0] = segment
+ self.createdInputs.append(new_input)
+
+ # Cancel operation if requested by user...
+ time.sleep(0.00001) # Needed somehow!
+ if self.cancel_operation:
+ self.signal_prog.emit(100, False)
+ return
+ tempdir.cleanup()
+
+
+ # Warn the user if no bibliography section was found for
+ # at least one DOI...
+ if empty_re:
+ QMessageBox.warning(
+ None, "SciHubator", "Not all sections were segmented",
+ QMessageBox.Ok
+ )
+ # If there's only one LTTL.Input created, it is the
+ # widget's output...
+ if len(self.DOIs) == 1:
+ return self.createdInputs[0]
+ # Otherwise the widget's output is a concatenation...
+ return Segmenter.concatenate(
+ caller=self,
+ segmentations=self.createdInputs,
+ label=self.captionTitle,
+ import_labels_as=None,
+ )
+
+ @OWTextableBaseWidget.task_decorator
+ def task_finished(self, f):
+ """
+ Handle the output after asynchronous DOI processing is complete.
+
+ This method :
+ - Retrieves the result of the processing task.
+ - Counts the segments in the result.
+ - Displays an informative message to the user.
+ - Sends the processed data to the output.
+
+ Args :
+ f (Future): A Future object containing the result from `processData`.
+
+ """
+
+ # Get the result value of self.processData.
+ processed_data = f.result()
+
+ # If it is not None...
+ if processed_data:
+ message = f"{len(processed_data)} segment@p sent to output "
+ message = pluralize(message, len(processed_data))
+ self.infoBox.setText(message)
+ self.send("Segmentation", processed_data)
+
+ # The following method should be copied verbatim in
+ # every Textable widget.
+ def setCaption(self, title):
+ """
+ Set or update the widget's caption title.
+
+ If the caption has changed, it triggers cancellation of ongoing tasks
+ and marks the settings as changed to prompt UI updates.
+
+ Args :
+ title (str): The new caption/title to be displayed on the widget.
+ """
+ if 'captionTitle' in dir(self):
+ changed = title != self.captionTitle
+ super().setCaption(title)
+ if changed:
+ self.cancel() # Cancel current operation
+ self.sendButton.settingsChanged()
+ else:
+ super().setCaption(title)
+
+ def clearAll(self):
+ """
+ Clear all stored DOIs and reset related UI elements.
+
+ This method empties the DOI list and selection,
+ disables the 'Clear All' button,
+ and updates the interface state.
+ """
+ del self.DOIs[:]
+ del self.selectedURLLabel[:]
+ self.sendButton.settingsChanged()
+ self.URLLabel = self.DOIs
+ self.clearAllButton.setDisabled(True)
+ self.removeButton.setDisabled(True)
+
+ def remove(self):
+ """
+ Remove the selected DOI from the list.
+
+ Removes the DOI corresponding to the currently selected index in the GUI,
+ updates the list of DOIs and labels, and disables the clear button if the
+ list is empty.
+ """
+ if self.selectedURLLabel:
+ index = self.selectedURLLabel[0]
+ self.DOIs.pop(index)
+ del self.selectedURLLabel[:]
+ self.sendButton.settingsChanged()
+ self.URLLabel = self.URLLabel
+ self.clearAllButton.setDisabled(not bool(self.URLLabel))
+
+ def add(self):
+ """
+ Add new DOI(s) from the input field to the list.
+
+ Parses the input string for comma-separated DOIs, adds them to the internal list,
+ removes duplicates if any, updates the display labels, and enables relevant UI buttons.
+ Shows a message box if duplicates are found and removed.
+ """
+ DOIList = [x.strip() for x in self.newDOI.strip().split(',')]
+
+ for DOI in DOIList:
+ self.DOIs.append(DOI)
+ if self.DOIs:
+ tempSet = set(self.DOIs)
+ if len(tempSet)
+
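The parsing step that the `add` docstring describes (split on commas, strip whitespace, drop duplicates) can be sketched in isolation as follows. This is only an illustration: the helper name `parse_dois` is hypothetical and not part of the widget, and skipping empty entries and preserving the original order are assumptions about the desired behavior.

```python
def parse_dois(raw, existing=()):
    """Split a comma-separated DOI string (hypothetical helper).

    Returns a pair (new_dois, duplicates): DOIs not seen before, in the
    order they were typed, and DOIs that were already present.
    """
    seen = set(existing)
    new_dois, duplicates = [], []
    for item in raw.split(","):
        doi = item.strip()
        if not doi:
            # Assumption: empty entries (e.g. a trailing comma) are ignored.
            continue
        if doi in seen:
            duplicates.append(doi)
        else:
            seen.add(doi)
            new_dois.append(doi)
    return new_dois, duplicates


new_dois, duplicates = parse_dois("10.1000/a, 10.1000/b, 10.1000/a")
print(new_dois)     # DOIs to append to the list
print(duplicates)   # DOIs to report in a message box
```

Tracking duplicates separately, rather than passing the list through `set`, keeps the list order stable, which matters because the tooltip promises that the order in the list matches the order in the field.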
diff --git a/specs/Sci-Hubator.rst b/specs/Sci-Hubator.rst
new file mode 100644
index 00000000..3228770f
--- /dev/null
+++ b/specs/Sci-Hubator.rst
@@ -0,0 +1,117 @@
+################################
+Sci-Hubator widget specification
+################################
+
+1 Introduction
+**************
+
+1.1 Project goal
+================
+Create a widget for Orange Textable (v3.2.2) for importing and extracting corpora from `Sci-HUB `_
+
+1.2 Overview of milestones
+==========================
+* First draft of the specification: 13.03.2025
+* Submission of the specification: 20.03.2025
+* Alpha version of the project: 17.04.2025
+* Final version of the project: 22.05.2025
+
+1.3 Team and responsibilities
+=============================
+
+* Luginbühl Colin (`colin.luginbuhl@unil.ch`_):
+
+.. _colin.luginbuhl@unil.ch: mailto:colin.luginbuhl@unil.ch
+
+ - Specification
+ - Data extraction
+ - Code
+ - Documentation
+
+* Borgeaud Matthias (`matthias.borgeaud@unil.ch`_):
+
+.. _matthias.borgeaud@unil.ch: mailto:matthias.borgeaud@unil.ch
+
+ - Specification
+ - Code
+ - Documentation
+ - Spell checking
+
+* Peretti-Poix Sarah (`sarah.peretti-poix@unil.ch`_):
+
+.. _sarah.peretti-poix@unil.ch: mailto:sarah.peretti-poix@unil.ch
+
+ - Specification
+ - GitHub
+ - Code
+ - Debugging
+
+* Chétioui Orsowen (`orsowen.chetioui@unil.ch`_):
+
+.. _orsowen.chetioui@unil.ch: mailto:orsowen.chetioui@unil.ch
+
+ - Documentation
+ - Code
+ - Debugging
+
+2. Technical aspects
+********************
+
+2.1 Dependencies
+================
+* Orange 3.38.1
+* Orange Textable 3.2.2
+* `scidownl `_ 1.0.2
+* `pdfplumber `_ 0.11.6 (already included for SuperTextFiles)
+
+
+2.2 Minimal features
+====================
+
+.. image:: images/scihubator_minimal.png
+
+* import PDF files from SCI-HUB by DOI and extract their textual content.
+* create and emit a segmentation with a single segment (=Input) containing the full text of the PDF.
+
+2.3 Main features
+=================
+
+.. image:: images/scihubator_principal_specs.png
+
+* import PDF files from SCI-HUB (given a DOI).
+* extract their text.
+* build a selection of multiple corpora (add/remove/clear).
+* create and emit a segmentation with one segment (=Input)
+  for each part of the imported corpus (abstract, bibliography...).
+* handle references correctly
+
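The bibliography part mentioned in the list above can be illustrated with the standard `re` module alone. The pattern below is the one used in the widget's `processData` method; the helper name `extract_bibliography` and the sample text are hypothetical, and the sketch returns the raw match rather than going through LTTL's `tokenize`. Treat it as a rough illustration of the current heuristic, not a complete treatment of references.

```python
import re

# Pattern copied from the widget's processData method: a line made of a
# single word starting with "Biblio"/"biblio" or "Ref"/"Réf"/"ref"/"réf",
# followed by everything up to the end of the text.
BIB_PATTERN = re.compile(r'(?<=\n)\n?(([Bb]iblio|[Rr][eé]f)\w*\W*\n)(.|\n)*')


def extract_bibliography(text):
    """Return the text from the bibliography heading onward, or None."""
    match = BIB_PATTERN.search(text)
    return match.group(0) if match else None


# Hypothetical sample standing in for text extracted from a PDF.
sample = (
    "Title\n"
    "Body text.\n"
    "References\n"
    "[1] A. Author. Paper.\n"
    "[2] B. Author. Book.\n"
)
print(extract_bibliography(sample))
```

Because of the `(?<=\n)` lookbehind, a heading on the very first line of the text is never matched, and headings in other languages or with other wording (for example "Works cited") are missed as well.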
+2.4 Optional features
+=====================
+* create and emit a segmentation by topic.
+* create and emit a summary/abstract.
+* create and emit a cross-reference table.
+* import a JSON file containing several DOIs.
+
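The specification does not define the layout of the JSON file mentioned in the last optional feature. As a minimal sketch, assuming the simplest possible layout (a top-level JSON array of DOI strings), the import could look like this. The function name `load_dois_from_json` is hypothetical, and silently skipping non-string or empty entries is an assumption.

```python
import json


def load_dois_from_json(json_text):
    """Parse a JSON array of DOI strings (assumed layout, see above).

    Non-string and empty entries are skipped; duplicates are dropped
    while keeping the first occurrence.
    """
    data = json.loads(json_text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of DOI strings")
    dois = []
    for item in data:
        if not isinstance(item, str):
            continue
        doi = item.strip()
        if doi and doi not in dois:
            dois.append(doi)
    return dois


print(load_dois_from_json('["10.1000/a", " 10.1000/b ", "10.1000/a"]'))
```

The resulting list could then be fed to the same add/remove/clear logic as DOIs typed by hand.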
+2.5 Tests
+=========
+
+TODO
+
+3. Milestones
+*************
+
+3.1 Alpha version
+=================
+* The graphical interface is fully built.
+* The minimal features are supported by the software.
+
+3.2 Submission and presentation
+===============================
+* The main features are fully supported by the software.
+* The software documentation is complete.
+
+
+4. Infrastructure
+*****************
+The project is available on GitHub at
+https://github.com/sarahperettipoix/orange3-textable-prototypes
diff --git a/specs/images/scihubator_minimal.png b/specs/images/scihubator_minimal.png
new file mode 100644
index 00000000..ab8ceaf0
Binary files /dev/null and b/specs/images/scihubator_minimal.png differ
diff --git a/specs/images/scihubator_principal.png b/specs/images/scihubator_principal.png
new file mode 100644
index 00000000..f3b3e84e
Binary files /dev/null and b/specs/images/scihubator_principal.png differ
diff --git a/specs/images/scihubator_principal_specs.png b/specs/images/scihubator_principal_specs.png
new file mode 100644
index 00000000..59d95c77
Binary files /dev/null and b/specs/images/scihubator_principal_specs.png differ