103 commits
b54ca8b
Create a test script for Sci-Hubator in order to build the GUI…
ColinLug Mar 6, 2025
347209b
GUI for the advanced settings, but it does not work! Just the graphi…
ColinLug Mar 6, 2025
b4c6f67
cahier de specification V1 scihubator avec images min et principal
sarahperettipoix Mar 11, 2025
82a241b
GUI for the advanced settings, but it does not work! Just the graphi…
ColinLug Mar 13, 2025
414908f
Update SCi-Hubator.rst
ColinLug Mar 13, 2025
2026476
Update SCi-Hubator.rst
ColinLug Mar 13, 2025
a51961a
Update SCi-Hubator.rst
ColinLug Mar 13, 2025
3796fe3
GUI change as requested by Mr. Xanthos
ColinLug Mar 13, 2025
f58d589
Merge remote-tracking branch 'origin/master'
ColinLug Mar 13, 2025
d527051
Update SCi-Hubator.rst
ColinLug Mar 15, 2025
7407d50
Update SCi-Hubator.rst
ColinLug Mar 15, 2025
081ce4d
Cleaner images
ColinLug Mar 15, 2025
c3f6af2
Merge remote-tracking branch 'origin/master'
ColinLug Mar 15, 2025
a5ba572
Interface corrections (URL was not a clear wording)
ColinLug Mar 15, 2025
96c6ad6
Update SCi-Hubator.rst
ColinLug Mar 15, 2025
025b856
Update SCi-Hubator.rst
ColinLug Mar 15, 2025
473897f
add DemoTextableWidget.py
ColinLug Mar 24, 2025
fa86284
Merge remote-tracking branch 'origin/master'
ColinLug Mar 24, 2025
adba070
Attempt to get the import lists working
ColinLug Mar 24, 2025
9c394bd
The add, remove and clearall buttons work by the magic of the Monster …
ColinLug Mar 24, 2025
a0b159e
The add, remove and clearall buttons work by the magic of the Monster …
ColinLug Mar 24, 2025
e6e9737
minimal GUI version DEMO
sarahperettipoix Mar 27, 2025
22e5467
minimal GUI version DEMO
sarahperettipoix Mar 27, 2025
82bd61b
Update SCi-Hubator.rst
ColinLug Apr 1, 2025
3d71a9f
Update SCi-Hubator.rst
ColinLug Apr 1, 2025
ef6b5ea
Update SCi-Hubator.rst
ColinLug Apr 1, 2025
a351e36
The widget takes a DOI and reads the downloaded pdf file
ColinLug Apr 3, 2025
af0b06e
minimal GUI version DEMO
sarahperettipoix Apr 3, 2025
66462ad
Error handling
ColinLug Apr 3, 2025
f352336
Error handling + progress bar + minimal version finished?
ColinLug Apr 3, 2025
89e515e
minimal GUI version DEMO
sarahperettipoix Apr 10, 2025
aa0b8cf
2 steps + double bar
ColinLug Apr 10, 2025
e2ebba5
preprocess and process
ColinLug Apr 10, 2025
05a7f1b
Error handling and cancel
ColinLug Apr 10, 2025
ee89514
scihub.png and .rst doc made (vcheck all TODOs)
sarahperettipoix Apr 10, 2025
44efd7e
Merge remote-tracking branch 'origin/master'
sarahperettipoix Apr 10, 2025
a27ac6d
made inputs to none
sarahperettipoix Apr 10, 2025
3ce3206
remove debugging
ColinLug Apr 17, 2025
5103f55
Merge remote-tracking branch 'origin/master'
ColinLug Apr 17, 2025
ecf0397
buttons working
ColinLug Apr 17, 2025
bae28a9
remove unused function
ColinLug May 1, 2025
00f1084
regex commented out in DemoSciHub.py
sarahperettipoix May 1, 2025
051ac71
Merge remote-tracking branch 'origin/master'
sarahperettipoix May 1, 2025
c05a619
Update SciHubatorTest.py
Orsoladee May 1, 2025
5a21543
Merge branch 'master' of https://github.com/sarahperettipoix/orange3-…
Orsoladee May 1, 2025
ac7feb3
specs
matBorgeaud May 8, 2025
539acb5
Update SciHubator.rst
matBorgeaud May 8, 2025
5400b1a
Update SciHubator.rst
matBorgeaud May 8, 2025
e210d64
update like demo
ColinLug May 8, 2025
a4fcaea
Merge remote-tracking branch 'origin/master'
ColinLug May 8, 2025
a013424
Update SciHubator.rst
matBorgeaud May 8, 2025
38c1a4d
import cleaned (a bit) and tasked working
ColinLug May 8, 2025
07c8f16
Merge remote-tracking branch 'origin/master'
ColinLug May 8, 2025
eec66e2
Update SciHubator.rst
matBorgeaud May 8, 2025
86e972d
Update SciHubator.rst
matBorgeaud May 8, 2025
9836f3a
Update SciHubator.rst
matBorgeaud May 8, 2025
fe9346e
not sure, but fixed clearAll
ColinLug May 8, 2025
863982b
Merge remote-tracking branch 'origin/master'
ColinLug May 8, 2025
503abb2
updated processing and saving settings
ColinLug May 8, 2025
63c0ea0
Update SciHubator.rst
matBorgeaud May 9, 2025
478eb43
I wrote my name correctly, that's all
sarahperettipoix May 14, 2025
3ba138a
stuff
sarahperettipoix May 15, 2025
373fe5a
Update SciHubator.rst
matBorgeaud May 15, 2025
78ee0d1
Update SciHubator.rst
matBorgeaud May 15, 2025
02432ff
Update SciHubator.rst
matBorgeaud May 15, 2025
51fa219
Update SciHubator.rst
matBorgeaud May 15, 2025
7a17265
Update SciHubator.rst
matBorgeaud May 15, 2025
00c95f4
Update SciHubator.rst
matBorgeaud May 15, 2025
445f701
Update SciHubator.rst
matBorgeaud May 15, 2025
af53c6d
it works, don't worry
ColinLug May 15, 2025
65afb6d
Merge remote-tracking branch 'origin/master'
ColinLug May 15, 2025
14fd590
abstract commented out
sarahperettipoix May 15, 2025
790609a
all should work i guess
sarahperettipoix May 15, 2025
8533bb5
the tears
ColinLug May 15, 2025
ac98506
remove the abstract sections
sarahperettipoix May 16, 2025
e8b8a61
change image and remove comments
sarahperettipoix May 19, 2025
8a7c487
correction
sarahperettipoix May 19, 2025
fed37da
the tears have stopped
ColinLug May 19, 2025
d2612ad
Merge remote-tracking branch 'origin/master'
ColinLug May 19, 2025
23aa031
corrections
sarahperettipoix May 19, 2025
f94ab87
update image, maximal version
ColinLug May 19, 2025
1f2148f
update spec name
ColinLug May 19, 2025
a17e35d
png
sarahperettipoix May 19, 2025
63614fb
svg
sarahperettipoix May 19, 2025
f590b4e
Merge remote-tracking branch 'origin/master'
sarahperettipoix May 19, 2025
9735237
Update SciHubator.rst
matBorgeaud May 21, 2025
3146839
Update SciHubator.rst
matBorgeaud May 21, 2025
976bf84
specs on line again
ColinLug May 21, 2025
92aa31f
Merge remote-tracking branch 'origin/master'
ColinLug May 21, 2025
73cfb85
part 1 of the requested corrections
sarahperettipoix May 22, 2025
c434d98
orso doc added
ColinLug May 22, 2025
19e8e4c
Merge remote-tracking branch 'origin/master'
ColinLug May 22, 2025
3f217b4
orso doc added, part 2
ColinLug May 22, 2025
b3e296b
dunno
sarahperettipoix Jun 3, 2025
5cd168d
this is not going well
sarahperettipoix Jun 3, 2025
4b2509a
progress_bar part 3 works?
ColinLug Jun 3, 2025
1e17d0f
specs like before
ColinLug Jun 3, 2025
db053a4
Merge remote-tracking branch 'origin/master'
sarahperettipoix Jun 3, 2025
5f27378
small addition for the radiobuttons
ColinLug Jun 3, 2025
76ab128
Merge remote-tracking branch 'origin/master'
sarahperettipoix Jun 5, 2025
49401e9
this is not going well
sarahperettipoix Jun 5, 2025
e96d360
final
ColinLug Jun 5, 2025
e6dc84a
final
ColinLug Jun 5, 2025
179 changes: 179 additions & 0 deletions doc/widgets/SciHubator.rst
@@ -0,0 +1,179 @@
.. meta::
   :description: Orange3 Textable Prototypes documentation, SciHubator widget
   :keywords: Orange3, Textable, Prototypes, documentation, SciHubator, widget

.. _SciHubator:

SciHubator
==============

.. image:: https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/orangecontrib/textable_prototypes/widgets/icons/scihubator.png

Download PDF files from `Sci-Hub <https://www.sci-hub.se/>`_ and extract their textual content into a segmentation.

Authors
-------
Peretti-Poix Sarah, Borgeaud Matthias, Chétioui Orsowen, Luginbühl Colin

Signals
-------

Inputs: ``None``

Outputs: ``Text data``

Segmentation covering the content of the downloaded PDF files

Requirements
------------

* Orange 3.38.1
* Orange Textable 3.2.2
* The ``scidownl`` package (the widget uses ``from scidownl import scihub_download``)
* The ``pdfplumber`` package (``import pdfplumber``)

Description
-----------

This widget downloads PDF files from the Sci-Hub project and outputs their
content as an annotated text segmentation.
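
The download-then-extract flow can be sketched as follows. This is a minimal illustration built on the two packages listed under Requirements, not the widget's actual code: the helper names ``pdf_path_for_doi`` and ``download_and_extract`` and the file naming scheme are assumptions made for this example.

```python
import os
import re


def pdf_path_for_doi(doi, out_dir):
    """Build a file-system-safe PDF path for a DOI (hypothetical naming scheme)."""
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", doi)
    return os.path.join(out_dir, safe_name + ".pdf")


def download_and_extract(doi, out_dir="."):
    """Download the PDF for `doi` and return its text, one string per page."""
    # Third-party imports are kept local so that the helper above remains
    # usable even when the two packages are not installed.
    from scidownl import scihub_download
    import pdfplumber

    path = pdf_path_for_doi(doi, out_dir)
    scihub_download(doi, paper_type="doi", out=path)
    # Assumption: a failed download leaves no file behind, so we check for it.
    if not os.path.exists(path):
        raise RuntimeError("Download failed for DOI: " + doi)
    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return [page.extract_text() or "" for page in pdf.pages]
```

Calling ``download_and_extract("10.1145/3375633")`` requires network access and a reachable Sci-Hub mirror, so it is best tried interactively.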


Basic interface
~~~~~~~~~~~~~~~

In its basic version,
the **SciHubator** widget is limited to the import of a single DOI.
The interface contains a **Source** section enabling the user to type the DOI.
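
Before a download is attempted, the typed DOI can be sanity-checked. The sketch below uses a loose pattern that covers most modern DOIs; the helper ``looks_like_doi`` is hypothetical, and this document does not say how the widget itself validates its input.

```python
import re

# Loose pattern for most modern DOIs: "10.", a registrant code of 4 to 9
# digits, a slash, then a suffix of letters, digits and some punctuation.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$")


def looks_like_doi(text):
    """Return True if `text` (after trimming whitespace) resembles a DOI."""
    return bool(DOI_PATTERN.match(text.strip()))


print(looks_like_doi("10.1145/3375633"))  # True
print(looks_like_doi("not a doi"))        # False
```

A pattern like this only checks the shape of the string; it cannot tell whether the DOI is registered or available on Sci-Hub.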

.. _SciHubator_basicinterface:

.. figure:: https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/specs/images/scihubator_minimal.png
    :align: center
    :alt: Basic interface of the SciHubator widget

    Figure 1: **SciHubator** widget (basic interface).

Note that pdfplumber might not work properly with non-Latin alphabets
and serif typefaces.

The **Send** button triggers the emission of a segmentation to the output
connection(s). When it is selected, the **Send automatically** checkbox
disables the button and the widget attempts to automatically emit a
segmentation at every modification of its interface.

The text below the **Send** button indicates the number of characters in the single
segment contained in the output segmentation, or the reasons why no
segmentation is emitted (no input data, encoding issue, etc.).

Advanced interface
~~~~~~~~~~~~~~~~~~

The advanced version of **SciHubator** allows the user to type several DOIs
in a determined order; each output text can moreover be segmented into
specific sections (introduction, main body and bibliography) with specific
annotations. The emitted segmentation contains a segment
for each imported file.
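
Splitting an extracted text into introduction, main body and bibliography can be sketched with regular expressions. The heading patterns below are assumptions chosen for illustration (real articles vary widely), and ``split_sections`` is a hypothetical helper rather than the widget's implementation.

```python
import re

# Hypothetical heading patterns, each matched on a line of its own.
INTRO_RE = re.compile(r"^(?:1\.?[ \t]+)?introduction[ \t]*$", re.I | re.M)
BODY_RE = re.compile(r"^2\.?[ \t]+\S.*$", re.M)
BIBLIO_RE = re.compile(r"^(?:references|bibliography)[ \t]*$", re.I | re.M)


def split_sections(text):
    """Return (start, end, label) spans, or None if a heading is missing."""
    intro = INTRO_RE.search(text)
    body = BODY_RE.search(text, intro.end()) if intro else None
    biblio = BIBLIO_RE.search(text, body.end()) if body else None
    if not (intro and body and biblio):
        return None
    return [
        (intro.start(), body.start(), "introduction"),
        (body.start(), biblio.start(), "main body"),
        (biblio.start(), len(text), "bibliography"),
    ]


sample = (
    "Title\n1 Introduction\nIntro text.\n"
    "2 Methods\nBody text.\nReferences\n[1] A ref.\n"
)
for start, end, label in split_sections(sample):
    print(label, repr(sample[start:end]))
```

Returning ``None`` when a heading cannot be found lets the caller fall back to a single unsegmented text and report that not all sections were segmented.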

.. _scihubator_advancedinterface:

.. figure:: https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/specs/images/scihubator_principal.png
    :align: center
    :alt: Advanced interface of the SciHubator widget
    :scale: 80%

    Figure 2: **SciHubator** widget (advanced interface).

The advanced interface presents similarities with that of the **URLs** and **Segment**
widgets. The **Sources** section allows the user to select the input
DOI(s). The list
of imported files appears at the top of the window; the columns of this list
indicate (a) the name of each file, (b) the corresponding annotation (if any),
and (c) the encoding with which each is associated.

The first buttons on the right of the imported files' list enable the user to
modify the order in which they appear in the output segmentation (**Move Up**
and **Move Down**), to delete a file from the list (**Remove**) or to
completely empty it (**Clear All**). Except for **Clear All**, all these
buttons require the user to first select an entry in the list.
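
The behaviour of these four buttons can be sketched on a plain Python list. The function names are hypothetical and the GUI wiring is omitted; ``index`` stands for the position of the entry selected in the list.

```python
def move_up(items, index):
    """Swap the selected entry with the one above it; return its new index."""
    if 0 < index < len(items):
        items[index - 1], items[index] = items[index], items[index - 1]
        return index - 1
    return index


def move_down(items, index):
    """Swap the selected entry with the one below it; return its new index."""
    if 0 <= index < len(items) - 1:
        items[index + 1], items[index] = items[index], items[index + 1]
        return index + 1
    return index


def remove(items, index):
    """Delete the selected entry, if the index is valid."""
    if 0 <= index < len(items):
        del items[index]


def clear_all(items):
    """Empty the list; no selection is needed."""
    items.clear()
```

Returning the new index from the two move functions lets the caller keep the same entry selected after it has moved.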

The **Send** button triggers the emission of a segmentation to the output
connection(s). When it is selected, the **Send automatically** checkbox
disables the button and the widget attempts to automatically emit a
segmentation at every modification of its interface.

The text below the **Send** button indicates the length of the output segmentation in
characters, or the reasons why no segmentation is emitted (no selected file,
encoding issue, etc.). In the example, the two segments corresponding to the
imported files thus total up to 1'262'145 characters.
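
As a rough illustration of how this status text could be derived, the sketch below counts the characters of each segment. The function ``status_message`` is hypothetical; only the message wording is taken from the Messages section of this document.

```python
def status_message(segments):
    """Build the text shown below the Send button for a list of segment strings."""
    if not segments:
        return "No data sent to output yet: no DOI selected."
    total = sum(len(segment) for segment in segments)
    return "Data correctly sent to output: %d segments (%d characters)." % (
        len(segments),
        total,
    )


print(status_message(["abc", "de"]))
```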

Messages
--------

Information
~~~~~~~~~~~

*Data correctly sent to output: <n> segments (<m> characters).*
    This confirms that the widget has operated properly.

*Settings were* (or *Input has*) *changed, please click 'Send' when ready.*
    Settings and/or input have changed but the **Send automatically** checkbox
    has not been selected, so the user is prompted to click the **Send**
    button (or equivalently check the box) in order for computation and data
    emission to proceed.

*No data sent to output yet: no DOI selected.*
    The widget instance is not able to emit data to output because no input
    DOI has been selected.

*No data sent to output yet, see 'Widget state' below.*
    A problem with the instance's parameters and/or input data prevents it
    from operating properly, and additional diagnostic information can be
    found in the **Widget state** box at the bottom of the instance's
    interface (see `Warnings`_ and `Errors`_ below).

*Duplicate DOI(s) found and deleted.*
    At least one of the DOIs being added was already in the DOI list.
    The duplicate is discarded so that no DOI appears twice.
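
Duplicate handling can be sketched as follows. The helper ``add_dois`` is hypothetical; it compares DOIs case-insensitively (DOI names are case-insensitive) and returns the entries it skipped, so that the caller can decide whether to show the duplicate message.

```python
def add_dois(existing, new_dois):
    """Append unseen DOIs to `existing` in order; return the duplicates skipped."""
    seen = {doi.lower() for doi in existing}
    duplicates = []
    for doi in new_dois:
        cleaned = doi.strip()
        if not cleaned:
            continue  # ignore empty entries
        key = cleaned.lower()
        if key in seen:
            duplicates.append(cleaned)
        else:
            seen.add(key)
            existing.append(cleaned)
    return duplicates


doi_list = ["10.1000/a"]
skipped = add_dois(doi_list, ["10.1000/A", "10.2000/b", "10.2000/b"])
print(doi_list)
print(skipped)
```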


Warnings
~~~~~~~~

*Please enter one or many valid DOIs.*
    Sci-Hub needs a valid DOI in order to process a request.
    This warning indicates that nothing was typed in the DOI field.

*Not all sections were segmented*
    The regular expression was not able to segment the content retrieved
    for certain DOIs.

*Step 1/3: Pre-processing...*
    The PDF is being downloaded.

*Step 2/3: Processing...*
    The PDF is being converted into raw text.

*Step 3/3: Post-processing...*
    Segmentations are applied to the text.
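
The three steps can be pictured as a small pipeline that reports its progress before each stage. In this sketch, ``run_steps`` and its callback parameters are hypothetical stand-ins for the widget's download, extraction and segmentation routines; only the step messages come from this document.

```python
def run_steps(doi, download, extract, segment, report=print):
    """Run the three stages in order, reporting each step before it starts."""
    report("Step 1/3: Pre-processing...")
    path = download(doi)        # fetch the PDF and return its path
    report("Step 2/3: Processing...")
    text = extract(path)        # turn the PDF into raw text
    report("Step 3/3: Post-processing...")
    return segment(text)        # apply segmentations to the text


# Usage with stub callbacks, so that no network access is needed.
messages = []
result = run_steps(
    "10.1000/xyz123",
    download=lambda doi: doi.replace("/", "_") + ".pdf",
    extract=lambda path: "text of " + path,
    segment=lambda text: [text],
    report=messages.append,
)
```

Passing the stages as callbacks keeps the progress reporting separate from the actual work, which also makes the sequence easy to test with stubs.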




Errors
~~~~~~

*SciHub inaccessible - verify your connexion.*
    Please verify your internet connection or check whether `Sci-Hub <https://www.sci-hub.se/>`_ is down.

*An error occurred when downloading.*
    Downloading the PDF did not work; please try again.

*Error occurred when reading PDF:*
    An unexpected error occurred when reading the downloaded PDF. Please try again; if the error persists, your DOI may not be compatible.

*Download failed. Please, verify DOI or connexion.*
    Sci-Hub is accessible but SciHubator could not download the PDF. Your connection may have dropped during the download, or the DOI provided is not valid.
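
One way to picture how failures map to these messages is a small dispatch on exception type. The exception classes ``DownloadError`` and ``PdfReadError`` and the function ``error_message`` are hypothetical, as is the choice of which failure triggers which message; only the message texts are taken from the list above.

```python
class DownloadError(Exception):
    """Hypothetical: Sci-Hub was reachable but no PDF could be obtained."""


class PdfReadError(Exception):
    """Hypothetical: the downloaded PDF could not be read."""


def error_message(exc):
    """Return the widget-style error text for a given exception."""
    if isinstance(exc, ConnectionError):
        return "SciHub inaccessible - verify your connexion."
    if isinstance(exc, DownloadError):
        return "Download failed. Please, verify DOI or connexion."
    if isinstance(exc, PdfReadError):
        return "Error occurred when reading PDF: %s" % exc
    # Assumption: any other failure is reported as a generic download error.
    return "An error occurred when downloading."


print(error_message(PdfReadError("no text layer")))
```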


