pyAutoSummarizer

pyAutoSummarizer - An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence.

Introduction

pyAutoSummarizer is a Python library for text summarization, a core task in NLP (Natural Language Processing). It implements several summarization algorithms, both extractive and abstractive. Extractive algorithms identify the key sentences or phrases in the original text and extract them to form the summary; the extractive techniques in pyAutoSummarizer include TextRank, LexRank, LSA (Latent Semantic Analysis), and KL-Sum. On the deep learning side, pyAutoSummarizer incorporates BART (Bidirectional and Auto-Regressive Transformers) and T5 (Text-to-Text Transfer Transformer), a model known for its versatility across language tasks, including summarization. For abstractive summarization, it also uses PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) and OpenAI's GPT (Generative Pre-trained Transformer), specifically the chatGPT model. Unlike extractive techniques, abstractive summarization generates new sentences, producing a summary that preserves the essence of the original text without necessarily reusing its exact wording.
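To make the extractive idea concrete, here is a minimal, self-contained sketch of TextRank-style sentence ranking. It is not pyAutoSummarizer's implementation: it assumes cosine similarity between bag-of-words vectors as the edge weight (the original TextRank paper uses a normalized word-overlap measure instead), and the names `textrank` and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep word characters only
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=100, tol=1e-6):
    """Score sentences with weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    bags = [Counter(tokenize(s)) for s in sentences]
    # Symmetric weighted adjacency matrix without self-loops
    w = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence j shares its score with its neighbours in proportion to edge weight;
        # an isolated sentence (out_weight 0) only receives the (1 - damping) / n baseline
        new = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences, k=2):
    scores = textrank(sentences)
    # Keep the k highest-scoring sentences, then restore document order
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


doc = [
    "Text summarization condenses a document into a shorter version.",
    "Extractive summarization selects the most central sentences of the document.",
    "The weather was pleasant yesterday.",
    "Central sentences share many words with other sentences in the document.",
]
print(summarize(doc, k=2))
```

Sentences are ranked by how strongly they connect to the rest of the document, and the selected ones are returned in their original order so the summary still reads naturally.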

pyAutoSummarizer also provides text normalization tools, since cleaner input supports higher-quality summaries. It can convert text to lowercase for uniformity; remove accents, special characters, and numbers to reduce noise; and remove a user-supplied list of custom words, so preprocessing can be tailored to each task. It also supports stopword removal in many languages, including Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese (Brazilian), Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian. Finally, text can be segmented into sentences by punctuation, by character count, or by word count.
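The sketch below approximates these preprocessing steps with the Python standard library only. It is not the library's own code: the function names and the tiny English stopword set are hypothetical, and real per-language stopword lists are far longer.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny English stopword list (illustration only)
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def normalize(text, custom_words=(), stop_words=STOP_WORDS):
    text = text.lower()
    # Remove accents: decompose (NFKD), then drop the combining marks
    text = "".join(c for c in unicodedata.normalize("NFKD", text) if not unicodedata.combining(c))
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove special characters (punctuation, symbols)
    drop = set(stop_words) | {w.lower() for w in custom_words}
    return " ".join(w for w in text.split() if w not in drop)


def split_by_punctuation(text):
    # Sentence boundary = '.', '!' or '?' followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_by_words(text, n_words):
    # Fixed-size chunks of n_words words
    words = text.split()
    return [" ".join(words[i:i + n_words]) for i in range(0, len(words), n_words)]


def split_by_chars(text, n_chars):
    # Fixed-size chunks of n_chars characters (may cut words in half)
    return [text[i:i + n_chars] for i in range(0, len(text), n_chars)]


print(normalize("Café costs 3 euros, the Über-deal!", custom_words=["costs"]))
print(split_by_punctuation("First one. Second one! Third?"))
```

Character-count splitting ignores word boundaries, which is why word-count splitting is usually the gentler fixed-size option.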

To evaluate generated summaries against a reference summary, pyAutoSummarizer integrates ROUGE-N, ROUGE-L, and ROUGE-S, which measure the overlap of n-grams, the longest common subsequence, and skip-bigrams, respectively. It also provides BLEU (Bilingual Evaluation Understudy) and METEOR (Metric for Evaluation of Translation with Explicit ORdering).
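A minimal sketch of the three ROUGE variants follows. It assumes whitespace tokenization, reports the F1 score (harmonic mean of precision and recall), and computes ROUGE-S with unlimited skip distance; pyAutoSummarizer's own implementation may tokenize or weight differently, and the function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    if overlap == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate, reference, n=1):
    c, r = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((c & r).values())  # clipped n-gram matches
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(candidate, reference):
    c, r = candidate.split(), reference.split()
    return f1(lcs_length(c, r), len(c), len(r))


def rouge_s(candidate, reference):
    # Skip-bigrams: every in-order word pair, with any gap between the words
    c = Counter(combinations(candidate.split(), 2))
    r = Counter(combinations(reference.split(), 2))
    return f1(sum((c & r).values()), sum(c.values()), sum(r.values()))


cand = "the cat sat on the mat"
ref = "the cat is on the mat"
print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref), rouge_s(cand, ref))
# ≈ 0.833, 0.6, 0.833, 0.667
```

BLEU and METEOR are not reproduced here; NLTK provides implementations of both in its `nltk.translate` package.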

Usage

Install

pip install pyAutoSummarizer

Try it in Colab:

Extractive Summarization

Example 01: TextRank ( Colab Demo )
Example 02: LexRank ( Colab Demo )
Example 03: LSA ( Colab Demo )
Example 04: KL-Sum ( Colab Demo )

Abstractive Summarization

Example 01: BART (Deep Learning) ( Colab Demo )
Example 02: T5 (Deep Learning) ( Colab Demo )
Example 03: chatGPT (Deep Learning) ( Colab Demo ) Requires an OpenAI API key (https://platform.openai.com/account/api-keys)
Example 04: PEGASUS (Deep Learning) ( Colab Demo )

Others

pyBibX - A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools

Repository: Valdecy/pyAutoSummarizer