Text Summarization Project ✍️📖

A deep learning project for summarizing articles into concise highlights. It uses natural language processing (NLP) and an attention-based sequence-to-sequence model to generate short summaries from long articles.


✨ Features

  • 📄 Clean and preprocess text data for NLP tasks.
  • 🖼 Visualize data insights with histograms and word clouds.
  • 🔠 Tokenize and pad sequences for training deep learning models.
  • 🧠 Build and train a sequence-to-sequence model with attention for text summarization.
  • 💾 Save trained models and tokenizers for future use.

📚 Technologies Used

  • Python: The core programming language.
  • Libraries:
    • 🔹 pandas, numpy: For data processing and numerical computations.
    • 🔹 matplotlib, seaborn, wordcloud: For data visualization.
    • 🔹 tensorflow.keras: For building and training the deep learning model.
    • 🔹 scikit-learn (sklearn): For splitting the dataset into training and test sets.

🔄 How to Run

  1. Clone the repository and navigate to the project directory:
    git clone https://github.com/Seif250/Text-Summarization.git
    cd Text-Summarization
  2. Install the required libraries:
    pip install pandas numpy matplotlib seaborn wordcloud tensorflow scikit-learn
  3. Place your dataset (train.csv) in the root directory and make sure it contains the required columns (a quick sanity check is sketched below this list):
    • article
    • highlights
  4. Run the main script:
    python main.py
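
Before launching main.py, it can help to confirm that train.csv really has the expected layout. This is a minimal sketch for that check, not part of main.py itself; the column names are taken from the list above.

```python
import pandas as pd

# Load the dataset and verify the two columns the project expects.
df = pd.read_csv("train.csv")
missing = {"article", "highlights"} - set(df.columns)
assert not missing, f"train.csv is missing columns: {missing}"

print(df[["article", "highlights"]].head())
```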

🧪 Workflow

  1. Preprocessing:
    • Clean and standardize the text by removing special characters, numbers, and extra whitespace.
    • Tokenize and pad the sequences for both articles and highlights (see the preprocessing sketch after this list).
  2. Visualization:
    • Generate histograms of article and summary lengths.
    • Create word clouds to highlight frequently used terms.
  3. Model Training:
    • Build a seq2seq model with LSTM layers and an attention mechanism (see the model sketch after this list).
    • Stabilize training with callbacks such as EarlyStopping and ReduceLROnPlateau.
  4. Saving Artifacts:
    • Save the trained model and tokenizer for later inference.
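
For reference, the preprocessing and tokenization steps could look roughly like the sketch below. The cleaning rules, vocabulary size, and sequence lengths are illustrative assumptions; the actual values live in main.py.

```python
import re
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def clean_text(text):
    """Lowercase, drop special characters and digits, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

articles = [clean_text(t) for t in df["article"]]
highlights = [clean_text(t) for t in df["highlights"]]

# A single shared tokenizer is an assumption; separate tokenizers per column also work.
tokenizer = Tokenizer(num_words=20000, oov_token="<unk>")
tokenizer.fit_on_texts(articles + highlights)

max_article_len, max_summary_len = 400, 50  # hypothetical length limits
x = pad_sequences(tokenizer.texts_to_sequences(articles), maxlen=max_article_len, padding="post")
y = pad_sequences(tokenizer.texts_to_sequences(highlights), maxlen=max_summary_len, padding="post")
```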
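Likewise, the seq2seq model with attention, the training callbacks, and the artifact saving (steps 3 and 4) can be sketched as follows, continuing from the variables defined above. It uses Keras' built-in dot-product Attention layer and illustrative dimensions; treat it as an assumption about the architecture rather than a copy of main.py, and the file names at the end are placeholders.

```python
import pickle
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     Attention, Concatenate, TimeDistributed)
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

vocab_size, embed_dim, latent_dim = 20000, 128, 256  # illustrative sizes

# Encoder: reads the padded article and returns per-token outputs plus final states.
encoder_inputs = Input(shape=(max_article_len,))
enc_emb = Embedding(vocab_size, embed_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                         return_state=True)(enc_emb)

# Decoder: generates the summary, initialised with the encoder's final states.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embed_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(dec_emb, initial_state=[state_h, state_c])

# Attention lets each decoder step focus on the relevant encoder positions.
attn_out = Attention()([decoder_outputs, encoder_outputs])
decoder_concat = Concatenate()([decoder_outputs, attn_out])
outputs = TimeDistributed(Dense(vocab_size, activation="softmax"))(decoder_concat)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

callbacks = [EarlyStopping(patience=3, restore_best_weights=True),
             ReduceLROnPlateau(factor=0.5, patience=2)]
# Teacher forcing: the decoder sees the summary shifted by one token.
# model.fit([x, y[:, :-1]], y[:, 1:], validation_split=0.1, epochs=20, callbacks=callbacks)

# Save artifacts for later inference.
model.save("summarizer.keras")
with open("tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)
```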

🔜 Future Improvements

  • ➕ Add support for multilingual text summarization.
  • 🚀 Enhance the model architecture for better performance.
  • 📊 Implement an interactive dashboard for summary generation.

🖼 Screenshots


🔒 License

This project is licensed under the MIT License. See the LICENSE file for details.
