aryan-r03/Twitter-Sentiment-Analysis

🐦 AI-Powered Twitter Sentiment Analysis Web Application

A professional Flask web application that analyzes tweet sentiment in real time using machine learning and natural language processing. It features advanced text preprocessing, TF-IDF vectorization, and logistic regression for accurate emotion detection.

💬 Perfect for social media analytics, brand monitoring, and NLP learning projects


🌟 Project Overview

| 🐦 Twitter | 🤖 AI | 📡 API | 🌐 Web |
|---|---|---|---|
| Tweet Analysis | Machine Learning | RESTful API | Modern UI |
| Real-time sentiment | Logistic Regression | JSON responses | Responsive design |
| 280 char support | 85%+ accuracy | Easy integration | Real-time results |

A production-ready sentiment analysis application that uses Machine Learning to classify tweets as positive or negative in real-time. Built with Flask for the backend, scikit-learn for ML, and NLTK for natural language processing.

🎯 Why This Project?

For Learning:

  • 🎓 Master NLP fundamentals
  • 📊 Understand ML classification
  • 🌐 Learn Flask web development
  • 🧹 Practice text preprocessing
  • 📈 Explore feature engineering (TF-IDF)

For Production:

  • 💼 Industry-standard architecture
  • 🎨 Professional UI/UX design
  • 📱 Social media monitoring
  • 🔍 Brand sentiment analysis
  • 📊 Customer feedback analysis

✨ Features

Core Capabilities

| Category | Features |
|---|---|
| 🤖 Machine Learning | ✅ Logistic Regression classifier<br>✅ Balanced class weights<br>✅ 85%+ accuracy on test data<br>✅ TF-IDF vectorization (5000 features)<br>✅ Bigram support (1-2 word phrases)<br>✅ Model persistence with pickle |
| 📝 NLP Text Processing | ✅ Advanced text cleaning<br>✅ URL and mention removal<br>✅ Lemmatization (WordNet)<br>✅ Stopword removal (keeps negations)<br>✅ Special character handling<br>✅ Lowercase normalization |
| 🌐 Web Application | ✅ Modern, responsive UI<br>✅ Real-time sentiment analysis<br>✅ Character counter (280 limit)<br>✅ Animated result display<br>✅ Confidence score visualization<br>✅ Keyboard shortcuts (Enter to analyze) |
| 📡 RESTful API | ✅ JSON request/response format<br>✅ POST /api/analyze endpoint<br>✅ Detailed sentiment scores<br>✅ Error handling & validation<br>✅ CORS support ready<br>✅ Easy external integration |
| 📊 Model Evaluation | ✅ Accuracy, Precision, Recall metrics<br>✅ F1-Score calculation<br>✅ Confusion matrix visualization<br>✅ Classification report<br>✅ Train/test split (80/20)<br>✅ Stratified sampling |
| 💾 Data Handling | ✅ CSV dataset loading<br>✅ Multiple format support<br>✅ Automatic label conversion<br>✅ Data validation & cleaning<br>✅ Balanced dataset sampling<br>✅ Missing value handling |

🎬 Demo & Preview

Application Interface

```
┌─────────────────────────────────────────────────────┐
│      🐦 Twitter Sentiment Analysis                  │
│   AI-Powered Emotion Detection using ML             │
│    [Trained with Logistic Regression]               │
├─────────────────────────────────────────────────────┤
│                                                     │
│   Enter Tweet                                       │
│   ┌─────────────────────────────────────────────┐   │
│   │ I love this amazing product! It's so good!  │   │
│   └─────────────────────────────────────────────┘   │
│                               47 / 280 characters   │
│                                                     │
│                                                     │
│           [      Analyze Sentiment      ]           │
│   ╔════════════════════════════════════════════╗    │
│   ║             Sentiment: Positive            ║    │
│   ║                                            ║    │
│   ║  ┌───────────┐ ┌────────────┐ ┌──────────┐ ║    │
│   ║  │Confidence │ │Positive    │ │Negative  │ ║    │
│   ║  │   95%     │ │Score: 95%  │ │Score: 5% │ ║    │
│   ║  └───────────┘ └────────────┘ └──────────┘ ║    │
│   ╚════════════════════════════════════════════╝    │
│                                                     │
└─────────────────────────────────────────────────────┘
```

System Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   User Interface (Web)                  │
│                 index.html + JavaScript                 │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌───────────────────────────────────────────────────┐
│             Flask Web Server (app.py)             │
│  ┌────────────────────────────────────────────┐   │
│  │ # Route Handlers                           │   │
│  │  • GET /           → Serve HTML            │   │
│  │  • POST /api/analyze → Analyze sentiment   │   │
│  └────────────────────────────────────────────┘   │
└─────────────────────────┬─────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│        SentimentModel Class (sentiment_model.py)        │
│  ┌──────────────────────────────────────────────────┐   │
│  │  1. Text Preprocessing                           │   │
│  │     • Lowercase, remove URLs, mentions           │   │
│  │     • Lemmatization, stopword removal            │   │
│  │                                                  │   │
│  │  2. TF-IDF Vectorization                         │   │
│  │     • Convert text → numerical features          │   │
│  │     • 5000 max features, bigrams                 │   │
│  │                                                  │   │
│  │  3. Logistic Regression                          │   │
│  │     • Binary classification (0/1)                │   │
│  │     • Probability scores                         │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  Prediction Result                      │
│  {                                                      │
│    "sentiment": "Positive",                             │
│    "confidence": 95,                                    │
│    "positive_score": 95,                                │
│    "negative_score": 5                                  │
│  }                                                      │
└─────────────────────────────────────────────────────────┘
```

ML Pipeline Workflow

```
         Input Tweet
              │
              ▼
┌─────────────────────────────┐
│   Text Preprocessing        │
│  • Convert to lowercase     │
│  • Remove URLs & @mentions  │
│  • Clean special chars      │
│  • Lemmatize words          │
│  • Remove stopwords         │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   TF-IDF Vectorization      │
│  • Extract features         │
│  • Weight by importance     │
│  • Create feature vector    │
│    (5000 dimensions)        │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   Logistic Regression       │
│  • Predict class (0/1)      │
│  • Calculate probabilities  │
│  • Return confidence        │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   Sentiment Result          │
│  • Positive or Negative     │
│  • Confidence score (%)     │
│  • Individual probabilities │
└─────────────────────────────┘
```
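The four stages above map naturally onto a scikit-learn `Pipeline`. This is a minimal sketch with a made-up toy dataset and a simplified `simple_clean` preprocessor, not the repository's actual `sentiment_model.py`, which wires the steps together manually:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def simple_clean(text):
    """Minimal stand-in for the full preprocessing step (no lemmatization)."""
    text = text.lower()
    text = re.sub(r'http\S+|@\w+', '', text)   # strip URLs and @mentions
    return re.sub(r'[^a-z\s!?]', '', text)     # keep letters, spaces, ! and ?

# Tiny illustrative dataset (1 = positive, 0 = negative)
tweets = ["I love this!", "Absolutely great product", "So happy with it",
          "Best purchase ever", "This is terrible", "Worst experience ever",
          "I hate it", "Very disappointed"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2),
                              preprocessor=simple_clean)),
    ('clf', LogisticRegression(max_iter=1000, class_weight='balanced')),
])
pipeline.fit(tweets, labels)

# predict_proba gives the [negative, positive] probabilities used as scores
proba = pipeline.predict_proba(["I love this product"])[0]
print({'negative': round(proba[0], 2), 'positive': round(proba[1], 2)})
```

On real data the `Pipeline` form has the added benefit that the vectorizer and classifier are fit, pickled, and reloaded as a single object.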

🧠 Tech Stack

Technologies & Libraries

| Technology | Role |
|---|---|
| Python 3.8+ | Core language |
| Flask 3.0 | Web framework |
| Scikit-learn | ML library |
| Pandas | Data processing |
| NumPy | Numerical ops |
| NLTK | NLP toolkit |
| HTML5/CSS3 | Frontend |
| JavaScript | Frontend logic |

Component Stack

| Component | Technology | Purpose |
|---|---|---|
| Web Framework | Flask | HTTP routing, templating, JSON API |
| ML Algorithm | Logistic Regression | Binary sentiment classification |
| Feature Extraction | TF-IDF Vectorizer | Text → numerical features |
| Text Processing | NLTK | Lemmatization, stopwords, tokenization |
| Data Handling | Pandas | CSV loading, data manipulation |
| Model Storage | Pickle | Serialize/deserialize trained model |
| Frontend | HTML/CSS/JavaScript | Responsive UI, AJAX requests |

📦 Installation

System Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.9 - 3.11 |
| RAM | 2 GB | 4 GB+ |
| Storage | 500 MB | 1 GB+ (for datasets) |
| OS | Windows 10+, macOS 10.14+, or Ubuntu 18.04+ | — |

Dependencies

```
# Core Framework
flask==3.0.0                # Web application framework

# Machine Learning
pandas==2.1.4               # Data manipulation
numpy==1.26.2               # Numerical computing
scikit-learn==1.3.2         # ML algorithms & tools

# Natural Language Processing
nltk==3.8.1                 # NLP toolkit (stopwords, lemmatization)
```

🚀 Quick Start

Step 1️⃣: Clone Repository

git clone https://github.com/your-username/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis

Step 2️⃣: Create Virtual Environment (Recommended)

Windows:

python -m venv venv
venv\Scripts\activate

macOS/Linux:

python -m venv venv
source venv/bin/activate

Step 3️⃣: Install Dependencies

pip install -r requirements.txt

Verify installation:

python -c "import flask, sklearn, nltk, pandas; print('All dependencies installed!')"

Step 4️⃣: Download NLTK Data (Automatic)

The application will automatically download required NLTK data on first run:

  • Stopwords corpus
  • WordNet lemmatizer
  • OMW-1.4 (Open Multilingual Wordnet)

Manual download (if needed):

import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

Step 5️⃣: Prepare Dataset (Optional)

Option A: Use Sample Data (Default)

  • Application creates a sample dataset automatically
  • Good for testing and learning

Option B: Use Custom Dataset

# Place your CSV file in the project directory
# Name it: data.csv or Twitter_data.csv

# CSV Format:
# - Column 1: Tweet text
# - Column 2: Sentiment label (Positive/Negative or 0/1)

Step 6️⃣: Run the Application

python app.py

Expected Output:

============================================================
INITIALIZING SENTIMENT ANALYSIS MODEL
============================================================

✓ Found existing model: sentiment_model.pkl
✓ Model loaded successfully!

============================================================
MODEL READY
============================================================

============================================================
TWITTER SENTIMENT ANALYSIS - MACHINE LEARNING PROJECT
============================================================

📌 To use your own dataset:
   Place CSV file named 'data.csv' in this directory
   CSV should have columns: 'text' and 'sentiment'

Starting Flask server...
Open your browser and go to: http://127.0.0.1:5000
============================================================

 * Running on http://127.0.0.1:5000

Step 7️⃣: Access the Application

  1. Open your browser
  2. Navigate to the URL from the terminal
  3. Enter a tweet in the text area
  4. Click "Analyze Sentiment"
  5. View results with confidence scores! 🎉

💻 Usage Guide

Web Interface

Step-by-Step:

  1. Enter Tweet Text

    • Type or paste a tweet (up to 280 characters)
    • Character counter shows remaining length
    • Example: "I absolutely love this product! Best purchase ever!"
  2. Analyze Sentiment

    • Click "Analyze Sentiment" button
    • Or press Enter key for quick analysis
    • Loading animation appears during processing
  3. View Results

    • Sentiment: Positive or Negative
    • Confidence: Overall prediction confidence (%)
    • Positive Score: Probability of positive sentiment (%)
    • Negative Score: Probability of negative sentiment (%)

Example Tweets to Try:

| Tweet | Expected Result |
|---|---|
| "I love this amazing product! It's fantastic!" | ✅ Positive (90%+ confidence) |
| "This is terrible. Worst experience ever." | ❌ Negative (90%+ confidence) |
| "The weather is nice today." | ✅ Positive (moderate confidence) |
| "I don't like this at all. Very disappointed." | ❌ Negative (high confidence) |

Training Custom Model

With Your Own Dataset:

from sentiment_model import SentimentModel

# Initialize model
model = SentimentModel()

# Load your CSV file
df = model.load_dataset_from_csv(
    'your_data.csv',
    text_column='tweet',      # Column with tweet text
    label_column='sentiment'  # Column with labels
)

# Train model
if df is not None:
    accuracy = model.train(df)
    print(f"Model accuracy: {accuracy:.2%}")
    
    # Save trained model
    model.save_model('my_sentiment_model.pkl')

Test Predictions:

# Make predictions
test_tweets = [
    "I love this product!",
    "This is terrible.",
    "Not bad, could be better."
]

for tweet in test_tweets:
    result = model.predict(tweet)
    print(f"\nTweet: {tweet}")
    print(f"Sentiment: {result['sentiment']}")
    print(f"Confidence: {result['confidence']}%")

📡 API Reference

Available Endpoints

🏠 Home Page

Endpoint: GET /

Description: Serves the main HTML interface

Response: HTML page

Usage:

curl http://127.0.0.1:5000/

🔮 Analyze Sentiment

Endpoint: POST /api/analyze

Description: Analyzes sentiment of provided tweet text

Request Headers:

Content-Type: application/json

Request Body:

{
  "tweet": "I absolutely love this product! It's amazing!"
}

Response (Success):

{
  "success": true,
  "result": {
    "sentiment": "Positive",
    "confidence": 95,
    "positive_score": 95,
    "negative_score": 5
  }
}

Response (Error - Empty Tweet):

{
  "success": false,
  "error": "Tweet is empty"
}

Response (Error - No Tweet Provided):

{
  "success": false,
  "error": "No tweet provided"
}

Status Codes:

  • 200 OK - Analysis successful
  • 400 Bad Request - Invalid input (empty tweet, no tweet field)
  • 500 Internal Server Error - Server/model error
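The validation behind those status codes can be sketched as a small helper. This is a hypothetical `validate_tweet` function mirroring the documented error responses, not necessarily how app.py structures it; the 280-character check is an assumption carried over from the UI limit:

```python
def validate_tweet(payload):
    """Return (http_status, error_message_or_None) for an /api/analyze payload."""
    if not isinstance(payload, dict) or 'tweet' not in payload:
        return 400, 'No tweet provided'
    tweet = payload['tweet']
    if not isinstance(tweet, str) or not tweet.strip():
        return 400, 'Tweet is empty'
    if len(tweet) > 280:
        # assumption: the API mirrors the UI's 280-character limit
        return 400, 'Tweet exceeds 280 characters'
    return 200, None

print(validate_tweet({'tweet': 'I love this!'}))   # → (200, None)
print(validate_tweet({'tweet': '   '}))            # → (400, 'Tweet is empty')
print(validate_tweet({}))                          # → (400, 'No tweet provided')
```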

Integration Examples

Python (requests)
import requests
import json

# API endpoint
url = "http://127.0.0.1:5000/api/analyze"

# Tweet to analyze
tweet_data = {
    "tweet": "I love this amazing product! Best purchase ever!"
}

# Make request
response = requests.post(url, json=tweet_data)
result = response.json()

if result['success']:
    print(f"Sentiment: {result['result']['sentiment']}")
    print(f"Confidence: {result['result']['confidence']}%")
    print(f"Positive Score: {result['result']['positive_score']}%")
    print(f"Negative Score: {result['result']['negative_score']}%")
else:
    print(f"Error: {result['error']}")
JavaScript (Fetch API)
// API endpoint
const url = 'http://127.0.0.1:5000/api/analyze';

// Tweet to analyze
const tweetData = {
  tweet: "I love this amazing product! Best purchase ever!"
};

// Make request
fetch(url, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(tweetData)
})
.then(response => response.json())
.then(data => {
  if (data.success) {
    console.log(`Sentiment: ${data.result.sentiment}`);
    console.log(`Confidence: ${data.result.confidence}%`);
    console.log(`Positive Score: ${data.result.positive_score}%`);
    console.log(`Negative Score: ${data.result.negative_score}%`);
  } else {
    console.error(`Error: ${data.error}`);
  }
})
.catch(error => console.error('Request failed:', error));
cURL (Command Line)
# Analyze sentiment
curl -X POST http://127.0.0.1:5000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "tweet": "I love this amazing product! Best purchase ever!"
  }'

# Pretty print with jq
curl -X POST http://127.0.0.1:5000/api/analyze \
  -H "Content-Type: application/json" \
  -d @tweet.json | jq '.'
Node.js (Axios)
const axios = require('axios');

// API endpoint
const url = 'http://127.0.0.1:5000/api/analyze';

// Tweet to analyze
const tweetData = {
  tweet: "I love this amazing product! Best purchase ever!"
};

// Make request
axios.post(url, tweetData)
  .then(response => {
    const data = response.data;
    if (data.success) {
      console.log(`Sentiment: ${data.result.sentiment}`);
      console.log(`Confidence: ${data.result.confidence}%`);
      console.log(`Positive: ${data.result.positive_score}%`);
      console.log(`Negative: ${data.result.negative_score}%`);
    } else {
      console.error(`Error: ${data.error}`);
    }
  })
  .catch(error => console.error('Request failed:', error));

🤖 Model Details

Logistic Regression Classifier

Algorithm Configuration:

LogisticRegression(
    max_iter=1000,          # Maximum iterations for convergence
    random_state=42,        # Reproducibility
    class_weight='balanced' # Handle imbalanced datasets
)

Key Features:

  • Binary Classification: Positive (1) vs Negative (0)
  • Probability Output: Confidence scores for each class
  • Balanced Weights: Handles imbalanced data automatically
  • Fast Training: Efficient on large datasets
  • Interpretable: Clear feature importance

Model Performance Metrics

| Metric | Score | Interpretation |
|---|---|---|
| Accuracy | 85-90% | Overall correct predictions |
| Precision | 83-88% | Positive predictions that are correct |
| Recall | 85-90% | Actual positives correctly identified |
| F1-Score | 0.84-0.89 | Harmonic mean of precision & recall |

Sample Confusion Matrix:

```
                Predicted
              Neg    Pos
Actual Neg    420    35      Specificity: 92.3%
       Pos     48    397     Recall: 89.2%

True Negatives: 420    False Positives: 35
False Negatives: 48    True Positives: 397

Overall Accuracy: (420 + 397) / 900 = 90.78%
```
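All of the headline metrics can be reproduced directly from the four cells of the sample matrix above (plain arithmetic; these counts are illustrative, not a guaranteed training run):

```python
tn, fp, fn, tp = 420, 35, 48, 397  # cells of the sample confusion matrix

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 817 / 900
precision   = tp / (tp + fp)                    # of predicted positives, how many were right
recall      = tp / (tp + fn)                    # of actual positives, how many were found
specificity = tn / (tn + fp)                    # of actual negatives, how many were found
f1          = 2 * precision * recall / (precision + recall)

print(f"Accuracy:    {accuracy:.2%}")    # 90.78%
print(f"Recall:      {recall:.2%}")      # 89.21%
print(f"Specificity: {specificity:.2%}") # 92.31%
print(f"F1-Score:    {f1:.3f}")
```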

TF-IDF Vectorization

Configuration:

TfidfVectorizer(
    max_features=5000,      # Top 5000 most important features
    ngram_range=(1, 2)      # Unigrams and bigrams
)

What is TF-IDF?

  • Term Frequency (TF): How often a word appears in a document
  • Inverse Document Frequency (IDF): How unique/rare a word is across all documents
  • TF-IDF Score: TF × IDF = Importance of word in document

Example:

Tweet: "I love this product! The product is amazing!"

Unigrams: ["love", "product", "amazing", ...]
Bigrams: ["love product", "product amazing", ...]

TF-IDF Vector: [0.23, 0.45, 0.67, ...] (5000 dimensions)

Why Bigrams?

  • Captures phrases: "not good" vs "good"
  • Better context understanding
  • Improved accuracy for negations

🔧 Text Processing Pipeline

Preprocessing Steps

Complete Pipeline:

```python
def clean_text(text):
    # 1. Lowercase
    text = text.lower()
    # → "I Love This!" → "i love this!"

    # 2. Remove URLs
    text = re.sub(r'http\S+|www\S+', '', text)
    # → "check this: http://example.com" → "check this:"

    # 3. Remove @mentions
    text = re.sub(r'@\w+', '', text)
    # → "@user great product!" → "great product!"

    # 4. Remove special characters (keep !?)
    text = re.sub(r'[^a-zA-Z\s!?]', '', text)   # note: a-zA-Z (not a-zA-z)
    # → "product #1 costs $50!" → "product  costs !"

    # 5. Tokenize, then lemmatize
    words = text.split()
    words = [lemmatizer.lemmatize(word) for word in words]
    # → "products" → "product" (WordNet lemmatizer, noun POS by default)

    # 6. Remove stopwords (keep negations)
    keep_words = {'not', 'no', 'never', 'none', 'neither', 'nor'}
    words = [word for word in words
             if word not in stopwords or word in keep_words]
    # → keeps "not good" (important for sentiment)

    return ' '.join(words)
```

Example Transformation:

| Stage | Text |
|---|---|
| Original | I LOVE this product! Check it out: http://example.com @company #amazing |
| Lowercase | i love this product! check it out: http://example.com @company #amazing |
| Remove URLs | i love this product! check it out: @company #amazing |
| Remove Mentions | i love this product! check it out: #amazing |
| Remove Special Chars | i love this product check it out amazing |
| Lemmatize | i love this product check it out amazing |
| Remove Stopwords | love product check amazing |
| Final | love product check amazing |

Negation Handling:

```python
# These words are preserved even though they're stopwords
keep_words = {'not', 'no', 'never', 'none', 'nothing', 'neither', 'nor', "n't"}

# Why? They completely change sentiment:
#   "good"     → Positive
#   "not good" → Negative ✓ (preserved)
```
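A self-contained sketch of that filter. The inline stopword set here is a tiny stand-in for illustration; the project pulls the real English list from NLTK:

```python
# Tiny stand-in for NLTK's English stopword list
stopwords = {'i', 'this', 'is', 'a', 'the', 'at', 'all', 'not', 'no'}
keep_words = {'not', 'no', 'never', 'none', 'nothing', 'neither', 'nor'}

def remove_stopwords(text):
    """Drop stopwords but keep negations, so 'not good' stays distinguishable."""
    return ' '.join(w for w in text.split()
                    if w not in stopwords or w in keep_words)

print(remove_stopwords("this is not a good product"))
# → "not good product"
```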

📊 Dataset Format

Supported CSV Formats

Format 1: Standard Binary (Recommended)

text,sentiment
"I love this product!",1
"This is terrible.",0
"Great experience!",1
"Worst service ever.",0

Format 2: Text Labels

text,sentiment
"I love this product!",positive
"This is terrible.",negative
"Great experience!",positive
"Worst service ever.",negative

Format 3: Twitter Format (0/4)

text,sentiment
"I love this product!",4
"This is terrible.",0
"Great experience!",4
"Worst service ever.",0

Format 4: Alternative Column Names

tweet,label
"I love this product!",pos
"This is terrible.",neg

Automatic Format Detection

The application automatically detects and converts:

| Format | Conversion |
|---|---|
| 0/1 | ✓ Already binary |
| 0/4 | 0→0, 4→1 |
| positive/negative | negative→0, positive→1 |
| pos/neg | neg→0, pos→1 |
| Custom numeric | Threshold at median |
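That detection logic can be sketched as a small mapping function. This is a hypothetical `normalize_label` mirroring the table, not the repository's exact implementation; the "threshold at median" case needs the whole label column and is omitted here:

```python
def normalize_label(label):
    """Map the label formats from the table above onto binary 0/1."""
    text_map = {'negative': 0, 'positive': 1, 'neg': 0, 'pos': 1}
    if isinstance(label, str):
        key = label.strip().lower()
        if key in text_map:
            return text_map[key]
        label = float(key)  # fall through for numeric strings like "4"
    if label in (0, 1):
        return int(label)
    if label == 4:          # classic Sentiment140 convention: 0 = neg, 4 = pos
        return 1
    raise ValueError(f"Unrecognized label: {label!r}")

print([normalize_label(x) for x in [0, 4, 'positive', 'neg', '1']])
# → [0, 1, 1, 0, 1]
```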

Dataset Requirements:

Must Have:

  • Text column (tweets/messages)
  • Sentiment/label column
  • At least 100 samples (recommended: 1000+)
  • Balanced classes (equal pos/neg samples)

Avoid:

  • Missing values in text or sentiment
  • Empty tweets
  • Single-class datasets
  • Extreme class imbalance (>90% one class)
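The checks above can be automated before training. A pure-Python sketch over `(text, label)` pairs; the thresholds are the guidelines stated here, not hard rules:

```python
def check_dataset(samples, min_samples=100, max_imbalance=0.9):
    """Return a list of problems found in a list of (text, label) pairs."""
    problems = []
    if len(samples) < min_samples:
        problems.append(f"only {len(samples)} samples (want {min_samples}+)")
    if any(not str(text).strip() for text, _ in samples):
        problems.append("empty tweets present")
    labels = [label for _, label in samples]
    classes = set(labels)
    if len(classes) < 2:
        problems.append("single-class dataset")
    else:
        majority = max(labels.count(c) for c in classes) / len(labels)
        if majority > max_imbalance:
            problems.append(f"extreme class imbalance ({majority:.0%} one class)")
    return problems

tiny = [("good", 1)] * 95 + [("bad", 0)] * 5
print(check_dataset(tiny))  # flags the 95/5 imbalance
```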

Example Custom Dataset:

import pandas as pd

# Create custom dataset
data = {
    'text': [
        "I love this!",
        "Terrible experience",
        "Amazing product!",
        "Very disappointed",
        # ... more samples
    ],
    'sentiment': [1, 0, 1, 0, ...]  # 0=negative, 1=positive
}

df = pd.DataFrame(data)
df.to_csv('my_dataset.csv', index=False)

⚙️ Configuration

Model Parameters

Modify in sentiment_model.py:

# TF-IDF Configuration
self.vectorizer = TfidfVectorizer(
    max_features=5000,        # Number of features (1000-10000)
    ngram_range=(1, 2),       # (1,1) for unigrams only, (1,3) for trigrams
    min_df=2,                 # Minimum document frequency
    max_df=0.95               # Maximum document frequency
)

# Logistic Regression Configuration
self.model = LogisticRegression(
    max_iter=1000,            # Increase if convergence warning
    random_state=42,          # For reproducibility
    class_weight='balanced',  # Handle imbalanced data
    C=1.0,                    # Regularization strength (lower = more regularization)
    solver='lbfgs'            # Optimization algorithm
)

Parameter Tuning Guide:

| Parameter | Effect | Recommendation |
|---|---|---|
| max_features | Number of TF-IDF features | 5000 (balanced)<br>3000 (faster)<br>10000 (more accurate) |
| ngram_range | Word combinations to consider | (1,2) - unigrams + bigrams<br>(1,1) - single words only<br>(1,3) - up to 3-word phrases |
| max_iter | Training iterations | 1000 (default)<br>2000 (if not converging) |
| class_weight | Handle class imbalance | 'balanced' (recommended)<br>None (equal weights) |
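One way to choose between those settings systematically is a small grid search over the combined pipeline. A sketch with a toy dataset; on a real corpus this can take a while:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy dataset just to make the sketch runnable
texts = ["love it", "great stuff", "really good", "so happy", "very nice", "best ever",
         "hate it", "really bad", "so awful", "worst ever", "very poor", "terrible stuff"]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LogisticRegression(max_iter=1000, class_weight='balanced')),
])

# Step-prefixed parameter names reach inside the pipeline
param_grid = {
    'tfidf__max_features': [3000, 5000],
    'tfidf__ngram_range': [(1, 1), (1, 2)],
    'clf__C': [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring='f1')
search.fit(texts, labels)
print(search.best_params_)
print(f"best CV F1: {search.best_score_:.2f}")
```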

Flask Server Configuration

Modify in app.py:

# Change port
app.run(debug=True, port=5000)  # Use 8000, 8080, etc.

# Production mode
app.run(debug=False, host='0.0.0.0', port=5000)

# Threading for multiple requests
app.run(debug=False, threaded=True)

File Paths

# CSV dataset path
CSV_FILE = 'data.csv'  # Change to your dataset name

# Model save/load path
MODEL_FILE = 'sentiment_model.pkl'  # Custom model name

🎨 Customization

Extension Ideas

🎨 Custom UI Theme

Modify templates/index.html:

/* Change gradient colors */
body {
    background: linear-gradient(135deg, #1e3c72 0%, #2a5298 100%);
    /* Or try: */
    background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
    background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
}

/* Change button colors */
.btn {
    background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
}

/* Change positive/negative colors */
.result.positive {
    background: #c3e6cb;  /* Light green */
    border: 2px solid #28a745;
}

.result.negative {
    background: #f5c6cb;  /* Light red */
    border: 2px solid #dc3545;
}
📊 Add More Metrics
# In sentiment_model.py - predict() method

def predict(self, text):
    # ... existing code ...
    
    # Add entropy (uncertainty measure)
    entropy = -sum(p * np.log(p) for p in probabilities if p > 0)
    
    # Add emotion detection (requires additional model)
    emotions = self.detect_emotions(text)
    
    return {
        'sentiment': sentiment,
        'confidence': confidence,
        'positive_score': int(probabilities[1] * 100),
        'negative_score': int(probabilities[0] * 100),
        'entropy': round(entropy, 3),  # NEW
        'emotions': emotions  # NEW
    }
💾 Save Analysis History
# In app.py

import datetime
import json

HISTORY_FILE = 'analysis_history.json'

@app.route('/api/analyze', methods=['POST'])
def analyze():
    # ... existing analysis code ...
    
    # Save to history
    history_entry = {
        'timestamp': datetime.datetime.now().isoformat(),
        'tweet': tweet,
        'result': result
    }
    
    # Load existing history
    try:
        with open(HISTORY_FILE, 'r') as f:
            history = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        history = []
    
    # Add new entry
    history.append(history_entry)
    
    # Save history
    with open(HISTORY_FILE, 'w') as f:
        json.dump(history, f, indent=2)
    
    return jsonify({'success': True, 'result': result})

# Add endpoint to view history
@app.route('/api/history', methods=['GET'])
def get_history():
    try:
        with open(HISTORY_FILE, 'r') as f:
            history = json.load(f)
        return jsonify({'success': True, 'history': history})
    except (FileNotFoundError, json.JSONDecodeError):
        return jsonify({'success': False, 'error': 'No history found'})
🔄 Try Different ML Models
# In sentiment_model.py

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB

# Option 1: Random Forest
self.model = RandomForestClassifier(
    n_estimators=100,
    max_depth=50,
    random_state=42
)

# Option 2: Support Vector Machine
self.model = SVC(
    kernel='linear',
    probability=True,  # Required for predict_proba
    random_state=42
)

# Option 3: Naive Bayes
self.model = MultinomialNB(
    alpha=1.0  # Smoothing parameter
)

# Compare models
models = {
    'Logistic Regression': LogisticRegression(),
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(probability=True),
    'Naive Bayes': MultinomialNB()
}

for name, model in models.items():
    self.model = model
    accuracy = self.train(df)
    print(f"{name}: {accuracy:.2%}")
📱 Add Batch Analysis
# In app.py

@app.route('/api/batch_analyze', methods=['POST'])
def batch_analyze():
    try:
        data = request.get_json()
        tweets = data.get('tweets', [])
        
        if not tweets or not isinstance(tweets, list):
            return jsonify({
                'success': False,
                'error': 'Invalid tweets array'
            }), 400
        
        results = []
        for tweet in tweets:
            result = sentiment_model.predict(tweet)
            results.append({
                'tweet': tweet,
                'sentiment': result
            })
        
        return jsonify({
            'success': True,
            'results': results,
            'count': len(results)
        })
    
    except Exception as e:
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500

Usage:

curl -X POST http://127.0.0.1:5000/api/batch_analyze \
  -H "Content-Type: application/json" \
  -d '{
    "tweets": [
      "I love this!",
      "This is terrible.",
      "Not bad."
    ]
  }'
🌐 Add Multi-language Support
# Install: pip install googletrans==3.1.0a0

from googletrans import Translator

translator = Translator()

def translate_to_english(text):
    """Translate text to English before analysis"""
    try:
        detected = translator.detect(text)
        if detected.lang != 'en':
            translated = translator.translate(text, dest='en')
            return translated.text
    except Exception:
        pass  # fall back to the original text if translation fails
    return text

# In predict() method
def predict(self, text):
    # Translate if needed
    text_english = translate_to_english(text)
    
    # ... rest of prediction code ...

🐛 Troubleshooting

Common Issues & Solutions

❌ NLTK Data Not Found

Symptoms:

LookupError: Resource stopwords not found.
LookupError: Resource wordnet not found.

Solutions:

  1. Manual Download:

    import nltk
    nltk.download('stopwords')
    nltk.download('wordnet')
    nltk.download('omw-1.4')
  2. Download to Specific Directory:

    import nltk
    nltk.download('stopwords', download_dir='/path/to/nltk_data')
    nltk.data.path.append('/path/to/nltk_data')
  3. Download All NLTK Data:

    import nltk
    nltk.download('all')  # Warning: Large download (3.5 GB)
  4. Verify Installation:

    from nltk.corpus import stopwords
    print(stopwords.words('english')[:10])
    # Should print: ['i', 'me', 'my', 'myself', ...]
🔄 Model Not Training / Low Accuracy

Symptoms:

  • Accuracy < 70%
  • Model predicts same class for everything
  • Convergence warnings

Solutions:

  1. Check Dataset Balance:

    print(df['sentiment'].value_counts())
    # Should be roughly equal:
    # 0    5000
    # 1    5000
  2. Increase Training Data:

    • Need minimum 500 samples (250 per class)
    • Recommended: 5000+ samples
  3. Increase Max Iterations:

    self.model = LogisticRegression(max_iter=2000)  # Increase from 1000
  4. Balance Dataset:

    # Undersample majority class
    min_count = min(
        (df['sentiment'] == 0).sum(),
        (df['sentiment'] == 1).sum()
    )
    
    df_neg = df[df['sentiment'] == 0].sample(n=min_count)
    df_pos = df[df['sentiment'] == 1].sample(n=min_count)
    df_balanced = pd.concat([df_neg, df_pos])
  5. Check Text Quality:

    # Print sample cleaned texts
    print(df['cleaned_text'].head(10))
    # Should not be empty or too short
💾 Model Save/Load Errors

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'sentiment_model.pkl'
pickle.UnpicklingError: invalid load key

Solutions:

  1. Check File Exists:

    import os
    print(os.path.exists('sentiment_model.pkl'))
    print(os.path.abspath('sentiment_model.pkl'))
  2. Ensure Directory Permissions:

    # Linux/Mac
    chmod 755 .
    
    # Windows
    # Check folder permissions in Properties
  3. Save with Absolute Path:

    import os
    model_path = os.path.join(os.getcwd(), 'sentiment_model.pkl')
    model.save_model(model_path)
  4. Delete Corrupted Model:

    rm sentiment_model.pkl
    # Then retrain
    python app.py
🌐 Flask Server Won't Start

Symptoms:

Address already in use
OSError: [Errno 48] Address already in use

Solutions:

  1. Find Process Using Port:

    # Linux/Mac
    lsof -i :5000
    
    # Windows
    netstat -ano | findstr :5000
  2. Kill Process:

    # Linux/Mac
    kill -9 <PID>
    
    # Windows
    taskkill /PID <PID> /F
  3. Use Different Port:

    app.run(debug=True, port=8000)  # Change to 8000
  4. Check for Multiple Instances:

    ps aux | grep python  # Linux/Mac
    tasklist | findstr python  # Windows
📊 CSV Loading Errors

Symptoms:

FileNotFoundError: data.csv not found
KeyError: 'sentiment'
UnicodeDecodeError: 'utf-8' codec can't decode

Solutions:

  1. Check File Location:

    import os
    print(os.listdir('.'))  # List files in current directory
  2. Try Different Encoding:

    # In load_dataset_from_csv()
    # Try encodings in order (one `try` cannot have two bare `except:` clauses)
    df = None
    for encoding in ('utf-8', 'latin-1', 'iso-8859-1'):
        try:
            df = pd.read_csv('data.csv', encoding=encoding)
            break
        except UnicodeDecodeError:
            continue
  3. Check CSV Format:

    import pandas as pd
    df = pd.read_csv('data.csv', nrows=5)
    print(df.columns)  # Check column names
    print(df.head())   # Check first rows
  4. Manual Column Mapping:

    df = df.rename(columns={
        'Tweet': 'text',        # Rename your columns
        'Label': 'sentiment'
    })

🚀 Future Enhancements

Planned Features

| Feature | Description | Status |
|---|---|---|
| 😊 Multi-class Emotions | Detect joy, anger, sadness, fear, surprise | 🔄 Planned |
| 🌍 Multi-language Support | Analyze tweets in multiple languages | 🔄 Planned |
| 📊 Analytics Dashboard | Visualize sentiment trends over time | 🔄 Planned |
| 🔄 Real-time Twitter Stream | Analyze live tweets from Twitter API | 💡 Idea |
| 🤖 Deep Learning Model | Use LSTM/BERT for better accuracy | 💡 Idea |
| 📱 Mobile App | iOS/Android app for on-the-go analysis | 💡 Idea |
| 🔗 Browser Extension | Analyze tweets directly on Twitter.com | 💡 Idea |
| 📈 Trend Analysis | Track sentiment changes for topics/hashtags | 💡 Idea |
| 🎯 Aspect-based Sentiment | Analyze sentiment for specific aspects (price, quality, etc.) | 💡 Idea |
| 💾 Database Integration | Store analysis results in PostgreSQL/MongoDB | 💡 Idea |

🤝 Contributing

Contributions are welcome! Help improve sentiment analysis:

Ways to Contribute

| 🐛 Report Bugs | 💡 Suggest Features | 🔀 Submit Code | 📝 Improve Docs |
|---|---|---|---|
| Found an issue?<br>Open an issue | Have an idea?<br>Share it! | Improvements?<br>Send a PR | Better explanation?<br>Update README |

Development Workflow

  1. Fork the repository
  2. Clone your fork:
    git clone https://github.com/your-username/twitter-sentiment-analysis.git
    cd twitter-sentiment-analysis
  3. Create a feature branch:
    git checkout -b feature/emotion-detection
  4. Make your changes
  5. Test thoroughly
  6. Commit with clear messages:
    git commit -m 'Add emotion detection feature'
  7. Push to your fork:
    git push origin feature/emotion-detection
  8. Open a Pull Request

Code Style Guidelines

  • ✅ Follow PEP 8 for Python code
  • ✅ Use descriptive variable names
  • ✅ Add docstrings to functions
  • ✅ Comment complex logic
  • ✅ Write unit tests for new features
  • ✅ Update documentation

📄 License

This project is licensed under the MIT License

Free to use, modify, and distribute with attribution

MIT License

Copyright (c) 2025 Twitter Sentiment Analysis Project

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Acknowledgments

Special thanks to:

  • 🐍 Scikit-learn Team for powerful ML tools
  • 📚 NLTK Developers for NLP resources
  • 🌐 Flask Community for the web framework
  • 🐦 Twitter for inspiring social media analytics
  • 👥 Open Source Community for continuous support
  • Thank You for using and supporting this project!

👨‍💻 Author


🎓 Computer Applications in AI & ML
Building intelligent NLP solutions


📞 Support

Need Help?

| 📖 Documentation | 💬 Code Comments |
|---|---|
| Complete README guide<br>Setup & troubleshooting | In-line documentation<br>Implementation details |

Refer to the troubleshooting section above for common issues and solutions


🌟 Show Your Support

If this project helped you, please consider:




⭐ Star this repository if you found it helpful!

🍴 Fork it to build your own NLP projects!

📢 Share it with the ML community!



💬 "The limits of my language mean the limits of my world." - Ludwig Wittgenstein





© 2025 Open Source Project | Natural Language Processing | MIT License

