aryan-r03/Twitter-Sentiment-Analysis

🐦 AI-Powered Twitter Sentiment Analysis Web Application

A professional Flask web application that analyzes tweet sentiment in real time using machine learning and natural language processing. It features advanced text preprocessing, TF-IDF vectorization, and logistic regression for accurate emotion detection.

💬 Perfect for social media analytics, brand monitoring, and NLP learning projects


🌟 Project Overview

| 🐦 Twitter | 🤖 AI | 📡 API | 🌐 Web |
|---|---|---|---|
| Tweet Analysis | Machine Learning | RESTful API | Modern UI |
| Real-time sentiment | Logistic Regression | JSON responses | Responsive design |
| 280 char support | 85%+ accuracy | Easy integration | Real-time results |

A production-ready sentiment analysis application that uses Machine Learning to classify tweets as positive or negative in real-time. Built with Flask for the backend, scikit-learn for ML, and NLTK for natural language processing.

🎯 Why This Project?

For Learning:

  • 🎓 Master NLP fundamentals
  • 📊 Understand ML classification
  • 🌐 Learn Flask web development
  • 🧹 Practice text preprocessing
  • 📈 Explore feature engineering (TF-IDF)

For Production:

  • 💼 Industry-standard architecture
  • 🎨 Professional UI/UX design
  • 📱 Social media monitoring
  • 🔍 Brand sentiment analysis
  • 📊 Customer feedback analysis

✨ Features

Core Capabilities

| Category | Features |
|---|---|
| 🤖 Machine Learning | ✅ Logistic Regression classifier<br>✅ Balanced class weights<br>✅ 85%+ accuracy on test data<br>✅ TF-IDF vectorization (5000 features)<br>✅ Bigram support (1-2 word phrases)<br>✅ Model persistence with pickle |
| 📝 NLP Text Processing | ✅ Advanced text cleaning<br>✅ URL and mention removal<br>✅ Lemmatization (WordNet)<br>✅ Stopword removal (keeps negations)<br>✅ Special character handling<br>✅ Lowercase normalization |
| 🌐 Web Application | ✅ Modern, responsive UI<br>✅ Real-time sentiment analysis<br>✅ Character counter (280 limit)<br>✅ Animated result display<br>✅ Confidence score visualization<br>✅ Keyboard shortcuts (Enter to analyze) |
| 📡 RESTful API | ✅ JSON request/response format<br>✅ POST /api/analyze endpoint<br>✅ Detailed sentiment scores<br>✅ Error handling & validation<br>✅ CORS support ready<br>✅ Easy external integration |
| 📊 Model Evaluation | ✅ Accuracy, Precision, Recall metrics<br>✅ F1-Score calculation<br>✅ Confusion matrix visualization<br>✅ Classification report<br>✅ Train/test split (80/20)<br>✅ Stratified sampling |
| 💾 Data Handling | ✅ CSV dataset loading<br>✅ Multiple format support<br>✅ Automatic label conversion<br>✅ Data validation & cleaning<br>✅ Balanced dataset sampling<br>✅ Missing value handling |

🎬 Demo & Preview

Application Interface

```
┌─────────────────────────────────────────────────────┐
│      🐦 Twitter Sentiment Analysis                  │
│   AI-Powered Emotion Detection using ML             │
│    [Trained with Logistic Regression]               │
├─────────────────────────────────────────────────────┤
│                                                     │
│   Enter Tweet                                       │
│   ┌─────────────────────────────────────────────┐   │
│   │ I love this amazing product! It's so good!  │   │
│   └─────────────────────────────────────────────┘   │
│                               47 / 280 characters   │
│                                                     │
│                                                     │
│           [      Analyze Sentiment      ]           │
│   ╔════════════════════════════════════════════╗    │
│   ║             Sentiment: Positive            ║    │
│   ║                                            ║    │
│   ║  ┌───────────┐ ┌────────────┐ ┌──────────┐ ║    │
│   ║  │Confidence │ │Positive    │ │Negative  │ ║    │
│   ║  │   95%     │ │Score: 95%  │ │Score: 5% │ ║    │
│   ║  └───────────┘ └────────────┘ └──────────┘ ║    │
│   ╚════════════════════════════════════════════╝    │
│                                                     │
└─────────────────────────────────────────────────────┘
```

System Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   User Interface (Web)                  │
│                 index.html + JavaScript                 │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌───────────────────────────────────────────────────┐
│             Flask Web Server (app.py)             │
│  ┌────────────────────────────────────────────┐   │
│  │ # Route Handlers                           │   │
│  │  • GET /           → Serve HTML            │   │
│  │  • POST /api/analyze → Analyze sentiment   │   │
│  └────────────────────────────────────────────┘   │
└─────────────────────────┬─────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│        SentimentModel Class (sentiment_model.py)        │
│  ┌──────────────────────────────────────────────────┐   │
│  │  1. Text Preprocessing                           │   │
│  │     • Lowercase, remove URLs, mentions           │   │
│  │     • Lemmatization, stopword removal            │   │
│  │                                                  │   │
│  │  2. TF-IDF Vectorization                         │   │
│  │     • Convert text → numerical features          │   │
│  │     • 5000 max features, bigrams                 │   │
│  │                                                  │   │
│  │  3. Logistic Regression                          │   │
│  │     • Binary classification (0/1)                │   │
│  │     • Probability scores                         │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  Prediction Result                      │
│  {                                                      │
│    "sentiment": "Positive",                             │
│    "confidence": 95,                                    │
│    "positive_score": 95,                                │
│    "negative_score": 5                                  │
│  }                                                      │
└─────────────────────────────────────────────────────────┘
```

ML Pipeline Workflow

```
         Input Tweet
              │
              ▼
┌─────────────────────────────┐
│   Text Preprocessing        │
│  • Convert to lowercase     │
│  • Remove URLs & @mentions  │
│  • Clean special chars      │
│  • Lemmatize words          │
│  • Remove stopwords         │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   TF-IDF Vectorization      │
│  • Extract features         │
│  • Weight by importance     │
│  • Create feature vector    │
│    (5000 dimensions)        │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   Logistic Regression       │
│  • Predict class (0/1)      │
│  • Calculate probabilities  │
│  • Return confidence        │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   Sentiment Result          │
│  • Positive or Negative     │
│  • Confidence score (%)     │
│  • Individual probabilities │
└─────────────────────────────┘
```
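The four stages above map naturally onto a scikit-learn `Pipeline`. This is a minimal sketch with a made-up toy dataset and a simplified `simple_clean` preprocessor, not the repository's actual `sentiment_model.py`, which wires the steps together manually:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def simple_clean(text):
    """Minimal stand-in for the full preprocessing step (no lemmatization)."""
    text = text.lower()
    text = re.sub(r'http\S+|@\w+', '', text)   # strip URLs and @mentions
    return re.sub(r'[^a-z\s!?]', '', text)     # keep letters, spaces, ! and ?

# Tiny illustrative dataset (1 = positive, 0 = negative)
tweets = ["I love this!", "Absolutely great product", "So happy with it",
          "Best purchase ever", "This is terrible", "Worst experience ever",
          "I hate it", "Very disappointed"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2),
                              preprocessor=simple_clean)),
    ('clf', LogisticRegression(max_iter=1000, class_weight='balanced')),
])
pipeline.fit(tweets, labels)

# predict_proba gives the [negative, positive] probabilities used as scores
proba = pipeline.predict_proba(["I love this product"])[0]
print({'negative': round(proba[0], 2), 'positive': round(proba[1], 2)})
```

On real data the `Pipeline` form has the added benefit that the vectorizer and classifier are fit, pickled, and reloaded as a single object.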

🧠 Tech Stack

Technologies & Libraries

| Technology | Role |
|---|---|
| Python 3.8+ | Core language |
| Flask 3.0 | Web framework |
| Scikit-learn | ML library |
| Pandas | Data processing |
| NumPy | Numerical ops |
| NLTK | NLP toolkit |
| HTML5/CSS3 | Frontend |
| JavaScript | Frontend logic |

Component Stack

| Component | Technology | Purpose |
|---|---|---|
| Web Framework | Flask | HTTP routing, templating, JSON API |
| ML Algorithm | Logistic Regression | Binary sentiment classification |
| Feature Extraction | TF-IDF Vectorizer | Text → numerical features |
| Text Processing | NLTK | Lemmatization, stopwords, tokenization |
| Data Handling | Pandas | CSV loading, data manipulation |
| Model Storage | Pickle | Serialize/deserialize trained model |
| Frontend | HTML/CSS/JavaScript | Responsive UI, AJAX requests |

📦 Installation

System Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.9 - 3.11 |
| RAM | 2 GB | 4 GB+ |
| Storage | 500 MB | 1 GB+ (for datasets) |
| OS | Windows 10+, macOS 10.14+, or Ubuntu 18.04+ | — |

Dependencies

```
# Core Framework
flask==3.0.0                # Web application framework

# Machine Learning
pandas==2.1.4               # Data manipulation
numpy==1.26.2               # Numerical computing
scikit-learn==1.3.2         # ML algorithms & tools

# Natural Language Processing
nltk==3.8.1                 # NLP toolkit (stopwords, lemmatization)
```

🚀 Quick Start

Step 1️⃣: Clone Repository

git clone https://github.com/your-username/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis

Step 2️⃣: Create Virtual Environment (Recommended)

Windows:

python -m venv venv
venv\Scripts\activate

macOS/Linux:

python -m venv venv
source venv/bin/activate

Step 3️⃣: Install Dependencies

pip install -r requirements.txt

Verify installation:

python -c "import flask, sklearn, nltk, pandas; print('All dependencies installed!')"

Step 4️⃣: Download NLTK Data (Automatic)

The application will automatically download required NLTK data on first run:

  • Stopwords corpus
  • WordNet lemmatizer
  • OMW-1.4 (Open Multilingual Wordnet)

Manual download (if needed):

import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

Step 5️⃣: Prepare Dataset (Optional)

Option A: Use Sample Data (Default)

  • Application creates a sample dataset automatically
  • Good for testing and learning

Option B: Use Custom Dataset

# Place your CSV file in the project directory
# Name it: data.csv or Twitter_data.csv

# CSV Format:
# - Column 1: Tweet text
# - Column 2: Sentiment label (Positive/Negative or 0/1)

Step 6️⃣: Run the Application

python app.py

Expected Output:

============================================================
INITIALIZING SENTIMENT ANALYSIS MODEL
============================================================

✓ Found existing model: sentiment_model.pkl
✓ Model loaded successfully!

============================================================
MODEL READY
============================================================

============================================================
TWITTER SENTIMENT ANALYSIS - MACHINE LEARNING PROJECT
============================================================

📌 To use your own dataset:
   Place CSV file named 'data.csv' in this directory
   CSV should have columns: 'text' and 'sentiment'

Starting Flask server...
Open your browser and go to: http://127.0.0.1:5000
============================================================

 * Running on http://127.0.0.1:5000

Step 7️⃣: Access the Application

  1. Open your browser
  2. Navigate to the URL from the terminal
  3. Enter a tweet in the text area
  4. Click "Analyze Sentiment"
  5. View results with confidence scores! 🎉

💻 Usage Guide

Web Interface

Step-by-Step:

  1. Enter Tweet Text

    • Type or paste a tweet (up to 280 characters)
    • Character counter shows remaining length
    • Example: "I absolutely love this product! Best purchase ever!"
  2. Analyze Sentiment

    • Click "Analyze Sentiment" button
    • Or press Enter key for quick analysis
    • Loading animation appears during processing
  3. View Results

    • Sentiment: Positive or Negative
    • Confidence: Overall prediction confidence (%)
    • Positive Score: Probability of positive sentiment (%)
    • Negative Score: Probability of negative sentiment (%)

Example Tweets to Try:

| Tweet | Expected Result |
|---|---|
| "I love this amazing product! It's fantastic!" | ✅ Positive (90%+ confidence) |
| "This is terrible. Worst experience ever." | ❌ Negative (90%+ confidence) |
| "The weather is nice today." | ✅ Positive (moderate confidence) |
| "I don't like this at all. Very disappointed." | ❌ Negative (high confidence) |

Training Custom Model

With Your Own Dataset:

from sentiment_model import SentimentModel

# Initialize model
model = SentimentModel()

# Load your CSV file
df = model.load_dataset_from_csv(
    'your_data.csv',
    text_column='tweet',      # Column with tweet text
    label_column='sentiment'  # Column with labels
)

# Train model
if df is not None:
    accuracy = model.train(df)
    print(f"Model accuracy: {accuracy:.2%}")
    
    # Save trained model
    model.save_model('my_sentiment_model.pkl')

Test Predictions:

# Make predictions
test_tweets = [
    "I love this product!",
    "This is terrible.",
    "Not bad, could be better."
]

for tweet in test_tweets:
    result = model.predict(tweet)
    print(f"\nTweet: {tweet}")
    print(f"Sentiment: {result['sentiment']}")
    print(f"Confidence: {result['confidence']}%")

📡 API Reference

Available Endpoints

🏠 Home Page

Endpoint: GET /

Description: Serves the main HTML interface

Response: HTML page

Usage:

curl http://127.0.0.1:5000/

🔮 Analyze Sentiment

Endpoint: POST /api/analyze

Description: Analyzes sentiment of provided tweet text

Request Headers:

Content-Type: application/json

Request Body:

{
  "tweet": "I absolutely love this product! It's amazing!"
}

Response (Success):

{
  "success": true,
  "result": {
    "sentiment": "Positive",
    "confidence": 95,
    "positive_score": 95,
    "negative_score": 5
  }
}

Response (Error - Empty Tweet):

{
  "success": false,
  "error": "Tweet is empty"
}

Response (Error - No Tweet Provided):

{
  "success": false,
  "error": "No tweet provided"
}

Status Codes:

  • 200 OK - Analysis successful
  • 400 Bad Request - Invalid input (empty tweet, no tweet field)
  • 500 Internal Server Error - Server/model error
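The validation behind those status codes can be sketched as a small helper. This is a hypothetical `validate_tweet` function mirroring the documented error responses, not necessarily how app.py structures it; the 280-character check is an assumption carried over from the UI limit:

```python
def validate_tweet(payload):
    """Return (http_status, error_message_or_None) for an /api/analyze payload."""
    if not isinstance(payload, dict) or 'tweet' not in payload:
        return 400, 'No tweet provided'
    tweet = payload['tweet']
    if not isinstance(tweet, str) or not tweet.strip():
        return 400, 'Tweet is empty'
    if len(tweet) > 280:
        # assumption: the API mirrors the UI's 280-character limit
        return 400, 'Tweet exceeds 280 characters'
    return 200, None

print(validate_tweet({'tweet': 'I love this!'}))   # → (200, None)
print(validate_tweet({'tweet': '   '}))            # → (400, 'Tweet is empty')
print(validate_tweet({}))                          # → (400, 'No tweet provided')
```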

Integration Examples

Python (requests)
import requests
import json

# API endpoint
url = "http://127.0.0.1:5000/api/analyze"

# Tweet to analyze
tweet_data = {
    "tweet": "I love this amazing product! Best purchase ever!"
}

# Make request
response = requests.post(url, json=tweet_data)
result = response.json()

if result['success']:
    print(f"Sentiment: {result['result']['sentiment']}")
    print(f"Confidence: {result['result']['confidence']}%")
    print(f"Positive Score: {result['result']['positive_score']}%")
    print(f"Negative Score: {result['result']['negative_score']}%")
else:
    print(f"Error: {result['error']}")
JavaScript (Fetch API)
// API endpoint
const url = 'http://127.0.0.1:5000/api/analyze';

// Tweet to analyze
const tweetData = {
  tweet: "I love this amazing product! Best purchase ever!"
};

// Make request
fetch(url, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(tweetData)
})
.then(response => response.json())
.then(data => {
  if (data.success) {
    console.log(`Sentiment: ${data.result.sentiment}`);
    console.log(`Confidence: ${data.result.confidence}%`);
    console.log(`Positive Score: ${data.result.positive_score}%`);
    console.log(`Negative Score: ${data.result.negative_score}%`);
  } else {
    console.error(`Error: ${data.error}`);
  }
})
.catch(error => console.error('Request failed:', error));
cURL (Command Line)
# Analyze sentiment
curl -X POST http://127.0.0.1:5000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "tweet": "I love this amazing product! Best purchase ever!"
  }'

# Pretty print with jq
curl -X POST http://127.0.0.1:5000/api/analyze \
  -H "Content-Type: application/json" \
  -d @tweet.json | jq '.'
Node.js (Axios)
const axios = require('axios');

// API endpoint
const url = 'http://127.0.0.1:5000/api/analyze';

// Tweet to analyze
const tweetData = {
  tweet: "I love this amazing product! Best purchase ever!"
};

// Make request
axios.post(url, tweetData)
  .then(response => {
    const data = response.data;
    if (data.success) {
      console.log(`Sentiment: ${data.result.sentiment}`);
      console.log(`Confidence: ${data.result.confidence}%`);
      console.log(`Positive: ${data.result.positive_score}%`);
      console.log(`Negative: ${data.result.negative_score}%`);
    } else {
      console.error(`Error: ${data.error}`);
    }
  })
  .catch(error => console.error('Request failed:', error));

🤖 Model Details

Logistic Regression Classifier

Algorithm Configuration:

LogisticRegression(
    max_iter=1000,          # Maximum iterations for convergence
    random_state=42,        # Reproducibility
    class_weight='balanced' # Handle imbalanced datasets
)

Key Features:

  • Binary Classification: Positive (1) vs Negative (0)
  • Probability Output: Confidence scores for each class
  • Balanced Weights: Handles imbalanced data automatically
  • Fast Training: Efficient on large datasets
  • Interpretable: Clear feature importance

Model Performance Metrics

| Metric | Score | Interpretation |
|---|---|---|
| Accuracy | 85-90% | Overall correct predictions |
| Precision | 83-88% | Positive predictions that are correct |
| Recall | 85-90% | Actual positives correctly identified |
| F1-Score | 0.84-0.89 | Harmonic mean of precision & recall |

Sample Confusion Matrix:

```
                Predicted
              Neg    Pos
Actual Neg    420    35      Specificity: 92.3%
       Pos     48    397     Recall: 89.2%

True Negatives: 420    False Positives: 35
False Negatives: 48    True Positives: 397

Overall Accuracy: (420 + 397) / 900 = 90.78%
```
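All of the headline metrics can be reproduced directly from the four cells of the sample matrix above (plain arithmetic; these counts are illustrative, not a guaranteed training run):

```python
tn, fp, fn, tp = 420, 35, 48, 397  # cells of the sample confusion matrix

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 817 / 900
precision   = tp / (tp + fp)                    # of predicted positives, how many were right
recall      = tp / (tp + fn)                    # of actual positives, how many were found
specificity = tn / (tn + fp)                    # of actual negatives, how many were found
f1          = 2 * precision * recall / (precision + recall)

print(f"Accuracy:    {accuracy:.2%}")    # 90.78%
print(f"Recall:      {recall:.2%}")      # 89.21%
print(f"Specificity: {specificity:.2%}") # 92.31%
print(f"F1-Score:    {f1:.3f}")
```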

TF-IDF Vectorization

Configuration:

TfidfVectorizer(
    max_features=5000,      # Top 5000 most important features
    ngram_range=(1, 2)      # Unigrams and bigrams
)

What is TF-IDF?

  • Term Frequency (TF): How often a word appears in a document
  • Inverse Document Frequency (IDF): How unique/rare a word is across all documents
  • TF-IDF Score: TF × IDF = Importance of word in document

Example:

Tweet: "I love this product! The product is amazing!"

Unigrams: ["love", "product", "amazing", ...]
Bigrams: ["love product", "product amazing", ...]

TF-IDF Vector: [0.23, 0.45, 0.67, ...] (5000 dimensions)

Why Bigrams?

  • Captures phrases: "not good" vs "good"
  • Better context understanding
  • Improved accuracy for negations

🔧 Text Processing Pipeline

Preprocessing Steps

Complete Pipeline:

```python
def clean_text(text):
    # 1. Lowercase
    text = text.lower()
    # → "I Love This!" → "i love this!"

    # 2. Remove URLs
    text = re.sub(r'http\S+|www\S+', '', text)
    # → "check this: http://example.com" → "check this:"

    # 3. Remove @mentions
    text = re.sub(r'@\w+', '', text)
    # → "@user great product!" → "great product!"

    # 4. Remove special characters (keep !?)
    text = re.sub(r'[^a-zA-Z\s!?]', '', text)   # note: a-zA-Z (not a-zA-z)
    # → "product #1 costs $50!" → "product  costs !"

    # 5. Tokenize, then lemmatize
    words = text.split()
    words = [lemmatizer.lemmatize(word) for word in words]
    # → "products" → "product" (WordNet lemmatizer, noun POS by default)

    # 6. Remove stopwords (keep negations)
    keep_words = {'not', 'no', 'never', 'none', 'neither', 'nor'}
    words = [word for word in words
             if word not in stopwords or word in keep_words]
    # → keeps "not good" (important for sentiment)

    return ' '.join(words)
```

Example Transformation:

| Stage | Text |
|---|---|
| Original | I LOVE this product! Check it out: http://example.com @company #amazing |
| Lowercase | i love this product! check it out: http://example.com @company #amazing |
| Remove URLs | i love this product! check it out: @company #amazing |
| Remove Mentions | i love this product! check it out: #amazing |
| Remove Special Chars | i love this product check it out amazing |
| Lemmatize | i love this product check it out amazing |
| Remove Stopwords | love product check amazing |
| Final | love product check amazing |

Negation Handling:

```python
# These words are preserved even though they're stopwords
keep_words = {'not', 'no', 'never', 'none', 'nothing', 'neither', 'nor', "n't"}

# Why? They completely change sentiment:
#   "good"     → Positive
#   "not good" → Negative ✓ (preserved)
```
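A self-contained sketch of that filter. The inline stopword set here is a tiny stand-in for illustration; the project pulls the real English list from NLTK:

```python
# Tiny stand-in for NLTK's English stopword list
stopwords = {'i', 'this', 'is', 'a', 'the', 'at', 'all', 'not', 'no'}
keep_words = {'not', 'no', 'never', 'none', 'nothing', 'neither', 'nor'}

def remove_stopwords(text):
    """Drop stopwords but keep negations, so 'not good' stays distinguishable."""
    return ' '.join(w for w in text.split()
                    if w not in stopwords or w in keep_words)

print(remove_stopwords("this is not a good product"))
# → "not good product"
```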

📊 Dataset Format

Supported CSV Formats

Format 1: Standard Binary (Recommended)

text,sentiment
"I love this product!",1
"This is terrible.",0
"Great experience!",1
"Worst service ever.",0

Format 2: Text Labels

text,sentiment
"I love this product!",positive
"This is terrible.",negative
"Great experience!",positive
"Worst service ever.",negative

Format 3: Twitter Format (0/4)

text,sentiment
"I love this product!",4
"This is terrible.",0
"Great experience!",4
"Worst service ever.",0

Format 4: Alternative Column Names

tweet,label
"I love this product!",pos
"This is terrible.",neg

Automatic Format Detection

The application automatically detects and converts:

| Format | Conversion |
|---|---|
| 0/1 | ✓ Already binary |
| 0/4 | 0→0, 4→1 |
| positive/negative | negative→0, positive→1 |
| pos/neg | neg→0, pos→1 |
| Custom numeric | Threshold at median |
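That detection logic can be sketched as a small mapping function. This is a hypothetical `normalize_label` mirroring the table, not the repository's exact implementation; the "threshold at median" case needs the whole label column and is omitted here:

```python
def normalize_label(label):
    """Map the label formats from the table above onto binary 0/1."""
    text_map = {'negative': 0, 'positive': 1, 'neg': 0, 'pos': 1}
    if isinstance(label, str):
        key = label.strip().lower()
        if key in text_map:
            return text_map[key]
        label = float(key)  # fall through for numeric strings like "4"
    if label in (0, 1):
        return int(label)
    if label == 4:          # classic Sentiment140 convention: 0 = neg, 4 = pos
        return 1
    raise ValueError(f"Unrecognized label: {label!r}")

print([normalize_label(x) for x in [0, 4, 'positive', 'neg', '1']])
# → [0, 1, 1, 0, 1]
```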

Dataset Requirements:

Must Have:

  • Text column (tweets/messages)
  • Sentiment/label column
  • At least 100 samples (recommended: 1000+)
  • Balanced classes (equal pos/neg samples)

Avoid:

  • Missing values in text or sentiment
  • Empty tweets
  • Single-class datasets
  • Extreme class imbalance (>90% one class)
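The checks above can be automated before training. A pure-Python sketch over `(text, label)` pairs; the thresholds are the guidelines stated here, not hard rules:

```python
def check_dataset(samples, min_samples=100, max_imbalance=0.9):
    """Return a list of problems found in a list of (text, label) pairs."""
    problems = []
    if len(samples) < min_samples:
        problems.append(f"only {len(samples)} samples (want {min_samples}+)")
    if any(not str(text).strip() for text, _ in samples):
        problems.append("empty tweets present")
    labels = [label for _, label in samples]
    classes = set(labels)
    if len(classes) < 2:
        problems.append("single-class dataset")
    else:
        majority = max(labels.count(c) for c in classes) / len(labels)
        if majority > max_imbalance:
            problems.append(f"extreme class imbalance ({majority:.0%} one class)")
    return problems

tiny = [("good", 1)] * 95 + [("bad", 0)] * 5
print(check_dataset(tiny))  # flags the 95/5 imbalance
```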

Example Custom Dataset:

import pandas as pd

# Create custom dataset
data = {
    'text': [
        "I love this!",
        "Terrible experience",
        "Amazing product!",
        "Very disappointed",
        # ... more samples
    ],
    'sentiment': [1, 0, 1, 0, ...]  # 0=negative, 1=positive
}

df = pd.DataFrame(data)
df.to_csv('my_dataset.csv', index=False)

⚙️ Configuration

Model Parameters

Modify in sentiment_model.py:

# TF-IDF Configuration
self.vectorizer = TfidfVectorizer(
    max_features=5000,        # Number of features (1000-10000)
    ngram_range=(1, 2),       # (1,1) for unigrams only, (1,3) for trigrams
    min_df=2,                 # Minimum document frequency
    max_df=0.95               # Maximum document frequency
)

# Logistic Regression Configuration
self.model = LogisticRegression(
    max_iter=1000,            # Increase if convergence warning
    random_state=42,          # For reproducibility
    class_weight='balanced',  # Handle imbalanced data
    C=1.0,                    # Regularization strength (lower = more regularization)
    solver='lbfgs'            # Optimization algorithm
)

Parameter Tuning Guide:

| Parameter | Effect | Recommendation |
|---|---|---|
| max_features | Number of TF-IDF features | 5000 (balanced)<br>3000 (faster)<br>10000 (more accurate) |
| ngram_range | Word combinations to consider | (1,2) - unigrams + bigrams<br>(1,1) - single words only<br>(1,3) - up to 3-word phrases |
| max_iter | Training iterations | 1000 (default)<br>2000 (if not converging) |
| class_weight | Handle class imbalance | 'balanced' (recommended)<br>None (equal weights) |
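One way to choose between those settings systematically is a small grid search over the combined pipeline. A sketch with a toy dataset; on a real corpus this can take a while:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy dataset just to make the sketch runnable
texts = ["love it", "great stuff", "really good", "so happy", "very nice", "best ever",
         "hate it", "really bad", "so awful", "worst ever", "very poor", "terrible stuff"]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LogisticRegression(max_iter=1000, class_weight='balanced')),
])

# Step-prefixed parameter names reach inside the pipeline
param_grid = {
    'tfidf__max_features': [3000, 5000],
    'tfidf__ngram_range': [(1, 1), (1, 2)],
    'clf__C': [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring='f1')
search.fit(texts, labels)
print(search.best_params_)
print(f"best CV F1: {search.best_score_:.2f}")
```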

Flask Server Configuration

Modify in app.py:

# Change port
app.run(debug=True, port=5000)  # Use 8000, 8080, etc.

# Production mode
app.run(debug=False, host='0.0.0.0', port=5000)

# Threading for multiple requests
app.run(debug=False, threaded=True)

File Paths

# CSV dataset path
CSV_FILE = 'data.csv'  # Change to your dataset name

# Model save/load path
MODEL_FILE = 'sentiment_model.pkl'  # Custom model name

🎨 Customization

Extension Ideas

🎨 Custom UI Theme

Modify templates/index.html:

/* Change gradient colors */
body {
    background: linear-gradient(135deg, #1e3c72 0%, #2a5298 100%);
    /* Or try: */
    background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
    background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
}

/* Change button colors */
.btn {
    background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
}

/* Change positive/negative colors */
.result.positive {
    background: #c3e6cb;  /* Light green */
    border: 2px solid #28a745;
}

.result.negative {
    background: #f5c6cb;  /* Light red */
    border: 2px solid #dc3545;
}
📊 Add More Metrics
# In sentiment_model.py - predict() method

def predict(self, text):
    # ... existing code ...
    
    # Add entropy (uncertainty measure)
    entropy = -sum(p * np.log(p) for p in probabilities if p > 0)
    
    # Add emotion detection (requires additional model)
    emotions = self.detect_emotions(text)
    
    return {
        'sentiment': sentiment,
        'confidence': confidence,
        'positive_score': int(probabilities[1] * 100),
        'negative_score': int(probabilities[0] * 100),
        'entropy': round(entropy, 3),  # NEW
        'emotions': emotions  # NEW
    }
💾 Save Analysis History
# In app.py

import datetime
import json

HISTORY_FILE = 'analysis_history.json'

@app.route('/api/analyze', methods=['POST'])
def analyze():
    # ... existing analysis code ...
    
    # Save to history
    history_entry = {
        'timestamp': datetime.datetime.now().isoformat(),
        'tweet': tweet,
        'result': result
    }
    
    # Load existing history
    try:
        with open(HISTORY_FILE, 'r') as f:
            history = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        history = []
    
    # Add new entry
    history.append(history_entry)
    
    # Save history
    with open(HISTORY_FILE, 'w') as f:
        json.dump(history, f, indent=2)
    
    return jsonify({'success': True, 'result': result})

# Add endpoint to view history
@app.route('/api/history', methods=['GET'])
def get_history():
    try:
        with open(HISTORY_FILE, 'r') as f:
            history = json.load(f)
        return jsonify({'success': True, 'history': history})
    except (FileNotFoundError, json.JSONDecodeError):
        return jsonify({'success': False, 'error': 'No history found'})
🔄 Try Different ML Models
# In sentiment_model.py

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB

# Option 1: Random Forest
self.model = RandomForestClassifier(
    n_estimators=100,
    max_depth=50,
    random_state=42
)

# Option 2: Support Vector Machine
self.model = SVC(
    kernel='linear',
    probability=True,  # Required for predict_proba
    random_state=42
)

# Option 3: Naive Bayes
self.model = MultinomialNB(
    alpha=1.0  # Smoothing parameter
)

# Compare models
models = {
    'Logistic Regression': LogisticRegression(),
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(probability=True),
    'Naive Bayes': MultinomialNB()
}

for name, model in models.items():
    self.model = model
    accuracy = self.train(df)
    print(f"{name}: {accuracy:.2%}")
📱 Add Batch Analysis
# In app.py

@app.route('/api/batch_analyze', methods=['POST'])
def batch_analyze():
    try:
        data = request.get_json()
        tweets = data.get('tweets', [])
        
        if not tweets or not isinstance(tweets, list):
            return jsonify({
                'success': False,
                'error': 'Invalid tweets array'
            }), 400
        
        results = []
        for tweet in tweets:
            result = sentiment_model.predict(tweet)
            results.append({
                'tweet': tweet,
                'sentiment': result
            })
        
        return jsonify({
            'success': True,
            'results': results,
            'count': len(results)
        })
    
    except Exception as e:
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500

Usage:

curl -X POST http://127.0.0.1:5000/api/batch_analyze \
  -H "Content-Type: application/json" \
  -d '{
    "tweets": [
      "I love this!",
      "This is terrible.",
      "Not bad."
    ]
  }'
🌐 Add Multi-language Support
# Install: pip install googletrans==3.1.0a0

from googletrans import Translator

translator = Translator()

def translate_to_english(text):
    """Translate text to English before analysis"""
    try:
        detected = translator.detect(text)
        if detected.lang != 'en':
            translated = translator.translate(text, dest='en')
            return translated.text
    except Exception:
        pass  # fall back to the original text if translation fails
    return text

# In predict() method
def predict(self, text):
    # Translate if needed
    text_english = translate_to_english(text)
    
    # ... rest of prediction code ...

🐛 Troubleshooting

Common Issues & Solutions

❌ NLTK Data Not Found

Symptoms:

LookupError: Resource stopwords not found.
LookupError: Resource wordnet not found.

Solutions:

  1. Manual Download:

    import nltk
    nltk.download('stopwords')
    nltk.download('wordnet')
    nltk.download('omw-1.4')
  2. Download to Specific Directory:

    import nltk
    nltk.download('stopwords', download_dir='/path/to/nltk_data')
    nltk.data.path.append('/path/to/nltk_data')
  3. Download All NLTK Data:

    import nltk
    nltk.download('all')  # Warning: Large download (3.5 GB)
  4. Verify Installation:

    from nltk.corpus import stopwords
    print(stopwords.words('english')[:10])
    # Should print: ['i', 'me', 'my', 'myself', ...]
🔄 Model Not Training / Low Accuracy

Symptoms:

  • Accuracy < 70%
  • Model predicts same class for everything
  • Convergence warnings

Solutions:

  1. Check Dataset Balance:

    print(df['sentiment'].value_counts())
    # Should be roughly equal:
    # 0    5000
    # 1    5000
  2. Increase Training Data:

    • Need minimum 500 samples (250 per class)
    • Recommended: 5000+ samples
  3. Increase Max Iterations:

    self.model = LogisticRegression(max_iter=2000)  # Increase from 1000
  4. Balance Dataset:

    # Undersample majority class
    min_count = min(
        (df['sentiment'] == 0).sum(),
        (df['sentiment'] == 1).sum()
    )
    
    df_neg = df[df['sentiment'] == 0].sample(n=min_count)
    df_pos = df[df['sentiment'] == 1].sample(n=min_count)
    df_balanced = pd.concat([df_neg, df_pos])
  5. Check Text Quality:

    # Print sample cleaned texts
    print(df['cleaned_text'].head(10))
    # Should not be empty or too short
💾 Model Save/Load Errors

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'sentiment_model.pkl'
pickle.UnpicklingError: invalid load key

Solutions:

  1. Check File Exists:

    import os
    print(os.path.exists('sentiment_model.pkl'))
    print(os.path.abspath('sentiment_model.pkl'))
  2. Ensure Directory Permissions:

    # Linux/Mac
    chmod 755 .
    
    # Windows
    # Check folder permissions in Properties
  3. Save with Absolute Path:

    import os
    model_path = os.path.join(os.getcwd(), 'sentiment_model.pkl')
    model.save_model(model_path)
  4. Delete Corrupted Model:

    rm sentiment_model.pkl
    # Then retrain
    python app.py
🌐 Flask Server Won't Start

Symptoms:

Address already in use
OSError: [Errno 48] Address already in use

Solutions:

  1. Find Process Using Port:

    # Linux/Mac
    lsof -i :5000
    
    # Windows
    netstat -ano | findstr :5000
  2. Kill Process:

    # Linux/Mac
    kill -9 <PID>
    
    # Windows
    taskkill /PID <PID> /F
  3. Use Different Port:

    app.run(debug=True, port=8000)  # Change to 8000
  4. Check for Multiple Instances:

    ps aux | grep python  # Linux/Mac
    tasklist | findstr python  # Windows
📊 CSV Loading Errors

Symptoms:

FileNotFoundError: data.csv not found
KeyError: 'sentiment'
UnicodeDecodeError: 'utf-8' codec can't decode

Solutions:

  1. Check File Location:

    import os
    print(os.listdir('.'))  # List files in current directory
  2. Try Different Encoding:

    # In load_dataset_from_csv()
    # Try encodings in order (one `try` cannot have two bare `except:` clauses)
    df = None
    for encoding in ('utf-8', 'latin-1', 'iso-8859-1'):
        try:
            df = pd.read_csv('data.csv', encoding=encoding)
            break
        except UnicodeDecodeError:
            continue
  3. Check CSV Format:

    import pandas as pd
    df = pd.read_csv('data.csv', nrows=5)
    print(df.columns)  # Check column names
    print(df.head())   # Check first rows
  4. Manual Column Mapping:

    df = df.rename(columns={
        'Tweet': 'text',        # Rename your columns
        'Label': 'sentiment'
    })

🚀 Future Enhancements

Planned Features

| Feature | Description | Status |
|---|---|---|
| 😊 Multi-class Emotions | Detect joy, anger, sadness, fear, surprise | 🔄 Planned |
| 🌍 Multi-language Support | Analyze tweets in multiple languages | 🔄 Planned |
| 📊 Analytics Dashboard | Visualize sentiment trends over time | 🔄 Planned |
| 🔄 Real-time Twitter Stream | Analyze live tweets from Twitter API | 💡 Idea |
| 🤖 Deep Learning Model | Use LSTM/BERT for better accuracy | 💡 Idea |
| 📱 Mobile App | iOS/Android app for on-the-go analysis | 💡 Idea |
| 🔗 Browser Extension | Analyze tweets directly on Twitter.com | 💡 Idea |
| 📈 Trend Analysis | Track sentiment changes for topics/hashtags | 💡 Idea |
| 🎯 Aspect-based Sentiment | Analyze sentiment for specific aspects (price, quality, etc.) | 💡 Idea |
| 💾 Database Integration | Store analysis results in PostgreSQL/MongoDB | 💡 Idea |

🤝 Contributing

Contributions are welcome! Help improve sentiment analysis:

Ways to Contribute

| 🐛 Report Bugs | 💡 Suggest Features | 🔀 Submit Code | 📝 Improve Docs |
|---|---|---|---|
| Found an issue?<br>Open an issue | Have an idea?<br>Share it! | Improvements?<br>Send a PR | Better explanation?<br>Update README |

Development Workflow

  1. Fork the repository
  2. Clone your fork:
    git clone https://github.com/your-username/twitter-sentiment-analysis.git
    cd twitter-sentiment-analysis
  3. Create a feature branch:
    git checkout -b feature/emotion-detection
  4. Make your changes
  5. Test thoroughly
  6. Commit with clear messages:
    git commit -m 'Add emotion detection feature'
  7. Push to your fork:
    git push origin feature/emotion-detection
  8. Open a Pull Request

Code Style Guidelines

  • ✅ Follow PEP 8 for Python code
  • ✅ Use descriptive variable names
  • ✅ Add docstrings to functions
  • ✅ Comment complex logic
  • ✅ Write unit tests for new features
  • ✅ Update documentation

📄 License

This project is licensed under the MIT License

Free to use, modify, and distribute with attribution

MIT License

Copyright (c) 2025 Twitter Sentiment Analysis Project

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Acknowledgments

Special thanks to:

  • 🐍 Scikit-learn Team for powerful ML tools
  • 📚 NLTK Developers for NLP resources
  • 🌐 Flask Community for the web framework
  • 🐦 Twitter for inspiring social media analytics
  • 👥 Open Source Community for continuous support
  • Thank You for using and supporting this project!

👨‍💻 Author


🎓 Computer Applications in AI & ML
Building intelligent NLP solutions


📞 Support

Need Help?

| 📖 Documentation | 💬 Code Comments |
|---|---|
| Complete README guide<br>Setup & troubleshooting | In-line documentation<br>Implementation details |

Refer to the troubleshooting section above for common issues and solutions


🌟 Show Your Support

If this project helped you, please consider:




⭐ Star this repository if you found it helpful!

🍴 Fork it to build your own NLP projects!

📢 Share it with the ML community!



💬 "The limits of my language mean the limits of my world." - Ludwig Wittgenstein





© 2025 Open Source Project | Natural Language Processing | MIT License

