Skip to content

GarfieldFluffJr/Terence

Repository files navigation

Terence  🦅

Terence.jpg

Terence is a Python package that makes it easy to scan and analyze GitHub repositories. It simplifies the GitHub API and processes the repo contents into a simple flat dictionary that can be accessed by file path.

Installation

From PyPI

pip install terence

Quick Start

1. Get a GitHub Developer Token

Create a personal access token at: https://github.com/settings/tokens

  • New token (classic)
  • Only permission required: repo -> public_repo
  • Additional permissions are optional

2. Basic Usage

from terence import Terence

# Initialize a new Terence instance
terence = Terence()

# Authenticate Terence
terence.auth("ghp_your_token_here")

# Scan a repository
terence.scan_repository("https://github.com/user/repo_name")

# Access repo contents
print(f"Found {len(terence.results)} files")
for file_path, content in terence.results.items():
    print(f"{file_path}: {len(content)} characters")

Usage Guide

Authentication

You must authenticate Terence with your GitHub API token before scanning any repository

terence = Terence()
terence.auth("ghp_your_token_here")

Scanning Repositories

# Scan entire repository
terence.scan_repository("https://github.com/user/repo_name")

You also have the option to scan specific file types by providing the extension in a list argument

Extension can be prepended with "." but not required (py vs .py)

# Scan only Python files
terence.scan_repository("https://github.com/user/repo_name", ["py"])

# Scan multiple file types
terence.scan_repository("https://github.com/user/repo_name", ["py", "js", "html"])

Working with Branches

You can scan the contents of a specific branch rather than the default main/master branch

# Scan a specific branch name
terence.branch("develop")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific tag
terence.branch("v2.0.0")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific commit (can chain methods)
terence.branch("abc123def456").scan_repository("https://github.com/user/repo_name")

To reset to the default branch, simply clear the results and scan again

# Reset to default branch
terence.clear_results() 
terence.scan_repository("https://github.com/user/repo_name")

Accessing Results

Once a scan is performed, the repository's file contents are stored in a flat dictionary in terence.results.

results = terence.results

# List all files:
for path in results.keys():
    print(f" - {path}")

# Print the first 200 characters of a specific file
if "frontend/app/page.tsx" in results:
    print(results["frontend/app/page.tsx"][:200])

# Search content across files
for file_path, content in results.items():
    if "def main" in content:
        print(f"Found 'def main' in: {file_path})

Sample Results Output

Results is a flat dictionary with each key being the path to the file including the file name and the value is the raw contents of the file

terence.results = {
    'frontend/app/index.html': '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head></html>...',
    'frontend/app/styles/globals.css': 'body {\n  font-family: Arial, sans-serif;\n...}\nh1 {\n  color: #333;\n}'
}

Repository Information

terence.scan_repository("https://github.com/user/repo_name")

repo_info = terence.get_repo_info()

repo_info = {
    'owner': 'user',
    'repo': 'repo_name',
    'url': 'https://github.com/user/repo_name'
}

Rate Limit Management

GitHub API allows for 5000 requests per hour per authenticated API token or 60 for unauthenticated.

Terence automatically flags a RateLimitError if rate limit is too low to make a new repository scan request.

rate = terence.get_rate_limit()

rate = {
    'remaining': 4102,
    'limit': 5000, # GitHub limit
    # Date format yyyy-mm-dd hr:min:sec+00:00 timezone
    'reset': datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc)
}

Clearing Data

# Clear results but stay authenticated
terence.clear_results()

# Clear everything (deauthenticate)
terence.clear_all()

Sample Error Handling

from terence import Terence, RateLimitException

terence = Terence().auth("ghp_your_token_here")

try:
    terence.scan_repository("https://github.com/user/repo_name")
    print(f"Success! Found {len(terence.results)} files")

except RateLimitException as e:
    print(f"Rate limit reached: {e}")
    # Wait until reset time or use different token

except ValueError as e:
    print(f"Invalid input: {e}")
    # Check URL format or extension list

except Exception as e:
    print(f"Error: {e}")
    # Handle authentication, repo not found, etc.

File Filtering

Allowed Extensions

By default, Terence scans these file types:

  • Python: .py
  • JavaScript/TypeScript: .js, .jsx, .ts, .tsx
  • Web: .html, .htm, .css, .scss, .sass, .vue, .svelte
  • Java: .java
  • C/C++: .c, .cpp, .h, .hpp, .cc
  • Other: .go, .rs, .rb, .php, .swift, .kt, .cs

Excluded Directories

The following directories are automatically excluded:

  • node_modules/, .git/, venv/, env/, .venv/
  • __pycache__/, dist/, build/
  • .next/, .nuxt/, target/, bin/, obj/
  • test/, tests/, .pytest_cache/, coverage/

Error Types

RateLimitException

Raised when GitHub API rate limit is too low (< 10 requests remaining).

from terence import RateLimitException

try:
    terence.scan_repository(url)
except RateLimitException as e:
    print(f"Rate limit reached: {e}")

ValueError

Raised when:

  • Invalid GitHub URL format
  • Extension not in allowed extensions list

Exception

Raised for:

  • Not authenticated
  • Invalid GitHub token
  • Repository not found (or private)
  • Other GitHub API errors

Development

Setup

git clone https://github.com/yourusername/terence.git
cd terence
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest tests/test_client.py -v

# Run specific test
pytest tests/test_client.py::TestTerence::test_auth -v

# Run with coverage
pytest tests/test_client.py --cov=terence --cov-report=html

Requirements

  • Python 3.7+
  • PyGithub >= 2.1.1
  • python-dotenv >= 1.0.0

License

MIT License - see LICENSE file for details

Contributions & Support

Contributions are welcome! Feel free to fork and submit a pull request.

For any questions or concerns, please reach out to me at louieyin6@gmail.com

Author

Created by Louie Yin (GarfieldFluffJr)

About

Terence is a Python package that significantly simplifies the GitHub API to allow for the retrieval of file contents of public GitHub repositories

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages