feat: Automate llms.txt Generation via GitHub Action #86

Open: canburaks wants to merge 2 commits into main

Summary

This PR introduces an automated process using GitHub Actions to generate and maintain llms.txt and llms-full.txt based on the SDK's Markdown documentation.

This follows our earlier email discussion of this feature. I evaluated the suggestion to reuse the existing Speakeasy generation process and found that it would likely add more complexity and maintenance overhead than a dedicated script.

Description

llms.txt and llms-full.txt are regenerated from the project's documentation whenever it changes. The files stay accurate and up to date with minimal manual intervention, which improves the project's discoverability and usability for AI agents and language models.

Changes Introduced

  1. Python Generation Script (.github/scripts/generate_llms_files.py):

    • Parses project metadata (name, description) from pyproject.toml.
    • Recursively scans a specified directory (e.g., docs/sdks) for Markdown (.md) files.
    • Generates llms.txt:
      • Includes project name (H1) and description (blockquote).
      • Adds a dynamically named section (e.g., ## Sdks Files) based on the source directory.
      • Lists links to discovered .md files.
      • Includes an optional link to llms-full.txt.
      • Generates absolute raw GitHub URLs when --base-url is provided, and relative links otherwise.
    • Generates llms-full.txt:
      • Includes the base content from llms.txt.
      • Appends the full content of each discovered .md file under a ## Content: <filepath> heading.
      • Adjusts heading levels within the appended content (H1 -> H3, H2 -> H4, etc.) to maintain correct Markdown hierarchy.
  2. GitHub Actions Workflow (.github/workflows/generate-llms-txt.yml):

    • Triggers: Automatically runs on pushes to main affecting docs/sdks/**, pyproject.toml, the script, or the workflow itself. Also supports manual triggers (workflow_dispatch).
    • Execution:
      • Sets up Python 3.10 and installs the toml dependency.
      • Constructs the correct base URL for raw GitHub file content (using github.ref to include refs/heads/...).
      • Executes the Python generation script, passing the source directory (docs/sdks) and the constructed base URL.
    • Auto-Commit: Uses stefanzweifel/git-auto-commit-action to commit changes to llms.txt and llms-full.txt only, directly to the branch, as the github-actions[bot] user.
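
For reviewers, the heading adjustment used for llms-full.txt (H1 -> H3, H2 -> H4, and so on) can be sketched roughly as below. This is a minimal illustration, not the code in generate_llms_files.py: the function names `shift_headings` and `append_doc`, the cap at H6, and the skipping of fenced code blocks are all assumptions made for the example.

```python
import re

# ATX heading: 1-6 '#' characters followed by whitespace and the title.
_HEADING = re.compile(r"^(#{1,6})(\s+.*)$")
# Opening or closing code fence (three backticks or three tildes).
_FENCE = re.compile(r"^\s*(`{3}|~{3})")


def shift_headings(markdown: str, levels: int = 2) -> str:
    """Demote ATX headings by `levels` (capped at H6), leaving fenced code alone."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if _FENCE.match(line):
            # Toggle on every fence line; nested fences are not handled (assumption).
            in_fence = not in_fence
        elif not in_fence:
            match = _HEADING.match(line)
            if match:
                depth = min(len(match.group(1)) + levels, 6)
                line = "#" * depth + match.group(2)
        out.append(line)
    return "\n".join(out)


def append_doc(base: str, filepath: str, content: str) -> str:
    """Append one document under a '## Content: <filepath>' heading."""
    return f"{base.rstrip()}\n\n## Content: {filepath}\n\n{shift_headings(content)}\n"
```

Shifting by two levels keeps every appended heading below its `## Content: <filepath>` parent, so the combined file still has a single H1.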

Benefits

  • Automation: Eliminates the need to manually update llms.txt files when documentation changes.
  • Accuracy: Ensures generated files always reflect the current state of the documentation and project metadata.
  • Consistency: Guarantees adherence to the llms.txt specification and correct link/heading formatting.
  • Maintainability: Centralizes generation logic in a dedicated script.
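
To make the layout behind the consistency point concrete, here is a rough sketch of how the llms.txt index could be assembled: H1 project name, blockquote description, one section named after the source directory, and a link per Markdown file. It is an illustration under stated assumptions, not the script in this PR: the function name `build_llms_txt`, its parameters, and the choice of link text (the path under the source directory without `.md`) are hypothetical.

```python
from pathlib import Path
from typing import Optional


def build_llms_txt(
    name: str,
    description: str,
    repo_root: Path,
    docs_dir: str,
    base_url: Optional[str] = None,
) -> str:
    """Return llms.txt text: H1 name, blockquote description, one link list."""
    source = repo_root / docs_dir
    # Section title derived from the directory name, e.g. 'sdks' -> 'Sdks Files'.
    lines = [
        f"# {name}",
        "",
        f"> {description}",
        "",
        f"## {source.name.capitalize()} Files",
        "",
    ]
    for path in sorted(source.rglob("*.md")):
        rel = path.relative_to(repo_root).as_posix()
        # Absolute raw URL when a base is given, relative link otherwise.
        url = f"{base_url.rstrip('/')}/{rel}" if base_url else rel
        # Link text is an assumption: the path under docs_dir without '.md'.
        label = path.relative_to(source).with_suffix("").as_posix()
        lines.append(f"- [{label}]({url})")
    return "\n".join(lines) + "\n"
```

Calling it with `base_url=None` yields repository-relative links, which is convenient for checking the output locally before the workflow supplies the raw GitHub base URL.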

How to Verify

  • Pushing changes to files within docs/sdks/ on the main branch should trigger the action and result in an automatic commit updating llms.txt and llms-full.txt if necessary.
  • The workflow can also be triggered manually from the Actions tab on GitHub (workflow_dispatch). To have pushes to another branch trigger it, add that branch name under on -> push in .github/workflows/generate-llms-txt.yml.
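
When checking the generated links by hand, it can help to reproduce the raw base URL that the workflow builds from github.ref. The sketch below is a Python equivalent written for illustration only; the workflow in this PR may build the URL differently. `GITHUB_REPOSITORY` and `GITHUB_REF` are default environment variables on GitHub Actions runners, while the function names and the fallback values are assumptions.

```python
import os
from typing import Mapping


def raw_base_url(repository: str, ref: str) -> str:
    """Build a raw.githubusercontent.com base URL for 'owner/repo' at a full ref."""
    return f"https://raw.githubusercontent.com/{repository}/{ref.strip('/')}"


def raw_base_url_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Read the repository and ref the way a workflow step could (fallbacks are placeholders)."""
    repository = env.get("GITHUB_REPOSITORY", "owner/repo")
    ref = env.get("GITHUB_REF", "refs/heads/main")
    return raw_base_url(repository, ref)
```

The result can be passed to the script as --base-url together with the source directory to compare local output against the auto-committed files.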

Happy to make any necessary changes based on feedback!

canburaks and others added 2 commits April 29, 2025 04:54
commit 76269d5191da3e12050d8bc60e485a5f470d50d8
Author: Can Burak Sofyalioglu <[email protected]>
Date:   Tue Apr 29 04:53:57 2025 +0300

    trigger path was updated

commit 528522de7ab4d2fdcaee34e6e4cc77a891b3b86e
Author: Can Burak Sofyalioglu <[email protected]>
Date:   Tue Apr 29 04:52:28 2025 +0300

    Squashed commit of the following:

    commit dc4fa10360137823a39d89d90edc8a0f1a57eba6
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Tue Apr 29 04:51:43 2025 +0300

        feat/llms-txt

    commit a322742
    Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Date:   Tue Apr 29 01:11:21 2025 +0000

        chore: Auto-generate llms.txt and llms-full.txt

    commit d26a7c7
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Tue Apr 29 04:11:05 2025 +0300

        Markdown link text shows the SDK module name

    commit 67894d7
    Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Date:   Tue Apr 29 01:03:16 2025 +0000

        chore: Auto-generate llms.txt and llms-full.txt

    commit c7d7b60
    Merge: 2d1cc27 1821db3
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Tue Apr 29 04:02:59 2025 +0300

        Merge branch 'llms-txt' of https://github.com/canburaks/polar-python into llms-txt

    commit 2d1cc27
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Tue Apr 29 04:02:45 2025 +0300

        correct repo name variable was set

    commit 1821db3
    Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Date:   Tue Apr 29 00:51:29 2025 +0000

        chore: Auto-generate llms.txt and llms-full.txt

    commit d39e877
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Tue Apr 29 03:51:12 2025 +0300

        broken links was fixed

    commit 9f14d27
    Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Date:   Mon Apr 28 10:12:42 2025 +0000

        chore: Auto-generate llms.txt and llms-full.txt

    commit 9ec074f
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Mon Apr 28 13:11:27 2025 +0300

        raw file url link was fixed

    commit 473b1f5
    Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
    Date:   Mon Apr 28 09:52:34 2025 +0000

        chore: Auto-generate llms.txt and llms-full.txt

    commit 287f492
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Mon Apr 28 12:51:20 2025 +0300

        raw url usage for llms-full.txt

        The llms-full.txt file will use raw GitHub URLs rather than relative URLs.

    commit 5fb33ee
    Author: Can Burak Sofyalioglu <[email protected]>
    Date:   Mon Apr 28 12:21:14 2025 +0300

        Create llms.txt

        llms.txt and llms-full.txt files have been generated.