SEO-and-paper-update #23

Merged
merged 12 commits into from
Mar 18, 2025

Conversation

VatsalSy
Member

Description

This pull request includes the following changes:

  1. Generate SEO metadata from search database:

    • Adds a new script scripts/generate_seo_tags.rb that generates SEO metadata (keywords and descriptions) for the website's pages based on the search database.
    • Loads the search database from assets/js/search_db.json.
    • Extracts keywords from the page title, type, and tags.
    • Generates descriptions from the page content.
    • Updates the HTML files with the generated metadata (keywords and description).
    • Handles cases where no good description is available by generating one from the full content.
    • Normalizes URLs to ensure consistent metadata generation.
    • This feature will improve the website's SEO by providing relevant metadata for search engines.
  2. Add build and development instructions, repository guidelines, and coding conventions:

    • Adds detailed instructions for building and developing the project.
    • Adds guidelines for maintaining the repository, including updating the README, using the provided templates and CSS files, and keeping the documentation up-to-date.
    • Adds coding conventions for HTML/Markdown, CSS, and JavaScript, including indentation, DRY principles, commenting, and support for light and dark themes.
  3. Add GitHub Actions workflow to rebuild site on search database update:

    • Adds a new GitHub Actions workflow that automatically rebuilds the site when the search database file (assets/js/search_db.json) is updated.
    • The workflow is triggered on a push to the main branch, specifically when the search database file is modified.
    • It checks out the repository, sets up the Ruby environment, builds the site and search database, and then deploys the updated site to GitHub Pages.
  4. Add new paper on drop impact forces and scaling:

    • Adds a new paper by Sanjay, V. and Lohse, D. on the unifying theory of scaling in drop impact, including the forces and maximum spreading diameter.
    • The paper was published in Physical Review Letters in 2025 and received positive reviews.
    • The changes include adding the paper details, tags, and associated badges to the research page.

Related Issues

None.

Checklist

  • I have tested the changes locally.
  • I have updated the documentation (if applicable).
  • I have added necessary tests (if applicable).

This commit adds a new script `scripts/generate_seo_tags.rb` that generates SEO metadata (keywords and descriptions) for the website's pages based on the search database.

The key changes are:

- Loads the search database from `assets/js/search_db.json`
- Extracts keywords from the page title, type, and tags
- Generates descriptions from the page content
- Updates the HTML files with the generated metadata (keywords and description)
- Handles cases where no good description is available by generating one from the full content
- Normalizes URLs to ensure consistent metadata generation

This feature will improve the website's SEO by providing relevant metadata for search engines.
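The keyword step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/generate_seo_tags.rb`: the entry field names (`title`, `type`, `tags`) are assumed from the description, and the stop-word list and the `extract_keywords` helper are hypothetical.

```ruby
require 'set'

# Hypothetical stop-word list; the real script may filter differently.
STOP_WORDS = Set.new(%w[a an and the of in on for to with]).freeze

# Collect up to `limit` unique, lowercase keywords from an entry's
# title, type, and tags (field names assumed from the PR description).
def extract_keywords(entry, limit: 8)
  title_words = entry.fetch('title', '').downcase.scan(/[a-z0-9][a-z0-9-]*/)
  title_keywords = title_words.reject { |w| STOP_WORDS.include?(w) || w.length < 3 }
  type_keywords = [entry['type']].compact.map(&:downcase)
  tag_keywords = Array(entry['tags']).map(&:downcase)
  (title_keywords + type_keywords + tag_keywords).uniq.take(limit)
end

entry = {
  'title' => 'Drop impact forces and scaling',
  'type' => 'paper',
  'tags' => ['Drops', 'Impact forces']
}
puts extract_keywords(entry).join(', ')
```

Keeping the limit as a keyword argument makes it easy to experiment with a different cap without touching the extraction logic.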
…and coding conventions

This commit adds detailed instructions for building and developing the project, as well as guidelines for maintaining the repository and coding conventions to follow. The changes include:

- Added a new section on "Build and Development Commands" with instructions for installing dependencies, building the site and search database, running the local server, fetching blog content, and generating the search database.
- Added a new section on "Repository Guidelines" with instructions for updating the README, using the provided templates and CSS files, and keeping the documentation up-to-date.
- Added a new section on "General" coding conventions, including indentation, DRY principles, commenting, and support for light and dark themes.
- Added a new section on "HTML/Markdown" conventions, including the use of semantic HTML elements, BEM naming, and keeping content in Markdown format.
- Added a new section on "CSS" conventions, including the use of CSS variables, responsive breakpoints, units, mobile-first approach, and dark theme implementation.
- Added a new section on "JavaScript" conventions, including the use of ES6+ features and following best practices.

These changes aim to provide clear and comprehensive guidelines for contributors to follow when working on the project, ensuring consistency and maintainability.
rebuild site on search database update

This commit adds a new GitHub Actions workflow that automatically rebuilds
the site when the search database file (`assets/js/search_db.json`) is
updated. This ensures that the search functionality on the site is always
up-to-date with the latest content.

The workflow is triggered on a push to the `main` branch, specifically when
the search database file is modified. It checks out the repository, sets up
the Ruby environment, builds the site and search database, and then deploys
the updated site to GitHub Pages.
The changes in this commit add a new paper by Sanjay, V. and Lohse, D. on the unifying theory of scaling in drop impact, including the forces and maximum spreading diameter. This paper was published in Physical Review Letters in 2025 and received positive reviews. The changes include adding the paper details, tags, and associated badges to the research page.
Contributor

coderabbitai bot commented Mar 18, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • The News and Research sections now highlight the latest publication with updated citation details and refined terminology.
  • SEO & Presentation Enhancements
    • Improved metadata and structured information across key pages boost content discoverability and search accuracy.
    • Refined visual styling ensures a consistent and smooth browsing experience.
  • Search Updates
    • Enhanced search content now delivers more accurate and timely results.

Walkthrough

This pull request introduces a new GitHub Actions workflow to rebuild the site upon changes to the search database and modifies an existing workflow to update its schedule and functionality. Various layout templates and content files are enhanced with SEO meta tags, structured data, and font declarations. A new research entry is added to the News.md and _research/index.md, while an outdated entry is removed. The search database is expanded with additional details, a new robots.txt file is created, and the build process is enhanced with a Ruby script for generating SEO metadata.

Changes

| File(s) | Change Summary |
|---|---|
| .github/workflows/rebuild-on-search-update.yml, .github/workflows/update-search.yml | Added a new rebuild workflow triggered by changes in assets/js/search_db.json and modified the scheduled workflow (cron adjusted to 4:00 UTC) with an added commit/push step. |
| News.md, _research/index.md | Updated content: A new research publication entry was added with detailed citation info, and an old course/news entry was removed; tags were also adjusted. |
| _layouts/default.html, _layouts/research.html, _layouts/teaching-course.html, _layouts/teaching.html, _layouts/team.html | Enhanced SEO across layout templates with new meta tags (description, author, robots, keywords), JSON-LD structured data scripts, and updated font-family declarations. |
| assets/js/search_db.json | Expanded with sections covering build/development commands, repository guidelines, coding conventions, and additional news details. |
| robots.txt | New file added specifying crawler directives with open access and a sitemap reference. |
| scripts/build.sh, scripts/generate_seo_tags.rb | Extended the build script to invoke SEO metadata generation; a new Ruby script was introduced to generate and update SEO tags within HTML files and adjust the sitemap. |
| scripts/generate_search_db.rb | Improved logic for skipping README.md files during processing. |
| _config.yml | Added an exclude section to specify directories and files to be excluded from processing. |

Sequence Diagram(s)

sequenceDiagram
    participant Developer
    participant GitHubActions as "GitHub Actions"
    participant BuildJob as "Build Job"
    participant DeployJob as "Deploy Job"
    
    Developer->>GitHubActions: Push to main (change in search_db.json)
    GitHubActions->>BuildJob: Trigger build workflow
    BuildJob->>BuildJob: Checkout code, setup Ruby, run build.sh (includes SEO generation)
    BuildJob->>GitHubActions: Upload build artifact
    GitHubActions->>DeployJob: Trigger deploy workflow (after build success)
    DeployJob->>DeployJob: Deploy to GitHub Pages
sequenceDiagram
    participant BuildScript as "build.sh"
    participant SEOGenerator as "generate_seo_tags.rb"
    participant HTML as "HTML Files"
    participant Sitemap as "sitemap.xml"
    
    BuildScript->>SEOGenerator: Invoke SEO metadata generation
    SEOGenerator->>HTML: Process URLs and update SEO metadata
    SEOGenerator->>Sitemap: Update sitemapindex.xml if sitemap exists
    SEOGenerator->>BuildScript: Return completion status

Poem

I'm a bunny with ears that perk high,
Hopping through commits as time goes by.
New workflows and SEO tags set the pace,
Research and news updated with grace.
I nibble on code, cheerful and spry—
A rabbit's rejoice as our project takes flight! 🐇✨


@VatsalSy VatsalSy self-assigned this Mar 18, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (15)
News.md (1)

5-15: Enhance SEO Research Entry Structure
The new research entry for the publication by Sanjay, V. & Lohse, D. is well-formatted and clearly highlights the paper’s details, including the image with appropriate alt text and the DOI link. One suggestion is to consider moving repeated inline style definitions (e.g. those on the <div> and <a> tags) into a shared CSS class for improved maintainability and consistency.

_research/index.md (1)

59-70: New Research Entry Addition & Tag Consistency
The new research entry under the 2025 section is clearly presented and follows the format of existing entries. The title, citation, and DOI link are all properly displayed, and the inclusion of the image (with centered styling) adds visual appeal.

One minor point to address is a probable typo in the tags: the term "Dissipative anamoly" appears instead of "Dissipative anomaly". Correcting this will help maintain consistency across all research entries. For example, consider applying the following diff:

-<tags><span>Drops</span><span>Dissipative anamoly</span><span>Superamphiphobic-surfaces</span><span>Impact forces</span><span>Featured</span></tags>
+<tags><span>Drops</span><span>Dissipative anomaly</span><span>Superamphiphobic-surfaces</span><span>Impact forces</span><span>Featured</span></tags>

Additionally, verify that the Dropbox link for the parameter space image remains reliably accessible for public usage.

robots.txt (1)

1-7: Review of Robots.txt File

• The directives correctly allow all user agents and point to the sitemap.
• The empty "Disallow:" is interpreted as “no restrictions” which is acceptable.
Nitpick: There is an extra line at the end (line 7 displaying “7”) that appears unintended. If it is not purposeful, please remove it.

scripts/build.sh (1)

38-41: SEO Metadata Generation Step

• The new block that echoes "Generating SEO metadata..." and then invokes the Ruby script (scripts/generate_seo_tags.rb) is clear and well placed in the build process.
• Ensure that the Ruby script is executable and that any necessary dependencies or environment settings are in place.
• It might be helpful to add a brief inline comment for maintainers about the expected outcome or failure behavior of this step.

_layouts/teaching.html (1)

64-78: Structured Data for Teaching

• The JSON-LD snippet for a "Course" clearly outlines information about the educational offerings. This structured data will aid search engines in understanding your course content.
• As a good-to-have improvement, consider adding additional properties (e.g., “courseCode” or “provider” details) if available.

_layouts/default.html (1)

61-72: Organization Structured Data Added

• The JSON-LD structured data block provides search engines with key information about the organization.
• Consider populating the "sameAs" array with links to your verified social media profiles to further improve your rich snippet eligibility.

_layouts/team.html (1)

36-41: SEO Meta Tags Addition
The new SEO meta tags (description, author, robots, keywords) are a welcome enhancement for improving search engine discoverability. Ensure that the static content is reviewed periodically or, if applicable, later converted to dynamic values to better reflect page-specific content.

.github/workflows/update-search.yml (1)

67-67: Remove Trailing Spaces
Line 67 contains trailing spaces. Removing these will keep the YAML file clean and compliant with linting tools.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 67-67: trailing spaces

(trailing-spaces)

.github/workflows/rebuild-on-search-update.yml (3)

18-19: YAML Indentation and Trailing Spaces
The static analysis indicates an indentation issue at line 18 (expected 6 spaces but found 4) along with trailing spaces on line 19. Please adjust the indentation and remove the extra spaces to conform with YAML linting standards.

🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 18-18: wrong indentation: expected 6 but found 4

(indentation)


[error] 19-19: trailing spaces

(trailing-spaces)


25-25: Clean Up Trailing Spaces
There are trailing spaces on line 25. Removing these will improve readability and reduce potential linting warnings.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 25-25: trailing spaces

(trailing-spaces)


30-30: Remove Trailing Spaces on Line 30
Please remove any trailing spaces on line 30 for consistency with YAML formatting best practices.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 30-30: trailing spaces

(trailing-spaces)

assets/js/search_db.json (1)

739-757: Consider removing script code from content field.

The registration section contains embedded JavaScript code for copying email addresses. While the functionality is useful, embedding scripts directly in content can lead to security concerns and maintenance challenges.

-    "content": "For registration details, please contact \n<div class=\"email-container\">\n    <span class=\"email-text\">[email protected]</span>\n    <button class=\"copy-btn\" onclick=\"copyEmail(this)\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n        <i class=\"fas fa-copy\"></i>\n    </button>\n</div>\n<div class=\"email-container\">\n    <span class=\"email-text\">[email protected]</span>\n    <button class=\"copy-btn\" onclick=\"copyEmail(this)\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n        <i class=\"fas fa-copy\"></i>\n    </button>\n</div>\n\n<script>\nfunction copyEmail(button) {\n  const textToCopy = button.getAttribute('data-text');\n  \n  // Create a temporary textarea element to copy from\n  const textarea = document.createElement('textarea');\n  textarea.value = textToCopy;\n  textarea.setAttribute('readonly', '');\n  textarea.style.position = 'absolute';\n  textarea.style.left = '-9999px';\n  document.body.appendChild(textarea);\n  \n  // Select and copy the text\n  textarea.select();\n  document.execCommand('copy');\n  \n  // Remove the temporary element\n  document.body.removeChild(textarea);\n  \n  // Show feedback\n  const originalIcon = button.innerHTML;\n  button.innerHTML = '<i class=\"fas fa-check\"></i>';\n  button.classList.add('copied');\n  \n  // Restore original state after a delay\n  setTimeout(() => {\n    button.innerHTML = originalIcon;\n    button.classList.remove('copied');\n  }, 2000);\n}\n</script>\n\n<div style=\"margin-top: 2rem; text-align: center;\">\n  <a href=\"https://github.com/comphy-lab/Basilisk-101\" class=\"course-card__link\" target=\"_blank\" aria-label=\"Course GitHub Repository\">\n    <i class=\"fa-brands fa-github\" style=\"margin-right: 0.5rem; font-style: normal;\"></i>Course GitHub Repository\n  </a>\n</div>",
+    "content": "For registration details, please contact \n<div class=\"email-container\">\n    <span class=\"email-text\">[email protected]</span>\n    <button class=\"copy-btn\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n        <i class=\"fas fa-copy\"></i>\n    </button>\n</div>\n<div class=\"email-container\">\n    <span class=\"email-text\">[email protected]</span>\n    <button class=\"copy-btn\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n        <i class=\"fas fa-copy\"></i>\n    </button>\n</div>\n\n<div style=\"margin-top: 2rem; text-align: center;\">\n  <a href=\"https://github.com/comphy-lab/Basilisk-101\" class=\"course-card__link\" target=\"_blank\" aria-label=\"Course GitHub Repository\">\n    <i class=\"fa-brands fa-github\" style=\"margin-right: 0.5rem; font-style: normal;\"></i>Course GitHub Repository\n  </a>\n</div>",

Instead, consider moving the JavaScript to an external file and adding appropriate data attributes to the buttons that the external script can target.

scripts/generate_seo_tags.rb (3)

89-93: Consider increasing the keyword limit

Limiting to only 8 keywords might be too restrictive for content-rich pages. Search engines can typically handle more keywords effectively.

- all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(8)
+ all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(15)

95-100: Improve description generation logic

The current approach of using only the first sentence might not always produce the most relevant description, especially if the first sentence is too short or not representative of the content.

- description = content.gsub(/\s+/, ' ').strip.split(/[.!?]/).first
+ # Get the first 2-3 sentences and trim to 160 chars
+ sentences = content.gsub(/\s+/, ' ').strip.split(/[.!?]/).take(3)
+ description = sentences.join('. ')[0..159].strip
  
- # Only use if description is between 50 and 160 characters
- if description && description.length >= 50 && description.length <= 160
+ # Only use if description is at least 50 characters
+ if description && description.length >= 50
    descriptions_by_url[normalized_url].add(description)
  end
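For reference, the idea behind this suggestion can be written as a standalone helper. This is a sketch under assumptions, not the script's code: `build_description` and its parameters are hypothetical names, and unlike the diff above it splits after the punctuation so sentence endings are preserved, and it truncates at a word boundary.

```ruby
# Build a meta description from page content: take the first few sentences,
# cap the result at `max_len` characters, and return nil when it is too short.
def build_description(content, max_sentences: 3, min_len: 50, max_len: 160)
  text = content.to_s.gsub(/\s+/, ' ').strip
  # Split after sentence-ending punctuation so the punctuation is kept.
  sentences = text.split(/(?<=[.!?])\s+/).take(max_sentences)
  description = sentences.join(' ')
  if description.length > max_len
    # Leave room for the ellipsis, drop the trailing (possibly partial) word.
    description = description[0, max_len - 3].sub(/\s+\S*\z/, '') + '...'
  end
  description.length >= min_len ? description : nil
end
```

Returning nil for short results lets the caller fall back to another source, such as the full page content, which matches the fallback behaviour mentioned in the PR description.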

209-217: Make the site URL configurable

The hardcoded URL "https://comphy-lab.org/sitemap.xml" should be made configurable to support different deployment environments.

+ # Read site URL from config if available
+ site_url = 'https://comphy-lab.org'
+ config_file = File.join(ROOT_DIR, '_config.yml')
+ if File.exist?(config_file)
+   config = YAML.load_file(config_file)
+   site_url = config['url'] if config['url'] && !config['url'].empty?
+ end

  # Create sitemapindex content
  sitemapindex_content = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
-   <loc>https://comphy-lab.org/sitemap.xml</loc>
+   <loc>#{site_url}/sitemap.xml</loc>
    <lastmod>#{last_mod}</lastmod>
  </sitemap>
</sitemapindex>
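
A self-contained version of the same idea might look like the sketch below. The helper names `site_url_from` and `sitemapindex_xml` are hypothetical, and it assumes `_config.yml` is a plain Jekyll-style YAML mapping with an optional `url` key.

```ruby
require 'yaml'
require 'date'

DEFAULT_SITE_URL = 'https://comphy-lab.org'

# Return the `url` value from a Jekyll-style _config.yml, falling back to
# the default when the file or the key is missing or empty.
def site_url_from(config_path, default: DEFAULT_SITE_URL)
  return default unless File.exist?(config_path)

  config = YAML.safe_load(File.read(config_path), permitted_classes: [Date, Time]) || {}
  url = config.is_a?(Hash) ? config['url'].to_s.strip : ''
  url.empty? ? default : url.chomp('/')
end

# Render the sitemap index for a given site URL and last-modified date.
def sitemapindex_xml(site_url, last_mod)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>#{site_url}/sitemap.xml</loc>
        <lastmod>#{last_mod}</lastmod>
      </sitemap>
    </sitemapindex>
  XML
end
```

Separating the lookup from the XML rendering keeps the hardcoded default in one place and makes both pieces easy to test in isolation.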
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa41720 and 477e2da.

⛔ Files ignored due to path filters (1)
  • assets/images/research/drop-impact-prl.png is excluded by !**/*.png
📒 Files selected for processing (13)
  • .github/workflows/rebuild-on-search-update.yml (1 hunks)
  • .github/workflows/update-search.yml (2 hunks)
  • News.md (1 hunks)
  • _layouts/default.html (2 hunks)
  • _layouts/research.html (2 hunks)
  • _layouts/teaching-course.html (2 hunks)
  • _layouts/teaching.html (2 hunks)
  • _layouts/team.html (2 hunks)
  • _research/index.md (2 hunks)
  • assets/js/search_db.json (2 hunks)
  • robots.txt (1 hunks)
  • scripts/build.sh (1 hunks)
  • scripts/generate_seo_tags.rb (1 hunks)
🧰 Additional context used
🪛 RuboCop (1.73)
scripts/generate_seo_tags.rb

[warning] 1-1: Script file generate_seo_tags.rb doesn't have execute permission.

(Lint/ScriptPermission)


[convention] 123-176: Assignment Branch Condition size for update_html_with_metadata is too high. [<10, 32, 12> 35.61/23]

(Metrics/AbcSize)


[convention] 123-176: Cyclomatic complexity for update_html_with_metadata is too high. [10/7]

(Metrics/CyclomaticComplexity)


[convention] 123-176: Perceived complexity for update_html_with_metadata is too high. [12/8]

(Metrics/PerceivedComplexity)
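
One common way to bring these metrics down is to move the per-tag work into a small helper so the main method only orchestrates. The sketch below is illustrative only: the body of update_html_with_metadata is not shown in this review, so both the `upsert_meta_tag` helper and the simplified signature are assumptions.

```ruby
require 'cgi'

# Insert or replace <meta name="..."> in an HTML string.
# Hypothetical helper, not taken from generate_seo_tags.rb.
def upsert_meta_tag(html, name, content)
  tag = %(<meta name="#{name}" content="#{CGI.escapeHTML(content)}">)
  existing = /<meta\s+name=["']#{Regexp.escape(name)}["'][^>]*>/i
  if html.match?(existing)
    # Block form avoids backslash interpretation in the replacement string.
    html.sub(existing) { tag }
  elsif html.match?(%r{</head>}i)
    html.sub(%r{</head>}i) { "  #{tag}\n</head>" }
  else
    html
  end
end

# Simplified orchestration: one short call per kind of metadata.
def update_html_with_metadata(html, keywords:, description:)
  html = upsert_meta_tag(html, 'keywords', keywords.join(', ')) unless keywords.empty?
  html = upsert_meta_tag(html, 'description', description) if description
  html
end
```

With the branching isolated in the helper, the orchestrating method has only two conditionals, which should sit well under the reported complexity thresholds.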

🔇 Additional comments (27)
News.md (1)

16-16: Clean Removal of Outdated Announcement
The removal of the hybrid online+offline course announcement is appropriate given that it no longer aligns with current content priorities. This cleanup helps maintain the relevance of the news section.

_layouts/research.html (3)

109-114: SEO Meta Tags Added

• The inclusion of meta tags for description, author, robots, and keywords will help improve search engine visibility.
• Please verify that the keyword list remains representative as the content evolves.


120-122: New Font-Face Declaration

• The addition of the system font fallback (declaring 'System Font' via a local lookup of 'system-ui') helps improve performance and reliability.
• Confirm that this change aligns with your overall design guidelines.


136-145: Structured Data for Research

• The JSON-LD structured data providing context for the research project is well implemented and should benefit rich search results.
• Ensure that the “ResearchProject” type is the most appropriate schema type for your content and consider adding properties if needed in the future.

_layouts/teaching.html (2)

37-41: SEO Meta Tags Updated

• The new meta tags (description, author, robots, and keywords) clearly describe the teaching page and are consistent with the site’s SEO strategy.
• Double-check the content for consistency with overall branding and update as necessary.


47-48: System Font Declaration

• The updated @font-face rule now includes a declaration for 'System Font' with a local fallback, which should enhance text-rendering performance.
• Verify that this change meets the visual design requirements.

_layouts/default.html (2)

36-40: Enhanced SEO Meta Tags

• The meta tags now dynamically use page-specific descriptions or fall back to the site-wide defaults.
• The conditional inclusion of keywords (when page tags exist) is a smart move. Please double-check that the Liquid logic works as intended for all pages.


73-83: Conditional Research Project Structured Data

• The conditional inclusion of a second JSON-LD block for pages with a research layout is well handled.
• Verify that non-research pages do not mistakenly include these tags and that research pages render them correctly.

_layouts/team.html (2)

44-50: Font-Face Declaration Update
The updated @font-face rule now specifies 'System Font' with a local source (local('system-ui')). This should help achieve consistent font rendering and potentially improve performance by leveraging system fonts.


64-73: Structured Data for Team Section
The inclusion of JSON‑LD structured data defining the organization as a "ResearchOrganization" is a strong move for SEO and rich results. Verify that the URL and textual content (name and description) remain consistent with the organization’s branding.

.github/workflows/update-search.yml (2)

5-6: Cron Schedule Adjustment
The cron schedule has been updated to run daily at 4:00 UTC, which is appropriate if a daily update is desired. Please confirm that this frequency aligns with all downstream processes and performance requirements.


68-74: Commit and Push if Changed Step
The new step that configures Git, stages changes to assets/js/search_db.json, and commits & pushes them conditionally (based on git diff) is a solid enhancement. Verify in testing that minor changes (or changes caused by formatting) do not inadvertently trigger unnecessary commits.

🧰 Tools
🪛 actionlint (1.7.4)

69-69: shellcheck reported issue in this script: SC2015:info:4:18: Note that A && B || C is not if-then-else. C may run when A is true

(shellcheck)

.github/workflows/rebuild-on-search-update.yml (1)

1-13: New Rebuild Workflow Overview
The new workflow for rebuilding the site when assets/js/search_db.json is updated is well structured. The permissions, job steps, and environment variables are defined clearly to support a robust build & deploy process.

_layouts/teaching-course.html (3)

37-42: Dynamic SEO Meta Tags for Course Page
The meta tag block now conditionally renders page-specific values (for description and keywords) with fallback defaults. This is an excellent improvement that should boost SEO by ensuring each course page is optimally described.


47-49: Updated Font-Face Rule
The adjustment to the @font-face rule—using font-display: swap with a local source 'system-ui'—is a smart move to enhance page rendering speed and user experience on different platforms.


64-78: Structured Data for Course
Adding JSON‑LD structured data for the course helps search engines better understand and index the page content. Please ensure that the conditional parts (course name, description) render correctly in all scenarios.

assets/js/search_db.json (9)

310-407: Good addition of comprehensive development documentation.

The new sections on build commands, repository guidelines, and coding conventions provide essential guidance for contributors. This documentation clarifies the development workflow and establishes coding standards that will ensure consistency across the codebase.


408-421: Excellent research paper update with visual elements.

The addition of the March section with information about Sanjay and Lohse's paper includes an appropriate image and well-structured links to the full paper. This enhances the visibility of recent research outputs.


807-896: Utility pages and redirects appropriately implemented.

The 404 page, contact redirects, and other utility pages have been properly added to the search database, ensuring users can find these pages through search functionality.


997-1288: Well-structured research paper entries with comprehensive tagging.

The research paper entries follow a consistent format with proper titles, empty content (likely populated elsewhere), specific URLs, and relevant tags. The tagging system will facilitate effective content filtering and discovery.


1037-1049: Excellent use of tags for the featured research paper.

This paper entry includes the "Featured" tag along with subject-specific tags, which will help highlight important research while maintaining proper categorization. The organization of tags from specific to general is logical.

"tags": [
  "Drops",
  "Dissipative anamoly",
  "Superamphiphobic-surfaces",
  "Impact forces",
  "Featured"
]

597-803: Comprehensive course information structure with appropriate hierarchical organization.

The teaching content for "High-Fidelity Simulations Using Basilisk C" is exceptionally well-organized with clear titles, content sections, and proper URL paths. The hierarchical structure with headings and sub-sections will provide excellent navigation for students.


2764-3613: Well-organized blog content with clear excerpts and metadata.

The blog content entries feature well-structured excerpts and consistent metadata. The content is academically focused and relevant to the site's audience. The proper use of formatting and special characters in mathematical formulas is particularly good.


3614-5613: Thorough documentation of lecture notes and teaching materials.

The extensive collection of lecture notes, code documentation, and educational content provides valuable resources for students and researchers. The content is properly formatted with appropriate headings, code blocks, and mathematical notation.


1638-2763: Comprehensive team member information with consistent formatting.

The team member entries follow a consistent structure, providing clear information about each person's role, academic background, and research interests. The uniform formatting will enhance the user experience when browsing team information.

scripts/generate_seo_tags.rb (2)

31-48: LGTM: Well-structured URL normalization function

The normalize_url function handles various URL cases effectively, ensuring consistent handling of URLs throughout the script.


1-233: Overall well-structured SEO metadata generation script

The script effectively generates SEO metadata from the search database and updates HTML files accordingly. It handles URL normalization, keyword extraction, and description generation in a systematic way. The addition of sitemapindex.xml generation is a good touch for better SEO.

With the suggested improvements implemented, this script will be a valuable addition to your build process, improving the site's SEO significantly.

🧰 Tools
🪛 RuboCop (1.73)

[warning] 1-1: Script file generate_seo_tags.rb doesn't have execute permission.

(Lint/ScriptPermission)


[convention] 123-176: Assignment Branch Condition size for update_html_with_metadata is too high. [<10, 32, 12> 35.61/23]

(Metrics/AbcSize)


[convention] 123-176: Cyclomatic complexity for update_html_with_metadata is too high. [10/7]

(Metrics/CyclomaticComplexity)


[convention] 123-176: Perceived complexity for update_html_with_metadata is too high. [12/8]

(Metrics/PerceivedComplexity)

Docstrings generation was requested by @VatsalSy.

* #23 (comment)

The following files were modified:

* `scripts/generate_seo_tags.rb`

coderabbitai bot commented Mar 18, 2025

Note

Generated docstrings for this pull request at #24

📝 Add docstrings to `SEO-and-paper-update`

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/generate_seo_tags.rb (1)

170-170: ⚠️ Potential issue

Properly escape HTML entities in meta tag content

When building meta tags by interpolating strings into the HTML passed to add_child, escape the content first. Unescaped quotes or angle brackets in a keyword or description can break the page structure or allow HTML injection.

- head.add_child("<meta name=\"keywords\" content=\"#{keywords_str}\">")
+ head.add_child("<meta name=\"keywords\" content=\"#{CGI.escape_html(keywords_str)}\">")

- head.add_child("<meta name=\"description\" content=\"#{desc_str}\">")
+ head.add_child("<meta name=\"description\" content=\"#{CGI.escape_html(desc_str)}\">")

Also applies to: 185-185
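
As a minimal sketch of the escaping suggested above, the snippet below builds a meta tag string with its attribute values passed through CGI.escape_html. The helper name meta_tag is hypothetical and not part of the script; only CGI.escape_html comes from the suggestion itself.

```ruby
require 'cgi'

# Hypothetical helper (not in the script): returns a <meta> tag string
# whose name and content attributes are HTML-escaped.
def meta_tag(name, content)
  %(<meta name="#{CGI.escape_html(name)}" content="#{CGI.escape_html(content)}">)
end

puts meta_tag('description', 'Drops & "bubbles" <in> flows')
# prints: <meta name="description" content="Drops &amp; &quot;bubbles&quot; &lt;in&gt; flows">
```

Without the escaping, the double quotes in the description would end the content attribute early.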

🧹 Nitpick comments (3)
scripts/generate_seo_tags.rb (3)

86-94: Consider refining keyword extraction logic

The current implementation hard-codes criteria like word length > 3 and a maximum of 5 title keywords. Consider making these configurable with constants at the top of the file.

+# Constants for keyword extraction
+MIN_KEYWORD_LENGTH = 3
+MAX_TITLE_KEYWORDS = 5
+MAX_TOTAL_KEYWORDS = 8

# Later in the code:
-                       .select { |w| w.length > 3 }
+                       .select { |w| w.length > MIN_KEYWORD_LENGTH }
-                       .take(5)
+                       .take(MAX_TITLE_KEYWORDS)

-  all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(8)
+  all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(MAX_TOTAL_KEYWORDS)

202-219: Consider adding progress indicator for large sites

When processing a large number of files, it would be helpful to show a progress indicator. Consider adding a counter and percentage display.

files_updated = 0
+total_files = keywords_by_url.size
+puts "Processing #{total_files} HTML files..."

# Process each URL
+counter = 0
keywords_by_url.each do |url, keywords|
+  counter += 1
+  if counter % 10 == 0 || counter == total_files
+    percent_done = (counter.to_f / total_files * 100).round(1)
+    puts "Progress: #{counter}/#{total_files} (#{percent_done}%)"
+  end
  
  # Get description for this URL
  descriptions = descriptions_by_url[url] || Set.new
  
  # Get file path
  file_path = File.join(site_dir, url)
  
  # Update HTML file if it exists
  if update_html_with_metadata(file_path, keywords, descriptions)
    files_updated += 1
-    puts "Updated metadata for #{url}"
+    puts "  Updated metadata for #{url}" if ENV['VERBOSE']
  end
end

261-264: Add error handling for the main execution flow

While individual functions have error handling, the main script execution doesn't. Consider wrapping the entire execution in a begin/rescue block so that unexpected failures are reported and the script exits with a non-zero status.

+begin
  # Generate sitemapindex
  generate_sitemapindex(site_dir)
  
  puts "SEO enhancement completed successfully!"
+rescue => e
+  puts "Error during SEO enhancement: #{e.message}"
+  puts e.backtrace if ENV['DEBUG']
+  exit 1
+end
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 477e2da and 3887332.

📒 Files selected for processing (1)
  • scripts/generate_seo_tags.rb (1 hunks)
🧰 Additional context used
🪛 RuboCop (1.73)
scripts/generate_seo_tags.rb

[warning] 1-1: Script file generate_seo_tags.rb doesn't have execute permission.

(Lint/ScriptPermission)


[convention] 147-200: Assignment Branch Condition size for update_html_with_metadata is too high. [<10, 32, 12> 35.61/23]

(Metrics/AbcSize)


[convention] 147-200: Cyclomatic complexity for update_html_with_metadata is too high. [10/7]

(Metrics/CyclomaticComplexity)


[convention] 147-200: Perceived complexity for update_html_with_metadata is too high. [12/8]

(Metrics/PerceivedComplexity)

🔇 Additional comments (3)
scripts/generate_seo_tags.rb (3)

1-1: Add execute permission to the script

The script file doesn't have execute permission, which will prevent it from being executed directly.

Run this command to fix:

chmod +x scripts/generate_seo_tags.rb


42-59: Well-structured URL normalization function

The normalize_url function is well-documented with clear examples and properly handles various URL formats including fragments, missing extensions, and directory paths.


147-200: Refactor complex HTML update function

The update_html_with_metadata function has high complexity (ABC size of 35.61, cyclomatic complexity of 10, perceived complexity of 12). Break it down into smaller, more focused functions.

def update_html_with_metadata(file_path, keywords, description)
  return unless File.exist?(file_path)
  
  begin
    # Read file content
    html_content = File.read(file_path)
    
    # Parse with Nokogiri
    doc = Nokogiri::HTML(html_content)
    
    # Get existing head element
    head = doc.at_css('head')
    return unless head
    
-   # Check for existing metadata
-   existing_keywords = doc.css('meta[name="keywords"]')
-   existing_description = doc.css('meta[name="description"]')
-   
-   # Update or add keywords meta tag
-   if !keywords.empty?
-     keywords_str = keywords.to_a.uniq.join(', ')
-     if existing_keywords.empty?
-       # Add new meta tag for keywords
-       head.add_child("<meta name=\"keywords\" content=\"#{keywords_str}\">")
-     else
-       # Update existing keywords
-       existing_keywords.first['content'] = keywords_str
-     end
-   end
-   
-   # Update or add description meta tag if we have a good one
-   if !description.empty?
-     desc_str = description.to_a.first
-     
-     # Add if no description meta or if the existing one is too short
-     if existing_description.empty? || existing_description.first['content'].to_s.length < 50
-       if existing_description.empty?
-         # Add new meta tag for description
-         head.add_child("<meta name=\"description\" content=\"#{desc_str}\">")
-       else
-         # Update existing description
-         existing_description.first['content'] = desc_str
-       end
-     end
-   end
+   # Update meta tags
+   update_keywords_meta(doc, head, keywords)
+   update_description_meta(doc, head, description)
    
    # Write updated content back to file
    File.write(file_path, doc.to_html)
    return true
  rescue => e
    puts "Error updating metadata for #{file_path}: #{e.message}"
    return false
  end
end

+ # Update keywords meta tag
+ def update_keywords_meta(doc, head, keywords)
+   return if keywords.empty?
+   
+   existing_keywords = doc.css('meta[name="keywords"]')
+   keywords_str = keywords.to_a.uniq.join(', ')
+   
+   if existing_keywords.empty?
+     # Add new meta tag for keywords
+     head.add_child("<meta name=\"keywords\" content=\"#{CGI.escape_html(keywords_str)}\">")
+   else
+     # Update existing keywords
+     existing_keywords.first['content'] = keywords_str
+   end
+ end
+ 
+ # Update description meta tag
+ def update_description_meta(doc, head, description)
+   return if description.empty?
+   
+   existing_description = doc.css('meta[name="description"]')
+   desc_str = description.to_a.first
+   
+   # Add if no description meta or if the existing one is too short
+   if existing_description.empty? || existing_description.first['content'].to_s.length < 50
+     if existing_description.empty?
+       # Add new meta tag for description
+       head.add_child("<meta name=\"description\" content=\"#{CGI.escape_html(desc_str)}\">")
+     else
+       # Update existing description
+       existing_description.first['content'] = desc_str
+     end
+   end
+ end

This commit updates the spelling of "Dissipative anamoly" to "Dissipative anomaly" across multiple entries in the research index. The changes ensure consistency and accuracy in the tagging of research topics.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
_config.yml (1)

35-35: Add Newline at End of File.

YAML linting indicated a missing newline at the end of the file. Adding this newline helps conform to style guidelines and avoids potential issues with certain processing tools.

-  - .jekyll-cache
+  - .jekyll-cache
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3887332 and e749da7.

📒 Files selected for processing (6)
  • _config.yml (1 hunks)
  • _research/index.md (6 hunks)
  • assets/js/search_db.json (2 hunks)
  • scripts/build.sh (1 hunks)
  • scripts/generate_search_db.rb (1 hunks)
  • scripts/generate_seo_tags.rb (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • scripts/build.sh
  • assets/js/search_db.json
🧰 Additional context used
🪛 YAMLlint (1.35.1)
_config.yml

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (16)
_config.yml (2)

26-27: Review Teaching Collection Permalink Configuration.

The teaching collection now includes an explicit permalink (/:collection/:name), which brings consistency with the other collections. Verify that this URL structure meets the site's overall routing strategy for teaching pages.


28-35: Approval of Exclude Section.

The added exclude section correctly lists directories and files (e.g., scripts/node_modules, node_modules, .jekyll-cache, etc.) that should be omitted from processing. This change is aligned with the overall improvements for SEO and build efficiency.


scripts/generate_seo_tags.rb (6)

1-7: LGTM: Script has appropriate dependencies

The script correctly includes all necessary libraries, including the CGI library that was previously missing and flagged in past reviews. The shebang line allows for direct script execution.


19-20: Good implementation of configurable domain

The script properly loads the domain from environment variables first, then falls back to the site configuration, and finally uses a default value. This addresses the previous concern about hardcoded domains.
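
The fallback order described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the script's actual code: the function name resolve_site_domain, the SITE_DOMAIN environment variable name, the url config key (the usual Jekyll convention), and the placeholder default are all assumed.

```ruby
require 'yaml'

# Placeholder default; the real script's default value is not shown in this review.
DEFAULT_DOMAIN = 'https://example.com'

# Hypothetical helper: environment variable first, then _config.yml, then the default.
def resolve_site_domain(env: ENV, config_path: '_config.yml')
  from_env = env['SITE_DOMAIN'].to_s.strip
  return from_env unless from_env.empty?

  if File.exist?(config_path)
    config = YAML.safe_load(File.read(config_path)) || {}
    # 'url' is the conventional Jekyll key; assumed here.
    from_config = config['url'].to_s.strip if config.is_a?(Hash)
    return from_config unless from_config.to_s.empty?
  end

  DEFAULT_DOMAIN
end

puts resolve_site_domain(env: { 'SITE_DOMAIN' => 'https://staging.example.org' })
```

Passing env as a keyword argument keeps the lookup easy to exercise without mutating the real environment.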


41-70: Well-documented URL normalization function

The normalize_url function includes excellent documentation with examples and is implemented correctly to handle various URL formats. The function properly removes fragments, ensures leading slashes, and handles directory URLs by appending index.html.
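
For readers without the script at hand, here is a rough sketch of the behavior described above: strip the fragment, ensure a leading slash, and append index.html to directory URLs. The name normalize_url_sketch is hypothetical, and the handling of extension-less paths (appending .html) is an assumption; the script's own normalize_url may differ.

```ruby
# Hypothetical re-implementation for illustration only.
def normalize_url_sketch(url)
  # Drop any fragment ("#section") from the URL.
  path = url.to_s.split('#', 2).first.to_s
  # Ensure a leading slash.
  path = "/#{path}" unless path.start_with?('/')

  if path.end_with?('/')
    "#{path}index.html"   # directory URL
  elsif File.extname(path).empty?
    "#{path}.html"        # assumption: extension-less paths map to .html files
  else
    path
  end
end

puts normalize_url_sketch('/team/#people')  # prints: /team/index.html
puts normalize_url_sketch('research')       # prints: /research.html
```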


72-126: Thorough metadata extraction logic

The code appropriately extracts keywords from multiple sources (title, type, tags) and creates descriptions from content, with proper handling of edge cases. Skipping external URLs and entries with little content helps ensure quality metadata.


158-183: Well-refactored HTML metadata update functions

The HTML update functionality has been effectively split into smaller functions as suggested in previous reviews. The code now properly escapes HTML content using CGI.escape_html, addressing the security concern raised earlier.

Also applies to: 185-199, 201-218


250-277: Verify domain variable in sitemap generation

The sitemap generation properly uses the configurable SITE_DOMAIN variable instead of a hardcoded domain, addressing previous concerns.

scripts/generate_search_db.rb (2)

74-74: Improved README file check

The case-insensitive check for README.md files is a good improvement that ensures consistent behavior regardless of filename casing.


81-83: Efficient README processing skip

Adding a complete skip for README processing is more efficient than continuing with unnecessary operations when the file is identified as README.md.
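
A case-insensitive README check with an early skip might look like the sketch below. The helper name readme_file? and the sample file list are hypothetical; the actual code in generate_search_db.rb is not shown in this review.

```ruby
# Hypothetical helper: true when the file name is README.md in any casing.
def readme_file?(path)
  File.basename(path).casecmp?('README.md')
end

# Illustrative file list, not taken from the repository.
files = ['docs/README.md', 'docs/readme.MD', '_team/index.md']

files.each do |file|
  # Skip README files entirely before doing any further work on them.
  next if readme_file?(file)

  puts "Indexing #{file}"
end
# prints: Indexing _team/index.md
```

Skipping before any parsing is what avoids the unnecessary work mentioned above.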

_research/index.md (6)

19-19: Corrected Tag Spelling for Global Consistency

The tag on line 19 now reads "Dissipative anomaly," correcting the previous misspelling ("Dissipative anamoly"). This small fix enhances consistency across the document.


59-69: Addition of New 2025 Research Entry [15]

The new 2025 research entry includes a header, updated citation details, and a full set of metadata:

  • Header & Citation: The <h3 id="15"> element clearly formats the research citation and assigns a unique identifier.
  • Tags: The <tags> block now includes "Drops," "Dissipative anomaly," "Superamphiphobic-surfaces," "Impact forces," and "Featured," aligning the entry with current terminology and enhancing discoverability.
  • External Links & Image: The inclusion of DOI, arXiv, GitHub, and Blog links, along with an image demonstrating the parameter space of drop impact, provides a rich set of resources for readers.

Please verify that the external links (especially the GitHub repository URL and DOI) are accurate and up-to-date.


93-93: Tag Correction for Research Entry [13]

The tag on line 93 has been updated to "Dissipative anomaly," thereby correcting the earlier typographical error. This ensures uniform terminology across all research entries.


106-106: Revised Tags for Research Entry [12]

The tags for the research entry on [12] have been revised to include <span>Bubbles</span>, <span>Non-Newtonian</span>, <span>Jets</span>, and <span>Soft-matter-singularities</span>. This update removes the outdated/misspelled terms and aligns the tags with the site's overall tagging conventions.


127-127: Tag Correction for Research Entry [11]

The tag on line 127 now correctly shows "Dissipative anomaly," replacing the incorrect version. This correction contributes to consistent filtering and search functionality on the site.


158-158: Tag Correction for Research Entry [9]

The updated tag on line 158 now correctly reflects "Dissipative anomaly," ensuring that the term is uniformly used throughout the document. This change supports better categorization and reader comprehension.

@VatsalSy VatsalSy merged commit 867af57 into main Mar 18, 2025
2 of 3 checks passed