SEO-and-paper-update #23
This commit adds a new script `scripts/generate_seo_tags.rb` that generates SEO metadata (keywords and descriptions) for the website's pages based on the search database. The key changes are:
- Loads the search database from `assets/js/search_db.json`
- Extracts keywords from the page title, type, and tags
- Generates descriptions from the page content
- Updates the HTML files with the generated metadata (keywords and description)
- Handles cases where no good description is available by generating one from the full content
- Normalizes URLs to ensure consistent metadata generation

This feature will improve the website's SEO by providing relevant metadata for search engines.
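The per-URL grouping that the commit describes can be sketched as follows. This is a minimal sketch, not the script's actual code: `group_keywords_by_url` is a hypothetical name, it assumes each entry in `search_db.json` is an object with `url`, `type` and `tags` fields, and it treats "normalizing" as simply dropping the `#fragment`, so that section-level entries collapse onto their page. Title-word extraction is omitted for brevity.

```ruby
require 'json'
require 'set'

# Hypothetical sketch: collect keywords per page URL from the search database.
# Assumes entries shaped like {"url": ..., "type": ..., "tags": [...]}.
def group_keywords_by_url(json_text)
  # Each new URL key starts with an empty Set, so duplicates are merged.
  keywords_by_url = Hash.new { |hash, url| hash[url] = Set.new }

  JSON.parse(json_text).each do |entry|
    # Drop in-page fragments so "/research/#p1" and "/research/#p2" share a key.
    url = entry['url'].to_s.split('#').first.to_s
    next if url.empty?

    keywords_by_url[url].add(entry['type']) if entry['type']
    Array(entry['tags']).each { |tag| keywords_by_url[url].add(tag) }
  end

  keywords_by_url
end
```

Using a `Set` per URL means a tag shared by several sections of the same page is emitted only once in that page's `keywords` meta tag.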
…and coding conventions

This commit adds detailed instructions for building and developing the project, as well as guidelines for maintaining the repository and coding conventions to follow. The changes include:
- Added a new section on "Build and Development Commands" with instructions for installing dependencies, building the site and search database, running the local server, fetching blog content, and generating the search database.
- Added a new section on "Repository Guidelines" with instructions for updating the README, using the provided templates and CSS files, and keeping the documentation up-to-date.
- Added a new section on "General" coding conventions, including indentation, DRY principles, commenting, and support for light and dark themes.
- Added a new section on "HTML/Markdown" conventions, including the use of semantic HTML elements, BEM naming, and keeping content in Markdown format.
- Added a new section on "CSS" conventions, including the use of CSS variables, responsive breakpoints, units, mobile-first approach, and dark theme implementation.
- Added a new section on "JavaScript" conventions, including the use of ES6+ features and following best practices.

These changes aim to provide clear and comprehensive guidelines for contributors to follow when working on the project, ensuring consistency and maintainability.
rebuild site on search database update

This commit adds a new GitHub Actions workflow that automatically rebuilds the site when the search database file (`assets/js/search_db.json`) is updated. This ensures that the search functionality on the site is always up-to-date with the latest content. The workflow is triggered on a push to the `main` branch, specifically when the search database file is modified. It checks out the repository, sets up the Ruby environment, builds the site and search database, and then deploys the updated site to GitHub Pages.
The changes in this commit add a new paper by Sanjay, V. and Lohse, D. on the unifying theory of scaling in drop impact, including the forces and maximum spreading diameter. This paper was published in Physical Review Letters in 2025 and received positive reviews. The changes include adding the paper details, tags, and associated badges to the research page.
Walkthrough
This pull request introduces a new GitHub Actions workflow to rebuild the site upon changes to the search database and modifies an existing workflow to update its schedule and functionality. Various layout templates and content files are enhanced with SEO meta tags, structured data, and font declarations. A new research entry is added to the
Changes
Sequence Diagram(s)
sequenceDiagram
participant Developer
participant GitHubActions as "GitHub Actions"
participant BuildJob as "Build Job"
participant DeployJob as "Deploy Job"
Developer->>GitHubActions: Push to main (change in search_db.json)
GitHubActions->>BuildJob: Trigger build workflow
BuildJob->>BuildJob: Checkout code, setup Ruby, run build.sh (includes SEO generation)
BuildJob->>GitHubActions: Upload build artifact
GitHubActions->>DeployJob: Trigger deploy workflow (after build success)
DeployJob->>DeployJob: Deploy to GitHub Pages
sequenceDiagram
participant BuildScript as "build.sh"
participant SEOGenerator as "generate_seo_tags.rb"
participant HTML as "HTML Files"
participant Sitemap as "sitemap.xml"
BuildScript->>SEOGenerator: Invoke SEO metadata generation
SEOGenerator->>HTML: Process URLs and update SEO metadata
SEOGenerator->>Sitemap: Update sitemapindex.xml if sitemap exists
SEOGenerator->>BuildScript: Return completion status
Actionable comments posted: 4
🧹 Nitpick comments (15)
News.md (1)
5-15
: Enhance SEO Research Entry Structure
The new research entry for the publication by Sanjay, V. & Lohse, D. is well-formatted and clearly highlights the paper’s details, including the image with appropriate alt text and the DOI link. One suggestion is to consider moving repeated inline style definitions (e.g. those on the `<div>` and `<a>` tags) into a shared CSS class for improved maintainability and consistency.
_research/index.md (1)
59-70
: New Research Entry Addition & Tag Consistency
The new research entry under the 2025 section is clearly presented and follows the format of existing entries. The title, citation, and DOI link are all properly displayed, and the inclusion of the image (with centered styling) adds visual appeal. One minor point to address is a probable typo in the tags: the term "Dissipative anamoly" appears instead of "Dissipative anomaly". Correcting this will help maintain consistency across all research entries. For example, consider applying the following diff:
-<tags><span>Drops</span><span>Dissipative anamoly</span><span>Superamphiphobic-surfaces</span><span>Impact forces</span><span>Featured</span></tags>
+<tags><span>Drops</span><span>Dissipative anomaly</span><span>Superamphiphobic-surfaces</span><span>Impact forces</span><span>Featured</span></tags>
Additionally, verify that the Dropbox link for the parameter space image remains reliably accessible for public usage.
robots.txt (1)
1-7
: Review of robots.txt File
• The directives correctly allow all user agents and point to the sitemap.
• The empty "Disallow:" is interpreted as “no restrictions”, which is acceptable.
• Nitpick: There is an extra line at the end of the file (line 7) that appears unintended. If it is not purposeful, please remove it.
scripts/build.sh (1)
38-41
: SEO Metadata Generation Step
• The new block that echoes "Generating SEO metadata..." and then invokes the Ruby script (`scripts/generate_seo_tags.rb`) is clear and well placed in the build process.
• Ensure that the Ruby script is executable and that any necessary dependencies or environment settings are in place.
• It might be helpful to add a brief inline comment for maintainers about the expected outcome or failure behavior of this step.
_layouts/teaching.html (1)
64-78
: Structured Data for Teaching
• The JSON-LD snippet for a "Course" clearly outlines information about the educational offerings. This structured data will aid search engines in understanding your course content.
• As a good-to-have improvement, consider adding additional properties (e.g., “courseCode” or “provider” details) if available.
_layouts/default.html (1)
61-72
: Organization Structured Data Added
• The JSON-LD structured data block provides search engines with key information about the organization.
• Consider populating the "sameAs" array with links to your verified social media profiles to further improve your rich snippet eligibility.
_layouts/team.html (1)
36-41
: SEO Meta Tags Addition
The new SEO meta tags (description, author, robots, keywords) are a welcome enhancement for improving search engine discoverability. Ensure that the static content is reviewed periodically or, if applicable, later converted to dynamic values to better reflect page-specific content.
.github/workflows/update-search.yml (1)
67-67
: Remove Trailing Spaces
Line 67 contains trailing spaces. Removing these will keep the YAML file clean and compliant with linting tools.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 67-67: trailing spaces
(trailing-spaces)
.github/workflows/rebuild-on-search-update.yml (3)
18-19
: YAML Indentation and Trailing Spaces
The static analysis indicates an indentation issue at line 18 (expected 6 spaces but found 4) along with trailing spaces on line 19. Please adjust the indentation and remove the extra spaces to conform with YAML linting standards.
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 18-18: wrong indentation: expected 6 but found 4
(indentation)
[error] 19-19: trailing spaces
(trailing-spaces)
25-25
: Clean Up Trailing Spaces
There are trailing spaces on line 25. Removing these will improve readability and reduce potential linting warnings.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 25-25: trailing spaces
(trailing-spaces)
30-30
: Remove Trailing Spaces on Line 30
Please remove any trailing spaces on line 30 for consistency with YAML formatting best practices.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 30-30: trailing spaces
(trailing-spaces)
assets/js/search_db.json (1)
739-757
: Consider removing script code from content field.
The registration section contains embedded JavaScript code for copying email addresses. While the functionality is useful, embedding scripts directly in content can lead to security concerns and maintenance challenges.
- "content": "For registration details, please contact \n<div class=\"email-container\">\n <span class=\"email-text\">[email protected]</span>\n <button class=\"copy-btn\" onclick=\"copyEmail(this)\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n <i class=\"fas fa-copy\"></i>\n </button>\n</div>\n<div class=\"email-container\">\n <span class=\"email-text\">[email protected]</span>\n <button class=\"copy-btn\" onclick=\"copyEmail(this)\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n <i class=\"fas fa-copy\"></i>\n </button>\n</div>\n\n<script>\nfunction copyEmail(button) {\n const textToCopy = button.getAttribute('data-text');\n \n // Create a temporary textarea element to copy from\n const textarea = document.createElement('textarea');\n textarea.value = textToCopy;\n textarea.setAttribute('readonly', '');\n textarea.style.position = 'absolute';\n textarea.style.left = '-9999px';\n document.body.appendChild(textarea);\n \n // Select and copy the text\n textarea.select();\n document.execCommand('copy');\n \n // Remove the temporary element\n document.body.removeChild(textarea);\n \n // Show feedback\n const originalIcon = button.innerHTML;\n button.innerHTML = '<i class=\"fas fa-check\"></i>';\n button.classList.add('copied');\n \n // Restore original state after a delay\n setTimeout(() => {\n button.innerHTML = originalIcon;\n button.classList.remove('copied');\n }, 2000);\n}\n</script>\n\n<div style=\"margin-top: 2rem; text-align: center;\">\n <a href=\"https://github.com/comphy-lab/Basilisk-101\" class=\"course-card__link\" target=\"_blank\" aria-label=\"Course GitHub Repository\">\n <i class=\"fa-brands fa-github\" style=\"margin-right: 0.5rem; font-style: normal;\"></i>Course GitHub Repository\n </a>\n</div>", + "content": "For registration details, please contact \n<div class=\"email-container\">\n <span class=\"email-text\">[email protected]</span>\n <button class=\"copy-btn\" 
data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n <i class=\"fas fa-copy\"></i>\n </button>\n</div>\n<div class=\"email-container\">\n <span class=\"email-text\">[email protected]</span>\n <button class=\"copy-btn\" data-text=\"[email protected]\" aria-label=\"Copy email address [email protected]\">\n <i class=\"fas fa-copy\"></i>\n </button>\n</div>\n\n<div style=\"margin-top: 2rem; text-align: center;\">\n <a href=\"https://github.com/comphy-lab/Basilisk-101\" class=\"course-card__link\" target=\"_blank\" aria-label=\"Course GitHub Repository\">\n <i class=\"fa-brands fa-github\" style=\"margin-right: 0.5rem; font-style: normal;\"></i>Course GitHub Repository\n </a>\n</div>",
Instead, consider moving the JavaScript to an external file and adding appropriate data attributes to the buttons that the external script can target.
scripts/generate_seo_tags.rb (3)
89-93
: Consider increasing the keyword limit
Limiting to only 8 keywords might be too restrictive for content-rich pages. Search engines can typically handle more keywords effectively.
- all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(8)
+ all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(15)
95-100
: Improve description generation logic
The current approach of using only the first sentence might not always produce the most relevant description, especially if the first sentence is too short or not representative of the content.
- description = content.gsub(/\s+/, ' ').strip.split(/[.!?]/).first
+ # Get the first 2-3 sentences and trim to 160 chars
+ sentences = content.gsub(/\s+/, ' ').strip.split(/[.!?]/).take(3)
+ description = sentences.join('. ')[0..159].strip
- # Only use if description is between 50 and 160 characters
- if description && description.length >= 50 && description.length <= 160
+ # Only use if description is at least 50 characters
+ if description && description.length >= 50
    descriptions_by_url[normalized_url].add(description)
  end
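A minimal sketch of the suggested description logic follows, with two deliberate deviations from the diff: sentences are split only where `.`, `!` or `?` is followed by whitespace (so a value such as `3.5` is not cut in half and the punctuation is kept), and the 160-character cap backs off to the last word boundary instead of slicing mid-word. The helper `build_description` and its constants are hypothetical names, not part of the script.

```ruby
# Hypothetical helper illustrating the review suggestion.
MIN_DESCRIPTION_LENGTH = 50
MAX_DESCRIPTION_LENGTH = 160

def build_description(content, max_sentences: 3)
  # Collapse whitespace, then split after sentence-ending punctuation.
  sentences = content.gsub(/\s+/, ' ').strip
                     .split(/(?<=[.!?])\s+/)
                     .take(max_sentences)
  description = sentences.join(' ')

  if description.length > MAX_DESCRIPTION_LENGTH
    truncated = description[0, MAX_DESCRIPTION_LENGTH]
    # Cut at the last space so no word is left half-finished.
    cut = truncated.rindex(' ')
    description = (cut ? truncated[0, cut] : truncated).rstrip
  end

  # Too-short results are rejected, mirroring the ">= 50" check in the diff.
  description.length >= MIN_DESCRIPTION_LENGTH ? description : nil
end
```

Returning `nil` for short text lets the caller keep its existing fallback of generating a description from the full content.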
209-217
: Make the site URL configurable
The hardcoded URL "https://comphy-lab.org/sitemap.xml" should be made configurable to support different deployment environments.
+ # Read site URL from config if available
+ site_url = 'https://comphy-lab.org'
+ config_file = File.join(ROOT_DIR, '_config.yml')
+ if File.exist?(config_file)
+   config = YAML.load_file(config_file)
+   site_url = config['url'] if config['url'] && !config['url'].empty?
+ end
  # Create sitemapindex content
  sitemapindex_content = <<-XML
  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
- <loc>https://comphy-lab.org/sitemap.xml</loc>
+ <loc>#{site_url}/sitemap.xml</loc>
  <lastmod>#{last_mod}</lastmod>
  </sitemap>
  </sitemapindex>
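A self-contained sketch of the same idea, under a few assumptions: the helper names `site_url_from_config` and `sitemapindex_xml` are hypothetical, the default mirrors the previously hardcoded URL, and a trailing slash in `url` is stripped so the result never contains `//sitemap.xml`. It also uses a squiggly heredoc (`<<~`), which removes the common indentation; an XML declaration is only valid at the very start of a document, so an indented `<<-` heredoc body would yield a malformed sitemap index.

```ruby
require 'yaml'

DEFAULT_SITE_URL = 'https://comphy-lab.org'

# Read `url` from Jekyll's _config.yml, falling back to the default.
def site_url_from_config(root_dir)
  config_file = File.join(root_dir, '_config.yml')
  return DEFAULT_SITE_URL unless File.exist?(config_file)

  # `|| {}` guards against an empty _config.yml.
  config = YAML.load_file(config_file) || {}
  url = config['url'].to_s.strip
  url.empty? ? DEFAULT_SITE_URL : url.chomp('/')
end

# Build the sitemap index; <<~ strips indentation so <?xml is the first thing.
def sitemapindex_xml(site_url, last_mod)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>#{site_url}/sitemap.xml</loc>
        <lastmod>#{last_mod}</lastmod>
      </sitemap>
    </sitemapindex>
  XML
end
```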
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
assets/images/research/drop-impact-prl.png is excluded by `!**/*.png`
📒 Files selected for processing (13)
.github/workflows/rebuild-on-search-update.yml (1 hunks)
.github/workflows/update-search.yml (2 hunks)
News.md (1 hunks)
_layouts/default.html (2 hunks)
_layouts/research.html (2 hunks)
_layouts/teaching-course.html (2 hunks)
_layouts/teaching.html (2 hunks)
_layouts/team.html (2 hunks)
_research/index.md (2 hunks)
assets/js/search_db.json (2 hunks)
robots.txt (1 hunks)
scripts/build.sh (1 hunks)
scripts/generate_seo_tags.rb (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/rebuild-on-search-update.yml
[warning] 18-18: wrong indentation: expected 6 but found 4
(indentation)
[error] 19-19: trailing spaces
(trailing-spaces)
[error] 25-25: trailing spaces
(trailing-spaces)
[error] 30-30: trailing spaces
(trailing-spaces)
.github/workflows/update-search.yml
[error] 67-67: trailing spaces
(trailing-spaces)
🪛 RuboCop (1.73)
scripts/generate_seo_tags.rb
[warning] 1-1: Script file generate_seo_tags.rb doesn't have execute permission.
(Lint/ScriptPermission)
[convention] 123-176: Assignment Branch Condition size for update_html_with_metadata is too high. [<10, 32, 12> 35.61/23]
(Metrics/AbcSize)
[convention] 123-176: Cyclomatic complexity for update_html_with_metadata is too high. [10/7]
(Metrics/CyclomaticComplexity)
[convention] 123-176: Perceived complexity for update_html_with_metadata is too high. [12/8]
(Metrics/PerceivedComplexity)
🔇 Additional comments (27)
News.md (1)
16-16
: Clean Removal of Outdated Announcement
The removal of the hybrid online+offline course announcement is appropriate given that it no longer aligns with current content priorities. This cleanup helps maintain the relevance of the news section.
_layouts/research.html (3)
109-114
: SEO Meta Tags Added
• The inclusion of meta tags for description, author, robots, and keywords will help improve search engine visibility.
• Please verify that the keyword list remains representative as the content evolves.
120-122
: New Font-Face Declaration
• The addition of the system font fallback (declaring 'System Font' via a local lookup of 'system-ui') helps improve performance and reliability.
• Confirm that this change aligns with your overall design guidelines.
136-145
: Structured Data for Research
• The JSON-LD structured data providing context for the research project is well implemented and should benefit rich search results.
• Ensure that the “ResearchProject” type is the most appropriate schema type for your content and consider adding properties if needed in the future.
_layouts/teaching.html (2)
37-41
: SEO Meta Tags Updated
• The new meta tags (description, author, robots, and keywords) clearly describe the teaching page and are consistent with the site’s SEO strategy.
• Double-check the content for consistency with overall branding and update as necessary.
47-48
: System Font Declaration
• The updated @font-face rule now includes a declaration for 'System Font' with a local fallback, which should enhance text-rendering performance.
• Verify that this change meets the visual design requirements.
_layouts/default.html (2)
36-40
: Enhanced SEO Meta Tags
• The meta tags now dynamically use page-specific descriptions or fall back to the site-wide defaults.
• The conditional inclusion of keywords (when page tags exist) is a smart move. Please double-check that the Liquid logic works as intended for all pages.
73-83
: Conditional Research Project Structured Data
• The conditional inclusion of a second JSON-LD block for pages with a research layout is well handled.
• Verify that non-research pages do not mistakenly include these tags and that research pages render them correctly.
_layouts/team.html (2)
44-50
: Font-Face Declaration Update
The updated `@font-face` rule now specifies `'System Font'` with a local source (`local('system-ui')`). This should help achieve consistent font rendering and potentially improve performance by leveraging system fonts.
64-73
: Structured Data for Team Section
The inclusion of JSON-LD structured data defining the organization as a "ResearchOrganization" is a strong move for SEO and rich results. Verify that the URL and textual content (name and description) remain consistent with the organization’s branding.
.github/workflows/update-search.yml (2)
5-6
: Cron Schedule Adjustment
The cron schedule has been updated to run daily at 4:00 UTC, which is appropriate if a daily update is desired. Please confirm that this frequency aligns with all downstream processes and performance requirements.
68-74
: Commit and Push if Changed Step
The new step that configures Git, stages changes to `assets/js/search_db.json`, and commits and pushes them conditionally (based on `git diff`) is a solid enhancement. Verify in testing that minor changes (or changes caused by formatting) do not inadvertently trigger unnecessary commits.
🧰 Tools
🪛 actionlint (1.7.4)
69-69: shellcheck reported issue in this script: SC2015:info:4:18: Note that A && B || C is not if-then-else. C may run when A is true
(shellcheck)
.github/workflows/rebuild-on-search-update.yml (1)
1-13
: New Rebuild Workflow Overview
The new workflow for rebuilding the site when `assets/js/search_db.json` is updated is well structured. The permissions, job steps, and environment variables are defined clearly to support a robust build and deploy process.
_layouts/teaching-course.html (3)
37-42
: Dynamic SEO Meta Tags for Course Page
The meta tag block now conditionally renders page-specific values (for description and keywords) with fallback defaults. This is an excellent improvement that should boost SEO by ensuring each course page is optimally described.
47-49
: Updated Font-Face Rule
The adjustment to the `@font-face` rule, using `font-display: swap` with a local source `'system-ui'`, is a smart move to enhance page rendering speed and user experience on different platforms.
64-78
: Structured Data for Course
Adding JSON-LD structured data for the course helps search engines better understand and index the page content. Please ensure that the conditional parts (course name, description) render correctly in all scenarios.
assets/js/search_db.json (9)
310-407
: Good addition of comprehensive development documentation.
The new sections on build commands, repository guidelines, and coding conventions provide essential guidance for contributors. This documentation clarifies the development workflow and establishes coding standards that will ensure consistency across the codebase.
408-421
: Excellent research paper update with visual elements.
The addition of the March section with information about Sanjay and Lohse's paper includes an appropriate image and well-structured links to the full paper. This enhances the visibility of recent research outputs.
807-896
: Utility pages and redirects appropriately implemented.
The 404 page, contact redirects, and other utility pages have been properly added to the search database, ensuring users can find these pages through search functionality.
997-1288
: Well-structured research paper entries with comprehensive tagging.
The research paper entries follow a consistent format with proper titles, empty content (likely populated elsewhere), specific URLs, and relevant tags. The tagging system will facilitate effective content filtering and discovery.
1037-1049
: Excellent use of tags for the featured research paper.
This paper entry includes the "Featured" tag along with subject-specific tags, which will help highlight important research while maintaining proper categorization. The organization of tags from specific to general is logical.
"tags": [ "Drops", "Dissipative anamoly", "Superamphiphobic-surfaces", "Impact forces", "Featured" ]
597-803
: Comprehensive course information structure with appropriate hierarchical organization.
The teaching content for "High-Fidelity Simulations Using Basilisk C" is exceptionally well-organized with clear titles, content sections, and proper URL paths. The hierarchical structure with headings and sub-sections will provide excellent navigation for students.
2764-3613
: Well-organized blog content with clear excerpts and metadata.
The blog content entries feature well-structured excerpts and consistent metadata. The content is academically focused and relevant to the site's audience. The proper use of formatting and special characters in mathematical formulas is particularly good.
3614-5613
: Thorough documentation of lecture notes and teaching materials.
The extensive collection of lecture notes, code documentation, and educational content provides valuable resources for students and researchers. The content is properly formatted with appropriate headings, code blocks, and mathematical notation.
1638-2763
: Comprehensive team member information with consistent formatting.
The team member entries follow a consistent structure, providing clear information about each person's role, academic background, and research interests. The uniform formatting will enhance the user experience when browsing team information.
scripts/generate_seo_tags.rb (2)
31-48
: LGTM: Well-structured URL normalization function
The normalize_url function handles various URL cases effectively, ensuring consistent handling of URLs throughout the script.
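The review does not show `normalize_url` itself, so the following is a hypothetical re-implementation for illustration only, based on the behaviors the review credits it with (fragments, missing extensions, directory paths). It assumes the goal is to map a site URL to the HTML file Jekyll writes under `_site`; the real script's mapping may differ.

```ruby
# Hypothetical sketch: map a site URL to a path relative to _site.
def normalize_url(url)
  # Drop any #fragment or ?query, then leading slashes.
  path = url.to_s.split(/[#?]/).first.to_s.sub(%r{\A/+}, '')

  return 'index.html' if path.empty?                   # site root
  return "#{path}index.html" if path.end_with?('/')   # directory URL
  return path unless File.extname(path).empty?        # already has an extension

  "#{path}.html"                                       # extensionless page
end
```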
1-233
: Overall well-structured SEO metadata generation script
The script effectively generates SEO metadata from the search database and updates HTML files accordingly. It handles URL normalization, keyword extraction, and description generation in a systematic way. The addition of sitemapindex.xml generation is a good touch for better SEO.
With the suggested improvements implemented, this script will be a valuable addition to your build process, improving the site's SEO significantly.
Docstrings generation was requested by @VatsalSy.
* #23 (comment)
The following files were modified:
* `scripts/generate_seo_tags.rb`
Note: Generated docstrings for this pull request at #24
📝 Add docstrings to `SEO-and-paper-update`
Actionable comments posted: 2
♻️ Duplicate comments (1)
scripts/generate_seo_tags.rb (1)
170-170
: ⚠️ Potential issue
Properly escape HTML entities in meta tag content
When meta tags are added by interpolating strings into the markup passed to `head.add_child`, the content must be escaped to prevent potential HTML injection or a broken page structure.
- head.add_child("<meta name=\"keywords\" content=\"#{keywords_str}\">")
+ head.add_child("<meta name=\"keywords\" content=\"#{CGI.escape_html(keywords_str)}\">")
- head.add_child("<meta name=\"description\" content=\"#{desc_str}\">")
+ head.add_child("<meta name=\"description\" content=\"#{CGI.escape_html(desc_str)}\">")
Also applies to: 185-185
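To see why this matters: a description containing a double quote, interpolated unescaped, ends the `content` attribute early. A minimal sketch using Ruby's standard `cgi` library (`CGI.escapeHTML`; `CGI.escape_html` in the suggested diff is an alias of it). The helper name `meta_tag` is hypothetical.

```ruby
require 'cgi'

# Build a <meta> tag string with both attribute values HTML-escaped,
# so &, <, >, " and ' in page text cannot break out of content="...".
def meta_tag(name, content)
  %(<meta name="#{CGI.escapeHTML(name)}" content="#{CGI.escapeHTML(content.to_s)}">)
end
```

The script also needs `require 'cgi'` if it does not already load it. The update branches that assign `existing_keywords.first['content'] = ...` go through Nokogiri's attribute API, which escapes values when the document is serialized, so only the string-interpolation paths should need explicit escaping.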
🧹 Nitpick comments (3)
scripts/generate_seo_tags.rb (3)
86-94
: Consider refining keyword extraction logic
The current implementation hard-codes criteria like word length > 3 and a maximum of 5 title keywords. Consider making these configurable with constants at the top of the file.
+# Constants for keyword extraction
+MIN_KEYWORD_LENGTH = 3
+MAX_TITLE_KEYWORDS = 5
+MAX_TOTAL_KEYWORDS = 8

  # Later in the code:
- .select { |w| w.length > 3 }
+ .select { |w| w.length > MIN_KEYWORD_LENGTH }
- .take(5)
+ .take(MAX_TITLE_KEYWORDS)
- all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(8)
+ all_keywords = (title_keywords + type_keywords + tag_keywords).uniq.take(MAX_TOTAL_KEYWORDS)
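A sketch of how these constants could drive the whole extraction step. Only the three constant names come from the suggested diff; `extract_keywords`, the word regex, and lowercasing everything (so duplicates collapse regardless of case) are illustrative assumptions.

```ruby
MIN_KEYWORD_LENGTH = 3
MAX_TITLE_KEYWORDS = 5
MAX_TOTAL_KEYWORDS = 8

# Hypothetical: title words longer than MIN_KEYWORD_LENGTH come first,
# then the entry type, then tags; the result is deduplicated and capped.
def extract_keywords(title, type, tags)
  title_keywords = title.to_s.downcase.scan(/[[:alnum:]\-]+/)
                        .select { |w| w.length > MIN_KEYWORD_LENGTH }
                        .take(MAX_TITLE_KEYWORDS)
  type_keywords = type ? [type.to_s.downcase] : []
  tag_keywords = Array(tags).map { |t| t.to_s.downcase }

  (title_keywords + type_keywords + tag_keywords).uniq.take(MAX_TOTAL_KEYWORDS)
end
```

Raising `MAX_TOTAL_KEYWORDS` (the earlier suggestion of 15) then becomes a one-line change.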
202-219
: Consider adding progress indicator for large sites
When processing a large number of files, it would be helpful to show a progress indicator. Consider adding a counter and percentage display.
  files_updated = 0
+ total_files = keywords_by_url.size
+ puts "Processing #{total_files} HTML files..."
  # Process each URL
+ counter = 0
  keywords_by_url.each do |url, keywords|
+   counter += 1
+   if counter % 10 == 0 || counter == total_files
+     percent_done = (counter.to_f / total_files * 100).round(1)
+     puts "Progress: #{counter}/#{total_files} (#{percent_done}%)"
+   end
    # Get description for this URL
    descriptions = descriptions_by_url[url] || Set.new
    # Get file path
    file_path = File.join(site_dir, url)
    # Update HTML file if it exists
    if update_html_with_metadata(file_path, keywords, descriptions)
      files_updated += 1
-     puts "Updated metadata for #{url}"
+     puts " Updated metadata for #{url}" if ENV['VERBOSE']
    end
  end
261-264
: Add error handling for the main execution flow
While individual functions have error handling, the main script execution doesn't. Consider wrapping the entire execution in a begin/rescue block for better error handling.
+begin
  # Generate sitemapindex
  generate_sitemapindex(site_dir)
  puts "SEO enhancement completed successfully!"
+rescue => e
+  puts "Error during SEO enhancement: #{e.message}"
+  puts e.backtrace if ENV['DEBUG']
+  exit 1
+end
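A variant that keeps the rescue logic testable returns an exit status instead of calling `exit` inside the rescue. `run_seo_enhancement` is a hypothetical name, and errors go to stderr via `warn`. The non-zero status only stops the build if `scripts/build.sh` checks it (for example with `set -e`), which this review does not show.

```ruby
# Hypothetical wrapper: run the main steps, report the outcome,
# and return 0 on success or 1 on any StandardError.
def run_seo_enhancement
  yield
  puts 'SEO enhancement completed successfully!'
  0
rescue StandardError => e
  warn "Error during SEO enhancement: #{e.message}"
  warn e.backtrace.join("\n") if ENV['DEBUG']
  1
end

# Usage in the script body (not executed here):
#   exit run_seo_enhancement { generate_sitemapindex(site_dir) }
```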
📜 Review details
📒 Files selected for processing (1)
scripts/generate_seo_tags.rb (1 hunks)
🧰 Additional context used
🪛 RuboCop (1.73)
scripts/generate_seo_tags.rb
[warning] 1-1: Script file generate_seo_tags.rb doesn't have execute permission.
(Lint/ScriptPermission)
[convention] 147-200: Assignment Branch Condition size for update_html_with_metadata is too high. [<10, 32, 12> 35.61/23]
(Metrics/AbcSize)
[convention] 147-200: Cyclomatic complexity for update_html_with_metadata is too high. [10/7]
(Metrics/CyclomaticComplexity)
[convention] 147-200: Perceived complexity for update_html_with_metadata is too high. [12/8]
(Metrics/PerceivedComplexity)
🔇 Additional comments (3)
scripts/generate_seo_tags.rb (3)
1-1
: Add execute permission to the script
The script file doesn't have execute permission, which will prevent it from being executed directly.
Run this command to fix:
chmod +x scripts/generate_seo_tags.rb
42-59
: Well-structured URL normalization functionThe normalize_url function is well-documented with clear examples and properly handles various URL formats including fragments, missing extensions, and directory paths.
147-200
: Refactor complex HTML update functionThe
update_html_with_metadata
function has high complexity (ABC size of 35.61, cyclomatic complexity of 10, perceived complexity of 12). Break it down into smaller, more focused functions.def update_html_with_metadata(file_path, keywords, description) return unless File.exist?(file_path) begin # Read file content html_content = File.read(file_path) # Parse with Nokogiri doc = Nokogiri::HTML(html_content) # Get existing head element head = doc.at_css('head') return unless head - # Check for existing metadata - existing_keywords = doc.css('meta[name="keywords"]') - existing_description = doc.css('meta[name="description"]') - - # Update or add keywords meta tag - if !keywords.empty? - keywords_str = keywords.to_a.uniq.join(', ') - if existing_keywords.empty? - # Add new meta tag for keywords - head.add_child("<meta name=\"keywords\" content=\"#{keywords_str}\">") - else - # Update existing keywords - existing_keywords.first['content'] = keywords_str - end - end - - # Update or add description meta tag if we have a good one - if !description.empty? - desc_str = description.to_a.first - - # Add if no description meta or if the existing one is too short - if existing_description.empty? || existing_description.first['content'].to_s.length < 50 - if existing_description.empty? - # Add new meta tag for description - head.add_child("<meta name=\"description\" content=\"#{desc_str}\">") - else - # Update existing description - existing_description.first['content'] = desc_str - end - end - end + # Update meta tags + update_keywords_meta(doc, head, keywords) + update_description_meta(doc, head, description) # Write updated content back to file File.write(file_path, doc.to_html) return true rescue => e puts "Error updating metadata for #{file_path}: #{e.message}" return false end end + # Update keywords meta tag + def update_keywords_meta(doc, head, keywords) + return if keywords.empty? 
+ + existing_keywords = doc.css('meta[name="keywords"]') + keywords_str = keywords.to_a.uniq.join(', ') + + if existing_keywords.empty? + # Add new meta tag for keywords + head.add_child("<meta name=\"keywords\" content=\"#{CGI.escape_html(keywords_str)}\">") + else + # Update existing keywords + existing_keywords.first['content'] = keywords_str + end + end + + # Update description meta tag + def update_description_meta(doc, head, description) + return if description.empty? + + existing_description = doc.css('meta[name="description"]') + desc_str = description.to_a.first + + # Add if no description meta or if the existing one is too short + if existing_description.empty? || existing_description.first['content'].to_s.length < 50 + if existing_description.empty? + # Add new meta tag for description + head.add_child("<meta name=\"description\" content=\"#{CGI.escape_html(desc_str)}\">") + else + # Update existing description + existing_description.first['content'] = desc_str + end + end + end🧰 Tools
🪛 RuboCop (1.73)
[convention] 147-200: Assignment Branch Condition size for `update_html_with_metadata` is too high. [<10, 32, 12> 35.61/23] (Metrics/AbcSize)
[convention] 147-200: Cyclomatic complexity for `update_html_with_metadata` is too high. [10/7] (Metrics/CyclomaticComplexity)
[convention] 147-200: Perceived complexity for `update_html_with_metadata` is too high. [12/8] (Metrics/PerceivedComplexity)
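The 35.61 score in the first RuboCop message can be reproduced by hand: Metrics/AbcSize treats the bracketed numbers as the vector <A, B, C> (assignments, branches, conditions) and reports its Euclidean length. A quick check, assuming the vector is printed in that A, B, C order:

```ruby
# Recompute RuboCop's Metrics/AbcSize score from the <A, B, C> vector in the
# message: A = assignments, B = branches (method calls), C = conditions.
# The score is sqrt(A**2 + B**2 + C**2), rounded to two decimals here.
def abc_size(assignments, branches, conditions)
  Math.sqrt(assignments**2 + branches**2 + conditions**2).round(2)
end

score = abc_size(10, 32, 12)
max_allowed = 23 # limit shown in the message ("35.61/23")

puts "ABC size #{score} (max #{max_allowed}): #{score > max_allowed ? 'offense' : 'ok'}"
```

Branches dominate the score (32² = 1024 of the 1268 total), which is why moving the meta-tag method calls into separate helpers brings the original method under the limit.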
This commit updates the spelling of "Dissipative anamoly" to "Dissipative anomaly" across multiple entries in the research index. The changes ensure consistency and accuracy in the tagging of research topics.
Actionable comments posted: 0
🧹 Nitpick comments (1)
_config.yml (1)
35-35: Add newline at end of file. YAMLlint reports a missing newline at the end of the file. Adding it conforms to the style guidelines and avoids issues with tools that expect text files to end in a newline (Git, for example, flags such files with "No newline at end of file" in diffs).
```diff
- - .jekyll-cache
+ - .jekyll-cache
+
```

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: no new line character at the end of file (new-line-at-end-of-file)
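The fix itself is a single trailing newline. If generated files need the same treatment, a small helper can apply it; `ensure_trailing_newline` below is a hypothetical sketch, not part of this PR, and it only appends `"\n"` when the last byte is not already a newline:

```ruby
require 'tempfile'

# Append "\n" to a text file whose last byte is not a newline, which is
# what YAMLlint's new-line-at-end-of-file rule checks. Returns true when the
# file was changed. Empty files are left alone.
def ensure_trailing_newline(path)
  content = File.binread(path)
  return false if content.empty? || content.end_with?("\n")

  File.open(path, 'ab') { |f| f.write("\n") }
  true
end

# Usage on a temporary file that mimics the last line of _config.yml.
Tempfile.create(['config', '.yml']) do |f|
  f.write("exclude:\n  - .jekyll-cache")
  f.flush
  puts ensure_trailing_newline(f.path)   # prints true
  puts File.read(f.path).end_with?("\n") # prints true
end
```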
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- _config.yml (1 hunk)
- _research/index.md (6 hunks)
- assets/js/search_db.json (2 hunks)
- scripts/build.sh (1 hunk)
- scripts/generate_search_db.rb (1 hunk)
- scripts/generate_seo_tags.rb (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/build.sh
- assets/js/search_db.json
🧰 Additional context used
🪛 YAMLlint (1.35.1)
_config.yml
[error] 35-35: no new line character at the end of file (new-line-at-end-of-file)
🔇 Additional comments (16)
_config.yml (2)
26-27: Review Teaching Collection Permalink Configuration. The teaching collection now has an explicit permalink (`/:collection/:name`), which makes it consistent with the other collections. Verify that this URL structure matches the site's overall routing strategy for teaching pages.
28-35: Approval of Exclude Section. The added `exclude` section correctly lists directories and files (e.g., `scripts/node_modules`, `node_modules`, `.jekyll-cache`) that Jekyll should skip when building the site. This change supports the overall SEO and build-efficiency improvements.

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: no new line character at the end of file (new-line-at-end-of-file)
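Both `_config.yml` points can be checked mechanically. The sketch below parses a hypothetical excerpt with Ruby's standard `YAML.safe_load` and reports the teaching permalink plus any of the three quoted `exclude` entries that are missing; the real file has more keys, and `check_config` is an illustrative name:

```ruby
require 'yaml'

# Hypothetical excerpt of _config.yml with only the settings discussed in
# the review; the real file has more keys and exclude entries.
CONFIG_YAML = <<~YAML
  collections:
    teaching:
      permalink: /:collection/:name
  exclude:
    - scripts/node_modules
    - node_modules
    - .jekyll-cache
YAML

EXPECTED_EXCLUDES = %w[scripts/node_modules node_modules .jekyll-cache].freeze

# Return the teaching permalink and any expected exclude entries that are missing.
def check_config(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  permalink = config.dig('collections', 'teaching', 'permalink')
  missing = EXPECTED_EXCLUDES - Array(config['exclude'])
  [permalink, missing]
end

permalink, missing = check_config(CONFIG_YAML)
puts "teaching permalink: #{permalink}"
puts missing.empty? ? 'all expected excludes present' : "missing: #{missing.join(', ')}"
```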
scripts/generate_seo_tags.rb (6)
1-7: LGTM: Script has appropriate dependencies. The script includes all necessary libraries, including the CGI library that was previously missing and flagged in past reviews. The shebang line allows the script to be executed directly.
19-20: Good implementation of configurable domain. The script loads the domain from environment variables first, then falls back to the site configuration, and finally to a default value. This addresses the previous concern about hardcoded domains.
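The fallback order described above (environment, then site config, then a default) can be sketched as follows. The `SITE_DOMAIN` environment-variable name, the `url` config key, and the `https://example.com` default are assumptions for illustration, not values read from the script:

```ruby
# Hypothetical defaults: the env var name, the 'url' config key and the
# fallback domain are assumptions for illustration.
DEFAULT_DOMAIN = 'https://example.com'

# Resolve the site domain: environment first, then site config, then default.
# Blank values are skipped so an empty env var does not win.
def resolve_site_domain(env: ENV, config: {})
  candidates = [env['SITE_DOMAIN'], config['url'], DEFAULT_DOMAIN]
  domain = candidates.find { |value| !value.to_s.strip.empty? }
  domain.strip.chomp('/') # drop a trailing slash so paths join cleanly
end

puts resolve_site_domain(env: {}, config: { 'url' => 'https://example.org/' })
# prints https://example.org
```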
41-70: Well-documented URL normalization function. The `normalize_url` function includes clear documentation with examples and correctly handles various URL formats: it removes fragments, ensures a leading slash, and handles directory URLs by appending index.html.
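A minimal sketch of the three normalization rules listed above, assuming a plain site-relative path string as input; the actual `normalize_url` in `scripts/generate_seo_tags.rb` may handle more cases (query strings, absolute URLs):

```ruby
# Minimal sketch of the normalization rules described in the review:
#   1. drop the "#fragment" part
#   2. make sure the path starts with "/"
#   3. map directory URLs (ending in "/") to their index.html
def normalize_url(url)
  path = url.to_s.split('#', 2).first.to_s
  path = "/#{path}" unless path.start_with?('/')
  path += 'index.html' if path.end_with?('/')
  path
end

%w[/research/#15 teaching/ /about.html].each do |url|
  puts "#{url} -> #{normalize_url(url)}"
end
```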
72-126: Thorough metadata extraction logic. The code extracts keywords from multiple sources (title, type, tags) and creates descriptions from content, with proper handling of edge cases. Skipping external URLs and entries with little content helps ensure quality metadata.
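The extraction step can be approximated as follows. The entry keys (`title`, `type`, `tags`, `url`, `content`), the 50-character minimum, and the 160-character description limit are assumptions chosen for illustration; the script's own field names and thresholds may differ:

```ruby
require 'set'

# Assumed thresholds for illustration; the script may use different values.
MIN_CONTENT_LENGTH = 50
MAX_DESCRIPTION = 160

# Collapse whitespace and cut at a word boundary so the result, including
# the trailing "...", is at most `limit` characters long.
def truncate_description(text, limit = MAX_DESCRIPTION)
  flat = text.gsub(/\s+/, ' ').strip
  return flat if flat.length <= limit

  cut = flat[0, limit - 3]
  space = cut.rindex(' ')
  cut = cut[0, space] if space
  "#{cut}..."
end

# Build keywords and a description for one search-database entry, or return
# nil for external links and entries with too little content.
# Assumed entry keys: 'title', 'type', 'tags', 'url', 'content'.
def extract_metadata(entry)
  return nil if entry['url'].to_s.start_with?('http://', 'https://')

  content = entry['content'].to_s.strip
  return nil if content.length < MIN_CONTENT_LENGTH

  keywords = Set.new
  [entry['title'], entry['type'], *Array(entry['tags'])].each do |value|
    word = value.to_s.strip
    keywords << word unless word.empty?
  end

  { keywords: keywords, description: truncate_description(content) }
end

entry = {
  'title' => 'Drop impact forces', 'type' => 'research',
  'tags' => ['Drops', 'Impact forces'], 'url' => '/research/',
  'content' => 'A drop hitting a surface ' * 10
}
meta = extract_metadata(entry)
puts meta[:keywords].to_a.join(', ')
puts meta[:description]
```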
158-183: Well-refactored HTML metadata update functions. The HTML update functionality has been split into smaller functions, as suggested in previous reviews. The code now escapes HTML content with `CGI.escape_html`, addressing the security concern raised earlier.
Also applies to: 185-199, 201-218
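Escaping matters because page text is interpolated directly into an HTML string before `add_child` parses it. This sketch uses the same `CGI.escape_html` call as the refactored helpers; `meta_tag` itself is a hypothetical wrapper:

```ruby
require 'cgi'

# Build a <meta> tag string, escaping the attribute values so quotes,
# ampersands and angle brackets in page text cannot break out of the
# content="..." attribute.
def meta_tag(name, content)
  %(<meta name="#{CGI.escape_html(name)}" content="#{CGI.escape_html(content)}">)
end

puts meta_tag('description', 'Drops & "jets" <fast>')
# prints <meta name="description" content="Drops &amp; &quot;jets&quot; &lt;fast&gt;">
```

The `else` branches that assign `existing_keywords.first['content'] = ...` correctly skip `CGI.escape_html`: Nokogiri escapes attribute values when it serializes the document, so escaping there as well would produce double-encoded entities such as `&amp;amp;`.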
250-277: Verify domain variable in sitemap generation. The sitemap generation uses the configurable `SITE_DOMAIN` variable instead of a hardcoded domain, addressing previous concerns.
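A minimal sitemap builder that joins a configurable domain with page paths, assuming the page list is already known; `build_sitemap` and the placeholder domain are illustrative, and the namespace is the standard sitemaps.org 0.9 schema:

```ruby
require 'cgi'

# Placeholder domain; in the script this would come from the configurable
# SITE_DOMAIN value rather than being hardcoded here.
SITE_DOMAIN = 'https://example.com'

# Build a minimal sitemap.xml string from site-relative page paths.
def build_sitemap(paths, domain: SITE_DOMAIN)
  base = domain.chomp('/')
  entries = paths.map do |path|
    loc = "#{base}/#{path.sub(%r{\A/+}, '')}"
    "  <url><loc>#{CGI.escape_html(loc)}</loc></url>"
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *entries,
    '</urlset>'
  ].join("\n") + "\n"
end

puts build_sitemap(['/research/', '/teaching/index.html'])
```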
scripts/generate_search_db.rb (2)
74-74: Improved README file check. The case-insensitive check for README.md files ensures consistent behavior regardless of filename casing.
81-83: Efficient README processing skip. Skipping README files entirely is more efficient than continuing with unnecessary operations once a file is identified as README.md.
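The case-insensitive match and early skip can be expressed with `String#casecmp?`; `readme?` is a hypothetical helper name, and the file list is made up for illustration:

```ruby
# Case-insensitive README check: matches README.md, Readme.md, readme.MD, ...
def readme?(path)
  File.basename(path).casecmp?('README.md')
end

files = ['_research/index.md', 'README.md', 'docs/Readme.md', 'teaching/readme.MD']
indexed = files.reject { |f| readme?(f) } # skip READMEs before any parsing work
puts indexed.inspect # prints ["_research/index.md"]
```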
_research/index.md (6)
19-19: Corrected Tag Spelling for Global Consistency. The tag on line 19 now reads "Dissipative anomaly", correcting the previous misspelling ("Dissipative anamoly"). This small fix improves consistency across the document.
59-69: Addition of New 2025 Research Entry [15]. The new research entry for 2025 is added with a header, updated citation details, and a full set of metadata:
- Header & Citation: The `<h3 id="15">` element formats the research citation and assigns a unique identifier.
- Tags: The `<tags>` block includes "Drops", "Dissipative anomaly", "Superamphiphobic-surfaces", "Impact forces", and "Featured", aligning the entry with current terminology and improving discoverability.
- External Links & Image: DOI, arXiv, GitHub, and Blog links, along with an image showing the parameter space of drop impact, give readers a rich set of resources.

Please verify that the external links (especially the GitHub repository URL and DOI) are accurate and up to date.
93-93: Tag Correction for Research Entry [13]. The tag on line 93 has been updated to "Dissipative anomaly", correcting the earlier typographical error. This ensures uniform terminology across all research entries.
106-106: Revised Tags for Research Entry [12]. The tags for research entry [12] now include `<span>Bubbles</span>`, `<span>Non-Newtonian</span>`, `<span>Jets</span>`, and `<span>Soft-matter-singularities</span>`. This update removes outdated or misspelled terms and aligns the tags with the site's overall tagging conventions.
127-127: Tag Correction for Research Entry [11]. The tag on line 127 now correctly reads "Dissipative anomaly", replacing the misspelled version. This correction keeps filtering and search consistent across the site.
158-158: Tag Correction for Research Entry [9]. The tag on line 158 now correctly reads "Dissipative anomaly", so the term is used uniformly throughout the document. This supports better categorization and reader comprehension.
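Because tag spelling drives filtering and search, a small check can catch regressions of the corrected typo. This sketch assumes tags appear as `<span>` elements inside `<tags>` blocks, as quoted in the review, and that `KNOWN_TYPOS` is a hand-maintained map:

```ruby
# Hand-maintained map of known tag misspellings (assumption).
KNOWN_TYPOS = { 'Dissipative anamoly' => 'Dissipative anomaly' }.freeze

# Count misspelled tags found in <span> elements inside <tags> blocks.
def tag_typos(text)
  tags = text.scan(%r{<tags>(.*?)</tags>}m).flat_map do |(block)|
    block.scan(%r{<span>(.*?)</span>}m).flatten.map(&:strip)
  end
  tags.select { |tag| KNOWN_TYPOS.key?(tag) }.tally
end

sample = <<~MD
  <tags><span>Drops</span><span>Dissipative anomaly</span></tags>
  <tags><span>Bubbles</span><span>Dissipative anamoly</span></tags>
MD

tag_typos(sample).each do |typo, count|
  puts "#{typo} (#{count}x): use \"#{KNOWN_TYPOS[typo]}\""
end
```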
Description
This pull request includes the following changes:
- Generate SEO metadata from search database:
  - Adds a new script, `scripts/generate_seo_tags.rb`, that generates SEO metadata (keywords and descriptions) for the website's pages based on the search database.
  - Loads the search database from `assets/js/search_db.json`.
- Add build and development instructions, repository guidelines, and coding conventions.
- Add GitHub Actions workflow to rebuild site on search database update:
  - Rebuilds the site when the search database (`assets/js/search_db.json`) is updated.
  - Triggers on the `main` branch, specifically when the search database file is modified.
- Add new paper on drop impact forces and scaling.
Related Issues
None.
Checklist