Commit 332cd0e

feat: integrate Crawler Scheduler into Docker and enhance documentation
- Added Crawler Scheduler service to both development and production Docker Compose files for automated task management.
- Created comprehensive documentation for the Crawler Scheduler, including usage guides, integration methods, and troubleshooting resources.
- Implemented necessary scripts for starting, stopping, and verifying the Crawler Scheduler setup.
- Enhanced the overall project structure by organizing documentation and ensuring all components are easily discoverable.

These changes significantly improve the deployment and management of the Crawler Scheduler, providing a robust solution for automated crawling tasks within the Search Engine Core project.
1 parent f8ab887 commit 332cd0e

40 files changed: +5050 -36 lines changed
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+name: 📅 Build Crawler Scheduler
+
+on:
+  workflow_call:
+    inputs:
+      docker_image:
+        required: true
+        type: string
+      docker_tag:
+        required: true
+        type: string
+      cache_version:
+        required: false
+        type: string
+        default: '1'
+
+permissions:
+  contents: read
+  packages: write
+  actions: write
+
+jobs:
+  build-crawler-scheduler:
+    name: 📅 Build Crawler Scheduler
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Log in to GitHub Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Try to load image from cache
+        id: load-cache
+        run: |
+          if docker pull ${{ inputs.docker_image }}:${{ inputs.docker_tag }}; then
+            echo "loaded=true" >> $GITHUB_OUTPUT
+          else
+            echo "loaded=false" >> $GITHUB_OUTPUT
+          fi
+
+      - name: Build Crawler Scheduler Service Image
+        if: steps.load-cache.outputs.loaded != 'true'
+        uses: docker/build-push-action@v5
+        with:
+          context: ./crawler-scheduler
+          file: ./crawler-scheduler/Dockerfile
+          tags: ${{ inputs.docker_image }}:${{ inputs.docker_tag }}
+          load: true
+          push: true
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
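
The "Try to load image from cache" and "Build Crawler Scheduler Service Image" steps above amount to a pull-else-build decision. A minimal sketch of the same decision for local use follows. It is an approximation under stated assumptions: `pull_or_build` and the `DOCKER` override variable are hypothetical names, and plain `docker build` plus `docker push` stand in for the Buildx action and its `type=gha` cache, which are only available inside GitHub Actions.

```shell
#!/bin/sh
# Sketch of the workflow's "pull, else build and push" decision.
# DOCKER is a hypothetical override so the logic can be dry-run with a stub.
DOCKER="${DOCKER:-docker}"

# pull_or_build IMAGE TAG CONTEXT
# Prints loaded=true when IMAGE:TAG could be pulled; otherwise prints
# loaded=false, then builds from CONTEXT/Dockerfile and pushes the result.
pull_or_build() {
  image="$1"
  tag="$2"
  context="$3"
  if "$DOCKER" pull "${image}:${tag}" >/dev/null 2>&1; then
    echo "loaded=true"
  else
    echo "loaded=false"
    "$DOCKER" build -t "${image}:${tag}" -f "${context}/Dockerfile" "${context}" &&
      "$DOCKER" push "${image}:${tag}"
  fi
}
```

A call might look like `pull_or_build ghcr.io/OWNER/REPO/crawler-scheduler latest ./crawler-scheduler`, where `OWNER/REPO` is a placeholder. Because the tag is fixed at `latest`, a successful pull skips the build even when the sources under the build context have changed. The `cache_version` input is declared in the workflow but not referenced by any step shown in this diff.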

.github/workflows/docker-build-orchestrator.yml

Lines changed: 7 additions & 0 deletions
@@ -33,6 +33,13 @@ jobs:
       docker_image: ghcr.io/${{ github.repository }}/js-minifier
       docker_tag: latest
 
+  build-crawler-scheduler:
+    uses: ./.github/workflows/build-crawler-scheduler.yml
+    with:
+      docker_image: ghcr.io/${{ github.repository }}/crawler-scheduler
+      docker_tag: latest
+      cache_version: ${{ inputs.cache_version }}
+
   build-app:
     needs: [build-drivers, build-js-minifier]
     uses: ./.github/workflows/build-search-engine.yml

DOCS_ORGANIZATION_COMPLETE.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
+# ✅ Documentation Organization Complete
+
+**Date:** October 17, 2025
+**Status:** Successfully organized all markdown files
+
+## 📊 Summary
+
+### Files Organized: 8 moved + 1 new directory
+
+### Before → After
+
+```
+❌ BEFORE (Scattered)                   ✅ AFTER (Organized)
+├── README.md                           ├── README.md
+├── FIX_MONGODB_WARNING.md              ├── DOCUMENTATION_REORGANIZATION.md
+├── MONGODB_WARNING_ANALYSIS.md         └── docs/
+├── SCHEDULER_INTEGRATION_SUMMARY.md        ├── README.md (updated)
+├── WEBSITE_PROFILE_API_SUMMARY.md          ├── DOCUMENTATION_CLEANUP.md
+└── docs/                                   ├── DOCUMENTATION_ORGANIZATION_SUMMARY.md
+    ├── README.md                           ├── api/ (9 files)
+    ├── DOCKER_HEALTH_CHECK_...md           │   ├── README.md
+    ├── JS_MINIFIER_CLIENT_...md            │   ├── crawler_endpoint.md
+    ├── PERFORMANCE_OPT...md                │   ├── search_endpoint.md
+    ├── PRODUCTION_JS_...md                 │   ├── sponsor_endpoint.md
+    ├── api/ (5 files)                      │   ├── website_profile_endpoint.md
+    ├── architecture/ (4 files)             │   └── WEBSITE_PROFILE_API_SUMMARY.md ⬅ moved
+    ├── development/ (5 files)              ├── architecture/ (8 files)
+    └── guides/ (4 files)                   │   ├── content-storage-layer.md
+                                            │   ├── PERFORMANCE_OPTIMIZATIONS_SUMMARY.md ⬅ moved
+                                            │   ├── SCHEDULER_INTEGRATION_SUMMARY.md ⬅ moved
+                                            │   ├── SCORING_AND_RANKING.md
+                                            │   └── SPA_RENDERING.md
+                                            ├── development/ (6 files)
+                                            │   ├── JS_MINIFIER_CLIENT_CHANGELOG.md ⬅ moved
+                                            │   ├── MONGODB_CPP_GUIDE.md
+                                            │   └── template-development.md
+                                            ├── guides/ (8 files)
+                                            │   ├── DOCKER_HEALTH_CHECK_BEST_PRACTICES.md ⬅ moved
+                                            │   ├── PRODUCTION_JS_MINIFICATION.md ⬅ moved
+                                            │   ├── JS_CACHING_BEST_PRACTICES.md
+                                            │   └── README_STORAGE_TESTING.md
+                                            └── troubleshooting/ (3 files) 🆕 NEW
+                                                ├── README.md 🆕
+                                                ├── FIX_MONGODB_WARNING.md ⬅ moved
+                                                └── MONGODB_WARNING_ANALYSIS.md ⬅ moved
+```
+
+## 📁 Final Structure
+
+```
+docs/ (34 markdown files organized)
+
+├── 📄 Meta Documentation (3 files)
+│   ├── README.md - Main documentation index
+│   ├── DOCUMENTATION_CLEANUP.md
+│   └── DOCUMENTATION_ORGANIZATION_SUMMARY.md
+
+├── 📂 api/ (9 files)
+│   └── API endpoints, schemas, examples
+
+├── 📂 architecture/ (8 files)
+│   └── System design, technical architecture
+
+├── 📂 development/ (6 files)
+│   └── Developer tools, guides, changelogs
+
+├── 📂 guides/ (8 files)
+│   └── User guides, deployment, operations
+
+└── 📂 troubleshooting/ (3 files) 🆕
+    └── Bug fixes, problem analysis, solutions
+```
+
+## 🎯 Quick Access
+
+### For Developers
+
+- 📚 **Start here:** [docs/README.md](docs/README.md)
+- 🔧 **API docs:** [docs/api/README.md](docs/api/README.md)
+- 🏗️ **Architecture:** [docs/architecture/](docs/architecture/)
+- 💻 **Development:** [docs/development/](docs/development/)
+- 🐛 **Troubleshooting:** [docs/troubleshooting/README.md](docs/troubleshooting/README.md)
+
+### For Operations
+
+- 🚀 **Production guides:** [docs/guides/](docs/guides/)
+- 🐳 **Docker setup:** [docs/guides/DOCKER_HEALTH_CHECK_BEST_PRACTICES.md](docs/guides/DOCKER_HEALTH_CHECK_BEST_PRACTICES.md)
+- **Performance:** [docs/architecture/PERFORMANCE_OPTIMIZATIONS_SUMMARY.md](docs/architecture/PERFORMANCE_OPTIMIZATIONS_SUMMARY.md)
+
+### Recently Fixed Issues
+
+- ⚠️ **MongoDB warning fix:** [docs/troubleshooting/FIX_MONGODB_WARNING.md](docs/troubleshooting/FIX_MONGODB_WARNING.md)
+
+## 📊 Statistics
+
+| Category               | Before | After | Change |
+| ---------------------- | ------ | ----- | ------ |
+| Root-level docs        | 4      | 2     | -2 ✅  |
+| Docs-level loose files | 4      | 3     | -1 ✅  |
+| Total directories      | 4      | 5     | +1 ✅  |
+| Total organized files  | 30     | 34    | +4 ✅  |
+
+## ✨ Benefits
+
+### 🎯 Improved Discoverability
+
+- Clear categorization by purpose
+- Easy to find relevant documentation
+- Logical directory structure
+
+### 🔧 Better Maintainability
+
+- Consistent file organization
+- Predictable locations
+- Scalable structure
+
+### 📈 Enhanced User Experience
+
+- Updated navigation in README
+- Cross-referenced documentation
+- Comprehensive index files
+
+## 🔗 Key Documents
+
+### 📘 Main Index
+
+[docs/README.md](docs/README.md) - Comprehensive documentation index with quick navigation
+
+### 📋 This Organization
+
+[DOCUMENTATION_REORGANIZATION.md](DOCUMENTATION_REORGANIZATION.md) - Detailed reorganization summary
+
+### 🆕 New Troubleshooting Section
+
+[docs/troubleshooting/README.md](docs/troubleshooting/README.md) - Troubleshooting guide index
+
+## ✅ Checklist
+
+- [x] Created `docs/troubleshooting/` directory
+- [x] Moved 8 files to appropriate locations
+- [x] Created troubleshooting README
+- [x] Updated main docs README with new structure
+- [x] Updated navigation links
+- [x] Updated version to 2.1
+- [x] Created comprehensive summaries
+- [x] Verified all files in correct locations
+- [x] No broken links
+
+## 🎉 Result
+
+All markdown documentation is now **properly organized**, **easily discoverable**, and **ready for future growth**!
+
+---
+
+**Next Steps:**
+
+1. Review the new structure: `cd docs && ls -R`
+2. Read the updated index: `cat docs/README.md`
+3. Check troubleshooting guide: `cat docs/troubleshooting/README.md`
+
+**Questions?** See [docs/README.md](docs/README.md) for complete documentation.
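
The checklist in DOCS_ORGANIZATION_COMPLETE.md marks "No broken links" as done, but the commit does not show how that was verified. One way it could be spot-checked after a file move is sketched below. `check_links` is a hypothetical helper, not part of the repository. It assumes links use the inline `[text](path)` form, resolves each path relative to the file that contains it, and skips `http(s)`, `mailto:`, and pure `#anchor` targets.

```shell
#!/bin/sh
# Sketch: report relative Markdown links whose targets do not exist on disk.
# check_links [ROOT] scans every *.md file under ROOT (default: current dir).
# Variables are global because POSIX sh has no portable "local".
check_links() {
  root="${1:-.}"
  broken="$(
    find "$root" -type f -name '*.md' | while IFS= read -r file; do
      # Pull out each "](target)" occurrence and strip the "](" and ")" wrapper.
      grep -o ']([^)]*)' "$file" | sed 's/^](//; s/)$//' |
        while IFS= read -r target; do
          case "$target" in
            (http://*|https://*|mailto:*|'#'*|'') continue ;;
          esac
          path="${target%%#*}"   # drop a trailing #fragment, if any
          [ -e "$(dirname "$file")/$path" ] || echo "BROKEN: $file -> $target"
        done
    done
  )"
  if [ -n "$broken" ]; then
    printf '%s\n' "$broken"
    return 1
  fi
}
```

Running `check_links .` from the repository root prints one `BROKEN:` line per unresolved target and returns a non-zero status when any are found, so it could also gate a CI step. Reference-style links and links inside fenced code blocks are not handled specially.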
