
Commit 8cd6f77

SimplicityGuy and claude committed
docs: comprehensive documentation update and organization
- Create docs/README.md as complete documentation index
- Add file-completion-tracking.md guide for new completion tracking feature
- Reorganize main README documentation section into categories:
  - Essential Guides
  - Development Standards
  - Operations & Security
  - Features & References
- Update consumer-cancellation.md with extractor integration details
- Update recent-improvements.md with January 2025 enhancements
- Add file completion tracking info to extractor/README.md
- Ensure all docs follow lowercase-with-hyphens naming convention
- Add proper navigation links and last updated dates
- Remove duplicate documentation links from main README footer

All documentation is now properly organized, linked, and up-to-date with the latest platform features and improvements.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 184ac8c commit 8cd6f77

6 files changed

Lines changed: 374 additions & 9 deletions

README.md

Lines changed: 33 additions & 8 deletions
@@ -117,13 +117,41 @@ graph TD
117117

118118
## 📖 Documentation
119119

120+
### 🎯 Essential Guides
121+
120122
| Document | Purpose |
121123
|----------|---------|
122124
| **[CLAUDE.md](CLAUDE.md)** | 🤖 Claude Code integration guide & development standards |
125+
| **[Documentation Index](docs/README.md)** | 📚 Complete documentation directory with all guides |
123126
| **[GitHub Actions Guide](docs/github-actions-guide.md)** | 🚀 CI/CD workflows, automation & best practices |
124-
| **[Task Automation](docs/task-automation.md)** | 🚀 Complete taskipy command reference |
127+
| **[Task Automation](docs/task-automation.md)** | ⚡ Complete taskipy command reference |
128+
129+
### 🏗️ Development Standards
130+
131+
| Document | Purpose |
132+
|----------|---------|
133+
| **[Monorepo Guide](docs/monorepo-guide.md)** | 📦 Managing Python monorepo with shared dependencies |
134+
| **[Testing Guide](docs/testing-guide.md)** | 🧪 Comprehensive testing strategies and patterns |
135+
| **[Logging Guide](docs/logging-guide.md)** | 📊 Structured logging standards and practices |
136+
| **[Python Version Management](docs/python-version-management.md)** | 🐍 Managing Python 3.13+ across the project |
137+
138+
### 🛡️ Operations & Security
139+
140+
| Document | Purpose |
141+
|----------|---------|
125142
| **[Docker Security](docs/docker-security.md)** | 🔒 Container hardening & security practices |
126-
| **[Dockerfile Standards](docs/dockerfile-standards.md)** | 🏗️ Best practices for writing Dockerfiles |
143+
| **[Dockerfile Standards](docs/dockerfile-standards.md)** | 🐋 Best practices for writing Dockerfiles |
144+
| **[Database Resilience](docs/database-resilience.md)** | 💾 Database connection patterns & error handling |
145+
| **[Performance Guide](docs/performance-guide.md)** | ⚡ Performance optimization strategies |
146+
147+
### 📋 Features & References
148+
149+
| Document | Purpose |
150+
|----------|---------|
151+
| **[Consumer Cancellation](docs/consumer-cancellation.md)** | 🔄 File completion and consumer lifecycle |
152+
| **[Platform Targeting](docs/platform-targeting.md)** | 🎯 Cross-platform compatibility |
153+
| **[Emoji Guide](docs/emoji-guide.md)** | 📋 Standardized emoji usage |
154+
| **[Recent Improvements](docs/recent-improvements.md)** | 🚀 Latest platform enhancements |
127155
| **Service Guides** | 📚 Individual README for each service |
128156

129157
## 🚀 Quick Start
@@ -883,12 +911,9 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
883911

884912
### Documentation
885913

886-
- 📖 **[CLAUDE.md](CLAUDE.md)** - Detailed technical documentation
887-
- 🚀 **[GitHub Actions Guide](docs/github-actions-guide.md)** - CI/CD workflows and automation
888-
- 🤖 **[Task Automation](docs/task-automation.md)** - Available tasks and workflows
889-
- 🔒 **[Docker Security](docs/docker-security.md)** - Security best practices
890-
- 🏗️ **[Dockerfile Standards](docs/dockerfile-standards.md)** - Container standards
891-
- 📦 **[Service READMEs](/)** - Individual service documentation
914+
- 📚 **[Complete Documentation Index](docs/README.md)** - All guides and references
915+
- 🤖 **[CLAUDE.md](CLAUDE.md)** - AI development guide
916+
- 📦 **Service Documentation** - README in each service directory
892917

893918
### Project Status
894919

docs/README.md

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
# 📚 Discogsography Documentation
2+
3+
<div align="center">
4+
5+
**Comprehensive guides and documentation for the Discogsography platform**
6+
7+
[🏠 Back to Main](../README.md) | [🤖 Claude Guide](../CLAUDE.md) | [📋 Emoji Guide](emoji-guide.md)
8+
9+
</div>
10+
11+
## 📖 Documentation Index
12+
13+
### 🏗️ Development & Standards
14+
15+
| Document | Description |
16+
|----------|-------------|
17+
| **[Dockerfile Standards](dockerfile-standards.md)** | 🐋 Best practices for writing secure, efficient Dockerfiles |
18+
| **[Monorepo Guide](monorepo-guide.md)** | 📦 Managing a Python monorepo with shared dependencies |
19+
| **[Python Version Management](python-version-management.md)** | 🐍 Managing Python 3.13+ across the project |
20+
| **[Platform Targeting](platform-targeting.md)** | 🎯 Cross-platform compatibility guidelines |
21+
| **[Logging Guide](logging-guide.md)** | 📊 Structured logging standards and best practices |
22+
| **[Testing Guide](testing-guide.md)** | 🧪 Comprehensive testing strategies and patterns |
23+
24+
### 🚀 Operations & Deployment
25+
26+
| Document | Description |
27+
|----------|-------------|
28+
| **[GitHub Actions Guide](github-actions-guide.md)** | 🔄 CI/CD workflows, automation & best practices |
29+
| **[Task Automation](task-automation.md)** | ⚡ Complete taskipy command reference |
30+
| **[Docker Security](docker-security.md)** | 🔒 Container hardening & security practices |
31+
| **[Database Resilience](database-resilience.md)** | 💾 Database connection patterns & error handling |
32+
| **[Performance Guide](performance-guide.md)** | ⚡ Performance optimization strategies |
33+
34+
### 📋 Features & Updates
35+
36+
| Document | Description |
37+
|----------|-------------|
38+
| **[Consumer Cancellation](consumer-cancellation.md)** | 🔄 File completion and consumer lifecycle management |
39+
| **[File Completion Tracking](file-completion-tracking.md)** | 📊 Intelligent completion tracking and stalled detection |
40+
| **[Recent Improvements](recent-improvements.md)** | 🚀 Latest platform enhancements and changes |
41+
| **[Emoji Guide](emoji-guide.md)** | 📋 Standardized emoji usage across the project |
42+
43+
## 🎯 Quick Links by Topic
44+
45+
### For New Contributors
46+
47+
1. Start with the [Monorepo Guide](monorepo-guide.md)
48+
1. Review [Python Version Management](python-version-management.md)
49+
1. Understand our [Logging Guide](logging-guide.md)
50+
1. Check [Testing Guide](testing-guide.md) for quality standards
51+
52+
### For DevOps
53+
54+
1. [Docker Security](docker-security.md) for container best practices
55+
1. [Dockerfile Standards](dockerfile-standards.md) for image optimization
56+
1. [GitHub Actions Guide](github-actions-guide.md) for CI/CD
57+
1. [Task Automation](task-automation.md) for build commands
58+
59+
### For Performance Tuning
60+
61+
1. [Performance Guide](performance-guide.md) for optimization strategies
62+
1. [Database Resilience](database-resilience.md) for connection pooling
63+
1. [Consumer Cancellation](consumer-cancellation.md) for message processing
64+
65+
## 📝 Documentation Standards
66+
67+
When creating or updating documentation:
68+
69+
1. **File Naming**: Use lowercase with hyphens (e.g., `new-feature-guide.md`)
70+
1. **Structure**: Include a header with title, description, and navigation
71+
1. **Emojis**: Follow the [Emoji Guide](emoji-guide.md) for consistency
72+
1. **Examples**: Provide practical code examples and configurations
73+
1. **Updates**: Keep the "Last Updated" date current
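
The file naming rule above can be checked mechanically. The following is a minimal sketch, not part of the repository: the function name `is_valid_doc_name` and the exact regular expression are assumptions made for illustration, and conventional uppercase files such as `README.md` would need to be exempted separately.

```python
import re

# Assumed pattern: lowercase letters/digits in hyphen-separated groups, ending in ".md".
DOC_NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*\.md$")


def is_valid_doc_name(filename: str) -> bool:
    """Return True if the filename follows the lowercase-with-hyphens convention."""
    return DOC_NAME_PATTERN.fullmatch(filename) is not None
```

A check like this could run over `docs/` in CI to catch names such as `New_Feature.md` before they are merged.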
74+
75+
## 🤝 Contributing
76+
77+
To add new documentation:
78+
79+
1. Create a new `.md` file in this directory
80+
1. Follow the naming convention: `lowercase-with-hyphens.md`
81+
1. Add an entry to this README's index
82+
1. Update the main [README.md](../README.md) if it's a major guide
83+
1. Use the standard header format shown in existing docs
84+
85+
## 📊 Documentation Coverage
86+
87+
Current documentation covers:
88+
89+
- ✅ Development environment setup
90+
- ✅ CI/CD workflows and automation
91+
- ✅ Docker and containerization
92+
- ✅ Testing strategies
93+
- ✅ Performance optimization
94+
- ✅ Security best practices
95+
- ✅ Database patterns
96+
- ✅ Service architecture
97+
98+
## 🔍 Need Help?
99+
100+
- Check [CLAUDE.md](../CLAUDE.md) for AI-assisted development
101+
- Review service-specific READMEs in each service directory
102+
- See [Recent Improvements](recent-improvements.md) for latest changes

docs/consumer-cancellation.md

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,13 @@
11
# Consumer Cancellation Feature
22

3+
<div align="center">
4+
5+
**Automatic consumer lifecycle management for completed file processing**
6+
7+
Last Updated: January 2025
8+
9+
</div>
10+
311
## Overview
412

513
The consumer cancellation feature automatically closes RabbitMQ queue consumers after files have completed processing. This helps free up resources and provides clearer monitoring of active vs. completed file processing.
@@ -102,3 +110,19 @@ docker-compose logs -f tableinator graphinator
102110
- Consumer tags are stored when consumers are created
103111
- Cancellation tasks are tracked to allow proper cleanup on shutdown
104112
- The `nowait=True` parameter prevents hanging if RabbitMQ is slow to respond
113+
114+
## Extractor Integration
115+
116+
The extractor service integrates with consumer cancellation by:
117+
118+
1. **Sending File Completion Messages**: When a file finishes processing, the extractor sends a "file_complete" message
119+
1. **Tracking Completed Files**: The extractor maintains a `completed_files` set to avoid false stalled warnings
120+
1. **Progress Monitoring**: Completed files are excluded from stalled detection logic
121+
122+
This prevents the extractor from incorrectly reporting files as "stalled" when they have actually completed processing and their consumers have been canceled.
123+
124+
### Recent Improvements (January 2025)
125+
126+
- Added `completed_files` tracking in extractor to prevent false stalled warnings
127+
- Enhanced progress reporting to show which file types are completed
128+
- Fixed issue where completed files would show as stalled after 2 minutes of inactivity

docs/file-completion-tracking.md

Lines changed: 174 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,174 @@
1+
# File Completion Tracking
2+
3+
<div align="center">
4+
5+
**Intelligent file completion tracking and stalled detection management**
6+
7+
Last Updated: January 2025
8+
9+
[🏠 Back to Docs](README.md) | [🔄 Consumer Cancellation](consumer-cancellation.md)
10+
11+
</div>
12+
13+
## Overview
14+
15+
The file completion tracking system ensures accurate monitoring of file processing status across the Discogsography platform. It prevents false warnings about stalled extractors and coordinates with the consumer cancellation feature for optimal resource management.
16+
17+
## How It Works
18+
19+
### 1. File Processing Lifecycle
20+
21+
```mermaid
graph LR
    A[File Processing Starts] --> B[Records Extracted]
    B --> C[Messages Sent to RabbitMQ]
    C --> D[File Complete Message Sent]
    D --> E[File Marked as Complete]
    E --> F[Consumer Cancellation Scheduled]
    F --> G[Stalled Detection Skips File]

    style D fill:#f9f,stroke:#333,stroke-width:4px
    style E fill:#9f9,stroke:#333,stroke-width:4px
```
33+
34+
### 2. Completion Tracking
35+
36+
When a file finishes processing:
37+
38+
1. **Extractor** sends a `file_complete` message with:
39+
40+
- `type`: "file_complete"
41+
- `data_type`: The type of data (artists, labels, masters, releases)
42+
- `timestamp`: Completion time
43+
- `total_processed`: Number of records processed
44+
- `file`: Original filename
45+
46+
1. **Extractor** adds the data type to `completed_files` set
47+
48+
1. **Consumers** (graphinator/tableinator) receive the message and:
49+
50+
- Mark the file as complete (🎉 in logs)
51+
- Schedule consumer cancellation after grace period
52+
53+
### 3. Stalled Detection
54+
55+
The extractor's progress monitoring:
56+
57+
- Checks for files with no activity for >2 minutes
58+
- **Excludes** files in the `completed_files` set
59+
- Only reports actual stalls, not completed files
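
The three rules above can be expressed as one small function. This is a sketch under stated assumptions: `find_stalled` and `STALL_THRESHOLD_SECONDS` are illustrative names, and timestamps are assumed to be seconds since the epoch as returned by `time.time()`.

```python
import time
from typing import Dict, List, Optional, Set

# "No activity for >2 minutes", expressed in seconds.
STALL_THRESHOLD_SECONDS = 120


def find_stalled(
    last_extraction_time: Dict[str, float],
    completed_files: Set[str],
    now: Optional[float] = None,
) -> List[str]:
    """Return data types that are idle past the threshold and not completed."""
    now = time.time() if now is None else now
    stalled = []
    for data_type, last_time in last_extraction_time.items():
        # Completed files are expected to be idle, so they are never stalled.
        if data_type in completed_files:
            continue
        if now - last_time > STALL_THRESHOLD_SECONDS:
            stalled.append(data_type)
    return stalled
```

Passing `now` explicitly keeps the function deterministic in tests; in the running service it would default to the current time.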
60+
61+
## Implementation Details
62+
63+
### Extractor Changes
64+
65+
```python
# Global tracking variables
completed_files = set()  # Track which files have been completed

# When sending file completion message
# (excerpt from inside the extractor, where self.data_type names the file type)
completed_files.add(self.data_type)

# During stalled detection
for data_type, last_time in last_extraction_time.items():
    # Skip if this file type has been completed
    if data_type in completed_files:
        continue
    # ... stalled detection logic
```
79+
80+
### Progress Reporting
81+
82+
Enhanced progress reports show:
83+
84+
```
📊 Extraction Progress: 50000 total records extracted
(Artists: 20000, Labels: 15000, Masters: 10000, Releases: 5000)
✅ Completed file types: ['artists', 'labels']
✅ Active extractors: ['masters', 'releases']
```
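
One way such a report could be assembled is sketched below. `format_progress` is a hypothetical helper, and treating every non-completed type as "active" is a simplifying assumption; the extractor may decide activity differently (for example, from recent timestamps).

```python
from typing import Dict, List, Set


def format_progress(extraction_progress: Dict[str, int], completed_files: Set[str]) -> List[str]:
    """Format a progress report in the style of the sample output above."""
    total = sum(extraction_progress.values())
    breakdown = ", ".join(
        f"{name.capitalize()}: {count}" for name, count in extraction_progress.items()
    )
    completed = sorted(completed_files)
    # Assumption: any type that is not completed counts as active.
    active = [name for name in extraction_progress if name not in completed_files]

    lines = [
        f"📊 Extraction Progress: {total} total records extracted",
        f"({breakdown})",
    ]
    if completed:
        lines.append(f"✅ Completed file types: {completed}")
    if active:
        lines.append(f"✅ Active extractors: {active}")
    return lines
```

Each returned line would be passed to the logger in turn; returning a list rather than logging directly keeps the formatting easy to test.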
90+
91+
## Benefits
92+
93+
1. **Accurate Monitoring**: No false warnings about completed files
94+
1. **Clear Status**: Easy to see which files are done vs. active
95+
1. **Resource Optimization**: Works with consumer cancellation for cleanup
96+
1. **Better Debugging**: Clear indication of actual vs. false stalls
97+
98+
## Configuration
99+
100+
No additional configuration needed - the feature works automatically with existing settings.
101+
102+
### Related Environment Variables
103+
104+
- `CONSUMER_CANCEL_DELAY`: Grace period before canceling consumers (default: 300s)
105+
- `FORCE_REPROCESS`: Set to "true" to reprocess all files
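
For illustration, these two variables could be read as follows. The `load_settings` helper is hypothetical; the 300-second default comes from the list above, while treating an unset `FORCE_REPROCESS` as false and comparing case-insensitively are assumptions.

```python
import os
from typing import Mapping, Optional, Tuple


def load_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[int, bool]:
    """Read the consumer cancel delay (seconds) and the force-reprocess flag."""
    env = os.environ if env is None else env
    # Documented default: 300 seconds.
    cancel_delay = int(env.get("CONSUMER_CANCEL_DELAY", "300"))
    # Assumption: only the string "true" (any case) enables reprocessing.
    force_reprocess = env.get("FORCE_REPROCESS", "false").lower() == "true"
    return cancel_delay, force_reprocess
```

Accepting an optional mapping makes it possible to exercise the parsing without touching the real process environment.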
106+
107+
## Monitoring
108+
109+
### Log Messages to Watch
110+
111+
**Extractor**:
112+
113+
- `✅ Sent file completion message for {type}` - File marked complete
114+
- `✅ Completed file types: [...]` - Shows all completed files
115+
- `⚠️ Stalled extractors detected: [...]` - Only shows actual stalls
116+
117+
**Consumers**:
118+
119+
- `🎉 File processing complete for {type}!` - Completion received
120+
- `🔌 Canceling consumer for {type}` - Cancellation scheduled
121+
122+
## Troubleshooting
123+
124+
### Issue: Still seeing stalled warnings for completed files
125+
126+
**Cause**: Service was restarted and lost completion state
127+
128+
**Solution**: The `completed_files` set is held in memory and is reset on restart, so this is expected behavior. The warnings stop once the files complete in the new session.
129+
130+
### Issue: Consumer not being canceled after completion
131+
132+
**Check**:
133+
134+
1. Verify `CONSUMER_CANCEL_DELAY` is not 0
135+
1. Check logs for cancellation messages
136+
1. Ensure RabbitMQ connection is stable
137+
138+
## Testing
139+
140+
Test the feature:
141+
142+
```bash
# Start services
docker-compose up -d

# Watch logs for completion tracking
docker-compose logs -f extractor | grep -E "(Completed file types|Stalled extractors)"

# Force a quick test with small files
# Files will complete quickly and should not show as stalled
```
152+
153+
## Technical Architecture
154+
155+
### State Management
156+
157+
- `extraction_progress`: Tracks record counts per type
158+
- `last_extraction_time`: Tracks last activity time per type
159+
- `completed_files`: Set of completed data types
160+
- State is reset when processing new files
161+
162+
### Integration Points
163+
164+
1. **Extractor → RabbitMQ**: Sends completion message
165+
1. **Extractor Internal**: Updates completion tracking
166+
1. **Consumers → RabbitMQ**: Cancel queue consumers
167+
1. **Progress Reporter**: Excludes completed files
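
On the consumer side, scheduling a cancellation after the grace period might look roughly like the sketch below. It is library-agnostic on purpose: `cancel_consumer` is a hypothetical callback standing in for the real RabbitMQ client call, and `schedule_cancellation` and `cancellation_tasks` are illustrative names, not the services' actual code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Scheduled cancellation tasks, kept so they can be cleaned up on shutdown.
cancellation_tasks: Dict[str, asyncio.Task] = {}


async def _cancel_after(
    delay: float, data_type: str, cancel_consumer: Callable[[str], Awaitable[None]]
) -> None:
    """Wait out the grace period, then cancel the consumer for this data type."""
    await asyncio.sleep(delay)
    await cancel_consumer(data_type)


def schedule_cancellation(
    data_type: str, delay: float, cancel_consumer: Callable[[str], Awaitable[None]]
) -> asyncio.Task:
    """Schedule a delayed cancellation; must be called from a running event loop."""
    task = asyncio.create_task(_cancel_after(delay, data_type, cancel_consumer))
    cancellation_tasks[data_type] = task
    return task


async def demo() -> List[str]:
    """Run one cancellation with a short delay and a fake cancel callback."""
    canceled: List[str] = []

    async def fake_cancel(data_type: str) -> None:
        canceled.append(data_type)

    await schedule_cancellation("artists", 0.01, fake_cancel)
    return canceled
```

In a real consumer the delay would come from `CONSUMER_CANCEL_DELAY`, and any tasks still pending at shutdown would be canceled before the connection closes.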
168+
169+
## Future Enhancements
170+
171+
- [ ] Persist completion state across restarts
172+
- [ ] Add completion timestamps to progress reports
173+
- [ ] Create completion metrics for monitoring
174+
- [ ] Add file-level (not just type-level) tracking
