Common issues and solutions for Discogsography
This guide covers common issues you might encounter while using Discogsography and provides step-by-step solutions. For real-time monitoring and debugging tools, see the Monitoring Guide.
Symptoms:
- Extractor fails to download data files
- Connection timeout errors
- Disk space errors
- Permission denied errors
Diagnostic Steps:
# Check connectivity to Discogs S3
curl -I https://discogs-data-dumps.s3.us-west-2.amazonaws.com
# Verify disk space (need 76GB+)
df -h /discogs-data
# Check permissions
ls -la /discogs-data
# View extractor logs
docker-compose logs -f extractor-discogs extractor-musicbrainz
Solutions:
- Ensure internet connectivity
  # Test connection
  ping discogs-data-dumps.s3.us-west-2.amazonaws.com
- Verify 76GB+ free space
  # Check available space
  df -h /discogs-data
  # Clean up if needed
  docker system prune -a --volumes
- Check directory permissions
  # Fix permissions (Docker needs write access)
  sudo chown -R 1000:1000 /discogs-data
  chmod -R 755 /discogs-data
- Verify environment variables
  # Check DISCOGS_ROOT is set correctly
  echo $DISCOGS_ROOT
  # Should point to writable directory
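The disk-space and permission checks above can also be run as one preflight step before starting the extractor. The following is a minimal sketch, not part of Discogsography itself: the function name `check_data_dir` is hypothetical, the 76 GB figure is the one quoted in this guide, and the fallback path `/discogs-data` is assumed from the commands above.

```python
import os
import shutil

REQUIRED_GB = 76  # free-space figure quoted in this guide


def check_data_dir(path: str, required_gb: float = REQUIRED_GB) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # Free space on the filesystem that holds `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < required_gb:
        problems.append(f"only {free_gb:.1f} GB free, need {required_gb} GB")
    # Creating files in a directory requires both write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    # DISCOGS_ROOT is the variable named above; the fallback is an assumption.
    data_dir = os.environ.get("DISCOGS_ROOT", "/discogs-data")
    for line in check_data_dir(data_dir) or ["all checks passed"]:
        print(line)
```

Note that this checks access for the user running the script. The container user may differ (the commands above assume UID 1000), so a passing result on the host does not guarantee that Docker can write to the directory.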
Symptoms:
- Services can't connect to RabbitMQ
- "Connection refused" errors
- "Authentication failed" errors
Diagnostic Steps:
# Check RabbitMQ status
docker-compose ps rabbitmq
docker-compose logs rabbitmq
# Test connection
curl -u discogsography:discogsography http://localhost:15672/api/overview
# Check if port is accessible
netstat -an | grep 5672
Solutions:
- Wait for RabbitMQ startup (30-60s)
  # RabbitMQ takes time to start
  docker-compose logs -f rabbitmq | grep "started"
- Check firewall settings
  # Ensure ports 5672 and 15672 are not blocked
  # Linux (ufw)
  sudo ufw status
- Verify credentials in .env
  # Check RabbitMQ connection variables
  echo $RABBITMQ_HOST
  echo $RABBITMQ_USERNAME
  echo $RABBITMQ_PASSWORD
  # Should match RabbitMQ configuration
- Restart RabbitMQ
  docker-compose restart rabbitmq
  docker-compose logs -f rabbitmq
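Instead of watching the logs during the 30-60 second startup window, a script can poll the AMQP port until it accepts connections. This is a rough sketch under stated assumptions: `wait_for_port` is a hypothetical helper, and an open TCP port only shows that the broker is listening, not that it has finished initializing or that the credentials are valid.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5672)` returns `True` as soon as the port opens and `False` if it is still closed after 60 seconds. The same helper works for the other services in this guide by changing the port.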
Symptoms:
- "Failed to connect to Neo4j" errors
- "Authentication failed" errors
- Timeout errors
Diagnostic Steps:
# Check Neo4j status
docker-compose logs neo4j
# Test HTTP access
curl http://localhost:7474
# Test bolt connection
echo "MATCH (n) RETURN count(n);" | \
cypher-shell -u neo4j -p discogsography
Solutions:
- Wait for Neo4j startup (30-60s)
  docker-compose logs -f neo4j | grep "Started"
- Verify credentials
  # Check environment variables
  echo $NEO4J_HOST
  echo $NEO4J_USERNAME
  echo $NEO4J_PASSWORD
- Check connection string
  # Should be bolt://host:7687
  # For Docker: bolt://neo4j:7687
  # For local: bolt://localhost:7687
- Restart Neo4j
  docker-compose restart neo4j
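A malformed connection string is easy to miss by eye, for example one that points at the HTTP port 7474 instead of the Bolt port. The sketch below checks a URI against the `bolt://host:7687` form given above. `check_bolt_uri` is a hypothetical helper, and it deliberately accepts only the form this guide names, so other schemes the Neo4j driver may support are reported as problems.

```python
from urllib.parse import urlparse


def check_bolt_uri(uri: str, expected_port: int = 7687) -> list[str]:
    """Return a list of problems with `uri`; an empty list means it looks right."""
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme != "bolt":
        problems.append(f"scheme is {parsed.scheme!r}, expected 'bolt'")
    if not parsed.hostname:
        problems.append("host is missing")
    try:
        port = parsed.port  # raises ValueError when the port is not numeric
    except ValueError:
        problems.append("port is not a number")
    else:
        if port != expected_port:
            problems.append(f"port is {port}, expected {expected_port}")
    return problems
```

Calling `check_bolt_uri("http://localhost:7474")` reports both the wrong scheme and the wrong port, while `check_bolt_uri("bolt://neo4j:7687")` returns an empty list.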
Symptoms:
- "Could not connect to PostgreSQL" errors
- "Authentication failed" errors
- Connection timeout errors
Diagnostic Steps:
# Check PostgreSQL status
docker-compose logs postgres
# Test connection
PGPASSWORD=discogsography psql \
-h localhost -p 5433 -U discogsography \
-d discogsography -c "SELECT 1;"
Solutions:
- Wait for PostgreSQL startup
  docker-compose logs -f postgres | grep "ready"
- Verify credentials
  echo $POSTGRES_HOST
  echo $POSTGRES_USERNAME
  echo $POSTGRES_PASSWORD
  echo $POSTGRES_DATABASE
- Check port mapping
  # Default: 5433 (host) maps to 5432 (container)
  docker-compose ps postgres
- Restart PostgreSQL
  docker-compose restart postgres
Symptoms:
- "Port already in use" errors
- Services fail to start
- "Address already in use" errors
Diagnostic Steps:
# Check what's using the ports
netstat -an | grep -E "(5672|7474|7687|5433|6379|8003|8004|8005)"
# Or on macOS
lsof -i :8004
lsof -i :7474
# List all Docker containers
docker ps -a
Solutions:
- Stop conflicting services
  # Find process using port
  lsof -i :8004
  # Kill the process
  kill -9 <PID>
- Change port mapping
  # Edit docker-compose.yml
  ports:
    - "8006:8005"  # Use 8006 on host instead
- Stop all Docker containers
  docker-compose down
  docker-compose up -d
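Where `netstat` or `lsof` output is awkward to read, the same port check can be done from Python before running `docker-compose up`. This is an illustrative sketch: `ports_in_use` is a hypothetical helper, and the port list is copied from the diagnostic command above. It reports only that a port is taken, not which process holds it, so `lsof -i` is still needed to find the owner.

```python
import socket

# Host ports named in the netstat diagnostic command above.
PORTS = [5672, 7474, 7687, 5433, 6379, 8003, 8004, 8005]


def ports_in_use(ports, host="127.0.0.1"):
    """Return the subset of `ports` that already accept TCP connections on `host`."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(0.5)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                busy.append(port)
    return busy


if __name__ == "__main__":
    print("ports already in use:", ports_in_use(PORTS))
```

An empty list before startup means none of the listed ports will conflict. After startup, the same ports are expected to appear because the services themselves are listening.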
Symptoms:
- Containers crash or are killed
- "No space left on device" errors
- Docker build failures
Diagnostic Steps:
# Check available disk space
df -h
# Check Docker disk usage
docker system df
# Check container resource usage
docker stats
Solutions:
- Increase Docker memory limits
  - Open Docker Desktop → Settings → Resources
  - Increase memory allocation (recommend 16GB+ for full dataset)
  - Restart Docker
- Clean up Docker resources
  # Remove unused containers
  docker container prune
  # Remove unused images
  docker image prune -a
  # Remove unused volumes
  docker volume prune
  # Remove everything unused
  docker system prune -a --volumes
- Free up disk space
  # Find large files
  du -sh /path/to/data/* | sort -hr | head -10
  # Remove old logs
  find /var/log -name "*.log" -mtime +7 -delete
Symptoms:
- Cannot write to volumes or log files
- "Permission denied" errors in logs
- Services fail to start with permission errors
Diagnostic Steps:
# Check file permissions
ls -la /discogs-data
ls -la logs/
# Check Docker user
docker run --rm alpine id
Solutions:
# Fix permissions on host directories
sudo chown -R 1000:1000 /discogs-data
sudo chown -R 1000:1000 logs/
chmod -R 755 /discogs-data
chmod -R 755 logs/
All services expose health endpoints:
# Check each service
curl http://localhost:8000/health # Extractor
curl http://localhost:8001/health # Graphinator
curl http://localhost:8002/health # Tableinator
curl http://localhost:8003/health # Dashboard
curl http://localhost:8005/health # API (health check port)
curl http://localhost:8009/health # Insights
curl http://localhost:8010/health # Brainztableinator
curl http://localhost:8011/health # Brainzgraphinator
curl http://localhost:8007/health # Explore (health check port)
Expected response:
{"status": "healthy"}
If unhealthy:
# View service logs
docker-compose logs [service_name]
# Restart service
docker-compose restart [service_name]
Set LOG_LEVEL environment variable for detailed output:
# Set environment variable
export LOG_LEVEL=DEBUG
# Restart services
docker-compose down
docker-compose up -d
# Or for specific service
LOG_LEVEL=DEBUG uv run python -m explore.explore
DEBUG level includes:
- Database query logging with parameters
- Detailed operation traces
- Cache hits/misses
- Internal state changes
See Logging Guide for complete details.
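After restarting services with a new log level, it can help to confirm that they all came back. The sketch below polls the health endpoints listed earlier in this guide and reports any that do not answer with `{"status": "healthy"}`. The names `is_healthy` and `unhealthy_services` are hypothetical, the port map is copied from the curl commands in this guide, and proxies are bypassed on the assumption that the checks run against the local host.

```python
import http.client
import json
import urllib.request

# Health check ports as listed in this guide.
SERVICES = {
    "extractor": 8000, "graphinator": 8001, "tableinator": 8002,
    "dashboard": 8003, "api": 8005, "explore": 8007, "insights": 8009,
    "brainztableinator": 8010, "brainzgraphinator": 8011,
}

# Opener with no proxies, so local requests are never routed through one.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """True when `url` answers successfully with a JSON body whose status is "healthy"."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(body, dict) and body.get("status") == "healthy"


def unhealthy_services(services=SERVICES, host="localhost"):
    """Return the names of services whose /health endpoint is not healthy."""
    return [name for name, port in services.items()
            if not is_healthy(f"http://{host}:{port}/health")]
```

Calling `unhealthy_services()` returns an empty list when every endpoint is healthy. Any names it does return are the ones to pass to `docker-compose logs` and `docker-compose restart`, assuming the compose service names match the labels used here.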
# All services
docker-compose logs -f
# Specific service with timestamp
docker-compose logs -f --timestamps graphinator
# Filter for errors
docker-compose logs -f | grep -E "(ERROR|❌)"
# Filter for Neo4j queries (DEBUG mode; queries are handled by the API service)
docker-compose logs -f api | grep "Executing Neo4j query"
# RabbitMQ management UI
open http://localhost:15672
# Or use CLI monitoring
just monitor
# Or API
curl -u discogsography:discogsography \
http://localhost:15672/api/queues
Look for:
- Messages accumulating (consumers not keeping up)
- Zero consumers (service not connected)
- High unacked count (processing errors)
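The three conditions above can be checked programmatically against the JSON returned by `/api/queues`. The sketch below is one possible approach, not project code: `fetch_queues` and `triage_queue` are hypothetical names, and the two thresholds are arbitrary assumptions that should be tuned to the workload. The fields `consumers`, `messages`, and `messages_unacknowledged` are those reported by the RabbitMQ management API.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url="http://localhost:15672",
                 user="discogsography", password="discogsography"):
    """Fetch queue stats from the management API (same call as the curl above)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/queues", headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)


def triage_queue(queue: dict, backlog_limit: int = 10_000,
                 unacked_limit: int = 1_000) -> list[str]:
    """Map one queue's stats onto the warning signs listed above."""
    findings = []
    if queue.get("consumers", 0) == 0:
        findings.append("zero consumers: service not connected")
    if queue.get("messages", 0) > backlog_limit:
        findings.append("messages accumulating: consumers not keeping up")
    if queue.get("messages_unacknowledged", 0) > unacked_limit:
        findings.append("high unacked count: possible processing errors")
    return findings
```

A typical use is `for q in fetch_queues(): print(q["name"], triage_queue(q))`. A single snapshot cannot show whether a backlog is growing, so comparing two readings a minute apart gives a more reliable signal than the fixed limit used here.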
Neo4j:
# Browser access
curl http://localhost:7474
# Query test
echo "MATCH (n) RETURN count(n) as total;" | \
cypher-shell -u neo4j -p discogsography
PostgreSQL:
# Connection test
PGPASSWORD=discogsography psql \
-h localhost -p 5433 -U discogsography \
-d discogsography -c "SELECT 1;"
# Check record counts
PGPASSWORD=discogsography psql \
-h localhost -p 5433 -U discogsography \
-d discogsography \
-c "SELECT 'artists' AS table_name, COUNT(*) FROM artists \
UNION ALL SELECT 'releases', COUNT(*) FROM releases;"
Neo4j - Check node counts:
MATCH (n)
RETURN labels(n)[0] as type, count(n) as count
ORDER BY count DESC;
PostgreSQL - Check table counts:
SELECT 'artists' as table_name, COUNT(*) FROM artists
UNION ALL
SELECT 'releases', COUNT(*) FROM releases
UNION ALL
SELECT 'labels', COUNT(*) FROM labels
UNION ALL
SELECT 'masters', COUNT(*) FROM masters;
Expected counts (full dataset):
- Artists: ~10 million
- Releases: ~19 million
- Labels: ~2.3 million
- Masters: ~2.5 million
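To judge whether a load is complete, the counts returned by the queries above can be compared against these approximate figures. The following sketch assumes a 25% lower tolerance, which is an arbitrary choice, and `compare_counts` is a hypothetical helper. Only a lower bound is checked because the dataset grows with each new dump.

```python
# Approximate full-dataset counts quoted in this guide.
EXPECTED = {
    "artists": 10_000_000,
    "releases": 19_000_000,
    "labels": 2_300_000,
    "masters": 2_500_000,
}


def compare_counts(actual: dict, expected: dict = EXPECTED, tolerance: float = 0.25) -> dict:
    """Classify each table as 'empty', 'partial', or 'ok' relative to `expected`."""
    report = {}
    for table, target in expected.items():
        count = actual.get(table, 0)
        if count == 0:
            report[table] = "empty"
        elif count < target * (1 - tolerance):
            report[table] = "partial"  # load probably still running or interrupted
        else:
            report[table] = "ok"
    return report
```

For instance, a database holding 5 million releases would be classified as `partial` for that table, which suggests the load is still in progress or was interrupted.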
Symptoms: You see warning messages in the Graphinator service logs like:
{"event":"Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.UnknownRelationshipTypeWarning} ...","level":"warning",...}
Warnings about:
- Unknown relationship types: BY, IS
- Unknown labels: Genre, Style
- Unknown properties: profile
Cause: These warnings appear when:
- The Neo4j database is empty (no data has been loaded yet)
- The database is being populated by the Graphinator service
- A service queries data that does not exist yet
This is normal and not an error! The Cypher queries use OPTIONAL MATCH patterns that gracefully handle missing data.
Solution 1: Suppress the warnings (Recommended)
The warnings are already suppressed in the codebase by configuring the logging level:
# In common/config.py setup_logging()
logging.getLogger("neo4j.notifications").setLevel(logging.ERROR)
logging.getLogger("neo4j").setLevel(logging.ERROR)
Solution 2: Populate the database
Run the extractor and graphinator to load data:
docker-compose up -d extractor-discogs
docker-compose logs -f extractor-discogs
docker-compose up -d graphinator
docker-compose logs -f graphinator
# Verify data in Neo4j
curl http://localhost:7474
Symptom:
- Dashboard shows "Disconnected" status
- Real-time updates not working
- Browser console shows WebSocket errors
Solution:
# Check dashboard is running
curl http://localhost:8003/health
# Restart dashboard
docker-compose restart dashboard
# Check browser console for errors
# F12 → Console tab
Symptom:
- Dashboard shows old data
- Metrics don't update
Solution:
# Clear Redis cache
docker-compose exec redis redis-cli FLUSHDB
# Restart dashboard
docker-compose restart dashboard
# Refresh browser (Cmd+Shift+R / Ctrl+Shift+F5)
Symptom:
- Extractor logs show "Checking for updates..." repeatedly
- No download progress
- Runs indefinitely
Solution:
# Check network connectivity
curl -I https://discogs-data-dumps.s3.us-west-2.amazonaws.com
# Restart extractor
docker-compose restart extractor-discogs # or extractor-musicbrainz
# Check logs
docker-compose logs -f extractor-discogs extractor-musicbrainz
Symptom:
- Download takes very long
- Slow progress messages
- Low MB/s rate
Solutions:
- Check network speed
  # Test download speed
  speedtest-cli
- Resume interrupted download
  - Extractor automatically resumes from last position
  - Check for partial .xml.gz files in /discogs-data
Symptoms:
- Queries take too long
- Dashboard slow to load
- Explore service timeouts
Diagnostic Steps:
Neo4j:
// Profile slow query
PROFILE MATCH (a:Artist {name: "Pink Floyd"})-[:BY]-(r:Release)
RETURN r.title, r.year;
// Check index usage
SHOW INDEXES;
PostgreSQL:
-- Analyze query performance
EXPLAIN ANALYZE
SELECT data FROM artists WHERE data->>'name' = 'Pink Floyd';
-- Check index usage
SELECT * FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Solutions:
- Add missing indexes (see Database Schema)
- Run VACUUM ANALYZE on PostgreSQL
- Increase database memory (see Configuration)
- Enable query caching in Redis
Symptoms:
- Services using excessive RAM
- OOM (Out of Memory) kills
- System slowdown
Solutions:
# Check resource usage
docker stats
# Limit service memory in docker-compose.yml
services: