| 1 | +# PostgreSQL Pretty Printer Integration Plan |
| 2 | + |
| 3 | +## Current Status |
| 4 | + |
| 5 | +The pretty printer foundation is **complete and working**. Basic SQL formatting currently covers:
| 6 | +- ✅ SELECT statements with aliases, schema qualification |
| 7 | +- ✅ Line length-based breaking (configurable via filename suffix) |
| 8 | +- ✅ Proper comma placement and indentation |
| 9 | +- ✅ Comprehensive test suite with snapshot testing |
| 10 | +- ✅ AST integrity verification (location-aware comparison) |
| 11 | + |
| 12 | +## Architecture Overview |
| 13 | + |
| 14 | +``` |
| 15 | +SQL Input → pgt_query::parse() → AST → ToTokens → Layout Events → Renderer → Formatted SQL |
| 16 | +``` |
| 17 | + |
| 18 | +**Key Components:** |
| 19 | +- **ToTokens trait**: Converts AST nodes to layout events |
| 20 | +- **Layout Events**: `Token`, `Space`, `Line(Hard/Soft/SoftOrSpace)`, `GroupStart/End`, `IndentStart/End` |
| 21 | +- **Renderer**: Two-phase Prettier-style algorithm (try to fit each group on a single line; if it does not fit, break it)
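
The event model and the two-phase decision can be illustrated with a minimal sketch. It is not the crate's actual renderer: the enum shapes are inferred from the event list above, and the whole event stream is treated as a single group (no nesting, no `break_parent`), which is a deliberate simplification.

```rust
/// How a line break behaves; variant names mirror the event list above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LineType {
    Hard,
    Soft,
    SoftOrSpace,
}

/// Assumed shape of the layout events (the real type may carry more data).
#[derive(Debug, Clone, PartialEq)]
pub enum LayoutEvent {
    Token(String),
    Space,
    Line(LineType),
    GroupStart,
    GroupEnd,
    IndentStart,
    IndentEnd,
}

/// Phase 1: width of the events if rendered on one line, or `None`
/// when a hard line makes single-line rendering impossible.
pub fn flat_width(events: &[LayoutEvent]) -> Option<usize> {
    let mut width = 0;
    for ev in events {
        match ev {
            LayoutEvent::Token(t) => width += t.chars().count(),
            LayoutEvent::Space | LayoutEvent::Line(LineType::SoftOrSpace) => width += 1,
            LayoutEvent::Line(LineType::Soft) => {}
            LayoutEvent::Line(LineType::Hard) => return None,
            _ => {} // group and indent markers take no space
        }
    }
    Some(width)
}

/// Phase 2: render flat if the stream fits, otherwise break every line event.
pub fn render(events: &[LayoutEvent], max_line_length: usize, indent_size: usize) -> String {
    let flat = matches!(flat_width(events), Some(w) if w <= max_line_length);
    let mut out = String::new();
    let mut indent = 0usize;
    for ev in events {
        match ev {
            LayoutEvent::Token(t) => out.push_str(t),
            LayoutEvent::Space => out.push(' '),
            LayoutEvent::Line(kind) => {
                if flat {
                    // Soft disappears inline; SoftOrSpace becomes a space.
                    if *kind == LineType::SoftOrSpace {
                        out.push(' ');
                    }
                } else {
                    out.push('\n');
                    out.push_str(&" ".repeat(indent * indent_size));
                }
            }
            LayoutEvent::IndentStart => indent += 1,
            LayoutEvent::IndentEnd => indent = indent.saturating_sub(1),
            LayoutEvent::GroupStart | LayoutEvent::GroupEnd => {}
        }
    }
    out
}
```

Calling `render` on the same event stream with a wide and a narrow `max_line_length` shows the single-line and broken layouts side by side.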
| 22 | + |
| 23 | +## Renderer Implementation Status |
| 24 | + |
| 25 | +### ✅ Completed |
| 26 | +- **Core rendering pipeline**: Event processing, text/space/line output |
| 27 | +- **Basic grouping**: Single-line vs multi-line decisions |
| 28 | +- **Indentation**: Configurable spaces/tabs with proper nesting |
| 29 | +- **Line length enforcement**: Respects `max_line_length` config |
| 30 | +- **Token rendering**: Keywords, identifiers, punctuation |
| 31 | +- **Break propagation**: Child groups with `break_parent: true` force parent groups to break |
| 32 | +- **Nested group independence**: Inner groups make independent fit decisions when outer groups break |
| 33 | +- **Stack overflow elimination**: Fixed infinite recursion in renderer |
| 34 | + |
| 35 | +### ❌ Missing Features (Priority Order) |
| 36 | + |
| 37 | +#### 1. **Group ID References** (Medium Priority) |
| 38 | +**Issue**: Groups can't reference each other's break decisions. |
| 39 | + |
| 40 | +```rust |
| 41 | +// Missing: Conditional formatting based on other groups |
| 42 | +GroupStart { id: Some("params") } |
| 43 | +// ... later reference "params" group's break decision |
| 44 | +``` |
| 45 | + |
| 46 | +**Implementation**: |
| 47 | +- Track group break decisions by ID |
| 48 | +- Add conditional breaking logic |
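
One possible way to track break decisions by ID is sketched below. `GroupBreaks`, `record`, and `if_break` are hypothetical names introduced for illustration, not part of the existing renderer, and treating an unknown ID as "did not break" is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical registry recording whether each named group broke.
#[derive(Default)]
pub struct GroupBreaks {
    decisions: HashMap<String, bool>,
}

impl GroupBreaks {
    /// Store the break decision once the renderer has made it for `id`.
    pub fn record(&mut self, id: &str, broke: bool) {
        self.decisions.insert(id.to_string(), broke);
    }

    /// Choose between two renderings based on an earlier group's decision.
    /// Unknown IDs are treated as "did not break" (an assumption).
    pub fn if_break<'a>(&self, id: &str, broken: &'a str, flat: &'a str) -> &'a str {
        if self.decisions.get(id).copied().unwrap_or(false) {
            broken
        } else {
            flat
        }
    }
}
```

A later event could then consult the registry, for example emitting a newline after a closing parenthesis only if the `"params"` group broke.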
| 49 | + |
| 50 | +#### 2. **Advanced Line Types** (Medium Priority) |
| 51 | +**Issue**: `LineType::Soft` vs `LineType::SoftOrSpace` handling could be more sophisticated. |
| 52 | + |
| 53 | +**Current behavior**: |
| 54 | +- `Hard`: Always breaks |
| 55 | +- `Soft`: Breaks if group breaks, disappears if inline |
| 56 | +- `SoftOrSpace`: Breaks if group breaks, becomes space if inline |
| 57 | + |
| 58 | +**Enhancement**: Better handling of soft line semantics in complex nesting. |
| 59 | + |
| 60 | +#### 3. **Performance Optimizations** (Low Priority) |
| 61 | +- **Early bailout**: Stop single-line calculation when length exceeds limit |
| 62 | +- **Caching**: Memoize group fit calculations for repeated structures |
| 63 | +- **String building**: More efficient string concatenation |
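
The early-bailout idea can be sketched as a fit check that stops at the first token that overflows the limit. The function name `fits` and the iterator-of-widths input are assumptions made for this example, not the renderer's current interface.

```rust
/// Returns true if the summed widths stay within `max_line_length`.
/// Stops consuming the iterator as soon as the limit is exceeded.
pub fn fits(widths: impl IntoIterator<Item = usize>, max_line_length: usize) -> bool {
    let mut used = 0usize;
    for w in widths {
        used = used.saturating_add(w);
        if used > max_line_length {
            return false; // early bailout: no need to measure the rest
        }
    }
    true
}
```

Because the loop returns on the first overflow, the cost of a failed check is bounded by the line length rather than by the size of the group.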
| 64 | + |
| 65 | +## AST Node Coverage Status |
| 66 | + |
| 67 | +### ✅ Implemented ToTokens |
| 68 | +- `SelectStmt`: Basic SELECT with FROM clause |
| 69 | +- `ResTarget`: Column targets with aliases |
| 70 | +- `ColumnRef`: Column references (schema.table.column) |
| 71 | +- `String`: String nodes (the name parts inside column references)
| 72 | +- `RangeVar`: Table references with schema |
| 73 | +- `FuncCall`: Function calls with break propagation support |
| 74 | + |
| 75 | +### ❌ Missing ToTokens (Add as needed) |
| 76 | +- `InsertStmt`, `UpdateStmt`, `DeleteStmt`: DML statements |
| 77 | +- `WhereClause`, `JoinExpr`: WHERE conditions and JOINs |
| 78 | +- `AExpr`: Binary/unary expressions (`a = b`, `a + b`) |
| 79 | +- `AConst`: Literals (numbers, strings, booleans) |
| 80 | +- `SubLink`: Subqueries |
| 81 | +- `CaseExpr`: CASE expressions |
| 82 | +- `WindowFunc`: Window functions |
| 83 | +- `AggRef`: Aggregate functions |
| 84 | +- `TypeCast`: Type casting (`::int`) |
| 85 | + |
| 86 | +## Testing Infrastructure |
| 87 | + |
| 88 | +### ✅ Current |
| 89 | +- **dir-test integration**: Drop SQL files → automatic snapshot testing |
| 90 | +- **Line length extraction**: `filename_80.sql` → `max_line_length: 80` |
| 91 | +- **AST integrity verification**: Ensures no data loss during formatting |
| 92 | +- **Location field handling**: Clears location differences for comparison |
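
The filename convention (`filename_80.sql` → `max_line_length: 80`) could be parsed roughly as follows. This is a sketch, not the test harness's actual code: the function name is hypothetical, and returning `None` for files without a numeric suffix (leaving the default to the caller) is an assumption.

```rust
use std::path::Path;

/// Extracts a trailing `_<number>` from the file stem, if present.
pub fn max_line_length_from_path(path: &str) -> Option<usize> {
    let stem = Path::new(path).file_stem()?.to_str()?;
    // Split on the last underscore so names like `long_name_120` work.
    let (_, suffix) = stem.rsplit_once('_')?;
    suffix.parse().ok()
}
```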
| 93 | + |
| 94 | +### 🔄 Enhancements Needed |
| 95 | +- **Add more test cases**: Complex queries, edge cases |
| 96 | +- **Performance benchmarks**: Large SQL file formatting speed |
| 97 | +- **Configuration testing**: Different indent styles, line lengths |
| 98 | +- **Break propagation testing**: Currently verified only with the `FuncCall` implementation
| 99 | + |
| 100 | +## Integration Steps |
| 101 | + |
| 102 | +### ✅ Phase 1: Core Renderer Fixes (COMPLETED) |
| 103 | +1. ✅ **Fix break propagation**: Implemented proper `break_parent` handling |
| 104 | +2. ✅ **Fix nested groups**: Allow independent fit decisions |
| 105 | +3. ✅ **Fix stack overflow**: Eliminated infinite recursion in renderer |
| 106 | +4. ✅ **Test with complex cases**: Added `FuncCall` with break propagation test |
| 107 | + |
| 108 | +### Phase 2: AST Coverage Expansion (2-4 days) |
| 109 | +1. **Add WHERE clause support**: `WhereClause`, `AExpr` ToTokens |
| 110 | +2. **Add basic expressions**: `AConst`, binary operators |
| 111 | +3. **Add INSERT/UPDATE/DELETE**: Basic DML statements |
| 112 | + |
| 113 | +### Phase 3: Advanced Features (1-2 days) |
| 114 | +1. **Implement group ID system**: Cross-group references |
| 115 | +2. **Add performance optimizations**: Early bailout, caching |
| 116 | +3. **Enhanced line breaking**: Better soft line semantics |
| 117 | + |
| 118 | +### Phase 4: Production Ready (1-2 days) |
| 119 | +1. **Comprehensive testing**: Large SQL files, edge cases |
| 120 | +2. **Performance validation**: Benchmark against alternatives |
| 121 | +3. **Documentation**: API docs, integration examples |
| 122 | + |
| 123 | +## API Integration Points |
| 124 | + |
| 125 | +```rust |
| 126 | +// Main formatting function |
| 127 | +pub fn format_sql(sql: &str, config: RenderConfig) -> Result<String, Error> { |
| 128 | + let parsed = pgt_query::parse(sql)?; |
| 129 | + let ast = parsed.root()?; |
| 130 | + |
| 131 | + let mut emitter = EventEmitter::new(); |
| 132 | + ast.to_tokens(&mut emitter); |
| 133 | + |
| 134 | + let mut output = String::new(); |
| 135 | + let mut renderer = Renderer::new(&mut output, config); |
| 136 | + renderer.render(emitter.events)?; |
| 137 | + |
| 138 | + Ok(output) |
| 139 | +} |
| 140 | + |
| 141 | +// Configuration |
| 142 | +pub struct RenderConfig { |
| 143 | + pub max_line_length: usize, // 80, 100, 120, etc. |
| 144 | + pub indent_size: usize, // 2, 4, etc. |
| 145 | + pub indent_style: IndentStyle, // Spaces, Tabs |
| 146 | +} |
| 147 | +``` |
| 148 | + |
| 149 | +## Estimated Completion Timeline |
| 150 | + |
| 151 | +- ✅ **Phase 1** (Core fixes): COMPLETED → **Fully functional renderer** |
| 152 | +- **Phase 2** (AST coverage): 2-4 days → **Supports most common SQL**
| 153 | +- **Phase 3** (Advanced): 1-2 days → **Production-grade formatting**
| 154 | +- **Phase 4** (Polish): 1-2 days → **Integration ready**
| 155 | +
| 156 | +**Total: 4-8 days remaining** for a complete, production-ready PostgreSQL pretty printer.
| 157 | + |
| 158 | +## Current Limitations |
| 159 | + |
| 160 | +1. **Limited SQL coverage**: Only basic SELECT statements and function calls |
| 161 | +2. **No error recovery**: Unimplemented AST nodes cause panics |
| 162 | +3. **No configuration validation**: Invalid configs not checked |
| 163 | +4. **Missing group ID system**: Cross-group conditional formatting not yet implemented |
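
For the configuration-validation gap, a check along these lines might be enough to start with. The struct mirrors the `RenderConfig` shown earlier; `ConfigError`, `validate`, and the two rules themselves are assumptions for illustration, since the plan does not say which configurations should count as invalid.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IndentStyle {
    Spaces,
    Tabs,
}

#[derive(Debug, Clone)]
pub struct RenderConfig {
    pub max_line_length: usize,
    pub indent_size: usize,
    pub indent_style: IndentStyle,
}

/// Hypothetical error type; the variants are illustrative only.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ZeroLineLength,
    IndentTooWide { indent_size: usize, max_line_length: usize },
}

impl RenderConfig {
    /// Rejects configs that could never produce a usable layout.
    pub fn validate(&self) -> Result<(), ConfigError> {
        if self.max_line_length == 0 {
            return Err(ConfigError::ZeroLineLength);
        }
        // With space indentation, one indent level must leave room for content.
        if self.indent_style == IndentStyle::Spaces && self.indent_size >= self.max_line_length {
            return Err(ConfigError::IndentTooWide {
                indent_size: self.indent_size,
                max_line_length: self.max_line_length,
            });
        }
        Ok(())
    }
}
```

Calling `validate` at the top of the formatting entry point would turn a bad configuration into an ordinary `Err` before any rendering starts.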
| 164 | + |
| 165 | +The core renderer foundation is now solid, with proper break propagation and nested group handling. The remaining work is primarily expanding AST node coverage.