# Share-Safe Toolkit
**Privacy-First Redaction PWA**
A cross-platform, browser-only Progressive Web App for safely redacting sensitive information from documents, images, and text files before sharing. All processing happens locally in your browser, and nothing is uploaded.


---
## 🆕 What's New in v1.1
**Format Expansion Release** - We've extended support beyond PDFs and images!
- ✨ **Plain Text & Markdown** (.txt, .md) - Redact configuration files, logs, and notes
- ✨ **CSV/TSV Support** (.csv, .tsv) - Redact spreadsheet data with column-level operations
- 🏗️ **Format Abstraction Layer** - Extensible architecture for adding new formats
- 📚 **Comprehensive Documentation** - See [docs/FORMATS.md](docs/FORMATS.md) for details
**Coming Soon:** Office documents (.docx, .xlsx, .pptx), Rich Text (.rtf, .html), E-books (.epub)
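
The column-level CSV/TSV redaction introduced in this release could work roughly like this. This is a minimal sketch, not the toolkit's actual format handler: `splitRecord`, `quoteField`, and `redactColumns` are hypothetical names, columns are selected by header name, double-quoted fields (including `""` escapes) are honored, and it assumes no line breaks inside quoted fields.

```typescript
// Hypothetical helpers illustrating column-level CSV/TSV redaction.
// Assumption: no line breaks inside quoted fields.

// Split one record into fields, honoring double quotes and "" escapes.
function splitRecord(line: string, delimiter: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Re-quote a field only when it contains the delimiter, a quote, or a newline.
function quoteField(value: string, delimiter: string): string {
  const needsQuotes = value.includes(delimiter) || value.includes('"') || value.includes("\n");
  return needsQuotes ? `"${value.replace(/"/g, '""')}"` : value;
}

// Replace every value in the named header columns with `mask`.
// Line endings are normalized to "\n" in the output.
function redactColumns(text: string, columns: string[], delimiter = ",", mask = "[REDACTED]"): string {
  const lines = text.split(/\r?\n/);
  const header = splitRecord(lines[0], delimiter);
  const targets = new Set(columns.map((c) => header.indexOf(c)).filter((i) => i >= 0));
  return lines
    .map((line, row) => {
      if (row === 0 || line === "") return line; // keep header and blank lines as-is
      const fields = splitRecord(line, delimiter).map((f, col) => (targets.has(col) ? mask : f));
      return fields.map((f) => quoteField(f, delimiter)).join(delimiter);
    })
    .join("\n");
}

const sample = 'name,email,city\nAda,ada@example.com,London\n"Smith, J",js@example.com,"New York"';
console.log(redactColumns(sample, ["email"]));
// name,email,city
// Ada,[REDACTED],London
// "Smith, J",[REDACTED],New York
```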
---
## Features
### Core Capabilities
- **Multi-Format Support** 🆕
  - **PDFs**: Text-based and scanned documents with OCR
  - **Images**: JPEG, PNG, WebP, GIF, BMP with automatic EXIF removal
  - **Plain Text**: .txt and .md files with line-based redaction
  - **CSV/TSV**: Spreadsheet data with cell- and column-level redaction
  - See [docs/FORMATS.md](docs/FORMATS.md) for full format documentation
- **Intelligent PII Detection**
  - Automatic detection: emails, phone numbers, SSNs, payment card numbers
  - Regex matching, with Luhn checksum validation for card numbers
  - Optional ML-based Named Entity Recognition (NER)
  - Hybrid detection merging for best results
- **Flexible Redaction**
  - Automatic detection suggestions
  - Manual redaction boxes (draw with mouse/touch)
  - Column-based redaction for CSV files
  - Non-reversible black-box rendering
- **Security & Privacy**
  - 100% client-side processing
  - No server uploads
  - No tracking or analytics
  - Automatic metadata removal (EXIF, GPS)
  - Flattened exports (no hidden layers or selectable text)
- **Modern PWA**
  - Installable on desktop & mobile
  - Offline support via Service Worker
  - Fast, responsive interface
  - WCAG 2.2 accessible
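
The hybrid detection merging listed above could work along these lines. A minimal sketch, assuming each detector (regex or NER) reports half-open character ranges; the `Detection` shape and `mergeDetections` are hypothetical, not the project's API. Overlapping or touching spans collapse into one, so a single redaction covers each sensitive region, and every detector's label is kept for review.

```typescript
// Hypothetical detection shape: a half-open character range [start, end)
// plus the labels of every detector that flagged it.
interface Detection {
  start: number;
  end: number;
  labels: string[];
}

// Merge detections from several sources (e.g. regex and NER). Spans that
// overlap or touch are collapsed into one, and their labels are unioned.
function mergeDetections(...sources: Detection[][]): Detection[] {
  const all = ([] as Detection[])
    .concat(...sources)
    .map((d) => ({ start: d.start, end: d.end, labels: [...d.labels] })) // copy, don't mutate inputs
    .sort((a, b) => a.start - b.start || a.end - b.end);

  const merged: Detection[] = [];
  for (const d of all) {
    const last = merged[merged.length - 1];
    if (last !== undefined && d.start <= last.end) {
      last.end = Math.max(last.end, d.end);
      for (const label of d.labels) {
        if (!last.labels.includes(label)) last.labels.push(label);
      }
    } else {
      merged.push(d);
    }
  }
  return merged;
}

const regexHits: Detection[] = [{ start: 10, end: 26, labels: ["email"] }];
const nerHits: Detection[] = [
  { start: 0, end: 9, labels: ["PERSON"] },
  { start: 20, end: 30, labels: ["ORG"] },
];
console.log(mergeDetections(regexHits, nerHits));
// -> two spans: [0, 9) with PERSON, and [10, 30) with email + ORG
```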
---
## Security Rationale
### Why Black Boxes + Flattening?
**Never use blur or pixelation for redaction.** These techniques can be reversed:
- **Research evidence**: Studies including work from the PoPETs conference demonstrate that blurred text can be recovered using deconvolution techniques
- **Real-world failures**: Multiple high-profile incidents where pixelated data was recovered (see Bishop Fox security advisories)
**Our approach**:
1. **Solid black rectangles**: Completely opaque fills (no transparency, no blur)
2. **Flattening**:
   - PDFs are rasterized to images and embedded in a fresh PDF document
   - Images are re-encoded through canvas, stripping all metadata
   - No hidden layers, selectable text, or recoverable data
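
A minimal sketch of step 1, assuming a canvas 2D context is available (for example from `canvas.getContext("2d")`). To stay self-contained and testable it declares its own `FillContext` subset of the context API; `FillContext`, `Box`, and `paintRedactions` are hypothetical names. Normalizing boxes dragged right-to-left and rounding them outward to whole pixels are extra precautions added here, so anti-aliased edges cannot leave half-covered pixels.

```typescript
// Hypothetical minimal subset of CanvasRenderingContext2D, declared here so
// the sketch is self-contained; a real 2D context satisfies it structurally.
interface FillContext {
  globalAlpha: number;
  globalCompositeOperation: string;
  fillStyle: unknown;
  fillRect(x: number, y: number, width: number, height: number): void;
}

// Hypothetical box shape; width/height may be negative if dragged backwards.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Paint each box as a solid, fully opaque black rectangle.
function paintRedactions(ctx: FillContext, boxes: Box[]): void {
  ctx.globalAlpha = 1; // no transparency
  ctx.globalCompositeOperation = "source-over"; // draw on top of existing pixels
  ctx.fillStyle = "#000000";
  for (const b of boxes) {
    // Normalize backwards drags, then round outward to whole pixels so
    // anti-aliasing cannot leave partially covered pixels along the edges.
    const left = Math.floor(Math.min(b.x, b.x + b.width));
    const top = Math.floor(Math.min(b.y, b.y + b.height));
    const right = Math.ceil(Math.max(b.x, b.x + b.width));
    const bottom = Math.ceil(Math.max(b.y, b.y + b.height));
    ctx.fillRect(left, top, right - left, bottom - top);
  }
}

// Browser usage (sketch): paintRedactions(canvas.getContext("2d")!, boxes);
```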
### Metadata Removal
- **Canvas re-encode**: `toBlob()` creates a fresh image without EXIF data
- **Expected behavior**: GPS coordinates, camera info, and orientation tags are intentionally removed
- **PDF metadata**: Generated PDFs only include safe, user-specified metadata
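
The canvas re-encode step can be sketched as a Promise wrapper around `HTMLCanvasElement.toBlob()`. `BlobExportable` and `reencodeCanvas` are hypothetical names; the interface is declared locally so either a real canvas (which satisfies it with `B = Blob`) or a test double can be passed in. In an async handler, `const png = await reencodeCanvas(canvasElement)` would yield a freshly encoded file containing only pixel data.

```typescript
// Hypothetical structural subset of HTMLCanvasElement's toBlob().
interface BlobExportable<B> {
  toBlob(callback: (blob: B | null) => void, type?: string, quality?: number): void;
}

// Re-encode the (already redacted) canvas into a brand-new image file.
// Only the pixel buffer is encoded, so EXIF/GPS tags from the source file
// are not carried over.
function reencodeCanvas<B>(canvas: BlobExportable<B>, type = "image/png"): Promise<B> {
  return new Promise((resolve, reject) => {
    canvas.toBlob((blob) => {
      // toBlob passes null when encoding fails (e.g. a zero-size canvas).
      if (blob) resolve(blob);
      else reject(new Error(`Canvas could not be encoded as ${type}`));
    }, type);
  });
}
```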
---
## Tech Stack
### Core Libraries
- **Vite** (^5.4.11): Fast build tool with ESM support
- **pdfjs-dist** (^4.8.69): PDF rendering and text extraction
- **pdf-lib** (^1.17.1): PDF generation with embedded images
- **tesseract.js** (^5.1.1): Optional OCR for scanned documents
- **browser-fs-access** (^0.35.0): Cross-platform file access (File System Access API + fallback)
- **Workbox** (^7.1.1): Service Worker generation for PWA
### Why No Heavy AI?
- Pattern detection uses **regex + Luhn validation** (fast, accurate, privacy-preserving)
- No paid APIs required
- Fully offline-capable
- No external dependencies for core functionality
---
## Installation & Development
### Prerequisites
- Node.js 18+ and npm
### Setup
```bash
# Install dependencies
npm install
# Run development server
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
```
### Testing
```bash
# Run unit tests
npm test
# Run tests with UI
npm run test:ui
# Generate coverage report
npm run test:coverage
```
---
## Deployment
### Static Hosting
The app is a static site and can be deployed to any HTTPS host:
- **Netlify**: Drag `dist/` folder to Netlify drop zone
- **Vercel**: `vercel --prod`
- **Cloudflare Pages**: Connect repo and deploy
- **GitHub Pages**: Push `dist/` to `gh-pages` branch
### Requirements
- **HTTPS**: Required for Service Worker and File System Access API
- **Modern browser**: Chrome/Edge 86+, Safari 15.4+, Firefox 105+
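
Service Workers and the File System Access API are only exposed in secure contexts (HTTPS, or `localhost` during development), so on plain HTTP they are simply missing. A minimal feature-check sketch with hypothetical names (`BrowserEnv`, `detectCapabilities`); in the browser you would pass `globalThis`. The project's `browser-fs-access` dependency already performs its own fallback detection, so this only illustrates the idea.

```typescript
// Hypothetical environment shape; in the browser, pass `globalThis`.
interface BrowserEnv {
  isSecureContext?: boolean;
  navigator?: object;
  showSaveFilePicker?: unknown;
}

interface Capabilities {
  secureContext: boolean;
  serviceWorker: boolean; // offline support possible
  fileSystemAccess: boolean; // native save dialog; otherwise fall back to a download
}

function detectCapabilities(env: BrowserEnv): Capabilities {
  const secure = env.isSecureContext === true;
  return {
    secureContext: secure,
    // navigator.serviceWorker is only exposed in secure contexts.
    serviceWorker: secure && env.navigator !== undefined && "serviceWorker" in env.navigator,
    // showSaveFilePicker is the File System Access entry point.
    fileSystemAccess: secure && typeof env.showSaveFilePicker === "function",
  };
}

// Example: a plain-HTTP page gets neither feature (all three flags are false).
console.log(detectCapabilities({ isSecureContext: false, navigator: {} }));
```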
---
## Project Structure
```
share-safe/
├── public/
│   ├── icons/                  # PWA icons
│   └── manifest.webmanifest    # PWA manifest
├── src/
│   ├── lib/
│   │   ├── detect/             # PII detection (patterns, Luhn)
│   │   ├── pdf/                # PDF processing
│   │   ├── images/             # Image processing & EXIF removal
│   │   ├── fs/                 # File I/O utilities
│   │   └── pwa/                # Service Worker registration
│   ├── ui/
│   │   ├── components/         # UI components
│   │   └── App.ts              # Main application
│   ├── main.ts                 # Entry point
│   └── styles.css              # Styles
├── tests/
│   ├── unit/                   # Unit tests
│   └── e2e.spec.ts             # E2E test stubs
├── vite.config.ts              # Vite configuration
├── workbox.config.mjs          # Service Worker config
└── package.json
```
---
## Usage
### Basic Workflow
1. **Load Files**: Drag & drop or click to select PDFs, images, text, or CSV/TSV files
2. **Configure Detection**: Toggle email, phone, SSN, card number detection
3. **Review Suggestions**: Approve auto-detected sensitive data
4. **Manual Redaction**: Draw custom boxes with mouse/touch
5. **Export**: Download redacted file with `-redacted` suffix
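
The `-redacted` naming in step 5 might look like this. `redactedFileName` is a hypothetical helper; it assumes the export keeps the original extension and inserts the suffix before the last dot (so `archive.tar.gz` would become `archive.tar-redacted.gz`).

```typescript
// Hypothetical helper: insert a suffix before the last extension.
// Assumes the exported file keeps the original extension.
function redactedFileName(name: string, suffix = "-redacted"): string {
  const dot = name.lastIndexOf(".");
  // No extension (dot === -1) or a dotfile such as ".env" (dot === 0): append.
  if (dot <= 0) return name + suffix;
  return name.slice(0, dot) + suffix + name.slice(dot);
}

console.log(redactedFileName("report.pdf")); // report-redacted.pdf
console.log(redactedFileName("scan.final.png")); // scan.final-redacted.png
console.log(redactedFileName("notes")); // notes-redacted
```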
### Keyboard Shortcuts
- `Tab` / `Shift+Tab`: Navigate UI elements
- `Enter` / `Space`: Activate buttons
- `Delete`: Remove selected redaction box
- `+` / `-`: Zoom in/out on canvas
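
The canvas shortcuts can be modeled as a small pure function, which keeps them unit-testable. A sketch under assumptions: `EditorState`, `applyShortcut`, the 1.25x zoom step, and the 0.25 to 8 zoom range are all hypothetical; `Tab`, `Enter`, and `Space` are left to the browser's native focus and activation behavior.

```typescript
// Hypothetical editor state; not the app's actual types.
interface EditorState {
  boxes: { id: string }[];
  selectedId: string | null;
  zoom: number;
}

// Assumed zoom step and limits, chosen for illustration.
const ZOOM_STEP = 1.25;
const MIN_ZOOM = 0.25;
const MAX_ZOOM = 8;

// Returns the next state, or null when the key is not a canvas shortcut
// (Tab, Enter, Space, ... keep their native browser behavior).
function applyShortcut(key: string, state: EditorState): EditorState | null {
  switch (key) {
    case "Delete":
      if (state.selectedId === null) return null;
      return {
        ...state,
        boxes: state.boxes.filter((b) => b.id !== state.selectedId),
        selectedId: null,
      };
    case "+":
      return { ...state, zoom: Math.min(MAX_ZOOM, state.zoom * ZOOM_STEP) };
    case "-":
      return { ...state, zoom: Math.max(MIN_ZOOM, state.zoom / ZOOM_STEP) };
    default:
      return null;
  }
}

// Browser wiring (sketch), with `state` and `render` defined by the app:
// canvasEl.addEventListener("keydown", (e) => {
//   const next = applyShortcut(e.key, state);
//   if (next) { e.preventDefault(); state = next; render(state); }
// });
```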
### Accessibility
- WCAG 2.2 compliant
- Full keyboard navigation
- ARIA labels and roles
- High contrast mode support
- Reduced motion support
- Touch-friendly targets (44x44px minimum)
---
## Detection Patterns
### Email Addresses
- Pattern: RFC 5322 simplified
- Example: `user@example.com`
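
A simplified email pattern in this spirit, as a sketch only; the project's actual regex may differ, and `LABEL`, `EMAIL_RE`, and `findEmails` are illustrative names. The domain is matched as dot-separated labels followed by a 2 to 63 letter TLD, so trailing sentence punctuation is not swallowed. It does not accept quoted local parts, IP-literal domains, or internationalized addresses.

```typescript
// Illustrative, simplified email pattern (not full RFC 5322): a local part of
// common characters, "@", one or more "label." groups, then a 2-63 letter TLD.
const LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";
const EMAIL_RE = new RegExp(`[A-Za-z0-9._%+-]+@(?:${LABEL}\\.)+[A-Za-z]{2,63}\\b`, "g");

function findEmails(text: string): string[] {
  return text.match(EMAIL_RE) ?? [];
}

console.log(findEmails("Contact: user@example.com, ops@mail.example.co.uk. Not: a@b.c or admin@localhost"));
// -> ['user@example.com', 'ops@mail.example.co.uk']
```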
### Phone Numbers (E.164)
- Pattern: `+` followed by 1-15 digits
- Example: `+14155552671`
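
A sketch of an E.164 matcher (illustrative names `E164_RE`, `findE164`). It reads the rule slightly more strictly than the description above: the first digit after `+` must be 1 to 9, since ITU-T country codes never start with 0, giving 2 to 15 digits in total, and a trailing `(?!\d)` rejects longer digit runs instead of matching their first 15 digits. Spaced forms such as `+1 415 555 2671` are not E.164 and are not matched. A lookbehind is avoided because Safari only added lookbehind support in 16.4, above the Safari 15.4 baseline under Deployment.

```typescript
// Illustrative E.164 matcher: "+", a non-zero leading digit (country codes
// never start with 0), then 1-14 more digits, i.e. 2-15 digits in total.
// (?!\d) rejects longer digit runs instead of matching their first 15 digits.
const E164_RE = /\+[1-9]\d{1,14}(?!\d)/g;

function findE164(text: string): string[] {
  return text.match(E164_RE) ?? [];
}

console.log(findE164("Call +14155552671, not +0123456 or +1234567890123456789"));
// -> ['+14155552671']
```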
### US Social Security Numbers
- Formats: `XXX-XX-XXXX` or `XXXXXXXXX`
- Note: The SSA randomized SSN assignment in 2011, so no geographic information is inferred from the number
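
A sketch of an SSN matcher with illustrative names (`SSN_RE`, `isPlausibleSsn`, `findSsns`). The alternation accepts only the two documented shapes, so mixed forms like `123-456789` are ignored, and `\b` keeps it from matching inside longer numbers. The `isPlausibleSsn` filter is an optional extra, not necessarily part of this project: the SSA has never issued area numbers 000, 666, or 900-999, group 00, or serial 0000, and randomization did not change that.

```typescript
// Dashed (123-45-6789) or bare 9-digit form; the alternation rejects mixed
// forms, and \b stops matches inside longer digit runs.
const SSN_RE = /\b(?:\d{3}-\d{2}-\d{4}|\d{9})\b/g;

// Optional filter (an assumption, not necessarily this project's behavior):
// drop numbers the SSA never issues.
function isPlausibleSsn(match: string): boolean {
  const digits = match.replace(/-/g, "");
  const area = digits.slice(0, 3);
  const group = digits.slice(3, 5);
  const serial = digits.slice(5);
  return area !== "000" && area !== "666" && area[0] !== "9" && group !== "00" && serial !== "0000";
}

function findSsns(text: string): string[] {
  return (text.match(SSN_RE) ?? []).filter(isPlausibleSsn);
}

console.log(findSsns("SSN 123-45-6789 or 123456789; fake 000-12-3456, 912-34-5678, 123-456789"));
// -> ['123-45-6789', '123456789']
```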
### Payment Card Numbers
- Method: Luhn algorithm validation
- Length: 13-19 digits
- Formats: With/without spaces or dashes
- Filters false positives: only about 1 in 10 random digit strings passes the Luhn check
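
A minimal sketch of the candidate-then-checksum approach described above; `luhnValid`, `CARD_CANDIDATE_RE`, and `findCardNumbers` are illustrative names, not the project's API. Starting from the rightmost (check) digit, every second digit is doubled, 9 is subtracted from any doubled value above 9, and the number is valid when the total is a multiple of 10. The sample numbers are widely published test card numbers.

```typescript
// Luhn (mod 10) check. Expects a string of digits only.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let doubleIt = false; // the rightmost digit is the check digit and is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of d
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Candidates: 13-19 digits, each after the first optionally preceded by one space or dash.
const CARD_CANDIDATE_RE = /\b\d(?:[ -]?\d){12,18}\b/g;

function findCardNumbers(text: string): string[] {
  const candidates = text.match(CARD_CANDIDATE_RE) ?? [];
  return candidates.filter((c) => luhnValid(c.replace(/[ -]/g, "")));
}

console.log(findCardNumbers("Visa 4111 1111 1111 1111, typo 4111-1111-1111-1112, Amex 378282246310005, order 1234567890123"));
// -> ['4111 1111 1111 1111', '378282246310005']
```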
---
## Known Limitations
### Current Version (1.0.0)
1. **Single-page redaction**: Manual boxes apply to current page only (multi-page tracking coming in v1.1)
2. **OCR performance**: Tesseract.js can be slow on high-resolution pages
3. **Icon placeholders**: Replace `public/icons/` with properly designed icons for production
4. **Pattern detection**:
   - US-centric patterns (SSN format)
   - E.164 may match non-phone number sequences
   - Regex patterns optimized for speed over perfect recall
### Browser Compatibility
| Feature | Chrome/Edge | Firefox | Safari |
|---------|-------------|---------|--------|
| Basic functionality | ✅ 86+ | ✅ 105+ | ✅ 15.4+ |
| File System Access | ✅ | ❌ (download fallback) | ⚠️ 15.4+ (limited) |
| PWA Install | ✅ | ⚠️ (Android only) | ✅ (iOS 16.4+) |
---
## Security Best Practices
### For Users
1. **Verify redactions**: Always review auto-detected areas before export
2. **Check manually**: Use zoom to inspect redactions at 200%+
3. **Test with dummy data**: Practice workflow before redacting real documents
4. **Keep originals secure**: This tool creates redacted copies; manage originals appropriately
### For Developers
1. **Update dependencies**: Regularly check for security updates
2. **CSP headers**: Deploy with Content Security Policy
3. **Subresource Integrity**: Use SRI for CDN resources (if any)
4. **Audit regex**: Avoid catastrophic backtracking (ReDoS)
---
## Roadmap
### v1.1 (Planned)
- [ ] Multi-page redaction tracking
- [ ] Batch export for multiple files
- [ ] Custom pattern definitions
- [ ] Undo/redo for redactions
- [ ] Export audit log (redaction summary)
### v1.2 (Future)
- [ ] International pattern libraries (IBAN, passport numbers, etc.)
- [ ] Collaborative redaction (encrypted sharing)
- [ ] License verification (optional Gumroad integration)
- [ ] Advanced OCR (on-device ML for better accuracy)
---
## Contributing
Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Code Style
- TypeScript strict mode
- ESLint + Prettier (config coming soon)
- Test coverage for new features
---
## License
MIT License - see [LICENSE](LICENSE) file for details
---
## References & Research
### Redaction Security
- **Blur/pixelation reversibility**: PoPETs conference proceedings on deconvolution attacks
- **Bishop Fox advisories**: Case studies of failed pixelation redaction
- **NIST SP 800-88**: Guidelines for Media Sanitization (secure data disposal)
### Standards & APIs
- **E.164**: ITU-T international phone number format
- **Luhn algorithm**: ISO/IEC 7812-1 payment card validation
- **PDF specification**: ISO 32000-2 (PDF 2.0)
- **File System Access API**: WICG draft specification
### Libraries
- [PDF.js](https://mozilla.github.io/pdf.js/): Mozilla's PDF rendering engine
- [pdf-lib](https://pdf-lib.js.org/): PDF generation and manipulation
- [Tesseract.js](https://tesseract.projectnaptha.com/): Browser-based OCR
- [Workbox](https://developer.chrome.com/docs/workbox/): Google's Service Worker toolkit
---
## Support
- **Documentation**:
  - [Supported Formats Guide](docs/FORMATS.md) - Complete format documentation
  - [Format Handler Development](docs/FORMAT_HANDLER_GUIDE.md) - Guide for developers
  - Additional docs in `docs/` directory
- **Issues**: [GitHub Issues](https://github.com/yourusername/share-safe/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/share-safe/discussions)
---
## Acknowledgments
Built with guidance from:
- OWASP security best practices
- W3C PWA guidelines
- Mozilla PDF.js documentation
- Accessibility guidelines (WCAG 2.2)
---
**Made with security and privacy in mind. Share safely!**