mirror of https://github.com/dgtlmoon/changedetection.io
update readme
parent
03e751b57f
commit
b90d03a78e
|
@@ -9,116 +9,76 @@ The real-time system provides live updates to the web interface for:
|
||||||
- Queue length updates
|
- Queue length updates
|
||||||
- General statistics updates
|
- General statistics updates
|
||||||
|
|
||||||
## Historical Issues and Solutions
|
|
||||||
|
|
||||||
### Eventlet vs Playwright Conflicts
|
|
||||||
|
|
||||||
**Problem**: The application originally used `eventlet.monkey_patch()` to enable green threading for Socket.IO, but this caused severe conflicts with Playwright's synchronous browser automation.
|
|
||||||
|
|
||||||
#### Symptoms:
|
|
||||||
1. **Playwright hanging**: The `with sync_playwright() as p:` context manager would hang when exiting, preventing proper cleanup
|
|
||||||
2. **Greenlet thread switching errors**:
|
|
||||||
```
|
|
||||||
greenlet.error: Cannot switch to a different thread
|
|
||||||
Current: <greenlet.greenlet object at 0x...>
|
|
||||||
Expected: <greenlet.greenlet object at 0x...>
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Root Cause:
|
|
||||||
- `eventlet.monkey_patch()` globally patches Python's threading, socket, and I/O modules
|
|
||||||
- Playwright's sync API relies on real OS threads for browser communication and cleanup
|
|
||||||
- When eventlet patches threading, it replaces real threads with green threads (greenlets)
|
|
||||||
- Playwright's internal operations try to switch between real threads, but eventlet expects greenlet switching
|
|
||||||
- This creates an incompatible execution model
|
|
||||||
|
|
||||||
### Solution Evolution
|
|
||||||
|
|
||||||
#### Attempt 1: Selective Monkey Patching
|
|
||||||
```python
|
|
||||||
# Tried to patch only specific modules
|
|
||||||
eventlet.monkey_patch(socket=True, select=True, time=True, thread=False, os=False)
|
|
||||||
```
|
|
||||||
**Result**: Still had conflicts because Socket.IO operations interacted with Playwright's threaded operations.
|
|
||||||
|
|
||||||
#### Attempt 2: Complete Eventlet Removal
|
|
||||||
**Final Solution**: Removed eventlet monkey patching entirely and switched to threading-based Socket.IO:
|
|
||||||
|
|
||||||
```python
|
|
||||||
# Before
|
|
||||||
async_mode = 'eventlet'
|
|
||||||
eventlet.monkey_patch()
|
|
||||||
polling_thread = eventlet.spawn(polling_function)
|
|
||||||
|
|
||||||
# After
|
|
||||||
async_mode = 'threading'
|
|
||||||
# No monkey patching
|
|
||||||
polling_thread = threading.Thread(target=polling_function, daemon=True)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Current Implementation
|
## Current Implementation
|
||||||
|
|
||||||
### Socket.IO Configuration
|
### Socket.IO Configuration
|
||||||
- **Async Mode**: `eventlet` (restored)
|
- **Async Mode**: `threading` (default) or `gevent` (optional via SOCKETIO_MODE env var)
|
||||||
- **Server**: Eventlet WSGI server
|
- **Server**: Flask-SocketIO with threading support
|
||||||
- **Threading**: Eventlet greenlets for background tasks
|
- **Background Tasks**: Python threading with daemon threads
|
||||||
|
|
||||||
### Playwright Integration
|
### Async Worker Integration
|
||||||
- **API**: `async_playwright()` instead of `sync_playwright()`
|
- **Workers**: Async workers using asyncio for watch processing
|
||||||
- **Execution**: Runs in separate asyncio event loops when called from Flask routes
|
- **Queue**: AsyncSignalPriorityQueue for job distribution
|
||||||
- **Browser Steps**: Fully converted to async operations
|
- **Signals**: Blinker signals for real-time updates between workers and Socket.IO
|
||||||
|
|
||||||
### Background Tasks
|
### Environment Variables
|
||||||
- **Queue polling**: Uses eventlet greenlets with `eventlet.Event` for clean shutdown
|
- `SOCKETIO_MODE=threading` (default, recommended)
|
||||||
- **Signal handling**: Blinker signals for watch updates
|
- `SOCKETIO_MODE=gevent` (optional, has cross-platform limitations)
|
||||||
- **Real-time updates**: Direct Socket.IO `emit()` calls to connected clients
|
|
||||||
|
|
||||||
### Trade-offs
|
## Architecture Decision: Why Threading Mode?
|
||||||
|
|
||||||
#### Benefits:
|
### Previous Issues with Eventlet
|
||||||
- ✅ No conflicts between eventlet and Playwright (async mode)
|
**Eventlet was completely removed** due to fundamental compatibility issues:
|
||||||
- ✅ No greenlet thread switching errors
|
|
||||||
- ✅ Full SocketIO functionality restored
|
|
||||||
- ✅ Better performance with eventlet green threads
|
|
||||||
- ✅ Production-ready eventlet server
|
|
||||||
|
|
||||||
#### Implementation Details:
|
1. **Monkey Patching Conflicts**: `eventlet.monkey_patch()` globally replaced Python's threading/socket modules, causing conflicts with:
|
||||||
- ✅ Async Playwright runs in isolated asyncio event loops
|
- Playwright's synchronous browser automation
|
||||||
- ✅ Flask routes use `asyncio.run_until_complete()` for async calls
|
- Async worker event loops
|
||||||
- ✅ Browser steps session management fully async
|
- Various Python libraries expecting real threading
|
||||||
|
|
||||||
## Alternative Approaches Considered
|
2. **Python 3.12+ Compatibility**: Eventlet had issues with newer Python versions and asyncio integration
|
||||||
|
|
||||||
### 1. Async Playwright
|
3. **CVE-2023-29483**: Security vulnerability in eventlet's dnspython dependency
|
||||||
Converting to `async_playwright()` would eliminate sync context conflicts, but:
|
|
||||||
- Major refactoring required across the entire content fetcher system
|
|
||||||
- Async/await propagation through the codebase
|
|
||||||
- Potential compatibility issues with other sync operations
|
|
||||||
|
|
||||||
### 2. Process Isolation
|
### Current Solution Benefits
|
||||||
Running Playwright in separate processes via multiprocessing:
|
✅ **Threading Mode Advantages**:
|
||||||
- Added complexity for IPC
|
- Full compatibility with async workers and Playwright
|
||||||
- Overhead of process creation/communication
|
- No monkey patching - uses standard Python threading
|
||||||
- Difficult error handling and resource management
|
- Better Python 3.12+ support
|
||||||
|
- Cross-platform compatibility (Windows, macOS, Linux)
|
||||||
|
- No external async library dependencies
|
||||||
|
- Fast shutdown capabilities
|
||||||
|
|
||||||
### 3. Eventlet Import Patching
|
✅ **Optional Gevent Support**:
|
||||||
Using `eventlet.import_patched()` for specific modules:
|
- Available via `SOCKETIO_MODE=gevent` for high-concurrency scenarios
|
||||||
- Still had underlying thread model conflicts
|
- Cross-platform limitations documented in requirements.txt
|
||||||
- Selective patching complexity
|
- Not recommended as default due to Windows socket limits and macOS ARM build issues
|
||||||
- Maintenance burden
|
|
||||||
|
|
||||||
## Best Practices
|
## Socket.IO Mode Configuration
|
||||||
|
|
||||||
### When Adding New Features:
|
### Threading Mode (Default)
|
||||||
1. **Avoid** `eventlet.monkey_patch()` calls
|
```python
|
||||||
2. **Use** standard Python threading for background tasks
|
# Enabled automatically
|
||||||
3. **Test** Socket.IO functionality with concurrent Playwright operations
|
async_mode = 'threading'
|
||||||
4. **Monitor** for thread safety issues in shared resources
|
socketio = SocketIO(app, async_mode='threading')
|
||||||
|
```
|
||||||
|
|
||||||
### For Production Deployment:
|
### Gevent Mode (Optional)
|
||||||
Consider replacing Werkzeug with a production WSGI server that supports Socket.IO threading mode, such as:
|
```bash
|
||||||
- Gunicorn with threading workers
|
# Set environment variable
|
||||||
- uWSGI with threading support
|
export SOCKETIO_MODE=gevent
|
||||||
- Custom WSGI setup with proper Socket.IO integration
|
```
|
||||||
|
|
||||||
|
## Background Tasks
|
||||||
|
|
||||||
|
### Queue Polling
|
||||||
|
- **Threading Mode**: `threading.Thread` with `threading.Event` for shutdown
|
||||||
|
- **Signal Handling**: Blinker signals for watch state changes
|
||||||
|
- **Real-time Updates**: Direct Socket.IO `emit()` calls to connected clients
|
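Review note: the queue-polling pattern named in the added lines (a `threading.Thread` stopped through a `threading.Event`) can be sketched as follows. This is a minimal illustration using only the standard library; `start_queue_poller`, `on_update`, and the `interval` default are hypothetical names chosen for the example, not identifiers taken from the project.

```python
import queue
import threading


def start_queue_poller(job_queue, on_update, stop_event, interval=0.05):
    """Report the queue length periodically until stop_event is set.

    Hypothetical helper: in the real application the callback would
    presumably emit a Socket.IO message instead of a plain function call.
    """

    def poll():
        # Event.wait() doubles as an interruptible sleep: it returns True as
        # soon as the event is set, so shutdown does not wait a full interval.
        while not stop_event.wait(interval):
            on_update(job_queue.qsize())

    # daemon=True so a forgotten poller never blocks interpreter exit.
    thread = threading.Thread(target=poll, daemon=True)
    thread.start()
    return thread
```

Waiting on the event instead of calling `time.sleep()` is what keeps shutdown fast: setting the event wakes the loop immediately.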
||||||
|
|
||||||
|
### Worker Integration
|
||||||
|
- **Async Workers**: Run in separate asyncio event loop thread
|
||||||
|
- **Communication**: AsyncSignalPriorityQueue bridges async workers and Socket.IO
|
||||||
|
- **Updates**: Real-time updates sent when workers complete tasks
|
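Review note: "async workers run in a separate asyncio event loop thread" can be illustrated with the standard library alone. The sketch below is an assumption about the general shape, not the project's code: `start_worker_loop`, `process_watch`, `submit`, and `stop_worker_loop` are hypothetical names, and the project's `AsyncSignalPriorityQueue` is deliberately left out.

```python
import asyncio
import threading


def start_worker_loop():
    """Run a fresh asyncio event loop in a daemon thread; return both."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def process_watch(uuid):
    # Placeholder for the real fetch-and-compare work of a worker.
    await asyncio.sleep(0)
    return f"checked {uuid}"


def submit(loop, uuid):
    # Thread-safe hand-off from an ordinary thread (for example a Flask
    # request handler) into the worker loop; returns a concurrent Future.
    return asyncio.run_coroutine_threadsafe(process_watch(uuid), loop)


def stop_worker_loop(loop, thread):
    # loop.stop() must be scheduled from inside the loop's own thread.
    loop.call_soon_threadsafe(loop.stop)
    thread.join(timeout=5)
    loop.close()
```

The caller gets a `concurrent.futures.Future`, so threaded code can block on `future.result(timeout=...)` or attach a callback that forwards the result to connected clients.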
||||||
|
|
||||||
## Files in This Directory
|
## Files in This Directory
|
||||||
|
|
||||||
|
@@ -126,9 +86,39 @@ Consider replacing Werkzeug with a production WSGI server that supports Socket.I
|
||||||
- `events.py`: Watch operation event handlers
|
- `events.py`: Watch operation event handlers
|
||||||
- `__init__.py`: Module initialization
|
- `__init__.py`: Module initialization
|
||||||
|
|
||||||
|
## Production Deployment
|
||||||
|
|
||||||
|
### Recommended WSGI Servers
|
||||||
|
For production with Socket.IO threading mode:
|
||||||
|
- **Gunicorn**: `gunicorn --worker-class gevent -w 1 changedetection:app` (if using gevent mode)
|
||||||
|
- **uWSGI**: With threading support
|
||||||
|
- **Docker**: Built-in Flask server works well for containerized deployments
|
||||||
|
|
||||||
|
### Performance Considerations
|
||||||
|
- Threading mode: Better memory usage, standard Python threading
|
||||||
|
- Gevent mode: Higher concurrency but platform limitations
|
||||||
|
- Async workers: Separate from Socket.IO, provides scalability
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `SOCKETIO_MODE` | `threading` | Socket.IO async mode (`threading` or `gevent`) |
|
||||||
|
| `FETCH_WORKERS` | `10` | Number of async workers for watch processing |
|
||||||
|
| `CHANGEDETECTION_HOST` | `0.0.0.0` | Server bind address |
|
||||||
|
| `CHANGEDETECTION_PORT` | `5000` | Server port |
|
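Review note: the table above could be read into a typed settings object along these lines. The variable names and defaults come from the table; everything else (`RealtimeSettings`, `load_settings`, and the choice to fall back to `threading` on an unknown mode) is an assumption made for illustration and may differ from how the application actually parses its environment.

```python
import os
from dataclasses import dataclass

ALLOWED_MODES = ("threading", "gevent")


@dataclass(frozen=True)
class RealtimeSettings:
    socketio_mode: str
    fetch_workers: int
    host: str
    port: int


def load_settings(env=None):
    """Build settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    mode = env.get("SOCKETIO_MODE", "threading").strip().lower()
    if mode not in ALLOWED_MODES:
        # Assumption: unknown values fall back to the documented default
        # instead of raising an error.
        mode = "threading"
    return RealtimeSettings(
        socketio_mode=mode,
        fetch_workers=int(env.get("FETCH_WORKERS", "10")),
        host=env.get("CHANGEDETECTION_HOST", "0.0.0.0"),
        port=int(env.get("CHANGEDETECTION_PORT", "5000")),
    )
```

Passing the mapping explicitly keeps the function easy to test without touching the real process environment.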
||||||
|
|
||||||
## Debugging Tips
|
## Debugging Tips
|
||||||
|
|
||||||
1. **Socket.IO Issues**: Enable logging with `SOCKETIO_LOGGING=True`
|
1. **Socket.IO Issues**: Check browser dev tools for WebSocket connection errors
|
||||||
2. **Threading Issues**: Monitor thread count and check for deadlocks
|
2. **Threading Issues**: Monitor with `ps -T` to check thread count
|
||||||
3. **Playwright Issues**: Look for hanging processes and check browser cleanup
|
3. **Worker Issues**: Use `/worker-health` endpoint to check async worker status
|
||||||
4. **Performance**: Monitor memory usage as threading can have different characteristics than green threads
|
4. **Queue Issues**: Use `/queue-status` endpoint to monitor job queue
|
||||||
|
5. **Performance**: Use `/gc-cleanup` endpoint to trigger memory cleanup
|
||||||
|
|
||||||
|
## Migration Notes
|
||||||
|
|
||||||
|
If upgrading from eventlet-based versions:
|
||||||
|
- Remove any `EVENTLET_*` environment variables
|
||||||
|
- No code changes needed - Socket.IO mode is automatically configured
|
||||||
|
- Optional: Set `SOCKETIO_MODE=gevent` if high concurrency is required and platform supports it
|