pull/3220/head
dgtlmoon 2025-06-02 19:25:39 +02:00
parent 03e751b57f
commit b90d03a78e
1 changed file with 91 additions and 101 deletions

@ -6,129 +6,119 @@ This directory contains the Socket.IO implementation for changedetection.io's re
The real-time system provides live updates to the web interface for:
- Watch status changes (checking, completed, errors)
- Queue length updates
- General statistics updates
## Historical Issues and Solutions
### Eventlet vs Playwright Conflicts
**Problem**: The application originally used `eventlet.monkey_patch()` to enable green threading for Socket.IO, but this caused severe conflicts with Playwright's synchronous browser automation.
#### Symptoms:
1. **Playwright hanging**: The `with sync_playwright() as p:` context manager would hang when exiting, preventing proper cleanup
2. **Greenlet thread switching errors**:
```
greenlet.error: Cannot switch to a different thread
Current: <greenlet.greenlet object at 0x...>
Expected: <greenlet.greenlet object at 0x...>
```
#### Root Cause:
- `eventlet.monkey_patch()` globally patches Python's threading, socket, and I/O modules
- Playwright's sync API relies on real OS threads for browser communication and cleanup
- When eventlet patches threading, it replaces real threads with green threads (greenlets)
- Playwright's internal operations try to switch between real threads, but eventlet expects greenlet switching
- This creates an incompatible execution model
### Solution Evolution
#### Attempt 1: Selective Monkey Patching
```python
# Tried to patch only specific modules
eventlet.monkey_patch(socket=True, select=True, time=True, thread=False, os=False)
```
**Result**: Still had conflicts because Socket.IO operations interacted with Playwright's threaded operations.
#### Attempt 2: Complete Eventlet Removal
**Final Solution**: Removed eventlet monkey patching entirely and switched to threading-based Socket.IO:
```python
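import threading

# Before (eventlet, now removed):
#   async_mode = 'eventlet'
#   eventlet.monkey_patch()
#   polling_thread = eventlet.spawn(polling_function)

# After: no monkey patching; background polling runs in a real OS thread
def polling_function():
    ...  # poll the queue and emit updates

async_mode = 'threading'
polling_thread = threading.Thread(target=polling_function, daemon=True)
polling_thread.start()  # eventlet.spawn() started immediately; a Thread must be started explicitly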
```
## Current Implementation
### Socket.IO Configuration
- **Async Mode**: `threading` (default) or `gevent` (optional, selected via the `SOCKETIO_MODE` environment variable)
- **Server**: Flask-SocketIO with threading support
- **Background Tasks**: Python threading with daemon threads
### Playwright Integration
- **API**: `async_playwright()` instead of `sync_playwright()`
- **Execution**: Runs in a separate asyncio event loop, driven with `loop.run_until_complete()`, when called from synchronous Flask routes
- **Browser Steps**: Fully converted to async operations
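The isolated-loop pattern can be sketched with the standard library alone. `run_in_isolated_loop()`, `fake_browser_step()` and `browser_step_view()` are hypothetical names. The sleep stands in for a real Playwright call. This illustrates the idea under those assumptions and is not the project's code.

```python
import asyncio


def run_in_isolated_loop(coro):
    """Run a coroutine to completion on a brand-new event loop.

    Hypothetical helper for synchronous callers such as Flask views: each
    call creates its own loop, drives it with run_until_complete(), and
    always closes it, so no loop state leaks between requests.
    """
    loop = asyncio.new_event_loop()
    try:
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)
    finally:
        asyncio.set_event_loop(None)
        loop.close()


async def fake_browser_step(selector: str) -> str:
    # Stand-in for an awaited Playwright call such as `await page.click(selector)`
    await asyncio.sleep(0.01)
    return f"clicked {selector}"


def browser_step_view() -> str:
    # What a synchronous Flask route handler might do
    return run_in_isolated_loop(fake_browser_step("#submit"))


print(browser_step_view())  # clicked #submit
```

`asyncio.run()` is a close shortcut for the same create/run/close sequence. The explicit form makes the `loop.run_until_complete()` step visible. Both fail if the calling thread already has a running loop, so this only suits synchronous callers.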
### Async Worker Integration
- **Workers**: Async workers using asyncio for watch processing
- **Queue**: AsyncSignalPriorityQueue for job distribution
- **Signals**: Blinker signals for real-time updates between workers and Socket.IO
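The sketch below shows how a job queue can double as a source of real-time updates, built on `asyncio.PriorityQueue`. `SignalingPriorityQueue` is a hypothetical stand-in, not the project's `AsyncSignalPriorityQueue`. A plain callback list replaces the Blinker signals so the example needs only the standard library.

```python
import asyncio
import itertools
from typing import Any, Callable, List


class SignalingPriorityQueue(asyncio.PriorityQueue):
    """Hypothetical stand-in for AsyncSignalPriorityQueue.

    Each put/get notifies subscribers with the new queue length: the hook a
    Socket.IO layer could use to push "queue length" updates to browsers.
    """

    def __init__(self) -> None:
        super().__init__()
        self._subscribers: List[Callable[[int], Any]] = []
        # Tie-breaker so equal priorities stay FIFO and jobs are never compared
        self._sequence = itertools.count()

    def connect(self, callback: Callable[[int], Any]) -> None:
        self._subscribers.append(callback)

    def _emit_length(self) -> None:
        for callback in self._subscribers:
            callback(self.qsize())

    async def put_job(self, priority: int, job: Any) -> None:
        await self.put((priority, next(self._sequence), job))
        self._emit_length()

    async def get_job(self) -> Any:
        _priority, _seq, job = await self.get()
        self._emit_length()
        return job


async def demo() -> list:
    queue = SignalingPriorityQueue()
    lengths: List[int] = []
    queue.connect(lengths.append)  # a real app might call socketio.emit() here

    await queue.put_job(10, "watch-b")
    await queue.put_job(1, "watch-a")  # lower number = served first
    first = await queue.get_job()
    return [first, lengths]


print(asyncio.run(demo()))  # ['watch-a', [1, 2, 1]]
```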
### Environment Variables
- `SOCKETIO_MODE=threading` (default, recommended)
- `SOCKETIO_MODE=gevent` (optional, has cross-platform limitations)
## Architecture Decision: Why Threading Mode?
### Previous Issues with Eventlet
**Eventlet was completely removed** due to fundamental compatibility issues:
1. **Monkey Patching Conflicts**: `eventlet.monkey_patch()` globally replaced Python's threading/socket modules, causing conflicts with:
   - Playwright's synchronous browser automation
   - Async worker event loops
   - Various Python libraries expecting real threading
2. **Python 3.12+ Compatibility**: Eventlet had issues with newer Python versions and with asyncio integration
3. **CVE-2023-29483**: Security vulnerability in eventlet's dnspython dependency
### Current Solution Benefits
**Threading Mode Advantages**:
- Full compatibility with async workers and Playwright
- No monkey patching: uses standard Python threading
- No greenlet thread-switching errors
- Better Python 3.12+ support
- Cross-platform compatibility (Windows, macOS, Linux)
- No external async library dependencies
- Fast shutdown (background threads are stopped via `threading.Event`)

**Optional Gevent Support**:
- Available via `SOCKETIO_MODE=gevent` for high-concurrency scenarios
- Cross-platform limitations documented in requirements.txt
- Not recommended as default due to Windows socket limits and macOS ARM build issues
## Alternative Approaches Considered
### 1. Async Playwright
Converting to `async_playwright()` eliminates the sync context-manager conflicts, at the cost of:
- Major refactoring across the entire content fetcher system
- Propagating async/await through the codebase
- Potential compatibility issues with other sync operations
### 2. Process Isolation
Running Playwright in separate processes via multiprocessing:
- Added complexity for IPC
- Overhead of process creation and communication
- Difficult error handling and resource management
### 3. Eventlet Import Patching
Using `eventlet.import_patched()` for specific modules:
- Still had underlying thread model conflicts
- Selective patching complexity
- Maintenance burden
## Best Practices
### When Adding New Features:
1. **Avoid** `eventlet.monkey_patch()` calls
2. **Use** standard Python threading for background tasks
3. **Test** Socket.IO functionality with concurrent Playwright operations
4. **Monitor** for thread safety issues in shared resources
### For Production Deployment:
Consider replacing Werkzeug with a production WSGI server that supports Socket.IO threading mode, such as:
- Gunicorn with threading workers
- uWSGI with threading support
- Custom WSGI setup with proper Socket.IO integration
## Socket.IO Mode Configuration
### Threading Mode (Default)
```python
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
# Threading is the default: no monkey patching, real OS threads
socketio = SocketIO(app, async_mode='threading')
```
### Gevent Mode (Optional)
```bash
# Set environment variable
export SOCKETIO_MODE=gevent
```
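The sketch below shows one way the `SOCKETIO_MODE` value could be turned into an `async_mode` argument. `choose_async_mode()` is hypothetical. Its fallback rules are assumptions for illustration: unknown values, or `gevent` without the package installed, revert to `threading`. The document itself only states that `threading` is the default.

```python
import importlib.util
import os
from typing import Mapping, Optional

SUPPORTED_MODES = ("threading", "gevent")


def choose_async_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Map SOCKETIO_MODE to a Flask-SocketIO async_mode value.

    Hypothetical helper. Assumed fallback rules: unknown values, or
    'gevent' when the gevent package is not installed, revert to the
    'threading' default.
    """
    env = os.environ if env is None else env
    requested = env.get("SOCKETIO_MODE", "threading").strip().lower()
    if requested not in SUPPORTED_MODES:
        return "threading"
    if requested == "gevent" and importlib.util.find_spec("gevent") is None:
        return "threading"
    return requested


# The result would then be passed on: SocketIO(app, async_mode=choose_async_mode())
print(choose_async_mode({}))                          # threading
print(choose_async_mode({"SOCKETIO_MODE": "bogus"}))  # threading
```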
## Background Tasks
### Queue Polling
- **Threading Mode**: `threading.Thread` with `threading.Event` for shutdown
- **Signal Handling**: Blinker signals for watch state changes
- **Real-time Updates**: Direct Socket.IO `emit()` calls to connected clients
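A polling thread with clean shutdown can be sketched as follows. `start_queue_poller()`, its `get_queue_size`/`emit` hooks and the `q_length` payload key are hypothetical. In the app, `emit` would wrap Socket.IO's `emit()`. The point is the shutdown pattern: `Event.wait(interval)` acts as an interruptible sleep.

```python
import threading
import time
from typing import Callable, Tuple


def start_queue_poller(
    get_queue_size: Callable[[], int],
    emit: Callable[[str, dict], None],
    interval: float = 1.0,
) -> Tuple[threading.Thread, threading.Event]:
    """Start a daemon thread that pushes queue-length changes to clients.

    Hypothetical helper: get_queue_size and emit are injected hooks.
    Returns the thread and the Event that stops it.
    """
    stop_event = threading.Event()

    def poll() -> None:
        last_size = None
        while not stop_event.is_set():
            size = get_queue_size()
            if size != last_size:  # only emit when the value changed
                emit("queue_size", {"q_length": size})
                last_size = size
            # Interruptible sleep: returns immediately once stop_event is set
            stop_event.wait(interval)

    thread = threading.Thread(target=poll, name="QueuePoller", daemon=True)
    thread.start()
    return thread, stop_event


# Usage: a fake queue whose length goes 3, 3, 2 and then stays at 2
sizes = iter([3, 3, 2])
received = []
poller, stop = start_queue_poller(
    get_queue_size=lambda: next(sizes, 2),
    emit=lambda event, data: received.append(data["q_length"]),
    interval=0.01,
)
time.sleep(0.2)
stop.set()
poller.join(timeout=1)
print(received)
```

The thread waits on the event rather than calling `time.sleep()`, so `stop.set()` ends it promptly instead of after the remaining polling interval.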
### Worker Integration
- **Async Workers**: Run in separate asyncio event loop thread
- **Communication**: AsyncSignalPriorityQueue bridges async workers and Socket.IO
- **Updates**: Real-time updates sent when workers complete tasks
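The sketch below runs the async workers' event loop in its own thread, with sync code handing work to it. `BackgroundLoop` and `check_watch()` are hypothetical names. The bridging call, `asyncio.run_coroutine_threadsafe()`, is the standard-library way to submit a coroutine to a loop owned by another thread.

```python
import asyncio
import threading


class BackgroundLoop:
    """Run an asyncio event loop in its own daemon thread.

    Sync code (Flask routes, the Socket.IO layer) submits coroutines with
    run_coroutine_threadsafe() and can block on the returned future.
    """

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, name="AsyncWorkers", daemon=True)

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self) -> "BackgroundLoop":
        self._thread.start()
        return self

    def submit(self, coro):
        # Thread-safe; returns a concurrent.futures.Future
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=5)
        self.loop.close()


async def check_watch(uuid: str) -> str:
    # Hypothetical stand-in for fetching and diffing one watch
    await asyncio.sleep(0.01)
    return f"{uuid}: checked"


workers = BackgroundLoop().start()
print(workers.submit(check_watch("abc123")).result(timeout=5))  # abc123: checked
workers.stop()
```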
## Files in This Directory
- `socket_server.py`: Main Socket.IO initialization and event handling
- `events.py`: Watch operation event handlers
- `__init__.py`: Module initialization
## Production Deployment
### Recommended WSGI Servers
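For production with Socket.IO (Flask-SocketIO supports only a single Gunicorn worker process, hence `--workers 1`):
- **Gunicorn (threading mode)**: `gunicorn --workers 1 --threads 100 changedetection:app`
- **Gunicorn (gevent mode)**: `gunicorn --worker-class gevent --workers 1 changedetection:app`
- **uWSGI**: With threading support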
- **Docker**: Built-in Flask server works well for containerized deployments
### Performance Considerations
- Threading mode: Better memory usage, standard Python threading
- Gevent mode: Higher concurrency but platform limitations
- Async workers: Separate from Socket.IO, provides scalability
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `SOCKETIO_MODE` | `threading` | Socket.IO async mode (`threading` or `gevent`) |
| `FETCH_WORKERS` | `10` | Number of async workers for watch processing |
| `CHANGEDETECTION_HOST` | `0.0.0.0` | Server bind address |
| `CHANGEDETECTION_PORT` | `5000` | Server port |
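A loader applying the documented defaults might look like this. `RealtimeSettings` and `load_settings()` are hypothetical. The validation choices are assumptions, not documented behavior: unknown modes fall back to `threading`, and `FETCH_WORKERS` must be at least 1.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RealtimeSettings:
    # Defaults mirror the Environment Variables table
    socketio_mode: str = "threading"
    fetch_workers: int = 10
    host: str = "0.0.0.0"
    port: int = 5000


def load_settings(env: Optional[Mapping[str, str]] = None) -> RealtimeSettings:
    """Build settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    mode = env.get("SOCKETIO_MODE", "threading").strip().lower()
    if mode not in ("threading", "gevent"):
        mode = "threading"  # assumption: unknown modes fall back to the default
    workers = int(env.get("FETCH_WORKERS", "10"))
    if workers < 1:
        raise ValueError("FETCH_WORKERS must be at least 1")
    return RealtimeSettings(
        socketio_mode=mode,
        fetch_workers=workers,
        host=env.get("CHANGEDETECTION_HOST", "0.0.0.0"),
        port=int(env.get("CHANGEDETECTION_PORT", "5000")),
    )


print(load_settings({"FETCH_WORKERS": "4"}))
# RealtimeSettings(socketio_mode='threading', fetch_workers=4, host='0.0.0.0', port=5000)
```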
## Debugging Tips
1. **Socket.IO Issues**: Check browser dev tools for WebSocket connection errors
2. **Threading Issues**: Monitor thread count with `ps -T -p <pid>` and check for deadlocks
3. **Worker Issues**: Use the `/worker-health` endpoint to check async worker status
4. **Queue Issues**: Use the `/queue-status` endpoint to monitor the job queue
5. **Playwright Issues**: Look for hanging browser processes and check that browsers are cleaned up
6. **Performance**: Use the `/gc-cleanup` endpoint to trigger memory cleanup, and monitor memory usage, since OS threads behave differently from green threads
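When the thread count keeps climbing or a request seems stuck, a dump of every thread's stack usually shows where things are blocked. The helper below is a hypothetical addition built on CPython's `sys._current_frames()`. It is not an existing endpoint in the project.

```python
import sys
import threading
import traceback


def dump_thread_stacks() -> str:
    """Return the current stack of every live thread as text.

    Hypothetical debugging helper. Threads stuck on the same lock typically
    show up waiting in acquire(), which points at a deadlock.
    Relies on CPython's sys._current_frames().
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f"--- Thread {names.get(ident, '?')} (ident={ident}) ---")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


print(f"{threading.active_count()} live threads")
print(dump_thread_stacks())
```

The standard library's `faulthandler.dump_traceback(all_threads=True)` writes a similar dump to stderr without custom code.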
## Migration Notes
If upgrading from eventlet-based versions:
- Remove any `EVENTLET_*` environment variables
- No code changes needed - Socket.IO mode is automatically configured
- Optional: Set `SOCKETIO_MODE=gevent` if high concurrency is required and platform supports it
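Leftover variables can be listed with a few lines of Python. `leftover_eventlet_vars()` is a hypothetical helper that only reports names with the `EVENTLET_` prefix. It does not change the environment.

```python
import os
from typing import List, Mapping, Optional


def leftover_eventlet_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of any EVENTLET_* variables still set (read-only check)."""
    env = os.environ if env is None else env
    return sorted(name for name in env if name.startswith("EVENTLET_"))


stale = leftover_eventlet_vars()
if stale:
    print("No longer used, safe to remove:", ", ".join(stale))
else:
    print("No EVENTLET_* variables found")
```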