Files
chore/.github/specs/active/feat-account-delete-scheduler.md
Ryan Kegel 060b2953fa
All checks were successful
Gitea Actions Demo / build-and-push (push) Successful in 49s
Add account deletion scheduler and comprehensive tests
- Implemented account deletion scheduler in `account_deletion_scheduler.py` to manage user deletions based on a defined threshold.
- Added logging for deletion processes, including success and error messages.
- Created tests for deletion logic, including edge cases, retry logic, and integration tests to ensure complete deletion workflows.
- Ensured that deletion attempts are tracked and that users are marked for manual intervention after exceeding maximum attempts.
- Implemented functionality to check for interrupted deletions on application startup and retry them.
2026-02-08 22:42:36 -05:00

319 lines
11 KiB
Markdown

# Feature: Account Deletion Scheduler
## Overview
**Goal:** Implement a scheduler in the backend that will delete accounts that are marked for deletion after a period of time.
**User Story:**
As an administrator, I want accounts that are marked for deletion to be deleted around X amount of hours after they were marked. I want the time to be adjustable.
---
## Configuration
### Environment Variables
- `ACCOUNT_DELETION_THRESHOLD_HOURS`: Hours to wait before deleting marked accounts (default: 720 hours / 30 days)
- **Minimum:** 24 hours (enforced for safety)
- **Maximum:** 720 hours (30 days)
- Configurable via environment variable with validation on startup
### Scheduler Settings
- **Check Interval:** Every 1 hour
- **Implementation:** APScheduler (BackgroundScheduler)
- **Restart Handling:** On app restart, scheduler checks for users with `deletion_in_progress = True` and retries them
- **Retry Logic:** Maximum 3 attempts per user; tracked via `deletion_attempted_at` timestamp
---
## Data Model Changes
### User Model (`backend/models/user.py`)
Add two new fields to the `User` dataclass:
- `deletion_in_progress: bool` - Default `False`. Set to `True` when deletion is actively running
- `deletion_attempted_at: datetime | None` - Default `None`. Timestamp of last deletion attempt
**Serialization:**
- Both fields must be included in `to_dict()` and `from_dict()` methods
---
## Deletion Process & Order
When a user is due for deletion (current time >= `marked_for_deletion_at` + threshold), the scheduler performs deletion in this order:
1. **Set Flag:** `deletion_in_progress = True` (prevents concurrent deletion)
2. **Pending Rewards:** Remove all pending rewards for user's children
3. **Children:** Remove all children belonging to the user
4. **Tasks:** Remove all user-created tasks (where `user_id` matches)
5. **Rewards:** Remove all user-created rewards (where `user_id` matches)
6. **Images (Database):** Remove user's uploaded images from `image_db`
7. **Images (Filesystem):** Delete `data/images/[user_id]` directory and all contents
8. **User Record:** Remove the user from `users_db`
9. **Clear Flag:** `deletion_in_progress = False` (only if deletion failed; otherwise user is deleted)
10. **Update Timestamp:** Set `deletion_attempted_at` to current time (if deletion failed)
### Error Handling
- If any step fails, log the error and continue to next step
- If deletion fails completely, update `deletion_attempted_at` and set `deletion_in_progress = False`
- If a user has 3 failed attempts, log a critical error but continue processing other users
- Missing directories or empty tables are not considered errors
---
## Admin API Endpoints
### New Blueprint: `backend/api/admin_api.py`
All endpoints require JWT authentication and admin privileges.
**Note:** Endpoint paths below are as defined in Flask (without `/api` prefix). Frontend accesses them via nginx proxy at `/api/admin/*`.
#### `GET /admin/deletion-queue`
Returns list of users pending deletion.
**Response:** JSON with `count` and `users` array containing user objects with fields: `id`, `email`, `marked_for_deletion_at`, `deletion_due_at`, `deletion_in_progress`, `deletion_attempted_at`
#### `GET /admin/deletion-threshold`
Returns current deletion threshold configuration.
**Response:** JSON with `threshold_hours`, `threshold_min`, and `threshold_max` fields
#### `PUT /admin/deletion-threshold`
Updates deletion threshold (requires admin auth).
**Request:** JSON with `threshold_hours` field
**Response:** JSON with `message` and updated `threshold_hours`
**Validation:**
- Must be between 24 and 720 hours
- Returns 400 error if out of range
#### `POST /admin/deletion-queue/trigger`
Manually triggers the deletion scheduler (processes entire queue immediately).
**Response:** JSON with `message`, `processed`, `deleted`, and `failed` counts
---
## SSE Event
### New Event Type: `USER_DELETED`
**File:** `backend/events/types/user_deleted.py`
**Payload fields:**
- `user_id: str` - ID of deleted user
- `email: str` - Email of deleted user
- `deleted_at: str` - ISO format timestamp of deletion
**Broadcasting:**
- Event is sent only to **admin users** (not broadcast to all users)
- Triggered immediately after successful user deletion
- Frontend admin clients can listen to this event to update UI
---
## Implementation Details
### File Structure
- `backend/config/deletion_config.py` - Configuration with env variable
- `backend/utils/account_deletion_scheduler.py` - Scheduler logic
- `backend/api/admin_api.py` - New admin endpoints
- `backend/events/types/user_deleted.py` - New SSE event
### Scheduler Startup
In `backend/main.py`, import and call `start_deletion_scheduler()` after Flask app setup
### Logging Strategy
**Configuration:**
- Use dedicated logger: `account_deletion_scheduler`
- Log to both stdout (for Docker/dev) and rotating file (for persistence)
- File: `logs/account_deletion.log`
- Rotation: 10MB max file size, keep 5 backups
- Format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
**Log Levels:**
- **INFO:** Each deletion step (e.g., "Deleted 5 children for user {user_id}")
- **INFO:** Summary after each run (e.g., "Deletion scheduler run: 3 users processed, 2 deleted, 1 failed")
- **ERROR:** Individual step failures (e.g., "Failed to delete images for user {user_id}: {error}")
- **CRITICAL:** User with 3+ failed attempts (e.g., "User {user_id} has failed deletion 3 times")
- **WARNING:** Threshold set below 168 hours (7 days)
---
## Acceptance Criteria (Definition of Done)
### Data Model
- [x] Add `deletion_in_progress` field to User model
- [x] Add `deletion_attempted_at` field to User model
- [x] Update `to_dict()` and `from_dict()` methods for serialization
- [x] Update TypeScript User interface in frontend
### Configuration
- [x] Create `backend/config/deletion_config.py` with `ACCOUNT_DELETION_THRESHOLD_HOURS`
- [x] Add environment variable support with default (720 hours)
- [x] Enforce minimum threshold of 24 hours
- [x] Enforce maximum threshold of 720 hours
- [x] Log warning if threshold is less than 168 hours
### Backend Implementation
- [x] Create `backend/utils/account_deletion_scheduler.py`
- [x] Implement APScheduler with 1-hour check interval
- [x] Implement deletion logic in correct order (pending_rewards → children → tasks → rewards → images → directory → user)
- [x] Add comprehensive error handling (log and continue)
- [x] Add restart handling (check `deletion_in_progress` flag on startup)
- [x] Add retry logic (max 3 attempts per user)
- [x] Integrate scheduler into `backend/main.py` startup
### Admin API
- [x] Create `backend/api/admin_api.py` blueprint
- [x] Implement `GET /admin/deletion-queue` endpoint
- [x] Implement `GET /admin/deletion-threshold` endpoint
- [x] Implement `PUT /admin/deletion-threshold` endpoint
- [x] Implement `POST /admin/deletion-queue/trigger` endpoint
- [x] Add JWT authentication checks for all admin endpoints
- [ ] Add admin role validation
### SSE Event
- [x] Create `backend/events/types/user_deleted.py`
- [x] Add `USER_DELETED` to `event_types.py`
- [x] Implement admin-only event broadcasting
- [x] Trigger event after successful deletion
### Backend Unit Tests
#### Configuration Tests
- [x] Test default threshold value (720 hours)
- [x] Test environment variable override
- [x] Test minimum threshold enforcement (24 hours)
- [x] Test maximum threshold enforcement (720 hours)
- [x] Test invalid threshold values (negative, non-numeric)
#### Scheduler Tests
- [x] Test scheduler identifies users ready for deletion (past threshold)
- [x] Test scheduler ignores users not yet due for deletion
- [x] Test scheduler handles empty database
- [x] Test scheduler runs at correct interval (1 hour)
- [x] Test scheduler handles restart with `deletion_in_progress = True`
- [x] Test scheduler respects retry limit (max 3 attempts)
#### Deletion Process Tests
- [x] Test deletion removes pending_rewards for user's children
- [x] Test deletion removes children for user
- [x] Test deletion removes user's tasks (not system tasks)
- [x] Test deletion removes user's rewards (not system rewards)
- [x] Test deletion removes user's images from database
- [x] Test deletion removes user directory from filesystem
- [x] Test deletion removes user record from database
- [x] Test deletion handles missing directory gracefully
- [x] Test deletion order is correct (children before user, etc.)
- [x] Test `deletion_in_progress` flag is set during deletion
- [x] Test `deletion_attempted_at` is updated on failure
#### Edge Cases
- [x] Test deletion with user who has no children
- [x] Test deletion with user who has no custom tasks/rewards
- [x] Test deletion with user who has no uploaded images
- [x] Test partial deletion failure (continue with other users)
- [x] Test concurrent deletion attempts (flag prevents double-deletion)
- [x] Test user with exactly 3 failed attempts (logs critical, no retry)
#### Admin API Tests
- [x] Test `GET /admin/deletion-queue` returns correct users
- [x] Test `GET /admin/deletion-queue` requires authentication
- [x] Test `GET /admin/deletion-threshold` returns current threshold
- [x] Test `PUT /admin/deletion-threshold` updates threshold
- [x] Test `PUT /admin/deletion-threshold` validates min/max
- [ ] Test `PUT /admin/deletion-threshold` requires admin role
- [x] Test `POST /admin/deletion-queue/trigger` triggers scheduler
- [x] Test `POST /admin/deletion-queue/trigger` returns summary
#### Integration Tests
- [x] Test full deletion flow from marking to deletion
- [x] Test multiple users deleted in same scheduler run
- [x] Test deletion with restart midway (recovery)
### Logging & Monitoring
- [x] Configure dedicated scheduler logger with rotating file handler
- [x] Create `logs/` directory for log files
- [x] Log each deletion step with INFO level
- [x] Log summary after each scheduler run (users processed, deleted, failed)
- [x] Log errors with user ID for debugging
- [x] Log critical error for users with 3+ failed attempts
- [x] Log warning if threshold is set below 168 hours
### Documentation
- [x] Create `README.md` at project root
- [x] Document scheduler feature and behavior
- [x] Document environment variable `ACCOUNT_DELETION_THRESHOLD_HOURS`
- [x] Document deletion process and order
- [x] Document admin API endpoints
- [x] Document restart/retry behavior
---
## Testing Strategy
All tests should use `DB_ENV=test` and operate on test databases in `backend/test_data/`.
### Unit Test Files
- `backend/tests/test_deletion_config.py` - Configuration validation
- `backend/tests/test_deletion_scheduler.py` - Scheduler logic
- `backend/tests/test_admin_api.py` - Admin endpoints
### Test Fixtures
- Create users with various `marked_for_deletion_at` timestamps
- Create users with children, tasks, rewards, images
- Create users with `deletion_in_progress = True` (for restart tests)
### Assertions
- Database records are removed in correct order
- Filesystem directories are deleted
- Flags and timestamps are updated correctly
- Error handling works (log and continue)
- Admin API responses match expected format
---
## Future Considerations
- Archive deleted accounts instead of hard deletion
- Email notification to admin when deletion completes
- Configurable retry count (currently hardcoded to 3)
- Soft delete with recovery option (within grace period)