diff --git a/deployment/CLEANUP_DETECTIONS_GUIDE.md b/deployment/CLEANUP_DETECTIONS_GUIDE.md
new file mode 100644
index 0000000..d5044ed
--- /dev/null
+++ b/deployment/CLEANUP_DETECTIONS_GUIDE.md
@@ -0,0 +1,326 @@
# IDS - Automatic Detections Cleanup Guide

## 📋 Overview

Automated system that cleans up detections and manages blocked IPs according to time-based rules:

1. **Cleanup Detections**: deletes non-blocked detections older than **48 hours**
2. **Auto-Unblock**: unblocks IPs that have been blocked for more than **2 hours** with no new anomalies

## ⚙️ Components

### 1. Python script: `python_ml/cleanup_detections.py`
Main script that performs the cleanup operations:
- Deletes old detections from the database
- Marks IPs as "unblocked" in the DB (it does NOT remove them from the MikroTik firewall!)
- Complete log written to `/var/log/ids/cleanup.log` (via the wrapper/systemd unit)

### 2. Bash wrapper: `deployment/run_cleanup.sh`
Wrapper that loads the environment variables and runs the Python script.

### 3. Systemd service: `ids-cleanup.service`
Oneshot service that runs the cleanup once.

### 4. Systemd timer: `ids-cleanup.timer`
Timer that runs the cleanup **every hour at XX:10** (e.g. 10:10, 11:10, 12:10...).

## 🚀 Installation

```bash
cd /opt/ids

# Run the automated setup
sudo ./deployment/setup_cleanup_timer.sh

# Output:
# ✅ Cleanup timer installed and started successfully!
```

## 📊 Monitoring

### Timer status
```bash
# Check that the timer is active
sudo systemctl status ids-cleanup.timer

# Next scheduled run
systemctl list-timers ids-cleanup.timer
```

### Logs
```bash
# Real-time log
tail -f /var/log/ids/cleanup.log

# Last 50 lines
tail -50 /var/log/ids/cleanup.log

# Full log
cat /var/log/ids/cleanup.log
```

## 🔧 Manual Usage

### Immediate execution
```bash
# Via systemd (recommended)
sudo systemctl start ids-cleanup.service

# Or directly
sudo ./deployment/run_cleanup.sh
```

### Test with verbose output
```bash
cd /opt/ids
source .env
python3 python_ml/cleanup_detections.py
```

## 📝 Cleanup Rules

### Rule 1: Cleanup detections (48 hours)
**SQL query**:
```sql
DELETE FROM detections
WHERE detected_at < NOW() - INTERVAL '48 hours'
  AND blocked = false
```

**Logic**:
- If an IP was detected but **not blocked**
- and the detection is more than **48 hours** old (any newer detections for the same IP are separate rows and remain)
- → delete it from the database

**Example**:
- IP `1.2.3.4` detected on 23/11 at 10:00
- Not blocked (risk_score < 80)
- No new detection for 48 hours
- → **25/11 at 10:10** → IP deleted ✅

### Rule 2: Auto-unblock (2 hours)
**SQL query**:
```sql
UPDATE detections
SET blocked = false, blocked_at = NULL
WHERE blocked = true
  AND blocked_at < NOW() - INTERVAL '2 hours'
  AND NOT EXISTS (
      SELECT 1 FROM detections d2
      WHERE d2.source_ip = detections.source_ip
        AND d2.detected_at > NOW() - INTERVAL '2 hours'
  )
```

**Logic**:
- If an IP is **blocked**
- and has been blocked for **more than 2 hours**
- and there are **no new detections** in the last 2 hours
- → unblock it in the DB

**⚠️ WARNING**: this only unblocks the IP in the **database**; it does NOT remove it from the **MikroTik firewall lists**!

**Example**:
- IP `5.6.7.8` blocked on 25/11 at 08:00
- No new detection for 2 hours
- → **25/11 at 10:10** → `blocked=false` in the DB ✅
- → **STILL in the MikroTik firewall** ❌

### How to remove it from MikroTik
```bash
# Via the ML backend API
curl -X POST http://localhost:8000/unblock-ip \
  -H "Content-Type: application/json" \
  -d '{"ip_address": "5.6.7.8"}'
```
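When a cleanup run unblocks several IPs at once, removing them from the routers one `curl` at a time gets tedious. Below is a minimal batch sketch built on the same `/unblock-ip` endpoint and payload shown above; the hard-coded IP list and the use of the `requests` library are illustrative assumptions, not part of the shipped scripts:

```python
#!/usr/bin/env python3
"""Sketch: remove DB-unblocked IPs from the MikroTik firewall lists
by calling the ML backend's /unblock-ip endpoint for each of them."""
import requests

# Hypothetical input: the IPs listed by the cleanup log
# ("Found N IPs to unblock ...").
ips = ["5.6.7.8", "1.2.3.4"]

for ip in ips:
    resp = requests.post(
        "http://localhost:8000/unblock-ip",
        json={"ip_address": ip},  # payload format documented above
        timeout=10,
    )
    print(f"{ip}: HTTP {resp.status_code}")
```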
## 🛠️ Configuration

### Changing the intervals

#### Change the cleanup threshold (e.g. 72 hours instead of 48)
Edit `python_ml/cleanup_detections.py`:
```python
# Around line 47
deleted_count = cleanup_old_detections(conn, hours=72)  # ← change here
```

#### Change the unblock threshold (e.g. 4 hours instead of 2)
Edit `python_ml/cleanup_detections.py`:
```python
# Around line 51
unblocked_count = unblock_old_ips(conn, hours=4)  # ← change here
```

### Changing the execution frequency
Edit `deployment/systemd/ids-cleanup.timer`:
```ini
[Timer]
# Every 6 hours instead of every hour
OnCalendar=00/6:10:00
```

After the changes:
```bash
sudo systemctl daemon-reload
sudo systemctl restart ids-cleanup.timer
```
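Instead of editing the source for every threshold change, the two calls in `main()` could read their thresholds from the environment, which fits the existing `.env`-based configuration. A sketch of that variant; the `CLEANUP_HOURS` and `UNBLOCK_HOURS` variable names are assumptions, not variables the shipped script reads:

```python
import os

# Hypothetical env-var overrides, falling back to the documented defaults (48h / 2h)
cleanup_hours = int(os.getenv("CLEANUP_HOURS", "48"))
unblock_hours = int(os.getenv("UNBLOCK_HOURS", "2"))

deleted_count = cleanup_old_detections(conn, hours=cleanup_hours)
unblocked_count = unblock_old_ips(conn, hours=unblock_hours)
```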
## 📊 Example Output

```
============================================================
CLEANUP DETECTIONS - Starting
============================================================
✅ Connected to the database

[1/2] Cleaning up old detections...
Found 45 detections to delete (older than 48h)
✅ Deleted 45 old detections

[2/2] Unblocking old IPs...
Found 3 IPs to unblock (blocked for more than 2h)
  - 1.2.3.4 (type: ddos, score: 85.2)
  - 5.6.7.8 (type: port_scan, score: 82.1)
  - 9.10.11.12 (type: brute_force, score: 90.5)
✅ Unblocked 3 IPs in the database
⚠️ WARNING: the IPs are still present in the MikroTik firewall lists!
💡 To remove them from the routers, use: curl -X POST http://localhost:8000/unblock-ip -d '{"ip_address": "X.X.X.X"}'

============================================================
CLEANUP COMPLETED
  - Detections deleted: 45
  - IPs unblocked (DB): 3
============================================================
```

## 🔍 Troubleshooting

### The timer does not start
```bash
# Check that the timer is enabled
sudo systemctl is-enabled ids-cleanup.timer

# If disabled, enable it
sudo systemctl enable ids-cleanup.timer
sudo systemctl start ids-cleanup.timer
```

### Errors in the log
```bash
# Check for errors
grep ERROR /var/log/ids/cleanup.log

# Check the DB connection
grep "Connected to the database" /var/log/ids/cleanup.log
```

### Test the DB connection
```bash
cd /opt/ids
source .env
python3 -c "
import psycopg2
conn = psycopg2.connect(
    host='$PGHOST',
    port=$PGPORT,
    user='$PGUSER',
    password='$PGPASSWORD',
    database='$PGDATABASE'
)
print('✅ DB connected!')
conn.close()
"
```

## 📈 Metrics

### Statistics queries
```sql
-- Detections by age
SELECT
  CASE
    WHEN detected_at > NOW() - INTERVAL '2 hours' THEN '< 2h'
    WHEN detected_at > NOW() - INTERVAL '24 hours' THEN '< 24h'
    WHEN detected_at > NOW() - INTERVAL '48 hours' THEN '< 48h'
    ELSE '> 48h'
  END as age_group,
  COUNT(*) as count,
  COUNT(CASE WHEN blocked THEN 1 END) as blocked_count
FROM detections
GROUP BY age_group
ORDER BY MAX(detected_at) DESC;  -- newest age group first (plain ORDER BY age_group would sort alphabetically)

-- Blocked IPs by duration
SELECT
  source_ip,
  blocked_at,
  EXTRACT(EPOCH FROM (NOW() - blocked_at)) / 3600 as hours_blocked,
  anomaly_type,
  risk_score::numeric
FROM detections
WHERE blocked = true
ORDER BY blocked_at DESC;
```

## ⚙️ Integration with Other Systems

### Email notifications (optional)
Add to `python_ml/cleanup_detections.py`:
```python
import smtplib
from email.mime.text import MIMEText

if unblocked_count > 0:
    msg = MIMEText(f"Unblocked {unblocked_count} IPs")
    msg['Subject'] = 'IDS Cleanup Report'
    msg['From'] = 'ids@example.com'
    msg['To'] = 'admin@example.com'

    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()
```

### Webhook (optional)
```python
import requests

requests.post('https://hooks.slack.com/...', json={
    'text': f'IDS Cleanup: {deleted_count} detections deleted, {unblocked_count} IPs unblocked'
})
```

## 🔒 Security

- The script runs as **root** (the systemd unit is configured with `User=root`)
- DB credentials are loaded from `.env` (NOT hardcoded)
- Logs in `/var/log/ids/` with `644` permissions
- Service hardened with `NoNewPrivileges=true` and `PrivateTmp=true`

## 📅 Scheduler

The timer is configured to run:
- **Frequency**: every hour
- **Minute**: XX:10 (10 minutes past the hour)
- **Randomization**: up to 5 minutes of random delay (`RandomizedDelaySec=300`) to spread the load
- **Persistent**: catches up on runs missed during downtime

**Example times**: 00:10, 01:10, 02:10, ..., 23:10

## ✅ Post-Installation Checklist

- [ ] Timer installed: `systemctl status ids-cleanup.timer`
- [ ] Next run visible: `systemctl list-timers`
- [ ] Manual test OK: `sudo ./deployment/run_cleanup.sh`
- [ ] Log created: `ls -la /var/log/ids/cleanup.log`
- [ ] No errors in the log: `grep ERROR /var/log/ids/cleanup.log`
- [ ] Cleanup working: compare the detection counts before and after a run (see the sketch below)
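For the last checklist item, here is a small sketch that counts the rows each rule would touch, using the same documented queries and `PG*` environment variables; run it before and after a cleanup and both counts should drop to zero:

```python
#!/usr/bin/env python3
"""Sketch: count the rows matched by the two cleanup rules."""
import os
import psycopg2

conn = psycopg2.connect(
    host=os.getenv("PGHOST", "localhost"),
    port=int(os.getenv("PGPORT", "5432")),
    user=os.getenv("PGUSER"),
    password=os.getenv("PGPASSWORD"),
    database=os.getenv("PGDATABASE"),
)
with conn.cursor() as cur:
    # Rule 1 candidates: old and never blocked
    cur.execute("""SELECT COUNT(*) FROM detections
                   WHERE detected_at < NOW() - INTERVAL '48 hours'
                     AND blocked = false""")
    print("deletable detections:", cur.fetchone()[0])

    # Rule 2 candidates: blocked > 2h ago, no new detections since
    cur.execute("""SELECT COUNT(*) FROM detections d
                   WHERE d.blocked = true
                     AND d.blocked_at < NOW() - INTERVAL '2 hours'
                     AND NOT EXISTS (
                         SELECT 1 FROM detections d2
                         WHERE d2.source_ip = d.source_ip
                           AND d2.detected_at > NOW() - INTERVAL '2 hours')""")
    print("rows matching the unblock rule:", cur.fetchone()[0])
conn.close()
```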
## 🆘 Support

For problems or questions:
1. Check the log: `tail -f /var/log/ids/cleanup.log`
2. Check the timer: `systemctl status ids-cleanup.timer`
3. Run a manual test: `sudo ./deployment/run_cleanup.sh`
4. Open an issue on GitHub or contact the team

diff --git a/deployment/run_cleanup.sh b/deployment/run_cleanup.sh
new file mode 100644
index 0000000..20ccc59
--- /dev/null
+++ b/deployment/run_cleanup.sh
@@ -0,0 +1,48 @@
#!/bin/bash
# =========================================================
# IDS - Cleanup Detections Runner
# =========================================================
# Runs the automatic detections cleanup according to the rules:
# - Delete non-blocked detections after 48h
# - Unblock blocked IPs that are no longer anomalous after 2h
#
# Usage: ./run_cleanup.sh
# =========================================================

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"

# Load the environment variables
if [ -f "$PROJECT_ROOT/.env" ]; then
    set -a
    source "$PROJECT_ROOT/.env"
    set +a
else
    echo "❌ .env file not found in $PROJECT_ROOT"
    exit 1
fi

# Logging
LOG_FILE="/var/log/ids/cleanup.log"
mkdir -p /var/log/ids

echo "=========================================" >> "$LOG_FILE"
echo "[$(date)] Automatic cleanup started" >> "$LOG_FILE"
echo "=========================================" >> "$LOG_FILE"

# Run the cleanup. Because of `set -e`, a plain `python3 ...; EXIT_CODE=$?`
# would abort the script on failure before the failure branch below could
# run, so the exit code is captured with `|| EXIT_CODE=$?` instead.
cd "$PROJECT_ROOT"
EXIT_CODE=0
python3 python_ml/cleanup_detections.py >> "$LOG_FILE" 2>&1 || EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
    echo "[$(date)] Cleanup completed successfully" >> "$LOG_FILE"
else
    echo "[$(date)] Cleanup failed (exit code: $EXIT_CODE)" >> "$LOG_FILE"
fi

echo "" >> "$LOG_FILE"
exit $EXIT_CODE

diff --git a/deployment/setup_cleanup_timer.sh b/deployment/setup_cleanup_timer.sh
new file mode 100644
index 0000000..e343e18
--- /dev/null
+++ b/deployment/setup_cleanup_timer.sh
@@ -0,0 +1,64 @@
#!/bin/bash
# =========================================================
# IDS - Setup Cleanup Timer
# =========================================================
# Installs and starts the systemd timer for the automatic cleanup
#
# Usage: sudo ./deployment/setup_cleanup_timer.sh
# =========================================================

set -e

if [ "$EUID" -ne 0 ]; then
    echo "❌ This script must be run as root (sudo)"
    exit 1
fi

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo "🔧 Setting up the IDS cleanup timer..."
echo ""

# 1. Create the log directory
echo "[1/6] Creating the log directory..."
mkdir -p /var/log/ids
chmod 755 /var/log/ids

# 2. Make the scripts executable
echo "[2/6] Setting execute permissions..."
chmod +x "$SCRIPT_DIR/run_cleanup.sh"
chmod +x "$SCRIPT_DIR/../python_ml/cleanup_detections.py"

# 3. Copy the unit files
echo "[3/6] Installing the unit files..."
cp "$SCRIPT_DIR/systemd/ids-cleanup.service" /etc/systemd/system/
cp "$SCRIPT_DIR/systemd/ids-cleanup.timer" /etc/systemd/system/

# 4. Reload systemd
echo "[4/6] Reloading the systemd daemon..."
systemctl daemon-reload

# 5. Enable the timer
echo "[5/6] Enabling the timer..."
systemctl enable ids-cleanup.timer

# 6. Start the timer
echo "[6/6] Starting the timer..."
systemctl start ids-cleanup.timer

echo ""
echo "✅ Cleanup timer installed and started successfully!"
echo ""
echo "📊 Status:"
systemctl status ids-cleanup.timer --no-pager -l
echo ""
echo "📅 Next run:"
systemctl list-timers ids-cleanup.timer --no-pager
echo ""
echo "💡 Useful commands:"
echo "  - Manual test:    sudo ./deployment/run_cleanup.sh"
echo "  - Run now:        sudo systemctl start ids-cleanup.service"
echo "  - Timer status:   sudo systemctl status ids-cleanup.timer"
echo "  - Cleanup log:    tail -f /var/log/ids/cleanup.log"
echo "  - Disable timer:  sudo systemctl stop ids-cleanup.timer && sudo systemctl disable ids-cleanup.timer"
echo ""

diff --git a/deployment/systemd/ids-cleanup.service b/deployment/systemd/ids-cleanup.service
new file mode 100644
index 0000000..f4318b7
--- /dev/null
+++ b/deployment/systemd/ids-cleanup.service
@@ -0,0 +1,26 @@
[Unit]
Description=IDS Cleanup Detections Service
Documentation=https://github.com/yourusername/ids
After=network.target postgresql.service

[Service]
Type=oneshot
User=root
WorkingDirectory=/opt/ids
EnvironmentFile=/opt/ids/.env
ExecStart=/opt/ids/deployment/run_cleanup.sh

# Logging
StandardOutput=append:/var/log/ids/cleanup.log
StandardError=append:/var/log/ids/cleanup.log

# Security
NoNewPrivileges=true
PrivateTmp=true

# Restart policy (not needed for a oneshot service)
# Restart=on-failure
# RestartSec=30

[Install]
WantedBy=multi-user.target

diff --git a/deployment/systemd/ids-cleanup.timer b/deployment/systemd/ids-cleanup.timer
new file mode 100644
index 0000000..7c605a7
--- /dev/null
+++ b/deployment/systemd/ids-cleanup.timer
@@ -0,0 +1,18 @@
[Unit]
Description=IDS Cleanup Detections Timer
Documentation=https://github.com/yourusername/ids

[Timer]
# Run every hour, 10 minutes past the hour (e.g. 10:10, 11:10, 12:10...).
# A single OnCalendar= keeps the schedule at XX:10; adding OnCalendar=hourly
# as well would also trigger runs at minute 00.
OnCalendar=*:10:00

# Run immediately at the next boot if the system was down at the scheduled time
Persistent=true

# Delay each run by a random 0-5 minutes to avoid load spikes
RandomizedDelaySec=300

[Install]
WantedBy=timers.target
+ """ + cursor = conn.cursor(cursor_factory=RealDictCursor) + + cutoff_time = datetime.now() - timedelta(hours=hours) + + # Conta detections da eliminare + cursor.execute(""" + SELECT COUNT(*) as count + FROM detections + WHERE detected_at < %s + AND blocked = false + """, (cutoff_time,)) + + count = cursor.fetchone()['count'] + + if count > 0: + logger.info(f"Trovate {count} detections da eliminare (più vecchie di {hours}h)") + + # Elimina + cursor.execute(""" + DELETE FROM detections + WHERE detected_at < %s + AND blocked = false + """, (cutoff_time,)) + + conn.commit() + logger.info(f"✅ Eliminate {cursor.rowcount} detections vecchie") + else: + logger.info(f"Nessuna detection da eliminare (soglia: {hours}h)") + + cursor.close() + return count + +def unblock_old_ips(conn, hours=2): + """ + Sblocca IP bloccati da più di N ore. + + Logica: Se un IP è stato bloccato ma dopo 2 ore non è più + anomalo (nessuna nuova detection), sbloccalo dal DB. + + NOTA: Questo NON rimuove l'IP dalle firewall list dei router MikroTik. + Per quello serve chiamare l'API /unblock-ip del ML backend. + """ + cursor = conn.cursor(cursor_factory=RealDictCursor) + + cutoff_time = datetime.now() - timedelta(hours=hours) + + # Trova IP bloccati da più di N ore senza nuove detections + cursor.execute(""" + SELECT d.source_ip, d.blocked_at, d.anomaly_type, d.risk_score + FROM detections d + WHERE d.blocked = true + AND d.blocked_at < %s + AND NOT EXISTS ( + SELECT 1 FROM detections d2 + WHERE d2.source_ip = d.source_ip + AND d2.detected_at > %s + ) + """, (cutoff_time, cutoff_time)) + + ips_to_unblock = cursor.fetchall() + + if ips_to_unblock: + logger.info(f"Trovati {len(ips_to_unblock)} IP da sbloccare (bloccati da più di {hours}h)") + + for ip_data in ips_to_unblock: + ip = ip_data['source_ip'] + logger.info(f" - {ip} (tipo: {ip_data['anomaly_type']}, score: {ip_data['risk_score']})") + + # Aggiorna DB + cursor.execute(""" + UPDATE detections + SET blocked = false, blocked_at = NULL + WHERE source_ip = %s + """, (ip,)) + + conn.commit() + logger.info(f"✅ Sbloccati {len(ips_to_unblock)} IP nel database") + logger.warning("⚠️ ATTENZIONE: IP ancora presenti nelle firewall list MikroTik!") + logger.info("💡 Per rimuoverli dai router, usa: curl -X POST http://localhost:8000/unblock-ip -d '{\"ip_address\": \"X.X.X.X\"}'") + else: + logger.info(f"Nessun IP da sbloccare (soglia: {hours}h)") + + cursor.close() + return len(ips_to_unblock) + +def main(): + """Esecuzione cleanup completo""" + logger.info("=" * 60) + logger.info("CLEANUP DETECTIONS - Avvio") + logger.info("=" * 60) + + try: + conn = get_db_connection() + logger.info("✅ Connesso al database") + + # 1. Cleanup detections vecchie (48h) + logger.info("\n[1/2] Cleanup detections vecchie...") + deleted_count = cleanup_old_detections(conn, hours=48) + + # 2. 
diff --git a/replit.md b/replit.md
index fd86362..8e68aa7 100644
--- a/replit.md
+++ b/replit.md
@@ -20,17 +20,18 @@ This project is a full-stack web application for an Intrusion Detection System (
- Commit message: italiano

## System Architecture
-The IDS employs a React-based frontend for real-time monitoring, detection visualization, and whitelist management, built with ShadCN UI and TanStack Query. The backend consists of a Python FastAPI service dedicated to ML analysis (Isolation Forest with 25 targeted features), MikroTik API management, and a detection engine that scores anomalies from 0-100 across five risk levels. A Node.js (Express) backend handles API requests from the frontend, manages the PostgreSQL database, and coordinates service operations.
+The IDS employs a React-based frontend for real-time monitoring, detection visualization, and whitelist management, built with ShadCN UI and TanStack Query. The backend consists of a Python FastAPI service dedicated to ML analysis and a Node.js (Express) backend handling API requests, PostgreSQL database management, and service coordination.

**Key Architectural Decisions & Features:**
-- **Log Collection & Processing**: MikroTik syslog data (UDP:514) is sent to RSyslog, parsed by `syslog_parser.py`, and stored in PostgreSQL. The parser includes auto-cleanup with a 3-day retention policy.
-- **Machine Learning**: An Isolation Forest model trained on 25 network log features performs real-time anomaly detection, assigning a risk score.
-- **Automated Blocking**: Critical IPs (score >= 80) are automatically blocked in parallel across all configured MikroTik routers via their REST API.
-- **Service Monitoring & Management**: A dashboard provides real-time status (green/red indicators) for the ML Backend, Database, and Syslog Parser. Service management (start/stop/restart) for Python services is available via API endpoints, secured with API key authentication and Systemd integration for production-grade control and auto-restart capabilities.
-- **IP Geolocation**: Integrated `ip-api.com` for enriching detection data with geographical and Autonomous System (AS) information, including intelligent caching.
-- **Database Management**: PostgreSQL is used for all persistent data. An intelligent database versioning system ensures efficient SQL migrations, applying only new scripts. Dual-mode database drivers (`@neondatabase/serverless` for Replit, `pg` for AlmaLinux) ensure environment compatibility.
+- **Log Collection & Processing**: MikroTik syslog data (UDP:514) is parsed by `syslog_parser.py` and stored in PostgreSQL with a 3-day retention policy. The parser includes auto-reconnect and error recovery mechanisms.
+- **Machine Learning**: An Isolation Forest model (sklearn.IsolationForest) trained on 25 network log features performs real-time anomaly detection, assigning a risk score (0-100 across five risk levels). A hybrid ML detector (Isolation Forest + Ensemble Classifier with weighted voting) reduces false positives. The system supports weekly automatic retraining of models.
+- **Automated Blocking**: Critical IPs (score >= 80) are automatically blocked in parallel across configured MikroTik routers via their REST API.
+- **Automatic Cleanup**: An hourly systemd timer (`cleanup_detections.py`) removes old detections (48h) and auto-unblocks IPs (2h).
+- **Service Monitoring & Management**: A dashboard provides real-time status (ML Backend, Database, Syslog Parser). API endpoints, secured with API key authentication and Systemd integration, allow for service management (start/stop/restart) of Python services.
+- **IP Geolocation**: Integration with `ip-api.com` enriches detection data with geographical and AS information, utilizing intelligent caching.
+- **Database Management**: PostgreSQL is used for all persistent data. An intelligent database versioning system ensures efficient SQL migrations. Dual-mode database drivers (`@neondatabase/serverless` for Replit, `pg` for AlmaLinux) ensure environment compatibility.
- **Microservices**: Clear separation of concerns between the Python ML backend and the Node.js API backend.
-- **UI/UX**: Utilizes ShadCN UI for a modern component library and `react-hook-form` with Zod for robust form validation.
+- **UI/UX**: Utilizes ShadCN UI for a modern component library and `react-hook-form` with Zod for robust form validation. Analytics dashboards provide visualizations of normal and attack traffic, including real-time and historical data.

## External Dependencies
- **React**: Frontend framework.
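The "weighted voting" mentioned in the Machine Learning bullet is not spelled out in this file; the training notes removed further down give the weights as DT:RF:XGBoost = 1:2:2. A minimal illustrative sketch of that idea with scikit-learn's `VotingClassifier` and xgboost, as an assumption-level example rather than the project's actual `ml_hybrid_detector.py` implementation:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Soft voting averages the class probabilities of the three members,
# weighted 1:2:2 (DT:RF:XGB); hyperparameters here are placeholders.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=10)),
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("xgb", XGBClassifier(n_estimators=100, eval_metric="logloss")),
    ],
    voting="soft",
    weights=[1, 2, 2],
)
# After ensemble.fit(X_train, y_train):
# ensemble.predict_proba(X)[:, 1] gives the blended attack probability.
```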
@@ -39,138 +40,12 @@ The IDS employs a React-based frontend for real-time monitoring, detection visua
- **MikroTik API REST**: For router communication and IP blocking.
- **ShadCN UI**: Frontend component library.
- **TanStack Query**: Data fetching for the frontend.
-- **Isolation Forest**: Machine Learning algorithm for anomaly detection.
+- **Isolation Forest (scikit-learn)**: Machine Learning algorithm for anomaly detection.
+- **xgboost, joblib**: ML libraries used in the hybrid detector.
- **RSyslog**: Log collection daemon.
- **Drizzle ORM**: For database schema definition in Node.js.
- **Neon Database**: Cloud-native PostgreSQL service (used in Replit).
- **pg (Node.js driver)**: Standard PostgreSQL driver for Node.js (used in AlmaLinux).
- **psycopg2**: PostgreSQL adapter for Python.
- **ip-api.com**: External API for IP geolocation data.
-- **Recharts**: Charting library for analytics visualization.
-
-## Recent Updates (November 2025)
-
-### 🛡️ Syslog Parser Resilience & Monitoring (25 Nov 2025 - 11:00)
-- **Feature**: resilient parser with auto-recovery and automatic monitoring
-- **Problem solved**: the parser hung periodically (most recently: the morning of 24 Nov)
-- **Root cause**: database connection timeouts, unhandled exceptions, blocking cleanup
-- **Solutions implemented**:
-  1. **Auto-Reconnect**: automatic reconnection on DB timeout
-  2. **Error Recovery**: continue processing after exceptions (do not crash!)
-  3. **Health Check**: logs `[HEALTH] Parser alive: X righe, Y salvate, Z errori` every 5 minutes
-  4. **Monitoring Script**: `deployment/check_parser_health.sh` (cron every 5 min)
-  5. **Auto-Restart**: if the last log entry is more than 5 minutes old → automatic restart
-- **Files changed**:
-  - `python_ml/syslog_parser.py` - `reconnect_db()` method + nested try/except
-  - `deployment/check_parser_health.sh` - health check with auto-restart
-  - `deployment/setup_parser_monitoring.sh` - cron job setup
-  - `deployment/TROUBLESHOOTING_SYSLOG_PARSER.md` - complete guide
-- **Detection timestamps clarified**:
-  - `first_seen/last_seen`: timestamps of the network_logs entries (e.g. 18:46:21)
-  - `detected_at`: when the ML backend detects the anomaly (e.g. 19:45, one hour later!)
-  - The delay is normal: the ML backend runs its batch analysis every hour
-- **Deploy**: `./update_from_git.sh` → `sudo systemctl restart ids-syslog-parser` → `sudo ./deployment/setup_parser_monitoring.sh`
-- **Monitoring**: `tail -f /var/log/ids/parser-health.log`
-
-### 🔧 Analytics Aggregator Fix - Data Consistency (24 Nov 2025 - 17:00)
-- **CRITICAL BUG FIX**: resolved a data mismatch in the Live Dashboard
-- **Problem**: the traffic distribution showed 262k attacks but the breakdown only 19
-- **ROOT CAUSE**: the aggregator counted **occurrences** instead of **packets** in `attacks_by_type` and `attacks_by_country`
-- **Solution**:
-  1. Moved the counting from the detections loop to the packets loop
-  2. `attacks_by_type[type] += packets` (not +1!)
-  3. `attacks_by_country[country] += packets` (not +1!)
-  4. "unknown"/"Unknown" fallback for missing data (type/geo)
-  5. Validation logging: verifies breakdown_total == attack_packets
-- **Mathematical invariant**: `Σ(attacks_by_type) == Σ(attacks_by_country) == attack_packets`
-- **Files changed**: `python_ml/analytics_aggregator.py`
-- **Deploy**: restart the ML backend + run the aggregator manually to test
-- **Validation**: the log shows `match: True` and no mismatch warnings
-
-### 📊 Network Analytics & Dashboard System (24 Nov 2025 - 11:30)
-- **Complete feature**: analytics system covering normal traffic + attacks, advanced chart visualizations, permanent data
-- **Components**:
-  1. **Database**: `network_analytics` table with permanent hourly/daily aggregations
-  2. **Python aggregator**: `analytics_aggregator.py` classifies traffic every hour
-  3. **Systemd timer**: automatic execution every hour (at :05)
-  4. **API**: `/api/analytics/recent` and `/api/analytics/range`
-  5. **Frontend**: Live Dashboard (real-time, 3 days) + Historical Analytics (permanent)
-- **Charts**: Area Chart, Pie Chart, Bar Chart, Line Chart, real-time stream
-- **Flag emoji**: 🇮🇹🇺🇸🇷🇺🇨🇳 for immediate identification of the country of origin
-- **Deploy**: migration 005 + `./deployment/setup_analytics_timer.sh`
-- **Security fix**: removed a hardcoded path, implemented the secure wrapper script `run_analytics.sh` for manual runs
-- **Production-grade**: credentials managed via systemd EnvironmentFile (automatic) or the wrapper script (manual)
-- **Frontend fix**: Analytics History now uses hourly data (`hourly: true`) until daily aggregation is scheduled
-
-### 🌍 IP Geolocation Integration (22 Nov 2025 - 13:00)
-- **Feature**: complete geographic information (country, city, organization, AS) for every IP
-- **API**: ip-api.com with batched async lookups (100 IPs in ~1.5s instead of 150s!)
-- **Performance**: intelligent caching + robust fallback
-- **Display**: Globe/Building/MapPin icons on the Detections page
-- **Deploy**: migration 004 + ML backend restart
-
-### 🤖 Hybrid ML Detector - False Positive Reduction System (24 Nov 2025)
-- **Goal**: reduce false positives by 80-90% while keeping detection accuracy high
-- **Architecture**:
-  1. **Isolation Forest (sklearn)**: n_estimators=250, contamination=0.03 (scientifically tuned)
-  2. **Feature Selection**: a Chi-Square test reduces the 25 features to the 18 most relevant
-  3. **Ensemble Classifier**: DT + RF + XGBoost with weighted voting (1:2:2)
-  4. **Confidence Scoring**: 3-tier system (High≥95%, Medium≥70%, Low<70%)
-  5. **Validation Framework**: CICIDS2017 dataset with Precision/Recall/F1/FPR metrics
-- **Components**:
-  - `python_ml/ml_hybrid_detector.py` - core detector with IF + ensemble + feature selection
-  - `python_ml/dataset_loader.py` - CICIDS2017 loader with an 80→25 feature mapping
-  - `python_ml/validation_metrics.py` - production-grade metrics calculator
-  - `python_ml/train_hybrid.py` - CLI training script (test/train/validate)
-- **ML dependencies**: xgboost==2.0.3, joblib==1.3.2, scikit-learn==1.3.2
-- **Backward compatibility**: USE_HYBRID_DETECTOR env var (default=true)
-- **Target metrics**: Precision≥90%, Recall≥80%, FPR≤5%, F1≥85%
-- **Deploy**: see `deployment/CHECKLIST_ML_HYBRID.md`
-
-#### 🎯 Architectural Decision - sklearn.IsolationForest (24 Nov 2025 - 22:00)
-- **Deploy problem**: eif==2.0.2 is incompatible with Python 3.11 (requires the removed distutils and obsolete Cython APIs; unmaintained since 2021)
-- **Failed attempts** (blocked for 1+ hour): build isolation flags, Cython pre-install, PIP_NO_BUILD_ISOLATION, considering a Python downgrade
-- **Architect's analysis**:
-  - Extended IF (eif) does NOT support Python ≥3.11 (fundamental C++/Cython incompatibility)
-  - Downgrading to Python 3.10 = recreating the venv + 50 dependencies (regression risk, EOL 2026)
-  - PyOD does NOT provide Extended IF (only a standard IF wrapper around sklearn; source verified)
-  - **The code ALREADY had a working fallback** to `sklearn.ensemble.IsolationForest`!
-- **FINAL DECISION**: use sklearn.IsolationForest (the pre-existing fallback)
-  - ✅ Compatible with Python 3.11+ (pre-built wheels, zero compilation)
-  - ✅ **ZERO code changes** (fallback already implemented behind the EIF_AVAILABLE flag)
-  - ✅ Target metrics reachable with standard IF + ensemble + feature selection
-  - ✅ Production-grade: scikit-learn is a maintained, stable library
-  - ✅ Simplified installation: `pip install xgboost joblib` (2 steps instead of 4!)
-- **Files changed**:
-  - `requirements.txt`: removed `eif==2.0.2` and `Cython==3.0.5` (no longer needed)
-  - `deployment/install_ml_deps.sh`: simplified from 4 steps to 2, no compilation
-  - `deployment/CHECKLIST_ML_HYBRID.md`: updated with the new simplified instructions
-
-### 🔄 Database Schema Adaptation & Auto-Training (24 Nov 2025 - 23:30)
-- **Database schema fix**: adapted the ML detector to the real `network_logs` schema
-  - Corrected SQL query: `destination_ip` (not `dest_ip`), `destination_port` (not `dest_port`)
-  - Feature extraction: support for `packet_length` instead of separate `packets`/`bytes`
-  - Backward compatible: works with both the MikroTik schema and the CICIDS2017 dataset
-- **Weekly automatic training**:
-  - Wrapper script: `deployment/run_ml_training.sh` (loads credentials from .env)
-  - Systemd service: `ids-ml-training.service`
-  - Systemd timer: `ids-ml-training.timer` (every Monday at 03:00 AM)
-  - Automated setup: `./deployment/setup_ml_training_timer.sh`
-  - Persistent logs: `/var/log/ids/ml-training.log`
-- **Complete workflow**:
-  1. The systemd timer runs the weekly training automatically
-  2. The script loads the last 7 days of traffic from the database (234M+ records)
-  3. Hybrid ML training (IF + Ensemble + Feature Selection)
-  4. Models are saved to `python_ml/models/`
-  5. The ML backend loads them automatically at the next restart
-- **Files created**:
-  - `deployment/run_ml_training.sh` - secure wrapper for training
-  - `deployment/train_hybrid_production.sh` - full manual training script
-  - `deployment/systemd/ids-ml-training.service` - systemd service
-  - `deployment/systemd/ids-ml-training.timer` - weekly timer
-  - `deployment/setup_ml_training_timer.sh` - automated setup
-- **Files changed**:
-  - `python_ml/train_hybrid.py` - SQL query adapted to the real DB schema
-  - `python_ml/ml_hybrid_detector.py` - `packet_length` support, backward compatible
-  - `python_ml/dataset_loader.py` - fixed the missing timestamp in the synthetic dataset
-- **Impact**: the system will automatically use sklearn IF via the fallback; all 8 fail-fast checkpoints work identically
\ No newline at end of file
+- **Recharts**: Charting library for analytics visualization.
\ No newline at end of file