From 3c14508aa59a30877f50a089d1c4dbe7d4877888 Mon Sep 17 00:00:00 2001
From: marco370 <48531002-marco370@users.noreply.replit.com>
Date: Mon, 24 Nov 2025 15:09:18 +0000
Subject: [PATCH] Add critical alert for idle analytics aggregator

Add a destructive alert to the services page indicating when the analytics
aggregator has been idle for too long, along with immediate solution
instructions. Also creates a deployment checklist detailing the critical step
of setting up the analytics aggregator timer.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 7a657272-55ba-4a79-9a2e-f1ed9bc7a528
Replit-Commit-Checkpoint-Type: intermediate_checkpoint
Replit-Commit-Event-Id: 618f6e47-fbdc-49e2-b076-7366edc904a6
Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/449cf7c4-c97a-45ae-8234-e5c5b8d6a84f/7a657272-55ba-4a79-9a2e-f1ed9bc7a528/F6DiMv4
---
 .replit                        |   4 -
 client/src/pages/Services.tsx  |  18 ++-
 deployment/CHECKLIST_DEPLOY.md | 223 +++++++++++++++++++++++++++++++++
 3 files changed, 240 insertions(+), 5 deletions(-)
 create mode 100644 deployment/CHECKLIST_DEPLOY.md

diff --git a/.replit b/.replit
index 0648659..3dc4618 100644
--- a/.replit
+++ b/.replit
@@ -18,10 +18,6 @@ externalPort = 80
 localPort = 41303
 externalPort = 3002
 
-[[ports]]
-localPort = 42657
-externalPort = 3001
-
 [[ports]]
 localPort = 43471
 externalPort = 3003
diff --git a/client/src/pages/Services.tsx b/client/src/pages/Services.tsx
index 6c778be..90f6a4b 100644
--- a/client/src/pages/Services.tsx
+++ b/client/src/pages/Services.tsx
@@ -350,12 +350,28 @@ export default function ServicesPage() {
 {servicesStatus?.services.analyticsAggregator.details?.hoursSinceLastRun && (
The aggregator has not run for {servicesStatus.services.analyticsAggregator.details.hoursSinceLastRun}h! Dashboard and Analytics are blocked.
+Immediate fix (on the server):
+
+ sudo /opt/ids/deployment/setup_analytics_timer.sh
+
+ Verify the timer:
diff --git a/deployment/CHECKLIST_DEPLOY.md b/deployment/CHECKLIST_DEPLOY.md
new file mode 100644
index 0000000..23f216d
--- /dev/null
+++ b/deployment/CHECKLIST_DEPLOY.md
@@ -0,0 +1,223 @@
+# ✅ IDS Deploy Checklist - AlmaLinux 9
+
+## Complete Procedure for a Safe Deploy
+
+### 1. **Pre-Deploy: Local Checks**
+
+```bash
+# On Replit: check that the build has no errors
+npm run build
+npm run db:push -- --force # Sync the database schema
+```
+
+### 2. **Commit and Push to GitLab**
+
+```bash
+# On Replit
+./push-gitlab.sh
+```
+
+*Use a descriptive commit message that states the type of change*
+
+---
+
+### 3. **Pull the Code on the Server**
+
+```bash
+# On the AlmaLinux server
+cd /opt/ids
+./deployment/update_from_git.sh
+
+# If there are database migrations
+./deployment/update_from_git.sh --db
+```
+
+---
+
+### 4. **CRITICAL: Set Up the Systemd Services**
+
+#### 4a. Python Services (ML Backend & Syslog Parser)
+```bash
+# On the first deploy OR after changes to the .service files
+sudo ./deployment/install_systemd_services.sh
+```
+
+#### 4b. ⚠️ **Analytics Aggregator Timer** (OFTEN FORGOTTEN!)
+```bash
+# IMPORTANT: must ALWAYS be done on the first deploy
+sudo ./deployment/setup_analytics_timer.sh
+
+# Verify that it is active
+sudo systemctl list-timers ids-analytics-aggregator.timer
+```
+
+**Why is it critical?**
+- Live Dashboard and Historical Analytics depend on hourly aggregations
+- If the timer is not active → the data is frozen or stale!
+- Last run more than 2 hours ago = serious problem
+
+---
+
+### 5. **Restart Modified Services**
+
+```bash
+# If you changed the Python ML code
+sudo systemctl restart ids-ml-backend
+
+# If you changed syslog_parser.py
+sudo systemctl restart ids-syslog-parser
+
+# If you changed the frontend (Node.js)
+./deployment/restart_frontend.sh
+```
+
+---
+
+### 6. **Post-Deploy Checks**
+
+#### 6a. Check Service Status
+```bash
+# Check all services
+sudo systemctl status ids-ml-backend
+sudo systemctl status ids-syslog-parser
+sudo systemctl status ids-analytics-aggregator.timer
+
+# Check the timer's next run
+sudo systemctl list-timers | grep ids-analytics
+```
+
+**Expected Analytics Timer output:**
+```
+NEXT                        LEFT  LAST                        PASSED UNIT                           ACTIVATES
+Mon 2025-11-24 17:05:00 CET 14min Mon 2025-11-24 16:05:00 CET 46min  ids-analytics-aggregator.timer ids-analytics-aggregator.service
+```
+
+#### 6b. Check Logs (first 2-3 minutes)
+```bash
+# ML Backend
+tail -f /var/log/ids/backend.log
+
+# Syslog Parser
+tail -f /var/log/ids/syslog_parser.log
+
+# Analytics Aggregator (journal)
+journalctl -u ids-analytics-aggregator -n 50
+```
+
+#### 6c. Test API Endpoints
+```bash
+# Health checks
+curl http://localhost:5000/api/stats
+curl http://localhost:8000/health
+
+# Verify Analytics
+curl http://localhost:5000/api/analytics/recent | jq '.[] | length'
+```
+
+#### 6d. Check Database
+```bash
+# Check the critical tables
+sudo -u postgres psql ids -c "\dt"
+
+# Check the latest aggregations
+sudo -u postgres psql ids -c "SELECT COUNT(*), MAX(date), MAX(hour) FROM network_analytics;"
+
+# Check the latest detections
+sudo -u postgres psql ids -c "SELECT COUNT(*), MAX(detected_at) FROM detections;"
+```
+
+---
+
+### 7. **Common Troubleshooting**
+
+#### Problem: Analytics Aggregator is not running
+```bash
+# Fix
+sudo ./deployment/setup_analytics_timer.sh
+
+# Force an immediate run
+sudo systemctl start ids-analytics-aggregator
+
+# Check the log
+journalctl -u ids-analytics-aggregator -n 50
+```
+
+#### Problem: ML Backend crash loop
+```bash
+# Check the log for the error
+tail -100 /var/log/ids/backend.log
+
+# Often the cause is the .env file or the venv
+ls -la /opt/ids/.env # Must exist with 600 permissions
+ls -la /opt/ids/python_ml/venv/ # Must exist
+```
+
+#### Problem: Syslog Parser is not processing logs
+```bash
+# Verify that RSyslog is receiving data
+tail -f /var/log/mikrotik/raw.log
+
+# Verify that the parser is running
+ps aux | grep syslog_parser | grep -v grep
+
+# Check the log file permissions
+ls -la /var/log/mikrotik/
+```
+
+---
+
+### 8. **Final Checklist (Before Declaring the Deploy OK)**
+
+- [ ] ML Backend: `systemctl status ids-ml-backend` → **active (running)**
+- [ ] Syslog Parser: `systemctl status ids-syslog-parser` → **active (running)**
+- [ ] Analytics Timer: `systemctl status ids-analytics-aggregator.timer` → **active (waiting)**
+- [ ] Next timer run: `systemctl list-timers` → shows the next run in less than 1 hour
+- [ ] Frontend: `curl http://localhost:5000/` → **200 OK**
+- [ ] ML API: `curl http://localhost:8000/health` → **{"status":"healthy"}**
+- [ ] Database: `psql "$DATABASE_URL" -c "SELECT 1"` → **?column? 1**
+- [ ] Analytics data: last aggregation less than 2 hours ago
+- [ ] Logs: no critical errors in the last 5 minutes
+- [ ] Web UI: Dashboard and Analytics load without errors
+
+---
+
+## 🚨 Common Mistakes to Avoid
+
+1. **Forgetting setup_analytics_timer.sh** → dashboards frozen!
+2. Not verifying the systemd timer after the deploy
+3. Not checking the logs after restarting services
+4. Not testing the API endpoints before declaring the deploy OK
+5. Editing .env without running chmod 600
+6. Running `git pull` instead of `./deployment/update_from_git.sh`
+
+---
+
+## Continuous Monitoring
+
+```bash
+# Full debug script
+./deployment/debug_system.sh
+
+# Check system health every hour (crontab entry)
+0 * * * * /opt/ids/deployment/check_backend.sh
+```
+
+---
+
+## In Case of Emergency
+
+```bash
+# Full restart of the IDS system
+sudo ./deployment/restart_all.sh
+
+# Back up the database BEFORE any drastic intervention
+./deployment/backup_db.sh
+
+# Restore from a backup
+pg_restore -U postgres -d ids /backup/ids_backup_YYYYMMDD.dump
+```
+
+---
+
+**Last updated:** 24 November 2025
+**Version:** 1.0.0