Add critical alert for idle analytics aggregator

Add a destructive alert to the services page that shows when the analytics aggregator has been idle for too long, together with instructions for the immediate fix. Also add a deployment checklist that documents the critical step of setting up the analytics aggregator timer.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 7a657272-55ba-4a79-9a2e-f1ed9bc7a528
Replit-Commit-Checkpoint-Type: intermediate_checkpoint
Replit-Commit-Event-Id: 618f6e47-fbdc-49e2-b076-7366edc904a6
Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/449cf7c4-c97a-45ae-8234-e5c5b8d6a84f/7a657272-55ba-4a79-9a2e-f1ed9bc7a528/F6DiMv4
marco370 2025-11-24 15:09:18 +00:00
parent b61940f1fe
commit 3c14508aa5
3 changed files with 240 additions and 5 deletions

@@ -18,10 +18,6 @@ externalPort = 80
localPort = 41303
externalPort = 3002
[[ports]]
localPort = 42657
externalPort = 3001
[[ports]]
localPort = 43471
externalPort = 3003

@@ -350,12 +350,28 @@ export default function ServicesPage() {
{servicesStatus?.services.analyticsAggregator.details?.hoursSinceLastRun && (
<div className="flex items-center justify-between">
<span className="text-sm text-muted-foreground">Hours since last run:</span>
<Badge variant={parseFloat(servicesStatus.services.analyticsAggregator.details.hoursSinceLastRun) < 2 ? "default" : "secondary"}>
<Badge variant={parseFloat(servicesStatus.services.analyticsAggregator.details.hoursSinceLastRun) < 2 ? "default" : "destructive"}>
{servicesStatus.services.analyticsAggregator.details.hoursSinceLastRun}h
</Badge>
</div>
)}
{/* CRITICAL ALERT: Aggregator idle for too long */}
{servicesStatus?.services.analyticsAggregator.details?.hoursSinceLastRun &&
parseFloat(servicesStatus.services.analyticsAggregator.details.hoursSinceLastRun) > 2 && (
<Alert variant="destructive" className="mt-2" data-testid="alert-aggregator-idle">
<AlertCircle className="h-4 w-4" />
<AlertTitle className="text-sm font-semibold">Systemd Timer Not Active</AlertTitle>
<AlertDescription className="text-xs mt-1">
<p className="mb-2">The aggregator has not run for {servicesStatus.services.analyticsAggregator.details.hoursSinceLastRun}h! Dashboard and Analytics are stalled.</p>
<p className="font-semibold">Immediate fix (on the server):</p>
<code className="block bg-destructive-foreground/10 p-2 rounded mt-1 font-mono text-xs">
sudo /opt/ids/deployment/setup_analytics_timer.sh
</code>
</AlertDescription>
</Alert>
)}
<div className="mt-4 p-3 bg-muted rounded-lg">
<p className="text-xs font-medium mb-2">Check the timer:</p>
<code className="text-xs bg-background p-2 rounded block font-mono" data-testid="code-status-aggregator">
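
In the hunk above, the Badge turns destructive at 2 hours or more, while the Alert only appears above 2 hours, so at exactly 2.0 the badge is red but no alert is shown. A minimal sketch of deriving both from one threshold, following the Alert and the checklist's "more than 2 hours" rule; the helper names are hypothetical, and `hoursSinceLastRun` is assumed to be a numeric string, as the page's `parseFloat` calls suggest.

```typescript
// Hypothetical shared threshold: the Badge and the Alert both read it,
// so they cannot disagree about when the aggregator counts as idle.
const AGGREGATOR_IDLE_THRESHOLD_HOURS = 2;

// Parses hoursSinceLastRun (assumed to be a numeric string from the API).
// Returns null when the value is missing or not a number.
function parseHours(raw: string | null | undefined): number | null {
  if (raw == null || raw.trim() === "") return null;
  const hours = Number.parseFloat(raw);
  return Number.isFinite(hours) ? hours : null;
}

// Idle means "more than the threshold", matching the Alert's `> 2` check.
function isAggregatorIdle(
  raw: string | null | undefined,
  thresholdHours: number = AGGREGATOR_IDLE_THRESHOLD_HOURS,
): boolean {
  const hours = parseHours(raw);
  return hours !== null && hours > thresholdHours;
}

// Badge variant derived from the same predicate as the Alert.
function idleBadgeVariant(raw: string | null | undefined): "default" | "destructive" {
  return isAggregatorIdle(raw) ? "destructive" : "default";
}
```

In the page, `variant={idleBadgeVariant(details?.hoursSinceLastRun)}` and a guard such as `isAggregatorIdle(details?.hoursSinceLastRun) && (...)` would replace the two inline `parseFloat` comparisons, with `details` standing for `servicesStatus?.services.analyticsAggregator.details`.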

@@ -0,0 +1,223 @@
# ✅ IDS Deployment Checklist - AlmaLinux 9
## 📋 Complete Procedure for a Safe Deployment
### 1. **Pre-Deploy: Local Checks**
```bash
# On Replit: make sure the build completes without errors
npm run build
npm run db:push -- --force # Sync the database schema ("--" forwards --force to the script, not to npm)
```
### 2. **Commit and Push to GitLab**
```bash
# On Replit
./push-gitlab.sh
```
*A descriptive commit message that states the type of change is recommended.*

---
### 3. **Pull the Code on the Server**
```bash
# On the AlmaLinux server
cd /opt/ids
./deployment/update_from_git.sh
# If there are database migrations
./deployment/update_from_git.sh --db
```
---
### 4. **CRITICAL: Systemd Service Setup**
#### 4a. Python Services (ML Backend & Syslog Parser)
```bash
# On the first deploy OR after changing the .service files
sudo ./deployment/install_systemd_services.sh
```
#### 4b. ⚠️ **Analytics Aggregator Timer** (OFTEN FORGOTTEN!)
```bash
# IMPORTANT: must ALWAYS be run on the first deploy
sudo ./deployment/setup_analytics_timer.sh
# Check that it is active
sudo systemctl list-timers ids-analytics-aggregator.timer
```
**Why is it critical?**
- The Live Dashboard and Historical Analytics depend on hourly aggregations
- If the timer is not active → the data is stale or frozen!
- Last run more than 2 hours ago = serious problem
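
The 2-hour rule can also be checked straight from systemd rather than from the dashboard. A minimal sketch, assuming the installed systemd supports `systemctl show ids-analytics-aggregator.timer --property=LastTriggerUSec --timestamp=unix`, which prints a line like `LastTriggerUSec=@1764000000`; any other value (for example when the timer has never fired) is treated as "never run". The function names are hypothetical, and the command's output would be passed in as a string.

```typescript
// Threshold from the checklist: last run more than 2 hours ago is a serious problem.
const IDLE_THRESHOLD_HOURS = 2;
const MS_PER_HOUR = 3_600_000;

interface TimerVerdict {
  state: "never-run" | "ok" | "stale";
  hoursAgo: number | null; // null when the timer has never fired
}

// Parses "LastTriggerUSec=@<unix seconds>" (assumed --timestamp=unix format)
// into epoch milliseconds; returns null for anything else, e.g. "n/a".
function parseLastTrigger(output: string): number | null {
  const match = /^LastTriggerUSec=@(\d+(?:\.\d+)?)$/m.exec(output.trim());
  return match ? Math.round(Number(match[1]) * 1000) : null;
}

// Classifies the timer relative to a caller-supplied "now" (epoch ms).
function judgeTimer(output: string, nowMs: number): TimerVerdict {
  const lastMs = parseLastTrigger(output);
  if (lastMs === null) return { state: "never-run", hoursAgo: null };
  const hoursAgo = (nowMs - lastMs) / MS_PER_HOUR;
  return { state: hoursAgo > IDLE_THRESHOLD_HOURS ? "stale" : "ok", hoursAgo };
}
```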
---
### 5. **Restart Modified Services**
```bash
# If you changed the Python ML code
sudo systemctl restart ids-ml-backend
# If you changed syslog_parser.py
sudo systemctl restart ids-syslog-parser
# If you changed the frontend (Node.js)
./deployment/restart_frontend.sh
```
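The rules in step 5 amount to a small lookup from changed files to restart commands. A minimal sketch, assuming the changed paths come from something like `git diff --name-only <old>..<new>`, that any Python file other than `syslog_parser.py` belongs to the ML backend, and that TypeScript/JavaScript/CSS/HTML files and `package.json` belong to the Node.js frontend; the function name and these path rules are hypothetical.

```typescript
// Restart commands copied from step 5.
const RESTART_ML = "sudo systemctl restart ids-ml-backend";
const RESTART_PARSER = "sudo systemctl restart ids-syslog-parser";
const RESTART_FRONTEND = "./deployment/restart_frontend.sh";

// Hypothetical rules mapping each changed file to the restart it requires.
function restartCommandsFor(changedPaths: string[]): string[] {
  const commands = new Set<string>();
  for (const path of changedPaths) {
    const fileName = path.split("/").pop() ?? path;
    if (fileName === "syslog_parser.py") {
      commands.add(RESTART_PARSER);
    } else if (fileName.endsWith(".py")) {
      commands.add(RESTART_ML); // assumption: other Python files belong to the ML backend
    } else if (/\.(tsx?|jsx?|css|html)$/.test(fileName) || fileName === "package.json") {
      commands.add(RESTART_FRONTEND);
    }
    // Anything else (docs, Markdown, notes) needs no restart.
  }
  // A Set keeps insertion order, so commands appear in the order first needed.
  return Array.from(commands);
}
```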
---
### 6. **Post-Deploy Checks**
#### 6a. Check Service Status
```bash
# Check all services
sudo systemctl status ids-ml-backend
sudo systemctl status ids-syslog-parser
sudo systemctl status ids-analytics-aggregator.timer
# Check when the timer runs next
sudo systemctl list-timers | grep ids-analytics
```
**Expected Analytics Timer output:**
```
NEXT                        LEFT  LAST                        PASSED UNIT                           ACTIVATES
Mon 2025-11-24 17:05:00 CET 25min Mon 2025-11-24 16:05:00 CET 35min  ids-analytics-aggregator.timer ids-analytics-aggregator.service
```
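In `systemctl list-timers`, LEFT is the time from now until NEXT and PASSED is the time since LAST, so for an hourly timer the two should add up to roughly 60 minutes, which is a quick sanity check on that output. A minimal sketch that computes both columns, plus the final checklist's "next run less than 1 hour away" condition, from three timestamps; the function name is hypothetical.

```typescript
const MS_PER_MINUTE = 60_000;

interface TimerColumns {
  leftMin: number; // minutes from now until NEXT
  passedMin: number; // minutes since LAST
  nextWithinHour: boolean; // final-checklist condition
}

// Hypothetical helper mirroring the LEFT/PASSED columns of `systemctl list-timers`.
function timerColumns(lastMs: number, nextMs: number, nowMs: number): TimerColumns {
  const leftMs = nextMs - nowMs;
  return {
    leftMin: Math.round(leftMs / MS_PER_MINUTE),
    passedMin: Math.round((nowMs - lastMs) / MS_PER_MINUTE),
    nextWithinHour: leftMs >= 0 && leftMs < 60 * MS_PER_MINUTE,
  };
}
```

For example, with LAST at 16:05, NEXT at 17:05 and the current time at 16:40, it returns 25 minutes left and 35 minutes passed.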
#### 6b. Check Logs (first 2-3 minutes)
```bash
# ML Backend
tail -f /var/log/ids/backend.log
# Syslog Parser
tail -f /var/log/ids/syslog_parser.log
# Analytics Aggregator (journal)
journalctl -u ids-analytics-aggregator -n 50
```
#### 6c. Test API Endpoints
```bash
# Health checks
curl http://localhost:5000/api/stats
curl http://localhost:8000/health
# Check Analytics
curl http://localhost:5000/api/analytics/recent | jq '.[] | length'
```
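The checks in 6c and in the final checklist reduce to "HTTP 200, and for the ML API a body of `{"status":"healthy"}`". A minimal sketch of that evaluation, assuming the probes are collected elsewhere (for example with `fetch` in Node.js 18+, recording `res.status` and `await res.text()`); the probe names and result shape are hypothetical.

```typescript
// One probe per endpoint; status 0 stands for a failed connection (assumed convention).
interface ProbeResult {
  name: string;
  status: number;
  body: string;
}

type BodyCheck = (body: string) => boolean;

// Expected body of http://localhost:8000/health according to the final checklist.
const isHealthy: BodyCheck = (body) => {
  try {
    return JSON.parse(body)?.status === "healthy";
  } catch {
    return false; // not JSON at all
  }
};

// Returns one line per failed probe; an empty array means everything passed.
function failedProbes(results: ProbeResult[], bodyChecks: Record<string, BodyCheck>): string[] {
  const failures: string[] = [];
  for (const result of results) {
    if (result.status !== 200) {
      failures.push(`${result.name}: HTTP ${result.status}`);
      continue;
    }
    const check = bodyChecks[result.name];
    if (check !== undefined && !check(result.body)) {
      failures.push(`${result.name}: unexpected body ${result.body}`);
    }
  }
  return failures;
}
```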
#### 6d. Check Database
```bash
# Check the critical tables
sudo -u postgres psql ids -c "\dt"
# Check the latest aggregations (row count, then the newest date/hour pair)
sudo -u postgres psql ids -c "SELECT COUNT(*) FROM network_analytics;"
sudo -u postgres psql ids -c "SELECT date, hour FROM network_analytics ORDER BY date DESC, hour DESC LIMIT 1;"
# Check the latest detections
sudo -u postgres psql ids -c "SELECT COUNT(*), MAX(detected_at) FROM detections;"
```
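The newest aggregation is the row with the greatest `(date, hour)` pair taken together. Taking `MAX(date)` and `MAX(hour)` independently can pair the latest date with an hour that only exists on an earlier day, for instance 2025-11-24 with hour 23 when that day's last row is hour 15. The sketch below contrasts the two on rows already fetched from `network_analytics`, assuming `date` comes back as an ISO `YYYY-MM-DD` string and `hour` as an integer; the function names are hypothetical.

```typescript
interface AggregationRow {
  date: string; // "YYYY-MM-DD" (assumed), so string order equals date order
  hour: number; // 0-23
}

// Pitfall: each maximum is taken on its own, so the pair may not exist as a row.
function independentMaxima(rows: AggregationRow[]): AggregationRow | null {
  if (rows.length === 0) return null;
  return {
    date: rows.map((r) => r.date).reduce((a, b) => (b > a ? b : a)),
    hour: Math.max(...rows.map((r) => r.hour)),
  };
}

// Compares (date, hour) lexicographically, like ORDER BY date DESC, hour DESC LIMIT 1.
function latestAggregation(rows: AggregationRow[]): AggregationRow | null {
  let latest: AggregationRow | null = null;
  for (const row of rows) {
    if (
      latest === null ||
      row.date > latest.date ||
      (row.date === latest.date && row.hour > latest.hour)
    ) {
      latest = row;
    }
  }
  return latest;
}
```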
---
### 7. **Common Troubleshooting**
#### Problem: Analytics Aggregator is not running
```bash
# Fix
sudo ./deployment/setup_analytics_timer.sh
# Force an immediate run
sudo systemctl start ids-analytics-aggregator
# Check log
journalctl -u ids-analytics-aggregator -n 50
```
#### Problem: ML Backend crash loop
```bash
# Check the log for the error
tail -n 100 /var/log/ids/backend.log
# Often a .env or venv problem
ls -la /opt/ids/.env # Must exist, with 600 permissions (-rw-------)
ls -la /opt/ids/python_ml/venv/ # Must exist
```
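"600 permissions" means read and write for the owner only (`-rw-------` in `ls -la`). A minimal sketch of the same check on a numeric mode, such as the one returned by Node's `fs.statSync(path).mode`; only the low nine bits hold the owner/group/other permissions, so the file-type bits are masked off first. The function names are hypothetical.

```typescript
// Owner/group/other permission bits of a POSIX st_mode; higher bits encode the file type.
function permissionBits(mode: number): number {
  return mode & 0o777;
}

// True when only the owner can read and write: the "600 permissions" the checklist asks for.
function isOwnerReadWriteOnly(mode: number): boolean {
  return permissionBits(mode) === 0o600;
}

// Formats the bits as a three-digit octal string, e.g. "600".
function octalPermissions(mode: number): string {
  return permissionBits(mode).toString(8).padStart(3, "0");
}
```

On the server itself, `stat -c '%a' /opt/ids/.env` should print `600`, and `chmod 600 /opt/ids/.env` restores it.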
#### Problem: Syslog Parser is not processing logs
```bash
# Check that rsyslog is receiving data
tail -f /var/log/mikrotik/raw.log
# Check that the parser is running
ps aux | grep syslog_parser | grep -v grep
# Check log file permissions
ls -la /var/log/mikrotik/
```
---
### 8. **Final Checklist (Before Declaring the Deploy OK)**
- [ ] ML Backend: `systemctl status ids-ml-backend` → **active (running)**
- [ ] Syslog Parser: `systemctl status ids-syslog-parser` → **active (running)**
- [ ] Analytics Timer: `systemctl status ids-analytics-aggregator.timer` → **active (waiting)**
- [ ] Next timer run: `systemctl list-timers` → shows the next run less than 1 hour away
- [ ] Frontend: `curl -s -o /dev/null -w '%{http_code}' http://localhost:5000/` → **200**
- [ ] ML API: `curl http://localhost:8000/health` → **{"status":"healthy"}**
- [ ] Database: `psql $DATABASE_URL -c "SELECT 1"` → one row, `?column?` = **1**
- [ ] Analytics data: last aggregation less than 2 hours ago
- [ ] Logs: no critical errors in the last 5 minutes
- [ ] Web UI: Dashboard and Analytics load without errors
---
## 🚨 Common Mistakes to Avoid
1. **Forgetting setup_analytics_timer.sh** → dashboards frozen!
2. Not checking the systemd timer after the deploy
3. Not checking the logs after restarting services
4. Not testing the API endpoints before declaring the deploy OK
5. Editing .env without keeping it at chmod 600
6. Running `git pull` instead of `./deployment/update_from_git.sh`
---
## 📊 Continuous Monitoring
```bash
# Full debug script
./deployment/debug_system.sh
# Hourly system health check: crontab entry (add it with `crontab -e`)
0 * * * * /opt/ids/deployment/check_backend.sh
```
---
## 🆘 In Case of Emergency
```bash
# Full restart of the IDS system
sudo ./deployment/restart_all.sh
# Back up the database BEFORE any drastic intervention
./deployment/backup_db.sh
# Restore from a backup
pg_restore -U postgres -d ids /backup/ids_backup_YYYYMMDD.dump
```
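The `YYYYMMDD` stamp in the backup file name sorts chronologically as plain text, so choosing the file to restore is a filter plus a sort. A minimal sketch, assuming `backup_db.sh` writes files named `ids_backup_YYYYMMDD.dump` (the pattern in the restore command) in an archive format that `pg_restore` can read, such as `pg_dump -Fc` output, since `pg_restore` does not accept plain SQL dumps. The function name is hypothetical, and only one backup per day can be told apart by this naming.

```typescript
// Matches the naming used in the restore example: ids_backup_YYYYMMDD.dump
const BACKUP_FILE = /^ids_backup_\d{8}\.dump$/;

// Returns the newest matching file name, or null if there is none.
// Lexicographic order equals date order for zero-padded YYYYMMDD stamps.
function latestBackup(fileNames: string[]): string | null {
  const backups = fileNames.filter((name) => BACKUP_FILE.test(name)).sort();
  return backups.pop() ?? null;
}
```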
---
**Last updated:** 24 November 2025
**Version:** 1.0.0