marco370 3de433f278 Fix analytics data inconsistency on live dashboard
Update analytics aggregator to correctly count attack occurrences and fix type hinting for daily aggregation.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 7a657272-55ba-4a79-9a2e-f1ed9bc7a528
Replit-Commit-Checkpoint-Type: intermediate_checkpoint
Replit-Commit-Event-Id: f54e3953-68c3-42e1-be9d-1d1db98db671
Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/449cf7c4-c97a-45ae-8234-e5c5b8d6a84f/7a657272-55ba-4a79-9a2e-f1ed9bc7a528/F6DiMv4
2025-11-24 15:28:22 +00:00


# IDS - Intrusion Detection System
## Overview
This project is a full-stack web application that implements a machine-learning Intrusion Detection System (IDS) for MikroTik routers. Its core function is to monitor network traffic, identify anomalies that indicate intrusions, and automatically block malicious IP addresses across multiple routers. It provides real-time monitoring, anomaly detection, and centralized network security management for MikroTik environments, along with IP geolocation and service health monitoring.
## User Preferences
### Git Operations and Deployment
- **IMPORTANT**: The agent must NOT run git commands (push-gitlab.sh) because Replit blocks git operations
- **Correct workflow**:
1. The user reports errors/issues from the AlmaLinux server
2. The agent fixes the issues and edits the files on Replit
3. **The user runs manually**: `./push-gitlab.sh` to commit and push
4. **The user runs on the server**: `./update_from_git.sh` or `./update_from_git.sh --db`
5. The user tests and reports the results to the agent
6. Repeat until everything works
### Language
- All agent responses must be in **Italian**
- Code and technical documentation: English
- Commit messages: Italian
## System Architecture
The IDS uses a React frontend, built with ShadCN UI and TanStack Query, for real-time monitoring, detection visualization, and whitelist management. The backend is split into two services. A Python FastAPI service handles ML analysis (Isolation Forest with 25 targeted features), MikroTik API management, and a detection engine that scores anomalies from 0 to 100 across five risk levels. A Node.js (Express) service handles API requests from the frontend, manages the PostgreSQL database, and coordinates service operations.
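The mapping from a 0 to 100 score onto five risk levels can be sketched as a small threshold table. This is a minimal sketch, not the project's detection engine: only the `critical` level at score >= 80 (which triggers automatic blocking) comes from this document; the other level names and the 60/40/20 cut-offs are hypothetical placeholders.

```python
from enum import Enum


class RiskLevel(str, Enum):
    # Five levels; only CRITICAL (score >= 80) is confirmed by this document.
    NORMAL = "normal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Lower bounds, checked from highest to lowest. The 80 cut-off is documented;
# 60/40/20 are hypothetical placeholders.
THRESHOLDS = (
    (80.0, RiskLevel.CRITICAL),
    (60.0, RiskLevel.HIGH),
    (40.0, RiskLevel.MEDIUM),
    (20.0, RiskLevel.LOW),
    (0.0, RiskLevel.NORMAL),
)
AUTO_BLOCK_THRESHOLD = 80.0


def classify(score: float) -> RiskLevel:
    """Map a 0-100 anomaly score to a risk level."""
    if not 0.0 <= score <= 100.0:  # also rejects NaN
        raise ValueError(f"risk score out of range: {score}")
    for lower_bound, level in THRESHOLDS:
        if score >= lower_bound:
            return level
    raise AssertionError("unreachable: the 0.0 bound always matches")


def should_auto_block(score: float) -> bool:
    """Critical IPs are blocked automatically."""
    return classify(score) is RiskLevel.CRITICAL
```

Keeping the thresholds in one ordered table means the auto-block rule and the level shown in the UI cannot drift apart.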
**Key Architectural Decisions & Features:**
- **Log Collection & Processing**: MikroTik syslog data (UDP:514) is sent to RSyslog, parsed by `syslog_parser.py`, and stored in PostgreSQL. The parser includes auto-cleanup with a 3-day retention policy.
- **Machine Learning**: An Isolation Forest model trained on 25 network log features performs real-time anomaly detection, assigning a risk score.
- **Automated Blocking**: Critical IPs (score >= 80) are automatically blocked in parallel across all configured MikroTik routers via their REST API.
- **Service Monitoring & Management**: A dashboard shows real-time status (green/red indicators) for the ML Backend, Database, and Syslog Parser. Python services can be started, stopped, or restarted through API endpoints secured with API key authentication; systemd integration provides production-grade control and automatic restarts.
- **IP Geolocation**: Detection data is enriched with geographic and Autonomous System (AS) information from `ip-api.com`, with results cached to avoid repeated lookups.
- **Database Management**: PostgreSQL stores all persistent data. A database versioning system tracks which SQL migrations have been applied and runs only new scripts. Dual-mode database drivers (`@neondatabase/serverless` on Replit, `pg` on AlmaLinux) keep the code compatible with both environments.
- **Microservices**: Clear separation of concerns between the Python ML backend and the Node.js API backend.
- **UI/UX**: Utilizes ShadCN UI for a modern component library and `react-hook-form` with Zod for robust form validation.
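The **Automated Blocking** item (critical IPs blocked in parallel on every router via the REST API) can be sketched with a thread pool that fans one request out per router. This is a hedged sketch, not the project's code: the address-list name `ids_blocked`, the `Router` type, and the injected `transport` callable are hypothetical. It assumes the RouterOS v7 REST API, where `PUT` on a menu path such as `/rest/ip/firewall/address-list` creates a new entry.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

AUTO_BLOCK_THRESHOLD = 80.0  # documented: critical IPs (score >= 80) are blocked
BLOCK_LIST = "ids_blocked"   # hypothetical address-list name


@dataclass(frozen=True)
class Router:
    name: str
    host: str


# Hypothetical transport: (method, url, json_body) -> HTTP status code.
# Injected so the fan-out can be tested without real routers; a production
# transport would also handle credentials, TLS, and timeouts.
Transport = Callable[[str, str, dict], int]


def block_request(router: Router, ip: str) -> Tuple[str, str, dict]:
    """Build a RouterOS v7 REST call that adds `ip` to a firewall address list."""
    # In the RouterOS REST API, PUT on a menu path creates a new entry.
    url = f"https://{router.host}/rest/ip/firewall/address-list"
    body = {"list": BLOCK_LIST, "address": ip, "comment": "IDS auto-block"}
    return "PUT", url, body


def block_everywhere(ip: str, score: float, routers: Sequence[Router],
                     transport: Transport) -> Dict[str, bool]:
    """Block `ip` on every router in parallel; return {router name: success}."""
    if score < AUTO_BLOCK_THRESHOLD:
        return {}

    def block_one(router: Router) -> Tuple[str, bool]:
        method, url, body = block_request(router, ip)
        try:
            status = transport(method, url, body)
        except Exception:  # one unreachable router must not abort the others
            return router.name, False
        return router.name, 200 <= status < 300

    with ThreadPoolExecutor(max_workers=max(1, len(routers))) as pool:
        return dict(pool.map(block_one, routers))
```

A real transport could wrap `requests.request(method, url, json=body, auth=(user, password), timeout=5)`. Note that adding an address to a list only blocks traffic if a firewall rule drops packets whose source is in that list; this sketch assumes such a rule already exists on each router.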
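The **Database Management** item (apply only new SQL scripts) boils down to comparing numbered migration files against the last applied version. A minimal sketch, assuming a `NNN_description.sql` naming scheme (suggested by the "Migration 004"/"Migration 005" references below) and a stored integer version; the function name and file names are illustrative, not taken from the repository.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed naming convention: "<number>_<description>.sql", e.g. "005_network_analytics.sql".
MIGRATION_FILE = re.compile(r"^(\d+)_[\w-]+\.sql$")


def pending_migrations(filenames: Iterable[str],
                       applied_version: int) -> List[Tuple[int, str]]:
    """Return (version, filename) pairs newer than `applied_version`, in order."""
    by_version: Dict[int, str] = {}
    for name in filenames:
        match = MIGRATION_FILE.match(name)
        if match is None:
            continue  # skip README files and anything not following the convention
        version = int(match.group(1))
        if version in by_version:
            raise ValueError(
                f"duplicate migration version {version}: {by_version[version]}, {name}")
        by_version[version] = name
    return [(v, by_version[v]) for v in sorted(by_version) if v > applied_version]
```

Each pending script would then be applied in its own transaction, updating the stored version only after it succeeds, so a failed migration can be rerun safely.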
## External Dependencies
- **React**: Frontend framework.
- **FastAPI**: Python web framework for the ML backend.
- **PostgreSQL**: Primary database for storing configurations, logs, detections, and whitelist entries.
- **MikroTik REST API**: For router communication and IP blocking.
- **ShadCN UI**: Frontend component library.
- **TanStack Query**: Data fetching for the frontend.
- **Isolation Forest**: Machine Learning algorithm for anomaly detection.
- **RSyslog**: Log collection daemon.
- **Drizzle ORM**: For database schema definition in Node.js.
- **Neon Database**: Cloud-native PostgreSQL service (used in Replit).
- **pg (Node.js driver)**: Standard PostgreSQL driver for Node.js (used in AlmaLinux).
- **psycopg2**: PostgreSQL adapter for Python.
- **ip-api.com**: External API for IP geolocation data.
- **Recharts**: Charting library for analytics visualization.
## Recent Updates (November 2025)
### 🔧 Analytics Aggregator Fix - Data Consistency (24 Nov 2025 - 17:00)
- **CRITICAL BUG FIX**: Fixed a data mismatch on the Live Dashboard
- **Problem**: The traffic distribution showed 262k attacks, but the breakdown showed only 19
- **ROOT CAUSE**: The aggregator counted **occurrences** instead of **packets** in `attacks_by_type` and `attacks_by_country`
- **Solution**:
1. Moved the counting from the detections loop to the packets loop
2. `attacks_by_type[tipo] += packets` (not +1!)
3. `attacks_by_country[paese] += packets` (not +1!)
4. Fallback to "unknown"/"Unknown" for missing data (type/geo)
5. Validation logging: checks that breakdown_total == attack_packets
- **Mathematical invariant**: `Σ(attacks_by_type) == Σ(attacks_by_country) == attack_packets`
- **Files modified**: `python_ml/analytics_aggregator.py`
- **Deploy**: Restart the ML backend and run the aggregator manually to test
- **Validation**: The log shows `match: True` and no mismatch warnings
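The fix and its invariant can be illustrated with a small packet-weighted aggregation. This is a sketch under assumptions: `AttackRecord`, `AttackBreakdown`, and the log wording are hypothetical stand-ins for whatever rows and messages `analytics_aggregator.py` actually uses. It shows the two points above: each row is weighted by its packet count instead of `+1`, and missing type/country values fall back to `"unknown"`/`"Unknown"`.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Optional

log = logging.getLogger("analytics_aggregator_sketch")


@dataclass(frozen=True)
class AttackRecord:
    """Hypothetical attack row for one hour: type, origin country, packet count."""
    attack_type: Optional[str]
    country: Optional[str]
    packets: int


@dataclass
class AttackBreakdown:
    attack_packets: int = 0
    attacks_by_type: Counter = field(default_factory=Counter)
    attacks_by_country: Counter = field(default_factory=Counter)


def aggregate_attacks(records: Iterable[AttackRecord]) -> AttackBreakdown:
    out = AttackBreakdown()
    for r in records:
        # Weight by packets; the bug was incrementing by 1 per occurrence.
        out.attacks_by_type[r.attack_type or "unknown"] += r.packets
        out.attacks_by_country[r.country or "Unknown"] += r.packets
        out.attack_packets += r.packets
    return out


def breakdown_matches(b: AttackBreakdown) -> bool:
    """Check sum(by_type) == sum(by_country) == attack_packets and log the result."""
    by_type = sum(b.attacks_by_type.values())
    by_country = sum(b.attacks_by_country.values())
    match = by_type == by_country == b.attack_packets
    log.info("breakdown_total=%d attack_packets=%d match: %s",
             by_type, b.attack_packets, match)
    if not match:
        log.warning("breakdown mismatch: by_type=%d by_country=%d attack_packets=%d",
                    by_type, by_country, b.attack_packets)
    return match
```

In this sketch the invariant holds by construction, because all three totals are accumulated in the same loop. The check still earns its place as a regression guard: reintroducing a `+= 1` on either breakdown makes it fail immediately.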
### 📊 Network Analytics & Dashboard System (24 Nov 2025 - 11:30)
- **Complete Feature**: Analytics system covering both normal traffic and attacks, with advanced charts and permanently retained data
- **Components**:
1. **Database**: `network_analytics` table with permanent hourly/daily aggregations
2. **Python Aggregator**: `analytics_aggregator.py` classifies traffic every hour
3. **Systemd Timer**: Runs automatically every hour (at minute :05)
4. **API**: `/api/analytics/recent` and `/api/analytics/range`
5. **Frontend**: Live Dashboard (real-time, last 3 days) + Historical Analytics (permanent)
- **Charts**: Area Chart, Pie Chart, Bar Chart, Line Chart, Real-time Stream
- **Flag Emoji**: 🇮🇹🇺🇸🇷🇺🇨🇳 for immediate identification of the country of origin
- **Deploy**: Migration 005 + `./deployment/setup_analytics_timer.sh`
- **Security Fix**: Removed a hardcoded path; added a secure wrapper script, `run_analytics.sh`, for manual runs
- **Production-grade**: Credentials are supplied via the systemd EnvironmentFile (automatic runs) or the wrapper script (manual runs)
- **Frontend Fix**: Analytics History now uses hourly data (`hourly: true`) until daily aggregation is scheduled
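The hourly/daily split above can be sketched as two steps: truncate timestamps to an hourly bucket, then roll hourly rows up into daily totals once daily aggregation runs. This is a minimal sketch. The `HourlyRow`/`DailyRow` fields are hypothetical and do not reflect the real `network_analytics` schema. It also assumes buckets are keyed in UTC.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Dict, Iterable, List


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its UTC hour."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


@dataclass(frozen=True)
class HourlyRow:  # hypothetical hourly aggregate
    hour: datetime
    normal_packets: int
    attack_packets: int


@dataclass(frozen=True)
class DailyRow:  # hypothetical daily aggregate
    day: date
    normal_packets: int
    attack_packets: int
    hourly_rows: int  # how many hourly rows fed this day


def rollup_daily(rows: Iterable[HourlyRow]) -> List[DailyRow]:
    """Sum hourly rows per UTC calendar day, sorted by day."""
    totals: Dict[date, List[int]] = defaultdict(lambda: [0, 0, 0])
    for row in rows:
        day = hour_bucket(row.hour).date()
        t = totals[day]
        t[0] += row.normal_packets
        t[1] += row.attack_packets
        t[2] += 1
    return [DailyRow(day, n, a, h) for day, (n, a, h) in sorted(totals.items())]
```

Exposing `hourly_rows` lets the frontend tell a complete day (24 rows) from a partial one, which matters while the history view still reads hourly data.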
### 🌍 IP Geolocation Integration (22 Nov 2025 - 13:00)
- **Feature**: Complete geographic information (country, city, organization, AS) for every IP
- **API**: ip-api.com with asynchronous batch lookups (100 IPs in ~1.5 s instead of 150 s!)
- **Performance**: Smart caching with a robust fallback
- **Display**: Globe/Building/MapPin icons on the Detections page
- **Deploy**: Migration 004 + ML backend restart
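The batch lookup with caching and fallback can be sketched with `asyncio`. To the best of our knowledge, ip-api.com's batch endpoint (`http://ip-api.com/batch`) accepts a POSTed JSON array of up to 100 IPs and returns one object per IP with `status`, `query`, and fields such as `country`, `city`, `org`, and `as`. The `GeoCache` class and the injected `fetch_batch` coroutine are hypothetical; the project's own implementation may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List

BATCH_URL = "http://ip-api.com/batch"  # POST a JSON array of IPs (max 100 per request)
MAX_BATCH = 100
GEO_FIELDS = ("country", "city", "org", "as")
FALLBACK = {k: None for k in GEO_FIELDS}

# Hypothetical transport: POSTs one batch to BATCH_URL and returns the decoded JSON list.
FetchBatch = Callable[[List[str]], Awaitable[List[dict]]]


class GeoCache:
    def __init__(self, fetch_batch: FetchBatch) -> None:
        self._fetch = fetch_batch
        self._cache: Dict[str, dict] = {}

    async def lookup(self, ips: Iterable[str]) -> Dict[str, dict]:
        ips = list(ips)
        missing = sorted({ip for ip in ips if ip not in self._cache})
        chunks = [missing[i:i + MAX_BATCH] for i in range(0, len(missing), MAX_BATCH)]
        # Send all batches concurrently; one failed batch must not sink the others.
        results = await asyncio.gather(*(self._fetch(c) for c in chunks),
                                       return_exceptions=True)
        for result in results:
            if isinstance(result, BaseException):
                continue  # transport error: leave uncached so a later call retries
            for entry in result:
                ip = entry.get("query")
                if ip is None:
                    continue
                if entry.get("status") == "success":
                    self._cache[ip] = {k: entry.get(k) for k in GEO_FIELDS}
                else:
                    # e.g. private or reserved ranges: cache the fallback to avoid re-querying
                    self._cache[ip] = dict(FALLBACK)
        return {ip: self._cache.get(ip, dict(FALLBACK)) for ip in ips}
```

A real `fetch_batch` could use aiohttp: `async with session.post(BATCH_URL, json=batch) as resp: return await resp.json()`. Transport errors are deliberately not cached, so a temporary outage degrades to the fallback without permanently hiding an IP's location.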