# IDS - Intrusion Detection System

## Overview

This project is a full-stack web application that implements a machine-learning-based Intrusion Detection System (IDS) for MikroTik routers. It monitors network traffic, identifies anomalies that indicate intrusions, and automatically blocks malicious IP addresses across multiple routers. The system aims to provide real-time monitoring, efficient anomaly detection, and streamlined network security management for MikroTik environments, plus IP geolocation and robust service monitoring.
## User Preferences

### Git Operations and Deployment

- **IMPORTANT**: The agent must NOT run git commands (e.g., `push-gitlab.sh`), because Replit blocks git operations.
- **Correct workflow**:
   1. The user reports errors/problems from the AlmaLinux server.
   2. The agent fixes the problems and edits the files on Replit.
   3. **The user runs manually**: `./push-gitlab.sh` to commit and push.
   4. **The user runs on the server**: `./update_from_git.sh` or `./update_from_git.sh --db`.
   5. The user tests and reports the results back to the agent.
   6. Repeat until everything works.
### Language

- All agent responses must be in **Italian**.
- Code and technical documentation: English.
- Commit messages: Italian.
## System Architecture

The IDS uses a React frontend, built with ShadCN UI and TanStack Query, for real-time monitoring, detection visualization, and whitelist management. The Python backend is a FastAPI service that handles ML analysis (Isolation Forest over 25 targeted features), MikroTik API management, and a detection engine that scores anomalies from 0 to 100 across five risk levels. A Node.js (Express) backend handles API requests from the frontend, manages the PostgreSQL database, and coordinates service operations.

**Key Architectural Decisions & Features:**

- **Log Collection & Processing**: MikroTik routers send syslog data over UDP port 514 to RSyslog. `syslog_parser.py` parses it and stores it in PostgreSQL, and the parser automatically purges log data older than 3 days.
- **Machine Learning**: An Isolation Forest model trained on 25 network log features performs real-time anomaly detection and assigns a risk score.
- **Automated Blocking**: Critical IPs (score >= 80) are automatically blocked in parallel on all configured MikroTik routers via their REST API.
- **Service Monitoring & Management**: A dashboard shows real-time status (green/red indicators) for the ML Backend, Database, and Syslog Parser. Python services can be started, stopped, and restarted through API endpoints secured with API key authentication. Systemd integration provides production-grade process control and automatic restarts.
- **IP Geolocation**: Integrates `ip-api.com` to enrich detection data with geographic and Autonomous System (AS) information, and caches lookup results.
- **Database Management**: PostgreSQL stores all persistent data. A database versioning system tracks which SQL migrations have been applied, so only new scripts are run. Dual-mode database drivers (`@neondatabase/serverless` on Replit, `pg` on AlmaLinux) keep the code compatible with both environments.
- **Microservices**: Clear separation of concerns between the Python ML backend and the Node.js API backend.
- **UI/UX**: Uses ShadCN UI as the component library and `react-hook-form` with Zod for form validation.
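The scoring and blocking rules above can be made more concrete with a minimal Python sketch. It assumes a 0-100 score per source IP. The document fixes only the critical threshold, where a score of 80 or more triggers a block. The other four level names and cut-offs are hypothetical placeholders. The `block_fn` callback is also hypothetical: it stands in for the per-router REST call. None of the function names come from the project.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Only the ">= 80 means critical, block it" rule comes from the document.
# The other four level names and cut-offs are hypothetical placeholders.
RISK_LEVELS = [
    (80.0, "critical"),
    (60.0, "high"),
    (40.0, "medium"),
    (20.0, "low"),
    (0.0, "info"),
]
AUTO_BLOCK_THRESHOLD = 80.0


@dataclass(frozen=True)
class Verdict:
    ip: str
    score: float
    level: str
    block: bool


def classify(ip: str, score: float) -> Verdict:
    """Map a 0-100 anomaly score to one of five risk levels plus a block decision."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be in [0, 100], got {score}")
    # RISK_LEVELS is sorted from highest to lowest threshold, so the first match wins.
    level = next(name for threshold, name in RISK_LEVELS if score >= threshold)
    return Verdict(ip, score, level, score >= AUTO_BLOCK_THRESHOLD)


def block_on_all_routers(
    ip: str,
    routers: Sequence[str],
    block_fn: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Call the hypothetical block_fn(router, ip) for every router concurrently.

    A router whose call raises is reported as False instead of aborting the others.
    """
    def attempt(router: str):
        try:
            return router, bool(block_fn(router, ip))
        except Exception:
            return router, False

    with ThreadPoolExecutor(max_workers=max(1, len(routers))) as pool:
        return dict(pool.map(attempt, routers))
```

For example, `classify("203.0.113.7", 91.5)` returns a `critical` verdict with `block=True`. Only in that case would the caller invoke `block_on_all_routers`. The thread pool gives the "in parallel" behavior, and catching exceptions per router keeps one failing router from aborting the others.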
## External Dependencies

- **React**: Frontend framework.
- **FastAPI**: Python web framework for the ML backend.
- **PostgreSQL**: Primary database for storing configurations, logs, detections, and whitelist entries.
- **MikroTik REST API**: For router communication and IP blocking.
- **ShadCN UI**: Frontend component library.
- **TanStack Query**: Data-fetching library for the frontend.
- **Isolation Forest**: Machine learning algorithm for anomaly detection.
- **RSyslog**: Log collection daemon.
- **Drizzle ORM**: For database schema definition in Node.js.
- **Neon Database**: Cloud-native PostgreSQL service (used on Replit).
- **pg (Node.js driver)**: Standard PostgreSQL driver for Node.js (used on AlmaLinux).
- **psycopg2**: PostgreSQL adapter for Python.
- **ip-api.com**: External API for IP geolocation data.
- **Recharts**: Charting library for analytics visualization.
## Recent Updates (November 2025)

### 🔧 Analytics Aggregator Fix - Data Consistency (24 Nov 2025 - 17:00)

- **CRITICAL BUG FIX**: Resolved a data mismatch in the Live Dashboard
- **Problem**: The traffic distribution showed 262k attacks, but the breakdown showed only 19
- **Root cause**: The aggregator counted **occurrences** instead of **packets** in `attacks_by_type` and `attacks_by_country`
- **Solution**:
   1. Moved the counting from the detections loop to the packets loop
   2. `attacks_by_type[tipo] += packets` (not +1!)
   3. `attacks_by_country[paese] += packets` (not +1!)
   4. "unknown"/"Unknown" fallback for missing data (type/geo)
   5. Validation logging: checks that breakdown_total == attack_packets
- **Mathematical invariant**: `Σ(attacks_by_type) == Σ(attacks_by_country) == attack_packets`
- **Files changed**: `python_ml/analytics_aggregator.py`
- **Deploy**: Restart the ML backend and run the aggregator manually to test
- **Validation**: The log shows `match: True` and no mismatch warnings
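The fix can be illustrated with a minimal sketch of packet-weighted aggregation. The row shape (`packets`, `attack_type`, `country`) and the function name are assumptions, not the actual code in `python_ml/analytics_aggregator.py`. The `+= packets` updates, the `"unknown"`/`"Unknown"` fallbacks, and the invariant check follow the list above.

```python
import logging
from collections import Counter
from typing import Iterable, Mapping

logger = logging.getLogger(__name__)


def aggregate_attacks(rows: Iterable[Mapping]) -> dict:
    """Build packet-weighted attack breakdowns and check that they add up.

    Each row is assumed (hypothetically) to look like
    {"packets": int, "attack_type": str | None, "country": str | None}.
    """
    attacks_by_type: Counter = Counter()
    attacks_by_country: Counter = Counter()
    attack_packets = 0

    for row in rows:
        packets = int(row.get("packets") or 0)
        # Fallback labels for missing classification / geolocation data.
        attack_type = row.get("attack_type") or "unknown"
        country = row.get("country") or "Unknown"
        attacks_by_type[attack_type] += packets  # weight by packets, not += 1
        attacks_by_country[country] += packets
        attack_packets += packets

    type_total = sum(attacks_by_type.values())
    country_total = sum(attacks_by_country.values())
    match = type_total == country_total == attack_packets
    if match:
        logger.info("breakdown_total=%d attack_packets=%d match: True", type_total, attack_packets)
    else:
        logger.warning(
            "breakdown mismatch: by_type=%d by_country=%d attack_packets=%d",
            type_total, country_total, attack_packets,
        )
    return {
        "attacks_by_type": dict(attacks_by_type),
        "attacks_by_country": dict(attacks_by_country),
        "attack_packets": attack_packets,
        "match": match,
    }
```

Every row feeds all three totals, so the invariant holds by construction in this sketch. The check mainly guards against regressions, such as the breakdown being computed in a different loop from the total again, which is what caused the original bug.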
### 📊 Network Analytics & Dashboard System (24 Nov 2025 - 11:30)

- **Complete feature**: Analytics system covering both normal and attack traffic, with advanced charts and permanently stored data
- **Components**:
   1. **Database**: `network_analytics` table with permanent hourly/daily aggregations
   2. **Python aggregator**: `analytics_aggregator.py` classifies traffic every hour
   3. **Systemd timer**: Runs automatically every hour (at minute :05)
   4. **API**: `/api/analytics/recent` and `/api/analytics/range`
   5. **Frontend**: Live Dashboard (real-time, last 3 days) + Historical Analytics (permanent)
- **Charts**: Area Chart, Pie Chart, Bar Chart, Line Chart, Real-time Stream
- **Flag emoji**: 🇮🇹🇺🇸🇷🇺🇨🇳 to identify the country of origin at a glance
- **Deploy**: Migration 005 + `./deployment/setup_analytics_timer.sh`
- **Security fix**: Removed a hardcoded path and added a secure wrapper script, `run_analytics.sh`, for manual runs
- **Production-grade**: Credentials are supplied via a systemd EnvironmentFile (automatic runs) or the wrapper script (manual runs)
- **Frontend fix**: Analytics History now uses hourly data (`hourly: true`) until daily aggregation is scheduled
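One plausible reading of the timer schedule is that each run at :05 aggregates the hour that just ended. The sketch below assumes that reading. It also assumes a hypothetical event shape, `(timestamp, packets, is_attack)`. It shows hourly bucketing of normal and attack packets and is not the actual `analytics_aggregator.py` logic.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta, timezone


def previous_hour_window(now: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) interval of the last complete hour before `now`."""
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def hourly_buckets(events, start: datetime, end: datetime) -> dict:
    """Sum normal vs attack packets per hour for events inside [start, end).

    `events` is an iterable of hypothetical (timestamp, packets, is_attack) tuples.
    """
    buckets = defaultdict(lambda: {"normal_packets": 0, "attack_packets": 0})
    for ts, packets, is_attack in events:
        if not (start <= ts < end):
            continue  # outside the window being aggregated
        hour = ts.replace(minute=0, second=0, microsecond=0)
        key = "attack_packets" if is_attack else "normal_packets"
        buckets[hour][key] += packets
    return dict(buckets)


if __name__ == "__main__":
    start, end = previous_hour_window(datetime.now(timezone.utc))
    print(f"would aggregate {start.isoformat()} .. {end.isoformat()}")
```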
### 🌍 IP Geolocation Integration (22 Nov 2025 - 13:00)

- **Feature**: Complete geographic information (country, city, organization, AS) for every IP
- **API**: ip-api.com with asynchronous batch lookup (100 IPs in ~1.5 s instead of 150 s!)
- **Performance**: Result caching + robust fallback
- **Display**: Globe/Building/MapPin icons on the Detections page
- **Deploy**: Migration 004 + restart the ML backend
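Below is a minimal sketch of cached batch lookups against ip-api.com's batch endpoint (`POST http://ip-api.com/batch`, at most 100 IPs per request). It is synchronous for simplicity, while the document describes an async implementation. Several choices are assumptions:

- the `GeoCache` class name;
- caching only successful results;
- returning `None` as the fallback for unresolved IPs.

Check ip-api.com's documentation for current rate limits before relying on it.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional

BATCH_URL = "http://ip-api.com/batch"  # ip-api.com batch endpoint
MAX_BATCH = 100                         # maximum number of IPs per batch request


def fetch_batch_ip_api(ips: List[str]) -> List[dict]:
    """POST a list of IPs to ip-api.com and return the decoded JSON list."""
    request = urllib.request.Request(
        BATCH_URL,
        data=json.dumps(ips).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


class GeoCache:
    """Hypothetical cache around a batch fetcher; only successful results are cached."""

    def __init__(self, fetch_batch: Callable[[List[str]], List[dict]] = fetch_batch_ip_api):
        self._fetch_batch = fetch_batch
        self._cache: Dict[str, dict] = {}

    def lookup(self, ips: Iterable[str]) -> Dict[str, Optional[dict]]:
        wanted = list(dict.fromkeys(ips))  # de-duplicate while keeping order
        missing = [ip for ip in wanted if ip not in self._cache]
        for start in range(0, len(missing), MAX_BATCH):
            chunk = missing[start:start + MAX_BATCH]
            try:
                results = self._fetch_batch(chunk)
            except Exception:
                results = []  # fallback: leave this chunk unresolved, retry next time
            for item in results:
                # ip-api.com echoes the queried IP in the "query" field.
                if item.get("status") == "success" and item.get("query") in chunk:
                    self._cache[item["query"]] = item
        # Unresolved IPs map to None so callers can show "unknown".
        return {ip: self._cache.get(ip) for ip in wanted}
```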
### 🤖 Hybrid ML Detector - False Positive Reduction System (24 Nov 2025 - 18:30)

- **Goal**: Reduce false positives by 80-90% while maintaining high detection accuracy
- **New architecture**:
   1. **Extended Isolation Forest**: n_estimators=250, contamination=0.03 (scientifically tuned)
   2. **Feature selection**: A Chi-Square test reduces the 25 features to the 18 most relevant
   3. **Confidence scoring**: 3-tier system (High ≥95%, Medium ≥70%, Low <70%)
   4. **Validation framework**: CICIDS2017 dataset with Precision/Recall/F1/FPR metrics
- **Components**:
  - `python_ml/ml_hybrid_detector.py` - Core detector with EIF + feature selection
  - `python_ml/dataset_loader.py` - CICIDS2017 loader mapping 80→25 features
  - `python_ml/validation_metrics.py` - Production-grade metrics calculator
  - `python_ml/train_hybrid.py` - CLI training script (test/train/validate)
- **New dependencies**: Cython==3.0.5, xgboost==2.0.3, joblib==1.3.2, eif==2.0.2
- **Backward compatibility**: `USE_HYBRID_DETECTOR` env var (default=true)
- **Target metrics**: Precision ≥90%, Recall ≥80%, FPR ≤5%, F1 ≥85%
- **Deploy**: See `deployment/CHECKLIST_ML_HYBRID.md`
- **Deploy fix (24 Nov 2025 - 19:30)**:
  - Corrected `eif==2.0.0` → `eif==2.0.2` (version 2.0.0 is not available)
  - Added `Cython==3.0.5` as a build dependency (eif must be compiled)
  - Created `deployment/install_ml_deps.sh` for a 2-phase installation (Cython → eif)
  - **Solution**: pip does not install Cython in time for eif's build, so the script installs them in separate steps
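The confidence tiers and target metrics above reduce to a few lines of arithmetic. This sketch assumes two inputs:

- the detector already produces a confidence percentage;
- validation yields a confusion matrix (TP/FP/TN/FN).

How `validation_metrics.py` actually computes these is not shown here, and the helper names are hypothetical.

```python
from dataclasses import dataclass


def confidence_tier(confidence: float) -> str:
    """Map a confidence percentage (0-100) to the documented three tiers."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError(f"confidence must be in [0, 100], got {confidence}")
    if confidence >= 95.0:
        return "High"
    if confidence >= 70.0:
        return "Medium"
    return "Low"


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float
    fpr: float


def compute_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Precision, recall, F1 and false-positive rate from a confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return Metrics(precision, recall, f1, fpr)


def meets_targets(m: Metrics) -> bool:
    """Targets from this document: Precision>=90%, Recall>=80%, FPR<=5%, F1>=85%."""
    return m.precision >= 0.90 and m.recall >= 0.80 and m.fpr <= 0.05 and m.f1 >= 0.85
```

For example, `compute_metrics(tp=90, fp=5, tn=95, fn=10)` gives the following values:

- precision ≈ 0.947
- recall 0.90
- F1 ≈ 0.923
- FPR 0.05

This meets all four targets, with FPR sitting exactly at the 5% limit.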
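The two-phase idea behind `deployment/install_ml_deps.sh` can be sketched in Python as below. This is not the actual shell script. Two details are assumptions:

- grouping xgboost and joblib into the second phase;
- passing `--no-build-isolation`, which is a real pip flag.

The flag is included because when pip builds a package in an isolated environment, the Cython installed in phase 1 is not visible to that build. By default the sketch only prints the commands; pass `--run` to execute them.

```python
import subprocess
import sys
from typing import List, Sequence

# Pinned versions come from the document; the phase grouping is an assumption.
BUILD_DEPS = ["Cython==3.0.5"]                            # phase 1: build-time requirement
ML_DEPS = ["eif==2.0.2", "xgboost==2.0.3", "joblib==1.3.2"]  # phase 2


def pip_command(packages: Sequence[str], *, no_build_isolation: bool = False) -> List[str]:
    """Build a pip command bound to the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_build_isolation:
        # Let eif's build see the Cython already installed in this environment.
        cmd.append("--no-build-isolation")
    return cmd + list(packages)


def install_plan() -> List[List[str]]:
    """Phase 1 installs Cython; phase 2 installs packages that compile against it."""
    return [
        pip_command(BUILD_DEPS),
        pip_command(ML_DEPS, no_build_isolation=True),
    ]


def main(dry_run: bool = True) -> int:
    for cmd in install_plan():
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing phase
    return 0


if __name__ == "__main__":
    sys.exit(main(dry_run="--run" not in sys.argv))
```

Running pip through `sys.executable -m pip` makes both phases target the same interpreter that the ML backend uses.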