Add Cython==3.0.5 to python_ml/requirements.txt and update replit.md to reflect this change, resolving a compilation issue with the eif library.
IDS - Intrusion Detection System
Overview
This project is a full-stack web application implementing a machine-learning-based Intrusion Detection System (IDS) for MikroTik routers. It monitors network traffic, identifies anomalies indicative of intrusions, and automatically blocks malicious IP addresses across multiple routers. It provides real-time monitoring, anomaly detection, and centralized security management for MikroTik environments, plus IP geolocation and service health monitoring.
User Preferences
Git Operations and Deployment
- IMPORTANT: the agent must NOT run git commands (push-gitlab.sh) because Replit blocks git operations
- Correct workflow:
  - The user reports errors/issues from the AlmaLinux server
  - The agent fixes the issues and edits files on Replit
  - The user runs ./push-gitlab.sh manually to commit and push
  - The user runs ./update_from_git.sh or ./update_from_git.sh --db on the server
  - The user tests and reports the results back to the agent
  - Repeat until everything works
Language
- All agent responses must be in Italian
- Code and technical documentation: English
- Commit messages: Italian
System Architecture
The IDS uses a React frontend, built with ShadCN UI and TanStack Query, for real-time monitoring, detection visualization, and whitelist management. The backend is split into two services. A Python FastAPI service handles ML analysis (an Isolation Forest over 25 targeted features), MikroTik API management, and a detection engine that scores anomalies from 0 to 100 and assigns each to one of five risk levels. A Node.js (Express) service handles API requests from the frontend, manages the PostgreSQL database, and coordinates service operations.
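The document states only that scores of 80 or more are "critical" and trigger automatic blocking. The sketch below shows how a 0-100 score could be mapped to five levels; the level names and every cut-off except 80 are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical cut-offs and names: only "critical >= 80" comes from this document.
LEVEL_CUTOFFS = [20.0, 40.0, 60.0, 80.0]
LEVEL_NAMES = ["normal", "low", "medium", "high", "critical"]

# From the document: critical IPs (score >= 80) are blocked automatically.
AUTO_BLOCK_THRESHOLD = 80.0


def risk_level(score: float) -> str:
    """Map an anomaly score in [0, 100] to one of five risk levels."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right counts the cut-offs <= score, which is the level index.
    return LEVEL_NAMES[bisect_right(LEVEL_CUTOFFS, score)]


def should_auto_block(score: float) -> bool:
    return score >= AUTO_BLOCK_THRESHOLD
```

Keeping the boundaries in one sorted list means a threshold can be retuned without touching the branching logic.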
Key Architectural Decisions & Features:
- Log Collection & Processing: MikroTik syslog data (UDP port 514) is sent to RSyslog, parsed by syslog_parser.py, and stored in PostgreSQL. The parser includes auto-cleanup with a 3-day retention policy.
- Machine Learning: An Isolation Forest model trained on 25 network log features performs real-time anomaly detection and assigns a risk score.
- Automated Blocking: Critical IPs (score >= 80) are automatically blocked in parallel across all configured MikroTik routers via their REST API.
- Service Monitoring & Management: A dashboard shows real-time status (green/red indicators) for the ML Backend, Database, and Syslog Parser. Start/stop/restart of the Python services is exposed via API endpoints protected by API key authentication; the services run under systemd, which provides production-grade process control and automatic restarts.
- IP Geolocation: Integrates ip-api.com to enrich detection data with geographic and Autonomous System (AS) information, with caching.
- Database Management: PostgreSQL stores all persistent data. A database versioning system tracks applied SQL migrations and runs only new scripts. Dual-mode database drivers (@neondatabase/serverless on Replit, pg on AlmaLinux) keep the code compatible with both environments.
- Microservices: Clear separation of concerns between the Python ML backend and the Node.js API backend.
- UI/UX: Uses ShadCN UI as a modern component library and react-hook-form with Zod for robust form validation.
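Automatic blocking sends one REST call to each router, in parallel. A minimal sketch, assuming the RouterOS v7 REST API (where `PUT` on a menu path creates a record), HTTP basic auth, and a hypothetical address list named `ids_blocked` that an existing firewall rule drops. The `Router` dataclass and both functions are illustrative, not the project's code.

```python
import base64
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Router:
    name: str
    host: str
    username: str
    password: str


# Hypothetical address-list name; a firewall rule on each router must drop it.
BLOCKLIST_NAME = "ids_blocked"


def block_on_router(router: Router, ip: str, timeout: float = 5.0) -> None:
    """Create an address-list entry via the RouterOS v7 REST API (PUT creates a record)."""
    url = f"https://{router.host}/rest/ip/firewall/address-list"
    body = json.dumps({"list": BLOCKLIST_NAME, "address": ip, "comment": "IDS auto-block"})
    token = base64.b64encode(f"{router.username}:{router.password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=body.encode(),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    # Default TLS verification; self-signed router certificates need a custom ssl context.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def block_everywhere(
    routers: List[Router],
    ip: str,
    blocker: Callable[[Router, str], None] = block_on_router,
) -> Dict[str, bool]:
    """Block `ip` on every router concurrently and report per-router success."""

    def attempt(router: Router) -> bool:
        try:
            blocker(router, ip)
            return True
        except Exception:
            return False  # an unreachable router must not abort the others

    with ThreadPoolExecutor(max_workers=max(1, len(routers))) as pool:
        results = list(pool.map(attempt, routers))
    return {r.name: ok for r, ok in zip(routers, results)}
```

The `blocker` parameter lets the fan-out logic be exercised without real routers, as in the test below.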
External Dependencies
- React: Frontend framework.
- FastAPI: Python web framework for the ML backend.
- PostgreSQL: Primary database for storing configurations, logs, detections, and whitelist entries.
- MikroTik REST API: For router communication and IP blocking.
- ShadCN UI: Frontend component library.
- TanStack Query: Data fetching for the frontend.
- Isolation Forest: Machine Learning algorithm for anomaly detection.
- RSyslog: Log collection daemon.
- Drizzle ORM: For database schema definition in Node.js.
- Neon Database: Cloud-native PostgreSQL service (used in Replit).
- pg (Node.js driver): Standard PostgreSQL driver for Node.js (used in AlmaLinux).
- psycopg2: PostgreSQL adapter for Python.
- ip-api.com: External API for IP geolocation data.
- Recharts: Charting library for analytics visualization.
Recent Updates (November 2025)
🔧 Analytics Aggregator Fix - Data Consistency (24 Nov 2025 - 17:00)
- CRITICAL BUG FIX: resolved a data mismatch in the Live Dashboard
- Problem: the traffic distribution showed 262k attacks, but the breakdown showed only 19
- Root cause: the aggregator counted occurrences instead of packets in attacks_by_type and attacks_by_country
- Solution:
  - Moved the counting from the detections loop to the packets loop
  - attacks_by_type[tipo] += packets (not +1!)
  - attacks_by_country[paese] += packets (not +1!)
  - Fallback to "unknown"/"Unknown" for missing data (type/geo)
  - Validation logging: checks breakdown_total == attack_packets
- Mathematical invariant: Σ(attacks_by_type) == Σ(attacks_by_country) == attack_packets
- Modified files: python_ml/analytics_aggregator.py
- Deploy: restart the ML backend and run the aggregator manually to test
- Validation: the log shows match: True and no mismatch warnings
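The fix comes down to weighting each detection by its packet count, so both breakdowns sum to the same total. A minimal sketch of that counting, assuming a hypothetical `Detection` record with `attack_type`, `country`, and `packets` fields; the real aggregator reads from PostgreSQL and its internals are not shown here.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

logger = logging.getLogger("analytics_aggregator_sketch")


@dataclass
class Detection:
    # Hypothetical row shape: one detection with the packets it accounts for.
    attack_type: Optional[str]
    country: Optional[str]
    packets: int


def aggregate_attacks(detections: Iterable[Detection]) -> Tuple[Counter, Counter, int]:
    attacks_by_type: Counter = Counter()
    attacks_by_country: Counter = Counter()
    attack_packets = 0
    for d in detections:
        # Weight each detection by its packets (the bug was +1 per detection).
        attacks_by_type[d.attack_type or "unknown"] += d.packets
        attacks_by_country[d.country or "Unknown"] += d.packets
        attack_packets += d.packets

    # Invariant: sum(attacks_by_type) == sum(attacks_by_country) == attack_packets
    breakdown_total = sum(attacks_by_type.values())
    match = breakdown_total == sum(attacks_by_country.values()) == attack_packets
    logger.info("breakdown_total=%d attack_packets=%d match=%s",
                breakdown_total, attack_packets, match)
    if not match:
        logger.warning("attack breakdown mismatch")
    return attacks_by_type, attacks_by_country, attack_packets
```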
📊 Network Analytics & Dashboard System (24 Nov 2025 - 11:30)
- Complete feature: analytics system covering both normal and attack traffic, with advanced charts and permanently stored data
- Components:
  - Database: network_analytics table with permanent hourly/daily aggregations
  - Python aggregator: analytics_aggregator.py classifies traffic every hour
  - Systemd timer: runs automatically every hour (at minute :05)
  - API: /api/analytics/recent and /api/analytics/range
  - Frontend: Live Dashboard (real-time, 3 days) + Historical Analytics (permanent)
- Charts: Area Chart, Pie Chart, Bar Chart, Line Chart, Real-time Stream
- Flag emoji: 🇮🇹🇺🇸🇷🇺🇨🇳 for instant identification of the country of origin
- Deploy: Migration 005 + ./deployment/setup_analytics_timer.sh
- Security fix: removed a hardcoded path and added a secure wrapper script, run_analytics.sh, for manual runs
- Production-grade: credentials are supplied via a systemd EnvironmentFile (automatic runs) or the wrapper script (manual runs)
- Frontend fix: Analytics History now uses hourly data (hourly: true) until daily aggregation is scheduled
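Because the timer fires at minute :05, each run can aggregate the previous full hour. A minimal sketch of hourly bucketing that splits normal and attack packets, assuming hypothetical `(timestamp, packets, is_attack)` input rows; it is not the project's `analytics_aggregator.py`, whose classification logic is not described here.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Hypothetical input row: (timestamp, packets, is_attack).
Row = Tuple[datetime, int, bool]


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(rows: Iterable[Row]) -> Dict[datetime, Dict[str, int]]:
    """Sum normal and attack packets per hour bucket."""
    buckets: Dict[datetime, Dict[str, int]] = defaultdict(
        lambda: {"normal_packets": 0, "attack_packets": 0}
    )
    for ts, packets, is_attack in rows:
        key = "attack_packets" if is_attack else "normal_packets"
        buckets[hour_bucket(ts)][key] += packets
    return dict(buckets)


def previous_hour_window(now: datetime) -> Tuple[datetime, datetime]:
    """Half-open window [start, end) covering the full hour before `now`."""
    end = hour_bucket(now)
    return end - timedelta(hours=1), end
```

A run at 11:05 would therefore aggregate rows from 10:00 up to, but not including, 11:00.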
🌍 IP Geolocation Integration (22 Nov 2025 - 13:00)
- Feature: complete geographic information (country, city, organization, AS) for every IP
- API: ip-api.com with async batch lookup (100 IPs in ~1.5 s instead of 150 s)
- Performance: caching plus a fallback when lookups fail
- Display: Globe/Building/MapPin icons on the Detections page
- Deploy: Migration 004 + restart ML backend
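The batch lookup with caching can be sketched with the standard library alone. This assumes ip-api.com's free batch endpoint (`POST http://ip-api.com/batch`, at most 100 IPs per request, a JSON array in and out, with a per-entry `status` of `success` or `fail`); `GeoCache`, `post_batch`, and the in-memory cache are hypothetical stand-ins for the project's caching layer, which is not described in detail.

```python
import asyncio
import json
import urllib.request
from typing import Awaitable, Callable, Dict, List, Optional

BATCH_URL = "http://ip-api.com/batch?fields=status,message,country,countryCode,city,org,as,query"
MAX_BATCH = 100  # ip-api.com accepts at most 100 IPs per batch request

Fetcher = Callable[[List[str]], Awaitable[List[Optional[dict]]]]


def _post_batch_sync(ips: List[str]) -> List[dict]:
    req = urllib.request.Request(
        BATCH_URL, data=json.dumps(ips).encode(), method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


async def post_batch(ips: List[str]) -> List[dict]:
    # Run the blocking HTTP call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(_post_batch_sync, ips)


class GeoCache:
    """Hypothetical in-memory cache in front of the batch endpoint."""

    def __init__(self, fetch: Fetcher = post_batch) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Optional[dict]] = {}

    async def lookup(self, ips: List[str]) -> Dict[str, Optional[dict]]:
        # Deduplicate while preserving order, then skip already cached IPs.
        missing = [ip for ip in dict.fromkeys(ips) if ip not in self._cache]
        chunks = [missing[i:i + MAX_BATCH] for i in range(0, len(missing), MAX_BATCH)]
        results = await asyncio.gather(*(self._safe_fetch(c) for c in chunks))
        for chunk, rows in zip(chunks, results):
            for ip, row in zip(chunk, rows):
                if row is None:
                    continue  # network failure: leave uncached so a later call retries
                self._cache[ip] = row if row.get("status") == "success" else None
        return {ip: self._cache.get(ip) for ip in ips}

    async def _safe_fetch(self, chunk: List[str]) -> List[Optional[dict]]:
        try:
            return await self._fetch(chunk)
        except Exception:
            return [None] * len(chunk)  # fallback: no geo data instead of an error
```

Chunks are fetched concurrently with `asyncio.gather`; the free endpoint is rate-limited, so a production version should also throttle requests. `fail` responses (e.g. private addresses) are cached as `None`, while network errors are not cached.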
🤖 Hybrid ML Detector - False Positive Reduction System (24 Nov 2025 - 18:30)
- Goal: reduce false positives by 80-90% while keeping detection accuracy high
- New architecture:
  - Extended Isolation Forest: n_estimators=250, contamination=0.03 (tuned values)
  - Feature selection: a chi-square test reduces the 25 features to the 18 most relevant
  - Confidence scoring: 3-tier system (High ≥95%, Medium ≥70%, Low <70%)
  - Validation framework: CICIDS2017 dataset with Precision/Recall/F1/FPR metrics
- Components:
  - python_ml/ml_hybrid_detector.py: core detector with EIF + feature selection
  - python_ml/dataset_loader.py: CICIDS2017 loader mapping 80 → 25 features
  - python_ml/validation_metrics.py: production-grade metrics calculator
  - python_ml/train_hybrid.py: CLI training script (test/train/validate)
- New dependencies: Cython==3.0.5, xgboost==2.0.3, joblib==1.3.2, eif==2.0.2
- Backward compatibility: USE_HYBRID_DETECTOR env var (default=true)
- Target metrics: Precision ≥90%, Recall ≥80%, FPR ≤5%, F1 ≥85%
- Deploy: see deployment/CHECKLIST_ML_HYBRID.md
- Deploy fix (24 Nov 2025 - 19:15):
  - Corrected eif==2.0.0 → eif==2.0.2 (version 2.0.0 is not available)
  - Added Cython==3.0.5 as a dependency (required to compile eif from source)
  - Corrected