From 5952142a56b2fe2e020f713a176d56794ad54587 Mon Sep 17 00:00:00 2001 From: marco370 <48531002-marco370@users.noreply.replit.com> Date: Wed, 26 Nov 2025 09:45:55 +0000 Subject: [PATCH] Add public lists integration with exact IP matching Update merge logic to use exact IP matching for public lists, add deployment scripts and documentation for limitations. Replit-Commit-Author: Agent Replit-Commit-Session-Id: 7a657272-55ba-4a79-9a2e-f1ed9bc7a528 Replit-Commit-Checkpoint-Type: full_checkpoint Replit-Commit-Event-Id: 75a02f7d-492b-46a8-9e67-d4fd471cabc7 Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/449cf7c4-c97a-45ae-8234-e5c5b8d6a84f/7a657272-55ba-4a79-9a2e-f1ed9bc7a528/QKzTQQy --- .replit | 4 ++ deployment/docs/PUBLIC_LISTS_LIMITATIONS.md | 48 ++++++++++++++++++++ deployment/scripts/deploy_public_lists.sh | 50 +++++++++++++++++++++ python_ml/merge_logic.py | 37 ++++----------- replit.md | 5 ++- 5 files changed, 113 insertions(+), 31 deletions(-) create mode 100644 deployment/docs/PUBLIC_LISTS_LIMITATIONS.md create mode 100755 deployment/scripts/deploy_public_lists.sh mode change 100644 => 100755 python_ml/merge_logic.py diff --git a/.replit b/.replit index aa41490..4068511 100644 --- a/.replit +++ b/.replit @@ -14,6 +14,10 @@ run = ["npm", "run", "start"] localPort = 5000 externalPort = 80 +[[ports]] +localPort = 36119 +externalPort = 4200 + [[ports]] localPort = 41303 externalPort = 3002 diff --git a/deployment/docs/PUBLIC_LISTS_LIMITATIONS.md b/deployment/docs/PUBLIC_LISTS_LIMITATIONS.md new file mode 100644 index 0000000..6d84710 --- /dev/null +++ b/deployment/docs/PUBLIC_LISTS_LIMITATIONS.md @@ -0,0 +1,48 @@ +# Public Lists - Known Limitations (v2.0.0) + +## CIDR Range Matching + +**Current Status**: MVP with exact IP matching +**Impact**: CIDR ranges (e.g., Spamhaus /24 blocks) are stored but not yet matched against detections + +### Details: +- `public_blacklist_ips.cidr_range` field exists and is populated by 
parsers +- Detections currently use **exact IP matching only** +- Whitelist entries with CIDR notation not expanded + +### Future Iteration: +Requires PostgreSQL INET/CIDR column types and query optimizations: +1. Add dedicated `inet` columns to `public_blacklist_ips` and `whitelist` +2. Rewrite merge logic with CIDR containment operators (`<<=`, `>>=`) +3. Index optimization for network range queries + +### Workaround (Production): +Most critical single IPs are still caught. For CIDR-heavy feeds, parser can be extended to expand ranges to individual IPs (trade-off: storage vs query performance). + +--- + +## Integration Status + +✅ **Working**: +- Fetcher syncs every 10 minutes (systemd timer) +- Manual whitelist > Public whitelist > Blacklist priority +- Automatic cleanup of invalid detections + +⚠️ **Manual Sync**: +- UI manual sync triggers by resetting `lastAttempt` timestamp +- Actual sync occurs on next fetcher cycle (max 10 min delay) +- For immediate sync: `sudo systemctl start ids-list-fetcher.service` + +--- + +## Performance Notes + +- Bulk SQL operations avoid O(N) per-IP queries +- Tested with 186M+ network_logs records +- Query optimization ongoing for CIDR expansion + +--- + +**Version**: 2.0.0 MVP +**Date**: 2025-11-26 +**Next Iteration**: Full CIDR matching support diff --git a/deployment/scripts/deploy_public_lists.sh b/deployment/scripts/deploy_public_lists.sh new file mode 100755 index 0000000..10cfeb2 --- /dev/null +++ b/deployment/scripts/deploy_public_lists.sh @@ -0,0 +1,50 @@ +#!/bin/bash +# Deploy Public Lists Integration (v2.0.0) +# Run on AlmaLinux 9 server after git pull + +set -e + +echo "==================================" +echo "PUBLIC LISTS DEPLOYMENT - v2.0.0" +echo "==================================" + +# 1. Database Migration +echo -e "\n[1/5] Running database migration..." +sudo -u postgres psql -d ids_system -f deployment/migrations/006_add_public_lists.sql +echo "✓ Migration 006 applied" + +# 2. 
Seed default lists +echo -e "\n[2/5] Seeding default public lists..." +cd python_ml/list_fetcher +DATABASE_URL=$DATABASE_URL python seed_lists.py +cd ../.. +echo "✓ Default lists seeded" + +# 3. Install systemd services +echo -e "\n[3/5] Installing systemd services..." +sudo cp deployment/systemd/ids-list-fetcher.service /etc/systemd/system/ +sudo cp deployment/systemd/ids-list-fetcher.timer /etc/systemd/system/ +sudo systemctl daemon-reload +echo "✓ Systemd services installed" + +# 4. Enable and start +echo -e "\n[4/5] Enabling services..." +sudo systemctl enable ids-list-fetcher.timer +sudo systemctl start ids-list-fetcher.timer +echo "✓ Timer enabled (10-minute intervals)" + +# 5. Initial sync +echo -e "\n[5/5] Running initial sync..." +sudo systemctl start ids-list-fetcher.service +echo "✓ Initial sync triggered" + +echo -e "\n==================================" +echo "DEPLOYMENT COMPLETE" +echo "==================================" +echo "" +echo "Verify:" +echo " journalctl -u ids-list-fetcher -n 50" +echo " systemctl status ids-list-fetcher.timer" +echo "" +echo "Check UI: http://your-server/public-lists" +echo "" diff --git a/python_ml/merge_logic.py b/python_ml/merge_logic.py old mode 100644 new mode 100755 index a73de19..f56ffa0 --- a/python_ml/merge_logic.py +++ b/python_ml/merge_logic.py @@ -252,30 +252,9 @@ class MergeLogic: stats['cleaned'] = self.cleanup_invalid_detections() # Bulk create detections for blacklisted IPs (excluding whitelisted) - # Uses PostgreSQL INET/CIDR operators for proper CIDR range matching - # Critical for performance with 186M+ rows (single query vs O(N) loops) + # MVP: Exact IP matching (CIDR expansion in future iteration) + # Note: CIDR ranges stored but not yet matched - requires schema optimization cur.execute(""" - WITH blacklisted_ranges AS ( - -- Blacklist entries with CIDR ranges (e.g. 
Spamhaus /24) - SELECT - bl.id as blacklist_id, - bl.ip_address, - COALESCE(bl.cidr_range, bl.ip_address) as cidr - FROM public_blacklist_ips bl - WHERE bl.is_active = true - ), - whitelisted_ranges AS ( - -- Whitelist entries (manual + public) with CIDR support - SELECT - ip_address, - CASE - WHEN ip_address ~ '/' THEN ip_address::inet - ELSE ip_address::inet - END as ip_range, - source - FROM whitelist - WHERE active = true - ) INSERT INTO detections ( source_ip, risk_score, @@ -290,16 +269,16 @@ class MergeLogic: '75', 'public_blacklist', 'public_blacklist', - bl.blacklist_id, + bl.id, NOW(), false - FROM blacklisted_ranges bl - WHERE bl.cidr IS NOT NULL - -- Exclude if IP is in any whitelist range (manual or public) + FROM public_blacklist_ips bl + WHERE bl.is_active = true -- Priority: Manual whitelist > Public whitelist > Blacklist AND NOT EXISTS ( - SELECT 1 FROM whitelisted_ranges wl - WHERE bl.ip_address::inet <<= wl.ip_range + SELECT 1 FROM whitelist wl + WHERE wl.ip_address = bl.ip_address + AND wl.active = true ) -- Avoid duplicate detections AND NOT EXISTS ( diff --git a/replit.md b/replit.md index 8e68aa7..12da1c4 100644 --- a/replit.md +++ b/replit.md @@ -24,12 +24,13 @@ The IDS employs a React-based frontend for real-time monitoring, detection visua **Key Architectural Decisions & Features:** - **Log Collection & Processing**: MikroTik syslog data (UDP:514) is parsed by `syslog_parser.py` and stored in PostgreSQL with a 3-day retention policy. The parser includes auto-reconnect and error recovery mechanisms. -- **Machine Learning**: An Isolation Forest model (sklearn.IsolationForest) trained on 25 network log features performs real-time anomaly detection, assigning a risk score (0-100 across five risk levels). A hybrid ML detector (Isolation Forest + Ensemble Classifier with weighted voting) reduces false positives. The system supports weekly automatic retraining of models. 
+- **Machine Learning**: An Isolation Forest model (sklearn.IsolationForest) trained on 25 network log features performs real-time anomaly detection, assigning a risk score (0-100 across five risk levels). A hybrid ML detector (Isolation Forest + Ensemble Classifier with weighted voting) reduces false positives. The system supports weekly automatic retraining of models. - **Automated Blocking**: Critical IPs (score >= 80) are automatically blocked in parallel across configured MikroTik routers via their REST API. +- **Public Lists Integration (v2.0.0)**: Automatic fetcher syncs blacklist/whitelist feeds every 10 minutes (Spamhaus, Talos, AWS, GCP, Cloudflare, IANA, NTP Pool). Priority-based merge logic: Manual whitelist > Public whitelist > Blacklist. Detections created for blacklisted IPs (excluding whitelisted). CRUD API + UI for list management. MVP uses exact IP matching (CIDR expansion planned for future iteration). See `deployment/docs/PUBLIC_LISTS_LIMITATIONS.md` for details. - **Automatic Cleanup**: An hourly systemd timer (`cleanup_detections.py`) removes old detections (48h) and auto-unblocks IPs (2h). - **Service Monitoring & Management**: A dashboard provides real-time status (ML Backend, Database, Syslog Parser). API endpoints, secured with API key authentication and Systemd integration, allow for service management (start/stop/restart) of Python services. - **IP Geolocation**: Integration with `ip-api.com` enriches detection data with geographical and AS information, utilizing intelligent caching. -- **Database Management**: PostgreSQL is used for all persistent data. An intelligent database versioning system ensures efficient SQL migrations. Dual-mode database drivers (`@neondatabase/serverless` for Replit, `pg` for AlmaLinux) ensure environment compatibility. +- **Database Management**: PostgreSQL is used for all persistent data. An intelligent database versioning system ensures efficient SQL migrations (v6 with public_lists tables). 
Dual-mode database drivers (`@neondatabase/serverless` for Replit, `pg` for AlmaLinux) ensure environment compatibility. - **Microservices**: Clear separation of concerns between the Python ML backend and the Node.js API backend. - **UI/UX**: Utilizes ShadCN UI for a modern component library and `react-hook-form` with Zod for robust form validation. Analytics dashboards provide visualizations of normal and attack traffic, including real-time and historical data.
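
Review note on the exact-IP limitation documented in `deployment/docs/PUBLIC_LISTS_LIMITATIONS.md`: the sketch below illustrates, outside the database, what this patch's exact matching misses compared with CIDR containment. It is not part of the patch. The helper names (`exact_match`, `cidr_match`) and the sample addresses are hypothetical, and it uses Python's standard `ipaddress` module rather than the PostgreSQL `<<=` operator the patch removes.

```python
import ipaddress


def exact_match(ip: str, entries: list[str]) -> bool:
    """Mimic the MVP behaviour: an entry matches only if the strings are equal."""
    return ip in entries


def cidr_match(ip: str, entries: list[str]) -> bool:
    """Return True if ip equals a single-address entry or lies inside a CIDR entry."""
    addr = ipaddress.ip_address(ip)
    for entry in entries:
        # strict=False tolerates entries with host bits set, e.g. "203.0.113.5/24".
        network = ipaddress.ip_network(entry, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False


# Hypothetical feed content: one /24 block and one single address.
blacklist = ["203.0.113.0/24", "198.51.100.7"]

# An address inside the /24 is missed by exact matching but caught by containment.
print(exact_match("203.0.113.42", blacklist))  # False
print(cidr_match("203.0.113.42", blacklist))   # True

# Single-address entries behave the same under both strategies.
print(exact_match("198.51.100.7", blacklist))  # True
print(cidr_match("198.51.100.7", blacklist))   # True
```

The same gap applies to the whitelist side: a whitelisted range would not suppress a detection for an address inside it until containment matching is restored.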