Add public lists integration with exact IP matching

Update merge logic to use exact IP matching for public lists, add deployment scripts and documentation for limitations.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: 7a657272-55ba-4a79-9a2e-f1ed9bc7a528
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 75a02f7d-492b-46a8-9e67-d4fd471cabc7
Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/449cf7c4-c97a-45ae-8234-e5c5b8d6a84f/7a657272-55ba-4a79-9a2e-f1ed9bc7a528/QKzTQQy
marco370 2025-11-26 09:45:55 +00:00
parent 77874c83bf
commit 5952142a56
5 changed files with 113 additions and 31 deletions


@@ -14,6 +14,10 @@ run = ["npm", "run", "start"]
 localPort = 5000
 externalPort = 80
+[[ports]]
+localPort = 36119
+externalPort = 4200
 [[ports]]
 localPort = 41303
 externalPort = 3002


@@ -0,0 +1,48 @@
# Public Lists - Known Limitations (v2.0.0)
## CIDR Range Matching
**Current Status**: MVP with exact IP matching
**Impact**: CIDR ranges (e.g., Spamhaus /24 blocks) are stored but not yet matched against detections
### Details:
- `public_blacklist_ips.cidr_range` field exists and is populated by parsers
- Detections currently use **exact IP matching only**
- Whitelist entries with CIDR notation are not expanded
### Future Iteration:
Requires PostgreSQL INET/CIDR column types and query optimizations:
1. Add dedicated `inet` columns to `public_blacklist_ips` and `whitelist`
2. Rewrite merge logic with CIDR containment operators (`<<=`, `>>=`)
3. Index optimization for network range queries
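For illustration, the containment test that a future `<<=`-based query would perform can be mimicked with Python's standard `ipaddress` module (a sketch of the semantics only, not the production query):

```python
import ipaddress

def ip_in_cidr(ip: str, cidr: str) -> bool:
    """Approximate PostgreSQL's inet containment test (ip <<= cidr)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)

# A /24 block (e.g. a Spamhaus listing) covers 256 addresses:
print(ip_in_cidr("192.0.2.77", "192.0.2.0/24"))    # True
print(ip_in_cidr("198.51.100.1", "192.0.2.0/24"))  # False
```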
### Workaround (Production):
Most critical single IPs are still caught. For CIDR-heavy feeds, the parser can be extended to expand ranges into individual IPs (trade-off: storage vs. query performance).
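The expansion workaround could look like the following sketch (names and the size guard are illustrative, not part of the current parser):

```python
import ipaddress

def expand_cidr(cidr: str, max_addresses: int = 1024) -> list:
    """Expand a CIDR block into individual IPs so the exact-match MVP can
    still catch them. The guard refuses very large ranges - the storage
    side of the storage-vs-query trade-off."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.num_addresses > max_addresses:
        raise ValueError(f"{cidr} expands to {net.num_addresses} IPs; too large")
    return [str(ip) for ip in net]

print(len(expand_cidr("192.0.2.0/30")))  # 4
```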
---
## Integration Status
✅ **Working**:
- Fetcher syncs every 10 minutes (systemd timer)
- Manual whitelist > Public whitelist > Blacklist priority
- Automatic cleanup of invalid detections
⚠️ **Manual Sync**:
- The UI's manual sync works by resetting the `lastAttempt` timestamp
- Actual sync occurs on next fetcher cycle (max 10 min delay)
- For immediate sync: `sudo systemctl start ids-list-fetcher.service`
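The reset-and-wait behaviour reduces to a simple due-check on the next fetcher cycle; the sketch below uses illustrative names (`last_attempt` mirrors the timestamp the UI resets, not necessarily the actual schema):

```python
from datetime import datetime, timedelta, timezone

def sync_is_due(last_attempt, now, interval=timedelta(minutes=10)):
    """A list is due when it has never been attempted (timestamp reset by
    the UI's manual-sync button) or the fetch interval has elapsed."""
    return last_attempt is None or now - last_attempt >= interval

now = datetime(2025, 11, 26, 10, 0, tzinfo=timezone.utc)
print(sync_is_due(None, now))                        # True  (manual reset)
print(sync_is_due(now - timedelta(minutes=3), now))  # False (wait for next cycle)
```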
---
## Performance Notes
- Bulk SQL operations avoid O(N) per-IP queries
- Tested with 186M+ network_logs records
- Query optimization ongoing for CIDR expansion
---
**Version**: 2.0.0 MVP
**Date**: 2025-11-26
**Next Iteration**: Full CIDR matching support


@@ -0,0 +1,50 @@
#!/bin/bash
# Deploy Public Lists Integration (v2.0.0)
# Run on AlmaLinux 9 server after git pull
set -e
echo "=================================="
echo "PUBLIC LISTS DEPLOYMENT - v2.0.0"
echo "=================================="
# 1. Database Migration
echo -e "\n[1/5] Running database migration..."
sudo -u postgres psql -d ids_system -f deployment/migrations/006_add_public_lists.sql
echo "✓ Migration 006 applied"
# 2. Seed default lists
echo -e "\n[2/5] Seeding default public lists..."
cd python_ml/list_fetcher
DATABASE_URL=$DATABASE_URL python seed_lists.py
cd ../..
echo "✓ Default lists seeded"
# 3. Install systemd services
echo -e "\n[3/5] Installing systemd services..."
sudo cp deployment/systemd/ids-list-fetcher.service /etc/systemd/system/
sudo cp deployment/systemd/ids-list-fetcher.timer /etc/systemd/system/
sudo systemctl daemon-reload
echo "✓ Systemd services installed"
# 4. Enable and start
echo -e "\n[4/5] Enabling services..."
sudo systemctl enable ids-list-fetcher.timer
sudo systemctl start ids-list-fetcher.timer
echo "✓ Timer enabled (10-minute intervals)"
# 5. Initial sync
echo -e "\n[5/5] Running initial sync..."
sudo systemctl start ids-list-fetcher.service
echo "✓ Initial sync triggered"
echo -e "\n=================================="
echo "DEPLOYMENT COMPLETE"
echo "=================================="
echo ""
echo "Verify:"
echo " journalctl -u ids-list-fetcher -n 50"
echo " systemctl status ids-list-fetcher.timer"
echo ""
echo "Check UI: http://your-server/public-lists"
echo ""

python_ml/merge_logic.py Normal file → Executable file

@@ -252,30 +252,9 @@ class MergeLogic:
         stats['cleaned'] = self.cleanup_invalid_detections()

         # Bulk create detections for blacklisted IPs (excluding whitelisted)
-        # Uses PostgreSQL INET/CIDR operators for proper CIDR range matching
-        # Critical for performance with 186M+ rows (single query vs O(N) loops)
+        # MVP: Exact IP matching (CIDR expansion in future iteration)
+        # Note: CIDR ranges stored but not yet matched - requires schema optimization
         cur.execute("""
-            WITH blacklisted_ranges AS (
-                -- Blacklist entries with CIDR ranges (e.g. Spamhaus /24)
-                SELECT
-                    bl.id as blacklist_id,
-                    bl.ip_address,
-                    COALESCE(bl.cidr_range, bl.ip_address) as cidr
-                FROM public_blacklist_ips bl
-                WHERE bl.is_active = true
-            ),
-            whitelisted_ranges AS (
-                -- Whitelist entries (manual + public) with CIDR support
-                SELECT
-                    ip_address,
-                    CASE
-                        WHEN ip_address ~ '/' THEN ip_address::inet
-                        ELSE ip_address::inet
-                    END as ip_range,
-                    source
-                FROM whitelist
-                WHERE active = true
-            )
             INSERT INTO detections (
                 source_ip,
                 risk_score,
@@ -290,16 +269,16 @@ class MergeLogic:
                 '75',
                 'public_blacklist',
                 'public_blacklist',
-                bl.blacklist_id,
+                bl.id,
                 NOW(),
                 false
-            FROM blacklisted_ranges bl
-            WHERE bl.cidr IS NOT NULL
-            -- Exclude if IP is in any whitelist range (manual or public)
+            FROM public_blacklist_ips bl
+            WHERE bl.is_active = true
             -- Priority: Manual whitelist > Public whitelist > Blacklist
             AND NOT EXISTS (
-                SELECT 1 FROM whitelisted_ranges wl
-                WHERE bl.ip_address::inet <<= wl.ip_range
+                SELECT 1 FROM whitelist wl
+                WHERE wl.ip_address = bl.ip_address
+                AND wl.active = true
             )
             -- Avoid duplicate detections
             AND NOT EXISTS (
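In set terms, the exact-match priority rule enforced by the rewritten query (Manual whitelist > Public whitelist > Blacklist) reduces to a plain set difference; the sketch below uses illustrative names, not the module's actual API:

```python
def effective_detections(blacklist_ips, manual_whitelist, public_whitelist):
    """Any active whitelist entry, manual or public, suppresses a blacklist
    hit; with exact IP matching this is simply set difference."""
    whitelisted = set(manual_whitelist) | set(public_whitelist)
    return sorted(ip for ip in set(blacklist_ips) if ip not in whitelisted)

print(effective_detections(
    ["203.0.113.5", "198.51.100.9", "192.0.2.1"],
    manual_whitelist=["192.0.2.1"],
    public_whitelist=["198.51.100.9"],
))  # ['203.0.113.5']
```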


@@ -24,12 +24,13 @@ The IDS employs a React-based frontend for real-time monitoring, detection visua
 **Key Architectural Decisions & Features:**
 - **Log Collection & Processing**: MikroTik syslog data (UDP:514) is parsed by `syslog_parser.py` and stored in PostgreSQL with a 3-day retention policy. The parser includes auto-reconnect and error recovery mechanisms.
 - **Machine Learning**: An Isolation Forest model (sklearn.IsolationForest) trained on 25 network log features performs real-time anomaly detection, assigning a risk score (0-100 across five risk levels). A hybrid ML detector (Isolation Forest + Ensemble Classifier with weighted voting) reduces false positives. The system supports weekly automatic retraining of models.
 - **Automated Blocking**: Critical IPs (score >= 80) are automatically blocked in parallel across configured MikroTik routers via their REST API.
+- **Public Lists Integration (v2.0.0)**: An automatic fetcher syncs blacklist/whitelist feeds every 10 minutes (Spamhaus, Talos, AWS, GCP, Cloudflare, IANA, NTP Pool). Priority-based merge logic: Manual whitelist > Public whitelist > Blacklist. Detections are created for blacklisted IPs (excluding whitelisted ones). CRUD API + UI for list management. The MVP uses exact IP matching (CIDR expansion planned for a future iteration). See `deployment/docs/PUBLIC_LISTS_LIMITATIONS.md` for details.
 - **Automatic Cleanup**: An hourly systemd timer (`cleanup_detections.py`) removes old detections (48h) and auto-unblocks IPs (2h).
 - **Service Monitoring & Management**: A dashboard provides real-time status (ML Backend, Database, Syslog Parser). API endpoints, secured with API key authentication and Systemd integration, allow for service management (start/stop/restart) of Python services.
 - **IP Geolocation**: Integration with `ip-api.com` enriches detection data with geographical and AS information, utilizing intelligent caching.
-- **Database Management**: PostgreSQL is used for all persistent data. An intelligent database versioning system ensures efficient SQL migrations. Dual-mode database drivers (`@neondatabase/serverless` for Replit, `pg` for AlmaLinux) ensure environment compatibility.
+- **Database Management**: PostgreSQL is used for all persistent data. An intelligent database versioning system ensures efficient SQL migrations (v6 with public_lists tables). Dual-mode database drivers (`@neondatabase/serverless` for Replit, `pg` for AlmaLinux) ensure environment compatibility.
 - **Microservices**: Clear separation of concerns between the Python ML backend and the Node.js API backend.
 - **UI/UX**: Utilizes ShadCN UI for a modern component library and `react-hook-form` with Zod for robust form validation. Analytics dashboards provide visualizations of normal and attack traffic, including real-time and historical data.