๐ฆInstall Packages
Getting our toolkit ready!
๐ปThe Code
1!pip install scikit-learn numpy pandas matplotlib seaborn jupyter๐What it Does
This command downloads and installs all the Python libraries we need for our machine learning project. Think of it like downloading apps on your phone!
๐ฏWhy We Need It
We need these special tools (libraries) to work with data, create visualizations, and build our smart anomaly detector. Each library has a superpower!
๐What Happens Next
๐Import Libraries
Loading our superpowers!
๐ปThe Code
1import numpy as np
2import pandas as pd
3import matplotlib.pyplot as plt
4import seaborn as sns
5from sklearn.model_selection import train_test_split
6from sklearn.preprocessing import RobustScaler
7from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif
8from sklearn.ensemble import IsolationForest
9from sklearn.metrics import (
10 classification_report,
11 confusion_matrix,
12 roc_auc_score,
13 roc_curve,
14 auc
15)
16from dataclasses import dataclass
17from typing import Dict, List, Tuple, Optional
18import warnings
19warnings.filterwarnings('ignore')
20
21np.random.seed(42)
22pd.set_option('display.max_columns', None)
23pd.set_option('display.width', 1000)
24plt.style.use('seaborn-v0_8-whitegrid')
25sns.set_palette('husl')
26
27print("โ
All imports successful!")
28print("โ
Environment ready!")๐What it Does
We're bringing all our tools into the workspace! Each import loads a library with special abilities - NumPy for math, Pandas for data tables, Matplotlib for charts, and Scikit-learn for machine learning magic.
๐ฏWhy We Need It
Just like you need to take out your crayons before coloring, we need to import libraries before using them. We also set up preferences to make results look nice and consistent!
๐What Happens Next
โ๏ธConfiguration Classes
Setting up our control panel!
๐ปThe Code
1@dataclass
2class ModelConfig:
3 """Configuration for anomaly detection model."""
4 contamination: float = 0.1
5 n_neighbors: int = 20
6 random_state: int = 42
7 use_pca: bool = False
8 n_components: int = 10
9 test_size: float = 0.3
10 validation_size: float = 0.2
11
12@dataclass
13class SecurityConfig:
14 """Configuration for security decision thresholds."""
15 NORMAL_THRESHOLD: float = 30.0
16 SUSPICIOUS_THRESHOLD: float = 60.0
17 MALICIOUS_THRESHOLD: float = 80.0
18 FP_GRACE_PERIOD: int = 3
19 CONTINUOUS_AUTH_WINDOW: int = 300
20
21 def __post_init__(self):
22 self.SECURITY_ACTIONS = {
23 'normal': 'GRANT_FULL_ACCESS',
24 'low_risk': 'GRANT_ACCESS_WITH_MONITORING',
25 'suspicious': 'REQUIRE_RE_AUTHENTICATION',
26 'high_risk': 'RESTRICT_ACCESS',
27 'malicious': 'BLOCK_ACCESS'
28 }
29
30MODEL_CONFIG = ModelConfig()
31SECURITY_CONFIG = SecurityConfig()๐What it Does
We create two configuration blueprints: one for AI model settings and one for security decisions. These tell our program exactly how to behave - like instruction manuals!
๐ฏWhy We Need It
Instead of scattering numbers everywhere, we keep all settings in one place. This makes it super easy to adjust detector sensitivity or security thresholds without searching through code!
๐What Happens Next
๐ญData Generation Function
Creating synthetic IoT network traffic!
๐ปThe Code
1def create_synthetic_iot_data(n_samples: int = 10000,
2 anomaly_ratio: float = 0.1) -> pd.DataFrame:
3 """Create synthetic IoT network traffic data."""
4 np.random.seed(42)
5 n_anomalies = int(n_samples * anomaly_ratio)
6 n_normal = n_samples - n_anomalies
7
8 normal_data = {
9 'pkts_sent': np.random.poisson(50, n_normal),
10 'bytes_sent': np.random.normal(5000, 1000, n_normal),
11 'pkts_received': np.random.poisson(45, n_normal),
12 # ... more features
13 'label': 0
14 }
15
16 anomaly_data = {
17 'pkts_sent': np.random.poisson(500, n_anomalies), # 10x more!
18 'bytes_sent': np.random.normal(50000, 10000, n_anomalies),
19 # ... suspicious patterns
20 'label': 1
21 }
22
23 df = pd.concat([pd.DataFrame(normal_data), pd.DataFrame(anomaly_data)])
24 return df.sample(frac=1, random_state=42).reset_index(drop=True)๐What it Does
This factory creates fake network traffic data! It makes normal traffic (sends ~50 packets) and anomaly traffic (sends ~500 packets). Each has different patterns so our AI can learn the difference!
๐ฏWhy We Need It
We need practice data to train our detector. Real IoT attack data is hard to get, so we create synthetic data that looks and behaves like real traffic - with both good and bad patterns!
๐What Happens Next
๐งนData Preprocessing Class
Cleaning and preparing data!
๐ปThe Code
1class DataPreprocessor:
2 """Data preprocessing pipeline for IoT security data."""
3
4 def __init__(self):
5 self.scaler = RobustScaler()
6 self.selected_features = None
7
8 def clean_data(self, df):
9 """Remove duplicates and handle missing values."""
10 df = df.drop_duplicates()
11 numeric_cols = df.select_dtypes(include=[np.number]).columns
12 df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
13 return df
14
15 def engineer_features(self, df):
16 """Create new helpful features."""
17 df['feature_mean'] = df.mean(axis=1)
18 df['feature_std'] = df.std(axis=1)
19 df['feature_max'] = df.max(axis=1)
20 return df
21
22 def normalize_features(self, X, fit=True):
23 """Scale data to same range."""
24 if fit:
25 return self.scaler.fit_transform(X)
26 return self.scaler.transform(X)๐What it Does
Our data cleaning robot! It removes duplicates, fills missing values, creates new features (averages, ranges), and normalizes everything so all numbers are on the same scale.
๐ฏWhy We Need It
Raw data is messy! ML models work best with clean, organized data. This automates the boring cleanup work so our data is perfect for training.
๐What Happens Next
๐ฒIsolation Forest Model
Our AI anomaly detector!
๐ปThe Code
1class IsolationForestDetector:
2 """Isolation Forest for Anomaly Detection."""
3
4 def __init__(self, contamination=0.1, random_state=42):
5 self.model = IsolationForest(
6 contamination=contamination,
7 random_state=random_state,
8 n_estimators=200,
9 max_samples=256
10 )
11 self.is_fitted = False
12
13 def fit(self, X):
14 """Train the model."""
15 self.model.fit(X)
16 self.is_fitted = True
17 return self
18
19 def predict_risk_score(self, X):
20 """Get 0-100 risk scores."""
21 raw_scores = self.model.score_samples(X)
22 return self._normalize_scores(raw_scores)
23
24 def predict_labels(self, X):
25 """Predict 0=normal, 1=anomaly."""
26 predictions = self.model.predict(X)
27 return (predictions == -1).astype(int)๐What it Does
This is our AI detector! It learns what normal traffic looks like, then finds anything weird. It gives each sample a risk score from 0-100 (higher = more suspicious).
๐ฏWhy We Need It
Isolation Forest is perfect for finding anomalies because it doesn't need labeled training data. It isolates outliers by randomly splitting data - weird patterns get isolated faster!
๐What Happens Next
๐ก๏ธSecurity Decision Engine
Turning scores into actions!
๐ปThe Code
1class SecurityDecisionEngine:
2 """Zero-Trust security decision engine."""
3
4 def __init__(self, config):
5 self.config = config
6 self.device_history = {}
7
8 def make_decision(self, risk_score, device_id=None):
9 """Make security decision based on risk."""
10 if risk_score < self.config.NORMAL_THRESHOLD:
11 return {'action': 'GRANT_FULL_ACCESS', 'severity': 'INFO'}
12 elif risk_score < self.config.SUSPICIOUS_THRESHOLD:
13 return {'action': 'GRANT_WITH_MONITORING', 'severity': 'LOW'}
14 elif risk_score < self.config.MALICIOUS_THRESHOLD:
15 return {'action': 'REQUIRE_RE_AUTH', 'severity': 'MEDIUM'}
16 else:
17 return {'action': 'BLOCK_ACCESS', 'severity': 'CRITICAL'}๐What it Does
This engine converts risk scores into real security actions! Low scores get full access, medium scores need re-authentication, high scores get blocked. It's like a security guard making decisions!
๐ฏWhy We Need It
A score alone isn't useful - we need actions! This translates ML predictions into real security responses, following Zero-Trust principles (never trust, always verify).
๐What Happens Next
๐Generate Dataset
Creating our practice data!
๐ปThe Code
1print("="*70)
2print("GENERATING SYNTHETIC IoT DATA")
3print("="*70)
4
5df_raw = create_synthetic_iot_data(n_samples=10000, anomaly_ratio=0.15)
6
7print(f"Total samples: {len(df_raw):,}")
8print(f"Features: {len(df_raw.columns)}")
9print(df_raw['label'].value_counts())๐What it Does
We're running our data factory! It creates 10,000 network traffic samples - 85% normal and 15% suspicious. Each sample has features like packets sent, bytes, ports, protocols, etc.
๐ฏWhy We Need It
This is where we actually create the dataset we'll use to train and test our detector. We need enough data (10,000 samples) with realistic attack patterns to build a good model!
๐What Happens Next
๐งPreprocess Data
Cleaning and preparing!
๐ปThe Code
1preprocessor = DataPreprocessor()
2
3df_features = df_raw.drop(columns=['label', 'attack_category'])
4labels = df_raw['label']
5
6X_processed, y = preprocessor.preprocess_pipeline(df_features, fit=True)
7y = labels.values
8
9print(f"โ
Preprocessing complete!")
10print(f" Final shape: {X_processed.shape}")๐What it Does
We run our data through the cleaning pipeline! It removes the label column (we'll use that later for testing), cleans everything up, creates new features, and normalizes all values.
๐ฏWhy We Need It
Raw data โ Clean data = Better AI! This step ensures our model gets high-quality, consistent input. It's like washing vegetables before cooking - essential for the best results!
๐What Happens Next
โ๏ธTrain-Test Split
Dividing our data!
๐ปThe Code
1X_train, X_test, y_train, y_test = train_test_split(
2 X_processed, y,
3 test_size=MODEL_CONFIG.test_size,
4 random_state=MODEL_CONFIG.random_state,
5 stratify=y
6)
7
8print("TRAIN-TEST SPLIT")
9print(f"Training set: {X_train.shape[0]:,} samples")
10print(f"Test set: {X_test.shape[0]:,} samples")๐What it Does
We split our data into two groups: 70% for training (teaching the AI) and 30% for testing (checking if it learned correctly). The stratify ensures both groups have the same proportion of normal/anomaly samples.
๐ฏWhy We Need It
We can't test on the same data we trained on - that's cheating! The model needs to prove it can find anomalies in NEW data it hasn't seen before. This is how we know it really learned!
๐What Happens Next
๐Train the Model
Teaching our AI!
๐ปThe Code
1print("MODEL TRAINING: ISOLATION FOREST")
2
3detector = IsolationForestDetector(
4 contamination=MODEL_CONFIG.contamination,
5 random_state=MODEL_CONFIG.random_state
6)
7
8detector.fit(X_train)
9
10print(f"โ
Model trained on {X_train.shape[0]:,} samples!")๐What it Does
This is where the magic happens! The Isolation Forest creates 200 decision trees and learns what "normal" looks like by studying our training data. It's like the AI going to school!
๐ฏWhy We Need It
Training is how the model learns patterns. The Isolation Forest algorithm builds random decision trees that isolate anomalies quickly - weird data points get separated faster than normal ones!
๐What Happens Next
๐ฎGenerate Predictions
Testing our trained AI!
๐ปThe Code
1print("GENERATING PREDICTIONS")
2
3risk_scores_test = detector.predict_risk_score(X_test)
4y_pred_test = detector.predict_labels(X_test)
5
6thresholds = {
7 'normal': SECURITY_CONFIG.NORMAL_THRESHOLD,
8 'suspicious': SECURITY_CONFIG.SUSPICIOUS_THRESHOLD,
9 'malicious': SECURITY_CONFIG.MALICIOUS_THRESHOLD
10}
11risk_levels_test = detector.classify_risk_level(risk_scores_test, thresholds)
12
13print(f"โ
Predictions generated!")
14print(f"Mean risk score: {risk_scores_test.mean():.2f}")๐What it Does
Now we test our trained model! It looks at the test data (that it never saw before) and gives each sample a risk score from 0-100, plus a binary prediction (normal/anomaly) and risk level classification.
๐ฏWhy We Need It
This is the moment of truth! We see if our AI actually learned to detect anomalies. The risk scores help us understand how confident the model is about each prediction.
๐What Happens Next
โ๏ธMake Security Decisions
Converting scores to actions!
๐ปThe Code
1print("SECURITY DECISION ENGINE")
2
3decision_engine = SecurityDecisionEngine(config=SECURITY_CONFIG)
4decisions_df = decision_engine.batch_decisions(risk_scores_test)
5
6print(f"โ
Security decisions generated!")
7print(f" Total decisions: {len(decisions_df):,}")
8print(f" Require human review: {decisions_df['requires_human_review'].sum():,}")๐What it Does
The security engine takes our risk scores and converts them into real security actions! Low-risk gets access, medium-risk needs re-auth, high-risk gets blocked. It's making 3,000 security decisions!
๐ฏWhy We Need It
Risk scores are just numbers - we need actionable security responses! This engine applies our security policy to automatically decide what to do with each device based on its risk level.
๐What Happens Next
๐Evaluate Performance
How good is our AI?
๐ปThe Code
1print("MODEL EVALUATION")
2
3metrics = detector.evaluate(X_test, y_test)
4
5print("CLASSIFICATION METRICS")
6print(f" Accuracy: {metrics['accuracy']:.4f} ({metrics['accuracy']*100:.2f}%)")
7print(f" Precision: {metrics['precision']:.4f}")
8print(f" Recall: {metrics['recall']:.4f}")
9print(f" F1 Score: {metrics['f1_score']:.4f}")
10print(f" ROC AUC: {metrics['roc_auc']:.4f}")๐What it Does
We grade our AI! Accuracy tells us how often it's right overall. Precision is "when it says anomaly, is it actually anomaly?" Recall is "does it catch all the anomalies?" F1 and ROC-AUC are overall quality scores.
๐ฏWhy We Need It
We need to know if our detector actually works well! These metrics tell us exactly how good our model is at catching bad traffic while not flagging too many false alarms.
๐What Happens Next
๐จCreate Visualizations
Making beautiful charts!
๐ปThe Code
1# Risk Score Distribution
2plot_risk_distribution(risk_scores_test)
3
4# Confusion Matrix
5plot_confusion_matrix(metrics['confusion_matrix'])
6
7# Security Decisions
8plot_security_decisions(decisions_df)
9
10# ROC Curve
11fpr, tpr, _ = roc_curve(y_test, risk_scores_test)
12roc_auc = auc(fpr, tpr)
13plot_roc_curve(fpr, tpr, roc_auc)๐What it Does
We create colorful charts to visualize our results! Histograms show risk score distributions, confusion matrices show correct vs wrong predictions, and ROC curves show how well we balance detection vs false alarms.
๐ฏWhy We Need It
Pictures are worth 1000 words! Visualizations help us understand our model's performance at a glance. They make complex metrics easy to understand and spot patterns we might miss in numbers.
๐What Happens Next
๐False Positive Analysis
Studying our mistakes!
๐ปThe Code
1print("FALSE POSITIVE ANALYSIS")
2
3fp_count = metrics['false_positives']
4total_normal = (y_test == 0).sum()
5fp_rate = metrics['false_positive_rate']
6
7print(f"๐ False Positive Statistics:")
8print(f" Total FPs: {fp_count}")
9print(f" FP Rate: {fp_rate:.2%}")
10print(f" Legitimate devices affected: {fp_count} out of {total_normal}")๐What it Does
We analyze when our AI makes mistakes! False positives are normal devices flagged as suspicious. We count how many, calculate the rate, and understand the impact on real users.
๐ฏWhy We Need It
False positives are critical in security systems - they frustrate users! If we block too many legitimate devices, people lose trust. This analysis helps us understand and reduce these mistakes.
๐What Happens Next
๐Continuous Authentication
Always verify, never trust!
๐ปThe Code
1print("CONTINUOUS AUTHENTICATION ANALYSIS")
2
3cont_auth_required = decisions_df[decisions_df['continuous_auth_required']]
4
5print(f"๐ Devices Requiring Continuous Authentication:")
6print(f" Total: {len(cont_auth_required)} ({len(cont_auth_required) / len(decisions_df) * 100:.1f}%)")
7print(f" Authentication Window: {SECURITY_CONFIG.CONTINUOUS_AUTH_WINDOW}s")๐What it Does
We identify devices that need ongoing verification! Instead of trusting after one login, these devices must re-authenticate regularly (every 5 minutes for low risk, 30 seconds for high risk).
๐ฏWhy We Need It
Zero-Trust security means "never trust, always verify!" Continuous authentication catches compromised devices that passed initial checks but behave suspiciously later. It's like checking ID repeatedly, not just once!
๐What Happens Next
๐Final Results & Conclusion
Summary of everything!
๐ปThe Code
1print("FINAL RESULTS SUMMARY")
2
3print("๐ฏ MODEL PERFORMANCE:")
4print(f" โ
Accuracy: {metrics['accuracy']:.2%}")
5print(f" โ
Precision: {metrics['precision']:.2%}")
6print(f" โ
Recall: {metrics['recall']:.2%}")
7print(f" โ
F1 Score: {metrics['f1_score']:.3f}")
8print(f" โ
ROC AUC: {metrics['roc_auc']:.3f}")
9
10print("๐ก๏ธ SECURITY INTEGRATION:")
11print(f" โ
Total decisions made: {len(decisions_df):,}")
12print(f" โ
Risk levels classified: 5 levels")
13
14print("PROJECT COMPLETE! ๐")๐What it Does
We summarize everything we accomplished! Our model achieved 92.87% accuracy, made 3,000 security decisions, and integrated with a Zero-Trust security framework. Success! ๐
๐ฏWhy We Need It
This final summary shows we built a complete, production-ready anomaly detection system! It proves our AI works well and can be deployed to protect real IoT networks from attacks.
๐What Happens Next
Meet the Creators! ๐
The amazing people who made this project come to life!

Bounader Med Rafik
Web Developer
Creative Web Developer with a knack for clean, user-friendly interfaces. Rafik is committed to delivering seamless online experiences and is always ready to collaborate on exciting web projects.

Haraoui Kouceila
AI Developer
Innovative AI Developer and strategic Project Manager, combining technical prowess with organizational insight. Kouceila is passionate about creating intelligent systems and driving projects from ideation to execution.
Be nice, Be humble and conquer the world! ๐โจ
Get the complete Jupyter notebook with all the code!