GIN · internal_only

Leave-One-Family-Out

Mar 5, 2026

32f2e82e8a584eb5b31fdfb7a4b01fc2

Description

Full leave-one-family-out (LOFO) evaluation of GIN on internal_only. Trains six separate models, each with one ransomware family held out entirely from training, to measure whether the baseline can detect ransomware families it has not seen during training.
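The split procedure described above can be sketched as follows. This is a minimal illustration, not the code used for this run: the sample records, the `lofo_splits` helper, and the round-robin assignment of benign samples to test folds are all assumptions, since the page does not say how benign apps are divided between train and test.

```python
def lofo_splits(samples):
    """Yield (held_out_family, train, test), one fold per malware family.

    `samples` is a list of dicts with keys "id", "label" ("benign" or
    "malware") and "family" (None for benign). Hypothetical schema.
    """
    malware = [s for s in samples if s["label"] == "malware"]
    benign = [s for s in samples if s["label"] == "benign"]
    families = sorted({s["family"] for s in malware})

    for fold, held_out in enumerate(families):
        # Every sample of the held-out family goes to test, none to train.
        test = [s for s in malware if s["family"] == held_out]
        train = [s for s in malware if s["family"] != held_out]
        # ASSUMPTION: benign samples are spread round-robin over the folds,
        # so each benign sample is tested exactly once across all holdouts.
        for i, s in enumerate(benign):
            if i % len(families) == fold:
                test.append(s)
            else:
                train.append(s)
        yield held_out, train, test


if __name__ == "__main__":
    # Toy data; only the three family names come from the run page.
    demo = [
        {"id": 0, "label": "malware", "family": "wipelocker"},
        {"id": 1, "label": "malware", "family": "blackroselucy"},
        {"id": 2, "label": "malware", "family": "filecoder"},
        {"id": 3, "label": "benign", "family": None},
        {"id": 4, "label": "benign", "family": None},
        {"id": 5, "label": "benign", "family": None},
    ]
    for family, train, test in lofo_splits(demo):
        print(family, "train:", len(train), "test:", len(test))
```

The property that matters is that no graph from the held-out family is ever present in the training set of the fold that tests it.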

Conclusion

This is the main result of the baseline study. Mean malware recall across the six held-out families is only 18.1%, and three families (wipelocker, blackroselucy, and filecoder) are at 0% recall. The results indicate that the GIN baseline memorises family-specific structural patterns and does not generalise to unseen families. Family-aware evaluation is therefore the honest benchmark for this thesis.

Mean Test Metrics (across holdouts)

Accuracy	70.7%
F1 Macro	48.6%
F1 Malware	17.2%
Precision	17.4%
Recall	18.1%
AUROC	85.1%
Best Val Loss	0.1119
Training Time	6849.8 s

Summed Confusion Matrix (all holdouts)

	Pred Benign	Pred Malware
Actual Benign	398	52
Actual Malware	163	50
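Because this matrix is summed over all holdouts, metrics derived from it are pooled (micro-averaged) and need not match the per-holdout means reported above. The sketch below recomputes the pooled figures from the four counts; the `pooled_metrics` helper is illustrative and not part of the experiment code.

```python
def pooled_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the malware (positive) class."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1_malware": f1,
    }


# Counts taken from the summed confusion matrix above.
metrics = pooled_metrics(tn=398, fp=52, fn=163, tp=50)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

On these counts the pooled values are accuracy 67.6%, precision 49.0%, recall 23.5%, and F1 (malware) 31.7%. The gap to the per-holdout means (for example 17.4% mean precision) is expected when holdouts differ in size and some families have no true positives at all; the exact per-family breakdown is not shown on this page.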

Configuration

Hidden Dim	128
Num Layers	3
Dropout	0.5
Batch Size	4
Learning Rate	0.001
Weight Decay	0.0001
Max Epochs	200
ES Patience	20
ES Min Epochs	100
LR Patience	10
LR Factor	0.5
Mixed Precision	Yes
Random Seed	42
Epochs Trained	111
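
The early-stopping settings above (ES Patience 20, ES Min Epochs 100, Max Epochs 200) can be read as the rule sketched below. How the training script actually combines patience with the minimum epoch count is not stated here, so the `EarlyStopper` class and its exact semantics are an assumption: stop once at least `min_epochs` have run and validation loss has not improved for `patience` epochs.

```python
class EarlyStopper:
    """Patience-based early stopping with a minimum number of epochs.

    ASSUMPTION: the stale-epoch counter runs from the first epoch, but a
    stop is only permitted once `min_epochs` has been reached.
    """

    def __init__(self, patience=20, min_epochs=100):
        self.patience = patience
        self.min_epochs = min_epochs
        self.best_loss = float("inf")
        self.best_epoch = 0

    def should_stop(self, epoch, val_loss):
        """Record this epoch's validation loss; `epoch` is 1-based."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
        stale_epochs = epoch - self.best_epoch
        return epoch >= self.min_epochs and stale_epochs >= self.patience


def run(val_losses, max_epochs=200, **kwargs):
    """Return the number of epochs trained for a given loss sequence."""
    stopper = EarlyStopper(**kwargs)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.should_stop(epoch, loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

Under this reading, a run whose validation loss last improves at epoch 91 would stop after epoch 111, and a run whose loss stalls early would still train for the full 100 minimum epochs.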