Experiments
10 experiment runs logged.
Experiment 02, the first baseline: train GIN on the internal_only FCG dataset using a standard stratified 70/15/15 train/val/test split to establish a performance reference for graph-based ransomware detection.
Results: 98.2% / 97.8% / 100.0% / 99.0%
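The stratified split protocol can be sketched as follows. This is a minimal illustration only: `stratified_split`, its inputs, and the seed are hypothetical stand-ins, since the log does not include the experiments' actual split code.

```python
import random
from collections import defaultdict

def stratified_split(labels, fracs=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, preserving class balance.

    `labels` holds one class label per graph; returns three index lists
    whose per-class proportions match `fracs`. Hypothetical helper, not
    the experiments' actual code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fracs[0])
        n_val = round(len(idxs) * fracs[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test
```

Stratifying per class (rather than shuffling globally) keeps the ransomware/benign ratio identical across the three partitions, which matters when reporting recall and precision on a modest test set.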
Train GIN on the full_fcg dataset (all methods, internal + external) to compare with the internal_only result from Experiment 02.
Results: 95.4% / 94.7% / 100.0% / 97.7%
Train GCN on the internal_only dataset to compare GNN architectures. Same hyperparameters and data split as GIN experiments.
Results: 96.3% / 95.6% / 93.8% / 98.4%
Train GCN on the full_fcg dataset to complete the GCN comparison across both graph representations.
Results: 94.4% / 93.5% / 93.8% / 98.9%
Train GAT on the internal_only dataset to complete the three-model architecture comparison (GIN, GCN, GAT).
Results: 96.3% / 95.6% / 96.9% / 98.5%
Train GAT on the full_fcg dataset to complete the 3-model x 2-dataset baseline grid.
Results: 98.2% / 97.8% / 96.9% / 99.4%
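The completed baseline grid amounts to a loop over three models and two datasets. In this sketch, `train_fn` is a hypothetical callable standing in for whatever training entry point the experiments actually used; only the six-run bookkeeping is shown.

```python
from itertools import product

MODELS = ("GIN", "GCN", "GAT")
DATASETS = ("internal_only", "full_fcg")

def run_grid(train_fn):
    """Run the 3-model x 2-dataset baseline grid.

    `train_fn(model_name, dataset_name)` is a hypothetical callable that
    trains one model and returns its metrics; this helper only organises
    the six runs into a dict keyed by (model, dataset).
    """
    return {(m, d): train_fn(m, d) for m, d in product(MODELS, DATASETS)}
```

Keeping the grid as a single loop ensures every (model, dataset) pair is trained under identical hyperparameters and splits, which is what makes the six results comparable.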
4-fold cross-validation of GIN on internal_only to verify that the strong baseline result from Experiment 02 is not due to a lucky train/test split. Each fold uses a different 25% of the data as the test set.
Results: 95.9% / 95.4% / 99.5% / 98.8%
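The 4-fold protocol described above can be sketched as a generator over (train, test) index lists, where each fold serves once as the ~25% test set. A minimal illustration under an assumed seed, not the experiments' actual code.

```python
import random

def kfold_indices(n, k=4, seed=0):
    """Yield k (train, test) index splits over n samples.

    Samples are shuffled once, dealt into k near-equal folds, and each
    fold is used exactly once as the test set while the remaining k-1
    folds form the training set.
    """
    rng = random.Random(seed)
    idxs = list(range(n))
    rng.shuffle(idxs)
    folds = [idxs[i::k] for i in range(k)]
    for held in range(k):
        test = folds[held]
        train = [i for f in range(k) if f != held for i in folds[f]]
        yield train, test
```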
Experiment 10: hold out the simplelocker and wipelocker families entirely from training, to test whether GIN can detect ransomware families it has never seen during training. These two families were chosen as the first holdout test.
Results: 95.7% / 95.2% / 100.0% / 96.0%
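A family-level holdout split can be sketched as below. The sample records are hypothetical stand-ins for the real dataset, and in the actual experiments benign samples would also have to be divided between train and test; only the "never train on a held-out family" constraint is shown.

```python
def family_holdout(samples, held_out=("simplelocker", "wipelocker")):
    """Keep every sample from a held-out family out of training.

    `samples` is a hypothetical list of dicts with a "family" key. The
    held-out families (defaulting to the two from this run) go entirely
    to the test side, so the model never sees them during training.
    """
    train = [s for s in samples if s["family"] not in held_out]
    test = [s for s in samples if s["family"] in held_out]
    return train, test
```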
Hold out wannalocker and blackroselucy families from training. Same setup as Experiment 10 but with different held-out families to test whether generalisation depends on which families are excluded.
Results: 54.5% / 44.0% / 11.8% / 89.9%
Full leave-one-family-out (LOFO) evaluation of GIN on internal_only. Train 6 separate models, each holding out one ransomware family entirely, to systematically measure whether the baseline can detect ransomware families not seen during training.
Results: 70.7% / 48.6% / 18.1% / 85.1%
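The LOFO loop generalises the two-family holdout runs to one split per ransomware family. The record format is again a hypothetical stand-in; as sketched, all benign samples remain in training, whereas the real protocol would also reserve benign samples for each test set.

```python
def lofo_splits(samples, benign="benign"):
    """Yield one (family, train, test) split per ransomware family.

    Each split trains on everything except the named family and tests on
    that family alone; with 6 ransomware families this produces the 6
    separate models described in the log.
    """
    families = sorted({s["family"] for s in samples if s["family"] != benign})
    for fam in families:
        train = [s for s in samples if s["family"] != fam]
        test = [s for s in samples if s["family"] == fam]
        yield fam, train, test
```

Because each family is evaluated only by a model that never saw it, the aggregate LOFO figures isolate cross-family generalisation from within-family memorisation.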