Experiments
10 experiment runs logged.
Experiment 02, the first baseline: train GIN on the internal_only FCG dataset using a standard stratified 70/15/15 train/val/test split to establish a performance reference for graph-based ransomware detection.
Results: 98.2% / 97.8% / 100.0% / 99.0%
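The stratified split protocol can be sketched as follows. This is a minimal illustration only: `stratified_split`, its inputs, and the seed are hypothetical stand-ins, since the log does not include the experiments' actual split code.

```python
import random
from collections import defaultdict

def stratified_split(labels, fracs=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, preserving class balance.

    `labels` holds one class label per graph; returns three index lists
    whose per-class proportions match `fracs`. Hypothetical helper, not
    the experiments' actual code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fracs[0])
        n_val = round(len(idxs) * fracs[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test
```

Stratifying per class (rather than shuffling globally) keeps the ransomware/benign ratio identical across the three partitions, which matters when reporting recall and precision on a modest test set.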
Train GIN on the full_fcg dataset (all methods, internal + external) to compare with the internal_only result from Experiment 02.
Results: 95.4% / 94.7% / 100.0% / 97.7%
Train GCN on the internal_only dataset to compare GNN architectures. Same hyperparameters and data split as GIN experiments.
Results: 96.3% / 95.6% / 93.8% / 98.4%
Train GCN on the full_fcg dataset to complete the GCN comparison across both graph representations.
Results: 94.4% / 93.5% / 93.8% / 98.9%
Train GAT on the internal_only dataset to complete the three-model architecture comparison (GIN, GCN, GAT).
Results: 96.3% / 95.6% / 96.9% / 98.5%
Train GAT on the full_fcg dataset to complete the 3-model x 2-dataset baseline grid.
Results: 98.2% / 97.8% / 96.9% / 99.4%
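The completed baseline grid amounts to a loop over three models and two datasets. In this sketch, `train_fn` is a hypothetical callable standing in for whatever training entry point the experiments actually used; only the six-run bookkeeping is shown.

```python
from itertools import product

MODELS = ("GIN", "GCN", "GAT")
DATASETS = ("internal_only", "full_fcg")

def run_grid(train_fn):
    """Run the 3-model x 2-dataset baseline grid.

    `train_fn(model_name, dataset_name)` is a hypothetical callable that
    trains one model and returns its metrics; this helper only organises
    the six runs into a dict keyed by (model, dataset).
    """
    return {(m, d): train_fn(m, d) for m, d in product(MODELS, DATASETS)}
```

Keeping the grid as a single loop ensures every (model, dataset) pair is trained under identical hyperparameters and splits, which is what makes the six results comparable.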
4-fold cross-validation of GIN on internal_only to verify that the strong baseline result from Experiment 02 is not due to a lucky train/test split. Each fold uses a different 25% of the data as the test set.
Results: 95.9% / 95.4% / 99.5% / 98.8%
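The 4-fold protocol described above can be sketched as a generator over (train, test) index lists, where each fold serves once as the ~25% test set. A minimal illustration under an assumed seed, not the experiments' actual code.

```python
import random

def kfold_indices(n, k=4, seed=0):
    """Yield k (train, test) index splits over n samples.

    Samples are shuffled once, dealt into k near-equal folds, and each
    fold is used exactly once as the test set while the remaining k-1
    folds form the training set.
    """
    rng = random.Random(seed)
    idxs = list(range(n))
    rng.shuffle(idxs)
    folds = [idxs[i::k] for i in range(k)]
    for held in range(k):
        test = folds[held]
        train = [i for f in range(k) if f != held for i in folds[f]]
        yield train, test
```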
Experiment 10: hold out the simplelocker and wipelocker families entirely from training, to test whether GIN can detect ransomware families it has never seen during training. These two families were chosen as the first holdout test.
Results: 95.7% / 95.2% / 100.0% / 96.0%
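A family-level holdout split can be sketched as below. The sample records are hypothetical stand-ins for the real dataset, and in the actual experiments benign samples would also have to be divided between train and test; only the "never train on a held-out family" constraint is shown.

```python
def family_holdout(samples, held_out=("simplelocker", "wipelocker")):
    """Keep every sample from a held-out family out of training.

    `samples` is a hypothetical list of dicts with a "family" key. The
    held-out families (defaulting to the two from this run) go entirely
    to the test side, so the model never sees them during training.
    """
    train = [s for s in samples if s["family"] not in held_out]
    test = [s for s in samples if s["family"] in held_out]
    return train, test
```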
Hold out wannalocker and blackroselucy families from training. Same setup as Experiment 10 but with different held-out families to test whether generalisation depends on which families are excluded.
Results: 54.5% / 44.0% / 11.8% / 89.9%
Full leave-one-family-out (LOFO) evaluation of GIN on internal_only. Train 6 separate models, each holding out one ransomware family entirely, to systematically measure whether the baseline can detect ransomware families not seen during training.
Results: 70.7% / 48.6% / 18.1% / 85.1%
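The LOFO loop generalises the two-family holdout runs to one split per ransomware family. The record format is again a hypothetical stand-in; as sketched, all benign samples remain in training, whereas the real protocol would also reserve benign samples for each test set.

```python
def lofo_splits(samples, benign="benign"):
    """Yield one (family, train, test) split per ransomware family.

    Each split trains on everything except the named family and tests on
    that family alone; with 6 ransomware families this produces the 6
    separate models described in the log.
    """
    families = sorted({s["family"] for s in samples if s["family"] != benign})
    for fam in families:
        train = [s for s in samples if s["family"] != fam]
        test = [s for s in samples if s["family"] == fam]
        yield fam, train, test
```

Because each family is evaluated only by a model that never saw it, the aggregate LOFO figures isolate cross-family generalisation from within-family memorisation.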