
Table 3 Model performance on the Emory holdout dataset for model-assisted-labeling models

From: Toward a generalizable machine learning workflow for neurodegenerative disease staging with focus on neurofibrillary tangles

| Models | Pre-NFT precision | Pre-NFT recall | Pre-NFT F1 score | iNFT precision | iNFT recall | iNFT F1 score | Macro F1 score |
|---|---|---|---|---|---|---|---|
| iter. 1 | 0.36 ± 0.03 | 0.40 ± 0.03 | 0.38 ± 0.03 | 0.86 ± 0.01 | 0.57 ± 0.00 | 0.69 ± 0.00 | 0.53 ± 0.01 |
| iter. 2 | 0.37 ± 0.02 | 0.45 ± 0.01 | 0.41 ± 0.01 | 0.84 ± 0.02 | 0.63 ± 0.01 | 0.72 ± 0.01 | 0.56 ± 0.01 |
| iter. 3 | 0.29 ± 0.02 | 0.46 ± 0.01 | 0.36 ± 0.02 | 0.82 ± 0.01 | 0.71 ± 0.02 | 0.76 ± 0.01 | 0.56 ± 0.01 |
| iter. 4 | 0.31 ± 0.02 | 0.47 ± 0.01 | 0.37 ± 0.02 | 0.79 ± 0.01 | 0.74 ± 0.02 | 0.77 ± 0.01 | 0.57 ± 0.00 |
| iter. 5 | 0.31 ± 0.03 | 0.51 ± 0.00 | 0.38 ± 0.02 | 0.78 ± 0.01 | 0.76 ± 0.02 | 0.77 ± 0.01 | 0.58 ± 0.02 |
| iter. 6 | 0.30 ± 0.04 | 0.53 ± 0.02 | 0.38 ± 0.03 | 0.75 ± 0.01 | 0.78 ± 0.02 | 0.77 ± 0.01 | 0.57 ± 0.02 |
| iter. 7 | 0.29 ± 0.01 | 0.53 ± 0.02 | 0.38 ± 0.02 | 0.74 ± 0.00 | 0.81 ± 0.01 | 0.77 ± 0.00 | 0.58 ± 0.01 |
| iter. 8 | 0.26 ± 0.01 | 0.54 ± 0.04 | 0.35 ± 0.02 | 0.73 ± 0.02 | 0.80 ± 0.02 | 0.76 ± 0.02 | 0.56 ± 0.02 |
| amygdala | 0.46 ± 0.06 | 0.52 ± 0.08 | 0.48 ± 0.00 | 0.73 ± 0.03 | 0.86 ± 0.02 | 0.79 ± 0.03 | 0.64 ± 0.02 |
| hippocampus | 0.27 ± 0.04 | 0.44 ± 0.08 | 0.33 ± 0.04 | 0.68 ± 0.03 | 0.78 ± 0.01 | 0.73 ± 0.02 | 0.53 ± 0.03 |
| temporal | 0.14 ± 0.06 | 0.20 ± 0.10 | 0.16 ± 0.07 | 0.76 ± 0.06 | 0.67 ± 0.05 | 0.71 ± 0.04 | 0.44 ± 0.06 |
| occipital | 0.04 ± 0.03 | 0.22 ± 0.19 | 0.06 ± 0.05 | 0.68 ± 0.09 | 0.76 ± 0.09 | 0.71 ± 0.04 | 0.39 ± 0.05 |
| QC ROIs | 0.41 ± 0.04 | 0.45 ± 0.01 | 0.43 ± 0.03 | 0.78 ± 0.01 | 0.85 ± 0.03 | 0.81 ± 0.01 | 0.62 ± 0.02 |
| best consensus | 0.36 ± 0.04 | 0.53 ± 0.03 | 0.43 ± 0.04 | 0.82 ± 0.01 | 0.70 ± 0.01 | 0.76 ± 0.01 | 0.59 ± 0.02 |

Additional models trained on modified datasets are also shown. iter.: iteration of model-assisted labeling; amygdala, hippocampus, temporal, occipital: models trained only on ROIs from the named brain region (temporal and occipital refer to the temporal and occipital cortex); QC ROIs: models trained only on ROIs whose labels were curated during model-assisted labeling; best consensus: the consensus model with n = 4 (Additional file 3: Fig. S4). Values are the mean ± standard deviation over the three models from three-fold cross-validation. Bold marks the best-performing model trained on the dataset from all brain regions.
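The table reports per-class precision, recall and F1 score, plus a macro F1 score that, judging by the values, is the unweighted mean of the Pre-NFT and iNFT F1 scores (for iter. 1, (0.38 + 0.69) / 2 ≈ 0.53). The Python sketch below illustrates these definitions and the fold summary described in the footnote. It is not the authors' code: the helper names are hypothetical, the two-class macro average is inferred from the reported numbers, and the use of the sample (n-1) standard deviation is an assumption, since the source does not say which form is used. Because averaging is linear, the mean of the per-fold macro F1 scores equals the macro average of the per-class mean F1 scores, so recomputed values should match the reported ones to within rounding (about 0.01).

```python
from statistics import mean, stdev


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class precision, recall and F1 from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(class_f1: list[float]) -> float:
    """Unweighted (macro) mean of per-class F1 scores, e.g. [Pre-NFT F1, iNFT F1]."""
    return mean(class_f1)


def fold_summary(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across cross-validation folds.

    Assumption: the source does not state whether sample (n-1) or population
    SD is reported; use statistics.pstdev for the population form.
    """
    return mean(values), stdev(values)


# Mean F1 values copied from the table: (Pre-NFT F1, iNFT F1, reported macro F1).
TABLE_F1 = {
    "iter. 1": (0.38, 0.69, 0.53),
    "iter. 2": (0.41, 0.72, 0.56),
    "iter. 3": (0.36, 0.76, 0.56),
    "iter. 4": (0.37, 0.77, 0.57),
    "iter. 5": (0.38, 0.77, 0.58),
    "iter. 6": (0.38, 0.77, 0.57),
    "iter. 7": (0.38, 0.77, 0.58),
    "iter. 8": (0.35, 0.76, 0.56),
    "amygdala": (0.48, 0.79, 0.64),
    "hippocampus": (0.33, 0.73, 0.53),
    "temporal": (0.16, 0.71, 0.44),
    "occipital": (0.06, 0.71, 0.39),
    "QC ROIs": (0.43, 0.81, 0.62),
    "best consensus": (0.43, 0.76, 0.59),
}

if __name__ == "__main__":
    for model, (f1_pre, f1_inft, reported) in TABLE_F1.items():
        recomputed = macro_f1([f1_pre, f1_inft])
        # Every table entry is rounded to two decimals, so small gaps are expected.
        print(f"{model:15s} recomputed {recomputed:.3f}  reported {reported:.2f}")
```

Running the script prints, for each model, the macro F1 recomputed from the two class F1 columns next to the reported value; all pairs agree to within the rounding of the displayed figures.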