Toward a generalizable machine learning workflow for neurodegenerative disease staging with focus on neurofibrillary tangles

Table 2 Results for YOLO models trained with data annotated by humans

Annotator	Pre-NFT F1 (Val)	Pre-NFT F1 (Test)	Pre-NFT F1 (Emory Holdout)	iNFT F1 (Val)	iNFT F1 (Test)	iNFT F1 (Emory Holdout)	Macro F1 (Val)	Macro F1 (Test)	Macro F1 (Emory Holdout)
Novice 1	0.49 ± 0.12	0.63 ± 0.10	0.20 ± 0.09	0.76 ± 0.02	0.76 ± 0.02	0.59 ± 0.05	0.63 ± 0.07	0.70 ± 0.06	0.39 ± 0.02
Novice 2	0.44 ± 0.04	0.39 ± 0.06	0.21 ± 0.08	0.80 ± 0.01	0.73 ± 0.02	0.51 ± 0.05	0.62 ± 0.02	0.56 ± 0.02	0.36 ± 0.06
Novice 3	0.67 ± 0.08	0.65 ± 0.03	0.13 ± 0.02	0.65 ± 0.05	0.74 ± 0.01	0.59 ± 0.01	0.66 ± 0.06	0.70 ± 0.02	0.36 ± 0.01
Expert 1	0.36 ± 0.08	0.55 ± 0.05	0.21 ± 0.04	0.75 ± 0.06	0.75 ± 0.00	0.45 ± 0.06	0.55 ± 0.06	0.65 ± 0.03	0.33 ± 0.05
Expert 2	0.41 ± 0.24	0.26 ± 0.13	0.10 ± 0.01	0.64 ± 0.11	0.46 ± 0.03	0.39 ± 0.10	0.53 ± 0.09	0.36 ± 0.07	0.25 ± 0.05
Expert 3	0.40 ± 0.24	0.54 ± 0.04	0.31 ± 0.08	0.80 ± 0.05	0.75 ± 0.03	0.66 ± 0.03	0.60 ± 0.11	0.65 ± 0.03	0.48 ± 0.05
Expert 4	0.49 ± 0.21	0.40 ± 0.02	0.29 ± 0.07	0.79 ± 0.01	0.71 ± 0.02	0.59 ± 0.07	0.64 ± 0.10	0.55 ± 0.01	0.44 ± 0.06
Expert 5	0.47 ± 0.02	0.43 ± 0.06	0.17 ± 0.04	0.67 ± 0.04	0.65 ± 0.01	0.38 ± 0.04	0.57 ± 0.03	0.54 ± 0.03	0.28 ± 0.03

The Emory Holdout dataset (28 ROIs) is the consensus-annotated dataset from a hold-out Emory cohort. The Val (validation) and Test datasets were annotated by the same annotator whose labels were used to train the models, so they reflect how well the models learned that annotator's nuances. All values are reported as mean ± standard deviation across the three-fold cross-validation models for each annotator.
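As a rough illustration of how an entry of the form "mean ± standard deviation" could be produced, the sketch below takes macro F1 as the unweighted mean of the Pre-NFT and iNFT F1 scores and then summarizes it over three folds. The per-fold numbers are hypothetical placeholders, not values from the study. Computing macro F1 per fold before averaging, and using the sample standard deviation, are assumptions. The table does not state which convention the authors used.

```python
from statistics import mean, stdev

def macro_f1(per_class_f1):
    """Macro F1: unweighted mean of the per-class F1 scores."""
    return mean(per_class_f1)

def summarize(fold_scores):
    """Return (mean, sample standard deviation) across folds.

    Assumption: sample standard deviation (n - 1 denominator); the
    source does not say which estimator was used.
    """
    return mean(fold_scores), stdev(fold_scores)

# Hypothetical per-fold F1 scores for one annotator (illustrative only,
# NOT taken from Table 2).
folds = [
    {"pre_nft": 0.50, "inft": 0.70},
    {"pre_nft": 0.40, "inft": 0.80},
    {"pre_nft": 0.60, "inft": 0.75},
]

# Assumption: macro F1 is computed within each fold, then summarized.
macro_per_fold = [macro_f1([f["pre_nft"], f["inft"]]) for f in folds]
avg, sd = summarize(macro_per_fold)

print(f"Macro F1: {avg:.2f} ± {sd:.2f}")
```

Averaging the per-class means and then taking macro F1 of those gives the same mean as this per-fold approach, but the standard deviation is only defined when macro F1 is computed per fold.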

ISSN: 2051-5960