Study selection
The datasets of spatiotemporal mouse pathology used in the present research were chosen based on the following criteria: a study had to characterize tau pathology in at least 10 distinct brain regions across at least 2 timepoints spanning a total period of at least 6 months. The average number of regions quantified across studies was 89 areas; specific numbers of regions quantified per study are discussed below in this subsection. Every study used mice with a C57/BL6 background so that mouse strain effects did not confound the results. All mice possessing a tau mutation had to have the same transgene background and so were all PS19 mice possessing the P301S familial FTD derived tau mutation, with mutant human tau transgene expression driven by the same promoter (MoPrP). One dataset used mice possessing both tauopathic and amyloidopathic transgenes, created by crossing the P301S tau transgene mice as above with mice possessing the human APPswe transgene [17]. In 5 of the 6 datasets cited in the present research mice were injected at between 2 and 3 months of age with a pathogenic tau infusate; specific injectate details differ from study to study, but included brain homogenate from Down Syndrome and AD (102 regions quantified) as well as CBD (96 regions quantified) tauopathy patients [4] as well as mouse models ([9]; 11 regions quantified), purified mutant tau [19], and synthetic short chain fibrils produced using cDNA cloning in E. coli vectors [19] injected into the hippocampus (148 regions quantified) and striatum (132 regions quantified). The study using PS19 x APPswe mice did not have exogenous seeding of pathology but still characterized the spatial development of tau pathology ([17]; 45 regions quantified). All studies used semi-quantitative, regionally realized tau pathology grading as their regional pathology measurements. Specific methodological information on each study can be found in the relevant citations, which are all listed above.
Connectivity networks
Connectivity data was taken from the supplementary dataset published along with the mesoscale mouse connectome from the ABI (MBCA; [29]). Total projection volume between regions was generated by multiplying element-wise by the rows the connectivity matrix times the number of voxels in each seeding region. As the MBCA connectivity matrix, retrieved from SI.4 in [29], gave per-voxel normalized connectivity strength, our adjustment of connectivity to the size of each region approximates total connectivity; this procedure for approximating total axon volume or connectivity between regions is laid out in detail in SI.5 in [29]. We then averaged the resulting directed connectivity matrix, N
C
, with its transpose,\( {N}_C^T \), to get the standard undirected connectivity matrix used in prior graph theoretic neuroscience models [7], including the network diffusion model [33]. We employed undirected networks rather than directed networks because recent studies on transsynaptic tau spread indicate that directional transmission biases remain ambiguous [41]. Following this, we applied a thresholding criteria of getting rid of all values that were less than 0.05 the standard deviation of the nonzero entries of N
C
, resulting in a network density < 0.14. The resultant network was a sparse matrix of 426 × 426 regions, with each cell representing thresholded approximate total axon volume.
Genetic proximity networks
In the present research, we created 3 distinct genetic proximity networks: the first network used characterized the interregional genetic expression profile similarity across all 4500 genes in the Coronal Mouse Gene Expression Atlas from the Allen Institute [26]. The second and third characterized the similarity in expression profile across smaller subsets of genes; the subsets were genes known to exert profound effects on misfolded tau aggregation and tau gene expression [12], as well as genes necessary for the synthesis and degradation and the receptors of norepinephrine, the monoamine neurotransmitter theorized to be an important factor in tauopathic disease genesis [27]. Full lists of genes from the tau pathology related and noradrenergic related specific gene expression profile networks can be found in Additional file 1: Table S1.
Interregional networks of gene expression profile similarity or proximity were calculated using the following method: First, a gene expression profile discrepancy matrix, D, was created, where each entry in the matrix was an integer corresponding to the number of genes, between any two regions, that were more than 3-fold differentially expressed, a methodology previously validated by the Allen Institute [15]. This discrepancy matrix was then inverted and exponentiated to create a proximity network with values normalized to a range from 0 to 1, using the following equation:
$$ {N}_G={e}^{-D/\lambda } $$
(1)
In Eq. (1) above N
G
represents the resulting gene expression proximity network, D represents the original discrepancy matrix, and λ is the mean of nonzero values from the discrepancy matrix, used above for normalization of the resulting values. N
G
was then thresholded to boost the signal of regionally similar gene expression profiles above the noise of the endogenously dense matrix; any values below the mean plus one standard deviation of nonzero and non-1 values was set to 0, signifying that for our purposes these two regions had effectively maximally dissimilar gene expression profiles, and resulting in a network density < 0.3, still more dense than that of C. N
G
will refer to the genetic expression network calculated across the entire set of sequenced genes in the MGA, N
T
will refer to the network calculated only on the set of tau expression and aggregation related genes, and N
N
will refer to the gene proximity network calculated only for noradrenergic neurotransmitter related genes.
Regional gene expression
In addition to the gene proximity networks, we also examine whether higher regional levels of MAPT and tau aggregation related genes, as well as higher expression levels of noradrenergic related genes, relate to regional tau pathology severity. To get measures of regional gene expression levels we extracted the normalized gene expression intensity from the Allen Institute MGA for our genes of interest and summed the expression intensity across those specific genes. This resulted in a vector that had one measurement of gene intensity per region from the connectivity and genetic atlases for our genes of interest, for each of the two aforementioned gene groups. In all cases out suite of specific gene sets, across tau aggregation related [12], tau transcription promoting [3], and noradrenergic neurotransmission related genes [27], were derived from prior work. A complete list of genes used can be found in Additional file 1: Table S1. Among genes listed in [12], we selected those that are either general proteinopathic aggregation risk factors or specifically tau related, and excluded any genes that are solely related to amyloid beta. We included PrP in the list of tau expression promoting genes because this was the promoter sequence used to drive tau transgene expression in all analyzed mouse datasets.
Spatial diffusion modeling as an alternative model to graph diffusion
We additionally created a spatial diffusion model as a comparison or alternative hypothesis to the graph diffusion model. The spatial diffusion model was based on the same fundamental network diffusion eq. (2) stated below. The difference between ND and spatial diffusion in the present study is that the network for spatial diffusion is a matrix where each entry in the matrix N
D
(i, j)is the reciprocal of the Cartesian distance between the center of mass of each GM region included in the Allen Institute’s mouse connectivity atlas. Using this distance matrix, N
D
, rather than the connectivity matrix N
C
, we ran the diffusion equation stated above in [2] to get a model approximating diffusion based on spatial proximity, which will be referred to as SPD.
Seed region proximity analyses
For the 5/6 datasets that had reported seed regions we performed what we term a Proximity Analysis. An example of this can be seen in Fig. 1. These analyses involved calculating the average connectivity, spatial distance, or genetic similarity with a given seed site or sites on a region by region basis. This produced a 426 × 1 vector, corresponding to 426 Gy matter regions available in the mouse atlas. We then compared this vector with that of empirical regional pathology from each study and an aggregated meta-dataset using a natural log transformed regression, as proximity data in all networks as well as empirical data were exponentially distributed and would give erroneously high r-values due to outliers with standard linear regression. We created the aggregated meta dataset by vertically concatenating each the data from each dataset in the y-vector, and each dataset’s corresponding predictor vector in the x-vector. As datasets were measured on different scales, the values in the y-vector were normalized by division by the maximum value, on a per dataset basis. We then performed a natural log transformed regression as above.
Network diffusion
A previous graph theoretic model of pathology progression in AD throughout a brain network was shown to be predictive of future patterns of disease progression [33]. The model captures the diffusion of the disease factor throughout the network via the Network Diffusion equation:
$$ {X}_{GD}(t)={e}^{\left(-\beta Lt\right)}\cdotp X(0) $$
(2)
This models the long range patterns of progression of the protein pathology at any time t as a product of the initial seeding pattern X(0), and the so-called diffusion kernel exp.(−βtL), with diffusion and time constants, β and t, [33], and the network Laplacian matrix, L. The Laplacian is defined as [33, 34]:
$$ \mathrm{L}=\mathrm{I}\hbox{--} {\mathrm{D}}^{\left(-1/2\right)}\cdotp \mathrm{N}\cdotp {\mathrm{D}}^{\left(-1/2\right)} $$
(3)
where N is the 426 × 426 connectivity matrix giving the strength of connections between all region pairs. Since we are interested in understanding how the same canonical network diffusion model gives pathology progression using various proximity networks, we therefore defined separate 426 × 426 matrices corresponding to pairwise proximity determined, respectively, using tracer-based connectivity, spatial distance, and gene expression similarity networks. These are denoted respectively by matrices N
C
, N
D
, N
G
, N
T
, N
N
. Note that we defined 3 different gene-based similarity matrices N
G
, N
T
, N
N
, corresponding to general, tau-specific and noradrenergic gene expression, respectively. For each proximity matrix, the corresponding Laplacian was defined using Eq. (3).
The major difference with previous ND model is that because we are interested in total pathology accumulation over time, we model tau progression as a summative or iterative process:
$$ {X}_{NT}(i)={e}^{\left(-\beta Lt\right)}\cdotp X\left(i- 1\right)+X\left(i- 1\right) $$
(4)
We use eq. (4) to calculate, for any point in time, the deposition of tau across the brain regions represented in our connectivity, spatial distance, and gene expression networks. Further information on the original network diffusion equation and its mathematical foundation can be found in both [1, 33]. The symbol meanings in Eq. (4) are the same as in Eq. (2).
The result from the network diffusion equation was, akin to proximity analyses, a vector with one entry per region represented in the connectivity, spatial distance, and gene expression networks. However, the ND model produces predictions of regional pathology, not a simple empirical measurement of network proximity with a seed region, and so does not require a seed region, but only a baseline pathology measurement. Akin to the proximity analyses discussed above, we compared our prediction vector from the ND model, run with the L from each network, to the regional pathology measurements from each dataset using a natural log transformed regression. We used both baseline measurements and, where available, reported seedpoints, as the initiation point for the ND model. An example of the ND model and how to interpret its results can be found in Fig. 3. Note in particular Fig. 3b: here we show both how we calculate βt-values, by setting β = 0 and modulating t to the value that produces the strongest correlation with the data, and how we assess predictive value added, by calculating the change in r-value from baseline to peak βt-value, in this manuscript referred to as Δr.
Comparing predictive value across different predictors
When comparing r-values, p-values, and fits across predictions from proximity or ND modeling using any of the connectivity, gene expression profile, or spatial distance networks, we employed two methods. First, using separate bivariate analyses, we obtained Pearson’s r-values between regional tau and either connectivity or gene expression. We compared the resulting r statistic directly using Fisher’s R-to-Z Test, and obtained a p-value for the likelihood of a true difference between r-values associated with different predictors. Next, we used a Multivariate Linear Model, and entered predictions from connectivity networks, regional gene expression across tau aggregation and transcription related, as well as noradrenergic related, genes, and seed region or baseline regional pathology data, as separate predictors. From this we could calculate independent per-predictor r and p-values, which we used as the basis of our comparisons. All analyses were performed using the following methods for creating the prediction and data vectors: we used only the sampled regions from each dataset in our regressions and multivariate linear models, and 2) we used all 426 regions from the MBA, with 0 pathology given in each region that went unmeasured in our y-variable vector. All above statistics were performed in MatLab.