### Datasets

The datasets used in this work and the proteins in each of them are described in Additional file 1: Datasets S1 and S2.

### Data analysis

Raw files were converted into the Mascot generic format (MGF) using Proteome Discoverer 1.4 (Thermo Fisher Scientific, Germany). MGF files were searched against a combined database containing the Swiss-Prot part of the UniProt Knowledgebase (UniProtKB) [35] for *Homo sapiens* (release 2014/05/28, 20,265 curated entries). Shuffled decoy entries were generated using DecoyDatabaseBuilder [29]. Identifications were performed with Mascot 2.5 (Matrix Science Ltd., [28]) with a peptide mass tolerance of 10 ppm, a fragment mass tolerance of 0.5 Da, one allowed missed cleavage, and carbamidomethylation (C), oxidation (M), and phosphorylation (S, T, Y) as variable modifications. Label-free relative quantification by spectral counting was performed as described in [14].

### Calculation of protein abundance

We previously reported abundances as spectral counts normalized by the total number of spectral counts in a given sample [14, 20, 25, 26]. Here, we performed an additional normalization step to account for the fact that longer proteins generate more peptides in mass spectrometry than shorter proteins of the same abundance [17]. Akin to the normalized spectral abundance factor, we divided the normalized spectral counts in our data sets by the protein length. We then divided these values by the sum of all such length-normalized values in a given sample. Finally, we averaged these normalized protein abundances across replicates and log_{10}-transformed the averages to arrive at a final abundance value.
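
The normalization steps above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `protein_abundance`, the input layout (one dictionary of spectral counts per replicate) and the choice to skip proteins with zero mean abundance before the logarithm are all assumptions.

```python
import math

def protein_abundance(counts_by_replicate, lengths):
    """Return {protein: log10 of the mean length-normalized abundance}.

    counts_by_replicate: list of {protein: spectral count}, one per replicate.
    lengths: {protein: sequence length}.
    """
    per_replicate = []
    for counts in counts_by_replicate:
        total = sum(counts.values())
        # Step 1: spectral counts normalized by the total counts in the sample.
        fraction = {p: c / total for p, c in counts.items()}
        # Step 2: divide by protein length.
        per_length = {p: f / lengths[p] for p, f in fraction.items()}
        # Step 3: divide by the sum of all length-normalized values.
        norm = sum(per_length.values())
        per_replicate.append({p: v / norm for p, v in per_length.items()})

    proteins = set().union(*per_replicate)
    abundance = {}
    for p in proteins:
        # Step 4: average across replicates (absent proteins count as 0).
        mean = sum(r.get(p, 0.0) for r in per_replicate) / len(per_replicate)
        # Step 5: log10 transform; zero means are skipped (assumption).
        if mean > 0:
            abundance[p] = math.log10(mean)
    return abundance

# Toy input with made-up counts and lengths.
replicates = [{"P1": 10, "P2": 30}, {"P1": 20, "P2": 20}]
lengths = {"P1": 100, "P2": 300}
abundance = protein_abundance(replicates, lengths)
```

In this toy input the first replicate gives both proteins equal weight once length is taken into account, even though P2 has three times as many spectral counts.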

### Calculation of gene expression from microarray data

Microarray data were obtained from BioGPS, pre-processed using gcrma as previously described. For cross-tissue analysis, cell line and malignant tissue expression levels were excluded. Transcript identifiers were converted to UniProt IDs, with cases of ambiguous conversion or absence of reviewed UniProt IDs excluded from analysis. Values ≤0 were excluded. Expression levels were then log_{10}-transformed and averaged across all values for a given UniProt ID. A similar procedure was followed for the skeletal muscle analysis, limited to the two arrays of skeletal muscle data.

### Calculation of gene expression from RNA sequencing data

Processed RNA sequencing data were obtained, with expression levels reported in FPKM (GEO Datasets GSE102138) [15]. Any values ≤0 were excluded. Identifiers were converted to reviewed UniProt IDs, with ambiguous conversions excluded from further analysis. In cases in which multiple identifiers mapped to a single UniProt ID, their FPKM values were averaged. The values were then log_{10}-transformed. Significantly upregulated and downregulated transcripts were identified based on the reported q-values. In cases in which there were multiple q-values associated with a given UniProt ID, the largest q-value was used. The q-values reported were two-tailed, which we converted to one-tailed q-values for the purpose of our analysis. We used a threshold of significance of *p* < 0.05.
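
The per-protein aggregation described above can be sketched as follows. This is a hedged illustration: the record layout `(uniprot_id, fpkm, q_value)` and the function name `aggregate_expression` are assumptions, identifier conversion is taken as already done, and the conversion from two-tailed to one-tailed q-values is left out because its details are not given here.

```python
import math
from collections import defaultdict

def aggregate_expression(records):
    """Return {uniprot_id: (log10 of mean expression, largest q-value)}.

    records: iterable of (uniprot_id, fpkm, q_value) tuples.
    """
    values = defaultdict(list)
    q_values = defaultdict(list)
    for uniprot_id, fpkm, q_value in records:
        if fpkm <= 0:
            continue  # values <= 0 are excluded
        values[uniprot_id].append(fpkm)
        q_values[uniprot_id].append(q_value)

    result = {}
    for uniprot_id, vals in values.items():
        mean = sum(vals) / len(vals)            # average first ...
        result[uniprot_id] = (math.log10(mean),  # ... then log10-transform
                              max(q_values[uniprot_id]))
    return result

# Made-up records: two transcripts map to P1, P2 has no positive value.
records = [("P1", 10.0, 0.01), ("P1", 1000.0, 0.20),
           ("P2", 0.0, 0.50), ("P3", 100.0, 0.03)]
expression = aggregate_expression(records)
```

The order of operations matters. Here values are averaged and then log-transformed, whereas the microarray procedure log-transforms first and averages afterwards, so the two are not interchangeable.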

### Calculation of protein aggregation propensity

For the human proteome set, we calculated the \( {Z}_{agg} \), \( {Z}_{agg}^{SC} \) and TANGO scores as previously described [7, 34]. For TANGO, we set the parameters at pH = 7.4, T = 310 K, and ionic strength = 0.1 M. The supersaturation score *σ* is calculated as the sum

$$ \sigma =C+Z $$

(1)

where *C* is the log_{10} of the concentration and *Z* is the aggregation propensity score; the concentrations are derived from the protein abundance levels. Within each dataset, values were recentered such that the median *σ* score was 0.
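
As a minimal sketch of this calculation, assuming that the score is the plain sum of the two terms defined above and that both are available as dictionaries keyed by protein (the names `supersaturation`, `log_conc` and `z_scores` are hypothetical):

```python
import statistics

def supersaturation(log_conc, z_scores):
    """Return median-centered supersaturation scores.

    log_conc: {protein: log10 concentration (C)}.
    z_scores: {protein: aggregation propensity score (Z)}.
    Only proteins present in both inputs are scored (assumption).
    """
    shared = log_conc.keys() & z_scores.keys()
    raw = {p: log_conc[p] + z_scores[p] for p in shared}
    median = statistics.median(raw.values())
    # Recenter so that the median score of the dataset is 0.
    return {p: s - median for p, s in raw.items()}

# Made-up values for three proteins.
log_conc = {"A": -3.0, "B": -5.0, "C": -4.0}
z_scores = {"A": 1.0, "B": 0.5, "C": 3.0}
sigma = supersaturation(log_conc, z_scores)
```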

### Identification of proteins enriched in disease-associated inclusions

To determine vacuole-enriched proteins in the IBM data set, we compared abundance values in the RV dataset to those in the DC dataset. For this analysis, we only included proteins that had a non-zero abundance in both the DC and RV datasets, a total of 1302 proteins. For these proteins, we performed a one-tailed paired t-test. We then used the Benjamini-Hochberg method to calculate q-values for each of these proteins, using a significance threshold of q < 0.05, corresponding to a false discovery rate of 5%.
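
For readers unfamiliar with the Benjamini-Hochberg procedure, the step from *p*-values to q-values can be sketched as below. This is a generic textbook implementation under the assumption that the per-protein *p*-values are already available; it is not the code used in this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Made-up p-values for four proteins.
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
```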

### Gaussian noise generation

We performed noise testing to evaluate the robustness of our results for the comparison of supersaturation scores among the IBM data sets, as well as the hPAM data sets. We defined one hundred noise levels on the basis of the standard deviation of a series of Gaussian distributions with a mean of 0. The standard deviations ranged from log_{10}(1.1) to log_{10}(10.1). At each noise level *l*, we performed 100 trials *t*, in each of which we drew a random number *n*_{l, t, p} from the distribution for that noise level for each of the *p* proteins in the database. The noise-introduced supersaturation score *σ*_{p, l, t} was defined as

$$ {\sigma}_{p,l,t}={\sigma}_p+{n}_{l,t,p} $$

(2)

For trial *t* of noise level *l*, the set *S*_{l, t} of noise values is

$$ {S}_{l,t}=\left\{{n}_{l,t,1},{n}_{l,t,2},\dots, {n}_{l,t,p}\right\} $$

(3)

The set *m*_{l, t} of linear magnitudes of noise for trial *t* of noise level *l* is

$$ {m}_{l,t}=\left\{{10}^{\left|{n}_{l,t,1}\right|},{10}^{\left|{n}_{l,t,2}\right|},\dots, {10}^{\left|{n}_{l,t,p}\right|}\right\} $$

(4)

For noise level *l*, the set *M*_{l} of median noise values for its constituent trials is

$$ {M}_l=\left\{ median\left({m}_{l,1}\right), median\left({m}_{l,2}\right),\dots, median\left({m}_{l,100}\right)\right\} $$

(5)

In each Gaussian noise plot, the values plotted on the x-axis were the median of *M*_{l} with error bars representing the standard error of the mean as calculated using default settings in the Python package SciPy.
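
The noise-generation procedure can be sketched with the Python standard library as follows. Several details are assumptions made for illustration: the spacing of the one hundred noise levels (taken here as evenly spaced fold values between 1.1 and 10.1 before the logarithm), the linear magnitude of a noise value (taken as 10 raised to its absolute value), and all function names. The standard error is computed with the sample standard deviation, which matches the default `ddof=1` of `scipy.stats.sem`.

```python
import math
import random
import statistics

def noise_levels(n_levels=100, lo=1.1, hi=10.1):
    """Standard deviations from log10(lo) to log10(hi) (spacing assumed)."""
    step = (hi - lo) / (n_levels - 1)
    return [math.log10(lo + i * step) for i in range(n_levels)]

def run_trial(sigma, sd, rng):
    """One trial: noisy scores and the median linear noise magnitude."""
    noise = [rng.gauss(0.0, sd) for _ in sigma]        # set S_{l,t}
    noisy = [s + n for s, n in zip(sigma, noise)]      # noise-introduced scores
    magnitudes = [10 ** abs(n) for n in noise]         # set m_{l,t} (assumed form)
    return noisy, statistics.median(magnitudes)

def summarize_level(sigma, sd, n_trials=100, seed=0):
    """Median of M_l and its standard error for one noise level."""
    rng = random.Random(seed)
    medians = [run_trial(sigma, sd, rng)[1] for _ in range(n_trials)]  # set M_l
    sem = statistics.stdev(medians) / math.sqrt(n_trials)
    return statistics.median(medians), sem

levels = noise_levels()
# Made-up input: 200 proteins with a supersaturation score of 0.
x_value, x_sem = summarize_level([0.0] * 200, levels[0])
```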

### Gaussian noise significance testing

For each trial at each noise level, we determined the sets of noise-modified *σ* scores for the data sets under consideration. A one-tailed Wilcoxon/Mann-Whitney U test was performed for each of these trials, with multiple hypothesis correction performed based on the same families used for the original analysis, with one difference. At each noise level, the median of the *p*-values for the 100 trials was plotted with error bars representing the standard error of the mean as calculated using default settings in the Python package SciPy. We performed a one-sided one-sample t-test on the distribution of *p*-values across the trials at a given noise level to test the null hypothesis that the mean of these *p*-values is not less than 0.05. For those cases in which we could not reject the null hypothesis, we plotted the points in grey; otherwise, we plotted the points in color.

### Gaussian noise fold change testing

For each trial at each noise level, we determined the sets of noise-modified *σ* scores for the data sets under consideration. The linear difference *d*_{l, t} between the medians of the supersaturation scores of the control set *C*_{l, t} and experiment set *E*_{l, t} being tested at noise level *l* and trial *t* is

$$ {d}_{l,t}={10}^{median\left({E}_{l,t}\right)- median\left({C}_{l,t}\right)} $$

(6)

At noise level *l*, we plotted the median of the set {*d*_{l, 1}, *d*_{l, 2}, …, *d*_{l, 100}} with error bars representing the standard error of the mean as calculated using default settings in the Python package SciPy. We performed a one-sided one-sample t-test on the distribution of fold change values across the trials at a given noise level to test the null hypothesis that the mean of these fold changes is not greater than 1. For those cases in which we could not reject the null hypothesis, we plotted the points in grey; otherwise, we plotted the points in color.
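
A minimal sketch of the fold change calculation at a single noise level is given below. It assumes that noise is drawn independently for every score in the control and experiment sets; the function names and the toy scores are hypothetical, and the t-test itself is omitted.

```python
import math
import random
import statistics

def fold_change(control, experiment):
    """Linear difference between medians of two sets of log10-scale scores."""
    return 10 ** (statistics.median(experiment) - statistics.median(control))

def fold_changes_at_level(control, experiment, sd, n_trials=100, seed=0):
    """Fold change for each trial after adding Gaussian noise to both sets."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        noisy_control = [s + rng.gauss(0.0, sd) for s in control]
        noisy_experiment = [s + rng.gauss(0.0, sd) for s in experiment]
        results.append(fold_change(noisy_control, noisy_experiment))
    return results

# Made-up supersaturation scores.
control = [0.0, 0.1, -0.1, 0.2, -0.2]
experiment = [1.0, 1.1, 0.9, 1.2, 0.8]
d = fold_changes_at_level(control, experiment, sd=math.log10(1.1))
median_d = statistics.median(d)                    # plotted value
sem_d = statistics.stdev(d) / math.sqrt(len(d))    # error bar
```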

### Overlap analysis

In Fig. 5b and Additional file 2: Figure S12B, the Fisher exact test is used to calculate enrichment of data sets for particular categories of proteins.
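
As an illustration of how such an enrichment *p*-value can be obtained, the sketch below computes the one-sided Fisher exact test from the hypergeometric distribution. The one-sided (enrichment) direction, the function name and its parameters are assumptions; the numbers are made up.

```python
from math import comb

def fisher_enrichment_p(overlap, set_size, category_size, population):
    """P(at least `overlap` category members in a set of `set_size`).

    population: total number of proteins considered.
    category_size: proteins in the category of interest.
    set_size: proteins in the data set being tested.
    overlap: proteins in both the data set and the category.
    """
    upper = min(set_size, category_size)
    total = comb(population, set_size)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy example: all 5 proteins of a set fall in a 5-protein category
# drawn from a population of 10.
p_enriched = fisher_enrichment_p(5, 5, 5, 10)
```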

### Statistical significance of escalating supersaturation

To test the significance of our observations of rising supersaturation (Fig. 4, Additional file 2: Figures S7–11), we used a simulation. The null hypothesis was that both of the following would arise by chance: 1) median Δ > 0 for a set of proteins of interest in each context, and 2) median Δ of those proteins rising successively from HC to DC to AF to RV contexts. To test this, we performed the following procedure *K* times, where *K* = 1,000,000. For each trial *k*, we randomly selected *N* proteins from the proteome (where *N* is equal to the number of proteins of interest, for instance 53 in the case of RV-enriched proteins or 51 in the case of hPAM-enriched proteins). When selecting *N*, we used the total number of proteins meeting a particular criterion, even if a smaller number of those proteins was actually present in the original dataset. For these *N* proteins, *D* is the set of median Δ compared to the proteome for each of the four contexts:

$$ D\equiv \left\{ med{\Delta }_{HC}, med{\Delta }_{DC}, med{\Delta }_{AF}, med{\Delta }_{RV}\right\} $$

(7)

If the supersaturation rose successively at each step from HC to DC to AF to RV, and median Δ > 0 in each context, we assigned a score *E*_{k} of one; otherwise, we assigned a score *E*_{k} of zero. We then summed this score over the 1,000,000 trials.

$$ D=\left\{ med{\Delta }_{HC}, med{\Delta }_{DC}, med{\Delta }_{AF}, med{\Delta }_{RV}\right\} $$

(8)

$$ {E}_k=\left\{\begin{array}{ll}1,& \mathrm{if}\ \min (D)>0\ \mathrm{and}\ med{\Delta }_{RV}> med{\Delta }_{AF}> med{\Delta }_{DC}> med{\Delta }_{HC}\\ {}0,& \mathrm{otherwise}\end{array}\right. $$

(9)

We estimated the significance of the escalation in supersaturation as follows:

$$ E=\left\{{E}_1,\dots, {E}_K\right\} $$

(10)

$$ p=\sum \limits_{k=1}^K\frac{E_k}{K} $$

(11)

In order to test the isolated contribution of escalating median Δ, we removed the constraint of median Δ > 0, and calculated a score \( {E}_k^r \):

$$ {E}_k^r=\left\{\begin{array}{ll}1,& \mathrm{if}\ med{\Delta }_{RV}> med{\Delta }_{AF}> med{\Delta }_{DC}> med{\Delta }_{HC}\\ {}0,& \mathrm{otherwise}\end{array}\right. $$

(13)

$$ p=\sum \limits_{k=1}^K\frac{E_k^r}{K} $$

(14)

For the purpose of multiple hypothesis correction, we considered all cases analyzed under our original constraints as one family and all cases analyzed under the relaxed criteria as a separate family. Multiple hypothesis correction was performed using the Holm-Bonferroni method. *P*-values for both constraints are reported in Additional file 1: Dataset S12.
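
The simulation can be sketched as below, with a much smaller number of trials than the 1,000,000 used in the study. The input layout (a per-protein Δ value for each context), the function name `escalation_p` and the toy data are assumptions, and every protein is assumed to have a value in all four contexts.

```python
import random
import statistics

CONTEXTS = ("HC", "DC", "AF", "RV")

def escalation_p(delta, n, k_trials=10_000, require_positive=True, seed=0):
    """Fraction of random protein sets whose median Δ escalates.

    delta: {protein: {context: Δ}} for the whole proteome.
    n: number of proteins drawn per trial.
    require_positive: if False, the median Δ > 0 constraint is dropped.
    """
    rng = random.Random(seed)
    proteins = list(delta)
    hits = 0
    for _ in range(k_trials):
        sample = rng.sample(proteins, n)
        d = [statistics.median(delta[p][c] for p in sample) for c in CONTEXTS]
        rising = all(a < b for a, b in zip(d, d[1:]))
        if rising and (not require_positive or min(d) > 0):
            hits += 1
    return hits / k_trials

# Toy proteome in which Δ rises across contexts but stays negative.
toy = {f"P{i}": {"HC": -4 + 0.001 * i, "DC": -3 + 0.001 * i,
                 "AF": -2 + 0.001 * i, "RV": -1 + 0.001 * i}
       for i in range(50)}
p_strict = escalation_p(toy, n=10, k_trials=200)
p_relaxed = escalation_p(toy, n=10, k_trials=200, require_positive=False)
```

In the toy data every random set escalates but never has a positive median, so the strict score is never assigned while the relaxed score always is.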

### Statistical significance of comparative median Δ

To test the significance of differences in median Δ between different contexts (Figs. 2, 3 and 4), we used a simulation. The null hypothesis was that a difference in median Δ (\( {\Delta }_{\Delta } \)) of at least the magnitude reported would arise by chance. We refer to the reported difference in median Δ as \( {\Delta }_{\Delta }^0 \). To test this, we performed the following procedure *K* times, where *K* = 1,000,000. For each trial *k*, we randomly selected *N* proteins from the proteome by the same procedure as above for escalating supersaturation. For these *N* proteins, we calculated the median Δ in contexts *C*_{1} and *C*_{2}. Note that we performed this analysis in a one-tailed fashion. The resulting set of simulated differences is

$$ {S}_{\Delta_{\Delta }}=\left\{{\Delta }_{\Delta }^1,\dots, {\Delta }_{\Delta }^K\right\} $$

(15)

where

$$ {\Delta }_{\Delta }^k= med{\Delta }_2^k- med{\Delta }_1^k $$

(16)

We assigned a score *E*_{k} to each trial and from all the trials together derived a *p*-value, as follows:

$$ {E}_k=\left\{\begin{array}{ll}1,& {\Delta }_{\Delta }^k>{\Delta }_{\Delta }^0\\ {}0,& \mathrm{otherwise}\end{array}\right. $$

(17)

$$ p=\sum \limits_{k=1}^K\frac{E_k}{K} $$

(18)

We considered all cases analyzed in this fashion as a single family. Multiple hypothesis correction was performed using the Holm-Bonferroni method. P-values are reported in Additional file 1: Dataset S12.
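
A reduced-scale sketch of this simulation is shown below. The per-context dictionaries of Δ values, the function name `comparative_p` and the toy data are assumptions, and the number of trials is far smaller than in the study.

```python
import random
import statistics

def comparative_p(delta_c1, delta_c2, n, observed, k_trials=10_000, seed=0):
    """Fraction of random sets whose difference in median Δ exceeds `observed`.

    delta_c1, delta_c2: {protein: Δ} in contexts C1 and C2.
    n: number of proteins drawn per trial.
    observed: reported difference in median Δ (C2 minus C1).
    """
    rng = random.Random(seed)
    # Only proteins with a value in both contexts are sampled (assumption).
    proteins = sorted(delta_c1.keys() & delta_c2.keys())
    hits = 0
    for _ in range(k_trials):
        sample = rng.sample(proteins, n)
        diff = (statistics.median(delta_c2[p] for p in sample)
                - statistics.median(delta_c1[p] for p in sample))
        if diff > observed:  # one-tailed comparison
            hits += 1
    return hits / k_trials

# Toy proteome in which every protein differs by exactly 1 between contexts.
c1 = {f"P{i}": 0.0 for i in range(30)}
c2 = {f"P{i}": 1.0 for i in range(30)}
p_small_effect = comparative_p(c1, c2, n=10, observed=0.5, k_trials=200)
p_equal_effect = comparative_p(c1, c2, n=10, observed=1.0, k_trials=200)
```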

### Multiple hypothesis correction

To perform adequate multiple hypothesis correction while avoiding an increase in Type II error through overcorrection, it was necessary to group our results into a series of families on which multiple hypothesis correction could be performed meaningfully. We used the following principles to divide the analyses in these studies into a set of coherent families. Except when they were being compared directly, hPAM and IBM data sets were considered part of separate families. IBM families were organized across data subsets (that is, HC, DC, AF, and RV were included in the same family). hPAM analyses were organized in three families: 1) HC, 2) DC, and 3) AF. They were organized in this way because there were multiple individual hPAMs, but analyses for the composite group of hPAM aggregate-enriched proteins could only be performed logically on the HC dataset, as the other data sets were disease-specific. Analyses using σ_{u} were considered distinct from analyses using σ_{f}. All σ_{u} analyses were considered part of a single family. Among IBM data sets, we performed a series of analyses in which we compared σ_{f} levels between the proteome and particular subsets of proteins (RV-enriched, hPAM-enriched, plaque-enriched, NFT-enriched) across the four IBM data sets (HC, DC, AF, RV). We considered analyses involving each of these subsets as separate families. Additional file 1: Dataset S12 shows a summary of all statistical tests performed in this analysis, and groups those tests by their respective families.
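
To make the family-wise correction concrete, the sketch below applies the Holm-Bonferroni method separately within each family. It is a generic implementation; the family labels, test names and *p*-values in the example are invented for illustration and do not correspond to entries in Dataset S12.

```python
from collections import defaultdict

def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

def correct_by_family(tests):
    """tests: iterable of (family, test name, p-value) tuples."""
    families = defaultdict(list)
    for family, name, p in tests:
        families[family].append((name, p))
    adjusted = {}
    for members in families.values():
        names = [name for name, _ in members]
        adjusted.update(zip(names, holm_bonferroni([p for _, p in members])))
    return adjusted

# Hypothetical tests in two families.
tests = [("IBM", "test 1", 0.01), ("IBM", "test 2", 0.04),
         ("IBM", "test 3", 0.03), ("hPAM HC", "test 4", 0.02)]
adjusted = correct_by_family(tests)
```

Because the correction is applied per family, the single test in the second family keeps its original *p*-value, while the three tests in the first family are corrected against each other.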

### Laser microdissection (LMD) and sample processing

Patients provided informed consent. Study protocols were approved by the local ethics committee (reg. number 3882–10) at Ruhr-University Bochum, Bochum, Germany. For each patient, 250,000 μm^{2} of HC, DC, AF or RV tissue was collected by LMD (LMD 6500, Leica Microsystems, Wetzlar, Germany). Sample lysis and digestion were carried out as previously described [25]. Briefly, samples were lysed with formic acid (98–100%) for 30 min at room temperature (RT), followed by a sonication step for 5 min (RK31, BANDELIN electronic, Berlin, Germany). Samples were kept frozen at −80 °C until digestion.

Prior to digestion, the formic acid was removed and the collected samples were digested in 50 mM ammonium bicarbonate at pH 7.8. Samples were reduced and alkylated by adding dithiothreitol and iodoacetamide. Trypsin (Serva) was added to a final concentration of 1 μg. Digestion was carried out overnight at 37 °C and stopped by adding TFA to acidify the sample. Samples were purified using OMIX C18 Tips (Varian, Agilent Technologies, Böblingen, Germany), completely dried under vacuum, and redissolved in 63 μl 0.1% TFA, as described in [25].

### Mass spectrometry

Sixteen microliters per sample were analysed by nano-liquid chromatography tandem mass spectrometry (nanoLC-ESI-MS/MS). The nano high performance liquid chromatography (HPLC) analysis was performed on an UltiMate 3000 RSLC nano LC system (Dionex, Idstein, Germany) as described in [26]. Peptides were separated at a flow rate of 400 nl/min using a solvent gradient from 4 to 40% B-solvent over 95 min. The column was washed for 5 min with 95% B-solvent and then returned to 4% B-solvent. The HPLC system was online-coupled to the nano electrospray ionization (ESI) source of an Orbitrap Elite mass spectrometer (Thermo Fisher Scientific, Germany). Mass spectrometric measurements were performed as previously described [14].