Post-mortem series
The Mayo Clinic Brain Bank was queried to identify participants with available tissue samples who met neuropathological criteria for Alzheimer’s disease (NINCDS-ADRDA [31]), had a Braak stage ≥ four, had been scored for CAA pathology, and had an age at death greater than 55 years. The study size was maximized to include all available samples meeting these criteria (MC-CAA dataset). To assess key findings in the absence of significant Alzheimer’s disease neuropathologic change (ADNC), the Mayo Clinic Brain Bank was also queried to identify participants who had been scored for CAA but did not meet criteria for a diagnosis of AD (non-AD dataset). Due to availability, only individuals recorded as North American Caucasian were included. This study was approved by the appropriate Mayo Clinic Institutional Review Board.
Neuropathology
CAA severity was scored using Thioflavin-S staining in five brain regions (inferior parietal cortex, middle frontal cortex, motor cortex, superior temporal cortex and visual cortex). Each region was assigned a semi-quantitative score as follows: zero = no amyloid-positive vessels; 0.5 = scattered amyloid deposition only in leptomeninges; one = scattered amyloid deposition in both leptomeningeal and cortical vessels; two = strong circumferential amyloid deposition in multiple cortical and leptomeningeal vessels; three = widespread strong amyloid deposition in leptomeningeal and cortical vessels; four = same as score three plus extravasation of amyloid deposition accompanied by dyshoric amyloid. Notably, only eleven individuals had a score of four in any brain region. Regional scores were averaged, and the average was square root transformed (sqrtCAA) to meet the assumptions of parametric statistical tests. Thal phase [52] and Braak stage [11] were likewise collected using established approaches, as previously described [34, 35]. To reduce the number of variables, the distribution of CAA scores across Thal phases and Braak stages was evaluated, and categories were combined when CAA scores did not vary between them (Table S1). Furthermore, Braak stage was recorded with intermediate (half-stage) levels, which were collapsed as follows: zero (0 & 0.5), one (1 & 1.5), two (2 & 2.5), three (3 & 3.5), four (4 & 4.5), five (5 & 5.5) or six (6).
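As a minimal Python sketch of the scoring arithmetic above, the averaged, square root transformed score and the merging of Braak half-stages could be computed as follows. It assumes each participant's five regional scores are stored in a dictionary keyed by hypothetical region labels; `sqrt_caa` and `collapse_braak` are illustrative names, not the authors' code.

```python
from math import floor, sqrt

# Hypothetical labels for the five Thioflavin-S scored regions.
REGIONS = ("inferior_parietal", "middle_frontal", "motor", "superior_temporal", "visual")
# Permitted semi-quantitative regional scores.
ALLOWED_SCORES = {0, 0.5, 1, 2, 3, 4}


def sqrt_caa(scores: dict) -> float:
    """Average the five regional CAA scores, then square-root transform the mean."""
    if set(scores) != set(REGIONS):
        raise ValueError("expected exactly one score per region")
    for region, value in scores.items():
        if value not in ALLOWED_SCORES:
            raise ValueError(f"invalid score {value!r} for {region}")
    return sqrt(sum(scores.values()) / len(scores))


def collapse_braak(stage: float) -> int:
    """Merge half-stages into the lower whole stage (0.5 -> 0, 4.5 -> 4, 6 -> 6)."""
    if stage < 0 or stage > 6 or stage * 2 != int(stage * 2):
        raise ValueError(f"invalid Braak stage {stage!r}")
    return floor(stage)


example = dict(zip(REGIONS, (0, 0.5, 1, 2, 0.5)))
print(round(sqrt_caa(example), 3))  # mean 0.8, square root ~0.894
print(collapse_braak(4.5))          # 4
```

The square root is taken after averaging, so a participant with a score of one in every region has sqrtCAA = 1.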
MC-CAA genetic data
Genomic DNA was isolated from brain tissue of 853 AD cases using the AutoGen245T instrument (AutoGen) according to the manufacturer’s protocols, incubated with 2 µl (4 mg/ml) RNase A solution (Qiagen) and stored at −80 °C prior to transfer to the Mayo Clinic Genome Analysis Core (GAC) in Rochester, MN, for genotyping. Genome-wide genotypes (GWG) were generated for study participants in two batches (Additional file 1: Table S1), batch A (N = 477) and batch B (N = 376), using the Infinium Omni2.5 Exome8 v1.3 (A) or v1.4 (B) array. Genotypes were exported to a comma-separated final report file using Illumina’s GenomeStudio software v1.9.4 and v2.0.3, respectively. Final report files were converted to PLINK (v1.9) [13, 38] formatted lgen, fam, and map files using in-house scripts. Following quality control (Additional file 1: Figures S1-S2), 32 samples were removed, resulting in a total of 821 samples for analysis. Data were imputed to the Haplotype Reference Consortium (HRC) panel [28] (Additional file 1: Figure S1); variants with an imputation R2 ≥ 0.7 and a minor allele frequency (MAF) ≥ 2% were retained, resulting in 1,282,922 genotyped variants and 5,441,346 imputed variants. PLINK [13] was used to generate MAF and Hardy–Weinberg p-value annotations for all reported variants [57]. Where applicable, imputed dosages were converted to hard calls, with calls of uncertainty > 0.1 set to missing.
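The variant filters and the dosage-to-hard-call rule described above can be sketched as below. This assumes that a dosage's "uncertainty" is its distance from the nearest integer genotype (0, 1 or 2) and that MAF is computed from the dosages themselves. The function names are hypothetical; the actual processing used dedicated tools such as PLINK rather than this code.

```python
from typing import Optional, Sequence


def minor_allele_frequency(dosages: Sequence[float]) -> float:
    """MAF from allele dosages, each in the range 0..2."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def keep_imputed_variant(r2: float, dosages: Sequence[float],
                         min_r2: float = 0.7, min_maf: float = 0.02) -> bool:
    """Retain a variant only if imputation R2 >= 0.7 and MAF >= 2%."""
    return r2 >= min_r2 and minor_allele_frequency(dosages) >= min_maf


def hard_call(dosage: float, max_uncertainty: float = 0.1) -> Optional[int]:
    """Round a dosage to 0/1/2; return None (missing) if it lies > 0.1 from that call."""
    call = round(dosage)
    return call if abs(dosage - call) <= max_uncertainty else None


print(hard_call(1.96), hard_call(1.5), hard_call(0.04))  # 2 None 0
print(keep_imputed_variant(0.85, [0, 1, 2, 0]))          # True (MAF = 0.375)
```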
AMP-AD brain transcriptome datasets
Brain transcriptome datasets on the AD Knowledge Portal (Additional file 1: Table S2) were used for functional annotation. For gene and exon QTL analysis (see Statistical analysis), independently processed gene and exon counts and accompanying whole-genome sequencing (WGS) genotypes from the Mayo RNAseq dataset [1, 2] were used. The Mayo RNAseq dataset comprises transcriptome measures from temporal cortex (TCX) and cerebellum (CER); RNA isolation, data collection, sequence alignment, counting and QC have been described in detail elsewhere [1, 2]. Gene counts were normalized using conditional quantile normalization (CQN) [20]. Exon counts, mapped to the Ensembl GRCh37/hg19 assembly, were converted to RPKM ((10^9 × exon counts) / (total mapped reads × exon length)) and transformed as log2(1 + RPKM). WGS data were collected from individuals who passed prior QC and were likewise shared on the AD Knowledge Portal along with detailed methods (Additional file 1: Table S2). Independent processing and QC of the WGS data are outlined in Additional file 1: Figure S3. Genotypes were extracted from VCF files using PLINK [13]. Selected variants were annotated using online databases [9, 40].
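The exon-level transformation above is simple arithmetic; a minimal sketch (hypothetical function names and example numbers) is:

```python
from math import log2


def rpkm(exon_count: float, total_mapped_reads: float, exon_length_bp: float) -> float:
    """RPKM = (1e9 * exon count) / (total mapped reads * exon length in bp)."""
    return 1e9 * exon_count / (total_mapped_reads * exon_length_bp)


def log2_expression(exon_count: float, total_mapped_reads: float,
                    exon_length_bp: float) -> float:
    """log2(1 + RPKM); the pseudocount of 1 keeps exons with zero counts finite."""
    return log2(1 + rpkm(exon_count, total_mapped_reads, exon_length_bp))


# Hypothetical exon: 100 reads on a 1,000 bp exon in a 10-million-read library.
print(rpkm(100, 10_000_000, 1_000))                       # 10.0
print(round(log2_expression(100, 10_000_000, 1_000), 3))  # 3.459
```

The factor of 10^9 combines the "per kilobase" (10^3) and "per million reads" (10^6) scalings.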
For transcriptome profiling analysis, consensus reprocessed counts from the Mayo RNAseq study and two additional brain transcriptome datasets, the Religious Orders Study and Rush Memory and Aging Project (ROSMAP) [16, 33] and the Mount Sinai Brain Bank (MSBB) study [56], were collectively assessed (Additional file 1: Figure S4). To reduce between-study variability, the AMP-AD consortium reprocessed the raw RNAseq data from these three studies through a consensus alignment, counting and quality control pipeline, as detailed on the AD Knowledge Portal (https://adknowledgeportal.synapse.org/, Synapse ID: syn17010685) and elsewhere [55]. Gene counts and metadata for all three studies were downloaded from the AD Knowledge Portal (Additional file 1: Table S2) and underwent subsequent quality control (Additional file 1: Table S3) and CQN normalization. Neuropathological information provided in the available metadata files was used to assign individuals as AD, control, or other (Additional file 1: Table S3); only AD cases and controls were used for transcriptome profiling analysis.
Statistical analysis
In the MC-CAA dataset, to assess potential genotyping batch effects, key variables were tested for association with genotyping batch using the Wilcoxon rank sum test (sqrtCAA), linear regression (age at death), or the chi-square test (sex, APOEε4 dose, Thal phase, and Braak stage) (Additional file 1: Table S1 & Figures S5-S10). The same variables were tested for association with sqrtCAA in a multivariable linear regression model in the full dataset, and in subsets based on sex (male-only, female-only) or APOEε4 genotype (“APOEε4-neg” = APOEε22, ε23, or ε33; “APOEε4-pos” = APOEε24, ε34, or ε44), with sex or APOE excluded from the model, respectively. These analyses were carried out using R statistical software version 3.6.2.
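The batch-effect checks above could be sketched in Python as follows, assuming SciPy is available. The data are invented, only the Wilcoxon rank sum (Mann-Whitney U) and chi-square tests are shown (the age-at-death regression is omitted), and the APOE strata follow the definitions given in the text. This is an illustration, not the R code used in the study.

```python
from scipy.stats import chi2_contingency, mannwhitneyu

# APOE strata as defined in the text (genotypes written as allele pairs).
APOE_E4_NEG = {"22", "23", "33"}
APOE_E4_POS = {"24", "34", "44"}


def apoe_stratum(genotype: str) -> str:
    """Assign an APOE genotype such as '34' to the e4-negative or e4-positive subset."""
    if genotype in APOE_E4_NEG:
        return "APOEe4-neg"
    if genotype in APOE_E4_POS:
        return "APOEe4-pos"
    raise ValueError(f"unrecognised APOE genotype {genotype!r}")


def apoe_e4_dose(genotype: str) -> int:
    """Number of e4 alleles (0, 1 or 2), the categorical variable used in the batch test."""
    apoe_stratum(genotype)  # validates the genotype string
    return genotype.count("4")


# Hypothetical sqrtCAA values for each genotyping batch.
batch_a = [0.0, 0.71, 1.0, 1.22, 0.87, 1.41]
batch_b = [0.5, 0.71, 1.12, 0.94, 1.0]
rank_sum = mannwhitneyu(batch_a, batch_b, alternative="two-sided")

# Hypothetical counts: rows = batch A/B, columns = e4 dose 0/1/2.
dose_table = [[40, 30, 10], [35, 28, 12]]
chi2, chi2_p, dof, _expected = chi2_contingency(dose_table)

print(f"Wilcoxon p = {rank_sum.pvalue:.3f}; chi-square p = {chi2_p:.3f} (df = {dof})")
```

Sex, Thal phase and Braak stage would be tested against batch with the same contingency-table approach.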
For the genome-wide association study (GWAS), variant dosages were tested for association with sqrtCAA using linear regression in PLINK (v2.00a2LM) [13, 38], under an additive model adjusting for age, sex, batch, the first three principal components (PCs) accounting for population substructure, Thal phase and Braak stage. For comparison, models were also run without Thal phase and Braak stage, and both with and without APOEε2 and ε4 alleles. Sex-stratified (male-only or female-only) and APOEε4-stratified (APOEε4-pos or APOEε4-neg) analyses were likewise performed, adjusting for the same covariates and excluding sex as appropriate. Sex and APOEε4 interaction models were run in R (v3.5.2) by including an interaction term (SNP*Sex or SNP*APOEε4) in the regression model. Key variants were tested for association with dyshoric CAA by creating a binary variable in which individuals with an average CAA score of 0.5–3 were grouped together and compared with those with a score of 4, using logistic regression in R.
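To illustrate the structure of these models, the sketch below fits an additive dosage model with a SNP*sex interaction by ordinary least squares using NumPy. It runs on simulated data with only a few stand-in covariates (age, sex, one PC); it is not PLINK's implementation, and `design_matrix` and `ols_term` are hypothetical helpers.

```python
import numpy as np


def design_matrix(dosage, covariates, interaction_with=None):
    """Intercept + additive dosage + covariates, optionally + a SNP*covariate term."""
    columns = [np.ones_like(dosage), dosage]
    names = ["intercept", "dosage"]
    for name, values in covariates.items():
        columns.append(values)
        names.append(name)
    if interaction_with is not None:
        columns.append(dosage * covariates[interaction_with])
        names.append(f"dosage:{interaction_with}")
    return np.column_stack(columns), names


def ols_term(y, X, names, term):
    """Ordinary least squares; return (beta, standard error, t statistic) for one column."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - rank)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    j = names.index(term)
    se = float(np.sqrt(cov[j, j]))
    return float(beta[j]), se, float(beta[j]) / se


# Simulated (not real) data: the outcome depends on dosage, covariates and SNP*sex.
rng = np.random.default_rng(0)
n = 2000
dosage = rng.binomial(2, 0.3, n).astype(float)
covs = {
    "age": rng.normal(85, 6, n),
    "sex": rng.integers(0, 2, n).astype(float),
    "pc1": rng.normal(0, 1, n),
}
y = (1.0 + 0.2 * dosage + 0.01 * covs["age"] + 0.1 * covs["sex"]
     + 0.05 * covs["pc1"] + 0.15 * dosage * covs["sex"] + rng.normal(0, 0.3, n))

X, names = design_matrix(dosage, covs, interaction_with="sex")
b_int, se_int, t_int = ols_term(y, X, names, "dosage:sex")
print(f"SNP*sex beta = {b_int:.3f} (SE {se_int:.3f}, t = {t_int:.1f})")
```

Dropping `interaction_with` gives the plain additive model. Note that, once the interaction is included, the `dosage` coefficient is the SNP effect in the reference sex.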
For all analyses, p-values are reported. To determine genome-wide significance, we applied a p-value threshold of 2.97E-08, a Bonferroni adjustment for the 1,679,420 SNPs that remained after filtering on an r2 of 0.8 [19]. No further adjustment was applied for analyses of subsets of the dataset. GWAS and interaction-analysis association results for variants with a dbSNP (v142) reference SNP identifier were tested for enrichment of Gene Ontology [4, 53] gene sets using GSA-SNP2 [36] software, with the following options selected: European population, 20 kb padding, build GRCh37 (hg19), and a pathway size window of 10–200.
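The genome-wide threshold follows from a Bonferroni division of the family-wise error rate by the number of LD-filtered SNPs. The sketch below assumes α = 0.05, which the text does not state but which reproduces the reported value:

```python
ALPHA = 0.05          # family-wise error rate (assumed; consistent with the reported threshold)
N_TESTS = 1_679_420   # SNPs remaining after filtering on r2 = 0.8 (from the text)

threshold = ALPHA / N_TESTS


def genome_wide_significant(p_value: float) -> bool:
    """True if a p-value passes the Bonferroni-adjusted genome-wide threshold."""
    return p_value < threshold


print(f"{threshold:.4e}")  # 2.9772e-08; the text reports 2.97E-08
```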
WGS genotypes at the LINC-PINT locus were assessed for MAF and linkage disequilibrium (LD) with the index SNP using PLINK [13]. SNPs were tested for association with CQN gene expression levels (eQTL) using a linear mixed model implemented with the lme4 package [6] in R statistical software version 3.5.2, with CQN expression value as the dependent variable and variant dosage (0, 1 or 2) as the independent variable. Similarly, for exon QTL (splicing QTL = sQTL), variant genotypes were tested for association with normalized log2FPKM exon expression values. All QTL models were adjusted for diagnosis, sex, age at death, RNA integrity number (RIN), tissue source, flowcell and the first three principal components, with flowcell modeled as a random effect. Denominator degrees of freedom for the test statistic were obtained using the Kenward-Roger approximation [23] on restricted maximum likelihood (REML) fits, as implemented in the lmerTest package [25] in R.
LINC-PINT expression levels were assessed for differential expression between AD cases and controls, and for association with the expressed transcriptome in the reprocessed AMP-AD datasets using linear regression implemented in R statistical software version 3.5.2. Normalized LINC-PINT expression measures were the dependent variable, and either diagnosis or normalized gene expression levels were the independent variable; all analyses were adjusted for age at death, sex, RNA integrity number (RIN), and sequencing batch. In compliance with HIPAA, samples with age over 90 were censored and coded as “90” in all datasets for the purpose of analysis. Gene sets were tested for enrichment of gene ontology (GO) terms using the “anRichment” R package with p-values computed via the hypergeometric test. Tests for enrichment of cell type marker genes were carried out using Fisher’s exact test and previously defined cell type marker genes [2, 61]. False-discovery rate adjusted (Benjamini-Hochberg) q-values were calculated using R, as appropriate.
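The two multiple-testing and enrichment calculations named above can be sketched with the standard library. The sketch shows Benjamini-Hochberg q-values (the same step-up rule as R's `p.adjust(method = "BH")`) and an upper-tail hypergeometric p-value, which for a 2×2 gene-set table equals the one-sided Fisher's exact test for enrichment. The function names and inputs are illustrative only; the study used R and the "anRichment" package.

```python
from math import comb


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def hypergeom_enrichment_p(k, n_drawn, n_annotated, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total genes, n_annotated in the term, n_drawn)."""
    upper = min(n_annotated, n_drawn)
    hits = sum(comb(n_annotated, x) * comb(n_total - n_annotated, n_drawn - x)
               for x in range(k, upper + 1))
    return hits / comb(n_total, n_drawn)


print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))
print(hypergeom_enrichment_p(2, 2, 2, 4))  # 1/6
```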
REVIGO [50] was used to organize significant GO terms from the GSA-SNP2 and “anRichment” outputs based on similarity, and to generate summary figures using the “treemap” package in R statistical software v3.6.2. REVIGO settings were “medium (0.7)” for allowed similarity, “Homo sapiens” (Gene Ontology Jan 2017) as the database, and “SimRel” as the semantic similarity measure.
RNAscope
To further validate and visualize the RNAseq expression measures for LINC-PINT, we performed RNAscope on cerebellum tissue from 8 AD cases that were part of the Mayo RNAseq study and were identified as having high or low LINC-PINT expression. Single-nucleus suspensions were prepared from human cerebellum following an established approach [39]. Nuclei were then stained with anti-HNA (ab216943) antibodies and sorted by FANS (BD FACSAria™ II Cell Sorter) to increase purity. Sorted nuclei were seeded onto PLL-coated 8-well chamber slides and fixed with 4% PFA for 60 min at room temperature. An RNA probe targeting the LINC-PINT transcript was used in the RNAscope® Fluorescent Multiplex assay (ACDBiotech, 477631) according to the manufacturer’s instructions. DAPI was used to mark and visualize the cerebellar nuclei, and five images per condition were captured with the 63× objective of a confocal laser scanning microscope (Zeiss). Images were processed with ZEN Black software (Zeiss). A CellProfiler pipeline was established to assign the dots to their respective nuclei and was applied to each image. LINC-PINT intensity per image was calculated according to the scoring criteria developed by the manufacturer, and an H-score was assigned to each image. The Mann–Whitney test was used to assess the statistical significance of differences in H-score.
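A per-image H-score of this kind weights each dot-count bin by the percentage of nuclei that fall in it. The sketch below assumes ACD-style bins (0 dots; 1-3; 4-9; 10-15; more than 15 dots per nucleus, scored 0 to 4), giving a range of 0 to 400. These thresholds are an assumption and may not match the exact criteria used, and `dots_per_nucleus` stands in for hypothetical CellProfiler output.

```python
def acd_bin(dots: int) -> int:
    """Map a per-nucleus dot count to a 0-4 score (assumed ACD-style thresholds)."""
    if dots == 0:
        return 0
    if dots <= 3:
        return 1
    if dots <= 9:
        return 2
    if dots <= 15:
        return 3
    return 4


def h_score(dots_per_nucleus):
    """Sum over bins of (bin score x percentage of nuclei in that bin); range 0-400."""
    n = len(dots_per_nucleus)
    counts = [0] * 5
    for dots in dots_per_nucleus:
        counts[acd_bin(dots)] += 1
    return sum(score * 100 * count / n for score, count in enumerate(counts))


print(h_score([0, 2, 5, 12]))  # 25% of nuclei in each of bins 0-3 -> 150.0
```

Per-image H-scores from the high- and low-expression groups would then be compared with the Mann–Whitney test.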
Non-AD dataset
The Mayo Clinic Brain Bank was queried to identify additional participants who had been scored for CAA pathology, had an age at death greater than 55 years, and did not have a pathological diagnosis of Alzheimer’s disease (non-AD). Amongst these participants, 265 were identified with existing genotypes from a prior GWAS [21], 100 with existing whole-genome sequencing genotypes (Mayo RNAseq, genetic data, Table S2), and 217 with available DNA for genotyping, resulting in a sample size of 582 non-AD individuals. TaqMan genotyping assays (Thermo Fisher Scientific, USA) were not available for the index variant, rs10234094, so we instead investigated rs1588770, a SNP in strong linkage disequilibrium with it (r2 = 1, D’ = 1 in the AD dataset) for which an assay was available. Genotypes for rs1588770 were extracted from the prior GWAS and WGS data for 365 participants. For the remaining 217 individuals, DNA was genotyped on 384-well plates according to the manufacturer’s directions using the QuantStudio 7 Flex system (Thermo Fisher Scientific, USA). Similarly, genotypes for the APOE tagging variants rs429358 and rs7412 were extracted from existing GWAS or WGS data, or from an in-house database based on prior TaqMan genotyping. Variant rs1588770 (dominant model) was tested for association with sqrtCAA in the APOEε4-defined subsets (Table S1) using multivariable linear regression, with age at death, sex, Braak stage and Thal phase included as covariates. APOEε4 status (±) was similarly tested for association with sqrtCAA in the overall non-AD dataset. AD diagnosis was tested for association with sqrtCAA, adjusting for age, sex, APOEε2 and APOEε4. All statistical analyses for the non-AD dataset were performed using R statistical software v4.0.2.
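The genotype coding used here can be sketched as follows. APOE alleles are derived from the two tagging SNPs (the C allele at rs429358 marks ε4 and the T allele at rs7412 marks ε2), under the common assumption that the rare ε1 haplotype is absent, so a double heterozygote is read as ε2/ε4. The dominant model then collapses rs1588770 to carrier versus non-carrier. The text does not say which rs1588770 allele was counted, so `effect_allele_count` is hypothetical.

```python
def apoe_allele_counts(rs429358_c: int, rs7412_t: int) -> tuple:
    """(e2, e3, e4) allele counts from the number of rs429358 C and rs7412 T alleles."""
    for n in (rs429358_c, rs7412_t):
        if n not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    e4, e2 = rs429358_c, rs7412_t
    e3 = 2 - e2 - e4
    if e3 < 0:
        # e.g. rs429358 CC with rs7412 CT would require the rare e1 haplotype
        raise ValueError("genotype combination implies an e1 allele")
    return e2, e3, e4


def apoe_e4_status(rs429358_c: int, rs7412_t: int) -> str:
    """'+' if at least one e4 allele is present, otherwise '-'."""
    return "+" if apoe_allele_counts(rs429358_c, rs7412_t)[2] > 0 else "-"


def dominant(effect_allele_count: int) -> int:
    """Dominant coding: 1 if at least one copy of the (hypothetical) effect allele."""
    return int(effect_allele_count >= 1)


print(apoe_allele_counts(1, 0), apoe_e4_status(1, 0), dominant(2))  # (0, 1, 1) + 1
```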
Data sharing
The data in this manuscript are available via the AD Knowledge Portal (https://adknowledgeportal.synapse.org). The AD Knowledge Portal is a platform for accessing data, analyses and tools generated by the Accelerating Medicines Partnership (AMP-AD) Target Discovery Program and other National Institute on Aging (NIA)-supported programs to enable open-science practices and accelerate translational learning. The data, analyses and tools are shared early in the research cycle without a publication embargo on secondary use. Data are available for general research use according to the following requirements for data access and data attribution (https://adknowledgeportal.synapse.org/DataAccess/Instructions). For access to content described in this manuscript see https://doi.org/10.7303/syn22228853.