Skip to main content

Single-nucleus multi-omics of Parkinson’s disease reveals a glutamatergic neuronal subtype susceptible to gene dysregulation via alteration of transcriptional networks

Abstract

The genetic architecture of Parkinson’s disease (PD) is complex and multiple brain cell subtypes are involved in the neuropathological progression of the disease. Here we aimed to advance our understanding of PD genetic complexity at a cell subtype precision level. Using parallel single-nucleus (sn)RNA-seq and snATAC-seq analyses we simultaneously profiled the transcriptomic and chromatin accessibility landscapes in temporal cortex tissues from 12 PD compared to 12 control subjects at a granular single cell resolution. An integrative bioinformatic pipeline was developed and applied for the analyses of these snMulti-omics datasets. The results identified a subpopulation of cortical glutamatergic excitatory neurons with remarkably altered gene expression in PD, including differentially-expressed genes within PD risk loci identified in genome-wide association studies (GWAS). This was the only neuronal subtype showing significant and robust overexpression of SNCA. Further characterization of this neuronal-subpopulation showed upregulation of specific pathways related to axon guidance, neurite outgrowth and post-synaptic structure, and downregulated pathways involved in presynaptic organization and calcium response. Additionally, we characterized the roles of three molecular mechanisms in governing PD-associated cell subtype-specific dysregulation of gene expression: (1) changes in cis-regulatory element accessibility to transcriptional machinery; (2) changes in the abundance of master transcriptional regulators, including YY1, SP3, and KLF16; (3) candidate regulatory variants in high linkage disequilibrium with PD-GWAS genomic variants impacting transcription factor binding affinities. To our knowledge, this study is the first and the most comprehensive interrogation of the multi-omics landscape of PD at a cell-subtype resolution. Our findings provide new insights into a precise glutamatergic neuronal cell subtype, causal genes, and non-coding regulatory variants underlying the neuropathological progression of PD, paving the way for the development of cell- and gene-targeted therapeutics to halt disease progression as well as genetic biomarkers for early preclinical diagnosis.

Introduction

Parkinson’s disease (PD) is the second most common and the fastest-growing neurodegenerative disorder [1, 2], with a complex etiology involving environmental and genetic factors. Mutations in several genes such as, alpha-synuclein (SNCA), LRRK2, GBA, DJ-1 (PARK7), PINK1, and PRKN, were discovered in families with the rare familial form of PD (fPD) [3, 4]. However, the genetic complexity of the more common non-mendelian PD is yet to be fully understood. Large-scale genome-wide association studies (GWAS) have identified more than 100 single nucleotide variants (SNV) [5, 6] and structural variants (SV) [7, 8] associated with increased risk for PD, pointing at disease-associated chromosomal regions but not directly identifying causal variants and/or genes driving PD pathogenesis. Moreover, several PD-associated genetic variants, including short tandem repeat variants (STRs), have been suggested to have functional effects on gene expression [9,10,11,12]. However, the mechanisms through which they affect gene expression in the context of PD have not yet been described. Recent technological advancements have facilitated more granulated analyses on the molecular phenotypes of PD at the single-cell level. Single-nucleus RNA sequencing (snRNA-seq) studies have examined cellular heterogeneity of the substantia nigra (SN) and cortical regions [13, 14], and identified a sub-population of SN dopaminergic neurons that is selectively vulnerable to degeneration in PD [15]. Other studies suggested that microglial-mediated neuroinflammation [16] and astrocyte-mediated modulation of neuronal health [17] play roles in PD pathogenesis. More recently, an snRNA-seq analysis identified PD-associated gene expression changes of broad cell types in the SN, and following data integration with snATAC-seq and Chromatin Immunopreciptation (ChIP)-seq discovered candidate cis regulatory elements (cCREs) linked to PD-associated differentially expressed genes (DEGs) [18]. However, these emerging studies represent only the first steps towards a comprehensive exploration of the genetic architecture of PD. While in-depth transcriptomic analyses of PD have only been carried out on the broad cell type level, evidence suggests that specific cellular subtypes play important differential roles in the disease [15]. Thus, to gain meaningful understanding of the molecular mechanisms driving PD pathogenesis it is imperative to examine the genetics of PD at a more precise resolution within specialized cell subtypes. An additional gap in knowledge arises from the fact that the few PD single-cell transcriptomic analyses [13,14,15, 18] have focused only on the SN where early neurodegeneration is most severe and thus may obfuscate distinction between disease-driving events vs. consequences of neurodegenerative processes. As cortical brain regions have been shown to play an important role in PD pathology in later stages as the disease progresses [19,20,21], an examination of gene expression in cortical tissue with no to mild pathological perturbation provides a valuable window to capture molecular changes occurring prior to neurodegenerative processes, as well as offering new insights into genetic factors contributing to disease progression.

Here we conducted the largest single-cell-sequencing-based analysis of PD to date, and the first in the human temporal cortex (TC), examining integrated transcriptomic and chromatin accessibility data at a cell-subtype level. Moreover, this is the first study evaluating the functional impact of SNVs, indels, and STRs on the binding affinity of transcription factors co-expressed in the same particular cell subtypes as the candidate PD-DEGs. Collectively, our goal was to discover cell subtype-specific gene expression changes associated with PD and their linked regulatory elements, variants, and transcriptional networks. We further aimed to identify and characterize cellular subtype/s exhibiting selective susceptibility to gene dysregulation in PD progression. Figure 1 describes our overall approach towards accomplishing these aims. Briefly, we defined in parallel gene expression (snRNA-seq) and chromatin accessibility (snATAC-seq) profiles of more than 200,000 nuclei extracted from 12 PD and 12 normal TC tissue samples. This enabled us to identify and characterize a sub-population of glutamatergic excitatory neurons that showed remarkable PD-associated dysregulation of transcriptional programs compared to all other neuronal subtypes. Finally, we integrated our data with PD-GWAS risk loci and transcription factor binding motif datasets to obtain new insights into the potential mechanisms underlying cell subtype-specific dysregulation of gene expression in PD.

Fig. 1
figure 1

Patterns and predicted mechanisms of gene regulation in cell subtypes of the temporal cortex in PD. Summary of analysis pipeline and major findings. snRNA-seq analysis of temporal cortex nuclei from normal and PD donors revealed PD-dependent differential gene expression in cell subtype clusters, with strong polarization towards up- or downregulation in specific clusters. Exc5, Micro1 and OPC1 clusters stood out as having greater numbers of differentially-expressed genes (DEGs) compared to other clusters. Upregulated DEGs were enriched for functional pathways relating to translation, proteasomal protein degradation, mitochondrial function, cellular respiration, and intracellular transport. Downregulated DEGs were enriched for functional pathways relating to chromatin organization, DNA damage response, regulation of apoptosis, and cellular recycling. Exc5 was characterized by positive marker genes enriched for functions relating to excitatory neurotransmission, postsynaptic structure, regulation of intracellular ion homeostasis, and neuron development including axon guidance. snATAC-seq analysis revealed overall increased chromatin accessibility in PD for each cell subtype cluster examined. In some cases, differentially-accessible peaks (DAPs) formed networks (CCANs) with peaks that were coaccessible to regulatory regions of DEGs (candidate cis-regulatory elements; cCREs). Enrichment of transcription factor binding sites (TFBSs) was identified within cCREs of numerous GWAS DEGs. cCREs of certain PD GWAS-DEGs were found to be enriched for binding of high numbers of DEG TFs, including MPP2, BRD2, ATXN7L3, EIF4A1, and EIF5A, among others. DEG-encoded TFs (TF-DEGs) found to be enriched for binding within cCREs of multiple GWAS DEGs included YY1, SP3, and KLF16, which was itself also encoded by a GWAS DEG, among others. Additionally, sequence variants with high LD to GWAS-defined PD risk alleles were identified within cCREs of GWAS DEGs predicted to enhance or disrupt TF binding, including binding of several TF-DEGs (including YY1, SP3 and KLF16) predicted to regulate multiple PD GWAS-DEGs

Results

Characterization of cell types and subtypes in the human temporal cortex (TC) of individuals with PD and neurologically normal controls in snRNA-seq dataset

Nuclei were isolated from frozen post-mortem human TC tissues of 12 PD and 12 normal control individuals, 6 females and 6 males from each diagnosis group (Table 1 and Table S1 summarize the demographic and neuropathological phenotypes).

Table 1 Demographics and pathology summary of study cohort

After quality control (QC) filtering, a total of 208,081 nuclei from all 24 TC samples were retained for snRNA-seq (Table S1). Cell types were annotated by the label transfer method [22] following gene expression library sequencing, using a pre-annotated reference snRNA-seq dataset [23]. Annotations were validated by examination of cell type-specific marker gene expression [24] (Fig. S1). Six major cell types were identified including: astrocytes (Astro), excitatory neurons (Exc), inhibitory neurons (Inh), microglia (Micro), oligodendrocytes (Oligo), and oligodendrocyte precursor cells (OPC; Fig. 2A). Oligodendrocytes were the most prevalent cell type in our dataset (96,812 nuclei in total), followed by excitatory neurons (50,590 nuclei), whereas OPCs were the least prevalent major cell type (9564 nuclei). Vascular and leptomeningeal cells represented less than 1% of recovered nuclei in our dataset and were therefore excluded from downstream analyses. Recovery of sufficient quantities of this cell type would likely require sample pooling or additional enrichment procedures to enable cell dissociation from the vessel basement membrane [25, 26]. Following dimensional reduction by principal component analysis (PCA) and uniform manifold approximation and projection (UMAP), Louvain community detection was used to delineate 36 nuclei clusters with distinct gene expression profiles (Fig. 2A, Table S2) and representing subtypes of the six major cell types. Cell subtypes were each given a unique label according to their most common cell type (e.g., the 12 clusters of excitatory neurons were labeled Exc1-Exc12; Fig. 2A, Table S2). To check for donor-based batch effects, we examined the distribution of nuclei across subtype clusters for each donor sample. Donor samples of both Normal and PD groups overall showed qualitatively even distribution across cell subtype clusters (Fig. S2).

Fig. 2
figure 2figure 2

Differential gene expression in major cell types and subtype clusters. A UMAP dimensional reduction plot of major cell types: Astrocytes (Astro), Excitatory neurons (Exc), Inhibitory neurons (Inh), Microglia (Micro), Oligodendrocytes (Oligo), and Oligodendrocyte Precursor Cells (OPC) (left) and 36 cell subtype clusters (right) based on snRNA-seq data. B Histogram showing the number of DEGs with |log2FC| > 0.2 for each cell subtype cluster. Proportions of upregulated DEGs are indicated in red, while the proportions of downregulated DEGs are indicated in blue for each cluster. Total number of DEGs is indicated above each bar. C Unbiased volcano plots for selected cell subtype clusters, representing astrocyte (Astro), excitatory neuron (Exc), inhibitory neuron (Inh), microglia (Micro), oligodendrocyte (Oligo), and oligodendrocyte precursor (OPC) cell subtypes based on snRNA-seq clustering. Log2 fold change (FC) between PD and normal control samples is plotted against –log10 p value (FDR). Points representing DEGs with statistically significant (FDR < 0.05) upregulation in PD are shown in dark red while DEGs with significant downregulation are shown in dark blue. Genes without significantly differential expression are shown as gray points. The three DEGs with the highest absolute fold change (|log2FC| > 0.2) in the up- and downregulated categories are labeled in dark red and dark blue, respectively. The three familial PD genes with the highest absolute log2FC in the up- and downregulated categories are labeled in pink and teal, respectively. The three DEGs within 500 kb of PD-associated SNPs previously identified in GWAS (PD GWAS-DEGs) with the highest absolute log2FC in the up- and downregulated categories are labeled in bright red and bright blue, respectively. D Metascape analysis of enriched pathways among all upregulated DEGs across cell subtype clusters. Network diagrams (i) indicate shared genes among enriched biological pathways. Node colors indicate clusters with similar functionality (indicated by cluster labels) and node sizes are proportional to the number of marker genes included in the pathway. Widths of lines linking pathway nodes are proportional to the number of shared genes between the two linked pathways. Heatmaps (ii) indicate the top 20 enriched biological pathway terms across cell subtype clusters. Darker-colored rectangles indicate greater statistical significance (proportional to Log10 p value) of pathway enrichment. Pathways are labeled with dots indicating membership in broad-functional based clusters based on clustering in associated networks. Cladogram connecting lines indicate similarity of pathways (left) and clusters (top) with regard to shared DEGs in associated pathways. E Metascape analysis of enriched pathways among all downregulated DEGs across cell subtype clusters. Format is as in panel D. F Metascape analysis of enriched pathways among only upregulated PD GWAS-DEGs across cell subtype clusters. Format is as in panel D. G Metascape analysis of enriched pathways among only downregulated PD GWAS-DEGs across cell subtype clusters. Format is as in panel D

PD pathology is characterized by neuronal cell death [27]. Thus, we compared the proportions of major cell types and subtypes (number of nuclei from a particular donor sample belonging to a particular cell type/subtype divided by total number of nuclei for the same sample) between PD and control samples. The comparison analysis was performed using the Mixed-Effects Association of Single Cells (MASC) algorithm [28] with age, sex, postmortem-interval (PMI), number of nuclei after QC filtering, median genes per cell, and average library size as fixed effects and sample donor ID included as a random effect. We did not identify any significant changes in either major cell type or cell subtype cluster proportions between the PD and the control groups (Fig. S3A), as expected according the low McKeith scores in the TC tissues of the PD group (Table 1 and S1).

Cell-type and subtype-specific differential gene expression in PD

To examine differential gene expression between PD and the normal control samples using the snRNA-seq data, we employed the NEgative Binomial mixed model Using a Large-sample Approximation (NEBULA) algorithm [29]. First, we analyzed the contributions of 41 metadata variables (listed in Table S1) to sample variance and found that the number of nuclei after QC filtering, median number of genes per cell, and average library size contribute significantly to the variability in gene expression. Thus, we included these variables as fixed effects in our NEBULA model, along with donor age, sex, and sample collection PMI. In addition, sample donor ID was included as a random effect to account for donor-specific variance. Changes in gene expression were compared between PD and control for each of the 6 major cell types and each of the 36 subtype clusters, using a gene-level false discovery rate (FDR)-corrected p value significance threshold of < 0.05 and a log2 fold-change (FC) of ≥ 0.2 to define differentially expressed genes (DEGs). Among the major cell types, microglia produced the highest number of DEGs (n = 4849), while excitatory neurons produced the lowest number with only 3 DEGs (Fig. S4A). Of the major cell types, excitatory neurons were divided into the highest number of subtype clusters (n = 12) which also tended to be clearly delineated in UMAP space (Fig. 2A), indicating a high degree of diversity among cell subtypes that may diminish the utility of aggregate analyses. Variability between clusters in this case may have overwhelmed variability between disease states. In contrast, microglia separated into only 3 clusters, with the bulk of nuclei belonging to the Micro1 cluster (Fig. 2A, Table S2). The fPD gene DJ-1 (PARK7) was identified as a DEG in both inhibitory neurons and oligodendrocytes, and PRKN and LRRK2 were DEGs in microglia and OPCs, respectively (Fig. S4A).

A granulated analysis at the cell subtype level revealed a total of 11,612 unique DEGs, with the highest numbers of DEGs in Exc5, Micro1, and OPC1 clusters (n = 3899, 5640, and 5382, respectively; Fig. 2B, Table S3). On the other hand, there were no DEGs in the Inh9 and OPC2 clusters. Noteworthily, while Exc5 and other Exc clusters (e.g., Exc3 and Exc8) showed numerous DEGs, very few DEGs were found in Exc neurons when the analysis was performed at a broad cell type level, suggesting roles for specific Exc subtypes in PD pathogenesis. Notably, we observed that in most cell subtypes the directionality of the DEGs was highly polarized, with the vast majority of DEGs being either upregulated or downregulated in PD in a given subtype (Fig. 2B, C, Fig. S4B). While directionality was generally consistent within clusters of a particular major cell type, in some cases specific subtype clusters showed a different directionality than the majority of clusters of that cell type. For example, the Exc6, 7, and 9 clusters showed primarily PD-associated decreases in gene expression while the other Exc clusters showed primarily increased expression (Fig. 2B, C, Fig. S4C). We then integrated the cell subtype DEGs dataset with 90 PD-GWAS loci [5] (Table S3). Out of a total of 2428 genes mapped to PD-GWAS regions, defined as ± 500 Kb of GWAS-SNVs, 628 were DEGs in at least one cell subtype (hereafter designated as PD GWAS-DEGs), including SNCA (Exc5 and Micro3), GBA (Inh2), and LRRK2 (OPC1) (Fig. 2C, Fig. S4B, Table S3). SNCA, GBA, and LRRK2 are not only PD GWAS loci in non-mendelian PD cases, but are also established fPD genes [4]. In addition to these three genes, in several clusters we identified other fPD DEGs, including, DJ-1 (Exc5, Exc12, Inh1, Inh5, and Micro3), and PINK1 (Exc8, Inh5, and Micro3) (Fig. 2C, Fig. S4B).

Biological pathways enriched for cell-type specific PD-associated DEGs

Next, we assessed the biological significance of the identified DEGs using Metascape [30]. Enriched pathways for upregulated DEGs in the different cell subtypes were grouped into major functional categories relating to protein synthesis and stabilization, protein catabolism, intracellular transport, mitochondrial function and cellular respiration, transcriptional regulation, circadian signaling, and cytoskeletal dynamics (Fig. 2Di). Upregulation of these pathways was most strongly observed among neuronal subtypes, but was also detected in oligodendrocyte and microglia clusters (Fig. 2Dii). Downregulated DEGs across cell subtypes were enriched in pathways relating to DNA damage response, cellular stress response (including regulation of apoptosis), chromatin organization, catabolic processes (protein and other types), RHO GTPase signaling, cytoskeletal dynamics, kinase signaling, protein complex assembly, cancer pathways, and retinoic acid signaling (Fig. 2Ei). These downregulated pathways were enriched primarily in glial subtypes (Micro1, Micro2, Oligo6, Astro 1, Astro3, and OPC1; Fig. 2Eii).

To pinpoint pathways driving PD risk, we next examined pathway enrichment among PD GWAS-DEGs only. Upregulated pathways were enriched in categories relating to neurotransmission, membrane trafficking and organization, mitochondrial function and mitochondrial stress response, translation, and nucleocytoplasmic transport (Fig. 2Fi). Shared enriched pathways for upregulated PD GWAS-DEGs were found exclusively among neuronal subtypes (Fig. 2Fii), suggesting neurons as the primary PD drivers. Furthermore, these pathways were also associated with other PD-related functions, including protein synthesis [31,32,33,34] and catabolism [35,36,37,38], and mitochondrial function [39,40,41,42,43]. Downregulated PD GWAS-DEGs were enriched in pathways of diverse categories including DNA metabolism, immune response, cellular recycling pathways, mitochondrial organization, and vesicle transport (Fig. 2Gi). The analysis pointed to a decrease in stress-responsive functions, including pathways associated with the DNA damage response, in accordance with recent work highlighting the importance of DNA repair and other stress-response mechanisms in PD [44]. Downregulated pathways were enriched in glial cells, most prominently within the Micro1 and OPC1 clusters, including immune response pathways specific to microglia (Fig. 2Gii). These results suggest suppression of pathways regulating DNA metabolism and chromatin structure, cellular recycling, and immune-based stress responses within glial subtypes as drivers of PD. We also described the most highly enriched up- and downregulated PD GWAS-DEG pathways for individual cell subtype clusters (Fig. S5A and B, respectively).

Disproportionate impact of a specific glutamatergic excitatory neuron subtype on PD pathological progression

The Exc5 cluster showed the most PD-upregulated DEGs and dysregulated biological pathways compared to other neuronal clusters in our dataset. Moreover, Exc5 was the only neuronal cluster showing a significant and robust overexpression of SNCA (Figs. 2C, S4B), further supporting a role for this neuronal subtype in later PD stages along the neuropathological progression of the disease [45, 46]. Thus, we sought to characterize in-depth this excitatory neuron subtype in the context of its relevance to PD (Fig. 3A). To this end, we first examined the expression of 10 canonical markers in all Exc and Inh neuronal clusters to annotate neuronal subtypes (Fig. S6). The vesicular glutamate transporter SLC17A7 was expressed in all Exc subtypes (Fig. 3B, Fig. S6), indicating the predominance of glutamatergic neurons within our TC samples, and markers for GABA transmission (SLC6A1, GAD1) were expressed among Inh neuron clusters, as expected. Next, to further refine the characterization of the Exc neuronal subtypes and Exc5 in particular, we examined the expression of 56 marker genes previously used to categorize excitatory neuron subtypes [14, 47, 48] in nuclei of each of the Exc clusters (Fig. 3B, Fig S7). Exc5 cells showed high expression of CBLN2, RASGRF2, CUX2, and PHACTR2, but RORB or FOXP2 expression was not detected (Fig. 3B). Four markers were highly expressed in Exc5, the autophagic genes LAMP5 and HPCAL1 [49,50,51], and the long noncoding RNAs LINC00507 and LINC01500 (Fig. 2B). Exc1 was the only cluster showing a similar expression profile to Exc5 with relatively high co-expression of three out of the four markers, LAMP5, HPCAL1 and LINC00507, whereas LINC01500, an upper cortical layer marker [47], was specifically expressed only in Exc5. Consistently with our findings, the expression of LINC00507, known to be involved in cortical development [52], was previously correlated with LAMP5 expression [47]. Next, we further characterized the transcriptional profile of Exc5 cells compared to all other excitatory neuron clusters using FindMarkers in Seurat (Methods) and identified 432 upregulated and 498 downregulated marker genes (|log2FC| threshold of ≥ 0.25 and a gene-level FDR-adjusted p value threshold of < 0.01; Table S4). Metascape pathway enrichment analysis identified multiple enriched pathways for both upregulated and downregulated gene sets. Upregulated pathways (Fig. 3C) were related to neuronal development and organization, including the Roundabout (ROBO-SLIT) signaling pathway [53] and semaphorin genes (e.g. SEMA3C; Fig. 3D), both involved in axon guidance and neurite outgrowth [54]. Other upregulated pathways involved synapse formation, in particular the postsynaptic structure, and included CBLN2, involved in synaptic structure maintenance and glutamatergic synaptic transmission [55], and genes with roles in the postsynaptic structure such as glutamate receptor-encoding GRIA4 (Fig. 3D). Additionally, intracellular ion homeostasis pathways were enriched with numerous genes, such as the voltage-gated calcium channel subunit-encoding CACNA1E (Fig. 3D). Downregulated pathways (Fig. 3E) were enriched for genes involved in presynaptic organization (e.g., Synaptoporin-encoding SYNPR), synaptic transmission (e.g., voltage-gated potassium channel subunit-encoding KCNH5), inhibitory signal reception (e.g. GABA receptor subunit-encoding GABRG3), and calcium-response (e.g., CDH6 and PDE1C; Fig. 3F).

Fig. 3
figure 3

Characterization of excitatory neuron subtype cluster Exc5 by differential gene expression. A UMAP dimensional reduction plot of 36 cell subtypes in snRNA-seq dataset. Boxed area and associated enlarged panel indicates excitatory neuron (Exc) subtypes and the Exc5 cluster is indicated by red circle. B Violin plots of log-normalized, corrected count data showing expression of canonical excitatory neuron markers among the 12 Exc subtype snRNA-seq clusters. C Metascape analysis of enriched pathways among genes identified as upregulated in Exc5 compared to the other Exc clusters using the FindMarkers function of the Seurat R package. (i.) Network diagram indicating shared genes among enriched biological pathways. Node colors indicate clusters with similar functionality (indicated by cluster labels) and node sizes are proportional to the number of marker genes included in the pathway. Widths of lines linking pathway nodes are proportional to the number of shared genes between the two linked pathways. (ii.) The top 20 enriched biological pathway terms based on all upregulated Exc5 marker genes. Statistical significance (Log10 p value) is plotted on horizontal axes. Darker-colored bars indicated greater significance. Pathways are labeled with dots indicating membership in broad-functional based clusters based on clustering in i. D Feature plots of log-normalized, corrected count data showing expression of selected positive Exc marker genes with functions relating to major functional categories identified in panel C superimposed on UMAP plots. Darker violet coloring indicates stronger gene expression for corresponding nuclei. E Metascape analysis of enriched pathways among genes identified as downregulated in Exc5 compared to the other Exc clusters using the FindMarkers function of the Seurat R package. i and ii are formatted as in panel C for downregulated instead of upregulated Exc5 marker genes. F Feature plots of log-normalized, corrected count data showing expression selected negative Exc marker genes with functions relating to major functional categories identified in panel D superimposed on UMAP plots. Darker violet coloring indicated stronger gene expression for corresponding nuclei

Cell type and subtype-specific differential chromatin accessibility in PD

In the snATAC-seq dataset, 167,445 nuclei were retained after QC filtering (Table S5) and Louvain community detection distinguished 32 cell subtype clusters representing the same six major cell types as in the snRNA-seq data (Fig. 4A). We also checked for donor-based batch effects in snATAC-seq clustering by examining the distribution of nuclei across subtype clusters for each donor sample, and showed again qualitatively even distribution of donor samples and diagnosis groups across subtypes (Fig. S9). Comparisons of major cell types and subtype clusters using MASC did not reveal significant depletion or enrichment of any cell types in PD compared to control, consistent with the snRNA-seq findings and as expected from the observed TC McKeith stages (Fig. S3B). We further mapped the 32 snATAC-seq clusters to 22 of the corresponding snRNA-seq clusters (Fig. 4A), and focused the downstream analyses on these corresponding cell subtype clusters. Chromatin accessibility differential analysis between PD and control groups by NEBULA (number of nuclei, age, sex, and PMI were included as covariates in the model) identified differentially accessible peaks (DAPs) in 20 of the subtype clusters (FDR < 0.05 and a log2 FC cutoff of > 0.2; Figs. 4B and S10). Of these 20 clusters, Oligo2 demonstrated the highest number of DAPs with 28,377, while OPC1 had the fewest with only 5 DAPs (Figs. 4B and S10). All clusters showed higher numbers of DAPs with increased as opposed to decreased accessibility in PD (Fig. S10).

Fig. 4
figure 4

Differential chromatin peak accessibility in snATAC-seq cell subtype clusters. A UMAP dimensional reduction plots of snATAC-seq nuclei data indicating major cell types (left panel), subtype clusters based on snATAC-seq data (center panel) and snRNA-seq-linked clusters determined via label transfer (right panel). B Unbiased volcano plots for example linked-subtype clusters belonging to each of the major cell types. Log2 fold change (FC) in peak accessibility between PD and normal control samples is plotted against –log10 p value (FDR). Points representing DAPs with statistically significant (p < 0.05) upregulation in PD are shown in dark red while DAPs with significant downregulation are shown in dark blue. Peaks without significantly differential expression are shown as gray points. The three DAPs with the highest absolute fold change (|log2FC| > 0.2) in the more- and less-accessible categories are labeled with their nearest genes in dark red and dark blue, respectively. The closest genes to the top three more- and less-accessible DAPs within 500 kb of PD GWAS-SNVs are labeled in teal and pink, respectively. The DAPs closest to familial PD genes with the highest absolute log2FC in the more- and less-accessible categories are labeled in pink and teal, respectively. C Genome track plots showing the genomic locations of DAPs in relation to the hereditary PD genes SNCA, LRRK2, and PRKN for each of the indicated linked cell subtypes. More accessible DAPs are shown in red while less accessible DAPs are shown in blue. Gene coding regions are shown in green, with direction of transcription indicated by arrows, and exons indicated by vertical bars. The remaining area of the coding region comprises the gene introns. D Venn diagrams showing numbers of overlapping DEGs and DAP closest genes for upregulated DEGs/more-accessible DAPs (red/yellow), and downregulated DEGS/less-accessible DAPs (blue/violet) for selected clusters. E Diagrams of cis-coaccessibility networks overlapping PD GWAS-SNV regions. Vertical lines at the top of each diagram indicate chromatin peaks and Bezier curves indicate coaccessibility of distal chromatin peaks to peaks overlapping the promoters or intron 1 regions of DEGs. DAPs with greater accessibility in PD are shown in salmon while DAPs with reduced accessibility in PD are shown in blue. Non-differentially-accessible peaks are shown in grey. Below the peak linkage plots, gene exons overlapping CCAN regions are depicted as arrows indicating the directionality of transcription. DEGs are labeled with gene names, and names of genes with distal peak coaccessibility in promoter or intron 1 regions are labeled in bold text. PD GWAS-DEGs are labeled with underlined text. PD-upregulated DEGs are shown in salmon, downregulated DEGs are shown in blue, and non-differentially expressed genes are shown in grey. At the bottom of each diagram, a Manhattan plot is shown indicating -log10 of p-values for PD-association of chromosome loci within CCAN region as reported by Nalls et al. [5]. Dotted lines in Manhattan plots indicate statistical significance threshold (p = 0.05). GWAS-identified PD-associated SNVs are indicated via red dots and labeled on Manhattan plots

The snRNA-seq and snATAC-seq assays were performed in parallel by splitting and simultaneous processing of one nuclei preparation for each donor sample to generate corresponding gene expression and chromatin accessibility libraries, respectively, for each study subject. This parallel processing minimized technical variables and variability in cell type composition between the assays, thereby facilitating downstream integrative multi-omics analyses to better understand the relationships between DEGs and DAPs. More open DAPs were identified within coding and promoter regions of three fPD genes: SNCA, LRRK2, and PRKN, in several cell subtypes (Fig. 4C). Overall, correlation between DEGs and DAPs, annotated by the closest gene, was fairly low, with relatively few DEGs having corresponding DAPs with the same directionality (upregulated/more accessible or downregulated/less accessible) within the same cell subtype (Fig. 4D). Notably, the Exc5 cluster had the highest number of more accessible DAPs overlapping with upregulated DEGs (n = 678), followed by Inh2 and Exc3 (n = 98 and 94, respectively). These results suggest that in many cases open chromatin sites may not affect the closest gene, but may instead be linked to more distal genes.

Mechanisms of gene dysregulation in PD

Changes in chromatin accessibility of candidate cis-regulatory elements (cCREs)

To evaluate the effect of PD-dependent changes in chromatin accessibility on gene dysregulation in PD, we first employed Cicero [56] to identify coaccessible peaks and construct cis-coaccessibility networks (CCANs) using the snATAC-seq data of the PD nuclei. We applied a threshold coaccessibility score of ≥ 0.2 [56, 57] between each peak and at least one other peak in the network and revealed a total of 29,968 CCANs across all 22 clusters (Table S6). Next, within each CCAN we catalogued distal peaks that were coaccessible to proximal peaks overlapping the promoters or intron 1 of genes and the resulting peak pairs defined cCREs and their linked genes. The downstream analysis was confined to a subset of CCANs that encompassed both DAPs and cCREs linked to DEGs with the same directionality (unidirectional CCANs). The analysis identified 115 unidirectional CCANs (Table S6), including 2 CCANs overlapping PD GWAS-DEGs (Fig. 4E). Examples of cCRE-linked DEGs within unidirectional CCANs included RTN4 in Exc5, encoding an inhibitor of neurite outgrowth [58], GABARAP in Inh3, encoding an autophagy-associated GABA receptor [59], and the proteasome-associated UCHL1 in Oligo2, involved in SNCA degradation [60] (Fig. 4E).

Changes in transcription factor (TF) abundance

Additionally, gene dysregulation in PD may result from altered TF expression. Thus, we first identified a total of 142 cCREs linked to 104 unique PD GWAS-DEGs (out of 628 total PD GWAS-DEGs examined) across 18 clusters (Fig. 5; Table S7). We then predicted TF binding sites within the promoter/intron 1 of these PD GWAS-DEGs and their linked cCREs in each cell subtype for TFs expressed in ≥ 10% of cells using HOMER [61]. The analysis revealed enrichment of TF binding sites (TFBS) within 125 cCREs linked to a total of 90 unique PD GWAS-DEGs across 17 out of the 18 analyzed cell subtypes (fold-enrichment cutoff ≥ 1.2 and FDR < 0.05; Fig. 5, Table S8). Notably, 93 of the cCREs with TFBS enrichment showed at least one TF that was also encoded by a PD-associated DEG (hereafter TF-DEG). Notably, Exc5 showed the highest number of cCRE-linked upregulated PD GWAS-DEGs, and the strongest enrichment for TF-DEGs (Fig. 5A, Table S8). For downregulated PD GWAS-DEGs, the strongest enrichment for TF-DEGs was found in Micro1 and OPC1 (Fig. 5Biv-vi, Table S8). Enrichment for binding sites of TF-DEGs in cCREs linked to upregulated and downregulated PD GWAS-DEGs was also identified in other cell subtype clusters (Fig. 5Bi-iii, Table S8).

Fig. 5
figure 5figure 5figure 5

TFBS motif enrichment within regulatory regions and candidate cis-regulatory elements (cCREs) of PD GWAS DEGs. A, B Diagrams of candidate cis-regulatory elements (cCREs) of example PD GWAS-DEGs and associated enriched transcription factor binding motif sites (TFBS). Vertical lines at the top of each diagram indicate chromatin peaks and Bezier curves indicate coaccessibility of distal chromatin peaks to peaks overlapping the promoters or intron 1 regions of PD GWAS-DEGs. Non-differentially-accessible peaks are shown in grey. Below the peak linkage plots, gene exons overlapping cCRE regions are depicted as arrows indicating the directionality of transcription. DEGs are labeled with gene names, and names of genes with distal peak coaccessibility are labeled in enlarged text. PD-upregulated DEGs are shown in salmon, downregulated DEGs are shown in blue, and non-differentially expressed genes are shown in grey. At the bottom of each diagram, a Manhattan plot is shown indicating −log10 of p values for PD-association of chromosome loci within cCRE region as reported by Nalls et al. [5]. Dotted lines in Manhattan plots indicate statistical significance threshold (p = 0.05). GWAS-identified PD-associated SNPs are indicated via red dots and labeled on Manhattan plots. Boxes to the right of cCRE diagrams depict enlarged regions of chromatin peaks indicated by blue triangles and show the locations of enriched TFBS represented by vertical lines. Motifs for TF-DEGs upregulated in PD are shown in red, while those for downregulated TFs are shown in blue. Motifs for non-DEG TFs are shown in grey. Heights of TFBS bars indicate the fold-enrichment of binding motifs for the associated TF within the cCRE peaks compared to elsewhere in the genome. A cCREs of the Exc5 cluster. B cCREs of additional clusters

In order to identify TFs potentially acting as master regulators of PD-associated gene expression, we examined networks of TF-DEGs with TFBS enrichment in cCRE regulatory regions of target PD GWAS-DEGs for each cell subtype cluster (Fig. 6). Among upregulated PD GWAS-DEGs, the Exc5 cluster produced the most extensive network, comprising 63 TF-DEGs and 27 linked target DEGs (Fig. 6A). Among these TFs, several emerged as candidate regulators of larger numbers of PD GWAS-DEGs, with YY1 enriched in cCREs of the largest number of target DEGs (n = 13). Notably, ELK4 and KLF16 (9 and 6 linked-DEGs, respectively) were also themselves encoded by upregulated PD GWAS-DEGs in the Exc5 cluster. Among the targeted DEGs, postsynaptic scaffold protein-encoding MPP2 [62] was predicted to be regulated by the highest number of TF-DEGs (n = 27; Fig. 6A, Table S8).

Fig. 6
figure 6

Predicted transcription factor regulatory networks for PD GWAS-DEGs. A, B Diagrams of predicted regulatory networks of PD GWAS-DEGs and the TF-DEGs predicted to regulate them based on enrichment of TF binding motif sites (TFBS) within the promoters and intron 1 sequences and distal coaccessible peaks of the associated PD GWAS-DEGs, generated by Cytoscape software. Network diagrams are shown for each linked snRNA-seq/snATAC-seq cluster in which enriched TFBS were identified for TF-DEGs within cCRE region of GWAS DEGs. Red nodes on diagrams represent PD-upregulated GWAS DEGs, blue nodes represent PD-downregulated GWAS DEGs, green nodes represent TF-DEGs, and purple nodes represent PD GWAS-DEGs that are also TF-DEGs. Lines linking nodes indicate predicted regulatory relationships between TF-DEGs and PD GWAS-DEGs based on TFBS enrichment within cCRE regions. Node sizes are proportional to the number of linkages predicted (Number of PD GWAS-DEGs predicted to be regulated by a particular TF or number of TFs predicted to regulate a particular PD GWAS-DEG). Panel A depicts PD GWAS-DEGs and TF-DEGs upregulated in PD, while panel B depicts those downregulated in PD (directionality of regulation is always the same between TF-DEG and PD GWAS-DEG)

For the downregulated PD GWAS-DEGs, the largest TF-DEG/PD GWAS-DEG regulatory networks were identified in the Micro1 cluster, with 79 TF-DEGs predicted to regulate 18 PD GWAS-DEGs, and the OPC1 cluster, with 87 TFs predicted to regulate 17 PD GWAS-DEGs (Fig. 6B). However, compared to the Exc5 cluster network, there were fewer TFs in the Micro1 and OPC1 clusters predicted to regulate high numbers of GWAS-DEGs.

PD-correlated genetic variants altering TF binding affinity

Alterations in binding affinities of TFs to cCREs and regulatory regions may also contribute to dysregulation of gene expression in PD. To asses this mechanism we used MotifbreakR [63] to predict the effects of genomic variation [64] on TF binding affinity within sequences of cCREs and promoter/intron 1 regions linked to 41 selected PD GWAS-DEGs in corresponding cell subtypes. We identified a total of 1175 unique variants predicted to impact binding of a total of 599 TFs, including 1035 SNVs, 27 insertions, and 113 deletions (Table S9). TFs showing changes in the affinity of their binding sites also included TF-DEGs, with the highest number (n = 24) identified in the Exc5 cluster (Table S10). Some of these TF-DEGs (including YY1) were predicted to regulate multiple PD GWAS-DEGs (described above).

Next, we sought insight into the relationships between the identified regulatory variants and PD risk loci. We examined the linkage disequilibrium (LD) between each candidate regulatory variant and the nearest PD GWAS-SNV using both r2 and D′ metrics. Out of the 1175 candidate regulatory variants, 19 were in LD with a PD GWAS-SNV (r2 and D′ ≥ 0.5; Table S11). Several variants were predicted to impact expression of multiple PD GWAS-DEGs, particularly in the Exc5 cluster. A prominent example involved the PD GWAS-DEGs UBTF, ATXN7L3, MPP2, and DUSP3 (Fig. 7A, Table S11). Two of the TFs with predicted altered binding affinity in cCREs of these genes, IRF2 and FOXP1, also exhibited TFBS enrichment in cCREs of ATXN7L3 and MPP2, respectively. The minor allele of deletion variant rs5820527 was found to have perfect LD with the PD risk allele of PD GWAS-SNV rs2269906 (r2 = 1, D′ = 1), suggesting that disrupted binding of associated TFs (FOXD3, PAX4, and the TF-DEGs: FOXJ2, FOXJ3, and HDAC2) may contribute mechanistically to PD risk. The major allele of another deletion variant in this region (rs5820529) was in high LD with the same PD GWAS-SNV (r2 = 0.554, D′ = 0.87; Fig. 7A, Table S11). Noteworthily, TFs SP3 and KLF16 are predicted to have enhanced binding to this PD-risk allele and are encoded by upregulated DEGs in the Exc5 cluster. Furthermore, SP3 binding motifs are enriched within cCRE regions of both ATXN7L3 and MPP2, while KLF16 is enriched for binding in cCRE regions of MPP2. Both are also among the overall most enriched TFs within cCREs of PD GWAS-DEGs in Exc5, and KLF16 is itself also encoded by a PD GWAS-DEG (Fig. 6A). Altogether, these observations suggest that transcriptional upregulation of PD GWAS-DEGs such as ATXN7L3 and MPP2 by the TFs SP3 and KLF16 may increase PD risk. FOS1 and SP1 are also enriched for binding within MPP2 cCRE regions and were predicted to have enhanced binding to this PD risk-associated allele, as was ZNF219, an important regulator of SNCA expression [65].

Fig. 7
figure 7figure 7figure 7

Sequence variants predicted to impact binding affinity for TFs in regulatory elements of PD GWAS-DEGs. AE Diagrams of cCREs of example PD GWAS-DEGs and associated TFBS. PD-versus-normal fold change and associated FDR values for the relevant PD GWAS-DEGs are shown for each diagram. Vertical lines at the top of each diagram indicate chromatin peaks and Bezier curves indicate coaccessibility of distal chromatin peaks to peaks overlapping the promoters or intron 1 regions of PD GWAS-DEGs. Non-differentially-accessible peaks are shown in grey. Below the peak linkage plots, gene exons overlapping genomic regions are depicted as arrows indicating the directionality of transcription. DEGs are labeled with gene names, and names of genes with distal peak coaccessibility are labeled in enlarged text. PD-upregulated DEGs are shown in salmon, downregulated DEGs are shown in blue, and non-differentially expressed genes are shown in grey. At the bottom of each diagram, a Manhattan plot is shown indicating −log10 of p values for PD-association of chromosome loci within genomic region as reported by Nalls et al. [5]. Dotted lines in Manhattan plots indicate statistical significance threshold (p = 0.05). GWAS-identified PD-risk-associated SNVs are indicated via red dots and labeled on Manhattan plots. Boxes to the right of cCRE diagrams depict enlarged regions of chromatin peaks connected by blue lines and vertical lines show the locations of sequence variants predicted to impact binding affinity of TFs for associated TFBS. Variants in motifs for TF-DEGs upregulated in PD are shown in red, while those for downregulated TF-DEGs are shown in blue. Motif variants for non-DEG TFs are shown in grey. Sequence tracks to the right of sequence variant plots depict the specific sequence variations predicted to affect binding affinity of the indicated TF. The DNA strand containing the TFBS is shown in color, while the opposite strand is shown in greyscale. Binding motifs are indicated by bold lettering and shown in dark blue on the binding strand. Sequence variants are shown in enlarged letters within motif regions, with the major variant shown aligned with the chromosomal sequence and the minor variant placed externally. Variant nucleotides associated with greater predicted binding affinity are shown in green, while those associated with lower affinity are shown in red. Above the sequence tracks, gene names of TFs with predicted binding affinity changes from associated sequence variants are shown. Below these, the major and minor allele binding affinity scores, and the major and minor allele p values are indicated in black lettering. Upregulated TF-DEG names are shown in red, and downregulated TF-DEG names are shown in blue. The fold change and associated FDR values for the DEG TFs are shown to the right of the gene names. Non-DEG TF names are shown in grey. Asterisks next to TF names indicate TFBS enriched within the cCRE peaks indicated compared to elsewhere in the genome. Up arrows indicate that the TF is a known or predicted transcriptional activator; down arrows indicate a known or predicted transcriptional repressor; bi-directional arrows indicate both transcriptional activation and repression; absence of arrow indicates no information available. Below the sequence tracks, sequence variant IDs are indicated, and r2 and D′ linkage disequilibrium values of sequence variant loci with the most proximal PD GWAS-SNV are indicated in blue. Whether the major or minor allele of the variant correlates with the PD risk variant of the GWAS-SNV is also noted in parentheses

A second example of PD risk variants potentially impacting regulation of multiple Exc5 PD GWAS-DEGs was discovered on chromosome 17. cCREs of 5 PD GWAS-DEGs (EIF4A1, NLGN2, GABARAP, FXR2, and POLR2A) had predicted differential binding at 3 variant loci with high LD to PD GWAS-SNV rs12600861 (Fig. 7B, Table S11). SP1 binding motifs were enriched in cCREs of both EIF4A1 and GABARAP, and the binding affinity of this TF was predicted to be disrupted by the minor allele of regulatory SNV rs55894190 (r2 = 0.773, D′ = 0.988 with PD risk allele of PD GWAS-SNV rs12600861) within the EIF4A1 and NLGN2 cCREs. The minor allele of a second SNV in this region (rs78138308) was also in high LD with the same PD risk allele (r2 = 0.769, D′ = 0.988). Binding of TF-DEG TBP was predicted to be disrupted by this variant, as was that of ZNF40, enriched in cCREs of EIF4A1, GABARAP, FXR2, and POLR2A, and ZNF219.

On chromosome 16, two SNVs (rs12445650, rs58726213) showed strong LD with PD GWAS-SNV rs12600861 (r2 = 0.801, D′ = 0.901) within cCRE peaks linked to the PD GWAS-DEG RNF40 in Exc5 nuclei (Fig. 7C, Table S11). The risk allele of PD GWAS-SNV rs58726213l was in high LD with the major allele of regulatory SNV rs58726213, which was predicted to have a stronger binding affinity of cAMP response element-binding TFs ATF1 and CREB1, the latter of which was also encoded by an upregulated DEG in Exc5. These findings suggest the possible involvement of calcium-responsive pathways in to RNF40 regulation. Moreover, binding of the TF-DEG YY1 was predicted to be disrupted by the PD risk variant allele at this locus.

Sequence variants in high LD with PD risk alleles were also identified in other cell subtype clusters, including in cCREs of upregulated PD GWAS-DEGs in neuronal cluster Exc3 (Fig. 7D, Table S11), and downregulated PD GWAS-DEGs in microglial cluster Micro1 (Fig. 7E, Table S11), as well as numerous other subtype clusters (Table S11).

Discussion

In this study we analyzed snRNA-seq and snATAC-seq data in parallel from TC samples of the same donors to examine the transcriptomic and epigenomic landscapes associated with PD progression in cell subtypes (Fig. 1). Our study design incorporated innovative approaches with regard to both biology and analytical methodology in order to uncover novel genetic factors underlying PD. Biologically, the analysis of the TC, most strongly affected during later stages of the disease, provided new insights into the molecular changes leading to PD progression. Moreover, the TC regions of our study subjects were only mildly affected, thus allowing us to capture molecular changes occurring prior to neurodegenerative processes. Furthermore, while a recent study reported PD-associated transcriptomic and chromatin accessibility changes at a cell type-specific level [18], to the best of our knowledge our study is the first comprehensive multi-omics single cell analysis of PD at a granular cell subtype level of precision. Analytically, this is the first study that has examined several classes of variants, including short structural variants (SSVs) such as indels, in relation to mechanisms of transcriptional dysregulation in PD.

We characterized a specific subpopulation of vulnerable cortical glutamatergic neurons with strikingly altered gene expression in PD. The high number of DEGs identified for the Exc5 cluster was of particular interest given the important role of excitatory neurons in driving PD pathology [13,14,15]. Furthermore, the Exc5 subtype of glutamatergic neurons was the only neuronal cell subtype showing increased SNCA expression. SNCA protein is the main component of Lewy bodies and is the pathological hallmark of PD [66], and overexpression of SNCA has been implicated in PD pathogenesis (we reviewed previously [45]). SNCA mutations were also identified in rare families with fPD [4, 67]. Thus, upregulation of SNCA levels further supports the important role of Exc5 in PD. The signature of the Exc5 subtype was similar to Exc1 in marker gene expression yet the former population expressed > sevenfold more DEGs. The genetic markers defining Exc5 suggested that compared to other excitatory neuron subtypes Exc5 cells were characterized by dysregulation of several key neuronal functions including: (1) cell development and organization, guided in part by the Roundabout signaling pathway; (2) synaptic organization, prioritizing postsynaptic reception of excitatory neurotransmitters and downregulation of inhibitory receptors and genes involved in presynaptic signal transmission; and (3) regulation of intracellular calcium ion levels with comparative repression of downstream calcium response elements. Neuronal organization and axon guidance have been previously implicated in PD vulnerability [68] as increased axonal arborization has been associated with increased reactive oxygen species (ROS) stress due to mitochondrial hyperactivity, while semaphorin-mediated reduction of arborization led to decreased ROS production and increased neuronal resilience [69]. We suggest that specific organizational and synaptic structural features of Exc5 neurons may contribute to the disproportionate transcriptional dysregulation in PD observed for this cell type. Further investigations in PD model systems are needed to elucidate the specific biological functions of the neuronal subtype represented by Exc5, and to better understand its role in PD pathology.

Additionally, using our integrative single cell multi-omic approach, we examined several different regulatory mechanisms potentially governing transcriptional dysregulation in the context of PD pathogenesis and disease progression. First, we identified PD-associated changes in chromatin accessibility within cCREs coaccessible to the promoter/intron 1 regions of DEGs which may thereby have a mechanistic effect on dysregulation of these DEGs in the PD state. Next, we identified enrichment for binding motifs of differentially expressed TFs within DEG-linked cCREs, indicating that these TFs may act as master regulators of numerous genes driving PD neuropathological progression. These results suggested that changes in the expression levels of a relatively small number of TFs could mechanistically drive large-scale dysregulation of gene expression within key disease-relevant cell subtypes such as Exc5. Finally, we catalogued regulatory genetic variants, both SNVs and SSVs, that are predicted to change the affinities of TF binding sites within cCREs linked to PD GWAS-DEGs, and identified several regulatory SNVs and SSVs that were in strong LD with PD GWAS-SNVs. Our finding suggested that transcriptional regulatory variants underlie mechanistically, at least in part, the disease risk impact of PD GWAS loci.

Our analysis pointed at YY1, SP3, and KLF16 as candidate master regulators of gene expression in PD, especially within the Exc5 cell subtype. All of these TFs were also found to be differentially expressed in PD, and their binding affinities within cCREs were impacted by regulatory variants showing high LD with GWAS SNVs. YY1 can act as both a transcriptional activator and repressor, and is thought to play an important role in neuronal development and function [70, 71]. YY1 has not been previously associated with PD, but has been suggested to play a role in Alzheimer’s neurodegeneration by regulation of BACE1 and other proteins involved in amyloid-beta (Aβ) generation [70], and is also known to mediate DNA repair [72]. SP3 is involved in many developmental processes [73], as well as in inflammatory NF-κB signaling [74]. The gene encoding KLF16 was also mapped within a PD-GWAS region and this TF functions both in transcriptional activation and repression [75]. KLF16 has also been shown to inhibit neurite outgrowth [76], and to regulate dopaminergic synaptic transmission in the striatum [77]. Binding motifs for KLF16 were enriched within cCRE regions of the postsynaptic scaffold gene MPP2, and KLF16 binding in this region was predicted to be enhanced by an identified PD risk-associated sequence variant. Of note, MPP2 plays an essential role in synaptic ion channel function and glutamatergic neuron excitability [78], thus changes in its expression may result in disrupted postsynaptic structure and neurotransmission. Consistently, post-synapse-associated genes were among the most highly enriched markers for the Exc5 neuronal subtype.

Our study presents important advancements to the field of PD genetics. However, there are several limitations. First, while the TC region has previously been shown to be involved in later stages of PD [79,80,81,82,83,84], in this work, all PD donor samples were in stages preliminary to major disease progression to the cortical brain region, and the majority of samples thus exhibited little to no Lewy pathology within the TC, based on established metrics [85]. Furthermore we did not find any dopaminergic neuron clusters within our dataset, the subtype most highly impacted by neurodegeneration in early PD stages. Thus, our data reflect PD-associated transcriptional changes in intact TC tissue preceding major neurodegenerative effects in this region, providing insight into preliminary genetic mechanisms leading to disease progression and subsequent neurodegeneration and cognitive effects. As our interest focused on disease progression into tissues affected in later stages, however, the donor samples did show pronounced pathology within early-affected tissues such as the SN. Understanding the effects of the disease in later stages may aid in the development of treatment strategies to halt disease progression. Another limitation stems from the uncertainty inherent in linking cell subtype clusters derived from parallel datasets. We previously estimated the average accuracy of such cluster linking to be approximately 75.6%, and found that this level of certainty was sufficient to provide meaningful insight into the relationships between epigenetic and transcriptomic observations [57]. The recent development of simultaneous multi-omic single-nucleus sequencing methods [86] would overcome this limitation. Additionally, we employed covariate selection analysis and tailored our statistical models to control for identified confounding variables as well as for random effects associated with specific donor samples, but it is not possible to completely control for all possible confounding effects when dealing with complex biological systems. Moreover, while we provided an expanded investigation into the impact of genomic variation on gene expression in PD by including both SNVs and indels in our analysis, our methodology using MotifbreakR did not include the assessment of longer repeat structural variants. Finally, we reiterate the caveat that the mechanistic relationships described here between chromatin accessibility, sequence variation, and gene expression are predictive in nature and empirical evidence obtained through controlled experimentation will be required to validate these relationships and their roles in PD. Physical interactions between cCREs and putative target genes identified in silico may be validated using novel single-cell methods such as Promoter Capture Hi-C for 3D genomic mapping [87].

Here, we have provided mechanistic insights into the gene expression patterns associated with PD. Examining TC tissues with minimal neuropathological perturbations opened a window into transcriptional dysregulation occurring in early stages of PD progression to the cortical brain, prior to neurodegeneration in this region. Thus, while these changes in gene expression occur subsequently to disease onset in earlier-affected tissues, they likely represent molecular events specifically driving the progression of PD into the cortex. Collectively, we described PD-associated changes in the cross talk between cCREs, TFs, genetic variants, and gene expression. Furthermore, here for the first time, we identified candidate PD regulatory variants that are in high LD with PD GWAS-SNVs and hence are candidate PD causal variants. Follow up experimental validation in PD model systems is warranted to determine the effects of these candidate regulatory variants on TF binding and gene expression and to confirm their relevance to PD. Together, these findings provide new insight into the underlying mechanisms of transcriptional dysregulation in PD at a cell subtype-specific level and further support the role of gene dysregulation in PD risk and progression.

Methods

Human post-mortem brain tissue samples

Frozen human TC tissue samples from donors clinically diagnosed with PD (n = 12) were obtained from the Banner Sun Health Research Institute (BSHRI) [88], and neurologically healthy control donor samples (Normal) (n = 12) were obtained from the Kathleen Price Bryan Brain Bank (KPBBB) at Duke University. Unaffected controls were derived from donors with no clinical history of neurological disorder and samples had no neuropathological evidence of neurodegenerative diseases. Donor patient PD diagnoses were defined by the presence of two of the three cardinal clinical signs of resting tremor, muscular rigidity and bradykinesia. Additionally, diagnoses of all 12 PD samples were confirmed in autopsy by observation of pigmented neuron loss and the presence of Lewy bodies in the SN. Neuropathological states of PD samples were confirmed postmortem using established clinical practice recommendations for McKeith scoring [85] and staging via the Unified Staging System for Lewy Body Disorders (USSLB) [89]. All PD samples for which information was available had McKeith scores ranging from moderate to severe (2–4) in both the amygdala and SN. Where available, TC McKeith scores for most of the PD samples were either 0–1, with one sample each receiving scores of 2 and 3, indicating mild or absent PD pathology in this region for the majority of samples. USSLB stages of PD samples ranged from II–IV. PD samples 96-36 and 96-49 were lacking specific USSLB stage determination due to harvesting prior to BSHRI standardization of stage determination protocol. All tissue donors were of European ancestry. The demographics and other metadata for this cohort are detailed in Tables 1 and S1. Complete pathology analysis results of PD samples are provided in Table S12. The project was approved for exemption by the Duke University Health System Institutional Review Board. The methods described were conducted in accordance with the relevant guidelines and regulations.

Cohort statistics

For comparisons of demographic variables, R statistical programming language was used. Age and post-mortem interval (PMI) of female PD was compared to female Normal, and age and PMI of male PD was compared to male Normal (Table S13). The Shapiro–Wilk test was used for normality, Bartlett’s test for equal variance of normally distributed data, and Levene’s test for equal variance of non-normally distributed data. If compared groups were normally distributed but had unequal variance, two sample Welch’s t-tests were used to determine differences between group means. If groups were not normal but had equal variances, a Mann–Whitney’s U test was run.

Nuclei isolation from post-mortem human brain tissue

The nuclei isolation procedure has been described [57], and was based on previous studies [90, 91] and optimized for single-cell experiments. 100–200 mg of human TC brain tissue samples were thawed in Lysis Buffer (0.32 M Sucrose, 5 mM CaCl2, 3 mM Magnesium Acetate, 0.1 mM EDTA, 10 mM Tris–HCl pH 8, 1 mM DTT, 0.1% Triton X-100) and homogenized with a 7 ml dounce tissue homogenizer (Corning) and filtered through a 100 μm cell strainer, transferred to a 14 × 89 mm polypropylene ultracentrifuge tube, and underlain with sucrose solution (1.8 M Sucrose, 3 mM Magnesium Acetate, 1 mM DTT, 10 mM Tris–HCl, pH 8). Nuclei were separated by ultracentrifugation for 15 min at 4 °C at 107,000 RCF. Supernatant was aspirated, and nuclei were washed with 1 ml Nuclei Wash Buffer (10 mM Tris–HCl pH 8, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 1% BSA, 0.2 U/μL RNase Inhibitor). 800 μL resuspended nuclei were transferred to a microcentrifuge tube designated for the 10X Genomics single-cell ATAC assay while 200 μL was transferred to a microcentrifuge tube designated for the 10X Genomics single-cell gene expression assay. Nuclei were centrifuged at 300 RCF for 5 min at 4 °C, and supernatant was aspirated. For ATAC, nuclei were resuspended in Diluted Nuclei Buffer (10X Genomics). For the gene expression assay, nuclei were resuspended in Wash and Resuspension Buffer (1X PBS, 1% BSA, 0.2 U/μL RNase Inhibitor). Nuclei were then filtered through a 35 μm strainer. Nuclei concentrations determined using a Countess™ II Automated Cell Counter (ThermoFisher) and nuclei quality was assessed at 10X and 40X magnification using an Evos XL Core Cell Imager (ThermoFisher).

Parallel snATAC-seq/snRNA-seq library preparation and sequencing

snATAC-seq libraries were constructed as previously [57] using the Chromium Next GEM Single Cell ATAC Library and Gel Bead v1.1 kit, Chip H Single Cell kit, and Single Index Kit N Set A (10X Genomics) according to manufacturer’s instructions. In parallel, from the same pool of nuclei from each sample, snRNA-seq libraries were constructed using the Chromium Next GEM Single Cell 3’ GEM, Library, and Gel Bead v3.1 kit, Chip G Single Cell kit, and i7 Multiplex kit (10X Genomics) according to manufacturer’s instructions. For each sample, 10,000 nuclei were targeted for both the ATAC and 3′ assays. Library quality control was performed on a Bioanalyzer (Agilent) with the High Sensitivity DNA Kit (Agilent) according to manufacturer’s instructions and the 10X Genomics protocols. Libraries were submitted to the Sequencing and Genomic Technologies Shared Resource at Duke University for quantification using the KAPA Library Quantification Kit for Illumina® Platforms and sequencing. Groups of four snRNA-seq libraries were pooled on a NovaSeq 6000 S1 50 bp PE full flow cell to target a sequencing depth of 400 million reads per sample (Read 1 = 28, i7 index = 8, and Read 2 = 91 cycles). Groups of four snATAC-seq libraries were pooled on a NovaSeq 6000 S1 100 bp PE full flow cell to target a sequencing depth of 400 million reads per sample (Read 1N = 50, i7 index = 8, i5 index = 16, and Read 2N = 50 cycles). Sequencing was performed blinded to age, sex, and diagnosis.

snRNA-seq data processing

Raw snRNA-seq sequencing data were converted to FastQ format, aligned to a GRCh38 pre-mRNA reference, filtered, and counted using CellRanger 4.0.0 (10X Genomics). Subsequent processing was done using Seurat 4.0.1 [92]. Filtered feature-barcode matrices were used to generate Seurat objects for the 24 samples. For QC filtering, nuclei below the 1st and above the 99th percentile for number of features were excluded. Nuclei above the 95th percentile for mitochondrial gene transcript proportion (or > 5% mitochondrial transcripts if 95th percentile mitochondrial transcript proportion was < 5%) were also excluded. Because experiments were conducted in nuclei rather than whole cells, mitochondrial genes were subsequently removed. The 24 Seurat objects were merged into one, and were iteratively normalized using SCTransform [93] with glmGamPoi, which alleviates bias from weakly-expressed genes [94]. Batch correction was performed using reference-based integration [22] on the 24 normalized datasets, which improves computational efficiency for integration.

Cell type and subtype cluster annotation

Cell type annotation was conducted using a label transfer method [22] and a previously annotated reference dataset from human M1. Batch-corrected data from both our dataset and the human M1 dataset were used for label transfer. Nuclei with maximum prediction scores of < 0.5 were excluded. Nuclei with a percent difference of < 20% between first and second highest cell type prediction scores were termed “hybrid” and excluded [95]. Endothelial cells and VLMCs were in low abundance and did not form distinct UMAP clusters and were thus excluded. Following PCA, dimensionality was examined using an Elbow plot and by calculating variance contribution of each PC. UMAP was then run using the first 30 PCs, and nuclei were clustered based on UMAP reduction at a resolution of 0.7. This resolution level for cluster delineation was selected after comparison of a range of values as it was determined to provide optimal distinction between populations of nuclei displaying unique gene expression profiles as evidenced by their separation from one another in UMAP space. Counts of predicted major cell types based on the label transfer were examined for each of the 36 clusters (Table S2), and clusters were manually annotated based on the majority cell type for each cluster (e.g., ‘Exc1’, ‘Exc2’, etc.). Most frequent cell types comprised over 90% of cells in each cluster with the exceptions of Astro4, Micro2, Micro3 and OPC2 (Table S2).

Human M1 reference data processing

To optimize label transfer, we re-processed previously published human primary motor cortex (M1) snRNA-seq data [23] to map it to GRCh38 Ensembl 80 as we did with our data [57]. FastQ files were obtained from the Neuroscience Multi-omic Data Archive (NeMO: https://nemoarchive.org/) and were aligned to the same GRCh38 pre-mRNA reference used for our data, filtered, and counted using CellRanger 4.0.0 (10X Genomics). Filtered feature-barcode matrices were used to generate separate Seurat objects for each sample, with nuclei absent from the annotated metadata excluded. Seurat objects were merged and iteratively normalized using SCTransform [93] with glmGamPoi. Batch correction was performed using reference-based integration [22] on the normalized datasets. The 127 transcriptomic cell types in this data were grouped into 8 broad cell types including astrocytes, endothelial cells, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, OPCs, and VLMCs.

Cell type proportion comparisons

For both snRNA-seq and snATAC-seq datasets, proportion comparisons of cell types and subtypes between PD and Normal groups were made using the MASC algorithm [28]. Proportions were calculated by counting nuclei of each cell type and subtype and dividing this number by the total number of nuclei for each sample. For snRNA-seq data, age, sex, PMI, number of nuclei after QC filtering, median genes per cell, and average library size were included as fixed effects for MASC and sample donor ID was included as a random effect. For snATAC-seq data, age, sex, PMI, and number of nuclei after QC filtering were included as fixed effects for MASC and sample donor ID was included as a random effect. Benjamini–Hochberg correction for multiple testing was applied to MASC-derived p values.

snATAC-seq data processing

As described previously [57], DNA fragments acquired from snATAC-seq were sequenced and converted to FastQ format, and then mapped to GENCODE’s human release 32 reference [96] and counted using CellRanger-ATAC 1.2.0 (10X Genomics). Remaining nuclei were screened using the following QC metrics: (I) Nucleosome signal: Ratio of mononucleosome fragments (147 to 294 bp) to nucleosome free fragments (< 147 bp). Nuclei having a nucleosome signal of > 4 were excluded [97]; (II) Transcription start site (TSS) enrichment: Ratio of aggregated, normalized read signal centered around a reference set of TSS’s compared to signal in the TSS flanking regions. Nuclei with TSS enrichment scores < 2 were excluded [97]; (III) Percent reads in peaks: Proportion of fragments per cell mapping to peak regions. Cells with < 15% of reads in peaks were excluded [97]; (IV) Total peak region fragments: Cells with < 1000 peak region fragments were excluded due to low sequencing depth. Additionally, cells in the upper 1% in each sample distribution were excluded as a precaution against multiplets [97]; (V) Blacklist ratio: Proportion of fragments mapping to sequences associated with technical artifacts. Cells with > 5% of fragments mapping to blacklisted regions were excluded [98].

Seurat 4.0.1 [22], Signac 1.3.0 [97], and Harmony 0.1.0 [99] R packages were used for subsequent processing. Latent semantic indexing (LSI) was used to create a low-rank approximation of the data [100]. The 24 datasets were term frequency-inverse document frequency (TF-IDF) normalized and aggregated to form a joint peak-by-cell count matrix. Singular value decomposition (SVD) was then performed on this dataset, followed by standardization of left singular vectors, representing LSI components. Correlation of each component with sequencing depth was then measured. The first dimension was removed from downstream analysis due to high sequencing depth correlation (rho = 0.7) [97]. Remaining LSI components were then adjusted with using the RunHarmony function to remove batch effects prior to data clustering. As in snRNA dimensionality reduction, we used dimensions 2 through 30 for clustering and subsequent analysis.

Clusters were constructed from the adjusted LSI embeddings of the integrated dataset using the FindNeighbors and FindClusters Seurat functions, with k-nearest neighbors = 20, and cluster resolution = 1. The data were then projected onto a 2D surface via UMAP to evaluate cluster resolution.

Cell type annotation of snATAC-seq nuclei

As described previously [57], cell type annotation of the snATAC cells was performed using the integrated snRNA dataset as a reference [22]. “Gene activity” matrices were constructed for each nucleus from snATAC samples by counting fragments mapped to gene promoters (2000 bp upstream to 200 bp downstream of TSS). These matrices were log-normalized following promoter region fragment quantifications. The FindTransferAnchors Seurat function was then employed to annotate the snATAC data using the snRNA data as reference, with reduction via canonical correlation analysis. This entails computation of cross correlation between snATAC and snRNA cell variable features. After L2 normalization, the left and right singular vectors from the SVD of this matrix are taken as the canonical correlation vectors. A mutual nearest neighbor approach is then used to find anchors between the datasets, representing biologically similar cell states across modalities. Major cell type prediction scores were calculated for each nucleus using the weighted combination of the k-nearest anchors. The cell type with the maximum prediction score was the predicted cell type identity. Nuclei with maximum prediction scores of < 0.5 were excluded. “Hybrid” nuclei were identified using the same metric as for snRNA-seq data above, and excluded.

Doublet/multiplet detection in snRNA-seq and snATAC-seq data

Multiplets comprising different cell types (heterotypic) were excluded from snRNA and snATAC data by considering the “hybrid score”, as described previously [57]. The hybrid score is calculated as (x1 − x2)/x1, where x1 is the highest and x2 is the second highest prediction score [95]. Heterotypic multiplets would be expected to exhibit competing cell type prediction scores due to the presence of transcriptomic/epigenomic profiles from multiple cell types. Multiplets composed of one cell type (homotypic) were identified based on the number of features per cell (snRNA) or the total number of fragments in peaks (snATAC). snRNA nuclei with feature counts > 99th percentile and snATAC nuclei with total fragments in peaks > 95th percentile were excluded. Methods for Removal of homotypic multiplets in this manner is expected to also aid in filtering of heterotypic multiplets.

Linking snATAC and snRNA datasets

The FindTransferAnchors and TransferData Seurat functions were used to link snATAC clusters to snRNA clusters, as described previously [57]. Anchors were used to transfer snRNA cluster information to snATAC nuclei as in cell type annotation. Each snATAC nucleus was given 36 prediction scores, corresponding to each snRNA cluster. The snRNA cluster prediction scores were summed across all nuclei within each snATAC cluster. Each snATAC cluster was assigned a closest matching snRNA cluster based on its maximum prediction score. To ensure concordance, the cell types of the linked snRNA clusters were compared to the original snATAC cluster cell types. To assess the accuracy of the cluster linking methodology, we previously used PBMC granulocyte multiome data, freely available on the 10X Genomics website, estimating the average accuracy of such cluster linking to be approximately 75.6%, and found that this level of certainty was sufficient to provide meaningful insight into the relationships between epigenetic and transcriptomic observations [57].

Peak calling

As previously [57], peak regions were predicted empirically within each sample separately using the MACS2 algorithm on each cluster [101]. Peaks were designated as regions having a p value of ≤ 10e-5. The Multi Sample Peak Calling (MSPC) software package was used [102] to combine peaks into a consensus set for each cluster, with Fisher’s method used to evaluate overlapping peaks across samples. Peaks occurring in at least 2 samples, with an FDR of ≤ 0.05 from Fisher’s combined probability test were used as the consensus set for downstream analysis.

Covariate selection for differential analyses

Prior to differential analysis, as previously [57] we estimated the impact of multiple technical variables as well as donor-level characteristics separately for the snRNA-seq (Table S1) and snATAC-seq (Table S5) experiments. Read counts were summed for all nuclei in each donor sample, resulting in only one expression/accessibility value per sample per gene/peak, as all nuclei from a particular donor would have identical donor characteristics. Genes with no expression or peaks with zero accessibility for > 20% of samples were subsequently removed, and all values were mean-centered and scaled prior to covariate analysis. PCA was then performed for genes and peaks using prcomp in R. We then carried out linear regression using glm in R for PCs explaining > 10% of the variability in global expression or chromatin accessibility on both nuclei- and donor-specific metadata variables to identify factors that should be included as covariates in differential analyses. Specifically, we selected the variable most associated (surpassing Bonferroni correction for multiple testing, q < 0.05) with PC1 (or alternatively, the PC explaining the most variability) and regressed all genes or peaks on the associated variable to obtain gene or peak residuals that are adjusted for its effect. We then performed PC analysis on the gene or peak residuals, and in an iterative process, repeating the above steps until no additional metadata variables were associated with global expression or chromatin accessibility (q < 0.05). For snRNA-seq data, age, sex, PMI, number of nuclei after QC filtering, median genes per cell, and average library size were selected as covariates for differential expression gene analysis. For snATAC-seq data, age, sex, PMI, and number of nuclei after QC filtering were selected as covariates for differential peak accessibility analysis.

Differential expression analysis

In order to identify DEGs at both the cell type and subtype levels between PD and Normal samples within our snRNA-seq dataset, we employed the NEBULA algorithm [29]. Specifically, the NEBULA-HL procedure was used as this process is optimized for estimating both nucleus-level and donor-level data overdispersions [29, 103]. Prior to running NEBULA, for each cell type and cluster, genes expressed in less than 10% of cells in either group (PD or Normal) were filtered out. For snRNA-seq data, age, sex, PMI, number of nuclei after QC filtering, median genes per cell, and average library size were included as fixed effects for NEBULA and sample donor ID was included as a random effect. The reference level was set to ‘Normal’ such that the results for log2FC coefficients would be positive if up-regulated in PD and negative if down-regulated in PD. Benjamini–Hochberg correction for multiple testing was applied at the gene level to NEBULA-derived p values.

Differential accessibility analysis

In order to identify DAPs at both the cell type and subtype levels between PD and Normal samples within our snATAC-seq dataset, The NEBULA-HL algorithm was used as above for DEG identification. Prior to running NEBULA, for each cluster, peaks accessible in less than 10% of cells in either group (PD or Normal) were filtered out. For snATAC-seq data, age, sex, PMI, and number of nuclei after QC filtering were included as fixed effects for NEBULA and sample donor ID was included as a random effect. The reference level was set to ‘Normal’ such that the results for log2FC coefficients would be positive if more accessible in PD and negative if less accessible in PD. Benjamini–Hochberg correction for multiple testing was applied at the peak level to NEBULA-derived p values.

Cluster marker identification

To characterize the transcriptional profile of Exc5 cells, we used the FindMarkers function of the Seurat R package, employing a Wilcoxon Rank Sum test, to compare gene expression in Exc5 nuclei to nuclei of all other excitatory neuron clusters, thereby identifying marker genes defining this cluster. In order to focus on the most significant marker genes, we used a log2FC threshold of > 0.25 and a gene-level FDR-adjusted p-value threshold of < 0.01, and identified 432 positive (upregulated) marker genes, and 498 negative (downregulated) marker genes for the Exc5 cluster (Table S4).

Biological pathway enrichment analysis

In order to understand the biological significance of gene sets derived from differential expression analyses, we employed the Metascape [30] algorithm (metascape.org). The gene set of interest was input as the target gene list, and the total set of genes examined in the corresponding differential expression analysis was input as the background gene list. In order to group the top 20 enriched Metascape output GO terms into broader biological categories, functions of each individual gene (based on literature search) contributing to a particular category were qualitatively grouped into major functional categories. If a majority of the target genes associated with an enriched pathway GO term were associated with a particular functional category, this category was assigned to the GO term. A single GO term could be assigned to multiple functional categories. For analyses of shared pathways among clusters, Micro3 was excluded, as the majority cell type in this cluster represented below 50 percent of total nuclei in the cluster (Table S2).

Assessment of cCREs and CCANs

We used the R package Cicero [56] to characterize chromatin interactions in the data using as done previously [57]. The Cicero pipeline was conducted on a per cluster basis using PD cells only. Integrated LSI embeddings were passed to Cicero’s bootstrap aggregation process, wherein highly similar cells are aggregated by summing the raw counts in groups of 50k-nearest neighbors. Fragment sums are then normalized to account for within-group sequencing depth. A graphical LASSO is then used to estimate the partial correlation structure of each peak with neighboring peaks. A penalty term based on the genomic distance between peak pairs is used in GLASSO, and the resulting regularized correlations derived from the precision matrix are termed “coaccessibility scores.” The maximum peak-peak distance, at which regularized correlations are assigned 0, was defined as 500 Kb. A minimum coaccessibility score of 0.2 was specified before extracting CCANs from the resulting data via Louvain community detection. A coaccessibility score of ≥ 0.2 was previously shown to provide a reliable indication of meaningful peak pair accessibility correlation, and was thus used as our threshold for defining coaccessible peaks [56, 57]. Each peak within a CCAN was coaccessible with at least one other peak in the network. cCREs were defined as peaks within 500 kb of and with minimum coaccessibility scores of 0.2 with peaks overlapping the promoter (2 kb region upstream of start codon) or intron 1 regions of target genes.

Motif detection and enrichment analysis

To detect TFBS in the data, the genome was scanned for motifs within cCREs (including distal cCRE peaks and promoter/intron1 peaks) of DEGs of interest, using position weight matrices (PWMs) from JASPAR 2020 [104, 105]. A p value threshold of 5e-5 was used for motif matches. In the case of overlapping motif matches of the same TF, only the highest scoring match was used. We then used HOMER to detect motif enrichment in cCREs [61]. HOMER first quantifies the GC content and n-mer composition of both the background and target regions and applies weights to eliminate sequence bias before using a binomial test to compute enrichment p-values. Peaks within cCREs were used as target sequences, whereas all other cluster-specific peaks were used as background regions. For downstream analysis, we used only enriched motifs with FDR ≤ 0.05 and fold enrichment ≥ 1.2.

Definition of loci and SNVs for analysis

Loci for analysis of SNV impact on TF binding sites were defined by the cell-specific clusters for each cell type (microglia, oligodendrocyte precursor cells, inhibitory neurons) that were determined from the PD-GWAS data [5]. Identity of the cell-type and associated GWAS gene were retained for all subsequent analysis steps. SNVs located within the cCREs were catalogued from dbSNV version 151 with a global MAF ≥ 0.01. Genomic location, reference and alternate alleles were obtained from dbSNV and used for the TF analysis.

In silico prediction of TF binding sites affected by genetic variants in PD

Prediction of TF binding sites was completed for 1175 sequence variants, including, 1035 SNVs, 27 insertions, and 113 deletions. The software package/algorithm motifbreakR [63] (version 2.15) was used to estimate or predict whether the sequence surrounding a variant matches to specific TF binding sites, and how one allele of the variant relative to the other affects the strength of the TF binding site (gain or loss of the TF binding affinity). MotifbreakR can predict effects for novel or previously described variants in public databases. TF binding affinity scores were compared for all variants with at least 1% frequency within the database population using the 1000 Genomes database of human genetic variation [64]. For this study, we utilized the information content (ic) algorithm and position weight matrices from Homer, HOCOMOCO, Factorbook and ENCODE. Each variant from the catalogue we generated from the cell-specific PD-GWAS was evaluated for the potential to disrupt/gain TF binding sites using a predicted p value < 1 × 10–4. The variants were evaluated for impact on specific TF binding with calculation of a permutation p value, score for impact on binding and assessment of loss or gain of a binding site based on the motifbreakR calculations.

Linkage disequilibrium (LD) calculations

LD statistics (r2 and D′) between the PD GWAS-SNV and the specific variant that disrupts the transcription factor binding site using the LD matrix tool option from the LD Link software [106]. European panels were used to correspond to the ancestries covered in the PD GWAS. JMP software (SAS Institute, Cary, NC, version 17.0.0) was used to complete the annotation of the pairwise LD results for all of the PD GWAS SNVs and corresponding SNVs in the transcription factor binding sites.

Genome version and coordinates

All genomic data and coordinates are based on the December 2013 version of the genome: hg38, GRCh38.

Availability of data and materials

The snRNA-seq and snATAC-seq data are available at the Synapse data repository (https://synapse.org, ProjectSynID: syn60245188 and syn50996869). Access will be avaliable upon request under controlled use conditions. In addition, the snRNA-seq and snATAC-seq raw and normalized count data generated in this study are available at the Duke Research Data Repository (https://research.repository.duke.edu). All computer code used for this study has been deposited to GitHub (https://www.github.com) and will be made available upon request.

References

  1. Ou Z et al (2021) Global trends in the incidence, prevalence, and years lived with disability of Parkinson’s disease in 204 countries/territories from 1990 to 2019. Front Public Health 9:776847

    Article  PubMed  PubMed Central  Google Scholar 

  2. Group GNDC (2017) Global, regional, and national burden of neurological disorders during 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet Neurol 16:877–897

    Article  Google Scholar 

  3. Funayama M, Nishioka K, Li Y, Hattori N (2023) Molecular genetics of Parkinson’s disease: contributions and global trends. J Hum Genet 68:125–130

    Article  PubMed  Google Scholar 

  4. Lim SY, Klein C (2024) Parkinson’s disease is predominantly a genetic disease. J Parkinsons Dis 14:467–482

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Nalls MA et al (2019) Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol 18:1091–1102

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Kim JJ et al (2023) Multi-ancestry genome-wide association meta-analysis of Parkinson’s disease. Nat Genet 56:27–36

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Bustos BI et al (2022) Genome-wide contribution of common short-tandem repeats to Parkinson’s disease genetic risk. Brain 146:65–74

    Article  PubMed Central  Google Scholar 

  8. Billingsley KJ et al (2023) Genome-wide analysis of structural variants in Parkinson disease. Ann Neurol 93:1012–1022

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Afek A et al (2018) Toward deciphering the mechanistic role of variations in the Rep1 repeat site in the transcription regulation of SNCA gene. Neurogenetics 19:135–144

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Chiba-Falek O (2017) Structural variants in SNCA gene and the implication to synucleinopathies. Curr Opin Genet Dev 44:110–116

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Glenn OC, Tagliafierro L, Beach TG, Woltjer RL, Chiba-Falek O (2017) Interpreting gene expression effects of disease-associated variants: a lesson from SNCA rs356168. Front Genet 8:133

    Article  PubMed  PubMed Central  Google Scholar 

  12. Lutz MW et al (2015) A cytosine-thymine (CT)-rich haplotype in intron 4 of SNCA confers risk for Lewy body pathology in Alzheimer’s disease and affects SNCA expression. Alzheimers Dement 11:1133–1143

    Article  PubMed  Google Scholar 

  13. Agarwal D et al (2020) A single-cell atlas of the human substantia nigra reveals cell-specific pathways associated with neurological disorders. Nat Commun 11:4183

    Article  PubMed  PubMed Central  Google Scholar 

  14. Lake BB et al (2018) Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat Biotechnol 36:70–80

    Article  CAS  PubMed  Google Scholar 

  15. Kamath T et al (2022) Single-cell genomic profiling of human dopamine neurons identifies a population that selectively degenerates in Parkinson’s disease. Nat Neurosci 25:588–595

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Lecours C et al (2018) Microglial implication in Parkinson’s disease: loss of beneficial physiological roles or gain of inflammatory functions? Front Cell Neurosci 12:282

    Article  PubMed  PubMed Central  Google Scholar 

  17. Booth HDE, Hirst WD, Wade-Martins R (2017) The role of astrocyte dysfunction in Parkinson’s disease pathogenesis. Trends Neurosci 40:358–370

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Lee AJ et al (2023) Characterization of altered molecular mechanisms in Parkinson’s disease through cell type-resolved multiomics analyses. Sci Adv 9:eabo467

    Article  Google Scholar 

  19. Aarsland D, Kurz MW (2010) The epidemiology of dementia associated with Parkinson disease. J Neurol Sci 289:18–22

    Article  PubMed  Google Scholar 

  20. Williams-Gray CH et al (2013) The CamPaIGN study of Parkinson’s disease: 10-year outlook in an incident population-based cohort. J Neurol Neurosurg Psychiatry 84:1258–1264

    Article  PubMed  Google Scholar 

  21. Aarsland D et al (2021) Parkinson disease-associated cognitive impairment. Nat Rev Dis Primers 7:47

    Article  PubMed  Google Scholar 

  22. Stuart T et al (2019) Comprehensive integration of single-cell data. Cell 177:1888-1902 e21

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Bakken TE et al (2020) Evolution of cellular diversity in primary motor cortex of human, marmoset monkey, and mouse. bioRxiv 2020.03.31.016972

  24. Mathys H et al (2019) Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570:332–337

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Yang AC, Vest RT, Kern F (2022) A human brain vascular atlas reveals diverse mediators of Alzheimer’s risk. Nature 603:885–892

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Sun N et al (2023) Single-nucleus multiregion transcriptomic analysis of brain vasculature in Alzheimer’s disease. Nat Neurosci 26:970–982

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Johnson TS et al (2021) Combinatorial analyses reveal cellular composition changes have different impacts on transcriptomic changes of cell type specific genes in Alzheimer’s Disease. Sci Rep 11:353

    Article  PubMed  PubMed Central  Google Scholar 

  28. Fonseka CY et al (2018) Mixed-effects association of single cells identifies an expanded effector CD4(+) T cell subset in rheumatoid arthritis. Sci Transl Med 10:eaaq0305

    Article  PubMed  PubMed Central  Google Scholar 

  29. He L et al (2021) NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data. Commun Biol 4:629

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Zhou Y et al (2019) Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun 10:1523

    Article  PubMed  PubMed Central  Google Scholar 

  31. Flinkman D et al (2023) Regulators of proteostasis are translationally repressed in fibroblasts from patients with sporadic and LRRK2-G2019S Parkinson’s disease. npj Parkinson’s Dis 9:20

    Article  CAS  Google Scholar 

  32. Martin I (2016) Decoding Parkinson’s disease pathogenesis: the role of deregulated mRNA translation. J Parkinsons Dis 6:17–27

    Article  PubMed  PubMed Central  Google Scholar 

  33. D'Angiolini S, Lui M, Mazzon E, Calabrò M (2024) Network analysis performed on transcriptomes of Parkinson's disease patients reveals dysfunction in protein translation. Int J Mol Sci 25:1299

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Correddu D, Leung IKH (2019) Targeting mRNA translation in Parkinson’s disease. Drug Discov Today 24:1295–1303

    Article  CAS  PubMed  Google Scholar 

  35. Bi M, Du X, Jiao Q, Chen X, Jiang H (2021) Expanding the role of proteasome homeostasis in Parkinson’s disease: beyond protein breakdown. Cell Death Dis 12:154

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. McKinnon C et al (2020) Early-onset impairment of the ubiquitin-proteasome system in dopaminergic neurons caused by α-synuclein. Acta Neuropathol Commun 8:17

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Dantuma NP, Bott LC (2014) The ubiquitin-proteasome system in neurodegenerative diseases: precipitating factor, yet part of the solution. Front Mol Neurosci 7:70

    Article  PubMed  PubMed Central  Google Scholar 

  38. Sahoo S, Padhy AA, Kumari V, Mishra P (2022) Role of ubiquitin-proteasome and autophagy-lysosome pathways in α-synuclein aggregate clearance. Mol Neurobiol 59:5379–5407

    Article  CAS  PubMed  Google Scholar 

  39. Henrich MT, Oertel WH, Surmeier DJ, Geibl FF (2023) Mitochondrial dysfunction in Parkinson’s disease—a key disease hallmark with therapeutic potential. Mol Neurodegener 18:83

    Article  PubMed  PubMed Central  Google Scholar 

  40. Zampese E et al (2022) Ca(2+) channels couple spiking to mitochondrial metabolism in substantia nigra dopaminergic neurons. Sci Adv 8:eabp8701

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Angelova PR, Abramov AY (2018) Role of mitochondrial ROS in the brain: from physiology to neurodegeneration. FEBS Lett 592:692–702

    Article  CAS  PubMed  Google Scholar 

  42. Martinez TN, Greenamyre JT (2012) Toxin models of mitochondrial dysfunction in Parkinson’s disease. Antioxid Redox Signal 16:920–934

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. González-Rodríguez P et al (2021) Disruption of mitochondrial complex I induces progressive parkinsonism. Nature 599:650–656

    Article  PubMed  PubMed Central  Google Scholar 

  44. Wang ZX, Li YL, Pu JL, Zhang BR (2023) DNA damage-mediated neurotoxicity in Parkinson's disease. Int J Mol Sci 24:6313

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Tagliafierro L, Chiba-Falek O (2016) Up-regulation of SNCA gene expression: implications to synucleinopathies. Neurogenetics 17:145–157

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Tagliafierro L, Zamora ME, Chiba-Falek O (2019) Multiplication of the SNCA locus exacerbates neuronal nuclear aging. Hum Mol Genet 28:407–421

    Article  CAS  PubMed  Google Scholar 

  47. Hodge RD et al (2019) Conserved cell types with divergent features in human versus mouse cortex. Nature 573:61–68

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Wei JR et al (2022) Identification of visual cortex cell types and species differences using single-cell RNA sequencing. Nat Commun 13:6902

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Chen X et al (2023) Identification of HPCAL1 as a specific autophagy receptor involved in ferroptosis. Autophagy 19:54–74

    Article  CAS  PubMed  Google Scholar 

  50. Wang WT et al (2019) Activation of the lysosome-associated membrane protein LAMP5 by DOT1L serves as a bodyguard for MLL fusion oncoproteins to evade degradation in leukemia. Clin Cancer Res 25:2795–2808

    Article  CAS  PubMed  Google Scholar 

  51. Li N et al (2023) Targeting ANXA7/LAMP5-mTOR axis attenuates spinal cord injury by inhibiting neuronal apoptosis via enhancing autophagy in mice. Cell Death Discov 9:309

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Mills JD et al (2016) LINC00507 is specifically expressed in the primate cortex and has age-dependent expression patterns. J Mol Neurosci 59:431–439

    Article  CAS  PubMed  Google Scholar 

  53. Ypsilanti AR, Chedotal A (2014) Roundabout receptors. Adv Neurobiol 8:133–164

    Article  PubMed  Google Scholar 

  54. Russell SA, Bashaw GJ (2018) Axon guidance pathways and the control of gene expression. Dev Dyn 247:571–580

    Article  PubMed  PubMed Central  Google Scholar 

  55. Tao W, Díaz-Alonso J, Sheng N, Nicoll RA (2018) Postsynaptic δ1 glutamate receptor assembles and maintains hippocampal synapses via Cbln2 and neurexin. Proc Natl Acad Sci U S A 115:E5373-e5381

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Pliner HA et al (2018) Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol Cell 71:858-871.e8

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Gamache J et al (2023) Integrative single-nucleus multi-omics analysis prioritizes candidate cis and trans regulatory networks and their target genes in Alzheimer’s disease brains. Cell Biosci 13:185

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. GrandPré T, Nakamura F, Vartanian T, Strittmatter SM (2000) Identification of the Nogo inhibitor of axon regeneration as a Reticulon protein. Nature 403:439–444

    Article  PubMed  Google Scholar 

  59. Schaaf MB, Keulers TG, Vooijs MA, Rouschop KM (2016) LC3/GABARAP family proteins: autophagy-(un)related functions. Faseb J 30:3961–3978

    Article  CAS  PubMed  Google Scholar 

  60. Liu Y, Fallon L, Lashuel HA, Liu Z, Lansbury PT Jr (2002) The UCH-L1 gene encodes two opposing enzymatic activities that affect alpha-synuclein degradation and Parkinson’s disease susceptibility. Cell 111:209–218

    Article  CAS  PubMed  Google Scholar 

  61. Heinz S et al (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38:576–589

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Rademacher N, Schmerl B, Lardong JA, Wahl MC, Shoichet SA (2016) MPP2 is a postsynaptic MAGUK scaffold protein that links SynCAM1 cell adhesion molecules to core components of the postsynaptic density. Sci Rep 6:35283

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Coetzee SG, Coetzee GA, Hazelett DJ (2015) motifbreakR: an R/bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics 31:3847–3849

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Fairley S, Lowy-Gallego E, Perry E, Flicek P (2019) The international genome sample resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res 48:D941–D947

    Article  PubMed Central  Google Scholar 

  65. Clough RL, Dermentzaki G, Stefanis L (2009) Functional dissection of the alpha-synuclein promoter: transcriptional regulation by ZSCAN21 and ZNF219. J Neurochem 110:1479–1490

    Article  CAS  PubMed  Google Scholar 

  66. Zalon AJ, Quiriconi DJ, Pitcairn C, Mazzulli JR (2024) α-Synuclein: multiple pathogenic roles in trafficking and proteostasis pathways in Parkinson's disease. Neuroscientist 10738584241232963

  67. Polymeropoulos MH et al (1997) Mutation in the alpha-synuclein gene identified in families with Parkinson’s disease. Science 276:2045–2047

    Article  CAS  PubMed  Google Scholar 

  68. Wong YC et al (2019) Neuronal vulnerability in Parkinson disease: Should the focus be on axons and synaptic terminals? Mov Disord 34:1406–1422

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Pacelli C et al (2015) Elevated mitochondrial bioenergetics and axonal arborization size are key contributors to the vulnerability of dopamine neurons. Curr Biol 25:2349–2360

    Article  CAS  PubMed  Google Scholar 

  70. He Y, Casaccia-Bonnefil P (2008) The Yin and Yang of YY1 in the nervous system. J Neurochem 106:1493–1502

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Gabriele M et al (2017) YY1 haploinsufficiency causes an intellectual disability syndrome featuring transcriptional and chromatin dysfunction. Am J Hum Genet 100:907–925

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Wu S et al (2007) A YY1-INO80 complex regulates genomic stability through homologous recombination-based repair. Nat Struct Mol Biol 14:1165–1172

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. McCoy AM et al (2023) Sp3 is essential for normal lung morphogenesis and cell cycle progression during mouse embryonic development. Development 150:dev200839

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Carver BJ, Plosa EJ, Stinnett AM, Blackwell TS, Prince LS (2013) Interactions between NF-κB and SP3 connect inflammatory signaling with reduced FGF-10 expression. J Biol Chem 288:15318–15325

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Daftary GS et al (2012) Detailed structural-functional analysis of the Krüppel-like factor 16 (KLF16) transcription factor reveals novel mechanisms for silencing Sp/KLF sites involved in metabolism and endocrinology. J Biol Chem 287:7010–7025

    Article  CAS  PubMed  Google Scholar 

  76. Wang J et al (2016) Novel roles and mechanism for Krüppel-like factor 16 (KLF16) regulation of neurite outgrowth and Ephrin receptor A5 (EphA5) expression in retinal ganglion cells. J Biol Chem 291:18084–18095

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Suh Y et al (2019) Dopamine D1 receptor (D1R) expression is controlled by a transcriptional repressor complex containing DISC1. Mol Neurobiol 56:6725–6735

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Peng X et al (2024) MPP2 interacts with SK2 to rescue the excitability of glutamatergic neurons in the BLA and facilitate the extinction of conditioned fear in mice. CNS Neurosci Ther 30:e14362

    Article  CAS  PubMed  Google Scholar 

  79. Irwin DJ, Lee VM, Trojanowski JQ (2013) Parkinson’s disease dementia: convergence of α-synuclein, tau and amyloid-β pathologies. Nat Rev Neurosci 14:626–636

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Sasikumar S, Strafella AP (2020) Imaging mild cognitive impairment and dementia in Parkinson’s disease. Front Neurol 11:47

    Article  PubMed  PubMed Central  Google Scholar 

  81. Hall H et al (2014) Hippocampal Lewy pathology and cholinergic dysfunction are associated with dementia in Parkinson’s disease. Brain 137:2493–2508

    Article  PubMed  Google Scholar 

  82. Yu L et al (2019) Common age-related neuropathologies and yearly variability in cognition. Ann Clin Transl Neurol 6:2140–2149

    Article  PubMed  PubMed Central  Google Scholar 

  83. Smith C et al (2019) Neuropathology of dementia in patients with Parkinson’s disease: a systematic review of autopsy studies. J Neurol Neurosurg Psychiatry 90:1234–1243

    PubMed  Google Scholar 

  84. Compta Y et al (2011) Lewy- and Alzheimer-type pathologies in Parkinson’s disease dementia: which is more important? Brain 134:1493–1505

    Article  PubMed  Google Scholar 

  85. McKeith IG et al (2005) Diagnosis and management of dementia with Lewy bodies: third report of the DLB consortium. Neurology 65:1863–1872

    Article  CAS  PubMed  Google Scholar 

  86. Ma S et al (2020) Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183:1103-1116.e20

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Schoenfelder S, Javierre B-M, Furlan-Magaril M, Wingett SW, Fraser P (2018) Promoter capture Hi-C: high-resolution, genome-wide profiling of promoter interactions. JoVE e57320

  88. Beach TG et al (2015) Arizona study of aging and neurodegenerative disorders and brain and body donation program. Neuropathology 35:354–389

    Article  PubMed  PubMed Central  Google Scholar 

  89. Adler CH et al (2019) Unified staging system for lewy body disorders: clinicopathologic correlations and comparison to Braak staging. J Neuropathol Exp Neurol 78:891–899

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Jiang Y, Matevossian A, Huang HS, Straubhaar J, Akbarian S (2008) Isolation of neuronal chromatin from brain tissue. BMC Neurosci 9:42

    Article  PubMed  PubMed Central  Google Scholar 

  91. Marzluff WF (1990) Preparation of active nuclei. Methods Enzymol 181:30–36

    Article  CAS  PubMed  Google Scholar 

  92. Hao Y. et al (2020) Integrated analysis of multimodal single-cell data. bioRxiv 2020.10.12.335331

  93. Hafemeister C, Satija R (2019) Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20:296

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Lause J, Berens P, Kobak D (2020) Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. bioRxiv 2020.12.01.405886

  95. Grubman A et al (2019) A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation. Nat Neurosci 22:2087–2097

    Article  CAS  PubMed  Google Scholar 

  96. Frankish A et al (2019) GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 47:D766–D773

    Article  CAS  PubMed  Google Scholar 

  97. Stuart T, Srivastava A, Madad S, Lareau CA, Satija R (2021) Single-cell chromatin state analysis with Signac. Nat Methods 18:1333–1341

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Amemiya HM, Kundaje A, Boyle AP (2019) The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep 9:9354

    Article  PubMed  PubMed Central  Google Scholar 

  99. Korsunsky I et al (2019) Fast, sensitive and accurate integration of single-cell data with harmony. Nat Methods 16:1289–1296

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Cusanovich DA et al (2018) A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174:1309-1324.e18

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Zhang Y et al (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9:R137

    Article  PubMed  PubMed Central  Google Scholar 

  102. Jalili V, Matteucci M, Masseroli M, Morelli MJ (2015) Using combined evidence from replicates to evaluate ChIP-seq peaks. Bioinformatics (Oxford, England) 31:2761–2769

    CAS  PubMed  Google Scholar 

  103. Gagnon J et al (2022) Recommendations of scRNA-seq differential gene expression analysis based on comprehensive benchmarking. Life (Basel) 12:850

    CAS  PubMed  Google Scholar 

  104. Fornes O et al (2020) JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 48:D87–D92

    CAS  PubMed  Google Scholar 

  105. Schep A (2022) motifmatchr: fast motif matching in R. R package version 1.20.0

  106. Machiela MJ, Chanock SJ (2015) LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31:3555–3557

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

We thank the Kathleen Price Bryan Brain Bank at Duke University (funded by NIA AG028377) for providing us with the brain tissues, and the Duke Sequencing and Genomic Technologies Shared Resource for sequencing. This study used a high-performance computing facility partially supported by grant 2016-IDG-1013 (“HARDAC+: Reproducible HPC for Next-generation Genomics") from the North Carolina Biotechnology Center. This work is dedicated to Olivia Louise Shwab.

Funding

This work was funded in part by the National Institutes of Health/National Institute of Neurological Disorders & Stroke (NIH/NINDS) [RF1-NS113548-01A1 to OC-F]. We are grateful to the Banner Sun Health Research Institute Brain and Body Donation Program of Sun City, Arizona for the provision of human biological materials. The Brain and Body Donation Program has been supported by the National Institute of Neurological Disorders and Stroke (U24 NS072026 National Brain and Tissue Resource for Parkinson’s Disease and Related Disorders), the National Institute on Aging (P30 AG019610 and P30AG072980, Arizona Alzheimer’s Disease Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer’s Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001 to the Arizona Parkinson's Disease Consortium) and the Michael J. Fox Foundation for Parkinson’s Research.

Author information

Authors and Affiliations

Authors

Contributions

OC-F conceived of the presented idea. OC-F acquired brain samples and designed the study sample. GES and TGB provided PD brain tissue and associated data. DCG and JG performed tissue processing, nuclei extraction, and the 10X experiments for snATAC-seq/snRNA-seq library preps. EKS, MWL, and ZM performed all data analyses. MEG performed covariate analysis. OC-F planned and supervised the work. GEC, AAK, and MWL provided guidance. EKS, MWL, and OC-F discussed and interpreted the results. EKS, ZM, and MWL prepared figures and tables. EKS and OC-F wrote the manuscript. DCG, MWL and MEG contributed to writing of the methods section. MEG, GEC, AAK, MWL, GES, and TGB reviewed and edited the manuscript. All authors read and approved the final manuscript. OC-F obtained funding support.

Corresponding author

Correspondence to Ornit Chiba-Falek.

Ethics declarations

Ethics approval and consent to participate

The project was approved by the Duke Institutional Review Board (IRB). The study does not involve living human subjects. All samples were obtained from autopsies, and all are de-identified.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Shwab, E.K., Gingerich, D.C., Man, Z. et al. Single-nucleus multi-omics of Parkinson’s disease reveals a glutamatergic neuronal subtype susceptible to gene dysregulation via alteration of transcriptional networks. acta neuropathol commun 12, 111 (2024). https://doi.org/10.1186/s40478-024-01803-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40478-024-01803-1

Keywords