Molecular classification of a complex structural rearrangement of the RB1 locus in an infant with sporadic, isolated, intracranial, sellar region retinoblastoma

Retinoblastoma is a childhood cancer of the retina involving germline or somatic alterations of the RB Transcriptional Corepressor 1 gene, RB1. Rare cases of sellar-suprasellar region retinoblastoma without evidence of ocular or pineal tumors have been described. A nine-month-old male presented with a sellar-suprasellar region mass. Histopathology showed an embryonal tumor with focal Flexner-Wintersteiner-like rosettes and loss of retinoblastoma protein (RB1) expression by immunohistochemistry. DNA array-based methylation profiling confidently classified the tumor as pineoblastoma group A/intracranial retinoblastoma. The patient was subsequently enrolled on an institutional translational cancer research protocol and underwent comprehensive molecular profiling, including paired tumor/normal exome and genome sequencing and RNA-sequencing of the tumor. Additionally, Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing was performed from comparator normal and disease-involved tissue to resolve complex structural variations. RNA-sequencing revealed multiple fusions clustered within 13q14.1-q21.3, including a novel in-frame fusion of RB1-SIAH3 predicted to prematurely truncate the RB1 protein. SMRT sequencing revealed a complex structural rearrangement spanning 13q14.11-q31.3, including two somatic structural variants within intron 17 of RB1. These events corresponded to the RB1-SIAH3 fusion and a novel RB1 rearrangement expected to correlate with the complete absence of RB1 protein expression. Comprehensive molecular analysis, including DNA array-based methylation profiling and sequencing-based methodologies, were critical for classification and understanding the complex mechanism of RB1 inactivation in this diagnostically challenging tumor. Supplementary Information The online version contains supplementary material available at 10.1186/s40478-021-01164-z.

community-accepted guidelines for best practices (https://www.broadinstitute.org/gatk/guide/best-practices). Duplicate sequence reads were removed using samblaster-v.0.1.22, local realignment was performed on the aligned sequence data using the Genome Analysis Toolkit (v3.7-0), and Churchill's own deterministic implementation of base quality score recalibration was used. Germline variants were called using GATK's HaplotypeCaller. Enhanced exome sequencing (eES) average coverage depth was 283x and 287x for the tumor and normal sample, respectively. Genome sequencing average coverage depth was 119x and 41x for the tumor and normal sample, respectively. Somatic single nucleotide variation (SNV) and indel detection were performed using MuTect-2 [2]. Germline variation in cancer-associated genes and somatic variation across the coding region of the exome were analyzed [22]. Copy number alteration (CNA) was assessed using VarScan2 [10]. Genome sequencing data was analyzed for structural variation (SV) using LUMPY [11].

RNA-sequencing
In parallel, a 500 ng aliquot of snap frozen tumor derived RNA was subjected 0.8x SPRI bead cleanup and size selection prior to DNase treatment and ribodepletion prior to using Illumina's TruSeq Stranded Total RNA Sample preparation (performed with 8-minute chemical fragmentation). The library was constructed for whole transcriptome sequencing (RNA-seq).
Paired-end 151-bp reads were generated on the Illumina HiSeq 4000, and reads were aligned to the human genome reference sequence (GRCh38) with the resultant output representing 94,103,657 uniquely mapped reads. RNA-seq data were processed using an ensemble approach of six fusion callers (STAR-fusion, JAFFA, fusioncatcher, FusionMap, MapSplice, and TopHat-Fusion) with high-confidence fusions characterized by overlap of fusion identification between the multiple callers (called by at least 4 out of 6 callers) and rarity (<10% frequency) within our IGM Translational Research cancer cohort [3-5, 9, 13, 20, 21]. Transcript coverage was evaluated using GenVisR [19]. Transcripts per million (TPM) values were generated from paired-end RNA sequence data using Salmon with bootstrapping set to 100 [14].

Sanger sequencing
We used 500 ng of RNA with MultiScribe reverse transcriptase (ThermoFisher, Waltham, MA) and random hexamers (Applied Biosystems, Foster City, CA) for RT-PCR.

PacBio HiFi SMRTbell template prep and sequencing
Genomic DNA quantity and quality, from primary tumor and blood comparator assayed with Thermo Fisher Scientific Qubit 1x dsDNA HS Assay Kit and Agilent TapeStation 4200 Genomic ScreenTape. DNA shearing was unnecessary as the DNA mode size for the primary tumor was 23.9 kb and blood comparator mode size was 13.6 kb which were in the size range for

PacBio HiFi-based analysis of chromosome 13
SVs of >1 kb were identified from the HiFi Circular Consensus Sequencing (CCS) data using an ensemble-based approach. Briefly, we used five methods to call SVs from the data: . In all cases, reads were aligned to the GRCh38 patch release 13 (GRCh38.p13) using preset parameters for PacBio and/or CCS data.
Our ensemble approach collected filter passing variants from each call set and compared them for compatible structural variant types and coordinates, allowing for a 50 nucleotide (nt) margin to detect overlaps. Only SVs which were present in all five call sets, affected a region of 1000 nts or above, and had a minimum of 4 supporting reads were accepted into the final ensemble-based output. The algorithm Sniffles flags variants with read strand bias which were additionally filtered out during our approach.
We focused our analysis on chromosome 13, as other assays suggested the presence of and 26 inversion-associated rearrangements. One inversion was additionally detected in the comparator normal sample, but further analysis revealed it to be an artifact of incorrect alignment due to overlap with a long interspersed nuclear element (LINE). A filtering strategy was employed to eliminate systematic false positives and exclude variants that represent normal genetic variation in the human genome. SVs were compared to five normal human cell lines and 15 non-cancer samples from institutional patients. A frequency-based filtering was used to exclude recurrent variants found at >20% of this internal reference cohort. Additionally, SVs were filtered for normal human population variation through comparison against gnomAD SV v2.1 (<0.001 frequency) [7]. All of the SVs found in the comparator normal sample were considered normal genetic variation within the human genome.
Copy number estimations were based on median-normalized coverage data. Bedtools [16] (version 2.29.2) genomecov was used to generate alignment-based coverage, which was then grouped into 100 kb blocks. Median normalized coverage was calculated as (coverage/median(coverage)). The log2-based copy number was calculated as 2*log2([median-