Title | Genomic sequence context differs between germline and somatic structural variants allowing for their differentiation in tumor samples without paired normals. |
Publication Type | Journal Article |
Year of Publication | 2023 |
Authors | Chukwu W, Lee S, Crane A, Zhang S, Mittra I, Imielinski M, Beroukhim R, Dubois F, Dalin S |
Journal | bioRxiv |
Date Published | 2023 Dec 04 |
Abstract | There is currently no method to distinguish between germline and somatic structural variants (SVs) in tumor samples that lack a matched normal sample. In this study, we analyzed several features of germline and somatic SVs from a cohort of 974 patients from The Cancer Genome Atlas (TCGA). We identified a total of 21 features that differed significantly between germline and somatic SVs. Several of the germline SV features were associated with each other, as were several of the somatic SV features. We also found that these associations differed between the germline and somatic classes; for example, somatic inversions were more likely to be longer events than their germline counterparts. Using these features, we trained a support vector machine (SVM) classifier on 555,849 TCGA SVs to computationally distinguish germline from somatic SVs in the absence of a matched normal. This classifier achieved an area under the ROC curve (AUC) of 0.984 when tested on an independent test set of 277,925 TCGA SVs. On this test set, the classifier achieved a positive predictive value (PPV) of 0.81, that is, the proportion of SVs it called somatic that were truly somatic. We further tested the classifier on a separate set of 7,623 SVs from pediatric high-grade gliomas (pHGG). In this non-TCGA cohort, the classifier achieved a PPV of 0.828, showing robust performance across datasets. |
DOI | 10.1101/2023.10.09.561462 |
Alternate Journal | bioRxiv |
PubMed ID | 38106141 |
PubMed Central ID | PMC10723258 |
Grant List | F32 CA261024 / CA / NCI NIH HHS / United States |
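
The abstract reports positive predictive value (PPV) for SVs the classifier calls somatic. As a minimal sketch of how that metric is defined, the snippet below computes PPV = TP / (TP + FP) from paired truth and prediction labels. The function name `positive_predictive_value` and the toy labels are hypothetical and invented for illustration; they are not the authors' code or data.

```python
def positive_predictive_value(y_true, y_pred, positive="somatic"):
    """Return the fraction of calls predicted `positive` that are truly `positive`."""
    # True positives: called somatic and truly somatic.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positives: called somatic but actually germline.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        raise ValueError("no positive predictions; PPV is undefined")
    return tp / (tp + fp)


# Toy labels, invented for illustration only (not data from the study).
truth = ["somatic", "germline", "somatic", "germline", "somatic"]
calls = ["somatic", "somatic", "somatic", "germline", "germline"]
ppv = positive_predictive_value(truth, calls)
```

In this toy case three SVs are called somatic and two of them are truly somatic, so the PPV is 2/3. PPV depends on the germline-to-somatic ratio in the evaluated set as well as on the classifier, which is one reason it can differ between cohorts.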