Englander Institute for Precision Medicine

Genomic sequence context differs between germline and somatic structural variants allowing for their differentiation in tumor samples without paired normals.

Title: Genomic sequence context differs between germline and somatic structural variants allowing for their differentiation in tumor samples without paired normals.
Publication Type: Journal Article
Year of Publication: 2023
Authors: Chukwu W, Lee S, Crane A, Zhang S, Mittra I, Imielinski M, Beroukhim R, Dubois F, Dalin S
Journal: bioRxiv
Date Published: 2023 Dec 04
Abstract

There is currently no method to distinguish between germline and somatic structural variants (SVs) in tumor samples that lack a matched normal sample. In this study, we analyzed several features of germline and somatic SVs from a cohort of 974 patients from The Cancer Genome Atlas (TCGA). We identified a total of 21 features that differed significantly between germline and somatic SVs. Several of the germline SV features were associated with each other, as were several of the somatic SV features. These associations also differed between the germline and somatic classes. For example, somatic inversions were more likely to be longer events than their germline counterparts. Using these features, we trained a support vector machine (SVM) classifier on 555,849 TCGA SVs to computationally distinguish germline from somatic SVs in the absence of a matched normal. This classifier had an area under the ROC curve (AUC) of 0.984 when tested on an independent test set of 277,925 TCGA SVs. In this dataset, we achieved a positive predictive value (PPV) of 0.81 for an SV called somatic by the classifier being truly somatic. We further tested the classifier on a separate set of 7,623 SVs from pediatric high-grade gliomas (pHGG). In this non-TCGA cohort, our classifier achieved a PPV of 0.828, showing robust performance across datasets.
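The two metrics reported in the abstract, PPV and ROC AUC, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made for illustration, with 1 standing for somatic and 0 for germline.

```python
from typing import Sequence


def positive_predictive_value(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """PPV = TP / (TP + FP), where 1 = somatic and 0 = germline."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        raise ValueError("no SVs were called somatic, so PPV is undefined")
    return tp / (tp + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (somatic, germline) pairs in which the
    somatic SV receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): true classes and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

# Assumed decision threshold of 0.5: scores at or above it are called somatic.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

ppv = positive_predictive_value(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"PPV = {ppv:.3f}, AUC = {auc:.3f}")
```

PPV depends on the chosen threshold and on the proportion of somatic SVs in the data, whereas AUC summarizes ranking quality across all thresholds, which is why a classifier can pair a high AUC with a more modest PPV.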

DOI: 10.1101/2023.10.09.561462
Alternate Journal: bioRxiv
PubMed ID: 38106141
PubMed Central ID: PMC10723258
Grant List: F32 CA261024 / CA / NCI NIH HHS / United States
