Englander Institute for Precision Medicine

PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data.

Publication Type: Journal Article
Year of Publication: 2019
Authors: Malikic S, Mehrabadi FR, Ciccolella S, Rahman MK, Ricketts C, Haghshenas E, Seidman D, Hach F, Hajirasouliha I, Sahinalp SC
Journal: Genome Res
Volume: 29
Issue: 11
Pagination: 1860-1877
Date Published: 2019 Nov
ISSN: 1549-5469
Keywords: Computational Biology, High-Throughput Nucleotide Sequencing, Humans, Neoplasms, Phylogeny, Single-Cell Analysis
Abstract

Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, limitations of SCS technologies, including frequent allele dropout and variable sequence coverage, may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to loss of heterozygosity, deletions, and convergent evolution. To address these limitations, we introduce the optimal subperfect phylogeny problem, which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (whether real or caused by incorrect copy number estimation). We then describe a combinatorial formulation of this problem that ensures several lineage constraints imposed by variant allele frequencies (VAFs, derived from bulk sequencing data) are satisfied. We express this formulation both as an integer linear program (ILP) and, as a first in tumor phylogeny reconstruction, as a Boolean constraint satisfaction problem (CSP), and solve both by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA-violating mutations. In contrast to alternative methods, which are typically probabilistic, PhISCS guarantees the optimality of its reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
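The objective described in the abstract can be illustrated at toy scale. The following is a minimal brute-force sketch, not PhISCS itself, under these assumptions: the input is a binary cell-by-mutation matrix with no missing entries; the weights `w_fn`, `w_fp`, `w_elim` and the cap `max_elim` are hypothetical parameters standing in for the linear combination; a perfect phylogeny is tested with the standard three-gametes condition (no pair of mutations shows all of the patterns 10, 01, 11 across cells); and the bulk-VAF lineage constraints are omitted entirely. Exhaustive enumeration stands in for the ILP/CSP solvers and is only feasible for a handful of cells and mutations.

```python
from itertools import combinations, product


def is_conflict_free(matrix, columns):
    """Three-gametes test over the given columns: True if no column pair
    contains all of the row patterns (1,0), (0,1) and (1,1)."""
    for p, q in combinations(columns, 2):
        patterns = {(row[p], row[q]) for row in matrix}
        if {(1, 0), (0, 1), (1, 1)} <= patterns:
            return False
    return True


def flip_cost(observed, candidate, columns, w_fn, w_fp):
    """Weighted count of changed entries on the kept columns.
    0 -> 1 is treated as a corrected false negative (e.g. allele dropout),
    1 -> 0 as a corrected false positive (e.g. read error)."""
    cost = 0.0
    for obs_row, cand_row in zip(observed, candidate):
        for c in columns:
            if obs_row[c] == 0 and cand_row[c] == 1:
                cost += w_fn
            elif obs_row[c] == 1 and cand_row[c] == 0:
                cost += w_fp
    return cost


def brute_force_subperfect(observed, w_fn=1.0, w_fp=1.0, w_elim=1.0, max_elim=1):
    """Hypothetical exhaustive solver: try every set of at most `max_elim`
    eliminated (ISA-violating) mutations and every 0/1 assignment of the kept
    columns; return (cost, best_matrix, eliminated_columns)."""
    n_cells, n_muts = len(observed), len(observed[0])
    best = (float("inf"), None, None)
    for k in range(max_elim + 1):
        for eliminated in combinations(range(n_muts), k):
            kept = [c for c in range(n_muts) if c not in eliminated]
            for bits in product((0, 1), repeat=n_cells * len(kept)):
                # Eliminated columns keep their observed values and are ignored.
                candidate = [list(row) for row in observed]
                values = iter(bits)
                for r in range(n_cells):
                    for c in kept:
                        candidate[r][c] = next(values)
                if not is_conflict_free(candidate, kept):
                    continue
                cost = flip_cost(observed, candidate, kept, w_fn, w_fp) + w_elim * k
                if cost < best[0]:
                    best = (cost, candidate, eliminated)
    return best


if __name__ == "__main__":
    # Three cells, two mutations that violate the three-gametes condition.
    observed = [[1, 0], [0, 1], [1, 1]]
    cost, matrix, eliminated = brute_force_subperfect(
        observed, w_fn=1.0, w_fp=2.0, w_elim=5.0
    )
    print(cost, eliminated)  # prints "1.0 ()"
```

With these hypothetical weights, a single 0 -> 1 flip is the cheapest repair, so no mutation is eliminated; raising `w_fn` above `w_elim` would instead make removing one mutation the preferred explanation. The real method encodes the same trade-off as solver constraints rather than enumerating matrices.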

DOI: 10.1101/gr.234435.118
Alternate Journal: Genome Res
PubMed ID: 31628256
PubMed Central ID: PMC6836735
Grant List: T32 GM083937 / GM / NIGMS NIH HHS / United States
