A New Computational Model to Understand Tumor Evolution
EIPM Member Iman Hajirasouliha and colleagues recently published a Methods paper, “PhISCS—A Combinatorial Approach for Sub-perfect Tumor Phylogeny Reconstruction via Integrative use of Single Cell and Bulk Sequencing,” in Genome Research that explored how recent technological advances in single cell sequencing provide high resolution data for the study of intra-tumor heterogeneity and tumor evolution.
To address technical limitations, Dr. Hajirasouliha and colleagues introduced, for the first time, a combinatorial formulation that integrates single cell sequencing data with matching bulk sequencing data. To define the optimal sub-perfect phylogeny, the formulation minimizes a linear combination of three quantities: potential false negatives among mutation calls, potential false positives among mutation calls, and the number of mutations that violate the infinite sites assumption.
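To make the shape of that objective concrete, here is a minimal sketch in Python. It is an illustration, not the PhISCS implementation. It assumes the single cell data are a binary matrix (rows are cells, columns are mutations), and the function name `weighted_cost`, the weights `w_fn`, `w_fp` and `w_elim`, and the set `eliminated` are hypothetical names introduced here. The bulk sequencing constraints and missing entries are left out for brevity.

```python
from typing import List, Set

Matrix = List[List[int]]  # assumed layout: rows = single cells, columns = mutations


def weighted_cost(
    observed: Matrix,
    corrected: Matrix,
    eliminated: Set[int],
    w_fn: float = 1.0,    # hypothetical weight per corrected false negative
    w_fp: float = 1.0,    # hypothetical weight per corrected false positive
    w_elim: float = 1.0,  # hypothetical weight per mutation set aside
) -> float:
    """Linear combination of false-negative flips, false-positive flips,
    and mutations set aside for violating the infinite sites assumption."""
    fn_flips = 0  # observed 0, corrected to 1
    fp_flips = 0  # observed 1, corrected to 0
    for obs_row, cor_row in zip(observed, corrected):
        for j, (o, c) in enumerate(zip(obs_row, cor_row)):
            if j in eliminated:
                continue  # set-aside mutations are charged once, via w_elim
            if o == 0 and c == 1:
                fn_flips += 1
            elif o == 1 and c == 0:
                fp_flips += 1
    return w_fn * fn_flips + w_fp * fp_flips + w_elim * len(eliminated)


# Toy example: one entry (cell 2, mutation 0) is treated as a false negative.
observed = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
corrected = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
cost = weighted_cost(observed, corrected, set(), w_fn=1.0, w_fp=2.0, w_elim=5.0)
```

An optimizer would search over candidate corrected matrices and sets of eliminated mutations for the lowest cost; this sketch only scores one candidate.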
Dr. Hajirasouliha explains some of the highlights of the new paper.

Dr. Iman Hajirasouliha
What’s the most unique aspect of your paper?
We have created a model that combines both single cell data and bulk sequencing data. Each data type has certain advantages and disadvantages, but when we combine these two we can understand and explain the evolution of tumors much better.
Our method is not only faster, but the model is also more general, because we allow more complex events that remove or delete mutations. Usually in these evolutionary scenarios, people rely on the infinite sites assumption: once a mutation occurs in a cell, it is assumed to persist in all of that cell’s descendant lineages. However, in some cases, because of deletions and loss of heterozygosity, we may not see that mutation again. Our model takes that into account by combining both data types.
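As a rough illustration of what the infinite sites assumption implies for data, the sketch below applies the standard three-gamete test to a binary cell-by-mutation matrix. Under the assumption, with an unmutated root, no pair of mutations should show all three patterns (1,0), (0,1) and (1,1) across cells. The function name `find_conflicts` and the matrix layout are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations
from typing import List, Tuple


def find_conflicts(matrix: List[List[int]]) -> List[Tuple[int, int]]:
    """Return pairs of mutation columns that fail the three-gamete test.

    Assumed layout: rows = cells, columns = mutations, 1 = mutation present.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    required = {(1, 0), (0, 1), (1, 1)}
    conflicts = []
    for a, b in combinations(range(n_mutations), 2):
        # Collect every (a, b) presence pattern seen in any cell.
        patterns = {(row[a], row[b]) for row in matrix}
        if required <= patterns:
            conflicts.append((a, b))
    return conflicts


# Mutations 0 and 1 each appear alone and together: a conflict.
conflicting = find_conflicts([[1, 0], [0, 1], [1, 1]])
# Mutation 1 only appears in cells that also carry mutation 0: no conflict.
consistent = find_conflicts([[1, 0], [1, 1], [0, 0]])
```

A conflict can come from sequencing noise or from a real violation such as mutation loss, which is why a model that allows such violations has to weigh both explanations.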
What questions does the paper try to answer?
When we first started this project, we were trying to answer a theoretical question. For the first time, we used optimization techniques in the context of tumor heterogeneity and cancer evolution. Playing with these new mathematical models and applying them to cancer was very interesting.
Does this paper advance your lab’s research goals?
To really advance my research, I need to develop collaborations using real data sets to validate these methods and apply the algorithms to infer novel insights into the biology of cancer.
I really believe we can more fully explain the biology of cancer using these models. We could use publicly available data sets and show that our methods work on those data sets, but down the road I want to collaborate with people who generate their own data and use it to learn more about the biology of cancer.
How will this research ultimately benefit patients?
That’s a great question. More fully understanding the biology of cancer can help medical professionals develop more personalized treatments for patients. Learning more about how mutations occur and how different cells evolve can help us develop these targeted therapies.
Our approach can provide a more complete picture, with much better resolution, to more fully understand the biology of cancer.
Any last thoughts about the paper?
As we demonstrated in the paper using different real data sets, it is important that people know we can apply these models to real cancer genes and that our approach isn’t just theoretical.
For the first time, we can apply combinatorial optimization techniques to this general model of cancer evolution. I would also like to thank my co-authors and collaborators on this project, in particular Dr. Cenk Sahinalp, who is now a senior investigator at NCI’s Center for Cancer Research. I conceived the original ideas and solutions for this model together with him when we both visited the Simons Institute for the Theory of Computing at UC Berkeley and the Computational Genomics Summer Institute at UCLA two years ago. A handful of talented and hardworking students from our groups and a visiting student from Milan, Simone Ciccolella, helped us implement and extensively test the model on simulated and real data sets. I am thankful to all.
# # #