Englander Institute for Precision Medicine

cloudrnaSPAdes: Isoform assembly using bulk barcoded RNA sequencing data.

Title: cloudrnaSPAdes: Isoform assembly using bulk barcoded RNA sequencing data.
Publication Type: Journal Article
Year of Publication: 2023
Authors: Meleshko D, Prjbelski AD, Raiko M, Tomescu AI, Tilgner H, Hajirasouliha I
Date Published: 2023 Jul 27

MOTIVATION: Recent advancements in long-read RNA sequencing have enabled the examination of full-length isoforms, which short-read sequencing methods previously could not capture. A powerful alternative for studying isoforms is barcoded short-read RNA sequencing, in which a barcode indicates whether two short reads arise from the same molecule. Such techniques include the 10x Genomics linked-read based SParse Isoform Sequencing (SPIso-seq), as well as Loop-Seq and Tell-Seq. Some applications, such as novel-isoform discovery, require very high coverage. Obtaining high coverage using long reads can be difficult, making barcoded RNA-seq data a valuable alternative for this task. However, most annotation pipelines cannot work with a set of short reads in place of a single transcript, nor can they handle coverage gaps within a molecule. To overcome this challenge, we present an RNA-seq assembler that determines the expressed isoform per barcode.

RESULTS: In this paper, we present cloudrnaSPAdes, a tool for assembling full-length isoforms from barcoded RNA-seq linked-read data in a reference-free fashion. Evaluating it on simulated and real human data, we found that cloudrnaSPAdes accurately assembles isoforms, even for genes with high isoform diversity.

Alternate Journal: bioRxiv
PubMed ID: 37546844
PubMed Central ID: PMC10402000
Grant List: R35 GM138152 / GM / NIGMS NIH HHS / United States

