Single-cell genomics is now producing an ever-increasing amount of datasets that, when integrated, could provide large-scale reference atlases of tissue in health and disease. Such large-scale... Show moreSingle-cell genomics is now producing an ever-increasing amount of datasets that, when integrated, could provide large-scale reference atlases of tissue in health and disease. Such large-scale atlases increase the scale and generalizability of analyses and enable combining knowledge generated by individual studies. Specifically, individual studies often differ regarding cell annotation terminology and depth, with different groups specializing in different cell type compartments, often using distinct terminology. Understanding how these distinct sets of annotations are related and complement each other would mark a major step towards a consensus-based cell-type annotation reflecting the latest knowledge in the field. Whereas recent computational techniques, referred to as 'reference mapping' methods, facilitate the usage and expansion of existing reference atlases by mapping new datasets (i.e. queries) onto an atlas; a systematic approach towards harmonizing dataset-specific cell-type terminology and annotation depth is still lacking. Here, we present 'treeArches', a framework to automatically build and extend reference atlases while enriching them with an updatable hierarchy of cell-type annotations across different datasets. We demonstrate various use cases for treeArches, from automatically resolving relations between reference and query cell types to identifying unseen cell types absent in the reference, such as disease-associated cell states. We envision treeArches enabling data-driven construction of consensus atlas-level cell-type hierarchies and facilitating efficient usage of reference atlases. Show less
Multiple methods have recently been developed to reconstruct full-length B-cell receptors (BCRs) from single-cell RNA sequencing (scRNA-seq) data. This need emerged from the expansion of scRNA-seq... Show moreMultiple methods have recently been developed to reconstruct full-length B-cell receptors (BCRs) from single-cell RNA sequencing (scRNA-seq) data. This need emerged from the expansion of scRNA-seq techniques, the increasing interest in antibody-based drug development and the importance of BCR repertoire changes in cancer and autoimmune disease progression. However, a comprehensive assessment of performance-influencing factors such as the sequencing depth, read length or number of somatic hypermutations (SHMs) as well as guidance regarding the choice of methodology is still lacking. In this work, we evaluated the ability of six available methods to reconstruct full-length BCRs using one simulated and three experimental SMART-seq datasets. In addition, we validated that the BCRs assembled in silico recognize their intended targets when expressed as monoclonal antibodies. We observed that methods such as BALDR, BASIC and BRACER showed the best overall performance across the tested datasets and conditions, whereas only BASIC demonstrated acceptable results on very short read libraries. Furthermore, the de novo assembly-based methods BRACER and BALDR were the most accurate in reconstructing BCRs harboring different degrees of SHMs in the variable domain, while TRUST4, MiXCR and BASIC were the fastest. Finally, we propose guidelines to select the best method based on the given data characteristics. Show less
Schendel, R. van; Schimmel, J.; Tijsterman, M. 2022
With the emergence of CRISPR-mediated genome editing, there is an increasing desire for easy-to-use tools that can process and overview the spectra of outcomes. Here, we present Sequence... Show moreWith the emergence of CRISPR-mediated genome editing, there is an increasing desire for easy-to-use tools that can process and overview the spectra of outcomes. Here, we present Sequence Interrogation and Quantification (SIQ), a simple-to-use software tool that enables researchers to retrieve, data-mine and visualize complex sets of targeted sequencing data. SIQ can analyse Sanger sequences but specifically benefit the processing of short- and long-read next-generation sequencing data (e.g. Illumina and PacBio). SIQ facilitates their interpretation by establishing mutational profiles, with a focus on event classification such as deletions, single-nucleotide variations, (templated) insertions and tandem duplications. SIQ results can be directly analysed and visualized via SIQPlotteR, an interactive web tool that we made freely available. Using insightful tornado plot visualizations as outputs, we illustrate that SIQ readily identifies sequence- and repair pathway-specific mutational signatures in a variety of model systems, such as nematodes, plants and mammalian cell culture. Show less
Guebila, M. ben; Morgan, D.C.; Glass, K.; Kuijjer, M.L.; DeMeo, D.L.; Quackenbush, J. 2022
Gene regulatory network inference allows for the modeling of genome-scale regulatory processes that are altered during development, in disease, and in response to perturbations. Our group has... Show moreGene regulatory network inference allows for the modeling of genome-scale regulatory processes that are altered during development, in disease, and in response to perturbations. Our group has developed a collection of tools to model various regulatory processes, including transcriptional (PANDA, SPIDER) and post-transcriptional (PUMA) gene regulation, as well as gene regulation in individual samples (LIONESS). These methods work by postulating a network structure and then optimizing that structure to be consistent with multiple lines of biological evidence through repeated operations on data matrices. Although our methods are widely used, the corresponding computational complexity, and the associated costs and run times, do limit some applications. To improve the cost/time performance of these algorithms, we developed gpuZoo which implements GPU-accelerated calculations, dramatically improving the performance of these algorithms. The runtime of the gpuZoo implementation in MATLAB and Python is up to 61 times faster and 28 times less expensive than multi-core CPU implementation of the same methods. gpuZoo is available in MATLAB through the netZooM package https://github.com/netZoo/netZooM and in Python through the netZooPy package https://github.com/netZoo/netZooPy. Show less
Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We... Show moreSingle-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose to use binarized expression profiles to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available and simulated datasets, we show that a binarized representation of single-cell expression data accurately represents biological variation and reveals the relative abundance of transcripts more robustly than counts. Show less