Scanpy is a Python library for preprocessing, visualizing, and analyzing large-scale single-cell gene expression datasets. It is a toolkit for single-cell RNA sequencing (scRNA-seq) analysis, providing a framework for processing expression measurements from individual cells in order to identify marker genes and cell types.
The library includes a scalable data processing pipeline for cleaning and preparing genomic data, a clustering framework for grouping cells with similar expression profiles, and a system for modeling transitions between cell states to reconstruct biological development and differentiation processes. It also provides a suite of tools for generating graphical representations of high-dimensional cell populations and gene expression patterns.
Beyond these components, the toolkit supports differential gene expression testing, which identifies the marker genes that characterize each group of cells, along with a range of further preprocessing operations.