Listen to two hosts talking about this story:
The convergence of artificial intelligence and life sciences is unlocking new opportunities in how researchers interact with complex biological data. At BizGenius, we specialize in building intelligent tools for life science companies, such as Citation Widget, Deep Analysis and ChatGenius. One of our latest contributions is CompBioAgent, a cutting-edge platform designed to democratize single-cell RNA sequencing (scRNA-seq) data exploration—powered by large language models (LLMs).
This platform, developed in collaboration with leading researchers from Biogen, Stanford, and BioInfoRx, was recently described in a preprint titled “CompBioAgent: An LLM-powered agent for single-cell RNA-seq data exploration”. In this blog post, I want to share the story behind this project and demonstrate how our work exemplifies BizGenius’s deep capabilities in deploying AI solutions for the life sciences.
The Challenge: Making scRNA-seq Data Accessible to All
Single-cell RNA-seq has revolutionized our ability to analyze gene expression at the resolution of individual cells. But the tools for scRNA-seq analysis—Scanpy, Seurat, Cellxgene—often require specialized knowledge in bioinformatics and programming. This creates a significant barrier for experimental biologists and clinicians who need to extract insights from these datasets without becoming data scientists.
The question was: Can we build a natural language interface that allows any scientist to query and visualize scRNA-seq data, without writing a single line of code?
The Solution: Natural Language to JSON via LLM
Our key contribution to this project was designing and implementing the LLM-driven query parsing engine that powers CompBioAgent.
At the core of CompBioAgent is a system that listens to natural language questions—like:
“Show TYK2 gene expression in multiple sclerosis patients vs. healthy controls.”
—and automatically converts them into structured JSON commands that drive the underlying CellDepot and CellxgeneVIP APIs.
This involved:
- Prompt engineering: Crafting both simple and complex prompt templates tailored for OpenAI GPT, LLaMA, Claude, and other models.
- Few-shot learning: Supplying contextual examples to guide the model toward accurate JSON generation.
- Entity normalization: Automatically correcting and standardizing biological terms, such as diseases (“MS” → “Multiple Sclerosis”), genes (“CREB-1” → “CREB1”), and cell types (“Astro” → “astrocytes”).
- Error handling: Adding fallback logic and validations to correct LLM inconsistencies—such as placeholder gene names or incomplete metadata.
The result: a seamless pipeline where the user can focus on the biology, and the machine handles the data structuring, querying, and plotting in the background.
Instant Biological Insights, No Coding Required
CompBioAgent integrates with CellDepot—a curated warehouse of over 270 scRNA-seq projects—and Cellxgene VIP, a visualization powerhouse. It supports UMAP plots, violin plots, dot plots, heatmaps, and stacked bar plots, dynamically selected based on user queries.
Here are a few powerful use cases:
- Compare gene expression (e.g., TYK2) between MS patients and controls with a violin plot.
- Visualize multi-gene patterns across cell types with dot plots in Alzheimer’s disease.
- Explore immune signatures in cutaneous lupus via heatmaps of marker genes.
Every plot is generated interactively, informed by user intent, and translated by AI.
The Infrastructure: Deploying LLMs at Scale
CompBioAgent is deployed as a flexible web service supporting both GPU-based local inference (via Ollama + LLaMA) and commercial LLM APIs (OpenAI, Claude, DeepSeek, Groq). This dual-mode architecture respects data privacy for enterprise users, while enabling cost-effective inference where appropriate.
Our tech stack includes:
- Apache + MySQL for serving and storage
- Backend logic for LLM orchestration and error correction
- Modular integration for visualization engines
This hybrid LLM architecture reflects what we do at BizGenius—tailoring AI solutions to fit scientific needs, infrastructure constraints, and real-world applications.
Why This Matters for Life Science Companies
If you’re running a biotech, pharmaceutical company, or research tool platform, here’s why this matters:
- You don’t need to build full AI pipelines in-house. BizGenius can help you integrate LLMs into your product to enable smarter querying, recommendation engines, intelligent literature insights, or automated reporting.
- Natural language is the new interface. Scientists increasingly expect Google-like interfaces to interrogate data. We build those.
- AI-native tools accelerate discovery. CompBioAgent shortens the path from question to insight, empowering researchers to explore hypotheses faster and with greater precision.
About Us: BizGenius
At BizGenius, we specialize in AI-powered tools for life science companies. Whether it’s automating product citation tracking (and displaying), or powering smart AI custom support, we bring deep expertise at the intersection of biology, data, and language models.
If your company is looking to transform its data tools with LLMs or AI, let’s talk. The future of bioscience is intelligent—and we’re building it now.
📩 Contact us at https://www.biz-genius.com/contact/.
Read the full preprint:
Zhang et al. (2025). CompBioAgent: An LLM-powered agent for single-cell RNA-seq data exploration.