Guides · 8 min read

Nextflow and Snakemake Pipelines Without DevOps

Run Nextflow and Snakemake bioinformatics pipelines in the cloud without configuring AWS, writing Groovy, or managing infrastructure. A practical guide for wet-lab researchers.

GeneChef Team · January 20, 2026

Tags: nextflow, snakemake, pipelines, guide

The Friday Afternoon You Didn't Plan For

It's 3 PM on a Friday. Your collaborator just sent you an nf-core/rnaseq pipeline and a Slack message: "Just run this on your FASTQ files, should take a couple hours." You click the GitHub link. There's a nextflow.config file full of executor settings, a params.yml referencing S3 buckets you don't have, and a README that starts with "Ensure Java 11+ and Nextflow 23.10+ are installed."

You're a molecular biologist. You ran the wet lab. You generated beautiful RNA-seq data. And now you're supposed to become a cloud engineer by Monday.

This is the moment where most researchers either (a) spend the next two weeks learning AWS, or (b) email the bioinformatics core and wait three weeks for a queue slot. Neither option is great.

There's a third path. But first, let's talk about what Nextflow and Snakemake actually do and why everyone keeps telling you to use them.

What Pipeline Managers Actually Do

Nextflow and Snakemake are pipeline managers — software that chains bioinformatics tools together and handles the tedious parts: running steps in parallel, retrying failed jobs, managing memory and CPU allocation, and tracking which outputs came from which inputs.

Think of them like a lab protocol, but for computation. Instead of "incubate at 37°C for 1 hour, then spin at 12,000g," it's "align reads with STAR, then count features with featureCounts, then run DESeq2." The pipeline manager makes sure each step gets the right inputs, runs on appropriate hardware, and picks up where it left off if something crashes.
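The bookkeeping a pipeline manager automates — running steps in dependency order and resuming where it left off — can be sketched in a few lines of plain Python. Step names mirror the RNA-seq example above; real managers add parallelism, retries, and container handling on top of this:

```python
# Toy sketch of what a pipeline manager automates: run steps in
# dependency order and skip anything already completed (resume).
# Step names are illustrative, mirroring the RNA-seq example.

PIPELINE = {
    "align_star": [],                  # no upstream dependencies
    "count_features": ["align_star"],  # needs alignments first
    "deseq2": ["count_features"],      # needs the count matrix
}

def run_order(steps):
    """Topologically order steps so each runs after its dependencies."""
    done, order = set(), []
    def visit(step):
        if step in done:
            return
        for dep in steps[step]:
            visit(dep)
        done.add(step)
        order.append(step)
    for step in steps:
        visit(step)
    return order

def resume(steps, completed):
    """Return only the steps still left to run after a crash."""
    return [s for s in run_order(steps) if s not in completed]

print(run_order(PIPELINE))               # full run, in dependency order
print(resume(PIPELINE, {"align_star"}))  # picks up where it left off
```

Nextflow and Snakemake do exactly this kind of dependency tracking, just with channels and rules instead of a dictionary, and with per-step resource requests layered on.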

Why Researchers Keep Hearing About Them

Three reasons keep coming up in every bioinformatics seminar:

  • Reproducibility. A Nextflow pipeline defines every tool version, every parameter, every container image. Someone in another lab can run the exact same analysis and get the same results.
  • Scalability. The same pipeline that processes 6 samples on your laptop can process 600 on a cloud cluster without rewriting anything.
  • Community pipelines. The nf-core project maintains over 100 peer-reviewed, production-grade pipelines. nf-core/rnaseq, nf-core/sarek for variant calling, nf-core/ampliseq for 16S — these are battle-tested by hundreds of labs.

The promise is real. The problem is getting there.

Why It's Hard: The Setup Nobody Talks About

Here's what "just run this pipeline" actually involves if you're starting from scratch:

  1. Install the runtime. Nextflow needs Java 11+. Snakemake needs Python 3.8+ and Conda or Mamba. Both need Docker or Singularity for containerized tools.
  2. Configure an executor. Running locally works for toy datasets. Real data needs a cluster (Slurm, PBS) or cloud compute (AWS Batch, Google Life Sciences). Each executor has its own configuration.
  3. Set up AWS Batch (if going cloud). This means creating a VPC, compute environments, job queues, IAM roles, S3 buckets, and a launch template. You'll need to understand Spot vs. On-Demand pricing, instance families, and ECS task definitions.
  4. Write or modify pipeline code. Nextflow uses Groovy-based DSL with concepts like processes, channels, and operators. Snakemake uses a Python-based DSL with rules, wildcards, and directed acyclic graphs. Both have learning curves measured in weeks, not hours.
  5. Debug cryptic errors. "No such variable: reads" in Nextflow. "MissingInputException" in Snakemake. "CannotPullContainerError" from AWS Batch. Each one sends you down a different rabbit hole.
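To make step 2 concrete, here is a hedged sketch of the kind of `nextflow.config` stanza an AWS Batch setup requires. Every value is a placeholder — the job queue, the S3 work bucket, and the CLI path are all things you would have to create and discover yourself:

```groovy
// Sketch of an AWS Batch executor config for Nextflow.
// All values are placeholders; each one maps to infrastructure
// (queues, buckets, IAM roles) that must already exist.
process.executor  = 'awsbatch'
process.queue     = 'my-batch-job-queue'      // AWS Batch job queue
workDir           = 's3://my-work-bucket/nf'  // intermediate files land here
aws.region        = 'us-east-1'
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'  // AWS CLI on the AMI
```

Five lines of config, but each line presumes hours of AWS console work behind it.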

This isn't biology. It's software engineering. And there's no shame in finding it overwhelming — it is overwhelming. Experienced bioinformaticians spend years building this expertise.

How GeneChef Handles Pipelines

GeneChef gives you two paths depending on where you're starting from.

Path 1: You Don't Have a Pipeline Yet

If you're starting from raw data and a research question, you don't need Nextflow or Snakemake at all. GeneChef's AI workflow builder lets you describe your analysis in plain language — "I have paired-end RNA-seq FASTQs from a mouse knockout experiment and I need differential expression analysis" — and it generates a Galaxy workflow with the right tools, parameters, and connections.

Galaxy has over 100 bioinformatics tools already installed. The AI assistant (powered by Claude) knows which tools to chain together, what reference genomes to use, and how to set parameters for your organism and data type. You review the workflow, upload your data, and click run.

No pipeline code. No Groovy. No Python DSL. No containers to manage.

Path 2: You Already Have a Nextflow or Snakemake Pipeline

This is the Friday afternoon scenario. A collaborator sent you an nf-core pipeline, or your lab has a custom Snakemake workflow that's been validated on your data type. You don't want to rebuild it in Galaxy — you want to run it.

GeneChef runs Nextflow and Snakemake pipelines on AWS Batch using Fargate Spot instances on ARM64 (Graviton) processors. Here's what that means in practice:

  • No AWS account needed. GeneChef manages the Batch compute environment, job queues, IAM roles, and networking.
  • No Spot instance management. Fargate Spot handles capacity automatically. If a Spot instance gets reclaimed, the job retries on a new one.
  • No container orchestration. Pipeline container images are pulled and run automatically.
  • Zero idle cost. You pay nothing when pipelines aren't running. Compute costs are approximately $0.05 per vCPU-hour on Spot pricing, included in your subscription.

You submit a pipeline run through the API (/api/pipelines/run), monitor status (/api/pipelines/[id]/status), and stream logs from CloudWatch (/api/pipelines/[id]/logs). Each job gets 4 vCPU and 8 GB RAM with a maximum duration of 24 hours.
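A sketch of what that API interaction might look like from Python. The endpoint paths come from the documentation above, but the base URL, auth header, and payload fields are assumptions for illustration, not the documented schema:

```python
import json
import urllib.request

BASE = "https://app.genechef.example"  # placeholder base URL
TOKEN = "YOUR_API_TOKEN"               # placeholder credential

def submit_run(pipeline, samplesheet_s3):
    """Build the POST request to submit a run (payload fields assumed)."""
    payload = {"pipeline": pipeline, "input": samplesheet_s3}
    return urllib.request.Request(
        f"{BASE}/api/pipelines/run",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def status_url(job_id):
    """Endpoint for polling a run's status."""
    return f"{BASE}/api/pipelines/{job_id}/status"

def logs_url(job_id):
    """Endpoint for streaming a run's CloudWatch logs."""
    return f"{BASE}/api/pipelines/{job_id}/logs"

req = submit_run("nf-core/rnaseq", "s3://my-bucket/samplesheet.csv")
# urllib.request.urlopen(req) would actually submit the run
print(req.full_url)
print(status_url("abc123"))
```

In practice most users will click through the web interface instead; the API exists for scripting repeated runs.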

A Realistic Walkthrough: Running nf-core/rnaseq

Let's walk through running nf-core/rnaseq on GeneChef from start to finish, with honest time estimates.

Step 1: Upload Your Data (~5-15 minutes)

Upload your FASTQ files through GeneChef's dataset management interface. Files go to S3 via presigned URLs. For a typical RNA-seq experiment with 12 samples (paired-end, ~5 GB per sample), expect 10-15 minutes on a decent connection.

Step 2: Configure the Pipeline (~5 minutes)

Specify the nf-core/rnaseq pipeline, your sample sheet, reference genome, and any parameter overrides. If you're using a standard organism (human, mouse, Arabidopsis), the defaults handle most of it.
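The sample sheet is usually a small CSV. For nf-core/rnaseq it looks roughly like this — file names are placeholders, and you should check the pipeline docs for the exact columns your version expects:

```csv
sample,fastq_1,fastq_2,strandedness
ko_rep1,ko_rep1_R1.fastq.gz,ko_rep1_R2.fastq.gz,auto
ko_rep2,ko_rep2_R1.fastq.gz,ko_rep2_R2.fastq.gz,auto
wt_rep1,wt_rep1_R1.fastq.gz,wt_rep1_R2.fastq.gz,auto
wt_rep2,wt_rep2_R1.fastq.gz,wt_rep2_R2.fastq.gz,auto
```

One row per sample, pointing at the FASTQ files you uploaded in step 1.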

Step 3: Submit and Monitor (~1-3 hours for typical datasets)

Submit the run via the GeneChef interface. The pipeline executes on AWS Batch — FASTQC, trimming, alignment (STAR or HISAT2), quantification (Salmon), and MultiQC report generation all run as separate jobs, parallelized across your samples.

You can stream logs in real time to watch progress or check back later. For 12 samples of standard RNA-seq, expect 1-3 hours depending on read depth and genome size.

Step 4: Retrieve Results (~2 minutes)

Outputs land in S3 with a 90-day lifecycle. Download your MultiQC report, count matrices, and alignment statistics. Feed the count matrix into GeneChef's Galaxy tools for downstream differential expression analysis if you want to stay on the platform.

Total hands-on time: roughly 15-25 minutes. The rest is compute time where you can go do something else — like actual lab work.

Cost Comparison

Running nf-core/rnaseq yourself on AWS typically costs $15-50 per run in compute, plus the hours (or days) of setup time configuring Batch, debugging IAM permissions, and figuring out why your Spot instances keep getting terminated. That setup time is the real cost — it's your time as a researcher, and it's not free.

On GeneChef, pipeline execution is included in your subscription. A Professional plan at $299/month covers 5 concurrent jobs. If you're running a handful of pipelines per month, the math works out quickly.
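As a back-of-envelope check on the Spot figure quoted earlier (4 vCPU per job at roughly $0.05 per vCPU-hour), the raw compute for a single job is small — the $15-50 DIY figure comes from larger instances, many parallel jobs, and the overhead of a self-managed setup:

```python
# Back-of-envelope compute cost for one GeneChef pipeline job,
# using the 4 vCPU / ~$0.05 per vCPU-hour figures quoted above.
# Illustrative only; actual Spot pricing varies.
VCPU_PER_JOB = 4
SPOT_RATE = 0.05  # dollars per vCPU-hour

def job_cost(hours):
    """Raw compute cost of a single 4-vCPU job running for `hours`."""
    return VCPU_PER_JOB * SPOT_RATE * hours

print(f"3-hour job: ${job_cost(3):.2f}")    # a few hours of one job
print(f"24-hour cap: ${job_cost(24):.2f}")  # cost at the max duration
```

A 12-sample run fans out across many such jobs, but the per-job compute stays in the cents-to-dollars range at Spot rates.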

What GeneChef Won't Do

Honesty matters more than a sales pitch, so here are the limitations:

  • Custom pipeline development still requires coding. GeneChef runs existing pipelines. If you need to write a new Nextflow pipeline from scratch, you still need to learn Nextflow. GeneChef doesn't have a visual pipeline editor for Nextflow/Snakemake (Galaxy workflows are a different story).
  • Large-scale production workloads may need dedicated infrastructure. Each job runs with 4 vCPU and 8 GB RAM, with a 24-hour maximum duration. If you're processing thousands of whole-genome samples or running pipelines that need 96 GB of RAM for a single step, you'll likely need a custom AWS Batch setup or an HPC cluster.
  • Not every nf-core pipeline has been tested. The nf-core ecosystem is large and evolving. Most popular pipelines (rnaseq, sarek, ampliseq, fetchngs, taxprofiler) work well. Newer or more niche pipelines may need parameter adjustments.
  • You're on a managed platform. You don't get root access to the compute environment. If you need to install custom system libraries or modify the execution environment, that's not available.

Who This Is Actually For

GeneChef's pipeline execution works best for:

  • Wet-lab researchers who generate sequencing data and need to run standard analysis pipelines without learning cloud infrastructure.
  • Small biotech teams (5-15 people) that don't have a dedicated bioinformatics engineer but need reproducible, scalable analysis.
  • Labs using nf-core pipelines that want to run community workflows without maintaining their own AWS environment.
  • PIs and postdocs who need results this week, not after a three-month infrastructure project.

If you're a bioinformatics engineer who enjoys tuning Nextflow configurations and optimizing Spot instance strategies, GeneChef probably isn't adding much for you. You've already solved this problem. This is for everyone else.

Getting Started

GeneChef offers a free trial so you can test pipeline execution with your own data before committing. Upload a small dataset, run an nf-core pipeline, and see if the workflow fits how your lab operates.

No credit card required. No AWS account required. No Java installation required.


GeneChef is a managed bioinformatics platform built on Galaxy that gives researchers AI-powered workflow building and cloud pipeline execution — so you can focus on the science, not the infrastructure.
