MongoDB Aggregation Framework Tutorial: Pipeline Stages, Examples & Pro Tips


Aggregation Framework Explained: The Magic Data Factory

A Super Fun Conveyor Belt Adventure For Beginners to Expert Level


This article explains MongoDB's Aggregation Framework in a fun, beginner-friendly way. Learn the pipeline stages ($match, $group, $sort, $lookup, etc.), see real examples, use Compass Aggregation Builder, and discover pro tricks like $facet, $bucket, and performance tips.

Imagine you own a huge toy factory full of superhero action figures (your data). Every day, thousands of figures come off the line. But your customers don’t want raw toys; they want awesome reports:

  • “Top 10 strongest heroes”
  • “Average power level per team”
  • “Heroes with fire powers who are level 10+”

The Aggregation Framework is your magic conveyor belt that takes raw toys (documents) and turns them into perfect finished products (reports) step by step. Each step is called a stage. This tutorial turns the scary-sounding “aggregation pipeline” into a fun factory game that a Class 8 student can understand, while giving real pro techniques to experienced developers. We’ll use our Hero Academy data again!

Let’s turn on the conveyor belt!


Part 1: What is the Aggregation Pipeline?

db.collection.aggregate([ stage1, stage2, stage3, ... ])

It’s a pipeline → data flows from left to right through stages. Each stage transforms the data and passes it to the next.

Factory Example:
Raw plastic → $match (pick only red plastic) → $group (make groups of 10) → $sort (biggest first) → finished toys!
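
To make the conveyor-belt idea concrete, here is a minimal sketch in plain JavaScript (not MongoDB itself). It assumes each stage can be modeled as a function that takes an array of documents and returns a new array. The helper name runPipeline and the three toy stages are hypothetical, chosen only for illustration.

```javascript
// A toy model of a pipeline: each stage is a function from documents to documents.
// runPipeline is a hypothetical helper, not part of MongoDB.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const heroes = [
  { name: "Aarav", team: "Alpha", level: 85 },
  { name: "Priya", team: "Alpha", level: 92 },
  { name: "Rohan", team: "Beta", level: 78 },
];

const stages = [
  (docs) => docs.filter((d) => d.team === "Alpha"),      // like $match
  (docs) => [...docs].sort((a, b) => b.level - a.level), // like $sort: { level: -1 }
  (docs) => docs.slice(0, 1),                            // like $limit: 1
];

const result = runPipeline(heroes, stages);
console.log(result); // the single highest-level Alpha hero
```

The output of one stage is the input of the next, which is why the order of stages matters so much in a real pipeline.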

Visualizing the Aggregation Pipeline

Think of the pipeline like a factory conveyor belt. Each stage processes your data before passing it to the next stage.

[Figure: MongoDB aggregation pipeline flow diagram. Data flows from raw documents → stages → final aggregated results.]


Part 2: Sample Data (Start the Factory!)

use heroAcademy
db.heroes.insertMany([
  { name: "Aarav", team: "Alpha", power: "Speed", level: 85, city: "Mumbai" },
  { name: "Priya", team: "Alpha", power: "Invisible", level: 92, city: "Delhi" },
  { name: "Rohan", team: "Beta", power: "Fire", level: 78, city: "Mumbai" },
  { name: "Sanya", team: "Alpha", power: "Telekinesis", level: 88, city: "Bangalore" },
  { name: "Karan", team: "Beta", power: "Ice", level: 95, city: "Delhi" },
  { name: "Neha", team: "Beta", power: "Fire", level: 89, city: "Mumbai" }
])

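Extended set (use this instead of the basic set above; if you already inserted the basic set, clear the collection first so each hero appears only once)

use heroAcademy
db.heroes.deleteMany({})  // remove the basic set to avoid duplicate heroes
db.heroes.insertMany([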
  { name: "Aarav", team: "Alpha", power: "Speed", level: 85, city: "Mumbai", skills: ["flight", "strength"], status: "active" },
  { name: "Priya", team: "Alpha", power: "Invisible", level: 92, city: "Delhi", skills: ["stealth"], status: "inactive" },
  { name: "Rohan", team: "Beta", power: "Fire", level: 78, city: "Mumbai", skills: ["fire", "combat"], status: "active" },
  { name: "Sanya", team: "Alpha", power: "Telekinesis", level: 88, city: "Bangalore", skills: ["mind-control"], status: "active" },
  { name: "Karan", team: "Beta", power: "Ice", level: 95, city: "Delhi", skills: ["ice", "defense"], status: "inactive" },
  { name: "Neha", team: "Beta", power: "Fire", level: 89, city: "Mumbai", skills: ["fire"], status: "active" }
])

These additional fields allow demonstration of $unwind, $facet, and filtering by status.


Part 3: The Most Useful Stages (Factory Machines)

1. $match – The Filter Gate (Let only certain toys pass)

{ $match: { team: "Alpha" } }

→ Only Alpha team heroes continue down the belt.

Tip for Beginners: Place $match as early as possible in the pipeline. Later stages then process fewer documents, and a $match at the very start can use an index on the filtered field (here, team).

2. $project - The Reshaper Machine (Change how toys look)

{ $project: { name: 1, level: 1, _id: 0 } }

→ Show only name and level

{ $project: { name: 1, level: { $add: ["$level", 10] } } }

// +10 bonus!

3. $group - The Packing Machine (Group toys & calculate)

{
  $group: {
    _id: "$team",
    avgLevel: { $avg: "$level" },
    totalHeroes: { $sum: 1 },
    highestLevel: { $max: "$level" }
  }
}

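Output (averages shortened to two decimals here; the shell prints the full value, and the order of groups is not guaranteed without a $sort):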

{ "_id": "Alpha", "avgLevel": 88.33, "totalHeroes": 3, "highestLevel": 92 }
{ "_id": "Beta",  "avgLevel": 87.33, "totalHeroes": 3, "highestLevel": 95 }

Common Accumulators (Factory Tools):

  • $sum, $avg, $min, $max
  • $push → makes an array of values (for example, hero names)
  • $addToSet → array without duplicates
  • $first, $last → use after a $sort so that “first” and “last” are meaningful
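
If you are wondering what the packing machine actually computes, the sketch below redoes the $group example in plain JavaScript on an in-memory copy of the six sample heroes. It is only an illustration of the arithmetic, not how MongoDB implements $group, and groupByTeam is a hypothetical helper name.

```javascript
// Plain JavaScript equivalent of grouping by team (illustration only).
const heroes = [
  { name: "Aarav", team: "Alpha", level: 85 },
  { name: "Priya", team: "Alpha", level: 92 },
  { name: "Rohan", team: "Beta", level: 78 },
  { name: "Sanya", team: "Alpha", level: 88 },
  { name: "Karan", team: "Beta", level: 95 },
  { name: "Neha", team: "Beta", level: 89 },
];

// groupByTeam is a hypothetical helper name.
function groupByTeam(docs) {
  const groups = new Map();
  for (const { team, level } of docs) {
    const g = groups.get(team) ?? { _id: team, sum: 0, totalHeroes: 0, highestLevel: -Infinity };
    g.sum += level;                                   // running total, needed for the average
    g.totalHeroes += 1;                               // like { $sum: 1 }
    g.highestLevel = Math.max(g.highestLevel, level); // like { $max: "$level" }
    groups.set(team, g);
  }
  // Turn the running total into an average, like { $avg: "$level" }
  return [...groups.values()].map(({ sum, ...rest }) => ({
    ...rest,
    avgLevel: sum / rest.totalHeroes,
  }));
}

const grouped = groupByTeam(heroes);
console.log(grouped);
```

Each distinct _id value becomes one output document, and every accumulator keeps a running result per group.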

4. $sort - The Sorting Conveyor

{ $sort: { level: -1 } }

// Highest first

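5. $limit & $skip - Pagination

{ $skip: 20 }
{ $limit: 10 }

// For page 3 with 10 heroes per page: skip the first 20, then keep the next 10.
// Order matters: $limit before $skip would keep 10 documents and then skip all of them.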
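
Pagination is just skip-then-limit arithmetic. The small helper below builds the two stages for any page. It is a sketch under stated assumptions: pageStages is a hypothetical name, and pages are assumed to be numbered from 1.

```javascript
// pageStages is a hypothetical helper that builds the two pagination stages.
// Pages are 1-based: page 1 skips nothing, page 3 skips two full pages.
function pageStages(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  // $skip must come before $limit
  return [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }];
}

console.log(JSON.stringify(pageStages(3, 10)));

// In mongosh you could spread the stages into a pipeline, for example:
// db.heroes.aggregate([{ $sort: { level: -1 } }, ...pageStages(3, 10)])
```

Sort before paginating; without a $sort the order of documents is not guaranteed, so pages could overlap.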

6. $unwind - The Unpacker (For arrays)

If a hero has a skills array:

{ $unwind: "$skills" }

// One document per skill
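
Here is a rough picture of what the unpacker does, written as plain JavaScript over two sample heroes. The name unwindSkills is hypothetical and the code only mimics the default behavior of $unwind on the skills field.

```javascript
const heroes = [
  { name: "Aarav", skills: ["flight", "strength"] },
  { name: "Priya", skills: ["stealth"] },
];

// unwindSkills (hypothetical name): every array element gets its own copy
// of the document, with skills replaced by that single element.
const unwindSkills = (docs) =>
  docs.flatMap((doc) => doc.skills.map((skill) => ({ ...doc, skills: skill })));

const unwound = unwindSkills(heroes);
console.log(unwound); // three documents: two for Aarav, one for Priya
```

As in this sketch, a hero with an empty skills array produces no output documents. In MongoDB, setting preserveNullAndEmptyArrays: true on $unwind keeps such heroes.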

7. $lookup - The Join Machine (Bring data from another collection!)

{
  $lookup: {
    from: "teams",
    localField: "team",
    foreignField: "name",
    as: "teamInfo"
  }
}

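Like a SQL LEFT OUTER JOIN! Every hero is kept, and teamInfo is an array holding the matching documents from teams (an empty array if nothing matches). This example assumes a teams collection whose name field stores the team name.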
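
The shape of the result is easier to see with data. The sketch below mimics the join in plain JavaScript. The teams documents, the motto field, and the hero "Zoya" are all invented for this illustration; only the field names team, name, and teamInfo come from the example above.

```javascript
const heroes = [
  { name: "Aarav", team: "Alpha" },
  { name: "Rohan", team: "Beta" },
  { name: "Zoya", team: "Gamma" }, // hypothetical hero whose team is not in teams
];

// Hypothetical teams collection; the motto field is invented for this sketch.
const teams = [
  { name: "Alpha", motto: "Speed first" },
  { name: "Beta", motto: "Fire and ice" },
];

// For each hero, collect every team whose name equals the hero's team
// (localField: "team", foreignField: "name", as: "teamInfo").
const joined = heroes.map((hero) => ({
  ...hero,
  teamInfo: teams.filter((t) => t.name === hero.team),
}));

console.log(JSON.stringify(joined, null, 2));
```

Note that teamInfo is always an array, and that Zoya is still in the result with an empty teamInfo.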


Part 4: Full Pipeline Example – Top Fire Heroes in Mumbai

db.heroes.aggregate([
  { $match: { power: "Fire", city: "Mumbai" } },  // both conditions in one filter stage
  { $sort: { level: -1 } },                       // highest level first
  { $limit: 5 },                                  // keep the top 5
  { $project: { name: 1, level: 1, _id: 0 } }     // show only name and level
])

Beginner Win: You just built a complete factory line!



Part 5: Using Compass Aggregation Builder (Click & Build!)

1. Open Compass → heroes → Aggregations tab
2. Click + Stage → choose $match
3. Type { team: "Alpha" }
4. Click + Stage again to add more stages
5. Watch the live preview update as you go

Beginner Magic: Build complex reports with very little typing!


Part 6: Pro Stages & Tricks

$facet – Multiple Reports at Once!

{
  $facet: {
    "byTeam": [
      { $group: { _id: "$team", count: { $sum: 1 } } }
    ],
    "topHeroes": [
      { $sort: { level: -1 } },
      { $limit: 3 }
    ]
  }
}

One query → multiple results!
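
The output shape of $facet surprises many beginners: you get one document back, with one array field per report. The plain JavaScript sketch below mimics that shape for the byTeam and topHeroes reports on the six sample heroes. It is an illustration only, not MongoDB's implementation.

```javascript
const heroes = [
  { name: "Aarav", team: "Alpha", level: 85 },
  { name: "Priya", team: "Alpha", level: 92 },
  { name: "Rohan", team: "Beta", level: 78 },
  { name: "Sanya", team: "Alpha", level: 88 },
  { name: "Karan", team: "Beta", level: 95 },
  { name: "Neha", team: "Beta", level: 89 },
];

// Report 1 (byTeam): count heroes per team
const teamCounts = {};
for (const h of heroes) teamCounts[h.team] = (teamCounts[h.team] || 0) + 1;
const byTeam = Object.entries(teamCounts).map(([team, count]) => ({ _id: team, count }));

// Report 2 (topHeroes): three highest levels
const topHeroes = [...heroes].sort((a, b) => b.level - a.level).slice(0, 3);

// Both reports read the same input and end up in ONE result document
const facetResult = [{ byTeam, topHeroes }];
console.log(JSON.stringify(facetResult, null, 2));
```

Because everything lands in a single document, keep each report small (counts, top-N lists) rather than returning whole collections.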

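$bucket - Group into Ranges (like age groups, but for levels!)

{
  $bucket: {
    groupBy: "$level",
    boundaries: [0, 70, 80, 90, 100],  // ranges: [0,70), [70,80), [80,90), [90,100)
    default: "Other",                  // catches values outside the boundaries, e.g. level 100 or higher
    output: { count: { $sum: 1 } }
  }
}

Pro Tip: Use $bucket for grouping numeric ranges and $facet for generating multiple reports simultaneously. Each bucket includes its lower boundary and excludes its upper one, so a level of exactly 80 lands in the 80 bucket. The sub-pipelines inside $facet cannot use indexes, so put an index-friendly $match before $facet.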
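
To see how levels land in buckets, here is a small plain JavaScript sketch using the same boundaries and the six sample levels. The helper bucketFor and the placeholder label passed as defaultId are hypothetical; the rule it applies (lower boundary included, upper boundary excluded) is how $bucket assigns documents.

```javascript
const boundaries = [0, 70, 80, 90, 100];

// bucketFor (hypothetical helper): returns the lower boundary of the matching
// range, or defaultId when the level falls outside every range.
function bucketFor(level, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (level >= boundaries[i] && level < boundaries[i + 1]) return boundaries[i];
  }
  return defaultId;
}

const levels = [85, 92, 78, 88, 95, 89]; // levels of the six sample heroes
const counts = {};
for (const level of levels) {
  const id = bucketFor(level, "default");
  counts[id] = (counts[id] || 0) + 1;
}

console.log(counts);                    // one hero in 70, three in 80, two in 90
console.log(bucketFor(80, "default"));  // 80: the lower boundary is included
console.log(bucketFor(100, "default")); // "default": the upper boundary is excluded
```

With the sample data, the 0 bucket stays empty, so $bucket would simply not output a document for it.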

Expressions – Math & Logic

{
  $project: {
    bonusLevel: {
      $cond: {
        if: { $gte: ["$level", 90] },
        then: 20,
        else: 5
      }
    }
  }
}

Performance Tips (Expert Level)

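Put $match and $sort early → they can use indexes when they run at the start of the pipeline, before stages like $group or $unwind change the documents.
Use explain on the aggregation to check whether an index was used:

db.heroes.explain("executionStats").aggregate([...])

Indexes work with aggregation too!
For huge data → use allowDiskUse: true

db.heroes.aggregate([...], { allowDiskUse: true })

Performance Tip: Create indexes on the fields used in your leading $match or $sort stages. Memory-hungry stages such as $sort and $group are limited to 100 MB of RAM each; allowDiskUse: true lets them spill to temporary files on disk instead of failing. Newer MongoDB versions (6.0 and later) already spill to disk by default.
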

Part 7: Mini Project - Build a Complete Hero Report

db.heroes.aggregate([
  { $match: { level: { $gte: 80 } } },
  {
    $group: {
      _id: "$city",
      heroes: { $push: "$name" },
      avgLevel: { $avg: "$level" },
      count: { $sum: 1 }
    }
  },
  { $sort: { avgLevel: -1 } },
  {
    $project: {
      city: "$_id",
      _id: 0,
      numberOfHeroes: "$count",
      averageLevel: { $round: ["$avgLevel", 1] },
      heroNames: "$heroes"
    }
  }
])

You just became a Data Factory CEO!

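Challenge: Build a report showing the top 3 heroes per city with level > 85. Hint: $limit trims the whole result, not each group, so try $match, then $sort, then $group with $push, and finally $project with $slice to keep the first 3 names per city.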

Part 8: aggregate() vs find() - When to Use What

Use find() when:

  • You need a simple search
  • You are just filtering or sorting
  • The query is simple and you want the least overhead

Use aggregate() when:

  • You need grouping, calculations, or averages
  • You want top-10 style reports
  • You need joins or reshaping (powerful data transformation)


Part 9: Cheat Sheet (Print & Stick!)

Stage          | What It Does         | Example
$match         | Filter               | { $match: { team: "Alpha" } }
$project       | Reshape/show fields  | { $project: { name: 1 } }
$group         | Group & calculate    | { $group: { _id: "$team", avg: { $avg: "$level" } } }
$sort          | Sort                 | { $sort: { level: -1 } }
$limit/$skip   | Pagination           | { $skip: 20 }, { $limit: 10 }
$unwind        | Flatten arrays       | { $unwind: "$skills" }
$lookup        | Join collections     | See above
$facet         | Multiple reports     | See above

Frequently Asked Questions (FAQs)

1. What is the Aggregation Framework?

MongoDB's Aggregation Framework allows you to process and transform data step by step using a series of stages in a pipeline. It's perfect for complex reports and calculations.

2. When should I use aggregate() vs find()?

Use find() for simple queries and filtering. Use aggregate() for grouping, reshaping, joins, or complex calculations.

3. How do I optimize aggregation performance?

Place $match and $sort early, use indexes, and for huge datasets, enable allowDiskUse: true.

4. What is $facet?

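$facet runs several sub-pipelines on the same set of input documents within a single stage, so one query can produce several different reports. The result is a single document with one array field per sub-pipeline.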

5. How do I handle arrays in aggregation?

Use $unwind to flatten arrays, then perform grouping, sorting, or projections as needed.


Final Words

You’re a Data Factory Master!
You just learned:

  • How the pipeline works (conveyor belt!)
  • All major stages with real examples
  • Pro tricks like $facet, $bucket, performance tips
  • Built reports with clicks (Compass) and code

Key Takeaways / Checklist

  • Understand how the aggregation pipeline works step by step.
  • Use $match early for better performance.
  • Remember core stages: $project, $group, $sort, $unwind, $lookup, $facet, $bucket.
  • Leverage Compass Aggregation Builder for interactive report building.
  • Use performance tips: indexes, allowDiskUse: true, and .explain() for query optimization.
  • Practice with mini projects and challenges to solidify understanding.
  • Refer to official docs and operator references for complex aggregation tasks.

Your Factory Mission:
Run this now:

db.heroes.aggregate([
  { $match: { power: "Fire" } },
  { $group: { _id: "$power", count: { $sum: 1 } } }
])

How many fire heroes?
You’re now a Certified MongoDB Aggregation Chef!

Resources:
Aggregation Docs
Pipeline Builder in Compass
All Operators List
Keep cooking awesome data dishes!

Resources & Next Steps

Try it Yourself: Share your favorite aggregation query in the comments below. Let’s build an army of Data Factory Masters together.
