home
Hero image for: How to detect and delete orphaned documents in Mongo DB

How to detect and delete orphaned documents in Mongo DB

By Eduardo García ● CTO | August 22nd, 2019

In any database relational or not, usually, we link tables or collections  ( in the case of Mongo DB). But, one process that I found a little tricky to do on my firsts days with mongo. Was how to detect documents that are not related to anyone, but it's is expected to be related, if not maybe we want to delete those documents.

Let me set an example. Let's imagine that we have a collection named clients and another collection named projects, and projects are related to clients using the Client Id Document.

As you maybe know each document in Mongo DB could have a different structure by his own. So, we cannot warranty that all documents have the same fields, but let imagine our client documents looks like this.

Using the following query, we could the first document in our collection

db.clients.findOne();

As a result, we could get something like this

{
"_id" : "24Ne5krwnba8hxHRc",
"name" : "ACME",
"createdAt" : ISODate("2017-11-08T19:23:09.730Z"),
"updatedAt" : ISODate("2019-02-21T16:15:27.796Z"),
"country" : "GB",
"email" : "info@example.com",
}

We could do the same with the collection projects:

db.projects.findOne();

Below an example of a project document

{
	"_id" : "2W9anDLvkB8td7NEu",
	"name" : "Policy Book",
	"client" : "uBXNLm2MdgMXkWcRB",
	"createdAt" : ISODate("2017-12-05T19:01:43.914Z"),
	"updatedAt" : ISODate("2017-12-20T19:37:49.669Z"),
}

Now, imagine that for some strange reason we have the idea that we have clients orphaned of projects. In other words, clients without any projects, and for this example, we will say those clients are required to be removed.

Below I will show you how to accomplish this task.

Creating a cursor

In Mongo DB a cursor is a query result, that you can iterate using these cursors methods.

var cursor = db.getCollection('clients').aggregate([
    {
      $lookup:
        {
          from: "projects",
          localField: "_id",
          foreignField: "client",
          as: "matched_docs"
        }
   },
   {
      $match: { "matched_docs": { $eq: [] } }
   }
]);

Now, let me dissect a little bit the query itself.

Aggregate

The aggregate method allows you to operate in one or more collection to execute a plan or pipeline of a query to get the information needed.

In our case, our pivot collection is clients,and we will run the aggregation on this collection.

$Lookup

In Mongo DB, a pipeline is a combination of stages that are executed in sequence.

The $lookup stage allows us to implement a left outer join between our pivot collection and another collection.

Let's review in detail

$lookup:
        {
          from: "projects",
          localField: "_id",
          foreignField: "client",
          as: "matched_docs"
        }
}

The snippet about said, we want to do a left outer join between clients out pivot table and the collection projects, using the fields clients._id and project.client as a cross-reference to do the join.

In the end, the result is stored in a new field in result document using the name matcheddocs_ you can use any name that makes more sense to you.

Filtering

After doing the left outer join we need to filter the relationship to be sure we will only get those clients without projects, for that we will use the stage match in combination with the operator $eq.

{
      $match: { "matched_docs": { $eq: [] } }
}

This filtering will return only the relations with an empty array of documents, which is equal to orphaned clients.

Getting clients to delete

Using the cursor, we could iterate it to create a list of clients Ids ready to be removed.

var client_ids = cursor.map(function (doc) { return doc._id; });

Remove clients

Now we need to iterate our array of client ids to remove those clients using the collection method remove.

db.getCollection('clients').remove({"_id": { "$in": client_ids }});

I hope that you now have a better understanding of how to fetch information from Mongo DB and operate with the results.

Happy queries.