Rules of Thumb for MongoDB Schema Design
This is a summary of 6 Rules of Thumb for MongoDB Schema Design, a series of three blog posts that details how MongoDB schemas should be organized. Please refer to the original posts if this summary is not sufficient.
If you are new to MongoDB, it is natural to ask how to structure the schema for a one-to-many relationship.
Relationships can take three different forms.
Each way of structuring them has its pros and cons, so you should know how to decide which one fits a given situation.
- You can retrieve all the information in one query.
- It is impossible to search the contained entity independently.
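The two points above describe embedding, where the contained entities live inside the parent document. As a minimal sketch, a plain Python dict can stand in for such a document. The `person`/`addresses` shape, the field names, and the helper `cities_of` are illustrative assumptions, and no database driver is involved.

```python
# A plain dict stands in for one MongoDB document; no driver is involved.
# The person/addresses shape and all field names are illustrative assumptions.
person = {
    "_id": 1,
    "name": "Alice",
    "addresses": [
        {"street": "1 Main St", "city": "Springfield"},
        {"street": "9 Side Ave", "city": "Shelbyville"},
    ],
}


def cities_of(person_doc):
    """Read the embedded children directly from the already-fetched parent."""
    return [address["city"] for address in person_doc["addresses"]]


cities = cities_of(person)
print(cities)
```

Note that the embedded addresses have no `_id` of their own in this sketch, which is why they can only be reached through the parent.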
The parent holds a list of the ObjectIDs of its child documents. This requires an application-level join rather than a database-level join.
- It is easy to insert and delete each document independently.
- It is flexible enough to implement N-to-N relationships, because the join happens at the application level.
- Performance drops because fetching the documents takes multiple queries.
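The application-level join described above can be sketched with two in-memory dicts standing in for collections. The `products`/`parts` names, the sample data, and the helper `parts_of` are assumptions for illustration; with a real driver, the second step would typically be a single query using `$in` on the list of ids.

```python
# In-memory stand-ins for two collections, keyed by _id (illustrative data).
parts = {
    101: {"_id": 101, "name": "grommet", "price": 3.99},
    102: {"_id": 102, "name": "washer", "price": 0.50},
    103: {"_id": 103, "name": "bolt", "price": 1.25},
}
products = {
    1: {"_id": 1, "name": "smoke shifter", "parts": [101, 103]},
}


def parts_of(product_id):
    """Application-level join: fetch the parent, then its children by id."""
    product = products[product_id]  # lookup 1: the parent document
    # lookup 2: the referenced children (with a driver, one query using $in)
    return [parts[part_id] for part_id in product["parts"]]


part_names = [part["name"] for part in parts_of(1)]
print(part_names)
```

Because each part is its own document, it can be inserted, deleted, or shared between several products without touching the others.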
If you need to store tons of data (e.g. event logs), you need a different approach, since a document cannot be larger than 16 MB. In that case, use ‘parent-referencing’, where each child document stores a reference to its parent.
Later you can join like below.
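A minimal sketch of that join, assuming hypothetical `hosts` and `log_messages` collections modelled as in-memory Python structures: each log message carries the `_id` of its host, so the parent document never grows.

```python
# Illustrative stand-ins: each log message points at its parent host.
hosts = {
    1: {"_id": 1, "name": "web-1.example.com"},
    2: {"_id": 2, "name": "db-1.example.com"},
}
log_messages = [
    {"time": 1, "message": "disk almost full", "host": 1},
    {"time": 2, "message": "connection reset", "host": 2},
    {"time": 3, "message": "cpu is hot", "host": 1},
]


def recent_messages(host_name, limit):
    """Find the host first, then its newest messages via the parent reference."""
    host = next(h for h in hosts.values() if h["name"] == host_name)  # lookup 1
    matching = [m for m in log_messages if m["host"] == host["_id"]]  # lookup 2
    matching.sort(key=lambda m: m["time"], reverse=True)
    return matching[:limit]


latest = [m["message"] for m in recent_messages("web-1.example.com", 2)]
print(latest)
```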
We put references on both sides so that each document can find its counterpart in a one-to-many relationship.
- It is easy to search on both Person and Task documents.
- It requires two separate queries to update an item. The update is not atomic.
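The trade-off above can be sketched as follows, with in-memory dicts standing in for the Person and Task collections. The field names `tasks` and `owner` and the helper `assign_task` are assumptions made for this example.

```python
# Illustrative stand-ins for the Person and Task collections.
people = {1: {"_id": 1, "name": "Alice", "tasks": []}}
tasks = {}


def assign_task(task_id, description, owner_id):
    """Keep both directions of the reference in sync with two separate writes."""
    # Write 1: the task stores a reference to its owner.
    tasks[task_id] = {"_id": task_id, "description": description, "owner": owner_id}
    # Write 2: the owner stores a reference to the task. If the process stops
    # between the two writes, the references disagree with each other.
    people[owner_id]["tasks"].append(task_id)


assign_task(10, "write report", 1)

# Either side can now reach the other directly.
owner_name = people[tasks[10]["owner"]]["name"]
task_descriptions = [tasks[t]["description"] for t in people[1]["tasks"]]
print(owner_name, task_descriptions)
```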
We can structure the schema like below to avoid making multiple queries.
You can use it like below at the application level:
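As a rough sketch under assumed names, suppose each product stores the part name next to the part id. The `products`/`parts` data and both helper functions are hypothetical; in-memory dicts stand in for the collections.

```python
# Illustrative data: the part name is copied into each product's parts array.
parts = {
    101: {"_id": 101, "name": "grommet", "price": 3.99},
    103: {"_id": 103, "name": "bolt", "price": 1.25},
}
products = {
    1: {
        "_id": 1,
        "name": "smoke shifter",
        "parts": [{"id": 101, "name": "grommet"}, {"id": 103, "name": "bolt"}],
    },
}


def part_names(product_id):
    """Names come from the product document alone: no second lookup."""
    return [ref["name"] for ref in products[product_id]["parts"]]


def part_prices(product_id):
    """Fields that were not denormalized still need the join."""
    refs = products[product_id]["parts"]
    return {ref["name"]: parts[ref["id"]]["price"] for ref in refs}


names = part_names(1)
prices = part_prices(1)
print(names, prices)
```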
- Denormalization reduces the cost of reading the data.
- When you want to update a part name, you have to update every copy of that name stored inside product documents.
- It is not a good choice when updates are frequent.
This form of denormalization is favorable when updates are infrequent and there are squillions of reads.
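To make the update cost concrete, here is a small sketch in which renaming one part has to touch every product that carries a copy of its name. The data and the helper `rename_part` are assumptions for illustration only.

```python
# Illustrative data: two products each hold a copy of the same part name.
parts = {101: {"_id": 101, "name": "grommet"}}
products = {
    1: {"_id": 1, "name": "smoke shifter", "parts": [{"id": 101, "name": "grommet"}]},
    2: {"_id": 2, "name": "widget", "parts": [{"id": 101, "name": "grommet"}]},
}


def rename_part(part_id, new_name):
    """Update the canonical name, then every denormalized copy of it."""
    parts[part_id]["name"] = new_name
    copies = 0
    for product in products.values():
        for ref in product["parts"]:
            if ref["id"] == part_id:
                ref["name"] = new_name
                copies += 1
    return copies


copies_updated = rename_part(101, "rubber grommet")
print(copies_updated)
```

The number of extra writes grows with the number of products that use the part, which is why a frequently updated field is a poor candidate.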
Unlike the previous example, we can denormalize in the opposite way. The same pros and cons apply.
In fact, you can merge the two documents.
In application code, it would be used like this:
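A minimal sketch, assuming the two merged documents are a host and its log message (the source does not name them here). The field names `hostname` and `ipaddr` and the helper `recent_lines` are hypothetical.

```python
# Illustrative data: host details are stored directly in every log message.
log_messages = [
    {"time": 1, "message": "disk almost full",
     "hostname": "web-1.example.com", "ipaddr": "10.0.0.1"},
    {"time": 2, "message": "connection reset",
     "hostname": "db-1.example.com", "ipaddr": "10.0.0.2"},
    {"time": 3, "message": "cpu is hot",
     "hostname": "web-1.example.com", "ipaddr": "10.0.0.1"},
]


def recent_lines(ipaddr, limit):
    """One lookup returns everything needed; no separate host document is read."""
    matching = [m for m in log_messages if m["ipaddr"] == ipaddr]
    matching.sort(key=lambda m: m["time"], reverse=True)
    return [f'{m["hostname"]}: {m["message"]}' for m in matching[:limit]]


lines = recent_lines("10.0.0.1", 2)
print(lines)
```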
Here are some “rules of thumb” to guide you through these innumerable (but not infinite) choices:
Favor embedding unless there is a compelling reason not to.
Needing to access an object on its own is a compelling reason not to embed it.
Arrays should not grow without bound. If there are more than a couple of hundred documents on the “many” side, don’t embed them; if there are more than a few thousand documents on the “many” side, don’t use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed.
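The first three rules can be condensed into a small decision helper. This is only a sketch: the cutoffs `200` and `2000` are assumptions standing in for "a couple of hundred" and "a few thousand", and the function name and return labels are made up for illustration.

```python
def choose_strategy(many_side_count, needs_standalone_access=False,
                    embed_limit=200, reference_limit=2000):
    """Suggest a one-to-many layout from the expected size of the 'many' side.

    embed_limit and reference_limit are assumed cutoffs, not official numbers.
    """
    if many_side_count > reference_limit:
        # Too many children even for an array of ObjectID references.
        return "parent-referencing"
    if many_side_count > embed_limit or needs_standalone_access:
        # Too many to embed, or the children must be queried on their own.
        return "child-referencing"
    return "embedding"


print(choose_strategy(5))
print(choose_strategy(5, needs_standalone_access=True))
print(choose_strategy(500))
print(choose_strategy(1_000_000))
```

Real decisions also depend on document size and access patterns, so treat the result as a starting point rather than a verdict.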
Don’t be afraid of application-level joins: if you index correctly and use the projection specifier then application-level joins are barely more expensive than server-side joins in a relational database.
Consider the write/read ratio when denormalizing. A field that will mostly be read and only seldom updated is a good candidate for denormalization: if you denormalize a field that is updated frequently then the extra work of finding and updating all the instances is likely to overwhelm the savings that you get from denormalizing.
As always with MongoDB, how you model your data depends – entirely – on your particular application’s data access patterns. You want to structure your data to match the ways that your application queries and updates it.