MongoDB Essentials Training
MongoDB was first released in 2009.
MongoDB was built with distribution in mind.
Open Source.
MongoDB has implemented the following features:
Schema validation.
ACID compliance
Joins
MongoDB uses sharding.
Client-Side Field-Level Encryption.
With client-side encryption, the encrypted fields are never decrypted on the server.
If you deploy MongoDB in the cloud, check the MongoDB connection string.
Regardless of the install option, you need the following:
MongoDB
MongoDB Shell
MongoDB Database Tools
MongoDB Community
MongoDB Enterprise
Advanced security options.
storage engines.
Management tools.
MongoDB Enterprise Kubernetes Operator
MongoDB Connector for BI
mongod
The MongoDB daemon process.
Handles requests from the MongoDB shell or drivers.
Performs background management operations.
Listens on port 27017 by default.
Default data directory: /data/db on Linux.
Default data directory: C:\data\db on Windows.
To insert a value, run this command:
db.test.insertOne({"hello": "world"})
Production environments run more than one mongod process for fault tolerance.
Replica Sets
Roles
One member is elected as the Primary.
Primary receives all write operations.
The other members are Secondaries.
Secondaries replicate operations from the Primary asynchronously to maintain the same data set.
If the Primary becomes unavailable, an election is held and a Secondary takes over as the new Primary.
More than half of the replica set members have to vote for the new Primary.
Use an odd number of replica set members, so that an election can always reach a majority.
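The majority rule above comes down to simple arithmetic. The sketch below is plain JavaScript, not a MongoDB API; the helper names votesNeeded and faultTolerance are hypothetical and only illustrate why an even member count adds no extra fault tolerance.

```javascript
// Hypothetical helper: the smallest vote count that is more than half.
function votesNeeded(members) {
  return Math.floor(members / 2) + 1;
}

// Members that can be unavailable while a majority can still be formed.
function faultTolerance(members) {
  return members - votesNeeded(members);
}

for (const members of [3, 4, 5]) {
  console.log(members, votesNeeded(members), faultTolerance(members));
}
```

Under this rule a four-member set tolerates the same single failure as a three-member set, which is the usual argument for odd sizes.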
Setting up a Replica Set:
openssl rand -base64 755 > keyfile
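The keyfile lets replica set members authenticate to each other. Starting mongod with --keyFile also enforces client authentication.
For production, X.509 certificates are generally recommended instead.
Make sure only the current user has read access to the keyfile: chmod 400 keyfile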
Shell Parameter Extensions:
mongod --replSet myReplSet --dbpath ./m1/db --logpath ./m1/mongodb.log --port 27017 --keyFile ./keyfile
mongod --replSet myReplSet --dbpath ./m2/db --logpath ./m2/mongodb.log --port 27018 --fork --keyFile ./keyfile
mongod --replSet myReplSet --dbpath ./m3/db --logpath ./m3/mongodb.log --port 27019 --fork --keyFile ./keyfile
Start the replication with:
To switch to the admin database, run use admin.
The localhost exception allows the first user to be created from a localhost connection.
This command creates a user within the DB (needs admin privileges) and prompts for a password:
db.createUser({user: 'howard', pwd: passwordPrompt(), roles: ["root"]})
To then authenticate as that user against the admin database:
db.getSiblingDB("admin").auth("howard", passwordPrompt())
Add each of the replica set members with this command:
rs.add("localhost:27017")
rs.add("localhost:27018")
rs.add("localhost:27019")
You can check the status of the replica set with:
rs.status()
db.serverStatus()["repl"]
Shows each of the members.
Press Ctrl + D to exit the shell.
killall mongod
Kills all running mongod processes.
Replica Set From a Configuration File
$ openssl rand -base64 755 > keyfile
$ chmod 400 keyfile
$ mkdir -p m{1,2,3}/db
$ touch m1.conf
$ vim m1.conf
$ cp m1.conf m2.conf
$ cp m1.conf m3.conf
$ vim m2.conf
$ vim m3.conf
$ mongod -f m1.conf
To start a replica set member, run mongod -f with its configuration file, for example mongod -f m1.conf.
To connect to one of the instances, we just run mongosh.
Write out the config variable:
use admin
config = { _id: "mongodb-essentials-rs", members: [{_id: 0, host: "localhost:27017"}, {_id: 1, host: "localhost:27018"}, {_id: 2, host: "localhost:27019"}]}
rs.initiate(config)
This initiates the replica set.
Create a user with the localhost exception:
The first user you create should have privileges to create further users.
db.createUser({user: 'howard', pwd: passwordPrompt(), roles: ["root"]})
To authenticate, you have to authenticate against the database where you created the user.
db.getSiblingDB("admin").auth("howard")
rs.status()
Reports on the health of replica set members.
db.serverStatus()['repl']
Gets the "repl" field value.
MongoDB Database Tools
mongostat
Statistics on a running mongod
mongodump
Exports dump files in BSON.
BSON is binary-encoded JavaScript Object Notation (JSON).
JSON is the format widely used to store and transmit data across web-based applications.
mongorestore
Import dump files from BSON
mongoexport
Export data to JSON or CSV
mongoimport
Import data from JSON or CSV.
An example of importing data into a Mongo database:
mongoimport --username="howard" --authenticationDatabase="admin" --db=sample_data inventory.json
Debugging Development
A good first step is to check mongod.log.
Also disable the fork option in the configuration file, so that mongod runs in the foreground and errors are visible.
Another good way is to check the oplog:
use local
db.oplog.rs.find( { "o.msg": { $ne: "periodic noop" } } ).sort( { $natural: -1 } ).limit(1).pretty()
You can also increase the log level:
db.adminCommand({ setParameter: 1, logLevel: 2})
A higher log level provides more verbose log output.
However, this can cause performance degradation.
The Document Model
MongoDB natively works with JSON documents.
Can store JSON data without prior modification.
JSON has multiple key value pairs, where the keys define the data.
The values are the ones that contain the data.
For example:
{
"course": "MongoDB Essentials",
"tags": ["databases", "document databases", "noSQL"],
"author": {
"name": "Howard",
"website": "mongodb.learn",
"mastodon": "toot"
}
}
Values can be strings, numbers, booleans, arrays, or nested documents.
MongoDB uses a binary-encoded serialisation of JSON-like documents, called BSON, for storage.
BSON is designed to be lightweight and efficient.
BSON can store data types that JSON cannot, such as binary data (for example images), timestamps and longs.
A deployment can hold multiple databases, and each database holds collections.
Collections are groupings of documents.
Documents are the basic unit of data.
Each document contains one individual record.
Each document has a maximum size of 16MB
How to insert a document into a collection:
The collection "authors" is also created by this command if it does not already exist.
db.authors.insertOne({"name": "Howard VDW"})
Each document in MongoDB must have a unique _id value. If one is not given, MongoDB automatically assigns an ObjectId.
MongoDB Query Language (MQL)
It is also referred to as the MongoDB Query API.
MQL allows you to perform CRUD operations.
It is used from a JavaScript-based shell.
The insertMany command takes an array of documents.
db.authors.insertMany([{ "name": "Bob"},{"name": "Kevin"},{"name": "Stuart"}])
How to find a document within MongoDB:
db.authors.find({"name":"Howard"})
It doesn’t matter whether you put quotes around the field name or not.
For example:
db.authors.find({name: "Howard"})
How to update one document:
db.authors.updateOne({ name : "Howard" }, { $set: { website: "www.soundsgood.com" } })
How to update many documents at once:
To match every document, pass an empty filter as the first argument.
The example below adds an empty books array to every document:
db.authors.updateMany({ }, { $set: { books: [] } })
How to delete a document:
db.authors.deleteOne({ name: "Howard" })
How to delete multiple documents (an empty filter deletes all of the documents within a collection):
db.authors.deleteMany({})
Indexes and How They Work.
When you perform a query:
If you have no index, the database checks every document.
This is called a collection scan.
Indexes are an organised way to look up data.
They store a subset of the data with pointers.
These point to the location of the full records.
If the query can be answered from the index alone, it’s called a covered query.
Indexes provide more efficient queries and updates.
When should an index be created?
When you frequently query on the same fields.
When you frequently perform range-based queries on fields.
If you have a common query pattern, you want an index on that pattern.
Indexes need to be maintained:
Adds 10% write overhead.
Faster reads, but slower writes.
Must have enough RAM to fit the index.
Index Types
Single Field Indexes
Create an Index on only one field.
Partial Indexes
Add an option to the index so that only documents matching a certain condition are indexed.
Compound Indexes
Create Index on a combination of fields (useful if querying on multiple fields)
Multikey Indexes
Index on an array field, with one index entry per array element.
A compound multikey index can include at most one array field (the index grows very quickly otherwise).
Text Indexes
Allow you to search within text fields.
Wildcard Indexes
Index on a field or set of fields whose names you don’t know in advance, because the schema changes dynamically.
They should not be used otherwise.
Geospatial Indexes
Hashed Indexes
Can reduce the index size if the original values are very large.
Not performant for range queries.
How to create an index:
db.authors.createIndex({ name: 1 })
Think about how the index will be used to look up your data quickly.
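The difference between a collection scan and an index lookup can be sketched in plain JavaScript. This is a toy model under stated assumptions: the index here is a simple Map from name to array positions, whereas real MongoDB indexes are B-tree structures, and the function names are hypothetical.

```javascript
// Toy data standing in for a collection; not a real MongoDB API.
const authors = [
  { _id: 1, name: "Bob" },
  { _id: 2, name: "Kevin" },
  { _id: 3, name: "Stuart" },
];

// Collection scan: look at every document and count how many were examined.
function collectionScan(docs, name) {
  let examined = 0;
  const hits = docs.filter((doc) => {
    examined += 1;
    return doc.name === name;
  });
  return { hits, examined };
}

// Toy index: maps each name to the positions of the documents that hold it.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!index.has(doc.name)) index.set(doc.name, []);
    index.get(doc.name).push(position);
  });
  return index;
}

// Index lookup: only the documents the index points at are examined.
function indexLookup(docs, index, name) {
  const positions = index.get(name) ?? [];
  return { hits: positions.map((p) => docs[p]), examined: positions.length };
}

const nameIndex = buildIndex(authors);
console.log(collectionScan(authors, "Kevin").examined);
console.log(indexLookup(authors, nameIndex, "Kevin").examined);
```

The sketch also hints at the write cost: every insert would have to update the Map as well as the array.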
Durability in MongoDB
Durability guarantees that acknowledged writes are permanently stored, even if the database or parts of it become temporarily unavailable.
Configurable in MongoDB with a writeConcern.
High Durability - Slower Writes
Low Durability - Faster Writes
An example of Durability:
db.authors.insertOne(
{ "name": "Howard" },
{
writeConcern: {
w: "majority",
j: true,
wtimeout: 100
}
}
)
wtimeout sets how long, in milliseconds, a write operation may block while waiting for the write concern to be satisfied.
The j option controls whether writes must be written to the on-disk journal before they are acknowledged (which takes longer), or whether it is enough for them to be in memory at the time the write is acknowledged.
If j is set to true, the write is acknowledged only after it has been written to the on-disk journal.
If j is set to false, the operation is reported as successful once enough mongod instances have applied the write in memory. This can lose writes if power is lost before they reach disk.
If a client issues a write with the write concern set to majority:
More than half of the data-bearing replica set members in the deployment must have the write before the acknowledgement is sent to the client.
In a three-member replica set, at least one secondary must replicate the write before the primary can acknowledge it to the client.
If the primary then fails, the remaining nodes choose a new primary and continue working.
A higher write concern makes data loss less likely.
If data integrity is important, set the write concern to majority, which also helps with failovers.
How to Access Array Values
db.movies.findOne({"genres.0": "Musical"})
Returns one document whose first genre is "Musical".
A read concern lets find return only data that is majority committed.
How to set the readConcern:
db.authors.find({ }).readConcern("majority")
readConcern can be the following:
local (default)
Returns the most recent data on the mongod you are connected to.
No regard for whether the data is majority committed.
available
Used for sharded clusters and similar to local.
majority
Returns only majority-committed data.
linearizable
Waits for any current writes to complete before reading and returning the data.
You can also configure the readPreference.
This allows the application to route reads to secondaries.
primary (default)
All of the reads come from the Primary.
primaryPreferred
Reads go to the primary when it is available and are routed to secondaries otherwise.
secondary
Reads are routed directly to secondaries.
secondaryPreferred
Reads go to secondaries and only go to the primary if no secondary is available.
nearest (lowest latency)
The node with the lowest latency to where you are querying from.
There is a risk of reading stale data when reading from secondaries.
This is fine for analytics.
Don’t use it if the goal is to increase your traffic capacity.
Comparison Operators
$eq: equal to
$gt: greater than
$gte: greater than or equal to
$lt: less than
$lte: less than or equal to
$ne: not equal to
Example commands:
db.inventory.findOne({ "variations.quantity": { $gte: 8 } })
Returns a document that has a variation with a quantity of 8 or more.
db.inventory.findOne({ "price": { $lt: 1700 } })
Checks for a price that is lower than 1700.
$in: matches any of the values in an array
$nin: matches none of the values in an array
Further Examples:
db.inventory.findOne({ "variations.variation": { $in: [ "Blue", "Red" ] } })
Searches for items with a "Blue" or "Red" variation.
Gives us back 1 car, where the variation is "Red".
db.inventory.findOne({ "variations.variation": { $nin: [ "Blue", "Red" ] } })
We only get cars that are NOT Blue or Red.
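The comparison operators above can be read as small predicates on a single value. The following is a minimal sketch in plain JavaScript, assuming scalar values only; the matches helper is hypothetical, and real MQL also handles arrays, nested paths and BSON type ordering, none of which is modelled here.

```javascript
// Hypothetical mini-matcher for one scalar field value.
const comparisons = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

// True when the value satisfies every operator in the condition object.
function matches(value, condition) {
  return Object.entries(condition).every(([op, arg]) => comparisons[op](value, arg));
}

console.log(matches(1500, { $lt: 1700 }));
console.log(matches("Green", { $nin: ["Blue", "Red"] }));
console.log(matches(8, { $gte: 8, $lte: 10 }));
```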
Logical Operators
$and (∧)
Query for items that match multiple conditions.
db.inventory.findOne({ $and: [{"variations.quantity":{$ne: 0}},{"variations.quantity":{$exists: true}}]})
Checks for a document where the quantity is not equal to zero and the quantity field exists.
$or (∨)
db.inventory.findOne({ $or: [{"variations.variation": "Blue"}, {"variations.variation": "Green"}, {"variations.variation": "Teal"}]})
Finds a document whose variation is either Blue, Green or Teal.
$nor
db.inventory.findOne({ $nor: [{price: {$gt: 8000}}, {"variations.variation": "Blue"}]})
Checks for a car where the price is not greater than 8000 and the variation is not blue.
$not (¬)
db.inventory.findOne({ "price": {$not: {$gt: 2000}}})
Matches a document whose price is not greater than 2000.
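The logical operators combine conditions the way boolean combinators do. Below is a minimal sketch in plain JavaScript, assuming a flat document with a single variation field; the combinator and predicate names are hypothetical. One difference it does not model: in MongoDB, $not also matches documents where the field is missing.

```javascript
// Combinators that mirror $and, $or, $nor and $not on plain predicates.
const and = (...predicates) => (doc) => predicates.every((p) => p(doc));
const or = (...predicates) => (doc) => predicates.some((p) => p(doc));
const nor = (...predicates) => (doc) => !predicates.some((p) => p(doc));
const not = (predicate) => (doc) => !predicate(doc);

// Hypothetical predicates for the examples above.
const priceAbove = (limit) => (doc) => doc.price > limit;
const variationIs = (colour) => (doc) => doc.variation === colour;

// Made-up sample document.
const car = { price: 1500, variation: "Red" };

console.log(nor(priceAbove(8000), variationIs("Blue"))(car));
console.log(or(variationIs("Blue"), variationIs("Green"))(car));
console.log(not(priceAbove(2000))(car));
console.log(and(priceAbove(1000), variationIs("Red"))(car));
```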
Sort, Skip, Limit
db.movies.find({}, {title: 1, director: 1, genres: 1}).sort({ title: 1})
Sorts movies by title.
Can sort results on multiple fields.
db.movies.find({}, {title: 1, director: 1, genres: 1}).sort({ director: 1, title: 1})
Sorts by director first and then by title, so the results start with directors whose names begin with the letter A.
How to use the Skip method:
This one skips the first 100 results:
db.movies.find({}, {title: 1, director: 1, genres: 1}).sort({ director: 1, title: 1}).skip(100)
The results then start with directors beginning with the letter B.
limit caps the number of results returned.
db.movies.find({}, {title: 1, director: 1, genres: 1}).sort({ director: 1, title: 1}).skip(100).limit(3)
Limits the results to just 3.
When a sort is part of a common query pattern, use an index.
If it is not a common query pattern, combine sort with a limit: the database then only has to keep track of the top results rather than sorting everything, which is much faster.
MongoDB always performs sort, then skip and then limit, in that order, regardless of how they are chained.
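The fixed sort, skip, limit order can be mimicked on a plain array. This is only an illustration in plain JavaScript with made-up sample movies; sortSkipLimit is a hypothetical helper, not a MongoDB function.

```javascript
// Made-up sample data.
const movies = [
  { title: "Delta", director: "Bo" },
  { title: "Alpha", director: "Al" },
  { title: "Echo", director: "Al" },
  { title: "Bravo", director: "Cy" },
  { title: "Charlie", director: "Bo" },
];

const compareText = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Sort on director first, then on title, like sort({ director: 1, title: 1 }).
const byDirectorThenTitle = (a, b) =>
  compareText(a.director, b.director) || compareText(a.title, b.title);

// Always sort first, then skip, then limit.
function sortSkipLimit(docs, compare, skip, limit) {
  return [...docs].sort(compare).slice(skip, skip + limit);
}

const page = sortSkipLimit(movies, byDirectorThenTitle, 1, 3);
console.log(page.map((movie) => movie.title));
```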
updateOne and updateMany
If updateOne matches multiple documents, only the first document matched will be updated.
An updateOne example:
db.authors.updateOne({name: "Howard VDW"}, { $set: {message: "Hello World!" }})
The above example adds the "message" field.
It was this before:
_id: ObjectId("640924af841d3b1208bbf975"),
name: 'Howard VDW',
books: [],
Now it is this:
_id: ObjectId("640924af841d3b1208bbf975"),
name: 'Howard VDW',
books: [],
message: 'Hello World!'
An updateMany example:
You can match all documents by specifying an empty document, {}, as the first argument.
db.authors.updateMany({}, {$set: {message: "Hello" }})
Adds a message saying "Hello" to all documents.
There is also the $unset operator:
db.authors.updateMany({}, {$unset: { message: "" }})
The above example removes the "message" field that we had before.
Commonly Used Update Operators:
{ $set: {msg: "Hello world!"}}
{ $unset: {msg: ""}}
{ $inc: {quantity: -1, ordered: 1}}
Increments a value by a positive or negative amount.
{ $mul: {price: 0.9}}
Multiplies a field by a specified value.
{ $max: {bid: 500}}
Updates the field to the specified value ONLY if the specified value is greater than the current value.
{ $min: {lowest_available_price: 500}}
The opposite of $max: updates the field only if the specified value is lower than the current value.
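To make the operator semantics concrete, here is a minimal sketch that applies them to a plain JavaScript object. The applyUpdate function is hypothetical and covers top-level numeric fields only; it is an illustration of the rules listed above, not how the server implements updates.

```javascript
// Hypothetical helper: returns a copy of doc with the update applied.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value;
  }
  for (const field of Object.keys(update.$unset ?? {})) {
    delete result[field];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    result[field] = (result[field] ?? 0) + amount; // missing field starts at 0
  }
  for (const [field, factor] of Object.entries(update.$mul ?? {})) {
    result[field] = (result[field] ?? 0) * factor; // missing field becomes 0
  }
  for (const [field, value] of Object.entries(update.$max ?? {})) {
    if (!(field in result) || value > result[field]) result[field] = value;
  }
  for (const [field, value] of Object.entries(update.$min ?? {})) {
    if (!(field in result) || value < result[field]) result[field] = value;
  }
  return result;
}

// Made-up sample document.
const item = { quantity: 5, ordered: 2, bid: 400, price: 100 };
console.log(applyUpdate(item, { $inc: { quantity: -1, ordered: 1 }, $max: { bid: 500 } }));
console.log(applyUpdate(item, { $mul: { price: 0.5 }, $min: { bid: 500 } }));
```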
Arrays
Using Find:
db.movies.find({genres: "Comedy"})
Matches documents whose genres array contains "Comedy".
You can specify an array of values:
db.movies.find({genres: [ "Comedy", "Drama", "Thriller"]})
Finds movies whose genres array is exactly Comedy, Drama and Thriller, in that order.
The $all operator brings back documents whose array contains all of the listed values, in any order.
db.movies.find({genres: { $all: [ "Comedy", "Drama"]}})
$elemMatch: specify multiple conditions that all have to be matched by a single element in the array.
db.inventory.find({ variations: { $elemMatch: { variation: "Blue", quantity: {$gte: 8} }}})
How to update arrays:
$push adds elements to an array.
An example below of adding a value called "Test" to the available genres.
db.movies.updateOne({_id: ObjectId("63fd50eaf2d7bf128c7ca0d5")}, {$push:{genres: "Test"}})
How to find the document you updated again:
db.movies.findOne({_id: ObjectId("63fd50eaf2d7bf128c7ca0d5")})
$addToSet
Only adds the element we specify if it is NOT already present in the array.
An example of $addToSet below:
db.movies.updateOne({_id: ObjectId("63fd50eaf2d7bf128c7ca0d5")}, {$addToSet:{genres: "Test"}})
How to REMOVE things from an array:
The $pop operator.
This removes the LAST item in the genres array. For example, if it contained "test" and "green", it would remove "green", but "test" would still be there.
db.movies.updateOne({_id: ObjectId("63fd50eaf2d7bf128c7ca0d5")}, {$pop: {genres: 1}})
To remove the FIRST element in the array:
Use -1 instead of 1
db.movies.updateOne({_id: ObjectId("63fd50eaf2d7bf128c7ca0d5")}, {$pop: {genres: -1}})
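The three array operators can be compared side by side on a plain JavaScript array. This is a minimal sketch with hypothetical helper names; each helper returns a new array rather than changing a stored document.

```javascript
// $push: always appends, even if the value is already present.
const push = (array, value) => [...array, value];

// $addToSet: appends only when the value is not in the array yet.
const addToSet = (array, value) => (array.includes(value) ? array : [...array, value]);

// $pop: 1 removes the last element, -1 removes the first element.
const pop = (array, direction) => (direction === -1 ? array.slice(1) : array.slice(0, -1));

const genres = ["Comedy", "Drama"];
console.log(push(genres, "Drama"));
console.log(addToSet(genres, "Drama"));
console.log(pop(genres, 1));
console.log(pop(genres, -1));
```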
Transactions
An operation on a single document is atomic.
If one person writes and another reads the same document at the same time, the read sees the document either entirely before or entirely after the write.
Multi-document Transactions
Used when someone makes changes to multiple documents that must succeed or fail together.
Guarantee atomicity of reads and writes to multiple documents.
Reads return all documents in the state they were when the read began.
Either all writes occur or none occur.
Transactions can be used across Operations, Documents, Collections and Databases.
How to create a session object:
session = db.getMongo().startSession({ readPreference: { mode: "primary"} })
To start the transaction:
session.startTransaction()
How to use transactions:
session.getDatabase("blog").authors.updateMany({},{$set: {message: "Transaction occurred"}})
This updates multiple documents inside the transaction.
To end a session, commit the transaction with session.commitTransaction() and then call session.endSession().
Overuse of transactions leads to performance degradation.
If you need transactions often, check the data model.
$expr
Can compare different field values within the same document.
db.movies.find({},{title:1, ratings:1})
A good way to go through all documents and see their ratings.
The $ prefix is required so that the string is read as a field path instead of a string literal.
An example of $expr:
db.movies.find({$expr:{$gt: [{$multiply: ["$ratings.mndb", 10]},"$ratings.soft_avocadoes"]}})
This multiplies the mndb rating by 10 (to make the rating out of 100, which is the same rating that the soft_avocadoes scale uses) and can then be compared.
$expr has a lot of operators that can be used with it.
Most common operations used in programming can be done with these operators as well.
Look up Aggregation Pipeline Operators in the MongoDB documentation.
Aggregation Pipeline
db.collection.aggregate( [] )
This is how you start a pipeline. The stages, with their operators, go inside the array.
$group
Grouping stages are useful for grouping data.
Prefix the field name with $, because we want to get the value of the field.
An example would be:
db.inventory.aggregate([{ $group: { _id: "$source"}}])
Then it produces an output such as:
[
{ _id: 'Jetpulse' }, { _id: 'Brightdog' },
{ _id: 'Mudo' }, { _id: 'Browsedrive' },
{ _id: 'Skivee' }, { _id: 'Voolith' },
{ _id: 'Skibox' }, { _id: 'Babbleblab' },
{ _id: 'Realbridge' }, { _id: 'Avaveo' },
{ _id: 'Yacero' }, { _id: 'Thoughtbridge' },
{ _id: 'Yadel' }, { _id: 'Gigaclub' },
{ _id: 'Meedoo' }, { _id: 'Yodo' },
{ _id: 'Ozu' }, { _id: 'Gabtype' },
{ _id: 'Camido' }, { _id: 'Skiptube' }
]
Using $sum: 1 adds one to the count for each document in the group. An example is:
db.inventory.aggregate([{ $group: { _id: "$source", count: {$sum: 1}}}])
Which gives an output such as:
[
{ _id: 'Shufflester', count: 5 },
{ _id: 'Pixoboo', count: 3 },
{ _id: 'Skinder', count: 2 },
{ _id: 'Omba', count: 3 },
{ _id: 'Browsezoom', count: 2 },
{ _id: 'Skiptube', count: 1 },
{ _id: 'Zava', count: 2 },
{ _id: 'Gabtype', count: 2 },
{ _id: 'Camido', count: 2 },
{ _id: 'Meedoo', count: 5 },
{ _id: 'Yodo', count: 2 },
{ _id: 'Ozu', count: 1 },
{ _id: 'Yadel', count: 3 },
{ _id: 'Gigaclub', count: 4 },
{ _id: 'Yacero', count: 1 },
{ _id: 'Thoughtbridge', count: 2 },
{ _id: 'Realbridge', count: 2 },
{ _id: 'Avaveo', count: 3 },
{ _id: 'Babbleblab', count: 4 },
{ _id: 'Browsedrive', count: 1 }
]
This one adds an array of car names in the "items" field:
db.inventory.aggregate([{ $group: { _id: "$source", count: {$sum: 1}, items: { $push: "$name"} }}])
{
_id: 'Babbleblab',
count: 4,
items: [ 'Land Rover', 'Hummer', 'Toyota', 'Oldsmobile' ]
},
{ _id: 'Jetpulse', count: 1, items: [ 'Toyota' ] },
{ _id: 'Browsedrive', count: 1, items: [ 'Hyundai' ] }
You can see the average price as well:
db.inventory.aggregate([{ $group: { _id: "$source", count: {$sum: 1}, items: { $push: "$name"}, avg_price: {$avg: "$price"} }}])
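What the $group stage above computes can be sketched in plain JavaScript. The groupBySource function is hypothetical, and the prices in the sample data are made up for the illustration; only the accumulator logic ($sum: 1, $push, $avg) is the point.

```javascript
// Made-up sample documents; the prices are invented for this sketch.
const inventory = [
  { name: "Land Rover", source: "Babbleblab", price: 30000 },
  { name: "Hummer", source: "Babbleblab", price: 50000 },
  { name: "Toyota", source: "Jetpulse", price: 20000 },
];

// Groups by source and mimics count ($sum: 1), items ($push) and avg_price ($avg).
function groupBySource(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.source)) {
      groups.set(doc.source, { _id: doc.source, count: 0, items: [], total: 0 });
    }
    const group = groups.get(doc.source);
    group.count += 1;
    group.items.push(doc.name);
    group.total += doc.price;
  }
  // Replace the running total with the average before returning.
  return [...groups.values()].map(({ total, ...group }) => ({
    ...group,
    avg_price: total / group.count,
  }));
}

console.log(groupBySource(inventory));
```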
$bucket
Instead of grouping by one value for all documents, you define bucket ranges for a value.
If the values fall into that range, they will be placed into that particular bucket.
This example will place different documents into separate buckets and any other documents that don’t fit into that, will go into the Other category.
Example output:
{ _id: 1980, count: 85 },
{ _id: 1990, count: 340 },
{ _id: 2000, count: 431 },
{ _id: 2010, count: 127 },
{ _id: 'Other', count: 17 }
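How a value lands in a bucket can be sketched as follows. This is plain JavaScript with a hypothetical bucketFor helper; it assumes the same rule $bucket uses, where each lower boundary is inclusive, each upper boundary is exclusive, and anything outside every range goes to the default bucket.

```javascript
// Returns the lower boundary of the matching bucket, or the fallback label.
function bucketFor(value, boundaries, fallback) {
  for (let i = 0; i < boundaries.length - 1; i += 1) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  return fallback;
}

const boundaries = [1980, 1990, 2000, 2010, 2020];
for (const year of [1969, 1985, 2010, 2020]) {
  console.log(year, bucketFor(year, boundaries, "Other"));
}
```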
How to add more context than just count:
db.inventory.aggregate([{$bucket: {groupBy: "$year", boundaries: [1980,1990, 2000, 2010, 2020], default: "Other", output: { count: {$sum: 1}, cars: { $push: {name: "$name", model: "$model"}}}}}])
An example output:
_id: 'Other',
count: 17,
cars: [
{ name: 'Chevrolet', model: 'Camaro' },
{ name: 'Volkswagen', model: 'Beetle' },
{ name: 'Pontiac', model: 'Grand Prix' },
{ name: 'Pontiac', model: 'Grand Prix' },
{ name: 'Pontiac', model: 'Grand Prix' },
{ name: 'Plymouth', model: 'Volare' },
{ name: 'Porsche', model: '914' },
{ name: 'Chevrolet', model: 'Camaro' },
{ name: 'Pontiac', model: 'GTO' },
{ name: 'Chevrolet', model: 'Monte Carlo' },
{ name: 'Chevrolet', model: 'Vega' },
{ name: 'Ford', model: 'Mustang' },
{ name: 'Ford', model: 'Thunderbird' },
{ name: 'Dodge', model: 'Charger' },
{ name: 'Studebaker', model: 'Avanti' },
{ name: 'Austin', model: 'Mini Cooper S' },
{ name: 'Pontiac', model: 'GTO' }
$bucketAuto
Automatically defines the boundaries and distributes documents evenly between the buckets.
An example of $bucketAuto:
db.inventory.aggregate([{$bucketAuto: {groupBy: "$year", buckets: 5 }}])
$unwind
Creates one output document for each array element.
db.inventory.aggregate([{$unwind: "$variations"}])
Shows individual documents for each car variation.
db.inventory.aggregate([{$unwind: "$variations"}, {$match: {"variations.variation": "Purple"}}])
Matches only on the colour purple and only shows purple cars.
To make the query more efficient, you can add a $match stage before $unwind, so that fewer documents are unwound:
db.inventory.aggregate([{$match: {"variations.variation": "Purple"}},{$unwind: "$variations"}, {$match: {"variations.variation": "Purple"}}])
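The effect of $unwind followed by $match can be sketched on plain objects. The unwind helper below is hypothetical and models array fields only (documents without the array are simply dropped); the purple quantity in the sample is made up.

```javascript
// Emits one copy of the document per array element, with the array
// field replaced by that single element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((element) => ({
      ...doc,
      [field]: element,
    }))
  );
}

// Made-up sample car with two variations.
const cars = [
  {
    name: "Nissan",
    variations: [
      { variation: "Green", quantity: 12 },
      { variation: "Purple", quantity: 3 },
    ],
  },
];

const unwound = unwind(cars, "variations");
const purple = unwound.filter((doc) => doc.variations.variation === "Purple");
console.log(unwound.length, purple);
```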
$out
Stores the output of the aggregation pipeline in a new collection.
An example:
db.inventory.aggregate([{$match: {"variations.variation": "Purple"}},{$unwind: "$variations"}, {$match: {"variations.variation": "Purple"}}, {$out: { db: "sample_data", coll: "purple" }}])
Creates a collection containing only purple cars.
db.purple.find({})
$merge
Similar to $out.
Also allows merging results into an existing collection.
A good example of this:
db.inventory.aggregate([{$match: {"variations.variation": "Purple"}},{$unwind: "$variations"}, {$match: {"variations.variation": "Purple"}}, {$merge: { into: "purple", on: "_id", whenMatched: "keepExisting", whenNotMatched: "insert" }}])
$function
Allows writing JavaScript functions that operate on the field values of the documents.
An example:
function custom_agg_expression(actors) {return actors.sort();}
db.movies.aggregate([{$project: {title: 1,actors:{$function: {body: "function(actors) {return actors.sort(); }", args: ["$actors"],lang: "js"}}}}])
Shows each title with its actors sorted alphabetically.
$lookup
For example, we have Orders and Inventory.
It can merge information from both collections, for example by matching the car ID fields from Orders and Inventory.
An example:
db.orders.aggregate([{$lookup:{from: "inventory", localField: "car_id", foreignField: "_id", as: "car_id"}}])
Pulls in information regarding the car ID.
Here is the output:
_id: ObjectId("622fc4ebf464966bf901547c"),
name_id: 'Pamela Franzelini',
price: 17242.17,
credit_card: '3562980000615614',
credit_card_type: 'jcb',
car_id: [
{
_id: '120983921-0',
name: 'Nissan',
model: 'GT-R',
year: 2010,
price: 17642.36,
source: 'Jabbercube',
sale_frequency: 'Daily',
variations: [ { variation: 'Green', quantity: 12 } ]
}
You should create an index on the foreignField.
Performance will be degraded otherwise.
Common query patterns rarely require joins.
If you use a lot of lookups, the data is probably not structured well for MongoDB.
Data you query together should be in the same documents.
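The join that $lookup performs can be sketched on two plain arrays. The lookup helper is hypothetical and handles simple equality on one field only; the second order and its ID are made up to show what happens when nothing matches.

```javascript
// For each document, attach the array of foreign documents whose
// foreignField equals the document's localField (empty if none match).
function lookup(docs, foreignDocs, { localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Sample data; the unmatched order is invented for this sketch.
const orders = [
  { name_id: "Pamela Franzelini", car_id: "120983921-0" },
  { name_id: "Unmatched Order", car_id: "no-such-car" },
];
const inventoryCars = [{ _id: "120983921-0", name: "Nissan", model: "GT-R" }];

const joined = lookup(orders, inventoryCars, {
  localField: "car_id",
  foreignField: "_id",
  as: "car_id",
});
console.log(JSON.stringify(joined, null, 2));
```

The filter call scans every inventory document for every order, which is the cost an index on the foreignField avoids.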
Performance
Aggregation pipelines generally require more RAM and CPU than CRUD operations.
This matters most when the data is large, the pipeline runs frequently, or the operation needs to be fast.
Example query:
db.movies.explain("executionStats").aggregate([{$project: {release_year: {$year: "$release_year"},title: 1}},{$lookup: {from: "inventory", localField: "release_year", foreignField: "year", as: "year"}}])
The query planner trials candidate plans, and the first plan to reach a certain threshold of results becomes the winning plan.
Collection scan: checks every document in a collection.
With 1000 movies and 1000 cars, the database has to check everything.
Total Keys Examined: If 0, then no index was used.
Collection Scans
You can see slow queries in the MongoDB logs and why they were slow.
Just because a query does not show up in the mongod logs doesn’t necessarily mean that it is fast.
By default, only operations slower than 100 ms are logged.
Use the native profiler:
db.setProfilingLevel(1, {slowms: 20})
Finds operations that are slower than a given number of milliseconds.
Profiling with a low slowms value can slow down the deployment.
Common operations:
$sort + $limit
Placing $limit right after $sort makes it possible to perform sort operations faster.
$project as the final stage.
It is not better to project early in the pipeline.
The optimiser will do that for you.
Hinting
Can tell an aggregation pipeline to use a specific index, if it is using a less optimal one by default.
db.collection.aggregate(pipeline, {hint: “index_name”})
Analytics nodes
Useful if you run a lot of aggregations and they are impacting performance.
You can run multiple analytics secondary nodes to keep aggregations from affecting the performance of the main workload.
If you notice an aggregation pipeline slowing down MongoDB, you may need to kill the operation.
db.currentOp(true)
db.adminCommand({"killOp":1, "op": OP_NUMBER})
Can use a specific Index if using an op
Relational vs. Document Models
Relational DBs are structured in tables.
When querying data, you join multiple tables together to gather the information.
Storing JSON documents:
Relational databases store JSON in blobs.
MongoDB stores JSON documents natively.
Relational DBs - good performance, but that’s about it.
Document - optimal performance.
Stores all JSON documents as they are.
BSON makes operations on documents much more efficient.
Storing tables
Data about one entity that relates to another is combined via joins.
MongoDB can perform joins using $lookup:
Example:
db.orders.aggregate([{
$lookup: {
from: "inventory",
localField: "car_id",
foreignField: "_id",
as: "car_id"
}
}])
Relational models are optimised for working with tables and joins.
MongoDB is not primarily optimised for working with tables and joins.
Store data in the data model that works best for each database.
Which one is better for querying?
Can query MongoDB with SQL.
Consider:
Performance.
Ease of development
Learning time
Scale
Joins only perform quickly if the tables are located close to each other.
It is more performant to scale horizontally with sharding.
Data modelling
Data commonly queried together should live close together.
16 MB document limit.
Aggregation pipeline processing limits.
Performance is better with small documents.
One-to-one relationship:
Person + DOB for example.
Store them together, unless the document gets too big or the information is rarely used.
One-to-many relationship.
One to few - store both in one collection using nesting or arrays.
Authors and Books for example
One to many - store them separately with links.
Many-to-many relationship
Don’t embed the information.
Store IDs in either one or both collections.
Keep arrays small: fewer than 100 elements.
Documents should not be large or flat.
Flexible Schema
A schema defines the structure and contents of the data in a collection.
Documents contain the fields marked as "required".
Document field values conform to the specified data types.
Example:
validator: {$jsonSchema: {
bsonType: "object",
required: [ "name", "message"],
properties: {
name: { bsonType: "string",
description: "must be a string and is required"},
message: { bsonType: "string",
description: "must be a string and is required"}
}}}
Schema validation is not required:
Documents should still have a common structure.
To enforce a schema:
Here is an example:
db.runCommand({ collMod: "movies", validator: { $jsonSchema: { bsonType: "object", required: ["title", "director"], properties: { title: { bsonType: "string" }, director: {bsonType: "string" }}}}, validationLevel: "moderate" })
Then if you try to insert a new document, the validator checks the document and either accepts or rejects it.
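The accept-or-reject decision can be sketched for the two rules used above: required fields and string types. The validate function is hypothetical and far smaller than $jsonSchema; it only illustrates the kind of check the server performs.

```javascript
// Returns a list of problems; an empty list means the document is accepted.
function validate(doc, schema) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in doc)) errors.push(`${field} is required`);
  }
  for (const [field, rule] of Object.entries(schema.properties ?? {})) {
    if (field in doc && rule.bsonType === "string" && typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  return errors;
}

// Same shape as the movies validator above.
const movieSchema = {
  required: ["title", "director"],
  properties: { title: { bsonType: "string" }, director: { bsonType: "string" } },
};

console.log(validate({ title: "Example", director: "Someone" }, movieSchema));
console.log(validate({ title: 42 }, movieSchema));
```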
MongoDB Drivers
You can interface with MongoDB from application code through drivers, for example from Python.