mongodb WARNING!

  • [+] it is 100% open source 😀
  • [?] currently licensed under the Server Side Public License (SSPL) (Wikipedia)
  • [?] 3-letter agencies might have been involved in the initial funding of mongodb (which might or might not be a problem) (Wikipedia)
  • [-] documentation chaos 🙁 also the cli program “mongo” was renamed to “mongosh”, which means A LOT of howtos and documentation still use the old name
  • [-] support is not reachable, and the forum does not show a lot of activity 🙁

interesting examples can be found here: https://www.mongodb.com/docs/drivers/rust/current/fundamentals/performance/

use mongodb::{bson::doc, Client};

let client = Client::with_uri_str("").await?;
let data = doc! { "title": "1984", "author": "George Orwell" };

mongodb itself is written in C++, but most of those examples are written in Rust? interesting 😀

dwaves is now also using codeberg.org instead of github (hopefully it will never be sold) for uploading open source code, hoping for good cooperation with the open source community 🙂

https://codeberg.org/dwaves/PYTHON_EXAMPLES_ONLINE/

# this test was performed inside a debian13 kvm/qemu vm running on the above hardware
su - root
# install mongodb according to latest howto

apt -y install gnupg curl; # this was the howto for installing mongodb as of 2025-03
curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | gpg -o /usr/share/keyrings/mongodb-server-8.0.gpg --dearmor;
echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] http://repo.mongodb.org/apt/debian bookworm/mongodb-org/8.0 main" | tee /etc/apt/sources.list.d/mongodb-org-8.0.list;
apt update;
apt install -y mongodb-org;
systemctl list-unit-files --type=service|grep mongo;
mongod.service disabled enabled
systemctl start mongod.service;
systemctl status mongod.service;

# get the benchmark scripts
mkdir /scripts
cd /scripts
apt -y install python3-pymongo; # install requirements
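
# a quick way to verify that pymongo can reach the freshly installed server before benchmarking
# (a minimal sketch, assuming the default localhost:27017 and no authentication)
python3 - <<'EOF'
from pymongo import MongoClient
# connect to the local mongod (assumption: default port 27017, no auth)
client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=3000)
client.admin.command("ping")  # raises an exception if the server is unreachable
print("mongodb server version:", client.server_info()["version"])
print("databases:", client.list_database_names())
EOF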

# single threaded Lord of the Rings based benchmark
apt -y install python3-tqdm python3-pymongo; # install requirements
### WARNING! have sufficient disk space available!
# 100mio records take ~24Gbytes

## install_requirements:
apt -y install python3-tqdm python3-pymongo
git clone https://codeberg.org/dwaves/PYTHON_EXAMPLES_ONLINE.git

## usage:
cd PYTHON_EXAMPLES_ONLINE
time python3 ./benchmark_mongodb_singlethread_LOTR_10mio.py; # test with 10mio records
time python3 ./benchmark_mongodb_singlethread_LOTR_10mio.py 100_000_000; # yes in python numbers can be written like this, test with 100mio records

# that's what it looks like
# if the user is running
# terminal1: iostat -cdx 1
# terminal2: htop
# (newer htop versions have an I/O tab, hit tab to switch)
# terminal3: while true; do mongosh --eval "show dbs"; sleep 1; clear; done
# next: multi threaded version
# answering the question: how is performance with a lot of parallel read, write, update access?
# which means: python starts 2x 10x 100x threads that all do insert_one at the same time
# observation: more threads = write performance suffers, but read performance suffers even more
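
this is roughly what such a multi-threaded insert_one benchmark looks like internally (a simplified sketch, NOT the actual benchmark_mongodb_multithread_singledatabase script; database, collection and field names here are illustrative):

# simplified sketch of a multi-threaded insert_one benchmark (illustrative only)
import sys, time
from concurrent.futures import ThreadPoolExecutor
from pymongo import MongoClient

RECORDS = int(sys.argv[1]) if len(sys.argv) > 1 else 100_000  # total documents
THREADS = int(sys.argv[2]) if len(sys.argv) > 2 else 10       # parallel workers

client = MongoClient("mongodb://localhost:27017")  # MongoClient is thread safe, share one instance
collection = client["benchmark2025"]["document"]

def worker(count):
    # every worker inserts its share of the documents one by one
    for i in range(count):
        collection.insert_one({"title": "The Lord of the Rings", "page": i})

start = time.time()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    for _ in range(THREADS):
        pool.submit(worker, RECORDS // THREADS)
# leaving the "with" block waits for all workers to finish
print(f"insert rate: {RECORDS / (time.time() - start):.2f} per second")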

# some results
neofetch
       _,met$$$$$gg.          user@debian12DevPython 
    ,g$$$$$$$$$$$$$$$P.       ---------------------- 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux 12 (bookworm) x86_64 
 ,$$P'              `$$$.     Host: KVM/QEMU (Standard PC (Q35 + ICH9, 2009) pc-q35-7.2) 
',$$P       ,ggs.     `$$b:   Kernel: 6.1.0-28-amd64 
 $$:      $$.   -    ,d$$'    Shell: bash 5.2.15 
          `"Y$b._             CPU: 10x Cores assigned to kvm vm: AMD Ryzen 5 5600G with Radeon Graphics (8) @ 3.928GHz 
                              Memory: 2685MiB / 7843MiB 

### basic system info ###
python version:  3.11.2
kernel version:  Linux 6.1.0-28-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.119-1 (2024-11-22)
mongodb version: db version v8.0.5

# run mongodb benchmark, with 100000x records 100x threads
time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 100; # (more threads than cores)

##### results: #####
performance_insert:  6934.01 per second
performance_read:    63.27 per second
performance_change:  69.29 per second

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 10; # (threads matches cores count)

##### results: #####
performance_insert:  6993.00 per second
performance_read:    634.00 per second
performance_change:  675.20 per second

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 2; # (fewer threads than cores)

##### results: #####
performance_insert:  9660.00 per second
performance_read:    4254.00 per second
performance_change:  4650.00 per second

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 2; # best performance yet

##### results: #####
performance_insert:  9793.00 per second
performance_read:    4285.50 per second
performance_change:  4617.50 per second

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 1; # single thread slower

##### results: #####
performance_insert:  5852.00 per second
performance_read:    5439.00 per second
performance_change:  4974.00 per second

neofetch 
            .-/+oossssoo+/-.               root@ubuntu2204
        `:+ssssssssssssssssss+:`           -------------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 22.04.4 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: VMware Virtual Platform None 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 5.15.0-133-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 7 days, 11 hours, 54 mins 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   2x cores CPU: Intel Xeon E5-2690 v2 (2) @ 2.999GHz 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Memory: 2144MiB / 7937MiB 

##### benchmark_mongodb_multithread_singledatabase_v1.py #####
### basic system info ###
python version:  3.10.12
kernel version:  Linux 5.15.0-133-generic #144-Ubuntu SMP Fri Feb 7 20:47:38 UTC 2025
mongodb version: db version v6.0.15

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 10

##### results: #####
performance_insert:  2756.60 per second
performance_read:    248.70 per second
performance_change:  251.70 per second

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 100

##### results: #####
performance_insert:  2353.99 per second
performance_read:    22.41 per second
performance_change:  23.54 per second

time python3 /scripts/benchmark_mongodb_multithread_singledatabase_v1-2.py 100000 2

##### results: #####
performance_insert:  3479.50 per second
performance_read:    1548.50 per second
performance_change:  1610.00 per second

also check out the single threaded version:

# single threaded version (can be started multiple times as a background job with &)
wget https://dwaves.de/scripts/bench/benchmark_mongodb3.1.py.txt
mv benchmark_mongodb3.1.py.txt benchmark_mongodb3.1.py

time /usr/bin/python3 /scripts/benchmark_mongodb3.1.py; # run benchmark

btw, in order to drop a database with a bash oneliner:

mongosh databasename --eval "db.dropDatabase()"
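
the same can be done from python/pymongo (sketch; "databasename" is a placeholder):

from pymongo import MongoClient
MongoClient("mongodb://localhost:27017").drop_database("databasename")  # drops the whole database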

virtual environment fail: even after installing with pip or pip3, python could not find pymongo 🙁 (likely because /usr/bin/python3 was called directly below, which bypasses the activated venv; inside the venv, plain python3 resolves to myenv/bin/python3)

apt install python3.13-venv; # install requirements
python3 -m venv myenv; # create
source myenv/bin/activate; # activate
pip3 install pymongo; # install pip packages like pymongo

/usr/bin/python3; # <- calling the system python directly bypasses the venv
from pymongo import MongoClient
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    from pymongo import MongoClient
ModuleNotFoundError: No module named 'pymongo'

observations:

  • RAM is not even 50% used
  • CPU cores are mostly not 100% in use
  • low disk activity according to iotop ~ 1.5Mbytes/sec
  • fewer threads = faster?

hardware upgrades: while better single-thread CPU performance would definitely help, more RAM, more CPU cores or a faster disk will not help in this scenario…

the big question: how to optimize mongodb for performance?

sharding?

currently testing this; it is not easy to set up, especially because of the documentation chaos (sorry), but this video was useful

here is a script that automatically sets up 2x shards (each a replica set with 2x members); the endpoint for pymongo to connect to is: localhost:26061

############ 4th attempt: how to set up mongodb + replSet + sharding, only 2x shards!
### vmname: debian-13-mongodb-xfs-all-in-one
# attach a 2nd disk (as fast as possible) to the vm (vdb)

# format + mount vdb
mkdir /media/vdb;
mkfs.xfs -L vdb /dev/vdb;
mount /dev/vdb /media/vdb;

PATH_TO_MONGO1="/media/vda/mongo1"; # spreading shards over 2x disks
mkdir -p $PATH_TO_MONGO1
PATH_TO_MONGO2="/media/vdb/mongo2";
mkdir -p $PATH_TO_MONGO2; # creating dirs that hold everything for this test system

mkdir -v $PATH_TO_MONGO1/cfg1; # where config servers store data
mkdir -v $PATH_TO_MONGO2/cfg2
mkdir -v $PATH_TO_MONGO1/a1 $PATH_TO_MONGO1/a2
mkdir -v $PATH_TO_MONGO2/b1 $PATH_TO_MONGO2/b2; # 2x shards, each on a separate disk

mongod --configsvr --dbpath $PATH_TO_MONGO1/cfg1 --port 26051 --fork --logpath /var/log/log.cfg1 --replSet cfg; # starting 1st config server
mongod --configsvr --dbpath $PATH_TO_MONGO2/cfg2 --port 26052 --fork --logpath /var/log/log.cfg2 --replSet cfg; # starting 2nd config server, will replicate the 1st

mongosh --port 26051 --eval 'rs.initiate()';
mongosh --port 26051 --eval 'rs.add("localhost:26052")';

mongod --shardsvr --replSet a --dbpath $PATH_TO_MONGO1/a1 --port 26001 --fork --logpath /var/log/log.a1; # start 1st member of replica set a (1st shard)
mongod --shardsvr --replSet a --dbpath $PATH_TO_MONGO1/a2 --port 26002 --fork --logpath /var/log/log.a2;

mongod --shardsvr --replSet b --dbpath $PATH_TO_MONGO2/b1 --port 26101 --fork --logpath /var/log/log.b1; # start 1st member of replica set b (2nd shard)
mongod --shardsvr --replSet b --dbpath $PATH_TO_MONGO2/b2 --port 26102 --fork --logpath /var/log/log.b2;

# initiate replication
mongosh --port 26001 --eval 'rs.initiate()';
mongosh --port 26001 --eval 'rs.add("localhost:26002")';

mongosh --port 26101 --eval 'rs.initiate()';
mongosh --port 26101 --eval 'rs.add("localhost:26102")';

mongos --configdb "cfg/localhost:26051,localhost:26052" --fork --logpath /var/log/log.mongos1 --port 26061

mongosh --port 26061 --eval 'sh.addShard("a/localhost:26001")';
mongosh --port 26061 --eval 'sh.addShard("b/localhost:26101")';
mongosh --port 26061 --eval 'sh.status()';

# enable sharding
mongosh --port 26061 --eval 'use benchmark2025';
mongosh --port 26061 --eval 'show dbs';
mongosh --port 26061 --eval 'sh.enableSharding("benchmark2025")';
mongosh --port 26061 --eval 'sh.shardCollection("benchmark2025.document",{_id:1})';

mongosh --port 26061
use config;
db.shards.find();

chown -R mongodb:mongodb $PATH_TO_MONGO1;
chown -R mongodb:mongodb $PATH_TO_MONGO2;

# startup order
# 1. config servers (mongod instances)
# 2. shards (mongod instances)
# 3. mongos routers (mongos instances)
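
once the router is up, point pymongo at the mongos endpoint (localhost:26061 as started above) instead of a single mongod; a minimal sketch:

python3 - <<'EOF'
from pymongo import MongoClient
# connect through the mongos router, not a single mongod
client = MongoClient("mongodb://localhost:26061")
collection = client["benchmark2025"]["document"]
collection.insert_one({"title": "1984", "author": "George Orwell"})
# list the shards the cluster knows about
for shard in client.admin.command("listShards")["shards"]:
    print(shard["_id"], shard["host"])
EOF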

ChatGPT?

question: how to tune mongodb for performance? (maybe even faster parallelism?)

answer: WARNING! SOME PARAMETERS MIGHT NOT EVEN EXIST (ChatGPT likes to make up imaginary parameters and settings)

From your observations, it looks like MongoDB is not resource-starved. Your system has plenty of RAM, CPU usage is moderate, and disk I/O is not maxed out. The fact that reducing the number of threads seems to improve performance suggests that thread contention, locking, or some form of internal overhead is at play. Let’s break this down and explore performance tuning.


🔍 Possible Performance Bottlenecks

  1. Thread contention & parallelism overhead
    • If fewer threads are performing better, it might indicate lock contention or scheduling inefficiencies.
    • Too many concurrent queries can overwhelm MongoDB’s internal resource management.
  2. Query Execution Efficiency
    • Are the queries well-indexed?
    • Are they using efficient execution plans? (explain("executionStats"))
  3. Write Concern & Read Preference
    • High write concern (majority) can slow writes.
    • If too many reads are hitting primary instead of secondaries, that can be an issue.
  4. Connection Pooling & Network Latency
    • Too many open connections can cause bottlenecks.
    • Use db.serverStatus().connections to check current usage.
  5. Storage Engine & Data Layout
    • WiredTiger caches efficiently, but inefficient indexing or large documents might cause slowdowns.
    • Check wiredTiger.cache in db.serverStatus() to see cache utilization.
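
for reference, checking points 4 and 5 from python instead of mongosh (a small sketch; serverStatus field names can differ between mongodb versions):

from pymongo import MongoClient

# serverStatus is the same command mongosh's db.serverStatus() wraps
status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")
print("connections:", status["connections"]["current"], "current,",
      status["connections"]["available"], "available")
cache = status["wiredTiger"]["cache"]
print("cache bytes in use:    ", cache["bytes currently in the cache"])
print("cache bytes configured:", cache["maximum bytes configured"])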

Performance Tuning for MongoDB

1️⃣ Parallelism & Threading

  • Reduce number of worker threads if too much parallelism causes slowdowns.
  • Check wiredTiger.concurrentWriteTransactions and wiredTiger.concurrentReadTransactions in db.serverStatus().
  • Adjust the wt_max_concurrent_transactions parameter.

2️⃣ Index Optimization

  • Run db.collection.getIndexes() to check if queries use the correct indexes.
  • Use explain("executionStats") on slow queries to ensure proper index usage.
  • Avoid multi-key indexes when possible, as they can slow down writes.
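
a pymongo sketch of creating an index and inspecting the query plan (collection and field names are illustrative):

from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["benchmark2025"]["document"]
collection.create_index([("title", 1)])        # single-field ascending index on "title"
print(collection.index_information())          # pymongo's equivalent of getIndexes()
plan = collection.find({"title": "1984"}).explain()
print(plan["queryPlanner"]["winningPlan"])     # should use the index instead of a full collection scan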

3️⃣ Improve Write Performance

  • Batch inserts instead of single document writes.
  • Use WriteConcern(0 or 1) if data durability isn’t critical.
  • Turn on journaling for safer writes (storage.journal.enabled).
  • Tune the commitIntervalMs setting in WiredTiger.
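
a pymongo sketch of batched inserts with a relaxed write concern (w=0 means no acknowledgement, so data can be lost on a crash; database/collection names are illustrative):

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

db = MongoClient("mongodb://localhost:27017")["benchmark2025"]
# w=0: fire-and-forget writes, faster but not durable
collection = db.get_collection("document", write_concern=WriteConcern(w=0))
batch = [{"title": "1984", "n": i} for i in range(10_000)]
collection.insert_many(batch, ordered=False)   # one batch instead of 10000x insert_one round trips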

4️⃣ Cache & Memory Usage

  • MongoDB automatically uses free RAM for caching.
  • Check wiredTiger.cache["bytes currently in the cache"] to see cache usage.
  • If working set fits in RAM, increasing cache size (storage.wiredTiger.engineConfig.cacheSizeGB) might help.

5️⃣ Storage & Disk I/O

  • If using an SSD, ensure XFS filesystem (best for WiredTiger).
  • Use compression like snappy to reduce I/O if CPU is underutilized.
  • Disable Read-Ahead for better random access performance.

🔥 Final Thoughts

  • Your bottleneck is likely thread contention rather than hardware limitations.
  • Tuning parallelism (reducing excessive threading) seems to help.
  • Check indexes & query execution plans for inefficiencies.
  • Adjust write concern, connection pooling, and cache settings for further performance gains.

Want me to analyze a specific workload or explain() output? 🚀

tidy up

delete all mongodb databases whose name starts with “benchmark”

mongosh --eval '
db.adminCommand("listDatabases").databases.forEach(function(database) {
    if (database.name.startsWith("benchmark")) {
        print("Dropping: " + database.name);
        db.getSiblingDB(database.name).dropDatabase();
    }
})'
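
the same cleanup from python/pymongo (sketch):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
for name in client.list_database_names():
    if name.startswith("benchmark"):
        print("Dropping:", name)
        client.drop_database(name)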

how to monitor database disk usage:

while true; do mongosh --eval "show dbs"; sleep 3; clear; done;

monitor harddisk activity:

GNU Linux – basic harddisk benchmarks with fio, dd and python (many small files)

mongo politics

“On March 10, 2022, MongoDB warned its users in Russia and Belarus that their data stored on the MongoDB Atlas platform will be destroyed as a result of American sanctions related to the Russo-Ukrainian War.[46]” (Wiki)

 
