Why this bog:
I thought of sharing what I leanred from recent perforamance/benchmark assessment of duckdb storage engine for mariadb
Setup
First of all, if you want setup duckdb for mariadb , please refer 2 good blogs:
1.https://mariadb.org/duckdb-storage-engine-for-mariadb-when-the-sea-lion-learns-to-quack/ - In case, you want to build from source and install.
Note: In case you find issue related to libcrypt you can try this: sed -i 's/libxcrypt-dev/libcrypt-dev/g' /test-data/clones/mdb-server/storage/duckdb/duckdb/build.sh
(modify path as per you cloned path)
USER="mysql"
GROUP="mysql"
DEFAULT_MDB_DATADIR=/var/lib/mysql/
INSTALL_PREFIX=/usr
sudo chown "$USER:$GROUP" /run/mysqld
sudo "$INSTALL_PREFIX/bin/mariadbd-safe" --datadir="$DEFAULT_MDB_DATADIR" &
( above , in case you want to shutdown and start mariadb again)
2. https://mariadb.org/mariadb-duckdb-a-new-playground-for-analytics-a-first-look-at-the-new-storage-engine/ - This in case, you want to use dbdeployer to do setup of duckdb for mariadb.
Performance Benchmark
to know benchmark numbers.
How to get and set configuration in duckdb for mariadb
show global variables like 'duc%'; or
SELECT run_in_duckdb('SELECT name,value FROM duckdb_settings()');
You can set dynamically also the value like this:
SELECT run_in_duckdb('PRAGMA threads = ''16''');
Monitoring duckdb
1. show global status like '%duck% will show details about txn happening inside duckdb storage engine 'Duckdb_commit' etc . This will be helpful , in case you are doing DML operations
on tables with duckdb storage
2. In case you are running OLAP query and you want to know 'Buffer Manager Caching',
you may want to monitor duckdb memory and you can use the following command to do iSELECT run_in_duckdb('FROM duckdb_memory()')
Sample output:
tag memory_usage_bytes temporary_storage_bytes
VARCHAR BIGINT BIGINT
[ Rows: 16]
BASE_TABLE 14942208 0
HASH_TABLE 0 0
PARQUET_READER 0 0
CSV_READER 0 0
ORDER_BY 0 0
ART_INDEX 0 0
COLUMN_DATA 0 0
METADATA 0 0
OVERFLOW_STRINGS 0 0
IN_MEMORY_TABLE 0 0
ALLOCATOR 0 0
EXTENSION 0 0
TRANSACTION 0 0
EXTERNAL_FILE_CACHE 0 0
WINDOW 0 0
OBJECT_CACHE 0 0
You can refer: https://duckdb.org/2024/07/09/memory-management
3. Monitoring Files:
You need to monitor basically 3 files
a) duckdb data file: duckdb.db
b) wal file: duckdb.db.wal
c) temporary space in case spilling of memory data to disk
SELECT run_in_duckdb('SELECT value FROM duckdb_settings() WHERE name = ''temp_directory''');
value\t\nVARCHAR\t\n[ Rows: 1]\n/mnt/nvme2/duckdb-tmp\n\n
4. In case you see RSS ( resident memory of mariadb ) keeps increasing and you want to do memory profiling then heaptrack is great tool to dump, analyze and it has heaptrack_gui to see flamegraph of memory
#install heaptrack sudo apt install heaptrack
# export HEAPTRACK_ENABLE_DEBUGINFOD=1
#attach to running mariadb 'heaptrack -p `pid of mariadb`
press Ctrl+C to stop
#heaptrack --analyze <dump file> or
#heaptrack_gui <dump file> on you laptop
Sample output:
Reference
https://duckdb.org/2024/07/09/memory-management
https://duckdb.org/docs/current/guides/performance/how_to_tune_workloads
https://duckdb.org/docs/current/operations_manual/limits
https://duckdb.org/docs/lts/core_extensions/tpch ( In case you want to run benchmark CLI with tpch data)
https://github.com/drrtuy/mdb-server/tree/bb-11.4-duckdb-jemalloc
https://github.com/MariaDB/duckdb-engine
https://datafusion.apache.org/blog/2025/04/10/fastest-tpch-generator/ ( tpch data generator)
https://github.com/ClickHouse/ClickBench/tree/main
https://www.alibabacloud.com/blog/duckdb-internals---part-5-the-transaction-lifecycle_602860
https://apps.kde.org/heaptrack/
https://github.com/kde/heaptrack
Comments
Post a Comment