Learning and tips from performance assessment of duckdb in mariadb

Why this bog

I thought of sharing what I leanred from recent perforamance/benchmark assessment of duckdb storage engine for mariadb

Setup

First of all, if you want setup duckdb for mariadb , please refer 2 good blogs:

1.https://mariadb.org/duckdb-storage-engine-for-mariadb-when-the-sea-lion-learns-to-quack/ - In case, you want to build from source and install.

Note: In case you find issue related to libcrypt you can try this: sed -i 's/libxcrypt-dev/libcrypt-dev/g' /test-data/clones/mdb-server/storage/duckdb/duckdb/build.sh
(modify path as per you cloned path)

    USER="mysql"
    GROUP="mysql"
    DEFAULT_MDB_DATADIR=/var/lib/mysql/
    INSTALL_PREFIX=/usr
    sudo chown "$USER:$GROUP" /run/mysqld
    sudo "$INSTALL_PREFIX/bin/mariadbd-safe" --datadir="$DEFAULT_MDB_DATADIR" &
( above , in case you want to shutdown and start mariadb again)

2. https://mariadb.org/mariadb-duckdb-a-new-playground-for-analytics-a-first-look-at-the-new-storage-engine/  - This in case, you want to use dbdeployer to do setup of duckdb for mariadb.

Performance Benchmark

Refer Roman's very good presentation video: https://www.youtube.com/watch?v=PVWY_hES0tE
to know benchmark numbers.


How to get and set configuration in duckdb for mariadb

 show global variables like 'duc%'; or

SELECT run_in_duckdb('SELECT name,value FROM duckdb_settings()');

You can set dynamically also the value like this:
SELECT run_in_duckdb('PRAGMA threads = ''16''');

Monitoring duckdb

1. show global status like '%duck% will show details about txn  happening inside duckdb storage engine 'Duckdb_commit' etc . This will be helpful , in case you are doing DML operations

on tables with duckdb storage

2. In case you are running OLAP query and you want to know 'Buffer Manager Caching',

you may want to monitor duckdb memory and you can use the following command to do iSELECT run_in_duckdb('FROM duckdb_memory()')

Sample output:

tag memory_usage_bytes temporary_storage_bytes VARCHAR BIGINT BIGINT [ Rows: 16] BASE_TABLE 14942208 0 HASH_TABLE 0 0 PARQUET_READER 0 0 CSV_READER 0 0 ORDER_BY 0 0 ART_INDEX 0 0 COLUMN_DATA 0 0 METADATA 0 0 OVERFLOW_STRINGS 0 0 IN_MEMORY_TABLE 0 0 ALLOCATOR 0 0 EXTENSION 0 0 TRANSACTION 0 0 EXTERNAL_FILE_CACHE 0 0 WINDOW 0 0 OBJECT_CACHE 0 0

You can refer: https://duckdb.org/2024/07/09/memory-management


3. Monitoring Files:
    You need to monitor basically 3 files
    a) duckdb data file: duckdb.db   
    b) wal file: duckdb.db.wal 
    c) temporary space in case spilling of memory data to disk
       
   SELECT run_in_duckdb('SELECT value FROM duckdb_settings() WHERE name = ''temp_directory''');
       value\t\nVARCHAR\t\n[ Rows: 1]\n/mnt/nvme2/duckdb-tmp\n\n

4. In case you see RSS ( resident memory of mariadb ) keeps increasing and you want to do memory profiling then heaptrack is great tool to dump, analyze and it has heaptrack_gui to see flamegraph of memory
#install heaptrack  sudo apt install heaptrack
# export HEAPTRACK_ENABLE_DEBUGINFOD=1 
#attach to running mariadb 'heaptrack -p `pid of mariadb`
press Ctrl+C to stop
#heaptrack --analyze <dump file> or
#heaptrack_gui <dump file>  on you laptop

Sample output:

heaptrack


Reference

https://duckdb.org/2024/07/09/memory-management
https://duckdb.org/docs/current/guides/performance/how_to_tune_workloads
https://duckdb.org/docs/current/operations_manual/limits
https://duckdb.org/docs/lts/core_extensions/tpch ( In case you want to run benchmark CLI with tpch data)
https://github.com/drrtuy/mdb-server/tree/bb-11.4-duckdb-jemalloc
https://github.com/MariaDB/duckdb-engine
https://datafusion.apache.org/blog/2025/04/10/fastest-tpch-generator/ ( tpch data generator)
https://github.com/ClickHouse/ClickBench/tree/main
https://www.alibabacloud.com/blog/duckdb-internals---part-5-the-transaction-lifecycle_602860 
https://apps.kde.org/heaptrack/
https://github.com/kde/heaptrack


Comments

Popular posts from this blog

Threadpool Performance Assessment - MySQL 9.7.0 vs Mariadb 13.0.1

MySQL- Crash Recovery Performance Assessment

MySQL performance with Kubernetes: bare metal vs K8 with docker vs K8 with container