
Fix CI Tests - Decrease mdb map size for large topologies #6531

Closed
progier389 opened this issue Jan 22, 2025 · 4 comments · Fixed by #6532
Labels
needs triage The issue will be triaged during scrum

Comments

@progier389
Contributor

Issue Description
The GitHub CI test shows a random failure that does not reproduce on our test VM.
The test fails because some instances get unexpectedly stopped (sometimes 1 instance out of the 4, sometimes 2).
It happens while the instances are idle, after a test has finished successfully and before the next test starts.
At this point I suspect that the system kills the process (kill -9) because of OOM (the alternative hypothesis would be a crash after deleting a backend, but I doubt it).
I propose to decrease the mdb map size to 10 GB when using a topology with more than 3 instances.
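
To visualize the proposal, here is a minimal sketch of lowering the LMDB map size on a single instance over LDAP. It assumes the map size is controlled by the nsslapd-mdb-max-size attribute on the mdb backend config entry (cn=mdb,cn=config,cn=ldbm database,cn=plugins,cn=config) and that a byte value is accepted; neither the attribute name nor the DN is taken from this issue, and the actual change in the fix may differ.

```python
# Minimal sketch, not the actual fix: lower the LMDB map size to 10 GiB on
# one instance over LDAP.  The config DN and the nsslapd-mdb-max-size
# attribute are assumptions, not confirmed by this issue.
import ldap

MDB_CONFIG_DN = "cn=mdb,cn=config,cn=ldbm database,cn=plugins,cn=config"  # assumed DN
TEN_GB = str(10 * 1024 ** 3)  # 10 GiB in bytes; the exact unit used by the fix is an assumption

def shrink_mdb_map_size(uri, binddn, password):
    """Replace the LMDB map size on the instance reachable at `uri`."""
    conn = ldap.initialize(uri)
    conn.simple_bind_s(binddn, password)
    conn.modify_s(
        MDB_CONFIG_DN,
        [(ldap.MOD_REPLACE, "nsslapd-mdb-max-size", [TEN_GB.encode()])],
    )
    conn.unbind_s()
    # The new map size normally takes effect only after the instance
    # restarts; the restart is left out of this sketch.
```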

**Package Version and Platform:**

  • Platform: Fedora
  • Package and version: main

Steps to Reproduce
Steps to reproduce the behavior:

  1. See the CI test results and look for errors in replication/acceptance test_new_suffix

Expected results
The test should be stable

@tbordaz
Contributor

tbordaz commented Jan 22, 2025

Reducing the memory footprint of the instances looks good to avoid false negatives (OOM). Note that I also observed transient failures (replication/acceptance) with BDB.

@progier389
Contributor Author

I have also seen a BDB failure with the same signature, so I wonder whether there is another problem.
I just want to see whether decreasing the VM memory footprint helps or not.
So far the PR tests are fully green, while they were almost always failing before ...

progier389 added a commit that referenced this issue Jan 23, 2025 (#6532)

Decrease mdb map size to 10 GB when using topologies whose number of instances is > 3

Issue: #6531

Reviewed by: @tbordaz (Thanks!)
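
For illustration only, the gating described in this commit could look like the sketch below: apply the smaller map size only when the topology holds more than 3 instances. The `instances` iterable and the `set_mdb_max_size()` callable are hypothetical stand-ins, not the lib389 API actually touched by the fix.

```python
# Hedged sketch of the size-based gating; names are illustrative,
# not the real fixture code.
LARGE_TOPOLOGY_THRESHOLD = 3        # topologies above this size get the smaller map
SMALL_MAP_SIZE = 10 * 1024 ** 3     # assumed 10 GB-ish value in bytes

def tune_topology_map_size(instances, set_mdb_max_size):
    """Shrink the LMDB map size on every instance of a large topology."""
    instances = list(instances)
    if len(instances) > LARGE_TOPOLOGY_THRESHOLD:
        for inst in instances:
            set_mdb_max_size(inst, SMALL_MAP_SIZE)
```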
@vashirov
Member

There is a core generated in
https://github.com/389ds/389-ds-base/actions/runs/12927667100/job/36053887628#step:7:1072
It's present under assets/cores in https://github.com/389ds/389-ds-base/actions/runs/12927667100/artifacts/2474328185
and can be used with the rpms from this build.

(gdb) bt
#0  ___pthread_mutex_lock (mutex=mutex@entry=0x7373616c4374636d) at pthread_mutex_lock.c:80
#1  0x00007f0fb2788796 in PR_EnterMonitor (mon=0x7373616c43746365) at ../../../../nspr/pr/src/pthreads/ptsynch.c:563
#2  0x00007f0fae3ad5c7 in replica_lock (lock=<optimized out>) at ldap/servers/plugins/replication/repl5_replica.c:109
#3  replica_relinquish_exclusive_access (r=0x7f0c2ab08140, connid=0, opid=-1) at ldap/servers/plugins/replication/repl5_replica.c:676
#4  0x00007f0fae393475 in consumer_connection_extension_destructor (ext=<optimized out>, object=<optimized out>, parent=<optimized out>)
    at ldap/servers/plugins/replication/repl_connext.c:91
#5  0x00007f0fb3161ad3 in factory_destroy_extension (type=<optimized out>, object=0x7f0c4b601478, parent=0x0, extension=0x7f0c4b6015b0)
    at ldap/servers/slapd/factory.c:366
#6  factory_destroy_extension (type=<optimized out>, object=0x7f0c4b601478, parent=0x0, extension=0x7f0c4b6015b0)
    at ldap/servers/slapd/factory.c:348
#7  0x000055c33b2d676c in connection_cleanup (conn=0x7f0c4b601478) at ldap/servers/slapd/connection.c:181
#8  0x000055c33b2ea17c in connection_table_move_connection_out_of_active_list.isra.0 (ct=ct@entry=0x7f0faeb14880, c=c@entry=0x7f0c4b601478)
    at ldap/servers/slapd/conntable.c:470
#9  0x000055c33b2db7db in setup_pr_read_pds (ct=0x7f0faeb14880, listnum=<optimized out>) at ldap/servers/slapd/daemon.c:1596
#10 ct_list_thread (threadnum=<optimized out>) at ldap/servers/slapd/daemon.c:1356
#11 0x00007f0fb278f3b7 in _pt_root (arg=0x7f0fb182bbc0) at ../../../../nspr/pr/src/pthreads/ptthread.c:191
#12 0x00007f0fb2f18057 in start_thread (arg=<optimized out>) at pthread_create.c:448
#13 0x00007f0fb2f9bf4c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@progier389
Contributor Author

Created issue #6555 to follow up on the core issue.
