
Fix CI Tests - Decrease mdb map size for large topologies #6531

Closed
progier389 opened this issue Jan 22, 2025 · 4 comments · Fixed by #6532
Labels
needs triage The issue will be triaged during scrum

Comments

@progier389
Contributor

Issue Description
The GitHub CI test shows a random failure that does not reproduce on our test VM.
The test fails because some instances get unexpectedly stopped (sometimes 1 instance out of the 4, sometimes 2).
It happens while the instances are idle, after a test has finished successfully and before the next test starts.
At this point I suspect that the system kills the process (kill -9) because of OOM (the alternative hypothesis would be a crash after deleting a backend, but I doubt it).
I propose to decrease the mdb map size to 10 GB when using a topology with more than 3 instances.
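
To visualize the proposal, here is a minimal sketch of lowering the LMDB map size on a single instance over LDAP. It assumes the map size is controlled by the nsslapd-mdb-max-size attribute on the mdb backend config entry (cn=mdb,cn=config,cn=ldbm database,cn=plugins,cn=config) and that a byte value is accepted; neither the attribute name nor the DN is taken from this issue, and the actual change in the fix may differ.

```python
# Minimal sketch, not the actual fix: lower the LMDB map size to 10 GiB on
# one instance over LDAP.  The config DN and the nsslapd-mdb-max-size
# attribute are assumptions, not confirmed by this issue.
import ldap

MDB_CONFIG_DN = "cn=mdb,cn=config,cn=ldbm database,cn=plugins,cn=config"  # assumed DN
TEN_GB = str(10 * 1024 ** 3)  # 10 GiB in bytes; the exact unit used by the fix is an assumption

def shrink_mdb_map_size(uri, binddn, password):
    """Replace the LMDB map size on the instance reachable at `uri`."""
    conn = ldap.initialize(uri)
    conn.simple_bind_s(binddn, password)
    conn.modify_s(
        MDB_CONFIG_DN,
        [(ldap.MOD_REPLACE, "nsslapd-mdb-max-size", [TEN_GB.encode()])],
    )
    conn.unbind_s()
    # The new map size normally takes effect only after the instance
    # restarts; the restart is left out of this sketch.
```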

**Package Version and Platform:**

  • Platform: Fedora
  • Package and version: main

Steps to Reproduce
Steps to reproduce the behavior:

  1. See the CI test results and look for errors in replication/acceptance test_new_suffix

Expected results
The test should be stable

@tbordaz
Contributor

tbordaz commented Jan 22, 2025

Reducing the memory footprint of the instances looks good to avoid false negatives (OOM). Note that I also observed transient failures (replication/acceptance) with BDB.

@progier389
Contributor Author

I have also seen a BDB failure with the same signature, so I wonder whether there is another problem.
I just want to see whether decreasing the VM memory footprint helps or not.
So far the PR tests are fully green, while they were almost always failing before ...

progier389 added a commit that referenced this issue Jan 23, 2025 (#6532)

Decrease mdb map size to 10 GB when using topologies whose number of instances is > 3

Issue: #6531

Reviewed by: @tbordaz (Thanks!)
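
For illustration only, the gating described in this commit could look like the sketch below: apply the smaller map size only when the topology holds more than 3 instances. The `instances` iterable and the `set_mdb_max_size()` callable are hypothetical stand-ins, not the lib389 API actually touched by the fix.

```python
# Hedged sketch of the size-based gating; names are illustrative,
# not the real fixture code.
LARGE_TOPOLOGY_THRESHOLD = 3        # topologies above this size get the smaller map
SMALL_MAP_SIZE = 10 * 1024 ** 3     # assumed 10 GB-ish value in bytes

def tune_topology_map_size(instances, set_mdb_max_size):
    """Shrink the LMDB map size on every instance of a large topology."""
    instances = list(instances)
    if len(instances) > LARGE_TOPOLOGY_THRESHOLD:
        for inst in instances:
            set_mdb_max_size(inst, SMALL_MAP_SIZE)
```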
@vashirov
Member

There is a core generated in
https://github.com/389ds/389-ds-base/actions/runs/12927667100/job/36053887628#step:7:1072
It's present under assets/cores in https://github.com/389ds/389-ds-base/actions/runs/12927667100/artifacts/2474328185
and can be used with the rpms from this build.

(gdb) bt
#0  ___pthread_mutex_lock (mutex=mutex@entry=0x7373616c4374636d) at pthread_mutex_lock.c:80
#1  0x00007f0fb2788796 in PR_EnterMonitor (mon=0x7373616c43746365) at ../../../../nspr/pr/src/pthreads/ptsynch.c:563
#2  0x00007f0fae3ad5c7 in replica_lock (lock=<optimized out>) at ldap/servers/plugins/replication/repl5_replica.c:109
#3  replica_relinquish_exclusive_access (r=0x7f0c2ab08140, connid=0, opid=-1) at ldap/servers/plugins/replication/repl5_replica.c:676
#4  0x00007f0fae393475 in consumer_connection_extension_destructor (ext=<optimized out>, object=<optimized out>, parent=<optimized out>)
    at ldap/servers/plugins/replication/repl_connext.c:91
#5  0x00007f0fb3161ad3 in factory_destroy_extension (type=<optimized out>, object=0x7f0c4b601478, parent=0x0, extension=0x7f0c4b6015b0)
    at ldap/servers/slapd/factory.c:366
#6  factory_destroy_extension (type=<optimized out>, object=0x7f0c4b601478, parent=0x0, extension=0x7f0c4b6015b0)
    at ldap/servers/slapd/factory.c:348
#7  0x000055c33b2d676c in connection_cleanup (conn=0x7f0c4b601478) at ldap/servers/slapd/connection.c:181
#8  0x000055c33b2ea17c in connection_table_move_connection_out_of_active_list.isra.0 (ct=ct@entry=0x7f0faeb14880, c=c@entry=0x7f0c4b601478)
    at ldap/servers/slapd/conntable.c:470
#9  0x000055c33b2db7db in setup_pr_read_pds (ct=0x7f0faeb14880, listnum=<optimized out>) at ldap/servers/slapd/daemon.c:1596
#10 ct_list_thread (threadnum=<optimized out>) at ldap/servers/slapd/daemon.c:1356
#11 0x00007f0fb278f3b7 in _pt_root (arg=0x7f0fb182bbc0) at ../../../../nspr/pr/src/pthreads/ptthread.c:191
#12 0x00007f0fb2f18057 in start_thread (arg=<optimized out>) at pthread_create.c:448
#13 0x00007f0fb2f9bf4c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@progier389
Contributor Author

Created issue #6555 to follow up on the core issue.
