Skip to content

Manticore should not rely on DNS when running in kubernetes. #6

@zerthimon

Description

@zerthimon

Manticore should not rely on DNS for cluster nodes resolution when working in kubernetes.

When a pod restarts because of node failure/upgrade/etc, its entry is removed from the kube dns (core-dns). Manticore then crashes on the remaining node with error, because it cannot resolve the failed node by name:

[Thu Aug 12 12:18:54.614 2021] [13] FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker  
FATAL: no AF_INET address found for: backend-manticoresearch-worker-1.backend-manticoresearch-worker  
[Thu Aug 12 12:18:54.673 2021] [1] caught SIGTERM, shutting down  
caught SIGTERM, shutting down  
------- FATAL: CRASH DUMP -------  
[Thu Aug 12 12:18:54.673 2021] [    1]  
[Thu Aug 12 12:19:19.674 2021] [1] WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc  
WARNING: GlobalCrashQueryGetRef: thread-local info is not set! Use ad-hoc  
  
--- crashed invalid query ---  
  
--- request dump end ---  
--- local index:  
Manticore 3.6.0 96d61d8bf@210504 release  
Handling signal 11  
Crash!!! Handling signal 11  
-------------- backtrace begins here ---------------  
Program compiled with 7  
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=bionic -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_RE2=1 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SONAME=libgalera_manticore.so.31 -DSYSCONFDIR=/etc/manticoresearch  
Host OS is Linux x86_64  
Stack bottom = 0x7fff43fad227, thread stack size = 0x20000  
Trying manual backtrace:  
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x5c95bbd0f9002)  
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x5c95bbd0f9002, stack=0x564947870000, stacksize=0x20000)  
Trying system backtrace:  
begin of system symbols:  
searchd(_Z12sphBacktraceib+0xcb)[0x564946faf75b]  
searchd(_ZN11CrashLogger11HandleCrashEi+0x1ac)[0x564946dcd66c]  
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa45a7fa890]  
searchd(_ZN11CSphNetLoop11StopNetLoopEv+0xa)[0x564946eb978a]  
searchd(_Z8Shutdownv+0xd0)[0x564946dd2c00]  
searchd(_Z12CheckSignalsv+0x63)[0x564946de04a3]  
searchd(_Z8TickHeadv+0x1b)[0x564946de04fb]  
searchd(_Z11ServiceMainiPPc+0x1cea)[0x564946dfa5ea]  
searchd(main+0x63)[0x564946dcb6a3]  
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa4594b4b97]  
searchd(_start+0x2a)[0x564946dcca6a]  
-------------- backtrace ends here ---------------  
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)  
and attach there:  
a) searchd log, b) searchd binary, c) searchd symbols.  
Look into the chapter 'Reporting bugs' in the manual  
(https://manual.manticoresearch.com/Reporting_bugs)  
Dump with GDB is not available  
--- BT to source lines (depth 11): ---  
conversion failed (error 'No such file or directory'):  
  1. Run the command provided below over the crashed binary (for example, 'searchd'):  
  2. Attach the source.txt to the bug report.  
addr2line -e searchd 0x46faf75b 0x46dcd66c 0x5a7fa890 0x46eb978a 0x46dd2c00 0x46de04a3 0x46de04fb   
0x46dfa5ea 0x46dcb6a3 0x594b4b97 0x46dcca6a > source.txt  

After cluster node comes back online, the remaining node cannot start because it cannot resolve its own IP due to it's own entry got removed from DNS. The NXDOMAIN DNS response is cached by the kubernetes cluster node OS time and again, so node cannot start anymore at all.

Metadata

Metadata

Assignees

No one assigned

    Labels

    waitingWaiting for the original poster (in most cases) or something else

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions