seg fault under high load whilst tailing a log #9864

Open
sorran opened this issue Jan 24, 2025 · 1 comment
sorran commented Jan 24, 2025

Bug Report

Describe the bug

When running under high load we encounter seg faults:

#0  0xffff98bcd810      in  ???() at ???:0
#1  0xffff98bcffa3      in  ???() at ???:0
#2  0xffff98bd0d4f      in  ???() at ???:0
#3  0x5f884f            in  msgpack_sbuffer_write() at lib/msgpack-c/include/msgpack/sbuffer.h:81
#4  0x5b8367            in  msgpack_pack_map() at lib/msgpack-c/include/msgpack/pack_template.h:753
#5  0x5bc78b            in  flb_mp_map_header_init() at src/flb_mp.c:326
#6  0x5f8a83            in  flb_log_event_encoder_dynamic_field_scope_enter() at src/flb_log_event_encoder_dynamic_field.c:70
#7  0x5f8b7b            in  flb_log_event_encoder_dynamic_field_begin_map() at src/flb_log_event_encoder_dynamic_field.c:117
#8  0x5ee66f            in  flb_log_event_encoder_begin_record() at src/flb_log_event_encoder.c:250
#9  0xafdcc3            in  apply_modifying_rules() at plugins/filter_modify/modify.c:1414
#10 0xafe127            in  cb_modify_filter() at plugins/filter_modify/modify.c:1526
#11 0x4dcc53            in  flb_filter_do() at src/flb_filter.c:161
#12 0x4d25e7            in  input_chunk_append_raw() at src/flb_input_chunk.c:1608
#13 0x4d2e4f            in  flb_input_chunk_append_raw() at src/flb_input_chunk.c:1929
#14 0x5d2923            in  input_log_append() at src/flb_input_log.c:71
#15 0x5d29ab            in  flb_input_log_append() at src/flb_input_log.c:90
#16 0x744ccb            in  ml_stream_buffer_flush() at plugins/in_tail/tail_file.c:412
#17 0x745ceb            in  ml_flush_callback() at plugins/in_tail/tail_file.c:919
#18 0x574883            in  flb_ml_flush_stream_group() at src/multiline/flb_ml.c:1516
#19 0x571e13            in  flb_ml_flush_parser_instance() at src/multiline/flb_ml.c:117
#20 0x5fd42f            in  flb_ml_stream_id_destroy_all() at src/multiline/flb_ml_stream.c:316
#21 0x746cb7            in  flb_tail_file_remove() at plugins/in_tail/tail_file.c:1256
#22 0x74898b            in  check_purge_deleted_file() at plugins/in_tail/tail_file.c:1936
#23 0x748d07            in  flb_tail_file_purge() at plugins/in_tail/tail_file.c:1992
#24 0x4cbb13            in  flb_input_collector_fd() at src/flb_input.c:1982
#25 0x50f3af            in  flb_engine_handle_event() at src/flb_engine.c:577
#26 0x50f3af            in  flb_engine_start() at src/flb_engine.c:960
#27 0x4ad693            in  flb_lib_worker() at src/flb_lib.c:835
#28 0xffff98bc0933      in  ???() at ???:0
#29 0xffff98b64e5b      in  ???() at ???:0
#30 0xffffffffffffffff  in  ???() at ???:0
#0  0xffffb400e810      in  ???() at ???:0
#1  0xffffb4010fa3      in  ???() at ???:0
#2  0xffffb4011d4f      in  ???() at ???:0
#3  0x1219a63           in  msgpack_unpacker_init() at lib/msgpack-c/src/unpack.c:372
#4  0xafd9e3            in  apply_modifying_rules() at plugins/filter_modify/modify.c:1372
#5  0xafe127            in  cb_modify_filter() at plugins/filter_modify/modify.c:1526
#6  0x4dcc53            in  flb_filter_do() at src/flb_filter.c:161
#7  0x4d25e7            in  input_chunk_append_raw() at src/flb_input_chunk.c:1608
#8  0x4d2e4f            in  flb_input_chunk_append_raw() at src/flb_input_chunk.c:1929
#9  0x5d2923            in  input_log_append() at src/flb_input_log.c:71
#10 0x5d29ab            in  flb_input_log_append() at src/flb_input_log.c:90
#11 0x744ccb            in  ml_stream_buffer_flush() at plugins/in_tail/tail_file.c:412
#12 0x745ceb            in  ml_flush_callback() at plugins/in_tail/tail_file.c:919
#13 0x574883            in  flb_ml_flush_stream_group() at src/multiline/flb_ml.c:1516
#14 0x571e13            in  flb_ml_flush_parser_instance() at src/multiline/flb_ml.c:117
#15 0x5fd42f            in  flb_ml_stream_id_destroy_all() at src/multiline/flb_ml_stream.c:316
#16 0x746cb7            in  flb_tail_file_remove() at plugins/in_tail/tail_file.c:1256
#17 0x74898b            in  check_purge_deleted_file() at plugins/in_tail/tail_file.c:1936
#18 0x748d07            in  flb_tail_file_purge() at plugins/in_tail/tail_file.c:1992
#19 0x4cbb13            in  flb_input_collector_fd() at src/flb_input.c:1982
#20 0x50f3af            in  flb_engine_handle_event() at src/flb_engine.c:577
#21 0x50f3af            in  flb_engine_start() at src/flb_engine.c:960
#22 0x4ad693            in  flb_lib_worker() at src/flb_lib.c:835
#23 0xffffb4001933      in  ???() at ???:0
#24 0xffffb3fa5e5b      in  ???() at ???:0
#25 0xffffffffffffffff  in  ???() at ???:0
Aborted (core dumped)

Both backtraces end in heap allocation calls inside msgpack-c, which suggests memory corruption around:

buffer = (char*)malloc(initial_buffer_size);

tmp = realloc(sbuf->data, nsize);

Valgrind logs:

valgrindx-1.log
valgrindx.log

Any advice on how to troubleshoot further would be well received.

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
2025-01-24 07:12:17,692 [B] [107465] [homecontainer]  WARN [Thread-92(sf-worker)] (Debugger.java:120) - Can't find respawn variables (LogicGameObjectManager:2313:2062 LogicLevel:5550 LogicGameMode:1085:1029:296)
2025-01-24 07:12:17,692 [C] [267970] [homecontainer]  WARN [Thread-104(sf-worker)] (Debugger.java:120) - Can't find respawn variables (LogicGameObjectManager:2313:2062 LogicLevel:5550 LogicGameMode:1085:1029:296)
2025-01-24 07:12:17,692 [C] [488726] [homecontainer]  WARN [Thread-81(sf-worker)] (Debugger.java:120) - Can't find respawn variables (LogicGameObjectManager:2313:2062 LogicLevel:5550 LogicGameMode:1085:1029:296)
2025-01-24 07:12:17,692 [C] [256683] [battlecontainer]  WARN [Thread-103(sf-worker)] (Debugger.java:120) - Can't find respawn variables (LogicGameObjectManager:2313:2062 LogicLevel:5550 LogicGameMode:1085:1029:296)
  • Steps to reproduce the problem:

Seems to occur under high stress. We can hit it within 1-2 minutes on a c7g.large EC2 instance tailing a log that is written at 50k lines/s. At 25k lines/s performance appears stable. The log is a Java log that rotates once it hits 500 MB, and rotated logs are deleted once there are more than 3. At 50k lines/s we'd expect the log producer to be running at a higher rate than fluent-bit is likely able to consume.
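For anyone trying to reproduce this without the original Java application, a stand-in load generator along these lines might be enough. This is only a sketch under assumptions: the function names (`format_line`, `rotate`, `generate`), the `.1`/`.2` suffix scheme for rotated files, and the rename-based rotation are all hypothetical and not taken from the real producer. It writes lines shaped like the sample above at a fixed rate, rotates at a size threshold, and keeps at most `keep` files (active file included, `keep` of at least 2).

```python
import os
import random
import time
from datetime import datetime

MESSAGE = ("Can't find respawn variables (LogicGameObjectManager:2313:2062 "
           "LogicLevel:5550 LogicGameMode:1085:1029:296)")


def format_line(now, rng):
    """Build one log line shaped like the sample lines in this report."""
    ts = now.strftime("%Y-%m-%d %H:%M:%S,") + f"{now.microsecond // 1000:03d}"
    node = rng.choice("ABC")
    ident = rng.randint(100000, 999999)
    container = rng.choice(["homecontainer", "battlecontainer"])
    thread = rng.randint(1, 128)
    return (f"{ts} [{node}] [{ident}] [{container}]  WARN "
            f"[Thread-{thread}(sf-worker)] (Debugger.java:120) - {MESSAGE}\n")


def rotate(path, keep):
    """Rename path -> path.1, shift older files up, drop the oldest."""
    oldest = f"{path}.{keep - 1}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(keep - 2, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    os.replace(path, f"{path}.1")


def generate(path, lines_per_sec, seconds,
             rotate_bytes=500 * 1024 * 1024, keep=3, seed=0, sleep=time.sleep):
    """Write lines_per_sec lines once per second for the given duration."""
    rng = random.Random(seed)
    written = 0
    out = open(path, "a")
    try:
        for _ in range(seconds):
            start = time.monotonic()
            now = datetime.now()
            out.write("".join(format_line(now, rng)
                              for _ in range(lines_per_sec)))
            out.flush()
            written += lines_per_sec
            if os.path.getsize(path) >= rotate_bytes:
                out.close()
                rotate(path, keep)
                out = open(path, "a")
            # Sleep for whatever is left of this one-second tick.
            sleep(max(0.0, 1.0 - (time.monotonic() - start)))
    finally:
        out.close()
    return written
```

Something like `generate("/tmp/app.log", 50000, 120)` would approximate the failing rate for two minutes. Each second's lines are written in one burst, so the I/O pattern will differ from the real application.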

Expected behavior

Performance should degrade gracefully, without a seg fault.

Screenshots

Your Environment

  • Version used: v3.2.4
  • Configuration:

config.zip

  • Environment name and version (e.g. Kubernetes? What version?): EC2
  • Server type and version: AWS Linux c7g.large
  • Operating System and version: Amazon Linux
  • Filters and plugins: tail input, java_multiline, java_capture, modify, http output

Additional context

We are stress testing fluent-bit to understand its performance limits. We may need to throttle fluent-bit, but it is unclear whether that would actually resolve the seg fault.

Contributor

patrick-stephens commented Jan 24, 2025

@sorran could you paste in the flat config, just to save having to download, extract and open potentially malicious files?

It looks like it is tail input with loki and http output, but it would be good to get the full config as flat text.

I wrote this a while back when I had monstrous includes to help: https://github.com/couchbase/couchbase-fluent-bit/blob/main/tools/flatten-config.sh
