Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for tcp_long_connection_metrics #1224

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

yp969803
Copy link
Contributor

@yp969803 yp969803 commented Feb 6, 2025

What type of PR is this?
Proposal for "Metrics for TCP Long Connection"
LFX 2025 term-1

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #1211

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@kmesh-bot kmesh-bot added size/L and removed size/S labels Feb 6, 2025
@yp969803
Copy link
Contributor Author

yp969803 commented Feb 6, 2025

Some questions:

  • Do we also need acceess logs for tcp-long-connection or just metrics works
  • Does current logs support open-telemetry format
  • If not, do we need opentelemetry compatible logs

@LiZhenCheng9527 @nlgwcy @hzxuzhonghu

@hzxuzhonghu
Copy link
Member

Do we also need acceess logs for tcp-long-connection or just metrics works

The accesslog is printed after connection closed. Can keep it as now

Integrate with OTEL sounds reasonable to me.

@LiZhenCheng9527
Copy link
Collaborator

Some questions:

  • Do we also need acceess logs for tcp-long-connection or just metrics works
  • Does current logs support open-telemetry format
  • If not, do we need opentelemetry compatible logs

@LiZhenCheng9527 @nlgwcy @hzxuzhonghu

  • In proposal, You need to include the design of the accesslog and metrics. In code, you can only focus on metrics.
  • Now, accesslog of Kmesh is not support OTEL format. We will support it later.
  • If you want to do this work. : )


- Reporting of metrics and access logs, at periodic time or based on throughput (e.g. after transfer of 1mb of data).

- User can fine tune the time and throughput using yaml during kmesh deployment or can use CLI tool kmeshctl anytime.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this mean? How to fine tune the time and throughput?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to, give users options to set their prefered periodic time and threshold values during the start of kmesh daemon, be setting the values in values.yaml file

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understand.
You can change the description

}
```

The value of the period or the threshold is provided by the user, if not a default value of 5 seconds and 1 mb is chosen.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you able to explain in the proposal why 1MB is the threshold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the threshold were set too low, the system might generate too many reports, leading to noise and increased processing overhead, 1 mb threshold sounds appropriate to me. We are also giving users options to set their own threshold if he is not satisfied with 1mb

@LiZhenCheng9527
Copy link
Collaborator

@nlgwcy PTAL

@yp969803
Copy link
Contributor Author

Sorry for inactivity i am sick from last 6 days, i will reply to all your queries and complete my proposal asap.

authors:
- "yp969803"
reviewers:
- "nglwcy"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- "nglwcy"
- "nlgwcy"

know that this has succeeded?
-->

- Collect detailed traffic metrics (e.g. bytes send/recieved, direction, throughput, round-trip time, latency , state-change) continously during the lifetime of long TCP connections using ebpf.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended that TCP retransmission and packet loss measurement indicators be added.

@yp969803
Copy link
Contributor Author

@LiZhenCheng9527 can you review the ebpf code in the proposal!

@yp969803 yp969803 marked this pull request as ready for review February 14, 2025 23:25
Copy link
Collaborator

@LiZhenCheng9527 LiZhenCheng9527 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please fix typo in your proposal
  2. In Kmesh's dual-engine mode, traffic passes through a waypoint. In the original metrics, the origin destination would be preserved before modifying the TCP metadata. However, I noticed that the current proposal does not address this aspect.

};

struct ipv6_addr {
__u8 addr[16];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ipv4 and ipv6 are represented together, why not use __u32 addr[4]. Is there a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using union for ipv4 or ipv6,

  union {
        struct ipv4_addr v4;
        struct ipv6_addr v6;
    } saddr;
    union {
        struct ipv4_addr v4;
        struct ipv6_addr v6;
    } daddr;

This approach enhance code readiblity, i can use a single "__u32 addr[4]", which might require conversion logic (e.g., mapping IPv4 into an IPv6-like format), adding complexity


Using various ebpf tracepoints hooks to collects metrics of tcp_long_collection, a ring buffer is also decleared to send data from kernel space to userspace

Code update in tracepoint.c file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is best to add a separate file.

key.dport = ctx->dport;

if (ctx->newstate == TCP_ESTABLISHED) {
struct long_tcp_metrics m = {};
Copy link
Contributor

@nlgwcy nlgwcy Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel-native mode has implemented some observation capabilities, such as on_cluster_sock_connect and on_cluster_sock_close. It is recommended to enhance them based on the existing implementation;


struct long_tcp_metrics *m = bpf_map_lookup_elem(&conn_metrics_map, &key);
if (m) {
__sync_fetch_and_add(&m->bytes_sent, bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The socket structure already stores link information such as the number of bytes sent and received. Why do we need to calculate it? You can refer to tcp_report



SEC("tracepoint/tcp/tcp_set_state")
int trace_tcp_set_state(struct trace_event_raw_tcp_set_state *ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All newly added BPF observation hook points need to use is_managed_by_kmesh to determine whether the current link is taken over by Kmesh

// Flush Function: Periodically invoked via a perf event.
// Iterates over the conn_metrics_map and submits events for connections
// that have been open longer than LONG_CONN_THRESHOLD_NS.
SEC("perf_event/flush")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the periodic reporting of the indicator information of the long link be realized based on bpf_timer? For reference: https://github.com/Asphaltt/learn-by-example/blob/main/ebpf/timer/tcp-connecting.c

@yp969803
Copy link
Contributor Author

what is the difference between kmesh_map64, kmesh_map192, kmesh_map296, kmesh_map1600 ebpf maps, they all look similar?
@nlgwcy

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nlgwcy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot added size/L and removed size/XL labels Feb 23, 2025
@hzxuzhonghu
Copy link
Member

what is the difference between kmesh_map64, kmesh_map192, kmesh_map296, kmesh_map1600 ebpf maps, they all look similar?

These are inner implementions, you should never use them directly

@LiZhenCheng9527
Copy link
Collaborator

what is the difference between kmesh_map64, kmesh_map192, kmesh_map296, kmesh_map1600 ebpf maps, they all look similar? @nlgwcy

you can refer to #1029 and https://github.com/kmesh-net/kmesh/blob/main/docs/proposal/map-in-map_management_enhancement-en.md


Currently kmesh provides access logs during termination and establisment of a TCP connection with more detailed information about the connection.

Kmesh also provides metrics during connection establishment, completion and deny apturing a variety of details about the connection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Kmesh also provides metrics during connection establishment, completion and deny apturing a variety of details about the connection.
Kmesh also provides metrics during connection establishment, completion and deny capturing a variety of details about the connection.

Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update once the solution settle down

nitty-gritty.
-->

Metrics will be collected using eBPF tracepoint hooks, and a eBPF map will be used to transfer metrics from kernel space to userspace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracepoint hooks is not consistent with impl

Copy link
Contributor Author

@yp969803 yp969803 Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to update the proposal with the new implementations:
#1249
this is the pr with the latest work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[lfx-mentorship-2025-Mar-May] Metrics for TCP Long Connection
5 participants