@prinzdezibel prinzdezibel commented Sep 5, 2025

…s the gauge's value to become smaller/greater than the specified limits. The values are optional and don't change the gauge's behaviour if not given.

Example use case:
A log stream increases or decreases a user session gauge whenever a corresponding login/logoff log entry is seen. Specifying a minimum value of 0 for the gauge prevents the user session gauge from going negative when the very first message that triggers the gauge is a logoff message. Without the proposed changes this can happen if the user is already logged in and the Prometheus service is restarted, because the restart resets the gauge values to 0.

Signed-off-by: Michael Jenny <[email protected]>
@prinzdezibel
Author

prinzdezibel commented Sep 5, 2025

I looked for another way to handle my use case but couldn't find one. Because the use case (counting login/logoff messages in a log stream to get a user's active sessions) is so common, I suspect I have overlooked something obvious. I'd welcome any advice if this is not the best way to solve the problem.

@ArthurSens, @bwplotka, @kakkoyun, @vesari: I need your guidance on this. I'm aware that this PR is not complete yet; it lacks tests and documentation. Before adding them, I'd like to get your thoughts, because I'm not sure my approach is the right one.

As described above, my use case is a gauge that can both increase and decrease. In my specific case the gauge must not go negative, because that would indicate a logical error. I have therefore added a min_value (and also a max_value) to the gauge.

Here is an Alloy configuration that counts active VPN sessions whenever a specific pattern is encountered in the log stream:

loki.process "metric_vpn_sessions" {

    stage.regex {
        expression         = `sessiond\[\d+\]: msg_id="3E00-000(2|4)" (IPSec|SSL) VPN user (?<user>[^@]+)@(?<host>[^\s]+) from (?<external_ip>([0-9]{1,3}\.){3}[0-9]{1,3}) logged (?<state>in|out) assigned virtual IP is (?<internal_ip>([0-9]{1,3}\.){3}[0-9]{1,3})`
        labels_from_groups = true
    }

    stage.template {
        source   = "delta"
        template = "{{ if eq .state \"in\" }}1{{ else }}-1{{ end }}"
    }

    stage.match {
        selector = `{ state =~ "(in|out)" }`

        stage.label_drop {
            values = [
                "state",
                "external_ip",
                "internal_ip",
            ]
        }

        stage.metrics {
            metric.gauge {
                name              = "vpn_sessions"
                action            = "add"
                max_idle_duration = "24h"
                min_value         = "0"
                source            = "delta"
            }
        }
    }

    forward_to = [loki.write.local.receiver]
}

This works and does what I want thanks to the newly introduced min_value. But I'm wondering how others handle this use case?

I appreciate your thoughts. Thank you.

@prinzdezibel prinzdezibel marked this pull request as draft September 9, 2025 13:53
@bwplotka
Member

bwplotka commented Oct 7, 2025

Hi! Thanks for proposing!

So do I understand this right: you want to enforce gauge boundaries to make later queries easier, because a reliable add/sub operation is not possible (e.g. when going from logs to metrics)? A minimum of 0 gets you out of the startup-then-log-off case, but what if you see 10 log-ins and 10 log-offs from old sessions? Then no min/max feature would help, and you would have an inaccurate metric anyway.

We can debate how useful this feature is for the general client_golang audience, but the best way is to add a tiny wrapper on top of client_golang's Set/Add/Sub gauge methods, which you can implement quickly in your own code. Why do we need this for the general audience at this point? (:

Or... are you rather looking for a solution to your instrumentation problem of tracking stateful data (sessions) with stateless mechanisms (online log parsing/recording)? In that case Slack, the Prometheus user group, or the Loki user group might be a better choice.

@bwplotka
Member

bwplotka commented Oct 7, 2025

For general instrumentation logic, it sounds like you really need a stateful medium to remember the number of old logins, OR you look for periodic session logs, OR you use counters. With two counters, _logoffs_total and _logins_total, you could tell a lot from increase(_logins_total) - increase(_logoffs_total). It won't give you an accurate number of sessions, but it will show periods with many logins without logoffs (or the opposite), which can reveal leaks. Wide time periods would perhaps give you an approximate session count.
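
The two-counter idea can be sketched roughly as follows. `sessionCounters`, `observe`, and `netChange` are hypothetical names for illustration, and plain integers stand in for real Prometheus counters. In PromQL, `increase()` takes a range vector, so an actual query would carry a window on each counter (for example `[1h]`, an arbitrary choice here).

```go
package main

// sessionCounters (hypothetical) holds two counts that only ever go up,
// mirroring _logins_total and _logoffs_total.
type sessionCounters struct {
	logins  uint64
	logoffs uint64
}

// observe bumps the matching counter for a parsed "in" or "out" state and
// ignores anything else.
func (c *sessionCounters) observe(state string) {
	switch state {
	case "in":
		c.logins++
	case "out":
		c.logoffs++
	}
}

// netChange approximates increase(logins) - increase(logoffs) between two
// snapshots of the counters taken without a reset in between.
func netChange(before, after sessionCounters) int64 {
	return int64(after.logins-before.logins) - int64(after.logoffs-before.logoffs)
}
```

Neither counter can go negative, so a logoff arriving first after a restart needs no special handling; the cost is that the difference describes the change over a window rather than the absolute number of sessions.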

Others might have better ideas.

TL;DR: it feels like a minimum of 0 won't help much in your case.

@prinzdezibel
Author

prinzdezibel commented Oct 7, 2025

@bwplotka: Thank you for your reply and your time!

> Hi! Thanks for proposing!
>
> So do I understand this right: you want to enforce gauge boundaries to make later queries easier, because a reliable add/sub operation is not possible (e.g. when going from logs to metrics)? A minimum of 0 gets you out of the startup-then-log-off case, but what if you see 10 log-ins and 10 log-offs from old sessions? Then no min/max feature would help, and you would have an inaccurate metric anyway.

Login and logout messages are emitted pairwise by the VPN server. It is not possible to see two logoffs (for a specific {user,sessionid} combination) without a login in between. Without the gauge boundary (min=0), the gauge can go negative if, after a gauge reset, the incoming messages arrive in the following sequence:

| Log message | Gauge value |
| --- | --- |
| Start-up | 0 |
| LogOff User A | -1 |
| LogIn User A | 0 |

Result: the user is logged in, yet the gauge value is 0 instead of 1.

> We can debate how useful this feature is for the general client_golang audience, but the best way is to add a tiny wrapper on top of client_golang's Set/Add/Sub gauge methods, which you can implement quickly in your own code. Why do we need this for the general audience at this point? (:

Yes, this is what I've done for now. If there is no general audience for my use case, I won't hesitate to close the PR. I don't think it makes sense to add something that only one person uses. But it's hard to believe that this use case is so specific. Perhaps other people have found different solutions, but who knows...

> Or... are you rather looking for a solution to your instrumentation problem of tracking stateful data (sessions) with stateless mechanisms (online log parsing/recording)? In that case Slack, the Prometheus user group, or the Loki user group might be a better choice.

No need for that. I'm happy with my solution. It correctly keeps track of active VPN sessions:
[Screenshot 2025-10-07 at 18 05 34]

Thanks!
