This repository was archived by the owner on Feb 12, 2024. It is now read-only.

Commit d1d66c6

Merge pull request #268 from strongdm/feat/add-prometheus-monitoring
Add prometheus monitoring
2 parents 727d765 + a9f591e commit d1d66c6

20 files changed

Lines changed: 2444 additions & 9 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -85,6 +85,7 @@ docker logs accessbot_accessbot_1
 #### Without Docker
 
 If you want to install and execute the bot locally without Docker, please refer to: [Configure Local Environment](docs/CONFIGURE_LOCAL_ENV.md)
+If you want to expose a Prometheus endpoint with AccessBot Metrics, please refer to [Configure Monitoring](docs/configure_accessbot/CONFIGURE_MONITORING.md)
 
 ## Getting Started
 Once AccessBot is up and running, you can add it as an app or to a channel and start using it!

config.py

Lines changed: 2 additions & 0 deletions
@@ -110,3 +110,5 @@ def get_bot_admins():
     "bot_id": None, # will be initialized in SlackBoltBackend.resolve_access_form_bot_id method
     "nickname": os.getenv("SDM_ACCESS_FORM_BOT_NICKNAME")
 }
+
+EXPOSE_METRICS = os.getenv("SDM_EXPOSE_METRICS", "false").lower() == "true"
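
The added expression lowercases the variable and compares it to the string `"true"`, so only that spelling (in any letter case) turns metrics on. Values such as `1` or `yes` leave the flag off, as does an unset variable. A minimal sketch of that behaviour, where `read_expose_metrics` is a hypothetical wrapper introduced here only so the expression can be exercised without touching the real environment:

```python
import os


def read_expose_metrics(environ=os.environ):
    # Hypothetical helper: same expression as the line added to config.py,
    # but reading from a mapping that can be swapped out for illustration.
    return environ.get("SDM_EXPOSE_METRICS", "false").lower() == "true"


enabled_upper = read_expose_metrics({"SDM_EXPOSE_METRICS": "TRUE"})
enabled_one = read_expose_metrics({"SDM_EXPOSE_METRICS": "1"})
enabled_unset = read_expose_metrics({})
print(enabled_upper, enabled_one, enabled_unset)
```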

docker-compose-prometheus.yaml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+version: "3.9"
+services:
+  accessbot:
+    image: public.ecr.aws/strongdm/accessbot:latest
+    env_file:
+      # You could use env-file.example as a reference
+      - env-file
+    environment:
+      - SDM_EXPOSE_METRICS=true
+    ports:
+      - 3141:3141
+      - 3142:3142
+  prometheus:
+    build: tools/prometheus
+    ports:
+      - 9090:9090
+  grafana:
+    build: tools/grafana
+    ports:
+      - 3000:3000
+

docker-compose.yaml

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ services:
       - env-file
     ports:
       - 3141:3141
+      - 3142:3142

docs/configure_accessbot/CONFIGURE_MONITORING.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# CONFIGURE MONITORING
+
+To enable monitoring you need to set the variable `SDM_EXPOSE_METRICS=true`.
+
+After enabling it, a metrics endpoint will be available on port `3142`. It exposes the following metrics:
+- `accessbot_total_received_messages` - total count of received messages
+- `accessbot_total_access_requests` - total count of received access request messages
+- `accessbot_total_pending_access_requests` - total count of pending access requests
+- `accessbot_total_manual_approvals` - total count of manually approved access requests
+- `accessbot_total_auto_approvals` - total count of auto approved access requests
+- `accessbot_total_denied_access_requests` - total count of manually denied access requests
+- `accessbot_total_timed_out_access_requests` - total count of timed out access requests
+- `accessbot_total_consecutive_errors` - total count of consecutive errors
+
+To see an example, follow these steps:
+1. Download the file [docker-compose-prometheus.yaml](../../docker-compose-prometheus.yaml)
+    - Make sure that your `env-file` is properly configured following the `env-file.example` template
+2. Run it with your preferred container orchestrator (with Docker, you can simply run `docker-compose -f docker-compose-prometheus.yaml up`)
+
+Now you can go to the **AccessBot Metrics** Grafana dashboard at `http://localhost:3000/d/982GyKX7z/accessbot-metrics` and see the following charts:
+
+1 - Received Messages Count (`accessbot_total_received_messages` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816013-b71ff2b5-be8b-45ea-9a58-4cec4e51cb53.png)
+
+2 - Access Requests Count (`accessbot_total_access_requests` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816036-f51baf75-67ed-4735-be77-51e2f9ce379a.png)
+
+3 - Pending Access Requests Count (`accessbot_total_pending_access_requests` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816111-8b330af8-110c-4dc4-96f2-6ff554e6703b.png)
+
+4 - Manually Approved Access Requests Count (`accessbot_total_manual_approvals` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816119-344c8c2c-ddad-4008-a5c6-4ee8c02fb66f.png)
+
+5 - Auto Approved Access Requests Count (`accessbot_total_auto_approvals` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816132-755b62f2-da7c-49ab-9c7f-bf20b5b0162b.png)
+
+6 - Manually Denied Access Requests Count (`accessbot_total_denied_access_requests` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816152-1ffbffe7-128e-475a-9ba4-2971431c380d.png)
+
+7 - Timed Out Access Requests Count (`accessbot_total_timed_out_access_requests` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816166-8deb901e-ccfd-4101-9ae7-8e290f429ea6.png)
+
+8 - Total Consecutive Errors Count (`accessbot_total_consecutive_errors` metric):
+
+![image](https://user-images.githubusercontent.com/49597325/168816189-a559a694-2790-49be-87a0-1d06f4a73cb4.png)
+
+9 - Last Execution Status (`accessbot_total_consecutive_errors` metric - 0 means that everything is fine, 1 means that the last execution(s) failed):
+
+![image](https://user-images.githubusercontent.com/49597325/168816198-c76173b2-cb18-4799-9590-d7405bf5496f.png)
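
Prometheus endpoints serve plain text in the Prometheus exposition format, so the metrics listed in this document can also be read without Grafana. The sketch below parses such a payload and derives the "Last Execution Status" reading from `accessbot_total_consecutive_errors`. The sample payload, its `HELP`/`TYPE` lines, and all values are made up for illustration, and `parse_metrics` is a hypothetical helper that only handles samples without labels.

```python
# Hypothetical payload: names come from the document above, values are invented.
SAMPLE = """\
# HELP accessbot_total_received_messages Total count of received messages
# TYPE accessbot_total_received_messages gauge
accessbot_total_received_messages 12.0
accessbot_total_pending_access_requests 2.0
accessbot_total_consecutive_errors 0.0
"""


def parse_metrics(text):
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
# Zero consecutive errors is read as "everything is fine".
last_execution_ok = metrics["accessbot_total_consecutive_errors"] == 0
print(metrics, last_execution_ok)
```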

e2e/test_common.py

Lines changed: 4 additions & 2 deletions
@@ -22,7 +22,8 @@ class ErrBotExtraTestSettings:
                 'allowprivate': True,
                 'allowmuc': False,
             }
-        }
+        },
+        'EXPOSE_METRICS': False,
     }
     extra_plugin_dir = "plugins/sdm"
 
@@ -42,7 +43,8 @@ class MSTeamsErrBotExtraTestSettings:
                 'allowprivate': True,
                 'allowmuc': False,
             }
-        }
+        },
+        'EXPOSE_METRICS': False,
     }
     extra_plugin_dir = "plugins/sdm"
 

plugins/sdm/accessbot.py

Lines changed: 34 additions & 7 deletions
@@ -1,19 +1,16 @@
 import os
 import re
-import time
-import json
-import copy
 from itertools import chain
-from errbot import BotPlugin, re_botcmd, Message, botcmd
+
+from errbot import BotPlugin, re_botcmd, Message
 from errbot.core import ErrBot
 from slack_sdk.errors import SlackApiError
-from collections import namedtuple
 
 import config_template
 from lib import ApproveHelper, create_sdm_service, MSTeamsPlatform, PollerHelper, \
     ShowResourcesHelper, ShowRolesHelper, SlackBoltPlatform, SlackRTMPlatform, \
     ResourceGrantHelper, RoleGrantHelper, DenyHelper, CommandAliasHelper, ArgumentsHelper, \
-    GrantRequestHelper, WhoamiHelper
+    GrantRequestHelper, WhoamiHelper, MetricsHelper
 from lib.util import normalize_utf8
 from grant_request_type import GrantRequestType
 
@@ -25,6 +22,7 @@
 SHOW_ROLES_REGEX = r"show available roles"
 FIVE_SECONDS = 5
 ONE_MINUTE = 60
+MSG_ERROR_OCCURRED = "An error occurred, please contact your SDM admin"
 
 def get_callback_message_fn(bot):
     def callback_message(msg):
@@ -38,6 +36,14 @@ def callback_message(msg):
         ErrBot.callback_message(bot, msg)
     return callback_message
 
+def get_send_simple_reply(bot):
+    def send_simple_reply(msg, text, private=False, threaded=False):
+        if text.startswith(MSG_ERROR_OCCURRED):
+            accessbot = bot.plugin_manager.plugins['AccessBot']
+            accessbot.get_metrics_helper().increment_consecutive_errors()
+        return ErrBot.send_simple_reply(bot, msg, text, private=private, threaded=threaded)
+    return send_simple_reply
+
 def get_platform(bot):
     platform = bot.bot_config.BOT_PLATFORM if hasattr(bot.bot_config, 'BOT_PLATFORM') else None
     if platform == 'ms-teams':
@@ -50,13 +56,15 @@ def get_platform(bot):
 # pylint: disable=too-many-ancestors
 class AccessBot(BotPlugin):
     __grant_requests_helper = None
+    __metrics_helper = None
     _platform = None
 
     def activate(self):
         super().activate()
         self._platform = get_platform(self)
-        self._bot.MSG_ERROR_OCCURRED = 'An error occurred, please contact your SDM admin'
+        self._bot.MSG_ERROR_OCCURRED = MSG_ERROR_OCCURRED
         self._bot.callback_message = get_callback_message_fn(self._bot)
+        self._bot.send_simple_reply = get_send_simple_reply(self._bot)
         self.init_access_form_bot()
         self.update_access_control_admins()
         self['auto_approve_uses'] = {}
@@ -68,6 +76,8 @@ def activate(self):
         # If something doesn't need to be "instantiated" again we shouldn't be doing it
         if self.__grant_requests_helper is None:
             self.__grant_requests_helper = GrantRequestHelper(self)
+        if self.__metrics_helper is None:
+            self.__metrics_helper = MetricsHelper(self)
         self._hide_utils_whoami_command()
 
     def _hide_utils_whoami_command(self):
@@ -156,6 +166,7 @@ def access_resource(self, message, match):
         """
         Grant access to a resource (using the requester's email address)
         """
+        self.__metrics_helper.increment_access_requests()
         arguments = re.sub(ACCESS_REGEX, "\\1", match.string.replace("*", ""), flags=re.IGNORECASE)
         if re.match("^role (.*)", arguments, flags=re.IGNORECASE):
             self.log.debug("##SDM## AccessBot.access better match for assign_role")
@@ -172,6 +183,7 @@ def access_resource(self, message, match):
             yield str(e)
             return
         yield from self.get_resource_grant_helper().request_access(message, resource_name, flags=flags)
+        self.__metrics_helper.reset_consecutive_errors()
 
     @re_botcmd(pattern=ASSIGN_ROLE_REGEX, flags=re.IGNORECASE, prefixed=False, re_cmd_name_help="access to role role-name")
     def assign_role(self, message, match):
@@ -180,48 +192,58 @@ def assign_role(self, message, match):
         """
         if not self._platform.can_assign_role(message):
             return
+        self.__metrics_helper.increment_access_requests()
         role_name = re.sub(ASSIGN_ROLE_REGEX, "\\1", match.string.replace("*", ""), flags=re.IGNORECASE)
         yield from self.get_role_grant_helper().request_access(message, role_name)
+        self.__metrics_helper.reset_consecutive_errors()
 
     @re_botcmd(pattern=APPROVE_REGEX, flags=re.IGNORECASE, prefixed=False, hidden=True)
     def approve(self, message, match):
         """
         Approve a grant (resource or role)
         """
+        self.__metrics_helper.increment_received_messages()
         access_request_id = re.sub(APPROVE_REGEX, r"\1", match.string.replace("*", ""), flags=re.IGNORECASE).upper()
         approver = message.frm
         yield from self.get_approve_helper().execute(approver, access_request_id)
+        self.__metrics_helper.reset_consecutive_errors()
 
     @re_botcmd(pattern=DENY_REGEX, flags=re.IGNORECASE, prefixed=False, hidden=True)
     def deny(self, message, match):
         """
         Deny a grant request (resource or role)
         """
+        self.__metrics_helper.increment_received_messages()
         access_request_id = re.sub(DENY_REGEX, r"\1", match.string.replace("*", ""), flags=re.IGNORECASE).upper()
         denial_reason = re.sub(DENY_REGEX, r"\2", match.string.replace("*", ""), flags=re.IGNORECASE)
         admin = message.frm
         yield from self.get_deny_helper().execute(admin, access_request_id, denial_reason)
+        self.__metrics_helper.reset_consecutive_errors()
 
     #pylint: disable=unused-argument
     @re_botcmd(pattern=SHOW_RESOURCES_REGEX, flags=re.IGNORECASE, prefixed=False, re_cmd_name_help="show available resources [--filter expression]")
     def show_resources(self, message, match):
         """
         Show all available resources
         """
+        self.__metrics_helper.increment_received_messages()
         if not self._platform.can_show_resources(message):
             return
         flags = self.get_arguments_helper().extract_flags(message.body)
         yield from self.get_show_resources_helper().execute(message, flags=flags)
+        self.__metrics_helper.reset_consecutive_errors()
 
     #pylint: disable=unused-argument
     @re_botcmd(pattern=SHOW_ROLES_REGEX, flags=re.IGNORECASE, prefixed=False, re_cmd_name_help="show available roles")
     def show_roles(self, message, match):
         """
         Show all available roles
         """
+        self.__metrics_helper.increment_received_messages()
         if not self._platform.can_show_roles(message):
             return
         yield from self.get_show_roles_helper().execute(message)
+        self.__metrics_helper.reset_consecutive_errors()
 
     @re_botcmd(pattern=r"whoami", flags=re.IGNORECASE, prefixed=False, name="accessbot-whoami")
     def whoami(self, message, _):
@@ -279,17 +301,22 @@ def get_arguments_helper(self):
     def get_whoami_helper(self):
         return WhoamiHelper(self)
 
+    def get_metrics_helper(self):
+        return self.__metrics_helper
+
     def get_admin_ids(self):
         return self._platform.get_admin_ids()
 
     def enter_grant_request(self, request_id: str, message, sdm_object, sdm_account, grant_request_type: GrantRequestType, flags: dict = None):
         self.__grant_requests_helper.add(request_id, message, sdm_object, sdm_account, grant_request_type, flags)
+        self.__metrics_helper.increment_pending_requests()
 
     def grant_requests_exists(self, request_id: str):
         return self.__grant_requests_helper.exists(request_id)
 
     def remove_grant_request(self, request_id):
         self.__grant_requests_helper.remove(request_id)
+        self.__metrics_helper.decrement_pending_requests()
 
     def get_grant_request(self, request_id):
         return self.__grant_requests_helper.get(request_id)
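
The error counter in this file is driven by wrapping the bot's `send_simple_reply`: any reply whose text starts with `MSG_ERROR_OCCURRED` bumps the consecutive-errors count before the reply is forwarded, and each command resets the count once it finishes normally. A minimal sketch of that wrapping pattern follows. `FakeBot`, `FakeMetrics`, and `wrap_send_simple_reply` are hypothetical stand-ins, and unlike the commit (which forwards to `ErrBot.send_simple_reply`) the sketch forwards to the method it captured before patching.

```python
MSG_ERROR_OCCURRED = "An error occurred, please contact your SDM admin"


class FakeMetrics:
    """Hypothetical counter holder standing in for the metrics helper."""

    def __init__(self):
        self.consecutive_errors = 0

    def increment_consecutive_errors(self):
        self.consecutive_errors += 1

    def reset_consecutive_errors(self):
        self.consecutive_errors = 0


class FakeBot:
    """Hypothetical bot that just records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def send_simple_reply(self, msg, text, private=False, threaded=False):
        self.sent.append(text)


def wrap_send_simple_reply(bot, metrics):
    original = bot.send_simple_reply  # bound method captured before patching

    def send_simple_reply(msg, text, private=False, threaded=False):
        # Count the reply as an error only when it carries the error prefix.
        if text.startswith(MSG_ERROR_OCCURRED):
            metrics.increment_consecutive_errors()
        return original(msg, text, private=private, threaded=threaded)

    return send_simple_reply


bot = FakeBot()
metrics = FakeMetrics()
bot.send_simple_reply = wrap_send_simple_reply(bot, metrics)

bot.send_simple_reply(None, MSG_ERROR_OCCURRED + ": boom")
bot.send_simple_reply(None, MSG_ERROR_OCCURRED)
errors_after_failures = metrics.consecutive_errors

metrics.reset_consecutive_errors()  # what a command does after finishing normally
bot.send_simple_reply(None, "Granting access")
```

Matching on the message prefix keeps the change local to one function, at the cost of tying the metric to the exact wording of the error text.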

plugins/sdm/lib/helper/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@
 from .arguments_helper import *
 from .grant_request_helper import *
 from .whoami_helper import *
+from .metrics_helper import *
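
The `metrics_helper` module itself is not shown in this part of the diff. To make the call sites elsewhere in the commit easier to follow, here is a purely illustrative in-memory stand-in. The method names are the ones called in this diff, but the class name, the storage, and the mapping from methods to metric names are assumptions, not the commit's implementation.

```python
class InMemoryMetricsHelper:
    """Hypothetical stand-in; not the MetricsHelper added by this commit."""

    def __init__(self):
        # Assumed mapping of counters to the metric names in the documentation.
        self.values = {
            "accessbot_total_pending_access_requests": 0,
            "accessbot_total_manual_approvals": 0,
            "accessbot_total_consecutive_errors": 0,
        }

    def increment_pending_requests(self):
        self.values["accessbot_total_pending_access_requests"] += 1

    def decrement_pending_requests(self):
        self.values["accessbot_total_pending_access_requests"] -= 1

    def increment_manual_approvals(self):
        self.values["accessbot_total_manual_approvals"] += 1

    def increment_consecutive_errors(self):
        self.values["accessbot_total_consecutive_errors"] += 1

    def reset_consecutive_errors(self):
        self.values["accessbot_total_consecutive_errors"] = 0


helper = InMemoryMetricsHelper()
helper.increment_pending_requests()  # as in enter_grant_request
helper.increment_pending_requests()
helper.decrement_pending_requests()  # as in remove_grant_request
helper.increment_manual_approvals()
helper.increment_consecutive_errors()
helper.reset_consecutive_errors()
print(helper.values)
```

The pending-requests value moves in both directions while the approvals value only grows, which is why a gauge-like value that can be decremented is needed for the former.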

plugins/sdm/lib/helper/approve_helper.py

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,7 @@
 from grant_request_type import GrantRequestType
 from .base_evaluate_request_helper import BaseEvaluateRequestHelper
 from ..util import convert_duration_flag_to_timedelta, get_formatted_duration_string
+from metric_type import MetricGaugeType
 
 
 class ApproveHelper(BaseEvaluateRequestHelper):
@@ -24,6 +25,7 @@ def __approve_assign_role(self, grant_request):
         self._bot.add_thumbsup_reaction(grant_request['message'])
         self._bot.remove_grant_request(grant_request['id'])
         yield from self.__notify_assign_role_request_granted(grant_request['message'], grant_request['sdm_object'].name)
+        self._bot.get_metrics_helper().increment_manual_approvals()
 
     def __approve_access_resource(self, grant_request):
         duration = grant_request['flags'].get('duration')
@@ -37,6 +39,7 @@ def __approve_access_resource(self, grant_request):
         self._bot.add_thumbsup_reaction(grant_request['message'])
         self._bot.remove_grant_request(grant_request['id'])
         yield from self.__notify_access_request_granted(grant_request['message'], resource, duration, needs_renewal)
+        self._bot.get_metrics_helper().increment_manual_approvals()
 
     def __grant_temporal_access_by_role(self, role_name, account_id):
         grant_start_from = datetime.datetime.now(datetime.timezone.utc)

plugins/sdm/lib/helper/base_grant_helper.py

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,8 @@
     convert_duration_flag_to_timedelta, get_approvers_channel
 from grant_request_type import GrantRequestType
 
+from metric_type import MetricGaugeType
+
 
 class BaseGrantHelper(ABC):
     def __init__(self, bot, sdm_service, admin_ids, grant_type, auto_approve_tag_key, auto_approve_all_key):
@@ -91,6 +93,7 @@ def __auto_approve_access_request(self, message, sdm_object, sdm_account, execut
         self.__enter_grant_request(message, sdm_object, sdm_account, self.__grant_type, request_id, flags=flags)
         self.__bot.log.info("##SDM## %s GrantHelper.__grant_%s granting access", execution_id, self.__grant_type)
         yield from self.__bot.get_approve_helper().evaluate(request_id, is_auto_approve=True)
+        self.__bot.get_metrics_helper().increment_auto_approvals()
 
     def __request_manual_approval(self, message, sdm_object, sdm_account, execution_id, request_id, sender_nick, flags: dict):
         approvers_channel_name = sdm_object.tags.get(self.__bot.config['APPROVERS_CHANNEL_TAG']) if sdm_object.tags else None
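
Several of the metric calls in this commit sit directly after a `yield from`. In a Python generator that code only runs once the caller has consumed every value the inner generator yields, so a caller that stops iterating early, or an exception raised inside the inner generator, skips the increment. The sketch below shows this ordering with hypothetical names (`notify`, `approve`, and a `calls` list standing in for the metrics helper).

```python
calls = []


def notify():
    # Stand-in for a helper that yields reply messages one by one.
    yield "request received"
    yield "access granted"


def approve():
    yield from notify()
    # Runs only after notify() has been fully consumed by the caller.
    calls.append("increment_manual_approvals")


gen = approve()
first = next(gen)                # only the first message has been consumed
calls_after_first = list(calls)  # the metric has not been touched yet
rest = list(gen)                 # draining the generator triggers the increment
```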
