Create helper scripts to help analyze what happened with cluster #228

Open
mattsfuller opened this issue Sep 1, 2016 · 1 comment

@mattsfuller

We need to create some scripts (maybe as part of presto-admin) that help identify issues with a Presto cluster.

Ideally we should be able to detect:

  • long GC pauses, based on the GC log if GC logging is enabled
  • JVM crashes

The scripts would produce a timeline of the events that happened in a given time period:
{code}
presto-admin show-events 24h
2015-01-01 00:00:000 Node 10.10.0.1 started
2015-01-01 01:00:000 Node 10.10.0.2 crashed (Out of memory error)
2015-01-01 02:00:000 Node 10.10.0.3 long STW GC pause (22.003 seconds)
{code}
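
As a rough illustration of the timeline idea, here is a minimal Python sketch. It assumes events have already been extracted from the logs; the `Event` tuple, `parse_window`, and `show_events` are hypothetical names, not existing presto-admin code, and the window syntax (`24h`, `30m`, `7d`) is only a guess at what `show-events 24h` might accept.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    """One timeline entry (hypothetical structure, not part of presto-admin)."""
    when: datetime
    node: str
    what: str


def parse_window(spec):
    """Turn a window such as '24h', '30m' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", spec)
    if match is None:
        raise ValueError(f"unsupported window: {spec!r}")
    units = {"m": "minutes", "h": "hours", "d": "days"}
    return timedelta(**{units[match.group(2)]: int(match.group(1))})


def show_events(events, window, now):
    """Return formatted lines for events inside the window, oldest first."""
    cutoff = now - parse_window(window)
    # NamedTuples compare field by field, so sorting orders by timestamp first.
    recent = sorted(e for e in events if cutoff <= e.when <= now)
    return [f"{e.when:%Y-%m-%d %H:%M:%S} Node {e.node} {e.what}" for e in recent]
```

Events collected from every node can be passed in as a single list; sorting by timestamp is what turns the per-node logs into one cluster-wide timeline.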

We should be able to do this based on the GC and launcher logs.
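
A minimal sketch of the log scanning, under stated assumptions: the GC log is assumed to come from a HotSpot JVM started with `-XX:+PrintGCDateStamps` and `-XX:+PrintGCApplicationStoppedTime`, which writes "Total time for which application threads were stopped" lines, and the launcher log is assumed to capture the JVM's error output. Other JVM versions and flags log differently, stopped-time lines also cover non-GC safepoints, and the function names and the 10 second threshold are hypothetical.

```python
import re
from datetime import datetime

# Assumed line shape, for example:
# 2015-01-01T02:00:00.000+0000: 7200.123: Total time for which application
# threads were stopped: 22.0030000 seconds, Stopping threads took: ...
STOPPED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+[+-]\d{4}: .*"
    r"Total time for which application threads were stopped: "
    r"(?P<secs>\d+\.\d+) seconds"
)


def long_gc_pauses(lines, threshold_seconds=10.0):
    """Yield (timestamp, pause_seconds) for each pause at or above the threshold."""
    for line in lines:
        match = STOPPED.match(line)
        if match is None:
            continue
        seconds = float(match.group("secs"))
        if seconds >= threshold_seconds:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
            yield when, seconds


def has_out_of_memory(lines):
    """Report whether any launcher log line mentions a Java OutOfMemoryError."""
    return any("java.lang.OutOfMemoryError" in line for line in lines)
```

Both functions take any iterable of lines, so they work on an open file object as well as on logs already gathered from the nodes.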

@mattsfuller
Author

This is an extension to the existing presto-admin collect logs command. Basically, it would look through the logs (and maybe also JMX stats) to produce a timeline of what happened on the cluster.
This seems to me like a fun hackathon project, but not something that's essential to work on right now.
