With the latest TonY code, I've noticed that the TonY Portal shows applications as RUNNING even after they have finished. Consequently, when I click on the "<application_id>" link, it returns an error "Cannot display events because job is still running". However, the tony-final.xml and RM links still work.
erwa changed the title from "TonY History Server shows job status as RUNNING even after it has finished" to "TonY Portal shows job status as RUNNING even after it has finished" on May 2, 2019.
If the job finishes normally, the .jhist.inprogress file should be renamed to .jhist. This happens at the end of ApplicationMaster.run(): eventHandler.stop() is called, and it in turn calls moveInProgressToFinal().
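For anyone unfamiliar with this lifecycle, here is a minimal sketch of the rename-on-stop step. It is not TonY's actual code. The class name SketchEventHandler is hypothetical, it uses java.nio on a local filesystem (TonY writes history to HDFS through Hadoop's FileSystem API), and the file naming is simplified to <appId>.jhist.inprogress. The real file names may carry additional fields.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical, simplified stand-in for the AM-side event handler.
// Assumption: history files live on a local filesystem (java.nio), not HDFS.
class SketchEventHandler {
    static final String IN_PROGRESS_SUFFIX = ".jhist.inprogress";
    static final String FINAL_SUFFIX = ".jhist";

    private final Path inProgressFile;

    SketchEventHandler(Path jobDir, String appId) {
        this.inProgressFile = jobDir.resolve(appId + IN_PROGRESS_SUFFIX);
    }

    // Called when the job starts: create the (empty) in-progress history file.
    void start() throws IOException {
        Files.createDirectories(inProgressFile.getParent());
        Files.createFile(inProgressFile);
    }

    // Called at the end of a normal run. If the AM is killed, this never runs,
    // which is why the file can stay ".jhist.inprogress" forever.
    Path stop() throws IOException {
        return moveInProgressToFinal();
    }

    // Rename <appId>.jhist.inprogress -> <appId>.jhist in the same directory.
    private Path moveInProgressToFinal() throws IOException {
        String name = inProgressFile.getFileName().toString();
        String base = name.substring(0, name.length() - IN_PROGRESS_SUFFIX.length());
        Path finalFile = inProgressFile.resolveSibling(base + FINAL_SUFFIX);
        return Files.move(inProgressFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A same-directory rename is used so that a reader never sees both files or a half-written final file. On HDFS the equivalent would be a FileSystem rename.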
What might be happening is this. When the job starts running, the TonY Portal loads it into its in-memory cache as a RUNNING job. When the job finishes, its state isn't updated to SUCCEEDED/FAILED until the next time the HistoryFileMover runs. One way to speed this up, and have the job's state updated immediately, is for the job to call an "I'm finished" endpoint on the TonY Portal. That way the portal knows it can process that job right away and update its in-memory state.
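To make the proposed "I'm finished" callback concrete, here is a rough sketch of the portal-side cache update it would trigger. Everything here is hypothetical: JobStateCache, JobState, and markFinished are illustrative names, not existing TonY Portal classes. The HTTP route, authentication, and the accompanying file move are left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical job states tracked by the portal.
enum JobState { RUNNING, SUCCEEDED, FAILED, KILLED }

// Hypothetical sketch of the portal's in-memory job cache.
class JobStateCache {
    private final Map<String, JobState> states = new ConcurrentHashMap<>();

    // Job discovered via its .jhist.inprogress file: treated as RUNNING.
    void recordRunning(String appId) {
        states.putIfAbsent(appId, JobState.RUNNING);
    }

    // Handler for the proposed "I'm finished" call from the AM. It updates the
    // state immediately instead of waiting for the next HistoryFileMover pass.
    // Only a RUNNING entry is replaced; returns true if the cache changed.
    boolean markFinished(String appId, JobState finalState) {
        if (finalState == JobState.RUNNING) {
            throw new IllegalArgumentException("final state must be terminal");
        }
        return states.replace(appId, JobState.RUNNING, finalState);
    }

    Optional<JobState> get(String appId) {
        return Optional.ofNullable(states.get(appId));
    }
}
```

The conditional replace makes the callback idempotent. A duplicate or late notification, or one that races with the HistoryFileMover, won't overwrite a state that has already been finalized.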
Jobs that get KILLED partway through are trickier. As far as I know, if you "yarn kill" an application, whether from the CLI or the UI, the application doesn't get a chance to shut down gracefully (e.g., to move its history file to a finished state). (@hungj, I think this would be a nice feature to add to YARN.) As a result, the history file may remain .jhist.inprogress forever, and the TonY Portal will keep showing the job as RUNNING. Currently, these job files are eventually cleaned up by the HistoryFilePurger once they exceed the retention period (30 days by default). However, it would be nice if the HistoryFileMover also periodically checked in-progress jobs against the RM to detect KILLED jobs. If a job has a .jhist.inprogress file but the RM reports that it has already been KILLED, the HistoryFileMover can go ahead, move the files, and update its in-memory state.
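Here is a sketch of what that extra HistoryFileMover check could look like. It is one possible shape, not TonY's implementation. Several parts are assumptions: the directory layout (<intermediateDir>/<appId>/<appId>.jhist.inprogress), the RmStateLookup interface, and the KilledJobSweeper class. A real lookup could wrap YARN's YarnClient.getApplicationReport(...).getYarnApplicationState(). Local java.nio paths stand in for HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Subset of YARN application states needed by this sketch.
enum AppState { RUNNING, FINISHED, FAILED, KILLED }

// Hypothetical RM lookup; a real version might wrap
// YarnClient.getApplicationReport(...).getYarnApplicationState().
interface RmStateLookup {
    AppState stateOf(String appId);
}

// Hypothetical extra step for a HistoryFileMover-style periodic task.
// Assumed layout: <intermediateDir>/<appId>/<appId>.jhist.inprogress
class KilledJobSweeper {
    private static final String IN_PROGRESS_SUFFIX = ".jhist.inprogress";

    private final Path intermediateDir;
    private final Path finishedDir;
    private final RmStateLookup rm;
    // Stand-in for the portal's in-memory job state cache.
    final Map<String, AppState> portalCache = new ConcurrentHashMap<>();

    KilledJobSweeper(Path intermediateDir, Path finishedDir, RmStateLookup rm) {
        this.intermediateDir = intermediateDir;
        this.finishedDir = finishedDir;
        this.rm = rm;
    }

    // One pass: returns the app IDs whose files were moved because the RM said KILLED.
    List<String> sweepOnce() throws IOException {
        // Collect in-progress job dirs first so nothing is moved mid-iteration.
        List<Path> inProgress = new ArrayList<>();
        try (DirectoryStream<Path> jobDirs =
                 Files.newDirectoryStream(intermediateDir, Files::isDirectory)) {
            for (Path jobDir : jobDirs) {
                String appId = jobDir.getFileName().toString();
                if (Files.exists(jobDir.resolve(appId + IN_PROGRESS_SUFFIX))) {
                    inProgress.add(jobDir);
                }
            }
        }

        List<String> moved = new ArrayList<>();
        Files.createDirectories(finishedDir);
        for (Path jobDir : inProgress) {
            String appId = jobDir.getFileName().toString();
            // The file alone can't tell "still running" from "killed"; ask the RM.
            if (rm.stateOf(appId) == AppState.KILLED) {
                Files.move(jobDir, finishedDir.resolve(appId));
                portalCache.put(appId, AppState.KILLED);
                moved.add(appId);
            }
        }
        return moved;
    }
}
```

Jobs the RM still reports as RUNNING are left in place for the next pass. Only jobs whose in-progress file can never be finalized by the AM are moved early, ahead of the HistoryFilePurger's retention window.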
If you see this issue with normally terminated jobs, another thing to check is which version of TonY they're using. You should also check whether, after 5 minutes (the default history file mover check interval), the history files have been moved to the finished/ directory and the job's state has been updated in the TonY Portal.