For example, it hangs here: testTensorBoardPortSetOnlyOnChiefWorker
2019-10-02 23:44:28 INFO ApplicationMaster:886 - Received result registration request with exit code 0 from chief 0
2019-10-02 23:44:28 INFO ApplicationMaster:893 - Unregistering task [chief:0] from Heartbeat monitor..
2019-10-02 23:44:31 INFO ApplicationMaster:851 - All 3 tasks registered.
2019-10-02 23:44:31 INFO ApplicationMaster:851 - All 3 tasks registered.
2019-10-02 23:44:31 INFO ApplicationMaster:886 - Received result registration request with exit code 0 from worker 0
2019-10-02 23:44:31 INFO ApplicationMaster:893 - Unregistering task [worker:0] from Heartbeat monitor..
2019-10-02 23:44:31 INFO ApplicationMaster:886 - Received result registration request with exit code 0 from ps 0
2019-10-02 23:44:31 INFO ApplicationMaster:893 - Unregistering task [ps:0] from Heartbeat monitor..
2019-10-02 23:44:59 INFO Client:871 - Retrying connect to server: . Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-02 23:45:00 INFO Client:871 - Retrying connect to server: . Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-02 23:45:01 INFO Client:871 - Retrying connect to server: . Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-02 23:45:02 INFO Client:871 - Retrying connect to server: . Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-02 23:45:03 INFO Client:871 - Retrying connect to server: . Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
All three tasks unregistered with the AM, but then the ipc.Client retry policy kicks in, which causes the test to exceed its 10-minute timeout. It seems to be trying to talk to the RM, probably to call unregisterApplicationMaster, but it can't contact the RM for some reason.
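To put numbers on the log above, here is a rough Python sketch that reads two of the quoted lines and computes (a) the gap between the last task unregistering and the first connect retry, and (b) the sleep budget implied by the logged retry policy. The helper names (`timestamp`, `retry_budget_seconds`) are made up for illustration, and the budget only counts the fixed sleeps between attempts, not the time each connect attempt itself takes.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the log in this issue.
LOG = """\
2019-10-02 23:44:31 INFO ApplicationMaster:893 - Unregistering task [ps:0] from Heartbeat monitor..
2019-10-02 23:44:59 INFO Client:871 - Retrying connect to server: . Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
"""

POLICY = re.compile(r"maxRetries=(\d+), sleepTime=(\d+) MILLISECONDS")
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TEST_TIMEOUT_S = 10 * 60  # the 10 min test timeout mentioned above


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' stamp of a log line."""
    return datetime.strptime(line[:19], STAMP_FORMAT)


def retry_budget_seconds(line):
    """Total sleep time of one full retry round, from the logged policy."""
    match = POLICY.search(line)
    if match is None:
        raise ValueError("no retry policy found in line")
    max_retries, sleep_ms = int(match.group(1)), int(match.group(2))
    return max_retries * sleep_ms / 1000.0


last_unregister, first_retry = LOG.splitlines()
gap_s = (timestamp(first_retry) - timestamp(last_unregister)).total_seconds()
budget_s = retry_budget_seconds(first_retry)

print(f"gap before first retry: {gap_s:.0f} s")
print(f"sleep budget of one retry round: {budget_s:.0f} s")
print(f"test timeout: {TEST_TIMEOUT_S} s")
```

One round of the logged policy accounts for only about 10 s of sleeping, far short of the 10 min timeout. That would be consistent with the round being repeated by some outer retry layer, but the log excerpt here does not confirm that, so treat it as a hypothesis to check rather than a diagnosis.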