Sorted Strings to Fix test flakiness in testHumanPrinterAll #1

Open
wants to merge 1 commit into
base: trunk
Conversation

kavvya97
Owner

@kavvya97 kavvya97 commented Oct 9, 2023

Setup:
Java version: openjdk 11.0.20.1
Maven version: Apache Maven 3.6.3

Description of PR

The test org.apache.hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter#testHumanPrinterAll is flaky. The flakiness occurs because the test relies on HashMap values that are converted to a string and compared against an expected string, and the order in which a HashMap returns its values is not guaranteed. The flakiness can be detected with the NonDex plugin.
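
To illustrate the root cause, here is a minimal sketch, not taken from the Hadoop code. The class name, method names, and sample rows are hypothetical. It shows that a string built from HashMap.values() depends on an unspecified iteration order, while a sorted view of the same values does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; names and data are illustrative, not from Hadoop.
final class HashMapOrderDemo {

    // Joins the values in whatever order HashMap.values() yields them.
    // That order is unspecified, so comparing this string with a fixed
    // expected string is brittle.
    static String joinValues(Map<String, String> tasks) {
        return String.join("\n", tasks.values());
    }

    // Returns the values sorted, giving a view that does not depend on
    // the map's iteration order.
    static List<String> sortedValues(Map<String, String> tasks) {
        List<String> values = new ArrayList<>(tasks.values());
        Collections.sort(values);
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> tasks = new HashMap<>();
        tasks.put("task_m_000007", "task_m_000007 (7sec)");
        tasks.put("task_m_000005", "task_m_000005 (5sec)");
        tasks.put("task_m_000003", "task_m_000003 (3sec)");

        // The line order of this output is not guaranteed.
        System.out.println(joinValues(tasks));
        // This list is always in sorted order.
        System.out.println(sortedValues(tasks));
    }
}
```

NonDex makes this kind of hidden assumption visible by shuffling the iteration order of collections whose order is unspecified.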

Steps to reproduce:

  1. git clone https://github.com/apache/hadoop
  2. mvn install -pl hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core -am -DskipTests
  3. Run the tests
    mvn -pl hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core test -Dtest=org.apache.hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter#testHumanPrinterAll
  4. Run the test with the Nondex tool and observe the test results
    mvn -pl hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=org.apache.hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter#testHumanPrinterAll
    • The test fails when NonDex runs in ONE mode (-DnondexMode=ONE), which assumes a deterministic implementation but shuffles once so that the order differs from the underlying implementation, and in FULL mode (-DnondexMode=FULL), which shuffles differently on each call.

For code changes:

The test uses a HashMap to store job information and builds the string using HashMap.values(). However, a HashMap does not guarantee the order in which its values are returned, so the string comparison fails. The following error occurs:

testHumanPrinterAll(org.apache.hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter)  Time elapsed: 0.297 s  <<< FAILURE!
org.junit.ComparisonFailure:
expected:<...8501754_0001_m_00000[7	6-Oct-2011 19:15:09	6-Oct-2011 19:15:16 (7sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000006	6-Oct-2011 19:15:08	6-Oct-2011 19:15:14 (6sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000005	6-Oct-2011 19:15:07	6-Oct-2011 19:15:12 (5sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000004	6-Oct-2011 19:15:06	6-Oct-2011 19:15:10 (4sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000003	6-Oct-2011 19:15:05	6-Oct-2011 19:15:08 (3sec)

SUCCEEDED REDUCE task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error
====================================================
task_1317928501754_0001_r_000008	6-Oct-2011 19:15:10	6-Oct-2011 19:15:18 (8sec)

SUCCEEDED JOB_CLEANUP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error
====================================================
task_1317928501754_0001_c_000009	6-Oct-2011 19:15:11	6-Oct-2011 19:15:20 (9sec)

JOB_SETUP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	HostName	Error	TaskLogs
====================================================
attempt_1317928501754_0001_s_000001_1	6-Oct-2011 19:15:03	6-Oct-2011 19:15:04 (1sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_s_000001_1

MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	HostName	Error	TaskLogs
====================================================
attempt_1317928501754_0001_m_000007_1	6-Oct-2011 19:15:09	6-Oct-2011 19:15:16 (7sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000007_1
attempt_1317928501754_0001_m_000002_1	6-Oct-2011 19:15:04	6-Oct-2011 19:15:06 (2sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000002_1
attempt_1317928501754_0001_m_000006_1	6-Oct-2011 19:15:08	6-Oct-2011 19:15:14 (6sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000006_1
attempt_1317928501754_0001_m_000005_1	6-Oct-2011 19:15:07	6-Oct-2011 19:15:12 (5sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000005_1
attempt_1317928501754_0001_m_000004_1	6-Oct-2011 19:15:06	6-Oct-2011 19:15:10 (4sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000004_1
attempt_1317928501754_0001_m_000003_1	6-Oct-2011 19:15:05	6-Oct-2011 19:15:08 (3sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000003]_1

REDUCE task list...> but was:<...8501754_0001_m_00000[5	6-Oct-2011 19:15:07	6-Oct-2011 19:15:12 (5sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000006	6-Oct-2011 19:15:08	6-Oct-2011 19:15:14 (6sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000004	6-Oct-2011 19:15:06	6-Oct-2011 19:15:10 (4sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000007	6-Oct-2011 19:15:09	6-Oct-2011 19:15:16 (7sec)

SUCCEEDED MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error	InputSplits
====================================================
task_1317928501754_0001_m_000003	6-Oct-2011 19:15:05	6-Oct-2011 19:15:08 (3sec)

SUCCEEDED REDUCE task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error
====================================================
task_1317928501754_0001_r_000008	6-Oct-2011 19:15:10	6-Oct-2011 19:15:18 (8sec)

SUCCEEDED JOB_CLEANUP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	Error
====================================================
task_1317928501754_0001_c_000009	6-Oct-2011 19:15:11	6-Oct-2011 19:15:20 (9sec)

JOB_SETUP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	HostName	Error	TaskLogs
====================================================
attempt_1317928501754_0001_s_000001_1	6-Oct-2011 19:15:03	6-Oct-2011 19:15:04 (1sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_s_000001_1

MAP task list for job_1317928501754_0001
TaskId		StartTime	FinishTime	HostName	Error	TaskLogs
====================================================
attempt_1317928501754_0001_m_000004_1	6-Oct-2011 19:15:06	6-Oct-2011 19:15:10 (4sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000004_1
attempt_1317928501754_0001_m_000005_1	6-Oct-2011 19:15:07	6-Oct-2011 19:15:12 (5sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000005_1
attempt_1317928501754_0001_m_000003_1	6-Oct-2011 19:15:05	6-Oct-2011 19:15:08 (3sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000003_1
attempt_1317928501754_0001_m_000007_1	6-Oct-2011 19:15:09	6-Oct-2011 19:15:16 (7sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000007_1
attempt_1317928501754_0001_m_000002_1	6-Oct-2011 19:15:04	6-Oct-2011 19:15:06 (2sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000002_1
attempt_1317928501754_0001_m_000006_1	6-Oct-2011 19:15:08	6-Oct-2011 19:15:14 (6sec)	localhost	http://t:1234/tasklog?attemptid=attempt_1317928501754_0001_m_000006]_1

Fix

The fix splits the output string into individual lines and sorts them before performing the comparison. Because both sides are sorted, the comparison passes regardless of the iteration order. An alternative was to convert the output to JSON and compare the result, but the string being built is not JSON, and no library was available to convert it to JSON.
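
The idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the patch: the class and method names are hypothetical, and the sample strings stand in for the printer output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not the actual test code from the patch.
final class SortedLineComparison {

    // Splits the output into lines and returns them in sorted order.
    static List<String> sortedLines(String output) {
        return Arrays.stream(output.split("\n"))
                .sorted()
                .collect(Collectors.toList());
    }

    // True when both strings contain the same lines, ignoring line order.
    static boolean sameLinesIgnoringOrder(String expected, String actual) {
        return sortedLines(expected).equals(sortedLines(actual));
    }

    public static void main(String[] args) {
        String expected = "MAP task list\ntask_m_000007\ntask_m_000005";
        String actual = "MAP task list\ntask_m_000005\ntask_m_000007";
        // The two strings differ only in line order, so this prints true.
        System.out.println(sameLinesIgnoringOrder(expected, actual));
    }
}
```

One trade-off of this approach is that it only checks that the same set of lines appears in both strings. It no longer verifies which section a line belongs to.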

How was this patch tested?

The fix was tested by applying it and rerunning the NonDex plugin, confirming that all the tests pass in both FULL mode and ONE mode.

@bbelide2

  1. Instead of mentioning "the test can fail", you can mention "test can fail due to flakiness" or "test becomes flaky"
  2. You can show the exact error reported when NonDex is run. The error shows the difference between expected and actual values. It will clearly show the problem in the description.
  3. Instead of adding code pointers as hyperlinks, you can add them as embedded which is easy to read. You can refer to the code links I added here
  4. You can consider following this order which may be easier to understand - problem, root cause, fix, setup/how to test?
  5. Did you consider changing HashMap to LinkedHashMap?
  6. [Minor] Nondex hyperlink is not added properly.

@krishnanand5

Most of the changes I wanted to suggest have been suggested by @bbelide2. You can maybe add a line to indicate that the test execution has not been hindered by your changes.

@kavvya97
Owner Author

kavvya97 commented Oct 13, 2023

  1. Instead of mentioning "the test can fail", you can mention "test can fail due to flakiness" or "test becomes flaky"

Fixed.

  2. You can show the exact error reported when NonDex is run. The error shows the difference between expected and actual values. It will clearly show the problem in the description.

The exact error is very large, and it is an assertion error. Pasting the whole string would not be useful.

  3. Instead of adding code pointers as hyperlinks, you can add them as embedded which is easy to read. You can refer to the code links I added here

I am not sure how to embed a link. I tried copying the permalink, but that didn't work. @bbelide2, can you tell me how to embed the code?

  4. You can consider following this order which may be easier to understand - problem, root cause, fix, setup/how to test?

The PR description above follows the default Hadoop PR template.

  5. Did you consider changing HashMap to LinkedHashMap?

Yes. However, HashMap is used in multiple places for constructing the string, including the source code and the tests. The above fix addresses the tests without touching the source code.

  6. [Minor] Nondex hyperlink is not added properly.

Fixed.
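
For context on the LinkedHashMap alternative discussed above, here is a minimal sketch with hypothetical names and data, not Hadoop code. It shows the property that suggestion relies on: LinkedHashMap iterates in insertion order, so a string built from its values is stable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo; the task IDs and row format are illustrative only.
final class InsertionOrderDemo {

    // Stores one row per task ID in a LinkedHashMap and returns the rows
    // in the order the map iterates them, which is insertion order.
    static List<String> rowsInInsertionOrder(List<String> taskIds) {
        Map<String, String> rows = new LinkedHashMap<>();
        for (String id : taskIds) {
            rows.put(id, id + " row");
        }
        return new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        // Prints the rows in the same order the IDs were supplied.
        System.out.println(rowsInInsertionOrder(List.of("m_000007", "m_000002", "m_000006")));
    }
}
```

As noted in the reply, adopting this would mean changing the map type in the source code as well as in the tests, which the test-only fix avoids.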

@bbelide2

Looks good now. For embedding the code, you just have to select the code, copy permalink and paste the link in the description.

@kavvya97
Owner Author

Looks good now. For embedding the code, you just have to select the code, copy permalink and paste the link in the description.

Hey Sukesh, I have done the same. Unfortunately, I am not able to get the code to show up in the PR description.

@kavvya97 kavvya97 force-pushed the nondex-testHistoryViewPrinterAll branch 2 times, most recently from 0ce8ce9 to 32af584 Compare October 31, 2023 15:06
@kavvya97 kavvya97 force-pushed the nondex-testHistoryViewPrinterAll branch from 32af584 to 777daa6 Compare October 31, 2023 19:38
@lxb007981

This fix looks good and elegant to me. Using containsExactlyInAnyOrderElementsOf is a good idea.

@harshith2000

This fix looks good to me. Converting the string to a list and checking that both lists contain the same elements, regardless of order, seems like a good idea.


6 participants