This is an in-class exercise on hash functions that created at Northeastern University. Its purpose is to:
- illustrate how user-written hash functions affect hash set performance
- challenge students to think about what makes a good hash function for a given corpus
It can be used with Gradescope or GitHub Classroom, and we are working on a Pawtograder version.
The recommended usage is:
- Going over the implementation of HashSet.java
with students, showing them how the
hashCode()function is used. - Discussing the contract for
hashCode(), which requiresequals()objects to have equal hash codes. Point out that this is a legal implementation but that it will lead to maximal collisions, reducing the set table's performance to that of a linked list:
@Override
public int hashCode(){
return 42;
}- Providing students with MyString.java and
challenge them to write a better hash function without calling any existing
implementations of
hashCode(). (It is up to you whether to provide any examples of good hash functions or allow them access to online resources.) - Having students submit their files to a Gradescope assignment you create for immediate feedback on the number of collisions their function causes on a hidden (or visible) corpus.
- Sharing the implementations that perform the best, either in that class session or the next one.
Follow video instructions to create a Gradescope assignment, which can be summarized as follows:
- Download
autograder.zipfrom the latest release or build the zip file by cloning this repository and running./make_gradescope_autograder.sh. - Create a Programming Assignment on Gradescope, enabling a leaderboard.
- Configure the autograder, which includes:
- Selecting the latest Ubuntu image.
- Selecting JDK 17.
- Uploading
autograder.zip. - Clicking on
Update Autograder. - Uploading a test submission from a sample-submissions subdirectory.
See README-GITHUB-CLASSROOM.md.
Here are ideas for customizing the assignment:
- Replacing the corpus, such as using the names of your students or a list of courses offered at your university.
- De-generifying
HashSetto simplify the code. - Replace
MyStringwith a class students have used for another assignment.
Anyone interested in hash functions, especially for Java, should read Item 11: Always override hashCode when you override equals in Effective Java (3rd edition) by Joshua Bloch (Pearson, 2018).
The Java 1 String.hashCode() implementation
is an interesting historical relic that has the curious
property of O(1) runtime.