Enhanced hexbin for ggplot2
Create a package that adds enhanced hexbin capabilities to {ggplot2}
{hexbin} is an R package that allows one to effectively plot and visualize data with large numbers of points. In particular, it allows the user to create plots that do not turn into a solid mass of dots and overwhelm the eye. It is of particular use in finance when conducting stock market studies with many thousands of stocks, but is broadly applicable to other contexts as well.
The basic idea underlying hexbin is simple: it represents a collection of nearby points with a single hexagonal bin (or hexbin) whose size is proportional to the number of points it represents. Quite often, tens or even hundreds of points can be gathered into a single hexbin. This gives a quick visual feel for the density of points while keeping regression lines drawn through the data, and text overlaid on it, visible.
Currently the {hexbin} package is built on lattice, and while basic hexbin plots serve their intended purpose, any additional complexity (e.g. regression lines, text, legends) is not handled well (or at all).
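As a rough illustration of the idea, the sketch below bins simulated data with the existing {hexbin} package. The data, the seed, and the choice of 30 bins are assumptions made for this example only.

```r
library(hexbin)

# Simulated data standing in for a large data set (an assumption for this sketch)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- x + rnorm(n)

# Bin the points into hexagonal cells; xbins is the number of bins across the x range
hb <- hexbin(x, y, xbins = 30)

# Every point falls into exactly one cell, so the cell counts add up to n
total <- sum(hb@count)

# style = "lattice" draws hexagons whose size grows with the cell count;
# the default style, "colorscale", encodes the count as colour instead
plot(hb, style = "lattice")
```

Here 10,000 points collapse into at most a few hundred cells, which is what keeps the plot readable.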
The purpose of this project is to create a package to add enhanced hexbin functionality to {ggplot2}, giving users access to all the power and flexibility that is available in a ggplot while enhancing the ease of creating visually appealing graphs with large numbers of points.
The existing {ggplot2} hexbin layer, geom_hex(), does not appear to make full use of Dan Carr's hexbin algorithm: its hexagons all have the same size and do not adjust to reflect local density.
The best current capability for hexbin plots is in the package {hexbin} on CRAN. It is pretty good, but does not support standard style additions such as legends with flexible placement and line and point overlays, as in the attached plots.
Create a new package gghexbin that allows the hexbinning algorithm from the hexbin package to be used with ggplot2. Do this in such a way that all existing ggplot2 layer capabilities can be used. Illustrate the use of gghexbin by creating examples including, but not limited to, color hexbins, flexible legends (both inside and outside the plot region), and the addition of regression lines and text and graphics annotations.
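For comparison, here is a minimal sketch of the hexagon layer that ships with {ggplot2}, geom_hex(), on simulated data (the data and bin count are assumptions for this example). It draws equally sized hexagons and maps the count to fill colour, while ordinary layers such as a regression line can be added on top.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package to be installed

# Simulated data (an assumption for this sketch)
set.seed(1)
d <- data.frame(x = rnorm(10000))
d$y <- d$x + rnorm(10000)

p <- ggplot(d, aes(x, y)) +
  geom_hex(bins = 30) +                       # equal-sized hexagons, fill = count
  geom_smooth(method = "lm", formula = y ~ x, # a regular ggplot2 layer on top
              se = FALSE, colour = "red")

# Inspect the bins computed for the hexagon layer (layer 1)
hex_data <- layer_data(p, 1)

print(p)
```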
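One possible starting point, sketched under several assumptions, is to run the binning with {hexbin} and hand the result to {ggplot2} as plain polygons. Nothing below is the gghexbin API (which does not exist yet): the variable names, the simulated data, and the hand-computed hexagon geometry are illustrative only, and the vertex offsets should be checked against hexbin::hexcoords() before being relied on.

```r
library(hexbin)
library(ggplot2)

# Simulated data (an assumption for this sketch)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- x + rnorm(n)

hb <- hexbin(x, y, xbins = 30)
centres <- hcell2xy(hb)  # x/y centres of the non-empty cells

# Vertex offsets of a full-size hexagon on the hexbin lattice
# (derived by hand for this sketch, so treat the geometry as an assumption)
dx <- diff(hb@xbnds) / hb@xbins / 2
dy <- diff(hb@ybnds) / (hb@xbins * hb@shape) / (2 * sqrt(3))
vx <- c(dx, dx, 0, -dx, -dx, 0)
vy <- c(dy, -dy, -2 * dy, -dy, dy, 2 * dy)

# Shrink each hexagon so that its area is proportional to its count
scale <- sqrt(hb@count / max(hb@count))

# Six vertices per cell; vx and vy recycle across cells
hexes <- data.frame(
  id    = rep(seq_along(hb@count), each = 6),
  x     = rep(centres$x, each = 6) + rep(scale, each = 6) * vx,
  y     = rep(centres$y, each = 6) + rep(scale, each = 6) * vy,
  count = rep(hb@count, each = 6)
)

# Because the hexagons are ordinary polygons, other ggplot2 layers,
# annotations and legend placement work as usual
p <- ggplot(hexes, aes(x, y)) +
  geom_polygon(aes(group = id, fill = count)) +
  geom_smooth(data = data.frame(x = x, y = y), method = "lm",
              formula = y ~ x, se = FALSE, colour = "red") +
  annotate("text", x = min(x), y = max(y), label = "n = 10000", hjust = 0) +
  theme(legend.position = "bottom")

print(p)
```

A real package would more likely wrap this logic in a ggplot2 Stat and Geom so that users can write a single layer call; the polygon approach here only shows that the two packages can be combined.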
Write a vignette showing how a variety of hexbin plots may be made using gghexbin.
Students should propose a realistic project plan. Quality is more important than quantity, so your proposal may not contain all targeted functionality.
Communications with the current {hexbin} package maintainer, Edzer Pebesma, suggest that it would be best to create the overall hexbin capability needed using {ggplot2} (the current hexbin functionality is built on lattice).
The proposed project will result in a more useful (and nicer looking) hexbin capability.
Students, please contact mentors below after completing at least one of the tests below.
- EVALUATING MENTOR - Thomas Philips: [email protected]
- Doug Martin: [email protected]
- Brian Peterson: [email protected]
- Peter Carl: [email protected]
Students, please do one or more of the following tests before contacting the mentors:
- Demonstrate how to use the current hexbin package capability to plot a large data set (a mentor will provide a data set of stock valuations and returns if necessary, but start by experimenting with one of the data sets in dslabs)
- Compare the resulting hexbin plot to your favorite ggplot of the same data.
- Complete the exercises found at this link.
In addition to the tests above, applicants should demonstrate that they have:
- A very good working knowledge of programming in R
- Familiarity with the construction of R packages
- Good coding standards (Google’s R style guide)
- Experience with GitHub
Please email the results of your tests to the mentors and add a link to your profile or website below.
Students, please post contact information here and send a link to completed test results to the mentors listed above:
- Student name:
- Email:
- University:
- Program:
- Solution to Test Exercises:
Carr, D B, Olsen, A R, and White, D. Hexagon mosaic maps for display of univariate and bivariate geographical data. United States: N. p., 1994. Available here.