
Choropleth maps with categorical legends are larger than those with continuous legends due to duplicate copies of shape outlines within HTML files #5291

@kburchfiel

I am finding that choropleth maps that use a categorical value for the color parameter are significantly larger than those that use a continuous value. This appears to be due to duplicate copies of boundary data within the categorical maps' HTML files.

Here's a simple script that reproduces this problem (I ran it using the latest versions of both Plotly and GeoPandas):

```python
# Categorical map tests

import plotly.express as px
import geopandas as gpd

gdf_states = gpd.read_file(
    'https://raw.githubusercontent.com/ifstudies/simplified_shapefiles/'
    'refs/heads/main/state_shapefiles_simplified.json')

# Creating meaningless value columns for mapping purposes:
gdf_states['Continuous_Vals'] = gdf_states.index % 4 + 1
# Storing the same values as strings makes Plotly Express treat them
# as discrete categories:
gdf_states['Categorical_Vals'] = gdf_states[
    'Continuous_Vals'].astype('str')
gdf_states.set_index('NAME', inplace=True)

# Creating a map with a continuous color scale:
# The following code was based on
# https://plotly.com/python/choropleth-maps/
# and https://plotly.com/python/tile-county-choropleth/ .
fig_continuous_scale_map = px.choropleth_map(
    gdf_states, geojson=gdf_states.geometry,
    locations=gdf_states.index,
    zoom=3, center={'lat': 37, 'lon': -96},
    color='Continuous_Vals', map_style='white-bg')
fig_continuous_scale_map.write_html('fig_continuous_scale_map.html',
                                    include_plotlyjs='cdn')

# Creating a map with a categorical legend:
# The following code was based on
# https://plotly.com/python/choropleth-maps/
# and https://plotly.com/python/tile-county-choropleth/ .
fig_categorical_map = px.choropleth_map(
    gdf_states, geojson=gdf_states.geometry,
    locations=gdf_states.index,
    zoom=3, center={'lat': 37, 'lon': -96},
    color='Categorical_Vals', map_style='white-bg')
fig_categorical_map.write_html('fig_categorical_map.html',
                               include_plotlyjs='cdn')
```

And here are screenshots of the basic maps created by this script:

Continuous map: (screenshot)

Categorical map: (screenshot)

The two choropleth maps created by this code are almost identical, except that the first uses a continuous scale for its color parameter and the second uses a categorical scale. However, while the continuous-scale map is around 290 KB, the categorical one is about 1.1 MB, nearly four times as large.

A review of the HTML files explains why this is the case: state boundaries are defined just once within the continuous-scale map, but four times within the categorical map (once for each category, I assume). This results in a much larger file.

Obviously, both of these maps are still fairly small, but this inefficiency becomes a real problem for maps with very large numbers of shapes (e.g., Census tracts). One map of Census tracts was around 45 MB with a continuous scale but 160 MB with a categorical scale.

If there's any way to get categorical maps to store only one set of region boundaries within their HTML files, that would be a huge help!
