Description
I am finding that choropleth maps that use a categorical value for the color parameter produce significantly larger HTML files than those that use a continuous value. This appears to be due to duplicate copies of the boundary data within the categorical maps' HTML files.
Here's a simple piece of code that reproduces this problem (I ran it using the latest versions of both Plotly and GeoPandas):
# Categorical map tests
import plotly.express as px
import geopandas as gpd
gdf_states = gpd.read_file(
'https://raw.githubusercontent.com/ifstudies/simplified_shapefiles/\
refs/heads/main/state_shapefiles_simplified.json')
# Creating meaningless category columns for mapping purposes:
gdf_states['Continuous_Vals'] = gdf_states.index % 4 + 1
gdf_states['Categorical_Vals'] = gdf_states[
'Continuous_Vals'].astype('str')
gdf_states.set_index('NAME', inplace = True)
# Creating a map with a continuous color scale:
# The following code was based on
# https://plotly.com/python/choropleth-maps/
# and https://plotly.com/python/tile-county-choropleth/ .
fig_continuous_scale_map = px.choropleth_map(
gdf_states, geojson = gdf_states.geometry,
locations = gdf_states.index,
zoom = 3, center = {'lat':37, 'lon': -96},
color = 'Continuous_Vals', map_style = 'white-bg')
fig_continuous_scale_map.write_html('fig_continuous_scale_map.html',
include_plotlyjs='cdn')
# Creating a map with a categorical legend:
# The following code was based on
# https://plotly.com/python/choropleth-maps/
# and https://plotly.com/python/tile-county-choropleth/ .
fig_categorical_map = px.choropleth_map(
gdf_states, geojson = gdf_states.geometry,
locations = gdf_states.index,
zoom = 3, center = {'lat':37, 'lon': -96},
color = 'Categorical_Vals', map_style = 'white-bg')
fig_categorical_map.write_html('fig_categorical_map.html',
include_plotlyjs='cdn')
And here are screenshots of the basic maps created by this script:
Continuous map: [screenshot]
The two choropleth maps created by this code are almost identical, except that the first uses a continuous scale for its color parameter and the second uses a categorical scale. However, while the continuous-scale map is around 290 KB in size, the categorical one is 1.1 MB in size.
A review of the HTML files explains why this is the case: state boundaries are defined just once within the continuous-scale map, but four times within the categorical map (once for each category, I assume). This results in a much larger file.
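The duplication can also be checked without opening the HTML files, by looking at the figure's traces. The following is a minimal sketch, assuming the boundaries are stored under each trace's `geojson` key in the dict returned by `fig.to_dict()`. The helper name `geojson_payloads` is hypothetical (it is not part of Plotly), and the small `demo` dict only stands in for a real figure so that the snippet runs on its own.

```python
import json


def geojson_payloads(fig_dict):
    """Return (trace name, serialized GeoJSON size in bytes) for each
    trace in a figure dict that carries a `geojson` entry."""
    payloads = []
    for trace in fig_dict.get("data", []):
        geojson = trace.get("geojson")
        if geojson is None:
            continue  # this trace has no boundary data attached
        # default=str is a fallback for values json cannot encode directly
        size = len(json.dumps(geojson, default=str).encode("utf-8"))
        payloads.append((trace.get("name", ""), size))
    return payloads


# Stand-in for fig_categorical_map.to_dict(): four traces (one per
# category), each holding its own copy of the same boundary data.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "A",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        }
    ],
}
demo = {
    "data": [
        {"type": "choroplethmap", "name": str(i), "geojson": boundaries}
        for i in range(1, 5)
    ]
}

payloads = geojson_payloads(demo)
total_bytes = sum(size for _, size in payloads)
print(f"{len(payloads)} traces with geojson, {total_bytes} bytes in total")
```

With the real figures, this would be called as `geojson_payloads(fig_continuous_scale_map.to_dict())` and `geojson_payloads(fig_categorical_map.to_dict())`. If the assumption above holds, the first should report one payload and the second one payload per category.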
Obviously, both of these maps are still pretty small, but I'm finding this inefficiency to be an issue for maps with very high numbers of shapes (e.g. Census tracts). One particular map of Census tracts was around 45 MB in size when a continuous scale was used, but 160 MB when a categorical scale was applied.
If there's any way to get categorical maps to store only one set of region boundaries within their HTML files, that would be a huge help!
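In the meantime, one possible workaround is to encode the categories as integers and color them with a stepped (discrete) continuous color scale, so that only a single trace is created. This is an untested sketch rather than a confirmed fix. `discrete_colorscale`, the `Category_Code` column, and the color values are all hypothetical names chosen for illustration. It also assumes that `color_continuous_scale` accepts `[position, color]` pairs and that `range_color` can be used to center each integer code inside its color band.

```python
def discrete_colorscale(colors):
    """Build a stepped colorscale: each color fills an equal-width band
    of the 0..1 range, with no blending between neighboring bands."""
    n = len(colors)
    scale = []
    for i, color in enumerate(colors):
        scale.append([i / n, color])        # start of this color's band
        scale.append([(i + 1) / n, color])  # end of this color's band
    return scale


# Hypothetical category labels, matching the four values in the example.
categories = ["1", "2", "3", "4"]
codes = {category: i for i, category in enumerate(categories)}
scale = discrete_colorscale(["#636EFA", "#EF553B", "#00CC96", "#AB63FA"])

# Usage with the example data (commented out because it needs Plotly,
# GeoPandas, and network access):
# gdf_states['Category_Code'] = gdf_states['Categorical_Vals'].map(codes)
# fig = px.choropleth_map(
#     gdf_states, geojson=gdf_states.geometry,
#     locations=gdf_states.index,
#     zoom=3, center={'lat': 37, 'lon': -96},
#     color='Category_Code', map_style='white-bg',
#     color_continuous_scale=scale,
#     range_color=(-0.5, len(categories) - 0.5))
```

The trade-off is that the map would show a color bar instead of a categorical legend, so this does not replace a proper fix on the Plotly side.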