Turning data into choropleth maps with Python and Folium #604

Open
hawc2 opened this issue Mar 19, 2024 · 63 comments

Comments

@hawc2
Collaborator

hawc2 commented Mar 19, 2024

Programming Historian in English has received a proposal for an original lesson, 'Turning data into choropleth maps with Python and Folium,' by @adamlporter.

I have circulated this proposal for feedback within the English team. We have considered this proposal for:

  • Openness: we advocate for use of open source software, open programming languages and open datasets
  • Global access: we serve a readership working with different operating systems and varying computational resources
  • Multilingualism: we celebrate methodologies and tools that can be applied or adapted for use in multilingual research-contexts
  • Sustainability: we're committed to publishing learning resources that can remain useful beyond present-day graphical user interfaces and current software versions

We are pleased to have invited @adamlporter to develop this Proposal into a Submission under the guidance of @nabsiddiqui.

The Submission package should include:

  • Lesson text (written in Markdown)
  • Figures: images / plots / graphs (if using)
  • Data assets: codebooks, sample dataset (if using)

We ask @adamlporter to share their Submission package with our Publishing team by email, copying in the editors.

We've agreed a submission date of early April. We ask @adamlporter to contact us if they need to revise this deadline.

When the Submission package is received, our Publishing team will process the new lesson materials, and prepare a Preview of the initial draft. They will post a comment in this Issue to provide the locations of all key files, as well as a link to the Preview where contributors can read the lesson as the draft progresses.

If we have not received the Submission package by April, @nabsiddiqui will attempt to contact @adamlporter. If we do not receive any update, this Issue will be closed.

Our dedicated Ombudspersons are Ian Milligan (English), Silvia Gutiérrez De la Torre (español), Hélène Huet (français), and Luis Ferla (português). Please feel free to contact them at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudspersons will have no impact on the outcome of any peer review.

@charlottejmc
Collaborator

charlottejmc commented May 10, 2024

Hello @nabsiddiqui and @adamlporter,

You can find the key files here:

You can review a preview of the lesson here:


While processing this new lesson, I encountered a couple of queries which I’d like to ask for your help with:

image

Could you please confirm whether these are needed for the lesson? If so, we can host them too.

  • I'm seeing two images with extremely long (and slightly obscure) links at lines 1275 and 1279 on the markdown file. Would you like to keep them in? I can save them locally, then upload and embed them using our liquid syntax instead. However, we are committed to principles of minimal compute, which includes ensuring our pages remain as light as possible. Since these images are not crucial to understanding the lesson (and illustrate a rather easy step!), I wonder whether you would mind if we simply removed them altogether.

Thank you! ✨

@anisa-hawes
Contributor

anisa-hawes commented May 10, 2024

Thank you for setting this up, @charlottejmc!

--

Hello @adamlporter,

Thank you for your patience and collaboration. I apologise for my confusion about how the Colab notebook you created intersects with your lesson. As I mentioned in my email, we are relatively new to handling codebooks within our lessons. The key objective of the guideline notes I shared with you is to make sure learning actions are accessible to all (whatever development environment they choose to work in) and to make sure we are able to sustain the lesson (performing technical maintenance if it is needed) in the future. What I hadn't understood upon first quick-reading, but understand better now, is that this lesson is specifically Colab-based. For that reason, I'd like to suggest that we foreground this fact -- I'd be keen to hear @nabsiddiqui and @hawc2's thoughts, but I might suggest including Colab in the title of this lesson.

The general approach we've developed encourages authors to think of codebooks as a supporting 'asset' that aggregates all the code used in the lesson, allowing readers to run it directly in the cloud, no matter their local technical specifications (as you very neatly express in the lesson). In this case (of a lesson specifically centred on using Colab as a workspace for creating the choropleth maps) I wonder if the accompanying codebook could be pared back completely so that it contains only the code (+ essential line comments) and does not replicate any of the lesson text. We suggest that headings and subheadings are kept in place and advise that this mirrors the lesson's heading structure to support readers' navigation. We've made a new copy of the codebook which is hosted within our Google workspace. (I'll send you an invitation to edit that copy now).

If we re-centre the lesson as read on our website (Markdown + inline code) in this way, some small copyediting adjustments will be needed throughout. For example, 'in the next cell' would be replaced with 'in the code block below'. Charlotte and I would be happy to help with this.

Alongside this, some small changes would also be necessary in the section titled Colab to clarify how you intend readers to use the Colab notebook to facilitate the learning actions taught by the lesson. Something like: 'We have set up a Google Colab notebook for you to use as you work through this lesson [...]'. At that point in the text, we'll share a link to the codebook (hosted on our organisational workspace).


One final idea which you may have thought about previously (and which I'm sure you will discuss with Nabeel as you develop the lesson together) is how Colab enables you to show the maps as interactive visualisations. If that is a feature you'd like to explain within the lesson, you could consider creating some .gif animations that express the interactivity. We can handle .gif files in exactly the same way we handle static images within our liquid syntax. So if you want to replace one or two of the static images with animations, we'd be happy to swap the files for you.

--

A couple of other very small notes about the adjustments we made to the Markdown file during set-up:

  • I transformed the html tables you've created into Markdown tables according to our conventions: ad51031. I've checked each one, but I'd like to ask you to double-check that they render as you intended. We can adjust as needed.
  • We've added placeholder alt_text + captions for each of the figure images, which we can work together to populate as the images are finalised.
  • I think Charlotte's question (above) about the two images which render in the Markdown preceded by ![image.png](data:image [...] is probably caused by my confusion too... You mentioned that you inserted the images of Colab's file/save functions directly into your notebook, and that function code transferred to the Markdown. Did you intend that these two screenshots of the file menu would be included as figures in the lesson? We can slot them into the sequence if you want to include them. My sense is that a sentence to describe the action of opening the file menu could be just as effective, (and preferred in terms of accessibility).
    --

@adamlporter
Collaborator

adamlporter commented May 12, 2024 via email

@charlottejmc
Collaborator

charlottejmc commented May 15, 2024

Thanks very much @adamlporter,

I've uploaded the additional assets for this lesson here, including the Fatal Force documentation (README.md).

  • Because Fatal Force's dataset is continuously updated, I think it will be important to clarify within the lesson that data taken from the original GitHub repository will be different to the data in the .csv file, which I have captured and uploaded to the assets folder today.

  • Both the Fatal Force and US Census datasets are open source, but we'll still want to reference them accurately using endnotes. If you have a suggested citation for them please do reply in a comment, or we can work on it together.

  • I also just want to make sure you double check the code at line 268 now, as I suppose it might need a slight adjustment if readers are using the cb_2021_us_county_5m.zip file from our repo, rather than the online link.

We've also removed the two screenshots that illustrated downloading the files.

Thanks again,

Charlotte ✨

@anisa-hawes anisa-hawes moved this from 1 Submission to 2 Initial Edit in Active Lessons May 15, 2024
@anisa-hawes
Contributor

anisa-hawes commented May 15, 2024

Thank you, @adamlporter (and @charlottejmc!)


Hello @nabsiddiqui,

You'll note that a few general thoughts and queries were raised during the Submission set-up, to which Adam has responded. Charlotte has noted a few practical details which we will be happy to collaborate on with Adam during Phase 3 Revision 1.

This Submission is ready for your Initial Edit! ✨


What's happening now?

Hello @adamlporter. Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.

In this Phase, your editor Nabeel @nabsiddiqui will read your lesson, and provide some initial feedback. Nabeel will post feedback and suggestions as a comment in this Issue, so that you can revise your draft in the following Phase 3: Revision 1.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Assistant (@charlottejmc)
All  Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@nabsiddiqui)
Expected completion date? : June 12
Section Phase 3 <br> Revision 1
Who's responsible? : Author (@adamlporter)
Expected timeframe? : ~30 days after feedback is received

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@nabsiddiqui
Collaborator

Thank you, @adamlporter, @charlottejmc, @anisa-hawes. I plan to get this done sometime in the next week or two. I will let you know if I need anything in the meantime.

@nabsiddiqui
Collaborator

Hello @adamlporter,

Thank you for this wonderful article. My initial edits and thoughts are below:

General Comments

Comments are based on the paragraph numbers on the draft, which can be seen here: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/data-into-choropleth-maps-with-python-and-folium.

Overall, I really enjoyed the article and think it is a good explanation of Folium. I felt the organization was well done. I've divided my comments into these general comments and line edits.

  1. I do not believe that in the section discussing the different ways to get a scale variable that it is necessary to talk about all three methods. You mention that there are issues with capping the scale. I would not include the scale information as you already discuss many of the issues that come with it. Instead, I would remove these methods and just use the log scale. You will then need to also remove the part about
  2. You should mention earlier on the issue with normalizing data for choropleth maps. It is fine to not include this in the main text, but there should be more information earlier on with some of the issues that come about not normalizing the data and stating that there is information in the Appendix for those interested in looking at it.
  3. When the map is displayed throughout the article, the "m" is easy to miss in the code. Rename this variable to something more descriptive so that it is easier to notice.
  4. Since the data changes, it would be useful to add a prominent note early on that readers may not get the results shown in the lesson. You could also "halt" the data at some point, upload it to the Programming Historian GitHub, and use that throughout the tutorial.
  5. The LaTeX code is not working throughout the article. This is largely due to our renderer I believe. @anisa-hawes would know more. We may need to have images of the formula.

Line Edits

Introduction

Paragraph 1-The link to the Covid-19 infection/death rates leads to a page that requires you to log in to ArcGIS; Combine with paragraph 2

Paragraph 3-The last sentence isn’t necessary here as it is explained in the rest of the tutorial

Paragraph 4-“How to” -> “Know how to”; “One issue” -> “Understand that a key issue”

Paragraph 6: Combine sentences such as: “To create the maps, we will use Folium, a Python library…”

Folium

Paragraph 8-"CSS and Javascript,” -> “CSS and Javascript — if you need some help, see Programming Historian’s article….

Paragraph 10-“advanced features:” -> “advanced features, such as…”

Colab

Paragraph 19-This paragraph is not needed, as Paragraphs 18 and 22 cover all this

Paragraph 24-“In the next cell” -> “In the first cell”; When I ran the code in Google Colab, it said that the requirements are already met. You may want to double check whether it's still the case that Colab requires this to be installed. Otherwise, you can remove this and the earlier wording about geopandas.

Fatal Force Data

Paragraph 29-Let people know that the sample data they get will likely be different than the one displayed in the lesson

County Geometry Data

Paragraph 47-The first sentence is confusing. Match what data in ff_df? Be specific.; “They are combined in the” -> “The numbers are combined in the”

Preparing the Data

Paragraph 56-“Doesn’t have a” -> “Doesn’t have an”

Define the Question

Paragraph 67-“Now that we have (1) a DF with data….->”Now that we have a DF with data (ff_df) and a DF with county geometries (counties) that share a common field (FIPS), we are ready to draw a map.

Paragraph 71-“someone being killed by a police officer” -> “a police officer killing someone”

Draw the Map

Paragraph 76-This paragraph is not needed. I feel the fact you are using Folium is already implied in the previous paragraph

Paragraph 78-Might want to put another sentence here that says you will explain the code more below and that the reader should just paste the code for now; The code that displays the map at the end is easy to miss. It may be easier to break it into its own code chunk.

Paragraph 80-“choroplet” -> “choropleth”

Paragraph 81-I would recommend getting rid of the note and the following code and bullet point. You already mentioned earlier that it is not necessary to have the columns be the same name.

The Problem of Uneven Distribution of Data

Paragraph 83-Remove second sentence as you mention it again in paragraph 86

Paragraph 96-Dollar signs are present at the end of this. I think this is to show LaTeX? I don’t believe this is rendering properly.

Paragraph 98-At the end of the paragraph, write the following “We will look at two different solutions: The

Solution 1: Fisher-Jenks algorithm

Paragraph 100-May need to get rid of the GeoPandas part if it turns out you don’t actually need this.

Paragraph 101-Again the “m” at the end is easy to miss. Make it a separate code block

Paragraph 104-“but especially at the lower end of the scale…” -> “but the lower end of the scale is illegible.”

Paragraph 105-Delete this paragraph

How to add a scale-value column

Paragraph 109-“For this explanation, I will assume” -> “For this explanation, I will…”

Method 1: Use a Log Scale

Paragraph 126-“We need not” -> “We do not need”

Paragraph 133-For this, you can just write something to the effect of “For the purposes of this tutorial and its learning goals, you do not need to know the specifics of the following code. It simply replaces log values with original values.”; I would also remove the code until later since it seems that someone should paste it in before the part it is used in paragraph 135

Method 2: Cap the Scale Manually

I don’t believe that you need this section. It is too confusing with all the moving pieces. Using the first method is sufficient, so I would remove this and fix the wording earlier to reflect that you are just using the log-method.

Adding a Floating Information Box
If you remove Method 2, you would need to rework this section to not talk about the capped scale.

Paragraph 162-Combine with following paragraph

Paragraph 169-Replace with the following: “Let’s see how this code functions. First, we iterate over all rows in the ‘features’ part of the GeoJSON data.

Paragraph 183-See if this still makes sense if using only the log method.

Add a Title

Paragraph 189-“Here’s what the next cell does:” -> “Let’s look at how the code works before using it for our map.”

Normalizing Population Data

Paragraph 209: "Pandas allows me" -> "Pandas allows us"

@anisa-hawes anisa-hawes moved this from 2 Initial Edit to 3 Revision 1 in Active Lessons May 26, 2024
@anisa-hawes
Contributor

Hello Adam @adamlporter,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.

This phase is an opportunity for you to revise your draft in response to @nabsiddiqui's initial feedback.

I've sent you an invitation to join us as an Outside Collaborator here on GitHub. This will give you the 'write access' you'll need to edit your lesson directly.

We ask authors to work on their own files with direct commits: we prefer you don't fork our repo, or use the Pull Request system to edit in ph-submissions. You can make direct commits to your file here: /en/drafts/originals/data-into-choropleth-maps-with-python-and-folium.md. Charlotte and I can help if you encounter any practical problems or have questions.

When you and Nabeel are both happy with the revised draft, we will move forward to Phase 4: Open Peer Review.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@nabsiddiqui) 
All  Phase 2 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Author (@adamlporter)  
Expected completion date? : June 27
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC) 
Expected timeframe? : ~60 days after request is accepted

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@adamlporter
Collaborator

@nabsiddiqui, thanks for your many good suggestions and corrections. I have edited the document to reflect them:

  • the biggest change is getting rid of the appendix and moving the discussion of normalizing data up to the middle of the lesson
  • I've simplified the discussion of creating the scale-value / cap by only describing the log-scale. I've collapsed the others into a bullet list of "other options."

I've pushed the edited version of the document to Github, so everyone can review it.

@charlottejmc, unfortunately these changes mean that the map images for this section (and some subsequent sections) may need to be re-generated and inserted into the article in the appropriate places. I'm happy to make the images / screen shots.

Because the maps that are generated are interactive, @charlottejmc made the excellent suggestion that I create some animated GIFs that show how the maps can be scrolled / zoomed over. This would be especially nice for the info-tip box feature. I've done a bit of Googling and think I can create several of these.

Where should I send them or how would I set them up in the article?

Also, I realized there is a second census dept datafile that should probably be copied into the assets folder. It can be found here: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv . It is referenced in the section of the lesson about normalizing data.

@anisa-hawes suggested that I set up a colab / jupyter notebook with just the code cells that could be accessed / run by users. I love this idea -- it could be run on Binder, so folks could execute it very easily without needing to download it to their own computers or to open it in Colab.

Do you have suggestions about the sort of language I should use to help users follow from one file to the other? When I've finished this, to whom should I send it?

Thanks! Adam

@adamlporter
Collaborator

I've created three GIFs to show how users can move around the map and zoom in.
I've also created four PNGs to replace images in the current document that no longer correspond with the revised text.
I've edited the MD file to show where these seven new images should either be inserted or replace existing images.

I've also created a Colab / Jupyter notebook that has all the code for the article.

All this material can be found in this Google Drive folder

@anisa-hawes
Contributor

Dear @adamlporter,

Many thanks for your work on the revisions in response to @nabsiddiqui's feedback.

At the moment, we are providing codebooks rendered with nbviewer (for example: understanding-creating-word-embeddings.ipynb), which are openable in Colab for those readers who wish to work there, while remaining accessible for those who want to work in other cloud-based environments or locally instead. MyBinder is an interesting suggestion, and one which has also been recommended to me by someone else in our network. Thank you - I'll investigate our options.

We'll download your additional images and gifs, and adjust the figure sequence as required. We'll write you a note here to confirm when the preview is updated. Thank you for your patience.

Best,
Anisa

@charlottejmc
Collaborator

charlottejmc commented Jun 7, 2024

Hello @adamlporter,

I've added the additional datafile co-est2019-alldata.csv to the lesson's assets folder, alongside the Colab notebook you've provided – and I added a short line in the lesson to point readers to it.

I've also reordered all the images in the markdown file to prepare to re-upload the updated set of .pngs and .gifs. Just a couple points:

  • Am I correct that you deleted the figures 07 and 08 from the initial draft?
  • Am I correct that the total number of .pngs and .gifs is now 14?
  • Unfortunately the 3 .gifs are currently above GitHub's maximum file size, which is 25MB, so I cannot upload them to our repository. I've tried compressing them using an online tool, but this reduces the quality too much. Do you think you would be able to provide a new set of .gifs that are under 25MB? If not, we can try and find an alternative solution together.

Meanwhile, would you please be able to provide a caption and alt-text description for each image? You could edit the markdown file directly and insert them within the liquid syntax:
{% include figure.html filename="en-or-data-into-choropleth-maps-with-python-and-folium-XX.png" alt="Visual description of figure image" caption="Figure XX. Caption text to display" %}, or if you would rather write them in a comment below, I can add them in myself.

Thank you very much for your patience!

@rnelson2

rnelson2 commented Sep 6, 2024

@adamlporter (and @nabsiddiqui):

Great lesson.

I only have one thought that's even remotely critical. There are a lot of techniques covered in the lesson beyond using Folium to generate maps. Half of the lesson is about transforming and joining the data, which in most cases is probably more work than generating the map. I mention this because I do find myself wondering how easily someone who was relatively new to mapping who read this lesson could begin to produce choropleth maps using different data that they were interested in. That data would almost certainly require different kinds of manipulations and transformations than what's covered here, and I can easily envision that person getting stuck very quickly.

Having said that, that's the nature of the work. The lesson provides an example of manipulating a dataset and joining it with spatial data, and that's probably as much as it can do and people have to spend the hours learning through further research and trial and error how to manipulate their own data. As you say, the "lesson assumes some proficiency with Python." Just a thought that I offer in no way to detract from what's a very solid and effective lesson.

I do have a few suggestions, which I think are all quite minor.

  • Around paragraph 15, you might suggest that users open up two browser windows side by side or use a browser with split screen functionality to have the lesson and the Colab notebook open side-by-side as they're proceeding through the lesson. That's what I did, and presuming they have enough screen space on their monitor, I think it's a much more effective way of moving through the lesson than switching back and forth.
  • The code on paragraph 33 isn't in the notebook.
  • In paragraph 55, you might provide a brief explanation of what a CRS is, particularly as you use the acronym several times in the subsequent paragraph. It's obviously an important concept for anyone doing anything with spatial data.
  • In paragraph 68, you might explain a bit more about why .reset_index() converts map_df to a DF. It wasn't obvious to me at least, which might just be me as I don't do a lot of programming in Python or with pandas.
  • In paragraph 81, the parameters are in a different order than they were in paragraph 77. The ones in 77 are better for your explanation with data appearing before key_on. That way you can swap the explanations of those parameters so that you can mention the data (map_df) before referencing it when describing key_on.
  • I think something's missing from the first sentence of the last paragraph: "In short, choropleth maps may be a way of displaying data and informing readers about topics": "powerful way", "effective way"—something like that. Even with that, the "may be" is an unnecessary qualification at the end of the lesson. I'd suggest revising it to say something more direct about what choropleth maps do particularly effectively.

Stylistic

¶ 4: drop the "Know how to"s. Readers will be able to "Associate latitude/longitude ..." and "Normalize data ..."
¶ 73: You might change "The following code block initializes a map object" to "The following code defines a python function that initializes a map object." Very minor, obviously, but more precise.
¶ 122: I'd change var newvalue = Math.pow(10.0, value).toFixed(1).toString() to var newvalue = Math.round(Math.pow(10.0, value)).toString(). I found the scale with its decimal places a little weird and maybe even off-putting as these are lives lost that shouldn't be counted in fractions.
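
The effect of the ¶ 122 suggestion can be illustrated in Python. The lesson's own label formatter is JavaScript, so this is only an analogous sketch, and both function names are hypothetical. Converting a log10 tick value back to a count and rounding it to a whole number avoids fractional legend labels:

```python
import math

def label_one_decimal(log_value: float) -> str:
    """Analogue of Math.pow(10.0, value).toFixed(1): keeps one decimal place."""
    return f"{10.0 ** log_value:.1f}"

def label_whole_number(log_value: float) -> str:
    """Analogue of Math.round(Math.pow(10.0, value)): whole numbers only."""
    # math.floor(x + 0.5) mirrors JavaScript's Math.round for positive values.
    return str(math.floor(10.0 ** log_value + 0.5))

# Compare both label styles for a few hypothetical tick positions.
for tick in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(tick, label_one_decimal(tick), label_whole_number(tick))
```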

Typos

¶ 2 spacial
¶ 6 June, — drop the comma
¶ 10 unfamilar
¶ 43 came (instead of same).
¶ 71 passs
¶ 78 intersted
¶ 97 uneavenly
¶ 116 logrithm
¶ 119 ned
¶ 162 becuse
¶ 170 missing period at the end of the paragraph
¶ 194 I don't believe lead levels is hyphenated

@nabsiddiqui
Collaborator

Thank you @rnelson2. This is a wonderful review. @adamlporter you can make some of the copyedit changes now. I would wait until @fmvaldezg responds for more substantial edits.

@adamlporter
Collaborator

adamlporter commented Sep 13, 2024 via email

@anisa-hawes
Contributor

Hello Adam @adamlporter,

It's great that you're already starting to think-through your next steps, considering the first reviewer's feedback!
Just a quick reminder that it is important to wait until both reviews are received before beginning to implement changes.

When both reviews are in, Nabeel @nabsiddiqui will summarise them to clarify priorities for Phase 5 revisions. Thank you for your patience.

@fmvaldezg

Hi @adamlporter,

Great lesson. I followed the steps both in the Colab notebook and by copying and pasting the code into a jupyter notebook. I generally agree with @rnelson2 that most of the lesson focuses on transforming the data. It would be interesting to present those using the lesson with the option of jumping directly to generating the map with the previously transformed data or replaying the entire lesson.
That said, here are some recommendations and observations about the lesson:

¶ 14. Add some examples of shapely data formats.

¶ 15. State the need to have a google account to execute code in colab.

¶ 16. Typo on ‘modern’ web-browser.

¶ 43. Typo on ‘same’ column names.

¶ 49. It is essential to state why sjoin is used instead of merge or join. Alternatively, paragraphs 55-57 could be moved to the very beginning of this section so that all paragraphs related to the sjoin process are together. I suggest referencing the University Consortium GIS Body of Knowledge entry on ‘Spatial Joins’.

¶ 55. Include a brief explanation of the importance of defining a CRS.

¶ 63. I suggest renaming the title of this section to something like ‘Summarizing data by county’.

¶ 70. Format ‘folium.Map’ as a code snippet.

¶ 130. The file located on GitHub did not work. Reading succeeds if the census file is used:

pop_df = pd.read_csv('https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv', usecols = ['STATE','COUNTY','POPESTIMATE2019'],encoding = "ISO-8859-1")
pop_df.head()
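
If the census file is read this way, the `STATE` and `COUNTY` columns are parsed as integers, so a five-digit FIPS code has to be rebuilt by zero-padding (two digits for the state, three for the county). A minimal sketch, using a small inline sample rather than the real download; the sample rows and the `FIPS` column name are assumptions for illustration:

```python
import io
import pandas as pd

# Made-up sample mimicking the three columns read from the census file.
sample_csv = io.StringIO(
    "STATE,COUNTY,POPESTIMATE2019\n"
    "1,1,50000\n"
    "6,37,10000000\n"
    "48,201,4700000\n"
)
pop_df = pd.read_csv(sample_csv)

# pandas parses the codes as integers, dropping leading zeros;
# restore them: 2 digits for the state + 3 digits for the county.
pop_df["FIPS"] = (
    pop_df["STATE"].astype(str).str.zfill(2)
    + pop_df["COUNTY"].astype(str).str.zfill(3)
)

print(pop_df[["FIPS", "POPESTIMATE2019"]])
```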

Additional comments

I see a problem in using civilian deaths for multiple years (2015-2024) and normalizing them with population data for 2022 only. In my opinion, the rate should be calculated for one year only.
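
One way to act on this concern, sketched under stated assumptions: either restrict the counts to a single year, or divide the multi-year total by the number of years before computing a per-100,000 rate. The figures, column names, and window length below are invented for illustration, and the annual rate still treats the population as constant over the window, which is itself an approximation.

```python
import pandas as pd

# Invented figures: total deaths over a multi-year span and one population estimate.
df = pd.DataFrame({
    "FIPS": ["01001", "06037"],
    "deaths_total": [5, 900],
    "population": [50_000, 10_000_000],
})
n_years = 9  # assumed length of the observation window, in years

# Cumulative rate: mixes a multi-year numerator with a single-year denominator.
df["rate_cumulative"] = df["deaths_total"] / df["population"] * 100_000

# Average annual rate: divides by the number of years first.
df["rate_annual"] = df["deaths_total"] / n_years / df["population"] * 100_000

print(df.round(2))
```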

While reproducing the process in a Jupyter notebook, I found that when summarizing the ff_df dataframe by the FIPS column, the name of the new column does not default to count (as it does when executing the line in the Colab notebook), which results in an error when displaying the map using this column. I used map_df.rename(columns={0:'count'}, inplace=True) to rename the column and the error was solved.
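
Whatever the cause of the difference between environments, one way to sidestep it is to name the column explicitly when converting the grouped Series to a DataFrame. A minimal sketch with made-up rows; the lesson's exact summarizing line may differ, and `groupby(...).size()` is an assumption about how the counts are produced:

```python
import pandas as pd

# Made-up records: one row per incident, each tagged with a county FIPS code.
ff_df = pd.DataFrame({"FIPS": ["06037", "06037", "04013", "06037", "04013", "48201"]})

# groupby(...).size() returns a Series indexed by FIPS.
counts = ff_df.groupby("FIPS").size()
print(type(counts).__name__)

# reset_index() turns the index into a regular column, producing a DataFrame;
# name="count" labels the values column explicitly instead of relying on a default.
map_df = counts.reset_index(name="count")
print(map_df)
```

This also shows why `.reset_index()` yields a DataFrame: the FIPS index becomes an ordinary column next to the counts, which is the two-column shape a choropleth's data argument expects.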

In the Colab notebook, on line 5, I suggest changing the question to 'what percent of the records have latitude/longitude data?' since it is more aligned with the code result.
After paragraph 119, be sure to state that this section of the code should be run. If it is run before defining cp it will result in an error.

@anisa-hawes
Contributor

Many thanks, Felipe @fmvaldezg!

--

Hello @adamlporter,

Now that both reviews are received, @nabsiddiqui will summarise the two reviews so that you have a clear path forwards for your Phase 5 revisions.

Thank you,
Anisa

@nabsiddiqui
Collaborator

nabsiddiqui commented Oct 8, 2024

Thank you @felipelmc and @rnelson2 for the wonderful reviews, and thank you again, @adamlporter for the lesson.

Based on the comments, I believe that the following changes should be made to the lesson, and then we can move forward with publishing it. I have divided them into two sections: Felipe's Comments and Rob's Comments.

Rob's Comments

Regarding Rob's comment that the lesson focuses on transforming the data: one way to address this is to add a few sentences or a short paragraph explaining that the emphasis on data transformation reflects how much transformation data is likely to need in a real-world scenario. As Rob mentioned, "that's the nature of the work."

  • Add a small sentence around paragraph 15 providing a tip that says opening two browser windows side by side might be helpful
  • Add the code in paragraph 33 to the notebook
  • Make sure to introduce what CRS is
  • Put parameters in paragraph 81 in the same order as paragraph 77
  • Change "may be" to "powerful way" or "effective way" in the first sentence of the last paragraph
  • Make the typographic and stylistic changes Rob has outlined

Felipe's Comments

  • In paragraph 14, add some examples of shapely data formats
  • In paragraph 15, state that you will need a Google account to execute the code
  • Paragraph 55-57, move towards the beginning of the section near paragraph 49
  • Make any stylistic or typographic changes that Felipe has outlined that Rob has not already outlined
  • Paragraph 63, rename the title of the section to be more descriptive
  • Paragraph 70, format as code snippet
  • Paragraph 130, fix the reading process
  • Address the "additional comments" section that Felipe has made

@anisa-hawes anisa-hawes moved this from 4 Open Peer Review to 5 Revision 2 in Active Lessons Oct 9, 2024
@anisa-hawes
Contributor

Hello Adam @adamlporter,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 5: Revision 2.

This phase is an opportunity for you to revise your draft in response to the peer reviewers' feedback.

Nabeel @nabsiddiqui has summarised their suggestions, but feel free to ask questions if you are unsure.

Please make revisions via direct commits to your file: /en/drafts/originals/data-into-choropleth-maps-with-python-and-folium.md. @charlottejmc and I are here to help if you encounter any difficulties.

When you and Nabeel are both happy with the revised draft, the Managing Editor @hawc2 will read it through before we move forward to Phase 6: Sustainability + Accessibility.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 4 <br> Open Peer Review
Who worked on this? : Reviewers (@felipelmc + @rnelson2)
All  Phase 4 tasks completed? : Yes
Section Phase 5 <br> Revision 2
Who's working on this? : Author (@adamlporter)
Expected completion date? : Nov 9
Section Phase 6 <br> Sustainability + Accessibility
Who's responsible? : Publishing Team
Expected timeframe? : 7~21 days

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@anisa-hawes
Contributor

Hello Adam @adamlporter. How are you?

I wondered how you are getting on with your Phase 5 revisions?

Please let us know if there's anything we can do to support your next steps. Nabeel @nabsiddiqui has summarised the reviewers' suggestions to guide you, but is here to discuss any aspect. Meanwhile, Charlotte and I are on hand to help with any practicalities.

Looking forward to collaborating with you to move this lesson through the final stages.

@adamlporter
Collaborator

adamlporter commented Dec 8, 2024 via email

@anisa-hawes
Contributor

Thank you, @adamlporter. No rush - We realise how busy things are towards the year-end. Please let us know if we can help in any way.

All best for now,
Anisa

@adamlporter
Collaborator

adamlporter commented Dec 11, 2024 via email

@anisa-hawes
Contributor

anisa-hawes commented Dec 11, 2024

Thank you, Adam @adamlporter. We really appreciate your work on these revisions. I can confirm that your commits have all been successful.

Unfortunately, we cannot receive images within the comment thread, so I don't have the new version of your screenshot. May I ask if you could send this to Charlotte [[email protected]] as an attachment in a direct email? From there, we can process and upload it.

Charlotte's full name is Charlotte Chevrie. I am very grateful to you for expressing thanks to us all in the way that you have. Positive feedback from contributors helps us to know that gradual improvements and adaptations to our workflow are bringing the difference we hoped for. Thank you! ☺️

@charlottejmc
Collaborator

Thank you @adamlporter, I have received your new version of Figure 06 and replaced it in the lesson. Thank you also for your kind acknowledgements!

@nabsiddiqui
Collaborator

These changes look good to me. I think we are ready to go to the next stage now @anisa-hawes

@anisa-hawes
Contributor

Thank you, @nabsiddiqui.

Hello Alex @hawc2,
This lesson is ready for your read-through. Please advise @adamlporter if you feel any further revisions are required before we move onwards to Phase 6: Sustainability + Accessibility.

@hawc2
Collaborator Author

hawc2 commented Feb 6, 2025

@adamlporter your lesson is looking good overall. I still intend to do a close line edit, but before I do that, I think it's important for you to take a shot at revising this lesson to condense it in numerous ways. Your lesson is currently close to 10K words long, and we try to keep lessons to a max of 8K. In your case, there are a lot of tables, code, and visualizations, and it feels a bit cluttered. I'm curious to hear from you where you think we could cut things down a bit to focus on the essentials.

For example, do we need tables with rows 8-10 long? Could we standardize all tables to be 4 rows or so?

Are there tables or chunks of code that are repetitious and unnecessary? For example, paragraph 31 - is it necessary to show all output from printing out the possible data types? There are a few cases where I'm not sure it's necessary for you to show output for code you're telling the person to run. They will see the output on their end. Is there a reason for all this being included? If not, let's cut it.

Similarly, do we really need this many different map visualizations? Is it possible one or two could be removed/reduced? For example, on paragraph 123, you have a map, but no explanation for why this one is being shared. With the two prior maps, you said it showed an improvement. What does this one show that requires another reproduction? Similarly with paragraph 146, you could say more explicitly how this newly generated map demonstrates a new level of improvement. Imagine a reader who is skimming through the lesson to get their bearings - those extra signposts can help a lot.

After the two maps I've noted here, it's important to point out that you include another 4(!) map reproductions further down the lesson. Please consider if some of this can be condensed. If you could only reproduce 3-4 maps to demonstrate the value of this methodology and code, which ones would they be? The section, Adding an Information Box, is very long and includes two maps. Why two? The following two sections on Minimap and Title additions could be combined into one subsection with one map demonstrating both improvements at once (in my opinion). Some of these minor aesthetic additions don't really teach anything special about the main methodology being taught here - choropleth mapping - so I would downplay their presence here...

It's a little overwhelming right now, so I'm trying to think about how to focus it for the reader. For context, I'm inclined to rank this lesson as Advanced difficulty - it assumes knowledge of digital mapping and programming, and focuses on specific types of mapping for research. I don't think you need to do much more work to further introduce this difficult lesson to the reader - it's more about how to make it streamlined for the reader to efficiently learn the key steps.

I hope this makes sense. Please let me know if you have questions, and please feel free to take a shot at condensing this lesson down where you see fit. Once you take a shot, I can do a close read through and make further recommendations, so no worries if it is still over 8K when you send it back to me.

@nabsiddiqui
Collaborator

Hey @hawc2. Sure. I'll take a look at this sometime soon. I'm almost caught up with most things so should be able to dedicate more time to this.

@adamlporter
Collaborator

adamlporter commented Feb 10, 2025 via email

@charlottejmc
Collaborator

Thank you for your patience and your energy, @adamlporter.

If you decide to remove any images, please simply let me know which figure numbers you removed, and I will take care of updating the filenames and captions on those remaining.

@adamlporter
Collaborator

adamlporter commented Feb 18, 2025 via email

@nabsiddiqui
Collaborator

Hey @adamlporter,

Thank you for your revisions to the lesson. The changes you've made streamline the content. I think the main issue now is target length. I would definitely get rid of any maps that you think are not needed and maybe combine the explanations of similar techniques (like the various scaling methods).

Your suggestion about potentially removing the floating information box section is also worth considering, though I agree this feature is valuable for users. You can also reduce some of the inspection steps shown (like .info() and .sample() outputs) by showing only the most relevant parts of the output rather than the full display.

@charlottejmc
Collaborator

Hi @adamlporter, according to several different online word counter tools, you have between 7900 and 8300 words here. I don't think you need to completely remove anything important from your lesson. In the next phase, I will be carefully copyediting your lesson: I'll try to keep word count in mind and edit sentences down for you, as well as merge any repetitions I might find.

Thank you for outlining the figures you'd like to remove. I've deleted figures 5, 11, 13 and 14, so you now have a total of 10 figures. You can check that it renders correctly in the lesson preview.

I agree with @nabsiddiqui that you don't need to show the full output display if it is not essential. You can focus simply on the relevant lines (if you do so, I'd recommend adding a short note explaining this choice so that readers are aware their own output will look a little different).
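
As one possible way to keep displayed output short, the inspection calls themselves can be narrowed. This is a sketch with a hypothetical dataframe and column names, not the lesson's actual data.

```python
import pandas as pd

# Hypothetical dataframe standing in for one of the lesson's tables
df = pd.DataFrame({
    "FIPS": ["01001", "01003", "06037", "06059", "36061", "48201"],
    "county": ["Autauga", "Baldwin", "Los Angeles", "Orange", "New York", "Harris"],
    "count": [2, 5, 40, 12, 9, 30],
    "notes": ["", "", "", "", "", ""],
})

# Show only the columns under discussion, limited to four rows
preview = df[["FIPS", "count"]].head(4)
print(preview)

# verbose=False prints a short summary instead of listing every column
df.info(verbose=False)
```

Using head() rather than sample() also keeps the displayed rows identical between the lesson text and a reader's own run.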

@nabsiddiqui
Collaborator

Hey @charlottejmc and @adamlporter. Yes, if we are ok with the length, I think this is almost ready to go, other than maybe removing the full output display when not necessary.

@charlottejmc
Collaborator

Hi @adamlporter, it seems that your local edits created some conflicts in another lesson. We would prefer it if you worked on GitHub online, but if you must work locally, please make sure to always pull the very latest version! We're very active so the repo changes very quickly... Thank you ✨

@adamlporter
Collaborator

I'm sorry! I was trying to rebase my local copy and thought I told it to overwrite all my local files with the material from GitHub. I certainly didn't mean to cause problems with other submissions. I'm very sorry!

@charlottejmc
Collaborator

It's OK – the conflicts were easy to resolve manually. These things are often knotty and mysterious!

Projects
Status: 5 Revision 2

7 participants