Repository containing the scripts to retrieve user warning metrics from a MongoDB database.
The collections should be obtained by merging the outputs of the user-metrics and Wikidump repositories.
Clone this repository:
    git clone https://github.com/WikiCommunityHealth/warnings-dropoff-analysis.git
Create the poetry project virtual environment using your current Python version:
    make env
Install the required packages and dependencies:
    make install
Once the dependencies are installed in the poetry virtual environment, you can run the following command:
    poetry run python -m warnings_dropoff_analysis db-name collection-name [--output-compression gzip] extract-user-warnings-metrics [month to consider the drop-off]
Alternatively, you can edit the Makefile and type:
    make run
The result will be found in the output folder.
After the data has been retrieved, you can plot some basic statistics by typing:
    poetry run python plotter/[plotter-stats-file].py [community-lang]
Or, as in the previous case, you can edit the Makefile and run:
    make plot
The produced figures can be found in the plots folder.
By default, the script tries to connect to the local MongoDB instance.
To connect to a remote one, add a .env file containing the connection string, following the .env.example template file:
    echo '[your mongodb connection string]' > .env
Be careful with the plot scripts: instantiating the pandas DataFrame structure has a large memory overhead and consumes a lot of RAM.
As a result, make sure you have enough free memory before running one of those scripts.
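The connection fallback described above (remote instance if a .env file is present, local instance otherwise) can be sketched as follows. This is only an illustration, not the repository's actual code: the function name `resolve_connection_string` and the default URI `mongodb://localhost:27017` are assumptions, and the sketch assumes the .env file holds nothing but the raw connection string, as the echo command suggests.

```python
from pathlib import Path

# Assumed default URI for a local MongoDB instance (not taken from the repository).
DEFAULT_URI = "mongodb://localhost:27017"


def resolve_connection_string(env_path: str = ".env") -> str:
    """Return the connection string stored in env_path, or the local default.

    Hypothetical helper: it assumes the file contains only the raw
    connection string, with no KEY=value syntax.
    """
    path = Path(env_path)
    if path.is_file():
        content = path.read_text(encoding="utf-8").strip()
        if content:
            return content
    # No .env file (or an empty one): fall back to the local instance.
    return DEFAULT_URI
```

The returned string could then be handed to whichever MongoDB client the scripts use.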
The data retrieved by the script is in the following format:
{
"name": "<name>",
"id": 21,
"last_edit_month": 2,
"last_edit_year": 2010,
"retirement_declared": false,
"retire_date": null,
"retirement_parameters": null,
"retirement_template_name": null,
"edit_count_after_retirement": null,
"last_serious_warning": {
"name": "avvisoimmagine2",
"date": "200_-0_-1_ 2_:1_:4_+00:00"
},
"last_serious_warning_name": "avvisoimmagine2",
"last_serious_warning_date": "200_-0_-1_ 2_:1_:4_+00:00",
"last_normal_warning": {
"date": null,
"name": null
},
"last_normal_warning_name": null,
"last_normal_warning_date": null,
"last_not_serious_warning": {
"date": null,
"name": null
},
"last_not_serious_warning_name": null,
"last_not_serious_warning_date": null,
"average_edit_count_before_last_serious_warning_date": 9.333333333333334,
"average_edit_count_before_last_normal_warning_date": 0,
"average_edit_count_before_last_not_serious_warning_date": 0,
"average_edit_count_after_last_serious_warning_date": 0.0036443148688046646,
"average_edit_count_after_last_normal_warning_date": 0,
"average_edit_count_after_last_not_serious_warning_date": 0,
"count_serious_templates_transcluded": 1,
"count_warning_templates_transcluded": 0,
"count_not_serious_templates_transcluded": 0,
"count_serious_templates_substituted": 0,
"count_warning_templates_substituted": 0,
"count_not_serious_templates_substituted": 0,
"edit_history": {
"2001": {
"1": 0,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 1,
"9": 0,
"10": 0,
"11": 0,
"12": 0
},
"2015": {
"1": 0,
"2": 0,
"3": 0,
"4": 0,
"5": 0,
"6": 0,
"7": 0,
"8": 0,
"9": 0,
"10": 0,
"11": 0,
"12": 3
}
},
"warnings_history": {
"2005": {
"10": {
"serious_substituted": 0,
"not_serious_substituted": 0,
"not_serious_transcluded": 0,
"warning_substituted": 0,
"warning_transcluded": 0,
"serious_transcluded": 1
}
},
"2021": {
"1": {
"serious_substituted": 0,
"not_serious_substituted": 0,
"not_serious_transcluded": 0,
"warning_substituted": 0,
"warning_transcluded": 0,
"serious_transcluded": 0
},
"12": {
"serious_substituted": 0,
"not_serious_substituted": 0,
"not_serious_transcluded": 0,
"warning_substituted": 0,
"warning_transcluded": 0,
"serious_transcluded": 0
}
}
},
"sex": null,
"banned": false
}
A brief description of the fields:
- name: name of the user
- id: user id
- last_edit_month: month of the last edit date
- last_edit_year: year of the last edit date
- retirement_declared: boolean field indicating whether the user has placed a retirement template on the user page or user talk page
- retire_date: the retirement date, if specified
- retirement_parameters: parameters specified in the retired template
- retirement_template_name: name of the retired template
- edit_count_after_retirement: activity count after the retirement date
- last_serious_warning: last transcluded serious warning
- last_serious_warning_name: the name of the last high severity warning received
- last_serious_warning_date: the date of the last high severity warning received
- last_normal_warning: last transcluded warning of medium severity
- last_normal_warning_name: the name of the last medium severity warning received
- last_normal_warning_date: the date of the last medium severity warning received
- last_not_serious_warning: last transcluded non-serious warning
- last_not_serious_warning_name: the name of the last low severity warning received
- last_not_serious_warning_date: the date of the last low severity warning received
- average_edit_count_before_last_serious_warning_date: average count of actions the user has made before the last serious warning date (in the range [date - 12 months, date])
- average_edit_count_before_last_normal_warning_date: average count of actions the user has made before the last medium severity warning date (in the range [date - 12 months, date])
- average_edit_count_before_last_not_serious_warning_date: average count of actions the user has made before the last non-serious warning date (in the range [date - 12 months, date])
- average_edit_count_after_last_serious_warning_date: average count of actions the user has made after the last serious warning date (in the range [date, date + 12 months])
- average_edit_count_after_last_normal_warning_date: average count of actions the user has made after the last medium severity warning date (in the range [date, date + 12 months])
- average_edit_count_after_last_not_serious_warning_date: average count of actions the user has made after the last non-serious warning date (in the range [date, date + 12 months])
- count_serious_templates_transcluded: count of the transcluded serious warning templates
- count_warning_templates_transcluded: count of the transcluded warning templates
- count_not_serious_templates_transcluded: count of the transcluded non-serious warning templates
- count_serious_templates_substituted: count of the substituted serious warning templates
- count_warning_templates_substituted: count of the substituted warning templates
- count_not_serious_templates_substituted: count of the substituted non-serious warning templates
- edit_history: history of the user's activity per month
- warnings_history: history of the user's warnings per month
- sex: the user's sex, if specified
- banned: whether the user is banned or not
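As an illustration of how the before/after averages can be combined, here is a minimal sketch that computes the relative change in activity around the last warning of a given severity. The function `activity_change` is hypothetical and not part of the repository. Reading the two averages as a relative change is also an assumption about how the drop-off might be measured.

```python
from typing import Optional


def activity_change(record: dict, severity: str = "serious") -> Optional[float]:
    """Relative change of the average edit count after vs. before the last warning.

    `severity` is one of "serious", "normal" or "not_serious", matching the
    field names of the record. Returns None when the user never received such
    a warning or had no activity in the 12 months before it.
    Hypothetical helper, for illustration only.
    """
    if record.get(f"last_{severity}_warning_date") is None:
        return None
    before = record[f"average_edit_count_before_last_{severity}_warning_date"]
    after = record[f"average_edit_count_after_last_{severity}_warning_date"]
    if not before:
        # Avoid dividing by zero when there was no prior activity.
        return None
    return (after - before) / before


# Values taken from the example record; the date is a placeholder.
record = {
    "last_serious_warning_date": "<date>",
    "average_edit_count_before_last_serious_warning_date": 9.333333333333334,
    "average_edit_count_after_last_serious_warning_date": 0.0036443148688046646,
    "last_normal_warning_date": None,
    "average_edit_count_before_last_normal_warning_date": 0,
    "average_edit_count_after_last_normal_warning_date": 0,
}
change = activity_change(record)  # negative value: activity decreased
```

A negative result means the user edited less, on average, in the 12 months after the warning than in the 12 months before it.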
To generate the code documentation, you can use pdoc by typing:
    make doc
Or you can open it directly in your browser using xdg-open:
    make openDoc
The plots produced by the scripts stored in the plotter folder are listed below.
Considering only the users who have declared their withdrawal from Wikipedia:
- A bar chart showing the percentage of users who have stopped editing Wikipedia and the percentage of users who have remained active.
- A bar chart showing the number of edits the users have made after declaring their withdrawal.
- A bar chart showing the percentage of men and women who have declared their withdrawal from Wikipedia.
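The first of these percentages could be derived from the retrieved records roughly as follows. This is a sketch under explicit assumptions, not the plotter's actual logic: the helper `retirement_shares` is hypothetical, a user counts as "continued" when `edit_count_after_retirement` is greater than zero, and a null count is treated as zero.

```python
def retirement_shares(records: list) -> dict:
    """Percentage of retired users who stopped vs. continued editing.

    Assumption: "continued" means edit_count_after_retirement > 0,
    and a missing (None) count is treated as zero.
    """
    retired = [r for r in records if r.get("retirement_declared")]
    if not retired:
        return {"stopped": 0.0, "continued": 0.0}
    continued = sum(
        1 for r in retired if (r.get("edit_count_after_retirement") or 0) > 0
    )
    total = len(retired)
    return {
        "stopped": 100.0 * (total - continued) / total,
        "continued": 100.0 * continued / total,
    }


# Made-up sample records, for illustration only.
sample = [
    {"retirement_declared": True, "edit_count_after_retirement": 0},
    {"retirement_declared": True, "edit_count_after_retirement": 12},
    {"retirement_declared": True, "edit_count_after_retirement": None},
    {"retirement_declared": True, "edit_count_after_retirement": 3},
    {"retirement_declared": False, "edit_count_after_retirement": None},
]
shares = retirement_shares(sample)
```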
Considering only the users who have received at least a serious warning:
- A bar chart showing the number of users who have decreased their activity in the community within the time interval of the last serious warning received.
- The same chart as above, but considering only men and women.
- A bar chart comparing the percentage of women and men who have decreased their activity.
Considering the five users with the highest number of edits:
- A line chart showing the user activity over the months, with vertical lines marking one or more warnings.
- A line chart illustrating the z-score of the user history within the time interval of the last serious warning received.
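For reference, the z-score of a monthly edit history can be computed along these lines. It is a minimal sketch using only the standard library: the helper names are hypothetical, and using the population standard deviation is an assumption. The plotter scripts may use a different time window or normalization.

```python
from statistics import mean, pstdev


def flatten_history(edit_history: dict) -> list:
    """Turn {"year": {"month": count}} into sorted (year, month, count) tuples."""
    rows = []
    for year, months in edit_history.items():
        for month, count in months.items():
            rows.append((int(year), int(month), count))
    return sorted(rows)


def z_scores(counts: list) -> list:
    """Standardize counts as (x - mean) / population standard deviation."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Constant series: every point sits exactly on the mean.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]


# Small made-up history in the same shape as the edit_history field.
history = {"2015": {"11": 0, "12": 3}, "2001": {"8": 1, "9": 0}}
rows = flatten_history(history)
scores = z_scores([count for _, _, count in rows])
```

Sorting on integer (year, month) pairs matters here, because the keys are strings and "10" would otherwise sort before "2".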
To call the scripts on all the MongoDB database collections, you can run the run.sh bash script:
    ./run.sh
Before running it, make sure you have modified all the readonly variables to fit your needs; feel free to change whatever you want.
The dependencies of the previously defined script are
To run the entire program in a Docker container, a Dockerfile has been provided.
First, you need to change the content of the run.sh file to match your requirements, such as the files' locations and which operation should be carried out by the script.
Then, you can build the Docker image by typing:
    docker build -t warning-dropoff .
Next, make sure your local MongoDB instance is running with all the required data.
Finally, run the Docker image:
    docker run --network="host" warning-dropoff