Release Process
The release manager role in Spark means you are responsible for a few different things:
- Preparing your setup
- Preparing for release candidates:
- cutting a release branch
- informing the community of timing
- working with component leads to clean up JIRA
- making code changes in that branch with necessary version updates
- Running the voting process for a release:
- creating release candidates using automated tooling
- calling votes and triaging issues
- Finalizing and posting a release:
- updating the Spark website
- writing release notes
- announcing the release
If you are a new Release Manager, you can read up on the process from the following resources:
- release signing https://www.apache.org/dev/release-signing.html
- gpg for signing https://www.apache.org/dev/openpgp.html
- svn https://www.apache.org/dev/version-control.html#https-svn
You can skip this section if you have already uploaded your key.
After generating the gpg key, you need to upload your key to a public key server. Please refer to https://www.apache.org/dev/openpgp.html#generate-key for details.
If you want to do the release on another machine, you can transfer your gpg key to that machine via the gpg --export and gpg --import commands.
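A minimal sketch of the key transfer, assuming GnuPG 2.x on both machines. Plain gpg --export writes only public keys, so moving a signing key also needs --export-secret-keys. The function names, the key ID argument, and the file name are hypothetical placeholders.

```shell
# Export the secret signing key for key_id into an ASCII-armored file.
# umask 077 keeps the exported file readable only by the current user.
export_signing_key() {
  key_id="$1"
  out_file="$2"
  ( umask 077 && gpg --armor --export-secret-keys "$key_id" > "$out_file" )
}

# Import a previously exported key file on the release machine.
import_signing_key() {
  gpg --import "$1"
}
```

For example, run export_signing_key with your key ID and a file name such as signing-key.asc on the old machine, copy the file over a secure channel, run import_signing_key on the new machine, and delete the file afterwards.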
The last step is to update the KEYS file with your code signing key; see https://www.apache.org/dev/openpgp.html#export-public-key
# Move dev/ to release/ when the voting is completed. See Finalize the Release below
svn co --depth=files "https://dist.apache.org/repos/dist/dev/spark" svn-spark
# edit svn-spark/KEYS file
(cd svn-spark && svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Update KEYS")
The scripts to create release candidates are run through docker. You need to install docker before running these scripts. Please make sure that you can run docker as a non-root user. See https://docs.docker.com/install/linux/linux-postinstall for more details.
The main step towards preparing a release is to create a release branch. This is done via the standard Git branching mechanism and should be announced to the community once the branch is created.
It is also good to set up Jenkins jobs for the release branch once it is cut to ensure tests are passing. These are jobs like https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.3-test-maven-hadoop-2.7/ . Consult Josh Rosen and Shane Knapp for help with this. Also remember to add the newly-added jobs to the test dashboard at https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ .
If this is not the first RC, then make sure that the JIRA issues that have been resolved since the last RC are marked as Resolved and have Target Versions set to this release version.
To track any issue with a pending PR targeting this release, create a filter in JIRA with a query like this:
project = SPARK AND "Target Version/s" = "12340470" AND status in (Open, Reopened, "In Progress")
For the target version string value to use, find the numeric value that corresponds to the release by looking at an existing issue with that target version and clicking on the version (e.g. find an issue targeting 2.2.1 and click on the version link of its Target Versions field).
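As a small illustration, the filter query can be generated from the numeric version ID so that it is easy to reuse across releases. This is only a sketch; the function name open_issues_jql is hypothetical, and the ID is the example value from the query above.

```shell
# Build the JQL for open issues targeting a release.
# version_id is the numeric JIRA ID of the target version.
open_issues_jql() {
  version_id="$1"
  printf 'project = SPARK AND "Target Version/s" = "%s" AND status in (Open, Reopened, "In Progress")\n' "$version_id"
}

open_issues_jql 12340470
```

The printed string can be pasted into the JIRA advanced search box.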
Verify from git log whether they are actually making it into the new RC or not. Check for JIRA issues with the release-notes label, and make sure they are documented in the relevant migration guide for breaking changes or in the release news on the website later.
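One way to check a single issue against git log, sketched under the assumption that commit subjects carry the JIRA id (as in [SPARK-12345]) and that both tags exist in the local clone. The function name and its arguments are hypothetical.

```shell
# Succeed if a commit mentioning jira_id is reachable from rc_tag
# but not from prev_tag, i.e. it is new in this release candidate.
issue_in_rc() {
  jira_id="$1"
  prev_tag="$2"
  rc_tag="$3"
  # -w avoids matching SPARK-123 inside SPARK-1234.
  git log --oneline "$prev_tag..$rc_tag" | grep -q -w -- "$jira_id"
}
```

For example, issue_in_rc SPARK-12345 v2.2.0 v2.2.1-rc1 returns a zero exit status only when a matching commit is part of the new RC.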
Also check that all builds and tests are green on the RISELab Jenkins: https://amplab.cs.berkeley.edu/jenkins/. In particular, look for Spark Packaging, QA Compile, and QA Test. Note that not all permutations are run on PRs, so it is important to check the Jenkins runs.
To cut a release candidate, there are 4 steps:
- Create a git tag for the release candidate.
- Package the release binaries & sources, and upload them to the Apache staging SVN repo.
- Create the release docs, and upload them to the Apache staging SVN repo.
- Publish a snapshot to the Apache staging Maven repo.
The process of cutting a release candidate has been automated via the dev/create-release/do-release-docker.sh script. Run this script, enter the information it requires, and wait until it finishes. You can also run a single step via the -s option. Run do-release-docker.sh -h to see more details.
The release voting takes place on the Apache Spark developers list (the PMC is voting). Look at past voting threads to see how this proceeds. The email should follow this format.
- Make a shortened link to the full list of JIRAs using https://s.apache.org/
- If possible, attach a draft of the release notes with the email
- Make sure the voting closing time is in UTC format. Use this script to generate it
- Make sure the email is in text format and the links are correct
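A rough sketch of generating the closing time for the vote email. It assumes GNU date (the -d option is not available in BSD date), and it uses 72 hours as an assumed default window; adjust to whatever the vote thread states. The function name is hypothetical and this is not the script referred to above.

```shell
# Print a vote closing time in UTC, a given number of hours from now.
# Assumes GNU date; LC_ALL=C keeps day and month names in English.
vote_close_utc() {
  hours="${1:-72}"
  LC_ALL=C date -u -d "+${hours} hours" '+%a, %d %b %Y %H:%M UTC'
}

vote_close_utc 72
```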
Once the vote is done, you should also send out a summary email with the totals, with a subject that looks something like "[VOTE][RESULT] ...".
Be Careful!
THIS STEP IS IRREVERSIBLE so make sure you selected the correct staging repository. Once you move the artifacts into the release folder, they cannot be removed.
After the vote passes, to upload the binaries to Apache mirrors, you move the binaries from the dev directory (this should be where they were voted on) to the release directory. This "moving" is the only way you can add stuff to the actual release directory. (Note: only PMC members can move to the release directory.)
# Move the sub-directory in "dev" to the
# corresponding directory in "release"
$ export SVN_EDITOR=vim
$ svn mv https://dist.apache.org/repos/dist/dev/spark/v1.1.1-rc2-bin https://dist.apache.org/repos/dist/release/spark/spark-1.1.1
# If you've added your signing key to the KEYS file, also update the release copy.
svn co --depth=files "https://dist.apache.org/repos/dist/release/spark" svn-spark
curl "https://dist.apache.org/repos/dist/dev/spark/KEYS" > svn-spark/KEYS
(cd svn-spark && svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Update KEYS")
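Because the move is irreversible, it can help to print the exact command and review it before running it. The sketch below derives the release directory name from the RC directory name, following the v1.1.1-rc2-bin to spark-1.1.1 pattern used above. The function name is hypothetical, and it only prints the command.

```shell
# Print (do not run) the svn mv command for promoting an RC directory.
# rc_dir looks like v1.1.1-rc2-bin; the release dir becomes spark-1.1.1.
print_promote_cmd() {
  rc_dir="$1"
  version="${rc_dir#v}"        # strip the leading "v"
  version="${version%%-rc*}"   # strip the "-rcN-bin" suffix
  base="https://dist.apache.org/repos/dist"
  echo "svn mv $base/dev/spark/$rc_dir $base/release/spark/spark-$version"
}

print_promote_cmd v1.1.1-rc2-bin
```

Compare the printed source URL with the directory that was actually voted on before executing it.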
Verify that the resources are present in https://www.apache.org/dist/spark/. It may take a while for them to be visible. This will be mirrored throughout the Apache network. Check the release checker result of the release at https://checker.apache.org/projs/spark.html.
For the Maven Central Repository, you can release from the Apache Nexus Repository Manager. This is already populated by the release-build.sh publish-release step. Log in, open Staging Repositories, find the one voted on (e.g. orgapachespark-1257 for https://repository.apache.org/content/repositories/orgapachespark-1257/), select it, click Release, and confirm. If successful, it should show up under https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.11/2.2.1/ and the same under https://repository.apache.org/content/groups/maven-staging-group/org/apache/spark/spark-core_2.11/2.2.1/ (look for the correct release version). After some time this will be synced to Maven Central automatically.
You'll need the credentials for the spark-upload account, which can be found in this message (only visible to PMC members).
The artifacts can be uploaded using twine. Just run:
twine upload --repository-url https://upload.pypi.org/legacy/ pyspark-{version}.tar.gz pyspark-{version}.tar.gz.asc
Adjust the command for the files that match the new release. If for some reason the twine upload is incorrect (e.g. an HTTP failure or other issue), you can rename the artifact to pyspark-version.post0.tar.gz, delete the old artifact from PyPI, and re-upload.
Publishing to CRAN is done using this form. Since it requires further manual steps, please also contact the PMC.
After the vote passes and you moved the approved RC to the release repository, you should delete the RC directories from the staging repository. For example:
svn rm https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-bin/ \
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc1-docs/ \
-m"Removing RC artifacts."
Make sure to also remove the unpublished staging repositories from the Apache Nexus Repository Manager.
Spark always keeps the latest maintenance release of each branch in the mirror network. To delete older versions, simply use svn rm:
$ svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.0
You will also need to update js/downloads.js to indicate the release is not mirrored anymore, so that the correct links are generated on the site.
Also take a moment to check HiveExternalCatalogVersionsSuite.scala starting with branch-2.2 and see if it needs to be adjusted, since that test relies on mirrored downloads of previous releases.
Check out the tagged commit for the release candidate that passed and apply the correct version tag.
$ git tag v1.1.1 v1.1.1-rc2 # the RC that passed
$ git push apache v1.1.1
The website repository is located at https://github.com/apache/spark-website.
It's recommended to not remove the generated docs of the latest RC, so that we can copy them to spark-website directly; otherwise you need to re-build the docs.
# Build the latest docs
$ git checkout v1.1.1
$ cd docs
$ PRODUCTION=1 jekyll build
# Copy the new documentation to Apache
$ git clone https://github.com/apache/spark-website
...
$ cp -R _site spark-website/site/docs/1.1.1
# Update the "latest" link
$ cd spark-website/site/docs
$ rm latest
$ ln -s 1.1.1 latest
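The rm and ln -s pair above can also be collapsed into a single replace. The sketch below assumes an ln that supports -n (GNU coreutils and BSD both do), so that an existing "latest" link is replaced rather than followed. The function name and its arguments are hypothetical.

```shell
# Repoint the "latest" symlink at a docs version in one step.
# docs_dir is the directory holding the versioned docs folders.
update_latest_link() {
  docs_dir="$1"
  version="$2"
  # Refuse to create a dangling link.
  [ -d "$docs_dir/$version" ] || { echo "missing $docs_dir/$version" >&2; return 1; }
  # -n stops ln from descending into the directory the old link points to.
  ( cd "$docs_dir" && ln -sfn "$version" latest )
}
```

For example, update_latest_link spark-website/site/docs 1.1.1 leaves "latest" pointing at 1.1.1 and fails if that directory has not been copied yet.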
Next, update the rest of the Spark website. See how the previous releases are documented (all the HTML file changes are generated by jekyll). In particular:
- update _layouts/global.html if the new release is the latest one
- update documentation.md to add a link to the docs for the new release
- add the new release to js/downloads.js
- check security.md for anything to update
$ git add 1.1.1
$ git commit -m "Add docs for Spark 1.1.1"
Then, create the release notes. Go to the release page in JIRA, pick the release version from the list, then click on "Release Notes". Copy this URL and then make a short URL on s.apache.org, sign in to your Apache account, and pick the ID as something like spark-2.1.2. Create a new release post under releases/_posts to include this short URL. The date of the post should be the date you create it.
Then run jekyll build to update the site directory.
After merging the change into the asf-site branch, you may need to create a follow-up empty commit to force synchronization between ASF's git and the web site, and also the GitHub mirror. For some reason synchronization seems to not be reliable for this repository.
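A follow-up empty commit can be created with git's --allow-empty option. The sketch below assumes it is run inside a spark-website checkout with asf-site checked out; the function name and the default commit message are hypothetical, and pushing is left to the caller because remote names differ between checkouts.

```shell
# Create an empty commit on the current branch to nudge the site sync.
# The optional first argument overrides the (assumed) default message.
make_sync_commit() {
  git commit --allow-empty -m "${1:-Empty commit to trigger asf-site sync}"
}
```

After calling make_sync_commit, push the branch as usual and check that the website and the GitHub mirror pick up the change.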
On a related note, make sure the version is marked as released on JIRA. Go find the release page as above, e.g. https://issues.apache.org/jira/projects/SPARK/versions/12340295, then click the "Release" button on the right and enter the release date.
(Generally, this is only for major and minor releases, not patch releases.) The contributors list can be automatically generated through this script. It accepts the tag that corresponds to the current release and another tag that corresponds to the previous release (not including maintenance releases). For instance, if you are releasing Spark 1.2.0, set the current tag to v1.2.0-rc2 and the previous tag to v1.1.0. Once you have generated the initial contributors list, it is highly likely that there will be warnings about author names not being properly translated. To fix this, run this other script, which fetches potential replacements from GitHub and JIRA. For instance:
$ cd release-spark/dev/create-release
# Set RELEASE_TAG and PREVIOUS_RELEASE_TAG
$ export RELEASE_TAG=v1.1.1
$ export PREVIOUS_RELEASE_TAG=v1.1.0
# Generate initial contributors list, likely with warnings
$ ./generate-contributors.py
# set JIRA_USERNAME, JIRA_PASSWORD, and GITHUB_API_TOKEN
$ export JIRA_USERNAME=blabla
$ export JIRA_PASSWORD=blabla
$ export GITHUB_API_TOKEN=blabla
# Translate names generated in the previous step, reading from known_translations if necessary
$ ./translate-contributors.py
Additionally, if you wish to give more specific credit for developers of larger patches, you may use the following commands to identify large patches. Extra care must be taken to make sure commits from previous releases are not counted, since git cannot easily associate commits that were backported into different branches.
# Determine PR numbers closed only in the new release
$ git log v1.1.1 | grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort > closed_1.1.1
$ git log v1.1.0 | grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort > closed_1.1.0
$ diff --new-line-format="" --unchanged-line-format="" closed_1.1.1 closed_1.1.0 > diff.txt
# Grep expression with all new patches
$ EXPR=$(cat diff.txt | awk '{ print "\\("$1" "$2" \\)"; }' | tr "\n" "|" | sed -e "s/|/\\\|/g" | sed "s/\\\|$//")
# Contributor list
$ git shortlog v1.1.1 --grep "$EXPR" > contrib.txt
# Large patch list (300+ lines)
$ git log v1.1.1 --grep "$EXPR" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][0-9][0-9][0-9] insert" | grep SPARK > large-patches.txt
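The set difference computed with diff above can alternatively be expressed with comm, which reads a little more directly as "lines only in the new list". This is a sketch of a swapped-in technique, not the commands used by the project; the function name is hypothetical, and the inputs are files like closed_1.1.1 and closed_1.1.0 from the steps above.

```shell
# Print lines that appear in new_list but not in old_list.
# Both inputs are sorted here in the C locale, as comm requires.
new_closed_prs() {
  new_list="$1"
  old_list="$2"
  tmp_new=$(mktemp)
  tmp_old=$(mktemp)
  LC_ALL=C sort -u "$new_list" > "$tmp_new"
  LC_ALL=C sort -u "$old_list" > "$tmp_old"
  # -2 drops lines unique to the old list, -3 drops lines common to both.
  LC_ALL=C comm -23 "$tmp_new" "$tmp_old"
  rm -f "$tmp_new" "$tmp_old"
}
```

For example, new_closed_prs closed_1.1.1 closed_1.1.0 > diff.txt produces the same kind of list that the grep expression is built from.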
When a new release occurs, PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite must be updated shortly thereafter. This list should contain the latest release in all active maintenance branches, and no more.
For example, as of this writing, it has value val testingVersions = Seq("2.1.3", "2.2.2", "2.3.2"). "2.4.0" will be added to the list when it's released. "2.1.3" will be removed (and removed from the Spark dist mirrors) when the branch is no longer maintained. "2.3.2" will become "2.3.3" when "2.3.3" is released.
Once everything is working (website docs, website changes), create an announcement on the website and then send an e-mail to the mailing list. To create an announcement, create a post under news/_posts and then run jekyll build.
Enjoy an adult beverage of your choice, and congratulations on making a Spark release.