META: Hatch documentation upgrade #1245

Open
2 tasks
lwasser opened this issue Feb 6, 2024 · 22 comments

Comments

@lwasser
Contributor

lwasser commented Feb 6, 2024

Following the discussion here, we discussed with @ofek @DahanDv upgrading the hatch docs to include more how-to and tutorial style elements to help users get started with hatch.

This is also related to this issue opened by @pfmoore about using the Diátaxis framework.

In this issue we can iterate around what the structure of tutorials vs how-tos should look like and what we wish to create / develop further to help hatch users. I'll attempt to track comments below and update the main outline here as the discussion progresses.

I'll also try to scan issues and discussions here and there to identify pain points and get users involved in the upgraded content reviews :) @DahanDv

Note that we also are working on tutorials at pyOpenSci which we could link to / use as needed here.
Here is a tutorial on publishing to PyPI using hatch.

I'm starting the discussion here but probably can't work out a full outline now. Please add comments about other tutorials / how to's that you'd like to see and i will update this header comment as needed. (or @ofek obviously you can always edit it too!).


Hatch How To's

Hatch Tutorials

  • Tutorial on Python version management with hatch - @DahanDv
@DahanDv

DahanDv commented Feb 12, 2024

@lwasser hey!
Sorry for being off line for so long!
I'm working on a reduced guide for diataxis so new contributors can get up to speed when they wish to contribute to any kind of documentation! (Instead of letting them figure this out themselves, which can be a turnoff for some; the diataxis official guide is wordy and repetitive IMO!)
I will link this here today (I hope!) your review and comments will be appreciated ❤️

@lwasser
Contributor Author

lwasser commented Feb 12, 2024

looking forward to seeing what you pull together @DahanDv !

@polarathene

polarathene commented May 3, 2024

It would be good to document advice on usage within Docker (this was requested in the past).

Is just installing hatch via curl and running hatch --version expected to add over 400MB in disk usage? If python is available, one can install pipx to get hatch which is less heavy, but uv seems to be pulled with these two install methods too regardless if you'd use it? (IIRC in one case it was about 30MB while the other had about 90MB of data related to uv).

I had seen in the docs a brief note/admonition about standalone/installers not being able to detect/use an existing python install, thus pulling in a standalone version of python? (I had attempted to avoid this with a config.toml, but it didn't seem to help reduce weight)

If 150-400MB is to be expected, it might be worthwhile to raise some awareness there. At least with an endorsed approach for using hatch within a container, that expectation of disk weight would be clearer :)

@lwasser
Contributor Author

lwasser commented May 7, 2024

i am not sure if i can help here or not but chiming in. i just played with this quickly locally. when i created a docker container with python / pip in it, it automatically increased the container size by about 330mb.

my question: if python is not installed on a user's system and you install hatch, will it by default now try to install python now that it supports uv?

i wonder if this should be another issue where folks chime in but also i wonder if anyone has worked with a docker container with some version of python already installed to see if there is a difference in the size of the container when running hatch --version (as a way to potentially tease out the need for python to be installed and how it's set up most efficiently in a container vs. hatch's default behavior).

please excuse this comment if it's totally off base. it does seem like docs around this would be useful!

@ofek
Collaborator

ofek commented May 7, 2024

I will respond to a few comments at the same time:

  • If you download the Hatch binary then on the first run it will download a Python distribution and install itself from PyPI. If this is undesirable then manage Python and install Hatch manually.
  • UV is only used for virtual environment creation and dependency installation, entirely a runtime thing when using environments that have it enabled so for example hatch --version would not invoke UV at all.
  • It would be helpful to know where exactly the disk space is coming from. Perhaps the Dockerfile isn't cleaning up pip caches.

@lwasser
Contributor Author

lwasser commented May 7, 2024

@ofek to clarify

the example above referred to this issue comment.

which had this docker setup:

$ docker run -it ubuntu:22.04 bash

$ apt update && apt install -y curl
$ curl -sSfL https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz | tar -xz
$ mv hatch-1.10.0-x86_64-unknown-linux-gnu /usr/local/bin/hatch
$ du -shx /
144M    /

$ hatch --version
Hatch, version 1.10.0

$ du -shx /
558M    /

in this case a user is

  • creating a "blank slate" docker environment with ubuntu only from what I can see.
  • downloading hatch via curl.

So nothing is run - yet. Then the hatch binaries are moved into a new location so hatch can be called.

To me it makes sense based on what you wrote above that in this specific case, when you run hatch --version it will first download python. And that Python download accounts for the increase in size of the container.
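The before/after comparison above can be wrapped in a small helper. This is only a minimal sketch, not anything hatch provides: the function names du_mib and measure_growth are made up for illustration, and it assumes a du that accepts -m (GNU coreutils and BSD both do).

```shell
# Hypothetical helpers (names are illustrative, not part of hatch).

# Print the size of a path in whole MiB, staying on one filesystem (-x).
du_mib() {
  du -sxm "$1" | cut -f1
}

# Run a command and print how many MiB the given path grew by.
# Usage: measure_growth <path> <command> [args...]
measure_growth() {
  growth_path="$1"
  shift
  growth_before="$(du_mib "$growth_path")"
  "$@" >&2                      # keep the command's own output off stdout
  growth_after="$(du_mib "$growth_path")"
  echo "$((growth_after - growth_before))"
}
```

Inside the container, `measure_growth / hatch --version` should then print roughly the difference between the two du -shx / readings above (558M - 144M = 414 MiB in that example).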

The alternative approach would be for someone to create a docker container that first installs python or inherits from another container on dockerhub that contains python.

Is that interpretation correct? and if it is, would it make sense to create a small how to (or add doc enhancements elsewhere)? i'm happy to help create a very basic example of this that others could enhance / build off of.

@lwasser
Contributor Author

lwasser commented May 7, 2024

here is a repro example. i definitely saw it install python and hatch when i ran hatch --version . NOTE: i'm on a mac so using a different release distro below compared to the example referred to above! But a small cleanup step did reduce the size.


$ docker run -it ubuntu:22.04 bash

root@bd1bf5df743c:/# apt update && apt install -y curl
root@bd1bf5df743c:/# mv hatch-1.10.0-aarch64-unknown-linux-gnu /usr/local/bin/hatch
root@bd1bf5df743c:/# du -shx
131M	.
root@bd1bf5df743c:/# hatch --version
Hatch, version 1.10.0
root@bd1bf5df743c:/# du -shx
361M	.
root@bd1bf5df743c:/# rm -rf /var/lib/apt/lists/*
root@bd1bf5df743c:/# du -shx
316M	.

@ofek
Collaborator

ofek commented May 7, 2024

Yes that is actually expected as I mention in my first bullet point. Hatch binaries are built with PyApp and bootstrap themselves on the first run. If you already have Python available and want to cut down on disk space then I would recommend installing manually.
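For the manual route, a minimal sketch could look like the following. It assumes the base image already ships python3 with the venv module (for example an official python image, which is an assumption and not something tested in this thread). The names run_or_print and install_hatch_with_pip, the /opt/hatch location and the DRY_RUN switch are all made up for illustration. pip's --no-cache-dir flag avoids leaving a pip cache behind in the layer.

```shell
# Hypothetical helper: install hatch into a dedicated venv using an existing Python.
# Set DRY_RUN=1 to print the commands instead of running them.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_hatch_with_pip() {
  venv_dir="${1:-/opt/hatch}"   # assumed location, pick whatever suits your image
  run_or_print python3 -m venv "$venv_dir"
  # --no-cache-dir: do not keep pip's download cache around
  run_or_print "$venv_dir/bin/pip" install --no-cache-dir hatch
  # Expose the hatch entry point on the default PATH
  run_or_print ln -s "$venv_dir/bin/hatch" /usr/local/bin/hatch
}
```

Running it with DRY_RUN=1 first shows the three commands without touching the system, which is handy when turning them into RUN lines in a Dockerfile.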

@ofek
Collaborator

ofek commented May 8, 2024

I might be able to shave some MBs off given a new release of the binaries and docs on enabling the option.

@lwasser
Contributor Author

lwasser commented May 8, 2024

fantastic. Ofek would a small "how to" or tutorial about creating a docker environment be useful in the docs? i am not a docker expert but i could at least capture the information here for folks to use.

maybe @polarathene (if you are up for it) could review and provide input as well?

@ofek
Collaborator

ofek commented May 8, 2024

Yes that would be quite helpful! I wouldn't have time to add that new feature until after PyCon though.

@polarathene

I looked into it a bit, here are my findings, hope it's helpful 👍

FWIW, keeping it simple and focused/familiar for most Docker users (that is those less experienced) is probably best. I wouldn't stress too much on size as you can see in the examples below you won't save too much with the added effort, but it's possible 👍

If you write something up and contribute a PR feel free to ping me and I'll try to provide a review if I have the time :)


TL;DR:

  • Some distros do package hatch already, but they're only providing 1.9 right now. Might take a while before 1.10 is available to benefit from uv. These should be the lightest install option when available.
  • pipx install is fairly simple and easy to do via any distro as an alternative, with the benefit of the latest hatch + uv (bundled). To make uv easily available you'll need to either install uv via pipx, or alternatively configure hatch to provide it per environment; otherwise the symlink (ln -s command below) approach works easily enough (most users may be more comfortable with just adjusting PATH).

There's also the route of having a Dockerfile added to this repo, and optionally a GH Actions workflow that automates publishing images to DockerHub / GHCR with the release CI. Most users would likely be happy using a base image with hatch, unless they need to install system packages and have a particular preference (often this is ubuntu or debian for the familiar apt command they'll come across online on sites like StackExchange/StackOverflow).


NOTE: du -shx reports the total size of the location in MiB (1024^2 bytes), not MB (1000^2 bytes, which would be du -sx --si). So the M value in the output is MiB.

  • -x excludes any other potential filesystem boundaries (unlikely in this case).
  • If hardlinks are present (like with uv), the content is only counted once. Thus two separate venv folders with hardlinks to the uv package store (cache) would not be reported as duplicates. You can query an individual venv folder in isolation, but the result does not show that some data is shared (hardlinks point to an inode; unlike a symlink, no specific location is the owner).
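To compare the MiB figures from du -shx with MB figures quoted elsewhere, a tiny conversion helper is enough. This is a sketch with a made-up function name (mib_to_mb); it works on whole numbers and rounds down.

```shell
# Convert a whole-MiB figure (as printed by `du -shx`) to whole MB, rounding down.
# 1 MiB = 1048576 bytes, 1 MB = 1000000 bytes.
mib_to_mb() {
  echo "$(( $1 * 1048576 / 1000000 ))"
}
```

For example, `mib_to_mb 544` prints 570, so a 544M reading from du -shx is about 570 MB.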

Install approaches

Package Manager (122 MiB)

$ docker run --rm -it quay.io/fedora/fedora-minimal:41 bash
$ du -shx /
126M    /

$ dnf5 install -y --setopt=install_weak_deps=0 hatch
Transaction Summary:
 Installing:       75 packages
 Upgrading:         5 packages
 Replacing:         5 packages

Total size of inbound packages is 33 MiB. Need to download 33 MiB.
After this operation 122 MiB will be used (install 124 MiB, remove 2 MiB).

# Extra is from package manager cache:
$ du -shx /
297M    /

# Clean up package manager cache:
$ dnf5 clean all
Removed 12 files, 7 directories. 0 errors occurred.

# Thus total 122 MiB added weight:
$ du -shx /
248M    /

Standalone installer (4MiB installs to 400+ MiB)

$ docker run --rm -it quay.io/fedora/fedora-minimal:41 bash

# Fedora image already has curl, just needs tar + gzip to extract:
$ dnf5 install -y tar gzip && dnf5 clean all

# As the tar.gz contains only a single file, we can write the output to the preferred location directly:
$ curl -sSfL https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz \
   | tar -xzO > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch

# Before triggering install:
$ du -shx /
131M    /

# 410+ MiB added weight from install:
$ hatch --version && du -shx /
544M    /

Now as a Dockerfile: build the image, then inspect the layer for hatch --version with the dive CLI tool to see where all that weight is coming from:

FROM quay.io/fedora/fedora-minimal:41
RUN dnf5 install -y tar gzip && dnf5 clean all
RUN curl -sSfL https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz \
   | tar -xzO > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch
RUN hatch --version

# In dir with `Dockerfile` above:
docker build --tag local/hatch .
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive local/hatch

image

image

# Overview of the biggest sources of that weight:
61MB => /root/.cache/pyapp/distributions/14656550572188801628
32MB => /root/.cache/pyapp/uv
229MB => /root/.cache/pyapp/distributions/_14656550572188801628/python/lib/
- 189 MB => libpython3.12.so.1.0
- 26 MB => python3.12
91MB => /root/.cache/uv

Considering that's all in the /root/.cache dir and nowhere else, it's not obvious what is safe to remove without breaking any assumptions from hatch?

  • Presumably uv is still optional and could be removed if not needed. Not sure why there are two instances there?
    • /root/.cache/uv (91MB) looks like it's a python package for uv?
    • /root/.cache/pyapp/uv (32MB) is the actual uv binary.
  • Presumably /root/.cache/pyapp/distributions/_14656550572188801628/python can be removed
    • EDIT: No, hatch breaks, as /root/.local/share/pyapp/hatch/14656550572188801628/1.10.0/bin/hatch is reliant upon it.
    • If an existing python environment is present (when running hatch --version as above), it is disregarded and the pyapp python build is still pulled in, this hatch is fully self-contained... Thus likely the same for the uv dependency?
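If pulling in dive is not an option, a rough overview of where the weight sits can also be had with du alone. The following is a minimal sketch; the largest_dirs name and its defaults are made up for illustration, and it assumes a du that supports -d (GNU and BSD do).

```shell
# Hypothetical helper: list the largest directories (in MiB) up to two levels
# below a path, biggest first.
# Usage: largest_dirs [path] [count]
largest_dirs() {
  du -xm -d 2 "${1:-/root/.cache}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Calling `largest_dirs /root/.cache` after the hatch --version step should surface the same pyapp and uv directories listed in the overview above, without the per-layer detail that dive gives.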

pipx install hatch (165 MiB, 48 MiB for pipx, 117 MiB for hatch + bundled uv)

$ docker run --rm -it quay.io/fedora/fedora-minimal:41 bash
# Install pipx with python3:
$ dnf5 install -y pipx && dnf5 clean all

Transaction Summary:
 Installing:       17 packages

Total size of inbound packages is 13 MiB. Need to download 13 MiB.
After this operation 48 MiB will be used (install 48 MiB, remove 0 B).

# Pre-install weight (ignoring pipx 48 MiB):
$ du -shx /
184M    /

$ pipx install hatch uv
$ export PATH="${PATH}:/root/.local/bin"

# Post-install weight:
$ du -shx /
365M    /

$ hatch --version
Hatch, version 1.10.0
$ uv --version
uv 0.1.42

# No change, huzzah!
$ du -shx /
365M    /

  • 35 MiB of that is from /root/.cache/, specifically pip. The entire cache folder can be emptied at this point. pipx install brings in its own copy of pip, so there is no value in adding pip via the package manager.
  • 31 MiB belongs to an internal copy of uv that hatch has bundled in its virtual environment at /root/.local/share/pipx/venvs/hatch/bin/uv. This allows hatch to use uv when configured (eg: installer = "uv" in hatch.toml), but it is otherwise not available to you, even within hatch run / hatch shell (so uv pip list isn't available to inspect what uv has installed implicitly via hatch).

You could of course make uv available a few ways:

  • Add a symlink: ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
  • Update your PATH ENV: export PATH="${PATH}:/root/.local/share/pipx/venvs/hatch/bin" (NOTE: within a hatch environment the uv location is available via the HATCH_UV ENV)
  • Configure pip or uv to alias the HATCH_UV environment variable via "Extra Scripts" (as shown in the docs). But this would need to be per environment AFAIK?
  • Install uv again via pipx if you don't mind the extra space, since pipx does not de-duplicate via hardlinks from what I can tell.

FROM quay.io/fedora/fedora-minimal:41
RUN dnf5 install -y pipx && dnf5 clean all
# Hatch bundles uv:
RUN pipx install hatch && rm -rf /root/.cache/
# Effectively what `pipx ensurepath` accomplishes to make the hatch command available:
ENV PATH="${PATH}:/root/.local/bin"
# One of many ways to use the internal uv installed with hatch:
RUN ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
# Verify both commands work:
RUN hatch --version && uv --version

image

image

Advanced: FROM scratch multi-stage (roughly 210 MiB total image size)

  • While I haven't tried this, the standalone binary installer possibly can be run without a base image, but it's still the heavy-weight choice.
  • The Fedora image, thanks to dnf, has a feature that can create a new base image with only the minimal packages you need, which we've established is hatch (197 MiB total base image) or pipx (109 MiB total base image + 117 MiB after pipx install hatch). zypper (openSUSE) also has this feature, where a python311-hatch base will be 179 MiB and python311-pipx 87 MiB (+117 MiB after pipx install hatch). Any extra commands can still be run in that new root location via chroot if needed, such as running pipx install hatch; then you can switch to the next stage with scratch and COPY that over for a minimal image size.

# syntax=docker.io/docker/dockerfile:1

FROM quay.io/fedora/fedora-minimal:41 AS base-stage

# The <<EOF (start) and later EOF (end) markers are HereDoc syntax
# Allows for a RUN directive to more nicely run multiple commands in a single layer
RUN <<EOF
  dnf5 --installroot /rootfs --use-host-config --setopt=install_weak_deps=0 install -y pipx
  dnf5 --installroot /rootfs --use-host-config --setopt=install_weak_deps=0 clean all

  # This works since bash was implicitly installed into the new root fs
  # NOTE: DNF was not included, so it is not available once we switch via chroot.
  # For DNS lookups like `pipx install` needs, we'll also need to provide `/etc/resolv.conf`
  cp /etc/resolv.conf /rootfs/etc/resolv.conf
  # chroot is a bit awkward in a Dockerfile, using SHELL directive or after the COPY on scratch
  # may be more convenient?
  chroot /rootfs bash -c 'pipx install hatch && rm -rf /root/.cache/'
  chroot /rootfs ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
EOF

FROM scratch
ENV PATH="${PATH}:/root/.local/bin"
COPY --link --from=base-stage /rootfs /
RUN hatch --version && uv --version

Throughout my examples I've used quay.io/fedora/fedora-minimal:41, this is a beta image where dnf5 is built-in. Previously on minimal images it'd be microdnf, but once Fedora 41 is released both the minimal image and regular fedora (eg: fedora:41) will have dnf5 as the usual dnf command (finally!). fedora-minimal has a smaller base, but it does make some compromises (for example try running btop, it needs a little extra nudge on your part), I think the UX (at least interactively?) goes down a bit, so I'd generally suggest the regular fedora images, and it should make little difference with this --installroot approach.

Like Fedora, the openSUSE Tumbleweed image is still on hatch 1.9.x, thus both hatch packages are 30 MiB shy of what they'd actually be with uv involved. When that lands you'll get a more minimal/simpler scratch, but honestly the size isn't that big of a win here:

# syntax=docker.io/docker/dockerfile:1

FROM opensuse/tumbleweed AS base-stage

RUN <<EOF
  zypper --releasever tumbleweed --installroot /rootfs --gpg-auto-import-keys refresh
  zypper --releasever tumbleweed --installroot /rootfs --non-interactive install --download-in-advance --no-recommends python311-pipx

  # Cleanup doesn't make a difference in this case (zypper keeps most cache on the main root), but this is how you'd do it:
  # NOTE: If you care about this base-stage image layers you could clear the main root cache without the `--releasever --installroot` args
  # zypper --releasever tumbleweed --installroot /rootfs --non-interactive clean --all

  # No need to worry about the /etc/resolv.conf if you're not doing any network stuff via chroot
  # At runtime of the container Docker will replace it to manage networking itself.
EOF

FROM scratch
COPY --link --from=base-stage /rootfs /
RUN hatch --version

NOTE: If you try to do the pipx install with the opensuse image you'll find that it fails with the rm and ln commands not existing. Those are packages that weren't needed for pipx, but are required to do those extra steps so you'd need to add them. Fedora on the other hand still installs those basic utility commands.

Alpine (roughly 180 MiB total image size)

Smallest by about 30-40 MiB, fairly simple but Alpine with musl does have some caveats to be mindful of.

# syntax=docker.io/docker/dockerfile:1

FROM alpine
RUN <<EOF
  apk add --no-cache pipx
  pipx install hatch && rm -rf /root/.cache
  ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
EOF
ENV PATH="${PATH}:/root/.local/bin"
RUN hatch --version && uv --version

For minimizing size

  • pyapp (Standalone installer via curl) => Perhaps there is something you can remove from above, but it's not clear what will break (or change over time like the addition of uv).
  • Package manager => Probably your best option when the distro supports it, as Fedora does (although there is the disadvantage of version lag, as you cannot get the 1.10.0 release yet to enjoy uv). As can be seen above the size is much less, and with uv it should only go up by roughly 30MB (another issue has the package maintainer discussing it, where they might not make uv a weak dependency, which may force uv to be bundled even if you don't need it).
  • pipx => This is also reasonably lightweight (approx 150MiB). I originally noted that it requires installing uv separately from hatch (which you can do through the same tool for 30MiB more), but EDIT: hatch bundles uv, and you can technically use it directly too. So slightly more weight, but much more broadly accessible 👍
  • ✅ FROM scratch (210 MiB for the whole image) => The smallest of all, but a bit more involved. You can achieve similar size with a much simpler alpine + pipx equivalent without the --installroot multi-stage trick (183 MiB). However Alpine being musl based has some drawbacks (you'll find some articles specifically about issues with Python, but there can be quite a few gotchas), thus I generally discourage it, especially since glibc based distros like fedora and suse can compete reasonably close size wise (210 MiB) with a few extra lines, but much better performance and compatibility.

@ofek
Collaborator

ofek commented May 14, 2024

Thank you for the fantastic writeup!

As of https://github.com/pypa/hatch/releases/tag/hatch-v1.11.0, the binaries pull down distributions that already have Hatch installed which is about as small as I can make that. This is what the official GitHub action to install Hatch will use when I have time to do so.

There is also a new self cache command so after installation you would want to run hatch self cache dist --remove and now all that will exist will be the distribution with Hatch that is tied to the binary. The following is an example:

❯ docker run --rm -it ubuntu bash
root@c8f3aacf6229:/# apt update && apt install -y --no-install-recommends curl ca-certificates
root@c8f3aacf6229:/# du -shx
127M    .
root@c8f3aacf6229:/# curl -LO https://github.com/pypa/hatch/releases/latest/download/hatch-x86_64-unknown-linux-gnu.tar.gz
root@c8f3aacf6229:/# tar xzf hatch-x86_64-unknown-linux-gnu.tar.gz
root@c8f3aacf6229:/# ./hatch self restore
root@c8f3aacf6229:/# rm hatch-x86_64-unknown-linux-gnu.tar.gz
root@c8f3aacf6229:/# ./hatch self cache dist -r
root@c8f3aacf6229:/# du -shx
470M    .

@ofek
Collaborator

ofek commented May 14, 2024

Actually forget what I said please, I'm about to reduce that substantially.

@ofek
Collaborator

ofek commented May 14, 2024

Done!

image

@lwasser
Contributor Author

lwasser commented May 14, 2024

amazing!! ofek, with pycon travel coming up i won't be able to start a tutorial / how to until after i'm back! but also @polarathene you've provided an INCREDIBLE amount of information above and i suspect / know :) that you know a lot more about this topic than i do. would you like to start a tutorial and i can perhaps contribute? or would you like for me to start / try my best to reflect what you have found and then you can review/ contribute / add that way?

it just seems to me that there is so much information in this thread now, that we should capture it and turn it into a documentation page for others to discover!

@lwasser
Contributor Author

lwasser commented May 15, 2024

ofek that is a considerable reduction in image size!! so so awesome!!

@polarathene

Cheers for the improvement @ofek ! 🥳 (EDIT: It seems there are some gotchas to consider vs a pipx install hatch)

The below notes are mostly for my benefit to come back to, but sharing with others if helpful. I'll summarize with a TLDR in a follow-up comment.

Collapsed for brevity (click to view)

Layer insights:

image

image


The size added by hatch self restore is almost equivalent to hatch --version (nearly 200MB added), just 4 MB less.

hatch self cache dist --remove removes 47MB of that added weight from ~/.cache/pyapp, so you can remove this dir afterwards or leave it with the empty content:

image

Actual hatch lives as a python script at /root/.local/share/pyapp/hatch/1303662642487178586/1.11.0/python/bin, but still relies on the binary extracted from curl AFAIK to run (as even with a local python install to run that script directly it is not happy), so move the installer binary to a location like /usr/local/bin/hatch 👍

.pyc / pycache content

The final RUN layer shows that the hatch --version command added about 3MB, and that it's due to running python creating various .pyc cache files like this:

image

PYTHONPYCACHEPREFIX=/path/to/cache is meant to allow customizing the cache dir for this content since Python 3.8, but for some reason in my Dockerfile ENV it wasn't having any effect 🤷‍♂️ (it does for a system pipx install uv, so presumably this is due to hatch using the bundled Python?)

Dockerfile

3 examples, with the first a little bit better documented and avoiding &&.

# syntax=docker.io/docker/dockerfile:1

FROM fedora:40
RUN <<EOF
  # Fedora comes with curl (and tar + gzip, unlike fedora-minimal), nothing to install via dnf

  # Grab the latest release for your arch and extract it to /usr/local/bin, then make it executable:
  HATCH_URL="https://github.com/pypa/hatch/releases/latest/download/hatch-$(uname -m)-unknown-linux-gnu.tar.gz"
  curl -sSfL "${HATCH_URL}" | tar -xzO > /usr/local/bin/hatch
  chmod +x /usr/local/bin/hatch

  # Finish installing hatch, then remove the redundant PyApp cache:
  hatch self restore
  hatch self cache dist --remove
EOF

# syntax=docker.io/docker/dockerfile:1

FROM quay.io/fedora/fedora-minimal:41
RUN <<EOF
  dnf5 install -y tar gzip && dnf5 clean all
  curl -sfSL "https://github.com/pypa/hatch/releases/latest/download/hatch-$(uname -m)-unknown-linux-gnu.tar.gz" \
    | tar -xzO > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch
  hatch self restore && hatch self cache dist --remove
EOF

# syntax=docker.io/docker/dockerfile:1

FROM ubuntu:24.04
RUN <<EOF
  apt update && apt install -y --no-install-recommends curl ca-certificates && rm -rf /var/lib/apt/lists/*
  curl -sfSL "https://github.com/pypa/hatch/releases/latest/download/hatch-$(uname -m)-unknown-linux-gnu.tar.gz" \
    | tar -xzO > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch
  hatch self restore && hatch self cache dist --remove
EOF

Technical details, in case any of the stuff I did is unfamiliar:

  • syntax=docker.io/docker/dockerfile:1 is a good practice encouraged by docker on their docs.
  • The <<EOF (HereDoc) syntax is something I prefer, it's not difficult to grok once you understand <<EOF is a start marker and the EOF the end marker, everything in between is a multi-line input, making it a shell script without && \ noise to use a single RUN layer. Despite this, I do find many are uncomfortable with it, so it might not be ideal for official documentation? 🤷‍♂️
  • The curl URL is also using $(uname -m) to get the processor architecture (x86_64 / aarch64), so you can run the same Dockerfile on either platform. pipx is still probably more straight-forward though.
  • I need to use chmod +x due to extracting the compressed hatch file from tar.gz into new name / location (tar -O > /path/filename). This avoids needing another $(uname -m) or mv, and technically corrects permissions (UID is 1001, GID is 127), but writing the contents to a new file (>) lost the original executable bit (+x), which needs to be restored.
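The extract-and-chmod step from the last bullet can be isolated into a small function. This is a minimal sketch with a made-up name (extract_single_binary); it assumes the archive contains exactly one file, as the hatch release archives used above do.

```shell
# Hypothetical helper: extract a single-file .tar.gz to an exact destination path.
# Usage: extract_single_binary <archive.tar.gz> <destination-file>
extract_single_binary() {
  archive="$1"
  dest="$2"
  mkdir -p "$(dirname "$dest")"
  # -O writes the member's contents to stdout, so the file mode is not preserved...
  tar -xzO -f "$archive" > "$dest"
  # ...which is why the executable bit has to be restored afterwards.
  chmod +x "$dest"
}
```

With a downloaded archive this would be used as `extract_single_binary hatch-x86_64-unknown-linux-gnu.tar.gz /usr/local/bin/hatch`; the curl pipe in the Dockerfiles above does the same thing without the intermediate file.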

Total size via du -sx --bytes --si / (Hatch adds: 140MB /root/.local/share/pyapp/hatch + 4MB /usr/local/bin/hatch):

  • fedora-minimal:41 (263MB / 262 414 865) + 18s image build without cache
  • fedora:40 (366MB / 365 360 517) + 22s image build without cache
  • ubuntu:24.04 (230MB / 229 171 530) + 45s image build without cache (232MB + 67s for ubuntu:22.04, 236MB + 82s for ubuntu:20.04)

I tend to prefer Fedora as a base as it's faster and better UX with the package manager, but most users may have a better UX with Ubuntu images, especially when they need to add additional system packages (this is sometimes inconvenient with Fedora for proprietary packages like nvidia or certain video codecs IIRC).

Ubuntu has the better image size in this case. It's smaller than fedora-minimal:41 (which I show for size + build speed comparison, but I encourage regular fedora base until fedora-minimal shares the same dnf command instead of microdnf / dnf5, which might happen by the final Fedora 41 release).


GH release URLs naming convention change from v1.11.0

The above curl example is for the latest release on GH. If you want to pin a version instead, note that the release file name dropped the version prefix as of 1.11.0. That is not too relevant going forward (hopefully it'll remain consistent from now on; omitting the version prefix is convenient for the latest approach):

https://github.com/pypa/hatch/releases/latest/download/hatch-x86_64-unknown-linux-gnu.tar.gz
https://github.com/pypa/hatch/releases/download/hatch-v1.11.0/hatch-x86_64-unknown-linux-gnu.tar.gz
https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz
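The three URL shapes above can be produced by one helper. This is a sketch only: the hatch_release_url name is made up, only the three URLs listed above are confirmed by this thread, and treating every release before 1.11.0 like 1.10.0 is an assumption.

```shell
# Hypothetical helper: build the GitHub release URL for the Linux glibc (-gnu) archive.
# Usage: hatch_release_url [version|latest] [arch]
hatch_release_url() {
  version="${1:-latest}"
  arch="${2:-$(uname -m)}"
  base="https://github.com/pypa/hatch/releases"
  target="${arch}-unknown-linux-gnu.tar.gz"

  if [ "$version" = "latest" ]; then
    echo "${base}/latest/download/hatch-${target}"
    return
  fi

  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  # Assumption: every release before 1.11.0 carries the version in the file name,
  # as seen above for 1.10.0.
  if [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 11 ]; }; then
    echo "${base}/download/hatch-v${version}/hatch-${version}-${target}"
  else
    echo "${base}/download/hatch-v${version}/hatch-${target}"
  fi
}
```

In a Dockerfile this keeps version pinning to a single argument, for example `curl -sSfL "$(hatch_release_url 1.11.0)"`, while still defaulting to the latest release and the current architecture.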

GH release variants

You also have in addition to the glibc target (-gnu), a -musl one. For anyone interested on the glibc linking:

# These must resolve (and they usually should in a glibc focused distro):
$ ldd /usr/local/bin/hatch
        linux-vdso.so.1 (0x00007ffc2d9e0000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f74e59d3000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f74e6003000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f74e5ffe000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f74e58f0000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f74e5ff9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f74e5703000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f74e600d000)

# Binary built with Rust 1.78 (latest) and a rather old Ubuntu which suggests `cross-rs` Docker image environment:
$ readelf -p .comment /usr/local/bin/hatch

String dump of section '.comment':
  [     0]  GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
  [    35]  rustc version 1.78.0 (9b00956e5 2024-04-29)

# Probably built this way for broader compatibility by targeting a low glibc version,
# cargo zigbuild is a more modern approach that can be used instead:
# Command from my comment here: https://github.com/rust-cross/cargo-zigbuild/issues/231#issuecomment-1987845738
$ readelf -W --version-info --dyn-syms /usr/local/bin/hatch \
  | grep 'Name: GLIBC' \
  | sed -re 's/.*GLIBC_(.+) Flags.*/\1/g' \
  | sort -t . -k1,1n -k2,2n | tail -n 1
2.18

# The equivalent for the static linked musl build (old GCC, March 2021):
$ readelf -p .comment /usr/local/bin/hatch

String dump of section '.comment':
  [     0]  GCC: (GNU) 9.4.0
  [    11]  rustc version 1.78.0 (9b00956e5 2024-04-29)
  [    3d]  GCC: (GNU) 9.2.0

  • However the GH releases only publish -musl for x86_64, thus if you want to support ARM64 (aarch64), just use -gnu.
  • This also means -musl via this install method will only work for Alpine with x86_64 (not that you should be using Alpine for python deployments anyway 🤔 )

Also from 1.11.0 of hatch, there are "dist" variants, which the release page doesn't clarify, but extracting these results in 150MiB of content: hatch + uv + hatchling and a bundled Python 3.12. Perhaps related to the improvement @ofek mentioned above?

Feedback

So with the above improvement, curl is a great install option with about 140MB weight 🎉 (100MB for bundled Python + 30MB for bundled uv)

It'd be neat if you could opt out of the bundled Python and uv, with hatch instead detecting and using the ones available from the system after the bootstrapping is done?

hatch doesn't seem to be aware of its own bundled distribution though, so I assume that isn't possible?

# Running this command installed
$ hatch python find 3.12
Distribution not installed

# Hatch doesn't consider this as a managed python install, it treats the bundle like a system one?
$ hatch python show
      Available
┏━━━━━━━━━━┳━━━━━━━━━┓
┃ Name     ┃ Version ┃
┡━━━━━━━━━━╇━━━━━━━━━┩
│ 3.7      │ 3.7.9   │
├──────────┼─────────┤
│ 3.8      │ 3.8.19  │
├──────────┼─────────┤
│ 3.9      │ 3.9.19  │
├──────────┼─────────┤
│ 3.10     │ 3.10.14 │
├──────────┼─────────┤
│ 3.11     │ 3.11.9  │
├──────────┼─────────┤
│ 3.12     │ 3.12.3  │
├──────────┼─────────┤
│ pypy2.7  │ 7.3.15  │
├──────────┼─────────┤
│ pypy3.9  │ 7.3.15  │
├──────────┼─────────┤
│ pypy3.10 │ 7.3.15  │
└──────────┴─────────┘

# Installing it adds another 160MB:
$ hatch python install 3.12
Installed 3.12 @ /root/.local/share/hatch/pythons/3.12

The following directory has been added to your PATH (pending a shell restart):

/root/.local/share/hatch/pythons/3.12/python/bin

$ du -shx /
445M    /

Not a major concern, and there may be a way to configure this that I'm unfamiliar with, but it's something to be aware of: if you want to pip install ... something, AFAIK that requires bringing in another Python install (either via distro system package, hatch python install <name>, or implicitly via pyproject.toml / hatch.toml, etc). So the above is perhaps not as minimal / convenient as the pipx approach?

Gotchas

I assume once installing actual python packages or similar activity, another install of Python is going to add to the weight? hatch isn't able to use the one it's bundled? (EDIT: Documented below, it's possible for virtual env to use the same Python bundled)

Docs for hatch shell are a bit lacking here:

$ hatch shell --help
Usage: hatch shell [OPTIONS] [ENV_NAME]

  Enter a shell within a project's environment.

Options:
  --name TEXT
  --path TEXT
  -h, --help   Show this message and exit.

From what I've seen elsewhere --name refers to a version of Python as listed under the Name column in hatch python show?

  • The hatch python find CLI help also was not that clear when referring to an expected arg of NAME btw. Including an example in the help output might be better UX, or just the associated web docs (where it's also vague). That would make it less guesswork that it's meant to be a value from hatch python show.
  • The CLI --help also doesn't show defaults like the web docs do.
  • The web docs could link to this section perhaps for the supported versions? While the CLI could mention they're listed in hatch python show?

These two sections from the web docs are a little insightful about what I was after:

There's an ENV HATCH_PYTHON, which doesn't appear to be documented elsewhere? (I tried the docs search box). It mentions that a value of self can be used, which is not valid for --name or --path with hatch shell, but it is valid as an ENV. This prevents installing an extra copy of Python.

hatch shell --name does not appear to be a name related to a Python version however.

Caution: Extra Python expected by default

The first virtual environment adds about 20MB, subsequent ones around 8MB. If there is no other Python detected, hatch downloads a new one which seems to add another 150MB? You can avoid that with the HATCH_PYTHON=self ENV as mentioned above.

$ du -sx --bytes --si /
263M    /

# 3-4MB increase:
$ hatch --version
Hatch, version 1.11.0

$ du -sx --bytes --si /
266M    /

# Environment added, 18MB increase:
$ cd /tmp && HATCH_PYTHON=self hatch shell
$ du -sx --bytes --si /
284M    /
$ exit
# No excess when using without the ENV:
$ hatch shell
$ du -sx --bytes --si /
284M    /

# Different location creates a new environment.
# This time since ENV is omitted it's created by bringing in Python 3.12 again:
$ cd /opt && hatch shell
$ du -sx --bytes --si /
448M    /
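
To make the before/after comparison above repeatable, here is a minimal sketch of a helper that reports how many bytes a command adds under a directory. `measure_growth` is a hypothetical name (not part of hatch), and it assumes GNU `du` since it relies on the same `--bytes` flag used above.

```shell
# Sketch only: `measure_growth` is a hypothetical helper, not a hatch feature.
# Prints how many bytes (apparent size, same filesystem only) a command
# added under a directory. Assumes GNU du for the --bytes flag.
measure_growth() {
  local dir="$1"; shift
  local before after
  before=$(du -sx --bytes "$dir" | cut -f1)
  # Run the remaining arguments as the command to measure, discarding its output.
  "$@" >/dev/null 2>&1 || return 1
  after=$(du -sx --bytes "$dir" | cut -f1)
  echo "$(( after - before ))"
}

# Example usage (not run here), assuming `hatch env create` builds the same
# default environment that `hatch shell` creates implicitly:
#   cd /tmp && measure_growth / env HATCH_PYTHON=self hatch env create
#   cd /opt && measure_growth / hatch env create
```

Comparing the two numbers should show whether the extra Python download happened, without needing to enter and exit an interactive shell.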

Inconsistency within virtual environment due to PATH ENV

The curl install approach differs from pipx / package install in a notable way.

  • Perk: You can share uv command in the environment without any extra steps (like symlinking).
  • Con: You can't use hatch command within the environment, unless you provide an absolute path to the proper command (/usr/local/bin/hatch, which was already discoverable in PATH)
  • These differences apply regardless of HATCH_PYTHON (only affects the virtual env), the difference is due to an extra PyApp addition into the PATH ENV, thus hatch from that location has priority over your actual hatch binary 🤷‍♂️
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

$ cd /tmp && hatch shell --name 3.11
$ python --version
Python 3.12.3

# Fails due to modified PATH:
$ hatch --version
bash: /root/.local/share/pyapp/hatch/1303662642487178586/1.11.0/python/bin/hatch: cannot execute: required file not found
$ /usr/local/bin/hatch --version
Hatch, version 1.11.0
# UV is available however:
$ uv --version
uv 0.1.44

# hatch environment and hatch install location are given precedence for resolving binaries:
$ echo $PATH
/root/.local/share/hatch/env/virtual/opt/y8366zdl/opt/bin:/root/.local/share/pyapp/hatch/1303662642487178586/1.11.0/python/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

dnf install python adds 50 MB and can also be used by the HATCH_PYTHON ENV instead of self. This ENV affects the linked python in the virtual env PATH, which adds a symlink to that location. It seems unnecessary though, as when Python is already installed on the system like this, hatch detects that and will use it by default.

Contrast this with a pipx / package install, where Python is externally available to hatch: it too will create the environment using that Python by default. You'll find that the PATH ENV isn't altered in the same way, so hatch --version will work in the environment while uv will not:

$ dnf install -y pipx && pipx install hatch
$ cd /tmp && hatch shell

$ hatch --version
Hatch, version 1.11.0
$ uv --version
bash: uv: command not found

$ env | grep PATH
PATH=/root/.local/share/hatch/env/virtual/tmp/6WcazSRI/tmp/bin:/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

I assume this difference isn't intentional?
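
For anyone wanting to check which `hatch` wins inside an environment, here is a small diagnostic sketch. `list_on_path` is a hypothetical helper (nothing hatch provides); it only lists the candidates for a command name in PATH lookup order, so the first line printed is the one the shell will try to run.

```shell
# Sketch only: `list_on_path` is a hypothetical helper for diagnosing PATH order.
# Prints every executable called NAME found in a PATH-style string, in lookup
# order. The second argument defaults to the current $PATH.
list_on_path() {
  local name="$1" path_value="${2:-$PATH}" dir
  local IFS=:
  for dir in $path_value; do
    # An empty PATH component means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  return 0
}

# Example usage inside `hatch shell` (not run here):
#   list_on_path hatch
#   list_on_path uv
```

Note that a file can pass the executable check and still fail with "required file not found" (as the PyApp copy does above), so this only shows precedence, not whether the first hit actually works.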

@polarathene

Summary of prior message

Still a tad long, see the prior message for more details.

GH Releases:

  • Hatch releases from v1.11.0 have changed their GH release naming convention.
    • The version is no longer part of the filename, only the release tag in the URL.
    • The compressed contents is also normalized to hatch instead of the filename without .tar.gz.
    • These changes are great, but the release didn't draw attention to them (or explain the dist additions, which appear to be 150MB of content similar to the standalone?).
  • Only -gnu (glibc) is available for ARM64, -musl only offers AMD64 (x86_64), it also seems to be broken (fails to install properly, but so does v1.10.0).

Standalone installer depends on external python despite bundling its own:

  • Other install methods require Python already (pipx, distro package), while the PyApp could use the one that hatch seems to bundle regardless of install method?
  • When creating an environment and no other Python install exists, rather than using its own copy, it'll download another Python (aka hatch python install). Unless you use HATCH_PYTHON=self as a workaround, expect an extra 150MB once you use hatch shell / hatch env or similar.

Standalone installer prepends its bin location to PATH ENV:

  • While it's nice to have uv available in the environment without extra config, this overrides the hatch binary with the hatch Python script from its PyApp install.
  • As a result you're prevented from using the hatch command within a Hatch managed environment, unlike other install methods where this is a non-issue because they only prepend the venv to PATH.

Docs (Web / CLI help) need love:

  • hatch shell especially has --help output that does not explain the --name and --path options. There's overlap with these elsewhere but it's not treated the same. Meanwhile the web docs for this command are equally vague.
  • hatch python find + hatch python install are examples where the CLI output is not very good at communicating what is expected. NAME must be understood to be a Python distribution (as per the web docs venv section on the topic).
    • The user must discover this via the web docs or hatch python show which outputs a Name column that can be inferred as what sibling commands want.
    • CLI help could provide a one-liner example to better communicate what the value of NAME implies (eg: 3.12). While the web docs could link to an appropriate section that already covers what is supported (venv plugin page)
  • While ENV like HATCH_PYTHON are briefly mentioned (venv plugin page) so that you can learn about the HATCH_PYTHON=self, this ENV and others like it don't appear to be documented on the web docs?
  • hatch config show has:

@polarathene

polarathene commented May 15, 2024

@lwasser I've got a bit to juggle elsewhere, but I'd be happy to review a PR when I can spare the time.

I am not that experienced with Python, but I know Linux and Docker very well! If you've got any questions feel free to reach out 👍

I think most of the info I've covered above doesn't really need to go into the docs. It was more about exploring what options were available and the tradeoffs 😎

  • I've revised a Dockerfile for you below, it's documented well and should convey what's necessary to get a basic image with hatch setup.
  • I've added a separate context section below since others interested in docs for Docker may run into similar concerns. One that's common with Docker images is handling deps in a separate layer, although I'd like to try manually installing some in advance.

Dockerfile example

Decisions made:

  • Ubuntu is the smallest image from above experiments. It is also a base image choice that most users will be comfortable with as a reference.
  • pipx is only 5MB larger than the curl install approach
    • Pro: pipx provides a simpler UX? (especially for supporting both AMD64 + ARM64 builds)
    • Con: pipx image does take 90s on my system to build, vs 45s for the curl approach (or 17s via fedora-minimal, while its pipx equivalent takes 36s). This shouldn't be too much of a concern provided the layer isn't invalidated in future builds (cache mounts can alleviate that if needed).
  • I find the Dockerfile below with the HereDoc feature is easier to grok, I'd encourage choosing that.
    • Alternatively, I've provided the old technique of running commands within a single RUN layer.
    • I can't recall compatibility for this feature with Docker prior to v23 (Feb 2023) releases. I think the ENV DOCKER_BUILDKIT=1 may have been required.
  • Symlinking for uv seemed most convenient to manage, while avoiding an extra 30MB.
    • Personally I'd opt-out of the bundled uv in hatch if I could, and pipx install uv with the HATCH_UV ENV set.
    • Maybe it's ok to delete the bundled uv, but that just swaps ln for rm thus no improvement to the Dockerfile?
# syntax=docker.io/docker/dockerfile:1

FROM ubuntu:24.04
RUN <<HEREDOC
  # Install pipx, then empty the apt cache:
  apt-get update && apt-get install -y --no-install-recommends pipx
  rm -rf /var/lib/apt/lists/*

  # Updates the USER `.bashrc` and `.profile` to append `${HOME}/.local/bin` to $PATH
  pipx ensurepath

  # Install hatch, then empty the pip cache:
  pipx install hatch && rm -rf "${HOME}/.cache/pip"

  # Hatch bundles UV, symlink to it to avoid needing `pipx install uv`:
  ln -s "${HOME}/.local/share/pipx/venvs/hatch/bin/uv" /usr/local/bin/uv
HEREDOC
Old approach for `RUN`
FROM ubuntu:24.04
RUN apt-get update \
  && apt-get install -y --no-install-recommends pipx \
  && rm -rf /var/lib/apt/lists/* \
  && pipx ensurepath \
  && pipx install hatch \
  && rm -rf "${HOME}/.cache/pip" \
  && ln -s "${HOME}/.local/share/pipx/venvs/hatch/bin/uv" /usr/local/bin/uv
Fedora equivalent (very little difference)
# syntax=docker.io/docker/dockerfile:1

FROM fedora:40
RUN <<HEREDOC
  dnf install -y pipx && dnf clean all
  pipx ensurepath
  pipx install hatch && rm -rf "${HOME}/.cache/pip"
  ln -s "${HOME}/.local/share/pipx/venvs/hatch/bin/uv" /usr/local/bin/uv
HEREDOC
Reference: Alternative - Standalone via curl

NOTE: Current caveats apply:

  • hatch command does not work in a venv due to modified PATH.
  • uv is not symlinked for that same modified PATH reason that makes it available.
# syntax=docker.io/docker/dockerfile:1

FROM ubuntu:24.04
RUN <<EOF
  apt-get update && apt-get install -y --no-install-recommends curl ca-certificates
  rm -rf /var/lib/apt/lists/*

  # Grabs the latest release for your arch and extracts it to /usr/local/bin:
  HATCH_URL="https://github.com/pypa/hatch/releases/latest/download/hatch-$(uname -m)-unknown-linux-gnu.tar.gz"
  curl -sfSL "${HATCH_URL}" | tar -xzO > /usr/local/bin/hatch
  # Permit this file to run / execute:
  chmod +x /usr/local/bin/hatch

  # Installs standalone hatch, then does some cleanup (remove PyApp cache):
  hatch self restore && hatch self cache dist --remove
EOF

Fedora equivalent (without the commentary):

  • Larger image size (base) than Ubuntu (over 100MB), but faster to build. If you build multiple images for projects that share the same base image layer it's less of an issue.
  • This image already has curl, so no packages to install. Unlike fedora-minimal, it already has tar + gzip too.
  • TIP: Since Hatch v1.11.0, the tar.gz files have normalized the compressed filename to hatch. You could alternatively use tar -xz && mv hatch /usr/local/bin/hatch instead, no chmod +x needed, but the original UID and GID may not be compatible for non-root customizations (the GID changed with v1.11.0, UID remains at 1001).
# syntax=docker.io/docker/dockerfile:1

FROM fedora:40
RUN <<EOF
  HATCH_URL="https://github.com/pypa/hatch/releases/latest/download/hatch-$(uname -m)-unknown-linux-gnu.tar.gz"
  curl -sfSL "${HATCH_URL}" | tar -xzO > /usr/local/bin/hatch
  chmod +x /usr/local/bin/hatch

  hatch self restore && hatch self cache dist --remove
EOF
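
If you'd rather not hardcode the URL, here is a minimal sketch of building it from the arch and libc. `hatch_release_url` is a hypothetical helper, and the `-musl` filename is an assumption that it follows the same naming pattern as the `-gnu` one used above. It refuses `-musl` for anything but x86_64, matching what the GH releases currently publish.

```shell
# Sketch only: `hatch_release_url` is a hypothetical helper, not part of hatch.
# Builds the "latest" GH release URL for a given arch (default: `uname -m`)
# and libc (default: gnu). The -musl filename pattern is assumed to mirror -gnu.
hatch_release_url() {
  local arch="${1:-$(uname -m)}" libc="${2:-gnu}"
  case "$libc" in
    gnu|musl) ;;
    *) echo "unsupported libc: $libc" >&2; return 1 ;;
  esac
  # Per the notes above, -musl is only published for x86_64.
  if [ "$libc" = musl ] && [ "$arch" != x86_64 ]; then
    echo "no -musl release for $arch, use -gnu instead" >&2
    return 1
  fi
  printf 'https://github.com/pypa/hatch/releases/latest/download/hatch-%s-unknown-linux-%s.tar.gz\n' "$arch" "$libc"
}

# Example usage (not run here):
#   curl -sfSL "$(hatch_release_url)" | tar -xzO > /usr/local/bin/hatch
```

Failing early on an unsupported combination seemed friendlier than letting curl hit a 404 partway through a build.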

Context

As the type of user that'd be interested in such docs when I was looking into Hatch, but also as a user new to Python that wants to run some Github projects in Docker containers - I wanted to know what install process for hatch was going to work best to minimize disk space vs plain pip install.

  • We've pretty much established pipx install is still the best choice right now (standalone installer has some caveats remaining, while distro packages are behind in releases to enjoy uv support).
  • The availability of the standalone installer (and its apparent small size on GH releases) did make me wonder if I could use that without pipx or Python, so I might have tried it anyway to compare (and then got confused once actually using hatch due to the present issues outlined above). The docs could try to emphasize that pipx has the least amount of friction / surprises? 🤷‍♂️
  • I'll be trying Hatch at a later date with UV to run some PyTorch based projects, if I learn anything else from that worth sharing I'll chime in here 👍

An unresolved concern I have is going to be how to handle PyTorch. Deps in hatch.toml / pyproject.toml don't have a clear command to install/sync but instead require hatch shell / hatch env run to trigger that implicitly?

  • If I want to "warm" up the cache for UV in advance by installing the 4-5GB torch uses, this should be done in a separate RUN layer (or image/stage) before other deps to prevent this data being discarded when something else in the project is updated (hatch.toml, project source files) which could invalidate the layer.
  • I'm not sure how hatch (and the virtual environments it manages through UV) are involved in that, it's not something you'd really worry about outside of a container.
  • While Docker does have cache mounts which could help with builds (and allow a hatch.toml to be present without layer invalidation concerns) - this would prevent using hard links, thus incurring a copy across the mount boundary introduced. Not really a problem when the image is only being built for a single virtual environment using PyTorch, but if I want to have several that may be a concern.
  • This topic is perhaps more niche / advanced, so it doesn't need to be tackled with the initial Docker guidance, but if someone knows how to approach it that'd be good! Without the cache mount usage, I suppose I could have a separate dummy hatch.toml environment to bring these in (or directly run uv venv + uv pip install, without hatch involved?). The hardlinking feature should take care of the rest I think (if I manage a hatch.toml for each project, I think they can inherit the same PyTorch environment?). I'll try it when I can :)
# Related UV issue as below will need to handle different "local identifiers":
# https://github.com/astral-sh/uv/issues/3437#issuecomment-2102125794
[envs.default]
type = "virtual"
path = "venv-pytorch"
dependencies = [
  "torch==2.3+cu121",
  "torchvision",
  "torchaudio"
]
installer = "uv"

[envs.default.env-vars]
UV_INDEX_URL = "https://download.pytorch.org/whl/cu121"

@lwasser
Contributor Author

lwasser commented Aug 20, 2024

I just wanted to check back in here, y'all. I've been swamped with other volunteer commitments, and I won't be able to follow through with the docker PR. I hope that someone else can hop in and work on this, as this issue contains a lot of great information. We are having good success with using and teaching Hatch over at pyOpenSci, so I hope to continue to see the use of and documentation for Hatch grow!

@jesshart

Hi all! I wanted to introduce myself to this topic as Ofek kindly pointed me here when I mentioned I would be interested in helping out with some documentation.

I work as a data scientist at a small company in Austin, TX and we adopted hatch as our project manager earlier this year after some research. We had been using conda but I ran into some major headaches when trying to deploy using conda + docker + AWS services. Since these AWS services were going to be a big part of how we deployed our solutions, we decided to switch our project manager.

Since I don't want to write an essay here, I will try to keep it short 😁. We decided on hatch and I have been experimenting with it ever since and really enjoy the features though I think documentation could be improved and so I am here to help.

I have not read this entire thread yet but I look forward to catching up and helping as I can (I also volunteer for too many projects 😬).
