8 changes: 8 additions & 0 deletions docs/flm_npu_linux.html
@@ -56,6 +56,14 @@ <h2>Background</h2>
NOTE: This is a beta feature right now, and to use it you will need to
run lemonade-server with the environment variable
<code>LEMONADE_FLM_LINUX_BETA=1</code> set.
<br><br>
DOCKER USERS: The beta currently uses <code>which</code> to locate your flm binary.
Member
@bluefalcon13 can you confirm that FLM works in the docker at all? That would be a pleasant surprise.

Author
Here is the Dockerfile for the lemonade-server docker I am using: https://github.com/bluefalcon13/local_ai_stack/blob/main/configs/lemonade/Dockerfile

The docker compose is at the project root.

Can 100% confirm: after a bunch of fighting, I have a functional lemonade docker with a custom llama-cpp and flm built. I need to bump my max LLMs so I can run them concurrently, then it's more fighting to try to get FLM to act as a drafter. :D


Member
What about the official docker released from this repo?

If you see where I am going with this: if we add a docker note to the website, people will think the built-in docker works with the NPU if they just do the one tip.

Any chance you want to update the mainline docker definition here to work with the NPU?

Author
I might be able to. I ran into an issue with my Ubuntu docker (I am more familiar with Debian-based distros) because I moved up to Arch's mainline kernel. Ubuntu did NOT play nice with that, and building XRT (and its plugin) from source requires the kernel headers. Shortly after that, I moved the container to Arch.

Member
> What about the official docker released from this repo?

How about we bundle FLM in that once it releases?

Author
FWIW, I'm running natively in Arch myself using the xrt and xdna-plugin packages that I uploaded MRs for.

> You just need to build FLM, and there is an AUR for that too: https://aur.archlinux.org/packages/fastflowlm-git

Yeah, there is, but in a docker, it's almost the same as pulling source and adding some tweaks :P

I did pull in XRT and the plugin though from extra-testing. Those are super annoying to build.

Member
Can you help push those out of testing? I'm new to Arch packaging, and I'm not sure what is needed for that to happen.

Author
I have no idea how to do that either. Just looking over Arch's docs: the core-testing process is pretty clear, but the rules don't seem to be as strict for extra-testing → extra. https://wiki.archlinux.org/title/Official_repositories#extra-testing

Member
They're migrated now.

Author
> What about the official docker released from this repo?
>
> If you see where I am going with this: if we add a docker note to the website, people will think the built-in docker works with the NPU if they just do the one tip.
>
> Any chance you want to update the mainline docker definition here to work with the NPU?

After much pain, I can confirm, yes it does:

```
root@30c2954fe628:/opt/lemonade# flm validate
[Linux]  Kernel: 7.0.0-rc2-1-mainline
[Linux]  NPU: /dev/accel/accel0 with 8 columns
[Linux]  NPU FW Version: 1.1.2.65
[Linux]  amdxdna version: 0.6
[Linux]  Memlock Limit: infinity
root@30c2954fe628:/opt/lemonade#
```

I inserted the following at line 67 of the Dockerfile. I've never built a .deb before, but in theory you could do that in a separate stage, then pull the .deb in and install it.

```dockerfile
RUN apt update && apt install -y --no-install-recommends \
    software-properties-common && add-apt-repository ppa:amd-team/xrt && \
    apt update && apt install -y --no-install-recommends \
    amdxdna-dkms build-essential cmake git g++ libavcodec-dev libavdevice-dev libavformat-dev \
    libavutil-dev libboost-dev libboost-program-options-dev libcurl4-openssl-dev \
    libdrm-dev libfftw3-dev libswscale-dev libxrt-dev libxrt-npu2 ninja-build \
    uuid-dev && rm -fr /var/lib/apt/lists/*

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

RUN cd /opt && git clone --recursive https://github.com/FastFlowLM/FastFlowLM.git && \
    cd /opt/FastFlowLM/src && cmake --preset linux-default -G Ninja \
        -DCMAKE_BUILD_TYPE=Release && \
    cmake --build build -j$(nproc) && \
    cmake --install build
```
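
The separate-stage .deb idea mentioned above could look roughly like this. This is a minimal sketch, not a tested Dockerfile: the `flm-builder` stage name is mine, and the `cpack -G DEB` step assumes CPack is wired up in the FastFlowLM CMake project (if it isn't, you'd package with `dpkg-deb` instead):

```dockerfile
# HYPOTHETICAL multi-stage sketch: build FLM in a throwaway stage, then
# install only the resulting .deb in the final image to keep it slim.
FROM ubuntu:24.04 AS flm-builder
RUN apt update && apt install -y --no-install-recommends \
    build-essential ca-certificates cmake git ninja-build
RUN git clone --recursive https://github.com/FastFlowLM/FastFlowLM.git /opt/FastFlowLM && \
    cd /opt/FastFlowLM/src && \
    cmake --preset linux-default -G Ninja -DCMAKE_BUILD_TYPE=Release && \
    cmake --build build -j$(nproc) && \
    cd build && cpack -G DEB   # ASSUMPTION: CPack configured to emit a .deb

FROM ubuntu:24.04
# Pull only the package across; build toolchain stays out of the final image.
COPY --from=flm-builder /opt/FastFlowLM/src/build/*.deb /tmp/
RUN apt update && apt install -y /tmp/*.deb && rm -fr /tmp/*.deb /var/lib/apt/lists/*
```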

Additional verification:

```
root@30c2954fe628:/opt/lemonade# LEMONADE_FLM_LINUX_BETA=1 ./lemonade-server recipes
Recipe              Backend     Status          Message/Version                               Action
----------------------------------------------------------------------------------------------------------------------------------------------------
flm                 npu         update_required Backend update is required before use.         lemonade-server recipes --install flm:npu
kokoro              cpu         installable     Backend is supported but not installed.        lemonade-server recipes --install kokoro:cpu
llamacpp            system      unsupported     llama-server not found in PATH                 -
                    metal       unsupported     Requires macOS                                 -
                    vulkan      installable     Backend is supported but not installed.        lemonade-server recipes --install llamacpp:vulkan
                    rocm        installable     Backend is supported but not installed.        lemonade-server recipes --install llamacpp:rocm
                    cpu         installable     Backend is supported but not installed.        lemonade-server recipes --install llamacpp:cpu
ryzenai-llm         npu         unsupported     Requires Windows                               -
sd-cpp              rocm        installable     Backend is supported but not installed.        lemonade-server recipes --install sd-cpp:rocm
                    cpu         installable     Backend is supported but not installed.        lemonade-server recipes --install sd-cpp:cpu
whispercpp          npu         unsupported     Requires Windows                               -
                    vulkan      installable     Backend is supported but not installed.        lemonade-server recipes --install whispercpp:vulkan
                    cpu         installable     Backend is supported but not installed.        lemonade-server recipes --install whispercpp:cpu
----------------------------------------------------------------------------------------------------------------------------------------------------
root@30c2954fe628:/opt/lemonade#
```

docker run command used:

```shell
docker run -it --rm --device /dev/kfd --device /dev/dri --device /dev/accel/accel0 \
  --ulimit memlock=-1:-1 \
  --group-add $(getent group render | cut -d: -f3) \
  --group-add $(getent group video | cut -d: -f3) \
  --security-opt seccomp=unconfined --ipc=host lemonade:test bash
```
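
The two `--group-add` flags resolve group names to numeric GIDs on the host, since names may not exist inside the container. A small sketch of that lookup (the `gid_of` helper name is mine, not from lemonade or FLM):

```shell
# Hypothetical helper: extract the numeric GID for a group name, the same
# way the --group-add $(getent group render | cut -d: -f3) flags do.
# getent prints "name:passwd:gid:members"; field 3 is the GID.
gid_of() {
  getent group "$1" | cut -d: -f3
}

# The parsing itself, shown on a literal getent-style line (GID made up):
echo 'render:x:989:' | cut -d: -f3   # prints 989
```

Passing the GID rather than the name means the container needs no matching `/etc/group` entry to gain access to the `/dev/dri` and `/dev/accel` nodes.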

I did not run it myself, but that's because I am currently already running it in my Arch container, and I am not sure I want to find out how graceful that handoff is!

If you do not have <code>which</code> in your docker, it will fail to pull models with
an error stating FLM cannot be installed automatically on Linux.
<br><br>
Additionally, if you volume-mount your container so that <code>~/.cache/lemonade/hardware_info.json</code>
persists across docker builds, you will need to delete the file on your host
so that a new one is rebuilt; a stale file can prevent the FLM beta flag from being recognized.
</p>
</div>
