This repository has been archived by the owner on Dec 16, 2020. It is now read-only.

Auto-restart functions #69

Open
PeriGK opened this issue Apr 10, 2020 · 4 comments

Comments

@PeriGK

PeriGK commented Apr 10, 2020

My actions before raising this issue

Hi,

I have written a few functions in OpenFaaS. Some of them are not used very frequently. This morning I tried to send a request to a function that had not been touched (no HTTP request, build, or deploy) for a few weeks.

The function never came up.

Some facts that came from my investigation:

The docker service ps command returns a shutdown state.
docker service inspect returns a MaxAttempts of 5 in the RestartPolicy, which might be related or not.
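
The restart policy mentioned above can be read programmatically. The following is a minimal sketch, assuming the default JSON output of docker service inspect, where the policy sits under Spec.TaskTemplate.RestartPolicy. The helper names restart_policy and inspect_service are hypothetical and are not part of OpenFaaS or Docker.

```python
import json
import subprocess


def restart_policy(inspect_json):
    """Return the RestartPolicy dict from `docker service inspect` JSON output.

    Assumes the default (non --pretty) output: a JSON array with one entry
    per inspected service. Returns {} when the policy is absent.
    """
    services = json.loads(inspect_json)
    if not services:
        return {}
    spec = services[0].get("Spec", {})
    return spec.get("TaskTemplate", {}).get("RestartPolicy", {})


def inspect_service(name):
    """Run `docker service inspect <name>` and return its restart policy."""
    result = subprocess.run(
        ["docker", "service", "inspect", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return restart_policy(result.stdout)
```

Calling inspect_service("wordcount") on a swarm manager node would be expected to return a dict that includes the MaxAttempts value reported above, if the service is configured that way.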

In the meantime, since this is a local environment, I have been shutting down my machine every night, which I suppose contributes to the issue one way or another.

Are any of those related? What about the read_timeout/write_timeout settings?

Expected Behaviour

The function should recover in response to the invoke/HTTP request, with some expected delay.

Everything goes back to normal if I do a build and deploy again from the faas-cli (a new function with the same contents as the old one). But I would like this to happen without any manual intervention.

Current Behaviour

The function does not recover from the shutdown state.

Possible Solution

Steps to Reproduce (for bugs)

  1. Set up a local function. A plain return {"hello": "world"} would suffice.
  2. Leave the function idle for a couple of hours and make sure the machine restarts at least once in the meantime. You may force the condition by shutting down the Docker service that serves the function.
  3. Try to reach the function again.
  4. The function does not wake up.
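
Step 3 can be scripted so that the check is repeatable. This is a minimal sketch, assuming the gateway URI from the environment report below (http://127.0.0.1:8080) and the gateway's /function/<name> invoke route. The helpers function_url and probe, the retry counts, and the function name used in the usage note are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request

GATEWAY = "http://127.0.0.1:8080"  # gateway URI taken from the environment report


def function_url(name, gateway=GATEWAY):
    """Build the invoke URL for a function behind the gateway."""
    return f"{gateway.rstrip('/')}/function/{name}"


def _http_status(url, timeout=10.0):
    """Return the HTTP status code for a GET, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def probe(name, attempts=5, delay=2.0, fetch=_http_status, sleep=time.sleep):
    """Invoke a function up to `attempts` times.

    Returns (ok, statuses): ok is True once a status below 500 is seen,
    statuses lists every status observed (None for a failed request).
    """
    url = function_url(name)
    statuses = []
    for attempt in range(1, attempts + 1):
        status = fetch(url)
        statuses.append(status)
        if status is not None and status < 500:
            return True, statuses
        if attempt < attempts:
            sleep(delay)
    return False, statuses
```

For example, probe("wordcount") would return False together with the list of failed statuses if the function never comes back within the retry budget.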

Context

I understand this is a common concern, so I don't think this is a bug, but rather a gap in my understanding or in the documentation.

So my questions are:

  • How would you mitigate it? What is the most openfaas-y solution for that problem?
  • Are timeouts the best setting to amend?
  • Are Docker build options, e.g. a restart policy, good practice?

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
  ___                   _____           ____
 / _ \ _ __   ___ _ __ |  ___|_ _  __ _/ ___|
| | | | '_ \ / _ \ '_ \| |_ / _` |/ _` \___ \
| |_| | |_) |  __/ | | |  _| (_| | (_| |___) |
 \___/| .__/ \___|_| |_|_|  \__,_|\__,_|____/
      |_|

CLI:
 commit:  73004c23e5a4d3fdb7352f953247473477477a64
 version: 0.11.3

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.18.10
 sha:     80b6976c106370a7081b2f8e9099a6ea9638e1f3
 commit:  Update Golang versions to 1.12


Provider
 name:          faas-swarm
 orchestration: swarm
 version:       0.8.2 
 sha:           47988f8ba284678f3eb86eb62f75f72bafeec4d9
Your faas-cli version (0.11.3) may be out of date. Version: 0.12.2 is now available on GitHub.
  • Docker version docker version (e.g. Docker 17.0.05 ):
Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Docker Swarm

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Linux

  • Code example or link to GitHub repo or gist to reproduce problem:
    N/A

  • Other diagnostic information / logs from troubleshooting guide
    The service shows no logs.

Thanks,
P.

@alexellis
Member

Hi, thanks for your interest.

Unfortunately unless you fill out the issue template including "Steps to Reproduce", then we're unlikely to be able to help.

Please could you do that?

Thanks

@alexellis alexellis transferred this issue from openfaas/faas Apr 10, 2020
@PeriGK
Author

PeriGK commented Apr 10, 2020

Hi @alexellis, sorry about that, I forgot. I have fixed it now.

@PeriGK
Author

PeriGK commented Apr 13, 2020

More details:

I spotted that the containers affected by the problem show the following error in docker service ps service_name. Check the ERROR column.

ID                  NAME                IMAGE                     NODE                DESIRED STATE       CURRENT STATE         ERROR                              PORTS
3omhdi90dsvg        wordcount.1         functions/alpine:latest   moro-dell           Shutdown            Failed 2 months ago   "No such container: wordcount.…"   
obee7cwi3ffa         \_ wordcount.1     functions/alpine:latest   moro-dell           Shutdown            Failed 2 months ago   "No such container: wordcount.…"   
nf1i6pwoct3v         \_ wordcount.1     functions/alpine:latest   moro-dell           Shutdown            Failed 2 months ago   "No such container: wordcount.…"   
sai0wpxjx0z3         \_ wordcount.1     functions/alpine:latest   moro-dell           Shutdown            Failed 2 months ago   "No such container: wordcount.…"   
j9c0yzil6idv         \_ wordcount.1     functions/alpine:latest   moro-dell           Shutdown            Failed 2 months ago   "No such container: wordcount.…"   
opfuml973pxs         \_ wordcount.1     functions/alpine:latest   moro-dell           Shutdown            Failed 2 months ago   "No such container: wordcount.…"   
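
The condition shown in the table above can also be detected in a script. This is a minimal sketch, assuming the tasks are listed with docker service ps --format '{{json .}}' <service>, which is expected to print one JSON object per task with keys such as Name, DesiredState, CurrentState and Error. The helper names are hypothetical.

```python
import json


def parse_tasks(ps_json_lines):
    """Parse one-JSON-object-per-line task output into a list of dicts."""
    return [json.loads(line) for line in ps_json_lines.splitlines() if line.strip()]


def failed_tasks(ps_json_lines):
    """Return tasks whose desired state is Shutdown and that report an error."""
    return [
        task
        for task in parse_tasks(ps_json_lines)
        if task.get("DesiredState") == "Shutdown" and task.get("Error")
    ]


def all_shut_down(ps_json_lines):
    """True when the service has tasks and none of them is meant to be running."""
    tasks = parse_tasks(ps_json_lines)
    return bool(tasks) and all(t.get("DesiredState") != "Running" for t in tasks)
```

For the wordcount service above, all_shut_down would be expected to return True, since every listed task has a desired state of Shutdown.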

@PeriGK
Author

PeriGK commented May 15, 2020

Hi @alexellis

I did some more investigation. I managed to reproduce it with a function that was working in the afternoon but not the next morning. It looks like the swarm manager couldn't recover the container after I shut down my machine.

Not sure if you have any input on that, but I don't see any other explanation.

Thanks,
P.
