
Commit c734630

numb3r3 and ZiniuYu authored
refactor: use the new cas server (#128)
* fix: use new clip server
* fix: use new clip server
* fix: address comment
  Co-authored-by: Ziniu Yu <[email protected]>
* fix: address comment
  Co-authored-by: Ziniu Yu <[email protected]>
* fix: use new clip server
* fix: add cas token env in flow parser
* fix: add cas token env in flow parser
* fix: add cas token env in flow parser
* fix: disable external

Co-authored-by: Ziniu Yu <[email protected]>
1 parent 5116d57 commit c734630

File tree: README.md, flow-jcloud.yml, flow.yml, flow_parser.py

4 files changed, +70 -20 lines changed

README.md (+48 -4)
@@ -29,6 +29,7 @@ DALL·E Flow is in client-server architecture.
 
 ## Updates
 
+- ⚠️ **2022/10/26** To use CLIP-as-service available at `grpcs://api.clip.jina.ai:2096` (requires `jina >= v3.11.0`), you first need to get an access token from [here](https://console.clip.jina.ai/get_started). See [Use the CLIP-as-service](#use-the-clip-as-service) for more details.
 - 🌟 **2022/9/25** Automated [CLIP-based segmentation](https://github.com/timojl/clipseg) from a prompt has been added.
 - 🌟 **2022/8/17** Text to image for [Stable Diffusion](https://github.com/CompVis/stable-diffusion) has been added. In order to use it you will need to agree to their ToS, download the weights, then enable the flag in docker or `flow_parser.py`.
 - ⚠️ **2022/8/8** Started using CLIP-as-service as an [external executor](https://docs.jina.ai/fundamentals/flow/add-executors/#external-executors). Now you can easily [deploy your own CLIP executor](#run-your-own-clip) if you want. There is [a small breaking change](https://github.com/jina-ai/dalle-flow/pull/74/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R103) as a result of this improvement, so [please _reopen_ the notebook in Google Colab](https://colab.research.google.com/github/jina-ai/dalle-flow/blob/main/client.ipynb).
@@ -170,7 +171,7 @@ DALL·E Flow needs one GPU with 21GB VRAM at its peak. All services are squeezed
 
 The following reasonable tricks can be used for further reducing VRAM:
 - SwinIR can be moved to CPU (-3GB)
-- CLIP can be delegated to [CLIP-as-service demo server](https://github.com/jina-ai/clip-as-service#text--image-embedding) (-3GB)
+- CLIP can be delegated to [CLIP-as-service free server](https://console.clip.jina.ai/get_started) (-3GB)
 
 
 It requires at least 50GB free space on the hard drive, mostly for downloading pretrained models.
@@ -399,10 +400,53 @@ Congrats! Now you should be able to [run the client](#client).
 
 You can modify and extend the server flow as you like, e.g. changing the model, adding persistence, or even auto-posting to Instagram/OpenSea. With Jina and DocArray, you can easily make DALL·E Flow [cloud-native and ready for production](https://github.com/jina-ai/jina).
 
-### Run your own CLIP
 
-By default [`CLIPTorchEncoder`](https://hub.jina.ai/executor/gzpbl8jh) runs as an [external executor](https://docs.jina.ai/fundamentals/flow/add-executors/#external-executors).
-If you want to run your own CLIP, you can do that by removing external executor related configs (`host, port, tls and external`) from [`flow.yml`](./flow.yml).
+### Use the CLIP-as-service
+
+To reduce VRAM usage, you can use `CLIP-as-service` as an external executor, freely available at `grpcs://api.clip.jina.ai:2096`.
+First, make sure you have created an access token from the [console website](https://console.clip.jina.ai/get_started), or via CLI as follows:
+
+```bash
+jina auth token create <name of PAT> -e <expiration days>
+```
+
+Then, change the executor-related configs (`host`, `port`, `external`, `tls` and `grpc_metadata`) in [`flow.yml`](./flow.yml):
+
+```yaml
+...
+  - name: clip_encoder
+    uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
+    host: 'api.clip.jina.ai'
+    port: 2096
+    tls: true
+    external: true
+    grpc_metadata:
+      authorization: "<your access token>"
+    needs: [gateway]
+...
+  - name: rerank
+    uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
+    host: 'api.clip.jina.ai'
+    port: 2096
+    uses_requests:
+      '/': rank
+    tls: true
+    external: true
+    grpc_metadata:
+      authorization: "<your access token>"
+    needs: [dalle, diffusion]
+```
+
+You can also use `flow_parser.py` to automatically generate and run the flow using `CLIP-as-service` as an external executor:
+
+```bash
+python flow_parser.py --cas-token "<your access token>"
+jina flow --uses flow.tmp.yml
+```
+
+> ⚠️ `grpc_metadata` is only available in Jina `v3.11.0` and later. If you are using an older version, please upgrade to the latest version.
+
+Now, you can use the free `CLIP-as-service` in your flow.
 
 <!-- start support-pitch -->
 ## Support
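As a quick way to verify the access token before wiring it into `flow.yml`, the hosted CLIP-as-service can also be queried directly from Python. This is a minimal sketch, not part of this commit; it assumes the `clip-client` package is installed, and its `credential` argument mirrors the `grpc_metadata: authorization` entry above.

```python
# Minimal sketch (assumes `pip install clip-client`); not part of this commit.
from clip_client import Client

# The Authorization credential plays the same role as grpc_metadata.authorization in flow.yml.
c = Client(
    'grpcs://api.clip.jina.ai:2096',
    credential={'Authorization': '<your access token>'},
)

# Encode a couple of prompts to confirm the token is accepted.
embeddings = c.encode(['an oil painting of a lighthouse', 'a photo of a cat'])
print(embeddings.shape)
```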

flow-jcloud.yml (-8)
@@ -30,10 +30,6 @@ executors:
     uses_with:
       name: ViT-L-14-336::openai
     replicas: 1
-    host: 'demo-cas.jina.ai'
-    port: 2096
-    tls: true
-    external: true
     needs: [gateway]
     gpus: all
     jcloud:
@@ -58,10 +54,6 @@ executors:
     uses_requests:
       '/': rank
     replicas: 1
-    host: 'demo-cas.jina.ai'
-    port: 2096
-    tls: true
-    external: true
     needs: [dalle, diffusion]
     gpus: all
     jcloud:

flow.yml (-8)
@@ -15,10 +15,6 @@ executors:
     replicas: 1 # change this if you have larger VRAM
   - name: clip_encoder
     uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
-    host: 'demo-cas.jina.ai'
-    port: 2096
-    tls: true
-    external: true
     needs: [gateway]
   - name: diffusion
     uses: executors/glid3/config.yml
@@ -30,12 +26,8 @@ executors:
     needs: [clip_encoder]
   - name: rerank
     uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
-    host: 'demo-cas.jina.ai'
-    port: 2096
     uses_requests:
       '/': rank
-    tls: true
-    external: true
     needs: [dalle, diffusion]
   - name: upscaler
     uses: executors/swinir/config.yml
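With the `demo-cas.jina.ai` settings stripped from `flow.yml`, `clip_encoder` and `rerank` now start as local executors unless `flow_parser.py` re-points them. As a hedged sketch (not part of this commit), the `jina flow --uses flow.yml` step can equally be driven from Python:

```python
# Minimal sketch, assuming the dalle-flow executors are set up locally.
from jina import Flow

# Load the default flow definition; clip_encoder and rerank run locally by default now.
f = Flow.load_config('flow.yml')

with f:
    f.block()  # serve the flow until interrupted
```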

flow_parser.py (+22)
@@ -32,6 +32,7 @@
 ENV_GPUS_GLID3XL = 'GPUS_GLID3XL'
 ENV_GPUS_SWINIR = 'GPUS_SWINIR'
 ENV_GPUS_STABLE_DIFFUSION = 'GPUS_STABLE_DIFFUSION'
+ENV_CAS_TOKEN = 'CAS_TOKEN'
 
 FLOW_KEY_ENV = 'env'
 FLOW_KEY_ENV_CUDA_DEV = 'CUDA_VISIBLE_DEVICES'
@@ -44,6 +45,9 @@
 SWINIR_FLOW_NAME = 'upscaler'
 STABLE_DIFFUSION_FLOW_NAME = 'stable'
 
+CLIP_AS_SERVICE_HOST = os.environ.get('CLIP_AS_SERVICE_HOST', 'api.clip.jina.ai')
+CLIP_AS_SERVICE_PORT = os.environ.get('CLIP_AS_SERVICE_PORT', '2096')
+
 
 def represent_ordereddict(dumper, data):
     '''
@@ -103,6 +107,11 @@ def represent_ordereddict(dumper, data):
                     action='store_true',
                     help="Enable Stable Diffusion executor (default false)",
                     required=False)
+parser.add_argument('--cas-token',
+                    dest='cas_token',
+                    help="Token to authenticate with the CAS service (default ''). If not set, the CAS service will not be used.",
+                    default='',
+                    required=False)
 parser.add_argument('--gpus-dalle-mega',
                     dest='gpus_dalle_mega',
                     help="GPU device ID(s) for DALLE-MEGA (default 0)",
@@ -152,6 +161,7 @@ def represent_ordereddict(dumper, data):
     args.get('gpus_stable_diffusion')
 gpus_swinir = os.environ.get(ENV_GPUS_SWINIR, False) or \
     args.get('gpus_swinir')
+cas_token = os.environ.get(ENV_CAS_TOKEN, '') or args.get('cas_token')
 
 CLIPSEG_DICT = OrderedDict({
     'env': {
@@ -174,6 +184,7 @@ def represent_ordereddict(dumper, data):
     'uses': f'executors/{STABLE_DIFFUSION_FLOW_NAME}/config.yml',
 })
 
+
 def _filter_out(flow_exec_list, name):
     return list(filter(lambda exc: exc['name'] != name, flow_exec_list))
 
@@ -185,6 +196,17 @@ def _filter_out(flow_exec_list, name):
     print(exc)
     sys.exit(1)
 
+# If the cas_token is not empty, use CLIP-as-service as an external executor
+if cas_token:
+    for ext in flow_as_dict['executors']:
+        if ext['name'] in [CAS_FLOW_NAME, RERANK_FLOW_NAME]:
+            ext['host'] = CLIP_AS_SERVICE_HOST
+            ext['port'] = int(CLIP_AS_SERVICE_PORT)
+            ext['external'] = True
+            ext['tls'] = True
+            ext['grpc_metadata'] = {'authorization': cas_token}
+
+
 # For backwards compatibility, we inject the stable diffusion configuration
 # into the flow yml and then remove it if needed.
 #
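To illustrate the end-to-end effect of the snippet above, here is a hedged, standalone sketch of the same patch. It assumes `CAS_FLOW_NAME` and `RERANK_FLOW_NAME` (defined outside this diff) resolve to `clip_encoder` and `rerank`, and that PyYAML is available:

```python
# Standalone sketch of the CAS patch; executor names 'clip_encoder' and 'rerank'
# are assumptions (CAS_FLOW_NAME / RERANK_FLOW_NAME are not shown in this diff).
import os
import yaml  # pyyaml

cas_token = os.environ.get('CAS_TOKEN', '')

# Load the default flow definition.
with open('flow.yml') as fp:
    flow_as_dict = yaml.safe_load(fp)

# Point the two CLIP executors at the hosted CLIP-as-service when a token is given.
if cas_token:
    for ext in flow_as_dict['executors']:
        if ext['name'] in ('clip_encoder', 'rerank'):
            ext['host'] = os.environ.get('CLIP_AS_SERVICE_HOST', 'api.clip.jina.ai')
            ext['port'] = int(os.environ.get('CLIP_AS_SERVICE_PORT', '2096'))
            ext['external'] = True
            ext['tls'] = True
            ext['grpc_metadata'] = {'authorization': cas_token}

# Write the patched flow for `jina flow --uses flow.tmp.yml`.
with open('flow.tmp.yml', 'w') as fp:
    yaml.safe_dump(flow_as_dict, fp, sort_keys=False)
```

Either `python flow_parser.py --cas-token "<your access token>"` or exporting `CAS_TOKEN` before running the parser yields the same `flow.tmp.yml`, which `jina flow --uses flow.tmp.yml` then serves.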
