Semantic search using AI embeddings

The underlying implementation landed in v3.7.0. It must be enabled through manual configuration:

- conf.json

   For example, with Alibaba Cloud Bailian:

   ```
   "embeddingModel": "text-embedding-v4",
   "embeddingBaseURL": "https://dashscope.aliyuncs.com/compatible-mode/v1",
   "embeddingAPIKey": "sk-xxx"
   ```
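- After the index is rebuilt, text vectorization starts
   - Blocks smaller than 7 bytes or larger than 12000 bytes are skipped (roughly: 2 or fewer Chinese characters, or more than about 4000)
   - data/.siyuan/embeddingignore excludes content from vectorization; it uses the same glob syntax as indexignore (.gitignore syntax)
- Vectorization is complete once the block_embeddings table has the same number of rows as the blocks table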
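
The length rule and the completion check can be sketched as follows. This is a minimal illustration, not the actual implementation: `should_embed` and `embedding_progress` are hypothetical helper names, the byte length is assumed to be measured on the UTF-8 encoding, and both tables are assumed to be readable through a single SQLite connection.

```python
import sqlite3

MIN_BYTES = 7       # blocks smaller than this are skipped
MAX_BYTES = 12000   # blocks larger than this are skipped


def should_embed(text: str) -> bool:
    """Return True when the block's byte length is within the stated limits.

    Assumption: the length is measured on the UTF-8 encoding, where a
    Chinese character usually takes 3 bytes.
    """
    size = len(text.encode("utf-8"))
    return MIN_BYTES <= size <= MAX_BYTES


def embedding_progress(conn: sqlite3.Connection) -> tuple:
    """Return (embedded_rows, total_blocks) for a progress check.

    Assumption: block_embeddings and blocks live in the same database.
    """
    embedded = conn.execute("SELECT COUNT(*) FROM block_embeddings").fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]
    return embedded, total
```

Under these assumptions, a two-character Chinese block (6 bytes) is skipped and a three-character block (9 bytes) is vectorized.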

Test endpoint: /api/search/semanticSearchBlock

```json
{
    "query": "find content related to foo"
}
```
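
A request to this endpoint could be sent with a short script like the one below. It is a sketch under assumptions: the kernel is taken to listen on `http://127.0.0.1:6806` and to accept an `Authorization: Token ...` header, and the helper names are hypothetical. The response format is not described here, so the script simply returns the decoded JSON.

```python
import json
import urllib.request


def build_semantic_search_request(query, base_url="http://127.0.0.1:6806", token=""):
    """Build a POST request for the semantic search test endpoint.

    Assumptions: base_url and the token header format are not stated in
    this issue; adjust them to your own setup.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Token {token}"
    return urllib.request.Request(
        base_url + "/api/search/semanticSearchBlock",
        data=body,
        headers=headers,
        method="POST",
    )


def semantic_search(query, **kwargs):
    """Send the request and return the decoded JSON response as-is."""
    request = build_semantic_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `semantic_search("find content related to foo", token="...")` sends the same payload as the JSON above.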

Vectorization performance: a library of 5 million characters consumed 2 million tokens during vectorization, at 27 blocks per second.
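
These figures give a rough way to estimate cost and duration for another library. The sketch below only extrapolates linearly from the single measurement above; the function names are hypothetical, and real ratios will vary with the model and the content.

```python
# Figures from the measurement above
CHARS = 5_000_000
TOKENS = 2_000_000
BLOCKS_PER_SECOND = 27

# About 0.4 tokens per character in this measurement
tokens_per_char = TOKENS / CHARS


def estimate_tokens(char_count: int) -> int:
    """Linear estimate of tokens consumed for a library of char_count characters."""
    return round(char_count * tokens_per_char)


def estimate_seconds(block_count: int) -> float:
    """Linear estimate of the time needed to vectorize block_count blocks."""
    return block_count / BLOCKS_PER_SECOND
```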

To be investigated:

- Keyword highlighting
- Setting a relevance (vector distance) range to reduce irrelevant results
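
One possible shape for the distance range is a simple cutoff applied to the ranked results. The sketch below assumes cosine distance, which is not confirmed here as the metric in use; the function names and the `max_distance` default are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / norm


def filter_by_distance(query_vec, candidates, max_distance=0.5):
    """Keep block IDs whose distance to the query is within max_distance.

    candidates is a list of (block_id, vector) pairs; the result is
    ordered from nearest to farthest.
    """
    scored = [(cosine_distance(query_vec, vec), block_id) for block_id, vec in candidates]
    kept = sorted((d, block_id) for d, block_id in scored if d <= max_distance)
    return [block_id for _, block_id in kept]
```

A suitable cutoff would need to be tuned per embedding model, since distance distributions differ between models.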

---

**Note**: All content within the length limits is vectorized. Exclude private content by configuring embeddingignore.

Semantic search using AI embeddings #17788
