Commit 4619a87

first commit
0 parents  commit 4619a87

4 files changed

Lines changed: 80 additions & 0 deletions

.private-env

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# fscrawler
+export ELASTIC_VERSION=7.17.0
+export FSCRAWLER_VERSION=2.10-SNAPSHOT-ocr-es6

README.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# docker-compose-fscrawler
+
+> Mostly inspired by the [fscrawler docs](https://fscrawler.readthedocs.io/en/latest/dev/doc.html)
+
+
+## What
+> Build a basic search engine with Elasticsearch and FSCrawler, and start it quickly with Docker Compose.
+
+
+## How to use
+### Source the version env file
+
+```
+# export ELASTIC_VERSION=7.17.0
+# export FSCRAWLER_VERSION=2.10-SNAPSHOT-ocr-es6
+source .private-env
+```
+
+### Run Elasticsearch
+
+```
+docker-compose up -d elasticsearch
+docker-compose logs -f elasticsearch
+```
+
+### Run FSCrawler
+
+```
+docker-compose up fscrawler
+```
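
The README steps depend on `ELASTIC_VERSION` and `FSCRAWLER_VERSION` being exported before `docker-compose` runs; if either is unset, Compose substitutes an empty string and the image reference ends in a bare colon. A minimal sketch of a guard is shown here, assuming a POSIX shell. The `require_env` helper is hypothetical and not part of this commit.

```shell
# require_env NAME...: hypothetical helper (an assumption, not in the commit).
# Reports every named variable that is unset or empty and returns non-zero
# if at least one is missing.
require_env() {
  _missing=0
  for _name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing required variable: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}
```

It could be chained in front of the first README command, for example `. ./.private-env && require_env ELASTIC_VERSION FSCRAWLER_VERSION && docker-compose up -d elasticsearch`.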

config/job_name/_settings.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+name: "job_name"
+elasticsearch:
+  nodes:
+  - url: "http://elasticsearch:9200"

docker-compose.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+version: '3'
+services:
+  # Elasticsearch Cluster
+  elasticsearch:
+    image: docker.elastic.co/elasticsearch/elasticsearch:$ELASTIC_VERSION
+    container_name: elasticsearch
+    environment:
+      - bootstrap.memory_lock=true
+      - discovery.type=single-node
+    restart: always
+    ulimits:
+      memlock:
+        soft: -1
+        hard: -1
+    volumes:
+      - data:/usr/share/elasticsearch/data
+    ports:
+      - 9200:9200
+    networks:
+      - fscrawler_net
+
+  # FSCrawler
+  fscrawler:
+    image: dadoonet/fscrawler:$FSCRAWLER_VERSION
+    container_name: fscrawler
+    restart: always
+    volumes:
+      - ${PWD}/config:/root/.fscrawler
+      - ${PWD}/logs:/usr/share/fscrawler/logs
+      - ../../test-documents/src/main/resources/documents/:/tmp/es:ro
+    depends_on:
+      - elasticsearch
command: fscrawler --rest idx
34+
networks:
35+
- fscrawler_net
36+
37+
volumes:
38+
data:
39+
driver: local
40+
41+
networks:
42+
fscrawler_net:
43+
driver: bridge
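
In this compose file, `depends_on` only controls start order; it does not wait for Elasticsearch to accept requests on port 9200. A minimal sketch of a readiness poll that could run between the two README steps is shown here, assuming `curl` is available on the host. The `wait_for_es` helper and its default retry values are hypothetical and not part of this commit.

```shell
# wait_for_es URL [TRIES] [DELAY]: hypothetical helper (an assumption, not in the commit).
# Polls URL with curl until it answers successfully or TRIES attempts are used up.
wait_for_es() {
  _url=$1
  _tries=${2:-30}
  _delay=${3:-2}
  _i=0
  while [ "$_i" -lt "$_tries" ]; do
    # -f: treat HTTP errors as failure; -s: quiet; -o /dev/null: discard the body
    if curl -fs -o /dev/null "$_url"; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  echo "Elasticsearch did not answer at $_url" >&2
  return 1
}
```

A possible use, given the published port above: `docker-compose up -d elasticsearch && wait_for_es http://localhost:9200 && docker-compose up fscrawler`.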
