
Data Commons Local Mixer Developer Guide

Prerequisites

  • Contact the Data Commons team to get data access to Cloud Bigtable and BigQuery.

  • Install the following tools:

    • gcloud
    • Golang
    • protoc at version 3.21.12
      • If using Homebrew, run brew install protobuf@21 and be sure to update your path as described in the output (likely it'll instruct you to run echo 'export PATH="/opt/homebrew/opt/protobuf@21/bin:$PATH"' >> ~/.zshrc).

    Make sure to add GOPATH and update PATH:

    # Use the actual path of your Go installation
    export GOPATH=/Users/<USER>/go/
    export PATH=$PATH:$GOPATH/bin
  • Authenticate to GCP

    gcloud components update
    gcloud auth login
    gcloud auth application-default login

Generate Go proto files

Install the following packages as a one-time action.

cd ~/   # Be sure there is no go.mod in the local directory
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.30.0
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.3.0

Run the following command to generate Go proto files.

# In repo root directory
./scripts/compile_protos.sh

Start Mixer as a gRPC server

To run the Mixer with a default configuration, use the run_server.sh script:

./run_server.sh

This will start the Mixer with a cleaner, less-dense log output suitable for local development. To use the raw JSON logs instead, pass the --json-log flag.

Once the Mixer is ready to serve, you can send some sample gRPC requests:

go run examples/api/main.go

Log Level Configuration

You can configure the logging level using the MIXER_LOG_LEVEL environment variable. Valid values are DEBUG, INFO, WARN, ERROR.

If not specified:

  • Local runs (via run_server.sh) default to DEBUG.
  • Other runs default to INFO.

Example:

MIXER_LOG_LEVEL=ERROR ./run_server.sh

Start Mixer as a gRPC server backed by SQLite Database

Mixer can load data stored in a SQLite database. This requires setting the flag:

  • --use_sqlite=true

Run the following to start Mixer with the SQLite database datacommons.db under test/ in this repo.

# In repo root directory
export MIXER_API_KEY=<YOUR API KEY>
./run_server.sh \
    --use_bigquery=false \
    --use_base_bigtable=false \
    --use_branch_bigtable=false \
    --use_maps_api=false \
    --use_sqlite=true \
    --sqlite_path=$PWD/test/datacommons.db \
    --remote_mixer_domain=https://api.datacommons.org

Start Mixer as a gRPC server backed by CloudSQL Database

Mixer can load data stored in Google Cloud SQL. This requires setting the flag:

  • --use_cloudsql=true

Run the following to start Mixer backed by a Cloud SQL instance.
# In repo root directory
export MIXER_API_KEY=<YOUR API KEY>
export DB_USER=<user>
export DB_PASS=<password>
./run_server.sh \
    --use_bigquery=false \
    --use_base_bigtable=false \
    --use_branch_bigtable=false \
    --use_maps_api=false \
    --use_cloudsql=true \
    --cloudsql_instance=<project>:<region>:dc-graph \
    --remote_mixer_domain=https://api.datacommons.org

Start Mixer with Spanner Graph and V3 APIs

Enabling Spanner Graph requires the following feature flags to be set:

  • EnableV3: true
  • UseSpannerGraph: true

These are currently set in local.yaml.

Additionally, to use a database other than the default in spanner_graph_info.yaml, set the feature flag:

  • SpannerGraphDatabase: <DATABASE NAME>
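For reference, assuming a flat key/value YAML layout, the relevant fragment of deploy/featureflags/local.yaml would look roughly like this (illustrative sketch; consult the actual file):

```yaml
# Enable the V3 APIs backed by Spanner Graph
EnableV3: true
UseSpannerGraph: true
# Optionally override the default database from spanner_graph_info.yaml:
# SpannerGraphDatabase: <DATABASE NAME>
```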
# In repo root directory
export MIXER_API_KEY=<YOUR API KEY>
./run_server.sh \
    --feature_flags_path=$PWD/deploy/featureflags/local.yaml \
    --spanner_graph_info="$(cat deploy/storage/spanner_graph_info.yaml)" 

Running ESP locally

Mixer is a gRPC service, but its callers (the website, API clients) are typically HTTP clients. Developing and testing Mixer locally therefore often requires both the Mixer gRPC server and a corresponding JSON-transcoding server. HTTP-to-gRPC translation can be done locally through the Envoy proxy. To install Envoy, follow the official doc.

Before running the Envoy proxy, make sure the Mixer service definition (mixer-grpc.pb) is available by running the following from the repo root.

protoc --proto_path=proto \
  --include_source_info \
  --include_imports \
  --descriptor_set_out mixer-grpc.pb \
  proto/*.proto proto/**/*.proto

Start the Mixer gRPC server (listening at localhost:12345) with ./run_server.sh as described in the previous section.

In a new shell, run the following from the repo root to spin up the Envoy proxy. This exposes the HTTP Mixer service at localhost:8081.

envoy -l warning --config-path esp/envoy-config.yaml

Running Redis locally

Mixer can use Redis as a cache. To run Redis locally for development and connect mixer to it:

  1. Install Redis and start a Redis server. On macOS, you can use Homebrew:
brew install redis
redis-server
  2. Start Mixer with ./run_server.sh as described above and add the following flags:
./run_server.sh \
  --feature_flags_path=$PWD/deploy/featureflags/local.yaml \
  --use_redis=true \
  --redis_info="$(cat <<EOF
instances:
  - host: "127.0.0.1"
    port: "6379"
EOF
)"

Setting feature flags to match an environment

Use the --feature_flags_path argument to specify a feature flag environment YAML file to read values from. If not specified, default flag values are used.

# Example for local testing:
./run_server.sh --feature_flags_path=$PWD/deploy/featureflags/local.yaml
# Use the same values that are used in dev
./run_server.sh --feature_flags_path=$PWD/deploy/featureflags/dev.yaml

Update Go package dependencies

To view possible updates:

go list -m -u all

To update:

go get -u ./...
go mod tidy

Run tests (Go)

./run_test.sh

Lint (Go)

./run_test.sh -l

Auto-fix some lint issues (Go)

./run_test.sh -f

Update e2e test golden files (Go)

./scripts/update_golden.sh

Run import group latency tests

In root directory, run:

./test/e2e/run_latency.sh

Profile a program

Install Graphviz.

go test -v -parallel 1 -cpuprofile cpu.prof -memprofile mem.prof XXX_test.go
go tool pprof -png cpu.prof
go tool pprof -png mem.prof

Profile Mixer Startup Memory

Run the regular go run cmd/main.go command that you'd like to profile with the flag --startup_memprof=<output_file_path>. This will save the memory profile to that path, which you can then analyze with go tool pprof. For example:

# Command from ### Start Mixer as a gRPC server backed by TMCF + CSV files
# In repo root directory
go run cmd/main.go \
    --host_project=datcom-mixer-dev-316822 \
    --bq_dataset=$(head -1 deploy/storage/bigquery.version) \
    --base_bigtable_info="$(cat deploy/storage/base_bigtable_info.yaml)" \
    --schema_path=$PWD/deploy/mapping/ \
    --startup_memprof=grpc.memprof     # <-- note the additional flag here

# -sample_index=alloc_space reports on all memory allocations, including those
# that have been garbage collected. use -sample_index=inuse_space for memory
# still in use after garbage collection
go tool pprof -sample_index=alloc_space -png grpc.memprof

Profile API Requests against a Running Mixer Instance

Run the regular go run cmd/main.go command that you'd like to profile with the flag --httpprof_port=<port, recommended 6060>. This will run the mixer server with an HTTP handler at that port serving memory and CPU profiles of the running server.

go run cmd/main.go \
    --host_project=datcom-mixer-dev-316822 \
    --bq_dataset=$(head -1 deploy/storage/bigquery.version) \
    --base_bigtable_info="$(cat deploy/storage/base_bigtable_info.yaml)" \
    --schema_path=$PWD/deploy/mapping/ \
    --httpprof_port=6060     # <-- note the additional flag here

Once this server is ready to serve requests, you can send it requests and use the profile handler to retrieve memory and CPU profiles. test/http_memprof/http_memprof.go is a program that automatically sends and profiles the memory usage of given gRPC calls. You can adapt this file to your profiling needs or use it as a starting point for an independent script that runs a suite of tests automatically.

# in another process...
# Defaults are shown: --grpc_addr is where to find the Mixer server,
# --prof_addr is where to find the live profile handler
go run test/http_memprof/http_memprof.go \
  --grpc_addr=127.0.0.1:12345 \
  --prof_addr=127.0.0.1:6060

go tool pprof also supports ad-hoc profiling of servers started as described above. To use it, specify the URL of the HTTP handler as the input file argument. pprof will download a profile from the handler and open an interactive session for running queries.

# ?gc=1 triggers a garbage collection run before the memory profile is served
# See net/http/pprof for other URLs and profiles available https://pkg.go.dev/net/http/pprof
# with no flags specifying output, pprof goes into interactive mode
go tool pprof -sample_index=alloc_space 127.0.0.1:6060/debug/pprof/heap?gc=1