
feat(plan-generator): improve memory and cpu usage #1684

Open: wants to merge 6 commits into main


@alepane21 (Contributor) commented Mar 12, 2025

Motivation and Context

A customer reported that the query planner was using too much memory. After some debugging we found two causes:

  • the customer was running the query planner in a container, but the query planner was using the host's memory and CPU limits instead of the container's
  • the PlanGenerator was creating a new plan.Configuration and ast.Document for each goroutine

This PR addresses both issues:

  • using go.uber.org/automaxprocs/maxprocs and github.com/KimMachineGun/automemlimit/memlimit to detect when the process runs inside a container and set GOMAXPROCS and GOMEMLIMIT from the container's limits
  • refactoring the PlanGenerator so that it is instantiated only once, and each operation then gets its own planner.
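The second change can be pictured with a minimal sketch. All names here (PlanGenerator, sharedConfig, planner, GetPlanner, planAll) are hypothetical stand-ins, not the router's actual API; sharedConfig plays the role of the heavy, read-only state such as the plan.Configuration and the schema ast.Document. The expensive state is built once and shared by every worker goroutine, and each operation only allocates a small planner that points at it.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedConfig is a hypothetical stand-in for heavy, read-only planning state.
type sharedConfig struct {
	schema string
}

// PlanGenerator is built once and shared by all workers (hypothetical type).
type PlanGenerator struct {
	cfg *sharedConfig // never mutated after construction, so safe to share
}

// planner holds only per-operation state (hypothetical type).
type planner struct {
	cfg *sharedConfig
}

var configBuilds int // counts how often the expensive state is built

// NewPlanGenerator performs the expensive setup exactly once.
func NewPlanGenerator(schema string) *PlanGenerator {
	configBuilds++
	return &PlanGenerator{cfg: &sharedConfig{schema: schema}}
}

// GetPlanner returns a cheap planner that references the shared state.
func (g *PlanGenerator) GetPlanner() *planner {
	return &planner{cfg: g.cfg}
}

// Plan is a placeholder for real planning work.
func (p *planner) Plan(operation string) string {
	return fmt.Sprintf("plan(%s) against %d-byte schema", operation, len(p.cfg.schema))
}

// planAll plans every operation with `concurrency` workers sharing one generator.
func planAll(g *PlanGenerator, ops []string, concurrency int) []string {
	if concurrency < 1 {
		concurrency = 1
	}
	results := make([]string, len(ops))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each operation gets its own planner; the heavy state is shared.
				// Workers write to distinct indices, so no lock is needed.
				results[i] = g.GetPlanner().Plan(ops[i])
			}
		}()
	}
	for i := range ops {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	g := NewPlanGenerator("type Query { hello: String }")
	plans := planAll(g, []string{"q1", "q2", "q3"}, 2)
	fmt.Println(len(plans), "plans,", configBuilds, "config build(s)")
}
```

The design only works if the shared state is never written after construction: anything a planner mutates while planning has to live in the per-operation planner, otherwise concurrent workers would race.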

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

Results

Testing with a large execution config (9 MB), 643 operations and a concurrency of 20, using the query-plan command from router 0.189.2, we got the following results:

ale@Alessandros-MacBook-Pro router % /usr/bin/time -l ./router query-plan -execution-config ../routerConfig.json -operations ../operations  -plans ../plans --concurrency 20
        0.96 real         9.70 user         0.74 sys
          2041774080  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              125644  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                1377  signals received
                1787  voluntary context switches
               42662  involuntary context switches
        115559925736  instructions retired
         37222549148  cycles elapsed
          2026227680  peak memory footprint

The results vary slightly between executions, but they are always around these values.

With the router built from this PR, we got the following results:

ale@Alessandros-MacBook-Pro router % /usr/bin/time -l ./router query-plan -execution-config ../routerConfig.json -operations ../operations  -plans ../plans --concurrency 20
        0.77 real         7.28 user         0.35 sys
           509607936  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               31690  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                1598  signals received
                1763  voluntary context switches
               27882  involuntary context switches
         87230173821  instructions retired
         27234204057  cycles elapsed
           493224824  peak memory footprint

So the maximum resident set size went from about 2 GB to about 0.5 GB!
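To put the two runs side by side, here is a small Go program that computes the relative reduction for a few of the `time -l` metrics above. It uses only the numbers reported in the two runs; each is a single execution, and as noted above the figures vary slightly between runs.

```go
package main

import "fmt"

// metric pairs one measurement from the two runs above.
type metric struct {
	name          string
	before, after float64
}

// reduction returns the relative decrease in percent (positive means lower).
func reduction(m metric) float64 {
	return (m.before - m.after) / m.before * 100
}

func main() {
	metrics := []metric{
		{"wall time (s)", 0.96, 0.77},
		{"user CPU time (s)", 9.70, 7.28},
		{"maximum resident set size (bytes)", 2041774080, 509607936},
		{"peak memory footprint (bytes)", 2026227680, 493224824},
		{"instructions retired", 115559925736, 87230173821},
		{"cycles elapsed", 37222549148, 27234204057},
	}
	for _, m := range metrics {
		fmt.Printf("%-35s %6.1f%% lower\n", m.name, reduction(m))
	}
}
```

With these figures, maximum RSS drops by about 75% (roughly 4x), wall time by about 20%, and user CPU time and instructions retired by about 25%.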


github-actions bot commented Mar 12, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-a75719116da8fbaffad7f1637b9179f395d19b3d

@alepane21 marked this pull request as ready for review on March 12, 2025, 11:30
@StarpTech (Contributor):

Can you tell us anything about the real-world improvements? (Memory, etc.)

@alepane21 (Contributor, Author):

> Can you tell us anything about the real-world improvements? (Memory, etc.)

@StarpTech Good idea, I've added them to the description, thanks!

// when the system is under memory pressure e.g. when GC is not able to free memory fast enough.
// More details: https://tip.golang.org/doc/gc-guide#Memory_limit
mLimit, err := memlimit.SetGoMemLimitWithOpts(
memlimit.WithRatio(0.9),
Contributor:

Do you think there is any case where we'd want to parameterize this? I'd assume not, since a user can set GOMEMLIMIT, but I wonder whether they would sometimes want to specify a ratio.

Contributor (Author):

I would keep it as is, to follow the same pattern we use in the router's main command: https://github.com/wundergraph/cosmo/blob/main/router/cmd/instance.go#L37

// More details: https://tip.golang.org/doc/gc-guide#Memory_limit
mLimit, err := memlimit.SetGoMemLimitWithOpts(
memlimit.WithRatio(0.9),
memlimit.WithProvider(memlimit.FromCgroupHybrid),
Contributor:

Could you add a comment about memlimit.FromCgroupHybrid?
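For readers following this thread, here is a stdlib-only sketch of roughly what the memlimit call boils down to on a cgroup v2 host. It is not the automemlimit implementation. As we read the automemlimit docs, FromCgroupHybrid tries cgroup v2 first and falls back to cgroup v1; this sketch handles only v2, and the path /sys/fs/cgroup/memory.max and the helper name goMemLimitFromCgroupV2 are assumptions for illustration. The 0.9 ratio mirrors WithRatio(0.9). Skipping when GOMEMLIMIT is already set reflects our understanding that memlimit does the same, which is relevant to the question above about users who set GOMEMLIMIT themselves.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// goMemLimitFromCgroupV2 (hypothetical helper) turns the contents of a cgroup v2
// memory.max file into a Go memory limit, keeping (1 - ratio) of the container
// limit as headroom for memory the Go runtime does not account for.
// "max" means the cgroup has no memory limit.
func goMemLimitFromCgroupV2(memoryMax string, ratio float64) (int64, bool) {
	v := strings.TrimSpace(memoryMax)
	if v == "" || v == "max" {
		return 0, false
	}
	limit, err := strconv.ParseUint(v, 10, 64)
	if err != nil || limit == 0 {
		return 0, false
	}
	goLimit := float64(limit) * ratio
	if goLimit >= math.MaxInt64 {
		return math.MaxInt64, true
	}
	return int64(goLimit), true
}

func main() {
	// Assumed to mirror memlimit: never override an explicit GOMEMLIMIT.
	if _, set := os.LookupEnv("GOMEMLIMIT"); set {
		fmt.Println("GOMEMLIMIT already set; not overriding")
		return
	}
	raw, err := os.ReadFile("/sys/fs/cgroup/memory.max") // assumed cgroup v2 path
	if err != nil {
		fmt.Println("no cgroup v2 memory limit found:", err)
		return
	}
	if limit, ok := goMemLimitFromCgroupV2(string(raw), 0.9); ok {
		debug.SetMemoryLimit(limit) // same effect as setting GOMEMLIMIT
		fmt.Println("Go memory limit set to", limit, "bytes")
	}
}
```

For a container limited to 1 GiB (1073741824 bytes), this sets the Go memory limit to 966367641 bytes, leaving about 10% for non-heap memory.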

logger.Info(fmt.Sprintf(msg, args...))
}))
if err != nil {
logger.Fatal(fmt.Sprintf("could not set max GOMAXPROCS: %s", err.Error()))
@SkArchon (Contributor), Mar 12, 2025:

You should be able to use zap's structured fields directly, without needing fmt.Sprintf, e.g.:

logger.Fatal("Could not start router", zap.Error(err))

You can do the same in the other similar places here, using the matching field type, e.g. zap.String.
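The same idea applied to the GOMAXPROCS side is shown below as a stdlib-only sketch. It is not the automaxprocs implementation. It reads a cgroup v2 cpu.max value at the assumed path /sys/fs/cgroup/cpu.max, rounds the quota down with a floor of 1 (our understanding of automaxprocs' default rounding), and logs with typed fields instead of fmt.Sprintf. log/slog stands in for zap so the sketch runs with the standard library only; with zap the equivalent fields would be zap.Error(err) and zap.Int(...). The helper name procsFromCPUMax is hypothetical.

```go
package main

import (
	"log/slog"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// procsFromCPUMax (hypothetical helper) converts a cgroup v2 cpu.max value,
// "<quota> <period>" in microseconds or "max <period>" for no limit, into a
// GOMAXPROCS value, rounding down with a minimum of 1.
func procsFromCPUMax(cpuMax string) (int, bool) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return 0, false
	}
	procs := int(quota / period)
	if procs < 1 {
		procs = 1
	}
	return procs, true
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	raw, err := os.ReadFile("/sys/fs/cgroup/cpu.max") // assumed cgroup v2 path
	if err != nil {
		// Typed field instead of fmt.Sprintf (zap.Error(err) with zap).
		logger.Warn("could not read CPU quota", slog.Any("error", err))
		return
	}
	procs, ok := procsFromCPUMax(string(raw))
	if !ok {
		logger.Info("no CPU quota, keeping GOMAXPROCS", slog.Int("gomaxprocs", runtime.GOMAXPROCS(0)))
		return
	}
	prev := runtime.GOMAXPROCS(procs)
	logger.Info("set GOMAXPROCS", slog.Int("previous", prev), slog.Int("gomaxprocs", procs))
}
```

Typed fields keep values machine-readable in structured log output, which a preformatted string does not.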


3 participants