Referee reports # 1, CMAME Sep. 2016 (decision: reject) #19

labarba opened this issue Oct 19, 2016 · 13 comments

labarba commented Oct 19, 2016

Revised paper submitted to CMAME on 15 March 2016 — Decision: Reject

Editor comments

I regret to inform you that the reviewers of your manuscript have advised against publication, and I must therefore decline it. For your guidance, the reviewers' comments are included below.

Referee # 1 comments

Summary of the problem being solved. This paper considers boundary element method (BEM) formulations of certain problems, namely, Laplace problems and Stokes flow problems that arise in blood flow simulation. Computationally, this formulation requires solving a large dense linear system, most typically by Krylov subspace-based iterative methods, of which this paper considers GMRES. The usual performance bottleneck of a Krylov subspace solver is the matrix-vector product ("matvec"). In the target applications, the matrix is special: its entries are K(x, y) evaluated at pairs of points (x, y), where K(x, y) decays rapidly with the distance between x and y. In this case, one may implement fast but approximate matrix-vector products. A classical approach for the target problems is the fast multipole method (FMM) of Greengard & Rokhlin (mid-80s). The FMM lets one trade off performance and accuracy via a parameter, the so-called degree of the multipole expansion, denoted by $p$ and roughly corresponding to the order of truncation in a series expansion. (Higher values of $p$ are more accurate but slower.)

With this background, the specific problem this paper tackles is to study the interaction between $p$ and the outer GMRES solver.

Technical contribution. The paper develops an integrated GMRES+FMM method for its target problems that exploits a clever and nontrivial observation from the theory of inexact Krylov methods: as the iterations proceed, the matvec can actually become progressively less accurate without harming the convergence rate of the solver. The idea of the paper is to decrease $p$ as the iterations proceed.
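
In rough pseudocode, the scheme as I understand it is the following (a minimal sketch using my own helper names, `fmm_matvec` and `required_order`, and an assumed classical form of the truncation bound; this is not the authors' code):

```python
import math
import numpy as np

def required_order(eps, ratio=0.5):
    # Invert an assumed classical truncation bound, error ~ ratio**(p + 1),
    # for p; the paper's actual bound may carry extra constants.
    return max(1, math.ceil(math.log(eps) / math.log(ratio)) - 1)

def relaxed_gmres(fmm_matvec, b, eta=1e-5, p_min=5, max_iter=50):
    """GMRES with a relaxed (decreasing) FMM order p.

    fmm_matvec(x, p): hypothetical FMM-accelerated matvec at expansion
    order p. eta is the desired relative-residual tolerance.
    """
    n = b.size
    b_norm = np.linalg.norm(b)
    Q = np.zeros((n, max_iter + 1))      # Arnoldi basis
    H = np.zeros((max_iter + 1, max_iter))
    Q[:, 0] = b / b_norm
    r_norm = b_norm
    for k in range(max_iter):
        # Inexact-Krylov relaxation: as the residual shrinks, a looser
        # (cheaper, lower-p) matvec suffices.
        eps_k = min(eta * b_norm / r_norm, 1.0)
        p_k = max(required_order(eps_k), p_min)
        w = fmm_matvec(Q[:, k], p_k)
        for j in range(k + 1):           # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
        # Residual of the small least-squares problem
        e1 = np.zeros(k + 2)
        e1[0] = b_norm
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        r_norm = np.linalg.norm(H[:k + 2, :k + 1] @ y - e1)
        if r_norm <= eta * b_norm:
            break
    return Q[:, :k + 1] @ y
```

The key line is the computation of `eps_k`: the allowed matvec perturbation is inversely proportional to the current residual norm, so $p$ falls as the solve progresses.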

From this idea, the paper studies several model problems to show that it can work and characterizes the time and accuracy tradeoffs, with some interesting FMM-specific observations along the way.

Summary evaluation and recommendation. Though there is a lot of "setup," the idea is both clever and simple, meaning it is likely to work and have an impact. The results are a good first "proof-of-concept" of the feasibility of the proposed technique. The suggested edits that appear below are all minor. As such, I recommend the paper be accepted with just minor revisions.

One major strength of the paper, beyond the idea and its demonstration, is the effort the authors made toward making the results reproducible, including an open-source code repository with sufficient infrastructure to recompute the results and even make the plots. Cool!

  • [ ] 1. One major weakness to me is that the procedure (or schedule) for relaxing $p$ is not described with enough precision. Section 2.6 sketches a procedure with reference to equations (17)-(19), but I did not find this description sufficiently algorithmic to code up myself or to use to interpret what is happening in the results, e.g., in Figure 6. (I suppose the code that accompanies the paper is that algorithmic description. However, I think the paper should also summarize it in a way that does not require referring to the code.)

Detailed comments for the authors. My remaining issues, detailed below, are fairly minor and mostly a matter of editing or questions to consider clarifying in a revision.

  • 2. Section 2.5, near equation (14): $y_c$ should be defined explicitly.
  • 3. Below equation (14), "… resulting in an octree structure." I assume this octree is adaptive, since it only needs to represent points at the surface of the objects? (Just saying "adaptive octree" would be sufficient here.)
  • 4. Section 2.6, equation (17): Is it really the case that $\epsilon_k$ can be as large as 1? In that case, the perturbation is as large as the norm of the operator, which seems surprisingly high.
  • 5. Section 2.6, below equation (16): There is a space missing in the text, "… where $a$ isthe cluster radius …"
  • 6. Various places: I think there is some minor inconsistency in the notation for the number of Gauss points. It is introduced as $K$ (capital-K) but appears in the appendix as $N_k$ and in some figures and results as $k$ (lowercase-k).
  • 7. Section 2.6: As noted in the summary above, the precise procedure for modifying $p$ with respect to equations (17)-(19) is not clear enough to this reviewer.
  • 8. Section 3, Figure 3: The claim of linear scaling in Figure 3 seems unnecessarily qualitative. Rather than using the "eye-norm," why not just fit $N^r$ to the points and report the exponent $r$?
  • 9. Section 3 and 3.1: The results focus immediately on the case of using the FMM to approximate the matrix. For completeness, another quick experiment would be to compare using the exact matrix to the FMM and reporting just the number of iterations in each case. Since the cost of using the exact matrix might be prohibitive due to its quadratic scaling in $N$, just a few cases with "small" values of $N$ (possibly up to $N=10,000$) would be sufficient. The point of such an experiment is to help the reader verify that there is no blow-up in the number of iterations due to approximation.
  • 10. Section 3, Figure 6: How would the residual look if $p$ were fixed at its initial value?
  • 11. Section 3.1, page 10: The paper discusses the relative imbalance of near- and far-field cost. It would be good to show that explicitly, the same way you do later, e.g., Figure 12.
  • 12. Figure 11: In the relaxed scheme, what is the schedule for varying $p$? Is that given by the values listed for $p_{\mathrm{min}}$ in Table 5?
  • 13. Figure 14: Why are the middle tolerances faster than the extremes?
  • 14. Figure 17: Like Figure 14, why do middle values of $N$ require more iterations?

tingyu66 commented Dec 6, 2016

comment # 2:
fixed by commit ff3ace7


tingyu66 commented Dec 6, 2016

comment # 3:
fixed by commit d90a0c8


tingyu66 commented Dec 6, 2016

comment # 4:

Section 2.6, equation (17): Is it really the case that $\epsilon_k$ can be as large as 1? In that case, the perturbation is as large as the norm of the operator, which seems surprisingly high.

Yes: the last iteration before convergence can make $\epsilon_k$ as large as 1. At that point the residual $r_k$ is smaller than the desired tolerance $\eta$, so the first argument of the min function in Eq. (17) exceeds 1 and the min returns 1. That is, the allowed perturbation reaches its maximum ($\epsilon_k = 1$) at convergence.
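
A numerical illustration of this saturation (a sketch only: the actual Eq. (17) may include an extra constant factor, and `allowed_perturbation` is an illustrative name):

```python
def allowed_perturbation(r_prev, eta):
    """Sketch of the Eq. (17)-style bound: the allowed matvec perturbation
    grows as the residual r_prev shrinks, capped at 1 by the min."""
    return min(eta / r_prev, 1.0)

# Mid-iteration: residual well above the tolerance -> small perturbation
print(allowed_perturbation(r_prev=1e-2, eta=1e-5))  # 0.001
# Near convergence: residual below the tolerance -> min saturates at 1
print(allowed_perturbation(r_prev=5e-6, eta=1e-5))  # 1.0
```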


tingyu66 commented Dec 6, 2016

comment # 5:
fixed by commit f39f844

tingyu66 commented

comment # 6:
fixed by commit b1b2ed4


tingyu66 commented Jan 13, 2017

comment # 7:

Section 2.6: As noted in the summary above, the precise procedure for modifying $p$ with respect to equations (17)-(19) is not clear enough to this reviewer.

The FMM error bound in Eq. (18) is a function of the order of expansion $p$. Since $\theta_{\mathrm{MAC}} = 0.5$, we replace $a/r$ with 0.5 (the worst case) in Eq. (18). By equating the FMM error bound with the allowed perturbation $\epsilon_k$ at the $k$-th iteration in Eq. (17), we obtain the relationship between the required $p_k$ and $\epsilon_k$, shown in Eq. (19). $\epsilon_k$ itself is computed from Eq. (17), given the residual at the previous iteration, $r_{k-1}$, and the desired tolerance $\eta$.
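
Concretely, and with the caveat that Eq. (18) includes constants omitted here, the inversion looks like this sketch (`required_p` is an illustrative name):

```python
import math

def required_p(eps_k, ratio=0.5):
    """Invert an assumed classical truncation bound, error ~ ratio**(p + 1),
    for the smallest integer p meeting the allowed perturbation eps_k.
    With theta_MAC = 0.5 the worst-case a/r is 0.5, so each added order
    buys roughly one bit of accuracy."""
    return max(1, math.ceil(math.log(eps_k) / math.log(ratio)) - 1)

print(required_p(1e-4))  # 13: a mid-solve perturbation needs a moderate p
print(required_p(1.0))   # 1: at convergence almost any p suffices
```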


tingyu66 commented Jan 13, 2017

comment # 8:

Section 3, Figure 3: The claim of linear scaling in Figure 3 seems unnecessarily qualitative. Rather than using the "eye-norm," which not just fit $N^r$ to the points and report the exponent $r$?

I agree it is always better to have both qualitative and quantitative results. The fitted exponent (the slope of a least-squares fit in this log-log plot) is 1.06, confirming near-linear scaling with respect to $N$.
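
For reference, the fit is a one-line least-squares in log-log space; the data below are placeholders, and the measured timings behind Figure 3 live in the paper's repository:

```python
import numpy as np

# Placeholder (N, time) pairs; see the plotting scripts in the repository
# for the actual values.
N = np.array([2048.0, 8192.0, 32768.0, 131072.0])
t = np.array([0.7, 3.1, 13.5, 58.0])

# Fit t ~ C * N**r: the slope of the log-log linear fit is the exponent r
r, logC = np.polyfit(np.log(N), np.log(t), 1)
print(f"fitted exponent r = {r:.2f}")
```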

tingyu66 commented

comment # 9:

Section 3 and 3.1: The results focus immediately on the case of using the FMM to approximate the matrix. For completeness, another quick experiment would be to compare using the exact matrix to the FMM and reporting just the number of iterations in each case. Since the cost of using the exact matrix might be prohibitive due to its quadratic scaling in $N$, just a few cases with "small" values of $N$ (possibly up to $N=10,000$) would be sufficient. The point of such an experiment is to help the reader verify that there is no blow-up in the number of iterations due to approximation.

The acceleration over traditional BEM is twofold: (1) using an FMM-accelerated mat-vec, and (2) introducing relaxation in GMRES (a decreasing $p$ in the FMM). The first improvement is standard in fast-BEM applications (see Nishimura's review article on fast multipole accelerated boundary integral equation methods), where it is shown that an FMM-accelerated mat-vec does not blow up the number of iterations; that is why we did not compare with the exact mat-vec. Moreover, our main contribution is the further relaxation (gradually decreasing $p$) inside the GMRES solver. Therefore, in the results section, we measure speedups against FMM-accelerated BEM without relaxation, rather than plain BEM.

tingyu66 commented

comment # 10:

Section 3, Figure 6: How would the residual look if $p$ were fixed at its initial value?

The fixed-$p$ residual history is very close to the relaxed-$p$ curve (the solid line in Figure 6), since relaxation does not greatly affect the number of iterations to convergence. Tables 1 and 2 show that both cases take the same number of iterations to converge.


tingyu66 commented Jan 16, 2017

comment # 11:

Section 3.1, page 10: The paper discusses the relative imbalance of near- and far-field cost. It would be good to show that explicitly, the same way you do later, e.g., Figure 12.

We first clarify that the near-field and far-field loads are unbalanced within each iteration, but well balanced overall. We did not add a time-breakdown plot in Section 3.1 because the trend is very similar to Figure 12 in Section 3.2. Since we follow the same mechanism to reduce $p$ in every case, the far-field is always inflated in the first several iterations and shrinks in the later ones. We therefore show this trend only once, to avoid redundancy. For the same reason, we did not add a residual-history and required-$p$ plot (like Figure 6) to the Stokes-BEM section.


tingyu66 commented Jan 17, 2017

comment # 12:

Figure 11: In the relaxed scheme, what is the schedule for varying $p$? Is that given by the values listed for $p_{\mathrm{min}}$ in Table 5?

The schedule for varying $p$ comes from Eqs. (17)-(19) (see comment # 7 above). For the relaxed cases in a Stokes flow problem, there are two values of $p$ to distinguish: $p_{\mathrm{initial}}$ (the high $p$ used in the first iteration) and $p_{\mathrm{min}}$ (the minimum $p$ allowed in the relaxed solver). As the iterations proceed, the required $p$ for each iteration decreases from $p_{\mathrm{initial}}$ toward $p_{\mathrm{min}}$. Table 5 is a parameter study to determine $p_{\mathrm{min}}$. In Figure 11, we use $p_{\mathrm{initial}} = 16$ and $p_{\mathrm{min}} = 5$.
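
As a sketch, the schedule amounts to clamping the required order between those two values (names illustrative; the bound's constants are assumed as in comment # 7):

```python
import math

def schedule_p(eps_k, p_initial=16, p_min=5):
    # Required order from the (assumed) bound with worst-case a/r = 0.5,
    # clamped so p never exceeds p_initial nor drops below p_min.
    p_req = math.ceil(math.log(eps_k) / math.log(0.5)) - 1
    return min(p_initial, max(p_req, p_min))

print(schedule_p(1e-5))  # 16: first iteration, tight matvec
print(schedule_p(0.5))   # 5: near convergence, loosest allowed matvec
```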


tingyu66 commented Jan 17, 2017

comment # 13:

Figure 14: Why are the middle tolerances faster than the extremes?

First, this test aims to show that relaxation offers a decent speedup regardless of the user-defined tolerance. Second, Figure 14 does show that the speedup due to relaxation is greater at the middle tolerances than at the extremes. The effect of the desired tolerance $\eta$ on the speedup is indirect: $\eta$ first determines the allowed perturbation (Eq. (17)), which then affects the required $p$ at each iteration. The combination of these required $p$ values determines the optimal $n_{\mathrm{crit}}$, and with it the best possible runtime. The shape of this curve is therefore problem-dependent.


labarba commented Jan 28, 2017

comment # 14:

Figure 17: Like Figure 14, why do middle values of $N$ require more iterations?

The last "peak" number of iterations at N=8192 is 39. The two cases that follow with larger N indeed take fewer iterations, but not by much: 35 and 31, respectively. We don't know "why," but we don't think it is important. In practice, one should probably use tighter solver tolerances when refining the surface mesh. However, for presentation purposes, we generally want to change one parameter at a time.
