Referee reports # 1, CMAME Sep. 2016 (decision: reject) #19

labarba opened this issue Oct 19, 2016 · 13 comments

labarba commented Oct 19, 2016

Revised paper submitted to CMAME on 15 March 2016 — Decision: Reject

Editor comments

I regret to inform you that the reviewers of your manuscript have advised against publication, and I must therefore decline it. For your guidance, the reviewers' comments are included below.

Referee # 1 comments

Summary of the problem being solved. This paper considers boundary element method (BEM) formulations of certain problems, namely, Laplace problems and Stokes flow problems that arise in blood flow simulation. Computationally, this formulation requires solving a large dense linear system, most typically by Krylov subspace-based iterative methods, of which this paper considers GMRES. The usual performance bottleneck of a Krylov subspace solver is the matrix-vector product ("matvec"). In the target applications, the matrix is special: its entries are K(x, y) evaluated at pairs of points (x, y), where K(x, y) decays rapidly with the distance between x and y. In this case, one may implement fast but approximate matrix-vector products. A classical approach for the target problems is the fast multipole method (FMM) of Greengard & Rokhlin (mid-80s). The FMM lets one trade off performance and accuracy via a parameter, the so-called degree of the multipole expansion, denoted by $p$ and roughly corresponding to the order of truncation in a series expansion. (Higher values of $p$ are more accurate but slower.)

With this background, the specific problem this paper tackles is to study the interaction between $p$ and the outer GMRES solver.

Technical contribution. The paper develops an integrated GMRES+FMM method for its target problems that exploits a clever and nontrivial observation from the theory of inexact Krylov methods: as the iterations proceed, the matvec can actually become progressively less accurate without harming the convergence rate of the solver. The idea of the paper is to decrease $p$ as the iterations proceed.
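
In rough pseudocode, the scheme as I understand it is the following (a minimal sketch using my own helper names, `fmm_matvec` and `required_order`, and an assumed classical form of the truncation bound; this is not the authors' code):

```python
import math
import numpy as np

def required_order(eps, ratio=0.5):
    # Invert an assumed classical truncation bound, error ~ ratio**(p + 1),
    # for p; the paper's actual bound may carry extra constants.
    return max(1, math.ceil(math.log(eps) / math.log(ratio)) - 1)

def relaxed_gmres(fmm_matvec, b, eta=1e-5, p_min=5, max_iter=50):
    """GMRES with a relaxed (decreasing) FMM order p.

    fmm_matvec(x, p): hypothetical FMM-accelerated matvec at expansion
    order p. eta is the desired relative-residual tolerance.
    """
    n = b.size
    b_norm = np.linalg.norm(b)
    Q = np.zeros((n, max_iter + 1))      # Arnoldi basis
    H = np.zeros((max_iter + 1, max_iter))
    Q[:, 0] = b / b_norm
    r_norm = b_norm
    for k in range(max_iter):
        # Inexact-Krylov relaxation: as the residual shrinks, a looser
        # (cheaper, lower-p) matvec suffices.
        eps_k = min(eta * b_norm / r_norm, 1.0)
        p_k = max(required_order(eps_k), p_min)
        w = fmm_matvec(Q[:, k], p_k)
        for j in range(k + 1):           # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
        # Residual of the small least-squares problem
        e1 = np.zeros(k + 2)
        e1[0] = b_norm
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        r_norm = np.linalg.norm(H[:k + 2, :k + 1] @ y - e1)
        if r_norm <= eta * b_norm:
            break
    return Q[:, :k + 1] @ y
```

The key line is the computation of `eps_k`: the allowed matvec perturbation is inversely proportional to the current residual norm, so $p$ falls as the solve progresses.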

From this idea, the paper studies several model problems to show that it can work and characterizes the time and accuracy tradeoffs, with some interesting FMM-specific observations along the way.

Summary evaluation and recommendation. Though there is a lot of "setup," the idea is both clever and simple, meaning it is likely to work and have an impact. The results are a good first "proof-of-concept" of the feasibility of the proposed technique. The suggested edits that appear below are all minor. As such, I recommend the paper be accepted with just minor revisions.

One major strength of the paper, beyond the idea and its demonstration, is the effort the authors made toward making the results reproducible, including an open-source code repository with sufficient infrastructure to recompute the results and even make the plots. Cool!

  • [ ] 1. One major weakness to me is that the procedure (or schedule) for relaxing $p$ is not described with enough precision. Section 2.6 sketches a procedure with reference to equations (17)-(19), but I did not find this description sufficiently algorithmic to code up myself or to use to interpret what is happening in the results, e.g., in Figure 6. (I suppose the code that accompanies the paper is that algorithmic description. However, I think the paper should also summarize it in a way that does not require referring to the code.)

Detailed comments for the authors. My remaining issues, detailed below, are fairly minor and mostly a matter of editing or questions to consider clarifying in a revision.

  • 2. Section 2.5, near equation (14): $y_c$ should be defined explicitly.
  • 3. Below equation (14), "… resulting in an octree structure." I assume this octree is adaptive, since it only needs to represent points at the surface of the objects? (Just saying "adaptive octree" would be sufficient here.)
  • 4. Section 2.6, equation (17): Is it really the case that $\epsilon_k$ can be as large as 1? In that case, the perturbation is as large as the norm of the operator, which seems surprisingly high.
  • 5. Section 2.6, below equation (16): There is a space missing in the text, "… where $a$ isthe cluster radius …"
  • 6. Various places: I think there is some minor inconsistency in the notation for the number of Gauss points. It is introduced as $K$ (capital-K) but appears in the appendix as $N_k$ and in some figures and results as $k$ (lowercase-k).
  • 7. Section 2.6: As noted in the summary above, the precise procedure for modifying $p$ with respect to equations (17)-(19) is not clear enough to this reviewer.
  • 8. Section 3, Figure 3: The claim of linear scaling in Figure 3 seems unnecessarily qualitative. Rather than using the "eye-norm," why not just fit $N^r$ to the points and report the exponent $r$?
  • 9. Section 3 and 3.1: The results focus immediately on the case of using the FMM to approximate the matrix. For completeness, another quick experiment would be to compare using the exact matrix to the FMM and reporting just the number of iterations in each case. Since the cost of using the exact matrix might be prohibitive due to its quadratic scaling in $N$, just a few cases with "small" values of $N$ (possibly up to $N=10,000$) would be sufficient. The point of such an experiment is to help the reader verify that there is no blow-up in the number of iterations due to approximation.
  • 10. Section 3, Figure 6: How would the residual look if $p$ were fixed at its initial value?
  • 11. Section 3.1, page 10: The paper discusses the relative imbalance of near- and far-field cost. It would be good to show that explicitly, the same way you do later, e.g., Figure 12.
  • 12. Figure 11: In the relaxed scheme, what is the schedule for varying $p$? Is that given by the values listed for $p_{\mathrm{min}}$ in Table 5?
  • 13. Figure 14: Why are the middle tolerances faster than the extremes?
  • 14. Figure 17: Like Figure 14, why do middle values of $N$ require more iterations?

tingyu66 commented Dec 6, 2016

comment # 2:
fixed by commit ff3ace7


tingyu66 commented Dec 6, 2016

comment # 3:
fixed by commit d90a0c8


tingyu66 commented Dec 6, 2016

comment # 4:

Section 2.6, equation (17): Is it really the case that $\epsilon_k$ can be as large as 1? In that case, the perturbation is as large as the norm of the operator, which seems surprisingly high.

Yes: the last iteration before convergence can make $\epsilon_k$ as large as 1. At that point the residual $r_k$ is smaller than the desired tolerance $\eta$, so the first argument of the min function in Eq. (17) exceeds 1 and the min returns 1. That is, the allowed perturbation reaches its maximum ($\epsilon_k = 1$) at convergence.
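
A numerical illustration of this saturation (a sketch only: the actual Eq. (17) may include an extra constant factor, and `allowed_perturbation` is an illustrative name):

```python
def allowed_perturbation(r_prev, eta):
    """Sketch of the Eq. (17)-style bound: the allowed matvec perturbation
    grows as the residual r_prev shrinks, capped at 1 by the min."""
    return min(eta / r_prev, 1.0)

# Mid-iteration: residual well above the tolerance -> small perturbation
print(allowed_perturbation(r_prev=1e-2, eta=1e-5))  # 0.001
# Near convergence: residual below the tolerance -> min saturates at 1
print(allowed_perturbation(r_prev=5e-6, eta=1e-5))  # 1.0
```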


tingyu66 commented Dec 6, 2016

comment # 5:
fixed by commit f39f844

tingyu66 commented

comment # 6:
fixed by commit b1b2ed4


tingyu66 commented Jan 13, 2017

comment # 7:

Section 2.6: As noted in the summary above, the precise procedure for modifying $p$ with respect to equations (17)-(19) is not clear enough to this reviewer.

The FMM error bound in Eq. (18) is a function of the order of expansion $p$. Since $\theta_{\mathrm{MAC}} = 0.5$, we replace $a/r$ with 0.5 (the worst case) in Eq. (18). By equating the FMM error bound with the allowed perturbation $\epsilon_k$ at the $k$-th iteration in Eq. (17), we obtain the relationship between the required $p_k$ and $\epsilon_k$, shown in Eq. (19). $\epsilon_k$ itself is computed from Eq. (17), given the residual at the previous iteration, $r_{k-1}$, and the desired tolerance $\eta$.
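
Concretely, and with the caveat that Eq. (18) includes constants omitted here, the inversion looks like this sketch (`required_p` is an illustrative name):

```python
import math

def required_p(eps_k, ratio=0.5):
    """Invert an assumed classical truncation bound, error ~ ratio**(p + 1),
    for the smallest integer p meeting the allowed perturbation eps_k.
    With theta_MAC = 0.5 the worst-case a/r is 0.5, so each added order
    buys roughly one bit of accuracy."""
    return max(1, math.ceil(math.log(eps_k) / math.log(ratio)) - 1)

print(required_p(1e-4))  # 13: a mid-solve perturbation needs a moderate p
print(required_p(1.0))   # 1: at convergence almost any p suffices
```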


tingyu66 commented Jan 13, 2017

comment # 8:

Section 3, Figure 3: The claim of linear scaling in Figure 3 seems unnecessarily qualitative. Rather than using the "eye-norm," which not just fit $N^r$ to the points and report the exponent $r$?

I agree it is always better to have both qualitative and quantitative results. The fitted exponent (the slope of a least-squares fit in this log-log plot) is 1.06, confirming near-linear scaling with respect to $N$.
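
For reference, the fit is a one-line least-squares in log-log space; the data below are placeholders, and the measured timings behind Figure 3 live in the paper's repository:

```python
import numpy as np

# Placeholder (N, time) pairs; see the plotting scripts in the repository
# for the actual values.
N = np.array([2048.0, 8192.0, 32768.0, 131072.0])
t = np.array([0.7, 3.1, 13.5, 58.0])

# Fit t ~ C * N**r: the slope of the log-log linear fit is the exponent r
r, logC = np.polyfit(np.log(N), np.log(t), 1)
print(f"fitted exponent r = {r:.2f}")
```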

tingyu66 commented

comment # 9:

Section 3 and 3.1: The results focus immediately on the case of using the FMM to approximate the matrix. For completeness, another quick experiment would be to compare using the exact matrix to the FMM and reporting just the number of iterations in each case. Since the cost of using the exact matrix might be prohibitive due to its quadratic scaling in $N$, just a few cases with "small" values of $N$ (possibly up to $N=10,000$) would be sufficient. The point of such an experiment is to help the reader verify that there is no blow-up in the number of iterations due to approximation.

The acceleration over traditional BEM is twofold: (1) using an FMM-accelerated mat-vec, and (2) introducing relaxation in GMRES (a decreasing $p$ in the FMM). The first improvement is standard in fast-BEM applications (see Nishimura's review article on fast multipole accelerated boundary integral equation methods), where it is shown that an FMM-accelerated mat-vec does not blow up the number of iterations; that is why we did not compare with the exact mat-vec. Moreover, our main contribution is the further relaxation (gradually decreasing $p$) inside the GMRES solver. Therefore, in the results section, we measure speedups against FMM-accelerated BEM without relaxation, rather than plain BEM.

tingyu66 commented

comment # 10:

Section 3, Figure 6: How would the residual look if $p$ were fixed at its initial value?

The fixed-$p$ residual history is very close to the relaxed-$p$ curve (the solid line in Figure 6), since relaxation does not greatly affect the number of iterations to convergence. Tables 1 and 2 show that both cases take the same number of iterations to converge.


tingyu66 commented Jan 16, 2017

comment # 11:

Section 3.1, page 10: The paper discusses the relative imbalance of near- and far-field cost. It would be good to show that explicitly, the same way you do later, e.g., Figure 12.

We first clarify that the near-field and far-field loads are unbalanced within each iteration, but well balanced overall. We did not add a time-breakdown plot in Section 3.1 because the trend is very similar to Figure 12 in Section 3.2. Since we follow the same mechanism to reduce $p$ in every case, the far-field is always inflated in the first several iterations and shrinks in the later ones. We therefore show this trend only once, to avoid redundancy. For the same reason, we did not add a residual-history and required-$p$ plot (like Figure 6) to the Stokes-BEM section.


tingyu66 commented Jan 17, 2017

comment # 12:

Figure 11: In the relaxed scheme, what is the schedule for varying $p$? Is that given by the values listed for $p_{\mathrm{min}}$ in Table 5?

The schedule for varying $p$ comes from Eqs. (17)-(19) (see comment # 7 above). For the relaxed cases in a Stokes flow problem, there are two values of $p$ to distinguish: $p_{\mathrm{initial}}$ (the high $p$ used in the first iteration) and $p_{\mathrm{min}}$ (the minimum $p$ allowed in the relaxed solver). As the iterations proceed, the required $p$ for each iteration decreases from $p_{\mathrm{initial}}$ toward $p_{\mathrm{min}}$. Table 5 is a parameter study to determine $p_{\mathrm{min}}$. In Figure 11, we use $p_{\mathrm{initial}} = 16$ and $p_{\mathrm{min}} = 5$.
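
As a sketch, the schedule amounts to clamping the required order between those two values (names illustrative; the bound's constants are assumed as in comment # 7):

```python
import math

def schedule_p(eps_k, p_initial=16, p_min=5):
    # Required order from the (assumed) bound with worst-case a/r = 0.5,
    # clamped so p never exceeds p_initial nor drops below p_min.
    p_req = math.ceil(math.log(eps_k) / math.log(0.5)) - 1
    return min(p_initial, max(p_req, p_min))

print(schedule_p(1e-5))  # 16: first iteration, tight matvec
print(schedule_p(0.5))   # 5: near convergence, loosest allowed matvec
```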


tingyu66 commented Jan 17, 2017

comment # 13:

Figure 14: Why are the middle tolerances faster than the extremes?

First, this test aims to show that relaxation offers a decent speedup regardless of the user-defined tolerance. Second, Figure 14 does show that the speedup due to relaxation is greater at the middle tolerances than at the extremes. The effect of the desired tolerance $\eta$ on the speedup is indirect: $\eta$ first determines the allowed perturbation (Eq. (17)), which then affects the required $p$ at each iteration. The combination of these required $p$ values determines the optimal $n_{\mathrm{crit}}$, and with it the best possible runtime. The shape of this curve is therefore problem-dependent.


labarba commented Jan 28, 2017

comment # 14:

Figure 17: Like Figure 14, why do middle values of $N$ require more iterations?

The last "peak" number of iterations at N=8192 is 39. The two cases that follow with larger N indeed take fewer iterations, but not by much: 35 and 31, respectively. We don't know "why," but we don't think it is important. In practice, one should probably use tighter solver tolerances when refining the surface mesh. However, for presentation purposes, we generally want to change one parameter at a time.
