Optimize the code path in createFileset and optimize path. #6562

Open
yuqi1129 opened this issue Feb 27, 2025 · 2 comments
@yuqi1129
Contributor

yuqi1129 commented Feb 27, 2025

Code in HadoopCatalogOperations#createFileset

    try {
      // formalize the path to avoid path without scheme, uri, authority, etc.
      filesetPath = formalizePath(filesetPath, conf);

      FileSystem fs = getFileSystem(filesetPath, conf);
      if (!fs.exists(filesetPath)) {
        if (!fs.mkdirs(filesetPath)) {
          throw new RuntimeException(
              "Failed to create fileset " + ident + " location " + filesetPath);
        }

        LOG.info("Created fileset {} location {}", ident, filesetPath);
      } else {
        LOG.info("Fileset {} manages the existing location {}", ident, filesetPath);
      }

    } catch (IOException ioe) {
      throw new RuntimeException(
          "Failed to create fileset " + ident + " location " + filesetPath, ioe);
    }

  • filesetPath = formalizePath(filesetPath, conf);
  • FileSystem fs = getFileSystem(filesetPath, conf);

Each of these two lines gets and initializes the file system, so the same work is done twice. They can be merged into a single step.
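One way to picture the merge is a helper that looks up the file system once and returns it together with the qualified path. The sketch below is only an illustration and does not use the Hadoop API: `StubFileSystem`, `Resolved`, `resolve`, the `LOOKUPS` counter and the `hdfs://namenode:8020/` URI are all hypothetical stand-ins, and it assumes (as the description above suggests) that formalizing a path needs a file system lookup of its own.

```java
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;

final class FilesetPathSketch {
  // Counts how often the (potentially expensive) file system lookup runs.
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  // Hypothetical stand-in for a Hadoop FileSystem; only its default URI matters here.
  static final class StubFileSystem {
    final URI defaultUri;

    StubFileSystem(URI defaultUri) {
      this.defaultUri = defaultUri;
    }
  }

  // Holds the qualified path together with the file system that qualified it.
  static final class Resolved {
    final URI path;
    final StubFileSystem fs;

    Resolved(URI path, StubFileSystem fs) {
      this.path = path;
      this.fs = fs;
    }
  }

  // Stand-in for getFileSystem(path, conf); each call simulates one initialization.
  static StubFileSystem getFileSystem(URI path) {
    LOOKUPS.incrementAndGet();
    return new StubFileSystem(URI.create("hdfs://namenode:8020/"));
  }

  // Single lookup: qualify the path and return the same file system instance for reuse.
  static Resolved resolve(URI rawPath) {
    StubFileSystem fs = getFileSystem(rawPath);
    // A path that already has a scheme is kept; otherwise it is resolved against the default URI.
    URI qualified = rawPath.isAbsolute() ? rawPath : fs.defaultUri.resolve(rawPath);
    return new Resolved(qualified, fs);
  }
}
```

With a helper of this shape, the caller would take both the path and the file system from one `resolve` call instead of calling `getFileSystem` a second time.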

      AtomicReference<FileSystem> fileSystem = new AtomicReference<>();
      Awaitility.await()
          .atMost(timeoutSeconds, TimeUnit.SECONDS)
          .until(
              () -> {
                fileSystem.set(provider.getFileSystem(path, config));
                return true;
              });
      return fileSystem.get();

This code can be replaced with Java's Future mechanism, which returns as soon as the call completes and so avoids the time spent polling for status.
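A minimal sketch of the Future-based variant, using only `java.util.concurrent`. The class name `TimedCall` and the method `callWithTimeout` are hypothetical; in the real code the task would presumably be `() -> provider.getFileSystem(path, config)`, and the executor would likely be shared rather than created per call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedCall {
  // Runs the task on a worker thread and waits at most timeoutSeconds for its result.
  // Future.get blocks until the task finishes, so there is no poll interval to wait out.
  static <T> T callWithTimeout(Callable<T> task, long timeoutSeconds) {
    ExecutorService executor =
        Executors.newSingleThreadExecutor(
            runnable -> {
              Thread thread = new Thread(runnable, "timed-call");
              thread.setDaemon(true); // do not keep the JVM alive for a stuck task
              return thread;
            });
    Future<T> future = executor.submit(task);
    try {
      return future.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker if it is still running
      throw new RuntimeException("Timed out after " + timeoutSeconds + " seconds", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      future.cancel(true);
      throw new RuntimeException("Interrupted while waiting for the task", e);
    } catch (ExecutionException e) {
      // Surface the task's own failure as the cause.
      throw new RuntimeException("Task failed", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Unlike the polling version, no `AtomicReference` is needed, because the result travels back through the `Future` itself.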

There may be other minor points to improve.

@Abyss-lord
Contributor

I would like to work on it.

@yuqi1129
Contributor Author

OK, just go ahead.
