@ijokarumawak
Member

ijokarumawak commented May 15, 2018

This PR adds ProxyConfigurationService support to the following processors. Due to restrictions in the underlying libraries, SOCKS and SOCKS+Auth are only partially supported.

| Category | Processors | HTTP | HTTP+Auth | SOCKS | SOCKS+Auth |
|---|---|---|---|---|---|
| FTP | FetchFTP, GetFTP, ListFTP, PutFTP | OK | OK | OK | N/A |
| SFTP (jsch) | FetchSFTP, GetSFTP, ListSFTP, PutSFTP | OK | OK | OK | OK |
| HTTP (HttpClient) | GetHTTP, PostHTTP | OK | OK | N/A | N/A |
| HTTP (OkHttp) | InvokeHTTP | OK | OK | OK | N/A |
| ES (OkHttp) | PutElasticsearchHttp, QueryElasticsearchHttp, FetchElasticsearchHttp, PutElasticsearchHttpRecord | OK | OK | OK | N/A |
| AWS | PutS3Object, ListS3, FetchS3Object, DeleteS3Object, PutKinesisFirehose, PutKinesisStream, PutLambda, PutDynamoDB, DeleteDynamoDB, GetDynamoDB | OK | OK | N/A | N/A |
| Azure | FetchAzureBlobStorage, ListAzureBlobStorage, PutAzureBlobStorage, DeleteAzureBlobStorage, GetAzureQueueStorage, PutAzureQueueStorage | OK | N/A | OK | N/A |

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@MikeThomsen
Contributor

@ijokarumawak I haven't had a chance to take a look at this yet, but have you tried it against Solr and Elastic? I think the latter's APIs do their own proxy management, so that might need a little finessing here.

@jugi92
Contributor

jugi92 commented May 15, 2018

It would be very nice if the initial proxy service also included a SOCKS proxy example. Other processors that adopt the proxy service could then reuse the existing implementation more easily. For example, we would probably implement that change for the SFTP processors.

@ottobackwards
Contributor

How will this work with the AWS components? They have proxy support as well (although there is a PR for full support), but I think they use a different builder.

@ijokarumawak
Member Author

@ottobackwards I assume you are referring to #2016. That PR adds a username and password for proxy authentication to the abstract AWS processor. This PR adds ProxyConfigurationService, which can be layered on top of #2016 so that proxy configuration for the AWS processors is managed by a centralized Controller Service. Please look at the FTP and HTTP processors in this PR; the AWS ones can adopt the CS in the same way.

@ijokarumawak
Member Author

@jugi92 FTPTransfer supports SOCKS proxy. Specifically at these lines:

         if (proxyType == Proxy.Type.HTTP) {
-            client = new FTPHTTPClient(proxyHost, proxyPort, ctx.getProperty(HTTP_PROXY_USERNAME).getValue(), ctx.getProperty(HTTP_PROXY_PASSWORD).getValue());
+            client = new FTPHTTPClient(proxyHost, proxyPort, proxyConfig.getProxyUserName(), proxyConfig.getProxyUserPassword());
         } else {
             client = new FTPClient();
             if (proxyType == Proxy.Type.SOCKS) {
                 client.setSocketFactory(new SocksProxySocketFactory(new Proxy(proxyType, new InetSocketAddress(proxyHost, proxyPort))));
             }
         }

https://github.com/apache/nifi/pull/2704/files#diff-6e7e715d42f332cbe404edd9afbcaafaL533
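For readers unfamiliar with how a SOCKS proxy is attached through a socket factory, here is a minimal sketch using only JDK classes. `SocksSocketFactorySketch` is a hypothetical stand-in written for this illustration, not NiFi's `SocksProxySocketFactory`, and the proxy host and port in `main` are made up.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import javax.net.SocketFactory;

// Hypothetical illustration only: every socket this factory hands out is bound to the given proxy.
final class SocksSocketFactorySketch extends SocketFactory {

    private final Proxy proxy;

    SocksSocketFactorySketch(Proxy proxy) {
        this.proxy = proxy;
    }

    // Unconnected socket that will tunnel through the proxy once connect() is called.
    @Override
    public Socket createSocket() {
        return new Socket(proxy);
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = createSocket();
        socket.connect(new InetSocketAddress(host, port));
        return socket;
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket socket = createSocket();
        socket.connect(new InetSocketAddress(host, port));
        return socket;
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort) throws IOException {
        Socket socket = createSocket();
        socket.bind(new InetSocketAddress(localHost, localPort));
        socket.connect(new InetSocketAddress(host, port));
        return socket;
    }

    @Override
    public Socket createSocket(InetAddress host, int port, InetAddress localHost, int localPort) throws IOException {
        Socket socket = createSocket();
        socket.bind(new InetSocketAddress(localHost, localPort));
        socket.connect(new InetSocketAddress(host, port));
        return socket;
    }

    public static void main(String[] args) {
        // Made-up proxy address; left unresolved so no DNS lookup happens here.
        Proxy proxy = new Proxy(Proxy.Type.SOCKS, InetSocketAddress.createUnresolved("proxy.example.com", 1080));
        Socket socket = new SocksSocketFactorySketch(proxy).createSocket();
        System.out.println("connected yet? " + socket.isConnected());
    }
}
```

A factory like this is handed to the client once, so the rest of the transfer code never needs to know whether a proxy is in use.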

For processors that don't support SOCKS proxies, the following validation code should be added to their customValidate method to confirm that ProxyConfigurationService is configured with a supported proxy type:

ProxyConfiguration.validateProxyType(validationContext, results, Proxy.Type.HTTP);

ProxyConfigurationService just holds the centralized proxy settings; each processor is responsible for applying them in whatever way its underlying SDK/API requires.
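To illustrate the kind of check involved, here is a minimal sketch in plain Java. `ProxyTypeCheck` is a hypothetical helper written for this example; it is not the body of `ProxyConfiguration.validateProxyType`, which also takes a validation context and a result collection. Treating `DIRECT` (no proxy) as always acceptable is an assumption of the sketch.

```java
import java.net.Proxy;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a proxy type check; not NiFi code.
final class ProxyTypeCheck {

    /** Returns an error message when the configured type is neither DIRECT nor one of the supported types. */
    static Optional<String> validate(Proxy.Type configured, Proxy.Type... supported) {
        // Assumption: running without a proxy is always valid.
        Set<Proxy.Type> allowed = EnumSet.of(Proxy.Type.DIRECT);
        allowed.addAll(Arrays.asList(supported));
        if (allowed.contains(configured)) {
            return Optional.empty();
        }
        return Optional.of("Proxy type " + configured + " is not supported; allowed types: " + allowed);
    }

    public static void main(String[] args) {
        // A processor that only supports HTTP proxies.
        System.out.println(validate(Proxy.Type.HTTP, Proxy.Type.HTTP));
        System.out.println(validate(Proxy.Type.SOCKS, Proxy.Type.HTTP));
    }
}
```

A processor would pass the types its underlying library can handle, and any returned message would be reported as a validation failure.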

I checked #2018, but that PR doesn't look active. I will take a closer look at the SFTP processors and #2018 to see if I can include the SFTP ones in this PR, too.

@ijokarumawak
Member Author

@MikeThomsen We can integrate ProxyConfigurationService with ES or Solr; the CS just lets users manage proxy settings in one centralized place. I will take a look at #2094 to see how I can help review that one. Thanks.

@ijokarumawak
Member Author

Now this PR includes SFTP processors and SOCKS proxy support for SFTP as well.

@ijokarumawak
Member Author

Elasticsearch processors are also included in this PR now.
I'm looking into the AWS and Azure processors now, but those can be done separately.

break;
case SOCKS:
final ProxySOCKS5 proxySOCKS5 = new ProxySOCKS5(proxyConfig.getProxyServerHost(), proxyConfig.getProxyServerPort());
session.setProxy(proxySOCKS5);
Contributor

Can we add:

if (proxyConfig.hasCredential()) {
    proxySOCKS5.setUserPasswd(proxyConfig.getProxyUserName(), proxyConfig.getProxyUserPassword());
}

Member Author

@ijokarumawak May 17, 2018

@jugi92 Thanks! I actually didn't know the SOCKS protocol supports user authentication. It seems only the SFTP processors (thanks to the underlying jsch) support it. I've added your suggestion and confirmed it against a Dante SOCKS server with authentication enabled.
https://gist.github.com/ijokarumawak/b3a31378bdc0a6c6b9922a138e9ec9c1

I will update this PR shortly.

@ottobackwards
Contributor

@ijokarumawak I'm talking about passing around an HttpClientBuilder when not everyone uses that.

@ijokarumawak
Member Author

ijokarumawak commented May 17, 2018

@ottobackwards Are you talking about this code specifically?

HTTPUtils.setProxy(context, clientBuilder, credentialsProvider);

Then yes, the above utility method accepts an HttpClientBuilder and is useful only for processors that use the HttpClient library. It's currently used only by GetHTTP and PostHTTP; for now it's just a convenience method for those two.

Other processors, which don't use HttpClient, use ProxyConfiguration directly to get proxy settings. The following snippet is copied from AbstractAWSProcessor:

// Get the proxy configuration from ProxyConfigurationService if one is configured,
// otherwise from the processor's own proxy properties. Either way the settings end up
// in the `proxyConfig` instance, so subsequent code does not need to care where they came from.
final ProxyConfiguration proxyConfig = ProxyConfiguration.getConfiguration(context, () -> {
    if (context.getProperty(PROXY_HOST).isSet()) {
        final ProxyConfiguration componentProxyConfig = new ProxyConfiguration();
        String proxyHost = context.getProperty(PROXY_HOST).evaluateAttributeExpressions().getValue();
        Integer proxyPort = context.getProperty(PROXY_HOST_PORT).evaluateAttributeExpressions().asInteger();
        String proxyUsername = context.getProperty(PROXY_USERNAME).evaluateAttributeExpressions().getValue();
        String proxyPassword = context.getProperty(PROXY_PASSWORD).evaluateAttributeExpressions().getValue();
        componentProxyConfig.setProxyType(Proxy.Type.HTTP);
        componentProxyConfig.setProxyServerHost(proxyHost);
        componentProxyConfig.setProxyServerPort(proxyPort);
        componentProxyConfig.setProxyUserName(proxyUsername);
        componentProxyConfig.setProxyUserPassword(proxyPassword);
        return componentProxyConfig;
    }
    return ProxyConfiguration.DIRECT_CONFIGURATION;
});

// Apply Proxy settings to underlying SDK/API.
if (Proxy.Type.HTTP.equals(proxyConfig.getProxyType())) {
    config.setProxyHost(proxyConfig.getProxyServerHost());
    config.setProxyPort(proxyConfig.getProxyServerPort());

    if (proxyConfig.hasCredential()) {
        config.setProxyUsername(proxyConfig.getProxyUserName());
        config.setProxyPassword(proxyConfig.getProxyUserPassword());
    }
}

Does that answer your question?
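The fallback pattern in that snippet can be reduced to a small, library-free sketch: prefer the settings from the shared service, and only ask the component for its own when no service is configured. The `Settings` class and `resolve` method below are hypothetical simplifications for illustration, not NiFi's `ProxyConfiguration` API.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical, simplified version of the "service first, component second" lookup.
final class ProxyFallbackSketch {

    /** Minimal settings holder; not the NiFi ProxyConfiguration class. */
    static final class Settings {
        /** Represents "no proxy". */
        static final Settings DIRECT = new Settings(null, 0);

        final String host;
        final int port;

        Settings(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Uses the service's settings when present; otherwise invokes the component's supplier. */
    static Settings resolve(Optional<Settings> fromService, Supplier<Settings> fromComponent) {
        return fromService.orElseGet(fromComponent);
    }

    public static void main(String[] args) {
        Settings serviceSettings = new Settings("service-proxy.example.com", 3128);

        // Service configured: the supplier is never called.
        Settings a = resolve(Optional.of(serviceSettings), () -> Settings.DIRECT);
        System.out.println(a.host);

        // No service: the component decides, here falling back to a direct connection.
        Settings b = resolve(Optional.empty(), () -> Settings.DIRECT);
        System.out.println(b == Settings.DIRECT);
    }
}
```

Passing a supplier rather than a value means the component's own properties are only evaluated when they are actually needed.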

@ijokarumawak
Member Author

Now this PR also includes AWS-related processors. I've tested that the following processors can use HTTP forward proxies and support proxy authentication:

  • PutS3Object
  • ListS3
  • FetchS3Object
  • DeleteS3Object
  • PutKinesisFirehose
  • PutKinesisStream
  • PutLambda
  • PutDynamoDB
  • DeleteDynamoDB
  • GetDynamoDB

@ijokarumawak
Member Author

I've summarized the current capabilities in this PR's description. Please check the table. We could keep expanding the list of processors, but I'd like to stop here and finish reviewing these processors as the first phase.

ijokarumawak and others added 6 commits May 17, 2018 18:25
- Added ProxyConfigurationService to manage centralized proxy
configurations
- Adopt ProxyConfigurationService at FTP and HTTP processors
- Fixed check style issue
- Use the same proxy related PropertyDescriptors from FTPTransfer and
SFTPTransfer
- Dropped FlowFile EL evaluation support to make it align with other
processors spec, Now it supports VARIABLE_REGISTRY
- Added ProxyConfigurationService to SFTP processors
- Added SOCKS proxy support to SFTP processors
…ssors

- ElasticsearchHttp processors now support SOCKS proxy, too
- Added proxy support to PutElasticsearchHttpRecord
- Moved more common property descriptors to
AbstractElasticsearchHttpProcessor and just return static unmodifiable
property descriptor list at each implementation processors
NIFI-4196 - Fix jUnit errors

This closes apache#2016.

Signed-off-by: Koji Kawamura <[email protected]>
- Applied ProxyConfigService to S3 processors
- Added proxy support to following processors:
  - PutKinesisFirehose, PutKinesisStream
  - PutDynamoDB, DeleteDynamoDB, GetDynamoDB
  - PutKinesisStream
- All AWS processors support HTTP proxy now

@jvwing
Contributor

jvwing left a comment

@ijokarumawak Thanks for this PR, this is a much needed approach to Proxy configuration.

I have reviewed your changes with respect to the AWS processors only. The code looks good, I recommend only a few minor tweaks. I tested a flow with some S3 processors using the StandardProxyConfigurationService, the separate AWS processor PROXY_HOST properties, and no proxy. Everything worked fine in my tests.

@@ -311,5 +312,10 @@ public void testGetPropertyDescriptors() throws Exception {
assertTrue(pd.contains(ListS3.PREFIX));
assertTrue(pd.contains(ListS3.USE_VERSIONS));
assertTrue(pd.contains(ListS3.MIN_AGE));
assertTrue(pd.contains(ProxyConfigurationService.PROXY_CONFIGURATION_SERVICE));
assertTrue(pd.contains(ListS3.PROXY_HOST));
Contributor

Minor tweak: The check for PROXY_HOST and PROXY_HOST_PORT duplicates checks above on lines 309-310. I believe this is why we add 5 lines of new assertions, but the count of property descriptors only goes up by 3 from 17 to 20. It doesn't make any difference, really, but the math was bothering me.

Member Author

@jvwing Good catch, thanks. I removed the duplicated PROXY_HOST and PROXY_HOST_PORT. The missing one was LIST_TYPE. So, 17 + 3 = 20. 3 additions are PROXY_USER, PROXY_PASS and LIST_TYPE.

@@ -92,6 +94,23 @@
.addValidator(StandardValidators.PORT_VALIDATOR)
.build();

public static final PropertyDescriptor PROXY_USERNAME = new PropertyDescriptor.Builder()
.name("Proxy Username")
Contributor

I recommend a separate name vs displayName for PROXY_USERNAME.

Member Author

I will update it.

.build();

public static final PropertyDescriptor PROXY_PASSWORD = new PropertyDescriptor.Builder()
.name("Proxy Password")
Contributor

I recommend a separate name vs displayName for PROXY_PASSWORD.

Member Author

I will update it.

- Each processor has different supporting Proxy specs
- Show supported spec to ProxyConfigurationService property doc
- Validate not only Proxy type, but also with Authentication
- Fixed TestListS3 property descriptor check
- Separate name and displayName
@ijokarumawak
Member Author

Added 3 more commits.

  1. Added proxy support to Azure processors.
  2. Added a more explicit proxy spec check and documentation. Due to restrictions in the underlying libraries, proxy support varies by processor. Based on the investigation summarized in this PR's description, I've used 4 labels to represent the spec: HTTP, HTTP_AUTH, SOCKS and SOCKS_AUTH.
  3. Incorporated review comments.
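
As an illustration of how the four labels could drive validation, here is a minimal sketch. The `Spec` enum and both methods are hypothetical names chosen for this example and may not match the merged code; the supported set used in `main` for InvokeHTTP is taken from the table in the PR description.

```java
import java.net.Proxy;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a per-processor proxy spec check; not NiFi code.
final class ProxySpecSketch {

    enum Spec { HTTP, HTTP_AUTH, SOCKS, SOCKS_AUTH }

    /** Maps a proxy type plus "credentials present" to the label a processor must support. */
    static Spec required(Proxy.Type type, boolean hasCredential) {
        switch (type) {
            case HTTP:
                return hasCredential ? Spec.HTTP_AUTH : Spec.HTTP;
            case SOCKS:
                return hasCredential ? Spec.SOCKS_AUTH : Spec.SOCKS;
            default:
                throw new IllegalArgumentException("DIRECT needs no proxy spec");
        }
    }

    /** A direct connection is always fine; otherwise the required label must be in the supported set. */
    static boolean isSupported(Set<Spec> supported, Proxy.Type type, boolean hasCredential) {
        return type == Proxy.Type.DIRECT || supported.contains(required(type, hasCredential));
    }

    public static void main(String[] args) {
        // Per the table in the description, InvokeHTTP supports everything except SOCKS with auth.
        Set<Spec> invokeHttp = EnumSet.of(Spec.HTTP, Spec.HTTP_AUTH, Spec.SOCKS);
        System.out.println(isSupported(invokeHttp, Proxy.Type.SOCKS, false));
        System.out.println(isSupported(invokeHttp, Proxy.Type.SOCKS, true));
    }
}
```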

Example screenshots:
InvokeHTTP does not support SOCKS_AUTH, so if ProxyConfigurationService is configured with SOCKS and username/password, then it becomes invalid, but SOCKS without auth can be used:
image

PostHTTP does not support SOCKS at all:
image

In addition to validation, the property description shows which proxy specs are supported:
image

SFTP processors are the only ones supporting all Proxy specs:
image

@ijokarumawak
Member Author

I'll really stop updating this PR now. No more additions from my side. Let's wrap this up. Thanks for reviewing!

@ottobackwards
Contributor

Should the tests for InvokeHTTP be updated to test with the changes?

@MikeThomsen
Contributor

@ijokarumawak I'm going to start reviewing this. Once we get this done, I could use a hand with a review on this lookup service I wrote which I'm partly holding back so I can do its proxy support via your changes here.

@MikeThomsen
Contributor

MikeThomsen left a comment

+1 LGTM. 2/3 builds pass so everything looks in order. As mentioned, this ticket may require ongoing refactoring of additional components, so since LGTM now we'll merge so we can keep scope creep down.

@MikeThomsen
Contributor

@trixpan @ijokarumawak There are two other tickets referenced, 4196 and 4175(?) in the commit list for this PR. Before I keep squashing, I want to confirm that you want me to keep going and put 3 "This closes #ABCD" statements in there to close this, 4196 and 4175.

@asfgit closed this in 2834fa4 on May 20, 2018

@ijokarumawak
Member Author

@MikeThomsen Thanks for merging this. Although my original intent was to keep the commits made by @trixpan separate (not squashed) to retain his credit, the result looks good to me because the original PRs 4196 and 4175 are closed as I expected. @trixpan Thanks again for originating these improvements!

@ottobackwards
Contributor

I found a bug in the AWS implementation in this PR. I am not sure how you would see it in the other processors; I found it when bringing this code into my Gateway API PR.

The issue is that customValidate validates that both host and port need to be set, but not that both user and password need to be set.

Since I test for this (from the InvokeHttp testProxy), my test fails.
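
A minimal sketch of the paired check being described, in plain Java. `PairedProxyPropertyCheck` and its messages are hypothetical and written only to illustrate the idea; this is not the fix itself, and a real `customValidate` would read the values from the validation context and return validation results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "both or neither" validation for proxy properties.
final class PairedProxyPropertyCheck {

    /** Returns one message per pair that is only half configured; null means "not set". */
    static List<String> validate(String host, Integer port, String user, String password) {
        List<String> problems = new ArrayList<>();
        // Host and port must be set together (the check that already existed).
        if ((host == null) != (port == null)) {
            problems.add("Proxy host and proxy port must be set together");
        }
        // User and password must be set together (the check that was missing).
        if ((user == null) != (password == null)) {
            problems.add("Proxy username and proxy password must be set together");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Username without a password: reported as a problem.
        System.out.println(validate("proxy.example.com", 3128, "user", null));
    }
}
```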

@ottobackwards
Contributor

Once I prove out my fix and update my PR, I guess I'll open a PR against master with that fix?

@ottobackwards
Contributor

#2727
