SOLR-14673: Add bin/solr stream CLI #2479
… to modern SolrCLI
So much easier without having to add more bash scripts now!
…eters for headers, delimiters etc...
A few high-level questions/concerns:
|
Thanks for sharing the feedback @gerlowskija! I think the value of the tool is only there if your second comment about being able to run a streaming expression locally is valid, and then having it do what your first comment highlights falls out easily; otherwise it really is a thin wrapper/duplication of the

I do believe the second part is the really cool thing: that I can run a streaming expression locally and use it to process some data. We clearly need some way of specifying where the processing is happening, in the cluster or locally. I was trying to think if we have any other places in Solr where we define "Where am I doing work" that might provide a name for a parameter. Reading through docs more, we have the

I have found that lots of streaming expressions don't require a Solr connection, especially during development. I'm just iterating on the logic, and I'm starting and ending with tuples... it's only later when I get the mappings etc. working that I then move to adding in my

Also, as far as docs go, we have a LONG way to go in Streaming expressions. It's both the best documented code, with all the howtos and guides, but also, I find a million expressions that exist but don't show up in our reference docs ;-). |
I went with the plural name --workers solr, and then you pass in a collection. However, I could imagine that this becomes --workers my_collection,films,worker_collection on your local solr... Not quite sure what passing more than one in means, however...
Workers is a term that is used in parallelization on Solr itself, so don't want to compete...
…add docs for that.
Okay, I think this is ready for review! I've added some docs.. I especially liked being able to cat some local data right into a Solr collection!
In my local playing, it's been nice to be able to write a complex streaming expression in a file and just run it from the command line.... |
@gerlowskija since you provided some early review, do you think the docs I've added etc are enough that I can merge this in? |
Thanks for those added docs @epugh . They're a huge help, and I suggested a few tweaks that might help even more.
One remaining question I have - wdyt about marking the tool syntax as "experimental" in some way? Seeing all the hard work you've put into improving syntax on the other tools, and considering that we might not notice some rough edges to the syntax of this tool until it's out in the wild a bit...might be prudent to give this script equivalent of "@lucene.experimental" so that we wouldn't need to worry about backcompat if we want to make any future tweaks?
The Stream tool allows you to run a xref:streaming-expressions.adoc[] and see the results from the command line. | ||
It is very similar to the xref:stream-screen.adoc[], but is part of the `bin/solr` CLI. | ||
|
||
To run it, open a window and enter: |
[0] "window" -> "terminal" or maybe "shell"?
// under the License. | ||
|
||
The Stream tool allows you to run a xref:streaming-expressions.adoc[] and see the results from the command line. | ||
It is very similar to the xref:stream-screen.adoc[], but is part of the `bin/solr` CLI. |
[0] Might be worth mentioning here the other differentiator - that this executes some streams "locally"?
A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM| | ||
---- | ||
|
||
TIP: Notice how we used the pipe character (|) as the delimiter? It required a backslash for escaping it so it wouldn't be treated as a pipe with in the shell script. |
[0] "with in" -> "within"
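To illustrate the escaping point in that TIP, here is a minimal sketch that uses Python's `shlex` module as a stand-in for a POSIX shell tokenizer. The `tool` command and `a|b` argument are hypothetical placeholders, not actual `bin/solr stream` syntax, and `shlex` only approximates real shell behavior.

```python
import shlex


def shell_tokens(command_line: str) -> list[str]:
    """Tokenize roughly the way a POSIX shell would, keeping operators like | separate."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    return list(lexer)


# Hypothetical command lines; "tool" is a placeholder, not a real CLI.
unescaped = shell_tokens("tool a|b")    # the bare | is split out as a pipe operator
escaped = shell_tokens(r"tool a\|b")    # the backslash makes | part of the argument
quoted = shell_tokens("tool 'a|b'")     # single quotes have the same effect

print(unescaped)
print(escaped)
print(quoted)
```

Putting the expression in a file avoids this class of problem entirely, since the shell never sees the special characters.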
TIP: Notice how we used the pipe character (|) as the delimiter? It required a backslash for escaping it so it wouldn't be treated as a pipe with in the shell script. | ||
|
||
You can also specify a file with the suffix `.expr` containing your streaming expression. | ||
This is useful for longer expressions or if you having command line parsing issues with your expression. |
[0] "or if you having" -> "or if you have"
Might also be worth clarifying that the file approach primarily helps with shell character-escaping issues, and not parsing/syntax issues generally.
A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM | ||
---- | ||
|
||
The `--help` (or simply `-h`) option will output information on its usage (i.e., `bin/solr stream --help)`. |
[Q] Should this sentence go in the "Using the ..." section below, since that section essentially pastes the "bin/solr stream -h" output in its entirety?
-u,--credentials <credentials> Credentials in the format username:password. Example: --credentials solr:SolrRocks | ||
-url,--solr-url <HOST> Base Solr URL, which can be used to determine the zk-host if that's not known; | ||
defaults to: http://localhost:8983. | ||
-e,--execution <CONTEXT> Execution context is either 'local' or 'solr'. Default is 'solr' |
[0] I don't love this terminology, but I don't have anything better in mind (yet). "Local" to me could be misconstrued by folks running 'bin/solr' on a box that also happens to have Solr running.
Thoughts on:
Execution context is either 'local' (i.e CLI process) or 'solr'.
|
||
Caveats: | ||
|
||
* You don't get to use any of the parallelization support that is available when you run the expression on the cluster. |
[Q] Is this only a limitation if `--execution=local` is specified?
Hello World | ||
---- | ||
|
||
This also works with a `.expr` files. |
[0] "also works with a `.expr` files." -> "also works when using `.expr` files."
) | ||
---- | ||
|
||
Running this expression will read in the local file and send the first two lines to the collection `gettingstarted`. |
[0] Might be worth hitting this point a little more strongly: even "local" processing is likely to reach out to a remote host.
Maybe something like:
All streaming expressions are processed "locally" if that execution mode is selected.
However, "local" processing does not imply a networking sandbox.
Many streaming expressions, such as `search` and `update`, will make network requests to remote Solr nodes if configured to do so, even in "local" execution mode.
https://issues.apache.org/jira/browse/SOLR-14673
Description
Bring in code that @joel-bernstein wrote, but using the SolrCLI infrastructure. The original code is a patch in the associated JIRA.
Solution
Another CLI client ;-)
Tests
Copied over the basic tests from the patch. I still need to write an integration-style test, ideally one that exercises the basic auth.