9. When configuring the remote interpreter in your IDE on Windows, you cannot use `ssh fqdn` directly, because SSH needs to call Bash somehow.
111
+
112
+
But there's a trick (A). Inside GitBash, run `sm-ssh connect` and it will forward the remote SSH port to `localhost` on port `10022`.
113
+
114
+
Alternatively (B), configure [~/.ssh/config](README.md#sshconfig) inside GitBash and forward the port manually:
115
+
116
+
```bash
ssh -L localhost:10022:localhost:22 fqdn
```
119
+
120
+
Now use `localhost:10022` in your IDE to connect to the remote interpreter. When the IDE asks for the private key, use `~/.ssh/fqdn` for option (A) or `~/.ssh/sagemaker-ssh-gw` for option (B).
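Before pointing the IDE at `localhost:10022`, it can help to confirm that the tunnel is actually listening. The following is a minimal sketch that uses only the Python standard library. The function name `port_is_open` is illustrative and is not part of SSH Helper:

```python
import socket


def port_is_open(host: str = "localhost", port: int = 10022, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection tries every address the host name resolves to
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    print("tunnel is up" if port_is_open() else "tunnel is down")
```

If the check reports that the tunnel is down, make sure the `sm-ssh connect` or `ssh -L` session in GitBash is still running.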
121
+
110
122
### Are SageMaker notebook instances supported?
111
123
112
124
Yes, the setup is similar to SageMaker Studio. Run [SageMaker_SSH_Notebook.ipynb](SageMaker_SSH_Notebook.ipynb) on the notebook instance and `sm-ssh connect <<notebook-instance-name>>.notebook.sagemaker` from your local machine.
@@ -187,7 +199,7 @@ During the container build, execute `sm-setup-ssh configure` and `sm-ssh-ide con
187
199
188
200
See the examples of such containers [byoc/Dockerfile.internet_free](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/tests/byoc/Dockerfile.internet_free) and [byoi_studio/Dockerfile.internet_free](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/tests/byoi_studio/Dockerfile.internet_free) in the tests.
189
201
190
-
You will also need to configure AWS PrivateLink for [Session Manager endpoints](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started-privatelink.html) and for [STS endpoints](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts_vpce.html).
202
+
You will also need to configure AWS PrivateLink for [Session Manager endpoints](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started-privatelink.html) and for [STS endpoints](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts_vpce.html), in addition to your already existing endpoints for SageMaker and S3.
191
203
192
204
*Note:* If you are using the [Network Isolation](https://docs.aws.amazon.com/sagemaker/latest/dg/mkt-algo-model-internet-free.html) mode, i.e., set the `enable_network_isolation` parameter of the `Estimator` to `True`, you won't be able to connect to your containers, because they will have no access to the Amazon Systems Manager.
193
205
@@ -577,6 +589,11 @@ Below are the generic tips to start with:
577
589
578
590
* **Important:** Make sure you have fully read and understood the "Getting started" section and didn't skip the steps from [Setting up your AWS account with IAM and SSM configuration](IAM_SSM_Setup.md).
579
591
592
+
* Find all instances of SSH Helper installation. They might conflict with each other if both are in the system `PATH`. Switch into each Python environment and uninstall old versions with `pip uninstall`:
593
+
```bash
find / -name 'sagemaker_ssh_helper' 2>/dev/null
```
596
+
580
597
* Check that the managed instance in AWS Console in Systems Manager -> Fleet Manager section appears as "Online". Check that you're able to connect to the node from the Console by selecting Node actions -> Start terminal session.
581
598
582
599
If the instance is "Offline", you might see this error message when calling an `sm-ssh connect` command:
@@ -595,7 +612,7 @@ An error occurred (InvalidInstanceId) when calling the SendCommand operation: In
595
612
596
613
* Turn on Session Manager [logging](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-logging.html) and inspect the session logs.
597
614
598
-
* Try `sm-ssh list` to see if instance is `Online` or offline (will be marked with `-`). Pay attention to what the output says about the AWS region that you connect to.
615
+
* Try `sm-ssh list` to see whether the instance is `Online` or offline (marked with `ConnectionLost` or `ssh:NotFound`). Pay attention to what the output says about the AWS region that you connect to.
599
616
600
617
* If you have issues with SSH, but you can connect successfully from AWS Console, make sure you can run both SSM commands below successfully on your local machine:
* Enable Session Manager session logs as described [in the AWS Systems Manager documentation](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-logging.html). You might need to create a new CloudWatch log group, e.g. `/ssm/logs`, and/or an S3 bucket like `ssm-logs-555555555555`. Note that according to the documentation, *"Logging isn't available for Session Manager sessions that connect through port forwarding or SSH"*, so it will only help you when you connect directly to the `mi-*` instance with [AWS CLI or AWS Console](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-sessions-start.html).
655
+
656
+
* Set the environment variable `SM_SSH_DEBUG=true` locally and check the file `/tmp/sm-ssh-debug.log`.
657
+
658
+
* Check that the remote host is not overloaded with tasks and has enough memory to execute SSM and SSH commands, e.g., by running `top` from SageMaker Studio image terminal.
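Two of the local checks from the tips above can be sketched in a few lines of Python: listing every `sm-ssh` executable visible through `PATH` (more than one hit suggests conflicting installations), and reading the tail of the debug log. This is only an illustration under stated assumptions: the helper names `find_on_path` and `tail` are hypothetical, and the log path is the one mentioned in the tips.

```python
import os
from collections import deque
from pathlib import Path


def find_on_path(command: str = "sm-ssh") -> list[str]:
    """Return every executable named `command` found in PATH, in PATH order."""
    hits, seen = [], set()
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory or directory in seen:
            continue  # skip empty entries and directories listed twice
        seen.add(directory)
        candidate = Path(directory) / command
        if candidate.is_file() and os.access(candidate, os.X_OK):
            hits.append(str(candidate))
    return hits


def tail(path: str = "/tmp/sm-ssh-debug.log", lines: int = 20) -> list[str]:
    """Return the last `lines` lines of the file, or [] if it does not exist."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return [line.rstrip("\n") for line in deque(f, maxlen=lines)]
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    for hit in find_on_path():
        print("found:", hit)
    for line in tail():
        print(line)
```

If `find_on_path()` returns more than one entry, the first one is the copy your shell will run; the others are candidates for `pip uninstall` in their own environments.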
629
659
630
660
### I’m getting an API throttling error in the logs
IAM_SSM_Setup.md (+3 -4)
@@ -12,7 +12,7 @@ SageMaker SSH Helper relies on the AWS Systems Manager service to create SSH tun
12
12
13
13
### Automated setup with CDK and Cloud9
14
14
15
-
a. Create the [Cloud9](https://docs.aws.amazon.com/cloud9/latest/user-guide/create-environment-main.html) environment. Alternatively, you can the commands run in your local terminal. In this case, make sure you've installed Node.js and CDK and fulfilled [all other CDK prerequisites](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_prerequisites). In both cases you need to have an admin role.
15
+
a. From the AWS Console, open the [CloudShell](https://aws.amazon.com/cloudshell/) environment. Alternatively, you can run the commands in your local terminal. In this case, make sure you've installed Node.js and CDK and fulfilled [all other CDK prerequisites](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_prerequisites). In both cases you need to have an admin role.
16
16
17
17
b. Define your SageMaker role, local user role, AWS account ID and AWS Region as variables by executing the following commands in the terminal line by line:
18
18
@@ -52,6 +52,7 @@ Local variables `SAGEMAKER_ROLE_ARN` and `USER_ROLE_ARN` are passed as parameter
52
52
c. To enable SageMaker SSH Helper in additional AWS Regions, run these commands per region (adjust `REGION` variable each time):
README.md (+47 -11)
@@ -10,7 +10,7 @@ remote debugging, and advanced troubleshooting.
10
10
11
11
The three most common tasks that motivated the creation of the library, sometimes referred to as "SSH into SageMaker", are:
12
12
1. A terminal session into a container running in SageMaker to diagnose a stuck training job, use CLI commands
13
-
like nvidia-smi, or iteratively fix and re-execute your training script within seconds.
13
+
like nvidia-smi and neuron-ls, or iteratively fix and re-execute your training script within seconds.
14
14
2. Remote debugging of a code running in SageMaker from your local favorite IDE like
15
15
PyCharm Professional Edition or Visual Studio Code.
16
16
3. Port forwarding to access auxiliary tools running inside SageMaker, e.g., Dask dashboard, Streamlit apps, TensorBoard or Spark Web UI.
@@ -90,7 +90,9 @@ Install the latest stable version of library from the [PyPI repository](https://
90
90
```shell
pip install sagemaker-ssh-helper
```
93
-
**Caution:** It's always recommended to install the library into a Python venv, not into the system env.
93
+
**Caution:** It's always recommended to install the library into a Python venv, not into the system env. If you later want to use the SSH plugins of your IDE, which use the system env and system Python, you should add the venv to the system PATH, as described in the section [Remote code execution with PyCharm / VSCode over SSH](#remote-interpreter).
94
+
95
+
If you're working on Windows, see [FAQ](FAQ.md#is-windows-supported).
94
96
95
97
### Step 2: Modify your start training job code
96
98
1. Add import for `SSHEstimatorWrapper`
@@ -499,7 +501,7 @@ This low-level script takes the managed instance ID as a parameter. Next section
499
501
The syntax for the SSH Helper CLI command `sm-ssh` is the following:
It's possible to pass additional arguments and forward ports together with the `connect` command, e.g., to forward [SSH Agent](https://linux.die.net/man/1/ssh-agent) and Streamlit web app port:
543
+
544
+
```bash
ssh-add
sm-ssh connect ssh-training-example-2023-07-25-03-18-04-490.training.sagemaker -A -L 8501:localhost:8501
```
548
+
540
549
#### ~/.ssh/config
541
550
542
551
Alternatively, instead of using `sm-ssh connect` command, you can use the native `ssh` command, but it will require you to update your [ssh config](https://linux.die.net/man/5/ssh_config), typically `~/.ssh/config`, with `sm-ssh start-proxy` command as follows:
@@ -577,7 +586,14 @@ Follow the steps in the next section for the IDE configuration, to prepare the `
577
586
sm-local-configure
578
587
```
579
588
580
-
**Caution**: If you plan to use `sm-ssh` tool from the IDE, which you run inside your system Python env, you should install SSH Helper into your system Python env, too.
589
+
**Caution**: You will use SSH plugins from the IDE running inside your system env with system Python, therefore you should add SSH Helper into your system PATH, e.g., on macOS:
You might need to restart the terminal and the IDE for the changes to take effect.
595
+
596
+
Alternatively, use the trick with port forwarding - start the `sm-ssh` or `ssh` with `-L` option inside venv, and then use `localhost` as the host to connect to from IDE. This trick is used to make SSH Helper work on Windows, and it's described in [FAQ - Is Windows Supported?](FAQ.md#is-windows-supported).
581
597
582
598
2. Submit your code to SageMaker with SSH Helper as described in previous sections, e.g. as a [training job](#step-1-install-the-library).
583
599
@@ -589,13 +605,19 @@ Instead of using SSM to connect to the container from command line, proceed to t
589
605
590
606
Make sure you've configured your ssh config as mentioned in the [~/.ssh/config](#sshconfig) section and your IDE can access `sm-ssh` command from the system env.
591
607
608
+
If you connect to your host for the first time, check that `ssh` command is working from CLI:
A. Follow the [instructions in the PyCharm docs](https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html#remote-interpreter), to configure the remote interpreter in PyCharm.
593
615
594
616
In the field for host name, put the same value as for `fqdn` in the [`sm-ssh` command](#sm-ssh), e.g., `ssh-training-manual-2023-10-02-14-38-56-744.training.sagemaker`, and use `root` as the username.
595
617
596
618

597
619
598
-
When PyCharm asks for the SSH key, point to the `~/.ssh/<fqdn>` private key file that was automatically generated for you by SSH Helper:
620
+
If PyCharm asks for the SSH key, point to the `~/.ssh/<fqdn>` private key file that was automatically generated for you by SSH Helper:
599
621
600
622

601
623
@@ -614,11 +636,15 @@ Put the `root@fqdn` as the hostname to connect to, e.g., `root@ssh-training-exam
614
636
615
637

616
638
617
-
> **NOTE:** The **Remote SSH** extension described in the above instructions is only for the [Visual Studio Code native app](https://code.visualstudio.com/). Code Editor in SageMaker Studio and web apps based on [Code Server](https://github.com/coder/code-server) that use extensions from [Open VSX Registry](https://open-vsx.org/) might look and work differently. SageMaker SSH Helper **DOES NOT** support browser-based implementations and haven't been tested with any of Open VSX extensions. If you prefer to use the browser for development, take a look at the [Web VNC](#web-vnc) option.
639
+
> **NOTE:** The **Remote SSH** extension described in the above instructions is only for the [Visual Studio Code native app](https://code.visualstudio.com/). Code Editor in SageMaker Studio and other web apps based on [Code - OSS](https://github.com/microsoft/vscode#visual-studio-code---open-source-code---oss) such as [Code Server](https://github.com/coder/code-server) that use extensions from [Open VSX Registry](https://open-vsx.org/) might look and work differently from the native app that has Microsoft-specific customizations. SageMaker SSH Helper **DOES NOT** support browser-based implementations of VS Code and has not been tested with any Open VSX extensions. If you prefer to use the browser for development, take a look at the [Web VNC](#web-vnc) option.
640
+
641
+
There are a few extension options that you might want to change for VS Code to work properly with SageMaker containers:
618
642
619
-
You might also need to increase "Remote.SSH: Connect Timeout" option to `90` in VS Code. See [the StackOverflow post](https://stackoverflow.com/questions/59978826/why-ssh-connection-timed-out-in-vscode) for details.
643
+
* You might need to increase "Remote.SSH: Connect Timeout" option to `120` in VS Code. See [the StackOverflow post](https://stackoverflow.com/questions/59978826/why-ssh-connection-timed-out-in-vscode) for details.
620
644
621
-
If you see the error `tar: code: Cannot change ownership to uid 1000, gid 1000: Operation not permitted` when connecting, then try to set "Remote.SSH: Use Exec server" to `false`, as mentioned in [#58 - vscode connect fails](https://github.com/aws-samples/sagemaker-ssh-helper/issues/58).
645
+
* If you see the error `tar: code: Cannot change ownership to uid 1000, gid 1000: Operation not permitted` when connecting, then try to set "Remote.SSH: Use Exec server" to `false`, as mentioned in [#58 - vscode connect fails](https://github.com/aws-samples/sagemaker-ssh-helper/issues/58).
646
+
647
+
* You might also need to set "Remote.SSH: Use Local Server" to `false` and "Remote.SSH: Lockfiles In Tmp" to `true`, if you still have connection problems.
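The options listed above can also be set in the VS Code user `settings.json` instead of the Settings UI. The sketch below prints a fragment that you could merge into that file by hand. The setting IDs are assumed to correspond to the option labels named above, so confirm them in the Settings UI of your VS Code version; the helper name `merge_settings` is illustrative only.

```python
import json

# Assumed setting IDs for the Remote - SSH options mentioned above.
# Verify them in your VS Code version before relying on them.
REMOTE_SSH_SETTINGS = {
    "remote.SSH.connectTimeout": 120,    # "Remote.SSH: Connect Timeout"
    "remote.SSH.useExecServer": False,   # "Remote.SSH: Use Exec server"
    "remote.SSH.useLocalServer": False,  # "Remote.SSH: Use Local Server"
    "remote.SSH.lockfilesInTmp": True,   # "Remote.SSH: Lockfiles In Tmp"
}


def merge_settings(existing: dict, overrides: dict = REMOTE_SSH_SETTINGS) -> dict:
    """Return a copy of `existing` with the Remote - SSH overrides applied."""
    merged = dict(existing)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Print only the fragment; copy it into settings.json manually.
    print(json.dumps(merge_settings({}), indent=4))
```

The script deliberately does not edit `settings.json` in place, because that file may contain comments that the standard `json` module cannot parse.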
622
648
623
649
4. Connect to the instance and stop the waiting loop
624
650
@@ -735,15 +761,23 @@ For your local IDE integration with SageMaker Studio, follow the same steps as f
735
761
736
762
1. Copy [SageMaker_SSH_IDE.ipynb](SageMaker_SSH_IDE.ipynb) into SageMaker Studio and run it.
737
763
764
+
Note that the `main` branch of this repo can contain changes that are not compatible with the version of `sagemaker-ssh-helper` that you installed from pip.
765
+
766
+
To be completely sure that you're using the version of the notebook that corresponds to the installed library, take a copy of the notebook from your filesystem after you install SSH Helper package, e.g.:
You can also check the version with `pip freeze | grep sagemaker-ssh-helper` and take the notebook from [the corresponding release tag](https://github.com/aws-samples/sagemaker-ssh-helper/tags).
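The same version check can be done from Python, for example inside a notebook cell, with the standard `importlib.metadata` module. This is a minimal sketch; the function name `installed_version` is illustrative, and mapping the version to a release tag is still left to you:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "sagemaker-ssh-helper") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "sagemaker-ssh-helper is not installed in this environment")
```

Run it with the same interpreter that you use for SSH Helper, since each Python environment has its own set of installed packages.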
773
+
738
774
Alternatively, [attach](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-lcc-create.html) to a domain the KernelGateway lifecycle config script [kernel-lc-config.sh](kernel-lc-config.sh)
739
775
(you may need to ask your administrator to do this).
740
776
Once configured, from the Launcher choose the environment, pick up the lifecycle script and choose
741
777
'Open image terminal' (so, you don't even need to create a notebook).
742
778
743
779
You might want to change the `LOCAL_USER_ID` variable upon the first run, to prevent users from impersonating each other. For more details see the FAQ on [How SageMaker SSH Helper protects users from impersonating each other?](FAQ.md#how-sagemaker-ssh-helper-protects-users-from-impersonating-each-other).
744
780
745
-
> Note that the `main` branch of this repo can contain changes that are not compatible with the version of `sagemaker-ssh-helper` that you installed from pip. To ensure the stable performance, check the version with `pip freeze | grep sagemaker-ssh-helper` and take the notebook and the lifecycle script from [the corresponding tag](https://github.com/aws-samples/sagemaker-ssh-helper/tags).
746
-
747
781
2. Configure remote interpreter in PyCharm / VS Code to connect to SageMaker Studio
748
782
749
783
Use `app_name.user_profile_name.domain_id.studio.sagemaker` or `app_name.studio.sagemaker` as the `fqdn` to connect.
@@ -754,6 +788,8 @@ To see available apps to connect to, you may run the `list` command:
754
788
sm-ssh list studio.sagemaker
755
789
```
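The two `fqdn` forms mentioned above follow a simple pattern, which the following sketch spells out. The function name `studio_fqdn` and the example values are hypothetical; take the real app, user profile and domain names from the output of the `list` command.

```python
from typing import Optional


def studio_fqdn(app_name: str,
                user_profile_name: Optional[str] = None,
                domain_id: Optional[str] = None) -> str:
    """Build the host name in the long or the short form."""
    if user_profile_name and domain_id:
        # Long form: app_name.user_profile_name.domain_id.studio.sagemaker
        return f"{app_name}.{user_profile_name}.{domain_id}.studio.sagemaker"
    if user_profile_name or domain_id:
        raise ValueError("pass both user_profile_name and domain_id, or neither")
    # Short form: app_name.studio.sagemaker
    return f"{app_name}.studio.sagemaker"


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    print(studio_fqdn("my-app", "my-user", "d-example"))
    print(studio_fqdn("my-app"))
```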
756
790
791
+
*Note:* If you're using Windows, see [the FAQ](FAQ.md#is-windows-supported).
792
+
757
793
3. Using the remote Jupyter Notebook
758
794
759
795
In recent versions of PyCharm, Jupyter Notebook is tunnelled automatically through remote interpreter connection. You might need to add `--allow-root` argument to the command line, when your remote interpreter runs under root: