Commit 7c0cf12

Documentation updates: Windows instructions, troubleshooting steps, moving to CloudShell from Cloud9, IDE integration steps
1 parent 61bd4af commit 7c0cf12

5 files changed

+91
-20
lines changed

FAQ.md

+34-4
@@ -45,10 +45,10 @@ We often see a lot of questions that surface repeatedly. This repository is an a
4545
The solution was primarily designed for developers who are using Linux and macOS.
4646

4747
Basic scenarios, which require only SSM without SSH, work on Windows without
48-
any additional configuration.
48+
any additional configuration, i.e., you only need to install the library with pip.
4949

5050
To be able to connect from your local machine with SSH and start port forwarding with the script `sm-ssh`, please consider that
51-
you need Bash interpreter and Python to execute them. They don't work in PowerShell or in the default Command Prompt.
51+
you need a Bash interpreter and Python to execute them. They don't work in PowerShell or in the default Command Prompt, neither of which includes Bash.
5252

5353
However, it's also possible to make it work on Windows, with some limitations on use from IDEs that use the Command Prompt.
5454

@@ -107,6 +107,18 @@ export AWS_DEFAULT_REGION=eu-west-1
107107
sm-ssh list
108108
```
109109

110+
9. When configuring the remote interpreter in your IDE on Windows, you cannot use `ssh fqdn` directly, because SSH needs to call Bash somehow.
111+
112+
But there's a trick (A). Inside GitBash run `sm-ssh connect` and it will forward the remote SSH port to `localhost` on port `10022`.
113+
114+
Alternatively (B), configure [~/.ssh/config](README.md#sshconfig) inside GitBash and forward the port manually:
115+
116+
```bash
117+
ssh -L localhost:10022:localhost:22 fqdn
118+
```
119+
120+
Now use `localhost:10022` in your IDE to connect to the remote interpreter. When the IDE asks for the private key, use `~/.ssh/fqdn` for option (A) or `~/.ssh/sagemaker-ssh-gw` for option (B).
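
Before pointing the IDE at the forwarded port, it can help to confirm that something is actually listening there. The following is a minimal sketch, not part of SSH Helper: the helper name `is_port_open` is hypothetical, and the host and port default to the `localhost:10022` values mentioned above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 10022, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Run this in a second terminal while `sm-ssh connect` or `ssh -L` is active
    status = "open" if is_port_open() else "closed"
    print(f"localhost:10022 is {status}")
```

If the port is reported as closed, the forwarding session in GitBash is probably not running, and the IDE connection will fail for the same reason.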
121+
110122
### Are SageMaker notebook instances supported?
111123

112124
Yes, the setup is similar to SageMaker Studio. Run [SageMaker_SSH_Notebook.ipynb](SageMaker_SSH_Notebook.ipynb) on the notebook instance and `sm-ssh connect <<notebook-instance-name>>.notebook.sagemaker` from your local machine.
@@ -187,7 +199,7 @@ During the container build, execute `sm-setup-ssh configure` and `sm-ssh-ide con
187199

188200
See the examples of such containers [byoc/Dockerfile.internet_free](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/tests/byoc/Dockerfile.internet_free) and [byoi_studio/Dockerfile.internet_free](https://github.com/aws-samples/sagemaker-ssh-helper/blob/main/tests/byoi_studio/Dockerfile.internet_free) in the tests.
189201

190-
You will also need to configure AWS PrivateLink for [Session Manager endpoints](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started-privatelink.html) and for [STS endpoints](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts_vpce.html).
202+
You will also need to configure AWS PrivateLink for [Session Manager endpoints](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started-privatelink.html) and for [STS endpoints](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts_vpce.html), in addition to your already existing endpoints for SageMaker and S3.
191203

192204
*Note:* If you are using the [Network Isolation](https://docs.aws.amazon.com/sagemaker/latest/dg/mkt-algo-model-internet-free.html) mode, i.e., set the `enable_network_isolation` parameter of the `Estimator` to `True`, you won't be able to connect to your containers, because they will have no access to AWS Systems Manager.
193205

@@ -577,6 +589,11 @@ Below are the generic tips to start with:
577589
578590
* **Important:** Make sure you fully read and understood the "Getting started" section and didn't skip the steps from [Setting up your AWS account with IAM and SSM configuration](IAM_SSM_Setup.md).
579591
592+
* Find all installations of SSH Helper. They might conflict with each other if more than one is in the system `PATH`. Switch into each Python environment and uninstall old versions with `pip uninstall`:
593+
```bash
594+
find / -name 'sagemaker_ssh_helper' 2>/dev/null
595+
```
596+
580597
* Check that the managed instance in AWS Console in Systems Manager -> Fleet Manager section appears as "Online". Check that you're able to connect to the node from the Console by selecting Node actions -> Start terminal session.
581598
582599
If instance is "Offline", you might see this error message when calling an `sm-ssh connect` command:
@@ -595,7 +612,7 @@ An error occurred (InvalidInstanceId) when calling the SendCommand operation: In
595612
596613
* Turn on Session Manager [logging](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-logging.html) and inspect the session logs.
597614
598-
* Try `sm-ssh list` to see if instance is `Online` or offline (will be marked with `-`). Pay attention to what the output says about the AWS region that you connect to.
615+
* Try `sm-ssh list` to see if the instance is `Online` or offline (will be marked with `ConnectionLost` or `ssh:NotFound`). Pay attention to what the output says about the AWS region that you connect to.
599616
600617
* If you have issues with SSH, but you can connect successfully from AWS Console, make sure you can run both SSM commands below successfully on your local machine:
601618
@@ -605,6 +622,8 @@ aws ssm start-session --target mi-01234567890abcdef \
605622
--document-name AWS-StartSSHSession --parameters portNumber=22
606623
```
607624
625+
* Use `ssh -v` for additional log output
626+
608627
* (SageMaker Studio) Check SSM agent logs. From the image terminal run:
609628
```text
610629
tail /var/log/amazon/ssm/*.log && date
@@ -626,6 +645,17 @@ Check carefully the notebook output in SageMaker Studio to see if there are any
626645
627646
* (SageMaker Studio) Try to re-initialize the instance by restarting the notebook: Kernel -> Restart Kernel and Run All Cells.
628647
648+
* (PyCharm) Check the IDE log:
649+
650+
```bash
651+
tail -f ~/Library/Logs/JetBrains/PyCharm2024.1/idea.log
652+
```
653+
654+
* Enable Session Manager session logs as described [in the AWS Systems Manager documentation](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-logging.html). You might need to create a new CloudWatch log group, e.g., `/ssm/logs`, and/or an S3 bucket like `ssm-logs-555555555555`. Note that according to the documentation, *"Logging isn't available for Session Manager sessions that connect through port forwarding or SSH"*, so it will only help you when you connect directly to the `mi-*` instance with [AWS CLI or AWS Console](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-sessions-start.html).
655+
656+
* Set the environment variable `SM_SSH_DEBUG=true` on your local machine and check the file `/tmp/sm-ssh-debug.log`.
657+
658+
* Check that the remote host is not overloaded with tasks and has enough memory to execute SSM and SSH commands, e.g., by running `top` from SageMaker Studio image terminal.
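
For the `SM_SSH_DEBUG` tip above, a small helper can print the end of the debug log after a failed connection attempt. This is only a sketch: the function name `tail_lines` is hypothetical, the log path is the one mentioned in the tip, and the variable itself still has to be set in the shell that runs `sm-ssh`.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, n=20):
    """Return the last n lines of a text file, or an empty list if it doesn't exist."""
    log = Path(path)
    if not log.is_file():
        return []
    with log.open("r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    # Path taken from the troubleshooting tip; adjust it if your setup differs
    for line in tail_lines("/tmp/sm-ssh-debug.log"):
        print(line)
```

An empty result usually means the file was never created, e.g., because `SM_SSH_DEBUG` was not exported before running the command.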
629659
630660
### I’m getting an API throttling error in the logs
631661

IAM_SSM_Setup.md

+3-4
@@ -12,7 +12,7 @@ SageMaker SSH Helper relies on the AWS Systems Manager service to create SSH tun
1212

1313
### Automated setup with CDK and Cloud9
1414

15-
a. Create the [Cloud9](https://docs.aws.amazon.com/cloud9/latest/user-guide/create-environment-main.html) environment. Alternatively, you can the commands run in your local terminal. In this case, make sure you've installed Node.js and CDK and fulfilled [all other CDK prerequisites](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_prerequisites). In both cases you need to have an admin role.
15+
a. From AWS Console, open the [CloudShell](https://aws.amazon.com/cloudshell/) environment. Alternatively, you can run the commands in your local terminal. In this case, make sure you've installed Node.js and CDK and fulfilled [all other CDK prerequisites](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_prerequisites). In both cases you need to have an admin role.
1616

1717
b. Define your SageMaker role, local user role, AWS account ID and AWS Region as variables by executing the following commands in the terminal line by line:
1818

@@ -52,6 +52,7 @@ Local variables `SAGEMAKER_ROLE_ARN` and `USER_ROLE_ARN` are passed as parameter
5252
c. To enable SageMaker SSH Helper in additional AWS Regions, run these commands per region (adjust `REGION` variable each time):
5353

5454
```shell
55+
ACCOUNT_ID=
5556
REGION=
5657
```
5758

@@ -60,9 +61,7 @@ cdk bootstrap aws://"$ACCOUNT_ID"/"$REGION"
6061

6162
APP="python -m sagemaker_ssh_helper.cdk.advanced_tier_app"
6263

63-
AWS_REGION="$REGION" cdk -a "$APP" deploy SSM-Advanced-Tier-Stack \
64-
-c sagemaker_role="$SAGEMAKER_ROLE_ARN" \
65-
-c user_role="$USER_ROLE_ARN"
64+
AWS_REGION="$REGION" cdk -a "$APP" deploy SSM-Advanced-Tier-Stack
6665
```
6766

6867
*Note:* If you will run the jobs from SageMaker Studio instead of your local machine, specify `USER_ROLE_ARN` the same as `SAGEMAKER_ROLE_ARN`.

README.md

+47-11
@@ -10,7 +10,7 @@ remote debugging, and advanced troubleshooting.
1010

1111
The three most common tasks that motivated the creation of the library, sometimes referred to as "SSH into SageMaker", are:
1212
1. A terminal session into a container running in SageMaker to diagnose a stuck training job, use CLI commands
13-
like nvidia-smi, or iteratively fix and re-execute your training script within seconds.
13+
like nvidia-smi and neuron-ls, or iteratively fix and re-execute your training script within seconds.
1414
2. Remote debugging of a code running in SageMaker from your local favorite IDE like
1515
PyCharm Professional Edition or Visual Studio Code.
1616
3. Port forwarding to access auxiliary tools running inside SageMaker, e.g., Dask dashboard, Streamlit apps, TensorBoard or Spark Web UI.
@@ -90,7 +90,9 @@ Install the latest stable version of library from the [PyPI repository](https://
9090
```shell
9191
pip install sagemaker-ssh-helper
9292
```
93-
**Caution:** It's always recommended to install the library into a Python venv, not into the system env.
93+
**Caution:** It's always recommended to install the library into a Python venv, not into the system env. If you later want to use the SSH plugins of your IDE, which use the system env and system Python, you should add the venv to the system PATH, as described in the section [Remote code execution with PyCharm / VSCode over SSH](#remote-interpreter).
94+
95+
If you're working on Windows, see [FAQ](FAQ.md#is-windows-supported).
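
To see which `sm-ssh` executables are visible through `PATH`, and in which order they would be picked up, a short script along the following lines can help. It is a sketch under assumptions: the function name `find_on_path` is hypothetical, and it only inspects the `PATH` of the process it runs in, which may differ from the one your IDE sees.

```python
import os
from pathlib import Path


def find_on_path(name="sm-ssh", path=None):
    """Return all executable files called `name` in the PATH directories, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    seen = set()
    for directory in path.split(os.pathsep):
        # Skip empty entries and directories listed more than once
        if not directory or directory in seen:
            continue
        seen.add(directory)
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            found.append(str(candidate))
    return found


if __name__ == "__main__":
    matches = find_on_path()
    if not matches:
        print("sm-ssh was not found on PATH")
    for position, executable in enumerate(matches, start=1):
        print(f"{position}. {executable}")
```

The first entry is the one the shell would run. More than one entry suggests that several installations are present and might conflict.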
9496

9597
### Step 2: Modify your start training job code
9698
1. Add import for `SSHEstimatorWrapper`
@@ -499,7 +501,7 @@ This low-level script takes the managed instance ID as a parameter. Next section
499501
The syntax for the SSH Helper CLI command `sm-ssh` is the following:
500502

501503
```bash
502-
sm-ssh [-h] [-v] {list,start-proxy,connect} [fqdn]
504+
sm-ssh [-h] [-v] {list,start-proxy,connect} [fqdn] [extra-connect-args]*
503505
```
504506

505507
where `fqdn` is the resource name with `.sagemaker` suffix, respectively:
@@ -529,14 +531,21 @@ sm-ssh list sagemaker
529531

530532
– will list all resources of all types.
531533

532-
The instances with SSH Helper will be marked `Online` while other instances will be marked with `-`.
534+
The instances with SSH Helper will be marked `Online` or `ConnectionLost`, while the instances not registered with SSM will be marked with `ssh:NotFound`.
533535

534536
The `connect` command starts interactive SSH session into container, e.g.:
535537

536538
```bash
537539
sm-ssh connect ssh-training-example-2023-07-25-03-18-04-490.training.sagemaker
538540
```
539541

542+
It's possible to pass additional arguments and forward ports together with the `connect` command, e.g., to forward [SSH Agent](https://linux.die.net/man/1/ssh-agent) and Streamlit web app port:
543+
544+
```bash
545+
ssh-add
546+
sm-ssh connect ssh-training-example-2023-07-25-03-18-04-490.training.sagemaker -A -L 8501:localhost:8501
547+
```
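
If you start the connection from a script, the same command line can be assembled programmatically. The sketch below is illustrative only: `build_connect_command` is a hypothetical helper, not part of SSH Helper, and it simply appends the extra arguments after the `fqdn`, following the syntax shown above.

```python
import shlex


def build_connect_command(fqdn, *extra_args):
    """Build the argument list for `sm-ssh connect <fqdn> [extra-connect-args]*`."""
    # The fqdn is the resource name with the `.sagemaker` suffix
    if not fqdn.endswith(".sagemaker"):
        raise ValueError(f"fqdn must end with '.sagemaker': {fqdn!r}")
    return ["sm-ssh", "connect", fqdn, *extra_args]


if __name__ == "__main__":
    cmd = build_connect_command(
        "ssh-training-example-2023-07-25-03-18-04-490.training.sagemaker",
        "-A",                          # forward the SSH agent
        "-L", "8501:localhost:8501",   # forward the Streamlit port
    )
    # Print the command for copy-pasting into a terminal
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from an environment where `sm-ssh` is on the `PATH`.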
548+
540549
#### ~/.ssh/config
541550

542551
Alternatively, instead of using `sm-ssh connect` command, you can use the native `ssh` command, but it will require you to update your [ssh config](https://linux.die.net/man/5/ssh_config), typically `~/.ssh/config`, with `sm-ssh start-proxy` command as follows:
@@ -577,7 +586,14 @@ Follow the steps in the next section for the IDE configuration, to prepare the `
577586
sm-local-configure
578587
```
579588

580-
**Caution**: If you plan to use `sm-ssh` tool from the IDE, which you run inside your system Python env, you should install SSH Helper into your system Python env, too.
589+
**Caution**: You will use SSH plugins from the IDE running inside your system env with system Python, therefore you should add SSH Helper into your system PATH, e.g., on macOS:
590+
```bash
591+
sudo bash -c "echo '/Users/janedoe/PycharmProjects/sagemaker-ssh-helper-dev-venv/bin' > /etc/paths.d/42-sm-ssh"
592+
```
593+
594+
You might need to restart the Terminal and the IDE for the changes to take effect.
595+
596+
Alternatively, use the trick with port forwarding: start `sm-ssh` or `ssh` with the `-L` option inside the venv, and then use `localhost` as the host to connect to from the IDE. This trick is used to make SSH Helper work on Windows, and it's described in [FAQ - Is Windows Supported?](FAQ.md#is-windows-supported).
581597

582598
2. Submit your code to SageMaker with SSH Helper as described in previous sections, e.g. as a [training job](#step-1-install-the-library).
583599

@@ -589,13 +605,19 @@ Instead of using SSM to connect to the container from command line, proceed to t
589605

590606
Make sure you've configured your ssh config as mentioned in the [~/.ssh/config](#sshconfig) section and your IDE can access `sm-ssh` command from the system env.
591607

608+
If you connect to your host for the first time, check that `ssh` command is working from CLI:
609+
610+
```bash
611+
ssh ssh-training-manual-2023-10-02-14-38-56-744.training.sagemaker
612+
```
613+
592614
A. Follow the [instructions in the PyCharm docs](https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html#remote-interpreter), to configure the remote interpreter in PyCharm.
593615

594616
In the field for host name, put the same value as for `fqdn` in the [`sm-ssh` command](#sm-ssh), e.g., `ssh-training-manual-2023-10-02-14-38-56-744.training.sagemaker`, and use `root` as the username.
595617

596618
![](images/pycharm_training.png)
597619

598-
When PyCharm asks for the SSH key, point to the `~/.ssh/<fqdn>` private key file that was automatically generated for you by SSH Helper:
620+
If PyCharm asks for the SSH key, point to the `~/.ssh/<fqdn>` private key file that was automatically generated for you by SSH Helper:
599621

600622
![](images/pycharm_training_ssh.png)
601623

@@ -614,11 +636,15 @@ Put the `root@fqdn` as the hostname to connect to, e.g., `root@ssh-training-exam
614636

615637
![](images/vscode_training.png)
616638

617-
> **NOTE:** The **Remote SSH** extension described in the above instructions is only for the [Visual Studio Code native app](https://code.visualstudio.com/). Code Editor in SageMaker Studio and web apps based on [Code Server](https://github.com/coder/code-server) that use extensions from [Open VSX Registry](https://open-vsx.org/) might look and work differently. SageMaker SSH Helper **DOES NOT** support browser-based implementations and haven't been tested with any of Open VSX extensions. If you prefer to use the browser for development, take a look at the [Web VNC](#web-vnc) option.
639+
> **NOTE:** The **Remote SSH** extension described in the above instructions is only for the [Visual Studio Code native app](https://code.visualstudio.com/). Code Editor in SageMaker Studio and other web apps based on [Code - OSS](https://github.com/microsoft/vscode#visual-studio-code---open-source-code---oss) such as [Code Server](https://github.com/coder/code-server) that use extensions from [Open VSX Registry](https://open-vsx.org/) might look and work differently from the native app that has Microsoft-specific customizations. SageMaker SSH Helper **DOES NOT** support browser-based implementations of VS Code and hasn't been tested with any of the Open VSX extensions. If you prefer to use the browser for development, take a look at the [Web VNC](#web-vnc) option.
640+
641+
There are a few extension options that you might want to change for VS Code to work properly with SageMaker containers:
618642

619-
You might also need to increase "Remote.SSH: Connect Timeout" option to `90` in VS Code. See [the StackOverflow post](https://stackoverflow.com/questions/59978826/why-ssh-connection-timed-out-in-vscode) for details.
643+
* You might need to increase "Remote.SSH: Connect Timeout" option to `120` in VS Code. See [the StackOverflow post](https://stackoverflow.com/questions/59978826/why-ssh-connection-timed-out-in-vscode) for details.
620644

621-
If you see the error `tar: code: Cannot change ownership to uid 1000, gid 1000: Operation not permitted` when connecting, then try to set "Remote.SSH: Use Exec server" to `false`, as mentioned in [#58 - vscode connect fails](https://github.com/aws-samples/sagemaker-ssh-helper/issues/58).
645+
* If you see the error `tar: code: Cannot change ownership to uid 1000, gid 1000: Operation not permitted` when connecting, then try to set "Remote.SSH: Use Exec server" to `false`, as mentioned in [#58 - vscode connect fails](https://github.com/aws-samples/sagemaker-ssh-helper/issues/58).
646+
647+
* You might also need to set "Remote.SSH: Use Local Server" to `false` and "Remote.SSH: Lockfiles In Tmp" to `true`, if you still have connection problems.
622648

623649
4. Connect to the instance and stop the waiting loop
624650

@@ -735,15 +761,23 @@ For your local IDE integration with SageMaker Studio, follow the same steps as f
735761

736762
1. Copy [SageMaker_SSH_IDE.ipynb](SageMaker_SSH_IDE.ipynb) into SageMaker Studio and run it.
737763

764+
Note that the `main` branch of this repo can contain changes that are not compatible with the version of `sagemaker-ssh-helper` that you installed from pip.
765+
766+
To be completely sure that you're using the version of the notebook that corresponds to the installed library, take a copy of the notebook from your filesystem after you install SSH Helper package, e.g.:
767+
768+
```bash
769+
cp /opt/conda/sm_ssh/SageMaker_SSH_IDE.ipynb /root/
770+
```
771+
772+
You can also check the version with `pip freeze | grep sagemaker-ssh-helper` and take the notebook from [the corresponding release tag](https://github.com/aws-samples/sagemaker-ssh-helper/tags).
773+
738774
Alternatively, [attach](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-lcc-create.html) to a domain the KernelGateway lifecycle config script [kernel-lc-config.sh](kernel-lc-config.sh)
739775
(you may need to ask your administrator to do this).
740776
Once configured, from the Launcher choose the environment, pick up the lifecycle script and choose
741777
'Open image terminal' (so, you don't even need to create a notebook).
742778

743779
You might want to change the `LOCAL_USER_ID` variable upon the first run, to prevent users from impersonating each other. For more details see the FAQ on [How SageMaker SSH Helper protects users from impersonating each other?](FAQ.md#how-sagemaker-ssh-helper-protects-users-from-impersonating-each-other).
744780

745-
> Note that the `main` branch of this repo can contain changes that are not compatible with the version of `sagemaker-ssh-helper` that you installed from pip. To ensure the stable performance, check the version with `pip freeze | grep sagemaker-ssh-helper` and take the notebook and the lifecycle script from [the corresponding tag](https://github.com/aws-samples/sagemaker-ssh-helper/tags).
746-
747781
2. Configure remote interpreter in PyCharm / VS Code to connect to SageMaker Studio
748782

749783
Use `app_name.user_profile_name.domain_id.studio.sagemaker` or `app_name.studio.sagemaker` as the `fqdn` to connect.
@@ -754,6 +788,8 @@ To see available apps to connect to, you may run the `list` command:
754788
sm-ssh list studio.sagemaker
755789
```
756790

791+
*Note:* If you're using Windows, see [the FAQ](FAQ.md#is-windows-supported).
792+
757793
3. Using the remote Jupyter Notebook
758794

759795
In recent versions of PyCharm, Jupyter Notebook is tunnelled automatically through remote interpreter connection. You might need to add `--allow-root` argument to the command line, when your remote interpreter runs under root:

sagemaker_ssh_helper/cdk/iam_ssm/iam_ssm_stack.py

+4-1
@@ -62,7 +62,10 @@ def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
6262
actions=[
6363
"ssm:StartSession",
6464
],
65-
resources=[f"arn:{Aws.PARTITION}:ssm:*::document/AWS-StartSSHSession"]
65+
resources=[
66+
f"arn:{Aws.PARTITION}:ssm:*::document/AWS-StartSSHSession",
67+
f"arn:{Aws.PARTITION}:ssm:*:{Aws.ACCOUNT_ID}:document/SSM-SessionManagerRunShell"
68+
]
6669
),
6770
PolicyStatement(
6871
effect=Effect.ALLOW,
