Remote Python Debugging on Slurm Clusters with VSCode
UV
export UV_INSTALL_DIR="$HOME/CISPA-home/.uv"
Best practice:
- Before running uv init or uv venv, create a symlink for .venv that points to the same storage as the uv cache: mkdir -p $PATH_TO_LARGE_STORAGE/$PROJECT_NAME/.venv && ln -s $PATH_TO_LARGE_STORAGE/$PROJECT_NAME/.venv ./. This keeps .venv on the same filesystem as the cache, so uv's hard links work properly.
- If the .venv folder of the current project has to be placed on a storage different from the uv cache, switch uv to symlinks in uv.toml: echo 'link-mode = "symlink"' >> ~/CISPA-home/.uv/uv.toml.
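The first bullet can be sketched as a pair of shell helpers. This is a minimal sketch, assuming GNU stat is available; the function names same_filesystem and link_venv are hypothetical and not part of uv.

```shell
# Sketch only: helper names are hypothetical, not part of uv.

# Succeeds when both paths resolve to the same filesystem (GNU stat).
# Hard links between the uv cache and .venv only work in that case.
same_filesystem() {
    [ "$(stat -L -c %d "$1")" = "$(stat -L -c %d "$2")" ]
}

# Create the real .venv directory on large storage and symlink it
# into the project directory.
# Usage: link_venv STORAGE_ROOT PROJECT_NAME PROJECT_DIR
link_venv() {
    mkdir -p "$1/$2/.venv"
    ln -s "$1/$2/.venv" "$3/.venv"
}
```

A possible use is link_venv "$PATH_TO_LARGE_STORAGE" "$PROJECT_NAME" . followed by same_filesystem "$(uv cache dir)" .venv; if the check fails, the symlink link-mode from the second bullet is the fallback.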
Environment Variables
VSCode does not automatically pass the current environment variables to a task, meaning we must set them manually. Furthermore, since ~/.bashrc is only accessible on the login node, we need to store our environment variables in a dedicated configuration file located in a shared directory accessible by all nodes.
File ~/CISPA-home/.envrc:
# local variables
Add the lines below at the end of file ~/.profile:
# load envrc
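One way to load such a file from ~/.profile is to source it with auto-export enabled, so that every variable it defines reaches child processes such as VSCode tasks. This is a minimal sketch and not the exact lines of the setup above; load_envrc is a hypothetical helper name.

```shell
# Sketch only: load_envrc is a hypothetical helper name.

# Source a shared env file and export every variable it defines.
# A missing file is silently skipped.
load_envrc() {
    if [ -f "$1" ]; then
        set -a      # auto-export all variables assigned while sourcing
        . "$1"
        set +a
    fi
}

# In ~/.profile this could be called as:
# load_envrc "$HOME/CISPA-home/.envrc"
```

With set -a, the file itself can contain plain NAME=value lines without an explicit export on each one.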
If you are using VSCode, kill the server and reload the window:
- Ctrl+Shift+P, >Remote-SSH: Kill Current VS Code Server
- Ctrl+Shift+P, >Developer: Reload Window
Debugpy Script
Note that VSCode injects a source command into every new task terminal when a Python venv is in use. This injected input can be consumed by the first input() call in the script being debugged. To prevent this, we use read -r -t 0.1 to drain stdin before starting debugpy.
Since the debugging task must run in the background, we need to output specific string patterns to indicate the current status of the debugging node to VSCode:
- Start: >>>>>>>> MYDEBUGPY HELLO hostname:port
- Ready: >>>>>>>> DEBUGGING [script, args...]
- Finish: >>>>>>>> MYDEBUGPY BYEBYE code
File ~/CISPA-home/mydebugpy:
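A wrapper following the conventions above could look roughly like the sketch below. It is an illustration under assumptions, not the actual contents of ~/CISPA-home/mydebugpy: the function names, the argument order (port first), the listen address 0.0.0.0, and the exact formatting of the argument list in the Ready line are all hypothetical. The debugpy options --listen and --wait-for-client are part of the debugpy command line.

```shell
#!/usr/bin/env bash
# Sketch only, not the original ~/CISPA-home/mydebugpy.

# Discard whatever VSCode already typed into the terminal (e.g. the
# venv "source" line) so it does not reach the script's first input().
drain_stdin() {
    while read -r -t 0.1 _discard; do :; done
}

# Start debugpy on the given port and wait for the client to attach.
# Usage: launch_debugpy PORT SCRIPT [ARGS...]
launch_debugpy() {
    local port="$1"; shift
    python -m debugpy --listen "0.0.0.0:${port}" --wait-for-client "$@"
}

# Print the three status markers around one debugging session.
# Usage: debug_session PORT SCRIPT [ARGS...]
debug_session() {
    local port="$1"; shift
    local code=0
    echo ">>>>>>>> MYDEBUGPY HELLO $(hostname):${port}"
    # Assumption: the Ready marker is printed just before debugpy
    # starts; the real script may signal readiness differently.
    echo ">>>>>>>> DEBUGGING [$*]"
    launch_debugpy "$port" "$@" || code=$?
    echo ">>>>>>>> MYDEBUGPY BYEBYE ${code}"
    return "$code"
}

# Entry point when used as a script:
# drain_stdin; debug_session "$@"
```

Keeping the debugpy call in its own function makes it easy to swap in a different launcher (for example uv run python -m debugpy ...) without touching the marker logic.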
Request a Compute Node for Notebook or Coding Agents
Use sbatch to submit a job whose only command is sleep 24h. This way, you are assigned a compute node for as long as the job runs. Assume that its hostname is xe8545-a100-23 and its job ID is 1234567.
IMPORTANT! Remember to run scancel 1234567 when the compute node is no longer needed!
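The request step can be sketched with a few helpers. This is a minimal sketch: the helper names are hypothetical, and any partition, GPU, or time-limit options are cluster-specific and left out. It relies on sbatch --wrap, on the "Submitted batch job <id>" line that sbatch prints, and on the %N (node list) format field of squeue.

```shell
# Sketch only: helper names are hypothetical.

# Extract the job ID from sbatch's "Submitted batch job <id>" line.
parse_job_id() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Submit a placeholder job that just sleeps, and print its job ID.
# Add cluster-specific resource options to the sbatch call as needed.
request_node() {
    sbatch --wrap="sleep 24h" | parse_job_id
}

# Print the hostname(s) allocated to a job.
# Usage: job_nodes JOB_ID
job_nodes() {
    squeue --noheader --jobs="$1" --format="%N"
}
```

For example, jobid="$(request_node)" followed by job_nodes "$jobid" prints the node name once the job is running, and scancel "$jobid" releases it.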
Jupyter Notebook
srun --jobid=1234567 uv run jupyter server --ip="0.0.0.0" --port 12345
The output contains a URL like http://xe8545-a100-23:12345/?token=... that you can copy and paste into the Jupyter extension in VSCode to connect to the server.
Coding Agents
By prompting: “Every time you want to run Python code, use the command srun --jobid=1234567 uv run ...”
By skill.md: [TODO]
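To keep the prompt short, the srun prefix could also be hidden behind a small wrapper that the agent is told to call. This is a sketch under assumptions: node_run, SLURM_DEBUG_JOBID, and DRY_RUN are hypothetical names introduced for illustration.

```shell
# Sketch only: node_run, SLURM_DEBUG_JOBID and DRY_RUN are hypothetical names.

# Run a command on the allocated compute node through uv.
# With DRY_RUN=1 the command is printed instead of executed.
# Usage: node_run COMMAND [ARGS...]
node_run() {
    jobid="${SLURM_DEBUG_JOBID:?set SLURM_DEBUG_JOBID to the sbatch job ID}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo srun --jobid="$jobid" uv run "$@"
    else
        srun --jobid="$jobid" uv run "$@"
    fi
}
```

With export SLURM_DEBUG_JOBID=1234567 in the shared environment file, the prompt reduces to asking the agent to run Python through node_run, for example node_run python train.py.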
VSCode Debugging
Node Configuration
Define the configuration parameters for the target compute node in settings.json. This allows the variables to be shared seamlessly between tasks.json and launch.json.
File .vscode/settings.json:
{
Debugging Task
In this task configuration, we connect to the (sleeping) allocated compute node via srun --jobid= and launch the debugging server.
File .vscode/tasks.json:
{
Launch Debugger
Finally, attach the VSCode debugger to the remote debugpy session running on the allocated compute node.
File .vscode/launch.json:
{