Lab:newuser


How to use lab cluster

Preparation

If you are new to Linux: https://ryanstutorials.net/linuxtutorial/


Lab cluster

login

You need an account to log in to the lab cluster to use the various software and run calculations. Password-based remote login is no longer allowed; an SSH key is required. You have to come to the lab to do the steps below in person; if you cannot, ask the admin to do it for you. Once you have received an account and a temporary password, log in at any workstation in the lab and set up your SSH key:

These steps may already have been done for you by the admin.

cd ~
ssh-keygen -t rsa -b 4096 (then hit enter all the way)
cd .ssh 
cat id_rsa.pub >> authorized_keys
chmod 600  authorized_keys


Note that the keys exist in pairs: one is public (id_rsa.pub) and the other is private (id_rsa).

Then "ssh bme-uranus". Use "passwd" to update your password to something very secure. The bare command is sufficient; it will prompt you for the old and new passwords. The change will be propagated to all nodes automatically.

After you have logged into uranus, obtain the list of all known nodes (hostnames: https://github.com/bdw2292/Ren-Lab-Daemon/blob/main/nodes.txt) and ssh from uranus to each node. A prompt will ask whether to add each hostname to the list of known hosts; type yes at each prompt (or use the loop sketched below).
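If you prefer not to answer every prompt by hand, a minimal loop like the one below should work, assuming you have saved nodes.txt in your home directory with one hostname per line (the "accept-new" option needs a reasonably recent OpenSSH and records each new host key for you):

# visit every node once so its host key lands in ~/.ssh/known_hosts
while read -r node; do
    ssh -n -o StrictHostKeyChecking=accept-new "$node" hostname
done < ~/nodes.txt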

There is a Google spreadsheet for signing up to use different CPUs and GPUs.

login from your laptop on and off campus

You will need an SSH client to log into our computer cluster remotely.

You also need the UT VPN to connect from off campus; on campus it is not needed.

If you use Mac or Linux, an SSH client is built in (via the terminal). If you use Windows, you can install an SSH client such as MobaXterm (https://mobaxterm.mobatek.net/), Windows Subsystem for Linux (Ubuntu/Debian, etc.), or Cygwin. You need to create an SSH key pair (private and public) on your own computer (e.g. your laptop). MobaXterm has a tool for this (Tools/MobaKeyGen); if you use WSL, Mac, or Linux, the instructions are the same as above (ssh-keygen). Once you generate the key pair, save both keys somewhere you will remember. Then append the public key from your laptop (the client) to "authorized_keys" in the .ssh folder on the lab computer (the remote server). On your laptop, you will need to specify the matching private key if you use MobaXterm (on Mac or Linux it is found automatically in ~/.ssh). Again, if you cannot come to the lab in person, send me the public key (via a link to a cloud drive, not pasted directly in an email) and we will append it for you.
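As a rough sketch of the append step (assuming your laptop key was generated as ~/.ssh/id_rsa.pub and that you do the server-side part from a lab workstation where you are already logged in):

# on your laptop (Mac/Linux/WSL): display the public key so you can copy it
cat ~/.ssh/id_rsa.pub
# on a lab workstation, logged in as your cluster account: paste that single
# line onto the end of authorized_keys, then tighten the permissions
nano ~/.ssh/authorized_keys          (paste the key as one new line and save)
chmod 600 ~/.ssh/authorized_keys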

Remotely, you can log into bme-jupiter.bme.utexas.edu or bme-sugar.bme.utexas.edu. All the other nodes/servers are blocked from outside. From jupiter or sugar, you can ssh into uranus and the nodes behind uranus (see the cluster structure at the bottom of this page). If the username on your device does not match your cluster username, prefix the address with your username (e.g. longhorn42@bme-jupiter.bme.utexas.edu).

From off campus, you need to use UT VPN to ssh into jupiter or sugar. https://wikis.utexas.edu/display/engritgpublic/Connecting+to+the+University+of+Texas+VPN

More instructions if you are using MobaXterm:

  • Make sure you are connected via the VPN.
  • Start MobaXterm and use Tools/MobaKeyGen to generate a pair of public and private keys. Save both.
  • Append your public key to .ssh/authorized_keys on the server (you can ask for help if you are remote).
  • Create a new session in MobaXterm; under Advanced SSH settings, set host: bme-jupiter.bme.utexas.edu, username: xxx, and specify the private key you saved above.

your home dir is on uranus

nova, the old header node, is now obsolete; uranus has replaced it as the header node.

bme-uranus.bme.utexas.edu (node199, 10.0.0.199) is the header node of our computer cluster, where your home directory and files are stored. Your home directory (echo $HOME) is located in either /home or /users, which physically resides on uranus. It can be accessed from any node and workstation in the lab. From uranus, you can start a job with "ssh node120 xxx", where xxx is your simulation script. You can write your own script to submit a batch of jobs to these nodes, checking availability and skipping a node if it is busy (see the sketch below). Your home directory is shared among uranus and all nodes via NFS.
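A minimal dispatch loop might look like the sketch below; the node names, the load threshold of 1.0, and the run_job.sh script are placeholders for your own choices. It reads each node's 1-minute load average over ssh and skips nodes that look busy:

# example only: submit one copy of run_job.sh to each idle node
for node in node120 node121 node122; do
    load=$(ssh -n "$node" "cut -d' ' -f1 /proc/loadavg")
    if awk -v l="$load" 'BEGIN { exit !(l < 1.0) }'; then
        echo "submitting to $node (load $load)"
        ssh -n "$node" "cd $PWD && nohup ./run_job.sh > $node.out 2>&1 < /dev/null &"
    else
        echo "skipping $node (load $load)"
    fi
done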

(More about lab cluster structure at the end of page)

Your home directory has a limit (quota) on both disk space and the number of files; once the limit is reached you will not be able to create new files.
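To get a rough idea of how much you are currently using (the actual quota values are set by the admin, so treat this only as a sanity check):

du -sh ~                       (total size of your home directory)
find ~ -type f | wc -l         (rough file count; can take a while over NFS)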

Even though you see your home folder on all workstations/nodes in the lab, they are actually accessing it remotely over the network (hence, use /scratch, which is local to each node, for serious jobs).


where to run simulations & analysis

The home directory on uranus is accessible on all nodes, but when you perform calculations on the nodes/workstations inside your home directory, the files are actually being accessed on uranus over the network. It is convenient to run relatively short simulations, store the output files, and perform quick analyses directly in your home directory. But because all lab members share the home directories on uranus, the system can be stressed if several demanding simulations or trajectory analyses run at the same time on files in home directories on uranus.

So for demanding calculations, please use /work (mounted from sun.bme.utexas.edu) and /work2 (from uranus.bme.utexas.edu) as working directories. sun and uranus are also shared among all nodes, like the home directory, but this way the burden is spread across three servers. If you have an active project and expect to produce a substantial number of files, let me know and I will create folders for you on /work and /work2. Keep uranus (your home directory) clean and efficient for everyone, including yourself.

For QM and other highly disk-intensive tasks, we also use the local "/scratch" disk. This space is large (hundreds of GB) but it is local, meaning it is only accessible on that specific node (after you ssh into it).
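A typical pattern looks something like the sketch below (the node name, directory names, and input files are placeholders; pick whichever node you signed up for):

ssh node120
mkdir -p /scratch/$USER/qm_job1
cp ~/projects/qm_job1/* /scratch/$USER/qm_job1/
cd /scratch/$USER/qm_job1
(run the calculation here, then copy the results you need back to your home or /work directory)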

When submitting a command, use the following. This runs it in the background and redirects output that would otherwise print to the screen into a file named "nohup.out", which prevents the job from terminating when you log out or close the terminal.

"nohup your_command &"

Use "less +F file_name" to keep a constantly updating view of the end of a file on screen.

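For example (run_md.sh and md.log are just placeholder names), launching a script and then following its output would look like:

nohup ./run_md.sh > md.log 2>&1 &
less +F md.log          (press Ctrl-C to stop following, then q to quit)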

Lab cluster usage monitoring and signup

The node activities can be monitored here:

https://biomolmd.org/pren/nodes.html
https://biomolmd.org/liuchw/jobs.txt

Complete CPU/GPU usage & sign up spreadsheet (email pren to request permission to edit): https://docs.google.com/spreadsheets/d/1EOlUwFpdNU2uBZ5XrHSvZnRYCUCOw5tCSkFisTm3big/edit?usp=sharing


Utilities

We don’t have a job queuing system (yet) so you just log into a node to run your job.

  • Use "top" to check the load on a node; 600% means 6 cores are fully loaded.
  • "less /proc/cpuinfo" to check how many CPU cores/threads a node has.
  • "free -g" to check free memory (esp. for QM jobs). The line "-/+ buffers/cache:" has the real number for free space.
  • "echo > large_file_name" to empty a large file quickly; rm can be slow.
  • "df -h" to check available storage (disk space).


More tips about Linux commands and utilities:

https://docs.google.com/document/d/1cnmSItdRXDBcpVBGhwDahJVJ2jeh4l2oDVpMBLtlySE/edit?usp=sharing 

Backups

All home directories and shared areas (/home, /users, /opt, /work, /work2) are backed up twice a week and kept for about two weeks on bigdata.bme.utexas.edu. If you need to recover files from the last couple of weeks, you should be able to find them there: log into bigdata and cd /bigdata/renlab/.
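For example (the directory layout under /bigdata/renlab/ and the snapshot names are assumptions; browse around to find your files), recovering a file might look like:

ssh bigdata
cd /bigdata/renlab/
ls                                   (find the snapshot folder that contains your file)
cp <path_to_backed_up_file> ~/recovered_copy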


Cluster structure

uranus.bme.utexas.edu is the header node for all computing nodes (nodexxx), which can only be accessed from uranus. The /home, /users, and /opt folders are all part of the uranus file system. The computing nodes have private IP addresses, so they cannot be seen from outside, including from your workstation or laptop. Each node has a large local "/scratch" folder that is writable by everyone. sun and uranus are another two large servers, mounted as /work and /work2 on each workstation and computing node; you can use them just like the home directory. bigdata is the backup server.

Read the pdf below for an illustration (replace NOVA with uranus)

Current login nodes are bme-jupiter and bme-sugar, and the UT VPN is required from off campus.

[cluster structure illustration]