Linux Essentials

Please email questions and fix requests to <aleksandr DOT levchuk AT ucr DOT edu>

1. Introduction

Why GNU/Linux?

How to get access?

  • Install a flavor of GNU/Linux on your local machine (not required!!!)
  • Get an account on biocluster.ucr.edu server
    • Email <tgirke AT citrus DOT ucr DOT edu>

Unix variants

GNU/Linux distributions



2. Basics

Syntax for this manual

  • Remember the UNIX/LINUX command line is case sensitive!
  • The text in green or red represents the actual command. The commands in red emphasize essential information for beginners.

  • The notation <...> refers to variables and file names that need to be specified by the user.

    • The arrows < and > need to be excluded.

Logging-In

from Mac or LINUX

  • Open terminal and type:
    • ssh -X your_username@biocluster.ucr.edu

from Windows

  1. Open PuTTY and select SSH. Download PuTTY if you do not have it.

  2. Provide host name and session name
    • hostname:

      biocluster.ucr.edu

  3. Enter your identity information
    • username:

      your username

      password:

      your password

  4. Setup for graphics emulation. Download and install Xming if you do not have it.

  5. Use WinSCP for file exchange. Download and install WinSCP if you do not have it.

Changing password

  • passwd

         
    passwd            # this will first ask you to enter your current password
    

Orientation

  • ls

  • pwd

  • stat

  • whoami

  • hostname

         
    pwd               # present working directory
    ls                # content of pwd
    ls -l             # like ls, but provides additional info on files and directories
    ls -a             # includes hidden files (.name) as well
    ls -R             # lists subdirectories recursively
    ls -t             # lists files sorted by modification time, newest first
    stat <file-name>  # provides all attributes of a file
    whoami            # shows which user you are logged in as
    hostname          # shows which machine you are on
    

Files and directories

  • cd

  • cp

  • mv

  • mkdir

  • rm

  • rmdir

         
    mkdir <dir_name>   # creates specified directory
    cd <dir_name>      # switches into specified directory
    cd ..              # moves one directory up
    cd ../../          # moves two directories up (and so on)
    cd                 # brings you to highest level of your home directory
    rmdir <dir_name>   # removes empty directory
    rm <file_name>     # removes file
    rm -r <dir_name>   # removes directory including its content; asks for confirmation only if rm is aliased to 'rm -i', the '-f' argument turns confirmation off
    mv <name1> <name2> # renames directories or files
    mv <name> <path>   # moves file/directory as specified in path
    cp <name> <path>   # copies file/directory as specified in path (-r to include content in directories)
    

Copy and paste

The methods differ depending on where you are

If you are on the command line

  • Cut last word with keyboard only
    Ctrl+w
    Paste with keyboard only
    Ctrl+y

If you are in a non-command-line Free Desktop environment

  • Copy
    Ctrl+c
    Paste
    Ctrl+v
  • Copy with mouse only
    Simply select the text with the mouse
    Paste with mouse only
    Click the middle mouse button or both left/right buttons simultaneously

Handy shortcuts

  • history

         
    history                        # shows all commands you have used recently
    # .                              refers to the present working directory
    # ~/                             refers to user's home directory
    # up(down)_key                   scrolls through command history
    # <something-incomplete> TAB     completes path/file_name
    # Ctrl+a                         cursor to beginning of command line
    # Ctrl+e                         cursor to end of command line
    # Ctrl+d                         delete character under cursor
    # Ctrl+k                         delete line from cursor, content goes into kill buffer
    # Ctrl+y                         paste content from Ctrl+k
    


3. Unix Help

  • man

  • info

  • apropos

         
    man         # general help
    man wc      # manual on program 'word count' wc
    wc --help   # short help on wc
    info wc     # more detailed information system (GNU)
    apropos wc  # retrieves pages where wc appears
    


4. Finding Things

Finding files, directories and applications

  • find

         
    find -name "*pattern*"            # searches for *pattern* in and below current directory
    find /usr/local -name "*blast*"   # finds file names *blast* in specified directory
    find /usr/local -iname "*blast*"  # same as above, but case insensitive
    
    • additional useful arguments: -user <user name>, -group <group name>, -ctime <number of days ago changed>

  • locate

  • grep

  • which

  • whereis

  • dpkg

         
    find ~ -type f -mtime -2   # finds all files you have modified in the last two days
    locate <pattern>           # finds files and dirs that are listed in the locate database (refreshed by updatedb)
    which <application_name>   # location of application
    whereis <application_name> # searches for executables in set of directories
    dpkg -l | grep mypattern   # find Debian packages and refine search with grep pattern
    

Finding things in files

  • grep

  • wc

  • xargs

         
    grep pattern file           # provides lines in 'file' where pattern 'appears', if pattern is shell function use single-quotes: '>'
    grep -H pattern file        # -H prints the file name in front of each matching line
    grep 'pattern' file | wc    # pipes lines with pattern into word count wc (see chapter 8); wc arguments: -c: show only bytes, -w: show only words, -l: show only lines; help on regular expressions: $ man 7 regex or man perlre
    find /home/my_dir -name '*.txt' | xargs grep -c '^.*'  # counts lines in many files and records each count along with the individual file name; find and xargs are used to circumvent the argument-list length limit of the shell so that this works on thousands of files.
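
    The find/xargs line count above can be tried safely on a scratch directory. The following is a minimal sketch, assuming GNU or BSD versions of find, xargs and grep (the -print0 and -0 options are extensions, not POSIX); the directory and file names are made up for the demonstration.

```shell
# Scratch directory created by mktemp; nothing outside it is touched.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/one.txt"
printf 'x\ny\n'    > "$workdir/two.txt"

# -print0 and -0 pass file names NUL-separated, so names with spaces survive.
# grep -c '' counts every line (the empty pattern matches all lines);
# -H prints the file name in front of each count.
counts=$(find "$workdir" -name '*.txt' -print0 | xargs -0 grep -cH '' | sort)
echo "$counts"

rm -r "$workdir"
```

    Each output line has the form <file-name>:<count>, so the result can be passed on to sort or cut.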
    


5. Permissions and Ownership

How does it work

  • ls

    ls -l
    • Shows something like this for each file/dir

      drwxrwxrwx

      • Here is what the symbols mean

        d

        directory

        r

        read

        w

        write

        x

        execute

      • Here is what the positions of the symbols mean

        first triplet

        user permissions (u)

        second triplet

        group permissions (g)

        third triplet

        world permissions (o)

To assign read and execute permissions to user and group

  • chmod

    chmod ug+rx my_file

To remove all permissions from all three user groups

  • chmod ugo-rwx my_file

    +

    causes the permissions selected to be added

    -

    causes them to be removed

    =

    causes them to be the only permissions that the file has.

Example for number system

  •      
    chmod +rx public_html/
    # or
    chmod 755 public_html/
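
    In the number system each digit is the sum of read = 4, write = 2 and execute = 1, given in the order user, group, world. The sketch below applies this to a scratch file created with mktemp (an assumption made for the demonstration) and reads the result back with ls -l.

```shell
# Scratch file for the demonstration.
demo=$(mktemp)

chmod 754 "$demo"            # user: 4+2+1=7 (rwx), group: 4+1=5 (r-x), world: 4 (r--)
perms=$(ls -l "$demo" | cut -c1-10)
echo "$perms"                # -rwxr-xr--

chmod u=rw,go= "$demo"       # symbolic form of 600: user rw, nothing for group and world
perms2=$(ls -l "$demo" | cut -c1-10)
echo "$perms2"               # -rw-------

rm -f "$demo"
```

    Note that a numeric mode such as 755 sets all nine permission bits at once, whereas a symbolic mode with + only adds to the permissions that are already there.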
    

Change ownership

  • chown

  • chgrp

         
    chown <user> <file-or-directory>          # changes user ownership
    chgrp <group> <file-or-directory>         # changes group ownership
    chown <user>:<group> <file-or-directory>  # changes user & group ownership
    


6. Useful Unix Commands

  • wget

  • df

  • free

  • uname

  • bc

  • ifconfig

  • du

  • ln

         
    df          # disk space
    free -g     # memory info in gigabytes (use -m for megabytes)
    uname -a    # shows tech info about machine
    bc          # command-line calculator (to exit type 'quit')
    wget ftp://ftp.ncbi.nih.... # file download from web
    /sbin/ifconfig # give IP and other network info
    ln -s original_filename new_filename # creates symbolic link to file or directory
    du -sh      # displays disk space usage of current directory
    du -sh *    # displays disk space usage of individual files/directories
    du -s * | sort -nr # shows disk space used by different directories/files sorted by size
    


7. Process Management

  • top

  • who

  • w

  • ps

  • fg

  • bg

  • kill

  • renice

         
    top               # view top consumers of memory and CPU (press 1 to see per-CPU statistics)
    who               # Shows who is logged into system
    w                 # Shows which users are logged into system and what they are doing
    ps                # Shows processes running by user
    ps -e             # Shows all processes on system; try also '-a' and '-x' arguments
    ps aux | grep <user_name> # Shows all processes of one user
    ps ax --forest    # Shows the child-parent hierarchy of all processes
    ps -o %t -p <pid> # Shows how long a particular process was running. (E.g. 6-04:30:50 means 6 days 4 hours ...)
    # Ctrl z <enter>  # Suspend (put to sleep) a process
    fg                # Resume (wake up) a suspended process and brings it into foreground
    bg                # Resume (wake up) a suspended process but keeps it running in the background
    # Ctrl c          # Kills the process that is currently running in the foreground
    
    kill <process-ID>     # Kills a specific process
    kill -9 <process-ID>  # NOTICE: "kill -9" is a very violent approach - it does not give the process any time to perform cleanup procedures
    
    kill -l                      # List all of the signals that can be sent to a process
    kill -s SIGSTOP <process-ID> # Suspend (put to sleep) a specific process
    kill -s SIGCONT <process-ID> # Resume (wake up) a specific process
    
    renice -n <priority_value> -p <process-ID> # Changes the nice value of a running process; values range from -20 to 19 (regular users can usually only use 0 to 19), the higher the value the lower the priority, default is 0
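
    The suspend, resume and kill signals above can be tried on a harmless background job. This is a sketch, assuming a procps-style ps that understands '-o stat=' and '-p'; the sleep command simply stands in for a real long-running program.

```shell
sleep 60 &                       # start a long-running job in the background
pid=$!                           # $! holds the process ID of the last background job

kill -s STOP "$pid"              # suspend (put to sleep) the process
sleep 1
state_stopped=$(ps -o stat= -p "$pid")
echo "after STOP: $state_stopped"     # state code starts with T (stopped)

kill -s CONT "$pid"              # resume (wake up) the process
sleep 1
state_running=$(ps -o stat= -p "$pid")
echo "after CONT: $state_running"     # state code starts with S (sleeping)

kill "$pid"                      # regular termination (SIGTERM), allows cleanup
wait "$pid" 2>/dev/null || true  # collect the exit status of the finished job
```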
    


8. Text Viewing

  • less

  • more

  • cat

         
    more <my_file>  # views text, use space bar to browse, hit 'q' to exit
    less <my_file>  # a more versatile text viewer than 'more', 'q' exits, 'G' moves to end of text, 'g' to beginning, '/' find forward, '?' find backwards
    cat  <my_file>  # concatenates files and prints content to standard output
    


9. Text Editors

Vi and Vim

Non-graphical (terminal-based) editor. Vi is available on practically every Unix-like system. Vim is the improved version of vi.

Emacs

Non-graphical or window-based editor. You still need to know keystroke commands to use it. It is often not installed by default on modern Linux distributions.

Pico

Simple terminal-based editor available on most versions of Unix. Uses keystroke commands, but they are listed in logical fashion at bottom of screen.

Nano

A simple terminal-based editor which is default on modern Debian systems

Vim Manual

Basics

  • vim

         
    vim <my_file_name>  # open/create file with vim
    
  • i

    Insert Mode

    escape key

    Hit the Escape key for Normal (non-editing) mode

    :

    Commands start with ':'

    :w

    Save command; if you are in editing mode you have to hit ESC first!!!

    :q

    Quit file; refuses to quit if there are unsaved changes

    :q!

    Exits WITHOUT saving any changes you have made

    :wq

    Save and quit

    R

    Replace MODE

    r

    Replace only one character under cursor

    q:

    History of commands, to re-execute one of them, select and hit enter!

    :w <new_filename>

    Saves into new file

    :123

    Go to specified line number. For example line number 123.

Help

    • vimtutor

           
      vimtutor                 # open vim tutorial from shell
      

Moving Around in a File

  • $

    moves cursor to end of line

    A

    same as $, but also switches to insert mode

    0

    moves cursor to beginning of line

    Ctrl-g

    shows at status line filename and the line you are on

    Shift-G

    brings you to bottom of file

    <number> Shift-G

    brings you to specified line number (type the number first)

Wrapping long Lines

  • By default vi wraps long lines - this makes some files unreadable for a human eye
  • :set nowrap

    • :set nowrap

      turns off line wrapping, letting the text run past the right side of the screen

      :set wrap

      turns it on

Line Numbers

  • :set number

    • :set number

      shows line numbers

      :set nonumber

      hides line numbers

Working with Many Files & Splitting Windows

  •      
    vim *.txt         # opens many files at once
    
  • :n

    switches between files

    :wall or :qall

    write or quit all open files


         
    vim -o *.txt      # opens many files at once and displays them with horizontal split, '-O' does vertical split
    

    :args *.txt

    places all the relevant files in the argument list

    :all

    splits all files in the argument list (buffer) horizontally

    CTRL-ww

    switch between windows

    :split

    shows same file in two windows

    :split <file-to-open>

    opens second file in new window

    :vsplit

    splits windows vertically, very useful for tables

    :set scrollbind

    lets you scroll all open windows simultaneously

    :close

    closes current window

    :only

    closes all windows except current one

Enabling Syntax Highlighting

  • :syntax on

    • :syntax on

      turns on color syntax highlighting for various programming languages and data formats

      :syntax off

      turns it off

Spell Checking & Dictionary

  • ispell

    • ispell -l <some-file>     # List misspelled words
  • dict

  • wn

    • :! dict <word>

      meaning of word

      :! wn 'word' -over

      synonyms of word

Printing

  • :ha

    prints entire file

    :<FROM>,<TO>ha

    prints specified lines numbers: <FROM>,<TO>

Merging/Inserting Files

  • :r <filename>

    inserts content of specified file after cursor

Undo / Redo

  • u

    undo last command

    U

    undo all changes on current line

    CTRL-R

    redo one change which was undone

Delete / Cut

  • x

    deletes what is under cursor

    dw

    deletes from cursor to end of word including the space

    de

    deletes from cursor to end of word NOT including the space

    cw

    deletes rest of word and lets you then insert, hit ESC to continue with NORMAL mode

    c$

    deletes rest of line and lets you then insert, hit ESC to continue with NORMAL mode

    d$

    deletes from cursor to the end of the line

    dd

    deletes entire line

    2dd

    deletes next two lines, continues: 3dd, 4dd and so on

Put (Paste)

  • p

    uses what was deleted/cut and pastes it behind cursor

    P

    pastes clipboard in front of the cursor

Copy & Paste

  • yy

    copies line, for copying several lines do 2yy, 3yy and so on

    p

    pastes clipboard behind cursor

Search in a File

  • {i} Most regular expressions work

    /my_pattern

    searches for my_pattern downwards, type n for next match

    ?my_pattern

    searches for my_pattern upwards, type n for next match

    :set ic

    switches to ignore case search (case insensitive)

    :set hls

    switches to highlight search (highlights search hits)

Replacing Text (using Regular Expressions)

  • {i} Great intro: A Tao of Regular Expressions

    :s/old_pat/new_pat/

    replaces first occurrence in a line

    :s/old_pat/new_pat/g

    replaces all occurrences in a line

    :s/old_pat/new_pat/gc

    add 'c' to ask for confirmation

    :#,#s/old_pat/new_pat/g

    replaces all occurrences between line numbers: #,#

    :%s/old_pat/new_pat/g

    replaces all occurrences in file

    :%s/\(pattern1\)\(pattern2\)/\1test\2/g

    regular expression to insert; here you need '\' in front of the parentheses (unlike Perl)

    :%s/\(pattern.*\)/\1 my_tag/g

    appends something to line containing pattern (unlike Perl, '+' must be escaped in VIM: .+ from Perl is .\+ in VIM, while .* is the same in both)

    :%s/\(pattern\)\(.*\)/\1/g

    removes everything in lines after pattern

    :%s/\(At\dg\d\d\d\d\d\.\d\)\(.*\)/\1\t\2/g

    inserts tabs between At1g12345.1 and Description

    :%s/\n/new_pattern/g

    Replaces return signs

    :%s/pattern/\r/g

    Replace pattern with return signs!!

    :%s/\(\n\)/\1\1/g

    Insert additional return signs

    :%s/\(^At\dg\d\d\d\d\d.\d\t.\{-}\t.\{-}\t.\{-}\t.\{-}\t\).\{-}\t/\1/g

    replaces content between 5th and 6th tab (5th column), '{-}' turns off 'greedy' behavior

    :#,#s/\( \{-} \|\.\|\n\)/\1/g

    performs simple word count in specified range of text

    :%s/\(E\{6,\}\)/<font color="green">\1<\/font>/g

    highlight pattern in html colors, here highlighting of >= 6 occurrences of Es

    :%s/\([A-Z]\)/\l\1/g

    change uppercase to lowercase, '%s/\([A-Z]\)/\u\1/g' does the opposite

    :g/my_pattern/ s/\([A-Z]\)/\l\1/g | copy $

    Uses 'global' command to apply replace function only on those lines that match a certain pattern.

    The 'copy $' command after the pipe '|' prints all matching lines at the end of the file.

    :args *.txt | all | argdo %s/old_pat/new_pat/ge | update

    Command 'args' places all relevant files in the argument list (buffer)

    'all' displays each file in separate split window

    command 'argdo' applies replacement to all files in argument list (buffer)

    flag 'e' is necessary to avoid stop at error messages for files with no matches

    command 'update' saves all changes to files that were updated

Matching-Parentheses Search

  • Place the cursor on (, [ or { and type %

    The cursor moves to the matching parenthesis

HTML Editing

  • :runtime! syntax/2html.vim

    Converts the text file to HTML format and opens the result

Executing Shell Commands in Vim

  • :!<SHELL_COMMAND>

    Executes any shell command, hit <enter> to return

    :sh

    Switches window to shell, 'exit' switches back to vim

Using Vim as Table Editor

  • v

    starts visual mode for selecting characters

    V

    starts visual mode for selecting lines

    CTRL-V

    starts visual mode for selecting blocks (use CTRL-q in gVim under Windows). This allows column-wise selections and operations like inserting and deleting columns. To restrict substitute commands to a column, one can select it and switch to the command-line by typing ':'. After this the substitution syntax for a selected block looks like this: '<,'>s///

    :set scrollbind

    Starts simultaneous scrolling of 'vsplitted' files

    :set scrollopt=hor

    To set to horizontal binding of files

    :AlignCtrl I=\t
    :%Align

    This allows aligning tables by column separators (here '\t') when the Align utility from Charles Campbell is installed

    {i} To sort table rows by selected lines or block, perform the visual select and then hit the F3 key. The rest is interactive. To enable this function one has to include the Vim sort script from Gerald Lai in the .vimrc file

Modify Vim Settings (in file .vimrc)

  • See last chapter of vimtutor (start from shell)

  • Useful .vimrc sample

  • When vim starts to respond very slowly then one may need to delete the .viminf* files in home directory

10. The Unix Shell

When you log into UNIX/LINUX the system starts a program called Shell. It provides you with a working environment and interface to the operating system. Usually there are several different shell programs installed.

{i} The shell program bash is very common.

  •      
    finger <user_name>   # shows which shell you are using
    cat /etc/shells | awk '/^\// {system("ls " $1)}' 2> /dev/null  # lists all shell programs available on your system
    <shell_name>         # switches to a different shell
    

STDIN, STDOUT, STDERR, Redirections, and Wildcards

  • See LINUX HOWTOs


    By default, UNIX commands read from standard input (STDIN) and send their output to standard out (STDOUT).


    You can redirect them by using the following commands:

         
    <beginning-of-filename>*         # * is wildcard to specify many files
    ls > file                        # prints ls output into specified file
    command < my_file                # uses file after '<' as STDIN
    command >> my_file               # appends output of one command to file
    command | tee my_file            # writes STDOUT to file and prints it to screen
    command > my_file; cat my_file   # writes STDOUT to file and prints it to screen
    command > /dev/null              # turns off progress info of applications by redirecting their output to /dev/null
    grep my_pattern my_file | wc     # Pipes (|) output of 'grep' into 'wc'
    grep my_pattern my_non_existing_file 2> my_stderr # prints STDERR to file (no space between '2' and '>')
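
    The difference between STDOUT and STDERR is easiest to see with a command that writes to both. The following sketch uses made-up file names in a scratch directory; grep finds the pattern in one file and complains about a second file that does not exist.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/my_file"

# Matches go to my_stdout, the error message about the missing file to my_stderr.
# '|| true' only hides grep's non-zero exit status caused by the missing file.
grep alpha "$workdir/my_file" "$workdir/no_such_file" \
    > "$workdir/my_stdout" 2> "$workdir/my_stderr" || true

stdout_text=$(cat "$workdir/my_stdout")
stderr_text=$(cat "$workdir/my_stderr")
echo "STDOUT: $stdout_text"
echo "STDERR: $stderr_text"

# 2>&1 sends STDERR to wherever STDOUT points, so the pipe receives both streams.
both_lines=$(grep alpha "$workdir/my_file" "$workdir/no_such_file" 2>&1 | wc -l)
echo "lines on both streams: $both_lines"

rm -r "$workdir"
```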
    

Useful shell commands

  •      
    cat <file1> <file2> > <cat.out>      # concatenate files in output file 'cat.out'
    paste <file1> <file2> > <paste.out>  # merges lines of files and separates them by tabs (useful for tables)
    cmp <file1> <file2>                  # tells you whether two files are identical
    diff <fileA> <fileB>                 # finds differences between two files
    head -<number> <file>                # prints first lines of a file
    tail -<number> <file>                # prints last lines of a file
    split -l <number> <file>             # splits lines of file into many smaller ones
    csplit -f out fasta_batch "%^>%" "/^>/" "{*}" # splits fasta batch file into many files at '>'
    sort <file>                          # sorts single file, many files and can merge (-m) them, -b ignores leading white space, ...
    sort -k 2,2 -k 3,3n input_file > output_file # sorts in table column 2 alphabetically and column 3 numerically, '-k' for column, '-n' for numeric
    sort input_file | uniq > output_file # uniq command removes duplicates and creates file/table with unique lines/fields
    join -1 1 -2 1 <table1> <table2>     # joins two tables based on specified column numbers (-1 file1, 1: col1; -2: file2, col2). It assumes that join fields are sorted. If that is not the case, use the next command:
    sort table1 > table1a; sort table2 > table2a; join -a 1 -t "`echo -e '\t'`" table1a table2a > table3 # '-a <table>' prints all lines of specified table! Default prints only all lines the two tables have in common. '-t "`echo -e '\t'`" ->' forces join to use tabs as field separator in its output. Default is space(s)!!!
    cat my_table | cut -d , -f1-3        # cut command prints only specified sections of a table, -d specifies here comma as column separator (tab is default), -f specifies column numbers.
    grep                                 # see chapter 4
    egrep                                # see chapter 4
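
    The sort and join steps above can be followed on two tiny tables. This is a sketch with invented table content; it stores a tab character in a variable instead of using the echo -e construct, which is an alternative way to hand a tab to sort and join.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')               # a literal tab character

# Two small tab-separated tables keyed on column 1 (made-up IDs and values).
printf 'g2\tkinase\ng1\ttransporter\ng3\tunknown\n' > "$workdir/table1"
printf 'g3\t7.5\ng1\t2.0\n'                          > "$workdir/table2"

# join expects both inputs to be sorted on the join field.
sort -t "$tab" -k 1,1 "$workdir/table1" > "$workdir/table1a"
sort -t "$tab" -k 1,1 "$workdir/table2" > "$workdir/table2a"

# -a 1 also prints lines of table1a without a partner; -t sets the field separator.
joined=$(join -a 1 -t "$tab" "$workdir/table1a" "$workdir/table2a")
echo "$joined"

rm -r "$workdir"
```

    Row g2 has no partner in the second table, so it appears with only its own two columns.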
    

11. Screen

Starting a New Screen Session

  • screen

  •      
    screen                 # Start a new session
    screen -S <some-name>  # Start a new session and gives it a name
    

    Ctrl-a d

    Detach from the screen session

    Ctrl-a c

    Create a new window inside the screen session

    Ctrl-a Space

    Switch to the next window

    Ctrl-a Ctrl-a

    Switch to the window that you were previously on

    Ctrl-a "

    List all open windows. Double-quotes " are typed with the Shift key

    Ctrl-d or type exit

    Exit out of the current window. Exiting from the last window will end the screen session

    Ctrl-a [

    Enters the scrolling mode.

    Use Page Up and Page Down keys to scroll through the window.

    Hit the Enter key twice to return to normal mode.

Attaching to Screen Sessions

  • From any computer, you can attach to a screen session after SSHing into the server (e.g. Biocluster).
         
    screen -r              # Attaches to an existing session, if there is only one
    screen -r              # Lists available sessions and their names, if there is more than one session running
    screen -r <some-name>  # Attaches to a specific session
    screen -r <first-few-letters-of-name> # Type just the first few letters of the name and you will be attached to the session you need
    

Destroying Screen Sessions

  1. Terminate all programs that are running in the screen session. The standard way to do that is:
    • Ctrl-c
  2. Exit out of your shell. Type:
    • exit
  3. Repeat steps 1 and 2 until you see the sign:
    • [screen is terminating]

      {i} There may be programs running in different windows of the same screen session. That's why you may need to terminate programs and exit shells multiple times

Tabs and a Reasonably Large History Buffer

  • For a better experience with screen, run
    • cp ~/.screenrc ~/.screenrc.backup 2> /dev/null
      echo 'startup_message off
      defscrollback 10240
      caption always "%{=b dy}{ %{= dm}%H %{=b dy}}%={ %?%{= dc}%-Lw%?%{+b dy}(%{-b r}%n:%t%{+b dy})%?(%u)%?%{-dc}%?%{= dc}%+Lw%? %{=b dy}}"
      ' > ~/.screenrc

12. Simple Shell One-Liner Scripts

Useful One-Liners (script download)

Renames files by replacing name.old with name.new in their names - To test things first, replace 'do mv' with 'do echo mv'
  • for i in *.input; do mv "$i" "${i/name.old/name.new}"; done
Runs application in loops on many input files
  • for i in *.input; do ./application $i; done
Runs fastacmd in loops on many *.input files and creates *.out files
  • for i in *.input; do fastacmd -d /data/../database_name -i $i > $i.out; done
Runs SAM's target99 on many input files
  • for i in *.pep; do target99 -db /usr/../database_name -seed $i -out $i; done
Searches in > 10,000 files for pattern and prints occurrences together with file names
  • for j in 0 1 2 3 4 5 6 7 8 9; do grep -iH <my_pattern> *$j.seq; done
Example of how to run an interactive application (tmpred) that asks for file name input/output
  • for i in *.pep; do echo -e "$i\n\n17\n33\n\n\n" | ./tmpred $i > $i.out; done
Runs BLAST2 for all *.fasta1/*.fasta2 file pairs in the order specified by file names and writes results into one file
  • for i in *.fasta1; do blast2 -p blastp -i $i -j ${i/_*fasta1/_*fasta2} >> my_out_file; done 

    {i} This example uses two variables in a for loop. The content of the second variable gets specified in each loop by a replace function.


Runs BLAST2 in all-against-all mode and writes results into one file; '-F F' turns low-complexity filter off
  • for i in *.fasta; do for j in *.fasta; do blast2 -p blastp -F F -i $i -j $j >> my_out_file; done; done;
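
    The renaming loops in this chapter derive a new file name from the loop variable. The sketch below shows the same idea with the portable suffix-removal form ${f%.old}, which is swapped in here for the bash replace function, and with quoting so that file names containing spaces are handled. All file names are invented and live in a scratch directory.

```shell
workdir=$(mktemp -d)
touch "$workdir/sample1.old" "$workdir/sample 2.old"   # second name contains a space

for f in "$workdir"/*.old; do
    new="${f%.old}.new"      # strip the .old suffix, then append .new
    mv -- "$f" "$new"        # the quotes keep names with spaces in one piece
done

renamed=$(ls "$workdir")
echo "$renamed"

rm -r "$workdir"
```

    As with the one-liners above, putting echo in front of mv gives a dry run that only prints what would be renamed.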

How to write a script

  • create file which contains in first line:
    #!/bin/bash
  • place shell commands in file
  • run chmod +x my_shell_script to make it executable

  • run shell script like this: ./my_shell_script

  • when you place it into /usr/local/bin you only type its name from any user account
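
    The steps above can be sketched end to end as follows. The script name count_fasta and the demo file are hypothetical, and the example assumes that /bin/bash exists and that the temporary directory allows execution.

```shell
workdir=$(mktemp -d)

# Steps 1 and 2: create the file with the interpreter line and the commands.
cat > "$workdir/count_fasta" <<'EOF'
#!/bin/bash
# Prints each file name and its number of FASTA header lines (lines starting with '>').
for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
done
EOF

# Step 3: make the script executable.
chmod +x "$workdir/count_fasta"

# Step 4: run it by giving its location.
printf '>seq1\nMKT\n>seq2\nMLV\n' > "$workdir/demo.faa"
result=$("$workdir/count_fasta" "$workdir/demo.faa")
echo "$result"

rm -r "$workdir"
```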

13. Simple Perl One-Liner Scripts

Useful One-Liners

Replace something
  •      
    perl -p -i -w -e 's/pattern1/pattern2/g' my_input_file  # replace something (e.g. return signs) in file using regular expressions
    

    {i} use $1 to back-reference to pattern placed in parentheses

    {i} '-p' makes perl loop over the input lines and print each line after the code has run

    {i} '-i.bak' creates backup file *.bak, only -i doesn't

    {i} '-w' turns on warnings

    {i} '-e' executable code follows


Parse lines that contain pattern1 and pattern2
  • perl -ne 'print if (/my_pattern1/ ? ($c=1) : (--$c > 0)) ; print if (/my_pattern2/ ? ($d = 1) : (--$d > 0))' my_input_file > my_output_file 

    {i} following lines after pattern can be specified in '$c=1' and '$d=1'

    {i} for the OR function use this syntax: '/(pattern1|pattern2)/'

14. Remote Copy: wget, scp, ncftp

WGET (file download from the www)

  •      
    wget http://www.ncbi.nlm.nih.gov/index.html # file download from www; add option '-r' to download entire directories
    

SCP (secure copy between machines)

General syntax

  •      
    scp <source> <destination> # Use form 'userid@machine_name' if your local and remote user ids are different. If they are the same you can use only 'machine_name'.
    

Examples

  • Copy file from Server to Local Machine (type from local machine prompt):
    scp user@remote_host:file.name . # '.' copies to pwd, you can specify here any directory, use wildcards to copy many files at once.
    Copy file from Local Machine to Server:
    scp file.name user@remote_host:~/dir/newfile.name
    Copy entire directory from Server to Local Machine (type from local machine prompt):
    scp -r user@remote_host:directory/ ~/dir
    Copy entire directory from Local Machine to Server (type from local machine prompt):
    scp -r directory/ user@remote_host:directory/
    Copy between two remote hosts (e.g. from bioinfo to cache): same as above, but run the command while logged in to one of the remote hosts:
    scp -r directory/ user@remote_host:directory/


NICE FTP

  •      
    ncftp
    ncftp> open ftp.ncbi.nih.gov
    ncftp> cd /blast/executables
    ncftp> get blast.linux.tar.Z (skip extension: @)
    ncftp> bye
    

15. Archiving and Compressing

Compressing

  •      
    tar -cvf my_file.tar mydir/    # Builds tar archive of files or directories. For directories, execute command in parent directory. Don't use absolute path.     
    tar -czvf my_file.tgz mydir/   # Builds tar archive with compression of files or directories. For directories, execute command in parent directory. Don't use absolute path.
    

Viewing Archives

  •      
    tar -tvf my_file.tar
    tar -tzvf my_file.tgz
    

Extracting

  •      
    tar -xvf my_file.tar
    tar -xzvf my_file.tgz
    gunzip my_file.tar.gz # or unzip my_file.zip, uncompress my_file.Z, or bunzip2 for file.tar.bz2
    find -name '*.zip' | xargs -n 1 unzip # this command usually works for unzipping many files that were compressed under Windows
    
    try also:
         
    tar zxf blast.linux.tar.Z
    tar xvzf file.tgz
    
    options:
    • f

      use archive file

      p

      preserve permissions

      v

      list files processed

      x

      extract files from an archive

      z

      filter the archive through gzip
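
    A complete round trip of creating, viewing and extracting an archive can be sketched like this. The directory and file names are made up, and the -C option used for extracting into another directory is not covered above; it is supported by GNU and BSD tar.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1          # work with relative paths inside the parent directory

mkdir mydir
echo 'hello' > mydir/notes.txt

tar -czf my_file.tgz mydir/      # c = create, z = gzip, f = archive file name
listing=$(tar -tzf my_file.tgz)  # t = list the content without extracting
echo "$listing"

mkdir restore
tar -xzf my_file.tgz -C restore  # x = extract; -C switches into 'restore' first
restored=$(cat restore/mydir/notes.txt)
echo "$restored"                 # hello

cd / && rm -r "$workdir"
```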

16. Simple Installs

Systems-wide installations

  • Installations for systems-wide usage are the responsibility of the system administrator. To find out if an application is installed, type:
         
    which   my_application
    whereis my_application_name # searches for executables in set of directories, doesn't depend on your path
    

    {i} Most applications are installed in /usr/local/bin or /usr/bin

    {i} You need root permissions to write to these directories

    {i} Perl scripts go into /usr/local/bin, Perl modules (*.pm) into /usr/local/share/perl/5.8.8/


    To copy executables in one batch, use command:

    cp `find -perm -111 -type f` /usr/local/bin

Applications in user accounts

  • Create a new directory, download the application into this directory, unpack it (see chapter 15) and follow the package-specific installation instructions. Usually you can then already run this application when you specify its location, e.g.: /home/user/my_app/blastall. If you want, you can add this directory to your PATH by typing from this directory:
    PATH=.:$PATH; export PATH # this allows you to run application by providing only its name; when you do echo $PATH you will see .: added to PATH.
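
    A variant of the command above puts the absolute directory name on the PATH instead of '.', so the application is found no matter which directory you are in. The sketch uses a scratch directory and a hypothetical program called hello_app to show the effect.

```shell
appdir=$(mktemp -d)              # stands in for a directory such as /home/user/my_app

# hello_app is a made-up program used only for this demonstration.
printf '#!/bin/sh\necho "hello from my_app"\n' > "$appdir/hello_app"
chmod +x "$appdir/hello_app"

PATH="$appdir:$PATH"; export PATH   # prepend the absolute directory name

found=$(command -v hello_app)       # shows where the shell finds the program
output=$(hello_app)                 # the program now runs by name alone
echo "$found"
echo "$output"

rm -r "$appdir"
```

    The change lasts only for the current shell session; to make it permanent the same line can be added to the file .bash_profile.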

Installation of RPMs

  • (!) RPMs are installable software packages used in the following Linux distributions: Red Hat, CentOS, Fedora, SUSE, and others.

    rpm -i application_name.rpm
    To check which version of RPM package is installed, type:
    rpm --query <package_name>

    Help and upgrade files for RPMs can be found at http://rpmfind.net/.

Installation of Debian packages

  • (!) Debs are installable software packages used in the following Linux distributions: Debian, Ubuntu, and others.


    Check whether your application is available at: http://packages.debian.org/stable/, then you type (no need to download):

         
    apt-cache search phylip   # searches for application "phylip" from command line
    apt-cache show phylip     # provides description of program
    apt-get install phylip    # example for phylip install, manuals can be found in /usr/share/doc/phylip/, use zless or lynx to read documentation (don't unzip).
    apt-get update            # run regularly (e.g. once a month) to refresh the Debian package lists
    apt-get upgrade -u        # to upgrade after update from above
    dpkg -i <package_file.deb> # installs a package from a local package file (e.g. after download)
    aptitude                  # Debian package managing interface (Ctrl-t opens menus)
    aptitude search vim       # search for packages on system and in Debian repositories
    

17. Devices

Mount/unmount usb/floppy/cdrom

  •      
    mount /media/usb
    umount /media/usb
    mount /media/cdrom
    eject /media/cdrom
    mount /media/floppy
    

18. Environment Variables

  • xhost user@host                # adds X permissions for user on server.
    echo $DISPLAY                  # shows current display settings
    export DISPLAY=<local_IP>:0    # change environment variable
    unset DISPLAY                  # removes display variable (in csh/tcsh: unsetenv DISPLAY)
    env                            # prints all environment variables
    echo $PATH                     # list of directories that the shell will search when you type a command

{i} You can edit your default DISPLAY setting for your account by adding it to file .bash_profile
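
The difference between a plain shell variable and an exported environment variable can be sketched as follows. MY_SETTING is a made-up name used only for this illustration; the same steps apply to DISPLAY or PATH.

```shell
MY_SETTING=local_only                       # hypothetical name; plain shell variable, not exported
sh -c 'echo "child sees: [$MY_SETTING]"'    # a child process does not see it

export MY_SETTING=shared                    # exported, so it becomes part of the environment
sh -c 'echo "child sees: [$MY_SETTING]"'    # a child process now sees "shared"
env | grep '^MY_SETTING='                   # exported variables are listed by env

unset MY_SETTING                            # removes the variable again
```

Programs started from the shell, including X applications that read DISPLAY, only see variables that have been exported.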

19. Exercises

Exercise 1

  1. Download the proteome of Halobacterium sp. from ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Halobacterium_sp/AE004437.faa (use wget or web browser for download)

  2. How many predicted proteins are there?
    • grep '>' AE004437.faa | wc -l
  3. How many proteins contain the pattern "WxHxxH[1-2]"?
    • egrep 'W.H..H{1,2}' AE004437.faa | wc -l
  4. Use the find function (/) in less to fish out the proteins containing this pattern or more elegantly do it with awk:

    • awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' AE004437.faa | less
  5. Create a BLASTable database with formatdb

    • formatdb -i AE004437.faa -p T -o T

      {i} -p F for nucleotide

      {i} -p T for protein databases

  6. Generate list of sequence IDs for above pattern match result and retrieve its sequences with fastacmd from formatted database

    • fastacmd -d AE004437.faa -i my_IDs > my_sequences
  7. Generate several lists of sequence IDs from various pattern match results and retrieve their sequences in one step using the fastacmd in for loop

    • for i in *.my_ids; do fastacmd -d AE004437.faa -i "$i" > "$i.out"; done
  8. Run blastall with a few proteins against newly created database or against Halobacterium or UniProt database (on Biocluster: -d /srv/projects/db/uniprot/uniprot)

    • blastall -p blastp -i my_sequences -d AE004437.faa -o my_blastp.out -e 1e-6 -v 10 -b 10 &

      {i} The input file (my_sequences) contains the sequences that you retrieved with fastacmd

      {i} On one CPU core this will run for 2 or 3 minutes

      {i} Add -a 8 to run on 8 CPU cores (Requires a computer with 2 Quad-Core processors)

  9. Parse blastall output into an Excel spreadsheet:

    • a. Using the biocore parser

      • blastParse -c <hits> -i my_blastp.out -o my_blastp.out.xls
    • b. Using BioPerl parser

      • bioblastParse.pl my_blastp.out
  10. Run HMMPFAM search with above proteins against Pfam database

    • hmmpfam -E 0.1 --acc -A0 /srv/projects/db/PFAM/Pfam_ls my_sequences > my_output.pfam

      Parse result with BioPerl parser

      hmmSummary my_output.pfam > my_hmm.summary
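
Note on steps 3 and 6: grep and egrep work line by line, and FASTA sequences are usually wrapped over several lines, so a motif that spans a line break is not counted. The sketch below joins the lines of each record before matching and writes one ID per line to a file. The three-record sample file and its IDs are made up for illustration, and the sketch assumes that the first word of each header line is the sequence ID.

```shell
work=$(mktemp -d)
# Made-up three-record FASTA file; the motif in seq2 is split across two lines
cat > "$work/sample.faa" <<'EOF'
>seq1 hypothetical protein
MKTWAHGGHLLV
AAGT
>seq2 hypothetical protein
MSSLWQ
HPRHHKL
>seq3 hypothetical protein
MAAAGGTTLLK
EOF

# Number of records = number of header lines
grep -c '>' "$work/sample.faa"

# Join the lines of each record, then print the ID of every record whose
# sequence matches W.H..H (ID = first word of the header, without '>')
awk '
    /^>/ { if (id != "" && seq ~ /W.H..H/) print id
           id = substr($1, 2); seq = ""; next }
         { seq = seq $0 }
    END  { if (id != "" && seq ~ /W.H..H/) print id }
' "$work/sample.faa" > "$work/my_IDs"

cat "$work/my_IDs"
```

A line-based egrep on the same sample file finds only seq1, because the motif in seq2 is broken by a line end. A file like my_IDs can then serve as the ID list for fastacmd -i, provided the IDs agree with the ones indexed by formatdb.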

Exercise 2

  1. Split sample fasta batch file with csplit (use sequence file from exercise 1).

  2. Concatenate single fasta files from (1) to one batch file.

  3. Find common hit IDs:
    • a. BLAST two related sequences
    • b. Retrieve the result in table format
    • c. Use join to identify common hit IDs in the two tables
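
One possible way to approach the three steps is sketched below on made-up data. The batch file, the two hit tables and all IDs are invented for illustration; the hit tables are assumed to be tab-delimited with the hit ID in column 2. The '{*}' repeat count and the -z option are GNU csplit features.

```shell
work=$(mktemp -d)
# Made-up batch file with three short records
cat > "$work/batch.fasta" <<'EOF'
>id1
MKTAYIAK
>id2
MSSLWQHP
RHHKL
>id3
MAAAGGTT
EOF

# 1. Split at every header line; -z drops the empty piece before the first '>'
csplit -s -z -f "$work/seq_" "$work/batch.fasta" '/^>/' '{*}'
ls "$work"/seq_*

# 2. Concatenate the single-record files back into one batch file
cat "$work"/seq_* > "$work/rejoined.fasta"
cmp "$work/batch.fasta" "$work/rejoined.fasta" && echo "identical"

# 3. Common hit IDs: two made-up hit tables with the hit ID in column 2
printf 'queryA\thit3\nqueryA\thit1\nqueryA\thit7\n' > "$work/table1.txt"
printf 'queryB\thit7\nqueryB\thit2\nqueryB\thit3\n' > "$work/table2.txt"
cut -f2 "$work/table1.txt" | sort -u > "$work/hits1.txt"   # join needs sorted input
cut -f2 "$work/table2.txt" | sort -u > "$work/hits2.txt"
join "$work/hits1.txt" "$work/hits2.txt" > "$work/common.txt"
cat "$work/common.txt"
```

join compares the first field of both files by default, which is why the ID column is cut out and sorted first.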

Exercise 3

  1. Write a shell script that executes a range of BLAST searches at once.
    • One way would be to repeat the command as many times as there are input files:
      • blastall -p blastp -d /.../my_database -i /.../my_input1 -o my_out1 -e 1e-6 -v 10 -b 10 &
        blastall -p blastp -d /.../my_database -i /.../my_input2 -o my_out2 -e 1e-6 -v 10 -b 10 &
        blastall -p blastp -d /.../my_database -i /.../my_input3 -o my_out3 -e 1e-6 -v 10 -b 10 &

      (!) Can you write a script without the repetitions?
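
      One possible answer is a loop over the input files, sketched here as a shell function. The function name run_blast_batch and the BLAST_CMD variable are assumptions made for this sketch: BLAST_CMD defaults to blastall and only exists so that the loop can be tried out with a stand-in program. Output files are named after their input file rather than my_out1, my_out2, and so on.

```shell
# Sketch: run one BLAST search per input file without repeating the command.
# BLAST_CMD is a hypothetical override (defaults to blastall).
run_blast_batch() {
    db=$1; shift                       # first argument: database, rest: input files
    for input in "$@"; do
        ${BLAST_CMD:-blastall} -p blastp -d "$db" -i "$input" \
            -o "$input.blastp.out" -e 1e-6 -v 10 -b 10 &
    done
    wait                               # return only after all background searches have finished
}

# Example call (paths elided as in the exercise):
# run_blast_batch /.../my_database /.../my_input*
```

      Setting BLAST_CMD=echo prints the commands instead of running them, which is a quick way to check the loop before starting real searches.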

Exercise 4

  1. Create multiple alignment with clustalw (e.g. use sequences with 'W.H..HH' pattern)

    • clustalw my_sequences1

Exercise 5

  1. Reformat alignment into PHYLIP format using seqret from EMBOSS

    • seqret clustal::my_sequences1.aln phylip::my_sequences1.phylip
      cat my_sequences1.phylip

Exercise 6

  1. Create neighbor-joining tree with phylip from PHYLIP

    •      
      cp my_sequences1.phylip infile
      phylip protdist   # creates distance matrix
      mv outfile infile
      phylip neighbor   # use default settings
      cp outtree intree
      phylip retree     # displays tree and can use midpoint method for defining root of tree, my typical command sequence is: 'N' 'Y' 'M' 'W' 'R' 'R' 'X'
      cp outtree my_tree.dnd
      cat my_tree.dnd | ruby -e 'while l=gets; print l.chomp; end; puts' # Print all on one line - necessary for TreeBrowse
      

      View your tree in TreeBrowse or open it in TreeView
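
      If Ruby is not available, the tree file can also be put on one line with tr. This is a minimal sketch on a made-up Newick tree; the file name my_tree_oneline.dnd is an assumption for illustration.

```shell
work=$(mktemp -d)
# Made-up Newick tree wrapped over several lines
printf '(seqA:0.1,\n(seqB:0.2,seqC:0.3):0.05,\nseqD:0.4);\n' > "$work/my_tree.dnd"

# Delete all newlines, then add a single one at the end
tr -d '\n' < "$work/my_tree.dnd" > "$work/my_tree_oneline.dnd"
echo >> "$work/my_tree_oneline.dnd"

cat "$work/my_tree_oneline.dnd"
```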

References

  1. Savings in Open Source Confirmed. Softwaremag.com, June 2008.

  2. The Unix time sharing system. D. M. Ritchie and K. Thompson. Communications of the ACM, 17(7):365–375, July 1974.

  3. Unix, Linux multitasking. Librenix, April 2007.

  4. Remote access in Linux. Polishlinux.org, March 2007.

  5. The Unix time sharing system. D. M. Ritchie and K. Thompson. Communications of the ACM, 17(7):365–375, July 1974.

  6. List of open source software packages. Wikipedia. Link added in March 2009.

  7. The Free Software Definition. Free Software Foundation. Revision 1.10 was made in October 2001.

BioclusterWiki: Linux-Essentials (last edited 2009-06-30 10:44:46 by AleksandrLevchuk)