Linux Essentials
Please email questions and fix requests to <aleksandr DOT levchuk AT ucr DOT edu>
Contents
-
Linux Essentials
- Introduction
- Basics
- Unix Help
- Finding Things
- Permissions and Ownership
- Useful Unix Commands
- Process Management
- Text Viewing
- Text Editors
- The Unix Shell
- Screen
- Simple Shell One-Liner Scripts
- Simple Perl One-Liner Scripts
- Remote Copy: wget, scp, ncftp
- Archiving and Compressing
- Simple Installs
- Devices
- Environment Variables
- Exercises
- References
-
Linux Essentials
1. Introduction
Why GNU/Linux?
Software costs $0 (1)
Remote tasking ("real networking") (4)
Multiuser (5)
Easy access to programming languages, databases, open-source projects (6)
Software freedoms (7)
- Free to use for any purpose
- Free to study the source code
- Free to share
- Free to modify
- No dependence on vendors
- Better performance
- More up-to-date
- Many more reasons ...
How to get access?
- Install a flavor of GNU/Linux on your local machine (not required!!!)
- Get an account on biocluster.ucr.edu server
Email <tgirke AT citrus DOT ucr DOT edu>
Unix variants
GNU/Linux distributions
2. Basics
Syntax for this manual
- Remember the UNIX/LINUX command line is case sensitive!
The text in green or red represents the actual command. The commands in red emphasize essential information for beginners.
The notation <...> refers to variables and file names that need to be specified by the user.
The arrows < and > need to be excluded.
Logging-In
from Mac or LINUX
- Open terminal and type:
ssh -X your_username@biocluster.ucr.edu
from Windows
Open Putty and select ssh. Download PuTTY if you do not have it.
- Provide host name and session name
hostname:
biocluster.ucr.edu
- Enter your identity information
username:
your username
password:
your password
Setup for graphics emulation. Download and install Xming if you do not have it.
Use WinSCP for file exchange. Download and install WinSCP if you do not have it.
Changing password
passwd
passwd # this will first ask you to enter your current password
Orientation
ls
pwd
stat
whoami
hostname
pwd # present working directory
ls # content of pwd
ls -l # similar to ls, but provides additional info on files and directories
ls -a # includes hidden files (.name) as well
ls -R # lists subdirectories recursively
ls -t # lists files sorted by modification time, newest first
stat <file-name> # provides all attributes of a file
whoami # shows as who you are logged in
hostname # shows on which machine you are
Files and directories
cd
cp
mv
mkdir
rm
rmdir
mkdir <dir_name> # creates specified directory
cd <dir_name> # switches into specified directory
cd .. # moves one directory up
cd ../../ # moves two directories up (and so on)
cd # brings you to highest level of your home directory
rmdir <dir_name> # removes empty directory
rm <file_name> # removes file
rm -r <dir_name> # removes directory including its content; may ask for confirmation (e.g. when rm is aliased to 'rm -i'), the '-f' argument turns confirmation off
mv <name1> <name2> # renames directories or files
mv <name> <path> # moves file/directory as specified in path
cp <name> <path> # copies file/directory as specified in path (-r to include content in directories)
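The commands above can be tried safely in a throwaway directory. The following is a minimal sketch; the names demo_dir, notes.txt and notes_renamed.txt are hypothetical and only serve as an illustration.

```shell
# Work in a temporary directory so nothing important is touched.
# All file and directory names below are made up for this example.
workdir=$(mktemp -d)
cd "$workdir"

mkdir demo_dir                    # create a directory
echo "hello" > notes.txt          # create a small file
cp notes.txt demo_dir/            # copy the file into the directory
mv notes.txt notes_renamed.txt    # rename the original file
ls -R                             # list everything recursively

rm notes_renamed.txt              # remove a file
rm -r demo_dir                    # remove a directory including its content
remaining=$(ls -A | wc -l)        # count what is left in the directory
cd /
rmdir "$workdir"                  # rmdir only succeeds on an empty directory
```

Because rmdir refuses to delete non-empty directories, the last step also confirms that the cleanup was complete.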
Copy and paste
The methods differ depending where you are
If you are in Command Line
- Cut last word with keyboard only
- Ctrl+w
- Paste with keyboard only
- Ctrl+y
If you are in a non-command-line Free Desktop environment
- Copy
- Ctrl+c
- Paste
- Ctrl+v
- Copy with mouse only
- Simply select the text with the mouse
- Paste with mouse only
- Click the middle mouse button or both left/right buttons simultaneously
Handy shortcuts
history
history # shows all commands you have used recently
# . refers to the present working directory
# ~/ refers to user's home directory
# up(down)_key scrolls through command history
# <something-incomplete> TAB completes path/file_name
# Ctrl+a cursor to beginning of command line
# Ctrl+e cursor to end of command line
# Ctrl+d delete character under cursor
# Ctrl+k delete line from cursor, content goes into kill buffer
# Ctrl+y paste content from Ctrl+k
3. Unix Help
man
info
apropos
man # general help
man wc # manual on program 'word count' wc
wc --help # short help on wc
info wc # more detailed information system (GNU)
apropos wc # retrieves pages where wc appears
Online help: SuperMan Pages, Linux Documentation Project (LDP)
4. Finding Things
Finding files, directories and applications
find
find -name "*pattern*" # searches for *pattern* in and below current directory
find /usr/local -name "*blast*" # finds file names *blast* in specified directory
find /usr/local -iname "*blast*" # same as above, but case insensitive
additional useful arguments: -user <user name>, -group <group name>, -ctime <number of days ago changed>
locate
grep
which
whereis
dpkg
find ~ -type f -mtime -2 # finds all files you have modified in the last two days
locate <pattern> # finds files and dirs that are recorded in the locate database (refreshed by updatedb)
which <application_name> # location of application
whereis <application_name> # searches for executables in set of directories
dpkg -l | grep mypattern # find Debian packages and refine search with grep pattern
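To see how -name, -iname and -mtime differ, the following sketch runs find on a small temporary directory tree. The file names are hypothetical and chosen only to contrast upper and lower case.

```shell
# Build a tiny directory tree with made-up file names.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/Blast_report.txt" "$tmp/sub/blast.log" "$tmp/other.txt"

# Case-sensitive match: only names containing lowercase "blast"
sensitive=$(find "$tmp" -name "*blast*" | wc -l)
# Case-insensitive match: also finds "Blast_report.txt"
insensitive=$(find "$tmp" -iname "*blast*" | wc -l)
# Regular files modified within the last two days
recent=$(find "$tmp" -type f -mtime -2 | wc -l)

echo "case-sensitive: $sensitive, case-insensitive: $insensitive, recent: $recent"
rm -r "$tmp"
```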
Finding things in files
grep
wc
xargs
grep pattern file # provides lines in 'file' where pattern 'appears', if pattern contains shell metacharacters use single-quotes: '>'
grep -H pattern file # -H prints out file name in front of each matching line
grep 'pattern' file | wc # pipes lines with pattern into word count wc (see chapter 8); wc arguments: -c: show only bytes, -w: show only words, -l: show only lines; help on regular expressions: $ man 7 regex or man perlre
find /home/my_dir -name '*.txt' | xargs grep -c '^.*' # counts lines in many files and records each count along with individual file name; find and xargs are used to circumvent the Linux wildcard limit to apply this function on thousands of files.
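The following sketch applies the grep, wc and xargs commands from above to two small FASTA-like files. The file names a.txt and b.txt and their content are made up for illustration.

```shell
# Create two small example files (hypothetical content).
tmp=$(mktemp -d)
printf '>seq1\nMKT\n>seq2\nAAA\n' > "$tmp/a.txt"
printf '>seq3\nGGG\n' > "$tmp/b.txt"

# Single quotes keep the shell from treating '>' as a redirection
headers=$(grep '>' "$tmp/a.txt" | wc -l)

# grep -c prints one "file:count" line per file; xargs hands over the file names
find "$tmp" -name '*.txt' | xargs grep -c '^.*'

# Sum the per-file counts with awk (the count is the last ':'-separated field)
total=$(find "$tmp" -name '*.txt' | xargs grep -c '^.*' | awk -F: '{ s += $NF } END { print s }')
echo "headers in a.txt: $headers, lines in all files: $total"
rm -r "$tmp"
```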
5. Permissions and Ownership
How does it work
ls
ls -l
- Shows something like this for each file/dir
drwxrwxrwx
- Here is what the symbols mean
d
directory
r
read
w
write
x
execute
- Here is what the positions of the symbols mean
first triplet
user permissions (u)
second triplet
group permissions (g)
third triplet
world permissions (o)
To assign read and execute permissions to user and group
chmod
chmod ug+rx my_file
To remove all permissions from all three user groups
chmod ugo-rwx my_file
+
causes the permissions selected to be added
-
causes them to be removed
=
causes them to be the only permissions that the file has.
Example for number system
chmod +rx public_html/ # adds read and execute permissions; in the number system, chmod 755 public_html/ sets the permissions to exactly rwxr-xr-x
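The symbolic and numeric notations can be compared on a scratch file. This is a minimal sketch that assumes GNU stat (its -c %a option prints the permissions as an octal number); the file name my_file is reused from the examples above.

```shell
tmp=$(mktemp -d)
touch "$tmp/my_file"

chmod 600 "$tmp/my_file"                      # start from rw-------
chmod ug+rx "$tmp/my_file"                    # add read and execute for user and group
after_symbolic=$(stat -c %a "$tmp/my_file")   # GNU stat: permissions in octal

chmod ugo-rwx "$tmp/my_file"                  # remove all permissions
after_remove=$(stat -c %a "$tmp/my_file")

chmod 755 "$tmp/my_file"                      # 7 = r+w+x (4+2+1), 5 = r+x (4+1)
after_octal=$(stat -c %a "$tmp/my_file")

echo "$after_symbolic $after_remove $after_octal"
rm -rf "$tmp"
```

Each octal digit is the sum of read (4), write (2) and execute (1) for user, group and world, in that order.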
Change ownership
chown
chgrp
chown <user> <file-or-directory> # changes user ownership
chgrp <group> <file-or-directory> # changes group ownership
chown <user>:<group> <file-or-directory> # changes user & group ownership
6. Useful Unix Commands
wget
df
free
uname
bc
ifconfig
du
ln
df # disk space
free -g # memory info in gigabytes (use -m for megabytes)
uname -a # shows tech info about machine
bc # command-line calculator (to exit type 'quit')
wget ftp://ftp.ncbi.nih.... # file download from web
/sbin/ifconfig # give IP and other network info
ln -s original_filename new_filename # creates symbolic link to file or directory
du -sh # displays disk space usage of current directory
du -sh * # displays disk space usage of individual files/directories
du -s * | sort -nr # shows disk space used by different directories/files sorted by size
7. Process Management
top
who
w
ps
fg
bg
kill
renice
top # view top consumers of memory and CPU (press 1 to see per-CPU statistics)
who # Shows who is logged into system
w # Shows which users are logged into system and what they are doing
ps # Shows processes running by user
ps -e # Shows all processes on system; try also '-a' and '-x' arguments
ps aux | grep <user_name> # Shows all processes of one user
ps ax --forest # Shows the child-parent hierarchy of all processes
ps -o %t -p <pid> # Shows how long a particular process was running. (E.g. 6-04:30:50 means 6 days 4 hours ...)
# Ctrl z <enter> # Suspend (put to sleep) a process
fg # Resume (wake up) a suspended process and brings it into foreground
bg # Resume (wake up) a suspended process but keeps it running in the background
# Ctrl c # Kills the process that is currently running in the foreground
kill <process-ID> # Kills a specific process
kill -9 <process-ID> # NOTICE: "kill -9" is a very violent approach - it does not give the process any time to perform cleanup procedures
kill -l # List all of the signals that can be sent to a process
kill -s SIGSTOP <process-ID> # Suspend (put to sleep) a specific process
kill -s SIGCONT <process-ID> # Resume (wake up) a specific process
renice -n <priority_value> -p <process-ID> # Changes the nice value of a process; values range from -20 to 19 (regular users can normally only raise it), the higher the value the lower the priority, default is 0
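The signal commands can be tested on a harmless background job. The sketch below uses 'sleep 300' as a stand-in for a long-running program; it is an illustration only, not a recommended way to manage real jobs.

```shell
sleep 300 &                   # start a long-running job in the background
pid=$!                        # $! holds the process ID of the last background job

kill -s STOP "$pid"           # suspend the process (same signal as SIGSTOP)
sleep 1                       # give the system a moment to deliver the signal
state=$(ps -o stat= -p "$pid")    # state codes starting with 'T' mean stopped
echo "state after STOP: $state"

kill -s CONT "$pid"           # resume the process
kill "$pid"                   # ask it to terminate (SIGTERM, the default signal)
wait "$pid" 2>/dev/null || true   # collect the exit status of the job

# kill -0 sends no signal; it only checks whether the process still exists
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "still alive: $alive"
```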
8. Text Viewing
less
more
cat
more <my_file> # views text, use space bar to browse, hit 'q' to exit
less <my_file> # a more versatile text viewer than 'more', 'q' exits, 'G' moves to end of text, 'g' to beginning, '/' find forward, '?' find backwards
cat <my_file> # concatenates files and prints content to standard output
9. Text Editors
Vi and Vim
Non-graphical (terminal-based) editor. Vi is guaranteed to be available on any system. Vim is the improved version of vi.
Emacs
Non-graphical or window-based editor. You still need to know keystroke commands to use it. It is usually not installed by default on modern Linux distributions.
Pico
Simple terminal-based editor available on most versions of Unix. Uses keystroke commands, but they are listed in logical fashion at bottom of screen.
Nano
A simple terminal-based editor which is the default on modern Debian systems
Vim Manual
Contents
-
Vim Manual
- Basics
- Help
- Moving Around in a File
- Wrapping long Lines
- Line Numbers
- Working with Many Files & Splitting Windows
- Enabling Syntax Highlighting
- Spell Checking & Dictionary
- Printing
- Merging/Inserting Files
- Undo / Redo
- Delete / Cut
- Put (Paste)
- Copy & Paste
- Search in a File
- Replacing Text (using Regular Expressions)
- Matching-Parentheses Search
- HTML Editing
- Executing Shell Commands in Vim
- Using Vim as Table Editor
- Modify Vim Settings (in file .vimrc)
Basics
vim
vim <my_file_name> # open/create file with vim
i
Insert Mode
escape key
Hit the Escape key for Normal (non-editing) mode
:
Commands start with ':'
:w
Save command; if you are in editing mode you have to hit ESC first!!!
:q
Quit file; refuses to quit if there are unsaved changes
:q!
Exits WITHOUT saving any changes you have made
:wq
Save and quit
R
Replace MODE
r
Replace only one character under cursor
q:
History of commands, to re-execute one of them, select and hit enter!
:w <new_filename>
Saves into new file
:123
Go to specified line number. For example line number 123.
Help
Find help on the web. Google will find abundance of responses to questions on vi and vim (try searching for both terms)
- Useful list of vim commands:
vimtutor
vimtutor # open vim tutorial from shell
Moving Around in a File
$
moves cursor to end of line
A
same as $, but also switches to insert mode
0
moves cursor to beginning of line
Ctrl-g
shows at status line filename and the line you are on
Shift-G
brings you to bottom of file
<number> Shift-G
brings you to specified line number (type the number first, then Shift-G)
Wrapping long Lines
- By default vi wraps long lines - this makes some files unreadable for a human eye
:set nowrap
:set nowrap
turns off line wrapping, letting the text run past the right side of the screen
:set wrap
turns it on
Line Numbers
:set number
:set number
shows line numbers
:set nonumber
hides line numbers
Working with Many Files & Splitting Windows
vim *.txt # opens many files at once
:n
switches between files
:wall or :qall
write or quit all open files
vim -o *.txt # opens many files at once and displays them with horizontal split, '-O' does vertical split
:args *.txt
places all the relevant files in the argument list
:all
splits all files in the argument list (buffer) horizontally
CTRL-ww
switch between windows
:split
shows same file in two windows
:split <file-to-open>
opens second file in new window
:vsplit
splits windows vertically, very useful for tables
:set scrollbind
lets you scroll all open windows simultaneously
:close
closes current window
:only
closes all windows except current one
Enabling Syntax Highlighting
:syntax on
:syntax on
turns on color syntax highlighting for various programming languages and data formats
:syntax off
turns it off
Spell Checking & Dictionary
ispell
ispell -l <some-file> # List misspelled words
dict
wn
:! dict <word>
meaning of word
:! wn 'word' -over
synonyms of word
Printing
:ha
prints entire file
:<FROM>,<TO>ha
prints specified lines numbers: <FROM>,<TO>
Merging/Inserting Files
:r <filename>
inserts content of specified file after cursor
Undo / Redo
u
undo last command
U
undo all changes on current line
CTRL-R
redo one change which was undone
Delete / Cut
x
deletes what is under cursor
dw
deletes from cursor to end of word including the space
de
deletes from cursor to end of word NOT including the space
cw
deletes rest of word and lets you then insert, hit ESC to continue with NORMAL mode
c$
deletes rest of line and lets you then insert, hit ESC to continue with NORMAL mode
d$
deletes from cursor to the end of the line
dd
deletes entire line
2dd
deletes next two lines, continues: 3dd, 4dd and so on
Put (Paste)
p
uses what was deleted/cut and pastes it behind cursor
P
pastes clipboard in front of the cursor
Copy & Paste
yy
copies line, for copying several lines do 2yy, 3yy and so on
p
pastes clipboard behind cursor
Search in a File
Most regular expressions work
/my_pattern
searches for my_pattern downwards, type n for next match
?my_pattern
searches for my_pattern upwards, type n for next match
:set ic
switches to ignore case search (case insensitive)
:set hls
switches to highlight search (highlights search hits)
Replacing Text (using Regular Expressions)
Great intro: A Tao of Regular Expressions
:s/old_pat/new_pat/
replaces first occurrence in a line
:s/old_pat/new_pat/g
replaces all occurrences in a line
:s/old_pat/new_pat/gc
add 'c' to ask for confirmation
:#,#s/old_pat/new_pat/g
replaces all occurrences between line numbers: #,#
:%s/old_pat/new_pat/g
replaces all occurrences in file
:%s/\(pattern1\)\(pattern2\)/\1test\2/g
regular expression to insert, you need here '\' in front of parentheses (unlike Perl)
:%s/\(pattern.*\)/\1 my_tag/g
appends something to line containing pattern (unlike Perl, '+' must be escaped in Vim: Perl's .+ is .\+ in Vim)
:%s/\(pattern\)\(.*\)/\1/g
removes everything in lines after pattern
:%s/\(At\dg\d\d\d\d\d\.\d\)\(.*\)/\1\t\2/g
inserts tabs between At1g12345.1 and Description
:%s/\n/new_pattern/g
Replaces return signs
:%s/pattern/\r/g
Replace pattern with return signs!!
:%s/\(\n\)/\1\1/g
Insert additional return signs
:%s/\(^At\dg\d\d\d\d\d.\d\t.\{-}\t.\{-}\t.\{-}\t.\{-}\t\).\{-}\t/\1/g
replaces content between 5th and 6th tab (5th column), '{-}' turns off 'greedy' behavior
:#,#s/\( \{-} \|\.\|\n\)/\1/g
performs simple word count in specified range of text
:%s/\(E\{6,\}\)/<font color="green">\1<\/font>/g
highlight pattern in html colors, here highlighting of >= 6 occurrences of Es
:%s/\([A-Z]\)/\l\1/g
change uppercase to lowercase, '%s/\([A-Z]\)/\u\1/g' does the opposite
:g/my_pattern/ s/\([A-Z]\)/\l\1/g | copy $
Uses 'global' command to apply replace function only on those lines that match a certain pattern.
The 'copy $' command after the pipe '|' prints all matching lines at the end of the file.
:args *.txt | all | argdo %s/old_pat/new_pat/ge | update
Command 'args' places all relevant files in the argument list (buffer)
'all' displays each file in separate split window
command 'argdo' applies replacement to all files in argument list (buffer)
flag 'e' is necessary to avoid stop at error messages for files with no matches
command 'update' saves all changes to files that were updated
Matching-Parentheses Search
Place the cursor on (, [ or { and type %
The cursor moves to the matching parenthesis
HTML Editing
:runtime! syntax/2html.vim
Converts the text file to HTML format and opens the result
Executing Shell Commands in Vim
:!<SHELL_COMMAND>
Executes any shell command, hit <enter> to return
:sh
Switches window to shell, 'exit' switches back to vim
Using Vim as Table Editor
v
starts visual mode for selecting characters
V
starts visual mode for selecting lines
CTRL-V
starts visual mode for selecting blocks (use CTRL-q in gVim under Windows). This allows column-wise selections and operations like inserting and deleting columns. To restrict substitute commands to a column, one can select it and switch to the command-line by typing ':'. After this the substitution syntax for a selected block looks like this: '<,'>s///
:set scrollbind
Starts simultaneous scrolling of 'vsplitted' files
:set scrollopt=hor
To set to horizontal binding of files
:AlignCtrl I=\t
:%Align
This allows aligning tables by column separators (here '\t') when the Align utility from Charles Campbell is installed
To sort table rows by selected lines or block, perform the visual select and then hit the F3 key. The rest is interactive. To enable this function one has to include the Vim sort script from Gerald Lai in the .vimrc file
Modify Vim Settings (in file .vimrc)
See last chapter of vimtutor (start from shell)
Useful .vimrc sample
- When vim starts to respond very slowly then one may need to delete the .viminf* files in home directory
10. The Unix Shell
When you log into UNIX/LINUX the system starts a program called Shell. It provides you with a working environment and interface to the operating system. Usually there are several different shell programs installed.
The shell program bash is very common.
finger <user_name> # shows which shell you are using
cat /etc/shells | awk '/^\// {system("ls " $1)}' 2> /dev/null # lists all shell programs available on your system
<shell_name> # switches to a different shell
STDIN, STDOUT, STDERR, Redirections, and Wildcards
See LINUX HOWTOs
By default, UNIX commands read from standard input (STDIN) and send their output to standard out (STDOUT).
You can redirect them by using the following commands:
<beginning-of-filename>* # * is wildcard to specify many files
ls > file # prints ls output into specified file
command < my_file # uses file after '<' as STDIN
command >> my_file # appends output of one command to file
command | tee my_file # writes STDOUT to file and prints it to screen
command > my_file; cat my_file # writes STDOUT to file and prints it to screen
command > /dev/null # turns off progress info of applications by redirecting their output to /dev/null
grep my_pattern my_file | wc # Pipes (|) output of 'grep' into 'wc'
grep my_pattern my_non_existing_file 2> my_stderr # prints STDERR to file (no space between '2' and '>')
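A short sketch of these redirections in a temporary directory follows. The file names out.txt and tee.txt are hypothetical; my_stdout and my_stderr follow the naming used above.

```shell
tmp=$(mktemp -d)
cd "$tmp"

echo "first line"  > out.txt        # '>' creates or overwrites the file
echo "second line" >> out.txt       # '>>' appends to the file
lines=$(wc -l < out.txt)            # '<' feeds the file to STDIN

# STDOUT and STDERR (file descriptor 2) are redirected independently.
# grep fails here because the file does not exist; '|| true' ignores that.
grep my_pattern my_non_existing_file > my_stdout 2> my_stderr || true
out_msg=$(cat my_stdout)            # empty: nothing was written to STDOUT
err_msg=$(cat my_stderr)            # contains the error message from grep

echo "shown and saved" | tee tee.txt    # prints to screen and writes to file
saved=$(cat tee.txt)

cd /
rm -r "$tmp"
```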
Useful shell commands
cat <file1> <file2> > <cat.out> # concatenates files into output file 'cat.out'
paste <file1> <file2> > <paste.out> # merges lines of files and separates them by tabs (useful for tables)
cmp <file1> <file2> # tells you whether two files are identical
diff <fileA> <fileB> # finds differences between two files
head -<number> <file> # prints first lines of a file
tail -<number> <file> # prints last lines of a file
split -l <number> <file> # splits file into many smaller files of <number> lines each
csplit -f out fasta_batch "%^>%" "/^>/" "{*}" # splits fasta batch file into many files at '>'
sort <file> # sorts single file, many files and can merge (-m) them, -b ignores leading white space, ...
sort -k 2,2 -k 3,3n input_file > output_file # sorts in table column 2 alphabetically and column 3 numerically, '-k' for column, '-n' for numeric
sort input_file | uniq > output_file # uniq command removes duplicates and creates file/table with unique lines/fields
join -1 1 -2 1 <table1> <table2> # joins two tables based on specified column numbers (-1 1: column 1 of file 1; -2 1: column 1 of file 2). It assumes that join fields are sorted. If that is not the case, use the next command:
sort table1 > table1a; sort table2 > table2a; join -a 1 -t "`echo -e '\t'`" table1a table2a > table3 # '-a <table>' prints all lines of specified table! Default prints only the lines the two tables have in common. '-t "`echo -e '\t'`"' forces join to use tabs as field separator. Default is space(s)!
cat my_table | cut -d , -f1-3 # cut command prints only specified sections of a table, -d specifies here comma as column separator (tab is default), -f specifies column numbers.
grep # see chapter 4
egrep # see chapter 4
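The sort, uniq, join and cut commands are easiest to understand on a small table. The sketch below uses two made-up tab-separated tables with gene IDs in the first column; all names and values are hypothetical.

```shell
tmp=$(mktemp -d)
cd "$tmp"
printf 'g2\tkinase\ng1\ttransporter\ng2\tkinase\n' > table1
printf 'g3\t7\ng1\t42\n' > table2

sort table1 | uniq > table1a      # sorted, duplicate line removed
sort table2 > table2a             # join expects sorted input

tab=$(printf '\t')                # a literal tab as field separator
joined=$(join -t "$tab" table1a table2a)                # only IDs present in both tables
left=$(join -a 1 -t "$tab" table1a table2a | wc -l)     # plus unmatched lines of table 1
first_col=$(cut -f1 table1a | paste -sd, -)             # first column, comma-joined

echo "$joined"
cd /
rm -r "$tmp"
```

Here only g1 occurs in both tables, so the plain join returns one line, while '-a 1' also keeps the unmatched g2 line from the first table.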
11. Screen
A Visual Introduction to Screen
Starting a New Screen Session
screen
screen # Start a new session
screen -S <some-name> # Start a new session and gives it a name
Ctrl-a d
Detach from the screen session
Ctrl-a c
Create a new window inside the screen session
Ctrl-a Space
Switch to the next window
Ctrl-a a
Switch to the window that you were previously on
Ctrl-a "
List all open windows. Double-quotes " are typed with the Shift key
Ctrl-d or type exit
Exit out of the current window. Exiting from the last window will end the screen session
Ctrl-a [
Enters the scrolling mode.
Use Page Up and Page Down keys to scroll through the window.
Hit the Enter key twice to return to normal mode.
Attaching to Screen Sessions
- From any computer, you can attach to a screen session after SSHing into the server (e.g. Biocluster).
screen -r # Attaches to an existing session, if there is only one
screen -r # Lists available sessions and their names, if there is more than one session running
screen -r <some-name> # Attaches to a specific session
screen -r <first-few-letters-of-name> # Type just the first few letters of the name and you will be attached to the session you need
Destroying Screen Sessions
- Terminate all programs that are running in the screen session. The standard way to do that is:
Ctrl-c
- Exit out of your shell. Type:
exit
- Repeat steps 1 and 2 until you see the sign:
[screen is terminating]
There may be programs running in different windows of the same screen session. That's why you may need to terminate programs and exit shells multiple times
Tabs and a Reasonably Large History Buffer
- For a better experience with screen, run
cp ~/.screenrc ~/.screenrc.backup 2> /dev/null
echo 'startup_message off
defscrollback 10240
caption always "%{=b dy}{ %{= dm}%H %{=b dy}}%={ %?%{= dc}%-Lw%?%{+b dy}(%{-b r}%n:%t%{+b dy})%?(%u)%?%{-dc}%?%{= dc}%+Lw%? %{=b dy}}"
' > ~/.screenrc
Related Topics
12. Simple Shell One-Liner Scripts
Useful One-Liners (script download)
- Renames file name.old to name.new. To test things first, replace 'do mv' with 'do echo mv'
for i in *.input; do mv $i ${i/name\.old/name\.new}; done
- Runs application in loops on many input files
for i in *.input; do ./application $i; done
- Runs fastacmd in loops on many *.input files and creates *.out files
for i in *.input; do fastacmd -d /data/../database_name -i $i > $i.out; done
- Runs SAM's target99 on many input files
for i in *.pep; do target99 -db /usr/../database_name -seed $i -out $i; done
- Searches in > 10,000 files for pattern and prints occurrences together with file names
for j in 0 1 2 3 4 5 6 7 8 9; do grep -iH <my_pattern> *$j.seq; done
- Example of how to run an interactive application (tmpred) that asks for file name input/output
for i in *.pep; do echo -e "$i\n\n17\n33\n\n\n" | ./tmpred $i > $i.out; done
- Runs BLAST2 for all *.fasta1/*.fasta2 file pairs in the order specified by file names and writes results into one file
for i in *.fasta1; do blast2 -p blastp -i $i -j ${i/_*fasta1/_*fasta2} >> my_out_file; done
This example uses two variables in a for loop. The content of the second variable gets specified in each loop by a replace function.
- Runs BLAST2 in all-against-all mode and writes results into one file; '-F F' turns low-complexity filter off
for i in *.fasta; do for j in *.fasta; do blast2 -p blastp -F F -i $i -j $j >> my_out_file; done; done;
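The "echo first" advice from the renaming one-liner can be sketched as follows. The file names are hypothetical, and the ${i/old/new} replace syntax requires bash; variables are quoted so that names with spaces would also work.

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch sample1.name.old.input sample2.name.old.input   # made-up example files

# Dry run: 'echo' prints the mv commands instead of executing them
for i in *.input; do echo mv "$i" "${i/name.old/name.new}"; done

# Real run: the same loop without 'echo'
for i in *.input; do mv "$i" "${i/name.old/name.new}"; done

renamed_count=$(ls *.name.new.input | wc -l)   # files carrying the new name
old_count=$(ls | grep -c 'name\.old' || true)  # files still carrying the old name
cd /
rm -r "$tmp"
```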
How to write a script
- create file which contains in first line:
#!/bin/bash
- place shell commands in file
run <chmod +x my_shell_script> to make it executable
run shell script like this: ./my_shell_script
- when you place it into /usr/local/bin you only type its name from any user account
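The steps above can be sketched as follows. The script content (a line counter) is a hypothetical example; only the '#!/bin/bash' first line, 'chmod +x' and the './my_shell_script' style of calling come from the description above.

```shell
tmp=$(mktemp -d)

# Steps 1 and 2: create the file with the interpreter line and some commands
cat > "$tmp/my_shell_script" <<'EOF'
#!/bin/bash
# Hypothetical example: report the number of lines of each file given as argument
for f in "$@"; do
    echo "$f: $(wc -l < "$f") lines"
done
EOF

# Step 3: make it executable
chmod +x "$tmp/my_shell_script"

# Step 4: run it by giving its location
printf 'a\nb\nc\n' > "$tmp/input.txt"
result=$("$tmp/my_shell_script" "$tmp/input.txt")
echo "$result"
rm -r "$tmp"
```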
13. Simple Perl One-Liner Scripts
Useful One-Liners
- Replace something
perl -p -i -w -e 's/pattern1/pattern2/g' my_input_file # replace something (e.g. return signs) in file using regular expressions
use $1 to back-reference to pattern placed in parentheses
'-p' makes perl loop over the input lines and print each line after the code was applied
'-i.bak' creates backup file *.bak, only -i doesn't
'-w' turns on warnings
'-e' executable code follows
- Parse lines that contain pattern1 and pattern2
perl -ne 'print if (/my_pattern1/ ? ($c=1) : (--$c > 0)) ; print if (/my_pattern2/ ? ($d = 1) : (--$d > 0))' my_input_file > my_output_file
following lines after pattern can be specified in '$c=1' and '$d=1'
for the OR function use this syntax: '/(pattern1|pattern2)/'
14. Remote Copy: wget, scp, ncftp
WGET (file download from the www)
wget http://www.ncbi.nlm.nih.gov/index.html # file download from www; add option '-r' to download entire directories
SCP (secure copy between machines)
General syntax
scp <source> <destination> # Use form 'userid@machine_name' if your local and remote user ids are different. If they are the same you can use only 'machine_name'.
Examples
- Copy file from Server to Local Machine (type from local machine prompt):
scp user@remote_host:file.name . # '.' copies to pwd, you can specify here any directory, use wildcards to copy many files at once.
- Copy file from Local Machine to Server:
scp file.name user@remote_host:~/dir/newfile.name
- Copy entire directory from Server to Local Machine (type from local machine prompt):
scp -r user@remote_host:directory/ ~/dir
- Copy entire directory from Local Machine to Server (type from local machine prompt):
scp -r directory/ user@remote_host:directory/
- Copy between two remote hosts (e.g. from bioinfo to cache): similar as above, just be logged in one of the remote hosts:
scp -r directory/ user@remote_host:directory/
NICE FTP
ncftp
ncftp> open ftp.ncbi.nih.gov
ncftp> cd /blast/executables
ncftp> get blast.linux.tar.Z (skip extension: @)
ncftp> bye
15. Archiving and Compressing
Compressing
tar -cvf my_file.tar mydir/ # Builds tar archive of files or directories. For directories, execute command in parent directory. Don't use absolute path.
tar -czvf my_file.tgz mydir/ # Builds tar archive with compression of files or directories. For directories, execute command in parent directory. Don't use absolute path.
Viewing Archives
tar -tvf my_file.tar
tar -tzvf my_file.tgz
Extracting
tar -xvf my_file.tar
tar -xzvf my_file.tgz
gunzip my_file.tar.gz # or unzip my_file.zip, uncompress my_file.Z, or bunzip2 for file.tar.bz2
find -name '*.zip' | xargs -n 1 unzip # this command usually works for unzipping many files that were compressed under Windows
- try also:
tar zxf blast.linux.tar.Z
tar xvzf file.tgz
options:
f
use archive file
p
preserve permissions
v
list files processed
x
extract files from an archive (capital X excludes files listed in FILE)
z
filter the archive through gzip
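A complete round trip (create, view, extract) can be sketched in a temporary directory. The names mydir and my_file.tgz follow the examples above; file1.txt and its content are made up.

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir mydir
echo "data" > mydir/file1.txt

tar -czf my_file.tgz mydir/        # c: create, z: gzip, f: archive file name
listing=$(tar -tzf my_file.tgz)    # t: list the archive content
rm -r mydir                        # remove the original directory

tar -xzf my_file.tgz               # x: extract; recreates mydir/ in current directory
content=$(cat mydir/file1.txt)

echo "$listing"
cd /
rm -r "$tmp"
```

Creating the archive from the parent directory with a relative path, as recommended above, is what makes the extraction recreate mydir/ wherever the archive is unpacked.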
16. Simple Installs
Systems-wide installations
- Installations for systems-wide usage are the responsibility of the system administrator
To find out if an application is installed, type:
which my_application
whereis my_application_name # searches for executables in set of directories, doesn't depend on your path
Most applications are installed in /usr/local/bin or /usr/bin
You need root permissions to write to these directories
Perl scripts go into /usr/local/bin, Perl modules (*.pm) into /usr/local/share/perl/5.8.8/
To copy executables in one batch, use command:
cp `find -perm -111 -type f` /usr/local/bin
Applications in user accounts
- Create a new directory, download application into this directory, unpack it (see chapter 15) and follow package-specific installation instructions. Usually you can then already run this application when you specify its location e.g.: /home/user/my_app/blastall. If you want you can add this directory to your PATH by typing from this directory:
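PATH=.:$PATH; export PATH # this allows you to run application by providing only its name; when you do echo $PATH you will see .: added to PATH. Note that '.' always refers to the current working directory, so this only works while you are in the application directory; to be independent of your location, add the full path instead, e.g. PATH=/home/user/my_app:$PATH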
Installation of RPMs
RPMs are installable software packages used in the following Linux distributions: RedHat, CentOS, Fedora, SUSE, and others.
rpm -i application_name.rpm
To check which version of RPM package is installed, type:
rpm --query <package_name>
Help and upgrade files for RPMs can be found at http://rpmfind.net/.
Installation of Debian packages
Debs are installable software packages used in the following Linux distributions: Debian, Ubuntu, and others.
Check whether your application is available at: http://packages.debian.org/stable/, then you type (no need to download):
apt-cache search phylip # searches for application "phylip" from command line
apt-cache show phylip # provides description of program
apt-get install phylip # example for phylip install, manuals can be found in /usr/share/doc/phylip/, use zless or lynx to read documentation (don't unzip).
apt-get update # do once a month to update the Debian package lists
apt-get upgrade -u # to upgrade after update from above
dpkg -i <package_file.deb> # install package from local package file (e.g. after download)
aptitude # Debian package managing interface (Ctrl-t opens menus)
aptitude search vim # search for packages on system and in Debian repositories
17. Devices
Mount/unmount usb/floppy/cdrom
mount /media/usb
umount /media/usb
mount /media/cdrom
eject /media/cdrom
mount /media/floppy
18. Environment Variables
xhost user@host # adds X permissions for user on server.
echo $DISPLAY # shows current display settings
export DISPLAY=<local_IP>:0 # change environment variable
unset DISPLAY # removes display variable (in csh/tcsh: unsetenv DISPLAY)
env # prints all environment variables
- List of directories that the shell will search when you type a command
echo $PATH
You can edit your default DISPLAY setting for your account by adding it to file .bash_profile
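How exported variables reach other programs can be sketched with a throwaway variable. MY_DEMO_VAR and the directory $HOME/my_app are hypothetical names used only for this illustration.

```shell
# An exported variable is passed on to programs started from this shell
export MY_DEMO_VAR="hello"
seen_by_child=$(sh -c 'echo "$MY_DEMO_VAR"')

# After unset, the variable is gone for this shell and its child processes
unset MY_DEMO_VAR
after_unset=$(sh -c 'echo "${MY_DEMO_VAR:-unset}"')

# Prepend a directory to PATH; the shell searches the entries from left to right
PATH="$HOME/my_app:$PATH"; export PATH
first_entry=$(echo "$PATH" | cut -d: -f1)

echo "$seen_by_child $after_unset $first_entry"
```

Settings made this way last only for the current session; placing the export line in .bash_profile makes them the default for new logins.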
19. Exercises
Exercise 1
Download proteome of Halobacterium spec from ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Halobacterium_sp/AE004437.faa (use wget or web browser for download)
- How many predicted proteins are there?
grep '>' AE004437.faa | wc -l
- How many proteins contain the pattern "WxHxxH[1-2]"?
egrep 'W.H..H{1,2}' AE004437.faa | wc -l
Use the find function (/) in less to fish out the proteins containing this pattern or more elegantly do it with awk:
awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' AE004437.faa | less
Create a BLASTable database with formatdb
formatdb -i AE004437.faa -p T -o T
-p F for nucleotide
-p T for protein databases
Generate list of sequence IDs for above pattern match result and retrieve its sequences with fastacmd from formatted database
fastacmd -d AE004437.faa -i my_IDs > my_sequences
Generate several lists of sequence IDs from various pattern match results and retrieve their sequences in one step using the fastacmd in for loop
for i in *.my_ids; do fastacmd -d AE004437.faa -i $i > $i.out; done
Run blastall with a few proteins against newly created database or against Halobacterium or UniProt database (on Biocluster: -d /srv/projects/db/uniprot/uniprot)
blastall -p blastp -i my_sequences -d AE004437.faa -o blastp.out -e 1e-6 -v 10 -b 10 &
The input file my_sequences contains the sequences that you retrieved with fastacmd
On one CPU core this will run for 2 or 3 minutes
Add -a 8 to run on 8 CPU cores (Requires a computer with 2 Quad-Core processors)
Parse blastall output into Excel spread sheet:
Run HMMPFAM search with above proteins against Pfam database
hmmpfam -E 0.1 --acc -A0 /srv/projects/db/PFAM/Pfam_ls my_sequences > my_output.pfam
Parse result with BioPerl parser
hmmSummary my_output.pfam > my_hmm.summary
Exercise 2
Split sample fasta batch file with csplit (use sequence file from exercise 1).
Concatenate single fasta files from (1) to one batch file.
- Find common hit IDs:
- a. BLAST two related sequences
- b. Retrieve the result in table format
c. Use join to identify common hit IDs in the two tables
View a Solution
Exercise 3
- Write a shell script that executes a range of BLAST searches at once.
- One way would be to repeat the command as many times as there are input files:
blastall -p blastp -d /.../my_database -i /.../my_input1 -o my_out1 -e 1e-6 -v 10 -b 10 &
blastall -p blastp -d /.../my_database -i /.../my_input2 -o my_out2 -e 1e-6 -v 10 -b 10 &
blastall -p blastp -d /.../my_database -i /.../my_input3 -o my_out3 -e 1e-6 -v 10 -b 10 &
Can you write a script without the repetitions?
View a Solution
Exercise 4
Create multiple alignment with clustalw (e.g. use sequences with 'W.H..HH' pattern)
clustalw my_sequences1
Exercise 5
Reformat alignment into PHYLIP format using seqret from EMBOSS
seqret clustal::my_sequences1.aln phylip::my_sequences1.phylip
cat my_sequences1.phylip
Exercise 6
Create neighbor-joining tree with phylip from PHYLIP
cp my_sequences1.phylip infile
phylip protdist # creates distance matrix
mv outfile infile
phylip neighbor # use default settings
cp outtree intree
phylip retree # displays tree and can use midpoint method for defining root of tree, my typical command sequence is: 'N' 'Y' 'M' 'W' 'R' 'R' 'X'
cp outtree my_tree.dnd
cat my_tree.dnd | ruby -e 'while l=gets; print l.chomp; end; puts' # Print all on one line - necessary for TreeBrowse
View your tree in TreeBrowse or open it in TreeView
References
Savings in Open Source Confirmed. Softwaremag.com June 2008 (1)
The Unix time sharing system. D. M. Ritchie and K. Thompson. Communications of the ACM, 17(7):365–375, July 1974. (2)
Unix, Linux multitasking Librenix. April 2007 (3)
Remote access in Linux Polishlinux.org. March 2007 (4)
The Unix time sharing system D. M. Ritchie and K. Thompson. Communications of the ACM, 17(7):365–375, July 1974. (5)
List of open source software packages Wikipedia. Link added in March 2009. (6)
The Free Software Definition Free Software Foundation. Revision 1.10 was made in October 2001 (7)