2006-03-20

Bioinformatics practical - Comparative Genomics

Use ClustalX to perform multiple sequence alignment (MSA), Jalview to edit the MSA results, and programs from the PHYLIP package to create a phylogenetic tree, with Patched proteins and PDGFRs as examples.

An Overview

Collect homologous protein sequences (orthologs and paralogs)
→ Use a text editor to create a FASTA file
→ Perform MSA using ClustalX (save output file in both aln and phy format)
→ Edit MSA result using Jalview (save edited file in aln format)
→ Reopen the edited MSA result using ClustalX and save the file in phy format
→ Use phylip seqboot to perform bootstrapping (phy file as input)
→ Use phylip protdist to calculate distance
→ Use phylip neighbor to calculate neighbor (“infile” as input file)
→ Use phylip drawgram to draw “rooted” tree (“intree” as input file)

Programs

ClustalX to perform multiple sequence alignment (MSA)

Jalview to edit MSA result

PHYLIP package to bootstrap and draw phylogenetic tree

Methods

Section A. Patched proteins

Patched proteins have been identified in a diverse group of organisms and function in Hedgehog signaling. At least two isoforms of patched proteins (patched 1 and 2) have been identified in mouse. In this section, we will examine the evolution of the patched proteins in various vertebrates.

Use one of the mouse Patched homologs (Q61115) as the query in a BLAST search against the SWISS-PROT database on the local server (http://sf01.bic.nus.edu.sg/blast/blast.html) to identify orthologs of Patched-1 and Patched-2.
Extract the Patched-1 and Patched-2 sequences from the BLAST results and use a text editor to save them in FASTA format as “Patched.fasta”.
Launch ClustalX (KMenu → Education → ClustalX).
Load the Patched.fasta file. Click Alignment → Output Format Options and select both Clustal format and PHYLIP format.
Do Complete Alignment.
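
Before loading a hand-edited FASTA file into ClustalX, it can help to confirm that every sequence has a header line and that no residues precede the first header. The following Python sketch is one way to do that check; the function name read_fasta and the toy records are illustrative assumptions, not part of the practical, and the residues shown are placeholders rather than real Patched sequences.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input: placeholder names and residues, not real Patched sequences.
example = """>seqA toy record
MKTAYIAK
QRQISFVK
>seqB toy record
MKTAYLAK
"""

for header, seq in read_fasta(example):
    # Print the identifier (first word of the header) and the sequence length.
    print(header.split()[0], len(seq))
```

To check the real file, the same function could be applied to the text read from Patched.fasta.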


Editing the MSA result

Use Jalview to edit the MSA result. Save the edited alignment as Patched.aln, overwriting the aln file from the previous MSA run. Then open this file in ClustalX and save it as “Patched.phy”, overwriting the phy file from the previous run. “Patched.phy” will be the input file for the PHYLIP package.
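
To make the phy file less of a black box, here is a minimal Python sketch of the basic PHYLIP layout: a first line giving the number of sequences and the number of aligned positions, followed by one line per sequence whose name occupies exactly 10 characters. ClustalX performs this conversion in the practical; the function name to_phylip and the toy alignment are assumptions made for illustration, and the sketch writes a single block, which for short alignments looks the same in the sequential and interleaved variants.

```python
def to_phylip(records):
    """Format (name, aligned_sequence) pairs as single-block PHYLIP text."""
    seqs = [seq for _, seq in records]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Header line: number of sequences, then number of aligned positions.
    lines = [f"{len(seqs)} {length}"]
    for name, seq in records:
        # Names are truncated or space-padded to exactly 10 characters.
        lines.append(name[:10].ljust(10) + seq)
    return "\n".join(lines) + "\n"


# Toy alignment with placeholder names; '-' marks a gap.
print(to_phylip([("seqA", "ACDE"), ("a_very_long_name", "AC-E")]))
```

The 10-character name field is why long sequence identifiers appear shortened in PHYLIP output.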


Generate Phylogenetic Tree using PHYLIP package

Download phylip_3.6.1-2_i386.deb from IVLE.
Click on “Penguin” → Root Shell
Type cd Desktop to change directory to the Desktop
Type ls to list the files on the Desktop
Type dpkg -i phylip_3.6.1-2_i386.deb to install the PHYLIP package
Type /usr/bin/phylip to see the program options for the PHYLIP package
By now you should have Patched.aln and Patched.phy on the Desktop


Bootstrap

Type phylip seqboot

You will see the following:

seqboot: can't find input file "infile"
Please enter a new file name>

Type Patched.phy and Enter. You will see something like this:

Bootstrapping algorithm, version 3.61

Settings for this run:
D Sequence, Morph, Rest., Gene Freqs? Molecular sequences
J Bootstrap, Jackknife, Permute, Rewrite? Bootstrap
% Regular or altered sampling fraction? regular
B Block size for block-bootstrapping? 1 (regular bootstrap)
R How many replicates? 100
W Read weights of characters? No
C Read categories of sites? No
S Write out data sets or just weights? Data sets
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, ANSI, none)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes

Y to accept these or type the letter for one to change

Type “Y” to accept the setting and Enter

Next you will see this:

Random number seed (must be odd)?

Type any odd number (e.g. 11) and press Enter

You will see the following:

completed replicate number 10
completed replicate number 20
completed replicate number 30
completed replicate number 40
completed replicate number 50
completed replicate number 60
completed replicate number 70
completed replicate number 80
completed replicate number 90
completed replicate number 100

Output written to file "outfile"

Done.

Check the Desktop for “outfile” and rename it seqbootout.txt. Double-click this file to see its content.
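
The idea behind the bootstrap step is that each replicate is a new alignment of the same width, built by drawing columns of the original alignment at random with replacement. The Python sketch below illustrates only that idea; the function name bootstrap_alignment and the toy alignment are assumptions, and Python's random number generator will not reproduce the data sets that seqboot writes for the same seed.

```python
import random


def bootstrap_alignment(alignment, replicates, seed):
    """Return bootstrap replicates of an alignment (a list of equal-length strings).

    Each replicate resamples alignment columns with replacement, so every
    sequence in a replicate receives the same sampled columns.
    """
    rng = random.Random(seed)
    n_cols = len(alignment[0])
    result = []
    for _ in range(replicates):
        # Draw n_cols column indices, allowing repeats.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        result.append(["".join(seq[c] for c in cols) for seq in alignment])
    return result


# Toy alignment with placeholder residues.
toy = ["ACDE", "ACDF", "GCDE"]
for replicate in bootstrap_alignment(toy, replicates=3, seed=11):
    print(replicate)
```

Because whole columns are sampled, the residues at each position of a replicate still correspond to one column of the original alignment.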


Calculate Distance

Type phylip protdist
You will see something like this

Protein distance algorithm, version 3.61

Settings for this run:
P Use JTT, PMB, PAM, Kimura, categories model? Jones-Taylor-Thornton matrix
G Gamma distribution of rates among positions? No
C One category of substitution rates? Yes
W Use weights for positions? No
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes

Are these settings correct? (type Y or the letter for one to change)

Type “Y” to accept the setting and Enter

You will see something like

Computing distances:
O35595PT
Q9Y6C5PTC .
Q61115PTC ..
Q13635PT ...
Q90693PT ....
Q98864PT .....
Q09614PT ......
Q6T3U4NP .......

Output written to file "outfile"

Check “outfile” on the Desktop and double-click it to see its content. Now rename “outfile” to “infile”; it will be the input file for the “neighbor” program in PHYLIP.
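
The outfile from this step is a matrix of pairwise distances between the aligned sequences. As a rough illustration of what such a matrix contains, the Python sketch below computes a much simpler measure, the p-distance (the fraction of compared positions that differ). This is a stand-in chosen for clarity and is not the Jones-Taylor-Thornton model that protdist uses; the function names and the toy alignment are assumptions.

```python
def p_distance(seq_a, seq_b):
    """Fraction of ungapped aligned positions at which two sequences differ."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(alignment):
    """Square matrix of p-distances for a list of aligned sequences."""
    n = len(alignment)
    return [[p_distance(alignment[i], alignment[j]) for j in range(n)]
            for i in range(n)]


# Toy alignment with placeholder residues.
for row in distance_matrix(["ACDE", "ACDF", "GCDE"]):
    print(row)
```

Model-based distances such as those from protdist also correct for multiple substitutions at the same position, so they can exceed 1, whereas a p-distance cannot.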

Calculate Neighbor

Under root shell, type phylip neighbor and Enter

You will see the following

Neighbor-Joining/UPGMA method version 3.61

Settings for this run:
N Neighbor-joining or UPGMA tree? Neighbor-joining
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, ANSI, none)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes


Y to accept these or type the letter for one to change


Type “Y” to accept the setting and then Enter. You will see something like

Cycle 5: species 1 ( 0.03614) joins species 2 ( 0.05739)
Cycle 4: species 3 ( 0.01320) joins species 4 ( 0.02322)
Cycle 3: node 3 ( 0.05218) joins species 5 ( 0.06119)
Cycle 2: species 7 ( 0.98182) joins species 8 ( 2.21938)
Cycle 1: node 3 ( 0.23822) joins node 7 ( 0.11636)
last cycle:
node 1 ( 0.26219) joins node 3 ( 0.05675) joins species 6 ( 0.17702)

Output written on file "outfile"

Tree written on file "outtree"

Done.


Check “outfile” and “outtree” on the Desktop and double-click them to see their content.

Rename “outtree” to “intree”; it will serve as the input file for the drawgram program in PHYLIP.
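
Each “Cycle” line in the neighbor output reports one join, with the branch length of each joined item in parentheses. The Python sketch below shows how a single join can be chosen under the standard neighbor-joining criterion: compute Q(i, j) = (n - 2) d(i, j) - sum_i - sum_j for every pair and join the pair with the smallest Q. It performs only the first join and is not PHYLIP's implementation; the function name nj_first_join and the 5 × 5 distance matrix are illustrative assumptions.

```python
def nj_first_join(d):
    """Pick the first pair to join under the neighbor-joining criterion.

    d is a symmetric distance matrix (list of lists) with more than two taxa.
    Returns (i, j, length_i, length_j): the indices of the joined pair and
    the branch lengths from each of them to the new internal node.
    """
    n = len(d)
    totals = [sum(row) for row in d]  # summed distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from the joined taxa to the new node.
    length_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return i, j, length_i, length_j


# Toy distance matrix for five taxa (made-up values).
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(toy))  # (0, 1, 2.0, 3.0)
```

A full run repeats this step on a matrix in which the joined pair is replaced by the new node, until three items remain for the last cycle.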

Tree Drawing

The “drawgram” option is used to generate a “rooted” tree.

Type phylip drawgram in root shell and Enter

You will see


DRAWGRAM from PHYLIP version 3.61
Reading tree ...
Tree has been read.
Loading the font ....
Font loaded.

Rooted tree plotting program version 3.61

Here are the settings:
0 Screen type (IBM PC, ANSI): ANSI
P Final plotting device: Postscript printer
V Previewing device: X Windows display
H Tree grows: Horizontally
S Tree style: Phenogram
B Use branch lengths: Yes
L Angle of labels: 90.0
R Scale of branch length: Automatically rescaled
D Depth/Breadth of tree: 0.53
T Stem-length/tree-depth: 0.05
C Character ht / tip space: 0.3333
A Ancestral nodes: Weighted
F Font: Times-Roman
M Horizontal margins: 1.65 cm
M Vertical margins: 2.16 cm
# Pages per tree: one page per tree

Y to accept these or type the letter for one to change

Type “Y” to accept the setting and Enter. You will see

Writing plot file ...

Plot written to file "plotfile"

Done.

A “drawgram” preview window will appear. Examine the tree, then go to File → Plot.

Check the Desktop for “plotfile”, rename it “plotfile.bmp”, and double-click it to view the tree.


Now insert the protein sequence with SWISS-PROT accession number Q13635 into the FASTA file created previously, then repeat the procedure above: perform the multiple sequence alignment with ClustalX, edit the MSA result with Jalview, and draw the tree with PHYLIP.
