Go to Buz's Home Page

Treespace search FAQ

The question

I have been reading your recent arthropod cladistics paper, and I was wondering if you could please help me with the tree space search idea?

I have started by running a heuristic search in PAUP* 4.0b3 for Windows with 500 random-addition replicates, steepest descent, TBR branch swapping, nchuck=10 and chuckscore=1. How do I go about doing the second search, i.e., branch swapping on the trees that were found in the first search? I have looked at Reid (1996), but I am not sure how to tell PAUP* to branch swap on a set of trees found in a previous search.

Also, I wonder if you could tell me what bootstrapping with replacement is?

The answer

TREESPACE SEARCH: For most datasets (except really difficult ones like the Giribet and Wheeler dataset, Mol Phy Evol 1999), the best PAUP* settings are many random-addition replicates that each keep only a few trees, so a typical command will look like:

hsearch addseq=random nchuck=3 chuckscore=1 nreps=1000;

Then, when the above run finishes, do a

hsearch nchuck=0 chuckscore=0 start=current;

to swap on the trees in memory with the tree/iteration limit turned off. This fills out your treespace, and in some instances even finds a shorter tree. If that happens, you need to redo the treespace search, looking for "hits" at the new, shorter length. A good treespace search on difficult datasets will have approximately 10-30 hits on the shortest trees.
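The bookkeeping behind "hits" in the two-stage search can be sketched in Python. This assumes you have collected the best tree length found by each random-addition replicate (for instance, transcribed from your search log); the function names are hypothetical and the lengths are made-up illustration values, not results.

```python
def summarize_hits(replicate_lengths: list[int]) -> tuple[int, int]:
    """Return (shortest length, number of replicates that found it)."""
    shortest = min(replicate_lengths)
    return shortest, replicate_lengths.count(shortest)


def after_swapping(replicate_lengths: list[int], swapped_length: int) -> tuple[int, int]:
    """Compare stage-1 results with the best length found by swapping on all
    trees in memory (stage 2). If swapping found a shorter tree, none of the
    stage-1 replicates hit it, so the treespace search must be redone."""
    shortest, hits = summarize_hits(replicate_lengths)
    if swapped_length < shortest:
        return swapped_length, 0
    return shortest, hits


# Hypothetical best lengths from five random-addition replicates.
lengths = [412, 410, 410, 415, 410]
print(summarize_hits(lengths))        # (410, 3)
print(after_swapping(lengths, 408))   # (408, 0): redo the search
```

A hit count of 0 at a new shortest length is the signal to rerun the random-addition search; on difficult datasets you would aim for roughly 10-30 hits.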

Use the following PAUP* commands, written as a PAUP block that you can paste directly at the end of your data file. PAUP* will automatically execute these commands along with the data. Alternatively, put them in a separate file and execute it after executing the data file.

BEGIN PAUP;
log start;
SET TCOMPRESS=YES INCREASE=AUTO TAXLABELS=FULL TORDER=RIGHT;
    [tidies up the output]
OUTGROUP 1-2; [set your outgroups]
hsearch addseq=random nchuck=3 chuckscore=1 nreps=1000;
    [the basic treespace search; you can increase the hit rate by increasing nchuck, range 5-10]
hsearch start=current nchuck=0 chuckscore=0;
   [fills out the treespace]
savetrees; [save your result]
describe 1 / noplot; [get statistics on a tree]
contree; [get the strict consensus tree]

   [set up for a bootstrap]
defaults hsearch addseq=random nchuck=10 chuckscore=1 nreps=10;
   [heuristic options for each bootstrap iteration: keeping up to 10 trees per replicate allows a high hit rate, and nreps=10 limits the number of trees to 100 maximum (10 x 10) so as to avoid iteration bias in the bootstrap]
bootstrap nreps=1000; [for p <= 0.001]
[or bootstrap nreps=100; for p <= 0.01]
log stop;
END;
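The tree cap in the bootstrap comment is simple arithmetic, and it can be handy to regenerate the whole block when you change settings. The Python sketch below is a hypothetical helper, not part of PAUP*; `paup_block` and `bootstrap_tree_cap` are illustrative names, and the helper only emits commands that appear in the block above.

```python
def bootstrap_tree_cap(nchuck: int, nreps: int) -> int:
    """Maximum trees kept per bootstrap replicate: nreps random-addition
    searches, each retaining at most nchuck trees (chuckscore=1)."""
    return nchuck * nreps


def paup_block(outgroup: str = "1-2", search_nreps: int = 1000,
               search_nchuck: int = 3, boot_nchuck: int = 10,
               boot_nreps: int = 10, bootstrap_reps: int = 1000) -> str:
    """Assemble the PAUP block from parameters (hypothetical helper)."""
    lines = [
        "BEGIN PAUP;",
        "log start;",
        "SET TCOMPRESS=YES INCREASE=AUTO TAXLABELS=FULL TORDER=RIGHT;",
        f"OUTGROUP {outgroup};",
        # Stage 1: many random-addition replicates, few trees kept per replicate.
        f"hsearch addseq=random nchuck={search_nchuck} chuckscore=1 nreps={search_nreps};",
        # Stage 2: swap on all trees in memory with the limits switched off.
        "hsearch start=current nchuck=0 chuckscore=0;",
        "savetrees;",
        "describe 1 / noplot;",
        "contree;",
        # Bootstrap search settings; trees per replicate <= boot_nchuck * boot_nreps.
        f"defaults hsearch addseq=random nchuck={boot_nchuck} chuckscore=1 nreps={boot_nreps};",
        f"bootstrap nreps={bootstrap_reps};",
        "log stop;",
        "END;",
    ]
    return "\n".join(lines)


print(bootstrap_tree_cap(10, 10))   # 100
print(paup_block())
```

With nreps=5 per bootstrap replicate instead, `bootstrap_tree_cap(10, 5)` gives a cap of 50 trees.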


To study character evolution, find the two most different trees using the treedist command.
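As a sketch of what "most different" can mean: below, each rooted tree is represented as a set of clades (frozensets of taxon names, trivial clades omitted), and the distance is the number of clades present in one tree but not the other, a Robinson-Foulds-style symmetric difference. Whether this matches the metric PAUP*'s treedist reports with your settings is an assumption to check; the tree names and taxa are hypothetical.

```python
from itertools import combinations


def tree(*groups: str) -> frozenset:
    """Build a tree as a set of clades; each clade is written as a string of
    one-letter taxon names, e.g. "AB" for the clade (A,B)."""
    return frozenset(frozenset(g) for g in groups)


def clade_distance(t1: frozenset, t2: frozenset) -> int:
    """Number of clades found in exactly one of the two trees."""
    return len(t1 ^ t2)


def most_different_pair(trees: dict) -> tuple:
    """Return (name1, name2, distance) for the most distant pair of trees."""
    best = None
    for (n1, t1), (n2, t2) in combinations(trees.items(), 2):
        d = clade_distance(t1, t2)
        if best is None or d > best[2]:
            best = (n1, n2, d)
    return best


trees = {
    "t1": tree("AB", "DE", "CDE"),   # ((A,B),(C,(D,E)))
    "t2": tree("AB", "CD", "CDE"),   # ((A,B),((C,D),E))
    "t3": tree("AC", "DE", "BDE"),   # ((A,C),(B,(D,E)))
}
print(most_different_pair(trees))    # ('t2', 't3', 6)
```

t2 and t3 share no clades, so they are the pair to compare when mapping characters.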

Bootstrapping (and the related jackknifing) is a method that simulates natural variation in your data, and it can be used in many statistical contexts where we lack standard parametric methods for calculating such things as variances or means. Sampling with replacement means that the algorithm generates new datasets from your original data by sampling characters at random until each new dataset is the same size as the original. "With replacement" means that a data item (here, a character), once sampled, can be sampled again: it is put back into the original dataset before the next pick. What phylogenetic programs actually do is add a weight of 1 to a character each time it is picked at random. As a result, a resampled dataset may lack some characters entirely and give extra weight to others.
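The weighting scheme just described can be sketched in a few lines of Python. This is an illustration of resampling with replacement, not PAUP*'s internal code; the function name is hypothetical.

```python
import random


def bootstrap_weights(n_chars: int, rng: random.Random) -> list[int]:
    """One bootstrap pseudoreplicate expressed as character weights.

    n_chars characters are drawn at random *with replacement*; every draw
    adds 1 to the weight of the character drawn. Weights always sum to
    n_chars, so the resampled dataset is the same size as the original.
    """
    weights = [0] * n_chars
    for _ in range(n_chars):
        weights[rng.randrange(n_chars)] += 1
    return weights


rng = random.Random(42)            # seeded only for reproducibility
w = bootstrap_weights(20, rng)
print(w)                           # zeros mark characters left out
print(sum(w), w.count(0))          # the sum is always 20
```

In any one replicate, the expected fraction of characters with weight 0 is (1 - 1/n)^n, which approaches e^-1, or about 37%, for large datasets.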

Now, in the context of a phylogenetic analysis, you need to decide whether you are interested in the presence of one particular clade. Although bootstrap analyses are often presented as tests of entire trees or of multiple clades, this use is statistically suspect because it amounts to multiple testing. If the bootstrap values for several clades are being used jointly in this way, a more stringent alpha may be necessary.
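One common way to make alpha more stringent is a Bonferroni correction, which divides alpha by the number of clades tested jointly; this is my choice of method for illustration, not necessarily the author's. The sketch also relates alpha to the number of bootstrap replicates, following the pairing used in the PAUP block above (1000 replicates for p <= 0.001, 100 for p <= 0.01): with B replicates, the smallest nonzero proportion you can observe is 1/B. Function names are hypothetical.

```python
import math
from fractions import Fraction


def bonferroni_alpha(alpha: Fraction, n_clades: int) -> Fraction:
    """Per-clade significance level when n_clades clades are tested jointly."""
    return alpha / n_clades


def min_bootstrap_reps(alpha: Fraction) -> int:
    """Smallest replicate count B whose resolution 1/B reaches alpha."""
    return math.ceil(1 / alpha)


# Example: five clades judged jointly at an overall alpha of 0.05.
per_clade = bonferroni_alpha(Fraction(5, 100), 5)   # Fraction(1, 100)
print(per_clade, min_bootstrap_reps(per_clade))     # 1/100 100
```

Exact fractions are used so that, for example, 1/0.01 does not pick up floating-point error before rounding up.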

For a bootstrap analysis, here are some good settings that avoid iteration bias but still give you a fairly quick analysis:

defaults hsearch addseq=random nchuck=10 chuckscore=1 nreps=5;

bootstrap nreps=1000;
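The bootstrap value of a clade is the proportion of pseudoreplicates whose trees contain it. The Python sketch below assumes that when a replicate search returns several equally short trees, each gets an equal share of that replicate's weight; trees are sets of clades (frozensets of taxon names), and the replicate data and helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def clade_support(replicates: list) -> dict:
    """replicates: one list of trees per bootstrap replicate; each tree is a
    set of clades, each clade a frozenset of taxon names. Returns the
    bootstrap proportion of every clade seen."""
    support = defaultdict(Fraction)        # Fraction() == 0
    for trees in replicates:
        share = Fraction(1, len(trees))    # split the replicate's weight of 1
        for t in trees:
            for clade in t:
                support[clade] += share
    return {clade: s / len(replicates) for clade, s in support.items()}


def clades(*groups: str) -> frozenset:
    """Helper: "AB" -> the clade {A, B}."""
    return frozenset(frozenset(g) for g in groups)


reps = [
    [clades("AB", "DE")],
    [clades("AB", "CD")],
    [clades("AB", "DE"), clades("AC", "DE")],   # two equally short trees
    [clades("AC", "DE")],
]
support = clade_support(reps)
print(support[frozenset("AB")], support[frozenset("DE")])   # 5/8 3/4
```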

Some phylogeneticists are suspicious of the value of the bootstrap in a cladistic context, so you may also explore the permutation tail probability (PTP) analysis, which is also available in PAUP* via the permute command. I am suspicious of a straight PTP that tests the level of hierarchical structure in a tree, because I believe that it is insensitive. You may study the literature on the subject over the last decade (starting with Faith, 1990; Faith & Cranston, 1991) and make your own decision. If you wish to test two different hypotheses (e.g., a tree from the literature and your final tree), you can use the compare2 option, or simply test the monophyly of a particular group using a constraint tree with only that clade represented. Used in this context, I believe that permutation results in more powerful tests.
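A PTP test compares the shortest tree for your data with the shortest trees for many datasets in which each character's states have been shuffled among taxa. The Python sketch below shows only the randomization step and the tail-probability arithmetic; the tree searches themselves would be run in PAUP* or another program and are not reproduced here. The (1 + count) / (1 + N) convention, which counts the observed dataset among the permutations, is one common choice and an assumption here; function names and data are hypothetical.

```python
import random


def permute_characters(matrix: list[str], rng: random.Random) -> list[str]:
    """Core randomization step of a PTP test (sketch).

    matrix: one string of character states per taxon (all the same length).
    Each character (column) is shuffled independently among taxa, which keeps
    every character's state frequencies but destroys the hierarchical
    structure shared between characters.
    """
    n_chars = len(matrix[0])
    columns = []
    for j in range(n_chars):
        col = [row[j] for row in matrix]
        rng.shuffle(col)
        columns.append(col)
    return ["".join(col[i] for col in columns) for i in range(len(matrix))]


def ptp_p_value(observed_length: int, permuted_lengths: list[int]) -> float:
    """Fraction of datasets (the observed one included) whose shortest tree
    is no longer than the observed shortest tree."""
    as_short = sum(1 for length in permuted_lengths if length <= observed_length)
    return (1 + as_short) / (1 + len(permuted_lengths))


matrix = ["00000", "00011", "01111", "11111"]   # hypothetical 4 taxa x 5 chars
print(permute_characters(matrix, random.Random(1)))
# Shortest-tree lengths would come from a search on each permuted matrix;
# these numbers are placeholders, not results.
print(ptp_p_value(5, [7, 8, 6, 5, 8, 7, 7, 6, 8, 7]))
```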

Lots of teaching documents are available on the internet, including the Compleat Cladist and Diana Lipscomb's cladistics course at George Washington University. Start at the Hennig site and navigate from there. A copy of the methods book by Kitching et al. (1999, Systematics Association Special Publication 11) would be helpful.

Queries may be made to Buz Wilson
