A note on numbering:
Perhaps one of the most frustrating things for first-time users of Hennig86 is the way it treats anything numbered. Everything begins with zero (0), not one (1)! Thus the taxa in your matrix, for the purposes of setting outgroups, inputting tree topologies, etc., are considered by Hennig86 to occur in the order 0, 1, 2, 3... and your characters, for the purposes of ordering/unordering, are in the order 0, 1, 2, 3... Similarly, internal treefiles are numbered 0 through 9. Trees output from an analysis are numbered from 0, so if a bunch of them scroll past your eyes and the last one says "tree 5", it is the sixth tree.
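The zero-based convention above can be kept straight with a small helper. This is only a sketch in Python (not part of Hennig86); the function names are hypothetical, and the taxon names are the ones used in the test.hen example of this guide.

```python
def hennig_number(position: int) -> int:
    """Convert a position counted from 1 (first, second, ...) to a Hennig86 number."""
    if position < 1:
        raise ValueError("positions are counted from 1")
    return position - 1


def position_of(hennig_no: int) -> int:
    """Convert a Hennig86 number (counted from 0) back to a position counted from 1."""
    if hennig_no < 0:
        raise ValueError("Hennig86 numbers are counted from 0")
    return hennig_no + 1


# Taxa in the order they appear in the matrix; Hennig86 numbers them 0, 1, 2, ...
taxa = ["zero", "one", "two", "three", "four", "five"]
taxon_numbers = {name: number for number, name in enumerate(taxa)}

if __name__ == "__main__":
    print(taxon_numbers)
    print("'tree 5' is tree number", position_of(5), "when counted from one")
```

Looking names up in a table like taxon_numbers before typing an outgroup or ccode command avoids the classic off-by-one mistake.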
Create the data matrix with a text editor (e.g., EDIT, Norton Editor, or WEDIT) or with a word-processor (e.g., Word or WordPerfect, making sure to save as "text-only", "unformatted", or "DOS text" as the case may be).
The DOS file suffix can be anything you choose, or absent altogether.
Hennig86 recognizes the command "xread" to read in a data matrix. Immediately following "xread" can be a title enclosed in single quotes and then the number of characters followed by the number of taxa, or the title can be omitted.
The following matrices (filename = test.hen) are identical:
Do not put apostrophes or semicolons in your title (e.g., 'Smith's data; reanalysis').

xread
'test file'
7 6
zero    0000000
one     0001110
two     0111211
three   0111111
four    1012111
five    1012110
;

xread 'test file' 7 6
zero    0 0 0 0 0 0 0
one     0 0 0 1 1 1 0
two     0 1 1 1 2 1 1
three   0 1 1 1 1 1 1
four    1 0 1 2 1 1 1
five    1 0 1 2 1 1 0
;

xread 'test file' 7 6 zero 0000000 one 0001110 two 0111211 three 0111111 four 1012111 five 1012110;
Thus, taxon names can be of any length, and any number of spaces or carriage returns can be inserted between taxon names and character states.
To designate unknown or inapplicable states use either "-" or "?". By convention, "-" denotes inapplicable and "?" denotes unknown, though they are treated exactly the same by the algorithm.
The semicolon at the end is optional for Hennig86 but is absolutely required for CLADOS, and for certain algorithms in Random Cladistics.
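If your matrix lives in a spreadsheet or script, the xread layout described above can be generated rather than typed. The following is a minimal Python sketch, not a Hennig86 tool; the function name format_xread is hypothetical, and it assumes one single-character state per character, as in the test.hen example.

```python
def format_xread(title, matrix):
    """Return the text of a Hennig86 data file.

    matrix is a list of (taxon_name, states) pairs, where states is a string
    with one symbol per character (e.g. "0001110", "?" or "-" for missing).
    """
    if "'" in title or ";" in title:
        raise ValueError("no apostrophes or semicolons in the title")
    lengths = {len(states) for _, states in matrix}
    if len(lengths) != 1:
        raise ValueError("every taxon needs the same number of characters")
    nchar = lengths.pop()
    # Number of characters comes first, then number of taxa.
    lines = ["xread", f"'{title}'", f"{nchar} {len(matrix)}"]
    lines += [f"{name} {states}" for name, states in matrix]
    lines.append(";")  # optional for Hennig86, required by CLADOS
    return "\n".join(lines) + "\n"


test_matrix = [
    ("zero", "0000000"),
    ("one", "0001110"),
    ("two", "0111211"),
    ("three", "0111111"),
    ("four", "1012111"),
    ("five", "1012110"),
]

if __name__ == "__main__":
    with open("test.hen", "w") as handle:
        handle.write(format_xread("test file", test_matrix))
```

Writing the counts from the matrix itself guards against the most common input error, a header that does not match the matrix size.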
To read in the matrix, type "proc test.hen;" at the *>
prompt. If you receive a "procedure>" prompt it means you
neglected to include the semicolon after "proc test.hen";
simply type the semicolon and press Enter.
Optionally, you may place the command "proc /;" at the end
of the data matrix. This will allow all successive commands to
be entered on a single line from DOS (e.g., C:\> ss proc test.hen; mh; bb; tp;)
instead of having to do it interactively.
I recommend operating Hennig86 interactively until you know
what you're doing.
You may also enter a data matrix while you are in Hennig86. However, you can't save it, so why bother? To do this, type the following:
*> xread 'test file'
If you leave off the semicolon, Hennig86 stays in xread mode and gives you the xread> prompt.
To start up Hennig86 just type "ss" at the DOS prompt.
You will be met with the generic Hennig86 prompt *>.
To read in your matrix type "proc" and your filename followed by a
semicolon (e.g., *> proc test.hen; ). If you receive a procedure> prompt
it means you forgot to put the semicolon in... just enter it at this
prompt and all will be fine.
HOWEVER...
Usually you will want both to see what you are doing and
to have the results saved to a logfile that can be examined
later. By default, Hennig86 outputs only to the
screen, not to a file. However, if you define a logfile,
output will go only to the logfile and not to the screen!
To get both, begin by declaring a logfile of any name with a "log" command:
*> log test.out; opens a file for output
then enter "display*;" and continue. This will create a logfile called "test.out"
and will turn screen display back on.
At any time you can suppress output to the logfile by typing
"log-;" and turn it back on by typing "log*;". Similarly
you can turn off the output to the screen by typing
"display-;".
You might want to do this as you fiddle around and figure out what's happening to your data as you analyse it while not dumping extraneous stuff to your logfile.
Hennig86 will automatically choose the first taxon in the
matrix as the outgroup if none is specified.
To specify outgroups type "outgroup = " followed by their
numbers. TAXA ARE NUMBERED FROM ZERO! NOT FROM ONE!
For example:
*> outgroup = 0 1 2;
To view which taxa have been set as outgroups type:
*> outgroup;
Okay this next bit is important so pay attention... Hennig86 will not
constrain all designated outgroup taxa to sit outside of the ingroup if it
is not globally parsimonious to do so (usually it is though if you have
chosen ingroup and outgroup taxa with care). If more than
one taxon is designated as an outgroup, one of them is always considered
to be the "prime" outgroup taxon. That is, the ultimate root for the tree
is determined by a single prime outgroup taxon. BY DEFAULT, THIS IS THE
NUMERICALLY-FIRST TAXON DESIGNATED. If you wish some other outgroup taxon
to be designated as prime (for example taxon 1 instead of taxon 0 in the
example above), type the following:
*> outgroup = 0 1 2 /1;
The front-slash designates the prime taxon.
Note, it is impossible to interactively inactivate taxa with
Hennig86. This can only be accomplished by editing the input
file, and changing the specified number of taxa following the xread
statement.
To view the current status of character codes type "ccode;"
alone. Something like the following will be shown:
The top line is the character number (FIRST = ZERO). The
second line has three positions for each character. The
first position is the weight (all weighted 1 above). The
second position indicates whether the character is "ordered"
(+) or "unordered" (-). The third position indicates
whether the character is active ([) or inactive (]).
NOTE: the default options in Hennig86 are as above: unit weight, ordered and active. If this is not your preference you must change it yourself. For example, if you are running DNA sequence characters, you must unorder all multistate characters.
To change weights type "ccode" followed by a slash and the
weight followed by the character(s) to be weighted. For
example:
*> ccode /2 3 5; weights the fourth and sixth by 2
To change "ordering" is similar:
*> ccode - 3 5; unorders the fourth and sixth
To change whether or not the character is active:
*> ccode ] 3 5; inactivates the fourth and sixth
Ranges of characters can be identified by placing a period
between them. For example:
*> ccode - 3.5; unorders 3 through 5
More than one coding change can be performed in a single command:
*> ccode - 10.23,5 /2 8,17,24 /3 6,10 ] 16;
The above will unorder the eleventh through the 24th, as well as unorder the 6th, will weight the ninth, eighteenth and 25th by 2, will weight the seventh and eleventh by 3 and will inactivate the seventeenth.
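Because character lists in ccode commands are zero-based and may contain ranges, it can help to expand them before typing. The sketch below is plain Python, not Hennig86; the function name expand_spec is hypothetical, and it assumes only the notations shown in this guide (single numbers, "first.last" ranges, commas or spaces as separators, and "." alone for all characters).

```python
def expand_spec(spec: str, nchar: int) -> list[int]:
    """Expand a ccode character list such as "10.23,5" into Hennig86 numbers."""
    chars = []
    for item in spec.replace(",", " ").split():
        if item == ".":
            # A lone period stands for every character in the matrix.
            chars.extend(range(nchar))
        elif "." in item:
            first, last = item.split(".")
            chars.extend(range(int(first), int(last) + 1))  # ranges are inclusive
        else:
            chars.append(int(item))
    return chars


def as_positions(chars: list[int]) -> list[int]:
    """Translate Hennig86 numbers (from 0) to positions counted from 1."""
    return [c + 1 for c in chars]


if __name__ == "__main__":
    # "ccode - 3.5;" touches the fourth through sixth characters.
    print(as_positions(expand_spec("3.5", 7)))
    print(as_positions(expand_spec("8,17,24", 25)))
```

Running the list through as_positions gives the "ninth, eighteenth and 25th" style of reading used in the explanation above.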
Hennig86 has many ways to find trees, heuristic and exhaustive, depending on how thorough you wish to be.
For fewer than 20 taxa the exhaustive procedures can be used efficiently.
*> ie-;
finds only one of possibly many equally parsimonious trees
*> ie;
finds all equally parsimonious trees provided that there are fewer than 100 of them. At 100, OVERFLOW is indicated.
*> ie*;
uses all available memory to save equally parsimonious trees regardless of number.
*> hennig;
pretty much useless on its own. Calculates one tree by a single pass through the data. Not likely to be of shortest length. As a start-up for branch swapping mhennig is better anyway.
*> mhennig;
calculates multiple trees by multiple passes through the data. The trees are not likely to be most parsimonious, if the data are complicated. Used as starting point for branch swapping.
*> bb;
does nothing if hennig or mhennig has not been done first.
*> mhennig; bb;
generates multiple trees and then applies branch swapping to them to find multiple equally parsimonious trees (up to 100).
*> mhennig; bb-;
as above but holds onto only one tree in the end.
*> mhennig; bb*;
as above but holds onto more than 100 if there are that
many.
In practice you will find that "mhennig; bb;" is a decent place to start looking at your results. However, I would not publish until you have at least attempted "ie;"!
NOTE THAT PERFORMING THE ABOVE ONLY FINDS THE TREES, IT DOES NOT OUTPUT THOSE TREES ANYWHERE, NOR DISPLAY THEM.
To view/output all resulting trees currently in memory:
*> tplot;
You might want to take a quick look at the number found before actually doing this!
NOTE: if you want to look at them before sending them to a
file, don't forget to type "log-;" first to turn off your logfile.
If there are multiple trees the screen will keep scrolling;
use [CTRL]-S to delay the scroll.
NOTE: trees are numbered from 0 (like everything else in
Hennig86).
CAUTION: often, the output to a logfile from a tplot is
unreadable later; if so, go back, re-do the analysis and type "txascii-;" first.
Trees will output to the logfile in angular form.
To view/output all resulting trees in parenthetical
notation:
*> tlist;
Hennig86 will only construct a strict consensus. The command
is "nelsen;" (with an e). It does not create a Nelson (with an o) consensus (i.e.,
combinable components), nor a majority-rule, nor an Adams
consensus.
Doing so will wipe out any memory of the original trees
unless you saved them as an internal treefile.
To see the consensus you must follow with a "tplot;" command of course.
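For readers unsure what a strict consensus retains, the idea can be sketched as follows: only groups of taxa found in every tree survive. This is a Python illustration of the general technique, not Hennig86's own code; trees are written as nested tuples of taxon numbers, and the function names are hypothetical.

```python
def leaves(node):
    """Return the set of taxon numbers below a node."""
    if isinstance(node, int):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def groups(tree):
    """Return every group (set of taxa) delimited by a pair of parentheses."""
    found = set()

    def visit(node):
        if isinstance(node, int):
            return
        found.add(leaves(node))
        for child in node:
            visit(child)

    visit(tree)
    return found


def strict_consensus_groups(trees):
    """Keep only the groups that occur in every one of the trees."""
    common = groups(trees[0])
    for tree in trees[1:]:
        common &= groups(tree)
    return common


if __name__ == "__main__":
    tree_a = ((0, 1), (2, (3, 4)))
    tree_b = ((0, 1), (3, (2, 4)))
    for group in sorted(strict_consensus_groups([tree_a, tree_b]), key=len):
        print(sorted(group))
```

For the two trees in the usage example, the groups (0 1) and (2 3 4) are shared, so the strict consensus is ((0 1) (2 3 4)) with taxa 2, 3 and 4 unresolved.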
Hennig86 allows you to hold on to some of your steps as you go along, using the keep and get commands.
Let's say you want to:
1) calculate a set of MEPT's by a heuristic search command,
2) view all of the trees, and then
3) you want to construct a consensus tree without erasing the
original trees and then
4) view the consensus;
then you decide that you are satisfied with the results and wish to
5) output the original trees as well as
6) output the consensus.
If you entered the commands in the order mhennig; then bb; then tplot; (to see the trees) and then nelsen; this last command would create the consensus but would also wipe out all memory of the
previous trees on which it was based!
The following will circumvent this:
*> proc test.hen; reads in data
*> outgroup = 0 1; sets outgroups
*> mhennig; calculates trees
*> bb; applies branch breaking
*> tplot; outputs trees to screen
*> keep 1; puts trees found by bb into internal treefile #1
*> nelsen; constructs a consensus
*> tplot; views the consensus
*> keep 2; puts the consensus in internal treefile #2
*> log test.out; opens a logfile for output
*> display*; allows screen display as well
*> get 1; retrieves trees in treefile #1;
*> tplot; outputs trees to screen and test.out
*> get 2; retrieves consensus tree in treefile #2;
*> tplot; outputs consensus tree to screen and to test.out
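The bookkeeping behind the keep and get commands in the session above can be modeled in a few lines. This is only a Python sketch of the behaviour described in this guide (ten internal treefiles numbered 0 through 9, with #0 always the current one); the class and method names are hypothetical and nothing here talks to Hennig86.

```python
class TreeFiles:
    """Toy model of Hennig86's internal treefiles #0 to #9."""

    def __init__(self):
        self.slots = {n: [] for n in range(10)}

    def _check(self, n):
        if not 0 <= n <= 9:
            raise ValueError("internal treefiles are numbered 0 through 9")

    def set_current(self, trees):
        """Any tree-producing action (bb, nelsen, tchoose, ...) overwrites #0."""
        self.slots[0] = list(trees)

    def keep(self, n):
        """Copy #0 into #n, overwriting #n and leaving #0 untouched."""
        self._check(n)
        self.slots[n] = list(self.slots[0])

    def get(self, n):
        """Copy #n into #0, overwriting #0 and leaving #n untouched."""
        self._check(n)
        self.slots[0] = list(self.slots[n])

    def erase(self, n):
        """Delete the contents of #n."""
        self._check(n)
        self.slots[n] = []


if __name__ == "__main__":
    files = TreeFiles()
    files.set_current(["tree 0", "tree 1", "tree 2"])  # mhennig; bb;
    files.keep(1)                                      # keep 1;
    files.set_current(["consensus"])                   # nelsen;
    files.keep(2)                                      # keep 2;
    files.get(1)                                       # get 1;
    print(files.slots[0])
```

The point of the model is that nelsen without a prior keep would have left only the consensus in #0, with no way back to the original trees.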
If you study the above you will realize the logic of the following rules:
Internal treefiles are numbered 0 through 9.
Treefile #0 is always current.
Any action will wipe out the contents of #0 replacing it
with the results of that action. These actions include
calculating a consensus, choosing some of the MEPT's with a
tchoose command (see Choosing Trees below), calculating new
trees after weighting etc., retrieving a stored treefile, or
importing a saved external treefile (see External Treefiles
below).
"keep n;" puts tree(s) in #0 into internal treefile #n and
leaves them in #0. If there was already something in #n it
will be wiped out.
"get n;" retrieves tree(s) from #n and puts them into #0,
and leaves them in #n. Whatever was in #0 is wiped out.
"erase n;" deletes the contents of #n.
Now that you know something about how Hennig86 works, you
should be aware that multiple commands can be put on a
single line to be executed as a batch. If you do so, you will
not be able to stop it until it's finished.
Also most Hennig86 commands have short forms that save on
typing time.
Thus the above series of commands detailed in the internal
treefile section can be done as follows:
*> proc test.hen;
*> outgroup = 0 1; mhennig; bb; tplot; keep 1;
*> nelsen; tplot; keep 2;
*> log test.out; display*; get 1; tplot; get 2; tplot;
Or as follows:
*> p test.hen;
*> o = 0 1; m; bb; tp; k 1; n; tp; k 2; l test.out; d*; g 1; tp; g 2; tp;
In general it takes some time to figure this out. Other
less severe abbreviations are allowed (e.g., mhennig = mh =
m and outgroup = out = o).
Commands previously typed and executed can be repeated or
repeated and edited by typing F3.
A more complete listing of commands and abbreviations can be found on the last pages of this guide.
Occasionally, you may like one or more of the MEPT's over
some of the others, though I can't imagine why.
Alternatively, because the tplot command dumps out all trees
on the screen at once you may want to look at them one at a
time.
To do this you need to invoke the tchoose command (or just
tc). But be aware that like taxa and characters, trees are
numbered from zero, not from one.
In practice you'd be wise to save all trees in an internal
treefile first so you can get them back easily.
For example:
*> mh;bb*; calculates MEPTs allowing >100
*> k 1; stores them in internal #1
*> tc 0; keeps only the first (0) in resident (#0)
*> tp; looks at it
*> g 1; retrieves all trees
*> tc 1; tp; keeps only the second and looks at it
*> g 1; tc 2; tp; you get the idea
The xsteps (or xs) command is very versatile, depending on which
options you use.
*> xs; (same as xs l)
outputs the following for 5 equally parsimonious trees in
the default (#0) internal treefile:
which is telling you that all 5 trees have 12 steps given
the data at hand.
Let's say this was based on ordered multistate data and you
want to see what effect unordering has with respect to the
number of implied steps on these 5 trees (as opposed to recalculating trees from the revised codes). The following would accomplish this:
*> cc -.; unorders all characters
*> xs; optimizes all characters on the 5 pre-existing trees
If the results are as follows:
you'd know that, though all trees are equivalent with
respect to the ordered data, they are not all equivalent
with respect to unordered data.
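Why the same tree can need different numbers of steps under ordered and unordered coding can be seen with the two textbook optimizations: Fitch's set method for unordered characters and Farris's interval method for ordered (additive) ones. The Python sketch below implements those standard methods for a single character on a fully resolved tree; it is an illustration, not Hennig86's code, it ignores missing data, and the function names are hypothetical.

```python
def fitch_steps(tree, states):
    """Steps for one UNORDERED character on a binary tree (Fitch's method)."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, int):
            return {states[node]}
        left, right = (visit(child) for child in node)  # binary nodes only
        if left & right:
            return left & right
        steps += 1  # no shared state: one change is needed at this node
        return left | right

    visit(tree)
    return steps


def farris_steps(tree, states):
    """Steps for one ORDERED character on a binary tree (Farris's intervals)."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, int):
            return (states[node], states[node])
        (lo1, hi1), (lo2, hi2) = (visit(child) for child in node)
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo <= hi:
            return (lo, hi)  # intervals overlap: no change needed here
        steps += lo - hi     # cost is the gap between the two intervals
        return (hi, lo)

    visit(tree)
    return steps


if __name__ == "__main__":
    tree = ((0, 1), (2, (3, 4)))
    states = {0: 0, 1: 2, 2: 2, 3: 2, 4: 2}  # taxon number -> state
    print("unordered:", fitch_steps(tree, states))
    print("ordered:  ", farris_steps(tree, states))
```

In the usage example a jump from state 0 to state 2 costs one step when the character is unordered but two steps when it is ordered, which is the kind of difference "xs;" reveals after a "cc -.;".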
For every character on every tree, "xsteps c;" will output
the number of steps, the character CI and the character RI
under the character's number. For example:
If there are multiple trees in resident memory, this will be
done for each tree and will be followed by the best fit for
each character among all trees and the worst fit of each
character among all trees. Typing "xsteps m;" will do the
latter (best and worst) alone.
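The character CI and RI reported by "xsteps c;" follow the usual definitions: CI = m/s and RI = (g - s)/(g - m), where s is the observed number of steps, m the minimum possible and g the maximum possible for that character. The Python sketch below applies these formulas; reporting whole percentages by truncation is an assumption based on the value 66 in this guide's sample output, and the g values in the usage example are hypothetical.

```python
def character_ci(min_steps: int, steps: int) -> int:
    """Consistency index as a whole percentage (truncated, by assumption)."""
    return 100 * min_steps // steps


def character_ri(min_steps: int, steps: int, max_steps: int):
    """Retention index as a whole percentage, or None when it is undefined."""
    if max_steps == min_steps:
        return None  # the character cannot vary in fit; the ratio is 0/0
    return 100 * (max_steps - steps) // (max_steps - min_steps)


if __name__ == "__main__":
    # (minimum, observed, maximum) steps for three illustrative characters
    for m, s, g in [(1, 2, 2), (1, 2, 5), (1, 4, 7)]:
        print(f"steps={s}  CI={character_ci(m, s)}  RI={character_ri(m, s, g)}")
```

A binary character that changes twice has CI 50 whatever the tree, but its RI depends on how many steps it could have required at worst, which is why the two rows of the output differ.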
"xsteps h;" is a little more complicated and has to do with
character state changes and optimizations. In practice
you'll find CLADOS more helpful. For each
tree in resident memory (i.e., internal treefile #0), and for each character,
it pumps out the state (or possible states) implied for each
internal node, according to how the nodes are numbered in the
output of a tplot. This has very limited utility, as the
number of an ancestral node will vary from one equally
parsimonious tree to the next, so the nodes are not
directly comparable across MEPT's.
So, for example, suppose you have two data sets in files test.one and
test.two, WITH THE CAVEAT THAT BOTH DATA SETS HAVE THE SAME
NUMBER OF TAXA ENTERED IN THE SAME ORDER IN THE MATRIX.
Consider the following:
*> proc test.one;
*> mh;bb;
*> tsave one.tre; saves trees to a file
*> proc test.two; imports new data set wiping out old
*> mh; bb;
*> tsave two.tre; saves trees to file
*> proc test.one; retrieves first data set
*> proc two.tre; retrieves results of second
*> xs; forces first data on second results
*> proc test.two; retrieves second data set
*> proc one.tre; retrieves results of first
*> xs; forces second data on first results
If you look at the saved file (e.g., view one.tre) you'll
notice it's in parenthetical notation, with taxon numbers
representing the taxa. This is why, if you do this for
different data sets, everything must be in the same order.
To create a topology of your own you need to know how to
interpret trees in parenthetical notation.
((0 1) (2 (3 4)))
where a pair of matched parentheses delimits a monophyletic
group.
Unresolved taxa are represented as follows:
((0 1) 2 (3 4))
I recommend that, to get used to this, you follow a tplot with a
tlist and compare the two.
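Reading parenthetical notation is mostly a matter of matching parentheses, which a few lines of code can do for you. The following Python sketch converts a tree written with taxon numbers into nested tuples; it is an illustration only, the function name parse_tree is hypothetical, and it insists on fully matched parentheses (it does not handle taxon names or the shortcuts Hennig86 allows).

```python
def parse_tree(text: str):
    """Parse e.g. "((0 1) (2 (3 4)))" into nested tuples of taxon numbers."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # one open list per unclosed parenthesis, plus the top level
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            group = tuple(stack.pop())
            stack[-1].append(group)  # a closed pair of parentheses is one group
        else:
            stack[-1].append(int(token))
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one tree")
    return stack[0][0]


if __name__ == "__main__":
    print(parse_tree("((0 1) (2 (3 4)))"))
    print(parse_tree("((0 1) 2 (3 4))"))  # taxon 2 unresolved
```

Each tuple in the result corresponds to one pair of matched parentheses, that is, to one monophyletic group; a tuple with three members is an unresolved node.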
In any case, you may input a tree topology of your own
making into the resident treefile (i.e., #0) wiping out any
tree(s) that are there currently as follows:
*> tread ((0 1) (2 (3 4)));
which is the same as using the taxon names:
*> tread ((zero one) (two (three four)));
For safety, you should match parentheses. For speed, you don't have to:
*> tread zero one) (two (three four;
is the same as above. Consult the Hennig86 manual for more advanced notations.
Another convenient way of building trees is as follows:
*> tread (0 4) (leave off semicolon so you can keep on tread-ing)
which builds exactly the same tree as above. Note that the pre-existing taxon must appear first in the replacement pair.
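The pair-by-pair way of building a tree amounts to repeatedly replacing one taxon already in the tree with a group made of that taxon and a new sister. A Python sketch of that replacement rule follows, using the pairs from this guide's example, (0 4) then (0 1), (4 2) and (4 3); the function name replace_taxon is hypothetical and this is a model of the idea, not of Hennig86's tread parser.

```python
def replace_taxon(tree, pair):
    """Replace taxon pair[0], already in the tree, with the group pair."""
    existing = pair[0]  # the pre-existing taxon must come first in the pair
    if isinstance(tree, int):
        return pair if tree == existing else tree
    return tuple(replace_taxon(child, pair) for child in tree)


def build(first_pair, *later_pairs):
    """Start from one pair and graft the remaining pairs on in order."""
    tree = first_pair
    for pair in later_pairs:
        tree = replace_taxon(tree, pair)
    return tree


if __name__ == "__main__":
    print(build((0, 4), (0, 1), (4, 2), (4, 3)))
```

The result, ((0 1) ((4 3) 2)), contains the same groups as ((0 1) (2 (3 4))); only the order in which sister taxa are written differs.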
*> yama
or just
*> y
WHEN I TYPE A COMMAND I GET THE COMMAND NAME FOLLOWED BY A
QUESTION MARK.
The command does not exist.
WHEN I TYPE A COMMAND, I GET THE COMMAND NAME FOLLOWED BY A ">".
You forgot the terminal semicolon.
Enter it now.
WHEN I TRY TO READ IN MY DATA FILE I GET CHARACTERS AND A ?
RETURNED AND I CAN'T RUN MY DATA SET.
Your input file is not formatted correctly. Likely, the number of
characters and number of taxa were entered in reverse order (common
among PAUP users), or the numbers do not match the matrix size.
WHEN I TRY TO READ IN MY DATA FILE I GET "open
The name you used is wrong, or not in the directory.
WHEN I TRY TO SET OUTGROUPS, HENNIG86 DOES NOT SET THEM TO THE NUMBERS I HAVE REQUESTED.
You typed "outgroup # # #;" and forgot the "=" sign.
Or you're forgetting that the first taxon in your matrix is #0 not #1.
WHEN I RUN MY DATA SET I GET AN UNREALISTIC TREE WITH HUGE NUMBERS OF STEPS.
You've made the mistake of using 9's for missing data (an old PAUP-ism)
instead of "?" or "-".
WHEN I TYPE "bb" NOTHING HAPPENS.
bb must be preceded by h or mh.
WHEN I RUN MY DATA SET, NOT ALL OF MY OUTGROUPS FALL OUTSIDE OF THE INGROUP.
That's life. You've violated the primary assumption of monophyly
of the ingroup and any results are suspect. Welcome to global parsimony.
WHEN I EXAMINE MY LOGFILE WITH A WORD PROCESSOR, THE TREES ARE FULL OF WEIRD CHARACTERS INSTEAD OF LINES.
Try a different font. Or go back into Hennig86 and type "txascii-;" first.
Note: the portion in bold-roman must be typed; any further portion of the command is optional. For example, for mhennig, m must be typed, but mh, mhe, mhen, mhenn, mhenni, or mhennig will all have the desired effect. Where an ambiguous abbreviation is given (e.g., "t"), Hennig86 will choose for you!
Switches are options that may follow the command, depending on what you want to do. Below they are enclosed in curly braces, which are not part of the command.
assist
batch{-}
bb { -, * }
bytes
ccode{ -, +, /, ], [ }
ckeep
cget
display{ -, * }
erase
files
get
hennig
mhennig
ie{ -, * }
keep
log{ -, *, / }
nelsen
outgroup{ =, / }
procedure
quote
reroot
steps
tchoose
tlist
tplot
tread
tsave
txascii{ -, * }
view{;}
watch
xread
xsteps{l, c, m, h, w, u}
xx
yama
Continuing the interactive xread example, the rest of the matrix is entered at the xread> prompt:
xread> 7 6
xread> zero 0000000
xread> one 0001110
xread> two 0111211
xread> three 0111111
xread> four 1012111
xread> five 1012110; the semicolon tells Hennig86 the end of input
*>
*> display*; allows screen display as well
*> proc test.hen; reads in contents of test.hen
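Example of the display produced by "ccode;" (top line: character number; second line: weight, ordering and activity):
  0    1    2    3    4    5    6
 1+[  1+[  1+[  1+[  1+[  1+[  1+[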
*> ccode - .; unorders all characters
*> ccode /+[ .; returns everything to default of 1+[
see later for interpreting these
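Example of "xs;" output for 5 trees (top line: tree number; second line: number of steps), first with the characters ordered:
  0   1   2   3   4
 12  12  12  12  12
and then after unordering all characters:
  0   1   2   3   4
 10  11  10  11  11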
Example of "xsteps c;" output (lines, top to bottom: character number, steps, CI, RI):
   0    1    2    3    4    5    6    7
   1    2    1    1    3    2    4    1
 100   50  100  100   66   50   25  100
 100    0  100  100  100   75   50  100
tsave" and is followed by a filename of your
choice.
is the same as
tread> (0 1) replace 0 above with 1 and 0 as sister taxa
tread> (4 2) replace 4 above with 2 and 4 as sister taxa
tread> (4 3); replace 4 above with 3 and 4 as sister taxa, semicolon=end
assist      brings up a list of commands; the assistance is not breathtaking
batch       (nil) cause exit to DOS on any error
            -     remain in Hennig86 on any error [default]
bb          branch breaking on pre-existing trees
            (nil) find many, up to 100
            -     find one
            *     find all, up to maximum available memory
bytes       indicates how much memory is taken up by Hennig86
ccode       character coding display and change
ckeep       requires a numerical argument (0 through 9) indicating an internal character-coding file in which to save the codes as they are currently set and from which they can be retrieved later; see cget
cget        requires a numerical argument (0 through 9) indicating the internal character-coding file (previously saved by a ckeep) from which codes are to be reset
display     monitor display toggle
            -     turn off [default after a log command]
            *     turn on [default before a log command, or after logfile is closed]
erase       requires a numerical argument indicating the internal treefile to erase
files       lists internal treefiles by number, the number of trees in each, and the command that created each
get         requires a numerical argument specifying the treefile to retrieve
hennig      calculate one tree by one pass through the data
mhennig     calculate multiple trees by multiple passes through the data
ie          implicit enumeration of all trees to find shortest tree(s)
keep        requires a numerical argument specifying the internal treefile in which to save the trees currently in resident memory
log         logfile opening and output control
            -     stop outputting
nelsen      construct a strict consensus of current trees
procedure   a fairly complicated way of looking at what you're doing. If you specify a filename, Hennig86 looks for the file in the current directory; if the file is not there, it will open one with that filename and expect you to start inputting data at the procedure> prompt. Once you specify proc/; it closes the procedure file. In so far as prepared files go, the file may contain the data matrix alone (i.e., an xread command), and you will then be able to work with Hennig86 interactively, or it may contain any series of commands following an xread command to cause Hennig86 to go ahead and do everything without you having to type anything. Usually you'll just want to put a data matrix in the file so you can run Hennig86 interactively. More than one proc statement may be involved in your running of Hennig86, for example, one for reading the data matrix and one later for reading in trees saved by a tsave command.
quote       allows comment lines to be sent to your logfile
            e.g., quote the next trees are with states unordered;
reroot      used when trees have been calculated (at the expense of your time) and a decision to change the outgroup composition has been made afterwards (i.e., use reroot immediately following an out= command). The tree(s) are re-rooted without having to recalculate them.
steps       longer output than xsteps
tchoose     requires numerical argument(s) specifying which tree(s) of those currently in resident memory are to be kept; the rest are discarded
tlist       like tplot but in parenthetical notation
tplot       show me all trees in resident memory
tread       read in trees in parenthetical notation
tsave       output a Hennig86 procedure file to disk which has a tread and the current trees in parenthetical notation
view        file viewer (do not use the semicolon on the command line, as it will automatically close the file before you get a chance to see the whole thing)
watch       programmer's vanity
xread       must be followed by properly formatted data input
xsteps      tree diagnostics
xx          Dos Equis, character diagnostics, pay attention now...
yama        accepts no argument; exit from Hennig86 unceremoniously and without prompting you if you want to save anything