Diana, Goddess of the Hunt — for Ancestors!
 
 
What is the Difference between a Cluster and a Clade?
And Why Does This Matter for Researchers in Genetic Genealogy?
If you have ventured into the world of genetic genealogy, sooner or later you will encounter two terms:  "cluster" and "clade."  These terms are sometimes used interchangeably, as if they were synonymous, but they're not.  They are technical terms from systematic biology and related fields, such as paleontology, evolutionary biology, and now genetic genealogy.  As technical terms, they have very specific meanings, but as with all technical terms that pass into popular usage, their meanings get blurred (the most unfortunate example being the scientific meaning of the word "theory" versus its popular meaning, but I digress).  As succinctly as I can put it:
A cluster is a group of things placed together on the basis of their resemblance to one another, irrespective of their evolutionary relationship, if any.  It may be a group of objects, a group of species, a group of individuals, or, in the case of genetic genealogy, typically, a group of Y-DNA STR haplotypes.

When morphological traits (i.e., physical traits) are used to define the groups, the method is usually referred to as phenetics or, more broadly, cluster analysis.  When the traits are genetic, one commonly used clustering method is neighbor-joining.  The tree produced by clustering is usually called a dendrogram or phenogram.  Many other clustering methods are also in use.
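To make resemblance-based grouping concrete, here is a minimal Python sketch that clusters Y-DNA STR haplotypes purely by how similar their repeat counts are.  It assumes each haplotype is a list of repeat counts at the same markers, measures distance as the sum of absolute repeat differences (a simple step-wise count), and joins haplotypes by single linkage under a hypothetical distance threshold.  This is a deliberately simple stand-in, not neighbor-joining or any testing company's method; the sample names and values are invented.

```python
from itertools import combinations


def str_distance(a: list[int], b: list[int]) -> int:
    """Sum of absolute repeat-count differences (a simple step-wise distance)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def single_linkage_clusters(haplotypes: dict[str, list[int]], threshold: int) -> list[set[str]]:
    """Group haplotypes linked by a chain of pairwise distances <= threshold.

    Resemblance is the only criterion here; nothing checks common ancestry.
    """
    # Union-find: every haplotype starts in its own group.
    parent = {name: name for name in haplotypes}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(haplotypes, 2):
        if str_distance(haplotypes[a], haplotypes[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in haplotypes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical haplotypes: repeat counts at four made-up markers.
samples = {
    "A": [13, 24, 14, 11],
    "B": [13, 24, 15, 11],
    "C": [13, 23, 15, 11],
    "D": [14, 22, 17, 12],
    "E": [14, 22, 16, 12],
}

for cluster in single_linkage_clusters(samples, threshold=1):
    print(sorted(cluster))
```

With a threshold of 1, A, B, and C chain together and D and E form a second group.  Note that A and C end up in the same cluster even though they differ by 2, simply because B resembles both; nothing in the procedure asks whether they share an ancestor.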

A clade is a group of organisms defined by their having a common biological ancestor, irrespective of how closely they may or may not resemble one another.  To be a clade, the group must contain the ancestor and all of its descendants — and no one else — a condition known as monophyly ("mono-" meaning "one" and "phyl-" meaning race, tribe, or kind).

In genetic genealogy, a clade usually refers to a group united by sharing the same SNP mutations.  By logically deducing the order in which SNP mutations appeared, it's possible to produce a hierarchical branching diagram called a haplotree, which is, by the nature of its construction, a cladogram.  The tree as a whole is a clade, as is each of its branches, and the method used to form such clades and trees is called cladistic analysis, or simply cladistics.  At each branch in the tree, a determination must be made as to the "polarity" of the trait involved, that is, which state is ancestral (-) and which is derived or descendant (+), so that the chronological order can be logically deduced.  Cladograms may be node-based (at the forks of the tree) or branch-based (on the limbs of the tree), and may or may not include time as a factor, usually indicated by the lengths of the branches in the diagram.
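The logic of ordering SNPs can be sketched in a few lines of Python.  This is a minimal illustration, not how any testing company builds its haplotree: it assumes polarity is already settled (a kit is listed as carrying a SNP only if it is derived, +), and that every SNP arose once and never reverted.  Under that assumption, if every carrier of SNP B also carries SNP A, then B sits downstream of A.  SNPs with identical carriers cannot be ordered and are merged into one branch.  Kit names and SNP names (P1 to P6) are hypothetical.

```python
from itertools import combinations
from typing import Optional


def build_haplotree(derived: dict[str, set[str]]) -> dict[str, Optional[str]]:
    """Infer {branch: parent_branch} from which kits are derived (+) for which SNPs.

    Assumes each SNP arose once and never reverted, so the carriers of a
    downstream SNP are always a subset of the carriers of each SNP above it.
    """
    # Collect the kits carrying each SNP.
    carriers: dict[str, set[str]] = {}
    for kit, snps in derived.items():
        for snp in snps:
            carriers.setdefault(snp, set()).add(kit)

    # SNPs with identical carriers cannot be ordered; treat them as one branch.
    grouped: dict[frozenset, list[str]] = {}
    for snp, kits in carriers.items():
        grouped.setdefault(frozenset(kits), []).append(snp)
    branches = {kits: "/".join(sorted(snps)) for kits, snps in grouped.items()}

    # Partially overlapping carrier sets violate the no-recurrence assumption.
    for a, b in combinations(branches, 2):
        if a & b and not (a <= b or b <= a):
            raise ValueError(f"conflict between {branches[a]} and {branches[b]}")

    tree: dict[str, Optional[str]] = {}
    for kits, name in branches.items():
        # The parent is the smallest carrier set that strictly contains this one.
        supersets = [other for other in branches if kits < other]
        tree[name] = branches[min(supersets, key=len)] if supersets else None
    return tree


# Hypothetical kits and hypothetical SNP names.
derived = {
    "Kit1": {"P1", "P2", "P3"},
    "Kit2": {"P1", "P2", "P3"},
    "Kit3": {"P1", "P2"},
    "Kit4": {"P1", "P4", "P6", "P5"},
    "Kit5": {"P1", "P4", "P6"},
}

for branch, parent in sorted(build_haplotree(derived).items()):
    print(f"{branch} <- {parent or 'top of tree'}")
```

Here P4 and P6 have exactly the same carriers, so they come out as one branch, "P4/P6", the way equivalent SNPs share a branch on a haplotree until further testing splits them.  The conflict check is where convergence or a bad call would show up.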

Some examples show how clusters and clades differ.  Grouping birds by plumage color would not produce monophyletic groups, nor would the resulting dendrogram reflect their evolutionary relationships:  a Scarlet Ibis, for instance, is more closely related to a White Ibis than it is to a Scarlet Macaw.  Similarly, while mice and humans don't much resemble one another, at least not superficially (internally they do), they both belong to the Class Mammalia, a valid monophyletic clade defined by such traits as endothermy, the possession of hair, and the suckling of young through lactation.  Zoology is rife with examples of organisms whose appearance belies their actual evolutionary relationships (e.g., elephants being the closest relatives of sea cows and hyraxes).
As you can see, clusters and clades are formed by different methods, and the groups they produce differ in nature.  A cluster is defined by resemblance, which may or may not reflect common ancestry, while a clade is defined entirely by common ancestry, established through logical deduction, despite what may be an apparent lack of resemblance.
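The bird example can be expressed as a small Python check.  The sketch below treats every node of a tree as defining exactly one clade (the node plus all of its descendants), so a group is monophyletic only if it matches one of those tip sets exactly.  The tree is a toy: ibises and macaws are placed side by side under a single "birds" node purely for illustration, not as a statement of real bird phylogeny.

```python
def clades(children: dict[str, list[str]], root: str) -> set[frozenset]:
    """Return every clade in the tree as the set of tips under one node."""
    found: set[frozenset] = set()

    def collect(node: str) -> frozenset:
        kids = children.get(node, [])
        if not kids:
            tips = frozenset([node])  # a tip is its own trivial clade
        else:
            tips = frozenset().union(*(collect(kid) for kid in kids))
        found.add(tips)
        return tips

    collect(root)
    return found


def is_monophyletic(group: set[str], children: dict[str, list[str]], root: str) -> bool:
    """A group is monophyletic only if it matches one of the tree's clades exactly."""
    return frozenset(group) in clades(children, root)


# Toy tree; topology simplified for illustration only.
tree = {
    "birds": ["ibises", "macaws"],
    "ibises": ["Scarlet Ibis", "White Ibis"],
    "macaws": ["Scarlet Macaw", "Blue-and-yellow Macaw"],
}

# Cluster by resemblance (plumage color), ignoring ancestry.
color = {
    "Scarlet Ibis": "scarlet",
    "White Ibis": "white",
    "Scarlet Macaw": "scarlet",
    "Blue-and-yellow Macaw": "blue-and-yellow",
}
by_color: dict[str, set[str]] = {}
for bird, shade in color.items():
    by_color.setdefault(shade, set()).add(bird)

print(is_monophyletic(by_color["scarlet"], tree, "birds"))             # False
print(is_monophyletic({"Scarlet Ibis", "White Ibis"}, tree, "birds"))  # True
```

The "scarlet" cluster is a perfectly good group by resemblance, but it is not a clade, because the smallest clade containing both scarlet birds also contains the other two.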

When genetic data are used, the cluster often has a high statistical probability of also being a clade (of having a single common ancestor), but there always remains the possibility that it isn't a clade due to convergence or parallelism (i.e., coincidence). 

Proving that a group is monophyletic is important because the single greatest problem in evolutionary biology is detecting convergence and parallelism, which can make organisms or haplotypes resemble one another despite their different ancestral origins.  Convergence and parallelism are less of a problem with genetic data than with traditional morphological traits, but the problem has not disappeared.
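A tiny deterministic example shows how convergence arises with STRs.  The sketch assumes the simple step-wise picture in which an STR mutation adds or removes one repeat; the founders, their clades (X and Y), and the marker values are all hypothetical.

```python
def str_distance(a: list[int], b: list[int]) -> int:
    """Sum of absolute repeat-count differences across markers."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mutate(haplotype: list[int], steps: dict[int, int]) -> list[int]:
    """Return a copy with single-step STR mutations applied (marker index -> +1 or -1)."""
    result = list(haplotype)
    for marker, delta in steps.items():
        result[marker] += delta
    return result


# Hypothetical founders of two different SNP clades, X and Y.
founder_x = [13, 24, 14, 11]
founder_y = [13, 23, 15, 11]

# Two ordinary one-step mutations in the X line...
descendant_x = mutate(founder_x, {1: -1, 2: +1})

# ...make it identical by state to founder_y, though not identical by descent.
print(descendant_x)                           # [13, 23, 15, 11]
print(str_distance(descendant_x, founder_y))  # 0
```

Any resemblance-only method would put descendant_x and founder_y in the same cluster at distance zero; only independent evidence, such as SNP results, would reveal that they belong to different clades.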

The need to distinguish coincidental resemblance from true common ancestry has already led people analyzing autosomal DNA to coin the terms "IBS" (identical by state) and "IBD" (identical by descent).  As more and more people are tested, the risk of mistaking resemblance for relationship will only increase.
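For autosomal data, one common way to find stretches that could be shared is to scan two people's unphased genotypes for runs with no "opposite homozygotes" (one person has two copies of an allele where the other has none, so they cannot share an allele there).  The Python sketch below does only that scan.  A run found this way is identical by state; whether it is identical by descent is a further judgment, usually weighted heavily by length.  The genotype coding (0, 1, or 2 copies of one allele), the data, and the MIN_MARKERS cutoff are hypothetical, and real tools work with segments far longer than this toy.

```python
def ibs_runs(g1: list[int], g2: list[int]) -> list[tuple[int, int]]:
    """Return (start, end) marker ranges, end exclusive, with no opposite homozygotes.

    Genotypes are coded as the number of copies (0, 1, or 2) of one allele.
    Two people can share no allele at a marker only when one is 0 and the
    other is 2, so every marker inside a returned run is compatible with a
    shared segment.
    """
    if len(g1) != len(g2):
        raise ValueError("genotypes must cover the same markers")
    runs = []
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if abs(a - b) == 2:  # opposite homozygotes break the run
            if i > start:
                runs.append((start, i))
            start = i + 1
    if len(g1) > start:
        runs.append((start, len(g1)))
    return runs


# Hypothetical, very short genotype strings for two testers.
person_1 = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
person_2 = [1, 1, 2, 1, 0, 2, 1, 1, 0, 1]

MIN_MARKERS = 5  # hypothetical cutoff for calling a run a candidate IBD segment

runs = ibs_runs(person_1, person_2)
candidates = [(s, e) for s, e in runs if e - s >= MIN_MARKERS]
print(runs)        # [(0, 5), (6, 10)]
print(candidates)  # [(0, 5)]
```

Both runs are identical by state; the cutoff only decides which ones are worth treating as possible evidence of descent.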

There needs to be something "outside" the cluster that independently ensures its monophyly.  In genetic genealogy, pinning STR clusters to a SNP cladogram (i.e., using STR testing and SNP testing in combination) is one way of confirming that the clusters are also clades.  Another way is using paper pedigrees to determine the polarity of STR mutations, though this is only useful within genealogical time.  Such proofs, where independent forms of evidence support the same conclusion, are often termed "elegant" because of their scientific rigor.
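Pinning an STR cluster to a SNP cladogram can be sketched as follows.  This is an illustration under stated assumptions, not any project's procedure: it assumes each kit has a known terminal SNP, the haplotree is a single rooted tree given as a child-to-parent map, and the kit and SNP names are hypothetical.  The function finds the most downstream SNP branch shared by every member of the STR cluster, then lists tested kits in that SNP clade that the cluster left out.

```python
def upstream_path(snp: str, parent: dict[str, str]) -> list[str]:
    """SNP branches from `snp` up to the root of the haplotree, `snp` first."""
    path = [snp]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def pin_cluster(cluster: set[str], terminal: dict[str, str],
                parent: dict[str, str]) -> tuple[str, set[str]]:
    """Return (branch, left_out) for an STR cluster.

    branch is the most downstream SNP shared by every cluster member;
    left_out holds tested kits in that SNP clade missing from the cluster.
    An empty left_out means the cluster matches the SNP clade among the
    kits tested.  Assumes a single rooted haplotree.
    """
    paths = [upstream_path(terminal[kit], parent) for kit in sorted(cluster)]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any member, the first shared branch is the most downstream one.
    branch = next(snp for snp in paths[0] if snp in shared)
    clade = {kit for kit, snp in terminal.items() if branch in upstream_path(snp, parent)}
    return branch, clade - cluster


# Hypothetical haplotree (child -> parent) and terminal SNPs.
parent = {"P2": "P1", "P3": "P2", "P4": "P1"}
terminal = {"Kit1": "P3", "Kit2": "P3", "Kit3": "P2", "Kit4": "P4", "Kit5": "P4"}

for cluster in ({"Kit1", "Kit2", "Kit3"}, {"Kit3", "Kit4"}):
    branch, left_out = pin_cluster(cluster, terminal, parent)
    print(sorted(cluster), "->", branch, "left out:", sorted(left_out))
```

The first cluster lines up exactly with the P2 clade.  The second can only be pinned as far down as P1, and three kits under P1 fall outside it, so by SNP evidence it is not a clade.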

Finally, and most importantly, I would like to emphasize that SNP status is not a mere "attribute" of the STR cluster.  The organization of SNP mutations into a cladogram (haplotree) supersedes the organization of STR mutations into clusters.  You are a SNP clade first, and an STR cluster second.

UPDATE (10 Oct 2015):  With the BigY and other advanced SNP tests bringing the Y-DNA haplotree down into genealogical time, SNP testing has become useful for supporting paper genealogies, pinning individual STR mutations, not just STR clusters, to a SNP cladogram.  When the two are in sync, you have solid support for both the phylogeny and the genealogy.
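One simple way to start pinning individual STR mutations is to look within each terminal SNP branch for kits whose value at a marker departs from the branch's usual value.  The Python sketch below assumes, as a heuristic rather than a proof, that the most common (modal) value among kits sharing a terminal SNP is that branch's ancestral value; kits that differ are then candidate STR mutations on their own lines below the SNP, for a paper pedigree to confirm or refute.  It looks at one marker and at terminal SNPs only, and all kit names, SNP names, and repeat counts are hypothetical.

```python
from collections import Counter


def deviations_by_branch(terminal: dict[str, str],
                         repeats: dict[str, int]) -> dict[str, tuple[int, list[str]]]:
    """For one STR marker, return {snp: (modal_value, kits_that_differ)}.

    terminal: kit -> terminal SNP the kit tested positive for.
    repeats:  kit -> repeat count at the marker.
    Treating the modal value as ancestral for the branch is a heuristic.
    """
    kits_by_branch: dict[str, list[str]] = {}
    for kit, snp in terminal.items():
        kits_by_branch.setdefault(snp, []).append(kit)

    report = {}
    for snp, kits in kits_by_branch.items():
        modal, _ = Counter(repeats[kit] for kit in kits).most_common(1)[0]
        report[snp] = (modal, sorted(kit for kit in kits if repeats[kit] != modal))
    return report


# Hypothetical terminal SNPs and repeat counts at one hypothetical marker.
terminal = {"Kit1": "P3", "Kit2": "P3", "Kit3": "P3",
            "Kit4": "P5", "Kit5": "P5", "Kit6": "P5"}
repeats = {"Kit1": 14, "Kit2": 14, "Kit3": 15,
           "Kit4": 13, "Kit5": 13, "Kit6": 13}

for snp, (modal, differs) in sorted(deviations_by_branch(terminal, repeats).items()):
    print(f"{snp}: modal {modal}, differs: {differs}")
```

Kit3's value would be flagged as a possible mutation below P3; if Kit3's paper line and those of any close relatives who share the value agree, the SNP tree and the genealogy are in sync on that point.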
