Y-haplogroups - what could I learn?
Post by merantau on Jan 8, 2020 19:04:28 GMT -5
This note is intended to give you some notion of what can be reasonably expected from Y chromosome haplotyping.
Why would one haplotype Y-chromosomes in surname studies? First, the Y-chromosome is inherited father-to-son (i.e. patrilineally) like a surname. Next, the bulk of the Y-chromosome does not undergo recombination, i.e., most of it is transmitted intact from the father to male descendants. This means that the mutations that occur in each generation are passed to the next, accumulating in unique patterns down each lineage. The pattern of mutations will indicate the tree of inheritance from distant prehistory to every male today.
The example I will use is an extensive study by Chinese geneticists into an ancient claim that CAO Cao (曹操) was not a descendant of the Dukes of CAO but was instead the son of a person adopted by a member of the CAO clan. You can access it here:-
www.nature.com/articles/jhg2011147
The main reason I have chosen this study is that the authors have done basic Y-haplotyping of 79 Cao (曹) clans. It gives a good idea of what you can and cannot do with Y-haplotyping. Incidentally, it could be helpful to the few members of this forum bearing that surname.
To follow this discussion, you will need to view Supplementary Table 2 from this study:-
static-content.springer.com/esm/art%3A10.1038%2Fjhg.2011.147/MediaObjects/10038_2012_BFjhg2011147_MOESM349_ESM.xls
First, if a clan stems from a single ancient male progenitor, all of its males will carry the same Y-haplotype: his. Exceptions arise from non-paternity events (NPE) such as adoption, infidelity, etc. Assuming that the NPE rate in history was not excessive, many clan members will share a common haplogroup.
The study looked at clan records and identified clans that explicitly stated their descent from CAO Cao and those that explicitly excluded him from their lineage (indicated in column F). Assuming an acceptable NPE rate, if the historical claim is true, Cao clans with CAO Cao as their ancestor will have a different haplogroup from those in which he was explicitly stated not to have been an ancestor.
Their results showed that to be the case. Clans claiming CAO Cao as an ancestor were mostly O2-M268, while those that explicitly excluded him were mostly O3-002611.
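To make the comparison concrete, here is a minimal Python sketch of how one might tally haplogroups by claim group and pick out the most common one in each. The records below are made-up toy values, not the study's data, and the function names are my own for illustration.

```python
from collections import Counter

# Hypothetical toy records: (claim group, haplogroup).
# These are NOT taken from Supplementary Table 2.
samples = [
    ("claims_descent", "O2-M268"),
    ("claims_descent", "O2-M268"),
    ("claims_descent", "O3-M117"),
    ("excludes_descent", "O3-002611"),
    ("excludes_descent", "O3-002611"),
    ("excludes_descent", "O2-M268"),
]


def haplogroup_shares(records):
    """Return {group: {haplogroup: fraction of that group's samples}}."""
    counts = {}
    for group, haplogroup in records:
        counts.setdefault(group, Counter())[haplogroup] += 1
    return {
        group: {hap: n / sum(c.values()) for hap, n in c.items()}
        for group, c in counts.items()
    }


def modal_haplogroup(records):
    """Return the most frequent haplogroup in each claim group."""
    shares = haplogroup_shares(records)
    return {group: max(s, key=s.get) for group, s in shares.items()}


shares = haplogroup_shares(samples)
modal = modal_haplogroup(samples)
print(modal)
```

If the two groups end up with different modal haplogroups, that is the kind of signal the study reports. A real analysis would also need a statistical test on the counts, which this sketch leaves out.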
It is necessary to realise the importance of the assumptions and how statistical methods have been used to control for them. The first assumption is that the NPE rate was not excessive. If it were, the haplogroup distribution would be randomised toward that of the general population, but it was not so extreme here as to wipe out the signal. However, you can see in clan 1 that a large number of those tested were not O2-M268 but O3-M117, which at the very least suggests an early NPE in its history. You will also observe considerable haplogroup diversity in other clans. NPE is real in genealogies. Note also the many CAO clans in which no member tested positive for either O2-M268 or O3-002611. This could be due to inadequate sampling in some cases, excessive NPE in others, or adoption of the CAO surname by unrelated families.
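To see why the NPE rate matters so much over long time spans, here is a small Python sketch under a deliberately simple model: each generation independently has the same chance of an NPE, so a single father-to-son line still carries the founder's Y-chromosome after g generations with probability (1 - rate)^g. The constant rate, the independence, and the example numbers are my assumptions for illustration, not figures from the study.

```python
def founder_retention(npe_rate, generations):
    """Probability that one unbroken father-to-son line still carries
    the founder's Y-chromosome, assuming a constant, independent
    per-generation NPE rate (a simplifying assumption)."""
    if not 0.0 <= npe_rate <= 1.0:
        raise ValueError("npe_rate must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - npe_rate) ** generations


# Hypothetical inputs: a 1% or 2% NPE rate over 70 generations.
for rate in (0.01, 0.02):
    print(rate, round(founder_retention(rate, 70), 3))
```

Under these assumed values, even a 1% rate per generation leaves a little under half of the lines carrying the founder's haplotype after 70 generations, and doubling the rate cuts that much further. That is why a clan with a genuine single founder can still show a lot of haplogroup diversity today.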
Next, to be able to draw the conclusions they do, the origin myths have to have some historical basis, because those are the claims being tested. In this study, at the simplest level, they have determined that the clans claiming descent from CAO Cao and the ones that exclude him have different patrilineal ancestry. Because this matches the clans' own origin myths, it supports the historical claim that CAO Cao was not a descendant of the Dukes of CAO.
If I were to run a similar analysis on all clans that claim descent from Huangdi, and they had wildly different haplogroups, it would make those claims unsupportable: they cannot all be direct patrilineal descendants of Huangdi, even if he existed.
What would I advise individuals or clans wanting to do Y-haplogroup analysis?
First, ask yourself what you intend to do about NPE. Is your identity as a clan based solely on patrilineal descent, or does shared experience also matter? (NPE descendants have certainly shared a large chunk of your history.)
Next, is some claim of descent from ancient royalty important to you? With enough accumulated data, that could be shown to be unlikely – is that unacceptable?
Note that direct patrilineal descent and surnames have an imperfect relationship – that is clear enough from the CAO study.
Another recent study:-
www.nature.com/articles/s10038-019-0616-2
“Paternal inheritance of both Y chromosome and surnames makes it possible to trace the origin and migration histories of surnames based on high-resolution Y chromosome phylogeny. In this study, 292 male samples with surname Ye (叶) in China were collected to unravel the history of this surname. Among these samples, O-F492 showed the highest frequency (26.71%). Analysis based on Y chromosome genotyping data of 52,798 males from virtually the whole China revealed a close correlation between O-F492 and surname Ye. High-throughput sequencing of 131 unrelated male individuals covering all sub-haplogroups in O-F492 was conducted to update the phylogeny of O-F492. Most of the Ye individuals (43/64, 67.19%) are embedded in three major branches, i.e., O-MF1461, O-MF15219, and O-FGC66159, deriving from the same node (O-FGC66168). These three clades restrictively distributed in different regions, likely attributed to independent differentiations. Coalescent ages of the three subclades are estimated ranging from 1,925 to 1,775 years ago, probably driven by the massive migration from north to south China after Yongjia riot in Jin Dynasty, consistent with the migration history of surname Ye. Our study thus shed important light on the history of the surname Ye from genetic perspective.”
It confirms what I said earlier. First, roughly three quarters of Ye individuals (about 73%) are not O-F492, so NPE or multiple origins (i.e. a polyphyletic surname) are major contributors to Ye ancestry. It also adds one element: with deep sequencing, you can derive the deep ancestry of a group of individuals (here a subset of Ye) sharing a common ancestor.
This is because, over such long periods, mutations accumulate on each branch, and their number can be used to estimate how long ago a common ancestor lived.
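As a rough illustration of that dating idea, here is a Python sketch of the simplest possible estimate: divide the average number of mutations each lineage has accumulated since the common ancestor by the expected number of mutations per year across the sequenced region. The mutation counts, the rate, and the region size below are placeholder values I made up, not numbers from the Ye study, and published studies use more elaborate statistical methods than this.

```python
def estimate_tmrca_years(mutation_counts, rate_per_bp_per_year, covered_bp):
    """Crude time-to-common-ancestor estimate in years.

    mutation_counts: mutations each sampled lineage has accumulated
        since the shared ancestor (hypothetical inputs).
    rate_per_bp_per_year: assumed mutation rate per base pair per year.
    covered_bp: number of base pairs reliably sequenced.
    """
    if not mutation_counts:
        raise ValueError("need at least one lineage")
    if rate_per_bp_per_year <= 0 or covered_bp <= 0:
        raise ValueError("rate and covered_bp must be positive")
    mean_mutations = sum(mutation_counts) / len(mutation_counts)
    return mean_mutations / (rate_per_bp_per_year * covered_bp)


# Placeholder values for illustration only.
age = estimate_tmrca_years([12, 14, 13], 8e-10, 10_000_000)
print(round(age))
```

The point is only that more accumulated mutations mean an older common ancestor. The estimate is only as good as the assumed rate and the quality of the sequencing.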
[ I’m a retired bioinformatician. ]