All Ancient DNA Dataset


    Carlos Quiles

      Updated to version 2.04.53, including new data from (and updates to previously reported) early farmers of Anatolia, South-East and Central Europe, from Marchi et al. bioRxiv (2020).

      Carlos Quiles

        Updated to version 2.04.68, including the changes reported for Uyelgi samples, as well as other FTDNA Haplotree (provisional) assessments, like:

        • Kostenki14, which splits hg. C1b-B66 with another FTDNA customer.
        • SG41, which splits hg. D1a-BY12975 with another FTDNA customer from Kazakhstan.
        • Samples from Yu et al. Cell (2020), such as UKY001 and probably KAG001, GLZ002, which will probably split hg. C2a-BY728.
        Carlos Quiles

          Release of version 2.05:

          I have been updating the Ancient DNA Dataset with some global additions, clearly enough to warrant a new version number. In particular, these are the columns added (or those I consider likely or possible to be added):

          FTDNA-Y-Haplotree: for FTDNA Y-Haplotree Y-names. I hope that sorting the file following their SNP order will help clarify the actual position of each ancient sample in their respective haplogroup branches.

          As you might have noticed, I am also shifting the “main” original column, YTree, to an FTDNA-friendly naming system. Naming consistency was becoming an issue, since many samples now have a depth that cannot be followed with either ISOGG or YFull.

          NOTE. For the moment, though, I am wary of changing the subclade naming for certain haplogroups. For example, haplogroup J – for some reason – appears to have an important user base in YFull which encourages the addition of ancient samples to their YTree. Anyway, it looks as though in the near future, when all ancient samples get fully analyzed and published by FTDNA, the whole haplogroup naming ecosystem will possibly be dominated by FTDNA.

          Y-SNP: I am now selecting only SNPs approved by FTDNA, so as to avoid the many dubious SNPs described by other companies and individuals but not fully accepted by others. Nevertheless, a proper terminal SNP (with negative and dubious ones) needs a manual check, and (unless you are Michael Sager) this is an impossible task for one person. Also, I am not well-versed in most subclades, and a certain experience with ancient and modern samples is needed when it comes to assessing which derived and ancestral downstream calls are more likely to be correct. I will be posting links to the files, including pathPhynder’s estimation, as well as including as many alternative Responsible-SNP sources as possible, to strengthen the reliability of each call.

          Isotopes: Basically, whether the sample is considered local or non-local, not necessarily the specific isotopic values, which might increase the file size unnecessarily.

          Skeletal-Element: Will NOT be included, for the moment. I am not convinced that a column with bone type (or other sample origin) is useful for this ancient haplogroup compilation, except maybe for statistical analyses. For the moment, I prefer not to increase the file size.

          Data-Type: Ditto. Furthermore, by following the current Reich Lab’s naming standard (adding .SG or .DG) I think this information is mostly included in the Object_ID of the samples relevant for genome-wide analyses.
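Reading the data type back from the Object_ID could look roughly like the sketch below. It assumes that the suffix is whatever follows the last dot in the ID; the function name, the sample IDs, and the labels attached to each suffix (.SG as shotgun, .DG as diploid genotype) are assumptions for illustration, and other suffixes may exist.

```python
def data_type_from_object_id(object_id):
    """Guess the data type from a Reich Lab style Object_ID suffix.

    Assumption: the suffix is the text after the last dot, and only
    .SG and .DG are recognised here.
    """
    suffix_labels = {
        "SG": "shotgun",           # assumed meaning of .SG
        "DG": "diploid genotype",  # assumed meaning of .DG
    }
    _, dot, suffix = object_id.rpartition(".")
    if dot and suffix in suffix_labels:
        return suffix_labels[suffix]
    return "no recognised suffix"

# Hypothetical IDs, not taken from the dataset.
for example_id in ("sampleA.SG", "sampleB.DG", "sampleC"):
    print(example_id, "->", data_type_from_object_id(example_id))
```

An ID without a recognised suffix is reported as such rather than guessed at, since the absence of a suffix does not by itself say how the sample was produced.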

          Qualitative Assessment/Confidence of archaeological and chronological contextualization for the genetic data of an individual: Very useful new columns added currently and for the past (2?) years by the Reich Lab. Since most samples offer reliable results, only some raise doubts, and a few have alerts, I am not sure whether only doubts and alerts should be added to the final column, reserved for “site”, which seems like the most economical choice. Until the next release of the Reich Lab curated Dataset, I don’t think I will make a decision on this.

          In general, I will try to keep up with the Reich Lab’s Dataset naming changes, to make both compatible and easy to combine when performing formal stats, even though their slow pace of corrections (and the radical naming changes from the first to the second version released) suggests that those conventions might not be valid for long.

          Carlos Quiles

            Version 2.05.07 includes recent samples from:

            Other papers like Moussa et al. (2021) and others with few samples – to see the whole list of new samples since your last downloaded version, order the spreadsheet by date (second-to-last column).
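The ordering step can also be scripted. This is a minimal sketch, assuming the spreadsheet has been exported to CSV and that the date column holds ISO-style dates (YYYY-MM-DD), so that plain string comparison orders them correctly. The column names and rows in the example are invented for illustration and are not taken from the dataset.

```python
import csv
import io

def rows_added_since(csv_text, last_download):
    """Return (header, rows) for rows dated after last_download, newest first.

    Assumption: the date sits in the second-to-last column and is written
    as YYYY-MM-DD, so comparing strings is the same as comparing dates.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    date_col = len(header) - 2  # second-to-last column
    new_rows = [row for row in reader if row[date_col] > last_download]
    new_rows.sort(key=lambda row: row[date_col], reverse=True)
    return header, new_rows

# Hypothetical export; real column names and values will differ.
EXAMPLE = """Object_ID,YTree,Date_Added,Site
sampleA,R1b,2021-01-10,site1
sampleB,C2a,2021-03-02,site2
sampleC,J2a,2020-11-30,site3
"""

header, rows = rows_added_since(EXAMPLE, "2020-12-31")
for row in rows:
    print(row[0], row[2])
```

If the exported dates use another format, they would need to be parsed (for example with `datetime.strptime`) before comparing.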

            Carlos Quiles

              Updated to version 2.05.21, including:

              Minor changes, like the update of I6561, the Alexandria sample of hg. R1a-Y3, dated supposedly ca. 4000 BC, but now corrected in the AADR based on genetic data (as I suggested to the authors here):

              Context: Layer date based on 6 20-28 cM IBD individuals with Srubnaya/Alakul/Kazakhstan_MLBA individuals from 3900-3400 [based on these genetic results we ignore the direct date of 4153-3970 calBCE (5215±20 BP, PSUAMS-2832) from same site calibrated as 95.4%; IntCal20, OxCal v4.4.2 Bronk Ramsey (2020)

              Carlos Quiles

                Updated to version 2.05.75 (There have been other intermediate versions published with some of these updates):

                • New rules for access to Y-SNP files: Now fully restricted to reliable users; bots are forbidden.
                • I have checked new batches of samples for SNP calls from the FTDNA Haplotree, including Allentoft et al. (2015), Mathieson et al. (2015) and (partially) Mathieson et al. (2018), Damgaard et al. Nature (2018), and Jeong et al. (2020).
                • Added links to Y-SNP calls from Olalde et al. (2018) and Olalde et al. (2019). Currently working on Damgaard et al. Science (2018).

                  The new color codes are intended to immediately convey information visually about recent Y-SNP updates (2021):

                • light green background: Those checked by me, in contrast with those in green background with the ‘seal of approval’ of FTDNA or YFull.
                • bold: those calls considered estimations by me (due, e.g., to a lack of intermediate SNPs, or to unreliable derived or ancestral SNP calls subject to deamination).
                • Strikethrough: in the “responsible” column, whenever the previous call is corrected (not just updated to a more specific subclade, which remains underlined).
                Carlos Quiles

                  Recent changes leading up to the current version 2.06.160:

                  Today I added FTDNA’s assessment of the Y-DNA of Peder Winstrup from Krzewińska et al. (2021).

                  Also updated are the ADMIXTURE values, including the new samples from Gnecchi et al. (2021), and experimenting with the SE Asia proxy: now the reference is Thailand LN_BA rather than Papuan.

                  All files (including PDFs) updated and uploaded.

                  Carlos Quiles

                    New version 2.07, now adding an inverted Formation-Age Ratio (FAR) applied to Y-SNPs and mt-SNPs, as a measure of time-related precision of the terminal SNP: the closer the value is to 1, the closer the formation date is to the ancient sample’s (radiocarbon vs. contextual) date.

                    This metric was proposed by Jari Kinnunen, and estimates are based on some relatively recent YFull formation dates adapted to FTDNA’s Y-DNA Haplotree at SNP Tracker.
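The ratio can be illustrated with a short sketch. It assumes that the inverted FAR is simply the sample's age divided by the formation age of its terminal SNP, both in years before present. The exact formula is not spelled out here, so the function name, the units, and the example ages below are assumptions for illustration only.

```python
def inverted_far(sample_age_ybp, formation_age_ybp):
    """Assumed inverted Formation-Age Ratio: sample age / SNP formation age.

    Both ages are in years before present. A value near 1 means the
    terminal SNP formed shortly before the individual lived; a value
    near 0 means the SNP is much older than the sample.
    """
    if sample_age_ybp < 0 or formation_age_ybp <= 0:
        raise ValueError("ages must be positive numbers of years")
    return sample_age_ybp / formation_age_ybp

# Hypothetical ages, not taken from the dataset.
recent_branch = inverted_far(4500, 5000)   # SNP formed ~500 years before the sample
deep_branch = inverted_far(4500, 18000)    # SNP formed long before the sample

print(round(recent_branch, 2), round(deep_branch, 2))
```

Under this assumption, a value above 1 would mean that the formation estimate is younger than the sample's own date, which is worth flagging rather than clamping, since it points to a conflict between the two date estimates.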

                    [We are still waiting for FTDNA’s own estimations to be published, as recently announced]

                    Changes from the previously published 2.06.209 also include new SNP inferences, especially from the R1a (mainly Z93) and R1b branches (mainly P312).

                    The spreadsheet is up to date with the most recent reports of ancient samples.


                      Hi Carlos,

                      Do you have any idea when the next update of the ancient spreadsheet/map will be available?

                      Best regards,




                        Could ancient Y-DNA from these studies be added to the dataset?



               (medieval Czech, data on page 83 and 84 of “text prace”)

                        I apologize if it’s already there.
