Researchers Built a 13 Million Person Family Tree

You may come from a big family, but your family tree is not the biggest one in existence. Researchers have built a 13 million person family tree. How did they do it?

The research article was published in Science in March of 2018. It was titled “Quantitative analysis of population-scale family trees with millions of relatives”. The researchers are Joanna Kaplanis, Assaf Gordon, Tal Shor, Omer Weisbrod, Dan Geiger, Mary Wahl, Michael Gershovits, Barak Markus, Mona Sheikh, Melissa Gymrek, Gaurav Bhatia, Daniel G. MacArthur, Alkes L. Price, and Yaniv Erlich. Many of them are geneticists and bioinformaticians.

The Abstract includes the following description: “Here, we collected 86 million profiles from publicly-available online data shared by genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of longevity by inspecting millions of relative pairs and to provide insights into the geographical dispersion of families. We also report a simple digital procedure to overlay other datasets with our resource in order to empower studies with population-scale genealogical data.”

Yaniv Erlich was leading academic research into DNA data storage, genome hacking, and population genetics at Columbia. It was there that he was first introduced to the Geni data set. Today, Yaniv Erlich is the chief scientific officer of MyHeritage, Geni’s parent company.

The researchers found some interesting things. They looked at lifespan variation between more than three million pairs of relatives and found that genes account for only about 16 percent of the variation in how long people live. Other variables that can affect a person’s lifespan include their environment, lifestyle, and accidents.

The main purpose of the paper was to show that this kind of genealogical data, crowdsourced from descendants who seek out genealogy sites, could offer up the same analytical insights as more traditional demographic datasets. Wired reported that traditional demographic datasets are far more labor- and cost-intensive to produce. As an example, the most recent US Census cost about $13 billion to produce.

Another advantage is that datasets like these are publicly available. Anyone can download the researchers’ family tree and demographic data in a de-identified format. Geni has set up its API to allow researchers to contact anyone in the database to get their consent to access their data.
