Peter Kerpedjiev needed a crash course in genetics. A software engineer with some training in bioinformatics, he was pursuing a PhD and thought it would really help to know some fundamentals of biology. “If I wanted to have an intelligent conversation with someone, what genes do I need to know about?” he wondered.
Kerpedjiev went straight to the data. For years, the US National Library of Medicine (NLM) has been systematically tagging almost every paper in its popular PubMed database that contains some information about what a gene does. Kerpedjiev extracted all the papers marked as describing the structure, function or location of a gene or the protein it encodes.
Sorting through the records, he compiled a list of the most studied genes of all time: a sort of ‘top hits’ of the human genome, and of several other genomes besides.