Below is software developed in our lab, listed from most recent to oldest.



acorn: An R package that works with de novo variants (DNVs) already called using a DNV caller (e.g., HAT). The toolkit is useful for extracting different types of DNVs and summarizing characteristics of the DNVs. Available at

Relevant Publication:


PYRUS: A tool for plotting an individual's copy number estimate data for user-specified regions of the genome. Options include plotting other individuals in the same region, plotting an annotation track, and writing out specific regions where the individuals have a copy number below or above given values. The tool takes bgzipped, tabix-indexed bed files as input, which enables rapid plotting of the data. Available at
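The "write out regions below or above given values" option can be pictured with a small sketch. This is not PYRUS code: the four-column bed layout (chrom, start, end, copy number), the function names, and the thresholds are all assumptions made for illustration, and the sketch reads plain text lines rather than a tabix-indexed file.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: chrom, start, end, copy number estimate.
BedRow = Tuple[str, int, int, float]


def parse_bed_lines(lines: Iterable[str]) -> List[BedRow]:
    """Parse tab-separated bed lines, skipping blank and comment lines."""
    rows: List[BedRow] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end, cn = line.split("\t")[:4]
        rows.append((chrom, int(start), int(end), float(cn)))
    return rows


def flag_regions(rows: List[BedRow], chrom: str, start: int, end: int,
                 low: float, high: float) -> List[BedRow]:
    """Return rows overlapping [start, end) with copy number < low or > high."""
    flagged: List[BedRow] = []
    for c, s, e, cn in rows:
        overlaps = c == chrom and s < end and e > start
        if overlaps and (cn < low or cn > high):
            flagged.append((c, s, e, cn))
    return flagged


# Made-up example data, not real copy number estimates.
example = [
    "chr1\t100\t200\t2.0",
    "chr1\t200\t300\t0.8",
    "chr1\t300\t400\t3.6",
    "chr2\t100\t200\t0.5",
]
rows = parse_bed_lines(example)
flagged = flag_regions(rows, "chr1", 150, 400, low=1.5, high=2.5)
```

With a tabix index, the overlap test would be replaced by an indexed region query, which is what makes lookups fast on large files.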

Short Writeup:


HAT: Hare And Tortoise (HAT) are two de novo variant callers we developed for parent-child trio sequencing data. Hare, as seen in Ng et al. 2022, uses NVIDIA Parabricks v4.0.0-1, which leverages GPUs to accelerate variant calling, specifically GATK HaplotypeCaller 4.2.0 and DeepVariant v1.4.0. Tortoise uses freely available, open-source versions of these variant callers. We then use GLnexus to form family-level joint-genotyped files, which are run through our custom de novo variant filter. Available at
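As a rough illustration of what a de novo check on a joint-genotyped trio looks like, here is a minimal sketch. It is not the HAT filter: the function name and the genotype representation (tuples of allele indices, `None` for a missing call) are assumptions, and the only criterion shown is that the child carries an allele absent from both parents. The source does not describe the filter's actual criteria, so none are reproduced here.

```python
from typing import Optional, Tuple

# Hypothetical genotype representation: allele indices, 0 = reference.
Genotype = Optional[Tuple[Optional[int], ...]]


def is_missing(gt: Genotype) -> bool:
    """A genotype is missing if absent or if any allele is uncalled."""
    return gt is None or any(allele is None for allele in gt)


def is_candidate_dnv(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child has an allele that neither parent carries."""
    if is_missing(child) or is_missing(father) or is_missing(mother):
        return False
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child)


# Child heterozygous, both parents homozygous reference: candidate.
candidate = is_candidate_dnv((0, 1), (0, 0), (0, 0))
# Father carries the alternate allele: inherited, not a candidate.
inherited = is_candidate_dnv((0, 1), (0, 1), (0, 0))
```

A check like this only identifies Mendelian-inconsistent sites; any practical filter would need further evidence before treating such a site as a true de novo variant.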

Relevant Preprint:

Relevant Publication:


ACES: A workflow to query small sequences in a large set of genomes. It provides several outputs including BLAST results, a multiple sequence alignment file, a graphical fragment assembly file, and a phylogenetic tree file. Available at

Relevant Publication:

Updated fitDNM

fitDNM for noncoding: fitDNM was originally developed by the Allen lab (Jiang et al. 2015, Am. J. Hum. Genet.) to incorporate functional information in tests of excess de novo mutational load. In collaboration with the Allen lab, we adapted the pipeline to use CADD scores instead of PolyPhen-2 scores so that it can run in noncoding regions of the genome, and we implemented a scalable version that tests many elements at once. The inputs are a bed file of the regions to test for a significant excess of de novo mutations and the corresponding variants to use. The pipeline outputs two summary files: one with the p values and scores calculated by fitDNM for each element in the bed file, and one summarizing all mutations found in these genomic regions. Available at
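The bookkeeping behind "many elements at once" can be sketched as assigning variants to bed elements and tallying them per element. This is only an illustration of that step, not the fitDNM pipeline or its statistical test. The function name, the element names, and the coordinate conventions (0-based half-open bed intervals, 1-based variant positions) are assumptions.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, int, int, str]  # chrom, start (0-based), end (exclusive), name
Variant = Tuple[str, int]            # chrom, position (1-based)


def count_variants_per_element(elements: List[Element],
                               variants: List[Variant]) -> Dict[str, int]:
    """Count variants falling inside each bed element."""
    counts: Dict[str, int] = {name: 0 for _, _, _, name in elements}
    for chrom, pos in variants:
        for e_chrom, start, end, name in elements:
            # A 1-based position p lies in a 0-based half-open interval
            # [start, end) when start < p <= end.
            if chrom == e_chrom and start < pos <= end:
                counts[name] += 1
    return counts


# Made-up elements and variants for illustration only.
elements = [("chr1", 100, 200, "elementA"), ("chr1", 500, 600, "elementB")]
variants = [("chr1", 101), ("chr1", 200), ("chr1", 100), ("chr1", 550), ("chr2", 150)]
counts = count_variants_per_element(elements, variants)
```

The nested loop is fine for a handful of elements; for genome-scale inputs, sorting both lists or using an interval index would avoid comparing every variant against every element.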

Relevant Publication: