Below is the software developed in our lab, listed from most recent to oldest.
3D-CLUMP: 3D-CLustering by Mutation Position (3D-CLUMP) performs unsupervised clustering, in 3D space, of the amino acid residue positions where variants occur, without any prior knowledge of their functional importance. Available at https://github.com/tnturnerLab/3d-clump.
Relevant Preprint: https://www.medrxiv.org/content/10.1101/2024.02.02.24302238v1
acorn: An R package that works with de novo variants (DNVs) already called using a DNV caller (e.g., HAT). The toolkit is useful for extracting different types of DNVs and summarizing characteristics of the DNVs. Available at https://github.com/TNTurnerLab/acorn.
Relevant Publication: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05457-z
PYRUS: A tool for plotting an individual's copy number estimates for user-specified regions of the genome. Options include plotting other individuals in the same region, plotting an annotation track, and writing out the specific regions where individuals have a copy number below or above given values. The input is bgzipped, tabix-indexed BED files, which enables rapid plotting of the data. Available at https://github.com/TNTurnerLab/PYRUS.
Short Writeup: https://github.com/TNTurnerLab/PYRUS/blob/main/paper/paper.md
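As an illustration of the "copy number below or above given values" option described for PYRUS, here is a minimal Python sketch of that kind of threshold check. It is not PYRUS code: the function name flag_regions, the tuple layout, and the example thresholds are all hypothetical, and it assumes the copy number estimates have already been read from the BED files into memory.

```python
from typing import List, Tuple

# (chrom, start, end, copy_number_estimate); this layout is an assumption for illustration
Interval = Tuple[str, int, int, float]


def flag_regions(rows: List[Interval], low: float, high: float) -> List[Interval]:
    """Return the intervals whose copy number estimate is below `low` or above `high`."""
    return [row for row in rows if row[3] < low or row[3] > high]


# Hypothetical example data: three adjacent 1 kb windows on chr1
rows = [
    ("chr1", 0, 1000, 2.0),
    ("chr1", 1000, 2000, 0.9),
    ("chr1", 2000, 3000, 3.4),
]
flagged = flag_regions(rows, low=1.5, high=2.5)
for chrom, start, end, cn in flagged:
    print(f"{chrom}\t{start}\t{end}\t{cn}")
```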
HAT: Hare And Tortoise (HAT) are two de novo variant callers we developed for parent-child trio sequencing data. Hare, as seen in Ng et al. 2022, uses NVIDIA Parabricks v4.0.0-1, which leverages GPUs to accelerate variant calling, specifically for HaplotypeCaller (GATK 4.2.0) and DeepVariant v1.4.0. Tortoise uses freely available, open-source versions of these variant callers. We then use GLnexus to form family-level joint-genotyped files, which are run through our custom de novo variant filter. Available at https://github.com/TNTurnerLab/hat.
Relevant Preprint: https://doi.org/10.1101/2023.01.27.525940
Relevant Publication: https://pubmed.ncbi.nlm.nih.gov/36054329/
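To make the idea of a de novo variant in a trio concrete, here is a minimal Python sketch of the simplest genotype check: the child carries an alternate allele while both parents are homozygous reference. It is not HAT's filter, which may apply additional criteria not described here; the function names and the VCF-style genotype strings are assumptions for illustration.

```python
from typing import Optional, Tuple


def parse_gt(gt: str) -> Optional[Tuple[int, ...]]:
    """Parse a VCF-style genotype such as '0/1' or '0|1'; return None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(int(a) for a in alleles)


def is_candidate_dnv(child: str, father: str, mother: str) -> bool:
    """True if the child has an alternate allele and both parents are homozygous reference."""
    c, f, m = parse_gt(child), parse_gt(father), parse_gt(mother)
    if c is None or f is None or m is None:
        return False  # a missing genotype cannot support a de novo call
    parents_hom_ref = all(a == 0 for a in f + m)
    child_has_alt = any(a != 0 for a in c)
    return parents_hom_ref and child_has_alt


print(is_candidate_dnv("0/1", "0/0", "0/0"))  # child het, parents hom ref
print(is_candidate_dnv("0/1", "0/1", "0/0"))  # allele inherited from the father
```

A genotype-only check like this is only a starting point; real trio data also needs quality filtering before a call is trusted.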
ACES: A workflow to query small sequences in a large set of genomes. It provides several outputs including BLAST results, a multiple sequence alignment file, a graphical fragment assembly file, and a phylogenetic tree file. Available at https://github.com/TNTurnerLab/ACES.
Relevant Publication: https://pubmed.ncbi.nlm.nih.gov/34601580/
fitDNM for noncoding: fitDNM was originally developed by the Allen lab (http://people.duke.edu/~asallen/Software.html) in Jiang et al. 2015, Am. J. Hum. Genet. (https://www.cell.com/ajhg/fulltext/S0002-9297(15)00277-3) to incorporate functional information in tests of excess de novo mutational load. In collaboration with the Allen lab, we adapted the pipeline to use CADD scores instead of PolyPhen-2 scores so that it can run in noncoding regions of the genome, and implemented a scalable version that tests many elements at once. Given a bed file containing the regions of interest to test for a significant excess of de novo mutations, along with the corresponding variants, the pipeline outputs two summary files: the “.fitDNM.report” file, which contains the p values and scores calculated by fitDNM for each element in the bed file, and the “.mutation.report” file, which summarizes all mutations found in these genomic regions. Available at https://github.com/TNTurnerLab/fitDNM.
Relevant Publication: https://pubmed.ncbi.nlm.nih.gov/34256850/
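To show how variants relate to the elements in a bed file of the kind fitDNM takes as input, here is a minimal Python sketch that counts variants per element. It is not part of the fitDNM pipeline and performs no statistical test; the function name, the element names, and the tuple layouts are hypothetical. It assumes BED coordinates (0-based start, exclusive end) for regions and 1-based positions for variants, as in VCF.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int, str]  # chrom, start (0-based), end (exclusive), element name
Variant = Tuple[str, int]           # chrom, 1-based position


def count_variants(regions: List[Region], variants: List[Variant]) -> Dict[str, int]:
    """Count how many variants fall inside each named region."""
    counts = {name: 0 for _, _, _, name in regions}
    for chrom, pos in variants:
        for r_chrom, start, end, name in regions:
            # A 1-based position p lies in the 0-based half-open interval [start, end)
            # exactly when start < p <= end.
            if chrom == r_chrom and start < pos <= end:
                counts[name] += 1
    return counts


# Hypothetical example data
regions = [("chr1", 100, 200, "elemA"), ("chr1", 500, 600, "elemB")]
variants = [("chr1", 101), ("chr1", 200), ("chr1", 100), ("chr2", 150), ("chr1", 550)]
counts = count_variants(regions, variants)
print(counts)
```

The nested loop keeps the sketch short; for many elements and variants, an interval index or sorted sweep would scale better.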