Middle Author Bioinformatics

your friendly neighborhood bioinformaticians

Services

1. What's-in-the-bag? 🔎🧫

  • Strain-typing and serotyping

  • Multi-Locus sequence typing (MLST)

  • Prokaryote vs eukaryote identification

2. Resistance-is-futile! ✊

  • Antimicrobial resistance discovery

  • Variant-calling and mutation prediction

  • Annotation of biosynthetic gene clusters

3. Get-me-the-microbes! 🪆

  • Bacterial and Archaeal Genome Reconstruction from Mixed Samples

  • Viral and phage genome discovery

  • Genome evaluation and annotation

4. Dude-where's-my-transposon? 🛫✂️🛬

  • Analysis of Tn-Seq datasets

  • Identification of insertions and genes affected

  • Functional annotation of impacted genes

5. Everything-RNA 🧬

  • Differential Expression Analysis

  • Transcriptome Assembly

  • Identification and Removal of Ribosomal RNA (rRNA)

6. Amplicon-and-microbiome 🧮

  • Generation of Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs)

  • 16S rRNA and Internal Transcribed Spacer Sequencing

  • Alpha and Beta Diversity Metrics

7. Genome-assembly 🪢

  • Prokaryotic and eukaryotic Genomes

  • de novo annotation of prokaryotic genomes

  • Transcriptome and proteome-guided annotation of eukaryotic genomes

8. Structural-biology 🏗️

  • de novo protein folding and structural prediction

  • Domain classification and functional annotation

  • Identification of structural homologs

Selected Publications and Software contributions

• Garber, A. I., Armbruster, C. R., Lee, S. E., Cooper, V. S., Bomberger, J. M., & McAllister, S. M. (2022). SprayNPray: user-friendly taxonomic profiling of genome and metagenome contigs. _BMC Genomics_, 23(1), 202.
link to article
link to software.

• Syberg-Olsen, M. J., Garber, A. I., Keeling, P. J., McCutcheon, J. P., & Husnik, F. (2022). Pseudofinder: detection of pseudogenes in prokaryotic genomes. _Molecular Biology and Evolution_.
link to article.
link to software.

• Ramírez, G. A., Keshri, J., Vahrson, I., Garber, A. I., Berrang, M. E., Cox, N. A., González-Cerón, F., Aggrey, S. E., & Oakley, B. B. (2022). Cecal Microbial Hydrogen Cycling Potential Is Linked to Feed Efficiency Phenotypes in Chickens. _Frontiers in Veterinary Science_, 9, 904698.
link to article

• Garber, A. I., Kupper, M., Laetsch, D. R., Weldon, S. R., Ladinsky, M. S., Bjorkman, P. J., & McCutcheon, J. P. (2021). The Evolution of Interdependence in a Four-Way Mealybug Symbiosis. _Genome Biology and Evolution_, 13(8).
link to article

• Garber, A. I., Zehnpfennig, J. R., Sheik, C. S., Henson, M. W., Ramírez, G. A., Mahon, A. R., Halanych, K. M., & Learman, D. R. (2021). Metagenomics of Antarctic Marine Sediment Reveals Potential for Diverse Chemolithoautotrophy. _mSphere_, 6(6), e0077021.
link to article
link to software.

• Garber, A. I., Nealson, K. H., Okamoto, A., McAllister, S. M., Chan, C. S., Barco, R. A., & Merino, N. (2020). FeGenie: a comprehensive tool for the identification of iron genes and iron gene neighborhoods in genome and metagenome assemblies. _Frontiers in Microbiology_, 11, 37.
link to article
link to software.

• Ramírez, G. A., Garber, A. I., Lecoeuvre, A., D’Angelo, T., Wheat, C. G., & Orcutt, B. N. (2019). Ecology of Subseafloor Crustal Biofilms. _Frontiers in Microbiology_.
link to article.
link to software

• Armbruster, C. R., Marshall, C. W., Garber, A. I., Melvin, J. A., Zemke, A. C., Moore, J., Zamora, P. F., Li, K., Fritz, I. L., Manko, C. D., Weaver, M. L., Gaston, J. R., Morris, A., Methé, B., DePas, W. H., Lee, S. E., Cooper, V. S., & Bomberger, J. M. (2021). Adaptation and genomic erosion in fragmented Pseudomonas aeruginosa populations in the sinuses of people with cystic fibrosis. _Cell Reports_, 37(3), 109829.
link to article

• Keffer, J. L., McAllister, S. M., Garber, A. I., Hallahan, B. J., Sutherland, M. C., Rozovsky, S., & Chan, C. S. (2021). Iron Oxidation by a Fused Cytochrome-Porin Common to Diverse Iron-Oxidizing Bacteria. _mBio_, 12(4), e0107421.
link to article

• BagOfTricks: A set of short-to-medium length software tools for various bioinformatics tasks. This is a growing list of tools that I make for various projects, which I then make broadly available to others for use.
link to software.

Please reach out to [email protected] for
information on how to securely transfer data to the MAB server.

Data drives can also be mailed directly to the following address:
Middle Author Bioinformatics
16 W. Loma Vista Dr. Unit 102
Tempe, AZ 85282
United States of America
Phone Number: (818) 324-1145

Leadership

Arkadiy Garber: Founder, CEO
Middle Author Bioinformatics was founded to fill a data-analysis gap. Academic, industry, and government labs across the country and around the world generate large volumes of sequencing data. MAB's purpose is to connect bioinformaticians with projects in need of DNA sequencing and/or data analysis.

Dr. Vaughn Cooper: Founder, scientific advisor
Vaughn Cooper is an evolutionary microbiologist and Professor at the University of Pittsburgh. He is co-Founder and Scientific Advisor of both SeqCoast Genomics and Middle Author Bioinformatics, which work together to provide advanced genome-scale sequencing and analyses using intuitive and accessible processes. He also founded EvolvingSTEM, an innovative education program for high school biology students, as well as the Center for Evolutionary Biology and Medicine at Pitt. Previously, he co-founded the Microbial Genome Sequencing Center (MiGS) and was a professor at the University of New Hampshire. He has an AB in Biology with honors from Amherst College, a PhD in Ecology, Evolutionary Biology, and Behavior from Michigan State University, and was a Fellow of the Michigan Society of Fellows at the University of Michigan in evolutionary biology and pediatrics. Dr. Cooper is a Fellow of the American Academy of Microbiology and his research on microbial and genomic evolution has appeared in >100 publications. He believes that there has never been a better time to be a microbiologist or geneticist thanks to unprecedented advances in technology, and he is committed to democratizing access to these powerful tools.

Dr. Jean-Paul Baquiran: Director of Operations
Jean-Paul (JP) is Chief Scientific Officer of Biologic Environmental, a firm specializing in sustainable waste management. Jean-Paul has a long career that reflects his passion for bioremediation and sustainability. His background in microbial and molecular biology, in combination with strong business and leadership skills, makes JP an important asset to this company.

Dr. Gustavo Ramírez: Hiring Manager, Senior Bioinformatician
Gustavo Ramírez is an Assistant Professor at California State University, Los Angeles, where he leads a bioinformatics lab that focuses on microbial ecology of the marine subsurface and host-microbe interactions in broiler chickens. Gustavo has mentored countless students and has devoted a large proportion of his time to ensuring the success of early career scientists. Gustavo's profound interest in mentorship and education has been of great benefit to Middle Author Bioinformatics, and key to developing the company's educational program.

Arkadiy Garber: Senior Bioinformatician
Arkadiy enjoys programming and designing/implementing bioinformatics pipelines and software packages. His research spans geomicrobiology, microbial ecology, environmental and clinical microbiology, and evolutionary biology. As humanity generates more and more biological sequence data, we increase the potential to make novel discoveries that improve our understanding of fundamental biology and help to implement biotechnological and clinical improvements. The generation of sequence data greatly outpaces the rate at which this data can be processed, analyzed, and understood. To address this discrepancy, he decided to launch a bioinformatics firm whose purpose is to assist in the processing, analysis, and interpretation of biological sequence data.

Dr. Michael Pavia: Bioinformatician, specializing in metagenomics and metatranscriptomics
Mike has nearly a decade of experience in microbiology. Much of this decade has been spent at the command line, wrangling hundreds of metagenomes and metatranscriptomes. In addition to reconstructing thousands of genomes from metagenomes, Mike has spent a lot of time thinking about biogeochemical cycling in the environment, reconstructing metabolic pathways, and attempting to follow up on the many hypotheses that have been generated from his data. A true believer in scientific dissemination and improving access to science for all, Mike also founded a podcast, the aptly named Mikroscope, designed to deliver cutting-edge scientific discoveries to the general public.

Dr. Ashley Cohen: Bioinformatician, specializing in RNA-FISH and biostatistics
Ashley earned her Master's degree in Geosciences and PhD in Marine Sciences with a concentration in microbiology at Stony Brook University in NY. Her background includes organic biogeochemistry, microbiology, ecology, and biostatistics. A combination of expertise in wet-lab and computational techniques makes Ashley an invaluable resource when it comes to generating, processing, and interpreting the many different forms of biological data. Ashley is passionate about creating tools to solve common problems in microbiology workflows, tools that are accessible to users with no coding experience. Ashley has also enjoyed many years of outreach by advising undergraduate students in the laboratory and creating open-source Python tutorials as part of the Bioinformatics Virtual Coordination Network.

Common Bioinformatics services

| Service | Description |
| --- | --- |
| Prokaryotic genome assembly and annotation | Reference-guided or de novo assembly using Unicycler, followed by a comprehensive annotation pipeline that includes de novo prediction of coding (CDS) and non-coding (tRNA, rRNA, tmRNA, miRNA) gene sequences by BAKTA. Predicted genes are then compared against a variety of databases, including KEGG, COG, CAZy, [Pfam/InterPro](https://www.ebi.ac.uk/interpro/), TIGRFAMs, and ISFinder. |
| Eukaryotic genome assembly and annotation | Reference-guided or de novo assembly using SPAdes and (if long reads are available) Longstitch. If long reads are available, Flye is also used to build an initial draft assembly. If short reads are available, polishing is then carried out using Pilon. Gene prediction is carried out using Braker and functional annotation using eggNOG-mapper. |
| Metagenome assembly | Optimized de novo assembly using metaSPAdes and Megahit. Other techniques can be considered depending on the data used. |
| Metagenome binning | Multiparametric binning using MetaBAT and DASTool, followed by bin evaluation with CheckM, SprayNPray, and Binarena. Other techniques can be considered depending on the data used. |
| Phylogenomics | Identification of single-copy genes using GToTree, and generation of a phylogenomic tree in the context of the 100 most closely related genomes available from NCBI's RefSeq database. |
| RNA-seq and differential expression | Read mapping to a reference genome using Bowtie2. Data is summarized into count tables using HTSeq, and differential expression analysis is performed in DESeq2. Transcripts are reconstructed using Trinity. |
| Amplicon (16S) analysis | Analysis using Qiime2 and Dada2, producing an ASV/OTU table, sequences for each ASV/OTU in FASTA format, and taxonomic assignment for each sequence. |
| Amplicon biostatistics and visualizations | Diversity calculations include alpha diversity indices (Shannon’s index, Pielou’s evenness) and beta diversity matrices (Jaccard, Bray-Curtis, UniFrac) derived from an appropriately rarefied OTU table and a binned metadata table. Beta diversity matrices are further processed by principal coordinate analysis, and those results are visualized with the metadata. Statistical testing against alpha diversity (Kruskal-Wallis tests) and beta diversity (PERMANOVA, ANOSIM) is also available. Rarefied OTU tables can be transformed into relative abundances with the option of pseudo-counting and further clr-transformation, as well as ANCOM testing against binned metadata. |
| Taxonomic classification | Taxonomic profiling at the read level, using Kraken, and at the contig level (following a Unicycler assembly) using SprayNPray. This analysis also includes identification of the most closely related sequenced genomes from RefSeq/GenBank using Mash and construction of a phylogenomic tree using GToTree. |
| Virus identification | Identification of virus and phage sequences using VirSorter. |
| Consultation | MAB is available to provide advice, feedback, and assistance with pipeline development, software development, bioinformatics analyses, and data interpretation. A retainer for an extended period of time (e.g. weeks, months, or years) is possible at a discounted rate. The initial consultation session is free. |
| Letters of support and grant assistance | We are happy to assist/consult in grant preparation, particularly regarding any proposed bioinformatics. Additionally, MAB will, upon request, provide a letter of support in regard to any bioinformatics analysis or training that is included as part of the grant proposal. |

SEQUENCING

We are happy to partner with SeqCoast for sequencing support, including whole-genome and RNA sequencing. SeqCoast offers friendly service, lightning-fast turnaround times, and complete bioinformatics support.

COST BREAKDOWN FROM SEQCOAST

| Sample Type | Cost/Sample |
| --- | --- |
| bacterial genome | $80-105 |
| yeast genome | $130-145 |
| metagenomes and larger eukaryotic genomes | $200-550 |
| bacterial genome (hybrid-sequenced) | $590 |
| RNAseq (with rRNA depletion using RiboZero) | $270-430 |

CONTACT

TRAINING AND WORKSHOPS

We are passionate about training the next generation of scientists in cutting-edge methods and techniques relevant to bioinformatics. We are available to train biologists at all levels, from undergrad to professor, in bioinformatics techniques that are of interest to each respective lab. Training is flexible in scope, ranging from private lessons with individual researchers to department-wide workshops. These can be virtual, in-person, or a combination of both. Please contact us for a customized lesson plan and quote.

Sample lesson plans
One-on-one bioinformatics support
3-hour bioinformatics seminar (e.g. introductory Python, R, Bash, metagenomics, transcriptomics, etc.)
2-hour hands-on lessons, including one hour of lecture and one hour of virtual-machine-facilitated workshops.

NCBI DATA SUBMISSION

| Sample Type | Repository |
| --- | --- |
| Genome | GenBank and RefSeq |
| FASTQ reads | Sequence Read Archive (SRA) |
| Metadata | FigShare, GitHub, etc. |

We know how tedious and complicated data submission to NCBI can be. But reproducibility and open access to data are essential to the many ongoing projects that rely on and build upon previous work. To this end, we offer, as a service, submission of samples to NCBI and other public repositories (e.g. GitHub, FigShare).

PROBE DESIGN FOR RNA-FISH

Fluorescence in situ hybridization (FISH) is a powerful technique that allows for simultaneous visualization, phylogenetic identification, and enumeration of individual microbial cells. FISH is therefore often a critical step in cutting-edge microbiology workflows that separate cells of a particular phylogenetic affiliation for single-cell amplicon or genome sequencing, or that localize and interrogate those cells using microspectroscopy methods such as NanoSIMS (to determine natural stable isotopic relative abundances) or Raman (to determine which cells have taken up an isotopically spiked substrate).

This method entails irreversibly binding a fluorescently labelled probe to the ribosomal RNA of permeabilized cells. Probes are 16-23-mers with a base sequence that is complementary to a consensus region (a sequence that is conserved among, and unique to, a phylogenetic target group) within the target group’s 16S or 23S gene. Probes will typically hybridize to sequences with zero or one mismatched bases, so many must be combined with competitor probes: non-fluorescent probes that bind to non-target sequences with a one-base mismatch so that they are “unavailable” to the FISH probe. Some phylogenetic groups also require probe “cocktails” for near-total coverage (for example, Deltaproteobacteria).

For robust experimental results, it is essential that the FISH probes have high specificity (unique to the target phylogenetic group) and high coverage (accounting for a high percentage of target group sequences) against a curated database of high-quality 16S or 23S sequences such as SILVA or GreenGenes, and against the user’s sample 16S or 23S sequences. While there are online tools that aid in these analyses to a degree, such as TestProbe, they lack several important services. These include calculating competitor probe-adjusted coverages and specificities, calculating coverages and specificities of “cocktails”, and testing the probe(s) and competitor(s) against the user’s sample library.

Our service accomplishes all of this in an easily executable Python pipeline and has a variety of optional outputs. These include database target and non-target accessions and sequences, the position(s) and base(s) of the most common mismatches, suggested additional competitor probes, and probe testing against the user’s amplicon libraries through alignments with high-quality reference sequences and consensus region checks.
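As a rough illustration of the coverage and specificity calculation described above, here is a minimal Python sketch. It is not the MAB pipeline: the function names, the toy sequences, and the definition of specificity as the fraction of non-target sequences left unbound are all assumptions made for this example. The probe is matched against gene sequences by searching for its reverse complement, allowing a chosen number of mismatches.

```python
# Hypothetical sketch: probe coverage and specificity with a mismatch allowance.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(site, sequence):
    """Fewest mismatches between `site` and any same-length window of `sequence`."""
    k = len(site)
    best = k  # worst case: every base mismatches (also used if sequence is too short)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        best = min(best, mismatches)
    return best


def is_hit(probe, sequence, max_mismatches=0):
    """True if the probe's binding site occurs in `sequence` within the mismatch limit."""
    return min_mismatches(reverse_complement(probe), sequence) <= max_mismatches


def coverage_and_specificity(probe, targets, non_targets, max_mismatches=0):
    """Coverage: fraction of targets hit. Specificity (assumed definition): fraction of non-targets not hit."""
    target_hits = sum(is_hit(probe, s, max_mismatches) for s in targets)
    non_target_hits = sum(is_hit(probe, s, max_mismatches) for s in non_targets)
    return target_hits / len(targets), 1 - non_target_hits / len(non_targets)


# Toy data (invented): the probe's binding site is CTGTAATC.
probe = "GATTACAG"
targets = ["AAACTGTAATCGGG", "TTCTGTAATCAA", "CCCTGTTATCCC"]
non_targets = ["GGGGCCCCGGGGCCCC", "AACTGTAATGAA"]

strict = coverage_and_specificity(probe, targets, non_targets, max_mismatches=0)
relaxed = coverage_and_specificity(probe, targets, non_targets, max_mismatches=1)
print(strict, relaxed)
```

In this toy data set, allowing one mismatch raises coverage but lets the probe bind a non-target sequence, which is the situation competitor probes are meant to address.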

SAMPLE PLOTS

(bioinformatics as a service)

variant calling

Variant calling is the process of identifying mutations in an evolved lineage. Mutations are predicted by identifying changes in the genome sequence over the course of an experiment (e.g. evolve-and-resequence).
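The idea of comparing an evolved genome to its ancestor can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a production variant caller: it assumes the two sequences are already aligned and of equal length, and it reports only substitutions. The function name and the toy sequences are invented for the example.

```python
# Hypothetical sketch: report substitutions between pre-aligned sequences.
def call_substitutions(ancestor, evolved):
    """Return (1-based position, ancestral base, evolved base) for each difference."""
    if len(ancestor) != len(evolved):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return [
        (position + 1, anc_base, evo_base)
        for position, (anc_base, evo_base) in enumerate(zip(ancestor, evolved))
        if anc_base != evo_base
    ]


# Toy sequences (invented for illustration).
ancestor = "ATGGCCATTGTA"
evolved = "ATGGCTATTGAA"
variants = call_substitutions(ancestor, evolved)
print(variants)  # [(6, 'C', 'T'), (11, 'T', 'A')]
```

Real evolve-and-resequence analyses typically work from sequencing reads mapped to a reference and must also handle insertions, deletions, and sequencing error, none of which this sketch attempts.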

(meta)genomics

Genomes are routinely sequenced with methods that involve shearing DNA into smaller fragments prior to sequencing. Computational pipelines exist that assemble these reads back together based on overlapping sequences.
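To make the idea of joining reads by their overlaps concrete, here is a minimal greedy overlap-merge sketch in Python. It is only a toy: modern assemblers generally use more sophisticated graph-based approaches, and this example ignores sequencing errors, reverse-complement reads, and repeats. The function names, the `min_overlap` parameter, and the reads are assumptions for illustration.

```python
# Hypothetical sketch: greedy assembly by repeatedly merging the best-overlapping reads.
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge the pair with the longest overlap until no overlaps remain."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_length, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    length = overlap_length(left, right, min_overlap)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # nothing left to merge; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy reads (invented) sheared from the sequence ATGGCGTACCTGA.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
assembled = greedy_assemble(reads)
print(assembled)  # ['ATGGCGTACCTGA']
```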

prokaryotic genome assembly

Reference-guided or de novo assembly using Unicycler, followed by a comprehensive annotation pipeline that includes de novo prediction of coding (CDS) and non-coding (tRNA, rRNA, tmRNA, miRNA) gene sequences by BAKTA. Predicted genes are then compared against a variety of databases, including KEGG, COG, CAZy, [Pfam/InterPro](https://www.ebi.ac.uk/interpro/), TIGRFAMs, and ISFinder.

eukaryotic genome assembly

Reference-guided or de novo assembly using SPAdes and (if long-reads are available) Longstitch. If long reads are available, Flye is also used to build an initial draft assembly. If short-reads are available, polishing is then carried out using Pilon. Gene prediction is carried out using Braker and functional annotation using eggNOG-mapper.

Metagenomics

Optimized de novo assembly using metaSPAdes and Megahit. Other techniques can be considered depending on the data used. Multiparametric binning using MetaBAT and DASTool, followed by bin evaluation with CheckM, SprayNPray, and Binarena. Other techniques can be considered depending on the data used. Identification of single-copy genes using GToTree, and generation of a phylogenomic tree in the context of the 100 most closely related genomes available from NCBI's RefSeq database.

phylogenomics

Taxonomic profiling at the read level, using Kraken, and at the contig level (following a Unicycler assembly) using SprayNPray. This analysis also includes identification of the most closely related sequenced genomes from RefSeq/GenBank using Mash and construction of a phylogenomic tree using GToTree.

microbiome profiling

Analysis using Qiime2 and Dada2, producing an ASV/OTU table, sequences for each ASV/OTU in FASTA format, and taxonomic assignment for each sequence. Diversity calculations include alpha diversity indices (Shannon’s index, Pielou’s evenness) and beta diversity matrices (Jaccard, Bray-Curtis, UniFrac) derived from an appropriately rarefied OTU table and a binned metadata table. Beta diversity matrices are further processed by principal coordinate analysis, and those results are visualized with the metadata. Statistical testing against alpha diversity (Kruskal-Wallis tests) and beta diversity (PERMANOVA, ANOSIM) is also available. Rarefied OTU tables can be transformed into relative abundances with the option of pseudo-counting and further clr-transformation, as well as ANCOM testing against binned metadata.
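For readers unfamiliar with the diversity metrics named above, the following plain-Python sketch shows how several of them are computed from per-sample count vectors. It is an illustration only and does not reproduce the Qiime2 workflow: the function names and toy counts are invented, the pseudocount of 1 is an arbitrary choice, and rarefaction, UniFrac, and statistical testing are left out.

```python
# Hypothetical sketch: alpha/beta diversity and clr transform from raw counts.
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou(counts):
    """Pielou's evenness J' = H' / ln(S), where S is the number of observed taxa."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # undefined for a single taxon; 0.0 is a convention chosen here
    return shannon(counts) / math.log(observed)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2 * (sum of shared minima) / (total counts)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


def jaccard(a, b):
    """Jaccard distance on presence/absence of each taxon."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def clr(counts, pseudocount=1):
    """Centered log-ratio transform after adding a pseudocount to every taxon."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Toy samples (invented), already at equal depth of 40 reads each.
sample_a = [10, 10, 10, 10]
sample_b = [40, 0, 0, 0]

print(shannon(sample_a), pielou(sample_a))
print(bray_curtis(sample_a, sample_b), jaccard(sample_a, sample_b))
print(clr(sample_b))
```

Here the perfectly even sample has an evenness of 1, and the two samples have a Bray-Curtis dissimilarity and a Jaccard distance of 0.75 each.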

transcriptomics

Read mapping to a reference genome using Bowtie2. Data is summarized into count tables using HTSeq, and differential expression analysis is performed in DESeq2. Transcripts are reconstructed using Trinity.

methylation prediction
