Tools

Here are links to our publicly available tools:

  • Sequence Search
  • Genome Browser
  • Metagenomics Software
  • SeaFlow Cytometer
  • Pplacer

    Sequence Search

    Armbrust Cluster BLAST Interface

    This interface provides Web 2.0 access to NCBI's BLAST executable, run in highly parallel fashion on the Armbrust Lab's 256-CPU Linux cluster.


    Genome Browsers

    AnnoJ

    "Anno-J is a Web 2.0 application designed for visualizing deep sequencing data and other genome annotation data. It is intended to run in modern W3C compliant browsers*, and allows flexible configuration of plugins and data streams from providers located anywhere on the internet."

    Metagenomics

    Pplacer

    Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.

    Pplacer was developed by the Matsen Group at Fred Hutch. More information can be found here:
    http://matsen.fhcrc.org/pplacer/

    Here is some information on installing all the dependencies needed to run pplacer:
    Installation

    Here is a quick tutorial to run pplacer:
    Pipeline

    Installation

    Pplacer software installation, adapted by Sacha, May 2016

    1. Take an introductory command line course, if needed. For Mac, I used the Macheads101 videos on YouTube. This will help you figure out how to install the software. For a more advanced command line course (writing scripts, etc.), see:
      http://mywiki.wooledge.org/BashGuide

      Optional: take an introductory Python course, e.g. 'Learn Python the Hard Way' or the Codecademy Python course.

    2. Install Anaconda (Python) https://docs.continuum.io/anaconda/install
    3. Install a text editor to work on your code. I use TextWrangler, with its command line tools installed; you can run .py code directly from it. To create a new text file, type "edit name-file.txt" in the terminal (or use a .py extension for Python). To run a Python script, type "python namefile.py".
    4. Install the alignment program MAFFT http://mafft.cbrc.jp/alignment/software/macstandard.html
      You can also run alignments online through Guidance2.0, which will help you identify unreliable alignment regions.
    5. Install homebrew. To do so, type in your terminal:
      /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    6. Install GSL 1.16 through Homebrew (needed for pplacer to run):
      brew install gsl
    7. Download and unzip pplacer-Darwin-v1.1.alpha17 from https://github.com/matsen/pplacer/releases/tag/v1.1.alpha17
      Place the pplacer, guppy and rppr binaries in a directory on your $PATH. I just copied them into
      /usr/local/bin
    8. Install RAxML (slow) and FastTree (fast) for building trees and log files
    9. Install taxtastic to build and maintain reference packages
      https://github.com/fhcrc/taxtastic
      This will also automatically install Biopython 1.66
    10. Install Jalview to look at alignments (you need to have Java installed; download it from Apple if necessary). To open Jalview from the website:
       javaws http://www.jalview.org/webstart/jalview.jnlp -open yourFileName
    11. Install HMMER 3.1b on your computer (http://hmmer.org/). The manual has a tutorial. To run hmmbuild, your alignment has to be converted from FASTA format to Stockholm format. See the little script HMMofFasta. You can use a little package called bioscripts converter to do this job. You can also run hmmbuild on an MSF file, see http://bioinf.ibun.unal.edu.co/cursos/Course01/hmm_profiles/
      Jalview can convert an alignment to MSF format
    12. Install seqmagick to remove duplicate sequences, quickly convert between Stockholm (.sto) and FASTA files, etc. http://seqmagick.readthedocs.org/en/latest/
    13. Install Guidance2.01 http://guidance.tau.ac.il/ver2/source.php. Compiling produces lots of warning messages; it is not yet curated for Mac...
    14. Install R and Biostrings. In R, type:
      source("https://bioconductor.org/biocLite.R")
      biocLite("Biostrings")
    15. Install Guidance2.0: you need to install BioPerl first, which is a bit complex... still needs to be done
    16. Install ProtTest and make a RAxML tree
    17. Install NCBI edirect to fetch and search NCBI from the command line.
      http://www.ncbi.nlm.nih.gov/books/NBK179288/
       cd ~
        perl -MNet::FTP -e \
          '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login;
           $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.zip");'
        unzip -u -q edirect.zip
        rm edirect.zip
        export PATH=$PATH:$HOME/edirect
        ./edirect/setup.sh
        
        efetch -help
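
Once the steps above are done, a quick check that every command is reachable from $PATH can save debugging later. The sketch below is not part of the original instructions: `check_tools` is a hypothetical helper name, and the command names are the ones used on this page (some FastTree packages install the binary as `fasttree`, so adjust the list to your setup).

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which commands are missing from $PATH.
# Prints one "missing: NAME" line per absent command and returns 1 if any is absent.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Command names as they appear on this page; this list is an assumption
# about how the tools were installed on your machine.
check_tools pplacer guppy rppr mafft FastTree taxit seqmagick \
    hmmbuild hmmsearch hmmalign efetch \
    || echo "Some tools are not on PATH yet; see the steps above."
```

If nothing is printed, all of the listed commands were found.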
      

    Pipeline

    Pplacer pipeline adapted by Sacha, May 2016

    A basic pplacer run looks like:

     pplacer -c my.refpkg aln.fasta 

    The reference package is built from the input alignment and tree using the taxtastic package. The alignment FASTA contains the reference sequences (used for the reference tree) aligned together with the query sequences, obtained using HMMER.

    PREPARE YOUR REFERENCE ALIGNMENT

    1. To get started: go to the right directory
       cd /Users/.../..../...
    2. Make a multiple sequence alignment (MSA) and remove unreliable regions.
      I like to use Guidance2.0 with the MAFFT algorithm; it takes a bit of time:
      http://guidance.tau.ac.il/ver2/overview.phpInputMSA
    3. If you have more than 500 sequences, you can also run MAFFT 7.0 locally on your computer:

      mafft filename.fasta > filename.aln.fasta

      or, with more options:

      mafft --localpair --maxiterate 1000 --reorder --leavegappyregion filename.fasta > filename.aln.fasta

      You can use ProtTest to test for the best amino acid substitution model. Pplacer only knows about the GTR, WAG, LG, and JTT models.

    4. Remove possible duplicate sequences from the alignment (they will mess up the pplacer run later on).
      The name of my alignment is "filename".
      seqmagick convert --deduplicate-sequences filename.aln.fasta filename.aln.dedup.fasta
    5. Remove the stop codon symbol * (asterisk) from the alignment files (it is not recognized by pplacer).
      You can do a find-and-replace in any text editor.

    WHEN YOU HAVE YOUR ALIGNMENT READY:

      1. Build tree with FastTree, creating a log file
        FastTree -log filename.tree.log filename.aln.dedup.fasta > filename.tree

        FastTree is fast and easy. You can also construct your tree using RAxML, which will give you more advanced options.

      2. Look at the tree using FigTree or Archaeopteryx (forester.jar).
      3. Make the reference package without TaxIDs (the script will be updated for TaxIDs soon)
        taxit create -l nod -P filename.refpkg --aln-fasta filename.aln.dedup.fasta --tree-stats filename.tree.log --tree-file filename.tree
      4. Convert alignment format from fasta to stockholm format
        seqmagick convert filename.aln.dedup.fasta filename.aln.dedup.sto
      5. Run HMMbuild to get HMM profile
        hmmbuild filename.hmm filename.aln.dedup.sto
      6. Use the HMM profile to run an HMM search on the metatranscriptomics file and get the output in .sto format
        hmmsearch -A filename.query.sto filename.hmm /Users//path to meta transcriptome file

        Note from the manual: the --tblout and --domtblout options save output in simple tabular formats.
        To keep only hits with an E-value below a threshold, use -E (0.001 in this example):

        hmmsearch -A filename.query.sto -E 0.001 --tblout filename.query.txt filename.hmm /Users/path to meta transcriptome file
      7. Use hmmalign to align the query hits to the reference alignment
        hmmalign -o filename.combo.sto --mapali filename.aln.dedup.sto filename.hmm filename.query.sto
      8. And now, run pplacer using the refpkg.
        pplacer -c filename.refpkg filename.combo.sto
      9. Now run `guppy fat` to make a phyloXML "fat tree" visualization, and run archaeopteryx to look at it.
        Note that `fat` can be run without the reference package specification, e.g.:
        guppy fat filename.combo.jplace
      10. We have a little script function `aptx` to run archaeopteryx from within this script
        (you can also open them directly from the archaeopteryx user interface if you prefer).
        aptx() {
            java -jar bin/forester.jar -c bin/_aptx_configuration_file "$1"
        }
        aptx filename.combo.xml &

        Look at PPLACER demo for more options http://matsen.fhcrc.org/pplacer/manual.html
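
Two of the manual steps above can be scripted. The following is a minimal sketch, not part of the original pipeline: `strip_stop_codons` and `hits_below_evalue` are hypothetical helper names. The first deletes `*` from sequence lines only, leaving FASTA header lines untouched. The second lists target names from an hmmsearch `--tblout` file whose full-sequence E-value is at or below a threshold, assuming that value sits in the fifth whitespace-separated column.

```shell
#!/bin/sh
# strip_stop_codons (hypothetical helper): remove "*" from FASTA sequence lines.
# Header lines (starting with ">") are passed through unchanged.
strip_stop_codons() {
    sed '/^>/!s/\*//g' "$@"
}

# hits_below_evalue (hypothetical helper): print target names from a --tblout
# file whose full-sequence E-value (assumed to be column 5) is <= the threshold.
# Comment lines starting with "#" are skipped.
# Usage: hits_below_evalue THRESHOLD TBLOUT_FILE
hits_below_evalue() {
    max=$1
    shift
    awk -v max="$max" \
        '!/^#/ && NF >= 5 && ($5 + 0) <= (max + 0) { print $1 }' "$@"
}
```

For example, `strip_stop_codons filename.aln.dedup.fasta > cleaned.fasta` writes a cleaned copy (the output name is illustrative), and `hits_below_evalue 1e-5 filename.query.txt` applies a stricter cutoff to the table written by hmmsearch.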

    MANTA

    We've developed an R package for Microbial Assemblage Normalized Transcript Analysis [MANTA] to perform comparative metatranscriptomics. The software accepts count or alignment data as input, provides cross tabulation routines, facilitates normalization and significance estimation and outputs summary tables and publication quality differential expression plots as well as visual and statistical quality control diagnostics.

    Downloads

    Additional Resources

    EdgeR Bioconductor page
    EdgeR Bioinformatics paper
    More information on RA plots

    Referencing Articles

    PNAS 18 January 2012 "Comparative metatranscriptomics identifies molecular bases for the physiological responses of phytoplankton to varying iron availability".

    SEAStAR

    SeaFlow cytometer


    SeaFlow is a novel underway flow cytometer created at UW by Jarred Swalwell. It is designed to continuously measure the abundance and composition of microbial populations, making it possible to analyze the equivalent of one sample every three minutes. The instrument collects information about the size and pigment content of individual cells and counts several thousand cells every second in real time. It uses the light scattering and autofluorescence properties of individual cells to discriminate and quantify different cell populations that span 0.5 to 15 micrometers in size. The instrument is semi-autonomous and can be controlled remotely via satellite connection.

    SeaFlow data are processed at the Armbrust Lab by Francois Ribalet. The abundance and optical properties of the different microbial populations can be visualized using our web interface. The distribution of microbial populations can also be visualized in Google Earth. Click on a cruise track to download the individual KML file.

    Data visualization

    Click here to interactively visualize SeaFlow data (website created by Chris Berthiaume).

    Seaflow visualization interface

    Instrument Description

    The instrument is presented in the following publication: Swalwell, J.E., Ribalet, F., and Armbrust, E.V. 2011. SeaFlow: A novel underway flow-cytometer for continuous observations of phytoplankton in the ocean. Limnology & Oceanography Methods 9: 466-477.

    Instrument Photos

    images courtesy of Jarred Swalwell

    First-generation SeaFlow (June 2008)
    Second-generation SeaFlow (August 2009)
    SeaFlow on the UW research vessel (April 2010)
    SeaFlow on a container ship (OOCL Tokyo) (January 2011)