Pplacer

Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.

Pplacer was developed by the Matsen Group at Fred Hutch. More information can be found here:
http://matsen.fhcrc.org/pplacer/

Here is some information on installing all the dependencies needed to run pplacer:
Installation

Here is a quick tutorial to run pplacer:
Pipeline

Installation

Pplacer software installation, adapted by Sacha, May 2016

  1. Take an introductory command line course, if needed. On Mac, I used the Macheads101 videos on YouTube. This will help you figure out how to install the software. For a more advanced command line course (writing scripts, etc.):
    http://mywiki.wooledge.org/BashGuide

    Optional: take an introductory Python course, e.g. 'Learn Python the Hard Way' or the Codecademy Python course.

  2. Install anaconda (Python) https://docs.continuum.io/anaconda/install
  3. Install a text editor to work on your code. I am using TextWrangler, with its command line tools installed. You can run .py code directly from the editor. To create a new text file, type "edit name-file.txt" in the terminal (or use a .py extension for Python). To run a Python script, type "python namefile.py".
  4. Install alignment program MAFFT http://mafft.cbrc.jp/alignment/software/macstandard.html
    You can also run alignments online through Guidance2.0, which will help you identify unconserved regions.
  5. Install homebrew. To do so, type in your terminal:
    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  6. Install GSL1.16 through homebrew (needed for PPLACER to run)
    brew install gsl
  7. Download and unzip pplacer-Darwin-v1.1.alpha17 from https://github.com/matsen/pplacer/releases/tag/v1.1.alpha17
    Place the pplacer, guppy and rppr binaries in a directory on your $PATH. I just copied them into
    /usr/local/bin
  8. Install RAxML (slow) and FastTree (fast) for building trees and log files
  9. Install taxtastic to build and maintain reference packages
    https://github.com/fhcrc/taxtastic
    This will also automatically install biopython 1.66
  10. Install Jalview to look at alignments (you need to have Java installed; go to Apple to download it). Open Jalview from the website:
     javaws http://www.jalview.org/webstart/jalview.jnlp -open yourFileName
  11. Install HMMER 3.1b on your computer (http://hmmer.org/). The manual has a tutorial. To run hmmbuild, your alignment has to be converted from FASTA format to Stockholm format. See the little script HMMofFasta. You can use a little package called bioscripts converter to do this job. You can also run hmmbuild on an MSF file, see http://bioinf.ibun.unal.edu.co/cursos/Course01/hmm_profiles/
    Jalview can convert an alignment to MSF format.
  12. Install seqmagick to remove duplicate sequences, quickly change between sto and fasta files, etc. http://seqmagick.readthedocs.org/en/latest/
  13. Install Guidance2.01 http://guidance.tau.ac.il/ver2/source.php. Lots of warning messages when compiling, not yet curated for Mac...
  14. Install R and biostrings. In R, type:
    source("https://bioconductor.org/biocLite.R")
    biocLite("Biostrings")
  15. Install guidance2.0: need to install bioperl first, is a bit complex... still needs to be done
  16. Install ProtTest and make a RAxML tree
  17. Install NCBI edirect to fetch and search NCBI from the command line.
    http://www.ncbi.nlm.nih.gov/books/NBK179288/
     cd ~
      perl -MNet::FTP -e \
        '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login;
         $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.zip");'
      unzip -u -q edirect.zip
      rm edirect.zip
      export PATH=$PATH:$HOME/edirect
      ./edirect/setup.sh
      
      efetch -help
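
Once everything above is installed, a quick way to confirm that the binaries are reachable is to look each one up on your $PATH. The sketch below is only an illustration: the function name `check_tools` is made up here, and the list of tool names is an assumption based on the commands used in the pipeline, so adjust it to what you actually installed.

```shell
#!/bin/sh
# Report which tools are visible on $PATH.
# check_tools is a hypothetical helper, not part of pplacer or any other tool.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing, so the function can gate a script.
    [ "$missing" -eq 0 ]
}

# Tool names assumed from the commands used in the pipeline section.
check_tools pplacer guppy rppr mafft FastTree taxit seqmagick \
    hmmbuild hmmsearch hmmalign \
    || echo "Install the missing tools (or fix your PATH) before continuing."
```

Anything reported as missing is either not installed or sits in a directory that is not on your $PATH.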
    

Pipeline

Pplacer pipeline adapted by Sacha, May 2016

A basic pplacer run looks like:

 pplacer -c my.refpkg aln.fasta 

The reference package is made from the input alignment and tree using the taxtastic package. The alignment file contains the reference sequences (used for the reference tree) aligned together with the query sequences, which are found and aligned using HMMER.
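
To see how the files feed into each other before running anything, the commands from the steps in this tutorial can be printed as a dry run. This is a sketch under assumptions: `BASE`, `READS`, `run` and `pipeline` are hypothetical names chosen for illustration, and the alignment and tree are assumed to exist already (the MAFFT and FastTree steps are left out because they write to standard output).

```shell
#!/bin/sh
# Dry-run sketch: print the commands that lead from a finished reference
# alignment and tree to a pplacer run, without executing anything.
# BASE and READS are hypothetical placeholders; no tool requires these names.
BASE="filename"
READS="metatranscriptome.fasta"

# Echo the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pipeline() {
    # Reference package from the deduplicated alignment, tree and tree log.
    run taxit create -l nod -P "$BASE.refpkg" \
        --aln-fasta "$BASE.aln.dedup.fasta" \
        --tree-stats "$BASE.tree.log" --tree-file "$BASE.tree"
    # Profile HMM from the same alignment in Stockholm format.
    run seqmagick convert "$BASE.aln.dedup.fasta" "$BASE.aln.dedup.sto"
    run hmmbuild "$BASE.hmm" "$BASE.aln.dedup.sto"
    # Find query sequences, then align them together with the references.
    run hmmsearch -A "$BASE.query.sto" -E 0.001 "$BASE.hmm" "$READS"
    run hmmalign -o "$BASE.combo.sto" --mapali "$BASE.aln.dedup.sto" \
        "$BASE.hmm" "$BASE.query.sto"
    # Place the queries on the reference tree.
    run pplacer -c "$BASE.refpkg" "$BASE.combo.sto"
}

pipeline
```

Setting `DRY_RUN=0` would execute the commands instead of printing them, which only makes sense once all the tools are installed and the input files exist.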

PREPARE YOUR REFERENCE ALIGNMENT

  1. To get started: go to the right directory
     cd /Users/.../..../...
  2. Make a multiple sequence alignment (MSA), and remove unreliable regions.
    I like to use Guidance2.0 - MAFFT algorithm, takes a bit of time:
    http://guidance.tau.ac.il/ver2/overview.phpInputMSA
  3. If you have more than 500 sequences, you can also run MAFFT 7.0 locally on your computer:

    mafft filename.fasta > filename.aln.fasta

    or, with more options:

    mafft --localpair --maxiterate 1000 --reorder --leavegappyregion filename.fasta > filename.aln.fasta

    You can use ProtTest to find the best amino acid substitution model. Pplacer only knows about the GTR, WAG, LG, and JTT models.

  4. Remove possible duplicate sequences from the alignment (they will mess up the pplacer run later on).
    The name of my alignment is "filename".
    seqmagick convert --deduplicate-sequences filename.aln.fasta filename.aln.dedup.fasta
  5. Remove the stop codon symbol * (asterisk) from the alignment files (it is not recognized by pplacer).
    You can do a find-and-replace in any text editor.
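
If you prefer the command line to a text editor for removing the asterisks, something like the following could work. It is a minimal sketch: `strip_stops` is a made-up helper name, and the example file is invented just to show the behaviour. Header lines (starting with `>`) are left alone so that sequence names are not changed.

```shell
#!/bin/sh
# Remove '*' (stop codon symbol) from sequence lines only; header lines
# starting with '>' are printed unchanged.
# strip_stops is a hypothetical helper, not part of seqmagick or pplacer.
strip_stops() {
    sed '/^>/!s/\*//g' "$1"
}

# Tiny made-up FASTA file to show the behaviour (not real data).
tmp=$(mktemp)
printf '>seqA\nMKT*\n>seqB\nMKV\n' > "$tmp"
strip_stops "$tmp"
rm -f "$tmp"
```

The function writes the cleaned alignment to standard output, so redirect it to a new file and replace the original only after checking the result.
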

WHEN YOU HAVE YOUR ALIGNMENT READY

    1. Build tree with FastTree, creating a log file
      FastTree -log filename.tree.log filename.aln.dedup.fasta > filename.tree

      FastTree is fast and easy. You can also construct your tree using RAxML, which will give you more advanced options.

    2. Look at the tree using FigTree or Archaeopteryx (forester.jar).
    3. Make reference package w/o TaxIDs (script will be updated for TaxIDs soon)
      taxit create -l nod -P filename.refpkg --aln-fasta filename.aln.dedup.fasta --tree-stats filename.tree.log --tree-file filename.tree
    4. Convert alignment format from fasta to stockholm format
      seqmagick convert filename.aln.dedup.fasta filename.aln.dedup.sto
    5. Run HMMbuild to get HMM profile
      hmmbuild filename.hmm filename.aln.dedup.sto
    6. Use the HMM profile to run an HMM search on the metatranscriptomics file and save the hits in .sto format
      hmmsearch -A filename.query.sto filename.hmm /Users//path to meta transcriptome file

      Note from the manual: the --tblout and --domtblout options save output in simple tabular formats.
      To keep only hits with an E-value below a threshold, add -E (0.001 in this example):

      hmmsearch -A filename.query.sto -E 0.001 --tblout filename.query.txt filename.hmm /Users/path to meta transcriptome file
    7. Use hmmalign to align query hits to the reference alignment
      hmmalign -o filename.combo.sto --mapali filename.aln.dedup.sto filename.hmm filename.query.sto
    8. And now, run pplacer using the refpkg.
      pplacer -c filename.refpkg filename.combo.sto
    9. Now run `guppy fat` to make a phyloXML "fat tree" visualization, and run archaeopteryx to look at it.
      Note that `fat` can be run without the reference package specification, e.g.:
      guppy fat filename.combo.jplace
    10. We have a little script function `aptx` to run archaeopteryx from within this script
      (you can also open them directly from the archaeopteryx user interface if you prefer).
      aptx() {
          java -jar bin/forester.jar -c bin/_aptx_configuration_file "$1"
      }
      aptx filename.combo.xml &

      Look at PPLACER demo for more options http://matsen.fhcrc.org/pplacer/manual.html
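
The table written by `--tblout` in the hmmsearch step can also be filtered after the fact, for example to list the hits that pass a stricter E-value cutoff than the one used in the search. The sketch below assumes the usual layout of that file as I understand it (comment lines start with `#`, target name in column 1, full-sequence E-value in column 5); check the header of your own file before relying on it. `filter_tblout` is a made-up helper name.

```shell
#!/bin/sh
# Print target names from an hmmsearch --tblout file whose full-sequence
# E-value is at or below a cutoff.
# Assumed layout: '#' comment lines, target name in column 1, E-value in
# column 5. filter_tblout is a hypothetical helper, not part of HMMER.
filter_tblout() {
    awk -v cutoff="$2" '!/^#/ && ($5 + 0) <= (cutoff + 0) { print $1 }' "$1"
}

# Made-up example rows to show the behaviour (not real search results).
tmp=$(mktemp)
printf '# target accession query accession E-value score bias\n' > "$tmp"
printf 'read1 - query - 1.5e-20 80.1 0.0\n' >> "$tmp"
printf 'read2 - query - 0.5 3.2 0.0\n' >> "$tmp"
filter_tblout "$tmp" 0.001
rm -f "$tmp"
```

With the files from this tutorial, the call would look like `filter_tblout filename.query.txt 0.00001`.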