postpb manual

This manual will guide you through postpb usage by explaining each of the panels in the app and their main functions.

Orientation

After installing and starting postpb as described here, make sure the app runs in your browser in full-screen mode. If you are using RStudio, it may display the app in its viewer. In that case, you can switch to the browser by clicking ‘open in browser’ in the viewer’s top panel.

Similar to a regular website, postpb is organised in multiple pages (panels), which can be accessed by clicking on the tabs at the very top of the screen (‘Parameters’, ‘Trees’, ‘About’). While the last contains some general information, the first two are where the analyses happen. Note that the panels are completely independent of each other, meaning that you can use one or both, and that you don’t necessarily need to provide data from the same runs to both panels (although this is the typical use case).

File formats

When you start the app, you will likely have data from a Bayesian phylogenetic analysis. Typically, this would be

Parameter or trace files that contain the values of various estimated model parameters over the length of the run, and
Tree files containing estimated phylogenies for each iteration of the run.

Depending on the software used, these files may look quite different. postpb works best with output files from Phylobayes and MrBayes, but should work for other Bayesian phylogenetics software as well.

Parameter files are typically tab-separated files, with one column per model parameter, and one line for each iteration or generation of the run. Files produced by Phylobayes and MrBayes end in .trace or .p, respectively.
Tree files contain one tree per iteration of the run. The most typical tree formats are Newick (e.g., used by Phylobayes, files end in .treelist) and Nexus (used by MrBayes, files end in .t).
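To make the parameter file layout described above concrete, here is a minimal Python sketch that reads a tab-separated table with one column per parameter and one line per iteration. It is not part of postpb; the function name `read_trace` and the column names in the example are hypothetical, and real Phylobayes or MrBayes files may carry extra header or comment lines that this sketch does not handle.

```python
import csv
import io

def read_trace(handle):
    """Parse a tab-separated trace table into {column name: list of floats}."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    columns = {name: [] for name in header}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns

# Hypothetical three-iteration trace (column names are made up for illustration)
example = (
    "iter\tloglik\talpha\n"
    "0\t-1050.2\t0.51\n"
    "1\t-1032.7\t0.48\n"
    "2\t-1029.9\t0.47\n"
)
trace = read_trace(io.StringIO(example))
print(trace["loglik"])
```

Each key of the returned dictionary corresponds to one model parameter, and each list holds that parameter's values in iteration order.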

Examples of these file types can be found in the example folder of this repository.

Examples

The examples were specifically chosen to illustrate a ‘good’ Phylobayes run (good mixing of chains, convergence between and within chains achieved, no topological conflict, Example 2), and a run that was not successful (Example 1). To familiarise yourself with the app, please have a look at the examples first.

The Parameters tab

When you start the app, the ‘Parameters’ tab will open automatically. You will notice that there is a side panel to the left, and a main display area to the right. The main display carries 4 tabs (‘Traces’, ‘Violin’, ‘Density’, ‘Summary statistics’). The side panel is where you find all of the global options and settings, i.e., the settings that apply to each of the plots.

To get to know the app, it makes sense to use one of the examples. Click ‘Load example 1’ and wait for a plot to appear. Alternatively, upload your own data through the interface.

Sidebar

In the sidebar to the left, the following global options (i.e., applying to all trace plots) are available:

Burnin - the number of iterations to exclude from the beginning of the run. This defaults to 20% of the shortest chain, but it makes sense to play around with this (indeed, one of the aims of postpb is to allow you to choose a sensible burnin size). Changing this will automatically update the plots and stats in the main panel.
Select trace file - it is recommended to have multiple runs and/or chains for your Bayesian phylogenetics analysis. Here, you can exclude selected chains from your postpb analysis. This may be sensible if one of your chains is a clear outlier. Any alterations will automatically update your plots and stats in the main panel.
Sampling frequency - how many data points from your Bayesian phylogeny do you want to analyse in postpb? The default value is 10, meaning that only every 10th sample (10% of the run) will be analysed. Due to the high autocorrelation in Bayesian phylogenetics analyses, this is often a good default. Keep in mind, however, that you may already have specified a reduced sampling frequency within, e.g., Phylobayes or MrBayes. If you want to include the entire run, choose ‘1’. Low values here will result in longer run times of postpb. To change the sampling frequency and update your plots and stats in the main panel, click ‘Apply’.
More options - this button enables another suite of options that are all about the layout of the plots (size and number of columns). All changes will be applied immediately.
Download pdf - press this once happy with the look of your plots. A pdf will be generated for you to download.

IMPORTANT: The items in the main panel area will update dynamically if any of the global options are changed, and each change will take a moment to be applied. It is therefore recommended not to make too many changes at once.
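The interplay of burnin and sampling frequency can be sketched in a few lines of Python. This is an illustration under stated assumptions, not postpb's actual code: the function names are hypothetical, and it is assumed that thinning starts at the first post-burnin iteration and keeps every n-th sample.

```python
def default_burnin(chain_lengths, percent=20):
    """Burnin as a percentage of the shortest chain (20% is the default described above)."""
    return min(chain_lengths) * percent // 100

def post_burnin_sample(values, burnin, step):
    """Drop the first `burnin` iterations, then keep every `step`-th value (assumed thinning rule)."""
    return values[burnin::step]

# Two hypothetical chains of unequal length; the values are just iteration indices
chains = {"chain1": list(range(100)), "chain2": list(range(80))}

burnin = default_burnin([len(v) for v in chains.values()])
thinned = {name: post_burnin_sample(v, burnin, 10) for name, v in chains.items()}

print(burnin)
print({name: len(v) for name, v in thinned.items()})
```

With a sampling frequency of 1, the slice keeps every post-burnin iteration, which is why low values lead to longer run times.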

Main panel

Access the parameter plots by clicking through the different panels of the main display. Alter the global options (e.g., burnin and sampling frequency) to see how that affects descriptive plots and statistics.

Trace - generates simple scatter plots of all variables over the iterations of the analysis. This is a very common way of displaying results of a Bayesian analysis and is helpful in determining suitable burnin sizes and in assessing chain convergence and mixing. You can choose to have the scatter plots in lines, points, or both. Play around with the burnin slider and observe how the plot changes. Ideally, a trace plot should look like a ‘hairy caterpillar’ (have a look at Example 2) - this indicates good mixing, i.e., the chain is exploring the parameter space well (successive steps don’t stay on the same value for very long and don’t follow only a single direction).
Violin & Density - these very similar plots show the distribution of your parameters in the post-burnin sample. The plots are especially helpful in determining how similar the different chains are compared with each other. Ideally, you’d want the shape of these plots to be very similar between independent chains. Large differences in the distributions indicate poor convergence between chains.
Summary statistics - here, the data is summarised once more using descriptive statistics. Click on the ‘Toggle explanations’ button to learn a bit about the different values and what they may mean for the interpretation of the run. For documentation purposes, the table of values can be copied to the clipboard (e.g., to paste them into a spreadsheet application), downloaded as .csv file or printed from the browser.

The Trees tab

Access the trees tab through the panel at the very top of the page. Again, using one of the examples is probably a good way of trying out the app. Alternatively, upload your tree files through the interface. Once you have uploaded tree data, postpb will begin calculating a consensus tree. Depending on the number of trees, this may take some time. A popup will remind you to be patient here.

Before checking out the tree, familiarise yourself with the global options (panel on the left). Much of this is very similar to the ‘Parameters’ tab:

Burnin & Select tree files - do exactly what they do in the ‘Parameters’ tab.
Number of cores - some of the most intense computations benefit from parallelisation. postpb will automatically detect the number of cores available on your machine and select one core fewer than the maximum number. Choose a lower value if you run many other processes in parallel with postpb (but be prepared that it may then take longer to run).
More options - as in the ‘Parameters’ tab, you will find some display options here, all of which will be applied immediately.
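The default for the number of cores described above amounts to a one-line rule. The following Python sketch illustrates it; the function name is hypothetical, and the floor of one core is an assumption for machines where only a single core (or no count at all) is reported.

```python
import os

def default_cores(detected):
    """One core fewer than detected, but never fewer than one (assumed floor)."""
    return max(1, (detected or 1) - 1)

# os.cpu_count() may return None if the count cannot be determined
cores = default_cores(os.cpu_count())
print(cores)
```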

Consensus tree display and customization

After loading an example or results files, a consensus tree will be generated and displayed in the main panel area. The tree can now be modified through various means.

Any alterations made here will be applied to all subsequent tree plots in the other tabs as well.

Rooting | The consensus tree can be midpoint rooted by clicking ‘midpoint rooting’. Alternatively, the outgroup taxa can be selected in the dropdown menu in the sidebar (‘Select outgroups for rooting’), followed by clicking ‘Reroot with outgroup’ once the selection is complete. Interactive rooting is also supported: click and drag the mouse pointer around the outgroup taxa. When releasing the mouse button, confirm in the popup that you wish to use these taxa for rooting and finally, apply the rooting by clicking ‘Reroot with outgroup’. See interactive rooting in action below.

Interactive rooting in postpb.

Collapsing poorly supported nodes | By definition, a consensus cladogram only shows nodes that are found in at least half of all trees of the posterior sample. This means, however, that the consensus tree might still contain nodes that are poorly supported. The slider at the very top of the main panel can be used to collapse nodes below a user-defined value.
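The idea behind node support and collapsing can be sketched as follows. This is a simplified Python illustration, not postpb's implementation: each tree is represented only by its set of clades (as frozensets of taxon names), and the taxa, trees, and function names are hypothetical.

```python
from collections import Counter

def clade_support(trees):
    """Fraction of trees in which each clade occurs."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

def collapse(support, threshold):
    """Keep only the clades whose support reaches the threshold."""
    return {clade for clade, s in support.items() if s >= threshold}

# Four hypothetical post-burnin trees on taxa A-E, each given by its non-trivial clades
AB, ABC, DE, ABDE, CDE, BCDE = (
    frozenset(s) for s in ("AB", "ABC", "DE", "ABDE", "CDE", "BCDE")
)
trees = [
    {AB, ABC, DE},
    {AB, ABC, DE},
    {AB, DE, ABDE},
    {DE, CDE, BCDE},
]

support = clade_support(trees)
majority = collapse(support, 0.5)    # clades found in at least half of the trees
stricter = collapse(support, 0.75)   # slider moved to a higher cut-off

print(sorted("".join(sorted(c)) for c in majority))
print(sorted("".join(sorted(c)) for c in stricter))
```

In this toy sample, the clade ABC is present in exactly half of the trees, so it appears in the majority consensus but is collapsed once the cut-off is raised to 0.75.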

Highlighting taxa | Highlighting specific taxa works very similarly to rooting the tree: you can specify which taxa to highlight using the dropdown menu in the sidebar (‘Select taxa to highlight’). Alternatively, you can interactively highlight multiple taxa.

When using multiple colours, you need to first select the taxa to highlight, and then the colour to use. This is demonstrated below.

Interactive highlighting in postpb.

Other display options & export | In the sidebar, under ‘More options’, you can change the appearance of the tip labels (bold, italics, aligned), the overall size of the plot, and the relative size of labels and lines. There is also support for some very basic tree annotations: simply add your notes to the text box, and they will appear directly underneath the consensus tree (similar to a figure legend). Only unformatted text can be displayed. Once happy with the layout of the tree, you can export the plot using the ‘Download tree as pdf’ button. The tree can also be downloaded in Newick format.

Single generation tree display

In the ‘Trees’ tab, you can display every tree from all chains for each generation. There is also an option to automatically move on to the next tree every few seconds (use the ‘PLAY’ button). This is not so much an objective analysis, but can be informative to learn about the different topologies explored in the posterior sample. This is especially instructive after having highlighted a few clades in your consensus tree: you can then check to see, e.g., how the composition of certain clades changes over time.

Consensus trees of individual chains

Independent chains of your run may result in different topologies. Especially for trees with many taxa, it may be cumbersome to identify conflicting nodes by hand. The ‘Difference’ tab will display all possible comparisons between consensus trees of all chains. Clades that differ in the pairwise comparisons will be highlighted in bright pink to allow easier identification.

Tree distances within and between chains

The tab ‘Pairwise Robinson-Foulds’ shows how similar the trees of the posterior sample are over the course of the analysis. The first plot shows pairwise Robinson-Foulds (RF) distances between trees from different chains for every generation. The expectation for a good run would be that after a burnin period, the distances between the chains should be small and relatively constant. The plot below also shows RF distances, but for subsequent trees of the same chain. Again, the expectation would be that the line here is ~ 0 in the posterior sample. If RF values in this plot vary a lot over the course of the analysis, this may indicate a lack of convergence.

Calculating RF distances is computationally demanding. Reduce the number of trees here if you experience big delays.
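As a rough illustration of the two plots described above, the following Python sketch computes a simplified RF distance as the number of clades found in one tree but not the other (the symmetric difference of the clade sets). This is an assumption-laden sketch rather than postpb's code: trees are reduced to sets of clades, and all names and example data are hypothetical.

```python
def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(tree_a ^ tree_b)

def between_chains(chain_a, chain_b):
    """RF distance between the trees of two chains at each generation."""
    return [rf_distance(a, b) for a, b in zip(chain_a, chain_b)]

def within_chain(chain):
    """RF distance between each tree and its successor in the same chain."""
    return [rf_distance(a, b) for a, b in zip(chain, chain[1:])]

# Hypothetical trees on taxa A-E, each given by its non-trivial clades
AB, ABC, DE, ABDE, CDE, BCDE = (
    frozenset(s) for s in ("AB", "ABC", "DE", "ABDE", "CDE", "BCDE")
)
t1 = {AB, ABC, DE}
t2 = {AB, DE, ABDE}
t3 = {DE, CDE, BCDE}

# Both chains start at different topologies and then settle on t1
chain_a = [t3, t1, t1]
chain_b = [t2, t1, t1]

print(between_chains(chain_a, chain_b))
print(within_chain(chain_a))
print(within_chain(chain_b))
```

In this toy run, the distances drop to 0 after the first generation, which is the pattern expected once the burnin has been passed. The pairwise comparison also shows why the calculation becomes expensive: the number of tree pairs grows with both the number of chains and the number of sampled generations.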

Check for clades of interest

All nodes or bipartitions that are found in at least 50% of the post-burnin trees will be displayed in the consensus tree. However, sometimes you may wish to know how often other bipartitions are found in the dataset. These may be, e.g., groupings that were recovered by alternative approaches but are not found in your consensus tree. You can use the ‘Bipartion support’ tab to check if such groupings are entirely absent or potentially present in a large minority of the posterior sample. Provide the names of the taxa to check either through the text box or by selecting the tip names in the dropdown menu, and click ‘Submit’ to display the frequency of the bipartition.
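A bipartition query of this kind can be sketched in Python as below. The sketch is hypothetical and does not reflect postpb's internals: each unrooted tree is represented by one side of each of its non-trivial splits, and a split is put into a canonical form (the side that does not contain the alphabetically first taxon) so that a grouping and its complement count as the same bipartition.

```python
def canonical_split(group, all_taxa):
    """Return the side of the split that does not contain the reference taxon."""
    group = frozenset(group)
    reference = min(all_taxa)
    return frozenset(all_taxa) - group if reference in group else group

def bipartition_frequency(query, trees, all_taxa):
    """Fraction of trees that contain the queried bipartition."""
    target = canonical_split(query, all_taxa)
    hits = sum(
        1 for tree in trees
        if target in {canonical_split(side, all_taxa) for side in tree}
    )
    return hits / len(trees)

taxa = "ABCDE"
# Four hypothetical post-burnin trees, each given by one side of its two splits
trees = [
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("DE")},
]

print(bipartition_frequency("CD", trees, taxa))   # grouping absent from the majority consensus
print(bipartition_frequency("ABE", trees, taxa))  # the same split, named by its other side
print(bipartition_frequency("AB", trees, taxa))
```

The grouping of C and D would not appear in a majority consensus of these trees, yet the query shows that it is not entirely absent from the sample.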

RWTY

postpb provides an interface for some functions of the RWTY package that are helpful for posterior assessment of Bayesian phylogenetic analyses. It would be beyond the scope of this manual to explain the functions of this package, and it would also be redundant: there is an excellent vignette that you should check out should you wish to use these functions. I especially recommend RWTY if, after analysing your Bayesian phylogenetics run with postpb, you are still unclear about what has happened with the run.

[TODO: An example analysis with postpb]