AMP density tutorial

In this tutorial, we will show how to use the expected.percontigs output produced in the contigs and reads mode to obtain the density of AMPs per species.

This tutorial uses two inputs: the expected results file tests/contigs/expected.percontigs, and the contig taxonomy table example_seqs/example_taxonomy_contigs.tsv.gz.

First, load both tables with pandas:

import pandas as pd

percontigs = pd.read_table('tests/contigs/expected.percontigs', comment='#')
taxonomy = pd.read_table('example_seqs/example_taxonomy_contigs.tsv.gz')

Then merge the two tables on the contig column:

percontigs = percontigs.merge(taxonomy,
                              on='contig',
                              how='outer')
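
Because the merge is an outer join, contigs present in only one of the two tables are kept with missing values. As a minimal sketch of how to see how many contigs fall into each case, the snippet below uses the indicator option of merge on two small, hypothetical tables (the contig names and numbers are invented for illustration and are not taken from the test data):

```python
import pandas as pd

# Hypothetical toy tables; the real ones come from the files loaded above.
toy_percontigs = pd.DataFrame({
    'contig': ['c1', 'c2', 'c3'],
    'length': [1000, 2000, 1500],
    'ORFs': [3, 5, 4],
    'smORFs': [2, 3, 1],
    'AMPs': [0, 1, 0],
})
toy_taxonomy = pd.DataFrame({
    'contig': ['c1', 'c2', 'c4'],
    'taxonomy': ['speciesA', 'speciesB', 'speciesB'],
})

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only' or 'right_only' for each row.
merged = toy_percontigs.merge(toy_taxonomy,
                              on='contig',
                              how='outer',
                              indicator=True)
merge_counts = merged['_merge'].value_counts().to_dict()
print(merge_counts)
```

Rows marked left_only are contigs without a taxonomy assignment, and rows marked right_only are contigs that appear only in the taxonomy table.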

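Next, drop the rows with missing values (contigs that appear in only one of the two tables), remove the contig column, and sum the remaining values per species: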

percontigs = percontigs.dropna()
percontigs = percontigs.drop('contig', axis=1)
percontigs = percontigs.groupby('taxonomy').agg('sum')
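
To illustrate what these three steps do, here is a small sketch on a hypothetical merged table (all values are invented). Contig c3 has no taxonomy and contig c4 has no results, so both are dropped, and the two speciesA contigs are summed into one row:

```python
import pandas as pd

# Hypothetical merged table: c3 lacks a taxonomy, c4 lacks results.
toy = pd.DataFrame({
    'contig': ['c1', 'c2', 'c3', 'c4'],
    'length': [1000.0, 2000.0, 1500.0, None],
    'AMPs': [0.0, 1.0, 0.0, None],
    'taxonomy': ['speciesA', 'speciesA', None, 'speciesB'],
})

per_species = (toy.dropna()                # remove rows with any missing value
                  .drop('contig', axis=1)  # contig names cannot be summed
                  .groupby('taxonomy')
                  .agg('sum'))             # one row per species
print(per_species)
```

Note that after the groupby, the species names are the index of the table rather than a regular column.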

At this point, the table is indexed by species and holds, for each species, the total number of assembled base pairs and the total numbers of ORFs, smORFs and redundant AMPs. The density is the number of AMPs per million assembled base pairs:

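# AMPs per million assembled base pairs
percontigs['AMP_density'] = percontigs.AMPs * 1e6 / percontigs.length
# After the groupby, taxonomy is the index, so it must be written out;
# float_format keeps three decimals in the density column.
percontigs.to_csv('expected.density',
                  sep='\t',
                  header=True,
                  index=True,
                  float_format='%.3f')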

The resulting expected.density table should be as follows:

taxonomy    length    ORFs    smORFs    AMPs    AMP_density
speciesA    4208      11      10        0       0.000
speciesB    11876     15      8         1       84.203
speciesC    5679      8       6         0       0.000
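
As a quick sanity check, the density column can be recomputed from the length and AMPs values shown in the table. The sketch below rebuilds just those columns by hand (it does not read the real file) and applies the same formula:

```python
import pandas as pd

# Values copied from the expected.density table above.
expected = pd.DataFrame({
    'taxonomy': ['speciesA', 'speciesB', 'speciesC'],
    'length': [4208, 11876, 5679],
    'AMPs': [0, 1, 0],
}).set_index('taxonomy')

# AMPs per million assembled base pairs, rounded to three decimals
density = (expected['AMPs'] * 1e6 / expected['length']).round(3)
print(density.to_dict())
```

For speciesB, one AMP in 11876 assembled base pairs gives 1 * 1e6 / 11876, which rounds to 84.203 and matches the table. The other two species have no AMPs, so their density is zero.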