tacco.tools.setup_goa_analysis

setup_goa_analysis(gene_index, gene_info_file='https://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz', tax_id=10090, GO_obo_file='http://purl.obolibrary.org/obo/go/go-basic.obo', gene2GO_file='https://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz', working_directory='.')

Set up a GO analysis. This is a convenience wrapper around the goatools package [Klopfenstein18]. Like goatools, it performs the enrichment analysis independently of the availability of web services, using databases that are downloaded once, for reproducibility.
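For orientation only: for a single GO term, an enrichment analysis asks whether the term is carried by more study genes than expected from the background. The following is a minimal sketch of a one-sided hypergeometric (over-representation) p-value in plain Python. The helper name enrichment_pvalue is hypothetical and this is not the goatools implementation, which, as far as we know, uses Fisher's exact test and additionally applies multiple-testing correction.

```python
from math import comb


def enrichment_pvalue(n_study_hits: int, n_study: int,
                      n_pop_hits: int, n_pop: int) -> float:
    """Hypothetical helper: P(X >= n_study_hits) for
    X ~ Hypergeometric(population n_pop, n_pop_hits annotated, n_study drawn).
    """
    # The number of annotated genes in the study set cannot exceed
    # either the study size or the number of annotated background genes.
    upper = min(n_study, n_pop_hits)
    total = comb(n_pop, n_study)
    # Sum the probabilities of all outcomes at least as extreme as observed.
    tail = sum(
        comb(n_pop_hits, k) * comb(n_pop - n_pop_hits, n_study - k)
        for k in range(n_study_hits, upper + 1)
    )
    return tail / total
```

For example, enrichment_pvalue(5, 20, 50, 10000) is the probability of seeing at least 5 annotated genes among 20 study genes when 50 of 10000 background genes carry the term.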

Parameters:
  • gene_index – The list of all possible genes.

  • gene_info_file – File containing a mapping from NCBI GeneIDs to gene symbols, e.g. downloaded from https://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz. If this is not available as a local file, it is treated as a URL and downloaded to working_directory if necessary (see below).

  • tax_id – The NCBI taxonomy ID to filter the gene_info_file for.

  • GO_obo_file – File containing the Gene Ontology data, e.g. downloaded from http://purl.obolibrary.org/obo/go/go-basic.obo. If this is not available as a local file, it is treated as a URL and downloaded to working_directory if necessary (see below).

  • gene2GO_file – File containing a mapping from NCBI GeneIDs to Gene Ontology terms, e.g. downloaded from https://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz. If this is not available as a local file, it is treated as a URL and downloaded to working_directory if necessary (see below).

  • working_directory – Directory in which downloaded files are cached. If a file of the same name already exists in this directory, it is not downloaded again.
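The file handling described for gene_info_file, GO_obo_file, gene2GO_file and working_directory can be sketched as follows. resolve_data_file is a hypothetical helper written for illustration, assuming the cached file name is simply the last path component of the URL; it is not TACCO's actual code.

```python
import os
import urllib.parse
import urllib.request


def resolve_data_file(path_or_url: str, working_directory: str = ".") -> str:
    """Hypothetical helper: return a local path for path_or_url,
    downloading it into working_directory at most once."""
    # An existing local file is used as-is.
    if os.path.isfile(path_or_url):
        return path_or_url
    # Otherwise treat the argument as a URL and cache it under its file name.
    file_name = os.path.basename(urllib.parse.urlparse(path_or_url).path)
    cached_path = os.path.join(working_directory, file_name)
    # A file of the same name in working_directory is reused, not re-downloaded.
    if not os.path.isfile(cached_path):
        os.makedirs(working_directory, exist_ok=True)
        urllib.request.urlretrieve(path_or_url, cached_path)
    return cached_path
```

Keying the cache on the file name alone keeps repeated runs offline and reproducible, at the cost that a stale file is never refreshed; deleting it from working_directory forces a new download.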

Returns:

Returns a GOEnrichmentStudyNS object (from the goatools go_enrichment_ns module) and a Series mapping gene symbols to NCBI GeneIDs. Both are needed to run the enrichment analyses. For convenience, they are also cached as global objects and used automatically.
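How a symbol-to-GeneID mapping can be derived from gene_info_file and tax_id is sketched below. This assumes the tab-separated NCBI gene_info layout (tax_id, GeneID and Symbol as the first three columns, header line starting with '#') and, as an additional assumption, restricts the result to symbols present in gene_index. The function name symbol_to_geneid is hypothetical, and it returns a plain dict where setup_goa_analysis returns a pandas Series.

```python
from typing import Dict, Iterable


def symbol_to_geneid(lines: Iterable[str], tax_id: int,
                     gene_index: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: map gene symbols to NCBI GeneIDs for one taxon."""
    wanted = set(gene_index)  # assumption: keep only symbols in gene_index
    mapping: Dict[str, int] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        # Keep only rows belonging to the requested taxonomy ID.
        if len(fields) < 3 or fields[0] != str(tax_id):
            continue
        gene_id, symbol = int(fields[1]), fields[2]
        # If a symbol occurs more than once, the first GeneID wins.
        if symbol in wanted and symbol not in mapping:
            mapping[symbol] = gene_id
    return mapping
```

For a gzipped download, the lines can be streamed with gzip.open(path, "rt") and passed directly to the function.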