IPTK.Classes package

Submodules

IPTK.Classes.Annotator module

The class provides methods for visualizing different aspects of the protein biology. This is achieved through three main methods:

1- add_segmented_track: which visualize information about non-overlapping protein substructures, for example, protein domains.

2- add_stacked_track: which visualize information about overlapping protein substructures, for example, splice variants.

3- add_marked_positions_track: which visualize or highlight positions in the protein, for example, sequence variants, or PTM.

The class also provides functions for visualizing the relationship between a protein and its eluted peptide/peptides in an analogous manner to the way NGS reads are aligned to genomic regions. This can be useful to identify regions in the protein with high/low number of eluted peptides, i.e.,Coverage. Also, to link it with other facests of the protein like domain organization,PTM, sequence/splice variants.

Notes

each figure should have a base track this can be done explicitly by calling the function add_base_track or by implicitly by calling the function add_coverage_plot with the parameter coverage_as_base=True.

class IPTK.Classes.Annotator.Annotator(protein_length: int, figure_size: Tuple[int, int], figure_dpi: int, face_color='white')

Bases: object

A high level API to plot information about the protein, for example, PTM, Splice variant etc, using matplotlib library

add_base_track(space_fraction: float = 0.3, protein_name_position: float = 0.5, track_label: str = 'base_track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, protein_name: str = 'A protein', protein_name_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 10}, rect_dict: Dict[str, Union[int, str]] = {'capstyle': 'butt', 'color': 'olive'}, number_ticks: int = 10, xticks_font_size: int = 4)

Adds a base track to the figure.

Parameters
  • space_fraction (float, optional) – A float between 0 and 1 that represent the fraction of space left below and above the track. The default is 0.3 which means that the track will be drown on a 40% while 60% are left as an empty space below and above the track.

  • protein_name_position (float, optional) – A float between 0 and 1 which control the relative position of the protein name on the y-axis. The default is 0.5.

  • track_label (string, optional) – The name on the track, which will be shown on the y-axis. The default is “base_track”.

  • track_label_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as dict that will be fed to the function axes.set_ylabel.The default is {“fontsize”:8,”color”:”black”}.

  • protein_name (string, optional) – The name of the protein to be printed to the track. The default is “A protein”.

  • protein_name_dict (Dict[str,Union[int,str]], optional) – the parameters that control the printing of the protein name, for example, the font size and the color. These parameters should be provided as dict that will be fed to the function axes.text(). The default is {“fontsize”:10,”color”:”black”}.

  • rect_dict (Dict[str,Union[int,str]], optional) – a dictionary that control the character of the track itself, for example, the color and the transparency. this dict will be fed to the function plt.Rectangle(). The default is {“color”:”olive”,”capstyle”:”butt”}.

  • number_ticks (int) – The number of ticks on the x-axis. The default is 10.

  • xticks_font_size (int) – The font size of the x-axis ticks. The default is 4.

Returns

Return type

None.

Examples

>>> example_1=VisTool(250,(3,5),300)
    # create a graph of size 3 inches by 5 inches with a 300 dots per
    # inch (DPI) as a resolution metric for a protein of length 250 amino acids
>>> example_1.add_base_track()
    # adds a basic track using the default parameters.
>>> example_1.add_base_track(space_fraction=0.1,
                            track_label="example_1",
                            track_label_dict={"fontsize":5,"color":"blue"}
                            number_ticks=5,
                            xticks_font_size=6)
    # generate a base track with 10% empty space above and below
    #  the track. Track will have the name example_1 and it will be
    # shown in font 5 instead of 8 and in blue color instead of black.
    # five ticks will be shown on the x-axis using a font of size 6.

Notes

calling the function more than once will result in an overriding of the previously added base track, for example, in the examples section calling add_base_track for the second time will overrides the graph build by the previous call.

add_coverage_track(coverage_matrix: numpy.ndarray, coverage_as_base: bool = False, coverage_dict: Dict[str, Union[int, str]] = {'color': 'blue', 'width': 1.2}, xlabel: str = 'positions', xlabel_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 6}, ylabel: str = 'coverage', ylabel_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 6}, number_ticks: int = 10, xticks_font_size: int = 4, yticks_font_size: int = 4)

Adds a coverage plot to the panel. The coverage plot shows the relationship between a peptide and its experimentally detected eluted peptide/peptides.

Parameters
  • coverage_matrix (np.ndarray) – A protein length by one array which summarize information about the protein and the eluted peptides.

  • coverage_as_base (bool, optional) – Whether or not to plot the coverage as a base track for the figure. The default is False which means that the track appended to a figure that have a default base track which can be constructed using the method add_base_track. However, if coverage_as_base is set to True, the function will draw the base track using the coverage matrix and calling the function add_base_track should be avoided.

  • coverage_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the coverage matrix, for example, the color. These parameters are fed to the function axes.bar. The default is {“color”:”blue”,”width”:1.2}.

  • xlabel (str, optional) – The label of the x-axis of the coverage track. The default is “positions”.

  • xlabel_dict (Dict[str,Union[int,str]], optional) – The parameters that control the x-label printing, for example, the color and/ the font size. these parameters are fed to the function axes.set_xlabel. The default is {“fontsize”:6,”color”:”black”}.

  • ylabel (str, optional) – The label of the y-axis of the coverage track. The default is “coverage”.

  • ylabel_dict (Dict[str,Union[int,str]], optional) – The parameters that control the x-label printing, for example, the color and/ the font size. these parameters are fed to the function axes.set_ylabel. The default is {“fontsize”:6,”color”:”black”}.

  • number_ticks (int, optional) – The number of ticks on the x-axis. The default is 10.

  • xticks_font_size (float, optional) – The font size of the x-axis ticks. The default is 4.

  • yticks_font_size (float, optional) – The font size of the y-axis ticks. The default is 4.

add_marked_positions_track(positions: List[int], height_frac: float = 0.5, marker_bar_dict: Dict[str, Union[int, str]] = {'color': 'black', 'linestyles': 'solid'}, marker_dict: Dict[str, Union[int, str]] = {'color': 'red', 's': 3}, track_label: str = 'A marked positions Track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, base_line_dict: Dict[str, Union[int, str]] = {'color': 'black', 'linewidth': 1})

The function adds a marked position to the track which is shown to highlight certain amino acid position within the protein, for example, a sequence variant position, or PTM position.

positionsList[int]

a list that contain the position/positions that should be heighlighted in the protein sequence.

height_fracfloat

the relative hight of the marked positions. The default is 0.5 which means that the hight of the marker will be 50% of the y-axis height.

marker_bar_dictDict[str,Union[int,str]], optional

The parameters of the marker position bar, for example, line width or color. These parameters are going to be fed to the function plt.hlines. The default is {“color”:”black”,”linestyles”:”solid”}.

marker_dictDict[str,Union[int,str]], optional

These are the parameters for the marker points which sits on top of the marker bar, for example, the color, the shape or the size. The default is {“color”:”red”,”s”:3}.

track_labelstr, optional

The name of the track, which will be shown on the y-axis. The default is “A marked positions Track”.

track_label_dictDict[str,Union[int,str]], optional

The parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as dict that will be fed to the function axes.set_ylabel.The default is {“fontsize”:8,”color”:”black”}.

base_line_dictDict[str,Union[int,str]], optional

The parameters that control the shape of the base line, for example, color and/or line width. These parameters are going to be fed to the function axes.hlines. The default is {“color”:”black”,”linewidth”:1}.

None.

>>> test_list=[24,26,75,124,220]
# first define a dict object that define some protein features.
>>> example_1=Annotator(protein_length=250, figure_size=(5,3), figure_dpi=200)
# creating a VisTool instance
>>> example_1.add_base_track()
# add a base_track
>>> example_1.add_marked_positions_track(test_list) # build a marked position track using the default parameters
# marked positions track
>>> example_1.add_marked_positions_track(positions=test_list,height_frac=0.75,
                                  track_label="Post_translational_modifications",
                                  marker_bar_dict={"color":"blue"})
# add a second marked position track with the following parameters:
#track name:  Post_translational_modifications
#hight of the maker bar = 75%
#color of the markerbar= blue

Any panel can have zero, one or more than one marked-position track. Thus, in the above examples calling the method add_marked_positions_track for the second time does NOT override the previous marked-position track it create a new one and added to the figure.

add_segmented_track(track_dict: Dict[str, Dict[str, Union[int, str]]], track_label: str = 'A segmented Track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, track_element_names_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, center_line_dict: Dict[str, Union[int, str, float]] = {'alpha': 0.5, 'linewidth': 0.5}, track_elements_dict: Dict[str, Union[int, str]] = {'capstyle': 'butt', 'color': 'brown'}, show_names: bool = True) None

Adds a segmentation track which show non-overlapping features of the protein.

Parameters
  • track_dict (Dict[str,Dict[str,Union[int,str]]]) –

    A dict that contain the non-overlapping features of the protein. The dict is assumed to have the following structure: a dict with the feature_index as a key and associated features as values. The associated features is a dict with the following three keys:

    1- Name: which contain the feature name

    2- startIdx: which contain the start position of the protein

    3- endIdx: which contain the end position of the protein

  • track_label (str, optional) – The name of the track, which will be shown on the y-axis. The default is “A segmented Track”.

  • track_label_dict (Dict[str,Union[int,str]], optional) – the parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as dict that will be fed to the function axes.set_ylabel. The default is {“fontsize”:8,”color”:”black”}.

  • track_element_names_dict (Dict[str,Union[int,str]], optional) – the parameters that control the printing of the feature names on the track, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.text. The default is {“fontsize”:8,”color”:”black”}.

  • center_line_dict (Dict[str,Union[int,str, float]], optional) – The parameters that control the printing of the center line of a segmented track object. The default is {“fontsize”:8,”color”:”black”}.

  • track_elements_dict (Dict[str,Union[int,str]], optional) – the parameters that control the printing of the feature rectangluar representation for example the color, the dict will be fed to the function plt.Rectangle. The default is {“color”:”brown”,”capstyle”:”butt”}.

  • show_names (bool, optional) – whether or not to show the name of the features. The default is True.

Returns

Return type

None.

Examples

>>> test_dict={"domain1":{"Name":"domain_one","startIdx":55,"endIdx":150},
               "domain2":{"Name":"domain_Two","startIdx":190,"endIdx":225}}
# first define a dict object that define some protein features.
>>> example_1=Annotator(protein_length=250, figure_size=(5,3), figure_dpi=200)
# creating a Annotator instance
>>> example_1.add_base_track()
# add a base_track
>>> example_1.add_segmented_track(test_dict) # build a segmented track using the default parameters
# add the segmented track
>>> example_1.add_segmented_track(track_dict=test_dict,
                                  track_label="Domains",
                                  track_elements_dict={"color":"brown"})
# add a second segmented track with track name set to Domains and elements
# of the track shown as brown rectangles.

Notes

Any panel can have one or more segmented-tracks. Thus, in the above examples calling the method add_segmented_track for the second time does NOT override the previous segmented track it create a new one and added to the figure.

add_stacked_track(track_dict: Dict[str, Dict[str, Union[int, str]]], track_label: str = 'A stacked Track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, track_element_names_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, track_elements_dict: Dict[str, Union[int, str]] = {'capstyle': 'butt', 'color': 'magenta'}, base_line_dict: Dict[str, Union[int, str]] = {'color': 'black', 'linewidth': 1}, show_names: bool = True)

The function adds a stacked_track to a visualization panel. The stacked track is used to show overlapping protein features, for example, different splice variants.

Parameters

track_dict (Dict[str,Dict[str,Union[int,str]]]) –

A dict that contain the overlapping features of the protein. The dict is assumed to have the following structure, a dict with the feature_index as a key and associated features as values. The associated features is a dict with the following three keys:

1- Name: which contain the feature’s name

2- startIdx: which contain the start position of the feature.

3- endIdx: which contain the end position of the feature.

track_labelstr, optional

The name of the track, which will be shown on the y-axis. The default is “A stacked Track”.

track_label_dictDict[str,Union[int,str]], optional

the parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as dict that will be fed to the function axes.set_ylabel.The default is {“fontsize”:8,”color”:”black”}.

track_element_names_dictDict[str,Union[int,str]], optional

the parameters that control the printing of the feature names on the track, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.text. The default is {“fontsize”:8,”color”:”black”}.

track_elements_dictDict[str,Union[int,str]], optional

the parameters that control the printing of the feature rectangluar representation for example the color, the dict will be fed to the function plt.Rectangle. The default is {“color”:”magenta”,”capstyle”:”butt”}.

base_line_dictDict[str,Union[int,str]], optional

the parameters that control the shape of the base line, for example, color and/or line width. These parameters are going to be fed to the function axes.hlines. The default is {“color”:”black”,”linewidth”:1}.

show_namesbool, optional

whether or not to show the name of the features. The default is True.

Returns

Return type

None.

Examples

>>> test_dict={"feature_1":{"Name":"X","startIdx":55,"endIdx":150},
               "feature_2":{"Name":"Y","startIdx":85,"endIdx":225},
               "feature_3":{"Name":"Z","startIdx":160,"endIdx":240}}
 # first define a dict object that define some protein features.
>>> example_1=Annotator(protein_length=250, figure_size=(5,3), figure_dpi=200)
# creating a Annotator instance
>>> example_1.add_base_track()
# add a base_track
>>> example_1.add_segmented_track(test_dict) # build a stacked track using the default parameters.
# add the stacked track
>>> example_1.add_segmented_track(track_dict=test_dict,
                                  track_label="OverLappingFeat",
                                  track_elements_dict={"color":"red"})
# add a second segmented track with track name set to OverLappingFeat and elements
# of the track shown as red rectangles.

Notes

Any panel can have zero, one or more than one stacked-track. Thus, in the above examples calling the method add_stacked_track for the second time does NOT override the previous stacked track it creates a new one and added to the figure.

get_figure() matplotlib.figure.Figure
Returns

The figure with all the tracks that have been added to it.

Return type

matplotlib.figure.Figure

save_fig(name: str, output_path: str = '.', format_: str = 'png', figure_dpi: str = 'same', figure_saving_dict: Dict[str, Union[int, str]] = {'facecolor': 'white'}) None

Write the constructed figure to the disk.

Parameters
  • name (str) – The name of the figure to save the file.

  • output_path (str , optional) – The path to write the output, by default the function write to the current working directory.

  • format (str, optional) – The output format, this parameter will be fed to the method plt.savefig. The default is “png”.

  • figure_dpi (int, optional) – The dpi of the saved figure. The deafult is same which means the figure will be saved using the same dpi used for creating the figure.

  • figure_saving_dict (Dict[str,Union[int,str]],optional) – The parameters that should be fed to the function plt.savefig. The default is figure_saving_dict={“facecolor”:”white”}

Returns

Return type

None.

IPTK.Classes.Database module

This submodule defines a collection of container classes that are used through the library

class IPTK.Classes.Database.CellularLocationDB(path2data: str = 'https://www.proteinatlas.org/download/subcellular_location.tsv.zip', sep: str = '\t')

Bases: object

The class provides an API to access the cellular location information from a database that follows the structure of the Human Proteome Atlas sub-cellular location database. See https://www.proteinatlas.org/about/download for more details.

add_to_database(genes_to_add: IPTK.Classes.Database.CellularLocationDB) None

adds the the location of more proteins to the database.

Parameters

genes_to_add (CellularLocationDB) – a CellularLocationDB instance containing the genes that shall be added to the database.

Raises
  • ValueError – if the genes_to_add to the database are already defined in the database

  • RuntimeError – incase any other error has been encountered while merging the tables.

get_approved_location(gene_id: Optional[str] = None, gene_name=None) List[str]

return the location of the provided gene id or gene name

Parameters
  • gene_id (str, optional) – the id of the gene of interest, defaults to None

  • gene_name ([type], optional) – the name of gene of interest, defaults to None

Raises
  • ValueError – if both gene_id and gene_name are None

  • KeyError – if gene_id is None and gene_name is not in the database

  • KeyError – if gene_name is None and gene_id is not in the database

  • RuntimeError – Incase an error was encountered while retriving the element from the database.

Returns

The approved location where the protein that corresponds to the provided name or id is located.

Return type

List[str]

get_gene_names() List[str]

return a list of all gene names in the dataset

Returns

the names of all genes in the database

Return type

List[str]

get_genes() List[str]

return a list of all gene ids in the dataset

Returns

all genes ids currently defined in the database

Return type

List[str]

get_go_names(gene_id: Optional[str] = None, gene_name=None) List[str]

return the location of the provided gene id or gene name

Parameters
  • gene_id (str, optional) – the id of the gene of interest , defaults to None

  • gene_name ([type], optional) – the name of the gene of interest , defaults to None

Raises
  • ValueError – if both gene_id and gene_name are None

  • KeyError – if gene_id is None and gene_name is not in the database

  • KeyError – if gene_name is None and gene_id is not in the database

  • RuntimeError – incase an error was encountered while retriving the element from the database.

Returns

The gene ontology, GO, location where the protein that corresponds to the provided name or id is located.

Return type

List[str]

get_main_location(gene_id: Optional[str] = None, corresponds=None) List[str]

Return the main location(s) of the provided gene id or gene name. If both gene Id and gene name are provided, gene_id has a higher precedence

Parameters
  • gene_id (str, optional) – The id of the gene of interest, defaults to None

  • gene_name ([type], optional) – The name of the gene of interest, defaults to None

Raises
  • ValueError – if both gene_id and gene_name are None

  • KeyError – if gene_id is None and gene_name is not in the database

  • KeyError – if gene_name is None and gene_id is not in the database

  • RuntimeError – Incase an error was encountered while retriving the element from the database

Returns

the main location where the protein that corresponds to the provided name or id is located.

Return type

List[str]

get_table() pandas.core.frame.DataFrame

return the instance table

Returns

the location table of the instance.

Return type

pd.DataFrame

class IPTK.Classes.Database.GeneExpressionDB(path2data: str = 'https://www.proteinatlas.org/download/rna_tissue_consensus.tsv.zip', sep: str = '\t')

Bases: object

The class provides an API to access gene expression data stored in table that follows the same structure as the Human proteome Atlas Normalized RNA Expression see https://www.proteinatlas.org/about/download for more details

get_expression(gene_name: Optional[str] = None, gene_id: Optional[str] = None) pandas.core.frame.DataFrame

Return a table summarizing the expression of the provided gene name or gene id accross different tissues.

Parameters
  • gene_id (str, optional) – the id of the gene of interest, defaults to None

  • gene_name ([type], optional) – the name of the gene of interest, defaults to None

Raises
  • ValueError – if both gene_id and gene_name are None

  • KeyError – if gene_id is None and gene_name is not in the database

  • KeyError – if gene_name is None and gene_id is not in the database

  • RuntimeError – incase an error was encountered while retriving the elements from the database

Returns

A table summarizing the expression of the provided gene accross all tissues in the database

Return type

pd.DataFrame

get_expression_in_tissue(tissue_name: str) pandas.core.frame.DataFrame

return the expression profile of the provided tissue

Parameters

tissue_name (str) – The name of the tissue

Raises
  • KeyError – Incase the provided tissue is not defined in the database

  • RuntimeError – In case an error was encountered while generating the expression profile.

Returns

A table summarizing the expression of all genes in the provided tissue.

Return type

pd.DataFrame

get_gene_names() List[str]

returns a list of the UNIQUE gene names currently in the database

Returns

A list of the UNIQUE gene names currently in the database

Return type

List[str]

get_genes() List[str]

returns a list of the UNIQUE gene ids currently in the database.

Returns

The list of the UNIQUE gene ids currently in the database

Return type

List[str]

get_table() pandas.core.frame.DataFrame

return a table containing the expression value of all the genes accross all tissues in the current instance

Returns

The expression of all genes accross all tissues in the database.

Return type

pd.DataFrame

get_tissues() List[str]

return a list of the tissues in the current database

Returns

A list containing the names of the UNIQUE tissues in the database.

Return type

List[str]

class IPTK.Classes.Database.OrganismDB(path2Fasta: str)

Bases: object

Extract information about the source organsim of a collection of protein sequencesfrom a fasta file and provides an API to query the results. The function expect the input fasta file to have headers written in the UNIPROT format.

get_number_protein_per_organism() pandas.core.frame.DataFrame

Provides a table containing the number of proteins per organism.

Returns

A table containing the number of proteins per organism

Return type

pd.DataFrame

get_org(prot_id: str) str

return the parent organism of the provided protein identifer

Parameters

prot_id (str) – the id of the protein of interest

Raises

KeyError – incase the provided identifier is not in the database

Returns

the name of the parent organism, i.e. the source organism.

Return type

str

get_unique_orgs() List[str]

Get the number of unique organisms in the database

Returns

a list of all unique organisms in the current instance

Return type

List[str]

class IPTK.Classes.Database.SeqDB(path2fasta: str)

Bases: object

Load a FASTA file and constructs a lock up dictionary where sequence ids are keys and sequences are values.

get_seq(protein_id: str) str

returns the corresponding sequence if the provided protein-id is defined in the database.

Parameters

protein_id (str) – The protein id to retrive its sequence, CASE SENSITIVE!!.

Raises

KeyError – If the provided protein does not exist in the database

Returns

the protein sequence

Return type

str

has_sequence(sequence_id: str) bool

check if the provided sequence id is an element of the database or not

Parameters

sequence_name (str) – The id of the sequence, CASE SENSITIVE!!.

Returns

True if the database has this id, False otherwise.

Return type

bool

IPTK.Classes.Experiment module

IPTK.Classes.ExperimentalSet module

IPTK.Classes.Features module

Parses the XML scheme of a uniprot protein and provides a python API for quering and accessing the results

class IPTK.Classes.Features.Features(uniprot_id: str, temp_dir: Optional[str] = None)

Bases: object

The class provides a template for the features associated with a protein. The following features are associated with the protein #signal peptide: dict

The range of the signal peptides, if the protein has no signal, for example, a globular cytologic protein. None is used as a default, placeholder value.

#chains:dict

the chains making up the mature protein, the protein should at least have one chain.

#domain: dict

the known domains in the protein, if no domain is defined, None is used.

#modification sites: nested dict

that contains information about the PTM sites, glycosylation site and disulfide bonds.

#sequence variances: dict

which contains information about the sequence variants of a protein structure.

#split variance: dict

which contain known splice variants

** Notes: Although disulfide bond is not a PTMs, it is being treated as a one here to simplify the workflow.

get_PTMs() Dict[str, Dict[str, Dict[str, Union[int, str]]]]
Returns

a nested dictionary that contains the PTMs found within the protein the PTMs are classified into three main categories:

1- Modifications: which is the generic case and contain information about any sequence modification beside disulfide bonds and glycosylation.

2- glycosylation: contains information about glycosylation sites

3- DisulfideBond: contains information about disulfide bond

Return type

Dict[str,Dict[str,Dict[str,Union[str,int]]]]

get_PTMs_glycosylation() Dict[str, Dict[str, Union[int, str]]]
Returns

The glycosylation sites found on the protein. If the protein has no glycosylation sites, the function returns None.

Return type

[type]

get_PTMs_modifications() Dict[str, Dict[str, Union[int, str]]]
Returns

The generic modifications found on the protein. If the protein has no PTM, the function returns None.

Return type

Dict[str,Dict[str,Union[str,int]]]

get_chains() Dict[Dict[str, Union[str, int]]]
Returns

A dictionary that contains the chains of the protein, if no chain is defined it return None

Return type

Dict[Dict[str,Union[str,int]]]

get_disulfide_bonds() Dict[str, Dict[str, Union[int, str]]]
Returns

The disulfide sites found on the protein. If the protein has no disulfide sites, the function returns None

Return type

[type]

get_domains() Dict[str, Dict[str, int]]
Returns

The domains defined in the protein sequence, if no domain is defined it returns None.

Return type

Dict[str, Dict[str, int]]

get_num_transmembrane_regions() int

Return the number of transmembrane regions on the protein

Returns

Return the number of transmembrane regions on the protein

Return type

int

get_number_PTMs() int
Returns

The number of PTMs the sequence has, this include di-sulfide bonds. See Note1 for more details. If the protein has no PTMs the function returns zero

Return type

int

get_number_chains() int
Returns

The number of chains in the protein. if no chain is defined it returns zero.

Return type

int

get_number_disulfide_bonds() int
Returns

The number of disulfide bonds the protein has, if the protein has no disulfide bonds, the function return zero.

Return type

int

get_number_domains() int
Returns

The number of domains a protein has, if no domain is defined it returns zero.

Return type

int

get_number_glycosylation_sites() int
Returns

The number of glycosylation_sites the protein has, if the protein has no glycosylation sites, the function returns zero

Return type

int

get_number_modifications() int
Returns

Returns the total number of generic modifications found on the protein. if no modification is found it return 0

Return type

int

get_number_sequence_variants() int
Returns

The number of sequence variants the protein has, if the protein has no sequence varient, the function returns 0.

Return type

int

get_number_splice_variants() int
Returns

The number of slice variants in the protein, if the protein has no splice variants, the function returns zero.

Return type

int

get_sequence_variants() Dict[str, Dict[str, Union[int, str]]]
Returns

A dict object that contains all sequence variants within a protein, if the protein has no sequence variants the function returns None.

Return type

Dict[str,Dict[str,Union[str,int]]]

get_signal_peptide_index() Tuple[int, int]
Returns

The Index of the signal-peptide in the protein, if not signal peptide is defined, it returns None

Return type

Tuple[int,int]

get_splice_variants() Dict[str, Dict[str, Union[int, str]]]
Returns

A dict object that contains the splice variants. If the protein has no splice variants the function returns None.

Return type

Dict[str,Dict[str,Union[str,int]]]

get_transmembrane_regions() List[Tuple[int, int]]

return a list containing the boundaries of transmembrane regions in the protein

Returns

a list containing the boundaries of transmembrane regions in the protein

Return type

List[Tuple[int,int]]

has_PTMs() bool

:return:True if the protein has a PTMs and False other wise :rtype: bool

has_chains() bool
Returns

True if the protein has/have chain/chains as feature and False otherwise.

Return type

[type]

has_disulfide_bond() bool
Returns

True is the protein has disulfide and False other wise

Return type

bool

has_domains() bool
Returns

True if the protein has a defined domain/domains, otherwise it return False.

Return type

bool

has_glycosylation_site() bool
Returns

True if the protein has a glycosylation site and False otherwise.

Return type

[type]

has_sequence_variants() bool
Returns

True if the protein has a sequence variants, and False otherwise.

Return type

bool

has_signal_peptide() bool
Returns

True if the protein has a signal peptide and False other wise.

Return type

bool

has_site_modifications() bool
Returns

True if the protein has a modification site and False otherwise

Return type

bool

has_splice_variants() bool
Returns

True if the sequence has a splice variants and False otherwise.

Return type

bool

has_transmembrane_domains() bool
Returns

True if the protein has transmembrane region and false otherwise

Return type

bool

summary() Dict[str, Union[int, str]]
Returns

The function return a dict object that summarizes the features of the protein.

Return type

Dict[str,Union[str,int]]

IPTK.Classes.HLAChain module

The implementation of an HLA molecules

class IPTK.Classes.HLAChain.HLAChain(name: str)

Bases: object

get_allele_group() str
Returns

The allele group

Return type

str

get_chain_class(gene_name: str) int
Parameters

gene_name (str) – the name of the gene

Returns

1 if the gene belongs to class one and 2 if it belong to class two

Return type

int

get_class() int
Returns

The HLA class

Return type

int

get_gene() str
Returns

The gene name

Return type

str

get_name() str
Returns

The chain name

Return type

str

get_protein_group() str
Returns

The protein name

Return type

str

IPTK.Classes.HLAMolecules module

a representation of an HLA molecules

class IPTK.Classes.HLAMolecules.HLAMolecule(**hla_chains)

Bases: object

get_allele_group() List[str]
Returns

The allele group for the instance chain/pair of chains

Return type

AlleleGroup

get_class() int
Returns

The class of the HLA molecules

Return type

int

get_gene() List[str]
Returns

return gene/pair of genes coding for the current HLA molecules

Return type

Genes

get_name(sep: str = ':') str
Parameters

sep (str, optional) – The name of the allele by concatenating the names of the individual chains using a separator, defaults to ‘:’

Returns

[description]

Return type

str

get_protein_group() List[str]
Returns

The protein group for the instance chain/pair of chains

Return type

ProteinGroup

IPTK.Classes.HLASet module

An abstraction for a collection of HLA alleles

class IPTK.Classes.HLASet.HLASet(hlas: List[str], gene_sep: str = ':')

Bases: object

get_alleles() List[str]
Returns

The current alleles in the set

Return type

int

get_class() int
Returns

The class of the HLA-alleles in the current instance

Return type

int

get_hla_count() int
Returns

The count of HLA molecules in the set

Return type

int

get_names() List[str]

Return a list of all HLA allele names defined in the set

Returns

[description]

Return type

List[str]

has_allele(allele: str) bool
Parameters

allele (str) – The name of the alleles to check for its occurrence in the instance.

Returns

True, if the provided allele is in the current instance, False otherwise.

Return type

bool

has_allele_group(allele_group: str) bool
Parameters

allele_group (str) – The allele group to search the set for

Returns

True, if at least one allele in the set belongs to the provided allele group, False otherwise.

Return type

bool

has_gene(gene_name: str) bool
Parameters

gene_name (str) – the gene name to search the set against.

Returns

True, if at least one of the alleles in the set belongs to the provided gene. False otherwise

Return type

bool

has_protein_group(protein_group: str) bool
Parameters

protein_group – The protein group to search the set for

Returns

True, if at least one allele in the set belongs to the provided protein group

Return type

bool

IPTK.Classes.Peptide module

IPTK.Classes.Proband module

A description for an IP proband

class IPTK.Classes.Proband.Proband(**info)

Bases: object

get_meta_data() dict
Returns

A dict containing all the meta-data about the proband

Return type

dict

get_name() str
Returns

The name of the proband

Return type

str

update_info(**info) None

Add new or update existing info about the patient using an arbitrary number of key-value pairs to be added to the instance meta-info dict

IPTK.Classes.Protein module

IPTK.Classes.Tissue module

A representation of the Tissue used in an IP Experiment.

class IPTK.Classes.Tissue.ExpressionProfile(name: str, expression_table: pandas.core.frame.DataFrame, aux_proteins: Optional[pandas.core.frame.DataFrame] = None)

Bases: object

a representation of tissue reference expression value.

get_gene_id_expression(gene_id: str) float
Parameters

gene_id (str) – the gene id to retrive its expression value from the database

Raises

KeyError – if the provided id is not defined in the instance table

Returns

the expression value of the provided gene id.

Return type

float

get_gene_name_expression(gene_name: str) float
Parameters

gene_name (str) – the gene name to retrive its expression value from the database

Raises

KeyError – if the provided id is not defined in the instance table

Returns

the expression value of the provided gene name.

Return type

float

get_name() str
Returns

the name of the tissue where the expression profile was obtained

Return type

str

get_table() pandas.core.frame.DataFrame
Returns

return a table that contain the expression of all the transcripts in the current profile including core and auxiliary proteins

Return type

pd.DataFrame

class IPTK.Classes.Tissue.Tissue(name: str, main_exp_value: IPTK.Classes.Database.GeneExpressionDB, main_location: IPTK.Classes.Database.CellularLocationDB, aux_exp_value: Optional[IPTK.Classes.Database.GeneExpressionDB] = None, aux_location: Optional[IPTK.Classes.Database.CellularLocationDB] = None)

Bases: object

get_expression_profile() IPTK.Classes.Tissue.ExpressionProfile
Returns

the expresion profile of the current tissue

Return type

ExpressionProfile

get_name() str
Returns

the name of the tissue

Return type

str

get_subCellular_locations() IPTK.Classes.Database.CellularLocationDB
Returns

the sub-cellular localization of all the proteins stored in current instance resources.

Return type

CellularLocationDB

Module contents