IPTK.Classes package¶

Submodules¶

IPTK.Classes.Annotator module¶

The class provides methods for visualizing different aspects of protein biology. This is achieved through three main methods:

1- add_segmented_track: visualizes information about non-overlapping protein substructures, for example, protein domains.

2- add_stacked_track: visualizes information about overlapping protein substructures, for example, splice variants.

3- add_marked_positions_track: visualizes or highlights positions in the protein, for example, sequence variants or PTMs.

The class also provides functions for visualizing the relationship between a protein and its eluted peptide(s), analogous to the way NGS reads are aligned to genomic regions. This is useful for identifying regions of the protein with a high or low number of eluted peptides, i.e., coverage, and for linking coverage with other facets of the protein such as domain organization, PTMs, and sequence or splice variants.

Notes

Each figure should have a base track. This can be done explicitly by calling the function add_base_track, or implicitly by calling the function add_coverage_track with the parameter coverage_as_base=True.

class IPTK.Classes.Annotator.Annotator(protein_length: int, figure_size: Tuple[int, int], figure_dpi: int, face_color='white')¶

Bases: object

A high-level API for plotting information about a protein, for example, PTMs and splice variants, using the matplotlib library.

add_base_track(space_fraction: float = 0.3, protein_name_position: float = 0.5, track_label: str = 'base_track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, protein_name: str = 'A protein', protein_name_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 10}, rect_dict: Dict[str, Union[int, str]] = {'capstyle': 'butt', 'color': 'olive'}, number_ticks: int = 10, xticks_font_size: int = 4)¶

Adds a base track to the figure.

Parameters

space_fraction (float, optional) – A float between 0 and 1 that represents the fraction of space left below and above the track. The default is 0.3, which means that the track is drawn on 40% of the axis while 60% is left as empty space below and above the track.
protein_name_position (float, optional) – A float between 0 and 1 that controls the relative position of the protein name on the y-axis. The default is 0.5.
track_label (str, optional) – The name of the track, which will be shown on the y-axis. The default is "base_track".
track_label_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.set_ylabel. The default is {"fontsize": 8, "color": "black"}.
protein_name (str, optional) – The name of the protein to be printed on the track. The default is "A protein".
protein_name_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the protein name, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.text(). The default is {"fontsize": 10, "color": "black"}.
rect_dict (Dict[str,Union[int,str]], optional) – A dictionary that controls the appearance of the track itself, for example, the color and the transparency. This dict will be fed to the function plt.Rectangle(). The default is {"color": "olive", "capstyle": "butt"}.
number_ticks (int, optional) – The number of ticks on the x-axis. The default is 10.
xticks_font_size (int, optional) – The font size of the x-axis ticks. The default is 4.
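
The arithmetic behind space_fraction can be checked with a short sketch. The helper name track_height_fraction is hypothetical and not part of IPTK; it assumes, as the description above suggests, that the same fraction is reserved below and above the track.

```python
def track_height_fraction(space_fraction: float) -> float:
    """Return the fraction of the y-axis left for the track itself.

    Hypothetical helper (not part of IPTK): space_fraction is assumed to be
    reserved both below and above the track, so twice that amount is empty.
    """
    if not 0 <= space_fraction < 0.5:
        raise ValueError("space_fraction must lie in [0, 0.5) for a visible track")
    return 1 - 2 * space_fraction


# The default of 0.3 leaves roughly 40% of the axis for the track.
print(round(track_height_fraction(0.3), 2))
```

Under this reading, values of 0.5 or more would leave no room for the track, which is why the sketch rejects them.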

Returns

Return type

None.

Examples

>>> example_1=Annotator(250,(3,5),300)
    # create a figure of size 3 inches by 5 inches with a resolution of 300 dots per
    # inch (DPI) for a protein of length 250 amino acids

>>> example_1.add_base_track()
    # adds a base track using the default parameters.

>>> example_1.add_base_track(space_fraction=0.1,
                            track_label="example_1",
                            track_label_dict={"fontsize":5,"color":"blue"},
                            number_ticks=5,
                            xticks_font_size=6)
    # generates a base track with 10% empty space above and below
    # the track. The track is named example_1, and the name is
    # shown in font size 5 instead of 8 and in blue instead of black.
    # Five ticks are shown on the x-axis using a font of size 6.

Notes

Calling the function more than once overrides the previously added base track. For example, in the Examples section, calling add_base_track for the second time overrides the graph built by the previous call.

add_coverage_track(coverage_matrix: numpy.ndarray, coverage_as_base: bool = False, coverage_dict: Dict[str, Union[int, str]] = {'color': 'blue', 'width': 1.2}, xlabel: str = 'positions', xlabel_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 6}, ylabel: str = 'coverage', ylabel_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 6}, number_ticks: int = 10, xticks_font_size: int = 4, yticks_font_size: int = 4)¶

Adds a coverage plot to the panel. The coverage plot shows the relationship between a protein and its experimentally detected eluted peptide(s).

Parameters

coverage_matrix (np.ndarray) – A protein-length-by-one array that summarizes information about the protein and its eluted peptides.
coverage_as_base (bool, optional) – Whether or not to plot the coverage as the base track of the figure. The default is False, which means that the coverage track is appended to a figure that already has a base track, which can be constructed using the method add_base_track. If coverage_as_base is set to True, the function draws the base track using the coverage matrix, and calling the function add_base_track should be avoided.
coverage_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the coverage matrix, for example, the color. These parameters are fed to the function axes.bar. The default is {"color": "blue", "width": 1.2}.
xlabel (str, optional) – The label of the x-axis of the coverage track. The default is "positions".
xlabel_dict (Dict[str,Union[int,str]], optional) – The parameters that control the x-label printing, for example, the color and/or the font size. These parameters are fed to the function axes.set_xlabel. The default is {"fontsize": 6, "color": "black"}.
ylabel (str, optional) – The label of the y-axis of the coverage track. The default is "coverage".
ylabel_dict (Dict[str,Union[int,str]], optional) – The parameters that control the y-label printing, for example, the color and/or the font size. These parameters are fed to the function axes.set_ylabel. The default is {"fontsize": 6, "color": "black"}.
number_ticks (int, optional) – The number of ticks on the x-axis. The default is 10.
xticks_font_size (int, optional) – The font size of the x-axis ticks. The default is 4.
yticks_font_size (int, optional) – The font size of the y-axis ticks. The default is 4.
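
To make the idea of a coverage array concrete, the following sketch counts how many peptides cover each position of a protein. The function compute_coverage and the 0-based, end-exclusive span convention are assumptions made for illustration; IPTK may compute its coverage matrix differently.

```python
from typing import List, Tuple


def compute_coverage(protein_length: int,
                     peptide_spans: List[Tuple[int, int]]) -> List[int]:
    """Count, for every protein position, how many peptides cover it.

    Hypothetical helper: spans are 0-based (start, end) pairs with an
    exclusive end, which may differ from the convention IPTK uses.
    """
    coverage = [0] * protein_length
    for start, end in peptide_spans:
        if not 0 <= start < end <= protein_length:
            raise ValueError(f"span ({start}, {end}) is outside the protein")
        for position in range(start, end):
            coverage[position] += 1
    return coverage


# Three toy peptides mapped onto a protein of length 10.
coverage = compute_coverage(10, [(0, 4), (2, 6), (5, 9)])
print(coverage)
```

A list like this could be turned into a protein-length-by-one array with numpy.asarray(coverage).reshape(-1, 1) before being passed as coverage_matrix.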

add_marked_positions_track(positions: List[int], height_frac: float = 0.5, marker_bar_dict: Dict[str, Union[int, str]] = {'color': 'black', 'linestyles': 'solid'}, marker_dict: Dict[str, Union[int, str]] = {'color': 'red', 's': 3}, track_label: str = 'A marked positions Track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, base_line_dict: Dict[str, Union[int, str]] = {'color': 'black', 'linewidth': 1})¶

Adds a marked-positions track to the figure. The track highlights specific amino acid positions within the protein, for example, a sequence variant position or a PTM position.

Parameters

positions (List[int]) – A list that contains the position(s) that should be highlighted in the protein sequence.
height_frac (float, optional) – The relative height of the marked positions. The default is 0.5, which means that the height of the marker will be 50% of the y-axis height.
marker_bar_dict (Dict[str,Union[int,str]], optional) – The parameters of the marker position bar, for example, line width or color. These parameters are fed to the function plt.hlines. The default is {"color": "black", "linestyles": "solid"}.
marker_dict (Dict[str,Union[int,str]], optional) – The parameters for the marker points that sit on top of the marker bar, for example, the color, the shape or the size. The default is {"color": "red", "s": 3}.
track_label (str, optional) – The name of the track, which will be shown on the y-axis. The default is "A marked positions Track".
track_label_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.set_ylabel. The default is {"fontsize": 8, "color": "black"}.
base_line_dict (Dict[str,Union[int,str]], optional) – The parameters that control the shape of the base line, for example, color and/or line width. These parameters are fed to the function axes.hlines. The default is {"color": "black", "linewidth": 1}.

Returns

Return type

None.

Examples

>>> test_list=[24,26,75,124,220]
# first define a list that contains the positions to highlight.

>>> example_1=Annotator(protein_length=250, figure_size=(5,3), figure_dpi=200)
# create an Annotator instance

>>> example_1.add_base_track()
# add a base track

>>> example_1.add_marked_positions_track(test_list) # build a marked-positions track using the default parameters
# add the marked-positions track

>>> example_1.add_marked_positions_track(positions=test_list,height_frac=0.75,
                                  track_label="Post_translational_modifications",
                                  marker_bar_dict={"color":"blue"})
# add a second marked-positions track with the following parameters:
# track name: Post_translational_modifications
# height of the marker bar = 75%
# color of the marker bar = blue

Notes

Any panel can have zero, one, or more than one marked-positions track. Thus, in the examples above, calling the method add_marked_positions_track for the second time does NOT override the previous marked-positions track; it creates a new one and adds it to the figure.

add_segmented_track(track_dict: Dict[str, Dict[str, Union[int, str]]], track_label: str = 'A segmented Track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, track_element_names_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, center_line_dict: Dict[str, Union[int, str, float]] = {'alpha': 0.5, 'linewidth': 0.5}, track_elements_dict: Dict[str, Union[int, str]] = {'capstyle': 'butt', 'color': 'brown'}, show_names: bool = True) → None¶

Adds a segmented track, which shows non-overlapping features of the protein.

Parameters

track_dict (Dict[str,Dict[str,Union[int,str]]]) –
A dict that contains the non-overlapping features of the protein. The dict is assumed to have the following structure: the feature index is the key and the associated feature is the value. Each associated feature is a dict with the following three keys:

1- Name: contains the feature name

2- startIdx: contains the start position of the feature

3- endIdx: contains the end position of the feature
track_label (str, optional) – The name of the track, which will be shown on the y-axis. The default is "A segmented Track".
track_label_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.set_ylabel. The default is {"fontsize": 8, "color": "black"}.
track_element_names_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the feature names on the track, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.text. The default is {"fontsize": 8, "color": "black"}.
center_line_dict (Dict[str,Union[int,str,float]], optional) – The parameters that control the printing of the center line of a segmented track object. The default is {"alpha": 0.5, "linewidth": 0.5}.
track_elements_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the rectangular representation of the features, for example, the color. The dict will be fed to the function plt.Rectangle. The default is {"color": "brown", "capstyle": "butt"}.
show_names (bool, optional) – Whether or not to show the names of the features. The default is True.
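
Because a segmented track expects non-overlapping features, it can help to check a track_dict before plotting. The sketch below is a hypothetical pre-flight check, not an IPTK function; it assumes the three keys listed above and treats start and end positions as inclusive.

```python
from typing import Dict, Union

Feature = Dict[str, Union[int, str]]


def check_segmented_track(track_dict: Dict[str, Feature]) -> None:
    """Raise ValueError if a feature misses a key or two features overlap.

    Hypothetical check, not part of IPTK. Positions are treated as
    inclusive, which is an assumption.
    """
    required = {"Name", "startIdx", "endIdx"}
    for key, feature in track_dict.items():
        missing = required - set(feature)
        if missing:
            raise ValueError(f"{key} is missing {sorted(missing)}")
    # After sorting by start, only neighbouring features can overlap.
    ordered = sorted(track_dict.values(), key=lambda f: f["startIdx"])
    for left, right in zip(ordered, ordered[1:]):
        if right["startIdx"] <= left["endIdx"]:
            raise ValueError(f"{left['Name']} overlaps {right['Name']}")


domains = {
    "domain1": {"Name": "domain_one", "startIdx": 55, "endIdx": 150},
    "domain2": {"Name": "domain_Two", "startIdx": 190, "endIdx": 225},
}
check_segmented_track(domains)  # passes silently: the domains do not overlap
```

Features that do overlap are better suited to add_stacked_track.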

Returns

Return type

None.

Examples

>>> test_dict={"domain1":{"Name":"domain_one","startIdx":55,"endIdx":150},
               "domain2":{"Name":"domain_Two","startIdx":190,"endIdx":225}}
# first define a dict object that defines some protein features.

>>> example_1=Annotator(protein_length=250, figure_size=(5,3), figure_dpi=200)
# create an Annotator instance

>>> example_1.add_base_track()
# add a base track

>>> example_1.add_segmented_track(test_dict) # build a segmented track using the default parameters
# add the segmented track

>>> example_1.add_segmented_track(track_dict=test_dict,
                                  track_label="Domains",
                                  track_elements_dict={"color":"brown"})
# add a second segmented track with the track name set to Domains and the elements
# of the track shown as brown rectangles.

Notes

Any panel can have one or more segmented tracks. Thus, in the examples above, calling the method add_segmented_track for the second time does NOT override the previous segmented track; it creates a new one and adds it to the figure.

add_stacked_track(track_dict: Dict[str, Dict[str, Union[int, str]]], track_label: str = 'A stacked Track', track_label_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, track_element_names_dict: Dict[str, Union[int, str]] = {'color': 'black', 'fontsize': 8}, track_elements_dict: Dict[str, Union[int, str]] = {'capstyle': 'butt', 'color': 'magenta'}, base_line_dict: Dict[str, Union[int, str]] = {'color': 'black', 'linewidth': 1}, show_names: bool = True)¶

Adds a stacked track to a visualization panel. The stacked track is used to show overlapping protein features, for example, different splice variants.

Parameters

track_dict (Dict[str,Dict[str,Union[int,str]]]) –
A dict that contains the overlapping features of the protein. The dict is assumed to have the following structure: the feature index is the key and the associated feature is the value. Each associated feature is a dict with the following three keys:

1- Name: contains the feature's name

2- startIdx: contains the start position of the feature

3- endIdx: contains the end position of the feature
track_label (str, optional) – The name of the track, which will be shown on the y-axis. The default is "A stacked Track".
track_label_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the track_label, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.set_ylabel. The default is {"fontsize": 8, "color": "black"}.
track_element_names_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the feature names on the track, for example, the font size and the color. These parameters should be provided as a dict that will be fed to the function axes.text. The default is {"fontsize": 8, "color": "black"}.
track_elements_dict (Dict[str,Union[int,str]], optional) – The parameters that control the printing of the rectangular representation of the features, for example, the color. The dict will be fed to the function plt.Rectangle. The default is {"color": "magenta", "capstyle": "butt"}.
base_line_dict (Dict[str,Union[int,str]], optional) – The parameters that control the shape of the base line, for example, color and/or line width. These parameters are fed to the function axes.hlines. The default is {"color": "black", "linewidth": 1}.
show_names (bool, optional) – Whether or not to show the names of the features. The default is True.

Returns

Return type

None.
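
One way to picture how overlapping features can share a panel is a greedy row assignment: each feature goes into the lowest row where it does not collide with an earlier feature. The sketch below illustrates that idea only; the function assign_rows is hypothetical and the layout IPTK actually uses may differ.

```python
from typing import Dict, List, Union

Feature = Dict[str, Union[int, str]]


def assign_rows(track_dict: Dict[str, Feature]) -> Dict[str, int]:
    """Map each feature name to the lowest row where it fits.

    Hypothetical greedy layout, not taken from IPTK: features are
    visited by start position and endpoints are treated as inclusive.
    """
    row_ends: List[int] = []  # last occupied position of every row
    rows: Dict[str, int] = {}
    for feature in sorted(track_dict.values(), key=lambda f: f["startIdx"]):
        for row, end in enumerate(row_ends):
            if feature["startIdx"] > end:
                row_ends[row] = feature["endIdx"]
                rows[feature["Name"]] = row
                break
        else:  # no existing row is free, so open a new one
            rows[feature["Name"]] = len(row_ends)
            row_ends.append(feature["endIdx"])
    return rows


features = {
    "feature_1": {"Name": "X", "startIdx": 55, "endIdx": 150},
    "feature_2": {"Name": "Y", "startIdx": 85, "endIdx": 225},
    "feature_3": {"Name": "Z", "startIdx": 160, "endIdx": 240},
}
print(assign_rows(features))
```

With these toy features, X and Z end up in the same row because Z starts after X ends, while Y overlaps both and gets a row of its own.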

Examples

>>> test_dict={"feature_1":{"Name":"X","startIdx":55,"endIdx":150},
               "feature_2":{"Name":"Y","startIdx":85,"endIdx":225},
               "feature_3":{"Name":"Z","startIdx":160,"endIdx":240}}
# first define a dict object that defines some protein features.

>>> example_1=Annotator(protein_length=250, figure_size=(5,3), figure_dpi=200)
# create an Annotator instance

>>> example_1.add_base_track()
# add a base track

>>> example_1.add_stacked_track(test_dict) # build a stacked track using the default parameters.
# add the stacked track

>>> example_1.add_stacked_track(track_dict=test_dict,
                                track_label="OverLappingFeat",
                                track_elements_dict={"color":"red"})
# add a second stacked track with the track name set to OverLappingFeat and the elements
# of the track shown as red rectangles.

Notes

Any panel can have zero, one, or more than one stacked track. Thus, in the examples above, calling the method add_stacked_track for the second time does NOT override the previous stacked track; it creates a new one and adds it to the figure.

get_figure() → matplotlib.figure.Figure¶

Returns: The figure with all the tracks that have been added to it.
Return type: matplotlib.figure.Figure

save_fig(name: str, output_path: str = '.', format_: str = 'png', figure_dpi: str = 'same', figure_saving_dict: Dict[str, Union[int, str]] = {'facecolor': 'white'}) → None¶

Writes the constructed figure to disk.

Parameters

name (str) – The name of the file to which the figure is saved.
output_path (str, optional) – The path to write the output to. By default, the function writes to the current working directory.
format_ (str, optional) – The output format. This parameter will be fed to the method plt.savefig. The default is "png".
figure_dpi (int or str, optional) – The DPI of the saved figure. The default is "same", which means the figure will be saved using the same DPI that was used to create it.
figure_saving_dict (Dict[str,Union[int,str]], optional) – The parameters that should be fed to the function plt.savefig. The default is {"facecolor": "white"}.

Returns

Return type

None.

IPTK.Classes.Database module¶

This submodule defines a collection of container classes that are used throughout the library.

class IPTK.Classes.Database.CellularLocationDB(path2data: str = 'https://www.proteinatlas.org/download/subcellular_location.tsv.zip', sep: str = '\t')¶

Bases: object

The class provides an API to access cellular location information from a database that follows the structure of the Human Protein Atlas sub-cellular location database. See https://www.proteinatlas.org/about/download for more details.

add_to_database(genes_to_add: IPTK.Classes.Database.CellularLocationDB) → None¶

Adds the locations of more proteins to the database.

Parameters

genes_to_add (CellularLocationDB) – A CellularLocationDB instance containing the genes that shall be added to the database.

Raises

ValueError – If the genes in genes_to_add are already defined in the database.
RuntimeError – In case any other error is encountered while merging the tables.

get_approved_location(gene_id: Optional[str] = None, gene_name=None) → List[str]¶

Return the approved location of the provided gene id or gene name.

Parameters

gene_id (str, optional) – The id of the gene of interest, defaults to None
gene_name (str, optional) – The name of the gene of interest, defaults to None

Raises

ValueError – If both gene_id and gene_name are None
KeyError – If gene_id is None and gene_name is not in the database
KeyError – If gene_name is None and gene_id is not in the database
RuntimeError – In case an error was encountered while retrieving the element from the database.

Returns

The approved location where the protein that corresponds to the provided name or id is located.

Return type

List[str]

get_gene_names() → List[str]¶

return a list of all gene names in the dataset

Returns: the names of all genes in the database
Return type: List[str]

get_genes() → List[str]¶

return a list of all gene ids in the dataset

Returns: all gene ids currently defined in the database
Return type: List[str]

get_go_names(gene_id: Optional[str] = None, gene_name=None) → List[str]¶

Return the gene ontology (GO) location names of the provided gene id or gene name.

Parameters

gene_id (str, optional) – The id of the gene of interest, defaults to None
gene_name (str, optional) – The name of the gene of interest, defaults to None

Raises

ValueError – If both gene_id and gene_name are None
KeyError – If gene_id is None and gene_name is not in the database
KeyError – If gene_name is None and gene_id is not in the database
RuntimeError – In case an error was encountered while retrieving the element from the database.

Returns

The gene ontology (GO) location where the protein that corresponds to the provided name or id is located.

Return type

List[str]

get_main_location(gene_id: Optional[str] = None, gene_name=None) → List[str]¶

Return the main location(s) of the provided gene id or gene name. If both gene_id and gene_name are provided, gene_id takes precedence.

Parameters

gene_id (str, optional) – The id of the gene of interest, defaults to None
gene_name (str, optional) – The name of the gene of interest, defaults to None

Raises

ValueError – If both gene_id and gene_name are None
KeyError – If gene_id is None and gene_name is not in the database
KeyError – If gene_name is None and gene_id is not in the database
RuntimeError – In case an error was encountered while retrieving the element from the database

Returns

The main location where the protein that corresponds to the provided name or id is located.

Return type

List[str]
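
The lookup rules described above (gene_id takes precedence, ValueError when neither argument is given, KeyError when the gene is unknown) can be sketched with plain dictionaries. Everything in this block is illustrative: the function main_location, the two toy dictionaries and their contents are assumptions, whereas the real class is backed by a table.

```python
from typing import Dict, List, Optional

# Toy records; the identifiers and locations are illustrative only.
BY_ID: Dict[str, List[str]] = {"ENSG_TOY_1": ["Nucleoplasm"]}
BY_NAME: Dict[str, List[str]] = {"TOY1": ["Nucleoplasm"]}


def main_location(gene_id: Optional[str] = None,
                  gene_name: Optional[str] = None) -> List[str]:
    """Look up a location, preferring gene_id when both are given."""
    if gene_id is None and gene_name is None:
        raise ValueError("either gene_id or gene_name must be provided")
    if gene_id is not None:
        if gene_id not in BY_ID:
            raise KeyError(f"gene id {gene_id!r} is not in the database")
        return BY_ID[gene_id]
    if gene_name not in BY_NAME:
        raise KeyError(f"gene name {gene_name!r} is not in the database")
    return BY_NAME[gene_name]


print(main_location(gene_name="TOY1"))
```

In this sketch, a call that passes both arguments never consults gene_name, which mirrors the precedence rule stated in the method description.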

get_table() → pandas.core.frame.DataFrame¶

return the instance table

Returns: the location table of the instance.
Return type: pd.DataFrame

class IPTK.Classes.Database.GeneExpressionDB(path2data: str = 'https://www.proteinatlas.org/download/rna_tissue_consensus.tsv.zip', sep: str = '\t')¶

Bases: object

The class provides an API to access gene expression data stored in a table that follows the same structure as the Human Protein Atlas normalized RNA expression table. See https://www.proteinatlas.org/about/download for more details.

get_expression(gene_name: Optional[str] = None, gene_id: Optional[str] = None) → pandas.core.frame.DataFrame¶

Return a table summarizing the expression of the provided gene name or gene id across different tissues.

Parameters

gene_id (str, optional) – The id of the gene of interest, defaults to None
gene_name (str, optional) – The name of the gene of interest, defaults to None

Raises

ValueError – If both gene_id and gene_name are None
KeyError – If gene_id is None and gene_name is not in the database
KeyError – If gene_name is None and gene_id is not in the database
RuntimeError – In case an error was encountered while retrieving the elements from the database

Returns

A table summarizing the expression of the provided gene across all tissues in the database

Return type

pd.DataFrame

get_expression_in_tissue(tissue_name: str) → pandas.core.frame.DataFrame¶

Return the expression profile of the provided tissue.

Parameters

tissue_name (str) – The name of the tissue

Raises

KeyError – In case the provided tissue is not defined in the database
RuntimeError – In case an error was encountered while generating the expression profile.

Returns

A table summarizing the expression of all genes in the provided tissue.

Return type

pd.DataFrame

get_gene_names() → List[str]¶

returns a list of the UNIQUE gene names currently in the database

Returns: A list of the UNIQUE gene names currently in the database
Return type: List[str]

get_genes() → List[str]¶

returns a list of the UNIQUE gene ids currently in the database.

Returns: The list of the UNIQUE gene ids currently in the database
Return type: List[str]

get_table() → pandas.core.frame.DataFrame¶

Return a table containing the expression values of all genes across all tissues in the current instance.

Returns: The expression of all genes across all tissues in the database.
Return type: pd.DataFrame

get_tissues() → List[str]¶

return a list of the tissues in the current database

Returns: A list containing the names of the UNIQUE tissues in the database.
Return type: List[str]

class IPTK.Classes.Database.OrganismDB(path2Fasta: str)¶

Bases: object

Extracts information about the source organism of a collection of protein sequences from a FASTA file and provides an API to query the results. The class expects the input FASTA file to have headers written in the UniProt format.
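
UniProt-style FASTA headers carry the organism name in an OS= field, which is what makes this kind of extraction possible. The sketch below shows one way to pull the accession and the organism out of such a header; the helper parse_header and its regular expression are assumptions for illustration and are not taken from IPTK.

```python
import re
from typing import Tuple

# OS=<organism> runs until the next " XX=" field or the end of the line.
_ORGANISM = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")


def parse_header(header: str) -> Tuple[str, str]:
    """Return (accession, organism) from a UniProt-style FASTA header.

    Hypothetical helper; IPTK's own parsing may differ.
    """
    fields = header.lstrip(">").split("|")
    if len(fields) < 3:
        raise ValueError("header is not in the db|accession|name layout")
    match = _ORGANISM.search(header)
    if match is None:
        raise ValueError("header has no OS= field")
    return fields[1], match.group(1)


header = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
          "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
print(parse_header(header))
```

Counting the organisms returned for every header in a file would give a per-organism protein count similar in spirit to get_number_protein_per_organism.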

get_number_protein_per_organism() → pandas.core.frame.DataFrame¶

Provides a table containing the number of proteins per organism.

Returns: A table containing the number of proteins per organism
Return type: pd.DataFrame

get_org(prot_id: str) → str¶

Return the parent organism of the provided protein identifier.

Parameters: prot_id (str) – The id of the protein of interest
Raises: KeyError – In case the provided identifier is not in the database
Returns: The name of the parent organism, i.e. the source organism.
Return type: str

get_unique_orgs() → List[str]¶

Get the unique organisms in the database.

Returns: a list of all unique organisms in the current instance
Return type: List[str]

class IPTK.Classes.Database.SeqDB(path2fasta: str)¶

Bases: object

Loads a FASTA file and constructs a lookup dictionary where sequence ids are keys and sequences are values.
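
A lookup dictionary of this kind can be sketched in a few lines of plain Python. The function build_lookup is hypothetical, and the rule it uses for choosing the id (the accession between the first two '|' characters when present, otherwise the first header token) is an assumption rather than documented IPTK behavior.

```python
from typing import Dict, Iterable, Optional


def build_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Build an {id: sequence} dictionary from FASTA-formatted lines.

    Hypothetical sketch: the id is the accession between the first two
    '|' characters when present, otherwise the first header token.
    """
    lookup: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            token = line[1:].split()[0]
            parts = token.split("|")
            current = parts[1] if len(parts) >= 3 else token
            lookup[current] = ""
        elif current is not None:
            lookup[current] += line  # sequences may span several lines
    return lookup


fasta = """>sp|TOY001|TOY1_HUMAN Toy protein one
MKTAYIAK
QRQISFVK
>toy2 a second toy record
MSHHWGYG
"""
db = build_lookup(fasta.splitlines())
print(db["TOY001"])
```

Because the keys are ordinary strings, lookups in such a dictionary are case sensitive, which matches the behavior noted for get_seq and has_sequence.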

get_seq(protein_id: str) → str¶

Returns the corresponding sequence if the provided protein id is defined in the database.

Parameters: protein_id (str) – The id of the protein whose sequence should be retrieved, CASE SENSITIVE!!.
Raises: KeyError – If the provided protein does not exist in the database
Returns: the protein sequence
Return type: str

has_sequence(sequence_id: str) → bool¶

Check whether the provided sequence id is an element of the database.

Parameters: sequence_id (str) – The id of the sequence, CASE SENSITIVE!!.
Returns: True if the database has this id, False otherwise.
Return type: bool

IPTK.Classes.Experiment module¶

IPTK.Classes.ExperimentalSet module¶

IPTK.Classes.Features module¶

Parses the XML scheme of a UniProt protein and provides a Python API for querying and accessing the results.

class IPTK.Classes.Features.Features(uniprot_id: str, temp_dir: Optional[str] = None)¶

Bases: object

The class provides a template for the features associated with a protein. The following features are associated with the protein:

#signal peptide: dict: the range of the signal peptide. If the protein has no signal peptide, for example, a globular cytosolic protein, None is used as a default placeholder value.
#chains: dict: the chains making up the mature protein; the protein should have at least one chain.
#domain: dict: the known domains in the protein; if no domain is defined, None is used.
#modification sites: nested dict: contains information about the PTM sites, glycosylation sites and disulfide bonds.
#sequence variants: dict: contains information about the sequence variants of a protein.
#splice variants: dict: contains the known splice variants.

** Notes: Although a disulfide bond is not a PTM, it is treated as one here to simplify the workflow.

get_PTMs() → Dict[str, Dict[str, Dict[str, Union[int, str]]]]¶

Returns

A nested dictionary that contains the PTMs found within the protein. The PTMs are classified into three main categories:

1- Modifications: the generic case, which contains information about any sequence modification besides disulfide bonds and glycosylation.

2- glycosylation: contains information about glycosylation sites

3- DisulfideBond: contains information about disulfide bonds

Return type

Dict[str,Dict[str,Dict[str,Union[str,int]]]]

get_PTMs_glycosylation() → Dict[str, Dict[str, Union[int, str]]]¶

Returns: The glycosylation sites found on the protein. If the protein has no glycosylation sites, the function returns None.
Return type: Dict[str,Dict[str,Union[str,int]]]

get_PTMs_modifications() → Dict[str, Dict[str, Union[int, str]]]¶

Returns: The generic modifications found on the protein. If the protein has no PTM, the function returns None.
Return type: Dict[str,Dict[str,Union[str,int]]]

get_chains() → Dict[Dict[str, Union[str, int]]]¶

Returns: A dictionary that contains the chains of the protein. If no chain is defined, it returns None.
Return type: Dict[Dict[str,Union[str,int]]]

get_disulfide_bonds() → Dict[str, Dict[str, Union[int, str]]]¶

Returns: The disulfide sites found on the protein. If the protein has no disulfide sites, the function returns None.
Return type: Dict[str,Dict[str,Union[str,int]]]

get_domains() → Dict[str, Dict[str, int]]¶

Returns: The domains defined in the protein sequence, if no domain is defined it returns None.
Return type: Dict[str, Dict[str, int]]

get_num_transmembrane_regions() → int¶

Return the number of transmembrane regions on the protein

Returns: The number of transmembrane regions on the protein
Return type: int

get_number_PTMs() → int¶

Returns: The number of PTMs the sequence has; this includes disulfide bonds. See the note in the class description for more details. If the protein has no PTMs, the function returns zero.
Return type: int
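
As an illustration of how such a count relates to the nested dictionary described under get_PTMs, the sketch below sums the entries of the three categories. The toy data, the inner keys, and the choice to treat a missing category as None are all assumptions; only the three category names come from the description of get_PTMs.

```python
from typing import Dict, Optional

# Toy nested dictionary shaped like the description of get_PTMs();
# the inner keys and values are illustrative assumptions.
ptms: Dict[str, Optional[Dict[str, Dict[str, object]]]] = {
    "Modifications": {"site_1": {"Name": "Phosphoserine", "startIdx": 12}},
    "glycosylation": None,  # a category without any annotated site
    "DisulfideBond": {
        "bond_1": {"startIdx": 30, "endIdx": 75},
        "bond_2": {"startIdx": 41, "endIdx": 90},
    },
}


def count_ptms(nested: Dict[str, Optional[dict]]) -> int:
    """Sum the entries of every category, treating None as empty."""
    return sum(len(category or {}) for category in nested.values())


print(count_ptms(ptms))
```

Here the two disulfide bonds are counted together with the single generic modification, in line with the convention that disulfide bonds are treated as PTMs.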

get_number_chains() → int¶

Returns: The number of chains in the protein. If no chain is defined, it returns zero.
Return type: int

get_number_disulfide_bonds() → int¶

Returns: The number of disulfide bonds the protein has. If the protein has no disulfide bonds, the function returns zero.
Return type: int

get_number_domains() → int¶

Returns: The number of domains a protein has, if no domain is defined it returns zero.
Return type: int

get_number_glycosylation_sites() → int¶

Returns: The number of glycosylation_sites the protein has, if the protein has no glycosylation sites, the function returns zero
Return type: int

get_number_modifications() → int¶

Returns: The total number of generic modifications found on the protein. If no modification is found, it returns 0.
Return type: int

get_number_sequence_variants() → int¶

Returns: The number of sequence variants the protein has. If the protein has no sequence variants, the function returns 0.
Return type: int

get_number_splice_variants() → int¶

Returns: The number of splice variants in the protein. If the protein has no splice variants, the function returns zero.
Return type: int

get_sequence_variants() → Dict[str, Dict[str, Union[int, str]]]¶

Returns: A dict object that contains all sequence variants within a protein, if the protein has no sequence variants the function returns None.
Return type: Dict[str,Dict[str,Union[str,int]]]

get_signal_peptide_index() → Tuple[int, int]¶

Returns: The index of the signal peptide in the protein. If no signal peptide is defined, it returns None.
Return type: Tuple[int,int]

get_splice_variants() → Dict[str, Dict[str, Union[int, str]]]¶

Returns: A dict object that contains the splice variants. If the protein has no splice variants the function returns None.
Return type: Dict[str,Dict[str,Union[str,int]]]

get_transmembrane_regions() → List[Tuple[int, int]]¶

return a list containing the boundaries of transmembrane regions in the protein

Returns: a list containing the boundaries of transmembrane regions in the protein
Return type: List[Tuple[int,int]]

has_PTMs() → bool¶

Returns: True if the protein has PTMs and False otherwise.
Return type: bool

has_chains() → bool¶

Returns: True if the protein has one or more chains as a feature and False otherwise.
Return type: bool

has_disulfide_bond() → bool¶

Returns: True if the protein has a disulfide bond and False otherwise.
Return type: bool

has_domains() → bool¶

Returns: True if the protein has one or more defined domains; otherwise it returns False.
Return type: bool

has_glycosylation_site() → bool¶

Returns: True if the protein has a glycosylation site and False otherwise.
Return type: bool

has_sequence_variants() → bool¶

Returns: True if the protein has sequence variants and False otherwise.
Return type: bool

has_signal_peptide() → bool¶

Returns: True if the protein has a signal peptide and False otherwise.
Return type: bool

has_site_modifications() → bool¶

Returns: True if the protein has a modification site and False otherwise
Return type: bool

has_splice_variants() → bool¶

Returns: True if the sequence has splice variants and False otherwise.
Return type: bool

has_transmembrane_domains() → bool¶

Returns: True if the protein has a transmembrane region and False otherwise.
Return type: bool

summary() → Dict[str, Union[int, str]]¶

Returns: A dict object that summarizes the features of the protein.
Return type: Dict[str,Union[str,int]]
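
Several of the getters above return None rather than an empty container when a feature is absent, so a has_* check can guard the call. The sketch below is an illustration only: StubFeatures is a hypothetical stand-in that mimics two of the documented method names, and the variant record layout is invented for the example, not taken from the library.

```python
from typing import Dict, List, Optional, Union

VariantTable = Dict[str, Dict[str, Union[int, str]]]


class StubFeatures:
    """Hypothetical stand-in exposing two of the documented accessors."""

    def __init__(self, variants: Optional[VariantTable] = None) -> None:
        self._variants = variants

    def has_sequence_variants(self) -> bool:
        return self._variants is not None

    def get_sequence_variants(self) -> Optional[VariantTable]:
        return self._variants


def sequence_variant_names(features) -> List[str]:
    """Return the sorted variant keys, or an empty list when there are none."""
    if not features.has_sequence_variants():
        return []
    return sorted(features.get_sequence_variants())


# The record fields below are invented for illustration.
with_variants = StubFeatures({"VAR_1": {"position": 12, "original": "A"}})
without_variants = StubFeatures()
print(sequence_variant_names(with_variants))
print(sequence_variant_names(without_variants))
```

Checking the has_* method first means the caller never has to handle None itself.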

IPTK.Classes.HLAChain module¶

The implementation of an HLA chain.

class IPTK.Classes.HLAChain.HLAChain(name: str)¶

Bases: object

get_allele_group() → str¶

Returns: The allele group
Return type: str

get_chain_class(gene_name: str) → int¶

Parameters: gene_name (str) – the name of the gene
Returns: 1 if the gene belongs to class one and 2 if it belongs to class two
Return type: int

get_class() → int¶

Returns: The HLA class
Return type: int

get_gene() → str¶

Returns: The gene name
Return type: str

get_name() → str¶

Returns: The chain name
Return type: str

get_protein_group() → str¶

Returns: The protein group
Return type: str
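
The getters above correspond to the fields of a standard HLA allele name such as HLA-A*02:01, where A is the gene, 02 the allele group and 01 the specific protein. The sketch below shows one way such a name could be decomposed. It is not the library's parser: parse_chain_name, ParsedChain and CLASS_ONE_GENES are hypothetical names, and treating every gene other than A, B and C as class II is a simplifying assumption. The exact strings returned by get_allele_group and get_protein_group are not specified in this reference.

```python
from typing import NamedTuple


class ParsedChain(NamedTuple):
    gene: str
    allele_group: str
    protein_group: str
    hla_class: int


# Assumption: classical class I genes; everything else is treated as class II.
CLASS_ONE_GENES = {"A", "B", "C"}


def parse_chain_name(name: str) -> ParsedChain:
    """Split a name such as 'HLA-A*02:01' into its first two fields."""
    body = name[4:] if name.startswith("HLA-") else name
    gene, _, fields = body.partition("*")
    parts = fields.split(":")
    if not gene or len(parts) < 2:
        raise ValueError(f"cannot parse HLA chain name: {name!r}")
    hla_class = 1 if gene in CLASS_ONE_GENES else 2
    return ParsedChain(gene, parts[0], parts[1], hla_class)


chain = parse_chain_name("HLA-DRB1*15:01")
print(chain)
```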

IPTK.Classes.HLAMolecules module¶

A representation of an HLA molecule.

class IPTK.Classes.HLAMolecules.HLAMolecule(**hla_chains)¶

Bases: object

get_allele_group() → List[str]¶

Returns: The allele group for the instance chain/pair of chains
Return type: AlleleGroup

get_class() → int¶

Returns: The class of the HLA molecules
Return type: int

get_gene() → List[str]¶

Returns: The gene or pair of genes coding for the current HLA molecule
Return type: Genes

get_name(sep: str = ':') → str¶

Parameters: sep (str, optional) – The separator used to concatenate the names of the individual chains, defaults to ‘:’
Returns: The name of the allele, built by concatenating the names of the individual chains using the separator
Return type: str

get_protein_group() → List[str]¶

Returns: The protein group for the instance chain/pair of chains
Return type: ProteinGroup

IPTK.Classes.HLASet module¶

An abstraction for a collection of HLA alleles

class IPTK.Classes.HLASet.HLASet(hlas: List[str], gene_sep: str = ':')¶

Bases: object

get_alleles() → List[str]¶

Returns: The current alleles in the set
Return type: List[str]

get_class() → int¶

Returns: The class of the HLA-alleles in the current instance
Return type: int

get_hla_count() → int¶

Returns: The count of HLA molecules in the set
Return type: int

get_names() → List[str]¶

Returns a list of all HLA allele names defined in the set.

Returns: A list of all HLA allele names defined in the set
Return type: List[str]

has_allele(allele: str) → bool¶

Parameters: allele (str) – The name of the allele to look for in the instance.
Returns: True, if the provided allele is in the current instance, False otherwise.
Return type: bool

has_allele_group(allele_group: str) → bool¶

Parameters: allele_group (str) – The allele group to search the set for
Returns: True, if at least one allele in the set belongs to the provided allele group, False otherwise.
Return type: bool

has_gene(gene_name: str) → bool¶

Parameters: gene_name (str) – the gene name to search the set against.
Returns: True, if at least one of the alleles in the set belongs to the provided gene. False otherwise
Return type: bool

has_protein_group(protein_group: str) → bool¶

Parameters: protein_group (str) – The protein group to search the set for
Returns: True, if at least one allele in the set belongs to the provided protein group, False otherwise.
Return type: bool
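
Each has_* predicate above asks whether at least one allele in the set matches, which maps naturally onto any(). The sketch below illustrates that pattern on a stand-in class. StubHLASet and gene_of are hypothetical, the real constructor's gene_sep handling for paired chains is not modelled, and the name format is assumed to follow the HLA-A*02:01 convention.

```python
from typing import List


def gene_of(allele: str) -> str:
    """Assumed helper: gene part of a name like 'HLA-A*02:01' -> 'A'."""
    body = allele[4:] if allele.startswith("HLA-") else allele
    return body.split("*", 1)[0]


class StubHLASet:
    """Hypothetical stand-in mirroring three of the documented methods."""

    def __init__(self, hlas: List[str]) -> None:
        self._hlas = list(hlas)

    def get_hla_count(self) -> int:
        return len(self._hlas)

    def has_allele(self, allele: str) -> bool:
        # Exact-name membership test.
        return allele in self._hlas

    def has_gene(self, gene_name: str) -> bool:
        # True as soon as one allele belongs to the requested gene.
        return any(gene_of(a) == gene_name for a in self._hlas)


hla_set = StubHLASet(["HLA-A*02:01", "HLA-B*07:02"])
print(hla_set.get_hla_count(), hla_set.has_gene("B"), hla_set.has_gene("C"))
```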

IPTK.Classes.Peptide module¶

IPTK.Classes.Proband module¶

A description for an IP proband

class IPTK.Classes.Proband.Proband(**info)¶

Bases: object

get_meta_data() → dict¶

Returns: A dict containing all the meta-data about the proband
Return type: dict

get_name() → str¶

Returns: The name of the proband
Return type: str

update_info(**info) → None¶

Adds new info, or updates existing info, about the proband. An arbitrary number of key-value pairs can be passed, and they are added to the instance's meta-info dict.

IPTK.Classes.Protein module¶

IPTK.Classes.Tissue module¶

A representation of the Tissue used in an IP Experiment.

class IPTK.Classes.Tissue.ExpressionProfile(name: str, expression_table: pandas.core.frame.DataFrame, aux_proteins: Optional[pandas.core.frame.DataFrame] = None)¶

Bases: object

A representation of a tissue's reference expression values.

get_gene_id_expression(gene_id: str) → float¶

Parameters: gene_id (str) – the gene id whose expression value is retrieved from the database
Raises: KeyError – if the provided id is not defined in the instance table
Returns: the expression value of the provided gene id.
Return type: float

get_gene_name_expression(gene_name: str) → float¶

Parameters: gene_name (str) – the gene name whose expression value is retrieved from the database
Raises: KeyError – if the provided name is not defined in the instance table
Returns: the expression value of the provided gene name.
Return type: float
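
Because both lookups raise KeyError for unknown identifiers, callers that process many genes may prefer a wrapper that returns a fallback instead. The sketch below shows that pattern. StubProfile is a hypothetical dict-backed stand-in for the documented method, expression_or_none is an illustrative helper, and the identifiers and values are invented.

```python
from typing import Dict, Optional


class StubProfile:
    """Hypothetical stand-in for the documented lookup method."""

    def __init__(self, table: Dict[str, float]) -> None:
        self._table = table

    def get_gene_id_expression(self, gene_id: str) -> float:
        return self._table[gene_id]  # raises KeyError for unknown ids


def expression_or_none(profile, gene_id: str) -> Optional[float]:
    """Return the expression value, or None when the id is not defined."""
    try:
        return profile.get_gene_id_expression(gene_id)
    except KeyError:
        return None


# Invented identifiers and values, for illustration only.
profile = StubProfile({"GENE_ID_1": 12.5})
print(expression_or_none(profile, "GENE_ID_1"))
print(expression_or_none(profile, "GENE_ID_MISSING"))
```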

get_name() → str¶

Returns: the name of the tissue where the expression profile was obtained
Return type: str

get_table() → pandas.core.frame.DataFrame¶

Returns: a table that contains the expression of all the transcripts in the current profile, including core and auxiliary proteins
Return type: pd.DataFrame

class IPTK.Classes.Tissue.Tissue(name: str, main_exp_value: IPTK.Classes.Database.GeneExpressionDB, main_location: IPTK.Classes.Database.CellularLocationDB, aux_exp_value: Optional[IPTK.Classes.Database.GeneExpressionDB] = None, aux_location: Optional[IPTK.Classes.Database.CellularLocationDB] = None)¶

Bases: object

get_expression_profile() → IPTK.Classes.Tissue.ExpressionProfile¶

Returns: the expression profile of the current tissue
Return type: ExpressionProfile

get_name() → str¶

Returns: the name of the tissue
Return type: str

get_subCellular_locations() → IPTK.Classes.Database.CellularLocationDB¶

Returns: the sub-cellular localization of all the proteins stored in current instance resources.
Return type: CellularLocationDB

Module contents¶