IPTK.Utils package

Submodules

IPTK.Utils.DevFunctions module

IPTK.Utils.Mapping module

A submodule that contains functions for mapping between different database keys.

IPTK.Utils.Mapping.map_from_uniprot_gene(uniprots: List[str]) pandas.core.frame.DataFrame

Map from UniProt IDs to Ensembl gene IDs.

Parameters

uniprots (List[str]) – a list of UniProt IDs

Returns

A table that contains the mapping between each UniProt ID and its corresponding gene ID or IDs

Return type

pd.DataFrame

IPTK.Utils.Mapping.map_from_uniprot_pdb(uniprots: List[str]) pandas.core.frame.DataFrame

Map from UniProt IDs to Protein Data Bank (PDB) identifiers.

Parameters

uniprots (List[str]) – a list of UniProt IDs

Returns

A table that contains the mapping between each UniProt ID and its corresponding PDB ID or IDs

Return type

pd.DataFrame

IPTK.Utils.Mapping.map_from_uniprot_to_Entrez_Gene(uniprots: List[str]) pandas.core.frame.DataFrame

Map from UniProt IDs to Entrez Gene IDs.

Parameters

uniprots (List[str]) – a list of UniProt IDs

Returns

A table that contains the mapping between each UniProt ID and its corresponding Entrez Gene ID or IDs

Return type

pd.DataFrame

IPTK.Utils.Types module

Contains definitions of types that are commonly used throughout the library.

IPTK.Utils.UtilityFunction module

Utility functions that are used throughout the library.

IPTK.Utils.UtilityFunction.append_to_calling_string(param: str, def_value, cur_val, calling_string: str, is_flag: bool = False) str

Helper function that takes a calling string, a parameter, a default value, and a current value. If the current value of the parameter differs from its default value, the function appends the parameter and its current value to the calling string, preceded by a space.

Parameters
  • param (str) – The name of the parameter that will be appended to the calling string

  • def_value ([type]) – The default value of the parameter

  • cur_val ([type]) – The current value of the parameter

  • calling_string (str) – The calling string to which the parameter and its current value might be appended

  • is_flag (bool, optional) – If the parameter is a control flag, i.e. a boolean switch, the parameter is appended to the calling string without an associated value. Defaults to False

Returns

the updated version of the calling string

Return type

str
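
The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper name append_param_sketch and the command name are hypothetical, and joining the parameter and its value with single spaces is an assumption, since the exact formatting (for example, whether a dash is prefixed to the parameter) is not stated here.

```python
def append_param_sketch(param, def_value, cur_val, calling_string, is_flag=False):
    """Hypothetical re-statement of the documented behaviour."""
    # Nothing is appended while the parameter still has its default value.
    if cur_val == def_value:
        return calling_string
    # A flag is appended on its own, without an associated value.
    if is_flag:
        return f"{calling_string} {param}"
    # Assumed format: "<calling_string> <param> <value>".
    return f"{calling_string} {param} {cur_val}"


cmd = "tool"  # hypothetical command name
cmd = append_param_sketch("--threads", 1, 4, cmd)
cmd = append_param_sketch("--verbose", False, True, cmd, is_flag=True)
cmd = append_param_sketch("--mode", "fast", "fast", cmd)  # default value, so skipped
print(cmd)
```

Only the parameters whose values differ from their defaults end up in the final string, which keeps generated command lines short.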

IPTK.Utils.UtilityFunction.build_sequence_table(sequence_dict: Dict[str, str]) pandas.core.frame.DataFrame

Construct a sequence table from a sequence dict object.

Parameters

sequence_dict (Dict[str,str]) – a dict that contains protein IDs as keys and sequences as values.

Returns

pandas dataframe that contains each protein ID and its associated protein sequence

Return type

pd.DataFrame

IPTK.Utils.UtilityFunction.check_peptide_made_of_std_20_aa(peptide: str) str

Check whether the peptide is made of the standard 20 amino acids. If so, return the peptide sequence; otherwise, return an empty string.

Parameters

peptide (str) – a peptide sequence whose composition is to be checked

Returns

The peptide sequence if it is made of the standard 20 amino acids, an empty string otherwise.

Return type

str
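
A minimal sketch of such a check is shown below. The function name keep_if_standard is hypothetical, and the sketch assumes upper-case one-letter amino acid codes; it is not taken from the library's source.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def keep_if_standard(peptide: str) -> str:
    """Hypothetical sketch: return the peptide if valid, else an empty string."""
    # Every residue of the peptide must belong to the standard alphabet.
    return peptide if set(peptide) <= STANDARD_AA else ""


print(keep_if_standard("SIINFEKL"))
print(repr(keep_if_standard("SIINFEKX")))  # 'X' is not a standard residue
```

Returning the sequence itself rather than a boolean lets the result be used directly as a filter, because an empty string is falsy.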

IPTK.Utils.UtilityFunction.combine_summary(child_dfs: List[pandas.core.frame.DataFrame], root_df: Optional[pandas.core.frame.DataFrame] = None) pandas.core.frame.DataFrame

Combine multiple summary dataframes into one dataframe.

Parameters
  • child_dfs (List[pd.DataFrame]) – a list of summary dataframes to concatenate into one

  • root_df (pd.DataFrame, optional) – a dataframe to whose tail the child dataframes are appended, defaults to None

Returns

a dataframe containing the root and the child dataframes

Return type

pd.DataFrame

IPTK.Utils.UtilityFunction.generate_color_scale(color_ranges: int) matplotlib.colors.LinearSegmentedColormap

Generate a color gradient with a number of steps equal to color_ranges - 1.

Parameters

color_ranges (int) – the number of colors in the range

Returns

A color gradient palette

Return type

matplotlib.colors.LinearSegmentedColormap

IPTK.Utils.UtilityFunction.generate_random_name(name_length: int) str

Generate a random ASCII-based string.

Parameters

name_length (int) – the length of the string to generate

Returns

the generated random string

Return type

str
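
One way to produce such a name with the standard library is sketched below. The function name random_name_sketch is hypothetical, and drawing from ASCII letters only is an assumption; the character set actually used by the library is not stated here.

```python
import random
import string


def random_name_sketch(name_length: int) -> str:
    """Hypothetical sketch: build a random string of ASCII letters."""
    # Draw name_length characters independently from the ASCII letters.
    return "".join(random.choice(string.ascii_letters) for _ in range(name_length))


name = random_name_sketch(12)
print(name, len(name))
```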

IPTK.Utils.UtilityFunction.generate_random_protein_mapping(protein_len: int, max_coverage: int) numpy.ndarray

Generate a NumPy array of shape 1 by protein_len in which each element is a random integer between zero and max_coverage.

Parameters
  • protein_len (int) – The length of the protein

  • max_coverage (int) – The maximum peptide coverage at each position

Returns

a NumPy array containing a simulated protein coverage

Return type

np.ndarray

IPTK.Utils.UtilityFunction.get_experiment_summary(ident_table: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Take an identification table as input and return a summary table containing the counts of unique peptides and unique proteins, and the maximum, minimum, median, and mean peptide lengths.

Parameters

ident_table (pd.DataFrame) – the identification table as returned by one of the parser functions defined in the IO modules

Returns

The summary table

Return type

pd.DataFrame
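
The statistics listed above can be illustrated with plain Python. This is only a sketch under assumptions: the records, the key names in the summary, and the choice to compute lengths over unique peptides are all hypothetical, and a list of tuples stands in for the identification table.

```python
from statistics import mean, median

# Hypothetical identification records: (peptide, protein) pairs.
records = [
    ("SIINFEKL", "P1"),
    ("GILGFVFTL", "P2"),
    ("SIINFEKL", "P3"),
    ("NLVPMVATV", "P2"),
]

peptides = {pep for pep, _ in records}
proteins = {prot for _, prot in records}
lengths = [len(pep) for pep in peptides]  # lengths of the unique peptides

summary = {
    "num_unique_peptides": len(peptides),
    "num_unique_proteins": len(proteins),
    "max_len": max(lengths),
    "min_len": min(lengths),
    "median_len": median(lengths),
    "mean_len": mean(lengths),
}
print(summary)
```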

IPTK.Utils.UtilityFunction.get_idx_peptide_in_sequence_table(sequence_table: pandas.core.frame.DataFrame, peptide: str) List[str]

Check whether the provided peptide is located in any of the sequences in the sequence table, and return a list containing the identifiers of the hit proteins.

Parameters
  • sequence_table (pd.DataFrame) – pandas dataframe that contains each protein ID and its associated protein sequence

  • peptide (str) – The peptide sequence to query the protein with

Returns

A list of protein identifiers containing the identifier of the hit proteins

Return type

List[str]
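
The lookup amounts to a substring search over every sequence. The sketch below uses a plain dict in place of the pandas dataframe to stay dependency-free; the function name find_parent_proteins and the example sequences are hypothetical.

```python
# Hypothetical sequence table: protein ID -> sequence.
sequence_table = {
    "P1": "MKTSIINFEKLAA",
    "P2": "MGILGFVFTLKK",
    "P3": "AASIINFEKLGG",
}


def find_parent_proteins(table: dict, peptide: str) -> list:
    """Return the IDs of all proteins whose sequence contains the peptide."""
    return [prot_id for prot_id, seq in table.items() if peptide in seq]


print(find_parent_proteins(sequence_table, "SIINFEKL"))
print(find_parent_proteins(sequence_table, "XXXX"))  # no hit gives an empty list
```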

IPTK.Utils.UtilityFunction.load_3d_figure(file_path: str) matplotlib.figure.Figure

Load a pickled 3D figure from the provided path.

Parameters

file_path (str) – The path of the pickled figure.

Raises

IOError – In case loading the file failed

Returns

a matplotlib figure

Return type

plt.Figure

IPTK.Utils.UtilityFunction.pad_mapped_proteins(list_array: List[numpy.ndarray], pre_pad: bool = True, padding_char: int = - 1) numpy.ndarray

Pad the provided list of arrays into a 2D tensor of shape number of arrays by maxlength.

Parameters
  • list_array (List[np.ndarray]) – A list of NumPy arrays where each array is a mapped_protein array. The expected shape of these arrays is 1 by protein length.

  • pre_pad (bool, optional) – Whether shorter arrays in list_array are pre-padded or post-padded. Defaults to True, which means pre-padding

  • padding_char (int, optional) – The value used for padding, defaults to -1

Returns

A 2D tensor of shape number of arrays by maxlength.

Return type

np.ndarray
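
The difference between pre-padding and post-padding can be seen in a small sketch. Plain Python lists stand in for NumPy arrays so that the example has no dependencies; the function name pad_rows_sketch is hypothetical and the code is an illustration of the idea, not the library's implementation.

```python
def pad_rows_sketch(rows, pre_pad=True, padding_value=-1):
    """Hypothetical sketch: pad each row to the length of the longest one."""
    max_len = max(len(row) for row in rows)
    padded = []
    for row in rows:
        filler = [padding_value] * (max_len - len(row))
        # Pre-padding puts the filler in front; post-padding puts it behind.
        padded.append(filler + list(row) if pre_pad else list(row) + filler)
    return padded


rows = [[1, 2, 3], [4], [5, 6]]
print(pad_rows_sketch(rows))                 # filler in front of short rows
print(pad_rows_sketch(rows, pre_pad=False))  # filler behind short rows
```

A padding value such as -1 is convenient for coverage data because it cannot be confused with a real coverage count, which is never negative.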

IPTK.Utils.UtilityFunction.save_3d_figure(outpath: str, fig2save: matplotlib.figure.Figure) None

Write a pickled version of a 3D figure so that it can be loaded later for more interactive analysis.

Parameters
  • outpath (str) – The output path of the writer function

  • fig2save (plt.Figure) – The figure to save to the output file

Raises

IOError – In case writing the file failed

IPTK.Utils.UtilityFunction.simulate_protein_binary_represention(num_conditions: int, protein_length: int)
Parameters
  • num_conditions (int) – The number of conditions to simulate

  • protein_length (int) – The length of the protein

Returns

A 2D matrix of shape protein_length by number of conditions, where each element can be either zero or 1.

Return type

np.ndarray

IPTK.Utils.UtilityFunction.simulate_protein_representation(num_conditions: int, protein_len: int, protein_coverage: int) Dict[str, numpy.ndarray]

Simulate protein peptide coverage under different conditions.

Parameters
  • num_conditions (int) – The number of conditions to simulate

  • protein_len (int) – The length of the protein

  • protein_coverage (int) – The maximum protein coverage

Returns

a dict of length num_conditions containing the condition index and a simulated protein array

Return type

Dict[str, np.ndarray]

Module contents