IPTK.Utils package¶
Submodules¶
IPTK.Utils.DevFunctions module¶
IPTK.Utils.Mapping module¶
A submodule that contains functions to map between different database keys
- IPTK.Utils.Mapping.map_from_uniprot_gene(uniprots: List[str]) pandas.core.frame.DataFrame ¶
Map UniProt IDs to Ensembl gene IDs
- Parameters
uniprots (List[str]) – a list of uniprot IDs
- Returns
A table that contains the mapping between each UniProt ID and its corresponding gene ID/IDs
- Return type
pd.DataFrame
- IPTK.Utils.Mapping.map_from_uniprot_pdb(uniprots: List[str]) pandas.core.frame.DataFrame ¶
Map UniProt IDs to Protein Data Bank (PDB) identifiers
- Parameters
uniprots (List[str]) – a list of uniprot IDs
- Returns
A table that contains the mapping between each UniProt ID and its corresponding PDB ID/IDs
- Return type
pd.DataFrame
- IPTK.Utils.Mapping.map_from_uniprot_to_Entrez_Gene(uniprots: List[str]) pandas.core.frame.DataFrame ¶
Map UniProt IDs to Entrez Gene IDs
- Parameters
uniprots (List[str]) – a list of uniprot IDs
- Returns
A table that contains the mapping between each UniProt ID and its corresponding gene ID/IDs
- Return type
pd.DataFrame
IPTK.Utils.Types module¶
Contains definitions of types that are commonly used throughout the library
IPTK.Utils.UtilityFunction module¶
Utility functions that are used throughout the library
- IPTK.Utils.UtilityFunction.append_to_calling_string(param: str, def_value, cur_val, calling_string: str, is_flag: bool = False) str ¶
Helper function that takes a calling string, a parameter, a default value, and a current value. If the current value does not equal the default value, the function appends the parameter and its current value to the calling string, separated from the existing string by a space.
- Parameters
param (str) – The name of the parameter that will be appended to the calling string
def_value ([type]) – The default value for the parameter
cur_val ([type]) – The current value for the parameter
calling_string (str) – The calling string to which the parameter and its current value might be appended
is_flag (bool, optional) – If the parameter is a control flag, i.e. a boolean switch, the parameter is appended to the calling string without an associated value, defaults to False
- Returns
the updated version of the calling string
- Return type
str
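The appending rule described above can be illustrated with a small stand-alone sketch. This is not IPTK's implementation: the helper name `append_param` is hypothetical, and the exact formatting (for example, whether a dash is prepended to the parameter name) is an assumption; here the caller passes the parameter exactly as it should appear.

```python
def append_param(param, def_value, cur_val, calling_string, is_flag=False):
    """Hypothetical re-implementation of the behaviour described above."""
    # Nothing is appended while the parameter still has its default value.
    if cur_val == def_value:
        return calling_string
    if is_flag:
        # A boolean switch is appended without an associated value.
        return calling_string + " " + param
    # Otherwise append the parameter followed by its current value.
    return calling_string + " " + param + " " + str(cur_val)


cmd = "tool"
cmd = append_param("--threads", 1, 4, cmd)                       # non-default value
cmd = append_param("--verbose", False, True, cmd, is_flag=True)  # flag switched on
cmd = append_param("--seed", 42, 42, cmd)                        # default, skipped
print(cmd)  # tool --threads 4 --verbose
```

Building the string this way keeps the command line short: only the options that differ from their defaults end up in the call.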
- IPTK.Utils.UtilityFunction.build_sequence_table(sequence_dict: Dict[str, str]) pandas.core.frame.DataFrame ¶
Construct a sequence table from a dict of sequences
- Parameters
sequence_dict (Dict[str,str]) – a dict that contains the protein IDs as keys and the sequences as values.
- Returns
A pandas dataframe that contains the protein IDs and the associated protein sequences
- Return type
pd.DataFrame
- IPTK.Utils.UtilityFunction.check_peptide_made_of_std_20_aa(peptide: str) str ¶
Check whether the peptide is made of the standard 20 amino acids. If so, return the peptide sequence; otherwise, return an empty string.
- Parameters
peptide (str) – a peptide sequence to check its composition
- Returns
The peptide sequence if the peptide is made of the standard 20 amino acids, an empty string otherwise.
- Return type
str
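The check described above (return the peptide when it uses only the 20 standard amino acids, an empty string otherwise) can be sketched as follows. The helper name `keep_if_standard` is hypothetical, and the sketch assumes upper-case one-letter codes; it is not taken from IPTK's source.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def keep_if_standard(peptide: str) -> str:
    """Return the peptide if every residue is a standard amino acid, else ''."""
    return peptide if set(peptide) <= STANDARD_AMINO_ACIDS else ""


peptides = ["SIINFEKL", "SIINXEKL", "ACDU"]
# An empty string is falsy, so the result doubles as a filter condition.
valid = [p for p in peptides if keep_if_standard(p)]
print(valid)  # ['SIINFEKL']
```

Returning an empty string rather than a boolean lets the result be used both as a truth value and as the cleaned sequence.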
- IPTK.Utils.UtilityFunction.combine_summary(child_dfs: List[pandas.core.frame.DataFrame], root_df: Optional[pandas.core.frame.DataFrame] = None) pandas.core.frame.DataFrame ¶
Combine multiple summary dataframes into one dataframe
- Parameters
child_dfs (List[pd.DataFrame]) – a list of summary dataframes to concatenate into one
root_df (pd.DataFrame, optional) – a dataframe to whose tail the child dataframes are appended, defaults to None
- Returns
a dataframe containing the root and the child dataframes
- Return type
pd.DataFrame
- IPTK.Utils.UtilityFunction.generate_color_scale(color_ranges: int) matplotlib.colors.LinearSegmentedColormap ¶
Generate a color gradient with a number of steps equal to color_ranges - 1
- Parameters
color_ranges (int) – the number of colors in the range
- Returns
A color gradient palette
- Return type
matplotlib.colors.LinearSegmentedColormap
- IPTK.Utils.UtilityFunction.generate_random_name(name_length: int) str ¶
Generate a random ASCII-based string
- Parameters
name_length (int) – The length of the name to generate
- Returns
A randomly generated ASCII string
- Return type
str
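A random ASCII name of a given length can be produced with the standard library alone. The sketch below is not IPTK's implementation: the helper name `random_name` and the letters-only alphabet are assumptions made for illustration.

```python
import random
import string


def random_name(name_length: int) -> str:
    """Return a random string of ASCII letters with the requested length."""
    # random.choices samples with replacement, so characters may repeat.
    return "".join(random.choices(string.ascii_letters, k=name_length))


name = random_name(12)
print(len(name))  # 12
```

Names of this kind are typically used for temporary files, where only uniqueness matters and the content is irrelevant.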
- IPTK.Utils.UtilityFunction.generate_random_protein_mapping(protein_len: int, max_coverage: int) numpy.ndarray ¶
Generate a NumPy array of shape 1 by protein_len where each element is a random integer between zero and max_coverage.
- Parameters
protein_len (int) – The length of the protein
max_coverage (int) – The maximum peptide coverage at each position
- Returns
a NumPy array containing a simulated protein coverage
- Return type
np.ndarray
- IPTK.Utils.UtilityFunction.get_experiment_summary(ident_table: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame ¶
Take an identification table as input and return a summary table containing the number of unique peptides and unique proteins, along with the maximum, minimum, median, and mean peptide length
- Parameters
ident_table (pd.DataFrame) – the identification table as returned by one of the parser functions defined in the IO modules
- Returns
The summary table
- Return type
pd.DataFrame
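The statistics listed above can be sketched with the standard library. The example is only illustrative: it uses a plain list of (peptide, protein) pairs in place of the identification table, the protein identifiers are made up, and computing the lengths over unique peptides rather than over all rows is an assumption.

```python
from statistics import mean, median

# Hypothetical identification records: (peptide, protein identifier) pairs.
records = [
    ("SIINFEKL", "protA"),
    ("SIINFEKL", "protA"),
    ("GILGFVFTL", "protB"),
    ("NLVPMVATV", "protC"),
    ("ELAGIGILTV", "protC"),
]


def summarize(records):
    """Compute the summary statistics described above for a list of records."""
    unique_peptides = {peptide for peptide, _ in records}
    unique_proteins = {protein for _, protein in records}
    lengths = [len(peptide) for peptide in unique_peptides]
    return {
        "num_unique_peptides": len(unique_peptides),
        "num_unique_proteins": len(unique_proteins),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "median_length": median(lengths),
        "mean_length": mean(lengths),
    }


summary = summarize(records)
print(summary["num_unique_peptides"], summary["num_unique_proteins"])  # 4 3
```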
- IPTK.Utils.UtilityFunction.get_idx_peptide_in_sequence_table(sequence_table: pandas.core.frame.DataFrame, peptide: str) List[str] ¶
Check whether the provided peptide is located in any sequence of the sequence table and return a list containing the identifiers of the hit proteins.
- Parameters
sequence_table (pd.DataFrame) – a pandas dataframe that contains the protein IDs and the associated protein sequences
peptide (str) – The peptide sequence to query the protein with
- Returns
A list of protein identifiers containing the identifier of the hit proteins
- Return type
List[str]
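The lookup amounts to a substring search over every stored sequence. The sketch below shows the idea on a plain dict of protein IDs to sequences (the same shape accepted by build_sequence_table) instead of a dataframe; the helper name `proteins_containing` and the sequences are made up for illustration.

```python
def proteins_containing(sequences: dict, peptide: str) -> list:
    """Return the IDs of all proteins whose sequence contains the peptide."""
    return [
        protein_id
        for protein_id, sequence in sequences.items()
        if peptide in sequence
    ]


# Hypothetical protein sequences keyed by identifier.
sequences = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MSHFSRQLEERLGLIEVQ",
    "protC": "MGSSHHHHHH",
}

hits = proteins_containing(sequences, "SHFSRQ")
print(hits)  # ['protA', 'protB']
```

A peptide can match several parent proteins, which is why the result is a list rather than a single identifier; a peptide with no match yields an empty list.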
- IPTK.Utils.UtilityFunction.load_3d_figure(file_path: str) matplotlib.figure.Figure ¶
Load a pickled 3D figure from the provided path
- Parameters
file_path (str) – The path of the pickled figure
- Raises
IOError – In case loading the file failed
- Returns
a matplotlib figure
- Return type
plt.Figure
- IPTK.Utils.UtilityFunction.pad_mapped_proteins(list_array: List[numpy.ndarray], pre_pad: bool = True, padding_char: int = -1) numpy.ndarray ¶
Pad the provided list of arrays into a 2D tensor of shape number of arrays by maxlength.
- Parameters
list_array (List[np.ndarray]) – A list of NumPy arrays where each array is a mapped_protein array, the expected shape of these arrays is 1 by protein length.
pre_pad (bool, optional) – Whether shorter arrays in list_array are pre-padded or post-padded. Defaults to True, which means pre-padding
padding_char (int, optional) – The padding char, defaults to -1
- Returns
A 2D tensor of shape number of arrays by maxlength.
- Return type
np.ndarray
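The difference between pre-padding and post-padding is easiest to see on a small input. The sketch below uses flat Python lists instead of NumPy arrays to stay dependency-free, so it only illustrates the padding logic; the helper name `pad_rows` is hypothetical and this is not IPTK's implementation.

```python
def pad_rows(rows, pre_pad=True, padding_value=-1):
    """Pad every row to the length of the longest row."""
    max_length = max(len(row) for row in rows)
    padded = []
    for row in rows:
        filler = [padding_value] * (max_length - len(row))
        # Pre-padding puts the filler in front, post-padding behind.
        padded.append(filler + list(row) if pre_pad else list(row) + filler)
    return padded


coverage = [[1, 2, 3], [4, 5], [6]]

print(pad_rows(coverage))
# [[1, 2, 3], [-1, 4, 5], [-1, -1, 6]]

print(pad_rows(coverage, pre_pad=False))
# [[1, 2, 3], [4, 5, -1], [6, -1, -1]]
```

A padding value of -1 cannot be confused with a real coverage count, since counts are never negative.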
- IPTK.Utils.UtilityFunction.save_3d_figure(outpath: str, fig2save: matplotlib.figure.Figure) None ¶
Write a pickled version of a 3D figure so it can be loaded later for more interactive analysis
- Parameters
outpath (str) – The output path of the writer function
fig2save (plt.Figure) – The figure to save to the output file
- Raises
IOError – In case writing the file failed
- IPTK.Utils.UtilityFunction.simulate_protein_binary_represention(num_conditions: int, protein_length: int)¶
- Parameters
num_conditions (int) – The number of conditions to simulate
protein_length (int) – The length of the protein
- Returns
A 2D matrix of shape protein_length by number of conditions, where each element can be either zero or 1.
- Return type
np.ndarray
- IPTK.Utils.UtilityFunction.simulate_protein_representation(num_conditions: int, protein_len: int, protein_coverage: int) Dict[str, numpy.ndarray] ¶
Simulate protein peptide coverage under different conditions
- Parameters
num_conditions (int) – The number of conditions to simulate
protein_len (int) – The length of the protein
protein_coverage (int) – The maximum protein coverage
- Returns
a dict of length num_conditions containing the condition index and a simulated protein array
- Return type
Dict[str, np.ndarray]