glycontact.process module

glycontact.process

ComplexDictSerializer

Bases: DataFrameSerializer

Extends DataFrameSerializer with methods to handle complex defaultdict structures.

deserialize_complex_dict(path: str) -> defaultdict (classmethod)

Deserialize a defaultdict of (DataFrame, dict) tuples from a single JSON file.

serialize_complex_dict(data_dict: defaultdict, path: str) -> None (classmethod)

Serialize a defaultdict of (DataFrame, dict) tuples to a single JSON file.

align_point_sets(mobile_coords, ref_coords, fast=False)

Find the optimal rigid transformation to align two point sets, using either the SVD-based Kabsch algorithm or Nelder-Mead optimization. Args: mobile_coords (np.ndarray): Nx3 array of coordinates to transform. ref_coords (np.ndarray): Mx3 array of reference coordinates. fast (bool): If True, use the SVD-based Kabsch algorithm with k-d trees; if False (the default), use Nelder-Mead optimization. Returns: Tuple of (transformed coordinates, RMSD).

annotate_pdb_data(pdb_dataframe, mapping_dict)

Annotates PDB data with IUPAC nomenclature using the mapping dictionary. Args: pdb_dataframe (pd.DataFrame): DataFrame with PDB coordinates. mapping_dict (dict): Mapping from PDB to IUPAC nomenclature. Returns: pd.DataFrame: Annotated dataframe with IUPAC column.

annotation_pipeline(glycan, pdb_file=None, threshold=3.5, stereo=None, my_path=None)

Combines all annotation steps to convert PDB files to IUPAC annotations. Args: glycan (str): IUPAC glycan sequence. pdb_file (str or list, optional): Path(s) to PDB file(s). threshold (float): Distance threshold for interactions. stereo (str, optional): 'alpha' or 'beta' stereochemistry. my_path (Path, optional): Custom path to PDB folder. Returns: tuple: (dataframes_list, interaction_dicts_list) for all processed PDBs.

calculate_ring_pucker(df: pd.DataFrame, residue_number: int) -> Dict

Calculate ring puckering parameters for a monosaccharide using the Cremer-Pople method. Args: df (pd.DataFrame): DataFrame with PDB coordinates. residue_number (int): Residue number to analyze. Returns: dict: Dictionary with puckering parameters.
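As an illustration of the Cremer-Pople method named above, the following is a minimal standalone sketch for a six-membered ring. It is not the library's implementation: the function name `cremer_pople_six_ring`, the plain list-of-coordinates input, and the returned keys `Q`, `theta`, and `phi` are assumptions made for this example, and the atoms are assumed to be supplied in ring order.

```python
import math


def cremer_pople_six_ring(ring_coords):
    """Return Cremer-Pople Q, theta and phi for six ring atoms given in ring order.

    Hypothetical helper for illustration; input is a list of six [x, y, z] lists.
    """
    n_atoms = len(ring_coords)
    if n_atoms != 6:
        raise ValueError("this sketch only handles six-membered rings")

    # Shift the ring so that its geometric centre sits at the origin.
    center = [sum(atom[k] for atom in ring_coords) / n_atoms for k in range(3)]
    rel = [[atom[k] - center[k] for k in range(3)] for atom in ring_coords]

    # Mean-plane normal n = R' x R'' as defined by Cremer and Pople.
    r_sin = [sum(rel[j][k] * math.sin(2 * math.pi * j / n_atoms)
                 for j in range(n_atoms)) for k in range(3)]
    r_cos = [sum(rel[j][k] * math.cos(2 * math.pi * j / n_atoms)
                 for j in range(n_atoms)) for k in range(3)]
    normal = [
        r_sin[1] * r_cos[2] - r_sin[2] * r_cos[1],
        r_sin[2] * r_cos[0] - r_sin[0] * r_cos[2],
        r_sin[0] * r_cos[1] - r_sin[1] * r_cos[0],
    ]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]

    # Out-of-plane displacement z_j of every ring atom.
    z = [sum(r[k] * normal[k] for k in range(3)) for r in rel]

    # Puckering coordinates for N = 6: (q2, phi2) and q3.
    scale = math.sqrt(2 / n_atoms)
    q2_cos = scale * sum(z[j] * math.cos(4 * math.pi * j / n_atoms)
                         for j in range(n_atoms))
    q2_sin = -scale * sum(z[j] * math.sin(4 * math.pi * j / n_atoms)
                          for j in range(n_atoms))
    q3 = math.sqrt(1 / n_atoms) * sum((-1) ** j * z[j] for j in range(n_atoms))
    q2 = math.hypot(q2_cos, q2_sin)

    return {
        "Q": math.hypot(q2, q3),                               # total puckering amplitude
        "theta": math.degrees(math.atan2(q2, q3)),             # 0..180 degrees
        "phi": math.degrees(math.atan2(q2_sin, q2_cos)) % 360,
    }
```

For an idealized chair (alternating up/down displacements) theta sits at one of the poles, 0 or 180 degrees, while a planar ring gives Q close to zero.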

calculate_torsion_angle(coords: List[List[float]]) -> float

Calculate torsion angle from 4 xyz coordinates. Args: coords (list): List of 4 [x,y,z] coordinates. Returns: float: Torsion angle in degrees.
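The torsion (dihedral) angle of four points can be computed from the three bond vectors between them. The sketch below uses the standard atan2 formulation and only the standard library; the function name `torsion_angle` and the sign convention (positive for a clockwise rotation of the far bond when looking along the central bond) are assumptions of this example and may differ from what the library returns.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def torsion_angle(coords):
    """Signed dihedral angle in degrees for four [x, y, z] points (illustrative helper)."""
    p0, p1, p2, p3 = coords
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)

    # Normals of the two planes spanned by (b1, b2) and (b2, b3).
    n1, n2 = _cross(b1, b2), _cross(b2, b3)

    # atan2 keeps the sign and stays numerically stable near 0 and 180 degrees.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Four coplanar points in a cis arrangement give 0 degrees and a trans arrangement gives 180 degrees (or its negative, depending on rounding).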

check_graph_content(G)

Prints node and edge information from a graph for inspection. Args: G (nx.Graph): NetworkX graph object. Returns: None: Prints information to console.

check_reconstructed_interactions(interaction_dict)

Verifies if the reconstructed glycan is connected as a single component. Args: interaction_dict (dict): Dictionary of interactions. Returns: bool: True if glycan is correctly reconstructed as a single connected component.
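The single-component check described above amounts to a graph traversal. The sketch below assumes the interaction dictionary maps each node label to a list of partner labels (the `Dict[str, List[str]]` shape used elsewhere in this module); the function name `is_single_component` and the labels in the usage note are hypothetical.

```python
from collections import deque


def is_single_component(interaction_dict):
    """Return True if all nodes in the interaction dictionary are mutually reachable."""
    # Build an undirected adjacency map so one-directional entries still connect.
    adjacency = {}
    for node, partners in interaction_dict.items():
        adjacency.setdefault(node, set())
        for partner in partners:
            adjacency[node].add(partner)
            adjacency.setdefault(partner, set()).add(node)

    if not adjacency:
        return False  # assumption: an empty dictionary counts as a failed reconstruction

    # Breadth-first search from an arbitrary start node.
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)

    return len(seen) == len(adjacency)
```

With made-up labels, `{"1_GlcNAc": ["2_GlcNAc"], "2_GlcNAc": ["3_Man"]}` is connected, whereas adding an isolated pair such as `"8_Fuc": ["9_Gal"]` splits the structure into two components.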

compare_graphs_with_attributes(G_contact, G_work)

Performs attribute-aware isomorphism check between two glycan graphs. Args: G_contact (nx.Graph): Glycontact graph. G_work (nx.Graph): Glycowork graph. Returns: dict: Mapping between node indices or empty dict if not isomorphic.

compute_merge_SASA_flexibility(glycan, mode='weighted', stereo=None, my_path=None)

Merges SASA and flexibility data for a glycan structure. Args: glycan (str): IUPAC glycan sequence. mode (str, optional): 'standard', 'amplify', or 'weighted' for flexibility calculation. stereo (str, optional): 'alpha' or 'beta' stereochemistry. my_path (str, optional): Custom path to PDB folders. Returns: pd.DataFrame: Combined table with SASA and flexibility (as RMSF) metrics.

convert_glycan_to_class(glycan)

Converts monosaccharides in a glycan string to abstract classes. Args: glycan (str): IUPAC glycan sequence. Returns: str: Modified glycan string with abstracted monosaccharide classes.

correct_dataframe(df)

Corrects monosaccharide assignments in the dataframe based on atom counts. Args: df (pd.DataFrame): Annotated dataframe from annotate_pdb_data. Returns: pd.DataFrame: Corrected dataframe with fixed monosaccharide assignments.

create_glycontact_annotated_graph(glycan: str, mapping_dict, g_contact, libr=None) -> nx.Graph

Creates a glycowork graph annotated with glycontact structural data. Args: glycan (str): IUPAC glycan sequence. mapping_dict (dict): Node mapping from compare_graphs_with_attributes. g_contact (nx.Graph): Glycontact graph with structural attributes. libr (dict, optional): Custom library for glycan_to_nxGraph. Returns: nx.Graph: Annotated glycowork graph with combined information.

create_mapping_dict_and_interactions(df, valid_fragments, n_glycan, furanose_end, d_end, is_protein_complex)

Creates mapping dictionaries for converting PDB residue names to IUPAC notation. Args: df (pd.DataFrame): Interaction dataframe from extract_binary_interactions_from_PDB. valid_fragments (set): Valid monosaccharide link fragments from glycowork. n_glycan (bool): If True, applies N-glycan-specific corrections. furanose_end (bool): If True, considers furanose forms for terminal residues. d_end (bool): If True, considers D-form for terminal residues. is_protein_complex (bool): If True, assumes the glycan comes from a protein-glycan PDB. Returns: tuple: (mapping_dict, interaction_dict) for PDB to IUPAC conversion.

df_to_pdb_content(df)

Convert a DataFrame containing PDB-like data to PDB file content. Args: df: DataFrame with columns matching PDB HETATM/ATOM format. Returns: String containing PDB-formatted content.

download_from_glycoshape(IUPAC)

Downloads PDB files for a given IUPAC sequence from the GlycoShape database. Args: IUPAC (str): IUPAC-formatted glycan sequence to download. Returns: bool or None: False if the IUPAC sequence is improperly formatted, None otherwise.

extract_3D_coordinates(pdb_file)

Extracts 3D coordinates from a PDB file and returns them as a DataFrame. Args: pdb_file (str): Path to the PDB file. Returns: pd.DataFrame: DataFrame containing extracted atom coordinates with columns for atom information, coordinates, and properties.

extract_binary_glycontact_interactions(interaction_dict, mapping_dict)

Transforms PDB-based interactions into IUPAC binary interactions. Args: interaction_dict (dict): Dict of interactions from create_mapping_dict_and_interactions. mapping_dict (dict): Mapping dict from create_mapping_dict_and_interactions. Returns: list: List of binary interaction tuples in IUPAC format.

extract_binary_glycowork_interactions(graph_output)

Extracts binary interactions from glycowork graph output. Args: graph_output (tuple): Output from glycan_to_graph function. Returns: list: List of binary interaction pairs.

extract_binary_interactions_from_PDB(coordinates_df)

Extracts binary interactions between C1/C2 atoms and oxygen atoms from coordinates. Args: coordinates_df (pd.DataFrame): Coordinate dataframe from extract_3D_coordinates. Returns: pd.DataFrame or list of pd.DataFrame: DataFrame with columns 'Atom', 'Column', and 'Value' showing interactions. Returns a list of DataFrames if multiple chains are present.

extract_glycan_coords(pdb_filepath, residue_ids=None, main_chain_only=False)

Extracts coordinates of glycan residues from a PDB file. Args: pdb_filepath (str): Path to PDB file. residue_ids (list, optional): List of residue numbers to extract. main_chain_only (bool): If True, extracts only main chain atoms. Returns: tuple: (coordinates_array, atom_labels).

fetch_pdbs(glycan, stereo=None, my_path=None)

Given a glycan sequence, queries first GlycoShape and then UniLectin for appropriate PDB files. Args: glycan (str): Glycan sequence, preferably in IUPAC-condensed. stereo (str, optional): Whether the reducing-end alpha or beta form is desired. my_path (Path, optional): Custom path to PDB folder. Returns: List of Paths for GlycoShape and list of get_annotation output tuples for UniLectin.

focus_table_on_residue(table, residue)

Filters a monosaccharide contact table to keep only one residue type. Args: table (pd.DataFrame): Monosaccharide contact table. residue (str): Residue type to focus on (e.g., 'MAN'). Returns: pd.DataFrame: Filtered contact table.

get_all_clusters_frequency(fresh=False)

Extracts frequency data for all glycan clusters from GlycoShape. Args: fresh (bool): If True, fetches fresh data from GlycoShape. Returns: dict: Dictionary mapping IUPAC sequences to cluster frequency lists.

get_annotation(glycan, pdb_file, threshold=3.5)

Annotates a PDB file with IUPAC nomenclature for a given glycan. Args: glycan (str): IUPAC glycan sequence. pdb_file (str): Path to PDB file. threshold (float or list): Distance threshold for interactions. Returns: tuple: (annotated_dataframe, interaction_dict) or (empty_dataframe, {}) if validation fails.

get_contact_tables(glycan, stereo=None, level='monosaccharide', my_path=None)

Gets contact tables for a given glycan across all its PDB structures. Args: glycan (str): IUPAC glycan sequence. stereo (str, optional): 'alpha' or 'beta' to select stereochemistry. level (str): 'monosaccharide' or 'atom' to determine detail level. my_path (str, optional): Custom path to PDB folders. Returns: list: List of contact tables for each PDB structure.

get_example_pdb(glycan, stereo=None, rng=None, my_path=None)

Gets a random example PDB file for a given glycan. Args: glycan (str): IUPAC glycan sequence. stereo (str, optional): 'alpha' or 'beta' stereochemistry. rng (Random, optional): Random number generator instance. my_path (Path, optional): Custom path to PDB folder. Returns: Path: Path to a randomly selected PDB file.

get_glycoshape_IUPAC(fresh=False)

Retrieves a list of available glycans from GlycoShape database. Args: fresh (bool): If True, fetches data directly from GlycoShape API. If False, uses cached data from the local mirror. Returns: set: Set of IUPAC-formatted glycan sequences available in the database.

get_glycosidic_torsions(df: pd.DataFrame, interaction_dict: Dict[str, List[str]]) -> pd.DataFrame

Calculate phi/psi/omega torsion angles for all glycosidic linkages in a structure. Args: df (pd.DataFrame): DataFrame with PDB atomic coordinates. interaction_dict (dict): Dictionary of glycosidic linkages. Returns: pd.DataFrame: Phi/psi angles for each linkage.

get_ring_conformations(df: pd.DataFrame, exclude_types: List[str] = ['ROH', 'MEX', 'PCX', 'SO3', 'ACX']) -> pd.DataFrame

Analyze ring conformations for all residues in a structure. Args: df (pd.DataFrame): DataFrame with PDB coordinates. exclude_types (list): List of residue types to exclude. Returns: pd.DataFrame: DataFrame with ring parameters for each residue.

get_sasa_table(glycan, stereo=None, my_path=None, fresh=False)

Calculates solvent accessible surface area (SASA) for each monosaccharide. Args: glycan (str): IUPAC glycan sequence. stereo (str, optional): 'alpha' or 'beta' stereochemistry. my_path (str, optional): Custom path to PDB folders. fresh (bool): If True, fetches fresh cluster frequencies. Returns: pd.DataFrame: Table with SASA values and statistics for each monosaccharide.

get_similar_glycans(query_glycan, pdb_path=None, glycan_database=None, rmsd_cutoff=2.0, fast=False, unilectin_id=0)

Search for structurally similar glycans by comparing against all available conformers/structures and keeping the best match for each glycan. Args: query_glycan (str): PDB file or coordinates of query structure. pdb_path (str, optional): Specific path to query PDB file. glycan_database (list, optional): List of candidate glycan structures. rmsd_cutoff (float): Maximum RMSD to consider as similar. fast (bool): If True, use the SVD-based Kabsch algorithm with k-d trees; if False (the default), use Nelder-Mead optimization. unilectin_id (int): If pdb_path=='unilectin', retrieves that structure ID from UniLectin; defaults to the first. Returns: List of (glycan_id, rmsd, best_structure) tuples sorted by similarity.

get_structure_graph(glycan, stereo=None, libr=None, example_path=None, sasa_flex_path=None, my_path=None)

Creates a complete annotated structure graph for a glycan. Args: glycan (str): IUPAC glycan sequence. stereo (str, optional): 'alpha' or 'beta' stereochemistry. libr (dict, optional): Custom library for glycan_to_nxGraph. example_path (str, optional): Path to a specific PDB, used for torsion angles and conformations. sasa_flex_path (str, optional): Path to a specific PDB, used for SASA/flexibility. my_path (Path, optional): Custom path to PDB folder. Returns: nx.Graph: Fully annotated structure graph with all available properties.

glycan_cluster_pattern(threshold=70, mute=False, fresh=False)

Categorizes glycans based on their cluster distribution patterns. Args: threshold (float): Percentage threshold for major cluster classification. mute (bool): If True, suppresses print output. fresh (bool): If True, fetches fresh data from GlycoShape. Returns: tuple: (major_clusters_list, minor_clusters_list) sorted by cluster pattern.
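One plausible reading of the threshold is that a glycan counts as having a major cluster when its most populated cluster reaches the given percentage. The sketch below implements only that reading and should be treated as an assumption about the classification rule, not as the library's logic; the function name `split_by_major_cluster`, the alphabetical sort order, and the frequency values in the usage note are made up for illustration.

```python
def split_by_major_cluster(cluster_frequencies, threshold=70):
    """Split glycans into (major, minor) lists by their largest cluster percentage.

    cluster_frequencies maps an IUPAC sequence to a list of cluster percentages,
    the shape described for get_all_clusters_frequency.
    """
    major, minor = [], []
    for glycan, frequencies in cluster_frequencies.items():
        # Assumption: "major" means one cluster holds at least `threshold` percent.
        largest = max(frequencies, default=0)
        (major if largest >= threshold else minor).append(glycan)
    # Assumption: alphabetical order stands in for the library's pattern-based sort.
    return sorted(major), sorted(minor)
```

With invented numbers, a glycan with frequencies `[85.0, 10.0, 5.0]` lands in the first list at the default threshold, while one with `[40.0, 35.0, 25.0]` lands in the second.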

glycowork_vs_glycontact_interactions(glycowork_interactions, glycontact_interactions)

Compares binary interactions from glycowork and glycontact for validation. Args: glycowork_interactions (list): Interactions from glycowork. glycontact_interactions (list): Interactions from glycontact. Returns: bool: True if interactions are consistent (excluding special cases).

group_by_silhouette(glycan_list, mode='X')

Groups glycans by their topological silhouette/branching pattern. Args: glycan_list (list): List of IUPAC glycan sequences. mode (str): 'X' for simple abstraction or 'class' for detailed classes. Returns: pd.DataFrame: DataFrame of glycans annotated with silhouette and group.

inter_structure_frequency_table(glycan, stereo=None, threshold=5, my_path=None)

Creates a table showing frequency of contacts between residues across structures. Args: glycan (str or list): Glycan in IUPAC sequence or list of contact tables. stereo (str, optional): 'alpha' or 'beta' to select stereochemistry. threshold (float): Maximum distance for determining a contact. my_path (str, optional): Custom path to PDB folders. Returns: pd.DataFrame: Table of contact frequencies across structures.
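Counting contacts across structures can be sketched as follows. The sketch assumes each structure is summarized as a dictionary from a residue pair to a distance, that a contact means a distance strictly below the threshold, and that the result is a raw count per pair; all three points, along with the function name `contact_frequencies`, are assumptions of this example rather than documented behavior.

```python
from collections import Counter


def contact_frequencies(distance_tables, threshold=5):
    """Count, per residue pair, in how many structures the pair is in contact.

    distance_tables is a list with one {(residue_a, residue_b): distance} dict
    per structure (a simplified stand-in for the contact tables).
    """
    counts = Counter()
    for table in distance_tables:
        for pair, distance in table.items():
            key = tuple(sorted(pair))  # treat (a, b) and (b, a) as the same pair
            counts[key] += distance < threshold
    return dict(counts)
```

Pairs that never fall below the threshold are still listed, with a count of zero, so that every table row has a defined value.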

inter_structure_variability_table(glycan, stereo=None, mode='standard', my_path=None, fresh=False)

Creates a table showing stability of atom/monosaccharide positions across different PDB structures of the same glycan. Args: glycan (str or list): Glycan in IUPAC sequence or list of contact tables. stereo (str, optional): 'alpha' or 'beta' to select stereochemistry. mode (str): 'standard', 'amplify', or 'weighted' for different calculation methods. my_path (str, optional): Custom path to PDB folders. fresh (bool): If True, fetches fresh cluster frequencies. Returns: pd.DataFrame: Variability table showing how much positions vary across structures.

make_atom_contact_table(coord_df, threshold=10, mode='exclusive')

Creates a contact table showing distances between atoms in a PDB structure. Args: coord_df (pd.DataFrame): Dataframe of coordinates from extract_3D_coordinates. threshold (float): Maximum distance to consider, longer distances set to threshold+1. mode (str): 'exclusive' to exclude intra-residue distances, 'inclusive' to include them. Returns: pd.DataFrame: Matrix of distances between atoms.
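The capping rule described above (distances beyond the threshold are set to threshold+1) can be illustrated with a small pure-Python distance matrix. The tuple-based atom records and the function name `atom_contact_table` are hypothetical, and representing excluded intra-residue pairs as 0.0 in 'exclusive' mode is an assumption of this sketch; the library may encode them differently.

```python
import math


def atom_contact_table(atoms, threshold=10, mode="exclusive"):
    """Return a square matrix (list of lists) of capped inter-atomic distances.

    atoms is a list of (residue_number, atom_name, x, y, z) tuples.
    """
    size = len(atoms)
    table = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            same_residue = atoms[i][0] == atoms[j][0]
            if mode == "exclusive" and same_residue:
                continue  # assumption: excluded pairs keep the placeholder 0.0
            distance = math.dist(atoms[i][2:], atoms[j][2:])
            # Distances beyond the threshold are flattened to threshold + 1.
            table[i][j] = distance if distance <= threshold else threshold + 1
    return table
```

Capping keeps far-apart atom pairs from dominating downstream statistics while still marking them as being out of range.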

make_correlation_matrix(glycan, stereo=None, my_path=None)

Computes a Pearson correlation matrix between residue positions across structures. Args: glycan (str or list): Glycan in IUPAC sequence or list of contact tables. stereo (str, optional): 'alpha' or 'beta' to select stereochemistry. my_path (str, optional): Custom path to PDB folders. Returns: pd.DataFrame: Correlation matrix showing relationships between residue positions.
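For reference, a Pearson correlation matrix over per-residue series can be written in a few lines. The sketch assumes each residue is described by one numeric value per structure; what those values are in the library (for example distances taken from the contact tables) is not specified here, and the names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series_by_residue):
    """Nested dict of pairwise Pearson coefficients, keyed by residue label."""
    names = list(series_by_residue)
    return {
        a: {b: pearson(series_by_residue[a], series_by_residue[b]) for b in names}
        for a in names
    }
```

Values near 1 indicate residues whose values rise and fall together across structures, and values near -1 indicate opposite behavior.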

make_monosaccharide_contact_table(coord_df, threshold=10, mode='binary')

Creates a contact table at the monosaccharide level rather than atom level. Args: coord_df (pd.DataFrame): Dataframe of coordinates from extract_3D_coordinates. threshold (float): Maximum distance to consider. mode (str): 'binary' for binary contact matrix, 'distance' for distance values, 'both' to return both matrices. Returns: pd.DataFrame or list: Contact table(s) between monosaccharides.

map_data_to_graph(computed_df, interaction_dict, ring_conf_df=None, torsion_df=None)

Creates a NetworkX graph with node-level structural data. Args: computed_df (pd.DataFrame): DataFrame with computed monosaccharide properties. interaction_dict (dict): Dictionary of glycosidic linkages. ring_conf_df (pd.DataFrame, optional): Ring conformation data. torsion_df (pd.DataFrame, optional): Torsion angle data. Returns: nx.Graph: Graph with nodes/edges representing glycan structure and properties.

monosaccharide_preference_structure(df, monosaccharide, threshold, mode='default')

Finds preferred partners for a given monosaccharide. Args: df (pd.DataFrame): Monosaccharide distance table. monosaccharide (str): Target monosaccharide type. threshold (float): Minimum distance to exclude covalent bonds. mode (str): 'default', 'monolink', or 'monosaccharide' for different reporting formats. Returns: dict: Dictionary of preferred partners for the target monosaccharide.

multi_glycan_monosaccharide_preference_structure(glycan, monosaccharide, stereo=None, threshold=3.5, mode='default')

Visualizes monosaccharide partner preferences across multiple structures. Args: glycan (str): IUPAC glycan sequence. monosaccharide (str): Target monosaccharide type. stereo (str, optional): 'alpha' or 'beta' stereochemistry. threshold (float): Minimum distance to exclude covalent bonds. mode (str): 'default', 'monolink', or 'monosaccharide' for different reporting formats. Returns: None: Displays a bar plot of partner frequencies.

process_interactions(coordinates_df)

Extracts binary interactions between C1/C2 atoms and oxygen atoms from coordinates. Args: coordinates_df (pd.DataFrame): Coordinate dataframe from extract_3D_coordinates. Returns: pd.DataFrame: DataFrame with columns 'Atom', 'Column', and 'Value' showing interactions.

process_interactions_result(res, threshold, valid_fragments, n_glycan, furanose_end, d_end, is_protein_complex, glycan, df)

Process a single interaction result and return the annotation if valid.

remove_and_concatenate_labels(graph)

Processes a graph by removing odd-indexed nodes and concatenating labels. Args: graph (nx.Graph): NetworkX graph object. Returns: nx.Graph: Modified graph with simplified structure.

superimpose_glycans(ref_glycan, mobile_glycan, ref_residues=None, mobile_residues=None, main_chain_only=False, fast=False)

Superimpose two glycan structures and calculate RMSD. Args: ref_glycan (str): Reference glycan or PDB path. mobile_glycan (str): Mobile glycan or PDB path to superimpose. ref_residues (list, optional): Residue numbers for reference glycan. mobile_residues (list, optional): Residue numbers for mobile glycan. main_chain_only (bool): If True, uses only main chain atoms. fast (bool): If True, use the SVD-based Kabsch algorithm with k-d trees; if False (the default), use Nelder-Mead optimization. Returns: Dict containing:
- ref_coords: Original coordinates of reference
- transformed_coords: Aligned mobile coordinates
- rmsd: Root mean square deviation
- ref_labels: Atom labels from reference structure
- mobile_labels: Atom labels from mobile structure
- ref_conformer: PDB path of reference conformer
- mobile_conformer: PDB path of mobile conformer

trim_gcontact(G_contact)

Removes node 1 (-R terminal) from glycontact graph and connects its neighbors. Args: G_contact (nx.Graph): Glycontact graph. Returns: None: Modifies graph in-place.
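The in-place operation described above (drop one node, then join its former neighbors) can be sketched without NetworkX on a plain adjacency dictionary. The dict-of-sets representation and the function name `trim_node` are assumptions of this example; the library function works on an nx.Graph.

```python
from itertools import combinations


def trim_node(adjacency, node=1):
    """Remove `node` from an undirected adjacency dict and connect its neighbours.

    adjacency maps each node to the set of its neighbours and is modified in place.
    """
    neighbours = adjacency.pop(node, set())

    # Detach the removed node from everything that pointed at it.
    for neighbour in neighbours:
        adjacency[neighbour].discard(node)

    # Bridge the gap: every pair of former neighbours becomes directly linked.
    for a, b in combinations(sorted(neighbours), 2):
        adjacency[a].add(b)
        adjacency[b].add(a)
```

On a linear chain 0-1-2-3, removing node 1 leaves 0-2-3, so the remaining graph stays connected.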