API Documentation
ibd_dendrogram.make_distance_matrix module
- ibd_dendrogram.make_distance_matrix.check_kwargs(args_dict: dict[str, Any]) -> str | None
Function that checks that the necessary arguments are passed to the distance function
- Parameters:
args_dict (Dict[str, Any]) – dictionary that maps each argument name to the value passed to the distance function
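As a rough illustration of this kind of check, the sketch below validates a dictionary of keyword arguments against a set of required keys. The required key names, the function name `check_kwargs_sketch`, and the convention of returning an error message (or None when nothing is missing) are all assumptions made for the example; the module's own rules are not shown on this page.

```python
from typing import Any, Optional

# Hypothetical required keys; the real module may expect different names.
REQUIRED_KEYS = ("pairs_df", "min_cM")


def check_kwargs_sketch(args_dict: dict[str, Any]) -> Optional[str]:
    """Return a message naming the missing keys, or None if all are present."""
    missing = [key for key in REQUIRED_KEYS if key not in args_dict]
    if missing:
        return "missing required arguments: " + ", ".join(missing)
    return None


complete = check_kwargs_sketch({"pairs_df": object(), "min_cM": 5})
incomplete = check_kwargs_sketch({"pairs_df": object()})
```

Returning a message instead of raising lets the caller decide whether a missing argument is fatal.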
- ibd_dendrogram.make_distance_matrix.draw_dendrogram(clustering_results: ndarray[Any, dtype[ScalarType]], grids: List[str], output_name: Path | str, cases: List[str] | None = None, exclusions: List[str] = [], title: str | None = None, node_font_size: int = 10, save_fig: bool = False) -> tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes, Dict[str, Any]]
Function that will draw the dendrogram
- Parameters:
clustering_results (npt.NDArray) – numpy array that has the results from running the generate_dendrogram function
grids (list[str]) – list of ids to use as labels
output_name (Path | str) – Path object or string giving the location where the dendrogram will be saved.
cases (list[str] | None) – list of case ids. If the user doesn’t provide this value, then all of the labels on the dendrogram will be black. If the user provides a value, then the case labels will be red. Defaults to None
exclusions (List[str]) – list of individuals who are considered exclusions and are indicated as N/A or -1 in the phenotype file. Defaults to an empty list
title (str | None) – Optional title for the plot. If this is not provided then the plot will have no title
node_font_size (int) – Size for the font of the dendrogram leaf nodes
save_fig (bool) – whether or not to save the figure. Defaults to False.
- Returns:
returns a tuple with the matplotlib Figure, the matplotlib Axes object, and the dictionary returned by the sch.dendrogram command
- Return type:
tuple[plt.Figure, plt.Axes, dict[str, Any]]
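The label coloring described above can be sketched with scipy and matplotlib directly. This is a minimal illustration and not the module's implementation: the ids, the case list, and the hand-written linkage matrix are made up, and the approach of recoloring the x-axis tick labels after plotting is an assumption about how the red case labels might be produced.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster import hierarchy as sch

# Made-up example data: four ids, two of which are cases.
ids = ["id1", "id2", "id3", "id4"]
cases = ["id1", "id3"]

# Hand-written linkage matrix in the format scipy produces:
# each row is (cluster a, cluster b, merge distance, number of leaves).
clustering_results = np.array([
    [0.0, 1.0, 0.1, 2.0],
    [2.0, 3.0, 0.2, 2.0],
    [4.0, 5.0, 0.8, 4.0],
])

fig, ax = plt.subplots(figsize=(6, 4))
dendrogram_info = sch.dendrogram(
    clustering_results, labels=ids, ax=ax, leaf_font_size=10
)

# Color case labels red and leave every other label black.
for tick_label in ax.get_xmajorticklabels():
    tick_label.set_color("red" if tick_label.get_text() in cases else "black")

label_colors = {t.get_text(): t.get_color() for t in ax.get_xmajorticklabels()}
```

The dictionary returned by `sch.dendrogram` holds the leaf order under the `"ivl"` key, which is useful when labels need to be matched back to ids.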
- ibd_dendrogram.make_distance_matrix.generate_dendrogram(matrix: ndarray[Any, dtype[ScalarType]]) -> ndarray[Any, dtype[ScalarType]]
Function that will perform the hierarchical clustering algorithm
- Parameters:
matrix (Array) – distance matrix represented as a 2D numpy array. Distances should be calculated as 1/(IBD segment length)
- Returns:
returns the results of the clustering as a numpy array
- Return type:
Array
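For readers unfamiliar with the clustering output, here is a small sketch of hierarchical clustering on a square distance matrix using scipy. The toy distances and the choice of `method="average"` are assumptions for illustration; this page does not state which linkage method the module uses.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import squareform

# Made-up symmetric distance matrix for four individuals (zero diagonal).
distances = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# scipy's linkage expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(distances)

# "average" linkage is an assumption; other methods are available.
linkage_matrix = sch.linkage(condensed, method="average")

# One row per merge: (cluster a, cluster b, merge distance, number of leaves).
for row in linkage_matrix:
    print(row)
```

For n individuals the result has n - 1 rows and 4 columns, and an array of this shape is what a dendrogram-drawing step consumes.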
- ibd_dendrogram.make_distance_matrix.make_distance_matrix(pairs_df: pandas.core.frame.DataFrame, min_cM: int, distance_function: typing.Callable = <function _determine_distances>) -> tuple[list[str] | None, numpy.ndarray[typing.Any, numpy.dtype[ScalarType]]]
Function that will make the distance matrix
- Parameters:
pairs_df (pd.DataFrame) – dataframe that has the pairs from the pairs files. It should have at least three columns called ‘pair_1’, ‘pair_2’, and ‘length’
min_cM (float) – minimum centimorgan threshold. When a pair does not share a segment, this value is divided in half and used as the IBD segment length
- Returns:
returns a tuple where the first object is a list of individual ids, one for each row of the matrix. The second object is the distance matrix
- Return type:
tuple[list[str] | None, npt.NDArray]
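The distance rule described above (1/(IBD segment length), with min_cM divided in half standing in for the length when a pair shares no segment) can be sketched in plain Python. This is an illustration under stated assumptions and not the module's code: it takes tuples instead of a DataFrame, sorts the ids, sets the diagonal to zero, and keeps the last length seen if a pair appears more than once. The name `distance_matrix_sketch` is hypothetical.

```python
from typing import Iterable, List, Tuple


def distance_matrix_sketch(
    pairs: Iterable[Tuple[str, str, float]], min_cM: float
) -> Tuple[List[str], List[List[float]]]:
    """Build a symmetric distance matrix from (pair_1, pair_2, length) tuples."""
    pairs = list(pairs)
    ids = sorted({ind for p1, p2, _ in pairs for ind in (p1, p2)})
    index = {ind: pos for pos, ind in enumerate(ids)}

    # Pairs with no shared segment are treated as sharing min_cM / 2.
    default_distance = 1.0 / (min_cM / 2)
    size = len(ids)
    matrix = [
        [0.0 if row == col else default_distance for col in range(size)]
        for row in range(size)
    ]

    # Pairs that share a segment get distance 1 / length, filled symmetrically.
    for p1, p2, length in pairs:
        distance = 1.0 / length
        matrix[index[p1]][index[p2]] = distance
        matrix[index[p2]][index[p1]] = distance
    return ids, matrix


example_ids, example_matrix = distance_matrix_sketch(
    [("a", "b", 10.0), ("b", "c", 20.0)], min_cM=5
)
```

In this example "a" and "c" never appear together, so their distance falls back to 1/(5/2), which is larger than the distance for any pair that shares a segment of at least min_cM.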
- ibd_dendrogram.make_distance_matrix.record_matrix(output: Path | str, matrix, pair_list: List[str]) -> None
Function that will write the distance matrix to a file
- Parameters:
output (Path | str) – filepath to write the output to
matrix (array) – array that has the distance matrix for each individual
pair_list (List[str]) – list of ids, one for each row of the matrix
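A minimal sketch of writing a labeled matrix to disk is shown below. The output layout (tab-delimited, a header row of ids, and the row id in the first column) is an assumption, since this page does not describe the file format, and `record_matrix_sketch` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def record_matrix_sketch(
    output: Union[Path, str],
    matrix: Sequence[Sequence[float]],
    pair_list: List[str],
) -> None:
    """Write the matrix as tab-delimited text with ids as row and column labels."""
    with open(output, "w", encoding="utf-8") as handle:
        handle.write("\t".join(["id"] + list(pair_list)) + "\n")
        for row_id, row in zip(pair_list, matrix):
            handle.write("\t".join([row_id] + [str(value) for value in row]) + "\n")


# Write a two-by-two example to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "distance_matrix.txt"
    record_matrix_sketch(out_path, [[0.0, 0.1], [0.1, 0.0]], ["id1", "id2"])
    written = out_path.read_text(encoding="utf-8")
```

Keeping the ids in both the header and the first column means the file can be read back without relying on row order.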