gwide Reference manual
ProfileAnalyser
Set of functions designed for the analysis of genomic profiles
-
gwide.profileAnalyser.compare1toRef(dataset=Series([], dtype: float64), ranges='mm', heatmap=False, relative=False, reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv') Takes a Series and compares it with the reference DataFrame.
- dataset: given Series
- ranges: 'mm' (min-max) or 'qq' (q1-q3)
- heatmap: if False, the result is a DataFrame with rae_min, rae_max, ear_min, ear_max (reference_above_experiment minimum, etc.); if True, the result is a Series of differences for plotting a heat map
- relative: only for heatmap; recalculates differences according to the peak size. Warning: negative values are in the range -1 to 0, but positive values range from 0 to above 1
- reference: path to reference plot
Returns: DataFrame (heatmap=False) or Series (heatmap=True)
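The comparison described above can be illustrated with plain pandas. This is a minimal sketch and not gwide's implementation: the function names, the output column names and the choice to scale relative differences by the reference median are all assumptions made for illustration. The scaling was picked because it reproduces the asymmetry mentioned in the warning (negative values bounded by -1, positive values unbounded).

```python
import pandas as pd

def compare_to_reference(dataset, reference, ranges="mm"):
    """Hypothetical helper: distance of `dataset` outside the reference range (0 inside it)."""
    # 'mm' compares against min-max, 'qq' against q1-q3
    low_col, high_col = {"mm": ("min", "max"), "qq": ("q1", "q3")}[ranges]
    above = (dataset - reference[high_col]).clip(lower=0)
    below = (reference[low_col] - dataset).clip(lower=0)
    return pd.DataFrame({"above_reference": above, "below_reference": below})

def relative_difference(dataset, reference):
    """Hypothetical helper: difference to the reference median, scaled by that median (assumed)."""
    return (dataset - reference["median"]) / reference["median"]

reference = pd.DataFrame({
    "min": [1.0, 2.0, 3.0],
    "median": [2.0, 4.0, 5.0],
    "max": [3.0, 6.0, 7.0],
})
dataset = pd.Series([0.5, 4.0, 12.5])

outside = compare_to_reference(dataset, reference)
relative = relative_difference(dataset, reference)
print(outside)
print(relative)
```

With non-negative data, a position where the experiment is lower than the reference cannot drop below -1 on the relative scale, while a position where it is higher has no upper bound.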
-
gwide.profileAnalyser.compareMoretoRef(dataset=Empty DataFrame Columns: [] Index: [], ranges='mm', reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv') Takes a DataFrame created by filter_df and compares it with the reference DataFrame.
-
gwide.profileAnalyser.filter_df(input_df=Empty DataFrame Columns: [] Index: [], let_in=[''], let_out=['wont_find_this_string'], smooth=True, window=10) Returns a DataFrame for the chosen experiments.
- input_df: input DataFrame
- let_in: list of words that characterize the experiments
- let_out: list of words that disqualify experiments (may remain predefined)
- smooth: apply a 10 nt smoothing window
Returns: DataFrame with ‘mean’, ‘median’, ‘min’, ‘max’, and quartiles if there are more than 2 experiments
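The summary table that filter_df is described as returning can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `summarise_profiles` is a hypothetical name, the smoothing is assumed to be a centred rolling mean, and the columns are assumed to be selected already.

```python
import pandas as pd

def summarise_profiles(df, smooth=True, window=10):
    """Hypothetical helper: per-position summary across experiments (one column each)."""
    if smooth:
        # Assumed smoothing: centred rolling mean over `window` positions
        df = df.rolling(window, center=True, min_periods=1).mean()
    summary = pd.DataFrame({
        "mean": df.mean(axis=1),
        "median": df.median(axis=1),
        "min": df.min(axis=1),
        "max": df.max(axis=1),
    })
    # Quartiles are only reported when there are more than 2 experiments
    if df.shape[1] > 2:
        summary["q1"] = df.quantile(0.25, axis=1)
        summary["q3"] = df.quantile(0.75, axis=1)
    return summary

experiments = pd.DataFrame({
    "exp_a": [1.0, 2.0, 3.0, 4.0],
    "exp_b": [2.0, 4.0, 6.0, 8.0],
    "exp_c": [3.0, 6.0, 9.0, 12.0],
})
summary = summarise_profiles(experiments, smooth=False)
summary_two = summarise_profiles(experiments[["exp_a", "exp_b"]], smooth=False)
print(summary)
```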
-
gwide.profileAnalyser.plot_ChIP(df_sense=Empty DataFrame Columns: [] Index: [], df_anti=Empty DataFrame Columns: [] Index: [], title=None, start=None, stop=None, figsize=(15, 6), ylim=(-0.001, 0.001), s_color='red', as_color='blue', h_lines=[], lc='black', dpi=150, csv_path='/home/tturowski/notebooks/RDN37_reference_collapsed.csv', color='green') Creates a plot similar to a box plot: median, 2nd and 3rd quartiles, and min-max range
- csv_path: str()
- Path to CRAC or other reference file
title: str()
start: int()
stop: int()
figsize: tuple()
- ylim: tuple()
- OY axes lim. Default = (-0.001, 0.001)
- color: str()
- plot color
- h_lines: list()
- Optional. list() of horizontal lines
- lc: str()
- color of horizontal lines
Returns: None
-
gwide.profileAnalyser.plot_as_box_plot(df=Empty DataFrame Columns: [] Index: [], title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), dpi=150, color='green', h_lines=[], lc='red') Plots a figure similar to a box plot: median, 2nd and 3rd quartiles, and min-max range
- df : DataFrame
- DataFrame containing the following columns:
`['position'] ['nucleotide'] ['mean'] ['median'] ['std']` optionally `['q1'] ['q3'] ['max'] ['min']`
title : str
start : int
stop : int
figsize : (float, float)
- ylim : (float, float)
- OY axes lim. Default = (None,0.01)
- color : str
- line color
- h_lines : list
- optional: list of horizontal lines
- lc : str
- optional: color of horizontal lines
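A "box plot along the sequence" of this kind can be approximated with matplotlib's `fill_between`. This is a rough sketch and not gwide's plotting code: `plot_profile` is a hypothetical name, and the optional q1, q3, min and max columns are assumed to be present.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_profile(df, title=None, figsize=(15, 6), ylim=(None, 0.01),
                 color="green", h_lines=(), lc="red"):
    """Hypothetical helper: median line with shaded q1-q3 and min-max bands."""
    fig, ax = plt.subplots(figsize=figsize)
    x = df["position"]
    ax.fill_between(x, df["min"], df["max"], color=color, alpha=0.15, label="min-max")
    ax.fill_between(x, df["q1"], df["q3"], color=color, alpha=0.35, label="q1-q3")
    ax.plot(x, df["median"], color=color, label="median")
    for y in h_lines:
        ax.axhline(y, color=lc, linestyle="--")  # optional horizontal guide lines
    ax.set_ylim(*ylim)  # a None bound leaves that side of the axis unchanged
    if title:
        ax.set_title(title)
    ax.legend()
    return fig, ax

profile = pd.DataFrame({
    "position": [1, 2, 3, 4, 5],
    "min": [0.000, 0.001, 0.002, 0.001, 0.000],
    "q1": [0.001, 0.002, 0.003, 0.002, 0.001],
    "median": [0.002, 0.003, 0.004, 0.003, 0.002],
    "q3": [0.003, 0.004, 0.005, 0.004, 0.003],
    "max": [0.004, 0.005, 0.006, 0.005, 0.004],
})
fig, ax = plot_profile(profile, title="example profile", h_lines=[0.005])
```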
-
gwide.profileAnalyser.plot_diff(dataset=Empty DataFrame Columns: [] Index: [], ranges='mm', label='', start=None, stop=None, plot_medians=True, plot_ranges=True, figsize=(15, 6), ylim=(None, 0.01), h_lines=[], reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv') Plots the given dataset and the reference; differences are marked.
- dataset: dataset from filter_df
- ranges: 'mm' (min-max) or 'qq' (q1-q3)
- label: label of the given dataset
- start: start
- stop: stop
- plot_medians: plot medians
- plot_ranges: plot ranges
- figsize: figsize tuple. Default = (15, 6)
- ylim: ylim tuple. Default = (None, 0.01)
- h_lines: list of horizontal lines
- reference: path to reference plot
Returns: plot with marked differences
-
gwide.profileAnalyser.plot_from_csv(csv_path='', title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), color='green', h_lines=[], lc='red', dpi=75) Plots a figure similar to a box plot: median, 2nd and 3rd quartiles, and min-max range
- csv_path: str
- path to csv file
title: str
start: int
stop: int
- figsize: (float, float)
- Default = (15,6)
- ylim : (float, float)
- OY axes lim. Default = (None,0.01)
- color : str
- line color
- h_lines : list
- optional: list of horizontal lines
- lc : str
- optional: color of horizontal lines
-
gwide.profileAnalyser.plot_heatmap(df=Empty DataFrame Columns: [] Index: [], title='Heat map of differences between dataset and reference plot for RDN37-1', vmin=None, vmax=None, figsize=(20, 10)) Plots a heat map of differences from the DataFrame generated by compare1toRef(dataset, heatmap=True).
-
gwide.profileAnalyser.plot_to_compare(df=Empty DataFrame Columns: [] Index: [], df2=None, color1='black', color2='darkred', label='', title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), h_lines=[], dpi=75, csv_path='/home/tturowski/notebooks/RDN37_csv_path_collapsed.csv') Plots the given dataset and a reference dataset from a csv file.
- df : DataFrame
- DataFrame (dataset) containing the following columns:
`['position'] ['nucleotide'] ['mean'] ['median'] ['std']` optionally `['q1'] ['q3'] ['max'] ['min']`
- df2 : DataFrame
- Optional Dataframe (dataset2). Default = None
- color1 : str
- Default color for dataset1 = ‘black’
- color2 : str
- Default color for dataset2 = ‘darkred’
label : str
title : str
start : int
stop : int
figsize : (float, float)
- ylim : (float, float)
- OY axes lim. Default = (None,0.01)
- h_lines : list
- optional: list of horizontal lines
- dpi : int
- Default dpi=75
- csv_path : str
- Default = '/home/tturowski/notebooks/RDN37_csv_path_collapsed.csv'
-
gwide.profileAnalyser.save_csv(data_ref=Empty DataFrame Columns: [] Index: [], datasets=Empty DataFrame Columns: [] Index: [], path=None) Takes reference and data DataFrames
- data_ref : DataFrame
- DataFrame with ['position'] and ['nucleotide'] columns
- datasets : DataFrame
- DataFrame containing experimental data only
- path : str
- Optional: path to save csv. Default: None
Returns: reference DataFrame
Methods
Common methods
-
gwide.methods.calGC(dataset=Empty DataFrame Columns: [] Index: [], calFor=['G', 'C']) Returns the GC content of a given dataset.
- dataset: pandas DataFrame with a “nucleotide” column
Returns: fraction of GC content
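The GC fraction of a DataFrame with a "nucleotide" column can be computed in one line with pandas. This is a sketch for illustration only: `gc_fraction` is a hypothetical name and is not claimed to match how calGC is implemented.

```python
import pandas as pd

def gc_fraction(dataset, cal_for=("G", "C")):
    """Hypothetical helper: fraction of rows whose nucleotide is in `cal_for`."""
    # isin() gives a boolean Series; its mean is the fraction of True values
    return dataset["nucleotide"].isin(list(cal_for)).mean()

sequence = pd.DataFrame({"nucleotide": list("GATTACAGGC")})
gc = gc_fraction(sequence)
a_only = gc_fraction(sequence, cal_for=("A",))
print(gc, a_only)
```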
-
gwide.methods.calculateFDR(data=Series([], dtype: float64), iterations=100, target_FDR=0.05) Calculates the False Discovery Rate (FDR) for a given dataset.
- data : pd.Series()
- iterations : int()
- number of iterations. Default = 100
- target_FDR : float()
- Default = 0.05
Returns: Series()
-
gwide.methods.cleanNames(df=Empty DataFrame Columns: [] Index: [], additional_tags=[]) Cleans up problems with names, if any exist
-
gwide.methods.define_experiments(paths_in, whole_name=False) Parses file names and extracts the experiment name from each.
- whole_name: by default, the script takes the first ‘a_b_c’ from a_b_c_hittable_reads.txt as the experiment name
Returns: list of experiment names and list of paths
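The default naming rule (first ‘a_b_c’ of the file name) can be sketched with the standard library. The function names are hypothetical, and the behaviour for `whole_name=True` (keep the full file name without its extension) is an assumption, since the description above does not specify it.

```python
import os

def experiment_name(path, whole_name=False):
    """Hypothetical helper: derive an experiment name from a file path."""
    base = os.path.basename(path)
    if whole_name:
        # Assumption: keep the whole file name, dropping only the extension
        return os.path.splitext(base)[0]
    # Default: the first three underscore-separated fields, i.e. 'a_b_c'
    return "_".join(base.split("_")[:3])

def define_experiments_sketch(paths_in, whole_name=False):
    """Return (experiment names, paths), in the same order as the input."""
    paths = list(paths_in)
    names = [experiment_name(p, whole_name) for p in paths]
    return names, paths

names, paths = define_experiments_sketch(["/data/a_b_c_hittable_reads.txt"])
full_names, _ = define_experiments_sketch(["/data/a_b_c_hittable_reads.txt"], whole_name=True)
print(names, full_names)
```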
-
gwide.methods.expNameParser(name, additional_tags=[], order='b_d_e_p') Handles the experiment name; recognizes AB123456 as the experiment date, and BY4741, HTP or a given string as the bait protein.
- name:
- additional_tags: list of tags
- output: default ‘root’; prints other elements when ‘all’
- order: default ‘b_d_e_p’; b: bait, d: details, e: experiment, p: prefix
Returns: reordered name
-
gwide.methods.filterExp(datasets, let_in=[''], let_out=['wont_find_this_string']) Works on a pd.DataFrame() or dict(). Returns the object with filtered columns/keys.
- datasets : DataFrame() or dict()
- DataFrame() or dict() with exp name as a key
- let_in : list()
- list() with elements of name to filter in
- let_out : list()
- list() with elements of name to filter out
Returns: DataFrame() or dict()
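Name-based filtering of this kind can be sketched as below. The function name is hypothetical, and one detail is an assumption: a name is kept only if it contains every `let_in` element and none of the `let_out` elements. Because the empty string is a substring of every name, the default `let_in` lets everything through.

```python
import pandas as pd

def filter_experiments(datasets, let_in=("",), let_out=("wont_find_this_string",)):
    """Hypothetical helper: keep columns (DataFrame) or keys (dict) by name."""
    def wanted(name):
        # Assumption: all let_in words must match, no let_out word may match
        return all(w in name for w in let_in) and not any(w in name for w in let_out)

    if isinstance(datasets, dict):
        return {key: value for key, value in datasets.items() if wanted(key)}
    return datasets[[col for col in datasets.columns if wanted(col)]]

as_dict = {"Rpa190_wt_1": [1, 2], "Rpa190_mut_1": [3, 4], "Rpb1_wt_1": [5, 6]}
as_frame = pd.DataFrame(as_dict)

kept_dict = filter_experiments(as_dict, let_in=["Rpa190"], let_out=["mut"])
kept_frame = filter_experiments(as_frame, let_in=["wt"])
print(list(kept_dict), list(kept_frame.columns))
```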
-
gwide.methods.findPeaks(s1=Series([], dtype: float64), window=1, order=20) Finds local extrema using the SciPy argrelextrema function.
- s1 : Series()
- data in which to localize peaks
- window : int()
- To smooth data before peak-calling. Default = 1 (no smoothing)
- order : int()
- argrelextrema order parameter. Default = 20
Returns: list() of peaks
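Peak calling with `scipy.signal.argrelextrema` might look like the following sketch. It is not the library's code: `find_peaks_sketch` is a hypothetical name, smoothing is assumed to be a centred rolling mean, and only local maxima (`np.greater`) are reported.

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

def find_peaks_sketch(s1, window=1, order=20):
    """Hypothetical helper: index labels of local maxima in `s1`."""
    # Assumed smoothing step; window=1 leaves the data unchanged
    smoothed = s1.rolling(window, center=True, min_periods=1).mean()
    # A point is a maximum if it is greater than `order` neighbours on each side
    positions = argrelextrema(smoothed.to_numpy(), np.greater, order=order)[0]
    return s1.index[positions].tolist()

signal = pd.Series(np.zeros(100))
signal.iloc[30] = 5.0
signal.iloc[70] = 3.0
peaks = find_peaks_sketch(signal)
print(peaks)
```

A larger `order` suppresses minor maxima that sit close to a stronger peak, because each candidate must exceed more neighbours.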
-
gwide.methods.getRefFile(file_from_options, file_type) Resolves the source of the GTF, FASTA or TAB path, in order: (1) from the options parser, (2) from ~/bin/default.aml, or (3) from the environment variable $xxx_PATH.
Parameters:
- file_from_options: file path from the options parser; can be an empty string
- file_type: ‘GTF’, ‘FASTA’ or ‘TAB’
Returns: path to the file
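The three-step lookup order can be sketched as a simple fallback chain. Several things here are assumptions made for illustration: the function name, passing the configuration as an already-parsed dict (the format of default.aml is not described here), and reading `xxx` in `$xxx_PATH` as the file type, e.g. `GTF_PATH`.

```python
import os

def resolve_ref_file(file_from_options, file_type, config=None, environ=None):
    """Hypothetical helper: first non-empty path from options, config, environment."""
    config = {} if config is None else config
    environ = os.environ if environ is None else environ
    candidates = (
        file_from_options,                    # (1) options parser, may be ''
        config.get(file_type),                # (2) parsed config file (assumed to be a dict)
        environ.get(file_type + "_PATH"),     # (3) environment variable (name assumed)
    )
    for candidate in candidates:
        if candidate:
            return candidate
    raise FileNotFoundError("no %s file configured" % file_type)

config = {"GTF": "/refs/from_config.gtf"}
environ = {"GTF_PATH": "/refs/from_env.gtf", "FASTA_PATH": "/refs/from_env.fasta"}

from_options = resolve_ref_file("/refs/cli.gtf", "GTF", config, environ)
from_config = resolve_ref_file("", "GTF", config, environ)
from_env = resolve_ref_file("", "FASTA", config, environ)
print(from_options, from_config, from_env)
```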
BigWigTools
Tools to analyse BigWig files