gwide Reference manual

ProfileAnalyser

Set of functions designed for the analysis of genomic profiles

gwide.profileAnalyser.compare1toRef(dataset=pd.Series(), ranges='mm', heatmap=False, relative=False, reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv')

Takes a Series and compares it with the reference DataFrame.

dataset : Series: given series
ranges : str: 'mm' for min-max or 'qq' for q1-q3
heatmap : bool: if False, returns a DataFrame with rae_min, rae_max, ear_min and ear_max columns (rae: reference above experiment; ear: experiment above reference); if True, returns a Series of differences for plotting a heatmap
relative : bool: heatmap only; recalculates differences relative to the peak size. Warning: negative values range from -1 to 0, but positive values range from 0 to values higher than 1
reference : str: path to the reference profile

Returns: DataFrame (heatmap=False) or Series (heatmap=True)
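The asymmetry described in the `relative` warning can be seen in a short pandas sketch. It is not gwide's implementation: it assumes the reference is supplied as per-position `ref_min`/`ref_max` Series aligned with the experiment (the `ranges='mm'` case), that a difference is the distance outside that band, and that `relative=True` divides each difference by the reference bound it was measured against. `range_differences` is a hypothetical name.

```python
import pandas as pd


def range_differences(experiment: pd.Series, ref_min: pd.Series,
                      ref_max: pd.Series, relative: bool = False) -> pd.Series:
    """Signed distance of `experiment` outside the reference [min, max] band.

    Positive where the experiment is above the reference maximum,
    negative where it is below the reference minimum, zero inside the band.
    """
    above = (experiment - ref_max).clip(lower=0)  # > 0 only where experiment > max
    below = (experiment - ref_min).clip(upper=0)  # < 0 only where experiment < min
    if relative:
        # Assumed scaling: divide by the bound each difference was measured
        # against (reference bounds must be > 0 to avoid division by zero).
        above = above / ref_max
        below = below / ref_min
    return above + below
```

For non-negative data and positive bounds, a value below the minimum can fall at most to -1 after scaling, while a value above the maximum is unbounded.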

gwide.profileAnalyser.compareMoretoRef(dataset=pd.DataFrame(), ranges='mm', reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv')

Takes a DataFrame created by filter_df and compares it with the reference DataFrame.

gwide.profileAnalyser.filter_df(input_df=pd.DataFrame(), let_in=[''], let_out=['wont_find_this_string'], smooth=True, window=10)

Returns a DataFrame for the chosen experiments.

input_df : DataFrame: input DataFrame
let_in : list: list of words that characterize the experiments to keep
let_out : list: list of words that disqualify experiments (may remain predefined)
smooth : bool: apply a smoothing window
window : int: smoothing window size in nt. Default = 10

Returns: DataFrame with 'mean', 'median', 'min' and 'max' columns, plus quartiles if there are more than 2 experiments
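A rough pandas sketch of the behaviour these parameters describe, under stated assumptions: a column is kept when its name contains every `let_in` word and no `let_out` word, smoothing is a centred rolling mean over `window` positions, and the summary is computed across experiments at each position. The matching rule, the rolling-mean choice and the name `filter_and_summarise` are assumptions, not gwide's code.

```python
import pandas as pd


def filter_and_summarise(input_df, let_in=("",), let_out=("wont_find_this_string",),
                         smooth=True, window=10):
    # Assumed rule: keep columns containing all let_in words and none of let_out.
    keep = [c for c in input_df.columns
            if all(w in c for w in let_in) and not any(w in c for w in let_out)]
    data = input_df[keep]
    if smooth:
        # Assumed smoothing: centred moving average; min_periods=1 avoids NaN at the edges.
        data = data.rolling(window, center=True, min_periods=1).mean()
    # Per-position statistics across the kept experiments (one per column).
    summary = pd.DataFrame({
        "mean": data.mean(axis=1),
        "median": data.median(axis=1),
        "min": data.min(axis=1),
        "max": data.max(axis=1),
    })
    if data.shape[1] > 2:
        # Quartiles only when more than 2 experiments are present.
        summary["q1"] = data.quantile(0.25, axis=1)
        summary["q3"] = data.quantile(0.75, axis=1)
    return summary
```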

gwide.profileAnalyser.plot_ChIP(df_sense=pd.DataFrame(), df_anti=pd.DataFrame(), title=None, start=None, stop=None, figsize=(15, 6), ylim=(-0.001, 0.001), s_color='red', as_color='blue', h_lines=[], lc='black', dpi=150, csv_path='/home/tturowski/notebooks/RDN37_reference_collapsed.csv', color='green')

Creates a plot similar to a box plot: median, interquartile range (q1 to q3) and min-max range.

csv_path : str: path to the CRAC or other reference file
title : str
start : int
stop : int
figsize : (float, float)
ylim : (float, float): OY axis limits. Default = (-0.001, 0.001)
color : str: plot color
h_lines : list: optional: list of horizontal lines
lc : str: color of horizontal lines

Returns: None

gwide.profileAnalyser.plot_as_box_plot(df=pd.DataFrame(), title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), dpi=150, color='green', h_lines=[], lc='red')

Plots a figure similar to a box plot: median, interquartile range (q1 to q3) and min-max range.

df : DataFrame: DataFrame containing the following columns: `['position'] ['nucleotide'] ['mean'] ['median'] ['std']`, optionally `['q1'] ['q3'] ['max'] ['min']`
title : str
start : int
stop : int
figsize : (float, float)
ylim : (float, float): OY axis limits. Default = (None, 0.01)
color : str: line color
h_lines : list: optional: list of horizontal lines
lc : str: optional: color of horizontal lines
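A minimal matplotlib sketch of such a box-plot-like profile, assuming `df` has the columns listed above and sorted 'position' values used as the x axis. It shades the min-max and q1-q3 bands and draws the median and any horizontal lines; `box_like_plot` is a hypothetical name and the styling (alpha values, dashed lines) is an arbitrary choice, not gwide's.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def box_like_plot(df, title=None, start=None, stop=None, figsize=(15, 6),
                  ylim=(None, 0.01), color="green", h_lines=(), lc="red"):
    # Restrict to the requested positions (label-based slice; None means open-ended).
    part = df.set_index("position").loc[start:stop]
    fig, ax = plt.subplots(figsize=figsize)
    x = part.index
    if {"min", "max"} <= set(part.columns):
        ax.fill_between(x, part["min"], part["max"], color=color, alpha=0.2, label="min-max")
    if {"q1", "q3"} <= set(part.columns):
        ax.fill_between(x, part["q1"], part["q3"], color=color, alpha=0.4, label="q1-q3")
    ax.plot(x, part["median"], color="black", label="median")
    for y in h_lines:
        ax.axhline(y, color=lc, linestyle="--")
    ax.set_ylim(*ylim)  # a None bound keeps the automatic limit on that side
    if title:
        ax.set_title(title)
    ax.legend()
    return fig, ax
```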

gwide.profileAnalyser.plot_diff(dataset=pd.DataFrame(), ranges='mm', label='', start=None, stop=None, plot_medians=True, plot_ranges=True, figsize=(15, 6), ylim=(None, 0.01), h_lines=[], reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv')

Plots the given dataset and the reference; differences are marked.

dataset : DataFrame: dataset from filter_df
ranges : str: 'mm' for min-max or 'qq' for q1-q3
label : str: label of the given dataset
start : int: start position
stop : int: stop position
plot_medians : bool: plot medians
plot_ranges : bool: plot ranges
figsize : (float, float): Default = (15, 6)
ylim : (float, float): Default = (None, 0.01)
h_lines : list: list of horizontal lines
reference : str: path to the reference profile

Returns: plot with marked differences

gwide.profileAnalyser.plot_from_csv(csv_path='', title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), color='green', h_lines=[], lc='red', dpi=75)

Plots a figure similar to a box plot: median, interquartile range (q1 to q3) and min-max range.

csv_path : str: path to the CSV file
title : str
start : int
stop : int
figsize : (float, float): Default = (15, 6)
ylim : (float, float): OY axis limits. Default = (None, 0.01)
color : str: line color
h_lines : list: optional: list of horizontal lines
lc : str: optional: color of horizontal lines

gwide.profileAnalyser.plot_heatmap(df=pd.DataFrame(), title='Heat map of differences between dataset and reference plot for RDN37-1', vmin=None, vmax=None, figsize=(20, 10))

Plots a heat map of differences from the DataFrame generated by the compare1toRef(dataset, heatmap=True) function.

gwide.profileAnalyser.plot_to_compare(df=pd.DataFrame(), df2=None, color1='black', color2='darkred', label='', title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), h_lines=[], dpi=75, csv_path='/home/tturowski/notebooks/RDN37_csv_path_collapsed.csv')

Plots the given dataset and the reference dataset from a CSV file.

df : DataFrame: DataFrame (dataset) containing the following columns: `['position'] ['nucleotide'] ['mean'] ['median'] ['std']`, optionally `['q1'] ['q3'] ['max'] ['min']`
df2 : DataFrame: optional second DataFrame (dataset2). Default = None
color1 : str: color for dataset1. Default = 'black'
color2 : str: color for dataset2. Default = 'darkred'
label : str
title : str
start : int
stop : int
figsize : (float, float)
ylim : (float, float): OY axis limits. Default = (None, 0.01)
h_lines : list: optional: list of horizontal lines
dpi : int: Default = 75
csv_path : str: Default = '/home/tturowski/notebooks/RDN37_csv_path_collapsed.csv'

gwide.profileAnalyser.save_csv(data_ref=pd.DataFrame(), datasets=pd.DataFrame(), path=None)

Takes the reference and data DataFrames.

data_ref : DataFrame: DataFrame with ['position'] and ['nucleotide'] columns
datasets : DataFrame: DataFrame containing experimental data only
path : str: optional: path to save the CSV file. Default = None

Returns: reference DataFrame
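One way to assemble a reference table in the column layout that plot_as_box_plot expects, sketched in pandas. It assumes `data_ref` and `datasets` share the same row index, that `datasets` holds one experiment per column, and that the returned reference is `data_ref` plus per-position statistics; `build_reference` is a hypothetical name, not the gwide function.

```python
import pandas as pd


def build_reference(data_ref, datasets, path=None):
    # Per-position statistics across experiments (one experiment per column).
    stats = pd.DataFrame({
        "mean": datasets.mean(axis=1),
        "median": datasets.median(axis=1),
        "std": datasets.std(axis=1),
        "q1": datasets.quantile(0.25, axis=1),
        "q3": datasets.quantile(0.75, axis=1),
        "min": datasets.min(axis=1),
        "max": datasets.max(axis=1),
    })
    # Assumed layout: position/nucleotide first, then the statistics (rows aligned by index).
    reference = pd.concat([data_ref[["position", "nucleotide"]], stats], axis=1)
    if path:
        reference.to_csv(path)
    return reference
```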

Methods

Common methods

gwide.methods.calGC(dataset=pd.DataFrame(), calFor=['G', 'C'])

Returns the GC content of a given dataset.

dataset : DataFrame: pandas DataFrame with a "nucleotide" column
calFor : list: nucleotides to count. Default = ['G', 'C']

Returns: fraction of GC content
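A minimal sketch of the GC fraction computation, assuming one nucleotide letter per row in the "nucleotide" column; the upper-casing step and the name `gc_fraction` are assumptions.

```python
import pandas as pd


def gc_fraction(dataset, calFor=("G", "C")):
    # Assumed normalisation: compare case-insensitively.
    nucleotides = dataset["nucleotide"].str.upper()
    # Mean of a boolean Series = fraction of positions whose nucleotide is in calFor.
    return nucleotides.isin(calFor).mean()
```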

gwide.methods.calculateFDR(data=pd.Series(), iterations=100, target_FDR=0.05)

Calculates the False Discovery Rate (FDR) for a given dataset.

data : Series
iterations : int: number of iterations. Default = 100
target_FDR : float: Default = 0.05

Returns: Series

gwide.methods.cleanNames(df=pd.DataFrame(), additional_tags=[])

Fixes some problems with names, if they exist.

gwide.methods.define_experiments(paths_in, whole_name=False)

Parses file names and extracts experiment names from them.

whole_name : bool: by default, the script takes the leading 'a_b_c' of a_b_c_hittable_reads.txt as the experiment name

Returns: list of experiment names and list of paths
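A stdlib sketch of the default naming rule (first three underscore-separated fields of the file name). The `whole_name=True` branch, which here just strips a `_hittable_reads.txt` suffix, and the helper name `experiment_names` with its `suffix` parameter are assumptions.

```python
import os


def experiment_names(paths_in, whole_name=False, suffix="_hittable_reads.txt"):
    experiments, paths = [], []
    for path in paths_in:
        path = path.strip()
        file_name = os.path.basename(path)
        if whole_name:
            # Assumed behaviour: keep the whole name minus the known suffix.
            name = file_name[:-len(suffix)] if file_name.endswith(suffix) else file_name
        else:
            # Default rule: 'a_b_c' from a_b_c_hittable_reads.txt.
            name = "_".join(file_name.split("_")[:3])
        experiments.append(name)
        paths.append(path)
    return experiments, paths
```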

gwide.methods.expNameParser(name, additional_tags=[], order='b_d_e_p')

Handles an experiment name: recognizes AB123456 as the experiment date, and BY4741, HTP or a given string as the bait protein.

name : str: experiment name
additional_tags : list: list of tags
output : str: Default = 'root'; prints other elements when 'all'
order : str: Default = 'b_d_e_p' (b: bait; d: details; e: experiment; p: prefix)

Returns: reordered name

gwide.methods.filterExp(datasets, let_in=[''], let_out=['wont_find_this_string'])

Filters a pd.DataFrame() or dict() and returns an object with the filtered columns/keys.

datasets : DataFrame or dict: DataFrame or dict with experiment names as keys
let_in : list: elements of the name to filter in
let_out : list: elements of the name to filter out

Returns: DataFrame or dict
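A sketch of the filtering rule for both input types, assuming a name is kept when it contains every `let_in` element and none of the `let_out` elements (the default `['']` matches every name, since the empty string is a substring of any string). `filter_experiments` is a hypothetical name.

```python
import pandas as pd


def filter_experiments(datasets, let_in=("",), let_out=("wont_find_this_string",)):
    def wanted(name):
        # Assumed rule: all let_in elements present, no let_out element present.
        return all(w in name for w in let_in) and not any(w in name for w in let_out)

    if isinstance(datasets, pd.DataFrame):
        return datasets[[c for c in datasets.columns if wanted(c)]]
    return {k: v for k, v in datasets.items() if wanted(k)}
```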

gwide.methods.findPeaks(s1=pd.Series(), window=1, order=20)

Finds local extrema using the SciPy argrelextrema function.

s1 : Series: data in which to localize peaks
window : int: window used to smooth the data before peak calling. Default = 1 (no smoothing)
order : int: argrelextrema order parameter. Default = 20

Returns: list of peaks
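A sketch of peak calling with `scipy.signal.argrelextrema`, assuming peaks are local maxima (comparator `np.greater`) and that smoothing is a centred rolling mean applied only when `window > 1`; `find_peaks_sketch` is a hypothetical name. Positions at the very ends of the series are never reported, because the default `mode='clip'` compares them with themselves.

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema


def find_peaks_sketch(s1, window=1, order=20):
    # Assumed smoothing: centred rolling mean; window=1 leaves the data unchanged.
    smoothed = s1.rolling(window, center=True, min_periods=1).mean() if window > 1 else s1
    # A point is a peak if it is strictly greater than `order` neighbours on each side.
    # argrelextrema returns a tuple of index arrays, one per axis.
    (idx,) = argrelextrema(smoothed.to_numpy(), np.greater, order=order)
    # Map positional indices back to the Series' own index labels.
    return smoothed.index[idx].tolist()
```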

gwide.methods.getRefFile(file_from_options, file_type)

Sorts out the source of the GTF, FASTA or TAB path, in this order: (1) from the options parser, (2) from ~/bin/default.aml, or (3) from the environment variable $xxx_PATH.

file_from_options : str: file path from the options parser; can be an empty string
file_type : str: 'GTF', 'FASTA' or 'TAB'

Returns: path to the file
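A stdlib sketch of the three-step lookup. The `KEY: value` line format assumed for default.aml, the reading of `$xxx_PATH` as `<file_type>_PATH` (e.g. GTF_PATH), the `config_path` parameter and the name `resolve_ref_file` are all assumptions; the sketch returns None when no source provides a path.

```python
import os


def resolve_ref_file(file_from_options, file_type, config_path="~/bin/default.aml"):
    # (1) An explicit path from the options parser wins.
    if file_from_options:
        return file_from_options
    # (2) Look the type up in the config file; assumed format: "GTF: /path/file.gtf".
    config_path = os.path.expanduser(config_path)
    if os.path.isfile(config_path):
        with open(config_path) as handle:
            for line in handle:
                key, sep, value = line.partition(":")
                if sep and key.strip() == file_type:
                    return value.strip()
    # (3) Fall back to an environment variable, assumed to be named e.g. GTF_PATH.
    return os.environ.get(f"{file_type}_PATH")
```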

gwide.methods.indexOrder(df=pd.DataFrame(), additional_tags=[], output='root', order='b_d_e_p')

Applies expNameParser to the whole DataFrame.

order : str: Default = 'b_d_e_p' (b: bait; d: details; e: experiment; p: prefix)

gwide.methods.list_paths_in_current_dir(suffix='', stdin=False)

suffix : str: list only paths in the current directory ending with the indicated suffix
stdin : bool: read from standard input instead of the current directory

Returns: list of paths in the current directory ending with the suffix
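A stdlib sketch of the listing, assuming one path per line on standard input when `stdin=True` and sorted output; `paths_with_suffix` is a hypothetical name.

```python
import os
import sys


def paths_with_suffix(suffix="", stdin=False):
    if stdin:
        # Assumed input format: one path per line; blank lines are ignored.
        candidates = [line.strip() for line in sys.stdin if line.strip()]
    else:
        candidates = os.listdir(".")
    # An empty suffix matches everything, since every string ends with "".
    return sorted(p for p in candidates if p.endswith(suffix))
```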

BigWigTools

Tools to analyse BigWig files