gwide Reference manual

ProfileAnalyser

Set of functions designed for the analysis of genomic profiles

gwide.profileAnalyser.compare1toRef(dataset=Series([], dtype: float64), ranges='mm', heatmap=False, relative=False, reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv')[source]

Takes a Series and compares it with the reference DataFrame.

dataset : Series
the given series
ranges : str
'mm' : min-max or 'qq' : q1-q3
heatmap : bool
heatmap=False: DataFrame with (reference_above_experiment minimum etc.): rae_min, rae_max, ear_min, ear_max; heatmap=True: Series of differences to plot a heat map
relative : bool
only for heatmap; recalculates differences according to the peak size. Warning: negative values are in the range -1 to 0, but positive values run from 0 to values higher than 1
reference : str
path to reference plot
Returns:DataFrame (heatmap=False) or Series (heatmap=True)
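The asymmetry described in the warning is what one would expect if each difference is divided by the reference value at that position. That normalisation is an assumption made here for illustration (the manual only says "according to the peak size"), and `relative_difference` is a hypothetical helper, not part of gwide:

```python
def relative_difference(experiment, reference):
    """Difference between experiment and reference, scaled by the reference.

    Assumption: both values are non-negative signal intensities and the
    reference is non-zero. This is not gwide's implementation.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (experiment - reference) / reference


# An experiment below the reference can lose at most 100% of the signal ...
below = relative_difference(1.0, 4.0)
# ... but an experiment above the reference has no upper bound.
above = relative_difference(12.0, 4.0)
```

Under this assumption a heat map with a symmetric colour scale will visually understate negative differences, which may be worth keeping in mind when choosing vmin and vmax.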

gwide.profileAnalyser.compareMoretoRef(dataset=Empty DataFrame Columns: [] Index: [], ranges='mm', reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv')[source]

Takes a DataFrame created by filter_df and compares it with the reference DataFrame

gwide.profileAnalyser.filter_df(input_df=Empty DataFrame Columns: [] Index: [], let_in=[''], let_out=['wont_find_this_string'], smooth=True, window=10)[source]

Returns a DataFrame for the chosen experiments.

input_df : DataFrame
input DataFrame
let_in : list
list of words that characterize the experiments
let_out : list
list of words that disqualify experiments (may be left at the predefined value)
smooth : bool
apply a 10 nt smoothing window
Returns:DataFrame with ‘mean’, ‘median’, ‘min’, ‘max’ and quartiles if more than 2 experiments
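As a rough illustration of the kind of per-position summary described for filter_df, the sketch below computes the same statistics with the standard library only. It is not gwide's code: `summarise_positions` and the experiment names are hypothetical, plain lists stand in for DataFrame columns, and the smoothing step is left out.

```python
import statistics


def summarise_positions(experiments):
    """Per-position summary across experiments.

    `experiments` maps a (hypothetical) experiment name to a list of
    per-nucleotide values; all lists must have the same length.
    """
    rows = []
    for values in zip(*experiments.values()):
        row = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
        # Quartiles are only added when more than 2 experiments are present,
        # following the description of the return value.
        if len(values) > 2:
            q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
            row["q1"], row["q3"] = q1, q3
        rows.append(row)
    return rows


profiles = {
    "expA": [1.0, 2.0, 3.0],
    "expB": [2.0, 2.0, 5.0],
    "expC": [3.0, 8.0, 4.0],
}
summary = summarise_positions(profiles)
```

Each element of `summary` corresponds to one nucleotide position; with only two experiments the `q1` and `q3` keys would be absent.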

gwide.profileAnalyser.plot_ChIP(df_sense=Empty DataFrame Columns: [] Index: [], df_anti=Empty DataFrame Columns: [] Index: [], title=None, start=None, stop=None, figsize=(15, 6), ylim=(-0.001, 0.001), s_color='red', as_color='blue', h_lines=[], lc='black', dpi=150, csv_path='/home/tturowski/notebooks/RDN37_reference_collapsed.csv', color='green')[source]

Creates a plot similar to a box plot: median, 2nd and 3rd quartiles, and min-max range

csv_path: str()
Path to CRAC or other reference file

title: str()

start: int()

stop: int()

figsize: tuple()

ylim: tuple()
OY axes lim. Default = (-0.001, 0.001)
color: str()
plot color
h_lines: list()
Optional. list() of horizontal lines
lc: str()
color of horizontal lines

Returns:None

gwide.profileAnalyser.plot_as_box_plot(df=Empty DataFrame Columns: [] Index: [], title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), dpi=150, color='green', h_lines=[], lc='red')[source]

Plots a figure similar to a box plot: median, 2nd and 3rd quartiles, and min-max range

df : DataFrame
DataFrame containing the following columns: `['position'] ['nucleotide'] ['mean'] ['median'] ['std']`, optionally `['q1'] ['q3'] ['max'] ['min']`

title : str

start : int

stop : int

figsize : (float, float)

ylim : (float, float)
OY axes lim. Default = (None,0.01)
color : str
line color
h_lines : list
optional: list of horizontal lines
lc : str
optional: color of horizontal lines
gwide.profileAnalyser.plot_diff(dataset=Empty DataFrame Columns: [] Index: [], ranges='mm', label='', start=None, stop=None, plot_medians=True, plot_ranges=True, figsize=(15, 6), ylim=(None, 0.01), h_lines=[], reference='/home/tturowski/notebooks/RDN37_reference_collapsed.csv')[source]

Plots the given dataset and the reference; differences are marked.

dataset : DataFrame
dataset from filter_df
ranges : str
'mm' : min-max or 'qq' : q1-q3
label : str
label of the given dataset
start : int
start
stop : int
stop
plot_medians : bool
plot medians
plot_ranges : bool
plot ranges
figsize : tuple
figsize tuple. Default = (15, 6)
ylim : tuple
ylim tuple. Default = (None, 0.01)
h_lines : list
list of horizontal lines
reference : str
path to reference plot
Returns:plot with marked differences

gwide.profileAnalyser.plot_from_csv(csv_path='', title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), color='green', h_lines=[], lc='red', dpi=75)[source]

Plots a figure similar to a box plot: median, 2nd and 3rd quartiles, and min-max range

csv_path: str
path to csv file

title: str

start: int

stop: int

figsize: (float, float)
Default = (15,6)
ylim : (float, float)
OY axes lim. Default = (None,0.01)
color : str
line color
h_lines : list
optional: list of horizontal lines
lc : str
optional: color of horizontal lines
gwide.profileAnalyser.plot_heatmap(df=Empty DataFrame Columns: [] Index: [], title='Heat map of differences between dataset and reference plot for RDN37-1', vmin=None, vmax=None, figsize=(20, 10))[source]

Plots a heat map of differences from the DataFrame generated by the compare1toRef(dataset, heatmap=True) function

gwide.profileAnalyser.plot_to_compare(df=Empty DataFrame Columns: [] Index: [], df2=None, color1='black', color2='darkred', label='', title=None, start=None, stop=None, figsize=(15, 6), ylim=(None, 0.01), h_lines=[], dpi=75, csv_path='/home/tturowski/notebooks/RDN37_csv_path_collapsed.csv')[source]

Plots given dataset and reference dataset from csv file.

df : DataFrame
DataFrame (dataset) containing the following columns: `['position'] ['nucleotide'] ['mean'] ['median'] ['std']`, optionally `['q1'] ['q3'] ['max'] ['min']`
df2 : DataFrame
Optional Dataframe (dataset2). Default = None
color1 : str
Default color for dataset1 = ‘black’
color2 : str
Default color for dataset2 = ‘darkred’

label : str

title : str

start : int

stop : int

figsize : (float, float)

ylim : (float, float)
OY axes lim. Default = (None,0.01)
h_lines : list
optional: list of horizontal lines
dpi : int
Default dpi=75
csv_path : str
Default = '/home/tturowski/notebooks/RDN37_csv_path_collapsed.csv'
gwide.profileAnalyser.save_csv(data_ref=Empty DataFrame Columns: [] Index: [], datasets=Empty DataFrame Columns: [] Index: [], path=None)[source]

Takes the reference and data DataFrames

data_ref : DataFrame
DataFrame with ['position'] and ['nucleotide'] columns
datasets : DataFrame
DataFrame containing experimental data only
path : str
Optional: path to save csv. Default: None
Returns:reference DataFrame

Methods

Common methods

gwide.methods.calGC(dataset=Empty DataFrame Columns: [] Index: [], calFor=['G', 'C'])[source]

Returns the GC content of a given dataset.

dataset : DataFrame
Pandas DataFrame with “nucleotide” column
Returns:fraction of GC content
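The calculation itself is a simple ratio. The sketch below shows it on a plain sequence of nucleotide letters standing in for the “nucleotide” column; `gc_fraction` is a hypothetical name and the handling of empty input is an assumption, not gwide's documented behaviour.

```python
def gc_fraction(nucleotides, cal_for=("G", "C")):
    """Fraction of positions whose nucleotide is one of `cal_for`."""
    nucleotides = list(nucleotides)
    if not nucleotides:
        raise ValueError("cannot compute GC content of an empty sequence")
    hits = sum(1 for n in nucleotides if n in cal_for)
    return hits / len(nucleotides)


balanced = gc_fraction("ATGC")        # two of four letters are G or C
only_c = gc_fraction("ATGC", ("C",))  # restrict the count to C
```

The second call mirrors the calFor argument in the signature above, which suggests the same routine can count any subset of nucleotides.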

gwide.methods.calculateFDR(data=Series([], dtype: float64), iterations=100, target_FDR=0.05)[source]

Calculates the False Discovery Rate (FDR) for a given dataset.

data : pd.Series()

iterations : int()
number of iterations. Default = 100
target_FDR : float()
Default = 0.05
Returns:Series()

gwide.methods.cleanNames(df=Empty DataFrame Columns: [] Index: [], additional_tags=[])[source]

Cleans up problems with names, if any exist

gwide.methods.define_experiments(paths_in, whole_name=False)[source]

Parses file names and extracts the experiment name from them.

whole_name : bool
By default the script takes the first ‘a_b_c’ from a_b_c_hittable_reads.txt as the experiment name.
Returns:list of experiment names and list of paths
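The default naming rule can be illustrated with a few lines of string handling. This is a sketch under the assumption that the experiment name is simply the first three underscore-separated fields of the file name; `experiment_name` and `n_fields` are hypothetical, and the behaviour of whole_name is not covered.

```python
import os


def experiment_name(path, n_fields=3):
    """Return the first `n_fields` underscore-separated fields of a file name."""
    base = os.path.basename(path)
    stem = base.split(".")[0]  # drop the extension
    return "_".join(stem.split("_")[:n_fields])


paths = ["/data/a_b_c_hittable_reads.txt", "x_y_z_hittable_reads.txt"]
names = [experiment_name(p) for p in paths]
```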

gwide.methods.expNameParser(name, additional_tags=[], order='b_d_e_p')[source]

Handles an experiment name: recognizes AB123456 as the experiment date, and BY4741, HTP or a given string as the bait protein.

name : str
experiment name
additional_tags : list
list of tags
output :
default ‘root’; print other elements when ‘all’
order : str
default ‘b_d_e_p’; b-bait, d-details, e-experiment, p-prefix
Returns:reordered name

gwide.methods.filterExp(datasets, let_in=[''], let_out=['wont_find_this_string'])[source]

Works on a pd.DataFrame() or dict(). Returns an object with filtered columns/keys.

datasets : DataFrame() or dict()
DataFrame() or dict() with exp name as a key
let_in : list()
list() with elements of name to filter in
let_out : list()
list() with elements of name to filter out
Returns:DataFrame() or dict()
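For the dict() case, name-based filtering might look like the sketch below. It assumes a name is kept only if it contains every element of let_in and no element of let_out; the manual does not state whether "all" or "any" applies, so treat that as an assumption. `filter_by_name` and the experiment names are hypothetical.

```python
def filter_by_name(datasets, let_in=("",), let_out=("wont_find_this_string",)):
    """Keep entries whose key contains all of `let_in` and none of `let_out`."""
    return {
        name: data
        for name, data in datasets.items()
        if all(tag in name for tag in let_in)
        and not any(tag in name for tag in let_out)
    }


datasets = {
    "Rpa190_wt_rep1": [1, 2],
    "Rpa190_mutant_rep1": [3, 4],
    "Rpo21_wt_rep1": [5, 6],
}
kept = filter_by_name(datasets, let_in=["Rpa190"], let_out=["mutant"])
```

With the default arguments every key passes, because the empty string is contained in any name and the placeholder in let_out is not expected to match anything.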
gwide.methods.findPeaks(s1=Series([], dtype: float64), window=1, order=20)[source]

Finds local extrema using the SciPy argrelextrema function.

s1 : Series()
data to localize peaks
window : int()
To smooth data before peak-calling. Default = 1 (no smoothing)
order : int()
argrelextrema order parameter. Default = 20
Returns:list() of peaks
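To show what the order parameter controls, here is a dependency-free sketch of strict local-maximum detection. It only approximates what argrelextrema does with a "greater than" comparator: edge handling may differ, smoothing is omitted, and `local_maxima` is a hypothetical helper rather than gwide's implementation.

```python
def local_maxima(values, order=1):
    """Indices of points strictly greater than all neighbours within `order`.

    Neighbour windows are truncated at the ends of the sequence, and the
    first and last points are never reported as peaks.
    """
    peaks = []
    for i, v in enumerate(values):
        left = values[max(0, i - order):i]
        right = values[i + 1:i + 1 + order]
        if left and right and all(v > n for n in left + right):
            peaks.append(i)
    return peaks


signal = [0, 1, 3, 1, 0, 2, 5, 2, 0, 1]
narrow = local_maxima(signal, order=1)  # every local bump counts
wide = local_maxima(signal, order=4)    # only peaks dominating 4 neighbours per side
```

A larger order therefore suppresses minor peaks that sit close to a higher one, which is presumably why the default of 20 is fairly conservative.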
gwide.methods.getRefFile(file_from_options, file_type)[source]
Sorts out the source of the gtf, fasta or tab path, in order: (1) from the options parser, (2) from ~/bin/default.aml, or (3) from the environment variable $xxx_PATH
Parameters:
file_from_options : file path from the options parser; can be an empty string
file_type : ‘GTF’, ‘FASTA’ or ‘TAB’
Returns:path to the file
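The three-step lookup order can be sketched as a simple fallback chain. Everything here is illustrative: `resolve_ref_file` is a hypothetical function, the configuration file is represented by an already-loaded dict instead of being parsed, and reading `$xxx_PATH` as `<file_type>_PATH` (for example `GTF_PATH`) is an assumption.

```python
import os


def resolve_ref_file(file_from_options, file_type, config=None, environ=None):
    """Return the first path found: options, then config, then environment."""
    config = config or {}
    environ = os.environ if environ is None else environ

    if file_from_options:              # (1) explicit command-line option
        return file_from_options
    if config.get(file_type):          # (2) value from the defaults file
        return config[file_type]
    env_name = f"{file_type}_PATH"     # (3) assumed variable naming
    if environ.get(env_name):
        return environ[env_name]
    raise LookupError(f"no {file_type} file configured")


cfg = {"GTF": "/cfg/annotation.gtf"}
env = {"GTF_PATH": "/env/annotation.gtf", "FASTA_PATH": "/env/genome.fasta"}
from_option = resolve_ref_file("/cli/annotation.gtf", "GTF", cfg, env)
from_config = resolve_ref_file("", "GTF", cfg, env)
from_env = resolve_ref_file("", "FASTA", cfg, env)
```

Passing `config` and `environ` explicitly keeps the sketch testable without touching the real home directory or process environment.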
gwide.methods.indexOrder(df=Empty DataFrame Columns: [] Index: [], additional_tags=[], output='root', order='b_d_e_p')[source]

Applies expNameParser to the whole DataFrame.

order : str
default ‘b_d_e_p’; b-bait, d-details, e-experiment, p-prefix

gwide.methods.list_paths_in_current_dir(suffix='', stdin=False)[source]
Parameters:
suffix : lists only paths in the current directory ending with the indicated suffix
stdin : read from standard input instead of the current directory
Returns:list of paths in current dir ending with suffix
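Listing by suffix needs only the standard library. The sketch below is an assumption-laden stand-in, not gwide's code: `paths_with_suffix` is a hypothetical helper that takes the directory as an argument, returns sorted joined paths, and does not implement the stdin option.

```python
import os
import tempfile


def paths_with_suffix(directory, suffix=""):
    """Return sorted paths of the entries in `directory` ending with `suffix`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )


# Demonstration in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a_hittable_reads.txt", "b_hittable_reads.txt", "notes.md"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in paths_with_suffix(tmp, "_reads.txt")]
```

With the default empty suffix every entry matches, since any string ends with the empty string.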

BigWigTools

Tools to analyse BigWig files