GraphPype

graphpype.graph module

Graphs

A collection of graph-specific functions. These include graph construction from neuroimaging data, graph expression (e.g. graph composites for TensorFlow), graph generation (e.g. null-models and spin-models), and graph analysis tools. There are native implementations for niche functions and API calls for more established functions, e.g. NetworkX and TensorFlow.

graphpype.graph.anatNifti(data, atlas='msdl', atlasDir='./data/atlases', standardize='zscore_sample', standardize_confounds='zscore_sample', memory='nilearn_cache', verbose='0', confounds=False)

A simple API wrapper for the nilearn function for constructing a covariance matrix according to an atlas from Nifti data formats.

graphpype.graph.atlasDist(atlas='msdl', atlasDir='./data/atlases')
graphpype.graph.atlasLabels(atlas='msdl', atlasDir='./data/atlases')
graphpype.graph.constructCovarianceAverageGraph(*covariances, density=0.1, seed=0)

Constructs the average graph over a dataset of covariances using a minimum span density method.

Parameters:
covariances : list

List of covariance matrices.

density : float

Target edge density.

seed : int

Seed used for randomisation in the construction method.

Notes

This allows for individual registration and covariance determination before global averaging.

graphpype.graph.constructMinSpanDensity(covariance, density=0.1, seed=0)

Construct a graph from a covariance matrix using the minimum span density method.

Parameters:
covariance : numpy.ndarray

A covariance matrix between nodal sites.

density : float

The edge density targeted in the graph.

seed : int

Random seed used for sampling.

Returns:
graph : networkx.DiGraph

Notes

A covariance matrix (M) is used as a fully connected graph from which a minimal spanning tree (S) is constructed. The number of edges (N) in this spanning tree is calculated and the covariances associated with the nodes in the tree are set to zero. Finally, the difference (D) between the required density of edges (Nr) and the number of edges is calculated, and D samples without replacement are taken using the remaining covariances as weights. These samples are added as edges into S to construct the final graph G.
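A minimal sketch of this construction using NetworkX and NumPy, assuming the tree is built on absolute covariances and that the target edge count is density * N(N-1)/2 (both assumptions of this sketch, not confirmed by the source):

    import numpy as np
    import networkx as nx

    def construct_min_span_density(covariance, density=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n = covariance.shape[0]
        # Treat the covariance matrix as a fully connected weighted graph M.
        M = nx.from_numpy_array(np.abs(covariance))
        # Build the minimal spanning tree S.
        S = nx.minimum_spanning_tree(M)
        # Zero the covariances already used by tree edges.
        weights = np.abs(covariance).astype(float).copy()
        np.fill_diagonal(weights, 0.0)
        for u, v in S.edges:
            weights[u, v] = weights[v, u] = 0.0
        # Sample the remaining D = Nr - N edges, weighted by covariance,
        # without replacement, and add them to S.
        target = int(density * n * (n - 1) / 2)
        deficit = max(target - S.number_of_edges(), 0)
        iu = np.triu_indices(n, k=1)
        p = weights[iu] / weights[iu].sum()
        chosen = rng.choice(len(p), size=deficit, replace=False, p=p)
        S.add_edges_from((int(iu[0][i]), int(iu[1][i])) for i in chosen)
        return S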

graphpype.graph.constructedDensityPermutationGraph(covariances, density=0.1, nPermutations=1000, seed=0)

Construct a distribution of graphs with a specified target density.

Parameters:
covariances : numpy.ndarray

A matrix of covariances.

density : float

Target edge density.

nPermutations : int

Number of graphs in the distribution.

seed : int

The randomisation seed used in graph construction.

Returns:
distribution : list

A vector of randomly sampled graphs at the target edge density.

graphpype.graph.covAtlasPermute(data, atlases)

Generate the covariance matrix for an fMRI scan based on a permuted set of atlasObjects.

Parameters:
data

fMRI data object.

atlases : list

List of atlas objects.

Returns:
covariances : list

A list of covariance matrices between the locations of the specified atlases.

graphpype.graph.covNifti(data, atlas='msdl', atlasDir='./data/derivatives/atlases/', standardize='zscore_sample', standardize_confounds='zscore_sample', memory='nilearn_cache', verbose=0, memory_level=1, confounds=False)

Generate the covariance matrix for an fMRI scan.

Parameters:
data

fMRI data object.

atlas : string

Atlas used for registration. The default is 'msdl'.

atlasDir : string, optional

Location for caching the atlas.

standardize : string, optional

Standardisation method for the data.

standardize_confounds : string, optional

Standardisation method for the confounds.

memory : string, optional

Cache location.

verbose : int, optional

Verbosity level, defaults to 0.

confounds : bool, optional

Defaults to False.

Returns:
covariance : numpy.ndarray

A covariance matrix between the locations of the atlas.

graphpype.graph.degreeCDF(G)

Calculate the degree distribution as an empirical CDF.

Parameters:
G : networkx.DiGraph
Returns:
uniqueDegs : list

A list of degree sizes.

cdf : numpy.ndarray

A numpy array of the cumulative distribution function.

logcdf : numpy.ndarray

A numpy array of the cumulative logged CDF values.

ecdf

A model of the empirical distribution provided by statsmodels.

degs : list

A list of the degrees in the graph.

Notes

Returns a list of degree sizes and cumulative logged or unlogged CDF values (default: logProb=True). Finally, a model of the ECDF is returned from statsmodels.
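A sketch of how these returned quantities relate, using the statsmodels ECDF model named above (the example graph is illustrative):

    import networkx as nx
    import numpy as np
    from statsmodels.distributions.empirical_distribution import ECDF

    G = nx.gnp_random_graph(100, 0.05, seed=0, directed=True)
    degs = [d for _, d in G.degree()]
    ecdf = ECDF(degs)              # empirical CDF model from statsmodels
    uniqueDegs = np.unique(degs)
    cdf = ecdf(uniqueDegs)         # cumulative probabilities per degree
    logcdf = np.log(cdf)           # logged CDF values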

graphpype.graph.degreeHistogram(G, nBins=1)

Calculate the degree histogram according to a particular number of bins (default: 1).

Parameters:
G : networkx.DiGraph

The graph to be analysed.

nBins : int

The binning number.

Returns:
histogram : numpy.histogram
graphpype.graph.euclideanDistanceMatrix(data)

Returns a distance matrix based on a particular parcellation.

Parameters:
data : list

Parcellation data.

Returns:
distance : numpy.ndarray

Distance matrix.

Notes

Expects the data to be a vector of coordinates.

graphpype.graph.featureDegreeDistribution(G, feature)

Collates mean and standard deviation over features with the same degree.

Parameters:
G : networkx.DiGraph

Graph defining the unique degree distribution.

feature : numpy.ndarray

Feature vector associated with the particular nodes, e.g. cortical thickness.

Returns:
uniqueDegrees : list

A list of the unique degrees in the graph.

featureData : list

A list of the feature data organised by degree.

featureMean : float
featureStandardDeviation : float
graphpype.graph.graphComposite(d, features)

Takes a graph and a series of features to compose a graph composite.

Parameters:
features

Features to be incorporated into the graph schema.

Returns:
nodeset : dict

Dictionary with the graph schema node specification.

edgeset : dict

Dictionary with the graph schema edge specification.

context : dict

Dictionary specifying the TensorFlow context.

Notes

The graph composite is ideal for constructing TensorFlow GNN graphs and for specifying regression/classification tasks on multiple graph data. This could include multiple graphs, each with multiple features on both edges and nodes.

graphpype.graph.graphSchemaFromComposites(graphComposites)

Convert the graph composites into a graph schema that can be read by TensorFlow.

graphpype.graph.greedyModules(G)

A simple wrapper for a greedy community detection algorithm using modularity maximisation.

Parameters:
G : networkx.DiGraph

The graph to be analysed.

Returns:
communities : list

A list of communities given by modularity maximisation.

graphpype.graph.louvainCommunities(G, seed: int, gpu=False)

A simple wrapper for the default parameters of the NetworkX implementation of the Louvain Clustering algorithm.

Parameters:
G : networkx.DiGraph

The graph to be analysed.

seed : int

Random seed specified by the recipe.

gpu : bool

Flag for GPU acceleration. Only supports CUDA frameworks currently (not yet mainlined).

Returns:
communities : list

A list of the communities given by the Louvain clustering algorithm.

Notes

The graph is weighted by the adjacency matrix and the seed must be specified according to the recipe.
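For reference, the NetworkX call this wraps can be exercised directly (the example graph is illustrative):

    import networkx as nx

    G = nx.karate_club_graph()
    # Default-parameter Louvain clustering with a recipe-supplied seed.
    communities = nx.community.louvain_communities(G, weight="weight", seed=1)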

graphpype.graph.randomCommunityStochasticBlock(g, communities, density=0.1, nGraphs=1000, seed=0)

Construct a distribution of graphs with similar community structure as target graph.

Parameters:
g : networkx.DiGraph

Target graph.

communities : list

List of communities found by a community detection algorithm.

density : float

Target density in the Stochastic Block Model.

nGraphs : int

Number of graphs to be sampled from the distribution.

seed : int

Random seed used for graph generation. Seeds for community detection and the stochastic block model are based on independent increments of this seed.

Returns:
communities : list

A list of communities found in the Stochastic Block Model sample.

Notes

Given a graph G with an already-computed community structure, use the stochastic block model to create a random distribution of graphs with similar community structure. The probabilities are nominally given as a representative edge density of the constructed network.
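A minimal sketch of such a null-model distribution with NetworkX, assuming the block probabilities are derived from a single representative density (the within/between split below is an illustrative assumption):

    import networkx as nx

    G = nx.karate_club_graph()
    communities = nx.community.louvain_communities(G, seed=0)
    sizes = [len(c) for c in communities]
    density = 0.1
    # Nominal block probabilities: denser within blocks than between.
    p = [[density if i == j else density / 10 for j in range(len(sizes))]
         for i in range(len(sizes))]
    samples = [nx.stochastic_block_model(sizes, p, seed=s) for s in range(5)]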

graphpype.graph.randomSpin(*data, atlas='msdl', atlasDir='./data/derivatives/atlases/', nPermutations=1000, seed=0)
Generate a distribution of spun (randomly rotated) atlases to control for spatial autocorrelation.

Parameters:
atlas : string

The base atlas used to generate the distribution.

atlasDir : string

The atlas storage directory.

nPermutations : int

Number of samples to generate.

seed : int

Random seed used in generation.

Returns:
distribution : list

A vector of permuted atlases.

Notes

A native implementation of the method proposed by Alexander-Bloch et al. (2018) to control for spatial autocorrelation. Given data that is spatially distributed with a notion of spherical symmetry, and an atlas of distance coordinates, a spherical rotation is applied to the 3D space of the data and the atlas region associated with each datum is remapped to the closest (as the crow flies) atlas region. The data is typically in the form of a feature (or biomarker) and can be a 3D or 4D tensor corresponding to some (registered) measurement. Currently, an atlas must be provided and it is rotated to give a new rotated atlas.

Usage notes: Spinning a set of coordinates and aligning them to an atlas is equivalent to reverse-spinning the atlas. The latter is more space efficient and does not compound errors (outside of those in the original atlas mapping). A divergence from the original implementation is the generation of random numbers; the original paper stated that rotations on the sphere were uniformly distributed, but the Git repository indicated sampling from a normal distribution and enforcing orthogonality via QR decomposition. This is a costly procedure. Here we sample each angle from the distribution U([0,1]) and transform to the correct range for appropriate Euler angles.

To do: offer acceleration through CUDA
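A minimal sketch of this sampling scheme with SciPy, assuming zyz Euler angles and nearest-neighbour remapping of unit-sphere coordinates (both assumptions of this sketch):

    import numpy as np
    from scipy.spatial.transform import Rotation

    rng = np.random.default_rng(0)
    coords = rng.normal(size=(100, 3))      # stand-in atlas coordinates
    coords /= np.linalg.norm(coords, axis=1, keepdims=True)

    # Sample each Euler angle from U([0,1]) and scale to its valid range.
    angles = rng.uniform(size=3) * np.array([2 * np.pi, np.pi, 2 * np.pi])
    R = Rotation.from_euler("zyz", angles).as_matrix()
    rotated = coords @ R.T

    # Remap each rotated region to the closest original atlas region.
    dists = np.linalg.norm(rotated[:, None, :] - coords[None, :, :], axis=-1)
    permutation = dists.argmin(axis=1)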

graphpype.graph.wiringCost(adj, dist)

Computes the standard wiring cost of a network defined by an adjacency matrix with a defined distance metric between the nodes.

\[W = \frac{1}{N} \sum_{ij} A_{ij} D_{ij}\]
Parameters:
adj : numpy.ndarray

A square adjacency matrix.

dist : numpy.ndarray

A square distance matrix defining the distance between each node.

Returns:
cost : float

Wiring cost under the induced distance topology.
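A worked example of the formula above on a 3-node path graph (values illustrative):

    import numpy as np

    adj = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])
    dist = np.array([[0.0, 1.0, 2.0],
                     [1.0, 0.0, 1.0],
                     [2.0, 1.0, 0.0]])
    N = adj.shape[0]
    cost = (adj * dist).sum() / N   # (1 + 1 + 1 + 1) / 3 = 4/3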

graphpype.pipe module

Pipelines

A collection of functions and object classes to construct recipes and run analysis pipelines.

The major conceptual classes exported are:
  1. Datum - graphpype's internal expression of data, containing metadata fields, preprocessing, and post-processing analysis of raw data.

  2. DataSet - graphpype's internal representation of datasets, comprising a list of data objects and their analysis. It is composable with itself.

  3. Operator - graphpype's internal representation of functions that operate on data and datasets. Contains metadata to aid reproducibility and portability.

  4. Recipe - an object to specify how operators should interact with data.

  5. Pipeline - an object to construct and analyse data through a structured pipeline defined by the recipe.

class graphpype.pipe.DataSet(name='', dataObjs=[])

Bases: object

A composition of data with type Datum. Contains dataset level analysis and processing.

Attributes:
name: str

A name for the dataset (default: “”)

data: list

The processed data.

analysis: dict

Group level analysis of processed data.

Methods

__call__(directory, channel, loader)

Call self as a function.

analysis: dict
data: list
name: str
class graphpype.pipe.Datum(*directories)

Bases: object

An object that completely specifies a particular element of the dataset with potentially multiple imaging modalities.

Attributes:
dirs: list
List of the directories used for the specimen/subject in the analysis.
preProcess: dict
Dictionary storing the preprocessed data from a particular pipeline such as fmriprep, e.g. Dict["Connectivity"] = matrix
postProcess: dict
Dictionary storing the processed data from a particular operator, e.g. Dict["Connectivity"] = matrix

Methods

__call__(directory)

Call self as a function.

addChannel

addChannel(channel, data)
dirs: list
postProcess: dict
preProcess: dict
class graphpype.pipe.Operator(name: str, description: str, function: dict, channels: dict, args: dict, inter={})

Bases: object

A processing operator that operates on a discrete chunk of data and either broadcasts over or reduces a data set. The result is stored in the DataType object in a channel indexed as a dictionary and specified by this operator.

Notes

The default is to apply the operator to the function channel as-is, but occasionally you might want to broadcast over a list of elements in the channel, e.g. when doing a spin correction. To specify the function:

- function = {"name": name of the function, "package": e.g. numpy.random, "version": blank if installed package / directory of user-defined function}

- channels = {"dataIndex": {"Layer": ["Channel1", "Channel2", ...], "Layer2": ["Channel0"]}, "resultIndex": {"Layer": ["SingleChannel"]}}

- args = {"unnamed": [], "alpha": 0, "beta": 1}

- inter = {} or {"broadcast": True}
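Putting the schema above together, a sketch of constructing an Operator; the function, layer, and channel names here are hypothetical:

    from graphpype.pipe import Operator

    op = Operator(
        name="degreeCDF",
        description="Compute the empirical degree CDF of each graph.",
        function={"name": "degreeCDF", "package": "graphpype.graph",
                  "version": ""},
        channels={"dataIndex": {"postProcess": ["graph"]},
                  "resultIndex": {"postProcess": ["degreeCDF"]}},
        args={"unnamed": []},
        inter={},  # or {"broadcast": True} to map over a list in the channel
    )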

Attributes:
opName: str

A local understandable name.

description: str

A description of what the operator does.

name: str

Name of the function.

basePackage: str

The package name.

packageDir: str
version: str

Give the version number for the locally installed package. If the function is self-defined, provide the relative directory.

args: dict

The args required to run the operator. Unnamed arguments should be assigned to the dictionary entry unnamed.

internal: dict

A dictionary of internal operating requirements e.g. broadcast, reduce. Broadcast and reduce will always be applied to the first index of the data channels.

channelsIn: list

The shape of the container of the incoming data. The first entry specifies the layer on which the operator should function.

channelOut: list

The shape of the container of the returned data.

json: list

A JSON object storing the object.

Methods

__call__(data[, ret])

The operator can be used to operate on a datum by specifying the data layers and channels.

class graphpype.pipe.Pipeline(recipeDir, bids={})

Bases: object

Pipeline object: takes a recipe and executes it over a series of datapaths.

Attributes:
recipe : graphpype.pipe.Recipe

The recipe for the pipeline to execute.

paths : list

The BIDS paths for data.

result : graphpype.pipe.DataSet

A dataset containing the post-analysis, individual dataset analyses, post-processing of each datum, and preprocessing routines.

Methods

output()

Generate the output of the analysis.

plot(outputDir)

Generate plot files for the plotting objects defined in the analysis.

process([dataSetNames, preProcess])

Process a pipeline over a series of datasets.

output()

Generate the output of the analysis.

paths: list
plot(outputDir)

Generate plot files for the plotting objects defined in the analysis.

process(dataSetNames=[], preProcess=True)

Process a pipeline over a series of datasets.

Parameters:
dataSetNames : list

A list of strings defining the datasets to process.

preProcess : bool, optional

Optional, but recommended, preprocessing.

recipe: any
result: any
class graphpype.pipe.Recipe(name: str = '', description: str = '', nodes: dict = {}, env: dict = {'nThreads': 1, 'seed': 1}, template='')

Bases: object

Pipeline Recipe: a directed line graph of functions/constants that operate on a subset of data for analysis, output, and plotting.

The recipe informs the operations that must be sequentially applied to data and the subsequent output manipulations. Recipes can be composed by concatenation.

Attributes:
name: str

Recipe identifier

description: str

Summary of the pipeline flow, i.e. a brief description of what the recipe cooks.

nodes: dict

Functions that the analysis will operate with. The entries are: "preProcess", "postProcess", "analysis", "postAnalysis", and "output". Analysis refers to functions that are applied to a specific dataset, while postAnalysis refers to functions that are applied across multiple datasets. Plotting and output functions, such as reports and caching, go in the "output" layer.

env: dict

Any environment variables, including the number of threads, randomisation seed, etc.

Methods

read(inputDir)

Read a recipe in .json format from disk and construct recipe object.

report(bids[, author, outputDir])

Generate a report card summarising the recipe.

write(outputDir)

Write the recipe to disk in .json format for use in pipelines.

description: str
env: dict
name: str
nodes: dict
read(inputDir)

Read a recipe in .json format from disk and construct recipe object.

Parameters:
inputDir : str

The path to a recipe.

report(bids, author='', outputDir='data/derivatives')

Generate a report card summarising the recipe.

write(outputDir)

Write the recipe to disk in .json format for use in pipelines.

Parameters:
outputDir : str

The path to write the recipe to.

Returns:
None
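A sketch of the documented write/read round trip; the recipe contents and file paths here are hypothetical:

    from graphpype.pipe import Recipe

    recipe = Recipe(
        name="degreeAnalysis",
        description="Construct graphs and compare degree distributions.",
        nodes={"preProcess": [], "postProcess": [], "analysis": [],
               "postAnalysis": [], "output": []},
        env={"nThreads": 1, "seed": 1},
    )
    recipe.write("data/derivatives/recipes")  # serialise to .json
    recipe.read("data/derivatives/recipes/degreeAnalysis.json")  # load back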

graphpype.stats module

Statistics

A collection of statistical functions that are useful in neuroimaging analysis. There are native implementations of less common functions, such as modularOverlap, and API calls to more established packages, such as statsmodels, for standard protocols.

graphpype.stats.compareDist(dist1, dist2, test='Kolmogorov-Smirnov')

Assesses the degree to which two empirical distributions are statistically different.

Parameters:
dist1 : distribution

A representation of a distribution, i.e. a list of floats, numpy array, or a statsmodels distribution.

dist2 : distribution
test : string

The test specification.

Returns:
test : float

Test result.

Notes

There are some common ways of doing this: assuming the degree distribution takes a specific functional form, or using the general Kolmogorov-Smirnov test. Support is provided for degree distributions assumed to be in power-law form, or for the Kolmogorov-Smirnov test, through the test keyword (default: test='Kolmogorov-Smirnov').
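A sketch of the default comparison, assuming the two-sample Kolmogorov-Smirnov test from SciPy underlies test='Kolmogorov-Smirnov' (an assumption; the package may use a different backend):

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    dist1 = rng.poisson(5, size=500)   # e.g. degree samples from graph one
    dist2 = rng.poisson(6, size=500)   # e.g. degree samples from graph two
    statistic, pvalue = ks_2samp(dist1, dist2)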

graphpype.stats.compareGroupDegreeMeans(*data, correction='FDR', threshold=0.05)

Returns the corrected p-values for a pairwise t-test between a list of data for each degree in a groupwise graph.

Parameters:
data : list

A list of processed data with associated channels along the degree of the graph.

correction : string

The multiple hypothesis test correction.

threshold : float

The threshold value for the multiple hypothesis test.

Returns:
pvals : dict

A dictionary of the corrected p-values for each of the provided channels.

graphpype.stats.covarianceMatrix(*data, normalise=True)

Generates the covariance matrix from a given parcellation size on a particular dataset.

Parameters:
data : numpy.ndarray

Data with dimensions (N, L) for single-estimate data, or (N, L, T) for time series data.

normalise : bool

Option to normalise the data; default = True.

Returns:
covariance : numpy.ndarray

Covariance matrix.

Notes

Returns the covariance matrix (size: L x L) of data distributed amongst a particular parcellation (size: L) for a particular dataset (size: N). The default behaviour is to normalise by the standard deviations, returning the regular Pearson correlation coefficient.
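For the (N, L) case, the normalised form coincides with the Pearson correlation between parcels; a sketch with NumPy (shapes illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(30, 8))                # N=30 samples, L=8 parcels
    covariance = np.cov(data, rowvar=False)        # unnormalised, L x L
    correlation = np.corrcoef(data, rowvar=False)  # normalised (Pearson)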

graphpype.stats.estimateDistancePermutation(graph, distanceDist)

Estimate the distribution of distances between edges of a target graph.

Parameters:
graph : networkx.DiGraph

Target graph.

distanceDist : list

Distribution of distances between registered nodes.

Returns:
distribution : list

The distance distribution.

av : float

Estimated mean distance.

err : float

Standard error of the mean distance.

graphpype.stats.generalLinearModel(*data, sets=[], covariateChannels=[], regressorChannels=[], link=None, flatten=True)

Fits a general linear model to the data on given covariate and regressor channels in the data.

Parameters:
data : graphpype.pipe.DataSet

The data to be regressed on.

sets : list

The dataset names or indexes to regress on; an empty list implies regression on a single dataset.

covariateChannels : list

The data channels associated with the covariates.

regressorChannels : list

The data channels associated with the regressors.

link

The link function.

flatten : bool

Flatten data into a single vector if no sets are provided.

Returns:
fit : dict

Dictionary indexed on the dataset string/index containing the covariate/regressor fit between those datasets.

Notes

fit[x][y]["model"] is the fit between the covariates on dataset x and the regressors on dataset y.

graphpype.stats.graphNeuralNetwork(data, graphComposites={}, network={}, learningTask={})

Generalised graph neural networks API call to abstract arbitrary graph data formats and train them following the tfgnn GraphTensor structure.

Parameters:
data : graphpype.pipe.DataSet

The complete dataset to be trained on.

graphComposites : list

The graph composites specifying training channels and edge features.

network : dict

The neural network architecture.

learningTask : dict

The hyperparameters that define how the model is to be trained: optimiser, epochs, validation, batch size, loss/task.

Returns:
trained

Trained TensorFlow network.

Notes

The data is treated as a dataset, and a dictionary of "graphs" is passed per subject along with the node and edge sets in the GraphTensor. Each graph is attached to a keyword, e.g. "fmri" or "adjacency", and each of these can hold feature lists for every graph in the data on both the edges and the nodes. Currently, these graphs are considered to be independent, i.e. there are assumed to be no edges between each keyword, although in principle there is nothing stopping links between, for example, gene regulatory networks and fMRI images. The data is specified as a total dataset at either the analysis or dataset level. Graph composites are in the form of a dictionary and specify the channel of the graph, the channels where the node features are derived, and the channels where the edge features are derived. These are used to derive the subgraphs and features extracted from each datum in the dataset to combine into the final GraphTensor representing the entire dataset.
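Based on the description above, a graph composite entry plausibly looks like the following sketch; every key and channel name here is hypothetical and should be checked against graphpype.graph.graphComposite:

    # Hypothetical composite: one graph keyed "fmri", with node and edge
    # feature channels drawn from the named (illustrative) data channels.
    graph_composites = {
        "fmri": {
            "graph": "adjacency",               # channel holding the graph
            "nodeFeatures": ["corticalThick"],  # node feature channels
            "edgeFeatures": ["covariance"],     # edge feature channels
        },
    }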

graphpype.stats.loadFeature(*data)
graphpype.stats.modularOverlap(modules1, modules2)

Computes the modular overlap between two graphs given their modular membership.

Parameters:
modules1 : list

The modular membership classes of graph one.

modules2 : list

The modular membership classes of graph two.

Returns:
overlap : float

The number of shared pairs in the modules normalised by the total number of possible pairs.

Notes

Given a vector of module membership of nodes, compute the modular overlap and return a z-transformed score. To compute the modular overlap, compute the fraction of pairs of nodes that share a module in both groups, i.e. a binary vector over node pairs. Note: this is not a symmetric relationship between partitions, as the vectors will have different lengths based on which is chosen first.
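A minimal sketch of the pair-counting step, assuming equal-length membership vectors (one module label per node; the z-transform step is omitted):

    import itertools
    import numpy as np

    def modular_overlap(modules1, modules2):
        # Fraction of node pairs that share a module in both partitions.
        n = len(modules1)
        shared = [modules1[i] == modules1[j] and modules2[i] == modules2[j]
                  for i, j in itertools.combinations(range(n), 2)]
        return np.mean(shared)

    modular_overlap([0, 0, 1, 1], [0, 0, 0, 1])   # -> 1/6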

graphpype.stats.modularZTest(*modules)

Compute the z-scores of each graph in the dataset with respect to the distributions defined by the modular overlap with a null distribution.

Parameters:
modules : list

A vector of graph modules and corresponding modules of graphs sampled from null-models induced by the graphs. Each element of the list stores the graph modules in its first index and the null-model modules in its second index.

Returns:
zstats : numpy.ndarray

A matrix of z-transformed modular overlap comparisons between a list of graphs.

Notes

Each graph \(G_i\) has a set of modules \(\{M_k\}_{k=1}^{N}\) that can be measured against a null-model sample to define a pseudo-distribution. The modular overlap can be computed against another graph \(G_j\), and this modular overlap can be compared against the distribution of modular overlaps defined by the null model to compute a z-score. Each graph can be compared in this manner to generate a matrix of z-transformed modular overlaps.

graphpype.stats.multipleTTest(*data, threshold=0.025, correction='FDR')

Perform a standard multiple t-test comparison with correction.

Parameters:
data : list

The data to be examined.

correction : string

The multiple hypothesis testing correction procedure.

threshold : float

The statistical significance threshold.

Returns:
results : dict

Dictionary of names or numerical identities of the data, with each entry being a dictionary summarising the test with keys: pvals, significance, and idxs. Idxs indicates the degree.

graphpype.stats.pairgroupModularZTest(*modules, correction='FDR', threshold=0.025)

Return the corrected statistically significant paired difference modular overlap.

Each graph in each dataset induces a null-model which can be used for z-statistics. The differences between graph z-scores in each dataset can be compared against the differences in the null-model z-scores and corrected for multiple hypothesis testing.

Parameters:
modules : list

List of the modules in a graph and a sample from the null-model induced by the graph. The graph modules are in the first index of each element and the distribution in the second.

correction : string

The multiple hypothesis testing correction procedure. Defaults to "FDR".

threshold : float

The threshold for significance for the multiple hypothesis test.

Returns:
pvals : numpy.ndarray

A matrix of the corrected p-values for each graphical element being compared.

Notes

Compute the modular overlap of each of the measured populations and each paired sample from the null model. The z-test then computes z-values for each paired difference of the modular overlap of each pair of groupings when compared against the paired difference of distributions in each pair of groupings. These p-values are corrected (default: FDR) and returned as a matrix of pairs of the paired groupings, and the significance values are reported.

graphpype.utils module

Utility Functions

A collection of utility functions that allow graphpype to operate but are not specific to neuroimaging or graphpype.

graphpype.utils.distanceMat(data)

Construct a distance matrix from a list of coordinates.

The data can be two or three dimensional.

Parameters:
data : list

A list of geometric coordinates.

Returns:
mat : numpy.ndarray

A square matrix in the form of a numpy array.

Notes

The distance metric used is the Euclidean metric.
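Equivalent behaviour can be sketched with SciPy (coordinates illustrative):

    import numpy as np
    from scipy.spatial import distance_matrix

    coords = np.array([[0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0]])
    mat = distance_matrix(coords, coords)  # square, symmetric, zero diagonal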

graphpype.utils.fetchAtlas(atlas='msdl', atlasDir='./data/derivatives/atlases/')

Grabs an atlas using the NiLearn API.

The default atlas used is the 'msdl' atlas, but this can be specified to work with any atlas available in the NiLearn database. The atlases are placed in the data/derivatives/atlases/ subdirectory of the BIDS directory.

Parameters:
atlas : string

Default is 'msdl' but can be any in the NiLearn database, e.g. 'cort-maxprob-thr25-2mm' for the Harvard-Oxford atlas.

atlasDir : string

Defaults to data/derivatives/atlases in the BIDS directory. This is BIDS compliant but can be changed.

Returns:
atlasObj : object

The atlas object contains the maps in the maps key and the labels in the labels key.

See also

graphpype.graph.randomSpin

Spin distributions use atlas data to remove spatial autocorrelation from fMRI signals.

graphpype.utils.fmriprep(directory, fmriprep=[], participant=[], cache=True, stringAdd='')

An API call to the fmriprep preprocessing package.

All fMRI and structural data should be preprocessed before it can be converted into a graph datum object. The fmriprep API is a popular, but not unique, method of doing this. See also: the freesurfer and nipype APIs. The fmriprep API call is constructed with a number of flags and processed over a number of participant paths.

Parameters:
directory : string

The root BIDS data directory. All processed data will be placed in the 'derivatives' subfolder to remain BIDS compliant.

fmriprep : list

A list of strings for the flags used in the fmriprep pipeline.

participant : list

A list of indexes/strings to flag the subjects to process. An index will be interpreted as the unix-indexed subject, while a string will match the subject path, e.g. 0400. An empty list defaults to selecting all available participants.

cache : bool

If true, the preprocessed result is cached, allowing the pipeline to be interrupted.

Returns:
None
graphpype.utils.generateFlowchart(recipe, colorMap={'analysis': 'dodgerblue', 'postAnalysis': 'firebrick', 'postProcess': 'darkcyan', 'preProcess': 'gold'})

Generate a flow chart of a given recipe.

Parameters:
recipe : graphpype.pipe.Recipe
Returns:
flowchart : matplotlib.pyplot.plot
graphpype.utils.generateTemplate(name='generic', exampleFile='generic.py')

Autogenerate a template recipe to work from.

The recipe will automatically be written and there are several basic templates provided in the package. The template can be dynamically edited in Python using the package tools or the generated .json file can be edited by a text editor.

Parameters:
name : string

Defaults to generic.

Returns:
recipe : graphpype.pipe.Recipe

A recipe object used to construct an analysis pipeline.

See also

graphpype.pipe.Recipe.write

The live method of saving a recipe template.

Notes

Edits made in a live Python session should be saved with a call to the pipe.write function.

graphpype.utils.loadAnalysisChannel(*totalDataset, dataDirectory='', channel='', dataType='csv')

Loads saved data into a data channel at the analysis level.

Currently supports numpy tensor loading via CSV and npy file types.

Parameters:
totalDataset : graphpype.pipe.DataSet

A vector containing the datasets into which the data will be loaded. Typically length one, at the post-analysis level.

dataDirectory : string

Path to the data.

channel : string

Channel which the data should be associated with in the analysis.

dataType : string

How the data is saved.

Returns:
None

Data is modified in place.

Notes

The loading of numpy tensors is supported through the 'npy' and 'csv' strings in the dataType variable. If no data type is specified, Python will attempt to open the file, but no further support will be given on how to read the file, and it should be processed in a function downstream.
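The two supported paths reduce to standard NumPy loaders; a sketch (file names hypothetical):

    import numpy as np

    tensor_npy = np.load("data/derivatives/connectivity.npy")
    tensor_csv = np.loadtxt("data/derivatives/connectivity.csv", delimiter=",")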

graphpype.utils.loadObject(directory)

A simple loading function for pickled objects.

Parameters:
directory : string

The path of the saved object.

Returns:
obj : object
graphpype.utils.loadParcellation(*totalDataset, dataDirectory='', listFilterDirectory='', channel='', nameFilter=False)

Load a pre-parcellated dataset which may be filtered into multiple datasets.

Parameters:
totalDataset : list

The analysis dataset of the pipeline.

dataDirectory : string

The location of the processed data.

listFilterDirectory : string

The location of the dataset names to filter by. If empty or "" then sorting occurs numerically on the first index of the data.

channel : string

The channel key which the processed data will be assigned to, e.g. "corticalThick".

nameFilter : bool

When true, the processed data will be sorted into the datasets in the pipeline with matching names. Otherwise, sorting occurs by numerical index.

Returns:
None

Dataset objects are modified in place.

Notes

Data is often processed via a particular parcellation, recorded as part of a named subgroup, and combined into a single file. This function allows these files to be processed into multiple datasets with graph features determined by the processed parcellation, e.g. cortical thickness at a particular node. It is most useful when not working directly with imaging data. It is an analysis-level data loader that acts in the preprocessing stage of the pipeline, bypassing the datum processing level. Data is expected to come in the form of a matrix of floats at dataDirectory, a (possibly empty) list of strings at listFilterDirectory, and a pipeline dataset.

graphpype.utils.plots(*data, plotsDir)

Takes a plotting script and converts it into an operator.

graphpype.utils.saveObject(obj, directory)

A simple saving utility function for python objects using pickle.

Parameters:
obj : object

Object to be pickled/saved.

directory : string

Path for the pickled object to be saved.

Raises:
Warning

Alert the user to non-BIDS compliance.

Notes

For BIDS compliance it is recommended that the data be saved in the derivatives subfolder ‘derivatives/graphpype/objects’.

graphpype.utils.vectorSlice(tensor, index, axis)

Take an indexed axis slice out of a tensor and reshape into a single vector.

Parameters:
tensor : numpy.ndarray
index : tuple

The indexes through which the slices of the tensor should be taken.

axis : int

The dimension along which to slice.

Returns:
numpy.ndarray
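Equivalent behaviour sketched with NumPy primitives (values illustrative):

    import numpy as np

    tensor = np.arange(24).reshape(2, 3, 4)
    sliced = np.take(tensor, indices=(0, 2), axis=2)  # slices 0 and 2 on axis 2
    vector = sliced.reshape(-1)                       # reshape to a single vector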

Module contents