# TensorOperations.jl
Fast tensor operations using a convenient Einstein index notation.
## Table of contents

- TensorOperations.jl
- Index notation with macros
- The `@tensor` macro
- Contraction order and the `@tensoropt` macro
- Dynamical tensor network contractions with `ncon` and `@ncon`
- Multithreading and GPU evaluation of tensor contractions with `@cutensor`
- Functions
- Cache for temporaries
- Implementation
## Installation

Install with the package manager: `pkg> add TensorOperations`.
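Equivalently, from a script or the Julia REPL:

```julia
using Pkg
Pkg.add("TensorOperations")
```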
## Package features
- A macro `@tensor` for conveniently specifying tensor contractions and index permutations via Einstein's index notation convention. The index notation is analyzed at compile time (see the example after this list).
- Ability to optimize the pairwise contraction order using the `@tensoropt` macro. This optimization is performed at compile time, and the resulting contraction order is hard coded into the resulting expression. The similar macro `@tensoropt_verbose` provides more information on the optimization process.
- A function `ncon` (for network contractor) for contracting a group of tensors (a.k.a. a tensor network), as well as a corresponding `@ncon` macro that simplifies and optimizes this slightly. Unlike the previous macros, `ncon` and `@ncon` do not analyze the contractions at compile time, thus allowing them to deal with dynamic networks or index specifications.
- Support for any Julia Base array which qualifies as strided, i.e. such that its entries are laid out according to a regular pattern in memory. The only exception is `ReinterpretArray` objects (implementation provided by Strided.jl, see below). Additionally, `Diagonal` objects whose underlying diagonal data is stored as a strided vector are supported. This facilitates tensor contractions where one of the operands is e.g. a diagonal matrix of singular values or eigenvalues, which are returned as a `Vector` by Julia's `eigen` or `svd` methods.
- Support for `CuArray` objects if used together with CUDA.jl, by relying on (and thus providing a high-level interface into) NVidia's cuTENSOR library. The implementation can easily be extended to other types, by overloading a small set of methods.
- Efficient implementation of a number of basic tensor operations (see below), by relying on Strided.jl and `gemm` from BLAS for contractions. The latter is optional but on by default; it can be controlled by a package-wide setting via `enable_blas()` and `disable_blas()`. If BLAS is disabled or cannot be applied (e.g. non-matching or non-standard numerical types), Strided.jl is also used for the contraction.
- A package-wide cache for storing temporary arrays that are generated when evaluating complex tensor expressions within the `@tensor` macro (based on the implementation of LRUCache). By default, the cache is allowed to use up to the minimum of either 1GB or 25% of the total machine memory.
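As a quick illustration of the first three features, the following sketch (with hypothetical random tensors `A`, `B` and `W`) contracts a small network once with `@tensor`, once with `@tensoropt`, and once dynamically with `ncon`:

```julia
using TensorOperations

A = randn(3, 4, 5)
B = randn(5, 4, 2)
W = randn(2, 6)

# Einstein notation: the repeated indices b and c are summed over;
# := allocates a new output array.
@tensor C[a, d] := A[a, b, c] * B[c, b, d]

# For networks of three or more tensors, @tensoropt additionally
# optimizes the pairwise contraction order at compile time.
@tensoropt D[a, e] := A[a, b, c] * B[c, b, d] * W[d, e]

# The same contraction as C, specified dynamically with ncon: positive
# labels are contracted, negative labels are open indices of the result.
C2 = ncon([A, B], [[-1, 1, 2], [2, 1, -2]])
C ≈ C2   # true
```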
## Tensor operations
TensorOperations.jl is centered around three basic tensor operations, i.e. the primitives into which every more complicated tensor expression is decomposed.
addition: Add a (possibly scaled version of) one array to another array, where the indices of both arrays may appear in different orders. This operation combines normal array addition and index permutation. It includes, as a special case, copying one array into another with permuted indices. The actual implementation is provided by Strided.jl, which contains multithreaded implementations and cache-friendly blocking strategies for optimal efficiency.
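For instance, a minimal sketch of the addition primitive, with hypothetical arrays and index labels:

```julia
using TensorOperations

A = randn(3, 4, 5)
B = randn(5, 3, 4)

# Same labels on both sides, in a different order: a scaled, permuted
# copy of A is added to B (no indices are summed over).
@tensor B[c, a, b] += 2.0 * A[a, b, c]

# Special case: plain assignment copies A into B with permuted indices.
@tensor B[c, a, b] = A[a, b, c]
```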
trace or inner contraction: Perform a trace/contraction over pairs of indices of an array, where the result is a lower-dimensional array. As before, the actual implementation is provided by Strided.jl.
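A minimal sketch, again with a hypothetical array:

```julia
using TensorOperations

A = randn(3, 4, 4)

# The repeated label j on the right-hand side is traced over, so the
# three-dimensional A reduces to a one-dimensional result.
@tensor t[i] := A[i, j, j]
```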
contraction: Perform a general contraction of two tensors, where some indices of one array are paired with corresponding indices of the second array. This is typically handled by first permuting (a.k.a. transposing) and reshaping the two input arrays such that the contraction becomes equivalent to a matrix multiplication, which is then performed by the highly efficient `gemm` method from BLAS. The resulting array might need another reshape and index permutation to bring it into its final form. Alternatively, a native Julia implementation that does not require the additional permutations (yet is typically slower) can be selected by using `disable_blas()`.
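This mapping onto a matrix multiplication can be made explicit by hand. The following sketch, with hypothetical shapes, contracts `A[a, b, c]` with `B[c, b, d]` over `b` and `c`, and checks the result against `@tensor`:

```julia
using TensorOperations

A = randn(3, 4, 5)   # indices (a, b, c)
B = randn(5, 4, 2)   # indices (c, b, d)

# Permute and reshape so the contracted indices (c, b) form one matrix
# dimension; the plain matrix product then dispatches to BLAS gemm.
Amat = reshape(permutedims(A, (1, 3, 2)), 3, 5 * 4)   # a × (c, b)
Bmat = reshape(B, 5 * 4, 2)                           # (c, b) × d
C = Amat * Bmat

@tensor Cref[a, d] := A[a, b, c] * B[c, b, d]
C ≈ Cref   # true
```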
## To do list
- Make it easier to check contraction order and to splice in runtime information, or optimize based on memory footprint or other custom cost functions.