Fast tensor operations using a convenient Einstein index notation.
- Index notation with macros
- Contraction order and `@tensoropt`
- Dynamical tensor network contractions with `ncon` and `@ncon`
- Multithreading and GPU evaluation of tensor contractions
- Cache for temporaries
Install with the package manager: `pkg> add TensorOperations`.
- A macro `@tensor` for conveniently specifying tensor contractions and index permutations via Einstein's index notation convention. The index notation is analyzed at compile time.
- Ability to optimize pairwise contraction order using the `@tensoropt` macro. This optimization is performed at compile time, and the resulting contraction order is hard-coded into the generated expression. The similar macro `@tensoropt_verbose` provides more information on the optimization process.
- New: a function `ncon` (for network contractor) for contracting a group of tensors (a.k.a. a tensor network), as well as a corresponding `@ncon` macro that simplifies and optimizes this slightly. Unlike the previous macros, `ncon` and `@ncon` do not analyze the contractions at compile time, thus allowing them to deal with dynamic networks or index specifications.
- Support for any Julia Base array which qualifies as strided, i.e. such that its entries are laid out according to a regular pattern in memory. The only exceptions are `ReinterpretedArray` objects (implementation provided by Strided.jl, see below). Additionally, `Diagonal` objects whose underlying diagonal data is stored as a strided vector are supported. This facilitates tensor contractions where one of the operands is e.g. a diagonal matrix of singular values or eigenvalues, which are typically returned as a vector.
- New: Support for `CuArray` objects if used together with CuArrays.jl, by relying on (and thus providing a high-level interface into) NVIDIA's cuTENSOR library.
- Implementation can easily be extended to other types, by overloading a small set of methods.
- Efficient implementation of a number of basic tensor operations (see below), by relying on Strided.jl and `gemm` from BLAS for contractions. The latter is optional but on by default; it can be controlled by a package-wide setting via `disable_blas()`. If BLAS is disabled or cannot be applied (e.g. non-matching or non-standard numerical types), Strided.jl is also used for the contraction.
- A package-wide cache for storing temporary arrays that are generated when evaluating complex tensor expressions within the `@tensor` macro (based on the implementation of LRUCache). By default, the cache is allowed to use up to 1 GB or 25% of the total memory, whichever is smaller.
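The macros and `ncon` described in the list above can be illustrated with a minimal sketch. It assumes a TensorOperations.jl version that exports `@tensor`, `@tensoropt` and `ncon`; the array names and sizes are chosen purely for illustration and do not come from the package.

```julia
using TensorOperations

# Illustrative arrays (names and sizes are assumptions for this sketch).
A = randn(4, 5, 6)
B = randn(6, 5)

# @tensor: indices b and c appear twice and are summed over;
# `:=` allocates a new output array C labelled by the free index a.
@tensor C[a] := A[a, b, c] * B[c, b]

# Plain-Julia reference for the same sum.
Cref = [sum(A[a, b, c] * B[c, b] for b in 1:5, c in 1:6) for a in 1:4]

X = randn(3, 8)
Y = randn(8, 8)
Z = randn(8, 2)

# @tensoropt: same notation, but the pairwise contraction order is
# chosen at compile time instead of being taken from the expression.
@tensoropt D[a, d] := X[a, b] * Y[b, c] * Z[c, d]

# ncon: the network is specified at run time. Positive labels are
# contracted, negative labels are open indices of the result
# (-1 is the first output index, -2 the second).
E = ncon([X, Y, Z], [[-1, 1], [1, 2], [2, -2]])
```

All three results can be checked against ordinary matrix products, e.g. `D ≈ X * Y * Z` and `E ≈ X * Y * Z`.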
TensorOperations.jl is centered around 3 basic tensor operations, i.e. primitives into which every more complicated tensor expression is decomposed.
addition: Add a (possibly scaled) version of one array to another array, where the indices of both arrays might appear in different orders. This operation combines normal array addition and index permutation. It includes as a special case copying one array into another with permuted indices.
The actual implementation is provided by Strided.jl, which contains multithreaded implementations and cache-friendly blocking strategies for optimal efficiency.
trace or inner contraction: Perform a trace/contraction over pairs of indices of an array, where the result is a lower-dimensional array. As before, the actual implementation is provided by Strided.jl.
contraction: Perform a general contraction of two tensors, where some indices of one array are paired with corresponding indices in a second array. This is typically handled by first permuting (a.k.a. transposing) and reshaping the two input arrays such that the contraction becomes equivalent to a matrix multiplication, which is then performed by the highly efficient `gemm` method from BLAS. The resulting array might need another reshape and index permutation to bring it into its final form. Alternatively, a native Julia implementation that does not require the additional transpositions (yet is typically slower) can be selected by calling `disable_blas()`.
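As a rough illustration of the three primitives, the sketch below writes each one with `@tensor` and compares it with a plain-Julia reference. The array names and sizes are assumptions made for this example. The permute-and-reshape reference for the contraction only mirrors the strategy described above; it is not the package's internal code.

```julia
using TensorOperations

# 1. Addition with index permutation: B is brought into A's index
#    order, scaled by 2 and added.
A = randn(3, 4, 5)
B = randn(5, 3, 4)
@tensor S[a, b, c] := A[a, b, c] + 2 * B[c, a, b]
Sref = A + 2 * permutedims(B, (2, 3, 1))

# 2. Trace (inner contraction): the repeated index a is summed over,
#    leaving a lower-dimensional array.
T = randn(4, 6, 4)
@tensor t[b] := T[a, b, a]
tref = [sum(T[a, b, a] for a in 1:4) for b in 1:6]

# 3. Contraction of two tensors over the shared indices b and c.
M = randn(3, 4, 5)
N = randn(5, 4, 2)
@tensor K[a, d] := M[a, b, c] * N[c, b, d]

# Reference: permute N so its contracted indices are ordered (b, c)
# like in M, reshape both operands to matrices, then multiply.
Mmat = reshape(M, 3, 4 * 5)
Nmat = reshape(permutedims(N, (2, 1, 3)), 4 * 5, 2)
Kref = Mmat * Nmat
```

In each case the `@tensor` result should agree with the reference up to floating-point rounding, e.g. `K ≈ Kref`.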
- Make it easier to check contraction order and to splice in runtime information, or optimize based on memory footprint or other custom cost functions.