Index notation with macros

The main export and central piece of functionality of TensorOperations.jl is the @tensor macro:

TensorOperations.@tensor (Macro)
@tensor(tensor_expr; kwargs...)
@tensor [kw_expr...] tensor_expr

Specify one or more tensor operations using Einstein's index notation. Indices can be chosen to be arbitrary Julia variable names, or integers. When contracting several tensors together, this will be evaluated as pairwise contractions in left to right order, unless the so-called NCON style is used (positive integers for contracted indices and negative indices for open indices).

Additional keyword arguments may be passed to control the behavior of the parser:

  • order: A list of contraction indices of the form order=(...,), which specifies the order in which they will be contracted.
  • opt: Contraction order optimization, similar to @tensoropt. Can be either a boolean or an OptExpr.
  • contractcheck: Boolean flag to enable runtime check for contractibility of indices with clearer error messages.
  • costcheck: Can be either :warn or :cache and adds runtime checks to compare the compile-time contraction order to the optimal order computed for the actual run time tensor costs. If costcheck == :warn, warnings are printed for every sub-optimal contraction that is encountered. If costcheck == :cache, only the most costly run of a particular sub-optimal contraction will be cached in TensorOperations.costcache. In both cases, a suggestion for the order keyword argument is computed to switch to the optimal contraction order.
  • backend: Inserts an implementation backend as a final argument in the different tensor operation calls in the generated code.
  • allocator: Inserts an allocation strategy as a final argument in the tensor allocation calls in the generated code.

The functionality and configurability of @tensor and some of its relatives is explained in detail on this page.

The @tensor macro

The preferred way to specify (a sequence of) tensor operations is by using the @tensor macro, which accepts an index notation format, a.k.a. Einstein notation (and in particular, Einstein's summation convention).

This can most easily be explained using a simple example:

using TensorOperations
α = randn()
A = randn(5, 5, 5, 5, 5, 5)
B = randn(5, 5, 5)
C = randn(5, 5, 5)
D = zeros(5, 5, 5)
@tensor begin
    D[a, b, c] = A[a, e, f, c, f, g] * B[g, b, e] + α * C[c, a, b]
    E[a, b, c] := A[a, e, f, c, f, g] * B[g, b, e] + α * C[c, a, b]
end

The first important observation is the use of two different assignment operators within the body of the @tensor call. The regular assignment operator = stores the result of the tensor expression on the right hand side in an existing tensor D, whereas the 'definition' operator := causes a new tensor E with the correct properties to be created. Nonetheless, the contents of D and E will be equal.

Following Einstein's summation convention, those contents are computed in a number of steps involving the three primitive tensor operations. In this particular example, the first step involves tracing/contracting the 3rd and 5th index of array A, the result of which is stored in a temporary array that thus needs to be created. This resulting array is then contracted with array B by contracting its 2nd index with the last index of B and its last index with the first index of B. The result is stored in D in the first line, or in a newly allocated array which will end up being E in the second line. Note that the index order of D and E is such that their first index corresponds to the first index of A, their second index corresponds to the second index of B, and their third index corresponds to the fourth index of A. Finally, the array C (scaled with α) is added to this result (in place), which requires a further index permutation.
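To make the index bookkeeping concrete, the same contents can also be computed with explicit loops. The following is a plain verification sketch reusing the arrays defined above:

# Explicit-loop version of the same expression, for verification only
D2 = zeros(5, 5, 5)
for a in 1:5, b in 1:5, c in 1:5
    for e in 1:5, f in 1:5, g in 1:5
        D2[a, b, c] += A[a, e, f, c, f, g] * B[g, b, e]  # trace over f, contract e and g
    end
    D2[a, b, c] += α * C[c, a, b]  # permuted addition of the scaled array C
end
D2 ≈ D  # true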

The index pattern is analyzed at compile time and expanded to a set of calls to the basic tensor operations, i.e. tensoradd!, tensortrace! and tensorcontract!. Temporaries are created where necessary, as these building blocks operate pairwise on the input tensors. The generated code can easily be inspected:

using TensorOperations
@macroexpand @tensor E[a, b, c] := A[a, e, f, c, f, g] * B[g, b, e] + α * C[c, a, b]
quote
    var"##T_##E_A#231#232" = TensorOperations.scalartype(A)
    var"##E_A#231" = TensorOperations.tensoralloc_add(var"##T_##E_A#231#232", A, ((1, 4), (2, 6)), false, Val{true}())
    var"##E_A#231" = TensorOperations.tensortrace!(var"##E_A#231", A, ((1, 4), (2, 6)), ((3,), (5,)), false, VectorInterface.One(), VectorInterface.Zero())
    var"##T_E#233" = TensorOperations.promote_add(TensorOperations.promote_contract(TensorOperations.scalartype(A), TensorOperations.scalartype(B)), Base.promote_op(*, TensorOperations.scalartype(C), TensorOperations.scalartype(α)))
    E = TensorOperations.tensoralloc_contract(var"##T_E#233", var"##E_A#231", ((1, 2), (3, 4)), false, B, ((3, 1), (2,)), false, ((1, 3, 2), ()), Val{false}())
    E = TensorOperations.tensorcontract!(E, var"##E_A#231", ((1, 2), (3, 4)), false, B, ((3, 1), (2,)), false, ((1, 3, 2), ()), VectorInterface.One(), VectorInterface.Zero())
    TensorOperations.tensorfree!(var"##E_A#231")
    E
    E = TensorOperations.tensoradd!(E, C, ((2, 3, 1), ()), false, α, VectorInterface.One())
end

The different functions in which this tensor expression is decomposed are discussed in more detail in the Implementation section of this manual.

In this example, the tensor indices were labeled with arbitrary letters, but longer names could equally have been used. In fact, any valid Julia variable name constitutes a valid label. Note though that these labels are never interpreted as existing Julia variables. Within the @tensor macro they are converted into symbols and then used as dummy names, whose only role is to distinguish the different indices; their specific value bears no meaning. They also do not appear in the generated code, as illustrated above. This implies, in particular, that the specific tensor operations defined by the code inside the @tensor environment are completely specified at compile time. Various remarks regarding the index notation are in order.

  1. TensorOperations.jl only supports strict Einstein summation convention. This implies that there are two types of indices. Either an index label appears once in every term of the right hand side and also appears on the left hand side; we refer to the corresponding indices as open or free. Alternatively, an index label appears exactly twice within a given term on the right hand side; the corresponding indices are referred to as closed or contracted, i.e. the pair of indices takes equal values and is summed over its (equal) range. This is known as a contraction, either an outer contraction (between two indices of two different tensors) or an inner contraction (a.k.a. trace, between two indices of a single tensor). More liberal use of the index notation, such as simultaneous summation over three or more indices, or an open index appearing simultaneously in different tensor factors, is not supported by TensorOperations.jl (see the short example following this list).

  2. Aside from valid Julia identifiers, index labels can also be specified using literal integer constants or using a combination of integers and symbols. Furthermore, it is also allowed to use primes (i.e. Julia's adjoint operator) to denote different indices, including using multiple subsequent primes. The following expression thus computes the same result as the example above:

    @tensor D[å'', ß, clap'] = A[å'', 1, -3, clap', -3, 2] * B[2, ß, 1] + α * C[clap', å'', ß]
  3. If only integers are used for specifying index labels, this can be used to control the pairwise contraction order, by using the well-known NCON convention, where open indices in the left hand side are labelled by negative integers -1, -2, -3, whereas contraction indices are labelled with positive integers 1, 2, … Since the index order of the left hand side is in that case clear from the right hand side expression, the left hand side can be indexed with [:], which is automatically replaced with all negative integers appearing in the right hand side, in decreasing order. The value of the labels for the contraction indices determines the pairwise contraction order. If multiple tensors need to be contracted, a first temporary will be created consisting of the contraction of the pair of tensors that share contraction index 1, then the pair of tensors that share contraction index 2 (if not contracted away in the first pair) will be contracted, and so forth. The next subsection explains contraction order in more detail and gives some useful examples, as the example above only includes a single pair of tensors to be contracted.

  4. Index labels always appear in square brackets [ ... ] but can be separated either by commas, as in D[a, b, c] (yielding a :ref expression), or by spaces, as in D[a b c] (yielding a :typed_hcat expression).

    There is also the option to separate the indices into two groups using a semicolon. This can be useful for tensor types which have two distinct sets of indices, but has no effect when using Julia AbstractArray objects. While in principle both spaces and commas can be used within the two groups, e.g. as in D[a, b; c] or D[a b; c], there are some restrictions because of accepted Julia syntax. Both groups of indices should use the same convention. If there is only a single index in the first group, the second group should use spaces to constitute a valid expression. Finally, having no indices in the first group is only possible by writing an empty tuple; the second group can then use spaces, or also contain the indices as a tuple, i.e. both D[(); a b c] and D[(); (a, b, c)] are valid. Writing the two groups of indices as tuples (which use a comma as natural separator), separated by a semicolon, is always valid syntax, irrespective of the number of indices in each group.

  5. Index expressions [...] are only interpreted as index notation on the highest level. For example, if you want to multiply two matrices which are stored in a list, you can write

    @tensor result[i,j] := list[1][i,k] * list[2][k,j]

    However, if both are stored as slices of a 3-way array, you cannot write

    @tensor result[i,j] := list[i,k,1] * list[k,j,2]

    Rather, you should use

    @tensor result[i,j] := list[:,:,1][i,k] * list[:,:,2][k,j]

    or, if you want to avoid additional allocations

    @tensor result[i,j] := view(list,:,:,1)[i,k] * view(list,:,:,2)[k,j]
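Returning to the first remark, the strictness of the Einstein convention can be illustrated with a small sketch (the commented-out line would raise an error during macro expansion):

using TensorOperations
M = randn(3, 3); v = randn(3); w = randn(3)

# Valid: `k` appears exactly twice within the term on the right hand side
@tensor d[i] := M[i, k] * v[k]

# Invalid under the strict convention: `k` would appear three times
# @tensor d[i] := M[i, k] * v[k] * w[k]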

Note, finally, that the @tensor specifier can be put in front of a single tensor expression, or in front of a begin ... end block to group and evaluate different expressions at once. Within an @tensor begin ... end block, the @notensor macro can be used to annotate indexing expressions that need to be interpreted literally.

TensorOperations.@notensor (Macro)
@notensor(block)

Marks a block which should be ignored within an @tensor environment. Has no effect outside of @tensor.


As an illustration, note that the previous code example, multiplying two matrices stored as slices of a 3-way array, can now also be written as

@tensor begin
    @notensor A = list[:,:,1]
    @notensor B = list[:,:,2]
    result[i,j] = A[i,k] * B[k,j]
end

Contraction order specification and optimisation

A contraction of several (more than two) tensors, as in

@tensor D[a, d, j, i, g] := A[a, b, c, d, e] * B[b, e, f, g] * C[c, f, i, j]

is known as a tensor network and is generically evaluated as a sequence of pairwise contractions. In the example above, this contraction is evaluated using Julia's default left to right order, i.e. as (A[a, b, c, d, e] * B[b, e, f, g]) * C[c, f, i, j]. There are however different strategies to modify this order.

  1. Explicit parentheses can be used to group subnetworks within a tensor network that will be evaluated first. Parentheses around subexpressions are always respected by the @tensor macro (see the sketch following this list).

  2. As explained in the previous subsection, if one respects the NCON convention of specifying indices, i.e. positive integers for the contracted indices and negative integers for the open indices, the different factors will be reordered so that the pairwise tensor contractions contract over indices with smaller integer labels first. For example,

    @tensor D[:] := A[-1, 3, 1, -2, 2] * B[3, 2, 4, -5] * C[1, 4, -4, -3]

    will be evaluated as (A[-1, 3, 1, -2, 2] * C[1, 4, -4, -3]) * B[3, 2, 4, -5]. Furthermore, in that case the indices of the output tensor (D in this case) do not need to be specified (using [:] instead), and will be chosen as (-1, -2, -3, -4, -5). Note that if two tensors are contracted, all contraction indices among them will be contracted, even if there are additional contraction indices whose label is a higher positive number. For example,

    @tensor D[:] := A[-1, 3, 2, -2, 1] * B[3, 1, 4, -5] * C[2, 4, -4, -3]

    amounts to the original left to right order, because A and B share the first contraction index 1. When A and B are contracted, the contraction with label 3 will also be performed, even though the contraction index with label 2 has not yet been 'processed'.

  3. A specific contraction order can be manually specified by supplying an order keyword argument to the @tensor macro. The value is a tuple of the contraction indices in the order that they should be dealt with, e.g. the default order could be changed to first contract A with C using

    @tensor order=(c, b, e, f) begin
        D[a, d, j, i, g] := A[a, b, c, d, e] * B[b, e, f, g] * C[c, f, i, j]
    end

    Here, the same comment as for the NCON style applies: once two tensors are contracted because they share the index label that is next in the order list, all other indices shared among them will be contracted as well, irrespective of their position in that list.
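As a sketch of the first point above, explicit parentheses force a particular grouping (same tensors as in the running example):

# Force the contraction of B with C before contracting with A
@tensor D[a, d, j, i, g] := A[a, b, c, d, e] * (B[b, e, f, g] * C[c, f, i, j])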

In the case of more complex tensor networks, the optimal contraction order cannot always be easily guessed or determined on plain sight. It is then useful to be able to optimize the contraction order automatically, given a model for the complexity of contracting the different tensors in a particular order. This functionality is provided, with a cost function that models the computational complexity by counting the number of scalar multiplications. This minimisation problem is solved using the algorithm described in Physical Review E 90, 033315 (2014). For a given tensor network contraction, this algorithm is run once at compile time, while lowering the tensor expression, and the outcome is hard coded into the expression resulting from the macro expansion. While the computational complexity of this optimisation algorithm itself scales exponentially in the number of tensors involved in the network, it should still be acceptably fast (milliseconds up to a few seconds at most) for tensor network contractions with up to around 30 tensors. Information about the optimization process can be obtained during compilation by using the alternative macro @tensoropt_verbose.

As the cost is determined at compile time, the cost model does not use actual tensor properties (e.g. size(A, i) in the case of arrays). Instead, the cost or extent associated with every index can be specified in various ways: using integers, floating point numbers, or some arbitrary univariate polynomial of an abstract variable, e.g. χ. In the latter case, the optimization assumes the asymptotic limit of large χ.

TensorOperations.@tensoropt (Macro)
@tensoropt(optex, block)
@tensoropt(block)

Specify one or more tensor operations using Einstein's index notation. Indices can be chosen to be arbitrary Julia variable names, or integers. When contracting several tensors together, the macro will determine (at compile time) the optimal contraction order depending on the cost associated to the individual indices. If no optex is provided, all indices are assumed to have an abstract scaling χ which is optimized in the asymptotic limit of large χ.

The cost can be specified in the following ways:

# cost χ for all indices (a, b, c, d, e, f, g)
@tensoropt D[a, b, c, d] := A[a, e, c, f] * B[g, d, e] * C[g, f, b]

# asymptotic cost χ for indices (a, b, c, e); other indices (d, f, g, h) have cost 1
@tensoropt (a, b, c, e) C[a, b, c, d] := A[a, e, c, f, h] * B[f, g, e, b] * C[g, d, h]

# cost 1 for indices (a, b, c, e); other indices (d, f, g, h) have asymptotic cost χ
@tensoropt !(a, b, c, e) C[a, b, c, d] := A[a, e, c, f, h] * B[f, g, e, b] * C[g, d, h]

# cost as specified for listed indices, unlisted indices have cost 1 (any symbol for χ can be used)
@tensoropt (a => χ, b => χ^2, c => 2 * χ, e => 5) begin
    C[a, b, c, d] := A[a, e, c, f, h] * B[f, g, e, b] * C[g, d, h]
end

Note that @tensoropt will optimize any tensor contraction sequence it encounters in the (block of) expressions. It will however not break apart expressions that have been explicitly grouped with parentheses, i.e. in

@tensoropt C[a, b, c, d] := A[a, e, c, f, h] * (B[f, g, e, b] * C[g, d, h])

it will always contract B and C first. For a single tensor contraction sequence, the optimal contraction order and associated (asymptotic) cost can be obtained using @optimalcontractiontree.


As a final remark, the optimization can also be accessed directly from @tensor by specifying the additional keyword argument opt=true, which will then use the default cost model, or opt=optex to further specify the costs.

# cost χ for all indices (a, b, c, d, e, f, g)
@tensor opt=true D[a, b, c, d] := A[a, e, c, f] * B[g, d, e] * C[g, f, b]

# cost χ for indices (a, b, c, e); other indices (d, f, g) have cost 1
@tensor opt=(a, b, c, e) D[a, b, c, d] := A[a, e, c, f] * B[g, d, e] * C[g, f, b]

# cost 1 for indices (a, b, c, e); other indices (d, f, g) have cost χ
@tensor opt=!(a, b, c, e) D[a, b, c, d] := A[a, e, c, f] * B[g, d, e] * C[g, f, b]

# cost as specified for listed indices, unlisted indices have cost 1 (any symbol for χ can be used)
@tensor opt=(a => χ, b => χ^2, c => 2 * χ, e => 5) begin
    D[a, b, c, d] := A[a, e, c, f] * B[g, d, e] * C[g, f, b]
end

Dynamical tensor network contractions with ncon and @ncon

Tensor network practitioners are probably more familiar with the network contractor function ncon to perform a tensor network contraction, as e.g. described in NCON. In particular, a graphical application TensorTrace was recently introduced to facilitate the generation of such ncon calls. TensorOperations.jl provides compatibility with this interface by also exposing an ncon function with the same basic syntax

ncon(list_of_tensor_objects, list_of_index_lists)

e.g. the example from above is equivalent to

@tensor D[:] := A[-1, 3, 1, -2, 2] * B[3, 2, 4, -5] * C[1, 4, -4, -3]
D ≈ ncon((A, B, C), ([-1, 3, 1, -2, 2], [3, 2, 4, -5], [1, 4, -4, -3]))

where the lists of tensor objects and of index lists can be given as a vector or a tuple. The ncon function necessarily needs to analyze the contraction pattern at runtime, but this can be an advantage in cases where the contraction is determined by runtime information and thus not known at compile time. A downside of this, besides the fact that it can result in some overhead (though this is typically negligible for anything but very small tensor contractions), is that ncon is type-unstable, i.e. its return type cannot be inferred by the Julia compiler.

The full call syntax of the ncon method exposed by TensorOperations.jl is

ncon(tensorlist, indexlist, [conjlist]; order=..., output=...)

where the first two arguments are as described above. Let us first discuss the keyword arguments. The keyword argument order can be used to change the contraction order, i.e. by specifying which contraction indices need to be processed first, rather than the strictly increasing order [1, 2, ...], as discussed in the previous subsection. The keyword argument output can be used to specify the order of the output indices, when it is different from the default [-1, -2, ...].
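For instance, a hedged sketch reusing the tensors A, B and C from the network example above:

# Contract the index labelled 3 first, and permute the open indices of the output
E = ncon((A, B, C), ([-1, 3, 1, -2, 2], [3, 2, 4, -5], [1, 4, -4, -3]);
         order = [3, 1, 2, 4], output = [-2, -1, -3, -4, -5])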

The optional positional argument conjlist is a list of Bool values that indicate whether the corresponding tensor needs to be conjugated in the contraction. So while

ncon([A, conj(B), C], [[-1, 3, 1, -2, 2], [3, 2, 4, -5], [1, 4, -4, -3]]) ≈
ncon([A, B, C], [[-1, 3, 1, -2, 2], [3, 2, 4, -5], [1, 4, -4, -3]], [false, true, false])

the latter has the advantage that conjugating B is not an extra step (which would create an additional temporary requiring allocations), but is performed as part of the contraction itself.

As an alternative solution to the optional positional arguments, there is also an @ncon macro. It is just a simple wrapper over an ncon call and thus does not analyze the indices at compile time, so that they can be fully dynamic. However, it will transform

@ncon([A, conj(B), C], indexlist; order=..., output=...)

into

ncon(Any[A, B, C], indexlist, [false, true, false]; order=..., output=...)

so as to get the advantages of just-in-time conjugation (pun intended) using the familiar looking ncon syntax.

TensorOperations.ncon (Function)
ncon(tensorlist, indexlist, [conjlist, sym]; order = ..., output = ..., backend = ..., allocator = ...)

Contract the tensors in tensorlist (of type Vector or Tuple) according to the network as specified by indexlist. Here, indexlist is a list (i.e. a Vector or Tuple) with the same length as tensorlist, whose entries are themselves lists (preferably Vector{Int}) where every integer entry provides a label for the corresponding index/dimension of the corresponding tensor in tensorlist. Positive integers are used to label indices that need to be contracted, and thus appear in two different entries within indexlist, whereas negative integers are used to label indices of the output tensor, and should appear only once.

The optional argument conjlist is another list of the same length, whose entries are of type Bool and indicate whether the corresponding tensor object should be conjugated (true) or not (false). The default is false for all entries.

By default, contractions are performed in the order such that the indices being contracted over are labelled by increasing integers, i.e. first the contraction corresponding to label 1 is performed. The output tensor has an index order corresponding to decreasing (negative, so increasing in absolute value) index labels. The keyword arguments order and output allow to change these defaults.

See also the macro version @ncon.

TensorOperations.@ncon (Macro)
@ncon(tensorlist, indexlist; order = ..., output = ...)

Contract the tensors in tensorlist (of type Vector or Tuple) according to the network as specified by indexlist. Here, indexlist is a list (i.e. a Vector or Tuple) with the same length as tensorlist, whose entries are themselves lists (preferably Vector{Int}) where every integer entry provides a label for the corresponding index/dimension of the corresponding tensor in tensorlist. Positive integers are used to label indices that need to be contracted, and thus appear in two different entries within indexlist, whereas negative integers are used to label indices of the output tensor, and should appear only once.

By default, contractions are performed in the order such that the indices being contracted over are labelled by increasing integers, i.e. first the contraction corresponding to label 1 is performed. The output tensor has an index order corresponding to decreasing (negative, so increasing in absolute value) index labels. The keyword arguments order and output allow to change these defaults.

The advantage of the macro @ncon over the function call ncon is that, if tensorlist is not just some variable but an actual list (as a tuple with parentheses or a vector with square brackets) at the call site, the @ncon macro will scan for conjugation calls, e.g. conj(A), and replace this with just A but build a matching list of conjugation flags to be specified to ncon. This makes it more convenient to specify tensor conjugation, without paying the cost of actively performing the conjugation beforehand.

See also the function ncon.


Index compatibility and checks

Errors in the index pattern are caught at compile time, and will spawn an error of type TensorOperations.IndexError during macro expansion.

Once the expression is compiled, there can still be mismatches in the actual tensor data that lead to runtime errors, the type of which depends on the specific type of tensor (e.g. DimensionMismatch for AbstractArray objects). Indeed, indices with the same label, either open indices on the two sides of the equation, or contracted indices, need to be compatible. For AbstractArray objects, this means they must have the same size. Other tensor types might have more complicated structures associated with their indices, and require those structures to match. The function checkcontractible is part of the interface that can be used to control when tensors can be contracted with each other along specific indices.

For example, for AbstractArray objects, the following helper methods are defined that capture the recurring checks across different implementations of the tensor operations:

TensorOperations.argcheck_tensoradd (Function)
argcheck_tensoradd(C::AbstractArray, A::AbstractArray, pA::Index2Tuple)

Check that C and A have numind(pA) indices and that pA constitutes a valid permutation.

TensorOperations.argcheck_tensortrace (Function)
argcheck_tensortrace(C::AbstractArray, A::AbstractArray, p::Index2Tuple, q::Index2Tuple)

Check that the partial trace of A over indices q and with permutation of the remaining indices p is compatible with output C.

TensorOperations.argcheck_tensorcontract (Function)
argcheck_tensorcontract(C::AbstractArray, A::AbstractArray, pA::Index2Tuple, B::AbstractArray, pB::Index2Tuple, pAB::Index2Tuple)

Check that C, A with pA, and B with pB, together with pAB, have compatible indices and numbers of dimensions.

TensorOperations.dimcheck_tensortrace (Function)
dimcheck_tensortrace(C::AbstractArray, A::AbstractArray,
                     p::Index2Tuple, q::Index2Tuple)

Check that C and A have compatible sizes for the trace and addition specified by p and q.

TensorOperations.dimcheck_tensorcontract (Function)
dimcheck_tensorcontract(C::AbstractArray,
                        A::AbstractArray, pA::Index2Tuple,
                        B::AbstractArray, pB::Index2Tuple,
                        pAB::Index2Tuple)

Check that C, A and B have compatible sizes for the contraction specified by pA, pB and pAB.

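As a rough, purely illustrative sketch of how such helpers are meant to be combined with the primitive operations (the wrapper name checked_tensoradd! is hypothetical):

using TensorOperations

# Validate the index counts and the permutation pA before delegating to the
# actual primitive operation; pA is an Index2Tuple such as ((2, 3, 1), ())
function checked_tensoradd!(C::AbstractArray, A::AbstractArray, pA,
                            conjA::Bool, α::Number, β::Number)
    TensorOperations.argcheck_tensoradd(C, A, pA)
    return TensorOperations.tensoradd!(C, A, pA, conjA, α, β)
end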

Whenever an incompatible index structure is detected at runtime, the resulting error will originate deep within the implementation, at which point the error message provides little information as to which specific tensors and which indices are producing the mismatch. When debugging, it might be useful to add the keyword argument contractcheck = true to the @tensor macro. This enables explicit checks using checkcontractible, which are run before any tensor operation is performed. When a mismatch is detected, these checks still have access to the label information and spawn a more informative error message.
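As a minimal illustration (the sizes are deliberately incompatible, and the exact error message may differ between versions):

A = randn(3, 4)
B = randn(5, 6)  # the dimension of the contracted index `k` (5) does not match that of A (4)
@tensor contractcheck=true C[i, j] := A[i, k] * B[k, j]
# ERROR: ... the message refers to the offending index label `k`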

A different type of check is the costcheck keyword argument, which can be given the values :warn or :cache. With either value, additional checks are inserted that compare the contraction order of any tensor contraction of three or more factors against the optimal order based on the current tensor sizes. More generally, the function tensorcost is part of the interface and associates a cost value with every index of a tensor, which is then used in the cost model. With costcheck = :warn, a warning will be spawned for every tensor network where the actual contraction order (even when optimized using abstract costs) does not match the ideal contraction order given the current tensorcost values. With costcheck = :cache, the tensor networks with non-optimal contraction order are stored in a global package variable TensorOperations.costcache. However, when a tensor network is evaluated several times with different tensor sizes or tensor costs, only the evaluation giving rise to the largest total contraction cost for that network will appear in the cache (provided the actual contraction order deviates from the optimal order in that largest case).
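A sketch of costcheck in action, with (hypothetical) sizes chosen such that the default left-to-right order is sub-optimal:

A = randn(100, 2); B = randn(2, 100); C = randn(100, 2)
# Left to right evaluates (A*B)*C; for these sizes A*(B*C) is much cheaper,
# so a warning with a suggested `order` keyword is printed at runtime
@tensor costcheck=:warn D[a, d] := A[a, b] * B[b, c] * C[c, d]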

Backends, multithreading and GPUs

Every index expression will be evaluated as a sequence of elementary tensor operations, i.e. permuted additions, partial traces and contractions, which are implemented for strided arrays as discussed in Package features. In particular, optimised implementations rely on Strided.jl, and we refer to this package for a full specification of which arrays are supported. As a rule of thumb, this primarily includes Arrays from Julia base, as well as views thereof if sliced with a combination of Integers and Ranges. Special types such as Adjoint and Transpose from Base are also supported. For permuted addition and partial traces, native Julia implementations are used which could benefit from multithreading if JULIA_NUM_THREADS>1.
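For example, range-based views remain strided and are handled efficiently; a small sketch:

using TensorOperations
A = randn(10, 10, 10)
V = view(A, :, 1:5, 2:2:10)  # a strided view, supported without copying
@tensor t[i] := V[i, j, j]   # partial trace over the last two indices of the view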

The binary contraction is performed by first permuting the two input tensors into a form such that the contraction becomes equivalent to one matrix multiplication on the whole data, followed by a final permutation to bring the indices of the output tensor into the desired order. This approach allows the use of the highly efficient matrix multiplication kernel (gemm) from BLAS, which is multithreaded by default. There is also a native contraction implementation that is used e.g. for arrays with an eltype that is not <:LinearAlgebra.BlasFloat. It performs the contraction directly, without the additional permutations, but still in a cache-friendly and multithreaded way (again relying on JULIA_NUM_THREADS > 1). This implementation can also be used for BlasFloat types (but will typically be slower), and the use of BLAS can be controlled by explicitly switching the backend between StridedBLAS and StridedNative using the backend keyword to @tensor. Similarly, different allocation strategies, when available, can be selected using the allocator keyword of @tensor.
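For example, a hedged sketch of selecting a backend explicitly (assuming the StridedBLAS and StridedNative backend objects are accessible as written; see the Backends and Allocators section for the exact names in your version):

using TensorOperations
A = randn(10, 10, 10); B = randn(10, 10)
# Use the BLAS-based backend explicitly ...
@tensor backend=StridedBLAS() C[i, j, k] := A[i, j, l] * B[l, k]
# ... or force the native Julia contraction kernel
@tensor backend=StridedNative() C2[i, j, k] := A[i, j, l] * B[l, k]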

Since TensorOperations v5, there are also fallback implementations for non-strided arrays that only use functionality from Base, as well as LinearAlgebra.mul!. The manual section Backends and Allocators provides a more elaborate discussion.

The primitive tensor operations are also implemented for CuArray objects of the CUDA.jl library. This implementation is essentially a simple wrapper over the cuTENSOR library of NVIDIA, and will only be loaded when the cuTENSOR.jl package is loaded. The @tensor macro will then automatically work for operations between GPU arrays.

Mixed operations between host arrays (e.g. Array) and device arrays (e.g. CuArray) will fail. However, if one wants to harness the computing power of the GPU to perform all tensor operations, there is a dedicated macro @cutensor. This will transfer all host arrays to the GPU before performing the requested operations. If the output is an existing host array, the result will be copied back. If a new result array is created (i.e. using :=), it will remain on the GPU device and it is up to the user to transfer it back. Arrays are transferred to the GPU just before they are first used, and in a complicated tensor expression, this might have the benefit that transfer of the later arrays overlaps with computation of earlier operations.
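A hedged sketch of both usage modes (requires the CUDA.jl and cuTENSOR.jl packages; sizes are arbitrary):

using TensorOperations, CUDA, cuTENSOR

# Device arrays: @tensor dispatches to cuTENSOR automatically
Ad = CUDA.randn(16, 16, 16); Bd = CUDA.randn(16, 16)
@tensor Cd[i, j, k] := Ad[i, j, l] * Bd[l, k]

# Host arrays: @cutensor transfers them to the GPU first; Cd2 stays on the device
Ah = randn(16, 16, 16); Bh = randn(16, 16)
@cutensor Cd2[i, j, k] := Ah[i, j, l] * Bh[l, k]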

TensorOperations.@cutensor (Macro)
@cutensor tensor_expr

Use the GPU to perform all tensor operations, through the use of the cuTENSOR library. This will transfer all arrays to the GPU before performing the requested operations. If the output is an existing host array, the result will be transferred back. If a new array is created (i.e. using :=), it will remain on the GPU device and it is up to the user to transfer it back. This macro is equivalent to @tensor backend=cuTENSORBackend() allocator=CUDAAllocator() tensor_expr.

Note

This macro requires the cuTENSOR library to be installed and loaded. This can be achieved by running using cuTENSOR or import cuTENSOR before using the macro.
