
Tutorial

This tutorial introduces basic TensorX concepts.

Prerequisites

TensorX is a machine learning library for building neural network models, written in Python, that works as a complement to TensorFlow. To make the most out of this tutorial (and this library), readers should be familiar with the following:

  • Python 3: if you're new to the Python language or need to refresh some concepts, check out the Python Tutorial.
  • Tensorflow: TensorFlow is a high-performance machine learning library that allows numerical computation on GPUs and TPUs. It was originally designed to build auto-differentiable dataflow graphs. Although it adopted an eager execution model in version 2, computation graphs are still an integral part of defining and deploying high-performance TensorFlow programs. Unfortunately, most TensorFlow tutorials and guides focus on Keras, a high-level interface similar to TensorX. For a primer on TensorFlow itself, I recommend taking a look at the TensorFlow basics guide instead.

  • NumPy: like the core of TensorFlow, NumPy is a numerical computation library with a focus on multi-dimensional array transformations. Given that TensorFlow tensors are converted to and from NumPy arrays, and TensorX also depends on NumPy (mostly for testing), it is recommended that readers be familiar with NumPy basics. For more details, check the NumPy documentation.

Installation

You can install tensorx with pip as follows:

pip install tensorflow
pip install tensorx

For more details, see the installation documentation.

Layers

In TensorX, a Layer is the basic building block of a neural network. Semantically speaking, a layer is an object that can have multiple inputs, an inner state, and a computation function applied to its inputs (which depends on the current state). Each layer has a single output. In essence, we can say that a Layer instance is a stateful function. Connecting a series of layers results in a layer graph. In TensorX, each layer is the end node of a subgraph, and executing it results in the execution of all layers in that subgraph, with the current layer as its output. Layer subclasses range from simple linear transformations (e.g. Linear) to more complex layers used to build recurrent neural networks, such as long short-term memory cells (e.g. LSTMCell), or attention mechanisms such as MHAttention.

Layer properties and methods

  • inputs: list of input layers for the current layer;
  • input: syntax sugar for inputs[0];
  • n_units: number of output units or neurons; this is the last dimension of the output tensor resulting from this layer's computation;
  • shape: the inferred shape of the layer's output;
  • compute(*tensors): the layer computation applied to its input layers, or to the given input tensors if any are passed;
  • __call__: all layers are Callable; the result is the computation of the entire layer graph with the current layer as the terminal node;
  • reuse_with: creates a new layer object that shares its state with the current layer but is connected to different inputs. The new layer is the end-point node of a new layer graph;
  • variables: a list of the tf.Variable objects handled by the current layer;
  • trainable_variables: a list of the tf.Variable objects that are trainable, that is, changed by an optimizer during training;
  • config: a layer configuration (LayerConfig) with the arguments used in the current layer instance's constructor.
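
As a quick illustration, the following minimal sketch inspects a few of these properties on a Linear layer (add_bias=True gives the layer a weight matrix and a bias, as in the gradient example later in this tutorial):

import tensorx as tx

x = tx.Input(n_units=2)
y = tx.Linear(x, n_units=3, add_bias=True)

assert y.n_units == 3                   # last dimension of the output
assert len(y.inputs) == 1               # y has a single input layer
assert y.input is y.inputs[0]           # input is sugar for inputs[0]
assert len(y.trainable_variables) == 2  # weights and bias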

Using existing Layers

TensorX ships with a number of built-in Layers that you can use to compose layer graphs performing various computations. All layers are accessible from the global namespace (e.g. tensorx.Linear) or from the tensorx.layers module. The following example shows how to use a simple Linear layer that performs the computation y = Wx + b:

import tensorflow as tf
import tensorx as tx

x = tf.random.uniform([2, 2], dtype=tf.float32)
# y = Wx + b
y = tx.Linear(x, n_units=3)
result = y()

assert tx.tensor_equal(tf.shape(result), [2, 3])
assert len(y.inputs) == 1
assert isinstance(y.input, tx.Constant)

Note that we can pass a Tensor object to Linear (or any other layer) and it will be automatically converted to a Layer (a Constant layer, to be more precise). The layer y has exactly one input layer, and __call__ will return the result of its computation on this input.
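
The same graph can also be built by wrapping the tensor in a Constant layer explicitly. This is a sketch that assumes Constant accepts the tensor as its first argument:

c = tx.Constant(tf.random.uniform([2, 2], dtype=tf.float32))
y = tx.Linear(c, n_units=3)

# the input layer is now the Constant we created ourselves
assert y.input is c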

Dynamic stateful Input

The Input layer allows us to add a dynamic input to a layer graph:

value = tf.random.uniform([2, 2], dtype=tf.float32)
x = tx.Input(init_value=value)
# y = Wx + b
y = tx.Linear(x, n_units=3)

result1 = y()
# x is stateful and its value can be changed e.g. to a new random value
x.value = tf.random.uniform([2, 2], dtype=tf.float32)
result2 = y()
result3 = y.compute(value)

assert not tx.tensor_equal(result1, result2)
assert not y.input.constant
# compute returns the layer computation independently from its current graph
assert tx.tensor_equal(result1, result3)

print(result1)
tf.Tensor(
    [[ 0.8232075   0.2716378  -0.33215973]
     [ 0.34996247 -0.02594224 -0.05033442]], 
    shape=(2, 3), 
    dtype=float32)

Input allows the creation of dynamic input layers with a value property that can be changed; when it changes, the value at the end-point of the graph changes as well. Moreover, the compute method is distinct from __call__ in that it only depends on the layer's current state and not on the current graph.

Important

If n_units is not set on a dynamic Input layer, it is inferred from the last dimension of the initial value; from then on, any tensor assigned to value must match n_units in its last dimension. This means that, for example, the batch dimension can vary.
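
For example, once n_units is inferred as 2, tensors with different batch sizes can still be assigned to the same Input (a minimal sketch):

x = tx.Input(init_value=tf.ones([2, 2]))
assert x.n_units == 2          # inferred from the last dimension

# a different batch size is fine as long as the last dimension is 2
x.value = tf.ones([8, 2])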

Warning

You can't change the number of dimensions of a dynamic Input. Without an initial value or shape, it defaults to a shape of (0, 0) (an empty tensor with 2 dimensions), and an error is thrown if you try to assign a tensor with a mismatching number of dimensions. For example, if you create an input as Input(shape=[None,None,None]), assigning input.value = tf.ones([2,2]) raises an error.
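
The sketch below illustrates the mismatch case; the exact exception type may differ, so a generic Exception is caught here:

x = tx.Input(shape=[None, None, None])   # 3 dimensions, all unknown

try:
    # only 2 dimensions: rejected
    x.value = tf.ones([2, 2])
except Exception as e:
    print("dimension mismatch:", e)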

Re-Using Layers

When you create a new Layer object, you usually pass it its input layers, which makes it the end node of a graph connected to those inputs. This also calls the init_state method, which initializes any tf.Variable objects that might be part of the layer's state. If you want to re-use this layer with a different set of input layers, you can use the reuse_with method. This creates a new layer with the same parameters; additionally, the new layer shares its state with the previous one.

import tensorflow as tf
import tensorx as tx

# stateful input placeholder
x1 = tx.Input(n_units=2)
x1.value = tf.random.uniform([2, 2])
# y = Wx + b
l1 = tx.Linear(x1, n_units=3)
a1 = tx.Activation(l1, tx.relu)
l2 = tx.Linear(a1, n_units=4)

d1 = tx.Dropout(a1, probability=0.4)
l3 = l2.reuse_with(d1)

Warning

Any change to the state of one layer will affect the state of the other.
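
Because the state is shared, the underlying variables are the same objects in both layers. The sketch below assumes Linear exposes its parameters as weights and bias, as in the gradient example later in this tutorial:

assert l3.weights is l2.weights
assert l3.bias is l2.bias

# updating one layer's weights is visible through the other
l2.weights.assign(tf.zeros_like(l2.weights))
assert tx.tensor_equal(l3.weights, tf.zeros_like(l3.weights))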

Re-Using Modules

A Module is a special layer which creates a single Layer from a given layer graph. A layer graph is a set of layers connected to each other. For example:

x = tx.Input(tf.ones([2, 2]))
y1 = tx.Linear(x, 3)
y2 = tx.Linear(y1, 4)

m = tx.Module(inputs=x, output=y2)

assert tx.tensor_equal(m(), y2())

This takes the two Linear layers and creates a single module whose state is shared with both layers. As with any other layer, you can also call reuse_with on a Module; in this case, the entire state of the two Linear layers is again shared with the newly created Module.
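
As with any layer, a Module can be re-wired to a new input with reuse_with. A sketch, where x2 is a new Input with the same shape and values as x:

x2 = tx.Input(tf.ones([2, 2]))
m2 = m.reuse_with(x2)

# m2 shares the state of the two Linear layers wrapped by m,
# and since x2 holds the same values as x, the outputs match
assert tx.tensor_equal(m2(), m())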

Gradients and Autodiff

Automatic differentiation is a cornerstone of most deep learning frameworks. During the forward pass, TensorFlow records which operations happen and in what order; during the backward pass, it traverses this list of operations in reverse order to compute gradients, usually with respect to some input like a tf.Variable. Automatic differentiation can be accessed in TensorFlow through the tf.GradientTape context: whatever is executed inside the GradientTape context gets tracked so that gradients with respect to some variables can be computed:

import tensorflow as tf
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
  y = x**2

# dy = 2x * dx
dy_dx = tape.gradient(y, x)

TensorX Layers describe operations over tensors in terms of TensorFlow operations and store their state in tf.Variable objects, so layers executed inside the tf.GradientTape context are tracked just like any other TensorFlow operation. With this in mind, we can compute the gradients of a particular value with respect to the trainable_variables used in the computation. For example:

import tensorflow as tf
import tensorx as tx

x = tx.Input(n_units=3)
# y = Wx + b
y = tx.Linear(x, 3, add_bias=True)
loss = tx.Lambda(y, fn=lambda v: tf.reduce_mean(v ** 2))
x.value = [[1., 2., 3.]]

with tf.GradientTape() as tape:  
    loss_value = loss()

    # we could have done this as well
    # v = y()
    # loss_value = tf.reduce_mean(v ** 2)

grads = tape.gradient(loss_value, y.trainable_variables)

assert len(y.trainable_variables) == 2
assert len(grads) == 2
assert grads[0].shape == y.weights.shape
assert grads[1].shape == y.bias.shape

In this case, only the weights and bias of the Linear layer are trainable variables, so we can take the gradient of loss_value with respect to these variables. The result is a list of tensors with the same shapes as the variables used as targets.
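
These gradients can then be applied to the layer variables with any standard TensorFlow optimizer; a minimal sketch using tf.keras.optimizers.SGD:

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
# pair each gradient with the variable it was computed for
optimizer.apply_gradients(zip(grads, y.trainable_variables))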

Tip

In these examples we are still using TensorFlow's eager execution model. As we will see, this is good for debugging but not very efficient. Next in this tutorial, we show how to compile TensorX layer graphs into TensorFlow graphs using tf.function.

Graph Compilation

TensorX builds layer graphs automatically as layer objects are connected to each other. These graphs are, in effect, directed acyclic graphs (DAGs) defining a given computation over inputs. To aid with validation and execution of neural network layer graphs, TensorX has a Graph utility class. The Graph class allows for automatic graph construction from output nodes (by recursively visiting each node's inputs). It also facilitates traversal in dependency order, along with conversion of arbitrary graphs to functions and TensorFlow static graphs.

TensorX takes advantage of TensorFlow's graph optimization system to simplify and optimize Layer computations. It does this by converting layer graphs into functions that are then trace-compiled into optimized TensorFlow static graphs.

x1 = Input(n_units=2)
x2 = Input(n_units=4)
l1 = Linear(x1, 4)
l2 = Add(l1, x2)
l3 = Linear(l2, 2)

g = Graph.build(outputs=l3,
                inputs=[x1, x2])
fn = g.as_function(compile=True)
# fn is roughly equivalent to the following function,
# where layers maps names to the corresponding layer objects
@tf.function 
def compiled_graph():
    x1 = layers["x1"].compute()
    x2 = layers["x2"].compute()
    l1 = layers["l1"].compute(x1)
    l2 = layers["l2"].compute(l1,x2)
    l3 = layers["l3"].compute(l2)
    return l3

If no ord_inputs are given to as_function, the resulting function defines no input parameters; to feed values to such a function we need to change the values of the inputs (e.g. x1.value = ...) before calling fn(). If ord_inputs are passed (e.g. g.as_function(ord_inputs=[x1,x2])), the parameters are mapped to the corresponding layers, which must be inputs of the current graph, and the resulting function can be called with arguments as fn(value1, value2).
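
For instance, with the graph built above, the same function can be parameterized by the two inputs (a sketch, assuming ord_inputs and compile can be combined):

fn = g.as_function(ord_inputs=[x1, x2], compile=True)

# values are fed directly as arguments, in the order given by ord_inputs
out = fn(tf.ones([2, 2]), tf.ones([2, 4]))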

Just as Layer objects define implicit subgraphs, we can also build callable functions and TensorFlow static graphs from any layer by calling layer.as_function(). Much like in the previous example, doing this returns a function without parameters. This is just syntax sugar for:

...
graph = Graph.build(inputs=None, outputs=self)
return graph.as_function(name=name, compile=compile)
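
For example, using the l3 layer from the graph above (a sketch; values are still fed through the Input layers since the function takes no parameters):

fn = l3.as_function(compile=True)

x1.value = tf.ones([2, 2])
x2.value = tf.ones([2, 4])
result = fn()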

Dev Notes

A function conversion procedure which uses parameters with optional values for Input layers is in development.

Models

TensorX uses the Model class to group together multiple layer graphs and to simplify the configuration of a training loop with multiple callbacks. This part of the API may still change, but at its core it is just intended as a way to group together layer graphs, optimizers, and a configurable training loop with Callbacks.

Docs in progress

Finish this documentation with examples

Callbacks

Docs in progress

Finish this documentation with examples

Serialization

Docs in progress

Finish this documentation with examples