This tutorial introduces basic TensorX concepts.
TensorX is a Python machine learning library for building neural network models. It works as a complement to TensorFlow, so to make the most of this tutorial (and this library), readers should be familiar with the following:
- Python 3: if you're new to the Python language or need to refresh some concepts, check out the Python Tutorial.
- TensorFlow: TensorFlow is a high-performance machine learning library that allows numerical computation on GPUs and TPUs. It was originally designed to build auto-differentiable dataflow graphs. Although it adopted an eager execution model in version 2, computation graphs are still integral to defining and deploying high-performance TensorFlow programs. Unfortunately, most TensorFlow tutorials and guides focus on Keras, a high-level interface similar to TensorX. For a primer on TensorFlow, I recommend taking a look at the basics section of the TensorFlow guide instead.
- NumPy: like the core of TensorFlow, NumPy is a numerical computation library focused on multi-dimensional array transformations. Given that TensorFlow tensors are converted to and from NumPy arrays, and TensorX also depends on NumPy (mostly for testing), readers should be familiar with NumPy basics. For more details, check the NumPy documentation.
You can install TensorFlow and TensorX with pip as follows:
```bash
pip install tensorflow
pip install tensorx
```
For more details, see the installation documentation.
In TensorX, a Layer is the basic building block of a neural network. Semantically speaking, a layer is an object that can have multiple inputs, an inner state, and a computation function applied to its inputs (which depends on the current state). Each layer has a single output. In essence, a Layer instance is a stateful function. Connecting a series of layers results in a layer graph. In TensorX, each layer is the end-node of a subgraph, and executing it results in the execution of all layers in the subgraph that has the current layer as its output. Layer subclasses range from simple linear transformations (e.g. Linear) to more complex layers used to build recurrent neural networks, such as long short-term memory (LSTM) cells (e.g. LSTMCell), or attention mechanisms such as MHAttention.
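To make the idea of a layer as a stateful function more concrete, here is a toy pure-Python sketch. The classes ToyLayer, ToyConstant and ToyScale are hypothetical stand-ins that only mimic the concepts above: each layer keeps references to its inputs and some state, compute applies the layer's own function, and calling a layer evaluates the whole subgraph that ends at it. TensorX's actual layers work on TensorFlow tensors and variables instead.

```python
# Toy model of "a layer is a stateful function" (hypothetical classes,
# not TensorX's implementation; plain floats stand in for tensors).


class ToyLayer:
    def __init__(self, *inputs):
        self.inputs = list(inputs)  # input layers: the edges of the layer graph

    def compute(self, *values):
        """Apply only this layer's function to already-computed input values."""
        raise NotImplementedError

    def __call__(self):
        """Evaluate the whole subgraph that ends at this layer."""
        # evaluate the input layers first (recursively), then this layer
        input_values = [layer() for layer in self.inputs]
        return self.compute(*input_values)


class ToyConstant(ToyLayer):
    def __init__(self, value):
        super().__init__()  # no inputs: a source node of the graph
        self.value = value

    def compute(self):
        return self.value


class ToyScale(ToyLayer):
    def __init__(self, layer, factor):
        super().__init__(layer)
        self.factor = factor  # the layer's inner state

    def compute(self, value):
        return self.factor * value


x = ToyConstant(2.0)
h = ToyScale(x, 3.0)   # h = 3 * x
y = ToyScale(h, 10.0)  # y = 10 * h

print(y())             # runs x -> h -> y: 60.0
print(y.compute(1.0))  # applies only y's own function: 10.0
```

This naive recursion would evaluate a shared ancestor once per path that reaches it; ordering the graph by dependencies and evaluating each node once avoids that.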
Layer properties and methods
- inputs: list of input layers of the current layer;
- input: syntax sugar for inputs[0];
- n_units: number of output units or neurons; this is the last dimension of the output tensor resulting from this layer's computation;
- shape: the inferred shape of the layer output;
- compute(*tensors): the layer computation applied to its input layers, or to input tensors if any are given;
- __call__: all layers are callable, and the result is the computation of the entire layer graph with the current layer as the terminal node;
- reuse_with: creates a new layer object that shares state with the current layer but is connected to different inputs. The new layer is the end-point node of a new layer graph;
- variables: tf.Variable objects handled by the current layer;
- trainable_variables: tf.Variable objects that are trainable, that is, that are changed by an optimizer during training;
- config: a layer configuration (LayerConfig) with the arguments used in the current layer instance constructor.
Using existing Layers
TensorX ships with a number of built-in layers that you can use to compose layer graphs that perform various computations. All layers are accessible from the global namespace (e.g. tensorx.Linear) or from the … module.
The following example shows how to use a simple Linear layer that performs the computation y = Wx + b:
```python
import tensorflow as tf
import tensorx as tx

x = tf.random.uniform([2, 2], dtype=tf.float32)
# y = Wx + b
y = tx.Linear(x, n_units=3)
result = y()

assert tx.tensor_equal(tf.shape(result), [2, 3])
assert len(y.inputs) == 1
assert isinstance(y.input, tx.Constant)
```
Note that we can pass a Tensor object to Linear (or any other layer), and it will be automatically converted to a Layer (a Constant layer, to be more precise). The layer y has exactly one input layer, and __call__ returns the result of its computation on this input.
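As a plain-Python illustration of what the Linear computation amounts to, the sketch below assumes the usual row-vector convention for batches: an input of shape [batch, n_in] is multiplied by a weight matrix W of shape [n_in, n_units] and the bias b is added to every row, so a [2, 2] input with n_units=3 yields a [2, 3] result, matching the shape asserted in the example above. The function name linear and the fixed weights are hypothetical; a real layer initializes W and b as variables.

```python
# Plain-Python sketch of a linear layer on a batch of row vectors.
# Assumption: x has shape [batch, n_in], w has shape [n_in, n_units] and
# b has shape [n_units], so each output row is x_row @ w + b.


def linear(x, w, b):
    n_in, n_units = len(w), len(w[0])
    out = []
    for row in x:
        assert len(row) == n_in, "last input dimension must match W"
        out.append([
            sum(row[i] * w[i][j] for i in range(n_in)) + b[j]
            for j in range(n_units)
        ])
    return out


x = [[1.0, 2.0],
     [3.0, 4.0]]          # shape [2, 2]
w = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]     # shape [2, 3]
b = [0.5, 0.5, 0.5]       # shape [3]

y = linear(x, w, b)
print(y)  # [[1.5, 2.5, 3.5], [3.5, 4.5, 7.5]]
```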
Dynamic stateful Input
The Input layer allows us to add a dynamic input to a layer graph:
```python
import tensorflow as tf
import tensorx as tx

value = tf.random.uniform([2, 2], dtype=tf.float32)
x = tx.Input(init_value=value)
# y = Wx + b
y = tx.Linear(x, n_units=3)
result1 = y()

# x is stateful and its value can be changed, e.g. to a new random value
x.value = tf.random.uniform([2, 2], dtype=tf.float32)
result2 = y()
# compute returns the layer computation independently from its current graph
result3 = y.compute(value)

assert not tx.tensor_equal(result1, result2)
assert not y.input.constant
assert tx.tensor_equal(result1, result3)
print(result1)
```
```
tf.Tensor(
[[ 0.8232075   0.2716378  -0.33215973]
 [ 0.34996247 -0.02594224 -0.05033442]], shape=(2, 3), dtype=float32)
```
Input allows the creation of dynamic input layers with a value property that can be changed; when it changes, the value at the end-point of the graph changes as well. Moreover, the compute method is distinct from __call__: it depends only on the layer's current state and the tensors passed to it, not on the current graph.
On a dynamic Input layer, n_units is not left as None: it takes the last dimension of the initial value. From then on, any tensor assigned to value must match n_units in its last dimension, while other dimensions, such as the batch dimension, can vary.
You can't change the number of dimensions of a dynamic Input. Without an initial value or shape, it defaults to shape (0, 0) (an empty tensor with 2 dimensions). An error is thrown if you try to assign a tensor with a different number of dimensions: for example, if you create an input with Input(shape=[None, None, None]), assigning input.value = tf.ones([2, 2]) raises an error.
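The shape rules above can be sketched with a toy stateful input in plain Python. ToyInput and shape_of are hypothetical names, nested lists stand in for tensors, and the errors raised here are not TensorX's; the point is only that the number of dimensions and the last dimension (n_units) are fixed, while the other dimensions may change.

```python
# Toy stateful input (hypothetical; nested lists stand in for tensors).
# Its value can be replaced as long as the number of dimensions and the
# last dimension (n_units) stay the same.


def shape_of(value):
    """Shape of a regularly nested list, e.g. [[1, 2]] -> (1, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


class ToyInput:
    def __init__(self, init_value, n_units=None):
        shape = shape_of(init_value)
        self.rank = len(shape)
        # if n_units is not given, take it from the last dimension
        self.n_units = shape[-1] if n_units is None else n_units
        self.value = init_value  # validated by the setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        shape = shape_of(new_value)
        if len(shape) != self.rank:
            raise ValueError(f"expected {self.rank} dimensions, got {len(shape)}")
        if shape[-1] != self.n_units:
            raise ValueError(f"expected last dimension {self.n_units}, got {shape[-1]}")
        self._value = new_value


x = ToyInput([[1.0, 2.0], [3.0, 4.0]])  # n_units inferred as 2
x.value = [[5.0, 6.0]]                  # fine: only the batch dimension changed
try:
    x.value = [[[1.0, 2.0]]]            # 3 dimensions instead of 2
except ValueError as err:
    print("rejected:", err)             # rejected: expected 2 dimensions, got 3
```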
When you create a new Layer object, you will usually pass it its input layers, which makes it the end-node of a graph connected to those input layers. This also calls the init_state method, which initializes any tf.Variable objects that might be part of the layer's state. If you want to reuse this layer with a different set of input layers, you can use the reuse_with method. This creates a new layer with all the same parameters; additionally, the new layer shares its state with the previous one.
```python
import tensorflow as tf
import tensorx as tx

# stateful input placeholder
x1 = tx.Input(n_units=2)
x1.value = tf.random.uniform([2, 2])
# y = Wx + b
l1 = tx.Linear(x1, n_units=3)
a1 = tx.Activation(l1, tx.relu)
l2 = tx.Linear(a1, n_units=4)
d1 = tx.Dropout(a1, probability=0.4)
# l3 applies the same Linear transformation (same state) as l2, but to d1
l3 = l2.reuse_with(d1)
```
Since l2 and l3 share their state, any change to the state of one of them (e.g. to its weights) also affects the other.
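A toy sketch of what sharing state means: the reused layer references the same state object as the original instead of copying it. ToyInputValue and ToyLinear1D are hypothetical classes, and the state is a plain dict; in TensorX the shared state is held in tf.Variable objects.

```python
# Toy sketch of state sharing through reuse_with (hypothetical classes).
# The state lives in a dict that the reused layer references, not copies,
# so updating one layer's state updates both.


class ToyInputValue:
    def __init__(self, value):
        self.value = value

    def __call__(self):
        return self.value


class ToyLinear1D:
    def __init__(self, input_layer, state=None):
        self.input_layer = input_layer
        # create new state only if none is shared with us
        self.state = state if state is not None else {"w": 1.0, "b": 0.0}

    def __call__(self):
        return self.state["w"] * self.input_layer() + self.state["b"]

    def reuse_with(self, other_input):
        # same parameters, different input, *same* state object
        return ToyLinear1D(other_input, state=self.state)


l2 = ToyLinear1D(ToyInputValue(2.0))
l3 = l2.reuse_with(ToyInputValue(5.0))

l2.state["w"] = 3.0  # e.g. an optimizer update applied through l2
print(l2(), l3())    # 6.0 15.0: l3 sees the new weight as well
```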
Module is a special layer that creates a single Layer from a given layer graph. A layer graph is a set of layers connected to each other. For example:
```python
import tensorflow as tf
import tensorx as tx

x = tx.Input(tf.ones([2, 2]))
y1 = tx.Linear(x, 3)
y2 = tx.Linear(y1, 4)
m = tx.Module(inputs=x, output=y2)

assert tx.tensor_equal(m(), y2())
```
Here we take the two Linear layers and create a single module whose state is shared with both layers. As with any other layer, you can also call reuse_with on a module; in this case, the entire state of the two Linear layers will again be shared with the newly created module.
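The idea of a module as a single layer built from a subgraph can be sketched in plain Python. ToyNode, evaluate and ToyModule are hypothetical: the module only stores its input and output nodes, evaluates the subgraph between them, and, through compute, treats the whole subgraph as one layer whose inputs can be replaced. Because it references the original nodes rather than copying them, it sees any change to their state.

```python
# Toy sketch of a module wrapping a layer graph (hypothetical names).


class ToyNode:
    def __init__(self, fn, *inputs):
        self.fn = fn  # computation applied to the input values
        self.inputs = list(inputs)


def evaluate(node, overrides):
    """Evaluate the subgraph ending at node; overrides replaces some nodes' values."""
    if node in overrides:
        return overrides[node]
    return node.fn(*[evaluate(n, overrides) for n in node.inputs])


class ToyModule:
    def __init__(self, inputs, output):
        self.inputs = list(inputs)
        self.output = output

    def __call__(self):
        return evaluate(self.output, {})

    def compute(self, *values):
        # treat the whole subgraph as one layer: values replace the input nodes
        return evaluate(self.output, dict(zip(self.inputs, values)))


params = {"a": 3.0, "b": 4.0}  # state used by the inner nodes
x = ToyNode(lambda: 1.0)
y1 = ToyNode(lambda v: params["a"] * v, x)
y2 = ToyNode(lambda v: params["b"] * v, y1)
m = ToyModule(inputs=[x], output=y2)

print(m(), m.compute(2.0))  # 12.0 24.0
params["a"] = 0.5           # the module references, not copies, the inner state
print(m())                  # 2.0
```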
Gradients and Autodiff
Automatic differentiation is a cornerstone of most deep learning frameworks. TensorFlow remembers which operations happen, and in what order, during the forward pass; then, during the backpropagation pass, TensorFlow traverses this list of operations in reverse order to compute gradients, usually with respect to some input like a tf.Variable.
Automatic differentiation can be accessed in TensorFlow using the tf.GradientTape context. Whatever is executed inside the GradientTape context gets tracked, so that the gradients with respect to some variables can be computed:
```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2
# dy = 2x * dx
dy_dx = tape.gradient(y, x)  # 6.0 for x = 3.0
```
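The record-then-replay mechanism described above can be illustrated with a minimal reverse-mode autodiff tape in pure Python. Tape and Var are hypothetical names supporting only + and *; this is a sketch of the technique, not of how tf.GradientTape is implemented. For y = x * x at x = 3 it produces the same gradient as the TensorFlow example, 2x = 6.

```python
# Minimal reverse-mode autodiff tape (hypothetical Tape/Var classes that only
# support + and *): operations are recorded in execution order during the
# forward pass and replayed in reverse to accumulate gradients (chain rule).


class Tape:
    def __init__(self):
        self.ops = []  # (output, [(input, local_gradient), ...]) in execution order

    def gradient(self, target, source):
        grads = {target: 1.0}  # d target / d target = 1
        for out, parents in reversed(self.ops):
            g = grads.get(out, 0.0)
            for parent, local in parents:
                # chain rule: accumulate d target / d parent
                grads[parent] = grads.get(parent, 0.0) + g * local
        return grads.get(source, 0.0)


class Var:
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(a*b)/da = b and d(a*b)/db = a
        self.tape.ops.append((out, [(self, other.value), (other, self.value)]))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # d(a+b)/da = d(a+b)/db = 1
        self.tape.ops.append((out, [(self, 1.0), (other, 1.0)]))
        return out


tape = Tape()
x = Var(3.0, tape)
y = x * x                   # y = x**2, recorded on the tape
print(tape.gradient(y, x))  # 6.0, i.e. 2x at x = 3
```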
TensorX layers describe operations over tensors in terms of TensorFlow operations and store their state in tf.Variable objects, so layers executed inside the tf.GradientTape context are tracked just like any other TensorFlow operation.
With this in mind, we can compute the gradients of a particular value with respect to the variables used in the computation. For example:
```python
import tensorflow as tf
import tensorx as tx

x = tx.Input(n_units=3)
# y = Wx + b
y = tx.Linear(x, 3, add_bias=True)
loss = tx.Lambda(y, fn=lambda v: tf.reduce_mean(v ** 2))
x.value = [[1., 2., 3.]]

with tf.GradientTape() as tape:
    loss_value = loss()
    # we could have done this as well
    # v = y()
    # loss_value = tf.reduce_mean(v ** 2)

grads = tape.gradient(loss_value, y.trainable_variables)

assert len(y.trainable_variables) == 2
# grads is a list: one gradient per variable, each shaped like that variable
assert len(grads) == 2
assert {tuple(g.shape) for g in grads} == {tuple(y.weights.shape), tuple(y.bias.shape)}
```
In this case, only the weights and the bias of the Linear layer are trainable variables, so we can take the gradient of loss_value with respect to these variables. The result is a list of tensors with the same shapes as the variables used as targets.
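To see what the gradients in this example contain, the sketch below works them out by hand in plain Python for a single input row, assuming y = xW + b (row-vector convention, W of shape [3, 3]) and a loss equal to the mean of y ** 2 over its n = 3 entries. Then d loss / d b[j] = 2 * y[j] / n and d loss / d W[i][j] = 2 * x[i] * y[j] / n, which a finite-difference check confirms. The weight values are arbitrary stand-ins for the layer's random initialization.

```python
# Hand-derived gradients for a linear layer followed by mean(y ** 2).
# Assumptions: one input row x, y = x W + b with W of shape [3, 3], and
# loss = mean over the n = 3 entries of y ** 2, so that
#   d loss / d b[j]    = 2 * y[j] / n
#   d loss / d W[i][j] = 2 * x[i] * y[j] / n


def forward(x, w, b):
    y = [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    loss = sum(v ** 2 for v in y) / len(y)
    return y, loss


def analytic_grads(x, w, b):
    y, _ = forward(x, w, b)
    n = len(y)
    grad_w = [[2 * x[i] * y[j] / n for j in range(n)] for i in range(len(x))]
    grad_b = [2 * y[j] / n for j in range(n)]
    return grad_w, grad_b


x = [1.0, 2.0, 3.0]
w = [[0.1, -0.2, 0.3],
     [0.0, 0.5, -0.1],
     [0.2, 0.1, 0.0]]
b = [0.0, 0.0, 0.0]

grad_w, grad_b = analytic_grads(x, w, b)

# one gradient per variable, with the same shape as that variable
assert len(grad_w) == 3 and all(len(row) == 3 for row in grad_w)
assert len(grad_b) == 3

# forward finite-difference check of one weight entry and one bias entry
eps = 1e-6
_, base = forward(x, w, b)
w_plus = [row[:] for row in w]
w_plus[1][2] += eps
b_plus = b[:]
b_plus[0] += eps
fd_w = (forward(x, w_plus, b)[1] - base) / eps
fd_b = (forward(x, w, b_plus)[1] - base) / eps
assert abs(fd_w - grad_w[1][2]) < 1e-4
assert abs(fd_b - grad_b[0]) < 1e-4
```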
In these examples we're still using TensorFlow's eager execution model which, as we will see, is good for debugging but not very efficient. Next in this tutorial, we will show how TensorX layer graphs can be compiled into TensorFlow graphs.
TensorX builds layer graphs automatically from layer objects that are connected to each other. These graphs are, in effect, directed acyclic graphs (DAGs) defining a given computation over inputs. To aid with the validation and execution of neural network layer graphs, TensorX has a Graph utility class. The Graph class allows for automatic graph construction from output nodes (by recursively visiting each node's inputs). It also facilitates traversal in dependency order, along with the conversion of arbitrary graphs to functions and TensorFlow static graphs.
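Building a graph from its outputs and ordering it by dependencies can be sketched as a depth-first traversal. The function build_order and the inputs_of mapping (layer name to list of input names) are hypothetical stand-ins for layer objects and their inputs lists; this illustrates the technique, not TensorX's Graph class. The example graph mirrors the x1, x2, l1, l2, l3 graph used with Graph.build in this section.

```python
# Depth-first construction of a graph from its outputs, returning nodes in
# dependency (topological) order. Names and inputs_of are hypothetical.


def build_order(outputs, inputs_of):
    """Return nodes reachable from outputs, each listed after all of its inputs."""
    order, visiting, done = [], set(), set()

    def visit(node):
        if node in done:
            return
        if node in visiting:
            # layer graphs must be acyclic
            raise ValueError(f"cycle detected at {node!r}")
        visiting.add(node)
        for dep in inputs_of.get(node, []):
            visit(dep)  # recursively visit each node's inputs
        visiting.remove(node)
        done.add(node)
        order.append(node)  # appended only after all of its inputs

    for out in outputs:
        visit(out)
    return order


# l1 <- x1, l2 <- (l1, x2), l3 <- l2
inputs_of = {"l1": ["x1"], "l2": ["l1", "x2"], "l3": ["l2"]}
print(build_order(["l3"], inputs_of))  # ['x1', 'l1', 'x2', 'l2', 'l3']
```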
TensorX takes advantage of TensorFlow's graph optimization system to simplify and optimize Layer computations. It does this by converting layer graphs into functions that are then trace-compiled into optimized TensorFlow static graphs.
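```python
import tensorx as tx

x1 = tx.Input(n_units=2)
x2 = tx.Input(n_units=4)
l1 = tx.Linear(x1, 4)
l2 = tx.Add(l1, x2)
l3 = tx.Linear(l2, 2)

# assumes Graph is exported at the top level of tensorx, like the layers
g = tx.Graph.build(outputs=l3, inputs=[x1, x2])
fn = g.as_function(compile=True)
```
```python
# illustrative only: fn holds a function equivalent to the following,
# where layers maps each layer name to its layer object
@tf.function
def compiled_graph():
    x1 = layers["x1"].compute()
    x2 = layers["x2"].compute()
    l1 = layers["l1"].compute(x1)
    l2 = layers["l2"].compute(l1, x2)
    l3 = layers["l3"].compute(l2)
    return l3
```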
If no ord_inputs are given to as_function, the resulting function doesn't define input parameters. To feed values to such a function, we need to change the values of the inputs (e.g. x1.value = ...) before calling it. If ord_inputs are passed (e.g. g.as_function(ord_inputs=[x1, x2])), they map the function parameters to the corresponding layers, which must be inputs of the current graph; the resulting function can then be called with arguments in that order.
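The difference between a function without parameters and one built with ord_inputs can be sketched in plain Python. ToyInput, graph and as_function are hypothetical stand-ins: graph plays the role of the compiled layer graph, and in this toy the ord_inputs are mapped positionally onto input layers that must belong to the graph.

```python
# Toy sketch (hypothetical names throughout): a "compiled" layer graph
# exposed as a parameterless function, or as a function whose positional
# parameters are mapped onto ordered input layers.


class ToyInput:
    def __init__(self, value):
        self.value = value


x1, x2 = ToyInput(1.0), ToyInput(10.0)
graph_inputs = [x1, x2]


def graph(values):
    # stands in for the layer graph: l3 = 2 * (3 * x1 + x2)
    return 2 * (3 * values[x1] + values[x2])


def as_function(ord_inputs=None):
    if ord_inputs is None:
        # no parameters: use whatever values the input layers currently hold
        return lambda: graph({x: x.value for x in graph_inputs})
    if any(layer not in graph_inputs for layer in ord_inputs):
        raise ValueError("ord_inputs must be inputs of the graph")

    def fn(*args):
        values = {x: x.value for x in graph_inputs}
        values.update(zip(ord_inputs, args))  # parameters map to layers by position
        return graph(values)
    return fn


f = as_function()
print(f())          # 26.0
x1.value = 2.0      # feed a new value by changing the input's state first
print(f())          # 32.0

g = as_function(ord_inputs=[x1, x2])
print(g(0.0, 1.0))  # 2.0
```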
Since Layer objects define implicit subgraphs, we can also build callable functions and TensorFlow static graphs from any layer by calling layer.as_function(). Much like in the previous example, this returns a function without parameters. It is just syntax sugar for:
```python
# body of layer.as_function, simplified
...
graph = Graph.build(inputs=None, outputs=self)
return graph.as_function(name=name, compile=compile)
```
A function conversion procedure which uses parameters with optional values for
Input layers is in development.
TensorX uses the Model class to group together multiple layer graphs and simplify the configuration of a training loop with multiple callbacks. This part of the API might change, but at its core it is intended as a way to group together layer graphs, optimizers, and a configurable training loop with callbacks.
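The kind of loop such a Model class groups together can be sketched in plain Python: a gradient-descent loop that calls a list of callbacks after every epoch. The function fit, its callback signature (epoch, weight, loss) and the one-parameter model y = w * x are all hypothetical and unrelated to TensorX's actual Model API.

```python
# Toy training loop with callbacks (hypothetical fit function and callback
# signature): fits the one-parameter model y = w * x with gradient descent
# on the mean squared error and calls every callback after each epoch.


def fit(w, data, lr, epochs, callbacks=()):
    n = len(data)
    for epoch in range(epochs):
        loss = 0.0
        for x, y in data:
            err = w * x - y
            loss += err ** 2 / n
            w -= lr * 2 * err * x / n  # gradient step for this sample's loss term
        for callback in callbacks:
            callback(epoch, w, loss)   # e.g. logging, checkpointing, early stopping
    return w


history = []
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
w = fit(0.0, data, lr=0.1, epochs=50,
        callbacks=[lambda epoch, weight, loss: history.append(loss)])
print(round(w, 3))  # 2.0
```

Keeping the callbacks outside the loop body is what makes the loop configurable: logging or stopping criteria can be swapped without touching the optimization code.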
Docs in progress
Finish this documentation with examples