GSoC week 4: SVDbased Reservoir
The standard construction of the reservoir matrix \( \textbf{W} \) for Echo State Networks (ESN) is based on initializing \( \textbf{W} \) using specific schemes, usually generating random numbers and then rescaling it to make sure that the spectral radius is less or equal to a chosen number. This procedure is effective, but in literature other ways are explored. In this week we implemented a Singular Value Decomposition (SVD)based algorithm, described in [1], that is capable of obtaining a sparse matrix suitable for the ESNs. In this post we are going to describe the theory behind the implementation and then a couple of examples will be given.
Singular Value Decomposition reservoir construction
One of the key aspects the an ESN is for its reservoir to posses the Echo State Property (ESP) [2]. A sufficient condition to obtain it is to construct the reservoir with a spectral radius less than 1. While using the architecture explained in the opening ensures this condition, it doesn’t take into account the singular values information of \( \textbf{W} \), and it doesn’t allow much control over the construction of said matrix. An alternative could be to leverage the SVD to build a reservoir matrix given the largest singular value. To fully comprehend this procedure firstly we have to illustrate what SVD consists of.
Let us consider the reservoir matrix \( \textbf{W} \in \mathbb{R}^{N \times N}\); this matrix can be expressed as \( \textbf{W} = \textbf{U}\textbf{S}\textbf{V} \) where \( \textbf{U}, \textbf{V} \in \mathbb{R}^{N \times N}\) are orthogonal matrices and \( \textbf{S}=\text{diag}(\sigma _1, …, \sigma _N) \) is a diagonal matrix whose entries are ordered in increasing order. The values \( \sigma _i \) are called the singular values of \( \textbf{W} \). Given any diagonal matrix \( \textbf{S} \), and orthogonal matrices \( \textbf{U}, \textbf{V} \) the matrix \( \textbf{W} \) obtained as \( \textbf{W} = \textbf{U}\textbf{S}\textbf{V} \) has the same singular values as \( \textbf{S} \).
This method provides an effective way of ensuring the ESP without the scaling of the reservoir weights. Instead of using orthogonal matrices \( \textbf{U}, \textbf{V} \), that could produce a dense matrix \( \textbf{W} \), the authors opted for a two dimensional rotation matrix \( \textbf{Q}(i, j, \theta) \in \mathbb{R}^{N \times N}\) with \( \textbf{Q}_{i,i} = \textbf{Q}_{j,j} = \text{cos}(\theta)\), \( \textbf{Q}_{i,j} = \text{sin}(\theta)) \), \( \textbf{Q}_{j,i} = \text{sin}(\theta)) \) with \( i, j \) random values in [1, N] and \( \theta \) random value in [1, 1]. The algorithm proposed is as follows:
 Choose a predefined \( \sigma _N \) in the range [0, 1] and generate \( \sigma _i, i=1,…, N1 \) in the range (0, \( \sigma _N \)]. This values are used to create a diagonal matrix \( \textbf{S}=\text{diag}(\sigma _1, …, \sigma _N) \). With \( h=1 \) let \( \textbf{W}_1 = \textbf{S} \).
 For \( h = h + 1 \) randomly choose the two dimensional matrix \( \textbf{Q}(i, j, \theta) \) as defined above. \( \textbf{W} _h = \textbf{W} _{h1} \textbf{Q}(i, j, \theta)\) gives the matrix \( \textbf{W} \) for the step \( h \). This procedure is repeated until the chosen density is reached.
Implementation in ReservoirComputing.jl
The implementation into code is extremely straightforwad: following the instructions in the paper a function pseudoSVD
is created which takes as input the following
 dim: the desired dimension of the reservoir
 max_value: the value of the largest of the singular values
 sparsity: the sparsity for the reservoir
 sorted: optional value. If = true (default) the singular values in the diagonal matrix will be sorted.
 reverse_sort: optional value if sort = true. If = true (default = false) the singular values in the diagonal matrix will be sorted in a decreasing order.
Examples
Original ESN
Testing the SVD construction on the original ESN we can try to reproduce the Lorenz attractor, with similar parameters as given in the Introduction to Reservoir Computing


The values of the largest singular value for the construction of the SVD based reservoir is equal to the spectral radius of the standard reservoir, that in this case is greater than one. A plot of the results shows:
This construction is capable of reproducing the Lorenz system in the short term, and behaves better in the long term than the standard implementation, or at least in this example it does. A more in depth analysis is needed for the consistency of the results and the behavior of the SVD reservoir when the largest singular value is set greater than one and when one of the non linear algorithms is applied.
Ridge ESN, SVESM and ESGP
In order to test this implementation for others ESN architectures currently implemented in ReservoirComputing.jl we choose to use the same examples as last week, based on the MackeyGlass system:
$$\frac{dx}{dt} = \beta x(t)+\frac{\alpha x(t\delta)}{1+x(t\delta)^2}$$
with the same values:
 \(\beta = 0.1 \)
 \(\alpha = 0.2 \)
 \(\delta = 17 \)
 \( dt = 0.1 \)
Furthermore the time series is rescaled in the range \( [1, 1] \) by application of a tangent hyperbolic transform \( y_{ESN}(\text{t}) = \text{tanh}(\text{y}(t)1) \). To evaluate the precision of our results we are going to use root mean square deviation (rmse), defined as:
$$\textbf{rmse} = \sqrt{\frac{\sum_{i=1}^{T_n}(y_d(i)y(i))^2}{T_n}}$$
where
 \(y_d(i) \) is the target value
 \(y(i) \) is the predicted value
 \(T_d \) is the number of test examples
The ESN parameters are as follows


The largest singular value was set equal to the spectral radius for the standard construction. Averaging on ten runs the results are as follows:


For the ESGP this procedure yields far worst performances than the standard counterpart. For the ridge ESN and SVESM the results are almost identical.
The results obtained are interesting and for sure more testing is needed. Some sperimentation on the h steps ahead prediction could be done, as well as giving different values for the spectral radius and largest singular value, since in all the examples examined the spectral radius was chosen following the literature, and hence could be more optimized that the largest values that we used.
Documentation
[1] Yang, Cuili, et al. “Design of polynomial echo state networks for time series prediction.” Neurocomputing 290 (2018): 148160.
[2] Jaeger, Herbert. “The “echo state” approach to analysing and training recurrent neural networkswith an erratum note.” Bonn, Germany: German National Research Center for Information Technology GMD Technical Report 148.34 (2001): 13.