This problem is a sequel to Neural Network Calculation.
Now we shall use a Neural Network to discover a hidden relation in the data!
Here comes some mysterious function with 4 inputs and 1 output:
y = f(x1, x2, x3, x4)
Input data will contain several sets of "inputs" and corresponding result value of the function, like this:
0.052652 -0.378784 -0.222785 0.475783 -0.286463
...
-0.864796 0.054079 -0.542527 -0.222785 -0.277355
Every line contains 4 values of x1 ... x4 and the last value is y - i.e. the respective f(x1 ... x4). For example:
f(0.052652, -0.378784, -0.222785, 0.475783) == -0.286463
What is this function? Perhaps these data are changes of currency prices from a high-frequency trading platform, and we are trying to predict the latest change from the preceding ones (suppose trading robots involuntarily create local oscillations) - or maybe they come from seismic sensors and we are trying to predict the coming of a tsunami by measuring accelerations at four remote points on the ocean floor.
Or probably the 5-th values in every line are just random? How to tell?
If there is some secret correlation, there is a good chance to discover it with a neural network. If, on the contrary, even a small neural network doesn't tend to converge to some solution - then probably the data are really random.
So we are going to try predicting the result of the unknown function from its four inputs. As an answer we want a description of a network which can successfully simulate this unknown function.
Build a simple NN with 4 inputs, 1 output and a single hidden layer. Then run whichever training algorithm you prefer. If you know no better, use random search in a manner like this:
- Set all weight coefficients to small random values.
- Calculate Yreal (output of NN) for every line in the training set.
- Compare each Yreal with the expected Yexp and find the difference squared D = (Yreal - Yexp) ^ 2.
- Calculate the root mean square of these differences, sqrt((D1 + D2 + ... + Dn) / n) - let's call it our current best error.
- Now repeatedly make small changes to the coefficients, recalculate the error, and keep the changes only if the error becomes smaller than the current best.
The idea is very simple - we only need to carefully implement it - and come up with a good strategy of doing iterative changes to coefficients so that the result converges quickly enough.
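A minimal sketch of this random search in Python, assuming a bipolar sigmoid activation (substitute whatever activation the previous task defines), H = 4 hidden neurons and S = 0.5; the names sigmoid, forward, rms_error and train are only illustrative, and data is assumed to be a list of (inputs, y) pairs:

import math
import random

H = 4          # number of hidden neurons - chosen arbitrarily
S = 0.5        # output scale factor: targets are multiplied by S during training

def sigmoid(x):
    # Assumed bipolar activation; use whatever "Neural Network Calculation" defines.
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def forward(weights, xs):
    # weights = (hidden_w: H lists of 4 coefficients, out_w: list of H coefficients)
    hidden_w, out_w = weights
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, xs))) for ws in hidden_w]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)))

def rms_error(weights, data):
    # Root mean square of the differences against the scaled targets y * S.
    d = [(forward(weights, xs) - y * S) ** 2 for xs, y in data]
    return math.sqrt(sum(d) / len(d))

def train(data, iterations=200000):
    # Random search: start from random weights, keep only improving perturbations.
    best = ([[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(H)],
            [random.uniform(-1.0, 1.0) for _ in range(H)])
    best_err = rms_error(best, data)
    for _ in range(iterations):
        cand = ([[w + random.gauss(0.0, 0.05) for w in ws] for ws in best[0]],
                [w + random.gauss(0.0, 0.05) for w in best[1]])
        err = rms_error(cand, data)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err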
So the NN will have a single hidden layer, but you can set an arbitrary amount of neurons in it (let's call it H). Also you should choose the output scale factor S arbitrarily.
The checker for this problem will reconstruct the NN from your description and run it against some more sets of input values (a secret cross-check dataset) to tell if it gives results close to expected.
About "output scale factor": note that expected outputs (5-th column in training data) swing from -1.0 to 1.0.
However output neuron has sigmoid function which makes it hard to reach border values. We can overcome this
by multiplying these values by 0.5
for example so they are in "more linear" part of sigmoid function. Of
course when NN yields result it should be converted back (divided by scale factor).
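For example, with S = 0.5 a training target of -0.76 becomes -0.38, and the raw network output is divided back by S when producing the final prediction; a tiny sketch with illustrative names:

S = 0.5                        # output scale factor, chosen arbitrarily

def scaled_target(y):
    return y * S               # what the output neuron is trained to produce

def predict(raw_output):
    return raw_output / S      # convert the NN's output back to the original range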
Input data: first comes N - the number of lines in the training dataset. Then the lines themselves follow, in the format shown above.
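Reading the training set might look like this sketch (assuming everything arrives on standard input; the function name is illustrative):

import sys

def read_training_data():
    tokens = sys.stdin.read().split()
    n = int(tokens[0])
    data = []
    for i in range(n):
        row = list(map(float, tokens[1 + 5 * i : 1 + 5 * i + 5]))
        data.append((row[:4], row[4]))   # (inputs x1..x4, expected y)
    return data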
Answer should give 5 * H + 2 numbers. Of them H comes first and S comes second. Next follow the weight coefficients of all neurons in the hidden layer and the output layer - in the same order as used in the previous task. All are simply space-delimited. Obviously with H neurons in the hidden layer, your answer will have (after H itself and the scale factor S) 4 * H weight coefficients of the neurons of the hidden layer - and H weight coefficients for the output layer: 5 * H + 2 numbers in total.
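Emitting the answer for a trained network could then be as simple as the following sketch (hidden_w as H lists of 4 coefficients, out_w as a list of H coefficients - names are illustrative):

def format_answer(H, S, hidden_w, out_w):
    # H and S first, then 4 weights per hidden neuron, then H output-layer weights.
    numbers = [H, S]
    for ws in hidden_w:
        numbers.extend(ws)
    numbers.extend(out_w)
    return " ".join(str(v) for v in numbers)   # 5 * H + 2 space-delimited values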
Example:
input data:
4
0.052652375609813 -0.378784251978 -0.22278533421656 0.47578360066379 -0.28646310823256
-0.067601277471901 -0.37433812548313 0.052652375609813 -0.1770136490212 -0.7590187023125
-0.54252766907647 -0.056689748478961 -0.067601277471901 -0.44178507631181 -0.815443239296
-0.86479614798221 0.054079413349508 -0.54252766907647 -0.22278533421656 -0.27735579314
answer:
2 0.613163 -0.469 0.897 -0.371 0.842 0.820 0.306 2.511 0.259 1.023 0.636
The checker will accept the answer if your network gives a root mean square error below 0.025 on the cross-check dataset and the error of any single calculation from it does not exceed 0.04.
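You can apply the same two criteria to a held-out part of the training data before submitting; a sketch reusing the forward helper from the training sketch above (the thresholds are those from the statement, the function name is illustrative):

import math

def passes_check(weights, data, S):
    # Predictions are un-scaled by S before comparing with the expected values.
    errors = [abs(forward(weights, xs) / S - y) for xs, y in data]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rms < 0.025 and max(errors) <= 0.04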