'Hidden unit specialization in layered neural networks: ReLU vs. sigmoidal activation' (November 2020)

Elisa Oostwal, Michiel Straat, Michael Biehl. In: Physica A: Statistical Mechanics and its Applications, Volume 564.

ABSTRACT: By applying concepts from the statistical physics of learning, we study layered neural networks of rectified linear units (ReLU). The comparison with conventional, sigmoidal activation functions is at the center of interest. We compute typical learning curves for large shallow networks with K hidden units in matching student-teacher scenarios. The systems undergo phase transitions, i.e. sudden changes of the generalization performance via the process of hidden unit specialization at critical sizes of the training set. Surprisingly, our results show that the training behavior of ReLU networks is qualitatively different from that of networks with sigmoidal activations. In networks with K ≥ 3 sigmoidal hidden units, the transition is discontinuous: specialized network configurations co-exist and compete with states of poor performance even for very large training sets. In contrast, the use of ReLU activations results in continuous transitions for all K. For large enough training sets, two competing, differently specialized states display similar generalization abilities, which coincide exactly for large hidden layers in the limit K → ∞. Our findings are also confirmed in Monte Carlo simulations of the training processes.
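
As a rough illustration of the student-teacher scenario mentioned in the abstract, the sketch below estimates a generalization error by Monte Carlo sampling. It is not the paper's method or code: it assumes a soft committee machine with K hidden units, i.i.d. Gaussian inputs, a 1/sqrt(N) scaling of the local fields and a 1/sqrt(K) scaling of the output, and all function and variable names are hypothetical.

```python
import numpy as np


def relu(x):
    """Rectified linear unit."""
    return np.maximum(x, 0.0)


def committee_output(weights, inputs, activation):
    """Scaled sum of K hidden-unit activations (assumed soft committee machine)."""
    n_hidden, n_inputs = weights.shape
    # Local fields of all hidden units, shape (n_samples, n_hidden).
    local_fields = inputs @ weights.T / np.sqrt(n_inputs)
    return activation(local_fields).sum(axis=1) / np.sqrt(n_hidden)


def generalization_error(student, teacher, activation, n_samples=20000, seed=0):
    """Monte Carlo estimate of half the mean squared student-teacher deviation."""
    rng = np.random.default_rng(seed)
    # Assumption: inputs are i.i.d. standard normal vectors.
    inputs = rng.standard_normal((n_samples, teacher.shape[1]))
    diff = (committee_output(student, inputs, activation)
            - committee_output(teacher, inputs, activation))
    return 0.5 * np.mean(diff ** 2)


rng = np.random.default_rng(1)
K, N = 3, 50  # illustrative sizes, not values from the paper
teacher = rng.standard_normal((K, N))

# Unspecialized student: every hidden unit carries the same mean teacher vector.
unspecialized = np.tile(teacher.mean(axis=0), (K, 1))
# Specialized student: each hidden unit matches exactly one teacher unit.
specialized = teacher.copy()

print("unspecialized:", generalization_error(unspecialized, teacher, relu))
print("specialized:  ", generalization_error(specialized, teacher, relu))
```

Any other activation function that acts elementwise on a NumPy array can be passed in place of `relu`, which makes it easy to compare the two kinds of student configuration for different activations under the same assumptions.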


'Phase Transitions in Layered Neural Networks: The Role of The Activation Function' (December 2020)

Master thesis  -  Supervisors: Michael Biehl, Kerstin Bunte, Michiel Straat

ABSTRACT: One of the improvements to the model of artificial neural networks is the use of activation functions other than the conventional sigmoidal function. Many alternative activation functions have been proposed, all of which are claimed to offer superior performance. While extensive comparisons of activation functions based on their performance have been made, a theoretical foundation that explains the observed differences is lacking. In this thesis we investigate which characteristics of an activation function determine the type of phase transition. To this end, we borrow concepts from statistical physics to study the learning behaviour of artificial neural networks in the context of off-line learning. Five activation functions are studied: sigmoidal, Rectified Linear Unit (ReLU), Leaky Rectified Linear Unit (LReLU), Piecewise Linear Unit (PLU), and a novel activation function, dubbed Rectified Piecewise Linear Unit (RePLU). Our research shows that sigmoidal and PLU activation both cause a discontinuous phase transition in networks with more than three hidden units, whereas ReLU and LReLU activation induce a continuous phase transition. RePLU causes a discontinuous phase transition for a particular range of its slope, but induces a continuous phase transition when its slope exceeds the upper limit of this range. We hypothesize that a continuous phase transition arises when the response of the activation function is linear and the slope on the negative domain differs from the slope on the positive domain.

'Learning of single-layer neural networks: ReLU vs. sigmoidal activation' (November 2019)

Research project  -  Supervisors: Michael Biehl, Michiel Straat

ABSTRACT: Due to great advancements in hardware and the availability of large amounts of data, artificial neural networks have regained the interest of scientists. Conventionally, sigmoidal activation is used in the hidden units of these networks. However, so-called rectified linear units (ReLU) have been proposed as a better alternative, due to the activation function's computational ease and higher training rate compared to sigmoidal activation. These claims are, however, mainly based on empirical data, so a theoretical approach is needed to understand the fundamental differences between sigmoidal and ReLU activation, if there are any at all. In this study we investigate why ReLU might perform better than sigmoidal units by examining their fundamental differences in the context of off-line learning, using a statistical physics approach. We restrict ourselves to shallow networks with a single hidden layer as a first model system. We find that, while sigmoidal activation undergoes a first-order phase transition for three hidden units, ReLU still exhibits a second-order phase transition in this case, which is beneficial for performance. This provides theoretical evidence that ReLU indeed performs better than sigmoidal activation, at least for this small number of units.

'Efficiency of organic solar cells: Improving a model for the fill factor' (July 2018)

Bachelor thesis  -  Supervisors: Michael Biehl, Jan Anton Koster

ABSTRACT: The need for renewable ways of generating energy, such as solar energy, is urgent. Solar cells made from inorganic materials have already made great progress: state-of-the-art inorganic solar cells have an efficiency of around 45%. There are, however, alternatives to these inorganic solar cells, namely organic solar cells, which have interesting mechanical properties such as flexibility. It is difficult to obtain a high efficiency for these, though, with the most recent research obtaining an efficiency of only 15%. A better understanding of what determines the efficiency of organic solar cells is therefore needed. In this research an attempt has been made to improve the existing model for the fill factor, a measure of the efficiency, which was developed by Bartesaghi et al. This has been done using two approaches. The first takes the point of view of physics, using both steady-state and transient simulations; for the former, a numerical model of a solar cell had to be set up. The second tries to optimize the model from the side of computing science, mainly using a basic technique from machine learning: linear regression.
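
For readers unfamiliar with the quantity, the fill factor is conventionally defined as the maximum extractable power divided by the product of short-circuit current density and open-circuit voltage, and it enters the power conversion efficiency as a multiplicative factor. The symbols below follow standard photovoltaic notation and are not taken from the thesis itself:

```latex
\mathrm{FF} = \frac{J_{\mathrm{mpp}}\, V_{\mathrm{mpp}}}{J_{\mathrm{sc}}\, V_{\mathrm{oc}}},
\qquad
\eta = \frac{\mathrm{FF}\; J_{\mathrm{sc}}\, V_{\mathrm{oc}}}{P_{\mathrm{in}}}
```

Here J_mpp and V_mpp are the current density and voltage at the maximum power point, J_sc is the short-circuit current density, V_oc is the open-circuit voltage, and P_in is the incident light power density.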