Adaptive hybrid activation function for deep neural networks
The adaptive hybrid activation function (AHAF) is proposed that combines the properties of the rectifier units and the squashing functions. The proposed function can be used as a drop-in replacement for ReLU, SiL and Swish activations for deep neural networks and can evolve to one of such functions...
| Date: | 2022 |
|---|---|
| Main authors: | Bodyanskiy, Yevgeniy; Kostiuk, Serhii |
| Format: | Article |
| Language: | English |
| Published: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2022
|
| Subjects: | |
| Online access: | https://journal.iasa.kpi.ua/article/view/259203 |
| Journal title: | System research and information technologies |
Institution
System research and information technologies
| _version_ | 1866391922028838912 |
|---|---|
| author | Bodyanskiy, Yevgeniy Kostiuk, Serhii |
| author_facet | Bodyanskiy, Yevgeniy Kostiuk, Serhii |
| author_sort | Bodyanskiy, Yevgeniy |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2022-06-21T10:27:50Z |
| doi_str_mv | 10.20535/SRIT.2308-8893.2022.1.07 |
| first_indexed | 2025-07-17T10:27:52Z |
| format | Article |
| fulltext |
Ye. Bodyanskiy, S. Kostiuk, 2022
System Research and Information Technologies, 2022, No. 1, p. 87
UDC 004.8:004.032.26
DOI: 10.20535/SRIT.2308-8893.2022.1.07
ADAPTIVE HYBRID ACTIVATION FUNCTION
FOR DEEP NEURAL NETWORKS
Ye. BODYANSKIY, S. KOSTIUK
Abstract. An adaptive hybrid activation function (AHAF) is proposed that
combines the properties of rectifier units and squashing functions. The proposed
function can be used as a drop-in replacement for the ReLU, SiL and Swish
activations in deep neural networks and can evolve into one of these functions
during training. The effectiveness of the function was evaluated on the image
classification task using the Fashion-MNIST and CIFAR-10 datasets. The evaluation
shows that neural networks with AHAF activations achieve better classification
accuracy than their base implementations that use ReLU and SiL. A double-stage
parameter tuning process for training neural networks with AHAF is proposed. The
proposed approach is sufficiently simple from the implementation standpoint and
provides high performance for the neural network training process.
Keywords: adaptive hybrid activation function, double-stage parameter tuning
process, deep neural networks.
INTRODUCTION
In recent years, deep neural networks (DNNs) have been widely adopted for
solving a broad range of problems in virtually all areas of human activity,
including Data Mining, Big Data, Data Science, digital video and audio signal
processing, natural language processing, and forecasting and control of complex
systems [1–6].
The common properties of all neural networks are their learning ability, which
consists of tuning the parameters (and, possibly, the architecture) while
processing the available information, and their universal approximation
capabilities [7, 8], which allow them to analyze and recover arbitrarily complex
nonlinear dependencies in the source data.
The most popular neural node of the DNN is the elementary perceptron of
F. Rosenblatt, which uses a so-called squashing function as its activation
function [7]: the sigmoid σ-function, which is the most common squashing
function, tanh, Softsign, Satlin, arctan and others. At the same time, the
application of squashing functions runs into a computational difficulty (the
so-called vanishing gradient effect): their derivatives approach zero as the
input signal moves further from the origin.
Therefore, instead of the squashing functions, various DNN implementations
commonly use piece-wise activation functions that belong to the so-called
“rectified unit” family [9], which includes ReLU, ELU, PReLU, LReLU, NReLU and
other similar functions [10–12]. It should be noted that piece-wise activation
functions allow only piece-wise approximation, i.e., the number of nodes and
layers in the neural network must be significantly increased to provide the
required approximation capacity for non-trivial dependencies.
At the same time, there is a relatively wide group of recurrent neural
networks [13], such as long short-term memory networks, transformers and similar
networks, that use squashing functions in their gated recurrent units [3]. For
this reason, hybrid activation functions were introduced that combine the
properties of both the rectifiers and the sigmoid functions. The list of hybrid
functions includes SiL [14], Swish [15], S-shaped [16], WiG [17] and other
similar functions [18, 19].
All such hybrid activation functions have free parameters that define their
exact shape, amplitude and singular points; these parameters have to be selected
and adjusted for each specific task. In this regard, it is advisable to
introduce additional procedures for automatic adjustment of the activation
function parameters. The works [20–25] address off-line procedures that find the
required function parameters after the synaptic weights of the network have
already been set up. Clearly, this approach significantly increases the training
time.
In [26] the adaptive parametric rectified linear activation function (AdPReLU)
was introduced, whose parameters are adjusted simultaneously with the synaptic
weights during the error backpropagation procedure. This approach reduced the
training time and improved the quality of the obtained solution compared to
Adaline, ReLU and tanh on the prediction task.
It is advisable to implement a similar approach for hybrid activation func-
tions [14–19] and synthesize on their basis an adaptive activation function that is
a generalization of the ones that are already used in the DNN applications.
ARCHITECTURE OF A NEURON WITH ADAPTIVE HYBRID ACTIVATION
FUNCTION
The elementary perceptron of F. Rosenblatt, as a node of a neural network,
performs a non-linear transformation of the following form:
$$\hat{y}_j(k) = \psi_j\Big(\theta_{j0} + \sum_{i=1}^{n} w_{ji} x_i(k)\Big) = \psi_j\Big(\sum_{i=0}^{n} w_{ji} x_i(k)\Big) = \psi_j\big(w_j^T x(k)\big) = \psi_j(u_j(k)),$$
where $\hat{y}_j(k)$ is the output signal of the j-th neuron of the network at
the k-th data processing step, $k = 1, 2, 3, \ldots, N, \ldots$;
$\psi_j(u_j(k))$ is the non-linear transformation performed by the activation
function on the internal activation signal $u_j(k)$; $\theta_{j0}$ is the
threshold signal; $w_{ji}$ is the synaptic weight on the i-th input of the j-th
neuron, $i = 0, 1, 2, \ldots, n$, $w_{j0} \equiv \theta_{j0}$;
$w_j = (w_{j0}, w_{j1}, \ldots, w_{jn})^T \in R^{n+1}$; and
$x(k) = (1, x_1(k), \ldots, x_n(k))^T$ is the $(n+1) \times 1$-dimensional
vector of input signals.
One of the most popular activation functions in neural networks is the
so-called sigmoid function, studied by G. Cybenko [7], which has the following
form:
$$\psi_j(u_j) = \sigma(\gamma_j u_j) = \frac{1}{1 + e^{-\gamma_j u_j}}, \qquad (1)$$
where $\gamma_j$ is the so-called gain parameter [20] that defines the shape of
this function. The gain parameter is often assumed to be equal to 1.
While sigmoid activation functions provide universal approximation
capabilities for the neural network, their application in DNNs runs into
computational difficulties when the internal activation signal grows in
amplitude. In those cases, the derivative of the σ-function approaches zero,
i.e., the “vanishing gradient” effect becomes stronger.
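The effect can be checked numerically. The following minimal Python sketch evaluates the derivative of the sigmoid (1) for a growing input amplitude. The function names `sigmoid` and `sigmoid_grad` are illustrative and are not taken from the authors' implementation.

```python
import math


def sigmoid(u: float, gamma: float = 1.0) -> float:
    """Sigmoid with gain gamma, as in (1)."""
    return 1.0 / (1.0 + math.exp(-gamma * u))


def sigmoid_grad(u: float, gamma: float = 1.0) -> float:
    """Derivative of sigmoid(gamma * u) with respect to u."""
    s = sigmoid(u, gamma)
    return gamma * s * (1.0 - s)


# The derivative peaks at the origin and shrinks quickly as |u| grows.
for u in (0.0, 2.0, 5.0, 10.0):
    print(f"u = {u:4.1f}   d(sigma)/du = {sigmoid_grad(u):.6f}")
```

With the gain fixed at 1, the derivative equals 0.25 at the origin and is already below 0.0001 at u = 10, which is the behaviour the paragraph above describes.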
To overcome this problem, we propose using a hybrid activation function of
the following form:
$$\psi_j(u_j) = \beta_j u_j \sigma(\gamma_j u_j) = \frac{\beta_j u_j}{1 + e^{-\gamma_j u_j}}, \qquad (2)$$
where $\beta_j$ and $\gamma_j$ are parameters that are determined together with
the synaptic weights during the training process. Being a modification of (1),
activation function (2) does not suffer from the vanishing gradient effect. Note
that the derivative of (2) with respect to the internal activation signal,
$$\frac{\partial \psi_j(u_j)}{\partial u_j} = \beta_j \sigma(\gamma_j u_j)\big(1 + \gamma_j u_j (1 - \sigma(\gamma_j u_j))\big),$$
takes values of small amplitude only when $u_j < 0$, which can be compensated
by adjusting the gain parameter $\gamma_j$.
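A minimal Python sketch of function (2) and its derivative, assuming scalar inputs and the standard logistic sigmoid. The names `ahaf`, `ahaf_grad_u` and `numeric_grad` are hypothetical, and the parameter values used below are illustrative choices rather than the initialization used by the authors.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ahaf(u: float, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Hybrid activation (2): beta * u * sigmoid(gamma * u)."""
    return beta * u * sigmoid(gamma * u)


def ahaf_grad_u(u: float, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Analytic derivative of (2) with respect to u."""
    s = sigmoid(gamma * u)
    return beta * s * (1.0 + gamma * u * (1.0 - s))


def numeric_grad(f, u: float, h: float = 1e-6) -> float:
    """Central finite difference, used only to cross-check the formula."""
    return (f(u + h) - f(u - h)) / (2.0 * h)


# beta = 1, gamma = 1 gives the SiL shape u * sigmoid(u).
print("SiL-like  :", ahaf(1.0))
# A large gamma (illustrative value) makes the function close to ReLU.
print("ReLU-like :", ahaf(2.0, 1.0, 50.0), ahaf(-2.0, 1.0, 50.0))
# The analytic and numeric derivatives should agree closely.
print("analytic  :", ahaf_grad_u(0.7, 1.5, 2.0))
print("numeric   :", numeric_grad(lambda v: ahaf(v, 1.5, 2.0), 0.7))
```

Keeping β fixed at 1 and training only γ corresponds to the Swish form, so the same two parameters cover all three activations named in the abstract.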
Fig. 1 shows the architecture of an artificial neuron with the adaptive hybrid
activation function (2) (AHAF), in which the function parameters $\beta_j$ and
$\gamma_j$ are trained together with the vector of synaptic weights.
Here $y_j$ is the external reference signal, and
$$e_j = y_j - \hat{y}_j = y_j - \psi_j(u_j) = y_j - \beta_j u_j \left(1 + e^{-\gamma_j u_j}\right)^{-1}$$
is the learning error.
Fig. 1. Neuron with adaptive hybrid activation function (AHAF)
TRAINING ALGORITHM FOR A NEURON WITH AHAF
For training artificial neurons with AHAF we use the standard δ-rule [9]. For
a regular perceptron of F. Rosenblatt and the squared error loss criterion
$$E_j(k) = \frac{1}{2} e_j^2(k) = \frac{1}{2}\big(y_j(k) - \psi_j(u_j(k))\big)^2 = \frac{1}{2}\Big(y_j(k) - \psi_j\Big(\sum_{i=0}^{n} w_{ji} x_i(k)\Big)\Big)^2$$
this rule refines the synaptic weights with a recurrent procedure:
$$\begin{aligned}
w_{ji}(k) &= w_{ji}(k-1) - \eta_w(k) \frac{\partial E_j(k)}{\partial e_j(k)} \frac{\partial e_j(k)}{\partial w_{ji}} \\
&= w_{ji}(k-1) - \eta_w(k)\, e_j(k) \frac{\partial e_j(k)}{\partial w_{ji}} = w_{ji}(k-1) - \eta_w(k)\, e_j(k) \frac{\partial e_j(k)}{\partial u_j(k)} \frac{\partial u_j(k)}{\partial w_{ji}} \\
&= w_{ji}(k-1) + \eta_w(k)\, e_j(k)\, \psi_j'(u_j(k))\, x_i(k) = w_{ji}(k-1) + \eta_w(k)\, \delta_j(k)\, x_i(k),
\end{aligned}$$
where $\eta_w(k)$ is the learning rate parameter, the choice of which determines
the convergence rate and the filtering (smoothing) properties of the algorithm,
and $\delta_j(k) = e_j(k)\, \psi_j'(u_j(k))$ is the so-called δ-error, on which
the error backpropagation procedure for training multilayer neural networks is
based.
For a neuron with AHAF, which has a two-layer architecture (the first layer
consists of the synaptic weights $w_{ji}$, $i = 0, 1, \ldots, n$, the second of
the tunable parameters $\beta_j$ and $\gamma_j$), backpropagation is implemented
on a per-neuron level: the parameters of the activation function are tuned
first, and the synaptic weights second. This training procedure is referred to
in this paper as the double-stage parameter tuning procedure (the DSPT
procedure).
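Taking into account the derivatives of the activation function
$\psi_j(u_j, \beta_j, \gamma_j)$ with respect to its parameters,
$$\frac{\partial \psi_j}{\partial \beta_j} = u_j\, \sigma(\gamma_j u_j) = \frac{u_j}{1 + e^{-\gamma_j u_j}},$$
$$\frac{\partial \psi_j}{\partial \gamma_j} = \beta_j u_j^2\, \sigma(\gamma_j u_j)\big(1 - \sigma(\gamma_j u_j)\big) = \beta_j u_j^2 \cdot \frac{1}{1 + e^{-\gamma_j u_j}} \cdot \frac{e^{-\gamma_j u_j}}{1 + e^{-\gamma_j u_j}},$$
the δ-rule for tuning the activation function parameters can be written in the
form
$$\begin{aligned}
\beta_j(k) &= \beta_j(k-1) - \eta_\beta(k) \frac{\partial E_j(k)}{\partial \beta_j} = \beta_j(k-1) - \eta_\beta(k)\, e_j(k) \frac{\partial e_j(k)}{\partial \beta_j} \\
&= \beta_j(k-1) + \eta_\beta(k) \big(y_j(k) - \psi_j(u_j(k), \beta_j(k-1), \gamma_j(k-1))\big)\, u_j(k)\, \sigma(\gamma_j(k-1) u_j(k)),
\end{aligned}$$
where $u_j(k) = w_j^T(k-1)\, x(k)$, and
$$\begin{aligned}
\gamma_j(k) &= \gamma_j(k-1) - \eta_\gamma(k) \frac{\partial E_j(k)}{\partial \gamma_j} = \gamma_j(k-1) - \eta_\gamma(k)\, e_j(k) \frac{\partial e_j(k)}{\partial \gamma_j} \\
&= \gamma_j(k-1) + \eta_\gamma(k) \big(y_j(k) - \psi_j(u_j(k), \beta_j(k-1), \gamma_j(k-1))\big)\, \beta_j(k-1)\, u_j^2(k)\, \sigma(\gamma_j(k-1) u_j(k)) \big(1 - \sigma(\gamma_j(k-1) u_j(k))\big) \\
&= \gamma_j(k-1) + \eta_\gamma(k)\, e_j(k)\, \beta_j(k-1)\, u_j^2(k)\, \sigma(\gamma_j(k-1) u_j(k)) \big(1 - \sigma(\gamma_j(k-1) u_j(k))\big).
\end{aligned}$$
After $\beta_j$ and $\gamma_j$ are tuned, the training error is recalculated:
$$\tilde{e}_j(k) = y_j(k) - \psi_j(u_j(k), \beta_j(k), \gamma_j(k)) = y_j(k) - \frac{\beta_j(k)\, u_j(k)}{1 + e^{-\gamma_j(k) u_j(k)}} = y_j(k) - \frac{\beta_j(k)\, w_j^T(k-1) x(k)}{1 + e^{-\gamma_j(k) w_j^T(k-1) x(k)}},$$
and the synaptic weights are tuned:
$$\begin{aligned}
w_{ji}(k) &= w_{ji}(k-1) + \eta_w(k)\, \tilde{e}_j(k) \frac{\partial \psi_j(u_j(k), \beta_j(k), \gamma_j(k))}{\partial u_j(k)}\, x_i(k) \\
&= w_{ji}(k-1) + \eta_w(k)\, \tilde{e}_j(k)\, \beta_j(k)\, \sigma(\gamma_j(k) u_j(k)) \big(1 + \gamma_j(k) u_j(k) (1 - \sigma(\gamma_j(k) u_j(k)))\big)\, x_i(k) \\
&= w_{ji}(k-1) + \eta_w(k)\, \tilde{\delta}_j(k)\, x_i(k),
\end{aligned}$$
where
$$\tilde{\delta}_j(k) = \tilde{e}_j(k)\, \psi_j'(u_j(k), \beta_j(k), \gamma_j(k)) = \tilde{e}_j(k)\, \beta_j(k)\, \sigma(\gamma_j(k) u_j(k)) \big(1 + \gamma_j(k) u_j(k) (1 - \sigma(\gamma_j(k) u_j(k)))\big).$$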
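The double-stage procedure for a single neuron can be sketched in plain Python as follows. This is an illustration written from the update rules of this section, not the authors' PyTorch code. The function name `dspt_step`, the constant learning rates and the toy data are all assumptions made for the example.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dspt_step(w, beta, gamma, x, y, lr_w=0.05, lr_beta=0.05, lr_gamma=0.05):
    """One double-stage update of a single AHAF neuron.

    w and x are lists of equal length; x[0] is expected to be 1 (bias input).
    Returns the updated (w, beta, gamma) and the errors before and after stage 1.
    """
    u = sum(wi * xi for wi, xi in zip(w, x))   # u(k) = w(k-1)^T x(k)
    s = sigmoid(gamma * u)
    e = y - beta * u * s                       # error with the old parameters

    # Stage 1: tune the activation parameters beta and gamma.
    beta_new = beta + lr_beta * e * u * s
    gamma_new = gamma + lr_gamma * e * beta * u * u * s * (1.0 - s)

    # Recompute the error with the tuned activation parameters.
    s_new = sigmoid(gamma_new * u)
    e_tilde = y - beta_new * u * s_new

    # Stage 2: tune the synaptic weights using the recomputed delta-error.
    delta_tilde = e_tilde * beta_new * s_new * (1.0 + gamma_new * u * (1.0 - s_new))
    w_new = [wi + lr_w * delta_tilde * xi for wi, xi in zip(w, x)]
    return w_new, beta_new, gamma_new, e, e_tilde


# Toy usage: fit one input/target pair (values chosen only for illustration).
w, beta, gamma = [0.1, 0.2, -0.1], 1.0, 1.0
x, y = [1.0, 0.5, -0.3], 0.8
history = []
for _ in range(500):
    w, beta, gamma, e, e_tilde = dspt_step(w, beta, gamma, x, y)
    history.append((e, e_tilde))
print(f"first error {history[0][0]:.4f}, last error {history[-1][0]:.2e}")
```

The weight update deliberately uses the recomputed error rather than the original one, which is the point where the double-stage procedure differs from updating all parameters in one go.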
With regard to the selection of the learning rate parameters $\eta_\beta$,
$\eta_\gamma$, $\eta_w$, adaptive training algorithms such as Adam [27], which
are popular in DNNs, can be successfully replaced by algorithms with filtering
and tracking properties [28] that have a sufficiently high speed of convergence.
For training multi-layer networks, the hybrid error backpropagation procedure
can be used. Compared to the standard procedure, it calculates the training
error and the δ-error twice for each hybrid layer of the network: $e_j(k)$,
$\tilde{e}_j(k)$, $\delta_j(k)$, $\tilde{\delta}_j(k)$.
EVALUATION
The performance of the adaptive hybrid activation function was evaluated on
the image classification task on two different datasets with two base neural
network architectures, in a way similar to [29]. The base architectures were
modified to use AHAF activations instead of “classic” activations such as ReLU
and SiL. The performance of the modified networks was compared to the reference
implementations. The neural network implementations, together with the
evaluation and training environment, were coded in Python 3.8 using PyTorch
1.9.0 [30]. The implementation is publicly available on GitHub:
https://git.io/JDBIZ.
A. Dataset
The models with adaptive hybrid activation function were evaluated on two data-
sets: Fashion-MNIST [31] and CIFAR-10 [32].
Fashion-MNIST is a dataset that contains 60000 monochrome images, each
28×28 pixels in size, with associated class labels. Out of all images, 50000
are used for training and 10000 for validation. The classes are exclusive;
one-hot encoding was used for the class labels. The pixel values were divided by
255 to rescale them to the [0,0..1,0] range. The images were augmented using a
random horizontal flip with a flip probability of 0,5 and a random shift along
both width and height with a maximum shift factor of 0,1.
CIFAR-10 is a dataset of 60000 RGB images, each 32×32 pixels in size and each
with one of 10 class labels associated with it. The train to test ratio is 5:1,
where all images are randomly selected from the whole dataset. The classes are
exclusive; one-hot encoding was used for the class labels. Pixel values on all
color channels were rescaled to the [0,0..1,0] range by dividing them by 255.
The training set was augmented using a random horizontal flip with a probability
of 0,5 and a random horizontal and vertical shift with a maximum factor of 0,1.
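The preprocessing steps described for both datasets can be sketched in plain Python on a single-channel image stored as a list of rows. This is only an illustration: the helper names are hypothetical, and the whole-pixel shift and the zero fill for uncovered pixels are assumptions, since the text does not say how the shift is implemented.

```python
import random


def rescale(image):
    """Map 8-bit pixel values to the [0, 1] range by dividing by 255."""
    return [[p / 255 for p in row] for row in image]


def random_hflip(image, p=0.5, rng=random):
    """Mirror the image left to right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def random_shift(image, max_factor=0.1, fill=0.0, rng=random):
    """Shift by up to max_factor of the height and width, in whole pixels.

    Pixels that move in from outside the image get the value `fill`
    (an assumption made for this sketch).
    """
    h, w = len(image), len(image[0])
    dy = rng.randint(-int(h * max_factor), int(h * max_factor))
    dx = rng.randint(-int(w * max_factor), int(w * max_factor))
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r - dy, c - dx
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = image[sr][sc]
    return out


# Usage on a synthetic 28x28 image with 8-bit values.
raw = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]
augmented = random_shift(random_hflip(rescale(raw)))
print(len(augmented), len(augmented[0]))
```

For a 28×28 image the maximum shift under these assumptions is 2 pixels in each direction, and for a 32×32 image it is 3 pixels.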
B. Neural Networks and Activations
Two base neural network architectures were used in the experiment: LeNet-5
[33] and KerasNet from Keras version 1.2.2 [34].
LeNet-5 is a simple convolutional neural network consisting of 4 layers: 2
convolutional layers with pooling and activation functions, 1 linear layer with
an activation function and 1 output linear layer with Softmax. The convolutional
layers use 5×5 filters, with 20 output channels for the first layer and 50
output channels for the second layer. Max pooling with a 2×2 kernel is used as
the pooling implementation. The hidden linear layer has 500 output features; the
output layer has 10, one per class. Several variants of LeNet-5 were used for
evaluation: one with ReLU activations for the hidden layers, one with SiL, one
with AHAF activations initialized as ReLU and one with AHAF activations
initialized as SiL. The total number of parameters depends on the size of the
input images: 431000 for Fashion-MNIST and 657000 for CIFAR-10. This total does
not count the parameters of the AHAF activations.
KerasNet is a neural network that is partially similar to VGG. The network
has 6 layers: 4 convolutional layers with activation functions, with every
second layer followed by max pooling and dropout, 1 hidden linear layer with an
activation function and dropout, and 1 output linear layer with Softmax
activation. The first and the second convolutional layers have 32 output
channels with 3×3 filters; the first layer applies 1×1 padding to its input,
while the second one does not apply any padding. Max pooling with a 2×2 kernel
and dropout with a probability of 0,25 follow the first two convolutional
layers. The third and the fourth convolutional layers use 3×3 filters and have
64 output channels; the third layer applies 1×1 padding while the fourth does
not apply any padding. Max pooling with a 2×2 kernel and dropout with a
probability of 0,25 are used after the third and fourth convolutional layers.
The hidden linear layer has 512 output features; dropout with a probability of
0,5 is applied after the hidden linear layer. The output layer has 10 output
features, one per class. Several variants of KerasNet were used for evaluation:
one with ReLU activations for the hidden layers, one with SiL, one with AHAF
activations initialized as ReLU and one with AHAF activations initialized as
SiL. The total number of parameters depends on the size of the input images:
889834 for Fashion-MNIST and 1250858 for CIFAR-10. This total does not count the
parameters of the AHAF activations.
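The quoted parameter totals can be reproduced with a short calculation. The sketch below assumes that every convolutional and linear layer has a bias term and that the LeNet-5 convolutions use no padding. Neither detail is stated in the text; both are assumptions chosen because they match the reported numbers. The helper names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of a k x k convolution."""
    return c_out * (c_in * k * k + 1)


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_out * (n_in + 1)


def lenet5_params(channels: int, size: int) -> int:
    s = (size - 4) // 2   # 5x5 convolution without padding, then 2x2 pooling
    s = (s - 4) // 2      # second convolution and pooling
    return (conv_params(channels, 20, 5) + conv_params(20, 50, 5)
            + linear_params(50 * s * s, 500) + linear_params(500, 10))


def kerasnet_params(channels: int, size: int) -> int:
    s = (size - 2) // 2   # padded conv keeps the size, unpadded conv removes 2, then pooling
    s = (s - 2) // 2      # same pattern for the third and fourth convolutions
    return (conv_params(channels, 32, 3) + conv_params(32, 32, 3)
            + conv_params(32, 64, 3) + conv_params(64, 64, 3)
            + linear_params(64 * s * s, 512) + linear_params(512, 10))


# Fashion-MNIST: 1 channel, 28x28; CIFAR-10: 3 channels, 32x32.
print("LeNet-5 :", lenet5_params(1, 28), lenet5_params(3, 32))
print("KerasNet:", kerasnet_params(1, 28), kerasnet_params(3, 32))
```

Under these assumptions the KerasNet totals match the text exactly, while the LeNet-5 totals come out as 431080 and 657080, which suggests that the LeNet-5 figures in the text are rounded to the nearest thousand.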
C. Training Procedures
The neural networks were trained on the Fashion-MNIST and CIFAR-10 datasets
with a batch size of 64 for 100 epochs on a laptop with an NVIDIA GeForce GTX
1650 Max-Q. The RMSprop optimizer was used for training, with an initial
learning rate of $10^{-4}$ and a learning rate decay of $10^{-6}$ applied per
minibatch.
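The text does not spell out the decay rule. As a rough worked example, the sketch below assumes the inverse-time rule used by the legacy Keras optimizers, lr = lr0 / (1 + decay · step). This form is an assumption, suggested only by the fact that KerasNet originates from Keras, and the authors' PyTorch implementation may apply the decay differently.

```python
import math

LR0 = 1e-4            # initial learning rate from the text
DECAY = 1e-6          # decay per minibatch from the text
BATCH_SIZE = 64
TRAIN_IMAGES = 50_000
EPOCHS = 100


def decayed_lr(step: int, lr0: float = LR0, decay: float = DECAY) -> float:
    """Inverse-time decay (assumed form): lr0 / (1 + decay * step)."""
    return lr0 / (1.0 + decay * step)


# Assumes the last, incomplete minibatch of each epoch is kept.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps, decayed_lr(total_steps))
```

Under these assumptions the learning rate falls by roughly 7% over the whole 100-epoch run, so the decay acts as a mild correction rather than a schedule that dominates training.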
The neural network variants with AHAF activations were trained using both the
“classic” training procedure (when all trainable parameters are updated in one
go) and the DSPT procedure. The implementation of the DSPT procedure uses a
separate instance of the optimizer class for each set of parameters: one for all
AHAF parameters and one for the trainable parameters outside of the AHAF
activations.
The training set loss and the test set accuracy were recorded for each of the
training runs. The results of the training are analyzed and presented in the
following section.
D. Analysis of Results
For each network and dataset, the best test set accuracy is achieved by a
variant with AHAF activations, which outperforms the base implementations with
ReLU and SiL activations on both CIFAR-10 and Fashion-MNIST. LeNet-5 achieves
its best result on the Fashion-MNIST dataset with AHAF activations initialized
as ReLU and the DSPT procedure. KerasNet achieves its best result on the
CIFAR-10 dataset with AHAF activations initialized as SiL and the DSPT
procedure. The table presents the best achieved test set accuracy and the epoch
at which this result was achieved for each of the network variants, datasets and
parameter tuning procedures used for evaluation.
Best test set accuracy, up to 100 epochs
| Network | Activ. | Init. | Proc. | Fashion-MNIST Acc., % | Fashion-MNIST Epoch | CIFAR-10 Acc., % | CIFAR-10 Epoch |
|---|---|---|---|---|---|---|---|
| LeNet-5 | ReLU | N/A | Classic | 91,43 | 98 | 75,89 | 96 |
| LeNet-5 | SiL | N/A | Classic | 90,60 | 95 | 73,76 | 95 |
| LeNet-5 | AHAF | ReLU | Classic | 91,55 | 99 | 76,69 | 95 |
| LeNet-5 | AHAF | SiL | Classic | 91,16 | 99 | 74,47 | 99 |
| LeNet-5 | AHAF | ReLU | DSPT | 91,73 | 93 | 74,44 | 95 |
| LeNet-5 | AHAF | SiL | DSPT | 90,95 | 100 | 74,05 | 95 |
| KerasNet | ReLU | N/A | Classic | 91,29 | 100 | 79,36 | 97 |
| KerasNet | SiL | N/A | Classic | 91,76 | 93 | 79,83 | 99 |
| KerasNet | AHAF | ReLU | Classic | 91,30 | 84 | 79,71 | 100 |
| KerasNet | AHAF | SiL | Classic | 92,02 | 97 | 80,31 | 98 |
| KerasNet | AHAF | ReLU | DSPT | 91,35 | 55 | 79,30 | 96 |
| KerasNet | AHAF | SiL | DSPT | 91,96 | 98 | 80,37 | 98 |
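The comparison in the table can be summarized with a few lines of Python. The accuracy values below are copied from the table, while the helper names and the way the comparison is organized are choices made for this sketch.

```python
# (network, activation, init, procedure, Fashion-MNIST acc. %, CIFAR-10 acc. %)
ROWS = [
    ("LeNet-5", "ReLU", "N/A", "Classic", 91.43, 75.89),
    ("LeNet-5", "SiL", "N/A", "Classic", 90.60, 73.76),
    ("LeNet-5", "AHAF", "ReLU", "Classic", 91.55, 76.69),
    ("LeNet-5", "AHAF", "SiL", "Classic", 91.16, 74.47),
    ("LeNet-5", "AHAF", "ReLU", "DSPT", 91.73, 74.44),
    ("LeNet-5", "AHAF", "SiL", "DSPT", 90.95, 74.05),
    ("KerasNet", "ReLU", "N/A", "Classic", 91.29, 79.36),
    ("KerasNet", "SiL", "N/A", "Classic", 91.76, 79.83),
    ("KerasNet", "AHAF", "ReLU", "Classic", 91.30, 79.71),
    ("KerasNet", "AHAF", "SiL", "Classic", 92.02, 80.31),
    ("KerasNet", "AHAF", "ReLU", "DSPT", 91.35, 79.30),
    ("KerasNet", "AHAF", "SiL", "DSPT", 91.96, 80.37),
]
COLUMNS = {"Fashion-MNIST": 4, "CIFAR-10": 5}


def best_row(network: str, dataset: str, ahaf: bool):
    """Best AHAF row (ahaf=True) or best baseline row (ahaf=False)."""
    col = COLUMNS[dataset]
    rows = [r for r in ROWS if r[0] == network and (r[1] == "AHAF") == ahaf]
    return max(rows, key=lambda r: r[col])


for network in ("LeNet-5", "KerasNet"):
    for dataset, col in COLUMNS.items():
        top = best_row(network, dataset, ahaf=True)
        base = best_row(network, dataset, ahaf=False)
        gain = round(top[col] - base[col], 2)
        print(f"{network:8s} {dataset:13s} best AHAF: init {top[2]}, {top[3]}, "
              f"{top[col]:.2f}% (+{gain:.2f} over {base[1]})")
```

Read this way, the table shows that the best AHAF variant beats the best baseline in all four network and dataset combinations, although individual AHAF variants, such as LeNet-5 with the DSPT procedure on CIFAR-10, remain below the ReLU baseline.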
Analysis of the dependency between the training loss, the test set accuracy
and the training epoch shows a potential for performance improvements using
longer training runs (more epochs), different optimizers and different learning
rates. For KerasNet on the CIFAR-10 dataset, the SiL-initialized AHAF activation
function consistently shows a lower training loss and a higher test set accuracy
than the base implementation with SiL. Fig. 2 illustrates the dependency between
the training loss, the test set error, and the training epoch for the KerasNet
network trained on the CIFAR-10 dataset.
Fig. 2. Dependency between the loss, accuracy and the training epoch for KerasNet
network on CIFAR-10
For neural networks with AHAF initialized as ReLU, AHAF keeps its ReLU-like
form but changes its amplitude during the training process. This observation
can be explained by the values of the gradient with respect to the γ parameter:
the gradient decreases as the γ parameter increases. For neural networks with
AHAF initialized as SiL, AHAF changes both its form and its amplitude during the
training process. Fig. 3 and Fig. 4 show the form of the activation functions
for the two final neurons of the KerasNet network trained on the CIFAR-10
dataset with ReLU-like and SiL-like AHAF activations, respectively.
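The behaviour of the gradient with respect to γ can be illustrated numerically. The sketch below evaluates the partial derivative of the activation with respect to γ, which equals β u² σ(γu)(1 − σ(γu)), for a fixed input and growing γ. Treating a ReLU-like shape as the case of large γ is an interpretation made for this example, and the function name is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ahaf_grad_gamma(u: float, beta: float, gamma: float) -> float:
    """Partial derivative of beta * u * sigmoid(gamma * u) with respect to gamma."""
    s = sigmoid(gamma * u)
    return beta * u * u * s * (1.0 - s)


# Fixed input u = 1 and amplitude beta = 1; only gamma changes.
grads = [ahaf_grad_gamma(1.0, 1.0, g) for g in (1.0, 5.0, 20.0)]
for g, value in zip((1.0, 5.0, 20.0), grads):
    print(f"gamma = {g:4.1f}   d(psi)/d(gamma) = {value:.3e}")
```

Because the gradient with respect to γ shrinks rapidly once γ is large, a ReLU-like unit receives almost no signal for changing its shape, while the gradient with respect to β, which equals u σ(γu), stays of the order of u for positive inputs. This is consistent with the observation that such units mainly change their amplitude.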
CONCLUSIONS
An adaptive hybrid activation function (AHAF) is proposed that is applicable
in feed-forward and recurrent deep neural networks and combines the properties
of both the squashing functions and the functions from the rectified unit
family. This function does not suffer from the “vanishing gradient” effect, and
its parameters are trained together with the synaptic weights. A training
algorithm for a neuron based on AHAF is introduced.
The proposed approach is sufficiently simple from the implementation
standpoint and provides high performance for the neural network training process.
REFERENCES
1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning”, Nature, vol. 521, no. 7553,
pp. 436–444, 2015. doi: 10.1038/nature14539.
2. J. Schmidhuber, “Deep learning in neural networks: An overview”, Neural Net-
works, vol. 61, pp. 85–117, 2015. doi: 10.1016/j.neunet.2014.09.003.
Fig. 3. The activation function form for AHAF initialized as ReLU
Fig. 4. The activation function form for AHAF initialized as SiL
3. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
4. D. Graupe, Deep Learning Neural Networks: Design and Case Studies. USA: World
Scientific Publishing Co., Inc., 2016.
5. A.L. Caterini and D.E. Chang, Deep Neural Networks in a Mathematical Frame-
work, 1st ed. Springer Publishing Company, Incorporated, 2018.
6. C.C. Aggarwal, Neural Networks and Deep Learning: A Textbook, 1st ed. Springer
Publishing Company, Incorporated, 2018.
7. G. Cybenko, “Approximation by superpositions of a sigmoidal function”, Mathemat-
ics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989. doi:
10.1007/BF02551274.
8. K. Hornik, “Approximation capabilities of multilayer feedforward networks”, Neural
Networks, vol. 4, no. 2, pp. 251–257, 1991. doi: 10.1016/0893-6080(91)90009-T.
9. A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Proc-
essing, 1st ed. USA: John Wiley & Sons, Inc., 1993.
10. K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep into Rectifiers: Surpassing Hu-
man-Level Performance on ImageNet Classification”, in 2015 IEEE International
Conference on Computer Vision (ICCV), 2015, pp. 1026–1034. doi:
10.1109/ICCV.2015.123.
11. D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network
Learning by Exponential Linear Units (ELUs)”, arXiv [cs.LG], 2016.
12. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recogni-
tion”, in 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90.
13. S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory”, Neural Comput.,
vol. 9, no. 8, pp. 1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735.
14. S. Elfwing, E. Uchibe, and K. Doya, “Sigmoid-Weighted Linear Units for Neural
Network Function Approximation in Reinforcement Learning”, arXiv [cs.LG], 2017.
15. P. Ramachandran, B. Zoph, and Q.V. Le, “Searching for Activation Functions”,
arXiv [cs.NE], 2017.
16. X. Jin, C. Xu, J. Feng, Y. Wei, J. Xiong, and S. Yan, “Deep Learning with S-shaped
Rectified Linear Activation Units”, arXiv [cs.CV], 2015.
17. M. Tanaka, “Weighted Sigmoid Gate Unit for an Activation Function of Deep Neu-
ral Network”, arXiv [cs.CV], 2018.
18. B. Yuen, M.T. Hoang, X. Dong, and T. Lu, “Universal Activation Function For Ma-
chine Learning”, arXiv [cs.LG], 2020.
19. D. Misra, “Mish: A Self Regularized Non-Monotonic Activation Function”, arXiv
[cs.LG], 2020.
20. J.K. Kruschke and J.R. Movellan, “Benefits of gain: speeded learning and minimal
hidden layers in back-propagation networks”, IEEE Transactions on Systems, Man,
and Cybernetics, vol. 21, no. 1, pp. 273–280, 1991. doi: 10.1109/21.101159.
21. Z. Hu and H. Shao, “The study of neural network adaptive control systems”, Control
and Decision, no. 7, pp. 361–366, 1992.
22. C.-T. Chen and W.-D. Chang, “A Feedforward Neural Network with Function Shape
Autotuning”, Neural Netw., vol. 9, no. 4, pp. 627–641, 1996. doi: 10.1016/0893-
6080(96)00006-8.
23. E. Trentin, “Networks with Trainable Amplitude of Activation Functions”, Neural
Netw., vol. 14, no. 4–5, pp. 471–493, 2001. doi: 10.1016/S0893-6080(01)00028-4.
24. F. Agostinelli, M. Hoffman, P. Sadowski, and P. Baldi, “Learning Activation Func-
tions to Improve Deep Neural Networks”, arXiv [cs.NE], 2015.
25. L.R. Sütfeld, F. Brieger, H. Finger, S. Füllhase, and G. Pipa, “Adaptive Blending Units:
Trainable Activation Functions for Deep Neural Networks”, arXiv [cs.LG], 2018.
26. Y.V. Bodyanskiy, A. Deineko, I. Pliss, and V. Slepanska, “Formal Neuron Based on
Adaptive Parametric Rectified Linear Activation Function and its Learning”, in
Proc. 1st Int. Workshop on Digital Content & Smart Multimedia “DCSMART
2019”, vol. 2533, pp. 14–22.
27. D.P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization”, arXiv
[cs.LG], 2017.
28. P. Otto, Y. Bodyanskiy, and V. Kolodyazhniy, “A new learning algorithm for a fore-
casting neuro-fuzzy network”, Integrated Computer-Aided Engineering, vol. 10, pp.
399–409, 2003. doi: 10.3233/ICA-2003-10409.
29. F. Manessi and A. Rozza, “Learning Combinations of Activation Functions”, CoRR,
vol. abs/1801.09403, 2018.
30. A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning
Library”, in Advances in Neural Information Processing Systems 32, H. Wallach, H.
Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran
Associates, Inc., 2019, pp. 8024–8035.
31. H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a Novel Image Dataset for
Benchmarking Machine Learning Algorithms”, arXiv [cs.LG], 2017.
32. A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.
33. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to
document recognition”, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324,
1998. doi: 10.1109/5.726791.
34. F. Chollet et al., “Keras”, 2015. [Online]. Available: https://github.com/fchollet/keras.
Received 17.12.2021
INFORMATION ON THE ARTICLE
Yevgeniy V. Bodyanskiy, ORCID: 0000-0001-5418-2143, Kharkiv National University
of Radio Electronics, Ukraine, e-mail: yevgeniy.bodyanskiy@nure.ua
Serhii O. Kostiuk, ORCID: 0000-0003-4196-2524, Kharkiv National University of Radio
Electronics, Ukraine, e-mail: serhii.kostiuk@nure.ua
ADAPTIVE HYBRID ACTIVATION FUNCTION FOR DEEP NEURAL NETWORKS /
Ye.V. Bodyanskiy, S.O. Kostiuk
Abstract. An adaptive hybrid activation function (AHAF) is proposed that
combines the features of rectifier units and squashing functions. The proposed
function can be used as a direct replacement for the ReLU, SiL and Swish
activation functions in deep neural networks and can also take the form of one
of these functions during training. The effectiveness of the function was
studied on the image classification task using the Fashion-MNIST and CIFAR-10
datasets. The results of the study show that neural networks with AHAF
activation functions achieve better classification accuracy than their base
implementations built on ReLU and SiL. A two-stage parameter tuning process for
training neural networks with AHAF is proposed. The proposed approach is
sufficiently simple to implement and provides high performance in neural network
training.
Keywords: adaptive hybrid activation function, two-stage parameter tuning
process, deep neural networks.
ADAPTIVE HYBRID ACTIVATION FUNCTION FOR DEEP NEURAL NETWORKS /
Ye.V. Bodyanskiy, S.A. Kostiuk
Abstract. An adaptive hybrid activation function (AHAF) is proposed that
combines the properties of rectifier units and squashing functions. The proposed
function can be used as a direct replacement for the ReLU, SiL and Swish
activation functions in deep neural networks and can also take the form of one
of these functions during training. The effectiveness of the function was
studied on the image classification task using the Fashion-MNIST and CIFAR-10
datasets. The results of the study show that neural networks with AHAF
activation functions achieve better classification accuracy than their base
implementations built on ReLU and SiL. A two-stage parameter tuning process for
training neural networks with AHAF is proposed. The proposed approach is
sufficiently simple to implement and provides high performance in neural network
training.
Keywords: adaptive hybrid activation function, two-stage parameter tuning
process, deep neural networks.
|
| id | journaliasakpiua-article-259203 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-07-17T10:27:52Z |
| publishDate | 2022 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/c7/2639b3262e2ac7bbf14ef9a54b2ae5c7.pdf |
| spelling | journaliasakpiua-article-259203 2022-06-21T10:27:50Z Adaptive hybrid activation function for deep neural networks Bodyanskiy, Yevgeniy Kostiuk, Serhii adaptive hybrid activation function double-stage parameter tuning process deep neural networks The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022-04-25 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/259203 10.20535/SRIT.2308-8893.2022.1.07 System research and information technologies; No. 1 (2022); 87-96 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/259203/255848 |
| spellingShingle | adaptive hybrid activation function double-stage parameter tuning process deep neural networks Bodyanskiy, Yevgeniy Kostiuk, Serhii Adaptive hybrid activation function for deep neural networks |
| title | Adaptive hybrid activation function for deep neural networks |
| title_alt | Adaptive hybrid activation function for deep neural networks |
| title_full | Adaptive hybrid activation function for deep neural networks |
| title_fullStr | Adaptive hybrid activation function for deep neural networks |
| title_full_unstemmed | Adaptive hybrid activation function for deep neural networks |
| title_short | Adaptive hybrid activation function for deep neural networks |
| title_sort | adaptive hybrid activation function for deep neural networks |
| topic | adaptive hybrid activation function double-stage parameter tuning process deep neural networks |
| topic_facet | adaptive hybrid activation function double-stage parameter tuning process deep neural networks |
| url | https://journal.iasa.kpi.ua/article/view/259203 |
| work_keys_str_mv | AT bodyanskiyyevgeniy adaptivehybridactivationfunctionfordeepneuralnetworks AT kostiukserhii adaptivehybridactivationfunctionfordeepneuralnetworks AT bodyanskiyyevgeniy adaptivnaâgibridnaâfunkciâaktivaciidlâglubokihnejronnyhsetej AT kostiukserhii adaptivnaâgibridnaâfunkciâaktivaciidlâglubokihnejronnyhsetej AT bodyanskiyyevgeniy adaptivnagíbridnafunkcíâaktivacíídlâglibokihnejronnihmerež AT kostiukserhii adaptivnagíbridnafunkcíâaktivacíídlâglibokihnejronnihmerež |