Interval estimation of distribution parameter by statistical trials of expected value
The distribution parameter interval estimators are obtained by direct numerical approximation of the expected value for infinite and finite populations under the known upper and lower bounds of the random variable domain. Like in Bayesian approach, the distribution parameters are treated as random v...
| Date: | 2019 |
|---|---|
| Main Author: | Barannik, V.O. |
| Format: | Article |
| Language: | English |
| Published: | National Science Center “Kharkiv Institute of Physics and Technology” of the National Academy of Sciences of Ukraine, 2019 |
| Series: | Problems of Atomic Science and Technology |
| Subjects: | |
| Online Access: | https://nasplib.isofts.kiev.ua/handle/123456789/195477 |
| Journal Title: | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
| Cite this: | Interval estimation of distribution parameter by statistical trials of expected value / V.O. Barannik // Problems of atomic science and technology. 2019, № 6, p. 138-143. Bibliography: 11 titles. In English. |
Institution: Digital Library of Periodicals of National Academy of Sciences of Ukraine

| id | nasplib_isofts_kiev_ua-123456789-195477 |
|---|---|
| record_format | dspace |
| institution | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
| collection | DSpace DC |
| language | English |
| topic | Experimental methods and processing of data |
| format | Article |
| author | Barannik, V.O. |
| title | Interval estimation of distribution parameter by statistical trials of expected value |
| publisher | National Science Center “Kharkiv Institute of Physics and Technology” of the National Academy of Sciences of Ukraine |
| publishDate | 2019 |
| url | https://nasplib.isofts.kiev.ua/handle/123456789/195477 |
| citation_txt | Interval estimation of distribution parameter by statistical trials of expected value / V.O. Barannik // Problems of atomic science and technology. 2019, № 6, p. 138-143. Bibliography: 11 titles. In English. |
| series | Problems of Atomic Science and Technology |
| first_indexed | 2025-11-30T09:26:00Z |
| last_indexed | 2025-11-30T09:26:00Z |
| _version_ | 1850206858354098176 |
ISSN 1562-6016. Problems of Atomic Science and Technology. 2019. № 6(124)
INTERVAL ESTIMATION OF DISTRIBUTION PARAMETER
BY STATISTICAL TRIALS OF EXPECTED VALUE
V.O. Barannik
O.M. Beketov National University of Urban Economy, Kharkiv, Ukraine
E-mail: v_barannik@ukr.net
Interval estimators of distribution parameters are obtained by direct numerical approximation of the expected value for infinite and finite populations when the upper and lower bounds of the random variable's domain are known. As in the Bayesian approach, the distribution parameters are treated as random variables, and their uncertainty is described by a distribution. The Monte Carlo method is used to obtain the corresponding confidence interval endpoints. The model imposes no restrictions on the type of distribution. In contrast to other nonparametric interval estimates of distribution parameters, the model is operable for samples of any size.
PACS: 02.50.Ng
INTRODUCTION
Data analysis lies at the foundation of the physical sciences. The sharply increased cost of modern physical experiments draws attention to the lack of reliable methods for analyzing small samples. Sampling methods [1] are widely used to study the state of large populations. In this case, the collective properties of the population, quantified through the parameters of the distribution of individual properties among its elements, are usually of primary interest. To estimate a specific distribution parameter, a sampling plan is designed and implemented that ensures the representativeness of the sample for the estimation, taking into account the known properties of the population.
In the analysis of sample data, both parametric and nonparametric methods can be used, depending on whether hypotheses about the type of distribution are involved. An example of a parametric approach is the maximum likelihood method [2], in which interval estimation of a distribution parameter is performed using the likelihood function constructed from the probability distributions of the sample elements. However, specifying the wrong distribution may prove very costly: if the assumed distribution does not hold, the actual confidence levels of the confidence intervals (or of hypothesis tests) may be far from their nominal values.
An example of a nonparametric approach is the bootstrap method, which is now widespread [3, 4]. It assumes that a representative sample adequately reflects the distribution of a property in the population and can replace the population. This allows one to obtain an interval estimate of the distribution parameter by statistical trials (the Monte Carlo method [5]), i.e., by repeatedly drawing samples with replacement from the original sample. However, despite the asymptotic advantages of the method [6], one can hardly expect reliable estimates if the original sample is small or extremely small, because the bootstrap operates under the assumption that all possible distinct values of the population have been observed [7].
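To illustrate the last point, here is a minimal percentile-bootstrap sketch (Python with NumPy assumed; the sample values, `n_boot` and `seed` are hypothetical choices, and the percentile variant is only one of several bootstrap intervals). Because every resample is built from observed values only, every bootstrap mean, and hence the interval, stays inside the range of the original sample, however small that sample is.

```python
import numpy as np


def percentile_bootstrap_ci(sample, alpha=0.05, n_boot=10_000, seed=0):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    # Each row is one bootstrap resample of the same size as the original sample.
    resamples = rng.choice(x, size=(n_boot, x.size), replace=True)
    means = resamples.mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Hypothetical sample of size n = 3 from a variable known to lie in [0, 1].
sample = [0.21, 0.35, 0.62]
lo, hi = percentile_bootstrap_ci(sample)
# Resampled means can never leave [min(sample), max(sample)], whatever the true domain is.
print(f"bootstrap 95% interval for the mean: [{lo:.3f}, {hi:.3f}]")
```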
Another approach to statistical inference may rest on the fact that any real property of a real population is a bounded variable distributed over a finite domain of the real line. In this respect, the sample can be regarded as a random partition of the domain. The partition becomes complete if the sample is supplemented with the numerical values of the upper and lower domain boundaries. If the distribution parameter can be represented as an integral or a sum for the expected value, then such a domain structure can serve as a basis for interval estimation of the parameter by approximating this representation. A variant of the model of expected value approximation is described below.
1. PROBLEM FORMULATION
Infinite population. We consider an infinite population of elements having a measurable property X. The result x of measuring this property for a randomly selected element is a random variable with an unknown integrable probability density function (pdf) $\rho(x)$ on a bounded domain $x_{\min} \le x \le x_{\max}$, where $x_{\min}$ and $x_{\max}$ are the known lower and upper endpoints of the domain, respectively. We conventionally call a population infinite if a simple random sample of any size n can be drawn from it without distorting the pdf. In this sense, a finite population from which samples are drawn with replacement can be regarded as infinite.
Let $u(x)$ be an integrable monotonic (non-decreasing or non-increasing) function, the generator of the distribution parameter, which determines the contribution of the individual property x of an element to the collective property U of the population; we call U the distribution parameter. We consider distribution parameters that can be represented as the integral

$$U = \int_{x_{\min}}^{x_{\max}} u(x)\,\rho(x)\,dx. \qquad (1)$$
It may be the m-th moment of the distribution if $u(x) = x^m$, the proportion of the population with $x \le x^*$ if $u(x) = Y(x^* - x)$, where $Y(\cdot)$ is the Heaviside function, or something else. Without additional information, since $u(x)$ is monotonic and $\rho(x)$ integrates to one, the value of the distribution parameter can only be estimated as

$$u_{\min} \le U \le u_{\max}, \qquad (2)$$

where

$$u_{\min} = \min\bigl(u(x_{\min}), u(x_{\max})\bigr), \qquad u_{\max} = \max\bigl(u(x_{\min}), u(x_{\max})\bigr).$$
If the precision of this estimate is deemed insufficient, it can be improved by supplementing the available information with measurements of the property for the elements of a simple random sample of size n. Sorted in ascending order, the new data set is

$$x_1 \le x_2 \le \dots \le x_i \le \dots \le x_n. \qquad (3)$$
These data are combined with the endpoints $x_0 = x_{\min}$ and $x_{n+1} = x_{\max}$; the corresponding series of generator values is either non-decreasing or non-increasing:

$$\{u_i = u(x_i)\}_{i=1}^{n}, \qquad u_0 = u(x_{\min}), \qquad u_{n+1} = u(x_{\max}). \qquad (4)$$
The task is to refine the estimate (2) of the distribution parameter (1) in light of the new data.
Finite population. We also consider a finite population $x_1, x_2, \dots, x_N$ of known size N, for which the distribution parameter is defined by the sum

$$S = \frac{1}{N}\sum_{j=1}^{N} s(x_j), \qquad (5)$$

where $s(x)$ is another monotonic generator function.
Without loss of generality, we assume that the elements of the population are numbered in order of increasing property x. If sampling were done with replacement, this population could be regarded as an infinite population with the pdf

$$\rho_N(x) = \frac{1}{N}\sum_{j=1}^{N} \delta(x - x_j), \qquad (6)$$

where $\delta(\cdot)$ is the Dirac delta function.
Let (3) be a simple random sample drawn from this population without replacement. The corresponding values of the generator function are

$$\{s_i = s(x_i)\}_{i=1}^{n}. \qquad (7)$$
In this case, the initial interval estimate of S is again just the range of the function $s(x)$:

$$s_{\min} \le S \le s_{\max}, \qquad (8)$$

where

$$s_{\min} = \min\bigl(s(x_{\min}), s(x_{\max})\bigr), \qquad s_{\max} = \max\bigl(s(x_{\min}), s(x_{\max})\bigr).$$
The task is the same as before: to refine the estimate (8) of the distribution parameter (5) in light of the new data.
2. PROBLEM ANALYSIS
Infinite population. We introduce the cumulative distribution function (cdf) in the usual way:

$$f(x) = \int_{x_{\min}}^{x} \rho(t)\,dt.$$
In particular, for (6) it is

$$f_N(x) = \frac{1}{N}\sum_{j=1}^{N} Y(x - x_j).$$
Integral (1) can be written as

$$U = \int_0^1 u[x(f)]\,df = \sum_{i=1}^{n+1} \int_{f_{i-1}}^{f_i} u[x(f)]\,df, \qquad (9)$$

where $x(f)$ is the inverse cdf, $\{f_i = f(x_i)\}_{i=1}^{n}$, $f_0 = 0$ and $f_{n+1} = 1$.
By the mean value theorem for integrals, equation (9) can be written as

$$U = \sum_{i=1}^{n+1} \hat{u}_i\,(f_i - f_{i-1}), \qquad (10)$$
where, if $u(x)$ is non-decreasing or non-increasing over the whole domain, $\min(u_{i-1}, u_i) \le \hat{u}_i \le \max(u_{i-1}, u_i)$ with $u_i = u(x_i)$. Then the following conditions hold:

$$U \ge U_{\min} = \sum_{i=1}^{n+1} \min(u_{i-1}, u_i)\,(f_i - f_{i-1}), \qquad (11)$$

$$U \le U_{\max} = \sum_{i=1}^{n+1} \max(u_{i-1}, u_i)\,(f_i - f_{i-1}). \qquad (12)$$
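Once the $f_i$ are fixed, the bounds (11) and (12) are plain weighted sums. The sketch below (Python with NumPy; the function name `expected_value_bounds` and the test distribution are illustrative assumptions) evaluates them with the true cdf of a known distribution, which makes it possible to check that the true U is indeed bracketed.

```python
import numpy as np


def expected_value_bounds(x, f, u, x_min, x_max):
    """Bounds (11)-(12) on U = integral of u(x) * rho(x) for a monotonic generator u.

    x : sample values; f : cdf values f(x_i) at those points (assumed known here).
    """
    order = np.argsort(x)
    xs = np.concatenate(([x_min], np.asarray(x, dtype=float)[order], [x_max]))  # x_0..x_{n+1}
    fs = np.concatenate(([0.0], np.asarray(f, dtype=float)[order], [1.0]))      # f_0..f_{n+1}
    us = u(xs)                                   # u_0 .. u_{n+1}, as in (4)
    weights = np.diff(fs)                        # f_i - f_{i-1}, i = 1..n+1
    lower = np.minimum(us[:-1], us[1:]) @ weights
    upper = np.maximum(us[:-1], us[1:]) @ weights
    return float(lower), float(upper)


# Check on X ~ U(0, 1), whose cdf is f(x) = x, with u(x) = x**3 (true U = 1/4).
rng = np.random.default_rng(1)
x = rng.random(5)
U_min, U_max = expected_value_bounds(x, f=x, u=lambda t: t**3, x_min=0.0, x_max=1.0)
print(U_min, 0.25, U_max)
```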
Although the values $\{f_i\}_{i=1}^{n}$ are unknown, they have the well-known posterior pdf

$$\rho(f_1, f_2, \dots, f_n) = \begin{cases} n!, & 0 \le f_1 \le f_2 \le \dots \le f_n \le 1, \\ 0, & \text{otherwise}. \end{cases} \qquad (13)$$
This means that any set $\{f_i\}_{i=1}^{n}$ of independent random numbers uniformly distributed on [0, 1] and sorted in ascending order is equally probable and can be considered a likely true set. Now, taking into account (13) and the representations (11), (12) of the endpoints, sample (3) can be regarded as deterministic and the parameter U as random.
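Drawing from (13) only requires sorting uniform random numbers. The following sketch (NumPy assumed; `draw_uniform_order_statistics` is a hypothetical helper name) does this and checks the standard fact that the i-th of n sorted uniforms follows $\mathrm{Beta}(i, n - i + 1)$ with mean $i/(n+1)$.

```python
import numpy as np


def draw_uniform_order_statistics(n, K, seed=0):
    """K draws of (f_1, ..., f_n) from the density (13): sorted i.i.d. U(0, 1) numbers."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.random((K, n)), axis=1)


f = draw_uniform_order_statistics(n=4, K=200_000)
# The i-th order statistic follows Beta(i, n - i + 1), whose mean is i / (n + 1).
print(f.mean(axis=0))   # should be close to [0.2, 0.4, 0.6, 0.8]
```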
To obtain an interval estimate of the population parameter (1) by statistical trials [8], one can generate K sets of the above-mentioned numbers $\{f_{i,k}\}_{i=1}^{n}$ $(k = 1, 2, \dots, K)$, with $f_{0,k} = 0$ and $f_{n+1,k} = 1$, and calculate the corresponding posterior statistics:

$$\tilde{U}_{\min,k} = \sum_{i=1}^{n+1} \min(u_{i-1}, u_i)\,(f_{i,k} - f_{i-1,k}), \qquad (14)$$

$$\tilde{U}_{\max,k} = \sum_{i=1}^{n+1} \max(u_{i-1}, u_i)\,(f_{i,k} - f_{i-1,k}). \qquad (15)$$
Arranging the results of the statistical trials (14) and (15) in ascending order, $\{U_{\min,m}\}_{m=1}^{K}$ and $\{U_{\max,m}\}_{m=1}^{K}$, one obtains the resulting interval estimate in the form of the confidence interval

$$U_{\min,\frac{a}{2}K} \le U \le U_{\max,\left(1-\frac{a}{2}\right)K}, \qquad (16)$$

where a is the chosen significance level and the second subscript denotes the position in the sorted list.
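Steps (13) to (16) can be sketched as follows (Python with NumPy). The function name, the defaults `K=10_000` and `seed=0`, and the rounding of the positions $aK/2$ and $(1-a/2)K$ to integers are assumptions of this sketch rather than prescriptions of the method. The usage example takes an evenly spread sample $x_i = i/10$ so that the result does not depend on a random draw.

```python
import numpy as np


def mc_interval(x, u, x_min, x_max, alpha=0.05, K=10_000, seed=0):
    """Monte Carlo confidence interval (14)-(16) for U = E[u(X)] with monotonic u."""
    rng = np.random.default_rng(seed)
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    us = u(np.concatenate(([x_min], x, [x_max])))       # u_0, ..., u_{n+1}
    lower = np.minimum(us[:-1], us[1:])                  # min(u_{i-1}, u_i), i = 1..n+1
    upper = np.maximum(us[:-1], us[1:])                  # max(u_{i-1}, u_i)
    # K likely-true sets {f_{i,k}} drawn from (13), padded with f_0 = 0 and f_{n+1} = 1.
    f = np.sort(rng.random((K, n)), axis=1)
    f = np.hstack((np.zeros((K, 1)), f, np.ones((K, 1))))
    w = np.diff(f, axis=1)                               # f_{i,k} - f_{i-1,k}, shape (K, n+1)
    U_min = np.sort(w @ lower)                           # statistics (14), ascending
    U_max = np.sort(w @ upper)                           # statistics (15), ascending
    # 1-based positions aK/2 and (1 - a/2)K of (16), rounded to the nearest integer.
    lo = U_min[max(round(alpha / 2 * K), 1) - 1]
    hi = U_max[round((1 - alpha / 2) * K) - 1]
    return float(lo), float(hi)


# Evenly spread sample x_i = i/10 (i = 1..9) from a variable on [0, 1];
# for the third moment of U(0, 1) the true value is 1/4.
x = np.arange(1, 10) / 10
lo, hi = mc_interval(x, u=lambda t: t**3, x_min=0.0, x_max=1.0)
print(f"{lo:.3f} <= E[X^3] <= {hi:.3f}")
```

Sorting $\tilde{U}_{\min,k}$ and $\tilde{U}_{\max,k}$ separately, as (16) prescribes, means the two endpoints come from different trials; each endpoint is a one-sided quantile of its own statistic.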
Finite population. It is convenient to rewrite the sum (5) as

$$S = \frac{1}{N}\sum_{j=1}^{N} s(x_j) = \frac{N+1}{N}\,V - \frac{s(x_0) + s(x_{N+1})}{2N}, \qquad (17)$$

$$V = \frac{1}{N+1}\sum_{j=1}^{N+1} \frac{s(x_j) + s(x_{j-1})}{2}, \qquad (18)$$

where $x_0 = x_{\min}$ and $x_{N+1} = x_{\max}$.
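Identity (17) holds because each interior value $s(x_j)$, $1 \le j \le N$, enters the sum in (18) twice with weight 1/2, while $s(x_0)$ and $s(x_{N+1})$ enter once each. A quick numerical check (NumPy assumed; the random population and the generator $s(x) = \sqrt{x}$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 12
x_min, x_max = 0.0, 1.0
population = np.sort(rng.random(N))          # arbitrary population x_1 <= ... <= x_N in [0, 1]
s = np.sqrt                                   # an example monotonic generator s(x)

S = s(population).mean()                                        # definition (5)
ext = s(np.concatenate(([x_min], population, [x_max])))         # s(x_0), ..., s(x_{N+1})
V = ((ext[1:] + ext[:-1]) / 2).sum() / (N + 1)                  # definition (18)
S_from_V = (N + 1) / N * V - (ext[0] + ext[-1]) / (2 * N)       # right-hand side of (17)
print(S, S_from_V)
```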
Let $r(i)$ be the serial number in the population of the element that occupies position i in the sample. Let also the serial numbers of the elements $x_0 = x_{\min}$ and $x_{N+1} = x_{\max}$ attached to the population be $r(0) = 0$ and $r(n+1) = N + 1$. Then expression (18) can be represented as a sum over the segments of the real line whose ends are the elements of the sample:

$$V = \frac{1}{N+1}\sum_{i=1}^{n+1}\ \sum_{j=r(i-1)+1}^{r(i)} \frac{s(x_j) + s(x_{j-1})}{2}. \qquad (19)$$
It is now possible to express V in terms of the average values $\hat{s}_i$ of the generator function within each segment i:

$$V = \frac{1}{N+1}\sum_{i=1}^{n+1} \hat{s}_i\,[r(i) - r(i-1)], \qquad (20)$$

$$\hat{s}_i = \frac{1}{r(i) - r(i-1)}\sum_{j=r(i-1)+1}^{r(i)} \frac{s(x_j) + s(x_{j-1})}{2}. \qquad (21)$$
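Taking into account the monotonicity of the generator $s(x)$, the value (20) can be bounded as follows (here $x_i$ denotes the i-th element of the sorted sample (3), which coincides with the population element $x_{r(i)}$, so that segment i runs from $x_{i-1}$ to $x_i$):

$$V \ge V_{\min} = \frac{1}{N+1}\sum_{i=1}^{n+1} \min\bigl[s(x_{i-1}), s(x_i)\bigr]\,[r(i) - r(i-1)], \qquad (22)$$

$$V \le V_{\max} = \frac{1}{N+1}\sum_{i=1}^{n+1} \max\bigl[s(x_{i-1}), s(x_i)\bigr]\,[r(i) - r(i-1)]. \qquad (23)$$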
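Given the ranks $r(i)$, the bounds (22) and (23) are again weighted sums, now with integer gap weights. The sketch below (NumPy assumed; `v_bounds` and the test population are hypothetical) plugs in the true ranks of a sample drawn without replacement and checks that V from (18) lies between them.

```python
import numpy as np


def v_bounds(ranks, s_sample, s_at_min, s_at_max, N):
    """Bounds (22)-(23) on V, given the population ranks r(1) < ... < r(n) of the sorted sample.

    s_sample holds s(x_1), ..., s(x_n); r(0) = 0 and r(n+1) = N + 1 are appended here.
    """
    r = np.concatenate(([0], np.asarray(ranks), [N + 1]))
    sv = np.concatenate(([s_at_min], np.asarray(s_sample, dtype=float), [s_at_max]))
    gaps = np.diff(r)                                       # r(i) - r(i-1), i = 1..n+1
    v_min = np.minimum(sv[:-1], sv[1:]) @ gaps / (N + 1)
    v_max = np.maximum(sv[:-1], sv[1:]) @ gaps / (N + 1)
    return float(v_min), float(v_max)


# Check with the true ranks of a sample drawn without replacement from a hypothetical population.
rng = np.random.default_rng(7)
N, n = 50, 6
population = np.sort(rng.random(N))                         # x_1 <= ... <= x_N in [0, 1]
s = np.square                                               # monotonic on [0, 1]
ranks = np.sort(rng.choice(N, size=n, replace=False)) + 1   # 1-based serial numbers r(i)
ext = s(np.concatenate(([0.0], population, [1.0])))
V_true = ((ext[1:] + ext[:-1]) / 2).sum() / (N + 1)         # definition (18)
v_min, v_max = v_bounds(ranks, s(population[ranks - 1]), s(0.0), s(1.0), N)
print(v_min, V_true, v_max)
```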
In these expressions, the random variables are no longer the elements of the sample but the places that the sample elements occupy in the ordered population.
Obviously, the Monte Carlo procedure can also be applied to find the lower and upper bounds of the confidence interval of a given significance level a for the population parameter S. For this purpose, K random samples of n positive integers should be drawn without replacement from the set $\{r\}_{r=1}^{N}$, sorted in ascending order to represent likely true sets $\{r(i,k)\}_{i=1}^{n}$ $(k = 1, 2, \dots, K)$, and substituted into equations (22), (23) to calculate likely true values of the corresponding endpoints.
After sorting the resulting sets $\{V_{\min,k}\}_{k=1}^{K}$ and $\{V_{\max,k}\}_{k=1}^{K}$ in ascending order, the confidence interval of the desired significance level a for the distribution parameter S can be written as

$$\frac{N+1}{N}\,V_{\min,\frac{a}{2}K} - \frac{s(x_{\min}) + s(x_{\max})}{2N} \le S \le \frac{N+1}{N}\,V_{\max,\left(1-\frac{a}{2}\right)K} - \frac{s(x_{\min}) + s(x_{\max})}{2N}. \qquad (24)$$
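The finite-population procedure (22) to (24) can be sketched in the same way (NumPy assumed). Random rank sets are obtained here by taking the first n entries of a random permutation of 1..N, which yields a uniformly random n-subset; this device, the function name, the defaults and the evenly spaced test population are assumptions of the sketch.

```python
import numpy as np


def finite_mc_interval(x, s, x_min, x_max, N, alpha=0.05, K=10_000, seed=0):
    """Monte Carlo interval (22)-(24) for S = mean of s over a population of known size N."""
    rng = np.random.default_rng(seed)
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    sv = s(np.concatenate(([x_min], x, [x_max])))           # s(x_0), s(x_1), ..., s(x_{n+1})
    w_lo = np.minimum(sv[:-1], sv[1:])
    w_hi = np.maximum(sv[:-1], sv[1:])
    # K likely-true rank sets: n distinct integers from 1..N (a random subset), sorted.
    ranks = np.sort(rng.random((K, N)).argsort(axis=1)[:, :n] + 1, axis=1)
    r = np.hstack((np.zeros((K, 1)), ranks, np.full((K, 1), N + 1)))
    gaps = np.diff(r, axis=1)                                # r(i) - r(i-1)
    V_min = np.sort(gaps @ w_lo) / (N + 1)                   # (22), ascending
    V_max = np.sort(gaps @ w_hi) / (N + 1)                   # (23), ascending
    shift = (sv[0] + sv[-1]) / (2 * N)                       # (s(x_min) + s(x_max)) / 2N
    lo = (N + 1) / N * V_min[max(round(alpha / 2 * K), 1) - 1] - shift
    hi = (N + 1) / N * V_max[round((1 - alpha / 2) * K) - 1] - shift
    return float(lo), float(hi)


# Hypothetical population of N = 200 evenly spaced values in [0, 1] (true S = 0.5)
# and an evenly spread sample of n = 10 of its elements (serial numbers 10, 30, ..., 190).
N = 200
population = (np.arange(1, N + 1) - 0.5) / N
sample = population[9::20]
lo, hi = finite_mc_interval(sample, s=lambda t: t, x_min=0.0, x_max=1.0, N=N)
print(f"{lo:.3f} <= S <= {hi:.3f}")
```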
3. SIMULATIONS
Here we consider some special cases of the above model to demonstrate its useful properties.
Infinite population – continuous distribution. Let $\rho(x)$ be the pdf of a random variable uniformly distributed on the closed interval [0, 1]:

$$\rho(x) = \begin{cases} 1, & 0 \le x \le 1, \\ 0, & \text{otherwise}; \end{cases}$$

then the m-th moment of this distribution is $M = (m+1)^{-1}$.
To test the described approach to statistical inference, we generated 100 samples of each of the sizes n = 3, 7, 15, 30, 50 from this distribution and performed K = 1000 statistical trials on every sample to obtain interval estimates of the moments of order m = 1, 3, 7, 15, 30, the generator function being $u(x) = x^m$. For every sample we calculated the endpoints of the confidence interval (16) with confidence level $1 - a = 0.95$; for every set of samples of a given size we recorded the minimal ($D_{\min}$) and maximal ($D_{\max}$) widths of the confidence interval, together with the number of faults, i.e., cases in which the true value of the moment fell outside the corresponding interval. The results are presented in the table below.
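Simulation output for the uniform distribution

| n | Parameter | m = 1 | m = 3 | m = 7 | m = 15 | m = 30 |
|---|---|---|---|---|---|---|
| 3 | Dmin | 0.677 | 0.677 | 0.656 | 0.673 | 0.669 |
| 3 | Dmax | 0.861 | 0.861 | 0.861 | 0.881 | 0.88 |
| 3 | Faults | 0 | 0 | 2 | 0 | 0 |
| 7 | Dmin | 0.423 | 0.419 | 0.378 | 0.388 | 0.37 |
| 7 | Dmax | 0.65 | 0.631 | 0.582 | 0.652 | 0.595 |
| 7 | Faults | 0 | 0 | 1 | 1 | 2 |
| 15 | Dmin | 0.27 | 0.233 | 0.224 | 0.207 | 0.2 |
| 15 | Dmax | 0.415 | 0.436 | 0.399 | 0.336 | 0.354 |
| 15 | Faults | 2 | 0 | 0 | 1 | 1 |
| 30 | Dmin | 0.204 | 0.154 | 0.127 | 0.12 | 0.104 |
| 30 | Dmax | 0.276 | 0.293 | 0.264 | 0.266 | 0.206 |
| 30 | Faults | 3 | 2 | 2 | 6 | 1 |
| 50 | Dmin | 0.153 | 0.128 | 0.099 | 0.082 | 0.071 |
| 50 | Dmax | 0.207 | 0.21 | 0.191 | 0.172 | 0.151 |
| 50 | Faults | 3 | 2 | 2 | 2 | 1 |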
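A scaled-down version of this experiment can be sketched as below (Python with NumPy; the seed, the reduced grid of n and m, and the helper name `moment_interval` are assumptions, and the printed numbers are not meant to reproduce the table). It uses the fact that for the increasing generator $u(x) = x^m$ the minimum in (14) is $u_{i-1}$ and the maximum in (15) is $u_i$.

```python
import numpy as np


def moment_interval(x, m, rng, alpha=0.05, K=1000):
    """Interval (14)-(16) for the m-th moment of a variable on [0, 1], u(x) = x**m."""
    n = x.size
    us = np.concatenate(([0.0], np.sort(x), [1.0])) ** m      # u_0, ..., u_{n+1}
    f = np.sort(rng.random((K, n)), axis=1)
    w = np.diff(np.hstack((np.zeros((K, 1)), f, np.ones((K, 1)))), axis=1)
    # u is increasing, so min(u_{i-1}, u_i) = u_{i-1} and max(u_{i-1}, u_i) = u_i.
    lo = np.sort(w @ us[:-1])[round(alpha / 2 * K) - 1]
    hi = np.sort(w @ us[1:])[round((1 - alpha / 2) * K) - 1]
    return lo, hi


rng = np.random.default_rng(2019)
results = {}
for n in (3, 7, 15):
    for m in (1, 3, 7):
        true_moment = 1.0 / (m + 1)
        widths, faults = [], 0
        for _ in range(100):                                   # 100 samples per (n, m)
            lo, hi = moment_interval(rng.random(n), m, rng)
            widths.append(hi - lo)
            faults += int(not (lo <= true_moment <= hi))
        results[(n, m)] = (min(widths), max(widths), faults)
        print(f"n={n:2d} m={m}: Dmin={min(widths):.3f} Dmax={max(widths):.3f} faults={faults}")
```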
The table shows that the described numerical algorithm performed adequately regardless of the sample size and moment order: the number of faults per 100 samples ranged from 0 to 6, i.e., at or below the level expected for the chosen significance level (5 per 100) in all cases but one (n = 30, m = 15). The results also demonstrate the expected decrease of the confidence interval width (increase of the estimation precision) with increasing sample size and moment order.
At the same time, the graphs (Figs. 1-5) of the empirical distribution functions of the confidence interval boundaries show that the narrowing of the confidence interval at a high moment order (m = 30) is achieved mainly by moving the right boundary. As the sample size grows, the distribution function of the left boundary remains squeezed between the left endpoint of the domain and the exact value of the moment.
The cdf of the generator $u_m(x) = x^m$ for the m-th moment of the random variable under consideration is $\eta_m(u) = u^{1/m}$, since $P(X^m \le u) = P(X \le u^{1/m}) = u^{1/m}$ for $0 \le u \le 1$.
Fig. 1. Distribution functions of the left (solid line) and right (dotted line) confidence interval boundaries for the sample size n = 3. The dashed line marks the true moment value M = 1/31.
Fig. 2. The same for the sample size n = 7.
Fig. 3. The same for the sample size n = 15.
Fig. 4. The same for the sample size n = 30.
Fig. 5. The same for the sample size n = 50.
Fig. 6, which shows the skewness and kurtosis of the moment distribution versus the moment order m, explains this behavior: the distribution has a long right tail; being right-skewed, it is concentrated near the left boundary of the interval [0, 1].
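The shape behind Fig. 6 can be computed in closed form from the raw moments $E[Y^k] = 1/(mk+1)$ of $Y = X^m$, $X \sim U(0, 1)$. The sketch below (Python standard library only; the helper name is hypothetical) uses exact fractions. Whether Fig. 6 plots the kurtosis or the excess kurtosis (kurtosis minus 3) is not stated, so the sketch returns the plain kurtosis.

```python
from fractions import Fraction


def power_moment_shape(m):
    """Skewness and kurtosis of Y = X**m with X ~ U(0, 1), using E[Y**k] = 1 / (m*k + 1)."""
    e = [Fraction(1, m * k + 1) for k in range(5)]          # e[k] = E[Y**k], e[0] = 1
    mu = e[1]
    var = e[2] - mu**2
    mu3 = e[3] - 3 * mu * e[2] + 2 * mu**3                   # third central moment
    mu4 = e[4] - 4 * mu * e[3] + 6 * mu**2 * e[2] - 3 * mu**4  # fourth central moment
    skewness = float(mu3) / float(var) ** 1.5
    kurtosis = float(mu4 / var**2)                          # plain (not excess) kurtosis
    return skewness, kurtosis


for m in (1, 3, 7, 15, 30):
    sk, ku = power_moment_shape(m)
    print(f"m={m:2d}  skewness={sk:.3f}  kurtosis={ku:.3f}")
```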
Infinite population – discrete distribution. Here we consider a dichotomous population consisting of elements that take one of two possible values, x = 1 or x = 0 (“success” or “failure”), and p is the proportion of population elements with x = 1.
Fig. 6. Skewness (Sn) and kurtosis (Ks) vs order (m) of
the moment of variable uniformly distributed on [0, 1]
If the random value is the value x of an element randomly drawn from the population (a Bernoulli trial), then the corresponding pdf is

$$\rho(x) = (1 - p)\,\delta(x) + p\,\delta(x - 1),$$

and the first moment of this distribution, with generator $u(x) = x$, is

$$\int_{-\infty}^{\infty} x\,\rho(x)\,dx = p.$$
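Let $x_1, x_2, \dots, x_n$ be the values of the elements of an ordered simple random sample from this population such that $n - m$ elements equal 0 and m elements equal 1: $x_i = 0$ for $1 \le i \le n - m$ and $x_i = 1$ for $n - m + 1 \le i \le n$. With $u_0 = u(x_{\min}) = 0$ and $u_{n+1} = u(x_{\max}) = 1$, only segments with both ends equal to 1 contribute to (11), and only segments whose right end equals 1 contribute to (12):

$$U_{\min} = \sum_{i=1}^{n+1} \min(u_{i-1}, u_i)\,(f_i - f_{i-1}) = \sum_{i=n-m+2}^{n+1} (f_i - f_{i-1}) = 1 - f_{n-m+1},$$

$$U_{\max} = \sum_{i=1}^{n+1} \max(u_{i-1}, u_i)\,(f_i - f_{i-1}) = \sum_{i=n-m+1}^{n+1} (f_i - f_{i-1}) = 1 - f_{n-m}.$$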
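Taking into account that the i-th order statistic $f_i$ is a beta-distributed random variable with pdf

$$\rho(f_i) = \frac{n!}{(i-1)!\,(n-i)!}\,f_i^{\,i-1}(1 - f_i)^{n-i} = \mathrm{Beta}(i,\ n - i + 1),$$

and that $1 - f \sim \mathrm{Beta}(\beta, \alpha)$ whenever $f \sim \mathrm{Beta}(\alpha, \beta)$, the pdfs of the confidence interval endpoints are

$$\rho(U_{\min}) = \mathrm{Beta}(m,\ n - m + 1), \qquad \rho(U_{\max}) = \mathrm{Beta}(m + 1,\ n - m).$$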
Then the $1 - a$ confidence interval for the probability of success is exactly the Clopper-Pearson interval [9] (the “exact” confidence interval), an early and very common method for calculating binomial confidence intervals:

$$B\!\left(\tfrac{a}{2};\ m,\ n - m + 1\right) \le p \le B\!\left(1 - \tfrac{a}{2};\ m + 1,\ n - m\right),$$

where $B(r; s, z)$ is the r-th quantile of the beta distribution with shape parameters s and z.
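A minimal sketch of this result, assuming SciPy's `scipy.stats.beta.ppf` for the beta quantiles (the helper name and the Monte Carlo settings are illustrative). The degenerate cases m = 0 and m = n are handled as $U_{\min} = 0$ and $U_{\max} = 1$. The Monte Carlo part draws $f_{n-m+1}$ and $f_{n-m}$ from (13) and should reproduce the beta quantiles; the example values n = 30, m = 3 are those used later in the text.

```python
import numpy as np
from scipy.stats import beta


def clopper_pearson(m, n, alpha=0.05):
    """B(a/2; m, n-m+1) <= p <= B(1-a/2; m+1, n-m): quantiles of the laws of U_min and U_max."""
    lo = beta.ppf(alpha / 2, m, n - m + 1) if m > 0 else 0.0      # U_min = 0 when m = 0
    hi = beta.ppf(1 - alpha / 2, m + 1, n - m) if m < n else 1.0  # U_max = 1 when m = n
    return float(lo), float(hi)


n, m = 30, 3
cp = clopper_pearson(m, n)

# Monte Carlo cross-check: U_min = 1 - f_{n-m+1}, U_max = 1 - f_{n-m} with f drawn from (13).
rng = np.random.default_rng(0)
f = np.sort(rng.random((100_000, n)), axis=1)   # column j holds f_{j+1}
U_min = 1 - f[:, n - m]
U_max = 1 - f[:, n - m - 1]
mc = (float(np.quantile(U_min, 0.025)), float(np.quantile(U_max, 0.975)))
print("Clopper-Pearson:", cp, " Monte Carlo:", mc)
```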
Finite population. Here we consider a dichotomous population of N elements, where every element x also takes one of two possible values, 1 or 0. Then the sum (5) with generator $s(x_j) = x_j$ becomes the proportion

$$S = \frac{1}{N}\sum_{j=1}^{N} s(x_j) = \frac{F}{N} = p,$$

where F is the number of elements equal to 1 and p is the probability of drawing an element equal to 1 in a Bernoulli trial.
Let $x_i = 0$ for $1 \le i \le n - m$ and $x_i = 1$ for $n - m + 1 \le i \le n$ be an ordered simple random sample of size n drawn without replacement, in which m elements equal 1. Then equations (22), (23) take the form

$$V \ge V_{\min} = \frac{1}{N+1}\bigl[N + 1 - r(n - m + 1)\bigr], \qquad V \le V_{\max} = \frac{1}{N+1}\bigl[N + 1 - r(n - m)\bigr].$$
The total number of ordered simple random samples of size n that can be drawn without replacement from a population of N elements is $\binom{N}{n}$. The number of such samples with a fixed value of $r(i)$, $i \le r(i) \le N - n + i$, is $\binom{r(i)-1}{i-1}\binom{N - r(i)}{n - i}$, since $i - 1$ of the other sample elements must be chosen among the $r(i) - 1$ smaller population elements and $n - i$ among the $N - r(i)$ larger ones. Hence the probability distribution of the discrete random variable $r(i)$ is

$$P[r(i)] = \binom{N}{n}^{-1}\binom{r(i)-1}{i-1}\binom{N - r(i)}{n - i}. \qquad (25)$$
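Equations (26), (27) make it possible to calculate the two endpoints, $r_{\max,a}$ and $r_{\min,a}$, of the confidence interval (28) for the desired significance level a, although the discrete nature of distribution (25) may preclude any interval with exact coverage probability for all population proportions:

$$P\bigl[r(n-m+1) \le r_{\max,a}\bigr] = \binom{N}{n}^{-1}\sum_{r=n-m+1}^{r_{\max,a}} \binom{r-1}{n-m}\binom{N-r}{m-1} = 1 - \frac{a}{2}, \qquad (26)$$

$$P\bigl[r(n-m) \le r_{\min,a}\bigr] = \binom{N}{n}^{-1}\sum_{r=n-m}^{r_{\min,a}} \binom{r-1}{n-m-1}\binom{N-r}{m} = \frac{a}{2}, \qquad (27)$$

$$1 + \frac{1}{2N} - \frac{r_{\max,a}}{N} \le p \le 1 + \frac{1}{2N} - \frac{r_{\min,a}}{N}. \qquad (28)$$

Interval (28) follows from (17) with $s(x_{\min}) = 0$ and $s(x_{\max}) = 1$, which gives $p = \frac{N+1}{N}V - \frac{1}{2N}$.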
As an example, consider the case N = 100, n = 30, m = 3 and a = 0.05. Then $r_{\max,a} = 97$, $r_{\min,a} = 76$, and $0.035 \le p \le 0.245$. For comparison, the corresponding Clopper-Pearson confidence interval has the endpoints $0.021 \le p \le 0.266$, so the “exact” confidence interval is wider than interval (28), obtained from (26), (27) for a population of known size.
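The endpoints in this example can be checked with exact integer arithmetic (Python standard library only). Because (25) is discrete, (26) and (27) generally cannot be met with equality; the sketch assumes the rank whose cumulative probability is closest to the target is chosen. This tie-break rule and the helper names are assumptions of the sketch, and they reproduce the values quoted above.

```python
from math import comb


def rank_cdf(i, r_upper, N, n):
    """P[r(i) <= r_upper] from (25): r(i) is the i-th smallest of n ranks drawn from 1..N."""
    hits = sum(comb(r - 1, i - 1) * comb(N - r, n - i) for r in range(i, r_upper + 1))
    return hits / comb(N, n)


def nearest_rank(i, target, N, n):
    """Rank whose cumulative probability is closest to `target` (assumed tie-break rule)."""
    return min(range(i, N - n + i + 1), key=lambda r: abs(rank_cdf(i, r, N, n) - target))


def finite_proportion_interval(N, n, m, alpha=0.05):
    """Endpoints (26)-(28) for a dichotomous population of size N; needs 0 < m < n."""
    if not 0 < m < n:
        raise ValueError("this sketch requires 0 < m < n")
    r_max = nearest_rank(n - m + 1, 1 - alpha / 2, N, n)   # equation (26)
    r_min = nearest_rank(n - m, alpha / 2, N, n)           # equation (27)
    base = 1 + 1 / (2 * N)
    return r_max, r_min, base - r_max / N, base - r_min / N   # interval (28)


result = finite_proportion_interval(N=100, n=30, m=3)
print(result)
```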
Moreover, the Monte Carlo algorithm applied directly to (22), (23) provides more information about the distributions of the random values $\{p_{\min,k}\}_{k=1}^{K}$ and $\{p_{\max,k}\}_{k=1}^{K}$,

$$p_{\min,k} = \frac{N+1}{N}\left(V_{\min,k} - \frac{1}{2(N+1)}\right), \qquad p_{\max,k} = \frac{N+1}{N}\left(V_{\max,k} - \frac{1}{2(N+1)}\right),$$

which define the endpoints of the confidence interval

$$p_{\min,\frac{a}{2}K} \le p \le p_{\max,\left(1-\frac{a}{2}\right)K}.$$
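For the dichotomous case only the ranks $r(n-m+1)$ and $r(n-m)$ enter $V_{\min}$ and $V_{\max}$, so the Monte Carlo procedure reduces to a few lines (NumPy assumed). The trial count `K = 50_000`, the seed, and the way tail fractions are counted (trials strictly beyond each endpoint, ties counted as inside) are assumptions of this sketch; it shows the same ties at the endpoints that make the attained significance level differ from the nominal one.

```python
import numpy as np

N, n, m, alpha, K = 100, 30, 3, 0.05, 50_000
rng = np.random.default_rng(0)

# K likely-true rank sets: n distinct serial numbers out of 1..N, sorted ascending.
ranks = np.sort(rng.random((K, N)).argsort(axis=1)[:, :n] + 1, axis=1)
# Dichotomous case of (22), (23): only r(n-m+1) and r(n-m) matter (columns n-m and n-m-1).
V_min = (N + 1 - ranks[:, n - m]) / (N + 1)
V_max = (N + 1 - ranks[:, n - m - 1]) / (N + 1)
p_min = np.sort((N + 1) / N * (V_min - 1 / (2 * (N + 1))))
p_max = np.sort((N + 1) / N * (V_max - 1 / (2 * (N + 1))))

lo = p_min[round(alpha / 2 * K) - 1]          # position aK/2
hi = p_max[round((1 - alpha / 2) * K) - 1]    # position (1 - a/2)K
# Fractions of trials strictly beyond each endpoint; ties at the endpoint count as inside.
tail_lo = float(np.mean(p_min < lo))
tail_hi = float(np.mean(p_max > hi))
print(f"{lo:.3f} <= p <= {hi:.3f}; tails {tail_lo:.4f}, {tail_hi:.4f}")
print(f"range of all trials: {p_min[0]:.3f} .. {p_max[-1]:.3f}")
```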
Indeed, with $K = 10^6$ statistical trials the corresponding endpoints were found to be $p_{\min,25000} = 0.035$ and $p_{\max,975000} = 0.245$. At the same time, it turned out that a whole range of quantiles of the empirical distribution function had the same value: $p_{\min,k} = 0.035$ for $24446 \le k \le 77448$. One can then conclude that the significance level for the lower endpoint is less than a = 0.05 and can be estimated as $a = 2 \cdot 24446/K = 0.0489$. The same is true for the right boundary of the confidence interval, because $p_{\max,k} = 0.245$ for $973365 \le k \le 980677$; the significance level for this quantile must therefore be reassessed as $a = 2\,(1 - 980677/K) = 0.0386$. It is also possible to test the nearest quantile $p_{\max,973364} = 0.235$ for the right endpoint. The corresponding significance level for the interval $0.035 \le p \le 0.235$ can be estimated as $a = 1 + 24446/K - 973364/K = 0.051$. This interval appears to satisfy the 0.05 significance level best. It is also worth pointing out that all $10^6$ statistical trials gave proportions within the absolute limits $0.025 \le p \le 0.525$.
These results agree with the observation [10] that, for interval estimation of a proportion, the coverage probability tends to be too large for “exact” confidence intervals based on inverting the binomial test.
CONCLUSIONS
It is shown that interval estimation of a distribution parameter can be carried out by direct approximation of the expected-value integral or sum if the upper bound $x_{\max}$ and the lower bound $x_{\min}$ of the random variable's domain are known. The model assumes that the drawn sample divides the domain of the random variable into fixed segments: $[x_{i-1}, x_i]$ if the population is infinite, or $[x_{r(i-1)}, x_{r(i)}]$ if it is finite. At the same time, the statistical weights of the segments, $p_i = f_i - f_{i-1}$ or $p_i = [r(i) - r(i-1)]\,(N+1)^{-1}$, and therefore the distribution parameter, are treated as random variables, which resembles the Bayesian approach [11]. However, this is where the similarity ends. The expected value approximation model needs no hypotheses about the prior distribution of the parameter, since the probability distributions of the statistical weights are known if the sample satisfies the i.i.d. conditions. The model is easily implemented numerically by the Monte Carlo method and imposes no restrictions on the sample size: in contrast to the bootstrap method, it is formally operable for samples of any size $n \ge 1$.
Clearly, the practical applicability of the model depends directly on the availability of information about the boundaries of the domain of the variable whose distribution parameter is estimated. The approach is practically attractive because some measurable properties of physical, biological and social populations have known bounds. There may be various ways of obtaining this information. First, there are variables with naturally well-defined boundaries. A classic example is the dichotomous population with a variable taking two possible values. Another example is the correlation coefficient K: with no prior expectations, its natural bounds are $-1 \le K \le 1$; if there is confidence in a positive relationship, $0 \le K \le 1$; and $-1 \le K \le 0$ in the opposite case.
Generally speaking, if the problem concerns the extreme values of the observed variable in a large system, one can expect that the “addresses” of such extreme elements in the system are known, which makes it possible to target these extreme values. Note that the model does not require exact knowledge of the upper and lower bounds of the variable domain; if necessary, these bounds can be assigned with a margin. For example, if the upper bound is known to exceed the lower bound by a factor of hundreds or more, the lower bound can be set equal to zero. This will, of course, lead to a slight broadening of the confidence interval. Furthermore, special provisions could be envisaged within the sampling plan to find appropriate population elements and to estimate the bounds of the measured variable. These are exactly the cases in which the described method of expected value approximation can be applied.
REFERENCES
1. A. Bevan. Statistical Data Analysis for the Physical Sciences. Queen Mary University of London, 2013, 229 p.
2. R.A. Fisher. On the mathematical foundations of theoretical statistics // Philosophical Transactions of the Royal Society A. 1922, v. 222, № 594-604, p. 309-368.
3. B. Efron. Bootstrap methods: another look at the jackknife // The Annals of Statistics. 1979, v. 7, № 1, p. 1-26.
4. B. Efron. The bootstrap and modern statistics // Journal of the American Statistical Association. 2000, v. 95, № 452, p. 1293-1296.
5. N. Metropolis, S. Ulam. The Monte Carlo method // Journal of the American Statistical Association. 1949, v. 44, № 247, p. 335-341.
6. T.J. DiCiccio, B. Efron. Bootstrap confidence intervals (with discussion) // Statistical Science. 1996, v. 11, № 3, p. 189-228.
7. D.B. Rubin. The Bayesian bootstrap // The Annals of Statistics. 1981, v. 9, № 1, p. 130-134.
8. D.P. Kroese, T. Taimre, Z.I. Botev. Handbook of Monte Carlo Methods. New York: John Wiley and Sons, 2011, 772 p.
9. C.J. Clopper, E.S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial // Biometrika. 1934, v. 26, № 4, p. 404-413.
10. A. Agresti, B.A. Coull. Approximate is better than “exact” for interval estimation of binomial proportions // The American Statistician. 1998, v. 52, № 2, p. 119-126.
11. J.M. Bernardo, A.F.M. Smith. Bayesian Theory. New York: John Wiley and Sons, 1994, 586 p.
Article received 04.10.2019
INTERVAL ESTIMATION OF A DISTRIBUTION PARAMETER BY STATISTICAL TRIALS OF THE EXPECTED VALUE
V.A. Barannik
Interval estimates of distribution parameters are obtained by approximating the expected values of an infinite or finite population with known bounds. As in the Bayesian method, the distribution parameters are interpreted as random variables, and their uncertainty is expressed in terms of distributions. The Monte Carlo method is used to find the confidence interval bounds. The model imposes no restrictions on the type of distributions. Unlike other nonparametric interval estimates of distribution parameters, the model works with samples of any size.
INTERVAL ESTIMATION OF A DISTRIBUTION PARAMETER BY STATISTICAL TRIALS OF THE EXPECTED VALUE
V.O. Barannik
Interval estimates of distribution parameters are obtained by approximating the expected values of an infinite or finite population with known bounds. As in the Bayesian method, the distribution parameters are treated as random variables, and their uncertainty is expressed in terms of a distribution. The Monte Carlo method is applied to find the confidence interval bounds. The model imposes no restrictions on the type of distributions. Unlike other nonparametric interval estimates of distribution parameters, the model works with samples of any size.