A new test for unimodality

A distribution function (d.f.) of a random variable is unimodal if there exists a number such that the d.f. is convex to the left of this number and concave to the right of it. This number is called a mode of the d.f. A mode is not necessarily unique, since a d.f. may have more than one mode. The purpose of...

Bibliographic details
Date: 2008
Authors: Andrushkiw, R.I., Klyushin, D.D., Petunin, Y.I.
Format: Article
Language: English
Published: Інститут математики НАН України, 2008
Online access: https://nasplib.isofts.kiev.ua/handle/123456789/4530
Journal title: Digital Library of Periodicals of National Academy of Sciences of Ukraine
Cite this: A new test for unimodality / R.I. Andrushkiw, D.D. Klyushin, Y.I. Petunin // Theory of Stochastic Processes. 2008. Vol. 14 (30), no. 1. P. 1–6. Bibliogr.: 12 titles. In English.

Repositories

Digital Library of Periodicals of National Academy of Sciences of Ukraine
_version_ 1860027327220547584
author Andrushkiw, R.I.
Klyushin, D.D.
Petunin, Y.I.
author_facet Andrushkiw, R.I.
Klyushin, D.D.
Petunin, Y.I.
citation_txt A new test for unimodality / R.I. Andrushkiw, D.D. Klyushin, Y.I. Petunin // Theory of Stochastic Processes. 2008. Vol. 14 (30), no. 1. P. 1–6. Bibliogr.: 12 titles. In English.
collection DSpace DC
description A distribution function (d.f.) of a random variable is unimodal if there exists a number such that the d.f. is convex to the left of this number and concave to the right of it. This number is called a mode of the d.f. A mode is not necessarily unique, since a d.f. may have more than one mode. The purpose of this paper is to construct nonparametric tests for the unimodality of a d.f. based on a sample obtained from the general population of values of the random variable by simple sampling. The proposed tests are significance tests, so the unimodality of the d.f. can be guaranteed with some probability (confidence level).
first_indexed 2025-12-07T16:50:16Z
format Article
fulltext Theory of Stochastic Processes, Vol. 14 (30), no. 1, 2008, pp. 1–6
UDC 519.21

ROMAN I. ANDRUSHKIW, DMITRY D. KLYUSHIN, AND YURIY I. PETUNIN

A NEW TEST FOR UNIMODALITY

A distribution function (d.f.) of a random variable is unimodal if there exists a number such that the d.f. is convex to the left of this number and concave to the right of it. This number is called a mode of the d.f. A mode is not necessarily unique, since a d.f. may have more than one mode. The purpose of this paper is to construct nonparametric tests for the unimodality of a d.f. based on a sample obtained from the general population of values of the random variable by simple sampling. The proposed tests are significance tests, so the unimodality of the d.f. can be guaranteed with some probability (confidence level).

1. Introduction

Testing the unimodality of a distribution function is a widely investigated problem. The most popular tests include the dip test proposed by J.A. Hartigan and the kernel density estimation test proposed by B.W. Silverman [1–4]. However, all of these tests are computationally quite complex and asymptotic. It is therefore useful to develop elementary tests that rely on simple computational procedures and are non-asymptotic.

According to A.Ya. Khinchin, a distribution function $F(u)$ of a random variable $x$ is unimodal if there exists a number $M$ such that $F(u)$ is convex on $(-\infty, M)$ and concave on $(M, \infty)$. The number $M$ is said to be a mode of the d.f. $F(u)$. A mode need not be unique, since the d.f. $F(u)$ can have several modes. Also, $F(u)$ can have a discontinuity at $M$ while being continuous on $(-\infty, M)$ and $(M, \infty)$.

The purpose of the paper is to construct nonparametric tests for the unimodality of the d.f. $F(u)$ based on a sample $x_1, x_2, \ldots, x_n$ obtained by simple sampling from the general population of values of a random variable $x$. The tests proposed are significance tests, so the unimodality of d.f.
$F(u)$ can be guaranteed with some probability $\alpha$ (confidence level), where $\beta = 1 - \alpha$ is the significance level of the test. To formulate the tests, we introduce new estimates of the probability density (d.p.) and the distribution function based on a sample $x_1, x_2, \ldots, x_n$.

2. Unified histogram and modified empirical d.f.

Let $x_1, x_2, \ldots, x_n$ be a sample obtained by simple sampling from a general population with d.f. $F(u)$ and the corresponding d.p. Since these functions are unknown, we call them hypothetical. To estimate the d.p., we use the relation

$$p\left(x_{n+1} \in \left(x_{(i)}, x_{(i+1)}\right)\right) = \frac{1}{n+1}, \tag{1}$$

where $x_{(i)}$ are the order statistics ($i = 1, 2, \ldots, n$). Using estimate (1), we can define the estimate $h_n(u)$ of the hypothetical d.p.:

$$h_n(u) = \begin{cases} \dfrac{1}{(n-1)\left(x_{(i+1)} - x_{(i)}\right)}, & \text{if } u \in \left(x_{(i)}, x_{(i+1)}\right), \\ 0, & \text{otherwise.} \end{cases}$$

2000 AMS Mathematics Subject Classification. Primary 62G05.
Key words and phrases. Unimodality, distribution function, significance test.

In such a case, the probability that the value of a random variable $\tilde{x}$ with d.p. $h_n(u)$ belongs to $\left[x_{(i)}, x_{(i+1)}\right)$ is equal to

$$p\left(\tilde{x} \in \left(x_{(i)}, x_{(i+1)}\right)\right) = \frac{1}{n-1},$$

where the $x_{(i)}$ are considered as constants. For large $n$, this probability is close to probability (1), so we refer to $h_n(u)$ as a unified histogram constructed on $x_1, x_2, \ldots, x_n$.

This histogram has an advantage over other histograms, because it is unambiguously defined by the sample $x_1, x_2, \ldots, x_n$. Also, the integral

$$\tilde{F}^*_n(u) = \int_{x_{(1)}}^{u} h_n(v)\, dv = \frac{u + (i-1)\,x_{(i+1)} - i\,x_{(i)}}{(n-1)\left(x_{(i+1)} - x_{(i)}\right)}, \qquad x_{(i)} \le u < x_{(i+1)}, \tag{2}$$

is a linear spline, which is a more precise estimate of the hypothetical d.f. $F(u)$ than the piecewise constant empirical d.f.

$$F^*_n(u) = \frac{i}{n}, \qquad \text{if } x_{(i)} \le u < x_{(i+1)}.$$

We refer to the function $F^*_n(u)$ as the e.d.f. and to $\tilde{F}^*_n(u)$ defined by (2) as the modified e.d.f. (m.e.d.f.). Its advantages over the conventional e.d.f. are as follows: 1) when the d.f.
$F(u)$ is continuous, linear splines are more precise approximations than the piecewise constant e.d.f. $F^*_n(u)$, and 2) $\tilde{F}^*_n(u)$ is continuous, so it is possible to estimate quantiles of any order and to construct an inverse d.f. (Quetelet curve), which is impossible with the piecewise constant e.d.f. $F^*_n(u)$.

However, for large $n$, the e.d.f. $F^*_n(u)$ and the m.e.d.f. $\tilde{F}^*_n(u)$ are close. Let us prove that

$$\left|F^*_n(u) - \tilde{F}^*_n(u)\right| \le \frac{1}{n}. \tag{3}$$

Indeed, for all $u \in \left[x_{(i)}, x_{(i+1)}\right)$, the following relation holds:

$$\left|F^*_n(u) - \tilde{F}^*_n(u)\right| = \left|\frac{u + (i-1)\,x_{(i+1)} - i\,x_{(i)}}{(n-1)\left(x_{(i+1)} - x_{(i)}\right)} - \frac{i}{n}\right| = \left|\frac{nu - (n-i)\,x_{(i+1)} - i\,x_{(i)}}{n(n-1)\left(x_{(i+1)} - x_{(i)}\right)}\right|.$$

Since

$$\tilde{F}^*_n(u) = \frac{u - x_{(i+1)} + i\left(x_{(i+1)} - x_{(i)}\right)}{(n-1)\left(x_{(i+1)} - x_{(i)}\right)} = \frac{1}{n-1}\left[\frac{u - x_{(i+1)}}{x_{(i+1)} - x_{(i)}} + i\right],$$

we have

$$\frac{i-1}{n-1} \le \tilde{F}^*_n(u) \le \frac{i}{n-1}$$

and

$$\frac{i-1}{n-1} - F^*_n(u) \le \tilde{F}^*_n(u) - F^*_n(u) \le \frac{i}{n-1} - F^*_n(u),$$

$$\frac{n(i-1) - (n-1)\,i}{n(n-1)} \le \tilde{F}^*_n(u) - F^*_n(u) \le \frac{ni - (n-1)\,i}{n(n-1)},$$

$$\frac{i-n}{n(n-1)} \le \tilde{F}^*_n(u) - F^*_n(u) \le \frac{1}{n};$$

hence,

$$-\frac{1}{n} \le \tilde{F}^*_n(u) - F^*_n(u) \le \frac{1}{n},$$

i.e.,

$$\left|\tilde{F}^*_n(u) - F^*_n(u)\right| \le \frac{1}{n}. \tag{4}$$

Estimate (4) implies that the m.e.d.f. has asymptotic properties similar to those of the conventional e.d.f., i.e., it is consistent, asymptotically unbiased, etc.

3. Confidence limits for hypothetical d.f.

Let us define the lower and upper bounds of a hypothetical d.f. $F(u)$ by means of an empirical d.f., under the assumption that $F(u)$ is continuous and strictly increasing. This problem was solved in [5–10]. Hence, given a significance level $\beta^*$ (e.g., $\beta^* = 0.05$), we can define $\varepsilon$ so that

$$p\left(\Delta = \max_{x_{(1)} \le u \le x_{(n)}} \left|F(u) - F^*_n(u)\right| > \varepsilon\right) = \beta^*.$$

It follows that, for a given $\beta^*$, we can find $\varepsilon$ from statistical tables [11] and construct a strip $\Pi_{\beta^*}$ whose bounds are the step functions $y = F^*_n(u) + \varepsilon$ and $y = F^*_n(u) - \varepsilon$. The strip $\Pi_{\beta^*}$ completely covers the true d.f. $y = F(u)$ with the confidence probability $\alpha^* = 1 - \beta^*$.
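The e.d.f., the m.e.d.f. of (2), and bound (4) can be illustrated with a short Python sketch. The function names `edf`, `medf`, and `dkw_epsilon` are hypothetical and not part of the paper. In particular, the paper takes $\varepsilon$ from statistical tables [11]; the sketch substitutes the Dvoretzky–Kiefer–Wolfowitz bound $\varepsilon = \sqrt{\ln(2/\beta^*)/(2n)}$, which is only an assumed, slightly conservative stand-in for the tabulated value.

```python
import bisect
import math
import random


def edf(xs, u):
    """Empirical d.f. F*_n(u) = i/n for x_(i) <= u < x_(i+1); xs must be sorted."""
    return bisect.bisect_right(xs, u) / len(xs)


def medf(xs, u):
    """Modified e.d.f. (2): integral of the unified histogram from x_(1) to u."""
    n = len(xs)
    if u <= xs[0]:
        return 0.0
    if u >= xs[-1]:
        return 1.0
    i = bisect.bisect_right(xs, u)   # 1-based index i with x_(i) <= u < x_(i+1)
    lo, hi = xs[i - 1], xs[i]        # x_(i) and x_(i+1)
    return (u + (i - 1) * hi - i * lo) / ((n - 1) * (hi - lo))


def dkw_epsilon(n, beta):
    """Assumption: DKW bound used in place of the tabulated epsilon of [11]."""
    return math.sqrt(math.log(2.0 / beta) / (2.0 * n))


random.seed(0)
xs = sorted(random.gauss(0.0, 1.0) for _ in range(50))
n = len(xs)

# Evaluate both estimates on a grid over [x_(1), x_(n)] and check bound (4).
grid = [xs[0] + k * (xs[-1] - xs[0]) / 1000 for k in range(1001)]
max_gap = max(abs(edf(xs, u) - medf(xs, u)) for u in grid)
print("bound (4) holds on the grid:", max_gap <= 1.0 / n + 1e-12)

# Strip bounds y = F*_n(u) +/- eps at a single point, for illustration.
eps = dkw_epsilon(n, 0.05)
u0 = xs[n // 2]
print("strip at u0:", edf(xs, u0) - eps, edf(xs, u0) + eps)
```

The small tolerance `1e-12` only absorbs floating-point rounding; the inequality itself is exact.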
Hereinafter, we refer to the strip $\Pi_{\beta^*}$ as the confidence strip for the d.f. with significance level $\beta^*$, constructed for the empirical d.f.

4. Test for unimodality based on e.d.f.

Let $x_1, x_2, \ldots, x_n$ be a sample obtained by simple sampling from a general population $G$ with continuous and strictly monotone d.f. $F(u)$. Using this sample, we construct the empirical d.f. $F^*_n(u)$ and the strip $\Pi_{\beta^*}$. Denote by $\varphi(u)$ the upper bound of $\Pi_{\beta^*}$, described by the equation $y = F^*_n(u) + \varepsilon$, and let $\psi(u)$ be the lower bound of $\Pi_{\beta^*}$, described by the equation $y = F^*_n(u) - \varepsilon$. Then

$$p\left(\psi(u) \le F(u) \le \varphi(u)\right) = \alpha^* = 1 - \beta^*.$$

Definition 1. Let $y = \varphi(u)$ be an arbitrary function defined on $[a, b]$. Then the set $G_U = \{(u, y) : y \ge \varphi(u),\ a \le u \le b\}$ is the epigraph of $\varphi(u)$, and the set $G_L = \{(u, y) : y \le \varphi(u),\ a \le u \le b\}$ is the subgraph of $\varphi(u)$.

Definition 2. The lower bound of the convex hull of the epigraph of a function $\varphi(u)$ is the convex minorant of $\varphi(u)$,

$$\varphi_{\inf}(u) = \inf\left\{v : (u, v) \in \operatorname*{conv}_{a \le u \le b} G_U\right\},$$

where $\operatorname{conv} G_U$ is the convex hull of $G_U$. Analogously, the upper bound of the convex hull of the subgraph $G_L$ of a function $\psi(u)$ is the concave majorant of $\psi(u)$:

$$\psi_{\sup}(u) = \sup\left\{v : (u, v) \in \operatorname*{conv}_{a \le u \le b} G_L\right\}.$$

Theorem 1. Let $\varphi_{\inf}(u)$ and $\psi_{\sup}(u)$ be the convex minorant and concave majorant of $\varphi(u)$ and $\psi(u)$, respectively, and

$$c = \sup\left\{u : \varphi_{\inf}(u) \le \psi(u),\ x_{(1)} \le u \le x_{(n)}\right\}, \qquad d = \inf\left\{u : \psi_{\sup}(u) \ge \varphi(u),\ x_{(1)} \le u \le x_{(n)}\right\}.$$

Then the hypothetical distribution $F(u)$ is unimodal iff
1) $\varphi_{\inf}(u) \ge \psi(u)$ or $\psi_{\sup}(u) \le \varphi(u)$ $\forall u \in \left[x_{(1)}, x_{(n)}\right]$; or
2) $c \ge d$.
Moreover, the significance level of this criterion is $\beta^*$.

Fig. 1. If $c < d$, the unimodality is absent.

Proof. Necessity. Suppose that the hypothetical d.f. $F(u)$ is unimodal, and $M$ is its mode. If $M \le x_{(1)}$ or $M \ge x_{(n)}$, then $F(u)$ is concave or convex, respectively, on $\left[x_{(1)}, x_{(n)}\right]$, and condition 1) holds. If $x_{(1)} \le M \le x_{(n)}$, then $F(u)$ is convex on $\left[x_{(1)}, M\right]$ and concave on $\left[M, x_{(n)}\right]$.
In such a case, it follows from Definition 2 (see Fig. 1) that $\varphi_{\inf}(u) \ge F(u)$ on $\left[x_{(1)}, M\right]$. Also, $F(u) \ge \psi(u)$ $\forall u \in \left[x_{(1)}, M\right]$, so $\varphi_{\inf}(u) \ge \psi(u)$ there, and $c \ge M$. On the other hand, since $F(u)$ is concave on $\left[M, x_{(n)}\right]$, Definition 2 implies that $F(u) \ge \psi_{\sup}(u)$ on $\left[M, x_{(n)}\right]$. Also, $\varphi(u) \ge F(u)$ $\forall u \in \left[M, x_{(n)}\right]$. Thus, $\varphi(u) \ge \psi_{\sup}(u)$ $\forall u \in \left[M, x_{(n)}\right]$, and $d \le M$. Consequently, $c \ge d$, and condition 2) holds.

Sufficiency. Note that $\varphi(u)$ and $\psi(u)$ are increasing. If condition 1) holds, then

$$\psi(u) \le \varphi_{\inf}(u) \le \varphi(u) \quad \forall u \in \left[x_{(1)}, x_{(n)}\right]$$

or

$$\psi(u) \le \psi_{\sup}(u) \le \varphi(u) \quad \forall u \in \left[x_{(1)}, x_{(n)}\right].$$

Thus, $\varphi_{\inf}(u)$ (or $\psi_{\sup}(u)$) lies in the strip $\Pi_{\beta^*}$. Therefore, $\varphi_{\inf}(u)$ (or $\psi_{\sup}(u)$) can be used as an estimate of the hypothetical d.f. $F(u)$ of the general population $G$. With $F(u) = \varphi_{\inf}(u)$ or $F(u) = \psi_{\sup}(u)$, the hypothetical d.f. is increasing, convex or concave on $\left[x_{(1)}, x_{(n)}\right]$, and hence unimodal. The significance level of this test is $\beta^*$.

Now suppose that condition 2) holds, i.e., $c \ge d$. Put $\hat{F}(u) = \varphi_{\inf}(u)$ if $u \in \left[x_{(1)}, c\right]$, and $\hat{F}(u) = \psi_{\sup}(u)$ if $u \in \left(c, x_{(n)}\right]$. It is easy to see that $\hat{F}(u)$ lies in $\Pi_{\beta^*}$, because $c \ge d$. Also, $\hat{F}(u)$ is convex on $\left[x_{(1)}, c\right]$ and concave on $\left(c, x_{(n)}\right]$. Let us prove that $\hat{F}(c) \le \hat{F}(c + 0) = \lim_{u \to c,\, u > c} \hat{F}(u) = \gamma$. Indeed, if $\gamma < \hat{F}(c)$, then the point $(c, \gamma)$ does not belong to $\Pi_{\beta^*}$. Therefore, the abscissa $d$ of the first exit point, where $\varphi_{\inf}(u)$ leaves the strip $\Pi_{\beta^*}$ while moving from $x_{(n)}$ to $x_{(1)}$, is greater than $c$. This contradicts condition 2). Thus, $\hat{F}(u)$ increases, is convex on $\left[x_{(1)}, c\right]$, and is concave on $\left(c, x_{(n)}\right]$. However, $\hat{F}(u)$ can have a discontinuity at $c$. To remove it, we take $\varepsilon > 0$ sufficiently small so that the segment with the ends $\left(c - \varepsilon, \hat{F}(c - \varepsilon)\right)$ and $(c, \gamma)$ lies completely in $\Pi_{\beta^*}$.
Then the function

$$\hat{F}_\varepsilon(u) = \begin{cases} \hat{F}(u), & \text{if } u \in \left[x_{(1)}, c - \varepsilon\right], \\ u\,\dfrac{\gamma - \hat{F}(c - \varepsilon)}{\varepsilon} + \gamma - c\,\dfrac{\gamma - \hat{F}(c - \varepsilon)}{\varepsilon}, & \text{if } u \in (c - \varepsilon, c), \\ \hat{F}(u), & \text{if } u \in \left[c, x_{(n)}\right], \end{cases}$$

increases, is continuous, convex on $\left[x_{(1)}, c\right]$, concave on $\left[c, x_{(n)}\right]$, and its graph lies in $\Pi_{\beta^*}$. Thus, the d.f. $\hat{F}_\varepsilon(u)$ is unimodal, and we can consider it as an estimate of the hypothetical d.f. of the general population $G$. The significance level of this test is $\beta^*$. Theorem 1 is proved.

Remark 1. Theorem 1 has the following geometric meaning: let $c$ be the abscissa of the first exit point, where the convex minorant $\varphi_{\inf}(u)$ falls below the lower bound of $\Pi_{\beta^*}$ while moving from the maximal order statistic to the minimal one, and let $d$ be the abscissa of the first exit point, where the concave majorant $\psi_{\sup}(u)$ exceeds the upper bound of $\Pi_{\beta^*}$ while moving from the minimal order statistic to the maximal one. Then the hypothetical d.f. $F(u)$ is unimodal iff the exit points $c$ and $d$ lie outside $\left[x_{(1)}, x_{(n)}\right]$ or $c \ge d$.

5. Test for unimodality based on m.e.d.f.

The confidence strip $\tilde{\Pi}_{\beta^*}$ for a hypothetical d.f. can be constructed on the m.e.d.f. $\tilde{F}^*_n(u)$ in the following way: let the significance level $\beta^*$ be given, let $\varepsilon$ be the width of $\Pi_{\beta^*}$, and let

$$p\left(\Delta = \max_{x_{(1)} \le u \le x_{(n)}} \left|F(u) - F^*_n(u)\right| > \varepsilon\right) = \beta^*.$$

We put $\tilde{\varphi}(u) = \tilde{F}^*_n(u) + \varepsilon + \frac{1}{n}$ and $\tilde{\psi}(u) = \tilde{F}^*_n(u) - \varepsilon - \frac{1}{n}$. It is easy to see that the strip $\tilde{\Pi}_{\beta^*}$ with lower bound $\tilde{\psi}(u)$ and upper bound $\tilde{\varphi}(u)$ has a significance level not exceeding $\beta^*$. Indeed, by virtue of (4),

$$\left|F(u) - \tilde{F}^*_n(u)\right| = \left|F(u) - F^*_n(u) + F^*_n(u) - \tilde{F}^*_n(u)\right| \le \left|F(u) - F^*_n(u)\right| + \frac{1}{n}.$$

Therefore,

$$\tilde{\Delta} = \max_{x_{(1)} \le u \le x_{(n)}} \left|F(u) - \tilde{F}^*_n(u)\right| \le \Delta + \frac{1}{n}.$$

Hence,

$$p\left(\tilde{\Delta} - \frac{1}{n} \ge \varepsilon\right) \le p\left(\Delta \ge \varepsilon\right) = \beta^*, \quad \text{i.e.,} \quad p\left(\tilde{\Delta} \ge \varepsilon + \frac{1}{n}\right) \le p\left(\Delta \ge \varepsilon\right) = \beta^*.$$

Thus, the significance level of $\tilde{\Pi}_{\beta^*}$ does not exceed $\beta^*$. Since we increase the validity of the test by selecting $\beta^*$ as a significance level, we can use the m.e.d.f. $\tilde{F}^*_n(u)$ to construct $\tilde{\Pi}_{\beta^*}$ without a decrease in the significance level.
However, in doing so, we increase the width of $\tilde{\Pi}_{\beta^*}$ by $\frac{1}{n}$ relative to $\Pi_{\beta^*}$. For moderate samples ($30 \le n \le 200$), this increment varies from 7 to 13.

Now we can formulate the test for the unimodality of a hypothetical d.f. based on the m.e.d.f.

Theorem 2. Let $\tilde{\varphi}_{\inf}(u)$ and $\tilde{\psi}_{\sup}(u)$ be the convex minorant and concave majorant of $\tilde{\varphi}(u)$ and $\tilde{\psi}(u)$, respectively, and let

$$c = \sup\left\{u : \tilde{\varphi}_{\inf}(u) \le \tilde{\psi}(u),\ x_{(1)} \le u \le x_{(n)}\right\}, \qquad d = \inf\left\{u : \tilde{\psi}_{\sup}(u) \ge \tilde{\varphi}(u),\ x_{(1)} \le u \le x_{(n)}\right\}.$$

Then the hypothetical distribution $F(u)$ is unimodal iff
1) $\tilde{\varphi}_{\inf}(u) \ge \tilde{\psi}(u)$ or $\tilde{\psi}_{\sup}(u) \le \tilde{\varphi}(u)$ $\forall u \in \left[x_{(1)}, x_{(n)}\right]$; or
2) $c \ge d$.
Moreover, the significance level of this criterion is $\beta^*$.

The proof of Theorem 2 is similar to that of Theorem 1.

6. Conclusion

It is shown in [12] that if the distribution function of a general population is unimodal, then the confidence interval $(m(x) - 3\sigma(x),\ m(x) + 3\sigma(x))$, where $m(x)$ is the mathematical expectation of $G$ and $\sigma(x)$ is the standard deviation of $G$, has a significance level which does not exceed 0.05. Therefore, this nonparametric test for unimodality can be used to construct the confidence interval for the bulk of the general population $G$.

Bibliography

1. B.W. Silverman, Using kernel density estimates to investigate multimodality, J. of the Royal Statistical Society B 43 (1981), 97-99.
2. J.A. Hartigan, Computation of the dip statistics to test for unimodality, Applied Statistics 34 (1985), 320-325.
3. J.A. Hartigan, The span test of multimodality, Classification and Related Methods of Data Analysis (H.H. Bock, ed.), North-Holland, Amsterdam, 1988, pp. 229-236.
4. J.A. Hartigan, S. Mohanty, The RUNT test for multimodality, Applied Statistics 9 (1992), 63-70.
5. A.N. Kolmogoroff, Determinatione empirica di una legge di distributione, Giornale Instit. Ital. Attuari 4 (1933), 83-91.
6. N.V. Smirnov, Sur les ecarts de la courbe de distribution empiric, Mat. Sb. 6 (1939), 3-26.
7. A. Wald, J.
Wolfowitz, Confidence limits for continuous distribution functions, Ann. Math. Statist. 10 (1939), 199-326.
8. W. Feller, On the Kolmogorov–Smirnov limit theorems for empirical distributions, Ann. Math. Statist. 19 (1948), 177-189.
9. F.J. Massey, A note on the estimation of a distribution function by confidence limits, Ann. Math. Statist. 21 (1950), 125-128.
10. Z.W. Birnbaum, F.H. Tingey, One-sided confidence contours for distribution functions, Ann. Math. Statist. 22 (1951), 592-596.
11. B.L. Van der Waerden, Mathematische Statistik, Springer, Berlin, 1957.
12. D.F. Vysochanskij, Yu.I. Petunin, Justification of the 3-σ rule for unimodal distribution, Theor. Probability Math. Stat. 21 (1980), 25-36.

E-mail: vm214@dcp.kiev.ua
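As an illustration of Definition 2 and condition 1) of Theorem 1, the following Python sketch computes a convex minorant and a concave majorant for strip bounds sampled on a grid. It is only a sketch under stated assumptions: the bounds are replaced by their values at finitely many grid points, the hull is computed with Andrew's monotone chain algorithm (a standard technique that the paper does not mention), and all function names are hypothetical. Condition 2) of Theorem 1 (the comparison of $c$ and $d$) is not implemented here.

```python
import math


def lower_hull(points):
    """Lower convex hull of points sorted by increasing x (monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or above the chord hull[-2] -> p.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def interpolate(hull, u):
    """Value at u of the piecewise-linear curve through the hull vertices."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= u <= x2:
            return y1 + (y2 - y1) * (u - x1) / (x2 - x1)
    raise ValueError("u lies outside the hull range")


def convex_minorant(us, ys):
    """Grid analogue of phi_inf: lower hull of the sampled points."""
    hull = lower_hull(list(zip(us, ys)))
    return [interpolate(hull, u) for u in us]


def concave_majorant(us, ys):
    """Grid analogue of psi_sup, obtained by reflecting the convex minorant."""
    return [-v for v in convex_minorant(us, [-y for y in ys])]


def condition_1(us, phi, psi, tol=1e-12):
    """Condition 1) of Theorem 1 checked on the grid only (an approximation)."""
    phi_inf = convex_minorant(us, phi)
    psi_sup = concave_majorant(us, psi)
    return (all(a >= b - tol for a, b in zip(phi_inf, psi))
            or all(a <= b + tol for a, b in zip(psi_sup, phi)))


def logistic(u, m, s):
    return 1.0 / (1.0 + math.exp(-(u - m) / s))


us = [k / 100 for k in range(101)]

# A strip around the convex curve u^2: the convex minorant stays inside.
phi_c = [u * u + 0.1 for u in us]
psi_c = [u * u - 0.1 for u in us]
convex_ok = condition_1(us, phi_c, psi_c)

# A narrow strip around a two-step (bimodal) curve: condition 1) fails.
f_b = [0.5 * (logistic(u, 0.25, 0.02) + logistic(u, 0.75, 0.02)) for u in us]
phi_b = [v + 0.02 for v in f_b]
psi_b = [v - 0.02 for v in f_b]
bimodal_ok = condition_1(us, phi_b, psi_b)

print(convex_ok, bimodal_ok)
```

In an actual application, `phi` and `psi` would be the bounds of the strip $\Pi_{\beta^*}$ (or $\tilde{\Pi}_{\beta^*}$) evaluated at the order statistics, and a finer grid reduces the discretization error of the check.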
id nasplib_isofts_kiev_ua-123456789-4530
institution Digital Library of Periodicals of National Academy of Sciences of Ukraine
issn 0321-3900
language English
last_indexed 2025-12-07T16:50:16Z
publishDate 2008
publisher Інститут математики НАН України
record_format dspace
spelling Andrushkiw, R.I.
Klyushin, D.D.
Petunin, Y.I.
2009-11-25T10:59:23Z
2009-11-25T10:59:23Z
2008
A new test for unimodality / R.I. Andrushkiw, D.D. Klyushin, Y.I. Petunin // Theory of Stochastic Processes. 2008. Vol. 14 (30), no. 1. P. 1–6. Bibliogr.: 12 titles. In English.
0321-3900
https://nasplib.isofts.kiev.ua/handle/123456789/4530
519.21
A distribution function (d.f.) of a random variable is unimodal if there exists a number such that the d.f. is convex to the left of this number and concave to the right of it. This number is called a mode of the d.f. A mode is not necessarily unique, since a d.f. may have more than one mode. The purpose of this paper is to construct nonparametric tests for the unimodality of a d.f. based on a sample obtained from the general population of values of the random variable by simple sampling. The proposed tests are significance tests, so the unimodality of the d.f. can be guaranteed with some probability (confidence level).
en
Інститут математики НАН України
A new test for unimodality
Article
published earlier
spellingShingle A new test for unimodality
Andrushkiw, R.I.
Klyushin, D.D.
Petunin, Y.I.
title A new test for unimodality
title_full A new test for unimodality
title_fullStr A new test for unimodality
title_full_unstemmed A new test for unimodality
title_short A new test for unimodality
title_sort new test for unimodality
url https://nasplib.isofts.kiev.ua/handle/123456789/4530
work_keys_str_mv AT andrushkiwri anewtestforunimodality
AT klyushindd anewtestforunimodality
AT petuninyi anewtestforunimodality
AT andrushkiwri newtestforunimodality
AT klyushindd newtestforunimodality
AT petuninyi newtestforunimodality