Algorithms for assignment of external reviewers for PhD-thesis defense

We propose an approach to assigning external reviewers. In the proposed approach, only the semantic similarity between applications and reviewers is taken into account; the similarity indices are assessed, and the necessary number of reviewers is assigned to ensure the maximum suitability level of...

Full description

Bibliographic details
Date: 2025
Authors: Shtovba, Serhiy; Petrychko, Mykola
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025
Online access: https://journal.iasa.kpi.ua/article/view/351442
Journal title: System research and information technologies

Repositories

System research and information technologies
author Shtovba, Serhiy
Petrychko, Mykola
author_institution_txt_mv [ { "author": "Serhiy Shtovba", "institution": "Vasyl’ Stus Donetsk National University, Vinnytsia; Vinnytsia National Technical University, Vinnytsia" }, { "author": "Mykola Petrychko", "institution": "Vinnytsia National Technical University, Vinnytsia" } ]
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2026-02-02T20:49:24Z
description We propose an approach to assigning external reviewers. In the proposed approach, only the semantic similarity between applications and reviewers is taken into account; the similarity indices are assessed, and the necessary number of reviewers is assigned to ensure the maximum suitability level of the reviewers with the application, according to some criteria. We also perform a comparative analysis of various optimization algorithms using the criterion of “assignment quality–optimization time”. Experiments on the dataset showed that a reasonable balance between the “assignment quality” and “optimization time” criteria for the assignment of external reviewers can be achieved using a greedy algorithm without elitism or brute-force search on a truncated set of candidates. An application of the proposed algorithms improves the average quality of PhD committees by 13–34% across the entire dataset, depending on the algorithm used.
doi_str_mv 10.20535/SRIT.2308-8893.2025.4.08
first_indexed 2026-02-08T08:06:15Z
format Article
fulltext Serhiy Shtovba, Mykola Petrychko, 2025. System Research and Information Technologies, 2025, No. 4

SCIENTIFIC AND METHODOLOGICAL PROBLEMS IN EDUCATION

UDC 519.254+001.2
DOI: 10.20535/SRIT.2308-8893.2025.4.08

ALGORITHMS FOR ASSIGNMENT OF EXTERNAL REVIEWERS FOR PHD-THESIS DEFENSE

SERHIY SHTOVBA, MYKOLA PETRYCHKO

Abstract. We propose an approach to assigning external reviewers. In the proposed approach, only the semantic similarity between applications and reviewers is taken into account; the similarity indices are assessed, and the necessary number of reviewers is assigned to ensure the maximum suitability level of the reviewers with the application, according to some criteria. We also perform a comparative analysis of various optimization algorithms using the criterion of “assignment quality–optimization time”. Experiments on the dataset showed that a reasonable balance between the “assignment quality” and “optimization time” criteria for the assignment of external reviewers can be achieved using a greedy algorithm without elitism or brute-force search on a truncated set of candidates. An application of the proposed algorithms improves the average quality of PhD committees by 13–34% across the entire dataset, depending on the algorithm used.

Keywords: external reviewers, reviewer assignment problem, categorization, optimization, brute force algorithm, greedy algorithm, assignment in isolation, PhD thesis, Dimensions, ANZSRC 2020, research group.

INTRODUCTION

External reviewers are persons from outside an institution who are invited to provide an independent evaluation or assessment of a particular project, document, research paper, or system. They are often selected for their expertise in a relevant field and are expected to offer objective, unbiased feedback. In academia, external reviewers are used in peer review to evaluate the quality, relevance, and originality of academic papers before publication.
They may also be used for reviewing PhD theses.

In Ukraine, a PhD thesis is defended in front of a committee. A PhD committee consists of 5 scientists with expertise in the thesis subject. The chairman and 1 or 2 reviewers are from the PhD student's institution, and 2 or 3 external reviewers are invited from other institutions. The members of the PhD committee are assigned manually, which has several disadvantages. First, there are corruption risks when the committee is formed exclusively from friendly persons who a priori give only favorable reviews regardless of the results of the thesis. Second, a lot of time is spent on manual search and analysis of candidates for the committee. Third, the combined competence of the committee may not fully correspond to the thesis topic because some good candidates were missed during the manual search. Therefore, there is an interest in automating the assignment of reviewers to eliminate these human-factor risks.

The general task of assigning reviewers consists of three stages [1]: 1) forming a pool of potential reviewers and subsequently choosing a method of data representation for reviewers and applications; 2) assessing the similarities between the application and the reviewers; 3) assigning applications to reviewers to maximize combined similarity across all the subjects under some constraints. Typical constraints include balancing reviewer workloads, taking into account their preferences, and preventing conflicts of interest. In this work, it is assumed that the pool of potential reviewers is available.

Automatic assignment of reviewers assumes that some initial information about reviewers and applications is available. A structured set of such information is called a reviewer profile or an application profile, respectively.
The following information about a reviewer's publications is usually used to build a reviewer's profile: title, abstract, keywords, full text, list of references, and list of citations [2]. The abstract, full text, keywords, and title are most often used to create an application profile [2]. Application and reviewer profiles are built using various natural language processing methods based on bag of words [2; 3; 4], latent semantic analysis [5; 6], topic modeling [7; 8], static language models with deep learning [9; 10; 11], and contextual models with deep learning [12]. Approaches to solving the problem of automatic assignment of reviewers in most cases require a fairly large amount of initial information about the reviewers' publications, their interaction with other scientists, and similar information about the authors of applications. Analyzing this information is costly and is not expedient if thousands of candidates are to be analyzed in detail for each team of reviewers.

Our paper is dedicated to the assignment of external reviewers for PhD thesis defense. A candidate list of available internal reviewers is usually too short; hence it makes no sense to optimize it. We focus on the task of express assignment of external reviewers, where a long initial list of candidates is to be reduced drastically. The resulting short list can be analyzed manually, or a fine assignment procedure can be activated, which is resource-intensive and requires a much larger volume of initial information than express assignment. During express assignment, only the semantic similarity between applications and reviewers is taken into account, with the aim of providing the maximal level of collective competence of the committee.
In this paper we perform a comparative analysis of various optimization algorithms using the criteria of “assignment quality – optimization time” in order to better understand the tradeoffs when choosing “assignment quality” over “optimization time” or vice versa.

DATA REPRESENTATION

At the first stage of assigning the reviewers, it is necessary to choose the source data for decision-making, as well as the method of its representation in vector form. In the case of an application, a list of its keywords is used, and in the case of a reviewer, a list of keywords obtained from available data is used. In general, this list of keywords can come from the candidate's recent publications, from his or her CV, or from a profile in some register of scientists. In the latter two cases, keywords or research interests are formed by the candidate at his or her own discretion, that is, they are presented in an arbitrary form without reference to any rubric or classifier.

The source data is usually processed using statistical models, topic models, and embedding models. Some of them analyze the frequency of occurrence of words in the text; others form representation vectors based on the co-occurrence of words. Usually, the resulting vector representations are difficult to interpret. In addition, obtaining such representations requires a large amount of data. We suggest using the approach from [13], according to which a set of keywords is categorized as a vector in the space of research groups from the Australian and New Zealand Standard Research Classification (ANZSRC 2020). ANZSRC 2020 includes 171 research groups from 22 divisions. Therefore, the final representation of the application and reviewer profiles is a distribution over the 171 research groups of ANZSRC 2020.
In order to carry out the categorization, it is necessary to have a corpus of labeled articles that are assigned to one or more research groups, and a machine learning model that, based on keywords, assigns the analyzed profile to certain research groups. We use the information resources of Dimensions, in which more than 100M publications are already categorized according to ANZSRC 2020. For a search query in the form of a keyword, Dimensions produces an output that indicates how many publications with that keyword are assigned to each of the research groups. This procedure is shown schematically in Fig. 1. It also shows that in the collection of labeled documents an article can be categorized into several research groups; for example, Article 1 is assigned to Research Group 1 and Research Group 2. Based on this output, the distribution of a keyword's occurrence across research groups can be built. For example, for the keyword from Fig. 1 the distribution looks like this: Research Group 1: 3 appearances; Research Group 2: 2 appearances; Research Group 3: 2 appearances; Research Group K: 1 appearance. On the basis of this distribution, the keyword “some keyword” is further categorized within the research classification system.

Fig. 1. Keyword categorization schema

To categorize a set of keywords, the algorithm from [13] is applied, which is based on the resources and services of Dimensions. This algorithm takes into account both the occurrence of isolated keywords from a profile and the co-occurrence of keyword pairs. The algorithm filters the information noise caused by both stop words and rare keywords, for which conclusions have low reliability.

The categorization algorithm consists of 3 stages. For a set of two keywords, the categorization procedure is shown schematically in Fig. 2. At the first stage, the set E of search queries is created from the initial keywords and their pairwise combinations. At the second stage, the membership degrees of the queries to research groups are computed. For this, the overall distribution of the number of publications over research groups is found using the Dimensions API. Then the same is done for each search query, with subsequent stop-word detection and noise filtering. After that, the relative frequencies of the search queries with respect to the overall distribution are found, and noise is reduced using the cumulative contribution of research groups. At the third stage, the distributions of all queries are averaged, which produces a one-dimensional vector. We further truncate it to at most RG_max research groups with non-zero membership degree. A reviewer can be categorized by the proposed algorithm to at most T_max research groups, and the smallest membership degree is restricted to be at least RG_min_degree. The truncation is done in the last step of the third stage by removing research groups with low membership degree.

Fig. 2. Keywords detailed categorization schema

The MATLAB-style pseudocode of the categorization algorithm is as follows:

    % STAGE #1: creating the set E of search queries from the keywords w
    E = w
    for i = 1:length(w)-1
        for j = i+1:length(w)   % pairs of distinct keywords only
            E = {E; [w(i) 'AND' w(j)]}
        end
    end

    % STAGE #2: compute membership degrees to research groups by each query
    < Find the total number of publications in each research group
      N = [N(1), N(2), ..., N(m)], m = 171 >
    Counter = 0   % the counter of successful query responses
    for i = 1:length(E)
        < Find Q, the total number of publications in Dimensions that contain E{i} >
        if Q > Threshold_StopWord
            continue   % ignoring the stop words
        end
        if Q < Threshold_Noise
            continue   % ignoring the rare keywords
        end
        < Find t(1), t(2), ..., t(m), the number of publications in each
          research group for query E{i} >
        % Ignoring the research groups with a tiny number of publications:
        index = find(t < Threshold_topic)
        t(index) = 0
        if max(t) == 0
            continue
        end
        r = t ./ N   % frequency of E{i}'s occurrence in research groups
        % Normalizing the frequency distribution:
        Gamma = r ./ sum(r)
        < Choose the most popular research groups whose cumulative contribution
          in Gamma is >= Tail. ID-numbers of the remaining research groups are
          put in vector Rejected >
        % Ignoring the research groups rejected by the Tail criterion:
        Gamma(Rejected) = 0
        Gamma = Gamma ./ sum(Gamma)   % normalizing again
        Counter = Counter + 1
        Mu(Counter, :) = Gamma        % one row per successful query
    end
    if Counter == 0
        return('Unsuccessful')
    end

    % STAGE #3: compute membership degrees using all queries
    Mu_mean = mean(Mu, 1)   % averaging all successful queries
    % Computing the current number of the selected research groups:
    Current_N_RGs = sum(Mu_mean > 0)
    [Mu, RG_ID, Current_N_RGs] = Top_RG(Mu_mean, Source_RG_ID, RG_max)
    % Top_RG forms RG_ID as a selection of RG_max research groups with the
    % highest membership degree from Source_RG_ID. RG_ID is a descending-order
    % list of research groups according to their membership degrees Mu.
    % Vector Mu is normalized in [0; 1].
    % Finish truncation based on kinship of research groups:
    while (true)
        if (Current_N_RGs <= T_max AND Mu(end) > RG_min_degree)
            break
        end
        if (Current_N_RGs <= 1)
            break
        end
        % Drop the minor group and redistribute its contribution to the
        % others based on their kinship:
        for target = 1:Current_N_RGs-1
            akin_factor = Jaccard(RG_ID(target), RG_ID(Current_N_RGs))
            Mu(target) = Mu(target) + Mu(Current_N_RGs) * akin_factor
        end
        [Mu, RG_ID, Current_N_RGs] = Top_RG(Mu, RG_ID, Current_N_RGs-1);
    end
    return(Mu, RG_ID)

At the last stage of the algorithm, when a minor research group is dropped, its contribution is redistributed to the other research groups based on their kinship. The added value is proportional to the kinship level between the target research group and the research group being removed. The kinship level is assessed using the Jaccard index, where the size of the intersection is the number of publications categorized as belonging to both research groups, and the size of the union is the number of publications categorized to either of the research groups [14]. We formed the matrix of Jaccard indices for research groups using the Dimensions API for the data period of 2019–2023. The intuition behind this step lies in the fact that we want to increase the influence of the subset of research groups that are more akin than others.

For example, a researcher is tentatively categorized to research groups 4410 Sociology, 4611 Machine Learning, 3508 Tourism, and 3504 Commercial Services as follows: $\left\{ \frac{0.4}{4410}, \frac{0.25}{4611}, \frac{0.2}{3508}, \frac{0.15}{3504} \right\}$. Let us drop the minor research group 3504. For this, we first compute the Jaccard indices between 3504 and the other research groups using the method from [14]. For the data of 2019–2023 they are: $J(4410, 3504) = 0.044$; $J(4611, 3504) = 0$; $J(3508, 3504) = 0.478$.
Taking the kinships into account, the contribution of research group 3504 is redistributed in the following way: $\left\{ \frac{0.4 + 0.044 \cdot 0.15}{4410}, \frac{0.25 + 0 \cdot 0.15}{4611}, \frac{0.2 + 0.478 \cdot 0.15}{3508} \right\}$. As a result, we get: $\left\{ \frac{0.466}{4410}, \frac{0.25}{4611}, \frac{0.271}{3508} \right\}$. After normalization: $\left\{ \frac{0.472}{4410}, \frac{0.275}{3508}, \frac{0.253}{4611} \right\}$. As a result, research group 3508 Tourism has been strongly reinforced. This research group is closely related to 3504 Commercial Services, which has been eliminated. If we simply discard the minor research group, then after normalization we get $\left\{ \frac{0.47}{4410}, \frac{0.29}{4611}, \frac{0.24}{3508} \right\}$. In this case, there is no additional reinforcement of research group 3508.

Let us present a step-by-step example of how the proposed algorithm works. For this, Susan Dumais is considered as a potential reviewer. The reviewer's information is taken from her Google Scholar profile, which contains a set of research interests. Those interests may be interpreted as a set of initial keywords. For this reviewer the keywords are: “Information Retrieval”, “Human-Computer Interaction”. Interests often complement each other, thus making the research topics more focused. To take this into account, additional keywords are synthesized as pairs of initial interests. Interests in a pair are combined by the logical operation AND as follows: “Information Retrieval” AND “Human-Computer Interaction”. Fig. 3 shows the initial distribution of membership degrees to research groups for the research interests of Susan Dumais. For each of the reviewer's interests and for the conjunction of her interests, the distribution over research groups is obtained from Dimensions. Then only the most popular research groups, whose cumulative contribution reaches Tail, are kept, and the remaining ones are dropped to reduce the noise (Fig. 4). Tail is set to 0.93.
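The query construction and the Tail filtering described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `build_queries` and `filter_by_tail` are hypothetical, the shares passed to `filter_by_tail` are made-up numbers rather than Dimensions output, and "cumulative contribution reaches Tail" is read as keeping the smallest set of top groups whose summed share is at least Tail.

```python
from itertools import combinations


def build_queries(keywords):
    """Stage 1: single keywords plus all pairwise AND-combinations."""
    singles = [f'"{k}"' for k in keywords]
    pairs = [f'"{a}" AND "{b}"' for a, b in combinations(keywords, 2)]
    return singles + pairs


def filter_by_tail(gamma, tail=0.93):
    """Keep the most popular groups until their cumulative share reaches
    `tail`, drop the rest, and renormalize (hypothetical helper)."""
    ranked = sorted(gamma.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for group, share in ranked:
        if cumulative >= tail:
            break  # the kept groups already cover the required share
        kept[group] = share
        cumulative += share
    total = sum(kept.values())
    return {group: share / total for group, share in kept.items()}


queries = build_queries(["Information Retrieval", "Human-Computer Interaction"])

# Made-up normalized shares over four research groups (not real Dimensions data).
gamma = {"4608": 0.50, "4609": 0.30, "4602": 0.15, "4605": 0.05}
filtered = filter_by_tail(gamma, tail=0.93)
```

With two interests the sketch produces three queries, the two single keywords and their conjunction. In the made-up distribution the three most popular groups already cover 0.95 of the mass, so the fourth group is rejected and the remaining shares are rescaled to sum to 1.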
The next step is to average over all interests' distributions (Fig. 5) and further restrict the number of non-zero membership degrees to at most RG_max. RG_max is set to 12. The noise reduction steps and the restriction on the maximum number of non-zero membership degrees are based on the assumption that researchers are usually proficient in only a few research fields at once. In the end, for $T_{\max} = 4$, Susan Dumais is represented by the following research groups: 4608 Human-Centred Computing with degree 0.35; 4609 Information Systems with degree 0.25; 4602 Artificial Intelligence with degree 0.21; 4605 Data Management and Data Science with degree 0.19.

Fig. 3. The initial interests' distributions for Susan Dumais

Fig. 4. Interests' distributions after filtering by Tail

Fig. 5. Reviewer's distribution after averaging over all interests' distributions and final result

As a result of categorization, an application profile defined as a set of keywords $A_w = \{w_1, w_2, \dots, w_n\}$ is transformed into a profile defined as a categorical distribution over research groups $A_t = \{\mu_{t_1}(A), \mu_{t_2}(A), \dots, \mu_{t_m}(A)\}$, where $\mu_{t_i}(A) \in [0; 1]$ denotes the membership degree of application $A$ to research group $t_i$, $i = \overline{1, m}$. Similarly, a reviewer's profile defined as a set of keywords or research interests $R_w = \{w_1, w_2, \dots, w_n\}$ is transformed into a profile defined as a categorical distribution over research groups $R_t = \{\mu_{t_1}(R), \mu_{t_2}(R), \dots, \mu_{t_m}(R)\}$.

SIMILARITY ASSESSMENT

To match reviewers and applications, a similarity metric between two categorical distributions, namely the research-group distribution of the reviewer's keywords and that of the application's keywords, has to be defined. For this, the metric from [15] is used.
The metric calculates the similarity of two objects $X$ and $Y$ with the categorical distributions $(\mu_1(X), \mu_2(X), \dots, \mu_m(X))$ and $(\mu_1(Y), \mu_2(Y), \dots, \mu_m(Y))$, where $m$ denotes the number of categories (research groups in our case), $\mu_i(X)$ denotes the membership degree of object $X$ to the $i$-th category, and $\mu_i(Y)$ denotes the membership degree of object $Y$ to the $i$-th category, $i = \overline{1, m}$. Distributions are normalized and satisfy the following conditions: $\mu_i(X) \in [0; 1]$, $\mu_i(Y) \in [0; 1]$, $i = \overline{1, m}$; $\sum_{i=\overline{1,m}} \mu_i(X) = 1$; $\sum_{i=\overline{1,m}} \mu_i(Y) = 1$.

The categorical distributions of objects $X$ and $Y$ look like two fuzzy sets on the universal set of all categories. Therefore, to calculate the similarity of objects $X$ and $Y$, it is proposed to use the intersection of the corresponding fuzzy sets. This is reflected in the metric [15], according to which the similarity of objects $X$ and $Y$ is defined as follows:

$$Fit(X, Y) = \sum_{i=\overline{1,m}} \min(\mu_i(X), \mu_i(Y)) + F(X, Y), \qquad (1)$$

where $\sum_{i=\overline{1,m}} \min(\mu_i(X), \mu_i(Y))$ is an addend that evaluates the direct similarity of objects $X$ and $Y$; $F(X, Y)$ is an addend that evaluates the similarity of objects $X$ and $Y$ through akin categories (akin research groups in our case). Across all research groups, kinship is conveniently represented by a binary fuzzy relation in the form of an $m \times m$ matrix. Each element of the matrix corresponds to the kinship level of the two corresponding research groups. This kinship matrix is easily identified by the method from [14], which uses the Jaccard index on data from Dimensions.

TASK STATEMENT OF ASSIGNMENT OPTIMIZATION

Consider the task of assigning a team of reviewers who are collectively the best suited for reviewing an application. For this task, 2 cases are possible: forming a team from scratch and supplementing the team with new members.
Given: an application profile $A_t = \{\mu_{t_1}(A), \mu_{t_2}(A), \dots, \mu_{t_m}(A)\}$ and the profiles of $k$ potential reviewers $R_{tj} = \{\mu_{t_1}(R_j), \mu_{t_2}(R_j), \dots, \mu_{t_m}(R_j)\}$, $j = \overline{1, k}$, in the space of $m$ research groups. The entire set of reviewers is denoted as $R = \{R_1, R_2, \dots, R_k\}$.

Find: the subset of reviewers $S \subset R$ with the highest overall suitability level to all the topics of the application: $Fit(A, Agg(S)) \to \max$, where $Agg(S)$ denotes the aggregation function of the categorical distributions of the assigned set of reviewers.

Aggregation of the categorical distributions of the reviewer profiles $R_{tj}$, $j = \overline{1, k}$, in the space of research groups from ANZSRC 2020 is implemented using the third stage of the categorization algorithm described above. The number of reviewers for an application is denoted by $c = |S|$. This quantity is constant; usually it is from 2 to 5 people. The level of suitability between the application and the team of reviewers is calculated by formula (1).

REVIEWER ASSIGNMENT ALGORITHMS

The task of assigning reviewers, from a mathematical point of view, is to find a subset of fixed cardinality. To solve such problems in practice, mostly approximate algorithms are used. Among the set of possible algorithms, it is necessary to choose the one that provides a balance between assignment quality and the effort of finding a solution. The following algorithms are proposed.

Brute force. The best solution can be found by trivial brute force. For application $A$, among all possible teams of size $c$ from the reviewer set $R$, a team with the maximum level of suitability has to be found. The complexity of brute force grows combinatorially: the number of operations is proportional to the binomial coefficient $\frac{n!}{(n-c)!\,c!}$, where $n$ is the number of candidates. So even for medium-sized problems, it is unrealistic to go through all possible options within reasonable time constraints. Moreover, the number of options depends strongly on $c$.
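The brute-force search can be sketched in Python as shown below. The sketch makes two simplifying assumptions that are not part of the paper's method: `direct_fit` evaluates only the direct-similarity addend of formula (1) and leaves out the kinship addend $F(X, Y)$, and `aggregate` replaces $Agg(S)$ with a plain average of the members' distributions, without the truncation of the third categorization stage. All profiles are hypothetical toy data.

```python
from itertools import combinations
from math import comb


def direct_fit(x, y):
    """Direct-similarity addend of formula (1): sum of min membership degrees.
    The kinship addend F(X, Y) is deliberately left out of this sketch."""
    return sum(min(x.get(g, 0.0), y.get(g, 0.0)) for g in set(x) | set(y))


def aggregate(profiles):
    """Assumed stand-in for Agg(S): plain average of the members' distributions."""
    groups = set().union(*profiles)
    return {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles) for g in groups}


def brute_force(application, reviewers, c):
    """Try every team of size c and return (best_team, best_fit)."""
    best_team, best_fit = None, -1.0
    for team in combinations(sorted(reviewers), c):
        fit = direct_fit(application, aggregate([reviewers[r] for r in team]))
        if fit > best_fit:
            best_team, best_fit = team, fit
    return best_team, best_fit


# Hypothetical toy data: research-group codes mapped to membership degrees.
application = {"4605": 0.5, "4203": 0.5}
reviewers = {
    "R1": {"4605": 1.0},
    "R2": {"4605": 0.9, "4609": 0.1},
    "R3": {"4203": 1.0},
    "R4": {"4609": 1.0},
}
team, fit = brute_force(application, reviewers, c=2)
n_teams = comb(len(reviewers), 2)  # number of teams examined
```

For 4 candidates and teams of 2 only 6 teams are examined, but `comb(1000, 3)` already exceeds 166 million, which illustrates why exhaustive search quickly becomes impractical as the pool grows.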
Brute force on a truncated set of candidates. In practice, candidates with a low level of similarity are unlikely to be assigned as reviewers. Therefore, a rational step is to ignore potential reviewers with very low similarity. By rejecting candidates whose similarity to the application is below a threshold, for example, 0.1 or 0.2, the search time can be significantly reduced. The number of operations is still proportional to the binomial coefficient, but on a much smaller set of reviewers of size $n \cdot p(r \ge truncation\_level)$, where $p(r \ge truncation\_level)$ is the probability that a reviewer $r$ has a similarity level with the application of at least $truncation\_level$. The more we thin out the initial list of candidates, the shorter the optimization will be, but the risk of deviating far from the optimum increases.

Pure greedy algorithm. The reviewers are assigned iteratively to ensure at each step the maximum suitability of the current fragment of the team to the application. The algorithm is performed in $c$ iterations. At each iteration, one new member is added to the team of reviewers, namely the one who at this iteration maximizes the level of combined suitability of the current composition with the application. In the first iteration, we find the candidate with the highest similarity to the application. In the second iteration, we choose the candidate who, together with the already selected member of the team, has the highest suitability level to the application. The number of operations with this approach is significantly reduced and is proportional to $n \cdot c$, but the solution may turn out to be suboptimal.

Greedy algorithm with elitism. The candidate with the highest value of suitability to the application is added first.
At the same time, the level of combined suitability of the updated reviewer team to the application is not taken into account. The other reviewers are assigned according to the pure greedy algorithm, that is, at each iteration the candidate who maximizes the team's suitability level to the application is assigned. The greedy algorithm with elitism significantly shortens the duration of the optimization, but the number of operations is still proportional to $n \cdot (c - 1)$.

Assignment in isolation. The easiest way to assign reviewers is to choose those who are the most similar to the application. The combined suitability of the team is not taken into account. It is assumed that the more closely each of the candidates corresponds to the application, the better the team will be. Roughly speaking, the combined suitability level of the team is considered to be the sum of the similarity levels of its members. Algorithmically, assignment in isolation is implemented by sorting the candidates in descending order of similarity to the application and selecting the first $c$ candidates. The number of operations is proportional to $n + c$ in the best case. This is a very fast algorithm, but with a small chance of reaching the optimum.

DATASET FOR ASSIGNING EXTERNAL REVIEWERS

For experiments on the assignment of external reviewers, a dataset of PhD theses was collected [16]. For this, the information system of the Ukrainian National Agency for Higher Education Quality Assurance was used. The collected theses belong to various research fields (Fig. 6), with a predominance of Information Technologies.

Fig. 6. PhD-theses distribution over research fields

EXPERIMENTS ON ASSIGNING EXTERNAL REVIEWERS

Experiments on external reviewers' assignment are conducted on the collected dataset of theses.
First, a thesis's keywords are categorized by the keyword categorization algorithm within the research groups of ANZSRC 2020. Next, in a similar way, the keywords of the articles of the committee members are categorized. Pairs of keywords are combined into additional queries only within one article. For each committee, the external reviewers are removed and new ones are assigned from other committees to maximize combined suitability. After removing the external reviewers, we get a set of committee fragments, each containing the chairman and one or two internal reviewers. The task is to find external reviewers whose addition to a committee fragment ensures the maximum combined suitability level to the topic of the thesis. The results of the reviewers' assignment are compared with the version of the committee formed by the institution. The effect is estimated by the average relative change in the suitability level of committees:

$$E(F^{new}, F^{current}) = \frac{\sum_{i=\overline{1,N}} \left( F_i^{new} - F_i^{current} \right)}{\sum_{i=\overline{1,N}} F_i^{current}} \cdot 100\%,$$

where $N$ denotes the number of theses; $F_i^{new}$ denotes the suitability level of the committee for the $i$-th thesis after optimization, $i = \overline{1, N}$; $F_i^{current}$ denotes the suitability level of the committee for the $i$-th thesis before optimization, $i = \overline{1, N}$.

Fig. 7 presents the results of optimization using various assignment algorithms. Most of the committees from institutions have a suitability level above 0.2. The interquartile range is approximately [0.4; 0.8]. With brute force there is a significant improvement in the suitability levels for the majority of committees. Some committees are not improved, or the improvement is small. This is due primarily to the fact that the distribution of theses by fields in the dataset is uneven and the dataset is relatively small. In almost all cases, committees from institutions have a lower suitability level to the thesis than those found by any assignment algorithm. When committees are created manually with limited opportunities for choosing members, we get an average level of suitability to the thesis. On the other hand, with automatic assignment of committee members and a sufficiently large pool of candidates, we get a significant improvement of the committees only by changing the external reviewers.

Fig. 7. Distribution of committees' suitability level depending on the algorithm used

Fig. 8 compares the suitability levels of the committees found by brute force with those found by other algorithms, including brute force on a truncated set of candidates. Brute force on a truncated set of candidates with similarity threshold 0.1 performs almost identically to regular brute force, but the optimization time is reduced (Fig. 9). Brute force on a truncated set of candidates with similarity thresholds 0.2 and 0.3 performs very similarly to regular brute force, but there are a few suboptimal committees in both cases. Committees found by the pure greedy algorithm are also suboptimal. Its performance is very close to brute force with threshold 0.2 and somewhat better than brute force with threshold 0.3, but the optimization time is significantly shorter (Fig. 9). The greedy algorithm with elitism performs slightly worse than the pure greedy algorithm, with slightly more suboptimal committees, but it is close to brute force with threshold 0.3 while the optimization time is reduced (Fig. 9). Under assignment in isolation, most of the committees are suboptimal, but it is the fastest among the algorithms (Fig. 9). This is because a high similarity of each candidate to a thesis does not mean that the team formed by assignment in isolation covers the entire research-group distribution of the thesis.
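The coverage effect described above can be reproduced on a toy example. The sketch below compares assignment in isolation with the pure greedy algorithm under the same simplifying assumptions as before, which are not the paper's exact procedure: only the direct-similarity addend of formula (1) is used, the team profile is a plain average of its members, and all profiles and reviewer names are hypothetical.

```python
def direct_fit(x, y):
    """Direct-similarity addend of formula (1); kinship addend F is omitted."""
    return sum(min(x.get(g, 0.0), y.get(g, 0.0)) for g in set(x) | set(y))


def aggregate(profiles):
    """Assumed stand-in for Agg(S): plain average of the members' distributions."""
    groups = set().union(*profiles)
    return {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles) for g in groups}


def team_fit(application, reviewers, team):
    return direct_fit(application, aggregate([reviewers[r] for r in team]))


def assign_in_isolation(application, reviewers, c):
    """Rank candidates by individual similarity and take the first c."""
    ranked = sorted(reviewers,
                    key=lambda r: direct_fit(application, reviewers[r]),
                    reverse=True)
    return ranked[:c]


def pure_greedy(application, reviewers, c):
    """At each iteration add the candidate that maximizes the team's fit."""
    team = []
    for _ in range(c):
        candidates = [r for r in reviewers if r not in team]
        best = max(candidates,
                   key=lambda r: team_fit(application, reviewers, team + [r]))
        team.append(best)
    return team


# Hypothetical thesis covering two research groups and four candidates.
application = {"4605": 0.6, "4203": 0.4}
reviewers = {
    "R1": {"4605": 1.0},
    "R2": {"4605": 0.55, "4609": 0.45},
    "R3": {"4203": 1.0},
    "R4": {"4609": 1.0},
}
isolated = assign_in_isolation(application, reviewers, 2)
greedy = pure_greedy(application, reviewers, 2)
fit_isolated = team_fit(application, reviewers, isolated)
fit_greedy = team_fit(application, reviewers, greedy)
```

In this toy case assignment in isolation selects R1 and R2, who both cover only group 4605, whereas the greedy team of R1 and R3 also covers group 4203 and therefore reaches a higher combined fit.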
Fig. 8. Comparison of committees found by brute force with the committees found by faster algorithms

Fig. 9 compares the results of committees’ assignments according to various optimization algorithms. Brute force on the truncated set of candidates with the similarity threshold of 0.3 is clearly unsuccessful. All the other algorithms form a Pareto set. Therefore, the choice of algorithm depends on the priority: a quick result or a high-quality one. Fig. 9 shows that the level of change grows slowly when switching from the pure greedy algorithm to the brute force algorithms, while the optimization time increases significantly. Therefore, the pure greedy assignment algorithm can be considered the most balanced. An alternative to it is brute force on a truncated set of candidates with the similarity threshold in the vicinity of 0.25. These conclusions are based on experiments on a small dataset. With large real-world databases, the optimization time of the brute force algorithms can increase drastically.

AN EXAMPLE OF ASSIGNING A COMMITTEE

Let’s consider an example of assigning a committee for the following thesis: “Models and methods of data processing of the system of remote monitoring of the condition of patients with diabetes”. The thesis identifier in the National Agency for Higher Education Quality Assurance is 4756.

The thesis’s keywords are: edge devices; IoT; diagnostics; diseases; intelligent data analysis; information technologies; medical information systems; modeling; monitoring; data processing; patient; forecasting; software component model; system design; diabetes.
After categorizing these keywords, we get the following result:
4605 Data Management and Data Science: 0.382;
4606 Distributed Computing and Systems Software: 0.255;
4609 Information Systems: 0.205;
4203 Health Services and Systems: 0.158.

The thesis is represented by the following vector:

$$A_t = \left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right).$$

Fig. 9. Comparison of assignment algorithms according to the “duration–quality” criteria

In the National Agency for Higher Education Quality Assurance, the research topics of each committee member are represented by the keywords of 3 or 4 of his/her papers. To categorize them, the bag-of-keywords principle is applied. A member is categorized as follows: 1) for each paper, paired combinations of its keywords are created; 2) the resulting sets of keywords of the different papers are combined into one bag; 3) the resulting bag of keywords is categorized according to the algorithm [13]. The result of the committee categorization is as follows.

Research groups of the chairman are:
4609 Information Systems: 0.381;
4203 Health Services and Systems: 0.225;
4606 Distributed Computing and Systems Software: 0.214;
4601 Applied Computing: 0.180.

Suitability level of the chairman is:

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.381}{4609}, \frac{0.225}{4203}, \frac{0.214}{4606}, \frac{0.180}{4601}\right)\right) = 0.577.$$

Research groups of the first internal reviewer are:
4606 Distributed Computing and Systems Software: 0.337;
4605 Data Management and Data Science: 0.256;
4003 Biomedical Engineering: 0.244;
3208 Medical Physiology: 0.162.

Suitability level of the first internal reviewer is:

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.337}{4606}, \frac{0.256}{4605}, \frac{0.244}{4003}, \frac{0.162}{3208}\right)\right) = 0.564.$$
Research groups of the second internal reviewer are:
4606 Distributed Computing and Systems Software: 0.426;
4605 Data Management and Data Science: 0.299;
4003 Biomedical Engineering: 0.138;
4604 Cybersecurity and Privacy: 0.135.

Suitability level of the second internal reviewer is:

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.426}{4606}, \frac{0.299}{4605}, \frac{0.138}{4003}, \frac{0.135}{4604}\right)\right) = 0.521.$$

Research groups of the first external reviewer are:
3201 Cardiovascular Medicine and Haematology: 0.387;
3203 Dentistry: 0.215;
4605 Data Management and Data Science: 0.205;
4602 Artificial Intelligence: 0.192.

Suitability level of the first external reviewer is:

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.387}{3201}, \frac{0.215}{3203}, \frac{0.205}{4605}, \frac{0.192}{4602}\right)\right) = 0.239.$$

Research groups of the second external reviewer are:
4602 Artificial Intelligence: 0.435;
4611 Machine Learning: 0.357;
4605 Data Management and Data Science: 0.208.

Suitability level of the second external reviewer is:

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.435}{4602}, \frac{0.357}{4611}, \frac{0.208}{4605}\right)\right) = 0.227.$$

The result of the committee aggregation is as follows:

$$Agg\left(\begin{array}{l}
\left(\frac{0.381}{4609}, \frac{0.225}{4203}, \frac{0.214}{4606}, \frac{0.180}{4601}\right) \\
\left(\frac{0.337}{4606}, \frac{0.256}{4605}, \frac{0.244}{4003}, \frac{0.162}{3208}\right) \\
\left(\frac{0.426}{4606}, \frac{0.299}{4605}, \frac{0.138}{4003}, \frac{0.135}{4604}\right) \\
\left(\frac{0.387}{3201}, \frac{0.215}{3203}, \frac{0.205}{4605}, \frac{0.192}{4602}\right) \\
\left(\frac{0.435}{4602}, \frac{0.357}{4611}, \frac{0.208}{4605}\right)
\end{array}\right) = \left(\frac{0.389}{4606}, \frac{0.374}{4605}, \frac{0.236}{4602}\right).$$

The combined suitability level of the committee to the thesis is

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.389}{4606}, \frac{0.374}{4605}, \frac{0.236}{4602}\right)\right) = 0.631.$$
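The first two steps of the member categorization described in this example (paired keyword combinations within each paper, then one merged bag) can be sketched in Python. This is a minimal sketch under stated assumptions: the names `paper_queries` and `member_bag` and the sample keywords are hypothetical, single keywords are assumed to stay in the bag alongside the pairs, and step 3, the categorization by the algorithm of [13], is not reproduced here.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple

Query = Tuple[str, ...]  # a single keyword or a pair of keywords


def paper_queries(keywords: Sequence[str], include_singles: bool = True) -> List[Query]:
    """Step 1: build paired combinations of the keywords of ONE paper."""
    unique = list(dict.fromkeys(keywords))  # drop repeats, keep original order
    singles: List[Query] = [(k,) for k in unique] if include_singles else []
    pairs: List[Query] = list(combinations(unique, 2))
    return singles + pairs


def member_bag(papers: Iterable[Sequence[str]]) -> List[Query]:
    """Step 2: merge the per-paper query sets of one member into one bag."""
    bag: List[Query] = []
    for keywords in papers:
        bag.extend(paper_queries(keywords))
    return bag


# Hypothetical keywords of two papers by one committee member.
papers = [["iot", "monitoring", "diabetes"], ["data processing", "iot"]]
bag = member_bag(papers)
print(len(bag))  # 9: (3 singles + 3 pairs) + (2 singles + 1 pair)
```

Pairs are formed only inside a paper, so keywords from different papers, such as "diabetes" and "data processing" here, never end up in the same query.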
The combined suitability level of 0.631 is relatively good, mainly due to the strong overlap in two of the four research groups.

Let’s try to choose the best external reviewers to increase the combined suitability level. The members of all other committees of the dataset are used as candidates. As a result of brute force, two new external reviewers are found. Their profiles are as follows: $\left(\frac{0.274}{3210}, \frac{0.261}{4203}, \frac{0.251}{4202}, \frac{0.214}{3205}\right)$ with suitability level 0.158, and $\left(\frac{0.555}{4605}, \frac{0.306}{4611}, \frac{0.139}{4609}\right)$ with suitability level 0.542. After aggregating all members of the new committee, we get the following categorization:

$$Agg\left(\begin{array}{l}
\left(\frac{0.281}{4611}, \frac{0.269}{4605}, \frac{0.228}{4602}, \frac{0.222}{4608}\right) \\
\left(\frac{0.556}{4612}, \frac{0.302}{4602}, \frac{0.142}{4007}\right) \\
\left(\frac{0.457}{4611}, \frac{0.196}{4603}, \frac{0.183}{4605}, \frac{0.163}{4609}\right) \\
\left(\frac{0.274}{3210}, \frac{0.261}{4203}, \frac{0.251}{4202}, \frac{0.214}{3205}\right) \\
\left(\frac{0.555}{4605}, \frac{0.306}{4611}, \frac{0.139}{4609}\right)
\end{array}\right) = \left(\frac{0.361}{4605}, \frac{0.335}{4606}, \frac{0.156}{4609}, \frac{0.148}{4203}\right).$$

The combined suitability level of the new committee to the thesis is

$$Fit\left(\left(\frac{0.382}{4605}, \frac{0.255}{4606}, \frac{0.205}{4609}, \frac{0.158}{4203}\right), \left(\frac{0.361}{4605}, \frac{0.335}{4606}, \frac{0.156}{4609}, \frac{0.148}{4203}\right)\right) = 0.923.$$

Compared with the initial committee, the suitability level improves significantly: the new committee has the same research groups as the thesis. The improvement is about 46%.

The example shows that although the individual similarity of a single committee member may be mediocre, the overall suitability level of the committee may turn out to be high.
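As a check on the improvement quoted in this example, the relative change for this single thesis follows directly from the two combined suitability levels reported for the initial and the new committee:

```latex
$$\frac{0.923 - 0.631}{0.631} \cdot 100\% = \frac{0.292}{0.631} \cdot 100\% \approx 46.3\%.$$
```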
The overall suitability turns out to be high because the new external reviewers cover the so-called minor part of the thesis topic, which is outside the field of expertise of the other committee members. This is clearly visible in Fig. 10, which shows the difference between the distributions of the thesis, the institution’s committee, and the proposed committee. The thesis and the proposed committee intersect in all their research groups. The institution’s committee lacks the research groups 4609 Information Systems and 4203 Health Services and Systems, which makes it less similar to the thesis’s research field.

Fig. 10. Comparison of initial committee and proposed committee

CONCLUSIONS

The paper proposes an express method of assigning the external reviewers for a PhD defense committee. At the first stage of assignment, the application and potential reviewers are categorized by presenting their profiles as vectors in the space of research groups from ANZSRC 2020. At the second stage, the suitability levels of potential reviewers to the application topic are calculated, taking into account the kinship of research groups. At the third stage, a team of reviewers is assigned that corresponds to the topic of the application to the maximum possible extent. To implement the third stage, various optimization algorithms are proposed: brute force, brute force on a truncated set of candidates, the greedy algorithm without elitism and with elitism, and assignment in isolation. Experiments on the dataset of 67 PhD theses showed that the best balance between assignment quality and team search duration is provided by the greedy algorithm without elitism and by brute force on a truncated set of candidates. As a result of the optimization, the combined quality of committees improved by an average of 13–34% over the whole dataset, depending on the type of algorithm used. Brute force on the truncated set of candidates with the similarity threshold of 0.3 is clearly unsuccessful. All the other algorithms form a Pareto set. Therefore, the choice of algorithm depends on the priority: a quick result or a high-quality one.

The proposed method can be used to improve the efficiency of managing the processes of assigning reviewer teams in various fields, for example, for the evaluation of grant applications. The method can also be used for auditing, to quickly check the correctness of the assigned committees, with a subsequent thorough, resource-intensive examination of suspicious cases.

Further research may include: studying whether Large Language Models are a better choice for modeling the keyword representation than the proposed method; using the proposed method of express assignment in more time-consuming and iterative procedures for assigning a team of reviewers, when it is necessary to take into account not only the relevance of the topic of the application, but also the absence of a conflict of interests, the balance of the load on the reviewers, and other possible limitations. It is advisable to take into account not only the topical relevance of the reviewers to the application, but also the qualification level of the experts during the assignment.

Acknowledgment. The authors are grateful to Digital Science & Research Solutions Inc. for the provision of access to Dimensions as part of the DIM-371 project.

REFERENCES

1. F. Wang, N. Shi, B. Chen, “A comprehensive survey of the reviewer assignment problem,” International Journal of Information Technology and Decision Making, 9(4), pp. 645–668, 2010. doi: https://doi.org/10.1142/S0219622010003993
2. M. Aksoy, S. Yanik, M.F. Amasyali, “Reviewer assignment problem: A systematic review of the literature,” Journal of Artificial Intelligence Research, vol.
76, 2023. doi: https://doi.org/10.1613/JAIR.1.14318
3. S. Tan, Z. Duan, S. Zhao, J. Chen, Y. Zhang, “Improved reviewer assignment based on both word and semantic features,” Information Retrieval Journal, 24(3), pp. 175–204, 2021. doi: https://doi.org/10.1007/s10791-021-09390-8
4. D. Yarowsky, R. Florian, “Taking the load off the conference chairs: Towards a digital paper-routing assistant,” Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, EMNLP 1999, pp. 220–230.
5. M. Karimzadehgan, C.X. Zhai, G. Belford, “Multi-aspect expertise matching for review assignment,” Proceedings of International Conference on Information and Knowledge Management, pp. 1113–1122, 2008. doi: https://doi.org/10.1145/1458082.1458230
6. M. Mirzaei, J. Sander, E. Stroulia, “Multi-aspect review-team assignment using latent research areas,” Information Processing and Management, 56(3), pp. 858–878, 2019. doi: https://doi.org/10.1016/j.ipm.2019.01.007
7. E. Ekinci, S.I. Omurca, “NET-LDA: A novel topic modeling method based on semantic document similarity,” Turkish Journal of Electrical Engineering and Computer Sciences, 28(4), pp. 2244–2260, 2020. doi: https://doi.org/10.3906/ELK-1912-62
8. O. Anjum, H. Gong, S. Bhat, J. Xiong, W.M. Hwu, “Pare: A paper-reviewer matching approach using a common topic space,” EMNLP-IJCNLP 2019 – 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 518–528. doi: https://doi.org/10.18653/v1/d19-1049
9. C. Sun, K.T.J. Ng, P. Henville, R. Marchant, “Hierarchical word mover distance for collaboration recommender system,” Communications in Computer and Information Science, vol. 996, pp. 289–302. Springer Verlag, 2019. doi: https://doi.org/10.1007/978-981-13-6661-1_23
10. X. Kong, H. Jiang, Z. Yang, Z. Xu, F. Xia, A.
Tolba, “Exploiting publication contents and collaboration networks for collaborator recommendation,” PLOS One, 11(2): e0148492, 2016. doi: https://doi.org/10.1371/journal.pone.0148492
11. B. Bhaisare, R. Bharati, “Advancing Peer Review Integrity: Automated Reviewer Assignment Techniques with a Focus on Deep Learning Applications,” in A.K. Bairwa, V. Tiwari, S.K. Vishwakarma, M. Tuba, T. Ganokratanaa, (eds) Computation of Artificial Intelligence and Machine Learning. ICCAIML 2024. Communications in Computer and Information Science, vol. 2184. Springer, Cham, 2024. doi: https://doi.org/10.1007/978-3-031-71481-8_25
12. Y. Zhao, J. Tang, Z. Du, “EFCNN: A restricted convolutional neural network for expert finding,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11440 LNAI, pp. 96–107. Springer Verlag, 2019. doi: https://doi.org/10.1007/978-3-030-16145-3_8
13. S. Shtovba, M. Petrychko, “Topic modeling of researchers based on their interests from Google Scholar,” System Research and Information Technologies, no. 2, pp. 113–129, 2021. doi: https://doi.org/10.20535/SRIT.2308-8893.2021.2.09
14. S. Shtovba, M. Petrychko, “Jaccard index-based assessing the similarity of research fields in dimensions,” CEUR Workshop Proceedings, vol. 2533, pp. 117–128, 2019.
15. S. Shtovba, M. Petrychko, O. Shtovba, “Similarity metric of categorical distributions for topic modeling problems with akin categories,” CEUR Workshop Proceedings, vol. 3392 “The Sixth International Workshop on Computer Modeling and Intelligent Systems”, pp. 76–85, 2023. doi: https://doi.org/10.32782/cmis/3392-7
16. M. Petrychko, S. Shtovba, “Dataset for PhD theses reviewers’ assignments,” ResearchGate, 2024.
doi: http://dx.doi.org/10.13140/RG.2.2.23147.35362

Received 25.11.2024

INFORMATION ON THE ARTICLE

Serhiy D. Shtovba, ORCID: 0000-0003-1302-4899, Vasyl’ Stus Donetsk National University, Vinnytsia National Technical University, Ukraine, e-mail: s.shtovba@donnu.edu.ua
Mykola V. Petrychko, ORCID: 0000-0001-6836-7843, Vinnytsia National Technical University, Ukraine, e-mail: mpetrychko@vntu.edu.ua

ALGORITHMS FOR ASSIGNMENT OF EXTERNAL REVIEWERS FOR PHD-THESIS DEFENSE / S.D. Shtovba, M.V. Petrychko

Abstract. An approach to assigning external reviewers is proposed. It takes into account only the semantic similarity between applications and reviewers; the similarity indices are assessed, and the necessary number of reviewers is assigned so as to ensure the maximum level of suitability of the reviewers to the application according to some criteria. A comparative analysis of various optimization algorithms is performed using the criterion “assignment quality–optimization duration”. Experiments on a test dataset showed that an acceptable balance between the “assignment quality” and “optimization duration” criteria for the assignment of external reviewers is provided by the greedy algorithm without elitism and by brute force on a truncated set of candidates. Applying the proposed algorithms improves the quality of PhD committees by 13–34% on average over the entire dataset, depending on the type of algorithm used.

Keywords: external reviewers, reviewer assignment problem, categorization, optimization, brute force, greedy algorithm, assignment in isolation, PhD thesis, Dimensions, ANZSRC 2020, research group.
id journaliasakpiua-article-351442
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2026-02-08T08:06:15Z
publishDate 2025
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/4f/007794c905aef6b528107864ab93904f.pdf
spelling journaliasakpiua-article-351442 2026-02-02T20:49:24Z Algorithms for assignment of external reviewers for PhD-thesis defense Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій Shtovba, Serhiy Petrychko, Mykola external reviewers reviewer assignment problem categorization optimization brute force algorithm greedy algorithm assignment in isolation PhD-thesis Dimensions ANZSRC 2020 research group зовнішні рецензенти задача призначення рецензентів категоризація оптимізація повний перебір жадібний алгоритм ізольоване призначення PhD-дисертація ANZSRC 2020 галузь досліджень виміри We propose an approach to assigning external reviewers. In the proposed approach, only the semantic similarity between applications and reviewers is taken into account; the similarity indices are assessed, and the necessary number of reviewers is assigned to ensure the maximum suitability level of the reviewers with the application, according to some criteria. We also perform a comparative analysis of various optimization algorithms using the criterion of “assignment quality–optimization time”. Experiments on the dataset showed that a reasonable balance between the “assignment quality” and “optimization time” criteria for the assignment of external reviewers can be achieved using a greedy algorithm without elitism or brute-force search on a truncated set of candidates. An application of the proposed algorithms improves the average quality of PhD committees by 13–34% across the entire dataset, depending on the algorithm used. Запропоновано підхід до призначення зовнішніх рецензентів. У ньому враховується лише семантична схожість між заявками та рецензентами, оцінюються індекси схожості та призначається необхідна кількість таких рецензентів, за яких забезпечується максимальний рівень відповідності рецензентів заявці за деякими критеріями. Виконано порівняльний аналіз різних алгоритмів оптимізації за критерієм «якість призначення – тривалість оптимізації».
Експерименти на тестовому датасеті показали, що прийнятний баланс за критеріями «якість призначення» та «тривалість оптимізації» для призначення зовнішніх рецензентів забезпечує жадібний алгоритм без елітизму та за повного перебору на прорідженій множині кандидатів. Застосування запропонованих алгоритмів покращує якість роботи докторських рад в середньому на 13–34% за усього набору даних, залежно від типу використовуваного алгоритму. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-12-29 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/351442 10.20535/SRIT.2308-8893.2025.4.08 System research and information technologies; No. 4 (2025); 127-145 Системные исследования и информационные технологии; № 4 (2025); 127-145 Системні дослідження та інформаційні технології; № 4 (2025); 127-145 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/351442/338459
spellingShingle зовнішні рецензенти
задача призначення рецензентів
категоризація
оптимізація
повний перебір
жадібний алгоритм
ізольоване призначення
PhD-дисертація
ANZSRC 2020
галузь досліджень
виміри
Shtovba, Serhiy
Petrychko, Mykola
Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій
title Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій
title_alt Algorithms for assignment of external reviewers for PhD-thesis defense
title_full Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій
title_fullStr Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій
title_full_unstemmed Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій
title_short Алгоритми призначення зовнішніх рецензентів для захисту PhD-дисертацій
title_sort алгоритми призначення зовнішніх рецензентів для захисту phd-дисертацій
topic зовнішні рецензенти
задача призначення рецензентів
категоризація
оптимізація
повний перебір
жадібний алгоритм
ізольоване призначення
PhD-дисертація
ANZSRC 2020
галузь досліджень
виміри
topic_facet external reviewers
reviewer assignment problem
categorization
optimization
brute force algorithm
greedy algorithm
assignment in isolation
PhD-thesis
Dimensions
ANZSRC 2020
research group
зовнішні рецензенти
задача призначення рецензентів
категоризація
оптимізація
повний перебір
жадібний алгоритм
ізольоване призначення
PhD-дисертація
ANZSRC 2020
галузь досліджень
виміри
url https://journal.iasa.kpi.ua/article/view/351442
work_keys_str_mv AT shtovbaserhiy algorithmsforassignmentofexternalreviewersforphdthesisdefense
AT petrychkomykola algorithmsforassignmentofexternalreviewersforphdthesisdefense
AT shtovbaserhiy algoritmipriznačennâzovníšníhrecenzentívdlâzahistuphddisertacíj
AT petrychkomykola algoritmipriznačennâzovníšníhrecenzentívdlâzahistuphddisertacíj