Assessing the impact of AI-generated product names on e-commerce performance
This paper studies the impact of Large Language Model (LLM) technology on the e-commerce industry. This work conducts a detailed review of the current implementation level of LLM technologies in the e-commerce industry. Next, it analyzes the approaches to detecting AI-generated text and determines t...
| Date: | 2025 |
|---|---|
| Main Author: | |
| Format: | Article |
| Language: | English |
| Published: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2025
|
| Subjects: | |
| Online Access: | https://journal.iasa.kpi.ua/article/view/330141 |
| Journal Title: | System research and information technologies |
| Download file: | |
Institution
System research and information technologies| _version_ | 1867334451668713472 |
|---|---|
| author | Bratus, Oleksandr |
| author_facet | Bratus, Oleksandr |
| author_institution_txt_mv | [
{
"author": "Oleksandr Bratus",
"institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv"
}
] |
| author_sort | Bratus, Oleksandr |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2025-05-20T17:56:07Z |
| description | This paper studies the impact of Large Language Model (LLM) technology on the e-commerce industry. This work conducts a detailed review of the current implementation level of LLM technologies in the e-commerce industry. Next, it analyzes the approaches to detecting AI-generated text and determines the limitations of their application. The proposed methodology defines the impact of LLM models on the e-commerce industry based on a comparative analysis between indicators of machine-generated texts and e-commerce product metrics. Applying this methodology to real data, one of the most relevant data collected after the release of ChatGPT, the results of statistical analyses show a positive correlation between the studied indicators. It is proved that this dependence is dynamic and changes over time. The obtained implicit indicators measure the influence of LLM technologies on the e-commerce domain. This influence is expected to grow, requiring further research. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2025.1.10 |
| first_indexed | 2025-07-17T10:28:45Z |
| format | Article |
| fulltext |
Publisher IASA at the Igor Sikorsky Kyiv Polytechnic Institute, 2025
138 ISSN 1681–6048 System Research & Information Technologies, 2025, № 1
UDC 004.738.5
DOI: 10.20535/SRIT.2308-8893.2025.1.10
ASSESSING THE IMPACT OF AI-GENERATED PRODUCT
NAMES ON E-COMMERCE PERFORMANCE
O. BRATUS
Abstract. This paper studies the impact of Large Language Model (LLM) technol-
ogy on the e-commerce industry. This work conducts a detailed review of the cur-
rent implementation level of LLM technologies in the e-commerce industry. Next, it
analyzes the approaches to detecting AI-generated text and determines the limita-
tions of their application. The proposed methodology defines the impact of LLM
models on the e-commerce industry based on a comparative analysis between indicators
of machine-generated texts and e-commerce product metrics. When this methodology
is applied to real data, among the most relevant collected after the release of
ChatGPT, the results of statistical analyses show a positive correlation between the
studied indicators. It is shown that this dependence is dynamic and changes over
time. The obtained implicit indicators measure the influence of LLM technologies
on the e-commerce domain. This influence is expected to grow, requiring further
research.
Keywords: large language models, AI-detection, e-commerce, product performance.
INTRODUCTION
Since the release of the first version of ChatGPT on November 30, 2022, LLMs
have become integral across numerous aspects of human activity. The capabilities
of these models to search for information, serve as assistants, and analyze data
have made them widely applicable in various sectors, including business and
industry [1]. Particularly in e-commerce, a field where Natural Language Processing
(NLP) techniques were already well-integrated before the advent of LLMs,
these models have found applications at every stage of interaction among
customers, sellers, and products. The introduction of LLMs has inevitably
transformed e-commerce practices, significantly changing the industry. Given that the
presence of LLMs in a business is not always immediately apparent, the challenge
of assessing their impact on e-commerce is closely tied to the ability to discern
whether textual data was generated by an LLM or not.
Perplexity per token is a key metric for assessing the predictive power of
language models, including prominent transformer models like BERT and GPT-4,
among other LLMs. This metric has been crucial for comparing different language
models on the same dataset and fine-tuning hyperparameters, though it is
sensitive to linguistic characteristics and sentence length [2]. Despite its central
role in developing language models, perplexity has limitations. Notably, it does
not reliably characterize speech recognition performance and may not effectively
indicate overfitting and generalization capabilities [3; 4]. This has led to
questioning the merit of solely focusing on perplexity optimization.
Furthermore, while perplexity is a common baseline for differentiating between
machine-generated and human-generated text, it is often inadequate when
used alone, leading to a shift away from methods solely reliant on statistical
signatures. Instead of relying solely on raw perplexity scores, a more nuanced
approach involves comparing the perplexity measurement with cross-perplexity [5].
This method assesses how unexpected one model’s next token predictions are to
another, providing a more distinct separation between machine and human text
than perplexity alone.
Thus, to investigate the impact of LLM technology on e-commerce, the following
research questions are formulated:
RQ1: Do text perplexity-based statistical indicators and e-commerce product
metrics correlate?
RQ2: Does the relationship between text perplexity-based statistical indicators
and e-commerce product metrics evolve over time?
This research contributes to the understanding of LLMs’ influence on
e-commerce. The key contributions are as follows:
1. To the best of the author’s knowledge, this study is among the first to assess
the impact of LLM models on e-commerce; it introduces a new approach and
applies it to real-world data.
2. This paper explores the relationship between text perplexity-based statistical
indicators and product metrics and finds a positive correlation that, as verified
by statistical techniques, appears to change over time.
The structure of this paper is organized as follows: Section 2 reviews related
work, Section 3 describes the methodology, Section 4 details the experiments
and results, and Section 5 concludes the paper and proposes directions for future
research.
RELATED WORK
LLM in NLP. Recent advancements in NLP have been significantly shaped by
LLMs like GPT-2, GPT-3, and BERT, which have established new benchmarks
in various NLP tasks due to their ability to produce coherent and human-like text
[6; 7]. These models have demonstrated their effectiveness beyond benchmarks
and have been successfully utilized in real-world applications such as automated
customer support, conversational systems, and text summarization [8; 9].
More recently, advanced LLMs, including GPT-4 [10], Gemini [11], and
Llama 2 [12], have shown remarkable proficiency in natural language processing
tasks [1], information retrieval [13], and various other domains [14; 15].
NLP in e-commerce. NLP techniques have been extensively utilized in
e-commerce for various tasks, including sentiment analysis, recommendation systems,
and search engine optimization [16; 17]. Previous research has investigated
using NLP to extract product attributes, create stylistic variations of product
descriptions, and generate multilingual descriptions [18; 19]. Although these methods
show promise, they have yet to achieve the scalability needed to produce
high-quality, human-like results. While NLP applications in business settings are
not a novel concept, there has been limited exploration into their tangible effects
on revenue and customer engagement.
LLM in e-commerce. The integration of LLM technology into e-commerce
has not only surpassed existing NLP solutions but has also been instrumental in
addressing a broader range of challenges. Key applications of LLMs in this domain
(Fig. 1) include advanced customer support, content generation (such as
product descriptions, blog posts, comments, and reviews), content evaluation
(including ratings and sensitivity analysis of user feedback), recommendation
systems, and search engines [20].
One notable trend is the fine-tuning of state-of-the-art LLMs for specific
e-commerce tasks. For instance, LLMs created for automating product description
generation enhance click-through rates and significantly reduce the manual effort
required in content creation [21]. Similarly, employing LLMs for analyzing product
reviews offers substantial benefits to e-commerce stakeholders (such as
owners, managers, marketers, and data analysts) by providing quicker responses
to customer feedback, thereby improving the overall effectiveness of
e-commerce strategies [22]. In search engine optimization, LLMs are utilized for
keyword selection and content enhancement [23].
Additionally, there is a growing trend towards developing families of LLM
models tailored specifically for e-commerce applications. These models are not
designed to be generalists across multiple domains but are specialized and optimized
for e-commerce tasks, training exclusively on relevant data and targeting
e-commerce metrics [24; 25]. Given the widespread adoption of LLMs in the
e-commerce sector, exploring how this technology impacts the industry is crucial.
AI-generated text detection. Early efforts to detect machine-generated text
have shown potential, particularly with models whose outputs are not convincingly
human-like. However, the advent of transformer models for language
generation [6; 7; 12; 26] has rendered many of these basic detection mechanisms
ineffective. One strategy is to record [27] or watermark all generated text [28], but
such preemptive measures require complete control over the generative models.
In response to the growing prevalence of machine-generated text, primarily
through platforms like ChatGPT, a wave of research has focused on post-hoc
detection methods. These approaches do not rely on cooperation from model
developers. Detection methods can be broadly categorized into two types. The first
involves training detection models, where a pre-trained language model is fine-tuned
for the binary classification task of detecting machine-generated text [29–31].
Techniques such as adversarial training [32] and abstention [33] are also employed.
Alternatively, instead of fine-tuning the entire model, a linear classifier
can be applied to fixed learned features, allowing for the integration of commercial
API outputs [34].
The second category includes methods based on statistical signatures characteristic
of machine-generated text. These approaches typically require little or no
training data and can be easily adapted to new model families [35]. Examples include
detectors based on perplexity [33; 36; 37], perplexity curvature [38], log-rank [39],
intrinsic dimensionality of generated text [40], and n-gram analysis [41]. While this
overview is not exhaustive, recent surveys provide further details [42–45].
Fig. 1. Applications of LLMs in e-commerce: content generation, content evaluation, recommender system, search engine, and advanced customer support
From a theoretical standpoint, the main limitation of detection is that fully
general-purpose language models, by definition, would be impossible to detect
[46–48]. However, even models approaching this ideal may still be detectable
with a sufficient number of samples [49]. In practice, the relative success of
detection methods, including those proposed and analyzed in this work, provides
evidence that current language models are still imperfect representations of human
writing and, thus, detectable.
RESEARCH METHODOLOGY
The proposed methodology employs a specialized approach that examines the
statistical properties of texts, particularly those that indicate the extent to which a
text is machine-generated, and compares these with product metrics. The goal is
to identify potential relationships between the two characteristics. This methodology
is structured into three distinct stages (Fig. 2): 1) calculating the text
characteristics that indicate whether a text is machine-generated; 2) assessing the
e-commerce product metrics; 3) conducting a statistical analysis to determine any
significant correlations.
AI-generated text detection. As described in the Related Work section, one of the
approaches to detecting machine-generated text involves calculating specific
statistical indicators of the texts and comparing them to predefined threshold values.
In this paper, two conditions constrain the choice of a model for detecting
machine-generated text. First, there is no training dataset to fine-tune
classifiers for machine-generated text recognition. Second, there is no information
on whether LLM models were used in generating the texts and, if so, which specific
models. Therefore, a detection model that does not rely on training (zero-shot
model) and is agnostic to any LLM model is required. The method called
Binoculars meets these criteria and utilizes the binoculars score, which calculates
the ratio of perplexity to cross-perplexity [5]:
$$B_{M_1,M_2}(s) = \frac{\log \mathrm{PPL}_{M_1}(s)}{\log \mathrm{X\text{-}PPL}_{M_1,M_2}(s)},$$
where perplexity, $\log \mathrm{PPL}_{M_1}(s)$, is defined as the average negative log-likelihood
of all tokens in the given sequence $s$, and cross-perplexity, $\log \mathrm{X\text{-}PPL}_{M_1,M_2}(s)$, is
defined as the average per-token cross-entropy between the outputs of two models,
$M_1$ and $M_2$, when operating on the tokenization of the sequence $s$.
Fig. 2. Proposed methodology
In other words, the numerator in this method is the perplexity, which quantifies
how unexpected a string is to model $M_1$. Conversely, the denominator assesses
how unexpected the token predictions of model $M_2$ are when evaluated by
$M_1$. Intuitively, this means that a human is expected to diverge from $M_1$ more
than $M_2$ could, assuming that the LLMs $M_1$ and $M_2$ are more similar to each
other than they are to a human. This approach achieves state-of-the-art accuracy
without requiring any training data. It can detect machine-generated text from
various modern LLMs without needing model-specific adjustments. Therefore,
this work utilizes the Binoculars score as a statistical signature for identifying
machine-generated text.
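The ratio above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names are hypothetical, the logits are assumed to be already aligned so that row i of each matrix is the model's prediction for token i, and the direction of the cross-entropy (the probabilities of $M_1$ weighting the log-probabilities of $M_2$) is an assumption of this sketch.

```python
import math

def log_softmax(logits):
    """Convert one row of raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def binoculars_score(logits_m1, logits_m2, token_ids):
    """Ratio of log-perplexity (under M1) to log cross-perplexity (M1 vs. M2).

    Assumption: row i of each logits matrix is that model's prediction
    for token_ids[i], i.e. the usual one-position shift is already applied.
    """
    log_ppl = 0.0
    log_xppl = 0.0
    for row1, row2, tok in zip(logits_m1, logits_m2, token_ids):
        logp1 = log_softmax(row1)
        logp2 = log_softmax(row2)
        # Negative log-likelihood of the observed token under M1.
        log_ppl -= logp1[tok]
        # Cross-entropy: M1's probabilities weight M2's log-probabilities.
        log_xppl -= sum(math.exp(a) * b for a, b in zip(logp1, logp2))
    n = len(token_ids)
    return (log_ppl / n) / (log_xppl / n)

# Toy usage with a 3-token vocabulary: a token the models expect scores low,
# a token they find surprising scores high.
expected = binoculars_score([[5.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]], [0])
surprising = binoculars_score([[5.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]], [1])
```

In this toy case the expected token yields a lower score than the surprising one, which matches the reading that low scores suggest machine-like text.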
E-commerce product metrics. Using an e-commerce dataset that captures
interactions between customers and products, various metrics can be calculated to
provide valuable insights into product performance and customer behaviour. Metrics
related to sales and revenue include sales volume, revenue, conversion rate,
and profit margin. Another category of metrics focuses on user experience,
encompassing indicators such as product return rate, customer reviews, and ratings.
The scope of product metrics is not confined to these examples; it is instead
determined by the availability of specific features in the dataset that enable the
calculation of particular metrics.
Statistical analysis. The third and final stage of the proposed methodology
is a statistical comparison of machine-generated text characteristics and product
metrics. Spearman’s rank correlation coefficient is used to determine any
relationship. It is a nonparametric measure of rank correlation that assesses how well the
relationship between two features can be described using a monotonic function.
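As an illustration of the coefficient described above, the following sketch computes Spearman's rank correlation as the Pearson correlation of average ranks. It is a plain-Python sketch with hypothetical function names and toy data, not the code used in the study; in practice a statistics library would normally be used.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5

# Toy usage: hypothetical binoculars scores and conversion rates of four products.
rho = spearman([0.71, 0.85, 0.93, 1.02], [0.10, 0.12, 0.30, 0.25])
```

Because only ranks enter the computation, any monotonic relationship (not just a linear one) yields a coefficient of 1 or -1.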
A bootstrap method is used to answer this research’s second question and
determine whether the relationship between the studied metrics has changed. It
estimates the confidence intervals and significance of the difference between two
Spearman coefficients. Bootstrapping provides a flexible and robust way to
handle non-parametric statistics without relying on normality assumptions. The
bootstrap involves repeatedly resampling the data with replacement. For each
bootstrap sample, the Spearman correlations for each of the two datasets are
calculated, and the difference between them is computed. Then, the differences from
all bootstrap samples (1000 samples in this work) are collected to form a
distribution of differences and determine the confidence interval. A 95% confidence
interval is used, which means the 2.5th percentile and the 97.5th percentile of the
bootstrap differences are found. If the 95% confidence interval does not
include zero, this indicates a statistically significant difference between
the two correlation coefficients and, therefore, a statistically significant change in
the relationship between machine-generated text characteristics and product
metrics. Otherwise, if the interval includes zero, no significant difference exists
between the correlations at the chosen confidence level.
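The bootstrap procedure described above can be sketched as follows. This is a minimal stdlib-only illustration under stated assumptions: the function names, the fixed seed, and the nearest-rank percentile rule are choices of this sketch, and each dataset is resampled independently.

```python
import random

def _ranks(values):
    """Average ranks (ties share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (NaN if constant)."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else float("nan")

def bootstrap_spearman_diff(x1, y1, x2, y2, n_boot=1000, seed=0):
    """95% percentile interval for rho(x1, y1) - rho(x2, y2).

    Note: loops until n_boot valid resamples are collected, so inputs
    must not be constant.
    """
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        # Resample each dataset with replacement, keeping (x, y) pairs together.
        idx1 = [rng.randrange(len(x1)) for _ in x1]
        idx2 = [rng.randrange(len(x2)) for _ in x2]
        r1 = spearman([x1[i] for i in idx1], [y1[i] for i in idx1])
        r2 = spearman([x2[i] for i in idx2], [y2[i] for i in idx2])
        if r1 == r1 and r2 == r2:  # skip degenerate resamples (NaN != NaN)
            diffs.append(r1 - r2)
    diffs.sort()
    lower = diffs[round(0.025 * (n_boot - 1))]
    upper = diffs[round(0.975 * (n_boot - 1))]
    return lower, upper
```

If the returned interval excludes zero, the two correlations differ significantly at the 95% level; otherwise no significant difference is detected.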
EXPERIMENTS
Dataset and Preprocessing. One of the challenges in researching the effects of
LLM technology on e-commerce is the scarcity of accessible, complete, and
up-to-date datasets. Given that ChatGPT was only released in November 2022, and
considering the gradual integration of LLMs within the e-commerce sector, it will
take some time to accumulate and publish comprehensive datasets.
MerRec [50], introduced in March 2024, is one of the first datasets that
meets these requirements. It encapsulates detailed records of user interactions on
the Mercari e-commerce platform, tracking millions of users and products over
six months, from May to October 2023. MerRec not only captures basic features
such as user attributes (user_id, sequence_id, session_id) and product attributes
(item_id, product_id) but also includes specialized data like timestamped action
types, product taxonomy, and textual product descriptions, making it a rich
resource for analysis.
This analysis focused on products listed during the dataset’s initial (May)
and final (October) months. Given limited computing resources and to minimize
data skew from outliers or abnormal product behaviour, the data is preprocessed
with specific criteria: only products whose names contain at
least five words and that were purchased at least once are selected.
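The two selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the record layout (a dict with the keys `name` and `purchases`) is hypothetical and does not reflect the actual MerRec schema, and words are assumed to be whitespace-separated.

```python
def select_products(products, min_words=5, min_purchases=1):
    """Keep products whose name has at least min_words words and that
    were purchased at least min_purchases times.

    products: iterable of dicts with hypothetical keys 'name' and 'purchases'.
    """
    return [
        p for p in products
        if len(p["name"].split()) >= min_words and p["purchases"] >= min_purchases
    ]

# Toy usage with made-up records.
sample = [
    {"name": "Tory Burch Black Leather Boots Size 10.5", "purchases": 2},
    {"name": "Old Navy Jacket", "purchases": 3},          # too few words
    {"name": "Nike air max 270 women size 7", "purchases": 0},  # never purchased
]
selected = select_products(sample)
```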
Generalized word shift graphs were utilized to enhance the clarity of product
names analyzed in this research. Such visualizations provide a meaningful and
interpretable summary of how individual words contribute to variations observed
between two distinct text corpora [51].
For instance, the product names in the Women category for October 2023
were analyzed, featuring low (“AI-generated”) and high (“human-generated”)
binoculars scores. Examples of the top 20 product names with the lowest and the
highest binoculars scores are presented in the table below. Names scoring low on the
binoculars scale exhibited higher standardization, including consistent word order,
capitalization of each word, and numerical size descriptors. In contrast, names
with high binoculars scores (likely human-generated) displayed a less structured
word order, lacked punctuation, and used words to describe sizes (e.g., small).
Top 20 product names in the Women category for October 2023 with the lowest
and the highest binoculars scores
Top 20 product names with the lowest binoculars score:
Converse size 7.5 women’s shoes
Keychain Wallet, Wristlet, Bangle, Bracelet, ID Card Holder, Purse, Key Chain, G
Christian Louboutin Women Black Heels Shoes Size 8.5 (38.5)
Vtg Sterling Silver 925 Hinged Bangle Bracelet
Polo Ralph Lauren Women’s V-Neck T-Shirt - Size Medium - Navy Blue
UGG Brookfield Brown Sheepskin Leather Boots Size 8.5
Avatar: The Last Airbender Aang & Katara Mini Backpack
Womens Old Navy Fleece Jacket Size Small
Nike air max 270 women size 7
Old Navy Active Fleece Jacket
Lululemon Long Sleeve - Size 10
Purple Hooded Long Sleeve Sweater
Tory Burch Black Leather Boots Size 10.5
Victoria’s Secret PINK Bling Leggings
The Nightmare Before Christmas Jack Skellington
Nike Air Max 2X (Women)
Super cute and comfy pajamas
Tommy Hilfiger Women’s Medium Red and White Striped Dress
Costume Jewelry Lot - 25 pieces - Necklaces, Bracelets, Earrings
Sebek Zigvolt Acrylic Stud Earrings
Top 20 product names with the highest binoculars score:
womens ugg boots size 9
FIGS rose joggers size Small Petite
J crew midi floral sun Dress
Motel Olivia faux leather biker jacket white
She Darc sweatshirt! Size small
Grae Cove linen short sleeve waist tie pockets mini dress blush women’s XL
Hot topic rob zombie hoodie XS
Famous magic land couples OS leggings
August Silk womens colorful funky patterned Shorts
Kate spade Pitch Purrfect Piano Crossbody KC729 NWT
Beautiful Disaster Tribe Jacket Size L
Express Low rise columnist pants
New sweatshirt hoodie Jeffrey Star
Hades Disneyland Spirit Jersey Small
Save for Rosemary Special love lot
Hot Topic Mushroom Collar dress
Coach Wyn Logo Plaque Small Wallet
NEW bundle Victoria Secret underwear
Cat In a pumpkin earrings
Waffle Debut Retro Sneaker leopard
The examination using the word shift graph (Fig. 3) revealed few significant
differences in word usage between the two groups (the first group contains product
names with binoculars scores from the first quartile (Q1), and the second group
contains product names with binoculars scores from the fourth quartile (Q4)).
However, several subtle distinctions were noted. Primarily, descriptive words for
sizes (e.g., small, medium, xs, xl) were used in “human-generated” names. In
contrast, numerical representations (e.g., 7.5, 8, 8.5) were employed in
“AI-generated” names, enhancing the accuracy and standardization of size
descriptions. Additionally, abbreviations (e.g., sz, nwt) were often included in
“human-generated” names. Thus, the example of the Women’s product category
demonstrates how product names with different binoculars scores differ from
each other.
LLM models. As described in Section 3, the Binoculars method is used as an
AI-generated text detector, which requires two LLM models. Moreover, these
models should provide access to the raw logits of all tokens in the given sequence
to calculate the binoculars score. Unfortunately, most state-of-the-art LLM models
(GPT-4, Claude-3, etc.) do not provide access to such logits. Therefore, open-source
LLM models are considered, and the Falcon-7b and Falcon-7b-instruct models are
chosen; both are pre-trained generative text models with 7 billion parameters that
demonstrate high performance.
Fig. 3. Word shift graph of the product names with the
lowest and highest binoculars scores
The score computation was carried out on the remote resources of Google Colab and
consumed approximately 10 hours of A100 GPU time to generate the scores for nearly
300,000 unique product names.
Evaluation metric. To investigate the impact of LLM on e-commerce
(namely, on product names), and based on the features of the selected MerRec
dataset (unique user actions), the conversion rate is used, as it is one of the central
business metrics in e-commerce that indicate product performance. It is defined
as the ratio of the number of customers who purchased the product
to the total number of customers who interacted with it.
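The definition above can be illustrated with a short sketch that computes the conversion rate per product from a log of user actions. The event layout (tuples of user, product, and action) and the action label `"purchase"` are hypothetical and are not taken from the MerRec schema; each customer is counted once per product.

```python
from collections import defaultdict

def conversion_rates(events, purchase_action="purchase"):
    """events: iterable of (user_id, product_id, action) tuples.

    Returns {product_id: buyers / interacting users}, counting each user once.
    """
    interacted = defaultdict(set)
    purchased = defaultdict(set)
    for user_id, product_id, action in events:
        # Any recorded action, including a purchase, counts as an interaction.
        interacted[product_id].add(user_id)
        if action == purchase_action:
            purchased[product_id].add(user_id)
    return {
        pid: len(purchased[pid]) / len(users)
        for pid, users in interacted.items()
    }

# Toy usage: two users interact with p1 and one of them buys it.
rates = conversion_rates([
    ("u1", "p1", "view"),
    ("u2", "p1", "view"),
    ("u2", "p1", "purchase"),
    ("u3", "p2", "view"),
])
```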
Results. The proposed methodology is evaluated on the real-world
MerRec dataset. Overall, a positive correlation between binoculars score and
conversion rate is found, and its strength differs depending on the product category.
These results are examined in more detail below.
RQ1 Do text perplexity-based statistical indicators and e-commerce
product metrics correlate?
First, the conversion rates are calculated for all products sold in May
2023; then, for the same products, the binoculars scores of their names are
calculated. After that, the Spearman correlation coefficient between these indicators is
calculated, and it is found to differ significantly across product categories
(Fig. 4). For example, for the Men and Kids categories, the correlation is
the highest at 0.54 and 0.53, respectively, which indicates a moderate degree of
correlation. The correlation is somewhat lower, but also significant, for products in
the Women category (0.28). There is a group of categories for which the correlation
is positive but very weak (Sports & outdoors, Pet Supplies, Toys & Collectibles,
Vintage & collectibles). There are also categories for which the correlation
is practically absent; importantly, however, there are no categories for
which the Spearman correlation is negative (except Garden & outdoor).
Fig. 4. Spearman correlation coefficients between binoculars score and conversion rate
of products from the MerRec dataset
It should be noted that the correlation is the largest for the most general groups
(Men, Women, Kids, Office, Electronics), which are characterized by a wide and
diverse range of products, while products that can be attributed to a
specific field of activity (Sports & outdoors, Pet Supplies, Vintage & collectibles,
Handmade, etc.) have very weak or zero correlation. It can be assumed that for
general categories, it is not easy to come up with an original product name that
will distinguish the product from others and interest customers. For
domain-specific categories, by contrast, product names may contain certain
specifications that make them appear original, although such specifications are
typical of the category and in no way distinguish a product from the others.
A similar analysis was conducted for products sold in October 2023. Again,
a positive correlation between binoculars score and conversion rate is observed.
However, for most categories, the correlation decreased; for a few, such as
Other and Garden & outdoor, the correlation became negative, albeit very weak.
Thus, a moderate positive correlation between binoculars score (a text
perplexity-based statistical indicator) and conversion rate (an e-commerce product
metric) is seen. This can be interpreted as follows: a higher probability that the
product name was written by a human (higher binoculars score) correlates with
better product performance.
RQ2 Does the relationship between text perplexity-based statistical indicators
and e-commerce product metrics evolve over time?
A statistical comparison of the correlation coefficients of the data for May
and October is performed. It is found that for most categories, there is a
statistically significant change in correlation (Fig. 5). A boxplot with
whiskers entirely above zero indicates a significant decrease in the correlation
between binoculars score and conversion rate, and a boxplot with whiskers entirely
below zero, on the contrary, indicates an increase in correlation. It can be concluded
that out of all 15 categories, the correlation decreased significantly for seven
categories (including those with the highest correlation coefficients in May) and
increased for only three categories; for the rest of the categories, it remained
unchanged (or the change is statistically insignificant).
Fig. 5. Distribution of Spearman correlation coefficients across different product categories. Boxplots show the median (red line) and the 25th and 75th percentiles, with whiskers ranging from the 2.5th to the 97.5th percentile
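The reading of the whiskers can be written as a small decision rule. The sketch below assumes that the plotted difference is the May correlation minus the October correlation, which is inferred from the statement that whiskers entirely above zero indicate a decrease; the function name and labels are hypothetical.

```python
def classify_change(ci_lower, ci_upper):
    """Interpret a 95% interval for (May correlation - October correlation).

    Assumption: a positive difference means the correlation was higher in May,
    i.e. it decreased by October.
    """
    if ci_lower > 0:
        return "decreased"
    if ci_upper < 0:
        return "increased"
    return "no significant change"

# Toy usage with made-up interval bounds.
labels = [classify_change(0.05, 0.21), classify_change(-0.30, -0.02), classify_change(-0.04, 0.09)]
```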
So, over the six months from May to October, for most products, there is a trend
toward a decrease in the correlation coefficient between binoculars score (text
perplexity-based statistical indicator) and conversion rate (e-commerce product
metric). This may be due to the increased use of LLM technology to generate product
names, although this use is still small.
CONCLUSIONS
In this work, a methodology to determine the impact of AI-generated product
names on e-commerce performance is proposed; namely, the relationship between
the binoculars score of product names and the conversion rate of products is
investigated. The work examines in detail the current level of implementation of LLM
technology in the field of e-commerce, considering a wide range of problems
solved by language models. In addition, the existing state-of-the-art detection
methods for machine-generated texts are described, and one of those methods that
performs zero-shot and model-agnostic detection is used. The proposed approach is
applied to real data for 2023, and a positive correlation between binoculars score
(text perplexity-based statistical indicator) and conversion rate (e-commerce
product metric) is found. This positive correlation tends to decrease, which is
verified statistically. Thus, the impact of LLM technology on e-commerce is
observed, and this impact is expected to increase in the future.
For future work, a semantic analysis comparing product names over time
could be conducted to identify changes in the typical words of product names
triggered by the activity of LLM models.
REFERENCES
1. W.X. Zhao et al., “A survey of large language models,” ArXiv, 2023. doi:
https://doi.org/10.48550/arXiv.2303.18223
2. A. Miaschi, D. Brunato, F. Dell’Orletta, and G. Venturi, “What makes my model
perplexed? A linguistic investigation on neural language models perplexity,” in Pro-
ceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge
Extraction and Integration for Deep Learning Architectures, pp. 40–47, 2021. doi:
10.18653/v1/2021.deelio-1.5
3. D. Klakow, J. Peters, “Testing the correlation of word error rate and perplexity,”
Speech Communication, vol. 38, no. 1–2, pp. 19–28, 2002. doi: 10.1016/S0167-
6393(01)00041-3
4. S.F. Chen, D. Beeferman, and R. Rosenfeld, Evaluation metrics for language mod-
els. 1998. Available: https://kilthub.cmu.edu/articles/Evaluation_Metrics_For_Lan-
guage_Models/6605324/files/12095765.pdf
5. A. Hans et al., “Spotting LLMs with binoculars: Zero-shot detection of machine-
generated text,” ArXiv, 2024. doi: https://doi.org/10.48550/arXiv.2401.12070
6. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language
models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
Available: https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf
7. T.B. Brown et al., “Language models are few-shot learners,” ArXiv, 2020. doi:
https://doi.org/10.48550/arXiv.2005.14165
8. D. Adiwardana et al., “Towards a human-like open-domain chatbot,” ArXiv, 2020.
doi: https://doi.org/10.48550/arXiv.2001.09977
9. P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive
NLP tasks,” Advances in Neural Information Processing Systems, vol. 33,
pp. 9459–9474, 2020. Available: https://proceedings.neurips.cc/paper/2020/hash/
6b493230205f780e1bc26945df7481e5-Abstract.html
10. B. Miranda, A. Lee, S. Sundar, A. Casasola, and S. Koyejo, “Beyond Scale: The
Diversity Coefficient as a Data Quality Metric for Variability in Natural Language
Data,” ArXiv, 2024. doi: https://doi.org/10.48550/arXiv.2306.13840
11. Gemini Team Google, “Gemini: a family of highly capable multimodal models,”
ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2312.11805
12. H. Touvron et al., “Llama 2: Open foundation and fine-tuned chat models,” ArXiv,
2023. doi: https://doi.org/10.48550/arXiv.2307.09288
13. S.E. Spatharioti, D.M. Rothschild, D.G. Goldstein, and J.M. Hofman, “Comparing
traditional and LLM-based search for consumer choice: A randomized experiment,”
ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2307.03744
14. S. Frieder, J. Berner, P. Petersen, and T. Lukasiewicz, “Large language models for
mathematicians,” ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2312.04556
15. F. Zeng, W. Gan, Y. Wang, N. Liu, and P.S. Yu, “Large language models for
robotics: A survey,” ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2311.07226
16. G. Sousa, “Natural Language Processing and its applications in e-business,”
Cadernos de Investigação do Mestrado em Negócio Eletrónico, vol. 2, 2022. doi:
https://doi.org/10.56002/ceos.0070_cimne_1_2
17. Y. Huang, “Research on the Application of Natural Language Processing
Technology in E-commerce,” in ISCTT 2021; 6th International Conference on Information
Science, Computer Technology and Transportation, 2021, pp. 1–5. Available:
https://ieeexplore.ieee.org/abstract/document/9738909
18. M. Chen, Q. Tang, S. Wiseman, and K. Gimpel, “Controllable paraphrase generation
with a syntactic exemplar,” ArXiv, 2019. doi: https://doi.org/10.48550/arXiv.
1906.00565
19. Q. Chen, J. Lin, Y. Zhang, H. Yang, J. Zhou, and J. Tang, “Towards knowledge-
based personalized product description generation in e-commerce,” in Proceedings
of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining, 2019, pp. 3040–3050. doi: 10.1145/3292500.333072
20. Q. Ren et al., “A survey on fairness of large language models in e-commerce:
progress, application, and challenge,” ArXiv, 2024. doi: https://doi.org/10.48550/
arXiv.2405.13025
21. J. Zhou, B. Liu, J.N.A.Y. Hong, K.-C. Lee, and M. Wen, “Leveraging Large
Language Models for Enhanced Product Descriptions in eCommerce,” ArXiv, 2023. doi:
https://doi.org/10.48550/arXiv.2310.18357
22. K.I. Roumeliotis, N.D. Tselikas, and D.K. Nasiopoulos, “LLMs in e-commerce:
a comparative analysis of GPT and LLaMA models in product review evaluation,”
Natural Language Processing Journal, vol. 6, p. 100056, 2024. doi: 10.1016/
j.nlp.2024.100056
23. G. Chodak and K. Błażyczek, “Large Language Models for Search Engine Optimization
in E-commerce,” in International Advanced Computing Conference, pp. 333–344,
2023. doi: 10.1007/978-3-031-56700-1_27
24. B. Peng, X. Ling, Z. Chen, H. Sun, and X. Ning, “eCeLLM: Generalizing Large
Language Models for E-commerce from Large-scale, High-quality Instruction Data,”
ArXiv, 2024. doi: https://doi.org/10.48550/arXiv.2402.08831
25. C. Herold, M. Kozielski, L. Ekimov, P. Petrushkov, P.-Y. Vandenbussche, and
S. Khadivi, “LiLiuM: eBay’s Large Language Models for e-commerce,” ArXiv,
2024. doi: https://doi.org/10.48550/arXiv.2406.12023
26. A. Chowdhery et al., “Palm: Scaling language modeling with pathways,” Journal of
Machine Learning Research, vol. 24, no. 240, pp. 1–113, 2023. Available:
http://jmlr.org/papers/v24/22-1144.html
27. K. Krishna, Y. Song, M. Karpinska, J. Wieting, and M. Iyyer, “Paraphrasing evades
detectors of AI-generated text, but retrieval is an effective defense,” Advances
in Neural Information Processing Systems, vol. 36, 2024. Available:
https://proceedings.neurips.cc/paper_files/paper/2023/hash/575c450013d0e99e4b0ecf82bd1afaa4-Abstract-Conference.html
28. J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark
for large language models,” in International Conference on Machine Learning, 2023,
pp. 17061–17084. Available: https://proceedings.mlr.press/v202/kirchenbauer23a.html
29. I. Solaiman et al., “Release strategies and the social impacts of language models,”
ArXiv, 2019. doi: https://doi.org/10.48550/arXiv.1908.09203
30. R. Zellers et al., “Defending against neural fake news,” Advances in Neural
Information Processing Systems, vol. 32, 2019. Available:
https://proceedings.neurips.cc/paper/2019/hash/3e9f0fc9b2f89e043bc6233994dfcf76-Abstract.html
31. X. Yu et al., “GPT paternity test: GPT generated text detection with GPT genetic
inheritance,” CoRR, 2023. Available: https://arxiv.org/pdf/2305.12519v2
32. X. Hu, P.-Y. Chen, and T.-Y. Ho, “Radar: Robust AI-text detection via adversarial
learning,” Advances in Neural Information Processing Systems, vol. 36, pp. 15077–15095,
2023. Available:
https://proceedings.neurips.cc/paper_files/paper/2023/hash/30e15e5941ae0cdab7ef58cc8d59a4ca-Abstract-Conference.html
33. Y. Tian et al., “Multiscale positive-unlabeled detection of AI-generated texts,”
ArXiv, 2023. Available: https://arxiv.org/pdf/2305.18149
34. V. Verma, E. Fleisig, N. Tomlin, and D. Klein, “Ghostbuster: Detecting text
ghostwritten by large language models,” ArXiv, 2023. doi:
https://doi.org/10.48550/arXiv.2305.15047
35. J. Pu, Z. Huang, Y. Xi, G. Chen, W. Chen, and R. Zhang, “Unraveling the mystery
of artifacts in machine generated text,” in Proceedings of the Thirteenth Language
Resources and Evaluation Conference, pp. 6889–6898, 2022. Available:
https://aclanthology.org/2022.lrec-1.744
36. C. Vasilatos, M. Alam, T. Rahwan, Y. Zaki, and M. Maniatakos, “HowkGPT:
Investigating the detection of ChatGPT-generated university student homework through
context-aware perplexity analysis,” ArXiv, 2023. doi:
https://doi.org/10.48550/arXiv.2305.18226
37. Y. Wang et al., “M4: Multi-generator, multi-domain, and multi-lingual black-box
machine-generated text detection,” ArXiv, 2023. doi: https://doi.org/10.48550/
arXiv.2305.14902
38. E. Mitchell, Y. Lee, A. Khazatsky, C.D. Manning, and C. Finn, “DetectGPT: Zero-
shot machine-generated text detection using probability curvature,” in International
Conference on Machine Learning, pp. 24950–24962, 2023. Available:
https://proceedings.mlr.press/v202/mitchell23a.html
39. J. Su, T.Y. Zhuo, D. Wang, and P. Nakov, “DetectLLM: Leveraging log rank
information for zero-shot detection of machine-generated text,” ArXiv, 2023. doi:
https://doi.org/10.48550/arXiv.2306.05540
40. E. Tulchinskii et al., “Intrinsic dimension estimation for robust detection of AI-generated
texts,” Advances in Neural Information Processing Systems, vol. 36, 2024. Available:
https://proceedings.neurips.cc/paper_files/paper/2023/hash/7baa48bc166aa2013d78cbdc15010530-Abstract-Conference.html
41. X. Yang, W. Cheng, Y. Wu, L. Petzold, W.Y. Wang, and H. Chen, “DNA-GPT:
Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text,”
ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2305.17359
42. S.S. Ghosal, S. Chakraborty, J. Geiping, F. Huang, D. Manocha, and A.S. Bedi,
“Towards possibilities & impossibilities of AI-generated text detection: a survey,”
ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2310.15264
43. R. Tang, Y.-N. Chuang, and X. Hu, “The science of detecting LLM-generated text,”
Communications of the ACM, vol. 67, issue 4, pp. 50–59, 2024. doi: 10.1145/3624725
44. M. Dhaini, W. Poelman, and E. Erdogan, “Detecting ChatGPT: A survey of the state
of detecting ChatGPT-generated text,” ArXiv, 2023. doi: https://doi.org/10.48550/
arXiv.2309.07689
45. B. Guo et al., “How close is ChatGPT to human experts? Comparison corpus,
evaluation, and detection,” ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2301.07597
46. R. Varshney, N.S. Keskar, and R. Socher, “Limits of detecting text generated by
large-scale language models,” in 2020 Information Theory and Applications
Workshop (ITA), 2020, pp. 1–5. doi: 10.1109/ITA50056.2020.9245012
47. H. Helm, C.E. Priebe, and W. Yang, “A Statistical Turing Test for Generative
Models,” ArXiv, 2023. doi: https://doi.org/10.48550/arXiv.2309.08913
48. V.S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, and S. Feizi, “Can AI-
generated text be reliably detected?,” ArXiv, 2023. doi: https://doi.org/10.48550/
arXiv.2303.11156
49. S. Chakraborty, A.S. Bedi, S. Zhu, B. An, D. Manocha, and F. Huang, “On the
possibilities of AI-generated text detection,” ArXiv, 2023. doi:
https://doi.org/10.48550/arXiv.2304.04736
50. L. Li, Z. A. Din, Z. Tan, S. London, T. Chen, and A. Daptardar, “MerRec: A Large-
scale Multipurpose Mercari Dataset for Consumer-to-Consumer Recommendation
Systems,” ArXiv, 2024. doi: https://doi.org/10.48550/arXiv.2402.14230
51. R. J. Gallagher et al., “Generalized word shift graphs: a method for visualizing and
explaining pairwise comparisons between texts,” EPJ Data Science, vol. 10, no. 1,
Jan. 2021. doi: 10.1140/epjds/s13688-021-00260-3
Received 02.09.2024
INFORMATION ON THE ARTICLE
Oleksandr S. Bratus, ORCID: 0009-0003-5004-1652, Educational and Research Institute
for Applied System Analysis of the National Technical University of Ukraine “Igor
Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: olexandr.bratus@gmail.com
ASSESSING THE IMPACT OF AI-GENERATED PRODUCT NAMES ON E-COMMERCE
PERFORMANCE / O.S. Bratus
Abstract. The impact of large language models (LLMs) on e-commerce is studied.
A detailed review of the current level of LLM adoption in e-commerce is carried
out. Existing approaches to detecting text generated by artificial intelligence
(AI) are analyzed, and the limitations of their application are identified. A
methodology is proposed for determining the impact of LLMs on e-commerce based
on a comparison of indicators of AI-generated text with product metrics. The
methodology is applied to real data collected after the release of ChatGPT, and
the statistical analysis shows a positive correlation between the studied
indicators. This dependence is confirmed to be dynamic and to change over time.
The resulting implicit indicators measure the impact of LLM technology on the
e-commerce domain. This impact is expected to grow, requiring further research.
Keywords: large language models, AI detection, e-commerce, product performance.
|
| id | journaliasakpiua-article-330141 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-09-17T09:26:03Z |
| publishDate | 2025 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/4b/ca4c1ceca8ca7e42cf87303b89dae54b.pdf |
| spelling | journaliasakpiua-article-3301412025-05-20T17:56:07Z Assessing the impact of AI-generated product names on e-commerce performance Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції Bratus, Oleksandr large language models AI-detection e-commerce product performance великі мовні моделі ШІ-детекція електронна комерція ефективність продукту This paper studies the impact of Large Language Model (LLM) technology on the e-commerce industry. This work conducts a detailed review of the current implementation level of LLM technologies in the e-commerce industry. Next, it analyzes the approaches to detecting AI-generated text and determines the limitations of their application. The proposed methodology defines the impact of LLM models on the e-commerce industry based on a comparative analysis between indicators of machine-generated texts and e-commerce product metrics. Applying this methodology to real data, one of the most relevant data collected after the release of ChatGPT, the results of statistical analyses show a positive correlation between the studied indicators. It is proved that this dependence is dynamic and changes over time. The obtained implicit indicators measure the influence of LLM technologies on the e-commerce domain. This influence is expected to grow, requiring further research. Досліджено вплив великих мовних моделей (LLM) на електронну комерцію. Здійснено детальний огляд поточного рівня впровадження LLM у електронній комерції. Проаналізовано існуючі підходи до детекції текстів, згенерованих штучним інтелектом (ШІ), та визначено обмеження їх застосування. Запропоновано методологію визначення впливу LLM на електронну комерцію на основі порівняння індикаторів ШІ-згенерованих текстів та продуктових метрик. 
Продемонстровано застосування методології на реальних даних, що зібрані після релізу ChatGPT, і отримано результати статистичного аналізу, які показують додатну кореляцію між досліджуваними показниками. Підтверджено наявність динаміки цієї залежності та її зміни з часом. Отримані неявні індикатори вимірюють вплив LLM технології на сферу електронної комерції. Очікуємо, що вплив зростатиме, потребуючи подальших досліджень. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-03-28 Article Article Peer-reviewed Article application/pdf https://journal.iasa.kpi.ua/article/view/330141 10.20535/SRIT.2308-8893.2025.1.10 System research and information technologies; No. 1 (2025); 138-150 Системные исследования и информационные технологии; № 1 (2025); 138-150 Системні дослідження та інформаційні технології; № 1 (2025); 138-150 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/330141/319621 |
| spellingShingle | великі мовні моделі ШІ-детекція електронна комерція ефективність продукту Bratus, Oleksandr Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| title | Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| title_alt | Assessing the impact of AI-generated product names on e-commerce performance |
| title_full | Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| title_fullStr | Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| title_full_unstemmed | Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| title_short | Оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| title_sort | оцінювання впливу назв продуктів, створених штучним інтелектом, на ефективність електронної комерції |
| topic | великі мовні моделі ШІ-детекція електронна комерція ефективність продукту |
| topic_facet | large language models AI-detection e-commerce product performance великі мовні моделі ШІ-детекція електронна комерція ефективність продукту |
| url | https://journal.iasa.kpi.ua/article/view/330141 |
| work_keys_str_mv | AT bratusoleksandr assessingtheimpactofaigeneratedproductnamesonecommerceperformance AT bratusoleksandr ocínûvannâvplivunazvproduktívstvorenihštučnimíntelektomnaefektivnístʹelektronnoíkomercíí |