<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="en">
  <front xmlns:xlink="http://www.w3.org/1999/xlink">
    <journal-meta>
      <journal-title-group>
        <journal-title>Computing, Telecommunication and Control</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Информатика, телекоммуникации и управление</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2687-0517</issn>
    </journal-meta>
    <article-meta xmlns:xlink="http://www.w3.org/1999/xlink">
      <article-id pub-id-type="publisher-id">4</article-id>
      <article-id pub-id-type="doi">10.18721/JCSTCS.18304</article-id>
      <title-group>
        <article-title>Text augmentation method via paraphrastic concept embeddings: A case study on Azerbaijani language</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Метод аугментации текстов с помощью парафразных векторных представлений на примере азербайджанского языка</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Aghayev</surname>
            <given-names>Aslan</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>agaev.af@edu.spbstu.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Molodyakov</surname>
            <given-names>Sergey</given-names>
          </name>
          <email>sm50@yandex.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="scopus">6603839750</contrib-id>
          <name>
            <surname>Ustinov</surname>
            <given-names>Sergey M.</given-names>
          </name>
          <xref ref-type="aff" rid="aff2"/>
          <email>usm50@yandex.ru</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">Peter the Great St. Petersburg Polytechnic University</aff>
      <aff id="aff2">Peter the Great St. Petersburg Polytechnic University</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-09-30">
        <day>30</day>
        <month>09</month>
        <year>2025</year>
      </pub-date>
      <volume>18</volume>
      <issue>3</issue>
      <fpage>46</fpage>
      <lpage>57</lpage>
      <self-uri xmlns:xlink="http://www.w3.org/1999/xlink" content-type="pdf" xlink:href="https://infocom.spbstu.ru/userfiles/files/articles/2025/3/46-57.pdf"/>
      <abstract xml:lang="en">
        <p>A novel data augmentation method, paraphrastic concept embeddings, is presented to address the shortage of labeled data in Azerbaijani natural language processing (NLP). The method generates high-quality paraphrastic sentences by encoding semantic concepts into a continuous vector space and decoding them into diverse textual realizations. This approach is the first to use concept-level paraphrasing for the Azerbaijani language, and it yields substantial improvements in applied tasks. The theoretical foundations of the method are presented, including its mathematical formulation and its implementation within NLP pipelines. In text classification experiments, the method outperforms standard augmentation techniques in accuracy and robustness. The method does not require external lexical resources, which makes it especially useful for low-resource languages. It applies to a range of tasks, including sentiment analysis, entity extraction, and text generation. It is concluded that the proposed approach significantly advances the state of Azerbaijani NLP and can potentially be extended to other low-resource languages.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>natural language processing</kwd>
        <kwd>low-resource language</kwd>
        <kwd>data augmentation</kwd>
        <kwd>paraphrastic embeddings</kwd>
        <kwd>concept embedding</kwd>
        <kwd>text classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
