<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="ru">
  <front xmlns:xlink="http://www.w3.org/1999/xlink">
    <journal-meta>
      <journal-title-group>
        <journal-title>Computing, Telecommunication and Control</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Информатика, телекоммуникации и управление</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2687-0517</issn>
    </journal-meta>
    <article-meta xmlns:xlink="http://www.w3.org/1999/xlink">
      <article-id pub-id-type="publisher-id">1</article-id>
      <article-id pub-id-type="doi">10.18721/JCSTCS.12301</article-id>
      <title-group>
        <article-title>Road pavement assessment of the North-West Federal District using sentiment analysis of the Internet user reviews</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Оценка состояния транспортных магистралей Северо-Западного федерального округа с использованием анализа тональности отзывов пользователей сети Интернет</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Seliverstov</surname>
            <given-names>Yaroslav</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>maxwell_8-8@mail.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Nikitin</surname>
            <given-names>Kirill</given-names>
          </name>
          <xref ref-type="aff" rid="aff2"/>
          <email>execiter@mail.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Shatalova</surname>
            <given-names>Natalya</given-names>
          </name>
          <xref ref-type="aff" rid="aff3"/>
          <email>shatillen@mail.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Kiselev</surname>
            <given-names>Arseny</given-names>
          </name>
          <xref ref-type="aff" rid="aff4"/>
          <email>ars8ars@mail.ru</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">Solomenko Institute of Transport Problems of the Russian Academy of Sciences, University National Technology Initiative 2035</aff>
      <aff id="aff2">Peter the Great St. Petersburg Polytechnic University</aff>
      <aff id="aff3">Solomenko Institute of Transport Problems of the RAS</aff>
      <aff id="aff4">Saint Petersburg Stieglitz State Academy of Art and Design</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2019-09-30">
        <day>30</day>
        <month>09</month>
        <year>2019</year>
      </pub-date>
      <volume>12</volume>
      <issue>3</issue>
      <fpage>7</fpage>
      <lpage>24</lpage>
      <self-uri xmlns:xlink="http://www.w3.org/1999/xlink" content-type="pdf" xlink:href="https://infocom.spbstu.ru/userfiles/files/articles/2019/3/7-24.pdf"/>
      <abstract xml:lang="en">
        <p>The analysis showed that social networks, thematic communities, and transport portals are a source of up-to-date information about the traffic situation. The article addresses the task of assessing road pavement condition in the North-West Federal District from reviews posted on the web. To solve this problem, a system for automatic classification of reviews based on a sentiment classifier was developed. A crawler built with the Scrapy framework in Python 3 collected reviews from the site http://autostrada.info/ru. Methods of text vectorization and lemmatization and their implementations in the Scikit-Learn library are considered: Bag-of-Words, N-gram, CountVectorizer, and TF-IDF Vectorizer. For classification, a naive Bayes algorithm and a linear classifier trained with stochastic gradient descent were used. A corpus of labeled posts from Twitter served as the training sample. The classifier was trained using a cross-validation strategy with the ShuffleSplit method. According to the validation results, the linear model with the N-gram scheme and the TF-IDF Vectorizer performed best. To test the developed system, reviews concerning the quality of transport networks in the North-West Federal District were collected and analyzed. Based on the results, a color-coded marking of the roads was produced to visualize the findings. Conclusions and prospects for further development of the study are given.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>automatic text analysis</kwd>
        <kwd>crawlers</kwd>
        <kwd>text classification</kwd>
        <kwd>intelligent transport systems</kwd>
        <kwd>machine learning</kwd>
        <kwd>TF-IDF</kwd>
        <kwd>N-gram</kwd>
        <kwd>naive Bayes algorithm</kwd>
        <kwd>linear classifier</kwd>
        <kwd>sentiment analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
