The article discusses modern methods for protecting bank customers' personal information based on differential anonymization of data using trusted neural networks. It provides an overview of the regulatory framework, analyzes technological approaches, and describes a developed multi-level anonymization model that combines cryptographic and machine learning techniques. Special attention is paid to the balance between preserving data utility and minimizing the risk of customer identity disclosure.
Keywords: differential anonymization, trusted neural network, personal data, banking technologies, information security, cybersecurity
Two approaches to weighting the vertices of rooted trees of hierarchical classifiers that characterize the subject domains of various information retrieval systems are presented. Weight characteristics are examined as functions of a vertex's position in the rooted tree: its level, its depth, the number of directly subordinate vertices (the "sons" clan), and the number of hierarchically subordinate leaf vertices. The specifics of vertex weighting in a rooted tree are illustrated using weighted summation of level-based, depth-based, clan-based, and leaf-based vertex weights, and hierarchically additive summation of leaf weight characteristics.
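As an illustration of the vertex characteristics discussed above, the following Python sketch computes level-, depth-, clan-, and leaf-based weights for a small rooted tree and combines them by weighted summation. The tree, the coefficients, and the exact definitions of "level" and "depth" are illustrative assumptions; the article's own weighting formulas may differ.

```python
# Hedged sketch: vertex weighting in a rooted tree of a hierarchical classifier.
tree = {                       # parent -> list of children (rooted tree)
    "root": ["A", "B"],
    "A": ["A1", "A2", "A3"],
    "B": ["B1"],
    "A1": [], "A2": [], "A3": [], "B1": [],
}
parent = {c: p for p, cs in tree.items() for c in cs}

def level(v):
    """Distance from the root (the root has level 0)."""
    return 0 if v not in parent else 1 + level(parent[v])

def depth(v):
    """Height of the subtree rooted at v (leaves have depth 0)."""
    return 0 if not tree[v] else 1 + max(depth(c) for c in tree[v])

def clan(v):
    """Number of directly subordinate vertices ("sons")."""
    return len(tree[v])

def leaves(v):
    """Hierarchically additive count of subordinate leaf vertices."""
    return 1 if not tree[v] else sum(leaves(c) for c in tree[v])

# Weighted summation of the four characteristics (coefficients are illustrative).
a_level, a_depth, a_clan, a_leaf = 0.3, 0.2, 0.2, 0.3
for v in tree:
    w = (a_level * level(v) + a_depth * depth(v)
         + a_clan * clan(v) + a_leaf * leaves(v))
    print(f"{v}: level={level(v)} depth={depth(v)} "
          f"clan={clan(v)} leaves={leaves(v)} weight={w:.2f}")
```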
Keywords: information retrieval systems, hierarchical subject classifiers, hierarchical categorizers, subject indexing, vertex weighting in a rooted tree, level‑based weight characteristic of a vertex, depth‑based weight characteristic of a vertex
A comprehensive method is proposed for the system analysis and processing of financial information that, without prior training, classifies it into five complementary taxonomies (genre, event type, tonality, level of influence, temporality) while simultaneously extracting entities. The method is based on an ensemble of three specialized instructions (prompts) for a local artificial intelligence model with an adapted majority voting algorithm and a two-level mechanism of explainable refusals. The protocol was validated by comparative testing of 14 local models on 100 expert-labeled information items, with the model achieving 90% processing accuracy. The system implements the principles of self-consistency and selective classification, is reproducible on standard hardware, and does not require training on labeled data.
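A minimal sketch of the voting and refusal logic described above, assuming three prompt-specific labels with confidence scores per item; the actual prompts, thresholds, and refusal criteria of the protocol are not reproduced here and are illustrative assumptions.

```python
from collections import Counter

def vote(predictions, min_agreement=2, min_confidence=0.5):
    """Majority voting over an ensemble of prompt outputs with a two-level
    refusal mechanism (a sketch; thresholds are illustrative assumptions).

    predictions: list of (label, confidence) pairs, one per prompt.
    Returns (label, reason); label is None when the item is refused.
    """
    counts = Counter(label for label, _ in predictions)
    label, support = counts.most_common(1)[0]
    # Level 1: refuse when the ensemble does not reach a majority.
    if support < min_agreement:
        return None, "refused: no majority among prompts"
    # Level 2: refuse when the agreeing prompts are not confident enough.
    mean_conf = sum(c for l, c in predictions if l == label) / support
    if mean_conf < min_confidence:
        return None, f"refused: low confidence ({mean_conf:.2f})"
    return label, f"accepted: {support}/{len(predictions)} votes, confidence {mean_conf:.2f}"

# Example: tonality predictions from three prompts for one news item.
print(vote([("negative", 0.9), ("negative", 0.7), ("neutral", 0.6)]))
print(vote([("negative", 0.9), ("neutral", 0.4), ("positive", 0.5)]))
```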
Keywords: organizational management, software projects, intelligent decision support system, ontological approach, artificial intelligence
This article examines the problem of control and management in transport systems, taking the operation of passenger rail rolling stock as an example and drawing on information technology and automation tools. The main proposed methods for improving the efficiency of vehicle operation management are digital modeling of the objects and processes of the transport complex and the automation of probabilistic-statistical analysis of data on the technical and operational characteristics of the system. The objective of the study is to improve the operational efficiency, reliability, and safety of passenger rail rolling stock by developing digital twins of the rolling stock and digital models of its operation processes. The main tasks of the study are to develop approaches to automating the analysis of data streams on the operation and technical condition of passenger rolling stock, and to develop a concept for applying digital modeling to current problems of passenger rail transport. The research hypothesis is based on the assumption that new information technologies can be effectively applied to practical problems of rolling stock operation management. The use of digital models of rolling stock units and the digitalization of the repair process are considered. The paper proposes automated Pareto analysis of data on technical failures of railcars and least-squares modeling of the distribution and density functions of passenger wagon operating indicators treated as continuous random variables. It is demonstrated that digital modeling of transport system objects and processes, combined with big data analysis, enables the improvement of transportation processes. General recommendations are provided on the use of information tools to improve the management of passenger rolling stock operations in rail transport.
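The automated Pareto analysis of railcar failure data mentioned above can be sketched as follows; the failure categories and counts are synthetic placeholders, not data from the study.

```python
# Hedged sketch of automated Pareto analysis of failure causes.
failures = {
    "wheelset defects": 112, "brake equipment": 87, "automatic coupler": 34,
    "body/bogie cracks": 21, "electrical equipment": 18, "other": 12,
}

total = sum(failures.values())
cumulative = 0.0
print(f"{'cause':<22}{'count':>7}{'share, %':>10}{'cum., %':>9}")
for cause, count in sorted(failures.items(), key=lambda kv: kv[1], reverse=True):
    share = 100.0 * count / total
    cumulative += share
    marker = "  <- focus group (~80%)" if cumulative <= 80.0 else ""
    print(f"{cause:<22}{count:>7}{share:>10.1f}{cumulative:>9.1f}{marker}")
```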
Keywords: information technologies, digital modeling, digital twin, automated control, system analysis, process approach, reliability, rolling stock operation, maintenance and repair, monitoring systems
Construction work often involves risks when carrying out the complex sets of tasks described by network schedules, in particular the risks of missing tender deadlines and exceeding project costs. One of the main reasons for increased project risks is a lack of resources. The main objective of this study is to develop a methodology for modeling network schedules under resource constraints, taking into account the stochastic influence of risks on project completion deadlines. The paper analyzes tools for modeling project schedules; describes a mathematical model for estimating project cost based on a network schedule under resource constraints; proposes a method for modeling a network schedule in the AnyLogic environment; develops an algorithm for modeling parallel branches of a project schedule under resource constraints; and describes a method for modeling a network schedule for project work. Testing was conducted on a network schedule for a project to construct a contact line support. It is shown that the method yields probabilistic estimates of project deadlines and costs under conditions of risk and limited resources. The methodology can be applied to various projects described by network schedules and will allow solving a number of practical tasks: optimizing the resources allocated for project implementation with regard to project time and cost, analyzing risks affecting project implementation, and developing optimal solutions for project risk management.
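The idea of stochastic simulation of a network schedule under a resource constraint can be illustrated with the plain-Python Monte Carlo sketch below. The activity network, triangular duration ranges, crew demands, and crew capacity are invented for illustration; they are an analogue of the approach, not the article's AnyLogic model.

```python
import random
import statistics

ACTIVITIES = {
    # name: (predecessors, (min, max, mode) duration in days, crew demand)
    "excavation":    ([],                     (1.0, 3.0, 2.0), 2),
    "foundation":    (["excavation"],         (2.0, 5.0, 3.0), 3),
    "support":       (["foundation"],         (1.0, 2.5, 1.5), 2),
    "wiring":        (["foundation"],         (1.5, 4.0, 2.0), 2),
    "commissioning": (["support", "wiring"],  (0.5, 1.5, 1.0), 1),
}
CAPACITY = 4  # crews available at any moment

def simulate_once():
    now, used = 0.0, 0
    running = []                       # (finish_time, name, demand)
    done, finish = set(), {}
    while len(done) < len(ACTIVITIES):
        active = {name for _, name, _ in running}
        ready = [a for a, (pred, _, dem) in ACTIVITIES.items()
                 if a not in done and a not in active
                 and all(p in done for p in pred)]
        started = False
        for a in ready:                # start what fits into free capacity
            pred, dur, dem = ACTIVITIES[a]
            if used + dem <= CAPACITY:
                running.append((now + random.triangular(*dur), a, dem))
                used += dem
                started = True
        if not started:                # otherwise advance to the next finish
            running.sort()
            now, a, dem = running.pop(0)
            used -= dem
            done.add(a)
            finish[a] = now
    return max(finish.values())

durations = sorted(simulate_once() for _ in range(5000))
print(f"mean duration: {statistics.mean(durations):.1f} days")
print(f"90% quantile:  {durations[int(0.9 * len(durations))]:.1f} days")
```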
Keywords: network schedule, work plan, simulation modeling, risk analysis, project duration, project cost
This article describes a developed method for automatically optimizing the parameters of an intelligent controller based on an adaptive genetic algorithm. The key goal of this development is to improve the mechanism for generating the intelligent controller's rule base through multiparameter optimization. The genetic algorithm is used to eliminate linguistic uncertainty in the design of control systems based on intelligent controllers. A unique algorithm is proposed that implements a comprehensive optimization procedure structured in three sequential stages: identifying optimal control system parameters; optimizing the structure of the intelligent controller's rule base by simulating its automatic generation; and then optimizing the intelligent controller's parameters. Implementation of this approach optimizes the weights of the fuzzy logic rules and the centers of the membership functions of the linguistic variables.
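A compact sketch of the kind of adaptive genetic algorithm described above, where the chromosome holds rule weights and membership-function centers. The fitness function below is a placeholder; in the article's setting it would be a control-quality criterion computed from a simulated closed-loop response, and the population sizes, ranges, and adaptation rule are assumptions.

```python
import random

N_RULES, N_CENTERS = 9, 6
POP, GENERATIONS = 40, 60

def random_individual():
    # Rule weights in [0, 1] followed by membership-function centers in [-1, 1].
    return ([random.random() for _ in range(N_RULES)] +
            [random.uniform(-1.0, 1.0) for _ in range(N_CENTERS)])

def fitness(ind):
    # Placeholder objective (to minimize); replace with the real control-quality
    # index, e.g. the integral error of a simulated step response.
    weights, centers = ind[:N_RULES], ind[N_RULES:]
    spread = sum((centers[i + 1] - centers[i] - 0.4) ** 2
                 for i in range(len(centers) - 1))
    return sum((w - 0.7) ** 2 for w in weights) + spread

def mutate(ind, sigma):
    return [g + random.gauss(0.0, sigma) for g in ind]

population = [random_individual() for _ in range(POP)]
for gen in range(GENERATIONS):
    population.sort(key=fitness)
    best = fitness(population[0])
    sigma = 0.02 + 0.3 * best / (1.0 + best)   # adaptive mutation step
    elite = population[: POP // 4]
    children = []
    while len(elite) + len(children) < POP:
        p1, p2 = random.sample(elite, 2)       # one-point crossover of parents
        cut = random.randrange(1, len(p1))
        children.append(mutate(p1[:cut] + p2[cut:], sigma))
    population = elite + children

print("best fitness:", round(fitness(population[0]), 4))
```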
Keywords: intelligent controller, optimization, genetic algorithm, uncertainty, term set
The article presents a comparative analysis of modern database management systems (PostgreSQL/PostGIS, Oracle Database, Microsoft SQL Server, and MongoDB) in the context of implementing distributed storage of geospatial information. The aim of the study is to identify the strengths and limitations of the different platforms when working with heterogeneous geospatial data and to evaluate their applicability in distributed GIS solutions. The research covers three main types of data: vector, raster, and point cloud. A comprehensive set of experiments was conducted in a test environment close to real operating conditions, including functional testing, performance benchmarking, scalability analysis, and fault tolerance assessment.
The results demonstrated that PostgreSQL/PostGIS provides the most balanced solution, showing high scalability and stable performance across all data types, which makes it a versatile platform for building GIS applications. Oracle Database exhibited strong results when processing raster data and proved effective under heavy workloads in multi-node architectures, which is especially relevant for corporate environments. Microsoft SQL Server showed reliable performance on vector data, particularly in distributed scenarios, though requiring optimization for binary storage. MongoDB proved suitable for storing raster content and metadata through GridFS, but its scalability is limited compared to traditional relational DBMS.
In conclusion, PostgreSQL/PostGIS can be recommended as the optimal choice for projects that require universality and high efficiency in distributed storage of geospatial data, while Oracle and Microsoft SQL Server may be preferable in specialized enterprise solutions, and MongoDB can be applied in tasks where flexible metadata management is a priority.
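A hedged sketch of the kind of functional and performance check used in such comparisons: timing a radius query against a PostGIS-enabled PostgreSQL instance from Python. The connection parameters, the buildings table, and its geom column are placeholders for an existing geospatial dataset.

```python
import time
import psycopg2  # PostgreSQL driver; connection details below are placeholders

conn = psycopg2.connect(host="localhost", dbname="gis", user="gis", password="gis")
cur = conn.cursor()

# Count features within 500 m of a point (table/column names are assumed).
query = """
    SELECT count(*)
    FROM buildings AS b
    WHERE ST_DWithin(
        b.geom::geography,
        ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
        %s);
"""
t0 = time.perf_counter()
cur.execute(query, (37.6175, 55.7520, 500.0))
count = cur.fetchone()[0]
elapsed = time.perf_counter() - t0
print(f"{count} features within 500 m, query took {elapsed * 1000:.1f} ms")

cur.close()
conn.close()
```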
Keywords: geographic information system, database, PostgreSQL, PostGIS, Oracle Database, Microsoft SQL Server, MongoDB, vector, raster, point cloud, scalability, performance, fault tolerance
The article discusses modern approaches to the design and implementation of data processing architectures in intelligent transport systems (ITS) with a focus on ensuring technological sovereignty. Special attention is paid to integrating machine learning practices that automate the full lifecycle of machine learning models: from data preparation and streaming to real-time monitoring and model updating. Architectural solutions are analyzed that use distributed computing platforms such as Hadoop and Apache Spark, in-memory databases on Apache Ignite, and Kafka message brokers for reliable event delivery. The importance of infrastructure flexibility and scalability, support for the parallel operation of multiple models, and reliable access control is emphasized, including security issues and the use of transport-layer security protocols. Recommendations are given on organizing a logging and monitoring system for rapid response to changes and incidents. The presented solutions are aimed at high fault tolerance, security, and compliance with the requirements of industrial operation, which enables efficient processing of large volumes of transport data and the adaptation of ITS to import-independent conditions.
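A minimal sketch of publishing ITS telemetry events to a Kafka topic with the kafka-python client. The broker address, topic name, and message fields are assumptions, and the delivery settings are only one possible reliability configuration.

```python
import json
import time
from kafka import KafkaProducer  # kafka-python client; broker address assumed

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",      # wait for the in-sync replica set (reliable delivery)
    retries=5,
)

# Hypothetical telemetry event from a roadside detector.
event = {
    "detector_id": "D-042",
    "timestamp": time.time(),
    "vehicles_per_minute": 37,
    "avg_speed_kmh": 54.2,
}
producer.send("its.traffic.events", value=event)
producer.flush()   # block until the event is acknowledged by the broker
```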
Keywords: data processing, intelligent transport systems, distributed processing, scalability, fault tolerance
The article discusses the problems and solutions of 3D printing and additive manufacturing in the context of the integration of artificial intelligence. With increasing demands for quality and efficiency, artificial intelligence is becoming a key element of process optimization. The factors influencing the suitability of models for 3D printing, including time, cost and materials, are analyzed. Optimization methods such as genetic algorithms and machine learning can simplify testing and evaluation tasks. Genetic algorithms provide flexibility in solving complex problems, improve quality, and reduce the likelihood of errors. In conclusion, the importance of further research on artificial intelligence for improving the productivity and quality of additive manufacturing is emphasized.
Keywords: artificial intelligence, 3D printing, additive manufacturing, machine learning, process optimization, genetic algorithms, product quality, automation, productivity, geometric complexity
The article presents a comparative analysis of the performance of three solver programs (based on the lpSolve, Microsoft Solver Foundation, and Google OR-Tools libraries) when solving a large-scale linear Boolean programming problem. The study was conducted on the problem of identifying the parameters of a homogeneous nested piecewise linear regression of the first type. The authors developed a testing methodology that includes generating test data, selecting hardware platforms, and identifying key performance metrics. The results showed that Google OR-Tools (especially the SCIP solver) demonstrates the best performance, outperforming the alternatives by a factor of 2-3. Microsoft Solver Foundation showed stable results, while the lpSolve IDE proved to be the slowest, but the easiest to use. All solvers provided comparable solution accuracy. Based on the analysis, recommendations are formulated for choosing a solver depending on performance requirements and integration conditions. The article is of practical value for specialists working with optimization problems and for researchers in the field of mathematical modeling.
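For reference, this is how the SCIP backend is invoked through Google OR-Tools on a small 0-1 linear program; the article's Boolean formulation for regression parameter identification would be declared through the same API, but the toy objective, coefficients, and constraint below are illustrative only.

```python
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")

values = [10, 7, 4, 9, 6]
costs = [4, 3, 2, 5, 3]
budget = 9

# Boolean decision variables and a single linear constraint.
x = [solver.BoolVar(f"x{i}") for i in range(len(values))]
solver.Add(sum(costs[i] * x[i] for i in range(len(x))) <= budget)
solver.Maximize(sum(values[i] * x[i] for i in range(len(x))))

status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    chosen = [i for i in range(len(x)) if x[i].solution_value() > 0.5]
    print("objective:", solver.Objective().Value(), "selected:", chosen)
    print("wall time, ms:", solver.wall_time())
```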
Keywords: regression model, homogeneous nested piecewise linear regression, parameter estimation, method of least modules, linear Boolean programming problem, index set, comparative analysis, software solvers, algorithm performance, Google OR-Tools
The paper highlights the importance of relational databases for storing information. The capabilities and shortcomings of the well-known database systems Oracle Database, MySQL, and Microsoft Access are analyzed. It is found that Access is better suited to storing information in local information systems, while MySQL is used to develop web applications.
Keywords: databases, data storage, Oracle, MySQL, Microsoft Access
This research investigates the development of expert systems (ES) based on large language models (LLMs) enhanced with augmented generation techniques. The study focuses on integrating LLMs into ES architectures to enhance decision-making processes. The growing influence of LLMs in AI has opened new possibilities for expert systems. Traditional ES require extensive development of knowledge bases and inference algorithms, while LLMs offer advanced dialogue capabilities and efficient data processing. However, their reliability in specialized domains remains a challenge. The research proposes an approach combining LLMs with augmented generation, where the model utilizes external knowledge bases for specialized responses. The ES architecture is based on LLM agents implementing production rules and uncertainty handling through confidence coefficients. A specialized prompt manages system-user interaction and knowledge processing. The architecture includes agents for situation analysis, knowledge management, and decision-making, implementing multi-step inference chains. Experimental validation using YandexGPT 5 Pro demonstrates the system’s capability to perform core ES functions: user interaction, rule application, and decision generation. Combining LLMs with structured knowledge representation enhances ES performance significantly. The findings contribute to creating more efficient ES by leveraging LLM capabilities with formalized knowledge management and decision-making algorithms.
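The uncertainty handling via confidence coefficients mentioned above can be illustrated with a MYCIN-style certainty-factor scheme over production rules; the facts, rules, coefficients, and combination formula here are a generic sketch, not the system's actual knowledge base or agent prompts.

```python
# Hedged sketch: production rules with confidence coefficients combined in a
# MYCIN-like way. Facts, rules, and coefficients are illustrative only.
RULES = [
    # (conditions, conclusion, rule confidence)
    (["payment_delay", "high_debt_load"], "credit_risk", 0.8),
    (["negative_news"], "credit_risk", 0.5),
    (["credit_risk"], "manual_review", 0.9),
]

facts = {"payment_delay": 0.9, "high_debt_load": 0.7, "negative_news": 0.6}

def combine(cf_old, cf_new):
    """Combine two positive certainty factors for the same conclusion."""
    return cf_old + cf_new * (1.0 - cf_old)

fired = set()
changed = True
while changed:                      # simple forward chaining to a fixed point
    changed = False
    for idx, (conditions, conclusion, rule_cf) in enumerate(RULES):
        if idx in fired or not all(c in facts for c in conditions):
            continue
        cf = min(facts[c] for c in conditions) * rule_cf
        facts[conclusion] = combine(facts.get(conclusion, 0.0), cf)
        fired.add(idx)
        changed = True

print({k: round(v, 3) for k, v in facts.items()})
```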
Keywords: large language model, expert system, artificial intelligence, decision support, knowledge representation, prompt engineering, uncertainty handling, decision-making algorithms, knowledge management
This paper addresses the challenges of automating enterprise-level business processes through the development of a specialized web application designed for task planning and working time accounting. The research focuses on solving problems related to organizational efficiency within the context of contemporary digital transformations. An innovative architectural design is proposed, leveraging advanced web technologies such as React for the frontend, Node.js with Express.js for the backend, and MySQL for reliable data storage. Particular emphasis is placed on utilizing the Effector library for efficient state management, resulting in significant performance improvements by reducing redundant UI renders. The developed solution offers robust features that enhance operational transparency, resource allocation optimization, and overall productivity enhancement across departments. Furthermore, economic justification demonstrates cost savings achieved through streamlined workflows and reduced administrative overhead. The practical implementation of this system has shown promising results, providing businesses with enhanced capabilities to manage tasks effectively while ensuring scalability and adaptability to future growth requirements.
Keywords: automation of business processes, task management, time management, web application, React, Node.js, Effector, information systems performance
In the context of modern continuous integration and delivery processes, it is critically important not only to have automated tests but also to ensure their real effectiveness, reliability, and economic feasibility. In this paper, the key metrics for evaluating the quality of automated testing are systematized, with a special focus on the problem of unstable (flaky) tests. New indicators are introduced and justified: the unstable-test rate and continuous integration pipeline losses, which directly reflect the cost of maintaining the test infrastructure. The limitations of traditional metrics, in particular code coverage, are analyzed in detail, and the superiority of mutation testing as a more reliable indicator of a test suite's ability to detect defects is demonstrated. Key dependencies were identified on demonstration data from a real continuous integration pipeline: an increase in code coverage does not guarantee an improvement in the mutation score and does not lead to an increase in the number of detected defects; a high proportion of unstable tests correlates with significant losses of machine time and reduced confidence in test results; and the time to detect and eliminate defects is reduced not only by increasing coverage, but also by lowering the proportion of unstable tests, improving system observability, and strengthening defect management discipline.
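One way to compute the two proposed indicators from raw CI run records is sketched below; the record format, the definition of a loss (rerun time of tests that both failed and passed on the same commit), and the numbers are hypothetical and may differ from the paper's exact definitions.

```python
# Each record: (test_name, commit, passed, duration_minutes) -- hypothetical data.
runs = [
    ("test_login", "abc1", False, 2.0), ("test_login", "abc1", True, 2.0),
    ("test_search", "abc1", True, 1.0),
    ("test_export", "abc1", False, 3.0), ("test_export", "abc1", False, 3.0),
    ("test_login", "def2", True, 2.0), ("test_search", "def2", True, 1.0),
]

by_test_commit = {}
for name, commit, passed, minutes in runs:
    by_test_commit.setdefault((name, commit), []).append((passed, minutes))

tests = {name for name, _, _, _ in runs}
# A test is unstable (flaky) if it both passed and failed on the same commit.
flaky = {name for (name, _), results in by_test_commit.items()
         if any(p for p, _ in results) and any(not p for p, _ in results)}

flaky_rate = len(flaky) / len(tests)
# Pipeline loss: machine time spent on failed runs of unstable tests.
loss_minutes = sum(m for (name, _), results in by_test_commit.items()
                   if name in flaky
                   for p, m in results if not p)

print(f"unstable-test rate: {flaky_rate:.0%}")
print(f"CI pipeline loss:   {loss_minutes:.1f} machine-minutes")
```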
Keywords: quality metrics for automated testing, mutation testing, unstable tests, code coverage, empirical metric analysis, comparative analysis of testing metrics, optimization of testing processes, cost-effectiveness of automation, software quality management
Problem statement. When modeling a complex technical system, parameter estimation is of primary importance. Solving this problem requires a methodology that eliminates errors and inaccuracies when obtaining numerical parameters. Goal. The article is devoted to a systematic analysis of the methodology for estimating the parameters of a complex technical system using the interval estimation method. Research method. A systematic analysis of methods for using interval estimates of numerical parameters is carried out; the methods are decomposed and structured. Results. The expediency of describing the parameters of a complex technical system using the interval estimation method is shown, and the use of various interval estimation models is analyzed. Practical significance. The analysis and construction of complex systems is considered as a practical application; the described method of estimating the parameters of a complex technical system by interval estimation can be used as a practical guide.
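A minimal sketch of the interval estimation idea: a small interval-arithmetic class used to propagate interval-valued input parameters through a model of the system. The model formula and the interval bounds are illustrative assumptions.

```python
class Interval:
    """Closed interval [lo, hi] with the basic arithmetic needed for propagation."""
    def __init__(self, lo, hi):
        self.lo, self.hi = min(lo, hi), max(lo, hi)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo:.3g}, {self.hi:.3g}]"

# Example: throughput = load_factor * capacity + reserve, with every parameter
# known only within an interval (bounds are illustrative).
load_factor = Interval(0.72, 0.81)
capacity = Interval(118.0, 126.0)
reserve = Interval(4.0, 6.5)

throughput = load_factor * capacity + reserve
print("interval estimate of throughput:", throughput)
```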
Keywords: interval estimation, parameter estimation, numerical data, fuzzy data, complex technical systems
This article investigates the problem of structured data schema matching and aggregates the results from previous stages of the research. The systematization of results demonstrated that while the previously considered approaches show promising outcomes, their effectiveness is often insufficient for real-world application. One of the most effective methods was selected for further study. The Self-Organizing Map method was analyzed, which is based on a criterial analysis of the attribute composition of schemas, using an iterative approach to minimize the distance between points (in the current task, a point represents a schema attribute). An experiment on schema matching was conducted using five examples. The results revealed both the strengths and limitations of the method under investigation. It was found that the selected method exhibits insufficient robustness and reproducibility of results on diverse real-world datasets. The verification of the method confirmed the need for its further optimization. The conclusion outlines directions for future research in this field.
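A compact numpy sketch of the Self-Organizing Map idea applied to schema matching: attribute feature vectors are mapped onto a small grid, and attributes landing on the same or neighboring cells become match candidates. The features, grid size, and training settings are assumptions, not the parameters used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute features: (name length / 20, numeric-type flag,
# fraction of empty values) for attributes of two schemas.
attributes = {
    "A.customer_id": [0.55, 1.0, 0.00], "B.client_id":  [0.45, 1.0, 0.01],
    "A.birth_date":  [0.50, 0.0, 0.05], "B.date_birth": [0.50, 0.0, 0.04],
    "A.comment":     [0.35, 0.0, 0.60], "B.notes":      [0.25, 0.0, 0.55],
}
data = np.array(list(attributes.values()))

# A 3x3 SOM: one weight vector per grid cell.
grid = [(i, j) for i in range(3) for j in range(3)]
weights = rng.random((len(grid), data.shape[1]))

for epoch in range(200):
    lr = 0.5 * (1.0 - epoch / 200)          # decaying learning rate
    radius = 1.5 * (1.0 - epoch / 200)      # decaying neighborhood radius
    x = data[rng.integers(len(data))]
    bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))  # best matching unit
    for k, cell in enumerate(grid):
        dist = np.hypot(cell[0] - grid[bmu][0], cell[1] - grid[bmu][1])
        if dist <= radius:
            weights[k] += lr * np.exp(-dist) * (x - weights[k])

# Attributes mapped to the same cell are match candidates.
for name, vec in attributes.items():
    cell = grid[int(np.argmin(((weights - np.array(vec)) ** 2).sum(axis=1)))]
    print(f"{name:<15} -> cell {cell}")
```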
Keywords: data management, fusion schemes, machine learning, classification, clustering, experimental analysis, data metrics
The article discusses the application of a systematic approach to the development and optimization of lithium-ion batteries (LIBs). Traditional methods that focus on improving individual components (anode, cathode, and electrolyte) often do not lead to a proportional increase in the overall performance of the battery system. The systematic approach views LIBs as a complex, interconnected system where the properties of each component directly influence the behavior of others and the overall performance, including energy and power density, life cycle, safety, and cost. The work analyzes the key aspects of the approach: the interdependence between the main components of a lithium-ion battery, as well as the features of selecting materials for each component. It is proven that only a multidisciplinary approach that combines chemistry, materials science, and engineering can achieve a synergistic effect and create highly efficient, safe, and reliable battery systems for modern applications.
Keywords: lithium-ion battery, system approach, electrode materials, degradation, optimization, cathode, LTO, NMC
This article presents a methodology for assessing damage to railway infrastructure in emergency situations using imagery from unmanned aerial vehicles (UAVs). The study focuses on applying computer vision and machine learning techniques to process high-resolution aerial data for detecting, segmenting, and classifying structural damage.
Optimized image processing algorithms, including U-Net for segmentation and Canny edge detection, are used to automate analysis. A mathematical model based on linear programming is proposed to optimize the logistics of restoration efforts. Test results show reductions in total cost and delivery time by up to 25% when optimization is applied.
The paper also explores 3D modeling from UAV imagery using photogrammetry methods (Structure from Motion and Multi-View Stereo), enabling point cloud generation for further damage analysis. Additionally, machine learning models (Random Forest, XGBoost) are employed to predict flight parameters and resource needs under changing environmental and logistical constraints.
The combination of UAV-based imaging, algorithmic damage assessment, and predictive modeling allows for a faster and more accurate response to natural or man-made disasters affecting railway systems. The presented framework enhances decision-making and contributes to a more efficient and cost-effective restoration process.
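The linear programming model for restoration logistics can be illustrated with a small transportation problem solved with SciPy; the depots, damaged sites, costs, supplies, and demands below are hypothetical and do not reproduce the paper's model.

```python
import numpy as np
from scipy.optimize import linprog

# Minimum-cost delivery of repair materials from two depots to three damaged sites.
cost = np.array([[4.0, 6.0, 9.0],      # depot 0 -> sites 0..2
                 [5.0, 3.0, 7.0]])     # depot 1 -> sites 0..2
supply = [40.0, 35.0]
demand = [20.0, 30.0, 25.0]

n_depots, n_sites = cost.shape
c = cost.ravel()                        # decision vector x is flattened row-major

# Supply constraints: shipments from each depot must not exceed its supply.
A_ub = np.zeros((n_depots, n_depots * n_sites))
for i in range(n_depots):
    A_ub[i, i * n_sites:(i + 1) * n_sites] = 1.0

# Demand constraints: each site must receive exactly its demand.
A_eq = np.zeros((n_sites, n_depots * n_sites))
for j in range(n_sites):
    A_eq[j, j::n_sites] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand,
              bounds=(0, None), method="highs")
print("total cost:", res.fun)
print("shipment plan:\n", res.x.reshape(n_depots, n_sites))
```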
Keywords: UAVs, image processing, LiDAR, 3D models of destroyed objects, emergencies, computer vision, convolutional neural networks, machine learning methods, infrastructure restoration, damage diagnostics, damage assessment
Choosing the best video compression method is becoming increasingly important as the volume of online video grows rapidly: by 2026, people are predicted to watch 82% more video online than in 2020. This requires finding a balance between image quality, processing speed, and file size, and achieving the desired parameters depends on choosing the right codec.
This paper compares five popular codecs—MPEG-2, MPEG-4, VP9, MJPEG, and ProRes. Each codec offers its own unique method for compressing video, yielding different file sizes and image quality. The goal was to determine which codec is best suited for various applications: video calls, professional filming, and online broadcasts.
The experiments were conducted on a server with four processor cores, 8 GB of RAM, and an 80 GB SSD. Measurements were taken to determine the speed of each codec, the resulting file size, and the video quality. Based on the results of these tests, recommendations were made on which codec to choose and how it can be improved in different scenarios.
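A sketch of how such a comparison can be scripted with FFmpeg: each encoder is run with default settings on the same source clip, and encoding time and output file size are recorded. The input file name is a placeholder, the encoder options are left at defaults, and objective quality metrics (e.g., PSNR/SSIM) would need a separate measurement pass.

```python
import os
import subprocess
import time

# Encoder names as exposed by FFmpeg; containers chosen to suit each codec.
CODECS = {
    "MPEG-2": ("mpeg2video", "mpg"),
    "MPEG-4": ("mpeg4", "mp4"),
    "VP9":    ("libvpx-vp9", "webm"),
    "MJPEG":  ("mjpeg", "avi"),
    "ProRes": ("prores_ks", "mov"),
}
SOURCE = "input.mp4"   # placeholder for the test clip

for name, (encoder, ext) in CODECS.items():
    output = f"out_{encoder}.{ext}"
    t0 = time.perf_counter()
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "error", "-i", SOURCE,
         "-c:v", encoder, "-an", output],
        check=True)
    elapsed = time.perf_counter() - t0
    size_mb = os.path.getsize(output) / 2**20
    print(f"{name:<7} encode time {elapsed:6.1f} s, file size {size_mb:6.1f} MB")
```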
Keywords: video codec, MPEG-2, MPEG-4, VP9, MJPEG, ProRes, AVC, compression, coding
The paper presents the results of an analysis of modern approaches to organizing distributed computing architectures that integrate cloud, fog, and edge levels. The limitations of existing models, which fail to provide a comprehensive description of data flows and the dynamics of interactions between computing nodes, are examined. An adaptive graph-based model is proposed, in which the computing system is formalized as a weighted directed graph with parameters of latency, bandwidth, and energy consumption. The model is implemented within a graph database environment and is designed for multicriteria optimization of information exchange routes. Dependencies for calculating flow characteristics and mechanisms for selecting optimal routes based on QoS indicators are provided. The practical applicability of the concept is confirmed by its potential integration into Internet of Things infrastructures, intelligent manufacturing, and transportation systems, where reducing latency and increasing the resilience of the computing architecture are critical.
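A minimal sketch of the described graph formalization: nodes of the edge, fog, and cloud levels connected by directed edges weighted with latency, bandwidth, and energy consumption, with route selection by Dijkstra's algorithm over a scalarized QoS weight. The topology, parameter values, and weighting coefficients are illustrative assumptions.

```python
import networkx as nx

# Edge/fog/cloud topology as a weighted directed graph (values are illustrative).
G = nx.DiGraph()
links = [
    # (from, to, latency ms, bandwidth Mbit/s, energy J per transfer)
    ("sensor", "edge-1",  2.0,  50.0, 0.2),
    ("edge-1", "fog-1",   5.0, 200.0, 0.5),
    ("edge-1", "fog-2",   4.0, 100.0, 0.6),
    ("fog-1",  "cloud",  20.0, 500.0, 1.2),
    ("fog-2",  "cloud",  15.0, 300.0, 1.5),
]
for u, v, latency, bandwidth, energy in links:
    G.add_edge(u, v, latency=latency, bandwidth=bandwidth, energy=energy)

# Scalarized QoS weight: lower latency and energy, higher bandwidth preferred.
ALPHA, BETA, GAMMA = 1.0, 10.0, 2.0
def qos_weight(u, v, attrs):
    return (ALPHA * attrs["latency"]
            + BETA / attrs["bandwidth"]
            + GAMMA * attrs["energy"])

path = nx.dijkstra_path(G, "sensor", "cloud", weight=qos_weight)
total_latency = sum(G[u][v]["latency"] for u, v in zip(path, path[1:]))
print("selected route:", " -> ".join(path))
print("end-to-end latency:", total_latency, "ms")
```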
Keywords: distributed computing system, cloud computing, fog computing, edge computing, graph model, data flow, route optimization, multicriteria optimization, bandwidth, latency, energy consumption, digital twin, Internet of Things, database, Dijkstra’s algorithm
In modern research, symbolic regression is a powerful tool for constructing mathematical models of various systems. In this paper, three symbolic regression methods are applied and compared: genetic programming, sparse identification of nonlinear dynamics (SINDy), and a hybrid method. The performance of each method is evaluated by its ability to find models with high accuracy and low complexity in the presence of varying levels of noise in the observational data. Based on the results of the experiments, it was concluded that the best method for identifying dynamic systems is the hybrid method, which combines genetic programming and sparse identification of nonlinear dynamics.
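The sparse-identification component can be illustrated with the core of SINDy, sequentially thresholded least squares over a library of candidate terms; the toy system, noise level, and threshold below are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples of a toy system with true dynamics dx/dt = -0.5x - 2x^3; the
# derivative observations are corrupted with noise for simplicity.
x = np.linspace(-2.0, 2.0, 2000)
dxdt = -0.5 * x - 2.0 * x**3 + 0.05 * rng.standard_normal(x.size)

# Library of candidate terms and sequentially thresholded least squares (STLSQ).
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
names = ["1", "x", "x^2", "x^3"]
xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
for _ in range(10):
    small = np.abs(xi) < 0.1          # zero out weak terms...
    xi[small] = 0.0
    active = ~small                   # ...and refit the remaining ones
    xi[active], *_ = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)

model = " + ".join(f"({c:.3f})*{n}" for c, n in zip(xi, names) if c != 0.0)
print("identified model: dx/dt =", model)
```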
Keywords: symbolic regression, identification of dynamical systems, genetic programming, sparse identification of nonlinear dynamics, hybrid GP-SINDy method
This article examines the current problem of accounting for and planning consumables in clinical diagnostic laboratories. It presents the results of applying approximation methods (polynomial, exponential, Fourier series, Gaussian function, power function, rational polynomials, and sine sum) and linear regression to the predictive planning of biochemical reagent consumption. Data from five years of biochemical research at a local medical facility were used for the calculations, which were performed in the MATLAB environment. A comparative analysis of the methods was conducted, including calculation of the coefficient of determination (reliability coefficient). Gaussian approximation proved to be the best statistical model for predicting reagent consumption.
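A Python analogue of this kind of comparison is sketched below: polynomial fits and a single-term Gaussian model are scored by the coefficient of determination. The monthly consumption series is synthetic and the Gaussian form includes an offset term; the article's MATLAB fits and real data are not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

# Synthetic monthly reagent consumption with a seasonal peak (placeholder data).
months = np.arange(1, 61, dtype=float)
true = 120.0 * np.exp(-((months - 30.0) / 14.0) ** 2) + 40.0
consumption = true + rng.normal(0.0, 6.0, months.size)

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Polynomial fits of increasing degree.
for degree in (1, 2, 3):
    coeffs = np.polyfit(months, consumption, degree)
    print(f"polynomial deg {degree}: R^2 = "
          f"{r_squared(consumption, np.polyval(coeffs, months)):.3f}")

# Single-term Gaussian model a*exp(-((t-b)/c)^2) + d.
def gauss(t, a, b, c, d):
    return a * np.exp(-((t - b) / c) ** 2) + d

params, _ = curve_fit(gauss, months, consumption, p0=[100.0, 30.0, 10.0, 30.0])
print(f"gaussian fit:      R^2 = "
      f"{r_squared(consumption, gauss(months, *params)):.3f}")
```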
Keywords: regression, approximation, study, criterion, polynomial, centroid, probability, determination, reagent, metric
The article addresses a significant limitation of the classic TF-IDF method in the context of topic modeling for specialized text corpora. While effective at creating structurally distinct document clusters, TF-IDF often fails to produce semantically coherent topics. This shortcoming stems from its reliance on the bag-of-words model, which ignores semantic relationships between terms, leading to orthogonal and sparse vector representations. This issue is particularly acute in narrow domains where synonyms and semantically related terms are prevalent. To overcome this, the authors propose a novel approach: a contextual-diffusion method for enriching the TF-IDF matrix.
The core of the proposed method involves an iterative procedure of contextual smoothing based on a directed graph of semantic proximity, built using an asymmetric measure of term co-occurrence. This process effectively redistributes term weights not only to the words present in a document but also to their semantic neighbors, thereby capturing the contextual "halo" of concepts.
The method was tested on a corpus of news texts from the highly specialized field of atomic energy. A comparative analysis was conducted using a set of clustering and semantic metrics, such as the silhouette coefficient and topic coherence. The results demonstrate that the new approach, while slightly reducing traditional metrics of structural clarity, drastically enhances the thematic coherence and diversity of the extracted topics. This enables a shift from mere statistical clustering towards the identification of semantically integral and interpretable themes, which is crucial for tasks involving the monitoring and analysis of large textual data in specialized domains.
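A hedged numpy/scikit-learn sketch of the general idea, contextual smoothing of a TF-IDF matrix over an asymmetric term co-occurrence graph: part of each term's weight is redistributed to its semantic neighbors. The toy corpus, the normalization, the smoothing coefficient, and the number of iterations are assumptions; the authors' exact enrichment procedure may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "reactor core cooling system inspection",
    "cooling circuit maintenance at the nuclear plant",
    "uranium fuel assembly delivered to the reactor",
    "fuel enrichment and assembly production",
]

# Document-term matrices over a shared vocabulary.
cv = CountVectorizer(binary=True)
B = cv.fit_transform(docs).toarray().astype(float)          # docs x terms (0/1)
X = TfidfVectorizer(vocabulary=cv.vocabulary_).fit_transform(docs).toarray()

# Asymmetric semantic proximity: P[i, j] ~ P(term j | term i) from co-occurrence.
C = B.T @ B                        # term-term co-occurrence counts
df = np.diag(C).copy()             # document frequency of each term
P = C / np.maximum(df[:, None], 1.0)
np.fill_diagonal(P, 0.0)
row_sums = P.sum(axis=1, keepdims=True)
P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)

# Iterative contextual smoothing: spread part of each weight to term neighbors.
alpha, X_k = 0.3, X.copy()
for _ in range(3):
    X_k = (1.0 - alpha) * X + alpha * (X_k @ P)

terms = cv.get_feature_names_out()
doc0 = X_k[0]
top = np.argsort(doc0)[::-1][:8]
print("enriched weights for doc 0:", [(terms[i], round(doc0[i], 3)) for i in top])
```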
Keywords: topic modeling, latent Dirichlet allocation, TF-IDF, contextual diffusion, semantic proximity, co-occurrence, text vectorization, bag-of-words model, topic coherence, natural language processing, silhouette coefficient, text data analysis
The paper is devoted to the search for an effective decoding method for a new class of binary erasure-correcting codes. The codes in question are defined by an encoding matrix with restrictions on column weights (MRSt codes). To work with the constructed codes, a decoder based on information aggregates and a decoder based on belief propagation, adapted for the erasure case, are used. Experiments were carried out to determine the decoding speed and correcting ability of these methods for the named classes of noise-resistant codes. For MRSt codes, the belief propagation decoder is significantly faster than the information-aggregate decoder but is slightly inferior in correcting ability.
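On the erasure channel, belief propagation reduces to iterative "peeling": any parity check containing exactly one erased position recovers that symbol. The sketch below uses a small Hamming(7,4) parity-check matrix purely as a stand-in for the MRSt codes studied in the paper.

```python
import numpy as np

# Parity-check matrix of the Hamming(7,4) code (a stand-in for MRSt codes).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def peel_decode(received):
    """Belief-propagation decoding over the erasure channel (peeling):
    repeatedly use checks that contain exactly one erased symbol."""
    word = list(received)                       # None marks an erasure
    progress = True
    while progress and any(v is None for v in word):
        progress = False
        for row in H:
            erased = [j for j in np.flatnonzero(row) if word[j] is None]
            if len(erased) == 1:                # solvable check equation
                j = erased[0]
                word[j] = sum(word[k] for k in np.flatnonzero(row)
                              if k != j) % 2
                progress = True
    return word

codeword = [0, 1, 1, 0, 0, 1, 1]                # a valid Hamming(7,4) codeword
received = codeword.copy()
received[2] = received[4] = None                # two erasures from the channel
print("decoded:", peel_decode(received), "original:", codeword)
```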
Keywords: channels with erasure, distributed fault-tolerant data storage systems, code with equal-weight columns, decoder based on information aggregates, decoder based on the belief propagation, RSt code, MRSt code
The article formulates the task of hierarchical text classification, describes approaches to hierarchical classification and metrics for evaluating them, examines the local approach to hierarchical classification in detail, describes different variants of local hierarchical classification, conducts a series of experiments on training local hierarchical classifiers with various vectorization methods, and compares the evaluation results of the trained classifiers.
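A compact scikit-learn sketch of one local approach, a classifier per parent node with TF-IDF vectorization: each non-leaf node gets its own classifier that chooses among its children, and prediction proceeds top-down. The corpus, hierarchy, and model choice are toy placeholders, not the article's experimental setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy hierarchy and corpus (placeholders); labels are paths from the root.
HIERARCHY = {"root": ["sport", "science"],
             "sport": ["football", "tennis"],
             "science": ["physics", "biology"]}
DOCS = [
    ("the striker scored a late goal in the cup final", ["sport", "football"]),
    ("the goalkeeper saved a penalty kick", ["sport", "football"]),
    ("she won the grand slam match in straight sets", ["sport", "tennis"]),
    ("a tie-break decided the final set on grass", ["sport", "tennis"]),
    ("the collider measured the boson decay rate", ["science", "physics"]),
    ("quantum entanglement was observed in the lab", ["science", "physics"]),
    ("the genome of the bacterium was sequenced", ["science", "biology"]),
    ("enzymes accelerate reactions in living cells", ["science", "biology"]),
]

def child_of(path, parent):
    """Which child of `parent` a document's label path passes through, if any."""
    if parent == "root":
        return path[0]
    return path[path.index(parent) + 1] if parent in path else None

# Train one local classifier per parent node.
classifiers = {}
for parent in HIERARCHY:
    texts, labels = [], []
    for text, path in DOCS:
        label = child_of(path, parent)
        if label is not None:
            texts.append(text)
            labels.append(label)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    classifiers[parent] = clf.fit(texts, labels)

def predict(text):
    node, path = "root", []
    while node in classifiers:                  # top-down descent
        node = classifiers[node].predict([text])[0]
        path.append(node)
    return path

print(predict("the goalkeeper blocked a free kick near the goal"))
print(predict("the quantum decay of the boson was measured"))
```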
Keywords: classification, hierarchical classification, local classification, hierarchical precision, hierarchical recall, hierarchical F-measure, natural language processing, vectorization