The article develops a set of mathematical models describing the operation of a data storage system built on solid-state drives (SSDs) with deduplication. The user application model loads the system with a stream of requests whose sizes follow a Pareto distribution and whose inter-arrival times are random. Requests entering the storage system pass through the network service model, then the VDO deduplication model, then the software RAID model, and finally reach the solid-state drive model, where the read or write operation is performed. Because SSDs behave differently on reads and writes, performance in read and write modes is modeled separately, taking into account the different speed characteristics of RAID-5, RAID-6 and RAID-10 arrays. The reliability model of each RAID array is based on the Kolmogorov-Chapman system of equations, which yields the stationary probabilities of the states of a discrete Markov chain. System durability is determined by a model that estimates the exhaustion of the SSD write resource. The storage cost model includes the costs of equipment, resources and maintenance over the entire operating period of the system. The final result is a mathematical formulation of the optimal design problem for a data storage system, which allows selecting the architecture and parameters that are optimal with respect to a combination of factors: reliability, speed and cost of data storage.
Keywords: mathematical modeling, data storage system, performance, reliability, solid state drives, RAID array, optimization
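The reliability model mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's model: it assumes a birth-death chain whose state is the number of failed drives, independent drive failures at a constant rate, a single repair process, and recovery from the data-loss state at the same repair rate. The function name, the parameters and the numeric rates are all hypothetical, and the sketch covers only parity arrays (one tolerated failure for RAID-5, two for RAID-6), not RAID-10.

```python
def stationary_probabilities(n_drives, tolerated_failures, fail_rate, repair_rate):
    """Stationary distribution of a birth-death chain (illustrative assumption).

    State i = number of failed drives, i = 0 .. tolerated_failures + 1.
    The last state stands for data loss and is left at repair_rate,
    which is an assumption made only to keep the chain ergodic.
    """
    n_states = tolerated_failures + 2
    weights = [1.0]  # unnormalised probability of state 0
    for i in range(n_states - 1):
        up = (n_drives - i) * fail_rate  # rate i -> i+1: any working drive fails
        down = repair_rate               # rate i+1 -> i: one repair at a time
        # Balance equation for a birth-death chain: pi[i] * up = pi[i+1] * down
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rates per hour, chosen only for illustration.
raid5 = stationary_probabilities(6, 1, fail_rate=1e-5, repair_rate=0.1)
raid6 = stationary_probabilities(6, 2, fail_rate=1e-5, repair_rate=0.1)
print("RAID-5 data-loss probability:", raid5[-1])
print("RAID-6 data-loss probability:", raid6[-1])
```

Under these assumptions the last element of each list is the long-run share of time spent in the data-loss state, so comparing it across array types gives a rough reliability ranking.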
The article proposes a structural-functional model of a data storage system based on solid-state drives that uses deduplication to optimize the use of disk space. The concept is based on a systems approach. The functioning model is presented as an interaction between the control subject (the system administrator) and the control object (the storage system). The system is exposed to an external stream of requests from user applications over the Internet. The output parameters of the system include storage capacity, storage time, storage cost, performance and reliability. The data storage system is represented by four main integrated components connected in series: a network service; a deduplication system based on VDO technology; a write management system (the software RAID control system); and an array of SSD drives. As the flow of requests passes through the system, the delay at each component is determined by the corresponding mathematical model. This structural and functional representation makes it possible to analyze the system with the methods of statistical modeling and queuing theory. The proposed model makes it possible to design a data storage system of a given capacity and service life with a minimum storage cost for the consumer while meeting given performance and reliability requirements.
Keywords: data storage system, solid-state drives, deduplication, SSD-drive, RAID-array, modeling, optimization
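The idea of a request stream accumulating delay at four components connected in series can be sketched as a small statistical simulation. Everything numeric here is an assumption for illustration: each stage is modeled as a single FIFO server with a fixed overhead plus a size-dependent transfer time, request sizes follow a Pareto law, and inter-arrival times are exponential. The stage constants and the function name are hypothetical and are not taken from the article.

```python
import random

# (stage name, fixed overhead in seconds, throughput in KiB/s): illustrative values
STAGES = [
    ("network service", 0.0002, 1_000_000.0),
    ("VDO deduplication", 0.0005, 400_000.0),
    ("software RAID", 0.0003, 800_000.0),
    ("SSD array", 0.0001, 2_000_000.0),
]


def simulate(n_requests, alpha=1.5, min_size_kib=4.0, arrival_rate=200.0, seed=1):
    """Return the end-to-end latency in seconds of each request."""
    rng = random.Random(seed)
    stage_free_at = [0.0] * len(STAGES)  # time at which each stage becomes idle
    arrival = 0.0
    latencies = []
    for _ in range(n_requests):
        arrival += rng.expovariate(arrival_rate)         # exponential gap (assumption)
        size = min_size_kib * rng.paretovariate(alpha)   # Pareto-distributed size
        t = arrival
        for k, (_name, overhead, throughput) in enumerate(STAGES):
            start = max(t, stage_free_at[k])             # wait if the stage is busy
            t = start + overhead + size / throughput     # service time at this stage
            stage_free_at[k] = t
        latencies.append(t - arrival)
    return latencies


latencies = simulate(10_000)
print("mean latency, s:", sum(latencies) / len(latencies))
print("max latency, s:", max(latencies))
```

Replacing the per-stage formula with a more detailed delay model for any one component leaves the rest of the loop unchanged, which is the practical benefit of the serial decomposition.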