You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Motivation

When using physical SSDs in a ZFS configuration, write performance degrades over time when the storage
system is used for a long time. This problem occurs due to the nature of the SSD. Depending on the type
of SSD and how long that drive is loaded, the negative impact on write performance can be significant.
To solve this problem, ZFS supports the TRIM feature, which helps prevent performance degradation of SSDs.

There is also a need to periodically check the integrity of the data in EVE in the ZFS configuration using
the Scrub function. This verification is necessary in order to detect and prevent errors before they cause
hardware failure or data corruption in EVE, which is not acceptable to us.

The requirement to be able to maintain the storage system and periodically check it for errors comes from
EVE customers and also is in line with the requirements for improving the storage system in EVE.
And this proposal is to add the ability to instantly launch or schedule these commands on the controller side.
Since commands can negatively affect performance in the course of their work, it is proposed to add
the ability for users to set the time for their launch themselves through the controller.

TRIM

TRIM is a command which allows the filesystem to notify the storage device which blocks are no longer in use.
Сan read more about what TRIM is on the wiki

This functionality is already implemented in OpenZFS and all we need is to add the ability to control this
through the controller.

The benefit for EVE that we get after implementing the ability to run this command from
the controller is the following:

  • Reduced write amplification (fewer writes)
  • Higher write throughput (less read-erase-modify)
  • Increased device longevity (finite erase-write cycles)

An example of the positive impact of the TRIM team was presented at the OpenZfs conference in 2019 in the
presentation of this functionality:

SCRUB

ZFS checksums every block of data that is written to disk, and compares this checksum when the data is read back
into memory. If the checksums don’t match we know the data was changed by something other than ZFS
(assuming a ZFS bug isn’t the culprit), and assuming we are using ZFS to RAID protect the storage the issue
will be automatically fixed for us.

But what if you have a lot of data on a disk that isn’t read often? ZFS provides a scrub option to read back all
of the data in the file system and validate that the data still matches the computed checksum. This feature
in EVE can be accessed by running the zpool utility with the “scrub” option and the name of the pool to scrub.

In this proposal, it is proposed to add functionality that allows you to run this function from the controller,
including according to the schedule.

Weak sides

The weak side is the fact that performance decreases when executing these commands since for their execution they,
like everyone else, require resources.

A study was also conducted to determine the negative impact on performance when reading and writing at the time
of executing these commands. And it turned out that both commands in the process of their execution show a decrease
in performance on the device by at least half when reading from disk or zpool. And only the Scrub command has
a negative impact on system performance when writing. The scrub operation consumes all I/O resources on the system.

There are levers in ZFS to limit these commands in system resources, but another big new study is needed to test them. 
After which it will be possible to think about improving this functionality in EVE.

EVE API Additions

The StorageCmdConfig message is expected to be received from the controller:


// StorageCmdType is the type of command for servicing the storage system
enum StorageCmdType {
	STORAGE_CMD_TYPE_UNSPECIFIED = 0;
	STORAGE_CMD_TYPE_TRIM = 1; // Command for Trim ops
	STORAGE_CMD_TYPE_SCRUB = 2; // Command for Scrub ops
}

// StorageCmdRunType - type for determining the frequency of commands running
// when servicing the storage system
enum StorageCmdRunType {
	STORAGE_CMD_RUN_TYPE_UNSPECIFIED = 0;
	STORAGE_CMD_RUN_TYPE_NOT_RUN = 1;
	STORAGE_CMD_RUN_TYPE_NOW = 2;
	STORAGE_CMD_RUN_TYPE_EVERY_HOUR = 3;
	STORAGE_CMD_RUN_TYPE_EVERY_3_HOURS = 4;
	STORAGE_CMD_RUN_TYPE_EVERY_6_HOURS = 5;
	STORAGE_CMD_RUN_TYPE_EVERY_12_HOURS = 6;
	STORAGE_CMD_RUN_TYPE_EVERY_DAY = 7;
	STORAGE_CMD_RUN_TYPE_EVERY_2_DAYS = 8;
	STORAGE_CMD_RUN_TYPE_EVERY_3_DAYS = 9;
	STORAGE_CMD_RUN_TYPE_EVERY_WEEK = 10;
	STORAGE_CMD_RUN_TYPE_EVERY_2_WEEKS = 11;
	STORAGE_CMD_RUN_TYPE_EVERY_MONTH = 12;
	STORAGE_CMD_RUN_TYPE_EVERY_3_MONTHS = 13;
	STORAGE_CMD_RUN_TYPE_EVERY_6_MONTHS = 14;
	STORAGE_CMD_RUN_TYPE_EVERY_YEAR = 15;
}

// StorageCmdConfig describes the desired command configuration for serving storage.
message StorageCmdConfig {
	string pool_name = 1;
	StorageCmdType cmd_type = 2;
	StorageCmdRunType run_type = 3;
}

From the EVE side, a message StorageInfo with information about the current state of the commands will be added to
the message StorageServiceCmd with information about the state of the storage system:

// StorageServiceCmd - information about servicing storage systems (Trim/Scrub)
message StorageServiceCmd {
	org.lfedge.eve.config.StorageCmdType cmd_type = 1; // Scrub or Trim
	org.lfedge.eve.config.StorageCmdRunType run_type = 2; // The run instruction (time period)
	uint64 last_update_time = 3; // Time of the last scheduled start check
	uint64 next_run_time = 4; // The time when the command will be run next
	uint64 last_change_cmd_run_type_time = 5; // The time the command was last modified
	repeated uint64 launch_times_history_list = 6; // Stores history of previous runs
}

Discussion

  • Do we want to be able to pause these commands through the controller?
  • The EVE API Additions topic suggested options for time periods to automatically run these commands on a schedule.
    If you have any objections or suggestions, let's discuss them in the comments.
  • No labels