Multi-core processor technology integrates two or more processor cores on a single chip to enhance computing performance. A multi-core processor improves system performance by distributing the load across multiple cores and by relying on high-speed on-chip interconnects and high-bandwidth memory and I/O channels. Under otherwise equal conditions, multi-core processors offer greater performance and productivity than current single-core processors, so research on scheduling algorithms for multi-core platforms is an important direction for future work.
Multi-core task scheduling concerns how a scheduling algorithm allocates multiple tasks to multiple cores for parallel execution so as to minimize the total completion time. The problem has long been proven NP-hard [1], so an optimal solution generally cannot be found in polynomial time. The most common task scheduling algorithms are heuristic. Heuristic approaches include list scheduling based on the critical path [2,3,4,5], task-duplication algorithms [6,7,8], processor allocation based on task duplication [9], improved multi-core scheduling based on task duplication [10], clustering algorithms [11,12,13], and so on.
Since the communication overhead between tasks on the same processor can be ignored, scheduling based on task duplication is an effective strategy for reducing communication overhead. The idea is to copy predecessor tasks that have a communication relationship onto the same core as their successors, shortening the inter-processor communication time and thereby the overall execution time of the system.
When reasonable and effective duplication rules and strategies are used, scheduling algorithms based on task duplication have been shown to achieve better scheduling results than other scheduling algorithms. However, these algorithms do not take load balancing into account. In a DAG there are no dependencies between nodes in the same layer, so by adjusting the scheduling order of same-layer nodes, idle time can be reduced, CPU utilization increased, and the load across CPUs made more balanced. This paper therefore proposes a hierarchical scheduling algorithm based on task duplication to address the load imbalance of traditional task-duplication-based scheduling algorithms.
The task scheduling problem is a combinatorial optimization problem: an abstract task model of a computer application is established, and then, subject to the constraints of that model, a reasonable scheduling strategy generates a scheduling sequence and assigns the tasks to processing cores, with the goals of minimizing the total execution time and maximizing the parallel execution advantages of the multi-core system.
The task scheduling model has two parts: the system model and the task model. The system model is a mathematical abstraction of the multi-core system, including its topology and computing capability; the task model is a mathematical abstraction of the application program, mainly covering the constraint relationships between tasks and the characteristics of the tasks themselves. The two parts are discussed in detail below.
The system model is an abstraction of the actual computing system, here a multi-core system, that is, a system composed of multiple processing cores. It is generally expressed as P = {p1, p2, ⋯, pm}, where P is the set of processing cores in the multi-core system and pi (1 ≤ i ≤ m) denotes an individual core.
The relationships between tasks in a multi-core system are generally represented by a DAG (directed acyclic graph); when there are dependencies between tasks, a weighted DAG is used (as shown in Figure 1).
Its mathematical description is G = (V, E, W, C), where V is the set of task nodes, E the set of dependency edges between tasks, W the computation cost of each task, and C the communication cost attached to each edge. Since the communication time between tasks on different cores is much longer than between tasks on the same core, when two tasks run on the same processor their communication time is ignored, that is, the communication cost on the edge between them is taken as 0.
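A weighted DAG task model of this kind can be written down directly in code. The sketch below uses a small hypothetical graph (its node and edge weights are illustrative, not those of Figure 1):

```python
# A weighted DAG task model: each task has a computation cost w(vi),
# and each dependency edge (vi, vj) has a communication cost c(vi, vj).
# All values below are illustrative, not taken from Figure 1.
tasks = {"v1": 2, "v2": 3, "v3": 4}            # w(vi): computation cost
edges = {("v1", "v2"): 5, ("v1", "v3"): 1}     # c(vi, vj): communication cost

def comm_cost(u, v, proc_of):
    """Communication cost is ignored when both tasks share a core."""
    if proc_of[u] == proc_of[v]:
        return 0
    return edges[(u, v)]

# v1 and v2 on the same core: the edge cost 5 is eliminated.
placement = {"v1": 0, "v2": 0, "v3": 1}
print(comm_cost("v1", "v2", placement))  # -> 0
print(comm_cost("v1", "v3", placement))  # -> 1
```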
The earliest start time of a task is the earliest moment at which it can begin executing on a given core, once its precedence and communication constraints are satisfied.
The completion time of a task is its start time plus its computation cost.
The associated predecessor of a task is the predecessor whose data arrives last and therefore constrains the task's earliest start time.
The running result of a certain processor is the completion time of the last task scheduled on it.
The running result of all processors, that is, the overall scheduling length, is the maximum of these completion times over all processors.
The successor tasks of a task are the tasks that directly depend on its output.
The execution predecessors of a task are the tasks it directly depends on.
The idle time of a core is the total length of the gaps between consecutive tasks scheduled on it.
The communication start time between a task and a successor is the moment at which the task begins transmitting its result, which cannot precede the task's completion time.
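With these definitions, earliest start and completion times can be computed bottom-up over the DAG. The sketch below ignores core contention (so it yields lower bounds) and uses an illustrative graph:

```python
# Earliest start time (EST) and earliest completion time (ECT) of each
# task, as lower bounds that ignore core contention. The graph and its
# weights are illustrative; dict insertion order is assumed topological.
tasks = {"v1": 2, "v2": 3, "v3": 4, "v4": 1}                      # w(vi)
edges = {("v1", "v2"): 5, ("v1", "v3"): 1,
         ("v2", "v4"): 2, ("v3", "v4"): 3}                        # c(vi, vj)

def est_ect(tasks, edges):
    preds = {v: [] for v in tasks}
    for (u, v), c in edges.items():
        preds[v].append((u, c))
    est, ect = {}, {}
    for v in tasks:  # relies on topological insertion order
        est[v] = max((ect[u] + c for u, c in preds[v]), default=0)
        ect[v] = est[v] + tasks[v]
    return est, ect

est, ect = est_ect(tasks, edges)
print(est["v4"], ect["v4"])  # -> 12 13
```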
A task can be executed on a specific processing core when it meets the following two necessary conditions.
On the one hand, all of the predecessor tasks on the same processing core on which it depends have finished executing, and the communication from predecessor tasks on other cores to this task has also completed.
On the other hand, the time interval between the task's start time and end time does not conflict with any other task on the processing core where the task is located.
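A minimal sketch of these two conditions, with hypothetical data structures (the parameter names and the tiny example are ours, not the paper's):

```python
# Sketch of the two executability conditions above. All names and the
# small example are hypothetical.
def can_start(task, t, core, preds, finish, placed_on, busy, dur, comm):
    # Condition 1: every predecessor has finished, and data from
    # predecessors on other cores has already arrived at time t.
    for u in preds.get(task, []):
        ready = finish[u] if placed_on[u] == core else finish[u] + comm[(u, task)]
        if t < ready:
            return False
    # Condition 2: the interval [t, t + dur) must not overlap any
    # busy interval already scheduled on this core.
    return all(t + dur <= s or t >= e for (s, e) in busy.get(core, []))

preds = {"v2": ["v1"]}
finish, placed_on = {"v1": 2}, {"v1": 0}
comm, busy = {("v1", "v2"): 5}, {0: [(0, 2)], 1: []}
print(can_start("v2", 2, 0, preds, finish, placed_on, busy, 3, comm))  # -> True
print(can_start("v2", 2, 1, preds, finish, placed_on, busy, 3, comm))  # -> False
```

On core 0 the predecessor is local, so the task may start as soon as v1 finishes; on core 1 it must wait for the cross-core transfer.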
For each task, the start time is therefore determined jointly by the completion of its predecessors, the arrival of their data, and the availability of the target core.
For a task graph, the final scheduling result depends on the task that finishes last; that is, the scheduling length is the time used by the processor with the longest schedule among all processors.
The communication between a task and a successor task can start only after the task's end time, because only when the task has finished executing can the relevant data be provided to the successor; that is, the communication start time is no earlier than the task's completion time.
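The scheduling-length rule is easy to express in code; the helper below computes it as the latest finish time over all cores, using illustrative numbers:

```python
# The scheduling length equals the latest task finish time over all cores.
def makespan(finish_times_per_core):
    return max(max(times) for times in finish_times_per_core.values() if times)

# Illustrative finish times of the tasks scheduled on each of two cores.
schedule = {0: [2, 7, 10], 1: [4, 9]}
print(makespan(schedule))  # -> 10
```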
The idea of the task-duplication scheduling algorithm is to generate copies of specific tasks and allocate these copies to processing cores according to a certain strategy. When a successor of a copied task is allocated to the same processing core, the communication between the two tasks is eliminated, saving time.
Task duplication can be divided into single-task duplication and multi-task duplication according to the number of duplicated tasks. Single-task duplication copies onto the processing core only the task that constrains the start time of the current task; multi-task duplication copies multiple predecessor tasks of the task onto the core.
Taking Figure 1 as an example, when the successor tasks of a task are assigned to the same core together with a copy of that task, the communication between them is eliminated. Figure 2 illustrates such a duplication scenario.
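The benefit of duplication can be seen with a small worked example (the costs are illustrative, not taken from the figures):

```python
# Effect of duplicating a predecessor onto the successor's core.
# Without duplication the successor must wait for the data transfer;
# with a local copy the communication cost vanishes, at the price of
# re-running the predecessor. All numbers are illustrative.
w_pred, w_succ = 2, 3      # computation costs of predecessor and successor
c = 5                      # cross-core communication cost

est_without_dup = w_pred + c   # data arrives only after the transfer
est_with_dup = w_pred          # local copy: predecessor re-runs on this core

print(est_without_dup, est_with_dup)  # -> 7 2
```

Duplication pays off here because re-executing the predecessor (cost 2) is cheaper than waiting for the communication (cost 5).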
In a DAG, tasks can be classified by layer. Tasks in the same layer are independent: they have no dependencies on each other, so if no priority is imposed, there is no difference in which order they are executed. The TDLS algorithm relies on this independence of same-layer tasks for hierarchical scheduling. By adjusting the scheduling sequence of same-layer tasks, tasks with smaller earliest start times can be scheduled first when a time slice becomes available, reducing the number of cores required. At the same time, when the communication cost between dependent tasks in different layers is large enough to affect the completion time, the predecessor tasks with the greatest impact on a task are duplicated onto the same core to reduce the communication time, thereby increasing CPU utilization.
Step 1: Calculate the in-degree of all tasks in the DAG graph;
Step 2: Find all tasks whose in-degree is 0 and place them together into the current layer, obtaining layer k;
Step 3: Remove the tasks that have been layered, together with their outgoing edges, from the DAG to obtain a new DAG, set k = k + 1, and repeat Steps 1–3 until no tasks remain.
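The three steps above amount to repeatedly peeling off the zero in-degree tasks. A sketch, assuming the graph is acyclic (on a cyclic input this loop would not terminate):

```python
# Layer a DAG by repeatedly removing zero in-degree tasks, as in the
# steps above. The diamond-shaped example graph is illustrative.
def layer_dag(tasks, edges):
    indeg = {v: 0 for v in tasks}
    for (_, v) in edges:
        indeg[v] += 1
    layers, remaining = [], set(tasks)
    while remaining:
        layer = sorted(v for v in remaining if indeg[v] == 0)
        for v in layer:
            remaining.discard(v)
            for (u, w) in edges:       # remove v's outgoing edges
                if u == v and w in remaining:
                    indeg[w] -= 1
        layers.append(layer)
    return layers

# Illustrative diamond-shaped DAG: v1 -> {v2, v3} -> v4
print(layer_dag(["v1", "v2", "v3", "v4"],
                [("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")]))
# -> [['v1'], ['v2', 'v3'], ['v4']]
```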
Step 1: According to the in-degrees of the tasks in the DAG, use the layering algorithm to layer all tasks;
Step 2: According to the layering result, an initial scheduling sequence is obtained; because tasks in the same layer have no mutual dependencies, the tasks of each layer can be ordered by their earliest start times;
Step 3: Schedule each task in turn according to the adjusted scheduling sequence, assigning tasks to the processing cores in a round-robin (polling) manner.
If a task's earliest start time would leave idle time on its assigned core, the predecessor task that most constrains that start time is duplicated onto the same core, so that the task can start earlier and the idle time is reduced.
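A highly simplified sketch of this flow, covering only the layer-wise ordering by earliest start time (duplication and core assignment are omitted, and all values are illustrative):

```python
# Simplified sketch of the TDLS ordering step: layer the DAG, then order
# the tasks of each layer by earliest start time. Duplication of
# constraining predecessors is omitted; all values are illustrative.
def tdls_order(layers, est):
    order = []
    for layer in layers:
        order.extend(sorted(layer, key=lambda v: est[v]))
    return order

layers = [["v1"], ["v3", "v2"], ["v4"]]        # illustrative layering
est = {"v1": 0, "v2": 7, "v3": 3, "v4": 12}    # illustrative ESTs
print(tdls_order(layers, est))  # -> ['v1', 'v3', 'v2', 'v4']
```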
When selecting an algorithm for a task scheduling problem, it is usually necessary to evaluate the algorithm's time complexity, that is, the growth trend of its execution time as the problem size increases, written in O notation. O(1) means constant time: no matter how much the problem size grows, the time consumed stays the same. O(n^2) means quadratic time: when the problem size doubles, the time required quadruples. O(2^n) means exponential time: once the problem size increases, the time consumed grows exponentially.
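The doubling relations stated for O(n^2) and O(2^n) can be checked numerically:

```python
# Numerical check of the growth relations described above.
n = 10
assert (2 * n) ** 2 == 4 * n ** 2      # O(n^2): doubling n quadruples the work
assert 2 ** (2 * n) == (2 ** n) ** 2   # O(2^n): doubling n squares the work
print("growth relations hold for n =", n)
```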
Assume the DAG task graph has n nodes. Scheduling each node by traversing the task graph requires O(n) time per node. For each node of each layer, a simulated scheduling pass must be performed before the schedule is confirmed, to determine whether the placement reduces idle time, reduces CPU waste, and improves CPU utilization; during this pass the scheduling sequence of the layer must be adjusted. If a layer contains m nodes, this adjustment takes O(m) time, so in the worst case the time complexity of the TDLS algorithm does not exceed O(n^2).
This chapter conducts experiments in the same environment, scheduling the specific DAG (Figure 1) with the TDLS algorithm and two traditional task-duplication-based scheduling algorithms (CPTD and TDMC). The task scheduling diagrams produced by the three algorithms are shown in Figures 5–7, and the algorithms are compared on two performance indicators, the number of processors used and the CPU utilization, together with their time complexity. The comparison of the performance evaluation parameters of the three algorithms is shown in Table 1.
Table 1. Comparison of scheduling results of the TDLS, CPTD, and TDMC algorithms

Algorithm | Scheduling length | Processors used | Time complexity
TDLS | 23 | 2 | O(n^2)
CPTD | 23 | 3 | O(n^2)
TDMC | 24 | 5 | O(n^2)
The performance of the task scheduling algorithm can be evaluated with the following performance evaluation parameters:
Number of processors used: after all tasks in the task graph are scheduled, the fewer processors used, the more resources are saved. Scheduling length: the total time required after all tasks in the task graph are scheduled; the smaller the scheduling length, the better the algorithm uses resources and the better the algorithm.
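These two parameters, together with CPU utilization, can be combined in a small helper (the values are illustrative, not experimental results):

```python
# The evaluation parameters above: processors used, scheduling length,
# and CPU utilization = total busy time / (cores * scheduling length).
# Values are illustrative.
def schedule_metrics(busy_time_per_core, sched_length):
    cores = len(busy_time_per_core)
    util = sum(busy_time_per_core) / (cores * sched_length)
    return cores, sched_length, util

# Illustrative: two cores, each busy 20 time units of a 23-unit schedule.
cores, length, util = schedule_metrics([20, 20], 23)
print(cores, length, round(util, 3))  # -> 2 23 0.87
```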
For the specific task graph example (Figure 1), the TDLS and CPTD algorithms and the task-duplication-based scheduling algorithm of [14] (referred to as the TDMC algorithm) are each invoked, scheduling diagrams are drawn, and the performance indicators of the three algorithms are compared.
The DAG shown in Figure 1 has 4 layers and contains 9 tasks.
Scheduling the DAG with the TDLS, CPTD, and TDMC algorithms yields the corresponding task scheduling diagrams, shown in Figures 5–7; the comparison of the performance evaluation parameters under the three algorithms is shown in Table 1.
As can be seen from Figures 5–7 and Table 1, when the TDMC algorithm schedules the task graph, the total execution time is 24 and 5 processor cores are used, so the CPUs are far from fully utilized.
The improved TDLS algorithm completes the schedule in a total execution time of 23 while occupying only 2 processor cores. The improvement comes mainly from the scheduling of the third layer, where the scheduling sequence is adjusted according to the earliest start times of the same-layer tasks.
Compared with the TDMC algorithm, the TDLS algorithm on the one hand shortens the total scheduling length (from 24 to 23) and on the other hand reduces the number of CPUs required to complete all tasks (from 5 to 2); overall, the algorithm reduces resource waste and improves CPU utilization. At the same time, compared with the CPTD algorithm, TDLS reduces the number of redundant task copies while using one fewer processor core (2 instead of 3) at the same scheduling length.
To address the load imbalance of traditional task-duplication-based multi-core scheduling algorithms under certain circumstances, this paper proposes TDLS, a hierarchical scheduling algorithm based on task duplication. TDLS exploits the absence of ordering constraints between tasks in the same layer: adjusting their scheduling sequence shortens CPU idle time and program execution time, and thereby improves the utilization of multiple CPUs. Comparative experiments show that, relative to the other two traditional task-duplication-based scheduling algorithms, TDLS balances the load better, which is also of some significance for improving the scheduling performance of multi-core parallel computing systems.
The hierarchical task scheduling algorithm based on task duplication proposed in this paper overcomes, in a simulated scheduling environment, some inherent shortcomings of traditional task-duplication-based algorithms, and improves task scheduling efficiency and CPU utilization. The research is, however, entirely based on assumptions and a simulated environment, whereas real situations are more complicated and changeable: for multi-core systems, load balancing, power consumption, and inter-core communication congestion must all be considered under actual conditions. Future research should therefore broaden its scope, apply the algorithm to real-time systems, and take the problems of practical settings into more comprehensive consideration.