Logistics is a complex industrial chain that covers a wide range of areas and has grown with the rapid rise of e-commerce. Timely logistics is crucial for the circulation of e-commerce items. Within the logistics industry chain, the distribution centre reduces the number of transaction links for goods, improves industrial efficiency, lowers merchants’ inventory, and provides quick feedback on demand. The specific process is shown in Figure 1 and comprises five links of express delivery: stocking, storage, sorting, shipment and despatch [1–3]. Among these five links, sorting is the most important and has the highest time and labour costs, so it has a great impact on the efficiency of the entire logistics system.

In the distribution centre, the storage warehouse has high liquidity: goods flow in and out frequently and the inventory cycle is short. A complete distribution centre first purchases goods according to the customer’s order information and stores them for a short period, which allows it to supply goods according to customer needs and to smooth the flow of goods in circulation, so that there are neither too many nor too few products on the market. Second, it provides quick sorting: goods are sorted according to the customer’s order information, including the delivery location, the number of goods, their weight and specifications, the customer information, and the order number. After this basic sorting, the sorted goods are combined, and goods bound for the same location are packaged together to form an effective, reasonable, low-cost, and high-efficiency logistics channel. The whole process can be summarised as follows: goods are sorted according to the order information using a sorting optimisation method, the goods to be delivered are arranged by order of shipment, division, serial number of distribution equipment, customer priority, etc., and they are finally distributed to specific locations by logistics vehicles [4, 5]. In the logistics sorting chain, the efficiency of order sorting has a huge impact on the efficiency of the entire operation. Compared with other links, sorting accounts for a large proportion of the time and cost [6]. Depending on the division principle, the rapid sorting of commodities can be organised by sorting unit, by sorting method, or by sorting operation area.
The sorting unit partition classifies goods according to their category, such as large objects, small objects, and special commodities. Here, sorting and transportation are treated as two independent processes, which makes sorting professional and modular. The sorting method partition further divides each sorting unit partition into multiple areas according to the sorting methods and equipment used; its purpose is to improve the efficiency of sorting operations and reduce sorting time [7]. The sorting operation area partition keeps the sorting method unchanged and assigns picking personnel to carry out the next sorting step in a fixed place.

At present, the development of the sorting system in the logistics transmission chain is not optimistic, and its cost still accounts for a large proportion of the entire logistics system. The logistics industry handles dense flows of commodities, so optimising and improving automatic sorting is crucial to the efficiency of the entire logistics distribution. In modern logistics transportation, commodities are varied and market demand has become diverse. As the basis of logistics, distribution needs to improve its efficiency while providing basic liquidity guarantees. In this paper, we look for ways to improve the express sorting speed and to optimise the sorting efficiency of the intelligent transmission chain without purchasing new automation equipment, by optimising the sorting algorithm and model, and we review the key directions of related research. Boysen

The analysis of sorting and the optimisation research of related scholars show the importance of sorting in the overall logistics transmission chain: improving sorting efficiency is very important for the entire logistics system. Therefore, in this paper, we use the data mining clustering method to sort related items, classify and merge variables according to aggregation and decentralised clustering methods, and optimise the intelligent transmission of the entire logistics chain through effectiveness experiments. The aim is to improve the efficiency of sorting operations and, at the same time, the efficiency and stability of logistics as a whole, providing a strong guarantee for the development of society and the logistics industry.

In the intelligent logistics sorting and conveying chain, there are often several variables, and the variables have a group structure. Cluster analysis divides samples or variables with high similarity into one class; its main goal is to classify the collected data based on similarity. It is widely used in many disciplines, such as mathematics and statistics. Under the influence of artificial intelligence, data mining has become an increasingly common data analysis tool in applications ranging from academic research to market analysis. Sisman and Aydinoglu [13] improve the performance of large-scale national real estate valuation by applying dataset optimisation and spatially constrained multivariate clustering analysis, which defines geographic value clusters to improve valuation accuracy. Park

Different clustering algorithms and their features

| Clustering algorithm | Main limitation |
| --- | --- |
| Partition based | Heuristic algorithm, difficult to deal with complex data |
| Hierarchical | Sensitive to data entry order |
| Density based | Very sensitive to custom user parameters |
| Grid based | Grid granularity is not easy to control |
| Neural network based | Fixed grid structure, long training time |

Partition-based clustering algorithms first define an indicator function, but most of them are heuristic, have difficulty with large-scale data, easily fall into a local minimum, and handle noise poorly [19, 20]. In hierarchical clustering algorithms, the segmentation between levels is highly dependent, and once an aggregation has been performed, the result cannot be modified [21–23]. Density-based clustering algorithms cluster with a density function and keep expanding clusters from their nodes; they can handle any data type but are sensitive to user-defined parameters [24–26]. Grid-based clustering algorithms discretise spatial data, but the grid granularity is not easy to control and the parameter sensitivity is high [27, 28]. Clustering algorithms based on neural networks mainly suffer from long training times [29, 30].

Cluster analysis is now used in all walks of life to divide objects with the same or similar attributes into one or more subsets through a mathematical algorithm, and it is widely applied in machine learning, data mining, image analysis and other fields. According to the form of clustering, clustering algorithms can be divided into two types: structural and decentralised. The former builds new clusters from previously formed ones: the initial N samples are regarded as N independent clusters, which are successively aggregated by a clustering or iterative algorithm. These methods are mainly divided into partition-based, hierarchical, density-based, grid-based, and neural network-based clustering algorithms. The selection of the clustering algorithm is shown in Figure 2.

The latter determines the classification at one time, dividing or merging sets according to the given set of elements. The method is mainly divided into the agglomerative hierarchical clustering method and the divisive (decentralised) hierarchical clustering method. The hierarchical clustering effect is shown in Figure 3. This paper adopts the agglomerative hierarchical clustering algorithm.

Variable cluster analysis classifies variables according to how close and similar their features are. To express this intuitively, this section considers two variables $x_i$ and $x_j$ and writes $c_{ij}$ for the similarity coefficient between them.

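Common types of variables are qualitative and quantitative variables. This paper mainly uses quantitative variables, for which the commonly used similarity coefficients are the cosine of the included angle and the correlation coefficient. For two variables $i$ and $j$ observed on $n$ samples, with $x_{ki}$ the value of variable $i$ for sample $k$ and $\bar{x}_i$ its mean over the samples, they are defined as follows:

$$c_{ij} = \frac{\sum_{k=1}^{n} x_{ki}\, x_{kj}}{\sqrt{\sum_{k=1}^{n} x_{ki}^{2} \, \sum_{k=1}^{n} x_{kj}^{2}}} \tag{1}$$

$$r_{ij} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^{2} \, \sum_{k=1}^{n} (x_{kj} - \bar{x}_j)^{2}}} \tag{2}$$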
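As a minimal sketch of these two similarity coefficients, the following Python functions compute the cosine of the included angle and the correlation coefficient for two quantitative variables. The function names and the example values (`weight`, `volume`) are illustrative assumptions, not part of the original study, and both variables are assumed to be non-constant lists of equal length.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the included angle between two variables (equal-length lists)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm

def correlation(x, y):
    """Correlation coefficient: the cosine similarity of the centred variables."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical item features: volume is exactly twice the weight,
# so both coefficients reach their maximum value.
weight = [1.0, 2.0, 3.0, 4.0]
volume = [2.0, 4.0, 6.0, 8.0]
print(cosine_similarity(weight, volume))
print(correlation(weight, volume))
```

Computing the correlation as the cosine of the centred variables makes the relationship between the two coefficients explicit: they differ only in whether the mean is removed first.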

In addition to formulas (1) and (2), a method for measuring the distance between classes needs to be given. The similarity distance between classes can be calculated with the single linkage, complete (full) linkage, group average, and median methods, as well as the centroid (gravity centre) method and the sum of squared deviations method.

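Single linkage. The two classes with the closest distance between classes are merged, where the distance between classes $G_p$ and $G_q$ is the shortest distance $d_{ij}$ between any two of their objects:

$$D_{pq} = \min_{i \in G_p,\; j \in G_q} d_{ij}$$

Complete (full) linkage. Classes are aggregated according to the longest distance between clusters:

$$D_{pq} = \max_{i \in G_p,\; j \in G_q} d_{ij}$$

Group average. The distance between clusters is equal to the average distance between pairs of objects from the two clusters, where $n_p$ and $n_q$ are the numbers of objects in $G_p$ and $G_q$:

$$D_{pq} = \frac{1}{n_p n_q} \sum_{i \in G_p} \sum_{j \in G_q} d_{ij}$$

Centroid (gravity centre) method. The distance between clusters is represented by the distance between the cluster centroids:

$$D_{pq} = d(\bar{x}_p, \bar{x}_q)$$

Among them, $\bar{x}_p$ and $\bar{x}_q$ are the centroids of clusters $G_p$ and $G_q$.

Median method. The distance between clusters lies between those of the single linkage and the complete linkage methods. In its usual form it is:

$$D_{kr}^{2} = \frac{1}{2} D_{kp}^{2} + \frac{1}{2} D_{kq}^{2} - \frac{1}{4} D_{pq}^{2}$$

Among them, $G_r$ is the cluster obtained by merging $G_p$ and $G_q$, and $G_k$ is any other cluster.

Sum of squared deviations. The distance between clusters is the increase in the within-cluster sum of squared deviations caused by merging them. In its usual form it is:

$$D_{pq}^{2} = \frac{n_p n_q}{n_p + n_q} \lVert \bar{x}_p - \bar{x}_q \rVert^{2}$$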
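The single linkage, full linkage and group average rules described in this section can be sketched as follows. This is an illustrative pure-Python version of bottom-up (agglomerative) clustering on a precomputed distance matrix, not the implementation used in the paper; the function name `agglomerative`, its parameters, and the one-dimensional example data are assumptions made for the illustration.

```python
def agglomerative(dist, k, linkage="single"):
    """Merge clusters bottom-up until k clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise distances and
    linkage is "single", "complete" or "average".
    """
    combine = {
        "single": min,                                   # shortest pairwise distance
        "complete": max,                                 # longest pairwise distance
        "average": lambda values: sum(values) / len(values),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]           # every object starts alone
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairwise = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]          # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical one-dimensional item features with two tight pairs and one outlier.
positions = [0, 1, 5, 6, 20]
dist = [[abs(a - b) for b in positions] for a in positions]
for method in ("single", "complete", "average"):
    print(method, agglomerative(dist, 3, method))
```

The three rules differ only in how the pairwise distances between two clusters are combined, which is why a single `combine` function is enough to switch between them. The centroid, median and sum of squared deviations rules need cluster coordinates or an update formula and are left out of this sketch.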

This paper uses the Fisher distance to verify the optimal solution for hierarchical clustering, where the distance _{jk}

Among them, _{i}_{j}_{j}

The variable group structure identification algorithm designed in this paper clusters variables with the hierarchical clustering method and uses repeated sampling to compute the adjusted Rand index; the number of clusters at which the average Rand index reaches its maximum is taken as the final number of groups. The specific steps are as follows:

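To extract the useful information from complex samples without omission, the sample data must first be processed. Suppose a given dataset contains $m$ samples; sampling it $m$ times with replacement generates a training set of $m$ samples. In this process, the probability that a given sample is selected in one draw is $1/m$, so the probability that it is never selected in $m$ draws is

$$\left(1 - \frac{1}{m}\right)^{m}, \qquad \lim_{m \to \infty} \left(1 - \frac{1}{m}\right)^{m} = \frac{1}{e} \approx 0.368$$

Among them, $m$ is the number of samples in the dataset.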
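The sampling with replacement described above can be illustrated with a short Python sketch. It draws a bootstrap sample and compares the observed share of samples that were never drawn with the theoretical value $(1 - 1/m)^m$. The helper name `bootstrap_sample`, the dataset size and the fixed seed are assumptions chosen only for this illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement; also return the indices never drawn."""
    m = len(data)
    picked = [rng.randrange(m) for _ in range(m)]
    out_of_bag = sorted(set(range(m)) - set(picked))
    return [data[i] for i in picked], out_of_bag

m = 1000
rng = random.Random(0)            # fixed seed so that the run is repeatable
sample, oob = bootstrap_sample(list(range(m)), rng)
print(len(oob) / m)               # empirical share of samples never selected
print((1 - 1 / m) ** m)           # theoretical value, close to 1/e
```

The samples that are never drawn are often kept aside as a check set, since they play no part in building the bootstrap sample.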

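In addition, the adjusted Rand index (ARI) is calculated as follows:

$$\mathrm{ARI} = \frac{\sum_{i,j} \binom{n_{ij}}{2} - \left[\sum_{i} \binom{a_i}{2} \sum_{j} \binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_{i} \binom{a_i}{2} + \sum_{j} \binom{b_j}{2}\right] - \left[\sum_{i} \binom{a_i}{2} \sum_{j} \binom{b_j}{2}\right] \Big/ \binom{n}{2}}$$

Among them, $n$ is the number of clustered objects, $n_{ij}$ is the number of objects assigned to cluster $i$ of the first partition and to cluster $j$ of the second partition, and $a_i = \sum_j n_{ij}$ and $b_j = \sum_i n_{ij}$ are the corresponding cluster sizes.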
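A minimal sketch of the adjusted Rand index in its standard pair-counting form is given below. The function name `adjusted_rand_index` and the label lists are illustrative assumptions; the sketch expects two label lists of equal length with at least two objects.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions given as label lists."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))             # contingency table entries
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)          # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:                              # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # same grouping, renamed labels
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # groupings that cut across each other
```

Because the index only counts pairs of objects, it does not depend on how the clusters are named, which makes it suitable for comparing clusterings obtained from different samples.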

Assume that the data set contains n samples described by several variables.

(a) Use the bootstrap method to extract n samples from the original data set to obtain the data set S.

(b) Normalise the data set S.

(c) Calculate the distances between the variables of S.

(d) Cluster the variables of S with hierarchical clustering.

(e) Repeat (a)–(d) to calculate the average of the Rand index.

(f) Determine the optimal number of variable categories.

(g) Normalise the original data set and calculate the distance D for the variables.

(h) Use hierarchical clustering to cluster the variables into the optimal number of categories.
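The steps above can be sketched end to end in Python. This is a simplified illustration under several assumptions that the text does not confirm: the distance between two variables is taken as 1 − |r| with r the correlation coefficient, clustering uses single linkage, the plain (unadjusted) Rand index is used for brevity, and each bootstrap clustering is compared with the clustering of the full data set. All function names, parameters and the synthetic data are hypothetical.

```python
import math
import random
from itertools import combinations

def normalise(column):
    """Z-score one variable (assumes the variable is not constant)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

def variable_distances(rows):
    """Distance 1 - |r| between every pair of variables (columns of rows)."""
    cols = [normalise(list(c)) for c in zip(*rows)]
    n = len(rows)
    return [[1 - abs(sum(a * b for a, b in zip(u, v)) / n) for v in cols] for u in cols]

def single_linkage_labels(dist, k):
    """Single linkage: join the closest pairs (union-find) until k groups remain."""
    parent = list(range(len(dist)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    groups = len(dist)
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(len(dist)), 2))
    for _, i, j in edges:
        if groups == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            groups -= 1
    return [find(i) for i in range(len(dist))]

def rand_index(a, b):
    """Share of variable pairs on which two labelings agree (unadjusted)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def choose_group_count(rows, candidates, repeats=30, seed=0):
    """Return the candidate k with the highest mean Rand index and the final labels."""
    rng = random.Random(seed)
    full = variable_distances(rows)                       # distances on the original data
    scores = {}
    for k in candidates:
        reference = single_linkage_labels(full, k)
        total = 0.0
        for _ in range(repeats):
            sample = [rng.choice(rows) for _ in rows]     # (a) bootstrap sample S
            dist = variable_distances(sample)             # (b), (c) normalise, distances
            labels = single_linkage_labels(dist, k)       # (d) cluster the variables
            total += rand_index(reference, labels)
        scores[k] = total / repeats                       # average Rand index for k
    best = max(scores, key=scores.get)
    return best, single_linkage_labels(full, best)

# Synthetic data: variables 0 and 1 move together, as do variables 2 and 3.
gen = random.Random(1)
rows = []
for _ in range(200):
    x, y = gen.gauss(0, 1), gen.gauss(0, 1)
    rows.append([x, x + 0.1 * gen.gauss(0, 1), y, y + 0.1 * gen.gauss(0, 1)])
best_k, labels = choose_group_count(rows, candidates=[2, 3])
print(best_k, labels)
```

The number of groups is chosen by stability: a grouping that reappears in most bootstrap samples scores a high average Rand index, while a grouping that depends on which samples were drawn scores lower.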

The data for this paper come from the manual sorting system currently in operation and include customer orders, inventory status, product variety, product specification, product weight and other information. We run simulations under two conditions: one assumes that the variables are assigned uniformly to the groups, and the other assumes that they are not. In addition, based on the six inter-cluster distance calculation methods and their clustering effect, this paper uses the median method and the single linkage (single connection) method for clustering and chooses the number of clusters in line with the needs of practical work. The validity of the classification results is then verified. Finally, an optimal item allocation method is selected based on the user-defined range of the number of partitions.

The classification results are verified with the Fisher distance. According to actual needs, the number of partitions Z is limited to 2 ≤ Z ≤ 5. In the figures, a red line marks the segmentation benchmark, the ordinate D is the distance between clusters, and the horizontal axis G represents the item information. Different colours distinguish the clusters obtained under different segmentation conditions. For the clustering results of the single linkage method, two segmentations are finally obtained: two partitions and three partitions.

As can be seen from Figures 4 and 5, with the single linkage method the segmentation value is F3 = 5.8 for three partitions and F4 = 4.66 for four partitions. With this classification method, the item allocation distance is more advantageous. With the median method, the segmentation value is F3 = 2.94 for three partitions and F4 = 2.35 for four partitions. With this classification method, the results of item distribution are relatively uniform.

By comparing the clustering and segmentation results of the two distance calculation methods, it is concluded that each has its advantages and disadvantages in terms of logistics distribution time cost and package distribution accuracy. Combining the characteristics of the data, the conclusions about the median method and the single linkage method are summarised as follows:

Analysing the internal compactness of clusters, the distance between clusters, and the distribution of samples shows that the single linkage clustering results match the needs of real data better than those of the median method and that its allocation time is shorter; at the same time, the sample distribution of the median method is more accurate.

In terms of cluster compactness, the median method clusters the sample data from practical applications more densely, but the compactness differs noticeably from cluster to cluster; it also supports more ways of partitioning.

In terms of the distance between clusters, the single linkage method performs better: its cluster assignment is clearer, whereas the distances between clusters produced by the median method are less pronounced.

In terms of item distribution, the median method distributes items more evenly across the partitions.

