A Smart Irrigation System Using the IoT and Advanced Machine Learning Model
Article category: Article
Published online: 24 Feb 2025
Pages: 13 - 25
Received: 28 Jul 2024
Accepted: 20 Aug 2024
DOI: https://doi.org/10.2478/jsiot-2024-0009
Keywords
© 2024 Upendra Roy B.P et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
The increasing global demand for efficient water management in agriculture has driven the design of innovative smart irrigation techniques. The integration of IoT technologies with advanced ML models has revolutionized traditional irrigation practices by enabling real-time monitoring and precise control of water resources [1–3]. However, existing irrigation models often exhibit limitations, including lower accuracy and inadequate adaptability to diverse environmental conditions. These challenges highlight the necessity for robust, scalable, and efficient solutions to optimize water usage and ensure sustainability in agricultural operations [4].
In this study, we propose a Smart Irrigation System that uses IoT and an advanced ensemble ML framework to address these limitations. The system employs an ensemble of Decision Tree Classifier (DTC) and Random Forest Classifier (RFC) algorithms to analyse critical environmental parameters such as soil moisture, temperature, pH levels, and soil variants [5–7]. By processing data from the Great Time dataset, the model is trained to deliver precise irrigation recommendations tailored to varying environmental conditions. The proposed framework achieves an accuracy of 98.7%, outperforming traditional methods while maintaining computational efficiency.
Traditional irrigation systems rely on rule-based methods or standalone models, which often fail to adapt to the dynamic nature of environmental changes [8–10]. These systems are resource-intensive and limited in scalability, making them unsuitable for large-scale or resource-constrained farming environments [11]. To overcome these challenges, the proposed IoT-enabled system integrates real-time data collection, advanced feature analysis, and ensemble learning techniques, ensuring optimal irrigation scheduling and sustainable water management.
The research highlights the transformative potential of combining IoT and machine learning technologies to enhance agricultural productivity and sustainability [12]. The following sections provide an outline of the recommended methodology, including the preprocessing pipeline, the ensemble learning architecture, and the experimental validation. Results demonstrate the system's robustness, scalability, and adaptability, positioning it as a viable solution for addressing water scarcity and improving crop productivity in diverse agricultural landscapes.
The research introduces a novel IoT-enabled Smart Irrigation System, leveraging an ensemble of DTC and RFC algorithms for precise irrigation scheduling and water management.
The system is rigorously evaluated using a real-time dataset, which includes key parameters like soil moisture, temperature, pH value, and soil variants, achieving superior predictive accuracy and adaptability compared to existing irrigation methods.
Comprehensive experiments are conducted with detailed performance evaluations using metrics such as accuracy, precision, recall, and F1-score, demonstrating the algorithm’s robustness, scalability, and suitability for real-world agricultural applications.
The remaining sections are organized as follows. Section 2 discusses related work and explores existing approaches to smart irrigation and water management. Section 3 explains the foundational concepts of IoT-enabled systems and the ensemble methodology using DTC and RFC, highlighting their integration into the proposed architecture. Section 4 describes the real-time dataset used, outlines the preprocessing methods, and discusses the experimental setup along with an in-depth analysis of the outcomes. Finally, Section 5 concludes the study by summarizing key findings and presenting potential future enhancements for improving the efficiency and sustainability of IoT-based smart irrigation systems.
Risheh et al. (2020) [13] introduced a transfer learning-based technique for IoT-enabled smart irrigation. By repurposing pre-trained neural networks, the model minimized the need for large labeled datasets, achieving competitive performance with reduced training requirements. Its advantages were reduced training time and effective application in data-scarce environments. Its main limitation was that the dependency on pre-trained models restricts customization for highly localized conditions. The study highlighted the potential for federated learning to address privacy and data-sharing concerns.
Karar et al. (2020) [14] introduced an IoT-based neural network for controlling water pumps in smart irrigation setups. The system dynamically adjusted water delivery based on soil moisture and environmental conditions. It achieved efficient water utilization and improved crop health through automated control mechanisms, but it faced challenges in scaling to large agricultural operations and depended on consistent internet connectivity. The authors identified decentralized IoT architectures as a direction for future research to enhance resilience and scalability.
Kashyap et al. (2021) [15] implemented an IoT-enabled intelligent irrigation system using a deep learning neural network (DLNN) architecture. Their system optimized water allocation based on real-time environmental data. It offered high accuracy in irrigation prediction, significant water savings, and adaptability to dynamic climatic conditions. However, resource-intensive neural networks may hinder deployment on low-power IoT devices. The authors suggested integrating edge computing to mitigate these limitations and enhance system scalability.
Aydin et al. (2021) [16] developed an artificial intelligence-powered irrigation system integrating IoT sensors for automated water management. Their model employed ML methods to predict crop water needs based on real-time sensor data. The system enhanced automation and reduced manual intervention, making it suitable for small and medium-sized farms. Its performance was limited under extreme environmental variations, which calls for further refinement of the predictive algorithms. The research proposed incorporating weather prediction models to improve accuracy.
Gao et al. (2021) [17] developed a deep bidirectional LSTM approach integrated with IoT systems for predicting soil moisture and electrical conductivity in citrus orchards. Their approach leveraged IoT devices for real-time data acquisition, ensuring precise predictions tailored to crop requirements. It was effective in capturing temporal dependencies for accurate moisture predictions, enabling targeted irrigation scheduling. However, the system's dependence on high-quality IoT sensor data might limit its usability in regions with limited infrastructure. The study emphasized integrating additional environmental factors to improve prediction robustness.
Kurtulmuş et al. (2022) [18] developed a deep learning framework aimed at enhancing soil sensor data accuracy, enabling efficient irrigation decisions. Their model focused on proximal soil sensor optimization using convolutional neural networks (CNNs). It improved prediction accuracy and resource efficiency, making it suitable for small-scale precision agriculture. Its scalability to broader agricultural landscapes is limited without adjustments for diverse soil conditions. The research proposed further exploration of hybrid deep learning models for greater adaptability.
Bai and Tahmasebi (2023) [19] introduced a graph neural network (GNN)-based framework for groundwater level forecasting. Their model capitalized on the ability of GNNs to represent spatial relationships among sensor nodes effectively, enhancing prediction reliability. Its superior modeling of spatial dependencies makes it particularly suitable for large-scale groundwater management. However, high training complexity and potential overfitting in sparse data scenarios require further refinement. The authors proposed integrating ensemble methods with GNNs for improved generalization in diverse geographical contexts.
Deforce et al. (2024) [20] explored the integration of transformers and data fusion techniques in smart irrigation systems to improve prediction accuracy. Their model harnessed transformer-based architectures for analysing complex environmental datasets, combined with data fusion for real-time soil and crop condition monitoring. The approach demonstrated strong scalability and adaptability across diverse agricultural environments, with high accuracy in predicting irrigation requirements and effective handling of multi-source data. However, the high computational demands of transformers might challenge deployment in resource-constrained environments, necessitating optimizations for real-time use. Future directions included lightweight transformer models to balance performance with efficiency.
The system utilizes real-time data collected from a range of IoT sensors deployed across the agricultural field. These sensors include soil moisture sensors, pH sensors, temperature sensors, and light intensity sensors, which capture critical environmental parameters that directly influence plant health and water requirements. Specifically, the recorded data attributes include soil moisture, which indicates the water content in the soil and serves as a key factor in determining irrigation needs; soil pH, which provides information on the soil’s acidity or alkalinity and impacts plant growth and water retention; soil type/variant, which classifies the soil according to its texture and composition (such as sandy, loamy, or clay) and affects its water retention capacity; ambient temperature, which influences evaporation rates and plant transpiration, thereby affecting irrigation scheduling; and light intensity, which impacts plant photosynthesis and offers insights into the growth cycle and water needs of crops.
Data preprocessing is a key step in preparing raw sensor data for analysis. It includes cleaning, transforming, and structuring the data to enhance its quality, handle missing or inconsistent values, and ensure compatibility with machine learning models. The following preprocessing techniques were applied to the numeric and categorical features in the dataset:
Missing data points were handled using imputation methods to ensure the dataset is complete. For numeric features such as soil moisture, temperature, and pH, missing values were imputed utilising the mean or median of the respective columns, depending on the distribution of the data.
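As a rough illustration of the imputation step, the sketch below fills gaps in a single numeric column with either the mean or the median of the observed values. The function name `impute_missing`, the use of `None` to mark a gap, and the sample readings are all assumptions made for this example; the paper does not specify its implementation.

```python
from statistics import mean, median

def impute_missing(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical soil-moisture readings (percent) with two missing entries.
soil_moisture = [31.0, None, 28.5, 40.0, None, 33.5]
imputed_median = impute_missing(soil_moisture, "median")
imputed_mean = impute_missing(soil_moisture, "mean")
```

The median is usually the safer choice for skewed columns, since a few extreme readings shift the mean more than the median.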
Since the dataset includes features with varying scales (e.g., temperature, soil moisture, and light intensity), normalization was applied to bring all features to a similar scale. The Min-Max scaling technique was used, which rescales the data to a range between 0 and 1. This ensures that no single feature dominates the learning process and allows ML models to perform effectively.
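Min-Max scaling maps each value x to (x - min) / (max - min), so the smallest value in a column becomes 0 and the largest becomes 1. The following minimal sketch applies this to two hypothetical sensor columns; the helper name `min_max_scale` and the sample values are illustrative assumptions, not taken from the paper.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1] using x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical readings on very different scales.
temperature_c = [18.0, 24.0, 30.0, 36.0]
light_lux = [200.0, 5000.0, 12000.0, 800.0]

scaled_temperature = min_max_scale(temperature_c)
scaled_light = min_max_scale(light_lux)
```

After scaling, both columns span the same range, so light intensity no longer outweighs temperature purely because of its units.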
After preprocessing, the dataset was divided into training and testing subsets to assess the algorithm’s effectiveness. The training set, which was used to train the machine learning algorithm, comprised 80% of the data, while the remaining 20% was reserved for testing. This split ensures that the model is trained on a large portion of the data and validated on a separate portion to check its generalization ability.
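The 80/20 split can be sketched in a few lines. The version below shuffles with a fixed seed so the partition is reproducible; the function name `split_train_test`, the seed value, and the stand-in records are assumptions for illustration only.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 preprocessed sensor records.
records = list(range(100))
train_rows, test_rows = split_train_test(records)
```

In a scikit-learn workflow, `sklearn.model_selection.train_test_split` with `test_size=0.2` serves the same purpose.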
In this study, an ensemble model combining RFC and DTC is proposed to predict optimal irrigation schedules for a smart irrigation system based on real-time sensor data. These models were chosen for their capability to manage complex relationships in the data and their effectiveness in classification tasks. The ensemble approach enhances performance by leveraging the strengths of both individual models.
The random forest classifier (RFC) is an ensemble learning approach that operates by constructing multiple decision trees and combining their outputs to make a final prediction. This technique is particularly powerful when handling high-dimensional data, such as the variety of environmental factors in a smart irrigation system.
The RFC model consists of several individual decision trees, each trained on a random subset of the dataset using bootstrapping (sampling with replacement). Each tree in the forest is trained to make predictions based on different features and observations, ensuring that the model doesn't overfit to any specific pattern in the data. The final classification result is obtained by aggregating the predictions of all the decision trees, typically using majority voting for classification problems.
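The two mechanisms described above, bootstrapping and majority voting, can be shown in isolation. This is a toy sketch rather than a full random forest: no trees are trained, and the helper names and the "irrigate"/"skip" labels are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))                 # indices of 10 training records
sample = bootstrap_sample(rows, rng)   # some indices repeat, some are left out

# Hypothetical predictions from five trees for one sensor reading.
tree_votes = ["irrigate", "skip", "irrigate", "irrigate", "skip"]
final_label = majority_vote(tree_votes)
```

Because each tree sees a different bootstrap sample, the trees make partly independent errors, which is what makes the vote more reliable than any single tree.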
For each decision tree, RFC uses a splitting criterion (such as Gini impurity or Information Gain) to recursively divide the data at each node. At every split, the algorithm selects the attribute that most effectively partitions the dataset into distinct categories. This process continues until a predetermined stopping condition is satisfied, such as reaching the maximum tree depth or the minimum number of instances per terminal node.
The primary benefits of RFC include:
By averaging the predictions of many decision trees (DT), RFC reduces the risk of overfitting, which is common in a single decision tree. RFC provides a useful measure of feature importance, which can help identify which environmental parameters (e.g., soil moisture, pH, temperature) most influence irrigation scheduling. RFC can handle large datasets and high-dimensional data efficiently, making it suitable for real-time sensor data in smart irrigation systems.
Some key hyperparameters of the RFC model include:
the number of decision trees in the forest; the maximum depth of any individual tree; the minimum number of samples required to split an internal node; and the maximum number of features evaluated when searching for the best split.
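Since scikit-learn is among the libraries listed for this work, the four hyperparameters above plausibly map to the `RandomForestClassifier` arguments shown below. That mapping, the specific values, and the synthetic data are assumptions for illustration; the paper does not report its settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for five sensor features (not the paper's dataset).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

rfc = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_depth=8,           # maximum depth of each tree
    min_samples_split=4,   # minimum samples required to split an internal node
    max_features="sqrt",   # number of features considered at each split
    random_state=0,        # fixed seed for reproducibility
)
rfc.fit(X, y)

# Impurity-based importance of each feature; the values sum to 1.
importances = rfc.feature_importances_
```

The `feature_importances_` attribute is one way to see which inputs drive the predictions most strongly.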
The decision tree classifier (DTC) is a simple yet effective machine learning algorithm used for both classification and regression tasks. In this study, DTC is employed as a base model within an ensemble framework to classify irrigation needs based on sensor data.
A decision tree operates by recursively dividing the data into segments according to the values of the input attributes. At every node, the tree chooses the feature value that most effectively separates the data into distinct categories. This procedure continues until the tree meets a stopping condition, such as a defined maximum depth or a minimum number of samples in a node.
DTC uses criteria such as Gini impurity or Information Gain to decide how to split the data at each node. These measures assess the "impurity" of the data at a node and select the feature that provides the best separation of the classes.
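Both criteria are easy to compute by hand. For a node with class proportions p_k, Gini impurity is 1 - sum(p_k^2) and entropy is -sum(p_k log2 p_k); information gain is the parent's entropy minus the size-weighted entropy of the children. The sketch below uses hypothetical "irrigate"/"skip" labels to show a perfect split.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A balanced node that one split separates perfectly.
parent = ["irrigate"] * 4 + ["skip"] * 4
left, right = parent[:4], parent[4:]

parent_gini = gini(parent)
gain = information_gain(parent, left, right)
```

A balanced two-class node has the maximum Gini impurity of 0.5, and a split that yields pure children recovers the full one bit of entropy as information gain.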
One of the main benefits of decision trees is their transparency. The decision-making process is easy to interpret, making the model's predictions more understandable to non-experts.
Decision trees can handle non-linear relationships between features without requiring transformation or scaling of the data.
Decision trees can effectively handle datasets with a large number of features and categorical data.
Important hyperparameters of the DTC include:
the maximum tree depth; the minimum number of samples required to split an internal node; the minimum number of samples required at a terminal (leaf) node; and the maximum number of features considered when splitting a node.
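Assuming a scikit-learn implementation, these four hyperparameters would correspond to the `DecisionTreeClassifier` arguments below. The values and the synthetic data are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for five sensor features (not the paper's dataset).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

dtc = DecisionTreeClassifier(
    criterion="gini",      # splitting criterion ("entropy" is the alternative)
    max_depth=5,           # maximum tree depth
    min_samples_split=4,   # minimum samples required to split an internal node
    min_samples_leaf=2,    # minimum samples required at a leaf node
    max_features=None,     # consider all features at every split
    random_state=0,
)
dtc.fit(X, y)

fitted_depth = dtc.get_depth()
```

Tightening `max_depth` or raising `min_samples_leaf` is the usual way to curb the overfitting that a single unconstrained tree is prone to.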
The ensemble model proposed in this study combines the RFC and DTC to leverage their strengths and improve the overall classification accuracy for predicting irrigation needs. The ensemble approach aims to combine the advantages of individual models, reducing the likelihood of overfitting and increasing model robustness.
Hybrid RFC - DTC
The ensemble combines the RFC and DTC models using stacking or voting mechanisms. In stacking, the predictions of the RFC and DTC are used as inputs to a second-level classifier, which combines their outputs to make a final decision. In voting, the individual classifiers (RFC and DTC) each cast a vote for the predicted class, and the class with the most votes is chosen as the final output. This combination offers three benefits. First, it improves accuracy: by combining the predictions of both the RFC and the DTC, the ensemble can correct errors made by either model alone. Second, it is more robust to overfitting: a single DTC is prone to overfitting, and the RFC mitigates this by averaging over multiple trees. Third, the two models are complementary: the RFC is particularly good at handling complex, high-dimensional data, while the DTC is more interpretable and captures simpler patterns. Together, they provide a balanced approach to classification.
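A voting ensemble of this kind can be sketched with scikit-learn's `VotingClassifier`. The paper does not state which voting variant or settings were used, so the soft voting, the hyperparameter values, and the synthetic data below are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sensor features (not the paper's dataset).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

ensemble = VotingClassifier(
    estimators=[
        ("rfc", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("dtc", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities of both models
)
ensemble.fit(X_train, y_train)
test_accuracy = ensemble.score(X_test, y_test)
```

Soft voting is chosen here because, with only two base models, hard voting can tie, and scikit-learn then resolves the tie by class label order. For the stacking variant, `sklearn.ensemble.StackingClassifier` accepts the same `estimators` list plus a `final_estimator`.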
The ensemble algorithm’s effectiveness can be further optimized by tuning the hyperparameters of both the RFC and DTC models. This includes adjusting the count of trees in RFC, tree depth, and the splitting criteria in DTC. Grid search or random search methods can be used to find the optimal combination of hyperparameters for both classifiers.
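A grid search over a small set of candidate values might look like the sketch below, which uses scikit-learn's `GridSearchCV` with 3-fold cross-validation. The grid, the fold count, and the synthetic data are assumptions chosen to keep the example quick; the paper does not list the values it searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the sensor features (not the paper's dataset).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

# Hypothetical candidate values for two RFC hyperparameters.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, 6],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training data
    scoring="accuracy",
)
search.fit(X, y)
best_params = search.best_params_
```

The same pattern applies to `DecisionTreeClassifier`, and `RandomizedSearchCV` covers the random search alternative when the grid becomes too large to enumerate.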
Once trained, the ensemble model is integrated into the smart irrigation system. Real-time sensor data, such as soil moisture, pH, temperature, and light intensity, is fed into the model, which then predicts the optimal irrigation schedule. The ensemble approach ensures that the system’s predictions are accurate, efficient, and capable of adapting to different environmental conditions.
The model was implemented in Python 3.19, and libraries such as matplotlib, NumPy, pandas, scikit-learn, and seaborn were used to evaluate the proposed model. The experiments were carried out on a PC workstation with an i7 CPU at 3.2 GHz, 16 GB of RAM, and an NVIDIA Tesla GPU.
Several evaluation metrics were used to assess the effectiveness of the proposed ensemble of DTC and RFC. Accuracy is the proportion of correctly classified instances out of all instances in the dataset; it gives an overall sense of how well the model identifies irrigation requirements from environmental parameters. Precision is the proportion of correct irrigation recommendations among all recommendations made by the model, and so indicates its ability to avoid false positives. Recall is the fraction of actual irrigation needs that the model correctly identifies, and so reflects its ability to capture all relevant cases. The F1-score, the harmonic mean of precision and recall, balances the trade-off between the two and provides a single measure of effectiveness. This metric is particularly useful for imbalanced datasets, where accuracy alone can be misleading.
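These four metrics follow directly from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below computes them for a hypothetical confusion matrix; the counts are made up for illustration and are not the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Assumes at least one predicted positive and one actual positive,
    so none of the denominators is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for 100 test readings.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

With scikit-learn, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` return the same quantities from label arrays.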
Table 2 presents a comparative assessment of the different ML models. The proposed hybrid model outperforms the baseline techniques, including DTC, RFC, Support Vector Machine (SVM), Linear Regression, and Logistic Regression. With an accuracy of 98.7%, the proposed approach shows improved effectiveness across all metrics, highlighting its robustness on the real-time dataset.
Evaluation metrics utilized for the assessment.
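Sl. No. | Performance Measure | Expression |
---|---|---|
1 | Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ |
2 | Recall | $\frac{TP}{TP + FN}$ |
3 | Specificity | $\frac{TN}{TN + FP}$ |
4 | Precision | $\frac{TP}{TP + FP}$ |
5 | F1-Score | $\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ |
Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.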
Comparative Analysis between the Different Models
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
DTC | 93.5 | 90.8 | 89.7 | 97 |
RFC | 95.3 | 94.1 | 93.4 | 96.7 |
SVM | 94.8 | 94.3 | 96.6 | 93.9 |
Linear Regression | 93 | 91.4 | 95 | 95.5 |
Logistic Regression | 96.7 | 96.8 | 96.1 | 95.4 |
Proposed | 98.7 | 98.7 | 98 | 98.7 |
Figures 4 and 5 illustrate the comparative analysis of testing accuracy among the machine learning models. They highlight the consistent superiority of the proposed hybrid DTC-RFC model over the other models, and the ROC curve likewise shows a significant margin in performance.

Hybrid Architecture for the Recommended Approach

Random Forest Classifier Working Mechanism

Decision Tree Classifier Working Mechanism

Performance Metrics Visualization 1

Receiver Operating Characteristic (ROC) Curve
In the rapidly evolving field of agriculture, the effective use of water resources is critical for ensuring sustainability and productivity. This research presents a Smart Irrigation System that integrates IoT technologies with an advanced ensemble machine learning framework, leveraging DTC and RFC algorithms. The proposed system addresses key challenges in traditional irrigation practices by providing precise irrigation scheduling through the analysis of critical environmental parameters such as soil moisture, temperature, pH levels, and soil variants. Experiments on the Great Time dataset demonstrate the model’s effectiveness, with an accuracy of 98.7%. The findings underscore the algorithm’s ability to adapt to dynamic environmental conditions while preserving computational efficiency, making it a viable solution for diverse agricultural scenarios.
The framework can be further enhanced by integrating advanced optimization techniques and additional environmental parameters, such as weather forecasts and crop-specific requirements, to improve prediction accuracy. Incorporating edge computing and federated learning strategies would enable the system to operate securely and efficiently across large-scale agricultural setups while preserving data privacy. Future research can also explore the use of deep learning architectures and sensor data fusion to achieve even greater precision and adaptability in irrigation scheduling. These advancements would make the system more resilient to evolving agricultural demands and environmental conditions.