Abstract:
Paddy yield prediction has become increasingly critical in the context of global food security and
climate variability. Artificial intelligence (AI) and machine learning (ML) provide powerful tools
for improving prediction accuracy, but selecting and interpreting input features remain
challenging due to complex agro-environmental interactions. This review aimed to (1) categorize
critical features used in AI-based models, (2) assess the influence of various parameters on paddy
yield predictions and (3) evaluate methodologies applied in feature selection and sensitivity
analysis. A structured review of 89 peer-reviewed studies from Google Scholar was conducted
using PRISMA guidelines, with data extracted to address four research questions. The review
found that the most commonly used features fell into eight categories: climatic, soil, crop
phenotypic, remote sensing, management, geospatial, temporal and stress/environmental
factors. Rainfall and temperature were the most frequently used meteorological inputs,
appearing in over 90% of studies. Evapotranspiration and cumulative rainfall were especially
impactful in water-stress contexts. The importance of meteorological features varied by crop
season, region, and irrigation practices; rainfall dominated in rain-fed systems, while temperature
or vegetation indices were more influential in irrigated settings or during later crop stages. A
wide range of feature selection and sensitivity analysis techniques was applied, including
correlation-based methods (Pearson, Spearman), statistical techniques (stepwise regression,
t-tests, RFE, GAM), and model-intrinsic scoring tools (Gini index, SHAP, attention weights).
Performance impact methods such as LOOCV, feature shuffling and ablation studies were also
used, along with dimensionality reduction (PCA) and optimization algorithms (GA, PSO, SCA). Key
challenges included high dimensionality, feature redundancy, temporal-spatial variability, poor
data quality, lack of model interpretability and inconsistent reporting. Best practices identified
included feature pre-screening (PCA), adopting temporally aware models like LSTM, applying
explainable AI tools (SHAP, LIME, PDP), combining expert judgment with algorithmic selection,
and ensuring standardized, interpretable reporting for better reproducibility and
decision-making in sustainable agriculture. These insights underscore the need for integrative, explainable
and context-specific AI approaches to enhance the reliability of rice yield forecasting and support
evidence-based decisions for climate-resilient, sustainable agriculture.
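
As an illustration of the feature-shuffling approach mentioned among the performance-impact methods, the following is a minimal sketch, not a method taken from any of the reviewed studies. It scores each input by the mean increase in prediction error when that input's values are randomly permuted. The data are synthetic, and `toy_model`, `permutation_importance` and the third input `unrelated` are hypothetical names introduced only for this example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when a single feature column is shuffled.

    predict: callable mapping one feature row (a list) to a prediction.
    X: list of feature rows; y: list of observed targets.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted_error = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical yield model (assumption): ignores the third input."""
    rainfall, temperature, unrelated = row
    return 2.0 * rainfall + 0.5 * temperature


# Synthetic, unit-scaled inputs (assumption: not real agronomic data).
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
for name, score in zip(["rainfall", "temperature", "unrelated"], scores):
    print(f"{name}: {score:.4f}")
```

In this toy setting the ranking simply reflects the coefficients chosen for `toy_model`; with a fitted model and field data, the same procedure would instead reflect whatever dependence the model has learned.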