SLM–IoT Integration for Natural Farming

“If the technology decreases the operational (input) cost and complexity of the smallholder – consider it – else send it back to the drawing board”

– Gram Disha Trust Agritech Maxim

Himachal Pradesh is an emerging state in natural farming. Over 2.23 lakh farmers across all panchayats have already adopted this approach, and the state aims to bring 9.61 lakh farmers on board. This shift promotes environmentally friendly and cost-effective agriculture.

However, farmers are facing a primary challenge of pest management. Currently, farmers receive pest management advice only after the visible damage has occurred. To address this issue, we need a predictive model that can forecast potential damage before it occurs. Artificial Intelligence (AI) based predictive and advisory models hold potential for such a solution. However, the models are also with limitations, specifically with regard to Hallucination and potential false positive detections. To reduce the chance of such an occurance it is important that the models are trained with the Data (historical and current) which is localised to the geography of application.

Recently, interns at Gram Disha Trust also undertook a primary survey with smallholder farmers on appropriateness of Internet of Things (IoT) solutions.

The survey revealed interesting aspects on AgriTech implementations at large, including AI. Therefore it is of interest in this study and Proof of Concept (PoC) that we analyze the possible current data generation through IoT and analyze historical data for suitable predictive advisory generation for the producers.

In this solution we hope to create a Proof of Concept (PoC) of such an AI based model which may be trained of past data, knowledge base on Natural Farming and linked to IoT sensors to get latest data for predictive analysis of pest and diseases. The model should then be able to generate suitable advisories based on Agroecological Production systems to allow farmers to take pre-emptive steps for addressing risks to crop production. We are also attempting to understand if such solutions may be developed into a Digital Public Infrastructure or Goods (DPI/DPG), which may be available for further enhancement in the future by others in the public domain.

The Current Landscape of AI in Indian Agriculture

The IEEE Spectrum monthly magazine published an article regarding “AI Is Driving India’s Next Agricultural Revolution” and reported that AI in agriculture has benefited the farmers by making higher crop yields and productivity by informing them about when and how much to water, fertilize, and rely on insect-pest and disease control, rather than relying on fixed calendars. The increase in the reach of smartphones and 4G connectivity has helped make AI tools accessible in remote areas. However, the cost and complexity of these are still quite high.

https://spectrum.ieee.org/ai-agriculture

Even after technical advancements, few of the major problems still remain:

Cost/Affordability: Smallholder farmers still cannot afford AI tools or services. Even though advisory is made free with the help of AI, the implementation will require investment in infrastructure.
Access to Technology: Even after 4G/5G connectivity and smartphone adoption, the connectivity in rural areas, which have low/no internet connectivity, remains a challenge in many places.
Trust and Reputation: Farmers are cautious of new technologies, and they want proof that the technologies actually work. This proof should especially come from fellow trusted farmers and reputable sources. They avoid trusting blindly, as it might harm their yield.
Usability of advice: Farmers receive AI-generated advice they may not know how to act on. Generic recommendations are not enough, and more context-specific and localized advice is needed.

There are inherent challenges in the way AI solutions are structured – especially through complex and expensive implementations – that results in additional barriers to implementation. The issue of trust and reliability is also a complex domain to address.

R. Agarwal, I. Bhardwaj, A. K. Sharma, A. Sanghi and G. Agarwal, “Innovations in Agri-Tech: A Review of Artificial Intelligence Applications and Challenges in Modern Agriculture,” 2024 Second International Conference on Advanced Computing & Communication Technologies (ICACCTech), Sonipat, India, 2024, pp. 599-604, doi: 10.1109/ICACCTech65084.2024.00101.

AI in Agriculture has promises but are not without its cost and complexities especially for smallholders.

It is also a question that the speed at which the technologies tends to spread, quite often through biased ventures, comes at a cost of the smallholders time and expense. It remains a question –

Why should Smallholders bear the risks and failures of Agritech interventions?

To develop appropriate solutions which are suitable for Smallholders, specifically in the Himalayan Region, This Proof of Concept is being attempted. As with our Maxim, the cost-complexity component of Technology is kept at the fore while developing this innovation.

Reducing Hallucinations with Domain-Specific SLMs

The Large Language Models (LLMs) are a powerful way for the analysis and organization of information as well as the ability to interact with technology more naturally.

Models such as ChatGPT have been trained on general internet data, which may mix conventional and natural farming information. This may lead to the generation of wrong information or misleading answers. The major risk is that the models may hallucinate by providing inaccurate information or incorrect outputs confidently. Even fully fine-tuning an LLM will need high computational power, which is costly.

SLM, on the other hand, being lightweight, is easy to retrain or fully fine-tune. This approach allows for a reduction in hallucinations if SLM is trained on domain-specific data.

A Steep Climb for Agri-Tech Adoption in the Himalayas

In context of Himachal Pradesh, these generic hurdles bring on-the-ground difficulties.

Connectivity: Being a hilly region, Himachal doesn’t have reliable internet connectivity, mobile network coverage, and power supply in rural areas and remote valleys. This lack of connectivity becomes a major bottleneck, as many AI tools depend upon real-time data and smartphone access.
Small and Fragmented Landholdings: Hilly terrain across Himachal makes hardware more costly and harder to deploy efficiently, as farms are very small and more scattered.
High Upfront Cost: Investing in IoT and other hardware devices is unaffordable for smallholder farmers who rely on seasonal crops.
Localization of AI Models: Currently, the models made are trained on a plain-land crop dataset. Crops that are grown on hills need localized data and a model built on that dataset, which for now is currently scarce.

A Localized and Affordable Solution

The article published by Gram Disha Trust discusses the IoT device being built, which will help in reducing these challenges. The survey done told us the acceptable cost limits, which helps design appropriate devices. Though smallholder farmers want devices ranging from ₹2,000–₹5,000 (~US$24–60), they may still be unaffordable for many marginal/poor farmers but will help many small farmers to an extent. The use of historically validated data will help reduce the full reliance on the always-online models. This is done by using local/natural farming advisory, which can use historical and locally validated data. The survey done in varying altitudes will help build the tool to work in all specific climate conditions, which addresses the issue of localization and usability well.

POC: Fine-Tuning SLM with Natural Farming Data

A proof of concept (POC) is being developed using the small language model, such as the google/gemma-3-1b-it version. The SLM is being trained on domain-specific data related to natural farming. Being trained on this data, the model will hallucinate less and provide more accurate results as compared to any other LLM. The model won’t suggest anything outside of the natural farming domain. Moreover, the model’s context awareness will be better.

The approach currently taken is to convert unstructured data into JSONL format for supervised full fine-tuning of the model. The GitHub code provided is to build the baseline model. Currently the model is being fine-tuned on a limited amount of data, due to which it will hallucinate more, but with the increase in the volume of data in the future, the hallucination will decrease to an extent.

From Data to Decisions: SLM–IoT Integration

The crucial step will involve integrating the SLM with the IoT devices that continuously monitor all relevant data and transform the statistical knowledge into a predictive decision support tool. This will help deliver the early warnings of the pest outbreaks.

This will help farmers shift from a reactive approach, where farmers intervene when the damage is visible, to a proactive approach, where early warnings can help to take preventive measures and help reduce crop losses.

What Actually Makes a Model an SLM?

A Survey paper published acknowledged that there is no single, universally agreed definition of Small Language Models (SLMs). It tells us that defining an SLM on the basis of the number of parameters is insufficient. This is because the term “small” is a relative term, and it will usually depend on the context of use.

Sub-billion parameter models have been considered small, as they can run on devices which have around 6 GB of memory.

Whereas models with up to 10 billion parameters are considered small, as they will generally not exhibit the broad “emergent abilities”(learning many different tasks at once)” that are found in any massive model.

The authors proposed defining SLMs based on the function they have and the environment they can run in.

Lower bound, which is the minimum size required by the model to perform a specific task effectively; below this size, the model will be weak.
Upper bound, largest model size which the run is effective for given the limited resource constraints.

Based on this, we are using models with less than 2B parameters. By choosing a less than 2B parameter architecture, we are targeting strict hardware limits like mobile devices.

BERTScore (Testing the Models)

BERTScore is basically used to evaluate the semantic similarity between the generated and reference text. Unlike the BLEU/ROUGE scores, which are token-based metrics, BERTScore captures contextual meaning and is better suited for domain-specific question answering where multiple valid phrasings exist.

For VARTA-AI we generated the BERTScore for multiple xLM. It was noted that the Llama model was best suited for interpretive conversational response. However, the drawback is that model does not match the grammatical sentencing or Lexical correctness of sentences for response. For now, we have chosen the google/gemma-3-1b-it as the initial implementation for the framework development.

The difference across models is less that tells us that the semantic correctness in preserved regardless of the model size. Moreover it tells us that instruction tuning plays an imporatant role more than the parameter count alone.

To know more on how to clone the repo and try the variations of these xLMs and localise a quick Proof of Concept have a look at the README on Github as below.

The Synergy of IoT, AI, and SLM

Integration of IoT (Internet of Things), AI, and SLM (Small Language Model) will help transform the raw data into meaningful insights and personalized advisories for farmers.

Proactive monitoring

In the proactive monitoring, the IoT will be collecting real-time environment data through MQTT, which will be stored in a central database. Next, the AI agent will be connected to a predictive model, which will perform the predictive pest management and detect the environmental changes that may impact the crop.

If significant change is detected, the system will notify the farmer immediately with the recommended solution so that the farmer can take preventive action.
If there is no change in the environment, then the agent will only retrieve the basic information, such as temperature, humidity, etc.

This will help farmers to take action before an issue has occurred, helping them prevent crop loss.

Reactive Interaction

In reactive interaction, the farmer will submit a request for information or advisory, such as pest management advice, crop recommendations, etc., helping them in decision-making.

The reactive agent will be connected with the SLM to provide context-aware advisory, and the other work of the agent will be to provide the latest analyzed sensor data for advisory.

The SLM retrieves insights from real-time data and generates a personalized advisory message. Then the insights will be sent back to the interface.

However, to create a reliable system, the architecture must remain flexible, and continuous improvement is needed. Therefore, the architecture might change in the future when implemented due to the real-world complexities.