Abstract
The natural language to SQL (NL2SQL) task enables non-expert users to interact with relational databases through natural language interfaces. However, NL2SQL frameworks often rely on Large Language Models (LLMs), which raises concerns about computational overhead, data privacy, and deployment in resource-limited environments. To address these issues, we propose a hybrid schema-aware agentic system that uses Small Language Models (SLMs) as primary agents, with a selective LLM fallback mechanism. The LLM is invoked only when errors are detected in SLM-generated queries, which reduces inference costs. On the BIRD development set, our system achieves an execution accuracy of 53.91% and a valid efficiency score of 50.46%. Although this accuracy is lower than that of state-of-the-art LLM-only systems such as MAC-SQL (59.59% execution accuracy), the hybrid approach reduces query processing cost by a factor of three compared with LLM-only frameworks. These findings indicate that the hybrid method offers a compelling trade-off: competitive accuracy at substantially lower cost than LLM-driven baselines.
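The selective fallback described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how errors are detected, so this example assumes the simplest possible criterion: a query counts as erroneous if it fails to execute against the database. The names `generate_sql`, `executes_cleanly`, `stub_slm`, and `stub_llm` are hypothetical and stand in for real model calls; they are not taken from the paper.

```python
import sqlite3
from typing import Callable, Tuple

# Hypothetical generator signature: (question, schema) -> SQL text.
Generator = Callable[[str, str], str]


def executes_cleanly(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the query runs without raising a database error.

    Assumption: "error detected" means "execution fails". The paper may
    use a richer check.
    """
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def generate_sql(
    question: str,
    schema: str,
    conn: sqlite3.Connection,
    slm: Generator,
    llm: Generator,
) -> Tuple[str, str]:
    """Try the small model first; call the large model only on failure."""
    candidate = slm(question, schema)
    if executes_cleanly(conn, candidate):
        return candidate, "slm"
    # Fallback path: the expensive model is used only for failed queries.
    return llm(question, schema), "llm"


# Stub models for illustration only; real agents would call language models.
def stub_slm(question: str, schema: str) -> str:
    if "names" in question:
        return "SELECT nme FROM singer"  # deliberate typo in the column name
    return "SELECT COUNT(*) FROM singer"


def stub_llm(question: str, schema: str) -> str:
    return "SELECT name FROM singer"


SCHEMA = "CREATE TABLE singer (name TEXT, age INTEGER)"
```

In this sketch, a question whose SLM query executes successfully never reaches the LLM, which is the source of the cost savings; only the query with the misspelled column triggers the fallback.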
Advisor
Naseef Mansoor
Committee Member
John Burke
Committee Member
Rajeev Bukralia
Date of Degree
2025
Language
English
Document Type
APP
Degree
Master of Science (MS)
Program of Study
Data Science
Department
Computer Information Science
College
Science, Engineering and Technology
Recommended Citation
Onyango, D. O. (2025). Hybrid agentic system for schema-aware NL2SQL generation [Master’s alternative plan paper, Minnesota State University, Mankato]. Cornerstone: A Collection of Scholarly and Creative Works for Minnesota State University, Mankato. https://cornerstone.lib.mnsu.edu/etds/1559/