Generative AI for Analytics: Performing Natural Language Queries on Amazon RDS using SageMaker, LangChain, and LLMs

Learn to use LangChain’s SQL Database Chain and Agent with large language models to perform natural language queries (NLQ) of an Amazon RDS for PostgreSQL database

Gary A. Stafford
21 min read · May 31, 2023

To paraphrase analytics workflow product vendor Yellowfin, “Natural language query (NLQ), also known as natural language search, is a self-service business intelligence (BI) reporting capability that enables analytics users to ask questions of their data. It parses for keywords and generates relevant answers sourced from related databases, with results typically delivered as a report, chart, or textual explanation that attempts to answer the query and provide depth of understanding.”

Using LangChain’s SQL Database Chain and SQL Database Agent, we can leverage large language models (LLMs) to ask questions of an Amazon RDS for PostgreSQL database using natural language. Each question is converted into a SQL query and executed against the database. Assuming the generated SQL query is well-formed, the query results are then converted into a textual explanation. For example, we can ask questions like, “How many customers have purchased in the last 12 months?” or “What were the total sales in May?” These will be converted into SQL SELECT statements, like SELECT sum(amount) AS sales FROM purchases WHERE EXTRACT(MONTH FROM purchase_date) = 5 AND
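To make the flow above concrete, here is a minimal sketch of the question-to-SQL-to-answer loop using only Python's standard library. It does not use LangChain or a real LLM: the generate_sql function is a hypothetical stand-in that returns canned SQL, an in-memory SQLite database stands in for Amazon RDS for PostgreSQL, and the sample rows are made up for illustration. The purchases table and its amount and purchase_date columns are borrowed from the example statement above; SQLite's strftime replaces the PostgreSQL date filter.

```python
import sqlite3

QUESTION = "What were the total sales in May?"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases ("
        "id INTEGER PRIMARY KEY, amount REAL, purchase_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO purchases (amount, purchase_date) VALUES (?, ?)",
        [(100.0, "2023-05-03"), (50.5, "2023-05-17"), (20.0, "2023-04-30")],
    )
    return conn


def generate_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM: return canned SQL for a known question.

    A real setup would prompt a large language model with the table schema
    and the question. Unknown questions get deliberately malformed SQL here
    to show what happens when the generated query is not well-formed.
    """
    canned = {
        QUESTION: (
            "SELECT sum(amount) AS sales FROM purchases "
            "WHERE strftime('%m', purchase_date) = '05'"
        ),
    }
    return canned.get(question, "SELEKT nothing")


def ask(conn: sqlite3.Connection, question: str) -> str:
    """Convert the question to SQL, execute it, and phrase the result as text."""
    sql = generate_sql(question)
    try:
        row = conn.execute(sql).fetchone()
    except sqlite3.Error as exc:
        # The generated SQL was not well-formed, so there is no result to explain.
        return f"Could not run the generated SQL ({exc})."
    return f"Question: {question} Answer: {row[0]}"


if __name__ == "__main__":
    demo_conn = build_demo_db()
    print(ask(demo_conn, QUESTION))
    print(ask(demo_conn, "How many customers have purchased in the last 12 months?"))
```

The second call in the usage example hits the error branch on purpose, since the stand-in has no canned SQL for that question. In a real pipeline, that branch is where a malformed LLM-generated query would surface.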

Gary A. Stafford

Area Principal Solutions Architect @ AWS | 10x AWS Certified Pro | Polyglot Developer | DataOps | GenAI | Technology consultant, writer, and speaker