Transforming Data Insights with Snowflake's LLM Functions– Part 1

Kapil Singh Kumawat
Apr 07
773
0
5

Article

In this article, I will cover the capabilities of Snowflake Cortex’s Large Language Model (LLM) function, which can be utilized for a variety of tasks, such as sentiment analysis, text completion, and language translation.

Introduction

Snowflake has introduced a powerful suite of LLM functions under its Cortex platform that offers access to advanced LLMs from companies like Google, Meta, Mistral, and Anthropic. These models are hosted and managed by Snowflake, ensuring seamless integration and high performance without the need for extensive setup. With Snowflake, your data remains securely within the platform, ensuring you benefit from the expected performance, scalability, and governance.

Snowflake Cortex features are provided as SQL functions and are also available in Python.

Snowflake Cortex LLM functions can be grouped into the categories below.

1. COMPLETE Function

The COMPLETE function is capable of performing a wide range of tasks, such as aspect-based sentiment classification, synthetic data generation, and customized summaries.

2. Task-Specific Function

These functions are tailored for specific tasks, making them highly efficient for routine operations:

CLASSIFY_TEXT: Categorizes text into predefined categories.
EXTRACT_ANSWER: Finds answers to questions within unstructured data.
PARSE_DOCUMENT: Extracts text and layout elements from documents.
SENTIMENT: Analyzes the sentiment of a given text and returns a score from -1 to 1, representing sentiment as negative or positive.
SUMMARIZE: Return a summary of a given text.
TRANSLATE: Translates text between supported languages

3. Helper Function

Helper functions enhance the reliability of other LLM functions by managing token counts and handling execution errors:

COUNT_TOKENS: Returns the token count for a given text.
TRY_COMPLETE: It is similar to the COMPLETE function but returns NULL instead of an error when execution fails

Lab Setup

Let’s set the environment before using the LLM function -

Set up your current context for the role, database, schema, and warehouse.

Set the Role

USE ROLE accountadmin;

Set the Warehouse

USE WAREHOUSE compute_wh;

Create a database

CREATE DATABASE IF NOT EXISTS llmtest_db;

Create a schema in a database

CREATE SCHEMA IF NOT EXISTS llmtest_db.data;

Create a stage to hold data for sentiment analysis

CREATE STAGE IF NOT EXISTS llmtest_db.data.reviews

DIRECTORY = (ENABLE = TRUE, REFRESH_ON_CREATE = TRUE);

Switch to the schema that was created on previous steps -

USE SCHEMA llmtest_db.data;

Run LLM functions

1. Analyze Sentiment

This LLM function aims to detect the mood or tone of the given English-language text. The function returns a sentiment score from -1 to 1, representing the text’s detected negative or positive sentiment. This function takes a single parameter, the English-language input text you want analyzed. It returns a floating-point number from -1 to 1 (inclusive), indicating the text’s negative or positive sentiment level. Values around 0 indicate neutral sentiment

Setting variables for sentiments

SET VAR_SENTIMENT_POSITIVE = 'The coffee vending machine was awesome. Taste of coffee was great and has excellent selection of coffee beverages.';

SET VAR_SENTIMENT_NEGATIVE = 'The coffee vending machine was terrible. The selection of coffee beverages was limited and taste poor.';

Create Temporary table

CREATE OR REPLACE TEMPORARY TABLE temp_sentiment (text VARCHAR);

Inserting the example texts into the temporary table

INSERT INTO temp_sentiment (text) VALUES ($VAR_SENTIMENT_POSITIVE);

INSERT INTO temp_sentiment (text) VALUES ($VAR_SENTIMENT_NEGATIVE);

Display the stored texts to verify the correct insertion

SELECT * FROM temp_sentiment;

Let's run the below query to calculate sentiment scores

SELECT

    text

    ,snowflake.cortex.SENTIMENT(text) AS sentiment_grade
    ,CASE

    WHEN (sentiment_grade >= 0.075) THEN 'POSITIVE Sentiment'
    WHEN (sentiment_grade <= -0.075) THEN 'NEGATIVE Sentiment'

    ELSE 'NEUTRAL'
    END as sentiment_label

FROM temp_sentiment;

YOu might receive the below error message after running the above query.

100351 (P0000): Request failed for external function SENTIMENT$V2 with remote service error: 400 '"unknown model"

It looks like you're encountering an error with the SENTIMENT$V2 function in Snowflake. The error message "unknown model" typically indicates that the specified model is not available in the region where your Snowflake account is located.

Here are steps to resolve this issue:

Enable Cross-Region Inference

You can enable cross-region inference to access models from other regions. Use the following command:

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION';

Let's rerun the below SELECT query to calculate sentiment scores, and you will receive output like this -

Select query

This is how you will get the sentiment grade for any text input. For positive sentiment, the sentiment grade is in a positive range, which was defined in the query, and for negative sentiment, it was defined as negative.

2. EXTRACT_ANSWER

This function is used to extract answers to specific questions from text documents, which can be plain text or JSON data.

The function takes in two parameters: the source “document” and the question asked based on the document text.

Syntax

SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
    <source_document>,
    <question>
);

source_document

A string containing the plain-text or JSON document that contains the answer to the question.

Question

A string containing the question to be answered.

Let's create a table to store sample text for demonstration

CREATE OR REPLACE TEMPORARY TABLE temp_extract_text (

detailed_text VARCHAR

);

Now, Insert the sample text into the temporary table -

INSERT INTO temp_extract_text (detailed_text)

VALUES ('The retail chain XYZ announced that its total sales for 2024 reached $200 million.The best-selling product category was mobiles, which accounted for 40% of total sales. The company opened 10 new mobile stores and plans to expand further in 2025.');

Setting up multiple variables with questions for which answers will be extracted from text defined in a temporary table.

SET VAR_HAS_ANSWER1 = 'What were the total sales for 2024?';

SET VAR_HAS_ANSWER2 = 'What was the best-selling product category?';

SET VAR_NO_ANSWER = 'What were the total sales of mattress in 2025?';

You can check the variable defined above by running the below query.

Show Variable like '%ANSWER%';

Variable

Let's run the sample queries below for each question.

Below is the query for variable VAR_HAS_ANSWER1 -

SELECT snowflake.cortex.EXTRACT_ANSWER(

(SELECT detailed_text FROM temp_extract_text)

,$VAR_HAS_ANSWER1

) AS extracted_answer;

Below is the output of the above query for variable $VAR_HAS_ANSWER1.

Variable $VAR_HAS_ANSWER1

Below is the query for variable VAR_HAS_ANSWER2.

SELECT snowflake.cortex.EXTRACT_ANSWER(

(SELECT detailed_text FROM temp_extract_text)

,$VAR_HAS_ANSWER2

) AS extracted_answer;

Below is the output of the above query for variable $VAR_HAS_ANSWER2.

Variable $VAR_HAS_ANSWER2

Next, use EXTRACT_ANSWER to handle a question that isn’t directly answered in the text.

SELECT snowflake.cortex.EXTRACT_ANSWER(

(SELECT detailed_text FROM temp_extract_text)

,$VAR_NO_ANSWER

) AS extracted_answer;

Below is the output of the above query for variable $VAR_NO_ANSWER.

Variable $VAR_NO_ANSWER

Here, the low confidence score indicates that the function is not sure about the accuracy of the answer.

This is how we can use Snowflake Cortex AI function to evaluate to enhance data processing and analytical tasks.

I will cover a few more functions in the next article. Have a happy learning experience.