Part 3: Query Data in Snowflake

To view the data in Snowflake, we must complete the following steps from the Snowflake documentation:

  1. Create an external volume in Snowflake.
    • Refer to the Snowflake documentation for the complete code samples for creating an external volume.
    • The video includes the following details from our example:
      • When creating the new IAM policy that lets Snowflake access the S3 bucket, scope it to the root of the S3 bucket to avoid a list error when verifying storage access.
      • When creating the external volume in Snowflake, set STORAGE_BASE_URL to the complete bucket path, in the form s3://<>/<>/compaction.
  2. Create a catalog integration for Snowflake Open Catalog.
    • Refer to the Snowflake documentation for the complete code samples.
    • The video includes the following details from our example:
      • The CATALOG_NAMESPACE refers to the tenant.namespace in our StreamNative Cluster. Since we published messages to public.default, use public.default as the CATALOG_NAMESPACE.
      • We can reuse the Snowflake Open Catalog <CLIENT ID>:<SECRET> credentials to give Snowflake access. <CLIENT ID> maps to OAUTH_CLIENT_ID and <SECRET> maps to OAUTH_CLIENT_SECRET.
    • You will need to create a new catalog integration for each tenant.namespace.
  3. Create an externally managed table.
    • Refer to the Snowflake documentation for the complete code samples.
    • The video includes the following details from our example:
      • A Snowflake Open Catalog warehouse.namespace.table (e.g. streamnative.public.default.kafkaschematopic, where the namespace is the tenant.namespace public.default) is mapped to a Snowflake database.schema.table (e.g. training.public.kafkaschematopic).
      • Set AUTO_REFRESH = TRUE in the CREATE ICEBERG TABLE statement so that new data becomes visible in Snowflake.
    • You will need to create a new externally managed table for each topic.
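The three steps above can be sketched as a small Python script that only renders the statements as text. It does not connect to AWS or Snowflake. Everything here is an assumption for illustration: the bucket, prefix, role ARN, Open Catalog URL, volume name, and the "oc_" integration-naming scheme are hypothetical, and the helper functions are ours, not part of any StreamNative or Snowflake API. The statement shapes follow our reading of the Snowflake documentation. In particular, the CATALOG_NAME key inside REST_CONFIG and the IAM action list are assumptions, so treat the Snowflake reference as authoritative.

```python
"""Sketch: render the Snowflake statements for Part 3 as text (nothing is executed).
All names are hypothetical placeholders; verify each statement against the Snowflake docs."""
import json


def bucket_root_policy(bucket: str) -> str:
    # IAM policy scoped to the bucket root rather than a prefix, per the tip above
    # (avoids the list error when verifying storage access). The action list is an
    # assumption modeled on the Snowflake external-volume docs.
    object_actions = ["s3:PutObject", "s3:GetObject", "s3:GetObjectVersion",
                      "s3:DeleteObject", "s3:DeleteObjectVersion"]
    policy = {"Version": "2012-10-17", "Statement": [
        {"Effect": "Allow", "Action": object_actions, "Resource": f"arn:aws:s3:::{bucket}/*"},
        {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
         "Resource": f"arn:aws:s3:::{bucket}"},
    ]}
    return json.dumps(policy, indent=2)


def external_volume_sql(name: str, bucket: str, prefix: str, role_arn: str) -> str:
    # Step 1: STORAGE_BASE_URL is the complete bucket path ending in /compaction.
    base_url = f"s3://{bucket}/{prefix.strip('/')}/compaction"
    return (
        f"CREATE OR REPLACE EXTERNAL VOLUME {name}\n"
        "  STORAGE_LOCATIONS = ((\n"
        f"    NAME = '{name}_s3'\n"
        "    STORAGE_PROVIDER = 'S3'\n"
        f"    STORAGE_BASE_URL = '{base_url}'\n"
        f"    STORAGE_AWS_ROLE_ARN = '{role_arn}'\n"
        "  ));"
    )


def integration_name(namespace: str) -> str:
    # Hypothetical naming scheme: one integration per tenant.namespace. Dots are not
    # allowed in unquoted Snowflake identifiers, so "public.default" -> "oc_public_default".
    return "oc_" + namespace.replace(".", "_")


def catalog_integration_sql(namespace: str, catalog_uri: str, catalog_name: str,
                            client_id: str, client_secret: str) -> str:
    # Step 2: CATALOG_NAMESPACE is the StreamNative tenant.namespace; the Open Catalog
    # <CLIENT ID>:<SECRET> pair is reused as OAUTH_CLIENT_ID / OAUTH_CLIENT_SECRET.
    return (
        f"CREATE OR REPLACE CATALOG INTEGRATION {integration_name(namespace)}\n"
        "  CATALOG_SOURCE = POLARIS\n"
        "  TABLE_FORMAT = ICEBERG\n"
        f"  CATALOG_NAMESPACE = '{namespace}'\n"
        f"  REST_CONFIG = (CATALOG_URI = '{catalog_uri}' CATALOG_NAME = '{catalog_name}')\n"
        "  REST_AUTHENTICATION = (\n"
        "    TYPE = OAUTH\n"
        f"    OAUTH_CLIENT_ID = '{client_id}'\n"
        f"    OAUTH_CLIENT_SECRET = '{client_secret}'\n"
        "    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL'))\n"
        "  ENABLED = TRUE;"
    )


def split_open_catalog_id(identifier: str) -> tuple[str, str, str]:
    # "streamnative.public.default.kafkaschematopic"
    #   -> ("streamnative", "public.default", "kafkaschematopic")
    catalog, *namespace, table = identifier.split(".")
    if not namespace:
        raise ValueError(f"expected warehouse.namespace.table, got {identifier!r}")
    return catalog, ".".join(namespace), table


def iceberg_table_sql(open_catalog_id: str, database: str, schema: str, volume: str) -> str:
    # Step 3: map warehouse.namespace.table onto Snowflake database.schema.table;
    # AUTO_REFRESH = TRUE keeps newly committed data visible in Snowflake.
    _, namespace, table = split_open_catalog_id(open_catalog_id)
    return (
        f"CREATE ICEBERG TABLE {database}.{schema}.{table}\n"
        f"  EXTERNAL_VOLUME = '{volume}'\n"
        f"  CATALOG = '{integration_name(namespace)}'\n"
        f"  CATALOG_TABLE_NAME = '{table}'\n"
        "  AUTO_REFRESH = TRUE;"
    )


if __name__ == "__main__":
    ns = "public.default"
    print(bucket_root_policy("my-bucket"))
    print(external_volume_sql("sn_volume", "my-bucket", "my-prefix",
                              "arn:aws:iam::123456789012:role/snowflake-access"))
    print(catalog_integration_sql(
        ns, "https://<orgname>-<account>.snowflakecomputing.com/polaris/api/catalog",
        "streamnative", "<CLIENT ID>", "<SECRET>"))
    for topic in ["kafkaschematopic"]:  # one externally managed table per topic
        print(iceberg_table_sql(f"streamnative.{ns}.{topic}", "training", "public", "sn_volume"))
```

Generating the statements this way makes the scaling rules explicit. Each new topic adds one CREATE ICEBERG TABLE, and each new tenant.namespace also needs its own catalog integration. The helpers do no quoting or escaping, so they are only suitable for trusted, hand-written values.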

After completing these steps, you will be able to use Snowflake AI Data Cloud to query the Iceberg Table registered in Snowflake Open Catalog. We have now created a cost-effective Streaming Augmented Lakehouse for enabling real-time AI applications in Snowflake AI Data Cloud.