Choosing a Snowflake Connector

Snowflake provides two real-time ingestion APIs, Snowpipe and Snowpipe Streaming, which cater to different performance, cost, and complexity requirements. The following table provides a feature comparison for Snowpipe and Snowpipe Streaming. Additional details are available in the blog post here.

With StreamNative support for Kafka Connect and Pulsar IO Connectors through UniConn, three connector options are available for streaming data into Snowflake using Snowpipe or Snowpipe Streaming:

  • Kafka Connect Snowflake Sink Connector (supports Snowpipe and Snowpipe Streaming)
  • Snowflake Streaming Sink Connector (Pulsar IO) (Snowpipe Streaming)
  • Snowflake Sink Connector (Pulsar IO) (Snowpipe)

Kafka Connect Snowflake Sink Connector

For organizations that already use Apache Kafka or a Kafka-compatible data streaming platform, the Kafka Connect Snowflake Sink Connector is the most direct way to stream data into Snowflake. Developed and maintained by Snowflake, this connector maps Kafka topics to Snowflake tables and can ingest through either Snowpipe or Snowpipe Streaming.
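As a rough illustration of how the ingestion method is selected, the sketch below builds a payload in the shape the Kafka Connect REST API accepts. The property names are the ones documented for the Snowflake Kafka connector. The helper name `build_snowflake_sink_config`, the converter choices, and every value in angle brackets are placeholders assumed for this example, not settings taken from a StreamNative deployment.

```python
import json


def build_snowflake_sink_config(name, topics, ingestion_method="SNOWPIPE_STREAMING"):
    """Return a Kafka Connect connector definition as a plain dict.

    Hypothetical helper for illustration; replace the <...> placeholders
    with real account details before use.
    """
    if ingestion_method not in ("SNOWPIPE", "SNOWPIPE_STREAMING"):
        raise ValueError(f"unsupported ingestion method: {ingestion_method}")
    return {
        "name": name,
        "config": {
            "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
            "topics": ",".join(topics),
            # Chooses between file-based Snowpipe and row-based Snowpipe Streaming.
            "snowflake.ingestion.method": ingestion_method,
            "snowflake.url.name": "<account>.snowflakecomputing.com:443",
            "snowflake.user.name": "<user>",
            "snowflake.private.key": "<private-key>",
            "snowflake.role.name": "<role>",
            "snowflake.database.name": "<database>",
            "snowflake.schema.name": "<schema>",
            # Converter choices are an assumption for JSON payloads without schemas.
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be submitted to the Connect REST API.
    print(json.dumps(build_snowflake_sink_config("sf-sink", ["orders"]), indent=2))
```

Switching `ingestion_method` to `"SNOWPIPE"` is the only change needed in this sketch to fall back to file-based ingestion; other tuning properties are omitted here.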

Example Deploying Kafka Connect Snowflake Sink Connector

Snowflake Streaming Sink Connector (Pulsar IO)

The Snowflake Streaming Sink Connector was announced recently. It pulls data from Pulsar topics and persists it to Snowflake through the Snowpipe Streaming API with sub-second latency. The connector supports exactly-once semantics, so data is processed without duplication. If an error occurs during message processing, the connector restarts and reprocesses messages starting from the last committed message.
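The restart-and-reprocess behavior can stay duplicate-free if the sink remembers the last committed position and ignores anything at or before it. The sketch below shows that general idea with an in-memory stand-in. `OffsetTrackingSink` and `replay` are hypothetical names, integer offsets are a simplification, and this is not the connector's actual implementation.

```python
class OffsetTrackingSink:
    """In-memory stand-in for a sink that tracks the last committed offset."""

    def __init__(self):
        self.rows = []
        self.committed_offset = -1  # nothing committed yet

    def write(self, offset, payload):
        """Append the payload unless its offset was already committed."""
        if offset <= self.committed_offset:
            return False  # duplicate delivered again after a restart
        self.rows.append(payload)
        self.committed_offset = offset
        return True


def replay(sink, messages):
    """Deliver (offset, payload) pairs in order, as a restarted connector would."""
    for offset, payload in messages:
        sink.write(offset, payload)


if __name__ == "__main__":
    sink = OffsetTrackingSink()
    replay(sink, [(0, "a"), (1, "b"), (2, "c")])
    # Simulated restart: delivery resumes from an earlier position.
    replay(sink, [(1, "b"), (2, "c"), (3, "d")])
    print(sink.rows)
```

In this sketch the second delivery of offsets 1 and 2 is dropped, so each payload is stored once even though it was received twice.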

Messages are ingested into Snowflake in JSON, Avro, or Primitive formats. The connector also supports sinking data into Iceberg tables with the proper configuration.

Example Deploying Snowflake Streaming Sink Connector

Snowflake Sink Connector (Pulsar IO)

The Snowflake Sink Connector receives messages from input topics and converts them into JSON format. The data is buffered in memory until a threshold is reached and is then written to temporary files in the internal stage. Snowpipes are created to ingest the staged files on a per-partition basis. Once ingestion succeeds, the temporary files are deleted; otherwise, the connector moves the files into the table stage and produces error messages. This connector supports effectively-once delivery semantics.
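The buffer-then-flush step described here can be pictured with a small sketch. It assumes a record-count threshold and a local directory standing in for the internal stage. `BufferedJsonWriter`, `max_records`, and `stage_dir` are hypothetical names, and the real connector's thresholds and file handling may differ.

```python
import json
import os
import tempfile


class BufferedJsonWriter:
    """Buffer messages as JSON lines and flush them to a file at a threshold."""

    def __init__(self, max_records, stage_dir):
        self.max_records = max_records
        self.stage_dir = stage_dir  # local stand-in for the internal stage
        self.buffer = []

    def add(self, message):
        """Buffer one message; return the file path if this call triggered a flush."""
        self.buffer.append(json.dumps(message))
        if len(self.buffer) >= self.max_records:
            return self.flush()
        return None

    def flush(self):
        """Write buffered records to a new file; return its path, or None if empty."""
        if not self.buffer:
            return None
        fd, path = tempfile.mkstemp(suffix=".json", dir=self.stage_dir)
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(self.buffer) + "\n")
        self.buffer.clear()
        return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as stage:
        writer = BufferedJsonWriter(max_records=2, stage_dir=stage)
        writer.add({"id": 1})
        print(writer.add({"id": 2}))  # path of the flushed file
```

Because rows only become loadable after a file is written and picked up, the threshold directly trades latency for fewer, larger files.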

Messages are ingested into Snowflake in JSON, Avro, or Primitive formats. This connector was developed earlier, before the Snowpipe Streaming API was available. Because it loads data in batches, its ingestion latency is higher. Use the newer Snowflake Streaming Sink Connector whenever possible.

Example Deploying Snowflake Sink Connector