What is a Virtual Warehouse and What are its Benefits
A virtual warehouse on Snowflake comprises compute servers that execute user queries. You can provision warehouses sized to your business requirements. With an on-premise database, the hardware deployment remains fixed.
But Snowflake instead provides a set of virtual servers, which include memory, CPU cores, and SSD storage. So, how exactly does this virtual warehouse work? Let's look at all of that in detail in this article.
What is a virtual warehouse?
A virtual warehouse is a named cluster of compute resources that collects and processes data from different sources. You typically use it to connect to and analyze data from heterogeneous sources. A virtual warehouse is required for queries and DML operations, including loading data into tables. You define a virtual warehouse by its size and other properties, which help control and automate warehouse activity.
Following are some important factors to consider when evaluating a warehouse:
- Data types: What type of data the warehouse has (structured or unstructured).
- Scale: The amount of data stored in the warehouse.
- Performance: How quickly data gets accessed after a query.
- Maintenance: How much engineering effort must be dedicated to the warehouse.
- Cost: How much money you need to spend on the data warehouse.
- Community: How connected the warehouse is to other critical tools and services.
Fig: Comparison table for various warehouses
Warehouse sizes and credit per hour
Size specifies the set of compute resources available in a warehouse. Snowflake supports the following warehouse sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, 3X-Large, and 4X-Large.
- X-Small is the default size for warehouses created using CREATE WAREHOUSE.
- X-Large is the default for warehouses created in the web interface.
- Credit usage doubles with each step up to the next larger warehouse size, for each full hour that the warehouse runs.
- Snowflake uses per-second billing, so warehouses are billed only for the credits they consume.
- The total number of credits billed depends on how long the warehouse runs continuously.
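As a concrete illustration of the doubling rule: an X-Small warehouse consumes 1 credit per hour, a Small 2, a Medium 4, and so on. You can also resize a warehouse at any time. The sketch below assumes a warehouse named demo already exists (the name is illustrative):

```sql
-- Resize the (assumed) warehouse "demo" from X-Small (1 credit/hour)
-- to Medium (4 credits/hour); queries already running finish on the
-- old resources, and new queries use the new size.
ALTER WAREHOUSE demo SET WAREHOUSE_SIZE = 'MEDIUM';

-- Inspect the warehouse's current size and state
SHOW WAREHOUSES LIKE 'demo';
```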
Caching in a Snowflake warehouse
Caching helps speed up queries. Snowflake caches the results of every query you run. When a new query gets submitted, it checks previously executed queries. If a matching query exists and the results are still cached, Snowflake uses the cached result set instead of executing the query, which reduces query time. Cached results are global and available across users, and by default they are retained for 24 hours.
Following are the types of caching layers in Snowflake:
Fig. Type of Caching Layers in Snowflake
- Results Cache
This holds a copy of the results of every query executed in the past 24 hours. Since the Results Cache works by default and automatically, you do not have to do much to use it.
These are available across virtual warehouses, so query results returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not changed.
- Virtual Warehouse Local Disk Caching
When a query needs data, Snowflake retrieves it from Remote Disk storage and caches it on the local disk of the virtual warehouse, where subsequent SQL queries can reuse it.
- Remote Disk
It holds the long-term storage and is also called the storage layer. This layer is responsible for data resilience; on Amazon Web Services, for example, it is backed by Amazon S3.
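To observe the caching layers in isolation, you can toggle the session parameter USE_CACHED_RESULT, which controls whether Snowflake may answer a query from the Results Cache:

```sql
-- Disable result reuse for this session (useful when benchmarking
-- warehouse-level caching rather than the global Results Cache)
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Re-enable the default behavior
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```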
Loading and querying data requires a virtual warehouse, which provides the necessary compute resources to perform these tasks. Provisioning the compute resources for a new warehouse takes time. You can work around this by creating warehouses ahead of time in a suspended state, so they are ready to resume quickly without consuming credits while idle.
Syntax to create warehouse
CREATE [ OR REPLACE ] WAREHOUSE [ IF NOT EXISTS ] <name>
[ [ WITH ] objectProperties ]
For example, to create an X-Small warehouse named demo using the CREATE WAREHOUSE command:
create or replace warehouse demo with
warehouse_size = 'X-SMALL'
auto_suspend = 180
auto_resume = true;
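To create the warehouse without starting (and billing) it immediately, you can add the INITIALLY_SUSPENDED property; combined with auto_resume = true, the warehouse starts on the first query:

```sql
-- Create the warehouse in a suspended state; no credits are consumed
-- until a query triggers auto-resume.
create or replace warehouse demo with
  warehouse_size = 'X-SMALL'
  auto_suspend = 180
  auto_resume = true
  initially_suspended = true;
```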
Data loading in Snowflake
Depending on the volume of data you intend to load and the frequency of loading, the following are methods for loading data into Snowflake:
- Using SQL Commands
You can bulk load large amounts of data with SQL commands in SnowSQL, the Snowflake CLI. CSV is the most common file format, though several other options exist. You can also bulk load semi-structured data from JSON, Avro, Parquet, or ORC files.
Bulk loading happens in two phases: first you stage the files, then you load the staged data into the target table with the COPY INTO command.
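The two phases can be sketched as follows; the table name mytable and the local file path are illustrative assumptions:

```sql
-- Phase 1: stage a local CSV file into the table's internal stage
PUT file:///tmp/data.csv @%mytable;

-- Phase 2: load the staged file into the table
COPY INTO mytable
  FROM @%mytable
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```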
- Using Snowpipe
You can use Snowpipe for continuous, near-real-time loading of data into Snowflake, particularly from files staged in external locations. Snowpipe uses the COPY command, and additional features allow you to automate the data loading process as new files arrive.
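A minimal sketch of a Snowpipe definition; the pipe name, stage mystage, and table mytable are illustrative assumptions:

```sql
-- Define a pipe that loads new files from an external stage as they
-- arrive; AUTO_INGEST relies on cloud event notifications (e.g. S3).
CREATE PIPE mypipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO mytable
    FROM @mystage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```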
Fig. Snowpipe data loading
- Using the Web Interface
You can use the web UI to select the table you wish to load data into. The wizard simplifies loading by combining the staging and data loading phases into a single operation and automatically deletes all the staged files after loading.
Fig. Web Interface for Loading Data to Snowflake
- Using Hevo Data
Hevo Data is a No-code Data Pipeline solution. It provides a hassle-free way to transfer data directly from various sources to Snowflake and numerous other Databases or Data Warehouses. As a fully managed tool, it automates data loading from the sources of your choice. It also enriches data and makes it ready for analysis without any coding.
Fig: HEVO Data
Automating Warehouse Suspension
You can set warehouses to suspend automatically when there is no activity for a specified period. You enable auto-suspend by specifying the period of inactivity after which the warehouse suspends.
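Auto-suspend can be set at creation time or changed later; the warehouse name demo below is an illustrative assumption:

```sql
-- Suspend the (assumed) warehouse "demo" after 5 minutes of inactivity
ALTER WAREHOUSE demo SET AUTO_SUSPEND = 300;

-- Setting AUTO_SUSPEND to NULL disables auto-suspension entirely
ALTER WAREHOUSE demo SET AUTO_SUSPEND = NULL;
```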
The used credits for a resource monitor reflect the sum of all credits consumed by all assigned warehouses within the specified interval. If a monitor has a Suspend or Suspend Immediately action defined and its used credits reach the threshold for that action, the assigned warehouses are suspended.
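The suspend actions above can be sketched with a resource monitor; the monitor name, quota, thresholds, and warehouse name below are illustrative assumptions:

```sql
-- Create a monthly monitor with a 100-credit quota
CREATE RESOURCE MONITOR limit_100 WITH
  CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 90 PERCENT DO SUSPEND             -- let running queries finish
    ON 100 PERCENT DO SUSPEND_IMMEDIATE; -- cancel running queries

-- Assign the monitor to a warehouse
ALTER WAREHOUSE demo SET RESOURCE_MONITOR = limit_100;
```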
Following are Snowflake partners for data migration: