Azure Synapse Analytics - Create Apache Spark Pool

In this article, we’ll learn to create an Apache Spark pool within a Synapse workspace in Azure Synapse Analytics. Before you follow this article, the Synapse workspace must already have been created. There are two ways to create a Spark pool in Azure: through the Azure Portal, or from the Azure Synapse Analytics portal. Here we walk through the process of creating the Apache Spark pool through the Analytics portal itself.

Azure Synapse Analytics

Azure Synapse is an enterprise analytics service that brings together data warehousing and big data analytics to help us get insight from data. Data can be queried using either dedicated resources or a serverless architecture, and the service scales as the size of the data increases. You can learn more about it from the series.

  1. Azure Synapse Analytics
  2. Azure Synapse Analytics – Create a Synapse Workspace
  3. Azure Synapse Analytics - Create Dedicated SQL Pool
  4. Azure Synapse Analytics – Create Apache Spark Pool
  5. Azure Synapse Analytics - Creating Firewall at Server-level
  6. Azure Synapse Analytics - Connect, Query and Delete Data Warehouse SQL Pool
  7. Azure Synapse Analytics – Load Dataset to Warehouse from Azure Blob Storage
  8. Azure Synapse Analytics - Best Practices to Load Data into SQL Pool Data Warehouse
  9. Azure Synapse Analytics – Restore Point
  10. Azure Synapse Analytics – Exploring Query Editor
  11. Azure Synapse Analytics – Automation Task
  12. Azure Synapse Analytics – Machine Learning

Create Apache Spark Pool in Azure Synapse Analytics

Let us learn to create the Apache Spark Pool in Azure Synapse Analytics.

Step 1

First of all, follow the article Azure Synapse Analytics – Create a Synapse Workspace and create a Synapse workspace. You’ll need a paid subscription or a sponsorship pass to create a Synapse workspace; the Sandbox from Microsoft Learn does not support it. The same requirement applies to creating the Apache Spark pool in the Analytics portal, as described in this article.

Step 2

Once the Synapse workspace has been created, visit the Azure Synapse Analytics portal.

Step 3

On the left-hand side, choose Manage from the menu.

Step 4

Now, under Analytics Pools, select Apache Spark Pools.

Step 5

You will now be taken to the Apache Spark Pools page. Click on New or New Apache Spark Pool.

Step 6

Now, you’ll be presented with a form to fill in.

Fill in your Apache Spark pool name. It’s fine to leave Isolated Compute disabled for now; that option dedicates physical compute to a single customer and is meant for workloads that need stricter isolation. Next, set the Node Size family to Memory Optimized and select a Node Size. I’ve chosen Small (4 vCores / 32 GB) here to minimize the cost. Autoscale has been enabled and the number of nodes has been set to a range of 3 to 30. You can choose a smaller range to save cost. Later, if the nodes turn out to be insufficient as you run your jobs, you can change these values again.
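The sizing choices in this step can be sanity-checked with a little arithmetic. What follows is a minimal Python sketch, not part of the portal workflow. The `SparkPoolSettings` class and the `is_valid_pool_name` helper are hypothetical names introduced only for illustration, and the naming rule they encode (letters and digits only, starting with a letter, at most 15 characters) is an assumption about what the form accepts, so rely on the form's own validation message if it disagrees.

```python
import re
from dataclasses import dataclass
from typing import Tuple


def is_valid_pool_name(name: str) -> bool:
    """Check a pool name against the ASSUMED rule: starts with a letter,
    letters and digits only, at most 15 characters in total."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9]{0,14}", name) is not None


@dataclass(frozen=True)
class SparkPoolSettings:
    """Hypothetical container for the values entered in the form."""
    name: str
    vcores_per_node: int
    memory_gb_per_node: int
    min_nodes: int
    max_nodes: int

    def vcore_range(self) -> Tuple[int, int]:
        """Total vCores at the autoscale minimum and maximum."""
        return (self.min_nodes * self.vcores_per_node,
                self.max_nodes * self.vcores_per_node)

    def memory_gb_range(self) -> Tuple[int, int]:
        """Total memory (GB) at the autoscale minimum and maximum."""
        return (self.min_nodes * self.memory_gb_per_node,
                self.max_nodes * self.memory_gb_per_node)


# Values from this article: Small nodes (4 vCores / 32 GB), autoscale 3 to 30.
pool = SparkPoolSettings("ojashapachepool", 4, 32, 3, 30)
print(is_valid_pool_name(pool.name))  # True under the assumed rule
print(pool.vcore_range())             # (12, 120)
print(pool.memory_gb_range())         # (96, 960)
```

Seen this way, the autoscale range of 3 to 30 Small nodes lets the pool grow tenfold, from 12 to 120 vCores, which is why narrowing the range is an easy way to cap spending.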

Step 7

Next, click on Additional Settings. Here, you can enable Automatic Pausing so that the pool pauses after it has been idle for a specified number of minutes. You can also choose the version of Apache Spark.
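For readers who later want to script pool creation instead of clicking through the form, the choices made so far can be collected into one structure. The sketch below is an illustration under stated assumptions: `build_pool_properties` is a hypothetical helper, the key names follow the Spark pool resource properties as I recall them from the ARM template reference and should be verified there, and the 15-minute delay and the Spark version string are placeholder values that this article does not specify.

```python
import json
from typing import Optional


def build_pool_properties(node_size: str,
                          node_size_family: str,
                          min_nodes: int,
                          max_nodes: int,
                          spark_version: str,
                          auto_pause_minutes: Optional[int] = None,
                          isolated_compute: bool = False) -> dict:
    """Collect the form values into a dict.

    Key names are ASSUMED to mirror the Spark pool resource properties;
    check them against the official reference before using them.
    """
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    if auto_pause_minutes is not None and auto_pause_minutes <= 0:
        raise ValueError("auto_pause_minutes must be positive")

    # Auto-pause is enabled only when a delay is supplied.
    auto_pause = {"enabled": auto_pause_minutes is not None}
    if auto_pause_minutes is not None:
        auto_pause["delayInMinutes"] = auto_pause_minutes

    return {
        "nodeSize": node_size,
        "nodeSizeFamily": node_size_family,
        "autoScale": {
            "enabled": True,
            "minNodeCount": min_nodes,
            "maxNodeCount": max_nodes,
        },
        "autoPause": auto_pause,
        "sparkVersion": spark_version,
        "isComputeIsolationEnabled": isolated_compute,
    }


properties = build_pool_properties(
    node_size="Small",
    node_size_family="MemoryOptimized",
    min_nodes=3,
    max_nodes=30,
    spark_version="3.4",     # placeholder; pick the version offered in the form
    auto_pause_minutes=15,   # placeholder idle delay
)
print(json.dumps(properties, indent=2))
```

Keeping auto-pause and autoscale together in one place makes it easier to review the two settings that most affect cost before the pool is deployed.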

Step 8

Now, click on Review + Create.

Step 9

Validation now runs. Once it is successful, a green tick mark with the text Validation Succeeded will appear.

Step 10

Now, click on Create.

Step 11

The deployment request will begin. Notifications will be updated as progress occurs.

Step 12

The deployment progress is shown in the notification.

Once the deployment is successful, we can see the Successfully deployed notification.

We can now see the new Apache Spark pool, named ojashapachepool, appear on the Apache Spark Pools page.

Step 13

Later, as you start to work with the Apache Spark pool, you can choose to change the size of the pool. To do so, on the Spark Pool page, click the expand button.

Step 14

You can then change the Node Size and Number of Nodes. You will see the Estimated Price update as you select different options. Here, I’ve selected Large (16 vCores / 128 GB) in place of the Small node used earlier.

Once applied, you’ll see the updated Apache Spark pool ready for use.

Here, you can see an XLarge (32 vCores / 256 GB) node in use. Scaling the resources really is this simple with Azure. It would have been a nightmare with an on-premises setup.
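To get a feel for what each resize means, the three node sizes mentioned in this article can be compared side by side. This is a rough sketch: the `NODE_SIZES` table only contains the sizes quoted above, the helper names are hypothetical, the cost function assumes usage is charged per vCore-hour, and the rate used is a placeholder rather than a real Azure price. The Estimated Price shown in the portal remains the figure to trust.

```python
from typing import Tuple

# (vCores, memory in GB) per node, as quoted in this article.
NODE_SIZES = {
    "Small": (4, 32),
    "Large": (16, 128),
    "XLarge": (32, 256),
}


def pool_capacity(node_size: str, node_count: int) -> Tuple[int, int]:
    """Total (vCores, memory GB) for node_count nodes of the given size."""
    vcores, memory_gb = NODE_SIZES[node_size]
    return (vcores * node_count, memory_gb * node_count)


def estimated_cost(node_size: str, node_count: int, hours: float,
                   rate_per_vcore_hour: float) -> float:
    """Estimate cost, ASSUMING billing is proportional to vCore-hours."""
    total_vcores, _ = pool_capacity(node_size, node_count)
    return total_vcores * hours * rate_per_vcore_hour


for size in NODE_SIZES:
    print(size, pool_capacity(size, node_count=3))
# Small (12, 96)
# Large (48, 384)
# XLarge (96, 768)

placeholder_rate = 0.25  # NOT a real price; substitute your region's rate
print(estimated_cost("Large", 3, hours=2, rate_per_vcore_hour=placeholder_rate))
# 24.0
```

With the same number of nodes, moving from Small to Large quadruples the vCores and memory, and XLarge doubles them again, so under the per-vCore-hour assumption the cost of keeping the pool running grows by the same factors.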

Step 15

Finally, once your work on the Apache Spark pool is done, make sure you delete the pool to avoid incurring any unwanted charges.

Conclusion

Thus, in this article, we created an Apache Spark pool in Azure Synapse Analytics. This pool can later be used for various jobs, from sentiment analysis using Text Analytics to computer vision jobs in Azure and big data analysis.