In our previous lesson, we learned how to access data files in our data lake using access keys. Today, we’ll explore another method using SAS tokens.
SAS stands for Shared Access Signature. It’s a secure way to grant limited, time-bound access to resources in Azure, such as data stored in Blob Storage or Azure Data Lake Storage Gen2.
For reference, see the Databricks documentation on connecting to Azure storage: https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage
Let’s go through the steps to generate a SAS token in the Azure portal:

1. Open your storage account and, under Security + networking, select Shared access signature.
2. Choose the allowed services, resource types, and permissions the token should grant.
3. Set the start and expiry times, keeping the validity window as short as practical.
4. Click Generate SAS and connection string, then copy the generated SAS token.
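If you prefer to create the token programmatically rather than through the portal, here’s a minimal sketch using the `azure-storage-blob` Python SDK. The account name, account key, permissions, and 24-hour expiry below are placeholder assumptions; adjust them to your needs:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import AccountSasPermissions, ResourceTypes, generate_account_sas

# Placeholder values -- substitute your own storage account name and key.
account_name = "adlsv2"
account_key = "<your-account-key>"

# Grant read/list access to containers and blobs, valid for 24 hours.
sas_token = generate_account_sas(
    account_name=account_name,
    account_key=account_key,
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),
)
print(sas_token)  # the same "sv=...&sig=..." query string the portal produces
```

The resulting string is what you’ll supply to Databricks in the next step.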
Now, let’s integrate the SAS token into our Databricks environment. Here’s a simplified example of using it to read a CSV file from ADLS Gen2:
spark.conf.set("fs.azure.account.auth.type.adlsv2.dfs.core.windows.net", "SAS") spark.conf.set("fs.azure.sas.token.provider.type.adlsv2.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider") #spark.conf.set("fs.azure.sas.fixed.token.adlsv2.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<sas-token-key>")) spark.conf.set("fs.azure.sas.fixed.token.adlsv2.dfs.core.windows.net","sv=2024-11-02&ss=bfqt&srt=sco&sp=rwdlacupyx&se=2024-07-11T02:03:28Z&st=2024-07-10T18:03:28Z&spr=https,http&sig=prapuKnCdsUVtyePxhv4T%2F2PF9RNNJyaqsQYZVF0peg%3D41") spark.read.csv("abfss://data@adlsv2.dfs.core.windows.net/Employees.csv", header=True).display()
Be sure to adjust the storage account name (`adlsv2`), container (`data`), and file name (`Employees.csv`) to match your own storage structure.
With SAS tokens, you now have an alternative method for securely accessing data stored in Azure from Databricks. Keep in mind that the read method depends on the file format: `spark.read.csv` works for CSV files, while other formats such as Parquet require their own readers, like `spark.read.parquet` (sketched below).
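As a minimal sketch, assuming the SAS configuration from the cell above is already in place and using a hypothetical `Employees.parquet` file:

```python
# Assumes the SAS configuration from the cell above has already been applied.
# "Employees.parquet" is a hypothetical file used purely for illustration.
df = spark.read.parquet("abfss://data@adlsv2.dfs.core.windows.net/Employees.parquet")
df.display()
```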
Because SAS tokens are scoped to specific permissions and expire automatically, this approach gives you controlled, time-limited access to your data lake resources, enhancing how you manage data access in your cloud environment.