Now that we’ve created our storage account and containers, I want to show you how you can access data from these cloud storage accounts.
The first way I’ll show you is by using the storage account access keys.
When you create your storage account, Azure generates two 512-bit storage account access keys for that account. These keys can be used to authorize access to data in your storage account via Shared Key authorization.
So back in the Azure portal.
Let me go to my storage account; I can click here or search for the storage account. You can also find it in your list of resources.
If I scroll down, under Security + networking you can see Access keys.
If I click into that, you get your storage account name and then two keys.
You can use either one of these two keys.
So let me click on Show and then Copy to clipboard.
And we can use this key to access data via Databricks.
Okay, so I want to show you some official documentation on how to access data in Databricks from Azure storage.
https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage
So we can scroll down to the section about account keys, which is here.
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>")
)
So as you can see, we first need to set some configuration; we just need to replace the bits in the angle brackets. First the storage account, and then finally we need to pass in the storage account access key. As I've mentioned, <storage-account> is just the name of your storage account.
So if you go here, you can see the storage account name, which is datalake639 for me; for you it will be different. Okay, so I'll replace this with datalake639.
You then need to replace all of this and paste in your storage account access key inside of double quotes.
The dbutils.secrets.get method is going to be covered in a later lecture in this section. It's just a method for hiding your access key so it isn't displayed publicly like this.
So when you paste the key in like this, delete the dbutils.secrets.get call here, and remove this additional bracket, the access key is visible to anyone who can access this notebook. So that's not particularly secure.
That's why ideally you'd want to have it hidden, and that's what you use dbutils.secrets for; I'll demonstrate that later. However, for now I'm just demonstrating the mechanics, so this will be okay.
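So, just to recap the mechanics, the filled-in configuration cell ends up looking something like this (datalake639 is my storage account name, and the second argument is a placeholder, not a real key):

spark.conf.set(
    "fs.azure.account.key.datalake639.dfs.core.windows.net",
    "<paste-your-storage-account-access-key-here>"
)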
spark.read.csv("abfss://data@adlsv2.dfs.core.windows.net/Employees.csv", header=True).display()
So let me run this cell.
Great.
So that is the configuration.
So now that we’ve done that, we can access the data using a simple spark or read command.
However, for the file path, we need to use a URI. This stands for Uniform Resource Identifier.
So let me show you some documentation.
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-abfs-driver
So the driver we're using is called the Azure Blob File System driver, or ABFS for short. This is the primary access method.
This link contains further details, should you be interested.
And then if you scroll down, the bit we're interested in is this bit here, which is the URI scheme.
abfs[s]://file_system@account_name.dfs.core.windows.net/<path>/<path>/<file_name>
So let’s copy that and paste that into our notebook.
So you can remove the square brackets here (keeping the s, since we're using the secure abfss scheme), and then replace file_system with the container that you wish to access.
So if I go back to my storage account and then go into containers, the bronze container is the only container that's actually got data in it, and the files are in CSV format. So I'll select the bronze container. For account_name, you already know this: it's datalake639 for me, and for you it will be whatever your storage account name is.
And then finally you paste in the file path of the file that you’d like to read.
So for me, let's read in the countries file first.
So there’s no subfolder.
So the path is really straightforward.
It’s just the file name.
So once you click into the file, you can copy whatever's shown here; that will be the file path for you.
So let me paste that in here.
And that's our full URI.
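So, putting that together, my URI looks something like this (I'm assuming the file is named countries.csv here; your account name and file names may differ):

abfss://bronze@datalake639.dfs.core.windows.net/countries.csv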
So to read in the file, what we can do is type spark.read.csv, because remember we're reading a CSV file, and then inside of double quotes we have this URI string. And then we can read the file.
However, remember that for this file, the first row is a header, so I can just set header equals to true.
And now let me run this.
And as you can see, that seems to have worked.
I can confirm this by adding a .display().
And here is the dataframe.
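For reference, the completed cell in my case looks something like this (again assuming the countries.csv file name):

spark.read.csv("abfss://bronze@datalake639.dfs.core.windows.net/countries.csv", header=True).display()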
You can also assign this to a variable, so I'll assign it to a variable called countries and then get rid of the .display().
And then similarly, if I want to read in the country_regions file, I can do that as well. So I'll just copy this, change the variable name to regions, and change the file name to country_regions. And now I've got the regions dataframe.
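So the final two cells look something like this for me (file names as assumed above):

countries = spark.read.csv("abfss://bronze@datalake639.dfs.core.windows.net/countries.csv", header=True)
regions = spark.read.csv("abfss://bronze@datalake639.dfs.core.windows.net/country_regions.csv", header=True)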
So in this lecture I’ve shown you how to access data in Databricks from your data lake via an access
key.