We’ve been able to access data in our Azure Data Lake via access keys and SAS tokens, but every time we need to access the data, we have a relatively lengthy access pattern.
We need to obtain the access key or SAS token and then run the configurations each time we have a new notebook session.
In this lecture, I want to cover how to Mount Cloud Object Storage onto the Databricks file system.
Databricks mounts create a link between a workspace and a cloud object storage.
This enables you to interact with the cloud object storage using familiar file paths relative to the Databricks file system.
Mounts work by creating a local alias under the /mnt directory that stores the location of the cloud object storage, the driver specifications needed to connect to the storage account or container, and the security credentials required to access the data.
So when you mount containers or storage locations in your Azure data lake, they’ll appear here along with the data they contain, so you can access them without needing to provide credentials, access keys or SAS tokens. You simply provide the relative file path here.
But the actual storage location is your data lake.
Like I said, it creates a local alias for ease of access.
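To make the alias idea concrete, here is a minimal sketch of the two access patterns side by side; the container, storage account and file names are just the ones used later in this lecture, so treat them as illustrative.

# Without a mount: full abfss URI, and the session must already be configured with credentials
df = spark.read.csv("abfss://bronze@datalake639.dfs.core.windows.net/countries.csv", header=True)

# With a mount: the same data is available under a local alias in the Databricks file system
df = spark.read.csv("/mnt/bronze/countries.csv", header=True)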
So how do we mount Cloud object storage onto the Databricks file system?
First of all, we need to register an Azure AD application to create a service principal. An Azure service principal is a security identity used to access specific Azure resources.
It’s similar to a user identity with a specific role such as a contributor for a specific resource.
The access is restricted by the roles assigned to it.
When we create the service principal, we then get an application (client) ID, a directory (tenant) ID, and a client secret.
We then need to give the service principal access to the ADLS account.
And finally, with the application client ID, the tenant ID, and the secret, we can mount the storage to the Databricks file system.
So that’s a high-level overview of how to mount ADLS to the Databricks file system.
https://docs.databricks.com/en/dbfs/mounts.html
configs = {"fs.azure.account.auth.type": "OAuth", "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider", "fs.azure.account.oauth2.client.id": "<application-id>", "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"), "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"} # Optionally, you can add <directory-name> to the source URI of your mount point. dbutils.fs.mount( source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/", mount_point = "/mnt/<mount-name>", extra_configs = configs)
display(dbutils.fs.mounts())
I first want to register an app, so I’ll search for app registrations.
It’s right here, but I’ll search for it anyway.
I then need to create a new registration and I’ll give it a name.
databricks-service-app.
And then I’ll register and I’ll keep everything as default.
Notice you get an application (client) ID and a directory (tenant) ID as well, so I’ll copy those and paste them back in my notebook.
So I’ll have application ID, tenant ID, and I’ll copy that as well.
And then we need one more.
We need a secret value.
To get that, you simply go to Certificates & secrets, then New client secret, and just give it a description.
I’ll just write Databricks because that’s what it’s for, and you can give it an expiration date.
I’ll just keep it at six months and click Add.
And then what you should get, under Client secrets, is this value here.
So copy that to clipboard.
And then paste.
Great.
So we have our application ID, our tenant ID and our secret value.
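A simple way to keep these to hand in the notebook is to assign them to variables; the placeholder strings below are purely illustrative, not real values.

# Sketch only: paste the values copied from the Azure portal
application_id = "<application-client-id>"  # from the app registration overview page
tenant_id = "<directory-tenant-id>"         # also on the overview page
secret_value = "<client-secret-value>"      # from Certificates & secrets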
Now the next thing to do is to give this service app permission to contribute to the storage account that we’re going to mount to DBFS.
So find storage accounts and then locate your specific storage account.
So mine is datalake639.
Then go to Access control (IAM) and role assignments.
What you want to do is add a role assignment.
And the role assignment should be Storage Blob Data Contributor.
Go to next and then select the member.
So the member for me will be my service app.
So that will be the databricks-service-app.
Select.
Review and Assign.
So now you should see here that the Storage Blob Data Contributor role has been given to the service application.
And for that service application we’ve got the application ID, the tenant ID and the secret.
So now we can actually mount the storage container.
So if I scroll up and I get this code.
And I paste it here.
I can actually take the application ID.
I copy that here.
And then I can take the tenant ID.
And that goes here where we have directory ID.
And now I’ll paste in the secret value here.
And I’ll show you this dbutils.secrets method very soon.
But essentially this is a way for you not to display the value in your notebook, but for now it’s okay.
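As a rough sketch of what that looks like, assuming a secret scope and key have already been created (the scope and key names here are made up for illustration):

# Retrieve the client secret from a Databricks secret scope instead of hard-coding it
secret_value = dbutils.secrets.get(scope="databricks-scope", key="service-principal-secret")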
So let me paste that in.
Great.
So now, finally, I’ll put in the container name that I’d like to mount, and that will be the bronze container.
The storage account name is datalake639.
And the mount point I’d like is just /mnt/bronze.
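Putting all of that together, the mount cell ends up looking roughly like this; the placeholders stand for the application ID, tenant ID and secret value we copied earlier.

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret-value>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

# Mount the bronze container of the datalake639 storage account under /mnt/bronze
dbutils.fs.mount(
    source = "abfss://bronze@datalake639.dfs.core.windows.net/",
    mount_point = "/mnt/bronze",
    extra_configs = configs)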
So now, before I execute this, let me show you the file system.
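If you’d rather check from the notebook than the UI, a quick sketch (assuming the /mnt directory already exists in DBFS) is to list it:

# List whatever is currently mounted under /mnt
display(dbutils.fs.ls("/mnt/"))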
As you can see at present, there’s nothing in the mount point.
So now let me run this.
So it is mounting and if it’s successful, we should get an output here that says true very soon.
Great.
So we get true.
Now, if I go back to Databricks, I still can’t see the mount point.
However, if I do dbutils.fs.mounts() and then I enclose that in display().
We can see the bronze mount point.
Here is the URI for it.
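If you want to pull that URI out programmatically rather than reading it off the display, a small sketch is:

# Find the entry for /mnt/bronze and print its source URI
bronze = [m for m in dbutils.fs.mounts() if m.mountPoint == "/mnt/bronze"]
print(bronze[0].source)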
So why can’t we see it here?
Well, to be able to see it, we’re going to have to adjust some settings in Databricks via the admin console.
So in order to see this, locate the admin console.
So for me it’s when I click on my email and then I can see admin console.
However, the UI changes sometimes, so this might be a bit different for you, but essentially locate
the admin console and then go to workspace settings.
Then finally scroll down until you see the DBFS file browser setting.
Right now it’s disabled for me, so I’m going to check it and enable it.
I’ll refresh this page and then when you go to data, you should get a slightly different view.
So you should see Database Tables and then DBFS.
So select DBFS, and now, as you can see, I’ve got my FileStore here, but then I’ve also got the mnt folder, and now you can see the bronze container mounted successfully and then the files as well.
So here it is.
This is a message recorded at a future date from the initial recording of this lecture.
Since the initial recording, the user interface has changed, so I just wanted to add an update to
this lecture.
If you’re using a more recent version of Databricks, then you might have an interface similar to what I’m showing you now.
In the main area you have your Hive metastore and then your databases and tables, and I’ll cover this in the next section.
Once you enable DBFS via your workspace settings, you should get an icon here that says Browse DBFS.
Once you click on that, you should then get the same view as per the original lecture that I’ve recorded.
And then you can see your mount point here under mnt, and then you’ve got your bronze mount point.
So going forward, when you navigate to Data Explorer, please click on Browse DBFS.
This will appear at the top only after you’ve gone to the admin console workspace settings and ensured that the DBFS file browser is enabled.
So going forward, please click on Browse DBFS to be taken to the Databricks file store.
So now that we’ve successfully mounted the bronze container from our Azure data lake storage, let’s read the countries file into a DataFrame.
So now, rather than having to provide URIs, access keys, SAS tokens and doing all the configurations,
we can actually just use the file path via our Databricks file system.
So I can actually just use this file path here.
So if I want the countries file, I can just use this path and then add the countries file name as well, so I can just do spark.read.csv.
And then for the file path, I will just add countries.csv, and then remember to set header to true as well.
And then I will do .display().
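As a sketch of the full cell, using the mount point and file name from this lecture:

# Read the countries file straight from the mount point; no credentials needed here
spark.read.csv("/mnt/bronze/countries.csv", header=True).display()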
So as you can see, that has now successfully read the file.
So now that we’ve mounted a container to our Databricks file system, we can also unmount it too.
To unmount the container, you can simply do dbutils.fs.unmount and then specify the path of the mount point.
So for me, it’s /mnt/bronze.
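So, as a one-line sketch, the unmount cell is just:

# Unmount the bronze container; note the argument is the mount point, not the abfss URI
dbutils.fs.unmount("/mnt/bronze")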
So now let’s run that.
And now it should start the process of unmounting.
As you can see, it’s been unmounted.
So if I now check the file system, as you can see.
It’s no longer mounted.
And that is also the same if I rerun this.
So as you can see, it’s still showing the bronze mount right now.
And if I rerun the cell, it’s disappeared.
So we’ve successfully unmounted the container.
Okay, great.
So we’ve now seen how to mount and unmount a storage container to Databricks.