CI/CD and Solution Overview

In this lecture I want to cover CI/CD with Databricks.

CI/CD is a practice that allows development and operations teams to streamline the development process, automate testing and ensure rapid and reliable delivery of software and code.

The CI in CI/CD stands for continuous integration. This involves planning, coding, building and testing.

CD stands for continuous delivery or continuous deployment. This involves releasing, deploying, operating and monitoring. This is a continuous cycle.

I should point out that the terms continuous delivery and continuous deployment are often used interchangeably.

However, continuous delivery means automating the delivery process up to the point of readiness for deployment, while continuous deployment takes the automation a step further by automatically deploying the software to production environments after it passes all of the necessary tests.
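To make the distinction concrete, here is a toy Python model. It is not Azure DevOps code, and `Release`, `run_pipeline` and its flags are hypothetical names chosen for illustration. With `auto_deploy=False` it behaves like continuous delivery (the build is ready, but production waits for an approval); with `auto_deploy=True` it behaves like continuous deployment (passing tests is enough to reach production).

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical record of which environments a build has reached."""
    version: str
    environments: list = field(default_factory=list)


def run_pipeline(version: str, tests_pass: bool, auto_deploy: bool,
                 approved: bool = False) -> Release:
    """Simulate continuous delivery vs. continuous deployment.

    auto_deploy=False: continuous delivery, production needs approval.
    auto_deploy=True: continuous deployment, passing tests is enough.
    """
    release = Release(version)
    if not tests_pass:
        return release  # a failing build never leaves the pipeline
    release.environments.append("test")
    if auto_deploy or approved:
        release.environments.append("production")
    return release


print(run_pipeline("1.0", tests_pass=True, auto_deploy=False).environments)
# ['test']
print(run_pipeline("1.0", tests_pass=True, auto_deploy=True).environments)
# ['test', 'production']
```

The only difference between the two practices in this model is who makes the final step to production: a person (`approved=True`) or the pipeline itself.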

The objective of continuous integration is to enable multiple developers to work on the same code base simultaneously and ensure that their changes are integrated smoothly.

It involves automatically merging code changes, building the application or code base and running automated tests to identify any integration issues early on.

The objective of continuous delivery is to automate the process of deploying software to different environments such as testing and production.

Together, these practices ensure that the software is ready for deployment at any time by automating the necessary steps such as building, packaging and deploying the application or code.

So here’s an illustration of how CI/CD could work for our needs.

Whenever a developer pushes code via a pull request into a specific branch in Azure Repos (let’s say our main branch), Azure DevOps will trigger the CI/CD pipeline.

This is automatic and works with the help of a build pipeline where our code is compiled and artifacts are created.

An artifact refers to a compiled or packaged version of your code, where the source code is transformed into a deployable format.

So in our case, since we’re developing notebooks, that’s what is packaged up as an artifact.
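As a rough illustration of what "packaging notebooks into an artifact" means, the sketch below zips notebook source files into a single archive. It assumes the notebooks are stored as `.py` source files in a folder; `package_notebooks` and the folder names are hypothetical. In a real Azure DevOps pipeline this step is usually handled by a publish-artifact task rather than custom code.

```python
import tempfile
import zipfile
from pathlib import Path


def package_notebooks(source_dir: Path, artifact_path: Path) -> list[str]:
    """Zip every notebook source file under source_dir into artifact_path.

    Returns the archive member names so a build log can list what shipped.
    """
    members = []
    with zipfile.ZipFile(artifact_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for nb in sorted(source_dir.rglob("*.py")):
            # Store paths relative to the notebook folder, with "/" separators.
            arcname = nb.relative_to(source_dir).as_posix()
            zf.write(nb, arcname)
            members.append(arcname)
    return members


if __name__ == "__main__":
    # Demo with throwaway files standing in for exported notebooks.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "notebooks"
        (src / "bronze").mkdir(parents=True)
        (src / "bronze" / "ingest.py").write_text("# Databricks notebook source\n")
        (src / "transform.py").write_text("# Databricks notebook source\n")
        print(package_notebooks(src, Path(tmp) / "notebooks.zip"))
```

The resulting archive is the deployable unit: the later stages copy this exact file to each environment instead of rebuilding from source.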

As part of the build pipeline, we can also perform automated testing such as unit tests. Everything up to this point is the continuous integration stage.
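Here is a minimal sketch of the kind of unit test a build pipeline could run. To keep it runnable without a Spark cluster, the hypothetical `clean_orders` function works on plain Python dicts rather than a PySpark DataFrame; the idea of factoring notebook logic into small testable functions is the point, not the specific transformation.

```python
import unittest


def clean_orders(rows: list[dict]) -> list[dict]:
    """Hypothetical notebook logic: drop rows without an order_id and
    normalise the country code to upper case."""
    return [
        {**row, "country": row["country"].strip().upper()}
        for row in rows
        if row.get("order_id") is not None
    ]


class CleanOrdersTest(unittest.TestCase):
    def test_drops_rows_without_id(self):
        rows = [{"order_id": None, "country": "us"},
                {"order_id": 1, "country": "us"}]
        self.assertEqual(len(clean_orders(rows)), 1)

    def test_normalises_country(self):
        rows = [{"order_id": 2, "country": " gb "}]
        self.assertEqual(clean_orders(rows)[0]["country"], "GB")


if __name__ == "__main__":
    # unittest.main() exits with a non-zero code when a test fails,
    # which is what makes a pipeline step fail the build.
    unittest.main()
```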

After the build pipeline, we move to the test stage. In this stage, our code is automatically deployed into the test environment for thorough testing via a release pipeline.

This environment closely resembles the production environment, allowing us to catch any issues before deployment.

Testing is a critical part of the software development process. By deploying the code into the test environment, we can perform various tests, including functional, integration and performance tests.

This ensures that our code meets the desired quality standards.

Once the code passes all the tests in the test environment, the pipeline proceeds to the deployment stage via the release pipeline.

Here, the code is deployed to the production environment, making it available to all end users.

The deployment to test and production is the continuous deployment stage.

In the following lectures, I will implement CI/CD with Azure DevOps and Databricks.

The solution will be as follows.

The development code will be stored in a remote git repository.

Whenever a pull request is merged into the main branch, the build pipeline will be triggered.

The code will then be deployed to the test environment.

Finally, pending a manual approval, the code will then be deployed to the production environment.

The test and production environments will not be stored in Git; instead, they will be shared folders in our Databricks workspace.
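One way a release pipeline can place notebooks into workspace folders is the Databricks Workspace API `import` endpoint. The sketch below only builds the HTTP request with the standard library and does not send it; the host, token, `build_import_request` helper and the `/Shared/test` and `/Shared/prod` folder names are hypothetical placeholders, and the course itself may use a different deployment mechanism.

```python
import base64
import json
import urllib.request


def build_import_request(host: str, token: str, notebook_source: str,
                         workspace_path: str) -> urllib.request.Request:
    """Build (but do not send) a Workspace API import call that writes one
    notebook to workspace_path, overwriting any previous version."""
    body = {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        # The API expects the notebook content base64-encoded.
        "content": base64.b64encode(notebook_source.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: the same notebook goes to the test folder first,
# and only later (after approval) to the production folder.
source = "# Databricks notebook source\nprint('hello')\n"
for env in ("test", "prod"):
    req = build_import_request("https://adb-example.azuredatabricks.net",
                               "<token>", source, f"/Shared/{env}/hello")
    print(req.full_url, json.loads(req.data)["path"])
    # urllib.request.urlopen(req) would actually send the request.
```

The target folder must already exist; the Workspace API's `mkdirs` endpoint can create it before the import.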

So in the following lectures, I’ll perform a demonstration and walk you through all of the necessary steps for our solution.
