Python is a widely used language in the IT world. Packaging and distributing Python code across teams can be a complex task. One solution is to build a wheel file and share this binary distribution securely across teams.
A Python wheel file has the extension .whl, and its file name encodes the Python version and the platform the wheel supports. Packaging Python code as a wheel has several benefits, including a smaller file size.
Additionally, installing a wheel avoids the step of building the package from the source distribution.
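To make the naming convention concrete, here is a minimal sketch that splits a wheel file name into its parts. It assumes the dash-separated layout described in PEP 427 (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The helper name `parse_wheel_filename` and the sample file name are illustrative only.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        # Optional build tag sits between version and Python tag
        name, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(name, version, build, py_tag, abi_tag, plat_tag)


# Hypothetical file name for a pure-Python package
info = parse_wheel_filename("ingestion-0.0.1-py3-none-any.whl")
print(info.python_tag, info.platform_tag)  # py3 any
```

A pure-Python package typically carries the `py3-none-any` tags, meaning any Python 3 interpreter on any platform.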
Create a Wheel File Using VS Code
Install Visual Studio Code.
Install the Python extension in Visual Studio Code.
Install Python 3.9.
Set Up Wheel Directory Folders and Files
Run the commands below to set up the files and folders needed to create the wheel file.
mkdir PythonWheelDemo
cd PythonWheelDemo
code .
The last command opens the new directory in VS Code. Next, add the folders and files described in the following sections using the New File and New Folder icons in the Explorer pane.
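As an alternative to creating each file by hand, the layout can be scaffolded with a short script. This is only a sketch: the article does not spell out the folder structure, so the layout below (a package folder named `ingestion` containing `__init__.py` and `ingestion.py`, next to `setup.py` and `README.md`) is an assumption inferred from the package name and the import used later. The function name `scaffold` is hypothetical.

```python
from pathlib import Path
from typing import List


def scaffold(root: Path) -> List[Path]:
    """Create empty placeholder files for the assumed project layout."""
    package_dir = root / "ingestion"  # assumed package folder name
    package_dir.mkdir(parents=True, exist_ok=True)
    files = [
        root / "setup.py",
        root / "README.md",
        package_dir / "__init__.py",
        package_dir / "ingestion.py",
    ]
    for path in files:
        # touch() leaves existing files untouched
        path.touch(exist_ok=True)
    return files
```

Calling `scaffold(Path("."))` from inside PythonWheelDemo would create the empty files, ready to be filled in with the contents shown in the next sections.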
Create Setup File
First, create the setup.py file. This file contains the package metadata.
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="ingestion",
    version="0.0.1",
    author="Sagar Lad",
    author_email="[email protected]",
    description="Package to create data ingestion",
    long_description=long_description,
    long_description_content_type="text/markdown",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.7',
)
Create Readme File
We have to create the README.md file.
# Example Package

This is a simple example package. You can use
[Github-flavored Markdown](https://guides.github.com/features/mastering-markdown/)
to write your content.
Create Init File
Next, create an __init__.py file. It marks the folder as a package, grouping separate Python scripts into a single importable unit.
from .ingestion import etl
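The effect of that re-export can be checked with a small experiment. The sketch below builds a throwaway package in a temporary folder, using the same `__init__.py` line, and then calls `etl` directly from the package. The stub `etl` here takes no arguments and is only a stand-in, not the function from the article.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "ingestion"
    pkg.mkdir()
    # Stand-in module with a trivial etl function (illustrative only)
    (pkg / "ingestion.py").write_text("def etl():\n    return 'etl called'\n")
    # Same re-export line as in the article's __init__.py
    (pkg / "__init__.py").write_text("from .ingestion import etl\n")

    sys.path.insert(0, tmp)
    try:
        package = importlib.import_module("ingestion")
        # etl is reachable on the package itself thanks to the re-export
        result = package.etl()
    finally:
        sys.path.remove(tmp)

print(result)  # etl called
```

Without the line in `__init__.py`, callers would have to write `from ingestion.ingestion import etl` instead of `from ingestion import etl`.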
Create Package Function File
Finally, we need a module file that contains the Python code to expose as a function. In this demo, we simply create a function for a create table statement that can be run in Synapse or Databricks. It accepts a DataFrame, a database name, and a table name. Spark is used only to define the spark.sql code section.
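def etl(df, database="meta", table="fan"):
    # Placeholder kept from the original demo; not used by the write below
    connection_string = ""
    # df.write is a property in PySpark, so it is not called directly;
    # saveAsTable is assumed here as the intended write
    df.write.saveAsTable(f"{database}.{table}")
    print(f"Command Executed: wrote to {database}.{table}")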
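The text mentions a create table statement issued through spark.sql. A minimal sketch of what that could look like follows. It assumes a SparkSession is passed in as `spark` (Databricks notebooks provide one under that name). The function name `create_table` and the column list are hypothetical, since the article does not specify a schema.

```python
def create_table(spark, database: str, table: str) -> str:
    """Run a CREATE TABLE statement through spark.sql and return the SQL text."""
    # Hypothetical columns; replace with the real schema
    statement = (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} "
        "(id INT, name STRING)"
    )
    spark.sql(statement)
    return statement
```

With the values used in this demo, the call would be `create_table(spark, "meta", "fan")`.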
Install Python Wheel Packages
pip install wheel
Install Check Wheel Package
Create & Verify Wheel File
python setup.py bdist_wheel
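By default, the build command writes the wheel to a dist folder. Because a wheel is a zip archive, one simple way to verify it is to list its contents with the standard library. The sketch below is one possible check, not the only one; the helper names `wheel_contents` and `latest_wheel` are illustrative.

```python
import zipfile
from pathlib import Path


def wheel_contents(wheel_path):
    """Return the sorted list of file names stored in a wheel."""
    with zipfile.ZipFile(wheel_path) as archive:
        return sorted(archive.namelist())


def latest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl in dist_dir, or None."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = latest_wheel()
    if wheel is None:
        print("No wheel found in dist/")
    else:
        for name in wheel_contents(wheel):
            print(name)
```

The listing should include the package modules (for example, files under ingestion/) along with a .dist-info folder holding the metadata.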
Databricks Python Wheel Task
Python wheel tasks can be executed on both interactive clusters and on job clusters. All the output is captured and logged as part of the task execution so it is easy to understand what happened without having to go into cluster logs.
To run a job with a wheel, first build the Python wheel locally, then upload it to cloud storage. In the task, specify the path of the wheel and choose the function to use as the entry point.
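For jobs defined through the API instead of the UI, the same settings can be expressed as a task definition. The sketch below builds such a payload as a Python dictionary. The field names follow the Databricks Jobs API 2.1 as far as I understand it and should be checked against the current documentation. The storage path, job name, task key, and cluster ID are placeholders, and using `etl` as the entry point is an assumption.

```python
import json

# Hypothetical upload location for the wheel built earlier
wheel_path = "dbfs:/FileStore/wheels/ingestion-0.0.1-py3-none-any.whl"

job_settings = {
    "name": "ingestion-wheel-demo",  # placeholder job name
    "tasks": [
        {
            "task_key": "run_ingestion",  # placeholder task key
            "python_wheel_task": {
                "package_name": "ingestion",
                "entry_point": "etl",  # assumed entry point
                "parameters": [],
            },
            # Attach the uploaded wheel to the cluster running the task
            "libraries": [{"whl": wheel_path}],
            "existing_cluster_id": "<cluster-id>",  # placeholder
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```

The printed JSON is only the job definition; submitting it requires a workspace URL and authentication, which are outside the scope of this sketch.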
Conclusion
In this article, we explored how to create a Python wheel file for easier distribution using VS Code and how to deploy Python wheel files to Databricks clusters.