Introduction
If you have worked with Azure Cognitive Services APIs such as the OCR API, the Read API, or the Form Recognizer API, you have probably come across the boundingBox field in the read results of the response. If the input image is slightly tilted, the bounding boxes in the response are tilted too, and the response reports the angle by which the page is tilted. Before working with the results, we sometimes need to rotate each boundingBox by that angle.
In this article we look at how this rotation can be done. The example is a receipt from KSEB photographed at an angle. We first send it to the Azure Read API to get the JSON output.
Sample input given to the Azure Read API
Sample Output from Azure Read API
{
  "status": "succeeded",
  "createdDateTime": "2020-12-21T15:13:55Z",
  "lastUpdatedDateTime": "2020-12-21T15:13:56Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 7.6307,
        "width": 445,
        "height": 1242,
        "unit": "pixel",
        "lines": [
          ....
          ....
          ....
          {
            "boundingBox": [
              185,
              91,
              366,
              113,
              364,
              131,
              183,
              109
            ],
            "text": "Demand/Disconnection Notice",
            "words": [
              {
                "boundingBox": [
                  186,
                  92,
                  319,
                  109,
                  318,
                  126,
                  184,
                  109
                ],
                "text": "Demand/Disconnection",
                "confidence": 0.751
              },
              {
                "boundingBox": [
                  323,
                  109,
                  366,
                  114,
                  365,
                  131,
                  321,
                  126
                ],
                "text": "Notice",
                "confidence": 0.907
              }
            ]
          },
          ....
          ....
          ....
        ]
      }
    ]
  }
}
The JSON response above reports that the receipt is tilted by 7.6307 degrees. To work with the bounding box coordinates, we first need to rotate them by this angle.
Rotate a point p by n degrees with respect to the origin o
To rotate a point p by n degrees about an origin o, we call a custom function, rotate, like this:
rotatedPoint = rotate(p, o, n);
The rotate function is shown below. It requires NumPy.
import numpy as np

def rotate(point, origin, degrees):
    # Rotate point about origin by the given angle in degrees.
    radians = np.deg2rad(degrees)
    x, y = point
    offset_x, offset_y = origin
    # Shift so that the rotation origin sits at (0, 0).
    adjusted_x = (x - offset_x)
    adjusted_y = (y - offset_y)
    cos_rad = np.cos(radians)
    sin_rad = np.sin(radians)
    # Apply the rotation, then shift back.
    qx = offset_x + cos_rad * adjusted_x + sin_rad * adjusted_y
    qy = offset_y + -sin_rad * adjusted_x + cos_rad * adjusted_y
    return qx, qy
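As a quick sanity check of the rotation direction, the sketch below restates the same formula with Python's standard math module (a substitution made here so the snippet runs without NumPy) and rotates two easy points by 90 degrees. With this formula a positive angle sends (1, 0) to (0, -1). In image coordinates, where y grows downward, that is a counterclockwise turn on screen, which is the direction needed to undo a tilt, assuming the reported angle is measured clockwise.

```python
import math

def rotate(point, origin, degrees):
    # Same formula as the article's rotate(), using math instead of NumPy.
    radians = math.radians(degrees)
    x, y = point
    offset_x, offset_y = origin
    adjusted_x = x - offset_x
    adjusted_y = y - offset_y
    cos_rad = math.cos(radians)
    sin_rad = math.sin(radians)
    qx = offset_x + cos_rad * adjusted_x + sin_rad * adjusted_y
    qy = offset_y - sin_rad * adjusted_x + cos_rad * adjusted_y
    return qx, qy

# Rotate (1, 0) by 90 degrees about the origin.
about_origin = rotate((1, 0), (0, 0), 90)
# Rotate (2, 1) by 90 degrees about (1, 1): the same turn with a shifted origin.
about_point = rotate((2, 1), (1, 1), 90)

print([round(v, 6) for v in about_origin])
print([round(v, 6) for v in about_point])
```

The second call shows what the origin argument does: the point keeps its distance from (1, 1) instead of from (0, 0).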
Building a function to correct the angle of the bounding box coordinates in the response
The bounding box is a list of 8 values, the x and y coordinates of its four corners, in the following order:
[
"left-top-x",
"left-top-y",
"right-top-x",
"right-top-y",
"right-bottom-x",
"right-bottom-y",
"left-bottom-x",
"left-bottom-y"
]
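When reading or debugging a box, it can help to view the flat list as four (x, y) corner pairs. A minimal sketch, where to_points is a hypothetical helper name that is not part of the Azure response or SDK:

```python
def to_points(bounding_box):
    # Pair each even-indexed value (x) with the odd-indexed value after it (y).
    return list(zip(bounding_box[0::2], bounding_box[1::2]))

# Line bounding box taken from the sample response above.
line_box = [185, 91, 366, 113, 364, 131, 183, 109]
corners = to_points(line_box)
print(corners)
```

The corners come out in the same order as the list: left-top, right-top, right-bottom, left-bottom.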
So to rotate a bounding box, we loop through the list in steps of 2, rotating one (x, y) pair per iteration. The angle comes from the page entry in the response:
angle = analysis["analyzeResult"]["readResults"][index]["angle"]
Here angle = 7.6307, and we rotate about the image origin, so the origin is (0, 0).
for ind in range(0, 7, 2):
    bBox[ind], bBox[ind + 1] = rotate((bBox[ind], bBox[ind + 1]), (0, 0), angle)
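To see the effect on real numbers, the sketch below applies the same loop to the line bounding box from the sample response and compares the slope of the top edge before and after. It restates rotate with the standard math module so it runs on its own, and top_edge_tilt is a hypothetical helper introduced only for this check. The top edge starts out tilted by roughly 7 degrees and should end up within about a degree of horizontal.

```python
import math

def rotate(point, origin, degrees):
    # Same formula as the article's rotate(), using math instead of NumPy.
    radians = math.radians(degrees)
    x, y = point
    ox, oy = origin
    dx, dy = x - ox, y - oy
    cos_rad, sin_rad = math.cos(radians), math.sin(radians)
    return ox + cos_rad * dx + sin_rad * dy, oy - sin_rad * dx + cos_rad * dy

def top_edge_tilt(box):
    # Angle in degrees of the edge from the left-top to the right-top corner.
    return math.degrees(math.atan2(box[3] - box[1], box[2] - box[0]))

angle = 7.6307
bBox = [185, 91, 366, 113, 364, 131, 183, 109]

tilt_before = top_edge_tilt(bBox)
for ind in range(0, 7, 2):
    bBox[ind], bBox[ind + 1] = rotate((bBox[ind], bBox[ind + 1]), (0, 0), angle)
tilt_after = top_edge_tilt(bBox)

print(round(tilt_before, 2), round(tilt_after, 2))
```

The small residual tilt is expected: the reported angle describes the page as a whole, while each line's box is detected separately.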
Now we know how to rotate one bounding box. Next, let's rotate all the bounding boxes for the lines in readResults. Nesting the snippet above inside a loop over lines, which in turn sits inside a loop over pages, does exactly that:
def correctAngle(analysis):
    for page in analysis["analyzeResult"]["readResults"]:
        for line in page['lines']:
            bBox = line['boundingBox']
            for ind in range(0, 7, 2):
                # Use this page's own angle; each page can be tilted differently.
                bBox[ind], bBox[ind + 1] = rotate((bBox[ind], bBox[ind + 1]), (0, 0), page["angle"])
            line['boundingBox'] = bBox
    return analysis
We have now rotated the bounding boxes of the lines, but each word has a bounding box too, so we apply the same logic to the words inside the line loop. Only pages with a non-zero angle need rotating, so we check the angle before doing any work. After rotating a page, we set its angle to 0 so that the response matches the corrected coordinates and a second call does not rotate the page again. The code is below.
def correctAngle(analysis):
    for page in analysis["analyzeResult"]["readResults"]:
        if page["angle"] != 0:
            for line in page['lines']:
                bBox = line['boundingBox']
                for ind in range(0, 7, 2):
                    bBox[ind], bBox[ind + 1] = rotate((bBox[ind], bBox[ind + 1]), (0, 0), page["angle"])
                line['boundingBox'] = bBox
                for word in line['words']:
                    wbBox = word['boundingBox']
                    for ind in range(0, 7, 2):
                        wbBox[ind], wbBox[ind + 1] = rotate((wbBox[ind], wbBox[ind + 1]), (0, 0), page["angle"])
                    word['boundingBox'] = wbBox
            page["angle"] = 0
    return analysis
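The line loop and the word loop repeat the same four-corner rotation, so one possible refactoring moves it into a small helper. This is only a sketch: rotate_box and correct_angle are hypothetical names, rotate is restated with the standard math module so the snippet is self-contained, and the behaviour is intended to match the function above, including modifying the response in place. The response used here is a small hand-made stand-in, not real API output.

```python
import math

def rotate(point, origin, degrees):
    # Same formula as the article's rotate(), using math instead of NumPy.
    radians = math.radians(degrees)
    x, y = point
    ox, oy = origin
    dx, dy = x - ox, y - oy
    cos_rad, sin_rad = math.cos(radians), math.sin(radians)
    return ox + cos_rad * dx + sin_rad * dy, oy - sin_rad * dx + cos_rad * dy

def rotate_box(box, degrees, origin=(0, 0)):
    # Rotate each (x, y) pair of a flat 8-value bounding box in place.
    for ind in range(0, len(box), 2):
        box[ind], box[ind + 1] = rotate((box[ind], box[ind + 1]), origin, degrees)
    return box

def correct_angle(analysis):
    for page in analysis["analyzeResult"]["readResults"]:
        angle = page["angle"]
        if angle == 0:
            continue  # nothing to correct on this page
        for line in page["lines"]:
            rotate_box(line["boundingBox"], angle)
            for word in line["words"]:
                rotate_box(word["boundingBox"], angle)
        page["angle"] = 0  # the coordinates are now corrected
    return analysis

# Hand-made stand-in for a Read API response: one line with one word.
analysis = {"analyzeResult": {"readResults": [{
    "page": 1,
    "angle": 7.6307,
    "lines": [{
        "boundingBox": [185, 91, 366, 113, 364, 131, 183, 109],
        "text": "Demand/Disconnection Notice",
        "words": [{
            "boundingBox": [323, 109, 366, 114, 365, 131, 321, 126],
            "text": "Notice",
        }],
    }],
}]}}

result = correct_angle(analysis)
page = result["analyzeResult"]["readResults"][0]
print(page["angle"], [round(v, 2) for v in page["lines"][0]["boundingBox"]])
```

Keeping the rotation in one helper means a later change, such as rotating about a different origin, only has to be made in one place.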
Sample Output
Processing the JSON response above with this function rotates every coordinate once by 7.6307 degrees about (0, 0) and resets the page angle to 0. The output is shown below, with coordinates rounded to two decimal places for readability.
{
  "status": "succeeded",
  "createdDateTime": "2020-12-21T15:13:55Z",
  "lastUpdatedDateTime": "2020-12-21T15:13:56Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0,
        "width": 445,
        "height": 1242,
        "unit": "pixel",
        "lines": [
          ....
          ....
          ....
          {
            "boundingBox": [
              195.45,
              65.63,
              377.76,
              63.40,
              378.17,
              81.51,
              195.85,
              83.73
            ],
            "text": "Demand/Disconnection Notice",
            "words": [
              {
                "boundingBox": [
                  196.57,
                  66.49,
                  330.65,
                  65.68,
                  331.92,
                  82.66,
                  196.84,
                  83.60
                ],
                "text": "Demand/Disconnection",
                "confidence": 0.751
              },
              {
                "boundingBox": [
                  334.61,
                  65.14,
                  377.90,
                  64.39,
                  379.16,
                  81.37,
                  334.89,
                  82.26
                ],
                "text": "Notice",
                "confidence": 0.907
              }
            ]
          },
          ....
          ....
          ....
        ]
      }
    ]
  }
}
Final code
import numpy as np

def rotate(point, origin, degrees):
    # Rotate point about origin by the given angle in degrees.
    radians = np.deg2rad(degrees)
    x, y = point
    offset_x, offset_y = origin
    adjusted_x = (x - offset_x)
    adjusted_y = (y - offset_y)
    cos_rad = np.cos(radians)
    sin_rad = np.sin(radians)
    qx = offset_x + cos_rad * adjusted_x + sin_rad * adjusted_y
    qy = offset_y + -sin_rad * adjusted_x + cos_rad * adjusted_y
    return qx, qy

def correctAngle(analysis):
    for page in analysis["analyzeResult"]["readResults"]:
        if page["angle"] != 0:
            for line in page['lines']:
                bBox = line['boundingBox']
                for ind in range(0, 7, 2):
                    bBox[ind], bBox[ind + 1] = rotate((bBox[ind], bBox[ind + 1]), (0, 0), page["angle"])
                line['boundingBox'] = bBox
                for word in line['words']:
                    wbBox = word['boundingBox']
                    for ind in range(0, 7, 2):
                        wbBox[ind], wbBox[ind + 1] = rotate((wbBox[ind], wbBox[ind + 1]), (0, 0), page["angle"])
                    word['boundingBox'] = wbBox
            # Mark the page as corrected so a second call does not rotate it again.
            page["angle"] = 0
    return analysis
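correctAngle changes the response it is given. When the original coordinates are still needed, for example to draw boxes on the uncorrected image, one option is to pass in a deep copy. The sketch below restates the two functions compactly (using the standard math module instead of NumPy, an assumption made so it runs on its own, and with the angle reset to 0 after rotation) and runs them on a small hand-made stand-in for a Read API response.

```python
import copy
import math

def rotate(point, origin, degrees):
    # Same formula as the article's rotate(), using math instead of NumPy.
    radians = math.radians(degrees)
    x, y = point
    ox, oy = origin
    dx, dy = x - ox, y - oy
    cos_rad, sin_rad = math.cos(radians), math.sin(radians)
    return ox + cos_rad * dx + sin_rad * dy, oy - sin_rad * dx + cos_rad * dy

def correctAngle(analysis):
    for page in analysis["analyzeResult"]["readResults"]:
        if page["angle"] != 0:
            for line in page["lines"]:
                # The line box and all of its word boxes get the same rotation.
                boxes = [line["boundingBox"]] + [w["boundingBox"] for w in line["words"]]
                for box in boxes:
                    for ind in range(0, 7, 2):
                        box[ind], box[ind + 1] = rotate((box[ind], box[ind + 1]), (0, 0), page["angle"])
            page["angle"] = 0
    return analysis

# Hand-made stand-in for a Read API response: one line with one word.
original = {"analyzeResult": {"readResults": [{
    "page": 1,
    "angle": 7.6307,
    "lines": [{
        "boundingBox": [185, 91, 366, 113, 364, 131, 183, 109],
        "text": "Demand/Disconnection Notice",
        "words": [{
            "boundingBox": [186, 92, 319, 109, 318, 126, 184, 109],
            "text": "Demand/Disconnection",
        }],
    }],
}]}}

# Work on a copy so the original response stays untouched.
corrected = correctAngle(copy.deepcopy(original))

# A second call is a no-op because the page angle is already 0.
snapshot = copy.deepcopy(corrected)
correctAngle(corrected)

print(original["analyzeResult"]["readResults"][0]["angle"])
print(corrected["analyzeResult"]["readResults"][0]["angle"])
```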
Conclusion
This article showed how to rotate the bounding boxes in a response by the tilt angle reported for the page. I hope you were able to achieve the same.
To learn how to build a Flask application and get started with Azure Cognitive Services, see the article series starting with "Python Flask App And Azure Cognitive Services Read API - Render HTML Page And File Transfer Between Client And Server".