Problem Statement
The Power BI REST API Datasets – Execute Queries endpoint has the following limitations:
- A maximum of 100,000 rows or 1,000,000 values per query (whichever is hit first). For example, if you query for 5 columns, you can get back a maximum of 100,000 rows. If you query for 20 columns, you can get back a maximum of 50,000 rows (1,000,000 divided by 20).
- A maximum of 15 MB of data per query. Once 15 MB is exceeded, the current row is completed, but no additional rows are written.
Is there an automated way to extract data from a Power BI dataset when the result exceeds 15 MB or 100,000 rows?
Prerequisites
- Power BI Report / Dataset
- Azure Data Factory / Synapse
- Azure SQL Database / SQL Database
- Azure Blob Storage
- Logic App (this can be skipped if we leverage the Web activity / Dataflow in Data Factory / Synapse)
In an earlier article, we leveraged Service Principal authentication for data extraction from Power BI; for a change of scenario, this use case leverages a Logic App with Azure AD user authentication to extract data from the dataset.
Solution
The pipeline uses 4 variables: RowCount, LowerRange, UpperRange, and FinalQuery.
- To overcome the row/size limitation, we need to get the overall row count (the total number of rows within the dataset table), partition the dataset into batch ranges, extract the data partition by partition, and finally merge the partitions into a single file.
- To get the overall row count, we leverage a Web activity to execute a query on the Power BI dataset.
URL : @concat('https://api.powerbi.com/v1.0/myorg/datasets/',pipeline().parameters.PBIDatasetId,'/executeQueries')
Body :
{"queries":[{"query":"@{pipeline().parameters.RowCountQuery}"}],"serializerSettings":{"includeNulls":true}}
where the value of the RowCountQuery parameter is a DAX query that returns the row count:
EVALUATE SUMMARIZE(DataSharkX,"Count Row",COUNT(DataSharkX[EmployeeID]))
Below is my Power BI Dataset table.
Output
We would now need to extract the row count from the JSON output and assign it to the RowCount variable.
Expression:
@replace(string(last(split(string(activity('Row Count').output.results[0].tables[0].rows[0]),':'))),'}','')
Output
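For reference, the executeQueries response for the row-count query has roughly the following shape (the value and the "[Count Row]" key are illustrative, matching the alias in the DAX above); the expression above simply isolates the number after the last ':' in the single returned row.
{
  "results": [
    {
      "tables": [
        {
          "rows": [
            { "[Count Row]": 270000 }
          ]
        }
      ]
    }
  ]
}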
For the current scenario, we plan to extract the data in batches of 90,000 rows. Hence, we have assigned the value 90000 to the PartitionRange pipeline parameter.
Now, based on the overall row count and the partition range, we need to identify the number of iterations/batches in which we will extract the data.
For our use case, we have used a Lookup Activity to execute a SQL query on an Azure SQL Database.
-- Query Value: builds a '|'-delimited list of lower;upper rank ranges from the total row count and the partition size
DECLARE @@initialize INT = 1;
DECLARE @@partition INT = @{pipeline().parameters.PartitionRange};
DECLARE @@count INT = @{variables('RowCount')};
DECLARE @@maxrun INT = CEILING(@@count / @@partition);
DECLARE @@dividendcount INT = @@partition;
DECLARE @@con VARCHAR(MAX) = (SELECT CONCAT(@@initialize, ';', @@dividendcount));
DECLARE @@counter INT = 0;
WHILE (@@counter < @@maxrun)
BEGIN
SET @@initialize = (SELECT @@dividendcount);
SET @@dividendcount = (SELECT @@partition + @@dividendcount);
SET @@con = (SELECT CONCAT(@@con, '|', @@initialize, ';', @@dividendcount));
SET @@counter = (SELECT @@counter + 1);
END
SELECT @@con AS ConValue;
Output
Here we have generated 3 partitions with the following ranges:
- 1;90000
- 90000;180000
- 180000;270000
Split the ConValue (the output from step #4) on '|' and iterate over it with a ForEach activity.
Items :
@split(string(activity('Generate Partitions').output.value[0].ConValue),'|')
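With the sample ConValue above, the Items expression yields an array like the following, and each element becomes item() inside the ForEach activity:
[ "1;90000", "90000;180000", "180000;270000" ]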
6. Within the ForEach activity
Set the values of the LowerRange and UpperRange variables (for the item '1;90000', LowerRange evaluates to 1 and UpperRange to 90000).
- LowerRange
Expression :
@substring(item(),0,indexOf(item(),';'))
Output
- UpperRange
Expression :
@substring(item(),add(indexOf(item(),';'),1),sub(sub(length(item()),indexOf(item(),';')),1))
- Generate the FinalQuery to execute on the Power BI dataset and extract the data in batches.
- The DAX query below is equivalent to the ROW_NUMBER function in T-SQL: we generate a Rank column over the source table and, in every iteration, filter the rows whose Rank falls between LowerRange and UpperRange.
DEFINE
VAR Employee1 =
SUMMARIZECOLUMNS(
DataSharkX[EmployeeID],
DataSharkX[EmployeeName],
DataSharkX[Rating],
"EID",
VALUES(DataSharkX[EmployeeID])
)
VAR EmployeeTable =
ADDCOLUMNS(
Employee1,
"Rank",
RANK.EQ([EID], DataSharkX[EmployeeID], ASC)
)
EVALUATE
FILTER(
EmployeeTable,
[Rank] >= @{variables('LowerRange')} &&
[Rank] < @{variables('UpperRange')}
)
Output
- Trigger the Logic App synchronously (via the Webhook activity); the Logic App executes this FinalQuery on the Power BI dataset and uploads the resulting file to Azure Blob Storage. A sketch of the Webhook body is shown below.
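As a sketch (property names are assumed to match the Logic App trigger parameters, and quote characters inside the DAX may need escaping depending on how the body is constructed), the Webhook activity body could look like:
Body :
{
  "Query": "@{variables('FinalQuery')}",
  "Iteration": "@{variables('LowerRange')}"
}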
Logic App Design
The highlighted SharePoint section can be ignored in the current context if need be.
- HTTP Request
Query and Iteration are the input parameters that map to the FinalQuery and LowerRange variable values passed from ADF/Synapse; a sample trigger schema is sketched below.
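A minimal sketch of the request body JSON schema for the "When an HTTP request is received" trigger, assuming the parameter names above; callBackUri is appended to the body automatically by the ADF/Synapse Webhook activity:
{
  "type": "object",
  "properties": {
    "Query": { "type": "string" },
    "Iteration": { "type": "string" },
    "callBackUri": { "type": "string" }
  }
}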
- Run Query against Dataset
Sign in with the user ID that has the necessary access to the Power BI dataset, and select the appropriate workspace name and dataset.
- Create CSV Table
- Create Blob
Connect to the resource group and provide the storage account details.
The blob name uses the expression below:
concat(utcNow(),triggerBody()?['Iteration'],'pbi.csv')
- (Optional) Create File (SharePoint)
where the file name uses the expression below:
concat(utcNow(),triggerBody()?['Iteration'],'pbi.csv')
- For synchronous calls from the ADF/Synapse Webhook activity, instead of using a Request-Response pattern, we use an HTTP POST to the callBackUri provided in the request body (a sketch of the HTTP action follows).
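A minimal sketch of that final HTTP action, assuming the default Webhook behavior where a POST to the callBackUri marks the activity as complete (the body shown is illustrative and can be left empty):
Method : POST
URI : @{triggerBody()?['callBackUri']}
Body :
{}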
Output after all iterations
To merge the individual partitioned files into a single file, leverage the Copy activity (a sketch of the relevant source/sink settings follows the screenshots below).
Source Setting
Source Dataset
Sink Setting
Sink Dataset
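A minimal sketch of the relevant Copy activity source/sink settings for a delimited-text (CSV) dataset, assuming the wildcard pattern matches the partition file names generated by the Logic App:
{
  "source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
      "type": "AzureBlobStorageReadSettings",
      "wildcardFileName": "*pbi.csv"
    }
  },
  "sink": {
    "type": "DelimitedTextSink",
    "storeSettings": {
      "type": "AzureBlobStorageWriteSettings",
      "copyBehavior": "MergeFiles"
    }
  }
}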
Output
Finally, delete the individual partitioned data files.
Note. One can leverage the Web activity in place of the Webhook/Logic App to execute the queries as well, but the partition range needs to be much lower than 100,000 because the Web activity has a smaller response size limit.
Error message when the response size is too large
In such a case, one can use the Dataflow activity to execute the queries instead.
File Output