Query Folding in Fabric Dataflow Gen2 (Power Query Online)

Query folding is an optimization process where Power Query tries to push as much execution as possible to data sources either on-premises or in the cloud. The Query Folding determines which parts of query can be executed directly by the data source such as SQL Server, Fabric Lakehouse, Fabric Warehouse, etc.

Demo

In the screenshot below, I have transaction2016 and transaction2017 as #delta tables resident in Lakehouse1 and Lakehouse2 within my Fabric Tenant. Dataflow Gen2 is used to access and append the two tables from the Lakehouses.

Query folding

After the append operation, in the applied step, the Source step shows a green indicator. This implies that the step is evaluated at the source data level, which, in my case, are Lakehouse1 and Lakehouse2, respectively. Power Query pushed the execution to the source data, and this is called Query Folding

Power Query

A right-click on the folding Source steps shows a flyover that allows viewing the query plan.

Folding Source steps

In the Query Plan, the Value.NativeQuery is visible and shows all the selected columns in the LH1 (Lakehouse1) table called transaction2016 with a union of all SQL operators appending an equal number of columns from the LH2 (Lakehouse2) table called transaction2017.

SQL operators

A click on the Sql.Database view details show that EnableCrossDatabaseFolding is set to true, implying query folding.

EnableCrossDatabase

Conversely, within the same applied steps, the Groupby operation is performed and returns a red indicator. This implies that the GroupBy step is evaluated outside the source data (evaluated locally and not-folding).

Groupby operation

Conclusion

By pushing as much execution to the source data as possible, data transfer and processing within the Power Query engine is reduced. In addition, only relevant data is retrieved, leading to faster query execution.


Similar Articles