Introduction
While analyzing Cloud Dataflow, I felt my analysis might help someone who is working on, or looking for, the same sort of solution. Put simply, Cloud Dataflow is a cloud data-processing tool that transforms data from one form to another: we send input into a pipeline and get the desired output, based on the logic we write into the pipeline.
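To make that concrete, here is a minimal sketch using the Apache Beam Python SDK, which is the programming model Dataflow executes. The sample elements and the uppercasing logic are just illustrations, not part of any real pipeline:

```python
import apache_beam as beam

# A minimal pipeline: input goes in, our logic transforms it, output comes out.
# Run as-is and it executes locally on the DirectRunner; add
# --runner=DataflowRunner plus project/region options to run it on Dataflow.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read input" >> beam.Create(["alpha", "beta", "gamma"])  # sample input
        | "Transform" >> beam.Map(str.upper)                       # our logic
        | "Write output" >> beam.Map(print)                        # desired output
    )
```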
Cloud Dataflow
Cloud Dataflow ≈ Cloud Functions + Cloud Pub/Sub (a rough analogy)
Instead of wiring together a couple of services, we can handle the same work in a single one. Cloud Dataflow is also a fully managed service, so we don't need to take care of provisioning, auto-scaling, and so on. This service is used heavily in the analytics field to understand our end users; based on that understanding, we can offer recommendations that increase our product sales.
Cloud Dataflow ships with many predefined data-transformation templates, which we can use when one fits our use case; otherwise, there is a custom-template option, where we write our own template and reuse it across our projects.
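As a sketch of the custom-template option, the Beam Python SDK lets a pipeline declare runtime parameters as value providers, which is the basis of classic Dataflow templates. The option names `--input_path` and `--output_path` below are my own placeholder examples:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class MyTemplateOptions(PipelineOptions):
    """Options for a pipeline that can be staged once as a template
    and launched later with different parameter values."""

    @classmethod
    def _add_argparse_args(cls, parser):
        # Value providers defer these values until the template is launched.
        parser.add_value_provider_argument("--input_path", type=str)
        parser.add_value_provider_argument("--output_path", type=str)


def run():
    options = MyTemplateOptions()
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(options.input_path)
            | "Write" >> beam.io.WriteToText(options.output_path)
        )


if __name__ == "__main__":
    run()
```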
Architecture (High-Level Design)
In Cloud Dataflow, we have two processing approaches:
- Batch Processing => Data is collected over a period of time. Once a batch is collected, processing starts and moves the data from the source to another place. Batch processing works on large sets of data, so latency is not a concern in this case (a sketch follows the use cases below).
- Streaming Processing => Data arrives continuously and is processed in small chunks, so processing time is very low and there is little latency. However, it costs a bit more than batch processing (a sketch follows its use cases below).
Use Cases of Batch Processing
- Payroll => sending payslip emails
- Internal announcements => sending emails to employees
- EOD (end-of-day) calculations
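Here is a rough batch sketch in the same Beam Python style, in the spirit of an EOD calculation: read a day's transactions, total the amounts, and write a single result. The bucket path and the CSV layout (amount in the second column) are hypothetical:

```python
import apache_beam as beam

# Batch sketch: a bounded source (a file), one pass, one output at the end.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read transactions" >> beam.io.ReadFromText("gs://my-bucket/eod/transactions.csv")
        | "Parse amount" >> beam.Map(lambda line: float(line.split(",")[1]))
        | "Sum for the day" >> beam.CombineGlobally(sum)
        | "Write total" >> beam.io.WriteToText("gs://my-bucket/eod/total")
    )
```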
Use Cases of Streaming Processing
- Real-time product recommendations
- Fraud detection
- Social-media advertising targeted at a user base
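And a streaming sketch: read events from a Pub/Sub topic and count them in one-minute windows. The topic name is a hypothetical placeholder, and in a real job the counts would go to a sink such as BigQuery rather than being printed:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming sketch: an unbounded source (Pub/Sub), windowed into
# one-minute chunks so results can be emitted continuously.
options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read events" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/clicks")
        | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))  # 1-minute windows
        | "Pair" >> beam.Map(lambda _msg: ("events", 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```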
I hope this article gave you a high-level idea of Cloud Dataflow. We will see more in the next session :)