I have a situation here. I want to figure out the best way to ingest streaming API data from an application into GCP BigQuery while having data masking in place. However, some downstream admin users will still need to see the unmasked data.
What I am thinking is an event-driven ingestion: use Pub/Sub to trigger a Dataflow pipeline as soon as a new file is published, with the pipeline containing two branches (a rough sketch follows the list below).
Branch 1: Call Cloud DLP to mask the incoming data and load it into a table T1 in BigQuery.
Branch 2: Load the unmasked (as-is) data from the source into another table T2 in BigQuery, similar to what the "Pub/Sub Topic to BigQuery" template does.
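For context, this is roughly the kind of pipeline I am picturing, written with the Beam Python SDK. The project, topic, table names, DoFn name, and DLP config are placeholders I made up for illustration, not a working implementation:

```python
# Rough sketch of the two-branch streaming pipeline I have in mind.
# All resource names (my-project, ingest topic, analytics tables) are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import dlp_v2


class MaskWithDlp(beam.DoFn):
    """Calls the DLP deidentify API on each record (placeholder masking config)."""

    def __init__(self, project):
        self.project = project

    def setup(self):
        # Create the DLP client once per worker, not per element.
        self.dlp = dlp_v2.DlpServiceClient()

    def process(self, record):
        response = self.dlp.deidentify_content(
            request={
                "parent": f"projects/{self.project}",
                # Placeholder config: mask every detected infoType with '#'.
                "deidentify_config": {
                    "info_type_transformations": {
                        "transformations": [
                            {"primitive_transformation": {
                                "character_mask_config": {"masking_character": "#"}}}
                        ]
                    }
                },
                "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
                # Treat the serialized record as free text for inspection.
                "item": {"value": json.dumps(record)},
            }
        )
        yield json.loads(response.item.value)


def run():
    options = PipelineOptions(streaming=True, project="my-project")  # placeholder
    with beam.Pipeline(options=options) as p:
        records = (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/ingest")
            | "ParseJson" >> beam.Map(json.loads)
        )

        # Branch 1: mask via DLP, then load into T1 (table assumed to exist).
        (records
         | "MaskWithDlp" >> beam.ParDo(MaskWithDlp("my-project"))
         | "WriteMasked" >> beam.io.WriteToBigQuery(
             "my-project:analytics.t1_masked",
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))

        # Branch 2: load the raw records as-is into T2 (table assumed to exist).
        (records
         | "WriteRaw" >> beam.io.WriteToBigQuery(
             "my-project:analytics.t2_unmasked",
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))


if __name__ == "__main__":
    run()
```

Both branches would consume the same parsed PCollection, so T1 and T2 should stay in sync; whether that is the right pattern is part of my question.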
I can later use role-based access control to give general users access to T1 and admins access to T2.
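As an example of what I mean by role-based access (again, the project, table names, and groups are placeholders), I would grant table-level read access separately for each audience:

```python
# Sketch of the table-level access split: general users read only the masked
# table, admins read the raw table. Names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project


def grant_read(table_id, member):
    """Grant BigQuery read access on a single table to a user or group."""
    policy = client.get_iam_policy(table_id)
    policy.bindings.append(
        {"role": "roles/bigquery.dataViewer", "members": {member}}
    )
    client.set_iam_policy(table_id, policy)


grant_read("my-project.analytics.t1_masked", "group:general-users@example.com")
grant_read("my-project.analytics.t2_unmasked", "group:admins@example.com")
```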
My question to you is about the first branch of the Dataflow pipeline. Is there any template available that calls DLP and masks the incoming data row by row? How can this be done? Do I need to write an Apache Beam pipeline here?
Or is it the case that my entire design is wrong and a better approach could be implemented as a whole? Please guide me, so I can get direction for my next project and build the Dataflow pipeline accordingly.