5,575 questions
0
votes
1
answer
24
views
Disable auto scaling for templated jobs
In Dataflow, you can run jobs without autoscaling. This is typically achieved by setting a pipeline_option called autoscaling_algorithm to NONE. Attempting the equivalent on Templated Dataflow Jobs ...
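For context, a minimal sketch of the non-templated case the excerpt describes, using Beam's Python SDK worker options; the project, region, and bucket values are placeholders:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Worker options recognized by the Dataflow runner: with autoscaling
# disabled, num_workers pins the worker pool size for the whole job.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # placeholder
    region="us-central1",                # placeholder
    temp_location="gs://my-bucket/tmp",  # placeholder
    autoscaling_algorithm="NONE",
    num_workers=3,
)

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create([1, 2, 3]) | beam.Map(print)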
0
votes
1
answer
48
views
How to prevent deletions from source (GCP CloudSQL MySQL) reflecting in GCP BigQuery using Datastream?
Description:
We are currently using Google Cloud Datastream to replicate data from a CloudSQL (MySQL) instance into BigQuery in near real-time. The replication works perfectly for insert and update ...
0
votes
0
answers
33
views
Azure Data Factory / Data Flow, how to extract data from JSON where the ids are the keys?
In an Azure Data Factory Data Flow, I am using a REST endpoint as the data source to get a JSON of data. However, the data arrives in a strange format: it is a dictionary of keys where the key value is ...
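To pin down the reshape being asked about, here is a hypothetical Python sketch (field names invented) of turning a dict keyed by id into a list of rows with the id promoted to a column:

# The dictionary-of-keys shape the REST endpoint returns ...
payload = {
    "101": {"name": "alice", "score": 7},
    "102": {"name": "bob", "score": 9},
}
# ... flattened so each key becomes an ordinary "id" field.
rows = [{"id": key, **fields} for key, fields in payload.items()]
print(rows)  # [{'id': '101', 'name': 'alice', 'score': 7}, ...]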
0
votes
0
answers
49
views
+50 bounty
BigQuery Performance Issue After Switching Data Pipeline to Dataflow
Problem
I'm experiencing significant query performance degradation in BigQuery for recent partitions after switching our data pipeline from a sequential Talend approach to Apache Beam/Dataflow.
...
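One mitigation that commonly comes up for slow queries over recent partitions is writing via batch load jobs rather than streaming inserts, since streamed rows sit in the streaming buffer before being columnarized. A hedged Beam Python sketch; the table, schema, and bucket are placeholders:

import apache_beam as beam

with beam.Pipeline() as p:
    rows = p | beam.Create([{"id": 1, "ts": "2024-01-01"}])
    # FILE_LOADS writes through BigQuery load jobs instead of the
    # streaming API, so rows land directly in managed storage.
    _ = rows | beam.io.WriteToBigQuery(
        "my-project:dataset.events",  # placeholder
        schema="id:INTEGER,ts:STRING",
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        custom_gcs_temp_location="gs://my-bucket/bq-tmp",  # placeholder
    )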
0
votes
1
answer
33
views
Azure DF error: Unable to parse expression
I am trying to use a dataset parameter set in the pipeline to make my blob path dynamic for each data flow I've created. However, just testing this first data flow, I keep getting an error saying '...
0
votes
1
answer
28
views
Cloud Scheduler to trigger dataflow flex template
I'm struggling to make my Flex Template work with Cloud Scheduler.
I was able to create it, and I can run it from my local machine, through Dataflow's "create job from template", or using a ...
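For reference, a Cloud Scheduler HTTP job for this typically just POSTs to the flexTemplates.launch REST endpoint. A sketch of that call in Python, handy for validating the request body before wiring it into Scheduler; the template path and parameters are placeholders:

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application-default credentials with the cloud-platform scope.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

region = "us-central1"  # placeholder
url = (f"https://dataflow.googleapis.com/v1b3/projects/{project}"
       f"/locations/{region}/flexTemplates:launch")
body = {
    "launchParameter": {
        "jobName": "scheduled-flex-run",
        "containerSpecGcsPath": "gs://my-bucket/templates/spec.json",  # placeholder
        "parameters": {"input": "gs://my-bucket/in/*.csv"},  # placeholder
    }
}
resp = session.post(url, json=body)
resp.raise_for_status()
print(resp.json()["job"]["id"])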
0
votes
1
answer
63
views
Dataflow Flex Template Docker issue: Cannot start an expansion service since neither Java nor Docker executables are available in the system
I'm trying to run a Dataflow job using a flex template in Docker. Here is what I have:
FROM python:3.11-slim
COPY --from=apache/beam_python3.11_sdk:2.54.0 /opt/apache/beam /opt/apache/beam
COPY --from=...
2
votes
1
answer
50
views
What’s the difference between regular Apache Beam connectors and Managed I/O?
Apache Beam recently introduced Managed I/O APIs for Java and Python. What is the difference between Managed I/O and the regular Apache Beam connectors (sources and sinks)?
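A rough sketch of what the Managed I/O style looks like in the Python SDK, contrasted with a classic connector call; the managed config keys vary by connector and SDK version, so treat them as assumptions:

import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    # Managed style: one generic, config-driven transform that the
    # runner can manage and upgrade on the pipeline's behalf.
    iceberg_rows = p | managed.Read(
        managed.ICEBERG,
        config={  # assumed keys, for illustration only
            "table": "db.events",
            "catalog_name": "my_catalog",
        },
    )
    # Classic connector style: a dedicated transform with its own
    # parameters, pinned to the SDK version it ships with.
    lines = p | beam.io.ReadFromText("gs://my-bucket/in/*.txt")  # placeholder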
0
votes
1
answer
44
views
Apache Beam Cross-language JDBC (MSSQL) - incorrect negative Integer type conversion
We use the JDBC cross-language transform to read data from MSSQL into BigQuery, and we noticed that negative integers are being converted incorrectly.
For example: if we have an INT column in the source with value (-1), ...
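Pending a root-cause fix, one hedged workaround for cross-language type-mapping surprises is to cast in the source query so the wire type is unambiguous. A sketch with Beam's JDBC read; connection details are placeholders:

from apache_beam.io.jdbc import ReadFromJdbc

read = ReadFromJdbc(
    table_name="dbo.events",  # placeholder; the query below drives the read
    driver_class_name="com.microsoft.sqlserver.jdbc.SQLServerDriver",
    jdbc_url="jdbc:sqlserver://db-host:1433;databaseName=mydb",  # placeholder
    username="user",      # placeholder
    password="secret",    # placeholder
    # CAST to BIGINT so negative values survive the conversion intact.
    query="SELECT CAST(int_col AS BIGINT) AS int_col FROM dbo.events",
)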
0
votes
1
answer
31
views
Escaping non-delimiters in a large csv file in Power BI Dataflow
I am currently attempting to read a large csv file (2.05GB) into Power BI's dataflow. The csv file has 5 million rows and 38 columns (as read separately in a Jupyter notebook), and there are some cells ...
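For concreteness, the quoting rule at stake, shown with Python's csv module (CSV parsers, Power BI's included, generally follow the same RFC 4180 convention): a delimiter inside a quoted field must not split the row.

import csv
import io

raw = 'id,comment\n1,"contains, a comma"\n2,"embedded ""quote"""\n'
for row in csv.reader(io.StringIO(raw)):
    print(row)
# ['id', 'comment']
# ['1', 'contains, a comma']
# ['2', 'embedded "quote"']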
0
votes
1
answer
34
views
How does Dataflow charge for read operations from Cloud Storage?
I am trying to understand how Google Cloud Dataflow charges when reading a file with beam.io.ReadFromText. From my understanding, every time something is read from a Google Cloud bucket, it incurs ...
1
vote
2
answers
57
views
Vertical autoscaling Dataflow experiments args don't get properly parsed
We want to enable vertical autoscaling on our Dataflow Prime pipeline for a Python container:
https://cloud.google.com/dataflow/docs/vertical-autoscaling
We're trying to run our pipeline through this ...
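For context, a hedged sketch of how experiments flags are usually handed to Beam Python options; one frequent parsing pitfall is joining several values into a single comma-separated flag instead of repeating it. The experiment names below are placeholders for whatever the vertical-autoscaling docs prescribe:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    flags=[
        "--dataflow_service_options=enable_prime",
        # Repeat --experiments once per value; names are placeholders.
        "--experiments=experiment_one",
        "--experiments=experiment_two",
    ]
)
print(options.get_all_options()["experiments"])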
0
votes
1
answer
39
views
Leaving message unacknowledged in Benthos job with gcp_pubsub input
How does Benthos handle the acknowledgement of pubsub messages? How can we manage ack/unack based on custom if-else conditions?
Here is the scenario I'm trying to achieve:
I have written a Benthos job ...
0
votes
2
answers
64
views
GCP Batch Dataflow - Records Dropped while inserting to BigQuery
I'm using GCP Batch Dataflow to process data that I'm picking from a table. The input here is table data, where I'm using a query in Java to get the data.
After processing, when I'm trying to insert the ...
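The pipeline in question is Java, but the dead-letter pattern usually suggested for silently dropped rows translates directly; a Python sketch for consistency with the other examples here (attribute names follow recent Beam releases; the table is a placeholder): capture the writer's failed-rows output instead of discarding it.

import apache_beam as beam
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

with beam.Pipeline() as p:
    rows = p | beam.Create([{"id": 1}, {"id": "not-an-int"}])
    result = rows | beam.io.WriteToBigQuery(
        "my-project:dataset.table",  # placeholder
        schema="id:INTEGER",
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        insert_retry_strategy=RetryStrategy.RETRY_NEVER,
    )
    # Rows BigQuery rejected surface here; route them to a dead-letter
    # table or sink instead of letting them vanish.
    _ = result.failed_rows | beam.Map(print)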
0
votes
0
answers
35
views
Tracking FlowFile UUID Across Processors in Apache NiFi 2.1.0
How can I effectively track the original FlowFile UUID across different processors in Apache NiFi, especially after using the SplitJson processor, which creates new FlowFiles with different UUIDs? I ...