0 votes · 1 answer · 24 views

Disable auto scaling for templated jobs

In Dataflow, you can run jobs without autoscaling. This is typically achieved by setting a pipeline_option called autoscaling_algorithm to NONE. Attempting the equivalent on Templated Dataflow Jobs ...
asked by user30237673
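A minimal sketch of what the question is attempting. For a classic (GCS-staged) template, runtime behaviour is set through the launch request's environment block; `additionalExperiments` is a real field of the templates launch API, but whether `autoscaling_algorithm=NONE` is honoured when passed there is exactly the open question. Job name, parameters, and worker counts below are placeholders.

```python
# Sketch: a launch request body for a classic Dataflow template that tries
# to disable autoscaling via the environment block. Whether the service
# honours "autoscaling_algorithm=NONE" as an experiment for templated jobs
# is the unresolved part of the question.
def build_launch_body(job_name, template_params):
    return {
        "jobName": job_name,
        "parameters": template_params,
        "environment": {
            "numWorkers": 3,
            "additionalExperiments": ["autoscaling_algorithm=NONE"],
        },
    }

body = build_launch_body("my-job", {"input": "gs://bucket/in.txt"})
print(body["environment"]["additionalExperiments"])
```

For non-templated jobs the equivalent is simply passing `--autoscaling_algorithm=NONE` (plus a fixed `--num_workers`) as pipeline options.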
0 votes · 1 answer · 48 views

How to prevent deletions from source (GCP CloudSQL MySQL) reflecting in GCP BigQuery using Datastream?

Description: We are currently using Google Cloud Datastream to replicate data from a CloudSQL (MySQL) instance into BigQuery in near real-time. The replication works perfectly for insert and update ...
asked by Ashwini Kumar
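Illustrative only: Datastream itself decides how changes land in BigQuery (its BigQuery destination has an append-only mode that preserves deleted rows as change records rather than removing them), but the desired behaviour amounts to replaying a change stream while ignoring DELETE events. The event shape below is hypothetical.

```python
# Replay CDC events into a dict keyed by primary key, deliberately
# skipping DELETE events so source deletions never propagate.
def apply_changes(table, events):
    for ev in events:
        if ev["op"] in ("INSERT", "UPDATE"):
            table[ev["id"]] = ev["row"]
        # "DELETE" events are intentionally not applied
    return table

events = [
    {"op": "INSERT", "id": 1, "row": {"name": "a"}},
    {"op": "DELETE", "id": 1, "row": None},
]
print(apply_changes({}, events))  # {1: {'name': 'a'}}
```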
0 votes · 0 answers · 33 views

Azure Data Factory / Data Flow, how to extract data from JSON where the ids are the keys?

In an Azure Data Factory Data Flow I am using a REST endpoint as the data source to get a JSON of data. However, the data arrives in a strange format: it is a dictionary of keys where the key value is ...
asked by Jack (1)
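Outside of Data Flow expressions, the usual fix for this payload shape is a pre-processing step that flattens the dict-of-dicts into a list of rows, hoisting each key into an `id` field. A minimal sketch (function and field names are illustrative):

```python
# Flatten {"<id>": {...record...}, ...} into [{"id": <id>, ...record...}, ...]
def flatten_keyed_json(payload):
    return [{"id": key, **record} for key, record in payload.items()]

data = {"101": {"name": "alpha"}, "102": {"name": "beta"}}
print(flatten_keyed_json(data))
# [{'id': '101', 'name': 'alpha'}, {'id': '102', 'name': 'beta'}]
```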
0 votes · 0 answers · 49 views · +50 bounty

BigQuery Performance Issue After Switching Data Pipeline to Dataflow

Problem: I'm experiencing significant query performance degradation in BigQuery for recent partitions after switching our data pipeline from a sequential Talend approach to Apache Beam/Dataflow. ...
asked by Spine Feast
0 votes · 1 answer · 33 views

Azure DF error: Unable to parse expression

I am trying to use a dataset parameter set in the pipeline to make my blob path dynamic for each data flow I've created. However, just testing this first data flow, I keep getting an error saying '...
asked by creed6700
0 votes · 1 answer · 28 views

Cloud Scheduler to trigger dataflow flex template

I'm struggling to make my Flex Template work with Cloud Scheduler. I was able to create it and I can run it from my local machine, through Dataflow's "create job from template", or using a ...
asked by Rui Bras Fernandes
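For context, a Cloud Scheduler HTTP-target job that launches a flex template ultimately just POSTs to Dataflow's `flexTemplates.launch` REST endpoint with an OAuth token. A sketch of the URL and body that request needs (the field names follow the public REST API; project, region, and bucket values are placeholders):

```python
# Build the flexTemplates.launch URL and request body a Cloud Scheduler
# HTTP job would send. No network call is made here.
def flex_launch_request(project, region, job_name, spec_path, params):
    url = (f"https://dataflow.googleapis.com/v1b3/projects/{project}"
           f"/locations/{region}/flexTemplates:launch")
    body = {"launchParameter": {
        "jobName": job_name,
        "containerSpecGcsPath": spec_path,
        "parameters": params,
    }}
    return url, body

url, body = flex_launch_request("my-project", "europe-west1", "nightly-job",
                                "gs://my-bucket/templates/spec.json",
                                {"input": "gs://my-bucket/in"})
print(url)
```

The scheduler job's service account also needs permission to launch Dataflow jobs; that is a common failure mode when the template runs fine locally but not from Scheduler.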
0 votes · 1 answer · 63 views

Dataflow Flex Template Docker issue: Cannot start an expansion service since neither Java nor Docker executables are available in the system

I'm trying to run a Dataflow job using a flex template in Docker. Here is what I have: FROM python:3.11-slim COPY --from=apache/beam_python3.11_sdk:2.54.0 /opt/apache/beam /opt/apache/beam COPY --from=...
asked by Rafael Paz
2 votes · 1 answer · 50 views

What’s the difference between regular Apache Beam connectors and Managed I/O?

Apache Beam recently introduced Managed I/O APIs for Java and Python. What is the difference between Managed I/O and the regular Apache Beam connectors (sources and sinks)?
asked by chamikara (2,074)
0 votes · 1 answer · 44 views

Apache Beam Cross-language JDBC (MSSQL) - incorrect negative Integer type conversion

We use the JDBC cross-language transform to read data from MSSQL into BigQuery, and we noticed negative integers are being converted incorrectly. For example: if we have an INT column in the source with value (-1),...
asked by Matar (73)
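The symptom described (a source value of -1 arriving as a large positive number) is the classic signed/unsigned mix-up: the 32-bit two's-complement bit pattern of -1 reinterpreted as unsigned is 4294967295. If the corruption is exactly that reinterpretation, a post-read fix-up is mechanical; this is a sketch of the arithmetic, not a statement about where the Beam bug lives.

```python
# Reinterpret an unsigned 32-bit value as signed two's complement.
def to_signed32(n):
    return n - 2**32 if n >= 2**31 else n

print((-1) & 0xFFFFFFFF)        # 4294967295 -- what the pipeline reportedly sees
print(to_signed32(4294967295))  # -1 -- the recovered source value
```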
0 votes · 1 answer · 31 views

Escaping non-delimiters in a large csv file in Power BI Dataflow

I am currently attempting to read a large csv file (2.05GB) into Power BI's dataflow. The csv file has 5 million rows and 38 columns (as read separately in Jupyter notebook), and there are some cells ...
asked by b1032c (1)
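Power BI specifics aside, the underlying CSV convention (RFC 4180) is that a field containing the delimiter must be wrapped in double quotes, with embedded quotes doubled. Python's csv module applies exactly that rule, which is a quick way to check whether a file's "non-delimiter commas" are properly escaped:

```python
import csv
import io

# Two tricky cells: an embedded comma, and an embedded doubled quote.
raw = 'id,comment\n1,"hello, world"\n2,"she said ""hi"""\n'
rows = list(csv.reader(io.StringIO(raw)))
print(rows)
# [['id', 'comment'], ['1', 'hello, world'], ['2', 'she said "hi"']]
```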
0 votes · 1 answer · 34 views

How does Dataflow charge for read operations from Cloud Storage

I am trying to understand how Google Cloud Dataflow charges when reading a file with beam.io.ReadFromText. From my understanding, every time something is read from a Cloud Storage bucket, it incurs ...
asked by Giancarlo Metitieri
1 vote · 2 answers · 57 views

Vertical autoscaling dataflow experiments args don't get properly parsed

We want to enable vertical autoscaling on our Dataflow Prime pipeline for a Python container: https://cloud.google.com/dataflow/docs/vertical-autoscaling We're trying to run our pipeline through this ...
asked by unitrium
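A frequent cause of "experiments args don't get properly parsed": Beam's `--experiments` flag is append-style, so each experiment needs its own `--experiments=...` argument, and a single space-joined string arrives as one opaque value. A minimal reproduction of that parsing behaviour with argparse (Beam's option parsing is argparse-based; the experiment names are placeholders, not a claim about which experiments enable vertical autoscaling):

```python
import argparse

# Parse an argv list the way an append-style --experiments flag would.
def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--experiments", action="append", default=None)
    return parser.parse_args(argv).experiments

good = parse(["--experiments=enable_prime", "--experiments=other_exp"])
bad = parse(["--experiments=enable_prime other_exp"])
print(good)  # ['enable_prime', 'other_exp']
print(bad)   # ['enable_prime other_exp']
```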
0 votes · 1 answer · 39 views

Leaving message unacknowledged in Benthos job with gcp_pubsub input

How does Benthos handle acknowledgement of Pub/Sub messages? How can we manage ack/unack based on custom if-else conditions? Here is the scenario I'm trying to achieve: I have written a Benthos job ...
asked by Tarun Kumar
0 votes · 2 answers · 64 views

GCP Batch Dataflow - Records Dropped while inserting to BigQuery

I'm using GCP batch Dataflow to process data that I'm picking from a table. The input here is table data, where I'm using a query in Java to get the data. After processing, when I'm trying to insert the ...
asked by Insecupa
0 votes · 0 answers · 35 views

Tracking FlowFile UUID Across Processors in Apache NiFi 2.1.0

How can I effectively track the original FlowFile UUID across different processors in Apache NiFi, especially after using the SplitJson processor, which creates new FlowFiles with different UUIDs? I ...
asked by user29618711
