
All Questions

0 votes
0 answers
79 views

Spark JDBC Connection To MsSQL Using Kerberos - Failed to find any Kerberos tgt

While trying to connect Spark with MSSQL, we are setting up a JDBC connection and want to Kerberize it. Using the keytab and principal we created, we were able to establish a connection with a simple ...
Baki Erbaş
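A minimal sketch of the setup this question describes, assuming the Microsoft mssql-jdbc driver; the host, database, principal, and keytab path are hypothetical placeholders:

```python
# Sketch: JDBC URL shape typically used for Kerberos authentication with the
# mssql-jdbc driver (integratedSecurity + authenticationScheme=JavaKerberos).
def mssql_kerberos_url(host, port, database):
    return (
        f"jdbc:sqlserver://{host}:{port};databaseName={database};"
        "integratedSecurity=true;authenticationScheme=JavaKerberos"
    )

url = mssql_kerberos_url("sqlhost.example.com", 1433, "sales")
# With Spark 3.1+, the keytab/principal pair can be passed as reader options:
# spark.read.format("jdbc").option("url", url)
#      .option("principal", "svc_spark@EXAMPLE.COM")
#      .option("keytab", "/etc/security/keytabs/svc_spark.keytab") ...
```

"Failed to find any Kerberos tgt" usually means the JVM never obtained a ticket, so checking `kinit -kt` with the same keytab/principal outside Spark is a useful first step.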
1 vote
1 answer
66 views

Disable inferSchema for JDBC connections

I have an Azure SQL database that I want to query with PySpark. I have to "copy" the data to a temporary table, and then query this temporary table. I would like to use pretty much the same ...
ralpar
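The JDBC reader derives types from database metadata rather than sampling data, and those types can be overridden with the `customSchema` option (a DDL-style string). A small sketch, with hypothetical column names:

```python
# Sketch: render a dict of column -> Spark SQL type as the DDL string the
# JDBC reader's `customSchema` option expects.
def custom_schema(columns):
    """{"id": "STRING", ...} -> "id STRING, ..." """
    return ", ".join(f"{name} {dtype}" for name, dtype in columns.items())

ddl = custom_schema({"id": "STRING", "amount": "DECIMAL(18,2)"})
# spark.read.format("jdbc").option("customSchema", ddl) ...
```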
1 vote
1 answer
87 views

Painfully slow Spark Oracle read (JDBC)

I am reading a small table from an Oracle database using Spark on Databricks. The code is very simple: df = spark.read.jdbc(url=url, table=table_name, properties={"driver": "...
Łukasz Kastelik
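A single-partition JDBC read runs one query on one executor; the documented fix is a partitioned read via `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions`. A sketch of the WHERE clauses Spark generates for those options (note the first and last partitions are unbounded, so all rows are still read; the column and bounds are hypothetical):

```python
# Sketch of Spark's documented stride logic for partitioned JDBC reads.
def partition_predicates(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    preds = []
    bound = lower + stride
    preds.append(f"{column} < {bound} OR {column} IS NULL")  # open lower end
    for _ in range(num_partitions - 2):
        preds.append(f"{column} >= {bound} AND {column} < {bound + stride}")
        bound += stride
    preds.append(f"{column} >= {bound}")                      # open upper end
    return preds

preds = partition_predicates("ID", 0, 100, 4)
```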
1 vote
1 answer
110 views

Spark-PySpark Redshift JDBC Write: No suitable driver / ClassNotFoundException: com.amazon.redshift.jdbc42.Driver Errors

I’m trying to write a DataFrame from Spark (PySpark) to an Amazon Redshift Serverless cluster using the Redshift JDBC driver. I keep running into driver-related errors: • java.sql.SQLException: No ...
Cauder
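`ClassNotFoundException` on a JDBC driver class almost always means the driver jar was not shipped with the job. A sketch of one common fix, via `spark.jars.packages`; the Maven coordinate below is an assumption, so check the Redshift driver version you actually need:

```python
# Sketch: join Maven coordinates into the comma-separated value that
# spark.jars.packages expects.
def packages_conf(coords):
    return ",".join(coords)

conf_value = packages_conf(["com.amazon.redshift:redshift-jdbc42:2.1.0.30"])
# SparkSession.builder.config("spark.jars.packages", conf_value) ... and then
# .option("driver", "com.amazon.redshift.jdbc42.Driver") on the writer, matching
# the class name inside the jar version in use.
```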
0 votes
1 answer
58 views

Writing data to ADW through JDBC in a PySpark environment performs poorly

I am trying to write PySpark DataFrames to ADW (Oracle Autonomous Data Warehouse) using JDBC in a Jupyter Lab environment, but the performance is low. dataframe.write.format("jdbc").mode('...
danmo41
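Slow JDBC writes are usually dominated by small INSERT batches from a single partition; raising `batchsize` (rows per JDBC batch) and writing from several partitions tends to help. A hypothetical options sketch:

```python
# Sketch: writer options for a JDBC write; Spark reads options as strings.
def write_options(url, table, batchsize=10000):
    return {
        "url": url,
        "dbtable": table,
        "batchsize": str(batchsize),
    }

opts = write_options("jdbc:oracle:thin:@//adw.example.com:1522/svc", "TARGET_TBL")
# df.repartition(8).write.format("jdbc").options(**opts).mode("append").save()
```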
0 votes
2 answers
128 views

Breaking up a large JDBC write with Spark

We want to copy a large Spark dataframe into Oracle, but I am finding the tuning options a bit limited. Looking at Spark documentation, the only related tuning property I could find for a JDBC write ...
Depressio
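Beyond `batchsize`, the main knob for a JDBC write is the number of partitions doing the writing, so "chunking" usually means repartitioning so each task writes a bounded number of rows. A hypothetical sizing helper:

```python
# Sketch: pick a partition count so each task writes at most
# `rows_per_partition` rows.
def partitions_for(total_rows, rows_per_partition):
    return max(1, -(-total_rows // rows_per_partition))  # ceiling division

n = partitions_for(10_500_000, 1_000_000)
# df.repartition(n).write.format("jdbc")...
```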
1 vote
1 answer
102 views

Spark JDBC read results in row loss

We're currently running into an issue with a Spark job architecture which is used as an interface to import data sources from a corporate Oracle data warehouse into Amazon S3. The job contains no ...
ddluke
1 vote
2 answers
139 views

Unable to get the Postgres data in the right format via Kafka, JDBC source connector and PySpark

I have created a table in Postgres: CREATE TABLE IF NOT EXISTS public.sample_a ( id text COLLATE pg_catalog."default" NOT NULL, is_active boolean NOT NULL, is_deleted boolean NOT ...
RushHour
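The classic symptom here is Postgres NUMERIC columns arriving as opaque bytes: the Kafka Connect JDBC source encodes them as Connect Decimals, i.e. big-endian two's-complement bytes (base64 in JSON) plus a scale. A sketch of decoding one back, e.g. inside a PySpark UDF; the scale is whatever the connector schema carries (setting `numeric.mapping=best_fit` on the connector avoids the problem at the source):

```python
# Sketch: decode a Kafka Connect Decimal (base64 big-endian bytes + scale).
import base64

def decode_connect_decimal(b64_value, scale):
    raw = base64.b64decode(b64_value)
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return unscaled / (10 ** scale)

# b"\x04\xd2" is the unscaled integer 1234; with scale 2 that's 12.34
value = decode_connect_decimal(base64.b64encode(b"\x04\xd2").decode(), 2)
```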
0 votes
2 answers
90 views

Reading from Apache Ignite with JDBC driver gives SQLException: Fetch size must be greater than zero

I'm trying to read some data from an Apache Ignite table with PySpark. spark.read.format("jdbc").option("driver", "org.apache.ignite.IgniteJdbcThinDriver")\ .option("...
Felix
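The Ignite thin JDBC driver rejects a non-positive fetch size, and Spark only sets one when the `fetchsize` option is given explicitly, so the usual workaround is to pass a positive value. A hypothetical options sketch:

```python
# Sketch: reader options for the Ignite thin JDBC driver, with the guard the
# driver itself enforces.
def ignite_read_options(url, table, fetchsize=1000):
    if fetchsize <= 0:
        raise ValueError("Fetch size must be greater than zero")
    return {
        "driver": "org.apache.ignite.IgniteJdbcThinDriver",
        "url": url,
        "dbtable": table,
        "fetchsize": str(fetchsize),
    }

opts = ignite_read_options("jdbc:ignite:thin://127.0.0.1", "PERSON")
# spark.read.format("jdbc").options(**opts).load()
```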
0 votes
0 answers
40 views

How to save a PySpark dataframe to Apache Ignite with JDBC driver?

I have a PySpark application and I want to save some dataframe to Apache Ignite database (to a new table). I have created a Spark session with Spark JDBC driver plugged and I'm trying to save a ...
Felix
0 votes
0 answers
89 views

Py4JJavaError : The TCP/IP connection to the host localhost, port 1433 has failed

First time learning PySpark; I want to read from and write to SQL Server via JDBC using this code: from pyspark.sql import SparkSession host = "localhost" database = "...
abbym
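A port-1433 TCP failure usually means the URL is malformed, SQL Server's TCP/IP listener is disabled, or (with recent mssql-jdbc drivers, which default to `encrypt=true`) the TLS handshake fails. A sketch of a well-formed URL, with placeholder host/database:

```python
# Sketch: build a mssql-jdbc connection URL; encrypt=false is a common
# local-development setting, not a production recommendation.
def sqlserver_url(host, database, port=1433, encrypt=False):
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    if not encrypt:
        url += ";encrypt=false"
    return url

url = sqlserver_url("localhost", "testdb")
```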
-1 votes
1 answer
67 views

Running a normal Java non-Spark application in a Spark cluster

I want to run/execute a normal Java application which connects to a Teradata database. I would like to run this Java app in a Spark cluster although my Java app is non-Spark. Questions are as follows: Is ...
ironfreak
0 votes
1 answer
128 views

Spark Driver runs out of memory loading a query from a JDBC driver

Here is the issue. I need to load data from a remote database, with a bit of an awkward query: query = f""" (SELECT id, key, MAX(time) AS created_at FROM remote.table WHERE ...
Lennart Reus
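Spark treats a parenthesised query passed as `dbtable` as a derived table, so the aggregation runs on the database instead of pulling raw rows through the connection. A wrapping sketch (the query text paraphrases the hypothetical one in the question; `subq` is an arbitrary alias):

```python
# Sketch: wrap a SQL query so it can be used as the JDBC `dbtable` option.
def as_dbtable(query, alias="subq"):
    return f"({query.strip()}) {alias}"

dbtable = as_dbtable(
    "SELECT id, key, MAX(time) AS created_at FROM remote.table GROUP BY id, key"
)
# spark.read.format("jdbc").option("dbtable", dbtable)...
```

A positive `fetchsize` (and, for MySQL, streaming result sets) also keeps the JVM from buffering the entire result at once.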
0 votes
0 answers
106 views

EMR Serverless SparkSession builder error: ClassNotFoundException issues

I am trying to create a job in EMR Studio to run in an EMR Serverless application. It's a relatively basic script to use PySpark to read some Athena tables, do some joins, create an output dataframe ...
si1287
0 votes
3 answers
216 views

How to connect Snowflake with PySpark with Google Colab?

I am trying to connect to Snowflake with Pyspark on Google Colab. Spark version 3.4 Scala version 2.12.17 from pyspark.sql import SparkSession from pyspark.sql.functions import * from pyspark import ...
Ryan Seiyu
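The Spark-Snowflake connector takes its connection settings as an `sfOptions` dict passed to format "snowflake". A sketch with placeholder account/user values; the connector jar and its bundled Snowflake JDBC driver must match the Spark/Scala versions in the question (3.4 / 2.12):

```python
# Sketch: standard sfOptions keys for the Spark-Snowflake connector.
def sf_options(account, user, password, database, schema, warehouse):
    return {
        "sfURL": f"{account}.snowflakecomputing.com",
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

opts = sf_options("xy12345", "me", "secret", "DEMO_DB", "PUBLIC", "COMPUTE_WH")
# spark.read.format("snowflake").options(**opts).option("dbtable", "T").load()
```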
