All Questions
Tagged with jdbc apache-spark
572 questions
0 votes · 0 answers · 79 views
Spark JDBC Connection To MsSQL Using Kerberos - Failed to find any Kerberos tgt
While trying to connect Spark with MSSQL, we are setting up a JDBC connection and want to Kerberize it. Using the keytab and principal we created, we were able to establish a connection with a simple ...
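For reference, a minimal sketch of the pattern that usually resolves this, assuming Spark 3.1+ (which added keytab/principal as JDBC options) and hypothetical keytab/principal values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mssql-kerberos").getOrCreate()

# Spark 3.1+ can log in from the keytab itself when these options are set;
# the keytab file must exist at the same path on the driver and every executor.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=mydb;"
                     "integratedSecurity=true;authenticationScheme=JavaKerberos")
      .option("dbtable", "dbo.some_table")                      # hypothetical table
      .option("keytab", "/etc/security/keytabs/spark.keytab")   # assumption: valid path on all nodes
      .option("principal", "spark/host@EXAMPLE.COM")            # assumption: hypothetical principal
      .load())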
1 vote · 1 answer · 66 views
Disable inferSchema for JDBC connections
I have an Azure SQL database that I want to query with PySpark. I have to "copy" the data to a temporary table, and then query this temporary table. I would like to use pretty much the same ...
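A note on the underlying mechanics: JDBC sources take their schema from the database metadata rather than inferring it from data, and the per-column type mapping can be overridden with the customSchema option. A sketch with hypothetical column names:

# Override the Spark types for selected columns instead of accepting the
# driver's default mapping; columns not listed keep their mapped types.
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)                        # assumption: defined elsewhere
      .option("dbtable", "dbo.temp_table")            # hypothetical temp table
      .option("customSchema", "id STRING, amount DECIMAL(18,2)")
      .load())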
1 vote · 1 answer · 87 views
Painfully slow Spark Oracle read (JDBC)
I am reading a small table from an Oracle database using Spark on Databricks.
The code is very simple:
df = spark.read.jdbc(url=url, table=table_name, properties={"driver": "...
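The usual levers here are fetchsize (Oracle's JDBC default is only 10 rows per round trip) and a partitioned read; a sketch assuming the table has a roughly uniform numeric column to split on:

df = (spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", table_name)
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("fetchsize", 10000)          # far fewer network round trips
      .option("partitionColumn", "id")     # assumption: numeric column exists
      .option("lowerBound", 1)
      .option("upperBound", 1_000_000)     # assumption: rough id range
      .option("numPartitions", 8)          # parallel connections
      .load())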
1 vote · 1 answer · 110 views
Spark-PySpark Redshift JDBC Write: No suitable driver / ClassNotFoundException: com.amazon.redshift.jdbc42.Driver Errors
I’m trying to write a DataFrame from Spark (PySpark) to an Amazon Redshift Serverless cluster using the Redshift JDBC driver.
I keep running into driver-related errors:
• java.sql.SQLException: No ...
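This class of error usually means the Redshift JDBC jar never reached the driver/executor classpath. A sketch assuming the jar was downloaded locally (the filename and URL are placeholders):

spark = (SparkSession.builder
         .appName("redshift-write")
         .config("spark.jars", "/path/to/redshift-jdbc42-2.1.0.30.jar")  # assumption: local jar path
         .getOrCreate())

(df.write.format("jdbc")
   .option("url", "jdbc:redshift://workgroup.account.region.redshift-serverless.amazonaws.com:5439/dev")
   .option("driver", "com.amazon.redshift.jdbc42.Driver")
   .option("dbtable", "public.my_table")          # hypothetical table
   .option("user", user)                          # assumption: defined elsewhere
   .option("password", password)
   .mode("append")
   .save())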
0 votes · 1 answer · 58 views
Writing data to ADW through JDBC in a PySpark environment performs poorly
I am trying to write PySpark DataFrames to ADW (Oracle Autonomous Data Warehouse) using JDBC in a Jupyter Lab environment, but the performance is low.
dataframe.write.format("jdbc").mode('...
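Spark's JDBC writer defaults to 1000-row batches and one connection per partition, so batch size and partition count are the first things to tune; a sketch with assumed values:

(dataframe.repartition(8)                  # caps concurrent connections to ADW
    .write.format("jdbc")
    .option("url", adw_url)                # assumption: defined elsewhere
    .option("dbtable", "target_table")     # hypothetical
    .option("batchsize", 10000)            # rows per JDBC batch insert (default 1000)
    .mode("append")
    .save())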
0 votes · 2 answers · 128 views
Breaking up a large JDBC write with Spark
We want to copy a large Spark dataframe into Oracle, but I am finding the tuning options a bit limited. Looking at Spark documentation, the only related tuning property I could find for a JDBC write ...
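Besides batchsize, the JDBC writer honors numPartitions, which coalesces the DataFrame down to that many writers (and therefore that many concurrent Oracle connections); repartitioning first controls chunk size. A sketch under those assumptions:

(big_df.repartition(32)                    # hypothetical name; sizes the chunks
    .write.format("jdbc")
    .option("url", oracle_url)             # assumption: defined elsewhere
    .option("dbtable", "schema.big_table") # hypothetical
    .option("numPartitions", 8)            # at most 8 simultaneous connections
    .option("batchsize", 5000)
    .mode("append")
    .save())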
1 vote · 1 answer · 102 views
Spark JDBC read results in row loss
We're currently running into an issue with a Spark job that serves as an interface for importing data from a corporate Oracle data warehouse into Amazon S3.
The job contains no ...
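One common culprit is partitioned reads whose slices are not mutually exclusive and collectively exhaustive (NULLs in the partition column are easy to drop); explicit predicates make the slicing auditable. A sketch with a hypothetical date column:

predicates = [
    "load_date <  DATE '2024-01-01'",
    "load_date >= DATE '2024-01-01' AND load_date < DATE '2024-07-01'",
    "load_date >= DATE '2024-07-01' OR load_date IS NULL",  # keep NULL rows
]
df = spark.read.jdbc(url=url, table="dwh.source_table",     # hypothetical table
                     predicates=predicates,
                     properties={"driver": "oracle.jdbc.OracleDriver"})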
1 vote · 2 answers · 139 views
Unable to get Postgres data in the right format via Kafka, JDBC source connector and PySpark
I have created a table in Postgres:
CREATE TABLE IF NOT EXISTS public.sample_a
(
id text COLLATE pg_catalog."default" NOT NULL,
is_active boolean NOT NULL,
is_deleted boolean NOT ...
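On the PySpark side, the usual pattern is to declare a schema matching the Postgres table and parse the Kafka value with from_json; a sketch assuming the connector emits plain JSON without a schema envelope:

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import BooleanType, StringType, StructField, StructType

schema = StructType([
    StructField("id", StringType()),
    StructField("is_active", BooleanType()),
    StructField("is_deleted", BooleanType()),
])

parsed = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
          .option("subscribe", "sample_a")                      # assumption: topic name
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))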
0 votes · 2 answers · 90 views
Reading from Apache Ignite with JDBC driver gives SQLException: Fetch size must be greater than zero
I'm trying to read some data from an Apache Ignite table with PySpark.
spark.read.format("jdbc").option("driver", "org.apache.ignite.IgniteJdbcThinDriver")\
.option("...
0 votes · 0 answers · 40 views
How to save a PySpark dataframe to Apache Ignite with JDBC driver?
I have a PySpark application and I want to save some dataframe to Apache Ignite database (to a new table).
I have created a Spark session with Spark JDBC driver plugged and I'm trying to save a ...
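One caveat worth knowing: Spark's generic JDBC writer auto-generates a CREATE TABLE without a PRIMARY KEY, which Ignite's DDL requires, so creating the table up front and appending is the safer route (Ignite also ships its own Spark DataFrame integration). A hedged sketch:

# Assumes "new_table" was already created via Ignite DDL with a PRIMARY KEY.
(df.write.format("jdbc")
   .option("driver", "org.apache.ignite.IgniteJdbcThinDriver")
   .option("url", "jdbc:ignite:thin://127.0.0.1:10800")  # assumption: default thin port
   .option("dbtable", "new_table")                       # hypothetical
   .mode("append")
   .save())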
0 votes · 0 answers · 89 views
Py4JJavaError : The TCP/IP connection to the host localhost, port 1433 has failed
First time learning PySpark; I want to learn how to read from and write to SQL Server via JDBC,
using this code:
from pyspark.sql import SparkSession
host = "localhost"
database = "...
-1 votes · 1 answer · 67 views
Running a normal Java non-Spark application on a Spark cluster
I want to run a normal Java application that connects to a Teradata database.
I would like to run this Java app on a Spark cluster even though my app is non-Spark.
My questions are as follows:
Is ...
0 votes · 1 answer · 128 views
Spark Driver runs out of memory loading a query from a JDBC driver
Here is the issue. I need to load data from a remote database, with a bit of an awkward query:
query = f"""
(SELECT id, key, MAX(time) AS created_at
FROM remote.table
WHERE ...
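If this runs in local mode the driver also hosts the executors, so an unpartitioned read materializes the whole result in one JVM; wrapping the query as a subquery and partitioning it keeps each chunk small. A sketch assuming a roughly uniform numeric id:

df = (spark.read.format("jdbc")
      .option("url", url)                  # assumption: defined elsewhere
      .option("dbtable", f"({query}) q")   # run the aggregate as a subquery
      .option("partitionColumn", "id")     # assumption: numeric, roughly uniform
      .option("lowerBound", 1)
      .option("upperBound", 10_000_000)    # assumption: rough id range
      .option("numPartitions", 16)
      .option("fetchsize", 5000)
      .load())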
0 votes · 0 answers · 106 views
EMR Serverless SparkSession builder error: ClassNotFoundException issues
I am trying to create a job in EMR Studio to run in an EMR Serverless application. It's a relatively basic script to use PySpark to read some Athena tables, do some joins, create an output dataframe ...
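On EMR Serverless, ClassNotFoundException usually means the dependency has to be shipped with the job (e.g. via --conf spark.jars.packages in sparkSubmitParameters) rather than added after the session exists; mirrored here as builder config, with a placeholder coordinate:

spark = (SparkSession.builder
         .appName("emr-serverless-job")
         # On EMR Serverless this normally goes in sparkSubmitParameters:
         #   --conf spark.jars.packages=<group:artifact:version>
         .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")  # assumption: placeholder dependency
         .getOrCreate())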
0 votes · 3 answers · 216 views
How to connect to Snowflake with PySpark on Google Colab?
I am trying to connect to Snowflake with PySpark on Google Colab.
Spark version 3.4
Scala version 2.12.17
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark import ...
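A minimal sketch of the usual wiring, with connector/driver versions that are assumptions to verify against Spark 3.4 / Scala 2.12:

spark = (SparkSession.builder
         .appName("snowflake-colab")
         .config("spark.jars.packages",
                 "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4,"  # assumption: verify version
                 "net.snowflake:snowflake-jdbc:3.13.30")                 # assumption: verify version
         .getOrCreate())

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",   # hypothetical account
    "sfUser": user,                                # assumption: defined elsewhere
    "sfPassword": password,
    "sfDatabase": "DEMO_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

df = (spark.read.format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("dbtable", "MY_TABLE")               # hypothetical
      .load())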