data-engineering
Opened from the Prefect Public Slack Community
pat: This is a pretty minor problem as these things go, but it would be great if there were a way to disable the ASCII logo in Prefect Agent and Prefect Server, since it pollutes our server logs in DataDog. I can hack the code in Prefect, but it seems inelegant to have to re-apply such a change after every version upgrade.
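Until there is a proper setting, one possible stopgap (a rough sketch, not a Prefect feature; the logger selection and the banner marker strings are assumptions you would need to adapt) is a logging filter that drops banner-looking lines before they reach the handler that ships logs to DataDog:

```python
import logging

class DropAsciiBanner(logging.Filter):
    """Drop log records that look like the ASCII-art banner."""

    # Placeholder markers -- adjust to match the banner your Prefect version prints.
    BANNER_MARKERS = ("____", "| _ \\", "PREFECT")

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        return not any(marker in message for marker in self.BANNER_MARKERS)

# Attach the filter to whichever handlers forward logs to DataDog,
# before the agent/server process starts emitting output.
for handler in logging.getLogger().handlers:
    handler.addFilter(DropAsciiBanner())
```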
Tasks:
- Port the content from GH readme to Docusaurus (main Docs website)
- Incorporate relevant CLI content into the Getting Started with Airbyte OSS guide
- Identify other places in Docs where we can incorporate CLI content
Describe the bug
Data Docs table columns shrink to one character wide when the batch is built from a long query string.
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string
- run validation
- render result to data docs
- See the screenshot below
(Screenshot: Data documentation compiled by Great Expectations)
Under the hood, the Benthos csv input uses the standard encoding/csv package's csv.Reader struct.
The current implementation of the csv input doesn't allow setting the LazyQuotes field.
We have a use case where we need to set LazyQuotes in order to make things work correctly.
Expected Behavior
Feast should allow users to create feature views with .csv data sources and retrieve features from the offline store without any issues.
Current Behavior
Presently, I have a .csv file sitting in an S3 bucket. I am able to create a feature view using this .csv file, but when fetching the features from the offline store I get the error below:
-------------------------
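As a point of reference, the setup being described looks roughly like this (a sketch assuming a Feast release from around the 0.18–0.19 era; the bucket path, columns, and feature names are illustrative, and some argument names changed in later versions):

```python
from datetime import timedelta
from feast import Entity, Feature, FeatureView, FileSource, ValueType

# A .csv file in S3 used as the batch source (FileSource is more commonly
# used with Parquet, which is exactly what this issue is about).
driver_stats_source = FileSource(
    path="s3://my-bucket/driver_stats.csv",
    event_timestamp_column="event_timestamp",
)

driver = Entity(name="driver_id", value_type=ValueType.INT64)

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[Feature(name="conv_rate", dtype=ValueType.FLOAT)],
    batch_source=driver_stats_source,
)

# Retrieval from the offline store is then done via
# FeatureStore(...).get_historical_features(...), which is where the error occurs.
```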
When there are not enough results, we tell the user that the experiment has just started and to come back later. When the experiment dates are set to a future time, this language doesn't fit very well. We should adjust the message to take this future start into account when deciding what to show.
<img width="875" alt="CleanShot 2022-04-10 at 21 23 22@2x" src="https://user-images.githubusercontent
On more recent versions of LakeFS (probably >= v1.0.0), we would like to remove the logic that tries to fill the generation field in the DB when loading old dumps. This means we will no longer support loading dumps made with a version lower than v0.61.0.
There are a few places where we are using multithreading to download files, but it looks like asyncio would be a better option. We'd first do a quick implementation and measure how much benefit it brings before considering a migration.
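For that quick spike, a minimal asyncio version to benchmark against the threaded code might look like this (assumes aiohttp is available; the URLs and output directory are placeholders):

```python
import asyncio
from pathlib import Path
from typing import Iterable, List

import aiohttp

async def download_one(session: aiohttp.ClientSession, url: str, dest: Path) -> Path:
    # Stream the body to disk in chunks so large files don't sit in memory.
    async with session.get(url) as resp:
        resp.raise_for_status()
        with dest.open("wb") as fh:
            async for chunk in resp.content.iter_chunked(1 << 16):
                fh.write(chunk)
    return dest

async def download_all(urls: Iterable[str], out_dir: Path) -> List[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    async with aiohttp.ClientSession() as session:
        tasks = [
            download_one(session, url, out_dir / url.rsplit("/", 1)[-1])
            for url in urls
        ]
        return await asyncio.gather(*tasks)

# Example: asyncio.run(download_all(["https://example.com/a.csv"], Path("downloads")))
```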
(1) Add docstrings to methods
(2) Convert .format() calls to f-strings for readability (see the sketch after this list)
(3) Make sure we are using Python 3.8 throughout
(4) The zip extract_all() in ingest_flights.py can be simplified with a Path parameter (also sketched below)
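To illustrate items (2) and (4) (the functions below are stand-ins, not the actual ingest_flights.py code):

```python
from pathlib import Path
from typing import List
from zipfile import ZipFile

def month_label(year: int, month: int) -> str:
    # Item (2): "On-time data for {}-{}".format(year, month) becomes an f-string.
    return f"On-time data for {year}-{month:02d}"

def extract_all(zip_path: Path, dest_dir: Path) -> List[Path]:
    # Item (4): accept Path parameters and let zipfile work with them directly.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return sorted(dest_dir.iterdir())
```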
-
Updated
Mar 9, 2020 - Python
Let's prepare a mixin for interacting with Roles and Policies in the Python client, in case users want to use the API directly.
It should not only offer the list, get, etc. methods, but also utility methods, such as updating a default role. It should wrap the following logic:
```python
import requests
import json

# Get the role by name to look up its ID (the host and role name are illustrative)
data_consumer = requests.get(
    "http://localhost:8585/api/v1/roles/name/DataConsumer"
).json()
```
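The mixin could then take roughly this shape (the class name, endpoints, and patch document are assumptions based on the raw-requests logic above, not the actual openmetadata client API):

```python
import requests

class RolesMixin:
    """Utility methods around /roles, meant to be mixed into a client object
    that exposes a `base_url` such as "http://localhost:8585/api/v1"."""

    def get_role_by_name(self, name: str) -> dict:
        resp = requests.get(f"{self.base_url}/roles/name/{name}")
        resp.raise_for_status()
        return resp.json()

    def list_roles(self) -> list:
        resp = requests.get(f"{self.base_url}/roles")
        resp.raise_for_status()
        return resp.json().get("data", [])

    def set_default_role(self, name: str) -> dict:
        # Wraps the "look up the ID, then update" flow; the exact JSON Patch
        # document depends on the server version and is only illustrative here.
        role = self.get_role_by_name(name)
        resp = requests.patch(
            f"{self.base_url}/roles/{role['id']}",
            headers={"Content-Type": "application/json-patch+json"},
            json=[{"op": "replace", "path": "/default", "value": True}],
        )
        resp.raise_for_status()
        return resp.json()
```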
Hi,
I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limitations on the size of the production code base.
Currently, if I look at the requirements.txt for "pyjanitor" alone, it is huge.
I don't think I require all of those dependencies in my code.
How can I remove the unnecessary ones?
If they are not class methods, each method would be invoked for every test and a session would be created for each of those tests.
```python
import logging
import unittest
from pyspark.sql import SparkSession

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        # The original snippet is truncated here; a local session is the usual completion.
        return SparkSession.builder.master('local[2]').appName('testing').getOrCreate()
```
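To make that concrete, the usual pattern is to wire those helpers into setUpClass/tearDownClass so the session is built once per test class rather than once per test (a sketch of methods to add to the class above):

```python
    @classmethod
    def setUpClass(cls):
        cls.suppress_py4j_logging()
        cls.spark = cls.create_testing_pyspark_session()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()
```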
The Mixed Time-Series chart type allows configuring the titles of the primary and the secondary y-axis.
However, while the title of the primary axis is shown next to the axis, the title of the secondary one is placed at the upper end of the axis, where it gets hidden by bar values and zoom controls.
How to reproduce the bug