data-engineering
Here are 1,143 public repositories matching this topic...
Description
It looks like the installation command should be conda install -c conda-forge prefect python-kubernetes
https://github.com/PrefectHQ/prefect/blame/master/docs/orchestration/agents/kubernetes.md#L28
Logs:
(prefect) ubuntu@ip-172-31-7-79:~/perfect$ prefect agent kubernetes start --name "K8S Agent"
Traceback (most recent call
Describe the bug
Data docs columns shrink to a width of 1 character when the query is long
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string
- run validation
- render result to data docs
- See screenshot
<img width="1525" alt="Data_documentation_compiled_by_Great_Expectations" src="https://user-images.githubusercontent.com/928247/103230647-30eca500-4
We should do something like https://blog.questionable.services/article/kubernetes-deployments-configmap-change/ to ensure that if the pod sweeper has a different config map, the underlying pod gets rolled.
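One common way to get this behavior is to hash the config map contents and store the hash in the pod template's annotations, so that any config change also changes the template and triggers a rollout. The following is a minimal sketch of that idea, not the project's implementation: the function names and the `checksum/config` annotation key are assumptions chosen for illustration, and the manifests are plain dictionaries rather than Kubernetes client objects.

```python
import hashlib
import json


def config_checksum(data):
    """Return a stable SHA-256 hex digest of a config map's data mapping."""
    # Sorting keys makes the digest independent of key order.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_checksum_annotation(pod_template, data):
    """Return a copy of a pod template dict annotated with the config checksum.

    The annotation key is a hypothetical choice; any key works as long as
    its value changes whenever the config map data changes.
    """
    metadata = dict(pod_template.get("metadata", {}))
    annotations = dict(metadata.get("annotations", {}))
    annotations["checksum/config"] = config_checksum(data)
    metadata["annotations"] = annotations
    return {**pod_template, "metadata": metadata}
```

Because the annotation lives in the pod template, a changed digest counts as a template change, which is what causes the pods to be replaced.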
Under the hood, the Benthos csv input uses the csv.Reader struct from the standard encoding/csv package.
The current implementation of the csv input doesn't allow setting the LazyQuotes field.
We have a use case where we need to set the LazyQuotes field in order to make things work correctly.
The current DynamoDB implementation does sequential gets (https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/online_stores/dynamodb.py#L163)
Possible Solution
A better approach is to use a multi-get operation, or at least to run these queries in parallel and collect the results.
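The two options can be sketched generically as follows. This is an illustration under stated assumptions, not Feast code: `fetch_one` is a hypothetical callable that performs a single get, and `chunked` only shows how keys might be grouped for a batched request (DynamoDB's BatchGetItem accepts at most 100 items per call, which is why 100 is the default size here).

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(keys, fetch_one, max_workers=8):
    """Fetch every key concurrently and return results in the order of `keys`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even though the calls overlap.
        return list(pool.map(fetch_one, keys))


def chunked(keys, size=100):
    """Split keys into consecutive batches of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]
```

Threads suit this case because each get spends most of its time waiting on the network rather than on the CPU.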
-
Updated
Feb 22, 2022 - Python
Documentation around storage namespace should be improved:
- Stress that the storage namespace should be empty for a new repository. The reason is that repo-level settings would be overwritten if two repositories used the same storage namespace (you don't necessarily have to mention the reason).
An exception to this is bare repositories, which may be created over an existing, alr
-
Updated
Feb 2, 2022
When we show data for a metric, we currently exclude the current day's data. Users who are just getting set up may only have events from today and want to test whether the query works; because events from 'today' are excluded, they can't see any results.
TODO:
- In packages/back-end/src/services/experiments.ts on line 329, instead of using the current date as the value
-
Updated
Aug 2, 2021 - JavaScript
We recently added an example that uses File clients. We should update the docs to link to the example wherever relevant: https://github.com/ploomber/projects/tree/master/cookbook/file-client
-
Updated
Feb 23, 2022 - Jupyter Notebook
If they are not class methods, the method would be invoked for every test, and a session would be created for each of those tests.
`class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
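The difference between per-class and per-test setup can be shown without Spark. The sketch below is an illustration only: `FakeSession` is a hypothetical stand-in for an expensive resource such as a Spark session, and `count_sessions` is a helper introduced here to count how many times it gets built.

```python
import io
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive resource such as a Spark session."""
    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so only one session is built.
        cls.session = FakeSession()

    def test_first(self):
        self.assertIsNotNone(self.session)

    def test_second(self):
        self.assertIsNotNone(self.session)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test builds its own session.
        self.session = FakeSession()

    def test_first(self):
        self.assertIsNotNone(self.session)

    def test_second(self):
        self.assertIsNotNone(self.session)


def count_sessions(test_case):
    """Run one TestCase class and return how many FakeSession objects it created."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return FakeSession.created
```

With two test methods each, the class-level variant builds one session and the per-test variant builds two, which is the overhead the comment above describes.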
-
Updated
Feb 22, 2022 - Dockerfile
Background
This thread is born out of the discussion in #968, in an effort to make the documentation more beginner-friendly and easier to understand.
One of the subtasks mentioned in that thread was to go through the function docstrings and add a minimal working example to each of the public functions in pyjanitor.
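As an illustration of what such a docstring might look like, here is a minimal sketch. The function `clean_name` is hypothetical and is not part of pyjanitor, and `check_examples` is a helper introduced here to show that a doctest-style example can be verified automatically.

```python
import doctest


def clean_name(name):
    """Convert a column name to snake_case.

    This function is hypothetical and only demonstrates the docstring layout.

    Example:

    >>> clean_name("  Total Sales ")
    'total_sales'
    >>> clean_name("Unit-Price")
    'unit_price'
    """
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def check_examples():
    """Run the docstring examples of clean_name and return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    tests = finder.find(clean_name, "clean_name", globs={"clean_name": clean_name})
    for test in tests:
        runner.run(test)
    return runner.summarize(verbose=False).failed
```

Writing the example in doctest format keeps it short enough for beginners to read while letting the test suite catch examples that drift out of date.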
Criteria reiterated here for the benefit of discussion:
It sh
-
Updated
Feb 14, 2022 - Python
Is your feature request related to a problem? Please describe.
The current ometa_api method list_entities returns an EntityList that may be missing some entities when the response still carries a value for after.
Describe the solution you'd like
Prepare a new convenience method list_all_entities by using the current list_entities to make sure we have
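The general pattern behind such a convenience method is to keep requesting pages until the cursor is empty. The sketch below shows that loop in isolation and is not the ometa_api implementation: `Page`, `iterate_all`, and `fake_fetch` are hypothetical names, and the only assumption about the real API is that each response exposes its entities and an `after` cursor.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one paged response."""
    entities: List[str]
    after: Optional[str] = None


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every entity by following the `after` cursor until it is empty."""
    after = None
    while True:
        page = fetch_page(after)
        yield from page.entities
        after = page.after
        if not after:
            break


# Fake three-page backend used only to exercise the loop.
_PAGES = {
    None: Page(["a", "b"], after="p2"),
    "p2": Page(["c", "d"], after="p3"),
    "p3": Page(["e"], after=None),
}


def fake_fetch(after: Optional[str]) -> Page:
    """Return the fake page that corresponds to the given cursor."""
    return _PAGES[after]
```

Using a generator means callers can stop early without fetching the remaining pages, while `list(iterate_all(fake_fetch))` collects everything at once.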
-
Updated
Mar 5, 2020 - Python
The Mixed Time-Series chart type allows for configuring the title of the primary and the secondary y-axis.
However, while only the title of the primary axis is shown next to the axis, the title of the secondary one is placed at the upper end of the axis where it gets hidden by bar values and zoom controls.
How to reproduce the bug