datajoely
09/22/2021, 12:06 PM
Waldrill
09/22/2021, 12:13 PM
datajoely
09/22/2021, 12:14 PM
Waldrill
09/22/2021, 12:15 PM
datajoely
09/22/2021, 12:16 PM
user
09/29/2021, 3:08 PM
ende
10/01/2021, 6:53 PM
`_load` method is wrapping some other library's read operation that only takes file paths (not file-like objects, etc.)... what's the best general strategy here using fsspec?
datajoely
10/04/2021, 8:58 AM
`pandas.CSVDataSet` and altering it for your purposes, since that's all tested to work:
https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.pandas.CSVDataSet.html
user
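A minimal sketch of one general strategy for the `_load` question above: use fsspec to copy the file (local or remote) into a temporary directory, then pass that local path to the path-only reader. `PathOnlyDataSet` and its `reader` argument are hypothetical names; the class deliberately does not subclass Kedro's `AbstractDataSet` so it stays self-contained, and it assumes the reader has finished reading before `_load` returns.

```python
import os
import tempfile
from pathlib import PurePosixPath
from typing import Any, Callable

import fsspec
from fsspec.core import split_protocol


class PathOnlyDataSet:
    """Hypothetical dataset wrapping a reader that only accepts local file paths.

    Not a real Kedro class: a real one would subclass AbstractDataSet and
    implement _load/_save/_describe the way pandas.CSVDataSet does.
    """

    def __init__(self, filepath: str, reader: Callable[[str], Any]) -> None:
        protocol, _ = split_protocol(filepath)
        # Plain local paths carry no protocol prefix, so fall back to "file"
        self._fs = fsspec.filesystem(protocol or "file")
        self._filepath = filepath
        self._reader = reader  # hypothetical path-only read function

    def _load(self) -> Any:
        with tempfile.TemporaryDirectory() as tmp_dir:
            local_path = os.path.join(tmp_dir, PurePosixPath(self._filepath).name)
            # Copy the (possibly remote) file to local disk first...
            self._fs.get_file(self._filepath, local_path)
            # ...then hand the local path to the reader; the temporary
            # directory is deleted as soon as this block exits
            return self._reader(local_path)
```

Remote paths such as `s3://...` need the matching fsspec backend installed (for S3, `s3fs`). Copying the whole file trades disk space and transfer time for compatibility with readers that cannot accept file-like objects.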
10/08/2021, 7:34 AM
simon_myway
10/08/2021, 2:15 PM
datajoely
10/08/2021, 2:25 PM
datajoely
10/08/2021, 2:25 PM
`requirements.txt` within a modular pipeline subfolder, it will take that as gospel for that particular pipeline.
user
10/08/2021, 5:01 PM
mlemainque
10/11/2021, 9:09 AM
datajoely
10/11/2021, 9:27 AM
mlemainque
10/11/2021, 9:43 AM
```yaml
incremental_sql_dataset:
  type: SQLQueryDataSet
  sql: SELECT * FROM table WHERE id > %(checkpoint)s
  checkpoint:
    column: id  # Which column to use to update the checkpoint based on the loaded content
    filepath: ...  # Where to store the checkpoint (same as for partitioned incremental datasets)
```
But you're right, it could easily be done with a custom implementation.
mlemainque
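The checkpoint idea in the YAML above could be sketched as a custom loader. This is a minimal sketch, assuming a DB-API connection (`sqlite3` here, to stay self-contained) and a JSON file holding the checkpoint; `IncrementalSQLQuery`, `initial_checkpoint` and `confirm()` are hypothetical names, not Kedro API. Note that `sqlite3` uses `:checkpoint` placeholders rather than the `%(checkpoint)s` style shown in the YAML.

```python
import json
import os
import sqlite3
from typing import Any, List, Optional


class IncrementalSQLQuery:
    """Hypothetical loader: run a query filtered by a stored checkpoint (not a Kedro class)."""

    def __init__(self, con: sqlite3.Connection, sql: str, column: str,
                 checkpoint_path: str, initial_checkpoint: Any) -> None:
        self._con = con
        self._sql = sql  # must reference the :checkpoint placeholder
        self._column = column  # column whose max loaded value becomes the next checkpoint
        self._checkpoint_path = checkpoint_path
        self._initial = initial_checkpoint  # used until a checkpoint file exists
        self._pending: Optional[Any] = None

    def _read_checkpoint(self) -> Any:
        if not os.path.exists(self._checkpoint_path):
            return self._initial
        with open(self._checkpoint_path) as f:
            return json.load(f)["checkpoint"]

    def load(self) -> List[tuple]:
        cursor = self._con.execute(self._sql, {"checkpoint": self._read_checkpoint()})
        rows = cursor.fetchall()
        idx = [col[0] for col in cursor.description].index(self._column)
        # Remember the new high-water mark; it is only persisted by confirm()
        self._pending = max((row[idx] for row in rows), default=None)
        return rows

    def confirm(self) -> None:
        # Persist the checkpoint once the loaded rows have been processed successfully
        if self._pending is not None:
            with open(self._checkpoint_path, "w") as f:
                json.dump({"checkpoint": self._pending}, f)
            self._pending = None
```

Persisting the checkpoint only in `confirm()` means a failure further downstream does not silently skip those rows on the next run.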
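10/11/2021, 9:59 AM
```python
from datetime import datetime
from typing import Dict

import pandas as pd


def make_incremental(input_data: pd.DataFrame, output_partitioned_data: Dict) -> Dict:
    # Keep only input rows whose id is absent from every already-saved partition
    for _, load_output in output_partitioned_data.items():
        input_data = input_data.merge(load_output()[["id"]], on="id", how="left", indicator=True)
        input_data = input_data[input_data["_merge"] == "left_only"].drop(columns=["_merge"])
    # Save the remaining new rows as one partition named after the current UTC time
    return {str(datetime.utcnow()): input_data}


# Proposed usage: needs the runner to also pass the output dataset to the function
# node(make_incremental, "input_dataset", "output_partitioned_dataset")
```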
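In the node above, only `input_dataset` is declared as an input, yet `make_incremental` also expects the already-saved partitions, so Kedro as-is would not supply that second argument. A minimal sketch of the proposed behaviour in plain Python, assuming outputs are keyed by the function's parameter names (for a partitioned output, the value would be the dict of partition id to load function that `make_incremental` iterates over); `run_with_outputs` is a hypothetical helper, not Kedro's `Node` implementation.

```python
import inspect
from typing import Any, Callable, Dict


def run_with_outputs(func: Callable[..., Any], inputs: Dict[str, Any],
                     outputs: Dict[str, Any]) -> Any:
    """Hypothetical runner step: call func with its declared inputs, plus any
    output whose name matches one of func's parameters."""
    params = inspect.signature(func).parameters
    # An output is only passed when the function explicitly names it as a parameter
    extra = {name: value for name, value in outputs.items()
             if name in params and name not in inputs}
    return func(**inputs, **extra)
```

Functions that do not name an output in their signature are called exactly as before, so existing nodes would be unaffected.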
datajoely
10/11/2021, 10:00 AM
mlemainque
10/11/2021, 10:02 AM
datajoely
10/11/2021, 10:03 AM
mlemainque
10/11/2021, 10:03 AM
`Node._run_with_dict` method should also pass the outputs if they are in the inner function's signature.
datajoely
10/11/2021, 10:04 AM
datajoely
10/11/2021, 10:04 AM
mlemainque
10/11/2021, 10:06 AM
datajoely
10/11/2021, 10:07 AM
mlemainque
10/11/2021, 2:36 PM
`kedro-viz` and finally have it somehow integrated into our favorite IDE?
A first easy step, I think, would be to add hyperlinks:
* From a node, you can go directly to the inner function's code in VS Code via a `vscode://` hyperlink.
* From a filesystem dataset, you can see the list of files and open them via a `file://` hyperlink, or even display a table preview directly in kedro-viz.
* From an image/matplotlib dataset, you can display a preview...
datajoely
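A minimal sketch of how the first two kinds of link could be built, assuming the node's function lives in a regular `.py` file and the dataset sits on the local filesystem. It uses the `vscode://file/<path>:<line>` URL form that VS Code supports for opening a file at a given line; `vscode_link` and `file_link` are hypothetical helpers, not existing kedro-viz features.

```python
import inspect
import os
from pathlib import Path
from typing import Callable


def vscode_link(func: Callable) -> str:
    """Return a vscode:// URL that opens func's source file at its definition line."""
    source = inspect.getsourcefile(func)
    if source is None:
        raise ValueError(f"No Python source file found for {func!r}")
    _, line = inspect.getsourcelines(func)
    # Build vscode://file/<absolute path>:<line>; strip the leading "/" of POSIX paths
    path = Path(source).resolve().as_posix().lstrip("/")
    return f"vscode://file/{path}:{line}"


def file_link(path: str) -> str:
    """Return a file:// URL for a local dataset file."""
    # Path.as_uri() requires an absolute path, hence os.path.abspath
    return Path(os.path.abspath(path)).as_uri()
```

Both helpers only need the function object and the dataset's file path, which is information the pipeline already holds for each node and filesystem dataset.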
10/11/2021, 2:37 PM
datajoely
10/11/2021, 2:37 PM
datajoely
10/11/2021, 2:37 PM
mlemainque
10/11/2021, 2:38 PM
mlemainque
10/11/2021, 2:38 PM