Ben H
07/13/2021, 5:00 PM
pipeline.py.
* Treat these like any other unit test you write.
* Use tiny, targeted data per test to keep your test suite fast. Single-row datasets usually work fine for unit tests, but occasionally you may stretch to 10 randomly generated rows.
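As a rough illustration of the single-row idea, here is a minimal sketch. `add_total` is a hypothetical node function invented for this example, and the rows are plain dicts rather than a DataFrame, purely to keep the snippet dependency-free.

```python
def add_total(rows, tax_rate):
    """Hypothetical node function: add a tax-inclusive total to each row."""
    return [
        {**row, "total": round(row["price"] * (1 + tax_rate), 2)}
        for row in rows
    ]


def test_add_total_single_row():
    # One hand-written row is enough to pin down the behaviour.
    rows = [{"item": "widget", "price": 10.0}]
    result = add_total(rows, tax_rate=0.2)
    assert result == [{"item": "widget", "price": 10.0, "total": 12.0}]


def test_add_total_leaves_input_untouched():
    # The node should return new rows instead of mutating its input.
    rows = [{"item": "widget", "price": 10.0}]
    add_total(rows, tax_rate=0.2)
    assert rows == [{"item": "widget", "price": 10.0}]
```

Each test builds its own data inline, so there is nothing to load from disk and the suite stays fast.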
2. Integration test everything that is inside pipeline.py
* This is where you typically stitch functions together. You already know they work individually from your unit tests, so don't unit test the pipeline itself. Instead, think of it as an integration.
* Provide free inputs
* Test on final outputs
* Completely ignore intermediates - you already unit tested them.
3. System test the entire pipeline.
* Also known as "end-to-end" testing, or "automated acceptance" testing.
* Your test runner will loosely look like:
(cd source_proj && kedro pipeline package my_pipeline)
(cd test_dir && kedro new --config /path/to/test/config)
(cd test_dir/test_proj && kedro pipeline pull source_proj/dist/my_pipeline*.whl)
... any other data / config / pipelines that need set up ...
kedro run
4. Never ever use catalog.yml, parameters.yml, or data/ files in your tests.
* For unit tests, you'll likely want to try out many variations of parameters, so you can't anyway!
* Use the Kedro code API instead.
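One way to try many parameter variations without a parameters.yml is to build the parameters dict inline and let pytest generate the cases. A minimal sketch, assuming pytest is installed; `filter_by_threshold` and the `threshold` key are hypothetical names invented for this example.

```python
import pytest


def filter_by_threshold(rows, params):
    """Hypothetical node function that takes a parameters dict."""
    return [row for row in rows if row["score"] >= params["threshold"]]


@pytest.mark.parametrize(
    "threshold, expected_ids",
    [
        (0.0, [1, 2]),  # everything passes
        (0.5, [2]),     # only the high score passes
        (1.0, []),      # nothing passes
    ],
)
def test_filter_by_threshold(threshold, expected_ids):
    rows = [{"id": 1, "score": 0.1}, {"id": 2, "score": 0.9}]
    # Parameters are created in the test, not read from parameters.yml.
    result = filter_by_threshold(rows, {"threshold": threshold})
    assert [row["id"] for row in result] == expected_ids
```

Each tuple becomes its own test case, so adding another variation is a one-line change.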
5. Make use of pytest features; they make life a lot easier.
* Use fixtures for setting up default catalogs and parameters (top tip: pytest has a built-in fixture, tmp_path; use that in your catalog entries).
* conftest.py is a really useful file.
datajoely
07/14/2021, 1:29 PM
Pipeline([a,b]) + Pipeline([c,d]) = Pipeline([a,b,c,d])
This will make one big pipeline, but it must still be acyclic to run on both the Kedro and Viz sides.