# advanced-need-help
Hey @User. A few guidelines I try to stick to:

1. Unit test everything that is *not* inside `pipeline.py`.
   * Treat these like you would any other unit test you write.
   * Use tiny, targeted data per test to keep your test suite fast. Single-row datasets usually work fine for unit tests, though occasionally you may stretch that to 10 randomly generated rows. (Sketch below.)
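A minimal sketch of point 1, assuming a hypothetical node function `add_mean_column` living in `my_project/nodes.py` (i.e. outside `pipeline.py`) that appends a `price_mean` column:

```python
import pandas as pd

from my_project.nodes import add_mean_column  # hypothetical node function


def test_add_mean_column_single_row():
    # A single-row input keeps the test fast and the expectation obvious.
    df = pd.DataFrame({"price": [10.0]})
    result = add_mean_column(df)
    # The mean of a single value is the value itself.
    assert result["price_mean"].iloc[0] == 10.0
```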
2. Integration test everything that *is* inside `pipeline.py`.
   * This is where you typically stitch functions together. You already know they work individually from your unit tests, so skip unit-style testing here entirely and treat the pipeline as one integration.
   * Provide the pipeline's free inputs (the datasets no node produces).
   * Test on the final outputs.
   * Completely ignore intermediates: you already unit tested them. (Sketch below.)
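A minimal sketch of point 2, assuming a hypothetical `create_pipeline()` whose only free input is `raw_data` and whose final output is `model_input`. Note the in-memory dataset class is spelled `MemoryDataset` in Kedro 0.19+ (`MemoryDataSet` in older releases):

```python
import pandas as pd
from kedro.io import DataCatalog, MemoryDataset
from kedro.runner import SequentialRunner

from my_project.pipelines.data_processing import create_pipeline  # hypothetical


def test_pipeline_final_outputs():
    # Register only the free inputs; intermediates stay out of the catalog,
    # so the runner hands back the terminal outputs as plain data.
    catalog = DataCatalog(
        {"raw_data": MemoryDataset(pd.DataFrame({"price": [10.0]}))}
    )
    outputs = SequentialRunner().run(create_pipeline(), catalog)
    # Assert on the final output only; the intermediates were unit tested already.
    assert not outputs["model_input"].empty
```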
3. System test the entire pipeline.
   * Also known as "end-to-end" testing or "automated acceptance" testing.
   * Your test runner will loosely look like:

   ```sh
   (cd source_proj && kedro pipeline package my_pipeline)
   (cd test_dir && kedro new --config /path/to/test/config)
   (cd test_dir/test_proj && kedro pipeline pull source_proj/dist/my_pipeline*.whl)
   # ... any other data / config / pipelines that need setting up ...
   kedro run
   ```
4. Never, ever use `catalog.yml`, `parameters.yml`, or `data/` files in your tests.
   * For unit tests you'll likely want to try out many variations of parameters, so you can't anyway!
   * Use the Kedro code API instead. (Sketch below.)
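A sketch of what "code API instead of YAML" can look like for point 4: build the catalog in Python and inject parameters per test, so `@pytest.mark.parametrize` can sweep variations. The `params:threshold` entry mirrors how Kedro resolves `params:`-prefixed node inputs; the dataset and parameter names here are assumptions:

```python
import pandas as pd
import pytest
from kedro.io import DataCatalog, MemoryDataset


def make_catalog(threshold: float) -> DataCatalog:
    # Everything catalog.yml / parameters.yml would declare, built in code.
    return DataCatalog(
        {
            "raw_data": MemoryDataset(pd.DataFrame({"price": [10.0]})),
            # Nodes that declare "params:threshold" as an input load this entry.
            "params:threshold": MemoryDataset(threshold),
        }
    )


@pytest.mark.parametrize("threshold", [0.0, 0.5, 1.0])
def test_many_parameter_variations(threshold):
    catalog = make_catalog(threshold)
    assert catalog.load("params:threshold") == threshold
```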
5. Make use of pytest features; they make life a lot easier.
   * Use fixtures for setting up default catalogs and parameters (top tip: pytest has a built-in fixture, `tmp_path`; use that in your catalog entries).
   * `conftest.py` is a really useful file for sharing those fixtures across tests. (Sketch below.)
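Finally, a `conftest.py` sketch for point 5, combining a default-catalog fixture with the built-in `tmp_path` fixture so file-backed entries never touch `data/`. `CSVDataset` comes from the separate `kedro-datasets` package in recent Kedro (older projects spell it `CSVDataSet` under `kedro.extras.datasets`):

```python
# conftest.py: fixtures defined here are visible to every test in the directory.
import pandas as pd
import pytest
from kedro.io import DataCatalog, MemoryDataset
from kedro_datasets.pandas import CSVDataset


@pytest.fixture
def catalog(tmp_path):
    # tmp_path is unique per test, so file-backed entries stay sandboxed.
    return DataCatalog(
        {
            "raw_data": MemoryDataset(pd.DataFrame({"price": [10.0]})),
            "model_input": CSVDataset(filepath=str(tmp_path / "model_input.csv")),
        }
    )
```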