Ben H
07/13/2021, 5:00 PM
1. Unit test everything that is outside pipeline.py.
* Treat these like any other unit tests you write.
* Use tiny, targeted data per test to keep your test suite fast. A single-row dataset is usually enough for a unit test; occasionally you may stretch to ten randomly generated rows.
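A unit test in this style might look like the following sketch (the node function `add_tax` and its columns are hypothetical, invented for illustration):

```python
import pandas as pd

# Hypothetical node function under test; in a real project this would
# live in your nodes module, outside pipeline.py.
def add_tax(df: pd.DataFrame, tax_rate: float) -> pd.DataFrame:
    df = df.copy()
    df["price_with_tax"] = df["price"] * (1 + tax_rate)
    return df

def test_add_tax_single_row():
    # Tiny, targeted input: a single-row DataFrame keeps the test fast.
    df = pd.DataFrame({"price": [100.0]})
    result = add_tax(df, tax_rate=0.2)
    assert result["price_with_tax"].iloc[0] == 120.0
```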
2. Integration test everything that is inside pipeline.py
* This is where you typically stitch functions together. You already know they work individually from your unit tests, so don't unit test the pipeline itself; test it as an integration.
* Provide the pipeline's free inputs (the inputs that no node produces).
* Test on final outputs
* Completely ignore intermediates - you already unit tested them.
3. System test the entire pipeline.
* Also known as "end-to-end" testing, or "automated acceptance" testing.
* Your test runner will loosely look like:
(cd source_proj && kedro pipeline package my_pipeline)
(cd test_dir && kedro new --config /path/to/test/config)
(cd test_dir/test_proj && kedro pipeline pull source_proj/dist/my_pipeline*.whl)
... any other data / config / pipelines that need setting up ...
kedro run
4. Never ever use catalog.yml, parameters.yml, or data/ files in your tests.
* For unit tests, you'll likely want to try out many variations of parameters, which a single static parameters.yml can't give you anyway!
* Use the Kedro Python API instead, building catalogs and parameters in code.
5. Make use of pytest features; they make life a lot easier.
* Use fixtures for setting up default catalogs and parameters (top tip: pytest has a built-in fixture, tmp_path; use it in your catalog entries).
* conftest.py is a really useful file for sharing fixtures across your test modules.