Ben H
07/13/2021, 5:00 PM
1. Unit test everything that is outside pipeline.py.
* Treat these like any other unit tests you write.
* Use tiny, targeted data per test to keep your test suite fast. A single-row dataset is usually enough for a unit test; occasionally you may stretch to ten randomly generated rows.
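A unit test in this style might look like the following sketch (the node function `add_tax` and its columns are hypothetical, invented for illustration):

```python
import pandas as pd

# Hypothetical node function under test; in a real project this would
# live in your nodes module, outside pipeline.py.
def add_tax(df: pd.DataFrame, tax_rate: float) -> pd.DataFrame:
    df = df.copy()
    df["price_with_tax"] = df["price"] * (1 + tax_rate)
    return df

def test_add_tax_single_row():
    # Tiny, targeted input: a single-row DataFrame keeps the test fast.
    df = pd.DataFrame({"price": [100.0]})
    result = add_tax(df, tax_rate=0.2)
    assert result["price_with_tax"].iloc[0] == 120.0
```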
2. Integration test everything that is inside pipeline.py
* This is where you typically stitch functions together. You already know they work individually from your unit tests, so don't unit test the pipeline itself; test it as an integration.
* Provide the pipeline's free inputs (the inputs that no node produces).
* Test on final outputs
* Completely ignore intermediates - you already unit tested them.
3. System test the entire pipeline.
* Also known as "end-to-end" testing, or "automated acceptance" testing.
* Your test runner will loosely look like:
(cd source_proj && kedro pipeline package my_pipeline)
(cd test_dir && kedro new --config /path/to/test/config)
(cd test_dir/test_proj && kedro pipeline pull source_proj/dist/my_pipeline*.whl)
... any other data / config / pipelines that need setting up ...
kedro run
4. Never ever use catalog.yml, parameters.yml, or data/ files in your tests.
* For unit tests, you'll likely want to try out many variations of parameters, which a single static parameters.yml can't give you anyway!
* Use the Kedro Python API instead, building catalogs and parameters in code.
5. Make use of pytest features; they make life a lot easier.
* Use fixtures for setting up default catalogs and parameters (top tip: pytest has a built-in fixture, tmp_path; use it in your catalog entries).
* conftest.py is a really useful file for sharing fixtures across your test modules.