Title
#beginners-need-help
datajoely

12/01/2021, 1:19 PM
Hi @User, I'll answer all of your questions in this thread
1:20 PM
So regarding your second two questions: we don't encourage users to construct runners outside of the template like this. What are you trying to achieve?
martinlarsalbert

12/01/2021, 1:21 PM
I want to run the same pipeline with a number of datasets, like putting a pipeline in a "loop"
datajoely

12/01/2021, 1:22 PM
Okay, so you can achieve that without constructing a runner outside of the main template
1:22 PM
So we have a pattern called modular pipelines that allows you to instantiate versions of pipelines but override the inputs/outputs/parameters
1:23 PM
Your normal `Pipeline` object (`from kedro.pipeline import Pipeline`) can be wrapped with `pipeline()` (`from kedro.pipeline.modular_pipeline import pipeline`) to override its inputs, outputs and parameters
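A minimal, stdlib-only sketch of the "same definition, different inputs/outputs" idea. It does not use Kedro at all: `Node`, `remap` and every dataset name below are hypothetical stand-ins that only mimic how a modular pipeline lets you reuse one fixed definition with swapped dataset names.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


# Hypothetical stand-in for a pipeline node: a name plus the dataset
# names it reads and writes (no real Kedro objects involved).
@dataclass(frozen=True)
class Node:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def remap(nodes: List[Node], mapping: Dict[str, str]) -> List[Node]:
    """Return copies of `nodes` with dataset names swapped via `mapping`.

    Names not present in `mapping` are kept unchanged; the original
    nodes are never modified.
    """
    def rename(names: Tuple[str, ...]) -> Tuple[str, ...]:
        return tuple(mapping.get(n, n) for n in names)

    return [replace(n, inputs=rename(n.inputs), outputs=rename(n.outputs)) for n in nodes]


# One static definition...
template = [
    Node("clean", ("raw_data",), ("clean_data",)),
    Node("train", ("clean_data", "params:model"), ("model",)),
]

# ...reused with different concrete inputs and outputs.
variant = remap(template, {"raw_data": "raw_sensor_a", "model": "model_a"})
for node in variant:
    print(node.name, node.inputs, node.outputs)
```

In real Kedro code the equivalent overrides are passed to `pipeline()` through its `inputs`, `outputs` and `parameters` arguments.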
martinlarsalbert

12/01/2021, 1:23 PM
Yeah, I've tried that one too, for instance creating 5 modular pipelines. But then I will need to use namespaces to separate them, if I understand correctly?
datajoely

12/01/2021, 1:23 PM
yes, but you can namespace as part of the loop
1:24 PM
and that way we can be sure things are isolated
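A toy illustration of namespacing inside a loop, assuming a simplified model where a template is a dict of node name to (inputs, outputs). The prefixing rules here (leave `params:` entries and an explicit `shared` set untouched) are assumptions of this sketch, not Kedro's exact behaviour; in Kedro itself the `namespace` argument of `pipeline()` does the prefixing, so check its docs for the real rules.

```python
from typing import Dict, FrozenSet, Tuple

# Toy model (hypothetical): node name -> (input dataset names, output dataset names).
Template = Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]]

TEMPLATE: Template = {
    "clean": (("raw_data",), ("clean_data",)),
    "train": (("clean_data", "params:model"), ("model",)),
}


def namespaced(template: Template, ns: str, shared: FrozenSet[str] = frozenset()) -> Template:
    """Return a copy of `template` with node and dataset names prefixed by `ns.`."""
    def prefix(name: str) -> str:
        # Assumption for this sketch: parameters and explicitly shared
        # datasets keep their names, everything else gets the prefix.
        if name.startswith("params:") or name in shared:
            return name
        return f"{ns}.{name}"

    return {
        f"{ns}.{node}": (tuple(prefix(i) for i in ins), tuple(prefix(o) for o in outs))
        for node, (ins, outs) in template.items()
    }


# Build isolated copies in a loop instead of writing each one by hand.
combined: Template = {}
for ns in ["dataset_1", "dataset_2", "dataset_3"]:
    combined.update(namespaced(TEMPLATE, ns))

for node, (ins, outs) in sorted(combined.items()):
    print(node, ins, "->", outs)
```

Because each copy only reads and writes `dataset_N.*` names, the copies cannot overwrite each other's outputs, which is the isolation described above.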
1:24 PM
If you look at this viz
1:24 PM
and expand the modelling pipeline
1:24 PM
you can see we have two instances of the same pipeline
1:25 PM
the feature pipeline too
martinlarsalbert

12/01/2021, 1:25 PM
Thanks. So far I've been using the YAML files to specify my catalogs, and it seems like I will need to duplicate these for my namespaces?
datajoely

12/01/2021, 1:25 PM
you can look at the code below
1:25 PM
that's how we build it
1:25 PM
it's going to be a tutorial, but it's currently a work in progress
martinlarsalbert

12/01/2021, 1:26 PM
OK, I will have a look, and thanks a lot, this help is fantastic! 😆
datajoely

12/01/2021, 1:33 PM
💪 we built Modular Pipelines for this problem specifically: static pipeline definition, dynamic inputs
1:34 PM
we're hoping to overhaul the tutorials in the next few weeks, as we do a terrible job of showing off how cool it is
1:34 PM
Plus, the latest version of viz has collapsible nodes that now show off the power of namespacing
1:34 PM
and it's also much easier to develop against with `kedro viz --auto-reload`
martinlarsalbert

12/01/2021, 2:36 PM
Nice solutions! It will take me some time to fully understand them. But to take it to the extreme: let's say that I have 100 datasets that should independently go through the same pipeline, and I want to save all the node outputs for each dataset. With the modular pipeline solution, I guess you would have to create 100 namespaced copies of the catalog.yml definitions to get that working?
datajoely

12/01/2021, 2:45 PM
so you can save yourself from writing 100 catalog entries by using a Jinja2 loop
2:46 PM
and do the same on the Python side if you need to
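A sketch of the "same loop on the Python side": build one set of catalog-style entries per namespace instead of writing them out by hand, the same shape a Jinja2 `{% for %}` loop in `catalog.yml` would expand to. The dataset ids, the `pandas.CSVDataSet` type string and the file paths are placeholders to adapt; this only produces plain dicts, and how you load them into a `DataCatalog` depends on your Kedro version (not shown).

```python
import json
from typing import Dict

# Hypothetical ids: prove the concept with 3 first, then raise the range to scale.
DATASET_IDS = [f"dataset_{i}" for i in range(1, 4)]


def catalog_entries(ns: str) -> Dict[str, Dict[str, str]]:
    """Catalog-style entries for one namespace (placeholder types and paths)."""
    return {
        f"{ns}.raw_data": {
            "type": "pandas.CSVDataSet",
            "filepath": f"data/01_raw/{ns}.csv",
        },
        f"{ns}.clean_data": {
            "type": "pandas.CSVDataSet",
            "filepath": f"data/02_intermediate/{ns}_clean.csv",
        },
    }


catalog: Dict[str, Dict[str, str]] = {}
for ns in DATASET_IDS:
    catalog.update(catalog_entries(ns))

print(json.dumps(catalog, indent=2))
```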
2:46 PM
I'd have a go at proving the concept for, say, 3
2:46 PM
and then scale
2:46 PM
because I think you'll get the hang of namespacing quicker that way
martinlarsalbert

12/01/2021, 3:45 PM
I got that working with Jinja2, very nice indeed! Thank you so much! I will try to pay you back with some contributions to this project when I get more into it
datajoely

12/01/2021, 3:45 PM
That's wonderful to hear!
3:46 PM
Shout if you need any help 🙂 Your feedback on how difficult it was to read about modular pipelines is super useful, and we're keen to make this easier for future people in your position
j c h a r l e s

12/01/2021, 7:32 PM
These kinds of discussions are super helpful for other beginners such as myself. Please keep asking and answering questions in public. Thank you both
datajoely

12/01/2021, 7:32 PM
Messages like this make it worth it! :kedroid:
j c h a r l e s

12/01/2021, 9:34 PM
What do you mean by "expand the modeling pipeline"?
datajoely

12/01/2021, 9:34 PM
On the left-hand side you can expand the drop-down
j c h a r l e s

12/01/2021, 9:34 PM
Which tab?
datajoely

12/01/2021, 9:35 PM
The little chevrons correspond to the namespaces
9:35 PM
Which can be nested with the dot syntax
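To illustrate how dot syntax gives nested levels (the chevrons you expand in viz), here is a small sketch that groups dotted names into a tree. It is not how Kedro-Viz computes its sidebar; `namespace_tree` and the example names are hypothetical, and it assumes no name is used both as a leaf and as a namespace.

```python
from typing import Any, Dict, Iterable


def namespace_tree(names: Iterable[str]) -> Dict[str, Any]:
    """Group dotted names into a nested dict, one level per namespace.

    Every segment before the last is treated as a namespace; the last
    segment is the leaf (dataset or node) name, stored with value None.
    """
    tree: Dict[str, Any] = {}
    for name in names:
        *namespaces, leaf = name.split(".")
        level = tree
        for ns in namespaces:
            level = level.setdefault(ns, {})
        level[leaf] = None
    return tree


names = [
    "modelling.model_a.X_train",
    "modelling.model_a.regressor",
    "modelling.model_b.X_train",
    "raw_data",
]
print(namespace_tree(names))
# {'modelling': {'model_a': {'X_train': None, 'regressor': None}, 'model_b': {'X_train': None}}, 'raw_data': None}
```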
j c h a r l e s

12/01/2021, 9:36 PM
Wow
datajoely

12/01/2021, 9:36 PM
Cool, right?! Super excited about this feature; it was only released a couple of weeks back on the viz side
9:36 PM
And any modular pipeline can be packaged and shared with other projects
9:37 PM
Lots of fun stuff in this space
j c h a r l e s

12/01/2021, 9:38 PM
Where do I submit feature requests?
datajoely

12/01/2021, 9:38 PM
GitHub issues please!
j c h a r l e s

12/01/2021, 9:38 PM
There's one thing I built 5 years ago that I would like to add in
9:38 PM
ok awesome
datajoely

12/01/2021, 9:39 PM
If you’re feeling brave, we accept PRs too 👀😂
j c h a r l e s

12/01/2021, 9:39 PM
For these runs, do you ever visualize the cardinality of the datasets that have been processed so far?
datajoely

12/01/2021, 9:39 PM
Good question
j c h a r l e s

12/01/2021, 9:39 PM
Like N=10000 for X, then N=9000 for X_train and N=1000 for X_test
datajoely

12/01/2021, 9:39 PM
There's an argument that you could use a `tracking.MetricsDataSet` to do that
9:40 PM
Stay tuned for more on that
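A sketch of the kind of node function whose output could be saved to a metrics dataset: it turns the row counts of its inputs into a flat dict of floats. The function name is hypothetical, the assumption that `tracking.MetricsDataSet` expects a flat dict of numeric values should be checked against the docs for your Kedro version, and the catalog/node wiring is not shown.

```python
from typing import Dict, Sequence


def row_counts(**datasets: Sequence) -> Dict[str, float]:
    """Hypothetical node: report the number of rows in each input as a float."""
    return {f"{name}_rows": float(len(rows)) for name, rows in datasets.items()}


# Stand-in data matching the example in the thread: 10000 rows split 9000/1000.
X = list(range(10_000))
X_train, X_test = X[:9_000], X[9_000:]
print(row_counts(X=X, X_train=X_train, X_test=X_test))
# {'X_rows': 10000.0, 'X_train_rows': 9000.0, 'X_test_rows': 1000.0}
```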
j c h a r l e s

12/01/2021, 9:40 PM
Yeah one of the main values I find for pipeline visualizations like these is to know what actually ran and what has the right amount of data
9:40 PM
I literally built this exact pipeline a bunch of times for ML
9:40 PM
and used Graphviz to show the total rows output by each step
9:41 PM
and it kept me sane
9:41 PM
Because I ran this generic pipeline for dozens of use cases every day
datajoely

12/01/2021, 9:41 PM
So in the demo, if you click the 🧪 icon, you can see the first cut of our experiment tracking features
9:44 PM
And this is actively being worked on
9:44 PM
So expect more features to arrive in quick succession
datajoely

12/01/2021, 9:45 PM
Interesting, so you’d like to annotate the flowchart with custom attributes?
j c h a r l e s

12/01/2021, 9:45 PM
yeah
datajoely

12/01/2021, 9:45 PM
Would those be attributes of the data or task nodes?
j c h a r l e s

12/01/2021, 9:45 PM
And if something hasn't run
9:46 PM
Then it would be red with a 0
9:46 PM
or something like that
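A minimal sketch of the Graphviz approach described here: label each dataset with its row count and show anything that has not run yet as a red N=0. The function name and example data are hypothetical; the output is plain DOT text that you could render with the Graphviz `dot` command (for example `dot -Tpng`).

```python
from typing import Dict, List, Optional, Tuple


def to_dot(edges: List[Tuple[str, str]], counts: Dict[str, Optional[int]]) -> str:
    """Render a dataset lineage graph as Graphviz DOT text.

    Datasets with a recorded count are labelled N=<count>; datasets whose
    count is None (the producing step has not run) are drawn red with N=0.
    """
    lines = ["digraph pipeline {"]
    for name, n in counts.items():
        if n is None:
            lines.append(f'  "{name}" [label="{name}\\nN=0", color=red, fontcolor=red];')
        else:
            lines.append(f'  "{name}" [label="{name}\\nN={n}"];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


edges = [("X", "X_train"), ("X", "X_test")]
counts = {"X": 10000, "X_train": 9000, "X_test": None}  # X_test has not run yet
dot = to_dot(edges, counts)
print(dot)
```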
datajoely

12/01/2021, 9:46 PM
Interesting
j c h a r l e s

12/01/2021, 9:46 PM
yeah
9:46 PM
this was insanely useful
9:46 PM
Attributes of the data
datajoely

12/01/2021, 9:46 PM
Please raise as a GitHub issue and we can get a sense from the community if it would be worth prioritising
j c h a r l e s

12/01/2021, 9:46 PM
kk
9:46 PM
Any guidelines for issues?
datajoely

12/01/2021, 9:47 PM
I think the short term answer is that our experiment tracking features will let you do something close to it
9:47 PM
There is an issue template when you select new issue
j c h a r l e s

12/01/2021, 9:47 PM
kk excellent
9:47 PM
Could you explain how the experiment tracking lets you do this?
9:47 PM
Does it provide a table with stats for the intermediate datasets?
datajoely

12/01/2021, 9:48 PM
🙏
9:48 PM
You can track what you want
9:49 PM
And it will show up in the second tab today
9:49 PM
But also on the flow chart soon
9:55 PM
I can show you some designs tomorrow when I’m back at my computer