I've got two clients doing some simple file copies, file removes, and directory removes. After a while, I stop both test scripts and unmount the mds. We can see the relationship between files, layouts, and device tables (mpd):
[root@pnfs-17-24 ~]> ./rlays.sh | grep refcnt
+ mdb -k
+ echo ::walk mds_Layout_entry_cache|::print struct rfs4_dbe
refcnt = 0x50
refcnt = 0x99
refcnt = 0x49
[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt | wc -l
+ mdb -k
+ echo ::walk mds_File_entry_cache|::print struct rfs4_dbe
303
[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k
+ echo ::walk mds_mpd_entry_cache|::print struct rfs4_dbe
refcnt = 0x2
refcnt = 0x2
refcnt = 0x2
I just fixed a bug where a layout request invalidated the old layout. So we can see here that there are over 300 files with state that are still active, 3 corresponding layouts, each with a different usage count, and 3 mpds, each with a hold from its respective layout. The 3 layouts correspond to the 3 policies in effect in the spe. There is no round-robin scheduling going on; with it, we might see more layouts in use.
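As a sanity check on those numbers, here is a minimal sketch that totals the external holds on the layouts. It assumes each table entry carries exactly one hold from the table itself (which is how the refcnt of 1 on an idle entry reads) and that every remaining hold comes from a file. The function name `external_holds` is mine, not part of any of the scripts above.

```shell
#!/bin/sh
# Sum the external holds over "refcnt = 0x.." lines read from stdin.
# Assumption: every entry has exactly one hold from its table, so
# external holds = refcnt - 1.
external_holds() {
    total=0
    while read -r name eq value; do
        # Skip anything that is not a "refcnt = <value>" line.
        [ "$name" = "refcnt" ] || continue
        # $value is a hex constant such as 0x50; shell arithmetic accepts it.
        total=$((total + $value - 1))
    done
    echo "$total"
}

# The three layout refcnts from the first listing.
printf 'refcnt = 0x50\nrefcnt = 0x99\nrefcnt = 0x49\n' | external_holds
# prints 303
```

Under that assumption the three layouts account for 79 + 152 + 72 = 303 external holds, which matches the 303 entries in the file table.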
I haven't fixed it yet, and this example doesn't show it, but each layout create is going to cause a corresponding mpd create. That is okay for now, since the only difference we could support would be a different stripe unit size.
And we can see that the files have been harvested, but the layouts have not:
[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt | wc -l
+ mdb -k
+ echo ::walk mds_File_entry_cache|::print struct rfs4_dbe
0
[root@pnfs-17-24 ~]> ./rlays.sh | grep refcnt
+ mdb -k
+ echo ::walk mds_Layout_entry_cache|::print struct rfs4_dbe
refcnt = 0x1
refcnt = 0x1
refcnt = 0x1
[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k
+ echo ::walk mds_mpd_entry_cache|::print struct rfs4_dbe
refcnt = 0x2
refcnt = 0x2
refcnt = 0x2
The layouts are ripe for plucking: a refcnt of 1 means that they are in the table, but no one else references them. Let's see if we can quickly use them again:
[root@pnfs-17-24 ~]> ./rlays.sh | grep refcnt
+ mdb -k
+ echo ::walk mds_Layout_entry_cache|::print struct rfs4_dbe
refcnt = 0x5
refcnt = 0xb
refcnt = 0x4
[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k
+ echo ::walk mds_mpd_entry_cache|::print struct rfs4_dbe
refcnt = 0x2
refcnt = 0x2
refcnt = 0x2
Now I don't have proof here that these are the same entries in the db, but what I wanted to point out is that I never invalidate entries in either the layout or mpd tables. Instead, I rely on the fact that they cannot go away until all external references are gone. Based on timing, I'd say we did reuse the entries: if they had not been reused and new layout entries had been created instead, I would expect to see 6 mpds and not 3.
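The helper scripts (rlays.sh, rfps.sh, rmpd.sh) are not shown above. Judging only from the `+` trace lines, each one pipes a single walk command into `mdb -k`, so a rough reconstruction might look like the sketch below. The function name `walk_cache` and the `MDB` override variable are assumptions of mine, added so the pipeline can be tried without a live kernel.

```shell
#!/bin/sh
# Hypothetical reconstruction of the helper scripts, inferred from the
# "+" trace lines in the listings; the real scripts may differ.
# MDB is an assumed override for the debugger command (default: mdb -k).
walk_cache() {
    # $1 is the cache to walk, e.g. mds_Layout_entry_cache.
    echo "::walk $1|::print struct rfs4_dbe" | ${MDB:-mdb -k}
}

# Under this reading, the three scripts would amount to:
#   rlays.sh: walk_cache mds_Layout_entry_cache
#   rfps.sh:  walk_cache mds_File_entry_cache
#   rmpd.sh:  walk_cache mds_mpd_entry_cache
```

Running the real scripts with `sh -x` (or a `set -x` inside them) would account for the `+ mdb -k` and `+ echo ...` lines that appear before each result.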