[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-btrfs
Subject:    Re: stat(2) returning device ID not existing in mountinfo
From:       Goffredo Baroncelli <kreijack () inwind ! it>
Date:       2017-02-13 19:21:34
Message-ID: e18d2622-75b0-18c0-8e79-630bcee3a554 () inwind ! it
[Download RAW message or body]

Hi,

I want to highlight this bug another time.

I encountered this bug, when I was looking to a problem with find. I my machine find \
took an huge quantity of memory (up to 3GB) when used by updatedb.

	http://lists.gnu.org/archive/html/findutils-patches/2016-12/msg00000.html

The root of the problem was a leak memory in find. What was strange however is that \
the findutils developers weren't unable to reproduce this bug. They find the leak, \
but not a so high memory usage.

After some email, it was discovered that find (when used in updatedb) checks if the \
filesystem is changed during the tree walking. They checked the device-id returned by \
stat against the one returned by mountinfo.

For btrfs these differ, and the check is repeat each time. Because the memory leak \
was per "filesystem check", using find on a btrfs filesystem caused a huge leak.

I hope that some btrfs developer could address this, because I suspect that a lot of \
tools compare the device id returned by /proc/self/mountinfo against the one returned \
by stat(2).


BR
G.Baroncelli




On 2016-09-20 15:15, Jeff Mahoney wrote:
> On 9/16/16 4:28 PM, Tomasz Sterna wrote:
> > Hi.
> > 
> > I have spotted an issue with stat(2) call on files on btrfs.
> > It is giving me dev_t st_dev number that does not correspond to any
> > mounted filesystem in proc's mountinfo.
> 
> That's by design.  Your particular file system may only use one device
> but, internally, btrfs uses virtualized storage that may be spread
> across multiple devices.  To make things more complicated, snapshots
> mean that:
> 
> sled1a:/mnt # btrfs sub list .
> ID 257 gen 14 top level 5 path a
> ID 258 gen 14 top level 5 path b
> 
> sled1a:/mnt # ls -laRi
> .:
> total 16
> 256 drwxr-xr-x 1 root root   4 Sep 20 09:08 .
> 256 drwxr-xr-x 1 root root 220 Sep 16 09:49 ..
> 256 drwxr-xr-x 1 root root   8 Sep 14 10:24 a
> 256 drwxr-xr-x 1 root root   8 Sep 14 10:24 b
> 
> ./a:
> total 4112
> 256 drwxr-xr-x 1 root root       8 Sep 14 10:24 .
> 256 drwxr-xr-x 1 root root       4 Sep 20 09:08 ..
> 257 -rw-r--r-- 1 root root 4194304 Sep 14 10:24 file
> 
> ./b:
> total 4112
> 256 drwxr-xr-x 1 root root       8 Sep 14 10:24 .
> 256 drwxr-xr-x 1 root root       4 Sep 20 09:08 ..
> 257 -rw-r--r-- 1 root root 4194304 Sep 14 10:24 file
> 
> Under normal circumstances those are two files with the same st_dev and
> the same inode number.  That would normally correspond to a hard link,
> but the files do not (necessarily) correspond to the same file.
> 
> ... but because we use anonymous device numbers for each subvolume, we
> have different device numbers for each one.
> 
> sled1a:/mnt # stat --format "%n st_dev=%d" {a,b}/file
> a/file st_dev=69
> b/file st_dev=70
> 
> It's a pretty big usability wart that we don't consistently report the
> device number.  We do it correctly in stat() but there are other places
> in the code that assume that inode->i_sb->s_dev will work.  In the SUSE
> kernels, we have patches that add a super_operation to report the
> correct device number everywhere, but even that is a hack.
> 
> > I already attempted a illinformed-patch in fs/btrfs/super.c:
> > 
> > @@ -1127,6 +1127,7 @@ static int btrfs_fill_super(struct super_block *sb,
> > 		goto fail_close;
> > 	}
> > 
> > +	sb->s_dev = inode->i_sb->s_dev;
> > 	sb->s_root = d_make_root(inode);
> > 	if (!sb->s_root) {
> > 		err = -ENOMEM;
> > 
> > but it didn't help.
> 
> It wouldn't.  That is assigning a variable to itself.
> 
> > I would like to dig deeper and fix it, but first I have to ask:
> > - Which number is wrong?
> > The one returned by stat() or the one in mountinfo?
> 
> The one in mountinfo, but then that means that the user only sees the
> anonymous devices in mount(8), which isn't what we want either.
> 
> I'm afraid the correct fix is very involved and requires non-trivial
> changes in the VFS layer as well.  It's on my long-term TODO list.  I
> currently have some patches that do the magic with vfsmounts but it's
> far from being usable.
> 
> -Jeff
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic