[prev in list] [next in list] [prev in thread] [next in thread]
List: linux-btrfs
Subject: Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH
From: Li Zefan <lizf () cn ! fujitsu ! com>
Date: 2010-12-15 3:33:33
Message-ID: 4D08370D.6020705 () cn ! fujitsu ! com
[Download RAW message or body]
Goffredo Baroncelli wrote:
> On Tuesday, 14 December, 2010, Li Zefan wrote:
>> Goffredo Baroncelli wrote:
>>> Hi Li,
>>>
>>> On Monday, 13 December, 2010, Li Zefan wrote:
>>>> The keys returned by tree search ioctl should be restricted to:
>>>>
>>>> key.objectid = [min_objectid, max_objectid] &&
>>>> key.offset = [min_offset, max_offset] &&
>>>> key.type = [min_type, max_type]
>>>>
>>>> But actually it returns those keys:
>>>>
>>>> [(min_objectid, min_type, min_offset),
>>>> (max_objectid, max_type, max_offset)].
>>>>
>>> I have to admit that I had need several minutes to understand what you
> wrote
>>> :). Then I came to conclusion that the tree search ioctl is basically
> wrong.
>>> IMHO, the error in this API is to use the lower bound of the acceptance
>>> criteria (the min_objectid, min_offset, min_type fields) also as starting
>>> point for the search.
>>>
>>> Let me explain with an example.
>>>
>>> Suppose to want to search all the keys in the range
>>>
>>> key.objectid = 10..20
>>> key.offset = 100..200
>>> key.type = 2..5
>>>
>>>
>>> Suppose to set sk->nr_items to 1 for simplicity, and the keys available
> which
>>> fit in the range are
>>>
>>> 1) [15,150,3]
>>> 2) [16,160,4]
>>> 3) [17,180,3]
>>>
>>> All these key satisfy the "acceptance criteria", but because we have to
>>> restart the search from the last key found, the code should resemble
>>>
>>> sk = &args.key
>>>
>>> sk->min_objectid; sk->max_objectid
>>> sk->min_offset0; sk->max_offset 0
>>> sk->min_type=2; sk->max_type=5
>>> sk->nr_items = 1;
>>>
>>> while(1){
>>> ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
>>> if( !sk->nr_items )
>>> break
>>>
>>> for(off = 0, i=0 ; i < sk->nr_items ; i ){
>>> sh = (struct btrfs_ioctl_search_header *)(args.buf
>>> off);
>>>
>>> [...]
>>> sk->min_objectid = sh->objectid;
>>> sk->min_offset = sh->offset;
>>> sk->min_type = sh->type;
>>> }
>>>
>>> <increase the sk->min_* key of 1>
>>>
>>> }
>>>
>>> But in this case, the code after found the key #2, sets the minimum
> acceptance
>>> criteria to [16,160,4], which exclude the key #3 because min_type is too
> high.
>>> Ideally, we should add three new field to the search key structure:
>>>
>>> sk->start_objectid
>>> sk->start_offset
>>> sk->start_type
>>>
>>> And after every iteration the code (even the kernel code) should set these
>>> fields as "last key found 1", leaving the min_* fields as they are.
>>>
>>> My analysis is correct or I miss something ?
>>>
>> After looking more deeply, I found the ioctl was changed in this way
>> on purpose, to support "btrfs subvolume find-new" specifically.
>>
>> See this commit:
>>
>> commit abc6e1341bda974e2d0eddb75f57a20ac18e9b33
>> Author: Chris Mason <chris.mason@oracle.com>
>> Date: Thu Mar 18 12:10:08 2010 -0400
>>
>> Btrfs: fix key checks and advance in the search ioctl
>>
>> The search ioctl was working well for finding tree roots, but using it
> for
>> generic searches requires a few changes to how the keys are advanced.
>> This treats the search control min fields for objectid, type and offset
>> more like a key, where we drop the offset to zero once we bump the type,
>> etc.
>>
>> The downside of this is that we are changing the min_type and min_offset
>> fields during the search, and so the ioctl caller needs extra checks to
> make
>> the keys in the result are the ones it wanted.
>>
>> This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
>> things more readable.
>>
>> So I think we can just fix the btrfs tool. Though adding sk->start_xxx
> should
>> also be able to meet the needs for "btrfs subvolume find-new".
>
> Sorry, but I have to disagree. This API seems to me simply bugged. The example
> above (which is quite generic) highlights this fact. But I can provide a more
> real case: suppose to use the BTRFS_IOC_TREE_SEARCH ioctl to find the new
> files. We are interested to the following items:
>
> - BTRFS_EXTENT_DATA_KEY (type = 1)
> - BTRFS_INODE_ITEM_KEY (type = 24)
> - BTRFS_XATTR_ITEM_KEY (type = 108)
>
> Acceptance criteria:
>
> min_type = 1
> max_type = 108
> min_offset = 0
> max_offset = ~0
> min_objectid = 0
> max_objectid = ~0
> min_transid = <the base generation number>
>
> Pay attention that we aren't interested in the offset.
>
> Suppose to have the following sequence keys [objectid, type, offset]:
>
> [...]
> 1) [300, BTRFS_EXTENT_DATA_KEY, xx]
> 2) [300, BTRFS_INODE_ITEM_KEY, xx]
> 3) [300, BTRFS_XATTR_ITEM_KEY, xx]
> 4) [301, BTRFS_EXTENT_DATA_KEY, xx]
> 5) [301, BTRFS_INODE_ITEM_KEY, xx]
> 7) [30200, BTRFS_EXTENT_DATA_KEY, xx]
> 8) [30200, BTRFS_INODE_ITEM_KEY, xx]
> 9) [30200, BTRFS_XATTR_ITEM_KEY, xx]
> [...]
>
>
> Suppose that the buffer is filled between the item 2 and 3. We should restart
> the search, but how set the min_* key ? Try the following hypothesis
>
> h1) objectid++, type = 0 -> In the next search the key 3 would be skipped
> h2) objectid asis, type ++, -> in the next search the key 4 would be skipped
> h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be
h4) objectid asis, type asis, offset++ -> we should get the correct result.
because the current ioctl uses min_{x,y,z} and max_{x,y,z} as start_key and
end_key, and it returns all keys that falls in [start_key, end_key].
So this btrfs-progs patch should fix missing subvolumes in the output of
"subvolume list":
diff --git a/btrfs-list.c b/btrfs-list.c
index 93766a8..1b9ea45 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -620,7 +620,10 @@ int list_subvols(int fd)
/* this iteration is done, step forward one root for the next
* ioctl
*/
- if (sk->min_objectid < (u64)-1) {
+ if (sk->min_type < BTRFS_ROOT_BACKREF_KEY) {
+ sk->min_type = BTRFS_ROOT_BACKREF_KEY;
+ sk->min_offset = 0;
+ } else if (sk->min_objectid < (u64)-1) {
sk->min_objectid++;
sk->min_type = BTRFS_ROOT_BACKREF_KEY;
sk->min_offset = 0;
> returned a second time...
>
> Pay attention that every inode may have more key type BTRFS_XATTR_ITEM_KEY or
> type BTRFS_EXTENT_DATA_KEY, so it is not possible to know in advance when the
> buffer is filled.
>
> Only as theoretical exercise, we can improve the search logic in userspace so
> when an item is returned, in the next search we set the minimum type as
> previous type+1, and the *maximum* objectid as the latest ofound bject id.
> When we are sure that there are not more key with this objectid we can reuse
> the old max_objectid and min_type... But to me it seems very fragile.
>
> Chris what do you think ? Otherwise I missed something this seems a severe bug
> in the api ?
>
> In another email I will propose a patch which may address this problem.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic