List: linux-raid
Subject: Re: [BUG] MD/RAID1 hung forever on freeze_array
From: Jinpu Wang <jinpu.wang@profitbricks.com>
Date: 2016-12-21 12:48:25
Message-ID: CAMGffE=OVX-jp+-SY8A4pmJHyVAkVQVXSCa_3VgndFf4tYyG9w@mail.gmail.com
Thanks Neil,
On Tue, Dec 20, 2016 at 10:23 PM, NeilBrown <neilb@suse.com> wrote:
> On Tue, Dec 20 2016, Jinpu Wang wrote:
>
>> Hi Neil,
>>> This is the problem. 'hold' hasn't been initialised.
>>> We could either do:
>>> bio_list_init(&hold);
>>> bio_list_merge(&hold, &bio_list_on_stack);
>> I tried the above variant first and it led to a panic in the endio path:
>>
> ...
>>
>> [exception RIP: bio_check_pages_dirty+65]
>
>
> I can't explain that at all. Maybe if I saw the complete patch I would
> be able to see what went wrong (or maybe there is a separate cause).
I think I know the cause now. I did:

+bio_list_init(&hold);
 bio_list_merge(&hold, &bio_list_on_stack);
-bio_list_init(&bio_list_on_stack);

I didn't read your comment carefully enough; I thought we didn't need to
re-init &bio_list_on_stack a second time.
But your idea is the same: copy the list before we call q->make_request_fn.
>
>>>
>>> You didn't find 'hold' to be necessary in your testing, but I think that
>>> in more complex arrangements it could make an important difference.
>>
>> Could you elaborate a bit more? From my understanding, in the later code
>> we pop every bio from bio_list_on_stack and add each one to either the
>> "lower" or the "same" bio_list, so merging both back will give us the
>> whole list again, right?
>
> If there are several bios on bio_list_on_stack, and if processing the
> first one causes several calls to generic_make_request(), then we want
> all the bios passed to generic_make_request() to be handled *before* the
> remaining bios that were originally on bio_list_on_stack. Of these new
> bios, we want to handle those for a lower level device first, and those
> for the same device after that. Only when all of those have been
> completely submitted (together with any subordinate bios submitted to
> generic_make_request()) do we move on to the rest of bio_list_on_stack
> (which are at the same level as the first bio, or higher in the device
> stack).
>
> The usage of 'hold' would become important when you have multiple
> levels. e.g. dm on md on scsi. If the first bio send to dm is split,
> then the first half needs to be processed all the way down to all the
> scsi devices, before the second half of the split is handled.
>
> Hope that helps,
> NeilBrown
>
Thanks, that helps a lot. I've attached the patch; I'm still testing it,
but so far so good.
Could you check if I got it right?
Cheers,
--
Jinpu Wang
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: jinpu.wang@profitbricks.com
URL: https://www.profitbricks.de
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss
["0001-block-fix-deadlock-between-freeze_array-and-wait_bar.patch" (text/x-patch)]
From 9500d5e9a7ccd1f4397f927ffdc8d378955d477c Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Wed, 14 Dec 2016 16:55:52 +0100
Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()
When we call wait_barrier, we might have some bios waiting
in current->bio_list, which prevents the freeze_array call from
completing. Those can only be internal READs, which have already
passed the wait_barrier call (thus incrementing nr_pending), but
were not yet submitted to the lower level, due to the
generic_make_request logic for avoiding recursive calls. In that
case we have a deadlock:
- array_frozen is already set to 1, so wait_barrier unconditionally
  waits, and
- the internal READ bios are never submitted, so freeze_array never
  completes.
To fix this, modify generic_make_request to reorder bio_list_on_stack
so that bios for lower-level devices are handled first, then bios for
the same level, and only then the bios that were already queued.
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
block/blk-core.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index e8d15d8..5a6390a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2064,10 +2064,30 @@ blk_qc_t generic_make_request(struct bio *bio)
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
+			struct bio_list lower, same, hold;
+
+			/* Create a fresh bio_list for all subordinate requests */
+			bio_list_init(&hold);
+			bio_list_merge(&hold, &bio_list_on_stack);
+			bio_list_init(&bio_list_on_stack);
 
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
 
+			/* sort new bios into those for a lower level
+			 * and those for the same level
+			 */
+			bio_list_init(&lower);
+			bio_list_init(&same);
+			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
+				if (q == bdev_get_queue(bio->bi_bdev))
+					bio_list_add(&same, bio);
+				else
+					bio_list_add(&lower, bio);
+			/* now assemble so we handle the lowest level first */
+			bio_list_merge(&bio_list_on_stack, &lower);
+			bio_list_merge(&bio_list_on_stack, &same);
+			bio_list_merge(&bio_list_on_stack, &hold);
 
 			bio = bio_list_pop(current->bio_list);
 		} else {
--
2.7.4