[prev in list] [next in list] [prev in thread] [next in thread]
List: dm-devel
Subject: Re: [dm-devel] [PATCH 0/6] scsi_dh : Couple of fixes for scsi
From: "Moger, Babu" <Babu.Moger () lsi ! com>
Date: 2010-07-31 5:02:09
Message-ID: E463DF2B2E584B4A82673F53D62C2EF4D0663B70 () cosmail01 ! lsi ! com
[Download RAW message or body]
> -----Original Message-----
> From: Shyam Iyer [mailto:shyam_iyer@dell.com]
> Sent: Friday, July 30, 2010 7:22 PM
> To: sekharan@us.ibm.com
> Cc: Moger, Babu; device-mapper development; linux-scsi@vger.kernel.org;
> Qi, Yanling; Chauhan, Vijay; Dachepalli, Sudhir; Stankey, Robert
> Subject: Re: [PATCH 0/6] scsi_dh : Couple of fixes for scsi device
> handlers
>
> On 07/30/2010 07:56 PM, Chandra Seetharaman wrote:
> > On Fri, 2010-07-30 at 12:12 -0600, Moger, Babu wrote:
> >
> >
> > > > > Yes, We can do that. Problem is I am hitting the issue with
> BUG_ON
> > > > > in get_rdac_data which is there in the beginning of rdac_activate.
> > > > >
> > > > I do not understand the problem.
> > > >
> > > > If the BUG_ON on get_rdac_data() is being triggered, that means
> > > > sdev->scsi_dh_data is NULL, if that is the case, how can you access
> > > > sdev->scsi_dh_data->kref in scsi_dh_activate (in patch 2/6) ? Wouldn't
> > > > it trip a oops ?
> > > >
> > > Test case is deleting both active and passive paths almost together
> during
> > > the multipath testing. Looks like DM picked up the active path
> failure
> > > first. Then failed the active path and started scheduling activate_path
> to
> > > failover to passive path. Passive path is also about to go down pretty
> soon.
> > > When the control was in scsi_dh_activate, Looks like scsi_dh_data was
> still
> > > valid because I did not see panic here. But scsi_dh_data became NULL
> when
> > > control went to rdac_activate. That is when I hit the bug on.
> > >
> > > kernel BUG at /usr/src/packages/BUILD/lsi-scsi_dh_rdac-
> 01.00/obj/default/scsi_dh_rdac.c:232!
> > > RIP: 0010: rdac_activate+0x257/0x387 [scsi_dh_rdac]
> > >
> > > My understanding is someone triggered scsi_dh->detach for passive path
> during
> > > this small window. Only way I could see problem go away is holding
> reference
> > > counts between these calls. Did i miss anything here? See the code
> snippet below..
> > >
> > So, basically, the new patch just reduces the window of race. IOW, if it
> > just spins for few seconds (or preempted) just before calling the
> > kref_get() in the new patch, it will generate an oops as scsi_dh_data
> > would be NULL.
That is correct.
> >
> > Looks like you have to create a lock to protect scsi_dh_data and then
> > call kref_get() with the protection of lock.
> >
> >
> Second that.. Sounds like each kref_get/kref_put .. needs a lock
> protection not just this scenario.
I agree. I will attempt one more set of patches. This time I will try to match get and put sequence in \
a one file. Please feel free to comment.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic