注册

记一次安卓广播引起的ANR死锁问题

年初遇到个bug,设备安装应用宝之后打开使用一段时间(浏览和安装应用),大概率会卡死,然后整个系统重启(表现为卡死后过一段时间开机动画出现,重新进入系统)。非常容易复现。


看到这种软重启首先考虑ANR,分析抓到的log


  到日志中查找watchdog关键词
01-25 11:02:42.032 22774 22796 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on ActivityManager (ActivityManager), Blocked in handler on PowerManagerService (PowerManagerService)
01-25 11:02:42.033 22774 22796 W Watchdog: android.fg annotated stack trace:
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.server.am.ActivityManagerService.bindServiceInstance(ActivityManagerService.java:12853)
01-25 11:02:42.037 22774 22796 W Watchdog: - waiting to lock <0x02af09e1> (a com.android.server.am.ActivityManagerService)
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.server.am.ActivityManagerService.bindServiceInstance(ActivityManagerService.java:12810)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:2035)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.app.ContextImpl.bindService(ContextImpl.java:1958)
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.internal.infra.ServiceConnector$Impl.bindService(ServiceConnector.java:343)
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.internal.infra.ServiceConnector$Impl.enqueueJobThread(ServiceConnector.java:462)
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.internal.infra.ServiceConnector$Impl.lambda$enqueue$1$com-android-internal-infra-ServiceConnector$Impl(ServiceConnector.java:445)
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.internal.infra.ServiceConnector$Impl$$ExternalSyntheticLambda2.run(Unknown Source:4)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.os.Handler.handleCallback(Handler.java:942)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.os.Looper.loopOnce(Looper.java:201)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.os.Looper.loop(Looper.java:288)
01-25 11:02:42.037 22774 22796 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
01-25 11:02:42.037 22774 22796 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
01-25 11:02:42.038 22774 22796 W Watchdog: main annotated stack trace:
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.alarm.AlarmManagerService$AlarmHandler.handleMessage(AlarmManagerService.java:4993)
01-25 11:02:42.038 22774 22796 W Watchdog: - waiting to lock <0x0edf48f2> (a java.lang.Object)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:106)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.Looper.loopOnce(Looper.java:201)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.Looper.loop(Looper.java:288)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.SystemServer.run(SystemServer.java:968)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.SystemServer.main(SystemServer.java:653)
01-25 11:02:42.038 22774 22796 W Watchdog: at java.lang.reflect.Method.invoke(Native Method)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:920)
01-25 11:02:42.038 22774 22796 W Watchdog: ActivityManager annotated stack trace:
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.am.BroadcastQueue.processNextBroadcast(BroadcastQueue.java:1154)
01-25 11:02:42.038 22774 22796 W Watchdog: - waiting to lock <0x02af09e1> (a com.android.server.am.ActivityManagerService)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.am.BroadcastQueue.-$$Nest$mprocessNextBroadcast(Unknown Source:0)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.am.BroadcastQueue$BroadcastHandler.handleMessage(BroadcastQueue.java:224)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:106)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.Looper.loopOnce(Looper.java:201)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.Looper.loop(Looper.java:288)
01-25 11:02:42.038 22774 22796 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
01-25 11:02:42.038 22774 22796 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
01-25 11:02:42.039 22774 22796 W Watchdog: PowerManagerService annotated stack trace:
01-25 11:02:42.039 22774 22796 W Watchdog: at com.android.server.power.PowerManagerService.handleSandman(PowerManagerService.java:3257)
01-25 11:02:42.039 22774 22796 W Watchdog: - waiting to lock <0x0f550be5> (a java.lang.Object)
01-25 11:02:42.039 22774 22796 W Watchdog: at com.android.server.power.PowerManagerService.-$$Nest$mhandleSandman(Unknown Source:0)
01-25 11:02:42.039 22774 22796 W Watchdog: at com.android.server.power.PowerManagerService$PowerManagerHandlerCallback.handleMessage(PowerManagerService.java:5103)
01-25 11:02:42.039 22774 22796 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:102)
01-25 11:02:42.039 22774 22796 W Watchdog: at android.os.Looper.loopOnce(Looper.java:201)
01-25 11:02:42.039 22774 22796 W Watchdog: at android.os.Looper.loop(Looper.java:288)
01-25 11:02:42.039 22774 22796 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
01-25 11:02:42.039 22774 22796 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
01-25 11:02:42.039 22774 22796 W Watchdog: *** GOODBYE!

可以很清楚的看到watchdog提示waiting to lock </xxxxx/> 关键词已经确定大概率是死锁问题了。
导出Anr日志继续进行分析,一般先从group="main"分析起


"main" prio=5 tid=1 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x71c515d0 self=0xb4000078c0a0abe0
| sysTid=2257 nice=-2 cgrp=foreground sched=0/0 handle=0x7a80db94f8
| state=S schedstat=( 67235248988 24733272110 200587 ) utm=3966 stm=2756 core=7 HZ=100
| stack=0x7fca926000-0x7fca928000 stackSize=8188KB
| held mutexes=
at com.android.server.am.ActivityManagerService.broadcastIntentWithFeature(ActivityManagerService.java:14615)
- waiting to lock <0x0185c3b1> (a com.android.server.am.ActivityManagerService) held by thread 16
at android.app.ActivityManager.broadcastStickyIntent(ActivityManager.java:4620)
at android.app.ActivityManager.broadcastStickyIntent(ActivityManager.java:4610)
at com.android.server.BatteryService.lambda$sendBatteryChangedIntentLocked$0(BatteryService.java:780)
at com.android.server.BatteryService.$r8$lambda$r64V5AVg_Okl7PnB1VjeN4oyo1I(unavailable:0)
at com.android.server.BatteryService$$ExternalSyntheticLambda5.run(unavailable:2)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at com.android.server.SystemServer.run(SystemServer.java:968)
at com.android.server.SystemServer.main(SystemServer.java:653)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:920)

main它在等待thread 16的lock <0x0185c3b1>,所以我们去tid=16再看看


"android.display" prio=5 tid=16 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x14340db8 self=0xb4000078c0a40a10
| sysTid=2349 nice=-3 cgrp=top-app sched=0/0 handle=0x76c8933cb0
| state=S schedstat=( 1805684706 1655206467 5034 ) utm=99 stm=81 core=7 HZ=100
| stack=0x76c8830000-0x76c8832000 stackSize=1039KB
| held mutexes=
at com.android.server.wm.ActivityTaskManagerService$LocalService.onProcessAdded(ActivityTaskManagerService.java:5740)
- waiting to lock <0x033a7977> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 198
at com.android.server.am.ProcessList$MyProcessMap.put(ProcessList.java:699)
at com.android.server.am.ProcessList.addProcessNameLocked(ProcessList.java:2946)
- locked <0x09f229e4> (a com.android.server.am.ActivityManagerProcLock)
at com.android.server.am.ProcessList.newProcessRecordLocked(ProcessList.java:3039)
at com.android.server.am.ProcessList.startProcessLocked(ProcessList.java:2487)
at com.android.server.am.ActivityManagerService.startProcessLocked(ActivityManagerService.java:2854)
at com.android.server.am.ActivityManagerService$LocalService.startProcess(ActivityManagerService.java:17450)
- locked <0x0185c3b1> (a com.android.server.am.ActivityManagerService)
at com.android.server.wm.ActivityTaskManagerService$$ExternalSyntheticLambda11.accept(unavailable:27)
at com.android.internal.util.function.pooled.PooledLambdaImpl.doInvoke(PooledLambdaImpl.java:363)
at com.android.internal.util.function.pooled.PooledLambdaImpl.invoke(PooledLambdaImpl.java:204)
at com.android.internal.util.function.pooled.OmniFunction.run(OmniFunction.java:97)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.os.HandlerThread.run(HandlerThread.java:67)
at com.android.server.ServiceThread.run(ServiceThread.java:44)

可以看到waiting to lock <0x033a7977>,它在等待threa198的锁释放。因此再去tid=198看看


"binder:2257_1C" prio=5 tid=198 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x143443a0 self=0xb4000078c0bb49f0
| sysTid=6550 nice=-10 cgrp=foreground sched=0/0 handle=0x7638d60cb0
| state=S schedstat=( 12455767235 6173702511 27824 ) utm=695 stm=550 core=4 HZ=100
| stack=0x7638c69000-0x7638c6b000 stackSize=991KB
| held mutexes=
at com.android.server.display.DisplayManagerService.setDisplayPropertiesInternal(DisplayManagerService.java:2019)
- waiting to lock <0x0faa6d02> (a com.android.server.display.DisplayManagerService$SyncRoot) held by thread 130
at com.android.server.display.DisplayManagerService.-$$Nest$msetDisplayPropertiesInternal(unavailable:0)
at com.android.server.display.DisplayManagerService$LocalService.setDisplayProperties(DisplayManagerService.java:3811)
at com.android.server.wm.DisplayContent.applySurfaceChangesTransaction(DisplayContent.java:4878)
at com.android.server.wm.RootWindowContainer.applySurfaceChangesTransaction(RootWindowContainer.java:1022)
at com.android.server.wm.RootWindowContainer.performSurfacePlacementNoTrace(RootWindowContainer.java:824)
at com.android.server.wm.RootWindowContainer.performSurfacePlacement(RootWindowContainer.java:785)
at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacementLoop(WindowSurfacePlacer.java:177)
at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacement(WindowSurfacePlacer.java:126)
at com.android.server.wm.WindowManagerService.relayoutWindow(WindowManagerService.java:2501)
- locked <0x033a7977> (a com.android.server.wm.WindowManagerGlobalLock)
at com.android.server.wm.Session.relayout(Session.java:253)
at com.android.server.wm.Session.relayoutAsync(Session.java:267)
at android.view.IWindowSession$Stub.onTransact(IWindowSession.java:757)
at com.android.server.wm.Session.onTransact(Session.java:178)
at android.os.Binder.execTransactInternal(Binder.java:1285)
at android.os.Binder.execTransact(Binder.java:1244)

198在等待130来释放锁,waiting to lock <0x0faa6d02>,不要嫌麻烦再跟去130


"binder:2257_4" prio=5 tid=130 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x15288e48 self=0xb4000078c0b32400
| sysTid=2541 nice=0 cgrp=foreground sched=0/0 handle=0x76489a8cb0
| state=S schedstat=( 8323462229 6484203506 25835 ) utm=482 stm=349 core=7 HZ=100
| stack=0x76488b1000-0x76488b3000 stackSize=991KB
| held mutexes=
at com.android.server.am.ActivityManagerService.registerReceiverWithFeature(ActivityManagerService.java:13285)
- waiting to lock <0x0185c3b1> (a com.android.server.am.ActivityManagerService) held by thread 16
at android.app.ContextImpl.registerReceiverInternal(ContextImpl.java:1816)
at android.app.ContextImpl.registerReceiver(ContextImpl.java:1750)
at android.app.ContextImpl.registerReceiver(ContextImpl.java:1738)
at com.android.server.display.DisplayPowerController.<init>(DisplayPowerController.java:674)
at com.android.server.display.DisplayManagerService.addDisplayPowerControllerLocked(DisplayManagerService.java:2710)
at com.android.server.display.DisplayManagerService.handleLogicalDisplayAddedLocked(DisplayManagerService.java:1561)
at com.android.server.display.DisplayManagerService.-$$Nest$mhandleLogicalDisplayAddedLocked(unavailable:0)
at com.android.server.display.DisplayManagerService$LogicalDisplayListener.onLogicalDisplayEventLocked(DisplayManagerService.java:2811)
at com.android.server.display.LogicalDisplayMapper.sendUpdatesForDisplaysLocked(LogicalDisplayMapper.java:759)
at com.android.server.display.LogicalDisplayMapper.updateLogicalDisplaysLocked(LogicalDisplayMapper.java:733)
at com.android.server.display.LogicalDisplayMapper.handleDisplayDeviceAddedLocked(LogicalDisplayMapper.java:580)
at com.android.server.display.LogicalDisplayMapper.onDisplayDeviceEventLocked(LogicalDisplayMapper.java:201)
at com.android.server.display.DisplayDeviceRepository.sendEventLocked(DisplayDeviceRepository.java:214)
at com.android.server.display.DisplayDeviceRepository.handleDisplayDeviceAdded(DisplayDeviceRepository.java:158)
- locked <0x0faa6d02> (a com.android.server.display.DisplayManagerService$SyncRoot)
at com.android.server.display.DisplayDeviceRepository.onDisplayDeviceEvent(DisplayDeviceRepository.java:87)
at com.android.server.display.DisplayManagerService.createVirtualDisplayLocked(DisplayManagerService.java:1418)
at com.android.server.display.DisplayManagerService.createVirtualDisplayInternal(DisplayManagerService.java:1381)
- locked <0x0faa6d02> (a com.android.server.display.DisplayManagerService$SyncRoot)
at com.android.server.display.DisplayManagerService.-$$Nest$mcreateVirtualDisplayInternal(unavailable:0)
at com.android.server.display.DisplayManagerService$BinderService.createVirtualDisplay(DisplayManagerService.java:3180)
at android.hardware.display.IDisplayManager$Stub.onTransact(IDisplayManager.java:659)
at android.os.Binder.execTransactInternal(Binder.java:1280)
at android.os.Binder.execTransact(Binder.java:1244)

130在等待16释放锁,这下子就明朗了,画个它们之间的关系图,这样就很明朗了他们三个之间死锁引起main主线程等待超时。


graph TD
binder2257_4 --> binder2257_1c --> android.display --> binder2257_4 & main主线程

前面这里只是分析出问题原因。下面来说问题为什么会发生。从main主线程开始追到binder2257_4然后开始循环。合理怀疑问题是先从binder2257_4出现的。都是binder是因为他们之间是用binder通信的。
看到binder2257_4这个anr日志中出现


at com.android.server.display.DisplayPowerController.<init>(DisplayPowerController.java:674)

竟然是在displaypowercontroller init中出现的,去到这个文件的674行,发现这里在注册广播接收器。


image.png


尝试退回这条提交之后果然问题没有复现了,分析这条提交,它是想在displaypowercontroller中注册一个广播接收器。用于接收挂电话的时候发出的广播。当收到这条广播之后就会通知displaypowercontroller亮屏。其实就是实现了一个通话时对方挂断电话之后自动亮屏。它是在创建阶段进行的注册。要分析为什么这个会导致死锁。


作者:用户8081391597591
来源:juejin.cn/post/7353158088730165259

0 个评论

要回复文章请先登录注册