Bug 216371 - acpi wake up with black screen(failed to get iomux index)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: AMD Linux
Importance: P1 high
Assignee: acpi_power-sleep-wake
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-17 01:50 UTC by neoe
Modified: 2022-11-17 16:27 UTC

See Also:
Kernel Version: 6.0.0-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
OK version (92.28 KB, text/plain)
2022-08-25 14:31 UTC, neoe
another dmesg when bad (91.88 KB, text/plain)
2022-08-26 04:46 UTC, neoe

Description neoe 2022-08-17 01:50:41 UTC
I just upgraded from 5.18 to 6.0.0-rc1.

I ran `acpitool -s` to put the machine to sleep; suspending seemed a little slow.

After waking up, the screen stays black.

Everything worked fine before the upgrade.

AMD 3900X
Comment 1 neoe 2022-08-22 02:39:12 UTC
dmesg

amd_gpio AMDI0030:00: failed to get iomux index
Comment 2 Mario Limonciello (AMD) 2022-08-25 13:46:37 UTC
Can you please bisect (https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html)?
Comment 3 Mario Limonciello (AMD) 2022-08-25 14:18:41 UTC
Also, please share a full dmesg from the previous working kernel and full dmesg from broken kernel.
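
One way to compare the two logs once both are available is to strip the timestamps and list the messages that only show up on the broken kernel. This is a minimal sketch, not a tool used in this report: `strip_timestamps` and `new_messages` are hypothetical helper names, and it assumes the default `[   12.345678]` timestamp prefix that dmesg prints.

```python
import re

# Matches the "[   12.345678] " prefix that dmesg prints by default (assumption).
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log_text):
    """Return the log lines with the leading dmesg timestamp removed."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines()]


def new_messages(good_log, bad_log):
    """Return lines of the bad log that never occur in the good log, in order."""
    seen = set(strip_timestamps(good_log))
    return [line for line in strip_timestamps(bad_log) if line not in seen]
```

Calling `new_messages(open("dmesg-good.txt").read(), open("dmesg-bad.txt").read())` (the file names are placeholders) would print only the lines unique to the bad boot, such as the `amd_gpio AMDI0030:00: failed to get iomux index` message from comment 1. Lines that differ only in addresses or counters will also show up, so the result still needs a manual read.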
Comment 4 neoe 2022-08-25 14:31:33 UTC
Created attachment 301657
OK version
Comment 5 neoe 2022-08-25 14:34:22 UTC
Sorry, I don't have a dmesg from the bad 6.0 kernel,

but I have some other information:


acpitool -e

  Kernel version : 5.19.3-gs   -    ACPI version : 20220331
  Freq. scaling driver   : amd-pstate

  Kernel version : 6.0.0-rc2-gs   -    ACPI version : 20220331
  Freq. scaling driver   : acpi-cpufreq

On 6.0.0 it seems `amd-pstate` is not loaded and `acpi-cpufreq` is used instead.
Comment 6 Mario Limonciello (AMD) 2022-08-25 14:38:23 UTC
That may be a completely separate bug.  Maybe you can try to explicitly set which frequency scaling driver is used in both cases to isolate if that's the cause of your issue.
Comment 7 neoe 2022-08-25 15:46:14 UTC
I added `amd_pstate.shared_mem=1` to the kernel command line ('Command line: BOOT_IMAGE=/boot/vmlinuz-xxx root=/dev/nvme0n1p4 amd_pstate.shared_mem=1') to enable `amd-pstate` on all kernels, I think.

When 6.0.0-rc3 comes out, I will try again and upload the dmesg here.
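
Before each suspend test it may help to confirm which driver is really active and whether the parameter reached the kernel. A minimal sketch, assuming the standard sysfs attribute `devices/system/cpu/cpu0/cpufreq/scaling_driver` and `/proc/cmdline`; `scaling_driver` and `cmdline_has` are hypothetical helper names, and the path arguments exist only so the functions can be pointed at test files.

```python
from pathlib import Path


def scaling_driver(sysfs_root="/sys"):
    """Return the cpufreq driver reported for cpu0, e.g. 'amd-pstate'."""
    path = Path(sysfs_root) / "devices/system/cpu/cpu0/cpufreq/scaling_driver"
    return path.read_text().strip()


def cmdline_has(param, cmdline_path="/proc/cmdline"):
    """Return True if the exact parameter appears on the kernel command line."""
    return param in Path(cmdline_path).read_text().split()


if __name__ == "__main__":
    print("driver:", scaling_driver())
    print("amd_pstate.shared_mem=1 set:", cmdline_has("amd_pstate.shared_mem=1"))
```

Recording both values next to each good/bad verdict makes it easier to tell later whether a given result was taken with `amd-pstate` or `acpi-cpufreq`.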
Comment 8 Mario Limonciello (AMD) 2022-08-25 17:12:46 UTC
Once you test 6.0-rc3 if it's still failing, please perform a bisect to find the root cause.
Comment 9 neoe 2022-08-26 04:38:05 UTC
I ran a git bisect:

```
[cb6b81b21bd9cf09d72b7fe711be1b55001eb166] Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

# git bisect bad
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[676ad8e997036e2f815c293b76c356fb7cc97a08] drm: rcar-du: Lift z-pos restriction on primary plane for Gen3

bad amdgpu
# git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[c96cfaf8fc02d4bb70727dfa7ce7841a3cff9be2] drm/nouveau: Don't pm_runtime_put_sync(), only pm_runtime_put_autosuspend()

bad amdgpu
cpufreq but wake up good

```
At this point I don't know whether to mark it good or bad, because
if I say bad, it actually woke up fine;
if I say good, it uses cpufreq instead of the expected amd-pstate.

But there are only 2 revisions left, so can I leave it to you, the developer?

By the way, I started from tags/v6.0-rc1 (bad) and tags/v5.19 (good) with 7097 revisions.
I hope there is not more than one regression in that range, and that the resulting range is meaningful.
Comment 10 neoe 2022-08-26 04:46:30 UTC
Created attachment 301670
another dmesg when bad
Comment 11 Mario Limonciello (AMD) 2022-08-26 04:52:46 UTC
> At this point I don't know whether to mark it good or bad, because
> if I say bad, it actually woke up fine;
> if I say good, it uses cpufreq instead of the expected amd-pstate.

Since you know the frequency scaling driver changes partway through the range, it's better to force all the tests to use acpi-cpufreq. That removes variability from the test results.

> But there are only 2 revisions left, so can I leave it to you, the developer?

I don't know which commits are actually left; I'd need to see the whole log to see what happened. With the above guidance, can you narrow it down to a specific commit?
Comment 12 Mario Limonciello (AMD) 2022-08-26 04:56:40 UTC
A shot in the dark until we know the offending commit: a6250bdb6c4677ee77d699b338e077b900f94c0c, which is in 6.0-rc2 and the latest 5.19.y, helped some other people with VT freezes.
Comment 13 neoe 2022-08-26 05:23:48 UTC
Wow, maybe my result is not valid, because I really got lost in the bisect game.


[/home/neoe/oss/linux/linux] git bisect good

cb6b81b21bd9cf09d72b7fe711be1b55001eb166 is the first bad commit
commit cb6b81b21bd9cf09d72b7fe711be1b55001eb166
Merge: 3cfb5bc94fab 6f2c8d5f1659
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Jul 22 13:43:46 2022 +1000

    Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
    
    Short summary of fixes pull:
    
     * amdgpu: Fix for drm buddy memory corruption
     * nouveau: PM fixes; DP fixes
    
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    
    From: Thomas Zimmermann <tzimmermann@suse.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/Ytj65+PdAJs4jIEO@linux-uq9g

 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 16 ++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h |  2 +-
 drivers/gpu/drm/nouveau/nouveau_connector.c  |  8 +++-----
 drivers/gpu/drm/nouveau/nouveau_display.c    |  4 ++--
 drivers/gpu/drm/nouveau/nouveau_fbcon.c      |  2 +-
 5 files changed, 15 insertions(+), 17 deletions(-)



[/home/neoe/oss/linux/linux] git bisect log

git bisect start
# status: waiting for both good and bad commits
# good: [1b54a0121dba12af268fb75c413feabdb9f573d4] drm/amd/display: Reduce stack size in the mode support function
git bisect good 1b54a0121dba12af268fb75c413feabdb9f573d4
# status: waiting for bad commit, 1 good commit known
# bad: [2bc7ea71a73747a77e7f83bc085b0d2393235410] Merge tag 'topic/nouveau-misc-2022-07-27' of git://anongit.freedesktop.org/drm/drm into drm-next
git bisect bad 2bc7ea71a73747a77e7f83bc085b0d2393235410
# good: [c877bed82e1017c102c137d432933ccbba92c119] drm/i915/gt: Only kick the signal worker if there's been an update
git bisect good c877bed82e1017c102c137d432933ccbba92c119
# bad: [ee8b1ef9a6b089abf7a9c7d094b6e93fa05f15b9] Merge tag 'amd-drm-next-5.20-2022-07-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect bad ee8b1ef9a6b089abf7a9c7d094b6e93fa05f15b9
# bad: [cb6b81b21bd9cf09d72b7fe711be1b55001eb166] Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect bad cb6b81b21bd9cf09d72b7fe711be1b55001eb166
# good: [676ad8e997036e2f815c293b76c356fb7cc97a08] drm: rcar-du: Lift z-pos restriction on primary plane for Gen3
git bisect good 676ad8e997036e2f815c293b76c356fb7cc97a08
# good: [c96cfaf8fc02d4bb70727dfa7ce7841a3cff9be2] drm/nouveau: Don't pm_runtime_put_sync(), only pm_runtime_put_autosuspend()
git bisect good c96cfaf8fc02d4bb70727dfa7ce7841a3cff9be2
# good: [6f2c8d5f16594a13295d153245e0bb8166db7ac9] drm/amdgpu: Fix for drm buddy memory corruption
git bisect good 6f2c8d5f16594a13295d153245e0bb8166db7ac9
# good: [3cfb5bc94fab39c456dccee75553f7f6c52ee7f7] Merge tag 'du-next-20220707' of git://linuxtv.org/pinchartl/media into drm-next
git bisect good 3cfb5bc94fab39c456dccee75553f7f6c52ee7f7
# first bad commit: [cb6b81b21bd9cf09d72b7fe711be1b55001eb166] Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Comment 14 Mario Limonciello (AMD) 2022-08-26 12:29:31 UTC
It's unfortunate it ends on a merge commit but not unheard of.
https://stackoverflow.com/questions/17267816/git-bisect-with-merged-commits

If that's a true result, you should be able to add nomodeset which will turn off amdgpu and check whether you can suspend/resume.  If so it does confirm this is still an amdgpu bug.

The real commit from that one should be 6f2c8d5f16594a13295d153245e0bb8166db7ac9, but you have that marked as good above. The only other stuff in that merge request is nouveau which shouldn't affect your system.

If nomodeset helps I think you should redo your bisect.
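
The inconsistency described above can be checked mechanically from the `git bisect log` output. The following is a rough sketch under stated assumptions: `parse_marks`, `verdict_for` and `parents_all_good` are hypothetical helper names, and the log excerpt is copied from comment 13.

```python
import re

# A verdict line as written by `git bisect log`.
MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")


def parse_marks(bisect_log):
    """Map each commit hash in the log to the verdict it was given."""
    marks = {}
    for line in bisect_log.splitlines():
        match = MARK.match(line.strip())
        if match:
            marks[match.group(2)] = match.group(1)
    return marks


def verdict_for(marks, abbrev):
    """Look up a verdict by abbreviated hash; None if the commit was not tested."""
    for sha, verdict in marks.items():
        if sha.startswith(abbrev):
            return verdict
    return None


def parents_all_good(marks, parent_abbrevs):
    """True if every listed parent of a merge was explicitly marked good."""
    return all(verdict_for(marks, p) == "good" for p in parent_abbrevs)


LOG = """\
git bisect bad cb6b81b21bd9cf09d72b7fe711be1b55001eb166
git bisect good 6f2c8d5f16594a13295d153245e0bb8166db7ac9
git bisect good 3cfb5bc94fab39c456dccee75553f7f6c52ee7f7
"""
marks = parse_marks(LOG)
# Parents taken from the "Merge: 3cfb5bc94fab 6f2c8d5f1659" line in comment 13.
print(parents_all_good(marks, ["3cfb5bc94fab", "6f2c8d5f1659"]))  # True
```

A bad merge whose parents are both marked good points either at the merge resolution itself or at an unreliable verdict somewhere in the run, which is why redoing the bisect under fixed conditions is suggested here.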
Comment 15 neoe 2022-08-26 22:25:36 UTC
I added nomodeset on 6.0.0-rc2 and it still wakes up to a black screen.
That means this is more likely an ACPI bug than an amdgpu bug.
Comment 16 Mario Limonciello (AMD) 2022-08-29 13:44:47 UTC
In that case I think you should redo the bisect with amdgpu blacklisted for the entire duration so that any instability in the middle of the release doesn't lead to a bad result.
Comment 17 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-09-04 18:05:38 UTC
neoe, any news? Did you try what Mario suggested?
Comment 18 neoe 2022-09-04 23:27:51 UTC
Sorry, not yet.
A compile takes 12 minutes, a bisect needs about 14 iterations, and it is not automated yet, so it is quite expensive for me.
I don't know whether other AMD Zen 2 systems have the same problem.
So if 6.0.0 is released and the problem is still there, I will write a script to automate the git bisect test, unless there is a better way.
Comment 19 neoe 2022-09-05 02:42:17 UTC
I saw that 6.0.0-rc4 came out today, so I tested it.

"acpitool -e" now shows "Freq. scaling driver   : amd-pstate", which is better than rc2.

But I also noticed that the HDMI audio output is gone. All I can say is that there seem to be a lot of amdgpu changes, and I just cannot follow the test process given the current state.

As a Linux user and lover, I have decided to stick with 5.19.y until 6.0.0 becomes stable.

Thank you all.
Comment 20 neoe 2022-09-05 02:43:42 UTC
6.0.0-rc4 also wakes up to a black screen after sleeping via 'acpitool -s'.
Comment 21 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-09-05 09:29:50 UTC
(In reply to neoe from comment #18)
> a bisect needs about 14 iterations

FWIW, it's likely just about 8 if it's really between v6.0-rc1..v6.0-rc2
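
The step counts quoted in this thread follow from bisect halving the candidate range on every round. A rough sketch of that arithmetic, assuming `ceil(log2(n))` as the estimate (git's own "roughly N steps" figure can differ slightly); `bisect_steps` and `bisect_hours` are hypothetical helper names, and the 7097 revisions and 12-minute build time come from comments 9 and 18.

```python
import math


def bisect_steps(revisions):
    """Rough number of kernels to build and test for a range of revisions."""
    if revisions < 1:
        raise ValueError("need at least one revision")
    return max(1, math.ceil(math.log2(revisions)))


def bisect_hours(revisions, minutes_per_build):
    """Build time only; rebooting and the suspend test come on top."""
    return bisect_steps(revisions) * minutes_per_build / 60


print(bisect_steps(7097))  # 13
print(bisect_steps(256))   # 8
```

By this estimate, 8 steps cover a range of up to 256 revisions, and the full v5.19..v6.0-rc1 range at 12 minutes per build costs a bit over two and a half hours of compile time.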
Comment 22 Frank Kruger 2022-10-11 20:05:25 UTC
This might be worth looking at: https://gitlab.freedesktop.org/drm/amd/-/issues/2164
Comment 23 neoe 2022-11-17 14:36:39 UTC
I found that this bug is fixed in v6.0.9 (no idea what happened).
