This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5
ClosedPublic

Authored by cfang on Apr 7 2022, 5:04 PM.

Details

Summary

In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads
to align the implicit kernel argument segment to 8 byte boundary. In this work, we simply force
this alignment through the first implicit argument.
In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used.

Diff Detail

Event Timeline

cfang created this revision.Apr 7 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2022, 5:04 PM
cfang requested review of this revision.Apr 7 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2022, 5:04 PM
Herald added a subscriber: wdng. · View Herald Transcript

Looks fine to me!

cfang added a comment.Apr 8 2022, 7:14 AM

[AMD Official Use Only]

Hi, Matt:
Do you have any suggestion about the patch ? I need the permission to integrate.

Thanks
Changpeng

Get Outlook for iOShttps://aka.ms/o0ukef


From: Brian Sumner via Phabricator <reviews@reviews.llvm.org>
Sent: Thursday, April 7, 2022 7:35:02 PM
To: Fang, Changpeng <Changpeng.Fang@amd.com>; Arsenault, Matthew <Matthew.Arsenault@amd.com>; Sumner, Brian <Brian.Sumner@amd.com>
Cc: Zhuravlyov, Konstantin <Konstantin.Zhuravlyov@amd.com>; jv356@scarletmail.rutgers.edu <jv356@scarletmail.rutgers.edu>; nhaehnle@gmail.com <nhaehnle@gmail.com>; Liu, Yaxun (Sam) <Yaxun.Liu@amd.com>; Stuttard, David <David.Stuttard@amd.com>; Tye, Tony <Tony.Tye@amd.com>; Kerbow, Austin <Austin.Kerbow@amd.com>; llvm-commits@lists.llvm.org <llvm-commits@lists.llvm.org>; jay.foad@gmail.com <jay.foad@gmail.com>; wei.ding2@amd.com <wei.ding2@amd.com>; mahesha.comp@gmail.com <mahesha.comp@gmail.com>; acmerzhangxipeng@163.com <acmerzhangxipeng@163.com>; Kumar N, Bhuvanendra <Bhuvanendra.KumarN@amd.com>; wangyihansh@outlook.com <wangyihansh@outlook.com>; yanliang.mu@intel.com <yanliang.mu@intel.com>; aeubanks@google.com <aeubanks@google.com>; dougpuob@gmail.com <dougpuob@gmail.com>; michael.hliao@gmail.com <michael.hliao@gmail.com>; Neubauer, Sebastian <Sebastian.Neubauer@amd.com>; Song, Ruiling <Ruiling.Song@amd.com>; Saleil, Baptiste <Baptiste.Saleil@amd.com>; Lambert, Jacob <Jacob.Lambert@amd.com>
Subject: [PATCH] D123346: AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5

[CAUTION: External Email]

b-sumner added a comment.

Looks fine to me!

CHANGES SINCE LAST ACTION

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freviews.llvm.org%2FD123346%2Fnew%2F&amp;data=04%7C01%7Cchangpeng.fang%40amd.com%7C9948ae9fa2004f0af0c008da190862bc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637849821660760975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=HGOsFLQqbL4gLMC8SA4TXspkWjTbp%2FvLUyKG70XLfcw%3D&amp;reserved=0

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freviews.llvm.org%2FD123346&amp;data=04%7C01%7Cchangpeng.fang%40amd.com%7C9948ae9fa2004f0af0c008da190862bc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637849821660760975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=y6wYTWANYcU0mrWObqnuBceXzqkTLJp1E4m730ypy%2FU%3D&amp;reserved=0

arsenm added inline comments.Apr 8 2022, 8:09 AM
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
991

It would be better to compute the actual alignment based on the known offset plus kernarg segment base alignment, if only to assert that it's at least 8

cfang updated this revision to Diff 421549.Apr 8 2022, 9:24 AM

Use alignTo to force the alignment for the implicit kernarg segment:
Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());

cfang marked an inline comment as done.Apr 8 2022, 9:26 AM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
991

We can use the alignTo to align the Offset at the very beginning. We don't need to worry about 8 bytes:
Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());

arsenm added inline comments.Apr 8 2022, 11:33 AM
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
804

This isn't the real alignment. You should assert the alignment you compute using the kernarg base alignment is at least getAlignmentForImplicitArgPtr

cfang marked an inline comment as done.Apr 8 2022, 2:05 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
804

I think this is to align Offset to ST.getAlignmentForImplicitArgPtr which is 8 bytes.
Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());

We do need this alignment to be in sync with the actual load of the implicit kernel argument.

arsenm added inline comments.Apr 8 2022, 2:15 PM
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
804

You do not want to use alignTo. You are not adjusting the alignment or offset here. You want to use commonAlignment(16, Offset), and assert this is >= ST.getAlignmentForImplicitArgPtr()

arsenm added inline comments.Apr 8 2022, 2:21 PM
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
804

OK, you are re-doing the same full layout recomputation in this case, so you do need to alignTo the implicitarg align

arsenm added a comment.Apr 8 2022, 2:43 PM

These test changes don't show a change in behavior

cfang added a comment.Apr 8 2022, 2:51 PM

These test changes don't show a change in behavior

Sorry, implicit-kernel-argument-alignment.ll should be a new addition of the file.
I will correct that. Thanks.

cfang updated this revision to Diff 421643.Apr 8 2022, 2:53 PM

Did correct "git diff" to include deleted and added files in the diff

Ping
Now that we agreed that we have to use alignTo to align the Offset to what the implicitarg_ptr requires
and the LIT tests have been updated to show the alignment related layout. Thanks.

arsenm accepted this revision.Apr 11 2022, 3:23 PM
This revision is now accepted and ready to land.Apr 11 2022, 3:23 PM
This revision was landed with ongoing or failed builds.Apr 11 2022, 4:13 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/hsa-metadata-queue-ptr-v5.ll