[SRU][J:linux-gcp/N:linux-gcp/Q:linux-gcp][PULL] Support larger gVNIC queue depth on Gen3+ GCE VMs
Ian Whitfield
ian.whitfield at canonical.com
Fri May 22 03:26:24 UTC 2026
BugLink: https://bugs.launchpad.net/bugs/2153950
[Impact]
Currently, the maximum queue depth supported for gVNIC on overcommitted Gen3+
VMs such as N4 is 1K, whereas the maximum queue depth supported on Gen1/Gen2 VMs
is 2K. Customers who are migrating their workloads from N2 to N4 have requested
higher queue depth support on N4 VMs.
[Fix]
Target patches:
a2f19184014f ("gve: Enable reading max ring size from the device in DQO-QPL mode")
07993df56091 ("gve: Update QPL page registration logic")
Questing (6.17):
No additional patches were needed; the target patches cherry-picked cleanly.
Noble (6.8):
Several additional patches were included to provide the features the target
patches require and to reduce the divergence from upstream. Details on the
backport are in each commit message. In general, later commits depend on
earlier ones. Fixes for prerequisite patches were applied after the target
commits. The patch series is:
7cea48b9a4b2 ("gve: Define config structs for queue allocation")
1dfc2e46117e ("gve: Refactor napi add and remove functions")
f13697cc7a19 ("gve: Switch to config-aware queue allocation")
92a6d7a4010c ("gve: Refactor gve_open and gve_close")
5f08cd3d6423 ("gve: Alloc before freeing when adjusting queues")
f3753771e7cc ("gve: Alloc before freeing when changing features")
0b43cf527d1d ("gve: Add header split device option")
5e37d8254e7f ("gve: Add header split data path")
056a70924a02 ("gve: Add header split ethtool stats")
4cbc70f6ec5e ("gve: simplify setting decriptor count defaults")
5dee3c702c20 ("gve: make the completion and buffer ring size equal for DQO")
b94d3703c1a6 ("gve: set page count for RX QPL for GQI and DQO queue formats")
ed4fb326947d ("gve: add support to read ring size ranges from the device")
834f9458f2fd ("gve: add support to change ring size via ethtool")
fdf412374379 ("gve: Remove qpl_cfg struct since qpl_ids map with queues respectively")
087b24de5c82 ("queue_api: define queue api")
dcecfcf21bd1 ("gve: Make the GQ RX free queue funcs idempotent")
242f30fe692e ("gve: Add adminq funcs to add/remove a single Rx queue")
5abc37bdcbc5 ("gve: Make gve_turn(up|down) ignore stopped queues")
864616d97a45 ("gve: Make gve_turnup work for nonempty queues")
9a5e0776d11f ("gve: Avoid rescheduling napi if on wrong cpu")
770f52d5a0ed ("gve: Reset Rx ring state in the ring-stop funcs")
af9bcf910b1f ("gve: Account for stopped queues when reading NIC stats")
ee24284e2a10 ("gve: Alloc and free QPLs with the rings")
c93462b914db ("gve: Implement queue api")
07993df56091 ("gve: Update QPL page registration logic")
a2f19184014f ("gve: Enable reading max ring size from the device in DQO-QPL mode")
fba917b169be ("gve: Fix use of netif_carrier_ok()")
de63ac44a527 ("gve: fix XDP allocation path in edge cases")
3d970eda0034 ("gve: defer interrupt enabling until NAPI registration")
Jammy (5.15):
Several additional patches were included to provide the features the target
patches require and to reduce the divergence from upstream. Details on the
backport are in each commit message. In general, later commits depend on
earlier ones. Fixes for prerequisite patches were applied after the target
commits. The patch series is:
7cea48b9a4b2 ("gve: Define config structs for queue allocation")
1dfc2e46117e ("gve: Refactor napi add and remove functions")
95535e37e895 ("gve: Do not fully free QPL pages on prefill errors")
f13697cc7a19 ("gve: Switch to config-aware queue allocation")
92a6d7a4010c ("gve: Refactor gve_open and gve_close")
5f08cd3d6423 ("gve: Alloc before freeing when adjusting queues")
f3753771e7cc ("gve: Alloc before freeing when changing features")
0b43cf527d1d ("gve: Add header split device option")
5e37d8254e7f ("gve: Add header split data path")
056a70924a02 ("gve: Add header split ethtool stats")
4cbc70f6ec5e ("gve: simplify setting decriptor count defaults")
5dee3c702c20 ("gve: make the completion and buffer ring size equal for DQO")
b94d3703c1a6 ("gve: set page count for RX QPL for GQI and DQO queue formats")
ed4fb326947d ("gve: add support to read ring size ranges from the device")
834f9458f2fd ("gve: add support to change ring size via ethtool")
fdf412374379 ("gve: Remove qpl_cfg struct since qpl_ids map with queues respectively")
c91c46de6bbc ("net: provide macros for commonly copied lockless queue stop/wake code")
087b24de5c82 ("queue_api: define queue api")
dcecfcf21bd1 ("gve: Make the GQ RX free queue funcs idempotent")
242f30fe692e ("gve: Add adminq funcs to add/remove a single Rx queue")
5abc37bdcbc5 ("gve: Make gve_turn(up|down) ignore stopped queues")
864616d97a45 ("gve: Make gve_turnup work for nonempty queues")
9a5e0776d11f ("gve: Avoid rescheduling napi if on wrong cpu")
770f52d5a0ed ("gve: Reset Rx ring state in the ring-stop funcs")
af9bcf910b1f ("gve: Account for stopped queues when reading NIC stats")
ee24284e2a10 ("gve: Alloc and free QPLs with the rings")
c93462b914db ("gve: Implement queue api")
07993df56091 ("gve: Update QPL page registration logic")
a2f19184014f ("gve: Enable reading max ring size from the device in DQO-QPL mode")
448f413a8bdc ("ethtool: add support to set/get tx copybreak buf size via ethtool")
0b70c256eba8 ("ethtool: add support to set/get rx buf len via ethtool")
7462494408cd ("ethtool: extend ringparam setting/getting API with rx_buf_len")
9690ae604290 ("ethtool: add header/data split indication")
3e4d5ba9a3f8 ("netlink: Add a macro to set policy message with format string")
1241e329ce2e ("ethtool: add support to set/get completion queue event size")
4dc84c06a343 ("net: ethtool: extend ringparam set/get APIs for tx_push")
bde292c07b48 ("net: ethtool: move checks before rtnl_lock() in ethnl_set_rings")
5b4e9a7a71ab ("net: ethtool: extend ringparam set/get APIs for rx_push")
233eb4e786b5 ("ethtool: Add support for configuring tx_push_buf_len")
50d73710715d ("ethtool: add SET for TCP_DATA_SPLIT ringparam")
51c352bdbcd2 ("netlink: add support for formatted extack messages")
894d7508316e ("net: netdev_queue: netdev_txq_completed_mb(): fix wake condition")
fba917b169be ("gve: Fix use of netif_carrier_ok()")
de63ac44a527 ("gve: fix XDP allocation path in edge cases")
3d970eda0034 ("gve: defer interrupt enabling until NAPI registration")
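To double-check that every upstream commit listed above is present in a
backport branch, something like the following sketch could be used. It
assumes the usual Ubuntu convention of a "(cherry picked from commit <sha>)"
or "(backported from commit <sha>)" line in each commit message. The function
name missing_upstream is hypothetical, and <base> in the usage comment is a
placeholder for the branch point, not a real tag.

```shell
#!/bin/sh
# Hypothetical helper: read commit messages on stdin and report every
# upstream SHA (given as arguments) that is not referenced by a
# "(cherry picked from commit ...)" or "(backported from commit ...)" line.
# Returns non-zero if any SHA is missing.
#
# Usage sketch (<base> is a placeholder):
#   git log --format=%B <base>..HEAD | missing_upstream 7cea48b9a4b2 1dfc2e46117e ...
missing_upstream() {
    log=$(cat)
    status=0
    for sha in "$@"; do
        # An abbreviated SHA matches the start of the full SHA in the trailer.
        if ! printf '%s\n' "$log" |
            grep -Eq "\((cherry picked|backported) from commit $sha"; then
            echo "missing: $sha"
            status=1
        fi
    done
    return $status
}
```

Matching on the trailer rather than on commit IDs is deliberate: cherry-picked
commits get new SHAs, so only the recorded upstream reference ties a backport
to its source commit.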
[Test Plan]
The following output indicates the feature is missing:
$ sudo ethtool -G ens3 rx 4096
netlink error: Operation not supported
The test plan for Canonical was therefore to confirm that
`sudo ethtool -G ens3 rx 4096` exits successfully without an error.
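The manual check above could be scripted roughly as follows. This is a
hypothetical sketch, not part of the submitted test plan: the function names
are made up, the interface defaults to ens3 as in the example, and the parsing
assumes the usual `ethtool -g` layout with a "Pre-set maximums:" section
followed by "Current hardware settings:".

```shell
#!/bin/sh
# Hypothetical helpers for the test plan. Assumes ethtool is installed and
# the gVNIC interface is named ens3 unless another name is passed.

# Print the RX value from the "Pre-set maximums:" section of
# `ethtool -g` output read on stdin (prints nothing if not found).
preset_max_rx() {
    awk '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && /^RX:/              { print $2; exit }
    '
}

# Pass/fail criterion from the test plan: `ethtool -G <iface> rx 4096`
# must exit successfully.
check_rx_4096() {
    iface=${1:-ens3}
    if sudo ethtool -G "$iface" rx 4096; then
        echo "PASS: $iface accepted rx 4096"
    else
        echo "FAIL: $iface rejected rx 4096" >&2
        return 1
    fi
}
```

For example, `ethtool -g ens3 | preset_max_rx` prints the maximum RX ring size
the driver reports, and `check_rx_4096 ens3` applies the same pass/fail
criterion as the manual test.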
Google also reviewed the backport and kernel build for each series and
approved them according to its own standards.
[Regression potential]
Questing has low potential for regression because only the two target patches
were required and they applied cleanly. Noble required a large patchset, but
only one patch (087b24de5c82, "queue_api: define queue api") affects code
outside the gve driver. On Jammy, 15 patches affect code outside gve, so the
regression potential is considerably higher than for the other series. On
Questing and Noble, regressions should only be possible when using the gVNIC
device for networking (which depends on the gve driver), and would most likely
appear as networking failures on those configurations. On Jammy, the most
probable regressions are still on gVNIC configurations, but other networking
code (for example ethtool ring parameter handling and netlink) could be
affected. The target patches are also only a few months old, so follow-on
fixes are possible.
[Other]
SF: 00433236 (for GCP variants)
PIT: 509371317 (for GKE variants)