[th/marvell-boot-live-iso] marvell: boot live ISO for accessing worker node #383
Merged
+181
−101
5 commits
c1bb932 host: honor timeout for ssh_connect() (thom311)
5fc672e marvell: log a message before running pxeboot command (thom311)
8e021cf coreosBuilder: fix build() to initialize git submodule (thom311)
6f1c9c1 marvell: boot live ISO for accessing worker node (thom311)
504f01d test/marvell: simular host unreachable for testing boot coreos (thom311)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,93 +1,160 @@ | ||
import os | ||
import shlex | ||
from clustersConfig import NodeConfig | ||
import typing | ||
from bmc import BMC | ||
from bmc import BmcConfig | ||
from clustersConfig import NodeConfig | ||
from clusterNode import ClusterNode | ||
import common | ||
import host | ||
|
||
|
||
def marvell_bmc_rsh(bmc: BmcConfig) -> host.Host: | ||
# For Marvell DPU, we require that our "BMC" is the host that has the DPU | ||
# plugged in. | ||
# | ||
# We also assume, that the user name is "core" and that we can SSH into | ||
# that host with public key authentication. We ignore the `bmc.user` | ||
# setting. The reason for that is so that dpu-operator's | ||
# "hack/cluster-config/config-dpu.yaml" (which should work with IPU and | ||
# Marvell DPU) does not need to specify different BMC user name and | ||
# passwords. If you solve how to express the BMC authentication in the | ||
# cluster config in a way that is suitable for IPU and Marvell DPU at the | ||
# same time (e.g. via Jinja2 templates), we can start honoring | ||
# bmc.user/bmc.password. | ||
rsh = host.RemoteHost(bmc.url) | ||
rsh.ssh_connect("core") | ||
return rsh | ||
|
||
|
||
def is_marvell(bmc: BmcConfig) -> bool: | ||
rsh = marvell_bmc_rsh(bmc) | ||
return "177d:b900" in rsh.run("lspci -nn -d :b900").out | ||
|
||
|
||
def _pxeboot_marvell_dpu(name: str, bmc: BmcConfig, mac: str, ip: str, iso: str) -> None: | ||
rsh = marvell_bmc_rsh(bmc) | ||
|
||
ip_addr = f"{ip}/24" | ||
ip_gateway = common.ip_to_gateway(ip, "255.255.255.0") | ||
|
||
# An empty entry means to use the host's "id_ed25519.pub". We want that. | ||
ssh_keys = [""] | ||
for _, pub_key_content, _ in common.iterate_ssh_keys(): | ||
ssh_keys.append(pub_key_content) | ||
|
||
ssh_key_options = [f"--ssh-key={shlex.quote(s)}" for s in ssh_keys] | ||
|
||
image = os.environ.get("CDA_MARVELL_TOOLS_IMAGE", "quay.io/sdaniele/marvell-tools:latest") | ||
|
||
r = rsh.run( | ||
"set -o pipefail ; " | ||
"sudo " | ||
"podman " | ||
"run " | ||
"--pull always " | ||
"--rm " | ||
"--replace " | ||
"--privileged " | ||
"--pid host " | ||
"--network host " | ||
"--user 0 " | ||
"--name marvell-tools " | ||
"-i " | ||
"-v /:/host " | ||
"-v /dev:/dev " | ||
f"{shlex.quote(image)} " | ||
"./pxeboot.py " | ||
f"--dpu-name={shlex.quote(name)} " | ||
"--host-mode=coreos " | ||
f"--nm-secondary-cloned-mac-address={shlex.quote(mac)} " | ||
f"--nm-secondary-ip-address={shlex.quote(ip_addr)} " | ||
f"--nm-secondary-ip-gateway={shlex.quote(ip_gateway)} " | ||
"--yum-repos=rhel-nightly " | ||
"--default-extra-packages " | ||
"--octep-cp-agent-service-disable " | ||
f"{' '.join(ssh_key_options)} " | ||
f"{shlex.quote(iso)} " | ||
"2>&1 " | ||
"| tee \"/tmp/pxeboot-log-$(date '+%Y%m%d-%H%M%S')\"" | ||
) | ||
if not r.success(): | ||
raise RuntimeError(f"Failed to pxeboot: {r}") | ||
|
||
|
||
def MarvellIsoBoot(node: NodeConfig, iso: str) -> None: | ||
assert node.ip is not None | ||
assert node.bmc is not None | ||
_pxeboot_marvell_dpu(node.name, node.bmc, node.mac, node.ip, iso) | ||
|
||
|
||
def main() -> None: | ||
pass | ||
|
||
|
||
if __name__ == "__main__": | ||
main() | ||
from logger import logger | ||
import coreosBuilder | ||
from nfs import NFS | ||
|
||
|
||
class MarvellBMC: | ||
def __init__( | ||
self, | ||
bmc: BmcConfig, | ||
*, | ||
bmc_host: typing.Optional[BmcConfig] = None, | ||
get_external_port: typing.Optional[typing.Callable[[], str]] = None, | ||
) -> None: | ||
assert (bmc_host is None) == (get_external_port is None) | ||
self.bmc = bmc | ||
self._bmc_host = bmc_host | ||
self._get_external_port = get_external_port | ||
|
||
def _ssh_to_bmc(self, *, boot_coreos: bool = True) -> typing.Optional[host.Host]: | ||
# For Marvell DPU, the "BMC" is the host where the DPU is plugged in. | ||
# | ||
# That host also has the serial console of the DPU connected to | ||
# /dev/ttyUSB[01] and "eno4" is (by default) switched together with the | ||
# primary interface enP2p3s0 on the DPU. This interface is also used | ||
# for pxeboot installation. See | ||
# https://github.com/wizhaoredhat/marvell-octeon-10-tools project. | ||
# | ||
# To access those interfaces, the host must be accessible via SSH. | ||
# This function returns a Host instance with SSH connected (usually | ||
# to the "core" user, use via sudo). | ||
# | ||
# If the host is not accessible, the function may first call _boot_coreos() | ||
# method, to boot a CoreOS Live image. For that, the host needs a separate | ||
# bmc_host (which is supposed to be a Redfish BMC of the host). | ||
rsh = host.RemoteHost(self.bmc.url) | ||
|
||
try: | ||
if boot_coreos: | ||
# FIXME: testing only. Drop this part. | ||
raise RuntimeError("TEST: for testing simulate host is unreachable and boot coreos") | ||
|
||
rsh.ssh_connect("core", timeout="2m") | ||
except Exception as e: | ||
logger.info(f"Cannot connect to core @ {self.bmc.url}: {e}") | ||
else: | ||
return rsh | ||
|
||
if self._bmc_host is None or not boot_coreos: | ||
# There is no fallback to boot a CoreOS Live ISO. | ||
return None | ||
|
||
self._boot_coreos() | ||
|
||
rsh = host.RemoteHost(self.bmc.url) | ||
rsh.ssh_connect("core", timeout="15m") | ||
return rsh | ||
|
||
def _boot_coreos(self) -> None: | ||
assert self._bmc_host | ||
assert self._get_external_port | ||
|
||
logger.info(f"For Marvell host {self.bmc.url} boot CoreOS Live via BMC {self._bmc_host.url}") | ||
|
||
coreosBuilder.ensure_fcos_exists() | ||
lh = host.LocalHost() | ||
nfs = NFS(lh, self._get_external_port()) | ||
iso_url = nfs.host_file("/root/iso/fedora-coreos.iso") | ||
|
||
bmc2 = BMC.from_bmc_config(self._bmc_host) | ||
bmc2.boot_iso_redfish(iso_url) | ||
|
||
def is_marvell(self) -> bool: | ||
rsh = self._ssh_to_bmc() | ||
if rsh is None: | ||
return False | ||
return "177d:b900" in rsh.run("lspci -nn -d :b900").out | ||
|
||
def pxeboot( | ||
self, | ||
name: str, | ||
mac: str, | ||
ip: str, | ||
iso: str, | ||
) -> None: | ||
rsh = self._ssh_to_bmc(boot_coreos=False) | ||
|
||
if rsh is None: | ||
raise RuntimeError(f"Cannot connect to {self.bmc.url} for pxeboot of Marvell DPU") | ||
|
||
ip_addr = f"{ip}/24" | ||
ip_gateway = common.ip_to_gateway(ip, "255.255.255.0") | ||
|
||
# An empty entry means to use the host's "id_ed25519.pub". We want that. | ||
ssh_keys = [""] | ||
for _, pub_key_content, _ in common.iterate_ssh_keys(): | ||
ssh_keys.append(pub_key_content) | ||
|
||
ssh_key_options = [f"--ssh-key={shlex.quote(s)}" for s in ssh_keys] | ||
|
||
image = os.environ.get("CDA_MARVELL_TOOLS_IMAGE", "quay.io/sdaniele/marvell-tools:latest") | ||
|
||
logger.info(f"run pxeboot for {self.bmc.url} to install {image}") | ||
|
||
r = rsh.run( | ||
"set -o pipefail ; " | ||
"sudo " | ||
"podman " | ||
"run " | ||
"--pull always " | ||
"--rm " | ||
"--replace " | ||
"--privileged " | ||
"--pid host " | ||
"--network host " | ||
"--user 0 " | ||
"--name marvell-tools " | ||
"-i " | ||
"-v /:/host " | ||
"-v /dev:/dev " | ||
f"{shlex.quote(image)} " | ||
"./pxeboot.py " | ||
f"--dpu-name={shlex.quote(name)} " | ||
"--host-mode=coreos " | ||
f"--nm-secondary-cloned-mac-address={shlex.quote(mac)} " | ||
f"--nm-secondary-ip-address={shlex.quote(ip_addr)} " | ||
f"--nm-secondary-ip-gateway={shlex.quote(ip_gateway)} " | ||
"--yum-repos=rhel-nightly " | ||
"--default-extra-packages " | ||
"--octep-cp-agent-service-disable " | ||
f"{' '.join(ssh_key_options)} " | ||
f"{shlex.quote(iso)} " | ||
"2>&1 " | ||
"| tee \"/tmp/pxeboot-log-$(date '+%Y%m%d-%H%M%S')\"" | ||
) | ||
if not r.success(): | ||
raise RuntimeError(f"Failed to pxeboot: {r}") | ||
|
||
|
||
class MarvellClusterNode(ClusterNode): | ||
def __init__(self, node: NodeConfig) -> None: | ||
assert node.ip is not None | ||
assert node.bmc is not None | ||
self._name = node.name | ||
self._ip = node.ip | ||
self._mac = node.mac | ||
self._bmc = node.bmc | ||
|
||
def start(self, install_iso: str) -> bool: | ||
bmc = MarvellBMC(self._bmc) | ||
bmc.pxeboot(self._name, self._mac, self._ip, install_iso) | ||
return True |
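The connect-or-fall-back flow of `_ssh_to_bmc()` in the diff above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's code: `connect_with_fallback`, `connect`, and `boot_live_iso` are hypothetical stand-ins for the method itself, `RemoteHost.ssh_connect()`, and `MarvellBMC._boot_coreos()`. Only the timeout strings "2m" and "15m" are taken from the diff.

```python
import typing


def connect_with_fallback(
    connect: typing.Callable[[str], object],
    boot_live_iso: typing.Optional[typing.Callable[[], None]],
    *,
    allow_boot: bool = True,
) -> typing.Optional[object]:
    """Try a short connect; on failure optionally boot a live ISO and retry.

    `connect(timeout)` (hypothetical) returns a connected handle or raises.
    `boot_live_iso` (hypothetical) is None when no fallback BMC is configured.
    """
    try:
        # First attempt: the host may already be up, so keep the timeout short.
        return connect("2m")
    except Exception as e:
        print(f"cannot connect: {e}")

    if boot_live_iso is None or not allow_boot:
        # There is no way to bring the host up; the caller decides what to do.
        return None

    boot_live_iso()
    # Booting a live image takes a while, so wait considerably longer.
    return connect("15m")


# Simulated host that only answers after the live ISO was booted.
state: typing.Dict[str, typing.Any] = {"booted": False, "calls": []}


def fake_connect(timeout: str) -> str:
    state["calls"].append(timeout)
    if not state["booted"]:
        raise ConnectionError("host unreachable")
    return "connected"


def fake_boot() -> None:
    state["booted"] = True


def always_fail(timeout: str) -> str:
    raise ConnectionError("down")


result = connect_with_fallback(fake_connect, fake_boot)
no_fallback = connect_with_fallback(always_fail, None)
```

Returning `None` instead of raising when no fallback is available mirrors how the diff lets `is_marvell()` answer `False` while `pxeboot()` turns the same condition into an error.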
The first thing that we do is boot the host. The patch is good, but the comment isn't entirely correct. Can you push this separately?
dpu_vendor.is_ipu() uses SSH (with a timeout) to detect whether a host has an IPU. The ping check thwarts that. The statement seems correct.
Of course, maybe some earlier layers of the code try to boot the host first. But dpu_vendor.is_ipu() is a reasonably high-level function, so the statement can be applied to it (and be correct).
And as to whether we really "first [...] boot the host", that does not seem to be the case currently (see #382 (comment)). Maybe, but I have doubts...
The patch is in this PR because, without it, the PR does not seem testable. Well, due to #382 (comment), it is probably still not testable, so the PR probably has more issues. As said, I would recommend reverting #382 (for now). In any case, the PR contains patches that are necessary to (maybe) get the test passing.
Anyway, I will reword the commit message...
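The discussion above is about an SSH connect that keeps trying until a timeout expires. A deadline-bounded retry of that kind might look like the sketch below. All names here (`retry_until`, `attempt`, `clock`, `sleep`) are illustrative assumptions and are not functions from this repository; how `ssh_connect()` actually honors its timeout is not shown on this page.

```python
import time
import typing


def retry_until(
    attempt: typing.Callable[[], bool],
    *,
    timeout: float,
    interval: float = 1.0,
    clock: typing.Callable[[], float] = time.monotonic,
    sleep: typing.Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if attempt():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))


# Drive the loop with a fake clock so the example runs instantly.
fake_now = [0.0]


def fake_clock() -> float:
    return fake_now[0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


attempts: typing.List[float] = []


def host_up_on_third_try() -> bool:
    attempts.append(fake_now[0])
    return len(attempts) >= 3


ok = retry_until(host_up_on_third_try, timeout=10, clock=fake_clock, sleep=fake_sleep)
never = retry_until(lambda: False, timeout=3, clock=fake_clock, sleep=fake_sleep)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. In real use, the defaults `time.monotonic` and `time.sleep` apply.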