PXE install with multi-boot, btrfs & Libreboot support

Some things are specific to my home network and use files with secrets
that are not in this repo. I use this for bare metal and vms. Two of the
scripts can run post boot, so I also use them on images distributed by
vps providers.

Features people may find useful: installs encrypted trisquel, debian,
ubuntu, arch, and parabola (the archlike install is likely broken; I've
only done pxe boots recently) in a multi-boot setup using multiple
subvolumes of a single btrfs filesystem. It utilizes multiple disks, with
scripts to automatically decrypt on intentional reboots, but not after
shutdown or power loss.

The normal install mode for fai uses pxe, but a libreboot system has no
pxe: the pxe in a normal computer is nonfree firmware. Alternatives to
normal pxe that I've tried:

* libreboot + seabios + ipxe

* Use a live cd to call pxe-kexec; this is described later in this file.

* Use the fai autodiscover iso. This is more automated, so nicer.

* Use one of the install methods above to set up a gnu/linux disk
  partition that coordinates with libreboot grub to act like a pxe boot
  using kexec. The boot process takes a bit longer than normal pxe. This
  is the bootstrap partition in my scripts.

Things I haven't tried:

* The bios chip has enough room for an initrd. This could be set up to
  work like the partition I use to kexec, but it would be faster and
  would not require installing to disk.

The partitioning and filesystem script is at
fai/config/hooks/partition.DEFAULT. Disks are grouped as ssd or hdd and
raided as raid 1 or raid 0, per configuration. The base partitions are
divided into boot, swap, and root (only boot is unencrypted). There are
scripts to resize those partitions post-provision and while the system
is running.
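One common way to tell ssd from hdd on Linux is the rotational flag the kernel exposes in sysfs. The sketch below shows that approach. It is an assumption about how such grouping can be done, not a copy of what partition.DEFAULT does. `classify_disks` is a hypothetical name, and its optional argument (an alternative sysfs root) exists only so it can be tried against a fake directory tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "NAME ssd" or "NAME hdd" for each whole disk.
# Assumption: /sys/block/NAME/queue/rotational is 0 for ssds, 1 for hdds.
classify_disks() {
  local sysfs=${1:-/sys/block} dev name rot
  for dev in "$sysfs"/*; do
    name=${dev##*/}
    case $name in
      loop*|ram*|sr*|zram*|dm-*|md*|nbd*) continue ;;  # skip virtual/stacked devices
    esac
    [[ -r $dev/queue/rotational ]] || continue
    read -r rot < "$dev/queue/rotational"
    if [[ $rot == 0 ]]; then
      echo "$name ssd"
    else
      echo "$name hdd"
    fi
  done
}
```

Running `classify_disks` with no argument reads /sys/block on the current machine.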

People who use fai may find these things useful as examples. It uses
dnsmasq (on an openwrt machine) for dhcp instead of isc dhcp.
fai-wrapper is a small script to use basic fai classes outside of
fai. The partitioning script does not use the fai partitioning tool,
but it is inspired by it and works outside of fai. The project also
supports running a fai server on debian within android via Maru.

It also automates configuration of an openwrt router after manual
initial installation.

After provisioning is done, I sync files using btrfs (or unison for
vps), then automate further setup using a different set of scripts:
https://iankelling.org/git/?p=distro-setup;a=tree.

My network is a wndr3700v2 router with openwrt on it and a few pcs/laptops.

Since fai requires a debian server as the fai server, there are also
scripts to automate a debian install using pxe and preseeding, which can
be done from any distro.

Some of the scripts depend on simple, obvious utility scripts from
https://iankelling.org/git, and of course some hostnames are specific
to my network.


# Per-host/install configuration

Before doing a fai install, you will need to populate a class file. I
use one called 51-multi-boot; you can see an example in
fai/config/class/50-host-classes.

You will also need to populate /q/root/luks and /q/root/shadow; see
where they are referenced in the scripts. You might also want to copy the
existing /etc/ssh/*host* files to
/p/c/machine_specific/HOST/filesystem/etc/ssh.

host-* luks keyfiles are generated like this:

    h=demohost; head -c 2048 /dev/urandom | od | se dd of=/q/root/luks/host-$h
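Below is a minimal sketch that wraps the one-liner above to handle several hosts at once. `gen_luks_keyfiles` and the `LUKS_DIR` override are hypothetical. Instead of the `se dd` alias, it writes with a plain redirect under umask 077, so run it as a user that can write to /q/root/luks. It skips any host that already has a keyfile, because overwriting one would leave a file that no longer unlocks that host's existing disks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create host-NAME keyfiles the same way as the
# one-liner above (2048 random bytes passed through od), one per host.
# Existing keyfiles are never overwritten.
gen_luks_keyfiles() {
  local dir=${LUKS_DIR:-/q/root/luks} h f
  mkdir -p "$dir" || return 1
  for h in "$@"; do
    f=$dir/host-$h
    if [[ -e $f ]]; then
      echo "skipping $f: already exists" >&2
      continue
    fi
    # umask in a subshell so the keyfile is readable by its owner only
    ( umask 077; head -c 2048 /dev/urandom | od > "$f" )
  done
}
```

For example, `gen_luks_keyfiles demohost otherhost`.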

The luks key to use is configured in
fai/config/hooks/partition.DEFAULT.

Which shadow file to use (if any) is configured in
fai/config/distro-install-common/end. Which shadow file / luks file(s)
get copied into the new machine depends on the fai-redep arguments.

Also, set up dns in bind and wrt-setup-local.

After install, run btrbk to set up data, then distro-begin && distro-end.
See notes in distro-begin for other configuration.

# Scripts (meant to be used directly):


# Setup the environment for the install

    # create tiny autodiscover cd
    # todo: with fai-revm at least, this complains about missing vmlinuz. need to fix this.
    fai-redep && sudo fai-cd -g $PWD/grub.cfg.autodiscover -f -A $BASEFILE_DIR/autodiscover.iso
    # create normal fai cd (replace TARGET_HOSTNAME)
    fai-redep -t TARGET_HOSTNAME && sudo fai-cd -M -g $PWD/grub.cfg.netinst-noreboot -f $BASEFILE_DIR/netinst.iso
    # note: you may need to set the hostname, depending on config,
    # and some other things for an environment not on your lan,
    # for example see fai/config/class/LINODE.var. See linode notes below.

    mymk-basefile        # Create basefiles for various distros
    archlike-pxe         # Setup pxe boot server from an archlike base image
    fai-redep            # Deploy fai configuration to host "faiserver"
    faiserver-uninstall  # uninstall fai-server
    faiserver-setup      # install fai-server on the current machine
    myfai-chboot         # setup fai tftp and nfs. useful for doing pxe-kexec
    pxe-server           # disable/enable pxe dhcp, tftp, and nfs. calls myfai-chboot
    wrt-setup            # setup my router in general: dhcp, dns, etc.


# Scripts to do a distro install

    faiserver-revm    # using pxe & preseed, create a vm which is a fai server
    dsfull            # install & post-install a new fai distro
    arch-init-remote  # install arch after it's been booted into its setup env
    live-kexec        # Kexec this or a remote machine using host faiserver. also
                      # useful to run as curl live-kexec|bash


# Test scripts

    arch-revm  # test arch install on a fresh vm
    fai-revm   # test fai install on a fresh vm


# Scripts to call after a distro install for various reasons

    chboot             # Set grub to boot into a different distro (installed earlier)
    install-chboot     # reinstall chboot to /boot subvols, for chboot updates.
    eboot              # reboot without automatic disk decryption
    fai-wrapper        # use fai classes outside of fai. sourced, not called.
    faiserver-disable  # Disable the fai nfs server exports
    fresize            # resize swap or boot partitions in a host


# NAT/forward/vpn tftp

I tried to get this working, but failed.

In theory, a tftp server can be forwarded over a vpn, eg on a wireguard tunnel.

However, I found that it wouldn't work when actually pxe booting. Only
the 1st filename would be requested, eg, in the logs:

    Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0


To get that far, NATing tftp requires some special attention in iptables, like so
(see https://unix.stackexchange.com/questions/579508/iptables-rules-to-forward-tftp-via-nat):

    iptables -t raw -A PREROUTING -p udp --dport 69 -s 209.51.188.0/24 -j CT --helper tftp
    modprobe nf_nat_tftp
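The two commands above can be wrapped so that rerunning them does not stack duplicate rules. Below is a sketch of a hypothetical `tftp_nat_helper`. The `DRY_RUN` variable is an assumption, added so the commands can be inspected without root. Before appending the rule, it uses `iptables -C` to check whether the rule is already present.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the raw-table tftp helper rule shown above.
# Usage: tftp_nat_helper SOURCE_CIDR   (DRY_RUN=1 only prints the commands)
tftp_nat_helper() {
  if [[ -z ${1:-} ]]; then
    echo "usage: tftp_nat_helper SOURCE_CIDR" >&2
    return 1
  fi
  local rule=(PREROUTING -p udp --dport 69 -s "$1" -j CT --helper tftp)
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "modprobe nf_nat_tftp"
    echo "iptables -t raw -A ${rule[*]}"
    return 0
  fi
  modprobe nf_nat_tftp || return 1
  # -C exits non-zero when the rule is absent, so -A only runs once
  iptables -t raw -C "${rule[@]}" 2>/dev/null ||
    iptables -t raw -A "${rule[@]}"
}
```

For example, `DRY_RUN=1 tftp_nat_helper 209.51.188.0/24` prints the two commands, and the same call without `DRY_RUN` runs them as root.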

To test tftp from a client machine:

    tftp SERVER_IP -c get pxelinux.0
    rm -fv pxelinux.0


# Common problems

## kernel mismatch very early error, no remote logs:

    ERROR: the running kernel does not match the kernel modules inside the nfsroot.
    ERROR: Kernel modules directory /lib/modules/5.10.0-8-amd not available. Only found /lib/modules/5.10.0-15-amd64

Solution: if running from fai-cd, recreate the autodiscover cd as noted above in the setup section.
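To catch this before booting, you can compare the kernel version the client will boot with the module directories present in the nfsroot. Below is a minimal sketch. `check_nfsroot_kernel` is hypothetical, and its default nfsroot path /srv/fai/nfsroot is taken from the nfs logs later in this file. You have to supply the kernel version yourself: the one on the fai-cd, or the vmlinuz-* served over tftp.

```bash
#!/usr/bin/env bash
# Hypothetical check: does NFSROOT contain /lib/modules/KVER?
# Usage: check_nfsroot_kernel KVER [NFSROOT]
check_nfsroot_kernel() {
  local kver=$1 nfsroot=${2:-/srv/fai/nfsroot} d found=()
  # collect the module directories that do exist, for the error message
  for d in "$nfsroot"/lib/modules/*/; do
    [[ -d $d ]] && found+=("$(basename "$d")")
  done
  if [[ -n $kver && -d $nfsroot/lib/modules/$kver ]]; then
    echo "ok: $kver"
    return 0
  fi
  echo "mismatch: want $kver, nfsroot has: ${found[*]:-none}" >&2
  return 1
}
```

For example, `check_nfsroot_kernel 5.10.0-15-amd64`.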

# What good logs look like:

To log nfs traffic on the server:

    s rpcdebug -m nfsd -s all


Normal nfs mount & umount logs look like:

    journalctl -ef | gr nfs

    Jun 20 22:15:36 kd rpc.mountd[2025725]: authenticated mount request from 10.32.2.1:865 for /srv/fai/nfsroot (/srv/fai/nfsroot)
    Jun 20 22:15:36 kd kernel: nfsd: exp_rootfh(/srv/fai/nfsroot [00000000e8c53e54] *:dm-0/5521225)
    Jun 20 22:15:36 kd kernel: nfsd: fh_compose(exp 00:1b/5521225 fai/nfsroot, ino=5521225)
    Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
    Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
    Jun 20 22:15:36 kd kernel: nfsd: PATHCONF(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
    Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
    Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
    Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
    Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
    Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
    Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
    Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
    Jun 20 22:15:45 kd rpc.mountd[2025725]: authenticated unmount request from 10.32.2.1:986 for /srv/fai/nfsroot (/srv/fai/nfsroot)
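When comparing against these logs, it can help to reduce the rpc.mountd lines to one summary line per client and export. Below is a minimal awk sketch. `nfs_mount_summary` is hypothetical, and it only understands the "authenticated mount/unmount request" message format shown above.

```bash
#!/usr/bin/env bash
# Hypothetical summary of rpc.mountd lines read on stdin, e.g.
#   journalctl -b | nfs_mount_summary
# Prints "CLIENT EXPORT mounts=N unmounts=M", in order of first appearance.
nfs_mount_summary() {
  awk '
    / authenticated (mount|unmount) request from / {
      client = ""; path = ""; kind = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "mount" || $i == "unmount") kind = $i
        if ($i == "from") { client = $(i + 1); sub(/:[0-9]+$/, "", client) }  # drop the port
        if ($i == "for") path = $(i + 1)
      }
      key = client " " path
      if (!(key in seen)) { seen[key] = 1; order[++n] = key }
      count[key, kind]++
    }
    END {
      for (j = 1; j <= n; j++)
        printf "%s mounts=%d unmounts=%d\n", order[j], count[order[j], "mount"], count[order[j], "unmount"]
    }'
}
```

For the log above, this prints a single line: `10.32.2.1 /srv/fai/nfsroot mounts=1 unmounts=1`.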

Normal tftpd logs, after setting -vv in TFTP_OPTIONS in /etc/default/tftpd-hpa:

    journalctl -u tftpd-hpa

    Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
    Jun 20 23:51:02 kd in.tftpd[4021351]: RRQ from 10.2.0.12 filename ldlinux.c32
    Jun 20 23:51:02 kd in.tftpd[4021352]: RRQ from 10.2.0.12 filename pxelinux.cfg/a913a477-fca6-234d-a928-6bb011decd05
    Jun 20 23:51:02 kd in.tftpd[4021352]: sending NAK (1, File not found) to 10.2.0.12
    Jun 20 23:51:02 kd in.tftpd[4021353]: RRQ from 10.2.0.12 filename pxelinux.cfg/01-52-54-00-9c-ef-ad
    Jun 20 23:51:02 kd in.tftpd[4021353]: sending NAK (1, File not found) to 10.2.0.12
    Jun 20 23:51:02 kd in.tftpd[4021354]: RRQ from 10.2.0.12 filename pxelinux.cfg/0A02000C
    Jun 20 23:51:02 kd in.tftpd[4021355]: RRQ from 10.2.0.12 filename vmlinuz-5.10.0-15-amd64
    Jun 20 23:51:03 kd in.tftpd[4021356]: RRQ from 10.2.0.12 filename initrd.img-5.10.0-15-amd64
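The NAKs for pxelinux.cfg/ entries above are expected. pxelinux looks for a config file named after the client UUID, then 01- plus the MAC address, then the IP address in hex (0A02000C is 10.2.0.12). In the NAT case described earlier, only the first request appeared. Below is a hypothetical awk sketch, `tftp_requests`, that lists each request along with whether in.tftpd logged a NAK for it, which makes a stalled boot easy to spot. It assumes pids are unique within the input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn in.tftpd journal lines (stdin) into one
# "CLIENT FILE STATUS" line per read request, in request order.
# STATUS is "nak" if that in.tftpd process logged a NAK, else "ok"
# ("ok" only means no NAK was logged, not that the transfer finished).
# Assumes each pid appears once in the input.
tftp_requests() {
  awk '
    match($0, /in\.tftpd\[[0-9]+\]/) {
      # the pid sits between "in.tftpd[" (9 chars) and "]"
      pid = substr($0, RSTART + 9, RLENGTH - 10)
      if ($0 ~ / RRQ from /) {
        client = ""; file = ""
        for (i = 1; i <= NF; i++) {
          if ($i == "from") client = $(i + 1)
          if ($i == "filename") file = $(i + 1)
        }
        order[++n] = pid
        req[pid] = client " " file
        status[pid] = "ok"
      } else if ($0 ~ /sending NAK/) {
        status[pid] = "nak"
      }
    }
    END { for (j = 1; j <= n; j++) print req[order[j]], status[order[j]] }'
}
```

For example, `journalctl -u tftpd-hpa | tftp_requests`.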



# Replacing a raid 10 disk

    pxe-server -S HOST fai

    # btrfs replace or delete. prefer replace. to set up partitions on the replacement drive:
    scp fai-wrapper HOST:
    ssh root@HOST
    . fai-wrapper
    export SPECIAL_DISK=/dev/REPLACEMENT_DEV
    /var/lib/fai/config/hooks/partition.DEFAULT


    ssh root@HOST
    for x in /target/* /target; do umount $x; done
    cat >p
    PASSWORD HERE(ctrl-d ctrl-d)
    cd /dev/disk/by-id/
    for d in ata*part1; do cryptsetup luksOpen -d /root/p $d crypt_dev_$d; done
    x=(/dev/mapper/*part1); mount -o subvol=root_trisquelflidas $x /mnt
    # btrfs fi show /mnt
    # btrfs replace start -f /dev/mapper/OLD_DEV /dev/mapper/NEW_DEV /mnt
    # btrfs replace status /mnt
    # nohup btrfs dev delete /dev/sde1 /mnt
    mount -o subvol=boot_trisquelflidas /dev/sda3 /mnt/boot
    # also replace or delete the disk for boot
    for x in dev proc sys; do mount -o bind /$x /mnt/$x; done
    chroot /mnt /bin/bash
    # replace disk in fstab
    # replace disk in /etc/crypttab
    update-grub
    update-initramfs -u
    mount /a
    /a/exe/keyscript-on
    exit
    reboot
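For the "replace disk in fstab / crypttab" steps inside the chroot, a literal search and replace avoids having to escape the disk identifier for sed. Below is a minimal sketch. `replace_disk_id` is hypothetical, and it assumes the files refer to the old disk by a single identifier string (a /dev/disk/by-id name or a UUID) that appears nowhere else. It keeps a .bak copy of each file it changes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: literally replace every occurrence of OLD with NEW
# in each FILE, keeping FILE.bak. No regex or glob interpretation of OLD.
# Usage: replace_disk_id OLD NEW FILE...
replace_disk_id() {
  local old=$1 new=$2 f content
  shift 2
  for f in "$@"; do
    content=$(<"$f") || return 1
    if [[ $content != *"$old"* ]]; then
      echo "$f: $old not found, left unchanged" >&2
      continue
    fi
    cp -p -- "$f" "$f.bak"
    # quoted pattern => literal match
    content=${content//"$old"/"$new"}
    printf '%s\n' "$content" > "$f"
  done
}
```

For example, inside the chroot: `replace_disk_id ata-OLD_SERIAL-part1 ata-NEW_SERIAL-part1 /etc/fstab /etc/crypttab`.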


# Expected output in fai logs


## On focal:

    fai.log:updatebase.UBUNTU FAILED with exit code 1.

The real error is from dpkg-reconfigure locales. It seems to be related
to a workaround for < 20.04; the relevant comment, in
config/hooks/instsoft.DEBIAN, is:

    # in case the locales are already included inside the base file (Ubuntu)


## For flidas:

This error happens when installing systemd. Based on reading the post
install script, it's a superfluous upstream bug:

    addgroup: The group `systemd-journal' already exists as a system group. Exiting.
    Operation failed: No such file or directory

## On nabia/newer:

The python package was removed (it's now python3), and it's easier to
just let the package get removed than to do host class package config:

    fai.log:WARNING: These unknown packages are removed from the installation list: python python-minimal

Similarly, linux-image-amd64 is the debian package name for the kernel,
and linux-image-generic is the ubuntu one. The DEBIAN class is defined
on ubuntu, and it's easier to just let the package get removed with this
warning:

    fai.log:WARNING: These unknown packages are removed from the installation list: linux-image-amd64

Also, cryptsetup-initramfs is new in buster/nabia, so it gets removed on
earlier versions.

## parted error

    fai.log:Error: /dev/vda: unrecognised disk label

This is from parted -m $d unit MiB print. It happens when there are no
partitions yet.
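Given the expected messages above, one quick way to review a fai.log is to filter them out and look at what remains. Below is a hypothetical sketch, `fai_unexpected`. It hides every "unknown packages" warning, not only the python, linux-image-amd64 and cryptsetup-initramfs cases described above. It also hides the generic "Operation failed" line.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: print error/warning/failed lines from the
# given fai.log file(s), minus the expected messages listed above.
fai_unexpected() {
  local known=(
    'updatebase.UBUNTU FAILED with exit code 1'
    'These unknown packages are removed from the installation list'
    'unrecognised disk label'
    "The group \`systemd-journal' already exists"
    'Operation failed: No such file or directory'
  )
  # -F: patterns are fixed strings; -v: drop lines matching any of them
  grep -Ei 'error|warning|failed' "$@" |
    grep -vF -f <(printf '%s\n' "${known[@]}")
}
```

For example, `fai_unexpected path/to/fai.log`.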


# Notes on creating a lan with just 2 computers


The following assumes eth0 is the ethernet device used to connect to the target computer.


    # This is not strictly needed. I had my connection die at some point,
    # and I suspected this might help. Based on
    # https://support.qacafe.com/knowledge-base/how-do-i-prevent-network-manager-from-controlling-an-interface/
    cat > /etc/NetworkManager/conf.d/99-fai-tmp.conf <<'EOF'
    [main]
    plugins=keyfile

    [keyfile]
    unmanaged-devices=interface-name:eth0
    EOF
    ser restart NetworkManager


    cat >> /etc/network/interfaces <<'EOF'
    iface eth0 inet static
    address 10.0.44.1/24
    EOF

    ifup eth0

    # note: turn off the fsf vpn, so the route to coresite is the normal route.
    echo 1 > /proc/sys/net/ipv4/ip_forward
    m s iptables -t nat -A POSTROUTING -o $(ip -4 route get 8.8.8.8 | sed -nr 's,^.* dev\s+(\S+).*,\1,p') -j MASQUERADE
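The interface lookup inside that iptables command can be pulled out into a function, so the MASQUERADE rule is only added when an outgoing interface was actually found. If the lookup comes back empty, the unquoted substitution disappears and iptables gets a malformed command. `uplink_iface` is a hypothetical name, and it uses the same GNU sed expression as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ip -4 route get ADDR` output on stdin and print
# the outgoing interface, using the same GNU sed expression as above.
uplink_iface() {
  local dev
  dev=$(sed -nr 's,^.* dev\s+(\S+).*,\1,p' | head -n 1)
  if [[ -z $dev ]]; then
    echo "uplink_iface: no outgoing interface found" >&2
    return 1
  fi
  printf '%s\n' "$dev"
}

# Usage (as root):
#   ifc=$(ip -4 route get 8.8.8.8 | uplink_iface) &&
#     iptables -t nat -A POSTROUTING -o "$ifc" -j MASQUERADE
```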


Change /p/c/machine_specific/vps/bind-initial/db.b8.nz:

    faiserver 10.0.44.1
    TARGET_HOSTNAME 10.0.44.2

Install and configure the dhcp server:

    apt install isc-dhcp-server

    cat >> /etc/default/isc-dhcp-server <<'EOF'
    INTERFACESv4="eth0"
    EOF

Edit ./dhcpd.conf to change the mac address and target host name, then:

    s cp /b/fai/dhcpd.conf /etc/dhcp/
    ser restart isc-dhcp-server
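The repo's dhcpd.conf is not reproduced here. To give a rough idea of what such a file needs for this 2-computer setup, below is a hypothetical generator, `gen_dhcpd_conf`, for a minimal ISC dhcpd config. The server and target addresses match the ones above. The boot filename pxelinux.0 is taken from the tftpd logs earlier in this file. DNS options are deliberately left out. It is a sketch, not the file in /b/fai.

```bash
#!/usr/bin/env bash
# Hypothetical generator for a minimal isc-dhcp-server config for the
# 10.0.44.0/24 link above. Usage: gen_dhcpd_conf MAC HOSTNAME
gen_dhcpd_conf() {
  local mac=$1 host=$2
  local re='^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$'
  if [[ ! $mac =~ $re || -z $host ]]; then
    echo "usage: gen_dhcpd_conf MAC HOSTNAME" >&2
    return 1
  fi
  cat <<EOF
subnet 10.0.44.0 netmask 255.255.255.0 {
  option routers 10.0.44.1;
  next-server 10.0.44.1;     # tftp server
  filename "pxelinux.0";     # first file the pxe client requests
}

host $host {
  hardware ethernet $mac;
  fixed-address 10.0.44.2;
  option host-name "$host";
}
EOF
}
```

For example, `gen_dhcpd_conf 52:54:00:9c:ef:ad TARGET_HOSTNAME`. That MAC is the one in the tftpd logs.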

Edit /a/bin/fai/fai/config/class/51-multi-boot, and run:

    pxe-server -d TARGET fai

Then do a pxe boot on the target host.



# Linode notes

* Create 2 disks: installer (3000 mb, raw) and boot (remaining, raw).
* Create 2 profiles with direct boot, no helpers:
  * installer (sda=boot, sdb=installer, boot dev=sdb)
  * boot (sda=boot)
* Boot into rescue mode, ssh in with lish, and run:

      curl url_to_some_fai_cd_created_image | dd of=/dev/sda
      poweroff

* Boot into the installer profile.
* Lish shows the console. At the end of the install, it gives a prompt
  because logs failed to save remotely. Check the logs, then reboot into
  the boot profile if all is well. If that doesn't happen, turn off
  lassie in settings.



# Ubuntu notes

Someone really needed ubuntu on host tp (otherwise they would have ended
up on a non-gnu os), and I didn't want to figure out how to get all the
default software installed, so I did the following:

    # On remote host:
    # install etiona
    cd /b/fai
    # set 51-multi-boot to set classes outside of the fai-wrapper conditional, including NOWIPE
    . fai-wrapper
    ./fai/config/hooks/partition.DEFAULT

    # on remote host
    # install ubuntu 20.04 using virt-install; the disk image must exist first
    sudo -i
    qemu-img create -o preallocation=metadata -f qcow2 u2004.qcow2 15G
    virt-install --os-variant=ubuntu16.04 --cdrom ubuntu-20.04-desktop-amd64.iso --disk path=u2004.qcow2 -r 2048 --vcpus 1 -n u2004
    # alternatively, I also tried a physical install, because I know the virtual install ends up
    # with some different things, like some spice service. Then I pulled the data out with:
    rsync -ahSAX --numeric-ids --exclude=proc --exclude=sys --exclude=dev --exclude=tmp --exclude=run root@tp:/ .; mkdir proc sys dev tmp

    # of each bionic/focal pair below, run only the line for the release being installed
    modprobe nbd
    qemu-nbd --connect=/dev/nbd0 u1804.qcow2 -f qcow2  # bionic
    qemu-nbd --connect=/dev/nbd0 u2004.qcow2 -f qcow2  # focal
    mount /dev/nbd0p1 /mnt/1  # bionic
    mount /dev/nbd0p5 /mnt/1  # focal
    mount -o bind /mnt/root/root_ubuntubionic /mnt/2  # bionic
    mount -o bind /mnt/root/root_ubuntufocal /mnt/2   # focal
    mkdir -p /mnt/2/boot
    mount -o bind /mnt/boot/boot_ubuntubionic /mnt/2/boot  # bionic
    mount -o bind /mnt/boot/boot_ubuntufocal /mnt/2/boot   # focal
    # S = sparse, A = acls, X = xattrs
    rsync -ahSAX --numeric-ids /mnt/1/ /mnt/2

    cd /mnt/2
    cp /tmp/fai/crypttab etc
    sed -i "s#/root/keyscript,#decrypt_keyctl,#" etc/crypttab
    cp /tmp/fai/fstab etc
    echo "tmpfs /tmp tmpfs nodev,nosuid,size=50%,mode=1777 0 0" >> etc/fstab
    chrbind
    chroot .
    mv /etc/resolv.conf /etc/resolv.conf.old
    echo nameserver 1.1.1.1 >/etc/resolv.conf
    # install programs from /a/bin/fai/fai/config/package_config/STANDARD:
    apt install -y openssh-client openssh-server cryptsetup keyutils btrfs-progs console-setup kbd pciutils usbutils unattended-upgrades initramfs-tools-core dropbear-initramfs
    mv /etc/resolv.conf.old /etc/resolv.conf
    exit
    d=etc/initramfs-tools
    mkdir -p $d/root/.ssh etc/dropbear-initramfs root/.ssh
    chmod 700 $d/root $d/root/.ssh root/.ssh
    cp -p /root/.ssh/authorized_keys $d/root/.ssh/authorized_keys
    cp -p /root/.ssh/authorized_keys etc/dropbear-initramfs
    cp -p /root/.ssh/authorized_keys root/.ssh/authorized_keys
    chroot .
    sed -ri 's/^ *GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.crypttab=no"/' /etc/default/grub
    grub-install --no-floppy $(grub-probe -tdrive -d /dev/sda)
    update-grub
    grub-bios-setup -d /boot/grub/i386-pc -s /dev/sda
    exit
    umount proc
    umount dev
    umount sys
    reboot

    # for switching the boot to root2
    zboot
    # for switching back, use efibootmgr. If there is a problem with the root filesystem detection,
    # boot into the debian bootstrap distro, run partition.DEFAULT using the comments for the mktab arg,
    # then manually run iboot and then reboot.


# Pine rock64 notes

The only useful image is ubuntu 18.04 from ayufan, or something like
that. Using the emmc over usb:

    s mount /dev/sdb7 /mnt/1
    s cp `which qemu-arm-static` /mnt/1/usr/bin
    s chroot /mnt/1 qemu-arm-static /bin/bash
    usermod --login iank --move-home --home /home/iank rock64
    groupmod --new-name iank rock64
    passwd iank
    # boot it
    s apt-get update
    s apt dist-upgrade


# How to merge upstream fai-config

    git checkout upstream
    cd path-to-fai-config
    git pull --stat
    # the following needs modification if there were deletions or renames
    rsync --exclude /.git -rlpgoDcvi . /b/fai/fai/config/
    cd /b/fai/fai/config/
    # where XXXXX is the git commit hash
    # note, several files which just had trailing space changes will get ignored.
    git commit -am "update upstream to XXXXX"
    git checkout master
    git merge upstream
    # fix conflicts
    git commit
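rsync without --delete never removes files, which is why the steps above need changing when upstream deletes or renames something. Below is a hypothetical helper, `stale_paths`, that lists such paths from `git diff --name-status` output. It relies on `git pull` having set ORIG_HEAD to the commit before the pull. It does not handle paths that git quotes because of unusual characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `git diff --name-status` lines on stdin and print
# paths that rsync (without --delete) would leave behind: deleted files and
# the old side of renames.
stale_paths() {
  local status path _
  while IFS=$'\t' read -r status path _; do
    case $status in
      D | R*) printf '%s\n' "$path" ;;
    esac
  done
}

# Usage sketch, right after `git pull --stat` in path-to-fai-config:
#   git diff -M --name-status ORIG_HEAD HEAD | stale_paths \
#     | (cd /b/fai/fai/config && xargs -r -d '\n' git rm --)
```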


# TODO

Change arch to archlike, and make it support both arch and parabola.


# License

The license for the project is GPLv2 or later, mostly because fai is,
and I periodically merge the upstream example config, which contains
small scripts. Also, there is a modified encrypt.upstream from the
cryptsetup package in arch, which is under the same license.