PXE install with multi-boot, btrfs & Libreboot support

Some things are specific to my home network and use files with secrets
that are not in this repo. I use this for bare metal and vms, and two of
the scripts can run post boot, so I also use them on vps distributed
images.

Features people may find useful: installs encrypted trisquel, debian,
ubuntu, arch, and parabola (the archlike install is likely broken; I've
only done pxe boots recently) in a multi-boot setup using multiple
subvolumes of a single btrfs filesystem. It uses multiple disks, with
scripts to automatically decrypt on intentional reboots, but not after
shutdown or power loss.

The normal install mode for fai is pxe, but a libreboot system has no
pxe: the pxe in a normal computer is nonfree firmware. Alternatives to
normal pxe that I've tried:

* libreboot + seabios + ipxe

* Use a live cd to call pxe-kexec; this is described later in this file.

* Use the fai autodiscover iso. This is more automated, so nicer.

* Use one of the install methods above to set up a gnu/linux disk
  partition that coordinates with libreboot grub to act like a pxe boot
  using kexec. The boot process takes a bit longer than normal pxe. This
  is the bootstrap partition in my scripts.

Things I haven't tried:

* The bios chip has enough room for an initrd. This could be set up to
  work like the partition I use to kexec, but it would be faster and
  would not require installing to disk.

The partitioning and filesystem script is at
fai/config/hooks/partition.DEFAULT. Disks are grouped as ssd or hdd and
put in raid 1 or raid 0, depending on configuration. The base partitions
are boot, swap, and root (only boot is unencrypted). There are scripts
to resize those partitions after provisioning, while the system is
running.
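One way to group disks into ssd and hdd is the kernel's `queue/rotational` flag in sysfs, which is 0 for non-rotational devices. This is a sketch of that approach only. It is not a copy of what partition.DEFAULT actually does. `classify_disks` is a hypothetical name, and the optional directory argument exists only so the function can be tried against a fake sysfs tree.

```shell
# Hypothetical helper (not taken from partition.DEFAULT): print "NAME ssd"
# or "NAME hdd" for each block device, based on sysfs queue/rotational.
# $1 is the sysfs block directory, /sys/block by default.
classify_disks() {
  local sys_block="${1:-/sys/block}" dev rot
  for dev in "$sys_block"/*; do
    # Skip entries that have no rotational flag.
    [ -r "$dev/queue/rotational" ] || continue
    read -r rot < "$dev/queue/rotational"
    if [ "$rot" = 0 ]; then
      echo "${dev##*/} ssd"
    else
      echo "${dev##*/} hdd"
    fi
  done
}
```

With no argument it reads /sys/block. Loop and zram devices also have a rotational flag, so a real script would need to filter them out.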

People who use fai may find these useful as examples: it uses dnsmasq
(on an openwrt machine) for dhcp instead of isc dhcp. fai-wrapper is a
small script to use basic fai classes outside of fai. The partitioning
script does not use the fai partitioning tool, but it is inspired by it
and works outside of fai. It also supports running a fai server on
debian within android via Maru.

It also automates configuration of an openwrt router after manual
initial installation.

After provisioning is done, I sync files using btrfs, or unison for
vps, then automate further setup using a different set of scripts,
https://iankelling.org/git/?p=distro-setup;a=tree.

My network is a wndr3700v2 router with openwrt on it and a few pcs/laptops.

Since fai requires a debian server as the fai server, there are also
scripts to automate a debian install using pxe and preseeding, which can
be done from any distro.

Some of the scripts depend on simple utility scripts from
https://iankelling.org/git, and of course some hostnames are specific
to my network.


# Per-host/install configuration

Before doing a fai install, you will need to populate a class file. I
use one called 51-multi-boot; you can see an example of it in
fai/config/class/50-host-classes.

You will also need to populate /q/root/luks and /q/root/shadow; see
where the scripts reference them. You might also want to copy the
existing /etc/ssh/*host* files to
/p/c/machine_specific/HOST/filesystem/etc/ssh.

host-* luks keyfiles are generated like this:
h=demohost; head -c 2048 /dev/urandom | od | se dd of=/q/root/luks/host-$h
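The keyfile command above can be wrapped in a function. This is a sketch and not the repo's own tooling:

* `make_luks_keyfile` is a hypothetical name.
* It runs with the caller's permissions instead of the `se` helper used above.
* It sets `umask 077` so the key is created readable only by its owner.

Like the command above, it stores the 2048 random bytes as `od` octal text rather than raw binary.

```shell
# Hypothetical helper: create a LUKS keyfile from 2048 random bytes,
# encoded as od(1) octal text, readable only by its owner.
make_luks_keyfile() {
  local out="$1"
  if [ -z "$out" ]; then
    echo "usage: make_luks_keyfile PATH" >&2
    return 1
  fi
  if [ -e "$out" ]; then
    # Refuse to overwrite: an existing key may already be enrolled in LUKS.
    echo "make_luks_keyfile: $out already exists" >&2
    return 1
  fi
  # Subshell so the umask change does not leak into the caller.
  ( umask 077 && head -c 2048 /dev/urandom | od > "$out" )
}
```

For example, from a root shell: `make_luks_keyfile /q/root/luks/host-demohost`.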

Which luks key to use is configured in
fai/config/hooks/partition.DEFAULT.

Which shadow file (if any) to use is configured in
fai/config/distro-install-common/end. Which shadow file and luks
file(s) get copied into the new machine depends on fai-redep arguments.

# Scripts (meant to be used directly):


# Set up the environment for the install

# create tiny autodiscover cd
# todo: with fai-revm at least, this complains about missing vmlinuz. need to fix this.
fai-redep && sudo fai-cd -g $PWD/grub.cfg.autodiscover -f -A $BASEFILE_DIR/autodiscover.iso
# create normal fai cd (replace TARGET_HOSTNAME)
fai-redep -t TARGET_HOSTNAME && sudo fai-cd -M -g $PWD/grub.cfg.netinst-noreboot -f $BASEFILE_DIR/netinst.iso
# note: depending on config, you may need to set the hostname,
# and some other things for an environment not on your lan,
# for example see fai/config/class/LINODE.var. See linode notes below.

mymk-basefile       # Create basefiles for various distros
archlike-pxe        # Set up a pxe boot server from an archlike base image
fai-redep           # Deploy fai configuration to host "faiserver"
faiserver-uninstall # uninstall fai-server
faiserver-setup     # install fai-server on the current machine
myfai-chboot        # set up fai tftp and nfs. useful for doing pxe-kexec
pxe-server          # disable/enable pxe dhcp, tftp, and nfs. calls myfai-chboot
wrt-setup           # set up my router in general: dhcp, dns, etc.


# Scripts to do a distro install

faiserver-revm   # using pxe & preseed, create a vm which is a fai server
dsfull           # install & post-install a new fai distro
arch-init-remote # install arch after it's been booted into its setup env
live-kexec       # Kexec this or a remote machine using host faiserver. Also
                 # useful to run as curl live-kexec|bash


# Test scripts

arch-revm # test arch install on a fresh vm
fai-revm  # test fai install on a fresh vm


# Scripts to call after a distro install for various reasons

chboot            # Set grub to boot into a different distro (installed earlier)
install-chboot    # reinstall chboot to /boot subvols, for chboot updates.
eboot             # reboot without automatic disk decryption
fai-wrapper       # use fai classes outside of fai. sourced, not called.
faiserver-disable # Disable the fai nfs server exports
fresize           # resize swap or boot partitions in a host


# NAT/forward/vpn tftp

I tried to get this working, but failed.

In theory, a tftp server can be forwarded over a vpn, e.g. a wireguard tunnel.

However, I found that it didn't work when actually pxe booting. Only
the 1st filename was requested, e.g., in the logs:

Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0


To get that far, NATing tftp requires some special attention in iptables, like so
(based on https://unix.stackexchange.com/questions/579508/iptables-rules-to-forward-tftp-via-nat):

iptables -t raw -A PREROUTING -p udp --dport 69 -s 209.51.188.0/24 -j CT --helper tftp
modprobe nf_nat_tftp

To test tftp from a client machine:

tftp SERVER_IP -c get pxelinux.0
rm -fv pxelinux.0
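Fetching a single file does not catch the failure described above, where only the first file is ever requested. This sketch repeats the test for every file a pxelinux boot asks for; compare the normal tftpd logs further down for that list. It makes these assumptions:

* It assumes the tftp-hpa client, whose `-c` option must come last on the command line.
* `tftp_fetch_all` is a hypothetical name.
* A missing or empty download counts as a failure, because the client's exit status may not reflect a failed transfer.

```shell
# Hypothetical helper: fetch each named file from a tftp server with the
# tftp-hpa client and print "ok FILE" or "FAIL FILE" for each one.
tftp_fetch_all() {
  local server="$1" tmp f base
  shift
  tmp=$(mktemp -d) || return 1
  for f in "$@"; do
    # Save under the basename, so pxelinux.cfg/... needs no local directory.
    base=${f##*/}
    rm -f "$tmp/$base"
    if (cd "$tmp" && tftp "$server" -c get "$f" "$base") >/dev/null 2>&1 &&
       [ -s "$tmp/$base" ]; then
      echo "ok $f"
    else
      echo "FAIL $f"
    fi
  done
  rm -rf "$tmp"
}
```

For example: `tftp_fetch_all SERVER_IP pxelinux.0 ldlinux.c32`.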


# Common problems

## kernel mismatch, very early error, no remote logs:

ERROR: the running kernel does not match the kernel modules inside the nfsroot.
ERROR: Kernel modules directory /lib/modules/5.10.0-8-amd not available. Only found /lib/modules/5.10.0-15-amd64

Solution: if booting from a fai-cd, recreate the autodiscover cd as described in the setup section above.

# What good logs look like:

To log nfs traffic on the server:

s rpcdebug -m nfsd -s all
# and to turn the extra logging off again afterwards:
s rpcdebug -m nfsd -c all


Normal nfs mount & umount logs look like:

journalctl -ef | gr nfs

Jun 20 22:15:36 kd rpc.mountd[2025725]: authenticated mount request from 10.32.2.1:865 for /srv/fai/nfsroot (/srv/fai/nfsroot)
Jun 20 22:15:36 kd kernel: nfsd: exp_rootfh(/srv/fai/nfsroot [00000000e8c53e54] *:dm-0/5521225)
Jun 20 22:15:36 kd kernel: nfsd: fh_compose(exp 00:1b/5521225 fai/nfsroot, ino=5521225)
Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: PATHCONF(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:45 kd rpc.mountd[2025725]: authenticated unmount request from 10.32.2.1:986 for /srv/fai/nfsroot (/srv/fai/nfsroot)
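With rpcdebug enabled the journal gets noisy, so it can help to reduce it to just the rpc.mountd mount and unmount requests. This is a sketch. `nfs_mount_summary` is a hypothetical name, and the pattern is written against the log format shown above, so other nfs-utils versions may word the message differently.

```shell
# Hypothetical filter: read journal text on stdin and print
# "ACTION CLIENT_IP EXPORT" for each rpc.mountd mount/unmount request.
nfs_mount_summary() {
  sed -nE 's/.*rpc\.mountd\[[0-9]+\]: authenticated (mount|unmount) request from ([0-9.]+):[0-9]+ for ([^ ]+).*/\1 \2 \3/p'
}
```

For example, run `journalctl -b | nfs_mount_summary`. For the log above it prints `mount 10.32.2.1 /srv/fai/nfsroot` followed by `unmount 10.32.2.1 /srv/fai/nfsroot`.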

Normal tftpd logs, after adding -vv to TFTP_OPTIONS in /etc/default/tftpd-hpa:

journalctl -u tftpd-hpa

Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
Jun 20 23:51:02 kd in.tftpd[4021351]: RRQ from 10.2.0.12 filename ldlinux.c32
Jun 20 23:51:02 kd in.tftpd[4021352]: RRQ from 10.2.0.12 filename pxelinux.cfg/a913a477-fca6-234d-a928-6bb011decd05
Jun 20 23:51:02 kd in.tftpd[4021352]: sending NAK (1, File not found) to 10.2.0.12
Jun 20 23:51:02 kd in.tftpd[4021353]: RRQ from 10.2.0.12 filename pxelinux.cfg/01-52-54-00-9c-ef-ad
Jun 20 23:51:02 kd in.tftpd[4021353]: sending NAK (1, File not found) to 10.2.0.12
Jun 20 23:51:02 kd in.tftpd[4021354]: RRQ from 10.2.0.12 filename pxelinux.cfg/0A02000C
Jun 20 23:51:02 kd in.tftpd[4021355]: RRQ from 10.2.0.12 filename vmlinuz-5.10.0-15-amd64
Jun 20 23:51:03 kd in.tftpd[4021356]: RRQ from 10.2.0.12 filename initrd.img-5.10.0-15-amd64
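The requests above follow pxelinux's config file search order:

1. The client's UUID.
2. `01-` plus the MAC address, lowercase and dash separated.
3. The IPv4 address in uppercase hex, shortened by one digit at a time.
4. `default`.

Here the client stopped at 0A02000C because that file exists. The sketch below prints the names a client would try, which can help when naming a host-specific file in pxelinux.cfg/. It uses a hypothetical helper, `pxelinux_cfg_names`, and leaves out the UUID step.

```shell
# Hypothetical helper: print the pxelinux.cfg/ names pxelinux tries for a
# client MAC and IPv4 address, in order (the UUID step is omitted).
pxelinux_cfg_names() {
  local mac="$1" ip="$2" hex
  # 01 is the ARP hardware type for ethernet.
  printf '01-%s\n' "$(printf '%s' "$mac" | tr 'ABCDEF:' 'abcdef-')"
  # Dotted quad to 8 uppercase hex digits, e.g. 10.2.0.12 -> 0A02000C.
  hex=$(printf '%s\n' "$ip" | awk -F. '{ printf "%02X%02X%02X%02X", $1, $2, $3, $4 }')
  # Full address first, then drop one trailing hex digit at a time.
  while [ -n "$hex" ]; do
    printf '%s\n' "$hex"
    hex=${hex%?}
  done
  echo default
}
```

For example, `pxelinux_cfg_names 52:54:00:9c:ef:ad 10.2.0.12` prints `01-52-54-00-9c-ef-ad`, then `0A02000C`, `0A02000` and so on down to `0`, then `default`.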


# Replacing a raid 10 disk

pxe-server -S HOST fai

# btrfs replace or delete; prefer replace. To set up partitions on the replacement drive:
scp fai-wrapper HOST:
ssh root@HOST
. fai-wrapper
export SPECIAL_DISK=/dev/REPLACEMENT_DEV
/var/lib/fai/config/hooks/partition.DEFAULT


ssh root@HOST
for x in /target/* /target; do umount $x; done
cat >p
PASSWORD HERE (then ctrl-d ctrl-d, so the file has no trailing newline)
cd /dev/disk/by-id/
for d in ata*part1; do cryptsetup luksOpen -d /root/p $d crypt_dev_$d; done
x=(/dev/mapper/*part1); mount -o subvol=root_trisquelflidas $x /mnt
# btrfs fi show /mnt
# btrfs replace start -f /dev/mapper/OLD_DEV /dev/mapper/NEW_DEV /mnt
# btrfs replace status /mnt
# nohup btrfs dev delete /dev/sde1 /mnt
mount -o subvol=boot_trisquelflidas /dev/sda3 /mnt/boot
# also replace or delete the disk for boot
for x in dev proc sys; do mount -o bind /$x /mnt/$x; done
chroot /mnt /bin/bash
# replace the disk in /etc/fstab
# replace the disk in /etc/crypttab
update-grub
update-initramfs -u
mount /a
/a/exe/keyscript-on
exit
reboot


# Expected output in fai logs


## On focal:

fai.log:updatebase.UBUNTU FAILED with exit code 1.
The real error is from dpkg-reconfigure locales. It seems to be related
to a workaround for < 20.04; the relevant comment is
# in case the locales are already included inside the base file (Ubuntu)
in config/hooks/instsoft.DEBIAN.


## For flidas:

This error happens when installing systemd. Based on reading the post
install script, it's a superfluous upstream bug:

addgroup: The group `systemd-journal' already exists as a system group. Exiting.
Operation failed: No such file or directory

## On nabia/newer:

The python package is gone (it's python3 now), and it's easier to just
let the package get removed than do host class package config:
fai.log:WARNING: These unknown packages are removed from the installation list: python python-minimal

Similarly, linux-image-amd64 is the debian package name for the kernel
and linux-image-generic is the ubuntu one. The DEBIAN class is also
defined on ubuntu, and it's easier to just let the package get removed
with this warning:
fai.log:WARNING: These unknown packages are removed from the installation list: linux-image-amd64
Also, cryptsetup-initramfs is new in buster/nabia, so it gets removed
on earlier versions.

## parted error
fai.log:Error: /dev/vda: unrecognised disk label
This comes from parted -m $d unit MiB print.
It happens when there are no partitions yet.


######## notes on creating a lan with just 2 computers ########


## below assumes eth0 is the ethernet device used to connect to the target computer.


# this is not strictly needed. I had my connection die at some point,
# and I suspected this might help.
# based on
# https://support.qacafe.com/knowledge-base/how-do-i-prevent-network-manager-from-controlling-an-interface/
cat > /etc/NetworkManager/conf.d/99-fai-tmp.conf <<'EOF'
[main]
plugins=keyfile

[keyfile]
unmanaged-devices=interface-name:eth0
EOF
ser restart NetworkManager


cat >> /etc/network/interfaces <<'EOF'
iface eth0 inet static
address 10.0.44.1/24
EOF

ifup eth0

# note: turn off the fsf vpn, so the route to coresite is the normal route.
echo 1 > /proc/sys/net/ipv4/ip_forward
m s iptables -t nat -A POSTROUTING -o $(ip -4 route get 8.8.8.8 | sed -nr 's,^.* dev\s+(\S+).*,\1,p') -j MASQUERADE
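The interface in the MASQUERADE rule comes from parsing `ip -4 route get 8.8.8.8`. Doing that as a separate step makes it easy to check which interface will be used before adding the rule, for example that it is not the vpn. This is a sketch. `route_dev` is a hypothetical name, and it uses awk to print the word after `dev` in place of the sed expression in the command above.

```shell
# Hypothetical helper: read `ip route get` output on stdin and print the
# word following "dev", i.e. the outgoing interface.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

For example, `wan=$(ip -4 route get 8.8.8.8 | route_dev); echo "$wan"`. If it looks right, pass `-o "$wan"` to the iptables command.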


In /p/c/machine_specific/vps/bind-initial/db.b8.nz, set:
faiserver 10.0.44.1
TARGET_HOSTNAME 10.0.44.2

apt install isc-dhcp-server

cat >> /etc/default/isc-dhcp-server <<'EOF'
INTERFACESv4="eth0"
EOF

Edit dhcpd.conf in this repo (/b/fai/dhcpd.conf) to change the mac address and target hostname.
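For reference, a host entry in isc-dhcp-server syntax looks roughly like the output of this sketch. It is not the repo's dhcpd.conf. `dhcpd_host_stanza` is a hypothetical helper. The `next-server` and `filename` lines are an assumption about what a pxelinux boot from the fai server at 10.0.44.1 needs, so compare them with the real file.

```shell
# Hypothetical helper: print an isc-dhcp-server host declaration.
# Args: hostname, mac address, fixed ip, tftp server ip.
dhcpd_host_stanza() {
  local name="$1" mac="$2" ip="$3" tftp_server="$4"
  cat <<EOF
host $name {
  hardware ethernet $mac;
  fixed-address $ip;
  option host-name "$name";
  # Assumed pxe settings: where to fetch the boot loader via tftp.
  next-server $tftp_server;
  filename "pxelinux.0";
}
EOF
}
```

For example, `dhcpd_host_stanza TARGET_HOSTNAME 52:54:00:9c:ef:ad 10.0.44.2 10.0.44.1`. After editing, `dhcpd -t -cf /etc/dhcp/dhcpd.conf` checks the config syntax.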

s cp /b/fai/dhcpd.conf /etc/dhcp/
ser restart isc-dhcp-server

Edit /a/bin/fai/fai/config/class/51-multi-boot

pxe-server -d TARGET fai

Then do a pxe boot on the target host.



##### linode notes ######

* create 2 disks: installer (3000 mb, raw) and boot (remaining space, raw)
* create 2 profiles with direct boot, no helpers:
  * installer (sda=boot, sdb=installer, boot dev=sdb)
  * boot (sda=boot)
* Boot into rescue mode, ssh in with lish, then:
  curl url_to_some_fai_cd_created_image | dd of=/dev/sda
  poweroff
* Boot into the installer profile.
* Lish shows the console. At the end of the install, it gives a prompt
  because logs failed to save remotely. Check the logs, then reboot into
  the boot profile if all is well. If that doesn't happen, turn off lassie
  in settings.

###### ubuntu notes ######

Someone really needed ubuntu on host tp; otherwise they would have
ended up on a non-gnu os. I didn't want to figure out how to get all the
default software installed, so I did the following:

# On remote host:
# install etiona
cd /b/fai
# set 51-multi-boot to set classes outside of the fai-wrapper conditional, including NOWIPE
. fai-wrapper
./fai/config/hooks/partition.DEFAULT

# on remote host
# install ubuntu 20.04 using virt-install (create the disk image first)
sudo -i
qemu-img create -o preallocation=metadata -f qcow2 u2004.qcow2 15G
virt-install --os-variant=ubuntu16.04 --cdrom ubuntu-20.04-desktop-amd64.iso --disk path=u2004.qcow2 -r 2048 --vcpus 1 -n u2004
# alternatively, I also tried a physical install, because I know the virtual install ends up
# with some different things, like some spice service. Then I pulled the data out with:
rsync -ahSAX --numeric-ids --exclude=proc --exclude=sys --exclude=dev --exclude=tmp --exclude=run root@tp:/ .; mkdir proc sys dev tmp run

modprobe nbd
# lines marked bionic or focal are alternatives: run the ones for the release being copied
qemu-nbd --connect=/dev/nbd0 u1804.qcow2 -f qcow2 # bionic
qemu-nbd --connect=/dev/nbd0 u2004.qcow2 -f qcow2 # focal
mount /dev/nbd0p1 /mnt/1 # bionic
mount /dev/nbd0p5 /mnt/1 # focal
mount -o bind /mnt/root/root_ubuntubionic /mnt/2 # bionic
mount -o bind /mnt/root/root_ubuntufocal /mnt/2 # focal
mkdir -p /mnt/2/boot
mount -o bind /mnt/boot/boot_ubuntubionic /mnt/2/boot # bionic
mount -o bind /mnt/boot/boot_ubuntufocal /mnt/2/boot # focal
# S = sparse, A = acls, X = xattrs
rsync -ahSAX --numeric-ids /mnt/1/ /mnt/2

cd /mnt/2
cp /tmp/fai/crypttab etc
sed -i "s#/root/keyscript,#decrypt_keyctl,#" etc/crypttab
cp /tmp/fai/fstab etc
echo "tmpfs /tmp tmpfs nodev,nosuid,size=50%,mode=1777 0 0" >> etc/fstab
chrbind
chroot .
mv /etc/resolv.conf /etc/resolv.conf.old
echo nameserver 1.1.1.1 >/etc/resolv.conf
# install programs from /a/bin/fai/fai/config/package_config/STANDARD:
apt install -y openssh-client openssh-server cryptsetup keyutils btrfs-progs console-setup kbd pciutils usbutils unattended-upgrades initramfs-tools-core dropbear-initramfs
mv /etc/resolv.conf.old /etc/resolv.conf
exit
d=etc/initramfs-tools
mkdir -p $d/root/.ssh etc/dropbear-initramfs root/.ssh
chmod 700 $d/root $d/root/.ssh root/.ssh
cp -p /root/.ssh/authorized_keys $d/root/.ssh/authorized_keys
cp -p /root/.ssh/authorized_keys etc/dropbear-initramfs
cp -p /root/.ssh/authorized_keys root/.ssh/authorized_keys
chroot .
sed -ri 's/^ *GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.crypttab=no"/' /etc/default/grub
grub-install --no-floppy $(grub-probe -tdrive -d /dev/sda)
update-grub
grub-bios-setup -d /boot/grub/i386-pc -s /dev/sda
exit
umount proc
umount dev
umount sys
reboot

# to switch the boot to root2:
zboot
# to switch back, use efibootmgr. If there is a problem with root filesystem detection,
# boot into the debian bootstrap distro and run partition.DEFAULT with the mktab arg, as described in its comments,
# then manually run iboot and reboot.



# pine rock64 notes
# the only useful image is ubuntu 18.04 ayufan or something.
# using the emmc over usb:
s mount /dev/sdb7 /mnt/1
s cp `which qemu-arm-static` /mnt/1/usr/bin
s chroot /mnt/1 qemu-arm-static /bin/bash
usermod --login iank --move-home --home /home/iank rock64
groupmod --new-name iank rock64
passwd iank
# boot it
s apt-get update
s apt dist-upgrade


### How to merge upstream fai-config

git checkout upstream
cd path-to-fai-config
git pull --stat
# the following needs modification if there were deletions or renames
rsync --exclude /.git -rlpgoDcvi . /b/fai/fai/config/
cd /b/fai/fai/config/
# where XXXXX is the git commit hash.
# note: several files which just had trailing space changes will get ignored.
# git add -A is needed so files that are new upstream get committed; commit -a alone skips them.
git add -A .
git commit -m "update upstream to XXXXX"
git checkout master
git merge upstream
# fix conflicts
git commit
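The rsync step does not handle files that upstream deleted or renamed. One way to find them is to compare file lists before committing on the upstream branch. This sketch assumes the upstream branch is checked out, so /b/fai/fai/config mirrors the previous upstream; `upstream_removed_files` is a hypothetical name. It prints files that exist in the second directory but not in the first, ignoring .git.

```shell
# Hypothetical helper: print files (not directories) present under DST but
# not under SRC, ignoring .git. With SRC = the freshly pulled upstream
# checkout and DST = the upstream branch copy, these are files upstream
# deleted or renamed away.
upstream_removed_files() {
  local src="$1" dst="$2" tmp
  # A missing directory would make every file look removed, so refuse.
  if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
    echo "usage: upstream_removed_files SRC_DIR DST_DIR" >&2
    return 1
  fi
  tmp=$(mktemp -d) || return 1
  (cd "$src" && find . -path ./.git -prune -o ! -type d -print) | LC_ALL=C sort > "$tmp/src"
  (cd "$dst" && find . -path ./.git -prune -o ! -type d -print) | LC_ALL=C sort > "$tmp/dst"
  # -13: drop lines only in src and lines in both, leaving dst-only paths.
  LC_ALL=C comm -13 "$tmp/src" "$tmp/dst"
  rm -rf "$tmp"
}
```

For example, from /b/fai/fai/config: `upstream_removed_files path-to-fai-config . | xargs -r -d '\n' git rm --`. Review the list before removing anything.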


# TODO
Rename arch to archlike and make it support both arch and parabola.


# License

The license for the project is GPLv2 or later, mostly because fai is and
I periodically merge the upstream example config, which contains small
scripts. Also, there is a modified encrypt.upstream, which is from the
cryptsetup package in arch, which is under the same license.