1 # This file is part of Ian Kelling's automated-distro-installer
2 # Copyright (C) 2024 Ian Kelling
3
4 # This program is free software; you can redistribute it and/or
5 # modify it under the terms of the GNU General Public License
6 # as published by the Free Software Foundation; either version 2
7 # of the License, or (at your option) any later version.
8
9 # This program is distributed in the hope that it will be useful,
10 # but WITHOUT ANY WARRANTY; without even the implied warranty of
11 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 # GNU General Public License for more details.
13
14 # You should have received a copy of the GNU General Public License
15 # along with this program; if not, write to the Free Software
16 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
17
18 PXE install w multi-boot, btrfs & Libreboot support
19
20 Some things are specific to my home network and use files with secrets
21 that are not in this repo. I use this for bare metal and vms. Two of the
22 scripts can run post boot, so I use them on vps distributed images
23 as well.
24
25 Features people may find useful: installs encrypted trisquel, debian,
26 ubuntu, arch, and parabola (archlike install is likely broken, I've only
27 done pxe boots recently), in a multi-boot setup using multiple
28 subvolumes of a single btrfs filesystem. Utilizes multiple disks, with
29 scripts to automatically decrypt on intentional reboots, but not after
30 shutdown or power loss.
31
32 The normal install mode for fai is pxe, but a libreboot system has no
33 pxe, because the pxe in a normal computer is nonfree
34 firmware. Alternatives to normal pxe that I've tried:
35
36 * libreboot + seabios + ipxe
37
38 * Use a live cd to call pxe-kexec, this is described later in this file.
39
40 * Use the fai autodiscover iso. This is more automated, so nicer.
41
42 * Use an install method above to set up a gnu/linux disk partition that
43 coordinates with libreboot grub to act like a pxe boot using
44 kexec. The boot process takes a bit longer than normal pxe. This is
45 the bootstrap partition in my scripts.
46
47 Things I haven't tried:
48
49 * The bios chip has enough room for an initrd. This could be setup to
50 work like the partition I use to kexec, but it would be faster, and
51 not require installing to disk.
52
53 The partitioning and filesystem script is at
54 fai/config/hooks/partition.DEFAULT. Disks are grouped as ssd or hdd and
55 raided in raid 1 or raid 0 per configuration. The base partitions are
56 divided into boot, swap, and root (only boot is unencrypted). There are
57 scripts to resize those partitions post-provision and while the system
58 is running.
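
As a rough illustration of the ssd/hdd grouping, the sketch below classifies a disk from the kernel's rotational flag. This is an assumption about how such grouping can be done, not a copy of what partition.DEFAULT does. The function name disk_kind and its optional sysfs-root argument are hypothetical.

```shell
# Hypothetical helper: print "ssd" or "hdd" for a block device name (e.g. sda).
# Assumes the kernel exposes /sys/block/DEV/queue/rotational (0 = non-rotational).
# The second argument overrides the sysfs root, which is only useful for testing.
disk_kind() {
  local dev=$1 sysroot=${2:-/sys} flag
  flag=$(cat "$sysroot/block/$dev/queue/rotational" 2>/dev/null) || return 1
  case $flag in
    0) echo ssd ;;
    1) echo hdd ;;
    *) return 1 ;;
  esac
}
```

For example, `disk_kind sda` prints ssd or hdd, and returns nonzero if the device is unknown.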
59
60 People who use fai may find these things useful as examples: it uses
61 dnsmasq (on an openwrt machine) for dhcp instead of the isc
62 dhcp. fai-wrapper is a small script to use basic fai classes outside of
63 fai. This repo does not use the fai partitioning tool, but its partitioning
64 script is inspired by it and works outside of fai. It supports running a fai
65 server on debian within android via Maru.
66
67 It also automates configuration of an openwrt router after manual
68 initial installation.
69
70 After provisioning is done, I sync files using btrfs, or unison for
71 vps, then automate further setup using a different set of scripts,
72 https://iankelling.org/git/?p=distro-setup;a=tree.
73
74 My network is a wndr3700v2 router with openwrt on it and a few pcs/laptops.
75
76 Since fai requires a debian server as the fai server, there are also
77 scripts to automate a debian install using pxe and preseeding, which can
78 be done from any distro.
79
80 Some of the scripts depend on a few simple utility
81 scripts from https://iankelling.org/git, and of course there are some
82 hostnames that are specific to my network.
83
84
85 # Per-host/install configuration
86
87 Before doing a fai install, you will need to populate a class file. I
88 use one called 51-multi-boot, which you can see an example of in
89 fai/config/class/50-host-classes.
90
91 Before doing a fai install, you will need to populate /q/root/luks and
92 /q/root/shadow, see their references. You might also want to copy
93 existing /etc/ssh/*host* to
94 /p/c/machine_specific/HOST/filesystem/etc/ssh
95
96 host-* luks keyfiles are generated like:
97 h=demohost; head -c 2048 /dev/urandom | od | se dd of=/q/root/luks/host-$h
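
A minimal sketch of the same keyfile generation wrapped in a function that refuses to overwrite an existing key. The function name gen_luks_keyfile is hypothetical, and unlike the one-liner it does not use sudo, so it assumes the target directory is writable by the caller.

```shell
# Hypothetical wrapper around the one-liner above: write a new keyfile,
# refusing to overwrite an existing one.
gen_luks_keyfile() {
  local out=$1
  if [ -e "$out" ]; then
    echo "refusing to overwrite $out" >&2
    return 1
  fi
  # umask in a subshell so the file is created readable by its owner only.
  ( umask 077; head -c 2048 /dev/urandom | od > "$out" )
}
```

For example: `gen_luks_keyfile /q/root/luks/host-demohost`.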
98
99 Configuration of which luks key to use is in
100 fai/config/hooks/partition.DEFAULT
101
102 Configuration of which (if any) shadow file to use is in
103 fai/config/distro-install-common/end
104 and which shadow file / luks file(s) to copy into the new machine depends
105 on fai-redep arguments.
106
107 Also, setup dns in /p/c/host-info and firewall redirects in wrt-setup-local.
108
109 After install, use btrbk to set up data, then run distro-begin && distro-end.
110 See notes in distro-begin for other configuration.
111
112 # Prerequisites:
113
114 <https://savannah.nongnu.org/git/?group=bash-bear-trap>
115 git clone https://git.savannah.nongnu.org/git/bash-bear-trap.git
116 sudo install -T bash-bear-trap/bash-bear /usr/local/lib/bash-bear
117
118
119 # Scripts (meant to be used directly):
120
121
122 # Setup the environment for the install
123
124 # create tiny autodiscover cd
125 # todo: with fai-revm at least, this complains about missing vmlinuz. need to fix this.
126 fai-redep && sudo fai-cd -g $PWD/grub.cfg.autodiscover -f -A $BASEFILE_DIR/autodiscover.iso
127 # create normal fai cd (replace TARGET_HOSTNAME)
128 fai-redep -t TARGET_HOSTNAME && sudo fai-cd -M -g $PWD/grub.cfg.netinst-noreboot -f $BASEFILE_DIR/netinst.iso
129 # note, may need to set hostname, depending on config,
130 # and some other things for environment not on your lan
131 # for example see fai/config/class/LINODE.var. See linode notes below.
132
133 mymk-basefile # Create basefiles for various distros
134 archlike-pxe # Setup pxe boot server from an archlike base image
135 fai-redep # Deploy fai configuration to host "faiserver.b8.nz"
136 faiserver-uninstall # uninstall fai-server
137 faiserver-setup # install fai-server on the current machine
138 myfai-chboot # setup fai tftp and nfs. useful for doing pxe-kexec or booting from a fai-cd.
139 pxe-server # disable/enable pxe dhcp, tftp, and nfs. calls myfai-chboot
140 wrt-setup # setup my router in general: dhcp, dns, etc.
141
142
143 # Script to do a distro install
144
145 faiserver-revm # using pxe & preseed, create a vm which is a fai server
146 dsfull # install & post-install a new fai distro
147 arch-init-remote # install arch after it's been booted into its setup env
148 live-kexec # Kexec this or a remote machine using host faiserver. also
149 useful to run as curl live-kexec|bash
150
151
152 # Test scripts
153
154 arch-revm # test arch install on a fresh vm
155 fai-revm # test fai install on a fresh vm
156
157
158 # Scripts to call after a distro install for various reasons
159
160 chboot # Set grub to boot into a different distro (installed earlier)
161 install-chboot # reinstall chboot to /boot subvols, for chboot updates.
162 eboot # reboot without automatic disk decryption
163 fai-wrapper # use fai classes outside of fai. sourced, not called.
164 faiserver-disable # Disable the fai nfs server exports
165 fresize # resize swap or boot partitions in a host
166
167
168 # NAT/forward/vpn tftp
169
170 I tried to get this working, but failed.
171
172 tftp server in theory can be forwarded over a vpn, eg on a wireguard tunnel.
173
174 However, I found that when actually pxe booting, it wouldn't work: only
175 the first filename would be requested, e.g., in the logs:
176
177 Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
178
179
180 To get that far, NATing tftp requires some special attention in iptables, like so:
181
182 https://unix.stackexchange.com/questions/579508/iptables-rules-to-forward-tftp-via-nat
183 iptables -t raw -A PREROUTING -p udp --dport 69 -s 209.51.188.0/24 -j CT --helper tftp
184 modprobe nf_nat_tftp
185
186 to test tftp from a client machine:
187
188 tftp SERVER_IP -c get pxelinux.0
189 rm -fv pxelinux.0
190
191
192 # Common problems
193
194 ## kernel mismatch very early error, no remote logs:
195
196 ERROR: the running kernel does not match the kernel modules inside the nfsroot.
197 ERROR: Kernel modules directory /lib/modules/5.10.0-8-amd not available. Only found /lib/modules/5.10.0-15-amd64
198
199 solution: if running from fai-cd, recreate autodiscover cd as noted above in setup.
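
A pre-flight check along these lines may catch the mismatch before booting. This is a sketch, not part of fai: the function name kernel_modules_match is hypothetical, and pointing it at the nfsroot's module directory (for example /srv/fai/nfsroot/lib/modules) with the kernel release of the boot medium is an assumption about how it would be used.

```shell
# Hypothetical check: does a modules directory exist for the given kernel?
# Arguments: kernel release (default: uname -r), modules root (default: /lib/modules).
kernel_modules_match() {
  local release=${1:-$(uname -r)} modroot=${2:-/lib/modules}
  if [ -d "$modroot/$release" ]; then
    return 0
  fi
  echo "no modules for $release under $modroot; found: $(ls "$modroot" 2>/dev/null | tr '\n' ' ')" >&2
  return 1
}
```

For example: `kernel_modules_match 5.10.0-15-amd64 /srv/fai/nfsroot/lib/modules`.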
200
201 ## Weird package dependency errors
202
203 for example: in fai.log, within instsoft.DEBIAN
204 ```
205 The following packages have unmet dependencies:
206 libc6 : Breaks: locales (< 2.36) but 2.35-0ubuntu3.7+11.0trisquel1 is to be installed
207 ```
208
209 In this case, it was because the basefile was missing, and so instead
210 fai decided to use the wrong basefile.
211
212 for example: in fai.log, within instsoft.DEBIAN
213
214 ```
215 ftar: No matching class found in /var/lib/fai/config/basefiles//
216 ftar: extracting /var/tmp/base.tar.zst to /target/
217 ```
218
219 # What good logs look like:
220
221 logging nfs traffic from server
222
223 s rpcdebug -m nfsd -s all
224
225
226 normal nfs mount & umount logs look like:
227
228 journalctl -ef | gr nfs
229
230 Jun 20 22:15:36 kd rpc.mountd[2025725]: authenticated mount request from 10.32.2.1:865 for /srv/fai/nfsroot (/srv/fai/nfsroot)
231 Jun 20 22:15:36 kd kernel: nfsd: exp_rootfh(/srv/fai/nfsroot [00000000e8c53e54] *:dm-0/5521225)
232 Jun 20 22:15:36 kd kernel: nfsd: fh_compose(exp 00:1b/5521225 fai/nfsroot, ino=5521225)
233 Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
234 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
235 Jun 20 22:15:36 kd kernel: nfsd: PATHCONF(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
236 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
237 Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
238 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
239 Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
240 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
241 Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
242 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
243 Jun 20 22:15:45 kd rpc.mountd[2025725]: authenticated unmount request from 10.32.2.1:986 for /srv/fai/nfsroot (/srv/fai/nfsroot)
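
To pick just the mount and unmount requests out of output like the above, a filter such as the following may help. It is a sketch: the function name nfs_mount_events is hypothetical, and it assumes GNU sed and the rpc.mountd message format shown in these logs.

```shell
# Hypothetical filter: read journal lines on stdin and print
# "ACTION CLIENT_IP EXPORT" for each rpc.mountd mount/unmount request.
nfs_mount_events() {
  sed -nr 's/.*rpc\.mountd\[[0-9]+\]: authenticated (mount|unmount) request from ([0-9.]+):[0-9]+ for (\S+).*/\1 \2 \3/p'
}
```

For example: `journalctl -e | nfs_mount_events`.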
244
245 normal tftpd logs look like the following,
246
247 after setting -vv in TFTP_OPTIONS in /etc/default/tftpd-hpa:
248
249 journalctl -u tftpd-hpa
250
251 Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
252 Jun 20 23:51:02 kd in.tftpd[4021351]: RRQ from 10.2.0.12 filename ldlinux.c32
253 Jun 20 23:51:02 kd in.tftpd[4021352]: RRQ from 10.2.0.12 filename pxelinux.cfg/a913a477-fca6-234d-a928-6bb011decd05
254 Jun 20 23:51:02 kd in.tftpd[4021352]: sending NAK (1, File not found) to 10.2.0.12
255 Jun 20 23:51:02 kd in.tftpd[4021353]: RRQ from 10.2.0.12 filename pxelinux.cfg/01-52-54-00-9c-ef-ad
256 Jun 20 23:51:02 kd in.tftpd[4021353]: sending NAK (1, File not found) to 10.2.0.12
257 Jun 20 23:51:02 kd in.tftpd[4021354]: RRQ from 10.2.0.12 filename pxelinux.cfg/0A02000C
258 Jun 20 23:51:02 kd in.tftpd[4021355]: RRQ from 10.2.0.12 filename vmlinuz-5.10.0-15-amd64
259 Jun 20 23:51:03 kd in.tftpd[4021356]: RRQ from 10.2.0.12 filename initrd.img-5.10.0-15-amd64
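
The pxelinux.cfg/ names in the log above are derived from the client: pxelinux tries its uuid, then 01- plus its mac address, then its ip address in uppercase hex. The sketch below reproduces the last two names, which is handy when deciding what to name a per-host config file. The function names are hypothetical.

```shell
# Print the pxelinux.cfg file name derived from a dotted IPv4 address:
# each octet as two uppercase hex digits.
pxe_ip_name() {
  local IFS=.
  # Word splitting on "." turns the address into four octets.
  set -- $1
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Print the name derived from a MAC address: ARP type 01 (ethernet),
# then the lowercase MAC with ":" replaced by "-".
pxe_mac_name() {
  printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}
```

With the client from the log, `pxe_ip_name 10.2.0.12` gives 0A02000C and `pxe_mac_name 52:54:00:9c:ef:ad` gives 01-52-54-00-9c-ef-ad.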
260
261
262
263 # Replacing a raid 10 disk
264
265 pxe-server -S HOST fai
266
267 # btrfs replace or delete. prefer replace. to setup partitions on replacement drive:
268 scp fai-wrapper HOST:
269 ssh root@HOST
270 . fai-wrapper
271 export SPECIAL_DISK=/dev/REPLACEMENT_DEV
272 /var/lib/fai/config/hooks/partition.DEFAULT
273
274
275 ssh root@HOST
276 for x in /target/* /target; do umount $x; done
277 cat >p
278 PASSWORD HERE(ctrl-d ctrl-d)
279 cd /dev/disk/by-id/
280 for d in ata*part1; do cryptsetup luksOpen -d /root/p $d crypt_dev_$d; done
281 x=(/dev/mapper/*part1); mount -o subvol=root_trisquelflidas $x /mnt
282 # btrfs fi show /mnt
283 # btrfs replace start -f /dev/mapper/OLD_DEV /dev/mapper/NEW_DEV /mnt
284 # btrfs replace status /mnt
285 # nohup btrfs dev delete /dev/sde1 /mnt
286 mount -o subvol=boot_trisquelflidas /dev/sda3 /mnt/boot
287 # also replace or delete disk for boot
288 for x in dev proc sys; do mount -o bind /$x /mnt/$x; done
289 chroot /mnt /bin/bash
290 # replace disk in fstab
291 # replace disk in /etc/crypttab
292 update-grub
293 update-initramfs -u
294 mount /a
295 /a/exe/keyscript-on
296 exit
297 reboot
298
299
300 # Expected output in fai logs
301
302
303 ## On focal:
304
305 fai.log:updatebase.UBUNTU FAILED with exit code 1.
306 the real error is dpkg-reconfigure locales, seems to be related
307 to a workaround for < 20.04, relevant comment:
308 # in case the locales are already included inside the base file (Ubuntu)
309 in config/hooks/instsoft.DEBIAN
310
311
312 ## For flidas,
313
314 when installing systemd, this error happens, and it's
315 a superfluous upstream bug based on reading the post install script:
316
317 addgroup: The group `systemd-journal' already exists as a system group. Exiting.
318 Operation failed: No such file or directory
319
320 ## On nabia/newer,
321
322 python is removed, now it's python3,
323 and it's easier to just let the package get removed than
324 do host class package config.
325 fai.log:WARNING: These unknown packages are removed from the installation list: python python-minimal
326
327 Similar to python, linux-image-amd64 is the debian package name
328 for the kernel, linux-image-generic is for ubuntu, but the
329 DEBIAN class is defined on ubuntu and it's easier to just let
330 the package get removed with this warning:
331 fai.log:WARNING: These unknown packages are removed from the installation list: linux-image-amd64
332 Also, cryptsetup-initramfs is new to buster/nabia, it gets removed
333 on earlier versions.
334
335 ## parted error
336 fai.log:Error: /dev/vda: unrecognised disk label
337 This is from parted -m $d unit MiB print.
338 It happens when there are no partitions yet.
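
When scanning a log, the always-expected messages from this section can be filtered out so that only unexpected problems remain. This is a minimal sketch: the function name fai_unexpected is hypothetical, and the exclusion list only covers the parted and unknown-package messages. Release-specific ones, such as the focal updatebase failure, would need to be added by hand.

```shell
# Hypothetical filter: read fai.log on stdin and print only error-like lines
# that are not among the expected messages documented above.
fai_unexpected() {
  grep -Ei 'error|warning|failed' \
    | grep -Ev 'unrecognised disk label|These unknown packages are removed from the installation list'
}
```

For example: `fai_unexpected < fai.log`. It returns nonzero when nothing unexpected is found.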
339
340
341 ######## notes on creating a lan with just 2 computers ########
342
343
344 ## below assumes eth0 is the ethernet device used to connect to the target computer.
345
346
347 # this is not strictly needed. I had my connection die at some point,
348 # and I suspected this might help.
349 # based on
350 # https://support.qacafe.com/knowledge-base/how-do-i-prevent-network-manager-from-controlling-an-interface/
351 cat > /etc/NetworkManager/conf.d/99-fai-tmp.conf <<'EOF'
352 [main]
353 plugins=keyfile
354
355 [keyfile]
356 unmanaged-devices=interface-name:eth0
357 EOF
358 ser restart NetworkManager
359
360
361 cat >> /etc/network/interfaces <<'EOF'
362 iface eth0 inet static
363 address 10.0.44.1/24
364 EOF
365
366 ifup eth0
367
368 # note turn off fsf vpn, so route to coresite is the normal route.
369 echo 1 > /proc/sys/net/ipv4/ip_forward
370 m s iptables -t nat -A POSTROUTING -o $(ip -4 route get 8.8.8.8 | sed -nr 's,^.* dev\s+(\S+).*,\1,p') -j MASQUERADE
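
The sed expression in the iptables line picks the outgoing interface from the output of ip route get. The sketch below isolates it so it can be tried on sample output first. The function name route_dev is hypothetical, and the expression relies on GNU sed (-r, \s, \S).

```shell
# Read the output of "ip -4 route get ADDR" on stdin and print the
# interface name that follows "dev". Lines without " dev" print nothing.
route_dev() {
  sed -nr 's,^.* dev\s+(\S+).*,\1,p'
}
```

For example: `ip -4 route get 8.8.8.8 | route_dev`.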
371
372
373 change /p/c/machine_specific/vps/bind-initial/db.b8.nz
374 faiserver 10.0.44.1
375 TARGET_HOSTNAME 10.0.44.2
376
377 apt install isc-dhcp-server
378
379 cat >> /etc/default/isc-dhcp-server <<'EOF'
380 INTERFACESv4="eth0"
381 EOF
382
383 edit ./dhcpd.conf to change mac address and target host name.
384
385 s cp /b/fai/dhcpd.conf /etc/dhcp/
386 ser restart isc-dhcp-server
387
388 edit /a/bin/fai/fai/config/class/51-multi-boot
389
390 pxe-server -d TARGET fai
391
392 Then do a pxe boot on the target host
393
394
395
396 ##### linode notes ######
397
398 * create 2 disks, installer (3000 mb, raw), boot (remaining, raw)
399 * create 2 profiles w direct boot, no helpers:
400 * installer (sda=boot, sdb=installer, boot dev=sdb)
401 * boot (sda=boot)
402 * Boot into rescue mode, ssh in with lish,
403 curl url_to_some_fai_cd_created_image | dd of=/dev/sda
404 poweroff
405 * boot into installer.
406 * Lish shows console, at the end of install, it gives prompt because
407 logs failed to save remotely, check the logs, then reboot into boot
408 profile if all is well. If that doesn't happen, turn off lassie in
409 settings.
410
411
412
413 ###### ubuntu notes ######
414
415 For someone who really needed ubuntu on host tp, otherwise they would
416 end up on a non-gnu os, and I didn't want to figure out how to get all
417 the default software installed, I did the following:
418
419 # On remote host:
420 # install etiona
421 cd /b/fai
422 # set 51-multi-boot to set classes outside of fai-wrapper conditional, including NOWIPE
423 . fai-wrapper
424 ./fai/config/hooks/partition.DEFAULT
425
426 # on remote host
427 # install ubuntu 20.04 using virt-install
428 sudo -i
429 qemu-img create -o preallocation=metadata -f qcow2 u2004.qcow2 15G
430 virt-install --os-variant=ubuntu16.04 --cdrom ubuntu-20.04-desktop-amd64.iso --disk path=u2004.qcow2 -r 2048 --vcpus 1 -n u2004
431 # alternatively, also tried a physical install, because I know the virtual install ends up
432 # with some different things, like some spice service. then pulled the data out with
433 rsync -ahSAX --numeric-ids --exclude=proc --exclude=sys --exclude=dev --exclude=tmp --exclude=run root@tp:/ .; mkdir proc sys dev tmp
434
435 modprobe nbd
436 qemu-nbd --connect=/dev/nbd0 u1804.qcow2 -f qcow2
437 qemu-nbd --connect=/dev/nbd0 u2004.qcow2 -f qcow2
438 mount /dev/nbd0p1 /mnt/1 # bionic
439 mount /dev/nbd0p5 /mnt/1 # focal
440 mount -o bind /mnt/root/root_ubuntubionic /mnt/2
441 mount -o bind /mnt/root/root_ubuntufocal /mnt/2
442 mkdir -p /mnt/2/boot
443 mount -o bind /mnt/boot/boot_ubuntubionic /mnt/2/boot
444 mount -o bind /mnt/boot/boot_ubuntufocal /mnt/2/boot
445 # S = sparse, A = acls, X = xattrs
446 rsync -ahSAX --numeric-ids /mnt/1/ /mnt/2
447
448 cd /mnt/2
449 cp /tmp/fai/crypttab etc
450 sed -i "s#/root/keyscript,#decrypt_keyctl,#" etc/crypttab
451 cp /tmp/fai/fstab etc
452 echo "tmpfs /tmp tmpfs nodev,nosuid,size=50%,mode=1777 0 0" >> etc/fstab
453 chrbind
454 chroot .
455 mv /etc/resolv.conf /etc/resolv.conf.old
456 echo nameserver 1.1.1.1 >/etc/resolv.conf
457 # install programs from /a/bin/fai/fai/config/package_config/STANDARD:
458 apt install -y openssh-client openssh-server cryptsetup keyutils btrfs-progs console-setup kbd pciutils usbutils unattended-upgrades initramfs-tools-core dropbear-initramfs
459 mv /etc/resolv.conf.old /etc/resolv.conf
460 exit
461 d=etc/initramfs-tools
462 mkdir -p $d/root/.ssh etc/dropbear-initramfs root/.ssh
463 chmod 700 $d/root $d/root/.ssh root/.ssh
464 cp -p /root/.ssh/authorized_keys $d/root/.ssh/authorized_keys
465 cp -p /root/.ssh/authorized_keys etc/dropbear-initramfs
466 cp -p /root/.ssh/authorized_keys root/.ssh/authorized_keys
467 chroot .
468 sed -ri 's/^ *GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.crypttab=no"/' /etc/default/grub
469 grub-install --no-floppy $(grub-probe -tdrive -d /dev/sda)
470 update-grub
471 grub-bios-setup -d /boot/grub/i386-pc -s /dev/sda
472 exit
473 umount proc
474 umount dev
475 umount sys
476 reboot
477
478 # for switching the boot to root2
479 zboot
480 # for switching back, efibootmgr, if there is a problem with the root filesystem detection,
481 # boot into the debian bootstrap distro, run partition.DEFAULT using comments for mktab arg.
482 # then manually run iboot and then reboot.
483
484
485 # pine rock64 notes
486 # the only useful image is ubuntu 18.04 from ayufan or something.
487 # using emmc usb:
488 s mount /dev/sdb7 /mnt/1
489 s cp `which qemu-arm-static` /mnt/1/usr/bin
490 s chroot /mnt/1 qemu-arm-static /bin/bash
491 usermod --login iank --move-home --home /home/iank rock64
492 groupmod --new-name iank rock64
493 passwd iank
494 # boot it
495 s apt-get update
496 s apt dist-upgrade
497
498
499 ### How to merge upstream fai-config
500
501 git checkout upstream
502 cd path-to-fai-config
503 git pull --stat
504 # the following needs modification if there were deletions or renames
505 rsync --exclude /.git -rlpgoDcvi . /b/fai/fai/config/
506 cd /b/fai/fai/config/
507 # where XXXXX is the git commit hash
508 # note, several files which just had trailing space changes will get ignored.
509 git commit -am "update upstream to XXXXX"
510 git checkout master
511 git merge upstream
512 # fix conflicts
513 git commit
514
515
516 # TODO
517 Rename arch to archlike, and make it support both arch and parabola
518
519
520 # License
521
522 The license for the project is GPLv2 or later, mostly because fai is and
523 I periodically merge the upstream example config, which contains small
524 scripts. Also, there is a modified encrypt.upstream, which is from the
525 cryptsetup package in arch, which is under the same license.