# This file is part of Ian Kelling's automated-distro-installer
# Copyright (C) 2024 Ian Kelling

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

PXE install with multi-boot, btrfs & Libreboot support

Some things are specific to my home network and use files with secrets
that are not in this repo. I use this for bare metal and vms. Two of the
scripts can run after boot, so I also use them on vps provider images.

Features people may find useful: it installs encrypted trisquel, debian,
ubuntu, arch, and parabola in a multi-boot setup using multiple
subvolumes of a single btrfs filesystem (the archlike install is likely
broken; I've only done pxe boots recently). It uses multiple disks, and
has scripts to automatically decrypt on intentional reboots, but not
after shutdown or power loss.

The normal install mode for fai is pxe, but a libreboot system has no
pxe: the pxe in a normal computer is nonfree firmware. Alternatives to
normal pxe that I've tried:

* libreboot + seabios + ipxe

* Use a live cd to call pxe-kexec; this is described later in this file.

* Use the fai autodiscover iso. This is more automated, so nicer.

* Use an install method above to set up a gnu/linux disk partition that
  coordinates with libreboot grub to act like a pxe boot using
  kexec. The boot process takes a bit longer than normal pxe. This is
  the bootstrap partition in my scripts.

Things I haven't tried:

* The bios chip has enough room for an initrd. This could be set up to
  work like the partition I use to kexec, but it would be faster and
  would not require installing to disk.

The partitioning and filesystem script is at
fai/config/hooks/partition.DEFAULT. Disks are grouped as ssd or hdd and
put in raid 1 or raid 0, depending on configuration. The base partitions
are boot, swap, and root (only boot is unencrypted). There are scripts
to resize those partitions after provisioning, while the system is
running.
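The ssd/hdd grouping can be illustrated with the kernel's rotational flag. This is a minimal sketch, not the code in partition.DEFAULT: it assumes sysfs is mounted at /sys (overridable through SYSFS, a hypothetical variable added only so the function can run against a fake tree) and only looks at whole disks under /sys/block.

```shell
#!/bin/bash
# Sketch only, not partition.DEFAULT: print "ssd NAME" or "hdd NAME" for each
# whole disk, using /sys/block/NAME/queue/rotational (0 = non-rotational).
# SYSFS is a hypothetical override so the function can be tested.
classify_disks() {
  local sysfs=${SYSFS:-/sys} dev name rot
  for dev in "$sysfs"/block/*; do
    name=${dev##*/}
    # skip virtual and optical devices that are never install targets
    case $name in
      loop*|ram*|zram*|sr*|nbd*) continue ;;
    esac
    [[ -r $dev/queue/rotational ]] || continue
    rot=$(<"$dev/queue/rotational")
    if [[ $rot == 0 ]]; then
      echo "ssd $name"
    else
      echo "hdd $name"
    fi
  done
}
```

The result only reflects what the kernel reports, so it is worth checking the output once against the actual hardware before relying on it for raid layout.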

People who use fai may find these things useful as examples: it uses
dnsmasq (on an openwrt machine) for dhcp instead of isc dhcp.
fai-wrapper is a small script to use basic fai classes outside of
fai. The partitioning does not use the fai partitioning tool, but the
script is inspired by it and works outside of fai. It supports running
a fai server on debian within android via Maru.

It also automates configuration of an openwrt router after manual
initial installation.

After provisioning is done, I sync files using btrfs, or unison for
vps, then automate further setup using a different set of scripts,
https://iankelling.org/git/?p=distro-setup;a=tree.

My network is a wndr3700v2 router with openwrt on it and a few pcs/laptops.

Since fai requires a debian server as the fai server, there are also
scripts to automate a debian install using pxe and preseeding, which can
be done from any distro.

Some of the scripts depend on simple, obvious utility scripts from
https://iankelling.org/git, and of course some hostnames are specific
to my network.


# Per-host/install configuration

Before doing a fai install, you will need to populate a class file. I
use one called 51-multi-boot; there is an example of it in
fai/config/class/50-host-classes.

Before doing a fai install, you will also need to populate /q/root/luks and
/q/root/shadow; see where the scripts reference them. You might also want
to copy the existing /etc/ssh/*host* files to
/p/c/machine_specific/HOST/filesystem/etc/ssh
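Copying the host keys can be wrapped in a small function. This is a hypothetical helper, not one of the repo's scripts; the source directory and the destination root are parameters (defaulting to /etc/ssh and /p/c/machine_specific from the text above) so it can be pointed at an old root filesystem or tested. Run it as root, since private host keys are only readable by root.

```shell
#!/bin/bash
# Hypothetical helper: save a host's ssh host keys into
# ROOT/HOST/filesystem/etc/ssh so a reinstall keeps the same host identity.
save_ssh_host_keys() {
  local host=$1 src=${2:-/etc/ssh} root=${3:-/p/c/machine_specific}
  local dest=$root/$host/filesystem/etc/ssh
  local keys=("$src"/*host*)
  # with no match the glob stays literal; stop instead of copying nothing
  if [[ ! -e ${keys[0]} ]]; then
    echo "no host keys found in $src" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # -p keeps the mode (0600 for private keys), ownership and timestamps
  cp -p "${keys[@]}" "$dest"/
}
```

Usage, for the current machine: `save_ssh_host_keys HOST`.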

host-* luks keyfiles are generated like this:
h=demohost; head -c 2048 /dev/urandom | od | se dd of=/q/root/luks/host-$h
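The same keyfile generation as a function, with two safety checks added: it refuses to overwrite an existing keyfile and creates the file without group or world access. This is a sketch, not a repo script, and the function name is hypothetical. Note it writes the 2048 random bytes directly, while the command above writes their od text dump; either works as LUKS key material.

```shell
#!/bin/bash
# Sketch: create DIR/host-HOST containing 2048 random bytes, mode 0400.
# Refuses to overwrite: replacing a keyfile that is already enrolled in a
# LUKS header could lock that disk out.
make_luks_keyfile() {
  local dir=$1 host=$2
  local f=$dir/host-$host
  if [[ -e $f ]]; then
    echo "refusing to overwrite existing keyfile: $f" >&2
    return 1
  fi
  # umask in a subshell so the file is never readable by group or others
  ( umask 077 && head -c 2048 /dev/urandom > "$f" ) || return 1
  chmod 400 "$f"
}
```

Usage: `make_luks_keyfile /q/root/luks demohost`.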

Configuration of which luks key to use is in
fai/config/hooks/partition.DEFAULT

Configuration of which (if any) shadow file to use is in
fai/config/distro-install-common/end,
and which shadow file / luks file(s) get copied into the new machine depends
on fai-redep arguments.

Also, set up dns in /p/c/host-info and firewall redirects in wrt-setup-local.

After install, use btrbk to set up data, and then run distro-begin && distro-end.
See notes in distro-begin for other configuration.

# Per distro install/config

./fai/config/package_config/CLASS.gpg

# Prerequisites:

<https://savannah.nongnu.org/git/?group=bash-bear-trap>
git clone https://git.savannah.nongnu.org/git/bash-bear-trap.git
sudo install -T bash-bear-trap/bash-bear /usr/local/lib/bash-bear


# Scripts (meant to be used directly):


# Set up the environment for the install

# create tiny autodiscover cd
# todo: with fai-revm at least, this complains about missing vmlinuz. need to fix this.
fai-redep && sudo fai-cd -g "$PWD/grub.cfg.autodiscover" -f -A "$BASEFILE_DIR/autodiscover.iso"
# create normal fai cd (replace TARGET_HOSTNAME)
fai-redep -t TARGET_HOSTNAME && sudo fai-cd -M -g "$PWD/grub.cfg.netinst-noreboot" -f "$BASEFILE_DIR/netinst.iso"
# note: you may need to set the hostname, depending on config,
# and some other things for an environment not on your lan,
# for example see fai/config/class/LINODE.var. See linode notes below.

mymk-basefile # Create basefiles for various distros
archlike-pxe # Set up pxe boot server from an archlike base image
fai-redep # Deploy fai configuration to host "faiserver.b8.nz"
faiserver-uninstall # uninstall fai-server
faiserver-setup # install fai-server on the current machine
myfai-chboot # set up fai tftp and nfs. useful for doing pxe-kexec or booting from a fai-cd.
pxe-server # disable/enable pxe dhcp, tftp, and nfs. calls myfai-chboot
wrt-setup # set up my router in general: dhcp, dns, etc.


# Scripts to do a distro install

faiserver-revm # using pxe & preseed, create a vm which is a fai server
dsfull # install & post-install a new fai distro
arch-init-remote # install arch after it's been booted into its setup env
live-kexec # Kexec this or a remote machine using host faiserver. also
           # useful to run as curl live-kexec|bash


# Test scripts

arch-revm # test arch install on a fresh vm
fai-revm # test fai install on a fresh vm


# Scripts to call after a distro install for various reasons

chboot # Set grub to boot into a different distro (installed earlier)
install-chboot # reinstall chboot to /boot subvols, for chboot updates.
eboot # reboot without automatic disk decryption
fai-wrapper # use fai classes outside of fai. sourced, not called.
faiserver-disable # Disable the fai nfs server exports
fresize # resize swap or boot partitions in a host


# NAT/forward/vpn tftp

I tried to get this working, but failed.

In theory, a tftp server can be forwarded over a vpn, e.g. over a wireguard tunnel.

However, I found that when actually pxe booting, it wouldn't work: only
the first filename would be requested, e.g., in the logs:

Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0


To get that far, NATing tftp requires some special attention in iptables:
the tftp server sends data from a new udp port rather than port 69, so
conntrack needs the tftp helper to relate it to the original request.
Based on:

https://unix.stackexchange.com/questions/579508/iptables-rules-to-forward-tftp-via-nat
iptables -t raw -A PREROUTING -p udp --dport 69 -s 209.51.188.0/24 -j CT --helper tftp
modprobe nf_nat_tftp

To test tftp from a client machine:

tftp SERVER_IP -c get pxelinux.0
rm -fv pxelinux.0


# Common problems

## Kernel mismatch: very early error, no remote logs

ERROR: the running kernel does not match the kernel modules inside the nfsroot.
ERROR: Kernel modules directory /lib/modules/5.10.0-8-amd not available. Only found /lib/modules/5.10.0-15-amd64

Solution: if running from a fai-cd, recreate the autodiscover cd as described in the setup section above.
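Before booting a client, this mismatch can be caught on the fai server by checking that the nfsroot has a module tree for the kernel the cd or pxe config boots. A hypothetical helper: the nfsroot default is the /srv/fai/nfsroot export seen in the nfs logs in this file, and the kernel version is passed in, e.g. taken from the vmlinuz-* file name the client boots.

```shell
#!/bin/bash
# Hypothetical check: does NFSROOT/lib/modules/KVER exist? On a mismatch,
# list the module trees that do exist, like the fai error above.
check_nfsroot_kernel() {
  local kver=$1 nfsroot=${2:-/srv/fai/nfsroot}
  local mods=$nfsroot/lib/modules
  if [[ -z $kver ]]; then
    echo "usage: check_nfsroot_kernel KERNEL_VERSION [NFSROOT]" >&2
    return 2
  fi
  if [[ -d $mods/$kver ]]; then
    echo "ok: $kver"
    return 0
  fi
  echo "mismatch: $mods/$kver not found; available:" >&2
  ls -1 "$mods" >&2
  return 1
}
```

Usage: `check_nfsroot_kernel 5.10.0-15-amd64`, with the version from the vmlinuz file the client requests (visible in the tftpd logs).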

## Weird package dependency errors

For example, in fai.log, within instsoft.DEBIAN:
```
The following packages have unmet dependencies:
libc6 : Breaks: locales (< 2.36) but 2.35-0ubuntu3.7+11.0trisquel1 is to be installed
```

In this case, it was because the basefile was missing, so fai
used the wrong basefile instead.

The missing basefile shows up like this, for example in fai.log, within instsoft.DEBIAN:

```
ftar: No matching class found in /var/lib/fai/config/basefiles//
ftar: extracting /var/tmp/base.tar.zst to /target/
```
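A quick way to catch this before installing is to check which of the host's classes actually have a basefile. This is a sketch under a stated assumption: that basefiles are named after a class with a .tar.* or .tgz suffix, which is how this helper (hypothetical, not part of fai) matches them. The class names in the test are placeholders.

```shell
#!/bin/bash
# Sketch: for each CLASS given, print "CLASS: FILE" for every basefile named
# CLASS.tar.* or CLASS.tgz in DIR. Returns 1 if no class has a basefile.
# The naming scheme is an assumption about how basefiles are matched.
find_basefiles() {
  local dir=$1 class f status=1
  shift
  for class in "$@"; do
    for f in "$dir/$class".tar.* "$dir/$class".tgz; do
      # an unmatched glob stays literal, so check the file really exists
      [[ -f $f ]] || continue
      echo "$class: ${f##*/}"
      status=0
    done
  done
  return $status
}
```

Run it against the basefiles directory of the config space with the classes the target host gets; an empty result matches the "No matching class found" case above.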

# What good logs look like:

To log nfs traffic on the server (turn it off again with rpcdebug -m nfsd -c all):

s rpcdebug -m nfsd -s all


Normal nfs mount & umount logs look like:

journalctl -ef | gr nfs

Jun 20 22:15:36 kd rpc.mountd[2025725]: authenticated mount request from 10.32.2.1:865 for /srv/fai/nfsroot (/srv/fai/nfsroot)
Jun 20 22:15:36 kd kernel: nfsd: exp_rootfh(/srv/fai/nfsroot [00000000e8c53e54] *:dm-0/5521225)
Jun 20 22:15:36 kd kernel: nfsd: fh_compose(exp 00:1b/5521225 fai/nfsroot, ino=5521225)
Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: PATHCONF(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:45 kd rpc.mountd[2025725]: authenticated unmount request from 10.32.2.1:986 for /srv/fai/nfsroot (/srv/fai/nfsroot)

Normal tftpd logs, after setting -vv in TFTP_OPTIONS in /etc/default/tftpd-hpa,
from:

journalctl -u tftpd-hpa

Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
Jun 20 23:51:02 kd in.tftpd[4021351]: RRQ from 10.2.0.12 filename ldlinux.c32
Jun 20 23:51:02 kd in.tftpd[4021352]: RRQ from 10.2.0.12 filename pxelinux.cfg/a913a477-fca6-234d-a928-6bb011decd05
Jun 20 23:51:02 kd in.tftpd[4021352]: sending NAK (1, File not found) to 10.2.0.12
Jun 20 23:51:02 kd in.tftpd[4021353]: RRQ from 10.2.0.12 filename pxelinux.cfg/01-52-54-00-9c-ef-ad
Jun 20 23:51:02 kd in.tftpd[4021353]: sending NAK (1, File not found) to 10.2.0.12
Jun 20 23:51:02 kd in.tftpd[4021354]: RRQ from 10.2.0.12 filename pxelinux.cfg/0A02000C
Jun 20 23:51:02 kd in.tftpd[4021355]: RRQ from 10.2.0.12 filename vmlinuz-5.10.0-15-amd64
Jun 20 23:51:03 kd in.tftpd[4021356]: RRQ from 10.2.0.12 filename initrd.img-5.10.0-15-amd64

The NAKs are expected: pxelinux tries config file names based on the
machine's uuid, then its mac address, then its ip address in hex
(0A02000C is 10.2.0.12), and uses the first one that exists.
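When comparing a failing boot against these logs (as in the NAT case, where only pxelinux.0 was requested), a per-file summary is easier to read. A sketch with a hypothetical function name, assuming the log format shown above; it pairs each NAK with its request by the in.tftpd[PID] tag, since each request above is handled by its own pid.

```shell
#!/bin/bash
# Sketch: read tftpd-hpa log lines (format as above) on stdin and print one
# line per requested file, "OK" or "NAK", in request order.
tftp_summary() {
  awk '
    match($0, /in\.tftpd\[[0-9]+]/) {
      pid = substr($0, RSTART, RLENGTH)
      if ($0 ~ / RRQ from /) {
        order[++n] = pid
        file[pid] = $NF
      } else if ($0 ~ /sending NAK/) {
        nak[pid] = 1
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        p = order[i]
        tag = (p in nak) ? "NAK" : "OK"
        printf "%-4s%s\n", tag, file[p]
      }
    }'
}
```

Usage: `journalctl -u tftpd-hpa --since today | tftp_summary`.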



# Replacing a raid 10 disk

pxe-server -S HOST fai

# btrfs replace or delete; prefer replace. To set up partitions on the replacement drive:
scp fai-wrapper HOST:
ssh root@HOST
. fai-wrapper
export SPECIAL_DISK=/dev/REPLACEMENT_DEV
/var/lib/fai/config/hooks/partition.DEFAULT


ssh root@HOST
for x in /target/* /target; do umount "$x"; done
# type the luks password, then ctrl-d twice, so no trailing newline is written
cat >/root/p
PASSWORD HERE(ctrl-d ctrl-d)
cd /dev/disk/by-id/
for d in ata*part1; do cryptsetup luksOpen -d /root/p "$d" "crypt_dev_$d"; done
x=(/dev/mapper/*part1); mount -o subvol=root_trisquelflidas "$x" /mnt
# btrfs fi show /mnt
# btrfs replace start -f /dev/mapper/OLD_DEV /dev/mapper/NEW_DEV /mnt
# btrfs replace status /mnt
# nohup btrfs dev delete /dev/sde1 /mnt
mount -o subvol=boot_trisquelflidas /dev/sda3 /mnt/boot
# also replace or delete disk for boot
for x in dev proc sys; do mount -o bind /$x /mnt/$x; done
chroot /mnt /bin/bash
# replace disk in fstab
# replace disk in /etc/crypttab
update-grub
update-initramfs -u
mount /a
/a/exe/keyscript-on
exit
reboot


# Expected output in fai logs


## On focal:

fai.log:updatebase.UBUNTU FAILED with exit code 1.
The real error is in dpkg-reconfigure locales. It seems to be related
to a workaround for < 20.04; the relevant comment in
config/hooks/instsoft.DEBIAN is:
# in case the locales are already included inside the base file (Ubuntu)


## On flidas:

When installing systemd, this error happens; it's a superfluous
upstream bug, based on reading the post install script:

addgroup: The group `systemd-journal' already exists as a system group. Exiting.
Operation failed: No such file or directory

## On nabia/newer:

The python package no longer exists; it's python3 now,
and it's easier to just let the package get removed than to
do host class package config:
fai.log:WARNING: These unknown packages are removed from the installation list: python python-minimal

Similarly, linux-image-amd64 is the debian package name
for the kernel and linux-image-generic is the ubuntu one, but the
DEBIAN class is also defined on ubuntu, and it's easier to just let
the package get removed with this warning:
fai.log:WARNING: These unknown packages are removed from the installation list: linux-image-amd64
Also, cryptsetup-initramfs is new in buster/nabia, so it gets removed
on earlier versions.

## parted error
fai.log:Error: /dev/vda: unrecognised disk label
This is from parted -m $d unit MiB print.
It happens when the disk has no partition table yet (e.g. a fresh disk).


######## notes on creating a lan with just 2 computers ########


## The below assumes eth0 is the ethernet device used to connect to the target computer.


# this is not strictly needed. My connection died at some point,
# and I suspected this might help.
# based on
# https://support.qacafe.com/knowledge-base/how-do-i-prevent-network-manager-from-controlling-an-interface/
cat > /etc/NetworkManager/conf.d/99-fai-tmp.conf <<'EOF'
[main]
plugins=keyfile

[keyfile]
unmanaged-devices=interface-name:eth0
EOF
ser restart NetworkManager


cat >> /etc/network/interfaces <<'EOF'
iface eth0 inet static
address 10.0.44.1/24
EOF

ifup eth0

# note: turn off the fsf vpn, so the route to coresite is the normal route.
echo 1 > /proc/sys/net/ipv4/ip_forward
m s iptables -t nat -A POSTROUTING -o "$(ip -4 route get 8.8.8.8 | sed -nr 's,^.* dev\s+(\S+).*,\1,p')" -j MASQUERADE
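The interface name in the MASQUERADE rule comes from `ip -4 route get` output. Here is the same extraction as a small function (hypothetical name) that reads that output on stdin, so it can be checked without a network; this awk version is an alternative to the sed expression above and prints the word that follows `dev`.

```shell
#!/bin/bash
# Sketch: print the outgoing interface from `ip -4 route get ADDR` output,
# i.e. the first word after "dev" (an alternative to the sed expression above).
route_dev() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "dev") { print $(i + 1); exit }
  }'
}
```

Usage, as root: `dev=$(ip -4 route get 8.8.8.8 | route_dev)`, then `iptables -t nat -A POSTROUTING -o "$dev" -j MASQUERADE`. Storing the name first makes it easy to check before the rule is added.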


Change /p/c/machine_specific/vps/bind-initial/db.b8.nz to have:
faiserver 10.0.44.1
TARGET_HOSTNAME 10.0.44.2

apt install isc-dhcp-server

cat >> /etc/default/isc-dhcp-server <<'EOF'
INTERFACESv4="eth0"
EOF

Edit ./dhcpd.conf to change the mac address and target host name.
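The repo's dhcpd.conf is not reproduced here. As an illustration only, a host declaration for the target in ISC dhcpd syntax can be generated like this, assuming the addresses from this two-computer setup (fai server 10.0.44.1, target 10.0.44.2) and the pxelinux.0 boot file seen in the tftpd logs; the function name and defaults are hypothetical. dhcpd also needs a subnet declaration for 10.0.44.0/24 before it will serve on eth0.

```shell
#!/bin/bash
# Hypothetical helper: print an ISC dhcpd host declaration that pins the
# target's mac address to an ip and points it at the tftp/fai server.
dhcpd_host_entry() {
  local name=$1 mac=$2 ip=${3:-10.0.44.2} server=${4:-10.0.44.1}
  cat <<EOF
host $name {
  hardware ethernet $mac;
  fixed-address $ip;
  next-server $server;
  filename "pxelinux.0";
}
EOF
}
```

Usage: `dhcpd_host_entry TARGET_HOSTNAME MAC_ADDRESS >> dhcpd.conf`, then review the file before copying it into /etc/dhcp.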

s cp /b/fai/dhcpd.conf /etc/dhcp/
ser restart isc-dhcp-server

Edit /a/bin/fai/fai/config/class/51-multi-boot

pxe-server -d TARGET fai

Then do a pxe boot on the target host.



##### linode notes ######

* Create 2 disks: installer (3000 mb, raw) and boot (remaining space, raw).
* Create 2 profiles with direct boot, no helpers:
  * installer (sda=boot, sdb=installer, boot dev=sdb)
  * boot (sda=boot)
* Boot into rescue mode, ssh in with lish, then run:
  curl url_to_some_fai_cd_created_image | dd of=/dev/sda
  poweroff
* Boot into the installer profile.
* Lish shows the console. At the end of the install, it gives a prompt
  because logs failed to save remotely. Check the logs, then reboot into
  the boot profile if all is well. If that doesn't happen, turn off lassie
  in settings.



###### ubuntu notes ######

For someone who really needed ubuntu on host tp (otherwise they would
end up on a non-gnu os), and since I didn't want to figure out how to get all
the default software installed, I did the following:

# On remote host:
# install etiona
cd /b/fai
# set 51-multi-boot to set classes outside of the fai-wrapper conditional, including NOWIPE
. fai-wrapper
./fai/config/hooks/partition.DEFAULT

# on remote host
# install ubuntu 20.04 using virt-install (the disk image must exist first)
sudo -i
qemu-img create -o preallocation=metadata -f qcow2 u2004.qcow2 15G
virt-install --os-variant=ubuntu16.04 --cdrom ubuntu-20.04-desktop-amd64.iso --disk path=u2004.qcow2 -r 2048 --vcpus 1 -n u2004
# alternatively, I also tried a physical install, because I know the virtual install ends up
# with some different things, like a spice service. Then I pulled the data out with the rsync below.
# A leading / anchors each exclude at the top of the transfer, so only /proc etc. are skipped.
rsync -ahSAX --numeric-ids --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp --exclude=/run root@tp:/ .; mkdir proc sys dev tmp run

modprobe nbd
# where a bionic (u1804) line is followed by a focal (u2004) line, run only the one you need
qemu-nbd --connect=/dev/nbd0 u1804.qcow2 -f qcow2
qemu-nbd --connect=/dev/nbd0 u2004.qcow2 -f qcow2
mount /dev/nbd0p1 /mnt/1 # bionic
mount /dev/nbd0p5 /mnt/1 # focal
mount -o bind /mnt/root/root_ubuntubionic /mnt/2
mount -o bind /mnt/root/root_ubuntufocal /mnt/2
mkdir -p /mnt/2/boot
mount -o bind /mnt/boot/boot_ubuntubionic /mnt/2/boot
mount -o bind /mnt/boot/boot_ubuntufocal /mnt/2/boot
# S = sparse, A = acls, X = xattrs
rsync -ahSAX --numeric-ids /mnt/1/ /mnt/2

cd /mnt/2
cp /tmp/fai/crypttab etc
sed -i "s#/root/keyscript,#decrypt_keyctl,#" etc/crypttab
cp /tmp/fai/fstab etc
echo "tmpfs /tmp tmpfs nodev,nosuid,size=50%,mode=1777 0 0" >> etc/fstab
chrbind
chroot .
mv /etc/resolv.conf /etc/resolv.conf.old
echo nameserver 1.1.1.1 >/etc/resolv.conf
# install programs from /a/bin/fai/fai/config/package_config/STANDARD:
apt install -y openssh-client openssh-server cryptsetup keyutils btrfs-progs console-setup kbd pciutils usbutils unattended-upgrades initramfs-tools-core dropbear-initramfs
mv /etc/resolv.conf.old /etc/resolv.conf
exit
d=etc/initramfs-tools
mkdir -p $d/root/.ssh etc/dropbear-initramfs root/.ssh
chmod 700 $d/root $d/root/.ssh root/.ssh
cp -p /root/.ssh/authorized_keys $d/root/.ssh/authorized_keys
cp -p /root/.ssh/authorized_keys etc/dropbear-initramfs
cp -p /root/.ssh/authorized_keys root/.ssh/authorized_keys
chroot .
sed -ri 's/^ *GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.crypttab=no"/' /etc/default/grub
grub-install --no-floppy $(grub-probe -tdrive -d /dev/sda)
update-grub
grub-bios-setup -d /boot/grub/i386-pc -s /dev/sda
exit
umount proc
umount dev
umount sys
reboot

# for switching the boot to root2
zboot
# for switching back, use efibootmgr. If there is a problem with the root filesystem detection,
# boot into the debian bootstrap distro, run partition.DEFAULT using the comments for the mktab arg,
# then manually run iboot and then reboot.


# pine rock64 notes
# the only useful image is ubuntu 18.04 from ayufan, or something like that.
# using emmc usb:
s mount /dev/sdb7 /mnt/1
s cp "$(command -v qemu-arm-static)" /mnt/1/usr/bin
s chroot /mnt/1 qemu-arm-static /bin/bash
usermod --login iank --move-home --home /home/iank rock64
groupmod --new-name iank rock64
passwd iank
# boot it
s apt-get update
s apt dist-upgrade


### How to merge upstream fai-config

git checkout upstream
cd path-to-fai-config
git pull --stat
# the following needs modification if there were deletions or renames
rsync --exclude /.git -rlpgoDcvi . /b/fai/fai/config/
cd /b/fai/fai/config/
# where XXXXX is the upstream git commit hash
# note: several files which only had trailing space changes will get ignored.
# git commit -a does not pick up new files, so git add any that rsync created.
git commit -am "update upstream to XXXXX"
git checkout master
git merge upstream
# fix conflicts
git commit


# TODO
Change arch to archlike and support both arch and parabola.


# License

The license for the project is GPLv2 or later, mostly because fai is, and
I periodically merge the upstream example config, which contains small
scripts. Also, there is a modified encrypt.upstream, which is from the
cryptsetup package in arch, which is under the same license.