1 # This file is part of Ian Kelling's automated-distro-installer
2 # Copyright (C) 2024 Ian Kelling
3
4 # This program is free software; you can redistribute it and/or
5 # modify it under the terms of the GNU General Public License
6 # as published by the Free Software Foundation; either version 2
7 # of the License, or (at your option) any later version.
8
9 # This program is distributed in the hope that it will be useful,
10 # but WITHOUT ANY WARRANTY; without even the implied warranty of
11 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 # GNU General Public License for more details.
13
14 # You should have received a copy of the GNU General Public License
15 # along with this program; if not, write to the Free Software
16 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
17
18 PXE install w multi-boot, btrfs & Libreboot support
19
20 Some things are specific to my home network, and use files with secrets
21 that are not in this repo. I use this for bare metal and vms; two of the
22 scripts can run post boot, so I use them on vps distributed images
23 as well.
24
25 Features people may find useful: installs encrypted trisquel, debian,
26 ubuntu, arch, and parabola (archlike install is likely broken, I've only
27 done pxe boots recently), in a multi-boot setup using multiple
28 subvolumes of a single btrfs filesystem. Utilizes multiple disks, with
29 scripts to automatically decrypt on intentional reboots, but not after
30 shutdown or power loss.
31
32 The normal install mode for fai is pxe, but on a libreboot system,
33 there is no pxe. The pxe in a normal computer is nonfree
34 firmware. Alternatives to normal pxe that I've tried:
35
36 * libreboot + seabios + ipxe
37
38 * Use a live cd to call pxe-kexec, this is described later in this file.
39
40 * Use the fai autodiscover iso. This is more automated, so nicer.
41
42 * Use an install method above to setup a gnu/linux disk partition that
43 coordinates with libreboot grub to act like a pxe boot using
44 kexec. The boot process takes a bit longer than normal pxe. This is
45 the bootstrap partition in my scripts.
46
47 Things I haven't tried:
48
49 * The bios chip has enough room for an initrd. This could be setup to
50 work like the partition I use to kexec, but it would be faster, and
51 not require installing to disk.
52
53 The partitioning and filesystem script is at
54 fai/config/hooks/partition.DEFAULT. Disks are grouped as ssd or hdd and
55 raided in raid 1 or raid 0 per configuration. The base partitions are
56 divided into boot, swap, and root, (only boot is unencrypted). There are
57 scripts to resize those partitions post-provision and while the system
58 is running.
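
As a rough illustration of the ssd/hdd grouping described above, here is a small sketch that classifies disks by the kernel's rotational flag. It is not taken from partition.DEFAULT, which may decide differently; the function name group_disks and the SYSFS_BLOCK override are hypothetical and exist only for this example.

```shell
# Hypothetical sketch: label each block device as ssd or hdd.
# Assumption: queue/rotational in sysfs is 0 for ssd and 1 for hdd.
# SYSFS_BLOCK is not a variable used by partition.DEFAULT; it only
# lets the function be pointed at a fake tree for testing.
group_disks() {
  local sysfs="${SYSFS_BLOCK:-/sys/block}" d name
  for d in "$sysfs"/*; do
    # Skip entries without a readable rotational flag.
    [ -r "$d/queue/rotational" ] || continue
    name=${d##*/}
    if [ "$(cat "$d/queue/rotational")" = 0 ]; then
      echo "ssd $name"
    else
      echo "hdd $name"
    fi
  done
}
```

On a real system this also lists virtual devices such as loop devices, so a real grouping step would need to filter those out.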
59
60 People who use fai may find these things as useful examples: it uses
61 dnsmasq (on an openwrt machine) for dhcp instead of the isc
62 dhcp. fai-wrapper is a small script to use basic fai classes outside of
63 fai. It does not use the fai partitioning tool, but the script is
64 inspired from it and works outside of fai. It supports running a fai
65 server on debian within android via Maru.
66
67 It also automates configuration of an openwrt router after manual
68 initial installation.
69
70 After provisioning is done, I sync files using btrfs, or unison for
71 vps, then automate further setup using a different set of scripts,
72 https://iankelling.org/git/?p=distro-setup;a=tree.
73
74 My network is a wndr3700v2 router with openwrt on it and a few pcs/laptops.
75
76 Since fai requires a debian server as the fai server, there are also
77 scripts to automate a debian install using pxe and preseeding, which can
78 be done from any distro.
79
80 Some of the scripts have dependencies for some simple obvious utility
81 scripts from https://iankelling.org/git, and of course there are some
82 hostnames that are specific to my network.
83
84
85 # Per-host/install configuration
86
87 Before doing a fai install, you will need to populate a class file. I
88 use one called 51-multi-boot, which you can see an example of in
89 fai/config/class/50-host-classes.
90
91 Before doing a fai install, you will need to populate /q/root/luks and
92 /q/root/shadow, see their references. You might also want to copy
93 existing /etc/ssh/*host* to
94 /p/c/machine_specific/HOST/filesystem/etc/ssh
95
96 host-* luks keyfiles generated like:
97 h=demohost; head -c 2048 /dev/urandom | od | se dd of=/q/root/luks/host-$h
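
The keyfile command above could be wrapped in a function along these lines. This is only a sketch: make_luks_keyfile and LUKS_DIR are hypothetical names, and the "se" wrapper from the original pipeline is omitted, so it has to run as a user who can write to the target directory.

```shell
# Sketch of the keyfile command above as a reusable function.
# Assumptions: make_luks_keyfile and LUKS_DIR are hypothetical names;
# the default directory is the one used in this file.
make_luks_keyfile() {
  local h="${1:-}" dir="${LUKS_DIR:-/q/root/luks}"
  if [ -z "$h" ]; then
    echo "usage: make_luks_keyfile HOST" >&2
    return 1
  fi
  # Subshell so the restrictive umask does not leak to the caller.
  # Same data source and od encoding as the original one-liner.
  ( umask 077; head -c 2048 /dev/urandom | od > "$dir/host-$h" )
}
```

Setting the umask before writing means the key is never briefly readable by other users.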
98
99 Configuration of which luks key to use is in
100 fai/config/hooks/partition.DEFAULT
101
102 Configuration of which (if any) shadow file to use is in
103 fai/config/distro-install-common/end
104 and which shadow file / luks file(s) to copy into the new machine depends
105 on fai-redep arguments.
106
107 Also, setup dns in bind and wrt-setup-local.
108
109 After install, btrbk to setup data, and then distro-begin && distro-end.
110 See notes in distro-begin for other configuration.
111
112 # Scripts (meant to be used directly):
113
114
115 # Setup the environment for the install
116
117 # create tiny autodiscover cd
118 # todo: with fai-revm at least, this complains about missing vmlinuz. need to fix this.
119 fai-redep && sudo fai-cd -g $PWD/grub.cfg.autodiscover -f -A $BASEFILE_DIR/autodiscover.iso
120 # create normal fai cd (replace TARGET_HOSTNAME)
121 fai-redep -t TARGET_HOSTNAME && sudo fai-cd -M -g $PWD/grub.cfg.netinst-noreboot -f $BASEFILE_DIR/netinst.iso
122 # note, may need to set hostname, depending on config,
123 # and some other things for environment not on your lan
124 # for example see fai/config/class/LINODE.var. See linode notes below.
125
126 mymk-basefile # Create basefiles for various distros
127 archlike-pxe # Setup pxe boot server from an archlike base image
128 fai-redep # Deploy fai configuration to host "faiserver"
129 faiserver-uninstall # uninstall fai-server
130 faiserver-setup # install fai-server on the current machine
131 myfai-chboot # setup fai tftp and nfs. useful for doing pxe-kexec
132 pxe-server # disable/enable pxe dhcp, tftp, and nfs. calls myfai-chboot
133 wrt-setup # setup my router in general: dhcp, dns, etc.
134
135
136 # Script to do a distro install
137
138 faiserver-revm # using pxe & preseed, create a vm which is a fai server
139 dsfull # install & post-install a new fai distro
140 arch-init-remote # install arch after it's been booted into its setup env
141 live-kexec # Kexec this or a remote machine using host faiserver. also
142 useful to run as curl live-kexec|bash
143
144
145 # Test scripts
146
147 arch-revm # test arch install on a fresh vm
148 fai-revm # test fai install on a fresh vm
149
150
151 # Scripts to call after a distro install for various reasons
152
153 chboot # Set grub to boot into a different distro (installed earlier)
154 install-chboot # reinstall chboot to /boot subvols, for chboot updates.
155 eboot # reboot without automatic disk decryption
156 fai-wrapper # use fai classes outside of fai. sourced, not called.
157 faiserver-disable # Disable the fai nfs server exports
158 fresize # resize swap or boot partitions in a host
159
160
161 # NAT/forward/vpn tftp
162
163 I tried to get this working, but failed.
164
165 tftp server in theory can be forwarded over a vpn, eg on a wireguard tunnel.
166
167 However, I found that when actually pxe booting, it wouldn't work: only
168 the 1st filename would be requested, eg, in the logs:
169
170 Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
171
172
173 To get that far, nating tftp requires some special attention in iptables, like so:
174
175 https://unix.stackexchange.com/questions/579508/iptables-rules-to-forward-tftp-via-nat
176 iptables -t raw -A PREROUTING -p udp --dport 69 -s 209.51.188.0/24 -j CT --helper tftp
177 modprobe nf_nat_tftp
178
179 to test tftp from a client machine:
180
181 tftp SERVER_IP -c get pxelinux.0
182 rm -fv pxelinux.0
183
184
185 # Common problems
186
187 ## kernel mismatch very early error, no remote logs:
188
189 ERROR: the running kernel does not match the kernel modules inside the nfsroot.
190 ERROR: Kernel modules directory /lib/modules/5.10.0-8-amd not available. Only found /lib/modules/5.10.0-15-amd64
191
192 solution: if running from fai-cd, recreate autodiscover cd as noted above in setup.
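
To spot this mismatch before booting a client, something like the following sketch could be run on the fai server. check_nfsroot_kernel is a hypothetical helper, not one of the scripts in this repo, and the default nfsroot path is the one that appears in the nfs logs in this file. Pass the kernel release that the client boots; the uname -r default is only meaningful when run on the client itself.

```shell
# Sketch: check that an nfsroot has a modules directory for a kernel.
# Assumptions: check_nfsroot_kernel is a hypothetical name; the default
# nfsroot path comes from the nfs logs in this file.
check_nfsroot_kernel() {
  local release="${1:-$(uname -r)}" nfsroot="${2:-/srv/fai/nfsroot}"
  if [ -d "$nfsroot/lib/modules/$release" ]; then
    echo "ok: $release"
  else
    echo "mismatch: $release not in $nfsroot/lib/modules, found:" >&2
    # List what is actually there, to compare against the error message.
    ls "$nfsroot/lib/modules" >&2
    return 1
  fi
}
```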
193
194 # What good logs look like:
195
196 logging nfs traffic from server
197
198 s rpcdebug -m nfsd -s all
199
200
201 normal nfs mount & umount logs look like:
202
203 journalctl -ef | gr nfs
204
205 Jun 20 22:15:36 kd rpc.mountd[2025725]: authenticated mount request from 10.32.2.1:865 for /srv/fai/nfsroot (/srv/fai/nfsroot)
206 Jun 20 22:15:36 kd kernel: nfsd: exp_rootfh(/srv/fai/nfsroot [00000000e8c53e54] *:dm-0/5521225)
207 Jun 20 22:15:36 kd kernel: nfsd: fh_compose(exp 00:1b/5521225 fai/nfsroot, ino=5521225)
208 Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
209 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
210 Jun 20 22:15:36 kd kernel: nfsd: PATHCONF(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
211 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
212 Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
213 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
214 Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
215 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
216 Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
217 Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
218 Jun 20 22:15:45 kd rpc.mountd[2025725]: authenticated unmount request from 10.32.2.1:986 for /srv/fai/nfsroot (/srv/fai/nfsroot)
219
220 normal tftpd logs look like:
221
222 after setting -vv in TFTP_OPTIONS in /etc/default/tftpd-hpa
223
224 journalctl -u tftpd-hpa
225
226 Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
227 Jun 20 23:51:02 kd in.tftpd[4021351]: RRQ from 10.2.0.12 filename ldlinux.c32
228 Jun 20 23:51:02 kd in.tftpd[4021352]: RRQ from 10.2.0.12 filename pxelinux.cfg/a913a477-fca6-234d-a928-6bb011decd05
229 Jun 20 23:51:02 kd in.tftpd[4021352]: sending NAK (1, File not found) to 10.2.0.12
230 Jun 20 23:51:02 kd in.tftpd[4021353]: RRQ from 10.2.0.12 filename pxelinux.cfg/01-52-54-00-9c-ef-ad
231 Jun 20 23:51:02 kd in.tftpd[4021353]: sending NAK (1, File not found) to 10.2.0.12
232 Jun 20 23:51:02 kd in.tftpd[4021354]: RRQ from 10.2.0.12 filename pxelinux.cfg/0A02000C
233 Jun 20 23:51:02 kd in.tftpd[4021355]: RRQ from 10.2.0.12 filename vmlinuz-5.10.0-15-amd64
234 Jun 20 23:51:03 kd in.tftpd[4021356]: RRQ from 10.2.0.12 filename initrd.img-5.10.0-15-amd64
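
The pxelinux.cfg names in the log above are derived from the client: first its UUID, then its MAC address prefixed with 01, then its IPv4 address in uppercase hex (pxelinux would go on to try shorter hex prefixes and then "default" if that also failed). The following sketch computes the MAC and IP names, which helps when deciding what to name a config file. The function names are hypothetical.

```shell
# Sketch: per-client config names that pxelinux requests.
# pxe_mac_name and pxe_ip_name are hypothetical helper names.
pxe_mac_name() {
  # 01 is the ARP hardware type for ethernet; the MAC is lowercased
  # and its colons become dashes.
  printf '01-%s\n' "$(echo "$1" | tr 'A-F:' 'a-f-')"
}
pxe_ip_name() {
  # Split the dotted quad on "." and print each octet as two
  # uppercase hex digits.
  local IFS=.
  set -- $1
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For the client in the log, pxe_ip_name 10.2.0.12 gives 0A02000C and pxe_mac_name 52:54:00:9c:ef:ad gives 01-52-54-00-9c-ef-ad, matching the requested filenames.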
235
236
237
238 # Replacing a raid 10 disk
239
240 pxe-server -S HOST fai
241
242 # btrfs replace or delete. prefer replace. to setup partitions on replacement drive:
243 scp fai-wrapper HOST:
244 ssh root@HOST
245 . fai-wrapper
246 export SPECIAL_DISK=/dev/REPLACEMENT_DEV
247 /var/lib/fai/config/hooks/partition.DEFAULT
248
249
250 ssh root@HOST
251 for x in /target/* /target; do umount $x; done
252 cat >p
253 PASSWORD HERE(ctrl-d ctrl-d)
254 cd /dev/disk/by-id/
255 for d in ata*part1; do cryptsetup luksOpen -d /root/p $d crypt_dev_$d; done
256 x=(/dev/mapper/*part1); mount -o subvol=root_trisquelflidas $x /mnt
257 # btrfs fi show /mnt
258 # btrfs replace start -f /dev/mapper/OLD_DEV /dev/mapper/NEW_DEV /mnt
259 # btrfs replace status /mnt
260 # nohup btrfs dev delete /dev/sde1 /mnt
261 mount -o subvol=boot_trisquelflidas /dev/sda3 /mnt/boot
262 # also replace or delete disk for boot
263 for x in dev proc sys; do mount -o bind /$x /mnt/$x; done
264 chroot /mnt /bin/bash
265 # replace disk in fstab
266 # replace disk in /etc/crypttab
267 update-grub
268 update-initramfs -u
269 mount /a
270 /a/exe/keyscript-on
271 exit
272 reboot
273
274
275 # Expected output in fai logs
276
277
278 ## On focal:
279
280 fai.log:updatebase.UBUNTU FAILED with exit code 1.
281 the real error is dpkg-reconfigure locales, seems to be related
282 to a workaround for < 20.04, relevant comment:
283 # in case the locales are already included inside the base file (Ubuntu)
284 in config/hooks/instsoft.DEBIAN
285
286
287 ## For flidas,
288
289 when installing systemd, this error happens, and it's
290 a superfluous upstream bug based on reading the post install script:
291
292 addgroup: The group `systemd-journal' already exists as a system group. Exiting.
293 Operation failed: No such file or directory
294
295 ## On nabia/newer,
296
297 python is removed, now it's python3,
298 and it's easier to just let the package get removed than
299 do host class package config.
300 fai.log:WARNING: These unknown packages are removed from the installation list: python python-minimal
301
302 Similar to python, linux-image-amd64 is the debian package name
303 for the kernel, linux-image-generic is for ubuntu, but the
304 DEBIAN class is defined on ubuntu and it's easier to just let
305 the package get removed with this warning:
306 fai.log:WARNING: These unknown packages are removed from the installation list: linux-image-amd64
307 Also, cryptsetup-initramfs is new to buster/nabia, it gets removed
308 on earlier versions.
309
310 ## parted error
311 fai.log:Error: /dev/vda: unrecognised disk label
312 This is from parted -m $d unit MiB print.
313 It happens when there are no partitions yet.
314
315
316 ######## notes on creating a lan with just 2 computers ########
317
318
319 ## below assumes eth0 is the ethernet device used to connect to the target computer.
320
321
322 # this is not strictly needed. I had my connection die at some point,
323 # and I suspected this might help.
324 # based on
325 # https://support.qacafe.com/knowledge-base/how-do-i-prevent-network-manager-from-controlling-an-interface/
326 cat > /etc/NetworkManager/conf.d/99-fai-tmp.conf <<'EOF'
327 [main]
328 plugins=keyfile
329
330 [keyfile]
331 unmanaged-devices=interface-name:eth0
332 EOF
333 ser restart NetworkManager
334
335
336 cat >> /etc/network/interfaces <<'EOF'
337 iface eth0 inet static
338 address 10.0.44.1/24
339 EOF
340
341 ifup eth0
342
343 # note turn off fsf vpn, so route to coresite is the normal route.
344 echo 1 > /proc/sys/net/ipv4/ip_forward
345 m s iptables -t nat -A POSTROUTING -o $(ip -4 route get 8.8.8.8 | sed -nr 's,^.* dev\s+(\S+).*,\1,p') -j MASQUERADE
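
The sed expression in the command above pulls the outbound interface name out of the "ip route get" output. A sketch of the same step, split out so it can be checked on its own before touching iptables, might look like the following. route_dev and masq_rule are hypothetical names; masq_rule only prints the rule instead of running it, and refuses to print one when no interface was found. Like the original, it relies on GNU sed.

```shell
# Sketch: extract the interface after " dev " from route text on stdin,
# using the same sed expression as the command above (GNU sed).
# route_dev and masq_rule are hypothetical names.
route_dev() {
  sed -nr 's,^.* dev\s+(\S+).*,\1,p'
}
# Print (not run) the MASQUERADE rule for the interface found on stdin.
masq_rule() {
  local dev
  dev=$(route_dev)
  # Without this guard, a failed parse would yield an empty -o argument.
  [ -n "$dev" ] || return 1
  echo "iptables -t nat -A POSTROUTING -o $dev -j MASQUERADE"
}
# Usage, not run here:
#   ip -4 route get 8.8.8.8 | masq_rule
```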
346
347
348 change /p/c/machine_specific/vps/bind-initial/db.b8.nz
349 faiserver 10.0.44.1
350 TARGET_HOSTNAME 10.0.44.2
351
352 apt install isc-dhcp-server
353
354 cat >> /etc/default/isc-dhcp-server <<'EOF'
355 INTERFACESv4="eth0"
356 EOF
357
358 edit ./dhcpd.conf to change mac address and target host name.
359
360 s cp /b/fai/dhcpd.conf /etc/dhcp/
361 ser restart isc-dhcp-server
362
363 edit /a/bin/fai/fai/config/class/51-multi-boot
364
365 pxe-server -d TARGET fai
366
367 Then do a pxe boot on the target host
368
369
370
371 ##### linode notes ######
372
373 * create 2 disks, installer (3000 mb, raw), boot (remaining, raw)
374 * create 2 profiles w direct boot, no helpers:
375 * installer (sda=boot, sdb=installer, boot dev=sdb)
376 * boot (sda=boot)
377 * Boot into rescue mode, ssh in with lish,
378 curl url_to_some_fai_cd_created_image | dd of=/dev/sda
379 poweroff
380 * boot into installer.
381 * Lish shows console, at the end of install, it gives prompt because
382 logs failed to save remotely, check the logs, then reboot into boot
383 profile if all is well. If that doesn't happen, turn off lassie in
384 settings.
385
386
387
388 ###### ubuntu notes ######
389
390 For someone who really needed ubuntu on host tp, otherwise they would
391 end up on a non-gnu os, and I didn't want to figure out how to get all
392 the default software installed, I did the following:
393
394 # On remote host:
395 # install etiona
396 cd /b/fai
397 # set 51-multi-boot to set classes outside of fai-wrapper conditional, including NOWIPE
398 . fai-wrapper
399 ./fai/config/hooks/partition.DEFAULT
400
401 # on remote host
402 # install ubuntu 20.04 using virt-install
403 sudo -i
404 qemu-img create -o preallocation=metadata -f qcow2 u2004.qcow2 15G
405 virt-install --os-variant=ubuntu16.04 --cdrom ubuntu-20.04-desktop-amd64.iso --disk path=u2004.qcow2 -r 2048 --vcpus 1 -n u2004
406 # alternatively, also tried a physical install, because I know the virtual install ends up
407 # with some different things, like some spice service. then pulled the data out with
408 rsync -ahSAX --numeric-ids --exclude=proc --exclude=sys --exclude=dev --exclude=tmp --exclude=run root@tp:/ .; mkdir proc sys dev tmp
409
410 modprobe nbd
411 qemu-nbd --connect=/dev/nbd0 u1804.qcow2 -f qcow2
412 qemu-nbd --connect=/dev/nbd0 u2004.qcow2 -f qcow2
413 mount /dev/nbd0p1 /mnt/1 # bionic
414 mount /dev/nbd0p5 /mnt/1 # focal
415 mount -o bind /mnt/root/root_ubuntubionic /mnt/2
416 mount -o bind /mnt/root/root_ubuntufocal /mnt/2
417 mkdir -p /mnt/2/boot
418 mount -o bind /mnt/boot/boot_ubuntubionic /mnt/2/boot
419 mount -o bind /mnt/boot/boot_ubuntufocal /mnt/2/boot
420 # S = sparse, A = acls, X = xattrs
421 rsync -ahSAX --numeric-ids /mnt/1/ /mnt/2
422
423 cd /mnt/2
424 cp /tmp/fai/crypttab etc
425 sed -i "s#/root/keyscript,#decrypt_keyctl,#" etc/crypttab
426 cp /tmp/fai/fstab etc
427 echo "tmpfs /tmp tmpfs nodev,nosuid,size=50%,mode=1777 0 0" >> etc/fstab
428 chrbind
429 chroot .
430 mv /etc/resolv.conf /etc/resolv.conf.old
431 echo nameserver 1.1.1.1 >/etc/resolv.conf
432 # install programs from /a/bin/fai/fai/config/package_config/STANDARD:
433 apt install -y openssh-client openssh-server cryptsetup keyutils btrfs-progs console-setup kbd pciutils usbutils unattended-upgrades initramfs-tools-core dropbear-initramfs
434 mv /etc/resolv.conf.old /etc/resolv.conf
435 exit
436 d=etc/initramfs-tools
437 mkdir -p $d/root/.ssh etc/dropbear-initramfs root/.ssh
438 chmod 700 $d/root $d/root/.ssh root/.ssh
439 cp -p /root/.ssh/authorized_keys $d/root/.ssh/authorized_keys
440 cp -p /root/.ssh/authorized_keys etc/dropbear-initramfs
441 cp -p /root/.ssh/authorized_keys root/.ssh/authorized_keys
442 chroot .
443 sed -ri 's/^ *GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.crypttab=no"/' /etc/default/grub
444 grub-install --no-floppy $(grub-probe -tdrive -d /dev/sda)
445 update-grub
446 grub-bios-setup -d /boot/grub/i386-pc -s /dev/sda
447 exit
448 umount proc
449 umount dev
450 umount sys
451 reboot
452
453 # for switching the boot to root2
454 zboot
455 # for switching back, efibootmgr, if there is a problem with the root filesystem detection,
456 # boot into the debian bootstrap distro, run partition.DEFAULT using comments for mktab arg.
457 # then manually run iboot and then reboot.
458
459
460 # pine rock64 notes
461 # the only useful image is ubuntu 18.04 ayafun or something.
462 # using emmc usb:
463 s mount /dev/sdb7 /mnt/1
464 s cp `which qemu-arm-static` /mnt/1/usr/bin
465 s chroot /mnt/1 qemu-arm-static /bin/bash
466 usermod --login iank --move-home --home /home/iank rock64
467 groupmod --new-name iank rock64
468 passwd iank
469 # boot it
470 s apt-get update
471 s apt dist-upgrade
472
473
474 ### How to merge upstream fai-config
475
476 git checkout upstream
477 cd path-to-fai-config
478 git pull --stat
479 # the following needs modification if there were deletions or renames
480 rsync --exclude /.git -rlpgoDcvi . /b/fai/fai/config/
481 cd /b/fai/fai/config/
482 # where XXXXX is the git commit hash
483 # note, several files which just had trailing space changes will get ignored.
484 git commit -am "update upstream to XXXXX"
485 git checkout master
486 git merge upstream
487 # fix conflicts
488 git commit
489
490
491 # TODO
492 Change arch to archlike, to support both arch and parabola
493
494
495 # License
496
497 The license for the project is GPLv2 or later, mostly because fai is and
498 I periodically merge the upstream example config, which contains small
499 scripts. Also, there is a modified encrypt.upstream, which is from the
500 cryptsetup package in arch, which is under the same license.