PXE install w multi-boot, btrfs & Libreboot support

Some things are specific to my home network, and use files with secrets
that are not in this repo. I use this for bare metal and vms, and two of
the scripts can run post boot, so I use them on vps distributed images
as well.

Features people may find useful: installs encrypted trisquel, debian,
ubuntu, arch, and parabola (the archlike install is likely broken, I've
only done pxe boots recently), in a multi-boot setup using multiple
subvolumes of a single btrfs filesystem. It utilizes multiple disks, with
scripts to automatically decrypt on intentional reboots, but not after
shutdown or power loss.

Normal install mode for fai is using pxe, but on a libreboot system,
there is no pxe. The pxe in a normal computer is nonfree
firmware. Alternatives to normal pxe that I've tried:

* libreboot + seabios + ipxe

* Use a live cd to call pxe-kexec. This is described later in this file.

* Use the fai autodiscover iso. This is more automated, so nicer.

* Use an install method above to set up a gnu/linux disk partition that
  coordinates with libreboot grub to act like a pxe boot using
  kexec. The boot process takes a bit longer than normal pxe. This is
  the bootstrap partition in my scripts.

Things I haven't tried:

* The bios chip has enough room for an initrd. This could be set up to
  work like the partition I use to kexec, but it would be faster, and
  not require installing to disk.

The partitioning and filesystem script is at
fai/config/hooks/partition.DEFAULT. Disks are grouped as ssd or hdd and
raided in raid 1 or raid 0 per configuration. The base partitions are
divided into boot, swap, and root (only boot is unencrypted). There are
scripts to resize those partitions post-provision and while the system
is running.
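
As an illustration of the ssd/hdd grouping, here is a minimal sketch that
classifies a disk from the kernel's rotational flag in sysfs. This is an
assumption about one way to do it, not necessarily what partition.DEFAULT
does; the function name and the optional sysfs root argument are made up
for the example.

```shell
# Print "ssd" or "hdd" for a block device name such as sda.
# $2 is a hypothetical override of the sysfs root, handy for trying
# the function against a fake directory tree; it defaults to /sys.
disk_kind() {
  dev=$1
  flag_file="${2:-/sys}/block/$dev/queue/rotational"
  if [ ! -r "$flag_file" ]; then
    echo unknown
    return 1
  fi
  # The kernel writes 0 for non-rotational media and 1 for rotational.
  read -r flag < "$flag_file"
  if [ "$flag" = 0 ]; then
    echo ssd
  else
    echo hdd
  fi
}
```

For example, `disk_kind sda` reads /sys/block/sda/queue/rotational.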

People who use fai may find these things useful as examples: it uses
dnsmasq (on an openwrt machine) for dhcp instead of the isc
dhcp. fai-wrapper is a small script to use basic fai classes outside of
fai. This repo does not use the fai partitioning tool, but its
partitioning script is inspired by it and works outside of fai. It
supports running a fai server on debian within android via Maru.

It also automates configuration of an openwrt router after manual
initial installation.

After provisioning is done, I sync files using btrfs, or unison for
vps, then automate further setup using a different set of scripts,
https://iankelling.org/git/?p=distro-setup;a=tree.

My network is a wndr3700v2 router with openwrt on it and a few pcs/laptops.

Since fai requires a debian server as the fai server, there are also
scripts to automate a debian install using pxe and preseeding, which can
be done from any distro.

Some of the scripts depend on some simple, obvious utility
scripts from https://iankelling.org/git, and of course there are some
hostnames that are specific to my network.


# Per-host/install configuration

Before doing a fai install, you will need to populate a class file. I
use one called 51-multi-boot, which you can see an example of in
fai/config/class/50-host-classes.
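
For readers new to fai: scripts in the class directory print class names
on stdout, and fai collects them. The sketch below shows the general
shape of such a script. The hostnames and the class choices are
hypothetical, and this is not the content of 51-multi-boot; FAIBASE comes
from the upstream fai example config, while DEBIAN, LINODE and NOWIPE are
class names mentioned elsewhere in this file.

```shell
# Print the fai classes for a hostname (hypothetical mapping).
host_classes() {
  case $1 in
    demohost) echo "FAIBASE DEBIAN" ;;         # bare metal example
    demovps)  echo "FAIBASE DEBIAN LINODE" ;;  # vps example
    demokeep) echo "FAIBASE DEBIAN NOWIPE" ;;  # keep existing data
    *)        echo "FAIBASE" ;;
  esac
}

# In a real class script this would be called with the target's hostname:
# host_classes "$HOSTNAME"
```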

You will also need to populate /q/root/luks and
/q/root/shadow, see their references. You might also want to copy
existing /etc/ssh/*host* to
/p/c/machine_specific/HOST/filesystem/etc/ssh

host-* luks keyfiles are generated like:
head -c 2048 /dev/urandom | od | s dd of=/q/root/luks/host-demohost
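
The same keyfile generation can be wrapped in a small function that
refuses to overwrite an existing key and creates the file readable only
by its owner. This is a sketch, not part of the repo: the function name
is made up, and it writes directly instead of going through `s dd`
(assuming `s` is a run-as-root helper), so run it as a user that can
write to the target directory.

```shell
# Write a new luks keyfile to the path given as $1.
gen_luks_keyfile() {
  out=$1
  if [ -e "$out" ]; then
    echo "refusing to overwrite $out" >&2
    return 1
  fi
  (
    # Subshell so the restrictive umask does not leak to the caller.
    umask 077
    # Same data source and od text encoding as the one-liner above.
    head -c 2048 /dev/urandom | od > "$out"
  )
}
```

For example: `gen_luks_keyfile /q/root/luks/host-demohost`.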

Configuration of which luks key to use is in
fai/config/hooks/partition.DEFAULT

Configuration of which (if any) shadow file to use is in
fai/config/distro-install-common/end
and which shadow file / luks file(s) to copy into the new machine depends
on fai-redep arguments.

# Scripts (meant to be used directly):


# Setup the environment for the install

# create tiny autodiscover cd
# todo: with fai-revm at least, this complains about missing vmlinuz. need to fix this.
fai-redep && sudo fai-cd -g $PWD/grub.cfg.autodiscover -f -A $BASEFILE_DIR/autodiscover.iso
# create normal fai cd (replace TARGET_HOSTNAME)
fai-redep -t TARGET_HOSTNAME && sudo fai-cd -M -g $PWD/grub.cfg.netinst-noreboot -f $BASEFILE_DIR/netinst.iso
# note, may need to set hostname, depending on config,
# and some other things for environment not on your lan
# for example see fai/config/class/LINODE.var. See linode notes below.

mymk-basefile        # Create basefiles for various distros
archlike-pxe         # Setup pxe boot server from an archlike base image
fai-redep            # Deploy fai configuration to host "faiserver"
faiserver-uninstall  # uninstall fai-server
faiserver-setup      # install fai-server on the current machine
myfai-chboot         # setup fai tftp and nfs. useful for doing pxe-kexec
pxe-server           # disable/enable pxe dhcp, tftp, and nfs. calls myfai-chboot
wrt-setup            # setup my router in general: dhcp, dns, etc.


# Script to do a distro install

faiserver-revm       # using pxe & preseed, create a vm which is a fai server
dsfull               # install & post-install a new fai distro
arch-init-remote     # install arch after it's been booted into its setup env
live-kexec           # Kexec this or a remote machine using host faiserver. also
                     # useful to run as curl live-kexec|bash


# Test scripts

arch-revm            # test arch install on a fresh vm
fai-revm             # test fai install on a fresh vm


# Scripts to call after a distro install for various reasons

chboot               # Set grub to boot into a different distro (installed earlier)
install-chboot       # reinstall chboot to /boot subvols, for chboot updates.
eboot                # reboot without automatic disk decryption
fai-wrapper          # use fai classes outside of fai. sourced, not called.
faiserver-disable    # Disable the fai nfs server exports
fresize              # resize swap or boot partitions in a host


# NAT/forward/vpn tftp

I tried to get this working, but failed.

A tftp server can in theory be forwarded over a vpn, eg on a wireguard tunnel.

However, I found that when actually pxe booting, it wouldn't work: only
the 1st filename would be requested, eg, in the logs:

Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0


To get that far, NATing tftp requires some special attention in iptables, like so:

https://unix.stackexchange.com/questions/579508/iptables-rules-to-forward-tftp-via-nat
iptables -t raw -A PREROUTING -p udp --dport 69 -s 209.51.188.0/24 -j CT --helper tftp
modprobe nf_nat_tftp

to test tftp from a client machine:

tftp SERVER_IP -c get pxelinux.0
rm -fv pxelinux.0
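
To see how far a pxe client got, it can help to reduce the tftpd log to
just the requested filenames. The following is a small sketch (the
function name is made up) that reads log lines in the format shown above
from stdin. A boot that stalls behind NAT would list only pxelinux.0,
while a working boot goes on to request the config, kernel and initrd.

```shell
# Print the filename of every tftp read request (RRQ) found on stdin.
tftp_requested_files() {
  sed -n 's/.*RRQ from [^ ]* filename \([^ ]*\).*/\1/p'
}
```

For example: `journalctl -u tftpd-hpa | tftp_requested_files`. This
assumes tftpd is logging requests at all, which needs the verbose option
described under "What good logs look like".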


# Common problems

## kernel mismatch very early error, no remote logs:

ERROR: the running kernel does not match the kernel modules inside the nfsroot.
ERROR: Kernel modules directory /lib/modules/5.10.0-8-amd not available. Only found /lib/modules/5.10.0-15-amd64

solution: if running from fai-cd, recreate autodiscover cd as noted above in setup.
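
A quick way to confirm this kind of mismatch on the server is to check
whether the nfsroot has a modules directory for the kernel release the
client booted. The function below is a sketch with a made-up name; the
nfsroot path in the usage line is the one that appears in the nfs logs in
this file.

```shell
# Succeed if nfsroot $1 contains kernel modules for release $2.
# $2 defaults to the kernel of the machine running the check.
nfsroot_has_kernel() {
  nfsroot=$1
  release=${2:-$(uname -r)}
  if [ -d "$nfsroot/lib/modules/$release" ]; then
    echo "ok: $release"
  else
    echo "missing: $release"
    return 1
  fi
}
```

For example: `nfsroot_has_kernel /srv/fai/nfsroot 5.10.0-15-amd64`.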

# What good logs look like:

To log nfs traffic on the server:

s rpcdebug -m nfsd -s all


normal nfs mount & umount logs look like:

journalctl -ef | gr nfs

Jun 20 22:15:36 kd rpc.mountd[2025725]: authenticated mount request from 10.32.2.1:865 for /srv/fai/nfsroot (/srv/fai/nfsroot)
Jun 20 22:15:36 kd kernel: nfsd: exp_rootfh(/srv/fai/nfsroot [00000000e8c53e54] *:dm-0/5521225)
Jun 20 22:15:36 kd kernel: nfsd: fh_compose(exp 00:1b/5521225 fai/nfsroot, ino=5521225)
Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: PATHCONF(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: FSINFO(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:36 kd kernel: nfsd: GETATTR(3) 28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000
Jun 20 22:15:36 kd kernel: nfsd: fh_verify(28: 00070001 00543f49 00000000 d185f7b0 58d1a3c6 00000000)
Jun 20 22:15:45 kd rpc.mountd[2025725]: authenticated unmount request from 10.32.2.1:986 for /srv/fai/nfsroot (/srv/fai/nfsroot)

normal tftpd logs, after setting -vv in TFTP_OPTIONS in /etc/default/tftpd-hpa:

journalctl -u tftpd-hpa

Jun 20 23:51:02 kd in.tftpd[4021350]: RRQ from 10.2.0.12 filename pxelinux.0
Jun 20 23:51:02 kd in.tftpd[4021351]: RRQ from 10.2.0.12 filename ldlinux.c32
Jun 20 23:51:02 kd in.tftpd[4021352]: RRQ from 10.2.0.12 filename pxelinux.cfg/a913a477-fca6-234d-a928-6bb011decd05
Jun 20 23:51:02 kd in.tftpd[4021352]: sending NAK (1, File not found) to 10.2.0.12
Jun 20 23:51:02 kd in.tftpd[4021353]: RRQ from 10.2.0.12 filename pxelinux.cfg/01-52-54-00-9c-ef-ad
Jun 20 23:51:02 kd in.tftpd[4021353]: sending NAK (1, File not found) to 10.2.0.12
Jun 20 23:51:02 kd in.tftpd[4021354]: RRQ from 10.2.0.12 filename pxelinux.cfg/0A02000C
Jun 20 23:51:02 kd in.tftpd[4021355]: RRQ from 10.2.0.12 filename vmlinuz-5.10.0-15-amd64
Jun 20 23:51:03 kd in.tftpd[4021356]: RRQ from 10.2.0.12 filename initrd.img-5.10.0-15-amd64
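
The three pxelinux.cfg requests in the tftpd log follow the pxelinux
config lookup order: the client UUID, then 01- plus the MAC address, then
the client IP in uppercase hex. The two NAKs are expected, since only the
hex IP file exists here. Below is a minimal sketch (function names are
made up) that derives the MAC and IP based names, which can help when
working out what a per-host config file should be called.

```shell
# Config name from a MAC address: "01-" (ethernet hardware type)
# followed by the address in lowercase with dashes.
pxe_mac_name() {
  printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# Config name from a dotted IPv4 address: four octets as uppercase hex.
pxe_ip_name() {
  old_ifs=$IFS
  IFS=.
  # Deliberately unquoted: split the address into its octets.
  set -- $1
  IFS=$old_ifs
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

With the client from the log, `pxe_ip_name 10.2.0.12` prints 0A02000C and
`pxe_mac_name 52:54:00:9c:ef:ad` prints 01-52-54-00-9c-ef-ad.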



# Replacing a raid 10 disk

pxe-server -S HOST fai

# btrfs replace or delete. prefer replace. to setup partitions on replacement drive:
scp fai-wrapper HOST:
ssh root@HOST
. fai-wrapper
export SPECIAL_DISK=/dev/REPLACEMENT_DEV
/var/lib/fai/config/hooks/partition.DEFAULT


ssh root@HOST
for x in /target/* /target; do umount $x; done
cat >p
PASSWORD HERE(ctrl-d ctrl-d)
cd /dev/disk/by-id/
for d in ata*part1; do cryptsetup luksOpen -d /root/p $d crypt_dev_$d; done
x=(/dev/mapper/*part1); mount -o subvol=root_trisquelflidas $x /mnt
# btrfs fi show /mnt
# btrfs replace start -f /dev/mapper/OLD_DEV /dev/mapper/NEW_DEV /mnt
# btrfs replace status /mnt
# nohup btrfs dev delete /dev/sde1 /mnt
mount -o subvol=boot_trisquelflidas /dev/sda3 /mnt/boot
# also replace or delete disk for boot
for x in dev proc sys; do mount -o bind /$x /mnt/$x; done
chroot /mnt /bin/bash
# replace disk in fstab
# replace disk in /etc/crypttab
update-grub
update-initramfs -u
mount /a
/a/exe/keyscript-on
exit
reboot


# Expected output in fai logs


## On focal:

fai.log:updatebase.UBUNTU FAILED with exit code 1.
the real error is dpkg-reconfigure locales, seems to be related
to a workaround for < 20.04, relevant comment:
# in case the locales are already included inside the base file (Ubuntu)
in config/hooks/instsoft.DEBIAN


## For flidas,

when installing systemd, this error happens, and it's
a superfluous upstream bug based on reading the post install script:

addgroup: The group `systemd-journal' already exists as a system group. Exiting.
Operation failed: No such file or directory

## On nabia/newer,

python is removed, now it's python3,
and it's easier to just let the package get removed than
do host class package config.
fai.log:WARNING: These unknown packages are removed from the installation list: python python-minimal

Similar to python, linux-image-amd64 is the debian package name
for the kernel, linux-image-generic is for ubuntu, but the
DEBIAN class is defined on ubuntu and it's easier to just let
the package get removed with this warning:
fai.log:WARNING: These unknown packages are removed from the installation list: linux-image-amd64
Also, cryptsetup-initramfs is new to buster/nabia, it gets removed
on earlier versions.

## parted error
fai.log:Error: /dev/vda: unrecognised disk label
This is from parted -m $d unit MiB print.
It happens when there are no partitions yet.

# linode notes

* create 2 disks, installer (3000 mb, raw), boot (remaining, raw)
* create 2 profiles w direct boot, no helpers:
  * installer (sda=boot, sdb=installer, boot dev=sdb)
  * boot (sda=boot)
* Boot into rescue mode, ssh in with lish,
  curl url_to_some_fai_cd_created_image | dd of=/dev/sda
  poweroff
* boot into installer.
* Lish shows console, at the end of install, it gives prompt because
  logs failed to save remotely, check the logs, then reboot into boot
  profile if all is well. If that doesn't happen, turn off lassie in
  settings.


# ubuntu notes

Someone really needed ubuntu on host tp (otherwise they would
end up on a non-gnu os), and I didn't want to figure out how to get all
the default software installed, so I did the following:

# On remote host:
# install etiona
cd /b/fai
# set 51-multi-boot to set classes outside of fai-wrapper conditional, including NOWIPE
. fai-wrapper
./fai/config/hooks/partition.DEFAULT

# on remote host
# install ubuntu 20.04 using virt-install
sudo -i
# create the disk image first, so that virt-install has something to install to
qemu-img create -o preallocation=metadata -f qcow2 u2004.qcow2 15G
virt-install --os-variant=ubuntu16.04 --cdrom ubuntu-20.04-desktop-amd64.iso --disk path=u2004.qcow2 -r 2048 --vcpus 1 -n u2004
# alternatively, also tried a physical install, because I know the virtual install ends up
# with some different things, like some spice service. then pulled the data out with
rsync -ahSAX --numeric-ids --exclude=proc --exclude=sys --exclude=dev --exclude=tmp --exclude=run root@tp:/ .; mkdir proc sys dev tmp

modprobe nbd
# in each pair of lines below, run only the one for the release being copied
qemu-nbd --connect=/dev/nbd0 u1804.qcow2 -f qcow2
qemu-nbd --connect=/dev/nbd0 u2004.qcow2 -f qcow2
mount /dev/nbd0p1 /mnt/1 # bionic
mount /dev/nbd0p5 /mnt/1 # focal
mount -o bind /mnt/root/root_ubuntubionic /mnt/2
mount -o bind /mnt/root/root_ubuntufocal /mnt/2
mkdir -p /mnt/2/boot
mount -o bind /mnt/boot/boot_ubuntubionic /mnt/2/boot
mount -o bind /mnt/boot/boot_ubuntufocal /mnt/2/boot
# S = sparse, A = acls, X = xattrs
rsync -ahSAX --numeric-ids /mnt/1/ /mnt/2

cd /mnt/2
cp /tmp/fai/crypttab etc
sed -i "s#/root/keyscript,#decrypt_keyctl,#" etc/crypttab
cp /tmp/fai/fstab etc
echo "tmpfs /tmp tmpfs nodev,nosuid,size=50%,mode=1777 0 0" >> etc/fstab
chrbind
chroot .
mv /etc/resolv.conf /etc/resolv.conf.old
echo nameserver 1.1.1.1 >/etc/resolv.conf
# install programs from /a/bin/fai/fai/config/package_config/STANDARD:
apt install -y openssh-client openssh-server cryptsetup keyutils btrfs-progs console-setup kbd pciutils usbutils unattended-upgrades initramfs-tools-core dropbear-initramfs
mv /etc/resolv.conf.old /etc/resolv.conf
exit
d=etc/initramfs-tools
mkdir -p $d/root/.ssh etc/dropbear-initramfs root/.ssh
chmod 700 $d/root $d/root/.ssh root/.ssh
cp -p /root/.ssh/authorized_keys $d/root/.ssh/authorized_keys
cp -p /root/.ssh/authorized_keys etc/dropbear-initramfs
cp -p /root/.ssh/authorized_keys root/.ssh/authorized_keys
chroot .
sed -ri 's/^ *GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.crypttab=no"/' /etc/default/grub
grub-install --no-floppy $(grub-probe -tdrive -d /dev/sda)
update-grub
grub-bios-setup -d /boot/grub/i386-pc -s /dev/sda
exit
umount proc
umount dev
umount sys
reboot

# for switching the boot to root2
zboot
# for switching back, use efibootmgr. if there is a problem with the root filesystem detection,
# boot into the debian bootstrap distro, run partition.DEFAULT using comments for mktab arg.
# then manually run iboot and then reboot.


# pine rock64 notes
# the only useful image is ubuntu 18.04 ayafun or something.
# using emmc usb:
s mount /dev/sdb7 /mnt/1
s cp `which qemu-arm-static` /mnt/1/usr/bin
s chroot /mnt/1 qemu-arm-static /bin/bash
usermod --login iank --move-home --home /home/iank rock64
groupmod --new-name iank rock64
passwd iank
# boot it
s apt-get update
s apt dist-upgrade


### How to merge upstream fai-config

git checkout upstream
cd path-to-fai-config
git pull --stat
# the following needs modification if there were deletions or renames
rsync --exclude /.git -rlpgoDcvi . /b/fai/fai/config/
cd /b/fai/fai/config/
# where XXXXX is the git commit hash
# note, several files which just had trailing space changes will get ignored.
git commit -am "update upstream to XXXXX"
git checkout master
git merge upstream
# fix conflicts
git commit


# TODO
Change arch to archlike, supporting both arch and parabola


# License

The license for the project is GPLv2 or later, mostly because fai is, and
I periodically merge the upstream example config, which contains small
scripts. Also, there is a modified encrypt.upstream, which is from the
cryptsetup package in arch, which is under the same license.