#ceph

Michael
Putting the data roots for each buildah container into a separate directory on the shared CephFS volume did work and reduced the overall build time for the FluentD container from 23 minutes to 15 minutes. So the shared dir was part of the problem. But it's still rather slow, and I'm still pretty sure it's because of the VFS driver and CephFS. So let's see whether I can come up with something better.
#HomeLab #Ceph #buildah
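A minimal sketch of that per-build storage split, assuming a CephFS mount at /mnt/cephfs; the paths, job variable, and image tag are invented for illustration:

```
# hypothetical per-job directory on the shared CephFS mount
JOB_ROOT=/mnt/cephfs/buildah/${CI_JOB_ID:-manual}
mkdir -p "${JOB_ROOT}/root" "${JOB_ROOT}/runroot"

# point buildah's storage at the job-specific directories so concurrent
# builds don't contend on a single shared data root
buildah --storage-driver vfs \
        --root "${JOB_ROOT}/root" \
        --runroot "${JOB_ROOT}/runroot" \
        build -t fluentd:test .
```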
Heinlein Support
Do you urgently need a lot of storage space (#Speicherplatz)? Then in the open-source world there is really no way around #Ceph. Whether for data backups, virtualization projects, as a file server, or as a replacement for Amazon S3 – as a #Storage system, Ceph covers a wide range of use cases. And the best part: with good care and solid training, the system is practically indestructible.

In our training course you will learn how to build and operate a Ceph cluster:
https://www.heinlein-support.de/schulung/ceph-grundlagen

#CephSchulung
JPL
I have now spent roughly one person-week putting the prerequisites in place for deploying software X, only to conclude that X is rubbish and we probably won't use it after all - which one could have guessed from the documentation.

I did learn interesting things along the way, e.g. about #Kubernetes on multi-homed bare-metal nodes, and about #Rook #Ceph. So I should probably be pleased, but right now I'm still rather disappointed.

#SysAdmin
Neustradamus :xmpp: :linux:
#ProxmoxVE 9.0 has been released (#Proxmox / #VirtualEnvironment / #Virtualization / #VirtualMachine / #VM / #Linux / #Debian / #Trixie / #DebianTrixie / #QEMU / #LXC / #KVM / #ZFS / #OpenZFS / #Ceph) https://proxmox.com/
Rachel
ok looking a bit closer I think I see how it works! But I have some thoughts and a decision/question...

Each ceph pool gets automatically split into some number of PGs, and each PG gets split up by the crush rules.

So the PGs of my rbd pool all get split up between three hosts, but *which* three is chosen such that the total data usage is spread evenly across nodes/disks based on the weight.

For some reason the big hdd ec pool only has 1 PG so far, so that is just using 6 of the drives; as it adds PGs it should spread out to the 8 drives just fine.

But now I am thinking: do I continue with OSD failure domain, or do I switch to 2+1 EC with 4 hosts for this pool?

Basically everyone suggests not using OSD failure domain, but the mgr/etc data is replicated on the SSDs, and with 8 drives it could re-balance (it will be a LONG time till I fill this, or even get close to 50% used). Meanwhile with 3+1 and node failure domain I have the same capacity.

@willglynn@hachyderm.io any thoughts? It could be a while until I can add more nodes/disks, so no suggestions of filling a rack with 20 more servers ;) #Ceph #Kubernetes #Homelab
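For anyone weighing the same trade-off, this is roughly how the failure domain is expressed when creating an erasure-coded pool; the profile and pool names are placeholders, and since k/m and the profile can't be changed on an existing pool, this only applies to a new pool:

```
# 2+1 EC with host as the failure domain (placeholder names)
ceph osd erasure-code-profile set ec21-host k=2 m=1 \
    crush-failure-domain=host crush-device-class=hdd
ceph osd pool create bulk-ec erasure ec21-host

# or stick with OSD failure domain and accept that two shards of a PG
# may land on the same host
ceph osd erasure-code-profile set ec42-osd k=4 m=2 \
    crush-failure-domain=osd crush-device-class=hdd
```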
Rachel
Ok so I'm seeing some curious issues with this ceph cluster, and either I have a config issue or a core misunderstanding that I really should address before I build out further.

I have 8 hdd disks across 4 nodes, and in rook they're just assigned to a 4+2 EC pool.

I had thought it would spread the data over all 8 drives eventually, but in `ceph pg dump` I see two drives missing.

* Maybe it just grabs the first 6 that it sees and leaves those two alone? Would it swap in drives as a hot spare?
* Or maybe I had a hiccup and they didn't get assigned for some other reason?
* Maybe I should reset it and re-create with 5+3 EC?
* The two drives in question have basically no data usage, but they're not errored out

#Homelab #Ceph #Kubernetes #Minilab
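A few read-only commands that help answer exactly this "which OSDs is the pool actually using" question; the pool name is a placeholder:

```
# per-OSD utilisation and PG counts, grouped by host
ceph osd df tree

# the acting set (list of OSDs) for every PG in the pool
ceph pg ls-by-pool <poolname>

# whether the autoscaler is still planning to raise the pool's PG count
ceph osd pool autoscale-status
```

If the pool currently has only a single PG, a 4+2 EC pool maps that PG to exactly six OSDs, so two idle drives would be expected until the PG count grows.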
Rachel
All drives have been added physically, now to figure out how the heck I'm going to build the pools and then transfer data over.

Current plan:
* I have 8 disks, so I'm thinking 4+2 EC pool with the HDDs only, OSD failure domain because I just don't have enough hosts
* I think I can update Ceph and tell it to add the new HDDs as blank OSDs pretty easily
* Then I can add a new pool? Maybe, I'm a little fuzzy on terms. Goal is to run CephFS on this pool (rough sketch after this post).
* Then I'll create the base CephFS volumes. I'm thinking I'll create a new namespace and add the PVCs there first. I'll back things up via this namespace.
* Then I can create new CephFS vols/PVCs in each namespace that also needs access
* I should be able to do some of the basic copy operations by just mounting NFS and CephFS at the same time in a pod? Maybe. That or I expose CephFS outside of the cluster and mount it directly on the NAS itself.

#Homelab #Ceph #Kubernetes #Minilab
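A rough sketch of the pool-creation step in that plan, with placeholder names; under Rook this would normally be declared in the CephFilesystem/CephBlockPool custom resources rather than typed at the CLI, so treat it as the shape of the change rather than a recipe:

```
# EC profile and data pool on the HDDs (placeholder names)
ceph osd erasure-code-profile set hdd-ec k=4 m=2 \
    crush-failure-domain=osd crush-device-class=hdd
ceph osd pool create cephfs-bulk erasure hdd-ec

# CephFS (and RBD) on EC pools needs overwrites enabled
ceph osd pool set cephfs-bulk allow_ec_overwrites true

# attach it as an additional data pool to the existing filesystem
ceph fs add_data_pool <fsname> cephfs-bulk
```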
Métaphysicien Douteux
Hey #sysadmin friends, great wizards of #proxmox and #ceph, a networking question please:
- 2x 10Gb interfaces in bond0
- bond0 bridged as vmbr0 (fault tolerance + spreading the 20Gb across VLANs (vmbr0.x), Ceph-internal and used by the VMs)

Q1: MTU. I want jumbo frames on the Ceph VLAN and the normal MTU everywhere else. Do I set:
- 9000 on the bond
- a bit less on vmbr0
- whatever I want on the various vmbr0.x, but at most < vmbr0?

Q2: how do I manage bandwidth limits per VLAN?
@fatalerrors obviously ;)
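Not an authoritative answer, but the MTU layering can be sanity-checked with plain ip link on a Proxmox node; the VLAN IDs below are invented for illustration. The general constraint is that a VLAN sub-interface cannot have a larger MTU than its parent, so the bond and the bridge have to carry at least the largest MTU any VLAN will use:

```
# bond and bridge carry the jumbo MTU
ip link set bond0 mtu 9000
ip link set vmbr0 mtu 9000

# jumbo only on the (hypothetical) Ceph VLAN, default elsewhere
ip link set vmbr0.100 mtu 9000   # ceph
ip link set vmbr0.200 mtu 1500   # VM traffic

# verify
ip -d link show vmbr0.100 | grep -o 'mtu [0-9]*'
```

Per-VLAN bandwidth limits are a separate topic (traffic shaping with tc or similar) and aren't covered by this sketch.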
Matthew Vernon
Anyone got good suggestions for #Backups of #S3 / #Ceph buckets?
I'd like not to have to download everything every time, and also have multiple backups but not growing without bound (and be able to say "restore object X from date Y").
I'd like to avoid "fuse-mount the bucket and then backup as if it were a fs".

Something like rsbackup (which uses rsync --link-dest, meaning you only store changed objects and can safely delete old backups) would be nice (rclone lacks this sort of thing).
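Not the hard-link deduplication rsbackup gives, but one partial approach worth noting: rclone's --backup-dir moves objects that change or disappear on each sync into a dated archive path, which at least allows "restore object X from date Y" without re-downloading everything. The remote and bucket names below are placeholders:

```
# assumes rclone remotes "src" (the S3/RGW endpoint) and "dst" are configured
DATE=$(date +%F)
rclone sync src:mybucket dst:backups/mybucket/current \
    --backup-dir dst:backups/mybucket/archive/${DATE}
```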
Rachel
Cluster rebuild project:

This isn't yak shaving, this is something else entirely.

The goal: move all bulk storage off of the old NAS and onto a 3.5in HDD Ceph pool in the Minilab (plan is 4 nodes, 8 disks, 4+2 EC pool with OSD failure domain). Most of the bulk data will be in CephFS. I plan to add a Samba container for access from Windows for misc uses as needed.

What needs to be accomplished to get there:

* Data needs to be transferred, and performance tested from each application that will utilize it
* The HDD pool needs to be installed and built
* Before that I need to be confident of backups. Velero is working, but it is failing on CephFS specifically and I'm not sure why. I'm able to manually take CephFS volume snapshots? Uggggggggh.
* Great, now the Ceph mgrs are crashing. The only errors I see are OOMKills, but the nodes aren't close to out of memory and I haven't found anything else in the logs (triage sketch after this post).

So now I'm troubleshooting a pop-up event before the unclear backup issue before I can even get started on migrating the data.

With trip planning happening I probably won't have much progress until late Aug at best. The goal is to have this done and tested, including backups, before mid Nov.

#Homelab #Ceph #Kubernetes #Minilab
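A rough triage sketch for the mgr crashes, assuming a Rook deployment in the rook-ceph namespace (the namespace and pod label are assumptions):

```
# confirm the pods really were OOMKilled and check their memory limits
kubectl -n rook-ceph get pods -l app=rook-ceph-mgr
kubectl -n rook-ceph describe pod -l app=rook-ceph-mgr | grep -A5 'Last State'

# see whether Ceph itself recorded mgr crashes
ceph crash ls
```

If it is a genuine memory-limit problem, Rook lets you raise the mgr requests/limits via the CephCluster resource's resources section rather than editing the deployment by hand.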
ij
My 3-node #Ceph cluster is showing nice performance when running "bin/tootctl search deploy --only=accounts" during the Mastodon upgrade...
Rachel
Overview of cluster specs:

Networking:
* Mikrotik RB5009UPr+S+IN
* Mikrotik CRS310-5s-4s+in
* Mikrotik CRS310-8g+2s+in (rear mounted)
* Cable modem (pending move into rack)
* 1x Raspberry Pi 4 running dnsmasq for DHCP/DNS, with a second acting as a coldish spare

Compute:
* 1x Intel Core Ultra 235 system with an Nvidia P4 and 32GB RAM, general compute
* 4x Odroid H4 Ultra with 1x NVMe boot, 2x 800GB SSD, 2x 22TB HDD (pending)

Each node has 1x 2.5Gb link to the rear CRS310, with room to LACP the Odroids if I need to upgrade networking.

Software: the cluster nodes are all running bare-metal Talos. Three Odroids act as control plane + storage, the fourth acts as storage + compute, and the Intel Core Ultra is pure compute. There are/will be multiple Ceph storage pools for different use cases. This was the smallest Ceph cluster that I felt OK with. Ceph people really suggest larger clusters than this, so we'll see how it goes. That is also why I'm stalled until I get backups working fully.

#Homelab #Ceph #Kubernetes #Minilab
Rachel
Physical test fit is good!

(The 3.5in drives are not populated yet) #Homelab #Ceph #Kubernetes #Minilab
Michael
Well of course. My Ceph cluster would produce a scrub inconsistency error while I'm 400 km away from the Homelab. Let's see what this is about.
#HomeLab #Ceph
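The usual first steps for a scrub inconsistency, for anyone hitting the same thing (the PG ID is a placeholder):

```
# names the inconsistent PG(s)
ceph health detail

# shows which object/shard the replicas disagree on
rados list-inconsistent-obj <pgid> --format=json-pretty

# once you're satisfied it's safe, ask the primary OSD to repair the PG
ceph pg repair <pgid>
```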
Marianne Spiller
Upgraded the prod cluster from #Ceph Reef to Squid ✅ And even **before** the 1st of August! 🎉 Went without a hitch.

#Proxmox #Ceph
Rachel
The Velero backups are working except for the CephFS volumes :neocat_sob:

I have no idea why it just throws timeout errors, but I can create a manual snapshot, which doesn't take that long?

I'm so close to getting past this step, but I'm not exactly sure where to look next.
#Kubernetes #Velero #Ceph #Backups
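A few hedged places to look when Velero times out on CSI snapshots even though manual snapshots work; the backup name and namespace are placeholders:

```
# what Velero itself recorded for the failing backup
velero backup describe <backup-name> --details
velero backup logs <backup-name>

# state of the snapshot objects the CSI driver created on Velero's behalf
kubectl get volumesnapshot,volumesnapshotcontent -A
kubectl -n <app-namespace> describe volumesnapshot
```

If the snapshots do complete but only after Velero has given up, newer Velero releases expose a configurable CSI snapshot timeout that may be worth checking.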

Seems some of my disks have seen a few writes! `smartctl -a ${DEVICE}` stats for the #Ceph and virtual machine host cluster here.

The 2TB SSDs are OSDs in Ceph, the others are OS disks or local VM cache disks. Ceph is moaning about the disks in "oxygen" and "fluorine".

Numbers here are Total Drive Writes in GB.

```
$ for h in *; do
    for d in ${h}/tmp/*.txt; do
      lbas=$( grep -F Total_LBAs_Written ${d} | cut -c 88- )
      if [ -n "${lbas}" ]; then
        dev=$( basename "${d}" .txt ); dev=/dev/${dev##*-}
        sect=$( grep '^Sector Size' ${d} | cut -c 18-21 )
        printf "%-20s %8s %6d %s\n" ${h} ${dev} \
          $(( ( ${lbas} * ${sect} ) / 1024 / 1024 / 1024 )) \
          "$( grep '^Device Model' ${d} )"
      fi
    done
  done
beryllium.chost.lan /dev/sda 0 Device Model: WD Green 2.5 240GB
boron.chost.lan /dev/sdb 76370 Device Model: Samsung SSD 870 EVO 2TB
carbon.chost.lan /dev/sda 113892 Device Model: Samsung SSD 870 EVO 2TB
carbon.chost.lan /dev/sdb 157993 Device Model: Samsung SSD 860 EVO 2TB
fluorine.chost.lan /dev/sda 111939 Device Model: Samsung SSD 870 QVO 2TB
helium.chost.lan /dev/sda 100476 Device Model: Samsung SSD 870 QVO 2TB
hydrogen.chost.lan /dev/sda 184564 Device Model: Samsung SSD 860 EVO 2TB
hydrogen.chost.lan /dev/sdb 58602 Device Model: Samsung SSD 870 EVO 2TB
lithium.chost.lan /dev/sda 0 Device Model: WD Green 2.5 240GB
magnesium.chost.lan /dev/sda 0 Device Model: WD Green 2.5 240GB
neon.chost.lan /dev/sdb 146926 Device Model: Samsung SSD 860 EVO 2TB
nitrogen.chost.lan /dev/sda 99473 Device Model: Samsung SSD 870 EVO 2TB
oxygen.chost.lan /dev/sdb 108748 Device Model: Samsung SSD 870 QVO 2TB
sodium.chost.lan /dev/sda 0 Device Model: WD Green 2.5 240GB
```

I'm a glutton for punishment this week… I'm poking the octopus again, moving to Ceph 18.

So far, we're doing the monitor daemons. Fun fact: they switched from LevelDB to RocksDB some time back, and in Ceph 17 they dropped support for LevelDB.

I found this out when updating the first monitor node: it crashed and refused to start. All I had was a cryptic:

```
ceph_abort_msg("MonitorDBStore: error initializing keyvaluedb back storage")
```

Not very helpful. The docs for Ceph 17 (which I didn't read, as I was going from 16→18 directly, which the Ceph 18 docs say you _can_ do) merely state:

> LevelDB support has been removed. WITH_LEVELDB is no longer a supported build option. Users should migrate their monitors and OSDs to RocksDB before upgrading to Quincy.
(docs.ceph.com/en/latest/releas)

How? Zero suggestions as to the procedure.

What worked here was to manually remove the monitor node, rename the /var/lib/ceph/mon/* directory, then re-add the monitor using the manual instructions.

docs.ceph.com/en/latest/rados/
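For posterity, the shape of that manual remove-and-re-add, paraphrased from the add/remove-monitors documentation above; the monitor ID and temp paths are placeholders, and the service name varies by deployment:

```
# take the broken monitor out of the monmap
ceph mon remove <mon-id>

# keep the old (LevelDB) store around instead of deleting it
mv /var/lib/ceph/mon/ceph-<mon-id> /var/lib/ceph/mon/ceph-<mon-id>.old

# rebuild the mon store from the current cluster state
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i <mon-id> --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

# then start the monitor again
systemctl start ceph-mon@<mon-id>
```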
