Sunday, January 29, 2006

ldb & kde

while at linux.conf.au i spent a bit of time with tridge, who was keen to showcase ldb, the new "ldap-y" storage backend written for samba4.

i've begun working on ldb support for kde, and while i'm a few days away from having something worth committing, so far it's looking pretty good. i believe ian geiser has already done some work on this as well. personally, i've become convinced it's the way to go for kde4 and am putting in the effort to get it there. why?

ldb is an embedded database that mimics ldap. because it's embedded, it's fast and memory lean. it had to be, as samba has some pretty insane performance requirements. from what i understand, these performance requirements are what caused the samba project to write their own thing rather than reuse something that was already out there. but its features go far beyond performance:

firstly, it's transactional and multi-reader/writer safe. this means that if an app is writing to its config and crashes (or, as in a recent kicker bug report, the user purposefully kills it during such a write process) the configuration file doesn't end up in a non-usable state. it also means that even though everything is one file, it's perfectly suited to having dozens of different processes hammering away at it and all without having a separate daemon process.
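to make the crash-safety point concrete, here is a minimal python sketch. it does not use ldb's api; it uses sqlite3 from the python standard library purely as a stand-in transactional store, and the table name, the key names and the `crash_midway` flag are all made up for illustration.

```python
import sqlite3


def write_settings(conn, settings, crash_midway=False):
    """Write all settings in one transaction; a failure rolls everything back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, (key, value) in enumerate(settings.items()):
                conn.execute(
                    "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
                    (key, value),
                )
                if crash_midway and i == 0:
                    # Hypothetical stand-in for the app dying mid-write.
                    raise RuntimeError("simulated crash during write")
    except RuntimeError:
        pass  # the half-finished write has already been rolled back


def read_settings(conn):
    """Return the stored settings as a plain dict."""
    return dict(conn.execute("SELECT key, value FROM config"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
conn.commit()

# A complete write, followed by one that "crashes" after the first key.
write_settings(conn, {"panel/size": "48", "panel/position": "bottom"})
write_settings(conn, {"panel/size": "64", "panel/position": "top"}, crash_midway=True)

print(read_settings(conn))
```

the interrupted second write is rolled back as a whole, so the store still holds the values from the first write rather than a half-updated mix.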

it offers an ldap-y API and storage format but without the requirement of schemas (hallelujah!) and provides network access for free (just provide an ldap:// URI instead of a local ldb file path). they are now working on replication, so all users of ldb will get that for free eventually as well. indexes keep access fast, and mmap'ing the file intelligently keeps page-ins and memory usage minimal. even the "two-phase commit" transaction system for writing is kept light-weight enough to not become onerous.

it's also scripting and human friendly. with command line tools and a "vipw"-like tool for the database, one can either use scripting or a text editor to edit the contents of the database (or any specific branch of it). this was one of my bigger worries (alongside safety) of using a monolithic binary file.

the obvious implications of all this are that kde could get a centralizable, replicatable, scalable, fast and safe config storage system that can move us beyond ini-style configs without much work on our part. being able to store data in binary formats also means that we may find we get some performance boosts from not having to ascii-ize everything on the way out to the config file and repeat the process in reverse when reading it in.

this translates into various perks such as making kde even more attractive for large roll-outs where centralization of config data may be desirable, making it easier for users to know what to back up ("just back up the kde.ldb file!") and bringing us a bit closer to integration with samba.

by providing a nice KDE-style front end to LDB we may also finally provide a nice way for app developers to store hierarchical data without resorting to abusing the INI file format (think mail filters or accounts in kmail, panel layouts in kicker) or deciding they need to use something else such as their own hand-rolled XML concoctions.
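as a rough illustration of what hierarchical storage buys us, here is a small python sketch. it is not the ldb api and not a proposed kde api: entries are just a dict keyed by ldap-style distinguished names, and the dn layout (`app=kmail`, `ou=filters`, ...) and the attribute names are invented for the example.

```python
def subtree(entries, base):
    """Return every entry at or below `base`, like an LDAP subtree search."""
    return {
        dn: attrs
        for dn, attrs in entries.items()
        if dn == base or dn.endswith("," + base)
    }


# Hypothetical layout: mail filters and accounts live in their own branches
# instead of being flattened into numbered INI groups.
entries = {
    "app=kmail": {},
    "ou=filters,app=kmail": {},
    "cn=lists,ou=filters,app=kmail": {"header": ["List-Id"], "action": ["move"]},
    "cn=spam,ou=filters,app=kmail": {"header": ["X-Spam-Flag"], "action": ["delete"]},
    "ou=accounts,app=kmail": {},
    "cn=work,ou=accounts,app=kmail": {"host": ["imap.example.org"]},
}

# Fetch only the filter branch; accounts are untouched.
for dn in sorted(subtree(entries, "ou=filters,app=kmail")):
    print(dn)
```

an app that wants "all my filters" asks for one branch and gets structured entries back, rather than scanning group names and reassembling the hierarchy itself.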

the latter is The Big Dream here. imagine a "firewall to phone" type solution based on the combined value of linux/bsd (firewall & operating system), kolab (groupware), samba (file/print), asterisk (voip) and kde (desktop productivity, document management, phone GUI and voice mail access) all with their configurations being held in a common LDAP(-y) database. so when the sysadmin adds a user they can get email, calendaring, file server access, network printer access, PBX access and desktop configuration provision all at the same time and essentially for free.

this obviously requires letting these individual pieces tie into the same configuration infrastructure, and it seems samba4 has the most sane option in our world right now from a desktop perspective. i talked with an asterisk guy at linux.conf.au about some of these things and will be discussing it with kolab people as well. and getting kde4 on ldb would be us doing our part to make this happen.

hopefully others will agree with me and we'll have ldb files as an option for configuration files in kde4 =)

15 comments:

Peter said...

I don't like it.

One of the big advantages of KDE for me over GNOME is its use of plain text files stored in a directory tree for configuration. There have been a few occasions where I've managed to configure a KDE app in a stupid way that broke it, and I've had to fix it by hacking the config.

If all the configs were hidden in a binary file, that option wouldn't be open to me. On any recent computer, the penalty associated with loading and parsing a text file is going to be barely noticeable, surely? (Feel free to provide me with benchmarks to the contrary!)

Anonymous said...

Why not elektra (elektra.sf.net)? It seems like a much better solution...

Anonymous said...

Great! So KDE users will at last get the chance to experience the same problems that Windows users get when the registry gets corrupted.

Anonymous said...

Why would elektra be a better solution? It seems to go exactly the opposite way: one value per file. So instead of parsing you get an insane number of stat(), open() and close() calls. And it gives you nothing that .ini files don't already have. Elektra devs try to push their stuff everywhere (at least several attempts at xorg, for example) but for some reason nobody wants it.

Robert Knight said...

Just to second some of the above posts, if KDE does go down this route, we have to be careful to avoid the pitfalls that the Windows registry fell into.

That is:

- If the registry gets corrupted and there is no backup (which is often the case; you cannot expect people to be careful to back up their settings database), all settings are lost.
- If the registry database gets very large, it sometimes takes a while to copy it across when logging onto a Windows network.
- The only way to read and write config settings is via the appropriate API. There is something to be said for the ability to hack around using a text file.

And probably many others that have been covered elsewhere.

In general, I think it would also be helpful if there was an easy way of finding out which registry entries were created or modified by which program. When cleaning malware out of Windows systems, trying to undo the damage done to the registry by a particular virus or piece of spyware is an absolute pain.

Jamie McCracken said...

ldb has some disadvantages. I know this because I evaluated ldb as a possible backend for the doomed dconf.

The problems are:

1) I don't think it's thread-safe at all (at least I checked the source and found no use of mutexes!). That might be okay for Samba, which is multi-process anyway, but it is too dangerous for use in either threaded apps or a threaded config daemon that provides notifications.

2) Performance is poor when compared to modern databases. It uses a file-based hash, which is both slow and memory-inefficient when compared to b-tree based databases (lack of cache locality in hash-based dbs means more disk seeks on average!).

3) You will have big problems on NFS-mounted home dirs where, due to broken file locking, your ldb database will definitely get corrupted! It's only as safe as system-level file locks and therefore is unsafe!

All in all it's way too dangerous to use. A multi-threaded daemon solution is much safer (esp. on NFS, where mutexes and not file locks are used to keep your db safe) and gives you added benefits such as additional caching (useful for when you use remote LDAP) and of course notifications.

It's true that none of the existing solutions are really state of the art or suitable for a new rocktastic desktop, but I would be happy to collaborate on a freedesktop venture to sort this out (and no, it does not have to be dconf).

Henning said...

I second the point of Peter. I was able to recover some kde apps this way.

My experiences with samba3 and the tdb format are not encouraging: corrupt files (with no way to recover them) cause bad performance. Perhaps this will be better with samba4 and ldb.

And my recent experiences with gconf and gnome (I need to install more than 10 packages just to edit the default fonts) and the Windows registry make me dislike this approach even more.

If we need more performance, we can cache the text files in a binary format, like other programs do.
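The caching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how KConfig or any existing program does it: the `load_config` helper, the `.cache` suffix and the `kickerrc` contents are all made up, and the cache is considered valid only while the text file's modification time is unchanged.

```python
import configparser
import os
import pickle
import tempfile


def load_config(ini_path, cache_path=None):
    """Return (settings, from_cache); reparse the text file only if it changed."""
    cache_path = cache_path or ini_path + ".cache"  # hypothetical naming scheme
    mtime = os.stat(ini_path).st_mtime_ns
    try:
        with open(cache_path, "rb") as fh:
            cached_mtime, data = pickle.load(fh)
        if cached_mtime == mtime:
            return data, True
    except (OSError, pickle.UnpicklingError, EOFError, ValueError, TypeError):
        pass  # missing or unreadable cache: fall back to the text file

    parser = configparser.ConfigParser()
    parser.read(ini_path)
    data = {section: dict(parser[section]) for section in parser.sections()}
    with open(cache_path, "wb") as fh:
        pickle.dump((mtime, data), fh)
    return data, False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kickerrc")
    with open(path, "w") as fh:
        fh.write("[General]\nSize=48\n")

    first, first_cached = load_config(path)    # parses the text file
    second, second_cached = load_config(path)  # served from the binary cache

    # Simulate a hand edit in a text editor, with a clearly newer timestamp.
    with open(path, "w") as fh:
        fh.write("[General]\nSize=64\n")
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))

    third, third_cached = load_config(path)    # cache is stale, so reparse

print(first_cached, second_cached, third_cached, third)
```

The text file stays the source of truth and remains editable by hand; the binary copy is disposable and can be deleted at any time.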

Anonymous said...

I'm just a lowly enduser, not a programmer. This doesn't sound so great to me. One of the things I like about Linux is that: (1)I have access to text-based config files and (2)I know where to find them (mostly*). If you want faster performance then duplicate the conf. data in a binary format and just check on startup whether the text-file datestamp/whatever has changed or not.

As far as backups go -- what to back up? Your home directory.

*With KDE apps I do sometimes have trouble finding them, because the config filenames were created by someone from Microsoft and often bear little resemblance to the names of their applications.

But in general if I am using a program called LyX I look for a dot file (or dir) called ".lyx" (or similar) and I usually find it -- and I can open it, read it (usually understand it) and even modify it -- all with my text editor. (Not at all like the mysterious world of Windows that I sometimes need to visit.)

-Kevin Pfeiffer

Brad Hards said...

So many uninformed opinions, so little time to educate people...

It is binary (well, it almost certainly is - you might be able to generate a backend that is text based), but you can hand edit it, using the ldbedit command. ldbedit sucks the data out into LDIF format, brings up your preferred editor (you edit the line oriented text) and then ldbedit saves the LDIF back.

I haven't done the threading checks, but Samba4 can do a threading process model, and I think the tdb backend to ldb will provide appropriate locking. LDAP backend and sqlite3 backend should also be fine.
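The dump, edit and load-back cycle described for ldbedit can be sketched in Python. This is not ldbedit's implementation and makes no claim about its options: the LDIF handling is deliberately simplified (no base64 values, no line folding), the record and attribute names are invented, and the editor is replaced by a plain callable so the example runs unattended.

```python
def to_ldif(records):
    """Serialize {dn: {attr: [values]}} into simplified LDIF text."""
    blocks = []
    for dn, attrs in records.items():
        lines = [f"dn: {dn}"]
        for name, values in attrs.items():
            lines.extend(f"{name}: {value}" for value in values)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def from_ldif(text):
    """Parse the simplified LDIF produced by to_ldif back into records."""
    records = {}
    for block in text.strip().split("\n\n"):
        dn, attrs = None, {}
        for line in block.splitlines():
            name, _, value = line.partition(": ")
            if name == "dn":
                dn = value
            else:
                attrs.setdefault(name, []).append(value)
        if dn is not None:
            records[dn] = attrs
    return records


def edit_records(records, edit):
    """Dump to text, let `edit` change it, and load the result back.

    A real tool would write the text to a temporary file and launch the
    user's editor; `edit` is a stand-in callable taking and returning text.
    """
    return from_ldif(edit(to_ldif(records)))


# Hypothetical record for a kicker panel.
records = {"cn=panel,app=kicker": {"size": ["48"], "applet": ["clock", "pager"]}}
edited = edit_records(records, lambda text: text.replace("size: 48", "size: 64"))
print(to_ldif(edited))
```

The point is that the on-disk format can be binary while the editing experience stays line-oriented text.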

Anonymous said...

I'm rather uneducated about ldap style databases, but I would imagine that storing config files this way would make them easier to edit with different config tools.

For example. Currently on many distros there is a GUI distro specific tool for editing the configuration of certain packages (e.g. YaST) or you can edit them by hand, but it seems to me that it's very difficult to go back and forth. After editing a config file by hand the GUI editor sometimes balks at the idea of editing it. Shouldn't this get rid of that problem?

I see many worries about what happens if this file gets corrupted. Wouldn't it be just as bad if the current ini file got corrupted or removed? I'm not certain I understand people's anxiety.

Anonymous said...

I have no problems with binary files, as long as they are documented and robust, and I think they can have some really big advantages.

Just one concern regarding backups: if you do a migration, a single file may be fine, but when it comes to regular backups, "everything in a single file" may be a bad thing. Either the backup program has to try to make a differential backup (rsync-like), or the complete file has to be backed up. Differential backups may not be an option, as they are computationally expensive and the backup software may not even provide them, and complete file backups may be a bad idea storage-space-wise.

Is it possible to split the database into two separate parts, one holding the normal config values (colors and themes, mail accounts ...) and one the often-changing values (file selector histories, window positions ...)?

Anonymous said...

This is a great idea indeed, I look forward to seeing that happen in KDE 4.

Jengu said...

Disclaimer: I don't have any experience with large rollouts so it's hard for me to get a tangible feeling for how useful this would be. Coming from Windows to Linux, one of my favorite 'features' was that since the OS basically forces it so that you can only write to your home folder, this forces apps to be coded so that their settings are saved there.

On Windows, if I need to reformat or upgrade to a new version of the OS or whatever, I'm sent all over my hard drive looking for where my saved games are.

Backing up the settings for any app is also dead simple on Linux compared to Windows -- I just back up that app's .whateverapp folder that's inside my home folder. I know exactly where it is and I can literally just copy and paste that folder, no finding and exporting of registry keys or other mucking around required.

Unfortunately, KDE totally breaks with this paradigm and all KDE app configs get shoved into .kde, and across at least 2 files. I don't have any idea what the reasoning behind doing that was.

Now here comes this other proposal, and it sounds like it's just going to make things worse! :P Not only are the settings going to be in nonstandard locations that I can't find, they're going to require me to learn new command line utilities to edit... :P

The obvious advantage I can see to this sort of approach is when you want to change one setting for all users without altering any other settings they may have changed -- overwriting rc files wouldn't have the desired effect. But I really like the 'usability' of just being able to copy .whateverapp.

Anonymous said...

Regarding the "large rollout" aspect, there is already a patch for KDE supporting configuration in LDAP, see the related bug report. The issues affecting this implementation are the shortcomings of KConfig, which include a lack of multi-level defaults (it's either global or user) and no caching. KConfig should first be fixed to allow those and to support multiple backends (i.e. global and group in LDAP, user in local files, or global in LDAP, group in SQL, user in local files, etc.), which would allow easier development of backends such as this.
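A minimal Python sketch of what multi-level defaults could look like. This is not KConfig's API: the three plain dicts are stand-ins for whatever backends hold each level (LDAP, SQL, local files), and the `layered_config` helper and all key names are hypothetical.

```python
from collections import ChainMap


def layered_config(user, group, global_):
    """Look keys up in user settings first, then group, then global defaults."""
    return ChainMap(user, group, global_)


# Stand-ins for three backends; in practice each could be loaded from a
# different store (e.g. global from LDAP, group from SQL, user from files).
global_defaults = {"wallpaper": "default.png", "font": "Sans 10", "proxy": "none"}
group_defaults = {"proxy": "proxy.example.org:3128"}
user_settings = {"font": "Serif 12"}

config = layered_config(user_settings, group_defaults, global_defaults)

print(config["font"])       # user override
print(config["proxy"])      # group-level default
print(config["wallpaper"])  # falls through to the global default

# Writes land in the first mapping only, so a user change never alters
# the group or global layers.
config["wallpaper"] = "mine.png"
```

Because each layer is an independent mapping, an administrator can change a group or global value without overwriting anything a user has set explicitly.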

I always get worried when people say "no need to worry about schemas", this will lead to interoperability problems for anyone who *does* need to have final storage in something like a corporate LDAP infrastructure ...