/*
 * Copyright 2007 Haiku Inc. All rights reserved.
 * Distributed under the terms of the MIT License.
 *
 * Authors:
 *		Ingo Weinhold
 */

/*!
\page fs_modules File System Modules

To support a particular file system (FS), a kernel module implementing a
special interface (\c file_system_module_info defined in \c <fs_interface.h>)
has to be provided. As for any other module, the \c std_ops() hook is invoked
with \c B_MODULE_INIT directly after the FS module has been loaded by the
kernel, and with \c B_MODULE_UNINIT before it is unloaded, thus providing
a simple mechanism for one-time module initializations. The same module is
used for accessing any volume of that FS type.


\section objects File System Objects

There are several types of objects a FS module has to deal with directly or
indirectly:

- A \em volume is an instance of a file system. For a disk-based file
  system it corresponds to a disk, partition, or disk image file. When
  mounting a volume the virtual file system layer (VFS) assigns it a unique
  number (ID, of type \c dev_t) and associates it with a handle
  (type \c void*) provided by the file system. The VFS creates an instance
  of struct \c fs_volume that stores these two, an operation vector
  (\c fs_volume_ops), and other volume related items.
  Whenever the FS is asked to perform an operation the \c fs_volume object
  is supplied, and whenever the FS requests a volume-related service from
  the kernel, it also has to pass the \c fs_volume object or, in some cases,
  just the volume ID.
  Normally the handle is a pointer to a data structure the FS allocates to
  associate data with the volume.

- A \em node is contained by a volume. It can be of type file, directory, or
  symbolic link (symlink).
  Just like volumes, nodes are associated with an ID
  (type \c ino_t) and, if in use, also with a handle (type \c void*).
  As for volumes, the VFS creates an instance of a structure (\c fs_vnode)
  for each node in use, storing the FS's handle for the node and an
  operation vector (\c fs_vnode_ops).
  Unlike the volume ID, the node ID is defined by the FS.
  It often has a meaning to the FS, e.g. file systems using inodes might
  choose the inode number corresponding to the node. As long as the volume
  is mounted and the node is known to the VFS, its node ID must not change.
  The node handle is again a pointer to a data structure allocated by the
  FS.

- A \em vnode (VFS node) is the VFS representation of a node. A volume may
  contain a great number of nodes, but at any given time only a few are
  represented by vnodes, usually only those that are currently in use
  (sometimes a few more).

- An \em entry (directory entry) belongs to a directory, has a name, and
  refers to a node. It is important to understand the difference between
  entries and nodes: A node doesn't have a name; only the entries that refer
  to it do. If a FS allows more than one entry to refer to a single
  node, it is said to support "hard links". It is possible that no
  entry refers to a node. This happens when a node (e.g. a file) is still
  open, but the last entry referring to it has been removed (the node will
  be deleted when it is closed). While entries are to be understood as
  independent entities, the FS interface does not use IDs or handles to
  refer to them; it always uses directory and entry name pairs to do that.

- An \em attribute is a named and typed data container belonging to a node.
  A node may have any number of attributes; they are organized in an
  attribute directory (depending on the FS, virtual or actually existing),
  through which one can iterate.

- An \em index is supposed to provide fast searching capabilities for
  attributes with a certain name.
  A volume's index directory allows for
  iterating through the indices.

- A \em query is a fully virtual object for searching for entries via an
  expression matching entry name, node size, node modification date, and/or
  node attributes. The mechanism of retrieving the entries found by a query
  is similar to that for reading a directory's contents. A query can be
  live, in which case the creator of the query is notified by the FS
  whenever an entry starts or stops matching the query expression.


\section concepts Generic Concepts

A FS module has to (or can) provide quite a lot of hook functions. There are
a few concepts that apply to several groups of them:

- <em>Opening, Closing, and Cookies</em>: Many FS objects can be opened and
  closed, namely nodes in general, directories, attribute directories,
  attributes, the index directory, and queries. In each case there are three
  hook functions: <tt>open*()</tt>, <tt>close*()</tt>, and
  <tt>free*_cookie()</tt>. The <tt>open*()</tt> hook is passed all that is
  needed to identify the object to be opened and, in some cases, additional
  parameters, e.g. specifying a particular opening mode. The implementation
  is required to return a cookie (type \c void*), usually a pointer to a
  data structure the FS allocates. In some cases (e.g.
  when an iteration state is associated with the cookie) a new cookie must
  be allocated for each instance of opening the object. The cookie is passed
  to all hooks that operate on an object opened this way. The
  <tt>close*()</tt> hook is invoked to signal that the cookie is to be
  closed. At this point the cookie might still be in use. Blocking FS hooks
  (e.g. blocking read/write operations) using the same cookie have to be
  unblocked. When the cookie stops being in use the <tt>free*_cookie()</tt>
  hook is called; it has to free the cookie.

- <em>Entry Iteration</em>: For the FS objects serving as containers for
  other objects, i.e.
  directories, attribute directories, the index
  directory, and queries, the cookie mechanism is used for a stateful
  iteration through the contained objects. The <tt>read_*()</tt> hook reads
  the next one or more entries into a <tt>struct dirent</tt> buffer. The
  <tt>rewind_*()</tt> hook resets the iteration state to the first entry.

- <em>Stat Information</em>: In case of nodes, attributes, and indices,
  detailed information about an object is requested via a
  <tt>read*_stat()</tt> hook and must be written into a <tt>struct stat</tt>
  buffer.


\section vnodes VNodes

A vnode is the VFS representation of a node. As soon as an access to a node
is requested, the VFS creates a corresponding vnode. The requesting entity
gets a reference to the vnode for the time it works with the vnode and
releases the reference when done. When the last reference to a vnode has
been surrendered, the vnode is unused and the VFS can decide to destroy it
(usually it is cached for a while longer).

When the VFS creates a vnode, it invokes the volume's
\link fs_volume_ops::get_vnode get_vnode() \endlink
hook to let the FS create the respective node handle (unless the FS requests
the creation of the vnode explicitly by calling publish_vnode()). That's the
only hook that specifies a node by ID; all other node-related hooks are
defined in the respective node's operation vector and they are passed the
respective \c fs_vnode object. When the VFS deletes the vnode, it invokes
the node's \link fs_vnode_ops::put_vnode put_vnode() \endlink
hook or, if the node was marked removed,
\link fs_vnode_ops::remove_vnode remove_vnode() \endlink.

There are only four FS hooks through which the VFS gains knowledge of the
existence of a node. The first one is the
\link file_system_module_info::mount mount() \endlink
hook. It is supposed to call \c publish_vnode() for the root node of the
volume and return its ID. The second one is the
\link fs_vnode_ops::lookup lookup() \endlink
hook.
Given a \c fs_vnode object of a directory and an entry name, it is
supposed to call \c get_vnode() for the node the entry refers to and return
the node ID.
The remaining two hooks,
\link fs_vnode_ops::read_dir read_dir() \endlink and
\link fs_volume_ops::read_query read_query() \endlink,
both return entries in a <tt>struct dirent</tt> structure, which also
contains the ID of the node the entry refers to.


\section mandatory_hooks Mandatory Hooks

Which hooks a FS module should provide mainly depends on what functionality
it features. E.g. a FS without support for attributes, indices, and/or
queries can omit the respective hooks (i.e. set them to \c NULL in the
module, \c fs_volume_ops, and \c fs_vnode_ops structures). Some hooks are
mandatory, though. A minimal read-only FS module must implement:

- \link file_system_module_info::mount mount() \endlink and
  \link fs_volume_ops::unmount unmount() \endlink:
  Mounting and unmounting a volume is required for pretty obvious reasons.

- \link fs_vnode_ops::lookup lookup() \endlink:
  The VFS uses this hook to resolve path names. It is probably one of the
  most frequently invoked hooks.

- \link fs_volume_ops::get_vnode get_vnode() \endlink and
  \link fs_vnode_ops::put_vnode put_vnode() \endlink:
  Create or destroy, respectively, the FS's private node handle when
  the VFS creates/deletes the vnode for a particular node.

- \link fs_vnode_ops::read_stat read_stat() \endlink:
  Return a <tt>struct stat</tt> info for the given node, consisting of the
  type and size of the node, its owner and access permissions, as well as
  certain access times.

- \link fs_vnode_ops::open open() \endlink,
  \link fs_vnode_ops::close close() \endlink, and
  \link fs_vnode_ops::free_cookie free_cookie() \endlink:
  Open and close a node as explained in \ref concepts.

- \link fs_vnode_ops::read read() \endlink:
  Read data from an opened node (file).
  Even if the FS does not feature
  files, the hook has to be present anyway; it should return an error in
  this case.

- \link fs_vnode_ops::open_dir open_dir() \endlink,
  \link fs_vnode_ops::close_dir close_dir() \endlink, and
  \link fs_vnode_ops::free_dir_cookie free_dir_cookie() \endlink:
  Open and close a directory for entry iteration as explained in
  \ref concepts.

- \link fs_vnode_ops::read_dir read_dir() \endlink and
  \link fs_vnode_ops::rewind_dir rewind_dir() \endlink:
  Read the next entry/entries from a directory, or reset the
  iterator to the first entry, respectively, as explained in \ref concepts.

Although not strictly mandatory, a FS should additionally implement the
following hooks:

- \link fs_volume_ops::read_fs_info read_fs_info() \endlink:
  Return general information about the volume, e.g. total and free size, and
  what special features (attributes, MIME types, queries) the volume/FS
  supports.

- \link fs_vnode_ops::read_symlink read_symlink() \endlink:
  Read the value of a symbolic link. Needed only if the FS and volume
  support symbolic links at all. If absent, symbolic links stored on the
  volume won't be interpreted.

- \link fs_vnode_ops::access access() \endlink:
  Return whether the current user has the given access permissions for a
  node. If the hook is absent, the user is considered to have all
  permissions.


\section permissions Checking Access Permission

While there is the \link fs_vnode_ops::access access() \endlink hook
that explicitly checks access permission for a node, it is not used by the
VFS to check access permissions for the other hooks.
This has two reasons:
It could be cheaper for the FS to do that in the respective hook (at least
it's not more expensive), and the FS can make sure that there are no race
conditions between the check and the start of the operation for the hook.
The downside is that in most hooks the FS has to check those permissions.
It is possible to simplify things a bit, though:

- For operations that require the file system object in question (node,
  directory, index, attribute, attribute directory, query) to be open, most
  of the checks can already be done in the respective <tt>open*()</tt> hook.
  E.g. in fs_vnode_ops::read() or fs_vnode_ops::write() one only has to
  check whether the file has been opened for reading/writing, not whether
  the current process has the respective permissions.

- The core of the fs_vnode_ops::access() hook can be moved into a private
  function that can be easily reused in other hooks to check the permissions
  for the respective operations. In most cases this will reduce permission
  checking to one or two additional "if"s in the hooks where it is required.


\section node_monitoring Node Monitoring

One of the nice features of Haiku's API is an easy way to monitor
directories or nodes for changes. That is, one can register to watch a
given node for certain modification events and will get a notification
message whenever one of those events occurs. While other parts of the
operating system do the actual notification message delivery, it is the
responsibility of each file system to announce changes. It has to use the
following functions to do that:

- notify_entry_created(): A directory entry has been created.

- notify_entry_removed(): A directory entry has been removed.

- notify_entry_moved(): A directory entry has been renamed and/or moved
  to another directory.

- notify_stat_changed(): One or more members of the stat data for a node
  have changed. E.g. the \c st_size member changes when the file is
  truncated or data have been written to it beyond its former size.
  The modification time
  (\c st_mtime) changes whenever a node is write-accessed. To avoid a flood
  of messages for small and frequent write operations on an open file, the
  file system can limit the number of notifications and mark them with the
  B_WATCH_INTERIM_STAT flag. When closing a modified file a notification
  without that flag should be issued.

- notify_attribute_changed(): An attribute of a node has been added,
  removed, or changed.

If the file system supports queries, it needs to call the following
functions to make live queries work:

- notify_query_entry_created(): A change caused an entry that didn't match
  the query predicate before to match now.

- notify_query_entry_removed(): A change caused an entry that matched
  the query predicate before to no longer match.


\section caches Caches

The Haiku kernel provides three kinds of caches that can be used by a
file system implementation to speed up file system operations:

- <em>Block cache</em>: Interesting for disk-based file systems. The device
  the file system volume is located on is considered to be divided into
  equally-sized blocks of data that can be accessed via the block cache API
  (e.g. block_cache_get() and block_cache_put()). As long as the system has
  enough memory the block cache will keep all blocks that have been accessed
  in memory, thus allowing further accesses to be very fast.
  The block cache also has transaction support, which is of interest for
  journaled file systems.

- <em>File cache</em>: Stores file contents. The FS can decide to create
  a file cache for any of its files. The fs_vnode_ops::read() and
  fs_vnode_ops::write() hooks can then simply be implemented by calling the
  file_cache_read() or file_cache_write() function, respectively, which will
  read the data from/write the data to the file cache.
  For reading uncached
  data or writing back cached data to the file, the file cache will invoke
  the fs_vnode_ops::io() hook.
  Only files for which the file cache is used can be memory mapped (cf.
  mmap()).

- <em>Entry cache</em>: Can be used to speed up resolving paths. Normally
  the VFS will call the fs_vnode_ops::lookup() hook for each element of the
  path to be resolved, which, depending on the file system, can be more or
  less expensive. When the FS uses the entry cache, those calls will be
  avoided most of the time. All the file system has to do is invoke the
  entry_cache_add() function when it encounters an entry that might not yet
  be known to the entry cache and entry_cache_remove() when a directory
  entry has been removed.
  The entry cache can also be used for negative caching. If the file system
  determines during a lookup that the requested entry is not present, it can
  cache this lookup failure by calling entry_cache_add_missing(). Further
  calls to fs_vnode_ops::lookup() for the missing entry will then be
  avoided.
  Note that it is safe to call entry_cache_add() and
  entry_cache_add_missing() with the same directory/name pair previously
  given to either function to update a cache entry, without needing to call
  entry_cache_remove() first. It is also safe to call entry_cache_remove()
  for pairs that have never been added to the cache.
*/

// TODO:
// * FS layers