http://www.golem.de/news/lizardfs-software-defined-storage-wie-es-sein-soll-1604-119518.html Published: 27/04/2016 12:07 http://glm.io/119518
LizardFS: Software-defined storage as it should be
In the software-defined storage space there are alternatives to expensive appliances: often it is enough to combine standard hardware with software into a failsafe storage pool. Anyone who is not satisfied with solutions like GlusterFS should give LizardFS a try.
A classic way to provide storage capacity is to buy one or more storage appliances, in which software and hardware are tightly interlocked. However, these are not very flexible, scale poorly, and cannot easily be migrated to solutions from other manufacturers.
Software-defined storage (SDS) takes a different approach: here, software abstracts the storage function from the hardware. Ideally, several commodity servers with hard drives or SSDs can be combined into a pool in which each system contributes capacity and, depending on the solution, can take on different tasks. Scaling is achieved by adding further devices, flexibility comes from independence of any hardware manufacturer, and growing demands can be met quickly. Normally, the software also ensures that data is stored redundantly across server boundaries.
Large selection and great differences
In the open source world there are already several solutions: the best known are Lustre, GlusterFS, Ceph and MooseFS. Not all are equally good, and some are specialized, such as Ceph with its focus on object storage. Particularly in demand is a setup in which an SDS provides a POSIX-compatible file system: from the client's point of view, the distributed file system then behaves like an ordinary local file system (for example ext4).
Some of the available solutions are backed by large companies, such as Lustre, GlusterFS and Ceph. Others depend on a few developers or are no longer maintained at all, like MooseFS. That project is considered partly dead, or at least a one-man project without a long-term strategy and an active community. In the summer of 2013, this prompted some developers from Poland to create a fork and develop it actively under the GPLv3 license: LizardFS was born.
Its developers describe LizardFS as a distributed, scalable file system with enterprise features such as fault tolerance, reliability and high availability. The software is developed largely independently by around ten developers under the umbrella of the Warsaw-based company Skytechnology.
Those who want to install LizardFS can build the software from source or obtain packages for Debian, Ubuntu, CentOS and Red Hat from the download page. For the past few months, LizardFS has also been available from various official distribution repositories.
Components and Architecture
The design of LizardFS separates metadata such as file names, locations and checksums from the actual data. To avoid inconsistencies and to make atomic operations at the file system level possible, all operations are funneled through a so-called master. It holds all metadata and is the central point of contact for the server components and the clients.
Since this master can fail, it can also run in a "shadow" role. In this case a master is installed on an additional server but remains passive. The shadow master continuously pulls in all metadata changes and thus mirrors the state of the file system in its own memory. If the active master fails, the shadow master can be switched into the active role and continue to serve all participants with information.
The open source version of LizardFS, however, does not include automatic failover between the primary master and the (theoretically unlimited number of) secondary masters. Administrators are therefore forced either to switch manually or to build their own failover mechanism, for example based on a Pacemaker cluster. Since switching master roles consists only of changing a configuration variable and reloading the master daemon, administrators with experience in operating clusters should quickly come up with their own solution.
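As a rough illustration of how small that role switch is, the following Python sketch flips a shadow master's personality and reloads the daemon. The configuration path, the PERSONALITY key and the systemd unit name are assumptions borrowed from LizardFS's MooseFS heritage, not an official procedure; a production failover script would also have to fence the old master first.

```python
# Minimal sketch of a manual shadow-to-master promotion, assuming a
# MooseFS-style config layout (/etc/mfs/mfsmaster.cfg with a PERSONALITY
# key) and a systemd unit named lizardfs-master; both are assumptions,
# not details quoted from the article.
import re
import subprocess

CFG = "/etc/mfs/mfsmaster.cfg"

def promote_to_master() -> None:
    with open(CFG) as f:
        cfg = f.read()
    # Flip the personality from "shadow" to "master".
    cfg = re.sub(r"^PERSONALITY\s*=.*$", "PERSONALITY = master",
                 cfg, flags=re.MULTILINE)
    with open(CFG, "w") as f:
        f.write(cfg)
    # Reload the running daemon so it picks up the new role.
    subprocess.run(["systemctl", "reload", "lizardfs-master"], check=True)

if __name__ == "__main__":
    promote_to_master()
```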
Chunk servers manage, store and replicate the data
The so-called chunk servers are responsible for managing, storing and replicating the actual data. Like all other components, the chunk server service can be installed on any Linux system. Chunk servers should ideally have fast storage media (such as SAS disks or SSDs) and contribute a local file system to the storage pool. In the smallest case, a chunk server runs in a virtual machine and contributes, for example, a 20 GB ext4 file system.
All chunk servers are interconnected and join their respective local file systems into a common pool. Data above a certain size is split into individual stripes, but still appears to the client as a single file.
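The following minimal sketch illustrates the idea of a chunk server contributing a local directory to the pool. The /etc/mfs/mfshdd.cfg file name, the mfs service user and the lizardfs-chunkserver unit name are assumptions based on common packaging conventions, not details taken from the article.

```python
# Minimal sketch of preparing a chunk server's local storage directory and
# registering it in the (assumed) MooseFS-style /etc/mfs/mfshdd.cfg, which
# lists one exported directory per line.
import os
import subprocess

HDD_CFG = "/etc/mfs/mfshdd.cfg"
EXPORT_DIR = "/srv/lizardfs/chunks"   # hypothetical mount point of a local ext4 volume

os.makedirs(EXPORT_DIR, exist_ok=True)
# The chunk server daemon usually runs as its own service user and needs write access.
subprocess.run(["chown", "mfs:mfs", EXPORT_DIR], check=True)

with open(HDD_CFG, "a") as f:
    f.write(EXPORT_DIR + "\n")

# Afterwards the chunk server service can be (re)started.
subprocess.run(["systemctl", "restart", "lizardfs-chunkserver"], check=True)
```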
The metadata backup logger, similar to a shadow master, continuously collects metadata changes and should naturally run on a system of its own. Unlike a typical master, however, it keeps the metadata not in memory but on the local file system. In the unlikely event that all LizardFS masters fail completely, a backup is thus available for disaster recovery.
File Operations
To mount a share exported by LizardFS, the package lizardfs-client must be installed. A simple command then makes the data residing on the chunk servers accessible via a mount point, for example under /mnt. As soon as a user or an application tries to access such a file, the installed LizardFS client contacts the current master, which returns a list of chunk servers holding the desired file. The client receives the list and then contacts one of the chunk servers with a request to send the file. The chunk server responds with the desired data stream.
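A minimal sketch of that client-side view follows, assuming the mfsmount binary with its -H option for the master host (inherited from MooseFS); the exact invocation should be checked against the man page shipped with the installed client.

```python
# Minimal sketch: mount a LizardFS share and read a file from it like any
# local POSIX file system. Host name and file path are placeholders.
import os
import subprocess

MASTER_HOST = "mfsmaster.example.com"   # hypothetical master address
MOUNT_POINT = "/mnt/lizardfs"

os.makedirs(MOUNT_POINT, exist_ok=True)
subprocess.run(["mfsmount", MOUNT_POINT, "-H", MASTER_HOST], check=True)

# From here on the pool behaves like an ordinary local file system.
with open(f"{MOUNT_POINT}/example.txt") as f:
    print(f.read())
```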
If a file on the LizardFS storage pool is written or modified, the LizardFS client must also consult the master. An existing file may have been moved by balancing mechanisms, or a previously available chunk server may have gone offline. Only when the master has selected a chunk server for writing and communicated it to the client does the client send its data to the target chunk server in the next step.
The chunk server confirms the write operation and, if necessary, initiates replication of the newly written data. In doing so it ensures that a configured replication target is met as quickly as possible. A reply to the LizardFS client signals a successful write. The client then closes the write session it has just opened by informing the master that the operation has finished.
During both read and write operations, the LizardFS client respects a topology if one has been configured. If it appears useful, the client prefers nearby chunk servers over those that may only be reachable with higher latency.
A simple but sufficient web GUI
So that users and administrators can keep track of things, LizardFS provides a simple but sufficient web GUI in the form of the CGI server. This component can in principle be installed on any system and offers an overview of all servers in the network, the number of files, the replication status and other important information.
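A quick reachability check of that GUI might look like the following sketch; the default port 9425 is an assumption carried over from MooseFS, and the host name is a placeholder.

```python
# Minimal sketch of a reachability check for the LizardFS CGI web GUI.
from urllib.request import urlopen

GUI_URL = "http://lizardfs-cgi.example.com:9425/"   # hypothetical host, assumed default port

with urlopen(GUI_URL, timeout=5) as response:
    print("Web GUI reachable, HTTP status:", response.status)
```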
Those who also sign a support contract with the company behind LizardFS can use the proprietary uRaft daemon. This tool is independent of LizardFS and, based on the Raft consensus algorithm, provides a mechanism for automated failover. Using heartbeats, all participating master daemons continuously elect which of them is in charge. For this to work reliably, an odd number of masters should be available to form a quorum.
uRaft then uses simple shell scripts to promote or demote the individual masters. Anyone running LizardFS with the uRaft daemon drops the typical master and shadow roles and leaves the entire management of the roles to uRaft. Moreover, uRaft takes care of starting and stopping the master daemon, which means administrators must no longer use the init script of the LizardFS master.
uRaft assumes that all masters are in the same network and that a floating IP can be moved along with the respective primary master. The uRaft daemon is installed on every server on which a master service runs.
None of the components places high demands on the underlying system. Only the master server should have more memory, depending on the number of files to be managed.
Feature diversity and limits
From the client's point of view, LizardFS provides a POSIX-compatible file system that can be attached with a mount command, similar to NFS. While the server components require Linux as the underlying operating system, Windows computers can access the network file system in addition to Linux-based clients.
Naturally, LizardFS works best on multiple servers, and for its functionality it does not matter whether virtual machines or standard hardware are used. All servers can take on all or different roles, although some specialization is useful: chunk servers should run on systems with a lot of (and ideally fast) storage, while the master server mainly places demands on CPU and memory. The metadata backup logger, on the other hand, makes so few demands that it can run alongside on a small virtual machine or a backup server.
Using predefined replication targets ("goals"), the data is replicated between the systems as often as required and is thus stored redundantly and fault-tolerantly: if a chunk server fails, the data is still available via other servers. If the defective system is repaired and reintegrated into the cluster, LizardFS automatically redistributes files so that the replication targets are met again. Goals can be set server-side as defaults; alternatively, clients can also be allowed to set replication targets. It is thus possible, for example, to keep a mounted file system redundant in principle, but to give users the option of storing temporary files only once in the storage pool.
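A minimal sketch of setting such goals from a client follows, assuming the lizardfs command-line tool with setgoal/getgoal subcommands and a recursive -r flag; the exact syntax should be verified against the installed version.

```python
# Minimal sketch: keep project data on three chunk servers, scratch data
# only once. Paths are hypothetical locations on the mounted pool.
import subprocess

IMPORTANT = "/mnt/lizardfs/projects"
SCRATCH = "/mnt/lizardfs/tmp"

subprocess.run(["lizardfs", "setgoal", "-r", "3", IMPORTANT], check=True)
subprocess.run(["lizardfs", "setgoal", "-r", "1", SCRATCH], check=True)

# Verify what is currently configured.
subprocess.run(["lizardfs", "getgoal", IMPORTANT, SCRATCH], check=True)
```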
The chunk servers can be fed with the topology of one's own data center, which tells LizardFS whether the chunk servers are in the same rack or cage. Sensibly configured topologies allow the traffic caused by replication to be kept behind the same switch or within a colocation, because the topologies are also propagated to the clients.
Operating LizardFS across data center boundaries
If LizardFS is to be operated across data center boundaries, topologies can also serve as the basis for geo-replication and tell LizardFS which chunk servers are located in which data center. The clients accessing the LizardFS storage pool can thus be made to prefer chunk servers local to their data center over distant ones. In addition, users and applications do not have to take care of replication themselves: if the topologies are configured correctly, geo-replication simply falls into place and the data is automatically synchronized between two or more locations.
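The following sketch shows what such a topology definition could look like, assuming the MooseFS-style /etc/mfs/mfstopology.cfg file that maps networks to location IDs; both the file name and the line format are assumptions, so the official documentation is authoritative.

```python
# Minimal sketch: write a topology file that assigns chunk-server networks
# to two locations, then the master has to reload its configuration for the
# clients' preference for nearby chunk servers to take effect.
TOPOLOGY = """\
# chunk servers in data center A
192.168.10.0/24    1
# chunk servers in data center B
192.168.20.0/24    2
"""

with open("/etc/mfs/mfstopology.cfg", "w") as f:
    f.write(TOPOLOGY)
```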
The shares provided by LizardFS can in principle be mounted by all network participants. If this access is to be restricted, it can for example be limited to a read-only export, or ACLs can be assigned based on networks, domains and/or IPs. Assigning a password is also possible. Further restrictions at this stage, for example to specific user groups coming from an LDAP or Active Directory, are not provided.
The configuration file for the LizardFS exports is very similar to the /etc/exports file known from NFS and partly accepts parameters familiar from NFS.
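A minimal sketch of such an export definition follows, assuming the MooseFS-style /etc/mfs/mfsexports.cfg file and option names like rw, ro, alldirs and password; both the file name and the options are assumptions to be checked against the documentation shipped with the master.

```python
# Minimal sketch: one read/write export for the admin network and one
# password-protected read-only export for everyone else. A reload of the
# master daemon makes the new exports effective.
EXPORTS = """\
# admin network: full read/write access, subdirectories mountable
192.168.10.0/24    /        rw,alldirs

# everyone else: read-only access protected by a password
*                  /public  ro,password=secret
"""

with open("/etc/mfs/mfsexports.cfg", "w") as f:
    f.write(EXPORTS)
```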
LizardFS supports archiving
If many different systems or users access LizardFS volumes, quotas can be enabled to limit the use of storage space. And because users of Samba and Windows shares are accustomed to the popular recycle bin, such a trash can also be enabled for LizardFS shares. Files moved to the trash are then not deleted immediately and remain on the chunk servers as long as the configured retention time has not yet expired. Administrators can mount LizardFS shares with a specific parameter that gives them access to this virtual wastebasket. Unfortunately, there is no option to give users themselves access to previously deleted data.
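As a sketch, trash retention and the administrative view of deleted files could be handled as follows; the settrashtime subcommand (retention in seconds) and the mfsmeta mount option are assumptions based on the MooseFS lineage, not commands quoted in the article.

```python
# Minimal sketch: keep deleted files for seven days and mount the meta view
# in which the trash becomes visible to the administrator.
import os
import subprocess

DATA_MOUNT = "/mnt/lizardfs"        # regular client mount
META_MOUNT = "/mnt/lizardfs-meta"   # hypothetical meta mount point
MASTER_HOST = "mfsmaster.example.com"

# Retention time for deleted files, in seconds (7 days).
subprocess.run(["lizardfs", "settrashtime", "-r", str(7 * 24 * 3600), DATA_MOUNT],
               check=True)

# Mount the meta file system; deleted files appear under its trash directory.
os.makedirs(META_MOUNT, exist_ok=True)
subprocess.run(["mfsmount", META_MOUNT, "-H", MASTER_HOST, "-o", "mfsmeta"],
               check=True)
```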
Snapshots offer another way of keeping copies of files. A command duplicates a file into a snapshot. This action is particularly efficient: the master server initially copies only the metadata. Only when the newly created file diverges from the original are the corresponding blocks modified on the chunk servers.
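A minimal sketch of creating such a copy-on-write snapshot from a client, assuming the lizardfs makesnapshot subcommand; the paths are placeholders.

```python
# Minimal sketch: duplicate a file as a snapshot. Only metadata is copied
# until the new file actually diverges from the original.
import subprocess

SRC = "/mnt/lizardfs/projects/report.ods"
DST = "/mnt/lizardfs/projects/report-2016-04-27.ods"

subprocess.run(["lizardfs", "makesnapshot", SRC, DST], check=True)
```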
LizardFS is also well suited to archiving files: thanks to the replication goals and topologies, files can be configured so that a copy always lands on a desired chunk server or set of chunk servers. It is thus conceivable that a group of chunk servers uses tapes (LTO) as storage media, because LizardFS can address them natively. If desired, LizardFS can therefore ensure that certain data is always kept on tape and can be read back when needed.
For monitoring a LizardFS environment, the previously mentioned web GUI is available. Since it cannot replace classic monitoring, however, LizardFS provides a special output format for virtually all administrative tools that query states, which can easily be parsed by checks. Nothing therefore stands in the way of a connection to Nagios or other monitoring tools. Thanks to backward compatibility with an older version of MooseFS, many of the modules and plugins that can be found on the internet for monitoring MooseFS can also be used. Resourceful administrators write their own checks.
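One such self-written check could look like the following Nagios-style sketch. It assumes a lizardfs-admin tool with a list-chunkservers subcommand and a --porcelain flag that prints one chunk server per line; the command name, the flag, the default port and the output layout are all assumptions to be verified against the installed version.

```python
#!/usr/bin/env python3
# Minimal sketch of a Nagios-style check built on machine-readable output:
# warn if fewer chunk servers report in than a replication goal requires.
import subprocess
import sys

MASTER_HOST = "mfsmaster.example.com"   # hypothetical master address
MASTER_PORT = "9421"                    # assumed default master port
MIN_CHUNKSERVERS = 3                    # matches a replication goal of 3

try:
    out = subprocess.run(
        ["lizardfs-admin", "list-chunkservers", "--porcelain",
         MASTER_HOST, MASTER_PORT],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
except Exception as exc:
    print(f"UNKNOWN: could not query master: {exc}")
    sys.exit(3)

online = [line for line in out.splitlines() if line.strip()]
if len(online) >= MIN_CHUNKSERVERS:
    print(f"OK: {len(online)} chunk servers reported")
    sys.exit(0)
print(f"CRITICAL: only {len(online)} chunk servers reported")
sys.exit(2)
```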
More chunk servers for more storage
If the demand for storage space rises, more chunk servers can be added. It is also possible to remove existing chunk servers, as long as the data on them is held elsewhere. If a chunk server is removed, LizardFS autonomously takes care of rebalancing and simply offers accessing clients alternative chunk servers as data sources.
Incidentally, master servers can be scaled in a similar way to the chunk servers: if, for example, two masters no longer meet the requirements of a location, further ones can simply be added.
Administrators who want not only to store their data in a fail-safe and fault-tolerant way but also to protect it from prying eyes must take additional measures. Although LizardFS splits files into several stripes under some circumstances, this is often not enough to meet security and privacy requirements. Since no encryption mechanism is built in, LizardFS admins must ensure that the file system exported by the chunk servers is encrypted at the layer below. This can be done with self-encrypting hard drives or with file system containers encrypted via LUKS.
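A minimal sketch of the LUKS variant on a chunk server, using standard cryptsetup calls; the device name and mount point are placeholders, and in practice the passphrase would come from a key management system rather than an interactive prompt.

```python
# Minimal sketch: encrypt a block device below the directory a chunk server
# exports, so data at rest on this node is protected.
import os
import subprocess

DEVICE = "/dev/sdb1"                  # hypothetical dedicated disk/partition
MAPPED = "lizardfs_chunks"
MOUNT_POINT = "/srv/lizardfs/chunks"  # directory the chunk server exports

# One-time setup: format the device as a LUKS container and create a file
# system inside it (cryptsetup prompts for a passphrase).
subprocess.run(["cryptsetup", "luksFormat", DEVICE], check=True)
subprocess.run(["cryptsetup", "open", DEVICE, MAPPED], check=True)
subprocess.run(["mkfs.ext4", f"/dev/mapper/{MAPPED}"], check=True)

# Mount the decrypted file system where the chunk server expects its data.
os.makedirs(MOUNT_POINT, exist_ok=True)
subprocess.run(["mount", f"/dev/mapper/{MAPPED}", MOUNT_POINT], check=True)
```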
Applications
Since LizardFS presents itself to the client as classic storage, it can in principle be used wherever space is needed. Its use makes particular sense, however, when data is to be kept redundant, or when an alternative to traditional, expensive and inflexible storage appliances is required.
Anyone considering the use of LizardFS should, however, keep in mind that typically several servers are combined into one large pool. It can therefore not always be guaranteed that contiguous data, such as a database, is stored on a single chunk server. The administrator can influence placement via topologies (for example: one copy in rack 14, another copy in rack 15), but depending on the application scenario and the LizardFS configuration, this is not necessarily a guarantee of fixed locations.
It is therefore possible for an application to write its data across several chunk servers (and possibly across rack or even data center boundaries). There are situations in which read and write operations are then delayed by increased network latency. Depending on the application, this leads to longer response times. Large and heavily used databases are therefore typically not written to a distributed file system. For most other applications, however, LizardFS is well suited.
LizardFS is also suitable as a storage pool for a render farm or for media files, because all clients access different chunk servers simultaneously and can thus exploit the performance of every participating server. Less well known is the option of using LizardFS as storage for virtual machines. According to its own statements, Skytechnology is also working on better VMware integration to make LizardFS more interesting as cloud VM storage.
Experience
The author has been working with SDS technology for some time and is testing a LizardFS environment as part of a pilot project. The test run, lasting several months, is based on LizardFS version 2.6 for Debian 7, the version available at the start, and involves verifying the software under production-like conditions, including various tests such as failover.
The tests described here cover only a portion of the possible events, but they show that the solution works in principle and can handle failure cases. The behavior of LizardFS is usually comprehensible, though sometimes a bit sluggish.
One of the test scenarios was restarting a master in order to force a change of the master role to a different node in another data center. The concurrent read access of a client was interrupted for nearly two seconds during the master change, but then continued. In such cases it is therefore important that the accessing application can cope with such delays and that the master change is performed as quickly as possible.
Another scenario had the clients know which chunk servers were local to their data center and prefer them when reading over the chunk servers at the other location. During a read operation by such a client, the chunk servers in the same location, from which the client was fetching data, were disconnected; the client continued reading from the chunk servers in the other location without noticeable delay. Once the disconnected chunk servers in the same data center were available again, the client automatically swung back and read the data from the original nodes once more.
These tests also revealed a limitation: in LizardFS, goals specify how many copies of a file the distributed file system is to hold. As a rule, at least that many chunk servers are kept online so that this target can be met. However, if too few chunk servers are online to meet the target for a new file immediately (due to testing, reboots, crashes or other failures), the LizardFS master refuses the write. In this case only two chunk servers were available with a replication goal of three. A LizardFS developer confirmed this behavior and announced an internal discussion of the topic.
Comparison
In the SDS area, LizardFS has to measure itself against many competitors, among them Ceph and GlusterFS. Ceph is primarily an object store compatible with the Amazon S3 API, but it can also provide block devices or a POSIX-compatible file system ("CephFS"). The latter is more of an overlay on top of the object store than a robust file system. The manufacturer itself writes on its website that CephFS is currently aimed primarily at early adopters and that no important data should be stored on it. Since Ceph focuses on object store functionality, it cannot be considered a direct competitor to LizardFS.
On paper, GlusterFS provides nearly the same functionality, but it has been on the market much longer and accordingly enjoys a lot of prestige in the SDS community. GlusterFS offers many different operating modes that provide various levels of reliability and performance depending on the configuration. It offers these configuration options at the volume level, while LizardFS defines replication targets per folder or file. Both variants have their advantages and disadvantages: with GlusterFS the administrator has to decide on a variant when creating the volume, while with LizardFS the type of replication can always be changed later.
For security-conscious administrators, GlusterFS offers the ability to encrypt a volume with a key. Only servers that can present the correct key are then able to mount and decrypt the volume. Both GlusterFS and LizardFS run on Linux clients as a FUSE module in user space.
While with LizardFS the data only has to be written to one chunk server (the chunk servers then replicate among themselves), with GlusterFS the client takes over replication: the write operation is performed on all participating GlusterFS servers in parallel, so the client has to ensure that replication completes successfully everywhere. This makes writes somewhat slower, but is otherwise hardly noticeable.
While LizardFS always presents the client with one master, GlusterFS clients can specify several GlusterFS servers when mounting the volume. In case of failure of the first server given, a client can thus fall back on other GlusterFS nodes.
Outlook
In addition to the usual fixing of bugs and improvements to performance and stability, the makers of LizardFS are also planning better VMware integration. There are apparently still no plans for integrating data encryption, but a protocol rewrite is planned that could in principle form the basis for new features such as encryption.
To win over Ceph users, for example, LizardFS also wants to provide an S3-compatible API at some as yet unscheduled point in the future. In the long term it should then be possible to use LizardFS as an object store, though it remains unclear when this functionality will arrive. To grow the user base, a SPARC port is also planned.
Other minor improvements include extended ACLs, folder-based quotas, better logging behavior on Windows clients and minimal goal settings. The latter has long been desired by the community and is intended to ensure greater reliability and data security.
Conclusion
LizardFS positions itself as an alternative to GlusterFS and is already operated as an SDS solution in setups with several hundred terabytes. The project offers many interesting features and should be sufficient for most needs. Moreover, it can keep data in sync between multiple data centers. It is not an alternative to Ceph, since LizardFS cannot provide an object store.
There is room for improvement in the external communication (roadmap, documentation, community work), in the software versioning (a small version step at one moment, a major release the next) and in technical details: an SDS should come with native encryption for the communication between the nodes and for the data, and should ensure that write operations can continue even when many chunk servers are offline.
It is a pity that the company behind LizardFS reserves the daemon for automatic failover for paying customers and does not provide the community with an equivalent. There are hints in the community as to how a Pacemaker cluster can provide automated failover, but clear instructions, scripts and recommendations from the LizardFS makers are lacking.
For those looking for an alternative to GlusterFS, or simply for a working SDS, LizardFS should nevertheless be interesting. The future will show how serious the developers are about their announcements, such as the implementation of an S3-compatible API, and whether ten active developers can sustain a project with such ambitions in the long term.
About the Author
Valentin Höbel works as a cloud architect for nfon AG in Munich. In his spare time he works with open source technologies and reports on his experiences in online and print media. Recently he has been helping to build a free community forum for LizardFS to make it easier for other users to exchange views on the project.
Blog: http://www.xenuser.org
Twitter: https://twitter.com/xenuser (vah)