Wayland is a protocol, while Xorg is a display server (which speaks the X11 protocol). Wayland is the 'new' thing and works differently than X11: back when X11 was created, it was common to have one powerful (time-sharing) server and multiple clients connecting to it, and its architecture still reflects that.
A good way to understand the Wayland architecture and how it is different from X is to follow an event from the input device to the point where the change it affects appears on screen.
This is where we are now with X:
As suggested above, there are a few problems with this approach. The X server doesn't have the information to decide which window should receive the event, nor can it transform the screen coordinates to window-local coordinates. And even though X has handed responsibility for the final painting of the screen to the compositing manager, X still controls the front buffer and modesetting. Most of the complexity that the X server used to handle is now available in the kernel or self contained libraries (KMS, evdev, mesa, fontconfig, freetype, cairo, Qt, etc). In general, the X server is now just a middle man that introduces an extra step between applications and the compositor and an extra step between the compositor and the hardware.
In Wayland, the compositor is the display server. We transfer the control of KMS and evdev to the compositor. The Wayland protocol lets the compositor send the input events directly to the clients and lets the client send the damage event directly to the compositor:
One of the details I left out in the above overview is how clients actually render under Wayland. By removing the X server from the picture we also removed the mechanism by which X clients typically render. But there's another mechanism that we're already using with DRI2 under X: direct rendering. With direct rendering, the client and the server share a video memory buffer. The client links to a rendering library such as OpenGL that knows how to program the hardware and renders directly into the buffer. The compositor in turn can take the buffer and use it as a texture when it composites the desktop. After the initial setup, the client only needs to tell the compositor which buffer to use and when and where it has rendered new content into it.
This leaves an application with two ways to update its window contents: it can render the new content into a new buffer and tell the compositor to use that instead of the old buffer, or it can render the new content directly into the buffer it previously shared with the compositor.
In either case, the application must tell the compositor which area of the surface holds new contents. When the application renders directly to the shared buffer, the compositor needs to be notified that there is new content. But also when exchanging buffers, the compositor doesn't assume anything changed, and needs a request from the application before it will repaint the desktop. The idea is that even if an application passes a new buffer to the compositor, only a small part of the buffer may be different, like a blinking cursor or a spinner.
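In protocol terms, both paths end in the same requests on the surface. A minimal sketch (the surface and buffer objects, and the helper name, are assumptions; they would come from the usual client setup code):

```c
#include <wayland-client.h>

/* Publish new content: point the surface at a buffer, mark the region
 * that changed, and commit the update atomically. */
static void
publish_damage(struct wl_surface *surface, struct wl_buffer *buffer,
        int32_t x, int32_t y, int32_t width, int32_t height)
{
    wl_surface_attach(surface, buffer, 0, 0);        /* which buffer to use */
    wl_surface_damage(surface, x, y, width, height); /* which region changed */
    wl_surface_commit(surface);                      /* apply the update */
}
```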
Typically, hardware enabling includes modesetting/display and EGL/GLES2. On top of that, Wayland needs a way to share buffers efficiently between processes. There are two sides to that, the client side and the server side.
On the client side we've defined a Wayland EGL platform. In the EGL model, that consists of the native types (EGLNativeDisplayType, EGLNativeWindowType and EGLNativePixmapType) and a way to create those types. In other words, it's the glue code that binds the EGL stack and its buffer sharing mechanism to the generic Wayland API. The EGL stack is expected to provide an implementation of the Wayland EGL platform. The full API is in the wayland-egl.h header. The open source implementation in the mesa EGL stack is in platform_wayland.c.
Under the hood, the EGL stack is expected to define a vendor-specific protocol extension that lets the client side EGL stack communicate buffer details with the compositor in order to share buffers. The point of the wayland-egl.h API is to abstract that away and just let the client create an EGLSurface for a Wayland surface and start rendering. The open source stack uses the drm Wayland extension, which lets the client discover the drm device to use and authenticate and then share drm (GEM) buffers with the compositor.
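From the application's point of view, the result is that a wl_surface can be wrapped up and handed to EGL like any other native window. A rough sketch (the helper name is hypothetical, error handling is omitted, and this uses the legacy eglGetDisplay entry point, which the mesa stack maps to the Wayland platform):

```c
#include <wayland-client.h>
#include <wayland-egl.h>
#include <EGL/egl.h>

/* Create an EGLSurface for an existing wl_surface so the client can
 * render into a shared buffer with GLES2. */
static EGLSurface
create_egl_surface(struct wl_display *display, struct wl_surface *surface,
        int width, int height, EGLDisplay *display_out)
{
    EGLDisplay egl_display = eglGetDisplay((EGLNativeDisplayType)display);
    eglInitialize(egl_display, NULL, NULL);

    const EGLint attribs[] = {
        EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_NONE,
    };
    EGLConfig config;
    EGLint n = 0;
    eglChooseConfig(egl_display, attribs, &config, 1, &n);

    /* wl_egl_window is the glue: it wraps a wl_surface as an
     * EGLNativeWindowType for the Wayland EGL platform. */
    struct wl_egl_window *native =
        wl_egl_window_create(surface, width, height);

    *display_out = egl_display;
    return eglCreateWindowSurface(egl_display, config,
            (EGLNativeWindowType)native, NULL);
}
```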
The server side of Wayland is the compositor and core UX for the vertical, typically integrating task switcher, app launcher, lock screen in one monolithic application. The server runs on top of a modesetting API (kernel modesetting, OpenWF Display or similar) and composites the final UI using a mix of EGL/GLES2 compositor and hardware overlays if available. Enabling modesetting, EGL/GLES2 and overlays is something that should be part of standard hardware bringup. The extra requirement for Wayland enabling is the EGL_WL_bind_wayland_display extension that lets the compositor create an EGLImage from a generic Wayland shared buffer. It's similar to the EGL_KHR_image_pixmap extension to create an EGLImage from an X pixmap.
The extension has a setup step where you have to bind the EGL display to a Wayland display. Then as the compositor receives generic Wayland buffers from the clients (typically when the client calls eglSwapBuffers), it will be able to pass the struct wl_buffer pointer to eglCreateImageKHR as the EGLClientBuffer argument and with EGL_WAYLAND_BUFFER_WL as the target. This will create an EGLImage, which can then be used by the compositor as a texture or passed to the modesetting code to use as an overlay plane. Again, this is implemented by the vendor specific protocol extension, which on the server side will receive the driver specific details about the shared buffer and turn that into an EGL image when the user calls eglCreateImageKHR.
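A compositor-side sketch of that flow. The extension entry points (eglBindWaylandDisplayWL, eglCreateImageKHR, glEGLImageTargetTexture2DOES) are real, but the surrounding helper functions are hypothetical and error handling is omitted:

```c
#include <wayland-server.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

static PFNEGLBINDWAYLANDDISPLAYWLPROC bind_wayland_display;
static PFNEGLCREATEIMAGEKHRPROC create_image;
static PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture_2d;

/* One-time setup: load the extension entry points and let the EGL
 * stack hook its buffer-sharing protocol into this wl_display. */
static void
egl_bind_display(EGLDisplay egl_display, struct wl_display *wl_display)
{
    bind_wayland_display = (PFNEGLBINDWAYLANDDISPLAYWLPROC)
        eglGetProcAddress("eglBindWaylandDisplayWL");
    create_image = (PFNEGLCREATEIMAGEKHRPROC)
        eglGetProcAddress("eglCreateImageKHR");
    image_target_texture_2d = (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
        eglGetProcAddress("glEGLImageTargetTexture2DOES");

    bind_wayland_display(egl_display, wl_display);
}

/* Per-buffer: turn a wl_buffer received from a client into a GL
 * texture the compositor can composite with. */
static GLuint
texture_from_wl_buffer(EGLDisplay egl_display, struct wl_buffer *buffer)
{
    EGLImageKHR image = create_image(egl_display, EGL_NO_CONTEXT,
            EGL_WAYLAND_BUFFER_WL, (EGLClientBuffer)buffer, NULL);

    GLuint texture;
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);
    image_target_texture_2d(GL_TEXTURE_2D, (GLeglImageOES)image);
    return texture;
}
```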
In previous chapters, we built a simple client which can present its surfaces on the display. Let's expand this code a bit to build a client which can receive input events. For the sake of simplicity, we're just going to be logging input events to stderr.
This is going to require a lot more code than we've worked with so far, so get strapped in. The first thing we need to do is set up the seat.
The first thing we'll need is a reference to a seat. We'll add it to our client_state struct, and add keyboard, pointer, and touch objects for later use as well:
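Something like this (the fields from previous chapters are elided):

```c
struct client_state {
    /* ... globals and surface state from previous chapters ... */
    struct wl_seat *wl_seat;
    struct wl_keyboard *wl_keyboard;
    struct wl_pointer *wl_pointer;
    struct wl_touch *wl_touch;
};
```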
We'll also need to update registry_global to register a listener for that seat.
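In the registry handler, that amounts to one more branch (the existing branches for the other globals are elided):

```c
static void
registry_global(void *data, struct wl_registry *wl_registry,
        uint32_t name, const char *interface, uint32_t version)
{
    struct client_state *state = data;
    /* ... existing branches for other interfaces ... */
    if (strcmp(interface, wl_seat_interface.name) == 0) {
        state->wl_seat = wl_registry_bind(
                wl_registry, name, &wl_seat_interface, 7);
        wl_seat_add_listener(state->wl_seat,
                &wl_seat_listener, state);
    }
}
```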
Note that we bind to the latest version of the seat interface, version 7. Let's also rig up that listener:
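A sketch of the listener; the capabilities handler is a stub for now, and will grow as we add each device type:

```c
static void
wl_seat_capabilities(void *data, struct wl_seat *wl_seat,
        uint32_t capabilities)
{
    /* TODO: set up pointer, keyboard, and touch objects */
}

static void
wl_seat_name(void *data, struct wl_seat *wl_seat, const char *name)
{
    fprintf(stderr, "seat name: %s\n", name);
}

static const struct wl_seat_listener wl_seat_listener = {
    .capabilities = wl_seat_capabilities,
    .name = wl_seat_name,
};
```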
If you compile (cc -o client client.c xdg-shell-protocol.c) and run this now, you should see the name of the seat printed to stderr.
Let's get to pointer events. If you recall, pointer events from the Wayland server are to be accumulated into a single logical event. For this reason, we'll need to define a struct to store them in.
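One possible shape for it, assuming stdbool.h and stdint.h are already included: a bitmask enum records which events arrived in the current frame, and the struct stores their payloads.

```c
enum pointer_event_mask {
    POINTER_EVENT_ENTER = 1 << 0,
    POINTER_EVENT_LEAVE = 1 << 1,
    POINTER_EVENT_MOTION = 1 << 2,
    POINTER_EVENT_BUTTON = 1 << 3,
    POINTER_EVENT_AXIS = 1 << 4,
    POINTER_EVENT_AXIS_SOURCE = 1 << 5,
    POINTER_EVENT_AXIS_STOP = 1 << 6,
    POINTER_EVENT_AXIS_DISCRETE = 1 << 7,
};

struct pointer_event {
    uint32_t event_mask;
    wl_fixed_t surface_x, surface_y;
    uint32_t button, state;
    uint32_t time;
    uint32_t serial;
    struct {
        bool valid;
        wl_fixed_t value;
        int32_t discrete;
    } axes[2];  /* indexed by WL_POINTER_AXIS_* */
    uint32_t axis_source;
};
```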
We'll be using a bitmask here to identify which events we've received for a single pointer frame, and storing the relevant information from each event in their respective fields. Let's add this to our state struct as well:
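A single new field suffices:

```c
struct client_state {
    /* ... */
    struct pointer_event pointer_event;
};
```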
Then we'll need to update our wl_seat_capabilities to set up the pointer object for seats which are capable of pointer input.
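The updated handler might look like this:

```c
static void
wl_seat_capabilities(void *data, struct wl_seat *wl_seat,
        uint32_t capabilities)
{
    struct client_state *state = data;

    bool have_pointer = capabilities & WL_SEAT_CAPABILITY_POINTER;

    if (have_pointer && state->wl_pointer == NULL) {
        state->wl_pointer = wl_seat_get_pointer(state->wl_seat);
        wl_pointer_add_listener(state->wl_pointer,
                &wl_pointer_listener, state);
    } else if (!have_pointer && state->wl_pointer != NULL) {
        wl_pointer_release(state->wl_pointer);
        state->wl_pointer = NULL;
    }
}
```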
This merits some explanation. Recall that capabilities is a bitmask of the kinds of devices supported by this seat - a bitwise AND (&) with a capability will produce a non-zero value if supported. Then, if we have a pointer and have not already configured it, we take the first branch, using wl_seat_get_pointer to obtain a pointer reference and storing it in our state. If the seat does not support pointers, but we already have one configured, we use wl_pointer_release to get rid of it. Remember that the capabilities of a seat can change at runtime, for example when the user un-plugs and re-plugs their mouse.
We also configured a listener for the pointer. Let's add the struct for that, too:
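It names a handler for every pointer event; we'll implement each of them below.

```c
static const struct wl_pointer_listener wl_pointer_listener = {
    .enter = wl_pointer_enter,
    .leave = wl_pointer_leave,
    .motion = wl_pointer_motion,
    .button = wl_pointer_button,
    .axis = wl_pointer_axis,
    .frame = wl_pointer_frame,
    .axis_source = wl_pointer_axis_source,
    .axis_stop = wl_pointer_axis_stop,
    .axis_discrete = wl_pointer_axis_discrete,
};
```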
Pointers have a lot of events. Let's have a look at them.
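Starting with 'enter' and 'leave', a sketch that just accumulates state into our pointer_event struct:

```c
static void
wl_pointer_enter(void *data, struct wl_pointer *wl_pointer,
        uint32_t serial, struct wl_surface *surface,
        wl_fixed_t surface_x, wl_fixed_t surface_y)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_ENTER;
    client_state->pointer_event.serial = serial;
    client_state->pointer_event.surface_x = surface_x;
    client_state->pointer_event.surface_y = surface_y;
}

static void
wl_pointer_leave(void *data, struct wl_pointer *wl_pointer,
        uint32_t serial, struct wl_surface *surface)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_LEAVE;
    client_state->pointer_event.serial = serial;
}
```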
The 'enter' and 'leave' events are fairly straightforward, and they set the stage for the rest of the implementation. We update the event mask to include the appropriate event, then populate it with the data we were provided. The 'motion' and 'button' events are rather similar:
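Again, we only record the payload for the eventual frame handler:

```c
static void
wl_pointer_motion(void *data, struct wl_pointer *wl_pointer, uint32_t time,
        wl_fixed_t surface_x, wl_fixed_t surface_y)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_MOTION;
    client_state->pointer_event.time = time;
    client_state->pointer_event.surface_x = surface_x;
    client_state->pointer_event.surface_y = surface_y;
}

static void
wl_pointer_button(void *data, struct wl_pointer *wl_pointer, uint32_t serial,
        uint32_t time, uint32_t button, uint32_t state)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_BUTTON;
    client_state->pointer_event.time = time;
    client_state->pointer_event.serial = serial;
    client_state->pointer_event.button = button;
    client_state->pointer_event.state = state;
}
```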
Axis events are somewhat more complex, because there are two axes: horizontal and vertical. Thus, our pointer_event struct contains an array with two groups of axis events. Our code to handle these ends up something like this:
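A sketch covering the four axis-related events; the axis argument indexes our two-element array directly:

```c
static void
wl_pointer_axis(void *data, struct wl_pointer *wl_pointer, uint32_t time,
        uint32_t axis, wl_fixed_t value)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_AXIS;
    client_state->pointer_event.time = time;
    client_state->pointer_event.axes[axis].valid = true;
    client_state->pointer_event.axes[axis].value = value;
}

static void
wl_pointer_axis_source(void *data, struct wl_pointer *wl_pointer,
        uint32_t axis_source)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_AXIS_SOURCE;
    client_state->pointer_event.axis_source = axis_source;
}

static void
wl_pointer_axis_stop(void *data, struct wl_pointer *wl_pointer,
        uint32_t time, uint32_t axis)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_AXIS_STOP;
    client_state->pointer_event.time = time;
    client_state->pointer_event.axes[axis].valid = true;
}

static void
wl_pointer_axis_discrete(void *data, struct wl_pointer *wl_pointer,
        uint32_t axis, int32_t discrete)
{
    struct client_state *client_state = data;
    client_state->pointer_event.event_mask |= POINTER_EVENT_AXIS_DISCRETE;
    client_state->pointer_event.axes[axis].valid = true;
    client_state->pointer_event.axes[axis].discrete = discrete;
}
```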
Similarly straightforward, aside from the main change of updating whichever axis was affected. Note the use of the 'valid' boolean as well: it's possible that we'll receive a pointer frame which updates one axis, but not another, so we use this 'valid' value to determine which axes were updated in the frame event.
Speaking of which, it's time for the main attraction: our 'frame' handler.
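A sketch of it (string.h is assumed to be included, for memset):

```c
static void
wl_pointer_frame(void *data, struct wl_pointer *wl_pointer)
{
    struct client_state *client_state = data;
    struct pointer_event *event = &client_state->pointer_event;
    fprintf(stderr, "pointer frame @ %u: ", event->time);

    if (event->event_mask & POINTER_EVENT_ENTER) {
        fprintf(stderr, "entered %f, %f ",
                wl_fixed_to_double(event->surface_x),
                wl_fixed_to_double(event->surface_y));
    }
    if (event->event_mask & POINTER_EVENT_LEAVE) {
        fprintf(stderr, "leave ");
    }
    if (event->event_mask & POINTER_EVENT_MOTION) {
        fprintf(stderr, "motion %f, %f ",
                wl_fixed_to_double(event->surface_x),
                wl_fixed_to_double(event->surface_y));
    }
    if (event->event_mask & POINTER_EVENT_BUTTON) {
        const char *state =
            event->state == WL_POINTER_BUTTON_STATE_RELEASED ?
            "released" : "pressed";
        fprintf(stderr, "button %u %s ", event->button, state);
    }

    uint32_t axis_events = POINTER_EVENT_AXIS
        | POINTER_EVENT_AXIS_SOURCE
        | POINTER_EVENT_AXIS_STOP
        | POINTER_EVENT_AXIS_DISCRETE;
    const char *axis_name[2] = {
        [WL_POINTER_AXIS_VERTICAL_SCROLL] = "vertical",
        [WL_POINTER_AXIS_HORIZONTAL_SCROLL] = "horizontal",
    };
    if (event->event_mask & axis_events) {
        for (size_t i = 0; i < 2; ++i) {
            if (!event->axes[i].valid) {
                continue;
            }
            fprintf(stderr, "%s axis ", axis_name[i]);
            if (event->event_mask & POINTER_EVENT_AXIS) {
                fprintf(stderr, "value %f ",
                        wl_fixed_to_double(event->axes[i].value));
            }
            if (event->event_mask & POINTER_EVENT_AXIS_DISCRETE) {
                fprintf(stderr, "discrete %d ", event->axes[i].discrete);
            }
            if (event->event_mask & POINTER_EVENT_AXIS_STOP) {
                fprintf(stderr, "(stopped) ");
            }
        }
    }

    fprintf(stderr, "\n");
    /* Reset the accumulated state for the next frame */
    memset(event, 0, sizeof(*event));
}
```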
It certainly is the longest of the bunch, isn't it? Hopefully it isn't too confusing, though. All we're doing here is pretty-printing the accumulated state for this frame to stderr. If you compile and run this again now, you should be able to wiggle your mouse over the window and see input events printed out!
Let's update our client_state struct with some fields to store XKB state.
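Three new fields:

```c
struct client_state {
    /* ... */
    struct xkb_state *xkb_state;
    struct xkb_context *xkb_context;
    struct xkb_keymap *xkb_keymap;
};
```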
We need the xkbcommon headers to define these. While we're at it, I'm going to pull in assert.h as well:
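Remember to link with -lxkbcommon from here on:

```c
#include <assert.h>
#include <xkbcommon/xkbcommon.h>
```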
We'll also need to initialize the xkb_context in our main function:
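One line, before we enter the event loop:

```c
/* In main, after initializing our client_state: */
state.xkb_context = xkb_context_new(XKB_CONTEXT_NO_FLAGS);
```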
Next, let's update our seat capabilities function to rig up our keyboard listener, too.
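The new branch mirrors the pointer one:

```c
/* In wl_seat_capabilities, alongside the pointer branch: */
bool have_keyboard = capabilities & WL_SEAT_CAPABILITY_KEYBOARD;

if (have_keyboard && state->wl_keyboard == NULL) {
    state->wl_keyboard = wl_seat_get_keyboard(state->wl_seat);
    wl_keyboard_add_listener(state->wl_keyboard,
            &wl_keyboard_listener, state);
} else if (!have_keyboard && state->wl_keyboard != NULL) {
    wl_keyboard_release(state->wl_keyboard);
    state->wl_keyboard = NULL;
}
```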
We'll have to define the wl_keyboard_listener we use here, too.
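Nothing surprising here:

```c
static const struct wl_keyboard_listener wl_keyboard_listener = {
    .keymap = wl_keyboard_keymap,
    .enter = wl_keyboard_enter,
    .leave = wl_keyboard_leave,
    .key = wl_keyboard_key,
    .modifiers = wl_keyboard_modifiers,
    .repeat_info = wl_keyboard_repeat_info,
};
```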
And now, the meat of the changes. Let's start with the keymap:
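A sketch, assuming sys/mman.h and unistd.h are included; note that wl_seat version 7 requires the keymap fd to be mapped with MAP_PRIVATE:

```c
static void
wl_keyboard_keymap(void *data, struct wl_keyboard *wl_keyboard,
        uint32_t format, int32_t fd, uint32_t size)
{
    struct client_state *client_state = data;
    assert(format == WL_KEYBOARD_KEYMAP_FORMAT_XKB_V1);

    char *map_shm = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    assert(map_shm != MAP_FAILED);

    struct xkb_keymap *xkb_keymap = xkb_keymap_new_from_string(
            client_state->xkb_context, map_shm,
            XKB_KEYMAP_FORMAT_TEXT_V1, XKB_KEYMAP_COMPILE_NO_FLAGS);
    munmap(map_shm, size);
    close(fd);

    /* Drop any keymap/state from a previous call; the compositor may
     * change the keymap at runtime. */
    xkb_keymap_unref(client_state->xkb_keymap);
    xkb_state_unref(client_state->xkb_state);
    client_state->xkb_keymap = xkb_keymap;
    client_state->xkb_state = xkb_state_new(xkb_keymap);
}
```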
Now we can see why we added assert.h: we're using it here to make sure that the keymap format is the one we expect. Then, we use mmap to map the file descriptor the compositor sent us to a char * pointer we can pass into xkb_keymap_new_from_string. Don't forget to munmap and close that fd afterwards; then we set up our XKB state. Note as well that we have also unrefed any previous XKB keymap or state that we had set up in a prior call to this function, in case the compositor changes the keymap at runtime.
When the keyboard 'enters' our surface, we have received keyboard focus. The compositor forwards a list of keys which were already pressed at that time, and here we just enumerate them and log their keysym names and UTF-8 equivalent. We'll do something similar when keys are pressed:
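A sketch of both handlers, the enter handler first; the + 8 converts an evdev scancode into an XKB keycode:

```c
static void
wl_keyboard_enter(void *data, struct wl_keyboard *wl_keyboard,
        uint32_t serial, struct wl_surface *surface,
        struct wl_array *keys)
{
    struct client_state *client_state = data;
    fprintf(stderr, "keyboard enter; keys pressed are:\n");
    uint32_t *key;
    wl_array_for_each(key, keys) {
        char buf[128];
        xkb_keysym_t sym = xkb_state_key_get_one_sym(
                client_state->xkb_state, *key + 8);
        xkb_keysym_get_name(sym, buf, sizeof(buf));
        fprintf(stderr, "sym: %-12s (%u), ", buf, sym);
        xkb_state_key_get_utf8(client_state->xkb_state,
                *key + 8, buf, sizeof(buf));
        fprintf(stderr, "utf8: '%s'\n", buf);
    }
}

static void
wl_keyboard_key(void *data, struct wl_keyboard *wl_keyboard,
        uint32_t serial, uint32_t time, uint32_t key, uint32_t state)
{
    struct client_state *client_state = data;
    char buf[128];
    uint32_t keycode = key + 8;
    xkb_keysym_t sym = xkb_state_key_get_one_sym(
            client_state->xkb_state, keycode);
    xkb_keysym_get_name(sym, buf, sizeof(buf));
    const char *action =
        state == WL_KEYBOARD_KEY_STATE_PRESSED ? "press" : "release";
    fprintf(stderr, "key %s: sym: %-12s (%u), ", action, buf, sym);
    xkb_state_key_get_utf8(client_state->xkb_state, keycode,
            buf, sizeof(buf));
    fprintf(stderr, "utf8: '%s'\n", buf);
}
```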
And finally, we add small implementations of the three remaining events:
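Sketches of the remaining three:

```c
static void
wl_keyboard_leave(void *data, struct wl_keyboard *wl_keyboard,
        uint32_t serial, struct wl_surface *surface)
{
    fprintf(stderr, "keyboard leave\n");
}

static void
wl_keyboard_modifiers(void *data, struct wl_keyboard *wl_keyboard,
        uint32_t serial, uint32_t mods_depressed,
        uint32_t mods_latched, uint32_t mods_locked,
        uint32_t group)
{
    struct client_state *client_state = data;
    xkb_state_update_mask(client_state->xkb_state,
            mods_depressed, mods_latched, mods_locked, 0, 0, group);
}

static void
wl_keyboard_repeat_info(void *data, struct wl_keyboard *wl_keyboard,
        int32_t rate, int32_t delay)
{
    /* Key repeat is left unimplemented; see the discussion below. */
}
```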
For modifiers, we could decode these further, but most applications won't need to. We just update the XKB state here. As for handling key repeat: this has a lot of constraints particular to your application. Do you want to repeat text input? Do you want to repeat keyboard shortcuts? How does the timing of these interact with your event loop? The answers to these questions are left for you to decide.
If you compile this again, you should be able to start typing into the window and see your input printed into the log. Huzzah!
Finally, we'll add support for touch-capable devices. Like pointers, a 'frame' event exists for touch devices. However, they're further complicated by the possibility that multiple touch points may be updated within a single frame. We'll add some more structures and enums to represent the accumulated state:
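One possible layout, paralleling the pointer code:

```c
enum touch_event_mask {
    TOUCH_EVENT_DOWN = 1 << 0,
    TOUCH_EVENT_UP = 1 << 1,
    TOUCH_EVENT_MOTION = 1 << 2,
    TOUCH_EVENT_CANCEL = 1 << 3,
    TOUCH_EVENT_SHAPE = 1 << 4,
    TOUCH_EVENT_ORIENTATION = 1 << 5,
};

struct touch_point {
    bool valid;
    int32_t id;
    uint32_t event_mask;
    wl_fixed_t surface_x, surface_y;
    wl_fixed_t major, minor;
    wl_fixed_t orientation;
};

struct touch_event {
    uint32_t event_mask;
    uint32_t time;
    uint32_t serial;
    struct touch_point points[10];
};
```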
Note that I've arbitrarily chosen 10 touch points here, with the assumption that most users will only ever use that many fingers. For larger, multi-user touch screens, you may need a higher limit. Additionally, some touch hardware supports fewer than 10 touch points concurrently; 8 is also common, and hardware which supports fewer still is common among older devices.
We'll add this struct to client_state:
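As before, one new field:

```c
struct client_state {
    /* ... */
    struct touch_event touch_event;
};
```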
And we'll update the seat capabilities handler to rig up a listener when touch support is available.
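The third branch in wl_seat_capabilities:

```c
bool have_touch = capabilities & WL_SEAT_CAPABILITY_TOUCH;

if (have_touch && state->wl_touch == NULL) {
    state->wl_touch = wl_seat_get_touch(state->wl_seat);
    wl_touch_add_listener(state->wl_touch,
            &wl_touch_listener, state);
} else if (!have_touch && state->wl_touch != NULL) {
    wl_touch_release(state->wl_touch);
    state->wl_touch = NULL;
}
```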
We've repeated again the pattern of handling both the appearance and disappearance of touch capabilities on the seat, so we're robust to devices appearing and disappearing at runtime. It's less common for touch devices to be hotplugged, though.
Here's the listener itself:
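```c
static const struct wl_touch_listener wl_touch_listener = {
    .down = wl_touch_down,
    .up = wl_touch_up,
    .motion = wl_touch_motion,
    .frame = wl_touch_frame,
    .cancel = wl_touch_cancel,
    .shape = wl_touch_shape,
    .orientation = wl_touch_orientation,
};
```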
To deal with multiple touch points, we'll need to write a small helper function:
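A sketch; it returns the slot already tracking a given touch ID, or claims a free one:

```c
static struct touch_point *
get_touch_point(struct client_state *client_state, int32_t id)
{
    struct touch_event *touch = &client_state->touch_event;
    const size_t nmemb = sizeof(touch->points) / sizeof(struct touch_point);
    int invalid = -1;
    for (size_t i = 0; i < nmemb; ++i) {
        if (touch->points[i].valid && touch->points[i].id == id) {
            return &touch->points[i];
        }
        if (invalid == -1 && !touch->points[i].valid) {
            invalid = i;
        }
    }
    if (invalid == -1) {
        return NULL;  /* more concurrent touch points than we support */
    }
    touch->points[invalid].valid = true;
    touch->points[invalid].id = id;
    return &touch->points[invalid];
}
```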
The basic purpose of this function is to pick a touch_point from the array we added to the touch_event struct, based on the touch ID we're receiving events for. If we find an existing touch_point for that ID, we return it. If not, we return the first available touch point. In case we run out, we return NULL.
Now we can take advantage of this to implement our first function: touch down.
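A sketch of the 'down' handler:

```c
static void
wl_touch_down(void *data, struct wl_touch *wl_touch, uint32_t serial,
        uint32_t time, struct wl_surface *surface, int32_t id,
        wl_fixed_t x, wl_fixed_t y)
{
    struct client_state *client_state = data;
    struct touch_point *point = get_touch_point(client_state, id);
    if (point == NULL) {
        return;
    }
    point->event_mask |= TOUCH_EVENT_DOWN;
    point->surface_x = x;
    point->surface_y = y;
    client_state->touch_event.time = time;
    client_state->touch_event.serial = serial;
}
```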
Like the pointer events, we're also simply accumulating this state for later use. We don't yet know if this event represents a complete touch frame. Let's add something similar for touch up:
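```c
static void
wl_touch_up(void *data, struct wl_touch *wl_touch, uint32_t serial,
        uint32_t time, int32_t id)
{
    struct client_state *client_state = data;
    struct touch_point *point = get_touch_point(client_state, id);
    if (point == NULL) {
        return;
    }
    point->event_mask |= TOUCH_EVENT_UP;
    client_state->touch_event.time = time;
    client_state->touch_event.serial = serial;
}
```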
And for motion:
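```c
static void
wl_touch_motion(void *data, struct wl_touch *wl_touch, uint32_t time,
        int32_t id, wl_fixed_t x, wl_fixed_t y)
{
    struct client_state *client_state = data;
    struct touch_point *point = get_touch_point(client_state, id);
    if (point == NULL) {
        return;
    }
    point->event_mask |= TOUCH_EVENT_MOTION;
    point->surface_x = x;
    point->surface_y = y;
    client_state->touch_event.time = time;
}
```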
The touch cancel event is somewhat different, as it 'cancels' all active touch points at once. We'll just store this in the touch_event's top-level event mask.
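```c
static void
wl_touch_cancel(void *data, struct wl_touch *wl_touch)
{
    struct client_state *client_state = data;
    client_state->touch_event.event_mask |= TOUCH_EVENT_CANCEL;
}
```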
The shape and orientation events are similar to up, down, and move, however, in that they inform us about the dimensions of a specific touch point.
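Sketches of both:

```c
static void
wl_touch_shape(void *data, struct wl_touch *wl_touch,
        int32_t id, wl_fixed_t major, wl_fixed_t minor)
{
    struct client_state *client_state = data;
    struct touch_point *point = get_touch_point(client_state, id);
    if (point == NULL) {
        return;
    }
    point->event_mask |= TOUCH_EVENT_SHAPE;
    point->major = major;
    point->minor = minor;
}

static void
wl_touch_orientation(void *data, struct wl_touch *wl_touch,
        int32_t id, wl_fixed_t orientation)
{
    struct client_state *client_state = data;
    struct touch_point *point = get_touch_point(client_state, id);
    if (point == NULL) {
        return;
    }
    point->event_mask |= TOUCH_EVENT_ORIENTATION;
    point->orientation = orientation;
}
```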
And finally, upon receiving a frame event, we can interpret all of this accumulated state as a single input event, much like our pointer code.
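A sketch; note that touch points which went up (or were cancelled) free their slot for reuse:

```c
static void
wl_touch_frame(void *data, struct wl_touch *wl_touch)
{
    struct client_state *client_state = data;
    struct touch_event *touch = &client_state->touch_event;
    const size_t nmemb = sizeof(touch->points) / sizeof(struct touch_point);
    fprintf(stderr, "touch event @ %u:\n", touch->time);

    for (size_t i = 0; i < nmemb; ++i) {
        struct touch_point *point = &touch->points[i];
        if (!point->valid) {
            continue;
        }
        fprintf(stderr, "point %d: ", point->id);

        if (point->event_mask & TOUCH_EVENT_DOWN) {
            fprintf(stderr, "down %f,%f ",
                    wl_fixed_to_double(point->surface_x),
                    wl_fixed_to_double(point->surface_y));
        }
        if (point->event_mask & TOUCH_EVENT_UP) {
            fprintf(stderr, "up ");
        }
        if (point->event_mask & TOUCH_EVENT_MOTION) {
            fprintf(stderr, "motion %f,%f ",
                    wl_fixed_to_double(point->surface_x),
                    wl_fixed_to_double(point->surface_y));
        }
        if (point->event_mask & TOUCH_EVENT_SHAPE) {
            fprintf(stderr, "shape %fx%f ",
                    wl_fixed_to_double(point->major),
                    wl_fixed_to_double(point->minor));
        }
        if (point->event_mask & TOUCH_EVENT_ORIENTATION) {
            fprintf(stderr, "orientation %f ",
                    wl_fixed_to_double(point->orientation));
        }
        fprintf(stderr, "\n");

        /* Free the slot if this point is no longer active */
        if (point->event_mask & TOUCH_EVENT_UP ||
                touch->event_mask & TOUCH_EVENT_CANCEL) {
            point->valid = false;
        }
        point->event_mask = 0;
    }

    if (touch->event_mask & TOUCH_EVENT_CANCEL) {
        fprintf(stderr, "touch cancelled\n");
    }
    touch->event_mask = 0;
}
```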
Compile and run this again, and you'll be able to see touch events printed to stderr as you interact with your touch device (assuming you have such a device to test with). And now our client supports input!
There are a lot of different kinds of input devices, so extending our code to support them was a fair bit of work; our code has grown by 2.5× in this chapter alone. The rewards should feel pretty great, though, as you are now familiar with enough Wayland concepts (and code) that you can implement a lot of clients.
There's still a little bit more to learn; in the last few chapters, we'll cover popup windows, context menus, interactive window moving and resizing, clipboard and drag & drop support, and, later, a handful of interesting protocol extensions which support more niche use-cases. I definitely recommend reading at least chapter 10.1 before you start building your own client, as it covers things like having the window resized at the compositor's request.