User:Lone Rifle/kjk on ws2
Winsock is a framework for user mode transport protocol drivers.
In this framework, drivers are called providers. In Winsock version 1, sockets were not guaranteed, or even expected, to be valid files. However, in Winsock version 2, thanks to AFD, all sockets are files.
Despite this, Winsock calls are still routed in user mode: for each public API there is a corresponding provider callback. To put this in context, Winsock is a standard. There used to be third-party implementations, but eventually Microsoft extended the Winsock specification with high-performance APIs. This was long before epoll, kqueue et al., when all one could count on for asynchronous I/O was select or maybe poll. While the high-performance APIs were never standardized, an extension mechanism was.
The mechanism is a special IOCTL that lets you get the address of provider functions that couldn't be exported through the public API. Among these is ConnectEx. Traditional BSD connect has, among other things, no way to specify a connection timeout. The BSD API also wasn't very friendly to the Windows NT asynchronous I/O model, hence the need for ConnectEx. Generally speaking, ConnectEx is to connect what WriteFileEx is to WriteFile.
Extension example: ConnectEx
The grand unified Microsoft Winsock provider, MSAFD, exports from its DLL (mswsock.dll) a ConnectEx function. However, it is not specific to MSAFD but an extension of the high-level API. One might as well pretend ConnectEx was actually exported from ws2_32.dll, because it has no knowledge of AFD (in reality, it is of course not exported from ws2_32.dll, since it is not a Winsock 2 API).
What this extension does is query the provider of a socket for the ConnectEx implementation and then call it, just like connect queries the provider of the socket for the connect implementation and then calls it. This way, a non-MSAFD provider is free to implement the ConnectEx extension. This is why there's a ConnectEx function exported from mswsock.dll: to abstract the process from the application, so it doesn't have to query the provider itself. The cost is some lost efficiency compared with an application that performs the query on its own, since such an application can safely cache the returned pointer.
In contrast, libisc handles this on its own, but with a few problems. It assumes a single provider, but so does all code that uses select or WSAPoll; while incorrect, it errs in good company and in good faith. To do it properly, however, you should query each and every socket separately, since each could come from a different provider with its own implementation; that is the point of the whole model in the first place. Then again, sockets from different providers would break programs built around select. This might not be a problem at all if the program does not use select and uses I/O completion queues instead, but such programs are rare, especially since there is no widely used event handling library that supports I/O completion queues as well.
Extensions are function pointers retrieved with a Win32-style IOCTL, with separate input and output (as opposed to a BSD-style IOCTL, which typically has a single argument). The input of the IOCTL is a GUID that identifies the function to be returned, while the output is a function pointer.
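To make the lookup concrete, here is a minimal, portable toy model of it. This is not Winsock code: every toy_ name, the 4-byte identifier and the return values 100 and 200 are invented for illustration. In real Winsock the equivalent call is WSAIoctl with the SIO_GET_EXTENSION_FUNCTION_POINTER control code, the WSAID_CONNECTEX GUID as input and an LPFN_CONNECTEX variable as output. The sketch shows the separate input and output buffers, and why two sockets from different providers can yield different pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a GUID (hypothetical; real Winsock uses the GUID type). */
typedef struct { unsigned char bytes[16]; } toy_guid;
/* Made-up identifier playing the role of WSAID_CONNECTEX. */
#define TOY_ID_CONNECTEX_INIT {{0x01, 0x02, 0x03, 0x04}}
#define TOY_GET_EXTENSION_FUNCTION_POINTER 1u

typedef void (*toy_fn)(void);                 /* generic function pointer */
typedef struct { toy_guid id; toy_fn fn; } toy_extension;

typedef struct {
    const char          *name;
    const toy_extension *extensions;
    size_t               extension_count;
} toy_provider;

/* Each socket remembers the provider that created it. */
typedef struct { const toy_provider *provider; } toy_socket;

typedef int (*toy_connectex_fn)(const toy_socket *s, const char *address);

/* Two providers, each with its own ConnectEx-like implementation. */
int provider_a_connectex(const toy_socket *s, const char *address)
{ (void)s; (void)address; return 100; }
int provider_b_connectex(const toy_socket *s, const char *address)
{ (void)s; (void)address; return 200; }

static const toy_extension provider_a_ext[] = {
    { TOY_ID_CONNECTEX_INIT, (toy_fn)provider_a_connectex }
};
static const toy_extension provider_b_ext[] = {
    { TOY_ID_CONNECTEX_INIT, (toy_fn)provider_b_connectex }
};
const toy_provider provider_a = { "A", provider_a_ext, 1 };
const toy_provider provider_b = { "B", provider_b_ext, 1 };

/* Win32-style ioctl: separate input (identifier) and output (pointer).
   Returns 0 on success, -1 if the request or identifier is not supported. */
int toy_ioctl(const toy_socket *s, unsigned code,
              const void *in, size_t in_len,
              void *out, size_t out_len, size_t *bytes_returned)
{
    assert(s != NULL && s->provider != NULL);
    if (code != TOY_GET_EXTENSION_FUNCTION_POINTER) return -1;
    if (in == NULL || in_len != sizeof(toy_guid)) return -1;
    if (out == NULL || out_len < sizeof(toy_fn)) return -1;

    for (size_t i = 0; i < s->provider->extension_count; i++) {
        const toy_extension *e = &s->provider->extensions[i];
        if (memcmp(&e->id, in, sizeof(toy_guid)) == 0) {
            memcpy(out, &e->fn, sizeof e->fn);   /* hand back the pointer */
            if (bytes_returned != NULL) *bytes_returned = sizeof e->fn;
            return 0;
        }
    }
    return -1;
}

/* Convenience wrapper: query the socket's provider on every call, then call. */
int toy_connectex(const toy_socket *s, const char *address)
{
    toy_connectex_fn fn = NULL;
    toy_guid id = TOY_ID_CONNECTEX_INIT;
    if (toy_ioctl(s, TOY_GET_EXTENSION_FUNCTION_POINTER,
                  &id, sizeof id, &fn, sizeof fn, NULL) != 0)
        return -1;
    return fn(s, address);
}
```

An application that calls toy_ioctl itself can keep the returned pointer and skip the lookup on later calls, but in this model the pointer is only valid for sockets of the same provider, which is why the query belongs to the socket rather than to the process.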
As mentioned earlier, MSAFD has a single, global implementation of the ConnectEx extension. This, like all MSAFD functions, is just a wrapper around an IOCTL handled by AFD in kernel mode. Winsock ioctls are strictly user mode-only because the "drivers" that handle them, providers, run in user mode. Providers can have an associated kernel mode driver, but that is left to their own discretion.
While Winsock v2 treats all sockets as files, sockets should be opened with CreateNamedPipe instead of CreateFile. Windows only has three kinds of files: files, pipes and mailslots (multicast pipes), or four if you count devices. The closest one can get to a "socket" file type is "pipe". This ensures the socket cannot be used for anything other than I/O and ioctls. Using CreateFile, on the other hand, will break GetFileType.
GetFileType is more of a framework-related concern, used by libraries that wrap file handles in classes. It is usually very important for those libraries to distinguish between real files and everything else.
Suggestions and further work
- Completion ports/queues. The counterintuitive part of completion queues is that you have to begin the operation in order to receive a notification. Most event models notify you before the operation, to tell you when it can be performed.
- An epoll API, for compatibility
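
The difference between the two notification models mentioned in the first item can be sketched with a small portable toy. Nothing here is a Windows API: all toy_ names are hypothetical, and toy_deliver merely plays the part of the kernel finishing an operation. On Windows the real counterparts would be CreateIoCompletionPort and GetQueuedCompletionStatus.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_QUEUE_CAP 8

/* Toy byte source standing in for data arriving on a socket. */
typedef struct { const char *data; size_t len; size_t pos; } toy_source;

/* Readiness model (select/poll/epoll style): ask first, then read. */
int toy_is_readable(const toy_source *s) { return s->pos < s->len; }

size_t toy_read(toy_source *s, char *buf, size_t cap)
{
    size_t n = s->len - s->pos;
    if (n > cap) n = cap;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Completion model: the operation is started first, with the caller's
   buffer attached, and the notification arrives once it has finished. */
typedef struct {
    char  *buf;
    size_t cap;
    size_t transferred;
    int    pending;
} toy_operation;

typedef struct { toy_operation *done[TOY_QUEUE_CAP]; size_t count; } toy_completion_queue;

/* Begin a read: nothing is transferred yet, the request is only recorded. */
void toy_begin_read(toy_operation *op, char *buf, size_t cap)
{
    assert(op != NULL && buf != NULL);
    op->buf = buf;
    op->cap = cap;
    op->transferred = 0;
    op->pending = 1;
}

/* Plays the kernel: completes a pending read and posts the notification. */
void toy_deliver(toy_source *s, toy_operation *op, toy_completion_queue *q)
{
    if (!op->pending || q->count == TOY_QUEUE_CAP) return;
    op->transferred = toy_read(s, op->buf, op->cap);
    op->pending = 0;
    q->done[q->count++] = op;
}

/* Dequeue the oldest completed operation, or NULL if none has completed. */
toy_operation *toy_dequeue(toy_completion_queue *q)
{
    if (q->count == 0) return NULL;
    toy_operation *op = q->done[0];
    memmove(&q->done[0], &q->done[1], (q->count - 1) * sizeof q->done[0]);
    q->count--;
    return op;
}
```

In the readiness half the caller checks toy_is_readable and then calls toy_read; in the completion half toy_dequeue returns nothing until some operation has been started with toy_begin_read and completed. An epoll-style compatibility layer would have to bridge exactly this gap.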