NVME(4) DragonFly Kernel Interfaces Manual NVME(4)
NAME
nvme -- NVM Express Controller for PCIe-based SSDs
SYNOPSIS
To compile this driver into the kernel, place the following line in your
kernel configuration file:
device nvme
Alternatively, to load the driver as a module at boot time, place the
following line in loader.conf(5):
nvme_load="YES"
DESCRIPTION
The nvme driver provides support for PCIe storage controllers conforming
to the NVM Express Controller Interface specification.  An NVMe
controller presents a direct PCIe host interface and in turn connects
directly to the underlying non-volatile (typically flash) storage,
yielding a large reduction in latency and a substantial increase in
performance over ahci(4).
In addition, NVMe controllers are capable of supporting up to 65535
independent submission and completion queues, each able to support
upwards of 16384 queue entries.  Each queue may be assigned its own
interrupt vector out of the controller's pool (up to 2048).
Actual controllers typically implement lower limits.  While most
controllers allow the maximum number of queue entries, the total number
of queues is often limited to far fewer than 65535; 8-32 queues are
commonly supported.  Similarly, while the specification allows up to
2048 MSI-X vectors, actual controllers typically support fewer.  Still,
having several MSI-X vectors allows interrupts to be distributed to
multiple CPUs, reducing bottlenecks and improving performance.  The
multiple queues can be divvied up across available CPU cores by the
driver, as well as split up based on the type of I/O operation being
performed (such as giving read and write I/O commands their own queues).
This also significantly reduces bottlenecks and improves performance,
particularly in mixed read-write environments.
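As an illustration only, the mapping from the submitting CPU and the I/O
direction to a hardware queue can be as simple as the following sketch;
the structure and function names here are hypothetical and do not
reflect the driver's actual internals.

      /*
       * Hypothetical sketch: distribute I/O across per-CPU queues and
       * give reads and writes separate queue ranges.  These names do
       * not correspond to the driver's real data structures.
       */
      struct example_qmap {
              int     nq_read;        /* queues dedicated to reads  */
              int     nq_write;       /* queues dedicated to writes */
      };

      static int
      example_pick_queue(const struct example_qmap *qm, int cpuid,
          int iswrite)
      {
              if (iswrite)
                      return (qm->nq_read + (cpuid % qm->nq_write));
              return (cpuid % qm->nq_read);
      }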
FORM FACTOR
NVMe boards usually come in one of two flavors: either a tiny form
factor with an M.2 or NGFF connector, supplying 2 or 4 PCIe lanes, or a
larger form that slips into a normal PCIe slot.  The larger form
typically implements 2, 4, or 8 lanes.  Also note that adapter cards
which fit into normal PCIe slots and can mount the smaller M.2/NGFF NVMe
cards can be purchased cheaply.
PERFORMANCE
Typical performance for a 2-lane (x2) board is in the 700 MBytes/s to
1.5 GBytes/s range.  4-lane (x4) boards typically range from 1.0
GBytes/s to 2.5 GBytes/s.  Full-blown PCIe cards run the whole gamut;
2.5 GBytes/s is fairly typical, but performance can exceed 5 GBytes/s on
a high-end card.
Multi-threaded random-read performance can exceed 300,000 IOPS on an x4
board.  Single-threaded performance is usually in the 40,000 to 100,000
IOPS range.  Sequential submission/completion latencies are typically
below 35us, while random submission/completion latencies are typically
below 110us.  Performance (uncached) through a filesystem will be
bottlenecked by additional factors, particularly if testing is only
being done on a single file.
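As a rough way to sanity-check such numbers, a small user program
issuing random preads against the raw disk device is usually enough.
The sketch below is illustrative only; the device path is a placeholder
and must be replaced with the actual nvme disk device (or a large file)
on the system.

      /*
       * Illustrative only: measure single-threaded 4KB random-read
       * IOPS.  The device path is a placeholder.
       */
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      #include <unistd.h>

      #define BLKSIZE         4096            /* 4KB random reads */
      #define NREQUESTS       100000          /* reads to issue */
      #define SPAN            (4LL * 1024 * 1024 * 1024) /* first 4GB */

      int
      main(void)
      {
              const char *path = "/dev/nvme0";        /* placeholder */
              struct timespec t0, t1;
              double elapsed;
              char *buf;
              int fd, i;

              if ((fd = open(path, O_RDONLY)) < 0) {
                      perror(path);
                      return (1);
              }
              buf = malloc(BLKSIZE);
              srandom((unsigned)getpid());

              clock_gettime(CLOCK_MONOTONIC, &t0);
              for (i = 0; i < NREQUESTS; ++i) {
                      off_t off = (random() % (SPAN / BLKSIZE)) * BLKSIZE;
                      if (pread(fd, buf, BLKSIZE, off) != BLKSIZE) {
                              perror("pread");
                              break;
                      }
              }
              clock_gettime(CLOCK_MONOTONIC, &t1);

              elapsed = (t1.tv_sec - t0.tv_sec) +
                        (t1.tv_nsec - t0.tv_nsec) / 1e9;
              printf("%d reads in %.2fs: %.0f IOPS, %.1fus avg latency\n",
                  i, elapsed, i / elapsed, elapsed * 1e6 / i);
              close(fd);
              return (0);
      }

Dividing the request count by the elapsed time gives IOPS, and its
inverse gives the average per-request latency, which for a
single-threaded test includes the full submission/completion round trip.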
The biggest differentiation between boards is usually write performance.
Small boards with only a few flash chips have relatively low write
performance, usually in the 150 MBytes/s range.  Higher-end boards will
have significantly better write performance, potentially exceeding 1.0
GBytes/s.
For reference, the SATA-III physical interface is limited to 600
MBytes/s, the extra PHY layer results in higher latencies, and AHCI
controllers are limited to a single 32-entry command queue.
FEATURES
The DragonFly nvme driver automatically selects the best SMP-friendly
and I/O-typing queue configuration possible based on what the controller
supports.  It uses a direct disk device API which bypasses CAM, so
kernel code paths to read and write blocks are SMP-friendly and,
depending on the queue configuration, potentially conflict-free.  The
driver is capable of submitting commands and processing responses on
multiple queues simultaneously in an SMP environment.
The driver pre-reserves DMA memory for all necessary descriptors, queue
entries, and internal driver structures, and allows for a very generous
number of queue entries (1024 x NQueues) for maximum performance.
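As a rough illustration of the memory involved, assuming the 64-byte
submission and 16-byte completion entry sizes defined by the NVM Express
specification, a configuration of 16 queue pairs with 1024 entries each
reserves about 16 x 1024 x (64 + 16) bytes, or roughly 1.25 MBytes, for
the queue rings themselves, before counting PRP lists and other
per-request driver structures.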
HINTS ON NVME CARDS
So far I've only been able to test one Samsung NVMe M.2 card and an
Intel 750 HHHL (half-height / half-length) PCIe card.
My recommendation is to go with Samsung. The firmware is pretty good.
It appears to be implemented reasonably well regardless of the queue
configuration or I/O blocksize employed, giving expected scaling without
any quirky behavior.
The Intel 750 has very poorly-implemented firmware.  For example, the
more queues the driver configures, the poorer the single-threaded read
performance is. And no matter the queue configuration it appears that
adding a second concurrent reader drops performance drastically, then it
slowly increases as you add more concurrent readers. In addition, on the
750, the firmware degrades horribly when reads use a blocksize of 64KB.
The best performance is at 32KB. In fact, performance again degrades
horribly if you drop down to 16KB. And if that weren't bad enough, the
750 takes over 13 seconds to become ready after a machine power-up or
reset.
The grand result of all of this is that filesystem performance through
an Intel NVMe card is going to be hit-or-miss, depending on
inconsequential differences in blocksize and queue configuration.
Regardless of whatever hacks Intel might be employing in their own
drivers, this is just totally unacceptable firmware behavior.
I do not recommend rebranders like Plextor or Kingston.  For one thing,
if you do buy these, be very careful to get one that is actually an NVMe
card and not an M.2 card with an AHCI controller on it.  Plextor's
performance is particularly bad.  Kingston seems to have done a better
job, and reading at 1.0 GBytes/s or more is possible despite the CPU
overhead of going through an AHCI controller (the flash in both cases is
directly connected to the controller, so there is no SATA PHY to get in
the way).  Of course, if you actually want an AHCI card, then these
might be the way to go, and you might even be able to boot from them.
HINTS ON CONFIGURING MACHINES (BIOS)
If nvme locks up while trying to probe, the BIOS has probably done
something horrible to the PCIe card.  If you have enabled your BIOS's
FastBoot option, turning it off may fix the issue.
Not all BIOSes can boot from an NVMe card.  Those that can typically
require booting via EFI.
SEE ALSO
ahci(4), intro(4), pci(4), loader.conf(5), nvmectl(8)
HISTORY
The nvme driver first appeared in DragonFly 4.5.
AUTHORS
The nvme driver for DragonFly was written from scratch by Matthew Dillon
<dillon@backplane.com> based on the NVM Express 1.2a specification.
DragonFly 4.7 June 5, 2015 DragonFly 4.7