Most of zvol & vdev_disk is complete (though untested). Here's a summary of notable aspects and pending issues related to these two interfaces:
Pseudo disk structure
- The ZVOL top-half presents a volume through a device node.
- Calls go through the ZVOL device into the Intent Log (ZIL), then through the I/O Pipeline (ZIO), where much of the ZFS magic occurs.
- Eventually the Virtual Devices (vdevs) interact with the underlying disk drivers.
In order to configure the pseudo disk, we do the following:
/*
 * Configure pseudo-disk interface
 */
zv->zv_dk.dk_name = zv->zv_name;
zv->zv_dk.dk_driver = &zvoldkdriver;
pseudo_disk_init(&zv->zv_dk);
zv->zv_flags = DKF_INITED;

/*
 * XXX Is there a need to set a geometry. It is very likely possible
 * that zv_volsize can handle everything.
 */

/* Attach ZVOL */
pseudo_disk_attach(&zv->zv_dk);
dkwedge_discover(&zv->zv_dk);
At one point I was using the dk_softc structure out of sys/dev/dkvar.h, which, you may notice, has many of the same fields that are in the zvol's softc:
struct zvol_softc {
struct device zv_device; /* device softc for driver(9) */
char zv_name[MAXPATHLEN]; /* pool/dd name */
uint64_t zv_volsize; /* amount of space we advertise */
uint64_t zv_volblocksize; /* volume block size */
Note that these are 64-bit types, large enough to hold these values; dk_softc.sc_size is only a size_t (4 bytes on 32-bit platforms).
minor_t zv_minor; /* minor number */
uint8_t zv_min_bs; /* minimum addressable block shift */
uint8_t zv_readonly; /* hard readonly; like write-protect */
objset_t *zv_objset; /* objset handle */
uint32_t zv_mode; /* DS_MODE_* flags at open time */
uint32_t zv_total_opens; /* total open count */
zilog_t *zv_zilog; /* ZIL handle */
uint64_t zv_txg_assign; /* txg to assign during ZIL replay */
znode_t zv_znode; /* for range locking */
struct disk zv_dk; /* disk interface */
uint32_t zv_flags; /* DKF_* state flags */
};
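The point about field widths can be made concrete with a small user-space sketch. The helper name `fits_in_32_bits` is an illustrative assumption, not part of the zvol code; it only shows that a volume size of 4 GiB or more does not survive a round trip through a 32-bit field, which is what a 4-byte size_t amounts to.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the zvol sources): report whether a
 * 64-bit volume size would survive being stored in a 32-bit field.
 */
static bool
fits_in_32_bits(uint64_t volsize)
{
	uint32_t narrow = (uint32_t)volsize;	/* keeps only the low 32 bits */

	return (uint64_t)narrow == volsize;
}
```

A 1 GiB volume passes this check, while an 8 GiB volume does not, which is why the size fields in zvol_softc are uint64_t.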
IOCTLs
- case DKIOCFLUSHWRITECACHE:
case DIOCCACHESYNC:
I think I've got this one figured out, actually.
I was forewarned that the cache-flushing ioctl is important to proper ZFS behavior. Solaris's DKIOCFLUSHWRITECACHE is, I believe, equivalent to NetBSD's DIOCCACHESYNC (once the disk's write cache has been enabled via DIOCSCACHE with DKCACHE_WRITE). In the underlying vdevs, VOP_IOCTL(DIOCCACHESYNC) is performed on the backing device node's vnode.
The vnode in this case (or at this point?) is the one yielded by a lookup(9) on the device node specified to zpool(1).
- case DIOCAWEDGE:
case DIOCDWEDGE:
case DIOCLWEDGES:
These were trivial to import from pre-existing pseudo-disk drivers.
- case DIOCGDINFO:
case DIOCWDINFO:
case DIOCSDINFO:
case DIOCGPART:
case DIOCWLABEL:
case DIOCGDEFLABEL:
These have not been entirely trivial to import. Can these IOCTLs be left unsupported? As I understand it, we could simply require GPT labels to be written for ZVOL storage.
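As a rough user-space sketch of the dispatch described in the list above: both spellings of the cache-flush request share one handler, and the disklabel requests are simply rejected. Everything here is a stand-in. The `SK_*` codes, `struct sketch_backing`, and `sketch_ioctl` are hypothetical names, the real command values live in the Solaris and NetBSD headers, and returning ENOTTY for the disklabel ioctls is one possible answer to the open question rather than a decision that has been made.

```c
#include <errno.h>

/*
 * Stand-in command codes for illustration only; the real values come
 * from the Solaris and NetBSD headers and are not reproduced here.
 */
enum sketch_cmd {
	SK_DKIOCFLUSHWRITECACHE,	/* Solaris-style flush request */
	SK_DIOCCACHESYNC,		/* NetBSD-style flush request */
	SK_DIOCGDINFO,			/* one of the disklabel ioctls */
	SK_DIOCWDINFO			/* another disklabel ioctl */
};

/* Hypothetical backing store: counts how often a flush was forwarded. */
struct sketch_backing {
	int flushes;
};

/*
 * Hypothetical dispatcher: both flush spellings fall through to one
 * handler, and the disklabel ioctls are rejected with ENOTTY.
 */
static int
sketch_ioctl(struct sketch_backing *b, enum sketch_cmd cmd)
{
	switch (cmd) {
	case SK_DKIOCFLUSHWRITECACHE:
	case SK_DIOCCACHESYNC:
		b->flushes++;	/* stands in for the VOP_IOCTL call */
		return 0;
	case SK_DIOCGDINFO:
	case SK_DIOCWDINFO:
		return ENOTTY;	/* unsupported; GPT would be required */
	}
	return ENOTTY;		/* unknown commands are rejected too */
}
```

ENOTTY is the conventional errno for an ioctl a device does not support, so a label tool calling one of these would see an ordinary "not supported" failure.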
Proplib
gpt(8) uses drvctl to get sector and media size, so we need to update the prop_dictionary in the zvol_softc with accurate information. sys/kern/kern_drvctl.c uses the dv_properties dictionary in struct device. However, ld(4) uses the dk_info in struct disk. I haven't resolved what needs to be done here. Either way, I gather that (de)referencing will have to occur during disk attachment and detachment.
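Whichever dictionary ends up carrying the data, the values themselves would presumably be derived from the volume size. The following user-space sketch shows that arithmetic only. `struct sketch_geom` and `sketch_geom_from_volsize` are hypothetical names, and deriving the figures from zv_volsize with a caller-supplied sector size is an assumption, not something the current code does.

```c
#include <stdint.h>

/* Hypothetical container for the values a label tool would ask for. */
struct sketch_geom {
	uint64_t media_size;	/* total size in bytes */
	uint32_t sector_size;	/* bytes per sector */
	uint64_t sector_count;	/* whole sectors in the volume */
};

/*
 * Hypothetical helper: fill in the geometry from a volume size.
 * Returns 0 on success, -1 if the size is not a whole number of sectors.
 */
static int
sketch_geom_from_volsize(uint64_t volsize, uint32_t secsize,
    struct sketch_geom *g)
{
	if (secsize == 0 || volsize % secsize != 0)
		return -1;

	g->media_size = volsize;
	g->sector_size = secsize;
	g->sector_count = volsize / secsize;
	return 0;
}
```

For a 1 GiB volume with 512-byte sectors this yields 2097152 sectors; those numbers would then be stored in the property dictionary at attach time.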
Disk IDs
Solaris and FreeBSD are also able to keep track of unique disk identifiers, so that when a disk is reattached, e.g. on a different controller, it can be detected and correctly added to the proper zpool.
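The idea can be sketched in user space as a lookup keyed on the identifier instead of the device path. The table layout, the 64-bit `id` field, and the function name `sketch_reattach` are all hypothetical, and the device paths in the usage note are made up; how NetBSD would actually obtain such an identifier is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of a pool member. */
struct sketch_member {
	uint64_t id;	/* persistent disk identifier (assumed 64-bit) */
	char path[64];	/* last device path the disk was seen at */
};

/*
 * Hypothetical helper: find a member by identifier and record its new
 * path. Returns the matching entry, or NULL if the disk is unknown.
 */
static struct sketch_member *
sketch_reattach(struct sketch_member *tbl, size_t n, uint64_t id,
    const char *newpath)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].id == id) {
			strncpy(tbl[i].path, newpath, sizeof(tbl[i].path) - 1);
			tbl[i].path[sizeof(tbl[i].path) - 1] = '\0';
			return &tbl[i];
		}
	}
	return NULL;
}
```

A disk last seen at /dev/wd1a that reappears as /dev/sd0a would still match its entry, because only the identifier is compared.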
Stay tuned for the next half of this status update...