
The following files now compile:

sys/compat/opensolaris/kern/opensolaris_kmem.c
sys/compat/opensolaris/kern/opensolaris_kobj.c
sys/compat/opensolaris/kern/opensolaris_kstat.c
sys/compat/opensolaris/kern/opensolaris_misc.c
sys/compat/opensolaris/kern/opensolaris_policy.c
sys/compat/opensolaris/kern/opensolaris_string.c
sys/contrib/opensolaris/common/acl/acl_common.c
sys/contrib/opensolaris/common/avl/avl.c
sys/contrib/opensolaris/common/nvpair/nvpair.c
sys/contrib/opensolaris/common/zfs/zfs_namecheck.c
sys/contrib/opensolaris/common/zfs/zfs_prop.c
sys/contrib/opensolaris/uts/common/fs/zfs/arc.c
sys/contrib/opensolaris/uts/common/fs/zfs/bplist.c
sys/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
sys/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c
sys/contrib/opensolaris/uts/common/fs/zfs/dnode.c
sys/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c
sys/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c
sys/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c
sys/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
sys/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c
sys/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c
sys/contrib/opensolaris/uts/common/fs/zfs/fletcher.c
sys/contrib/opensolaris/uts/common/fs/zfs/gzip.c
sys/contrib/opensolaris/uts/common/fs/zfs/lzjb.c
sys/contrib/opensolaris/uts/common/fs/zfs/metaslab.c
sys/contrib/opensolaris/uts/common/fs/zfs/refcount.c
sys/contrib/opensolaris/uts/common/fs/zfs/sha256.c
sys/contrib/opensolaris/uts/common/fs/zfs/spa.c
sys/contrib/opensolaris/uts/common/fs/zfs/spa_config.c
sys/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c
sys/contrib/opensolaris/uts/common/fs/zfs/spa_history.c
sys/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
sys/contrib/opensolaris/uts/common/fs/zfs/space_map.c
sys/contrib/opensolaris/uts/common/fs/zfs/txg.c
sys/contrib/opensolaris/uts/common/fs/zfs/uberblock.c
sys/contrib/opensolaris/uts/common/fs/zfs/unique.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c
sys/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c
sys/contrib/opensolaris/uts/common/fs/zfs/zap.c
sys/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c
sys/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c
sys/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c
sys/contrib/opensolaris/uts/common/fs/zfs/zfs_fm.c
sys/contrib/opensolaris/uts/common/fs/zfs/zil.c
sys/contrib/opensolaris/uts/common/fs/zfs/zio.c
sys/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c
sys/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c
sys/contrib/opensolaris/uts/common/fs/zfs/zio_inject.c
sys/contrib/opensolaris/uts/common/fs/zfs/zvol.c
sys/contrib/opensolaris/uts/common/zmod/adler32.c
sys/contrib/opensolaris/uts/common/zmod/crc32.c
sys/contrib/opensolaris/uts/common/zmod/deflate.c
sys/contrib/opensolaris/uts/common/zmod/inffast.c
sys/contrib/opensolaris/uts/common/zmod/inflate.c
sys/contrib/opensolaris/uts/common/zmod/inftrees.c
sys/contrib/opensolaris/uts/common/zmod/trees.c
sys/contrib/opensolaris/uts/common/zmod/zmod.c
sys/contrib/opensolaris/uts/common/zmod/zmod_subr.c
sys/contrib/opensolaris/uts/common/zmod/zutil.c
  
Admittedly, there are some pending issues that have been #ifndefed out to get to this point. Here is a summary of notable changes that have been made, and issues that still need to be addressed. Let's take a look through the set of patches that will be committed shortly. (These patches have been trimmed manually for legibility, so there should be no expectation that they will apply or work as shown. Use CVS.)

  • Makefile

    diff -b -d -u -r1.1 Makefile.files
    --- contrib/opensolaris/uts/common/Makefile.files	28 Jun 2007 21:51:56 -0000	1.1
    +++ contrib/opensolaris/uts/common/Makefile.files	11 Aug 2007 22:49:02 -0000
     
    +# XXX Edited for ZVOL
      
    -	opensolaris_atomic.o	\
    -	opensolaris_zfs.o	\
    -	opensolaris_zone.o	\
    -	zfs_znode.o		\
    -	zfs_acl.o		\
    -	zfs_ctldir.o		\
    -	zfs_dir.o		\
    -	zfs_ioctl.o		\
    -	zfs_log.o		\
    -	zfs_replay.o		\
    -	zfs_rlock.o		\
    -	zfs_vfsops.o		\
    -	zfs_vnops.o		\
        
    I am not yet concerned with the ZFS filesystem-- merely storage pool volumes.

  • Atomic Ops

    diff -b -d -u -r1.1 atomic.h
    --- compat/opensolaris/sys/atomic.h	28 Jun 2007 21:51:42 -0000	1.1
    +++ compat/opensolaris/sys/atomic.h	11 Aug 2007 22:49:00 -0000
    @@ -1,7 +1,12 @@
    +/*	$NetBSD: atomic.h,v 1.1.2.2 2007/04/13 04:09:43 thorpej Exp $	*/
    +
     /*-
    - * Copyright (c) 2007 Pawel Jakub Dawidek 
    + * Copyright (c) 2007 The NetBSD Foundation, Inc.
      * All rights reserved.
      *
    + * This code is derived from software contributed to The NetBSD Foundation
    + * by Jason R. Thorpe.
    + *
        
    I am using the headers from the netbsd-thorpej-atomic branch. I haven't been able to find the implementation, so I'm guessing it doesn't exist yet. Though the code compiles, we will certainly hit errors when trying to link. (If we're not able to determine that sort of thing with LKMs, I'll probably configure another Makefile for static compilation.)

    retrieving revision 1.1
    diff -b -d -u -r1.1 Makefile
    --- Makefile	30 Jul 2007 15:21:45 -0000	1.1
    +++ Makefile	11 Aug 2007 22:48:59 -0000
    @@ -80,6 +80,9 @@
     
    +# XXX FIXME arc.c needs atomic_add_64()
    +CFLAGS+=-D__HAVE_ATOMIC64_OPS
        
    I don't know what's going to happen if we don't have 64-bit atomic ops. arc.c has several calls to atomic_*_64().
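
    If a platform turns out to lack native 64-bit atomics, one possible fallback is to serialize each update behind a lock. The following is a minimal userland sketch of that idea, not the NetBSD or Solaris implementation: the name sketch_atomic_add_64_nv() and the use of a pthread mutex are assumptions made for illustration (a kernel version would more likely use a spin mutex or interrupt masking).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One global lock guards every emulated 64-bit update (hypothetical). */
static pthread_mutex_t sketch_atomic64_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Emulate an "add and return new value" 64-bit atomic: the update happens
 * while the lock is held, so callers that go through this function never
 * observe a half-written 64-bit value.
 */
static uint64_t
sketch_atomic_add_64_nv(volatile uint64_t *target, int64_t delta)
{
	uint64_t result;

	assert(target != NULL);
	pthread_mutex_lock(&sketch_atomic64_lock);
	/* Unsigned arithmetic wraps, so a negative delta subtracts. */
	*target += (uint64_t)delta;
	result = *target;
	pthread_mutex_unlock(&sketch_atomic64_lock);
	return (result);
}
```

    The obvious cost is that every counter shares one lock, which is exactly the kind of contention the ARC is trying to avoid; it would only be a stopgap to get things linking.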

  • Kmem Compatibility

    diff -b -d -u -r1.2 opensolaris_kmem.c
    --- compat/opensolaris/kern/opensolaris_kmem.c	30 Jul 2007 15:21:46 -0000	1.2
    +++ compat/opensolaris/kern/opensolaris_kmem.c	11 Aug 2007 22:48:59 -0000
    @@ -112,8 +112,11 @@
     kmem_size(void)
     {
     
    +#ifndef __NETBSD__
     	return ((u_long)vm_kmem_size);
    +#else	/* FIXME */
    +	return 0;
    +#endif
     }
        
    I was hoping that kmem_map->size would implement kmem_size(), but kmem_used() returns that value. How does one determine the capacity of kmem?

  • Kobj Compatibility

    diff -b -u -d -r1.2 opensolaris_kobj.c
    --- opensolaris_kobj.c	30 Jul 2007 15:21:46 -0000	1.2
    +++ opensolaris_kobj.c	14 Aug 2007 03:40:48 -0000
    @@ -65,23 +68,21 @@
     static void *
     kobj_open_file_vnode(const char *file)
     {
    -	struct thread *td = curthread;
    +	struct lwp *l = curlwp;
     	struct nameidata nd;
    -	int error, flags;
    +	int error;
     
    -	if (td->td_proc->p_fd->fd_rdir == NULL)
    -		td->td_proc->p_fd->fd_rdir = rootvnode;
    -	if (td->td_proc->p_fd->fd_cdir == NULL)
    -		td->td_proc->p_fd->fd_cdir = rootvnode;
    +	if (l->l_proc->p_cwdi->cwdi_rdir == NULL)
    +		l->l_proc->p_cwdi->cwdi_rdir = rootvnode;
    +	if (l->l_proc->p_cwdi->cwdi_cdir == NULL)
    +		l->l_proc->p_cwdi->cwdi_cdir = rootvnode;
     
    -	flags = FREAD;
    -	NDINIT(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, file, td);
    -	error = vn_open_cred(&nd, &flags, 0, td->td_ucred, NULL);
    -	NDFREE(&nd, NDF_ONLY_PNBUF);
    +	NDINIT(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, file, l);
    +	error = _vn_open(&nd, FREAD, 0);
     	if (error != 0)
     		return (NULL);
     	/* We just unlock so we hold a reference. */
    -	VOP_UNLOCK(nd.ni_vp, 0, td);
    +	VOP_UNLOCK(nd.ni_vp, 0);
     	return (nd.ni_vp);
     }
       
    Porting vnode operations is a very large part of this project. While not all of the vnode-related code is guaranteed to work at this point, the above code is a good example of the type of changes that have to occur.
    @@ -89,7 +90,11 @@
     kobj_open_file_loader(const char *file)
     {
     
    +#ifndef __NetBSD__
    	return (preload_search_by_name(file));
    +#else	/* FIXME */
    +	return (NULL);
    +#endif
     }
       
    This is a FreeBSDism that shows up only in kobj compatibility. It comes from their <sys/linker.h>.
    @@ -156,7 +165,7 @@
     kobj_read_file_vnode(struct _buf *file, char *buf, unsigned size, unsigned off)
     {
     	struct vnode *vp = file->ptr;
    -	struct thread *td = curthread;
    +	struct lwp *l = curlwp;
     	struct uio auio;
     	struct iovec aiov;
     	int error;
    @@ -169,15 +178,19 @@
     
     	auio.uio_iov = &aiov;
     	auio.uio_offset = (off_t)off;
    +#ifndef __NetBSD__
     	auio.uio_segflg = UIO_SYSSPACE;
    +#endif
     	auio.uio_rw = UIO_READ;
     	auio.uio_iovcnt = 1;
     	auio.uio_resid = size;
    +#ifndef __NetBSD__
     	auio.uio_td = td;
    +#endif
     
    -	vn_lock(vp, LK_SHARED | LK_RETRY, td);
    -	error = VOP_READ(vp, &auio, IO_UNIT | IO_SYNC, td->td_ucred);
    -	VOP_UNLOCK(vp, 0, td);
    +	vn_lock(vp, LK_SHARED | LK_RETRY);
    +	error = VOP_READ(vp, &auio, IO_UNIT | IO_SYNC, l->l_cred);
    +	VOP_UNLOCK(vp, 0);
     	return (error != 0 ? -1 : size - auio.uio_resid);
     }
        
    There are also some differences in the members of struct uio. I have not determined whether it is necessary to pass this information to VOP_READ another way, or how.

  • Security Policy Compatibility

    diff -b -d -u -r1.1 opensolaris_policy.c
    --- compat/opensolaris/kern/opensolaris_policy.c	28 Jun 2007 21:51:38 -0000	1.1
    +++ compat/opensolaris/kern/opensolaris_policy.c	11 Aug 2007 22:48:59 -0000
    @@ -38,41 +39,59 @@
    +#ifdef __NETBSD__	/* FIXME */
    +# define priv_check_cred(cred, cmd, num)	(0)
    +
    +static int
    +groupmember(gid_t gid, kauth_cred_t cred)
    +{
    +       int error, result;
    +
    +       result = 0;
    +       (void) kauth_cred_ismember_gid(cred, gid, &result);
    +       return (result); 
    +}
    +#endif
    +
        
     int
    -secpolicy_fs_unmount(struct ucred *cred, struct mount *vfsp __unused)
    +secpolicy_fs_unmount(kauth_cred_t cred, struct mount *vfsp)
     {
     
    -	return (priv_check_cred(cred, PRIV_VFS_UNMOUNT, 0));
    +	return (kauth_authorize_system(cred, KAUTH_SYSTEM_MOUNT,
    +	    KAUTH_REQ_SYSTEM_MOUNT_UNMOUNT, vfsp, NULL, NULL));
     }
        
    It was trivial to port the unmount request. The others.. not so much. For now they are all stubbed with priv_check_cred().

  • Vnode

    As I said earlier, porting vnode makes up a large part of this project (and likely an even larger part of porting the rest of ZFS). Luckily, most of this isn't critical for zpools, so just achieving compilation should be sufficient.

    diff -b -u -d -r1.2 vnode.h
    --- vnode.h	30 Jul 2007 15:21:47 -0000	1.2
    +++ vnode.h	14 Aug 2007 04:18:32 -0000
    @@ -48,11 +48,8 @@
     
     #define	v_count	v_usecount
     
    -static __inline int
    -vn_is_readonly(vnode_t *vp)
    -{
    -	return (vp->v_mount->mnt_flag & MNT_RDONLY);
    -}
    +#define vn_is_readonly(vp)	(vn_writechk((vp)) == ETEXTBSY)
    +
     #define	vn_vfswlock(vp)		(0)
     #define	vn_vfsunlock(vp)	do { } while (0)
     #define	vn_ismntpt(vp)		((vp)->v_type == VDIR && (vp)->v_mountedhere != NULL)
    @@ -141,6 +138,13 @@
     		vap->va_mask |= AT_MODE;
     }
     
    +static __inline int
    +nb_vn_open(struct nameidata *ndp, int filemode, int createmode)
    +{
    +
    +	return (vn_open(ndp, filemode, createmode));
    +}
    +
        
    Before papering over vn_open(), it's copied into a safe namespace for access from compat modules.
     #define	FCREAT	O_CREAT
     #define	FTRUNC	O_TRUNC
     #define	FSYNC	FFSYNC
    @@ -216,16 +225,17 @@
     
     	ASSERT(flag == FSYNC);
     
    -	/* XXX vfslocked = VFS_LOCK_GIANT(vp->v_mount); */
    -	/* FIXME */
        
    We don't have a giant lock, right? Might anything else need to happen here in terms of locks?
    +#ifndef __NetBSD__
     	if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
     		goto drop;
    -	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, l);
    +#endif
        
    I see that NetBSD used to have vn_start_write(). I haven't tracked through commit logs yet, but I take its absence to indicate that this step is not necessary any longer.
    +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
     	error = VOP_FSYNC(vp, cr, MNT_WAIT, /* FIXME offsets */ 0, 0, l);
    -	VOP_UNLOCK(vp, 0, l);
    +	VOP_UNLOCK(vp, 0);
    +#ifndef __NetBSD__
     	vn_finished_write(mp);
    +#endif
        
    And similarly, above.
    @@ -249,7 +259,11 @@
     
     	ASSERT(seg == UIO_SYSSPACE);
     
    +#ifndef __NetBSD__
     	return (kern_rename(curlwp, from, to, seg));
    +#else	/* FIXME */
    +	return (ENOTSUP);
    +#endif
     }
    @@ -260,7 +274,11 @@
     	ASSERT(seg == UIO_SYSSPACE);
     	ASSERT(dirflag == RMFILE);
     
    +#ifndef __NetBSD__
     	return (kern_unlink(curlwp, fnamep, seg));
    +#else	/* FIXME */
    +	return (ENOTSUP);
    +#endif
     }
     
     #endif	/* _OPENSOLARIS_SYS_VNODE_H_ */
        
    Any pointers on replacements for kern_unlink() and kern_rename()?

  • Zones compatibility

    diff -b -d -u -r1.1 zone.h
    --- compat/opensolaris/sys/zone.h	28 Jun 2007 21:51:52 -0000	1.1
    +++ compat/opensolaris/sys/zone.h	11 Aug 2007 22:49:01 -0000
    @@ -40,22 +38,9 @@
     /*
      * Is process in the global zone?
      */
    -#define	INGLOBALZONE(p)	(!jailed((p)->p_ucred))
    -
    -/*
    - * Attach the given dataset to the given jail.
    - */
    -extern int zone_dataset_attach(struct ucred *, const char *, int);
    -
    -/*
    - * Detach the given dataset to the given jail.
    - */
    -extern int zone_dataset_detach(struct ucred *, const char *, int);
    +#define	INGLOBALZONE(p)	(p != NULL)
     
    -/*
    - * Returns true if the named pool/dataset is visible in the current zone.
    - */
    -extern int zone_dataset_visible(const char *, int *);
    +#define zone_dataset_visible(s, n)	(1)
     
     #else	/* !_KERNEL */
        
    Early on in the project, I tried to rip a lot of the zones stuff out. It still creeps up in a few places, so I implemented the minimal set of stubs above. I may add some more as I start to look at minimizing the size of my diffs.

  • Adaptive Replacement Cache

    diff -b -d -u -r1.1 arc.c
    --- contrib/opensolaris/uts/common/fs/zfs/arc.c	28 Jun 2007 21:51:57 -0000	1.1
    +++ contrib/opensolaris/uts/common/fs/zfs/arc.c	11 Aug 2007 22:49:03 -0000
    @@ -148,3 +152,5 @@
    +/*
    + * FIXME These tunables are for performance analysis.
     TUNABLE_ULONG("vfs.zfs.arc_max", &zfs_arc_max);
     TUNABLE_ULONG("vfs.zfs.arc_min", &zfs_arc_min);
     SYSCTL_DECL(_vfs_zfs);
    @@ -159,6 +158,7 @@
         "Maximum ARC size");
     SYSCTL_ULONG(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RD, &zfs_arc_min, 0,
         "Minimum ARC size");
    + */
         
    More sysctl FIXMEs...
    @@ -2792,10 +2796,12 @@
     	(void) thread_create(NULL, 0, arc_reclaim_thread, NULL, 0, &p0,
     	    TS_RUN, minclsyspri);
     
    +#ifndef __NETBSD__	/* FIXME */
     #ifdef _KERNEL
     	arc_event_lowmem = EVENTHANDLER_REGISTER(vm_lowmem, arc_lowmem, NULL,
     	    EVENTHANDLER_PRI_FIRST);
     #endif
    +#endif
     
     	arc_dead = FALSE;
        
    FreeBSD uses their EVENTHANDLER framework to monitor a low-watermark in their VM system. As best I can tell, this is equivalent to knote(9). However, I have found no uvm-related knotes at this point. Any pointers?

  • dmu_send

    I am replacing the FreeBSD dmu_send.c with the original Solaris version, which compiles with one small modification:

    diff -b -d -u -r1.1 dmu_send.c
    --- onnv/uts/common/fs/zfs/dmu_send.c			08 Jun 2007 15:11:35 -0400
    +++ contrib/opensolaris/uts/common/fs/zfs/dmu_send.c	11 Aug 2007 23:44:13 -0400
    @@ -160,8 +160,10 @@
            void *data = bc->bc_data;
            int err = 0;
     	 
     +#ifndef __NETBSD__     /* FIXME */
             if (issig(JUSTLOOKING) && issig(FORREAL))
                     return (EINTR);
     +#endif
     
              ASSERT(data || bp == NULL);
        
    I do not know if there is an equivalent for this bit of code in NetBSD.

  • Sysctl

    Sysctl(9) has not been fully ported yet. This is one of many examples:

    diff -b -d -u -r1.2 spa.c
    --- contrib/opensolaris/uts/common/fs/zfs/spa.c	30 Jul 2007 15:15:32 -0000	1.2
    +++ contrib/opensolaris/uts/common/fs/zfs/spa.c	11 Aug 2007 22:49:09 -0000
    @@ -59,11 +59,13 @@
     #include <sys/fs/zfs.h>
     #include <sys/callb.h>
     #include <sys/sunddi.h>
    +#include <sys/vnode.h>
     
    -/* FIXME */
     int zio_taskq_threads = 0;
    +#ifndef __NETBSD__	/* FIXME */
     SYSCTL_DECL(_vfs_zfs);
     SYSCTL_NODE(_vfs_zfs, OID_AUTO, zio, CTLFLAG_RW, 0, "ZFS ZIO");
     TUNABLE_INT("vfs.zfs.zio.taskq_threads", &zio_taskq_threads);
     SYSCTL_INT(_vfs_zfs_zio, OID_AUTO, taskq_threads, CTLFLAG_RW,
         &zio_taskq_threads, 0, "Number of ZIO threads per ZIO type");
    +#endif
        

    One notable use of sysctl (in the ARC) is the kstat compatibility code. The kstat compat code is intended to interface with sysctl through the following kstat calls:

    • kstat_create()

      Configures nodes in the sysctl tree in the form:

      	  kstat.<module>.<class>.<name>
      	
      The class sysctlnode is returned in ksp->ks_sysctl_root.
      diff -b -u -d -r1.1 opensolaris_kstat.c
      --- opensolaris_kstat.c	28 Jun 2007 21:51:37 -0000	1.1
      +++ opensolaris_kstat.c	14 Aug 2007 03:40:31 -0000
      @@ -62,6 +75,7 @@
       	 *
       	 *	kstat....
       	 */
      +#ifndef __NetBSD__
       	sysctl_ctx_init(&ksp->ks_sysctl_ctx);
       	root = SYSCTL_ADD_NODE(&ksp->ks_sysctl_ctx,
       	    SYSCTL_STATIC_CHILDREN(_kstat), OID_AUTO, module, CTLFLAG_RW, 0,
      @@ -90,11 +104,41 @@
       		free(ksp, M_KSTAT);
       		return (NULL);
       	}
      +#else	/* FIXME */
      +	root = NULL;
      +	if (sysctl_createv(NULL, 0, NULL, &root, 0, CTLTYPE_NODE, "kstat",
      +	    NULL, NULL, 0, NULL, 0, CTL_KERN, CTL_CREATE, CTL_EOL) != 0) {
      +		printf("%s: Cannot create kstat tree!\n", __func__);
      +		free(ksp, M_KSTAT);
      +		return NULL;
      +	}
      +	if (sysctl_createv(NULL, 0, &root, &root, 0, CTLTYPE_NODE, module,
      +	    NULL, NULL, 0, NULL, 0, CTL_CREATE, CTL_EOL) != 0) {
      +		printf("%s: Cannot create kstat.%s tree!\n", __func__, module);
      +		free(ksp, M_KSTAT);
      +		return NULL;
      +	}
      +	if (sysctl_createv(NULL, 0, &root, &root, 0, CTLTYPE_NODE, class,
      +	    NULL, NULL, 0, NULL, 0, CTL_CREATE, CTL_EOL) != 0) {
      +		printf("%s: Cannot create kstat.%s.%s tree!\n", __func__,
      +		    module, class);
      +		free(ksp, M_KSTAT);
      +		return NULL;
      +	}
      +	if (sysctl_createv(NULL, 0, &root, &root, 0, CTLTYPE_NODE, name,
      +	    NULL, NULL, 0, NULL, 0, CTL_CREATE, CTL_EOL) != 0) {
      +		printf("%s: Cannot create kstat.%s.%s.%s tree!\n", __func__,
      +		    module, class, name);
      +		free(ksp, M_KSTAT);
      +		return NULL;
      +	}
      +#endif
       	ksp->ks_sysctl_root = root;
       
       	return (ksp);
       }
       
      +#ifndef __NetBSD__
       static int
       kstat_sysctl(SYSCTL_HANDLER_ARGS)
       {
      @@ -104,6 +148,29 @@
       	val = ksent->value.ui64;
       	return sysctl_handle_quad(oidp, &val, 0, req);
       }
      +#else
      +static int
      +kstat_sysctl(SYSCTLFN_ARGS)
      +{
      +	struct sysctlnode	 node;
      +	uint64_t		 val;
      +	kstat_named_t		*ksent;
      +	int			 error;
      +
      +	node = *rnode;
      +	ksent = newp;
      +
      +	node.sysctl_data = &val;
      +
      +	error = sysctl_lookup(SYSCTLFN_CALL(&node));
      +	if (error || newp == NULL)
      +		return (error);
      +
      +	val = ksent->value.ui64;
      +
      +	return (0);
      +}
      +#endif
                

    • kstat_install()

      Adds each kstat_named_t to sysctl as *newp (via kstat_sysctl(), above). The sysctl helper needs to be looked at to determine whether it will work as expected at all.

       void
       kstat_install(kstat_t *ksp)
      @@ -113,12 +180,18 @@
       
       	ksent = ksp->ks_data;
       	for (i = 0; i < ksp->ks_ndata; i++, ksent++) {
      -		KASSERT(ksent->data_type == KSTAT_DATA_UINT64,
      -		    ("data_type=%d", ksent->data_type));
      +		KASSERT(ksent->data_type == KSTAT_DATA_UINT64);
      +#ifndef __NetBSD__
       		SYSCTL_ADD_PROC(&ksp->ks_sysctl_ctx,
       		    SYSCTL_CHILDREN(ksp->ks_sysctl_root), OID_AUTO, ksent->name,
       		    CTLTYPE_QUAD | CTLFLAG_RD, ksent, sizeof(*ksent),
       		    kstat_sysctl, "QU", "");
      +#else
      +		sysctl_createv(NULL, 0, &ksp->ks_sysctl_root, NULL,
      +		    CTLFLAG_READWRITE, CTLTYPE_QUAD,
      +		    ksent->name, NULL, kstat_sysctl, 0, ksent, sizeof(*ksent),
      +		    CTL_CREATE, CTL_EOL);
      +#endif
       	}
       }
              

    • kstat_delete()

      Frees the kstat data. I don't know if we should tear down parts of the sysctl tree too.

      @@ -126,6 +199,8 @@
       kstat_delete(kstat_t *ksp)
       {
       
      +#ifndef __NetBSD__
       	sysctl_ctx_free(&ksp->ks_sysctl_ctx);
      +#endif
       	free(ksp, M_KSTAT);
       }
              

    Also, I don't know if a SYSCTL_SETUP function is necessary-- I don't believe it is.
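
    To make the kstat.<module>.<class>.<name> layout described above concrete, here is a small userland sketch that builds the dotted name a kstat would appear under. It is only an illustration of the naming scheme: sketch_kstat_sysctl_name() is a hypothetical helper, not part of the compat code, and rejecting components that contain a dot is my own assumption (a dot would make the node path ambiguous).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "kstat.<module>.<class>.<name>" into buf.
 * Returns 0 on success, or -1 if a component contains '.' or the
 * result does not fit in buflen bytes (including the terminator).
 */
static int
sketch_kstat_sysctl_name(char *buf, size_t buflen, const char *module,
    const char *kclass, const char *name)
{
	int n;

	assert(buf != NULL && module != NULL && kclass != NULL && name != NULL);

	/* A dot inside a component would be read as an extra tree level. */
	if (strchr(module, '.') != NULL || strchr(kclass, '.') != NULL ||
	    strchr(name, '.') != NULL)
		return (-1);

	n = snprintf(buf, buflen, "kstat.%s.%s.%s", module, kclass, name);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	return (0);
}
```

    For example, calling it with "zfs", "misc" and "arcstats" (values chosen for illustration) yields kstat.zfs.misc.arcstats, under which each named statistic would then hang as a leaf.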

  • mutex

    diff -b -u -d -r1.3 mutex.h
    --- compat/opensolaris/sys/mutex.h      30 Jul 2007 15:21:46 -0000      1.3
    +++ compat/opensolaris/sys/mutex.h      12 Aug 2007 06:16:41 -0000
    @@ -53,6 +55,8 @@
     #define mutex_init(mtx, desc, type, arg)                               \
            zfs_mutex_init((mtx), (desc), (type), (arg))
     
    +#define mutex_owner(mtx)       ((struct lwp *)(mtx)->mtx_owner)
    +
     #endif /* _KERNEL */
     
     #endif /* _OPENSOLARIS_SYS_MUTEX_H_ */
        
    I was unsure how to mimic mutex_owner(). I finally found the following in <sys/mutex.h>:
    (isla)$ nl -ba /usr/src/sys/sys/mutex.h | sed -n 54,60p
        54   *      struct mutex
        55   *              The actual mutex structure.  This structure is mostly
        56   *              opaque to machine-independent code; most access are done
        57   *              through macros.  However, machine-independent code must
        58   *              be able to access the following members:
        59   *
        60   *              uintptr_t               mtx_owner
        ..   .              ...			...
        

    However, we have to define __MUTEX_PRIVATE to get this macro (in sys/arch/x86/include/mutex.h), by doing the following:

    @@ -34,7 +34,9 @@
     
     #ifdef _KERNEL
     
    +#define __MUTEX_PRIVATE                /* We need mtx_owner */
     #include_next 
    +
     #include  <sys/param.h>
     /* XXX #include <sys/proc.h> */
     #include  <sys/lock.h>
        

  • Adaptive Replacement Cache

    diff -b -d -u -r1.1 arc.h
    --- contrib/opensolaris/uts/common/fs/zfs/sys/arc.h	28 Jun 2007 21:52:19 -0000	1.1
    +++ contrib/opensolaris/uts/common/fs/zfs/sys/arc.h	11 Aug 2007 22:49:15 -0000
    @@ -39,0 +40,4 @@
    +#ifdef __NETBSD__	/* XXX Namespace clash with buf.h */
    +# undef b_data
    +# undef b_private
    +#endif
        
    A nasty, annoying namespace clash with <sys/buf.h> forces me to undefine the above two macros (which are otherwise used to access data and private elements within a union). I hope this doesn't cause a problem.
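
    The kind of clash described above can be reproduced in miniature. In this sketch the "system" structure and its accessor macros are hypothetical stand-ins (they are not copied from NetBSD's buf.h); the point is only to show why a later structure with members named b_data and b_private cannot be declared until the macros are undefined, and what is lost by undefining them.

```c
#include <stddef.h>

/* Stand-in for a system header that reaches into unions via macros. */
struct sketch_sys_buf {
	union { void *u_addr; } b_un;
	union { void *u_priv; } b_pu;
};
#define	b_data		b_un.u_addr
#define	b_private	b_pu.u_priv

/* While the macros are live, bp->b_data means bp->b_un.u_addr. */
static void *
sketch_sys_buf_data(struct sketch_sys_buf *bp)
{
	return (bp->b_data);
}

/*
 * Without these #undefs, the member declarations below would expand to
 * "void *b_un.u_addr;" and fail to compile.
 */
#undef b_data
#undef b_private

/* Stand-in for a structure that wants the plain member names. */
struct sketch_arc_buf {
	void	*b_data;
	void	*b_private;
};
```

    The catch is ordering: any code after the #undef that still expects the macro spelling on the system structure will no longer compile, which is the problem I'm hoping not to run into.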

  • Asynchronous Read/Write calls

    --- contrib/opensolaris/uts/common/fs/zfs/sys/zvol.h	28 Jun 2007 21:52:28 -0000	1.1
    +++ contrib/opensolaris/uts/common/fs/zfs/sys/zvol.h	11 Aug 2007 22:49:16 -0000
    @@ -51,8 +51,10 @@
     extern int zvol_strategy(buf_t *bp);
     extern int zvol_read(dev_t dev, uio_t *uiop, cred_t *cr);
     extern int zvol_write(dev_t dev, uio_t *uiop, cred_t *cr);
    +# ifndef __NetBSD__
     extern int zvol_aread(dev_t dev, struct aio_req *aio, cred_t *cr);
     extern int zvol_awrite(dev_t dev, struct aio_req *aio, cred_t *cr);
    +# endif
     #endif
     extern int zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr,
         int *rvalp);
        
    Async-Read/Write is not implemented.

  • Prototypes and Function Pointers

    While all of the Solaris code is generally lucid, <bsd.kmod.mk> uses pretty strict warning flags (e.g. -Wstrict-prototypes -Wmissing-prototypes). Furthermore, the Solaris headers qualify prototypes with extern. I have added all missing prototypes to the beginning of each C file. This can easily be reversed at some point, but it made my life a lot easier...

    Similarly, this strictness affects function pointer declarations. Therefore, the types of (missing) arguments were specified as in the following example from sys/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h:236:

        < typedef int zil_replay_func_t();
        > typedef int zil_replay_func_t(void*,void*,boolean_t);
        
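
    As a self-contained illustration of the typedef change above, the sketch below declares a fully prototyped function type and dispatches through a table of pointers to it. Everything prefixed with sketch_ is hypothetical (including the stand-in for boolean_t and the table layout); only the shape of the typedef follows the zil.h fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Solaris boolean_t. */
typedef enum { SKETCH_FALSE, SKETCH_TRUE } sketch_boolean_t;

/* Fully prototyped function type, in the style of the fixed typedef. */
typedef int sketch_replay_func_t(void *, void *, sketch_boolean_t);

/* A handler matching the typedef: counts how many records it was given. */
static int
sketch_replay_count(void *arg, void *record, sketch_boolean_t byteswap)
{
	int *counter = arg;

	assert(counter != NULL);
	(void)record;
	(void)byteswap;
	(*counter)++;
	return (0);
}

/* Dispatch table indexed by transaction type; slot 0 is left unused. */
static sketch_replay_func_t *sketch_replay_vector[] = {
	NULL,
	sketch_replay_count,
};

/* Call the handler for txtype, or return -1 if there is none. */
static int
sketch_replay_dispatch(size_t txtype, void *arg, void *record,
    sketch_boolean_t byteswap)
{
	size_t n = sizeof(sketch_replay_vector) / sizeof(sketch_replay_vector[0]);

	if (txtype >= n || sketch_replay_vector[txtype] == NULL)
		return (-1);
	return (sketch_replay_vector[txtype](arg, record, byteswap));
}
```

    With the empty-parentheses form, the compiler cannot check the three arguments at the call site, which is what -Wstrict-prototypes complains about.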

  • Remaining compile errors

    Of course, there are still some compile errors left in sys/contrib/opensolaris/uts/common/os/. The one I'm currently at follows:

    #   compile  sys/taskq.o
    cc -O2 -D_SOLARIS_C_SOURCE -D__HAVE_ATOMIC64_OPS
    -I~/src/zfs/src/sys/compat/opensolaris
    -I~/src/zfs/src/sys/contrib/opensolaris/uts/common/fs/zfs
    -I~/src/zfs/src/sys/contrib/opensolaris/uts/common/zmod
    -I~/src/zfs/src/sys/contrib/opensolaris/uts/common
    -I~/src/zfs/src/sys/contrib/opensolaris/common/zfs
    -I~/src/zfs/src/sys/contrib/opensolaris/common
    -I~/src/zfs/src/sys/../include -I~/src/zfs/src/sys
    -ffreestanding  -fno-strict-aliasing -Wno-pointer-sign -Wall -Wstrict-prototypes
    -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare -Wno-traditional -Wall
    -Wno-unknown-pragmas -Wno-missing-braces -Wno-parentheses -Wno-uninitialized
    -Wno-unused -Wno-switch -Werror   -nostdinc -I. -I~/src/zfs/src/sys
    -isystem /usr/src/sys -isystem /usr/src/sys/arch -isystem
    /usr/src/sys/../common/include -D_KERNEL -D_LKM  -c
    ~/src/zfs/src/sys/contrib/opensolaris/uts/common/os/taskq.c
    In file included from /usr/src/sys/sys/device.h:80,
                     from ~/src/zfs/src/sys/x86/pic.h:6,
                     from ~/src/zfs/src/sys/machine/pic.h:3,
                     from ~/src/zfs/src/sys/x86/intr.h:45,
                     from ~/src/zfs/src/sys/machine/intr.h:3,
                     from /usr/src/sys/sys/mutex.h:187,
                     from ~/src/zfs/src/sys/compat/opensolaris/sys/mutex.h:38,
                     from ~/src/zfs/src/sys/compat/opensolaris/sys/taskq_impl.h:32,
                     from ~/src/zfs/src/sys/contrib/opensolaris/uts/common/os/taskq.c:374:
    /usr/src/sys/sys/evcnt.h:85: error: expected specifier-qualifier-list before 'uint64_t'
        
    Otherwise, in this directory, it's the pretty standard fare of prototype warnings and #ifndefing around SYSINIT(..).

    And outside of these problems, we're at least compiling ;). Next comes the fun task of (trying to) link zfs into the kernel, which almost has to be done statically, as I understand it, in order to get reasonable information.

2007/08/14 15:56

Most of zvol & vdev_disk is complete (though untested). Here's a summary of notable aspects and pending issues related to these two interfaces:

Pseudo disk structure

  • The ZVOL top-half presents a volume through a device node.
  • Calls go through the ZVOL device into the Intent Log (ZIL), through the I/O Pipeline (ZIO), where much of the ZFS magic occurs,
  • And then eventually the Virtual Devices (vdevs) interact with the underlying disk drivers.

In order to configure the pseudo disk, we do the following:

/*
 * Configure pseudo-disk interface
 */
zv->zv_dk.dk_name = zv->zv_name;
zv->zv_dk.dk_driver = &zvoldkdriver;
pseudo_disk_init(&zv->zv_dk);
zv->zv_flags = DKF_INITED;

/*
 * XXX Is there a need to set a geometry.  It is very likely possible
 * that zv_volsize can handle everything.
 */

/* Attach ZVOL */
pseudo_disk_attach(&zv->zv_dk);

dkwedge_discover(&zv->zv_dk);

At one point I was using the dk_softc structure out of sys/dev/dkvar.h, which, you may notice, has many of the same fields that are in the zvol's softc:

struct zvol_softc {
	struct device	 zv_device;     	/* device softc for driver(9) */
	char		 zv_name[MAXPATHLEN];	/* pool/dd name */
	uint64_t	 zv_volsize;		/* amount of space we advertise */
	uint64_t	 zv_volblocksize;	/* volume block size */

Note that there are sufficiently large types to hold these values. dk_softc.sc_size is only a (4 byte) size_t.

	minor_t		 zv_minor;      /* minor number */
	uint8_t		 zv_min_bs;     /* minimum addressable block shift */
	uint8_t		 zv_readonly;   /* hard readonly; like write-protect */
	objset_t	*zv_objset;     /* objset handle */
	uint32_t	 zv_mode;       /* DS_MODE_* flags at open time */
	uint32_t	 zv_total_opens; /* total open count */
	zilog_t		*zv_zilog;      /* ZIL handle */
	uint64_t	 zv_txg_assign; /* txg to assign during ZIL replay */
	znode_t		 zv_znode;      /* for range locking */
	struct disk	 zv_dk;         /* disk interface */
	uint32_t	 zv_flags;      /* DKF_* state flags */
};
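
The remark about type widths is easy to check in isolation. This sketch assumes a 4-byte size field (modelled here as uint32_t, since size_t is only that narrow on 32-bit platforms) and shows what happens to a volume size that needs more than 32 bits; the function names are hypothetical.

```c
#include <stdint.h>

/* Store a 64-bit volume size in a 32-bit field, as a 4-byte size_t would. */
static uint32_t
sketch_narrow_size(uint64_t volsize)
{
	/* Unsigned narrowing keeps only the low 32 bits. */
	return ((uint32_t)volsize);
}

/* Return nonzero if volsize survives the trip through the 32-bit field. */
static int
sketch_fits_in_32(uint64_t volsize)
{
	return (sketch_narrow_size(volsize) == volsize);
}
```

A 1 GiB volume fits, but a 4 GiB volume (1 << 32 bytes) comes back as zero, which is why zv_volsize and zv_volblocksize are kept as uint64_t in the softc.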

IOCTLs

  • case DKIOCFLUSHWRITECACHE:
    case DIOCCACHESYNC:

    I think I've got this one figured out, actually.

    I was forewarned about the cache-flushing ioctl being important to proper ZFS behavior. Solaris's DKIOCFLUSHWRITECACHE is, I believe, equivalent to NetBSD's DIOCCACHESYNC (once the disk's write cache has been enabled with (DIOCSCACHE, DKCACHE_WRITE)). In the underlying vdevs, VOP_IOCTL(DIOCCACHESYNC) is performed on the backing device node's vnode.

    The vnode, in this case (or, at this point?), is that yielded from a lookup(9) on the device node as specified to zpool(1).

  • case DIOCAWEDGE:
    case DIOCDWEDGE:
    case DIOCLWEDGES:

    These were trivial to import from pre-existing pseudo-disk drivers.

  • case DIOCGDINFO:
    case DIOCWDINFO:
    case DIOCSDINFO:
    case DIOCGPART:
    case DIOCWLABEL:
    case DIOCGDEFLABEL:

    These have not been absolutely trivial to import. Can these IOCTLs be left unsupported? As I understand it, we could simply require GPT labels to be written for ZVOL storage.

Proplib

gpt(8) uses drvctl to get sector and media size, so we need to update the prop_dictionary in the zvol_softc with accurate information. sys/kern/kern_drvctl.c uses the dv_properties dictionary in struct device. However, ld(4) uses the dk_info in struct disk. I haven't resolved what needs to be done here. Either way, I gather that (de)referencing will have to occur during disk attachment and detachment.

Disk IDs

Also, Solaris and FreeBSD are able to keep track of unique disk identifiers, so that when a disk is reattached, e.g. on a different controller, it can be detected and correctly added to the proper zpool.

Stay tuned for the next half of this status update...

2007/08/09 22:48
Since dynamic device node creation is not currently possible (not easily, at least) in NetBSD, zvol-attaching and mounting will be handled much differently than in other implementations, at least within the context of the summer.

ZVOLs are "created" through zvol_create_minor(name, dev) (by ioctls on /dev/zfs, and therefore zfs_ioctl.c). NetBSD's implementation uses namei(9) to find the vnode of the device node of the given volume, and then determines the minor number with VOP_GETATTR(9).

Actually, it comes as an afterthought that device node creation may be possible with VOP_MKNOD(9) (basically, copying the mknod(2) code). Doing this is clearly not the first (or next) step, but it may be a workable interim solution (a la compat layer?).

Each ZVOL's soft state (struct zvol_softc, which is based on Solaris's struct zvol_state) is stored in an array, zvol_softcs, of pointers. Individual softcs are accessed and manipulated, by minor number, through zvol_softc_get(minor), zvol_softc_zalloc(minor), zvol_softc_free(minor). This behaviour is largely based on that of cgd(4).
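
The softc table described above can be sketched in userland as follows. This is an approximation for illustration, not the driver code: the sketch_ prefix, the fixed table size, and the trimmed-down structure are all assumptions, and a kernel version would use the kernel allocator and locking rather than calloc(3).

```c
#include <stdint.h>
#include <stdlib.h>

#define	SKETCH_ZVOL_MAXMINOR	16	/* hypothetical table size */

/* Trimmed-down stand-in for struct zvol_softc. */
struct sketch_zvol_softc {
	unsigned int	zv_minor;	/* minor number */
	uint64_t	zv_volsize;	/* amount of space we advertise */
};

/* Array of pointers indexed by minor number; NULL means "not configured". */
static struct sketch_zvol_softc *sketch_zvol_softcs[SKETCH_ZVOL_MAXMINOR];

/* Look up the softc for a minor number, or NULL if there is none. */
static struct sketch_zvol_softc *
sketch_zvol_softc_get(unsigned int minor_num)
{
	if (minor_num >= SKETCH_ZVOL_MAXMINOR)
		return (NULL);
	return (sketch_zvol_softcs[minor_num]);
}

/* Allocate a zeroed softc; fails if out of range or already in use. */
static struct sketch_zvol_softc *
sketch_zvol_softc_zalloc(unsigned int minor_num)
{
	struct sketch_zvol_softc *zv;

	if (minor_num >= SKETCH_ZVOL_MAXMINOR ||
	    sketch_zvol_softcs[minor_num] != NULL)
		return (NULL);
	zv = calloc(1, sizeof(*zv));
	if (zv == NULL)
		return (NULL);
	zv->zv_minor = minor_num;
	sketch_zvol_softcs[minor_num] = zv;
	return (zv);
}

/* Release the softc for a minor number and clear its slot. */
static void
sketch_zvol_softc_free(unsigned int minor_num)
{
	if (minor_num >= SKETCH_ZVOL_MAXMINOR)
		return;
	free(sketch_zvol_softcs[minor_num]);	/* free(NULL) is a no-op */
	sketch_zvol_softcs[minor_num] = NULL;
}
```

Keeping pointers rather than structures in the array means an unconfigured minor costs one pointer, and a NULL slot doubles as the "not attached" check in open and strategy paths.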

However, I have to say that I have not been able to fully make heads or tails of how a disk driver like this should work. Though, for instance, cgd and vnd both provide virtual disk drivers through pre-configured device nodes, they (seemingly) accomplish this differently. To wit, cgd maintains an array of softcs, as I currently do in zvol, and uses neither CFATTACH nor CFDRIVER_DECL (the same is true of ccd). The vnd driver does use them, and it accesses each softc with device_lookup(). Having to pick one approach, I'll go with c.?d(4)'s (because I've already implemented that). However, vnd is the only one loadable as an LKM.

A lot of other things have happened since I've last posted, which has been a long time. Rather than try to recap everything at once, I'd prefer to focus on moving forward.

2007/07/20 12:03
Thanks to Zafer, I've been keeping olix0r.net despite my incredible inability to reclaim (legitimate) internet access at my new residence. He's gone out of his way to give me CGI functionality to run Blosxom.

Aside from that, life has been exciting. I took a "vacation" this week to get some Real Work done, but that is still an uphill battle. (Updates in the near term).

Also, John from Cincinnati might be the best television show ever.

2007/06/29 22:25
I'm happy to say that I have read the previously-mentioned ;login: article.

In summary, I think that Pawel and Marshall do a good job. Their overview of ZFS should be more than sufficient for anyone that reads ;login: but has managed to keep bunkered under a rock for long enough not to have seen the brochures.

On to the main course, though-- a summary of Pawel's work porting ZFS to FreeBSD. Reading it, I was surprised at how much I knew already. That said, I am surprised at how I did not put the clues together into a clearer plan for this summer's project. Specifically, I've known about (and read.. and discussed) FreeBSD's OpenSolaris compat layer. What I somehow did not correlate, though, is that most of the work I have slated for this summer will be limited to this layer. I anticipate that it will take me some time to grok the individual interfaces properly, but this is where my mentors are really likely to lend a hand.

So today-- optimism.

I could try to blame other factors for clouding my mind, but I think ultimately my project's "negative-slack" (a former professor's euphemism for schedule-crunch) is due to a lack of intimacy with the code, an aversion to premature assumptions, and, moreover, fear of failure. Completing my initial development plan, and getting some code committed will, I'm sure, give me some much-needed confidence and momentum. It's always interesting to me, though not surprising at all, how human factors manage to bleed into technical work.

2007/06/13 07:48
THE MEATENING!! -- I strongly suggest that you do not send mail to that link.