I just did a branch merge of the nfs41-gate with the onnv_117 tag of the onnv-gate.
And the closed tree had some changes.
I've added a tag of closedv11 to the nfs41-gate and you can download new versions of the closed-bins at http://www.opensolaris.org/os/project/nfsv41/downloads/
There are two ways you can go about installing pNFS on your machines:
We decided to skip branch merges to Nevada builds 113-116. We had issues with the hardware in the lab and vboxes. Plus we wanted a stable environment for the BakeAThon. I'm having a hard time merging with build 117. One of the issues was a panic on the client (which I could patch) due to getting NFS4ERR_BADSESSION from a DS. I was able to see the DS replying on the exchange_id request in a packet trace. There are only a handful of places where the server returns this error code, so I was pretty sure I could reuse a DTrace script I had lying around:
:nfssrv:mds_lease_chk:return
/args[1] != 0/
{
    printf("rc1 = %d", args[1]);
}
Until I came across this code snippet:
    *cs->statusp = resp->dsr_status =
        NFS4ERR_UNSAFE_COMPOUND;
    goto final;
...
final:
    DTRACE_NFSV4_2(op__destroy__session__done,
        struct compound_state *, cs,
        DESTROY_SESSION4res *, resp);
}
Just great, no return value. How could I catch it?
Well, the answer is staring us right in the face, we can use op__destroy__session__done. I had to snoop around for examples in a colleague's home directory (which is kinda why I blog about this stuff, it is easier to find), but ended up with this:
:::op-bind-conn-to-session-done
{
    bresp = (BIND_CONN_TO_SESSION4res *)arg1;
    printf("rc1 = %d", bresp->bctsr_status);
}
:::op-destroy-session-done
{
    dresp = (DESTROY_SESSION4res *)arg1;
    printf("rc1 = %d", dresp->dsr_status);
}
Note that I wasn't too worried about being specific as to which module you found these calls in, I was betting on them being very unique.
Oh, and I was able to narrow down where the NFS4ERR_BADSESSION was coming from. And then I had to add debug statements to find out why. :-< I bet I might still have been able to do it with DTrace. :->
So we had our first BakeAThon hosted by NetApp. I thought it went very well and it enabled the NetApp folks to see how much work goes into hosting the event.
I can't go into details that involve other vendors, because of an NDA, but I can say that we did a lot of testing on new features in both the client and the server. I'm not that familiar with the client side changes, but I think we had the compound constructor changes by Bob Mastors, and Karen Rochford was testing a rewrite of the layout handling code.
I was more excited about the changes on the server side:
I also brought a major rewrite of the layout handling code (allowing for multiple layouts, etc), device handling (allowing multiple datasets on the same DS) and the integration of the kspe (kernel simple policy engine), but I probably needed another week of development/testing on that code.
The other major development is that we are finally making the switch to using Virtual Boxes for our testing needs. We typically would want 3-4 machines per developer. We can refine this by having a pair of public communities (MDS plus 2 DSes). One as the stable system and the other as a sanity test rig. And then each client developer gets their own machine for a client and each server developer gets a community.
Well, with Virtual Boxes working on a wide range of host OSes, we can have the communities hosted on one or two beefier machines, perhaps sharing room with the build server, and each developer can bring up what they need on their laptops.
I'm going to be giving a presentation on pNFS on July 7th at the Oklahoma City OpenSolaris User Group - OKCOSUG. I'd like to give a live demo using virtual boxes, but I'm not making any promises...
I asked a friend who was a ChemE in college what I could do to make my CPU bracket stronger, and he passed me on to Larry. Hi Larry!
Larry's suggestion was to add fiberglass threads to the resin as it mixed. My last interaction with fiberglass involved Tequila and the last football game I went to at OU. I woke up the next morning with a rear end full of fiber - seems I had worn shorts and somehow managed to find a spot that was flaking.
So I listened and got a tyvek jacket:
I also pulled out a mask I had used when painting our cement floor:
I also got some fiberglass cloth to cut into 1/4" strands:
I could go on about technique, but you cut at 45 degrees to the grain and you want strands shorter than 1/4". Anything longer (up to 3/4") you keep in a different container. I have a picture of my two containers, and yes, it appears both have strands longer than 1/4":
The other thing I needed was something to vibrate the mold to get out air bubbles. Try as I might, I couldn't think of anything. Except making my iPhone ring and ring. Or finding an app that made it vibrate. I paid $0.99 for some app and covered my iPhone in an expensive protective sheath:
Here you can see my first run:
You can see the sandwich bag housing the iPhone, you can see where I added 4 holes on the feet to allow additional resin to be added. You can see I'm making a small plug to test strength.
And you can see I made a mess:
I didn't bother cleaning off the flash - the resultant structure was weak. I'm not sure, but I think I didn't stir the A and B compounds enough for the first batch. The remaining batches did get stirred and the final plug shape I cast was much stronger. I remember that the bracket felt like it had not set when I pulled it out.
We can see my sandwich bag was a good idea:
I did another run, not shown, and the resulting cast was stronger than both the first and the prior one without the fiberglass. But, I was getting frustrated because my resin was setting too fast, it was difficult to get the gooier mix of resin and fiberglass into the form, and I didn't want to recut the fiber.
I've since used milliput (and a nice clay snake!):
to extend one of the original brackets:
I'm waiting for that to set before I dremel it out.
I've hit a nasty little bug which requires an orderly shutdown of the DS as the client is pounding it with traffic:
panic[cpu1]/thread=ffffff01d16d38e0: rw_destroy: lock still active, lp=ffffff01e60c6e08 wwwh=10 thread=ffffff01d16d38e0

ffffff00090a0b50 unix:rw_panic+6f ()
ffffff00090a0b70 unix:rw_destroy+33 ()
ffffff00090a0ba0 nfssrv:dserv_mds_instance_destroy+6d ()
ffffff00090a0bf0 genunix:kmem_cache_free_debug+29c ()
ffffff00090a0c50 genunix:kmem_cache_free+90 ()
ffffff00090a0c90 nfssrv:dserv_mds_instance_teardown+2b8 ()
ffffff00090a0cd0 unix:stubs_common_code+51 ()
ffffff00090a1eb0 nfs:nfssys+73 ()
ffffff00090a1f00 unix:brand_sys_syscall32+292 ()
Lock still active: I take that to mean that we are trying to destroy it while another thread still holds it.
If we look at dserv_mds_instance_destroy, it only helps as far as getting us in the right source file. Strike that, as there is only one read/write lock, it also tells us which one. The issue is this snippet in dserv_instance_enter:
220     if (inst == NULL)
221         return (ESRCH);
222
223     rw_enter(&inst->dmi_inst_lock, lock_type);
224     /*
225      * dmi_teardown_in_progress is only set in one place,
226      * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
227      * is held as a WRITER, therefore, it is safe to check it without
228      * holding the dmi_content_lock.
229      */
230     if (inst->dmi_teardown_in_progress == B_TRUE) {
231         rw_exit(&inst->dmi_inst_lock);
232         if (lock_type == RW_READER)
233             return (EIO);
234         else if (lock_type == RW_WRITER)
235             /*
236              * This will protect from receiving multiple teardown
237              * commands happening at once.
238              */
239             return (EBUSY);
240     }
I thought the issue was that if we got past here, we had multiple references held to the lock by threads processing compounds. I.e., I've been dealing with refcounts and all problems look like refcounts will solve them. I had the refcount code halfway implemented and was looking to send a wakeup to try and get the writer going when I realized what the code was really trying to do.
If the instance had been removed from memory totally, then we would bail out on the first check at line 220. If it was in the process of tearing down, then the lock would be held by the WRITER, which would mean that all READERs had exited. I.e., the refcount of READERs had to be 0. No use tracking something that does not matter.
The problem had to be in that window between starting to tear down and removing the instance from the AVL tree (look at the code in usr/src/uts/common/fs/nfs/dserv_mds.c for more context). And the most obvious thing is that a READER which comes along to check has to grab the lock to check. So, this code solves that:
    bool_t grab_lock = FALSE;
...
    if (inst == NULL)
        return (ESRCH);

    /*
     * If dmi_teardown_in_progress is set, then we can't grab the
     * lock. I.e., we are in the midst of either tearing it
     * down or we have torn it down.
     */
retry_with_lock:
    if (grab_lock) {
        /*
         * Now we have to grab the lock and make sure that it is not
         * true!
         */
        rw_enter(&inst->dmi_inst_lock, lock_type);
    }

    /*
     * dmi_teardown_in_progress is only set in one place,
     * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
     * is held as a WRITER, therefore, it is safe to check it without
     * holding the dmi_content_lock.
     */
    if (inst->dmi_teardown_in_progress == B_TRUE) {
        if (grab_lock)
            rw_exit(&inst->dmi_inst_lock);
        if (lock_type == RW_READER)
            return (EIO);
        else if (lock_type == RW_WRITER)
            /*
             * This will protect from receiving multiple teardown
             * commands happening at once.
             */
            return (EBUSY);
    }

    if (!grab_lock) {
        grab_lock = TRUE;
        goto retry_with_lock;
    }
We will try to check inst->dmi_teardown_in_progress twice. The first time we will do it without the lock. The only way it can be 1 is if a tear down is in progress. We aren't going to keep the lock in that case, we are going to exit right away. But if it is 0 here, we have no way of knowing whether another thread just modified it. So in that case we have to grab the lock and check again.
I thought this would solve the problem, but I ended up with the same panic. The new issue can be found in rwlock(9F):
The rw_enter() function acquires the lock, and blocks if necessary. If enter_type is RW_READER, the caller blocks if there is a writer or a thread attempting to enter for writing. If enter_type is RW_WRITER, the caller blocks if any thread holds the lock.
So say a WRITER comes along, it will wait until all of the READERS drain. At that point, we know the refcnt is 0. The problem is that if another READER comes along, it will now block until the WRITER is done. It will keep the lock active:
...
dmi_inst_lock = {
_opaque = [ 0x10 ]
}
...
[1]> ffffff01e60c6dc8::print dserv_mds_instance_t dmi_inst_lock|::rwlock
ADDR OWNER/COUNT FLAGS WAITERS
ffffff01e60c6e08 READERS=2 B000
What we have to do as a READER is try to grab the lock. If we have to block, then in this case only, we know tear-down has started!
[th199096@aus-build-x86 nfs]> grep dserv_instance_enter *c
dserv_mds.c:dserv_instance_enter(krw_t lock_type, boolean_t create_instance,
dserv_mds.c: * This function frees any of the locks taken by dserv_instance_enter
dserv_mds.c: error = dserv_instance_enter(RW_WRITER, B_FALSE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
dserv_server.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
So we could do this:
    if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
        if (lock_type == RW_READER)
            return (EIO);
        else if (lock_type == RW_WRITER)
            rw_enter(&inst->dmi_inst_lock, lock_type);
    }
But I really hate doing this as it makes an assumption about there only ever being one reason to grab as a WRITER and no way to programmatically enforce it. Would a comment suffice?
What we really want to do is sleep and when awoken, retry to grab the lock. We would also have to get the instance pointer fresh.
I think a quick comment and this change will accomplish all I want:
    if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
        if (lock_type == RW_READER)
            return (EAGAIN);
        else if (lock_type == RW_WRITER)
            rw_enter(&inst->dmi_inst_lock, lock_type);
    }
I.e., let the caller try again if it wants to!
So code, build, test!
I've spent the last several weeks making both the MDS and DS survive whilst the DS reboots itself silly. On the MDS side, that was mainly in making sure we didn't orphan off ds_addrlist and ds_guid_info database entries. And on the DS, it was mainly making sure that we didn't try to operate on a client's NFSv4.1 requests before we started to end the NFS server.
The ds_owner has two linked lists: ds_addrlist and ds_guid_info entries. The lists allow quick traversal of the entries without storing the owner id in the entries. At first, we might want to store the owner id, but then we would also have to store a boot instance id. I.e., are you a ds_addrlist from before the reboot of the DS or after?
But this incestuous relationship causes the most convoluted usage of the rfs4_dbe code. It ain't pretty, and it ain't always easy to follow.
The way I did unit testing was to reboot a DS and see if the MDS stayed up. If it did, then I would run a shell script to dump some interesting structures via mdb. If it didn't, well, then I was already in kmdb and could start dumping the structures directly.
Once I got that working, I started running 10 back-to-back instances of the cthon test suite. And I would then reboot the DS whenever I was ready.
Eventually, I got to the point where the DS would randomly crash upon reboot. It looked like the server instance was being torn down at the same time it was being used to grab a lock. I added a variable to keep track of whether it was being torn down. But I couldn't reliably trigger the bug.
I created a simple script to drive NFS traffic, even if there was an application level error:
[root@pnfs-17-21 ~]> more swift.sh
#!/bin/sh
mount -o vers=4 pnfs-17-24:/pnfs2/pnfs /pnfs/pnfs-17-24/
while /usr/bin/true; do
    dd if=/root/cleanup@downtime-zsend.bz2 of=/pnfs/pnfs-17-24/zero count=1204 bs=2048
    dd if=/pnfs/pnfs-17-24/zero of=/root/zero count=1204 bs=2048
    sleep 1
done
I also needed a way to force the DS to reboot itself just after the dserv was started. I could have added some code to cause a panic, but I wanted a clean and orderly shutdown. The solution was to use smf(5) to create a service instance with a dependency on dserv and which called reboot directly. This worked like a charm.
If I hit a crash or had to refresh the nfssrv modules, I would drop down into maintenance mode and edit the '/a/var/svc/manifest/network/reboot.xml' file to change the 'reboot' to 'true'.
This unit test ended up testing both the MDS state management and the DS race case.
With a solid set of results from the unit testing, I just need to go validate my code with our standard test suite that all integrations need to pass. And that, that will be my task tomorrow...
As I kept on spending money to make a mold of the w2100z CPU retainer bracket, I came to the realization that I wanted to make a mold more than I wanted a silent PC. It is that simple. I came up with a cheaper method to extend the original brackets, and I may end up falling back on that.
But that is later, for now we should concentrate on my travel to get a mold.
I'd have to say the biggest non-lethal mistake I made on this project was assuming that since the retainer was pretty plain and out of sight, I didn't have to be as careful as I needed to be with making say a figurine. I didn't always measure twice, which can be seen in my choice of using a sandwich container for my mold box:
I didn't make sure that there was a 1/4-1/2 inch space at the top and I ended up having too much extra space. Also, an expensive lesson here is that all of that dead space in the middle of the bracket needs to be filled with mold material. If I had placed an object there to take up room, I would have had a donut shaped mold and saved some money. Ehh, I don't know how easy it would have been to work with that mold!
I really should have taken the time to smooth out the layer of clay. That really ended up hurting me when I cast a bracket. But that all is in the future. Right now we can see that I've run out of mold material:
I had calculated the amount of product needed and was dismayed to realize that just for the top portion it was more than the starter kit provided. BTW - the kit tells you how much base you have, but it doesn't tell you how much of the activant you got or how much the base container weighed.
Speaking of weight, I needed to buy a gram scale:
I felt weird walking through a store looking for a gram scale, syringes, and disposable plastic gloves. I.e., I realized what else that stuff could be used for!
Anyway, the scale was a wise investment and I can now use it on the expense reports I need to send in (i.e., do I need one stamp or two?). Also, I flashed back to HS Chemistry. I must have been paying attention at some point, because I knew to offset the weight of my containers.
A day later, and I have some cheesecake with a nice blue Play-Doh crust:
And if we flip it over, we see the next issue:
I need to cut open the sandwich container. I realized this before I added the mold material, but went on with it anyway. I was having a tough time figuring out what to use as my mold box (did I mention that this was my very first attempt to make a mold?). You can see another problem, I didn't make sure to get a good seal with the clay against the wall.
Once all of the clay, well most of it, is removed, we can see the bottom part:
I would end up getting more of the clay removed, but I mainly concentrated on the stuff on the retainer. I had meant to leave some on the inside corners of it, but decided not to do so. My other idea was to later add some milliput and shape it as needed.
I didn't capture my next mistake - which was probably the worst. I forgot to add the mold release before I started to pour in the bottom mold. I caught this after I had started, but before the mold really set. I poured out the product, scraped it down, and added some release agent. I was already screwed and not inclined to buy another $29 bottle of mold material. I had the milliput in mind already.
Also, as can be seen here, I didn't have enough clearance on the bottom:
I used duct tape to build up the side. That actually worked out okay.
After I pulled out the set mold, you can see there really isn't a line to separate the two halves:
I took a line and cut along the clay residue. Another lesson is that I should have marked the line on the clay box. Here we can see the original part, rescued from its early grave:
And here we can see the mold:
I set the mold and it is alive I tell you! Alive!
But it does need some surgery:
At first I thought my mold was bad, but then I realized that it was simply an air bubble. I got rushed at the end, resin was pouring out of the bottom of the mold. I had been burping it, but I didn't get enough product up to the top corners. Some vent holes there might have done wonders.
The main problem with the resulting part is that I don't think it is strong enough. I'll need to recast and add some fillers. I read somewhere that nylon string strands would work. But I need to confirm that.
But what I can do with this part is fit it perfectly to the fan clip and the motherboard. I can use milliput to reshape. And once I have that in order, I can either create a new mold (wouldn't be prudent at this juncture) or have an easy way to reshape strengthened clones or the originals.
It was fun and I'm amazed that with as many things I did wrong, I got close to what I wanted. The unit cost can only go down and I'm pretty sure I can get a stronger retainer!
I'm in the process of making sure the MDS cleans up memory when a DS reboots. As such, I've been trying to figure out where a refcnt is held or released for a rfs4_dbe. I added two circular buffers to keep track of these events.
I just did 10 back-to-back cthon test runs and turned my head away. I was trying to figure out if mds_gather_devs leaked ds_addrlist_ts. The answer is no, but I did get a panic.
[root@pnfs-17-24 ~]> panic[cpu1]/thread=ffffff01d16dee80: assertion failed: e->refcnt > 1, file: ../../common/fs/nfs/nfs4_db.c, line: 193

ffffff00094836d0 genunix:assfail+7e ()
ffffff0009483700 nfssrv:rfs4_dbe_rele+98 ()
ffffff0009483720 nfssrv:rfs41_session_rele+1a ()
ffffff0009483740 nfssrv:rfs41_compound_state_free+8a ()
ffffff00094837d0 nfssrv:rfs41_dispatch+1bd ()
ffffff0009483820 nfssrv:rfs4_minor_version_dispatch+5c ()
ffffff0009483b20 nfssrv:common_dispatch+7a6 ()
ffffff0009483b40 nfssrv:rfs_dispatch+2d ()
ffffff0009483c30 rpcmod:svc_getreq+20d ()
ffffff0009483ca0 rpcmod:svc_run+197 ()
ffffff0009483cd0 rpcmod:svc_do_run+81 ()
ffffff0009484eb0 nfs:nfssys+9f1 ()
ffffff0009484f00 unix:brand_sys_syscall32+292 ()

panic: entering debugger (continue to save dump)
And this is probably something I added - not something already existing.
So I have these arrays, how do I figure out what is going on?
[1]> $C
fffffffffbc65740 kmdb_enter+0xb()
fffffffffbc65760 debug_enter+0x38(fffffffffb961ea0)
fffffffffbc65830 panicsys+0x40e(fffffffffbf50518, ffffff0009483660, fffffffffbc65840, 1)
ffffff00094835a0 vpanic+0x15d()
ffffff0009483690 panic+0x94()
ffffff00094836d0 assfail+0x7e(fffffffff81e4910, fffffffff81e48f0, c1)
ffffff0009483700 nfssrv`rfs4_dbe_rele+0x98(ffffff01e678fad8)
ffffff0009483720 nfssrv`rfs41_session_rele+0x1a(ffffff01e678fb68)
ffffff0009483740 nfssrv`rfs41_compound_state_free+0x8a(ffffff027ffb7c20)
ffffff00094837d0 nfssrv`rfs41_dispatch+0x1bd(ffffff0009483bb0, ffffff01f18ab940, ffffff0009483880)
ffffff0009483820 nfssrv`rfs4_minor_version_dispatch+0x5c(ffffff0009483bb0, ffffff01f18ab940, ffffff0009483880)
ffffff0009483b20 nfssrv`common_dispatch+0x7a6(ffffff0009483bb0, ffffff01f18ab940, 2, 4, fffffffff81e1e40, ffffffffc011e3e0)
ffffff0009483b40 nfssrv`rfs_dispatch+0x2d(ffffff0009483bb0, ffffff01f18ab940)
ffffff0009483c30 rpcmod`svc_getreq+0x20d(ffffff01f18ab940, ffffff01d63f2f60)
ffffff0009483ca0 rpcmod`svc_run+0x197(ffffff01e4a0bbb8)
ffffff0009483cd0 rpcmod`svc_do_run+0x81(1)
ffffff0009484eb0 nfs`nfssys+0x9f1(e, fe580fc0)
ffffff0009484f00 sys_syscall32+0x1fc()
The rfs4_dbe_t is at ffffff01e678fad8:
[1]> ffffff01e678fad8::print struct rfs4_dbe
{
lock = [
{
_opaque = [ 0xffffff01d16dee80 ]
}
]
refcnt = 0x1
skipsearch = 0x1
invalid = 0x1
reserved = 0x3addcafe
time_rele = 2009 May 5 16:17:41
inval_hint = nfssrv`mds_session_inval+0x9e
id = 0x64
cv = [
{
_opaque = 0
}
]
data = 0xffffff01e678fb68
rtr = {
rtr_count = 0x3
rtr_rele_idx = 0x1
rtr_hold_idx = 0x2
rtr_rele = 0xffffff01e7a9c1b0
rtr_hold = 0xffffff01e7a9c180
}
table = 0xffffff01d9494700
indices = [
{
next = 0
prev = 0
entry = 0xffffff01e678fad8
}
]
}
The rtr structure holds the refcnt tracking data. We are tracking 3 items per buffer and we can see the last ones recorded would have been rele=0 and hold=1. (I.e., we are going to write at rele=1 and hold=2, so we subtract 1 to get the previous record.)
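The "subtract 1 to get the previous record" step is easy to get wrong when the index wraps, so here is a small sketch of how such a tracker could work. The field names mirror the rtr_* members in the dump above, but everything else is an assumption for illustration: the function names, the use of uintptr_t for the recorded caller, and inline arrays where the real structure has pointers. It is not the actual nfssrv code.

```c
#include <stddef.h>
#include <stdint.h>

#define RTR_COUNT 3  /* records kept per buffer, matching rtr_count above */

/* Hypothetical layout modeled on the rtr_* fields shown by mdb. */
typedef struct refcnt_track {
    size_t    rtr_rele_idx;         /* next slot to write in rtr_rele */
    size_t    rtr_hold_idx;         /* next slot to write in rtr_hold */
    uintptr_t rtr_rele[RTR_COUNT];  /* callers that dropped a reference */
    uintptr_t rtr_hold[RTR_COUNT];  /* callers that took a reference */
} refcnt_track_t;

/* Store the caller in the next slot, then advance the index with wrap. */
static void
track_record(uintptr_t *buf, size_t *idx, uintptr_t caller)
{
    buf[*idx] = caller;
    *idx = (*idx + 1) % RTR_COUNT;
}

void
refcnt_track_hold(refcnt_track_t *t, uintptr_t caller)
{
    track_record(t->rtr_hold, &t->rtr_hold_idx, caller);
}

void
refcnt_track_rele(refcnt_track_t *t, uintptr_t caller)
{
    track_record(t->rtr_rele, &t->rtr_rele_idx, caller);
}

/*
 * Slot of the most recent record: one behind the write index.
 * Adding RTR_COUNT before subtracting keeps the unsigned math from
 * underflowing when the write index is 0.
 */
size_t
refcnt_track_last(size_t next_idx)
{
    return ((next_idx + RTR_COUNT - 1) % RTR_COUNT);
}
```

With the indices from the dump, refcnt_track_last(1) gives slot 0 for the rele buffer and refcnt_track_last(2) gives slot 1 for the hold buffer.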
I can use a dcmd to see each array:
[1]> 0xffffff01e7a9c1b0::array caddr_t 3 |::print caddr_t
nfssrv`mds_destroysession+0x3e
nfssrv`rfs41_compound_state_free+0x8a
nfssrv`mds_session_inval+0xa7
[1]> 0xffffff01e7a9c180::array caddr_t 3 |::print caddr_t
nfssrv`rfs4_dbsearch+0x154
nfssrv`rfs4_dbsearch+0x154
nfssrv`rfs4_dbsearch+0x154
I need to script this up to run from the command line, but for now this will suffice. Hmm, the rfs4_dbsearch information isn't too useful. It would be better to have recorded the caller of it. I have that capability, I'll just need to make sure to call it in rfs4_dbsearch.
Okay, the buffer size is too small here to help. But I could bump it up and rerun the tests. I'm pretty confident that it will trip again. But, I also happen to know I have made changes here and I can go revisit those changes to see where I made a mistake.
So, I've never made a mold before and I've never made a casting. I've been reading the Alumilite online examples and I went ahead and bought their starter kit. I figured if they can tell me how to do it online, they can have some of my money.
The first thing I did was use modeling clay to plug the openings:
I'm going to have a two part mold and I won't be able to have those openings in there. Note that I really should make sure to smooth things down on the prongs, but hey, I want to know where to cut those holes back in. I'm banking on having a faint outline there to guide me.
You can see things are pretty smooth still though:
Note that the surface I care most about is that screw connector shown at the top in that picture. I really want it to be level such that the fan will not wobble. But given the way I'm building the bottom of the mold, this will not matter that much:
I've elected to have the seam be on the base and I've pressed the retainer into the clay. At this point I need to clean up the top of towers (look in the lower left one) or I'll be cleaning up each and every cast. Another decision is whether I want to add some more clay to provide material to shave off for the base connections. I.e., I pretty well near went into the outer wall on my earlier attempt. I still plan to add some material around the feet when I prep the bottom half.
Oh well, I still need to figure out what I am going to do about the casting box, so I've got some time.
I'm also hoping the resulting cast will be strong enough. The demos produced car model parts and I think that will be sufficient. I'll have to see - I may need to get different casting resin.
So we can see here that it doesn't look like I can shave enough off:
I went to Hobby Lobby over lunch and it looks like I'm going to need to make a mold and tap out some holes. I saw some Alumilite products that look like they would be durable plastic.
The plan would be to cover the screw holes and take a mold. Hmm, I might also look to add a bit of width on the hole bases. I.e., I probably need another millimeter or two of wiggle room. 1 for the hole and 1 for the rim.
But once I have a mold in place, I will also be able to make my own spares and not be so cautious with what I have. :->
Here is a picture of the CPU retainer bracket before any modifications:
All of the holes look nice and round.
I marked one corner (so I was always testing the same fit) and started in on the holes with a dremel. The intent is to shift them about 1mm up. I'm not using calipers or anything at all scientific. I'm doing it all by eye. Note that this means I might have the right side fitted correctly and not the left. And I may end up overcompensating. My inspiration is those oval screw holes on older hard drive brackets.
I'm fitting the lower left and upper right screws in when I'm doing my tests. I leave the screws loose and move the bracket up and down to check the fit. Here we can see the original offset:
And here we can see the new and improved offset:
At this point, I'm getting leery of making more shavings. If we look at a closeup of one of the holes:
you can see that I may end up going too far. I have no clue what will happen as I tighten the screws later. I don't want a crack.
Just by eyeballing the remaining clearance needed and the bracket to be carved off, I'm not convinced that this is plausible. Right now I have a working bracket for the stock CPU fans and I'm flashing on the old adage, "Measure twice, cut once."
Also, I've got another problem on the daughterboard. You can see that even unconnected my test blank is not going to fit:
There is a header there (for the CPU fan) that is blocking lining this up correctly. I shaved down the outside of the retainer and got it to fit:
I got the Dremel out and started shaving back the mounting holes for the CPU bracket. I think I can make it work. But on my last shave, I got a little impatient and made a cut bigger than I wanted. It wasn't damaging, but I realized I'm too groggy.
Anyway, I'll add pictures tomorrow and maybe continue on.
I've been assuming all along that the problem was the cpu fans. I did another search and came up with someone who installed a Zalman: here and write-up here.
One of the pictures had a closeup of the case fan, so I searched for it and here is a description:
If noise issue is no problem for you, you might want to consider this Delta 120mm fan. It features a whopping 152 CFM of airflow at 53 dBa. For a 120mm fan that is 38mm thick that isn't too shabby. This high end Delta 120mm fan is often used in rackmount server rooms where noise is not an issue.
I can't swear, but I think the CPU fan is a:
Tach output, 9 blade, very high performance 40CFM, moderate noise 38dBA,
Okay, it is clear Sun was getting a good deal on server room fans and not desktop fans.
But, as a server, these things kick butt!
If I wanted to keep the stock fans, I think I would look into building a box to house the computer. I've looked at noise dampening from an audio perspective (think garage band) and it looks like a simple box with 2 layers of drywall might do. Sounds heavy. I'd probably consider a light wooden frame and the sides made out of a double layer of cork. This would not be a load bearing box, just something that could be placed over the computer.
You would want airflow (the trade-off is classically sound versus temperature) with a means for air to come out. Perhaps a low blowing case fan? :->
I think I could build something like that for much less than the sound reduction boxes I see online.
Hmm, too late at night - but I wanted to capture the thoughts on the noise reduction box!
The w2100z is loud, no two ways about it. And someone claimed that you can't make it quiet.
Well, I have a nice w2100z and I want to use it as a test server at home. I'm tired of the constant buzzing. So, I've ordered some things to help quiet it.
There are basically 3 things I can do to reduce the noise footprint:
I've actually already relocated the w2100z and it still pollutes the ambient noise in my house:
You can see where I've pulled the w2100z off of that half-height telco rack mount cabinet. And I cleaned off that dust from a remodeling job in the house! Okay, I keep the closet door open all of the time to keep air flowing and I also have added a bathroom fan to push air through the closet:
I bring this up because closing the door or turning off the fan are big improvements. So, even with any improvements I make, I'll have to keep this in mind. One thing I have done is add a remote digital thermometer to the room. I'm trying to get an idea of how hot it really gets, but I won't know until summer time really hits.
Also, if I put the w2100z inside the cabinet, I might be able to leverage any sound insulation it will provide.
Okay, the real thrust here is that I ordered some replacement fans for the case and cpu. I like Scythe and I like to order it from NewEgg, except the Scythe Mini-Ninjas were out of stock. So I got them from EndPCNoise:
I went with the Mini-Ninja because space inside the w2100z is limited:
I've put a full height Scythe Ninja in a couple of Antec P180 cases and I've been leery of doing that for a dual-cpu system. I.e., one in a case is tight.
Anyway, the first thing I did was take some pictures of the current CPU fans in place, in case I wanted to add them back:
This also showed me where the fan connectors went. I thought I was screwed, as the current fans had a 3 pin connector, but the cpu fans that came with the Mini-Ninjas (and some quieter replacements:)
were either 3 pin or came with converters.
I pulled out the heatsinks, which was a bit of a chore, but I followed the directions in the Sun Java™ Workstation W1100z and W2100z User Guide. We can see the thermal grease on the bottom of the heat sink:
I used some alcohol pads to clean off the gunk:
Note that some has gotten on the "fins" and it appears there is a small ding. I think that happened while trying to get it loose from that last connector.
The CPU also starts out dirty:
And cleans up nicely:
Note that I'm not as concerned with getting this perfect yet. You can see the grime in the plastic retainer bracket and I'll only be worried when I get ready to add it all in.
Before I do anything else, I want to do a dry run to see if the fit is right. So I just place the towers on the cpus:
Amazingly, the daughter board has a better fit than the motherboard. Also, I'll probably not put a CPU fan on it as the case fan will do a lot of work there. Or if the CPU fan is quiet enough, perhaps I'll make sure to push and pull air across that CPU.
With the motherboard CPU, it looks like it will fit okay, but there are some connectors underneath which may cause complications:
The real problem is that the CPU fan retainer bracket that Sun has used is not standard! The 940 socket adapter has nothing to grab onto:
Hmm, we see that the Socket 478 assembly clip almost works:
Except that the CPU hole does not line up correctly!
If we pull the bracket and attach the clip, we can see the dimensions needed for a replacement retainer bracket, and dang if it isn't a close fit:
But we can see that the mounting holes are just a bit off center and I'm not sure if the metal assembly might now be touching something else on the motherboard. If I knew for sure that it was not, I could probably bore new holes for the mounting screws. I'll have to think about that.
By the way, a quick search for replacement brackets showed that the Sun ones were not compatible with standard 940 brackets.
If I line up the holes, we can really see the gap:
It is about 3mm off and it doesn't look like there is any place to get that back.
I need a good solid connection or the thermal seal could break.
Does anyone know of a good replacement bracket? It seems proprietary brackets don't work well and the Scythe Universal Retention kit will not work.
I need to think about what I'm going to do before I make any modifications. :->
We had a town hall style event today and some Oracle brass were there to answer some questions.
What struck me most about the mood of the questions was the pride and ownership that Sun employees have over our products. Not only was there a paternal pride over the products we birthed, we had it for those that we adopted.
But that is only the tip of the iceberg. We wanted to know if our customers were going to be looked after. We have an emotional investment with them as well. I still have friends who buy Sun gear and I'm quick to answer their questions when they get stuck. I'm sure others do that as well.
I think that as long as a company's culture embraces that pride, their customers will get a product that they love. We've seen that to date with Sun and OpenSolaris - Sun employees want the quality poured into Solaris to remain in OpenSolaris. That is probably the biggest differentiator between Linux and OpenSolaris. And if you don't understand that pride and investment, you'll never accept OpenSolaris as open source.
But if you do understand how individuals at Sun own quality, then you'll understand how our products and customers are going to be looked after.
Okay, the spuds and the drives work like a charm:
[root@ultralord ~]> format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t1d0 <DEFAULT cyl 30391 alt 2 hd 255 sec 63>
/pci@0,0/pci108e,5351@1f,2/disk@1,0
1. c0t3d0 <DEFAULT cyl 38910 alt 2 hd 255 sec 126>
/pci@0,0/pci108e,5351@1f,2/disk@3,0
2. c0t4d0 <DEFAULT cyl 38910 alt 2 hd 255 sec 126>
/pci@0,0/pci108e,5351@1f,2/disk@4,0
Specify disk (enter its number): ^D
I want to create a pool for building kernels:
[root@ultralord ~]> zpool create builds c0t3d0 c0t4d0
[root@ultralord ~]> df -h /builds
Filesystem size used avail capacity Mounted on
builds 1.1T 19K 1.1T 1% /builds
Hmm, while I like the space, these are 640G HDs. Wait, I didn't use raidz:
[root@ultralord ~]> zpool destroy builds
[root@ultralord ~]> zpool create builds raidz c0t3d0 c0t4d0
[root@ultralord ~]> df -h /builds
Filesystem size used avail capacity Mounted on
builds 587G 19K 587G 1% /builds
Much better - now I need to get some source and let it rip!
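For what it's worth, both df numbers roughly fall out of the geometry that format printed for c0t3d0 and c0t4d0. Here is a back-of-the-envelope sketch in Python. The 512-byte sector size and the "(N-1) disks of usable space" rule of thumb for single-parity raidz are my assumptions, and the helper names are made up for illustration. It ignores ZFS metadata and reserved space, which is presumably why df shows 587G instead of the raw figure.

```python
# Geometry reported by format for c0t3d0 and c0t4d0 in the listing above.
CYLINDERS, HEADS, SECTORS_PER_TRACK = 38910, 255, 126
SECTOR_BYTES = 512  # assumption: 512-byte sectors

GIB = 2 ** 30
TIB = 2 ** 40


def disk_bytes(cyl, heads, sectors, sector_bytes=SECTOR_BYTES):
    """Raw capacity implied by a cylinder/head/sector geometry."""
    return cyl * heads * sectors * sector_bytes


def striped_bytes(disks):
    """Plain 'zpool create pool d1 d2': every disk contributes all of its space."""
    return sum(disks)


def raidz1_bytes(disks):
    """Rule of thumb for single-parity raidz: one disk's worth goes to parity."""
    return (len(disks) - 1) * min(disks)


one_disk = disk_bytes(CYLINDERS, HEADS, SECTORS_PER_TRACK)
pool_disks = [one_disk, one_disk]

print(f"one disk: {one_disk / 1e9:.1f} GB = {one_disk / GIB:.1f} GiB")
print(f"striped:  {striped_bytes(pool_disks) / TIB:.2f} TiB")
print(f"raidz:    {raidz1_bytes(pool_disks) / GIB:.1f} GiB")
```

So the "640G" on the label is decimal gigabytes, df counts in powers of two, and the striped pool was simply both disks added together with no redundancy at all.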