I asked a friend who was a ChemE in college what I could do to make my CPU bracket stronger; he passed me on to Larry. Hi Larry!
Larry's suggestion was to add fiberglass threads to the resin as it mixed. My last interaction with fiberglass involved tequila and the last football game I went to at OU. I woke up the next morning with a rear end full of fiber - it seems I had worn shorts and somehow managed to find a spot that was flaking.
So I listened and got a Tyvek jacket:
I also pulled out a mask I had used when painting our cement floor:
I also got some fiberglass cloth to cut into 1/4" strands:
I could go on about technique, but you cut at 45 degrees to the grain and you want strands shorter than 1/4". Anything longer (up to 3/4") you keep in a different container. I have a picture of my two containers, and yes, it appears both have strands longer than 1/4":
The other thing I needed was something to vibrate the mold to get out air bubbles. Try as I might, I couldn't think of anything. Except making my iPhone ring and ring. Or finding an app that made it vibrate. I paid $0.99 for some app and covered my iPhone in an expensive protective sheath:
Here you can see my first run:
You can see the sandwich bag housing the iPhone, you can see where I added 4 holes on the feet to allow additional resin to be added. You can see I'm making a small plug to test strength.
And you can see I made a mess:
I didn't bother cleaning off the flash - the resultant structure was weak. I'm not sure, but I think I didn't stir the A and B compounds enough for the first batch. The remaining batches did get stirred and the final plug shape I cast was much stronger. I remember that the bracket felt like it had not set when I pulled it out.
We can see my sandwich bag was a good idea:
I did another run, not shown, and the resulting cast was stronger than both the first and the prior one without the fiberglass. But, I was getting frustrated because my resin was setting too fast, it was difficult to get the gooier mix of resin and fiberglass into the form, and I didn't want to recut the fiber.
I've since used milliput (and a nice clay snake!):
to extend one of the original brackets:
I'm waiting for that to set before I dremel it out.
I've hit a nasty little bug that shows up during an orderly shutdown of the DS while the client is pounding it with traffic:
panic[cpu1]/thread=ffffff01d16d38e0: rw_destroy: lock still active, lp=ffffff01e60c6e08 wwwh=10 thread=ffffff01d16d38e0
ffffff00090a0b50 unix:rw_panic+6f ()
ffffff00090a0b70 unix:rw_destroy+33 ()
ffffff00090a0ba0 nfssrv:dserv_mds_instance_destroy+6d ()
ffffff00090a0bf0 genunix:kmem_cache_free_debug+29c ()
ffffff00090a0c50 genunix:kmem_cache_free+90 ()
ffffff00090a0c90 nfssrv:dserv_mds_instance_teardown+2b8 ()
ffffff00090a0cd0 unix:stubs_common_code+51 ()
ffffff00090a1eb0 nfs:nfssys+73 ()
ffffff00090a1f00 unix:brand_sys_syscall32+292 ()
"Lock still active": I take that to mean we are trying to destroy the lock while another thread still holds it.
If we look at dserv_mds_instance_destroy, it only helps as far as getting us into the right source file. Strike that: as there is only one read/write lock, it also tells us which one. The issue is this snippet in dserv_instance_enter:
220 if (inst == NULL)
221 return (ESRCH);
222
223 rw_enter(&inst->dmi_inst_lock, lock_type);
224 /*
225 * dmi_teardown_in_progress is only set in one place,
226 * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
227 * is held as a WRITER, therefore, it is safe to check it without
228 * holding the dmi_content_lock.
229 */
230 if (inst->dmi_teardown_in_progress == B_TRUE) {
231 rw_exit(&inst->dmi_inst_lock);
232 if (lock_type == RW_READER)
233 return (EIO);
234 else if (lock_type == RW_WRITER)
235 /*
236 * This will protect from receiving multiple teardown
237 * commands happening at once.
238 */
239 return (EBUSY);
240 }
I thought the issue was that if we got past here, we had multiple references held to the lock by threads processing compounds. I.e., I've been dealing with refcounts so much that every problem looks like one refcounts will solve. I had the refcount code halfway implemented and was looking to send a wakeup to try and get the writer going when I realized what the code was really trying to do.
If the instance had been removed from memory totally, then we would bail out on the first check at line 220. If it was in the process of tearing down, then the lock would be held by the WRITER, which would mean that all READERs had exited. I.e., the refcount of READERs had to be 0. No use tracking something that does not matter.
The problem had to be in the window between starting the teardown and removing the instance from the AVL tree (look at the code in usr/src/uts/common/fs/nfs/dserv_mds.c for more context). And the most obvious issue is that a READER that comes along has to grab the lock just to check the flag. So, this code solves that:
216 bool_t grab_lock = FALSE;
...
223 if (inst == NULL)
224 return (ESRCH);
225
226 /*
227 * If dmi_teardown_in_progress is set, then we can't grab the
228 * lock. I.e., we are in the midst of either tearing it
229 * down or we have torn it down.
230 */
231 retry_with_lock:
232 if (grab_lock) {
233 /*
234 * Now we have to grab the lock and make sure that it is not
235 * true!
236 */
237 rw_enter(&inst->dmi_inst_lock, lock_type);
238 }
239
240 /*
241 * dmi_teardown_in_progress is only set in one place,
242 * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
243 * is held as a WRITER, therefore, it is safe to check it without
244 * holding the dmi_content_lock.
245 */
246 if (inst->dmi_teardown_in_progress == B_TRUE) {
247 if (grab_lock)
248 rw_exit(&inst->dmi_inst_lock);
249 if (lock_type == RW_READER)
250 return (EIO);
251 else if (lock_type == RW_WRITER)
252 /*
253 * This will protect from receiving multiple teardown
254 * commands happening at once.
255 */
256 return (EBUSY);
257 }
258
259 if (!grab_lock) {
260 grab_lock = TRUE;
261 goto retry_with_lock;
262 }
We will check inst->dmi_teardown_in_progress twice. The first time we do it without the lock. The only way it can be 1 is if a teardown is in progress, and in that case we never take the lock; we return right away. But if it is 0 here, we have no way of knowing whether another thread just modified it. So in that case we have to grab the lock and check again.
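To convince myself the check-then-lock-then-recheck shape holds together, here is a rough user-space analogue using POSIX rwlocks. It is a sketch, not the nfssrv code: struct instance, instance_enter() and instance_teardown_begin() are hypothetical names, and it uses a C11 atomic for the unlocked first look (the kernel code just does a plain load). It carries over the assumption from the code comment that the flag is only ever set while the lock is held as a WRITER.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for dserv_mds_instance_t. */
struct instance {
	pthread_rwlock_t lock;   /* plays the role of dmi_inst_lock */
	atomic_bool teardown;    /* plays the role of dmi_teardown_in_progress */
};

/*
 * Returns 0 with the lock held, or ESRCH/EIO/EBUSY without it.
 * Assumes the flag is only ever set by a thread holding the write lock.
 */
static int
instance_enter(struct instance *inst, bool writer)
{
	if (inst == NULL)
		return (ESRCH);

	/* First look, no lock: a set flag means teardown has begun. */
	if (atomic_load(&inst->teardown))
		return (writer ? EBUSY : EIO);

	if (writer)
		pthread_rwlock_wrlock(&inst->lock);
	else
		pthread_rwlock_rdlock(&inst->lock);

	/* Second look, lock held: the flag cannot change under us now. */
	if (atomic_load(&inst->teardown)) {
		pthread_rwlock_unlock(&inst->lock);
		return (writer ? EBUSY : EIO);
	}
	return (0);
}

/* Hypothetical teardown entry: set the flag only while holding the write lock. */
static int
instance_teardown_begin(struct instance *inst)
{
	int error = instance_enter(inst, true);

	if (error != 0)
		return (error);
	assert(!atomic_load(&inst->teardown));
	atomic_store(&inst->teardown, true);
	pthread_rwlock_unlock(&inst->lock);
	return (0);
}
```

A real teardown would keep going and dismantle the instance; the sketch just sets the flag and drops the lock. Once the flag is set, readers get EIO and a second teardown gets EBUSY, matching the return codes in the snippet above.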
I thought this would solve the problem, but I ended up with the same panic. The new issue can be found in rwlock(9F):
The rw_enter() function acquires the lock, and blocks if necessary. If enter_type is RW_READER, the caller blocks if there is a writer or a thread attempting to enter for writing. If enter_type is RW_WRITER, the caller blocks if any thread holds the lock.
So say a WRITER comes along: it will wait until all of the READERs drain. At that point, we know the refcnt is 0. The problem is that if another READER comes along, it will now block until the WRITER is done, and while it waits it keeps the lock active:
...
dmi_inst_lock = {
_opaque = [ 0x10 ]
}
...
[1]> ffffff01e60c6dc8::print dserv_mds_instance_t dmi_inst_lock|::rwlock
ADDR OWNER/COUNT FLAGS WAITERS
ffffff01e60c6e08 READERS=2 B000
What we have to do as a READER is try to grab the lock. If we would have to block, then, in this case only, we know teardown has started! That holds because the teardown path is the only caller that enters as a WRITER:
[th199096@aus-build-x86 nfs]> grep dserv_instance_enter *c
dserv_mds.c:dserv_instance_enter(krw_t lock_type, boolean_t create_instance,
dserv_mds.c: * This function frees any of the locks taken by dserv_instance_enter
dserv_mds.c: error = dserv_instance_enter(RW_WRITER, B_FALSE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
dserv_mds.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
dserv_server.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
So we could do this:
238 if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
239 if (lock_type == RW_READER)
240 return (EIO);
241 else if (lock_type == RW_WRITER)
242 rw_enter(&inst->dmi_inst_lock, lock_type);
243 }
But I really hate doing this, as it assumes there is only ever one reason to grab the lock as a WRITER, and there is no way to enforce that programmatically. Would a comment suffice?
What we really want to do is sleep and when awoken, retry to grab the lock. We would also have to get the instance pointer fresh.
I think a quick comment and this change will accomplish all I want:
238 if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
239 if (lock_type == RW_READER)
240 return (EAGAIN);
241 else if (lock_type == RW_WRITER)
242 rw_enter(&inst->dmi_inst_lock, lock_type);
243 }
I.e., let the caller try again if it wants to!
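Here is a rough user-space sketch of that EAGAIN contract, with pthread_rwlock_tryrdlock() standing in for rw_tryenter(..., RW_READER). enter_reader() and enter_reader_retry() are hypothetical names, and the 10 ms back-off and the retry count are arbitrary choices of mine. One caveat: POSIX only promises that the try fails when a writer holds the lock, whereas rw_enter(9F) also makes readers wait behind a writer that is merely trying to get in, so this only approximates the kernel behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * A reader that cannot get the lock immediately assumes the only
 * writer is teardown and backs off with EAGAIN instead of blocking.
 */
static int
enter_reader(pthread_rwlock_t *lock)
{
	if (pthread_rwlock_tryrdlock(lock) != 0)
		return (EAGAIN);   /* let the caller decide whether to retry */
	return (0);
}

/* Caller side: retry a bounded number of times, sleeping between tries. */
static int
enter_reader_retry(pthread_rwlock_t *lock, int tries)
{
	struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */
	int error = EAGAIN;

	assert(tries > 0);
	while (tries-- > 0) {
		error = enter_reader(lock);
		if (error != EAGAIN)
			break;
		nanosleep(&pause, NULL);
	}
	return (error);
}
```

In the real code the caller would also need to look up the instance pointer fresh on each try, since the one it had may be the instance being torn down.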
So code, build, test!
I've spent the last several weeks making both the MDS and DS survive whilst the DS reboots itself silly. On the MDS side, that was mainly in making sure we didn't orphan off ds_addrlist and ds_guid_info database entries. And on the DS, it was mainly making sure that we didn't try to operate on a client's NFSv4.1 requests before we started to end the NFS server.
The ds_owner has two linked lists: one of ds_addrlist entries and one of ds_guid_info entries. The lists allow quick traversal of the entries and mean we don't have to store the owner id in the entries. At first, we might want to store the owner id, but then we would also have to store a boot instance id as well. I.e., are you a ds_addrlist from before the reboot of the DS or after?
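A stripped-down sketch of that layout, to show why the lists make cleanup cheap: the entries carry no owner id or boot instance id, and purging a rebooted DS is just a walk of its owner's lists. This is a hypothetical illustration, not the rfs4_dbe code: struct owner, struct entry, owner_add() and owner_purge() are names I made up, and the real entries are refcounted database entries rather than bare malloc'd nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry: note there is no owner id or boot instance id. */
struct entry {
	struct entry *next;
};

/* Hypothetical owner: it, not the entries, knows what belongs to it. */
struct owner {
	struct entry *addrlist;    /* stands in for the ds_addrlist entries */
	struct entry *guid_info;   /* stands in for the ds_guid_info entries */
};

static void
owner_add(struct entry **list, struct entry *e)
{
	e->next = *list;
	*list = e;
}

/* Free every entry on one list; returns how many were freed. */
static int
list_free(struct entry **list)
{
	int n = 0;

	while (*list != NULL) {
		struct entry *e = *list;

		*list = e->next;
		free(e);
		n++;
	}
	return (n);
}

/* On a DS reboot, everything on the old owner's lists is stale. */
static int
owner_purge(struct owner *o)
{
	int n = list_free(&o->addrlist) + list_free(&o->guid_info);

	assert(o->addrlist == NULL && o->guid_info == NULL);
	return (n);
}
```

The trade-off is the one described above: no id comparisons on cleanup, at the cost of the owner and its entries being tangled together.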
But this incestuous relationship causes the most convoluted usage of the rfs4_dbe code. It ain't pretty, and it ain't always easy to follow.
The way I did unit testing was to reboot a DS and see if the MDS stayed up. If it did, then I would run a shell script to dump some interesting structures via mdb. If it didn't, well, then I was already in kmdb and could start dumping the structures directly.
Once I got that working, I started running 10 back-to-back instances of the cthon test suite. And I would then reboot the DS whenever I was ready.
Eventually, I got to the point where the DS would randomly crash upon reboot. It looked like the server instance was being torn down at the same time it was being used to grab a lock. I added a variable to keep track of whether it was being torn down. But I couldn't reliably trigger the bug.
I created a simple script to drive NFS traffic, even if there was an application-level error:
[root@pnfs-17-21 ~]> more swift.sh
#!/bin/sh
mount -o vers=4 pnfs-17-24:/pnfs2/pnfs /pnfs/pnfs-17-24/
while /usr/bin/true; do
dd if=/root/cleanup@downtime-zsend.bz2 of=/pnfs/pnfs-17-24/zero count=1204 bs=2048
dd if=/pnfs/pnfs-17-24/zero of=/root/zero count=1204 bs=2048
sleep 1
done
I also needed a way to force the DS to reboot itself just after the dserv was started. I could have added some code to cause a panic, but I wanted a clean and orderly shutdown. The solution was to use smf(5) to create a service instance that depended on dserv and called reboot directly. This worked like a charm.
If I hit a crash or had to refresh the nfssrv modules, I would drop down into maintenance mode and edit the '/a/var/svc/manifest/network/reboot.xml' file to change the 'reboot' to 'true'.
This unit test ended up testing both the MDS state management and the DS race case.
With a solid set of results from the unit testing, I just need to go validate my code with our standard test suite that all integrations need to pass. And that, that will be my task tomorrow...
As I kept on spending money to make a mold of the w2100z CPU retainer bracket, I came to the realization that I wanted to make a mold more than I wanted a silent PC. It is that simple. I came up with a cheaper method to extend the original brackets, and I may end up falling back on that.
But that is later; for now we should concentrate on my travails to get a mold.
I'd have to say the biggest non-lethal mistake I made on this project was assuming that, since the retainer was pretty plain and out of sight, I didn't have to be as careful as I would be when making, say, a figurine. I didn't always measure twice, which can be seen in my choice of using a sandwich container for my mold box:
I didn't make sure that there was a 1/4 to 1/2 inch space at the top, and I ended up having too much extra space. Also, an expensive lesson here is that all of that dead space in the middle of the bracket needs to be filled with mold material. If I had placed an object there to take up room, I would have had a donut-shaped mold and saved some money. Ehh, I don't know how easy it would have been to work with that mold!
I really should have taken the time to smooth out the layer of clay. That really ended up hurting me when I cast a bracket. But that all is in the future. Right now we can see that I've run out of mold material:
I had calculated the amount of product needed and was dismayed to realize that just for the top portion it was more than the starter kit provided. BTW - the kit tells you how much base you have, but it doesn't tell you how much activator you got or how much the base container weighed.
Speaking of weight, I needed to buy a gram scale:
I felt weird walking through a store looking for a gram scale, syringes, and disposable plastic gloves. I.e., I realized what else that stuff could be used for!
Anyway, the scale was a wise investment and I can now use it on the expense reports I need to send in (i.e., do I need one stamp or two?). Also, I flashed back to HS Chemistry. I must have been paying attention at some point, because I knew to offset the weight of my containers.
A day later, and I have some cheesecake with a nice blue Play-Doh crust:
And if we flip it over, we see the next issue:
I need to cut open the sandwich container. I realized this before I added the mold material, but went on with it anyway. I was having a tough time figuring out what to use as my mold box (did I mention that this was my very first attempt to make a mold?). You can see another problem: I didn't make sure to get a good seal with the clay against the wall.
Once all of the clay, well most of it, is removed, we can see the bottom part:
I would end up getting more of the clay removed, but I mainly concentrated on the stuff on the retainer. I had meant to leave some on the inside corners of it, but decided not to do so. My other idea was to later add some milliput and shape it as needed.
I didn't capture my next mistake - which was probably the worst. I forgot to add the mold release before I started to pour in the bottom mold. I caught this after I had started, but before the mold really set. I poured out the product, scraped it down, and added some release agent. I was already screwed and not inclined to buy another $29 bottle of mold material. I had the milliput in mind already.
Also, as can be seen here, I didn't have enough clearance on the bottom:
I used duct tape to build up the side. That actually worked out okay.
After I pulled out the set mold, you can see there really isn't a line to separate the two halves:
I took a line and cut along the clay residue. Another lesson is that I should have marked the line on the clay box. Here we can see the original part, rescued from its early grave:
And here we can see the mold:
I set the mold and it is alive I tell you! Alive!
But it does need some surgery:
At first I thought my mold was bad, but then I realized that it was simply an air bubble. I got rushed at the end, resin was pouring out of the bottom of the mold. I had been burping it, but I didn't get enough product up to the top corners. Some vent holes there might have done wonders.
The main problem with the resulting part is that I don't think it is strong enough. I'll need to recast and add some fillers. I read somewhere that nylon string strands would work. But I need to confirm that.
But what I can do with this part is fit it perfectly to the fan clip and the motherboard. I can use milliput to reshape. And once I have that in order, I can either create a new mold (wouldn't be prudent at this juncture) or have an easy way to reshape strengthened clones or the originals.
It was fun, and I'm amazed that with as many things as I did wrong, I got close to what I wanted. The unit cost can only go down, and I'm pretty sure I can get a stronger retainer!
I'm in the process of making sure the MDS cleans up memory when a DS reboots. As such, I've been trying to figure out where a refcnt is held or released for an rfs4_dbe. I added two circular buffers to keep track of these events.
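The idea behind the tracking buffers is simple enough to sketch in user space: one small ring for holds and one for releases, each slot recording the caller's address, with a "next slot to write" index that wraps. This is a hypothetical reconstruction, not the nfssrv code; the field names mimic the rtr_* fields that show up in the dump later in this post, but the layout and the rtr_record(), rtr_hold() and rtr_rele() helpers are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	RTR_COUNT	3   /* slots per buffer; tiny, as in the dump */

struct rtr {
	size_t rtr_count;               /* slots per buffer */
	size_t rtr_hold_idx;            /* next slot to write for holds */
	size_t rtr_rele_idx;            /* next slot to write for releases */
	uintptr_t rtr_hold[RTR_COUNT];  /* caller addresses of holds */
	uintptr_t rtr_rele[RTR_COUNT];  /* caller addresses of releases */
};

/* Record one event, overwriting the oldest slot once the ring is full. */
static void
rtr_record(uintptr_t *buf, size_t *idx, size_t count, uintptr_t caller)
{
	assert(*idx < count);
	buf[*idx] = caller;
	*idx = (*idx + 1) % count;
}

static void
rtr_hold(struct rtr *r, uintptr_t caller)
{
	rtr_record(r->rtr_hold, &r->rtr_hold_idx, r->rtr_count, caller);
}

static void
rtr_rele(struct rtr *r, uintptr_t caller)
{
	rtr_record(r->rtr_rele, &r->rtr_rele_idx, r->rtr_count, caller);
}
```

In user space, GCC and Clang can supply the caller address with __builtin_return_address(0). A fixed-size ring keeps the cost per hold/rele constant, at the price of only remembering the last few events.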
I just did 10 back-to-back cthon test runs and turned my head away. I was trying to figure out if mds_gather_devs leaked ds_addrlist_ts. The answer is no, but I did get a panic.
[root@pnfs-17-24 ~]> panic[cpu1]/thread=ffffff01d16dee80: assertion failed: e->refcnt > 1, file: ../../common/fs/nfs/nfs4_db.c, line: 193
ffffff00094836d0 genunix:assfail+7e ()
ffffff0009483700 nfssrv:rfs4_dbe_rele+98 ()
ffffff0009483720 nfssrv:rfs41_session_rele+1a ()
ffffff0009483740 nfssrv:rfs41_compound_state_free+8a ()
ffffff00094837d0 nfssrv:rfs41_dispatch+1bd ()
ffffff0009483820 nfssrv:rfs4_minor_version_dispatch+5c ()
ffffff0009483b20 nfssrv:common_dispatch+7a6 ()
ffffff0009483b40 nfssrv:rfs_dispatch+2d ()
ffffff0009483c30 rpcmod:svc_getreq+20d ()
ffffff0009483ca0 rpcmod:svc_run+197 ()
ffffff0009483cd0 rpcmod:svc_do_run+81 ()
ffffff0009484eb0 nfs:nfssys+9f1 ()
ffffff0009484f00 unix:brand_sys_syscall32+292 ()
panic: entering debugger (continue to save dump)
And this is probably something I added - not something already existing.
So I have these arrays, how do I figure out what is going on?
[1]> $C
fffffffffbc65740 kmdb_enter+0xb()
fffffffffbc65760 debug_enter+0x38(fffffffffb961ea0)
fffffffffbc65830 panicsys+0x40e(fffffffffbf50518, ffffff0009483660, fffffffffbc65840, 1)
ffffff00094835a0 vpanic+0x15d()
ffffff0009483690 panic+0x94()
ffffff00094836d0 assfail+0x7e(fffffffff81e4910, fffffffff81e48f0, c1)
ffffff0009483700 nfssrv`rfs4_dbe_rele+0x98(ffffff01e678fad8)
ffffff0009483720 nfssrv`rfs41_session_rele+0x1a(ffffff01e678fb68)
ffffff0009483740 nfssrv`rfs41_compound_state_free+0x8a(ffffff027ffb7c20)
ffffff00094837d0 nfssrv`rfs41_dispatch+0x1bd(ffffff0009483bb0, ffffff01f18ab940, ffffff0009483880)
ffffff0009483820 nfssrv`rfs4_minor_version_dispatch+0x5c(ffffff0009483bb0, ffffff01f18ab940, ffffff0009483880)
ffffff0009483b20 nfssrv`common_dispatch+0x7a6(ffffff0009483bb0, ffffff01f18ab940, 2, 4, fffffffff81e1e40, ffffffffc011e3e0)
ffffff0009483b40 nfssrv`rfs_dispatch+0x2d(ffffff0009483bb0, ffffff01f18ab940)
ffffff0009483c30 rpcmod`svc_getreq+0x20d(ffffff01f18ab940, ffffff01d63f2f60)
ffffff0009483ca0 rpcmod`svc_run+0x197(ffffff01e4a0bbb8)
ffffff0009483cd0 rpcmod`svc_do_run+0x81(1)
ffffff0009484eb0 nfs`nfssys+0x9f1(e, fe580fc0)
ffffff0009484f00 sys_syscall32+0x1fc()
The rfs4_dbe_t is at ffffff01e678fad8:
[1]> ffffff01e678fad8::print struct rfs4_dbe
{
lock = [
{
_opaque = [ 0xffffff01d16dee80 ]
}
]
refcnt = 0x1
skipsearch = 0x1
invalid = 0x1
reserved = 0x3addcafe
time_rele = 2009 May 5 16:17:41
inval_hint = nfssrv`mds_session_inval+0x9e
id = 0x64
cv = [
{
_opaque = 0
}
]
data = 0xffffff01e678fb68
rtr = {
rtr_count = 0x3
rtr_rele_idx = 0x1
rtr_hold_idx = 0x2
rtr_rele = 0xffffff01e7a9c1b0
rtr_hold = 0xffffff01e7a9c180
}
table = 0xffffff01d9494700
indices = [
{
next = 0
prev = 0
entry = 0xffffff01e678fad8
}
]
}
The rtr structure holds the refcnt tracking data. We are tracking 3 items per buffer and we can see the last ones recorded would have been rele=0 and hold=1. (I.e., we are going to write at rele=1 and hold=2, so we subtract 1 to get the previous record.)
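The "subtract 1" step is easy to get wrong with unsigned indices once the ring wraps, so here is a tiny hypothetical helper that does the arithmetic: given the next-slot-to-write index and the ring size, it returns the slot holding the record that is age entries old (age 0 being the most recent). It assumes the ring has already filled up at least once; before that, the older slots are just empty.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Slot holding the record `age` entries back (0 = most recent), given
 * the index of the next slot to be written. Adding count before the
 * subtraction keeps the unsigned arithmetic from wrapping below zero.
 */
static size_t
ring_slot(size_t next_idx, size_t count, size_t age)
{
	assert(count > 0 && next_idx < count && age < count);
	return ((next_idx + count - 1 - age) % count);
}
```

Plugging in the dump: ring_slot(1, 3, 0) is 0 for the release buffer and ring_slot(2, 3, 0) is 1 for the hold buffer, matching rele=0 and hold=1 worked out by hand. Stepping age from 0 to 2 walks a buffer from newest to oldest.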
I can use a dcmd to see each array:
[1]> 0xffffff01e7a9c1b0::array caddr_t 3 |::print caddr_t
nfssrv`mds_destroysession+0x3e
nfssrv`rfs41_compound_state_free+0x8a
nfssrv`mds_session_inval+0xa7
[1]> 0xffffff01e7a9c180::array caddr_t 3 |::print caddr_t
nfssrv`rfs4_dbsearch+0x154
nfssrv`rfs4_dbsearch+0x154
nfssrv`rfs4_dbsearch+0x154
I need to script this up to run from the command line, but for now this will suffice. Hmm, the rfs4_dbsearch information isn't too useful. It would be better to have recorded the caller of it. I have that capability; I'll just need to make sure to call it in rfs4_dbsearch.
Okay, the buffer size is too small here to help. But I could bump it up and rerun the tests. I'm pretty confident that it will trip again. But, I also happen to know I have made changes here and I can go revisit those changes to see where I made a mistake.
So, I've never made a mold before and I've never made a casting. I've been reading the Alumilite online examples and I went ahead and bought their starter kit. I figured if they can tell me how to do it online, they can have some of my money.
The first thing I did was use modeling clay to plug the openings:
I'm going to have a two part mold and I won't be able to have those openings in there. Note that I really should make sure to smooth things down on the prongs, but hey, I want to know where to cut those holes back in. I'm banking on having a faint outline there to guide me.
You can see things are pretty smooth still though:
Note that the surface I care most about is that screw connector shown at the top in that picture. I really want it to be level such that the fan will not wobble. But given the way I'm building the bottom of the mold, this will not matter that much:
I've elected to have the seam be on the base and I've pressed the retainer into the clay. At this point I need to clean up the tops of the towers (look at the lower left one) or I'll be cleaning up each and every cast. Another decision is whether I want to add some more clay to provide material to shave off for the base connections. I.e., I very nearly went through the outer wall on my earlier attempt. I still plan to add some material around the feet when I prep the bottom half.
Oh well, I still need to figure out what I am going to do about the casting box, so I've got some time.
I'm also hoping the resulting cast will be strong enough. The demos produced car model parts and I think that will be sufficient. I'll have to see - I may need to get different casting resin.
So we can see here that it doesn't look like I can shave enough off:
I went to Hobby Lobby over lunch and it looks like I'm going to need to make a mold and tap out some holes. I saw some Alumilite products that look like they would be durable plastic.
The plan would be to cover the screw holes and take a mold. Hmm, I might also look to add a bit of width on the hole bases. I.e., I probably need another millimeter or two of wiggle room. 1 for the hole and 1 for the rim.
But once I have a mold in place, I will also be able to make my own spares and not be so cautious with what I have. :->
Here is a picture of the CPU retainer bracket before any modifications:
All of the holes look nice and round.
I marked one corner (so I was always testing the same fit) and started in on the holes with a dremel. The intent is to shift them about 1mm up. I'm not using calipers or anything at all scientific. I'm doing it all by eye. Note that this means I might have the right side fitted correctly and not the left. And I may end up overcompensating. My inspiration is those oval screw holes on older hard drive brackets.
I'm fitting the lower left and upper right screws in when I'm doing my tests. I leave the screws loose and move the bracket up and down to check the fit. Here we can see the original offset:
And here we can see the new and improved offset:
At this point, I'm getting leery of making more shavings. If we look at a closeup of one of the holes:
you can see that I may end up going too far. I have no clue what will happen as I tighten the screws later. I don't want a crack.
Just by eyeballing the remaining clearance needed and the bracket to be carved off, I'm not convinced that this is plausible. Right now I have a working bracket for the stock CPU fans and I'm flashing on the old adage, "Measure twice, cut once."
Also, I've got another problem on the daughterboard. You can see that even unconnected my test blank is not going to fit:
There is a header there (for the CPU fan) that prevents lining this up correctly. I shaved down the outside of the retainer and got it to fit:
I got the Dremel out and started shaving back the mounting holes for the CPU bracket. I think I can make it work. But on my last shave, I got a little impatient and made a cut bigger than I wanted. It wasn't damaging, but I realized I'm too groggy.
Anyway, I'll add pictures tomorrow and maybe continue on.
I've been assuming all along that the problem was the CPU fans. I did another search and found someone who installed a Zalman, along with a write-up.
One of the pictures had a closeup of the case fan, so I searched for it and found this description:
If noise issue is no problem for you, you might want to consider this Delta 120mm fan. It features a whopping 152 CFM of airflow at 53 dBa. For a 120mm fan that is 38mm thick that isn't too shabby. This high end Delta 120mm fan is often used in rackmount server rooms where noise is not an issue.
I can't swear, but I think the CPU fan is a:
Tach output, 9 blade, very high performance 40CFM, moderate noise 38dBA,
Okay, it is clear Sun was getting a good deal on server room fans and not desktop fans.
But, as a server, these things kick butt!
If I wanted to keep the stock fans, I think I would look into building a box to house the computer. I've looked at noise dampening from an audio perspective (think garage band) and it looks like a simple box with 2 layers of drywall might do. Sounds heavy. I'd probably consider a light wooden frame and the sides made out of a double layer of cork. This would not be a load bearing box, just something that could be placed over the computer.
You would want airflow (the trade-off is classically sound versus temperature) with a means for air to come out. Perhaps a low blowing case fan? :->
I think I could build something like that for much less than the sound reduction boxes I see online.
Hmm, too late at night - but I wanted to capture the thoughts on the noise reduction box!