Verizon and my SSN

Apr 29th, 2011 | Filed under General

I finally decided to buy an iPhone 4 from Verizon. You can think of it as contributing to the cause, but I do like Apple products anyway. It is not about being a fanboy. It is about liking well designed products and caring about quality implementations in both hardware and software. Anyway, even though I know a new iPhone will possibly be out in September, I don’t want to wait that long. I had a family plan at Sprint so that I could provide my parents with a cell phone as well. Since I was in town on Wednesday, it made sense to take care of this right then so I could give them their new phone (and I decided to buy them an iPhone 4 as well – am I not a nice guy :-) ).

I’m sitting there at their sales desk with two iPhones, actually excited to spend more than $500 and make a long commitment to a carrier – which is something I don’t particularly care for. Apparently I typed my SSN too quickly on their slow, broken keypad. For some reason it left out the last two numbers, so I backtracked and started over. Their software refused to do the credit check on me. The sales woman asked me if I had anything with my SSN printed on it. I did not, of course. The state of Ohio no longer puts it on a driver’s license, and I lost my original card about 16 years ago or so. They actually could not get around this point. They would not sell me thousands of dollars’ worth of service and hardware, because simply telling them my SSN was not enough for their manual process. I pointed out the following facts to no effect, as far as I am aware anyway:

  1. I could just go to their website and I would not be forced to produce a physical copy of my SSN.
  2. I could buy the iPhone from Apple and not be forced to produce a physical copy of my SSN.

However, at the store, it seems that I could not even give my money away to them. To say I was upset would be putting a nice spin on it. I was nice about it, but I was clear about how I felt this was totally insane.

What’s even funnier about this is that I was closing on a home loan that day. That’s a story I’ll save for another time, other than to say it was a good thing in the end. At no point during the process of securing that home loan did I have to produce a physical copy of my SSN. Yet I could not buy two damn iPhones without doing so, even though they could easily have taken the SSN I told them and, in combination with my driver’s license, verified that I really am who I say I am.

Seriously Verizon, is this how you run a company that is interested in acquiring customers who are likely to spend insane amounts of money? Verizon? Can you hear me now? Maybe I should send them an email, but I suspect it might need to be accompanied by a physical copy of my SSN. I still cannot believe that this actually happened.

I stopped by the Social Security office after I was done closing this home loan at the bank. I figured that was a good time to take care of getting a replacement SSN card. I’ve been meaning to do that in case I were to run into totally moronic situations like this. I won’t get it for up to two weeks, so I thought I was still going to be out of luck in my quest for the iPhone 4. I was given a printout and told that could be used as proof for now! I did not know that :-) I went back to Verizon, and they were able to make a decision in less than a minute through their moronic manual process. My credit is excellent BTW.

I made sure to point out the fact that I never had to do this in my quest to secure way more money to settle a mortgage situation than I’ll ever give to Verizon. I now have my iPhone 4. I love it. However, I have no love for Verizon, though they did make up for it somewhat by providing a way to block phone numbers for 90 days through the website.

That’s right SiriusXM, I don’t want your service anymore! SiriusXM? Can you hear me now? I won’t be hearing from you as you try to call me twice a day, every day, for apparently forever.

Mac OS X and Active Directory Tale of Woe

Apr 27th, 2011 | Filed under Active Directory, Mac OS X

We are starting to get new Mac systems in for desktop replacements at work. In the past these were treated as completely standalone systems. Users had local home directories, and we weren’t even really backing them up due to the complications FileVault presented. I really hope Lion solves this with full disk encryption, but I’ve not even looked into this yet. We need to join the Mac systems to Active Directory to enforce password policies. I also want to have users mount their home directories from our EMC NAS storage.

I quickly ran into an interesting problem joining new Snow Leopard and legacy Leopard Mac systems to our Active Directory. I kept getting an error that indicated the account username and password I used could not join the machine to the AD. Of course, I know that was not the case. I was able to turn on the DirectoryService.debug.log file by sending a USR1 to the DirectoryService process. Sending a USR1 will toggle debug logging, so you can send another USR1 to turn it off again:

sudo killall -USR1 DirectoryService

See the manual page for more, like using USR2 to toggle API logging. There are hexadecimal identifiers for actions in the debug log. Once you find what you are looking for, you can grep on that ID value. This is what I saw happening:

2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Attempting Add Record......
2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:       Adding in OU = CN=Computers,DC=cse,DC=ohio-state,DC=edu
2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Added record CN=rowland-mac,CN=Computers,DC=cse,DC=ohio-state,DC=edu
2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Setting Computer Password......
2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Deleting Record CN=rowland-mac,CN=Computers,DC=cse,DC=ohio-state,DC=edu...
2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Setting Computer Password FAILED Deleted Record......
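
The "grep on that ID value" step can be sketched in a few lines of Python. This is only an illustration: it assumes the T[0x...] field is the identifier in question, the function name lines_for_id is made up, and the middle sample line is invented to stand in for unrelated log noise.

```python
import re

# Matches the bracketed hexadecimal identifier in a debug log line,
# for example "T[0xB0185000]".
THREAD_ID = re.compile(r"T\[(0x[0-9A-Fa-f]+)\]")


def lines_for_id(log_lines, wanted_id):
    """Return only the log lines whose T[...] identifier equals wanted_id."""
    wanted = wanted_id.lower()
    selected = []
    for line in log_lines:
        match = THREAD_ID.search(line)
        if match and match.group(1).lower() == wanted:
            selected.append(line)
    return selected


sample = [
    "2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Setting Computer Password......",
    # Hypothetical line from another thread, invented for this example.
    "2011-04-25 16:50:08 EDT - T[0xB0289000] - Some unrelated plug-in activity",
    "2011-04-25 16:50:08 EDT - T[0xB0185000] - Active Directory:    Setting Computer Password FAILED Deleted Record......",
]

matching = lines_for_id(sample, "0xB0185000")
for line in matching:
    print(line)
```

In practice a plain grep for the identifier against the debug log does the same job; the function form is only handy if you want to post-process the matching lines.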

Clearly Mac OS X was able to do everything except change the password on the computer account once it had added it to the AD. I fired up Wireshark on my Mac and saw the following packets. The first is the request to change the computer account password using the kpasswd protocol:


No.     Time        Source                Destination           Protocol Info
    691 3.521763    164.107.120.107       164.107.114.11        KPASSWD  Request

Frame 691 (234 bytes on wire, 234 bytes captured)
    Arrival Time: Apr 25, 2011 19:38:35.971888000
    [Time delta from previous captured frame: 0.000116000 seconds]
    [Time delta from previous displayed frame: 0.000116000 seconds]
    [Time since reference or first frame: 3.521763000 seconds]
    Frame Number: 691
    Frame Length: 234 bytes
    Capture Length: 234 bytes
    [Frame is marked: False]
    [Protocols in frame: eth:ip:udp:kpasswd]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Ethernet II, Src: Apple_b3:4d:49 (00:1b:63:b3:4d:49), Dst: 00:23:9c:46:f2:00 (00:23:9c:46:f2:00)
    Destination: 00:23:9c:46:f2:00 (00:23:9c:46:f2:00)
        Address: 00:23:9c:46:f2:00 (00:23:9c:46:f2:00)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: Apple_b3:4d:49 (00:1b:63:b3:4d:49)
        Address: Apple_b3:4d:49 (00:1b:63:b3:4d:49)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 164.107.120.107 (164.107.120.107), Dst: 164.107.114.11 (164.107.114.11)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 220
    Identification: 0x340e (13326)
    Flags: 0x00
        0... = Reserved bit: Not set
        .0.. = Don't fragment: Not set
        ..0. = More fragments: Not set
    Fragment offset: 1480
    Time to live: 64
    Protocol: UDP (0x11)
    Header checksum: 0x11fd [correct]
        [Good: True]
        [Bad : False]
    Source: 164.107.120.107 (164.107.120.107)
    Destination: 164.107.114.11 (164.107.114.11)
    [IP Fragments (1680 bytes): #690(1480), #691(200)]
        [Frame: 690, payload: 0-1479 (1480 bytes)]
        [Frame: 691, payload: 1480-1679 (200 bytes)]
User Datagram Protocol, Src Port: 50406 (50406), Dst Port: kpasswd (464)
    Source port: 50406 (50406)
    Destination port: kpasswd (464)
    Length: 1680
    Checksum: 0xdfa0 [correct]
        [Good Checksum: True]
        [Bad Checksum: False]
MS Kpasswd
    Message Length: 1672
    Version: Request (0xff80)
    AP_REQ Length: 1513
    AP_REQ
        Kerberos AP-REQ
            Pvno: 5
            MSG Type: AP-REQ (14)
            Padding: 0
            APOptions: 00000000
                .0.. .... .... .... .... .... .... .... = Use Session Key: Do NOT use the session key to encrypt the ticket
                ..0. .... .... .... .... .... .... .... = Mutual required: Mutual authentication is NOT required
            Ticket
                Tkt-vno: 5
                Realm: CSE.OHIO-STATE.EDU
                Server Name (Unknown): kadmin/changepw
                    Name-type: Unknown (0)
                    Name: kadmin
                    Name: changepw
                enc-part aes256-cts-hmac-sha1-96
                    Encryption type: aes256-cts-hmac-sha1-96 (18)
                    Kvno: 100003
                    enc-part: AFF4597D76A9DED1446C2E9B1672DD931C8F3DFA2D33F1BE...
            Authenticator aes256-cts-hmac-sha1-96
                Encryption type: aes256-cts-hmac-sha1-96 (18)
                Authenticator data: 215C9C1108143CB84F138E2B522599B6F2AC87B6BBBEA6CA...
    KRB-PRIV
        Kerberos
            PRIV_BODY KRB-PRIV
                Pvno: 5
                MSG Type: KRB-PRIV (21)
                enc PRIV: 308183A003020112A27C047A88FB8B24C0A5901A14681E86... aes256-cts-hmac-sha1-96
                    Encryption type: aes256-cts-hmac-sha1-96 (18)
                    Encrypted PRIV

The very next frame contains a KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN error response:


No.     Time        Source                Destination           Protocol Info
    692 3.522243    164.107.114.11        164.107.120.107       KPASSWD  KRB Error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN

Frame 692 (146 bytes on wire, 146 bytes captured)
    Arrival Time: Apr 25, 2011 19:38:35.972368000
    [Time delta from previous captured frame: 0.000480000 seconds]
    [Time delta from previous displayed frame: 0.000480000 seconds]
    [Time since reference or first frame: 3.522243000 seconds]
    Frame Number: 692
    Frame Length: 146 bytes
    Capture Length: 146 bytes
    [Frame is marked: False]
    [Protocols in frame: eth:ip:udp:kpasswd]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Ethernet II, Src: 00:23:9c:46:f2:00 (00:23:9c:46:f2:00), Dst: Apple_b3:4d:49 (00:1b:63:b3:4d:49)
    Destination: Apple_b3:4d:49 (00:1b:63:b3:4d:49)
        Address: Apple_b3:4d:49 (00:1b:63:b3:4d:49)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: 00:23:9c:46:f2:00 (00:23:9c:46:f2:00)
        Address: 00:23:9c:46:f2:00 (00:23:9c:46:f2:00)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 164.107.114.11 (164.107.114.11), Dst: 164.107.120.107 (164.107.120.107)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 132
    Identification: 0x3a83 (14979)
    Flags: 0x04 (Don't Fragment)
        0... = Reserved bit: Not set
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 127
    Protocol: UDP (0x11)
    Header checksum: 0x8d98 [correct]
        [Good: True]
        [Bad : False]
    Source: 164.107.114.11 (164.107.114.11)
    Destination: 164.107.120.107 (164.107.120.107)
User Datagram Protocol, Src Port: kpasswd (464), Dst Port: 50406 (50406)
    Source port: kpasswd (464)
    Destination port: 50406 (50406)
    Length: 112
    Checksum: 0xce60 [correct]
        [Good Checksum: True]
        [Bad Checksum: False]
Kerberos KRB-ERROR
    Pvno: 5
    MSG Type: KRB-ERROR (30)
    stime: 2011-04-25 23:38:35 (UTC)
    susec: 798969
    error_code: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN (7)
    Realm: CSE.OHIO-STATE.EDU
    Server Name (Service and Instance): kadmin/changepw
        Name-type: Service and Instance (2)
        Name: kadmin
        Name: changepw
    e-data

This got me thinking. I did some searching and found a post outlining a similar issue and another one from NetApp on TechNet outlining the same type of problem. Both reference this hotfix to solve the problem. I used repadmin.exe to check whether an authoritative restore had affected the krbtgt user on our domain, and sure enough the version numbers were higher than 100,000. The version numbers were much lower on our research domain controller, and I was able to join there.

I duplicated this in a Windows Server 2008 VM I have at home. The version numbers on the krbtgt user AD attributes matched our research domain controller before doing the authoritative restore:

C:\Users\Administrator>repadmin.exe /showobjmeta vmware-win2008 cn=krbtgt,cn=users,dc=vmware,dc=local

35 entries.
Loc.USN                           Originating DSA  Org.USN  Org.Time/Date
 Ver Attribute
=======                           =============== ========= =============
 === =========
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 objectClass
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 cn
  12326    Default-First-Site-Name\VMWARE-WIN2008     12326 2009-02-01 23:09:06
   2 description
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 instanceType
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 whenCreated
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 displayName
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 showInAdvancedViewOnly
  16680    Default-First-Site-Name\VMWARE-WIN2008     16680 2009-02-01 23:30:09
   2 nTSecurityDescriptor
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 name
  12323    Default-First-Site-Name\VMWARE-WIN2008     12323 2009-02-01 23:09:06
   3 userAccountControl
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 codePage
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 countryCode
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 homeDirectory
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 homeDrive
  12324    Default-First-Site-Name\VMWARE-WIN2008     12324 2009-02-01 23:09:06
   2 dBCSPwd
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 scriptPath
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 logonHours
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 userWorkstations
  12324    Default-First-Site-Name\VMWARE-WIN2008     12324 2009-02-01 23:09:06
   2 unicodePwd
  12324    Default-First-Site-Name\VMWARE-WIN2008     12324 2009-02-01 23:09:06
   2 ntPwdHistory
  12324    Default-First-Site-Name\VMWARE-WIN2008     12324 2009-02-01 23:09:06
   2 pwdLastSet
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 primaryGroupID
  12325    Default-First-Site-Name\VMWARE-WIN2008     12325 2009-02-01 23:09:06
   1 supplementalCredentials
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 userParameters
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 profilePath
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 objectSid
  16680    Default-First-Site-Name\VMWARE-WIN2008     16680 2009-02-01 23:30:09
   1 adminCount
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 comment
  12322    Default-First-Site-Name\VMWARE-WIN2008     12322 2009-02-01 23:09:06
   1 accountExpires
  12324    Default-First-Site-Name\VMWARE-WIN2008     12324 2009-02-01 23:09:06
   2 lmPwdHistory
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 sAMAccountName
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 sAMAccountType
  12446    Default-First-Site-Name\VMWARE-WIN2008     12446 2009-02-01 23:09:21
   1 servicePrincipalName
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 objectCategory
  12321    Default-First-Site-Name\VMWARE-WIN2008     12321 2009-02-01 23:09:06
   1 isCriticalSystemObject
0 entries.
Type    Attribute     Last Mod Time                            Originating DSA
Loc.USN Org.USN Ver
======= ============  =============                           =================
======= ======= ===
        Distinguished Name
        =============================

Next I did an authoritative restore:

C:\Users\Administrator>net stop ntds
<snip>

C:\Users\Administrator>ntdsutil
ntdsutil: activate instance NTDS
Active instance set to "NTDS".
ntdsutil: authoritative restore
authoritative restore: restore subtree cn=Users,dc=vmware,dc=local

Opening DIT database... Done.

The current time is 04-26-11 00:10.12.
Most recent database update occured at 04-26-11 00:07.10.
Increasing attribute version numbers by 100000.

Counting records that need updating...
Records found: 0000000051
Done.

Found 51 records to update.

Updating records...
Records remaining: 0000000000
Done.

Successfully updated 51 records.

The following text file with a list of authoritatively restored objects has been
 created in the current working directory:
        ar_20110426-001012_objects.txt

One or more specified objects have back-links in this domain. The following LDIF
 files with link restore operations have been created in the current working dir
ectory:
        ar_20110426-001012_links_vmware.local.ldf

Authoritative Restore completed successfully.

authoritative restore: quit
ntdsutil: quit

C:\Users\Administrator>net start ntds
<snip>

I checked the version numbers again, and now they matched our production AD at work:

C:\Users\Administrator>repadmin.exe /showobjmeta vmware-win2008 cn=krbtgt,cn=users,dc=vmware,dc=local

36 entries.
Loc.USN                           Originating DSA  Org.USN  Org.Time/Date
 Ver Attribute
=======                           =============== ========= =============
 === =========
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 objectClass
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 cn
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 description
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 instanceType
  12321      e470cfbf-a782-40ab-badb-314e2760696d     12321 2009-02-01 23:09:06
   1 whenCreated
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 displayName
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00000 isDeleted
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 showInAdvancedViewOnly
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 nTSecurityDescriptor
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 name
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00003 userAccountControl
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 codePage
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 countryCode
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 homeDirectory
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 homeDrive
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 dBCSPwd
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 scriptPath
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 logonHours
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 userWorkstations
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 unicodePwd
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 ntPwdHistory
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 pwdLastSet
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 primaryGroupID
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 supplementalCredentials
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 userParameters
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 profilePath
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 objectSid
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 adminCount
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 comment
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 accountExpires
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00002 lmPwdHistory
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 sAMAccountName
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 sAMAccountType
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 servicePrincipalName
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 objectCategory
 819244    Default-First-Site-Name\VMWARE-WIN2008    819244 2011-04-26 00:10:121
00001 isCriticalSystemObject
0 entries.
Type    Attribute     Last Mod Time                            Originating DSA
Loc.USN Org.USN Ver
======= ============  =============                           =================
======= ======= ===
        Distinguished Name
        =============================

It is hard to tell due to the wrapping, but the first digit is right up against the end of the date (the final digit on each line is the first of the version number in most cases). I was able to hack up the DNS on my Mac OS X Server VM enough so that it could attempt to join my test AD VM (at the expense of it not working when running Server Manager though). I was able to join before the authoritative restore, but not after. I installed the hotfix, and it was all working again. I did a snapshot before starting in both cases.
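
To make the wrapping easier to read, here is a small Python sketch that rejoins a version number split across a wrapped pair of repadmin lines. It is only an assumption about the layout as it appears in the output above (the timestamp always ends in HH:MM:SS and any extra digits after it belong to the version); unwrap_version is a made-up helper name.

```python
import re

# A timestamp ends in HH:MM:SS. Any digits glued on after the seconds are
# the leading digits of the version number that spilled onto this line.
TIME_AND_SPILL = re.compile(r"\d{2}:\d{2}:\d{2}(\d*)\s*$")


def unwrap_version(first_line, second_line):
    """Return (version, attribute) for one wrapped pair of output lines."""
    match = TIME_AND_SPILL.search(first_line)
    if match is None:
        raise ValueError("first line does not end with a timestamp")
    spill = match.group(1)
    rest, attribute = second_line.split()
    return int(spill + rest), attribute


# One pair from after the authoritative restore (version was wrapped).
after = unwrap_version(
    " 819244    Default-First-Site-Name\\VMWARE-WIN2008    819244 2011-04-26 00:10:121",
    "00001 objectClass",
)

# One pair from before the restore (nothing spilled onto the first line).
before = unwrap_version(
    "  12321    Default-First-Site-Name\\VMWARE-WIN2008     12321 2009-02-01 23:09:06",
    "   1 objectClass",
)

print(before, after)
```

The difference between the two versions is the 100000 that ntdsutil reported adding during the restore.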

This was a weird problem that was hard to figure out. Searching online reveals a lot of problems that have to do with standard “using the wrong account” or “DNS is not configured properly” issues. It was only after having dug deeper and finding the specific sequence of events from the DirectoryService debug log and Wireshark packet inspection that I was able to narrow down the search to find the hotfix. So, if you can’t join your Mac to your AD, it’s possible this is the issue. Even though this was a big problem for about a day, it was an interesting problem to solve.

Removing Shut Down from the Gnome Panel in RHEL 6

Apr 20th, 2011 | Filed under UNIX System Administration

I am still working on our Linux migration. It has been a slow process due to various other projects I have to deal with. We were going to go with RHEL 5, but now that RHEL 6 is out, I decided to take the work done with RHEL 5 and go for the newer distribution. I should have something for our users to test soon. We have login servers used remotely via XDMCP queries, and we configure GDM to deal with that. This means that I need to customize the Gnome panel menus and keep certain applets from starting for regular users. I didn’t have trouble with customizing the menus, but I had a hard time removing the “Shut Down…” menu item from the Gnome panel because it is not an actual menu item. The Gnome panel is putting it there itself.

My original title for this post was going to include the words “one step forward, six steps back,” but that probably would not be search engine friendly. And let it be known that I have had plenty of search engine experience trying to figure out how to solve this problem once I got through looking at the Gnome documentation! I’m hoping that this might help someone else. I saw a lot of questions on how to do this, but no real solutions. Only one solution seemed like it would even be close, but it still did not solve the problem for me. The suggestion was to create a local Gnome PolicyKit file to configure certain settings. You can read more about this with “man pklocalauthority”. I created an /etc/polkit-1/localauthority/50-local.d/10-shutdown.pkla file with the following contents:

[shutdown]
Identity=unix-user:*
Action=org.freedesktop.consolekit.system.*
ResultAny=no
ResultInactive=no
ResultActive=no

[hibernate]
Identity=unix-user:*
Action=org.freedesktop.devicekit.power.*
ResultAny=no
ResultInactive=no
ResultActive=no

I found those actions by digging around in /usr/share/polkit-1/actions, but they did not quite seem to fit what I wanted to do exactly. I set them anyway, and of course they did not work as I wanted. They do work though. I can change the values used and I see different things immediately, so it’s not a problem of the file not being evaluated. I figured that these were a long shot given the comments in the policy files anyway.
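
Since a .pkla file is laid out like an INI file, a quick sanity check of what it denies can be sketched with Python's configparser. This is an illustration only, not how polkit itself reads the file; load_pkla and denied_actions are made-up names, and the text is simply the file contents from above.

```python
import configparser

PKLA_TEXT = """\
[shutdown]
Identity=unix-user:*
Action=org.freedesktop.consolekit.system.*
ResultAny=no
ResultInactive=no
ResultActive=no

[hibernate]
Identity=unix-user:*
Action=org.freedesktop.devicekit.power.*
ResultAny=no
ResultInactive=no
ResultActive=no
"""


def load_pkla(text):
    """Parse .pkla-style text, splitting keys from values only on '='."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(text)
    return parser


def denied_actions(parser):
    """Map each Action pattern to True when all three Result* keys are 'no'."""
    result = {}
    for section in parser.sections():
        entry = parser[section]
        result[entry["Action"]] = all(
            entry.get(key) == "no"
            for key in ("ResultAny", "ResultInactive", "ResultActive")
        )
    return result


summary = denied_actions(load_pkla(PKLA_TEXT))
for action, denied in summary.items():
    print(action, "denied" if denied else "not fully denied")
```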

You can remove the option to log out by setting the apps/panel/global/disable_log_out key to true in GConf. This removes the “Log Out” and “Shut Down…” items from the Gnome panel, but I do want my users to be able to log out. They can’t just stay logged in forever, right? Some like to try that with VNC, but it’s not a good idea :-) My solution was to set disable_log_out to true and then add my own “Log Out” item directly to the menu. The first thing that I did was to set disable_log_out to true as a mandatory option. The /etc/gconf/gconf.xml.mandatory/%gconf-tree.xml file result is below:

<?xml version="1.0"?>
<gconf>
	<dir name="apps">
		<dir name="panel">
			<dir name="global">
				<entry name="disable_log_out" mtime="1303269759" type="bool" value="true"/>
				<entry name="disabled_applets" mtime="1303269755" type="list" ltype="string">
					<li type="string">
						<stringvalue>OAFIID:GNOME_CDPlayerApplet</stringvalue>
					</li>
					<li type="string">
						<stringvalue>OAFIID:GNOME_BattstatApplet</stringvalue>
					</li>
					<li type="string">
						<stringvalue>OAFIID:GNOME_MixerApplet</stringvalue>
					</li>
					<li type="string">
						<stringvalue>OAFIID:GNOME_CPUFreqApplet</stringvalue>
					</li>
					<li type="string">
						<stringvalue>OAFIID:GNOME_DriveMountApplet</stringvalue>
					</li>
				</entry>
			</dir>
		</dir>
	</dir>
</gconf>

Only the first entry is related to removing the “Shut Down…” item. The other entries disable some applets, though I had varying levels of success there as well. I had to manually remove some RPMs to get rid of certain things, but that’s all right because they really were not needed in the first place. Note that the previous settings in apps/gnome-power-manager are not there and don’t work if set.
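
If you want to confirm that the mandatory tree really carries the key, a short Python sketch with the standard XML parser can walk the same dir/entry nesting. The XML here is an abbreviated copy of the file above (one applet kept for brevity), and mandatory_bool is a made-up helper, not part of GConf.

```python
import xml.etree.ElementTree as ET

TREE_XML = """<?xml version="1.0"?>
<gconf>
  <dir name="apps">
    <dir name="panel">
      <dir name="global">
        <entry name="disable_log_out" mtime="1303269759" type="bool" value="true"/>
        <entry name="disabled_applets" mtime="1303269755" type="list" ltype="string">
          <li type="string">
            <stringvalue>OAFIID:GNOME_CDPlayerApplet</stringvalue>
          </li>
        </entry>
      </dir>
    </dir>
  </dir>
</gconf>
"""


def mandatory_bool(root, path, name):
    """Return a bool entry's value, or None if the key is absent or not a bool."""
    node = root
    for part in path:
        node = node.find("dir[@name='%s']" % part)
        if node is None:
            return None
    entry = node.find("entry[@name='%s']" % name)
    if entry is None or entry.get("type") != "bool":
        return None
    return entry.get("value") == "true"


root = ET.fromstring(TREE_XML)
logout_disabled = mandatory_bool(root, ["apps", "panel", "global"], "disable_log_out")
print(logout_disabled)
```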

The next task was adding a new “Log Out” item to the menu. I created a /usr/share/applications/logout.desktop file with the following contents:

[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon[en_US]=gdu-smart-failing
Name[en_US]=Log Out
Exec=/usr/bin/gnome-session-save --logout-dialog
Comment[en_US]=Log out
Name=Log Out
Comment=Log out
Icon=gnome-panel-launcher

I added this to the Gnome panel menu in approximately the same place by adding an include at the end of the /etc/xdg/menus/settings.menu file shown below:

<!DOCTYPE Menu PUBLIC "-//freedesktop//DTD Menu 1.0//EN"
 "http://www.freedesktop.org/standards/menu-spec/1.0/menu.dtd">

<Menu>

  <Name>Settings</Name>
  <Directory>Desktop.directory</Directory>

  <!-- Scan legacy dirs first, as later items take priority -->
  <LegacyDir>/etc/X11/applnk</LegacyDir>
  <LegacyDir>/usr/share/gnome/apps</LegacyDir>

  <!-- Read standard .directory and .desktop file locations -->
  <DefaultAppDirs/>
  <DefaultDirectoryDirs/>

  <!-- Read in overrides and child menus from applications-merged/ -->
  <DefaultMergeDirs/>

  <!-- Keep Preferences first -->
  <Layout>
    <Menuname>Preferences</Menuname>
    <Menuname>Administration</Menuname>
    <Menuname>Documentation</Menuname>
  </Layout>

  <!-- Merge in these other files as submenus -->
  <Menu>
    <Name>Preferences</Name>
    <MergeFile>preferences.menu</MergeFile>
    <MergeDir>preferences-post-merged</MergeDir>
  </Menu>

  <!-- System Settings -->
  <!--
  <Menu>
    <Name>Administration</Name>
    <Directory>SystemConfig.directory</Directory>
    <MergeFile>system-settings.menu</MergeFile>
  </Menu> -->     <!-- End System Settings -->

  <!-- Documentation -->
  <Menu>
    <Name>Documentation</Name>
    <Directory>Documentation.directory</Directory>
    <MergeFile>documentation.menu</MergeFile>
  </Menu>     <!-- End Documentation -->

  <Include>
    <Filename>logout.desktop</Filename>
  </Include>

</Menu> <!-- End Applications -->

Note that I commented out the inclusion of the system-settings.menu file. There are numerous things in that menu that regular users cannot run. It is best to hide those for obvious reasons. To be complete with this example, I also removed the “System Tools” menu under the “Applications” menu by commenting that out of the /etc/xdg/menus/applications.menu file as shown below:

<!DOCTYPE Menu PUBLIC "-//freedesktop//DTD Menu 1.0//EN"
 "http://www.freedesktop.org/standards/menu-spec/1.0/menu.dtd">

<Menu>

  <Name>Applications</Name>
  <Directory>X-GNOME-Menu-Applications.directory</Directory>

  <!-- Scan legacy dirs first, as later items take priority -->
  <LegacyDir>/usr/share/gnome/apps</LegacyDir>
  <LegacyDir>/etc/X11/applnk</LegacyDir>

  <!-- Read standard .directory and .desktop file locations -->
  <KDELegacyDirs/>
  <DefaultAppDirs/>
  <DefaultDirectoryDirs/>

  <!-- Add stock tarball installs to menus -->
  <AppDir>/usr/local/share/applications</AppDir>

  <!-- Accessories submenu -->
  <Menu>
    <Name>Accessories</Name>
    <Directory>Utility.directory</Directory>
    <Include>
      <And>
        <Category>Utility</Category>
        <Not>
          <Category>System</Category>
        </Not>
      </And>
    </Include>
  </Menu> <!-- End Accessories -->

  <!-- Development Tools -->
  <Menu>
    <Name>Development</Name>
    <Directory>Development.directory</Directory>
    <Include>
      <And>
        <Category>Development</Category>
      </And>
    </Include>
  </Menu> <!-- End Development Tools -->

  <!-- Education -->
  <Menu>
    <Name>Education</Name>
    <Directory>Education.directory</Directory>
    <Include>
      <And>
        <Category>Education</Category>
      </And>
    </Include>
  </Menu> <!-- End Education -->

  <!-- Games -->
  <Menu>
    <Name>Games</Name>
    <Directory>Game.directory</Directory>
    <Include>
      <And>
        <Category>Game</Category>
      </And>
    </Include>
  </Menu> <!-- End Games -->

  <!-- Graphics -->
  <Menu>
    <Name>Graphics</Name>
    <Directory>Graphics.directory</Directory>
    <Include>
      <And>
        <Category>Graphics</Category>
      </And>
    </Include>
  </Menu> <!-- End Graphics -->

  <!-- Internet -->
  <Menu>
    <Name>Internet</Name>
    <Directory>Network.directory</Directory>
    <Include>
      <And>
        <Category>Network</Category>
        <Not><Category>Settings</Category></Not>
      </And>
    </Include>
  </Menu>   <!-- End Internet -->

  <!-- Multimedia -->
  <Menu>
    <Name>Multimedia</Name>
    <Directory>AudioVideo.directory</Directory>
    <Include>
      <And>
        <Category>AudioVideo</Category>
        <Not><Category>Settings</Category></Not>
      </And>
    </Include>
  </Menu>   <!-- End Multimedia -->

  <!-- Office -->
  <Menu>
    <Name>Office</Name>
    <Directory>Office.directory</Directory>
    <Include>
      <And>
        <Category>Office</Category>
      </And>
    </Include>
  </Menu> <!-- End Office -->

  <!-- System Tools-->
  <!--
  <Menu>
    <Name>System Tools</Name>
    <Directory>System-Tools.directory</Directory>
    <Include>
      <And>
        <Category>System</Category>
        <Not><Category>Settings</Category></Not>
        <Not><Category>Screensaver</Category></Not>
      </And>
    </Include>
  </Menu> -->  <!-- End System Tools -->

  <!-- Other -->
  <Menu>
    <Name>Other</Name>
    <Directory>X-GNOME-Other.directory</Directory>
    <OnlyUnallocated/>
    <Include>
      <And>
        <Not><Category>Core</Category></Not>
        <Not><Category>Settings</Category></Not>
        <Not><Category>SystemSetup</Category></Not>
        <Not><Category>X-Red-Hat-ServerConfig</Category></Not>
        <Not><Category>Screensaver</Category></Not>
        <Not><Category>Documentation</Category></Not>
      </And>
    </Include>
  </Menu> <!-- End Other -->

  <MergeFile>applications-kmenuedit.menu</MergeFile>

  <!-- Read in overrides and child menus from applications.d -->
  <DefaultMergeDirs/>
</Menu> <!-- End Applications -->

There is more to the configuration of course, but this covers my solution to removing “Shut Down…” from the Gnome panel.

This situation does not seem to be well thought out in Gnome. If you just think of a workstation alone, I doubt you would often (if ever) want to remove the “Shut Down…” item, but they do allow you to remove “Log Out”. Why allow one to be removed and not the other? Some thought has been put into it though with respect to the Gnome PolicyKit default values for the items I mentioned previously. If you log in over XDMCP and click “Shut Down…”, it only gives you the option to “Hibernate” or “Cancel”. What’s funny is that it doesn’t hibernate. It only locks the screen. That’s obviously confusing. There really should be some way to remove “Shut Down…” using either GConf or by setting policy. I could not find a way. I could not find anyone else suggesting something that actually works with current versions of Gnome either. This is my solution. I hope this is addressed soon. It’s possible I missed something, but if I did, it was simply too hard to find – which would be a different problem :-)

The Elusive Command

Apr 4th, 2011 | Filed under General

I got one of the best root emails today:

omicron : Apr 4 15:44:02 : <username> : user NOT in sudoers ; TTY=pts/96 ; PWD=<home dir> ; USER=root ; COMMAND=getporn

Apparently users cannot be bothered with search engines anymore.

Fixing the Twitter Spaces Bug

Mar 27th, 2011 | Filed under Software

I installed the Mac App Store as soon as it was available. One of the first applications I installed was Twitter as a replacement for Tweetie. I was impressed at first. I actually liked the UI better. Then came the updates, and it all went downhill quickly. First, they changed the icon. I really don’t care too much about the icon changing, but the original black icon was better. I quickly noted that Twitter did not work with Spaces correctly. I had assigned it to Space 1. I didn’t want it to follow me around everywhere. One of the features of Twitter 2.0.1 (I believe) was “improved Spaces support”. I guess that means developers deciding that Twitter was to follow me to every Space. Honestly, this is something you don’t mess up. This is a HIG 101 issue. I actually uninstalled Twitter and put Tweetie back on.

I did some searching and found some AppleScript to fix the problem. I was solving this by hand with:

  1. Starting Twitter.
  2. Going into Exposé and Spaces and setting it to be on Space 3.
  3. Setting it to be back on Space 1.

That “fixed” the problem for various definitions of “fixed” that you will not find in a dictionary. A lot of people have complained about this. I can’t believe there has not been an update. I am pretty sure I saw one of the developers tweet that this would be fixed in the next update. News flash: we’ve been waiting for a little less than forever for this fix. This is something that warrants an immediate update IMO. My latest fix, because I have to use this version for some reason, is to slightly tweak the AppleScript solution I found to actually launch Twitter, delay for 1 second, and then apply the fix:

launch application "Twitter"
delay 1

tell application "System Events"
	set x to application bindings of spaces preferences of expose preferences
	set x to {|com.twitter.twitter-mac|:3} & x -- 3 is any space you don't want twitter on
	set application bindings of spaces preferences of expose preferences to x
	set x to {|com.twitter.twitter-mac|:1} & x -- 1 is the space you want twitter to be on
	set application bindings of spaces preferences of expose preferences to x
end tell

The delay is needed or the code that follows is executed before Twitter can hose itself up. Really, this is like a bad joke :-) I saved that as an application and reopened it in the AppleScript Editor so that I could pull the Twitter icon and replace the application bundle’s applet.icns file. I saved it as Twitter Fix in my Applications folder so that I can just launch that instead of Twitter. Note that I know very little about AppleScript in general. One last note: you need to set Twitter to use a specific Space first for the fix to work.

I purchased Alfred and use that to launch Twitter Fix among other things. Alfred is pretty awesome. We’ll see how it affects my real workflow. I like it a lot so far.

will_paginate When You Come Back

Mar 22nd, 2011 | Filed under Ruby, Ruby on Rails

I implemented params hash caching in a couple of my Rails 2.3.x projects. I use this to store the parameters that exist when entering the index action of my controllers for index views that have filters and pagination. I’ve done this so that users see the same view when returning to the index later, otherwise they would just get the initial default view. The params are stored in the user’s session, keyed on the specific controller, which is backed by the database.
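The last_params and save_params helpers themselves are not shown in this post. As a rough sketch of the idea only, assuming a session that behaves like a Hash, it might look something like the following. The ParamsCache class, its method bodies, and the list of cached keys are all hypothetical stand-ins, not the actual helpers.

```ruby
# Hypothetical sketch of per-controller params caching. The session is
# assumed to behave like a Hash; in Rails it would be the user's session.
class ParamsCache
  CACHED_KEYS = [:page, :filter].freeze # assumed set of keys worth caching

  def initialize(session, controller_name)
    @session = session
    @key = "#{controller_name}_params"
  end

  # Fill in any cached values the current request did not supply.
  def last_params(params)
    cached = @session[@key] || {}
    cached.each { |k, v| params[k] = v unless params.key?(k) }
    params
  end

  # Remember the current values for the next visit to the index.
  def save_params(params)
    @session[@key] = params.select { |k, _| CACHED_KEYS.include?(k) }
  end
end

session = {}
cache = ParamsCache.new(session, "people")
cache.save_params({ :page => "3", :filter => "smith" })
restored = cache.last_params({})
```

Keying the cache on the controller name keeps the filters for one index view from leaking into another.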

I’m using the excellent will_paginate gem for pagination. The following code represents two different index methods that use params caching and pagination:

def index
  last_params params
  @formulas = Formula.find_filtered_formulas(params)
  save_params params

  respond_to do |format|
    format.html # index.html.erb
    format.xml  { render :xml => @formulas }
  end
end

The find_filtered_formulas() class method on the Formula model uses the params hash to build its own conditions. I am sure that can be done better with named scopes, etc. These are my first serious Rails applications, and that’s what I came up with at the time. The second method does pagination directly:

def index
  last_params params
  @people = Person.paginate(:page => params[:page], :order => :surname)
  save_params params

  respond_to do |format|
    format.html # index.html.erb
    format.xml  { render :xml => @people }
  end
end

That all worked fine, but I ran into a subtle issue with caching the params[:page] parameter. What happens if the page no longer has data? The will_paginate gem will happily accept a page that would not otherwise exist and return an empty array of results. As long as there are still multiple pages, you will still get the pagination links and be able to click on some other page. That is, unless there is now only one page left! In that case, you get no pagination links and you are stuck on some nonexistent page unable to get off until your session data is destroyed. I happened to notice this when deleting the last item on page 2, which then reloaded the index view. This is obvious if you think about it of course, but I missed it initially. It would be nice if the will_paginate gem just loaded the last page in this case, but that’s not really its job.
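To make the stuck-page case concrete, here is a small simulation over a plain Array. The paginate and total_pages methods below are hypothetical stand-ins that only mimic the arithmetic; they are not will_paginate's implementation.

```ruby
# Simulate pagination over a plain Array to show how a cached page number
# can point past the end of the data.
def paginate(items, page, per_page)
  offset = (page - 1) * per_page
  items[offset, per_page] || []
end

def total_pages(items, per_page)
  (items.size / per_page.to_f).ceil
end

people = %w[Adams Baker Clark Davis Evans Ford] # 6 records
per_page = 5
cached_page = 2                                  # page stored in the session

before = paginate(people, cached_page, per_page) # the single record on page 2

people.delete("Ford")                            # delete the last item on page 2
after = paginate(people, cached_page, per_page)  # page 2 no longer has data
pages_left = total_pages(people, per_page)       # only one page remains
```

With one page left there are no pagination links to click, and the cached page number keeps pointing at the empty page.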

My first attempt to solve this involved writing code around the index action finds to try the find again with the previous page, but besides the fact that this needed serious refactoring and was ugly, it didn’t solve the case where multiple pages were now gone from some other deletions since the last time the index was viewed by the user.

I started to think about using a block to keep the code in the index() method clean. Blocks are closures, and I thought that it would be easy to store the results from the yield into a local variable in a method and keep the code that backtracks pages in one place. I came up with this at 4:20AM about two hours after I went to bed while unable to sleep due to thinking about my initial solution. I really wanted to get up and try it out, but I had a meeting at 11AM! I wasn’t able to work on it until this afternoon, but it worked just as expected. I have the following code in my application controller:

def previous_page_if_empty(params)
  results = []

  if block_given?
    begin
      results = yield
    rescue TypeError
      results = []
    end

    previous = 0
    max_previous = 5

    while (params[:page] && params[:page].to_i > 1 && results.empty?)
      if previous < max_previous
        params[:page] = (params[:page].to_i - 1).to_s
      else
        params[:page] = '1'
      end

      begin
        results = yield
      rescue TypeError
        results = []
      end

      previous += 1
    end
  end

  results
end

That tries to get results from yielding to the block passed to the method. If there are no results and we are not on page 1, it backtracks to the previous page and tries again. It will backtrack 5 pages, but then give up and go to page 1. Page 1 either has results or it does not, so there is nowhere further back to go. The choice to go back 5 pages is completely arbitrary. I could just go back one page, and it would function just fine eliminating the while loop. Perhaps I just like trying too hard. Returning the results in the method is unnecessary, and that’s not what makes it work, but I do so anyway. You might be wondering what those begin/rescue blocks are all about. In some complicated filtered results situations I build the results array manually like the following:

settings = WillPaginate::Collection.create(page, per_page,
                                           settings.size) do |pager|
  pager.replace settings[pager.offset, pager.per_page]
end

But that’s extremely rare. The page variable comes from params[:page]. The problem there is that this will raise a TypeError complaining about trying to treat nil as an Array if you use a page number that’s too large. This doesn’t happen with Active Record finds. I rescue from that so that the previous_page_if_empty() method can be used in both situations. I’ve seen some slightly different ways of building the paginated collection, so it is possible a different implementation would not result in a TypeError, but I have my doubts and would need to investigate more.
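The TypeError can be reproduced with a plain Array, which is used here as a stand-in for WillPaginate::Collection on the assumption that its replace behaves like Array#replace. The page_slice helper is hypothetical and exists only for this illustration.

```ruby
# Array#[] with (start, length) returns nil when start is past the end of
# the array, and Array#replace raises TypeError when handed nil.
def page_slice(items, page, per_page)
  offset = (page - 1) * per_page
  [].replace(items[offset, per_page])
end

items = %w[a b c d e f g] # 7 items
per_page = 5

page_two = page_slice(items, 2, per_page) # the last two items

error = begin
  page_slice(items, 3, per_page)          # items[10, 5] is nil
  nil
rescue TypeError => e
  e
end
```

One wrinkle worth knowing: when the offset is exactly equal to the array size, Array#[] returns an empty array rather than nil, so the TypeError only shows up once the offset is strictly past the end.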

With the previous_page_if_empty() method I can write the same finds above as:

def index
  last_params params

  previous_page_if_empty(params) do
    @formulas = Formula.find_filtered_formulas(params)
  end

  save_params params

  respond_to do |format|
    format.html # index.html.erb
    format.xml  { render :xml => @formulas }
  end
end
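The block form works because the block is a closure: every time the helper yields, the block runs again against the params hash the helper just modified and reassigns the variable that the view will use. A stripped-down illustration, with a hypothetical back_up_until_results helper and a Hash standing in for the database, might look like this:

```ruby
# Minimal stand-in for the pattern: the helper mutates params and yields
# again, and the block re-runs against the updated page each time.
def back_up_until_results(params)
  results = yield
  while params[:page].to_i > 1 && results.empty?
    params[:page] = (params[:page].to_i - 1).to_s
    results = yield
  end
  results
end

data = { "1" => [:a, :b], "2" => [] } # page 2 has been emptied
params = { :page => "2" }
found = nil

back_up_until_results(params) do
  found = data.fetch(params[:page], [])
end
```

After the call, found holds the page 1 data and params[:page] has been rewritten, which is why the helper's return value is not what makes the controller version work.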

I am sure there could be some changes to make this all better, more refactoring, etc. I thought this was interesting because of how blocks work in Ruby. I have to say that I really love Ruby. I’ve only been doing Rails for a little while. I’m reading the latest Pickaxe book now. I read the older free version online before Ruby 1.9.2 was released. I highly recommend all of it!

Update

As pointed out by the first comment (thanks François-Pierre Bouchard), it is possible to get the total number of pages if this was an ActiveRecord paginate() call directly instead of backtracking. I was backtracking because I had some cases where I was building the collection manually. That’s probably an unusual case, and it’s much better to just get the total_pages() value if you can, and certainly you can do that in almost all cases. If I am working with a block that tries to use WillPaginate::Collection.create() I can’t do that however, and I need this method to handle both cases. I had seen total_pages() in the API docs, but I probably didn’t use it because of manual collection creation (and I really needed to get it solved ASAP once I saw the bug I had). I modified the method to be the following to handle both cases:

# This method ensures that results are returned even if the page requested
# is beyond the total number of pages in the collection returned by the
# attempt to paginate using will_paginate. For ActiveRecord pagination
# attempts, if the result is empty but we are not asking for page one,
# we can simply use the total_pages() value for the collection returned
# and be certain we are on the last page. For manual collections built
# with WillPaginate::Collection.create() we will get a TypeError if we
# attempt to use a page beyond the total number of pages and we cannot
# directly see how many pages there would be. In that case, this ensures
# that we backtrack to a previous page if the current page stored in the
# session for the user no longer has results. This is needed because
# results on index views might have changed dramatically since the
# last time someone viewed them. Normally this is all right, though not
# optimal, because the pagination links will still exist. However, this is
# not the case if only one page of data is left due to the fact that
# pagination links are not displayed if there is just one page. In that
# case the user is stuck. This only attempts to go back a certain number
# of pages in the manual collection creation case before giving up and
# shooting for page 1. This is also better behavior if someone deletes
# the last item on a page, because on reload, they will automatically go
# to the previous page. In the ActiveRecord pagination case, we're always
# able to find the last page directly, so we go straight to that.
#
# Note the begin/rescue block around the yields. This is necessary if
# you build the paginated collection manually with:
#
# items = WillPaginate::Collection.create(page, per_page,
#                                         items.size) do |pager|
#   pager.replace items[pager.offset, pager.per_page]
# end
#
# In the code above, using a page value greater than what's possible actually
# generates a TypeError about trying to convert nil to an array as mentioned
# previously. This does not happen with normal ActiveRecord paginate() calls
# however. Rescuing from that allows this method to be used in the manual
# pagination build case as well.
#
# This should be used with any find method that relies on the params[:page]
# variable being saved and restored before searching. The pattern in this
# code is something like the following in the index action on a controller:
#
# def index
#   last_filter_params(params)
#
#   previous_page_if_empty(params) do
#     @items = Item.find_filtered_items(params)
#   end
#
#   save_filter_params(params)
#   <snip>
# end
#
# In the case above, the find_filtered_items(params) method uses the
# params[:page] value. You could also have done a direct paginate() find
# call there as well.
def previous_page_if_empty(params)
  results = []
  manual_collection = false

  if block_given?
    begin
      results = yield
    rescue TypeError
      results = []
      manual_collection = true
    end

    # If this was an attempt to yield to a block that creates a collection
    # using WillPaginate::Collection.create(), a TypeError would have been
    # raised if we tried to specify a page beyond the total number of pages
    # and the results variable will not contain a collection. Since we
    # don't know the number of total pages here, we can only backtrack to
    # try and figure it out.
    if manual_collection
      previous = 0
      max_previous = 5

      while params[:page] && params[:page].to_i > 1 && results.empty?
        if previous < max_previous
          params[:page] = (params[:page].to_i - 1).to_s
        else
          params[:page] = '1'
        end

        begin
          results = yield
        rescue TypeError
          results = []
        end

        previous += 1
      end
    else
      # This is the result of a direct ActiveRecord pagination call. It will
      # be empty but have the total pages available. We can just use that
      # directly.
      if params[:page] && params[:page].to_i > 1 && results.empty?
        # Never go below page 1, in case total_pages is 0 for an empty set.
        params[:page] = [results.total_pages, 1].max.to_s
        results = yield
      end
    end
  end

  results
end
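The arithmetic behind jumping straight to the last page is small enough to show on its own. This is only a sketch with hypothetical helper names (last_page and clamp_page), assuming the total record count and the page size are both known:

```ruby
# Hypothetical helpers: compute the last valid page from a record count,
# never returning less than 1 so an empty table still maps to page 1.
def last_page(total_entries, per_page)
  [(total_entries / per_page.to_f).ceil, 1].max
end

# Pull a requested page back into the range 1..last_page.
def clamp_page(requested, total_entries, per_page)
  page = requested.to_i
  page = 1 if page < 1
  [page, last_page(total_entries, per_page)].min
end

clamp_page("7", 42, 10) # => 5
clamp_page("2", 0, 10)  # => 1
clamp_page("3", 42, 10) # => 3
```

When the count is available this needs no retries at all, which is why backtracking is only worth keeping for the manually built collections.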

RADUM 0.0.3

Jan 9th, 2011 | Filed under Ruby, Software

I updated my RADUM gem to 0.0.3. This version uses the new net-ldap 0.1.1 gem. I tested with Ruby 1.8.7, JRuby 1.5.6, and Ruby 1.9.2. I’ve wanted to get this working with Ruby 1.9 for quite a while, but the previous ruby-net-ldap gem did not work with Ruby 1.9. The new net-ldap gem supposedly works with Ruby 1.9, but I had to add the String#to_a method back through a monkey patch to get it working. I should probably file a bug report (or see if there already is one). I’m not sure if this is something new in Ruby 1.9.2 though, so I can check that as well. In any case, it works with Ruby 1.9.2 now with this small hack. I made sure to note this on the main RDoc page.
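For the curious, a patch of that kind can be very small. The version below is only a sketch and not necessarily what RADUM ships: it assumes the Ruby 1.8 behavior of splitting the string into lines, and it only defines the method when it is missing.

```ruby
# Sketch of restoring String#to_a on Ruby versions that lack it. The
# line-splitting behavior is an assumption modeled on Ruby 1.8.
class String
  unless method_defined?(:to_a)
    def to_a
      each_line.to_a
    end
  end
end

single = "cn=Users".to_a   # a one-element array
multi  = "one\ntwo\n".to_a # one element per line, newlines kept
```

Guarding with method_defined? keeps the patch from overriding the built-in method on Ruby 1.8.7, where String#to_a still exists.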

Speaking of RDoc, I don’t know what’s up with the default RDoc version in Ruby 1.9.2 (installed through RVM), but my RDoc generation resulted in the following error:

ERROR:  While generating documentation for radum-0.0.3
... MESSAGE:   Error while evaluating /Users/rowland/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/rdoc/generator/template/darkfish/classpage.rhtml: undefined method `accept' for nil:NilClass (at "\n\n\t\t\t\t<div class=\"method-description\">\n\t\t\t\t\t\n\t\t\t\t\t")
... RDOC args: lib LICENSE.rdoc
(continuing with the rest of the installation)

That sure looks like an RDoc bug to me. Updating RDoc to the latest version fixed the problem. I have no idea why this is a problem with my code, but searching online resulted in no real answer.

The Second Great NIS Migration

Dec 27th, 2010 | Filed under UNIX System Administration

Yeah, we still use NIS. I know, but it’s so easy! :-) I’ll move on to Kerberos for user authentication and NIS for user information (UIDs, GIDs, sans encrypted passwords of any sort, etc.) soon enough just about when I get rid of Solaris in our environment at work. It should have happened a while ago. I’ve been busy unfortunately. That’s why I haven’t posted here much.

I was here around 10 years ago for our first NIS migration off of aging HP-UX 9.05 servers to Solaris 2.6 before Sun went all “7” on us. Heh. We still used HP-UX 9.05 then. Crazy, right? We were the CIS department then, and I believe our NIS domain was cis.osu.new. I am not 100% sure about that, but I believe I changed it to cis.osu. Now we are the CSE department that embraces the word “Engineering”, so I went with just plain old cse for the NIS domain. I configured our NIS master and slave servers before our break outage today, but I kept having a problem where the master would not transfer the ypservers map to the slave. Of course the error was confusing because it said it was updating ypservers, so it didn’t come to me right off the bat that it was talking about a map. Sure enough, when setting up the NIS slave server with:

/usr/lib64/yp/ypinit -s <master>

it did not transfer the ypservers map. The /usr/lib64/yp/ypinit command gets the NIS maps from the following command:

[root@<slave> ~]# /usr/lib64/yp/yphelper --maps <master>
protocols.byname
hosts.byname
networks.byname
services.byname
auto.n
rpc.bynumber
networks.byaddr
passwd.byname
auto.master
passwd.byuid
rpc.byname
printinfo.byname
netid.byname
group.byname
netgroup.byuser
group.bygid
netgroup
netgroup.byhost
hosts.byaddr
ypservers
services.byservicename
protocols.bynumber
auto.home

I don’t know what the problem was. Naturally, ypservers shows up in that list now. I did not run the command at the time. I simply proceeded to solve the problem by doing the transfer manually with:

/usr/lib64/yp/ypxfr -f -h <master> -c -d <domain> ypservers

That should have been executed, but apparently it was not. I don’t know why, but this solved it. It’s interesting because I found some posts online about this sort of thing, but I didn’t see any answers that helped (if there were any really). Most answers were for obvious things that aren’t really related, but honestly – I did not look that hard. The script is right there, and one might as well just look at what it is doing. So, if you have this problem setting up your NIS slave, a manual transfer might help out. There are other ways to accomplish the same thing of course.

OK, I know what you’re thinking… what in the heck is that printinfo.byname map? Yeah, I am pretty sure we just made that one up. Let’s just say it contains magic “proprietary” printer information for a BSD-style NIS integrated printing system I created that includes my own print quota daemon. I was going to just change that to printinfo, but I didn’t feel like tacking on another task list item for fixing the print filter scripts.

Oh, I almost forgot one thing. I need to mount / with the “mand” option to turn on mandatory locking because we have a fully automated account management system that really wants to lock the passwd and group files. I also ran updates that installed a new kernel. After rebooting I waited forever, and then I finally went into the machine room and hooked up a monitor. Sure enough, the kernel install captured that mount option and the booting process had no idea how to deal with it. I still have to look into this, but I solved the problem by booting into the previous kernel, removing the option, reinstalling the latest kernel for Red Hat Server 5, adding the option back, and then rebooting. That seems pretty strange to me. How can having a supported mount option cause this kind of problem? In any case, I am sure there’s a reason, but now I have to remember to remove that option whenever there is a kernel update.

Celerra Open-File Cache Bug

Dec 9th, 2010 | Filed under Hardware, UNIX System Administration

It seems the NFS problem we were having is due to a bug in Celerra NAS codes 5.6.36 to 5.6.43 (fixed in 5.6.44). We upgraded to NAS code 5.6.40 before our EMC support ended. I found something in the EMC Knowledgebase about customers having performance issues under heavy CIFS load due to the CIFS trickle sync feature that was added, which led to an issue with insufficient open-file cache resources because that feature allows CIFS to use the open-file cache. The open-file cache is used for NFS too, but instead of causing a performance issue when we run out of open-file cache resources, the NFS service simply starts returning NFS3ERR_IO instead! The suggested workaround, aside from upgrading, is to set the following parameter:

param cifs ofCache=0

We put that in our /nas/site/slot_param file to make it global to both Data Movers instead of simply putting it in the /nas/server/slot_N/param file (where N is your slot number of course). The suggested workaround also included setting this on the command line like so:

.server_config server_2 -v "param cifs ofCache=0"

This parameter requires a reboot however, as indicated by server_param:

[nasadmin@nas-dl-cs ~]$ server_param server_2 -facility cifs -info ofCache
server_2 :
name                    = ofCache
facility_name           = cifs
default_value           = 1
current_value           = 0
configured_value        = 0
user_action             = reboot DataMover
change_effective        = reboot DataMover
range                   = (0,4294967295)
description             = NA

Setting it on the command line is useless because a reboot is required anyway, so I am not sure why that was even suggested. They also indicated this in the fix as well. Note that this ofCache param won’t “appear” until it is actually set, and it’s not in the documentation. Magical, right? I saw a reference online about this being a setting for NFS too, but I think that was for NAS code 6.x. There wasn’t enough detail.

We rebooted our Data Movers to set this value from the site slot_param file. Hopefully this will solve the problem, and it does sound like this bug is the cause of our problem. This information was hard to find. Sometimes I am amazed at the process that leads to finding stuff like this. I mean, have you ever tried searching in EMC Powerlink? :-)

EMC NFS Round 2

Dec 2nd, 2010 | Filed under Hardware, UNIX System Administration

We had the same problem today as yesterday, though at a much smaller scale. The following error messages seem to happen when NFS goes out to lunch:

Error	12/2/10 13:32	CFS	No free entries in open file cache
Error	12/2/10 13:32	CFS	last message repeated 101 times

I knew the word “cache” was in there somewhere. This combined with the recoverable single bit errors seems to be the problem so far. We failed over to our standby Data Mover and power-cycled the faulted primary. For now we’re just running on the old standby. It has not shown any of these errors yet. I imagine we’ll have to try failing back to see if power-cycling fixed it. I have a feeling the primary Data Mover’s hardware is getting flakey.

Fun times.