Legato NetWorker Internal Documentation

This document was last modified on March 9th, 2009.

Table of Contents
Tech Support and software registration
Software availability
Hardware configuration
Shipping LTO-3 tapes off-site for mainframe VTL backups
How to set up a tagged VLAN on Bootz
How to set up a tagged VLAN on Puss
How to start the NetWorker Console Manager
Daily bootstrap information and how to recover corrupt index files
Reboot process for Bootz
Reboot process for Puss
Policy and cost to add new clients
Cleaning Tape Drives
Mirapoint data backups
Mirapoint data recovery
How to tell if a tape needs to be loaded for a recovery
Problems that can interfere with backups/restores
How to run a test backup or back up a particular client
How to load writable tapes into the Qualstar library
How to rebuild Qualstar library configuration (due to persistent binding issues)
Periodic Qualstar tape library cleaning
Mounting the X4500 disk device
Sending email notifications about backups
How to set up an NSR client to back up to Bootz via its IP address
NetWorker 7.4 software and documentation

Technical Support and software registration:

We have separate annual software technical support and software update agreements with EMC for Puss and Bootz.

The phone number for NetWorker tech support is 1-877-LEGATO7. Email is softwaresupport@emc.com. The web site is http://powerlink.emc.com.

Bootz's NetWorker enabler code for our NetWorker server is eb746d-a40f64-e1b758 and its host id is 84829a10.
Puss' NetWorker enabler code is ef6871-b80b36-ac7ab5 and its host id is 600a5010.

Software support requests used to be submitted by phone or email. A web page at http://powerlink.emc.com/ is now available for both software support and license registration. The site requires a log-in ID and password, which you can set up by following the directions on the site. Technical documentation, reports, and a knowledgebase are available at softwaresupport.emc.com/. The knowledgebase can be very helpful in troubleshooting some problems, so check it before opening a tech support request.

To register a temporary or permanent software license, open the NetWorker Console Manager and then open the GUI for the NetWorker server (Bootz or Puss) where you want to enter the license. From there, double-click on the "configuration" button, then select "registrations" from the menu in the lower left frame. Right-click on "registrations" and select "new" from the pop-up menu. You will then see a field labeled "Enabler code". The relevant enabler code arrives in a PDF file that is emailed to you by either EMC or the reseller who sold the license. Go to http://powerlink.emc.com, pull down the "licenses" page, and register the license. Then, in the NetWorker server's GUI, type the code into the field and click on the "Apply" button. After you do this, an expiration date will appear, along with a field called "Authorization Code".

We also have a hardware tech support agreement for the Sony PetaSite CSM-200BF tape library in the Bell Building. The hardware support for the Qualstar library is handled by CNS. If there is a problem with the Sony PetaSite, call Sony at 877-398-7669 for tech support. The serial number is 01009202. The license key (for the Sony PetaSite software) is 1-66972-8020971030. The service agreement number is MA3789. Our maintenance agreement ends each March, so in late January, contact William Chang at Cambridge Computer Services to renew it. Norm Fite is our Sony PetaSite site engineer. Norm's cell phone number is 717-542-1396 and his office phone is 717-359-9991. Norm's email address is Norman.Fite@am.sony.com. Do not be afraid to contact Norm if you run into any problems with the Sony PetaSite. He is very helpful.

Hardware support for the Qualstar 88132 tape library is provided by Qualstar. Its serial number is 2705065. The Qualstar technical manual is available online. There's also a web GUI available on the Qualstar at qualstar.ocis.temple.edu/paccess.cgi. See Stan for the log-in information.

If you need to replace a tape drive in the PetaSite, call Sony to request a replacement drive. They will want the serial number of the faulty tape drive. To get the serial number, open the PetaSite GUI, go into the "help" pull-down menu, and select the "version" item. This presents a window with several tabs. Click on the "drive" tab and enlarge the window by dragging its lower right corner outward. You will then see the serial number of each tape drive and the firmware version it is running.

Software availability:

Software and documentation are available on CD media in Stan's office. They can also be found at http://powerlink.emc.com/ under the "support" tab. Printed documentation for the Qualstar library is available in the tape cabinet next to the library. A few of the commonly used software files are also available on this web site for easier access.


Hardware configuration:

Our NetWorker data zone consists of one NetWorker 7.4 Server and two NetWorker 7.4 Storage Nodes. The NetWorker Server runs on a Sun Fire T2000 with Solaris 10. The first Storage Node runs on a Sun Fire X4500 with Solaris 10. The T2000 and X4500 are both located in the Bell Building and they share the Sony PetaSite tape library. Both are connected to the library by orange fibre channel cables through a pair of Qlogic 5200 SAN switches. Four of the PetaSite's 14 tape drives are shared by the T2000 and X4500 (this is known as dynamic drive sharing). Five tape drives are dedicated exclusively to the T2000 and another five are dedicated exclusively to the X4500. Each of the four shared tape drives can be used by either the T2000 or the X4500 (but only one at a time).

The PetaSite currently has 936 tape slots. Slots 1-925 are reserved for data cartridges. The remaining eleven tape slots are reserved for cleaning cartridges. Each new cleaning cartridge is good for 50 tape cleanings and the PetaSite's S-AIT1 drives need cleaning every 50 hours or so.

Note that the hostname for the T2000 is bootz.ocis.temple.edu and for the X4500, it is x4500.ocis.temple.edu

The following table shows how the PetaSite's tape drives are laid out ...

 

device file      server
/dev/rmt/0cbn    T2000
/dev/rmt/1cbn    T2000
/dev/rmt/2cbn    T2000
/dev/rmt/3cbn    T2000
/dev/rmt/4cbn    T2000
/dev/rmt/5cbn    T2000
/dev/rmt/6cbn    X4500
/dev/rmt/7cbn    X4500
/dev/rmt/8cbn    X4500
/dev/rmt/9cbn    X4500
/dev/rmt/10cbn   T2000 and X4500
/dev/rmt/11cbn   T2000 and X4500
/dev/rmt/12cbn   T2000 and X4500
/dev/rmt/13cbn   T2000 and X4500

The Sony PetaSite is connected to the X4500 and Bootz via a pair of Qlogic SAN Switch 5200 boxes. These are located in the same cabinet as Bootz and the X4500. They are the ones with all the orange cables going into them. These two switches were set up and configured by Cambridge Computer, not Temple. The serial number for the top switch is 0428A00147. The serial number for the bottom switch is 0434A00174. Both switches are model SB5200-16A. The hardware support for these two switches is provided directly by Qlogic. I chose not to have them covered by CNS because they are so highly specialized and CNS' people tend to be generalists. For emergency service, call Qlogic at 952-932-4040. Our service contract number is 17975, and the contract expires on the 4th of each August. For non-emergency inquiries involving these switches, send email to support@qlogic.com or visit http://www.qlogic.com/. The most recent tech support agreement is available as a Word document ("Qlogic agreement"). The hardware maintenance agreement should be renewed at the end of May through William Chang at Cambridge Computer Services.

The other NetWorker Server and tape library are located in the Wachman 8th floor Data Center. This NetWorker Server is puss.ocis.temple.edu, and it runs on a Dell 2950 with Red Hat Linux 4. It is connected to the Qualstar 88132 tape library, which has four LTO-3 tape drives. The tape drives and robot are connected to Puss via five fibre channel cables. The first 11 tape slots (on the left side of the library as you face its door) are allocated as a tape transport port area for depositing tapes into and withdrawing tapes from the library. Tapes that are meant to be deposited into the library should not be left in those 11 ports, because the operators will remove them and ship them off-site. The remaining cartridge slots are numbered by NSR as slots 1-121. Slot #1 is reserved for a tape cleaning cartridge, which means that slots 2-121 are reserved for data cartridges. More will be said about the tape cleaning cartridge in another section.

Shipping LTO-3 tapes off-site for mainframe VTL backups

Every day, except Mondays, the operators in the Wachman Data Center are required to ship off any tapes that are found in the Qualstar tape library's i/o port area (the 11 slots below the cleaning cartridge). Any tapes that are in that area when the operator checks the library at around 8:30am will be sent off-site to NOVA. These tapes will be returned one month later, upon which the operator will place them in the tape cabinet. It is our job to load those tapes back into the tape library for future use.

To identify which tapes need to be sent off-site and to transfer them into the i/o port bin, I wrote a perl script. The script is on Puss at /usr/local/nsr_scripts/vt1_tape_unload.pl and it is scheduled via root's crontab. It looks at all the tape cartridges in the Qualstar. Any cartridge that contains data, is a member of the ESGdata pool, and is not marked as recyclable is unloaded from its slot and moved to a slot in the i/o bin underneath the cleaning cartridge. If a tape is actively being written, it will not be moved to the i/o port.

How to set up a client to back up to Bootz via a tagged VLAN address

In order to reduce backup traffic between some subnets and Bootz, Network Services has set up a series of tagged VLAN connections between those subnets and Bootz. Accessing Bootz from any of those subnets for backups, restores, and also to log into the NetWorker Console Manager must be done via the appropriate tagged VLAN address. The following table lists the tagged VLAN addresses and the range of IP addresses to which they apply. This table resides on Bootz in the /etc/hosts file. Also note that any client that has to be backed up via a tagged VLAN must be backed up to Bootz, not the X4500.

IP               short host name    long host name                     IP address range
155.247.168.112  bootz168           bootz168.ocis.temple.edu           155.247.168.1 - 155.247.168.255
155.247.80.229   bootz80            bootz80.adminsvc.temple.edu        155.247.80.129 - 155.247.80.255
155.247.27.141   bootz27            bootz27.ocis.temple.edu            129.32.1.1 - 129.32.1.255
129.32.1.42      bootz129032001     bootz129032001.ocis.temple.edu     129.32.2.129 - 129.32.2.130
129.32.2.148     bootz129032002     bootz129032002.cla.temple.edu      129.32.2.129 - 129.32.2.25
155.247.225.195  bootz225           bootz225.ocis.temple.edu           155.247.225.193 - 155.247.225.223
129.32.84.148    bootz129032084128  bootz129032084128.ocis.temple.edu  129.32.84.129 - 129.32.84.255
10.96.100.31     bootz010096100     bootz010096100.ocis.temple.edu     10.96.100.0 - 10.96.103.254
10.16.192.20     bootz010016192001  bootz010016192001.ocis.temple.edu  10.16.192.0 - 10.16.192.254
155.247.80.7     bootz080001        bootz080001.adminsvc.temple.edu    155.247.80.0 - 155.247.80.14

On the client in question, edit its /nsr/res/servers file to include the appropriate short and long host names for the tagged VLAN it needs to be backed up to. Windows users also need to point the shortcuts for the NetWorker admin and recover utilities to the same address with the "-s" option. Unix/Linux users will need to reference that tagged VLAN with the "-s" option on the command line recover utility whenever they need to recover data.
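The edit to the client's servers file can be scripted. The following is only a sketch under stated assumptions: the function name add_nsr_server is made up for illustration (it is not a NetWorker command), and it assumes the file holds one host name per line. It appends each name only if that exact line is not already present, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper: add_nsr_server is not part of NetWorker.
# Usage: add_nsr_server FILE NAME [NAME ...]
# Appends each host name to FILE unless an identical line already exists.
# Assumes FILE lists one host name per line.
add_nsr_server() {
    servers_file=$1
    shift
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        if ! grep -qxF "$name" "$servers_file" 2>/dev/null; then
            printf '%s\n' "$name" >> "$servers_file"
        fi
    done
}

# Example (as root on the client, with names taken from the table above):
# add_nsr_server /nsr/res/servers bootz168 bootz168.ocis.temple.edu
```

Check the file by hand afterwards; the sketch does not validate the host names it is given.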

On Bootz, someone from ESG needs to open up the NetWorker Console Manager, go into the configure section, then select "clients" then open the resource for the appropriate client, and in the storage node field, make sure you replace "nsrserverhost" with the appropriate long host name for the tagged VLAN you are using for that client. Then put in the same tagged VLAN name in the storage nodes field and click "ok" to save these two changes.

How to set up a client to back up to Puss via a tagged VLAN address

To use tagged traffic exclusively, create additional ifcfg-ethX.Y files, where X is the interface on which you will use the VLAN and Y is the VLAN ID. In Puss' case, eth1 was required to talk to VLAN IDs 166 (network 155.247.166.0) and 1695 (network 10.96.52.0), so the following set of files was created:

/etc/sysconfig/network-scripts/ifcfg-eth1
/etc/sysconfig/network-scripts/ifcfg-eth1.166
/etc/sysconfig/network-scripts/ifcfg-eth1.1695

These files configure Puss to have two virtual ethernet interfaces called eth1.166 and eth1.1695 that use tagged frames for communication with VLANs 166 and 1695.

Edit the "DEVICE=" line in the ifcfg-eth1.166 and ifcfg-eth1.1695 files so they read eth1.166 and eth1.1695. Add the line "VLAN=yes" to both files. Finish configuring these virtual adaptors with the correct IP address and subnet mask for each VLAN.

Default Gateway Note:

Add the default gateway in /etc/sysconfig/network. It is important to remember that you can only have one default gateway for the entire machine, which is eth0's gateway. There is no need to configure multiple gateways because the kernel already knows which interface to use to reach each network it is configured on. This is why only a single gateway should be configured, and it should be defined in /etc/sysconfig/network. When multiple gateways are defined in the various ifcfg-eth0, ifcfg-eth1, ifcfg-eth1.166, and ifcfg-eth1.1695 files, you can run into routing problems.

# more /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=puss.ocis.temple.edu
GATEWAY=10.96.16.1

Interfaces Configuration Files:

Here are the completed files for a network set up to transmit only tagged frames for the 166 and 1695 VLANs:

FILE: /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
HWADDR=00:19:B9:CD:B3:12
ONBOOT=yes
TYPE=ethernet
NETMASK=255.255.252.0
IPADDR=10.96.16.80

FILE: /etc/sysconfig/network-scripts/ifcfg-eth1.166

DEVICE=eth1.166
TYPE=ethernet
BOOTPROTO=none
ONBOOT=yes
#HWADDR=00:19:B9:CD:B3:14
IPADDR=155.247.166.252
NETMASK=255.255.254.0
NETWORK=155.247.166.0
VLAN=yes

FILE: /etc/sysconfig/network-scripts/ifcfg-eth1.1695

DEVICE=eth1.1695
TYPE=ethernet
BOOTPROTO=none
ONBOOT=yes
#HWADDR=00:19:B9:CD:B3:14
IPADDR=10.96.52.61
NETMASK=255.255.252.0
NETWORK=10.96.52.0
VLAN=yes

In order to activate this new network configuration, restart the network service by typing ...

#service network restart

To verify that everything is working, run the following command:

#ls /proc/net/vlan

You should see the necessary VLAN entries for each VLAN tag.
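The same check can be scripted so that a missing VLAN interface is flagged explicitly. This is a minimal sketch: check_vlans is a made-up name, and it assumes only that each active VLAN interface appears as an entry named after the interface (for example eth1.166) in the directory you pass in.

```shell
#!/bin/sh
# Hypothetical helper: check_vlans is not a standard utility.
# Usage: check_vlans DIR IFNAME [IFNAME ...]
# Prints "ok: NAME" or "missing: NAME" for each interface and returns
# non-zero if any expected entry is absent from DIR.
check_vlans() {
    vlan_dir=$1
    shift
    missing=0
    for ifname in "$@"; do
        if [ -e "$vlan_dir/$ifname" ]; then
            echo "ok: $ifname"
        else
            echo "missing: $ifname"
            missing=1
        fi
    done
    return "$missing"
}

# Example on Puss:
# check_vlans /proc/net/vlan eth1.166 eth1.1695
```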

How to start the NetWorker Console Manager

To start the NetWorker Console Manager (NWCM), first download the Java applet at http://nsrconsole.temple.edu:9000; it will open right away. You will be prompted for a userid and password and asked whether you want to trust the certificate. Select the option in the window to trust the certificate. See Stan for the userid and password.


Daily bootstrap information and how to recover corrupt index files:

Every day, including weekends and holidays, the NetWorker server generates a bootstrap report after doing a full backup of the server's index files. This is true of both Puss and Bootz. Note that the tapes containing Bootz's bootstrap and index file backups are kept in the PetaSite and are not sent off-site. Puss' bootstrap and index backup tapes are kept in the Qualstar and are sent off-site every day except Monday. The most recent bootstrap report from both Puss and Bootz must be retained. These reports are automatically emailed by NSR to a Listserv list called NSR-ACTIVITY, which is on listmail.temple.edu. This list is managed by Stan. Each day's bootstrap reports and all NSR notifications can be viewed on the web at http://listserv.temple.edu/archives/nsr-activity.html by anyone who is subscribed to the list.

As an additional precaution, the daily bootstrap reports are sent from Bootz and Puss via email to an address on gmail.com. The address is stan.horwitz@gmail.com. This way, if we lose the data center, we still have the bootstraps. Bootstraps are essential for recovering a NetWorker server, so they must be protected and kept off-site.

The reason the bootstraps are important is that the most recent report is used via the "mmrecov" utility to recover all the configuration files for NSR if the /nsr disk fails or something like that on Puss or Bootz. See the man page for mmrecov for details.

Note that if you lose NSR's configuration files, the mmrecov command is what you use to fix the problem. We should strive never to need to do this.

Once in a great while, an index (the database records) for a particular NSR client's backups will become corrupt. A symptom of this problem is that you will not be able to see six months worth of backups when you use the "versions" command within the "recover" command. Note that the Mirapoint message stores only have their backups saved for one week. If you can't see versions of a file that you know have been around for a while, you can use the "nsrck -L7" command to fix the problem. This may take several hours to run though. See the nsrck man page for details. You would run nsrck on Bootz, not the NSR client.

Reboot process for Bootz

Unless it's an emergency, do not reboot Bootz while backups are in progress, especially while "savegrp -O" is running. This is the process that backs up Bootz's NSR data, so it's important to allow it to finish if at all possible. Go into the console manager and open up a window to "Bootz new (T2000)" and be certain you can log in that way and get root access. If you can't, then do not shut down or reboot Bootz because you won't be able to start it up properly without console access.

On Solaris 10, to see what savegroups are running, type ...

ps -eo pid,args | grep savegrp

and look for a process that says "bootz-index-backup" and try to allow it to finish. If you must interrupt it, kill its pid. This particular savegrp cannot be stopped via the NetWorker GUI because it isn't scheduled to start from that GUI.
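If you do this often, the lookup can be wrapped in two small filters. This is a sketch, not an existing script: filter_savegrp and group_pids are made-up names, and both read "pid args" lines on standard input, such as the output of "ps -eo pid,args". The exact text of the savegrp command line is not assumed beyond it containing the group name.

```shell
#!/bin/sh
# filter_savegrp and group_pids are hypothetical helper names.

# Keep only the savegrp lines. The bracket in [s]avegrp stops this grep
# from matching its own entry in the process list.
filter_savegrp() {
    grep '[s]avegrp'
}

# Print the pid (first column) of each savegrp line that mentions the
# group name given as $1, for example bootz-index-backup.
group_pids() {
    filter_savegrp | grep -F -e "$1" | awk '{ print $1 }'
}

# Examples:
# ps -eo pid,args | filter_savegrp
# ps -eo pid,args | group_pids bootz-index-backup
```

Printing the pids instead of killing them keeps the decision to interrupt the index backup with the person at the keyboard.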

The steps to follow to reboot Bootz and shut down NSR cleanly are as follows:

You can use nsrwatch and nsradmin in place of the Java-based NetWorker Management Console (NWMC) GUI, but it's a pain in the neck.

If you intend to do any hardware or OS configuration changes, it is prudent to make a copy of the files in the /nsr/res directory first and store them on a different server for safe keeping. Do this after the NSR software has been shut down. I just tar the directory to my home account and then sftp it to my workstation. It's not a large directory so this will only take a minute or two.
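The tar step can be sketched as a small function that writes a dated archive. The name backup_nsr_res and the archive naming scheme are assumptions for illustration only. Run it after NSR has been shut down, then copy the resulting file to another machine with sftp as described above.

```shell
#!/bin/sh
# Hypothetical helper: backup_nsr_res is not part of NetWorker.
# Usage: backup_nsr_res RES_DIR DEST_DIR
# Archives RES_DIR (normally /nsr/res) into a dated tar file under
# DEST_DIR and prints the path of the archive on success.
backup_nsr_res() {
    res_dir=$1
    dest_dir=$2
    archive="$dest_dir/nsr-res-$(date +%Y%m%d-%H%M%S).tar"
    # -C makes the paths stored in the archive relative to the parent
    # directory, so the archive unpacks as res/... wherever it is extracted.
    tar -cf "$archive" -C "$(dirname "$res_dir")" "$(basename "$res_dir")" &&
        echo "$archive"
}

# Example:
# backup_nsr_res /nsr/res "$HOME"
```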

There are two methods to shut down NSR gracefully. The easiest method is to open up the NetWorker Console Manager and log in with an account that has administrator rights. Then go into the monitor window, look for all the groups that show a status of running, and right-click on each group and select the stop item from the pop up menu.

Alternatively, you can do the following steps:

1) Log onto Bootz and gain root access.

2) Type "ps -lef | grep savegrp"

3) Type "nsradmin" to start up the NetWorker curses GUI

4) Type "vi" to get to the menu.

5) Use the arrow or tab key to move the cursor to "select" and press enter.

6) Move the cursor to the options and press enter, then press the enter key to select "hidden" and press the escape key.

7) Scroll through the list of groups, and for each group you want to stop, open it up by pressing the enter key.

8) Look at the fifth or sixth line from the top where it says "force stop" and select "true" then press Escape and type "yes." Repeat for each group you want to stop.

If the NWCM or nsradmin is not responsive (i.e., too slow), then you can kill each group process that you see from the output of "ps -lef | grep savegrp"

After you kill the groups, NWMC and nsradmin should become responsive again. Start either nsrwatch or the NWMC and look at the "Devices" section. Make sure that none of the tape drives shows any writing activity. There might be some reading activity going on for a recover, but odds are, it's Nelson's test recovery script. It would be nice to wait for any reading to finish, but if you're rushed, don't wait. You should wait for any tape loads/unloads to complete, though, and you can see what's happening by looking at the messages and devices panels in the nwadmin GUI.

When reading/writing activity looks like it has mostly stopped, unload the tapes from the tape drives by going to the Unix prompt on bootz and typing nsrjb -j PetaSite -u

This process will take a few minutes. When it's done, you can shut down NSR.

To shut down NSR, type "nsr_shutdown" and type "Yes" at the prompt. Then wait for all the processes to shut down. If nsr_shutdown hangs, try /usr/local/src/nsr-scripts/nsr_shutdown2

Just to be prudent, make sure that all NSR processes have ended by typing something like "ps -lef | grep nsr". If you see any NSR processes still running, wait a few minutes and check again. There might be some slow housekeeping processes that haven't ended yet, although this rarely happens.

After the nsr_shutdown script completes, there is one more step to take.

Delete everything in /nsr/tmp except the sec directory. Do not delete the sec directory. To do this, type rm /nsr/tmp/* and note that since the recursive option (-r) is not included on the rm command, it won't delete any directories. If there are any directories other than sec in /nsr/tmp, delete those individually with "rm -rf /nsr/tmp/dir_name".
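The cleanup can also be done in one pass with a function that skips sec by name. This is a sketch only: clean_nsr_tmp is a made-up name, and like "rm /nsr/tmp/*" it ignores hidden entries whose names start with a dot. Run it only after NSR has been shut down.

```shell
#!/bin/sh
# Hypothetical helper: clean_nsr_tmp is not part of NetWorker.
# Usage: clean_nsr_tmp DIR
# Removes every non-hidden entry directly under DIR (normally /nsr/tmp)
# except the one named "sec", which is left untouched.
clean_nsr_tmp() {
    tmp_dir=$1
    if [ ! -d "$tmp_dir" ]; then
        echo "not a directory: $tmp_dir" >&2
        return 1
    fi
    for entry in "$tmp_dir"/*; do
        # Skip the literal pattern left behind when the directory is empty
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            continue
        fi
        if [ "$(basename "$entry")" = "sec" ]; then
            continue
        fi
        rm -rf "$entry"
    done
    return 0
}

# Example:
# clean_nsr_tmp /nsr/tmp
```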

Now, you can either reboot bootz or shut it down as needed. If you reboot, be sure to use the console connection, not ssh because ssh will shut down and not restart upon reboot.

NSR will not start on its own after Bootz is rebooted, and neither will the sshd daemon or the NetWorker Console Manager. These must be started manually for reasons I haven't figured out yet.

First, make sure all the TCP/IP settings that NetWorker needs are established by typing "/etc/init.d/TEMPLE_tcp_settings"

Next, start sshd so people can log in remotely. Do this by typing "/etc/init.d/local_sshd start"

Next, start the NetWorker daemons by typing "/etc/init.d/networker start"

Then wait about five minutes and start up nsrwatch or NetWorker Console and look in the monitoring log section to make sure NetWorker started fine. The start up process triggers another process that checks the media and client file indexes. This is why I say to wait five minutes, to allow the checking to finish. The start up process might take longer though, which is fine. At the end, it should say something like "media index verified". When it's done, NetWorker is back in production.

Also, you will need to start Hobbit again. To do so, type:

su hobbit -c "/export/home/hobbit/client/runclient.sh start"

Reboot process for Puss

Rebooting Puss is easier than rebooting Bootz.

Unless it's an emergency, do not reboot Puss while backups are in progress, especially while "savegrp -O" is running. This is the process that backs up Puss' NSR data, so it's important to allow it to finish if at all possible.

On Red Hat Linux, to see what savegroups are running, type ...

ps -eo pid,args| grep savegrp

and look for a process that says "puss-index-backup" and try to allow it to finish. If you must interrupt it, kill its pid. This particular savegrp cannot be stopped via the NetWorker GUI because it isn't scheduled to start from that GUI.

The steps to follow to reboot Puss and shut down NSR cleanly are as follows:

You can use nsrwatch and nsradmin in place of the Java-based NetWorker Management Console (NWMC) GUI, but it's a pain in the neck.

If you intend to do any hardware or OS configuration changes, it is prudent to make a copy of the files in the /nsr/res directory first and store them on a different server for safe keeping. Do this after the NSR software has been shut down. I just tar the directory to my home account and then sftp it to my workstation. It's not a large directory so this will only take a minute or two.

There are two methods to shut down NSR gracefully. The easiest method is to open up the NetWorker Console Manager and log in with an account that has administrator rights. Then go into the monitor window, look for all the groups that show a status of running, and right-click on each group and select the stop item from the pop up menu.

Alternatively, you can do the following steps:

1) Log onto Puss and gain root access.

2) Type "ps -lfe | grep savegrp"

3) Start up the NetWorker curses GUI by typing nsradmin

4) Type "vi" to get to the menu.

5) Use the arrow or tab key to move the cursor to "select" and press enter.

6) Move the cursor to the options and press enter, then press the enter key to select "hidden" and press the escape key.

7) Scroll through the list of groups, and for each group you want to stop, open it up by pressing the enter key.

8) Look at the fifth or sixth line from the top where it says "force stop" and select "true" then press Escape and type "yes." Repeat for each group you want to stop.

If the NWCM or nsradmin is not responsive (i.e., too slow), then you can kill each group process that you see from the output of "ps -lfe | grep savegrp"

After you kill the groups, NWMC and nsradmin should become responsive again. Start either nsrwatch or the NWMC and look at the "Devices" section. Make sure that none of the tape drives shows any writing activity. There might be some reading activity going on for a recover, but odds are, it's Nelson's test recovery script. It would be nice to wait for any reading to finish, but if you're rushed, don't wait. You should wait for any tape loads/unloads to complete, though, and you can see what's happening by looking at the messages and devices panels in the nwadmin GUI.

When reading/writing activity looks like it has mostly stopped, unload the tapes from the tape drives by going to the Unix prompt on Puss and typing nsrjb -u

This process will take a few minutes. When it's done, you can shut down NSR.

To shut down NSR, type "nsr_shutdown" and type "Yes" at the prompt. Then wait for all the processes to shut down. If nsr_shutdown hangs, try /usr/local/src/nsr-scripts/nsr_shutdown2

Just to be prudent, make sure that all NSR processes have ended by typing something like "ps -lfe | grep nsr". If you see any NSR processes still running, wait a few minutes and check again. There might be some slow housekeeping processes that haven't ended yet, although this rarely happens.

After the nsr_shutdown script completes, there is one more step to take.

Delete everything in /nsr/tmp except the sec directory. Do not delete the sec directory. To do this, type rm /nsr/tmp/* and note that since the recursive option (-r) is not included on the rm command, it won't delete any directories. If there are any directories other than sec in /nsr/tmp, delete those individually with "rm -rf /nsr/tmp/dir_name".

Now, you can either reboot Puss or shut it down as needed. If you reboot, be sure to use the console connection, not ssh because ssh will shut down and not restart upon reboot.

Both sshd and the NetWorker software should start up on their own. Wait two or three minutes after you log in again, then start nsrwatch or NetWorker Console and look in the monitoring log section to make sure it started up fine. The start up process triggers another process that checks the media and client file indexes. This is why I say to wait five minutes, to allow the checking to finish. It might take longer though, which is fine. At the end, it should say something like "media index verified". When it's done, NetWorker is back in production.

Policy and cost to add new clients

We recently began charging to add servers to the nightly backup schedule for servers that are not within the ESG budget authority. This includes servers that are maintained by groups such as DT-LAN, ISET, etc. The cost for adding on clients is as follows:

One client connection is needed for each new server to be backed up by non-ESG groups. The recommended procedure is for the person who requests a server to be backed up to create a requisition to purchase the appropriate number of client connections from William Chang at Cambridge Computer Services. The requisition should clearly state what the client connections are for and indicate that the license code is to be emailed to esg@temple.edu so it can be registered on our NetWorker server. Stan maintains an Excel spreadsheet to keep track of this information, so please send email to esg@temple.edu any time you add or remove a client resource from NetWorker so Stan can update the spreadsheet.

If you receive a client connection license code, you can register it by following the procedure at Technical Support and software registration. You can also request temporary license codes by contacting William Chang.

The cost for five years of backups is:

Standard Backup Client and Tapes Fee - $700
Database Module (SQL/Oracle) - optional:
2 Processors - $2300
4 Processors - $3400

Cleaning Tape Drives:

The Qualstar tape library cleaning:

After some trial and error, I have determined that it is best to run NSR so that it is configured to clean each LTO-3 drive once a month; however, this may change as the tape library gets more heavily utilized. The timing of the cleanings is controlled via the NetWorker Management Console. Click on the devices button, then right click on each individual tape drive device resource to reveal its configuration settings. If you need to adjust the tape drive cleaning frequency, you must do it for every tape drive in the library. This parameter can be changed for each device and can even differ for the individual devices, although for our environment, leaving them all at the same cleaning frequency is best.

Each LTO-3 cleaning cartridge is good for 50 cleanings, but check the label on the package to verify because that number might change. You can distinguish an LTO-3 cleaning cartridge from an LTO-3 data cartridge easily. Cleaning cartridges have a blue write protect tab, while data cartridges have a red write protect tab. The words "cleaning cartridge" are also imprinted on the packaging.

When a cleaning cartridge has reached the maximum number of usages, an email will be sent out to the NSR-ACTIVITY list to notify you that it is time to change the cleaning tape. A supply of new cleaning tapes is in the tape closet in Wachman Hall. Affix a label to the cleaning cartridge that says "cleaning cartridge" on it. This label is there because it makes the cleaning cartridge stand out for when the operators go to remove any LTO-3 data cartridges that are located directly below it. If you need to buy additional cleaning tapes, go through William Chang at Cambridge Computer Services.

To find out how many cleaning uses are left on the Qualstar's cleaning cartridge, open up a terminal window on Puss and gain root access, then issue the command "nsrjb -C". Look for the information that is displayed for slot #1, which is where the number of remaining cleanings will be identified. The information that corresponds to slot #1 will say something like "x uses left".
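To pull just the number out of that output, a small filter can help. This is a sketch under a loud assumption: it relies only on the phrase "N uses left" appearing somewhere in a line, as described above, and assumes nothing else about the layout of the nsrjb output. The name cleaning_uses_left is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: cleaning_uses_left is not part of NetWorker.
# Reads text on standard input and prints the number from the first
# line that contains "<number> uses left".
cleaning_uses_left() {
    # The first expression prepends a space so a non-digit always precedes
    # the number; the second keeps only the digits before " uses left".
    sed -n -e 's/^/ /' \
        -e 's/.*[^0-9]\([0-9][0-9]*\) uses left.*/\1/p' | head -n 1
}

# Example on Puss, as root:
# nsrjb -C | cleaning_uses_left
```

If the filter prints nothing, read the slot #1 line by hand, since the real output may be worded differently.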

An email notification will be automatically generated when a cleaning cartridge hits zero uses left. The subject of that message will be "cleaning cartridge expired." You should also visually inspect the tape library once every other day or so in order to verify that it is operating properly. Look at the tape drives. If any of them has a red LED indicator that says "CL" on it, this means it needs to be cleaned, so don't wait for NSR to clean it.

To manually clean an LTO-3 tape drive, log onto the NetWorker Console Manager, then open up the devices window and click on the "devices" item in the menu in the left window frame. Right click on the device you need to clean and open its properties. You should see a window with a few tabs. The general tab should already be showing. In this tab, there's a check box that says "cleaning required." Place a check mark in that box and click "ok". If the cleaning required box is already checked, then you can just close the window. NetWorker will clean that drive as soon as it can, which is usually pretty quickly if a tape is not being written or read in it.

To remove a depleted LTO-3 cleaning cartridge from the tape library and replace it with a new one, first, withdraw the old cartridge from the library with the command

nsrjb -w -S 1

then replace the old cleaning cartridge with a new cartridge in the load port slot. The key to unlock the front door of the tape library is kept in the black cabinet with the other keys, and there's usually a key in the lock anyway. If the robotic arm is obstructing your access to slot 1 of the load port, close the door again, press the "menu" key, then the "down" key until the cursor moves to the "operations" item, then press the "enter" key, then move the cursor to the item that says "park handler" and press "enter" again. When it's done, press the "exit" key three or four times until the main display appears again. Remove the old cleaning tape from the slot and replace it with a new cartridge, then close the tape library door and let it complete its inventory (takes about five minutes). You can tell when the inventory is in progress because the front display will show a message that says something like "scanning inventory." When that message disappears, issue the command

nsrjb -d -S 1

After the cartridge is deposited into slot 1, you need to set the number of cleanings on it. To do this, issue the command:

nsrjb -U 50 -S 1

After you do this, type "nsrjb -C" to verify that slot 1 has the proper number of usages set up for it.
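The command sequence for swapping the cleaning cartridge can be written down as a small checklist generator. This is only a sketch that prints the commands from the steps above in order; the function name is made up, and it does not run anything against the library.

```python
def cleaning_swap_commands(slot=1, cleanings=50):
    """Return the nsrjb commands for a cleaning cartridge swap, in order."""
    return [
        f"nsrjb -w -S {slot}",            # withdraw the depleted cartridge
        # (swap the cartridge by hand and wait for the inventory here)
        f"nsrjb -d -S {slot}",            # deposit the new cartridge
        f"nsrjb -U {cleanings} -S {slot}",  # set the number of cleanings
        "nsrjb -C",                       # verify the slot contents
    ]

for command in cleaning_swap_commands():
    print(command)
```

Check the label on the cartridge package before relying on the default of 50 cleanings.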

The PetaSite library cleaning:

After some trial and error, I have determined that it is best to disable automated tape cleanings in NetWorker and allow the PetaSite to do its cleanings on demand.

Each S-AIT cleaning cartridge is good for 50 uses. The last ten slots in the PetaSite are reserved for cleaning cartridges. Several new cleaning cartridges are kept in the storage cabinet next to the man trap in the Bell Building switch room. Order more cleaning cartridges whenever the supply of unused cartridges gets below eleven. When a cleaning cartridge has reached the maximum number of usages, the PetaSite will eject it and move it to a slot in the load/unload area. Check that area once a day and if you see a cleaning cartridge there, go to the Bell Building, take it out and dispose of it. The S-AIT cleaning cartridges must have a bar code label affixed to them. If you try to put a cleaning cartridge into the PetaSite without a bar code, the PetaSite GUI will display some ominous error messages. These bar code labels should be gray in color and contain the string "0CL" in the S-AIT bar codes. If you use a differently formatted bar code label, the PetaSite will not recognize it as a cleaning cartridge (this is a configuration option). If you need to order additional cleaning cartridges and pre-printed bar code labels, go through William Chang at Cambridge Computer Services.

If the PetaSite is working properly and there is at least one out of ten good cleaning cartridges loaded in the last ten slots, the PetaSite will totally manage tape drive cleaning. There should not be a need to do cleanings manually. If for some reason, you need to do manual tape cleanings, open up the PetaSite GUI and right-click on a cleaning cartridge, then click on the icon for the tape drive you want to clean and right-click on it and select "move" from the contextual menu.

To clean a SAIT tape drive, use the same method as the Qualstar's LTO-3 drives. Open up the NWCM, go to devices, and right click on the device you want to clean and get its properties. On the main window, place a check mark in the "cleaning required" box.

Mirapoint data backups

To set up NSR to back up a Mirapoint system, a license is needed for NDMP backups. Each Mirapoint Message Store requires its own license. The topic of obtaining licenses is covered at Tech Support and software registration.

You need to enable NDMP backups on the new Mirapoint server. This will change in the next version of the Mirapoint Operating System, but for now, log onto the new Mirapoint server's administrator account and issue the commands "service enable ndmp" and "service start ndmp." This is all that is needed on the Mirapoint side, although we'll have some options in the next MOS version.

This process will change considerably when we upgrade to the next version of the NetWorker server software:

On the NetWorker server side, log onto bootz, get root access, set up an Xwindows session, and then open up the nwadmin GUI. First, use the nwadmin GUI to create a new savegroup for this server's backups. That can be done by pulling down the "Customize" menu and then selecting the "Groups" option. Pull down the "View" menu and select "Details." From there, click on the "Create" button.

Type the name of the new savegroup in the "Name" field and make the name of this group the same as the new server's short domain name. For example, if the new server's FQDN is po-b.temple.edu, you would name this group "po-b." Set up a time when the backup of this client will begin each day. This is entered in the "Start time" field in 24 hour format. Avoid scheduling any two NDMP groups to begin within about 15 minutes of each other so as not to stress Bootz. Click the "Enabled" buttons on the "Autostart" and "Autorestart" fields. Make sure "Savegrp parallelism" is set to zero. Set "Client retries" to 5. Set the "Level" field to "full." Set the "Inactivity timeout" field to 140. Leave all the other items alone. Click on the "Apply" button to save the changes.
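Before picking a start time, it can help to check the 15 minute spacing rule against the groups that already exist. The sketch below assumes start times are written as "HH:MM"; the group names and times are invented for illustration, and it ignores pairs that straddle midnight.

```python
def minutes(hhmm):
    """Convert a 24 hour 'HH:MM' string to minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def too_close(start_times, min_gap=15):
    """Return adjacent pairs of groups that start less than min_gap minutes apart."""
    ordered = sorted(start_times.items(), key=lambda item: minutes(item[1]))
    clashes = []
    for (name_a, time_a), (name_b, time_b) in zip(ordered, ordered[1:]):
        if minutes(time_b) - minutes(time_a) < min_gap:
            clashes.append((name_a, name_b))
    return clashes

# Hypothetical schedule, not our real start times.
print(too_close({"po-a": "01:00", "po-b": "01:10", "po-d": "01:30"}))
```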

Next, go to the "Media" pulldown menu in nwadmin and select "Pools." Pull down the "View" menu and select "Details." Click on the "Create" button to create a new tape pool for the new server's backups. Type a name for the new tape pool into the "Name" field. For example, if the new client is po-b.temple.edu, give the pool the name "PostofficeB." In the "Groups" section of this window, make sure the box for the corresponding group that you created in the previous step is marked. Scroll down this window further and make sure that only the devices "rd=bootz:/dev/rmt/5cbn," "rd=bootz:/dev/rmt/6cbn," "rd=bootz:/dev/rmt/7cbn," and "rd=bootz:/dev/rmt/8cbn" are marked. Do not mark any other devices. Further down in this window, make sure the "Yes" option is activated for the "Store index entries," "Auto media verify," "Recycle to other pools," and "Recycle from other pools" fields. Click on the "Apply" button to save the changes.

In the "Pools" window, scroll up on the list of pools section and select the "MiraPointIndices" pool. In the "Groups" section of this pool, make sure to mark off the group that you previously created for this new server. Click on "Apply."

Use the "Clients" pull down menu to access the "Client setup" window. In the Client Setup window, you will see a pull down menu that says "View." From the "View" menu, click on the "Details" item. Click on the "Create" button. In the "Name" field, put in the fully qualified domain name for the new Message Store (e.g., po-b.temple.edu). In both the Browse policy and Retention policy, select "Week". If you want the backup records to be retained for longer than one week, set that limit here. In the "Save set" field, type "/usr/store" and click on the "Add" button. In the "Remote access" field, enter "root@bootz," "root@bootz.ocis.temple.edu," and "root@bootz.temple.edu" and click on the "add" button in the Remote access section after you type in each address. In the "Remote user" field, type in "administrator" and type the administrator's password into the "Password" field. This refers to the account that is authorized to do backups on the new Mirapoint system. In the "Backup command" field, type "nsrndmp_save -T image" exactly as it is typed here. This is case sensitive. Click on the "Add" button in that section. Make sure the "NDMP" option is set to "Yes" in the next field. Make sure the "Parallelism" field is set to 1.

In the "Owner notification" field, type in something like:
/usr/bin/mailx -s "PO-B daily NSR backup report" nsr-activity@listserv.temple.edu ray@temple.edu mdavis10@temple.edu nsr2bb@bb.ocis.temple.edu

In the "Storage nodes" fields, delete the reference to "nsrserverhost" and put in "bootz" instead.

Click on "Apply" to save these changes.

Mirapoint data recovery:

If you just want to recover the inbox, calendar data, or address book entries for a particular tumail user, then you can do what is referred to by Mirapoint as a selective restore. First, identify which tumail server houses the user's data (e.g., po-c, po-f, etc.). Then log into Bootz and type the following command

recover -s bootz -c hostname

where "hostname" is either po-a.temple.edu, po-b.temple.edu, po-c.temple.edu, po-d.temple.edu, or po-f.temple.edu, depending on where the user's account sits. You will get an error that says your path isn't found, but that can be ignored. You will be at the recover prompt, so type "cd /usr/store/spool/user" then "add uid", then type "recover" and wait because the process is very slow. It might even take overnight; however, all services to the mail store will continue to operate normally. It's a good idea to go into the NetWorker Console Manager GUI and disable the backup of the server you are recovering. Do not recover any data while a backup is already in progress; allow any backup to finish first. For example, to recover the data for uid=stan on po-c, you would type ...

recover -s bootz -c po-c.temple.edu
cd /usr/store/spool/user
add stan
recover
quit

If you wanted to recover data from a specific day's backup, say February 16th, 2008, you would type

recover -s bootz -c po-c.temple.edu
cd /usr/store/spool/user
changetime 02/16/2008
add stan
recover
quit

To recover a specific mailbox within a user's inbox, you need to explicitly request it via the add command. For example, if I had wanted to recover a folder called "stan-projects" I would do ...

recover -s bootz -c po-c.temple.edu
cd /usr/store/spool/user
add stan/stan-projects
recover
quit
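The three examples above all follow one pattern, which can be sketched as a helper that lists the lines to type. This is only a checklist generator with a made-up function name; it does not drive the recover program itself.

```python
def recover_session(client, path, changetime=None, server="bootz"):
    """Return the lines to type for a selective restore, in order."""
    lines = [f"recover -s {server} -c {client}", "cd /usr/store/spool/user"]
    if changetime:
        # Date in MM/DD/YYYY form, as in the example above.
        lines.append(f"changetime {changetime}")
    lines += [f"add {path}", "recover", "quit"]
    return lines

for line in recover_session("po-c.temple.edu", "stan/stan-projects"):
    print(line)
```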

More information is available via Mirapoint's web site at http://support.mirapoint.com/secure/kb/NdmpBackupAndRestoreWhereDoIBegin.html

If you need to recover an entire tumail server, the following details are pertinent:

Log onto conman and access Bootz's console. Data is recovered for Mirapoint Message Stores using the saveset recover method. This method requires that you find the saveset ID for the given saveset you want to recover. To do this, you need to know which Mirapoint server the data was located on, and you need to use the mminfo utility on Bootz to locate the saveset ID.

The general form of the mminfo command is:

mminfo -v -t date -c po-X.temple.edu

where "date" is the day you want to recover from and "X" denotes the specific message store.

For example, if I wanted to recover the message store data from po-c for the date 03/11/2005, I would type:

mminfo -v -t 03/11/2005 -c po-c.temple.edu

This will display information for backups since March 11, 2005. You can also get this info via the nwadmin GUI.

The displayed output will look something like this:

# mminfo -v -c po-c.temple.edu -ot

volume    client           date      time      size   ssid        fl   lvl   name
S00021S1  po-c.temple.edu  03/11/05  07:05:22  55 GB  2167508797  cbN  full  /usr/store
S00021S1  po-c.temple.edu  03/12/05  07:36:52  55 GB  3962758262  cbN  full  /usr/store
S00021S1  po-c.temple.edu  03/13/05  05:19:37  55 GB  3140752562  cbN  full  /usr/store
S00021S1  po-c.temple.edu  03/14/05  05:00:05  54 GB  2335531425  cbN  full  /usr/store
S00021S1  po-c.temple.edu  03/15/05  06:03:07  54 GB  3493249465  cbN  full  /usr/store

Look for the column labeled "ssid." This information will change after each day's backup as new records are added to NetWorker's Client File Index and old records are removed. Keep in mind that data for po-e backups is retained for one month and only one week for the other message stores. Also, keep in mind that the only data on these backups is message store data (i.e., individual mailboxes). We do not do backups of system configuration and OS files via NetWorker because the NDMP software allows only user data to be backed up. The responsibility to back up the configuration and other data on the Mirapoint systems rests with those who manage those systems.

Note that po-a, po-b, po-d, and po-f receive a once a week full backup and then a level 1 differential backup the other days of the week. The full backup for each one of those servers is scheduled for a different day of the week in order to spread out the processing load on Bootz. These are all selective backups, meaning that individual mailboxes can be recovered for users. The backups of po-c are full daily because it's so small. To recover any server other than po-c, you need to recover the most recent full backup and then the most recent differential backup. Do not reboot the server between recovering the full backup and the differential backup; just reboot it after you recover the differential backup.
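The rule for which savesets to recover (most recent full, then the most recent differential taken after it) can be sketched as follows. The record layout, the level string "1", and the ssid values are assumptions made for illustration; take the real values from the mminfo output.

```python
from datetime import date

def savesets_to_recover(backups):
    """backups: list of (date, level, ssid). Return ssids in recovery order."""
    fulls = [b for b in backups if b[1] == "full"]
    if not fulls:
        return []
    latest_full = max(fulls, key=lambda b: b[0])
    # Only differentials taken after the latest full matter.
    later_diffs = [b for b in backups
                   if b[1] != "full" and b[0] > latest_full[0]]
    chosen = [latest_full]
    if later_diffs:
        chosen.append(max(later_diffs, key=lambda b: b[0]))
    return [b[2] for b in chosen]

# Hypothetical week of backups for one message store.
week = [
    (date(2009, 3, 1), "full", "1001"),
    (date(2009, 3, 2), "1", "1002"),
    (date(2009, 3, 3), "1", "1003"),
]
print(savesets_to_recover(week))
```

For po-c, which gets a full backup every day, this returns just the latest full.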

Now, you have all the information you need to recover the March 11, 2005 data that was stored on po-c.

Use ssh to log onto po-c and disable the ndmp service. This is done as an insurance measure to assure that the data is not overwritten. You will likely want to recover the data to postoffice.temple.edu, not to its original host.

The commands to turn off the ndmp service on po-c (or other Mirapoint Message Store) are:

service disable ndmp

service stop ndmp

When the recover is finished, log back on and issue the commands

service enable ndmp

service start ndmp

The general command to recover data from one Mirapoint Message Store to postoffice.temple.edu is:

nsrndmp_recover -c po-X.temple.edu -s bootz.ocis.temple.edu -S ssid -m /usr/store -R postoffice.temple.edu

and replace the "X" with whichever message store you want to recover.

So for the example of recovering the March 11, 2005 data from po-c to postoffice, the command would be:

nsrndmp_recover -c po-c.temple.edu -s bootz.ocis.temple.edu -S 2167508797 -m postoffice.temple.edu::/usr/store

Notice that this command has TWO colon symbols in it between the destination server name and the destination directory.
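Since the double colon is easy to get wrong, here is a minimal sketch that assembles the command in the same form as the example above. The function name is made up; it only builds the argument list and does not start a recover.

```python
def ndmp_recover_command(client, ssid, dest_host,
                         dest_dir="/usr/store",
                         server="bootz.ocis.temple.edu"):
    """Build the nsrndmp_recover argument list in the form of the example."""
    return [
        "nsrndmp_recover",
        "-c", client,
        "-s", server,
        "-S", str(ssid),
        # Destination host and directory are joined by TWO colons.
        "-m", f"{dest_host}::{dest_dir}",
    ]

print(" ".join(ndmp_recover_command("po-c.temple.edu", 2167508797,
                                    "postoffice.temple.edu")))
```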

Be warned! As soon as you begin the nsrndmp_recover process, you will lose ALL connectivity on the host to which you are recovering data. This means that even ssh services will stop and any administrator connections will end. In our example, this means that as soon as the nsrndmp_recover process begins reading data, all access to postoffice.temple.edu will end. Access will resume when the process is finished.

The recover process will take several hours, maybe 14 or 15 hours depending on which server's data you are recovering. The data for po-e usually takes an hour or two to recover. The recover time is roughly the same as the time the data took to be backed up. There is also no progress information available until the recover process is complete, so you just have to wait and maybe visually inspect the tape drive and the Mirapoint's disk array to see if there is a lot of activity on them (i.e., blinking lights).

When the recover process is finished, you will be prompted to reboot the message store that received the recovered data. In our example, that message store is postoffice.temple.edu. This reboot process will take longer than a regular reboot because it has to integrate data back into the message store's file system. The reboot process will probably take about 45 minutes, but it depends on how much data is recovered.

If you recover email to Postoffice for someone who lost their email and wants it back on their original PO-x server's account, you can log onto the PO-x server's administrator account and do something like the following to get back all their email:

mailbox copy gold.temple.edu
"(mailbox=#user.dvrs.$folder)(verbose=yes)(recursive=yes)(error=Continue)"
"(user=administrator)(authorizeid=dvrs)" xxxxxxx user.dvrs.$folder

After rebooting Postoffice, you need to prevent it from sending out calendar event reminders, so issue the following three lines:

filter add (domain=primary) "nuke-reminders" discard "" allof stop
X-mailer matches "Mirapoint Webcal"
.

and be sure to type a "." on the third line by itself. Note that there is a blank space after the word "discard" and two quotation marks after the space. Do not do this if you are recovering a full message store back to its original location as part of a system restore.

To recover an individual account, you would do

nsrndmp_recover -c po-x.temple.edu -s bootz.ocis.temple.edu -S ssid /usr/store/spool/user/XXX

where po-x.temple.edu is the host name for the message store where the account resides, "ssid" is the saveset id (see mminfo output), and XXX is the userid. For example, to get back my data from the Postoffice backup and recover it to Postoffice, using the saveset id of 2414201892 (which I got from mminfo -v -c postoffice.temple.edu), I would type the following command:

nsrndmp_recover -c postoffice.temple.edu -s bootz.ocis.temple.edu -S 2414201892 /usr/store/spool/user/stan

This process needs to parse every bit of data in the saveset on the tape(s), so it will take several hours. For this example, it took twelve hours, but the recover can be done while the server is up and running, and the user whose data is being recovered can continue using his email account during the recovery process. The results should look something like:

NDMP Service Log: ndmp-image: read 476603299633 bytes in 45662.355568 seconds (34.99 GB/hour).

NDMP Service Log: ndmp-image: Staging of selective image restore successful.Please use "Ndmp Merge Status" to track the completion of the merge.

Close the tape device.
nsrndmp_recover: RPC error: Server can't decode arguments
------------------------------------------------------------------
------ W A R N I N G -------
Failed to set the recover status to MMD.
------------------------------------------------------------------
OK
Thu Aug 24 03:23:37 EDT 2006

The errors can be ignored.
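As a cross-check on the log above, the throughput figure can be reproduced from the byte count and the elapsed seconds. The sketch below takes a "GB" to mean 2^30 bytes, which is the reading under which the numbers agree with the logged 34.99 GB/hour; the function name is made up.

```python
GIB = 2 ** 30  # bytes per "GB" as the log appears to count them

def gb_per_hour(byte_count, seconds):
    """Throughput in binary gigabytes per hour."""
    return byte_count / GIB * 3600 / seconds

bytes_read = 476603299633
elapsed = 45662.355568

rate = gb_per_hour(bytes_read, elapsed)
hours = elapsed / 3600
print(f"{rate:.2f} GB/hour over {hours:.1f} hours")
```

The elapsed time works out to a little under thirteen hours, in line with the "twelve hours" mentioned above.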


How to tell if a tape needs to be loaded for a recovery:

If someone needs to recover files, you need to verify that the tapes that are needed are in the tape library. The person doing the recover can issue the command "show volumes" at the "recover" prompt to see what tapes are needed. This will also indicate if the tapes are in the tape library or not. This applies to Puss and Bootz.

You can also see if a tape for a pending tape mount is in the library by logging onto either Puss or Bootz (depending on which one did the backup) and type:

nsrjb -C | grep tape

and replace "tape" with the volume name of the tape that's waiting. You can tell what tape mounts are waiting by looking at the alerts section of the NetWorker Management Console's "monitor" window or by using nsrwatch. Alternatively, you can see which volumes are loaded in a tape library by starting the NetWorker Console Manager, then looking in the volumes section of the media window for Puss or Bootz.

If the tape is not in the library and the library is full, then use the command

nsrjb -w tape

to unload a full tape in order to open up a slot for the desired tape, and then put the desired tape into the load port of the tape library and type:

nsrjb -d

If the tape library is not full (i.e., there are vacant slots), then you don't need to withdraw a tape from it.


Problems that can interfere with backups/restores:

First off, make sure that DNS is set up properly for the troublesome client system. Make sure that on Bootz you can nslookup both the client's fully qualified hostname and its shortened name, and that nslookup returns the same IP address in both cases. The same should be true on the client system. Make sure that it can look up "bootz" and "bootz.ocis.temple.edu" and get the same IP address both times. Of course, "ping" is your friend.
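The name check can also be scripted. This is a rough sketch using Python's standard socket.gethostbyname; the resolver is passed in as a parameter so the logic can be tried against a lookup table first, and the addresses in the table are invented.

```python
import socket

def names_agree(short_name, fqdn, resolve=socket.gethostbyname):
    """True if both names resolve, and resolve to the same IP address."""
    try:
        return resolve(short_name) == resolve(fqdn)
    except OSError:
        # A failed lookup counts as a DNS problem.
        return False

# Hypothetical lookup table standing in for DNS.
table = {"bootz": "10.0.0.5", "bootz.ocis.temple.edu": "10.0.0.5"}
print(names_agree("bootz", "bootz.ocis.temple.edu", resolve=table.__getitem__))
```

Leave out the resolve argument to query real DNS, and run it on both Bootz and the client.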

Next, make sure that no firewalls are interfering with NSR's traffic. We usually hold the person responsible for the client accountable for ensuring this.

If backups or restores are too slow, make sure that the NIC on the client is set to its proper parameters. Do not rely on auto-negotiation because it frequently negotiates the wrong duplex setting on the switches we tend to use around here. If the settings on the NIC are correct, yet backup/restore performance is still too slow, you need to make sure that the other side of the network connection is okay, as well as the network cable. You also should make sure that anti-virus software is not in use on the client system since that consumes a lot of CPU and disk bandwidth and presents file contention issues.

Make sure that the account that runs the nsr client is authorized to read every file on the system and write to each file. With Novell clients, the Novell administrator (usually Don Cordner) has to push out the password to our NSR server from the Novell server. With Mirapoint, the password is entered on the NSR server, but with Novell, it is the opposite process. Only Novell and Mirapoint clients require a password for backups/restores.

When doing a restore, make sure that the file(s) are being restored to the same type of OS platform and file system as they were when they were backed up. For example, with Windows, if you want to restore a file that was originally on an NTFS volume, it needs to be recovered to an NTFS volume. Of course, also make sure that restoring the file(s) does not exceed or violate any disk quotas that might be in effect. Note that you cannot restore files across different operating systems and/or different file systems.

How to run a test backup or back up a particular client:

Go into nsradmin or nwadmin and make the client in question a member of the "test" group. In nsradmin, you type "vi", then go to Select->NSR Clients->client_in_question

Tab to the particular client you want, then select "Edit" then move the cursor down to the "group" field and click on the "test" group.

Press the Escape key and select "yes" to save the change.

Now, to actually run the group, at the Unix prompt, type:

savegrp -pvvvvv test

If you get errors, something needs to be fixed before a real backup will work. If you get no errors, it is a good sign, but not a guarantee, that a true backup of the client will work okay. You can run a real backup by typing:

savegrp test&

When the group is finished being executed, email will be sent to the nsr-activity Listserv list.

Note that if you see information about some other group(s) being backed up that you do not want to test, go into nsradmin/nwadmin in the clients section and remove those unwanted clients from the test group.

There are also two other testing groups. They are "test2" and "test3". If you need to run more than one client at a time to test it or rerun a backup, you can either put them in one of the test groups, or use multiple test groups.

How to load writable tapes into the Qualstar library

Note that it is a good idea to have at least one writable tape per tape drive in the Qualstar. That means a total of twelve tapes need to be in the library at any one time that are flagged as recyclable. You can tell how many recyclable tapes are in the Qualstar library by logging onto Puss and typing

nsrjb -C | grep -c yes

The result of this command will be a number. If the number is less than 12, it's time to change tapes.
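The same count can be done in a script if you capture the "nsrjb -C" output. The sketch below mirrors "grep -c yes" by counting lines that contain "yes"; the sample lines are invented and the real output layout may differ.

```python
def recyclable_count(nsrjb_output):
    """Count lines containing 'yes', like 'grep -c yes'."""
    return sum(1 for line in nsrjb_output.splitlines() if "yes" in line)

def needs_tape_change(nsrjb_output, minimum=12):
    """True when fewer than `minimum` recyclable tapes are in the library."""
    return recyclable_count(nsrjb_output) < minimum

# Hypothetical output lines, for illustration only.
sample = "2: B00138 yes\n3: B00140\n4: B00141 yes\n5: B00142 yes"
print(recyclable_count(sample), needs_tape_change(sample))
```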

Log into Bootz and get root access. Open up the nwadmin Xwindows GUI. Click on the button labeled "volumes" so you can see which tapes to remove. You need to remove one tape from the library for each tape you load into the library. The "volumes" window will tell you which ones.

Look for entries such as B00142 and B00143 that have the word "full" in the "%Used" column and "rd=puss.ocis.tem" in the location column. Click on a tape and then look in the bottom portion of the window. Scroll down the bottom portion of the window and look for the date on the last entry in the window. You can remove the tape if the date on the last entry is one month or more ago. These are tapes that can be removed from the tape library.
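The age test for a full tape can be sketched as below. It assumes "one month" means 30 days, which is my simplification, and the dates are just examples; the function name is made up.

```python
from datetime import date, timedelta

def can_remove(last_entry, today, min_age_days=30):
    """True if the tape's last entry is at least min_age_days old."""
    return today - last_entry >= timedelta(days=min_age_days)

# Example dates only.
print(can_remove(date(2009, 1, 15), date(2009, 3, 9)))
print(can_remove(date(2009, 2, 20), date(2009, 3, 9)))
```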

So let's suppose I want to remove tapes B00138, B00140, B00141, B00142, and B00143 from the tape library. Since the Qualstar library is attached to Puss, you need to ssh into Puss and get root access. On Puss, type the command:

nsrjb -s bootz -w B00138 B00140 B00141 B00142 B00143

and wait. This might take about ten minutes. The more tapes you withdraw from the tape library, the longer this will take to complete. The NetWorker software is withdrawing tapes into the Qualstar's load port. You can withdraw a maximum of 30 tapes at a time.

Now you need to figure out which tapes from the storage cabinet are writable. To do this, log onto Bootz and run a script that I wrote. The script is /usr/local/bin/expired_tapes.pl. I suggest you pipe its output into the mail program and email it to yourself, as in

/usr/local/bin/expired_tapes.pl | mailx -s "expired tapes" mdavis10@temple.edu

and then print the results. Take your printed report up to the 8th floor data center and open the tape cabinet and find five tapes from the list. These are the tapes you will put into the tape library. Take the tapes you withdrew from the library and put them away in the cabinet and then replace those tapes in the library's load port with the ones you found on the printed report.

Close the tape library door. It will do an inventory. The LCD status panel will say something like "scanning inventory." Wait until the inventory process is done. This takes five to ten minutes. Then you need to deposit the newly added tapes from the load port into usable tape slots that NSR can access. To do the deposit, you need to find out which slots in the library are empty. This is done on Puss by typing

nsrjb -C | grep ": "

The output lists the empty slots by slot number. For this example, suppose the empty slots are 146, 217, 245, 524, and 532. The numbers will likely be different each time you change tapes because different tapes come from different slots.

So using the numbers above, we need to put the tapes that we just added to the Qualstar's load port back into usable tape slots, like the ones in the previous list. To do this, type:

nsrjb -s bootz -d -S 146 -S 217 -S 245 -S 524 -S 532

When this command completes, you will get only a Unix prompt back. That's the only feedback you will get. You can verify that the slots are full now by issuing the command

nsrjb -s bootz -C | grep ": "

After you deposit tapes into the library, you need to inventory them. So continuing on with this example, to inventory the tapes that were just deposited by the previous "nsrjb -d" command, you would type


nsrjb -s bootz -Iv -S 146 -S 217 -S 245 -S 524 -S 532

Be advised, if the tape library is busy, this process can take a long time if it needs to read a label. Just be patient. This is the last step. When the inventory completes, you're done.
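Because the deposit and inventory commands must name the same slots, it can help to generate both from one list. This sketch only builds the two command strings in the form used above; the function name is made up and nothing is executed.

```python
def deposit_and_inventory(slots, server="bootz"):
    """Return the deposit command and the matching inventory command."""
    slot_args = " ".join(f"-S {slot}" for slot in slots)
    return [
        f"nsrjb -s {server} -d {slot_args}",   # deposit from the load port
        f"nsrjb -s {server} -Iv {slot_args}",  # inventory the same slots
    ]

for command in deposit_and_inventory([146, 217, 245, 524, 532]):
    print(command)
```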

How to rebuild Qualstar library configuration

Once in a while, you might see that NetWorker fails to unload tapes from the Qualstar tape library's tape drives. Log onto Puss and get root access, then look at nsrwatch at the devices section, or use the NetWorker Console Manager GUI. If you see there are tapes in tape drives that should have been sent off-site, this is not good. Log into Puss and get root access to see if you can get NetWorker to unload the tapes from the tape drives.

On Puss, you would type "nsrjb -u"

The above command should cause the tape library to unload all tapes in the tape drives and put them in the tape slots where they belong. This command will take a few minutes to complete. If it completes, but takes ten minutes or longer, it might be the result of Puss being heavily loaded. Just do something else in the meantime. But if you see strange errors in nsrwatch or the devices section of NetWorker Console that say something like "error" on a device, or any other error for the devices in the Qualstar, something has gone wrong. The quick fix is to do the following steps:

First, on Puss type "nsrjb -C | mailx -s "Puss tape library state" esg@temple.edu" to email the state of the tape library to you. This is important only because it provides information on the number of cleanings remaining on the cleaning tape in slot #1.

Shut down Puss. To do this, log on, get root access, and shut down NetWorker by typing "nsr_shutdown". Then use the "/sbin/shutdown -h" command, and push the front button on Puss to power it off.

Walk to the back of the tape library, look down and to the left and you'll see the power switch. Power off the tape library. Wait a minute or two. Power on the tape library and allow it to finish its hardware inventory (you can see when it's finished by looking at the LCD screen on the front). Power Puss back on. It should reboot fine with no further action, but just in case, watch it from the command center. Next ...

Open up the NetWorker Console Manager GUI. Click on the devices button, and click on the libraries item in the left frame of the window. Pull down the "view" menu and select "diagnostic mode." Right click on the "rd=puss.ocis.temple.edu:Qualstar" item in the right frame and select the property item from the pop up menu. Be careful NOT to select the PetaSite. Delete the Qualstar library configuration. NetWorker will prompt you to do the delete again to confirm you really want to do it, so do it. You do not need to delete the individual LTO-3 devices though, just the library resource.

To recreate the Qualstar's library resource in NSR, log onto Puss. Get root access and type the following command ...

jbconfig

It will give you a menu. Select the option for auto-detected SCSI library. You will be prompted to enter the name of the tape library. Make sure you name it "Qualstar" with an uppercase "Q" and the rest lowercase. It is case sensitive and you need to create it under the same name as the entry you deleted in order to make it link up properly with NSR's media database. You will be asked whether you want auto-cleaning. Say yes, but if you mistakenly say no, don't worry about it. This should take only a minute or two, after which you will be asked whether you want to configure another tape library, so say no at that prompt.

Wait two or three minutes, then open up the NetWorker Console Manager GUI again. Click on the devices button, and click on the libraries item in the left frame of the window. Pull down the "view" menu and select "diagnostic mode." Right click on the "rd=puss.ocis.temple.edu:Qualstar" item in the right frame and select the property item from the pop up menu. Go into the timing tab and make sure the following values are set for the timing parameters (note that you need to be in diagnostic mode to see the timing tab):

Load sleep 20
Unload sleep 20
Eject sleep 20
Deposit timeout 15
Withdrawal timeout 15
Cleaning delay 60
Idle device timeout 0
Port polling period 3
Operation lifespan 1,800
Operation timeout 1,800

Next, you need to reconfigure how the cleaning cartridge is handled. First, change the location of the cleaning slot from slot 121 to slot 1 since the cleaning cartridge is used by the SunGard operators as a marker point for where the load/export bin is located. To do this, go into the tape library properties tab and in the "cleaning slots" field make sure it says "1-1". Next, change the default cleanings to 50. Then click on the advanced tab. In the advanced tab, change the available tape slots from 1-120 to 2-121. Then go into the configuration tab and make sure there's a check mark on the boxes labeled "auto media management" and "bar code reader" and "match bar code labels." Click on the save button. You are done with the GUI now.

Then, in a terminal window on Puss, issue the following command:

nsrjb -U x -S 1

Replace the value of "x" with the number of remaining cleanings that was indicated in the output of the "nsrjb -C" command that you should have emailed to esg@temple.edu earlier in this process. This might take a few minutes to complete. Just wait.
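
As a guard against typos when substituting the count, the command can be built by a small shell function that rejects anything other than a whole number. This is only a sketch: the function name cleaning_count_cmd is made up for illustration and is not part of NetWorker, and it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetWorker): print the nsrjb command that
# sets the remaining-cleanings count for the cartridge in slot 1.
# It only prints the command; run the printed line yourself on Puss.
cleaning_count_cmd() {
    count=$1
    case $count in
        ''|*[!0-9]*)
            # Empty, or contains a character that is not a digit.
            echo "usage: cleaning_count_cmd <remaining cleanings>" >&2
            return 1
            ;;
    esac
    echo "nsrjb -U $count -S 1"
}

# Example: cleaning_count_cmd 37
```

Printing rather than executing keeps the helper harmless if it is sourced on a machine that does not have nsrjb installed.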

Then type the following two commands (both of which might take 20 minutes or more to complete):

nsrjb -Iv

nsrjb -HEv

That's it. You're done.

Periodic Qualstar tape library cleaning

On the first business day of every even-numbered month, clean the orange pads on the two gripper arms. Each gripper arm has two pads. Cleaning those pads is necessary because the tape cartridges accumulate oils from our skin when they are added to and removed from the library. These oils get transferred to the gripper pads and build up over time. Clean them using the isopropyl alcohol and the cotton swabs that are kept in the tape cabinet. On the first business day of each calendar year, clean the various sensors in the library as per the documentation. This requires canned air, which is also available in the tape cabinet. The cleaning procedure is described in detail in chapter 12 of the Qualstar manual, which is kept in the tape cabinet.

Mounting the X4500 disk device

Every once in a while, NSR will unmount the disk device on the X4500. Note that the disk device consists of two parts: writable and read only. Why NSR does this is a mystery, but it is not supposed to happen. If you notice that the disk device parts are not mounted, mount them manually. Usually it is just the read only portion of the disk device that becomes unmounted, and this happens maybe once every three or four months, if that. I have only seen the writable portion of the disk device unmount once.

 

How do you know that the read only disk device is unmounted? Start nsrwatch on Bootz and look at the device list. In the sample below, notice where it says "readonly adv_file (none)". The "(none)" means nothing is mounted on the read only part. If the writable part of the disk device is also unmounted, it will also say "(none)" on the first line.

sr/disk_device adv_file Disk.001 writing, done
e/_AF_readonly adv_file (none)
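
If you would rather not eyeball the list, the same check can be sketched as a filter over device lines copied from the nsrwatch display. The function name unmounted_devices is hypothetical, and the sketch assumes the three-column layout of the sample above (device name, device type, volume), which may not hold for every nsrwatch window width.

```shell
#!/bin/sh
# Hypothetical helper: read device lines (as copied from the nsrwatch device
# list) on standard input and print the name of each device whose volume
# column reads "(none)", meaning nothing is mounted on it.
# Assumes columns are: device name, device type, volume.
unmounted_devices() {
    awk '$3 == "(none)" { print $1 }'
}

# Example:
#   printf '%s\n' 'e/_AF_readonly adv_file (none)' | unmounted_devices
```

The device names come out exactly as nsrwatch displays them, including any truncation on the left.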

Mounting the disk device parts is easy. Log onto Bootz and get root access.

To mount the read only portion of the device, type the following command:

nsrmm -m -f rd=x4500:/nsr/disk_device/_AF_readonly Disk.001.RO

To mount the writable portion of the disk device, type the following command:

nsrmm -m -f rd=x4500:/nsr/disk_device Disk.001
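
Putting the check and the fix together, the following sketch reads nsrwatch device lines and prints the matching nsrmm command for each part that shows "(none)". The function name suggest_mounts is made up for illustration, the column layout is assumed to match the sample nsrwatch output shown earlier, and nothing is executed: the commands are only printed so you can review them before running them as root on Bootz.

```shell
#!/bin/sh
# Hypothetical helper: read nsrwatch device lines on standard input and print
# the nsrmm mount command for each part of the X4500 disk device that shows
# "(none)" in the volume column.
# Assumes columns are: device name, device type, volume, and that the device
# name may be cut off on the left, as in the sample output.
suggest_mounts() {
    while read -r device kind volume rest; do
        [ "$volume" = "(none)" ] || continue
        case $device in
            *_AF_readonly)
                echo "nsrmm -m -f rd=x4500:/nsr/disk_device/_AF_readonly Disk.001.RO"
                ;;
            *disk_device)
                echo "nsrmm -m -f rd=x4500:/nsr/disk_device Disk.001"
                ;;
        esac
    done
}

# Example:
#   printf '%s\n' 'e/_AF_readonly adv_file (none)' | suggest_mounts
```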

Sending email notifications about backups

There are two ways to send out emails about successful or failed backups. If someone wants to be notified about the backups of just an individual server, such as "po-b", open up nwadmin. Do not use nsradmin for this purpose, because the text in the mail notification field gets munged for some reason (at least it does for me). Click on the "Clients" pull-down menu and scroll down to the client entry of interest. In the client's window, pull down the "View" menu and select the "Details" option. Unfortunately, there is no way to make "Details" a default setting. Then scroll down the client entry window until you see a field called "Owner notification" and put in something like "/usr/bin/mailx -s "xxx daily NSR backup report" addr", where "xxx" is the name of the client and "addr" is the email address of the person who wants to receive the daily report for that client.
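
Because the quoting in that field is easy to get wrong, here is a small sketch that prints the text to paste into "Owner notification". The function name owner_notification and the address in the example are hypothetical; only the mailx line itself comes from the procedure above.

```shell
#!/bin/sh
# Hypothetical helper: print the text for the "Owner notification" field.
#   $1 = client name, $2 = email address of the person to notify
owner_notification() {
    client=$1
    addr=$2
    if [ -z "$client" ] || [ -z "$addr" ]; then
        echo "usage: owner_notification <client> <address>" >&2
        return 1
    fi
    printf '/usr/bin/mailx -s "%s daily NSR backup report" %s\n' "$client" "$addr"
}

# Example (made-up address): owner_notification po-b someone@example.edu
```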

If the person wants to receive Ray's consolidated daily report for all the clients that we back up, subscribe them to the NSR-ISSUES list. Send email to listserv@listserv.temple.edu with the line "add nsr-issues addr name" in the body of the message, where "addr" is the person's email address and "name" is the person's first and last name.
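
The body of that message can be generated the same way. This is a minimal sketch: the function name listserv_add_line and the example person are made up, and it only prints the line for you to paste into the email.

```shell
#!/bin/sh
# Hypothetical helper: print the message body for subscribing someone to the
# NSR-ISSUES list.
#   $1 = email address, remaining arguments = first and last name
listserv_add_line() {
    if [ "$#" -lt 2 ]; then
        echo "usage: listserv_add_line <address> <first name> <last name>" >&2
        return 1
    fi
    addr=$1
    shift
    # "$*" joins the remaining arguments (the name) with single spaces.
    printf 'add nsr-issues %s %s\n' "$addr" "$*"
}

# Example (made-up person): listserv_add_line someone@example.edu Pat Smith
```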

How to set up an NSR client to back up to Bootz via its IP address

In the client resource, click on the tab "Globals 1 of 2". You should see a field labeled "server network interface". If that field is not visible, pull down the "view" menu from the main NetWorker Console Manager GUI and make sure "diagnostic mode" is checked, then go back to the client's resource and the field will be visible. In that field, put Bootz's IP address, which is 155.247.166.26, then click on "ok" to save the change. For an example of this, see the client resource for www.ocis.temple.edu.

