
User Experience: An Unofficial Guide to Tuning &
Troubleshooting TaskMaster (UDP) on NetWare

Submitted by: Allan Clausen - IT Quality A/S (DK)

Are you seeing UDP or NCPE packets?
First of all, you need to make sure you use the UDP protocol. It is significantly faster than NCP Extensions (NCPE), and its packet size is inherently larger, which in turn speeds up communication. NCPE has a packet size of no more than 576 bytes, of which approximately 512 bytes are actual data. UDP supports packet sizes as large as 32768 bytes. And you don't have the luxury of LIP or Packet Burst when doing Server-to-Server data transfers, so UDP is the way to go. Unfortunately, UDP is not a connection-oriented protocol like NCP, but we'll look into that later.

The big question is: How do I get TaskMaster to use UDP over NCPE?

The official way is to add "/+IP" as an option to the command line but that will not guarantee you UDP communication. It just tells TaskMaster to go for this protocol first.

Avanti Comment: By default, TaskMaster will always test for UDP support. The purpose of the "/+IP" option is to allow the User to specify the UDP packet size for TaskMaster to use at the start of the initial communications test. For example, if the SET Largest UDP Packet Size parameter for one Server is defined as 32768 with 16384 defined for the other Server but the WAN link will only support a UDP packet size of 4096 then TaskMaster would normally initiate its communications test using 16384 byte packets, which is the largest supported size for both Servers, and would probably fall back to 4096 or below before it established stable communications. Adding the command line option "/+IP=4096" or "/+IP=4" will cause TaskMaster to start its communications test using a UDP packet size of 4096 bytes, typically reducing the communications test cycle. This optional override is also useful in environments where a larger packet size is possible for short bursts but a smaller packet size will provide more stable communications (i.e., reduce the size for stability).
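The arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not TaskMaster's actual logic: the function name `starting_packet_size` is made up, and both the rule that small values such as 4 are read as kilobytes and the cap at the smaller Server setting are assumptions based on "/+IP=4096" and "/+IP=4" being described as equivalent.

```python
def starting_packet_size(server_a, server_b, override=None):
    """Packet size (bytes) the communications test would start with.

    server_a / server_b: each Server's "Largest UDP Packet Size" setting.
    override: the value given with "/+IP=", or None if the option is absent.
    """
    # Without an override, start at the largest size both Servers support.
    common = min(server_a, server_b)
    if override is None:
        return common
    # ASSUMPTION: values up to 32 are kilobytes ("/+IP=4" means 4096 bytes).
    size = override * 1024 if override <= 32 else override
    # ASSUMPTION: an override never exceeds what both Servers support.
    return min(size, common)


print(starting_packet_size(32768, 16384))        # 16384
print(starting_packet_size(32768, 16384, 4096))  # 4096
print(starting_packet_size(32768, 16384, 4))     # 4096
```

With the override in place, the test starts at 4096 bytes instead of working its way down from 16384.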

TaskMaster starts every replication run by testing the communication between source and destination. If UDP is not stable enough, or not available, TaskMaster will revert to NCPE for guaranteed delivery. And there is nothing you can do about it, with regard to TaskMaster.

There is one issue, though. In earlier TaskMaster releases, once source and destination had settled on a connection protocol and packet size, they were unable to raise the speed again on later replications. So if they started out replicating over a busy line, TaskMaster assumed the line was busy forever. You would then benefit from unloading and reloading TaskMaster on both servers. That seems to have been cleared up in v4.11.

Are your Server settings correct?
As a minimum, go to the FAQ on the TaskMaster web-site and implement the protocol-specific SET parameters (Nagle, etc). Remember to run FLUSH CDBE to save your settings.

The downside of UDP is that you need to find out your maximum UDP packet size. Don't rely on what TaskMaster says on the replication screen. It could be what it "thinks" your packet size is. You need to know!

You will probably be using a WAN line or as a minimum a Router. You need to know the largest packet size allowed for UDP on both the WAN line and the local Router. People usually don't give UDP much attention when configuring routers so the UDP packet size is often set to "Default" which can be anything from 1024 to 32768. And you would go with 32768 if you get the chance. So get on the phone and talk to your ISP WAN provider. Remember this is technical stuff and Joe TechSupp will, in many cases, not suffice. Ask to speak to someone who really knows - not just someone who "thinks" it is set to "maximum".

When you have the magic number, which is probably 16384 or 32768, go and set "SET Largest UDP Packet Size=xxxxx" on all Servers using TaskMaster. And if you have Servers that act as Routers for TaskMaster communication, you should set it there too (got Border Manager?).

Remember, if you are using NetWare Branch Office, it will have a default UDP packet size of 16384 and not 32768 as the full 6.5 will. (When this document was written, SP4 was the active Service Pack.) So remember to set a packet size that is the same for all Servers.

How can you test UDP communication?
Now you know the packet size, but how will you know it's really using 16 KB or more?

It's quite tricky, but a packet trace will do. I personally use Netmon ( http://www.roletosoft.com/download/default.shtml ) as it's easy to use, and never hangs the Server.

Run it, press F2 to choose NIC, press F9 to capture, and F4 to enter the IP address of the destination Server. Press F5 to start capturing.

Send a small file with TaskMaster and stop the capture (F5). You should now see chunks of data at the full Ethernet frame size (probably 1514 bytes, i.e. a 1500-byte MTU plus the 14-byte Ethernet header). Each UDP packet is split into chunks of this size. So if you have a 16 KB packet size, you should see around 11 packets of 1514 bytes and then some smaller packets.

This will only give you a hint towards the actual packet size, not speed or reliability.
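To see where "around 11 packets" comes from, here is a minimal Python sketch of standard IPv4 fragmentation on Ethernet. It assumes that the configured packet size is the UDP payload (the guide does not say whether headers are included), an IPv4 header without options, and a capture that shows frames without the trailing checksum. The name `fragment_frames` is just for illustration.

```python
ETH_HEADER = 14  # Ethernet II header as shown in a capture (no FCS)
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def fragment_frames(udp_payload, mtu=1500):
    """Return the captured frame sizes for one fragmented UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    remaining = udp_payload + UDP_HEADER  # the UDP header travels once
    frames = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        frames.append(chunk + IP_HEADER + ETH_HEADER)
        remaining -= chunk
    return frames


frames = fragment_frames(16384)
print(frames.count(1514), "full frames, last frame", frames[-1], "bytes")
# prints: 11 full frames, last frame 146 bytes
```

Under the same assumptions, a 32768 byte packet size gives 22 full frames followed by one of 250 bytes, so counting the full-size frames between the short ones is a quick way to tell 16 KB from 32 KB in the trace.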

To really make sure, you need to test UDP directly from your workstation. This can be done in many ways, but the following works for me:

Use TFTP. It's the evil brother of FTP and always uses UDP. You probably used it when flashing that annoying Cisco Switch last year.   :-)

First of all you need to load a Server module ( http://www.novell.com/coolsolutions/tools/13770.html ).

Make sure to unload TaskMaster, just so there is no interference. Unpack the TFTPD.NLM to SYS:\SYSTEM and LOAD it with "TFTPD -w". It will then act as a TFTP Server that you can both read files from and write files to.

Now you need a Client, the Pumpkin program works great for me ( http://www.download.com/3000-2085-10424312.html ).

Grab a 10 MB file (JPG or ZIP) and save it as C:\10MB.TST and SYS:\10MB.TST. This is the easiest way if you are not familiar with the Pumpkin Client.

Then try to send (put) and fetch (get) the file between the Workstation and Server. On a 100 Mbit line, it should rarely require more than 3 seconds. On a Gigabit line, it should not require more than 1 second.
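For reference, the best possible time at the rated line speed is easy to work out. The sketch below ignores all protocol overhead and assumes 1 Mbit = 1,000,000 bits per second and 10 MB = 10 x 1024 x 1024 bytes. The function name is only illustrative.

```python
def ideal_transfer_seconds(size_bytes, line_mbit):
    """Seconds needed to move size_bytes at the full rated line speed."""
    return size_bytes * 8 / (line_mbit * 1_000_000)


TEN_MB = 10 * 1024 * 1024
for mbit in (100, 1000):
    print(f"{mbit} Mbit: {ideal_transfer_seconds(TEN_MB, mbit):.2f} s")
# prints:
# 100 Mbit: 0.84 s
# 1000 Mbit: 0.08 s
```

So the 3 second and 1 second limits leave plenty of headroom for overhead; a transfer that misses them by a wide margin is worth investigating.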

Watch the file size as it climbs to 10 MB. If it's too slow all the way through, you might have packet size problems. If it pauses for 10 seconds or more (or aborts), you might have infrastructure problems.

When an error is present, I see great performance for around 500 KB then it just sits there for 30 seconds - no errors are reported but 10 MB requires 30 seconds or more.
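If you would rather script a quick check between two Workstations, a rough alternative to the TFTP test is to send single UDP datagrams of a given size and see whether they come back. The Python sketch below is not part of TaskMaster or TFTP; it assumes you can run the small echo function on the far-end machine, and the names `start_echo_server` and `probe` are made up for this example. Run as-is, it only talks to itself over the loopback interface.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Start a throwaway UDP echo service on a free port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))  # port 0 lets the OS pick a free port

    def loop():
        while True:
            try:
                data, addr = srv.recvfrom(65535)
            except OSError:
                return  # socket was closed
            srv.sendto(data, addr)

    threading.Thread(target=loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe(host, port, size, timeout=2.0):
    """Send one datagram of `size` bytes; True if it is echoed back intact."""
    payload = bytes(size)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(65535)
        except OSError:  # timeout, or the OS refused a datagram this large
            return False
    return data == payload


if __name__ == "__main__":
    server, port = start_echo_server()
    for size in (512, 4096, 16384):
        print(size, "bytes:", "ok" if probe("127.0.0.1", port, size) else "FAILED")
```

A size that fails across the real network while smaller sizes pass points at a packet size limit somewhere along the path. Like the packet trace, this says nothing about speed.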

I have local problems.
If you have problems with UDP communication between the Workstation and Server without passing a router (i.e., in the same building), you need to check some things out:

  1. Do you have a network loop?
    Yes, it should not happen - but it does anyway. Are your users buying $10 mini 5 port Switches? And setting them up everywhere?

    Go through the cabling and make sure no loops are present in the network. Of course it never happens to you - but I have seen loops that accounted for 50% of the UDP problems. UDP packets are prone to this kind of problem, so make sure that network loops are not an issue.

  2. What are the NIC settings?
    When dealing with Server NICs and Switches, you are always in danger of not setting the NIC speed correctly. I have seen many speed settings that "should work" but didn't.

    If you are using Gigabit Switches, make sure the NIC is set to "Auto Detect" as most drivers don't support 1000 Mbit as an option and will need to negotiate for this speed.

    Forcing the NIC to 100 Mbit / Full Duplex on a Switch port that is set to Auto Detect can give you problems on Gigabit Switches (I have seen this on Nortel Switches).

    Don't be fooled by other traffic and protocols. I have seen a case where the Server was forced to 100 Mbit / Full Duplex, the Switch was set to "Auto Detect" and everything performed perfectly, except for UDP. When the Server was set to "Auto Detect", it solved the UDP problems. So don't treat NCP or TCP traffic as a guideline for how UDP traffic should perform on the same connection.

    Also make sure that the Server is downed and powered off when changing the NIC settings. Some Servers and NICs will retain the settings when rebooted.

  3. What type of Switch do you have?
    There are Layer 2 Switches, Layer 3 Switches which operate in Layer 2 mode and fully Layer 3 Switches. Make sure that you are not having Switch problems.

    Load up the Pumpkin Client and try it out between two Workstations on the same Switch as the Server.

    Are you still having problems?

    Try changing the Workstations from forced speed to Auto Detect and vice versa. If that doesn't help, you need to present your findings to the Switch guy.

I have WAN problems.
Now this is not as common a problem as you might think. Normally UDP problems are local. But there are cases of local ISPs that inhibit UDP traffic to "fend off hacker attacks." When pressed for an explanation for this, they seldom come up with a good answer - so make sure your ISP is not filtering out UDP on some level.

Some ISPs also sell solutions where you have a number of physical lines acting as one. Remember that UDP is a connection-less protocol, so you might see that traffic moves through all lines in turn. And if you only have a small line at the destination, something in between might be congested with UDP traffic. But don't take my word for it, I have only seen it as a theoretical problem.

The problem with WAN communication errors is that they are hard to troubleshoot. Your ISP does not have UDP problems that often and they are sometimes tempted to ignore them as "one of those problems." You might need to set up 2 Workstations, one on each side, just to take "Novell" out of the equation, as that word seems to stress ISP tech guys needlessly.

In conclusion.
Try and work out a baseline for how fast you think your traffic should flow. Use a Workstation and copy the same file through the Novell Client.

Use 10 megabytes as a guideline. And make sure you use a JPG or ZIP file - otherwise TaskMaster might be compressing your file. You should see your workstation having the same speed as TaskMaster - or just a bit faster.

Avanti Comment: Copying directly between a Workstation and a Server should always be faster than copying between two File Servers. The Workstation benefits from the fact it is basically performing a dedicated task (i.e., its primary focus at the time is copying the file) while Servers must yield processing to the dozens, even hundreds, of active processes and service other client requests at the same time. Unless the file data happens to already be in the Server's Cache Buffers, dedicated Workstation disk reads will always outperform shared Server disk reads so merely accessing the data on a Workstation will be faster. And, in most cases, Workstations tend to be upgraded with newer and faster hardware technology than Servers, providing Workstation-based copy operations additional performance and resource advantages.

Also review the speed of the WAN line and see whether you are within at least 80% of the theoretical bandwidth maximum.

Avanti Comment: It is important to note that stated bandwidth speed is a theoretical maximum and not guaranteed as sustainable. In fact, sustaining anything above 60% of the stated bandwidth on Ethernet is most often considered efficient throughput due to Ethernet's design and collision avoidance overhead.
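To compare a measured copy against the 80% and 60% figures above, divide the achieved bit rate by the rated line speed. A minimal Python sketch follows; the 55 second timing is a made-up measurement for illustration, not a result from this guide, and 1 Mbit is taken as 1,000,000 bits per second.

```python
def utilisation(size_bytes, seconds, rated_mbit):
    """Measured throughput as a fraction of the rated line speed."""
    achieved_bits_per_sec = size_bytes * 8 / seconds
    return achieved_bits_per_sec / (rated_mbit * 1_000_000)


# Hypothetical measurement: the 10 MB test file took 55 s on a 2 Mbit line.
share = utilisation(10 * 1024 * 1024, 55, 2)
print(f"{share:.0%} of rated bandwidth")  # prints: 76% of rated bandwidth
```

A result like this sits between the two guidelines: short of the 80% target, but above the 60% that the comment treats as efficient.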

Don't be scared that TaskMaster reports up to 50 retries on a 10 MB file. It's UDP but it can handle it. I am not able to detect any decrease in speed with 20 retries on a 10 MB file on a 2 Mbit line.

When you have determined the optimal maximum UDP packet size, set all Servers to that size and add the "/+IP=[Size]" option to the command line. That will make the negotiation quicker and you will hit the right settings every time.

And finally, remember that UDP does not work like TCP. Don't compare the two protocols.

Also remember that if there is a Server-to-Server VPN involved somewhere, you really need to make sure you know what's going on in the VPN tunnel.

Happy replication!

Avanti Comment: Thank you very much for taking the time and effort to research and prepare, and for allowing us to share, such a comprehensive document on the subject.