
isham research
Those familiar with the IBM mainframe market over the last three decades will know the term FUD - Fear, Uncertainty and Doubt. Our German friends know it as FUZ - Furcht, Unsicherheit und Zweifel - and enjoy the fact that it sounds like "Furz", German for "fart".
Some classics have passed into folklore, like the IBM salesman comparing the airflow of an Amdahl mainframe with a domestic vacuum cleaner, and leaving his prospect to wonder if an air-cooled system really was as loud as a thousand howling Hoovers.
To earn the name, FUD has to be properly structured. There should always be a grain of actual or supposed truth at the heart of each point - this is extrapolated a little, and the recipient of the FUD (the "FUDee") is then left to continue this extrapolation. It's a fact - air-cooled CPUs are louder than water-cooled ones - but who sits next to them? Amdahl could point out that an air-cooled machine stayed up long enough for a graceful shutdown if the cooling failed - water-cooled systems didn't. Final proof of the pudding: all IBM's modern systems are air-cooled.
During the 3380 era, all of IBM's competitors used rotary actuators - according to IBM FUD, a technology devised for PCs and unsuitable for mainframe use. Not enough ball bearings, for one thing. Yet even at the time, IBM's 3081 used a rotary actuator for its microcode storage - and today linear actuators (and their ball bearings) have vanished from the planet.
So FUD - even if plausible - isn't always true, and FUDers very often wind up using the same technology as the FUDed. Just as soon as they realise it's better.
It is astounding how often FUD exhorts people to deny quite obvious truths - it can demonstrate convincingly that something can never work, even though there may be hundreds or thousands installed and working without a word of complaint from their users. Features are presented as problems - even though no one in a large installed base has ever reported them.
This first piece of emulation FUD - from UMX - is very crude by IBM standards and, in contrast with most of the PCM era FUD, flat wrong in several respects. It also contradicts itself in a couple of interesting ways:
FLEX-ES versus UMX Virtual Mainframe - A Comparison
Version 1.0 - October 2, 2002
by Pete Wilson, Founder S/SE
This neatly delivers the first conundrum. Since this web page was uploaded, versions of this paper as old as August 2001 have arrived via email. So either the date or the version is not correct - and the author certainly is not.
It can be just as confusing working out when UMX was founded. Was it 1994, as some p/r claimed, or 1999, as the current company presentation states - or had UMX been "working on its mainframe in a box for ten years" in 2000, as other p/r claims?
Those familiar with IBM mainframes know that there is a combination of hardware, microcode and software that together make up the mainframe computer. This document describes the differences between two "software mainframe" methods: FLEX-ES and UMX Virtual Mainframe.
The information contained in this communication is confidential and may be legally privileged. It is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful.
Afraid or ashamed?
Several people have reported getting a copy by ticking a box on the Demo CD order form. One person got it without ticking the box. Hardly a state secret.
S/390 Emulation
S/390 mainframe emulators perform quite similar tasks in that they translate S/390 instructions into Intel instructions and execute them. This makes the environment appear to the Operating System and its applications as a true S/390 architecture.
There are two commercially available S/390 mainframe emulators: UMX Virtual Mainframe from UMX Technologies and FLEX-ES from Fundamental Software. The method deployed by these companies in their software packages is vastly different. This affects how the software performs the tasks it was designed for as well as the performance potential of the system.
Both UMX Virtual Mainframe and FLEX-ES emulate the ESA architecture but they use different methods: FLEX-ES utilizes a compiler approach that interprets a block of S/390 instructions into Intel instructions and then executes them. UMX employs a "micro-program-automat" technique that places a layer of microcode between the software and hardware to enable S/390 instruction execution.
Flex-ES actually emulates IBM's System/370, ECPS:VSE, ESA/390 and/or 64-bit z/Architecture in any combination on a single system - something impossible on 'real' hardware. UMX is currently stepping through point releases of VMF Release 4 - Release 4.5 is scheduled to have security enhancements, Release 5 is unspecified, and not until Release 6 will it have 64-bit support.
FLEX-ES Method
FLEX-ES utilizes pre-compilation to interpret and execute S/390 instructions. The software takes a section of S/390 instructions, up to a predetermined limit that is typically 4K in size, compiles it into a sequence of Intel instructions or metacode and then stores these instructions in internal cache for future reference and execution. Compiling the S/390 codes into Intel instructions and executing the Intel instructions is the method FLEX-ES uses to accomplish work.
If a fact is used in a FUD argument, it's critical that it's accurate - and the above is not. Flex-ES precompiles code in sections that reflect the size of a real z900's cache lines, and actually mirrors its behaviour - storing into a cache line invalidates it in both systems. If there are instructions in such a cache line, a z900 has to refetch it and Flex-ES has to recompile it. A great deal of software has been optimized to cater for this, and Flex-ES exploits these changes for even higher performance. Pre-compilation also allows optimisation for the changes in microarchitecture in succeeding Intel processor variants.
Overall the process is referred to as Binary Translation - note the reference to Flex-ES in this IBM paper.
(Earlier versions of the document said ".. most likely 4k in size" - still wrong, but at least admitting of doubt. The revision turns this into an egregious error.)
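The invalidation behaviour described above can be sketched in a few lines. This is a toy model, not Flex-ES code: the `TranslationCache` class, its method names and the 256-byte line size are all assumptions made for the illustration. It shows only the principle that translated code is kept per line, and that a store into a line holding translated code forces a recompile the next time that line is executed.

```python
# Toy model of a binary-translation cache with store invalidation.
# All names and the line size are assumptions for illustration only;
# nothing here is taken from the Flex-ES implementation.
LINE_SIZE = 256  # assumed line size in bytes


class TranslationCache:
    def __init__(self):
        self.blocks = {}   # line number -> placeholder for translated code
        self.compiles = 0  # how many translations have been performed

    def _line(self, address):
        return address // LINE_SIZE

    def execute(self, address):
        """Run code at `address`, translating its line only if not cached."""
        line = self._line(address)
        if line not in self.blocks:
            self.compiles += 1
            self.blocks[line] = f"translated-line-{line}"
        return self.blocks[line]

    def store(self, address):
        """A store into a line discards any translation held for that line."""
        self.blocks.pop(self._line(address), None)


cache = TranslationCache()
cache.execute(0x1000)   # first use: line is translated
cache.execute(0x1004)   # same line: served from the cache
cache.store(0x1008)     # store into that line: translation invalidated
cache.execute(0x1000)   # next use: line is translated again
print(cache.compiles)   # 2
```

Code that keeps its stores away from its instructions never triggers the second translation, which is the optimisation the commentary above refers to.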
UMX Virtual Mainframe Method
UMX Virtual Mainframe uses the microcode approach, much like "real" S/390 mainframe hardware. A conceptual way to describe this is that each S/390 instruction or a part of it has a microcoded function responsible for interpretation and execution of this S/390 instruction.
The microcode attempts to predict future instructions based on the use pattern and optimizes the code path according to this intelligence. For example, a Compare instruction is most often followed by a Branch instruction, thus the potential for optimization.
It's a leap from a "microcode approach" to actual microcode. Intel hardware instructions are used in both emulations. Because UMX doesn't precompile the emulated stream, each mainframe instruction must be emulated as it is encountered. It's true that branch results can be predicted (a BCT or BCTR with a non-zero branch target is nearly always taken) but this is of no use whatsoever except in keeping a hardware pipeline loaded. IBM has published an excellent paper on the z900 processor design, but this has nothing to do with UMX.
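The difference between emulating each instruction as it is encountered and translating a block once can be shown with a toy example. The opcodes, the three-instruction loop body and the function names below are hypothetical, and the sketch counts only decode operations. It makes no claim about the measured speed of either product.

```python
# Toy comparison of per-instruction interpretation and translate-once execution.
# Opcodes, program and function names are hypothetical illustrations.
PROGRAM = [("ADD", 1), ("ADD", 2), ("SUB", 1)]  # a loop body
ITERATIONS = 1000


def interpret(program, iterations):
    """Decode every instruction every time it is executed."""
    acc, decodes = 0, 0
    for _ in range(iterations):
        for op, operand in program:
            decodes += 1
            if op == "ADD":
                acc += operand
            elif op == "SUB":
                acc -= operand
    return acc, decodes


def translate(program):
    """Decode the block once and return a single callable for it."""
    decodes, delta = 0, 0
    for op, operand in program:
        decodes += 1
        delta += operand if op == "ADD" else -operand
    return (lambda acc: acc + delta), decodes


def run_translated(program, iterations):
    block, decodes = translate(program)
    acc = 0
    for _ in range(iterations):
        acc = block(acc)  # re-use the translated block; no further decoding
    return acc, decodes


print(interpret(PROGRAM, ITERATIONS))       # (2000, 3000)
print(run_translated(PROGRAM, ITERATIONS))  # (2000, 3)
```

Both routes produce the same result; the interpreter simply pays the decode cost on every pass through the loop.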
Performance
Let's take a closer look at the advantages and drawbacks of each method. The FLEX-ES method of pre-compilation appears to be extremely effective, as it appears to run at near Pentium speed. However, this is somewhat artificial under modern operating environments and production systems.
This is where life gets to be fun - someone who has never benchmarked, or probably even seen, a system that is the product of many years' research and is installed in hundreds of sites worldwide is going to get out a cigarette pack and scribble all its faults on the back.
First, production programs are extremely I/O intensive. Nearly all programs process, query or modify data utilizing access methods like BSAM, QSAM and VSAM or database calls. These programs make numerous Supervisor Calls (SVC) to I/O support routines, invariably creating a long jump outside of the block of code that FLEX-ES pre-compiled. This causes a new compilation and write to cache of the SVC routine. Under a heavy load, FLEX-ES runs out of internal cache to store the pre-compiled blocks of code and performance degrades exponentially as more compilation occurs.
This is remarkably self-contradictory. The entire point of caching is that code sections are retained for rapid re-use - so the more frequently a section of code is used, the greater the benefit. There simply is no recompilation - it's still there. The more often you go back, the more likely it is still to be there - and the less cache you need in total. Why would you run out of cache by using some entries more frequently? The opposite is the case.
But it's all built on sand anyway. H. Pat Artis, author of the PA I/O Driver and well-respected performance guru, has examined the Flex-ES I/O subsystem in detail and published a White Paper. The reader is left to judge whether the author of this FUD or Dr Artis is the greater authority on I/O system performance.
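The caching point is easy to check with a small simulation. The sketch below assumes a simple least-recently-used cache of translated blocks. The replacement policy, the capacity and the two reference traces are all invented for the illustration and are not a description of Flex-ES. It compares a workload that keeps returning to the same SVC routine with one that never revisits a block.

```python
from collections import OrderedDict


def hit_ratio(trace, capacity):
    """Fraction of references served from a small LRU cache of blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True            # "compile" and keep the block
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace)


# Hypothetical traces: an application block calling one SVC routine repeatedly,
# and a stream that touches a different block on every reference.
svc_heavy = ["app", "svc"] * 500
no_reuse = [f"block{i}" for i in range(1000)]

print(hit_ratio(svc_heavy, capacity=4))  # 0.998
print(hit_ratio(no_reuse, capacity=4))   # 0.0
```

The SVC-heavy trace compiles each block once and then hits the cache on every later reference. It is the workload with no re-use at all that defeats the cache.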
Second, in modern operating systems, multiprogramming plays a huge role in performance. As the multiprogramming workload increases there is a corresponding increase in branching, so caching and recompiling activities increase. Eventually, these resources become constrained.
Actually, multiprogramming produces a reduction in branching, which is to a degree supplanted by other types of state change. Factors such as support for an adequate equivalent to a Segment Table Origin Register Save Stack (to preserve TLB entries across address space changes) and similar things take on a greater degree of importance.
Finally, if a program modifies itself this will invalidate the pre-compiled code, creating another instance of recompiling and caching. FLEX-ES cannot efficiently handle this self-modifying code because there is no code section to pre-compile; it is invalidated as soon as the S/390 code is modified. Most code in modern operating systems is reentrant so it appears that this argument is not applicable. But production workloads are I/O intensive and if you examine I/O processing, most channel programs are self-modifying. An operating system like OS/390 utilizes a large amount of Program Controlled Interrupts, thus the I/O process under FLEX-ES appears to be a single-threaded operation.
Well, a couple of paragraphs back the main problem was Supervisor Calls and I/O - yet there's no code on the planet more read-only than IBM's interrupt handlers. Quite what channel programs have to do with code compilation is unclear - and that isn't how Flex-ES handles them anyway.
These architectural features create huge performance peaks in FLEX-ES implementations. In the old days, it was called "the knee of the curve" where response degraded exponentially at high resource utilization. FLEX-ES's performance and the resulting response are good at moderate loads but will tend to decay exponentially at high multiprogramming workload levels. Fundamental's own tests show non-linear degradation under OS/390 or VSE when run in more than one region/partition.
Since there are no queues in Flex-ES, it's hard to see where queueing delays might occur.
The microcoded environment of UMX Virtual Mainframe is an entirely different software model with none of the preceding problems and associated bottlenecks. Specific microcode programs deal with each S/390 instruction or piece of instruction to optimize performance by look-ahead, caching and/or parallelism. This process is much like a real S/390 mainframe that also utilizes microcode as a layer between the software and hardware. Thus, UMX Virtual Mainframe interprets S/390 instructions using its specific architectural model and performance peaks at high resource utilization are not nearly as dramatic as FLEX-ES Emulation. Performance is extremely linear even up to and including 100% resource loading.
The UMX Virtual Mainframe approach may create somewhat higher overhead at low system loading. Since the entire environment is created to service all workloads whether large or small, small workloads will have a bigger proportion of emulator overhead. This does not affect response time as resources are available and can be thought of as similar to "the low utilization effect" in MVS and OS/390.
Getting 8 MIPS out of an IBM eServer x250 - even a two-way 700MHz system - is nothing to be particularly proud of.
A simple example can be used to visualize the performance of the two systems. Compare a minivan and sports sedan. A sports sedan like a BMW will transport up to 4 passengers somewhat faster or at least as fast as a minivan, but what about 8 passengers? It takes the same amount of time for a minivan with only 4 passengers or twice the load. That can't be done by the BMW, even though it is a capable machine for its purpose.
A curious analogy - your product is a BMW, mine's a van. BMW sold 905,000 cars in 2001 - the leading brand of minivan only 142,000. The order is right - but Flex-ES has sold hundreds of copies, and the fingers of two hands are still enough to count UMX installations.
Another interesting point to note is that the FLEX-ES emulator utilizes a just-in-time compiler of Fundamental's design. UMX uses the industry-standard Intel compiler products because it is well known that compiler developers optimize their code for a specific processor. Obviously, the Intel developers create better optimization for their own CPUs. UMX developers have verified this by noticing significant performance gains just by switching from another vendor's compilers to Intel's.
Entirely true. But as Dr Gene Amdahl pointed out, a tiny fraction of the code accounts for most execution. Hand-optimizing a portion of the code - a process that must admittedly be repeated for each iteration of the Intel processor range - yields benefits of the order of 2:1 over even the best compilers.
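The arithmetic behind this can be sketched with the standard Amdahl's law formula. The 2:1 gain on the hand-optimized portion comes from the paragraph above; the fractions of execution time spent in that portion (90% and 50%) are assumed figures chosen purely for illustration.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: overall gain when only part of the run time is sped up."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Assumed figures: the hot code accounts for 90% or 50% of execution time.
print(round(overall_speedup(0.9, 2.0), 2))  # 1.82
print(round(overall_speedup(0.5, 2.0), 2))  # 1.33
```

The more execution is concentrated in a small body of code, the closer the overall gain comes to the full 2:1.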
User Interface
UMX Virtual Mainframe is a robust Microsoft Windows NT, 2000 or XP application and utilizes Windows functionality for the user interface. Installation, operation and usability are relatively simple using a familiar Windows style interface. UMX Virtual Mainframe installation is typically 1-2 hours. Adding disk is a simple Windows file creation function and large files can be compressed to save space if the NTFS file system is employed. Network administration is easy for both local loop back and remote TCP/IP emulated sessions.
This is a "no contest" technical difference - Flex-ES executes under flavours of UNIX or Linux, not Windows. Each operating system has its advocates, though Linux has recently started to gain mindshare and is even mandated by some European public authorities. No apology is necessary for picking Linux - quite the reverse.
FLEX-ES employs a "Unix command window" interface where the operator keys long command streams for operation. It is a Unix variant that requires several days of expert installation. After installation, the experts have recommended that the user should attempt no system modifications.
The above may have been true for the modified Dynix-PTX variant of UNIX used with the IBM x430 Enabled for S/390. In most cases, the most an operator will have to enter is an occasional "shutdown". Command line interfaces also avoid confusion like the wonderful 40-post thread about the precise meaning of icons on a Multiprise. The IBM mainframe operator is used to unambiguous system commands.
FLEX-ES has limited scalability of emulated disk devices because of the high degree of difficulty and interruption to operations. An upgrade to capacity requires a few days to back up ALL the disk data to a suitable location and then restoring that data once the logical files have been redefined.
The idea that the Linux Logical Volume Manager has limited scalability is ludicrous. And anyone who makes any major change to a business-critical system without one or preferably two backups is frankly reckless. If free disk space is available on the host, adding emulated disks to a Flex-ES instance is truly trivial and does not even require the host to be rebooted. Indeed, if Flex-ES is supporting multiple instances, only the one affected need be stopped. A major reorganisation of a RAID group will require the data to be reloaded - but this is no different from any other RAID system.
There are a few other salient points that differentiate Flex-ES from UMX's VMF. Flex-ES is approved by IBM, has been marketed by IBM on the x430 NUMA-Q system, has been run against IBM's certification engine and uses very similar remote support facilities to IBM's own mainframes. It's supported by custom communications and channel cards that - in contrast with commodity PCI cards - preserve parity throughout. It's also the subject of several IBM Redbooks and a Redpaper:
But at the end of the day, it's what's delivered to customers and what they think of it that counts. There's a Flex-ES mailing list open to all who register - why not search the archives for all the reports of the disastrous performance problems UMX describes? A query on Flex-ES I/O performance might be a good start.