From Here to Modernity

Five Brave RPG Programmers Move from PDM/SEU to WDSc

April 23, 2006

Varieties of IO Strategies

For many, the idea of modernization includes at least two things: 1) moving from green screen to GUI, and 2) developing a layered application architecture where business logic is separated from user interface logic, on the one hand, and IO access logic on the other. A post at the end of the prior thread of this blog asked about four different IO strategies which were under consideration,

and I think this topic of various IO strategies serves as a good excuse to start a new thread.

Greg posted these questions:

Here's some simple changes that have been suggested to me. I'd like to know which of these you'd choose:

For a critical table, important to the business, do you

1. remove F specs and replace IO w/ SQL in each PGM or module

2. move IO to a SRVPGM and have a procedure return the record format

3. move IO to a SRVPGM and have several procedures which return various data structures

4. move IO to a SRVPGM and have field getters and setters

Doesn't a company risk having its database design go stale by coupling hundreds of programs to its tables and making change expensive?

Posted by at April 23, 2006 5:19 PM

Comments

Greg, of the four options you list, I have used option 3: IO service programs with several procedures. However, rather than "returning" data structures, they are "shared": the calling program or procedure imports the data structure exported by the IO service program procedure. What is "returned" is a small data strucure containing success/fail and other control information; what is "shared" is the data set. The data set is not a record format, and the relationship of data sets to tables is many-to-many.

We used this approach to develop a layer of IO service programs for data access to be used by middle tier business logic programs or procedures. The IO procedures were all embedded SQL.

I think setters and getters could be included in this approach where it seemed appropriate, but we shared data structures between the IO and BL layers, and the business logic modules would do their getting and setting from the data structures.

This was solely an RPG and SQL application; we were not concerned with interfacing with Java. An IO layer which could be used by both RPG and Java might be better implemented with internal SQL stored procedures which could be called by either Java or RPG. This might be an option 5 for you to consider.

I do not see options 1 and 2 as being very attractive.

And yes, of course, tightly coupling your business logic to your database makes database changes expensive. The virtues of a layered architecture are pretty clear, I would think.

BTW, Joel Cochran has some useful online articles about this at the IT Jungle website. His approach is similar to, but not identical with, the approach I have experience with.


Posted by: Max at April 23, 2006 5:29 PM

Our shop has decided on method 2. 1 module for each file. 1 Service program for files that can be grouped in a functional area. This I believe is the first step of separation. My only problem with this method and the others is access is not based on the function performed. I believe database access should not be closely tied with the business logic at all. Functional calls to a "Technical Services" layer would be more appropriate IMHO. Now you have functional calls that match the process you are performing. The database access is part of the technical services layer which gives true separation.

The functional call would return just the information the business logic requires. In some cases just a status code and in others data that would be displayed or manipulated by the business logic. This also requires a change to SQL instead of RLA to decrease the need for recompiles.

Sounds like alot of programming, but if you look at your existing code most of this logic already exists.

I don't think any of the options are attractive because what value have they really added? They could be seen as steps to a better goal, but if I still have to recompile and haven't gotten separation. Then what have I done?

I believe database separation has to be taken in steps, but you can't take 1 and then say you've achieved your goal.

Just my 2 cents

Posted by: Rick at April 23, 2006 9:21 PM

Option 2 seems to be the preferred option among the 200 or so RPG developers in my shop. My problem with option 2 is that it does not achieve any decoupling from the record format. A change to any field means a recompile for all the programs and modules that use the record format, just as if they still had the F spec entry. Isn't one of the main advantages of a layered approach the insulation of objects in one layer from change in another? Only the most elementary Java applications would allow a change to the database to necessitate a change or recompile of the Java.

Posted by: Greg at April 24, 2006 10:01 AM

Yes, the problem with option 2 is that it unnecessarily imposes the record format concept on SQL data access. Unless there are compelling reasons for doing this, it limits the payback from an SQL-based IO layer.

The strength of a many-to-many relationship between data structures and database tables is that the data structures serve the business logic needs. Changes to the database need not require changes to all data structures which draw columns from changed tables, and business logic modules which call procedures for data structures are even more insulated from database changes.

Posted by: Max at April 24, 2006 11:02 AM

Option 2 when I've seen it done, and I'm sure most had this in mind when they cited it, was a bit more function oriented than a literal chain to one file and return the record format of that file.

You could think of the data structure coming back as a function data structure, the source and computation of each field can change but it's essentially an interface that your business logic requires. It does decouple from source and format for the future, but provides what the logic was counting on however it is assembled.

This might include for example the server assembling a result set of detail records to return with the master record. The point being that it's not the literal format of the data structure that is depended on but the overall aggregate of data structure that logic is written to compute with. The source may change, but the logic still requires the aggregate data structure.

Option 2 returns the entire set so that custom structures aren't required to be created and kept in synch with myriad uses of each server, but option 3 does fine tune the return with custom data structure calls. This is the option I used in my architecture that I wrote of a few threads ago.

It is more fine tuned than returning everything but bundles everything that would require a number of gets and sets, too many to be practical. But it also requires importing common data strcutures with which to read and write in the server and client programs, so non-trivial changes to the logic requires a recompile of the server program.

Whether option 2 or 3, the data structure wouldn't be the identical record format. What SQL advocates do is use very large standard size fields, a tradeoff of size across the board, storage, parms, and computation, but not needing to resize to meet additional requirements, say quality specs to the thousandths instead of the hundredths as the larger fields would usually have enough room to handle additional requirements.

So fine, just set up everything huge like they do and voila, look ma, I can do that too. What a brave new world. And business wonders why computers that are a gazillion time more powerful than what they used to have are slower.

rd

Posted by: Ralph Daugherty at April 24, 2006 6:41 PM

During my career, which has been over for a couple of years now, I found the simple chain or read sequence within a program to be more than sufficient. During my time I designed countless applications, wrote dozens of moderate sized and large systems involving a total of thousands of programs. I was writing highly modular code back in the early 1990's using called programs. Even on the earliest AS/400's, response time was subsecond and much faster than comparable monolithic programs.

The above mentioned approaches just look a little pompous and self important. Why not concentrate on better learning how the business works instead of how overcomplicated you can make an otherwise simple problem? If I were managing your IT department, I would be thinking of this as busywork and of you as a potential means for labor cost savings in the next downsizing.

Posted by: AnotherView at April 25, 2006 7:25 AM

I think your opening paragraph makes you read more than a little pompous and self-important.

You talk about your experiences writing modular code and then you rail against someone who's question boils down to: "What's the best practice for modularizing my code?" Once that best practice is found, it simplifies the development and maintenance of solutions to satisfy business requirements. Or they could just hack together some code to meet the business requirements and leave it at that.

To wander back onto topic, I think #1 is pointless (it will make it harder to identify which programs are affected by a change), #2 is probably the most practical, #3 is ideal (by delivering subsets of files, you reduce the likelihood of a field change affecting a program that doesn't use the field), and #4 is just plain silly (it's a simulation of OOP in RPG, which would introduce so much overhead as to eliminate the productivity advantage of writing in RPG in the first place....if encapsulation is the goal, it'd make more sense to use a language that was built for encapsulation).

Posted by: rdean at April 25, 2006 6:18 PM

Several comments seem to imply that it is fruitless to attempt to modernize RPG applications. I think that in large part they are right but only because the pool of RPG programmers is made of people that program without the goals that come naturally to the better OO programmers. Concepts like "DRY" (Don't Repeat Yourself), "TDD" (Test Driven Development), and accurately and succinctly modeling the business in software. I have almost twenty years on IBM midrange but all I know about software development as an engineering discipline I learned working with the Java people my company hired six years ago.

Do "highly modular" RPG subroutines, procedures and programs model the business in a manner such that the code isn't tightly coupled to the files and fields as defined in the DDS? That would be modular. And what if the RPG didn't always have the same work fields, strings and literals defined repeatedly in module after module? That would be modular. What if the code to do each action were in the code just one time? That would be modular. Will we ever see the day where RPG code in any way reflects modern software practices? Not likely.

Ancient Chinese wisdom - the last to know of the water are the fish. You have to know of an alternative to understand where you are.

Posted by: Greg at April 25, 2006 7:46 PM

Yes, a curmudgeon after my own heart. I find the theoretical and never ending separation into abstraction to be silly when no one anywhere really wants to go to another OS and database, they just talk that way. Then they write abstracted code that only works on their OS and database. Anything that isn't, isn't worth running.

My particular work was a series of business logic servers based on green screen programs to interface to a real time shop floor control system, and yes, the chains and reads and updates were right there, the core of the logic. But when massive critical enterprise logic is only in green screen programs, isolating it out to where interactive, batch, real time, and GUI interfaces can call it is definitely not busy work.

Although it will make you busy.

rd

Posted by: Ralph Daugherty at April 25, 2006 8:43 PM

I began my IT career writing RPG II on a Sys/3 Mod 10 back in the summer of 1973, AnotherView, and like you, I have designed and implemented countless applications using reads and chains for IO, and before RPG IV was released in the mid-90s we had to accomplish our modularity using subroutines and program calls. We did the best we could within the limits of the RPG language at any period in its intriguing history.

But in 2000/2001 I was involved in a project designing and implementing an ILE RPG system for a major financial company where we had a layered architecture with RPG/SQL procedures in service programs providing the database layer. That experience convinced me that a layered architecture can deliver on its promises for a procedural language like RPG as well as for OO languages.

Each company must do what best fits that company's environment and culture, but I think the desirability of a layered architecture has become a recognized best practice, and I can't imagine a new system of any significance being written today which does NOT adopt a layered architecture of some sort.

I know from personal experience that modern, highly modular, layered applications CAN be written in ILE RPG. It has been done; it is being done. For new RPG applications we should expect this. A more interesting challenge is how to modernize older RPG applications which are not layered and not as modular as they could be now (they may pre-date ILE RPG IV), IF it makes business sense to modernize them because they continue to provide great value and need to be blended with newer applications and technologies.

But Greg is probably right to suggest that there may be limits to the extent to which we can transfer best practices from the OO world to the procedural world of RPG. But I bet the folks at IBM who manage the evolution of RPG are doing their best to push out these limits. The challenge is on us RPG developers to use modern RPG in a modern way.

Posted by: Max at April 25, 2006 9:14 PM

I have read most of the responses so far and I don't think my following opinion has been addressed, but forgive me if it has because I was skimming :-)


The number one thing we need to make sure of is that we are solving a problem! If your order header table is changing on a monthly basis then by all means encapsulating it is a good approach to ease recompilation efforts. THAT DOES NOT MEAN EVERY OTHER FILE ON THE SYSTEM SHOULD BE ENCAPSULATED! I see this all over the place with RPG programmers that get that first dose of "service program encapsulation cool-ness". Make sure there is need in each case before taking action.


The second point I want to make (which was covered above) is that modular code, whether it includes DB access, business logic, or both, should be based on need. If you don't need it then don't build it. To put forth an initiative to encapsulate every stinkin file or piece of business logic without even knowing if it will be reused is a plain waste of time. Sure you could label it as an "investment", but remember, investments need to pay out in the future otherwise the person who made the "investment" needs to stop being the broker making the ideas and authoring IT program structure direction.


There, I got it off my chest :-)
Aaron Bartell
http://mowyourlawn.com/blog

Posted by: Aaron Bartell at April 26, 2006 9:27 AM

Aaron, your opinion is very appreciable.
It is full of good old "common sense".
Quite uncommon nowadays.

Posted by: Claudio Cuzzi at April 26, 2006 10:50 AM

OK, I've read some of these threads and heard everything from we should all be fired to let's have common sense. I agree everyone's environment is not the same. But let's be real, about what the AS/400, iSeries, i5 (or what ever you want to call it is becoming. Not many users want their applications on greeen screens. There are some exceptions to that rule, but from what I've seen and heard the user community doesn't want GREEN. So we need to modernize our applications!! Our business logic is great and should be preserved. So how do you separate, some say buy a screen scraper and have at it, other rewrite it in Java. I'm not of that opinion, I believe in some instances we have to change the way we do things. In my development environment, we are struggling with a project to centralize 1400 AS/400's to 1 single box and have all remote users over a VSAT network. Think I want to send a 5250 data stream to 7000 plus users. Hell NO!! Maybe this type of situation hasn't come to your shop yet, but guess what, it will be it's just a matter of time!! So, I think if we don't start talking about these things and DOING some of them, we won't know what the best practices are and guess what happens when you can't deliver a company where they want to go? Just my opinion!!

Posted by: Rick at April 26, 2006 5:05 PM

The possibilities for layering and encapsulation are limited but we should not overlook them. Elimination of duplication would be a huge benefit to the code. When I see a constant repeatedly defined throughout some code I recognize that what it represents is probably important to the business and its function should be encapsulated in a procedure. I see the same IF statements repeated in the code where a record is read and the "code values" in it are compared to "special literals" to see how the record is going to be processed.

Have you heard the joke, "How many programmers does it take to screw in an object oriented light bulb? None. You send it a message and it screws itself in"? That is like the controller functionality we have in Java. Some benefits of a controller rely on polymorphism but, an RPG application could still encapsulate the controller function well enough. I see a lot of RPG.

I reviewed several SRVPGMs today. The top line in the comments of two say "cloned". I see that as a blatant admission of failure. The clone wasn't great code to start with and both the original and the clone have to be maintained for 10 to 15 years. Another way to eliminate duplication is to define surrogate keys in your tables. Programming is all about DRY - don't repeat yourself.

Posted by: Greg at April 28, 2006 9:01 PM

For more years than I can remember we have written modular apps, even way back on the S/34 and S/36. The 400 let us extend that into called programs dynamically bound, where explicit programs deal with explict business functions. This always minimised change impact. For years we have used, as required, called file access programs that serve other pgm requirements. File IO in a single pgm used across the entire system.

Anyone who uses monolithic programs (and I have examined some real MONSTERS, like over 3000 pages of compiler printouts) had a problem in the conceptual design in the first place.

IO is a major consideration, and RPG is still probably the fastest way to access and process data. Just like the 5250 data stream is efficient. GUI often isn't, but there are IMHO many people who do not understand the magnitude of data required under different situations. Hence faster machines and slower responses.

We write many cross app solutions, using server programs. Years ago (probably 15) we wrote a MS/ACCESS app that acted as the front end and chatted to 400 server apps, and DBFs with attached programs. Core business logic really belongs in one place, so any solution that scatters business logic becomes a new challenge. But combining the worlds in the right way is ideal, no matter how you do it. Modular is the best base thinking, no matter how you do it. But many apps have mud-ulare designs....

I agree that the 5250 screens often need refreshing, a subject that has been addressed between users and IBM dev-labs on many occasions over many years. But there are times when speed of capture is best on 5250 - again it depends on what the app is and who is using it. There are apps where GUI is best. Mix and match the best to fit the needs and the users is often a simple solution that lets you modernise easily. In other words I have never found one size fits all, or one solution method that is perfect.

Much of the time we have demands from users that deal with extracting information. Fine. We produce solutions that fit many needs easily, using 400 apps, and donwloading into things like EXCEL. Lots of "problems" get solved like this.

There is also a tendancy for the technical providers to focus on what they feel is needed whilst perhaps ignoring the business focus, those aspects that give the business competitive advantage. Thus some apps may need revision, but not all of them. This is the real challenge, to understand where the business needs to go, then decide what components need revision. More companies die because the apps are too restrictive to let the business change quickly. But that may not impact your debtors apps, or your accounting apps.

To respond to Greg's question we use 1 2 3 as well as the "classical" approach. Depends on the need for change now and in the future.

Posted by: Paul Gunner at April 29, 2006 12:53 PM

SQL, it seems to me, offers the possibility of having the module know of only the fields it needs and thus, the module can be decoupled from the fields it doesn't need. I don't want my code coupled to every field in the record format. I want to be as free as possible to modify the tables.

Object orientation makes it possible to closely model the solution domain to the problem domain. I originally thought that this just provided for ease of analysis and development but, having seen a Java project fail to develop an object domain model, the long term maintenance costs this project was going to impose became apparent. The RPG I see fails to encapsulate and reuse important business functionality. At my company, we need to have one SRVPGM that provides functionality over a Customer and another that provides functionality over the products, etc, etc. We have "modular" code but, to me, the modularity seems random, making the costs to maintain this code unnecessarilly high.

Posted by: Greg at May 1, 2006 4:40 PM

If you have a many-to-many relationship between the data structures produced by the IO procedures, on the one hand, and the tables used by the IO procedures, on the other, then you can design the data structures to serve the business logic, and the BL modules are less constrained by database design.

You in effect have a data mapping layer between the database and the application, such that those developers working at the business logic level need have little concern for the database design, as long as the developers working on the IO access layer can provide the data structures and result sets needed at the BL level.

The BL developer does not need to think in terms of tables and files and record formats, etc, but in terms of data sets and result sets as needed by the application. The IO access developer writes the SQL to serve the needs of the business logic level.

This is not what everyone needs to do, nor does anyone need to do it all the time, but this approach has worked very well for the design and implementation of new systems of some complexity.

Posted by: Max at May 1, 2006 8:28 PM

Off topic: news announcement
"IBM Hopes to Stimulate i5 Sales with High-End Price Cuts "

Oh boy, any marketer reading such an announcement raises his high brows. Oh boy, does Ibm still have a lot to learn. I recommend them a beginners course "how to market the AS/400" (huh, oops iSeries, huh oops i system, sorry system i i mean system thing... you know what i mean)

Posted by: ugeerts at May 3, 2006 3:52 PM

..."SQL, it seems to me, offers the possibility of having the module know of only the fields it needs and thus, the module can be decoupled from the fields it doesn't need. I don't want my code coupled to every field in the record format. I want to be as free as possible to modify the tables."

Greg, I think you made an excellent point. To isolate the business layer from the i/o layer in order to limit the impact of change requests on either side, has his merits everybody acknowledges. But a solution discussed earlier like isolating the i/o by means of a service program and to use data structures to communicate with the business layer, has poor quality imho. To access the data file(s), you need that peticular service program. What if you, besides your rpg programs, have a report generator or datawarehouse tool that you want to act on the same interface? Won't work.
Oracle has presented a much better solution. The interface is still SQL based, the programmer access a virtual file (view), but behind the view, a whole lot of business logic (triggers, procedures, functions using pl/sql) is present to transform the interface to the physical data layer. The pl part (p-rocedural l-anguage) means you can include the classical do while and if then else constructs, providing you with a richer set of logical tools than pure sql alone. Once this layer defined, it can be exploited from a java client, an odbc/dot net VB client or any other tool. No need to call a service program with a parameterlist and action code 'insert record' or that kind of BS.

Posted by: ugeerts at May 3, 2006 4:13 PM

Yes, Ugeerts, you have pointed out a key limitation in using a layer of IO service programs to establish the IO tier. The strategy works well if your application is all RPG (or ILE), but presents challenges if you want Java or other technologies to use it.

With DB2, internal SQL stored procedures can overcome this problem just as Oracle stored procedures do. Java as well as ILE programs can call stored procedures, as well as other technologies. iSeries shops which want an IO tier which can service both ILE languages and Java and other products may find stored procedures the way to go.

Posted by: Max at May 3, 2006 4:44 PM

I've been avoiding this discussion because it's getting more and more impractical every day, but I have to jump in here on this last argument.

How is wrapping stored procedures around your database fundamentally any different than isolating them behind service programs? The only difference is that you're also allowing unfettered access to your database for end users.

And don't talk to me about data mining; you should generate the data that needs to be mined and store it separately from your online transaction data. Typically data mining is NOT performed on realtime data, which is what we're talking about isolating here. Note that this is NOT the same as online queries, which extract specific realtime data for immediate reporting; that requires access to the current data, but can be done just fine through service programs.

Instead, we're talking about end users performing ad hoc reporting. This should be done on prepared copies of the data. You should NEVER give your end users carte blanche access to raw fields because they don't know the relationships between all the data. The trend of giving blanket access to raw data is simply horrible, because it locks your database design into your end users' queries. What if I want to change how I store dates? I have to now go and fix every end user query out there.

The fundamental question is whether you want non-programmers to have access to your tables and columns. The answer ought to be a resounding "NO!"

If you said "Yes" in a programming 101 class, your teacher would bash you over the head about "tight binding" between the presentation and data tiers. It's just a bad, bad idea when your presentation layer programs (and especially your end users) know the names of tables and columns in your database.

Joe

Posted by: Joe Pluta at May 3, 2006 9:00 PM

I don't think the use of stored procedures implies that users thereby have unfettered access to the database.

The only experience I have with a comprehensive wrapping of the database involved a layer of IO service programs consisting of embedded SQL procedures. This was an all-RPG application and it worked very well.

I have recently become more interested in the stored procedure option for two reasons: 1) Use of stored procedures by Java as well as RPG seems more straightforward, and the Java call of a stored procedure will likely perform better than Java use of an RPG procedure, and 2) performance.

I am currently working on a Java application which is essentially a decision support application which needs to do focused data mining to get data for the tool. Data requirements are intense and response time is important, so we benchmarked six different data access strategies, four involved SQL and two involved native IO. For this particular application, SQL in general outperformed native IO by about a factor of three. The internal SQL stored procedure gave the best performance.

Some of our data needs can be achieved with direct SQL queries from the app, but the more intense needs will be accomplished using called stored procedures.

We have two reporting tools popular with our users. One uses the approach you recommend, Joe, of preparing a copy of selected data and downloading it to a server dedicated to that reporting tool. The other tool provides direct read-only access to any number of databases, but the access passes through a metadata layer defined and controlled by IT. We use this tool to provide user access to both our DB2 and Oracle databases. User access is controlled both by security and by metadata definitions (they don't see tables; they see query subjects (similar to views)).

Posted by: Max at May 4, 2006 7:51 AM

See, this is the thing I hate. This thread was originally about modernizing legacy applications and has now devolved into yet another discussion of data mining.

Let's be clear: data mining is NOT application programming. Never has been, never will be. Has nothing to do with programming. It's about getting ad hoc access to large amounts of raw data in short periods of time. SQL was designed for this and is and always will be the best answer.

Any architecture you do for data mining should be separate from what you do for OLTP, since the two requirements are so vastly different. Designing one architecture for both is kind of like trying to design a good underwater fighter jet. It don't make sense.

So if you want to use stored procedures for your data mining, have at it. Let me know when you get back to talking about application architectures (at which point I guarantee direct program calls will blow the doors off of stored procedures).

Joe

Posted by: Joe Pluta at May 4, 2006 6:01 PM

The main weakness of stored procedures as a backend model to simultaneously serve RPG and Java concerns is that DB2/400's SQL PL doesn't support consumption of result sets. This limitation can be circumvented by table functions, but it's really hackish. Other versions of DB2 support consuming result sets directly (and simply).

Posted by: rdean at May 4, 2006 6:37 PM

Gee, Joe, you are so passionate about things!

I don't know if the term "data mining" has become synonymous with the production of large result sets, but if it has, then certainly SQL is the proper choice for producing large result sets.

If the production of large result sets has nothing to do with programming, then I suppose all IO access methods have nothing to do with programming, except that they are actions performed by the program.

There are many applications which need large result sets. Manufacturing planning and control systems and supply chain management systems (where my work has largely been concentrated) rely on large result sets from historical and forecasted data in order to do their work. Whether the use of large result sets from time-phased tables is always thought of as data mining I do not know, but is is very common outside of ad hoc access to large sets of data for reporting purposes.

But you are right: for single row or small result set retrievals to support online transaction processing systems, stored procedures would be an odd choice of data access. Therefore the idea of using solely stored procedures to encapsulate a database is unsound. Stored procedures may be an excellent strategy for returning large result sets, but there are better methods for getting small result sets.

Which is why our decision support app will use direct SQL calls from the Java app IO tier for single-row or small result sets, and the call of stored procedures from the IO tier for the generation of the large result sets.

BTW, I had advocated using direct SQL calls for all data access, but my preferred method came in second place to the stored procedure (usually; sometimes it was in first place). But I can't argue with demonstrated facts.

Before our benchmarking, I had generally disfavored the idea of using stored procedures to return result sets. I had thought stored procedures were mainly used either to return a value calculated from the database or to perform a batch process on the database. In fact, most examples of stored procedures from Oracle or DB2 manuals show them as being used for those purposes. But now I have learned that DB2 stored procedures can be the best choice, at times, for returning large result sets. Always learning something new in this field.

Posted by: Max at May 4, 2006 8:49 PM

RDean: In recent years I have written many Oracle PL/SQL programs, and one of the first things I noticed as I began working with the DB2/400 internal SQL stored procedure was that it did not support array processing within the procedure. I suppose this is somewhat related to the inability to consume result sets.

But for just returning result sets, a DB2/400 stored procedure, if it can return the desired result set, can be called by either RPG or Java, and that is an attractive characteristic.

BTW, wouldn't another way of overcoming the limitations of the internal SQL stored procedure be to use an external stored procedure instead? That way RPG could provide the array processing or the consumption of the result set, and yet in the end a result set would be returned to the calling program, whether RPG or Java. (Of course such a solution would only be useful if one wanted to produce a large result set and make it returnable to whatever may call the procedure).

Posted by: Max at May 4, 2006 9:12 PM

"Manufacturing planning and control systems and supply chain management systems (where my work has largely been concentrated) rely on large result sets from historical and forecasted data in order to do their work."

Um, not really. In fact, that's the difference between application programming and data mining. In almost any ERP you're doing work at a very low level, typically the item/location level and in relatively short time increments. You may be reading through milloions of records in any given run, but you're usually generating discrete packets of data (for example, demand for a specific day). The rules are generally complex and data driven and don't lend themselves to a single SQL instruction, no matter how complex. This is why native I/O (which is basically nothing more than good old ISAM) performs so much better (not to mention is much more maintainable).

Data mining, on the other hand, typically executes the exact same operations over thousands or millions or records at once, generating large result sets which are then massaged to generate information. The fields, orders and calculations are ad hoc, so there is little or no ability for the programmer to identify logic shortcuts or strategic denormalizations or any of the other techniques application programmers use to optimize teir algorithms. Thus SQL engines are usually able to optimize things as well as can be expected (although I would still guess that in many cases a good programmer would be able to modify the system in such a way that a given specific query could be faster using ISAM over SQL).

To take a page from quantum physics, application programming is the paricle nature of data access, while data mining is the wave nature.

"If the production of large result sets has nothing to do with programming, then I suppose all IO access methods have nothing to do with programming, except that they are actions performed by the program."

That's not what I said. I said that the generation of large AD HOC sets of data has nothing to do with programming. I thought I was pretty clear about that, but I'll restate my position. When the order, selection and calculation of data is entirely under the purview of the end user, then application programming is not part of the process. The end user effectively becomes the programmer (and you had better hope they actually know how the database is set up).

Unfortunately, if you don't understand this distinction, these two very different business requirements might superficially look the same. And when people confuse the two, they end up making bad architectural decisions... and end up with fighter jets with snorkels.

Joe

Posted by: Joe Pluta at May 4, 2006 9:18 PM

First, Joe, I agree with you about user-controlled data mining. Perhaps I should not have used the term when describing the data retrieval needed by our decision support system.

But I have to disagree about the use of large result sets for ERP and SCM systems. Many such systems which need to use large sets of time phased data to perform their functions will use SQL to build the result sets, and then apply logic to the results as they fetch through them.

In former times I was deep into MAPICS, and I agree with you about the power of native IO, but . . .

Many ERP and SCM systems are not written in RPG, and hence the use of SQL is sort of taken for granted. Our Oracle-based supply chain mgmt system, for example, is a Java J2EE application.

Posted by: Max at May 4, 2006 9:36 PM

"Many ERP and SCM systems are not written in RPG, and hence the use of SQL is sort of taken for granted. Our Oracle-based supply chain mgmt system, for example, is a Java J2EE application."

Again, the topic is devolving from a "best practices" to one where previous choices are dictating the architecture. You're implying that because some people have developed ERP systems on SQL databases that the "standard" approach is to use a result set.

This is circular reasoning. Since your only tool in SQL is the result set, then of course you have to use a result set. But I insist that ISAM access will far outperform SQL in anything but the simplest of requirements generations. Again, it's the particle nature of application programming vs. the wave nature of queries.

And that's why RPG on the System i is hands down the best environment for business applications. In situations where you need ISAM access, you have it, but where you need SQL you have that too. Nobody else provides that.

More and more this blog is starting to (as so many do) become a recitation of your reasons why you do things rather than a research of best practices. I'm not criticizing, simply observing what seems to be an inevitability in the blogosphere.

Joe

Posted by: Joe Pluta at May 5, 2006 6:25 AM

I would hope that this blog can be a place for exploring options and the reasons for choosing various options, and respect that all choices are relative to their specific context.

I was merely pointing out that SQL is commonly used and did not mean to imply that therefore it should always be used. I did mean to imply, however, that it can give acceptable performance.

I understand, Joe, that you are a strong advocate of native IO. Many of my colleagues here at Glazers are also strong advocates of native IO. I respect that preference.

But I am very pro-SQL. Prior to the year 2000 I used native IO almost exclusively; SQL was an incidental alternative used only in special cases. But then I spent a year on a project designing and implementing an ILE RPG application with a layered architecture where the IO layer consisted of service programs containing procedures which used embedded SQL only. Native IO was not permitted by client management.

That project taught me that SQL works very well. There were no performance complaints. The application was largely interactive and users was distributed around the country, all accessing a central iSeries.

Now I don't mean to imply that because SQL worked fine for that project everyone should stop using native IO and start using SQL. But that experience certainly changed my own views about SQL as an alternative to native IO for RPG applications. Subsequent experience as an Oracle developer deepened my appreciation of SQL.

I think the SQL vs. native IO choice is very interesting. My current thinking is that for many interactive applications performance is often a secondary issue to code architecture, because there is little appreciable difference in the performance given by SQL vs. native IO for small data retrievals (if you need a measuring device to notice the difference, it is not appreciable).

But what I find quite interesting is how the use of SQL or native IO can affect the structure of the code and the layering of the application (if this is desired). A lot of the discussions of SQL vs. native IO seem to focus on performance instead of on how IO strategies affect application architecture and program code structure.

For some applications layering may be unnecessary and the use of SQL instead of native IO may run counter to the code structure of the program. For preexisting applications already committed to native IO, the introduction of SQL could disturb the architecture. With new applications performance and architectural concerns can be evaluated afresh and up front.

Two questions I would like to propose for discussion are these:

1. In what ways can the choice of SQL or native IO affect the structure of the program?

2. In those cases where SQL and native IO give comparable performance, what criteria might be used to choose between the options?

Posted by: Max at May 5, 2006 8:15 AM

First, you make some bad assumptions, Max. I am NOT pro-native, at least in the sense that it would make me anti-SQL. I am in all things pro-correct-tool. I firmly believe in the right tool for the right job (I use both native I/O and SQL as needed). I believe that in the absence of external business influences that there usually is a "best" architectural solution; for example, it's hard to beat RPG on the iSeries for programming business logic. At the same time, I also believe that a proper requirements definition can lead to a perfectly sound business decision in program design that isn't architecturally "optimal".

However, what I do hate is when someone gets comfortable in a technology and then uses that technology inappropriately. For example, your client that forbade native I/O. I don't know the circumstances, but I've never seen a place where this requirement was made by technically competent individuals (with the exception of software companies trying to sell the same mediocre code to multiple customers, or consultants trying to charge multiple clients for the same work). Especially since you more than likely were forced to use extensions to the language that were not portable (TOP as opposed to FIRST nn ROWS comes to mind immediately). At that point, the vaunted platform independence of SQL is no longer in place, and without that big selling point the language is nothing more than a really bad substitute for ISAM access.

I won't hash the SQL vs. native debate. I can't; it's simply common sense to use the faster, better, more maintainable solution. I have no common ground with someone who is willing to accept lower performance in order to write code that is hard to maintain and nearly impossible to debug interactively. SQL is the wrong tool for OLTP and those who insist on it -- well, I don't spend much time with people who paint their house with a bulldozer, either.

Joe

Posted by: Joe Pluta at May 5, 2006 7:25 PM

I have spent most of my AS/400 career positioning within large files and reading by key forward and backward, using the data read on each IO to do IO's on additional files and make decisions every step of the way.

But for two years in the early 90's I was the programmer for an insurance company running an insurance package on the AS/400. They wanted additional client/server apps against the AS/400 developed in R:Base in SQL, and that's what I did.

Creating a cursor and reading through result sets was similar to SETLL and reading by key, but the SQL apps were relatively simple and I didn't have to do the complex interaction with multiple files I routinely do in RPG in more complex apps, but I was able to do all my record at a time processing with SQL using cursors.

When I looked at SQL not long ago to do some things on my web site, I saw that IBM has extensions in DB2/400 SQL to do forward and backward reading by key, which looked to me like they implemented in SQL the capabilities of native RPG IO opcodes such as READE and READPE. But it wasn't clear to me that other SQL's had that.

I looked at commercial software and what databases they ran on, and I saw specific databases listed that you could run the software on. In a recent example, thay even said they ran on IBM database, but when you got to the technical specs it was Informix.

One last point: SQL has been around almost as long as RPG. There is nothing "modern" about it. It is not a given to me that "modern" development is in SQL, or that the alleged reason it's being used is that it is database independent. It's not.

So we have an OS/400 computer and RPG native IO that's the best at transaction processing of massive data in the world (better even than COBOL on the mainframe perhaps), and we have people running around who say let's write it in something that runs somewhere else, even though it doesn't.

When business specifies SQL for that reason, they are buying into a myth propagated by most of IBM and everyone else who doesn't have RPG native IO, but they are being misled deliberately. SQL is not cross platform.

Can SQL for DB2/400 do what we do in RPG? Well, IBM spends full time trying to make sure it does. But in my opinion complex interaction among several files by key is what RPG is for, and trying to emulate that with cursors and proprietary SQL enhancements is not a good AS/400 application architecture.

Every major ERP up till now has been in RPG, COBOL, and German COBOL. If Oracle and their SQL ways could write ERP's in SQL, they wouldn't have had to buy JDE and Peoplesoft. Their trying yet again with Fusion.

Sooner or later maybe mediocrity will prevail. The AS/400 and RPG will have to be gone by then for that to happen.

rd

Posted by: Ralph Daugherty at May 6, 2006 12:40 PM

Thanks, Joe and Ralph; you both make excellent points.

It amazes me, Joe, how soon I forget after completing a project involving SQL in RPG, whatever difficulties I may have encountered because of the difficulties of interactive debugging of embedded SQL. My practice is to develop the SQL separately in advance of embedding it to minimize the extent to which I experience that pain.

I was not privy to the client management decision to use SQL only, but I know IBM was involved in designing the IO tier and I suspect they may have been involved in that decision. I think we can probably attribute it to "architectural philosophy".

I don't think there was an RPG developer on that project who didn't find it odd, and initially scary, to not have the ability to resort to chain and read operations. But then again, many of us found the highly normalized database design unfriendly to native IO anyway -- too many tables to chain to.

In the end, it all worked out well: the IO layer design did a good job buffering the application programming from the highly normalized database, and the system worked well. Given the overall system architecture, including the highly normalized database design, I think SQL was the best choice for that project.

Perhaps we can conclude that the more normalized the database is, the more useful SQL is likely to be; and the less normalized the database is, the more convenient native IO is likely to be.

Ralph, I hope someone someday writes a detailed history of our profession and its tools; I think it would be fun for us old-timers to read. RPG was first released in 1964, but the more potent RPG II was released in 1969. The next year Dr. Codd of IBM laid the theoretical foundation for SQL in his description of a relational model for database design. But it wasn't until the late 1970s that SQL products started hitting the market. As I recall, IBM was slow at bringing SQL to the AS/400.

But you're right: SQL is not modern; it has been around for close to 30 years. What "might" be "modern" for the iSeries community, however, is the increased use of SQL, along with procedures (brought to us in 1994 with RPG IV), to develop a different breed of architectures than what we developed in the 70s, 80s and early 90s. And yet, the theoretical foundations for highly modular layered architectures dates back to . . . the early 1970s! Why then did it take IBM until 1994 to give us RPG IV?

Posted by: Max at May 6, 2006 8:06 PM

Why was ibm late to bring SQL to the as400? Why did they gave us rpg iv in 1994? Why did free format rpg emerge late in 2002?
There is one constant factor here; ibm is about late to bring everything to the as400.
Back in 1999, i was on a project where the code base for a major application was written in rpg400 but the sources needed also to be ported to a unix cobol environment using an Oracle database. The porting from rpg400 to cobol was completely automated (some German company provided the language conversion tools), but performance wise, the unix environment was unacceptably slow. So we started using Embedded sql in rpg400 to replace rpg's native record i/o, and, as you can quess, the unix application ran about 5 times faster (a batch job of 60 min ran in 12 min), but the same job on the as400 side ran about 10 times slower (the 60 min batch job took 600 min or 10 hours).

Indeed, replacing a simple rpg chain by a statement like "select field1, field2.. from table where keyfield = :keyvalue" is pure horror for the as400, yet for Oracle, it is the only and fastest way around.
Now let's take a look at all modern languages, merely the Oracle Pl/SQL's, the SAP ABAP language, the Java's, the Visual studios (VB, C sharp...) the office automation tools (macros in excel or ms access), the scripting languages embedded in HTML, of this world. Do they use record access methods to access the DB? Does the possibility even exist in those languages? The answer is NO (except maybe to access files outside the DB like pure textfiles). The rest of the world is using SQL syntax to access a DB, only rpg and cobol still provide record access. My conclusion is: instead of writing an application on the as400 using native record i/o, write the application on a modern machine in a modern language using "modern" sql.
I have witnessed several IT hardware decision processes NOT in favor of the as400, and the most common reasons against, apart from the legacy-obsolete image, mentioned were:
1. the inability of ibm to bring a native (not web based) gui to as400
2. the inability of ibm to bring the performance of as400 sql on par with the other major rdbms

Imho, these are the major reasons which led to the demise of the as400 in the last 5 years.

Posted by: ugeerts at May 7, 2006 5:48 AM

When I said German COBOL, that's SAP. That is what it has been referred to. Does German COBOL sound like SQL?

I researched record level access in other databases. Oracle and SQL Server have it. There's a reason for that.

The writer mentions Oracle and Microsoft. Neither had a manufacturing level ERP with SQL. MS sought to buy SAP, German COBOL, in addition to buying Great Plains and others. Even lightweight PC ERP's like Great Plains are written in PC database record level access.

Oracle bought Peoplesoft, much of its ERP written in COBOL, and JDE, written in RPG.

Both Oracle and MS are struggling to put out their own ERP's now, Fusion and Green. Green has been delayed. How much is due to trying to do ERP record level access with SQL? A lot, I think.

When all is said and done, the SQL they have will not be database independent but instead be highly tuned to the record level access API's of Oracle and SQL Server respectively.

We already use something better. It's called RPG.

rd

Posted by: Ralph Daugherty at May 7, 2006 11:56 AM

I'm surprised that you refer to an Oracle record level access API. I don't know why such a thing would be needed. As far as I'm concerned, record level access in any version of SQL is merely a select statement with the intent to retrieve a single row. It is used all the time.

The only time I have heard the expression "record level access" in OracleLand is in reference to security, not retrieval method. But I'll have to check with our Oracle DBA to see if there is some well-hidden alternative RLA method to a focused select statement. I would be very surprised if there is.

I have experienced very good performance with single-row retrievals using IBM SQL on the iSeries, but I haven't done millisecond benchmark comparisons to the chain operation. I am being tempted to do so, just to see what the extent of the difference really is at V5R3.

I think ugeerts' main point is that IBM has been tardy in bringing many technologies to the AS/400, and having them perform well right away. I know of people who have a bad impression of iSeries SQL because of bad experiences with early versions of it. They should give it another try. I think IBM is telling the truth when they say they have been continually improving it.

Posted by: Max at May 7, 2006 12:33 PM

Hi Max. I Googled record level access several months ago as part of research for a personal project. I didn't take notes, but my recollection was that Oracle had an API for it and MS was enhancing their's.

As I recall, SAP's paid version of their database also had it.

I decided to use another database so I didn't go into further research on any of these, but I and I'm sure everyone else will appreciate any corrections if needed. :)

rd

Posted by: Ralph Daugherty at May 7, 2006 1:07 PM

Max, when you say "Ibm is late in bringing new technology and have them perform well right away" is exactly the point I'm making. The project I mentioned ran in 1999, probably on an older as400, maybe still a cisc machine. Ibm lost face in those days, at the height of the ICT bubble with so many companies that even if they claim they are now furiously improving sql performance, it is simply *NOT* believed anymore by those who have decided to leave the as/400 platform in the mean time.

BTW: by record access i mean the syntactical construction of using a start or setll, read or read previous (read, readp), write record etc. statement as opposed to the sql construct select from , declare cursor, insert into etc. which allows to work on multiple records or result sets at the same time.
The reason why record access performs better on iseries is well known; the as400 database and those of its predecessors in the 80's was ISAM based, SQL was just a layer that was put on top of it in the early 90's.
Oracle on the other hand, was SQL based from the beginning.

Posted by: ugeerts at May 7, 2006 1:11 PM

"Perhaps we can conclude that the more normalized the database is, the more useful SQL is likely to be; and the less normalized the database is, the more convenient native IO is likely to be."

Nope. The issue has to do with the type of programming you're doing. Again, it's the wave nature vs. particle nature concept I've been trying to get across.

Take a simple price lookup. If you want to get the total for a line, you simply multiple the quantity by the unit price. This ought to be a simple SQL, right?

Now let's add real business rules. First, items have many prices. The price you get may well be different than the price I get. It may depend on the customer, or the customer class. It may depend on whether I have a coupon, or whether there is a sale going on this week. It may depend on whether I've reached a certain threshhold of purchases this order, or even this year. It could even depend on other products I've bought.

And that's just the unit price. I haven't talked about taxes, which opens up a whole different set of decisions.

Lots of things go into figuring out the price, and the vast majority of them look something like this:

READ FILEA BASED ON FIELD1
IF FIELD2 IN FILEA IS NON-BLANK, THEN READ RECORD FROM FILE2 USING FIELD1 AND FIELD2 AND USE FIELDB
OTHERWISE, READ FILE3 USING FIELD1 AND CURRENT CUSTOMER NUMBER
IF FIELDD OF FILE2 IS > 0 THEN USE FIELDD OTHERWISE READ FILE4

And so on. This is the database-driven "particle" nature of real application programming that simply doesn't translate to SQL. By the time you manage to create an SQL statement that handles that level of complexity, it's 50 lines long with tons of CASE and COALESCE statements and SUBSELECTS and WHERE EXISTS to the point where it's not readable, much less debuggable. Trying to add a single new rule often requires rebuilding the whole thing from scratch.

Whereas with record-level I/O, it's a breeze. You prototype the rules the way I laid them out and basically you just execute those instructions.

And here's what's even more important: it's relatively easy to view the logic branches to identify ways to optimize the processing based on expert knowledge of the database. While SQL can indeed perform optimizations, those optimizations are by definition more general in nature, and don't take into account the real world (and possibly changing) population of the database; it can't! Only a human being can make those sorts of decisions.

SQL is the wrong choice for these situations, just as RPG is the wrong choice for heavy stream file processing. There are no "mitigating factors" or "special circumstances" under which SQL somehow becomes better. It just never is. The beauty of the System i as opposed to, say, an Java- and Oracle-based system is that we have a choice with RPG and they really don't.

So RPG and native I/O is the best solution for that problem set, while SQL is the best tool for different problem sets. The beauty of the System i is that it's the only system that allows both.

Do your benchmarks. You'll see significant performance benefits for single-record fetches and even higher benefits for single-record updates. The numbers may astonish you.

Anyway, I'm done here. Until you prove that SQL is faster than native I/O, I will continue to be amazed and horrified that so many otherwise intelligent people would continue to use a slower, harder to maintain and harder to debug tool. It's simply contrary to common sense, and another reason why business people distrust programmers.

Joe

Posted by: Joe Pluta at May 7, 2006 1:34 PM

Max, I would like you to perform a thought exercise and tell me why you think a single-record SQL statement "SELECT * FROM FILEA WHERE CUSTOMER = 1000" would perform faster than a CHAIN. What overhead is there in the CHAIN that is not in the SELECT? On the other hand, you can probably spot a number of things where the SELECT has more overhead than the CHAIN.

Does this sort of common sense thinking come into play when you make your decisions? Or is the computer just a big magical box where the laws of physics don't apply? For example, maybe parsing the SELECT statement takes NEGATIVE CPU cycles, which would make the SELECT statement faster!

I'm sorry if that sounds a little condescending, but since this entire blog has now devolved to an attempt to justify SQL where it clearly doesn't belong (as well as a bit of really nasty RPG bashing), I don't feel I'm really diminishing anything.

Joe

Posted by: Joe Pluta at May 7, 2006 1:44 PM

First of all, Joe, I don't engage in RPG bashing. I've used the language since 1972 and I have enjoyed growing with RPG through the years. I've seen lots of styles of RPG coding and I've written a lot of code. I like free form RPG IV written in a highly modular style, and I know from experience both that systems with layered architectures can be written in RPG, and that SQL procedures can make positive contributions to code structure.

Second, your opinion that there are places where SQL does not belong is your opinion. Some will disagree with you. It has a lot to do with architecture, database design and coding style.

Your programming example assumes that SQL must do in one statement what your native IO does in as many reads and chains as it needs plus the interwoven business logic. There are other ways of structuring a solution, some of which could use SQL. For example, assuming a modular design, you might start with a:

extPrice = qty * getPrice(item,cust,p3, p4);

The getPrice procedure could use a SQL statement to get the component values and then use a Select/When logic structure to determine the price to return. Not messy; easily comprehensible.

I do not see the issue as whether a chain is faster than a single-row retrieval, but whether the difference is significant anymore, given the performance increases of iSeries SQL. I prefer benchmark tests to thought experiments.

Being pro-SQL does not make me anti-RPG. I like this language. I'd like to see it survive.

Posted by: Max at May 7, 2006 10:12 PM

Max, I didn't say you were bashing RPG. I said the thread was.

Second, as to where SQL belongs, I expected this particular argument. When someone tries to imappropriately position SQL, the discussion always degenerates to "the performance loss isn't significant." You can set your watch by it.

Well, if performance isn't significant, what is, Max? We've already determined that SQL isn't platform independent, and that it's difficult to debug. So what is the benefit I'm gaining in return for the performance?

It's clear that your primary purpose in this blog is to justify SQL over native I/O, and that any argument I make will be met with some version of "SQL performs well enough", which is the final refuge of the SQL zealot.

Well, let's put this to the test, shall we? I'll challenge you to a simple programming exercise. I'll create a pricing scenario similar to the one above but with specific details. We'll then write our programs and compare the results.

Joe

Posted by: Joe Pluta at May 8, 2006 12:08 AM

I have been at clients where they only allowed use of SQL for DB access. Primarily because IBM said so. It appears to be IBM's direction to move exclusively toward SQL vs native IO. Not sure why other than that is what everyone else does so it must be the cats meow.

I worked on a large billing project where there were millions of transactions to process in a short window of time. The DB was Sybase and it could not handle the volumes. So the developers copied all the reference tables into an ISAM Informix DB, processed the transactions (pricing can be rather tedious for billing eCommerce network traffic) and then bulk copied the results up into Sybase. This was a new system to replace an AS/400 using JDE. Gotta love it. Of course it had a GUI interface for table setups (yippie skippie).

I use SQL and native IO and find each have their place. If you have change management in place DB changes are a breeze. If you dont have change management, it needs to be a top priority to put in place (IMHO)

I am not anti GUI and pro 5250. But I do find it amusing when business's cut off their nose despite their face to follow the lemmings off a cliff.

Posted by: Stevek at May 8, 2006 6:50 AM

That's a great idea, Joe! I enjoy exploring various coding strategies, and I love to code in RPG. And heck, this may give me an opportunity to see where I fall on the openminded/blockheaded scale. Let's do it!

As for what is gained by using SQL: In some cases SQL outperforms native IO, as our recent benchmarking showed for a decision support application. Also, the use of cursors produced from joined tables can offer an alternative to the code pattern of a read followed by conditional logic followed by a series of chains followed by conditional logic. In some of these cases there may be gains in both performance and code structure (depending on desired style of code structure). And SQL can offer a quick way of getting a set of data items from several tables with one IO action.

I think we both agree that where SQL significantly outperforms native IO it should be used, and where native IO significantly outperforms SQL native IO should be used. I think it is also the case that where both provide comparable performance, OTHER criteria will come to the forefront. These other criteria may in some cases favor SQL and may in some cases favor native IO.

Posted by: Max at May 8, 2006 7:52 AM

If you're running benchmarks where SQL is outperforming native reads, you might want to a activate record blocking in your "F" specs, which gives a tremendous boost to the performance of native reads.

Also, if you're comparing native reads to something like:

select flda, fldb, fldc from filea

where filea has a large number of fields, then you might want to create a logical file or view that has been reduced to just flda, fldb, fldc, to make a more apples to apples comparison.

I keep hearing that SQL outperforms native I/O under certain circumstances, but in every test I've run, I was able to come up with a native method that produced the same results, faster than SQL.

This is not to say that I always advocate native I/O over SQL. Just that I tend to challenge SQL performance claims.

Posted by: Nathan Andelin at May 8, 2006 9:25 AM

"Also, the use of cursors produced from joined tables can offer an alternative to the code pattern of a read followed by conditional logic followed by a series of chains followed by conditional logic."

Not to be contrary, but so what? To me, this statement, without some concrete examples, is really without substance. Unless the "alternate code pattern" has some measurable ROI, why does it matter?

The phrase "desired code structure" is the tail wagging the dog. If you have a predetermined code style, then this entire thread is justification, not research.

The other issue is "significant" vs. "comparable". Please define these terms. To me, a 10% performance difference is significant. What is it to you? My guess is that whatever number makes SQL "comparable" will be the answer. If I'm wrong, then please state a number.

I also still have some issue with the whole idea that if SQL is "comparable" then it should be used instead of native I/O. This denigrates ISAM access, which is just as valid an access method as SQL. SQL per se is not better than ISAM. Especially since your earlier post about doing a bunch of SELECTs followed by a big old WHEN construct is basically the SQL equivalent of ISAM access, and not a very good one at that.

Joe

Posted by: Joe Pluta at May 8, 2006 9:49 AM

Thanks, Nathan. The native IO strategies used logical files and the SQL strategies used DDL-created indices. This interactive app requires aggregation of data from a variable date range over a variable set of items involving a varying set of files, all determiined by user input. The SQL solutions used a "group by" over a subquery of "unioned" selects; the native IO strategies used CHAINs, SETLLs and READEs. The developers who wrote the native IO programs are, in my opinion, top-notch seasoned developers, and are not SQL zealots. I think this was a case where SQL was just the better performing option.

Posted by: Max at May 8, 2006 10:09 AM

Gosh, Joe, I sometimes feel like you just want to argue with me. It seems you read more into my words than is intended. To treat your four points in order:

1. The ROI on coding style is reduced maintenance costs; that's why IT departments have standards. That's why structured programming became dominant, that's why modularity and layered architectures are recommended, and was likely a prime motivation for the development of object-oriented languages. Since most of the cost of an application is maintenance cost, this can be a huge ROI, if the adopted standards do in fact promote readability and lessen maintenence time.

2. By the expression "desired code style" I merely meant to condition the claim that there was a benefit of that nature only if the code style was a desired code style. I meant to respect the differences in preference which developers and departments have.

3. We may have to negotiate the definition of "signicant" and "comparable" so we don't argue over the results of our exercise. I would define the terms differently for batch and interactive settings. To me, in an interactive setting a difference not appreciated by humans is not significant; a 70% difference in milliseconds which cannot even be noticed by the user is just not significant in my book. In such cases other criteria should be given greater weight. For batch processing 10% seems a little low; 12 min. is significant over 10 min.? 34 min. is significant over 30 min.? I don't know if a 10% difference would play a major role in a design decision if other criteria were also deemed important. I would have probably picked 20% out of the air, but if you want 10% for a batch process I can agree to that.

4. I definitely did not say that if comparable, SQL should used. I clearly said that if the methods are comparable other criteria should come to the forefront. Is this not rational?

Also, you mischaracterized my suggested solution from the earlier post, where "select" was singular, not plural. And thinking about it further, the SELECT logic structure might be unecessary. Maybe something like this would work in the subprocedure:

Select COALESCE(cp.DiscPrice, sp.price, i.StdPrice)
INTO PriceToReturn
From ItemMaster i
LEFT OUTER JOIN SpecialPrices sp
ON i.item = sp.item
and sp.Override_Flag = sp.Override_Flag
LEFT OUTER JOIN CustPrices cp
On CustNumber = cp.CustNumber
and i.item = cp.item

But maybe we should just get on with our exercise, Joe. I am curious to test IBM's claims about improving SQL performance.

Posted by: Max at May 8, 2006 10:59 AM

Even top-notch seasoned developers may miss performance enhancement techniques available in native access methods.

On the other hand, decision support applications are a good case where SQL is helpful, if not required. I'm the author of a CL command named SQL2STMF (SQL To Stream File) that dynamically generates SQL result sets and outputs them to stream files in fixed length TEXT, CSV, and XML formats, which are useful for data interchange between systems, generating files in batch mode for data warehouse applications, and such.

When it comes to transactions, database updates, and business logic like the examples outlined by Joe Pluta, I think native I/O is the best tool for the job. In some cases SQL can be more of a hindrance than a help.

Posted by: Nathan Andelin at May 8, 2006 11:16 AM

Max and Joe I am interested in this test as well. Please ensure the transactions being processed are significant (ie 500,000 or more) and the pricing tables and other reference tables (ie customer master, item master, etc.) used have a significant number of rows to make it a realistic test (ie 50k to 100k rows or more). I would also like this proof of concept to use a normalized DB design.

I look forward to seeing the results.

Thanks,
Steve

Posted by: Stevek at May 8, 2006 11:25 AM

Oracle Call Interface APIs

It had been suggested that Oracle may have developed APIs equivalent to RPG's RLA operations. I consulted our local Oracle Brahmin (DBA), and he says no, but the poster may have been thinking of OCI (Oracle Call Interface) APIs, which permit application developers to work closer to the database and wrest some control from Oracle over database work. OCI APIs can be used by many programming languages to gain greater control of the database, but the IO operations used are the standard select, insert, update, delete, etc.

Posted by: Max at May 8, 2006 11:40 AM

So am I. Last time a ran a comparable benchmark was in 1999. The setup was something like this:
1. generate a file which represents the inhabitants of a small country or US state, say 10 mio records occupying a disk space that is at least 10 times the size of main memory, to make sure results are not influenced by memory disk caching. The records are identified by a personal identity number which is just a sequential number.
Write half of the records with native i/o, the other half with sql's insert statements. Let the job run on an unloaded system. Measure times.

2. Use a random generator to generate a random persons id as key field. Use native chains to retrieve about 5 mio records, then use sql's "select from" to do the same. Let the job run on an unloaded system. Measure times.

3. consider variations with update's and delete's.

Sql will come out bad i'm afraid, despite ibm's claims of recent progress, but lets see.

PS: most shops have a heterogeneous environment, besides the as400 there is the usual Windows Sql server or Unix Oracle server. Why not run the same test there (of course sql only) and compare systems?

Posted by: ugeerts at May 8, 2006 11:52 AM

Googling again, it looks like I was referring to the SQLLIB runtime library calls, but looking closer I see it's used by precompilers and undocumented.

Thanks for the correction, Max.

rd

Posted by: Ralph Daugherty at May 8, 2006 8:35 PM

This thread started off discussing I/O strategies, and what's clear from the discussion's transition to the RLA vs. SQL debate is that you want to isolate I/O code so you can pick-and-choose (if you think that's a good idea) and change your mind when you find out you were wrong or IBM or your company management changed the rules.

Early in the discussion Aaron Bartell suggested: "To put forth an initiative to encapsulate every stinkin file or piece of business logic without even knowing if it will be reused is a plain waste of time."

This is dead wrong.

Encapsulation is a programming strategy that should be used pervasively. The point isn't mainly for code reuse, but rather to manage complexity.

A professional programmer should no more attempt their work without programming tools, such as encapsulation, factoring, decoupling, etc., than a surgeon should perform an operation without sharp surgical knives. OTOH, I don't recommend amateurs attempting even the simplest surgery, regardless of the tools at hand. Similarly with programming.

This "encapsulation" stuff isn't "woo-woo" programming -- it's basic competence.

= = = =

As to SQL vs. RLA. I'm one of the "gray beards" in the industry, and I swear I must have met Joe Pluta back in the early 70's when I first started programming in COBOL on a mainframe. At least from my recollection there was this guy who sounded like Joe, ardently advocating that my cohorts and I continue to implement application I/O as channel programs written in BAL (assembler), rather than incur all the overhead of COBOL's built-in I/O. :-)

Before injecting a little reality into the SQL vs. RLA discussion, let me say that you don't need to "rip-and-replace" stable, working RLA just to use SQL.

And, it's fine if you want to selectively use RLA in new code where it makes a truly significant difference. Especially if you've isolated your I/O code so you can easily change it if the need arises.

But SQL is a "no brainer" for your I/O strategy.

SQL generally performs well when used appropriately and performs better than RLA in some cases.

Most importantly, IBM, along with the entire industry, has settled on SQL. IBM is putting all their DB work into additional functionaility and performance into SQL.

All the major development tools target SQL.

Java, .NET and LIMP Web application platforms all target SQL.

Almost all education, training, books, Web resources, and consulting revolve around SQL.

SQL on i5 has a HUGE amount of functionality that DDS and RLA+OpnQryF lack.

SQL is simple to code if you have the skill and don't have an "attitude."

SQL's column-oriented approach and its creation of access plans on-the-fly provide valuable "data independence".

SQL for i5 provides several performance advantages for "roll-your-own" CGI, as well as simplified "optimistic" concurrency that's essential for many Web applications.

As to benchmarking SQL vs RLA performance... For simple, high-volume batch processes that run in relative isolation (e.g., at night), you can get meaningful comparative results by simple benchmark scenarios.

For real world, multi-user, multi-application production environments, simple benchmarks break down. This cuts both ways, and trying to extrapolate from benchmarks that aren't run under a simulated multi-user load can skew the results either way. (I would put no stock in the "toy" test that Joe posed.)

My advice to iSeries developers and their managers who want to chart a sound development strategy for the future is this:

* Follow professional programming practices, including encapsulation, etc., in all your development as a matter of course.

* Adopt SQL as your standard practice for DB definition and access. (And limit RLA to exceptional cases, if at all.)

* Isolate I/O (of all kinds) so that most of the application code calls procedures (or "methods" or "services") that have an interface at a higher level than a file format (or SQL table structure).

* Visit IBM's "History of Computing" museum and catch the "Mummified Assembler Programmer" exhibit next time you're in Poughkeepsie.

Cheers!

-- Paul

Posted by: Paul Conte at May 9, 2006 9:55 AM

I'm going to take exception with a few of Paul Conte's suggestions, hoping not to diminish the ones I agree with and arise from his insight and experience.

I'd suggest using DDS for base tables and logical files, except in the rare cases where BLOB and similar data types are required. I'd also suggest that storing references to stream files in records is better than using BLOBS in most cases. DDS is easier to work with. Stream files are easier to work with.

If you need SQL DDL in order to migrate database definitions to another platform, use iSeries Navigator to generate it after the tables have been created. iSeries Navigator is cumbersome for defining tables and views. If you need SQL views, I'd suggest using the RUNSQLSTM command against source members, instead.

SQL is an important element of IBM's server homogenization strategy, but that doesn't mean that RLA should be pigeon-holed or relegated to a corner as suggested, unless you think server homogenization is a good strategy for your business. It has been interesting to see a number of Java frameworks arise that insulate developers from SQL by creating RLA based interfaces around it.

Regarding benchmarks, it has been my experience that differences in I/O performance generally increase as the number of users, threads, and workloads increase. I wouldn't suggest that there is magic in SQL that makes it scale better under complex multi-user workloads.

Posted by: Nathan Andelin at May 9, 2006 12:14 PM

Nathan, I may be going out on a limb here, since I do not know which frameworks or Java objects you are referring to, but when you mention that "It has been interesting to see a number of Java frameworks arise that insulate developers from SQL by creating RLA based interfaces around it", this might be misleading for those who don't know Java.

I assume you are referring to those data mapping objects which in a 5-tier layered application separate the business logic domain from the data source layer. To compare those to RPG RLA operations is imaginative and perhaps insightful, but we may be talking apples and oranges here. The role these objects play and the effect they have on OO code structure is quite different from the role of RPG RLA operations.

There are a variety of approaches to providing this layer of data mapping or persistence objects, including the development of persistence frameworks like Hibernate, and I am assuming you are referring to this category of Java objects. If you are referring to some other category of Java objects, let us know more about the details.

Posted by: Max at May 9, 2006 12:45 PM

I am curious about a few things:

1) What change management tools is everyone using for deployment of SQL databases (and revisions). I doubt using techniques like RUNSQLSTM over production environments would fly with SOX auditors.

2) As far as data encapsulation it seems to be the consensus of those in the know to create a common structure to be used across business process for data retrieval regardless of if it is needed or not. Or is what is being bandied about to have proprietary data retrieval structures for each business process (that seems unrealistic).

3) How many layers of Db encapsulation are we talking about. Are those that are in the know adding an additional layer under these common structures (i.e. setters/getters).

4) I am also curious if there is a method similar to SETLL to check for existence via SQL.

Thanks,
Steve

Posted by: Stevek at May 9, 2006 12:56 PM

Hibernate is one of the Java database frameworks that come to mind, though I wasn't thinking of it specifically. It's a little hard to discuss specifics because of the number of frameworks that have sprung up, but it seems generally accurate to say that they tend to insulate developers from SQL.

I agree that it's not an apples to apples comparison, and I don't want to trivialize the architecture of OO frameworks, but I occasionally see methods exposed through record, and field objects that parallel native RLA operation codes, XML based data definitions that parallel DDS, and table objects that parallel "F" specs. To me it doesn't seem overly imaginative to note parallels.

I hope you feel free Max, to share your thoughts about how database frameworks improve the structure of applications.

Steve - I'm not suggesting that end users have unfettered access to RUNSQLSTM. Just that it allows SQL DDL statements to be stored in source members and provides a convenient way of creating SQL Views.

I'm not against SQL. I do think IBM's server homogenization strategy (SQL, n-tier), combined with it's server assimilation strategy (Java), competes against and threatens its platform integration/server consolidation strategy, and may even be killing the System I as a platform.

With server homogenization/platform assimilation the natural tendency is to deploy to the lowest common denominator (Wintel).

Posted by: Nathan Andelin at May 9, 2006 4:01 PM

I don't think it is overly imaginative to note high-level parallels between system architecture strategies; in fact it can be insightful. But we also have to be careful to note differences in motivation and effect, and perhaps even historical context (RLA predates RDBMS), in order to keep our insights truly insightful.

The emergence of the data mapping layer illustrates an important point Paul Conte made above: "Encapsulation is a programming strategy that should be used pervasively. The point isn't mainly for code reuse, but rather to manage complexity."

As patterns of data mapping activity were recognized -- to map data between the IO access objects and the business logic objects -- the motivatation to encapsulate in order to manage complexity led to the idea of a data mapping layer between the BL and IO layers. This is now I think the standard model for J2EE development.

You don't need persistence frameworks to do this. It can be done with Enterprise Java Beans or other Java objects. The persistence frameworks provide consistency and convenience. They are optional. A data mapping layer can be implemented without them.


Posted by: Max at May 9, 2006 9:03 PM

Steve, although your questions are addressed to "those in the know" (and I don't fall into that category), I would like to take a shot at offering some answers:

1. We use a popular change mgmt system here, but I don't think I'm supposed to mention products by name. But our DB is still defined by DDS anyway, except for DDL indices. SOX doesn't apply to us, so I am ignorant of SOX compliance issues.

2. Encapsulation is a coding strategy, and ideally it would be pervasive, but in reality there are always those situations where someone finds an excuse to bring the IO into the BL with the hope there are no unintended consequences down the road. Also, most RPG apps alive today were probably developed before encapsulation became more common in the RPG world.

3. There should only be one layer of DB encapsulation, but in the Java world there is now usually a second data mapping layer between that and the BL layer. In RPG the data mapping role would probably be played by shared data structures, but data mapping procedures could mediate between IO procedures and BL procedures/modules if this made sense under the circumstances.

4. Yes, there is an EXISTS function which tests for the existence of certain rows. It is generally used in a subquery, either to condition an action or to set an existence flag.

Posted by: Max at May 9, 2006 9:29 PM

Max: thanks for the replies. I have used several change management systems and none support SQL DB implementations. I did work as a Sybase Dba years ago and used a desktop tool that would redeploy the database based on the schema definition, however after generating the SQL script it would regenerate the entrire database. That was fine during development however it would have been a real headache post implementation and moving foward with new releases of the software.

I have had similar issues promoting SQL scripts for triggers and stored procedures. When you move from a development to test to QA to production environments having qualified libraries can be problematic.

Posted by: Stevek at May 10, 2006 6:25 AM

Re several of Nathan's comments.

***
Unless you have some legacy or existing tool issue, SQL is definitely the better approach to define tables and views.

There are performance advantages for some (even many) tables and indexes.

The functionality of views is far superior to logical files.

To me DDS and SQL are both easily mastered, and there's no significant "ease-of-use" advantage to either, but more tooling does, and will, support SQL.

***
Homogenization isn't a reason I listed to use SQL. There are plenty of others that support the case for SQL.

***
Using toy, single-run benchmarks can provide some performance insight, but they aren't reliable to predict overall behavior when you're talking about I/O in a real-world mix of concurrent, multi-user applications.

It's not just that viewed in isolation, some operation all of a sudden becomes relatively faster than what it's being compared to. It's that the full load may shift the proportional impact of both methods under test. And, this can manifiest itself by revealing more severe SQL performance problems in a particular context, as well as demonstrating that SQL is adequate or better than RLA.

-- Paul

Posted by: Paul Conte at May 11, 2006 7:09 AM

EXCELLENT!

Paul, I'm glad you joined, because I want everyone to see some of the statements that are peculiar to the SQL zealot.

"SQL generally performs well when used appropriately and performs better than RLA in some cases."

This is "what you don't know will hurt you". Note that he glosses over the fact that in some cases SQL performs far, far worse than RLA. Those cases (record at a time updates, navigating through the database based on field values, and so on) are the things that application programs do.

"Most importantly, IBM, along with the entire industry, has settled on SQL. IBM is putting all their DB work into additional functionaility and performance into SQL."

Another strange assertion: Everybody is doing it, so should you! If that had anything to do with reality, we'd all be using .NET to access the iSeries database as a big SQL server. And then we'd just get rid of the iSeries in favor of MS SQL Server!

Of course, Paul has written books on SQL Server...

"All the major development tools target SQL."

Following this line of thought, no major tools support RPG either. I guess it's time to learn .NET!

"Java, .NET and LIMP Web application platforms all target SQL."

All these other platforms target SQL because THAT'S ALL THEY HAVE. The single-mindedness of the SQL zealot seems to lead to slightly blurred thinking. BTW, I THINK he meant "LAMP" applications (Lotus Apache MySQL PHP) but "LIMP" certainly has it's own concepts.

"Almost all education, training, books, Web resources, and consulting revolve around SQL."

Another argument for medicority. This one would certainly have us dumping RPG. Hell, it would have us dumping the iSeries, since almost all training (etc.) is for Windows or Linux.

"SQL on i5 has a HUGE amount of functionality that DDS and RLA+OpnQryF lack."

Yes, but that simply means we can use either one as we need it. It's called "the best tool for the job". The SQL zealot's vision is called "wen all you have is a hammer, everything looks like a nail."

"SQL is simple to code if you have the skill and don't have an 'attitude.'"

Simple SQL is simple to code. Complex SQL is a pain in the rear. It is far easier to comprehend the individual lines of a good RPG program than it is to understand complex SQL. But we can see about that when I create the benchmark programs.

"SQL's column-oriented approach and its creation of access plans on-the-fly provide valuable 'data independence'."

We now begin the buzzwords. I can just as easily tell you that SQL's knowledge of the physical locations and name of tables as columns causes unacceptably tight binding between the business logic and the databess.

The truth lies somewhere in between. SQL does allow in certain instances for you to be able to make minor modifications to your database without having to change your code. Similarly, RPG allows you to make minor changes in your database without having to change your code. Sometimes it's as simple as a recompile, and with judicious use of logical views, not even that.

However, this is only for minor changes to the database. If you do anything that isn't just minor structure changes -- rename a field, change a field's attribute, move a field from one file to another, change how a field is calculated -- these all require the same level of changes in SQL as they do in RPG. Except in RPG we can typically use the program reference (DSPPGMREF) to quickly find all the objects that use that file. There is no equivalent tool for SQL access.

"SQL for i5 provides several performance advantages for "roll-your-own" CGI, as well as simplified "optimistic" concurrency that's essential for many Web applications."

Or, you could use a simple JSP Model II approach with an I/O server program using a combiniation of SQL and native I/O that will kick CGI's butt.


And now for the final ones, folks. The SQL guys will ALWAYS, ALWAYS, ALWAYS do this:

"For real world, multi-user, multi-application production environments, simple benchmarks break down. This cuts both ways, and trying to extrapolate from benchmarks that aren't run under a simulated multi-user load can skew the results either way. (I would put no stock in the "toy" test that Joe posed.)"

The refuge of bad performance is to say the benchmarks are skewed. Losers blame the umpires or the refs...

Do you think Paul would be saying this if he didn't know that benchmarks show native I/O blowing the doors off of SQL?


"My advice to iSeries developers and their managers who want to chart a sound development strategy for the future is this:"

My advice to you all to go to their the iSeries Core Decisions Roadshow page and see what Paul and Carson Soule are selling.

http://www.iseriesnetwork.com/coreroadshow/

Read the abstract for the first session in which Carson states:

"Is there still a role for 'traditional' ILE RPG and ILE Cobol programs? Or is your best strategy to develop applications that run under .NET on Windows or as CGI programs on Linux and rely on the iSeries for just your database server?"

That's Paul's vision of the future.

You decide.

Joe

Posted by: Joe Pluta at May 11, 2006 7:35 AM

Joe Pluta and Paul Conte are both highly regarded, deeply experienced members of our profession, and I regard both of them as "experts" whose views deserve careful consideration. I am pleased that they have chosen to make contributions to this discussion.

They represent different views which are both common in our field at this time. I surmise that behind the SQL vs. native IO debate there is actually a more fundamental difference in vision about system architecture and code design.

I think the issues here are important, and I hope our strong feelings about some of our positions do not get in the way of openminded professional dialogue. Much can be learned from this dialogue if issues are clearly articulated and fairly assessed.

A few observations:

1. The idea of "use the best tool for the job" is of course sound as a starting position, but I note additional points:

a) Performance may not be the sole or primary consideration, especially where the performance difference is not significant; and

b) Departments may wish to limit the extent to which SQL and native IO are mixed by establishing a preference or some decision rules.

2. Native IO will not always outperform SQL; our recent benchmark had SQL outperforming native IO by an average factor of three. Performance differences will vary by the processing task, database design, solution design and other factors. Of more interest to me is:

a) the circumstances under which one method significantly outperforms the other;

b) the significance and consistency of performance differences; and

c) those circumstances under which the differences are hardly ever significant.

3. I don't think we can infer Paul Conte's vision from somebody else's unanswered question.

4. There is an elegance to SQL if you acquire a taste for it. :)

Posted by: Max at May 11, 2006 9:30 AM

"LAMP: Lotus Apache MySQL PHP"

HEE HEE! IBM would love that! I must be very tired.

LAMP is of course LINUX, Apache, MySQL and PHP.

Joe

Posted by: Joe Pluta at May 11, 2006 11:59 AM

Max, I don't understand. I've stated clearly multiple times the circumstances under which native I/O outperforms SQL.

Native outperforms when reading or updating single records. SQL outperforms when multiple records are involved.

With reads, the crossover point is about 10 records, with inserts and updates, about 100 records.

When doing normal OLTP operations, such as updating an order, SQL underperforms native by as much as a factor of ten.

Your benchmark was a data mining query, which is the best case scenario for SQL. Please use SQL for those purposes.

But do not use that as an excuse to justify SQL for OLTP, because it's wrong.

In case I wasn't clear enough: native I/O ALWAYS outperforms SQL for single-record updates, which are the primary activity of OLTP business applications.

SQL (nearly) ALWAYS outperforms native I/O on set-based instructions, including ad hoc queries over large databases.

Conclusion: use SQL for queries, and native I/O for OLTP.


So which part of this leaves ANY question in your mind?

Notice that I am not saying one practice is better than another; I'm simply saying that you should use the right technique for the right job. SQL zealots push a single technique for everything, and in doing so doing downplay perhaps the best performing solution on the planet. That's dumb.

Anyway, no more time needs to be wasted here. I may not even bother with the benchmarks, since it's clear that the SQL lovers will simply say that benchmarks don't matter. I have real work to do with people who aren't trying to justify a pre-conceived agenda.


Joe


P.S. "I don't think we can infer Paul Conte's vision from somebody else's unanswered question."

It's not "somebody else", it's Carson Soule, Paul's partner in the iSeries Core Decisions Road Show, talking ABOUT that same road show.

Posted by: Joe Pluta at May 11, 2006 12:13 PM

The question which remains, Joe, is the significance of the difference. If we are talking nanoseconds or milliseconds of difference per OLTP transaction, then that is simply not significant enough for performance alone to decide which tool to use, when there are other considerations, such as code architecture, which are important to the decision.

I know from personal experience that SQL could support OLTP back under V4R4, and I believe that SQL performance is even better today than it was then.

If you are writing traditional RPG programs where IO operations are intertwined with business logic, then native IO makes perfect sense; but if you are writing an application where the desired architecture is to encapsulate the IO activity in separate service program procedures, then I think most programmers would choose SQL. In the absence of hard evidence, I would assume no significant performance difference between the two approaches.

About improved SQL performance, IBM has just released a new Redbook which explains the differences between the older Classic Query Engine (CQE) and the new SQL Query Engine (SQE) available since V5R2. The new engine is faster for a number of reasons, including that of moving the optimizer below the MI (machine interface) into the SLIC (System Licensed Internal Code). This redbook can be downloaded from http://www.redbooks.ibm.com/redpieces/abstracts/sg246654.html

Posted by: Max at May 11, 2006 2:18 PM

The truth comes out! This entire exercise was an attempt to rationalize an architecure of encapsulated SQL.


"if you are writing an application where the desired architecture is to encapsulate the IO activity in separate service program procedures, then I think most programmers would choose SQL."

And I, on the other hand, don't think a single competent programmer would ever suiggest this approach. But what does what you or I "think" matter? This is a waste of time, Max.


"In the absence of hard evidence, I would assume no significant performance difference between the two approaches."

I've given you hard evidence that there is significant difference. You have in effect stuck your fingers in your ears and shouted "I CAN'T HEAR YOU!" every time I did.


I consider this entire exercise to have been a colossal waste of time, and intellectually insincere.


Joe

Posted by: Joe Pluta at May 11, 2006 3:22 PM

I tend to agree with Max, although I understand Joe perfectly. In the "toy" benchmark I described earlier, the 10 mio record file creation and query test, native i/o was about a factor 5 times faster than sql. Even under real business load circumstances, sql will rarely outpace native i/o. Everbody says sql cursor declarations perform faster, but native sequential reads can outpace sql provided you use a large enough blocking factor specified in the ovrdbf command.

But still I symphatize with Max... and I'll take it even a bit further. Sql is currently "thee" standard for rdb access. If code homogonization is the primary goal and you're not satisfied with the performance of sql on the iseries (system i), than... simply throw out the iseries!! Business couldn't care less on what kind of hardware their ITC runs, as long it runs reliably and timely.

PS: my toy benchmark ran in 1999 on a system as/400 model 640. The same toy benchmark ran on a Sun solaris enterprise system under Oracle 8, off course sql only. On average, the Sun ran TWICE(!) as fast as the native i/o test on the as/400 model 640.
The as/400 was soon after thrown in the waste bin.

Posted by: ugeerts at May 11, 2006 3:42 PM

Gosh, Joe, I think you over-react on this topic.

This thread started with a question from Greg about four options. The first option expressly referred to SQL and the other three options pertained to various encapsulation strategies which seemed to imply SQL. The question as a whole seemed to suggest an interest in IO encapsulation. It is not intellectually dishonest to address the concerns of the question.

I am not going to call you dumb, Joe; I'm not even going to call you bull-headed. But when you call "dumb" those who disagree with you, you are being a bit uncharitable.

I am not sure if you oppose IO encapsulation or using SQL in an IO procedure, but if you think competent programmers would not consider this, you are dead wrong. They do.

Back in the 1970s I knew of programmers who did not like structured programming theory because it interfered with programmer creativity. I knew of programmers who didn't like modular programming because they didn't like having to look around at all those subroutines (this was pre-RPG IV). And yet, as we all know, structured modular programming theory became the generally accepted best practice for code design.

Encapsulation (which might be thought of as a form of modularity) is widely accepted as a best practice in systems programming and object-oriented programming, and is now gaining currency in the RPG community. Those who practice it can speak of its benefits. I predict that if RPG survives, the use of procedures to encapsulate functions will become widespread. Furthermore, as the practice of encapsulation spreads in RPG circles, IO procedures are likely to use SQL more often than native IO, for a number of good reasons which become more compelling as performance becomes less of a differentiating factor.

You have given no hard evidence of significant performance of native IO over SQL; you have only made claims. Nor have you stated why performance differences which may be unnoticeable in an interactive setting should nevertheless be considered significant.

Interactive applications using SQL exist and perform well. Those who designed and programmed those systems are quite competent. Programmers who write well-structured programs using native IO are also quite competent. There are a variety of competent choices to select from as we maintain existing systems or design new ones.

As time passes, we can watch and see which practices prevail and endure.

Posted by: Max at May 11, 2006 4:55 PM

In my experience, experienced Java developers conform to a consistent development practice. Each creates and uses functionality in a similar way especially now that Spring and Hibernate dominate and unit testing is de rigueur. The RPG3 programmers I know also seem to conform to a consistent practice. Both in this forum and in my workplace, the RPG programmers who have gone beyond RPG3 don't seem to have a consistent style they agree on nor consistent goals that one style or set of techniques can be said to achieve. Max, what are your goals for RPG modernization? One way to reduce the costs of maintaining code is to have similar functionality coded in a similar manner. Is that a realistic goal for your organization or is it already too late? Does your organization perform extensive code reviews? Is there shared ownership of the code? Inconsistency is costly and modernization that introduces inconsistency is probably not worthwhile. Another source of inconsistency (and thus, high maintenance costs) is programmer creativity and on this, I direct you to Steve McConnell's Code Complete.

Posted by: Greg at May 11, 2006 6:07 PM

Max, I only get upset when you continue to blatantly misrepresent me, misstate the facts, or otherwise are intellectually dishonest with the readers.

When talking about encapsulation you said "I think most programmers would choose SQL" and I said "I don't think a single programmer would choose this approach". It's clear from the context that I was talking about SQL not encapsulation, and yet you chose to misread it and then go on and on arguing with your own misreading. That's an example of the intellectual insincerity of this entire thread.

"But when you call "dumb" those who disagree with you, you are being a bit uncharitable."

Wow. Paul Conte made several personal comments about me, and you didn't chastise him. I simply said that the decision to not use the best tool for the job is dumb. I characterized the decision, Paul characterized me, and yet you call me out. Again, intellectual sincerity doesn't seem to be your forte.

"You have given no hard evidence of significant performance of native IO over SQL; you have only made claims."

Absolutely false. I've actually posted entire open source benchmark libraries that prove SQL to severely underperform RLA. Do a little research, Max. Go to www.midrange.com and search the archives for Pluta, SQL and benchmark. Here are numbers from V5R1:

(...for a simple CHAIN operation, native I/O ran 14000 records per second, while a single-record SELECT INTO ran 1400 records per second and a FETCH on a cursor ran about 950 records per second.)

10-to-1 is pretty significant in my book.

"Nor have you stated why performance differences which may be unnoticeable in an interactive setting should nevertheless be considered significant."

This is pure doublespeak and again, intellectually insincere. I'm simply stating the conditions under which native I/O outperforms SQL. It is you who are using fudge phrases like "may be unnoticeable". If there is ANY performance advantage in native I/O than it is up to you, the SQL advocate, to provide reasons as to why the underperforming solution should be used. It's not up to me to challenge your standard of "acceptable mediocrity".


Again, it's clear that no matter what I say you will try to defend your decision, but frankly it's a bad one and nobody but another SQL zealot is going to back you up.

Or Ugeerts. I like his point. If you're going to avoid using the best tools on the box, throw out the box.


Joe

Posted by: Joe Pluta at May 11, 2006 6:29 PM

I'm sorry you feel I'm being intellectually dishonest, Joe. I am sincere in my communications and I try to be clear as well.

I was confused about whether you were against encapsulation or the use of SQL in IO encapsulation because I found it odd that you would make the incredible statement that you didn't think a single programmer would make that choice. You should know that many have. IBM recommended it to the client I have mentioned before. Joel Cochran prefers SQL in his IO encapsulation articles on IT Jungle. Because programmers HAVE chosen SQL for this purpose, I wasn't sure if you were referring to the use of SQL or the use of encapsulation.

We disagree about how to decide whether to use SQL instead of native IO. We agree that SQL should normally be used for multiple row processing, but disagree on when it makes sense to use it for single-row or small result sets, and here our argument seems to be about criteria.

You seem to be saying that whenever native IO outperforms SQL, it should be used. Period. Other criteria do not matter. Use of SQL would be mediocre.

I am saying that where the difference is not significant, other criteria, if there are other criteria, may be used to make the decision.

I have visited your website in the past and reviewed that data. It actually confirms my position. The differences, Joe, by your own tests, in an interactive OLTP setting, are measured in microseconds or milliseconds. For a person sitting at a PC using an interactive application, the use of SQL or native IO is immaterial. They won't notice the difference.

That conforms with my experience. SQL in an interactive setting is no slower than native IO from a user experience point of view. Therefore other criteria can drive the IO choice, if there are other criteria that matter to the designers.

The only time the performance differences (which are now obsolete) would matter, would be if there was a long-running batch job that for some reason could not make use of SQL's multiple row strengths but had to execute single-row retrievals over and over again. For that non-OLTP batch process situation, native IO would be the better option, except perhaps for a cursor to get things going.

I cannot deny my own experience, Joe; I KNOW SQL performs well for interactive applications. Tests which show native IO beating repeated single-row retrievals in a batch process will not affect what I know about SQL in an interactive context. Microseconds do not matter to interactive users.

I, and many others, have our design reasons for preferring encapsulation and the use of SQL. Unless there are significant performance (or other) reasons for using native IO, we would default to IO procedures which use SQL.

This doesn't mean that those who prefer native IO for their design reasons are making dumb choices. And those who prefer native IO should not think our choices are dumb. We ought to be able to respect each other's preferences.

Posted by: Max at May 11, 2006 8:50 PM

Greg, you make excellent observations here. Yes, I agree with you that RPG IV developers have not developed the kind of consistent patterns of coding that Java developers have. And yes, there is a tension between the virtue of consistency and the desire to adopt newer coding practices. We are well aware of this tension.

The release of RPG IV initiated a "crisis" of sorts in the RPG developer community. RPG III programs could be converted to RPG IV form, but they still had an RPG III architecture.
What do we do with our standards? How do we evolve our programs from an older form to a newer form, or should we?

This is a very real challenge, but we have encountered this before in the past. Many RPG II programs predated structured modular programming, but when they evolved into RPG III form they could be changed, gradually or suddenly, into a better form.

We have given a lot of thought to this set of problems over the past year in conjunction with our new development forum. We favor developing some new consistent code patterns, and we favor adopting maintenance practices which encourage moving code from older forms to newer forms. But we need to develop ways of managing this effectively. Currently our developers are encouraged to adopt new practices from a set of approved new practices, under the guidance of their project leader, and we all understand that we are in a transition period from former standards to new standards.

Whenever IBM dramatically changes RPG, there is a period of upheaval and transition to new best practices. We've been through it before; we'll get through it this time, too.

Posted by: Max at May 11, 2006 9:06 PM

"You seem to be saying that whenever native IO outperforms SQL, it should be used. Period. Other criteria do not matter. Use of SQL would be mediocre."

Barring reasons that are of material benefit to the company, absolutely. You have yet to provide such reasons.

"I am saying that where the difference is not significant, other criteria, if there are other criteria, may be used to make the decision."

Certainly. You can make such decisions based on the phases of the moon if you so choose. However, you have yet to provide a compelling business reason to use SQL. I, on the other hand, have provided ample evidence to not do so.

"I have visited your website in the past and reviewed that data. It actually confirms my position. The differences, Joe, by your own tests, in an interactive OLTP setting, are measured in microseconds or milliseconds. For a person sitting at a PC using an interactive application, the use of SQL or native IO is immaterial. They won't notice the difference."

Arguments like this are exactly how poorly performing applications get written. It's not the milliseconds, it's the percentage. It takes ten times as long to do the same thing in SQL as in native I/O. When measured by one user on a dedicated machine, it's negligible. When you do that to a machine that's running at 90% or more of its capacity, it's criminal.

"That conforms with my experience. SQL in an interactive setting is no slower than native IO from a user experience point of view. Therefore other criteria can drive the IO choice, if there are other criteria that matter to the designers."

And my anecdotal experience is exactly the opposite. You seem to forget (or maybe didn't know) that I was the Manager of Architecture at SSA, the largest AS/400 software company in the world at the time. And at that time they were moving programs from native I/O to SQL, primary because they listened to arguments like yours and Paul Conte's. Programs like order entry slowed to a crawl. Price lookups became painfully slow. Things like three-way matches and purchase order allocations went from instantaneous to horrible.

"The only time the performance differences (which are now obsolete) would matter, would be if there was a long-running batch job"

Again, this is simply wrong. Go to a real world client and see what percentage of capacity they are running at. I will guess it is a very high number, and that the programs are I/O bound, not CPU bound. That being the case, adding a tenfold increase in disk I/O overhead will DESTROY their performance.

Is there any part of this I am being unclear on? Reducing performance is not acceptable unless you have a compelling business reason to provide lesser value to your end users.

"I cannot deny my own experience, Joe; I KNOW SQL performs well for interactive applications."

And I KNOW that native I/O performs better. Much, much better. You can say it over and over, Max, but that doesn't prove anything. You've now devolved completely into the "yeah, but it performs good enough" school of SQL zealotry.

And even if the performance gain for native I/O was negligible, you have yet to build a compelling business case for using SQL, or indeed to provide any reason at all other than the rather bizarre concept of calling a 30-year old technology "new".

"I, and many others, have our design reasons for preferring encapsulation and the use of SQL."

Since encapsulation can be done just as easily with native I/O as with SQL, I can only wonder what these design reasons are that make SQL so compelling to you, Max. What architectural decisions are you trying to justify? The only reason I can see for foisting lesser performance off on your users would be a desire to move your data off of the iSeries, or to replace your RPG programmers.

Amd the more I listen to your "arguments", the more that I believe that is exactly what you are trying to justify.

Joe

P.S. I'm allowed to think and even to say that certain decisions are dumb, Max. In fact, I'm happy that computer programming is one of the few places where Political Correctness has yet to take hold; it's acceptable for someone to look at someone else's design and say why they think it's flawed.

And without some other compelling business reasons, the decision to replace ISAM database access with SQL is indeed very, very dumb.

Posted by: Joe Pluta at May 11, 2006 10:06 PM

"That's Paul's vision of the future."

Amusing how a person who expects people to trust his opinions would make such a mis-step as to claim to know the actual content of the "Core Development Decisions" roadshow when (I gather) he hasn't attended.

Just to provide a bit of reality to balance this somewhat unreal diatribe, Carson and I both express great confidence in the future of the iSeries (whatever it may be called), and (in simple terms) advise that Java/J2EE with or without ILE HLLs on the backend are viable in some contexts, and CGI is in others.

I can assure you, if you want to know what works in the real world, you can't find anyone much better than Carson Soule to listen to. I'm just the "window dressing" and honored to be on the same bill.

Our simple agenda is to help people make good decisions for the long term vitality of their companies. If you're in the Columbus, Ohio or Dallas areas, come talk with us next month and see for yourself.

-- Paul

Posted by: Paul conte at May 11, 2006 11:12 PM

Yes, Joe, we are all entitled to think what we like and you may think that replacing ISAM with SQL is very dumb and I can think otherwise. The practices of our profession will continue to evolve taking into account changes in design theory and changes in software/hardware performance. I think we will see in the world of RPG a continuing increase in the use of SQL and a gradual decline in the use of native IO. Let's check back in in five years.

I never called SQL new; in fact, I pointed out its correct age. What I may have called "new" is the increased use of SQL in the RPG world. In think this is just a true fact known to all.

Since your tests were done there have been further enhancements of SQL performance on the iSeries, especially with the development of the new SQE which premiered with V5R2. It is probably no longer true that "It takes ten times as long to do the same thing in SQL as in native I/O." (and that truth was always limited to a particular type of task). I think one of the reasons SQL dramatically outperformed native IO in our recent benchmarks was because we are at V5R3 and our SQL options were able to use the new SQE instead of the older CQE.

Your argument against SQL in an interactive setting in spite of the miniscule performance difference is that small time differences do matter when machines run at capacity. Well, at capacity a lot things can slow things down, but as the row-level performance difference between SQL and ISAM continues to collapse (please look into the new SQE), once again, many may feel the difference is not the only thing that matters.

You say I have not mentioned architectural reasons why some may prefer SQL. I thought I have mentioned some of the advantages of SQL and of a more modular architecture, and it is well known that a primary motivation for modularity and encapsulation is to reduce maintenance costs. That is the intended ROI. I also know that some IT managers favor moving from native IO to SQL because of changing skill sets and common practices in the industry.

You say that "encapsulation can be done just as easily with native I/O as with SQL." Well, yes, but many of us would find it less elegant. SQL can do in one query what might take several native IO operations to do, and SQL ccontains IO logic within itself whereas as native IO is only IO and has to be surrounded with conditional logic.

I checked back on Joel Cochran's IT Jungle article and he shares this opinion: "I would like to point out that the example being built in this article uses SQL for all file access. This method can be easily adopted for native file access as well, but requires additional coding. If you are going to use native file access, I recommend you control the open and close status of the file and check in every access procedure for the file status."

Maybe I am an SQL zealot after all, Joe, instead of just an SQL enthusiast, but I remain unpersuaded by your arguments against the use of SQL in interactive applications. The performance difference is too small, and might not always favor native IO (I am thinking here of a logical file accessed by RLA versus a file accessed by SQL using a DDL index). I think design decisions should be made on the basis of a full set of criteria identified to be relevant to the project. Sometimes the full set of criteria may favor native IO and sometimes SQL, it depends on a lot more than just one factor.

I have seen a lot of changes in IT since I entered the field in 1973, and I expect to see a lot more before I am put out to pasture. Quite frankly, I do expect to see the gradual decline in the use of native IO as SQL performance continues to improve and architecture preferences continue to favor increased modularity and encapsulation. I could be wrong, but let's watch and see what happens over the next decade.

Posted by: Max at May 12, 2006 8:36 AM

I just want everyone to know that although I disagree with Joe about the reasonability of using SQL in an interactive context, I have a high regard for the man and his work and I have learned many useful things from his articles. I am particularly fascinated by some new innovative work he is doing in the area of combining AJAX with portal technology, and I would refer you to a recent article of his at http://www.mcpressonline.com/mc?1@156.POyzcx27595.0@.6b37286c

Hats off to you, Joe!

Posted by: Max at May 12, 2006 8:51 AM

Max: I am a bit confused. Using SQL and encapsulating all file access would require not only interactive but batch processes to use this access method as well. I have used dynamic Sql for interactive applications and think its great for allowing users to pick fields from a file to dynamically display columns in a subfile, sort or filter to their hearts content. This is very hard to do with native IO. But I have some serious concern using SQL in batch processes where you are retrieving a row from file a, then a row from file b, etc. for reference information for processing thousands of transactions. You could rework these to use set processing perhaps, but that would take serious rework of their design.

Most ERP system on the iSeries are still very batch oriented. I have yet to run across anything resembling a relational design where most if not all processing occurs as events happen.

Many business process, EDI for example are batch oriented by their very nature.

If iSeries software architecture continues to an all SQL path then I have serious concerns of the survival of i5/OS on the platform.


Posted by: Stevek at May 12, 2006 9:43 AM

As has been proven many times, anyone can make SQL perform poorly and then say it shouldn't be used. When someone shows me that SQL has poor performance, I can be sure that person this hasn't exerted much effort to learn SQL. And this goes back to my earlier point that modernization is more about people and processes than any particular technology. I think plans and goals are needed more than the reassurance that we've muddled through before.

Posted by: Greg at May 12, 2006 10:06 AM

"Amusing how a person who expects people to trust his opinions would make such a mis-step as to claim to know the actual content of the "Core Development Decisions" roadshow when (I gather) he hasn't attended."

Paul, I just go based on what you guys advertise. All I did was repeat what your roadshow has published on the web and invite people to draw their own conclusions as I have.

If anyone should get called on the carpet for mischaracterization, it should be you! You have pretty much called me a technological Luddite for not being on the SQL bandwagon, yet while you were writing COBOL I was designing operating systems. Some of the many firsts in the IBM midrange arena that I've been involved in:

The first distributed laser printer
The first standalone multi-port asynchronous communication device
The first commercial application using multi-tasking Pascal (a distributed POS system)
The first distributed PC network for credit card authorization (Navistar's national dealership network)
The first commercial client/server application using OS/2
The first commercial Java-based product
The first commercial web-facing product

I've also been involved with several open source projects, including my latest AJAX project, AIRPort.

So, while I'm sorry that you might feel a lttle bit put upon, my opinion is that my characterization of you is far more accurate than your characterization of me. But I'd be just as happy to drop the personal crap and move on.

Joe

Posted by: Joe Pluta at May 12, 2006 10:10 AM

Steve, you can use encapsulating strategies for new development while permitting preexisting applications to continue to use their IO strategies.

Many valuable applications which do not use IO encapsulation will continue to exist undisturbed by such new development. Encapsulation is a software strategy which can be application-specific and not disturb other applications.

I don't see an all-SQL world on the iSeries on the horizon; I just see increased SQL for new apps and also for some enhancements to existing apps. Even I do not see any ROI in reworking an ERP application to encapsulate IO.

Set operations can be used in batch processing, of course, but it is best designed in from the beginning. Revising a batch program which uses native IO strategies to use SQL set operation strategies might really alter the program design.

Because I don't see SQL as incompatible with RPG, I don't see SQL as a challenge to the i5/OS world. BTW, this month's IBM Systems magazine contains several articles about V5R4 enhancements to SQL and DB2.

Posted by: Max at May 12, 2006 10:11 AM

"Sometimes the full set of criteria may favor native IO and sometimes SQL, it depends on a lot more than just one factor."

Max, you aren't providing any real criteria nor rebuttal. You keep saying SQL has gotten better, yet provide no quantitative proof. You say that it's better to encapsulate using SQL, but provide only "elegance" as your business reason.

At this point, we've reched an impasse. I am using what I consider to be sound business decisions, and you are going by your perception of what is elegant. I've seen what "elegance" and an unreasonable adherence to the fad of the day does to a business application.

You may expect to see the decline of native I/O, but the same has been said of RPG for decades, and of our platform for nearly as long. And those of us who continue to use the strengths of the platform and the language rather than bow to some subjective measure of "elegance" have been proven correct for a long time.

You're right. We'll see.

Joe

Posted by: Joe Pluta at May 12, 2006 10:20 AM

Max: Then this is really not a "modernization" technique to be use by existing applications, but an architecture and framework to be designed and deployed for developers to use on new application development.

In my mind application modernization and new development arent the same thing.

Posted by: Stevek at May 12, 2006 10:24 AM

Yes, the modularization strategy of encapsulation (in general, not just for IO) is much easier to design into new applications than to modify into existing applications.

But this doesn't mean that developers can't be encouraged to use encapsulation strategies when enhancing existing applications. I think many shops which have moved from RPG III to RPG IV are in fact doing that as they move certain functions from original settings into service programs.

But trying to modify an existing application of significant size to adopt a comprehensive IO encapsulation strategy would be an enormous exercise. For an ERP system in particular, it should only be done by the software provider. Sometimes they do radical redesigns of their product (JD Edwards and Manugistics come to mind).

Use of encapsulation can play a role in application modernization efforts, but design constraints are likely to be encountered, especially for system-wide IO encapsulation.

RPG IV has opened the gates for much greater use of encapsulation for RPG applications, and many people may be doing it without calling it encapsulation. Whenever you "hide the implementation" of a function in a reusable service program procedure, you are "encapsulating" that function, and users of the procedure "write to the interface" (I am using OO language to refer to standard RPG IV practices).

Posted by: Max at May 12, 2006 10:48 AM

Max,

I do **NOT** believe IBM when they claim they've enhanced sql performance on iseries. IBM has become totally incompetent and irrelevant in todays ICT landscape, at least that has become my personal opinion as a half time migrator from as/400 apps to SAP or ORACLE (the other half working in SAP). But still, they may deserve a second chance. Can we have some meaningfull benchmark results on the latest and greated iseries hardware and o/s version?
SHOW US THE BEEF!

Posted by: ugeerts at May 12, 2006 12:16 PM

I'd say the difference between RPG record level business logic and SQL is far more than the execution times of READ, WRITE, UPDATE, and DELETE versus SELECT, INSERT, UPDATE, and DELETE.

And I'd also say isolating database IO to separate modules (presumably a service program) makes that difference even greater.

A lot of what we do in RPG is to drop out of loops as soon as some situation is met. A lot is based on reading one record at a time with the one record locked for update. And a lot is based on the program scaling to any amount of data, because if we're successful it does.

But doing it with SQL, which is inherently set oriented, is entirely different than the record oriented RPG native IO.

When we position to detail records for a key, we're positioned to read one record. There may be a thousand detail records, but we read one and may determine the key does not satisfy conditions for processing, and drop out and perhaps perform a SETGT to the next key.

But with SQL a result set is created, and while our record level code works for any amounts of data just as fast, which is the whole point of our success, SQL will build a result set in a buffer that is obviously affected by the size of the potential result sets. And passing it from separate IO modules compounds the overhead.

The result set is cursored, and after one it may be all abandoned and we start over. Even though we are reading one at a time with RPG, I think disk IO systems are going to be bringing a chunk of data up in IO buffers, so it's not all that much worse to read next as cursoring through a result set.

Obviously large amounts of data that are all processed will gain from all that work to build the result set, but I wouldn't write special SQL code for that.

In the meantime, where with record level IO we have the detail we are reading locked if for update, the equivalent would be only the record the SQL cursor is on locked for update and the rest of the result set not locked.

My guess is IBM has implemented some of the best row level locking around because they know this, but on the other hand I don't know. It's a scaling problem if it isn't.

I routinely have programmed through the years against AS/400 files with millions and even a billion records, and RPG record level IO works just as fast and just as good against a billion as it does one. Again, IBM knows this and perhaps with package preparation and the like can emulate RPG file access by key when SQL is executed, but in my opinion the AS/400 and RPG native IO should be used for all those billions of dollars of government systems that don't work because RPG and OS/400 scales and SQL, well, we've got billions and billions of dollars of SQL code that doesn't work against massive government databases.

RPG does in business. It would also for the federal government's failed projects across the board, but there's no money in that for what passes as government consulting companies, including IBM.

Remember, IBM spent a billion dollars trying to create an ERP infrastructure called San Francisco. It wasn't just Java, it was also SQL using all the same buzzwords throughout this blog.

A billion dollars down the drain. AS/400 companies, and RPG programmers, are proven to be a lot more cost effective than that.

rd

Posted by: Ralph Daugherty at May 12, 2006 9:21 PM

I'm afraid Ralph's explanation of how SQL works is simply not correct for the iSeries.

For some queries, SQL does build a temporary file to hold a result set.

However, for typical sequential processing by key, whether for an entire table (file) or a subset (e.g., detail records for an order) SQL uses keyed positioning, and does NOT use a temporary file or fill a buffer with the entire result set.

You can use an SQL cursor and Fetch statements to process a set of detail records in an RPG loop, dropping out when you want, "repositioning" for the next set of detail records when you want, and SQL leaves the ODP open and uses keyed positioning pretty much like RPG.

It's dead wrong that: "SQL will build a result set in a buffer that is obviously affected by the size of the potential result sets. And passing it from separate IO modules compounds the overhead."

Among the things that are useful to know about the differences between RLA and SQL is this:

RPG and Cobol bring an entire logical or physical file record into a program buffer, and fields are accessed as subfields of a record structure that overlays this buffer. The clever thing about the iSeries RPG and Cobol compilers is that they generate the program record structure from the file definition. On other systems, this is handled by coding a structure declaration in a "copy library".

Unless you use SQL for iSeries' proprietary Fetch to a multi-element structure, SQL brings in individual column values and stores them in scalar program variables. This provides a level of data independence, but incurs some overhead.

If you're unaware of SQL for iSeries support for Fetching multiple rows into an array of structures, you should learn about it for high-volume input.

SQL for iSeries uses row-level locking, rather than "page" locking, as is used for some other DBMS. A cursor's entire result set is NOT locked with lower levels of transaction isolation, so this is another non-issue.

As to SQL being "set-oriented" versus RLA being "record-oriented". It should be obvious that a "set" can consist of one row, so SQL actually provides BOTH perspectives. Furthermore, cursors are specifically designed to allow row-by-row processing, i.e., a "record-oriented" interface.

-- Paul

Posted by: Paul Conte at May 15, 2006 10:08 AM

Paul,

I wouldn't expect anything else. Since iseries SQL was a laid on top of a basically ISAM oriented DB, SQL naturally uses the low level ISAM api's which are in fact the RLA api's. That's why SQL peformance will never be better than RLA, unless SQL bypasses the RLA api's and looks for entry points deeper down in the DB object space.

Other rdbms manufacturers, Oracle, Ms, Adabas, Informix and most others don't have this legacy burden, that's why they outpace SQL on the iseries. Time for a contest, I would say.

Posted by: ugeerts at May 15, 2006 1:38 PM

If I understand the text and diagrams correctly in the new Redbook which I have mentioned before (and mention again below), I think SQL DOES now bypass the RLA APIs. (See section 2.2 on the CQE and SQE query engines in the redbook).

The Redbook seems to suggest that much of the SQL work is now deeper (closer to the database) than the RLA APIs.

From a prior post:
About improved SQL performance, IBM has just released a new Redbook which explains the differences between the older Classic Query Engine (CQE) and the new SQL Query Engine (SQE) available since V5R2. The new engine is faster for a number of reasons, including that of moving the optimizer below the MI (machine interface) into the SLIC (System Licensed Internal Code). This redbook can be downloaded from http://www.redbooks.ibm.com/redpieces/abstracts/sg246654.html

Posted by: Max at May 15, 2006 1:46 PM

Thanks Max. I'll have a look.

Posted by: ugeerts at May 15, 2006 2:13 PM

"That's why SQL peformance will never be better than RLA, unless SQL bypasses the RLA api's and looks for entry points deeper down in the DB object space."

This reasoning doesn't hold.

The HLL's have their own runtime libraries that sit above the lower operating system and LIC layers, and compilers generate code for every I/O statement, just like SQL does.

You might argue HLL DB code has a shorter path length, and therefore should be faster. Or vice-versa. But SQL isn't somehow "stacked" on top of the HLL's I/O layer.

-- Paul

Posted by: Paul Conte at May 15, 2006 6:06 PM

If I read Paul's post right, SQL on the AS/400 operates just like record level processing in RPG native IO.

If there's no difference, then the implication is that SQL is just a different syntax for doing RPG CHAIN, SETLL, READE, UPDATE, etc., but with more options. That would be a revelation.

Maybe I read it wrong.

If they operate the same, execution must be about the same, with a little vig for moving packed decimal into SQL integers and the like. That can't take that long.

If it runs significantly slower, say 20% or 30%, or up to 1000%, then what, we are supposed to use SQL syntax to do the same thing RPG IO does but a lot slower so we can say we're using SQL that only runs on the AS/400 anyway?

But since Paul described it as running the same as record level, I would expect the benchmarks will show it runs almost as good as record level. That'll also be a revelation.

viva SQL syntax.

rd

Posted by: Ralph Daugherty at May 15, 2006 7:06 PM

I'm sorry if I was unclear, but my post did NOT say that "SQL on the AS/400 operates just like record level processing in RPG native IO" and that is not a correct understanding.

Whether you execute an RPG Chain or an SQL Select in your program, the actual execution flow goes through multiple layers of system code.

The upper layers of this system code are different for RPG opcodes than for SQL statements, while some of the bottom layers are the exact same code.

Picture a "Y" with RPG above the left arm of the "Y" and SQL above the right arm. (This is an over-simplification, of course.)

The SQL-specific layers of code are not "on top" of the runtime code for RPG. Where both SQL's and RPG's runtime routines call the same lower-level DB APIs (what ugeerts calls "ISAM" or "RLA"), both SQL and RPG have their own layers of code on top of these APIs.

Again in over-simplified terms, the difference in execution speed between (e.g.) a Chain vs. a Select is based on the code path length, from top to bottom, down each respective arm of the "Y".

This doesn't imply SQL is faster, slower, or comparable. Just that SQL uses some of the same system code as RPG opcodes, and some different runtime code.

The SQL "arm" of the "Y" also isn't just a one-for-one re-implementation of RPG opcodes in a different syntax. That should be obvious to anyone who's RTFM, I would think.

There are, of course, many cases where you can achieve the same goal (e.g., reading a record by primary key) using an RPG opcode or an SQL statement. That doesn't imply SQL is simply a different syntax for RLA.

I encourage anyone considering the relative merits of SQL vs. HLL RLA to make sure they have an accurate understanding of SQL's implementation on the iSeries and aren't basing their decisions on some of the misconceptions being presented here as arguments against SQL.

-- Paul

Posted by: Paul Conte at May 18, 2006 12:01 PM

"...and aren't basing their decisions on some of the misconceptions being presented here as arguments against SQL."

And I also encourage anybody who is making a decision on SQL vs. RLA to use a little common sense, which is in short supply these days.

It is pure common sense to realize that the fastest access to a keyed file is direct access to an ISAM index. If you've ever actually written code to build and access ISAM indices (as I have), you know that it's the fastest thing available; you use your key to find the appropriate index entry and then use that entry to access the physical record.

To borrow from Paul's analogy, eventually the SQL statement will have to do the same thing: take some key information, use it to access the appropriate entry in an ISAM index, and then use that index entry to get the record.

In a standard ISAM setup like RLA, the index is predefined and prebuilt, and your program fills the key field directly with data in the form of a key list. Thus the ONLY steps are to access the index, read the record and return the data.

With SQL, there are optional additional steps, such as creating the access path or parsing the statement or building the key. While you can conceivably reduce the overhead using pre-built indices and prepared statements, even once that overhead is reduced or removed, your SQL program STILL has to travel on down the path of accessing the index and reading the data.

So until somebody can explain to you how the SQL engine (regardless of the implementation) uses less time to fetch a record using a key than a direct ISAM access to the same index, then any claim that SQL is somehow faster than RLA is simply smoke and mirrors.

And unless you've really spent a lot of time hardcoding your SQL access using things like prepared statements, then you will have a lot of overhead, which is why benchmarks show SQL access to be so slow.

It's also why SQL folks don't like benchmarks.

It's just commone sense.

Joe

Posted by: Joe Pluta at May 18, 2006 2:41 PM

As if to prove my point ...

If you want to do roughly equivalent keyed access with embedded SQL in an RPG module, you use static SQL statements, which are always prepared.

For static statements, the SQL precompilation process generates an MI code fragment that gets stored in the associated space of the module object. This code is just as "compiled" as any RPG opcode.

Several elements of this "access plan" code aren't resolved until the first time you run the statement on your production table(s), but after that it's just plain old compiled code.

And you don't have to "spend a lot of time hardcoding" SQL any more than writing RPG opcodes and error checking requires. This is a silly claim.

For fast keyed access an index must exist whether you're using RPG or SQL, and if an appropriate keyed access path exists, SQL uses it.

SQL tables have large index pages, so index processing can be _faster_ in SQL, depending on the pattern of keys and whether all column values being returned exist in the key.

There's much more to keyed I/O than just navigating an index, and a SQL statement has no inherent disadvantage in calling the system code that supports index functions.

In fact, where RPG I/O may have a shorter code path than SQL probably has more to do with the way SQL decomposes an internal record into the scalar values that are returned for each column that's retrieved. SQL has code to do data type conversions automatically, which adds some data independence and some overhead.

Certainly one thing I've observed is that reducing the number of columns retrieved speeds up SQL rertieval.

I haven't re-checked in the latest release, but using multi-row retrieval into a multi-occurrence data structure returned blocks of data (not columns) and was another way to speed up retrieval. Interestingly, you can burn yourself with this feature just as you can if you mis-define a record layout in RPG or Cobol. For both SQL and HLLs, using externally-described record structures makes it easier to assure accurate record structure definitions.

RPG basically brings a buffer into the program, although there may be some field copying afterwards.

SQL also provides a layer of error and diagnostic handling that RPG I/O doesn't involve.

OTOH, RPG I/O checks data validity on every "read" operation; whereas SQL doesn't -- a code path advantage for SQL.

Any developer should welcome credible benchmarks. But, as I've said, "toy" benchmarks are of limited value.

And "non-sense" shouldn't be mistaken for "common sense".

-- Paul

Posted by: Paul Conte at May 18, 2006 3:32 PM

"OTOH, RPG I/O checks data validity on every "read" operation; whereas SQL doesn't -- a code path advantage for SQL."

I'm not sure who told you this, Paul, but that's not what the RPG compiler folks told me. Data validity checking is determined not by access, but by definition: specifically whether you create your files with DDL or DDS. DDL files are checked at write time, regardless of access method, while DDS files are checked on the read.

Meanwhile, you are now locking the programmer into embedded SQL in RPG, even specifying iSeries-specific extensions.

I hope that everybody notes that as soon as performance is the issue, the SQL argument IMMEDIATELY devolves into platform-specific extensions and language dependent programming techniques.

Obviously the concepts of platform independence are tossed out the window; Paul's now even advocating using RPG!

So, to recap, I *STARTED OUT* saying that a combination of RPG and SQL, each used properly, is the proper technique. Paul just wasted days arguing the merits of SQL access, only to finally tell us that we should use a combination of RPG and SQL.

The only difference in our positions is that Paul would have you NOT use RLA, even when it performs better.

It's up to you, gentle reader, to decide which is "non-sense". I am truly done.

Joe

Posted by: Joe Pluta at May 18, 2006 4:11 PM

Joe is correct that a table defined with SQL provides validate-on-write, which is a potential performance advantage of using SQL, but for DB definition, not as a code path difference. I wasn't careful enough in how I stated the point, and appreciate the correction.

= = = =

On the other points ... Let's not set up "straw men".

First off, the discussion is (partially) about SQL vs. RPG built-in I/O, not about which language to use for business logic or presentation.

So of course, if you're using RPG as your main HLL, it makes sense in many cases to embed SQL in it. And because RPG isn't portable, such applications as a whole aren't either. But if you used C or Cobol or Java, they would be to a greater degree, although I'm not about to overstate the "portability" of actual SQL source code for a variety of reasons. In fact, if you came to the "Core" road show, you'd hear me explain that portability of SQL source code is of signficance only for certain types of companies (e.g., some ISVs) and situations.

I began my discussion of SQL and RLA by making clear:

"... let me say that you don't need to "rip-and-replace" stable, working RLA just to use SQL.

And, it's fine if you want to selectively use RLA in new code where it makes a truly significant difference. Especially if you've isolated your I/O code so you can easily change it if the need arises."

So Joe's assertion that "Paul would have you NOT use RLA ..." is a just a self-serving misrepresentation. I would have you use RLA selectively and for sound reasons.

The two substantive differences in our positions are:

* I suggest development groups take a broader view of their development strategies and tools, and recognize the many reasons to adopt SQL as a primary DB interface.

* I encourage developers to avoid being misled by some of the misunderstandings about the SQL implementation on the iSeries.

-- Paul

Posted by: Paul Conte at May 18, 2006 6:12 PM

"Joe is correct that a table defined with SQL provides validate-on-write, which is a potential performance advantage of using SQL, but for DB definition, not as a code path difference. I wasn't careful enough in how I stated the point, and appreciate the correction."

Just to be clear here: defining a file with DDL has nothing to do with programming with SQL. You can define a table with DDL and use it with RPG. You can define a file with DDS and use it with SQL. Thus the benefits of the DDL-defined table are the same for RPG as for SQL.

The "don't rip and replace" statement clearly implies writing nearly all new code with SQL. At first, it sounds like this means some sort of standard SQL. But as soon as we talk about performance, Paul jumps to platform-specific SQL code.

So now we're clear. It's taken some work and I'm still a little fuzzy on the details, but as far as I can tell Paul's position is to code primarily using platform-specific SQL, rather than native I/O (which is not only platform-specific but platform-optimized). In fact, if I read it correctly he recommends SQL embedded in RPG for performance, although I'm not sure; he also seemed to be saying you might write your business logic in C or Java. And THAT, my friends, makes the conversation far too surreal for me.

Joe

Posted by: Joe Pluta at May 18, 2006 9:11 PM

This thread has grown quite long and should be replaced with a new thread soon, but before then I would like to briefly summarize (as fairly as possible) what I see as the two main positions regarding Native IO (RLA) vs. SQL.

I would say there is a pro-RLA position and a pro-SQL position. Both positions say that both IO technologies should be used, and that the choice for any situation depends on the relevant selection criteria. Disagreements seem to exist about the weight given to different criteria (including architecture and code structure interests), and also about performance claims.

Both positions recognize that under certain conditions RLA will outperform SQL, and under certain other conditions SQL will outperform RLA. There is some disagreement about the significance of performance differences under some of these situations, or for how significant a performance difference must be to outweigh an architectural interest.

The situation is further complicated by the changing nature of SQL on the iSeries. Since V5R2 we have two SQL engines, the new SQE engine being more deeply based in the machine and much faster than the older CQE engine. The selection of the engine is made at runtime by the Query Dispatcher, but the choice may be predetermined by the solution design (creating DDL indexes, for example).

Benchmark data is lacking using the new SQE. Our recent benchmark DID use the SQE (at V5R3) and SQL significantly outperformed RLA, but this was under a circumstance where SQL would be expected to outperform native IO (set operations supporting an interactive decision support program).

It seems to me that there are at least three categories (possibly more) of processing situations where benchmarks would be interesting:

1. Batch processing where the SQL solutions get to use set operations. Here SQL will outperform RLA, but the extent of the difference would be interesting.

2. Batch processing where the SQL solutions have to use some single-row retrievals as well as set operations (one cursor, for example, and single-row retrievals while fetching through the cursor). Here the RLA solution might outperform the SQL solution, depending on the mix of IO activity.

3. Interactive applications where most IO activity will either be single-row or small result sets. Here RLA would be expected to be faster, but the degree of the difference would be interesting to measure.

Whenever I conduct benchmarks which may be of interest to others, I will consider posting the results on this blog. I hope to be able to do some, but such activity will be subordinated to other matters of higher priority.

Posted by: Max at May 19, 2006 9:15 AM

Max,

Amen.

Benchmark results is what probably all readers here are after. Tech talks on how low level api's are utilized may be interesting reading material before bedtime, but that doesn't do anything to make the right choices in hard and software design to solve a business problem at hand.

I would like to see real benchmark test result for an agreed upon system load scenario, acknowledging the scenario will never mimic 100% a real business load. The one you proposed seems to me a good starting point. Maybe the test should be broadened to multiple tasks in parallel and I would also throw in a certain random functionality instead of a static scenario.


Next, I would like to expand the benchmark on other hardware and rdbms designs, and in order not to compare apples and pears, on systems that are designed to support the same kind of applications (fi business vs scientific) and who reside in the same price class.
Now t*h*a*t would be interesting reading material, but I'm afraid those kind benchmark tests are only utopia.

PS: on the other hand, given the high price for iseries hardware, any wintel server could beat iseries easily (or not?).

Posted by: ugeerts at May 19, 2006 9:55 AM

Excellent summary, Max. My only issue is with the phrase, "how significant a performance difference must be to outweigh an architectural interest."

From the beginning of this thread I've struggled with the concept of "architectural interest". What exactly is this "architectural interest" that makes you favor SQL? Could there not be an architectural interest that favors RLA? In that case, would you be willing to dump SQL completely?

The concept bothers me because, without some form of quantifiable measurement, "architectural interest" could be little more than another way of saying "personal preference".

Joe

P.S. As to the question of comparing platforms, it's been hashed a billion times. When you try to create "equivalent" systems, please remember to include licensing fees, administrative personnel costs, anti-virus software, development tools, backup subsystems and so on. And no, I won't get sucked into that discussion; I've spent enough time in this blog.

Posted by: Joe Pluta at May 19, 2006 1:55 PM

Thanks, Joe; I'm glad you liked my summary.

Although I won't deny that I have personal preferences, I actually intended the phrase "architectural interests" to be a neutral recognition of architectural and coding structure interests which companies might have which also impact decisions about IO strategies.

For some companies these interests might favor native IO; for some companies they might favor SQL; and perhaps some companies will hold that performance always decides the choice.

The company which I have referred to which decided to have an ILE RPG system with a layered architecture where the IO layer was SQL only -- this was under V4R4 so performance was obviously not the reason for favoring SQL. I think their reasons were likely what I mean by "architectural interests" here.

I did not mean to imply that sound architectural interests will always favor SQL. I can understand that the architectural interests of some companies may strongly favor native IO. I merely meant to recognize that it is not uncommon for architectural interests to also play a role in decisions about IO strategies.

Posted by: Max at May 19, 2006 2:33 PM

"I did not mean to imply that sound architectural interests will always favor SQL."

Exactly. There is appropriateness of application to mission. Standards are a dandy thing, but if an architect has to fulfill a mission based solely on speed, then the design will be different than a mission based solely on time to market, so to speak.

We just completed a re-engineering project, moving to SQL being fed via Linux based ODBC from FTP feeding the native DB from Windows. We process something near 80 million transactions per 18 hour operation segment. It was quite a learning process. Speed was a major focus and a great deal of time went into multi-threading to compensate for some simply awful trigger processes that dragged things down. SQL wasn't the problem and it would have been as messy in any other style, I think.

Finally, the decision was made to use triggers only to feed messages to DQ messages that were being slurped up by services used to process transactions. (SQL is amazingly adept at data but does not defend itself from feeble design flaws.)

That left SQL pretty much standalone and the results were everything we expected. Many of the problems we suffered with legacy applications, like the simple definition of a new column, went away. Speed was much better than expected. Change control became so much smoother. File management is now quite pleasant. Metrics are all automated.

My point is that you have to sum a lot of things to measure the overall ability of an approach. A point several have made here. I vote with them.

Posted by: Jinx at May 21, 2006 4:55 PM

Interestingly enough, Max, you mentioned that the company decided to use SQL, and said that's what you meant by "architectural interest". And yet, unless I'm missing something, you never said WHY this company chose SQL. In fact, I haven't seen a single compelling argument as to why SQL would be used exclusively or even primarily, except that somebody decided it would be.

Now there may indeed be a good reason, but I'm hard pressed to see it. Instead, it sounds more like a decision made at the executive level after some consultant convinced them that this was the "next generation technology" or an "industry trend". It's exactly what happened when SSA decided to go Unix and SQL, despite the fact that every technical person on the staff explained why the architecture was a horrible fit for ERP systems. Instead, a couple of pseudo-technologists with more degrees than ability convinced upper management that there were "architecural interests" that superseded using the best tool for the job. And those, Max, are typically the worst possible decisions I've ever seen. This one bankrupted a 500 million dollar company in about five years.

And that's why I'm leery of architectural interests. I've spent my life writing code, learning new techniques, and identifying the best uses for various technologies. Over the years I've built quite a repertoire of skills and become relatively proficient at selecting the right one. And I've never gone wrong by making sure that performance was always a very high ranking criterion during that selection process. Of course, I'm not wedded to performance or I'd still be writing everything in assembly language or MI. But there was a time when, for example, C compilers didn't generate tight enough code and we had to write our own assembly language (especially for interrupt routines).

You have to be able to select the right technique. Remember, in the end it's all ones and zeros and if my ones and zeros are faster than yours, then I win. Now in reality there are other considerations such as time to market and maintainability and available skill sets and any number of things. But these are individual situations, and they don't change the concept of the best tool for the job. You may then consciously choose to use a different tool (this I think is a better definition of your "architetural interest") but even then it doesn't change the fact that the tool you decided not to use wasn't the best tool for the job; it just wasn't the RIGHT tool for this specific job.

My point is this: there is always a best tool and a bunch of runner ups for every situation. And the same tool isn't the winner for every situation. And once you've determined the best tool, there are indeed smetimes extenuating circumstances that might make the BEST tool not the RIGHT tool. However, if the reason given for not using the best tool is because "the industry" is "moving in the direction" of some specific coding technique, then that is usually a very bad sign. It usually means there's someone with an agenda (and a pet technology) somewhere in the chain. It's never a good idea to bet your mission critical systems on "industry trends" when those trends aren't backed up by cold, hard numbers.

Just ask SSA.

Joe

Posted by: Joe Pluta at May 21, 2006 8:03 PM

Concerning adding a field being problematic with RPG native IO but no problem with SQL, most RPG shops I have worked in through the years added fields to extension files rather than a master file used throughout many programs in a system. Only programs with new code using a new field is recompiled with the extension file, as it had to be recompiled and tested anyway for the new code.

Also, I have not seen any problem with adding fields to the end of record formats and turning level check off. This accomplishes the same thing if it is not worth creating an extension file for a field or two.

More importantly, OS/400 and RPG native IO is what is referred to as a competitive advantage. There are lots of companies running their enterprise on it.

Will competitors find a cost advantage to using SQL only programmmers against SQL only databases on the non-object based operating systems Unix, Linux, and Windows?

All the software vendors want business executives to believe that because that is all they have to sell, with IBM essentially keeping their mainframe COBOL and AS/400 RPG success stories a secret because they have Websphere to sell.

If people truly believe OS/400 with RPG is not a competitive advantage to SQL and other OS'es, then I would dump IBM, their mainframes and midranges, COBOL and RPG, and everything else IBM, and go with others.

I think RPG and RPG programmers are a competitive advantage, but if they're not, then IBM is not an advantage. Other SQL only solutions would be more cost effective.

rd

Posted by: Ralph Daugherty at May 21, 2006 9:35 PM

The IBM Redbook Modernizing and Improving Maintainability of RPG Applications with X-Analysis has a lot of value in its discussion of the goals for RPG modernization whether or not X-Analysis is used. The information fits well with Don Denoncourt's recent article on creating an API. Robert Martin's 1996 article for C++ Magazine on Dependency Injection is available on the ObjectMentor website. This article shows how dependency injection in OO languages achieves more reuse than procedural languages allow. I found the example on pages 3 thru 6 very enlightening.

Posted by: Greg at May 23, 2006 8:03 PM

No wonder systems are what they are...in this age to add fields to a database and not use LVLCHK(*NO)...or have file extensions to add fields....with all this technical expertise how hard is it to develop a "GLOBAL COMPILE" function that updates, flags, deletes all logicals, compiles all physicals and then compiles all DSPF files and CLP and RPG programs....it amazes me that this level of programming still exists in the AS 400 world!!!!!!no wonder users complain about lack of functionality....this is basic programming 101 on the AS400...if shops do not have this utility then they should not be using an AS400....I developed mine in 1992 and it still runs like a charm today....How sad this all sounds!!!!!!!!

Posted by: Perry at May 24, 2006 7:11 AM

[PaulConte]
Early in the discussion Aaron Bartell suggested: "To put forth an initiative to encapsulate every stinkin file or piece of business logic without even knowing if it will be reused is a plain waste of time."

This is dead wrong.

Encapsulation is a programming strategy that should be used pervasively. The point isn't mainly for code reuse, but rather to manage complexity.

A professional programmer should no more attempt their work without programming tools, such as encapsulation, factoring, decoupling, etc., than a surgeon should perform an operation without sharp surgical knives. OTOH, I don't recommend amateurs attempting even the simplest surgery, regardless of the tools at hand. Similarly with programming.

This "encapsulation" stuff isn't "woo-woo" programming -- it's basic competence.
[/PaulConte]


Sorry I haven't responded sooner to being "dead wrong" but this non-subscription blog stuff will be the death of me.

Anyways, Paul, maybe a little more explanation from myself is in order because I do not believe I am "dead wrong" - maybe need to modify my stated approach, but not "dead wrong".

I was strictly approaching purist encapsulation (i.e. encapsulating every PF with RPG ILE service progams) from an RPG programmers perspective - which I noted in my original post. I lived through trying to accomplish that same task and it just isn't intuitive with RPG nor productive (read zero tooling to autogenerate and refactor). To apply all programming thoughts to RPG as one would Java is more on the dead wrong side IMO.

Now, on the Java side it is an entirely different story. For instance right now I am developing an affiliate lead tracking web app using Tapestry as the view and Hibernate for my data persistence. I can't say how nice it is (note that I didn't say easy, more on that later) to be able to, with a few clicks of a button, completely setup all of my Java data access without writing hardly one line of SQL. The tools out there for Java are truly amazing. The only drawback is that you need to develop purist databases (i.e. constraints, foreign keys, no composite keys, etc). I am learning more and more about how to work around this supposed "drawback", and as I do I can see how flexible my model becomes. The tools are getting better all the time too - I use MySQL for non-iSeries Java development and right now am enjoying MySQL Workbench which gives me a graphical design aid tooling so I can visually see my table relationships: http://dev.mysql.com/downloads/workbench/1.0.html

Thanks for listening,
Aaron Bartell
http://mowyourlawn.com/blog

Posted by: Aaron Bartell at June 6, 2006 12:33 PM

Post a comment




Remember Me?


Acceptable Use Policy

Bill Blalock
January 2009
Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Blog Policy

We welcome your comments and opinions and encourage lively debate on the issues. However, Penton Media reserves the right to delete or move any content that it may determine, in its sole discretion, violates or may violate its Terms of Use or is otherwise unacceptable. For more information, see Penton Media's Terms of Use.

ProVIP Sponsors