COB2[1].Close Of Business – BATCH.JOB.CONTROL, Errors-R10.01.pdf
Welcome to the “Close Of Business – BATCH.JOB.CONTROL & Errors” learning unit. This learning unit will help you understand COB, the different stages of COB and the applications in T24 associated with COB.
COB2.Close Of Business – BATCH.JOB.CONTROL & Errors-R10.01
After completing this learning unit/course, you will be able to:
• Understand the internal working of COB
• Understand the working of tSM & tSA
• Visualize failure scenarios
• Understand the types of errors generated during COB
What have you seen till now
1. The various stages of COB
2. The fields in the BATCH record
3. tSM and tSA
4. TSA.SERVICE
5. TSA.WORKLOAD.PROFILE
6. TSA.PARAMETER
7. Running COB in phantom and interactive mode
8. Date Change
9. What the NS module is
10. Monitoring COB using enquiries
11. EB.EOD.ERROR
1. Default Printer specifies the default printer to which all output of the process is directed. If no printer has been specified in the PRINTER.NAME field of an individual job, the output goes to the printer defined here. If this field is left blank and no printer has been specified for the individual job, the default SYSTEM printer defined in DE.FORM.TYPE is used.
2. Printer Name holds the name of the printer to which the output of the job is sent. If left blank, the printer specified in the Default Printer field is used.
3. Data holds the input parameters for the job specified in JOB NAME. It usually contains ENQUIRY.REPORT IDs or REPGEN IDs. In this screenshot, ENQ followed by EU.FX.PL.TODAY might look like the name of an enquiry, but it is not. It is the ID of an ENQUIRY.REPORT record and must be specified in this manner in the DATA field.
4. Job Message: if the job results in an error, the error message is stored in this field. The message is cleared once the error has been corrected and the job’s Job Status changes to 2.
1. In T24, most COB routines are multi-threaded (not OS multi-threading, but simulated multi-threading).
2. A routine is broken up into 3 parts: LOAD, SELECT and RECORD.
   1. LOAD routine: initializes common variables and parameters
   2. SELECT routine: selects IDs from a file
   3. RECORD routine: contains the actual processing logic
3. The LOAD and RECORD routines are executed by all tSAs, whereas the SELECT routine is executed by only ONE tSA. The main job of the SELECT routine is to select all the IDs that need to be processed. Imagine a scenario in which interest needs to be accrued for 1000 accounts as part of a COB job. Would you want every tSA to perform the select, fetch the full list of account IDs and accrue interest for all the accounts? Of course not. It suffices for one agent to perform the select and store the selected IDs in a common file, so that all agents can share and process them. This is why only one agent performs the select while the others wait until the select is complete.
4. Breaking a long routine into logical parts that multiple tSAs execute is faster than having a single tSA execute one long routine.
1. As you know, a multi threaded routine comprises 3 parts. Each part is a separate routine that does a specific task. For example, a sample routine ROUTINE1 is made up of ROUTINE1.LOAD, ROUTINE1.SELECT and ROUTINE1 (the last is called the record routine).
2. Once the tSAs are started, each tSA executes the .LOAD routine.
3. Then only 1 of the tSAs executes the .SELECT routine while the others wait. As a result of the .SELECT routine, a LIST file is populated. This LIST file contains the actual list of IDs to be processed from the database (the result of the select statement inside the .SELECT routine).
4. Once the LIST is ready, all the tSAs pick up IDs from the LIST file and process them one by one by calling the record routine, i.e. the routine name itself, ROUTINE1, without any suffix.

Q. How is this different from a single threaded routine?
Ans. In the case of a single threaded routine, there is just one routine with the logic of .LOAD, .SELECT and execution built into it. Therefore, only 1 tSA processes the whole routine. In the case of a multi threaded routine, the logic is split across 3 separate routines and multiple tSAs can execute it at the same time, which decreases the total time taken to execute the routine and increases throughput.
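The division of work described above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the function names are hypothetical, the agents are simulated by a round-robin loop rather than real processes, and none of this is T24 code.

```python
from collections import deque


def run_multithreaded_job(load, select, record, agent_count):
    """Simulate agents sharing one job.

    Every agent runs LOAD, exactly one runs SELECT, and all agents
    drain the shared list by calling the RECORD routine per ID.
    """
    contexts = [load() for _ in range(agent_count)]  # LOAD: once per agent
    shared_list = deque(select())                    # SELECT: once in total
    processed_by = {agent: [] for agent in range(agent_count)}
    agent = 0
    while shared_list:
        record_id = shared_list.popleft()
        record(contexts[agent], record_id)           # RECORD: once per ID
        processed_by[agent].append(record_id)
        agent = (agent + 1) % agent_count            # round-robin stands in for concurrency
    return processed_by


def demo_load():
    return {"accrued": []}


def demo_select():
    return ["ACC%d" % n for n in range(1, 7)]


def demo_record(context, account_id):
    context["accrued"].append(account_id)


result = run_multithreaded_job(demo_load, demo_select, demo_record, agent_count=2)
print(result)
```

With two simulated agents, each account ID is processed exactly once and the work is split between the agents, which is the point of having a single select feed a shared list.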
1. How will I know if a routine is single threaded or multi threaded? Any job (routine) that is executed during COB has an entry in the application PGM.FILE. The ID of the record is the name of the job, and the field TYPE is set to ‘B’.
2. If the field BATCH.JOB contains a NULL ‘’ value or @BATCH.JOB.CONTROL, the routine is multi threaded and has 3 parts, e.g. AZ.CYCLE.DATES.LOAD, AZ.CYCLE.DATES.SELECT and AZ.CYCLE.DATES. NOTE that the .LOAD and .SELECT routines need not have an entry in PGM.FILE; T24 picks them up automatically while executing the routine.
3. If the field BATCH.JOB for the job contains the name of a routine (which can be the same as the name of the job), the routine is single threaded. In other words, if BATCH.JOB is neither NULL ‘’ nor @BATCH.JOB.CONTROL, the routine is single threaded.
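The rule above can be written as a small check. The following Python sketch is illustrative only: the function names are hypothetical, the BATCH.JOB value is passed in as a plain string, and stripping a leading @ from a single threaded routine name is an assumption.

```python
def is_multi_threaded(batch_job_field):
    """Apply the PGM.FILE rule: empty or @BATCH.JOB.CONTROL means multi threaded."""
    value = (batch_job_field or "").strip()
    return value == "" or value == "@BATCH.JOB.CONTROL"


def routine_parts(job_name, batch_job_field):
    """Return the routine names that would be involved in running the job."""
    if is_multi_threaded(batch_job_field):
        return [job_name + ".LOAD", job_name + ".SELECT", job_name]
    # Single threaded: one routine, named in BATCH.JOB (leading @ removed, an assumption).
    return [batch_job_field.strip().lstrip("@")]


print(routine_parts("AZ.CYCLE.DATES", ""))
print(routine_parts("MY.SINGLE.JOB", "@MY.SINGLE.JOB"))  # hypothetical job name
```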
1. The first agent (tSA) to be started is called the tSM (T24 Service Manager). The tSM monitors one or more agent processes called tSAs (T24 Service Agents). Every time an agent is started, it first checks whether it is the tSM. If not, it continues to work as an agent.
2. The tSA invokes a routine called S.JOB.RUN, which in turn invokes the routine EB.SORT.BATCH only ONCE. EB.SORT.BATCH sorts the COB jobs in ‘A’, ‘S’, ‘R’, ‘D’, ‘O’ order.
3. Assume that today is 04-Jan-2010 and that the table here contains sample records from the BATCH application. EB.SORT.BATCH fetches all the processes from the BATCH application whose PROCESS.STATUS field is not equal to 2, i.e. it picks up all the processes that are not complete. It then sorts the jobs based on the ‘Batch Stage’ and the ‘Frequency’. Once EB.SORT.BATCH returns the list of jobs to be processed, S.JOB.RUN invokes a core T24 subroutine called BATCH.JOB.CONTROL, about which you will learn in the next few slides.
4. In what order do you think EB.SORT.BATCH will give the jobs to each tSA for execution?
5. Each job is given to a tSA in the form of a dynamic array containing the details that you see on the screen now:
   PROCESS.NAME:'_':JOB.NAME:'_':RTN:'_':JOB.DATA:'_':COMPANY.ID:'_':NEXT.RUN.DATE:'_':ACTIVATION.FILE:'_':BATCH.STAGE:'_':JIDX:'_':BATCH.PROCESS.STATUS
   E.g.: BNK/AC.START.OF.DAY_ACCOUNT.DEBIT.LIMIT.UPD_BATCH.JOB.CONTROL_ _GB0010001_ _ _D110_1_0
   1. PROCESS.NAME - BNK/AC.START.OF.DAY (@ID of the BATCH record)
   2. JOB.NAME - ACCOUNT.DEBIT.LIMIT.UPD (JOB NAME in the above BATCH record)
   3. RTN - BATCH.JOB.CONTROL (contents of the field BATCH.JOB in the PGM.FILE entry for the above JOB.NAME)
   4. JOB.DATA - ‘’ (contents of the field JOB.DATA for this job in the BATCH record)
   5. COMPANY.ID - GB0010001
   6. NEXT.RUN.DATE - ‘’ (not needed for a job with FREQUENCY ‘D’)
   7. ACTIVATION.FILE - ‘’
   8. BATCH.STAGE - D110
   9. JIDX - 1 (multi value position of the above job in the BATCH record)
   10. BATCH.PROCESS.STATUS - 0
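To make the layout of the job descriptor concrete, the following Python sketch splits the example string from the notes into its ten named parts. It is an illustration only: the function name is hypothetical, and it assumes that none of the individual values contains an underscore.

```python
JOB_FIELDS = [
    "PROCESS.NAME", "JOB.NAME", "RTN", "JOB.DATA", "COMPANY.ID",
    "NEXT.RUN.DATE", "ACTIVATION.FILE", "BATCH.STAGE", "JIDX",
    "BATCH.PROCESS.STATUS",
]


def parse_job_descriptor(descriptor):
    """Split an underscore-delimited job descriptor into a field dictionary."""
    parts = [part.strip() for part in descriptor.split("_")]
    if len(parts) != len(JOB_FIELDS):
        raise ValueError("expected %d parts, got %d" % (len(JOB_FIELDS), len(parts)))
    return dict(zip(JOB_FIELDS, parts))


example = ("BNK/AC.START.OF.DAY_ACCOUNT.DEBIT.LIMIT.UPD_BATCH.JOB.CONTROL"
           "_ _GB0010001_ _ _D110_1_0")
job = parse_job_descriptor(example)
for name in JOB_FIELDS:
    print(name, "=", repr(job[name]))
```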
Trainer note for slide 8 (Sujoy Ghosh, 08/05/2008): Concentrate on explaining how the jobs are sorted. Have the trainees write the order on a piece of paper first and then check it against the slide. The slide is animated and the result of the sort appears on a mouse click, so wait for the trainees to finish before showing the animation. Make sure they understand that within a BATCH stage the jobs CANNOT be swapped or sorted; for example, JOB2 cannot be executed before JOB1. (Sujoy Ghosh, 2008/05/22)
1. In the previous screen, each of the agents was given a list of jobs to execute according to the Batch Stage. Now let’s understand how each tSA processes these jobs.
2. First, the JOB.PROGRESS field in the F.TSA.STATUS file is updated to 5, which means that the tSA is about to invoke a core T24 routine called BATCH.JOB.CONTROL. BATCH.JOB.CONTROL is the heart of the COB process and takes care of the entire execution logic. It is invoked once per job. The tSA first executes the .LOAD routine. In our example the name of the job is JOB5, so the name of the load routine is JOB5.LOAD. The LOAD routine initializes all the common variables required by the job.
3. Since all the tSAs execute the same job, they need to know where to pick up the contract IDs to process. The contract IDs are all stored in a LIST FILE.
1. What is a LIST FILE? A LIST file is one that holds all the IDs of the contracts selected from the database.
2. How does the tSA know which LIST file to use? BATCH.JOB.CONTROL contains logic to find a free LIST file.
3. This routine loops through F.LOCKING with @ID starting from F.JOB.LIST.1 and returns the first LIST file name that is available.
4. If this LIST file does not exist, BATCH.JOB.CONTROL creates it.
Note for slide 10 (salamelu, 2010/02/27): F.JOB.LIST.1 exists by default when T24 is shipped. Whenever the LIST file does not exist and BATCH.JOB.CONTROL (BJC) has to create it, BJC creates it using the properties of F.JOB.LIST.1.
1. Before deciding which LIST FILE to use, the tSA updates the JOB.PROGRESS field in F.TSA.STATUS to 3. The value 3 means that the tSA is managing a record in a file called F.BATCH.STATUS. What is F.BATCH.STATUS? It is a file that contains the status of a particular job (within a BATCH record). The record ID in this file is the FLAG.ID, viz. ProcessName-JobName-Multivalue position of the job in the BATCH record.
2. All the tSAs try to read and lock the record with @ID equal to the FLAG.ID in F.BATCH.STATUS. One tSA succeeds while the other tSAs wait for the lock to be released. The first time, this record does not exist in F.BATCH.STATUS. A jBASE command is used to read and lock the record; this command locks a record even if the record does not exist.
3. The tSA that has acquired the lock on the F.BATCH.STATUS record with @ID equal to the FLAG.ID performs the LIST file allocation (refer to the next slide for the allocation logic).
4. The value in the second position of the record is checked. If it is not ‘Processing’, the SELECT has to be performed. A value of ‘Processing’ signifies that the select is over and the processing of contract IDs should begin. Since the .SELECT has not yet been executed for this job, this position is empty.
Step 1: Read a record in F.LOCKING with @ID equal to Process.Name-Job.Name-Job.Position. Process Name is the name of the BATCH record, Job Name is the job within the process and Job Position is the position of the job in the BATCH record. Collectively this is called the FLAG.ID (P.N-J.N-M.Vpos). If the record exists, read the name of the LIST file that has been assigned to the job and continue with the select logic.
Step 2: Start a transaction block.
Step 3: If the record does not exist, read and lock a record in F.LOCKING with @ID equal to Process.Name-Job.Name-Job.Position. The first time, this record does not exist in F.LOCKING. A jBASE command is used to read and lock the record; this command locks a record even if the record does not exist. Therefore, the tSA that came in first reads the record in F.LOCKING and retains the lock until the transaction ends.
Step 4: Get the name of the LIST FILE that the .SELECT routine will populate.
Step 5: The name of the file takes the form F.JOB.LIST followed by a number, e.g. F.JOB.LIST.2. If the LIST FILE does not exist, the routine creates it automatically, thereby ensuring that the file physically exists. A record is created in F.LOCKING with @ID equal to the LIST FILE name; its content is the FLAG.ID. This stops other threads from picking up the same LIST FILE to populate data into it. Another record is created in F.LOCKING with @ID equal to the FLAG.ID; its content is the name of the LIST FILE. This publishes to the other agents that this particular LIST FILE is to be used for this job.
Step 6: End the transaction block.
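The two-way bookkeeping in F.LOCKING can be sketched as follows. This Python example is a simplified model, not T24 code: F.LOCKING is represented by a plain dictionary, the function name is hypothetical, and the record lock and the transaction block are deliberately left out.

```python
def allocate_list_file(locking, flag_id, prefix="F.JOB.LIST."):
    """Return the LIST file for a job, allocating the first free one if needed.

    `locking` stands in for F.LOCKING. Two entries are written per allocation:
    LIST file name -> FLAG.ID (reserves the file) and
    FLAG.ID -> LIST file name (publishes the choice to other agents).
    """
    if flag_id in locking:               # another agent has already allocated
        return locking[flag_id]
    number = 1
    while prefix + str(number) in locking:   # skip LIST files already in use
        number += 1
    list_file = prefix + str(number)
    locking[list_file] = flag_id
    locking[flag_id] = list_file
    return list_file


f_locking = {}
first = allocate_list_file(f_locking, "BNK/PROC-JOB1-1")   # hypothetical FLAG.IDs
second = allocate_list_file(f_locking, "BNK/PROC-JOB2-2")
again = allocate_list_file(f_locking, "BNK/PROC-JOB1-1")
print(first, second, again)
```

A second agent asking for the same FLAG.ID gets the LIST file that was already published, which is why only the first agent needs to run the allocation.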
5. Step 1: Start a transaction block.
6. Step 2: Update the JOB.PROGRESS field in F.TSA.STATUS to 2, which stands for selecting contracts to populate into the LIST FILE record. Execute the JOB5.SELECT routine. This routine populates the contracts in the LIST FILE record based on the logic described earlier.
7. Step 3: The .SELECT routine can be called multiple times, based on the values in a common variable CONTROL.LIST. The values are separated by FM (field markers), and a select is performed for each value. The contents of CONTROL.LIST are written to F.BATCH.STATUS: the record is updated with the value of CONTROL.LIST, with the first line containing VM PROCESSING VM MAX.LIST.ID VM KEYS.PROCESSED, where VM is a value marker. MAX.LIST.ID is a variable that contains the total number of records in the LIST FILE. KEYS.PROCESSED is a variable that contains the total number of contracts present in the LIST FILE.
8. Step 4: End the transaction block. Once the transaction block ends, the lock held on F.BATCH.STATUS by the first tSA is released. The other tSA that was waiting now gets the lock on the F.BATCH.STATUS record with @ID equal to the FLAG.ID. However, it finds that the content of the record is already set to “PROCESSING”, which means that some other tSA has already executed the .SELECT routine, so it continues to the next step, viz. executing the record routine for each contract.
A question which you should try to answer at this juncture:
• Q. What is the output of a select statement?
• A. A list of contract IDs that are selected from the database.
These contract IDs are written into a record of a LIST FILE. During the selection process, there is a call to a core T24 routine called BATCH.BUILD.LIST. This routine decides how to write data into the LIST FILE.
1. Let’s assume the select routine retrieved 5 contract IDs. It called BATCH.BUILD.LIST, which in turn created 5 records in the LIST FILE, one for each contract ID.
2. For example, suppose the multi threaded routine that we have written is FT.PROCESS.EOD. This routine has three parts: FT.PROCESS.EOD.LOAD, FT.PROCESS.EOD.SELECT and FT.PROCESS.EOD. The FT.PROCESS.EOD.SELECT routine fetches a list of FT contract IDs from the database, and these contract IDs are written into LIST FILE records. The diagrams above show the content of each LIST FILE record.
1. Conceptually, the LIST FILE produced by the previous select looks like this.
1. To improve performance, we can club a group of contracts together and write them into the LIST FILE delimited by VMs (Value Markers).
2. This process is called bulking.

Let’s go back to our example of the multi threaded routine FT.PROCESS.EOD. In the .SELECT part of the routine (FT.PROCESS.EOD.SELECT) we can tell the system how many contract IDs we want per row in a LIST record. Until now, every row in the LIST record has held only 1 contract ID. We can have up to a maximum of 200 contract IDs per row in a LIST record, each delimited by a VM (Value Marker). In the above example the bulk count has been set to 5.

How does this improve performance? We will understand this when we look at how contract IDs are picked up for processing later in this section.
1. Conceptually, the LIST FILE produced by the previous select looks like this if the bulk number has been set to 5.
1. The client might want to bulk records in a particular job to improve performance. In that case, they can open the PGM.FILE entry for that job and enter the number of contracts to bulk in the field BULK.NO. It is also possible to specify the bulk number in the parameter to BATCH.BUILD.LIST. However, the value configured in PGM.FILE takes precedence. The value in this field must be numeric and between 1 and 200, as the maximum bulk limit is 200.
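The precedence rule and the effect of bulking on the LIST records can be sketched in Python. This is a hedged illustration: the function names are hypothetical, a LIST record is modelled as a Python list of IDs rather than a VM-delimited row, and numbering the LIST records from 1 is an assumption.

```python
MAX_BULK = 200  # maximum bulk limit stated in the notes


def effective_bulk_no(pgm_file_bulk_no=None, parameter_bulk_no=None):
    """PGM.FILE BULK.NO wins over the BATCH.BUILD.LIST parameter; default is 1."""
    chosen = pgm_file_bulk_no if pgm_file_bulk_no else parameter_bulk_no
    if not chosen:
        return 1                      # no bulking: one contract per LIST record
    if not 1 <= chosen <= MAX_BULK:
        raise ValueError("bulk number must be between 1 and %d" % MAX_BULK)
    return chosen


def build_list_records(contract_ids, bulk_no):
    """Group contract IDs into LIST records of at most bulk_no IDs each."""
    records = {}
    for index, start in enumerate(range(0, len(contract_ids), bulk_no)):
        records[str(index + 1)] = contract_ids[start:start + bulk_no]
    return records


ids = ["FT%02d" % n for n in range(1, 13)]          # 12 hypothetical contract IDs
records = build_list_records(ids, effective_bulk_no(5, 10))
for key, group in records.items():
    print(key, group)
```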
Continuing from the previous slides, a tSA has completed the .SELECT routine. The other tSA, waiting for the lock on F.BATCH.STATUS to be released, now obtains it. It finds that the value in the F.BATCH.STATUS record has been set to ‘PROCESSING’, which means that the .SELECT routine has been executed. So both tSAs are ready to execute the record routine.
1. STEP 1: Both tSAs check whether the content of F.BATCH.STATUS is set to ‘PROCESSED’. If not, there are contract IDs to be processed. The tSA updates the JOB.PROGRESS field in the F.TSA.STATUS record to 1. This value stands for processing contracts.
2. STEP 2: Initialise a variable that will hold all the LIST records to be processed. The name of this variable is FULL.LIST.
3. STEP 3: Set the JOB.PROGRESS field in F.TSA.STATUS to 4, which means selecting from the LIST FILE.
4. STEP 4: The tSA that executed the select stores all the LIST record IDs separated by FM (field markers). All the other tSAs, which did not execute the .SELECT routine, perform a select on the LIST FILE, extract the LIST record IDs and store them in a variable. This is done to improve performance: every select statement executed is an I/O on the database, so the tSA that executed the .SELECT routine need not execute another select to pick up the LIST records from the LIST FILE.
This slide explains the working of one tSA only. The tSA now has to extract its portion of the work and start executing the RECORD routine. To do so, the tSA:
1. STEP 1: Populates a variable with the LIST records it has to execute. This variable contains the LIST record IDs separated by FM (Field Markers). For example, this tSA has to execute LIST record “1”.
2. STEP 2: Updates the JOB.PROGRESS field in F.TSA.STATUS to 3. This means that the tSA has started processing the contracts one by one.
3. STEP 3: Extracts the first LIST record ID from the variable.
4. STEP 4: Reads and locks the corresponding record in the LIST FILE. Remember that each LIST record contains contract IDs.
5. STEP 5: Starts a transaction block.
6. STEP 6: Extracts the first contract ID and executes the RECORD routine for it.
7. STEP 7: Checks whether all the contract IDs within the LIST record have been processed. If not, it deletes the processed contract ID from the LIST FILE, ends the transaction block and goes on to pick up the next contract ID within the same LIST record. If the contract just processed was the last one in the LIST record, it deletes the LIST record from the LIST FILE directly, ends the transaction block and proceeds to pick up the next LIST record ID to process.

The important thing to note here is that the transaction block is around each contract ID inside the LIST record, separated by FM (field markers). Both the deletion of a contract ID within the LIST record and the deletion of the LIST record itself happen within the same transaction block. This implies that even if the tSA dies after it has executed the RECORD routine and removed the LIST record from the LIST FILE, the changes are not committed until the transaction block ends.
This slide explains the working of another tSA (tSA 3). The tSA now has to extract its portion of the work and start executing the RECORD routine. In this case, the LIST record that the tSA has to process is 2. To do so, the tSA:
1. STEP 1: Populates a variable with the LIST records it has to execute. This variable contains the LIST record IDs separated by FM (Field Markers).
2. STEP 2: Updates the JOB.PROGRESS field in F.TSA.STATUS to 3. This means that the tSA has started processing the contracts one by one.
3. STEP 3: Extracts the first LIST record ID from the variable.
4. STEP 4: Reads and locks the corresponding record in the LIST FILE. Remember that each LIST record contains contract IDs.
5. STEP 5: Starts a transaction block.
6. STEP 6: Extracts the first contract ID and executes the RECORD routine for it.
7. STEP 7: Checks whether all the contract IDs within the LIST record have been processed. If not, it deletes the processed contract ID from the LIST FILE, ends the transaction block and goes on to pick up the next contract ID within the same LIST record. If the contract just processed was the last one in the LIST record, it deletes the LIST record from the LIST FILE directly, ends the transaction block and proceeds to pick up the next LIST record ID from ID.LIST.

The important thing to note here is that the transaction block is around each contract ID inside the LIST record, separated by FM (field markers). Both the deletion of a contract ID within the LIST record and the deletion of the LIST record itself happen within the same transaction block. This implies that even if the tSA dies after it has executed the RECORD routine and removed the LIST record from the LIST FILE, the changes are not committed until the transaction block ends.
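The all-or-nothing behaviour of the transaction block can be modelled with a small Python sketch. It rests on several assumptions: the database and the LIST FILE are plain dictionaries, a snapshot-and-restore stands in for a real transaction, a raised exception stands in for an agent dying, and all names are hypothetical.

```python
import copy


def process_list_record(list_file, database, key, record_routine):
    """Run the record routine and delete the LIST record as one unit of work.

    Returns True if the unit committed, False if it was rolled back.
    """
    saved_list = copy.deepcopy(list_file)    # snapshot = start of transaction
    saved_db = copy.deepcopy(database)
    try:
        for contract_id in list_file[key]:
            record_routine(database, contract_id)
        del list_file[key]                   # removal is part of the same unit
        return True                          # end of transaction: commit
    except Exception:
        list_file.clear()
        list_file.update(saved_list)         # roll back the LIST FILE
        database.clear()
        database.update(saved_db)            # roll back the contract updates
        return False


def demo_routine(database, contract_id):
    database[contract_id] = "DONE"
    if contract_id == "FT2":
        raise RuntimeError("simulated agent failure")


demo_list = {"1": ["FT1"], "2": ["FT2"]}
demo_db = {}
first_ok = process_list_record(demo_list, demo_db, "1", demo_routine)
second_ok = process_list_record(demo_list, demo_db, "2", demo_routine)
print(first_ok, second_ok, demo_list, demo_db)
```

After the simulated failure, LIST record 2 is still present and FT2 shows no update, so another agent could pick the record up and process it again.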
In the case of bulking, the concept we saw earlier, each LIST record contains a group of contract IDs in every row rather than just one. The contract IDs in a group are separated by VM (Value Marker). In the above screenshot, the bulk count has been set to 9.
1. When a tSA extracts contract IDs in this case, it extracts all the IDs up to the first FM (Field Marker). In the above example, tSA 3 extracts contract1, contract2 and contract3 together, i.e. until it encounters the first FM. After starting the transaction block, it processes all 3 contracts within the same transaction block. The record routine is executed three times (once for each contract ID) within a loop. Once the loop is over, the tSA removes the group of contract IDs from the LIST FILE in one go.

Q. How does this improve performance?
A. As you already know, once a contract is processed it is removed from the LIST FILE. Every removal of a contract is 1 I/O (input/output) on the disk. So, instead of 3 write statements on the disk (1 for each contract), only 1 takes place. This speeds up the entire COB processing. Thus bulking helps improve performance. However, it should be used only if we know that the select statement will retrieve a large number of contract IDs.
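The saving in disk operations can be counted with a short Python sketch. It is a simplified model under stated assumptions: the LIST FILE is a dictionary, each deletion of a LIST record counts as one I/O, and the function name is hypothetical.

```python
def process_list_file(list_file, record_routine):
    """Process every LIST record and return the number of delete operations."""
    deletes = 0
    for key in sorted(list_file, key=int):
        for contract_id in list_file[key]:
            record_routine(contract_id)      # one call per contract either way
        del list_file[key]                   # one removal per LIST record
        deletes += 1
    return deletes


contract_ids = ["contract%d" % n for n in range(1, 10)]

# Without bulking: one contract per LIST record.
unbulked = {str(i + 1): [cid] for i, cid in enumerate(contract_ids)}
# With a bulk count of 3: three contracts per LIST record.
bulked = {"1": contract_ids[0:3], "2": contract_ids[3:6], "3": contract_ids[6:9]}

processed = []
unbulked_deletes = process_list_file(unbulked, processed.append)
bulked_deletes = process_list_file(bulked, processed.append)
print(unbulked_deletes, bulked_deletes)
```

The record routine runs the same number of times in both cases; only the number of removals from the LIST FILE drops.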
Once a tSA has processed all the IDs in its ID.LIST, it returns to read F.BATCH.STATUS. The following steps take place.
1. Step 1: Start a transaction block.
2. Step 2: Both tSAs try to read and lock the record in F.BATCH.STATUS with @ID equal to the FLAG.ID. One tSA succeeds. A tSA that finds the record locked ends the transaction block and exits.
3. Step 3: The tSA that succeeded in reading the record and obtaining the lock checks whether the record contains any value.
4. If yes, it deletes the first line of the record and writes the record back to F.BATCH.STATUS. This is because the .SELECT routine has to be called again for the next value in CONTROL.LIST.
5. If no, the tSA updates the content of the F.BATCH.STATUS record with the string “PROCESSED”. This tells all the tSAs that all the contract IDs have been processed and that this batch job is complete.
6. End the transaction block.

The following steps clean up F.LOCKING and return control to S.JOB.RUN, so that the next batch job can be picked up for execution.
7. Step 4: Start a transaction block.
8. Step 5: Read and lock the record in F.LOCKING with @ID equal to the FLAG.ID. Only 1 tSA succeeds in doing so.
9. Step 6: Delete the record in F.LOCKING with @ID equal to the FLAG.ID.
10. Step 7: Read and lock the record in F.LOCKING with @ID equal to the LIST FILE name. Only 1 tSA succeeds in doing so.
11. Step 8: Delete the record in F.LOCKING with @ID equal to the LIST FILE name.
12. Step 9: End the transaction block.
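The way CONTROL.LIST drives repeated selects until the job is marked complete can be sketched as a loop. This Python example is a loose model only: the F.BATCH.STATUS record is a dictionary, locking and transactions are omitted, an empty CONTROL.LIST is assumed to mean a single pass, and every name is hypothetical.

```python
def run_job(control_list, select_for, record_routine):
    """Run one select-and-process pass per CONTROL.LIST value, then mark PROCESSED."""
    remaining = list(control_list) or [""]      # assumption: empty list = one pass
    batch_status = {"state": "", "control_list": remaining}
    while batch_status["control_list"]:
        current = batch_status["control_list"][0]
        contract_ids = select_for(current)      # .SELECT for the current value
        batch_status["state"] = "PROCESSING"
        for contract_id in contract_ids:
            record_routine(current, contract_id)
        batch_status["control_list"].pop(0)     # drop the value just completed
    batch_status["state"] = "PROCESSED"         # nothing left: job is complete
    return batch_status


calls = []
final_status = run_job(
    ["STEP.A", "STEP.B"],                                   # hypothetical values
    lambda value: [value + "-1", value + "-2"],
    lambda value, contract_id: calls.append(contract_id),
)
print(final_status, calls)
```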
1. Multiple batch records with the same batch stage
2. Randomize batch records with the same batch stage
3. Never randomize jobs
4. Ideal when multiple single threaded routines need to be executed simultaneously
1. Take a look at this table, which contains a set of records from the BATCH application. A batch stage has been specified for each record; note that they are all the same. When EB.SORT.BATCH is called internally to sort the records, it picks up all the BATCH records with the same BATCH.STAGE and randomizes them, while ensuring that the sequence of jobs within each BATCH record does not change.
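The randomization rule can be sketched in Python. This is an illustrative model rather than the logic of EB.SORT.BATCH itself: the function names and record layout are hypothetical, the stage is assumed to be one letter followed by a number, ordering by that number within a letter is an assumption, and FREQUENCY is ignored.

```python
import random

STAGE_ORDER = "ASRDO"  # stage letters in the order given in the notes


def stage_key(batch_stage):
    """Sort key: stage letter position, then the numeric part (an assumption)."""
    return (STAGE_ORDER.index(batch_stage[0]), int(batch_stage[1:]))


def sort_batches(batches, rng=None):
    """Order jobs by stage, shuffling BATCH records that share a stage.

    The jobs inside each BATCH record always keep their original sequence.
    """
    if rng is None:
        rng = random.Random()
    by_stage = {}
    for batch in batches:
        by_stage.setdefault(batch["stage"], []).append(batch)
    ordered_jobs = []
    for stage in sorted(by_stage, key=stage_key):
        group = list(by_stage[stage])
        rng.shuffle(group)                       # randomize records, never jobs
        for batch in group:
            ordered_jobs.extend((batch["id"], job) for job in batch["jobs"])
    return ordered_jobs


sample = [
    {"id": "BATCH1", "stage": "D110", "jobs": ["JOB1", "JOB2"]},
    {"id": "BATCH2", "stage": "D110", "jobs": ["JOB3", "JOB4"]},
    {"id": "BATCH3", "stage": "A100", "jobs": ["JOB5"]},
]
order = sort_batches(sample, random.Random(7))
print(order)
```

BATCH1 and BATCH2 may appear in either order because they share a stage, but JOB1 always precedes JOB2 and JOB3 always precedes JOB4.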
As you are aware, single threaded routines have neither a .LOAD nor a .SELECT component. They have only one routine, which contains the entire logic. The following steps happen internally when a single threaded routine is executed.
1. BATCH.JOB.CONTROL (BJC) recognizes that it is a single threaded routine, so it does not search for a .LOAD routine and no LOAD routine is executed.
2. BJC running on the agent that holds the lock on F.BATCH.STATUS writes the value ‘SingleThreaded’ to the LIST file.
3. The agent that gets the lock on the ID in the LIST file is the one that executes the single threaded routine.
4. The entire single threaded routine runs within a transaction block.
1. Assume that tSA 2 is executing a single threaded routine.
2. An agent recognizes that it is executing a single threaded routine from the field BATCH.JOB in the corresponding PGM.FILE record. This field holds the name of the routine to be executed, prefixed with @. Once the agent has recognized this, it displays that it is executing a single threaded job, along with the job name.
1. The tSA updates the JOB.PROGRESS field to 3 in F.TSA.STATUS to denote that it is processing the job.
2. Next, it reads and locks a record in F.BATCH.STATUS with @ID equal to Process.Name-Job.Name-Job.Position. This ID is called the FLAG.ID.
3. It proceeds with the LIST file allocation, following the same logic as for a multi threaded routine.
4. Then it checks whether the content of the record is Processed, Processing or NULL. NULL denotes that the job is yet to be executed, Processing denotes that the LIST file is ready for processing, and Processed denotes that the LIST file has been processed (meaning the job is done).
5. Since the job is only now being executed, tSA 2 updates the JOB.PROGRESS field to 2 in F.TSA.STATUS. This denotes that contracts are to be selected.
1. Then it starts a transaction block.
2. Since it is a single threaded routine, there is no SELECT routine to be executed either. However, unless there is a LIST file with IDs, the BATCH.JOB.CONTROL framework cannot execute the job. Hence, BATCH.JOB.CONTROL creates a record with ID 1 and writes the value ‘SingleThreaded’ in it.
3. Now the number of keys to be processed is 1.
4. The next step is to update F.BATCH.STATUS with the keyword Processing, to denote that the LIST file is ready for processing, along with the maximum list of keys to process (which is 1) and the number of keys to process (which is also 1 in this case).
5. This marks the end of the transaction block.
Now that the LIST file is ready with the string “SingleThreaded”, it is time to execute the actual single threaded routine. This is what happens internally.
1. Read and lock the F.BATCH.STATUS record. Whichever agent gets the lock is the one that updates the JOB.PROGRESS field to 1 in F.TSA.STATUS, to denote that it is processing the contents of the LIST file. Note that the LIST file does not have any contracts in it; it just has the value “SingleThreaded” written in it.
2. Update the JOB.PROGRESS field to 7 in F.TSA.STATUS to denote that a single threaded routine is being executed. The agent also locks the one record in the LIST file, with ID 1 and content ‘SingleThreaded’.
3. Start a transaction block.
4. Execute the routine specified in the field BATCH.JOB in the appropriate PGM.FILE record.
5. End the transaction block. Once the transaction block ends, the record is deleted from the LIST file. Now there are no more records to be processed.
6. Update the F.BATCH.STATUS record to “PROCESSED” to denote that the LIST file has been processed.
7. Start a transaction block.
8. Delete the records in F.LOCKING with ID equal to the FLAG.ID and ID equal to the F.JOB.LIST file name.
9. End the transaction block.
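The single threaded flow can be condensed into a few lines of Python. This is a rough sketch under stated assumptions: dictionaries stand in for the LIST file and the status record, locks and transaction blocks are omitted, and the function name is hypothetical. The JOB.PROGRESS values 1 and 7 are taken from the steps above.

```python
def run_single_threaded(routine):
    """Run a single threaded job through the same LIST file framework."""
    list_file = {"1": "SingleThreaded"}      # placeholder record, no contract IDs
    batch_status = "PROCESSING"              # LIST file is ready
    job_progress = []
    job_progress.append(1)                   # processing the LIST file contents
    job_progress.append(7)                   # executing a single threaded routine
    for key in list(list_file):
        routine()                            # the routine named in BATCH.JOB
        del list_file[key]                   # record removed once it has run
    batch_status = "PROCESSED"               # nothing left to process
    return list_file, batch_status, job_progress


ran = []
remaining, status, progress = run_single_threaded(lambda: ran.append("done"))
print(remaining, status, progress, ran)
```

The placeholder record exists only so that the framework has exactly one key to process, which is why the routine runs once.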
It is vital to understand that the tSM is itself a background process like the tSA. The only difference is that it does not execute COB jobs but monitors tSAs. Therefore, initiating the tSM is as good as initiating the first tSA. This tSA acts as a master agent and controls all the other agents. At any point in time, only one tSM can be running on a T24 server, but there can be as many tSAs as required on that server. Therefore, the TSA.WORKLOAD.PROFILE record used by the TSM service should never contain more than one agent.
1. Execute the command START.TSM with or without the -DEBUG option, depending on how you wish to start the tSM (interactive or phantom mode).
2. Once the tSM is started, it checks which agent is available to perform the role of a manager. For this, it scans a file named F.TSA.STATUS for records with STATUS set to STOPPED or DEAD. If you are running COB for the first time, this file will be empty.
3. Depending on which agent is available, it sets the field NEXT.SERVICE in the F.TSA.STATUS file to TSM for that particular agent.
4. Then it internally starts the agent allocated to be the tSM (1 in this case).
5. As soon as an agent starts, the first thing it checks is whether it is a tSM or a normal agent. It checks this based on the value in the NEXT.SERVICE field.
6. Once done, it updates the F.TSA.STATUS file with the details of the agent that is running the tSM. Details such as the server name, the process ID, the last time the agent wrote to this file, the status and the current service being executed by the agent are written to the F.TSA.STATUS file. Note that the value in NEXT.SERVICE is moved to CURRENT.SERVICE.
7. Since this agent (agent 1) is designated to be a tSM, it starts the tSM process by internally executing the command “tSM -DEBUG”.
8. The next step is to build the service profile. The service profile is the list of services that need to be started and the number of agents required for each service. At this juncture, the tSM updates the field Highest Agent in the TSA.PARAMETER file with the maximum number of agents running on a server.
9. Assuming that the required number of agents for COB is 2, it sets the field NEXT.SERVICE to COB for 2 agents (agents 2 and 3 in this case). The tSM then prompts you to start the necessary agents.
10. Apart from this, it also updates the F.TSA.STATUS file with the information about these 2 agents.
This explains the internal working of an agent that is designated to perform a role other than that of a manager (tSM):
1. Execute the command tSA followed by the agent number, optionally followed by -DEBUG, to start an agent.
2. Next, the system checks whether any service has been assigned to the agent that has been started (agent 2 in this case) by referring to the F.TSA.STATUS file.
3. This internally executes the tSA program followed by the agent number.
4. The agent checks whether it has to perform the role of a tSM. This is the first check that all tSAs perform when they are started.
5. Since the field NEXT.SERVICE for agent 2 is set to COB, the agent realizes that it has to execute COB.
6. It then updates the F.TSA.STATUS file with the details of agent 2.
7. Next, it internally calls a routine named S.JOB.RUN to trigger COB processing.
8. Next, it invokes a routine named EB.SORT.BATCH, which sorts all jobs in the BATCH application based on BATCH.STAGE and FREQUENCY.
9. Once this is done, jobs start getting processed.
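The role check that every agent performs on start-up can be pictured with a small Python model. This is only an illustrative sketch, not T24 code: the dictionary stands in for an F.TSA.STATUS record, the function name `start_agent` and the "idle" result are hypothetical, and clearing NEXT.SERVICE after the move is an assumption.

```python
def start_agent(status_record):
    """Decide an agent's role from its (simulated) F.TSA.STATUS record."""
    next_service = status_record.get("NEXT.SERVICE")
    if not next_service:
        return "idle"  # assumption: nothing has been assigned to this agent
    # The value in NEXT.SERVICE is moved to CURRENT.SERVICE.
    status_record["CURRENT.SERVICE"] = next_service
    status_record["NEXT.SERVICE"] = ""  # assumption: the move clears the field
    # First check made by every agent: is it meant to be the manager?
    return "manager" if next_service == "TSM" else "worker"


agent_1 = {"NEXT.SERVICE": "TSM"}
agent_2 = {"NEXT.SERVICE": "COB"}
print(start_agent(agent_1), agent_1["CURRENT.SERVICE"])
print(start_agent(agent_2), agent_2["CURRENT.SERVICE"])
```

Agent 1 takes the manager role because its NEXT.SERVICE is TSM, while agent 2 becomes a normal agent running COB.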
This illustration will help you to understand how multiple agents help perform a job and how the failure of a single agent does not stop COB:
1. Each server can have only 1 tSM. The tSM, when started, launches the required number of agents for that server.
2. Note that this tSM and its agents have been launched on Server A. On Server B, another tSM has been launched, which internally launches the required number of agents on that server.
3. Let us assume that all agents are helping to execute a particular COB job. Assume that one of the tSAs on Server A is the one that executed the .SELECT routine and hence populated the contents of the LIST file.
4. As you can see, all agents pick up IDs to be processed from the same LIST file, which is how multithreading is achieved.
5. If you wish, you could add one or more servers and start agents on them; these agents can also participate and process IDs from the LIST file.
6. Even if one of the agents crashes, all the other agents remain active and continue to pick up data from the LIST file.
7. This process continues until the LIST file becomes empty.
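The way several agents drain one LIST file, and keep going when one of them crashes, can be simulated in a few lines of Python. This is a simplified sketch and not how T24 is implemented: the agents take turns instead of running in parallel, and the names `run_job`, `crash_agent` and `crash_after` are hypothetical.

```python
from collections import deque


def run_job(contract_ids, agents, crash_agent=None, crash_after=1):
    """Simulate agents sharing one LIST file; returns (processed, remaining)."""
    list_file = deque(contract_ids)              # the shared LIST file
    processed = {agent: [] for agent in agents}
    alive = list(agents)
    while list_file and alive:
        for agent in list(alive):                # agents take turns
            if not list_file:
                break
            if agent == crash_agent and len(processed[agent]) >= crash_after:
                # Crash: the agent stops and the ID stays in the LIST file.
                alive.remove(agent)
                continue
            # Success: the ID leaves the LIST file only once it is processed.
            processed[agent].append(list_file.popleft())
    return processed, list(list_file)


ids = [f"C{n}" for n in range(1, 11)]
done, left = run_job(ids, ["tSA2", "tSA3", "tSA4"], crash_agent="tSA3")
print(done)
print(left)
```

In this run tSA3 crashes after its first contract, and the two remaining agents still empty the LIST file.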
1. How does the tSA tell the tSM that it is alive?
2. From time to time within the flow of BATCH.JOB.CONTROL, a routine named SERVICE.HEARTBEAT is called.
3. This routine takes care of updating the following fields in F.TSA.STATUS every 60 seconds:
3.1 LAST.CONTACT.TIME
3.2 JOB PROGRESS
3.3 LAST MESSAGE
1. How do we ensure that the TSM is alive?
2. On every call to SERVICE.HEARTBEAT, each tSA (any agent other than the one allocated for the TSM) checks whether the TSM is alive.
3. In Phantom mode, agents re-launch the manager if the TSM has not reported for a specified period of time. The last contact time of the TSM is maintained in F.LOCKING in a record with key as TSM:. If the difference between the current time and the last contact time recorded for the TSM is greater than 120 seconds, the tSA re-launches the TSM.
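The liveness rule for the TSM boils down to a single time comparison, sketched below in Python. The function and constant names are hypothetical, times are plain seconds, and treating Interactive mode as "never re-launch" is an assumption, since the notes describe the re-launch only for Phantom mode.

```python
TSM_TIMEOUT_SECONDS = 120  # threshold quoted in the notes above


def should_relaunch_tsm(now, tsm_last_contact, phantom_mode):
    """Return True when a tSA should re-launch the TSM."""
    if not phantom_mode:
        return False  # assumption: no automatic re-launch in Interactive mode
    # Re-launch only when the gap is strictly greater than 120 seconds.
    return (now - tsm_last_contact) > TSM_TIMEOUT_SECONDS


print(should_relaunch_tsm(now=1000, tsm_last_contact=900, phantom_mode=True))
print(should_relaunch_tsm(now=1000, tsm_last_contact=870, phantom_mode=True))
```

The first call is inside the 120-second window, so the TSM is left alone; the second is 130 seconds after the last contact, so the tSA would re-launch it.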
1. Log files are stored under a directory named &COMO&.
2. LIST &COMO& to get a list of log files.
3. JED &COMO& <log file ID> to view the contents of a log file.
4. Log file ID: tSA_<agent number>_<date>_<time>
5. Remember:
5.1 If an agent is restarted, there will be two log files for that agent, as the log file ID is based on date and time.
5.2 Log files do not get cleared automatically.
From time to time within COB processing, mainly within BATCH.JOB.CONTROL, a file called &COMO& is updated. This file resides in the T24 home directory (bnk.run) and contains the log of what every tSA has done. There is one record per tSA per day in this file. The @ID of a record in this file is tSA_<agent number>_<date>_<time>. The date format is YYYYMMDD and the time format is HH-MM-SS, e.g. tSA_2_20100324_17-32-10. We can view the log stored in &COMO& through the JED editor. At the jshell prompt, type JED &COMO& followed by the record ID to view the log of a particular tSA, e.g.: jsh ---> JED &COMO& tSA_2_20100324_17-32-10
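If you need to sort or filter log records outside T24, the record ID can be split into its parts. The Python sketch below assumes that the number after the tSA prefix is the agent number, as in the example ID above; the function name `parse_como_id` is hypothetical.

```python
from datetime import datetime


def parse_como_id(record_id):
    """Split a &COMO& record ID into (agent number, start date and time)."""
    prefix, agent, date_part, time_part = record_id.split("_")
    if prefix != "tSA":
        raise ValueError(f"not a tSA log record: {record_id}")
    # Date is YYYYMMDD and time is HH-MM-SS, as described in the notes.
    started = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H-%M-%S")
    return int(agent), started


agent, started = parse_como_id("tSA_2_20100324_17-32-10")
print(agent, started.isoformat())
```

For the example ID this yields agent 2 and a start time of 24 March 2010, 17:32:10.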
1. What are the different types of errors that could be encountered during execution of COB?
1.1 Errors caused by jBASE level problems, such as an error reading a file or an error opening a file, are called jBASE level errors.
1.2 Errors caused by O/S level problems, such as inappropriate permissions on a file or inadequate memory, are called O/S level errors.
1.3 Within the individual COB jobs, three types of errors can be raised: Non Fatal Errors, Fatal Errors and Critical Errors. You will look at these three types of errors in the next slides.
2. What happens when a jBASE/OS level error occurs? The agent executing COB will crash out, going either to the jBASE debugger prompt or to the jsh (jshell) prompt. The error has to be corrected before restarting COB. The PROCESS.STATUS and JOB.STATUS fields in the BATCH record will be 1 (meaning ‘running’). No record is created in the applications EB.EOD.ERROR and EB.EOD.ERROR.DETAIL, because the error was not raised by the COB job. An error is logged in these applications only when it can be corrected before starting COB during the next working day, i.e. the error is not critical.
1. What is a Non Fatal Error? If an error is encountered during the execution of a job, the individual COB job/routine calls a core T24 API named FATAL.ERROR. A parameter can be passed to this routine telling it not to fatal out. In that case the agent executing COB does not crash out; instead, it writes the error to the applications EB.EOD.ERROR and EB.EOD.ERROR.DETAIL and then continues executing COB. A field in EB.EOD.ERROR named FIX.REQUIRED is set to ‘YES’, indicating that the error(s) must be fixed before COB is started the next day. The PROCESS.STATUS and JOB.STATUS fields in the BATCH record will be “1”, signifying that the current job, and therefore the process, is running.
2. What is a Fatal Error? If an error is encountered during the execution of a job, the individual COB job/routine calls a core T24 API named FATAL.ERROR. A parameter can be passed to this routine telling it to fatal out. In that case the agent executing COB WILL crash out, writing the error to the applications EB.EOD.ERROR and EB.EOD.ERROR.DETAIL. However, before crashing out, the agent removes the contract ID from the LIST FILE. This means that the agent can be restarted and will resume processing from where it left off, minus the contract that caused it to fatal out. The PROCESS.STATUS and JOB.STATUS fields in the corresponding BATCH record remain “1” (which stands for “Running”).
3. When a Fatal Error is encountered, the problem needs to be reported to the Temenos Help Desk to get it fixed.
1. When a FATAL.ERROR occurs while you are running TSM in Interactive/Debug mode, the error is displayed on the screen and written to the COMO before the agent crashes out and returns to the jshell prompt. The contract that caused the error is also removed from the LIST FILE. If you are running TSM in Phantom mode, the agent still crashes out, but the error is not displayed on the screen. The AGENT.STATUS enquiry will show that the agent has been stopped, and a new agent will automatically be launched by TSM once the REVIEW.TIME has been reached.
2. The error is written to the EB.EOD.ERROR and EB.EOD.ERROR.DETAIL files. If there are multiple agents running, the remaining agents continue COB from where this agent left off. In Interactive mode, you will have to start the agent again manually.
Q. How do you tell T24 that a job is a CRITICAL job?
A. Set the ADDITIONAL.INFO field in the PGM.FILE entry for that job to .CRITICAL
1. What happens when a critical job crashes? The error is treated as a Critical Error. If an error is encountered during the execution of a critical job, the agent executing COB crashes out and, in Interactive mode, returns to the jBASE prompt. In Phantom mode, the AGENT.STATUS enquiry output shows that the agent has stopped. Before crashing, the agent DOES NOT write the error to the applications EB.EOD.ERROR and EB.EOD.ERROR.DETAIL and DOES NOT remove the contract ID from the LIST FILE. The PROCESS.STATUS and JOB.STATUS fields in the corresponding BATCH record are updated to 3 (which stands for Error/Hold). This means that if a critical job fails, the cause of the error needs to be corrected immediately, before COB is restarted.
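The four kinds of error described in these notes differ in only a handful of outcomes, so it can help to see them side by side. The Python table below is just a study aid built from the notes: the key names are hypothetical, and `None` marks an outcome that the notes do not state.

```python
# Outcome per error type, as described in the notes (None = not stated).
ERROR_BEHAVIOUR = {
    "jBASE/OS level": {"agent_crashes": True, "logged_in_eb_eod_error": False,
                       "id_removed_from_list": None, "batch_status": 1},
    "Non Fatal": {"agent_crashes": False, "logged_in_eb_eod_error": True,
                  "id_removed_from_list": None, "batch_status": 1},
    "Fatal": {"agent_crashes": True, "logged_in_eb_eod_error": True,
              "id_removed_from_list": True, "batch_status": 1},
    "Critical": {"agent_crashes": True, "logged_in_eb_eod_error": False,
                 "id_removed_from_list": False, "batch_status": 3},
}


def logged_error_types():
    """Error types that leave a record in EB.EOD.ERROR."""
    return [name for name, outcome in ERROR_BEHAVIOUR.items()
            if outcome["logged_in_eb_eod_error"]]


print(logged_error_types())
```

Only Non Fatal and Fatal errors are logged in EB.EOD.ERROR, and only a Critical Error moves the BATCH status to 3.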
1. Repeated crashes at system level, i.e. crashes in the LOAD and SELECT routines, now stop the service. Previously, crashes in the LOAD and SELECT routines did not stop the service: repeated crashes at system level were not detected by the tSM, which tried to complete the task through other agents.
2. Repeated crashes on the same job within a time interval stop the service. The duration and the number of crashes can be configured in the TSA.PARAMETER table through the fields STOPPAGE.TIME and STOP.COUNT (the number of times a crash may happen for a service, i.e. for its agents, within the specified time). The first time an agent is marked as ‘DEAD’, the TSM updates a record with key as -STOP in F.LOCKING with the stop time and a stop count of one. If it is not the first crash, the time difference between the first stop time and the current time is checked; if it has exceeded the stop time period specified in TSA.PARAMETER, the current time is recorded as the first stop time. Otherwise, if the stop count has exceeded the specified limit, the service is stopped.
1. Date specifies the bank date on which the service was started.
2. Started specifies the time at which the service was started, in the format DD/MM/YYYY HH:MM:SS.
3. Stopped specifies the time at which the service was stopped, in the format DD/MM/YYYY HH:MM:SS.
4. Elapsed specifies the elapsed time of the service in seconds, i.e. the difference between the start and stop times of the service.
5. Transactions specifies the number of transactions processed by the service.
6. Stoppage Time specifies the time interval over which a service is monitored for the number of crashes allowed, as specified in the field STOP.COUNT. If no value is specified here, the value is taken from TSA.PARAMETER. The time given in this field must be greater than TIME.OUT/DEATH.WATCH.
7. Stop Count specifies the number of crashes allowed for a service within the time period specified in the field STOPPAGE.TIME. If no value is given for this field, the value is taken from TSA.PARAMETER. For example, with STOPPAGE.TIME = 100 and STOP.COUNT = 3, the service can crash only 3 times in a period of 100 seconds. At the fourth crash within 100 seconds, the service is stopped.
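The interplay of STOPPAGE.TIME and STOP.COUNT can be tried out with a small Python model. This is a sketch of the rule as described in these notes, not the T24 implementation: the class name `CrashMonitor` is hypothetical, times are plain seconds, and resetting the count to one when a new window starts is an assumption.

```python
class CrashMonitor:
    """Track crashes of one service within a STOPPAGE.TIME window."""

    def __init__(self, stoppage_time, stop_count):
        self.stoppage_time = stoppage_time
        self.stop_count = stop_count
        self.first_stop_time = None
        self.count = 0

    def record_crash(self, now):
        """Register a crash; return True if the service must be stopped."""
        window_expired = (self.first_stop_time is not None
                          and now - self.first_stop_time > self.stoppage_time)
        if self.first_stop_time is None or window_expired:
            # First crash, or the old window has expired: start a new window.
            self.first_stop_time = now
            self.count = 1  # assumption: the count restarts with the window
            return False
        self.count += 1
        return self.count > self.stop_count


monitor = CrashMonitor(stoppage_time=100, stop_count=3)
print([monitor.record_crash(t) for t in (0, 10, 20, 30)])
```

With STOPPAGE.TIME = 100 and STOP.COUNT = 3, the first three crashes are tolerated and the fourth crash inside the 100-second window stops the service.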
1. Fatal errors (both online and during COB) provide more useful information.
2. The call stack and core dump are written to a new file, EB.SYSTEM.STATUS, along with a transaction reference. The call stack contains all calls made before the crash happened.
3. The above changes will be packaged as a part of the TEC framework.
4. The key is the transaction reference and a unique number.
5. All variables and their values at the time of the crash are also stored.
6. The core dump uses jBASE system functions to obtain the required information.
1. What happens when a tSA crashes while processing the LOAD routine?
Ans. Since multiple tSAs execute the LOAD routine, if this tSA fails, another tSA will execute it anyway.
2. What happens when a tSA crashes just after executing and populating the LIST file?
Ans. The LIST FILE to be used for a particular job is written to F.LOCKING. Therefore, even if this tSA crashes, the other tSAs will read F.LOCKING and pick up the LIST FILE to be used.
3. What happens if a tSA crashes while processing the SELECT routine?
Ans. The SELECT routine is executed by only one tSA and is wrapped in jBASE transaction management. So, if the tSA crashes while doing the select, any partial updates are rolled back. The other tSA waiting for the lock on F.BATCH.STATUS to be released will perform the select.
4. What happens if a tSA crashes while executing a contract ID in the LIST record?
Ans. The processing of the contract ID(s) in each LIST record is wrapped in a transaction block, so any failure in the tSA results in a rollback. The contract ID is not removed from the LIST record. Another agent will pick up that particular LIST record and execute the contract ID(s).
So, since the TSM is still running, it will prompt the user to start the 3 agents required to run COB.
• The user starts the first tSA. The tSA calls EB.SORT.BATCH, which in turn reads F.BATCH and picks up all the processes with PROCESS.STATUS ready (0) or running (1). So, it leaves out the process BNK/PL.CLOSE.EOD, as its PROCESS.STATUS is already 2 (completed).
• It reads the JOB.STATUS of Job7 and finds that it is set to 2. This means that Job7 has been completed successfully.
• The tSA then executes the Job8.LOAD routine and tries to read a record in F.LOCKING with @ID equal to FLAG.ID (Process.Name-Job.Name-Multi value position of that job within the process, e.g. BNK/FT.EOD-Job8-2). It finds that such a record exists, as the previous tSA (which crashed) had already written this information into the F.LOCKING file.
• The tSA now has the name of the LIST FILE into which the LIST records should be populated, e.g. F.JOB.LIST.2
• It now tries to read a record in F.BATCH.STATUS with @ID equal to FLAG.ID. However, such a record does not exist, as the previous tSA crashed before writing the information into F.BATCH.STATUS.
• The tSA gets the lock on F.BATCH.STATUS and executes the Job8.SELECT routine. If the other tSAs have also been started by this time, they will wait until this tSA has finished executing the Job8.SELECT routine, as this tSA is holding a lock on F.BATCH.STATUS.
• It executes the Job8.SELECT routine and populates the LIST records into the LIST FILE, F.JOB.LIST.2
• Suppose that, as a result of Job8.SELECT, 90 contract IDs are picked up from the database. The LIST FILE will then have 90 LIST records, each holding one contract ID.
• It writes the updated information into F.BATCH.STATUS with the value VMPROCESSINGVM90VM90 and ends the transaction block.
• Now the other tSA(s) waiting for the lock on F.BATCH.STATUS to be released will read the information in it and find that the value is set to “PROCESSING”. This means that the .SELECT routine has been executed. All the tSAs will now proceed to execute the RECORD routine.
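The decision that each tSA makes after reading F.BATCH.STATUS can be summarised in a short Python sketch. It is illustrative only: the dictionary stands in for the F.BATCH.STATUS file, the function names are hypothetical, and reading "VM" in the notes as the jBASE value mark (character 253) is an assumption. The notes do not say what the two trailing 90s mean, so the sketch ignores them.

```python
VM = chr(253)  # assumption: "VM" in the notes denotes the jBASE value mark


def make_flag_id(process_name, job_name, position):
    """Build FLAG.ID as Process.Name-Job.Name-multi-value position."""
    return f"{process_name}-{job_name}-{position}"


def next_step(batch_status_file, flag_id):
    """Decide what a tSA does next for a job (simulated F.BATCH.STATUS)."""
    record = batch_status_file.get(flag_id)
    if record is None:
        return "SELECT"   # no record yet: this tSA must run the .SELECT routine
    # Take the first non-empty value of the record as the status word.
    status = next((value for value in record.split(VM) if value), "")
    if status == "PROCESSING":
        return "RECORD"   # the select is done: process IDs from the LIST file
    if status == "PROCESSED":
        return "DONE"     # the LIST file has already been processed
    raise ValueError(f"unexpected status: {status!r}")


flag_id = make_flag_id("BNK/FT.EOD", "Job8", 2)
print(flag_id)
print(next_step({}, flag_id))
print(next_step({flag_id: VM + "PROCESSING" + VM + "90" + VM + "90"}, flag_id))
```

In the restart scenario above, the first tSA finds no record and runs Job8.SELECT; the tSAs that were waiting for the lock then find “PROCESSING” and go straight to the RECORD routine.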
1. False 2. False 3. True 4. False 5. True
In this learning unit, you learnt about the internal working of COB. You will now be able to:
1. Understand the internal working of COB
2. Understand the working of tSM & tSA
3. Visualize failure scenarios
4. Understand the types of errors generated during COB