IAT

Software Instrumentation and Analysis
Using The Analysis Reports
Running Your Own Software Analysis
Testing and Lab Reduction

Regression Analysis
Process and Traceability
Other Services
Lessons Learned
Internal Operations
Regular Expressions

IAT tailoring and metrics
Tailoring FAQ
Metrics FAQ

Summary

The Complete Instrumentation and Analysis Tool (CIAT), or IAT, is used to support the development of safe and secure software and hardware. Using IAT allows a program to show traceability of software and VHDL down to the source code level and to automate many of the tedious, manually intensive tasks associated with code analysis and testing. To support the test phase it instruments the software source code, performs data reduction and analysis of data captured during lab testing on the target, and supports regression test analysis. During the design and coding phase it provides a number of automated source code checks to support analysts during code walk throughs and code inspections. At all phases it shows where req's are met in the code and identifies discrepancies between the project SRDB and the actual as-built software. It accomplishes these services by reading in the actual software source code and the req's from the SRDB.

Its strength is that it was developed while supporting the needs of a real program. Services were added to its Internet framework and tested as those needs arose. It is based on web technology and provides HTML reports with hyperlinks that allow analysts to jump between actual lines of code, test instrumentation points, requirements, and code analysis findings. This evolution has taken the IAT to the point where its services can be used for reverse engineering and reuse analysis, where users are able to examine processing threads from previous test runs that represent typical use cases of the software. It also includes very specialized and unique web enabled search, filter, and compare services to support on-the-fly ad hoc analysis as security-related questions surface.

IAT

Introduction

The IAT is an Internet based automation aid that integrates front end systems engineering with back end design, implementation, and test. This integration is not approximate or based on analysis, but exact and based on the real requirements text and source code products. This exact interface between requirements, source code, and test results provides many benefits to a project, including unambiguous traceability to the source code level, analysis reports of the source code for potential problems as part of walk throughs, and test results that allow users to literally select hypertext links to software source and requirements from actual test data. As the IAT has evolved, its services have expanded to support regression analysis, responses to customer questions, and re-use analysis. These services have been enabled with the introduction of specialized searching, filtering, comparing, and replacing features.

The Instrumentation and Analysis Tool is used to support the instrumentation of software source code, perform data reduction and analysis of data captured during lab testing on the target, and support regression test analysis. The tool is a framework that includes: Instrumentation, Simulation, Data Reduction, Lab Reduction, and Regression.

The tool also supports a number of automated source code checks to support analysts during code walk throughs and code inspections. The automated analysis comes in the form of HTML reports and messages which are either labeled as attentions, warnings, cautions, or notes.

When a web server such as Apache is enabled, the IAT also includes general services such as searching, filtering, comparing, and replacing. These services evolved as needs arose during the initial use of the IAT (IAT-apr-2002, IAT-sep-2002). The needs included answering customer questions, regression analysis, and reverse engineering.

There are several PowerPoint presentations that provide a good introduction to the tool: Overview, Capabilities, Technical Insights, Walkthroughs, Automated SV Approach, Change History, Example Process, Business Startup, Training Outline, Lunch Presentation, White Paper, Brochure, Customer Demo.

IAT is in the process of expanding to become a best practice. The IAT was created on Project-O and is being transferred to other programs. IAT tailoring and metrics are available.

Background Information Includes

Defensive Coding Summary.doc
Log Event Coding Rules.doc
Outline-for-implementing-IAT.doc
Certification Check List.doc

IAT Services

Note: scripts are disabled


Projects
Project-IAT
Project-O
Project-N

Enterprise Mgmt
Enterprise Request
View E Requests
Download E Log

Project Mgmt
Project Request
View P Requests
Download P Log
Import Doors
General
Instrumentation

Search
Filter
Compare
Regression
Search & Replace
Project ErrorCodes

Libraries
Analysis
Source
Result
Test Area

Test Data


Web Enabled
Test Area
Test Data
Templates

Instrumentation
Codegen
Le-Summary

Code Analysis
Stats-Problems
Entropy
Keywords
Hamming-Distances
Watch Dog Timer
Error-Handling
Abstracts
Comments
Regression
Metrics Reporting

Sw Req Accounting
Simulate
Data-Reduction
SRDB Update

Test Cases
z-Test Case
y-Test Case
Func Call Seq
Clear-Test-Data

Test Req Accounting
Reqreduction
s_Reqreduction

Test Analysis

Compare
Regression
Metrics Reporting

Re-Use Support

Execution Threads
PUI Translation
Search
Filter

Using The Analysis Reports

The primary purpose of the reports is to provide information. They are not to be used to directly create action items or issues. They are a tool used by evaluators together with other data to generate action items and issues. The stats-problems report is the only report that contains automatic detectors for finding potential problems. The remaining reports use keyword searches and correlation to provide a view of the software and requirements.

All reports are structured in a similar fashion. There is an introduction area with links to lower portions of the reports and links to the analysis, source, result, and other broad areas outside the current report. Following the introduction, there is a parameters area which contains most of the settings used to generate the current report. This parameters area can be moved to the end of the report; some of the newer reports have these parameters at the end as an experiment. If a web server is enabled, these parameters can be changed and a new report can be generated. The next area is a summary area which summarizes all the information found in the details area.

The details area contains cross links to the source code and to other reports. The Source link opens the original source file. If the browser file extension is enabled to open an editor, the file can be opened, changed, and re-saved. This allows a software developer to quickly make changes resulting from this analysis. The Result link opens the instrumented source file; it is the file that is modified by the IAT. The link within the log event resembling a PUI opens the data reduction report and takes the user to the referenced requirement number. The leftmost link is the IAT internal file number and line number of the current file in the detail view. Selecting any of these links will automatically open the IAT listing report and take the user to the appropriate file and line number.

Instrumentation and Code Walk Through

The detectors and reports are in a constant state of evolution. This evolution is based on the fact that the analysts constantly need to respond to new questions as they arise on a program, and the IAT is a tool to help them answer those questions. For example, if it is determined that a type recast has caused a problem, the analyst has 3 choices: (1) do nothing, (2) print out the hardcopy and look for type recasts in all the source code using a yellow highlighter, or (3) spend 15 minutes and add a new IAT service to look for type recasts. Many folks do not understand this simple concept, and they resist adding to the IAT because it is viewed as adding more work to the software effort, which, by the way, has nothing to do with getting basic functionality out the door. However, from a program perspective there is only one right answer; anything else would leave the program open to risk. One could even argue that the DR should not be closed until all type recasts are checked.

The most current view of IAT services is located at tailoring-metrics.html. The following sections describe some of these reports, but they do not represent the complete set available for the IAT.

Possible Problems

The possible problems are generated by the logevents.pl program and appear as a separate logical display in the reports. The possible problems summarize all the anomalies encountered while analyzing the source and generating the messages in the details portion of the reports. The possible problems are listed below (a minimal sketch of one such detector follows the list):

Fatal Errors These are artifacts of instrumentation that will lead to compile errors most of the time, such as printf's, breaking if-else sequences, splitting declaration areas, and instrumenting previously instrumented code.
Header The .c and .h headers are separately checked for the appropriate fields. Missing fields or missing headers are detected.
Classification Errors The header classification field must be marked as defined in the rules, no other markings are permitted. The file naming convention is checked against the header classification marking. Keywords that may trigger a higher classification are compared against the file naming convention. Missing headers against classified file names are detected.
SV Marking The SV field is checked for YES or NO marking. Blank or other markings are detected and noted in the detailed message. The SV header field must be present for this check to detect an error. If the SV header field is missing, that is noted in the bad header area.
CV Marking The CV field is checked for YES or NO marking. Blank or other markings are detected and noted in the detailed message. The CV header field must be present for this check to detect an error. If the CV header field is missing, that is noted in the bad header area.
Log Events Logevents are subjected to several checks to determine if there may be a potential problem. The checks include redundant text, consecutive events without program code in between events, encoding problems (LE SV not LE, not SV, not TE, etc), potentially placing a logevent at the start of a procedure rather than at the conclusion, potentially placing a logevent within a loop.
Fixed Keywords Although there is a separate keywords report, there are several keywords that must be explained or removed. The bad fixed keywords form this set.
Switch Default Balance All switch statements must be paired with a default statement. Missing default statements are detected exclusive of comments. There is also a reverse check to determine if there is a default with a missing switch. This last case should never occur unless something is wrong during compile or within the regular expression searches in this tool.
Default SwError Balance All default statements must call SwError. Missing SwError calls and calls that are commented out are detected.
Case Break Balance All case statements should be paired with a break statement. Missing break statements are detected exclusive of comments. There is also a reverse check to determine if there is a break with a missing case. Without a break the code is less robust and prone to disintegration when modified. Break can also be used to exit elsewhere which may skew this result.
Nested Switches Nested switch sequences.
Stacked Cases Stacked case statements with no code in between.
Calling Rules Unique to each program.
Calling Load The functions considered are project unique. If a function is called only once, it may be a candidate for folding into another function.
No Error Exits Most of the software should include an error exit. This is not a hard and fast rule, but may need to be justified.
Files with: 15 or more Functions Good practices limit the number of functions in any given source file.
Files with: 500 or more MOL Good practices limit the number of LOC in any given source file.
Functions with: 100 or more LOC For proper analysis there must be a valid header termination before the start of a function. Good practices limit the LOC in any given function.
C Functions with: less than 5 LOC For proper analysis there must be a valid header termination before the start of a function. Good practices limit the LOC in any given function.
Uncalled Functions During a peer review this shows the external interface. For a release, it shows uncalled functions.
Dead Code This is a very simple detector looking for single line commented out code. Code that is block commented out is not detected. Dead code that is the result of logical errors is not detected.
Line Length Good practices limit the number characters in a line.
do Loops Considered a bad practice that leads to maintenance problems.
goto Statements Considered a bad practice that leads to maintenance problems.
? : Operator Considered a bad practice. It is not clear to most software developers.
++ / -- Within Decision Considered a bad practice by some domains.
Extra Lines between header & function The heart of this tool is the ability to detect functions. Given the level of reuse, the ability to consistently detect functions was accomplished with this internal rule. Although not required for software review, it is a tool for the analysts if a problem should arise in the report of a software entity.
NO Lines between header & function The heart of this tool is the ability to detect functions. Given the level of reuse, the ability to consistently detect functions was accomplished with this internal rule. Although not required for software review, it is a tool for the analysts if a problem should arise in the report of a software entity.
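
As an illustration of how these checks operate, the sketch below shows the general shape of a switch/default balance detector written in Perl. It is a minimal stand-alone example with assumed file handling and a crude comment stripper; the actual logevents.pl detectors are more thorough and are not reproduced here.

    #!/usr/bin/perl
    # Minimal sketch of a switch/default balance detector (illustrative only).
    use strict;
    use warnings;

    my $file = shift or die "usage: perl switch-default-check.pl file.c\n";
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $src = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    # Crude comment removal so commented-out code is not counted.
    $src =~ s{/\*.*?\*/}{}gs;           # block comments
    $src =~ s{//[^\n]*}{}g;             # line comments

    # Count switch statements and default labels exclusive of comments.
    my $switches = () = $src =~ /\bswitch\s*\(/g;
    my $defaults = () = $src =~ /\bdefault\s*:/g;

    if ($switches > $defaults) {
        print "ATTENTION: $file has $switches switch(es) but only $defaults default(s)\n";
    } elsif ($defaults > $switches) {
        print "ATTENTION: $file has a default without a matching switch\n";
    } else {
        print "OK: $file switch/default balanced ($switches each)\n";
    }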

Messages

The messages are generated by the logevents.pl program and appear in the details portion of the reports. They are primarily based on verifying that the log event comments are placed properly into the source code and that a limited subset of source code checks are performed, where practical within the framework of the instrumentation code. The messages are constantly changing. Some are coming, some are going, and some are re-phrased.

Attentions

  1. File s_ marking or Header Classification blank
  2. Header Classification Marking Wrong
  3. Classified Keywords: XXX in wrong file name
  4. Bad Header
  5. OOPs Missing Header
  6. printf's Detected
  7. Do Loops detected
  8. Bad Fixed Keyword
  9. Bad Switch Default Balance (missing item from set)
  10. Default Within Comment
  11. Bad Default SwError Balance (missing item from set)
  12. SwError Within Comment
  13. Nested Switches
  14. Illegal Call (calling rules)
  15. Dead Code (commented out)


  1. Log Event Possible Encoding Problem
  2. Log Not In Executable Path
  3. Source Currently Instrumented
  4. Instrumentation Splits Declaration Area
  5. Instrumentation Missing Curly Brace LE Splits if / else
  6. Instrumentation Split Multi Line LE Comment
  7. Log Event In Header
  8. Redundant Log Text
  9. Log Not At Conclusion
  10. 6 Consecutive Log Events

Warnings

  1. Bad SV marking
  2. Bad CV marking
  3. More Than xx Functions per File
  4. Possible Bad Ham value
  5. No Error Exit
  6. Extra lines after header
  7. NO lines after header
  8. do loop
  9. goto
  10. ? : Operator
  11. ++ / -- Within Decision
  12. Missing Curly Brace
  13. Header and Function Name Do Not Match


  1. Log May Be In Loop
  2. 4 Consecutive Log Events
  3. Log Not At new Line

Cautions

  1. Possible HAM Value
  2. Line Length
  3. Bad Log English
  4. Auto Repaired Bad Log English
  5. Stacked case Statements


  1. Log Not At Conclusion
  2. Log Not At New Line
  3. 2 Consecutive Log Events


Notes

  1. Ok Classification Marking
  2. Good Header
  3. Possible Good Ham value
  4. Debug Instrumented
  5. Good Log Outside Loop

Reports

Reports are created by a user based on various display filter settings and various parameters. A limited number of predefined reports have been created and saved in the z- series scripts. Logevents.pl generates 5 reports; simulate.pl and datareduction.pl each generate 1 report. The reports are as follows:

Report: Codegen.html, Keywords.html, Hamming-Distance.html, Stats-Problems.html, error-handling.html, cpc-listing.html, Abstracts.html, Comments.html
Primary Use: Used by analysts to support SV code walk throughs and SV code inspections.

Report: Simulate.html, Datareduction.html
Primary Use: Used by the test team to verify that the TOC has been fully instrumented into the source code. Also used by the test team to perform data reduction and analysis of lab test data.

Report

Features

Codegen.html
  • instruments source code with log events
  • identifies potential problems with log events
  • prepares a DIFF report showing source modification
Keywords.html
  • identifies known keywords that may be a problem
Hamming-Distance.html
  • flags all hex numbers
  • compares hex numbers to known set of good hamming values
Stats-Problems.html
  • identifies potential bad headers
  • identifies potential classification errors
  • identifies potential bad SV and CV marking
  • identifies potential bad log events
  • identifies potential bad fixed keywords
  • identifies potential bad no error exits
  • identifies some statistics
Abstracts.html
  • shows only the abstracts
Comments.html
  • shows only the comments
Cpc-listing.html
  • generates one large listing of the software group that was analyzed
error-handling.html
  • uses the keyword facilities to provide a view of switch, default, if, then, else and alarm processing
watchdog_timmer.html
  • uses the keyword facilities to provide a view of the watchdog timer calls
Simulate.html
  • used to verify that all TOC req’s were captured
Datareduction.html
  • used to verify that all TOC req’s were captured
  • used to verify that in lab testing caught all TOC req’s

Software Instrumentation and Analysis

The instrumentation and analysis of the software is performed by 3 scripts: logevents, simulate, and data reduction. The tool processes software that has been commented as described in le-coding-rules.doc. The tool consists of five programs which can be started via either a web browser or a DOS script. The tool processes software stored in a source directory and places newly instrumented source code in a result directory. It also generates reports which are placed in an analysis directory. All directories and file names are user selectable.

LogEvents

The logevents.pl program is used to instrument and analyze the source code. If a web server is enabled, various settings can be selected and executed. The resulting reports can then be saved and used as templates for future analysis. If a web server is not enabled, a script can be used to call the logevents.pl program and pass different parameters with each pass of a logevents.pl execution. Sample scripts are provided and they begin with <z-> in the main directory. Currently logevents.pl is executed multiple times by a single z- script to produce the following fixed reports:

When codegen.html is produced, the same option causes logevents.pl to save a translation file that will be used in data reduction. The translation file is stored in the analysis directory with the reports as translate.dat.

Once the source code is instrumented two additional programs are available to support the framework, simulate.pl and datareduction.pl.

Simulate

Simulate.pl accesses the instrumented source code and plays the log events. A file is created that contains only the numerical log event numbers and is stored in the analysis directory as sim.dat. A report simulate.html is also returned which shows the modules, functions, log event numbers assigned by logevents.pl, and log comments. This program is started in the same way as described for logevents.pl.

DataReduction

Datareduction.pl accesses the sim.dat file created by the simulator or the lab test environment, the translate.dat file, and the toc.dat and bfw.dat files. It generates a data reduction report, datareduction.html, which shows the log event numbers, log event comments, TOC program unique identifiers (PUIs), the TOC requirements satisfied by the test run, and the TOC requirements not satisfied by the test run. This program is started in the same way as described for logevents.pl.
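
A rough sketch of the correlation that datareduction.pl performs is shown below. The record layouts are assumptions made for illustration (numeric log event codes in sim.dat, code|text|PUI records in translate.dat, one PUI per line in toc.dat); the real file formats and the HTML report generation are richer than this.

    #!/usr/bin/perl
    # Sketch of the data reduction correlation: which TOC req's were hit by a run.
    # File formats below are illustrative assumptions, not the real IAT formats.
    use strict;
    use warnings;

    my (%le_text, %le_pui);                 # log event code -> comment text, PUI
    open my $tr, '<', 'translate.dat' or die "translate.dat: $!\n";
    while (<$tr>) {
        chomp;
        my ($code, $text, $pui) = split /\|/;   # assumed record layout
        $le_text{$code} = $text;
        $le_pui{$code}  = $pui;
    }
    close $tr;

    my %toc;                                # every PUI in the TOC baseline
    open my $tc, '<', 'toc.dat' or die "toc.dat: $!\n";
    while (<$tc>) { chomp; $toc{$_} = 1 if length }
    close $tc;

    my %hit;                                # PUIs satisfied by this run
    open my $sim, '<', 'sim.dat' or die "sim.dat: $!\n";
    while (<$sim>) {
        chomp;
        next unless exists $le_pui{$_};
        $hit{ $le_pui{$_} } = 1;
        print "LE $_: $le_text{$_}\n";      # log event number and original comment
    }
    close $sim;

    my @satisfied     = grep {  $hit{$_} } sort keys %toc;
    my @not_satisfied = grep { !$hit{$_} } sort keys %toc;
    print "Req's satisfied by the run:     @satisfied\n";
    print "Req's NOT satisfied by the run: @not_satisfied\n";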

LogEvents Simulate and DataReduction Collective

A single script can be created to run the reports associated with logevents.pl, the simulate.pl, and datareduction.pl programs. These scripts duplicate what would happen if a web server were enabled and users could select different parameters to produce different reports. However, in the case of the scripts, the parameters which are passed in are fixed. Examples of these scripts start with <z->. These scripts process the individual CA folders and all the CA folders:

    z-all-sw.pl
    z-sim.pl
    z-rdn.pl
    z-key.pl

Since the individual script approach above is a maintenance problem, a master script has been created that takes in 2 or optionally 3 parameters and will process the software. Examples of using this master script are as follows:

    perl z-any.pl wt-csciA wt-release csciA
    perl z-any.pl wt-csciB wt-release csciB

    perl z-any.pl pr-csci pr-release_1 csci
    perl z-any.pl pr-csci pr-release_2 csci

where the 1st parameter is the main directory where the software resides (CSCI), the 2nd parameter is the location of the software group under this analysis run (release), and the 3rd parameter is the CPC or CSCI name. If the third parameter is not provided, the CPC is calculated by taking the second parameter and stripping the numbers and dashes from the name.
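
When the third parameter is omitted, the derivation described above can be as simple as the following fragment. This is a guess at the rule for illustration, not the actual z-any.pl code:

    # Sketch: fall back to deriving the CPC from the release name by
    # stripping the numbers and dashes (illustrative, not the real z-any.pl code).
    my ($csci_dir, $release, $cpc) = @ARGV;
    if (!defined $cpc) {
        ($cpc = $release) =~ s/[0-9-]+//g;
    }
    print "Processing $csci_dir / $release as CPC '$cpc'\n";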

Running Your Own Software Analysis

Analysis is about the ability to ask questions and slice through different perspectives of data. Doing that quickly or on the fly is one of the key elements to creating a successful analysis platform. The web server is one of the fastest methods for allowing a user to ask those questions and get different perspectives of the data. In the absence of a web server, the scripts can be used, but the focus changes to establishing a definite set of templates. An initial set of templates has been established while supporting 3 different code walk throughs. These templates are the result of hundreds of analysis runs and parameter settings. As more software is subjected to code walk throughs, more scripts (templates) will be developed.

It's not as bad as you may think. Let's get started...

Sample File Structure

To get started without a web server, run this example:

  1. The main directory where everything is located <P:/project/iat-inst>
  2. Start an existing template script in a DOS window by entering perl P:/project/iat-inst/z-any.pl CA sim147 sim
  3. If your perl path is not present, then you must enter I:/perl/bin/perl in place of perl
  4. Notice the status messages on the DOS window and wait for the script and its program calls to complete
  5. To access the reports go to P:/project/iat-inst/analysis/ and open the reports
  6. If your browser is not initialized, Windows will walk you through the process; follow the instructions to initialize your MSIE browser or install your Netscape browser
  7. Examine the parameter settings in the browser that just opened with the report contents
  8. Examine the report contents
  9. Open the remaining reports and examine the contents
  10. Copy z-any.pl to a new name of choice
  11. Using an editor, change the parameter settings in the new script, such as your keywords
  12. Re-run the script
  13. A simple way to re-run the script is by pressing the up arrow on the DOS window
  14. If the up arrow does not let you scroll through previous DOS entries then type <doskey> and press enter to enable that option
  15. Examine the new reports

To install software that is to be instrumented

  1. Examine the current source directory
  2. Examine the current result/source directory
  3. Copy all your source files from your highest directory of interest, such as <rp> to the <source> directory in <P:/project/iat-inst>
  4. Copy the same source files to the <result/source> directory
  5. This will ensure that the instrumented source contains all the elements found in the original source
  6. Notice that the result directory contains a <source> directory
  7. The files you copied are most likely read only
  8. The tool will automatically disable the read only settings for the files in the result directory when the code is instrumented

To get started with a web server

  1. Fire up the logevents.pl program by entering the URL (www.domain.com/instrumentation/logevents.pl)
  2. Notice the parameter settings
  3. Change the parameter settings
  4. Re-run the script by pressing submit
  5. Notice the changes
  6. If you like the results, save the web page, it will be your new template

Parameters and Settings

The parameters to control the various options are selectable from the browser user interface if the web server is enabled. In the absence of a web server, the parameters need to be modified either in the individual programs or the script(s) which call the programs. The parameter settings form a template. The parameters are as follows:

Parameters

Comments

# tagging
$classmarking = "Doc Classification";
$classhigh = "Highest Classification";
$classlow = "UNCLASSIFIED";
The tagging parameters are used to mark the reports. These are static settings and should not change.
# build settings
$baseline = "IAT previous release";
$future = "1\\.0 |1\\.5 |2\\.0 ";
The build settings are unique to each build.

The $baseline parameter is used to identify the previous baseline metrics to compare with this baseline metrics set. This is obviously user controlled, but for official releases, the comparisons should be against an official previous release. It should be changed to the previous release with each new release.

Since the IAT reads the DOORS requirements and the requirements include a phase attribute, it is possible for the IAT to determine and sort through future requirements. The $future parameter should be changed with each official release and should only contain the set that represents future requirements.

# directory settings These parameters will change often and are based on the software being processed.
$dirpath = "p:/project/iat-inst"; Path to the location of the programs.
$srcpath = "p:/project/iat-inst"; Path to the source files, instrumented files, and the toc.dat file. The path is concatenated with all the location settings.
$source = "source/xmirror-rp"; Location of the software source files to be analyzed and instrumented.
$result = "result"; Location of the instrumented software source files.
$analysis = "analysis/all-sw"; Location of the analysis reports and the translate.dat file
$simdata = "analysis/all-sw"; Location of the simulation sim.dat file or lab test results.
$reduceddata = "analysis/all-sw"; Location of the data reduction reports
$report = "s_znew-report.html"; Name of the report. This name changes constantly depending on whether logevents, simulate, or datareduction is executed.
# file extensions to analyze
$extensions = "\\.c\$|\\.h\$";
File extensions that are accessed by the tool.
# color settings
$color1 = "blue";
$color2 = "blue";
$color3 = "red";
$color4 = "green";
$color5 = "purple";
$color6 = "red";
$color7 = "orange";
The keyword searches and instrumentation elements are color coded.
# keyword searches
$pdlevents_1 = "nnoonnee";
$pdlevents_2 = "nnoonnee";
$pdlevents_3 = "nnoonnee";
$pdlevents_4 = "nnoonnee";
$pdlevents_5 = "nnoonnee";
The color coded keyword searches available to a user.
# hamming patterns
$hamevents = "0x\\w\\w\\w\\w|return.*0x....";
@hamvalues = ('0001','0002'...)
The pattern used to extract hex data that might correspond to values that should be a certain hamming distance apart. The extracted patterns are checked against the accepted hamvalues (a minimal sketch of this check follows the parameter listing).
# instrumentation initial counter
$svlognum = 1000;
$dblognum = 5000;
$locevent = 7000;
$hmievent = 8000;
The instrumentation codes are automatically assigned by the tool. The starting numbers are set by these parameters.
# instrumentation patterns
$svevents = "LE.SV";
$dbevents = "LE.Debug";
The instrumentation patterns are defined by these parameters
$headerformat_c
$headerformat_h
Defines the header format that should be used by the software source.
# filters
$abstract = "0";
$comments = "0";
$srccode = "0";
$header = "0";
$svreq = "1";
$cvreq = "1";
$reqs = "0";
$cfunc = "1";
$showonlysvcv = "0";
$instrument = "0";
Enables and disables various display filters that are presented in the details report.

A "1" means enable the feature which is included in the details report.

Note 1: Report in this context refers to a display area report as opposed to a dynamically generated web page which is then saved by the analyst.

Note 2: These filters enable and disable functionality. It means that if a function is disabled, such as instrument, the function will not be performed.

# reports
$rptkeywords = "0";
$rpthamvalues = "0";
$rptstats = "0";
$rptproblems = "0";
$rptdetails = "0";
$rptcompare = "0";
Enables and disables various reports that are displayed when the dynamically generated web page is created. The details report tends to be very large and contains all the attention, warning, caution, notes, and keyword messages. The compare report is the DIFF between the original source code and the instrumented source code.

A "1" means generate the unique display report.

$window = 0; Many times an analyst is able to print out an event based on a keyword entry, but occasionally the analyst wants context. This parameter allows an analyst to show "n" events after the trigger event.
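
As an example of how the hamming parameters above come together, the sketch below extracts candidate hex values using a $hamevents-style pattern and flags any that are not in the accepted @hamvalues set. The parameter values shown are placeholders and the check is simplified; it is not the logevents.pl implementation.

    #!/usr/bin/perl
    # Simplified sketch of the hamming value check (placeholder parameter values).
    use strict;
    use warnings;

    my $hamevents = "0x\\w\\w\\w\\w|return.*0x....";
    my @hamvalues = ('0001', '0002');               # accepted values (placeholder set)
    my %accepted  = map { lc($_) => 1 } @hamvalues;

    my $file = shift or die "usage: perl ham-check.pl file.c\n";
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    while (my $line = <$fh>) {
        next unless $line =~ /$hamevents/;          # line contains a candidate pattern
        while ($line =~ /0x([0-9A-Fa-f]{4})/g) {    # pull each 4-digit hex value
            my $hex = lc $1;
            if ($accepted{$hex}) {
                print "NOTE: $file line $.: possible good ham value 0x$hex\n";
            } else {
                print "WARNING: $file line $.: possible bad ham value 0x$hex\n";
            }
        }
    }
    close $fh;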

The scripts have a set of variables that are set at the top, then a set of variables that are re-set prior to calling the main program. The programs also have the same set of variables. When the scripts are used, the variable settings in the main programs are ignored. So most of your changes should be isolated to the particular area within a script that is performing a particular analysis run. Each script has these same areas and just points to different software and analysis directories (a concrete example of one parameter block follows the outline):

set some parameters
require "datareduction.pl";

####################
# unique processing
####################

####################
# SV CV instrumentation
####################

set some parameters
&logevents_main;

####################
# keywords all
####################

set some parameters
&logevents_main;

####################
# hamming distances
####################

set some parameters
&logevents_main;

####################
# abstracts
####################

set some parameters
&logevents_main;

####################
# comments
####################

set some parameters
&logevents_main;

####################
# stats & problems
####################

set some parameters
&logevents_main;

####################
# simulate
####################

require "simulate.pl";
set some parameters
&simulate_main;

####################
# datareduction
####################

require "datareduction.pl";
set some parameters
&datareduction_main;
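
To make the outline concrete, one such parameter block might look like the sketch below. The values are placeholders chosen for illustration (a keyword run); they are not taken from a real project template, and only the variable names from the parameter table above are assumed to apply:

    ####################
    # keywords all
    ####################

    # placeholder settings for one analysis run (illustrative only)
    $analysis    = "analysis/all-sw";
    $report      = "keywords.html";
    $pdlevents_1 = "memcpy|strcpy";   # color coded keyword search 1
    $pdlevents_2 = "malloc|free";     # color coded keyword search 2
    $rptkeywords = "1";               # generate the keywords display report
    $rptdetails  = "1";               # include the details report
    $instrument  = "0";               # analysis only, do not instrument

    &logevents_main;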


Testing and Lab Reduction

Testing and Lab Reduction includes several scripts: labreduction.pl, design.pl, clear-test-data.pl, and compare.pl.

Labreduction.pl is a companion web based tool to datareduction.pl except that it operates on real test data rather than simulated test data. The data reduction is performed in real time as the test executes. Anyone on the network can view the test results in real time during the testing.

Design.pl is similar to labreduction.pl except it only shows the software files and functions in the actual calling sequence for a given test thread. This provides an insight into the software which can be used for troubleshooting or reverse engineering.

Clear-test-data.pl is used to clear existing test data files of raw test data and leave the user comments behind for use on future tests. This allows for the creation of templates that can be used to support multiple releases.
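
A rough idea of what clear-test-data.pl does is sketched below. It assumes raw log event records can be recognized as purely numeric lines, which is an assumption made only for illustration; the real script's record format and rules may differ.

    #!/usr/bin/perl
    # Sketch: strip raw (numeric) log event records from a test data file and
    # keep the tester comments so the file can serve as a template/script.
    # The "numeric line = raw data" rule is an assumption for illustration.
    use strict;
    use warnings;

    my $file = shift or die "usage: perl clear-test-data-sketch.pl testlog.txt\n";
    open my $in,  '<', $file           or die "cannot open $file: $!\n";
    open my $out, '>', "$file.cleared" or die "cannot write $file.cleared: $!\n";
    while (my $line = <$in>) {
        next if $line =~ /^\s*\d+\s*$/;   # drop raw numeric log events
        print {$out} $line;               # keep comments and everything else
    }
    close $in;
    close $out;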

Test Data Reduction

Analyzing The Test Results

The test results analysis begins during informal testing by examining the real time test results.

  1. Are the start and stop log events as expected
  2. Are any log events grossly out of place
  3. Do the log events and sequence look reasonable
  4. Do the number of log events match previous test runs
  5. Do the source files and function names in the log events look reasonable
  6. Does the layout of the log event sequence appear to be similar to previous test runs

If the results appear to be reasonable, copy and paste them into the test procedures. They will form the factory template that will be used during formal testing.

As part of an offline analysis, use the IAT reports and software source to locate the test result log events. Examine the software and look for potential missing log events or wrong log events. This is a final examination to confirm that the test results are as expected.

For dry run, create a single complete test results report following the sequence that is expected during formal test. Use these dry run test results to support the formal test.

At the conclusion of all the DRY RUN tests, enable the log events found, log events not found, requirements correlation, and requirements coverage reports. Look for log events and requirements that were not captured during all the test activity. If needed, create new tests to capture the missing requirements.

With each test battery, there is usually more than one instance of a test thread. Examples include log on from different starting points in the system. Use the IAT to compare these different log on approaches. Open up 2 browsers side by side and scan the results. Look for discrepancies as previously defined.

Eventually a repository of test data will develop. IAT includes a very powerful compare service. This compare service can be used to identify differences between test runs and allow the analyst to just focus on the differences. Save these compare results and use them as templates for future test run comparisons. As always the IAT maintains context with all of its saved web page reports so that the analysis can be re-run or modified for a different analysis, such as new baselines. If the compare report confuses you, open up the original reports and do a side by side comparison until you can gain confidence and understanding to work with the compare report.

Finally, analyzing test data well needs to include multiple dimensions. Try to compare similar things within a test thread, battery, or across baselines. Don't be afraid to keep digging when an anomaly surfaces. IAT will produce and organize a great deal of data which will let you dig until you have found an answer or you have located a genuine problem.

Besides the compare service, search and filter are also useful services for test analysis. The filter service allows an analyst to enter the PUIs from one test battery and subtract the PUIs from another test battery yielding the requirements not covered in the starting test battery. This is extremely powerful and allows the analyst to instantly determine test coverage. Search allows the analyst to search anything on the network from software and requirements to test reports and metrics reports.

Running Real Time Test Results Script

This tool is started as a DOS application in a directory which contains the test data. The syntax is:

perl labreduction.pl [filename.txt] [delayinsecs] [TocCor] [TocCov] [LabData] [LeFind] [LeNoFind] [Results] [CPC]

where all [fields] are optional and [blank] or [-] is default or are set as follows:

filename.txt Is the test file name, the default is testlog.txt. The report name is derived from this file name. The default report name is testlog.html. When entering a file name the .txt extension must be used (e.g. projecta.txt, projectb.txt).
delayinsecs The script executes 100 times before ending; this is the delay in seconds between executions. The default is 20 seconds.
TocCor If set to [y] correlates the log events to requirements.
TocCov If set to [y] shows which requirements were addressed and which requirements were not addressed by the test results.
LabData If set to [y] shows the raw test data.
LeFind If set to [y] shows the log events that were found. The default is [y].
LeNoFind If set to [y] shows the log events NOT found in these test results.
Results If set to [y] shows the test results correlated to the original log events. The default is [y].
CPC If set to [all], [red], or [a cpc] shows req's satisfied and NOT satisfied by these test results.
Filter If set to [y], filters items containing certain words from the reports. The default is [y].
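
For example, a default run against testlog.txt and a run that names the test file, polls every 30 seconds, and enables requirements correlation and coverage for all CPCs might look like the following (illustrative values only):

    perl labreduction.pl
    perl labreduction.pl projecta.txt 30 y y - y y y all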

Raw Test Data

The file containing the test results is automatically updated by the BOOT application for some test cases. In all other BOOT and RED test cases the file is manually updated by the testers by either copying log events from the emulator window or entering log events based on emulator profile points. Using appropriate file naming conventions, parallel testing can be performed.

This application requires the inclusion of a special file (translate.dat) that is created by the logevent application and stored with the instrumentation analysis results. This file translates the log event codes to the original English equivalents. It also requires a version of the TOC (s_toc.dat) to correlate the TOC requirements with the lab test results.

Analysis Reports Script Option Comment
Lab Data LabData This report shows the raw numerical log events captured during lab testing or during a simulation run of a software collection in analysis.
Test Results Results This is the test results in log event format. This analysis connects the raw numerical log events captured during lab testing with the original source code comment statements, source file, and C-function.
Log Events Found LeFind This analysis lists the log events found in these test results.
Log Events NOT Found LeNoFind This analysis lists the log events NOT found in these test results.
PUIs TocCor, TocCov This analysis extracts the TOC numbers from the log events captured during lab testing.
TOC Text Correlation TocCor This analysis correlates the log events captured during lab testing with the baseline TOC. If multiple events are associated with the same req, that req is displayed multiple times. The sequence matches the original test events sequence.
Not in TOC Text TocCor This analysis shows the log events that are NOT in the current TOC baseline. For example, if a TOC req is deleted after the software is coded or if the implementor enters the wrong TOC number in the comment, this report will show those events.
In This CPC Collection TocCov The TOC reqs are allocated to CPCs. This analysis shows the TOC reqs that are in this CPC collection. All the reqs in this report area should contain the CPC or CPCs that represent this analysis run. If there are only foreign CPCs, then this software collection has implemented reqs that are implemented elsewhere, or the DOORS data base is in error. If this analysis represents all the software then this report is for the current software baseline.
Not In This CPC Collection TocCov This analysis shows the TOC reqs that should be in the analyzed software but are NOT in this CPC collection. It may be that only a portion of a CPC was analyzed and so there are outstanding reqs. It may be that reqs are not properly allocated in DOORS. It may be that the implementation missed reqs and the software needs to be updated. This report only applies when CPCs are analyzed.
Not in Software Baseline None This analysis shows the TOC requirements that are NOT in the current analysis baseline. When the entire software set is analyzed, this report area should only contain paragraph headings, information text, and NON software reqs.


Regression Analysis

Regression is a web based analysis tool used to help identify potential areas of re-test when software is upgraded. It produces a bi-directional DIFF report between the two versions and submits the resulting files to a detailed analysis which extracts information. There is a regression.html report that is produced and used by the test group to help determine regression test needs.

Analysis Comment
Compare This analysis shows a DIFF between the original source code and the new instrumented code.
Baseline Files This analysis shows whole files deleted from or added to the baseline.
Details This analysis shows log events and possible regression tests based on keywords analysis and a DIFF.
Regression Test Keywords Analysis This analysis shows a set of potential regression tests based on a keyword analysis.
Regression Test File Analysis This analysis shows a set of changed files and potential regression tests based on a keyword analysis.

This tool is started as a DOS application in the IAT main directory (iat-inst). Example syntax is:

perl regression.pl ft-bfw sv-bfw sv-bfw2

where sv-bfw2 is the new software and the regression.html report is placed in the new software analysis directory (sv-bfw2).


Other Services

These services require a web server such as Apache. They evolved to support regression testing, user responses to customer questions, and reverse engineering for troubleshooting or reuse. Although the services appear to be similar to services found in other applications, they are significantly more powerful because they allow users to define many parameters. These parameters are key to making these services return quick and meaningful results.

Search is used to search the software or other areas of a project data base to respond to questions. Examples include customer comments that certain functions are not called. A quick search of the function name will return the result to the analyst.

Filter is used to extract information from a selected file. It also has the added benefit of subtracting information from an access query. For example, all the final PUI's shown in a data reduction report for a new test run can be subtracted from the PUI's shown in an old test run to yield the delta between the two test runs. Another example is to extract all the requirements based on a keyword such as a CPC, capability, or selected word.
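
A sketch of the PUI subtraction idea is shown below: collect the PUIs from two data reduction reports and list the ones present in the old run but missing from the new one. The PUI pattern is a placeholder assumption; the web Filter service itself is driven by user-entered parameters rather than a fixed script.

    #!/usr/bin/perl
    # Sketch: subtract the PUIs found in one report from those found in another.
    # The PUI pattern below is a placeholder guess, not the project's real format.
    use strict;
    use warnings;

    my ($old_report, $new_report) = @ARGV;
    die "usage: perl pui-delta.pl old.html new.html\n" unless $new_report;

    sub puis_in {
        my ($name) = @_;
        open my $fh, '<', $name or die "cannot open $name: $!\n";
        my %set;
        while (<$fh>) {
            $set{$1} = 1 while /\b(TOC-\d+)\b/g;   # placeholder PUI pattern
        }
        close $fh;
        return %set;
    }

    my %old = puis_in($old_report);
    my %new = puis_in($new_report);
    my @delta = grep { !$new{$_} } sort keys %old;
    print "PUIs in $old_report but not in $new_report:\n";
    print "  $_\n" for @delta;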

Compare is used to compare test data from two different test runs. It allows a user to filter and mask certain records and minimize the amount of noise so that a meaningful comparison can be performed. The time stamps are also masked. The compare includes an IAT-unique DIFF mechanism and an internal GNU DIFF (see DIFF.html or GnuDiff.pdf).
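
The masking step that makes such a comparison meaningful can be as simple as the sketch below: blank out time stamps (and any other user-selected noise) before the two files are diffed. The HH:MM:SS pattern is an assumed placeholder for whatever the lab data actually uses; run it on both test logs and diff the masked copies.

    #!/usr/bin/perl
    # Sketch: mask time stamps in a test log so a diff shows only real differences.
    # The HH:MM:SS pattern is an assumed placeholder for the actual record format.
    use strict;
    use warnings;

    my $file = shift or die "usage: perl mask-times.pl testlog.txt > masked.txt\n";
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    while (my $line = <$fh>) {
        $line =~ s/\b\d{2}:\d{2}:\d{2}(\.\d+)?\b/xx:xx:xx/g;   # mask time stamps
        print $line;
    }
    close $fh;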

Libraries are project specific. They are links that point to key data. This is accomplished with the establishment of a project portal, such as this IAT portal.


Process and Traceability

The IAT allows a project to trace its requirements to the source code level. This traceability is not approximate or based on interpretation, but exact. This traceability is shown at all phases of the project development life cycle. To achieve this traceability a process is summarized in the following tables. This process was established and used on a real time embedded application project with a very aggressive schedule and limited funds. This process assumes that DOORS and the IAT are only accessible by the systems group. This limited DOORS and IAT access is a worst case scenario. The ideal situation includes DOORS and IAT access by all project personnel.

Reuse on your Program
Placing IAT On Your Program

The Big Picture

Program Process Details

IAT Code Review Process Details

IAT Test Process Details

Traceability Frequently Asked Questions


Reuse on your Program

How do you handle legacy log events within a reused software collection? Unlike the subjects on the rest of this IAT portal, the reuse topic has not been tested on a real program.

A number of years ago, when the topic was hot, I was asked to think about reuse. My thoughts were framed by another company that had been developing essentially the same system since my birth. They kept upgrading and every so often re-architecting the system and delivering it to different countries for decades. Without realizing it while I was there, they, IMHO, had all of the elements of reuse in place, and by the time I arrived on the scene it was just ingrained in the culture. One of the more important elements was the fact that they realized reuse was not just about software. It was about all the products, including the information products that feed the software. It framed their entire culture.

So IMHO, there are three levels of reuse: (1) coarse grain, (2) medium grain, and (3) fine grain. Coarse grain reuse is where you take the people from a project and move them to a new project of a similar nature. The level of reuse is their experience. Medium grain reuse is taking all the information products from one or more related programs and moving them to a new program. I had extensive experience in that area with the company I referenced above. The information products would include the specifications, studies, models, and design documents. Fine grain reuse is what most folks are familiar with when they think of reuse; it is the reuse of the software.

If instrumented software is moved from one project to another project as part of a reuse effort, the question arises: What do you do with the log events? If the organization is mature enough, then it's a no-brainer because the requirements base is the same. There is one SRDB that supports multiple reuse projects. If the SRDB is fragmented across projects, then the IAT in its infinite wisdom will flag that you have a big problem on your program. The problem is the missing information products that are linked to your reused code. If you duplicated them, that's one answer. If you ignored them, that's another. Chances are it's a combination of the two. So is IAT to blame? No, IAT is only the messenger of a process problem. If the process problem exists, what are the alternatives?

There are 3 basic alternatives to the software reuse problem when the reuse did not consider all the underlying information products: (1) strip out the log events, (2) translate the log events, or (3) leave the log events in place and enable / disable instrumentation as requested by organizations such as test, systems, or software. The organization then embeds the new requirements as log events into the code.

Alternative 1 - Stripping out the log events is easy. However, once they are gone, they are gone forever. It's easy to throw things away; it's very difficult to create things. The IAT can be set to simply ignore these log events. So this approach is not recommended.

Alternative 2 - Translating the requirements is a possibility. It would mean that a requirements analysis needs to include a reuse analysis so that the requirements can be reconciled. This is not a bad approach, and in fact that is what we used to do in the company that practiced reuse for all those decades. Once the translation map is complete, the IAT can drill through the software and make the changes.

Alternative 3 - The IAT as part of instrumentation can automatically identify the reused requirements that are in the new project software baseline. A new "reuse" specification can be easily extracted and imported into DOORS as a module named "reuse" from this IAT product.

Why go to the trouble of capturing the requirements associated with reuse? Because all software needs to be accounted for from a requirements perspective. Again, if your program has an SRDB, it is one of those special mission critical programs. That means there should be no "unaccounted for" software. The next issue is the software design information products, such as object interaction diagrams (OIDS), associated with the reused software. Unfortunately IAT currently can not help in that area.

Is there a cost? You bet, but it's less than recreating the software. Is it something that must be done? Yes; bidding a reuse program while ignoring the underlying information products shows a lack of understanding of the entire process outside the software boundary. Fortunately, unlike Alternative 2, which is very labor intensive, the IAT can slice through the software and find the reused requirements faster than anyone can read this section. That output can then be fed into DOORS. Call the module "reuse" and declare success. So, this is the recommended approach.


The Big Picture

Requirements analysis is used to develop a set of software requirements that can be used for software design. The output of the requirements analysis is captured in System Requirements Data Base (SRDB) using a tool like DOORS. Traditionally, reports have been produced for people as they work different aspects of a project. With the introduction of IAT, the DOORS output is imported into IAT and used by its code analysis and test services.

The most important IAT aspect of the software design is to ensure that peer reviews are held for the code containing the log events. Doing SV testing is not just about running IAT reports and capturing test data for a release. It is also about following process. A key process element is ensuring that software containing log events is subjected to peer reviews and that a check list is filled out as part of the peer review. The check list is documented evidence that the placement of log events was examined by those closest to the problem, the original designers and coders. This is long before any IAT activity starts up on a program. The SV lead should examine these check lists to verify that the item addressing log event placement is checked. It's a good idea for the SV lead to attend initial peer reviews to enforce this concept and make sure that this very important check list item is addressed and closed at the review.

The DOORS schema should include the elements found in any good SRDB. A unique view should be created to support export to IAT. The format of the report should include the PUI, Req Text, CPC, and optionally the phase of delivery.


The picture looks busy, but the bottom line is that testers gather raw test data in TXT files. The TXT files are subjected to data reduction, and HTML files are produced as the test is being performed. There is a TXT and companion HTML file for each test case. At the conclusion of a test, Req Reduction processes all the TXT files and produces a single big picture HTML report of the entire test run. This same service mines the test case HTML reports and produces a single big picture HTML report of each test case. There are services such as compare, search, and filter which are used to perform ad hoc analysis of the HTML reports. In the end, the job is to hunt for discrepancies and account for all the req's.

There are IAT services and files containing code analysis reports, test data, test reports. As the project unfolds with multiple internal engineering releases, templates are developed. These templates are extremely valuable. The most valuable templates are the TXT files containing the raw test data. These files contain comments from the testers and essentially become scripts for running the tests. As the project unfolds, the team starts to refer to the TXT files as scripts. An IAT service is available to scrub all scripts of any test data and leave the test comments behind for use in a next round of testing. There are also HTML templates that are left behind from previous test runs. These HTML templates are usually the result of some ad hoc analysis involving a search or a compare.

The directory structure is based on having a holding place for all the test (iat-test), a holding place for each release (REL x.x), a special place for IAT test data (logev), and a special place for analysis (logevents-analysis). The only name that is fixed is the "logev" directory.
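
A plausible layout using the names above (the release directory names are placeholders) is:

    iat-test/                      holding place for all the test activity
        REL 1.0/                   one holding place per release
            logev/                 IAT test data (the only fixed name)
            logevents-analysis/    analysis reports and HTML templates
        REL 1.1/
            logev/
            logevents-analysis/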


The Program Process Details

Placing IAT On Your Program

Staff

Activities

Eng Mgmt Get buy in from all the stakeholders - Customer, PMO, Software, Systems, Test.
Eng Mgmt Identify an IAT project POC who will be responsible for its operation and products.
IAT expert & project POC Move a generic form of IAT to the unique program network area (about 1 day). This includes getting ISS to install Apache and AOL press on your IAT workstations. Although not initially required, these applications will be needed on programs once IAT is 100% in place and providing significant services.

If you don't have software yet, then you just need to understand what to do with your SRDB, and your software code once it starts to get developed. That is a simple meeting with your staff where "we" plan for IAT and a careful read of the IAT web portal.

IAT expert & project POC Run some project software files through code analysis and assess results (2-3 days). This includes getting the project POC up to speed on running very rudimentary IAT services and trying to communicate the IAT big picture to the project POC.
systems, software, test Determine if IAT will be used for code analysis, test, or both code analysis and test.
systems, software, test Identify what requirements will be instrumented - security critical, mission critical, system, software, all processors, some processors, etc. If this is an existing program, work only with the new requirements associated with the next release. Do not attempt to back fill previous releases for the first IAT based release.
systems, software, test Identify if you intend to capture display messages intermixed with the log events. If so, determine if it will be via instrumentation or mixing of display messages with log events on the same interface.
systems, software, test Identify how the log events will be captured - JTAG emulator port, RS-232, internally within a PC bus, etc.
Eng Mgmt Identify a Software POC who will be responsible for getting the log event interface to a static text file up and running - tester copies and pastes from emulator window or a mechanism that writes to a file or something else.
IAT POC Distribute Plan for embedding IAT on the program Outline-for-implementing-IAT.doc.

Front End Systems Engineering

Staff

Activities

customer Customer provides a list of features.
systems, software A-type specification is written using MSword. The engineering data is captured via round table discussions while reviewing the OPS specification.
systems The OPS specification is imported into DOORS.
systems with some software participation Round table discussions are held to determine potential CSCI's. A decision is made to have x SRS's. One for each "computer" and one for when the device has no application software (BOOT state).
systems The requirements are extracted and allocated to hardware and software SRS's as a first pass.
systems An OPS DOORS table report is produced showing allocations to hardware and software SRS's.
software Requirements allocations are reviewed and modified as needed.
systems DOORS is updated to reflect allocation changes and a new report is produced and distributed.

Software Requirements Analysis

Staff

Activities

software SRS's are written using MSword.
systems SRS's are imported into DOORS.
systems OPS and SRS requirements are linked in DOORS as a first pass.
systems Two types of DOORS table reports are produced to show allocations and highlight orphans and childless parents.
software with some systems participation Reports are reviewed to identify missing requirements and misallocations. Modifications are made.
systems DOORS is updated to reflect the changes and new reports are produced and distributed.
systems The TEO and TOC are written using MSword.
systems TOC is imported into DOORS.

Preliminary Design

Staff

Activities

systems OPS, SRSs, and TOC requirements are allocated to system capabilities (threads which become test cases).
systems OPS, SRSs, and TOC requirements are allocated to test methods (I, A, D, T).
software SRSs and TOC requirements are allocated to CPCs. A CPC is a collection of software source files. They are a directory in the software source. They are a UNIT which are referenced in the SDFs.
systems DOORS table reports are produced showing the requirements and allocations to CPCs, capabilities, and test methods.
software CPC allocations are updated several times. Note that this process never achieves closure. The final allocation is always in the as-built software. However, a best effort should be put forward in this activity to minimize questions in the back end of the program when everything is reconciled.
systems DOORS table reports regenerated with each CPC allocation change request. Requirements reports are generated for use by the IAT.

Detailed Design

Staff

Activities

software TOC requirements are physically entered into the CPC software source code as a comment statement where the requirement is satisfied. These are the log events. This is the actual as built final allocation results.
software Peer review is announced. Announcement includes where the software is located.
systems Software is accessed and the IAT is run against software to be reviewed. E-mail generated summarizing the findings and pointing to the detailed IAT reports.
software, IAT reports, and initially a physical presence by systems Peer review held to verify several items including log event placement.
Test Director with input from TD and software leads Code walkthrough with the customer is announced.
systems IAT is run against software to be reviewed. E-mail generated summarizing the findings and pointing to the detailed IAT reports.
software Walk through held to verify several items including log event placement.
systems CD of reports sent to the Government.

Implementation and Test

Staff

Activities

software, systems Create a C program that supports emulator log event dumping.
software Make changes to ensure that instrumentation works. This may include moving software in the memory map.
systems for dry runs and CM for formal testing Instrument the software using the IAT.
systems with software review if they have the time Review IAT reports to determine if requirements covered in the current build are as expected. Use the IAT to support code familiarization.
systems Run tests IAW the test cases, which are the system threads and capabilities. Capture the log events from the emulator. Run data reduction reports for each test case.
systems Examine the final master report for requirements and log events coverage. At the conclusion of the tests execute the reqreduction.pl tool to extract all the unique log events from the entire test activity. Run the labreduction.pl tool with all the options enabled.
systems Examine the test results associated with each test thread and the master report.
systems If this is the formal test and not a dry run, incorporate findings into the test results report.


DOORS

Good, bad, or indifferent, this is what happened with DOORS on Project-O.

Schema for the Project-O DOORS database

Formal Modules

Link Modules

Other Modules that you see

The reasons for the 3 CSCI's are:

  1. Two computers: one CSCI for each computer, plus one for boot
  2. Location of classified req's forced the CA RP split
  3. Boot as a separate entity was developed and delivered separately
  4. Boot was delivered way before anything else, months
  5. There may be others, but my memory fails me

Attributes

Some of the attributes started as pull down selection boxes. All the attributes are now text fields. We did not know how to perform global inserts for pull down boxes and doing those items 1 at a time took too long. Since DOORS training, I have a feeling that may be possible with the pull down selection boxes, but I have not tried it... The attributes are:

  1. Allocation (RP, BP, BFW, HW) - this was done first
  2. Req (y, n; default=n; filter on shall/will and replace all with y)
  3. CPC Note: If you change from SW functions to SW threads you must add that to your SRDB. You need to maintain two sets of books: CPC and Threads. So be very careful about your original selection. Also realize that IAT will track either or both, BUT the SRDB MUST have the allocations.
  4. Phase (1, 1.5, 2.0, 2.5)
  5. Qual (I, D, A, T, log)
  6. Test (FCAT, SV, CV, Unit)
  7. Capability = Major Test Case = Major Operational Thread

Preparing the Import and Importing

I took a quick look at the current PPP SRDB (system Req's data Base) and noticed that there are multiple req's in the same object (row). On Project-O we imported many times until we figured out what needed to be done.

Step A was to go into the word document and do a global search and replace so that each period was followed by a paragraph mark. The search pattern was <. > (note the space) and the replace pattern was <. ^p>. DOORS will place Word lists in a separate object, so remove any periods from the lists. There was a time when we wanted the lists to go into a single object, but we could never figure out how to do it without significant manual merging once in DOORS (our new version of DOORS).
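
The same split can also be scripted; here is a Perl one-liner equivalent, offered only as a sketch, assuming the specification has first been saved as plain text. The file names are placeholders:

perl -pe "s/\. /.\n/g" ops-spec.txt > ops-spec-split.txt

The substitution mirrors the Word pattern: every period followed by a space becomes a period followed by a new line.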

We then experimented with different import approaches (rtf, txt, deleting figures and table, etc). The bottom line is open the document in Msword and press the icon to export to DOORS. This will grab everything, including the pictures and tables. It is the cleanest import to DOORS, capturing the paragraph headings and non heading stuff. DOORS assigns a unique number to each object (row) which is different than the "paragraph" number that DOORS assigns with other import methods. Project-O used the object number rather than the DOORS "paragraph" number. The paragraph number is not absolute while the object number is, so there can never be confusion in the future. Also, the DOORS paragraph number is ugly for a human, too much noise. Bottom line is use the OBJECT number for the Req PUI (Project Unique Identifier), IMHO.

Generating Reports and Exporting Back to Word

Use the wizard. Do not try to create custom reports by modifying DXL. We tried that on Project-O; you might as well stick toothpicks in your eyes. Use the wizard and learn to live with the views the wizard provides. We tried to create a few standard views, but the wizard is so fast that the standard views were used primarily as a guide to remember what the last "good" report looked like...

Do not include the tables in the report if you plan to export back to word. It takes way too long. Although you can link to each cell in a table, we never did that, again because of the huge amount of time in creating those links. By time I do not mean DOORS button pushing, but gray matter time. Someone needs to think about the allocation. After we reflected on what benefit might arise if we linked to each table cell, the bottom line answer was NO. We also considered linking to the table references at the start of each table. It never happened... again to what benefit.

So the Project-O SRDB is English phrases, no pictures or tables even though the original stuff is in the database. We did try to make each phrase stand alone. We created and published a requirements style guide which included instructions and samples on writing good clean req's text. The SRS's and OPS are not bad, considering everything else... I would recommend finding this style guide and using something similar on PPP. Its roots are Al Starr and company.

Attribute Versus Linkable Object

The question always arises, should something be an attribute or a linkable object. The best way to answer that is by example and some history. In the old days before COTS SRDBs, organizations would try to show traceability using a spreadsheet or a word processing table. Those who had worked with those information products know that the view is only in one direction, the direction the table was built. For example, if req's were allocated to test cases, the product would list the test cases in the left column and the requirements would be in the right column. But, for each row, while there would be one entry for the test case, there would be multiple entries for the requirements. The dilemma arises if the user wants the information product sorted by requirements. What would happen is the rows would be duplicated using a copy and paste operation and the cell with the multiple entries would be manually parsed. Now here is the bugaboo; the example is test case versus requirement, but the information product would always start out as requirement versus test case. So the manual parsing was always required if you wanted all the requirements associated with a test case.

The advantage of an SRDB is, if you use the linkable object method, the SRDB will create bi-directional views without any manual changes. So, if you are silly enough to make your test cases an attribute, then you will be forced to use the old manual approach of parsing your information for a view that is based on test cases. Essentially you have demoted the wonderful power of an SRDB to the state of an office automation product like a spreadsheet or word processing information product. Further, you have forever lost any ability to show the relationship to any other information that is at a lower level. For example, you cannot create a view that shows all your test cases linked to lower level requirements like your SRSs. You also cannot show links to other information like your CPCs. Finally, you can't see the chain that stretches from CPC, to SRS, to A-level Spec, to system test.

So, anything that needs to show bi-directional views should be based on a linkable object. Also, if the object is related to other items in the database, the linkable chain path should be preserved. In general, you really should have a very good argument for reducing something to an attribute, because as an attribute, it has no ability to be viewed in alternative ways. It is stuck in a word processing or spreadsheet table like view.


IAT Code Review Process Details

The IAT supports both peer reviews and code walk throughs. The difference between a peer review and a code walk through is the amount of software under review. During a peer review the collection of software being reviewed is a small subset of the application. During a code walk through the collection of software being reviewed is a working standalone application. The code walk through has been merged and is coincident with a software release on Project-O.

IAT Based Peer Review Process

  1. Two days prior to a peer review, an IAT Peer Review request is submitted (Place Request).
  2. The project IAT person extracts the software collection from CM and generates an IAT run by creating the appropriate directories and executing z-any.pl.
  3. The project IAT person examines the reports and generates an email summary of the findings. This email also points to the location of the reports and reminds the reviewers to examine the reports prior to the peer review and take appropriate action.
  4. Peer review participants hold the review and fill out the regular and security critical software check lists.

IAT Based Code Walk Through

  1. An IAT Code Walk Through, Dry Run, or Final Test request is submitted (Place Request).
  2. CM executes the build process, which includes running IAT code analysis and instrumentation. This is accomplished by creating the appropriate directories and running z-any.pl.
  3. The SV analysts examine the code analysis reports and generate discrepancy reports as needed.
  4. The project reviews the discrepancy reports and makes corrections with this release or with future releases.

Analysis Process

  1. The analysts review the reports and answer questions using the full services of the IAT. These services include the hyper links in the reports, search, and filter services.
  2. Examine the codegen.html report for proper instrumentation. Scan the details and look for unusual alerts. Examine the translate file results area and look for the number of log events of each type (1000, 5000, 7000, 8000 series). Verify that the number of log events actually embedded in the code, as evidenced by the translate data area, matches the log event pattern counts at the top of the report. Continue to the bottom of the report and examine the automatic DIFF. Verify that no code has been modified and that only log events have been inserted and, if needed, log event comment changes have been automatically performed. This should not take more than 15 minutes.
  3. Use the codegen.html report to begin verifying log event placement, however, the more compact le-summary report, which only includes the instrumented files and ignores all other files, may work more efficiently. Click on the links and review the software and requirements. Depending on your level of trust in the peer review process, this analysis can be a few minutes to days. This report will be the main product used by the customer IV&V contractor to verify your requirement allocations to the software.
  4. Examine the simulate report. This information is essentially the same as found in codegen.html or le-summary.html, but it's an artifact of playing the actual instrumented code, whereas the other reports are the result of reading the original non instrumented source code. Examine the report and look for discrepancies. Examine the resulting simulate file sim.txt. This should not take more than 5 minutes.
  5. Examine the datareduction.html report to determine if all requirements have been instrumented. Note the missing log events (i.e., requirements that have not been instrumented) and communicate the missing log events to software ASAP.
  6. The remaining reports are used for code analysis. The primary code analysis report is stats-problems.html.


IAT Test Process Details

The IAT Test Process contains the following major blocks: Requirements, test case development, test run preparation, test execution, test wrap up, test analysis, and test complete activities.

Test Requirements The requirements need to be imported from DOORS into IAT. This is a manual process using MSword to convert the file format into a flat ASCII text file and a .html file. The flat ASCII text file is accessed by the IAT internals and the .html file is used by the IAT links.

Test Case Development Test case development begins with a traditional MSword document, but it is augmented with test scripts that are saved and maintained as separate standalone text files.

Test Run Preparation Test preparation includes setting up the test directory and populating the directory with IAT services software and test scripts.

Test Execution Test execution is dependent upon the level of sophistication of the test environment. This environment can be simple where testers copy and paste log events into a text file or complex where the machinery writes directly to a disk file and intermixes log events with target machine display information.

Test Wrap Up At the conclusion of a test battery, there is an accounting that occurs which reconciles all the requirements found in the DOORS data base. This accounting is accomplished by executing unique IAT services.

Test Analysis Test analysis is a grueling activity that involves looking at the log events from multiple perspectives. The IAT includes a number of services to ease that process and these include filtering, searching, comparing, and keyword coloring.

Test Complete At the end of the test analysis everything needs to be organized and put away for future reference and possible reuse. This also includes transferring metrics data to external reports and updating test catalog web pages.

Test Requirements - Create a MSword version of the DOORS report.

  1. Generate a word table report from DOORS which contains the PUI, Req Text, CPC, Capability, and phase.
  2. Create a header and a footer with proper classification.
  3. Add a date and time stamp, file name, and page number to the footer of the document.
  4. Save the word DOORS report as a .doc document.
  5. Save this report in a special DOORS directory.
  6. Send an email to software indicating that new DOORS reports are available at this directory.

Test Requirements - Convert the DOORS report to formats that IAT understands - flat ASCII text file

  1. IAT uses a .dat and .html version of the DOORS report.
  2. These versions need to be created and placed in appropriate IAT directories.
  3. Start by opening the new MSword version of the DOORS report.
  4. Use word to convert the table to text using the "$" as a delimiter.
  5. Optionally remove all text formatting by double clicking bold, italic, underline and selecting a normal format with a 10 pt font.
  6. Remove all the paragraph marks (^p). It should become a great big glob of text.
  7. Replace all tabs with space.
  8. Replace your PUI with ^pPUI.
  9. Each requirement should now be on a standalone single line.
  10. Save as .dat format in the special DOORS directory containing the original MSword document.
  11. Preserve the classification indicator and date time stamp.
  12. Open the new flat ASCII text file of the req's and verify the word operations placed each requirement and its attributes on a single line (a scripted version of these replacements is sketched after this list).
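
As referenced in step 12, the replacement steps can also be scripted. The following Perl sketch is hypothetical and not part of the IAT; it assumes the table has already been converted to text with the "$" delimiter and saved as plain text, and that each requirement begins with the literal PUI prefix used in the manual steps:

#!/usr/bin/perl
# Hypothetical helper, not part of the IAT: normalize a "$" delimited text
# export of the DOORS table so each requirement sits on one line.
use strict;
use warnings;

open(my $in, '<', 'doors-export.txt') or die "open: $!";   # assumed file name
my $text = do { local $/; <$in> };     # slurp the whole export into one variable
close($in);

$text =~ s/\t/ /g;                     # replace all tabs with spaces
$text =~ s/\r?\n/ /g;                  # remove all paragraph marks (one big glob)
$text =~ s/PUI/\nPUI/g;                # new line at each PUI (use your actual PUI prefix)
$text =~ s/\A\s*\n//;                  # drop a leading blank line, if any

open(my $out, '>', 'doors.dat') or die "open: $!";          # assumed file name
print $out $text, "\n";
close($out);

Open doors.dat afterwards and verify, exactly as in step 12, that each requirement and its attributes landed on a single line.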

Test Requirements - Convert the DOORS report to formats that IAT understands - .html file

  1. Reopen the .doc version of the DOORS report and duplicate the PUI column.
  2. Replace the first PUI column with <A HREF'PUI by searching for PUI and replacing with <A HREF'PUI.
  3. Replace the second column with '> </A>PUI by searching for PUI and replacing with '> </A>PUI.
  4. Now convert this file to a flat ASCII text file using the previous process for creating the .dat version of DOORS.
  5. Add html paragraph marks to each line by replacing ^p with <P>^p.
  6. Save as .html in the special DOORS directory containing the original MSword document.
  7. Preserve the classification indicator and date time stamp.
  8. Open the new html file in a browser and verify that each requirement and its attributes are in a separate html paragraph (double spaced).

Test Requirements - Transfer the converted DOORS reports to IAT areas

  1. The DOORS requirements need to be transferred from the DOORS directory to the IAT working areas.
  2. There are 2 IAT working areas: iat-inst and iat-test/unique-release-dir.
  3. The iat-test/unique-release-dir is created with each new test by the CM build process.
  4. Copy the .dat and .html versions of the DOORS reports to iat-inst and iat-test/unique-release-dir.

Test Case Development - Writing test procedures

  1. Create your test procedures using a traditional document.
  2. Place emphasis on the operational thread that will capture a snap shot of the internal operations of the device as log events are captured. This implies looking for logical start and stop points as an operator might see in the device. Examples include: poweron, poweroff, logon, logoff, create an account, delete an account, etc.
  3. Realize that there is a related test script that is a text file which will contain the log events for each atomic test case in the test procedures.
  4. Reference the text file representing the test script in the test procedure.
  5. Use a naming convention for the text based test scripts.
  6. Don't get too wordy in the test scripts. The idea is to quickly understand what the test is about and get to the business of capturing the log events.
  7. The test scripts take precedence and should always be viewed as red lines to the current procedure if there is a difference. Take special care not to remove test sequences from the original test case and claim that the shortened test script is valid. This will get you in trouble.
  8. Realize that today's test script will become tomorrow's regression test script. Regression test scripts are always smaller than the original test and usually do not include multiple paths and error paths as in the original test.

Test Run Preparation - All

  1. The test process begins with an official CM build which includes instrumentation.
  2. An IAT dry run or formal test request is submitted by CM (Place Request) and they proceed to make the build.
  3. Verify that CM updated the $future and $baseline parameters in z-any.pl.
  4. Update labreduction.pl and .cgi to include unique user defined buckets for requirements accounting.
  5. Use the datareduction.html report in the release analysis area to identify potential buckets.
  6. Update the search.cgi and filter.cgi scripts to point to the new release.
  7. Create a logev directory in your iat-test release directory.
  8. Copy the analysis-test directory created by CM to the logev directory. This directory exists in the result folder and is created as part of the instrumentation process.

Test Run Preparation - First Time

  1. Copy all the IAT software, sample data, and directories from the IAT templates logev area to this new release logev test area.
  2. Create a test script for each test case in the test procedures.
  3. At the top of the script use a title that reflects the test case.
  4. Use HTML tags to underline the title.
  5. Add text to represent the release, such as "REL_1.00".
  6. Place any setup information at the top of the script.
  7. Add comments to the script that will minimize your need to reference the MSword test procedures.
  8. You can start by copying and pasting the test procedure steps into the script and deleting the noise. Remember, do not get wordy. You want to quickly know what the test is about and where to grab the log events.
  9. Use the device HMI to bracket your log events.
  10. Use a naming convention such as FAIL, PASS, ERROR #####.
  11. Clump your script categories together and clearly state a start point and an end point.
  12. Number each item in the script. It will ensure that each line is unique allowing for compare services in the future.
  13. It's not a bad idea to use the same number for all the text associated with a clump of text.
  14. Finish a clump of text and its associated log events with the case sensitive word of "#. Done". This will greatly aid you during analysis when automated compare services are used to complete the test analysis. Do NOT place "#. Done" before the log events.
  15. Feel free to update labreduction.pl and .cgi to include new or different keywords during the dry run tests. By formal test, resist the urge to update these keywords.

Test Run Preparation - There is a previous release to start from in this release

  1. Copy all the IAT software, raw test data scripts, and directories from the previous test to this new release logev test area.
  2. Clear all the test data from the existing scripts by running the clear-test-data.pl script.
  3. This will leave the user comments behind from the previous test run.
  4. Proceed with the test run as outlined for the First Time Test Run case.

Test Execution - Manual transfer of log events to test script files

  1. This is the least sophisticated method of transfer of the raw test log events from the target to the test scripts, but it provides the tester with maximum visibility and control.
  2. Run your test case from a starting point to a point where the target achieves a steady state condition and no more log events are being generated.
  3. Copy and paste the raw test log events from your test environment to the text file associated with this test case.
  4. Ensure that you place the log events in the appropriate spot in the test script.
  5. Execute labreduction.pl and examine the initial set of logevents.
  6. If there appear to be no problems with the target, test equipment, or IAT, continue with the test case.
  7. Copy and paste the raw log events until the test case is complete. Feel free to re-execute labreduction.pl with each copy and paste sequence. This will allow you to see the log events build up and get a reference as the target moves through the test case.
  8. If needed, modify the test script. Bracket the log events with operator displays.
  9. If there is an error, note the error in the script, save the script under a different name, and restart the test with a clean script. You can manually delete the previous log events to get going with a new test run, but do not lose the test data that contains the error. It is important data. Use naming conventions such as FAIL and ERROR #### in the script indicating the failure. If the second test run has an error and you have convinced yourself it is not a cockpit problem, you have found a problem that needs reporting. In the problem report make reference to the test data reduction report. This may be helpful to the people trying to debug the code.
  10. Note: Although lab-web-lite.pl is available for data reduction, do not use it during testing. Using labreduction.pl automatically saves the reduced data and logs an entry in summary.html. These two simple activities are critical to prevent loss of data and provide a view of how hard each test case is to complete during an actual test run.

Test Execution - Semi Automatic writing of log events to test script files

  1. This is a moderately sophisticated method of transfer of log events from the target to the test scripts. It mixes the advantages of full automation with the manual method.
  2. The test configuration has been setup to write log events directly to a disk file.
  3. Run your test case from a starting point to a point where the target achieves a steady state condition and no more log events are being generated.
  4. Open the file being logged and examine the contents. Add comments above and below the log events just captured. Save the file and exit.
  5. Execute labreduction.pl and examine the initial set of logevents. If there appear to be no problems with the target, test equipment, or IAT, continue with the test case.
  6. Repeat the process of opening the file, adding comments, saving the file and exiting. Feel free to re-execute labreduction.pl with each sequence. This will allow you to see the log events build up and get a reference as the target moves through the test case.
  7. At the end of the test case, rename the file written to the final script file name that represents the test case. Execute labreduction.pl on this new file name and examine the set of logevents.
  8. Treat errors in the same manner as for the manual test case.

Test Execution - Automatic writing of log events to test script file

  1. This is the most sophisticated method of transfer of log events from the target to the test scripts. This approach is based on the concept of fully automated testing, such as over night test runs.
  2. For this approach to be useful, the log events should include major sign posts, such as HMI displays and time stamps.
  3. It's anticipated that a few hundred thousand log events will be captured during an over night run, so there needs to be a mechanism to parse the data and run labreduction.pl. The labreduction.pl tool will process all the data and it probably will not take very long (1 hour), but the resulting report will not load into the browser.
  4. The suggested approach for parsing the data is to provide starting and ending points in labreduction.pl. A good approach would be to use the time stamps or HMI displays. Alternatively, the raw text file can be manually parsed using a very fast editor like WordPad and labreduction.pl can be executed on the saved parsed file snippets (a hypothetical pre-filter along these lines is sketched below).
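
One hypothetical way to provide those starting and ending points is a small pre-filter that copies one window out of the raw capture before labreduction.pl is run. The marker patterns, file names, and invocation below are assumptions, not part of the IAT:

#!/usr/bin/perl
# Hypothetical pre-filter: copy the log events between a start marker and a
# stop marker (HMI displays or time stamps) into a browser-sized snippet.
use strict;
use warnings;

my $start = qr/LOGON DISPLAY/;         # assumed start sign post
my $stop  = qr/LOGOFF DISPLAY/;        # assumed end sign post
my $in_window = 0;

open(my $raw,  '<', 'overnight-raw.txt') or die "open: $!";
open(my $snip, '>', 'snippet-logon.txt') or die "open: $!";
while (my $line = <$raw>) {
    $in_window = 1 if $line =~ $start;
    print $snip $line if $in_window;
    $in_window = 0 if $in_window && $line =~ $stop;
}
close($raw);
close($snip);

labreduction.pl can then be executed on the saved snippet in the usual way.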

Test Execution - Complete

  1. Prior to the government witnesses leaving, make sure all the hardcopy test procedure steps have been checked off and signed by the appropriate personnel. This should be done during the actual test case completion. Everything needs to be accounted for on that hardcopy.
  2. Pass the original test procedures with the red lines, check marks, and signatures to the test director as soon as the test case is complete or at the end of that day. Definitely pass the material to the test director when the government witnesses leave and all testing is complete. Make a copy for yourself.

Test Wrap Up - Requirements Accounting

  1. At the conclusion of a test, execute reqreduction.pl. This will read all the test scripts, extract unique log events, count all the log events, search for keywords such as ERROR, FAIL, PASS, produce a report, reqreduction.html, summarizing the findings, and create a reqreduction.txt file for further analysis (a toy illustration of the extraction step follows this list).
  2. Open and review both the reqreduction.txt and reqreduction.html reports.
  3. Execute labreduction.pl on the new reqreduction.txt file. Verify that all the options are enabled so that a complete report is produced. This will create an s_reqreduction.html report which will allow an analyst to account for all the requirements and log events in the entire test battery that represents the release.
  4. Note the difference between the reqreduction.html and s_reqreduction.html names.
  5. Use the labreduction.pl accounting buckets to categorize requirements and log events that were not encountered during the test. For example, if you did not run a logon test, those req's can be identified and placed in buckets separate and distinct from log events and requirements actually missing because of problems with the software or test execution.
  6. Run regression.pl to compare this release with a previous release. This is especially important if the release being shipped is different than the release when formal test was started. The regression analysis report will be located in the newer release, if you follow the instructions on running regression.pl. This report contains many visualization alternatives; feel free to experiment with them.
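
The following toy illustration shows the flavor of the extraction step referenced in item 1. It is not the actual reqreduction.pl; the test script glob and the assumed four-digit log event id form are placeholders only:

#!/usr/bin/perl
# Toy illustration only -- not reqreduction.pl. Sweep every test script,
# collect unique log events, and count the standard keywords.
use strict;
use warnings;

my (%unique, %keyword);
for my $script (glob "*.txt") {                     # assumed test script naming
    open(my $fh, '<', $script) or die "open $script: $!";
    while (my $line = <$fh>) {
        $unique{$1}++  if $line =~ /\b(\d{4})\b/;           # assumed log event id form
        $keyword{$1}++ if $line =~ /\b(ERROR|FAIL|PASS)\b/;
    }
    close($fh);
}
printf "%d unique log events\n", scalar keys %unique;
printf "%-6s %d\n", $_, $keyword{$_} for sort keys %keyword;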

Test Analysis

  1. An IAT request to perform test analysis is submitted by the SV lead (Place Request).
  2. There is an existing word template document that describes the test analysis process. Access and review the document s_logevent-analysisREL_x.x.doc.
  3. There are two broad categories to the analysis. The first is the requirements accounting and the second is the test case by test case analysis of the log events.
  4. Transfer the requirements accounting information from s_reqreduction.html to the s_logevent-analysisREL_x.x.doc.
  5. If there are any discrepancies with the accounting, create a problem report.
  6. Use the filter service to identify the requirements coverage differences in both directions between this release and the previous release. Transfer the data to the s_logevent-analysisREL_x.x.doc.
  7. For each test case compare it to similar test cases and previous versions.
  8. Use side by side browsers, compare, search, and filter services.
  9. Try to create a set of compare reports that can be used to document your findings and save them in the logevents-analysis directory.
  10. Use compare reports from previous test runs as templates.
  11. Use the locate and subtract features of the filter service to compare requirement coverage differences between test reports.
  12. The goal is to look for discrepancies and explain them using reasonable logic.
  13. If this is the first time running this test case and there are no similar log events to compare, then the analyst needs to use the services of the IAT to ensure that the output is complete and consistent. If needed, use the search and filter services.
  14. If there are any discrepancies with the log events, create a problem report.

Test Complete - This includes cataloging the test data and transferring metrics data

  1. There is a template page for cataloging the test data that is gathered with each release.
  2. The approach is to catalog releases that are shipped to the customer and provide a link to the directory containing the remaining test data.
  3. Use AOL press or another HTML editor to update test-data.html. Use the summary.html report to update this page.
  4. Examine the summary.html report and attempt to get an understanding of the length and difficulty of the test.
  5. Access the metric data throughout the IAT and transfer it to the document iat-metrics.doc in the analysis directory.
  6. Once the s_logevent-analysisREL_x.x.doc is complete, archive it in iat-test and pass the file to the test director for incorporation into the Final Test Report.


Traceability Frequently Asked Questions

  1. What is a CPC and a UNIT?
  2. Where do I show traceability, in DOORS or IAT?
  3. Should I go back and update DOORS with the fine grain traceability outputs of IAT?
  4. Ok, so how does the IAT show the traceability to the UNIT level?
  5. My head is about to explode, so what should I do?
  6. I am not using IAT so how do I show traceability to the UNIT level?
  7. Why are we just talking about software traceability?

1. What is a CPC and a UNIT?

In my view a unit was and still is the lowest common denominator in software. This translates to a subroutine / function call. Occasionally I will back off and consider a unit a source file which may include from 1 to 6 functions / subroutines. This view was formed over 20 years ago while working with an outfit that was writing the next generation software processes which eventually became MIL-STD-2167. It was in the day when Yourdon ruled the nest. A CPC was a collection of units. You could have lower level CPCs within upper level CPCs. A CSCI was a collection of CPCs.

As I have made my trek across the country working for many different types of organizations, the definition of a UNIT was always mired in obfuscation and confusion while the software team would try to ignore what was needed for traceability and focus on providing software code. In the end it would never matter because traceability to the UNIT never really occurred. There was no reasonable mechanism to capture it and show it with or without an SRDB (DOORS). The links would eventually break down shortly after the CSCI level. In fact it was in the more enlightened organizations where traceability would be reasonable at the CSCI level. The test organization would be left to pick up the pieces and attempt to imply traceability while creating verification cross reference matrices.

So traceability could range from coarse grain to fine grain. Fine grain traceability would take it to the unit level; coarse grain traceability would take it to the CSCI level. The cost of providing fine grain traceability would always lose out to the need to provide basic functionality and performance, so it would never happen.

In Project-O a UNIT was a software folder on the NT Network. It was also a representation of the SDF. It was a collection of multiple C, ASM, and H files. Examples include SES, TRF, LDI, etc. However, even though the software team defined a unit above the lowest common denominator, the traceability provided was fine grain traceability and went all the way down to the function call in 99% of the cases. This was accomplished with the IAT and the defined process. The remaining 1% would occur at the source file level if the IAT was not successful in locating the associated function.

Here is the bugaboo. Hardware folks have no problem understanding the concept of a unit. For hardware, a unit is a thing with a part number, and everything has a part number, even the lowly resistor. Software people do not understand that simple concept. So when you say every function in every file should have a part number, they get very upset. Why are units and the nomenclature of units so important for some folks? It is all for traceability. Traceability can only exist if you can identify a thing. The lower the level of the thing you identify, the closer you get to the area of interest once an SRDB trace analysis is performed.

2. Where do I show traceability, in DOORS or IAT?

In Project-O traceability was down to the CPC level in DOORS and down to the function call level (UNIT as it should be defined) in IAT. IAT also shows the CSCI and each CPC, where the CPC is a WinNT directory, is an SDF, and is what software called a UNIT. The traceability is approximate in DOORS since the CPC req's allocation was never updated after the software was built and released. The traceability is final and exact in IAT. It is an as built view with almost no room for error as the actual designer performed the final allocation.

3. Should I go back and update DOORS with the fine grain traceability outputs of IAT?

You have to ask yourself the all important question - what is the purpose of showing traceability to the UNIT level?

If the purpose is to provide some assurance that the software does what it is supposed to do and the program includes many third party reviewers, then showing traceability in the IAT is probably the most effective presentation. It provides links to the source code and the requirements. It provides listings that are easy to read and follow with the requirements summarized as embedded comments very close to where they are satisfied.

If the purpose is to manage large numbers of subcontractors and build a repository that may last a few decades, then the SRDB should be maintained. This means that the SRDB should be updated frequently including entering updates from the outputs of the IAT.

4. Ok, so how does the IAT show the traceability to the UNIT level?

A picture is worth a thousand words. The output of the IAT is reports. The power of the IAT is not in its absolute "here it is nature", but in its Internet roots. This means that the output is easily formatted to fit any need as it arises.

LE-Summary.html Report

Currently the traceability to the UNIT level is captured in any of the following reports which are produced during instrumentation: s_codegen.html, s_le-summary.html, or s_simulate.html. The s_le-summary.html report is probably the most effective analysis report. It only shows the software files and functions which contain req's and has links to the source code and original req text.

If you want to update the SRDB, the IAT has a report, s_srdbupdate.html, that was specifically created for this activity. Its format is such that a data entry person can update the SRDB. It also provides table views of req's vs c-function and c-file sorted by either req number or path/file name.

5. My head is about to explode, so what should I do?

Keep it simple and follow the Project-O approach to traceability. Capture the front end as best as you can in DOORS realizing that eventually you will cut the umbilical and transfer to the IAT.

If you must update DOORS, then perform the update after the first release of the software. The output from the IAT will be real and complete for that release. If you do not like the current report formats, consider creating a new IAT report to streamline the DOORS update process.

Consider your stakeholders. What tools and reports are they really examining? If no one has access to your SRDB but everyone has access to your source code and IAT reports, then guess what everyone will use for their respective jobs?

Keep in mind that almost everyone has unit development folders and they have req's in those folders, however, that does not ensure "modern up down traceability" goals as can be provided with an SRDB. Populating the SRDB is one problem, but the other problem is actually ensuring that you have traceability without orphans and childless parents. That analysis can only occur with an SRDB. That is why DOORS exists. So after you populate your SRDB, don't forget to produce those reports and have the staff fill the holes. Again, no one goes much below the CSCI level. The really good organizations make it to the CPC level, the really great organizations make it to the unit level in the SRDB. I have never seen a really great organization, so for me it's a myth and a challenge which IAT attempted to address.

6. I am not using IAT so how do I show traceability to the UNIT level?

Good Luck. Let folks know what you did :)

7. Why are we just talking about software traceability?

Traceability and the lack of it always boils down to the same thing, software.

Thanks to the explosion of hardware performance, systems have all become software intensive. Software has the unique problem of dealing with millions of things (LOC), which means that no one can really show traceability in the end except for the individual designers and implementors. However, when the topic of traceability surfaces, it's irrelevant to the typical software organization tasked with getting basic functionality out the door.

Therein lies the dilemma: too much to do and not enough time or resources to do it. Further, traceability to the software UNIT level is excruciating by its very nature, being so fine a level of detail. So no one wants to do it even if time or resources could materialize.

IAT is based on these realizations and is a mechanism that extracts the traceability from the software team in the least intrusive way possible, primarily by piggybacking on their existing activities.


Lessons Learned

1. The IAT uses pattern searches to perform all of its services so think in terms of patterns. These pattern searches look for things to note or ignore. The IAT is not a complex device. Each line of code is examined only once, by subjecting it to these pattern searches. These pattern searches form the rule sets of the IAT. Some complex rules have look ahead pattern search capabilities. This means that the choice of key words and naming conventions is important.

2. Do not use TBD for all place holders. Differentiate the TBDs. For example TBF = future, TBS = To be specified, TBC = To be coded, TBD = to be determined, etc. Or just use TBD-1, TBD-2, TBD-n.

3. For all new code, use a naming convention for security critical variables. This will reduce the noise in the error and hamming distances reports. Get a list of the security critical variables that do not follow the naming conventions and incorporate them into the IAT analysis.

4. Do a peer review ASAP so that the process can be started on a simple case.

5. Do not bypass peer reviews as the next phase of a project starts. For example, after an initial release to the field. Take the opportunity to get IAT peer review reports on the new features being added to the device and as a minimum review the requirements captured in the software under review.

6. Use the regression service to perform DIFFs on software that goes to the factory. Do not let software go to the factory without using this service. Make sure it is reviewed and not just mechanically generated. Examine the recommended regression tests carefully.

7. Incorporate the trivial cosmetic changes ASAP. This will reduce the noise in the reports and allow everyone to focus on the real problems while feeling good about having some initial success. There is a tendency to delay these cosmetic changes. If that is the case on your program, then disable these features in the IAT. However, be careful. Bad headers are bad headers, and if they show no history, then it's valid to assume you have lost CM control on those modules.

8. During test trust your log events. If a test fails to complete as expected do not try to understand the problem by resorting to the emulator or multiple changes to the test scenario and configuration. Trust the IAT and its log events. Experience has shown that IAT identifies where the failure occurs without question. That information has always pointed to a software or configuration problem that is quickly understood (within seconds of seeing the log events).

9. Do not add missed log events or make code analysis changes as a special activity. Experience has shown that between releases much of the software is revisited as part of enhancements and problem correction. Add the log events as part of the process of doing something else to a module. So there should be a "to do list" each time a module is opened for change. That "to do list" should include checking the missing log events list. This also applies to the software code analysis changes. The "to do list" should include fixing headers and other outputs from the code analysis reports. BUT do not delay or stonewall the work that needs to be performed. Experience has shown there is usually no excuse for not putting the changes in the next release.

10. In the past the test team set hundreds of breakpoints in the software giving them control. With IAT the software team sets the "logical break points of the past" as log events. Do not prevent the testers from adding, changing, and moving log events during test preparation. Preventing this simple task takes away the control the test team had when they set hundreds of breakpoints in the software. Adding and moving 10-20 logevents per release is trivial and will eliminate significant tension between management, software staff, and the test team needing this critical service.

11. Consider logging content on your program ASAP. IAT has all the hooks to log content. The only issue is your unique approach to getting the log events and log content out your interface. Content is a key element that further reinforces placement of the log events. If the content is not right, you know there is a log event placement problem. In the past testers would search for the elusive breakpoint by looking at the related content. This would verify the proper placement of the breakpoint.

12. Log events in the code are not just about automated testing. Placement of log events is also used to verify requirements in the software. The code analysis reports include visualizations that quickly link software and requirements. Placement of log events also can be used to perform traditional breakpoint testing where the log event identifies ahead of time the logical location of a breakpoint. In fact the very first IAT based testing was based on both breakpoints (using log events as the breakpoint locators) and the automated output of a log event to a file.


Internal Operations

The framework is implemented in PERL. One of the advantages of PERL is that variable length or type is not relevant. So a single variable can slurp in an entire document and the manipulations on that document can be performed by manipulating that single variable.

The heart of this tool is the regular expression. Regular expressions are used to instrument the code, search for keywords, find bad patterns, correlate the lab test data with the original source code comment statements, and correlate the comment statements with the original TOC specification. The tool accesses all the software in a directory tree one file at a time. The file is moved into an array and into a single variable. Some of the checks are performed against the entire file in the single variable (e.g. classification checks). The remaining checks are performed on each individual line in the array (e.g. show ham values). In some cases multiple items in the array are checked (e.g. consecutive logevents). Everything else is mundane and mechanical.
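
A minimal sketch of that read pattern follows. The directory name, the classification marking, and the checks themselves are assumptions for illustration, not the IAT rule set:

#!/usr/bin/perl
# Sketch of the read pattern described above: walk a directory tree, read each
# file into both an array and a single variable, then apply whole-file and
# per-line checks. All names and checks here are assumptions.
use strict;
use warnings;
use File::Find;

find(\&check_file, 'src');                 # one file at a time through the tree

sub check_file {
    return unless -f $_ && $_ =~ /\.(c|h|asm)$/i;
    open(my $fh, '<', $_) or die "open $_: $!";
    my @lines = <$fh>;                     # the file as an array of lines
    my $text  = join '', @lines;           # the same file in a single variable
    close($fh);

    # whole-file check against the single variable (e.g. classification check)
    print "$File::Find::name: no classification marking\n"
        unless $text =~ /UNCLASSIFIED/;

    # per-line check against each array element
    for my $i (0 .. $#lines) {
        printf "%s line %d: %s", $File::Find::name, $i + 1, $lines[$i]
            if $lines[$i] =~ /\bTODO\b/;
    }
}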

The reports are HTML formatted. Normally in PERL, if executed from a web server, the output is sent to STDOUT which is the web server. STDIN originates from the client's browser via the web server. STDERR goes to an error file defined by a configuration directive. In the absence of a web server the STDOUT is redirected to a file with the following sequence:

# set to save HTML page status
open(DATA, ">$analysis/$report");
select(DATA);

From this point on, all output is sent to the disk file named by $analysis/$report. When the sequence:

select(STDOUT);
close(DATA);

is executed, output returns to the normal STDOUT, which is the DOS window if the program was started from DOS or the web server if the program was started from a browser.
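
Put together, the whole round trip looks like the following self-contained sketch; the directory, report name, and report body are placeholders:

#!/usr/bin/perl
# Self-contained sketch of the redirection round trip described above.
use strict;
use warnings;

my ($analysis, $report) = ('analysis', 'sample-report.html');   # placeholders

open(DATA, ">$analysis/$report") or die "open: $!";
select(DATA);                      # everything printed now lands in the report
print "<HTML><BODY>report body goes here</BODY></HTML>\n";
select(STDOUT);                    # restore the normal STDOUT
close(DATA);                       # DOS window or web server output resumes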

There is a "main" at the top of the 3 programs. The main calls the subprograms that form this framework. To enable the script operations from the DOS window the "main" area of each of the 3 programs was encapsulated as a sub program:

sub logevents_main{
all the main junk
}

removing this encapsulation or just adding a call at the top of each program will allow these 3 programs to operate in stand alone mode, without the calls from the scripts / templates.

The data stored on disk is read in using two approaches. The first approach reads the file into an array. The second approach reads the file into a variable. Occasionally, data read in as an array is also passed to a variable.

The reason for these two approaches to internally storing the file data is associated with the regular expression processing. If the program is looking for a certain pattern across an entire file, the variable is used in the regular expression. If on the other hand a single line entry needs to be examined, then the array is sequenced and a regular expression processes all the array elements. Sometimes it's possible to perform regular expression processing on the non-array form, but regular expressions are not trivial, and for simplicity and the desire to get consistent results, the array method is used.
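
For example, a multi-element check of the consecutive logevents kind needs the array form, since it compares one line against the next. The file name and the log event marker below are assumptions:

#!/usr/bin/perl
# Sketch of an array based, multi-element check: flag two log events on
# consecutive lines. The file name and LOG_EVENT marker are assumptions.
use strict;
use warnings;

open(my $fh, '<', 'sample.c') or die "open: $!";
my @lines = <$fh>;
close($fh);

for my $i (0 .. $#lines - 1) {
    printf "consecutive log events at lines %d and %d\n", $i + 1, $i + 2
        if $lines[$i] =~ /LOG_EVENT/ && $lines[$i + 1] =~ /LOG_EVENT/;
}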


Regular Expressions (REs)

In an RE there are plenty of special characters, and it is these that both give them their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form.

Here are some special RE characters and their meaning

.       # Any single character except a newline
^       # The beginning of the line or string
$       # The end of the line or string
*       # Zero or more of the last character
+       # One or more of the last character
?       # Zero or one of the last character

and here are some example matches. Remember that a pattern should be enclosed in /.../ slashes to be used.

t.e     # t followed by anything followed by e
        # This will match the
        #                 tre
        #                 tle
        # but not te
        #         tale
^f      # f at the beginning of a line
^ftp    # ftp at the beginning of a line
e$      # e at the end of a line
tle$    # tle at the end of a line
und*    # un followed by zero or more d characters
        # This will match un
        #                 und
        #                 undd
        #                 unddd (etc)
.*      # Any string without a newline. This is because
        # the . matches anything except a newline and
        # the * means zero or more of these.
^$      # A line with nothing in it.

There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a - indicates "between" and a ^ at the beginning means "not":

[qjk]           # Either q or j or k
[^qjk]          # Neither q nor j nor k
[a-z]           # Anything from a to z inclusive
[^a-z]          # No lower case letters
[a-zA-Z]        # Any letter
[a-z]+          # Any non-zero sequence of lower case letters

At this point you can probably skip ahead and handle most of the patterns you will need. The rest is mostly just for reference.

A vertical bar | represents an "or" and parentheses (...) can be used to group things together:

jelly|cream     # Either jelly or cream
(eg|le)gs       # Either eggs or legs
(da)+           # Either da or dada or dadada or...

Here are some more special characters:

\n              # A newline
\t              # A tab
\w              # Any alphanumeric (word) character.
                # The same as [a-zA-Z0-9_]
\W              # Any non-word character.
                # The same as [^a-zA-Z0-9_]
\d              # Any digit. The same as [0-9]
\D              # Any non-digit. The same as [^0-9]
\s              # Any whitespace character: space,
                # tab, newline, etc
\S              # Any non-whitespace character
\b              # A word boundary, outside [] only
\B              # No word boundary

Clearly characters like $, |, [, ), \, / and so on are peculiar cases in regular expressions. If you want to match for one of those then you have to precede it by a backslash. So:

\|              # Vertical bar
\[              # An open square bracket
\)              # A closing parenthesis
\*              # An asterisk
\^              # A caret symbol
\/              # A slash
\\              # A backslash

and so on.

Some example REs

As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples. Remember that to use them for matching they should be put in /.../ slashes.

[01]            # Either "0" or "1"
\/0             # A division by zero: "/0"
\/ 0            # A division by zero with a space: "/ 0"
\/\s0           # A division by zero with a whitespace:
                # "/ 0" where the space may be a tab etc.
\/ *0           # A division by zero with possibly some
                # spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0          # A division by zero with possibly some
                # whitespace.
\/\s*0\.0*      # As the previous one, but with decimal
                # point and maybe some 0s after it. Accepts
                # "/0." and "/0.0" and "/0.00" etc and
                # "/ 0." and "/  0.0" and "/   0.00" etc.

More Regular Expressions

i

Do case-insensitive pattern matching.

If use locale is in effect, the case map is taken from the current locale.

m

Treat string as multiple lines. That is, change ``^'' and ``$'' from matching at only the very start or end of the string to the start or end of any line anywhere within the string.

s

Treat string as single line. That is, change ``.'' to match any character whatsoever, even a newline, which it normally would not match.

The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s without /m will force ``^'' to match only at the beginning of the string and ``$'' to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the ``.'' match any character whatsoever, while yet allowing ``^'' and ``$'' to match, respectively, just after and just before newlines within the string.

x

Extend your pattern's legibility by permitting whitespace and comments.

These are usually written as ``the /x modifier'', even though the delimiter in question might not actually be a slash. In fact, any of these modifiers may also be embedded within the regular expression itself using the new (?...) construct. See below.

The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside of a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early.
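
A short demonstration of these modifiers, using made-up text:

#!/usr/bin/perl
# Demonstration of the /i, /m, /s, and /x modifiers described above.
use strict;
use warnings;

my $text = "First line\nsecond line\n";

print "/i matched\n" if $text =~ /FIRST/i;        # case-insensitive
print "/m matched\n" if $text =~ /^second/m;      # ^ matches after the embedded newline
print "/s matched\n" if $text =~ /line.second/s;  # . is allowed to match the newline

# /x permits whitespace and comments inside the pattern itself
print "/x matched\n" if $text =~ /
    ^First          # anchored at the start of the string
    \s+             # the whitespace between the words
    line            # literal text
/x;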


The patterns used in pattern matching are regular expressions such as those supplied in the Version 8 regex routines. (In fact, the routines are derived (distantly) from Henry Spencer's freely redistributable reimplementation of the V8 routines.)

In particular the following metacharacters have their standard egrep-ish meanings:

    \   Quote the next metacharacter
    ^   Match the beginning of the line
    .   Match any character (except newline)
    $   Match the end of the line (or before newline at the end)
    |   Alternation
    ()  Grouping
    []  Character class

By default, the ``^'' character is guaranteed to match at only the beginning of the string, the ``$'' character at only the end (or before the newline at the end) and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by ``^'' or ``$''. You may, however, wish to treat a string as a multi-line buffer, such that the ``^'' will match after any newline within the string, and ``$'' will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice is now deprecated.)

To facilitate multi-line substitutions, the ``.'' character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't. The /s modifier also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another module.

The following standard quantifiers are recognized:

    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times

(If a curly bracket occurs in any other context, it is treated as a regular character.) The ``*'' modifier is equivalent to {0,}, the ``+'' modifier to {1,}, and the ``?'' modifier to {0,1}. n and m are limited to integral values less than 65536.

By default, a quantified subpattern is ``greedy'', that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a ``?''. Note that the meanings don't change, just the ``greediness'':

    *?     Match 0 or more times
    +?     Match 1 or more times
    ??     Match 0 or 1 time
    {n}?   Match exactly n times
    {n,}?  Match at least n times
    {n,m}? Match at least n but not more than m times
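
A short comparison against a made-up string shows the difference in practice:

    $_ = '<a><b>';
    /<(.*)>/  and print "greedy:  $1\n";    # greedy:  a><b  (matches as much as possible)
    /<(.*?)>/ and print "minimal: $1\n";    # minimal: a     (stops at the first ">")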

Because patterns are processed as double quoted strings, the following also work:

    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \033        octal char (think of a PDP-11)
    \x1B        hex char
    \c[         control char
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)
    \Q          quote (disable) pattern metacharacters till \E

If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale.

You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the corresponding variable, while escaping will cause the literal string \$ to be matched. You'll need to write something like m/\Quser\E\@\Qhost/.

In addition, Perl defines the following:

    \w  Match a "word" character (alphanumeric plus "_")
    \W  Match a non-word character
    \s  Match a whitespace character
    \S  Match a non-whitespace character
    \d  Match a digit character
    \D  Match a non-digit character

A \w matches a single alphanumeric character, not a whole word. To match a word you'd need to say \w+. If use locale is in effect, the list of alphabetic characters generated by \w is taken from the current locale. You may use \w, \W, \s, \S, \d, and \D within character classes (though not as either end of a range).
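
For instance (with a made-up string):

    $_ = "release 5";
    /(\w)/  and print "$1\n";    # prints "r" -- \w is a single character
    /(\w+)/ and print "$1\n";    # prints "release" -- \w+ spans the whole word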

Perl defines the following zero-width assertions:

    \b  Match a word boundary
    \B  Match a non-(word boundary)
    \A  Match only at beginning of string
    \Z  Match only at end of string, or before newline at the end
    \z  Match only at end of string
    \G  Match only where previous m//g left off (works only with /g)

A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W. (Within character classes \b represents backspace rather than a word boundary.) The \A and \Z are just like ``^'' and ``$'', except that they won't match multiple times when the /m modifier is used, while ``^'' and ``$'' will match at every internal line boundary. To match the actual end of the string, not ignoring newline, you can use \z. The \G assertion can be used to chain global matches (using m//g), as described in Regexp Quote-Like Operators.

It is also useful when writing lex-like scanners, when you have several patterns that you want to match against consecutive substrings of your string; see the previous reference. The actual location where \G will match can also be influenced by using pos() as an lvalue.
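
A minimal sketch of both ideas, using made-up strings: \b to match a whole word, and \G to anchor each match of a scan where the previous one left off:

    print "word: $&\n" if "one-two" =~ /\btwo\b/;    # matches "two" as a whole word

    $_ = "1,2,3 and more";
    push @nums, $1 while /\G(\d+),?/g;               # each match starts exactly where the last ended
    print "@nums\n";                                 # prints "1 2 3" -- the scan stops at " and more"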

When the bracketing construct ( ... ) is used, \<digit> matches the digit'th substring. Outside of the pattern, always use ``$'' instead of ``\'' in front of the digit. (While the \<digit> notation can on rare occasion work outside the current pattern, this should not be relied upon. See the WARNING below.) The scope of $<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next successful pattern match, whichever comes first. If you want to use parentheses to delimit a subpattern (e.g., a set of alternatives) without saving it as a subpattern, follow the ( with a ?:.

You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10, $11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if there have been at least that many left parentheses before the backreference. Otherwise (for backward compatibility) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through \9 are always backreferences.)

$+ returns whatever the last bracket match matched. $& returns the entire matched string. ($0 used to return the same thing, but not any more.) $` returns everything before the matched string. $' returns everything after the matched string. Examples:

    s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words

    if (/Time: (..):(..):(..)/) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

Once perl sees that you need one of $&, $` or $' anywhere in the program, it has to provide them on each and every pattern match. This can slow your program down. The same mechanism that handles these provides for the use of $1, $2, etc., so you pay the same price for each pattern that contains capturing parentheses. But if you never use $&, etc., in your script, then patterns without capturing parentheses won't be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.

Backslashed metacharacters in Perl are alphanumeric, such as \b, \w, \n. Unlike some other regular expression languages, there are no backslashed symbols that aren't alphanumeric. So anything that looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter. This was once used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a string that you want to use for a pattern. Simply quote all non-alphanumeric characters:

    $pattern =~ s/(\W)/\\$1/g;

Now it is much more common to see either the quotemeta() function or the \Q escape sequence used to disable all metacharacters' special meanings like this:

    /$unquoted\Q$quoted\E$unquoted/

Perl defines a consistent extension syntax for regular expressions. The syntax is a pair of parentheses with a question mark as the first thing within the parentheses (this was a syntax error in older versions of Perl). The character after the question mark gives the function of the extension. Several extensions are already supported:

(?#text)
A comment. The text is ignored. If the /x switch is used to enable whitespace formatting, a simple # will suffice. Note that perl closes the comment as soon as it sees a ), so there is no way to put a literal ) in the comment.
(?:pattern)
(?imsx-imsx:pattern)
This is for clustering, not capturing; it groups subexpressions like ``()'', but doesn't make backreferences as ``()'' does. So
    @fields = split(/\b(?:a|b|c)\b/)
is like
    @fields = split(/\b(a|b|c)\b/)
but doesn't spit out extra fields.
The letters between ? and : act as flag modifiers; see (?imsx-imsx). In particular,
    /(?s-i:more.*than).*million/i
is equivalent to more verbose
    /(?:(?s-i)more.*than).*million/i
(?=pattern)
A zero-width positive lookahead assertion. For example, /\w+(?=\t)/ matches a word followed by a tab, without including the tab in $&.
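A runnable version of that example (the string is illustrative):
    $_ = "name\tvalue";
    print "got <$&>\n" if /\w+(?=\t)/;   # got <name> -- the tab is checked but not consumed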
(?!pattern)
A zero-width negative lookahead assertion. For example /foo(?!bar)/ matches any occurrence of ``foo'' that isn't followed by ``bar''. Note however that lookahead and lookbehind are NOT the same thing. You cannot use this for lookbehind.
If you are looking for a ``bar'' that isn't preceded by a ``foo'', /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be ``foo''--and it's not, it's a ``bar'', so ``foobar'' will match. You would have to do something like /(?!foo)...bar/ for that. We say ``like'' because there's the case of your ``bar'' not having three characters before it. You could cover that this way: /(?:(?!foo)...|^.{0,2})bar/. Sometimes it's still easier just to say:
    if (/bar/ && $` !~ /foo$/)
For lookbehind see below.
(?<=pattern)
A zero-width positive lookbehind assertion. For example, /(?<=\t)\w+/ matches a word following a tab, without including the tab in $&. Works only for fixed-width lookbehind.
(?<!pattern)
A zero-width negative lookbehind assertion. For example /(?<!bar)foo/ matches any occurrence of ``foo'' that isn't following ``bar''. Works only for fixed-width lookbehind.
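A small sketch of both lookbehind forms against one made-up string:
    $_ = "barfoo and foobar";
    print "1: $&\n" if /(?<=bar)foo/;    # matches the "foo" inside "barfoo", which follows "bar"
    print "2: $&\n" if /(?<!bar)foo/;    # matches the "foo" inside "foobar", which "bar" does not precede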
(?{ code })
Experimental ``evaluate any Perl code'' zero-width assertion. Always succeeds. code is not interpolated. Currently the rules to determine where the code ends are somewhat convoluted.
The code is properly scoped in the following sense: if the assertion is backtracked (compare Backtracking), all the changes introduced after localisation are undone, so
  $_ = 'a' x 8;
  m< 
     (?{ $cnt = 0 })                    # Initialize $cnt.
     (
       a 
       (?{
           local $cnt = $cnt + 1;       # Update $cnt, backtracking-safe.
       })
     )*  
     aaaa
     (?{ $res = $cnt })                 # On success copy to non-localized
                                        # location.
   >x;
will set $res = 4. Note that after the match $cnt returns to the globally introduced value 0, since the scopes which restrict local statements are unwound.
This assertion may be used as the condition in a (?(condition)yes-pattern) switch. If not used in this way, the result of evaluation of code is put into the variable $^R. This happens immediately, so $^R can be used from other (?{ code }) assertions inside the same regular expression.
The above assignment to $^R is properly localized, thus the old value of $^R is restored if the assertion is backtracked (compare Backtracking).
Due to security concerns, this construction is not allowed if the regular expression involves run-time interpolation of variables, unless use re 'eval' pragma is used (see the re manpage), or the variables contain results of qr() operator (see qr/STRING/imosx).
This restriction is due to the wide-spread (questionable) practice of using the construct
    $re = <>;
    chomp $re;
    $string =~ /$re/;
without tainting. While this code is frowned upon from a security point of view, when (?{}) was introduced, it was considered bad to add new security holes to existing scripts.
NOTE: Use of the above insecure snippet without also enabling taint mode is to be severely frowned upon. use re 'eval' does not disable tainting checks, thus to allow $re in the above snippet to contain (?{}) with tainting enabled, one needs both use re 'eval' and untaint the $re.
(?>pattern)
An ``independent'' subexpression. Matches the substring that a standalone pattern would match if anchored at the given position, and only this substring.
Say, ^(?>a*)ab will never match, since (?>a*) (anchored at the beginning of string, as above) will match all characters a at the beginning of string, leaving no a for ab to match. In contrast, a*ab will match the same as a+b, since the match of the subgroup a* is influenced by the following group ab (see Backtracking). In particular, a* inside a*ab will match fewer characters than a standalone a*, since this makes the tail match.
An effect similar to (?>pattern) may be achieved by
   (?=(pattern))\1
since the lookahead is in "logical" context, thus matches the same substring as a standalone a+. The following \1 eats the matched string, thus making a zero-length assertion into an analogue of (?>...). (The difference between these two constructs is that the second one uses a capturing group, thus shifting ordinals of backreferences in the rest of a regular expression.)
This construct is useful for optimizations of ``eternal'' matches, because it will not backtrack (see Backtracking).
    m{ \(
          ( 
            [^()]+ 
          | 
            \( [^()]* \)
          )+
       \) 
     }x
That will efficiently match a nonempty group with matching two-or-less-level-deep parentheses. However, if there is no such group, it will take virtually forever on a long string. That's because there are so many different ways to split a long string into several substrings. This is what (.+)+ is doing, and (.+)+ is similar to a subpattern of the above pattern. Consider that the above pattern detects no-match on ((()aaaaaaaaaaaaaaaaaa in several seconds, but that each extra letter doubles this time. This exponential performance will make it appear that your program has hung.
However, a tiny modification of this pattern
    m{ \( 
          ( 
            (?> [^()]+ )
          | 
            \( [^()]* \)
          )+
       \) 
     }x
which uses (?>...) matches exactly when the one above does (verifying this yourself would be a productive exercise), but finishes in a fourth the time when used on a similar string with 1000000 a's. Be aware, however, that this pattern currently triggers a warning message under -w saying it "matches the null string many times".
On simple groups, such as the pattern (?> [^()]+ ), a comparable effect may be achieved by negative lookahead, as in [^()]+ (?! [^()] ). This was only 4 times slower on a string with 1000000 a's.
(?(condition)yes-pattern|no-pattern)
(?(condition)yes-pattern)
Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero-width assertion.
Say,
    m{ ( \( )? 
       [^()]+ 
       (?(1) \) ) 
     }x
matches a chunk of non-parentheses, possibly included in parentheses themselves.
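For instance, anchoring that same pattern and trying it against a few made-up strings:
    for ("(abc)", "abc", "(abc") {
        if ( /^ ( \( )? [^()]+ (?(1) \) ) $/x ) {
            print "match:    $_\n";
        } else {
            print "no match: $_\n";   # "(abc" fails: its "(" demands a closing ")"
        }
    }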
(?imsx-imsx)
One or more embedded pattern-match modifiers. This is particularly useful for patterns that are specified in a table somewhere, some of which want to be case sensitive, and some of which don't. The case insensitive ones need to include merely (?i) at the front of the pattern. For example:
    $pattern = "foobar";
    if ( /$pattern/i ) { } 
    # more flexible:
    $pattern = "(?i)foobar";
    if ( /$pattern/ ) { } 
Letters after - switch modifiers off.
These modifiers are localized inside an enclosing group (if any). Say,
    ( (?i) blah ) \s+ \1
(assuming x modifier, and no i modifier outside of this group) will match a repeated (including the case!) word blah in any case.
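Taking that description at face value, a small illustration (strings made up) would behave like this:
    print "1: $&\n" if "BLAH  BLAH" =~ / ( (?i) blah ) \s+ \1 /x;   # matches: the repeat has the same case
    print "2: $&\n" if "blah  BLAH" =~ / ( (?i) blah ) \s+ \1 /x;   # no match: \1 is case sensitive here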

A question mark was chosen for this and for the new minimal-matching construct because 1) question mark is pretty rare in older regular expressions, and 2) whenever you see one, you should stop and ``question'' exactly what is going on. That's psychology...


A fundamental feature of regular expression matching involves the notion called backtracking, which is currently used (when needed) by all regular expression quantifiers, namely *, *?, +, +?, {n,m}, and {n,m}?.

For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part--that's why it's called backtracking.

Here is an example of backtracking: Let's say you want to find the word following ``foo'' in the string ``Food is on the foo table.'':

    $_ = "Food is on the foo table.";
    if ( /\b(foo)\s+(\w+)/i ) {
        print "$2 follows $1.\n";
    }

When the match runs, the first part of the regular expression (\b(foo)) finds a possible match right at the beginning of the string, and loads up $1 with ``Foo''. However, as soon as the matching engine sees that there's no whitespace following the ``Foo'' that it had saved in $1, it realizes its mistake and starts over again one character after where it had the tentative match. This time it goes all the way until the next occurrence of ``foo''. The complete regular expression matches this time, and you get the expected output of ``table follows foo.''

Sometimes minimal matching can help a lot. Imagine you'd like to match everything between ``foo'' and ``bar''. Initially, you write something like this:

    $_ =  "The food is under the bar in the barn.";
    if ( /foo(.*)bar/ ) {
        print "got <$1>\n";
    }

Which perhaps unexpectedly yields:

  got <d is under the bar in the >

That's because .* was greedy, so you get everything between the first ``foo'' and the last ``bar''. In this case, it's more effective to use minimal matching to make sure you get the text between a ``foo'' and the first ``bar'' thereafter.

    if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
  got <d is under the >

Here's another example: let's say you'd like to match a number at the end of a string, and you also want to keep the preceding part of the match. So you write this:

    $_ = "I have 2 numbers: 53147";
    if ( /(.*)(\d*)/ ) {                                # Wrong!
        print "Beginning is <$1>, number is <$2>.\n";
    }

That won't work at all, because .* was greedy and gobbled up the whole string. Since \d* can match an empty string, the complete regular expression still matched successfully:

    Beginning is <I have 2 numbers: 53147>, number is <>.

Here are some variants, most of which don't work:

    $_ = "I have 2 numbers: 53147";
    @pats = qw{
        (.*)(\d*)
        (.*)(\d+)
        (.*?)(\d*)
        (.*?)(\d+)
        (.*)(\d+)$
        (.*?)(\d+)$
        (.*)\b(\d+)$
        (.*\D)(\d+)$
    };

    for $pat (@pats) {
        printf "%-12s ", $pat;
        if ( /$pat/ ) {
            print "<$1> <$2>\n";
        } else {
            print "FAIL\n";
        }
    }

That will print out:

    (.*)(\d*)    <I have 2 numbers: 53147> <>
    (.*)(\d+)    <I have 2 numbers: 5314> <7>
    (.*?)(\d*)   <> <>
    (.*?)(\d+)   <I have > <2>
    (.*)(\d+)$   <I have 2 numbers: 5314> <7>
    (.*?)(\d+)$  <I have 2 numbers: > <53147>
    (.*)\b(\d+)$ <I have 2 numbers: > <53147>
    (.*\D)(\d+)$ <I have 2 numbers: > <53147>

As you see, this can be a bit tricky. It's important to realize that a regular expression is merely a set of assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve.

When using lookahead assertions and negations, this can all get even trickier. Imagine you'd like to find a sequence of non-digits not followed by ``123''. You might try to write that as

    $_ = "ABC123";
    if ( /^\D*(?!123)/ ) {              # Wrong!
        print "Yup, no 123 in $_\n";
    }

But that isn't going to match; at least, not the way you're hoping. It claims that there is no 123 in the string. Here's a clearer picture of why that pattern matches, contrary to popular expectations:

    $x = 'ABC123' ;
    $y = 'ABC445' ;

    print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;
    print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;

    print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;
    print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;

This prints

    2: got ABC
    3: got AB
    4: got ABC

You might have expected test 3 to fail because it seems to be a more general purpose version of test 1. The important difference between them is that test 3 contains a quantifier (\D*) and so can use backtracking, whereas test 1 will not. What's happening is that you've asked ``Is it true that at the start of $x, following 0 or more non-digits, you have something that's not 123?'' If the pattern matcher had let \D* expand to ``ABC'', this would have caused the whole pattern to fail. The search engine will initially match \D* with ``ABC''. Then it will try to match (?!123) with ``123'', which of course fails. But because a quantifier (\D*) has been used in the regular expression, the search engine can backtrack and retry the match differently in the hope of matching the complete regular expression.

The pattern really, really wants to succeed, so it uses the standard pattern back-off-and-retry and lets \D* expand to just ``AB'' this time. Now there's indeed something following ``AB'' that is not ``123''. It's in fact ``C123'', which suffices.

We can deal with this by using both an assertion and a negation. We'll say that the first part in $1 must be followed by a digit, and in fact, it must also be followed by something that's not ``123''. Remember that the lookaheads are zero-width expressions--they only look, but don't consume any of the string in their match. So rewriting this way produces what you'd expect; that is, case 5 will fail, but case 6 succeeds:

    print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;
    print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;

    6: got ABC

In other words, the two zero-width assertions next to each other work as though they're ANDed together, just as you'd use any builtin assertions: /^$/ matches only if you're at the beginning of the line AND the end of the line simultaneously. The deeper underlying truth is that juxtaposition in regular expressions always means AND, except when you write an explicit OR using the vertical bar. /ab/ means match ``a'' AND (then) match ``b'', although the attempted matches are made at different positions because ``a'' is not a zero-width assertion, but a one-width assertion.

One warning: particularly complicated regular expressions can take exponential time to solve because of the immense number of possible ways they can use backtracking to try to match. For example, this will take a very long time to run:

    /((a{0,5}){0,5}){0,5}/

And if you used *'s instead of limiting it to 0 through 5 matches, then it would take literally forever--or until you ran out of stack space.

A powerful tool for optimizing such beasts is ``independent'' groups, which do not backtrack (see (?>pattern)). Note also that zero-length lookahead/lookbehind assertions will not backtrack to make the tail match, since they are in ``logical'' context: only whether they match or not is considered relevant. For an example where side-effects of a lookahead might have influenced the following match, see (?>pattern).


Version 8 Regular Expressions

In case you're not familiar with the ``regular'' Version 8 regex routines, here are the pattern-matching rules not described above.

Any single character matches itself, unless it is a metacharacter with a special meaning described here or above. You can cause characters that normally function as metacharacters to be interpreted literally by prefixing them with a ``\'' (e.g., ``\.'' matches a ``.'', not any character; ``\\'' matches a ``\''). A series of characters matches that series of characters in the target string, so the pattern blurfl would match ``blurfl'' in the target string.

You can specify a character class, by enclosing a list of characters in [], which will match any one character from the list. If the first character after the ``['' is ``^'', the class matches any character not in the list. Within a list, the ``-'' character is used to specify a range, so that a-z represents all characters between ``a'' and ``z'', inclusive. If you want ``-'' itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. (The following all specify the same class of three characters: [-az], [az-], and [a\-z]. All are different from [a-z], which specifies a class containing twenty-six characters.)
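
A few one-line illustrations (strings made up):

    print "vowel\n"     if "perl" =~ /[aeiou]/;   # any single character from the list
    print "no digit\n"  if "perl" !~ /[0-9]/;     # a range inside a class
    print "literal -\n" if "a-b"  =~ /[-az]/;     # "-" placed first, so it is literal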

Characters may be specified using a metacharacter syntax much like that used in C: ``\n'' matches a newline, ``\t'' a tab, ``\r'' a carriage return, ``\f'' a form feed, etc. More generally, \nnn, where nnn is a string of octal digits, matches the character whose ASCII value is nnn. Similarly, \xnn, where nn are hexadecimal digits, matches the character whose ASCII value is nn. The expression \cx matches the ASCII character control-x. Finally, the ``.'' metacharacter matches any character except ``\n'' (unless you use /s).
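
For example:

    print "hex ok\n" if "A"  =~ /\x41/;   # \x41 is "A"
    print "tab ok\n" if "\t" =~ /\cI/;    # \cI is control-I, i.e., a tab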

You can specify a series of alternatives for a pattern using ``|'' to separate them, so that fee|fie|foe will match any of ``fee'', ``fie'', or ``foe'' in the target string (as would f(e|i|o)e). The first alternative includes everything from the last pattern delimiter (``('', ``['', or the beginning of the pattern) up to the first ``|'', and the last alternative contains everything from the last ``|'' to the next pattern delimiter. For this reason, it's common practice to include alternatives in parentheses, to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against ``barefoot'', only the ``foo'' part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.)
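
As a one-line check of that behaviour:

    "barefoot" =~ /foo|foot/ and print "$&\n";   # prints "foo", the first alternative that succeeds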

Also remember that ``|'' is interpreted as a literal within square brackets, so if you write [fee|fie|foe] you're really only matching [feio|].

Within a pattern, you may designate subpatterns for later reference by enclosing them in parentheses, and you may refer back to the nth subpattern later in the pattern using the metacharacter \n. Subpatterns are numbered based on the left to right order of their opening parenthesis. A backreference matches whatever actually matched the subpattern in the string being examined, not the rules for that subpattern. Therefore, (0|0x)\d*\s\1\d* will match ``0x1234 0x4321'', but not ``0x1234 01234'', because subpattern 1 actually matched ``0x'', even though the rule 0|0x could potentially match the leading 0 in the second number.
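
The same point as runnable code (strings taken from the example above):

    print "1: match\n"    if "0x1234 0x4321" =~ /(0|0x)\d*\s\1\d*/;   # \1 is the text "0x"
    print "2: no match\n" if "0x1234 01234"  !~ /(0|0x)\d*\s\1\d*/;   # "01234" does not begin with "0x"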


WARNING on \1 vs $1

Some people get too used to writing things like:

    $pattern =~ s/(\W)/\\\1/g;

This is grandfathered for the RHS of a substitute to avoid shocking the sed addicts, but it's a dirty habit to get into. That's because in PerlThink, the righthand side of a s/// is a double-quoted string. \1 in the usual double-quoted string means a control-A. The customary Unix meaning of \1 is kludged in for s///. However, if you get into the habit of doing that, you get yourself into trouble if you then add an /e modifier.

    s/(\d+)/ \1 + 1 /eg;        # causes warning under -w

Or if you try to do

    s/(\d+)/\1000/;

You can't disambiguate that by saying \{1}000, whereas you can fix it with ${1}000. Basically, the operation of interpolation should not be confused with the operation of matching a backreference. Certainly they mean two different things on the left side of the s///.


Repeated patterns matching zero-length substring

WARNING: Difficult material (and prose) ahead. This section needs a rewrite.

Regular expressions provide a terse and powerful programming language. As with most other power tools, power comes together with the ability to wreak havoc.

A common abuse of this power stems from the ability to make infinite loops using regular expressions, with something as innocuous as:

    'foo' =~ m{ ( o? )* }x;

The o? can match at the beginning of 'foo', and since the position in the string is not moved by the match, o? would match again and again due to the * modifier. Another common way to create a similar cycle is with the looping modifier //g:

    @matches = ( 'foo' =~ m{ o? }xg );

or

    print "match: <$&>\n" while 'foo' =~ m{ o? }xg;

or the loop implied by split().

However, long experience has shown that many programming tasks may be significantly simplified by using repeated subexpressions which may match zero-length substrings, with a simple example being:

    @chars = split //, $string;           # // is not magic in split
    ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /

Thus Perl allows the /()/ construct, which forcefully breaks the infinite loop. The rules for this are different for lower-level loops given by the greedy modifiers *+{}, and for higher-level ones like the /g modifier or split() operator.

The lower-level loops are interrupted when it is detected that a repeated expression did match a zero-length substring, thus

   m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;

is made equivalent to

   m{   (?: NON_ZERO_LENGTH )* 
      | 
        (?: ZERO_LENGTH )? 
    }x;

The higher-level loops preserve an additional state between iterations: whether the last match was zero-length. To break the loop, the match following a zero-length match is prohibited from having a length of zero. This prohibition interacts with backtracking (see Backtracking), and so the second best match is chosen if the best match is of zero length.

Say,

    $_ = 'bar';
    s/\w??/<$&>/g;

results in ``<><b><><a><><r><>''. At each position of the string the best match given by non-greedy ?? is the zero-length match, and the second best match is what is matched by \w. Thus zero-length matches alternate with one-character-long matches.

Similarly, for repeated m/()/g the second-best match is the match at the position one notch further in the string.

The additional state of being matched with zero-length is associated to the matched string, and is reset by each assignment to pos().


Creating custom RE engines

Overloaded constants (see the overload manpage) provide a simple way to extend the functionality of the RE engine.

Suppose that we want to enable a new RE escape-sequence \Y| which matches at a boundary between white-space characters and non-whitespace characters. Note that (?=\S)(?<!\S)|(?!\S)(?<=\S) matches exactly at these positions, so we want to have each \Y| in the place of the more complicated version. We can create a module customre to do this:

    package customre;
    use overload;

    sub import {
      shift;
      die "No argument to customre::import allowed" if @_;
      overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}

    my %rules = ( '\\' => '\\', 
                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
    sub convert {
      my $re = shift;
      $re =~ s{ 
                \\ ( \\ | Y . )
              }
              { $rules{$1} or invalid($re,$1) }sgex; 
      return $re;
    }

Now use customre enables the new escape in constant regular expressions, i.e., those without any runtime variable interpolations. As documented in the overload manpage, this conversion will work only over literal parts of regular expressions. For \Y|$re\Y| the variable part of this regular expression needs to be converted explicitly (but only if the special meaning of \Y| should be enabled inside $re):

    use customre;
    $re = <>;
    chomp $re;
    $re = customre::convert $re;
    /\Y|$re\Y|/;


sat@cassbeth.com