Application centric Cloudwatch logging in AWS Lambda functions with python3+ runtimes

Abstract

Application centric logging is a system where one or more components all direct their log entries to a single logger. In the AWS context, this could mean an application composed of one or more AWS Lambda functions, each logging to a single application-wide AWS CloudWatch log stream. By “single”, I mean single to the application, not to each function.

In Lambda functions with Python runtimes, the default mode of logging is one log stream per Lambda, written via the print() function or the logging module. But there are situations where multiple Lambdas co-operate to solve a larger problem, where the co-operation is synchronous, and where it would be valuable to view a unified log stream of events across multiple Lambdas.

How do we achieve this, with a simple client interface? In this post, I present a solution.

A Solution

Include the following Python 3 script, with filename “custom_logging.py”, in the packaging of your AWS Lambda function.

################################################################################
##  CustomLogging class
################################################################################
import boto3
import time, sys
import json
import logging

def coerceLoggingType( logType):
  if (logType is None) or (logType == ''):
    logType = logging.INFO
  elif isinstance( logType, str):
    logType = getattr( logging, logType.upper(), logging.INFO)
  return logType

defaultFormat = '#standard'

stockFormats = {
  '#standard': '{level}: {func}: {caller}: ',
  '#short'   : '{level}: {func}: ',
  '#simple'  : '{func}: '} 

levelNames = {
  logging.DEBUG   : 'DEBUG',
  logging.INFO    : 'INFO',
  logging.WARNING : 'WARNING',
  logging.ERROR   : 'ERROR',
  logging.CRITICAL: 'CRITICAL'}

botoLoggers = ['boto', 'boto3', 'botocore', 'urllib3']

def _json_formatter( obj):
  """Formatter for unserialisable values."""
  return str( obj)

class JsonFormatter( logging.Formatter):
  """AWS Lambda Logging formatter.
  Formats the log message as a JSON encoded string.  If the message is a
  dict it will be used directly.  If the message can be parsed as JSON, then
  the parsed value is used in the output record.
  """
  def __init__( self, **kwargs):
    super( JsonFormatter, self).__init__()
    self.format_dict = {
      'timestamp': '%(asctime)s',
      'level': '%(levelname)s',
      'location': '%(name)s.%(funcName)s:%(lineno)d'}
    # Pop json_default before merging, so it does not leak into format_dict.
    self.default_json_formatter = kwargs.pop( 'json_default', _json_formatter)
    self.format_dict.update( kwargs)

  def format( self, record):
    record_dict = record.__dict__.copy()
    record_dict['asctime'] = self.formatTime( record)
    log_dict = {
      k: v % record_dict
      for k, v in self.format_dict.items() if v}
    if isinstance( record_dict['msg'], dict):
      log_dict['message'] = record_dict['msg']
    else:
      log_dict['message'] = record.getMessage()
    # Attempt to decode the message as JSON, if so, merge it with the
    # overall message for clarity.
    try:
      log_dict['message'] = json.loads( log_dict['message'])
    except ( TypeError, ValueError):
      pass
    if record.exc_info:
      # Cache the traceback text to avoid converting it multiple times
      # (it's constant anyway)
      # from logging.Formatter:format
      if not record.exc_text:
        record.exc_text = self.formatException( record.exc_info)
    if record.exc_text:
      log_dict['exception'] = record.exc_text
    json_record = json.dumps( log_dict, default=self.default_json_formatter)
    if hasattr( json_record, 'decode'):  # pragma: no cover
      json_record = json_record.decode( 'utf-8')
    return json_record

def setupCanonicalLogLevels( logger, level, fmt, formatter_cls=JsonFormatter, boto_level=None, **kwargs):
  if not isinstance( logger, logging.Logger):
    raise Exception( 'Wrong class of logger passed to setupCanonicalLogLevels().')
  logger.setLevel( level)
  logging.root.setLevel( level)
  if fmt is not None:
    logging.basicConfig( format=fmt)
    fmtObj = logging.Formatter( fmt)
  else:
    fmtObj = None
  for handler in logging.root.handlers:
    try:
      if fmtObj is not None:
        handler.setFormatter( fmtObj)
      elif formatter_cls is not None:
        handler.setFormatter( formatter_cls( **kwargs))
    except:
      pass
  if boto_level is None:
    boto_level = level
  for loggerId in botoLoggers:
    try:
      logging.getLogger( loggerId).setLevel( boto_level)
    except:
      pass
 
 
class NullLogger():
  def __init__( self):
    pass
 
  def purge( self):
    pass
 
  def log( self, level, msg, withPurge=False):
    pass
 
  def debug( self, msg, withPurge=False):
    pass
 
  def info( self, msg, withPurge=False):
    pass
 
  def warning( self, msg, withPurge=False):
    pass
 
  def critical( self, msg, withPurge=False):
    pass
 
  def error( self, msg, withPurge=False):
    pass
 
  def exception( self, msg, withPurge=False):
    pass
 
  def classCode( self):
    return '#null'
 
  def isPurgeable( self):
    return False
 
class PrintLogger():
  def __init__( self, threshold):
    self.threshold = threshold
 
  def purge( self):
    pass
 
  def log( self, level, msg, withPurge=False):
    if level >= self.threshold:
      print( msg)
 
  def debug( self, msg, withPurge=False):
    self.log( logging.DEBUG, msg, False)
 
  def info( self, msg, withPurge=False):
    self.log( logging.INFO, msg, False)
 
  def warning( self, msg, withPurge=False):
    self.log( logging.WARNING, msg, False)
 
  def critical( self, msg, withPurge=False):
    self.log( logging.CRITICAL, msg, False)
 
  def error( self, msg, withPurge=False):
    self.log( logging.ERROR, msg, False)
 
  def exception( self, msg, withPurge=False):
    self.log( logging.ERROR, msg, False)
 
  def classCode( self):
    return '#print'
 
  def isPurgeable( self):
    return False
 
def createPolymorphicLogger( logClass, logGroup, logStream, logLevel = logging.INFO, functionName = None, msgFormat = None):
  if logClass == 'cloud-watch':
    return CustomLogging( logGroup, logStream, logLevel, functionName, msgFormat)
  elif logClass == '#print':
    return PrintLogger( logLevel)
  elif (logClass == '#null') or (logClass is None):
    return NullLogger()
  elif isinstance( logClass, dict) and ('logging' in logClass):
    loggingParams    = logClass.get( 'logging', {})
    cloudWatchParams = loggingParams.get( 'cloud-watch', {})
    if msgFormat is None:
      msgFormat = '#mini'
    actualLogClass  = loggingParams.get( 'class')
    logGroup     = cloudWatchParams.get( 'group'   , logGroup)
    logStream    = cloudWatchParams.get( 'stream'  , logStream)
    logLevel     =    loggingParams.get( 'level'   , logLevel)
    functionName = cloudWatchParams.get( 'function', functionName)
    msgFormat    = cloudWatchParams.get( 'format'  , msgFormat)
    return createPolymorphicLogger( actualLogClass, logGroup, logStream, logLevel, functionName, msgFormat)
  elif isinstance( logClass, dict) and ('class' in logClass):
    canonicalLogClassRecord = {'logging': logClass}
    return createPolymorphicLogger( canonicalLogClassRecord, logGroup, logStream, logLevel, functionName, msgFormat)
  elif logClass == '#standard-logger':
    logger = logging.getLogger( name=logStream)
    if msgFormat is None:
      msgFormat = '[%(levelname)s] %(message)s'
    setupCanonicalLogLevels( logger, logLevel, msgFormat, JsonFormatter, logging.ERROR)
    return logger
  else:
    raise Exception( f'Unrecognised log class {logClass}')
 
def getClassCode( logger):
  code = '#null'
  if isinstance( logger, logging.Logger):
    code = '#standard-logger'
  elif logger is not None:
    try:
      code = logger.classCode()
    except:
      code = '#unrecognised'
  return code
 
def isLoggerPurgeable( logger):
  result = False
  if (not isinstance( logger, logging.Logger)) and (logger is not None):
    try:
      result = logger.isPurgeable()
    except:
      pass
  return result
 
class CustomLogging:
  def __init__( self, logGroup, logStream, logLevel = logging.INFO, functionName = None, msgFormat = None):
    """ logGroup is the name of the CloudWatch log group. If none, the messages passes to print.
        logStream is the name of the stream. It is required. It is a string. There is no embedded date processing.
        logLevel is one of the logging level constants or its string equivalent. Posts below this level will be swallowed.
        functionName is the name of the lambda.
        msgFormat determines the logged message prefix. It is either a format string, a label or a function.
          If it is a format string, the following substitution identifiers are available:
            {level}  The message log level.
            {func}   The passed functionName
            {caller} The python caller function name
          If it is a label, it is one of:
            #standard   - This is the default.
            #short
            #simple
            #mini
          If it is a function (or callable object), it must be a function that returns a prefix string with
            the following input parameters in order:
              level           - passed message level
              functionName  - constructed function name
              caller          - invoker caller name
              logMsg          - passed message
             
        EXAMPLE USAGE 1:
          import custom_logging, logging
         
          logger = CustomLogging( '/aws/ec2/prod/odin', '2022-06-29-MLC_DAILY-143', logging.INFO, 'CoolLambdaFunc', '#mini')
          logger.info( 'Hello friend! This is an info')
          logger.error( 'I broke it!')
          logger.purge()
       
        
        EXAMPLE USAGE 2:
          import custom_logging, logging
         
          logger = CustomLogging( None, None, logging.DEBUG, 'CoolLambdaFunc', '#mini')
          logger.info( 'This is the same as print')
      
        
        EXAMPLE USAGE 3:
          import custom_logging, logging
         
          logger = CustomLogging( None, None, logging.WARNING, 'CoolLambdaFunc', '{caller} | {level} !! {func}: ')
          
       
        
        EXAMPLE USAGE 4:
          import custom_logging, logging
         
          def colourMePink( level, functionName, caller, logMsg):
            sLevel = levelNames.get( level, str( level))
            if level == logging.DEBUG:
              prefix = '{level}: {func}: {caller}: '.format( level = sLevel, func = functionName, caller = caller)
            elif level == logging.INFO:
              prefix = ''
            else:
              prefix = '{level}: '.format( level = sLevel)
            return prefix
         
          logger = CustomLogging( None, None, logging.INFO, None, colourMePink)
          
    """
    self.logs           = boto3.client( 'logs', region_name='ap-southeast-2')
    self.logEvents      = []
    self.functionName = functionName
    if self.functionName is None:
      self.functionName = ''
    self.logGroup       = logGroup
    self.logStream      = logStream
    self.msgFormat = msgFormat
    if self.msgFormat is None:
      self.msgFormat = defaultFormat
    if isinstance( self.msgFormat, str) and (self.msgFormat in stockFormats):
      self.msgFormat = stockFormats[self.msgFormat]
    elif self.msgFormat == '#mini':
      self.msgFormat = self._miniFormat
    self.logLevel       = coerceLoggingType( logLevel)
    self.sequenceToken  = None
    self.sequenceTokenIsValid = False
    self.maxEventsInBuffer = 20
    self.maxBufferAgeMs = 60000 # 1 minute.
 
  def _formatMessage( self, caller, logType, logMsg):
    prefix = ''
    if caller is None:
      try:
        caller = sys._getframe(3).f_code.co_name
      except:
        caller = ''
    sLevel = levelNames.get( logType, str( logType))
    if isinstance( self.msgFormat, str):
      prefix = self.msgFormat.format( level = sLevel, func = self.functionName, caller = caller)
    elif callable( self.msgFormat):
      prefix = self.msgFormat( logType, self.functionName, caller, logMsg)
    return prefix + str( logMsg)
 
  def _miniFormat( self, level, functionName, caller, logMsg):
    prefix = ''
    if level >= logging.WARNING:
      prefix = levelNames[ level] + ': '
    if functionName != '':
      prefix = prefix + functionName + ': '
    return prefix
 
  def _getSequenceToken( self):
    self.sequenceToken = None
    self.sequenceTokenIsValid = True
    try:
      response = self.logs.describe_log_streams( logGroupName=self.logGroup, logStreamNamePrefix=self.logStream)
    except self.logs.exceptions.ResourceNotFoundException:
      return 'group-not-found'
    try:
      if 'uploadSequenceToken' in response['logStreams'][0]:
        self.sequenceToken = response['logStreams'][0]['uploadSequenceToken']
      if self.sequenceToken == '':
        self.sequenceToken = None
    except:
      pass
    if self.sequenceToken is None:
      return 'stream-not-found-or-virgin-stream'
    else:
      return None
 
  def put( self, logMsg, logType = logging.INFO, withPurge=False, callFunc = None):
    logType = coerceLoggingType( logType)
    if self.logLevel <= logType:
      if self.logGroup is not None:
        timestamp = int( round( time.time() * 1000))
        message = self._formatMessage( callFunc, logType, logMsg)
        logEvent = {'timestamp': timestamp, 'message': message}
        if self.logLevel == logging.DEBUG:
          print( message)
        self.logEvents.append( logEvent)
        count = len( self.logEvents)
        if withPurge or \
           (count >= self.maxEventsInBuffer) or \
           ((count >= 1) and ((timestamp - self.logEvents[0]['timestamp']) >= self.maxBufferAgeMs)):
          self.purge()
      else:
        print( logMsg)
 
  def classCode( self):
    return 'cloud-watch'
 
  def _primitive_put_log_events( self):
    event_log = {
      'logGroupName' : self.logGroup,
      'logStreamName': self.logStream,
      'logEvents'    : self.logEvents}
    if self.sequenceToken is not None:
      event_log['sequenceToken'] = self.sequenceToken
    try:
      response = self.logs.put_log_events( **event_log)
      self.sequenceToken = response.get( 'nextSequenceToken')
      self.sequenceTokenIsValid = True
      result = None
    except self.logs.exceptions.ResourceAlreadyExistsException:
      self.sequenceTokenIsValid = False
      result = None
    except self.logs.exceptions.DataAlreadyAcceptedException:
      self.sequenceTokenIsValid = False
      result = None
    except self.logs.exceptions.InvalidSequenceTokenException:
      self.sequenceTokenIsValid = False
      result = 'invalid-sequence-token'
    except self.logs.exceptions.ResourceNotFoundException:
      self.sequenceTokenIsValid = True
      self.sequenceToken = None
      result = 'stream-not-found'
    return result
 
  def _primitive_create_log_stream( self):
    self.sequenceTokenIsValid = True
    self.sequenceToken = None
    try:
      self.logs.create_log_stream( logGroupName=self.logGroup, logStreamName=self.logStream)
      result = None
    except self.logs.exceptions.ResourceAlreadyExistsException:
      self.sequenceTokenIsValid = False
      result = None
    except self.logs.exceptions.ResourceNotFoundException:
      result = 'group-not-found'
    return result
 
  def _primitive_create_log_group( self):
    self.sequenceTokenIsValid = True
    self.sequenceToken = None
    try:
      self.logs.create_log_group( logGroupName=self.logGroup)
    except self.logs.exceptions.ResourceAlreadyExistsException:
      pass
 
  def _robust_put_log_events( self):
    status = 'hungry'
    for tryCount in range( 100):
      if status == 'group-not-found':
        self._primitive_create_log_group()
        status = 'stream-not-found'
      elif status == 'stream-not-found':
        status = self._primitive_create_log_stream()
        if status is None:
          status = 'hungry'
      elif status == 'invalid-sequence-token':
        getSequenceResult = self._getSequenceToken()
        # getSequenceResult == 'group-not-found' | 'stream-not-found-or-virgin-stream' | None
        if getSequenceResult == 'group-not-found':
          status = 'group-not-found'
        elif getSequenceResult == 'stream-not-found-or-virgin-stream':
          status = 'stream-not-found'
        else:
          status = 'ready'
      elif status == 'hungry':
        if not self.sequenceTokenIsValid:
          status = 'invalid-sequence-token'
        else:
          status = 'ready'
      elif status == 'ready':
        status = self._primitive_put_log_events()
        if status is None:
          status = 'done'
      if status == 'done':
        break
    if status != 'done':
      raise Exception( 'Failed to post to CloudWatch Logs.')
 
  def purge( self):
    if len( self.logEvents) > 0:
      try:
        self._robust_put_log_events()
      except Exception as ex:
        print( self.logEvents)
        print( ex)
      self.logEvents = []
 
  def log( self, level, msg, withPurge=False):
    self.put( msg, level, withPurge, None)
 
  def debug( self, msg, withPurge=False):
    self.put( msg, logging.DEBUG, withPurge, None)
 
  def info( self, msg, withPurge=False):
    self.put( msg, logging.INFO, withPurge, None)
 
  def warning( self, msg, withPurge=False):
    self.put( msg, logging.WARNING, withPurge, None)
 
  def error( self, msg, withPurge=False):
    self.put( msg, logging.ERROR, withPurge, None)
 
  def critical( self, msg, callFunc = None):
    self.put( msg, logging.CRITICAL, True, callFunc)
 
  def exception( self, msg, withPurge=True):
    self.log( logging.ERROR, msg, withPurge)
 
  def isPurgeable( self):
    return True
 
  def __del__( self):
    try:
      self.purge()
    except:
      pass

How to use

Import custom_logging. In your lambda code, where you need application-centric logging, invoke the factory method createPolymorphicLogger() to create a logger. Then send all your application-centric log events to this logger, instead of print().

The logger has the following public methods.

  • purge()
  • log( level, msg, withPurge=False)
  • debug/info/warning/critical/error/exception( msg, withPurge=False)

Use the log() method to log a string message. ‘level’ is one of the usual logging levels: DEBUG, INFO etc. For performance reasons, messages are buffered before being sent to CloudWatch. The buffer is purged when (A) it grows too long; (B) it ages out (1 minute); or (C) the withPurge parameter is explicitly set to True. Invoking the purge() method or releasing the custom logger instance will also flush it.
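The purge policy can be sketched independently of AWS. Note that BufferedSketch and its flushed list below are illustrative stand-ins for CustomLogging and its call to put_log_events(); they are not part of the module:

```python
import time

class BufferedSketch:
    """Illustrative stand-in for CustomLogging's buffering policy (no AWS calls)."""
    def __init__(self, max_events=20, max_age_ms=60000):
        self.events = []       # pending log events
        self.flushed = []      # stands in for events actually sent to CloudWatch
        self.max_events = max_events
        self.max_age_ms = max_age_ms

    def put(self, msg, with_purge=False):
        ts = int(round(time.time() * 1000))
        self.events.append({'timestamp': ts, 'message': msg})
        too_long = len(self.events) >= self.max_events
        too_old = bool(self.events) and (ts - self.events[0]['timestamp']) >= self.max_age_ms
        if with_purge or too_long or too_old:
            self.purge()

    def purge(self):
        if self.events:
            self.flushed.extend(self.events)  # the real code calls put_log_events here
            self.events = []
```

With max_events set to 3, the third put() triggers a flush, as does an explicit with_purge=True.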

The debug() etc. methods are shorthand for the log() method with the level fixed.

How to configure it

Refer to the inline comments.
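As an illustration of the dict form accepted by createPolymorphicLogger(), a configuration record looks like this (the group, stream and function names are hypothetical):

```python
import logging

# Hypothetical configuration record, in the shape that
# createPolymorphicLogger() accepts for its logClass parameter.
config = {
    'logging': {
        'class': 'cloud-watch',        # or '#print', '#null', '#standard-logger'
        'level': logging.INFO,
        'cloud-watch': {
            'group': '/aws/lambda/my-app',     # illustrative log group
            'stream': '2022-06-29-run-1',      # illustrative log stream
            'function': 'MyCoolLambda',        # illustrative lambda name
            'format': '#mini',
        },
    },
}

# logger = createPolymorphicLogger( config, None, None)
# logger.info( 'configured via a dict')
```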

Posted in Python | Comments Off on Application centric Cloudwatch logging in AWS Lambda functions with python3+ runtimes

SBD-JPath Data Model, Entry #1

In this series of entries, this entry being the first, I will specify the SBD-JPath data model. All the resolved values of expressions of this language are within the value-space of the types described in this data model.

This document specifies a grammar for SBD-JPath, using the same basic EBNF notation used in XML 1.0, 5th edition. White space is not significant in expressions. Grammar productions are introduced together with the features that they describe.

All data types in this model are captured by the following class hierarchy. Each line names a type that descends directly from the type on the nearest preceding line with one less level of indentation.

item
 j-value
  j-string
  j-number
  j-object
  j-array
  j-boolean
  j-null
 function
 map
 tuple
 atomic-value
  string
  number
  boolean
  date
sequence

The item type

The item type is an abstract type. An item is the basic building block of SBD-JPath, and is anything but a sequence. Items can be j-values, functions, maps, tuples or atomic values. All items are immutable.

The sequence type

A sequence is an ordered list of items. Sequences are immutable. A sequence cannot contain a sequence. The empty sequence is identical to an absence of a normal value. A sequence with a cardinality of 1 is identical to its one member. Sequences are not sets: an item can appear more than once in a sequence. The origin for indexing sequences is 1 (sorry, JavaScript developers! I chose 1 to be closer to XPath, and for other pragmatic reasons related to sequence predicates). Some core properties/functions of sequences (this list is not exhaustive) include:

  1. last(): integer

last() returns the cardinality of the sequence. Equivalently, for non-empty sequences, this is the index of the last item.

Some core operators on sequences (this list is not exhaustive) include:

  1. left-operand , right-operand
  2. left-operand << right-operand
  3. left-operand >> right-operand
  4. left-operand is right-operand
  5. left-operand union right-operand
  6. left-operand except right-operand
  7. 1 to 10

A specification for these operators will be given in a future post. They are equivalent to the operators with the same symbols in XPath 3.1.

All empty sequences are identical to all other empty sequences. A sequence is identical to another sequence if and only if they have the same cardinality and each member in order, is identical.

An empty sequence can be constructed thus:

()
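As a sketch in Python terms, modelling a sequence as a tuple of items (last, at and seq_identical are illustrative helper names, not part of SBD-JPath):

```python
def last(seq):
    """Cardinality of the sequence; for non-empty sequences, the last index."""
    return len(seq)

def at(seq, i):
    """1-based indexing, as in XPath."""
    return seq[i - 1]

def seq_identical(a, b, item_identical=lambda x, y: x == y):
    """Identical iff same cardinality and pairwise-identical members, in order."""
    return len(a) == len(b) and all(item_identical(x, y) for x, y in zip(a, b))
```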

The j-value type

The j-value type is an abstract type. It descends from item. It is identical to the type described as “value” in the JSON specification. j-values should be seen as reference data, as opposed to value data, for the purposes of identity. For example, consider the following JSON datum.

{
  "colour": "red",
  "flag": "red"
}

The value of the colour property of this json object is a node whose string value is “red”. The type of the node is j-string, which inherits from j-value. Similarly, the value of the flag property is a node whose string value is also “red”. The two aforementioned instances of j-value are neither equal nor identical. The string ‘red’ is identical to ‘red’, because a string is a value kind of datum. In contrast, the two j-values, even though they may have identical property values, are not identical, because j-value is a reference kind of datum.
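In Python terms, the distinction is the one between == on strings and `is` on wrapper objects. JString here is an illustrative wrapper, not part of the spec:

```python
class JString:
    """Illustrative j-string node: reference identity, wrapping a value string."""
    def __init__(self, value, parent=None):
        self.value = value    # the underlying string (value data)
        self.parent = parent

colour = JString('red')
flag = JString('red')

assert colour.value == flag.value   # strings are value data: identical
assert colour is not flag           # nodes are reference data: distinct
```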

Descendant types are j-string, j-number, j-boolean, j-null, j-object, j-array.

All j-values have two properties:

  1. parent: j-value?
  2. root: j-value

The parent of an array member is the containing array. The parent of each of a json object’s values is the containing json object. Otherwise the parent is the empty sequence. The root is the ultimate parent, or the j-value itself, found by following the ancestral path upward from the given j-value.
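The root property can be sketched as a walk up the parent chain, assuming each node carries a parent reference (JValue is an illustrative class, not part of the spec):

```python
class JValue:
    """Illustrative j-value node carrying a parent reference."""
    def __init__(self, parent=None):
        self.parent = parent  # None models the empty sequence

    def root(self):
        # Walk the ancestral path; the root is the ultimate parent, or self.
        node = self
        while node.parent is not None:
            node = node.parent
        return node
```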

The j-string type

The j-string type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “string” in the JSON specification.

A literal j-string instance without a parent can be constructed thus:

:"red"
[1] j-stringy  ::= string-delimiter (literal-char)* string-delimiter
[2] literal-j-string  ::= ":" j-stringy
[3] string-delimiter  ::= '"'
[4] literal-char ::= [^"\\] | '\"' | "\\" | "\/" | "\b" | "\f" | "\n" | "\r" | "\t" | ("\u" [0-9A-Fa-f]{4})


The model values of such constructed strings are as per the JSON specification.

The j-number type

The j-number type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “number” in the JSON specification.

A literal j-number instance without a parent can be constructed thus:

:3.14
[5] j-numbery  ::= "-"? ("0" | ([1-9] \d*)) ("." \d+)? ([eE] [+\-]? \d+)?
[6] literal-j-number  ::= ":" j-numbery

The model values of such constructed numbers are as per the JSON specification.

The j-object type

The j-object type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “object” in the JSON specification.

A literal j-object instance without a parent can be constructed thus:

:{"menu": ["fish", "poultry"]}
[7] j-objecty  ::= "{" (j-stringy ":" j-valuey ("," j-stringy ":" j-valuey)*)? "}"
[8] literal-j-object  ::= ":" j-objecty
[9] j-valuey ::= j-objecty | j-arrayy | j-numbery | j-stringy | j-booleany | j-nully

The model values of such constructed objects are as per the JSON specification.

The j-array type

The j-array type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “array” in the JSON specification.

A literal j-array instance without a parent can be constructed thus:

:["fish", "poultry"]
[10] j-arrayy ::= "[" (j-valuey ("," j-valuey)*)? "]"
[11] literal-j-array ::= ":" j-arrayy

The model values of such constructed arrays are as per the JSON specification.

The j-boolean type

The j-boolean type is a concrete type. It descends from j-value and is sealed. It is identical to the union of the types described as “true” and “false” in the JSON specification.

A literal j-boolean instance without a parent can be constructed thus:

:true
[12] j-booleany ::= "true" | "false"
[13] literal-j-boolean ::= ":" j-booleany

The model values of such constructed booleans are as per the JSON specification, with “true” representing logical true and “false” representing logical false.

The j-null type

The j-null type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “null” in the JSON specification.

A literal j-null instance without a parent can be constructed thus:

:null
[14] j-nully ::= "null"
[15] literal-j-null ::= ":" j-nully

The model values of such constructed nulls are as per the JSON specification.

The function type

The function type is a concrete type. It descends from item and is sealed. Functions can be anonymous or named.

A literal anonymous function which doubles a number, can be constructed thus:

function( $x as number) as number { 2 * $x }
[16] literalFunction ::= "function" "(" (param ("," param)*)? ")" ("as" sequenceType)? enclosedExpr
[17] param ::= "$" name ("as" sequenceType)?
[18] enclosedExpr ::= "{" expr? "}"

`expr` will be defined later. It is basically an expression.
`name` will be defined later. It is basically a programmatic identifier, with a grammar common to most language grammars for variable identifiers.
`sequenceType` is a type specification. Parameters and function returns can be so typed.

Defining our productions further …

sequenceType

[19] sequenceType ::= ("empty-sequence" "(" ")") | (itemType occurrenceIndicator?)

A sequence type is a test for a parameter type. If the actual value of the parameter does not pass the test, it is a static error if this can be detected syntactically, and a run-time error otherwise. The `empty-sequence()` test passes if and only if the actual value is an empty sequence.

itemType

[20] itemType ::= kindTest | ("item" "(" ")") | functionTest | mapTest | tupleTest | atomic
[21] occurrenceIndicator ::= "?" | "*" | "+"

An item type is a test for a parameter type. If the occurrenceIndicator is `?`, the count of items must be 0 or 1. If the occurrenceIndicator is `*`, the count of items can be any number. If the occurrenceIndicator is `+`, the count of items must be at least 1. If there is no occurrenceIndicator, the count of items must be precisely 1.
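These occurrence rules can be stated directly in code (matches_occurrence is an illustrative name, not part of the spec):

```python
def matches_occurrence(count, indicator=None):
    """Does an actual item count satisfy an occurrence indicator?"""
    if indicator == '?':
        return count in (0, 1)
    if indicator == '*':
        return True
    if indicator == '+':
        return count >= 1
    return count == 1  # no indicator: exactly one item
```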

kindTest

[22] kindTest ::= j-stringTest | j-numberTest | j-booleanTest | j-nullTest | j-arrayTest | j-objectTest | j-anyTest

A kind test is a test for a parameter type. The parameter value must be one of the standard json data types (j-string, j-number, j-object etc.)

j-stringTest

[23] j-stringTest ::= ("text" "(" ")") | ("text-or-null" "(" ")") | ("nonempty-text" "(" ")")

A j-string test is a test for a parameter type. The parameter value must be a j-string value, or in the case of text-or-null(), either a j-string value or a j-null value. In the case of nonempty-text(), the j-string string value must be a non-empty string.

j-numberTest

[24] j-numberTest ::= ("number" "(" number-constraint* ")") | ("number-or-null" "(" number-constraint* ")")
[25] number-constraint ::= ("min" S number) | ("max" S number) | ("grain" S number)

A j-number test is a test for a parameter type. The parameter value must be a j-number value, or in the case of number-or-null(), either a j-number value or a j-null value. Each number-constraint kind (min, max or grain) can occur at most once, in any order. If min is present, and the value is not j-null, the test fails if the numerical actual value of the parameter is less than the specified min number. Similarly for max. It is a static error if the max value is less than the min value. The `grain` number must be specified with exponent, and be a positive number. If the grain constraint is specified, and the value is not j-null, and the remainder of the parameter's actual value after division by the grain number (granularity) is non-zero, the test fails.
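Under the rules above, the min/max/grain checks can be sketched as follows (passes_number_constraints is an illustrative name; the j-null exemption is omitted for brevity):

```python
def passes_number_constraints(value, minimum=None, maximum=None, grain=None):
    """Illustrative j-number constraint check (j-null handling omitted)."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    # grain: the value must be an exact multiple of the granularity
    if grain is not None and value % grain != 0:
        return False
    return True
```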

j-booleanTest

[26] j-booleanTest ::= ("boolean" "(" ")") | ("boolean-or-null" "(" ")")

A j-boolean test is a test for a parameter type. The parameter value must be a j-boolean value, or in the case of boolean-or-null(), either a j-boolean value or a j-null value.

j-nullTest

[27] j-nullTest ::= "null" "(" ")"

A j-null test is a test for a parameter type. The parameter value must be a j-null.

j-arrayTest

[28] j-arrayTest ::= ("array" | "array-or-null") "(" arrayTypeConstraint* ")"
[29] arrayTypeConstraint ::= ("base" S (kindTest - j-anyTest)) | ("min" S number) | ("max" S number)

A j-array test is a test for a parameter type. The parameter value must be a j-array, or either a j-array or j-null in the case of `array-or-null()`. If constraints are present, they must be met. There can be at most one base constraint, one min constraint and one max constraint, but they can appear in any order. If the base constraint is present, each member of the parameter value array, if that member is not j-null, must pass the specified kindTest. The numbers for min and max must be non-negative integers, rendered without the characters “+”, “-”, “.”, “e” and “E”. It is a static error for the max number to be less than the min number. It is an error if the cardinality of the array is less than the min (if specified) or greater than the max (if specified).
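The cardinality and base-kind checks can be sketched similarly (passes_array_constraints and base_test are illustrative names; the per-member j-null exemption is omitted for brevity):

```python
def passes_array_constraints(members, base_test=None, minimum=None, maximum=None):
    """Illustrative j-array constraint check: base kind test plus min/max cardinality."""
    n = len(members)
    if minimum is not None and n < minimum:
        return False
    if maximum is not None and n > maximum:
        return False
    if base_test is not None and not all(base_test(m) for m in members):
        return False
    return True
```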

j-objectTest

[30] j-objectTest ::= ("object" | "object-or-null") "(" ")"

A j-object test is a test for a parameter type. The parameter value must be a j-object, or either a j-object or j-null in the case of `object-or-null()`. Future versions of SBD-JPath may allow extensions to the test which constrain the objects to a given JSON schema. It is envisaged that compliant SBD-JPath processor implementations will each have a convenient mechanism with which to register schemas. These schemas could then be leveraged in j-object tests.

j-anyTest

[31] j-anyTest ::= "node" "(" ")"

A j-any test is a test for a parameter type. The parameter value must be a json datum, namely j-value. This includes j-null.

atomic

[32] atomic ::= "string" | "number" | "boolean" | "date"

An atomic test is a test for a parameter type. The parameter value must be one of the fundamental non-node types of SBD-JPath, to wit: string, number, boolean or date. None of these types include a null value in their value-space. Dates do not include a time component, nor time-zone. The test passes if the parameter type is a string (in the case of “string”), etc.

functionTest

[33] function-test ::= "function" "(" "*" ")"

A function-test is a test for any function. The test passes if the parameter is a function of any signature.

mapTest

[33] map-test ::= "map" "(" "*" ")"

A map-test is a test for any map. The test passes if the parameter is a map.

tupleTest

(Content to be developed)

The map type

(Content to be developed)

The tuple type

(Content to be developed)

literal-tuple ::= "tuple" tuple-identifier "{" (identifier ":" expression ("," identifier ":" expression)* )? "}"
tuple-type-declaration ::= "type" identifier "tuple" "(" (identifier ":" sequence-type ("=" "(" expression ")")? )+ ")"
tuple-expression ::= tuple-type-declaration ( "," tuple-type-declaration)* "return" expression
identifier ::= "$" identifier-first-character (identifier-subsequent-character)*
identifier-first-character ::= [A-Za-z]
identifier-subsequent-character ::= identifier-first-character | [0-9] | "-" | "_"
let
  type $coordsType = tuple( latitude: number = 0, longitude: number),
  $coords = tuple $coordsType( longitude: 30) return
  travel-north( $coords)
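To make the default-value semantics of the example concrete, here is a rough JavaScript analogue. The function name `makeCoords` is purely illustrative; it mimics what a tuple literal that omits latitude would produce under the declaration above:

```javascript
// Mimics: type $coordsType = tuple( latitude: number = 0, longitude: number)
// A tuple literal supplying only longitude still yields a complete value,
// because latitude has a declared default of 0.
function makeCoords(fields) {
  var defaults = { latitude: 0 };
  if (typeof fields.longitude !== 'number') {
    // longitude has no declared default, so it must be supplied.
    throw new Error('longitude is required: it has no declared default');
  }
  return Object.assign({}, defaults, fields);
}

var coords = makeCoords({ longitude: 30 });
console.log(coords); // { latitude: 0, longitude: 30 }
```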

The atomic-value type

(Content to be developed)
(Content should follow structure:
* basic description
* position in the type hierarchy
* value-space
* properties
* constructor grammar
* some core operators and functions
)

The string type

(Content to be developed)

The number type

(Content to be developed)

The boolean type

(Content to be developed)

The date type

(Content to be developed)

Posted in SBD-JPath | Comments Off on SBD-JPath Data Model, Entry #1

Introducing SBD-JPath

SBD-JPath is an expression language that allows the processing of json data. Core ideas for this language were inspired by the XPath 3.1 language and JSONPath.

There are already many json processing languages and libraries, so why SBD-JPath, yet another one? SBD-JPath introduces a number of features not yet seen in its competitors, to wit:

  • A formal language specification
  • Language features inspired by XPath 3.1, including maps and functions
  • A syntax closer to XPath
  • Ability to navigate “up” the tree (that is to say, to reference ancestor nodes)
  • Greater extensibility, with the ability to externally provide functions

This blog post is merely an introduction to the concept of SBD-JPath. A series of future blog posts, taken together, will comprise the language specification. We will start with a data model, then specify core concepts and core operators, and finally I will specify the non-core inbuilt functions and operators.

SBD-JPath will be instrumental in the development of another language, as yet unnamed, similar to XSLT 3.0, which seeks to transform json data into html fragments and vice versa.

Just to whet your appetite, here is a sample task and solution using SBD-JPath.
In this task, we have sales data from a fruit vendor. What is the SBD-JPath expression to return the total value of sales? The sales data is thus:

{
  "apple":{
    "price": 3.10,
    "quantity": 100
    },
  "orange":{
    "price": 1.50,
    "quantity": 20
    }
}

The task answer is thus:

  sum( */(price * quantity))
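For comparison, the same computation written in plain JavaScript over the sales object shown above:

```javascript
// The json sales data from the task.
var sales = {
  apple:  { price: 3.10, quantity: 100 },
  orange: { price: 1.50, quantity: 20 }
};

// Equivalent of the SBD-JPath expression sum( */(price * quantity)):
// map each member object to price * quantity, then sum the results.
var total = Object.keys(sales)
  .map(function (k) { return sales[k].price * sales[k].quantity; })
  .reduce(function (a, b) { return a + b; }, 0);

console.log(total); // 340
```

Note how much of the iteration machinery the SBD-JPath expression hides behind the `*/` member step.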

So dear reader, read my future blog posts to receive the full specification.

Posted in SBD-JPath | Comments Off on Introducing SBD-JPath

Connecting Auth0 to DynamoDb and CloudWatch

If your website uses Auth0 for user authentication, and you have subscribed to an Enterprise Plan (https://auth0.com/pricing), you can store your user data in your own database, rather than Auth0’s internal database. The connection between Auth0 and your database is made through a set of Node.js scripts that you supply. Auth0 provides template scripts for many databases, but not AWS DynamoDb.

What if you want to store your users in DynamoDb, and log events such as log-in and log-out to a daily CloudWatch log stream? Here is how you do it …

Log-in script

Auth0’s custom database connection requires 6 scripts to be defined: Login, Create, Verify, Change Password, Get User and Delete. Here is a script for Login. Once you get the picture, you can copy and adapt this script for the 5 others. Note that I have assumed a certain structure in my DynamoDb user table. If your structure (fields and global indices) is different, you will need to make minor adjustments accordingly.

function login(email, password, callback) {

var AWS = require('aws-sdk');
var crypto = require('crypto');

var region = configuration.region;
var accessKeyId = configuration.accessKeyId;
var secretAccessKey = configuration.secretAccessKey;
var groupName = configuration.groupName;
var userTableName = configuration.userTableName;
var salt = configuration.salt;
var streamNamePrefix = configuration.streamNamePrefix;

var logEvents = [];


var cloudwatchlogs = new AWS.CloudWatchLogs({
    apiVersion: '2014-03-28',
    region: region,
    accessKeyId: accessKeyId,
    secretAccessKey: secretAccessKey
});

var dynamodb = new AWS.DynamoDB({
    apiVersion: '2012-08-10',
    region: region,
    accessKeyId: accessKeyId,
    secretAccessKey: secretAccessKey
});



function zeroPad(num, places) {
    var zero = places - num.toString().length + 1;
    return Array(+(zero > 0 && zero)).join("0") + num;
}

function todayAsString() {
    var mydate = new Date();
    return zeroPad(mydate.getFullYear(), 4) +
        '-' + zeroPad(mydate.getMonth() + 1, 2) +  // getMonth() is zero-based
        '-' + zeroPad(mydate.getDate(), 2);
}

var streamName = streamNamePrefix + '/' + todayAsString();

function makeLogGroup(cb) {
    cloudwatchlogs.createLogGroup({
            logGroupName: groupName
        },
        cb
    );
}

function makeLogStream(cb) {
    cloudwatchlogs.createLogStream({
            logGroupName: groupName,
            logStreamName: streamName
        },
        cb);
}

function createLogGroupIfNotExists(cb) {
    var tryCount = 0;
    var maxTries = 3;

    function tryOnce() {
        cloudwatchlogs.describeLogStreams({
                logGroupName: groupName,
                logStreamNamePrefix: streamName
            },
            function(err, data) {
                if (err) {
                    if ((err.code === 'ResourceNotFoundException') && (tryCount++ <= maxTries)) {
                        makeLogGroup(function(gerr, gdata) {
                            if (gerr) {
                                cb(gerr);
                            } else {
                                tryOnce();
                            }
                        });
                    } else {
                        cb(err);
                    }
                } else {
                    cb(null, data);
                }
            });
    }

    tryOnce();
}

function createLogStreamIfNotExists(cb) {
    var tryCount = 0;
    var maxTries = 3;

    function tryOnce() {
        createLogGroupIfNotExists(function(err, data) {
            if (err) {
                cb(err);
            } else {
                if (data.logStreams.length === 0) {
                    makeLogStream(function(lerr, ldata) {
                        // Retry on error until maxTries is exceeded; on
                        // success, loop back to fetch the stream metadata.
                        if ((lerr) && (tryCount++ >= maxTries)) {
                            cb(lerr);
                        } else {
                            tryOnce();
                        }
                    });
                } else {
                    cb(null, data.logStreams[0]);
                }
            }
        });
    }

    tryOnce();
}

function createLogEvent(rec) {
    return {
        message: typeof rec === 'string' ?
            rec : JSON.stringify(rec),
        timestamp: typeof rec === 'object' && rec.time ?
            new Date(rec.time).getTime() : Date.now()
    };
}

function cloudLog(event) {
    logEvents.push(createLogEvent(event));
}

function emit(cb) {
    function doCallBack(err, data) {
        if (typeof cb === 'function') {
            cb(err, data);
        }
    }

    if (logEvents.length) {
        createLogStreamIfNotExists(function(err, streamMetaData) {
            var nextToken = streamMetaData ? streamMetaData.uploadSequenceToken : null;
            if (err) {
                if (typeof cb === 'function') {
                    cb(err);
                }
            } else {
                var params = {
                    logEvents: logEvents,
                    logGroupName: groupName,
                    logStreamName: streamName
                };
                logEvents = [];
                if (nextToken) {
                    params.sequenceToken = nextToken;
                }
                cloudwatchlogs.putLogEvents(params, function(err, data) {
                    doCallBack(err, data);
                });
            }
        });
    } else {
        doCallBack();
    }
}

function getUserByEmail(email, cb) {
    dynamodb.query({
            TableName: userTableName,
            IndexName: 'email-index',
            ExpressionAttributeNames: {
                "#u": "user-id"
            },
            ExpressionAttributeValues: {
                ":v1": {
                    S: email
                }
            },
            KeyConditionExpression: 'email = :v1',
            ProjectionExpression: '#u,email,nick,phash'
        },
        cb);
}

function hashString(datum) {
    var hash = crypto.createHash('sha256');
    hash.update(datum);
    return hash.digest('base64');
}

function hashPassword(user_id,given_password) {
    return hashString(user_id + '|' + salt + '|' + given_password);
}

function pass(user_id, given_password, phash) {
    return hashPassword(user_id,given_password) === phash;
}

function testCredentials(email, password, cb) {
    getUserByEmail(email, function(err, data) {
        var user = null;
        if ((!err) && (data.Items.length > 0)) {
            user = {
                user_id: data.Items[0]['user-id'].S,
                nickname: data.Items[0].nick.S,
                email: data.Items[0].email.S,
                phash: data.Items[0].phash.S
            };
        }
        if (user && ((user.email !== email) || (!pass(user.user_id, password, user.phash)))) user = null;
        cb(err, user);
    });
}

    cloudLog({method:'login',control:'ENTER'});
    testCredentials(email, password, function(err, user) {
        if (err) {
            cloudLog(err);

        } else if (user) {
            cloudLog({
                method: 'login',
                pass: 'true',
                email: email,
                user_id: user.user_id
            });

        } else {
            cloudLog({
                method: 'login',
                pass: 'false',
                email: email
            });
        }
        cloudLog({method:'login',control:'EXIT'});
        emit(function(emit_err, datum) {
            // Return the user profile to Auth0 (null if authentication failed).
            callback(err, user);
        });
    });
}

Settings

In the settings, you will need to define the following configuration items:

  • region AWS region code for both the dynamodb table and cloudwatch logs.
  • accessKeyId IAM Access key for AWS operations. See the section on User Policies below.
  • secretAccessKey Goes with accessKeyId.
  • groupName The CloudWatch Log Group name. The group will be created if it does not exist.
  • userTableName The DynamoDb table name.
  • salt Just some random secret string to salt the passwords.
  • streamNamePrefix Prefix for the CloudWatch log stream name.
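For illustration, the `configuration` object that Auth0 injects into each script might be shaped like this. Every value below is a placeholder, not a real credential:

```javascript
// Illustrative shape only; Auth0 populates `configuration` from the
// connection settings page. All values here are placeholders.
var configuration = {
  region: 'ap-southeast-2',             // AWS region code
  accessKeyId: 'AKIAXXXXXXXXXXXXXXXX',  // IAM access key (placeholder)
  secretAccessKey: 'placeholder-secret',
  groupName: 'auth0-events',            // CloudWatch log group name
  userTableName: 'users',               // DynamoDb table name
  salt: 'some-random-secret-string',    // password salt
  streamNamePrefix: 'auth0'             // CloudWatch stream name prefix
};
```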

IAM User Policies

You will need to assign at least the following policies (after substitution of place-markers) to the user whose credentials were passed in the Auth0 custom database connection settings above.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:<#region>:<#account>:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:<#region>:<#account>:log-group:<#group>:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Scan",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:<#region>:<#account>:table/<#table>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:Query"
            ],
            "Resource": "arn:aws:dynamodb:<#region>:<#account>:table/<#table>/index/<#index>"
        }
    ]
}

… where the following place-markers are substituted for your particular values:

  • <#region> The AWS region code for dynamodb and cloudwatch, e.g. ap-southeast-2.
  • <#account> Your AWS account number/identifier.
  • <#group> The name of the CloudWatch log group.
  • <#table> The name of the DynamoDb user table.
  • <#index> The name of the global index used to look up the user table by email address. In the supplied code fragment, this name is ‘email-index’. Change it as you require.

Dynamodb schema

I have assumed that the user table has schema that follows this pattern of item:

{
  "email": "sean@seanbdurkin.id.au",
  "email_verified": true,
  "nick": "Sean",
  "phash": "<#redacted>",
  "user-id": "sean"
}

where the primary key is user-id. You will also need a global index to look up users by email. You should probably also add fields for username and user_metadata.

A note about logging

The CloudWatch log stream name will be ‘<#streamNamePrefix>/<#Date>’, where <#streamNamePrefix> is as given in the settings, and <#Date> is today’s date. There is an assumption that no other log streams begin with ‘<#streamNamePrefix>/<#Date>’ without being exactly ‘<#streamNamePrefix>/<#Date>’. So when we search for streams with that prefix, we get either zero streams, or exactly one stream, which is the stream we want. If this assumption does not hold in your architecture, adjust the code accordingly.
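As a concrete illustration of the naming scheme, here is a sketch of the derivation, with the prefix ‘auth0’ chosen purely as an example (note that JavaScript’s getMonth() is zero-based, hence the +1):

```javascript
// Derive the daily stream name: <streamNamePrefix>/<yyyy-mm-dd>.
function zeroPad(num, places) {
  return String(num).padStart(places, '0');
}

function dateAsString(d) {
  return zeroPad(d.getFullYear(), 4) +
    '-' + zeroPad(d.getMonth() + 1, 2) +  // getMonth() is zero-based
    '-' + zeroPad(d.getDate(), 2);
}

// Example: prefix 'auth0' with the date 5 January 2020.
var streamName = 'auth0' + '/' + dateAsString(new Date(2020, 0, 5));
console.log(streamName); // auth0/2020-01-05
```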

What about the other 5 scripts?

You can develop them yourself. Once you see how the login script is made, it’s just a case of cut and paste, with some obvious modification.

After-thoughts

The custom database connection is only available on the expensive Enterprise plan, or on a 30-day trial. If it were available on the free plan, I would use Auth0 for my amateur and start-up projects. Keeping your user table in a foreign database is not good, because you can’t join it with other tables.

Posted in Web hosting | Comments Off on Connecting Auth0 to DynamoDb and CloudWatch

Chess Ex

Abstract – Chess XML

WORK IN PROGRESS – DO NOT READ YET !!

To be developed.

Purpose

Chess Ex is a record format for recording the play of a chess game. The format is a subset of lexical XML, described by an XML schema.

Statement of copying permission

To be developed.

Example

To be developed.

Namespace

Chess Ex will use the following namespace …

xmlns:c="http://seanbdurkin.id.au/pascaliburnus2/archives/243"

Prose description

c:games
Games is the root element of a Chess Ex document and represents a collection of games. It is permissible for a Chess Ex document to be contained within an XML chimera document, so that c:games is not the root element of the containing XML document. c:games contains exactly one child element in this namespace, c:chess.

c:games/@version
Must be present with a value of 1. In future versions of the Chess Ex schema, this might have other values.

c:games/c:chess
c:chess is a container for chess games. Its child elements in this namespace consist of 1 or more c:event, followed by 1 or more related c:chess-game.

c:games/c:chess/c:event
c:event represents an event at which one or more chess games are played. Its child elements in this namespace are c:title, c:id, c:date, an optional c:site, and zero or one c:server.

c:games/c:chess/c:event/c:title
Records the name of the tournament or match event. Data type is xs:string. Required.

c:games/c:chess/c:event/c:id
Records an internal identifier for the event. Data type is xs:ID. Games can be made to relate to events by reference to this identifier. Required.

c:games/c:chess/c:event/c:date
Date and time of the starting moment of the event. This datum must include a time-zone. If all the games of the event were physically located within a sole time-zone jurisdiction, then that time-zone must be used. Otherwise UTC+00:00 must be used. Data type is xs:dateTime. Required.

c:games/c:chess/c:event/c:site
Location of the event. If there is no physical site, for such reasons as the event was by correspondence or over internet, then this element should be absent. Data type is xs:string. Optional.

c:games/c:chess/c:event/c:server |
c:games/c:chess/c:chess-game/c:server
These elements capture information about the computer program which facilitated the game. The c:server element can occur in two possible positions. When under c:event, c:server records default server information that applies to each of the games that relate to that event. When under c:chess-game, c:server overrides the default data attached to the event. c:server has 5 child elements (c:program-name, c:uri, c:vendor, c:host, c:version), each of which must be present exactly once, and in that order, except in the cases indicated by the description of the ref attribute, below.

c:games/c:chess/c:chess-game/c:server/@ref
This datum specifies how this c:server element applies to the parent game. The data type is enumeration: (nil | inherit | override). The default is override. This attribute is only allowed when the c:server element is parented by c:chess-game (as opposed to c:event parent). The meanings are as follows …

c:games/c:chess/c:chess-game/c:server/@ref=’nil’
There is no server information for this c:chess-game, irrespective of the linked c:games/c:chess/c:event/c:server element. In this case, the 5 child elements of c:server (c:program-name,c:uri,c:vendor,c:host,c:version), must be absent.

c:games/c:chess/c:chess-game/c:server/@ref=’inherit’
The server information for this c:chess-game, is inherited from the linked c:games/c:chess/c:event/c:server element. Such a linked element must exist. In this case, as in the case of @ref=’nil’, the 5 child elements must be absent.

c:games/c:chess/c:chess-game/c:server/@ref=’override’
The server information for this c:chess-game, is fully provided by the child nodes of this c:server. Any linked c:games/c:chess/c:event/c:server element, if it exists, is considered to be overridden.

c:games/c:chess/c:event/c:server/c:program-name |
c:games/c:chess/c:chess-game/c:server/c:program-name
The title of the computer program which is facilitating this game. Data type is xs:string. May be an empty string.

c:games/c:chess/c:event/c:server/c:uri |
c:games/c:chess/c:chess-game/c:server/c:uri
The URI for an informational resource relating to the computer program which is facilitating this game. Data type is xs:anyURI. May be empty.

c:games/c:chess/c:event/c:server/c:vendor |
c:games/c:chess/c:chess-game/c:server/c:vendor
The name of the vendor of the computer program which is facilitating this game. Data type is xs:string. May be empty.

c:games/c:chess/c:event/c:server/c:host |
c:games/c:chess/c:chess-game/c:server/c:host
A description of the host server which hosts the computer program which is facilitating this game. Data type is xs:string. May be empty.

c:games/c:chess/c:event/c:server/c:version |
c:games/c:chess/c:chess-game/c:server/c:version
A description of the version of the computer program which is facilitating this game. Data type is xs:string. May be an empty string.

c:games/c:chess/c:chess-game

c:games/c:chess/c:chess-game/@event-id

c:games/c:chess/c:chess-game/c:players

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/c:name

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/id

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/rating

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/@class

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/@class=’human’

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/@class=’artifice’

c:games/c:chess/c:chess-game/c:players/(c:white|c:black)/@class=’other’

c:games/c:chess/c:chess-game/c:controls

c:games/c:chess/c:chess-game/c:plies

c:games/c:chess/c:chess-game/c:plies/c:ply

c:games/c:chess/c:chess-game/c:plies/c:ply/@pro

(), WQ, WB, WN, WR, BQ, BB, BN, BR

c:games/c:chess/c:chess-game/c:result

Event identifier
c:games/c:event/@id
xs:token
An internal identifier for the tournament or match event. Individual game records will link to this within the same document.

Event title
c:games/c:event/c:title
xs:string
Name of the tournament or match event.

Event date
c:games/c:event/c:date
xs:dateTime
Date and time of the starting moment of the event. This datum must include a time-zone. If all the games of the event were physically located within a sole time-zone jurisdiction, then that time-zone must be used. Otherwise UTC+00:00 must be used.

Event sponsor
c:games/c:event/c:sponsor
xs:string
Name of the sponsor of the event.

Game event reference
c:games/c:game/@event
xs:token
Refers to the event of which this game was a part. With c:game as the focus node, the referenced event has XPath: let $id := @event return ../c:event[@id = $id]

game/section
game/stage
game/board
game/dateTime
game/site
event/site (default)
game/timeControls
/phase
@moves : cardinal or ‘INF’, 1+
@time : cardinal or ‘INF’, seconds
@mode : ‘add’ or ‘set’
players/team/position()=1
players/team/position()=2
players/team/player
/human
/surname & /other-name
/name
/rating /@system
/artifice
/server-prog-name
/platform Win Android iOS OSX Linux
/version
/rating /@system
/client-prog-name
move
/source /destin
/en-passant
/promote
check
annotation
annotation message/xml:lang
result

Schema

Acknowledgements

http://chess.stackexchange.com/questions/12403
http://www.xml.com/pub/a/2004/08/25/tourist.html

PGNXML
http://www.cybercom.net/~zbrad/Chess/pgnxml/
http://www.saremba.de/chessgml/

ChessGML http://www.saremba.de/chessgml/standards/pgn/pgn-complete.htm
chess/tournament/eventinfo/event
chess/tournament/eventinfo/site
chess/tournament/players/player/@id
chess/tournament/players/player/@table-ref
chess/tournament/players/player/person/@cbuf-id
chess/tournament/players/player/person/surname
chess/tournament/players/player/person/firstname
chess/?/moves/sanMoves

[Event “F/S Return Match”]
[Site “Belgrade, Serbia JUG”]
[Date “1992.11.04”]
[Round “29”]
[White “Fischer, Robert J.”]
[Black “Spassky, Boris V.”]
[Result “1/2-1/2”]

1. Event (the name of the tournament or match event)

Site (the location of the event)

Date (the starting date of the game)

Round (the playing round ordinal of the game)

White (the player of the white pieces)

Black (the player of the black pieces)

Result (the result of the game)

Posted in XML | Comments Off on Chess Ex

Announcing DUnit-M – Data driven unit testing for Delphi

Hurray! Today I published DUnit-M, a data-driven unit testing framework for Delphi XE7 and beyond.

Features include:

* Data-driven testing support
* Multiple loggers are configurable through an observer-subscriber pattern
* An application generation wizard to assist in the creation of unit testing programs.

I believe that DUnit-M is the first Delphi FOSS library which offers data-driven unit testing. If I am wrong, please tell me so in the comments below.

Test case data can be hosted in a text file or Excel file (or anything else that you can specify how to iterate through), and each line/row is its own test case.

DUnit-M is released as a free and open source software library under an Apache 2.0 license.

You can find the source code at

https://bitbucket.org/sean_b_durkin/dunit-m

Posted in Delphi | Comments Off on Announcing DUnit-M – Data driven unit testing for Delphi

How to wait on multiple events (Delphi)

Abstract

How can one write code to block (wait) on multiple synchro events at once, in a multi-threaded application? We are, of course, talking about Delphi. By “multiple synchro events at once”, I mean that the thread unblocks on some logical combination of the statuses of the member events. Here I use the word “event” in a very general sense, not to be confused with the more specific sense of the word, as in Embarcadero’s TEvent. In the general sense, by “event” or “synchro event”, I mean the broad class of synchro objects such as semaphores (TSemaphore) and events (TEvent).

The typical and most common logical combination is the “OR” operation. In other words, how can we block on multiple synchro events and be unblocked when the first event is signaled or a time-out occurs? And when we are finally unblocked, how can we know which event was the signaled one?

The Task

Write some re-usable library code, in Delphi XE7+, to block on multiple synchro events at once, with time-out. The code must be:

  • cross-platform,
  • bullet-proof, and
  • equipped with a really simple API.

The wait function should allow for optional time-out, detect which entity caused the signal, and should leverage operating system capabilities for optimal efficiency.

In this post, I will focus a particular subset of this task, where:

  • all the synchro events are semaphores (eg. TSemaphore, but not TEvent)
  • access to the semaphores is restricted to this library. In other words, they are unnamed.
  • The wait condition is simply to wait on the chronologically first semaphore to be signaled, or a time-out to occur, whichever occurs first.

However, I will also address, in less detail, the more general task.

But Why?

One of the big bug-bears in multi-threaded programming is that if a thread is blocked on some semaphore for normal operational purposes, then it can’t check whether it is time to gracefully shut down until it is unblocked, and unblocking is driven by the semaphore being signaled. But if it is time to gracefully shut down, then it is quite likely that the semaphore on which it is blocked will never again be signaled, precisely because it is time to shut down. For this reason, a lot of multi-threaded applications (whether written in Delphi or not) have problems shutting down gracefully. I call this the “block-and-check-terminate” problem. Some readers may be quick to respond that this is a non-issue: with “proper” design, your programs will never have a “block-and-check-terminate” problem. While this may well be true, in my experience I have found that in real applications, designing to avoid “block-and-check-terminate” problems without a generic solution is complex and tedious.

What is needed, is a generic cross-platform solution.

What have others done

Jedi

The Jedi Component Library, in its multi-thread component code, provided wrappers for threads and semaphores. It associated a special semaphore with each thread. The thread’s semaphore would start life unsignaled. A call to terminate the thread would cause the special semaphore to be signaled. Whenever the thread would wait on some regular semaphore, the wrapper was defined in such a way that it would call the Windows WaitForMultipleObjects() function on two semaphores: the explicit semaphore, and the thread’s special semaphore. If the thread was terminated while it was blocked on some operational semaphore, the thread would unblock, detect the condition and handle it properly.

It was an elegant solution to the most common subset of our task. The problem, though, is that it only works on Windows. Android, iOS and OSX have no o/s API equivalent to WaitForMultipleObjects(). I don’t know why. Maybe mobile applications just have less need for within-app parallelism? If you search StackOverflow, there are more than a few questions asking how to wait on multiple semaphores on Android or iOS, and no real solutions are given. This is just my opinion. If you disagree, please post a comment below.

Just Kill Threads

I’ve seen this done. When the program needs to shut down, it just kills its non-main threads with an explicit kill command. Any self-respecting developer will be very uncomfortable with this solution, and its degree of safety is very situational.

Design Around

Design around the issue, so that no thread ever has to wait on multiple conditions. Good luck with that.

Design Above

Use a threading library, like OTL, to provide higher-level structures sufficient to support parallel tasking, so that there is no design need for lower-level structures such as semaphores and events. You could do this, but it is a bit limiting. I think that even with OTL, you will still run into requirements where you need to wait on multiple semaphores.

Count-down Latch

This SO post suggests using a TCountdownEvent object (SyncObjs unit). When either of the component semaphores is signaled, also signal the count-down latch. To wait on the first of either, wait on the latch. The limitations of this solution are:

  • The code that signals the component semaphores has to know about the latch rules. That is not a general solution.
  • What happens when something wants to wait directly on a component semaphore? The relationship between the component semaphores and the latch is destroyed. So you either have to write code to deal with this (complex), or the latch solution is only usable in situations where the only consumer of either semaphore is the consumer of both.
  • It may be inefficient (CPU-wise) on Windows.

Offered Solution

The solution that I offer has a story that goes like this …

Take the synchro classes that you want to use (TSemaphore, TEvent etc), and wrap them so that you take control of their WaitFor() and Signal() methods.

Imagine a new object, called a “Condition”. A condition is a synchro object which is a composite view of an arbitrary set of member synchro objects. This is the Composite Pattern applied to synchro objects. The condition is considered signaled according to some rule of your design, based on the signal status of the members. For example, it could be deemed signaled when one or more members are signaled, but clear when all members are clear. Apply a rule that you can’t directly signal the condition, because that would be meaningless.

The condition is implemented by a private semaphore. When a member semaphore is signaled, the aforementioned condition semaphore is signaled (or not, if you have a different composition rule). When a member semaphore is successfully waited on, from code outside of the condition, the condition status needs to be updated atomically. When the condition is waited for, and unblocked, resource counts need to be decremented from the contributing signaler.

Construction, destruction and operation of the condition needs to happen transparently, from direct operations on the member semaphores. We achieve this by each condition keeping a record of its member semaphores and their contribution to the resource count; and also, each member semaphore keeping a record of the list of conditions that it is entangled in. Construction and destruction of conditions needs to safely and correctly update entanglement lists.

A single critical section (Gate), is shared by all synchro objects and conditions that might be entangled together. The gate must be passed in as a construction parameter.

When the operating system call WaitForMultipleObjects() is available, it is used instead of the private condition semaphore. This call will be available if and only if:

  1. the operating system is win32/64; and
  2. all member objects are descendants of THandleObject (and thus have a windows “handle”); and
  3. the client specifies so via a construction parameter.

The Full Source Code

Listing 1 below shows a solution for the aforementioned subset of the task. To implement conditions other than “unblock on first chronological member”, override the TSyncroCondition methods marked as virtual.

unit SBD.TL.SyncObjs2;
interface
uses System.SyncObjs, Generics.Collections, SysUtils;

const
  Forever = System.INFINITE;
  TimeOut = cardinal( $FFFFFFFF);
  WaitForError = 0;

type

  TSynchoConditionList = class;

  ISynchroCondition = interface;
  TSBDSemaphore = class
    private
      FEntangledConditions: TSynchoConditionList;
      FGate: TCriticalSection;
      FCount: cardinal;
      FMax: cardinal;
      FBase: TSemaphore;

      function  BaseWaitFor( TimeLimit: cardinal): TWaitResult;
      procedure BaseSignal;
      procedure WaitedVia_External;

    public
      constructor Create( Gate: TCriticalSection; InitialCount, MaxCount: cardinal);
      destructor Destroy; override;
      function  WaitFor( TimeLimit: cardinal): TWaitResult;
      function  Signal: boolean;
      function  Count: cardinal;
      function  AsConditions( AConstrainToHandleSyncros: boolean): ISynchroCondition;
    end;

  ISynchroCondition = interface
    ['{CFD0DD74-6EB4-4CCF-9EE1-8BBAC759151A}']
      function WaitFor( TimeLimit: cardinal; var Contributor: TSBDSemaphore): TWaitResult;
      function Join( const Addend: ISynchroCondition): ISynchroCondition;
    end;

  IInterfaceHelper = interface
    ['{40D9B899-AF13-4FC1-AF04-23E8416E1FEE}']
      function AsObject: TObject;
  end;

  TSyncroCondition = class( TInterfacedObject, ISynchroCondition, IInterfaceHelper)
    protected
      // Override these methods to implement conditions other than "first chronological".
      function  ConditionIsSignaled: boolean;                                    virtual;
      function  ConditionWillBeSignaledAfterSignal( Sem: TSBDSemaphore):boolean; virtual;
      function  ConditionWillBeSignaledAfterWait  ( Sem: TSBDSemaphore):boolean; virtual;
      function  WaitCanUseOS_API: boolean;                                       virtual;
      function  OS_API_WaitFor( TimeLimit: cardinal; var Contributor: TSBDSemaphore): TWaitResult; virtual;

    private
      FisSignalled: boolean;
      FWillBeSignaled: boolean;
      FSignals: TDictionary<TSBDSemaphore,cardinal>;
      FGate: TCriticalSection;
      FisBroken: boolean;
      FBase: TSemaphore;
      FInited: boolean;
      FConstrainedToAllHandleSyncros: boolean;

      constructor CreateWithOne( Origin: TSBDSemaphore; AConstrainToHandleSyncros: boolean);
      destructor Destroy; override;

      procedure Presignal ( Sem: TSBDSemaphore);
      procedure Postsignal( Sem: TSBDSemaphore);
      procedure PostWait  ( Sem: TSBDSemaphore);
      procedure BaseSignal;
      procedure BaseWaitForever;
      function  BaseWaitFor( TimeLimit: cardinal): TWaitResult;
      function  WaitFor( TimeLimit: cardinal; var Contributor: TSBDSemaphore): TWaitResult;
      function  FindASignallingContributor( var Sem: TSBDSemaphore): boolean;
      procedure NotifyMemberDestroyed( Member: TSBDSemaphore);
      function  AsObject: TObject;
      function  Join( const Addend: ISynchroCondition): ISynchroCondition;
      procedure CheckInit;
    end;

  TSynchoConditionList = class( TList<TSyncroCondition>)
    public
    end;

implementation




function TSBDSemaphore.Count: cardinal;
begin
result := FCount
end;

constructor TSBDSemaphore.Create( Gate: TCriticalSection; InitialCount, MaxCount: cardinal);
begin
Assert( MaxCount >= 1);
Assert( InitialCount <= MaxCount);
FGate := Gate;
FBase := TSemaphore.Create( nil, InitialCount, MaxCount, '', False);
FEntangledConditions := TSynchoConditionList.Create;
FCount := InitialCount;
FMax   := MaxCount
end;

destructor TSBDSemaphore.Destroy;
var
  Cnd: TSyncroCondition;
begin
FGate.Enter;
try
  for Cnd in FEntangledConditions do
    Cnd.NotifyMemberDestroyed( self);
  FBase.Free;
  FEntangledConditions.Free
finally
  FGate.Leave
  end;
inherited
end;

function TSBDSemaphore.Signal: boolean;
var
  Cnd: TSyncroCondition;
  Saturated: boolean;
begin
FGate.Enter;
try
  Saturated := FCount >= FMax;
  result    := not Saturated;
  if not Saturated then
    begin
    Inc( FCount);
    for Cnd in FEntangledConditions do
      Cnd.Presignal( self)
    end;
  BaseSignal;
  if not Saturated then
    for Cnd in FEntangledConditions do
      Cnd.Postsignal( self);
finally
  FGate.Leave
  end;
end;

procedure TSyncroCondition.Presignal( Sem: TSBDSemaphore);
begin
Assert( FGate = Sem.FGate);
CheckInit;
FWillBeSignaled := ConditionWillBeSignaledAfterSignal( Sem);
FSignals[ Sem] := FSignals[ Sem] + 1;
if (not FisSignalled) and FWillBeSignaled then
  BaseSignal;
end;

procedure TSyncroCondition.BaseWaitForever;
begin
BaseWaitFor( Forever)
end;

constructor TSyncroCondition.CreateWithOne( Origin: TSBDSemaphore; AConstrainToHandleSyncros: boolean);
begin
FisBroken := False;
FGate     := Origin.FGate;
FConstrainedToAllHandleSyncros := AConstrainToHandleSyncros;
{$IFDEF MSWINDOWS}
if FConstrainedToAllHandleSyncros then
  Assert( Origin.FBase is THandleObject);
{$ENDIF MSWINDOWS}
FInited   := False;
FSignals  := TDictionary<TSBDSemaphore,cardinal>.Create;
FSignals.Add( Origin, Origin.FCount);
Origin.FEntangledConditions.Add( self);
FisSignalled    := False;
FWillBeSignaled := False;
FBase := nil
end;

procedure TSyncroCondition.CheckInit;
var
  InitialCount: cardinal;
begin
if FInited then exit;
FInited := True;
FisSignalled    := ConditionIsSignaled;
FWillBeSignaled := FisSignalled;
if FisSignalled then
    InitialCount := 0
  else
    InitialCount := 1;
if not WaitCanUseOS_API then
  FBase := TSemaphore.Create( nil, InitialCount, 1, '', False)
end;


destructor TSyncroCondition.Destroy;
var
  Member: TSBDSemaphore;
begin
FGate.Enter;
try
  FisBroken := True;
  for Member in FSignals.Keys do
    Member.FEntangledConditions.Remove( self);
  FSignals.Free;
  FBase.Free
finally
  FGate.Leave
  end;
inherited
end;

procedure TSyncroCondition.NotifyMemberDestroyed( Member: TSBDSemaphore);
begin
if FGate <> Member.FGate then
  FisBroken := True;
if FisBroken then exit;
FGate.Enter;
try
  if FSignals.ContainsKey( Member) then
    begin
    FisBroken := True;
    FSignals.Remove( Member)
    end
finally
  FGate.Leave
  end
end;

function TSyncroCondition.OS_API_WaitFor(
  TimeLimit: cardinal; var Contributor: TSBDSemaphore): TWaitResult;
{$IFDEF MSWINDOWS}
var
  HandleObjs: THandleObjectArray;
  SignaledObj: THandleObject;
  Member: TSBDSemaphore;
  i: integer;
{$ENDIF MSWINDOWS}
begin
{$IFDEF MSWINDOWS}
  SetLength( HandleObjs, FSignals.Count);
  i := -1;
  for Member in FSignals.Keys do
    begin
    Inc( i);
    HandleObjs[ i] := Member.FBase as THandleObject
    end;
  result := THandleObject.WaitForMultiple( HandleObjs, TimeLimit, False, SignaledObj, False, 0);
  if result = wrSignaled then
    begin
    i := -1;
    for Member in FSignals.Keys do
      begin
      Inc( i);
      if SignaledObj <> Member.FBase then continue;
      Member.WaitedVia_External;
      break
      end
    end
{$ELSE}
  result := wrError
{$ENDIF MSWINDOWS}
end;

function TSyncroCondition.WaitCanUseOS_API: boolean;
begin
{$IFDEF MSWINDOWS}
result := FConstrainedToAllHandleSyncros;
{$ELSE}
result := False
{$ENDIF MSWINDOWS}
end;

function TSyncroCondition.WaitFor(
  TimeLimit: cardinal; var Contributor: TSBDSemaphore): TWaitResult;
var
  TimeLeft: cardinal;
  AfterWaitClock: TDateTime;
  Elapsed: cardinal;
  Confirmed: boolean;
  doRetry: boolean;
  Count: cardinal;
begin
if FisBroken then
  begin
  result := wrAbandoned;
  exit
  end;
CheckInit;
if WaitCanUseOS_API then
    result := OS_API_WaitFor( TimeLimit, Contributor)
  else
    begin
    TimeLeft := TimeLimit;
    repeat
      result := BaseWaitFor( TimeLeft);
      if result <> wrSignaled then break;
      if TimeLeft > 0 then
        AfterWaitClock := Now;
      FGate.Enter;
      try
        Confirmed := FindASignallingContributor( Contributor);
        if Confirmed then
          begin
          result    := Contributor.BaseWaitFor( 0);
          Confirmed := result = wrSignaled
          end;
        doRetry := result = wrTimeOut;
        if doRetry and (TimeLeft > 0) then
          begin
          Elapsed := Trunc( (Now - AfterWaitClock) * MSecsPerDay);
          if TimeLeft > Elapsed then
              Dec( TimeLeft, Elapsed)
            else
              TimeLeft := 0
          end;
        if Confirmed then
          begin
          Count := FSignals[ Contributor];
          if Count > 0 then
            FSignals[ Contributor] := Count - 1
          end;
        if ConditionIsSignaled then
          BaseSignal;
      finally
        FGate.Leave
        end;
      if doRetry then
        Sleep(1)
    until not doRetry
    end
end;

procedure TSyncroCondition.Postsignal( Sem: TSBDSemaphore);
begin
CheckInit;
if FisSignalled and (not FWillBeSignaled) then
  BaseWaitForever
end;

procedure TSBDSemaphore.WaitedVia_External;
var
  Cnd: TSyncroCondition;
  Saturated: boolean;
begin
FGate.Enter;
try
  Saturated := FCount = 0;
  if not Saturated then
    begin
    Dec( FCount);
    for Cnd in FEntangledConditions do
      Cnd.PostWait( self)
    end
finally
  FGate.Leave
  end;
end;

function TSBDSemaphore.WaitFor( TimeLimit: cardinal): TWaitResult;
begin
result := BaseWaitFor( TimeLimit);
if result = wrSignaled then
  WaitedVia_External
end;


procedure TSyncroCondition.PostWait( Sem: TSBDSemaphore);
var
  WR: TWaitResult;
  Saturated: boolean;
  Count: cardinal;
begin
Assert( FGate = Sem.FGate);
Assert( not FisBroken);
CheckInit;
FWillBeSignaled := ConditionWillBeSignaledAfterWait( Sem);
Count           := FSignals[ Sem];
if Count > 0 then
  FSignals[ Sem] := Count - 1;
if FisSignalled <> FWillBeSignaled then
  begin
  if FWillBeSignaled then
      BaseSignal
    else
      BaseWaitForever
  end
end;


function TSBDSemaphore.AsConditions( AConstrainToHandleSyncros: boolean): ISynchroCondition;
begin
result := TSyncroCondition.CreateWithOne( self, AConstrainToHandleSyncros);
end;

procedure TSBDSemaphore.BaseSignal;
begin
FBase.Release
end;

function TSBDSemaphore.BaseWaitFor( TimeLimit: cardinal): TWaitResult;
begin
result := FBase.WaitFor( TimeLimit)
end;


function TSyncroCondition.FindASignallingContributor(
  var Sem: TSBDSemaphore): boolean;
var
  Pair: TPair<TSBDSemaphore,cardinal>;
begin
result := not FisBroken;
if not result then exit;
result := False;
for Pair in FSignals do
  begin
  result := Pair.Value > 0;
  if not result then continue;
  Sem := Pair.Key;
  break
  end
end;

function TSyncroCondition.AsObject: TObject;
begin
result := self
end;

function TSyncroCondition.Join(
  const Addend: ISynchroCondition): ISynchroCondition;
var
  Composite, Friend: TSyncroCondition;
  Pair: TPair<TSBDSemaphore,cardinal>;
  InitialCount: cardinal;
begin
FGate.Enter;
try
  Composite := TSyncroCondition.Create;
  result    := Composite;
  Friend    := (Addend as IInterfaceHelper).AsObject as TSyncroCondition;
  Assert( FGate = Friend.FGate);
  Composite.FisBroken := FisBroken or Friend.FisBroken;
  Composite.FGate     := FGate;
  Composite.FSignals  := TDictionary<TSBDSemaphore,cardinal>.Create;
  for Pair in FSignals do
    Composite.FSignals.Add( Pair.Key, Pair.Value);
  for Pair in Friend.FSignals do
    begin
    if Composite.FSignals.ContainsKey( Pair.Key) then
        Composite.FSignals[ Pair.Key] := Composite.FSignals[ Pair.Key] + Friend.FSignals[ Pair.Key]
      else
        Composite.FSignals.Add( Pair.Key, Pair.Value)
    end;
  for Pair in  Composite.FSignals do
    if Pair.Key.FEntangledConditions.IndexOf( Composite) = -1 then
      Pair.Key.FEntangledConditions.Add( Composite);
  Composite.FisSignalled    := ConditionIsSignaled;
  Composite.FWillBeSignaled := FisSignalled;
  Composite.FInited         := False;
  Composite.FBase           := nil;
  {$IFDEF MSWINDOWS}
  Assert( FConstrainedToAllHandleSyncros = Friend.FConstrainedToAllHandleSyncros);
  {$ENDIF MSWINDOWS}
  Composite.FConstrainedToAllHandleSyncros := FConstrainedToAllHandleSyncros;
finally
  FGate.Leave
  end;
end;

procedure TSyncroCondition.BaseSignal;
begin
if assigned( FBase) then
  FBase.Release;
FisSignalled := True
end;

function TSyncroCondition.BaseWaitFor( TimeLimit: cardinal): TWaitResult;
begin
if assigned( FBase) then
    result := FBase.WaitFor( TimeLimit)
  else
    result := wrError;
FisSignalled := False
end;


function TSyncroCondition.ConditionIsSignaled: boolean;
var
  Pair: TPair<TSBDSemaphore,cardinal>;
begin
result := False;
for Pair in FSignals do
  begin
  result := Pair.Value > 0;
  if not result then continue;
  break
  end
end;

function TSyncroCondition.ConditionWillBeSignaledAfterSignal(
  Sem: TSBDSemaphore): boolean;
begin
result := True
end;

function TSyncroCondition.ConditionWillBeSignaledAfterWait(
  Sem: TSBDSemaphore): boolean;
var
  Pair: TPair<TSBDSemaphore,cardinal>;
  Count, Sum: cardinal;
begin
result := False;
Sum    := 0;
for Pair in FSignals do
  begin
  Count := Pair.Value;
  if (Pair.Key = Sem) and (Count > 0) then
    Dec( Count);
  Inc( Sum, Count);
  result := Sum > 0;
  if not result then continue;
  break
  end
end;


end.
Posted in Delphi | Comments Off on How to wait on multiple events (Delphi)

Upgrading SmartInspect to Delphi XE7 support

This is how I upgraded SmartInspect to XE7.


Important Update!

Gurock has released SmartInspect v3.3.8, which supports XE7. So now this post is only for historical interest.


My version of SI is “SmartInspect Professional” v3.3.7.150 by Gurock software. My version of Delphi is “Embarcadero® Delphi XE7” Version 21.0.17707.5020 + Update 1. The o/s is Windows 7 Enterprise + SP1 (64-bit). This procedure was tested early March 2015.

(1) Install SmartInspect using the regular automated installer.
(2) Copy the source code for XE6 to a new and convenient directory.
The relevant files are:

  1. $(appdir)\source\delphi\SiAuto.pas
  2. $(appdir)\source\delphi\SiEncryption.pas
  3. $(appdir)\source\delphi\SiRijndael.inc
  4. $(appdir)\source\delphi\SiWinSock2.pas
  5. $(appdir)\source\delphi\SmartInspect.inc
  6. $(appdir)\source\delphi\SmartInspect.pas
  7. $(appdir)\source\delphi\SmartInspectDXE6.dpk
  8. $(appdir)\source\delphi\SmartInspectDXE6.bdsproj

… where:

  • $(appdir) is a place-marker for your SI install directory (typically something like ‘C:\Program Files (x86)\Gurock Software\SmartInspect Professional’)
  • $(new-appdir) is a place-marker for where you are going to put the modified source

For now, don’t worry about the project group, the 64-bit project, nor the project heads for the other compilers.

In your new directory, structure the folders how you like. My preference is indicated below. Segregate the project head files (.dpk and .bdsproj) from the rest in accordance with your plan. For example, with my structure, everything goes in run, except for the package heads, which go in packages\XE7.

Directory                                      What lives there?
$(new-appdir)\run                              Source, except head files
$(new-appdir)\packages\XE7                     XE7 head files
$(new-appdir)\ephemeral\dcu\XE7\Win32\Debug    dcu files for Win32 platform, Debug config
$(new-appdir)\ephemeral\dcu\XE7\Win32\Release  dcu files for Win32 platform, Release config
$(new-appdir)\ephemeral\dcu\XE7\Win64\Debug    dcu files for Win64 platform, Debug config
$(new-appdir)\ephemeral\dcu\XE7\Win64\Release  dcu files for Win64 platform, Release config

(3) Open up the package, at the new location, in the XE7 IDE. If you have changed the relative location of project members to their heads, you may have to remove and re-add the members. Do not do so by directly editing the dpk, as doing so will not correctly update the dproj.

The original dpk should look like this …

package SmartInspectDXE6;

{$ALIGN 8}
{$ASSERTIONS ON}
{$BOOLEVAL OFF}
{$DEBUGINFO ON}
{$EXTENDEDSYNTAX ON}
{$IMPORTEDDATA ON}
{$IOCHECKS ON}
{$LOCALSYMBOLS ON}
{$LONGSTRINGS ON}
{$OPENSTRINGS ON}
{$OPTIMIZATION ON}
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}
{$REFERENCEINFO ON}
{$SAFEDIVIDE OFF}
{$STACKFRAMES OFF}
{$TYPEDADDRESS OFF}
{$VARSTRINGCHECKS ON}
{$WRITEABLECONST OFF}
{$MINENUMSIZE 1}
{$IMAGEBASE $400000}
{$DESCRIPTION 'SmartInspect Delphi XE6 Runtime Package'}
{$RUNONLY}
{$IMPLICITBUILD OFF}

requires
  rtl,
  vcl
  {$IFNDEF SI_DISABLE_GRAPHIC}, vclimg {$ENDIF}
  {$IFNDEF SI_DISABLE_DB}, dbrtl {$ENDIF}
  ;

contains
  SiAuto in 'SiAuto.pas',
  SiWinSock2 in 'SiWinSock2.pas',
  SiEncryption in 'SiEncryption.pas',
  SmartInspect in 'SmartInspect.pas';

{$IFDEF INCLUDE_VERSION}
  {$R versionXE6.res}
{$ENDIF}

end.

Experienced Delphi developers will notice a problem here straight away: manual conditionals in the dpk source. The SmartInspect author should have known better. If you make any changes to the project, even just project options, the Delphi IDE will silently trash the source, and the developer will be left scratching his head, wondering why it won’t compile. We don’t need the conditionals. The one-in-a-million developer who wants the vclimg unit excluded, or the version resource included, can work this out for himself.

(4) If touching the project leads you to an unbalanced {$ENDIF}, balance as required, but don’t compile yet.

(5) Save the project with a new project name, just ‘SmartInspect’. Delete the old-named project head files. We should be left with just SmartInspect.dpk and SmartInspect.dproj.

(6) Set up project options. Set these options up for “All configurations – all platforms”. Make sure afterwards that there are no configs in which these values are overridden. Checking this can be a bit of a pain, and is an area of the Embarcadero IDE that could and should be improved.

Option                       Instruction
dcp output directory         Set as required. Usually this is blank, and the resolved value is controlled by IDE options.
package output directory     Set as required. Usually this is blank, and the resolved value is controlled by IDE options.
unit output directory        Set to ..\..\ephemeral\dcu\XE7\$(Platform)\$(Config)
Description                  SmartInspect Runtime Package
LibSuffix                    _XE7
Include Version Information  CHECKED
Module Version Number        3.3.7.0

(7) Update global IDE options for the library path. Do this for all platforms. In particular:

  1. Remove any references that might exist to the old SmartInspect source, dcu or dcp locations.
  2. Remove any references that might exist to the new SmartInspect source directory. We don’t want SI units to be recompiled every time we compile our main application.
  3. Include the path to the SmartInspect dcu files at the new location. Use the $(Platform) and $(Config) symbols, instead of literal values, wherever possible.
  4. You may also want to adjust the browsing path, if you intend to debug-step into the SI code. This step is optional and will not suit all developers.

(8) If you are the obsessive-compulsive type, you could edit the SmartInspect.inc file to explicitly mention the XE7 compiler, but this step is optional, as the inc file is already reasonably future-proof. I list it here for your convenience. How to extend it is self-evident.

//
// <!-- Copyright (C) Gurock Software GmbH. All rights reserved. -->
//
// <summary>
//   Contains defines needed to be compatible with multiple versions of
//   Borland/CodeGear Delphi.
// </summary>

{$IFDEF CONDITIONALEXPRESSIONS}
  {$IF CompilerVersion >= 15}
    {$DEFINE DELPHI7_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 17}
    {$DEFINE DELPHI2005_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 18}
    {$DEFINE DELPHI2006_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 20}
    {$DEFINE DELPHI2009_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 21}
    {$DEFINE DELPHI2010_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 22} // XE
    {$DEFINE DELPHI2011_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 23} // XE2
    {$DEFINE DELPHI2012_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 24} // XE3
    {$DEFINE DELPHI2013_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 25} // XE4
    {$DEFINE DELPHI2014_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 26} // XE5
    {$DEFINE DELPHIXE5_OR_HIGHER}
  {$IFEND}
  {$IF CompilerVersion >= 27} // XE6
    {$DEFINE DELPHIXE6_OR_HIGHER}
  {$IFEND}
{$ENDIF}

{.$DEFINE SI_DISABLE_DB}         { Disables database methods }
{.$DEFINE SI_DISABLE_RTTI}       { Disables RTTI usage (LogObject) }
{.$DEFINE SI_DISABLE_GRAPHIC}    { Disables graphic methods }
{.$DEFINE SI_DISABLE_ENCRYPT}    { Disables log file encryption  }

{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}

(9) In the unit SmartInspect.pas, add a line in the procedure InternalConnect(), just after making the o/s call to connect to the socket. This line is a call to a local procedure which writes some text to a temporary file. This is shown in listing 3. I have found by trial that without this fix, on XE7, about 50% of the time the o/s would terminate the application calling InternalConnect(). No exception is raised; the application is simply terminated. I cannot explain why this fix works; I only know that it does. InternalConnect() will be called every time a SmartInspect object is switched from Enabled=False to Enabled=True.

procedure FixSocketProblem;
var
  FN: string;
  TempStream: TStream;
  Buff: TBytes;
  sOutText: string;
begin
  FN := TPath.GetTempFileName;
  TempStream := TFileStream.Create( FN, fmCreate);
  try
    sOutText := 'Writing some text to a file makes the socket problem go away.';
    Buff := TEncoding.UTF8.GetBytes( sOutText);
    TempStream.WriteData( Buff, Length( Buff))
  finally
    TempStream.Free
  end;
  TFile.Delete( FN)
end;


procedure TSiTcpClient.InternalConnect(const ASocket: TSocket;
  const ATo: PSockAddrIn; const ATimeout: Integer);
var
  LConnectResult: Integer;
  WSAErr: integer;
begin
  ChangeBlockingMode(ASocket, False);
  try
    // Try a non-blocking connect to the destination.
    LConnectResult :=
      SiWinSock2.connect(ASocket, PSockAddr(ATo), SizeOf(ATo^));

    FixSocketProblem;

    if LConnectResult = SOCKET_ERROR then
    begin
      // It is normal that the winsock connect function returns
      // SOCKET_ERROR if a socket is in non-blocking mode. This occurs
      // if the socket could not be connected immediately. But if the
      // last error isn't WSAEWOULDBLOCK we've got a real error.
      WSAErr := WSAGetLastError;
      if WSAErr <> WSAEWOULDBLOCK then
      begin
        // And indeed, we've got a real error, so
        // raise the last socket error accordingly.
        RaiseLastSocketError;
      end;
      // We are not connected yet, so we need to wait
      // until the socket connection has been established
      // or the timeout has been reached.
      WaitForConnect(ASocket, ATimeout);
    end;
  finally
    // Restore the normal blocking mode.
    ChangeBlockingMode(ASocket, True);
  end;
end;

(10) Modify TSiTcpClient.ReceiveLn() to put a reasonable cap on the length of the text line sent by the peer TCP client (usually the SI Console). The adjustment is shown in listing 4.

function TSiTcpClient.ReceiveLn: AnsiString;
var
  C: AnsiChar;
begin
  SetLength(Result, 0);
  repeat
    if Receive(C, 1) <> 1 then
    begin
      // The socket connection has been closed.
      raise ESmartInspectError.Create(SiSocketClosedError);
    end else
    begin
      if not (C in [#10, #13]) then
      begin
        // Append the new character, unless it
        // represents a newline (char #10 or #13).
        Result := Result + C;
      end;
    end;
  until (C = #10) or (Length( result) >= 1000);
end;

(11) Compile with Platform=Win32 and Config=Debug.
(12) Test. Make a Win32 desktop application that does some SI stuff.
(13) Compile for Release.
(14) Add Win64 platform within the same project. No need to create a separate project. Compile for both configs.

The new dpk should look like this …

package SmartInspect;

{$R *.res}
{$IFDEF IMPLICITBUILDING This IFDEF should not be used by users}
{$ALIGN 8}
{$ASSERTIONS ON}
{$BOOLEVAL OFF}
{$DEBUGINFO ON}
{$EXTENDEDSYNTAX ON}
{$IMPORTEDDATA ON}
{$IOCHECKS ON}
{$LOCALSYMBOLS ON}
{$LONGSTRINGS ON}
{$OPENSTRINGS ON}
{$OPTIMIZATION OFF}
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}
{$REFERENCEINFO ON}
{$SAFEDIVIDE OFF}
{$STACKFRAMES ON}
{$TYPEDADDRESS OFF}
{$VARSTRINGCHECKS ON}
{$WRITEABLECONST OFF}
{$MINENUMSIZE 1}
{$IMAGEBASE $400000}
{$DEFINE DEBUG}
{$ENDIF IMPLICITBUILDING}
{$DESCRIPTION 'SmartInspect Runtime Package'}
{$RUNONLY}
{$IMPLICITBUILD OFF}

requires
  rtl,
  vcl,
  vclimg,
  dbrtl;

contains
  SiAuto in 'SiAuto.pas',
  SiWinSock2 in 'SiWinSock2.pas',
  SiEncryption in 'SiEncryption.pas',
  SmartInspect in 'SmartInspect.pas';


end.
Posted in Delphi | Comments Off on Upgrading SmartInspect to Delphi XE7 support

On Entities in XML

In this post I explain the rules for entity definitions and references in natural XML documents. These rules are defined in the XML specs (XML 1.0 and XML 1.1). You can read the specification yourself, but reading standards documents is rarely fun. This post illustrates the rules by example.

In section 4.4 XML Processor Treatment of Entities and References of the spec, we are presented with …

                              Entity Type
                  Parameter     Internal       External        Unparsed    Character
                                General        Parsed General
Reference         Not           Included       Included if     Forbidden   Included
in Content        recognized                   validating
Reference in      Not           Included       Forbidden       Forbidden   Included
Attribute Value   recognized    in literal
Occurs as         Not           Forbidden      Forbidden       Notify      Not
Attribute Value   recognized                                               recognized
Reference in      Included      Bypassed       Bypassed        Error       Included
EntityValue       in literal
Reference in      Included      Forbidden      Forbidden       Forbidden   Forbidden
DTD               as PE

The approach

The rule of each cell in the above matrix will be illustrated with a case study (labelled “test case”). The inputs of the test cases are grouped together in a natural XML document and passed through an XSLT identity transform. A test case is then the combination of the input document and the result of the transform (either an error result or a resultant document). The transformation engine chosen was Saxon-HE 9.5.1.1N from Saxonica. The central idea of this approach is that the identity transform is a transparent window into how Saxon’s input XML processor processes and views the test case input documents. Listing 1 shows our identity transform.

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="utf-8" omit-xml-declaration="yes" />

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
      
</xsl:stylesheet>

Informational resources

The following entities (informational resources) will be available to all test cases via the indicated system identifiers.

<?xml version="1.0" encoding="UTF-8"?>banana
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % where-is-it "It is at ">
<!-- %colour; is defined by whatever is using this external doc-type.
     Even though this external doc-type is passed BEFORE the internal one.
     At this point of parsing, it is not necessary for %colour; to be defined,
     as the reference in the attri definition is BYPASSED, not substituted. -->
<!ENTITY % some-place "%colour;">

<!-- The parameter entity is included in the definition of &test-case-14; -->
<!ENTITY test-case-14 "%where-is-it; %some-place;">
<!-- In the processor's symbol map: test-case-14 ==> 'x'. -->

Entity references in content

Let’s look at some entity references in content. This means references anywhere after the start-tag and before the end-tag of an element, and corresponds to the nonterminal content.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE a-doc-type [
 <!ENTITY % internal-pe "internal-pe-value">
 <!ENTITY % external-pe SYSTEM "source-text.txt">
 <!ENTITY internal-ge "internal-ge-value">
 <!ENTITY internal-ge-with-mark-up "<solo-element recursive1='&internal-ge;'/>">
 <!ENTITY external-parsed-ge SYSTEM "source-text.txt">
]>
<container>
  <!-- Test case 1: Parameter entities are not recognised in content. -->
  <test-case1>%internal-pe; %external-pe;</test-case1>
  
  <!-- Test case 2: Internal general entities are included in content. -->
  <test-case2 attri="&internal-ge;">&internal-ge; &internal-ge-with-mark-up;</test-case2>
  
  <!-- Test case 3: Character references are included. -->
  <test-case3>&#169; Sean B. Durkin, 2014</test-case3> 

  <!-- Test case 4: An external parsed general entity in content.
       An non-validating processor MIGHT include it.
       A validating one WILL.	   -->
  <test-case4>&external-parsed-ge;</test-case4> 
</container>

… transforms to …

<container>
  <!-- Test case 1: Parameter entities are not recognised in content. -->
  <test-case1>%internal-pe; %external-pe;</test-case1>
  
  <!-- Test case 2: Internal general entities are included in content. -->
  <test-case2 attri="internal-ge-value">internal-ge-value <solo-element recursive1="internal-ge-value"/>
   </test-case2>
  
  <!-- Test case 3: Character references are included. -->
  <test-case3>آ© Sean B. Durkin, 2014</test-case3> 

  <!-- Test case 4: An external parsed general entity in content.
       An non-validating processor MIGHT include it.
       A validating one WILL.	   -->
  <test-case4>banana</test-case4> 
</container>

Test cases 1 and 2 show us that parameter entity references are not recognised in content. That should be expected by the reader; parameter entities exist solely for the purpose of building DTDs.

Test case 3 shows that character references are included in content. They are replaced as soon as the XML processor sees the entity reference in content. I am not sure why we are getting the extra آ character in listing 5. If you can explain this, please leave a comment.
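My best guess, offered as an assumption rather than a verified diagnosis: the copyright sign © (U+00A9) encodes in UTF-8 as the two bytes $C2 $A9, and if that byte stream is later re-decoded under a single-byte code page such as Windows-1256, the lead byte $C2 renders as آ. A minimal Delphi sketch of the byte sequence (the procedure name is mine):

procedure ShowCopyrightBytes;
var
  Bytes: TBytes;
begin
Bytes := TEncoding.UTF8.GetBytes( Char( $00A9));  // the '©' character
Assert( Length( Bytes) = 2);
Assert( (Bytes[0] = $C2) and (Bytes[1] = $A9))
// Under Windows-1256, byte $C2 displays as 'آ'; hence the spurious character.
end;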

[aside]
caption = Error and Fatal Error?
alignment = right
collapse_state = collapsed
corners = round
hr_style = sbd-swish
bg_colour = green
width = 300px

The rules on how conformant processors handle non-fatal errors are pretty liberal. A processor might detect and report the error, but equally well it is free to not detect it, or, equivalently, to detect it but not report it. In any case, it is free to recover from the error, or to not recover from it and therefore behave indeterminately.

Fatal errors are a subset of errors. The first such error encountered must be detected and reported by conformant processors, and normal processing of the document must cease. The processor then has the option of completely aborting processing, or scanning for more errors; but in any case, the normal data (other than information about the errors) must not be passed to the client of the processor.
[/aside]

Test case 4 is an interesting one. The treatment of a reference to an external parsed general entity in content depends on whether or not the processor is validating. If validating, the replacement text is included, as shown in listing 5, test case 4. Apparently the outcome of listing 5 does not depend on whether the input document of listing 4 is marked as standalone or not. You would think that marking standalone=”yes” would cause the processor NOT to attempt to load external entities. The spec says that a non-validating processor gets to choose whether or not to include the replacement text. It is a rather inconvenient rule, because it makes the behaviour of non-validating processors unpredictable in the general sense. Will &external-parsed-ge; be replaced by ‘banana’? If the vendor does not publish the rule, only the vendor knows.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!NOTATION my-notation SYSTEM "www.dat-files.org" >	
 <!ENTITY unparsed-entity SYSTEM "source-file.dat" NDATA my-notation>
]>
<container>
  <!-- Test case 5: An unparsed entity reference in content is a fatal error. -->
  <test-case5>&unparsed-entity;</test-case5>
</container>

The document in listing 6 raises a fatal error when we attempt to identity-transform it. Test case 5 (listing 6) illustrates that we cannot refer to an unparsed entity within content. Moreover, unparsed entity references are forbidden pretty much everywhere. The only place an unparsed entity name may appear is as the value of an attribute, as we will see below.

References within attribute values

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY % internal-pe "internal-pe-value">
 <!ENTITY internal-ge "internal-ge-value">
 <!ELEMENT test-case6 EMPTY>
 <!ATTLIST test-case6 attri2 CDATA "%internal-pe;"><!-- 6.A: Parameter entities are not recognised in attribute values. -->
 <!ELEMENT test-case7 EMPTY>
 <!ATTLIST test-case7 attri2 CDATA "&internal-ge;"><!-- 7.A: Internal general entities are included in attribute values.-->
 <!ELEMENT test-case8 EMPTY>
 <!ATTLIST test-case8 attri2 CDATA "&#169;"><!-- 8.A: Character references are included in attribute values. -->
]>
<container>
  <!-- Test case 6.B: Parameter entities are not recognised in attribute values. -->
  <test-case6 attri1="%internal-pe;" />
  
  <!-- Test case 7.B: Internal general entities are included in attribute values. -->
  <test-case7 attri1="&internal-ge;" />
  
  <!-- Test case 8.B: Character references are included in attribute values. -->
  <test-case8 attri1="&#169; Sean B. Durkin, 2014" /> 
</container>

… transforms into …

<container>
  <!-- Test case 6.B: Parameter entities are not recognised in attribute values. -->
  <test-case6 attri1="%internal-pe;" attri2="%internal-pe;"/>
  
  <!-- Test case 7.B: Internal general entities are included in attribute values. -->
  <test-case7 attri1="internal-ge-value" attri2="internal-ge-value"/>
  
  <!-- Test case 8.B: Character references are included in attribute values. -->
  <test-case8 attri1="© Sean B. Durkin, 2014" attri2="©"/>
</container>

As we can see from listings 7 and 8, internal parameter entities are not recognised in attribute values (just as they are not recognised in non-attribute content). But internal general entities and character references are fine. Our parsed general entity reference (test case 7) is an internal general entity; an external parsed general entity, by contrast, is forbidden here, as listing 9 demonstrates.
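This is easy to confirm with Python’s Expat-backed ElementTree. The sketch below re-creates test cases 7.B and 8.B in miniature (my own cut-down document, not the listings verbatim):

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE container [
 <!ENTITY internal-ge "internal-ge-value">
]>
<container>
  <test-case7 attri1="&internal-ge;"/>
  <test-case8 attri1="&#169;"/>
</container>"""

root = ET.fromstring(doc)
# The internal general entity is expanded in the attribute value.
print(root.find("test-case7").get("attri1"))  # internal-ge-value
# The character reference is included as the copyright sign, U+00A9.
print(root.find("test-case8").get("attri1"))
```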

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY external-parsed-ge SYSTEM "source-text.txt">
 <!ELEMENT test-case9 EMPTY>
 <!ATTLIST test-case9 attri2 CDATA "&external-parsed-ge;"><!-- 9.A: External entity reference forbidden in attribute value. -->
]>
<container>
  <!-- Test case 9.B: An external entity, (be it either an external parsed general entity or an unparsed general entity)
       is forbidden as a reference in an attribute value. This is a fatal error. -->
  <test-case9 attri1="&external-parsed-ge;" />
</container> 

Oh oh! The document of listing 9 blows up when we try an identity transform of it. It has two fatal errors. Both illustrate that it is forbidden to refer to an external entity in an attribute value, be it in attribute content or in an attribute default.
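Expat enforces this rule too. The sketch below reproduces test case 9.B in miniature; no source-text.txt is ever needed, because the reference is rejected rather than loaded:

```python
import xml.parsers.expat

doc = """<?xml version="1.0"?>
<!DOCTYPE container [
 <!ENTITY external-parsed-ge SYSTEM "source-text.txt">
]>
<container attri1="&external-parsed-ge;"/>"""

parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse(doc, True)
    error = None
except xml.parsers.expat.ExpatError as exc:
    # Expat's wording is roughly
    # "reference to external entity in attribute".
    error = str(exc)

print(error)
```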

Entities AS attribute values

When an attribute value is declared to be of type ENTITY, we can list the entity directly as the value, as opposed to making an entity reference.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY internal "stuff and nonsense">
 <!NOTATION jpg SYSTEM "image/jpeg"> 
 <!ENTITY file_pic SYSTEM "file.jpg" NDATA jpg>
 <!ENTITY source-text SYSTEM "source-text.txt">
 <!ELEMENT test-case EMPTY>
 <!ATTLIST test-case source-entity ENTITY #REQUIRED>
]>
<container>
  <!-- Test case 10. An internal general entity as an entity value is forbidden. -->
  <test-case case-number="10" source-entity="internal"/>
  
  <!-- Test case 11. An external parsed general entity as an entity value is forbidden. -->
  <test-case case-number="11" source-entity="source-text"/>

  <!-- Test case 12. When an unparsed entity as an entity value, and the processor is validating, 
       the processor must inform the application of the system and public (if any) identifiers for
	    both the entity and its associated notation. (In this case "file.jpg" and "image/jpeg"). -->
  <test-case case-number="12" source-entity="file_pic"/>
  
  <!-- Test case 13. Character references are not recognised in attribute values of type ENTITY. -->
  <test-case case-number="13" source-entity="&#169;"/>
</container>

… transforms into …

<container>
  <!-- Test case 10. An internal general entity as an entity value is forbidden. -->
  <test-case case-number="10" source-entity="internal"/>
  
  <!-- Test case 11. An external parsed general entity as an entity value is forbidden. -->
  <test-case case-number="11" source-entity="source-text"/>

  <!-- Test case 12. When an unparsed entity as an entity value, and the processor is validating, 
       the processor must inform the application of the system and public (if any) identifiers for
	    both the entity and its associated notation. (In this case "file.jpg" and "image/jpeg"). -->
  <test-case case-number="12" source-entity="file_pic"/>
  
  <!-- Test case 13. Character references are not recognised in attribute values of type ENTITY. -->
  <test-case case-number="13" source-entity="©"/>
</container>

What the heck is happening in test cases 10, 11 and 13 (listing 11)? According to the “Processor Treatment of Entities and References” matrix in the spec, it is forbidden to specify a parsed general entity as the value of an attribute typed as ENTITY. Yet the XML processor used by Saxon allows these to pass without comment. It should be a fatal error. I posted a question about this on StackOverflow. If you can explain this odd behaviour, feel free to leave a comment on this post.

An unparsed entity is fine (as you can see from test case 12).

Really, what is the point of this rule? What is wrong with an ENTITY attribute taking the value of a parsed entity? It feels like this is a typographical error in the spec.

Reference in entity value

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type SYSTEM "extra-doc-type.dtp" [

 <!ENTITY % colour "green">
 <!ENTITY external-parsed-ge SYSTEM "source-text.txt">
 <!ENTITY unparsed-entity SYSTEM "source-file.dat" NDATA my-notation>

 <!-- When a general entity reference appears in the EntityValue in an entity declaration, it must be bypassed and left as is. -->
 
 <!ENTITY test-case-15 "&test-case-14;">
 <!-- In the processor's symbol map: test-case-15 ==> '&test-case-14;'. -->
 
 <!ENTITY test-case-16 "&external-parsed-ge;">
 <!-- In the processor's symbol map: test-case-16 ==> '&external-parsed-ge;'. -->
 
 <!ENTITY test-case-17 "&#169;">
 <!-- In the processor's symbol map: test-case-17 ==> '©'. -->
 
 <!ENTITY test-case-18 "&unparsed-entity;">
 <!-- Test case 18 is an error because references to unparsed entities are forbidden in entity values.
      However, the processor may choose to recover from this error. -->
 
 ]>
<container>
  <!-- Test case 15: Mapping of @attri applies recursively until we reach a value of 'x'. -->
  <test-case-15 attri="&test-case-15;" />
  
  <!-- Test case 16: An external parsed general entity reference (&external-parsed-ge;)
       in the entity value definition above, is bypassed, and stored as-is.
	   When we come to normalisation of content (below), replacement values of entities
	   are recursively applied. -->
  <test-case-16>&test-case-16;</test-case-16>
  
  <test-case-17 attri="&test-case-17;" />
 </container> 

… transforms into …

<container>
  <!-- Test case 15: Mapping of @attri applies recursively until we reach a value of 'x'. -->
  <test-case-15 attri="It is at  green"/>
  
  <!-- Test case 16: An external parsed general entity reference (&external-parsed-ge;)
       in the entity value definition above, is bypassed, and stored as-is.
	   When we come to normalisation of content (below), replacement values of entities
	   are recursively applied. -->
  <test-case-16>banana</test-case-16>
  
  <test-case-17 attri="©"/>
 </container>

From listings 12 and 13, we can see that references to parsed entities within an entity value declaration are bypassed. Take a look at the entity declaration for test-case-15. When the declaration is parsed, the processor stores the value ‘&test-case-14;’ literally; it does not store the expansion of &test-case-14;. Only when it comes to parsing content is ‘&test-case-15;’ recursively replaced by a computed replacement value. By examining both listings 13 and 4, you can see that this comes out to be ‘It is at green’.
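This bypass-then-expand behaviour is easy to observe with Python’s ElementTree. In the sketch below (my own hypothetical entities, not those of the listings), the ‘outer’ entity is stored with ‘&inner;’ left as-is, and the recursion only happens when the attribute value is normalised:

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE container [
 <!ENTITY inner "x">
 <!ENTITY outer "before &inner; after">
]>
<container attri="&outer;"/>"""

root = ET.fromstring(doc)
# &outer; expands to 'before &inner; after', and then the bypassed
# inner reference is expanded recursively.
print(root.get("attri"))  # before x after
```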

A reference to an unparsed entity within an entity value definition is an error, but not a fatal one. Test case 18, listing 12, is an example. Our processor chooses to swallow and recover from this error, but another processor might baulk at it.

Entity references within the DTD

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY % internal-pe "<!ELEMENT test-case-19 EMPTY>
                         <!ATTLIST test-case-19 magic CDATA 'waffle'>">
 <!ENTITY % external-pe SYSTEM "source-text.txt">
 
 <!-- %internal-pe; below is included. That is to say, it is parsed as its replacement text -->
 %internal-pe;
]>
<container>
  <test-case-19/>
</container>

… transforms into …

<container>
  <test-case-19 magic="waffle"/>
</container>

In listings 14 and 15, we can see that a reference to a parameter entity is included in the DTD. The rules for where parameter entities can occur depend on whether the reference sits in the internal or the external subset. For the internal subset, the rules are restrictive and fairly simple: parameter entity references may appear only at the outermost declarative level, and the replacement text must consist of a whole number of markup declarations. When the references are in the external subset, the rules are much more liberal: we can have them within markup.
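Expat follows the restrictive internal-subset rules, and by default does not process parameter entities at all; you must opt in. The sketch below is a stripped-down analogue of test case 19 (again my own document, not the author’s Saxon setup): the replacement text is parsed as a markup declaration, whose attribute default then surfaces on the element.

```python
import xml.parsers.expat

doc = """<?xml version="1.0"?>
<!DOCTYPE container [
 <!ENTITY % internal-pe "<!ATTLIST test-case-19 magic CDATA 'waffle'>">
 %internal-pe;
]>
<container><test-case-19/></container>"""

captured = {}

def start_element(name, attributes):
    # Expat supplies defaulted attributes from ATTLIST declarations.
    if name == "test-case-19":
        captured.update(attributes)

parser = xml.parsers.expat.ParserCreate()
# Expat skips parameter entities unless told otherwise.
parser.SetParamEntityParsing(
    xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
parser.StartElementHandler = start_element
parser.Parse(doc, True)
print(captured)
```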

In contrast, listing 16 explodes when we try an identity transform of it. It contains four fatal errors.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY internal-ge "<!ELEMENT test-case-20 EMPTY>
                       <!ATTLIST test-case-20 magic CDATA 'waffle'>">
 <!ENTITY external-parsed-ge SYSTEM "source-text.txt">
 <!ENTITY unparsed-entity SYSTEM "source-file.dat" NDATA my-notation>
 
 <!-- Test case 20: Internal general entity references are forbidden in the DTD,
      (outside of an entity value definition, an attribute value, a processing instruction,
	   a comment, a system literal or a public id literal).
   &internal-ge; below is forbidden in the DTD. This is a fatal error. -->
 &internal-ge;

 <!-- Test case 21: External parsed general entity references are forbidden in the DTD,
      (outside of an entity value definition, an attribute value, a processing instruction,
	   a comment, a system literal or a public id literal).
    &external-parsed-ge; below is forbidden in the DTD. This is a fatal error. -->
 &external-parsed-ge;

 <!-- Test case 22: Unparsed entity references are forbidden in the DTD,
      (outside of an entity value definition, an attribute value, a processing instruction,
	   a comment, a system literal or a public id literal).
    &unparsed-entity; below is forbidden in the DTD. This is a fatal error. -->
 &unparsed-entity;
 
 <!-- Test case 23: Character references are forbidden in the DTD,
      (outside of an entity value definition, an attribute value).
    &#109;, which is the letter 'm', is forbidden in the context shown below.
	This is a fatal error. -->
 <!ELEMENT test-case-23 EMPTY>
 <!ATTLIST test-case-23 &#109;agic CDATA 'waffle'>"
]>
<container>
  <test-case-20/>
  <test-case-23/>
</container> 

We are not permitted to put general entity or character references within DTDs, except in attribute values and entity values, as covered by the previous rules.

Posted in XML | Comments Off on On Entities in XML

TurboPower LockBox 3.6.0 Released – Mobile support!

I have just now released TurboPower LockBox 3.6.0. This massive update was contributed by Nick Chevsky. The code repository has been migrated/forked to Google Code.

With 3.5.0 as the baseline, the 3.6.0 version delivers:

  1. Support for all compilers from Delphi 7 and up.
  2. Support for Win32, Win64, Android, iOS, and OS X.
  3. A new include, TPLB3.Common.Inc, is included from all source files and defines conditionals used throughout the code base.
  4. Renamed uTPLb_D7Compatibility.pas to TPLB3.Compatibility.pas and added code needed by other compilers.
  5. Replaced legacy and platform-specific types (e.g. AnsiString, UTF8String, DWORD, etc.) with cross-platform types required by the next-generation mobile compilers.
  6. Implemented TStringHelper for platform-agnostic string manipulation, which resolves differences between 1-based strings in the legacy compilers and 0-based strings in the next-generation compilers.
  7. Functions that operate on strings now accept an Encoding parameter. I’ve kept this optional for backwards compatibility, but it should be made mandatory at some point in the future.
  8. Marked the following as deprecated in favor of new encoding-agnostic functions: TCodec.EncryptAnsistring, TCodec.DecryptAnsistring, TCodec.EncryptUtf8string, TCodec.DecryptUtf8string, THash.Hashutf8string, TStreamUtils.Stream_to_utf8string, TStreamUtils.utf8string_to_stream.
  9. Removed the ComponentPlatformsAttribute declaration from components (all except TOpenSSL), since they now work on all platforms.
  10. Added non-assembler fallback Pascal code to TRandom for use on mobile platforms.
  11. TNoncibleDecryptor now supports the OnSetIV event, like its encryptor counterpart.

Get the source here ….

https://tplockbox.googlecode.com/svn/trunk

Nick has done an excellent job of future-proofing the code and engineering support for the mobile targets.

Posted in Delphi | Comments Off on TurboPower LockBox 3.6.0 Released – Mobile support!