Application-centric CloudWatch logging in AWS Lambda functions with Python 3 runtimes

Abstract

Application-centric logging is a system where one or more components all direct their log entries to a single logger. In the AWS context, this could mean an application composed of one or more AWS Lambda functions, each logging to a single application-wide AWS CloudWatch log stream. By “single”, I mean single to the application, not to each function.

In Lambda functions with Python runtimes, the default mode of logging is one log stream per Lambda, via the print() function or the logging module. But there are situations where multiple Lambdas co-operate synchronously to solve a larger problem, and where it would be valuable to view a unified log stream for events across all of them.

How do we achieve this, with a simple client interface? In this post, I present a solution.

A Solution

Include in the packaging of your AWS Lambda function the following Python 3 script, with filename “custom_logging.py”.

################################################################################
##  CustomLogging class
################################################################################
import boto3
import json
import time, sys
import logging

def coerceLoggingType( logType):
  if (logType is None) or (logType == ''):
    logType = logging.INFO
  elif isinstance( logType, str):
    logType = getattr( logging, logType.upper(), logging.INFO)
  return logType


defaultFormat = '#standard'

stockFormats = {
  '#standard': '{level}: {func}: {caller}: ',
  '#short'   : '{level}: {func}: ',
  '#simple'  : '{func}: '} 

levelNames = {
  logging.DEBUG   : 'DEBUG',
  logging.INFO    : 'INFO',
  logging.WARNING : 'WARNING',
  logging.ERROR   : 'ERROR',
  logging.CRITICAL: 'CRITICAL'}

botoLoggers = ['boto', 'boto3', 'botocore', 'urllib3']

def _json_formatter( obj):
  """Formatter for unserialisable values."""
  return str( obj)

class JsonFormatter( logging.Formatter):
  """AWS Lambda Logging formatter.
  Formats the log message as a JSON encoded string.  If the message is a
  dict it will be used directly.  If the message can be parsed as JSON, then
  the parsed value is used in the output record.
  """
  def __init__( self, **kwargs):
    super( JsonFormatter, self).__init__()
    # Pop json_default before building the format dict, so it is not
    # mistaken for a format field.
    self.default_json_formatter = kwargs.pop( 'json_default', _json_formatter)
    self.format_dict = {
      'timestamp': '%(asctime)s',
      'level': '%(levelname)s',
      'location': '%(name)s.%(funcName)s:%(lineno)d'}
    self.format_dict.update( kwargs)

  def format( self, record):
    record_dict = record.__dict__.copy()
    record_dict['asctime'] = self.formatTime( record)
    log_dict = {
      k: v % record_dict
      for k, v in self.format_dict.items() if v}
    if isinstance( record_dict['msg'], dict):
      log_dict['message'] = record_dict['msg']
    else:
      log_dict['message'] = record.getMessage()
    # Attempt to decode the message as JSON, if so, merge it with the
    # overall message for clarity.
    try:
      log_dict['message'] = json.loads( log_dict['message'])
    except ( TypeError, ValueError):
      pass
    if record.exc_info:
      # Cache the traceback text to avoid converting it multiple times
      # (it's constant anyway)
      # from logging.Formatter:format
      if not record.exc_text:
        record.exc_text = self.formatException( record.exc_info)
    if record.exc_text:
      log_dict['exception'] = record.exc_text
    json_record = json.dumps( log_dict, default=self.default_json_formatter)
    if hasattr( json_record, 'decode'):  # pragma: no cover
      json_record = json_record.decode( 'utf-8')
    return json_record

def setupCanonicalLogLevels( logger, level, fmt, formatter_cls=JsonFormatter, boto_level=None, **kwargs):
  if not isinstance( logger, logging.Logger):
    raise Exception( 'Wrong class of logger passed to setupCanonicalLogLevels().')
  if logger is not None:
    logger.setLevel( level)
  logging.root.setLevel( level)
  if fmt is not None:
    logging.basicConfig( format=fmt)
    fmtObj = logging.Formatter( fmt)
  else:
    fmtObj = None
  for handler in logging.root.handlers:
    try:
      if fmtObj is not None:
        handler.setFormatter( fmtObj)
      elif formatter_cls is not None:
        handler.setFormatter( formatter_cls( **kwargs))
    except:
      pass
  if boto_level is None:
    boto_level = level
  for loggerId in botoLoggers:
    try:
      logging.getLogger( loggerId).setLevel( boto_level)
    except:
      pass
 
 
class NullLogger():
  def __init__( self):
    pass
 
  def purge( self):
    pass
 
  def log( self, level, msg, withPurge=False):
    pass
 
  def debug( self, msg, withPurge=False):
    pass
 
  def info( self, msg, withPurge=False):
    pass
 
  def warning( self, msg, withPurge=False):
    pass
 
  def critical( self, msg, withPurge=False):
    pass
 
  def error( self, msg, withPurge=False):
    pass
 
  def exception( self, msg, withPurge=False):
    pass
 
  def classCode( self):
    return '#null'
 
  def isPurgeable( self):
    return False
 
class PrintLogger():
  def __init__( self, threshold):
    self.threshold = threshold
 
  def purge( self):
    pass
 
  def log( self, level, msg, withPurge=False):
    if level >= self.threshold:
      print( msg)
 
  def debug( self, msg, withPurge=False):
    self.log( logging.DEBUG, msg, False)
 
  def info( self, msg, withPurge=False):
    self.log( logging.INFO, msg, False)
 
  def warning( self, msg, withPurge=False):
    self.log( logging.WARNING, msg, False)
 
  def critical( self, msg, withPurge=False):
    self.log( logging.CRITICAL, msg, False)
 
  def error( self, msg, withPurge=False):
    self.log( logging.ERROR, msg, False)
 
  def exception( self, msg, withPurge=False):
    self.log( logging.ERROR, msg, False)
 
  def classCode( self):
    return '#print'
 
  def isPurgeable( self):
    return False
 
def createPolymorphicLogger( logClass, logGroup, logStream, logLevel = logging.INFO, functionName = None, msgFormat = None):
  if logClass == 'cloud-watch':
    return CustomLogging( logGroup, logStream, logLevel, functionName, msgFormat)
  elif logClass == '#print':
    return PrintLogger( logLevel)
  elif (logClass == '#null') or (logClass is None):
    return NullLogger()
  elif isinstance( logClass, dict) and ('logging' in logClass):
    loggingParams    = logClass.get( 'logging', {})
    cloudWatchParams = loggingParams.get( 'cloud-watch', {})
    if msgFormat is None:
      msgFormat = '#mini'
    actualLogClass  = loggingParams.get( 'class')
    logGroup     = cloudWatchParams.get( 'group'   , logGroup)
    logStream    = cloudWatchParams.get( 'stream'  , logStream)
    logLevel     =    loggingParams.get( 'level'   , logLevel)
    functionName = cloudWatchParams.get( 'function', functionName)
    msgFormat    = cloudWatchParams.get( 'format'  , msgFormat)
    return createPolymorphicLogger( actualLogClass, logGroup, logStream, logLevel, functionName, msgFormat)
  elif isinstance( logClass, dict) and ('class' in logClass):
    canonicalLogClassRecord = {'logging': logClass}
    return createPolymorphicLogger( canonicalLogClassRecord, logGroup, logStream, logLevel, functionName, msgFormat)
  elif logClass == '#standard-logger':
    logger = logging.getLogger( name=logStream)
    if msgFormat is None:
      msgFormat = '[%(levelname)s] %(message)s'
    setupCanonicalLogLevels( logger, logLevel, msgFormat, JsonFormatter, logging.ERROR)
    return logger
  else:
    raise Exception( f'Unrecognised log class {logClass}')
 
def getClassCode( logger):
  code = '#null'
  if isinstance( logger, logging.Logger):
    code = '#standard-logger'
  elif logger is not None:
    try:
      code = logger.classCode()
    except:
      code = '#unrecognised'
  return code
 
def isLoggerPurgeable( logger):
  result = False
  if (not isinstance( logger, logging.Logger)) and (logger is not None):
    try:
      result = logger.isPurgeable()
    except:
      pass
  return result
 
class CustomLogging:
  def __init__( self, logGroup, logStream, logLevel = logging.INFO, functionName = None, msgFormat = None):
    """ logGroup is the name of the CloudWatch log group. If none, the messages passes to print.
        logStream is the name of the stream. It is required. It is a string. There is no embedded date processing.
        logLevel is one of the logging level constants or its string equivalent. Posts below this level will be swallowed.
        functionName is the name of the lambda.
        msgFormat determines the logged message prefix. It is either a format string, a label or a function.
          If it is a format string, the following substitution identifiers are available:
            {level}  The message log level.
            {func}   The passed functionName
            {caller} The python caller function name
          If it is a label, it is one of:
            #standard   - This is the default.
            #short
            #simple
            #mini
          If it is a function (or callable object), it must be a function that returns a prefix string with
            the following input parameters in order:
              level           - passed message level
              functionName  - constructed function name
              caller          - invoker caller name
              logMsg          - passed message
             
        EXAMPLE USAGE 1:
          import custom_logging, logging
         
          logger = CustomLogging( '/aws/ec2/prod/odin', '2022-06-29-MLC_DAILY-143', logging.INFO, 'CoolLambdaFunc', '#mini')
          logger.info( 'Hello friend! This is an info')
          logger.error( 'I broke it!')
          logger.purge()
       
        
        EXAMPLE USAGE 2:
          import custom_logging, logging
         
          logger = CustomLogging( None, None, logging.DEBUG, 'CoolLambdaFunc', '#mini')
          logger.info( 'This is the same as print')
      
        
        EXAMPLE USAGE 3:
          import custom_logging, logging
         
          logger = CustomLogging( None, None, logging.WARNING, 'CoolLambdaFunc', '{caller} | {level} !! {func}: ')
          
       
        
        EXAMPLE USAGE 4:
          import custom_logging, logging
         
          def colourMePink( level, functionName, caller, logMsg):
            sLevel = logging.getLevelName( level)
            if level == logging.DEBUG:
              prefix = '{level}: {func}: {caller}: '.format( level = sLevel, func = functionName, caller = caller)
            elif level == logging.INFO:
              prefix = ''
            else:
              prefix = '{level}: '.format( level = sLevel)
            return prefix
         
          logger = CustomLogging( None, None, logging.INFO, None, colourMePink)
          
    """
    self.logs           = boto3.client( 'logs', region_name='ap-southeast-2')
    self.logEvents      = []
    self.functionName = functionName
    if self.functionName is None:
      self.functionName = ''
    self.logGroup       = logGroup
    self.logStream      = logStream
    self.msgFormat = msgFormat
    if self.msgFormat is None:
      self.msgFormat = defaultFormat
    if isinstance( self.msgFormat, str) and (self.msgFormat in stockFormats):
      self.msgFormat = stockFormats[self.msgFormat]
    elif self.msgFormat == '#mini':
      self.msgFormat = self._miniFormat
    self.logLevel       = coerceLoggingType( logLevel)
    self.sequenceToken  = None
    self.sequenceTokenIsValid = False
    self.maxEventsInBuffer = 20
    self.maxBufferAgeMs = 60000 # 1 minute.
 
  def _formatMessage( self, caller, logType, logMsg):
    prefix = ''
    if caller is None:
      try:
        caller = sys._getframe(3).f_code.co_name
      except:
        caller = ''
    sLevel = levelNames.get( logType, str( logType))
    if isinstance( self.msgFormat, str):
      prefix = self.msgFormat.format( level = sLevel, func = self.functionName, caller = caller)
    elif callable( self.msgFormat):
      prefix = self.msgFormat( logType, self.functionName, caller, logMsg)
    return prefix + str( logMsg)
 
  def _miniFormat( self, level, functionName, caller, logMsg):
    prefix = ''
    if level >= logging.WARNING:
      prefix = levelNames[ level] + ': '
    if functionName != '':
      prefix = prefix + functionName + ': '
    return prefix
 
  def _getSequenceToken( self):
    self.sequenceToken = None
    self.sequenceTokenIsValid = True
    try:
      response = self.logs.describe_log_streams( logGroupName=self.logGroup, logStreamNamePrefix=self.logStream)
    except self.logs.exceptions.ResourceNotFoundException:
      return 'group-not-found'
    try:
      if 'uploadSequenceToken' in response['logStreams'][0]:
        self.sequenceToken = response['logStreams'][0]['uploadSequenceToken']
      if self.sequenceToken == '':
        self.sequenceToken = None
    except:
      pass
    if self.sequenceToken is None:
      return 'stream-not-found-or-virgin-stream'
    else:
      return None
 
  def put( self, logMsg, logType = logging.INFO, withPurge=False, callFunc = None):
    logType = coerceLoggingType( logType)
    if self.logLevel <= logType:
      if self.logGroup is not None:
        timestamp = int( round( time.time() * 1000))
        message = self._formatMessage( callFunc, logType, logMsg)
        logEvent = {'timestamp': timestamp, 'message': message}
        if self.logLevel == logging.DEBUG:
          print( message)
        self.logEvents.append( logEvent)
        count = len( self.logEvents)
        if withPurge or \
           (count >= self.maxEventsInBuffer) or \
           ((count >= 1) and ((timestamp - self.logEvents[0]['timestamp']) >= self.maxBufferAgeMs)):
          self.purge()
      else:
        print( logMsg)
 
  def classCode( self):
    return 'cloud-watch'
 
  def _primitive_put_log_events( self):
    event_log = {
      'logGroupName' : self.logGroup,
      'logStreamName': self.logStream,
      'logEvents'    : self.logEvents}
    if self.sequenceToken is not None:
      event_log['sequenceToken'] = self.sequenceToken
    try:
      response = self.logs.put_log_events( **event_log)
      self.sequenceToken = response.get( 'nextSequenceToken')
      self.sequenceTokenIsValid = True
      result = None
    except self.logs.exceptions.ResourceAlreadyExistsException:
      self.sequenceTokenIsValid = False
      result = None
    except self.logs.exceptions.DataAlreadyAcceptedException:
      self.sequenceTokenIsValid = False
      result = None
    except self.logs.exceptions.InvalidSequenceTokenException:
      self.sequenceTokenIsValid = False
      result = 'invalid-sequence-token'
    except self.logs.exceptions.ResourceNotFoundException:
      self.sequenceTokenIsValid = True
      self.sequenceToken = None
      result = 'stream-not-found'
    return result
 
  def _primitive_create_log_stream( self):
    self.sequenceTokenIsValid = True
    self.sequenceToken = None
    try:
      self.logs.create_log_stream( logGroupName=self.logGroup, logStreamName=self.logStream)
      result = None
    except self.logs.exceptions.ResourceAlreadyExistsException:
      self.sequenceTokenIsValid = False
      result = None
    except self.logs.exceptions.ResourceNotFoundException:
      result = 'group-not-found'
    return result
 
  def _primitive_create_log_group( self):
    self.sequenceTokenIsValid = True
    self.sequenceToken = None
    try:
      self.logs.create_log_group( logGroupName=self.logGroup)
    except self.logs.exceptions.ResourceAlreadyExistsException:
      pass
 
  def _robust_put_log_events( self):
    status = 'hungry'
    for tryCount in range( 100):
      if status == 'group-not-found':
        self._primitive_create_log_group()
        status = 'stream-not-found'
      elif status == 'stream-not-found':
        status = self._primitive_create_log_stream()
        if status is None:
          status = 'hungry'
      elif status == 'invalid-sequence-token':
        getSequenceResult = self._getSequenceToken()
        # getSequenceResult == 'group-not-found' | 'stream-not-found-or-virgin-stream' | None
        if getSequenceResult == 'group-not-found':
          status = 'group-not-found'
        elif getSequenceResult == 'stream-not-found-or-virgin-stream':
          status = 'stream-not-found'
        else:
          status = 'ready'
      elif status == 'hungry':
        if not self.sequenceTokenIsValid:
          status = 'invalid-sequence-token'
        else:
          status = 'ready'
      elif status == 'ready':
        status = self._primitive_put_log_events()
        if status is None:
          status = 'done'
      if status == 'done':
        break
    if status != 'done':
      raise Exception( 'Failed to post to CloudWatch Logs.')
 
  def purge( self):
    if len( self.logEvents) > 0:
      try:
        self._robust_put_log_events()
      except Exception as ex:
        print( self.logEvents)
        print( ex)
      self.logEvents = []
 
  def log( self, level, msg, withPurge=False):
    self.put( msg, level, withPurge, None)
 
  def debug( self, msg, withPurge=False):
    self.put( msg, logging.DEBUG, withPurge, None)
 
  def info( self, msg, withPurge=False):
    self.put( msg, logging.INFO, withPurge, None)
 
  def warning( self, msg, withPurge=False):
    self.put( msg, logging.WARNING, withPurge, None)
 
  def error( self, msg, withPurge=False):
    self.put( msg, logging.ERROR, withPurge, None)
 
  def critical( self, msg, callFunc = None):
    self.put( msg, logging.CRITICAL, True, callFunc)
 
  def exception( self, msg, withPurge=True):
    self.log( logging.ERROR, msg, True)
 
  def isPurgeable( self):
    return True
 
  def __del__( self):
    try:
      self.purge()
    except:
      pass

How to use

Import custom_logging. In your Lambda code, where you need application-centric logging, invoke the factory function createPolymorphicLogger() to create a logger, then send all your application-centric log events to this logger instead of print().

The logger exposes the following public methods.

  • purge()
  • log( level, msg, withPurge=False)
  • debug/info/warning/critical/error/exception( msg, withPurge=False)

Use the log() method to log a string message. ‘level’ is one of the usual logging levels: DEBUG, INFO, etc. For performance reasons, messages are buffered before being sent to CloudWatch. The buffer is purged when: (A) the buffer grows too long; (B) the buffer ages out (1 minute); or (C) the withPurge parameter is explicitly set to True. Invoking the purge() method, or releasing the custom logger class instance, will also flush it.
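The purge policy can be sketched in isolation (hypothetical class and names; in the real module this logic lives in CustomLogging.put() and purge(), and the flush actually calls put_log_events):

```python
import time

class BufferedSink:
    """Toy sketch of the purge policy: flush on size, age, or explicit request."""
    def __init__(self, max_events=20, max_age_ms=60000):
        self.max_events = max_events
        self.max_age_ms = max_age_ms
        self.events = []
        self.flushed = []          # stands in for CloudWatch

    def log(self, msg, with_purge=False):
        now_ms = int(time.time() * 1000)
        self.events.append({'timestamp': now_ms, 'message': msg})
        too_long = len(self.events) >= self.max_events
        too_old = (now_ms - self.events[0]['timestamp']) >= self.max_age_ms
        if with_purge or too_long or too_old:
            self.purge()

    def purge(self):
        if self.events:
            self.flushed.extend(self.events)   # real code calls put_log_events()
            self.events = []
```

With max_events=3, the third log() call flushes the buffer, and with_purge=True flushes immediately regardless of size.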

The debug() etc. methods are shorthand for the log() method with the level fixed.

How to configure it

Refer to the inline comments.
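For the dictionary form accepted by createPolymorphicLogger(), a configuration might look like the following (the group and stream values are illustrative, borrowed from the docstring examples; the keys match the 'logging' branch of the factory):

```python
import logging

# Illustrative values; the key names ('logging', 'class', 'level',
# 'cloud-watch', 'group', 'stream', 'function', 'format') are the ones
# read by createPolymorphicLogger().
config = {
    'logging': {
        'class': 'cloud-watch',
        'level': logging.INFO,
        'cloud-watch': {
            'group': '/aws/ec2/prod/odin',
            'stream': '2022-06-29-MLC_DAILY-143',
            'function': 'CoolLambdaFunc',
            'format': '#mini',
        },
    },
}

# logger = custom_logging.createPolymorphicLogger( config, None, None)
```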

Posted in Python

SBD-JPath Data Model, Entry #1

In this series of entries, this entry being the first, I will specify the SBD-JPath data model. All the resolved values of expressions of this language are within the value-space of the types described in this data model.

This document specifies a grammar for SBD-JPath, using the same basic EBNF notation used in XML 1.0, 5th edition. White space is not significant in expressions. Grammar productions are introduced together with the features that they describe.

All data types in this model are captured by a class hierarchy diagram, in which each line describes a type that directly descends from the nearest preceding line with one less indentation.

The item type

The item type is an abstract type. An item is the basic building block of SBD-JPath, and is anything other than a sequence. Items can be j-values, functions, maps, tuples or atomic values. All items are immutable.

The sequence type

A sequence is an ordered list of items. Sequences are immutable. A sequence cannot contain a sequence. The empty sequence is identical to an absence of value. A sequence with a cardinality of 1 is identical to its one member. Sequences are not bags: an item can appear more than once in a sequence. The origin for indexing sequences is 1 (sorry, JavaScript developers! I chose 1 to be closer to XPath, and for other pragmatic reasons related to sequence predicates). Some core properties/functions of sequences (this is not exhaustive) include:

  1. last(): integer

last() returns the cardinality of the sequence. Equivalently, for non-empty sequences, this is equal to the index of the last item.

Some core operators on sequences (this is not exhaustive) include:

  1. left-operand , right-operand
  2. left-operand << right-operand
  3. left-operand >> right-operand
  4. left-operand is right-operand
  5. left-operand union right-operand
  6. left-operand except right-operand
  7. 1 to 10

A specification for these operators will be given in a future post. They are equivalent to the operators of the same symbol in XPath 3.1.

All empty sequences are identical to all other empty sequences. A sequence is identical to another sequence if and only if they have the same cardinality and each member in order, is identical.

An empty sequence can be constructed thus:

The j-value type

The j-value type is an abstract type. It descends from item. It is identical to the type described as “value” in the JSON specification. j-values should be seen as reference data, as opposed to value data, for the purposes of identity. For example, consider the following JSON datum.
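A minimal such datum, consistent with the description that follows:

```json
{"colour": "red", "flag": "red"}
```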

The value of the colour property of this json object is a node whose string value is “red”. The type of the node is j-string, which inherits from j-value. Similarly, the value of the flag property is a node whose string value is also “red”. The two aforementioned instances of j-value are neither equal nor identical. The string ‘red’ is identical to ‘red’, because a string is a value kind of datum. In contrast, the two j-values, even though they have identical string values, are not identical. This is because j-value is a reference kind of datum.
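This distinction can be mirrored in Python (an illustrative toy, not part of the specification):

```python
class JString:
    """Toy j-string node: a reference datum wrapping a string value."""
    def __init__(self, value):
        self.value = value

colour = JString('red')
flag = JString('red')

# The underlying strings are value data: equal and interchangeable.
same_value = (colour.value == flag.value)   # True

# The nodes themselves are reference data: distinct despite equal values.
same_node = (colour is flag)                # False
```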

Descendant types are j-string, j-number, j-boolean, j-null, j-object, j-array.

All j-values have two properties:

  1. parent: j-value?
  2. root: j-value

The parent of an array member is the containing array. The parent of a json object’s values is the containing json object. Otherwise the parent is the empty sequence. The root is the ultimate parent, or the j-value itself if it has none, found by running up the ancestral path from the given j-value.
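A toy Python sketch (hypothetical names, not part of the specification) of how root derives from parent:

```python
class JValue:
    """Toy model of a j-value node with the two properties above."""
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent   # None models the empty sequence

    @property
    def root(self):
        # Run up the ancestral path to the ultimate parent.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

# A j-array containing one j-string member:
arr = JValue(['red'])
member = JValue('red', parent=arr)
```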

The j-string type

The j-string type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “string” in the JSON specification.

A literal j-string instance without a parent, can be constructed thus:

:"red"

j-stringy ::= string-delimiter literal-char* string-delimiter
literal-j-string ::= ":" j-stringy
string-delimiter ::= '"'

The model value for such constructed strings is as per the JSON specification.

The j-number type

The j-number type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “number” in the JSON specification.

A literal j-number instance without a parent, can be constructed thus:

The model value for such constructed numbers is as per the JSON specification.

The j-object type

The j-object type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “object” in the JSON specification.

A literal j-object instance without a parent, can be constructed thus:

:{"menu": ["fish", "poultry"]}
[7] j-objecty  ::= "{" (j-stringy ":" j-valuey ("," j-stringy ":" j-valuey)*)? "}"
[8] literal-j-object  ::= ":" j-objecty
[9] j-valuey ::= j-objecty | j-arrayy | j-numbery | j-stringy | j-booleany | j-nully

The model value for such constructed objects is as per the JSON specification.

The j-array type

The j-array type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “array” in the JSON specification.

A literal j-array instance without a parent, can be constructed thus:

:["fish", "poultry"]
[10] j-arrayy ::= "[" (j-valuey ("," j-valuey)*)? "]"
[11] j-array ::= ":" j-arrayy

The model value for such constructed arrays is as per the JSON specification.

The j-boolean type

The j-boolean type is a concrete type. It descends from j-value and is sealed. It is identical to the union of the types described as “true” and “false” in the JSON specification.

A literal j-boolean instance without a parent, can be constructed thus:

:true
[10] j-booleany ::= "true" | "false" 
[11] j-boolean ::= ":" j-booleany

The model value for such constructed booleans is as per the JSON specification, with a rendered “true” representing logical true, and a rendered “false” representing logical false.

The j-null type

The j-null type is a concrete type. It descends from j-value and is sealed. It is identical to the type described as “null” in the JSON specification.

A literal j-null instance without a parent, can be constructed thus:

:null
[12] j-nully ::= "null" 
[14] j-null ::= ":" j-nully

The model value for such constructed nulls is as per the JSON specification.

The function type

The function type is a concrete type. It descends from item and is sealed. Functions can be anonymous or named.

A literal anonymous function which doubles a number, can be constructed thus:

function( $x number) as number { 2 * $x }
[15] literalFunction ::= "function" "(" (param ("," param)*)? ")" ("as" sequenceType)? enclosedExpr	
[16] param ::= "$" name ("as" sequenceType)?	
[17] enclosedExpr ::= "{" expr? "}"

expr will be defined later. It is basically an expression.
name will be defined later. It is basically a programmatic identifier, with a grammar common to most language grammars for variable identifiers.
sequenceType is a type specification. Parameters and function returns can be so typed.

Defining our productions further …

sequenceType

[18] sequenceType ::= ("empty-sequence" "(" ")") | (itemType occurrenceIndicator?)

A sequence type is a test for a parameter type. If the actual value of the parameter does not pass the test, it is a static error if the error can be detected syntactically, and a run-time error otherwise. empty-sequence() passes if and only if the actual value is an empty sequence.

itemType

[19] itemType ::= kindTest | ("item" "(" ")") | functionTest | mapTest | tupleTest | atomic
[20] occurrenceIndicator ::= "?" | "*" | "+"

An item type is a test for a parameter type. If the occurrenceIndicator is ?, the count of items must be 0 or 1. If the occurrenceIndicator is *, the count of items can be any number. If the occurrenceIndicator is +, the count of items must be at least 1. If there is no occurrenceIndicator, the count of items must be precisely 1.
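The cardinality rules can be sketched in Python (a hypothetical helper, not part of any SBD-JPath processor):

```python
def occurrence_allows(indicator, count):
    """Return True if `count` items satisfy the given occurrence indicator."""
    if indicator == '?':
        return count in (0, 1)
    if indicator == '*':
        return True          # any number of items
    if indicator == '+':
        return count >= 1
    return count == 1        # no indicator: precisely one item
```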

kindTest

[21] kindTest ::= j-stringTest | j-numberTest | j-booleanTest | j-nullTest | j-arrayTest | j-objectTest | j-anyTest

A kind test is a test for a parameter type. The parameter value must be one of the standard json data types (j-string, j-number, j-object etc.)

j-stringTest

[22] j-stringTest ::= ("text" "(" ")") | ("text-or-null" "(" ")") | ("nonempty-text" "(" ")") 

A j-string test is a test for a parameter type. The parameter value must be a j-string value, or in the case of text-or-null(), either a j-string value or a j-null value. In the case of nonempty-text(), the j-string string value must be a non-empty string.

j-numberTest

[23] j-numberTest ::= ("number" "(" number-constraint* ")") | ("number-or-null" "(" number-constraint* ")")
[24] number-constraint ::= ("min" S number) | ("max" S number) | ("grain" S number) 

A j-number test is a test for a parameter type. The parameter value must be a j-number value, or in the case of number-or-null(), either a j-number value or a j-null value. Each number-constraint kind (min, max or grain) can occur at most once, but in any order. If min is present, and the value is not j-null, the test fails if the numerical actual value of the parameter is less than the specified min number. Similarly for max. It is a static error if the max value is less than the min value. The grain number must be specified with an exponent, and must be a positive number. If the grain constraint is specified, and the value is not j-null, and the remainder of the parameter's actual value after division by the grain number (the granularity) is non-zero, then the test fails.
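The min/max/grain semantics can be sketched in Python (a hypothetical helper; the static-error rules and j-null handling are omitted):

```python
def passes_number_test(value, minimum=None, maximum=None, grain=None):
    """Check a numeric value against optional min, max and grain constraints."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    # grain: the value must be an exact multiple of the granularity.
    if grain is not None and value % grain != 0:
        return False
    return True
```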

j-booleanTest

[25] j-booleanTest ::= ("boolean" "(" ")") | ("boolean-or-null" "(" ")")

A j-boolean test is a test for a parameter type. The parameter value must be a j-boolean value, or in the case of boolean-or-null(), either a j-boolean value or a j-null value.

j-nullTest

[26] j-nullTest ::= "null" "(" ")"

A j-null test is a test for a parameter type. The parameter value must be a j-null.

j-arrayTest

[27] j-arrayTest ::= ("array" | "array-or-null") "(" arrayTypeConstraint* ")"
[28] arrayTypeConstraint ::= ("base" S (kindTest - j-anyTest)) | ("min" S number) | ("max" S number)

A j-array test is a test for a parameter type. The parameter value must be a j-array, or either a j-array or j-null in the case of array-or-null(). If constraints are present, they must be met. There can be at most one base constraint, one min constraint and one max constraint, but they can appear in any order. If the base constraint is present, each member of the parameter value array (when the value is not j-null) must pass the specified kindTest. The numbers for min and max must be non-negative integers, rendered without the characters "+", "-", ".", "e" and "E". It is a static error for the max number to be less than the min number. It is an error if the cardinality of the array is less than the min (if specified) or greater than the max (if specified).

j-objectTest

[29] j-objectTest ::= ("object" | "object-or-null") "(" ")"

A j-object test is a test for a parameter type. The parameter value must be a j-object, or either a j-object or j-null in the case of object-or-null(). In future versions of SBD-JPath, we may allow extensions to the test which constrain objects to a given JSON schema. It is envisaged that compliant SBD-JPath processors will each provide a convenient mechanism for registering schemas. These schemas could then be leveraged in j-object tests.

j-anyTest

[30] j-anyTest ::= "node" "(" ")"

A j-any test is a test for a parameter type. The parameter value must be a json datum, namely j-value. This includes j-null.

atomic

[31] atomic  ::= "string" | "number" | "boolean" | "date"

An atomic test is a test for a parameter type. The parameter value must be one of the fundamental non-node types of SBD-JPath, to wit: string, number, boolean or date. None of these types include a null value in their value-space. Dates do not include a time component, nor time-zone. The test passes if the parameter type is a string (in the case of “string”), etc.

functionTest

A function-test is a test for any function. The test passes if the parameter is a function of any signature.

mapTest

A map-test is a test for any map. The test passes if the parameter is a map.

tupleTest

(Content to be developed)

The map type

(Content to be developed)

The tuple type

(Content to be developed)

The atomic-value type

(Content to be developed)
(Content should follow structure:
* basic description
* position in the type hierarchy
* value-space
* properties
* constructor grammar
* some core operators and functions
)

The string type

(Content to be developed)

The number type

(Content to be developed)

The boolean type

(Content to be developed)

The date type

(Content to be developed)

Posted in SBD-JPath

Introducing SBD-JPath

SBD-JPath is an expression language that allows the processing of json data. Core ideas for this language were inspired by the XPath 3.1 language and JSONPath.

There are already many json processing languages and libraries, so why SBD-JPath, yet another one? SBD-JPath introduces a number of features not yet seen in competitors, to wit:

  • A formal language specification
  • Language features inspired by XPath 3.1, including maps and functions
  • A syntax closer to XPath
  • Ability to navigate “up” the tree (that is to say, to reference ancestor nodes)
  • Greater extensibility, with the ability to externally provide functions

This blog post is merely an introduction to the concept of SBD-JPath. A series of future blog posts, taken together, will comprise the language specification. We will start with a data model, then specify core concepts and core operators, and finally I will specify non-core inbuilt functions and operators.

SBD-JPath will be instrumental in the development of another language, as yet unnamed, similar to XSLT 3.0, which seeks to transform json data into html fragments and vice versa.

Just to whet your appetite, here is a sample task and solution in SBD-JPath.
In this task, we have sales data from a fruit vendor. What is the SBD-JPath expression that returns the total value of sales? The sales data is thus:

{
  "apple":{
    "price": 3.10,
    "quantity": 100
    },
  "orange":{
    "price": 1.50,
    "quantity": 20
    }
}

The task answer is thus:

  sum( */(price * quantity))
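For comparison, the same computation expressed in Python over the parsed JSON:

```python
import json

sales_json = '''{
  "apple":  {"price": 3.10, "quantity": 100},
  "orange": {"price": 1.50, "quantity": 20}
}'''

sales = json.loads(sales_json)

# */(price * quantity) maps each top-level member to price * quantity;
# sum(...) then totals the resulting sequence: 310.0 + 30.0 = 340.0.
total = sum(item['price'] * item['quantity'] for item in sales.values())
```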

So dear reader, read my future blog posts to receive the full specification.

Posted in SBD-JPath

Connecting Auth0 to DynamoDB and CloudWatch