

Feature Extraction for Dynamical Information

G.L. Viviani, Senior Member, IEEE

Abstract—This research provides a new means for Machine Learning feature extraction that results from representing the underlying information bearing processes more accurately. By recognizing that information is a dynamical process (as opposed to a static one), an alternative dynamical feature extraction process is defined to represent the underlying patterns of interest more accurately. In this manner, topics such as “few-shot” learning are more readily addressed. Importantly, recognizable information bearing systems will be shown to be equivalent to the concept of autonomy that is described by a novel concept termed “Information Braille.”

I. INTRODUCTION

Machine Learning (ML) can be thought of as a subset of conventional signal processing. Ultimately, ML relies on interpreting signals that can be 1D, 2D or even higher in dimensionality. It is important to first understand what is different about ML, as compared to conventional signal processing. Recognizing the underlying differences (seeing them in a different light) will better illuminate the improvements resulting from the formulation in this paper.

Conventionally, signals are manipulated in either the time domain or else in an alternative reference which is often considered to be a frequency based one. This is true for both 1D as well as 2D signals of interest. Classically, a Fourier Transform pair takes a form such as:

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi j x \xi}\, dx;\qquad f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi j x \xi}\, d\xi.$$

Additional details can be found in many references. Some important observations are as follows: a) there exists a continuous-time (frequency) interpretation as suggested, as well as a discrete (sampled) time (frequency) pair, and b) the signal, f(x), is considered to be periodic. Its associated energy must be bounded and equivalent in the time and frequency (Parseval’s Theorem) domains. As the orthogonal basis functions are sinusoidal, for the Fourier Series, it is most accurate for signals involving one or more sinusoids. Keep in mind, a Fourier series representation will be infinite (in general). Hence, typical signals will only be approximated by the Fourier approach, due to truncation. This approach is ubiquitous, and the reader is assumed to be familiar with these concepts.
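As a brief aside, the truncation effect can be illustrated numerically. The following is a minimal sketch (in Python, not part of the original formulation); the square wave signal and the term counts are chosen purely for illustration.

import numpy as np

t = np.linspace(0.0, 1.0, 1000, endpoint=False)    # one period, T = 1
square = np.sign(np.sin(2.0 * np.pi * t))           # target signal

def truncated_fourier(t, n_terms):
    """Partial Fourier series sum for the unit square wave (odd harmonics only)."""
    approx = np.zeros_like(t)
    for k in range(1, 2 * n_terms, 2):               # odd harmonics 1, 3, 5, ...
        approx += (4.0 / (np.pi * k)) * np.sin(2.0 * np.pi * k * t)
    return approx

for n in (3, 10, 50):
    err = np.max(np.abs(square - truncated_fourier(t, n)))
    print(f"{n:3d} odd harmonics -> max abs error {err:.3f}")

No matter how many terms are retained, the error near the discontinuities does not vanish (the Gibbs phenomenon), which illustrates the approximation limit noted above.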

With respect to defining improved feature extraction for ML, the limitations imposed by traditional means for representing “signals” of interest will serve as a point of departure. For example, the Fourier (Series) approach represents signals of interest with constant coefficients for the basis functions, as can readily be recognized by the series expansion of e^{jx}. In what follows, improvements to these and other limitations will be

G.L. Viviani, VivatronX, Bastrop, TX USA 78602, e-mail: [email protected].

Fig. 1. Electronic Cognition

provided. What is generally lacking is that they do not account for the fact that recognizable “information” is truly a dynamic process. In what follows this will be more fully explained and taken into consideration.

So, the question becomes, what can be done differently, in order to provide improved feature extraction for ML applications? The answer to this question is the focus of this research.

The need for such improvements is widely recognized. The human is an optimal “learner,” able to recognize information and utilize it effectively. Humans remain our standard for success. The very recent publication [1] surveys and compares performance for several leading ML algorithms on the subject of improved learning. The conclusion (at least partly) is: “the results also show that feature extraction is a very important step in few-shot learning, therefore, how to fully and effectively use the global and local information of objects still needs to be studied.” Some of the details related to these evaluations are found in the following publications: [2], [3], [4], [5], [6], [7], [8], [9]. This is a simpler way of saying that information is not being accounted for adequately enough to achieve better performing machine learning.

This research focuses on what constitutes information in the context of ML. With this meaning, this effort will then show how to utilize it. As (12) confirms, information is necessarily a dynamical process. It represents the duality of entropy and energy [10]. Importantly, information is a time varying process [11]. Hence, understanding how to make use of dynamical information improves the basic means by which time varying recognition capabilities can result from an ML based approach to AI (Artificial Intelligence). Also outlined in [10] is the fact that an algorithm (alone) cannot create information [12].

This class of problems is large. For example, in what might be classical “target recognition” in a cluttered background, certain objects such as rocks, twigs or leaves hold very limited interest (information) compared to the information content of animals or vehicles. In an evolutionary sense, it may be as simple as the fact that inanimate objects are not the ones that can cause harm. There are probably deeper psychological reasons as well, such as those described in [13]. In the case of inanimate objects, it is logical that there ought to be less that is identifiable. Hence,


there is likely some difference in the way autonomous entities present themselves, with respect to recognition.

Essentially all machine derived measurements from which recognition can occur are based on the principles described by Fig. 1. In the case of an autonomous system, there is something internal that causes measurable emissions (electromagnetic, sound, smell, etc.), and these are almost always converted into an electronic form from which cognitive decisions are derived. The means for achieving decisions are varied; however, the fact that there is a channel to transfer information is virtually universal. It is worth mentioning that purely numerical approaches (simulated sensors) can also result in information transfer.

The point of departure for this present work will be to determine and implement algorithms that take account of useful energy (and associated mathematical) relationships that are present for a broad class of problems to which Machine Learning (ML) and Artificial Intelligence (AI) algorithms are applied. Algorithms cannot create information, but they can improve performance, as will be shown. The outcome of these observations will be an improved means for feature extraction.

The remainder of the paper is organized as follows:

• Derived (non-intuitive) new concepts related to energy and recognition for autonomous system information.
• The concept of a “stochastic backbone” will be introduced.
• A more concise description of a stochastic backbone will then be formulated in terms of another new concept, termed Information Braille (i-braille).
• A summary and conclusions will conclude the paper.

II. DERIVATION OF IMPROVED MEANS FOR RECOGNITION

As was outlined in [10], there is an unavoidable relationship between information and recognition. This will be further explained.

The Theorem of Noether [14] explains that a phase plane representation of a Hamiltonian (conservative) system must result in a dynamical representation that is a symmetrical closed orbit (curve). It must be periodic. This is a critical observation. When combined with those of Pendry [11] (for example), as outlined in [10], the only possible conclusion is that energy flows for a dynamical system with recognizable information must therefore be periodic. (Importantly, all information bearing systems are dynamical.)

Another useful description for a dynamical system is that it be observable. The concepts of observable and controllable have a strict mathematical interpretation in the context of control theory.

So, it is reasonable to assume that, in general, information bearing systems of interest must therefore be observable (assuming recognition is the primary goal).

The next description will be counter-intuitive in nature. If we recognize that the conservative Hamiltonian based system that is described by [14] must be periodic and symmetric (as is easily visualized in the phase plane), then for a suitably chosen energy function (in the phase plane), there will be an associated invariance (at least in energy). This is a consequence of the Mean Value Theorem. Importantly, such invariance can be determined in both time and space, or else spatiotemporally.

The recent work of Martinelli [15] confirms that if a given dynamical system contains something termed “continuous symmetry,” then it must have some degree of “unobservability.” In short, this means that there must be an inability to measure some state of the system based on observations of inputs and outputs of the system. Some additional details of these interrelationships are also explained in [16].

These described properties lead to the conclusion that in order to confirm, and therefore recognize, an autonomous system, at least one particular element of the state must be unobservable. Indeed, if we think in terms of an invariant phase plane representation for Hamiltonian autonomous systems, it is not necessarily possible to observe the period of the invariant closed orbit. This is significant, recalling that much of signal processing is based on Fourier methods for which a tacit understanding of the concept of a frequency is essential. Alternatively, it is possible that various systems with the same closed curve presentation, but operating at different periodicities, would appear to be the same. Other hidden parameters may also be present, as required by [15] when combined with the concept of the closed symmetric curve in the phase space as determined by [14].
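This point can be illustrated with a simple harmonic oscillator. The following is a minimal sketch (in Python, my own illustration rather than part of this formulation) showing that two oscillators with different periods trace the same closed, symmetric orbit once the velocity axis is normalized, so the period is not recoverable from the orbit’s shape alone.

import numpy as np

def phase_orbit(omega, n_samples=2000):
    """Sample (x, x_dot/omega) over one period of x'' + omega^2 x = 0."""
    t = np.linspace(0.0, 2.0 * np.pi / omega, n_samples)
    x = np.cos(omega * t)
    v = -omega * np.sin(omega * t)
    return np.column_stack([x, v / omega])    # normalized velocity axis

orbit_slow = phase_orbit(omega=1.0)
orbit_fast = phase_orbit(omega=5.0)

# Both point sets lie on the same unit circle even though the periods differ by 5x.
radii_slow = np.hypot(orbit_slow[:, 0], orbit_slow[:, 1])
radii_fast = np.hypot(orbit_fast[:, 0], orbit_fast[:, 1])
print(np.allclose(radii_slow, 1.0), np.allclose(radii_fast, 1.0))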

Hence, for an autonomous system to be recognizable, it must also be (somewhat) unobservable. While this may sound like a challenge, it is really a very exploitable clue that can be generally applied to recognition for any system of interest. Borrowing from [13], the objective is to “expose the ghost” in the system (which is inherently unobservable). This observation, alone, sets what follows apart from the more conventional approaches to “signal processing” associated with feature extraction.

A. Summary

Multiple phase plane observations of autonomous information bearing systems that depend upon invariant symmetric closed curves represent a basis set to provide an improved recognition result. More than one phase plane representation is necessary. If there were only one, then invariance could not be confirmed. Classically, characterizations like those associated with Fourier methods assume ensemble invariance (which is probably not realistic). These observations will depend on a measurement that is a function of something that is not observable (such as internal energy).

III. APPROACH

The approach will consider that there must be a closed phase plane representation of the dynamical nature of the (periodic) information, which will undoubtedly have some element that is unobservable (such as energy, or else frequency). Importantly, this suggests that efforts to couple the energy to the dynamics are not worth pursuing, as they are not necessary from a recognition perspective. This seems counterintuitive. Additionally, there must be an ability to separate (equivalent) phase plane representations in time and/or space to be able to discern that successive representations of phase


plane-based information are invariant (or not). The fidelity of the separation in time and/or space is a performance measure. A more comprehensive Markov-like description will be deferred to some follow-on research.

Separation in time and/or space necessarily implies another dimension, which suggests that it is necessary to have a minimum of 3D to be confident that something is recognizable. Since information is dynamic, and the associated closed curve is 2D, we need 3D to confirm invariance (or not). While we may not be able to discern the precise interrelationship of the unobservable energy of the system, we should at least be able to confirm that invariance in time and/or space must exist. As originally argued by Gabor [17], this will typically be electronically represented in a suitable voltage-current relationship (which is topologically equivalent to a phase plane).

The concept of phase plane closed curves can be viewed from a variety of perspectives, all of which have ML applications. For example, we can consider a small or large-scale distributed (electrical) signal creating network. A power grid represents such a system. In this case the phase plane representation of the nodal voltages in the network might be of interest. The information content would rest in the similarity (or not) of the phase planes associated with each node. Alternatively, something like radar returns could be measured. In this case, it would be interesting to assess similarity (or not) between measurements spaced in time from the same location, or perhaps spatially separated measurements of the same signal. In another variation, the associated closed curve in the phase plane could be considered for the case of measuring various frequency (wavelength) bands simultaneously (akin to spectroscopy). Space, time, or multidimensional networks all can be imagined containing phase plane indications of dynamical information flows of interest. Again, the goal is to utilize these various phase plane formulations for the purpose of recognizing invariance. Invariant information is the basis for learning. In what follows, references to various time/space separations of phase planes will be made. This is done to emphasize the generality of this approach.

Considering that many measurements can be traced to some propagation of electromagnetic fields, a visualization derived from Poincare [18] comes to mind [19]. The Poincare Sphere is a convenient means for presenting dynamical information about polarized electromagnetic radiation. It stems from the premise that

$$E(z, t) = \begin{bmatrix} E_H(z, t) \\ E_V(z, t) \end{bmatrix} = \mathbf{p}\, e^{j(\omega t - kz)} \tag{1}$$

$$\mathbf{p} = \begin{bmatrix} a_H\, e^{j\delta_H} \\ a_V\, e^{j\delta_V} \end{bmatrix} \tag{2}$$

represents a general plane harmonic wave propagating along the z-axis [20]. In this formulation, H = y and V = x denote the horizontal and vertical components, respectively. Here, the magnitude of p and the relative phase δ are given by:

$$|\mathbf{p}|^2 = a_H^2 + a_V^2, \tag{3}$$
$$\delta = \delta_V - \delta_H. \tag{4}$$

As can be found in any standard reference [21] on polarized E/M propagation, the horizontal and vertical phasors (assuming constant frequency) maintain a time-invariant polarization which is described by the vector p and δ.

The idea of a stochastic process associated with polarization of various E/M waves has value in the context that scattered radar returns are seen to result in an ensemble of polarized waves whose statistics are useful for identifying objects of interest. In the analog to a widely distributed circuit (a grid of nodal or else remote sensing observations), with known sets of “monochromatic” observations, it is proposed to consider the discrete observations of orthogonal components of a pseudo E/M wave (voltage and current) at each node in the network and to consider the relative magnitudes and phase of each complex phasor between simultaneous observations, at a particular node. Hence, the state of a particular node, j, will be determined as follows:

$$X_j(k) = \begin{bmatrix} a_V(k)\, e^{j\delta_V(k)} \\ a_I(k)\, e^{j\delta_I(k)} \end{bmatrix} \tag{5}$$

for each time epoch, k. Here, the subscripts V and I denote (pseudo) voltage and current, and they correspond to the x and y axes, respectively.

For each node in the network, the state of the system, at a given time, is characterized in Fig. 2. As can be discerned, this is a phase-plane formulation. The third dimension will be forthcoming.

The set C_j = {a_V(k), δ_V(k), a_I(k), δ_I(k)} of random variates will be associated with each node in the network, j. Or equivalently, this could be the same signal observed at different time epochs. Alternatively, multiple phase plane representations might result from similar signals observed over different ranges of frequencies. For example, at a particular node (by applying a suitable filter) a range of frequencies of interest could be characterized by a multiplicity of phase planes, as suggested by Fig. 6. This becomes even more apparent when stochastic variations are considered (stochastic effects will affect different phase planes differently, which will provide additional insight). Or, as is commonly observed in optics to be sufficient, the equivalent set of stochastic parameters is reduced to the set

$$\Psi = \{a_V(k),\, a_I(k),\, \delta(k)\} \tag{6}$$
$$\delta(k) = \delta_V(k) - \delta_I(k) \tag{7}$$

By analogy to Stoke’s parameters, [21], the following isdetermined and can be thought of as the elements of aquaternion (requires at most two phase planes to define) whichcharacterizes each observation:

$$s_0 = \langle a_V(k)^2 \rangle + \langle a_I(k)^2 \rangle \tag{8}$$
$$s_1 = \langle a_V(k)^2 \rangle - \langle a_I(k)^2 \rangle \tag{9}$$
$$s_2 = 2\,\langle a_V(k)\, a_I(k) \cos(\delta(k)) \rangle \tag{10}$$
$$s_3 = 2\,\langle a_V(k)\, a_I(k) \sin(\delta(k)) \rangle \tag{11}$$

These form a multi-dimensional framework for characterizing the stochastic nature of phase plane dynamics.


Fig. 2. Monochromatic Electromagnetic Signal. Any electromagnetic signal can be represented in a quadrature manner. The approach to such a representation varies with the pertinent circumstances and means for collection.

Fig. 3. Perfectly aligned vertebrae in “Normal” state (left) and variations that might be considered an “Abnormal” state (right). In this example the thick lines can be thought of as phase plane dynamics associated with a) multiple nodes in a network (system), b) the same observation at multiple time epochs that are regularly spaced, or c) similar phase planes observed at multiple frequency ranges, or d) any combination of these approaches, for phenomena of interest. In all cases, a phase plane representation is utilized. For something that is invariant and recognizable, this should result in a symmetric closed curve, as indicated. The ensemble of phase plane representations is what is “recognizable.”

Here, s_2 is directly related to real power, P, and s_3 is directly related to reactive (imaginary) power, Q, where the complex power (time varying energy) at each node is commonly specified as S = P + jQ. The intensities provided by s_0 and s_1 will be useful in determining a basis function. Note: the symbol ⟨·⟩ denotes an ensemble average. Importantly, these measurement power related parameters are not necessarily related to the underlying dynamical process which is being observed. A tacit assumption is being made that even if the nature of what is causing the radiation to emanate is not observable, if the object is recognizable, there should be periodicity in the measurements associated with the system of interest.
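As a concrete illustration, the parameters of (8)-(11) can be estimated from sampled node observations. The following is a minimal sketch (in Python, with hypothetical placeholder data rather than any measured values); the ensemble averages are approximated by sample means over the epochs k.

import numpy as np

rng = np.random.default_rng(0)
K = 512                                                # number of time epochs
a_V = 1.0 + 0.05 * rng.standard_normal(K)              # pseudo-voltage magnitudes a_V(k)
a_I = 0.8 + 0.05 * rng.standard_normal(K)              # pseudo-current magnitudes a_I(k)
delta = (np.pi / 4) + 0.02 * rng.standard_normal(K)    # delta(k) = delta_V(k) - delta_I(k), per (7)

def stokes_parameters(a_V, a_I, delta):
    """Estimate the ensemble averages of (8)-(11) as sample means."""
    s0 = np.mean(a_V**2) + np.mean(a_I**2)
    s1 = np.mean(a_V**2) - np.mean(a_I**2)
    s2 = 2.0 * np.mean(a_V * a_I * np.cos(delta))
    s3 = 2.0 * np.mean(a_V * a_I * np.sin(delta))
    return s0, s1, s2, s3

print(stokes_parameters(a_V, a_I, delta))   # one (s0, s1, s2, s3) observation for this node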

This is but one example formulation for representing a set of basis functions that would characterize the invariance (or not) of a system. Successive snapshots of observations could be utilized, as will be shown, for machine learning.

A. Stochastic Backbone Concept

If we consider that ellipses (or whatever closed curve in the plane is appropriate for the particular information process of interest) represent a suitably constructed ensemble of phase plane characterizations, then the vertebrae of a “backbone,” with each having its own shape and orientation as seen in Fig. 3, suggest a suitable 3D formulation for the recognition problem from which determinations are made.

The ensemble of measurements is like a backbone with vertebrae that are slowly changing shape and orientation at relatively constant locations in the z direction. Unusual events will be marked by a sudden twisting or “rupturing” of the individual vertebrae. Hence, a suitable control strategy should be composed of something that is able to detect sudden twisting or turning, or other shape distortions, as projected to the x-y plane.

The stochastic variations have been omitted from Fig. 3 for clarity.

B. Feature Extraction

It should be apparent that the motivation behind this effort is to create representative patterns that are more amenable to Machine Learning and Deep Learning approaches. In particular, the following assumptions are informative.

• As has been shown to be true in spectroscopy, a limited set of basis functions (wavelengths) is all that is necessary to make positive identifications. The basis functions can be thought of as associated with the vertebrae of Fig. 3. These stochastic variations of the system of signals are easily discernible, one from the other.

• The backbone provides a recognizable closed curve phase plane (set of) shape(s) that is necessary to account for an information generating autonomous system. Invariance becomes a property (or not) of this ensemble. This approach is also extendable in an incremental way.

• Pattern recognition can be thought of in terms of ellipses and a sheath around the ellipses. Such time varying events will give rise to determining the nature of pertinent events associated with observations at multiple wavelengths, or else distributed across pertinent geographic regions of known locations, or else across space at known time epochs.

Fig. 3 only shows a comparison of “good” versus “bad.” However, extending this approach to include a multiplicity of recognizable patterns is straightforward. Importantly, given the information-based way the “stochastic backbone” has been constructed, any invariant (or at least slowly varying) and recognizable “pattern” is representative of a state of information that is being observed. Again, there is no need to understand the details of how these dynamical properties are coupled to the internal dynamics of the autonomous system under consideration, even though it is recognizable.

1) Illustration: A program was written to create stochastically varying ensembles of ellipses of Type “0” and Type “1” shown in Fig. 4. In the Type 0 case the stochastic variations were subtle, and in the case of Type 1, they were more pronounced. The idea was to make an ML classifier by two different methods (in order to compare performance).

• Numerical Features: The equation for an ellipse was broken down into its 3 descriptive parameters, and a feature vector of length 21 (3 × 7) was constructed for each set of 7 ellipses (as illustrated in Fig. 4) in the training set. Each set of 7 ellipses was considered as one numerical feature vector and classified with a neural net. This neural net and its performance are described in Fig. 5. (A sketch of this feature construction is given at the end of this subsection.)


Fig. 4. Randomly selected training set examples of “Good” and “Bad” stochastically varying ensembles of phase plane closed curves.

# Keras (TensorFlow backend assumed) classifier for the length-21 elliptical feature vectors
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(42, input_dim=21, activation='relu'))
    model.add(Dense(21, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

Fig. 5. Stochastic Backbone ROC curves for elliptical numerical features. Different amounts of training are indicated for the set of samples illustrated in Fig. 4.

• CNN Features: The same set of ellipses was represented as images, just as shown in Fig. 4, for the exact same considerations used in the Numerical Features set. In this case a convolutional neural network (CNN) trained with images (as opposed to numerical features) was created. This CNN algorithm is not shown (for simplicity). It took more iterations to achieve similar levels of performance to those indicated in Fig. 5. It is possible to train with images, but the next sections will focus more on numerical feature sets, which are more precise.

For a realistic scenario, it is unclear whether the measured phase plane observations would be numerically determined or else in the form of images. For the case of very pronounced stochastic variations (much more pronounced than shown in Fig. 4), discerning Type 0 versus Type 1 is quite easy. As it turns out, even for very subtle variations between the two different types, as shown in Fig. 4, essentially 100% accuracy was achieved relatively easily for either numerical feature sets, or else image-based ones. Whatever approach is most convenient should be applicable for discerning invariance (or not) in order to evaluate autonomous dynamical information bearing systems of interest.
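The numerical feature construction can be sketched as follows (in Python). This is my own reconstruction rather than the program used for Fig. 4 and Fig. 5; the choice of semi-axes and orientation as the 3 descriptive parameters, and the jitter magnitudes and sample counts, are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def random_ellipse_params(jitter):
    """One ellipse reduced to 3 descriptive parameters: semi-axes (a, b) and orientation theta."""
    a = 1.0 + jitter * rng.standard_normal()
    b = 0.5 + jitter * rng.standard_normal()
    theta = 0.1 * np.pi + jitter * rng.standard_normal()
    return np.array([a, b, theta])

def backbone_feature_vector(jitter, n_vertebrae=7):
    """Stack 7 ellipses (vertebrae) into a single length-21 feature vector."""
    return np.concatenate([random_ellipse_params(jitter) for _ in range(n_vertebrae)])

X_type0 = np.stack([backbone_feature_vector(0.02) for _ in range(200)])   # subtle variation
X_type1 = np.stack([backbone_feature_vector(0.10) for _ in range(200)])   # pronounced variation
X = np.vstack([X_type0, X_type1])
y = np.concatenate([np.zeros(200), np.ones(200)])
print(X.shape, y.shape)   # (400, 21) feature vectors suitable for build_model() above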

IV. AUTONOMOUS INFORMATION BRAILLE

What has been revealed so far regarding the stochastic backbone concept is indeed new and interesting. However, it does not seem reasonable that in all cases an “elliptical” type of representation will be possible. If it is, and it is more convenient, then there is no reason not to employ it. Alternatively, an even more powerful representation is proposed, as will be described. This can be thought of as an alternative to Fourier methods that is more suitable for ML and the representation of autonomous systems, which require periodic information flows. This approach will not be based on an “assumption” of periodicity, as the flows will already be symmetric closed curves. Instead, we will make use of a segment of arc length π for the closed contour of interest. This is suitable for any closed symmetric curve of interest.

Appendix VIII describes an approach to generalizing Euler’s Identity, which is termed Instantaneous Spectral Analysis (ISA) [22]. As can be easily verified, Fourier methods can be thought of as an application of Euler’s (conventional) Identity. In [22], the concept of a function of the form e^{t i^{(2^{2-m})}}, which is a more general form of Euler’s Identity, is presented.


Fig. 6. Multiple manners of representing potentially invariant dynamical processes. The left-hand column shows phase plane indications of periodic functions. The right-hand column illustrates the associated frequency domain representation. The central column indicates a new approach for representing potentially invariant dynamical periodic processes. This is termed Information Braille (i-braille) and is derived from [22]. The values displayed, for i-braille, correspond to time varying coefficients associated with unique basis functions derived from (21).

This concept, which is fully described in [22], leads to (21). As described by (21), a general (2^M − 1) order polynomial, p(t), is represented by a series expansion where the coefficients of complex sinusoids (spirals) are time varying. Importantly, there is no underlying assumption of periodicity regarding the polynomial, p(t). As seen in Fig. 6, the series coefficients have a mean and variance (standard deviation). This can also be surmised by investigating (19).

From an ML perspective, the resultant patterns for the coefficients defined in (21) for a general polynomial, p(t), are very distinctive and discrete. They have a resemblance to braille, as shown in the central column of Fig. 6. Hence, they are termed Information Braille (i-braille). Importantly, they are more discerning and specific (for a polynomial segment) than something like a Fourier representation, as also seen in Fig. 6. For a periodic information flow that corresponds to a closed symmetric curve in the phase plane, they provide an excellent way to perform feature extraction. These parameters can be applied in much the same manner as the elliptical parameters in the previous “Numerical Features” example, providing a more general means to the same capability.

V. RESULTS

Fig. 6 illustrates the fact that a periodic closed curve in the phase plane is not necessarily associated with a single peak in the frequency domain. As previously described, some systems are not well represented as “spikes” in the frequency domain. In Fig. 6, the Van der Pol oscillation is topologically very similar to a sinusoid, but the differences between the two are difficult to discern in the frequency domain. In the middle column of Fig. 6, the equivalent i-braille representation for the same periodic information flows is shown. As can be seen, the i-braille representation displays a dramatically different “pattern” for each of the periodic (simulated) information flows. This

# Classifier for the 18-element i-braille feature vectors (see Fig. 7);
# the enclosing build_model() wrapper and imports are assumed, mirroring Fig. 5.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(18, input_dim=18, activation='relu'))
    model.add(Dense(18, activation='relu'))
    model.add(Dense(9, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

Fig. 7. Stochastic Backbone (elemental) ROC curves for different amounts of training with Van der Pol orbits of µ = 1.0 vs µ = 1.6. Each i-braille pattern is characterized by 18 features corresponding to the means and standard deviations shown in the central column of Fig. 6.

characteristic is critical to assure discernible feature extraction.

The i-braille representation is more amenable to computer processing. The differences between one closed curve and another are more discernible with i-braille. Perhaps more importantly, the i-braille pattern is more practical than either


the time or frequency descriptions, or both. As previously shown, both an image-based and an analytical representation for the feature sets of the vertebrae of a stochastic backbone were effective. However, the analytical representation was more precise. The i-braille approach is applicable to any closed curve in the phase plane and is more general than the elliptical features previously illustrated.

The ROC curves for an i-braille ML binary comparison are shown in Fig. 7. As can be extrapolated from Fig. 6, utilization of conventional Fourier methods will reveal little discernible difference between a Van der Pol phase plane for µ = 1.0 versus µ = 1.6. Recall that the Van der Pol equation is ẍ + µ f(x) ẋ + x = 0. A training set with 170 training and 30 test feature sets was created, each a (2^M + 2 = 18) length feature vector composed of the means and standard deviations of the unique time varying coefficients of (21). Two i-braille types of coefficients were created: Type 0 corresponds to the case µ = 1.0 + 0.1 · Normal(0, 1) (similar to that shown in Fig. 6) and Type 1 to µ = 1.6 + 0.1 · Normal(0, 1). To simulate variability, random noise equal to 0.1 · Normal(0, 1) was added to each µ value before the phase plane curve was calculated. As shown in Fig. 7, performance is excellent. More pronounced differences are even easier to discern. It is not clear that conventional approaches would even be able to discern these differences. Hence, the i-braille approach is ideally suited for discerning fine detail between competing forms of information.
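The generation of the two classes of orbits can be sketched as follows (in Python). This is a minimal illustration under stated assumptions rather than the code used for Fig. 7: the standard Van der Pol form with f(x) = x^2 − 1 is assumed, scipy's solve_ivp is used as a convenient integrator, and the initial conditions and durations are illustrative.

import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)

def van_der_pol_orbit(mu, t_max=50.0, n_samples=2000):
    """Integrate x'' - mu*(1 - x^2)*x' + x = 0 and return the settled (x, x') orbit."""
    def rhs(t, state):
        x, v = state
        return [v, mu * (1.0 - x**2) * v - x]
    t = np.linspace(0.0, t_max, n_samples)
    sol = solve_ivp(rhs, (0.0, t_max), [1.0, 0.0], t_eval=t, rtol=1e-8)
    return sol.y[:, n_samples // 2:]            # discard the transient, keep the limit cycle

def sample_orbit(label):
    base_mu = 1.0 if label == 0 else 1.6        # Type 0 vs Type 1
    mu = base_mu + 0.1 * rng.standard_normal()  # stochastic mu, as in the text
    return van_der_pol_orbit(mu)

orbit0 = sample_orbit(0)
orbit1 = sample_orbit(1)
print(orbit0.shape, orbit1.shape)               # phase plane samples for i-braille feature extraction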

With what has been described, the primary results of this research are reduced to the following algorithms:

Algorithm 1 I-Braille (Feature Set)

INPUT: π-length arc, s(t), from a phase plane
OUTPUT: ML features, F(k) : k = 1, 2, ..., 2^{M-1}, from (21)

Fit a polynomial, p(t), of order (2^M − 1) to s(t)
Determine (18) for p(t)
Apply (17) to prepare for the next step
Use (21) to determine the F(k)'s, the i-braille coefficients

Algorithm 2 Stochastic Backbone (Ensemble of Feature Sets)

INPUT: β π-length arcs, s_η(t), η = 1, 2, ..., β, one per phase plane
OUTPUT: β feature sets, F_η(k) : k = 1, 2, ..., 2^{M-1}; η = 1, 2, ..., β, from (21)

η = 1
while η ≤ β do
    F_η = Call Algorithm 1
    η++
Return F_1, F_2, ..., F_β

Algorithm 1 and Algorithm 2 summarize the main developments of this research. A means for presenting an ensemble of pertinent ML features for autonomous information bearing systems is provided.
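A minimal Python scaffold of these two algorithms is sketched below. It is intended only to show the control flow: ibraille_coefficients() is a hypothetical placeholder for steps 2-4 of Algorithm 1 (the ISA expansion of (21) is not reproduced here), and only the polynomial fit and the ensemble loop are made concrete.

import numpy as np

POLY_ORDER = 15   # (2^M - 1) with M = 4, consistent with the 15th order fits in Appendix B

def ibraille_coefficients(poly_coeffs):
    """Hypothetical placeholder for steps 2-4 of Algorithm 1: map p(t) to the F(k) of (21)."""
    raise NotImplementedError("ISA expansion per (17), (18) and (21) goes here")

def algorithm_1(t, s):
    """Algorithm 1: fit p(t) of order 2^M - 1 to a pi-length arc s(t), then expand per (21)."""
    poly_coeffs = np.polynomial.polynomial.polyfit(t, s, POLY_ORDER)
    return ibraille_coefficients(poly_coeffs)

def algorithm_2(arcs):
    """Algorithm 2: apply Algorithm 1 to each of the beta phase plane arcs."""
    return [algorithm_1(t, s) for (t, s) in arcs]   # F_1, F_2, ..., F_beta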

VI. CONCLUSION

This work demonstrates that information must be represented by a periodic dynamical process. Only in the phase plane, for such a periodic process, is it simple to discern invariance that corresponds to the associated two degrees of freedom [17]. Such a representation of information is consistent with what is expected to characterize an autonomous system of interest. An autonomous system is a particular example of a means for generating information. It was also shown that there will always be some element of an autonomous system that is not observable, but this does not hamper the ability to recognize it.

The idea of utilizing phase planes to better characterize the feature space for ML is introduced.

An i-braille feature space representation, based on concepts found in [22], is applied to invariant measures of information in the phase plane. Such an approach allows for a significant ability to discern fine details between similar objects of interest. This is ideally suited for ML because it provides metrics that characterize the dynamical nature of information, which is a key requirement to improve feature extraction.

The stochastic dynamics for information bearing systems of interest are represented by multiple feature sets, in order that recognition (as well as decision-making) is possible. This is accomplished by taking a multiplicity of phase plane (information bearing) representations. Applications of these capabilities are varied and suitable for machine-based cognitive applications. This approach is termed a Stochastic Backbone, which is the result of a multiplicity of phase plane feature sets.

Algorithms for both the i-braille and stochastic backbone concepts are provided in Algorithm 1 and Algorithm 2, respectively. The performance with regard to ML is described in Fig. 7.

Importantly, the connection between dynamical systems and the means by which information is recognizable has resulted in a practical means for achieving improved ML feature extraction. This approach is not generally known in the literature.

A future effort will show how to apply this method in the case of autonomous objects in a cluttered background. Additionally, if this present work is coupled to that of [10], then a powerful quantized approach to processing such results can also be achieved.

REFERENCES

[1] S. Yang, F. Liu, N. Dong, and J. Wu, “Comparative analysis on classical meta-metric models for few-shot learning,” IEEE Access, vol. 8, pp. 127065–127073, 2020.

[2] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science, vol. 350, pp. 1332–1338, December 2015.

[3] G. Koch, R. Zemel, and R. Salakhutdinov, “Siamese neural networks for one-shot image recognition,” Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015.

[4] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra, “Matching networks for one shot learning,” 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 2016.

[5] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. S. Torr, and T. M. Hospedales, “Learning to compare: Relation network for few-shot learning,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 1199–1208.

[6] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017.

[7] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, “Meta-learning with memory-augmented neural networks,” Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016.


[8] A. Graves, G. Wayne, and I. Danihelka, “Neural Turing machines,” 2014.

[9] A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning algorithms,” 2018.

[10] G. L. Viviani, “Information devices based on quantized Lienard-Hermite oscillators,” IEEE Transactions on Molecular, Biological and Multi-Scale Communications, vol. 6, no. 2, pp. 81–92, 2020.

[11] J. B. Pendry, “Quantum limits to the flow of information and entropy,” Journal of Physics A: Mathematical and General, vol. 16, no. 10, p. 2161, 1983. [Online]. Available: http://stacks.iop.org/0305-4470/16/i=10/a=012

[12] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 2006.

[13] A. Koestler, The Ghost in the Machine. Hutchinson and Company, London, 1967.

[14] E. Noether, “Invariant variation problems,” translation of the original article “Invariante Variationsprobleme,” Nachr. d. Konig. Gesellsch. d. Wiss. zu Gottingen, Math-phys. Klasse, pp. 235–257, 1918; arXiv:physics/0503066v1 [physics.hist-ph], 8 Mar 2005.

[15] A. Martinelli, Observability: A New Theory Based on the Group of Invariance. SIAM (Society for Industrial and Applied Mathematics), Philadelphia, 2020.

[16] ——, “Overdamped 2D Brownian motion for self-propelled and non-holonomic particles,” Journal of Statistical Mechanics: Theory and Experiment, p. P03003, 2014, doi: 10.1088/1742-5468/2014/03/P03003.

[17] D. Gabor, “Theory of communication (parts I, II and III),” Journal of the Institute of Electrical Engineers, vol. 93, pp. 429–457, 1946.

[18] H. Poincare, “Theorie mathematique de la lumiere,” 1892.

[19] G. L. Viviani, “Discrete stochastic process monitoring for large-scale distributed circuits,” IEEE International Conference on Systems, Man and Cybernetics, pp. 774–778, 1992.

[20] D. Giuli, “Polarization diversity in radars,” Proceedings of the IEEE, vol. 74, no. 2, 1986.

[21] M. Born and E. Wolf, Principles of Optics. Pergamon Press, 1980.

[22] J. Prothero, K. M. Zahidul Islam, H. Rodrigues, L. Mendes, J. Barrueco, and J. Montalban, “Instantaneous spectral analysis,” Journal of Communication and Information Systems, vol. 34, no. 1, pp. 12–26, 2019. [Online]. Available: https://doi.org/10.14209/jcis.2019.2

VII. APPENDIX A

Information, as described in [10], is consistent with the fact that information is a dynamical process. Pendry has shown that even quantum dynamical systems adhere to information/energy flows described by Lemma 7.1.

Lemma 7.1 (Pendry [11]): For a given channel, there is an inequality between the information flow, dI/dt, and the energy flow, dE/dt:

$$\left(\frac{dI}{dt}\right)^{2} \le \frac{\pi\left(\frac{dE}{dt}\right)}{3\,\hbar\,(\ln 2)^{2}} \tag{12}$$

Proof: See main result of [11].
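For a sense of scale, (12) can be evaluated numerically. The following minimal sketch (in Python, with an assumed channel power of 1 nW chosen purely for illustration) computes the corresponding upper bound on the information flow in bits per second.

import math

hbar = 1.054571817e-34    # reduced Planck constant, J*s
dE_dt = 1e-9              # assumed energy flow (channel power), W

# (dI/dt)^2 <= pi * (dE/dt) / (3 * hbar * (ln 2)^2), per (12)
dI_dt_max = math.sqrt(math.pi * dE_dt / (3.0 * hbar)) / math.log(2.0)
print(f"maximum information flow ~ {dI_dt_max:.2e} bits/s")   # about 4.5e12 bits/s for 1 nW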

VIII. APPENDIX B

This Appendix outlines the major contributions and determinations developed in [22]. In [22], more detailed explanations are found.

As is well known, Euler’s Identity allows for a relationshipbetween an exponential and trigonometric expression expressedin complex form,

$$e^{it} = \cos(t) + i\sin(t). \tag{13}$$

There is also a series expression for both sin(·) and cos(·). In [22], the notion of a more general exponential function, e^{t i^{(2^{2-m})}}, is introduced. This can be thought of as a more general Euler’s Identity. This in turn leads to more general series expressions which are a cousin to the well-known Fourier Series for representing functions. Unlike the Fourier Series expressions, the ISA (Instantaneous Spectral Analysis) series

expression involves basis functions that are “spirals” as opposed to sinusoids. With an ISA formulation, any polynomial, p(t), can be represented by (21). NOTE: (19), (20) and (21) are equivalent and provide different representations, for clarity. (21) is the representation from which the i-braille coefficients are determined.

The following equations represent the significant results from [22]. In the present research, curved line segments are represented by 15th order polynomials, which are consistent with the findings in [22].

$$e^{t i^{(2^{2-m})}} = \sum_{n=0}^{\lceil 2^{m-1}\rceil - 1} i^{\,n(2^{2-m})}\, \psi_{m,n}(t), \qquad m = 0, 1, \dots, M \tag{14}$$

$$\psi_{m,n}(t) = \sum_{q=0}^{\infty} \frac{(-1)^{q\lceil 2^{1-m}\rceil}\; t^{\,q\lceil 2^{m-1}\rceil + n}}{\left(q\lceil 2^{m-1}\rceil + n\right)!} \tag{15}$$

$$E_{m,n}(t) = \frac{1}{\lceil 2^{m-1}\rceil} \sum_{p=0}^{\lceil 2^{m-1}\rceil - 1} i^{\,-n(2p+1)2^{2-m}}\; e^{t i^{(2p+1)2^{2-m}}} \tag{16}$$

$$E_{m,n}(t) = \psi_{m,n}(t) \tag{17}$$

$$p(t) = \sum_{m=0}^{M} \sum_{n=0}^{\lceil 2^{m-1}\rceil - 1} c_{m,n}\, \psi_{m,n}(t) \tag{18}$$

In (14)-(21), T is the duration over which the polynomial of interest is valid and (2^M − 1) is the order of the polynomial, p(t). For convenience, T has been normalized to 1, as working in 2π-radians is desired. Closed curves in the plane are invariant with respect to the frequency of operation of the system. This formulation, for nonlinear periodic functions of interest, is much more precise than other known metrics, as described in Fig. 6. Hence, the coefficients of (20), which vary in cardinal number depending upon the order of the polynomial that is chosen to represent p(t), will form the feature space for ML training, to recognize samples associated with a particular phase plane’s characterization of information. As shown in Fig. 6 (keep in mind the multiplier on the y-axis), the separation in feature space is very broad. Hence, neural net performance is likely to be excellent.

NOTE: All the coefficients of (21) are time varying and symmetric about a mean value, as shown in the central column of Fig. 6.
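The first processing step, fitting the 15th order polynomial p(t) to a π-length arc taken from a phase plane curve, can be sketched as follows (in Python). The sampled arc here is a hypothetical placeholder, and this sketch covers only the fit that precedes the ISA expansion of (19)-(21).

import numpy as np

t = np.linspace(0.0, np.pi, 400)                        # pi-length parameter interval
arc = np.cos(t) + 1j * np.sin(t)                        # placeholder phase plane arc, x(t) + j*y(t)

coeffs = np.polynomial.polynomial.polyfit(t, arc, 15)   # p(t) of order 2^M - 1 = 15
p_t = np.polynomial.polynomial.polyval(t, coeffs)
print(np.max(np.abs(arc - p_t)))                        # residual of the fit over the arc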


$$p(t) = \sum_{m=0}^{M} \sum_{n=0}^{\lceil 2^{m-1}\rceil - 1} \overbrace{\frac{c_{m,n}}{\lceil 2^{m-1}\rceil}}^{\text{constant}} \sum_{p=0}^{\lceil 2^{m-1}\rceil - 1} \overbrace{i^{\,-n(2p+1)2^{2-m}}}^{\text{constant}}\; \overbrace{\overbrace{e^{t\cos\left(\pi(2p+1)2^{1-m}\right)}}^{\text{real-valued exponential}}\; \overbrace{e^{it\sin\left(\pi(2p+1)2^{1-m}\right)}}^{\text{complex sinusoid}}}^{\text{complex spiral}} \tag{19}$$

$$p(t) = \overbrace{\left(c_{0,0}e^{t} + c_{1,0}e^{-t}\right)\cdot 1}^{f=0\ \text{(DC)}} + \overbrace{\left(c_{2,0}\tfrac{1}{2} + c_{2,1}\tfrac{i}{2}\right)e^{it}}^{f=\frac{1}{T}\ \text{(maximum)}} + \overbrace{\left(c_{2,0}\tfrac{1}{2} - c_{2,1}\tfrac{i}{2}\right)e^{-it}}^{f=-\frac{1}{T}\ \text{(minimum)}} + \overbrace{\sum_{m=3}^{M} \sum_{n=0}^{\lceil 2^{m-1}\rceil - 1} c_{m,n}\, E_{m,n}(t)}^{-\frac{1}{T} < f < \frac{1}{T}} \tag{20}$$

$$p(t) = \overbrace{\left(c_{0,0}e^{t} + c_{1,0}e^{-t}\right)\cdot 1}^{f=0\ \text{(DC)}} + \overbrace{\left(c_{2,0}\tfrac{1}{2} + c_{2,1}\tfrac{i}{2}\right)e^{it}}^{f=\frac{1}{T}\ \text{(maximum)}} + \overbrace{\left(c_{2,0}\tfrac{1}{2} - c_{2,1}\tfrac{i}{2}\right)e^{-it}}^{f=-\frac{1}{T}\ \text{(minimum)}} + \overbrace{\sum_{m=3}^{M}\; \sum_{p=3\cdot 2^{m-3}}^{5\cdot 2^{m-3}-1}\; \sum_{n=0}^{\lceil 2^{m-1}\rceil - 1} \frac{c_{m,n}}{2^{m-1}} \left( e^{t\cos\left(\pi\left(2\langle p\rangle_{2^{m-1}}+1\right)2^{1-m}\right)}\, i^{\,n\left(2\langle p\rangle_{2^{m-1}}+1\right)2^{2-m}} + e^{-t\cos\left(\pi\left(2\langle p\rangle_{2^{m-1}}+1\right)2^{1-m}\right)}\, i^{\,n\left(2\left(-p+3\cdot 2^{m-2}\right)+1\right)2^{2-m}} \right) e^{-it\sin\left(\pi\left(2\langle p\rangle_{2^{m-1}}+1\right)2^{1-m}\right)}}^{-\frac{1}{T} < f_{\text{unique}} < \frac{1}{T}\ \ \forall\ \text{unique} = 2, 3, \dots, 2^{M-1}-3} \tag{21}$$

where ⟨·⟩_N denotes a modulo N operation.