Thursday, July 10, 2014

Design Document: Transforming YAML to OWL

I have begun a project[1] in which one technical goal is to transform a set of YAML models into a set of OWL ontology models. The YAML models are a set of information items, all both legal and public, about the structure and the proceedings of the United States Congress. A further goal, to be approached iteratively, is to define a methodology for translating that YAML model into a single ontology model, which would likely be stored as separate files, one for each of:
  • An ontology about the United States Congress as a whole, including:
    • The United States House of Representatives
    • The Senate of the United States of America
    • Committees of the House and of the Senate
    • Sessions of the Congress
    • Leadership Roles in the Structure of the Congress
    • Congressional Membership
  • Political Parties in the US
  • US States
  • Congressional Districts, in Each Congress (in a model organized into one file to each US state)
  • Formal public profiles about members of the Congress (GovTrack, Washington Post, etc.), i.e., "social media"

Some Caveats

Caveat 1: Not About Lobbying

This project will not venture into the "bling-bling" or, alternately, "hot-button" topic of campaign financing, neither at the scale of the PAC nor at the scale of any manner of crowdsourcing. Campaign financing is already addressed thoroughly in a number of existing web information resources. This project will focus, rather, on the formal structure and the proceedings of the US Congress, with the aim of developing a structural model of the Congress, constrained to the model syntax of the Web Ontology Language (OWL), that may be referenced in the development of web content resources. The principal purpose is to foster understanding among US citizens of the Congress as an entity representative of the United States, and to foster citizen involvement in the democratic processes of the US federal government.


Caveat 2: Not POTUS or SCOTUS

Specifically, this project will not focus on the US federal Executive Branch, nor on the US Judicial Branch.


Caveat 3: Congressional Record Annotation Engine?

This project may later develop a model for processing documents published in the formal Congressional Record. The aim would be to apply any number of topically focused models (e.g., AGROVOC) in generating a machine-produced topical annotation layer over the Congressional Record, applying existing expert knowledge models to foster a broader public understanding of the topics addressed in the Record.

Ed. note: For this item specifically, it should be considered whether and how an ontology-focused model could extend the functionality available from a conventional flat, full-text search engine. That analysis should thoroughly address concepts of the ontological knowledge model, such as: Knowledge and Information Structure; Inference and Entailment; Subsumption; Extensibility; and the Open Universe Model.

Technical Outline: YAML to OWL

An information model in YAML structured markup format may be transformed into an instance of an OWL ontology model, using one or more programming languages. This project will use Java.

Implementation note: The processing of the YAML input stream could be managed with SnakeYAML. Serialization to OWL could be performed via OWL API or Apache Jena.

Deriving Instances, Assigning OWL Classes

It should be noted that the input YAML data would not define the entire ontology itself. Rather, the YAML data would be used to derive OWL individuals, each an instance of one or more OWL classes from an existing ontology.

The pairing between YAML and OWL may be implemented with a structured table, initialized exactly once at runtime, that assigns one or more OWL classes C1..Cn to a single, structured, ad hoc YAML input path P. In the processing model applied to the YAML input, each node in the input may then be scanned against each P in the table, generating a set of OWL individuals M1..Mn such that each M is an instance of the classes C1..Cn assigned to the matching P.

Step 1: Initializing XML DOM Nodes from YAML

That initial scan may be accomplished by first initializing, at runtime, an XML Document Object Model (DOM) document derived from the input YAML model. The YAML model then serves as the source of instance data for the DOM document, and the DOM document provides a programmatically useful structural model for that instance data.
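The YAML-to-DOM step might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes the YAML input has already been parsed into nested java.util.Map and java.util.List values (the shape a YAML parser such as SnakeYAML typically produces), that every mapping key is a valid XML element name, and that list entries are wrapped in an element named "item". The class name YamlToDom, the "item" convention, and the sample keys and values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class YamlToDom {
    /** Builds a DOM document whose root element wraps the parsed YAML value. */
    static Document toDocument(String rootName, Object yamlValue) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        append(doc, root, yamlValue);
        return doc;
    }

    // Mappings become child elements named by key (assumed to be valid XML
    // names), sequences become repeated "item" elements, scalars become text.
    private static void append(Document doc, Element parent, Object value) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                Element child = doc.createElement(String.valueOf(entry.getKey()));
                parent.appendChild(child);
                append(doc, child, entry.getValue());
            }
        } else if (value instanceof List) {
            for (Object item : (List<?>) value) {
                Element child = doc.createElement("item");
                parent.appendChild(child);
                append(doc, child, item);
            }
        } else if (value != null) {
            parent.setTextContent(String.valueOf(value));
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for parsed YAML.
        Map<String, Object> member = new LinkedHashMap<>();
        member.put("id", "X000001");
        member.put("terms", List.of(Map.of("type", "sen", "state", "ND")));
        Document doc = toDocument("legislators", List.of(member));
        System.out.println(doc.getDocumentElement()
                .getElementsByTagName("state").item(0).getTextContent());
    }
}
```

Keeping the conversion this generic means the structure of the DOM follows the structure of the YAML input directly, so any path written against the YAML model carries over to the DOM with only the "item" wrapper added for sequences.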

The OWL-class-to-input-data table {{P1, C1..Cn}, ...} may then be implemented with each P being an XPath expression. There are two options for the C entries. In the first, "YAML to OWL," each Cn is a URI referencing a single OWL class. In the second, "YAML to Java," exactly one C is assigned to each P, with C denoting a fully qualified Java class name. This processing model will prefer the latter implementation. In the "YAML to Java" implementation, each C may then make reference to a set C'1..C'n, each denoting an OWL class.
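As an illustration of such a table, here is a minimal sketch using the standard javax.xml.xpath API. It shows the URI-valued ("YAML to OWL") variant for brevity; the "YAML to Java" variant would store one Java class name per row instead. The class name PathClassTable, the namespace http://example.org/congress#, the element names in the sample document, and the OWL class names are all hypothetical and not taken from the project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathClassTable {
    static final String NS = "http://example.org/congress#"; // hypothetical namespace

    /** One matched DOM node together with the class URIs assigned by its table row. */
    static final class Match {
        final Node node;
        final List<String> classUris;
        Match(Node node, List<String> classUris) { this.node = node; this.classUris = classUris; }
    }

    // Row order is preserved: XPath expression P -> class URIs C1..Cn.
    private final Map<String, List<String>> rows = new LinkedHashMap<>();

    void map(String xpath, String... classUris) {
        rows.put(xpath, List.of(classUris));
    }

    /** Evaluates every row against the document; returns one Match per (row, node) hit. */
    List<Match> classify(Document doc) {
        XPath xp = XPathFactory.newInstance().newXPath();
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            NodeList hits;
            try {
                hits = (NodeList) xp.evaluate(row.getKey(), doc, XPathConstants.NODESET);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + row.getKey(), e);
            }
            for (int i = 0; i < hits.getLength(); i++) {
                matches.add(new Match(hits.item(i), row.getValue()));
            }
        }
        return matches;
    }

    /** Helper for the demo: parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        PathClassTable table = new PathClassTable();
        table.map("/legislators/item[terms/item/type='sen']", NS + "Senator");
        Document doc = parse("<legislators><item><id>A</id><terms><item>"
                + "<type>sen</type></item></terms></item></legislators>");
        for (Match m : table.classify(doc)) {
            System.out.println(m.node.getNodeName() + " -> " + m.classUris);
        }
    }
}
```

The table is populated once and then reused for every input document, which matches the "initialized exactly once" constraint described earlier.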


Step 2: Initializing Java Objects from XML

In the "YAML to OWL" approach, C1..Cn may each be defined as a URI, with each URI denoting a specific OWL class within an input types ontology. That types ontology may then be initialized within the Java runtime, in the respective OWL engine (OWL API or Apache Jena, for instance), at any time before the assignment of data properties.

In either the "YAML to OWL" or the "YAML to Java" approach, a loose coupling should be implemented between the respective Java class and the set of OWL classes.


Step 2.1: Deriving Java Class Instances from Input DOM Nodes

As an alternative to the "YAML to OWL" methodology described in this article, and in order to make effective use of Java method overriding within the processing model, each C may be defined as a single Java class, with C then serving effectively as a container of the OWL class URIs C'1..C'n.
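A minimal sketch of that arrangement follows, assuming a hypothetical base class Derived whose subclasses each report the OWL class URIs they stand for, and a reflective factory keyed on the Java class name stored in the table. The class names, the namespace, and the OWL class names are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;

class JavaClassMapping {
    static final String NS = "http://example.org/congress#"; // hypothetical namespace

    /** Base type for Java objects N' derived from input DOM nodes N. */
    abstract static class Derived {
        final String key; // key code of the source node in the input model
        Derived(String key) { this.key = key; }

        /** The OWL class URIs C'1..C'n carried by this Java class. */
        abstract List<String> owlClassUris();
    }

    static class MemberOfCongress extends Derived {
        MemberOfCongress(String key) { super(key); }

        @Override
        List<String> owlClassUris() {
            return List.of(NS + "MemberOfCongress");
        }
    }

    static class Senator extends MemberOfCongress {
        Senator(String key) { super(key); }

        // Method overriding lets a subclass extend the inherited URI set.
        @Override
        List<String> owlClassUris() {
            List<String> uris = new ArrayList<>(super.owlClassUris());
            uris.add(NS + "Senator");
            return uris;
        }
    }

    /** Instantiates the Java class named in a table row for one input node. */
    static Derived instantiate(String className, String key) {
        try {
            return (Derived) Class.forName(className)
                    .getDeclaredConstructor(String.class)
                    .newInstance(key);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Derived d = instantiate(Senator.class.getName(), "X000001"); // hypothetical key
        System.out.println(d.getClass().getSimpleName() + " -> " + d.owlClassUris());
    }
}
```

Because the OWL URIs live behind a method rather than in the table, the table only needs to name one Java class per path, and the coupling between Java classes and OWL classes stays loose: the URI set can change without touching the table.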


Step 2.2: Assigning OWL Properties 

... specifically, Object Properties and Datatype Properties
... to each OWL Individual Instance Derived from the DOM model

After each Java object N'1..N'n has been initialized, each as an instance of a single Java class C1..Cn, properties may be assigned to the OWL individual that each N' represents, as derived from DOM node N. The property assignment may then proceed in one of at least two alternate approaches:
  1. A constructor for C processes the DOM node N from which N' was derived, in the initial assignment of C to P, and then assigns OWL properties A1..An to N'
    • This would follow a model of assigning the OWL object properties and data properties defined in the input ontology
    • Each property defined in the input ontology would then be mapped onto an input DOM node N and the derived instance N'
  2. Similarly, but rather than being encapsulated within the constructor for C, the OWL property assignment is encapsulated in a single property assignment engine method, which in turn calls any specific property assignment methods. This description necessarily differentiates OWL object properties from OWL datatype properties.

Focus: Object Property Assignment

The generic object property assignment method bears particular attention. Any object property assignment in this model is expressed in (subject, predicate, object) form, here (N, A, M): a single Java subject object N is provided to the property assignment engine method defined on the class C of N, and that engine selects each predicate object property A. The object M, however, may be available only as a reference to an object not yet initialized within the result ontology.

The input model, whether in YAML text format or in derived DOM object format, may denote the object of any (subject, predicate, object) reference with a string key code. To accommodate this, object property assignment may be deferred until after all of N'1..N'n have been initialized.

In any single OWL object property expression, each element of the (subject, predicate, object) triple is denoted with a URI, whereas the reference in the input YAML model is encoded as (subject, predicateCode, objectCode). The procedure for reference translation in this model must therefore translate from object code to object URI.
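The deferred assignment and the code-to-URI translation might be sketched as a two-pass procedure. This is an illustrative sketch only: the class name DeferredObjectProperties, the URI scheme for individuals, the predicate code "party", and the property name memberOfParty are hypothetical, and a real implementation would hand the resolved triples to the OWL engine rather than return them as strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeferredObjectProperties {
    static final String NS = "http://example.org/congress#"; // hypothetical namespace

    /** A resolved (subject, predicate, object) triple, all as URIs. */
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject; this.predicate = predicate; this.object = object;
        }
        @Override public String toString() {
            return "<" + subject + "> <" + predicate + "> <" + object + ">";
        }
    }

    private final Map<String, String> individualUriByCode = new HashMap<>();
    private final Map<String, String> propertyUriByCode = new HashMap<>();
    private final List<String[]> pending = new ArrayList<>();

    /** Declares which OWL object property a predicate code stands for. */
    void definePredicate(String predicateCode, String propertyUri) {
        propertyUriByCode.put(predicateCode, propertyUri);
    }

    /** Pass 1: registers an individual under the key code used in the input. */
    String register(String code) {
        String uri = NS + "individual/" + code;
        individualUriByCode.put(code, uri);
        return uri;
    }

    /** Pass 1: records a reference whose object may not be registered yet. */
    void defer(String subjectCode, String predicateCode, String objectCode) {
        pending.add(new String[] {subjectCode, predicateCode, objectCode});
    }

    /** Pass 2: translates all codes to URIs once every individual exists. */
    List<Triple> resolve() {
        List<Triple> triples = new ArrayList<>();
        for (String[] ref : pending) {
            triples.add(new Triple(
                    lookup(individualUriByCode, ref[0]),
                    lookup(propertyUriByCode, ref[1]),
                    lookup(individualUriByCode, ref[2])));
        }
        pending.clear();
        return triples;
    }

    private static String lookup(Map<String, String> index, String code) {
        String uri = index.get(code);
        if (uri == null) {
            throw new IllegalStateException("Unresolved code: " + code);
        }
        return uri;
    }

    public static void main(String[] args) {
        DeferredObjectProperties engine = new DeferredObjectProperties();
        engine.definePredicate("party", NS + "memberOfParty");
        engine.register("m1");
        engine.defer("m1", "party", "p1"); // "p1" is not registered yet
        engine.register("p1");
        engine.resolve().forEach(System.out::println);
    }
}
```

Failing loudly on an unresolved code, rather than silently skipping it, makes dangling references in the input model visible at translation time.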

Step 2.3: Assigning OWL Class Identities to Java Objects

Effectively, the assignment of the OWL class identities C'1..C'n may itself be implemented as a sort of property assignment procedure, applied iteratively to each Java object derived from the input DOM model, which in this example is derived from an input YAML model.

As a sidebar, this document notes that one or more OWL classes may be assigned directly to any single OWL individual, and moreover that zero or more OWL classes may be derived for a single OWL individual by way of directed inference applied to the input OWL model. The model for deriving an ontology of information about the Congress, in the methodology described in this article, will be implemented only with direct OWL class assignment.

In sidebar, briefly: As an abstract example of the derived class model, a derived class "Senators of North Dakota" may be defined with SWRL inference rules, such that the class denotes Congresspersons who are Senators elected to represent the state of North Dakota, across the entire timeline provided by the model itself (in this example, as derived from the input YAML model). It may seem that the greatest strength of the OWL abstract data model lies in this capacity for defining derived classes. This is not so much a tedious "data mining"; rather, an OWL inference model, such as the SWRL inference model over OWL, allows for the extraction and derivation of discrete objects representative of structured knowledge from within a broader object model representative of structured knowledge.

(Draft 2 and final, of this article)

[1] Onto Ontologies, the Constitution, and the Congress. DSP42. 2014