Digital Spelunk 42: Towards an RDF Encoding for the IANA Language Subtags Registry for Presentation in Graph Views and Application Data Spaces

While reading some of the existing documentation about the Resource Description Framework (RDF) last week, I found myself wondering about a particular feature with regard to localization (L10N) and, in a broader sense, internationalization (I18N): namely, the internationalization of static application data label text. I was considering it in connection with a concept for an application of CORBA in Common Lisp, such that the implementation might be integrated more thoroughly with the Common Lisp Metaobject Protocol (MOP) than the present Lisp bindings for IDL are.

Hypothetically, and if only in an offhand design specification, an IDL implementation may be defined, effectively, as an extension of the MOP. In an application view, IDL attributes may be represented with specific extensions of the MOP slot definition classes. The extending classes could then naturally encode features of IDL source forms and, secondly, could be applied in the reflective procedures of a CORBA Interface Repository (IR) in defining both an abstract application space, via portable CORBA object services, and an application serving as an implementation of that same abstract application space, in portable Common Lisp. The concept of an abstract application space, as such (though I might be the first person ever to suggest it), is, I think, almost suggested by the design of the application space of the Android platform, together with the many web service applications that provide data services for Android applications. It might seem to be written between the rows and columns of application icons on an Android home screen, and hinted at, ever so subtly, by the Android 'share' feature. But insofar as such a thing would be assembled exclusively of HTTP services, I'm afraid it might make for something of an overall protocol nightmare, candidly.

It may seem as though the extensions of the primary CORBA specifications have been defined, by and large, in support of enterprise commercial systems, and, as a matter of nuance, it might be assumed that there is no space for any sort of independent platform in CORBA object services. If one were to suggest otherwise, it might seem an absurdly radical concept. Such an independent platform, nonetheless, is my primary concern in denoting any such concept.

Presently, the discussion will return to the matter of language codes.

After tooling around a little with the Protégé toolkit, the author is of course familiar with the RDF syntax of suffixing a language tag to an RDF literal, whether to denote a logical, linguistic derivative of a fundamental resource concept, or directly to denote the language in which the literal's value is written. Language-tagged literals might find their broadest use as object values of RDF annotation properties, such as rdfs:label. I can't say that this has been written so clearly in any specific documentation; I don't believe I've ever read such a thing, precisely, anywhere. So perhaps it's simply an intuitive observation.
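For readers unfamiliar with the syntax, a minimal sketch of a language-tagged literal follows, written in Python purely for illustration (the post proposes no implementation language for this). It serializes a literal in the N-Triples form `"..."@tag`; the class name `Literal` and the example subject IRI are assumptions, and the escaping covers only the common characters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal: a lexical value plus an optional BCP 47 tag."""
    lexical: str
    lang: Optional[str] = None

    def to_ntriples(self) -> str:
        # Escape the backslash first, then quotes and line breaks (subset of N-Triples ECHAR).
        escaped = (self.lexical.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
        if self.lang:
            return f'"{escaped}"@{self.lang}'
        return f'"{escaped}"'


RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

if __name__ == "__main__":
    # rdfs:label used as an annotation property with a language-tagged object.
    label = Literal("Katze", "de")
    print(f"<http://example.org/cat> {RDFS_LABEL} {label.to_ntriples()} .")
    # prints: <http://example.org/cat> <http://www.w3.org/2000/01/rdf-schema#label> "Katze"@de .
```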

As regards the syntax and structure of language codes as applied in RDF literal values, one may of course consult the documentation. In reading a few resources about RDF Schema (long bibliography written short), I noticed a reference to IETF BCP 47, which today comprises RFC 5646 and RFC 4647; the registry it describes was, in turn, updated by RFC 5645.
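As a rough illustration of the BCP 47 syntax, the `langtag` production of RFC 5646 (section 2.1) can be approximated with a regular expression. This is a simplified sketch: it omits the grandfathered tags and the private-use-only tags (such as `x-whatever`) that the full grammar also admits, and it checks well-formedness only, not validity against the registry.

```python
import re
from typing import Optional

# Simplified rendering of the RFC 5646 "langtag" production (case-insensitive).
LANGTAG = re.compile(
    r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}   # 2-3 letters, up to 3 extlangs
               |[a-z]{4}                      # reserved for future use
               |[a-z]{5,8})                   # registered language subtag
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_tag(tag: str) -> Optional[dict]:
    """Return the named parts of a well-formed tag, or None if it is not well formed."""
    m = LANGTAG.fullmatch(tag)
    return m.groupdict() if m else None
```

For example, `parse_tag("zh-Hant-TW")` separates the script `Hant` from the region `TW`, while `parse_tag("en--US")` returns `None`.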

BCP 47 originally made reference to an IANA Language Tags registry, which in turn makes reference to the IANA Language Subtags registry. To my best understanding, the latter serves as something of a formal, structured reference, by and large supplemental to the sets of language codes and script codes published by the ISO (the script codes being, in turn, orthogonal to the Unicode character set). Considering the structures defined in the IANA Language Subtags registry, perhaps the registry itself may provide something of a sense of linguistic semantic knowledge, more than the ISO language codes unadorned. In a sense, the IANA Language Subtags registry collates many language and script identifiers that were originally defined in ISO standards documents. Of course, the registry does not go into any great detail, in its data record values, about the provenance of each respective language code or script code. However, that provenance is described, broadly, in documents published by the Internet Engineering Task Force (IETF).

Orthogonally, the Language Subtags registry does not refer expressly to the Sorani or Kurmanji dialects of Kurdish. Rather, apparently following ISO 639-3 [Wikipedia], it divides the Kurdish languages into a total of three geographical subsets (Northern, Central, and Southern Kurdish) of no specific cultural characteristics. So as not to let the ISO specification strip any language of its characteristic pith, a secondary reference may serve to describe the qualities of broader cultural meaning, to which the three odd geographic definitions of the Kurdish languages [Kurdish Project] may be supplemental.

Of course, if all language were so tidy as to fit easily into a neat ontology, that would certainly make a librarian's tasks altogether easier. However, if all language were so far standardized as to meet any single institution's own views of language, it might also remove all sense of organic, expressive wit from the nature of language itself. In noting that there are normative string values by which any single expressive text may be annotated for its language, I do not mean to suggest that any single language tags model could ever define all of any language, however such a model may be applied. In the simplest sense, a system of language tags may serve a bibliographical role, perhaps secondary to its applications for user comfort and convenience in networked content negotiation, such as in HTTP.

In so many things, a model may not exclusively define any object that it may deign to describe. Much like languages, models are imperfect artifacts, never completely describing every functional detail of any system beyond the form of the model itself. Inasmuch as a model may communicate a useful set of concepts and specifications, essentially independent of media (leaving aside any reflective analysis of the semantics of modeling), a model may at least find a utility role. Essentially, a model serves as a communicative document.

Language is a broad concept and, furthermore, a concept ever evolving in parallel with contemporary social trends and with discrete styles in the development and expression of language. Even as a momentary language tags model may serve a sort of utility role, in establishing a sense of identity for a language in communications, a language tag in itself can never be all of a language. Even in English, clearly a popular language in contemporary communications, there are literary forms that essentially disregard the limits of classification and identity of language. The poetry of Amiri Baraka occurs to the author's own literary consideration, as such. Historically, in European cultures, there are moreover the concrete poems of Kurt Schwitters. The Dadaists, for their own part, sported with identities in the arts and in society, more or less creating a new manner of expressive absurdity shortly after the end of WWI; this was perhaps too early for the Dadaists to be categorized in a Postmodernist school, though one might estimate, intuitively and as an ad hoc thesis, that the Dadaists were some of modern society's first postmodernists. Simply, the concrete poetry developed by Kurt Schwitters would deny any single linguistic classification. Although it finds a literal form in a Latin script in any single written publication (historically, publications made alongside the small number of Dada soirées in Europe), it may likewise be an original language: scarcely known and scarcely adopted, but a language expressly developed by Kurt Schwitters, otherwise unknown of its ownership and identity to all the pages of archival history.

If one may suggest a style in homage to the Dadaists' own post-consumerist works, unabashed of the nearly infantile syllables of the original concrete poems, perhaps it was the language of Schwittersian European Regression Inc. Considering the Dadaists' own efforts in reconciling a critical wit to a commercial modernity after the long effects of the First World War, perhaps it may suggest that there is a heritage of empathy in the works from which contemporary artistic postmodernity has evolved.

That grain of salt now idiomatically aside: the IANA Language Subtags registry, in all of its broad, structural form, may be one of a markedly significant number of resources that we, as people, should not too quickly dispose of. Although the Language Tags registry itself has, over time, come to be denoted an "Obsolete" resource, the structural nature of the Language Subtags registry, as representing at least a best effort at defining the structures of languages (at least insofar as the representation of text in electronic media is concerned), is certainly not, in itself, so far outdated.

Thus, although the RDF specification, even in its rendition in the ODM 1.1 metamodel [ODM 1.1], may denote only a string syntax for the representation of language tags, the author believes it may be well to define a broader CORBA model: to that effect, a Language interface definition. The IANA Language Subtags registry may itself provide a structure for an initial Language interface implementation, such as may be applied within a CORBA application system, if not in the definition of a broader, abstract application space. Having no broad academic reputation with which to present such a thesis, the author presents it, if only as a "Belief," that a structural identity for Language may be apropos as a feature towards the definition of an abstract application communications environment. Such an environment may implement a singular service mix extending the existing CORBA object service definitions, augmenting them so as to define a complete networked application environment of CORBA object services, essentially as communication services. It would moreover integrate a set of broader network application services, such as Kerberos authentication services and transport layer security models, so as to define a comprehensive model for trusted digital communications, ensuring nonrepudiation without neglecting creators' rights, in an abstract CORBA application space irrespective of individual machine architectures. If, hypothetically, a CORBA application service model may furthermore be applied in support of formal, civil disaster relief and emergency services operations, if not in roles for knowledge discovery with regard to textual, audio, and visual media, then perhaps it may moreover remove much of a figurative blight, in an altogether positive regard, from some communications about application service architectures.

Certainly, CORBA may be applied towards portable definitions of network service models in civil service applications, inasmuch as CORBA services have been defined for applications in national defense systems, and such applications (though not the details of any platform-unique CORBA application services directly) have been discussed in an open forum. The author of this article, on having read of such defense-oriented applications of CORBA, is simultaneously impressed and nonplussed by such news. It may seem to suggest an altogether awkward style of communication about an otherwise benign object service specification. Moreover, the author does not believe it would be either useful or necessary to "Refactor CORBA" (though such an idea might occur), whether or not to address any vaguely sympathetic concern about existing, platform-unique CORBA object service definitions anywhere in controlled media.

As for proposing an exact structure for a Language interface, of course there are the semantic characteristics of the IANA Language Subtags registry and the structures of the ISO 639-1, ISO 639-2, and ISO 639-3 specifications. Wikipedia, furthermore, denotes a singular linked open data resource, namely Glottolog, in reference to apparent conventions in denoting dialects of the Kurdish languages [Glottolog]. The author makes reference to it only with a further note: candidly, to the author's own understanding, not all persons might appreciate a categorization of the Kurdish languages as if all Kurdish language were subsumed under a category whose name denotes Iran. Considered as a semantically, if not historically, framed categorization, it certainly might not have been defined to suggest any political interpretation. In a contemporary sense, however, it might not be so easy to interpret, from some points of view regarding contemporary political and social institutions in Southwest Asia. Concerning Kurdistan's representation in contemporary Iran, as well as in contemporary Iraq, Syria, and Turkey, and abroad from the Middle East, such as in the EU and the US, a people's geographical distribution may not permit so easy a geographic categorization, nor, secondarily, an easy categorization of the characteristics of language.

Thus, perhaps the author might begin to understand some of the difficulties faced by linguists in reconciling even a simple, however common, language annotation tag to a language form.

Insofar as a programmatic Language interface may be defined generically, irrespective of the specific structures of individual language code registries, a generic definition may permit more semantically specialized extensions onto individual language identity lists. Simply, a distinct Language interface may be defined, as in a CORBA application protocol with the interface defined originally in IDL, so as to replace the optional language string of an RDF literal.
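A minimal sketch of the idea, with Python abstract base classes standing in for IDL: a hypothetical `Language` interface replaces the plain language string on a literal. The names `Language`, `TagLanguage`, and `LanguageLiteral` are illustrative assumptions, not part of any existing CORBA or ODM specification. Because BCP 47 tags compare case-insensitively, equality here is on the lower-cased tag.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class Language(ABC):
    """Hypothetical generic Language interface (an IDL interface in the proposal)."""

    @abstractmethod
    def tag(self) -> str:
        """The BCP 47 tag for this language identity."""

    @abstractmethod
    def primary_subtag(self) -> str:
        """The primary language subtag, lower-cased."""


class TagLanguage(Language):
    """Simplest implementation: backed only by a tag string."""

    def __init__(self, tag: str) -> None:
        self._tag = tag

    def tag(self) -> str:
        return self._tag

    def primary_subtag(self) -> str:
        return self._tag.split("-", 1)[0].lower()

    def __eq__(self, other: object) -> bool:
        # BCP 47 tags are compared case-insensitively.
        return isinstance(other, Language) and self._tag.lower() == other.tag().lower()

    def __hash__(self) -> int:
        return hash(self._tag.lower())


@dataclass(frozen=True)
class LanguageLiteral:
    """An RDF-style literal whose language is an object rather than a bare string."""
    lexical: str
    language: Optional[Language] = None
```

A more specialized extension, for instance one backed by registry records, could subclass `Language` without any change to `LanguageLiteral`.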

In later definitions of application protocols, the Language interface may furthermore be applied in a session-oriented definition of locale identities for the purpose of data field localization, in a manner essentially abstracting the POSIX LC_* locale environment variables. Orthogonally, POSIX shell environment variables may be represented in an abstract model, altogether, as session variables, towards a definition of a CORBA model for ad hoc shell command interactions, whether via a TTY, a PTY, or an Ethernet networked data channel. POSIX, as a brand name, is certainly a trademark of its respective trademark owner.
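The POSIX locale categories already have a well-defined precedence that a session-variable abstraction would need to preserve: a non-empty `LC_ALL` overrides everything, then the category's own variable (for example `LC_MESSAGES`), then `LANG`; if none is set, an implementation-defined default applies. A minimal sketch over a plain mapping of session variables follows; treating a Python dict as the "session" and using `"POSIX"` as the default are assumptions made for illustration.

```python
from typing import Mapping

# The locale categories defined by POSIX.
CATEGORIES = frozenset({
    "LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
    "LC_MONETARY", "LC_NUMERIC", "LC_TIME",
})


def effective_locale(session: Mapping[str, str], category: str,
                     default: str = "POSIX") -> str:
    """Resolve the locale for one category from session variables.

    Precedence follows POSIX: LC_ALL, then the category variable, then LANG.
    Empty values are treated as unset. The default is an assumption here;
    POSIX leaves it implementation-defined.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown locale category: {category}")
    for name in ("LC_ALL", category, "LANG"):
        value = session.get(name, "")
        if value:
            return value
    return default
```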

Concerning the RFC 5646 definition of the IANA Language Subtags registry [RFC 5646], RFC 5646 section 3.1.3 describes seven types of record for entries in the registry, as given by each record's Type field:
• language
• extlang
• script
• region
• variant
• grandfathered
• redundant
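RFC 5646 section 3.1.1 specifies the registry's record-jar format: records are separated by lines containing `%%`, each field is a `Field-Name: body` line, and a line beginning with whitespace continues the previous field. A minimal parser sketch follows; it keeps every value of a repeated field (such as the two `Description` fields of `es`) in a list, but it does not decode the `&#x...;` character references the format allows. The sample records are abbreviated, and the `Comments` value is invented purely to show line folding.

```python
def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Parse record-jar text into records mapping field names to value lists."""
    records: list[dict[str, list[str]]] = []
    fields: dict[str, list[str]] = {}
    last = None
    for line in text.splitlines():
        if line.strip() == "%%":              # record separator
            if fields:
                records.append(fields)
            fields, last = {}, None
        elif not line.strip():                # ignore blank lines
            continue
        elif line[0] in " \t" and last is not None:
            # Folded line: unfold onto the latest value of the previous field.
            fields[last][-1] += " " + line.strip()
        else:
            name, _, body = line.partition(":")
            last = name.strip()
            fields.setdefault(last, []).append(body.strip())
    if fields:
        records.append(fields)
    return records


SAMPLE = """\
File-Date: 2015-07-01
%%
Type: language
Subtag: es
Description: Spanish
Description: Castilian
%%
Type: language
Subtag: ckb
Description: Central Kurdish
Macrolanguage: ku
Comments: an invented comment that is
  folded onto a second line
"""

if __name__ == "__main__":
    for record in parse_registry(SAMPLE):
        print(record)
```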

In the data records of the IANA Language Subtags registry, each record of type language is denoted with a corresponding subtag value, the syntax of which may indicate the origins and applications of the subtag. As denoted in RFC 5646 section 2.2.1, two-character subtag values are derived from ISO 639-1. Three-character subtag values, excluding those in the range qaa through qtz (which is reserved for private use), are derived from an effective set union of the language codes in the ISO 639-2, ISO 639-3, and ISO 639-5 specifications.
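The length rules of RFC 5646 section 2.2.1 are simple enough to express directly. The function below is a sketch that reports the likely source of a primary language subtag from its shape alone; it does not consult the registry, so a well-shaped subtag may still be unregistered.

```python
def primary_subtag_source(subtag: str) -> str:
    """Classify a primary language subtag by length (RFC 5646, section 2.2.1)."""
    s = subtag.lower()
    if not (s.isascii() and s.isalpha()):
        raise ValueError(f"not an ASCII alphabetic subtag: {subtag!r}")
    if len(s) == 2:
        return "ISO 639-1"
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private use"
        return "ISO 639-2, ISO 639-3, or ISO 639-5"
    if len(s) == 4:
        return "reserved for future use"
    if 5 <= len(s) <= 8:
        return "registered with IANA"
    raise ValueError(f"invalid length for a language subtag: {subtag!r}")
```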

Although the IANA Language Subtags registry provides a textual name for each language entry, namely in each record's Description field, for the purpose of internationalization in presentational applications it may be preferable for a language definition to refer to records encoded from each of the respective ISO source documents. In some encodings of the ISO 639 language codes, for instance, localized strings may be available for representing each language's name in the lexicon of another language.

Concerning the instances of single language records having multiple Description fields in the Language Subtags registry, perhaps an RDF encoding of the registry may be preferred, as for any project developing a programmatic, graph-oriented encoding of the IANA Language Subtags registry. Of course, an RDF encoding may present a further challenge for adoption in a CORBA application service protocol. However, in translation to an RDF encoding, all of the information provided in the IANA Language Subtags registry may certainly be preserved across the transition to the structured record format. The RDF-encoded model may subsequently be accessed via CORBA, in a model extending ODM [ODM 1.1].
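One way to carry every field across to RDF, including repeated Description fields, is to emit one triple per field value. The sketch below takes records as dictionaries of value lists (the shape a simple registry parser might produce) and writes N-Triples. The vocabulary namespace `http://example.org/lsr#` and the subject IRI scheme are hypothetical placeholders, not an established ontology; grandfathered and redundant records are keyed by their `Tag` field, since they have no `Subtag`.

```python
from urllib.parse import quote

NS = "http://example.org/lsr#"            # hypothetical vocabulary namespace
BASE = "http://example.org/lsr/"          # hypothetical subject IRI base
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def _literal(value: str) -> str:
    # Plain literal; escape backslashes and quotes for N-Triples.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def record_to_ntriples(record: dict[str, list[str]]) -> list[str]:
    """Encode one registry record as N-Triples, one triple per field value."""
    rtype = record["Type"][0]
    key = (record.get("Subtag") or record["Tag"])[0]
    subject = f"<{BASE}{rtype}/{quote(key)}>"
    triples = [f"{subject} {RDF_TYPE} <{NS}{rtype.capitalize()}> ."]
    for field, values in record.items():
        if field == "Type":
            continue
        predicate = f"<{NS}{field.replace('-', '')}>"
        for value in values:
            triples.append(f"{subject} {predicate} {_literal(value)} .")
    return triples


if __name__ == "__main__":
    es = {"Type": ["language"], "Subtag": ["es"],
          "Description": ["Spanish", "Castilian"]}
    print("\n".join(record_to_ntriples(es)))
```

Keeping each Description value as its own triple, rather than joining them, is what preserves the registry's repeated fields without loss.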

Certainly, in analysis of the records other than language records in the IANA Language Subtags registry, some further information may be inferred from the respective data records with reference to RFC 5646: for instance, to determine, when possible, the provenance of each record field, and furthermore each tag's compatibility with other specifications, e.g. ISO 15924 with regard to script records.
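For script and region records, the subtag's shape alone says a good deal about its provenance: RFC 5646 draws four-letter script subtags from ISO 15924, two-letter region subtags from ISO 3166-1, and three-digit region subtags from UN M.49, and it sets aside certain ranges for private use. A sketch of that inference follows; the returned strings are arbitrary labels chosen for illustration.

```python
from typing import Optional


def subtag_source(record_type: str, subtag: str) -> Optional[str]:
    """Infer the source standard of a script or region subtag from its shape.

    Returns None when the record type or shape is not covered here.
    """
    s = subtag.lower()
    if not s.isascii():
        return None
    if record_type == "script" and len(s) == 4 and s.isalpha():
        # Qaaa..Qabx are reserved for private use.
        return "private use" if "qaaa" <= s <= "qabx" else "ISO 15924"
    if record_type == "region":
        if len(s) == 2 and s.isalpha():
            # AA, QM..QZ, XA..XZ and ZZ are reserved for private use.
            if s in ("aa", "zz") or "qm" <= s <= "qz" or "xa" <= s <= "xz":
                return "private use"
            return "ISO 3166-1"
        if len(s) == 3 and s.isdigit():
            return "UN M.49"
    return None
```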

In defining, essentially, a knowledge model for standard language tags and other language identities, it may be noted that the language knowledge model itself need not be applied in every application that provides localization features. Insofar as a single application may use only a single subset of language codes, such an application may access its subset of language codes as a session-local value, without requiring an interface onto an entire RDF language graph.
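For the lighter case, an application can hold its own small subset of tags and choose among them with the Lookup scheme of RFC 4647 (section 3.4): each requested range is progressively truncated from the right, also dropping a trailing single-character subtag, until a supported tag matches, with a default as the last resort. The label table and the default tag `en` below are illustrative assumptions.

```python
# A hypothetical session-local label table: tag -> localized label text.
LABELS = {"en": "Save", "de": "Speichern", "fr": "Enregistrer", "pt-BR": "Salvar"}


def lookup(priority: list[str], available: set[str], default: str) -> str:
    """RFC 4647 Lookup: return the best available tag for a priority list."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority:
        if language_range == "*":              # the wildcard is ignored by Lookup
            continue
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # Do not leave a dangling singleton such as "a" or "x".
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


def label(priority: list[str]) -> str:
    """Localized label for a session's language priority list."""
    return LABELS[lookup(priority, set(LABELS), "en")]
```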

Of course, inasmuch as an RDF language graph may be applied to facilitate translation of language codes across compatible language code sets, to support interactive selection of language-specific resources, if not moreover to facilitate machine translation of text or other media, an RDF language graph may serve a supportive role in many interactive applications. Even so, as an application may not itself require access to any graph view of language identities, it may be advisable to define two distinct Language interfaces: one to express a portable graph of language identities, and another to provide a view of a language identity as a resource in a graph view of language. The latter might be of some use to students of language studies, and moreover for applications in academic linguistics. The simpler, non-graph-integrated Language interface may serve a role for simple localization of data label strings, in any manner of interactive application.
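The graph side of that split might look something like the sketch below: a `LanguageGraph` holding identities and their macrolanguage relations, and a lightweight `LanguageNode` view of a single identity as a resource in that graph. Both class names are hypothetical, and the records are abbreviated from the registry's Kurdish entries, in which `ckb`, `kmr`, and `sdh` each carry a `Macrolanguage: ku` field; it would be prudent to verify them against the current registry.

```python
from collections import defaultdict
from typing import Optional

# Abbreviated, illustrative registry records (field -> list of values).
RECORDS = [
    {"Type": ["language"], "Subtag": ["ku"], "Description": ["Kurdish"],
     "Scope": ["macrolanguage"]},
    {"Type": ["language"], "Subtag": ["ckb"], "Description": ["Central Kurdish"],
     "Macrolanguage": ["ku"]},
    {"Type": ["language"], "Subtag": ["kmr"], "Description": ["Northern Kurdish"],
     "Macrolanguage": ["ku"]},
    {"Type": ["language"], "Subtag": ["sdh"], "Description": ["Southern Kurdish"],
     "Macrolanguage": ["ku"]},
]


class LanguageGraph:
    """A portable graph of language identities (hypothetical)."""

    def __init__(self, records: list[dict[str, list[str]]]) -> None:
        self.names: dict[str, list[str]] = {}
        self.parent: dict[str, str] = {}
        self.children = defaultdict(set)    # macrolanguage -> member subtags
        for record in records:
            if record["Type"][0] != "language":
                continue
            tag = record["Subtag"][0]
            self.names[tag] = record["Description"]
            for macro in record.get("Macrolanguage", []):
                self.parent[tag] = macro
                self.children[macro].add(tag)

    def node(self, tag: str) -> "LanguageNode":
        return LanguageNode(self, tag)


class LanguageNode:
    """A view of one language identity as a resource in the graph (hypothetical)."""

    def __init__(self, graph: LanguageGraph, tag: str) -> None:
        self.graph, self.tag = graph, tag

    def names(self) -> list[str]:
        return self.graph.names.get(self.tag, [])

    def macrolanguage(self) -> Optional["LanguageNode"]:
        macro = self.graph.parent.get(self.tag)
        return self.graph.node(macro) if macro else None

    def members(self) -> list[str]:
        return sorted(self.graph.children.get(self.tag, set()))
```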

To illustrate a simple use case, the session variables interface definition suggested above may be relatively easy to demonstrate, initially, in an application providing a simple CORBA session proxy interface to a shell terminal. In such an application, the session variables interface definitions may be applied in a manner principally irrespective of the syntax of any individual shell environment variables, and thus without applying any specialized Language interface definition directly in the shell emulator/shell proxy application.

Hypothetically, a dispatching controller application (perhaps analogous to an abstract user session manager for an abstract application space) may be defined to extend the shell emulator/shell proxy application, such that the values of the respective LC_* variables would be derived from session data recorded in the dispatching controller application.
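As a small sketch of how a dispatching controller might derive locale variables from session data, the hypothetical function below maps a session's preferred BCP 47 tag to a `LANG` value in the common `language_TERRITORY.codeset` naming convention. That convention is widespread on POSIX-like systems but is not mandated by POSIX, the UTF-8 codeset is an assumption, and script subtags (such as `Hant`) are simply dropped, so the mapping is lossy.

```python
def session_environment(preferred_tag: str, codeset: str = "UTF-8") -> dict[str, str]:
    """Derive locale environment variables from a session's preferred tag."""
    parts = preferred_tag.split("-")
    language = parts[0].lower()
    territory = None
    for part in parts[1:]:
        if len(part) == 1:                       # singleton: extensions follow
            break
        if len(part) == 2 and part.isalpha():    # ISO 3166-1 region subtag
            territory = part.upper()
            break
    name = f"{language}_{territory}" if territory else language
    return {"LANG": f"{name}.{codeset}"}
```

The resulting mapping could then be handed to a shell proxy as its session variables; whether the controller should also set individual LC_* categories is left open here.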

Presently, the author proposes to update the Hardpan Tech fork of CLORB, and subsequently to develop a proof of concept as suggested above.

Monday, July 13, 2015
