Marrying Technical Writing with LRT


DESCRIPTION

Title: Marrying Technical Writing with LRT (Language Resources and Technologies)

Authors: Giovanni Antico (S.Te.L S.r.L); Valeria Quochi and Monica Monachini (Istituto di Linguistica Computazionale - CNR); Maurizio Martinelli (Istituto di Informatica e Telematica - CNR)

Abstract: In recent years the operational scenarios and workflows of the technical writer have changed significantly; specifically, “free style” writing (or manual writing) has become outdated, and technical writing is now much more concerned with the structured management of content than in the past. Technical writing has become more demanding due to a number of factors, among which the rise and spread of mobile device usage. This paper discusses the new needs of the technical writing and content management business and how LRT (Language Resources and Language Technologies) can help it improve quality and productivity.

The paper will be presented on 27 May 2014 at the LREC Workshop W2: Controlled Natural Language Simplifying Language Use, held as part of the 9th edition of the Language Resources and Evaluation Conference, 26-31 May, Reykjavik, Iceland. For more information:
- https://sites.google.com/site/lreccnl2014/
- http://lrec2014.lrec-conf.org/en/

Preprint LRECCNL 2014

Marrying Technical Writing with LRT

Giovanni Antico, Valeria Quochi*, Monica Monachini*, Maurizio Martinelli**
S.Te.L S.r.L, Istituto di Linguistica Computazionale - CNR*, Istituto di Informatica e Telematica - CNR**

Abstract

In recent years the operational scenarios and workflows of the technical writer have changed significantly; specifically, “free style” writing (or manual writing) has become outdated, and technical writing is now much more concerned with the structured management of content than in the past. Technical writing has become more demanding due to a number of factors, among which the rise and spread of mobile device usage. This paper discusses the new needs of the technical writing and content management business and how LRT can help it improve quality and productivity.

Keywords: controlled language, technical writing, content management systems

1. Introduction

In recent years the operational scenarios and workflows of the technical writer have changed significantly; specifically, “free style” writing (or manual writing) is becoming outdated, and technical writing is much more concerned with the structured management of content than in the past.

In fact, technical writing has become more demanding due to a number of factors, including the increased complexity of machines, which translates into more complex documentation, and the rise and spread of mobile device usage, which generates new information needs in users. Additionally, legislation is now much more binding and the standardization of technical documentation more widespread. All these aspects make writing technical documentation increasingly complex, as there are many different requirements to be fulfilled.

Karen McGrane, a pioneer in Content Strategy, states the need to start thinking of content as independent of its means of presentation and to structure content so that it can be reused (McGrane, 2012).

This view of structured content not as a mere technological matter, but as strategic for business, is today becoming an acknowledged reality. Content needs to be decoupled from its form of presentation and made modular, reusable and collaboratively modifiable according to rigorous workflows that will help businesses keep up with the increasing need to adapt the same content to different presentation devices.

In addition to these display issues, technical content must also satisfy a number of quality requirements, among which, crucially, coherence and comprehensibility. Standardization, controlled languages, and language simplification are key issues here.

Content Management Systems (CMS) have started to be used in the technical writing industry and are successfully fulfilling some of the new requirements, especially those related to the modularization of content and to collaboration and sharing among colleagues and, in some cases, beyond.

Most needs related to content creation and quality, however, still fall outside the scope of existing CMS, which mainly address the optimal management of document structure and workflow [1]. At least part of such needs, and of the desiderata for technical documentation within CMS, we claim, can be satisfied with the help of current state-of-the-art Language and Resource Technology, especially by exploiting the web-service paradigm (cf. platforms such as PANACEA [2], OPENER [3], Let’sMT [4]).

This contribution will attempt to identify some of those pressing needs and the language-related technologies that might provide an answer to them.

1.1. Information needs

We can identify two types of information needs: general needs, i.e. those that are always valid and already well known, and emergent ones.

It is well known and fundamental that content must first of all be correct, up to date and coherent across the whole set of documentation that accompanies the industrial product along its whole lifecycle (i.e. from its commercial proposal to its implementation, client education, billing, maintenance and repair, etc.). Technical documentation has to be easy for the clients’ workers to understand, and therefore written in the users’ language (cf. O’Keefe and Pringle (2012) and Laan (2012)).

On top of these, the recent “web” and “T” revolution has created new needs that technical writing businesses have to address (McGrane, 2012). Content today needs to be:

• multimodal, i.e. documentation has to be based on images and video in order to facilitate the comprehension of the sequences of tasks to perform;

• searchable;

• contextual, that is, content has to be retrievable at the right place at the right moment, whereas at present content is usually placed elsewhere than where it is needed;

• targeted at users’ profiles in order to avoid information overload;

• equipped with tools for sharing problems and solutions;

• integrated and aggregated (dynamically), i.e. users should not need to consult different information sources to retrieve the data they need.

[1] e.g. Argo CMS, www.keanet.it/; Vasont CMS, www.vasont.com; docuglobe, www.gds.eu/; SCHEMA ST4, www.schema.de/, ...

[2] www.panacea-lr.eu
[3] www.opener-project.org
[4] www.letsmt.eu

1.2. Standards for Technical Writing and controlled languages

Legislation in this field is vast, at both the national and the international level, and aims at regulating both the planning and design of products and the instructions on their usage, for safety and quality reasons (e.g. ISO-IEC-82079-1 (2012); UNI-10653 (2003); to mention just a few).

In addition to these normative rules, we find a number of standards and best practices adopted more or less widely, such as AECMA / ATA / S1000D for the aerospace technical domain (AECMA/ASD-S1000D); the OASIS DITA for e-business applications (OASIS-DITA, 2010); and various best practices for technical writing.

All these aim at improving the quality not only of the content itself, but also of the processes for editing, translating, publishing and disseminating technical documentation, by regulating the workflow as well as the structure, presentation order and informativeness of documents, the semantic communication rules, their graphical display, the file format and many other aspects.

As can easily be imagined, free-style writing is error-prone when it has to cope with all the requirements imposed by legislation and best practices. Consider a simple case: a warning. Legislation requires that a warning be accompanied by a pictograph, that it carry a label explaining the type of warning (attention, danger, prohibition, ...), and that the cause, consequences and remedies be explained. When dealing with content manually, the technical writer needs to recall and apply the correct structure, place the image on the page, and assign the correct style to each piece of content.

Using a CMS (Content Management System) as control software instead allows for the definition and automatic application of the required structure and for the automatic insertion and editing of the image. The CMS can also export the same content in various file formats, especially in targeted XML standards such as DITA.
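As an illustration only, the warning case above can be sketched as a small data model that refuses to render incomplete warnings. The sketch is in Python; the type SafetyWarning, its field names, the set WARNING_TYPES and the plain-text layout are all assumptions made for the example and are not taken from any standard or from a specific CMS.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of warning types, taken from the labels named in the text.
WARNING_TYPES = {"attention", "danger", "prohibition"}


@dataclass(frozen=True)
class SafetyWarning:
    kind: str          # label that explains the type of warning
    pictograph: str    # identifier or path of the image (hypothetical field)
    cause: str
    consequences: str
    remedies: str

    def missing_parts(self) -> List[str]:
        """Return the names of required parts that are empty or invalid."""
        problems = []
        if self.kind not in WARNING_TYPES:
            problems.append("kind")
        for name in ("pictograph", "cause", "consequences", "remedies"):
            if not getattr(self, name).strip():
                problems.append(name)
        return problems


def render_plain(warning: SafetyWarning) -> str:
    """Render a complete warning as plain text; reject incomplete ones."""
    missing = warning.missing_parts()
    if missing:
        raise ValueError(f"incomplete warning, missing: {', '.join(missing)}")
    return (
        f"[{warning.pictograph}] {warning.kind.upper()}\n"
        f"Cause: {warning.cause}\n"
        f"Consequences: {warning.consequences}\n"
        f"Remedies: {warning.remedies}"
    )
```

With such a model the writer only fills in the fields; the structure and the style are applied by the software, and a second rendering function could emit an XML format from the same data.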

2. Advantages of CMS for technical writing

Given the strategic importance of smart structured content management, CMS have become widely used in the industry, with current systems successfully satisfying many of the needs mentioned above and bringing a number of advantages to the content business. In the following, we mention the most salient.

Collaborative management of content. With adequate and customisable management of profiles and authorizations, a CMS allows different professional figures, both within and outside the company (e.g. product managers, technical writers, translators, consultants, etc.), to collaborate in the process of content editing according to predefined workflows.

Single management and revision of content that can be reused. A CMS allows content creation and revisions to be managed in a single place and makes the various pieces of content reusable at all levels. For example, sections, chapters, warnings, variables, ... are handled as independent units of content that can be reused or visualised in different contexts and/or displayed differently depending on the dissemination or presentation channel chosen. Moreover, content managers can choose whether or not a collaborative revision of common content should be propagated automatically to all its instances (i.e. all its occurrences in the various documents).

Definition and application of content structures. Reusable models or templates for different kinds of documents, for example for the warnings mentioned above, can be defined and used to help writers in their daily work. This reduces errors and costs by increasing efficiency.

Automation of the production of various types of targeted technical documentation for different channels. By integrating automatic pagination tools with web applications, a CMS is able to automate cross-media publishing functions.

This directly translates into several advantages for the business:

• production costs and time will be reduced,

• content correctness and quality will increase,

• information will be easily targetable,

• graphical styles can be made more coherent.

Content Tagging for reuse. Faceted tagging or classification, which is possible in a CMS, allows for quick ordering and filtering of content according to several different characteristics or points of view, and constitutes a significant improvement over taxonomic classification. By means of tagging, the technical writer defines the usage context of a piece of common content; that is, for example, (s)he defines:

• which family, model, machine or Bill of Material it refers to,

• who its audience is,

• what types of publication it was conceived for,

• what types of presentation/display channel it can be disseminated on,

• ...

Tagging and classification within existing CMS, however, is still manual.
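The difference from a taxonomy is that a unit of content can be selected along any combination of facets. A minimal sketch in Python, assuming a hypothetical ContentUnit type whose facet names (model, audience, channel, ...) are chosen freely by the project and are not prescribed by any CMS:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ContentUnit:
    unit_id: str
    text: str
    # Facet name -> set of values, e.g. {"audience": {"operator"}}.
    facets: Dict[str, Set[str]] = field(default_factory=dict)


def select(units: List[ContentUnit], **wanted: str) -> List[ContentUnit]:
    """Keep the units whose facets contain every requested facet value."""
    return [
        unit
        for unit in units
        if all(
            value in unit.facets.get(facet, set())
            for facet, value in wanted.items()
        )
    ]
```

A call such as select(units, model="X200", channel="web") would then return only the units tagged for that model and that channel, in their original order; the tag values themselves still have to be assigned by hand, which is the limitation noted above.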

Translation Management. Through a tagging system, a CMS makes it possible to provide a translator with only the bits of content to be translated or proofread, which helps optimise translation costs. Moreover, translations produced with CAT tools can be automatically imported into the CMS, thus saving time and gaining accuracy in text alignment, which can be totally independent of the technical writer’s language skills. However, heavy manual intervention is still needed for managing translations, and technical writers still need to use different software applications.

Integration. A CMS often shares data (classifications, codes, prices, figures, ...) with other software applications used by the company, e.g. for project management, CAD, ERP, ...

3. What’s missing

Figure 1 exemplifies a typical working methodology of a technical writing company.

After a preliminary analysis of the documentation project to be realised, the rules that characterise the documents are defined and fixed by the writing and editing team. At this point (Author phase), the authors start creating new content or inserting missing bits of data into the database.

The next phase, the selection, allows for the definition of the specific document that will be created through a process of automatic pagination.

If we take this example of a (real-world) working method, we can see that CMS solutions normally handle two of the indicated phases: the author and selector phases in Figure 1.

However, while the selector phase can be highly controlled and structured, the author phase is generally “free”. In the selector phase, in fact, the CMS allows for the definition of several formal rules (customisable on a project basis) which make sure that documents respect a number of fundamental requisites: for example, that there cannot be a picture without a caption, or that there cannot be a sub-paragraph if there is no preceding paragraph, etc. In the author phase, i.e. the writing phase, current solutions instead offer little or no support to authors [5].
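The two formal rules given as examples for the selector phase can be checked mechanically. The following Python sketch assumes a deliberately simplified document model, a flat list of (kind, content) pairs with hypothetical kind names, and assumes that a caption must immediately follow its picture; a real CMS would work on a richer structure.

```python
from typing import List, Tuple

# (kind, content); the kind names are hypothetical.
Block = Tuple[str, str]


def check_structure(blocks: List[Block]) -> List[str]:
    """Return one message for every violated structural rule."""
    errors = []
    seen_paragraph = False
    for i, (kind, _content) in enumerate(blocks):
        if kind == "paragraph":
            seen_paragraph = True
        elif kind == "subparagraph" and not seen_paragraph:
            errors.append(f"block {i}: sub-paragraph without a preceding paragraph")
        elif kind == "picture":
            # Rule: a picture must be directly followed by its caption.
            next_kind = blocks[i + 1][0] if i + 1 < len(blocks) else None
            if next_kind != "caption":
                errors.append(f"block {i}: picture without a caption")
    return errors
```

An empty result means that the selected document respects both rules. Nothing comparable is applied to the wording of the blocks themselves, which is the gap discussed in the rest of this section.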

3.1. Desiderata for Technical Documentation CMS

Content Tagging - metadata management. Dynamic adaptability in context is required for dealing with different domain terminologies: e.g. a technical writer would need to use different terms when writing about a system for maritime navigation than when writing about a similar system on an automobile (route vs. direction). Current systems rely on manual tagging and on manual metadata choice. Some kind of automation or support is desired here, to improve content adaptability and to reduce errors.
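A minimal sketch of what such support could look like, in Python. The termbase layout and the concept identifier are hypothetical, and the assignment of "route" to the maritime domain and "direction" to the automotive one is an assumption based only on the order in which the example above lists them.

```python
from typing import Dict, Optional

# Hypothetical termbase: concept identifier -> preferred term per domain.
# The domain assignment below is an assumption, not stated in the text.
TERMBASE: Dict[str, Dict[str, str]] = {
    "planned-path": {"maritime": "route", "automotive": "direction"},
}


def preferred_term(word: str, domain: str) -> Optional[str]:
    """Return the term preferred in `domain` when `word` is a known
    variant from another domain; return None when nothing must change."""
    lowered = word.lower()
    for variants in TERMBASE.values():
        if lowered in variants.values():
            preferred = variants.get(domain)
            if preferred is not None and preferred != lowered:
                return preferred
    return None
```

Once the domain of a document has been selected, an editor integrated in the CMS could call such a function on each term and propose the replacement to the writer instead of relying on memory.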

Glossaries and terminologies. Writing technical documentation requires the availability of technical glossaries and terminologies that help writers and editors be consistent and clear. Thus, existing glossaries and terminologies should be integrated into the authoring and editing phases within the CMS in order to offer better support to content managers and technical writers. In addition, as glossaries and terminologies often need to be customised for the specific project, tools that help with such collaborative customisation of terminologies would be most welcome.

[5] Current CMS solutions can provide standard spell-checking functionalities, but little or no advanced linguistic or terminological support.

Advanced Translation Management. Translation is currently often outsourced and done by professional translators outside the CMS, i.e. outside the “routine” working environment. This clearly increases not only the time and cost of the final documentation, but also the rate of human error. Ideally, technological solutions that assist professionals in translating technical documentation should be integrated into the CMS that handles all the other phases of the workflow, so that the whole process is more efficient and more easily controlled by the editor or the project manager.

Advanced Integration. A more thorough integration of the CMS with other software can be highly advantageous for companies. It would allow a company’s different sectors and collaborators to share correct information, to use it to write the documents that accompany the products along their life cycle, and above all to have complete information available at once, without having to consult different sources to get the complete picture required.

Figure 2: Work flow

Figure 1: A Working Methodology

Controlled and Simplified Language. Current CMS and technical writing tools, as we have seen above, mostly deal with formal issues related to the segmentation of content into minimal reusable pieces, to collaboration and sharing within the working team, and to the display and presentation of the content on different media. Aspects related to the quality of the content, i.e. to the information conveyed, are however still for the most part left to the human writer to control. Looking at the current landscape, we nevertheless see that the adoption and use of controlled (natural) languages has now become a best practice in many sectors. Simplified Technical English (STE [6]), for instance, is now commonly used in the editing of technical documentation, especially within the aerospace and military industry [7].

[6] www.asd-ste100.org/
[7] Other known controlled languages used in the industry are Caterpillar Technical English, IBM’s Easy English, BULL Global English, ...

Controlled (natural) languages generally restrict the grammar and vocabulary of the language in order to reduce or eliminate ambiguity and complexity (normally a word can have only one sense and one part of speech). For example, close can be used as the verb denoting the action of closing, but not as the adjective for proximity; this way, to close the door is accepted as a valid chunk in the controlled language, while do not go close to the landing gear is not.

The advantages of adopting a controlled language are now widely recognised in the sector: it helps increase the clarity of procedural technical language; it also helps improve the comprehension of documents by non-native as well as low-literacy speakers of the language, and it optimizes translation procedures by increasing the performance and reliability of CAT and MT tools.

Adopting a controlled language, however, forces the technical writer to follow the prescribed language rules and the restricted vocabulary specific to the topic or domain of the documentation, which increases the complexity of both the writer’s and the editor’s job.

Therefore, equipping a CMS for technical documentation with smart authoring tools that support the adoption of controlled languages would combine the potential of controlled languages with the advantages offered by structured management.

It is worth noting, however, that for many languages and domains standardised or shared controlled languages do not (yet) exist; each company or manufacturer simply defines its own requirements and establishes some internal best practice (often with rules that are not even explicitly formalised or stated). Moreover, even when a CL exists, technical writing projects often need specific stylistic rules and terminology that go beyond the standardised controlled language. In such cases, CMS could, and should, additionally provide functionalities that help writers and editors establish their in-house controlled language (rules and vocabulary), which may then be shared with a wider community in order to contribute to the spread and harmonisation of technical style within a single national or regional language.
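The "one word, one part of speech" restriction described above can be sketched as a lookup over tagged tokens. In the following Python sketch the vocabulary, the tag names and the error messages are hypothetical; the input is assumed to be already tagged, as it would be by a POS tagger placed in front of the check, and close in the rejected example is tagged as an adjective, following the description in the text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical controlled vocabulary: word -> parts of speech it is approved for.
APPROVED: Dict[str, Set[str]] = {
    "close": {"VERB"},
    "do": {"AUX"},
    "not": {"PART"},
    "go": {"VERB"},
    "to": {"ADP"},
    "the": {"DET"},
    "door": {"NOUN"},
    "landing": {"NOUN"},
    "gear": {"NOUN"},
}

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)


def violations(tagged: List[TaggedToken]) -> List[str]:
    """List every token that breaks the controlled vocabulary."""
    problems = []
    for word, pos in tagged:
        allowed = APPROVED.get(word.lower())
        if allowed is None:
            problems.append(f"'{word}' is not in the controlled vocabulary")
        elif pos not in allowed:
            problems.append(f"'{word}' is not approved as {pos}")
    return problems
```

Applied to the tagged chunk close the door, the function reports nothing; applied to do not go close to the landing gear, with close tagged ADJ, it reports that single token. An in-house controlled language could be expressed by editing the vocabulary table alone.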

4. How can LRT and Semantic Web help

Although human intervention in technical writing will continue to play a major role, language technology can help optimize the writing and editing tasks considerably, as the few existing products demonstrate. Of course, more research is required to fruitfully apply the developments achieved within the research community and to make the products efficient and widely available at low cost. However, it is high time we transferred some of the more stable and mature technology to small industry for exploitation.

LRT advancements can in particular help address the desiderata related to glossaries and terminologies, metadata management and domain adaptability, translation management, and support for controlled or simplified language.

Technology today is relatively mature for bootstrapping lexicons, terminologies and ontologies from corpora, and for merging or linking them to construct resources that cover different domains and usages (cf. Venturi et al. (2009), Carroll et al. (2012), Fazly et al. (2007), Lin et al. (2009), Del Gratta et al. (2012), Padro et al. (2013), among many others). Of course, these are still error-prone procedures; but together with automatic reliability scores for semi-automatic validation and with post-editing tools, they may help reduce production costs and increase the coverage of corporate resources.

Thanks to the effort of the computational linguistics community in the last decades, we now have (standardized) representation formats and architectures for lexical, terminological and ontological resources, which allow for easy integration within various applications [8]. A number of such resources are also already available on the web [9] and can be used to build prototypes and showcases (e.g. WordNets, ontologies, lexica; Francopoulo et al. (2009), Henrich and Hinrichs (2010), Del Gratta et al. (under review)). There has also been a great body of work on metadata management and standardization for terminology management, which is now largely adopted in the translation and localization world, but not so much in technical writing, and not so much for languages other than English.

By exploiting such representation models and standardized metadata, resources can be organized to list different usages for different domains or contexts, so that once the domain is identified or selected they can be used to provide some automation in content tagging and adaptability.

Terminologies and ontologies can also be further adapted to manage controlled vocabularies, for example by applying automatic acquisition and representation of term variants and acronyms (e.g. Jacquemin and Tzoukermann (1999), Thompson et al. (2011)) and word sense induction and classification (e.g. Lau et al. (2012), Manandhar et al. (2010), Pantel and Lin (2002)), such that the preferred term can be automatically suggested when a synonym or variant is used.

Support for authoring (in a controlled language) and copy editing can be provided by parsing tools, which can be adapted and used to signal complex word patterns and syntactic structures. At the simplest level, the integration of a POS tagger (state-of-the-art taggers nowadays exist for many languages) into a technical authoring system can be used, for example, to signal and highlight parts of speech that are not allowed, e.g. adverbs; the integration of morphological or syntactic analysers can help identify, for example, usages of the passive voice, which should be revised by the author. More sophisticated tools for checking whether the sentence the author is producing is compatible with the syntax of the controlled language (assuming this has been defined) can also be developed by building on the existing technology.

Furthermore, domain-specific authoring memories could be implemented, and potentially shared across companies, that collect previously used (and/or approved) chunks of text, so that within the CMS the author is provided (in real time) with suggestions about the most similar and/or frequent chunks of text already used within the domain, the same document, or the same company’s documentation material.
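The authoring-memory idea can be illustrated with nothing more than string similarity. The Python sketch below uses the standard library function difflib.get_close_matches as a stand-in for whatever retrieval technique a real system would adopt; the function name suggest, its parameters and the threshold are assumptions made for the example.

```python
import difflib
from typing import List


def suggest(draft: str, memory: List[str], limit: int = 3,
            cutoff: float = 0.6) -> List[str]:
    """Return up to `limit` stored chunks that resemble `draft`,
    most similar first; `cutoff` is the minimum similarity ratio."""
    return difflib.get_close_matches(draft, memory, n=limit, cutoff=cutoff)
```

Given a memory that contains the approved sentence "Disconnect the power supply before opening the cover.", a draft that differs from it by one word would receive that sentence as its first suggestion, so that the writer can reuse the approved wording. Character-level similarity is only a rough proxy; ranking by frequency, domain or document, as envisaged above, would require the metadata kept by the CMS.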

[8] e.g. LMF, NIF, TMF, Lemon, among others, which can all be represented in XML, RDF, RDF Linked Data, or JSON syntax; cf. Bora et al. (2010), Hayashi et al. (2012), McCrae et al. (2012).

[9] We will deliberately not consider licensing issues in this discussion.

Finally, as technical documentation often needs to be written in the language of the target audience, and translation from English is not always an option (in the first place because documentation may originally be written in another language), Machine Translation systems as well as translation memories should be integrated into CMS and adapted to suit the needs of technical writers and editors. In this way, massive outsourcing of the translation process can be avoided, while at the same time editors or project managers can be given greater control over the whole workflow. CMS enriched with MT should further be equipped with post-editing applications, so that the revision of content becomes cheaper. Instead of independent translation tools, be they fully automatic or not, what is needed today, especially by SMEs in small countries (i.e. with smaller markets), is easy-to-use machine and/or assisted translation tools for many language pairs. Also of interest is the design and deployment of shared translation memories that are capable of incremental and smart augmentation, so that human translation can become easier and ensure coherence within the same domain, topic and language style.

Technical authoring tools that implement some of the functionalities mentioned above do indeed already exist (e.g. Acrolinx IQ Suite [10], Boeing Simplified English Checker [11], Adobe Technical Communication Suite); most of them, however, are developed and marketed as legacy toolkits by big industries, at prices that SMEs in countries where the internal market is not very big cannot afford. Even more importantly, perhaps, they support few EU languages, mostly English and German. Very little exists for other languages [12].

To promote competitiveness in non-English speaking countries, such technology instead needs to be mastered also by smaller companies, which can adapt and customise solutions for their specific reality. In recent years, language technology has been looking to the web, and the paradigm of language tools as (distributed) web services and web applications is now relatively consolidated. This allows for modularisation, easy experimentation by companies and better academia-industry transfer, and is in line with the “Software as a Service” paradigm currently adopted by many CMS software houses. Thus, while academia should disclose its achievements as open source software, it is also important to pursue research on the deployment of language technology in the web-service paradigm, so that new functionalities can become quickly and easily usable by businesses that are not interested in technology development per se.

10 www.acrolinx.com/
11 www.boeing.com/boeing/phantom/sechecker/
12 In France, for example, a movement has started to promote the adoption of controlled technical languages and to develop supporting computational tools; see e.g. the projects Sense Unique, //tesniere.univ-fcomte.fr/sensunique.html, and LiSe (Renahy and Thomas, 2009).

5. Conclusions

Given the recent change in paradigm and strategy in the technical writing business, and given that information is a very important part of products, satisfying all the information needs of users and legislation provides the product with a considerable competitive advantage. This implies that content companies have to enhance not only their publication channels and platforms, but also their content creation methodology, introducing advanced content technology support to help them go beyond “free style” writing as part of their business strategy.

In this paper, we went through the technological features that are already available to technical writers and editors. In particular, the adoption of CMS has brought great improvements both in managing the technical documentation team and workflow, and in modularizing and structuring content, so that most formal aspects of technical documentation creation and publishing are automated.

Still, the authoring part of the job is left uncovered within CMS, and authors need to resort to independent authoring software, which is highly expensive and often available for a limited set of languages. Thus, we have formulated some of the most urgent desiderata for CMS dedicated to technical writing, have tried to clarify which LRT can provide support, and have briefly sketched how.

Language and Web technology can indeed help develop such a forward-looking advanced strategy. Authoring toolkits that tackle some of the mentioned needs already exist. However, these are generally quite expensive for small companies, and mostly work for English and for a few highly restricted technical domains. Instead, what is envisaged is software and services that are easy to use, customize and integrate, for potentially all EU languages and various text styles, so that even smaller companies can afford to introduce important innovations within their work environments.

Certainly some of the text analysis, terminology extraction and management technology is mature enough for integration into legacy applications on a web-service basis. In particular, what we try to encourage is research towards a strong integration of LRT and Semantic Web functionalities within CMS for the technical documentation business, in such a way that they can also handle controlled languages in the authoring phase. This might, in fact, provide the business with a double advantage: on both the form and the content side.

6. Acknowledgements

This work was carried out within a collaboration between CNR research institutes and SMEs under the auspices of the Tuscany Region, whose priority is to promote strong public-private partnerships (PPP) in view of the European challenges.

7. References

ACEMA/ASD-S1000D. (). International specification for technical publications using a common source database. ASD.

Anon., (1967). Title title title title title title title title title title. Organization organization organization.

Blunsom, Phil and Baldwin, Timothy. (2006). Multilingual deep lexical acquisition for hpsgs via supertagging. In Jurafsky, Dan and Gaussier, Eric, editors, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, (EMNLP06), pages 164–171.

Bora, S. Ali, Hayashi, Yoshihiko, Monachini, Monica, Soria, Claudia, and Calzolari, Nicoletta. (2010). An lmf-based web service for accessing wordnet-type semantic lexicons. In Proceedings of the 2010 Language Resources and Evaluation Conference.

BSI, (1973a). Natural Fibre Twines. British Standards Institution, London, 3rd edition. BS 2570.

BSI. (1973b). Natural fibre twines. BS 2570, British Standards Institution, London. 3rd. edn.

Carroll, John, Koeling, Rob, and Puri, Shivani. (2012). Lexical acquisition for clinical text mining using distributional similarity. In Gelbukh, Alexander, editor, Computational Linguistics and Intelligent Text Processing, volume 7182 of Lecture Notes in Computer Science, pages 232–246. Springer Berlin Heidelberg.

Del Gratta, Riccardo, Frontini, Francesca, Monachini, Monica, Quochi, Valeria, Rubino, Francesco, Abrate, Matteo, and Duca, Angelica Lo. (2012). L-leme: An automatic lexical merger based on the lmf standard. In Proceedings of the LREC 2012 Workshop on Language Resource Merging, pages 31–40.

Del Gratta, Riccardo, Frontini, Francesca, Khan, Fahad, and Monachini, Monica. (Under review). Converting the parole simple clips lexicon into rdf using the lemon model. Semantic Web Interoperability, Usability, Applicability (SWJ), http://www.semantic-web-journal.net/system/files/swj487.pdf(1).

Douglas, S. and Hurst, M. (1996). Controlled language support for Perkins Approved Clear English (PACE). In CLAW96: Proceedings of the First International Workshop on Controlled Language Applications, pages 26–27, Leuven, Belgium.

Fazly, Afsaneh, Stevenson, Suzanne, and North, Ryan. (2007). Automatically learning semantic knowledge about multiword predicates. Language Resources and Evaluation, 41(1):61–89.

Francopoulo, Gil, Bel, Nuria, George, Monte, Calzolari, Nicoletta, Monachini, Monica, Pet, Mandy, and Soria, Claudia. (2009). Multilingual resources for nlp in the lexical markup framework (lmf). Language Resources and Evaluation, 43(1):57–70.

Grandchercheur, L. B., (1983). Vers une modélisation cognitive de l’être et du néant, pages 6–38. Lawrence Erlbaum Associates, Hillsdale, N.J.

Hayashi, Yoshihiko, Bora, Savas Ali, Monachini, Monica, Soria, Claudia, and Calzolari, Nicoletta. (2012). Lmf-aware web services for accessing semantic lexicons. Language Resources and Evaluation, 46(2):253–264.

Henrich, Verena and Hinrichs, Erhard W. (2010). Standardizing wordnets in the iso standard lmf: Wordnet lmf for germanet. In COLING, pages 456–464.

ISO-IEC-82079-1. (2012). Preparation of instructions for use – Structuring, content and presentation – Part 1: General principles and detailed requirements. ISO.

Jacquemin, Christian and Tzoukermann, Evelyne. (1999). Nlp for term variant extraction: Synergy between morphology, lexicon, and syntax. In Strzalkowski, Tomek, editor, Natural Language Information Retrieval, volume 7 of Text, Speech and Language Technology, pages 25–74. Springer Netherlands.

Laan, Krista Van. (2012). The Insider’s Guide to Technical Writing. XML Press, Laguna Hills, CA.

Lau, Jey Han, Cook, Paul, McCarthy, Diana, Newman, David, and Baldwin, Timothy. (2012). Word sense induction for novel sense detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’12, pages 591–601, Stroudsburg, PA, USA. Association for Computational Linguistics.

Lin, Jimmy, Murray, G. Craig, Dorr, Bonnie J., Hajič, Jan, and Pecina, Pavel. (2009). A cost-effective lexical acquisition process for large-scale thesaurus translation. Language Resources and Evaluation, 43(1):27–40.

Maass, Wolfgang and Kowatsch, Tobias, editors. (2012). Semantic Technologies in Content Management Systems: Trends, Applications and Evaluations. Springer, Berlin.

Manandhar, Suresh, Klapaftis, Ioannis P., Dligach, Dmitriy, and Pradhan, Sameer S. (2010). Semeval-2010 task 14: Word sense induction & disambiguation. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages 63–68, Stroudsburg, PA, USA. Association for Computational Linguistics.

McCrae, John, Montiel-Ponsoda, Elena, and Cimiano, Philipp. (2012). Integrating wordnet and wiktionary with lemon. In Linked Data in Linguistics, pages 25–34.

McGrane, Karen. (2012). Content Strategy for Mobile. http://www.abookapart.com/products/content-strategy-for-mobile.

OASIS-DITA. (2010). Darwin Information Typing Architecture (DITA). OASIS.

O’Keefe, Sarah S. and Pringle, Alan S. (2012). Content Strategy 101: Transform Technical Content Into a Business Asset. Scriptorium Publishing Services, Inc., Durham, NC.

Padro, Muntsa, Bel, Nuria, and Necsulescu, Silvia. (2013). Towards the fully automatic merging of lexical resources: A step forward. CoRR, abs/1303.1929.

Pantel, Patrick and Lin, Dekang. (2002). Discovering word senses from text. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, pages 613–619, New York, NY, USA. ACM.

Renahy, Julie and Thomas, Izabella. (2009). Compagnon lise: A collaborative controlled language writing assistant. In ISMTCL Proceedings, Bulag, pages 223–230, Besancon, France. PUFC.

Thompson, Paul, McNaught, John, Montemagni, Simonetta, Calzolari, Nicoletta, del Gratta, Riccardo, Lee, Vivian, Marchi, Simone, Monachini, Monica, Pezik, Piotr, Quochi, Valeria, Rupp, CJ, Sasaki, Yutaka, Venturi, Giulia, Rebholz-Schuhmann, Dietrich, and Ananiadou, Sophia. (2011). The biolexicon: a large-scale terminological resource for biomedical text mining. BMC Bioinformatics, 12(1):397.

Tufte, Edward R. (2001). The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, 2nd edition.

UNI-10653. (2003). Qualita della documentazione tecnica di prodotto. UNI.

Venturi, Giulia, Montemagni, Simonetta, Marchi, Simone, Sasaki, Yutaka, Thompson, Paul, McNaught, John, and Ananiadou, Sophia. (2009). Bootstrapping a verb lexicon for biomedical information extraction. In Gelbukh, Alexander, editor, Computational Linguistics and Intelligent Text Processing, volume 5449 of Lecture Notes in Computer Science, pages 137–148. Springer Berlin Heidelberg.

Zavatta, A. (1992). Un Générateur d’Insultes s’intégrant dans un Système de Dialogue Humain-Machine. Thèse de doctorat en informatique, Université Paris-sud, Centre d’Orsay.