Guidelines for Local Language Content Development

Content Planning, Development, Licensing, Online

Publishing, Dissemination and Evaluation

Mumit Khan and Urmi Lohani, BRAC University

Bangladesh

What is “content”?

• Anything that can be put online? Printed matter too?

Published by a “reputable” publisher? YouTube

videos? Cartoons? Maps?

• Who needs it?

• Who has access to it?

• Who judges its quality?

• How to effectively develop content and disseminate?

• Is it “original” material, or translated from another

language? Who “owns” it?

• How to evaluate the content’s “impact”?

Is this content?

How about this?

MIT OCW Lecture Notes

A Typical PanL10n Content

1. Multi-format – HTML, PDF, …

2. Multi-target groups – training for novices and

advanced users of Linux

3. (Perhaps) Multi-language – English and local language

EIGHT different versions without a sound

development process!
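The count of eight follows from multiplying the three two-way choices listed above. A minimal sketch, assuming exactly two formats, two target groups, and two languages (the concrete values are illustrative, since the slide only names "HTML, PDF, …"):

```python
from itertools import product

# Illustrative values: the slide lists the three dimensions but not a fixed set of values.
formats = ["HTML", "PDF"]
audiences = ["novice", "advanced"]
languages = ["English", "local"]

# Every combination is a separate deliverable that has to be kept in sync.
variants = list(product(formats, audiences, languages))

for fmt, audience, language in variants:
    print(f"{fmt} / {audience} / {language}")

print(len(variants), "versions to maintain")
```

Adding a third format or language multiplies the count again, which is why the remaining slides argue for a single source and a defined process.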

Content Development

Methodology (Life-cycle)

1. Choosing target audience and content type

2. Choosing team and resource partners

3. Conducting needs assessment

4. Collecting raw materials for content

5. Processing raw materials into content form

6. Disseminating content using an appropriate license

7. Collecting feedback and evaluating content

Content Development

Methodology Example

Dhaka Ahsania Mission (DAM)

Content Planning

Choosing target audience and

content type

• The selection of target audience and content type is typically a “business” decision

• Issues when selecting content type and media

– Text/Audio/Video/Multimedia

– Printed only – difficult for multimedia content

– Online only – accessibility issues

– Printed and Online – development tools and

technology selection is very important

Choosing resource partners

• The supply side needs to be addressed well

before the content generation phase

• Typical scenario involves surveying the content area to identify potential resource partners

• Longevity and sustainability of the

relationships need to be considered

• Copyright and licensing issues start with

resource partners

Needs Assessment

• Very much depends on the target audience

• For development work, participatory methods

are most prevalent (from lessons learned)

• Need some experience and organizational support to conduct participatory sessions

• Stakeholder participation at some level is

critical for content success

• Must have methodology for post-evaluation

Needs Assessment Case Study –

D.Net’s livelihood content

Content Development

Process, Technologies & Tools, and

Media

• Development process must be well defined

– Collection and evaluation of raw materials

– Formatting and Incorporation into existing content

• Technologies and tools play a critical role in effective and sustainable development

– Using a CMS, XML technologies

• Chosen media will often dictate the process

and technology & tools choices

Development Process

• Similar to Software Engineering process

methodologies

– Content repositories (with revision control)

– Content owners

– Content reviewers

– Layout/formatting owners

– Release cycle

Example Framework - Up2UML

Ref: “Building re-configurable multilingual training media”, I. Grützner, L. Thomas, S. Steinbach-Nordmann, Current Developments in Technology-Assisted Education (2006).

The article demonstrates the importance of designing re-configurable and re-usable training media for trainers, course providers and course developers. It analyzes the underlying concepts of re-configurable training media and their authoring, production and distribution processes, and shows the importance of sound development for the conceptualization and realization of modular training media. It illustrates how re-configurable training media can be built in various languages and media formats using open standards and open source technologies. Based on a multi-national project, it discusses the practical application of such an approach and shows how trainers and course providers might be supported in building individualized sets of training media across various media formats (WBT, print).

Up2UML Highlights

• Multi-language, -format (PDF, HTML), -level

(“beginner”, “advanced”), -target groups, etc

• Sound development processes for design and development

– modularization, re-use, no (content) redundancy

• Separating structure from layout – use XML technologies

• Uses open standards and open source

– DocBook/DITA/etc. markup frameworks, SVN
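The modularization and re-use points can be sketched in a few lines. This is not how Up2UML itself is implemented; it is a minimal illustration, assuming each module is stored once per language and tagged with the levels it serves. The `Module` class, the module ids, and the sample texts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Module:
    """One reusable unit of content, stored once per language."""
    module_id: str
    levels: set   # target levels this module is meant for
    text: dict    # language code -> body text

# Hypothetical modules for a Linux training course.
MODULES = [
    Module("intro", {"beginner", "advanced"},
           {"en": "What is Linux?", "bn": "[Bangla text of the intro]"}),
    Module("shell-basics", {"beginner"},
           {"en": "Using the shell.", "bn": "[Bangla text on the shell]"}),
    Module("kernel-tuning", {"advanced"},
           {"en": "Tuning the kernel."}),  # not yet translated
]

def assemble(level, lang):
    """Pick the modules for one level and language; report missing translations."""
    selected, missing = [], []
    for module in MODULES:
        if level not in module.levels:
            continue
        if lang in module.text:
            selected.append((module.module_id, module.text[lang]))
        else:
            missing.append(module.module_id)
    return selected, missing

course, untranslated = assemble("advanced", "bn")
print([module_id for module_id, _ in course], "missing:", untranslated)
```

Because every variant is assembled from the same modules, a correction to one module reaches every version that uses it, and untranslated modules show up as an explicit list rather than as silent gaps.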

A (Trivial) DocBook Article

<?xml version='1.0'?>

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">

<article>

<sect1 id="introduction"><title>Hello world</title>

<para>

Hello world!

</para>

</sect1>

</article>

To render into HTML/PDF (assuming the source above is saved as myfile.en.xml):

$ docbook2html myfile.en.xml

$ docbook2pdf myfile.en.xml
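To make the structure-versus-layout idea concrete, the sketch below walks a trimmed copy of the article and emits bare HTML. It is not how docbook2html works internally; it only shows that the markup carries structure (section, title, paragraph) and that presentation is decided by whatever processes it. The DOCTYPE line is left out so that Python's standard-library parser does not need the DocBook DTD, and `render_html` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from html import escape

# Trimmed copy of the article, without the XML declaration and DOCTYPE.
DOCBOOK = """<article>
<sect1 id="introduction"><title>Hello world</title>
<para>
Hello world!
</para>
</sect1>
</article>"""

def render_html(docbook_source):
    """Map <sect1>/<title>/<para> to <h1>/<p>; all other elements are ignored."""
    root = ET.fromstring(docbook_source)
    parts = []
    for section in root.iter("sect1"):
        section_id = escape(section.get("id", ""))
        title = escape(section.findtext("title", default="").strip())
        parts.append(f'<h1 id="{section_id}">{title}</h1>')
        for para in section.findall("para"):
            body = escape((para.text or "").strip())
            parts.append(f"<p>{body}</p>")
    return "\n".join(parts)

print(render_html(DOCBOOK))
```

Swapping `render_html` for a function that emits another format would leave the source untouched, which is the property the real toolchains rely on.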

HTML Output (ugly!)

Add a few headers

<article> <!-- Nested within the article element -->

<artheader>

<title>Hello World 2</title>

<author>

<firstname>Foo</firstname>

<surname>Bar</surname>

<affiliation>

<address>

<email>foo@bar.ORG</email>

</address>

</affiliation>

</author>

</artheader>

HTML Output (still ugly!)

Add an abstract

<article> <!-- Nested within the article element -->

<artheader>

<abstract>

<para>

This document is intended to help newbies do ...

</para>

</abstract>

</artheader>

HTML Output (still ugly!)

A (Trivial) DocBook Book

<?xml version='1.0'?>

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"

"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">

<book>

<title>My First Book!</title>

<chapter id="ch1">

<title>First chapter</title>

<para>

Chapter 1 continues ...

</para>

</chapter>

</book>

HTML Output

A (real) DocBook Book

DocBook elements

• Sets, two or more books

• Books, contains a mixture of Dedication, Navigational

Components, Divisions, and Components

• Divisions, which divide books into parts and references

– Parts contain components

– References contain refentries

• Components, which divide books or divisions into chapters

• Sections, which subdivide components

• Meta-information elements

• Block elements, paragraphs, lists, figures, equations, …

• Inline elements, markups of running text

Customizing the “look and feel”

• The right way: style sheets to customize L&F

– Cascading style sheet (CSS)

– XSL style sheets

– XQuery

• The hard way: Introduce custom declarations

and markups to customize L&F

DITA HTML Example

DITA PDF Example

Which XML Framework?

• DocBook

– a document type definition (DTD) for SGML and XML

– Well suited to technical documentation, with plenty of examples

• DITA

– Topic oriented

– Much more flexible for a large collection of items

Content Management Systems

(CMS)

• Joomla, Drupal, PHP-Nuke, DotNetNuke, Plone, Wiki, …

• Excellent platforms for online-only content publishing

• Wikis provide an excellent collaborative development framework

• Most (all?) have revision control

Why Not Just Use a Word Processor (ODF, OOXML, etc.)?

• The same reason Mathematicians and

Scientists (and Linguists) use TeX/LaTeX

• Very difficult to manage after initial version

• Not transportable

• Must depend on 3rd party format conversion

tools

• Structure vs Layout – separating presentation

from structure

Resources - Books

Resources - Tools

• XML technologies

– DocBook. Much of the open source world uses it. www.docbook.org, newbiedoc.sourceforge.net/metadoc/docbook-guide.html

– IBM’s Darwin Information Typing Architecture (DITA), OASIS standard. Extensive and mature. http://dita-ot.sourceforge.net/, http://dita.xml.org/

– “How Did You Decide Between DocBook, DITA, or Custom DTDs?”, Eliot Kimber. http://xml.coverpages.org/DocBook-DITA-customDTD.html

• CMS – Joomla, Drupal, Plone, Moodle, …

Content Licensing

“© 2008, 2009 Foo Bar, All Rights Reserved”

vs. “No rights reserved”

Public domain vs. copyrighted

content

• What is “public domain”?

– “I place this document in the public domain …”

– After copyright expires (more on duration below)

• What can be copyrighted and how?

• A quick survey of regional copyright laws

• Duration of copyright

– US (amended 1998): life of author + 70 years; if work for

hire (or under pseudonym) 28 years + 67 years = 95 years;

all work

Licensing

• GNU General Public License, GPL (software); GNU Free Documentation License, FDL (content)

– Copyleft

• MIT, BSD, Apache, Artistic, … – designed for free software, much more lenient than GPL/FDL

• Creative Commons – “customizable” content

licensing (of the future?)

– Allows modeling of all the other licenses, free or non-free

– Automatic license generator

The Question – Which License is

Right For You?

• Simple - Customize Creative Commons

• Pick the appropriate CC attributes.

– Corpus example: Attribution-Noncommercial-NoDerivs (ShareAlike does not apply when derivatives are not allowed)

– Document: Attribution-Noncommercial-ShareAlike

• Also offers an easy way to release work into the public domain: Public Domain Dedication, and Founders’ Copyright (work released into the public domain after 14 or 28 years)

Creative Commons Overview

• Four basic conditions:

– Attribution (by)

– Noncommercial (nc)

– No Derivative Works or NoDerivs (nd)

– ShareAlike (sa)

• “Mix and Match” conditions to customize your

license

• Non-revocable

• Almost anything is possible …
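The "mix and match" step can be sketched as a small function that turns the chosen conditions into a license code and deed URL. This is an illustration, not the Creative Commons license generator. It assumes that Attribution is always included and that NoDerivs and ShareAlike are not offered together, which is how the standard license suite is set up as far as I know. The function name and the default version number are my own choices.

```python
def cc_license(noncommercial=False, no_derivs=False, share_alike=False, version="4.0"):
    """Return (short name, deed URL) for a combination of CC conditions.

    Attribution (by) is always included. The version default is an assumption;
    pass the version you actually intend to use.
    """
    if no_derivs and share_alike:
        # ShareAlike governs derivative works, so it has nothing to act on
        # when derivatives are not permitted.
        raise ValueError("NoDerivs and ShareAlike cannot be combined")
    codes = ["by"]
    if noncommercial:
        codes.append("nc")
    if no_derivs:
        codes.append("nd")
    if share_alike:
        codes.append("sa")
    code = "-".join(codes)
    name = f"CC {code.upper()} {version}"
    url = f"https://creativecommons.org/licenses/{code}/{version}/"
    return name, url

# The "Document" choice from the previous slide: Attribution-Noncommercial-ShareAlike.
print(cc_license(noncommercial=True, share_alike=True))
```

In practice the license generator on creativecommons.org remains the authoritative way to obtain the license text and markup; the sketch only shows how few independent choices are involved.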

And the answer is …

Generated License

Grant of Rights Letter

January 13, 2009

To: CRBLP, BRAC University, Bangladesh

Re: Grant of Rights

This letter is to warrant that The Daily Star is the owner of all necessary legal rights to grant duplication rights associated with the following title:

Publisher    | Publication                                            | IPR owner
Mahfuz Anam  | The Daily Star, all issues of July 2007-December 2007  | The Daily Star

The Daily Star, hereunder known as the Grantor, hereby grants CRBLP of BRAC University, hereunder known as the Grantee, the rights to have the above listed title duplicated as specified by the Grantor. Furthermore, Grantor hereby agrees to promptly provide to Grantee any additional information reasonably requested in order to confirm the grant of rights provided herein.

Signed: ___________________________________

Name: Mahfuz Anam

Title: Publisher

Contact Information: Phone: 812-4944, Fax: 812-5155

Case Studies for IP

• Content for Training and Education

– D.Net/BD, ENRD/NP, NIDA/Cambodia, NUCES/PK…

• Content for Livelihood

– D.Net, …

Copyright resources

• Wikipedia: http://en.wikipedia.org/wiki/Copyrights

• Stanford Copyright and Fair Use: fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/chapter0/0-a.html

• Creative Commons (+ license generator): http://creativecommons.org/

• 10 Big Myths About Copyright Explained: http://www.templetons.com/brad/copymyths.html

Guidelines for Local Language Content

Development II

Content Online Publishing, Dissemination and Evaluation

Online Publishing

• For “non-interactive” content (PDF, DOC,

OOXML, ODF), just put online! Assuming a

formal or informal CMS

• For interactive “native” content

– Pure HTML

– Generated (CMS, DocBook, DITA, other XML

technologies, TeX/LaTeX, etc)

– Many accessibility issues, starting with fonts,

conventions, etc

Multilingual Online Content

• HTTP/1.1 Content-Language header

Content-Language: en, bn

• Language attributes

<html lang="en">

• Meta element with Content-Language

<meta http-equiv="Content-Language" content="en,bn" />

• And in the document

<p>The Bangla word for <em>Saturday</em> is <em lang="bn">শনিবার</em>.</p>

http://www.w3.org/International/tutorials/language-decl/
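The declarations above can be generated together so that the header, the page language, and the inline markup stay consistent. A minimal sketch, assuming an English page that quotes Bangla; the helper names are hypothetical, and শনিবার is the Bangla word for Saturday.

```python
from html import escape

def content_language_header(languages):
    """Value for the HTTP Content-Language header, e.g. 'en, bn'."""
    return ", ".join(languages)

def foreign_phrase(text, lang):
    """Wrap a phrase in an element that declares its own language."""
    return f'<em lang="{escape(lang)}">{escape(text)}</em>'

def build_page(primary_lang, body_html):
    """Declare the page's main language on the <html> element."""
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(primary_lang)}">\n'
        '<head><meta charset="utf-8"></head>\n'
        f"<body>{body_html}</body>\n"
        "</html>"
    )

# Headers a server would send along with the page.
headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Language": content_language_header(["en", "bn"]),
}

saturday = foreign_phrase("শনিবার", "bn")
page = build_page("en", f"<p>The Bangla word for <em>Saturday</em> is {saturday}.</p>")

print(headers["Content-Language"])
print(page)
```

The header describes the intended audience of the whole document, while the `lang` attributes tell the browser which language each piece of text is in, so both are worth setting.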

Rendering Issues

• Browser (client) must know how to render the

content

– Content meta-data, language tags

– Mapping language to typeface/font (client dependent)

– Proprietary and platform-specific solutions such as Web

embedded fonts, hardcoded font names…

• What about lack of meta information?

• Non-standard encoding? (non-Unicode, non-ISO)
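When content does carry language tags, one way a publisher can influence the language-to-font mapping is a style sheet keyed on those tags through the CSS `:lang()` selector. The sketch below generates such rules. The font names are illustrative assumptions, and the client still decides what is rendered if none of the listed fonts is installed.

```python
# Illustrative font choices; substitute fonts your audience actually has installed.
FONT_STACKS = {
    "bn": ["SolaimanLipi", "Noto Sans Bengali", "sans-serif"],
    "ne": ["Noto Sans Devanagari", "sans-serif"],
}

def css_for_languages(font_stacks):
    """Emit one :lang() rule per language code, in sorted order."""
    rules = []
    for lang, fonts in sorted(font_stacks.items()):
        # Font names that contain spaces are quoted.
        family = ", ".join(f'"{name}"' if " " in name else name for name in fonts)
        rules.append(f":lang({lang}) {{ font-family: {family}; }}")
    return "\n".join(rules)

print(css_for_languages(FONT_STACKS))
```

This only works for text that is tagged with its language, which is why missing meta information, the next bullet's concern, is a rendering problem and not just a cataloguing one.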

Encoding Issues

• Standard ISO encodings, including Unicode

• Non-standard encodings

– ASCII fonts

• Search engine

– Simple if a standard encoding is used

– Must have custom handlers for non-standard ones

• Google/Yahoo/Altavista may simply not care enough to

handle it for your content
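The search problem can be shown with a toy example. With an "ASCII font", the stored text is really Latin characters that only look like local script when a particular font is installed, so a Unicode query never matches it. The mapping table below is hypothetical and far smaller than any real legacy encoding, which would also need rules for reordering and conjuncts.

```python
import unicodedata

# Hypothetical legacy font encoding: each Latin character displays as a Bangla letter.
LEGACY_TO_UNICODE = {"k": "ক", "L": "খ", "M": "গ"}

def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Custom handler: convert legacy-encoded text to Unicode before indexing."""
    return "".join(table.get(ch, ch) for ch in text)

def contains(document, query):
    """Substring search after Unicode normalization of both sides."""
    normalized_doc = unicodedata.normalize("NFC", document)
    normalized_query = unicodedata.normalize("NFC", query)
    return normalized_query in normalized_doc

legacy_document = "kLM"   # stored text: plain Latin letters
query = "কখগ"             # what a user would actually type

found_raw = contains(legacy_document, query)
found_converted = contains(legacy_to_unicode(legacy_document), query)
print(found_raw, found_converted)
```

A general-purpose search engine has no reason to know the conversion table, so content in such encodings is effectively invisible unless the publisher converts it to Unicode first.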

Accessibility Issues

• Display fonts

• Input method

– Forms

– Search

• Print disability

– “Infomediaries”

– Speech technologies

Dissemination

• End-user training to use content

• “Last mile” issues

– CD/DVD based distribution (e.g., eGranary, D.Net)

• Release maintenance

– Updates?

• Cultural issues

• Gender issues

• Literacy issues

Summary

• Developing “living” content requires a

complete development life-cycle

– User intervention to create relevant content

– Development methodology to improve process

– Respects source IP and uses effective licensing

regimes to disseminate

– Follows online publishing conventions to

disseminate

– Evaluates the content in context
