Knowledge Management & Linguistic Pluralism
Rajeeva Ratna Shah, Secretary
Government of India, Ministry of Communications & Information Technology
Department of Information Technology
A CASE OF COMMUNICATION GAP
Wing Commander to Squadron Leader
At 9 o'clock tomorrow there will be an eclipse of the Sun, something which does not occur every day. Get the men to fall out in the Lal Bahadur Shastri Marg in their uniform so that they will see this rare phenomenon, and I will explain it to them. In case of rain we will not be able to see anything; then take the men to the gymkhana.
Squadron Leader to Flying Officer
By order of the Wing Commander, tomorrow at 9 o'clock there will be an eclipse of the Sun. If it rains you will not be able to see it from the Lal Bahadur Shastri Marg, so then, in uniform, the eclipse of the Sun will take place in the gymkhana, something that does not occur every day.
Flying Officer to Sergeant
By order of the Wing Commander, in uniform, tomorrow at 9 o'clock in the morning, the inauguration of the eclipse of the Sun will take place in the gymkhana. The Wing Commander will give the order if it should rain, something which occurs every day.
Sergeant to Corporal
Tomorrow at nine the Wing Commander in uniform will eclipse the Sun in the gymkhana, as it occurs every day, if it is a nice day; if it rains, then in the Lal Bahadur Shastri Marg.
Corporal to Lance Corporal
Tomorrow at nine the eclipse of the Wing Commander in uniform will take place because of the Sun. If it rains in the gymkhana, something which does not take place every day, you will fall out in the Lal Bahadur Shastri Marg.
COMMENTS AMONG ALL IN THE UNIT
Tomorrow, if it rains, it looks as if the sun will eclipse the Wing Commander in the gymkhana. It is a shame that this does not occur every day.
The Broadening Sphere of Information Technology
(Diagram: Data, Information, Knowledge; Computation, Communication, Cognition.)
Old Economy: Capitalist Society (Legacy System)
• Core: Competition is the key, since capital is a limited and scarce resource.
• Capital diminishes with sharing.
• Capital investments are one-time and subject to low obsolescence.
New Economy: Information Society (Knowledge Society)
• Core: Collaboration and sharing are the key, since knowledge is inexhaustible.
• Knowledge increases with sharing.
• Knowledge investments need continuous upgrading and have high obsolescence.
Knowledge of the 21st Century: STHULA-JAGAT (Macrocosm) and SOOKSMA-JAGAT (Microcosm)
Building Blocks & Knowledge Tools of the 21st Century
• ATOMS: NANOTECH
• NEURONS: NETWORKS
• BITS: COMPUTERS
• GENES: BIOTECH
Erosion of Knowledge Base due to Loss of Language
Technologies: transformations in societies, increase in knowledge. BUT...
From an estimated 10,000 languages in 1900, the world has about 6,700 languages surviving today, 33% of them in Asia and 19% in the Pacific. Only 50 percent of the surviving languages are being taught to children. Half the current languages will be effectively extinct within a single generation.
Is there a gain in knowledge or a loss of knowledge?
Sprawling Digital Divide
Rough sketch of the global digital divide among scripts:
• Latin alphabet users: 39% of the global population enjoy 84% of Internet access.
• Hanzi (Chinese ideograph) users in China, Japan and Korea: 22% of the global population enjoy 13% of Internet access.
• Arabic script users: 9% of the population have 1.2% of Internet access.
• Indic script users: 22% of the world population have just 0.3% of Internet access.
Is technology meant to divide or to unite?
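One way to read these four figures is as a ratio of each group's share of Internet access to its share of world population: 1.0 would mean parity. The Python sketch below computes that ratio from the slide's numbers; `access_index` is an illustrative name of our own, not a metric defined in the presentation.

```python
# (percent of world population, percent of Internet access), taken from the slide
SCRIPT_SHARES = {
    "Latin": (39.0, 84.0),
    "Hanzi": (22.0, 13.0),
    "Arabic": (9.0, 1.2),
    "Indic": (22.0, 0.3),
}


def access_index(population_pct: float, access_pct: float) -> float:
    """Hypothetical index: share of Internet access per unit share of population.

    1.0 means parity; values below 1.0 mean the group is under-served.
    """
    return access_pct / population_pct


if __name__ == "__main__":
    for script, (pop, acc) in SCRIPT_SHARES.items():
        print(f"{script:7s} {access_index(pop, acc):6.3f}")
```

On these figures Latin-script users sit at about 2.15 times parity and Indic-script users at about 0.014, so per head the Latin-script group has roughly 158 times the access of the Indic-script group.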
Exponential Growth Trends in Computer Performance
(Chart: performance in MIPS, from 100 to 1,638,400, against year, 1994 to 2020, with milestones Giga PC, 10G PC, 100G PC and Tera PC, and two trend lines: doubling every 15 months and doubling every 2 years.)
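The two trend lines differ only in the doubling period T. If performance P doubles every T months, then P(t) = P0 · 2^(t/T). Over one decade (120 months), doubling every 15 months gives 2^8 = 256 times the starting performance, while doubling every 2 years gives only 2^5 = 32 times. The short Python function below makes that arithmetic explicit; the function name is illustrative.

```python
def grow(start: float, months: float, doubling_months: float) -> float:
    """Value after `months` of pure exponential growth that doubles every `doubling_months`."""
    return start * 2 ** (months / doubling_months)


if __name__ == "__main__":
    decade = 120  # months
    print(grow(1, decade, 15))  # 256.0 (doubling every 15 months)
    print(grow(1, decade, 24))  # 32.0  (doubling every 2 years)
```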
Future Direction: Information Interspace
• The third wave in the ongoing evolution of the Global Information Infrastructure
• Computing technology will transform the Internet into the Interspace.
• In future, the information infrastructure will support semantic indexing and concept navigation across widely distributed community repositories.
• Concept navigation will become a standard function in the Interspace.
E-mail in ARPANET (1965-85)
Document Browsing in INTERNET (1985-2000)
Concept Navigation in INTERSPACE (2000-10)
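Concept navigation across distributed repositories can be illustrated, very loosely, with an index that maps each concept to the documents that mention it and suggests related concepts by how often they co-occur. The Python sketch below is a toy under that assumption: the class, the repository names and the co-occurrence ranking are our own illustration, not the Interspace system the slide describes.

```python
from collections import defaultdict
from itertools import combinations


class ConceptIndex:
    """Toy concept index over documents held in several repositories.

    Each document is tagged with a set of concepts. The index maps every concept
    to (repository, doc_id) pairs and counts how often two concepts co-occur.
    """

    def __init__(self) -> None:
        self.docs_by_concept: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self.cooccurrence: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def add(self, repository: str, doc_id: str, concepts: set[str]) -> None:
        for concept in concepts:
            self.docs_by_concept[concept].add((repository, doc_id))
        # Every unordered pair of concepts in the same document co-occurs once more.
        for a, b in combinations(sorted(concepts), 2):
            self.cooccurrence[a][b] += 1
            self.cooccurrence[b][a] += 1

    def documents(self, concept: str) -> set[tuple[str, str]]:
        """All documents, from any repository, tagged with `concept`."""
        return set(self.docs_by_concept.get(concept, set()))

    def related(self, concept: str, k: int = 3) -> list[str]:
        """Up to k neighbouring concepts, most frequent co-occurrence first (ties alphabetical)."""
        neighbours = self.cooccurrence.get(concept, {})
        return sorted(neighbours, key=lambda c: (-neighbours[c], c))[:k]


if __name__ == "__main__":
    idx = ConceptIndex()
    idx.add("agri-repo", "doc1", {"irrigation", "monsoon"})
    idx.add("health-repo", "doc7", {"monsoon", "malaria"})
    idx.add("agri-repo", "doc2", {"irrigation", "monsoon", "crops"})
    print(idx.documents("monsoon"))  # three documents across two repositories
    print(idx.related("monsoon"))    # ['irrigation', 'crops', 'malaria']
```

Navigating means following `related` from one concept to the next and calling `documents` at each step, regardless of which repository holds the material.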
Linguistic Pluralism in India
Eighteen Constitutional Indian Languages & Their Scripts
Script (10) | Languages (18)
Devanagari | Sanskrit, Hindi, Marathi, Nepali, Sindhi & Konkani
Bengali | Bangla, Assamese, Manipuri
Oriya | Oriya
Gujarati | Gujarati
Gurumukhi | Punjabi
Telugu | Telugu
Kannada | Kannada
Tamil | Tamil
Malayalam | Malayalam
Urdu | Urdu, Kashir
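In Unicode, nine of these scripts each occupy a 128-code-point block, and because those blocks were modelled on the ISCII-1988 layout, the same letter tends to sit at the same offset in every block (KA is at offset 0x15 in Devanagari, Bengali, Tamil and the rest). Urdu and Kashmiri are written in the Perso-Arabic script, which lives in the Arabic block. The Python sketch below uses those block ranges to detect a script and to re-map text between Indic blocks by offset; Unicode names the Gurumukhi block "Gurmukhi". This is an approximate illustration, not a full transliterator: characters with no counterpart in the target block are left unchanged.

```python
import unicodedata
from typing import Optional

# Start of each 128-code-point Unicode block for the Brahmi-derived scripts in the table.
INDIC_BLOCKS = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}
ARABIC_BLOCK = (0x0600, 0x06FF)  # Perso-Arabic script used for Urdu and Kashmiri


def detect_script(text: str) -> Optional[str]:
    """Return the script of the first Indic or Arabic character found, else None."""
    for ch in text:
        cp = ord(ch)
        if ARABIC_BLOCK[0] <= cp <= ARABIC_BLOCK[1]:
            return "Arabic"
        for name, start in INDIC_BLOCKS.items():
            if start <= cp < start + 0x80:
                return name
    return None


def shift_script(text: str, source: str, target: str) -> str:
    """Re-map characters from one Indic block to another by keeping the in-block offset.

    Characters outside the source block, or whose counterpart is unassigned in the
    target block (unicodedata.name returns the default), are copied unchanged.
    """
    src, dst = INDIC_BLOCKS[source], INDIC_BLOCKS[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + 0x80:
            candidate = chr(cp - src + dst)
            if unicodedata.name(candidate, None) is not None:
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(detect_script("नमस्ते"))                            # Devanagari
    print(shift_script("नमस्ते", "Devanagari", "Gujarati"))   # the same word in Gujarati script
```

Offset mapping works well between closely aligned blocks such as Devanagari and Gujarati; Tamil, which has far fewer consonant letters, will keep many Devanagari characters unmapped.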
Language-wise World Population
Language | 2050 population (billion) | 1996 population (billion)
Chinese | 1.384 | 1.113
Hindi/Urdu | 0.556 | 0.316
English | 0.508 | 0.372
Spanish | 0.486 | 0.304
Arabic | 0.482 | 0.201
Portuguese | 0.248 | 0.165
Bengali | 0.229 | 0.125
Russian | 0.132 | 0.155
Japanese | 0.108 | 0.123
German | 0.091 | 0.102
Malay | 0.080 | 0.047
French | 0.076 | 0.070
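Dividing each 2050 figure by its 1996 figure gives a growth factor that makes the trend easier to compare across languages. The Python sketch below does this with the table's numbers; `growth_factor` is an illustrative name.

```python
# (2050 figure, 1996 figure) in billions, from the table above
POPULATION = {
    "Chinese": (1.384, 1.113),
    "Hindi/Urdu": (0.556, 0.316),
    "English": (0.508, 0.372),
    "Spanish": (0.486, 0.304),
    "Arabic": (0.482, 0.201),
    "Portuguese": (0.248, 0.165),
    "Bengali": (0.229, 0.125),
    "Russian": (0.132, 0.155),
    "Japanese": (0.108, 0.123),
    "German": (0.091, 0.102),
    "Malay": (0.080, 0.047),
    "French": (0.076, 0.070),
}


def growth_factor(language: str) -> float:
    """Ratio of the 2050 figure to the 1996 figure (above 1 means growth)."""
    y2050, y1996 = POPULATION[language]
    return y2050 / y1996


if __name__ == "__main__":
    for lang in sorted(POPULATION, key=growth_factor, reverse=True):
        print(f"{lang:11s} {growth_factor(lang):.2f}")
```

On these figures Arabic grows fastest (about 2.4 times), Bengali and Hindi/Urdu each grow by roughly 1.8 times, and Russian, Japanese and German shrink.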
Data on Information Technology Indicators
Country | Population (million) | PPP | IT/GDP (%) | IT per capita, nominal (US$) | Tel. density (tel. per 100 persons) | PC penetration (per 100 persons) | Internet hosts (per 10,000 persons) | Mobile phone users (per 100 persons)
India | 1027 | 2390 | 3.5 | 15.4 | 3.38 | 0.58 | 0.81 | 0.63
China | 1261 | 3940 | 4.9 | 37.9 | 13.81 | 1.93 | 0.69 | 11.17
France | 59 | 25000 | 3.8 | 1706.6 | 57.35 | 33.70 | 132.94 | 60.53
Germany | 82 | 25010 | 4.1 | 1699.9 | 63.48 | 33.60 | 294.58 | 68.29
USA | 282 | 34260 | 5.2 | 2792.1 | 66.45 | 62.25 | 3714.01 | 44.42
Source: ITU-2001 and IMF world economic review 2001.
Language Technology Mission
Vision: Digital unite and knowledge for all.
Mission: Communicating without language barriers and moving up the knowledge chain.
Objectives:
• To develop information processing tools to facilitate human machine interaction in Indian languages and to create and access multilingual knowledge resources/content.
• To promote the use of information processing tools for language studies and research.
• To consolidate technologies thus developed for Indian languages and integrate these to develop innovative user products and services.
Major Initiatives
1. Knowledge Resources (Parallel Corpora, Multilingual Libraries/Dictionaries, Lexical Resources)
2. Knowledge Tools (Portals, Language Processing Tools, Translation Memory Tools)
3. Translation Support Systems (Machine Translation, Multilingual Information Access, Cross-Language Information Retrieval)
4. Human Machine Interface Systems (OCR, Voice Recognition Systems, Text-to-Speech Systems)
5. Localization (Adapting IT Tools and Solutions in Indian Languages)
6. Language Technology Human Resource Development (in NLP & Computational Linguistics)
7. Standardization (ISCII, Unicode, XML, INSFOC, MPEG, Terminology, etc.)
Industry Involvement Through CoIL-tech
To catalyze Language Technology innovation and productization in industry, and to foster interaction with academia, MAIT has nucleated a consortium named Consortium on Innovation & Language Technology (CoIL-tech), with members from industry and research organizations.
Major Achievements of the TDIL Programme of DIT
OCRs Developed: • Hindi • Marathi • Bangla • Tamil • Telugu • Punjabi (with 97% accuracy)
OCRs under Development: • Gujarati • Assamese • Oriya • Malayalam
Spell Checkers Developed: 1. Hindi 2. Marathi 3. Bangla 4. Tamil 5. Telugu 6. Punjabi 7. Malayalam
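The slide quotes 97% accuracy without saying how it was measured. One common convention for OCR is character accuracy: 1 minus the edit distance between the OCR output and the reference text, divided by the length of the reference. The same edit distance is also the usual basis for ranking spell-checker suggestions. The Python sketch below assumes that convention; it counts Unicode code points, so a Devanagari consonant plus vowel sign counts as two characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1 - levenshtein(reference, ocr_output) / len(reference))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                  # 3
    print(character_accuracy("abcdefghij", "abcdefghiX"))    # one error in ten characters
```

Under this convention, 97% accuracy means about 3 character errors per 100 reference characters.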
Machine Aided Translation System (MAT)
• The Anglabharati MAT technology, with high accuracy, has been developed by IIT Kanpur.
• Text-to-speech integrated with the MAT system has also been demonstrated.
• The online MAT system can be accessed on the web at: www.anglahindi.iitk.ac.in
Speech Recognition
• Continuous Speech Recognition System for Hindi is being developed by IBM Research Lab India.
Parallel Corpora
• Development of one million pages of parallel corpora (Gyan-Nidhi) as a knowledge repository has been undertaken.
• The parallel corpora can act as a test bed for the OCR and EBMAT (Example Based Machine Aided Translation) systems.
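Example-based and translation-memory approaches both reuse sentence pairs that have already been translated. A minimal sketch, assuming the corpus is already sentence-aligned: find the stored source sentence most similar to the input and return its stored translation. Similarity here is Python's `difflib.SequenceMatcher` ratio, a character-level measure chosen for brevity; this is not the EBMAT or Anglabharati method, and the class name, threshold and sample pairs are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


class TranslationMemory:
    """Toy translation memory over aligned (source, target) sentence pairs."""

    def __init__(self, pairs: list[tuple[str, str]], threshold: float = 0.75) -> None:
        self.pairs = pairs
        self.threshold = threshold  # minimum similarity for a fuzzy match

    @staticmethod
    def _normalise(text: str) -> str:
        # Case-fold and collapse whitespace so trivial differences do not count.
        return " ".join(text.lower().split())

    def lookup(self, sentence: str) -> Optional[tuple[str, float]]:
        """Return (stored translation, similarity) for the closest source sentence, or None."""
        query = self._normalise(sentence)
        best: Optional[tuple[str, float]] = None
        for source, target in self.pairs:
            score = SequenceMatcher(None, query, self._normalise(source)).ratio()
            if score >= self.threshold and (best is None or score > best[1]):
                best = (target, score)
        return best


if __name__ == "__main__":
    tm = TranslationMemory([
        ("How are you?", "आप कैसे हैं?"),
        ("What is your name?", "आपका नाम क्या है?"),
    ])
    print(tm.lookup("how are you"))  # fuzzy match on the first pair
```

A real system would segment and align the corpus first and match at word or phrase level, but the reuse principle is the same.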
Language Technology Products in Public Domain
For widespread proliferation, a number of freely downloadable software tools are available on the TDIL website: http://tdil.mit.gov.in. These include fonts with keyboard drivers, an e-mail client, bilingual word processors, glossaries, corpora and classic content.
Open Source Software
INDIX (Indian Language Interface) supports Indian languages on Linux. This will ensure the affordability of Indian-language (IL) software based on Linux. The open source software approach will ensure faster localization and low-cost software.
Standardisation
• The 8-bit ISCII (Indian Script Code for Information Interchange) standard was developed in 1988 and is a subset of Unicode.
• DIT (Govt. of India) is a voting member of the Unicode Consortium.
• Feedback on the revision of Unicode 3.0 for all Indian languages has been finalised.
• An International Unicode Conference in India in 2003 has been proposed.
• Draft standards:
  • Display codes in the form of INSFOC (Indian Standard for Font Code) are ready.
  • Indian Script to Roman Transliteration (INSROT) is ready.
  • A multilingual lexical format has also been proposed.
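The slide does not give the INSROT mapping, so the Python sketch below is only an illustration of how script-to-Roman transliteration works in principle, using a small IAST-style table of our own. Each Devanagari consonant carries an inherent "a" unless a vowel sign (matra) replaces it or a virama suppresses it. The tables cover only a few letters, and Hindi schwa deletion is not handled (भारत comes out as "bhārata", not "bhārat").

```python
# Small, hypothetical IAST-style tables (NOT the INSROT standard).
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "च": "c", "ज": "j",
    "ट": "ṭ", "ड": "ḍ", "त": "t", "द": "d", "न": "n", "प": "p",
    "ब": "b", "भ": "bh", "म": "m", "य": "y", "र": "r", "ल": "l",
    "व": "v", "श": "ś", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū", "ए": "e", "ओ": "o"}
MATRAS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū", "े": "e", "ो": "o"}
VIRAMA = "\u094d"
ANUSVARA = "\u0902"


def to_roman(text: str) -> str:
    """Romanise Devanagari, adding the inherent 'a' unless a matra or virama follows."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in MATRAS:      # vowel sign replaces the inherent 'a'
                out.append(MATRAS[nxt])
                i += 1
            elif nxt == VIRAMA:    # virama suppresses the inherent 'a'
                i += 1
            else:
                out.append("a")
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch == ANUSVARA:
            out.append("ṃ")
        else:
            out.append(ch)         # spaces, punctuation and unmapped characters pass through
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_roman("नमस्ते"))  # namaste
    print(to_roman("हिन्दी"))  # hindī
```

A standard such as INSROT fixes exactly these table entries and edge cases so that every system romanises the same text identically.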
Media Lab Asia Programme: Major Project Areas
• TOMORROW'S TOOLS: PDS for ANMs, water, power, schools, crafts, GIS
• WORLD COMPUTER: Low-cost computing devices; Linux CE, Village Interfaces, Village Info Systems
• BITS FOR ALL: Wi-Fi nets, DakNet
• DIGITAL VILLAGE: Community Connection, Village Voice
Low-Cost Computing Devices: Choice of Technologies
• High-bandwidth option: IEEE 802.11b/802.11a
• Typical transfer rates: 11 Mbps at 180 m; 1 Mbps at 500 m
• Prices still falling: access point < US$180; transceiver < US$80
• Peer-to-peer supported
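To make the nominal rates concrete: moving a 10 MB file (an arbitrary example size) takes about 7.3 s at 11 Mbps and 80 s at 1 Mbps, before protocol overhead. The Python helper below does that arithmetic; the `efficiency` factor is our own assumption, since real 802.11b throughput is well below the nominal rate.

```python
def transfer_seconds(size_megabytes: float, rate_mbps: float, efficiency: float = 1.0) -> float:
    """Time to move a file over a link.

    Uses decimal units: 1 MB = 10**6 bytes, 1 Mbps = 10**6 bit/s.
    `efficiency` (0 < e <= 1) is an assumed fraction of the nominal rate actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_megabytes * 8e6
    return bits / (rate_mbps * 1e6 * efficiency)


if __name__ == "__main__":
    print(transfer_seconds(10, 11))       # about 7.3 s at 180 m
    print(transfer_seconds(10, 1))        # 80.0 s at 500 m
    print(transfer_seconds(10, 11, 0.5))  # assuming only half the nominal rate is achieved
```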
Low-Cost Computing Devices
Ruggedized terminals with medium functionality and low cost (< US$100), also with a smart card port and musical keyboard
E-Learning: Vidya Vahini and Gyan Vahini
Proposed Setup for Vidya Vahini
(Diagram: Internet, router, server, LAN hub, LAN printer, UPS, computer lab and TV.)
Pilot Project: Vidya Vahini
200 schools in select districts. Systems at each school:
• One server (P-IV based), 256 MB memory, 40 GB hard disk drive
• Network printer
• Multimedia personal computer with web camera
• Colour TV, 27”/29”
• 2 kVA UPS with 50/30 minutes of power back-up
• Software: MS Office full suite, educational software with multilingual support, course curriculum software, filtering software and school administration software
• Internet access of 128 Kbps, to be increased gradually
Technology Class Room
A classroom in every school will be converted into a Technology Class Room. The Technology Room will have a 29” flat-screen TV connected to a PC, which in turn will be connected to the server.
Computer-aided techniques will be used to teach the basic course curriculum.
Vidya Vahini Schools as Anchor Schools
• Using VSAT-based Internet connectivity (8 Mbps)
• Using a transceiver, disseminating up to 11 Mbps of bandwidth within a radius of 4 to 5 km
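As a rough capacity check, an anchor school can only pass on as much bandwidth as the slower of its VSAT uplink (8 Mbps) and its local wireless link (up to 11 Mbps). If each connected school were given the pilot's 128 Kbps, an 8 Mbps uplink would cover 62 schools. The Python sketch below assumes an even split, decimal units and no protocol overhead; it is an illustration, not a planning figure from the source.

```python
def schools_supported(uplink_kbps: float, local_link_kbps: float, per_school_kbps: float) -> int:
    """Whole number of schools (including the anchor) that can each get per_school_kbps.

    Capacity is limited by the slower of the VSAT uplink and the local wireless
    distribution link; an even split with no overhead is assumed.
    """
    if per_school_kbps <= 0:
        raise ValueError("per_school_kbps must be positive")
    bottleneck = min(uplink_kbps, local_link_kbps)
    return int(bottleneck // per_school_kbps)


if __name__ == "__main__":
    print(schools_supported(8000, 11000, 128))  # 62: the 8 Mbps VSAT is the bottleneck
```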
Training: Teacher Empowerment
The Teacher Empowerment Programme forms the heart of “Vidya Vahini”. The programme covers training of teachers in:
• Use of computers
• Effective teaching techniques
• Creating lessons
• Building teaching tools
• Usage of technology in classrooms
• Training on educational software
7 computer labs, each equipped with 1 server, a printer, 10 PCs, a TV and educational software tools, are proposed in collaboration with industry in the pilot project.
Knowledge Portal
A Knowledge Portal will be hosted, which will have:
• Education material
• Programming tools
• Software tools for teachers
• Software tools for students
• Language tools
• Filtering software
• CBSE course curriculum
• Web pages of all schools
• Circulars/notices/directives issued by the Central Board of Secondary Education and different Boards throughout the country
Students will be able to access, harness and manage knowledge through the Portal.
Gyan Vahini
Phase I: Set up IT infrastructure and connect all Govt.-funded universities (including deemed universities), engineering colleges and medical colleges in the country.
Phase II: Set up IT infrastructure and connect all polytechnics, degree colleges and dental colleges across the country.
Typical Campus-Wide Network for High-End Institutions
(Diagram: Internet; router cum RAS; central switch; Administrative Block; Academic Block with various departments; Hostel Block with different student hostels; residential quarters; Internet and LAN servers; existing PBX; switches and computers; fibre optic cable and CAT 5 cabling; 100 Mbps.)
New Initiatives under Consideration
1. e-Content (including Digital Library)
2. Speech-to-Speech Translation
3. Open Source Software