Vision and Plan for Data Processing Center at Social Science Research Institutes


  • 8/2/2019 Vision and Plan for Data Processing Center at Social Science Research Institutes


    Vision of Data Processing Environment at Social Research Institute

    Background

    Social research institutes play a vital role in coordination between government and the public, as part of the formulation and monitoring of development plans. They also help in understanding the nature and status of the society within their area of study. Because of this specific role, they hold data that differ in nature from those of industry and the government statistical system. Industry places more emphasis on managerial data (for decisions on day-to-day activity), which are therefore mostly online in nature, whereas social research institutes use data for policy formulation and monitoring, which are therefore offline in nature. Unlike government statistical organizations, they often rely on qualitative surveys and experiment with new methodologies, indicators and subjective data. Unlike both industry and the government statistical system, they depend heavily on data from many sources that are not always aligned in time scale, geographical coverage or purpose. Because of these specific data requirements, social research organizations require a particular type of data processing environment.

    The availability of vast computational power in Information Technology (IT) over the last two decades or so has significantly affected the techniques for designing and implementing social research (qualitative and quantitative). Parallel to the developments in hardware, there have been significant improvements in the quality and user-friendliness of software for statistical data processing, analysis and dissemination. This has also made it possible for many processing tasks to move from computer experts to subject matter specialists. A number of software packages for processing statistical surveys have emerged over the years. The relative strengths of these software products differ across the steps of data processing. The use of suitable software for each step of data processing, together with training, has a significant role in the plan for realizing the vision of a modern data processing system.

    Vision

    The vision of a data processing environment for a social research institute may be expressed through the following capacities and behaviors:

    1. The institute is capable of large-scale qualitative and quantitative data analysis.

    2. Data from any qualitative or quantitative study can be released for analysis within four months of field work.

    3. Data processing helps in monitoring field work (problems of probing etc.) through patterns in incoming data.

    4. It has sufficient computational and analytical skill to exploit the full strength of computer-based analysis (see Annexure-1).

    5. It can easily adopt any methodological change in data capture, analysis, presentation and dissemination, as well as in computerized content and knowledge management systems.

    6. It has a rich data bank comprising all relevant data and documents, either owned by the institute or collected from others (possibly panel data). The data bank is integrated with a broader network, with various levels of access for data users.

    7. It has good links with other institutes and individual users of its studies for sharing data and ideas through social networks.
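
    Item 3 of the list above (monitoring field work through patterns in incoming data) can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the answer code 9 for "don't know", the interviewer ids, the records and the 1.5 threshold factor are not taken from any real survey. The idea is only that an interviewer whose share of "don't know" answers is far above the overall share may have a probing problem worth checking.

```python
from collections import Counter

DONT_KNOW = 9  # assumed code for "don't know / no answer"

def flag_interviewers(records, factor=1.5):
    """Return interviewers whose "don't know" rate exceeds factor x the overall rate."""
    total = Counter(interviewer for interviewer, _ in records)
    dont_know = Counter(i for i, answer in records if answer == DONT_KNOW)
    overall_rate = sum(dont_know.values()) / len(records)
    return sorted(
        i for i in total
        if dont_know[i] / total[i] > factor * overall_rate
    )

# Made-up incoming records: (interviewer id, coded answer to one question).
incoming = [
    ("INT01", 1), ("INT01", 2), ("INT01", 1), ("INT01", 9), ("INT01", 2),
    ("INT02", 9), ("INT02", 9), ("INT02", 1), ("INT02", 9), ("INT02", 9),
    ("INT03", 2), ("INT03", 1), ("INT03", 1), ("INT03", 2), ("INT03", 1),
]

print(flag_interviewers(incoming))  # prints ['INT02']
```

    A flag of this kind is a prompt for a supervisor to look at the field work, not proof of a problem; the threshold would need tuning to the questionnaire.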


    Organization of Data Processing

    To move towards the above vision, a modular approach provides more adaptability and flexibility in implementing the plan. The total data processing environment may be divided into centers that perform different steps of data processing. The centers are defined according to the nature of the work, the software required (and training in it) and the skills needed to perform the task. It is not essential to develop all centers simultaneously to perfection; they can be developed in phases.

    Although the Data Bank is the central part of data processing, the data processing system can be developed from the periphery. Centers may be given priority as follows:

    (1) Data preparation center:

    Although data may come in different forms (textual, numeric, audio, video etc.), at the initial stage we can concentrate on numeric (quantitative) and textual data obtained from quantitative and qualitative surveys. Since data preparation for quantitative and qualitative surveys is entirely different (and hence requires different skills and software), separate wings may be created for preparing quantitative and qualitative survey data. The requirements of the wings are as follows:

    [Figure: centers of the data processing environment: Data Bank, Data Preparation center, Center for Analysis, Dissemination center, Center of Social Network]

    Quantitative wing:


    Hardware: PCs (of moderate strength). The number may vary with the work load.

    Software: CSPro (free to download).

    Responsibility: Data entry, data validation, codification, basic predefined tabulation,

    generation of field monitoring reports.

    Skill: The in-charge of the center should understand (1) the logic associated with the questionnaire, (2) the steps of data preparation, (3) program development in CSPro and (4) the basics of databases, spreadsheets and data archiving. The rest of the staff will work as data entry operators, for whom basic knowledge of computers (the file system) is required.

    Link: Questionnaire preparation team, data bank
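
    The data validation responsibility of the quantitative wing, and the "logic associated with the questionnaire" that its in-charge must understand, can be sketched as follows. This is an illustration in Python, not CSPro code, and the field names, answer codes and rules are hypothetical. It shows three common kinds of check: range checks on coded answers, a skip-logic check, and a consistency check between two answers.

```python
# Hypothetical questionnaire fields and codes; real surveys define their own.
RULES = {
    "age": range(0, 121),      # plausible age in completed years
    "sex": {1, 2},             # 1 = male, 2 = female (assumed codes)
    "ever_married": {1, 2},    # 1 = yes, 2 = no (assumed codes)
}

def validate(record):
    """Return a list of error messages for one questionnaire record."""
    errors = []
    # Range checks: each coded answer must be one of the allowed values.
    for field, allowed in RULES.items():
        if record.get(field) not in allowed:
            errors.append(f"{field}: value {record.get(field)!r} out of range")
    # Skip-logic check: age at marriage is asked only of ever-married persons.
    married = record.get("ever_married") == 1
    has_age_at_marriage = record.get("age_at_marriage") is not None
    if married and not has_age_at_marriage:
        errors.append("age_at_marriage: missing for ever-married respondent")
    if not married and has_age_at_marriage:
        errors.append("age_at_marriage: filled although respondent never married")
    # Consistency check between two answers.
    if has_age_at_marriage and isinstance(record.get("age"), int):
        if record["age_at_marriage"] > record["age"]:
            errors.append("age_at_marriage: exceeds current age")
    return errors

# Made-up records as they might arrive from data entry.
records = [
    {"id": 1, "age": 34, "sex": 2, "ever_married": 1, "age_at_marriage": 22},
    {"id": 2, "age": 19, "sex": 1, "ever_married": 2, "age_at_marriage": 25},
    {"id": 3, "age": 150, "sex": 1, "ever_married": 1, "age_at_marriage": None},
]
report = {r["id"]: validate(r) for r in records}
for rec_id, errs in report.items():
    print(rec_id, errs or "OK")
```

    Keeping the rules in one table (here `RULES`) rather than scattered through the program makes it easier to reuse the same checks in field monitoring reports.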

    Qualitative wing:

    Hardware: PCs (of moderate strength). Number may vary as per work load.

    Software: ATLAS.ti, ANTHROPAC, AnSWR (free), EZ-Text (free)

    Responsibility: Entry of field reports (or their summaries) in the format required by the software; creating codes.

    Skill: Understanding of the subject and the ability to create suitable quotations and codes from text. All faculty and research scholars involved in qualitative research should be able to run such software.

    Link: Team of qualitative research, data bank

    (2) Center for analysis

    All faculty and research scholars should be attached to the analysis center.

    Hardware: PCs with sufficient RAM and CPU power for all faculty; a good lab for research scholars.

    Responsibility: Exploratory and confirmatory data analysis, report writing, preparation of presentations.

    Skill: Knowledge of word processors, spreadsheets, slide preparation tools, statistical software, and GIS-based modeling and simulation.

    Software: MS Office, OpenOffice (free), Epi Info (free) for presentation through maps (other open source GIS software may be selected according to the level of requirement; see opensourcegis.org and freegis.org for other sources), Stata (more suitable for analysis of large complex surveys).

    Link: Data bank

    (3) Dissemination center

    Hardware: PCs of sufficient strength.

    Software: HTML, CSS and an HTML editor (basic knowledge is required). Many tools are available that reduce the programming load on the user; Drupal, which is freely available, is one of them. Many free HTML editors are also available, and most content management tools have their own HTML editor.

    Responsibility: The center will receive raw documents as soft copy from faculty and convert them into a suitable format for publishing (in hard copy as well as on the web). Until the data bank is developed, all parts of content management (creation, editing, publishing and archiving) will be the responsibility of this center.

    Skill: An aesthetic sense in word processing and skill in using content management tools.

    Link: Analysis center, data bank

    (4) Center of social network

    No social research institute can work in isolation. Recent developments in IT and the web have made it possible to use social networks for learning and research. Social networking has many benefits at the individual as well as the organizational level. The benefits at the organizational level are as follows:

    1. Make sure knowledge gets to people who can act on it in time.

    2. Connect people and organizations to build relationships across boundaries of geography or discipline.


    3. Provide an ongoing context for knowledge exchange that can be far more

    effective than memoranda.

    4. Attune everyone in the institute to each other's needs: more people will know who knows who knows what, and will know it faster.

    5. Multiply intellectual capital by the power of social capital, reducing social

    friction and encouraging social cohesion.

    6. Create an ongoing, shared social space for people who are geographically

    dispersed.

    7. Amplify innovation: when groups get turned on by what they can do online, they go beyond problem-solving and start inventing together.

    8. Create a community memory for group deliberation and brainstorming that

    stimulates the capture of ideas and facilitates finding information when it is

    needed.

    9. Improve the way individuals think collectively, moving from knowledge-sharing to collective knowing.

    10. Turn training into a continuous process, not divorced from normal business

    processes.

    Hardware: PC with sufficient bandwidth.

    Software: Most social software is available as web services and is free.

    Responsibility: The in-charge of the center will analyze, expand and maintain the social network of the institute.

    Skills: Although all faculty and staff will be members of this center, the in-charge will maintain communication on behalf of the institute on social network platforms.

    Link: Faculty and staff, all centers, external people and organizations.

    (5) Data bank

    The data bank is the central part of the data processing system. It is the center through which the other centers are coordinated. Apart from holding the institute's own data and reports, the center will work as a consortium of different academic and research institutes as well as external socioeconomic data banks such as the Inter-university Consortium for Political and Social Research, the United Nations Statistics Division, the Minnesota Population Center, the IQSS Dataverse Network etc.

    Hardware: Server and PCs with sufficient bandwidth. The institute can hire web hosting services to maintain its external links.

    Software: Tools for the webmaster (to be selected by the webmaster according to his or her confidence; many open source tools are available).

    Skill: The role of the data center is very challenging. Its in-charge should be able to configure a server, install applications at the host site and integrate web services. He or she should know server and client scripting languages (such as PHP and JavaScript) and database management tools.

    Responsibility: The responsibilities of the data bank center are as follows:

    1. Create a catalog of data and reports.

    2. Apply uniform codes for geographical areas (across data sets) so that the data sets can be linked.

    3. Create different aggregation levels of data as needed.

    4. Provide data in the required format.

    5. Create metadata for data collected by the institute; this will help in sharing data.

    6. Prepare time series microeconomic data banks.

    7. Perform the role of webmaster.

    Links: All centers and the external network.
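
    Responsibilities 2 and 3 above (uniform geographical codes for linking data sets, and aggregation to different levels) can be sketched in Python as follows. The code format (a 2-digit state code followed by a 2-digit district code), the area names and all figures are hypothetical and serve only to show the mechanism: once every source is mapped to the same code, data sets can be joined on it, and a prefix of the code gives a higher aggregation level.

```python
from collections import defaultdict

# Hypothetical lookup from the names used in different sources to one uniform code.
# Format assumed here: 2-digit state code + 2-digit district code.
UNIFORM_CODE = {
    "north district": "0101",
    "n. district": "0101",
    "south district": "0102",
    "east district": "0201",
}

def geo_code(name):
    """Map a free-text area name to the uniform code (KeyError if unknown)."""
    return UNIFORM_CODE[name.strip().lower()]

# Two made-up data sets from different sources, keyed by inconsistent names.
population = [("North District", 1200), ("South District", 800), ("East District", 500)]
schools = [("N. District", 12), ("south district ", 4), ("East District", 5)]

# Link the data sets on the uniform code.
linked = defaultdict(dict)
for name, value in population:
    linked[geo_code(name)]["population"] = value
for name, value in schools:
    linked[geo_code(name)]["schools"] = value

# Aggregate district-level records to state level (first two digits of the code).
state_totals = defaultdict(lambda: {"population": 0, "schools": 0})
for code, row in linked.items():
    for key, value in row.items():
        state_totals[code[:2]][key] += value

for state, totals in sorted(state_totals.items()):
    print(state, totals)
```

    Raising an error for an unknown area name, rather than guessing, keeps unmatched records visible so that the lookup table can be corrected.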

    Challenges in realizing the vision

    1. It is difficult to identify a role model. A lot of experimentation is going on at the international level. The institute needs to choose its own path cautiously, learning from ongoing experimentation.


    2. IT people are trained for the needs of business and industry. It may be difficult to identify suitable people (or trainers) for the needs of the institute.

    3. There may be resistance to change in the roles of faculty and staff.

    4. Old habits may resist change. The chance of resistance increases because the gain (from the data processing system) can be perceived only after a certain level of perfection.

    5. Training is crucial for the vision. For training to succeed, targets of achievement (in terms of work) must be fixed at the organizational and individual level after each training. This is difficult to implement. Trainers also may not be ready for it (it will require many follow-ups).

    6. The hierarchy may object to assigning a higher role to an efficient person.

    7. Participants may have weak motivation for training.

    Conclusion

    From the above discussion, it is clear that developing a good data processing system requires, apart from investment in hardware, little monetary investment in software. The real issue in developing a good data processing environment is training. Training for most areas is also available on the net (even free). Sufficient will and motivation can lead a social research institute towards a modern data processing environment.


    Annexure-1

    Role of Computational Skill in Statistical Analysis

    Hurdles in statistical analysis

    1. Vague vision of statistics: whether it is numbers, a methodology or a way of thinking;

    2. Less importance given to variation than to the center of the data. The main cause seems to be a lack of computational capability. For this reason, statistical scales could not be developed properly;

    3. Simulation as a tool of analysis has not received the importance it deserves, again due to a lack of computational skill;

    4. Statistical weights based on data were not used for converting an unknown phenomenon to a number (use of latent variables), which creates unresolved disputes;

    5. Lack of a proper sampling design restricts generalizing results in the right manner;

    6. Statistical results are generally interpreted as causal relationships.
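
    Hurdle 2 above (attention to the center of the data at the expense of variation) can be illustrated with a small sketch in Python. The two villages and their values are made up: both have the same mean, yet they describe very different situations, which only a measure of variation reveals.

```python
import statistics

# Made-up values of some indicator for households in two villages.
village_a = [48, 50, 52, 49, 51]
village_b = [10, 20, 50, 80, 90]

def summarize(values):
    """Return the mean, sample standard deviation and coefficient of variation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"mean": mean, "sd": sd, "cv": sd / mean}

for name, values in [("A", village_a), ("B", village_b)]:
    s = summarize(values)
    print(f"village {name}: mean={s['mean']}, sd={s['sd']:.2f}, cv={s['cv']:.2f}")
```

    A report that gave only the two means would treat the villages as identical; the standard deviation shows that village B is far more unequal.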

    Common view on computer based computational capability

    1. It is required because it works fast;

    2. It is useful because it hides the mathematical complexity of statistical tools;

    3. The results obtained are more accurate;

    4. There is little computational burden;

    5. The only investment is a computer, though some feel that a statistical package and the skill to run it are also required.

    What is reality

    1. It works fast only if the data are organized in a proper format;

    2. It hides mathematical complexity, but it requires a clear understanding of the assumptions and interpretation behind statistical tools. Applying tools without a feel for the data may lead to misleading results;

    3. It may provide inaccurate, sometimes disastrous, results if proper steps are not followed;

    4. Yes, it eases the burden of computation, if logical complexities are few and the dataset is large;

    5. Apart from investment in a computer and the skill to run statistical software, the skill to organize data is required.

    In fact, most analysts did not change their orientation to data analysis in spite of fast improvement in computational capabilities. How the new framework of analysis should differ from the old one, the capabilities required, and the new concepts emerging from the power of computational tools can be understood by comparing the old framework of analysis with the new one, as follows:


    Old framework of analysis | New framework of analysis

    Start analytical work by following precedent in the area of study | Start analysis with an attempt to know and feel the data (exploratory data analysis)

    Format of analysis is fixed before planning of data collection | Mixed strategy is followed, with more emphasis on learning from data

    Computational skill, and analysis and interpretation, are treated as different entities | Computational and analytical skills are needed in the same person

    Descriptive analysis is based only on measures of central tendency such as mean, median, mode etc. | Apart from studying the central tendency of data, more emphasis is given to variation in data

    Testing of assumptions for use of certain statistical tools is almost neglected | Testing the assumptions of tools and transforming the data to meet these assumptions is given importance

    Anything computed is worth reporting | A major part of computation is meant for understanding and feeling the data

    Computational work cannot be reused | Reusability is a significant part of skill

    Belief that analysis starts after obtaining the data | Belief that analysis starts with planning of the survey

    Missing values and non-response are not given due weight because of computational problems | Missing values and non-response can be handled easily

    Sampling design is not important for developing a model | Sampling design is important for applying a model

    Only statistical models with clear mathematical solutions should be used | Simulation may be used where an analytical solution is not possible

    Understanding the behavior of data in terms of probability is not very important | Understanding the probabilistic interpretation of the behavior of data is important
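
    The point in the comparison above that simulation may be used where an analytical solution is not possible can be illustrated with the bootstrap. The sketch below, in Python, estimates the standard error of a median, for which no simple closed-form formula exists in small samples. The income values are made up, and the resampling treats the data as a simple random sample; a complex survey design would need a design-aware variant, which is not shown here.

```python
import random
import statistics

def bootstrap_se(data, stat, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))
        estimates.append(stat(resample))
    return statistics.stdev(estimates)

# Made-up household incomes with one extreme value.
incomes = [12, 15, 15, 18, 20, 22, 25, 30, 41, 250]

print("median:", statistics.median(incomes))
print("bootstrap SE of median:", round(bootstrap_se(incomes, statistics.median), 2))
print("bootstrap SE of mean:", round(bootstrap_se(incomes, statistics.mean), 2))
```

    The same function works for any statistic passed as `stat`, which is also a small example of the reusability that the new framework values.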
