diff --git a/.travis.yml b/.travis.yml index 78ac6e5..b42dd2f 100644 --- a/.travis.yml +++ b/.travis.yml @@ -13,15 +13,6 @@ r_github_packages: - r-lib/covr after_success: - Rscript -e 'covr::codecov()' - - Rscript -e 'pkgdown::build_site()' -deploy: - provider: pages - skip-cleanup: true - github-token: $GITHUB_PAT - keep-history: true - local-dir: docs - on: - branch: master addons: apt: packages: diff --git a/README.html b/README.html new file mode 100644 index 0000000..10a7cf4 --- /dev/null +++ b/README.html @@ -0,0 +1,312 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

QCoder

+Lightweight package to conduct qualitative coding
+ +

+ +

+
+

Installation

+

To install the latest development version, run

+
install.packages("devtools")
+devtools::install_github("ropenscilabs/qcoder")
+

Please note that this is not a release-ready version and should be considered experimental and subject to changes. Still, we encourage you to install and send us feedback on our issue tracker.

+

For instructions for use, see the Using QCoder section below.

+
+
+

Motivation

+

The motivation stems from the need for a free, open source option for analyzing textual qualitative data. Textual qualitative data refers to text from interview transcripts, observation notes, memos, jottings and primary source/archival documents.

+

Qualitative data analysis (QDA) processes, particularly those developed by Corbin and Strauss (2014), Miles, Huberman, and Saldana (2013), and Glaser and Strauss (2017), can be thought of as layering interpretation onto the text. The researcher starts with open coding, meaning that she is free to tag snippets of text with whatever descriptions she deems appropriate. For instance, if the researcher is coding observation notes and senses that conversations between two individuals will be relevant to the research questions, she might tag the instances in which the two individuals speak with the code “conversation.” In the next round of coding, she might classify what the participants discuss with a finer tag, like “conversation_package” if they were talking about creating packages in R. In another round, she might get even more specific with codes such as “conversation_package_nomoney” if a participant discussed not having money to create a package in R. Later rounds involve conflating codes that might mean the same thing, relating codes to one another (often by documenting their meanings in a similar way to software documentation), and eliminating codes that no longer make sense.

+

Sometimes the QDA process involves looking for instances that demonstrate some concept, mechanism, or theory from the academic literature on the subject. This approach is common toward the end of the process in fields such as organization studies, management, and other disciplines and is prominent as an approach from the beginning in fields with experimental or clinical roots (e.g., psychology).

+

Using software for QDA allows researchers to nest codes, then begin to see the number of instances in which a particular code has been applied. Perhaps more importantly, software allows easy visualization and analysis of where codes co-occur (i.e., where multiple codes have been applied to the same snippets of text), and other linking activities that help researchers identify and specify themes in the data. Some software packages allow the researcher to visualize codes and the relationships between them in new and innovative ways; however, a cursory review of the literature suggests that qualitative researchers often use very basic features; some even use software such as Endnote on Microsoft Word or even analog systems such as paper or sticky notes.

+

QDA is an iterative process. Researchers will often change, lump together, split, or re-organize codes as they analyze their data. Depending on the coding approach, researchers might also create a codebook or research notes for each code that defines the code and specifies instances in which the code should be applied.

+
+
+

Current QDA software

+

To date, researchers who conduct QDA largely rely upon proprietary software such as those listed below. Free and open source options are limited, not easy to integrate with R and (credit to bduckles for the descriptions):

+
    +
  • NVivo – QSR International Desktop based software with both Windows and Mac support. Functionality includes video and images as data and auto-coding processes. Tech support and training is available and a relatively large user base.

  • +
  • Atlas.TI Desktop based software for Mac and Windows, also has mobile apps for android and iOS. Similar functionality as NVivo including using video and visuals as data. Has training and support.

  • +
  • MAXQDA Desktop based software that works for both Mac and Windows. Training and several tiers of licenses available.

  • +
  • QDA Miner – Provalis Research A Windows application to analyze qualitative text, can be used with some visuals as well. They also have a “lite” package which is free.

  • +
  • HyperResearch – Researchware Desktop based software that works on both Mac and Windows.

  • +
  • Dedoose Web based software, usable on any platform. Data stored in the web instead of on your device. Includes text, photos, audio, video and spreadsheet information.

  • +
  • Annotations Mac only app that allows you to highlight, keyword and create notes for text documents. An inexpensive way to do basic qualitative data analysis. Last updated in 2014.

  • +
  • RQDA A QDA package for R that is free and open source. Bugs in the program make it challenging to use. Last updated in 2012.

  • +
  • Coding Analysis Toolkit – CAT A free, web based, open source service that was created by University of Pittsburgh. Can be used on any platform. Not supported.

  • +
  • Weft QDA A free and open source Windows program that has no support. The website says that bugs in the program can result in the loss of data.

  • +
+
+
+

Limitations of Existing Software

+

Each extant software package has its limitations. The foremost limitation is cost, which can prohibit students and underfunded qualitative researchers from conducting analyses systematically, and efficiently. Furthermore, the mature software packages (e.g., Atlas.TI, NVIVO) offer features that exceed the needs of many users and, as a result, suffer speed issues (particularly for those researchers who may not benefit from advanced hardware). The sharing process for proprietary QDA outputs is equally unwieldy, relying on non-intuitive bundling and unbundling processes, steep learning curves, and non-transferable skill development.

+

Open source languages such as R offer the opportunity to involve qualitative researchers in open source software development. Greater involvement of qualitative researchers serves to expand the scope of R users and could create inroads to connect qualitative and quantitative R packages. For instance, better integration of qualitative research packages into R would make it possible for existing text analysis programs to work alongside qualitative coding.

+
+
+

User Needs

+

We began this project at rOpenSci’s runconf18 based on bduckles issue. We began by discussing the common challenges we and our peers face in conducting QDA and considering what we might learn from existing quantitative and text analysis open source packages. We identified a number of unique challenges qualitative researchers encounter when analyzing data.

+
    +
  • GUI for broader adoption. Quantitative researchers are often familiar with conducting analyses from the command line. Qualitative researchers, on the other hand, often rely upon Graphical User Interfaces (GUI) to perform analysis tasks. A proposed, in-development solution is integrating Shiny apps to permit a personalized GUI for QCoder.

  • +
  • Collaboration. Several aspects of collaboration in qualitative research present challenges for development and use of QDA software. The first challenge relates to version control and tracking changes. Whereas contributors to the open source software community, and perhaps the software development community more generally, have developed processes for sharing code, data, and other work products, qualitative researchers often rely on email, DropBox/box, and other sharing technologies that lack appropriate version control features. One proposed solution is to implement a system of record-keeping that maintains all files akin to Atlas.TI’s bundling and unbundling process, but that permits simultaneous work and comparison/merging of changes. We might also consider urging users to learn git and facilitate skill development by offering tutorials. The second collaboration challenge relates to the need to establish inter-rater or inter-coder reliability. Existing methods of assessing inter-rater reliability for QDA include Cronbach’s Alpha and Krippendorff’s Alpha, among others. Such evaluations, though, do not account for complexities such as decisions not to code a segment of text. In other words, by deciding not to apply a code to a snippet of text might be considered an equivalent level of agreement as a decision to apply the same code to the same snippet of text. QCoder aims to make inter-rater reliability assessments easier and more effective than traditional approaches by allowing direct comparisons of differences in the codebooks it generates.

  • +
  • Privacy concerns. Qualitative data often includes personally-identifying information. Quantitative approaches to deidentification are not always applicable to qualitative datasets because text-based data includes rich descriptions that may elevate the risk of re-identification. Furthermore, qualitative data tends to be unstructured in ways that make systematic approaches to de-identification difficult to implement. QDA software, then, needs to account for these complexities.

  • +
  • Reproducibility. Qualitative research communities, particularly those communities employing methods to probe human and social behavior (such as ethnography), fiercely debate whether or not qualitative research is meant to be reproducible. In other words, some researchers argue that human and social behavior is contextually bound by moments in time, physical settings, and the generally fluid nature of the phenomena under study. Accepting this dispute as valid, we are striving to develop a package that can facilitate some level of reproduction of data analysis. Our approach centers on the production of codebooks. Qualitative researchers create codebooks while coding their data. These codebooks match the code with a description of how and when the code is used. The creation of the codebook is an iterative process that is concurrent with the coding process. Codes are routinely combined, split and shifted as the researcher does their analysis. Codebooks are a way for the researcher to indicate to other members of their research team how to apply codes to the data. Historically, these code books have not been standard across different QDA packages. There could be a benefit to having a standard codebook format which could be compared across projects and even as a project iterates. Using Git or other version control as a codebook is created could prevent mistakes and be a teaching and learning tool for qualitative research. Additionally, most codebooks do not contain sensitive or private data and they could be shared and made publicly available. One could envision the capacity for researchers to share their codebooks so that research that follows could draw on existing codes for future research. A key characteristic of QCoder is that it exposes QDA processes to critique in ways that proprietary software excludes. For example, it is unreasonable to expect that researchers interested in reviewing or reproducing the analysis done for a manuscript pay for a license to perform the task at hand. QCoder can be easily installed and potentially maintained so that critical mistakes in analysis might be caught and corrected in review processes.

  • +
  • Money. Some of the disciplines in which qualitative work occurs do not enjoy the same level of funding as, say, natural sciences and engineering. Expensive software packages, then, may not be available to researchers. The case is equally dire for students and researchers in under-resourced communities and countries. An R-based QDA package enables broader participation in qualitative research and, by extension, broader perspectives on the nature of human and social behavior.

  • +
+
+
+

Conceptualizing a Minimum Viable Product: A vignette

+

Questa the graduate student is working on one chapter of her dissertation research where she is trying to understand how different codes of conduct in the software community welcome underrepresented communities. She wants to understand language use and to trace how these codes of conduct are created from differing communities. She also is planning to do interviews with people who have created and used these codes of conduct to understand how their adoption influences inclusion.

+

None of the grants that Questa sent out have come back with any money to do this research. She’s discouraged but passionate about her work and she knows she needs to finish this chapter of her dissertation so she can graduate already. She’s done some work with a nonprofit and they support her work. She’s pretty sure she has a postdoc with them when she finishes her degree. While they should be able to hire her and do support her research, it’s unlikely they’d have enough funds for her to get a license for one of the QDA packages. She has been able to try out some of the larger QDA packages and they work ok with her student license, but her computer is on its last legs and the program keeps crashing her computer.

+

She’s planning to start up her interviews and overall she thinks she’ll have maybe around 50-75 text files that include 1) codes of conduct 2) interview transcripts 3) field research at a conference.

+

She’s not sure how many codes she’d need, but she’s guessing that the first round of the coding would be a lot of different codes - maybe 150 - 200 codes to start. Then she’d likely change them and boil it down into groups of codes and categories of those codes. So ideally the program would make it easy for her to edit, change and move around the codes. She also needs to write up a simple codebook as she does the coding.

+
+ +
+

References

+

Corbin, J., & Strauss, A. L. (2014). Basics of qualitative research. Thousand Oaks, CA: Sage.

+

Glaser, B. G., & Strauss, A. L. (2017). Discovery of grounded theory: Strategies for qualitative research. London, UK: Routledge.

+

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.

+

Miles, M. B., Huberman, A. M., & Saldana, J. (2013). Qualitative data analysis. Thousand Oaks, CA: Sage.

+

Saldaña, J. (2015). The coding manual for qualitative researchers. Thousand Oaks, CA: Sage.

+
+
+

Using QCoder

+

QCoder is designed to be easy to use and to require minimal knowledge of computer systems and code. Like all software, including other applications for QDA there will be a learning period, but as we develop our goal will be to keep the interface simple and steadily improve it. Currently we have a very minimal prototype.

+

Once you have installed QCoder, load it with the library command.

+
library(qcoder)
+

This readme file is going to use sample data to illustrate basic QCoder functionality. A full vignette will explain how to use non standard names and file locations.

+

To begin we will explain how to set up your data. To make this simpler you can create a QCoder project by using the create_qcoder_project() command. You can install the sample data used in this document by adding sample = TRUE.

+
create_qcoder_project("my_qcoder_project")
+# Or to install sample data
+create_qcoder_project("my_qcoder_project", sample = TRUE)
+

This will create one main folder and four subfolders. Unless you specified otherwise it will be in your current working directory (you can find this with the getwd() command at the console). These will hold the documents to be coded, information about the codes, unit information and the r data frames that will be the core of the analysis. For this example the folder and file structures for the sample data will look similar to this.

+
+ + +
+
+

Documents

+

After creating the QCoder project organize your data. In our example we’ve already placed our documents into the “documents” folder. At this point we only have tested support for txt files though other types will probably work. If you have documents in other formats you can use “Save As” to convert to txt.

+
read_raw_data(project = "my_qcoder_project")
+

Will read the documents files into a data frame stored in the data_frames folder of your project in a data frame called qcoder_documents.

+

(Other options let you use other names and locations.)

+
+
+

Codes

+

QCoder has the option to import a list of predefined codes from a CSV file (if you have this in a spreadsheet you can “Save As” csv). This file should have exactly 3 columns with headings:

+
    +
  • code_id (A unique number for each code)
  • +
  • code (One word description, can use underscores or hyphens)
  • +
  • code.description (Longer description of the code, must be enclosed in quotation marks.)
  • +
+

Here are the contents of the sample data csv file that comes with QCoder.

+
 code_id,code,code.description
+1,"harassment","define or describes harassing behavior"
+2,"person_talk","naming a specific person to talk to if there are violations"
+3,"gender","mentions gender"
+4,"gender_id","mentions gender identity or expression"
+5,"consequences","Detailing what happens if someone violates the code of conduct"
+6,"license","The license for this code of conduct"
+
+

You are not restricted to using the listed codes, but this file allows you to produce a detailed codebook including descriptions.

+

If you installed the sample data the example codes are in the codes folder.

+

Running this code will read your code file into a data frame called qcoder_codes.rds.

+
read_raw_data(project = "my_qcoder_project")
+

Other options allow use of different names and locations as well as creation of an empty data frame.

+

Now it’s time to start coding.

+

Coding uses a “Shiny App” to provide a user interface to the data. To launch the app use the function qcode_edit().

+
qcode_edit()
+

Which will launch this application.

+
+ + +
+

Put the full paths to your documents and codes data frames in the corresponding fields. To get these you may need to use the file explorer on your computer. Make sure to include the .rds at the end of paths.

+

Once you have entered the paths there will be a drop down menu on the “Add codes to text” tab to allow you to pick a specific document to code. This will pull a document into the editor.

+
+ + +
+

Switching to the “Codes” tab a list of codes from the codes file is displayed.

+
+ + +
+

Our sample data already has some coding done, and the code-text data is displayed on the “Coded data” tab.

+
+ + +
+
+
+

Coding the data

+

To add codes to the documents uses a tagging system. Text to be assigned a code should be surrounded by (QCODER) (/QCODER) tags. The closing tag is followed immediately by the code enclosed in curly brackets and prefixed with a # for exmaple {#samplecode}

+

(QCODER)This is the text that is being assigned a code.(/QCODER){#instructions}

+

One pair of {} can contain multiple codes, each with at # and separated by commas.

+

At this point nested codes are not supported but implementing that is a high priority goal.

+

When you have finished tagging a document press the “Save changes” button.

+
+
+
+ + + + +
+ + + + + + + + diff --git a/README.rmd b/README.rmd index 34c56c9..5bf42b6 100644 --- a/README.rmd +++ b/README.rmd @@ -94,7 +94,7 @@ These will hold the documents to be coded, information about the codes, unit information and the r data frames that will be the core of the analysis. For this example the folder and file structures for the sample data will look similar to this. -![](./images/folderstructure.png) +![](../images/folderstructure.png) ### Documents In our example we've already placed our documents into the "documents" folder. @@ -178,7 +178,7 @@ import_project_data(project = "my_qcoder_project") ``` Now the data_frames folder will contain the imported files. -![](./images/data_frames_folder.png) +![](../images/data_frames_folder.png) Now it's time to start coding. @@ -197,11 +197,11 @@ Once you have selected your project there will be a drop down menu on the "Add codes to text" tab to allow you to pick a specific document to code. This will pull a document into the editor. -![](./images/coding_step1.png) +![](images/coding_step1.png) Select your project folder. -![](./images/coding_step2_folder_select.png) +![](../images/coding_step2_folder_select.png) Once you have a project, use the drop down menu to select a particular document to code. This will open in an editor. When done coding (instructions below), @@ -209,16 +209,16 @@ click Save changes. Select your project folder. -![](./images/coding_step3_document_select.png) +![](../images/coding_step3_document_select.png) Switching to the "Codes" tab a list of codes from the codes file is displayed. -![](./images/codestab.png) +![](../images/codestab.png) Our sample data already has some coding done, and the code-text data is displayed on the "Coded data" tab. -![](./images/codeddata.png) +![](../images/codeddata.png) ### Coding the data diff --git a/docs/images/codeddata.png b/docs/images/codeddata.png new file mode 100644 index 0000000..0b586d8 Binary files /dev/null and b/docs/images/codeddata.png differ diff --git a/docs/images/codestab.png b/docs/images/codestab.png new file mode 100644 index 0000000..7daa2a9 Binary files /dev/null and b/docs/images/codestab.png differ diff --git a/docs/images/coding_step1.png b/docs/images/coding_step1.png new file mode 100644 index 0000000..6503c5b Binary files /dev/null and b/docs/images/coding_step1.png differ diff --git a/docs/images/coding_step2_folder_select.png b/docs/images/coding_step2_folder_select.png new file mode 100644 index 0000000..0ab81db Binary files /dev/null and b/docs/images/coding_step2_folder_select.png differ diff --git a/docs/images/coding_step3_document_select.png b/docs/images/coding_step3_document_select.png new file mode 100644 index 0000000..2a19a06 Binary files /dev/null and b/docs/images/coding_step3_document_select.png differ diff --git a/docs/images/data-reader-app.png b/docs/images/data-reader-app.png new file mode 100644 index 0000000..25cf50b Binary files /dev/null and b/docs/images/data-reader-app.png differ diff --git a/docs/images/data_frames_folder.png b/docs/images/data_frames_folder.png new file mode 100644 index 0000000..b213aa5 Binary files /dev/null and b/docs/images/data_frames_folder.png differ diff --git a/docs/images/folderstructure.png b/docs/images/folderstructure.png new file mode 100644 index 0000000..2b0e407 Binary files /dev/null and b/docs/images/folderstructure.png differ diff --git a/docs/images/hex/hex.R b/docs/images/hex/hex.R new file mode 100644 index 0000000..3e185ea --- /dev/null +++ b/docs/images/hex/hex.R @@ -0,0 +1,12 @@ +#icon from https://thenounproject.com +library(hexSticker) +sticker("hex/noun_729118_cc_mod.png", package="qcoder", + p_size=19, s_x=1, s_y=.8, s_width=.6, p_y=1.55, + h_fill="#d5f4e6", h_color="#80ced6", p_color="#000000", + filename="imgHex.png") + +#beige version +sticker("hex/noun_729118_cc_mod.png", package="qcoder", + p_size=22, s_x=1, s_y=.82, s_width=.6, p_y=1.51, h_size=1.2, + h_fill="#f9e8a4", h_color="#99623b", p_color="#664025", + filename="hex/imgHex_beige.png") \ No newline at end of file diff --git a/docs/images/hex/imgHex.png b/docs/images/hex/imgHex.png new file mode 100644 index 0000000..c8fa745 Binary files /dev/null and b/docs/images/hex/imgHex.png differ diff --git a/docs/images/hex/imgHex_beige.png b/docs/images/hex/imgHex_beige.png new file mode 100644 index 0000000..97b8e7a Binary files /dev/null and b/docs/images/hex/imgHex_beige.png differ diff --git a/docs/images/hex/noun_729118_cc.png b/docs/images/hex/noun_729118_cc.png new file mode 100644 index 0000000..04ec475 Binary files /dev/null and b/docs/images/hex/noun_729118_cc.png differ diff --git a/docs/images/hex/noun_729118_cc_mod.png b/docs/images/hex/noun_729118_cc_mod.png new file mode 100644 index 0000000..18a3c74 Binary files /dev/null and b/docs/images/hex/noun_729118_cc_mod.png differ diff --git a/docs/images/qedit1.png b/docs/images/qedit1.png new file mode 100644 index 0000000..9a6bf36 Binary files /dev/null and b/docs/images/qedit1.png differ diff --git a/docs/images/qedit2.png b/docs/images/qedit2.png new file mode 100644 index 0000000..7972326 Binary files /dev/null and b/docs/images/qedit2.png differ diff --git a/docs/index.html b/docs/index.html index a06fb7a..98c87f5 100644 --- a/docs/index.html +++ b/docs/index.html @@ -134,7 +134,7 @@

This will create one main folder and four subfolders. Unless you specified otherwise it will be in your current working directory (you can find this with the getwd() command at the console). If you have a specific location where you want to put the folder change your working directory.

These will hold the documents to be coded, information about the codes, unit information and the r data frames that will be the core of the analysis. For this example the folder and file structures for the sample data will look similar to this.

- +

@@ -187,7 +187,7 @@

Importing the data

To import this data into Qcode user the import_project_data() function.

import_project_data(project = "my_qcoder_project")
-

Now the data_frames folder will contain the imported files.

+

Now the data_frames folder will contain the imported files.

Now it’s time to start coding.

Coding uses a “Shiny App” to provide a user interface to the data. To launch the app use the function qcode().

qcode()
@@ -203,20 +203,20 @@

Select your project folder.

- +

Once you have a project, use the drop down menu to select a particular document to code. This will open in an editor. When done coding (instructions below), click Save changes.

Select your project folder.

- +

Switching to the “Codes” tab a list of codes from the codes file is displayed.

- +

Our sample data already has some coding done, and the code-text data is displayed on the “Coded data” tab.

- +

diff --git a/images/data-reader-app.png b/images/data-reader-app.png new file mode 100644 index 0000000..25cf50b Binary files /dev/null and b/images/data-reader-app.png differ diff --git a/motivation.html b/motivation.html new file mode 100644 index 0000000..1357130 --- /dev/null +++ b/motivation.html @@ -0,0 +1,210 @@ + + + + + + + + + + + + + +Motivation + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

Motivation

+

The motivation stems from the need for a free, open source option for analyzing textual qualitative data. Textual qualitative data refers to text from interview transcripts, observation notes, memos, jottings and primary source/archival documents.

+

Qualitative data analysis (QDA) processes, particularly those developed by Corbin and Strauss (2014), Miles, Huberman, and Saldana (2013), and Glaser and Strauss (2017), can be thought of as layering interpretation onto the text. The researcher starts with open coding, meaning that she is free to tag snippets of text with whatever descriptions she deems appropriate. For instance, if the researcher is coding observation notes and senses that conversations between two individuals will be relevant to the research questions, she might tag the instances in which the two individuals speak with the code “conversation.” In the next round of coding, she might classify what the participants discuss with a finer tag, like “conversation_package” if they were talking about creating packages in R. In another round, she might get even more specific with codes such as “conversation_package_nomoney” if a participant discussed not having money to create a package in R. Later rounds involve conflating codes that might mean the same thing, relating codes to one another (often by documenting their meanings in a similar way to software documentation), and eliminating codes that no longer make sense.

+

Sometimes the QDA process involves looking for instances that demonstrate some concept, mechanism, or theory from the academic literature on the subject. This approach is common toward the end of the process in fields such as organization studies, management, and other disciplines and is prominent as an approach from the beginning in fields with experimental or clinical roots (e.g., psychology).

+

Using software for QDA allows researchers to nest codes, then begin to see the number of instances in which a particular code has been applied. Perhaps more importantly, software allows easy visualization and analysis of where codes co-occur (i.e., where multiple codes have been applied to the same snippets of text), and other linking activities that help researchers identify and specify themes in the data. Some software packages allow the researcher to visualize codes and the relationships between them in new and innovative ways; however, a cursory review of the literature suggests that qualitative researchers often use very basic features; some even use software such as Endnote on Microsoft Word or even analog systems such as paper or sticky notes.

+

QDA is an iterative process. Researchers will often change, lump together, split, or re-organize codes as they analyze their data. Depending on the coding approach, researchers might also create a codebook or research notes for each code that defines the code and specifies instances in which the code should be applied.

+
+
+

Current QDA software

+

To date, researchers who conduct QDA largely rely upon proprietary software such as those listed below. Free and open source options are limited, not easy to integrate with R and (credit to bduckles for the descriptions):

+
    +
  • NVivo – QSR International Desktop based software with both Windows and Mac support. Functionality includes video and images as data and auto-coding processes. Tech support and training is available and a relatively large user base.

  • +
  • Atlas.TI Desktop based software for Mac and Windows, also has mobile apps for android and iOS. Similar functionality as NVivo including using video and visuals as data. Has training and support.

  • +
  • MAXQDA Desktop based software that works for both Mac and Windows. Training and several tiers of licenses available.

  • +
  • QDA Miner – Provalis Research A Windows application to analyze qualitative text, can be used with some visuals as well. They also have a “lite” package which is free.

  • +
  • HyperResearch – Researchware Desktop based software that works on both Mac and Windows.

  • +
  • Dedoose Web based software, usable on any platform. Data stored in the web instead of on your device. Includes text, photos, audio, video and spreadsheet information.

  • +
  • Annotations Mac only app that allows you to highlight, keyword and create notes for text documents. An inexpensive way to do basic qualitative data analysis. Last updated in 2014.

  • +
  • RQDA A QDA package for R that is free and open source. Bugs in the program make it challenging to use. Last updated in 2012.

  • +
  • Coding Analysis Toolkit – CAT A free, web based, open source service that was created by University of Pittsburgh. Can be used on any platform. Not supported.

  • +
  • Weft QDA A free and open source Windows program that has no support. The website says that bugs in the program can result in the loss of data.

  • +
+
+
+

Limitations of Existing Software

+

Each extant software package has its limitations. The foremost limitation is cost, which can prohibit students and underfunded qualitative researchers from conducting analyses systematically, and efficiently. Furthermore, the mature software packages (e.g., Atlas.TI, NVIVO) offer features that exceed the needs of many users and, as a result, suffer speed issues (particularly for those researchers who may not benefit from advanced hardware). The sharing process for proprietary QDA outputs is equally unwieldy, relying on non-intuitive bundling and unbundling processes, steep learning curves, and non-transferable skill development.

+

Open source languages such as R offer the opportunity to involve qualitative researchers in open source software development. Greater involvement of qualitative researchers serves to expand the scope of R users and could create inroads to connect qualitative and quantitative R packages. For instance, better integration of qualitative research packages into R would make it possible for existing text analysis programs to work alongside qualitative coding.

+
+
+

User Needs

+

We began this project at rOpenSci’s runconf18 based on bduckles issue. We began by discussing the common challenges we and our peers face in conducting QDA and considering what we might learn from existing quantitative and text analysis open source packages. We identified a number of unique challenges qualitative researchers encounter when analyzing data.

+
    +
  • GUI for broader adoption. Quantitative researchers are often familiar with conducting analyses from the command line. Qualitative researchers, on the other hand, often rely upon Graphical User Interfaces (GUI) to perform analysis tasks. A proposed, in-development solution is integrating Shiny apps to permit a personalized GUI for QCoder.

  • +
  • Collaboration. Several aspects of collaboration in qualitative research present challenges for development and use of QDA software. The first challenge relates to version control and tracking changes. Whereas contributors to the open source software community, and perhaps the software development community more generally, have developed processes for sharing code, data, and other work products, qualitative researchers often rely on email, DropBox/box, and other sharing technologies that lack appropriate version control features. One proposed solution is to implement a system of record-keeping that maintains all files akin to Atlas.TI’s bundling and unbundling process, but that permits simultaneous work and comparison/merging of changes. We might also consider urging users to learn git and facilitate skill development by offering tutorials. The second collaboration challenge relates to the need to establish inter-rater or inter-coder reliability. Existing methods of assessing inter-rater reliability for QDA include Cronbach’s Alpha and Krippendorff’s Alpha, among others. Such evaluations, though, do not account for complexities such as decisions not to code a segment of text. In other words, by deciding not to apply a code to a snippet of text might be considered an equivalent level of agreement as a decision to apply the same code to the same snippet of text. QCoder aims to make inter-rater reliability assessments easier and more effective than traditional approaches by allowing direct comparisons of differences in the codebooks it generates.

  • +
  • Privacy concerns. Qualitative data often includes personally-identifying information. Quantitative approaches to deidentification are not always applicable to qualitative datasets because text-based data includes rich descriptions that may elevate the risk of re-identification. Furthermore, qualitative data tends to be unstructured in ways that make systematic approaches to de-identification difficult to implement. QDA software, then, needs to account for these complexities.

  • +
  • Reproducibility. Qualitative research communities, particularly those communities employing methods to probe human and social behavior (such as ethnography), fiercely debate whether or not qualitative research is meant to be reproducible. In other words, some researchers argue that human and social behavior is contextually bound by moments in time, physical settings, and the generally fluid nature of the phenomena under study. Accepting this dispute as valid, we are striving to develop a package that can facilitate some level of reproduction of data analysis. Our approach centers on the production of codebooks. Qualitative researchers create codebooks while coding their data. These codebooks match the code with a description of how and when the code is used. The creation of the codebook is an iterative process that is concurrent with the coding process. Codes are routinely combined, split and shifted as the researcher does their analysis. Codebooks are a way for the researcher to indicate to other members of their research team how to apply codes to the data. Historically, these code books have not been standard across different QDA packages. There could be a benefit to having a standard codebook format which could be compared across projects and even as a project iterates. Using Git or other version control as a codebook is created could prevent mistakes and be a teaching and learning tool for qualitative research. Additionally, most codebooks do not contain sensitive or private data and they could be shared and made publicly available. One could envision the capacity for researchers to share their codebooks so that research that follows could draw on existing codes for future research. A key characteristic of QCoder is that it exposes QDA processes to critique in ways that proprietary software excludes. For example, it is unreasonable to expect that researchers interested in reviewing or reproducing the analysis done for a manuscript pay for a license to perform the task at hand. QCoder can be easily installed and potentially maintained so that critical mistakes in analysis might be caught and corrected in review processes.

  • +
  • Money. Some of the disciplines in which qualitative work occurs do not enjoy the same level of funding as, say, natural sciences and engineering. Expensive software packages, then, may not be available to researchers. The case is equally dire for students and researchers in under-resourced communities and countries. An R-based QDA package enables broader participation in qualitative research and, by extension, broader perspectives on the nature of human and social behavior.

  • +
+
+
+

Conceptualizing a Minimum Viable Product: A vignette

+

Questa the graduate student is working on one chapter of her dissertation research where she is trying to understand how different codes of conduct in the software community welcome underrepresented communities. She wants to understand language use and to trace how these codes of conduct are created from differing communities. She also is planning to do interviews with people who have created and used these codes of conduct to understand how their adoption influences inclusion.

+

None of the grants that Questa sent out have come back with any money to do this research. She’s discouraged but passionate about her work and she knows she needs to finish this chapter of her dissertation so she can graduate already. She’s done some work with a nonprofit and they support her work. She’s pretty sure she has a postdoc with them when she finishes her degree. While they should be able to hire her and do support her research, it’s unlikely they’d have enough funds for her to get a license for one of the QDA packages. She has been able to try out some of the larger QDA packages and they work ok with her student license, but her computer is on its last legs and the program keeps crashing her computer.

+

She’s planning to start up her interviews and overall she thinks she’ll have maybe around 50-75 text files that include 1) codes of conduct 2) interview transcripts 3) field research at a conference.

+

She’s not sure how many codes she’d need, but she’s guessing that the first round of the coding would be a lot of different codes - maybe 150 - 200 codes to start. Then she’d likely change them and boil it down into groups of codes and categories of those codes. So ideally the program would make it easy for her to edit, change and move around the codes. She also needs to write up a simple codebook as she does the coding.

+
+
+

References

+

Corbin, J., & Strauss, A. L. (2014). Basics of qualitative research. Thousand Oaks, CA: Sage.

+

Glaser, B. G., & Strauss, A. L. (2017). Discovery of grounded theory: Strategies for qualitative research. London, UK: Routledge.

+

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.

+

Miles, M. B., Huberman, A. M., & Saldana, J. (2013). Qualitative data analysis. Thousand Oaks, CA: Sage.

+

Saldaña, J. (2015). The coding manual for qualitative researchers. Thousand Oaks, CA: Sage.

+
+ + + + +
+ + + + + + + +