---
title: "stenR usage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{stenR usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4,
  fig.align = "center"
)
```

`stenR` is a package tailored mainly for users and creators of psychological questionnaires, though other social science researchers and survey authors can benefit greatly from it. It provides tools to help with the processes necessary for conducting such studies:

1. processing data from raw item scores to raw factor/scale scores
1. standardization of the raw scores into a standard scale of your choosing, either by:
    - normalization of the raw scores using a frequency table (if no norms have been developed before), usually done by authors of questionnaires or their adaptations;
    - importing a scoring table developed by the questionnaire authors, for researchers only using the measure.

Furthermore, tools for developing or using norms on a grouped basis are also provided (up to two intertwined grouping conditions are supported).

As there are a few fairly independent and varied processes supported in `stenR`, they will be described separately below. For more details, browse through the documentation and other vignettes.

```{r library_load}
library(stenR)
```

## Processing raw item scores into raw factor/scale scores

After conducting the study, results will usually be available in the form of responses on some scoring scale for each separate item. For further analysis they need to be gathered into scales and factors (unless they are one-item scales). `stenR` provides functions to make this process straightforward.

We will use one of the datasets provided with the package: `SLCS`, containing responses to the items of the Self-Liking Self-Competence Scale. It consists of 16 items, which can be grouped into two subscales (Self-Liking, Self-Competence) and a General Score.
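
Conceptually, summing items into a scale comes down to mirroring the reverse-keyed items within the item range (a reversed score is `min + max - x`) and then adding the items up. The base-R sketch below illustrates this idea on made-up data; the `items` data frame and its column names are hypothetical and not part of `SLCS`, and this is not `stenR`'s internal implementation.

```r
# Hypothetical responses of three people to three items on a 1-5 scale
items <- data.frame(
  item_1 = c(1, 4, 5),
  item_2 = c(2, 5, 3),
  item_3 = c(5, 1, 2)
)
item_min <- 1
item_max <- 5
reversed <- "item_3"  # hypothetical reverse-keyed item

# Reverse-keyed items are mirrored within the item range: x -> min + max - x
items[reversed] <- item_min + item_max - items[reversed]

# The raw scale score is the row-wise sum of the (partly reversed) items
scale_score <- unname(rowSums(items))
scale_score  # gives 4, 14 and 12
```

In `stenR`, the same information (item names, reversed items, item `min` and `max`) is stored in a *ScaleSpec* object instead of loose variables.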

```{r SLCS_structure}
str(SLCS)
```

To summarize scores we need to create *ScaleSpec* objects with the `ScaleSpec()` constructor function. Such objects contain instructions for R on how the scales are structured, most importantly:

- **name**: name of the resulting variable
- **item_names**: names of the variables which will be summed into the scale
- **reverse**: names of the variables that need to have their scores reversed
- **min**, **max**: absolute minimum and maximum of the raw item scores

```{r SLCS_ScaleSpecs}
SL_spec <- ScaleSpec(
  name = "SelfLiking",
  min = 1,
  max = 5,
  item_names = c("SLCS_1", "SLCS_3", "SLCS_5", "SLCS_6", "SLCS_7",
                 "SLCS_9", "SLCS_11", "SLCS_15"),
  reverse = c("SLCS_1", "SLCS_6", "SLCS_7", "SLCS_15")
)

SC_spec <- ScaleSpec(
  name = "SelfCompetence",
  min = 1,
  max = 5,
  item_names = c("SLCS_2", "SLCS_4", "SLCS_8", "SLCS_10", "SLCS_12",
                 "SLCS_13", "SLCS_14", "SLCS_16"),
  reverse = c("SLCS_8", "SLCS_10", "SLCS_13")
)
```

If there are main or higher-order factors, the *ScaleSpec* objects can also be combined into a *CombScaleSpec* object with the `CombScaleSpec()` constructor function. In our example, the **General Score** is such a factor.

```{r SLCS_CombScaleSpec}
GS_spec <- CombScaleSpec(
  name = "GeneralScore",
  SL_spec,
  SC_spec
)

# subscales can also be reversed
GS_with_rev <- CombScaleSpec(
  name = "rev_example",
  SL_spec,
  SC_spec,
  reverse = "SelfCompetence"
)
```

When all scale specifications are ready, they can be used to get the factor/scale data, summarized in accordance with the instructions in the provided *ScaleSpec* or *CombScaleSpec* objects.

```{r SLCS_summarize}
summed_SLCS <- sum_items_to_scale(
  SLCS,
  SL_spec,
  SC_spec,
  GS_spec,
  GS_with_rev
)

str(summed_SLCS)
```

## Normalize scores with FrequencyTable

When you have a large number of observations and prefer to develop norms (usually when you are the creator of a questionnaire or its adaptation), it is recommended to generate *FrequencyTable* and *ScoreTable* objects.
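
The idea behind normalizing raw scores with a frequency table can be sketched in base R. A common approach, assumed here and not necessarily identical to `stenR`'s internals, maps each raw score to its mid-point cumulative proportion (observations below the score plus half of those at the score, divided by the total) and converts that proportion into a z score with the inverse normal CDF, `qnorm()`. The raw scores are made up.

```r
# Hypothetical raw scores of 10 respondents
raw <- c(12, 15, 15, 18, 20, 20, 20, 23, 25, 30)

# Frequency of each observed raw score (sorted by score)
freq <- table(raw)
n <- as.vector(freq)
scores <- as.numeric(names(freq))

# Mid-point cumulative proportion: observations below the score
# plus half of the observations exactly at the score
cum_prop <- (cumsum(n) - n / 2) / sum(n)

# Normalized z score via the inverse of the standard normal CDF
z <- qnorm(cum_prop)

data.frame(score = scores, n = n, cum_prop = cum_prop, z = round(z, 3))
```

Using the mid-point keeps the lowest and highest observed scores away from proportions of exactly 0 and 1, which `qnorm()` would turn into infinite z scores.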

The resulting *ScoreTable* objects can be used either to normalize the scores or to create a *ScoringTable* object, which can be exported to non-R-specific formats.

> There is also support for automatically grouping the observations using *GroupedFrequencyTable* and *GroupedScoreTable* objects. They will be covered in the **Groupings** section.

We will use one of the datasets provided with the package: `HEXACO_60`, containing raw scores of the scales in the HEXACO 60-item questionnaire.

```{r HEXACO_str}
str(HEXACO_60)
```

1. Create *FrequencyTable* objects

First, we need to create a *FrequencyTable* object for each variable using the `FrequencyTable()` constructor function.

```{r ungrouped_freqtable}
HEX_C_ft <- FrequencyTable(HEXACO_60$HEX_C)
HEX_E_ft <- FrequencyTable(HEXACO_60$HEX_E)
```

If some raw score values between the observed minimum and maximum are absent from your data, a helpful message will be displayed. You can check what the frequencies look like using the `plot()` function.

```{r plot_freqtable}
plot(HEX_E_ft)
```

As we can see, the missing values are gathered near the tails of the distribution. This can happen even with many observations, and with a sample as small as ours (103 observations) it is very likely.

2. Create *ScoreTable* objects

A *ScoreTable* object is basically a frequency table with an additional standard scale specification attached. We could create our own specification using `StandardScale()`, but in this example we will use the already provided `STEN` (Standard TEN) score specification.

```{r ungrouped_scoretables}
HEX_C_st <- ScoreTable(
  ft = HEX_C_ft,
  scale = STEN
)

HEX_E_st <- ScoreTable(
  ft = HEX_E_ft,
  scale = STEN
)
```

3. Normalize and standardize scores

The created *ScoreTable*s can then be used to calculate the normalized scores.
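
Once a raw score has a normalized z score, expressing it on a standard scale is a linear transformation with the scale's mean and standard deviation, followed by rounding and clipping to the scale bounds. The sketch below assumes the usual sten parameters (mean 5.5, SD 2, range 1 to 10) and a round-then-clip rule; it is an illustration, not a copy of `stenR`'s code, and the helper `to_standard()` is hypothetical.

```r
# Usual sten definition: mean 5.5, SD 2, bounded to 1..10
sten <- list(M = 5.5, SD = 2, min = 1, max = 10)

# Hypothetical helper: z -> standard score, rounded and clipped to the bounds
to_standard <- function(z, scale) {
  score <- round(scale$M + scale$SD * z)
  pmin(pmax(score, scale$min), scale$max)
}

# Hypothetical normalized z scores
z <- c(-2.9, -1.645, -0.385, 0.126, 0.674, 2.8)
to_standard(z, sten)  # gives 1, 2, 5, 6, 7, 10
```

Clipping matters at the tails: without it, a z score of 2.8 would map to 11, which is outside the sten range.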

Normalization can be done either on individual vectors with the basic `normalize_score()` function:

```{r ungrouped_normalization_base}
HEX_C_norm <- normalize_score(
  HEXACO_60$HEX_C,
  table = HEX_C_st,
  what = "sten"
)

HEX_E_norm <- normalize_score(
  HEXACO_60$HEX_E,
  table = HEX_E_st,
  what = "sten"
)

summary(HEX_C_norm)
summary(HEX_E_norm)
```

Or using the convenient wrapper for a whole *data.frame*:

```{r ungrouped_normalization_df}
HEX_CE_norm <- normalize_scores_df(
  data = HEXACO_60,
  vars = c("HEX_C", "HEX_E"),
  HEX_C_st,
  HEX_E_st,
  what = "sten",
  # by default no other variables will be retained
  retain = FALSE
)

summary(HEX_CE_norm)
str(HEX_CE_norm)
```

```{r ScoringTable_export, echo=FALSE}
C_ScoringTable <- tempfile(fileext = ".csv")
E_ScoringTable <- tempfile(fileext = ".csv")

export_ScoringTable(
  to_ScoringTable(HEX_C_st, min_raw = 10, max_raw = 50),
  C_ScoringTable,
  "csv"
)

export_ScoringTable(
  to_ScoringTable(HEX_E_st, min_raw = 10, max_raw = 50),
  E_ScoringTable,
  "csv"
)
```

## Normalize scores using imported ScoringTable

Most users will be using norms already developed by the creators of the questionnaire. Scoring tables should be provided in the measure documentation, and the *ScoringTable* object mirrors their usual representation.

A *ScoringTable* object can be either created from a *ScoreTable* or *GroupedScoreTable* object, or imported from a **csv** or **json** file.

For manual creation, the **csv** format is recommended.
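
A scoring table maps ranges of raw scores to standard scores, so applying it is a range lookup. The base-R sketch below parses `{min}-{max}` ranges (the ones from the Conscientiousness table used in this section) and finds the standard score whose range contains each raw score. The helper `lookup_score()` is hypothetical and not part of `stenR`; `import_ScoringTable()` and `normalize_scores_scoring()` take care of this for you.

```r
# Ranges as they appear in a scoring table: standard score -> "{min}-{max}"
scoring <- data.frame(
  sten  = 1:10,
  Score = c("10-19", "20-25", "26-28", "29-31", "32-35",
            "36-39", "40-42", "43-46", "47-48", "49-50"),
  stringsAsFactors = FALSE
)

# Split every range into its numeric lower and upper bound
bounds <- do.call(rbind, lapply(strsplit(scoring$Score, "-"), as.numeric))
scoring$min <- bounds[, 1]
scoring$max <- bounds[, 2]

# Hypothetical helper: return the standard score whose range contains x,
# or NA when x falls outside every range
lookup_score <- function(x, table) {
  vapply(x, function(v) {
    hit <- which(table$min <= v & v <= table$max)
    if (length(hit) == 1) table$sten[hit] else NA_integer_
  }, integer(1))
}

lookup_score(c(12, 30, 44, 55), scoring)  # gives 1, 4, 8, NA
```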

A **csv** scoring table file should look similar to the one below (which was created on the basis of the Conscientiousness *ScoreTable* from the code in the section above):

```csv
"sten","Score"
1,"10-19"
2,"20-25"
3,"26-28"
4,"29-31"
5,"32-35"
6,"36-39"
7,"40-42"
8,"43-46"
9,"47-48"
10,"49-50"
```

- the first column should contain the standardized scores
- the second column should contain the ranges of raw scores, in the pattern `{min}-{max}`, that are converted into each standardized score

> *ScoringTable* objects also support different groups of observations; in that case the 2nd to n-th columns contain the scores for each of the groups. They will be covered in the **Groupings** section.

We can import *ScoringTable*s using the `import_ScoringTable()` function.

```{r Scoring_ungrouped_import}
HEX_C_Scoring <- import_ScoringTable(
  source = C_ScoringTable,
  method = "csv"
)

HEX_E_Scoring <- import_ScoringTable(
  source = E_ScoringTable,
  method = "csv"
)

summary(HEX_C_Scoring)
summary(HEX_E_Scoring)
```

They can then be used to normalize scores, very similarly to `normalize_scores_df()`:

```{r normalize_Scoring_ungrouped}
HEX_CE_norm <- normalize_scores_scoring(
  data = HEXACO_60,
  vars = c("HEX_C", "HEX_E"),
  HEX_C_Scoring,
  HEX_E_Scoring
)

summary(HEX_CE_norm)
str(HEX_CE_norm)
```

## Groupings

Norms very often differ between groups, most often varying with demographic variables such as biological sex or age. `stenR` functions support such groups by introducing *Grouped* variants of *FrequencyTable* and *ScoreTable* (the regular *ScoringTable* supports groups as well) and the *GroupConditions* class.

*GroupConditions* works similarly to the *ScaleSpec* and *CombScaleSpec* objects: it provides information about how to assign observations. It needs the name of the category (mainly for informative reasons) and conditions following the syntax of the group name on the LHS and a boolean condition on the RHS.
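
The `LHS ~ RHS` syntax can be read as follows: the left-hand side names a group, and the right-hand side is an expression evaluated within the data to decide which observations belong to it. The base-R sketch below mimics that idea on a hypothetical `people` data frame; it shows how such formulas can be taken apart and evaluated, not how `GroupConditions()` is actually implemented.

```r
# Hypothetical respondents
people <- data.frame(sex = c("M", "F", "F", "M"), age = c(25, 41, 30, 19))

# Conditions in the form: group name ~ logical condition
conditions <- list(
  "M" ~ sex == "M",
  "F" ~ sex == "F"
)

# f[[2]] is the left-hand side (group name), f[[3]] the right-hand side;
# the condition is evaluated with the data frame columns as variables
groups <- vapply(conditions, function(f) as.character(f[[2]]), character(1))
members <- lapply(conditions, function(f) which(eval(f[[3]], envir = people)))
names(members) <- groups

members  # M: rows 1 and 4, F: rows 2 and 3
```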

```{r GroupConditions}
sex_grouping <- GroupConditions(
  conditions_category = "Sex",
  "M" ~ sex == "M",
  "F" ~ sex == "F"
)

age_grouping <- GroupConditions(
  conditions_category = "Age",
  "to 30" ~ age < 30,
  "above 30" ~ age >= 31
)

sex_grouping
age_grouping
```

They can then be used to create a *GroupedFrequencyTable* and, following that, a *GroupedScoreTable* and, optionally, a *ScoringTable*, or to create a *ScoringTable* during import.

For these examples we will be using the `IPIP_NEO_300` dataset provided with the package. It contains the *age* and *sex* variables, and summed raw scores of 5 scales from the IPIP NEO questionnaire (300-item version).

```{r IPIP_NEO_str}
str(IPIP_NEO_300)
```

1. *GroupedFrequencyTable*, *GroupedScoreTable* and *ScoringTable* export

The workflow is very similar to the one for ungrouped tables.

```{r grouped_tables, fig.align = "center"}
N_gft <- GroupedFrequencyTable(
  data = IPIP_NEO_300,
  conditions = list(age_grouping, sex_grouping),
  var = "N",
  # By default, norms are also computed for '.all' groups. These are
  # used if for any reason an observation can't be assigned to any group
  # in the corresponding condition category
  .all = TRUE
)

N_gst <- GroupedScoreTable(N_gft, scale = STEN)

plot(N_gst)
```

A *GroupedScoreTable* can then be used to normalize scores using `normalize_scores_grouped()`. By default, other variables are not retained. You can also provide the name of a column that will contain the group assigned to each observation.

```{r grouped_table_normalize}
NEO_norm <- normalize_scores_grouped(
  data = IPIP_NEO_300,
  vars = "N",
  N_gst,
  what = "sten",
  group_col = "Group"
)

str(NEO_norm)
table(NEO_norm$Group)
```

A *GroupedScoreTable* can then be transformed into a *ScoringTable* and exported to a **csv** or **json** file.

```{r grouped_scoring_export}
ST_csv <- tempfile(fileext = ".csv")
cond_csv <- tempfile(fileext = ".csv")

N_ST <- to_ScoringTable(
  table = N_gst,
  min_raw = 60,
  max_raw = 300
)

summary(N_ST)

export_ScoringTable(
  table = N_ST,
  out_file = ST_csv,
  method = "csv",
  # you can also export GroupConditions to a separate csv file
  cond_file = cond_csv
)
```

2. *ScoringTable* import from file

To import a *ScoringTable* with groups from **csv**, the file needs to look like this:

```csv
sten,to 30:M,to 30:F,to 30:.all2,above 30:M,above 30:F,above 30:.all2,.all1:M,.all1:F,.all1:.all2
1,60-94,60-111,60-101,60-85,60-98,60-92,60-90,60-104,60-95
2,95-110,112-128,102-117,86-101,99-112,93-106,91-106,105-119,96-111
3,111-126,129-144,118-134,102-117,113-128,107-122,107-122,120-136,112-128
4,127-143,145-162,135-152,118-135,129-146,123-140,123-140,137-154,129-147
5,144-162,163-180,153-171,136-154,147-165,141-160,141-159,155-174,148-166
6,163-181,181-199,172-190,155-175,166-185,161-180,160-179,175-194,167-186
7,182-201,200-218,191-210,176-198,186-208,181-203,180-200,195-214,187-208
8,202-222,219-238,211-232,199-222,209-229,204-226,201-222,215-234,209-229
9,223-244,239-256,233-251,223-245,230-247,227-247,223-245,235-251,230-248
10,245-300,257-300,252-300,246-300,248-300,248-300,246-300,252-300,249-300
```

Measure developers usually don't include norms for observations with unmet conditions (groups with `.all` names in the `stenR` convention). A *ScoringTable* constructed without these groups will produce `NA` during `normalize_scores_scoring()` when an observation doesn't match any of the provided conditions (that's why `GroupedFrequencyTable()` generates these groups by default).

In that case the csv file would be smaller:

```csv
sten,to 30:M,to 30:F,above 30:M,above 30:F
1,60-94,60-111,60-85,60-98
2,95-110,112-128,86-101,99-112
3,111-126,129-144,102-117,113-128
4,127-143,145-162,118-135,129-146
5,144-162,163-180,136-154,147-165
6,163-181,181-199,155-175,166-185
7,182-201,200-218,176-198,186-208
8,202-222,219-238,199-222,209-229
9,223-244,239-256,223-245,230-247
10,245-300,257-300,246-300,248-300
```

*GroupConditions* objects need to be provided either from a **csv** file in the `cond_file` argument or as R objects in the `conditions` argument of the `import_ScoringTable()` function.

```{r import_grouped_scoring}
imported_ST <- import_ScoringTable(
  source = ST_csv,
  method = "csv",
  conditions = list(age_grouping, sex_grouping)
)

summary(imported_ST)
```

After import, the *ScoringTable* can be used to generate scores.

```{r normalize_imported_scoring}
NEO_norm <- normalize_scores_scoring(
  data = IPIP_NEO_300,
  vars = "N",
  imported_ST,
  group_col = "Group"
)

str(NEO_norm)
table(NEO_norm$Group)
```

## Varia

The information above should be enough for basic usage of `stenR`. The package was developed with multiple use cases and general customizability in mind, and some additional possibilities are described below.

### *StandardScale*

In the examples above we used the `STEN` *StandardScale* object, which is provided in the package. You can check all available scales in the `?default_scales` documentation. You can also define your own *StandardScale* object using the `StandardScale()` function.

```{r StandardScale}
# name, mean, standard deviation, minimum and maximum of the scale
new_scale <- StandardScale("my_scale", 10, 3, 0, 20)

# let's see if everything is correct
new_scale

# what does its distribution look like?
plot(new_scale)
```

### *CompScoreTable* `R6` object

In addition to the procedural workflow described above, there is also an `R6` class definition prepared to handle the creation of *ScoreTables* and the generation of normalized scores: *CompScoreTable*.
Its most useful feature is the ability to automatically recalculate *ScoreTables* based on the raw score values passed to the `standardize` method. This can be helpful for inter-session continuity.

> Currently there is only one such object, supporting the ungrouped workflow. A grouped version is in the works.

#### Initialize the object

During object initialization you can attach some previously calculated *FrequencyTables* and/or *StandardScales*. This is fully optional, as it can also be done afterwards.

```{r init_CompScoreTable}
# attach during initialization
HexCST <- CompScoreTable$new(
  tables = list(HEX_E = HEX_E_ft),
  scales = STEN
)

# attach later
altCST <- CompScoreTable$new()
altCST$attach_FrequencyTable(HEX_E_ft, "HEX_E")
altCST$attach_StandardScale(STEN)

# there are no visible differences in the objects' structure
summary(HexCST)
summary(altCST)
```

#### Expand *CompScoreTable*

After creation, the object can be expanded with more *FrequencyTables* and *StandardScales*. All *ScoreTables* will be recalculated internally.

```{r expand_CST}
# add new FrequencyTable
HexCST$attach_FrequencyTable(FrequencyTable(HEXACO_60$HEX_C), "HEX_C")
summary(HexCST)

# add new StandardScale
HexCST$attach_StandardScale(STANINE)
summary(HexCST)
```

#### Standardize scores

Once the object is ready, score standardization may begin. Let's feed it some raw scores!

```{r CST_standardize}
# standardize the Emotionality and Conscientiousness scores
HexCST$standardize(
  data = head(HEXACO_60),
  what = "sten",
  vars = c("HEX_E", "HEX_C")
)

# you can also do this easily with pipes!
HEXACO_60[1:5, c("HEX_E", "HEX_C")] |>
  # no need to specify 'vars', as the correct columns are already selected
  HexCST$standardize("sten")
```

#### Automatically recalculate ScoreTables

During score standardization, you can also automatically add the new raw scores to the existing frequencies and recalculate the *ScoreTables*.
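
The append-then-recalculate behaviour can be illustrated without `R6`, using a base-R closure that keeps the raw scores seen so far. This is a simplified, hypothetical stand-in (`make_norms()` is not a `stenR` function): it only returns z scores based on the mid-point cumulative proportion, and with `calc = TRUE` it appends the new scores before computing the result.

```r
# Hypothetical stand-in for CompScoreTable: a closure holding raw scores
make_norms <- function(initial_scores) {
  scores <- initial_scores

  # z score of each value from its mid-point cumulative proportion
  z_of <- function(x) {
    below <- vapply(x, function(v) sum(scores < v), integer(1))
    equal <- vapply(x, function(v) sum(scores == v), integer(1))
    qnorm((below + equal / 2) / length(scores))
  }

  list(
    n = function() length(scores),
    standardize = function(new_scores, calc = FALSE) {
      # with calc = TRUE the new scores enter the norms before scoring
      if (calc) scores <<- c(scores, new_scores)
      z_of(new_scores)
    }
  )
}

norms <- make_norms(c(10, 12, 12, 15, 18))
before <- norms$standardize(c(12, 15))              # norms from 5 scores
after  <- norms$standardize(c(12, 15), calc = TRUE) # norms from 7 scores
norms$n()  # 7
```

A closure is used here only to keep the sketch dependency-free; like an `R6` object, it keeps mutable state between calls.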

The recalculation is done before the values are returned, so they will be based on the most recent *ScoreTables*.

> You can actually use `standardize()` with `calc = TRUE` right after attaching the scale or scales. *ScoreTables* will be generated automatically before the data standardization, so you will receive both the standardized data and the computed *ScoreTables*.

```{r CST_append}
# check the current state of the object
summary(HexCST)

# now, standardize and recalculate!
HEXACO_60[1:5, c("HEX_H", "HEX_C")] |>
  HexCST$standardize("sten", calc = TRUE)

# check the new state
summary(HexCST)
```

#### Export tables

There is also an option to export the *ScoreTables*, either to use them later in the procedural way or to create a new *CompScoreTable* in another session. For the latter purpose, they can also be exported as *FrequencyTables*!

```{r CST_export}
# export as ScoreTables
st_list <- HexCST$export_ScoreTable()
summary(st_list)

# export as FrequencyTables
ft_list <- HexCST$export_ScoreTable(strip = TRUE)
summary(ft_list)
```

### Simulate *FrequencyTable* using raw score distribution data

The examples above described the two most likely scenarios: either having raw scores to calculate norms yourself, or importing a scoring table from the measure documentation. There is also a rarer but possible scenario: having access only to descriptive statistics in a research article. Using them, we can create *Simulated* tables:

```{r SimTables}
sim_ft <- SimFrequencyTable(min = 10, max = 50, M = 31.04,
                            SD = 6.7, skew = -0.3, kurt = 2.89, seed = 2678)

class(sim_ft)

plot(sim_ft)
```

The *Simulated* class will be inherited by any *ScoreTable* object created on its basis. Simulated tables can be used in every way that regular ones can, with one exception: if used to create a *CompScoreTable* object, raw scores cannot be appended to this kind of table in the `standardize()` method.

```{r check_SimComp, error=TRUE}
SimCST <- CompScoreTable$new(
  tables = list("simmed" = sim_ft),
  scales = STEN
)

SimCST$standardize(
  data = data.frame(simmed = round(runif(10, 10, 50), 0)),
  what = "sten",
  calc = TRUE
)
```

### Extracting observations by group

There are also the `GroupAssignment()` and `intersect_GroupAssignment()` functions to assign observations on the basis of one or two *GroupConditions* objects, described in the **Groupings** section. They are used internally by `GroupedFrequencyTable()`, `normalize_scores_grouped()` and `normalize_scores_scoring()`, but are also exported in case you wish to `extract_observations()` manually. Check the examples in the documentation for more information.
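
Intersecting two categories of conditions means pairing every group of one category with every group of the other and keeping the observations that satisfy both. The base-R sketch below shows the idea on hypothetical data, naming the intersected groups `age:sex` in the same way as the column names of the grouped scoring table earlier in this vignette. It is not the implementation of `intersect_GroupAssignment()`, and the age cut-off is only illustrative.

```r
# Hypothetical respondents
people <- data.frame(
  sex = c("M", "F", "F", "M", "F"),
  age = c(25, 41, 22, 37, 52)
)

# Two categories of conditions, each a named list of logical vectors
age_groups <- list(
  "to 30"    = people$age < 30,
  "above 30" = people$age >= 30
)
sex_groups <- list(
  "M" = people$sex == "M",
  "F" = people$sex == "F"
)

# Pair every age group with every sex group, naming them "age:sex"
intersected <- list()
for (a in names(age_groups)) {
  for (s in names(sex_groups)) {
    intersected[[paste(a, s, sep = ":")]] <- which(age_groups[[a]] & sex_groups[[s]])
  }
}

intersected  # row indices of the observations in each intersected group
```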