Introducing filters in jamovi

tl;dr

jamovi 0.9 and newer allow you to filter rows and columns out of your analyses. This is useful for excluding outliers, or limiting the scope of your analyses. jamovi filters are built on top of the jamovi ‘compute variables’ system, allowing great flexibility in filtering.

(Note: if you haven’t already updated to jamovi 0.9.0.1 or newer, we encourage you to do so.)

Row filters

The jamovi 0.9 series has landed, and with it comes the ability to filter out rows that you don’t want included in your analyses. There are a number of reasons why this might be appropriate. For example, you might want to only include people’s survey responses if they explicitly consented to having their data used, or you might want to exclude all left-handed people, or perhaps people who score ‘below chance’ in an experimental task. In some cases you just want to exclude extreme scores, for example those more than 3 standard deviations from the mean.

The filters in jamovi are built on top of jamovi’s computed variable formula system, which allows the building of arbitrarily complex formulas. For a primer on computed variables in jamovi, see our earlier blog post on them (note, though, that we’ve added quite a few useful new functions since that post was written).

To demonstrate jamovi filters (you can follow along if you like), let’s open the very simple Tooth Growth data set from the examples. Next, we select the Filters button from the Data tab. This opens the filter view and creates a new filter called Filter 1, as shown in the following GIF:

In this GIF we specify a filter to exclude the 9th row. Perhaps we know that the 9th participant was someone just testing the survey system, and not a proper participant (Tooth Growth is actually about the length of guinea pig teeth, so perhaps we know that the 9th participant was a rabbit). We can simply exclude them with the formula:

ROW() != 9

In this expression, != means ‘does not equal’. If you’ve ever used a programming language like R, this should be very familiar. Filters in jamovi exclude the rows for which the formula is not true; in this case, the expression ROW() != 9 is true for every row except the 9th. When we apply this filter, the tick in the Filter 1 column of the 9th row changes to a cross, and the whole row greys out. If we were to run an analysis now, it would run as though the 9th row wasn’t there. Similarly, any analyses we had already run would re-run, and their results would update to exclude the 9th row.
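For readers who think in code, here’s a rough Python sketch of what a row filter does. It assumes a row filter amounts to a per-row true/false test (the tick or cross) and that ROW() counts from 1, as “the 9th row” implies. The data are made up, and apply_row_filter is a hypothetical helper; this illustrates the idea, not jamovi’s implementation.

```python
# Hypothetical sketch: a row filter as a per-row predicate (not jamovi's code).
# Assumes ROW() is 1-based, as "the 9th row" in the text implies.
data = [{"len": float(n)} for n in range(1, 13)]  # 12 made-up rows

def apply_row_filter(rows, predicate):
    """Return one tick (True) or cross (False) per row, plus the kept rows."""
    ticks = [predicate(n, row) for n, row in enumerate(rows, start=1)]
    kept = [row for row, tick in zip(rows, ticks) if tick]
    return ticks, kept

# Equivalent of the jamovi formula: ROW() != 9
ticks, kept = apply_row_filter(data, lambda n, row: n != 9)
print(ticks.index(False) + 1, len(kept))  # prints: 9 11
```

Any analysis run afterwards would only ever see the rows in kept.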

But we can do more complicated filters than this! The Tooth Growth example contains the length of teeth from guinea pigs (the len column) fed different dosages (the dose column) of one of two supplements: vitamin C or orange juice (recorded in the supp column). Let’s say that we’re interested in the effect of dosage on tooth length. We might run an ANOVA with len as the dependent variable and dose as the grouping variable. But let’s say that we’re only interested in the effects of vitamin C, and not of orange juice. We can use the formula:

supp == 'VC'

In fact, we can specify this formula in addition to the ROW() != 9 formula if we like. We can add it as another expression to Filter 1 (by clicking the small + beside the first formula), or we can add it as an additional filter (by selecting the large + to the left of the filters dialog box). As we’ll see, adding an expression to an existing filter does not provide exactly the same behaviour as creating a separate filter. In this case, however, it doesn’t make a difference, so we’ll just add it to the existing filter. This additional expression is also represented by its own column, and by looking at the ticks and crosses, we can see which filter or expression is responsible for excluding each row.
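A similar sketch for a filter holding several expressions. It assumes each expression produces its own tick/cross column and that a row is kept only when every expression in the filter is true (a logical AND). With only four made-up rows, ROW() != 3 stands in for ROW() != 9.

```python
# Hypothetical sketch: one filter holding several expressions. Each expression
# yields its own tick/cross column; we assume a row is kept only when every
# expression is true (logical AND). The rows below are made up.
data = [
    {"supp": "VC", "len": 4.2},
    {"supp": "OJ", "len": 15.2},
    {"supp": "VC", "len": 11.5},
    {"supp": "OJ", "len": 21.5},
]

expressions = {
    "ROW() != 3": lambda n, row: n != 3,
    "supp == 'VC'": lambda n, row: row["supp"] == "VC",
}

# One column of ticks (True) / crosses (False) per expression.
columns = {
    name: [expr(n, row) for n, row in enumerate(data, start=1)]
    for name, expr in expressions.items()
}

# A row survives the filter only if every expression column has a tick.
keep = [all(col[i] for col in columns.values()) for i in range(len(data))]
print(keep)  # prints: [True, False, False, False]
```

Row 3 is excluded by the ROW() expression and rows 2 and 4 by the supp expression, which is exactly what the separate tick/cross columns let you read off.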

But let’s say we also want to exclude from the analysis all tooth lengths more than 1.5 standard deviations from the mean. To do this, we’d compute a z-score and check that it falls between -1.5 and 1.5. We could use the formula:

-1.5 < Z(len) < 1.5

or if we’re really keen:

-1.5 < (len - VMEAN(len)) / VSTDEV(len) < 1.5

(this last formula is a great way to demonstrate to students what a z-score is.)
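The long-hand z-score formula translates almost word for word into Python, including the chained comparison. This sketch assumes VSTDEV() is the sample standard deviation (dividing by n - 1), which is an assumption worth checking against jamovi’s documentation; the lengths are made up, with one deliberately extreme value at the end.

```python
import statistics

# Hypothetical sketch of: -1.5 < (len - VMEAN(len)) / VSTDEV(len) < 1.5
# Assumption: VSTDEV is the sample standard deviation (n - 1 denominator).
lengths = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 30.0]  # made up

def z_scores(values):
    """Standardise each value: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]

# Python, like the jamovi formula, supports chained comparisons.
keep = [-1.5 < z < 1.5 for z in z_scores(lengths)]
print(keep.count(False), keep[-1])  # prints: 1 False
```

Only the 30.0 value (z of roughly 2.8) falls outside the band.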

There are a lot of functions available in jamovi, and you can see them by clicking the small icon beside the formula box.

Now let’s add this z-score formula to a separate filter by clicking the large + to the left of the filters, and adding it to Filter 2.

With multiple filters, the filtered rows cascade from one filter into the next. So only the rows allowed through by Filter 1 are used in the calculations for Filter 2. In this case, the mean and standard deviation for the z-score will be based only on the Vitamin C rows (and also not on row 9). In contrast, if we’d specified our Z() filter as an additional expression in Filter 1, then the mean and standard deviation for the z-score would be based on the entire dataset.
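To see why the cascade matters, the sketch below applies the same two rules both ways to made-up data: once as two separate filters, and once as two expressions in a single filter. It assumes Z() standardises with the mean and sample standard deviation of whatever rows it is given, and within_z is a hypothetical helper.

```python
import statistics

# Hypothetical sketch contrasting cascaded filters with one combined filter.
# Made-up (supp, len) rows; Z() is assumed to use the sample SD.
data = [("VC", 5.0), ("VC", 6.0), ("VC", 7.0), ("VC", 8.0), ("VC", 20.0),
        ("OJ", 20.0), ("OJ", 22.0), ("OJ", 24.0), ("OJ", 26.0)]

def within_z(values, limit=1.5):
    """Tick/cross per value for -limit < Z(value) < limit."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [-limit < (v - mean) / sd < limit for v in values]

# Separate filters: Filter 2 only sees the rows Filter 1 let through,
# so the mean and SD come from the VC rows alone.
after_filter1 = [row for row in data if row[0] == "VC"]
ticks2 = within_z([length for _, length in after_filter1])
cascaded = [row for row, ok in zip(after_filter1, ticks2) if ok]

# One filter with two expressions: Z() is computed over every row.
ticks_z = within_z([length for _, length in data])
combined = [row for row, ok in zip(data, ticks_z) if ok and row[0] == "VC"]

print(len(cascaded), len(combined))  # prints: 4 5
```

The VC length of 20.0 is extreme relative to the other VC rows (z of about 1.76) but unremarkable once the OJ rows are included (z of about 0.54), so only the cascaded version drops it.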

In this way you can specify arbitrarily complex rules for when a row should be included in analyses or not (but you should pre-register your rules¹).

So that is how row filters work in jamovi; they apply to the data set as a whole. However, sometimes you want to filter just individual columns. For that, there are column filters.

Column filters

Column filters come in handy when you want to filter some rows for some analyses, but not for all. This is achieved with the computed variable system (you can read more about computed variables in our earlier blog post). With a computed variable, we create a copy of an existing column, but with the unwanted values excluded.

In the Tooth Growth example, we might want to analyse doses of 500 and 1000, and doses of 1000 and 2000, separately. To do this, we create a new column for each subset. So in our example, we can select the dose column in the jamovi spreadsheet and then select the Compute button from the Data tab. This creates a new column to the right called dose (2), and, as with filters, we can enter a formula. In this case we’ll enter the formula:

FILTER(dose, dose <= 1000)

Or if you prefer:

FILTER(dose, dose == 1000 or dose == 500)

The first argument to the FILTER() function (in this example dose) is what values to use in the computed column. The second argument is the condition; when this condition isn’t satisfied, the value comes across blank (or as a ‘missing value’ if you prefer). So with this formula, the dose (2) column contains all the 500 and 1000 values, but the 2000 values are not there.

We might also change the name of the column to something more descriptive, like dose 5,10. Similarly, we can create a column dose 10,20 with the formula FILTER(dose, dose != 500). Now we can run two separate ANOVAs (or t-tests) using len as the dependent variable, with dose 5,10 as the grouping variable in the first analysis and dose 10,20 in the second. In this way we can use different filters for different analyses. Contrast this with row filters, which are applied to all the analyses.
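In code terms, FILTER() keeps the column the same length and blanks out the values that fail the condition. Here’s a hedged Python sketch using None for jamovi’s missing values; filter_column is a hypothetical helper, and the dose list is just a short made-up sequence of the three dose levels.

```python
# Hypothetical sketch of FILTER(value, condition) as a column operation:
# values failing the condition become missing (None here), while the column
# keeps its length so it still lines up with every other column.
dose = [500, 1000, 2000, 500, 1000, 2000]  # made-up sequence of dose levels

def filter_column(values, keep):
    """Keep each value where its flag is True, else mark it missing."""
    return [v if k else None for v, k in zip(values, keep)]

dose_5_10 = filter_column(dose, [d <= 1000 for d in dose])   # FILTER(dose, dose <= 1000)
dose_10_20 = filter_column(dose, [d != 500 for d in dose])   # FILTER(dose, dose != 500)
print(dose_5_10)   # [500, 1000, None, 500, 1000, None]
print(dose_10_20)  # [None, 1000, 2000, None, 1000, 2000]
```

Because the row count never changes, the len column still lines up with each filtered column, so each analysis simply drops the rows that are missing in its grouping variable.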

It may also have occurred to you that with FILTER() we can create what might be called a ‘poor man’s split variables’. In the future, jamovi will provide a dedicated UI for splitting variables, but in the meantime you can create splits using FILTER(). For example, we could split len into two new columns, len VC and len OJ, with the formulas FILTER(len, supp == 'VC') and FILTER(len, supp == 'OJ') respectively. This results in two separate columns which can be analysed side by side.
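A FILTER()-style helper also handles the split, because the condition can refer to a different column (supp) than the values being copied (len). As before, this is a hypothetical sketch: filter_column is an illustrative helper, the values are made up, and None stands in for a missing value.

```python
# Hypothetical sketch of the 'poor man's split' with a FILTER()-style helper:
# one source column becomes two same-length columns, each missing wherever
# the other supplement applies. Values are made up.
supp = ["VC", "OJ", "VC", "OJ"]
length = [4.2, 15.2, 11.5, 21.5]

def filter_column(values, keep):
    """Keep each value where its flag is True, else None (missing)."""
    return [v if k else None for v, k in zip(values, keep)]

len_vc = filter_column(length, [s == "VC" for s in supp])  # FILTER(len, supp == 'VC')
len_oj = filter_column(length, [s == "OJ" for s in supp])  # FILTER(len, supp == 'OJ')
print(len_vc)  # [4.2, None, 11.5, None]
print(len_oj)  # [None, 15.2, None, 21.5]
```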

So that’s column filters and row filters. We hope you find them as satisfying to use as we’ve found developing them!



¹ Pre-registration is the solution to p-hacking, not deliberately making software difficult to use! Don’t p-hack. Your p-hacking harms more people than you know.
