### tl;dr

jamovi 0.9 and newer allow you to filter rows and columns out of your analyses. This is useful for excluding outliers, or limiting the scope of your analyses. jamovi filters are built on top of the jamovi ‘compute variables’ system, allowing great flexibility in filtering.

*(Note: if you haven’t already updated to jamovi 0.9.0.1 [or newer], we encourage you to do so.)*

## Row filters

The jamovi 0.9 series has landed, and with it comes the ability to filter out rows that you don’t want included in your analyses. There are a number of reasons why this might be appropriate. For example, you might want to only include people’s survey responses if they explicitly consented to having their data used, or you might want to exclude all left-handed people, or perhaps people who score ‘below chance’ in an experimental task. In some cases you just want to exclude extreme scores, for example those that score more than 3 standard deviations from the mean.

The filters in jamovi are built on top of jamovi’s *computed variable* formula system, which allows you to build arbitrarily complex formulas. For a primer on *computed variables* in jamovi, there’s an earlier blog post on them here (although you should note that we’ve added quite a few useful new functions since that blog post was written).

To demonstrate jamovi filters (you can follow along if you like), I’m going to open the very simple `Tooth Growth` data set from the examples. Next, we select the *Filters* button from the *Data* tab. This opens the filter view and creates a new filter called `Filter 1`. This can be seen in the following GIF:

In this GIF we specify a filter to exclude the 9th row. Perhaps we know that the 9th participant was someone just testing the survey system, and not a proper participant (`Tooth Growth` is actually about the length of guinea pig teeth, so perhaps we know that the 9th participant was a rabbit). We can simply exclude them with the formula:

```
ROW() != 9
```

In this expression the `!=` means ‘does not equal’. If you’ve ever used a programming language like *R* this should be very familiar. Filters in jamovi exclude the rows for which the formula is *not* true. In this case, the expression `ROW() != 9` is true for all rows *except* the 9th row. When we apply this filter, the tick in the `Filter 1` column of the 9th row changes to a cross, and the whole row greys out. If we were to run an analysis now, it would run as though the 9th row wasn’t there. Similarly, if we had already run some analyses, they would re-run and the results would update to values that don’t use the 9th row.
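
To make the keep/exclude rule concrete, here is a minimal Python sketch of the same logic. It is only an illustration, not how jamovi is implemented: the `row_filter` helper and the sample lengths are hypothetical.

```python
# Hypothetical tooth lengths for ten rows; these are NOT the real Tooth Growth values.
lengths = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0]


def row_filter(values, predicate):
    """Return one True/False flag per row (True = tick, False = cross).

    The predicate receives the 1-based row number, mirroring ROW() in the formula.
    """
    return [predicate(row_number) for row_number, _ in enumerate(values, start=1)]


# Equivalent of the formula ROW() != 9: true for every row except the 9th.
flags = row_filter(lengths, lambda row_number: row_number != 9)

# Analyses only ever see the rows whose flag is True.
kept = [value for value, keep in zip(lengths, flags) if keep]
```

Only the 9th flag is `False`, so `kept` holds the other nine values, just as an analysis would run as though the 9th row wasn’t there.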

But we can do more complicated filters than this! The `Tooth Growth` example contains the length of teeth from guinea pigs (the `len` column) fed different dosages (the `dose` column) of supplements: vitamin C or orange juice (recorded in the `supp` column). Let’s say that we’re interested in the effect of dosage on tooth length. We might run an ANOVA with `len` as the dependent variable, and `dose` as the grouping variable. But let’s say that we’re only interested in the effects of vitamin C, and not of orange juice. We can use the formula:

```
supp == 'VC'
```

In fact we can specify this formula *in addition* to the `ROW() != 9` formula if we like. We can add it as another expression to `Filter 1` (by clicking the small `+` beside the first formula), or we can add it as an additional filter (by selecting the large `+` to the left of the filters dialog box). As we’ll see, adding an expression to an existing filter does not provide exactly the same behaviour as creating a separate filter. In this case however, it doesn’t make a difference, so we’ll just add it to the existing filter. This additional expression comes to be represented with its own column as well, and by looking at the ticks and crosses, we can see which filter or expression is responsible for excluding each row.

But let’s say we want to exclude from the analysis all the tooth lengths that are more than 1.5 standard deviations from the mean. To do this, we’d take a z-score, and check that it falls between -1.5 and 1.5. We could use the formula:

```
-1.5 < Z(len) < 1.5
```

or if we’re really keen:

```
-1.5 < (len - VMEAN(len)) / VSTDEV(len) < 1.5
```

(This last formula is a great way to demonstrate to students what a z-score is.)
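
For readers who want to see the arithmetic outside jamovi, the following Python sketch computes the same kind of z-score by hand. It assumes the sample standard deviation (`statistics.stdev`); whether `VSTDEV` uses the sample or population formula is not stated here, so treat that as an assumption. The lengths are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical tooth lengths with one obvious extreme value.
lengths = [10.0, 12.0, 11.0, 13.0, 12.0, 40.0]


def z_scores(values):
    """Standardise each value: (value - mean) / standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # assumption: sample standard deviation
    return [(value - centre) / spread for value in values]


scores = z_scores(lengths)

# Python's chained comparison reads just like the filter formula -1.5 < Z(len) < 1.5.
keep = [-1.5 < score < 1.5 for score in scores]
```

With these values only the last row (the 40.0) falls outside the ±1.5 band, so it is the only one flagged for exclusion.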

There are a lot of functions available in jamovi, and you can see them by clicking the small icon beside the formula box.

Now let’s add this z-score formula to a separate filter by clicking the large `+` to the left of the filters, and adding it to `Filter 2`.

With multiple filters, the filtered rows cascade from one filter into the next. So only the rows allowed through by `Filter 1` are used in the calculations for `Filter 2`. In this case, the mean and standard deviation for the z-score will be based only on the vitamin C rows (and also not on row 9). In contrast, if we’d specified our `Z()` filter as an additional expression in `Filter 1`, then the mean and standard deviation for the z-score would be based on the entire dataset.
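
The difference between the two arrangements is easy to miss, so here is a rough Python sketch of both. It is a simplified model of the behaviour described above, not jamovi’s implementation; the data, the `within_z` helper and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical (supp, len) rows; not the real Tooth Growth data.
rows = [
    ("VC", 5.0), ("VC", 6.0), ("VC", 7.0), ("VC", 6.0), ("VC", 12.0),
    ("OJ", 20.0), ("OJ", 22.0), ("OJ", 21.0), ("OJ", 23.0), ("OJ", 24.0),
]


def within_z(value, reference, limit=1.5):
    """True if value lies within `limit` standard deviations of the reference mean."""
    z = (value - mean(reference)) / stdev(reference)
    return -limit < z < limit


# Two separate filters: Filter 2 only sees the rows that survived Filter 1.
after_filter_1 = [(supp, length) for supp, length in rows if supp == "VC"]
vc_lengths = [length for _, length in after_filter_1]
cascaded = [row for row in after_filter_1 if within_z(row[1], vc_lengths)]

# One filter with two expressions: the z-score is based on every row.
all_lengths = [length for _, length in rows]
combined = [
    row for row in rows
    if row[0] == "VC" and within_z(row[1], all_lengths)
]
```

In this made-up data the 12.0 is extreme relative to the other vitamin C rows, so the cascaded version drops it, but it is unremarkable relative to the whole dataset, so the single-filter version keeps it.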

In this way you can specify arbitrarily complex rules for when a row should be included in analyses or not (but you should pre-register your rules^{1}).

So that is how row filters work in jamovi; they apply to the data set as a whole. However, sometimes you want to filter just individual columns. For that there are column filters.

## Column filters

Column filters come in handy when you want to filter some rows for some analyses, but not for all. This is achieved with the *computed variable* system (you can read more about computed variables in our earlier blog post here). With the computed variables we create a copy of an existing column, but with the unwanted values excluded.

In the `Tooth Growth` example, we might want to analyse the doses of 500 and 1000 separately from the doses of 1000 and 2000. To do this we create a new column for each subset. So in our example, we can select the `dose` column in the jamovi spreadsheet, and then select the `Compute` button from the *Data* tab. This creates a new column to the right called `dose (2)`, and, as with the filters, we can enter a formula. In this case we’ll enter the formula:

```
FILTER(dose, dose <= 1000)
```

Or if you prefer:

```
FILTER(dose, dose == 1000 or dose == 500)
```

The first argument to the `FILTER()` function (in this example `dose`) is what values to use in the computed column. The second argument is the condition; when this condition isn’t satisfied, the value comes across blank (or as a ‘missing value’ if you prefer). So with this formula, the `dose (2)` column contains all the `500` and `1000` values, but the `2000` values are not there.
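
As a rough mental model, `FILTER()` behaves like the small Python function below. The `filter_column` name is hypothetical, and `None` stands in for a jamovi missing value; this is a sketch of the described behaviour, not jamovi’s code.

```python
def filter_column(values, condition):
    """Copy `values`, blanking out (None) every entry whose condition is False."""
    return [value if keep else None for value, keep in zip(values, condition)]


# Hypothetical dose column containing each of the three dose levels twice.
dose = [500, 1000, 2000, 500, 1000, 2000]

# Equivalent of FILTER(dose, dose <= 1000)
dose_5_10 = filter_column(dose, [d <= 1000 for d in dose])

# Equivalent of FILTER(dose, dose == 1000 or dose == 500)
dose_5_10_alt = filter_column(dose, [d == 1000 or d == 500 for d in dose])
```

Both versions give the same column: the `500` and `1000` values are carried across, and the `2000` rows are left blank. Note that the new column has the same number of rows as the original, which is what lets it sit alongside the other columns in the spreadsheet.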

We might also change the name of the column to something more descriptive, like `dose 5,10`. Similarly we can create a column `dose 10,20` with the formula `FILTER(dose, dose != 500)`. Now we can run two separate ANOVAs (or t-tests) using `len` as the dependent variable, and `dose 5,10` as the grouping variable in the first analysis, and `dose 10,20` in the other. In this way we can use different filters for different analyses. Contrast this with row filters, which are applied to *all* the analyses.

It may also have occurred to you that with `FILTER()` we can do what might be called a ‘poor man’s *split variables*’. In the future jamovi will provide a dedicated UI for ‘splitting variables’, but in the meantime you can create splits using `FILTER()`. For example, we could split `len` into two new columns `len VC` and `len OJ` with the formulas `FILTER(len, supp == 'VC')` and `FILTER(len, supp == 'OJ')` respectively. This results in two separate columns which can be analysed side-by-side.
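
The split can be pictured with the following Python sketch. The `split_by` helper and the data are hypothetical, and `None` again stands in for a missing value; it simply applies the same blank-out rule once per group.

```python
# Hypothetical supp and len columns; not the real Tooth Growth values.
supp = ["VC", "VC", "OJ", "OJ", "VC", "OJ"]
lengths = [4.2, 11.5, 15.2, 21.5, 7.3, 17.6]


def split_by(values, groups, level):
    """Keep values whose group equals `level`; blank out (None) all others."""
    return [value if group == level else None for value, group in zip(values, groups)]


len_vc = split_by(lengths, supp, "VC")  # like FILTER(len, supp == 'VC')
len_oj = split_by(lengths, supp, "OJ")  # like FILTER(len, supp == 'OJ')
```

Each row ends up with a value in exactly one of the two new columns, which is why they can be placed next to each other and analysed side-by-side.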

So that’s column filters and row filters. We hope you find them as satisfying to use as we’ve found developing them!

^{1}*Pre-registration is the solution to p-hacking, not deliberately making software difficult to use! Don’t p-hack. Your p-hacking harms more people than you know.*
