missing values
Andreas Beyer
beyer at imb-jena.de
Wed Feb 9 09:39:31 CET 2005
Hi,
instead of using something like -99 as a representation for missing
values we should use something like infinity. The GNU C-lib knows the
constant:
float NAN;
which is "not-a-number". Unfortunately this constant is only defined on
GNU systems. In addition, NAN is unequal to itself. Hence, the following
is true:
float x = NAN;
assert(x != x);
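A practical consequence is that a NaN sentinel cannot be detected with `==`. The sketch below assumes a C99 compiler, whose `<math.h>` provides `NAN` (where quiet NaNs are supported) and `isnan()`. The function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <math.h>   /* NAN and isnan(), C99 */

/* Hypothetical: naive sentinel check with ==.
   Never true for NaN, because NaN compares unequal to everything. */
int is_missing_by_equality(double val)
{
    const double sentinel = NAN;
    return val == sentinel;
}

/* Hypothetical: NaN-aware check using isnan(). */
int is_missing_by_isnan(double val)
{
    return isnan(val) != 0;
}

/* Small demonstration of the difference. */
void nan_sentinel_demo(void)
{
    double x = NAN;
    assert(x != x);                      /* NaN is unequal to itself */
    assert(!is_missing_by_equality(x));  /* so == never finds it */
    assert(is_missing_by_isnan(x));      /* isnan() does */
    assert(!is_missing_by_isnan(1.5));   /* ordinary values are not missing */
}
```

So a NaN-based scheme would need `isnan()` (or the `x != x` trick) in every check, never a plain comparison.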
Instead we might use the maximum double, i.e. something like the following:
#include <float.h> /* DBL_MAX */

static double _no_value = DBL_MAX;

/* Obtaining the internal representation for a missing value. */
double missing_value(void)
{
    return _no_value;
}

/* Check if val is a missing value. */
int is_missing_value(double val)
{
    return val == _no_value;
}
I prefer encapsulating the missing-value check in a function, because
that leaves room for future extensions that might become more complex.
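
As a rough illustration of how calling code would look with this encapsulation, here is a minimal sketch of a mean that skips missing entries. The two accessor functions are repeated from the message so the block compiles alone; `mean_skip_missing` and `mean_skip_missing_example` are hypothetical names, not existing Statist functions.

```c
#include <assert.h>
#include <float.h>   /* DBL_MAX */
#include <stddef.h>  /* size_t */

/* Sentinel scheme from the message, repeated for self-containment. */
static const double no_value = DBL_MAX;

double missing_value(void) { return no_value; }
int is_missing_value(double val) { return val == no_value; }

/* Hypothetical helper: mean of the non-missing entries.
   Returns the missing value itself if no valid entry exists. */
double mean_skip_missing(const double *data, size_t n)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_missing_value(data[i]))
            continue;               /* ignore missing entries */
        sum += data[i];
        count++;
    }
    return count > 0 ? sum / (double)count : missing_value();
}

/* Usage sketch: the third entry is missing and is ignored. */
void mean_skip_missing_example(void)
{
    double data[] = { 1.0, 2.0, 0.0, 3.0 };
    data[2] = missing_value();
    assert(mean_skip_missing(data, 4) == 2.0);
    assert(is_missing_value(mean_skip_missing(data, 0)));
}
```

Because callers only go through `missing_value()` and `is_missing_value()`, the internal representation could later be changed without touching code like this.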
Andreas
Jakson Aquino wrote:
> Hi Bernhard!
>
>
>>Could you produce a few example files so that
>>the error can be made obvious. We can use that as
>>test cases when we fix the bug.
>
> I'm sending the example files as an attachment. The file
> results_gss explains what was done and presents the
> results. The file gss20x6_file_info has information
> about the very small database used for the test.
>
>
>
>>Putting a special value in is not the nicest solution,
>>however I would need to look at the code to think
>>about others.
>
> Yes, if the special value were present in the database
> as a valid value, this would be a problem. Perhaps an
> option like --diffsysmis=number (define a different
> system missing value, other than 9.99999e+99) could
> solve the problem. PSPP, which is cloning SPSS, seems
> to use a list of user-defined missing values for each
> column:
>
> [jakson at localhost src]$ grep MISSING *.h -n
> exprP.h:187: OP_STR_MIS, /* MISSING(strvar). */
> var.h:282: MISSING_NONE, /* No user-missing values. */
> var.h:283: MISSING_1, /* One user-missing value. */
> var.h:284: MISSING_2, /* Two user-missing values. */
> var.h:285: MISSING_3, /* Three user-missing values. */
> var.h:286: MISSING_RANGE, /* [a,b]. */
> var.h:287: MISSING_LOW, /* (-inf,a]. */
> var.h:288: MISSING_HIGH, /* (a,+inf]. */
> var.h:289: MISSING_RANGE_1, /* [a,b], c. */
> var.h:290: MISSING_LOW_1, /* (-inf,a], b. */
> var.h:291: MISSING_HIGH_1, /* (a,+inf), b. */
> var.h:292: MISSING_COUNT
>
> PSPP also seems to have a special value reserved for
> sysmis. Look at the following lines:
>
> var.h, line 40 and following:
> /* Special values. */
> #define SYSMIS (-DBL_MAX)
> #define LOWEST second_lowest_value
> #define HIGHEST DBL_MAX
>
> magic.h, line 29 and following:
> #ifndef SECOND_LOWEST_VALUE
> /* "Second lowest" value for a flt64; that is,
> (-FLT64_MAX) + epsilon. */
> double second_lowest_value;
> #endif
>
> The PSPP code is too big and complex for me, and I just
> used grep to find the word "missing". But it seems
> that they are using the largest-magnitude negative
> double as sysmis, plus a lot of other user-defined
> missing values. Perhaps using a very big (and
> negative) number is not so dangerous. Still, I don't
> know if this is a good solution.
>
> Anyway, I don't think it is necessary to complicate
> the code by creating a complex list of user-defined
> missing values. I prefer Statist's simpler approach
> of just putting an 'M' in the database. In
> social research we have three main sources of abundant
> missing values: (1) people don't know how to answer a
> question; (2) they do know how to, but are unwilling
> to answer; and (3) the question doesn't apply. In some
> analyses, all three cases are best treated as
> missing values, but in others, often using the same
> database, "I don't know" and "I prefer not to answer
> this question" must be counted and analysed
> separately. But this would happen only rarely, and it
> would not be that difficult to write a program to
> automatically recode an original database and create a
> new one with the 'M's correctly replacing values that
> have to be recoded as missing.
>
> Best,
>
> Jakson