missing values

Andreas Beyer beyer at imb-jena.de
Wed Feb 9 09:39:31 CET 2005


Hi,

Instead of using something like -99 to represent missing
values, we should use something like infinity. The GNU C library provides
the constant:

float NAN;

which is "not-a-number". Unfortunately this constant is not available
everywhere: glibc and C99's <math.h> define it, but older compilers may
not. In addition, NaN compares unequal to itself. Hence, the following
assertion holds:

float x = NAN;
assert(x != x);
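
To make the consequence concrete, here is a small sketch, assuming a C99 compiler whose <math.h> provides NAN and isnan(). The names is_nan_by_comparison() and nan_demo() are made up for illustration only:

```c
#include <assert.h>
#include <math.h>   /* NAN, isnan (C99) */

/* Hypothetical helper: detects NaN using only the self-inequality property. */
int is_nan_by_comparison(double x)
{
    return x != x;
}

/* Shows why "==" cannot be used to search for NaN values. */
void nan_demo(void)
{
    double x = NAN;

    assert(!(x == NAN));              /* a NaN never compares equal */
    assert(is_nan_by_comparison(x));  /* but x != x does detect it */
    assert(isnan(x));                 /* the C99 way to test for it */
    assert(!is_nan_by_comparison(1.0));
}
```

So a check written as "val == NAN" would silently never match; any NaN-based scheme would have to go through isnan() or the x != x trick.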

Instead we might use the maximum double, i.e. something like the following:

#include <float.h> /* DBL_MAX */

static double _no_value = DBL_MAX;

/* Obtaining the internal representation for a missing value. */
double missing_value()
{
    return _no_value;
}

/* Check if val is a missing value. */
int is_missing_value(double val)
{
    return val == _no_value;
}
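
As a rough usage sketch (not existing Statist code), a caller could then skip missing entries like this. The helper mean_skip_missing() is hypothetical, and the two functions from above are repeated, with the variable renamed to no_value, only so that the example compiles on its own:

```c
#include <float.h>   /* DBL_MAX */
#include <stddef.h>  /* size_t */

static const double no_value = DBL_MAX;

/* Same interface as proposed above. */
double missing_value(void)
{
    return no_value;
}

int is_missing_value(double val)
{
    return val == no_value;
}

/* Hypothetical helper: mean of all non-missing entries.
   Returns missing_value() if there is no valid entry at all. */
double mean_skip_missing(const double *data, size_t n)
{
    double sum = 0.0;
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!is_missing_value(data[i])) {
            sum += data[i];
            count++;
        }
    }
    return count > 0 ? sum / (double)count : missing_value();
}
```

Note that the result itself can be "missing", so callers would have to check it with is_missing_value() as well.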


I prefer encapsulating the missing-value check in a function, because 
that leaves room for future extensions that might become more complex.
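
One possible extension of that kind, sketched under the assumption that NAN and isnan() from C99's <math.h> are used wherever they are available: the two functions keep their interface, so callers would not have to change.

```c
#include <float.h>  /* DBL_MAX */
#include <math.h>   /* NAN and isnan, where available (C99) */

#ifdef NAN
/* NaN-based representation: must be tested with isnan(), since NaN != NaN. */
double missing_value(void)
{
    return NAN;
}

int is_missing_value(double val)
{
    return isnan(val);
}
#else
/* Fallback: the DBL_MAX sentinel from the code above. */
double missing_value(void)
{
    return DBL_MAX;
}

int is_missing_value(double val)
{
    return val == DBL_MAX;
}
#endif
```

Code that compares against the sentinel directly with "==" would break under the NaN variant, which is exactly why the check should live in one place.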
	Andreas

Jakson Aquino wrote:
> Hi Bernhard!
> 
> 
>>Could you produce a few example files so that
>>the error can be made obvious. We can use that as
>>test cases when we fix the bug.
> 
> 
> I'm sending the example files as an attachment. The file
> results_gss explains what was done and presents the
> results. The file gss20x6_file_info has information
> about the very small database used to make the test.
> 
> 
> 
>>Putting a special value in is not the nicest solution,
>>however I would need to look at the code to think
>>about others.
> 
> 
> Yes, if the special value were present in the database
> as a valid value, this would be a problem. Perhaps an
> option like --diffsysmis=number (to define a
> system-missing value other than 9.99999e+99) could
> solve the problem. PSPP, which clones SPSS, seems
> to use a list of user-defined missing values for each
> column:
> 
> [jakson at localhost src]$ grep MISSING *.h -n
> exprP.h:187:    OP_STR_MIS,       /* MISSING(strvar). */
> var.h:282:    MISSING_NONE,       /* No user-missing values. */
> var.h:283:    MISSING_1,          /* One user-missing value. */
> var.h:284:    MISSING_2,          /* Two user-missing values. */
> var.h:285:    MISSING_3,          /* Three user-missing values. */
> var.h:286:    MISSING_RANGE,      /* [a,b]. */
> var.h:287:    MISSING_LOW,        /* (-inf,a]. */
> var.h:288:    MISSING_HIGH,       /* (a,+inf]. */
> var.h:289:    MISSING_RANGE_1,    /* [a,b], c. */
> var.h:290:    MISSING_LOW_1,      /* (-inf,a], b. */
> var.h:291:    MISSING_HIGH_1,     /* (a,+inf), b. */
> var.h:292:    MISSING_COUNT
> 
> PSPP also seems to have a special value reserved for
> sysmis. Look at the following lines:
> 
> var.h, line 40 ff.:
> /* Special values. */
> #define SYSMIS (-DBL_MAX)
> #define LOWEST second_lowest_value
> #define HIGHEST DBL_MAX
> 
> magic.h, line 29 ff.:
> #ifndef SECOND_LOWEST_VALUE
> /* "Second lowest" value for a flt64; that is,
> (-FLT64_MAX) + epsilon. */
> double second_lowest_value;
> #endif
> 
> The PSPP code is too big and complex for me, so I just
> used grep to find the word "missing". But it seems
> that they use the most negative possible double
> (-DBL_MAX) as sysmis, plus a number of user-defined
> missing values. Perhaps using a very large negative
> number is not so dangerous after all. Still, I don't
> know if this is a good solution.
> 
> In any case, I don't think it is necessary to complicate
> the code by creating a complex list of user-defined
> missing values. I prefer Statist's simpler approach
> of just putting an 'M' in the database. In
> social research we have three main sources of abundant
> missing values: (1) people don't know how to answer a
> question; (2) they do know how, but are unwilling
> to answer; and (3) the question doesn't apply. In some
> analyses, all three cases are best treated as
> missing values, but in others, often using the same
> database, "I don't know" and "I prefer not to answer
> this question" must be counted and analysed
> separately. But this would happen only rarely, and it
> would not be difficult to write a program that
> automatically recodes an original database and creates a
> new one with 'M's replacing the values that
> have to be treated as missing.
> 
> Best,
> 
> Jakson



