missing values
Jakson Aquino
jaksonaquino at yahoo.com.br
Thu Feb 10 22:13:35 CET 2005
Hello All!
Has anyone worked on the problem of missing values?
I was curious to know whether the code that I wrote in
the other email would work, so I tested it.
With some adaptations, it seems to be working.
However, I tested it only with <Miscellaneous | Mean,
standard deviations etc.> and <Multiple linear
correlation>. I opened the file that I sent as an example
(gss20x6_M.dat), and Statist correctly deleted rows
with missing values during many calculations. Later,
I also tested the new code with a database that has 17
columns and 350000 rows. It worked well too.
I declared REAL SYSMIS as (-DBL_MAX + 1) because
Statist is already using the values (-DBL_MAX) and
(DBL_MAX).
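
One caveat about this choice, assuming REAL is a typedef for double (the "%lf" conversion in the code below suggests so): near DBL_MAX the gap between neighbouring doubles is far larger than 1, so in IEEE 754 arithmetic (-DBL_MAX) + 1 rounds back to -DBL_MAX and would collide with the value already in use. The sketch below only checks that and shows one possible way to get a distinct value; the typedef and both helper names are hypothetical and not part of Statist.

```c
#include <float.h>

/* Assumption: REAL is double, as the "%lf" conversion in the patch suggests. */
typedef double REAL;

/* Returns 1 if (-DBL_MAX) + 1 compares equal to -DBL_MAX. */
int naive_sentinel_collides(void)
{
    /* volatile forces the sum to be stored as a plain double */
    volatile REAL candidate = (-DBL_MAX) + 1;
    return candidate == -DBL_MAX;
}

/* Hypothetical alternative: scale -DBL_MAX slightly towards zero,
   which yields a representable value strictly greater than -DBL_MAX. */
REAL distinct_sentinel(void)
{
    return -DBL_MAX * (1.0 - DBL_EPSILON);
}
```

The C99 function nextafter() from <math.h> would be another way to step to a neighbouring double, at the cost of linking against the math library.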
The changes that I made are below. I can send them as
a patch against version 1.0.1, but perhaps it's better
to wait for the solution of the problem with gnuplot.
=================================================
statist.h
-------------------------------------------------
extern REAL SYSMIS; /* 'M's are put in tempfiles with this value */
extern int tn;      /* total number of rows in database */
=================================================
=================================================
statist.c
-------------------------------------------------
REAL SYSMIS = (-DBL_MAX) + 1;
int tn;
=================================================
=================================================
data.c
-------------------------------------------------
In function readsourcefile(), lines 262 ff.:

    if ((token[0] == NODATA) && (strlen(token)==1)) {
      FWRITE(&SYSMIS, sizeof(REAL), 1, ttempfile[actcol]);
      nn[actcol] ++;
      colread ++;
    }
    else if (sscanf(token, "%lf", &test)==1) {
      FWRITE(&test, sizeof(REAL), 1, ttempfile[actcol]);
      nn[actcol] ++;
      colread ++;
    }
    . . .
    tn = lread - first_line;
-------------------------------------------------
void alloc_cols(int n_alloc) {
  int k;

  for (k=0; k<n_alloc; k++)
    nn[acol[k]] = tn; /* tn = total number of rows in database */

  /* deleting all columns */
  for (k=0; k<MCOL; k++){
    if((x_read[k])){
      myfree(xx[k]);
      x_read[k] = FALSE;
      label_tab[k].ptr = NULL;
    }
  }

  /* putting selected columns in memory */
  for (k=0; k<n_alloc; k++)
    if (!x_read[acol[k]]){
      xx[acol[k]] = readcol(acol[k]);
      label_tab[acol[k]].ptr = xx[acol[k]];
      label_tab[acol[k]].str = alias[acol[k]];
    }

  /* deleting rows with missing values */
  int cr = 0; /* current row */
  int tr = 0; /* total number of rows already checked */
  int RowHasMis = 0;
  while(tr < tn){
    for (k=0; k<n_alloc; k++)
      if(xx[acol[k]][tr] == SYSMIS) RowHasMis = 1;
    if (RowHasMis){
      tr++;
      RowHasMis = 0;
    }
    else{
      for (k=0; k<n_alloc; k++)
        xx[acol[k]][cr] = xx[acol[k]][tr];
      cr++;
      tr++;
    }
  }
  for (k=0; k<n_alloc; k++)
    nn[acol[k]] = cr;

  if (log_set)
    for (k=0; k<n_alloc; k++)
      fprintf(logfile, _("Variable %i = Column %s\n",
                         "Variable %i = Spalte %s\n"), (k+1),
              alias[acol[k]] );
}
=================================================
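
For readers without the Statist sources at hand, the token handling in readsourcefile() can be tried in isolation with a sketch like the one below. It assumes that NODATA is the single character 'M' (as the comment in statist.h suggests); parse_token() and its parameters are hypothetical names, and the value is returned through a pointer instead of being written to a tempfile.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: the missing-value marker is the single letter M. */
#define NODATA 'M'

/* Stores the parsed number, or sysmis for a lone NODATA token, in *out.
   Returns 1 on success and 0 if the token is neither of the two. */
int parse_token(const char *token, double sysmis, double *out)
{
    double test;

    if (token[0] == NODATA && strlen(token) == 1) {
        *out = sysmis;          /* missing value: store the sentinel */
        return 1;
    }
    if (sscanf(token, "%lf", &test) == 1) {
        *out = test;            /* ordinary numeric value */
        return 1;
    }
    return 0;                   /* e.g. a label such as "Mx" */
}
```

The strlen() check matters: without it, any token that merely starts with M would be stored as a missing value.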
* * * * * * * * * *
As you can see, the function alloc_cols() has two
workarounds involving the variable nn: one at the
beginning of the function, which resets nn to tn, and
the other at the end, which sets nn to the number of
rows that were kept.
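
The row deletion in alloc_cols() is an in-place compaction: a read index walks over all rows, and a write index advances only for rows without a missing value. A standalone sketch of the same idea, assuming plain double columns and a caller-supplied sentinel (drop_incomplete_rows() and its parameters are hypothetical names, not Statist code):

```c
#include <stddef.h>

/* Compacts ncols columns of nrows values in place, dropping every row in
   which at least one column equals sysmis. Returns the number of rows kept. */
size_t drop_incomplete_rows(double **cols, size_t ncols, size_t nrows,
                            double sysmis)
{
    size_t kept = 0;                    /* write index (cr in the patch) */

    for (size_t row = 0; row < nrows; row++) {   /* read index (tr) */
        int has_missing = 0;

        for (size_t k = 0; k < ncols; k++) {
            if (cols[k][row] == sysmis) {
                has_missing = 1;
                break;
            }
        }
        if (has_missing)
            continue;                   /* skip the whole row */

        for (size_t k = 0; k < ncols; k++)
            cols[k][kept] = cols[k][row];
        kept++;
    }
    return kept;
}
```

The return value plays the role of cr: it is the row count that every selected column shares after the deletion.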
Best,
Jakson