missing values

Jakson Aquino jaksonaquino at yahoo.com.br
Thu Feb 10 22:13:35 CET 2005


Hello All!

Has anyone worked on the problem of missing values?

I was curious to know whether the code that I wrote in
the other email would work, so I tested it. With some
adaptations, it seems to be working. However, I tested
it only with <Miscellaneous | Mean, standard deviations
etc.> and <Multiple linear correlation>. I opened the
file that I sent as an example (gss20x6_M.dat), and
Statist correctly deleted rows with missing values
during many calculations. Later, I also tested the new
code with a database that has 17 columns and 350000
rows. It also worked well.

I declared REAL SYSMIS as (-DBL_MAX + 1) because
Statist is already using the values (-DBL_MAX) and
(DBL_MAX).
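
One caveat worth checking before relying on this value: in
IEEE 754 double precision, adding 1 to a number of magnitude
DBL_MAX is lost to rounding, so (-DBL_MAX + 1) compares equal
to (-DBL_MAX). The sketch below tests this and shows one
possible way to get a value that really is distinct. It
assumes REAL is double; the function names are hypothetical
and not part of Statist.

```c
#include <float.h>

/* Returns 1 if (-DBL_MAX) + 1 compares equal to -DBL_MAX.
   The volatile store forces the sum to be rounded to double. */
int plus_one_collides(void)
{
    volatile double candidate = (-DBL_MAX) + 1;
    return candidate == -DBL_MAX;
}

/* Hypothetical alternative sentinel: scale -DBL_MAX down by a
   factor just below 1, which yields a representable value
   slightly above -DBL_MAX. */
double distinct_sentinel(void)
{
    volatile double s = -DBL_MAX * (1.0 - DBL_EPSILON);
    return s;
}
```

If plus_one_collides() returns 1 on the target platform, a
sentinel such as the one from distinct_sentinel() would keep
missing values separate from the (-DBL_MAX) marker.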
 
The changes that I made are below. I can send them as
a patch against version 1.0.1, but perhaps it's better
to wait until the problem with gnuplot is solved.


=================================================
statist.h
-------------------------------------------------

extern REAL SYSMIS; /* 'M's are put in tempfiles with this value */
extern int  tn;     /* total number of rows in database */

=================================================



=================================================
statist.c
-------------------------------------------------

REAL SYSMIS = (-DBL_MAX) + 1;
int tn;

=================================================



=================================================
data.c
-------------------------------------------------
in function readsourcefile(), lines 262 and ff.


         if ((token[0] == NODATA) && (strlen(token) == 1)) {
           FWRITE(&SYSMIS, sizeof(REAL), 1, ttempfile[actcol]);
           nn[actcol]++;
           colread++;
         }
         else if (sscanf(token, "%lf", &test) == 1) {
           FWRITE(&test, sizeof(REAL), 1, ttempfile[actcol]);
           nn[actcol]++;
           colread++;
         }
       . . .

       tn = lread - first_line;

-------------------------------------------------
void alloc_cols(int n_alloc) {
  int k;
  for (k=0; k<n_alloc; k++)
    nn[acol[k]] = tn; /* tn = total number of rows in database */

  /* deleting all columns */
  for (k=0; k<MCOL; k++){
    if((x_read[k])){
      myfree(xx[k]);
      x_read[k] = FALSE;
      label_tab[k].ptr = NULL;       
    }
  }

  /* putting selected columns in memory */
  for (k=0; k<n_alloc; k++)
    if (!x_read[acol[k]]){
      xx[acol[k]] = readcol(acol[k]);
      label_tab[acol[k]].ptr = xx[acol[k]];
      label_tab[acol[k]].str = alias[acol[k]];
    }

  /* deleting rows with missing values */
  int cr = 0; /* current row */
  int tr = 0; /* total number of rows already checked */
  int RowHasMis = 0;
  while(tr < tn){
    for (k=0; k<n_alloc; k++)
      if(xx[acol[k]][tr] == SYSMIS) RowHasMis = 1;
           
    if (RowHasMis){
        tr++;
        RowHasMis = 0;
    }
    else{ 
      for (k=0; k<n_alloc; k++)
        xx[acol[k]][cr] = xx[acol[k]][tr];
      cr++;
      tr++;
    }
  }
  for (k=0; k<n_alloc; k++)
    nn[acol[k]] = cr;
    
  if (log_set)
    for (k=0; k<n_alloc; k++)
      fprintf(logfile, _("Variable %i = Column %s\n",
            "Variable %i = Spalte %s\n"), (k+1), alias[acol[k]] );
}
=================================================
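
The token test used in readsourcefile() above can be tried in
isolation with a sketch like the following. It is not the
Statist code: store_token() and NODATA_CHAR are hypothetical
names, 'M' is assumed as the missing-value character (from the
comment in statist.h), and REAL is assumed to be double.

```c
#include <stdio.h>
#include <string.h>

#define NODATA_CHAR 'M' /* assumption: marker for a missing value */

/* Classifies one token. Writes the sentinel for a lone NODATA_CHAR,
   or the parsed number otherwise. Returns 1 if a value was stored
   in *out, 0 if the token is neither missing nor numeric. */
int store_token(const char *token, double sentinel, double *out)
{
    double value;

    if (token[0] == NODATA_CHAR && strlen(token) == 1) {
        *out = sentinel;
        return 1;
    }
    if (sscanf(token, "%lf", &value) == 1) {
        *out = value;
        return 1;
    }
    return 0;
}
```

As in the original test, the strlen() check keeps a label that
merely starts with 'M' from being read as a missing value.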

* * * * * * * * * * 

As you can see, the function alloc_cols() has two
workarounds for the variable nn: one at the beginning of
the function, and the other at the end.
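
The row-deletion step of alloc_cols() can also be sketched as a
stand-alone function, which makes it easier to test without the
Statist globals. This is only an illustration under stated
assumptions: drop_incomplete_rows() is a hypothetical name, the
columns are plain arrays of double, and the sentinel is passed
in as a parameter instead of using SYSMIS.

```c
#include <stddef.h>

/* Removes, in place, every row in which at least one column holds
   the sentinel. All columns must have nrows entries. Returns the
   number of rows kept; entries past that index are left as they
   were and should be ignored. */
size_t drop_incomplete_rows(double **cols, size_t ncols,
                            size_t nrows, double sentinel)
{
    size_t kept = 0; /* next free slot for a complete row */
    size_t row, k;

    for (row = 0; row < nrows; row++) {
        int has_missing = 0;

        for (k = 0; k < ncols; k++) {
            if (cols[k][row] == sentinel) {
                has_missing = 1;
                break;
            }
        }
        if (has_missing)
            continue; /* skip this row in every column */

        for (k = 0; k < ncols; k++)
            cols[k][kept] = cols[k][row];
        kept++;
    }
    return kept;
}
```

The return value plays the role of cr in alloc_cols(): it is
the row count that would be stored in nn for each selected
column.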

Best,

Jakson



	
	
		