c++ - Searching for hundreds of patterns in huge logfiles


I have lots of filenames inside a webserver's htdocs directory, and I want to take this list of filenames and search a huge number of archived logfiles for the last access to each of these files.

I plan to do this in C++ with Boost. I would take the newest log first and read it backwards, checking every single line against all of the filenames I have.

If a filename matches, I read the time from the log string and save it as that file's last access. I then don't need to look for that file any more, as I only want to know the last access.

The vector of filenames to search for should shrink rapidly.

I wonder how I can handle this kind of problem most effectively with multiple threads.

Do I partition the logfiles and let every thread search a part of the logs from memory, and if a thread has a match it removes that filename from the filenames vector, or is there a more effective way to do this?

Try using mmap; it will save you considerable hair loss. I was feeling expeditious and in some odd mood to recall my mmap knowledge, so I wrote a simple thing to get you started. Hope this helps!

The beauty of mmap is that the scan over the mapped file can be parallelized with OpenMP. It is also a good way to avoid an I/O bottleneck. Let me first define the logfile class, and then I'll go over the implementation.

Here's the header file (logfile.h):

#ifndef LOGFILE_H
#define LOGFILE_H

#include <cstddef>
#include <string>

class logfile {

public:

    explicit logfile(const std::string& name);

    char* open();
    std::size_t get_size() const;
    std::string get_name() const;
    bool close();

private:

    std::string name;
    char* start;
    std::size_t size;      // size_t, so files larger than 4 GB are not truncated
    int file_descriptor;

};

#endif

And here's the .cpp file:

#include <iostream>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include "logfile.h"

using namespace std;

logfile::logfile(const string& name)
    : name(name), start(NULL), size(0), file_descriptor(-1) {}

char* logfile::open(){

    // file descriptor (::open is the POSIX call, not this member function)
    file_descriptor = ::open(name.c_str(), O_RDONLY);
    if(file_descriptor < 0){
        cerr << "error obtaining file descriptor for: " << name << endl;
        return NULL;
    }

    // file size
    struct stat st;
    if(fstat(file_descriptor, &st) < 0){
        cerr << "error obtaining file size for: " << name << endl;
        ::close(file_descriptor);
        file_descriptor = -1;
        return NULL;
    }
    size = static_cast<size_t>(st.st_size);

    // memory map part; mmap reports failure with MAP_FAILED, not NULL
    void* addr = mmap(NULL, size, PROT_READ, MAP_SHARED, file_descriptor, 0);
    if(addr == MAP_FAILED){
        cerr << "error memory-mapping file\n";
        ::close(file_descriptor);
        file_descriptor = -1;
        return NULL;
    }

    start = static_cast<char*>(addr);
    return start;
}

size_t logfile::get_size() const {
    return size;
}

string logfile::get_name() const {
    return name;
}

bool logfile::close(){

    if(start == NULL){
        cerr << "error closing file. close() called without matching open()?\n";
        return false;
    }

    // unmap memory and close file (always attempt both)
    bool ret = munmap(start, size) != -1;
    ret = (::close(file_descriptor) != -1) && ret;
    start = NULL;
    file_descriptor = -1;
    return ret;
}

Now, using this code, you can use OpenMP to work-share the parsing of these logfiles, i.e.

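logfile lf ("yourfile");
char * log = lf.open();            // NULL if the file could not be mapped
long size = (long) lf.get_size();  // long, so logs over 2 GB do not overflow
long i;

#pragma omp parallel shared(log, size) private(i)
{
  #pragma omp for
  for (i = 0 ; i < size ; i++) {
     // your routine
  }
  #pragma omp critical
  {
     // your methods to combine thread results
  }
}

lf.close();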
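As a rough sketch of what the routine and the combine step could look like, here is one possibility. It is not a definitive implementation: `find_last_access` and its parameters are made-up names, it assumes Apache-style log lines where the timestamp sits between the first `[` and `]`, and it assumes later lines in a file are newer. It hands whole lines to the threads instead of single bytes, keeps a per-thread result vector, and merges those in the critical section by keeping the latest matching line for each filename.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: maps each name that occurs in the buffer to the
// timestamp of the last line mentioning it. Matching is a plain substring
// search, so "/a.html" would also match "/a.htmlx".
std::map<std::string, std::string>
find_last_access(const char* data, long size,
                 const std::vector<std::string>& names) {
    // Start offset of every line, so threads work on whole lines.
    std::vector<long> starts;
    for (long pos = 0; pos < size; ) {
        starts.push_back(pos);
        const void* nl = std::memchr(data + pos, '\n', size - pos);
        pos = nl ? (static_cast<const char*>(nl) - data) + 1 : size;
    }
    const long lines = static_cast<long>(starts.size());

    // best[k] = index of the last line mentioning names[k], or -1.
    std::vector<long> best(names.size(), -1);

    #pragma omp parallel
    {
        std::vector<long> local(names.size(), -1);  // per-thread results

        #pragma omp for
        for (long i = 0; i < lines; i++) {
            const long end = (i + 1 < lines) ? starts[i + 1] : size;
            const std::string line(data + starts[i], end - starts[i]);
            for (std::size_t k = 0; k < names.size(); k++)
                if (line.find(names[k]) != std::string::npos && i > local[k])
                    local[k] = i;
        }

        // Combine step: keep the latest line seen by any thread.
        #pragma omp critical
        for (std::size_t k = 0; k < names.size(); k++)
            if (local[k] > best[k]) best[k] = local[k];
    }

    // Pull the timestamp (text between '[' and ']') out of each winning line.
    std::map<std::string, std::string> result;
    for (std::size_t k = 0; k < names.size(); k++) {
        if (best[k] < 0) continue;
        const long end = (best[k] + 1 < lines) ? starts[best[k] + 1] : size;
        const std::string line(data + starts[best[k]], end - starts[best[k]]);
        const std::size_t open = line.find('[');
        const std::size_t close = line.find(']', open);
        if (open != std::string::npos && close != std::string::npos)
            result[names[k]] = line.substr(open + 1, close - open - 1);
    }
    return result;
}
```

You would call it with the pointer and size of the mapped file, for example `find_last_access(log, size, names)`. Every name it returns can then be dropped from the list before the next older logfile is scanned, so the list shrinks as the question describes. Without `-fopenmp` the pragmas are ignored and the same code runs on one thread.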
