C++ - Searching for hundreds of patterns in huge logfiles
I have lots of filenames inside a webserver's htdocs directory, and I want to take this list of filenames and search a huge amount of archived logfiles for the last access to each of these files.
I plan to do this in C++ with Boost. I would take the newest log first and read it backwards, checking every single line against all of the filenames I've got.
If a filename matches, I read the time from the log string and save it as that file's last access. After that I don't need to look for the file any more, since I only want to know the last access.
The vector of filenames to search for should therefore decrease rapidly.
I wonder how I can handle this kind of problem effectively with multiple threads.
Do I partition the logfiles and let every thread search a part of the logs from memory, and if a thread has a match, it removes that filename from the filenames vector? Or is there a more effective way to do this?
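As a reference point before adding threads, the plan above could be sketched single-threaded roughly as follows. This is only a sketch under assumptions: the log is already in memory as one buffer, a filename "matches" when it occurs anywhere in a line (which can over-match, e.g. a name that is a suffix of another path), and the whole line is stored instead of a parsed timestamp because the log format is not given. The name `find_last_access` is hypothetical. It needs C++17 for `std::string_view`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: walks `log` from its last line to its first. For every
// filename still in `pending`, the first line met (i.e. the latest entry)
// that contains it is stored, and the name is erased from `pending`.
std::unordered_map<std::string, std::string>
find_last_access(std::string_view log, std::unordered_set<std::string>& pending)
{
    const std::size_t npos = std::string_view::npos;
    std::unordered_map<std::string, std::string> last_line;

    std::size_t end = log.size();
    while (end > 0 && !pending.empty()) {
        // The current line is [begin, stop); skip its terminating '\n' if any.
        std::size_t stop = (log[end - 1] == '\n') ? end - 1 : end;
        std::size_t nl = (stop == 0) ? npos : log.rfind('\n', stop - 1);
        std::size_t begin = (nl == npos) ? 0 : nl + 1;
        std::string_view line = log.substr(begin, stop - begin);

        // Naive substring test against every filename that is still open.
        for (auto it = pending.begin(); it != pending.end();) {
            if (line.find(std::string_view(*it)) != npos) {
                last_line.emplace(*it, std::string(line));
                it = pending.erase(it);  // the search set shrinks on each hit
            } else {
                ++it;
            }
        }
        end = begin;  // continue with the previous line
    }
    return last_line;
}
```

The loop stops as soon as `pending` is empty, so with newest-first processing the older logs may never need to be read at all.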
Try using mmap; it will save you considerable hair loss. I was feeling expeditious and in an odd mood to recall my mmap knowledge, so I wrote a simple thing to get you started. Hope this helps!
The beauty of mmap is that the search can be parallelized with OpenMP. It's also a way to prevent an I/O bottleneck. Let me first define the logfile class, and then I'll go on to the implementation.
Here's the header file (logfile.h):
#ifndef LOGFILE_H
#define LOGFILE_H

#include <iostream>
#include <fcntl.h>
#include <stdio.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

using std::string;

// Wraps a read-only memory mapping of a single logfile.
class logfile {
public:
    logfile(string title);
    char* open();                   // maps the file; returns NULL on failure
    unsigned int get_size() const;  // size of the mapping in bytes
    string get_name() const;
    bool close();                   // unmaps the memory and closes the file

private:
    string name;
    char* start;
    unsigned int size;
    int file_descriptor;
};

#endif
And here's the .cpp file:
#include <iostream>
#include "logfile.h"

using namespace std;

logfile::logfile(string title)
    : name(title), start(nullptr), size(0), file_descriptor(-1) {}

char* logfile::open() {
    // Get a file descriptor. The leading :: selects the POSIX open(),
    // which is otherwise hidden by this member function.
    file_descriptor = ::open(name.c_str(), O_RDONLY);
    if (file_descriptor < 0) {
        cerr << "error obtaining file descriptor for: " << name << endl;
        return nullptr;
    }

    // Get the file size. Note that unsigned int limits this to files < 4 GiB.
    struct stat st;
    if (fstat(file_descriptor, &st) < 0) {
        cerr << "error reading file size of: " << name << endl;
        ::close(file_descriptor);
        file_descriptor = -1;
        return nullptr;
    }
    size = st.st_size;

    // Memory-map the file. mmap() reports failure with MAP_FAILED, not NULL.
    void* addr = mmap(nullptr, size, PROT_READ, MAP_SHARED, file_descriptor, 0);
    if (addr == MAP_FAILED) {
        cerr << "error memory-mapping file\n";
        ::close(file_descriptor);
        file_descriptor = -1;
        return nullptr;
    }
    start = static_cast<char*>(addr);
    return start;
}

unsigned int logfile::get_size() const {
    return size;
}

string logfile::get_name() const {
    return name;
}

bool logfile::close() {
    if (start == nullptr) {
        cerr << "error closing file. close() called without matching open()?\n";
        return false;
    }
    // Unmap the memory and close the file.
    bool unmapped = munmap(start, size) != -1;
    bool closed = ::close(file_descriptor) != -1;
    start = nullptr;
    file_descriptor = -1;
    return unmapped && closed;
}
Now, using this code, you can use OpenMP to work-share the parsing of these logfiles, i.e.:
logfile lf("yourfile");
char* log = lf.open();
int size = (int) lf.get_size();
int i;

#pragma omp parallel shared(log, size) private(i)
{
    #pragma omp for
    for (i = 0; i < size; i++) {
        // your routine
    }

    #pragma omp critical
    {
        // methods to combine the thread results
    }
}
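To make the "routine" and "combine" placeholders a bit more concrete, here is one possible shape for them. It is a sketch under assumptions, not the only way to do it: the buffer is split into line-aligned chunks, each thread records matches in a private vector, and the results are merged in a critical section so no thread ever modifies a shared filename list. The names `line_start_at_or_after` and `last_match_offsets` are hypothetical, matching is a plain substring test, and C++17 is assumed for `std::string_view`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: first line start at or after `pos` (log.size() if none).
static std::size_t line_start_at_or_after(std::string_view log, std::size_t pos)
{
    if (pos == 0 || pos >= log.size()) return std::min(pos, log.size());
    if (log[pos - 1] == '\n') return pos;  // pos already starts a line
    std::size_t nl = log.find('\n', pos);
    return nl == std::string_view::npos ? log.size() : nl + 1;
}

// Hypothetical helper: for each pattern, the offset of the start of the last
// line in `log` that contains it, or npos if no line does.
std::vector<std::size_t>
last_match_offsets(std::string_view log,
                   const std::vector<std::string>& patterns, int chunks)
{
    const std::size_t npos = std::string_view::npos;
    std::vector<std::size_t> best(patterns.size(), npos);
    if (chunks < 1) chunks = 1;

    #pragma omp parallel for schedule(dynamic)
    for (int c = 0; c < chunks; ++c) {
        // A line belongs to the chunk in which it starts.
        std::size_t lo = line_start_at_or_after(
            log, log.size() * static_cast<std::size_t>(c) / chunks);
        std::size_t hi = line_start_at_or_after(
            log, log.size() * static_cast<std::size_t>(c + 1) / chunks);

        // Thread-private results; later lines overwrite earlier ones.
        std::vector<std::size_t> local(patterns.size(), npos);
        for (std::size_t begin = lo; begin < hi;) {
            std::size_t nl = log.find('\n', begin);
            std::size_t end = (nl == npos) ? log.size() : nl;
            std::string_view line = log.substr(begin, end - begin);
            for (std::size_t p = 0; p < patterns.size(); ++p)
                if (line.find(std::string_view(patterns[p])) != npos)
                    local[p] = begin;
            begin = end + 1;
        }

        // Combine: keep the largest (latest) offset seen for each pattern.
        #pragma omp critical
        {
            for (std::size_t p = 0; p < patterns.size(); ++p)
                if (local[p] != npos && (best[p] == npos || local[p] > best[p]))
                    best[p] = local[p];
        }
    }
    return best;
}
```

With a mapped file this could be called as `last_match_offsets(std::string_view(log, size), filenames, 64)`, built with `-fopenmp` on GCC or Clang; without that flag the pragmas are ignored and the loop simply runs serially. Patterns that come back with a real offset can then be dropped before the next (older) logfile is searched.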