3.0
What's the matter with 2.2?
Speed
Privateye was originally written in PHP for one important reason: PHP's was the only API I knew for accessing MySQL databases, and we needed to access MySQL databases. In other words, ADODB made me choose PHP. There were some benefits to working with PHP: 5.0 has some good object-oriented features, sockets and streams were pretty easy to deal with, and the "everything is a string is a number is a whatever" idea (dynamic typing, I believe it's called) ended up helping out quite a bit. But there are some problems with PHP. It was designed with a purpose, and that purpose (I'm sure the PHP developers will agree with me on this) was not to create command-line applications for high-speed data parsing. It's an interpreted language, and that slows it down. It's not multithreaded, and that hurts it quite a bit when dealing with SQL queries and calling external scripts. As Privateye grew, it became apparent that a better foundation was required to push it forward.
Complexity
Privateye 1.0 was simple. Privateye 2.0 and above were not. Privateye 1.0 was designed for a specific problem on our specific network. Privateye 2.0 and up were designed to handle any problem we could throw at it, and a few we couldn't think of as well. Some of the added complexity, of course, was required to give 2.0-2.2 the breadth that they have. But when writing config files for the version 2 variants, I always thought that there had to be an easier way. Take a look at 2.2's Documentation.config if you want a real show. The version 2 syntax was also needlessly constricting: INPUT passed to ALERTPARSER passed to USERHASH passed to RULELIST, which had RULES, where TRIGGERS passed to ALERTS based on thresholds. Wow. It all makes sense if you read the config and realize that ALERTPARSER is really just an input normalization object, that normalization most likely comes before correlation (USERHASH), etc., etc., etc. But this meant that in order to do anything, you had to know everything.
The New Deal: 3.0
Take a look at the new 3.0 codebase, and the first thing you should notice (unless you're really, really thick) is the change to C++. There were really two choices in the matter: C++ or Perl. Some things Perl could have done better; some things C++ could. The C++ decision was a tough one, but in the end, it was based on:
- IO multiplexing (ah, the fabled 'select')
- PCRECPP (regular expressions without the Perl)
- Qt (the possibility of eventually creating a GUI)
- Speed
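For readers who haven't met it, 'select' lets one thread wait on many file descriptors at once and wake up only when one of them has data, which is the kind of thing that lets a single process watch several sources without blocking on any one of them. The sketch below is a minimal POSIX illustration, not code from the Privateye tree; the helper name `readableDescriptors` is hypothetical.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: wait up to timeoutMs milliseconds for any descriptor
// in `fds` to become readable, and return the ones that did. Returns an
// empty vector on timeout or error.
std::vector<int> readableDescriptors(const std::vector<int>& fds, int timeoutMs) {
    fd_set readSet;
    FD_ZERO(&readSet);
    int maxFd = -1;
    for (int fd : fds) {
        assert(fd >= 0 && fd < FD_SETSIZE);  // select() can't watch larger fds
        FD_SET(fd, &readSet);
        maxFd = std::max(maxFd, fd);
    }

    timeval timeout;
    timeout.tv_sec = timeoutMs / 1000;
    timeout.tv_usec = (timeoutMs % 1000) * 1000;

    std::vector<int> ready;
    // select() returns the number of ready descriptors, 0 on timeout, -1 on error.
    if (select(maxFd + 1, &readSet, nullptr, nullptr, &timeout) <= 0) {
        return ready;
    }
    for (int fd : fds) {
        if (FD_ISSET(fd, &readSet)) ready.push_back(fd);
    }
    return ready;
}
```

A caller would loop on this, reading from whichever descriptors come back and handing the data on, instead of sitting blocked on one slow source.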
C++ does suffer some shortfalls, most regrettably the conversion from strings to numbers and back, but through the version 3.0 release it's been clear that it was a good, solid choice that should serve Privateye well in the future. Also, for my own benefit, it lets me code in C++ (which I quite enjoy) and gives me a first chance to really play around with the GNU build system (configure, etc.).
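To make the string/number complaint concrete: before C++11 added `std::stoi` and `std::to_string`, the standard-library route was a string stream, with explicit checks for failure and for trailing characters. The helpers `toInt` and `toString` below are hypothetical, written only to show that boilerplate.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parse the whole string as an int. Returns false if the text is not a
// number or has leftover characters (e.g. "42abc"); `out` is untouched then.
bool toInt(const std::string& text, int& out) {
    std::istringstream in(text);
    int value;
    if (!(in >> value)) return false;
    char leftover;
    if (in >> leftover) return false;  // something non-space followed the number
    out = value;
    return true;
}

// Format an int back into a string.
std::string toString(int value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```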
So, how is version 3.0 different from the version 2s?
Codebase
I think we've beaten this one to death. It's now in C++. One thing I haven't mentioned, though, is the ability this gives to link with other open-source libraries. Anything written in C or C++ can now easily be ported into a peTrigger, peInput, or peOutput object. This should greatly increase the capabilities of Privateye with limited coding, leveraging the full power of the open source community. And since Privateye now uses GNU's build system, those without the fancy libraries will still be able to compile with what they have and end up with a perfectly good, working Privateye.
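As an illustration of folding a C library into a trigger, here is a sketch that wraps the POSIX `<regex.h>` functions (`regcomp`, `regexec`, `regfree`) in a C++ class. The `peTrigger` base class shown is a hypothetical stand-in with an assumed `evaluate` method; the real 3.0 interface may look quite different, and Privateye itself names PCRECPP, not POSIX regex, as its regular expression library.

```cpp
#include <regex.h>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Privateye's peTrigger; the real interface may differ.
class peTrigger {
public:
    virtual ~peTrigger() {}
    virtual bool evaluate(const std::string& data) = 0;
};

// Wraps the C library's POSIX extended regular expressions as a trigger.
class PosixRegexTrigger : public peTrigger {
public:
    explicit PosixRegexTrigger(const std::string& pattern) {
        // REG_NOSUB: we only need match/no-match, not capture positions.
        if (regcomp(&re_, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
            throw std::invalid_argument("bad regular expression: " + pattern);
        }
    }
    ~PosixRegexTrigger() override { regfree(&re_); }

    // Non-copyable: regex_t owns state allocated by the C library.
    PosixRegexTrigger(const PosixRegexTrigger&) = delete;
    PosixRegexTrigger& operator=(const PosixRegexTrigger&) = delete;

    bool evaluate(const std::string& data) override {
        return regexec(&re_, data.c_str(), 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
};
```

The constructor throws if the pattern does not compile, so a misconfigured trigger fails when it is created rather than when the first alert arrives.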
Further Abstraction
In an attempt to deal with the complexity issue, the plethora of object types in the v2s has been abstracted down to three: Input, Trigger, and Output. Basically, everything that could be made a Trigger was. This had a number of interesting, unforeseen effects which actually help Privateye more than I could have hoped:
- Ordering Optimization: Order isn't set in stone anymore, so ordering can now be used to satisfy specific needs or optimize specific processing characteristics. The most obvious of these is processing time. Do you only care about 'sendmail' syslog messages, but your /var/log/messages contains everything? Now your first trigger can check for the substring 'sendmail', and you can throw out useless data before you ever deal with the normalization and correlation steps. Your execution branches can now normalize only the specific data they require, instead of all normalization being done up front.
- Multi-Stat Correlation: Stat objects (the new User objects) contain multi-alert correlation data. In the old way, a single User object was attached to an alert, and the processing continued on to the triggers. Now, attachment of a Stat object is a trigger itself, so it can be done anywhere in the event processing hierarchy. Also, multiple statistical objects can now be attached simultaneously, allowing for more granular and specialized statistical information.
- Bare-bones Configuration: Configuration for simple chores can now actually be simple (relatively speaking). If you don't need correlation, don't use the correlation triggers. If you don't require normalization, take it out of the processing tree. Object trees become much more function-based and directed. Now, everything has a purpose, and if it doesn't, it can easily be taken out.
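The ordering optimization can be sketched as a chain of triggers that stops at the first one returning false. In this hypothetical model an alert is a map of named fields with the raw line under "DATA", and a trigger is any function that tests (and may modify) the alert; `runChain`, `substringTrigger`, and `normalizeMessage` are illustrative names only. With the cheap 'sendmail' substring check first, the costlier normalization step only runs on lines that survive the filter.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical alert: named fields, with the raw input line under "DATA".
typedef std::map<std::string, std::string> Alert;

// Hypothetical trigger: tests (and may modify) an alert, returning pass/fail.
typedef std::function<bool(Alert&)> Trigger;

// Run triggers in order and stop at the first one that fails, so a cheap
// filter placed first spares every later trigger from seeing discarded data.
bool runChain(const std::vector<Trigger>& chain, Alert& alert) {
    for (const Trigger& trigger : chain) {
        if (!trigger(alert)) return false;
    }
    return true;
}

// Passes alerts whose DATA field contains `needle`.
Trigger substringTrigger(const std::string& needle) {
    return [needle](Alert& alert) {
        return alert["DATA"].find(needle) != std::string::npos;
    };
}

// Stand-in for a costlier normalization step: copies the text after the
// first ": " into a MESSAGE field and counts how many times it runs.
Trigger normalizeMessage(int* calls) {
    assert(calls != nullptr);
    return [calls](Alert& alert) {
        ++*calls;
        const std::string data = alert["DATA"];
        const std::string::size_type pos = data.find(": ");
        if (pos == std::string::npos) return false;
        alert["MESSAGE"] = data.substr(pos + 2);
        return true;
    };
}
```

Swapping the two entries in the chain would give the same result for sendmail lines, but every line in the file would then pay for normalization first.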
Configuration Standardization
In another effort to reduce complexity, all object creation and configuration is now done exactly the same way. It doesn't matter if you're creating an SQL trigger or a Regular Expression trigger. First you name the object. Then you give its type (Trigger) and subtype (Regular Expression). Then you give it the arguments it requires, in any order. In Privateye 2.x, the Parser object took up almost a quarter of the entire code base, line for line, mostly because every object had its own unique syntax, with its own required arguments that had to be given in a specific order. Now, arguments can be given in any order, or can even be set and reset again after the object is created. So long as an object has everything it needs by the time it goes into use, it's good to go. This, too, has had some interesting and useful side effects:
- Optional Arguments: Optional arguments can now be freely added to any object type. Because they're so easy to set up and use, the new syntax actually encourages them. So individual objects can now encompass more functionality through optional arguments while still allowing their simpler default versions to be created without confusion.
- Trigger Trees: Two specific optional arguments come to mind, 'T' and 'F'. These are now optional arguments on all triggers, where any number of children can be stored to be executed if the trigger returns true or false, respectively. For the most part, this eliminates the need for those annoying boolean AND and OR triggers. They still have their place, enabling a single trigger to be used in two disparate processing trees, but this is a rare occurrence that will probably only be seen in the most complex of implementations.
- Default Values: To aid in simplicity, default values can be given to a wide variety of triggers, so that common tasks require even less configuration. Most triggers, for example, default to using the "DATA" alert field, the field automatically populated by Input objects (this is a post-3.0_alpha phenomenon, by the way). So if you're just looking at the input data, not the normalized data, you don't even need to specify the field.
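The uniform configuration, optional arguments, default values, and 'T'/'F' children can be pictured together in one small sketch. Everything here is hypothetical: `peObject`, `Trigger`, `ContainsTrigger`, and `TagTrigger` are illustrative names, arguments are plain strings, and `TagTrigger` stands in for whatever Output or Stat object would really sit at a leaf. The point is the shape: arguments can be set and reset in any order, readiness is checked only when the object is used, FIELD falls back to "DATA", and each trigger runs its T or F children depending on its result.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical alert: named fields, with the raw input under "DATA".
typedef std::map<std::string, std::string> Alert;

// Hypothetical common base: a named object whose arguments can be set in
// any order, and reset, at any time before it is used.
class peObject {
public:
    explicit peObject(const std::string& name) : name_(name) {}
    virtual ~peObject() {}

    const std::string& name() const { return name_; }
    void set(const std::string& key, const std::string& value) { args_[key] = value; }

    // Returns the argument, or `fallback` if it was never set.
    std::string arg(const std::string& key, const std::string& fallback = "") const {
        std::map<std::string, std::string>::const_iterator it = args_.find(key);
        return it == args_.end() ? fallback : it->second;
    }

    // True once every required argument has been given.
    bool ready() const {
        for (const std::string& key : required()) {
            if (args_.find(key) == args_.end()) return false;
        }
        return true;
    }

protected:
    virtual std::set<std::string> required() const = 0;

private:
    std::string name_;
    std::map<std::string, std::string> args_;
};

// A trigger tests an alert, then runs its T children if the test passed
// or its F children if it failed.
class Trigger : public peObject {
public:
    explicit Trigger(const std::string& name) : peObject(name) {}
    std::vector<std::shared_ptr<Trigger> > T, F;

    void run(Alert& alert) {
        assert(ready());  // configuration is checked at use time (debug builds)
        const std::vector<std::shared_ptr<Trigger> >& next = test(alert) ? T : F;
        for (const std::shared_ptr<Trigger>& child : next) child->run(alert);
    }

protected:
    virtual bool test(Alert& alert) = 0;
};

// Requires PATTERN; FIELD is optional and defaults to "DATA".
class ContainsTrigger : public Trigger {
public:
    explicit ContainsTrigger(const std::string& name) : Trigger(name) {}

protected:
    std::set<std::string> required() const override { return {"PATTERN"}; }
    bool test(Alert& alert) override {
        return alert[arg("FIELD", "DATA")].find(arg("PATTERN")) != std::string::npos;
    }
};

// Leaf that writes its TAG argument into the alert.
class TagTrigger : public Trigger {
public:
    explicit TagTrigger(const std::string& name) : Trigger(name) {}

protected:
    std::set<std::string> required() const override { return {"TAG"}; }
    bool test(Alert& alert) override {
        alert["TAG"] = arg("TAG");
        return true;
    }
};
```

Note that the "sendmail and reject" case falls out of nesting one trigger under another's T list, with no separate boolean AND trigger.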
In Conclusion...
This has been a showcase of some of the major improvements that reengineering Privateye from the ground up has allowed. Currently, the 3.0_alpha release provides limited functionality, but hopefully a beta will be available in the next month or two with a full feature set, ready to be debugged and tested by the ever-curious and ever-growing Privateye community. As always, if you have any questions or comments, I'd love to hear from you at gsconnell@gmail.com. Thanks again for the time you took reading this. If you got this far, hopefully you're interested enough in Privateye to download and test it. I can always use other ideas, viewpoints, and criticism. And if you're feeling brave, grab the latest code from SVN:
- svn co https://svn.sourceforge.net/svnroot/privateye/privateye/trunk privateye
I can't guarantee it will work, but at the very least it should prove pretty amusing.