What a day...

Posted: Mon Apr 14, 2008 7:02 pm
by Imakeholesinu
So the monitoring tool I was in the process of implementing at work for the past, oh, 9 months was just aborted.

I feel somewhat relieved, but also like I failed because of the company's inability to follow through with what was discussed back in January on a fix that would allow us to continue using the system.

Now we're looking at SolarWinds, so if anyone else out there works for an MSP, I suggest you shy away from Silverback, which is now owned by Dell.

Posted: Mon Apr 14, 2008 7:14 pm
by SineSwiper
Heh, you and I are in similar fields. I'm in the process of trying to build an HFC Performance Mgmt System for our company.

SolarWinds is great for network monitoring, but if you're a cable company, I would stay away from their broadband monitoring product. Spent the last three months or so trying to get it to work up to our needs (after we bought the thing a long time ago), and failed to get anything working on it.

Also, the support is somewhat lacking, but maybe that has to do with the whole broadband piece. (Every time we talk to support, they seem to need to get a broadband engineer to help us. Can't wait to upgrade to the network-only version.)

Anyway, SolarWinds is working great for our needs in tandem with Netcool.

Posted: Mon Apr 14, 2008 8:28 pm
by Imakeholesinu
Basically my company hosts servers and provides mid-range Windows and Unix support to customers who are just big enough to need IT infrastructure but too small to build and maintain it on their own.

SolarWinds would work alongside our existing Big Brother monitoring to cover Windows servers and Unix (Red Hat, HP-UX, AIX and Solaris), and hopefully all of the Cisco routers and switches as well.

We also have some large customers (Red Bull, Louis Vuitton, Bank of America, just to name a few) who have some of their environment outsourced to us simply because we're cheap.

Posted: Mon Apr 14, 2008 10:37 pm
by SineSwiper
Have you seen AdventNet OpManager? It's a nice server-centric network monitoring tool.

SolarWinds is really good for switches and routers, and even getting CPU/Memory/Disk measurements, though it doesn't have any application monitoring goodies. I guess that's what you're using BigBrother for. (What the hell...I remember when this thing was free. Isn't it GNU?)

BTW, do you guys have a 24/7 NOC, or are you just doing stuff by email notifications?

Posted: Mon Apr 14, 2008 10:59 pm
by Tessian
SineSwiper wrote:Have you seen AdventNet OpManager? It's a nice server-centric network monitoring tool.
We actually have this in our environment. It's pretty decent. I know right now our Windows Infrastructure group is trying to tie it into Altiris help desk to do automated alerting / ticketing. I just have it warn me if any of my Websense services or vuln servers go down. I have heard it's a BITCH to configure in a large environment, because while you can set alarms and thresholds for entire groups of devices, most servers and such need their own individual thresholds (DC1 and DC2 can have 2 completely different load averages).
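
To make the group-vs-individual threshold problem concrete, here is a rough Python sketch. It is not how OpManager stores or evaluates thresholds; the class name, metric names, and numbers are all made up for illustration. It only shows the general idea of a group default that a single device can override.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdPolicy:
    """Group-wide alarm thresholds with optional per-device overrides (hypothetical)."""
    group_defaults: dict[str, float]
    overrides: dict[str, dict[str, float]] = field(default_factory=dict)

    def limit_for(self, device: str, metric: str) -> float:
        # A device-specific value wins; otherwise fall back to the group default.
        return self.overrides.get(device, {}).get(metric, self.group_defaults[metric])

    def breaches(self, device: str, readings: dict[str, float]) -> list[str]:
        # Names of the metrics whose reading exceeds the effective limit.
        return [m for m, v in readings.items() if v > self.limit_for(device, m)]


# Assumed values: DC2 normally runs hotter than DC1, so it gets its own limit.
policy = ThresholdPolicy(
    group_defaults={"load_avg": 4.0, "disk_pct": 90.0},
    overrides={"DC2": {"load_avg": 8.0}},
)
print(policy.breaches("DC1", {"load_avg": 6.0, "disk_pct": 50.0}))
print(policy.breaches("DC2", {"load_avg": 6.0, "disk_pct": 50.0}))
```

The same reading alarms on DC1 but not on DC2. The pain in a large environment is that the overrides table ends up with an entry for nearly every server.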


It's funny you guys mention this because I am starting a proof of concept with Mazu, a Network Behavior Analysis (NBA) tool. You feed it NetFlow from your routers, and through that and an app sensor it's able to tell you pretty much everything that's going on in your network at a high level. It'll baseline your network behavior and warn you when shit changes. Should be great for troubleshooting and tracking changes, virus outbreaks, etc. I've yet to see if it's all they claim it to be but I'm very interested to find out.

I don't think that's exactly what you're talking about as it won't do server specific things (except tell you if Server A has changed its traffic pattern or show you dependencies, etc) but it's related.
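
For anyone wondering what "baseline and warn on change" means in practice, here is a toy Python sketch. It is not Mazu's algorithm (I don't know what they actually do); it just assumes you already have per-server traffic totals summed from flow records and flags a value that sits far outside the historical mean. The function name, the 3-sigma cutoff, and the sample numbers are all assumptions.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, current, sigmas=3.0):
    """True if `current` is more than `sigmas` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough samples to form a baseline
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return current != mu  # perfectly flat history: any change is a deviation
    return abs(current - mu) > sigmas * sd


# Hypothetical bytes-per-minute totals for Server A, summed from flow records.
server_a_history = [1000, 1100, 950, 1050, 1000, 980]
print(deviates_from_baseline(server_a_history, 1020))  # close to normal traffic
print(deviates_from_baseline(server_a_history, 5000))  # sudden spike
```

A real product would presumably baseline per time of day, per peer, per port and so on, but the basic "learn normal, alert on abnormal" loop looks like this.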

Posted: Mon Apr 14, 2008 11:59 pm
by Imakeholesinu
SineSwiper wrote:Have you seen AdventNet OpManager? It's a nice server-centric network monitoring tool.

SolarWinds is really good for switches and routers, and even getting CPU/Memory/Disk measurements, though it doesn't have any application monitoring goodies. I guess that's what you're using BigBrother for. (What the hell...I remember when this thing was free. Isn't it GNU?)

BTW, do you guys have a 24/7 NOC, or are you just doing stuff by email notifications?
24/7 NOC at 2 of 3 data centers. Both DCs here in STL are staffed 24/7 now. The one out in Philly isn't. There's no real reason for it to be, as it is mostly dev and staging or DR for customers. Support for that runs 8-6.

I'm not sure if BB is GNU, though this may push us to finally fully implement it if you are correct about Solarwinds and the lack of application level monitoring (IE Windows Services and Event Log message monitoring specifically). I haven't heard anything about Solarwinds. I looked at OpManager and again, it was mostly for Windows devices. We're in the market for a product that will encompass all of our supported devices. That's one of the reasons why Silverback looked appealing, but that's only because their sales people lied to us (no, they flat out lied to us) about what this thing could/can/will do in the future, and then they got bought by Dell right after we signed the check for the appliances.

We need something that monitors for SNMP traps on all devices. We are mostly HP and use Insight manager but we need something for other devices like Brocade switches and Netapp devices that periodically send traps when connections fail or drives get full.

Nagios was another product we were looking at.

We already had Unicenter and that is a piece of shit. HP openview, while it would work awesome with 95% of our environment, requires at minimum 2 fulltime developers and 3 admins to keep it happy.

Basically we want something agentless.

Posted: Tue Apr 15, 2008 7:52 am
by SineSwiper
Tessian wrote:It's funny you guys mention this because I am starting a proof of concept with Mazu, a Network Behavior Analysis (NBA) tool. You feed it NetFlow from your routers, and through that and an app sensor it's able to tell you pretty much everything that's going on in your network at a high level. It'll baseline your network behavior and warn you when shit changes. Should be great for troubleshooting and tracking changes, virus outbreaks, etc. I've yet to see if it's all they claim it to be but I'm very interested to find out.
Basically, a deep packet inspector. We have Sandvine for that. I just wish we were using it for stuff besides reporting. It's capable of mitigation of virus/DDoS traffic, and a whole lot more, but big boss is paranoid of any active changes on our network, with all of the net neutrality bullshit and Comcast's lawsuit about auto-closing connections.
Barret wrote:I'm not sure if BB is GNU, though this may push us to finally fully implement it if you are correct about Solarwinds and the lack of application level monitoring (IE Windows Services and Event Log message monitoring specifically).
Actually, there is an application monitoring piece that I forgot about. It's just that we weren't using it. However, I would ask if it works for both UNIX and Windows. (SolarWinds is a Windows product.) Also, we have our SolarWinds split up between the web server, database server, and app servers.
Barret wrote:We need something that monitors for SNMP traps on all devices. We are mostly HP and use Insight manager but we need something for other devices like Brocade switches and Netapp devices that periodically send traps when connections fail or drives get full.
HP has OpenView for that. We use Netcool for our "manager of managers". It's basically a global alarm system that accepts different types of SNMP traps or syslogs and puts them up into one alarm view. Basically, our NOC monitors just Netcool 24/7, tickets the alarm, and dives into the problem, escalating if necessary.

Netcool, OpenView, or something similar IS A REQUIREMENT of a NOC! Cannot stress this enough. You can't have your NOCs just looking at a bunch of different status pages for different applications. And email isn't a tracking system, so don't use it for alarms. For example, we have all of our CMTS, switches, and routers syslogging into Netcool. Fiber transport syslogs to Netcool. SolarWinds sends syslog messages to Netcool. All outages are put out as syslogs to Netcool. Most of the servers are syslogging into Netcool, or they are syslogging into OpManager (which gets forwarded to Netcool). We get SNMP traps from everywhere, from NetApp (we have that SAN, too), Netbotz (awesome headend env monitoring system), the mail system, etc., etc., that go into Netcool.
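
If the "manager of managers" idea sounds abstract, here is a bare-bones Python sketch of the concept. This is not Netcool's data model or rules-file syntax; the class names, fields, severity scale, and device names are hypothetical. It only shows events from different feeds landing in one list, with repeats collapsed into a counter instead of flooding the console.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    source: str    # which feed reported it first (syslog, SNMP trap, ...)
    device: str
    summary: str
    severity: int  # assumed scale: higher is worse
    count: int = 1


class AlarmConsole:
    """One place for the NOC to watch, regardless of where the event came from."""

    def __init__(self):
        self._alarms = {}

    def ingest(self, source, device, summary, severity):
        # Repeats of the same problem on the same device bump a counter
        # instead of creating a new row.
        key = (device, summary)
        alarm = self._alarms.get(key)
        if alarm is None:
            self._alarms[key] = Alarm(source, device, summary, severity)
        else:
            alarm.count += 1
            alarm.severity = max(alarm.severity, severity)

    def worst_first(self):
        return sorted(self._alarms.values(), key=lambda a: a.severity, reverse=True)


console = AlarmConsole()
console.ingest("syslog", "cmts-01", "interface down", 5)
console.ingest("snmp-trap", "netapp-01", "volume nearly full", 3)
console.ingest("syslog", "cmts-01", "interface down", 5)
for alarm in console.worst_first():
    print(alarm.severity, alarm.device, alarm.summary, f"x{alarm.count}")
```

The real work in a product like this is in the rules that map each vendor's trap or syslog format onto those common fields, which is why you need an admin for it.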

Also, you will need a Netcool admin to keep up with rules file changes, etc.

Netcool got bought by IBM recently, but we've talked with them after the buyout and they assured us that they really like Netcool, and they didn't buy it to bury the product.
Barret wrote:Nagios was another product we were looking at.
Yeah, we have that, too. Not sure how the data center is using it, though I do know that they use it for security testing. We just had to pass our PCI compliance recently.
Barret wrote:We already had Unicenter and that is a piece of shit. HP openview, while it would work awesome with 95% of our environment, requires at minimum 2 fulltime developers and 3 admins to keep it happy.
Didn't think it was that resource intensive. Try out Netcool. Of course, keep in mind that you still DO need an admin for it. Any of these types of systems requires somebody to keep up with the rules files and alarm changes. But it's something that you NEED TO HAVE. Your company will just need to bite the bullet and hire another rep. The business impact is in catching a problem 2 minutes after it happens. You can't get that from a thousand different monitoring systems without some sort of "glue" to pull it together.