
The Art of Simplicity is a Puzzle of Complexity

As network engineers, administrators, architects, and enthusiasts, we are seeing a trend of relatively complicated devices that all strive to provide unparalleled visibility into the inner workings of applications or security. Inherent in these solutions is a level of complexity that challenges network monitoring tools; in many cases, vendors are pitching proprietary tools capable of extracting the maximum amount of data out of one specific box. Just this afternoon I sat on a vendor call in which we were doing a technical deep dive, with a customer, into a next-generation firewall with a very robust feature set. Inevitably the pitch was made to consider a manager of managers that could consolidate all of this data into one location. While valuable in its own right for visibility, this perpetuates the problem of many "single panes of glass."
I couldn't help but think that what we really need is the ability to follow certain threads of information across many boxes, regardless of manufacturer. These threads could be things like application performance or flows, security policies, etc. Standards-based protocols and vendors that are open to working with others are ideal, as this fosters the creation of ecosystems. Automation and orchestration tools offer this promise, but they add additional layers of intricacy: knowing scripting languages, a willingness to work with open source platforms, and so on.
Additionally, any time we abstract or simplify a layer, we lose something in the process; think of it as generation loss. Compounding that loss across many devices or layers of management tends to result in data that is incomplete or, worse, inaccurate, yet this is the data we intend to use to make our decisions.
Is it really too much to ask for simple and accurate? I believe this is where the art of simplicity comes into play. Creating an environment in which the simple is useful and obtainable requires creativity, attention to detail, and an understanding that no two environments are identical. In creating this environment, it is important to address exactly what will be made simple and by what means. With a clear understanding of the goals in mind, I believe they are achievable, but the decisions on equipment, management systems, vendors, partners, etc. need to be well thought through, and the right amount of time and effort must be dedicated to them.

One Company's Journey Out of Darkness, Part VI: Looking Forward

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as the emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey: the problems solved, the insights gained, and the unresolved issues that still need addressing in the future. Hopefully this helps other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.


If you've followed the series this far, you've seen a progression through a series of tools being rolled out. My hope is that this last post in the series spawns some discussion around tools, features, and functionality that the market still needs. These are the top three things that we are looking at next.

Event Correlation
The organization acquired Splunk to correlate events happening at the machine level throughout the organization, but this is far from fully implemented and will likely be the next big focus. The goal is to integrate everything from clients to manufacturing equipment to networking to find information that will help the business run better, experience fewer outages and issues, and increase security. Machine data is being collected to learn about errors in the manufacturing process as early as possible. This error detection allows for on-the-fly identification of faulty machinery and enables quicker response times, which decreases the amount of bad product and waste and improves overall profitability. I still believe there is much more to be gained here in terms of user experience, proactive notifications, etc.
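
To make that concrete, here is a rough sketch of what pulling a correlation search out of Splunk could look like from Python. The index and field names (manufacturing_logs, machine_id, error_code) are placeholders I've invented for illustration; the search/jobs/export path is Splunk's documented REST search interface.

```python
import requests

SPLUNK = "https://splunk.example.com:8089"  # hypothetical management host
AUTH = ("svc_monitor", "REDACTED")          # use a service account in practice

# Count error events per machine over the last hour so faulty equipment
# stands out as soon as error rates climb (index/field names are assumed).
query = (
    "search index=manufacturing_logs error_code=* earliest=-1h "
    "| stats count AS errors BY machine_id "
    "| where errors > 10"
)

resp = requests.post(
    f"{SPLUNK}/services/search/jobs/export",
    data={"search": query, "output_mode": "json"},
    auth=AUTH,
    verify=False,  # lab sketch only; verify certificates in production
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())  # one JSON result object per line
```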

Software Defined X
The organization is looking to continue its move into the software-defined world for networking, compute, storage, etc. These offerings vary greatly, and the decision to go down a specific path shouldn't be taken lightly. In our case we are looking to simplify network management across a very large organization and do so in a way that enables workflows not only for IT but for other business units as well. This will likely be OpenFlow-based and start with the R&D use cases. Organizationally, IT has now set standards in place that all future equipment must support OpenFlow as part of the SDN readiness initiative.
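
For a flavor of what OpenFlow-based control looks like in practice, here is a minimal sketch using the open source Ryu controller framework; Ryu is just one option, not necessarily what this organization will land on. The app installs a table-miss rule on each connecting switch, the usual starting point before layering on real policy.

```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class TableMissLogger(app_manager.RyuApp):
    """On switch connect, install a table-miss rule that sends unmatched
    packets to the controller; a starting point for R&D policy logic."""
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        match = parser.OFPMatch()  # wildcard: match everything
        actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                          ofp.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                             actions)]
        # Priority 0 so any real flow rule added later takes precedence.
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=0,
                                      match=match, instructions=inst))
```

Run with `ryu-manager app.py` against an OpenFlow 1.3 switch to see switches register as they connect.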

Software-defined storage is another area of interest, as it reduces the dependency on any one particular hardware type and allows for ease of provisioning anywhere. The ideal use case again is the R&D teams as they develop new products. The products likely to lead here are those that are pure software and open; evaluation has not really begun in this area yet.

DevOps on Demand
IT getting a handle on the infrastructure needed to support R&D teams was only the beginning of the desired end state. One of the loftiest goals is to create an on-demand lab environment that provides compute, storage, and networking in a secure fashion, along with intelligent request monitoring and departmental billback. We've been looking into Puppet Labs, Chef, and others but do not have a firm answer here yet. This is a relatively new space for me personally, and I would be very interested in further discussion around how people have been successful in it.


Lastly, I'd just like to thank the Thwack Community for participation throughout this blog series. Your input is what makes this valuable to me and increases learning opportunities for anyone reading.





One Company's Journey Out of Darkness, Part V: Seeing the Light

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed as emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey and the problems solved, insights gained, as well as unresolved issues that still need addressing in the future. Hopefully this help other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.


After months of rolling out new tools and provisioning the right levels of access, we started to see positive changes within the organization.

Growing Pains
Some growing pains were to be expected, and this rollout was no exception. Breaking bad habits developed over time is a challenge; however, the team worked to hold each other accountable and began to build the tools into their daily routines. New procedures for rolling out equipment included integration with the monitoring tools and testing to ensure data was being logged and reported on properly. The team made a concerted effort to ensure that previously deployed devices were populated into the system and spent some time clearing out retired devices. Deployments weren't perfect at first and a few steps were skipped, so the team developed deployment and decommission checklists to help ensure the proper steps were being followed. Some of the deployment checklist items were things you would expect: IP addressing, SNMP strings, AAA configuration, change control submission, etc. Others were somewhat less obvious: placing inventory tags on devices, recording serial numbers, and so on. We also noticed that communications between team members started to change, as discussions were now starting from a place in which individuals were better informed.
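
One way to harden the "verify it's actually being monitored" checklist item is to script the check itself. Here is a minimal sketch using the pysnmp library, assuming SNMPv2c; the address and community string are placeholders.

```python
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

def snmp_reachable(host, community="public"):
    """Return the device's sysName if it answers SNMP, else None."""
    error_ind, error_status, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData(community, mpModel=1),  # SNMPv2c
        UdpTransportTarget((host, 161), timeout=2, retries=1),
        ContextData(),
        ObjectType(ObjectIdentity("1.3.6.1.2.1.1.5.0")),  # sysName.0
    ))
    if error_ind or error_status:
        return None
    return str(var_binds[0][1])

# Checklist step: fail the deployment if the new switch is silent.
if snmp_reachable("10.1.2.3", community="REDACTED") is None:
    raise SystemExit("Device not answering SNMP; fix before closing ticket")
```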

Reducing the Shadow
After the "growing pains" period, we were pleased to see that the tools were becoming part of every day activities for core teams. The increased knowledge led to some interesting discussions around optimizing locations for specific purposes and helped shed some light on regular pain points within the organization. For this particular customer, the R&D teams have "labs" all over the place which could place undue stress on the network infrastructure. The "Shadow IT" that had been an issue before could now be better understood. In turn, IT made an offer to manage the infrastructure in trade for giving them what they wanted. This became a win-win for both groups and has fundamentally changed the business for the better. In my opinion, this is the single best change the company experienced. Reduction in role of "Shadow IT" and migrating those services to the official IT infrastructure group created far better awareness and supportability. As an added benefit, budgets are being realigned with additional funding shifted to IT who has taken on this increased role. There is definitely still some learning that needs to be done here, but the progress thus far has been great.

Training for Adoption
Adoption seemed slow for the help desk and some of the ancillary teams who weren't used to these tools, and we wanted to better understand why. After working with the staff, it became apparent that although some operational training had been done, training for adoption had not. A well-designed training-for-adoption strategy can make the difference between success and failure of a new workflow or technology change. The process isn't just about providing users with technical knowledge; it is about building buy-in, ensuring efficiency, and creating business alignment. It is important to evaluate how the technology initiative will help improve your organization. Part of the strategy should include an evaluation plan to measure results against those organizational outcomes, such as efficiency, collaboration, and customer satisfaction (whether that means internal business units or outward-facing customers).

The following are tips that my company lives by to help ensure that users embrace new technology to advance the organization:
Communicate the big-picture goals in relevant terms. To senior management or technology leaders, the need for new technology may be self-evident. To end users, the change can seem arbitrary. Yet all stakeholders share common interests, such as improving efficiency or, in health care, patient care. Users may still resist a new workflow system unless the project team can illustrate how the system will help them better serve patients and save time.

Invest properly in planning and resources for user adoption. If an organization is making a significant investment in new systems, investing in the end-user experience is imperative to fully realize the value of the technology. However, training for user adoption often is an afterthought in major technology project planning. Furthermore, it is easy to underestimate the hours required for communications, workshops and working sessions.

Anticipate cultural barriers to adoption. Training should be customized to your corporate culture. In some organizations, for instance, time-strapped users may assume that they can learn new technology "on the fly." Others rely on online training as a foundation for in-person instruction. Administrators may face competing mandates from management, while users may have concerns about coverage while they are attending training. A strong project sponsor and operational champions can help anticipate and overcome these barriers, and advise on the training formats that will be most effective.

Provide training timed to technology implementation. Another common mistake is to provide generic training long before users actually experience the new system, or in the midst of go-live, when it becomes chaotic. Both scenarios pose challenges. Train too early and, by the time you go "live," users forget how they are supposed to use the technology and may be inclined to use it as little as possible. If you wait for go-live, staff may be overwhelmed by their fears and anxieties, and may have already developed resistance to change. The ideal approach will depend on each facility's context and dependencies. However, staggering training, delivering complex training based on scenarios, addressing fears in advance, and allowing for practice time are all key success factors.

Provide customized training based on real-life scenarios. Bridging the gap between the technology and the user experience is a critical dimension of training, and one that some technology vendors tend to overlook in favor of training around features and functionality. Train with real-life scenarios, incorporating the various technologies integrated into a "day in the life" of an end user or staff member. By focusing on real-world practice, this comprehensive training helps overcome the "fear of the new" as users realize the benefits of the new technology.

Create thoughtful metrics around adoption. Another hiccup in effective adoption occurs when companies do not have realistic metrics, evaluation, and remediation plans. Without these tools, how do you ensure training goals are met—and, perhaps more importantly, correct processes when they are not? Recommend an ongoing evaluation plan that covers go-live as well as one to six months out.

Don’t ignore post-implementation planning. Contrary to popular perception, training and adoption do not end when the new system goes live. In fact, training professionals find that post-implementation support is an important area for ensuring ongoing user adoption.

One Company's Journey Out of Darkness, Part IV: Who is Going to Use the Tools?

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as the emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey: the problems solved, the insights gained, and the unresolved issues that still need addressing in the future. Hopefully this helps other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.


Throughout this series I've been advocating the formation of a tools team, whether it is a formalized group of people or just another hat that some of the IT team wears. This team's task is to maximize the impact of the tools the organization has chosen to invest in, and understanding who is using each tool is a critical component of that success. One of the most expensive tools that organizations invest in is their main network monitoring system. This expense may be the CapEx spent obtaining the tool or the sweat equity put in by someone building out an open source offering; either way, these dashboards require significant effort to put in place and demand effective use by the IT organization. Most of IT can benefit from these tools in one way or another, so role-based access control (RBAC) on these platforms is important so that access may be granted in a secure way.
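
To make the RBAC point concrete, here is a toy sketch of a role-to-capability map; the role and capability names are invented for illustration, and a real NMS would manage this through its own AAA configuration rather than code like this.

```python
# Hypothetical role-to-capability map for a monitoring platform.
ROLE_CAPS = {
    "network_admin": {"npm:read", "npm:write", "ipam:read", "ipam:write"},
    "server_admin":  {"npm:read", "dpi:read"},
    "help_desk":     {"npm:read", "ipam:read"},
}

def allowed(role: str, capability: str) -> bool:
    """True if the role carries the requested capability."""
    return capability in ROLE_CAPS.get(role, set())

assert allowed("help_desk", "ipam:read")       # read access granted
assert not allowed("help_desk", "ipam:write")  # writes stay locked down
```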

Network Performance Monitoring
The NPM aspects of a network management tool should be accessible to most if not all teams, although some may never opt to actually use them. Outside of the typical network team, the server team should be aware of typical throughput, interface utilization, error rates, etc. so that the team can be proactive in remediating issues. Examples where this has proven useful include troubleshooting backup-related WAN congestion and usage spikes around anti-virus updates in a large network. In both of these cases, the server team was able to provide insight into the configuration of the applications and options to help remedy the issue in unison with the network management team. Specific roles benefiting from this access include server admins, security admins, WAN admins, and desktop support.
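
For reference, the interface utilization math behind these views is simple: bits transferred between two counter samples, divided by what the link could have carried in that interval. A small sketch, including handling for a 32-bit counter wrap:

```python
def utilization_pct(octets_t0, octets_t1, interval_s, if_speed_bps,
                    counter_bits=32):
    """Percent utilization from two ifInOctets/ifOutOctets samples."""
    wrap = 2 ** counter_bits
    delta = (octets_t1 - octets_t0) % wrap  # tolerates one counter wrap
    return (delta * 8 * 100.0) / (interval_s * if_speed_bps)

# e.g. 45 MB moved in 5 minutes on a 100 Mb/s link is about 1.2% utilization
print(round(utilization_pct(0, 45_000_000, 300, 100_000_000), 1))
```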

Deep Packet Inspection/Quality of Experience Monitoring
One of the newer additions to NMS platforms over the years has been DPI and its use in shedding light on the QoE for end users. Visibility into application response time can benefit the server team and help them be proactive in managing compute loads or improving capacity. Traps based on QoE variances can help teams responsible for specific servers or applications provide better service to business units. Specific roles benefiting from this access include server admins, security admins, and desktop or mobile support.
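
The trap-on-variance idea boils down to comparing each new response-time sample against a rolling baseline. Here is a minimal sketch of that logic; the window size and three-sigma threshold are arbitrary tuning knobs, not anything vendor-prescribed.

```python
from collections import deque
from statistics import mean, stdev

class QoEBaseline:
    """Flag response-time samples that stray k standard deviations
    above a rolling baseline (window and k are tuning knobs)."""
    def __init__(self, window=60, k=3.0):
        self.samples = deque(maxlen=window)
        self.k = k

    def add(self, response_ms: float) -> bool:
        """Record a sample; return True if it should raise an alert."""
        alert = False
        if len(self.samples) >= 10:  # need some history first
            mu, sigma = mean(self.samples), stdev(self.samples)
            alert = response_ms > mu + self.k * max(sigma, 1.0)
        self.samples.append(response_ms)
        return alert

baseline = QoEBaseline()
for ms in [42, 40, 45, 41, 43, 44, 39, 42, 41, 40, 240]:
    if baseline.add(ms):
        print(f"QoE variance alert: {ms} ms")  # fires on the 240 ms spike
```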

Wireless Network Monitoring
Wireless has outpaced the wired access layer as the primary means of network connectivity. Multiple teams benefit from monitoring the airspace, ranging from security to the help desk and mobile support teams. In organizations supporting large guest networks (health care, universities, hotels, etc.), the performance of the wireless network is critical to the public perception of the brand. Wireless network monitoring now even appeals to customer service and marketing teams, and extending it to these non-IT teams can improve overall communications and satisfaction with the solutions. For teams with wireless voice handsets, telecom will benefit from access to wireless monitoring as well. In health care, there is a trend toward developing a mobile team, as these devices are critical to the quality of care; these mobile teams should be considered advanced users of wireless monitoring.

IP Address Management (IPAM)
IPAM is an amazing tool in organizations that have grown organically over the years. Using my customer as a reference, they had numerous /16 networks in use around the world, but many of these were disjointed. This disjointed IP addressing strategy creates challenges from an IP planning standpoint, especially for any new office, subnet, DMZ, etc. I'd advocate read-only access for the help desk and mobile support teams and expanded access for the server and network teams. Awareness of an IPAM solution can reduce outages due to human error, and it provides a great visual reference as to the state of organization (or lack thereof) of a company's addressing scheme.
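
As an example of what programmatic IPAM access can look like, here is a sketch against the Infoblox WAPI (Infoblox comes up again later in this series); the WAPI version, credentials, and network are placeholders, and next_available_ip is Infoblox's documented function call for handing out the next free address.

```python
import requests

WAPI = "https://infoblox.example.com/wapi/v2.10"  # version is a placeholder
AUTH = ("api_user", "REDACTED")

# Look up the network object, then ask IPAM for the next free address
# instead of guessing from a spreadsheet.
nets = requests.get(f"{WAPI}/network",
                    params={"network": "10.20.30.0/24"},
                    auth=AUTH, verify=False).json()
net_ref = nets[0]["_ref"]

free = requests.post(f"{WAPI}/{net_ref}",
                     params={"_function": "next_available_ip"},
                     json={"num": 1},
                     auth=AUTH, verify=False).json()
print(free["ips"][0])  # e.g. "10.20.30.17"
```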

That said, I personally do not advocate granting even read-only access to anyone who happens to be interested in these tools; the information they hold should be kept secure, as it would provide the seeds for a well-planned attack. Each individual given access to these tools should be made aware that they are a job aid and carry a burden of responsibility. I've also worked with some organizations looking for very complex RBAC for their management tools; unless you have an extremely good reason, I'd shy away from this as well, since the added complexity generally offers very little.

One Company's Journey Out of Darkness, Part III: Justification of the Tools

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as the emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey: the problems solved, the insights gained, and the unresolved issues that still need addressing in the future. Hopefully this helps other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.


As organizations roll out network management software and extend that software to a number of teams, they begin to gain insights that weren't visible before. These additional insights enable the business to make better decisions, recognize more challenges and inefficiencies, and so on.

For this customer, one of the areas in which we were able to vastly improve visibility involved the facilities team. This manufacturing site has its own power station and water plant, among other things, to ensure that manufacturing is never disrupted. In working on other projects, it became obvious that the plant facilities team was in the dark about network maintenance: they would mobilize into "outage mode" whenever the network was undergoing maintenance. After spending time with this team and understanding why they had to react the way they did, we were able to extend a specific set of tools to them that would make them aware of any outages, give them insight into when and why certain devices were offline, and provide visibility into when the network would come back online. This increased awareness of their needs, combined with the additional visibility from the network tools, has significantly reduced the average cost of an outage and solved some communication challenges between teams. We were also able to give them a dashboard that helps discern between network- and application-level issues.

This is a brief example of how we can all start to build the case for network management tools in a business-relevant way. Justifying these tools has to be about the business rather than simply viewing red/yellow/green or how hard a specific server is working. A diverse team can explain the total business impact better than any single team could. For admins looking to justify these tools, look for some of these business-impacting advantages:

Reduced Downtime
We always seem to look at this as network downtime; however, as in the example above, there are other downtime issues to be aware of, and all of them can impact the business. Expanding the scope of network-related issues can increase the perceived value of any networking tool. Faster time to resolution through added visibility is a key contributor to reduced downtime, and tools that allow you to be proactive also have a very positive effect.

Supportability
This seems rather self-explanatory; however, enabling the help desk to be more self-sufficient through these tools can reduce the percentage of escalated tickets. Escalated tickets typically carry a hefty price and also impact the escalation team's ability to work on other issues.

Establish and Maintain Service Level Agreements
Many organizations talk about SLAs and expect them from their carriers, but how many offer them to their own company? I'd argue very few do, and it is something that would benefit the organization as a whole. An organization that sees IT as an asset will typically be willing to invest more in that group. As network admins, we need to make sure we are providing value to the company; predictable response and resolution times are a good start.

Impact on Staff
Unplanned outages are a massive drain on resources; from the help desk to admins to executives, everyone is on edge. They also often carry the financial impacts of overtime, consulting fees, etc., in addition to intangibles like work/life balance.

One Company's Journey Out of Darkness, Part II: What Tools Should We Have?

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as the emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey: the problems solved, the insights gained, and the unresolved issues that still need addressing in the future. Hopefully this helps other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.


IT organizations that have followed this segregated path of each team purchasing the tools it needs tend to have some areas with sufficient monitoring as well as areas in which no visibility exists. Predictably, these gaps tend to reside between areas of responsibility, the "gray space" within an organization. Common examples of gray space include the interaction between applications, clients, and the transport between the two; the network and mobile devices; guest devices/users and their traffic patterns; and help desk and network issues.

In a collaborative environment, the team is able to review the entirety of the tool set and discuss where gaps may exist. It is important that the right players have a seat at the table for these discussions; this ranges from the traditional network, application, security, and help desk teams to some of the newer teams, like mobile device teams. Spend some time exploring pain points within the existing workflows, as these may stem from a lack of knowledge that could be supplemented by one of the tools. There may be tools that aren't shared, and that is quite alright; taking a phased approach to implementing tool sets on a wider basis will help ensure that these groups are getting tools that impact their ability to do their jobs.

With my customer we found the following to work:

Network Management
Consolidate network and wireless management tools to create a "single pane of glass"
Troubleshooting tools helped the help desk resolve issues faster and provided access to information that would otherwise be difficult to walk end users through gathering
Increase awareness of Netman and ensure contractors know how to use it

Point Solutions
Expand access to IPAM solution to include help desk and contractors as it helps with network address planning and troubleshooting
Increase awareness of available scripts and create internal portal so that others know where to find them and how to use them

Expand NAC Integration Through APIs
Integrate NAC via its APIs so that it shares data with Infoblox and Palo Alto Networks, improving network visibility for guests and improving Infoblox reporting (a rough sketch of this pattern follows this list)
Integrate NAC with log aggregation tool so that it has more device data
Expand log aggregation tool access to all senior IT staff
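
Since the vendor specifics will vary, here is the general shape of that NAC-to-log-aggregator integration: poll the NAC's REST API for endpoint sessions and forward them as syslog. Every URL, header, and field name below is hypothetical.

```python
import json
import socket
import requests

NAC_API = "https://nac.example.com/api/v1/endpoints"   # hypothetical endpoint
SYSLOG_HOST, SYSLOG_PORT = "logs.example.com", 514

# Pull the current endpoint sessions from the NAC (auth and paging omitted)...
sessions = requests.get(NAC_API, headers={"X-Auth-Token": "REDACTED"},
                        verify=False).json()

# ...and forward each one to the log aggregator as a syslog message so that
# device identity shows up alongside firewall and DHCP events.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for s in sessions:
    msg = f"<134>nac-sync: {json.dumps(s)}"  # PRI 134 = local0.info
    sock.sendto(msg.encode(), (SYSLOG_HOST, SYSLOG_PORT))
sock.close()
```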

Operations
Improve ticketing system notifications to include the facilities team on outage windows
Create documentation repository on Box.com so that all IT members can reach it

Issues to Address
Visibility into the Nexus data center infrastructure is lacking
Legacy cloud-managed switches are floating around and need to be dealt with; they have a great management platform in their own right, but they aren't integrated properly
Mobile device visibility and management are still lacking
Server visibility tools have not been shared outside the server team yet, as we are still evaluating them
Application performance management


The development of organizational tools should be an iterative process, and each step should bring the company closer to its goals. The total value of a well-integrated management system is greater than the sum of its parts, as it can eliminate some of the holes in the processes. While many positive changes have been made, there are still many more to work through. This company has opted for a pace that enables them to make slow, steady progress on these tools while maintaining day-to-day operations and planning for many future tools. Brand-new tools will likely be integrated by VARs/systems integrators to ensure full deployment while minimizing the impact on IT staff.

One Company's Journey Out of Darkness, Part I: What Tools Do We Have?

I've had the opportunity over the past couple of years to work with a large customer of mine on a refresh of their entire infrastructure. Network management tools were one of the last pieces to be addressed, as the emphasis had been on legacy hardware first and the direction for management tools had not been established. This mini-series will highlight this company's journey: the problems solved, the insights gained, and the unresolved issues that still need addressing in the future. Hopefully this helps other companies or individuals going through the process. Topics will include discovery around types of tools, how they are being used, who uses them and for what purpose, their fit within the organization, and lastly what more they leave to be desired.


Lean IT teams often do whatever they can to get by, and my customer was no exception. One of the biggest challenges they faced in approaching their network management strategy was understanding what they currently had. We had to work through the "day in the life" of a number of individuals to identify the core tools in use, and we were constantly surprised by new tools that would appear or that were used so infrequently the team would forget about them until a specific use case arose.

Open Source Tools
The team found open source tools to be of tremendous use, especially Netman and MRTG. These provided much-needed visibility, and the price was right given the lack of investment in monitoring tools at the time of deployment. The relatively complex nature of deploying these tools did limit adoption, and we found that they often lagged behind from a configuration standpoint: new equipment would be deployed without necessarily being integrated into the tool, and old equipment, when replaced, was not always removed. A lack of policy and discipline in a busy IT shop had limited the effectiveness of these tools, which was further compounded by only a small subset of the team having access. Additionally, as an outside resource, I had no idea what "normal" was when looking at the tool (e.g., is that router down, or has it been removed?).

Vendor Specific Tools
These are the tools most people are familiar with: products like Cisco Prime Infrastructure, Aruba AirWave, VMware vSphere with Operations Management (VSOM), etc. Each of these tools had been deployed widely and tended to be used by those whose job responsibility primarily covered the area managed by the tool; however, others who could benefit from a given tool rarely used it, if at all. These tools tend to be fairly expensive and offer many features that are typically not leveraged very well. Additionally, most of them have robust AAA capabilities that would enable them to be shared with help desk teams and the like, but these features had been overlooked by the team, despite having been properly configured for their own purposes.

Third Party Tools
Some investment had been made in third-party tools, typically around a specific need. A good example is Kiwi CatTools for ease of device configuration backups. While this functionality existed in other tools, the company wanted a single location for all device configuration files. The customer found that numerous tools existed, but it took the entire team to enumerate them, and in a couple of cases multiple instances of the same tool had been purchased by different teams.
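
For teams rolling their own instead, the core of a scheduled config backup job is quite small. Here is a sketch using the open source Netmiko library as a stand-in (this is not how CatTools itself works); the device inventory and credentials are placeholders.

```python
from datetime import date
from pathlib import Path
from netmiko import ConnectHandler

DEVICES = [  # inventory would normally come from IPAM or a CMDB
    {"device_type": "cisco_ios", "host": "10.1.1.1",
     "username": "backup_svc", "password": "REDACTED"},
]

backup_dir = Path("config_backups") / str(date.today())
backup_dir.mkdir(parents=True, exist_ok=True)

for dev in DEVICES:
    with ConnectHandler(**dev) as conn:
        running = conn.send_command("show running-config")
    # One file per device per day gives a simple change history.
    (backup_dir / f"{dev['host']}.cfg").write_text(running)
```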

Scripting
Certain members of the IT team who were comfortable writing and using scripts would develop their own toolsets; however, these would often not be shared with the rest of the IT team until a specific project jogged the author's memory, at which point they would offer up a script that had been written. In every case these scripts were very specific and had never been fully socialized, so the team decided to develop an internal website to reference these tools and their use cases.

Taking a Step Back
Working with each of the administrators and their areas of responsibility, it was easy to understand how they had gotten to this point, where substantial investment had been made in a myriad of tools without a strategy in place. Each of the teams had acquired or deployed tools to make their lives easier and tended to go with whatever was vendor-aligned or free. Taking a step back together and looking at the system in its entirety provided a much different perspective: is this really how we'd design our management infrastructure if we built it from the ground up? Clearly not, so what next? Looking at the tools currently deployed, it was obvious that substantial duplicate functionality existed, as well as a number of gaps, especially with respect to any one team's visibility.

Enumerating the existing tools, processes, and use cases highlighted how much organizations actually spend on tools while complaining that they don't have the visibility they need. Open lines of communication between teams, the development of an official or virtual "tools team", and careful consideration of the products purchased are key to running the IT team properly. Highly custom scripts, and those who can write them, can be of great value to an organization; however, this value is wasted if the team at large isn't aware of the scripts and how best to utilize them.