Wednesday, November 02, 2011

Linux vs Windows resurfacing.

I was recently forwarded a link.
http://www.zdnet.com/blog/diy-it/why-ive-finally-had-it-with-my-linux-server-and-im-moving-back-to-windows/245?pg=2

I need to start off with an "Oh, my." After reading the article and formulating my rebuttal, I think it was written more for shock value than for technical content. David says he is giving up on Linux because he doesn't have the time for it. Hmm. As any decent Linux admin will tell you, the number one reason to run Linux is that you don't have the time to properly administer a Windows environment. Based just on the time spent on setup, patching, and maintenance, I will say it flatly: Linux takes far less time.

David complains about the excess of options: too many shells and UIs. I submit that you absolutely, positively never run a GUI in a production environment. There is one caveat, and I know someone will catch it, so I will state it up front: the first time you install Oracle in a cluster, you need a GUI. But after it is installed, you shut it off.
Next I notice David references an .ini file; he probably meant a .conf file. David also refers to compiling everything. When was the last time you compiled something on a modern distro that wasn't a one-off, like needing Apache 2.3.5? It's a misunderstanding to think you need to compile anything today, unless you have the time and really want to.

Most of David's article is FUD, and it lowers my opinion of ZDNet. My favorite part is:

"I just can’t afford to waste any more time with Linux. Not when — by design — everything is held together with toothpicks, duct tape, and bailing wire.

No way. You couldn’t pay me to run Linux on my raw iron.

Never again."

Nice: very well rounded and inflammatory. So David is swearing off Linux for good. I, for one, say thank you. We don't need or want you...

Wednesday, October 19, 2011

Running Oracle in VMware.
One area where I constantly get beaten up is my Oracle practices. When I refer back to the Oracle configuration documents, I always call them recommendations. Over the last three years this has been a major point of contention between me and my DBAs. I have built multi-tenant Oracle RAC implementations leveraging InfiniBand, and I have built small three-node RAC clusters for data availability. I have recently moved my Oracle experience into the VM world. Different rules apply, and through performance testing I have proved or disproved several theories of my own and of my peers. For those who don't know, Oracle scales excellently vertically; horizontally it scales well, but not quite as well. Previous world-record benchmarks were all set on big Sun iron; now the world records are all held by multi-node RAC clusters.

A good starting-point document is VMware's Oracle Databases on vSphere Deployment Tips: https://docs.google.com/viewer?a=v&q=cache:rbFmvCq0CxoJ:www.vmware.com/files/pdf/Oracle_Databases_on_vSphere_Deployment_Tips.pdf+&hl=en&gl=us&pid=bl&srcid=ADGEEShEbhUk5I76OpN_S7SBmuu0TnW_FFBO60DjpJ4gLZKAXb2TmbHlwQVyo00dxS9RKdHSZLDJhLkv4oFe4pwI7Y9YnylgTS9K-2lX2MSfnWL2kFAoz3bAKbCl4ycbU3hmSBnDOCav&sig=AHIEtbSK4GOjlyrMTBhTHH90Kov_jStbWw&pli=1

I start to deviate on page 6. Leave iptables on; duh, every box must run a system-level firewall. Also, the doc doesn't tell you to shut off enough services. Next, pay particular attention to tip 12 on page 13: never, ever use RDMs. If in doubt, read my previous top ten rules of virtualization. Tune your OS based on the install guide. Many of the OS variables change depending on the final amount of memory you give the system, so start with a reasonable baseline and move from there.
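
To make the "OS variables change with memory" point concrete, here is a minimal Python sketch for two of the Linux shared-memory sysctls. It assumes a common rule of thumb (kernel.shmmax at least half of physical RAM and large enough to hold the SGA in one segment, kernel.shmall set to that same amount expressed in pages) and a 4 KB page size. The function names are hypothetical, and the install guide for your Oracle and OS versions should win wherever it disagrees.

```python
# Rule-of-thumb sketch for two memory-dependent Linux sysctls. Assumptions:
# shmmax should be at least half of RAM and big enough to hold the whole SGA in
# one segment; shmall is the same amount in pages; 4 KB pages by default.
GIB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def oracle_shm_params(mem_gb: int, sga_gb: int, page_size: int = 4096) -> dict[str, int]:
    """Derive kernel.shmmax (bytes) and kernel.shmall (pages) from RAM and SGA size."""
    mem_bytes, sga_bytes = mem_gb * GIB, sga_gb * GIB
    if sga_bytes >= mem_bytes:
        raise ValueError("the SGA must leave room for the OS")
    # At least half of RAM, big enough for the SGA, and below physical memory.
    shmmax = min(max(mem_bytes // 2, sga_bytes), mem_bytes - 1)
    return {"kernel.shmmax": shmmax, "kernel.shmall": ceil_div(shmmax, page_size)}


if __name__ == "__main__":
    for key, value in oracle_shm_params(16, 14).items():
        print(f"{key} = {value}")  # sysctl.conf-style lines
```

For a 16 GB VM with a 14 GB SGA this prints kernel.shmmax = 15032385536 and kernel.shmall = 3670016; rerun it whenever the final memory size changes.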

The biggest performance gains come from right-sizing your VMs. Use the performance suite of your choice; I like the combination of Toad, dynaTrace, esxtop, and GKrellM. When you run a performance test against your database server, if CPU stays above 80% for a sustained period, bump up your vCPU count. Make sure you do not cross boundaries: don't use 6 vCPUs if you have 4-core processors. Most of the time, barring oddities, your best Oracle performance will come at 4 vCPUs. Next, play with your memory. Always set a memory reservation on Oracle virtuals. Other apps are very tolerant of memory being swapped in and out, but Oracle takes a steep performance penalty when it swaps. Depending on the dataset, the final numbers will end up around 16 to 24 GB of system memory. Next, set your SGA, making sure to leave enough room for the operating system. On most systems, SGA max should be set to 14G with 16G of system memory. Next, tune your SGA. This is also a point of contention between me and DBAs: I think you should set your SGA targets, watch them over time, and then lock them down. Next, set your sessions and processes. This is also a tuning exercise; if your performance test replicates your production load, it should be easy: increase the count until performance falls off, then back off.
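
The right-sizing steps above can be sketched in Python. These are hypothetical helpers, not part of Toad, esxtop, or any other tool: the 80% threshold and the 14G-of-16G SGA example come from the paragraph above, while reading "boundaries" as the physical socket, stepping one vCPU at a time, the fixed 2 GB OS headroom, and the `benchmark` callback are assumptions for illustration.

```python
from typing import Callable

# Hypothetical right-sizing helpers; thresholds taken from the post, names invented.
CPU_BUMP_THRESHOLD = 0.80  # sustained utilization that justifies another vCPU
OS_HEADROOM_GB = 2         # 16G system - 14G SGA max, per the example above


def next_vcpu_count(current_vcpus: int, sustained_util: float, cores_per_socket: int) -> int:
    """Suggest the next vCPU count, one step at a time, never past one socket."""
    if sustained_util <= CPU_BUMP_THRESHOLD:
        return current_vcpus
    # Don't cross the physical boundary: no 6 vCPUs on 4-core processors.
    return min(current_vcpus + 1, cores_per_socket)


def sga_max_gb(system_mem_gb: int) -> int:
    """Leave a fixed slice for the OS; the rest can go to SGA max."""
    if system_mem_gb <= OS_HEADROOM_GB:
        raise ValueError("not enough memory for both the OS and an SGA")
    return system_mem_gb - OS_HEADROOM_GB


def tune_count(benchmark: Callable[[int], float], start: int, step: int, limit: int) -> int:
    """Raise sessions/processes until throughput drops, then back off one step.

    `benchmark(count)` stands in for a production-like load test and returns
    throughput (higher is better).
    """
    best_count, best_score = start, benchmark(start)
    count = start + step
    while count <= limit:
        score = benchmark(count)
        if score < best_score:  # performance fell off
            break
        best_count, best_score = count, score
        count += step
    return best_count  # last setting before the drop-off


if __name__ == "__main__":
    print(next_vcpu_count(2, 0.85, cores_per_socket=4))  # 3
    print(next_vcpu_count(4, 0.90, cores_per_socket=4))  # 4: stay on one socket
    print(sga_max_gb(16))                                # 14
    # Toy load curve that peaks at 300 sessions.
    print(tune_count(lambda n: -abs(n - 300), start=100, step=50, limit=1000))  # 300
```

In practice `benchmark` would wrap a replay of your production-like load test; the lambda here is only a toy curve.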

The end result of all this tuning is that your little 4-vCPU VMware box will run almost as well as physical hardware with a similar core count.

Wednesday, September 14, 2011

No more Tiers.

Data center networking has fundamentally changed over the last few years. Previously there was a clear separation of tiers: a couple of cores, a few distros, some server edges, some aggregation, and a bunch of access-layer switches. The current data center network design philosophy is no more tiers. A data center network should be flat and encompass the previous functionality in one layer. Cisco called this a single-tier data center model; now they call it a FabricPath-based network. I call it a distributed core architecture: every switch is an access-layer, distro-layer, and core-layer switch at the same time.

The move to a single layer in the data center is hard. There are many entrenched CCNAs who are not capable of free thought unless Cisco says so. Encouraging networkers to think outside the cert is challenging.

To move to a single tier, you must evaluate newer technologies. In a single tier everything counts, so you want all ports to run at line rate. The next key piece of the architecture is the uplinks.

Although Cisco makes nice equipment, there is no way to get there from here: the current Cisco lineup has too little backplane speed and throughput. Look for Cisco to buy another company that makes decent 10/40GbE gear.

A properly laid-out single-tier data center network has oversubscription ratios of less than 3:1 at 10GbE, and lower still when port density is not as high. With a multi-tier layout, oversubscription on the access switch is at least 1.2:1 at 1GbE and 9:1 or more at 10GbE. You can lay out a multi-tier data center in many different ways, but with tiers you will always have high oversubscription.
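
Oversubscription on a switch is simply server-facing bandwidth divided by uplink bandwidth. The Python sketch below shows the arithmetic; the port counts are hypothetical, chosen so the results line up with the 1.2:1, 9:1, and 3:1 figures above.

```python
# Oversubscription = bandwidth servers can push / bandwidth available upstream.
# Port counts in the examples are hypothetical, not specific switch models.


def oversubscription(down_ports: int, down_gbps: float, up_ports: int, up_gbps: float) -> float:
    """Return N for an N:1 oversubscription ratio on a single switch."""
    if up_ports <= 0 or up_gbps <= 0:
        raise ValueError("a switch needs at least one uplink")
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def describe(ratio: float) -> str:
    """Format a ratio the way network folks write it, e.g. 1.2:1."""
    return f"{ratio:.1f}:1"


if __name__ == "__main__":
    print(describe(oversubscription(48, 1, 4, 10)))   # 1.2:1, 48 x 1GbE behind 4 x 10GbE
    print(describe(oversubscription(36, 10, 4, 10)))  # 9.0:1, 36 x 10GbE behind 4 x 10GbE
    print(describe(oversubscription(48, 10, 4, 40)))  # 3.0:1, 48 x 10GbE behind 4 x 40GbE
```

The same ratio can be lowered either by adding uplinks or by moving to faster uplinks, which is why uplink capacity is the key piece of a flat design.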

A benefit of a single tier is latency: the maximum number of switches you will traverse between servers is two. Low latency is king in the data center network. I like to illustrate this with a VoIP call. On a high-latency network, VoIP calls are somewhat choppy and occasionally you can hear yourself echo; then you have to wait for the other end to catch up. Most network engineers will tell you latency doesn't matter when it's below a few ms, but many applications are sensitive to latency, so we buy separate networking for those applications. With a single tier, all of these pools of technology can be integrated.

The drawbacks of moving to a single tier in the data center are configuration complexity, the type of equipment you have to buy, and the number of ports consumed by full-speed uplinks. The biggest drawback is scalability: each data center will be limited to about 8 switches because of the number of interconnects. A single tier only scales to about 4 Tb/s and, depending on the switch, about 800 10GbE server links.
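
Why interconnects cap the design: in a full mesh, where every switch links directly to every other (which is what keeps any two servers within two switches of each other), each added switch costs every existing switch more uplink ports. Below is a rough Python sketch, assuming equal parallel links per switch pair and uniform port speeds. The 128-port switch, 4 links per peer, and 100-server-port floor are made-up numbers; with them the result happens to land on 8 switches and 800 server ports, but real limits depend on the hardware.

```python
# Full-mesh arithmetic for a flat fabric. Assumptions: every switch links to
# every other switch, each pair gets the same number of parallel links, and all
# ports run at one speed. The port counts used below are hypothetical.


def mesh_links(switches: int, links_per_peer: int) -> int:
    """Total inter-switch links: n(n-1)/2 switch pairs times parallel links."""
    return switches * (switches - 1) // 2 * links_per_peer


def server_ports(switches: int, ports_per_switch: int, links_per_peer: int) -> int:
    """Ports left for servers once every switch has uplinks to every peer."""
    uplinks = (switches - 1) * links_per_peer
    if uplinks >= ports_per_switch:
        return 0  # the mesh has eaten every port
    return switches * (ports_per_switch - uplinks)


def max_switches(ports_per_switch: int, links_per_peer: int, min_server_ports: int) -> int:
    """Largest mesh that still leaves min_server_ports server ports on each switch."""
    if links_per_peer < 1:
        raise ValueError("a mesh needs at least one link per peer")
    if min_server_ports > ports_per_switch:
        raise ValueError("not even one switch can meet that floor")
    n = 1
    # Adding switch n+1 costs every switch links_per_peer more uplink ports.
    while n * links_per_peer <= ports_per_switch - min_server_ports:
        n += 1
    return n


if __name__ == "__main__":
    print(mesh_links(8, 4))           # 112 inter-switch links
    print(server_ports(8, 128, 4))    # 800 server ports
    print(max_switches(128, 4, 100))  # 8
```

Total interconnects grow with the square of the switch count, so the mesh runs out of ports long before the switches run out of backplane.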

Wednesday, June 22, 2011

No one owns knowledge; it should be freely shared. It is the processes and software derived from this knowledge that are leveraged to create products, which are bought and sold. You cannot patent a process or an idea.

You can patent how you carry out a process, what you have created with it, or unique methods that allow you to use an idea. Once a process exists in the wild and is used by the community, as with a design pattern, you can no longer claim a patent on it, unless it reached the public through corruption.

Say you write code that authenticates through a unique security process, and then someone figures out your method and decides to write their own code that does the same thing. You have no claim to their code, and you cannot stop them from using their process.

This fits into a larger open source argument that has been ongoing among my colleagues.

Thursday, May 26, 2011

My top ten networking rules.

10. Always segment traffic. Storage traffic belongs on a storage VLAN, backup traffic on a backup VLAN, and database traffic (guess where) on the database VLAN. I even promote further segregation: prod, stage, test, and dev VLANs.

9. Scan your network. If you segment traffic properly, you should never see SSH on your database VLAN; if you do, that is a big red flag and should be investigated.

8. Always allow all VLANs at the core, and filter which VLANs are allowed at the aggregation/distro layer. The core should be static. The core is the backbone upon which your network is built; it should be redundant and bulletproof. Admins should almost never log in. The core should be transparent.

7. Create ACLs; some protocols (telnet, NetBIOS) should never transit the network. Server administrators should filter out 90% of unnecessary traffic. Network admins should put in rules to catch the other 10% and to cover for server admins who are lazy.


6. There shalt not be more than 5 network devices between the user and the internet (or between the VoIP phone and the router), not counting firewalls. So the worst case is: a user PC connects to an access-layer switch, then to an aggregation-layer switch, to a distro switch, to the core switch, and to the core router.

5. There shalt not be more than 4 layers of switches between the top and the bottom of any network. (Thanks for your rebuttal, Kevin; I still don't agree. This rule is a sign of good design, and I think you should re-evaluate: 6 layers is more than excessive. I know I break the Cisco mold. The data center stack should be directly connected to the core, but on the client and access side you should not have more than four layers: core, distro, aggregation, access. How would you name 6 layers?)

4. There shalt never be more than 3 devices between server and server, including servers in other data centers. The worst case: a server connects to its data center edge switch, to the core, and then to a different data center edge switch.

3. Avoid oversubscription. In the data center the rule is 1.2:1 for 1G and 8.4:1 for 10G.

2. The core router should be used for network ingress and egress traffic only.

1. Always route at the access layer if possible. Layer 3 switches provide greater throughput and routing speed for trivial routing. VoIP behaves better if you route as close to the user as possible.


Last but not least: keep it simple. Always err toward simplicity; complexity kills.
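
Several of these rules (10, 9, 7, 6, and 4) are easy to audit automatically. Here is a minimal Python sketch, assuming you already collect per-host port-scan results and device paths with your own tooling. The VLAN names, the allowed-port table (1521 for an Oracle listener, 2049 for NFS, 10000 for NDMP), the device names, and the function names are all hypothetical examples.

```python
# Hypothetical audit helpers for rules 10, 9, 7, 6, and 4. The VLAN names,
# allowed-port table, and device names are examples, not a real inventory.

ALLOWED_PORTS = {
    "database": {1521},  # assumption: Oracle listener on its default port
    "storage": {2049},   # assumption: NFS
    "backup": {10000},   # assumption: NDMP
}
BLOCKED_EVERYWHERE = {23, 137, 138, 139}  # telnet and NetBIOS (rule 7)
MAX_USER_TO_INTERNET = 5  # rule 6, firewalls not counted
MAX_SERVER_TO_SERVER = 3  # rule 4


def scan_violations(vlan: str, scan: dict[str, set[int]]) -> list[tuple[str, int, str]]:
    """Flag open ports that should not appear on this VLAN (rule 9)."""
    allowed = ALLOWED_PORTS.get(vlan, set())
    flagged = []
    for host, open_ports in sorted(scan.items()):
        for port in sorted(open_ports):
            if port in BLOCKED_EVERYWHERE:
                flagged.append((host, port, "blocked protocol"))
            elif port not in allowed:
                flagged.append((host, port, "not allowed on VLAN"))
    return flagged


def path_ok(devices: list[str], limit: int, firewalls: frozenset[str] = frozenset()) -> bool:
    """Check a path's device count against a hop rule, skipping firewalls."""
    return sum(1 for d in devices if d not in firewalls) <= limit


if __name__ == "__main__":
    db_scan = {"db01": {1521}, "db02": {1521, 22}, "db03": {23}}
    print(scan_violations("database", db_scan))
    # [('db02', 22, 'not allowed on VLAN'), ('db03', 23, 'blocked protocol')]
    worst_case = ["access", "aggregation", "distro", "core-switch", "core-router"]
    print(path_ok(worst_case, MAX_USER_TO_INTERNET))                    # True
    print(path_ok(["edge-a", "core", "edge-b"], MAX_SERVER_TO_SERVER))  # True
```

SSH showing up on the database VLAN is exactly the red flag rule 9 describes; the audit just makes it show up in a report instead of by accident.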

Tuesday, May 17, 2011

My top ten rules of Virtualization.

10. Never allow the OS to manage the memory. Always make it static.

9. Route as close to the virtual servers as possible.

8. You can strive as much as you want to improve your VMware infrastructure (more memory, flash disks, 40GbE), but if you do not take a holistic approach and simplify the network from server to server and from server to desktop, you will only realize a portion of those gains.

7. A hop is a hop is a hop. Don't let the network ugru tell you that with this seamless fabric there are no hops. Every time you leave a switch, it is a hop.

6. You can never have enough memory on an ESX host; always max out the host's memory regardless of cost. Once you start using APM and DPM, you will be amazed when, over the weekend, you have one server running in your data center, and you save $4K every weekend and about $0.5K every night in data center power and cooling.

5. Don't add cores. Every time you add a core to a virtual machine, you raise the CPU wait state by at least 10%. Optimize applications, distribute load, or create another instance; do not add another vCPU.

4. Do not back up at the OS layer. Back up via VCB, or at the storage layer with snapshots and NDMP. Never back up at the OS. The same applies in the physical world: back up the data, because you can re-provision in minutes what takes hours to restore.

3. Never use RDMs. The fictional performance gains are not worth the lost functionality.

2. Always install VMware Tools in the OS, and use vmxnet.

1. This one is absolute, no bending: never install a Windows/Linux/Solaris cluster of any kind in VMware.

Yes, I realize guru is misspelled as ugru, but come on: have you actually met a network guru?
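
The rule 6 savings add up quickly. Here is a back-of-the-envelope Python sketch annualizing the 4K-per-weekend and 0.5K-per-night figures. It assumes the figures are dollars, that there are 52 weekends a year, and that "every night" means weeknights only; the number of weeknights is a parameter because that reading is a guess.

```python
# Annualizing the rule 6 figures. Assumptions: the figures are dollars, there
# are 52 weekends a year, and "every night" means weeknights only.
WEEKEND_SAVINGS = 4_000
NIGHT_SAVINGS = 500


def annual_savings(weekend: int = WEEKEND_SAVINGS, night: int = NIGHT_SAVINGS,
                   nights_per_week: int = 5, weeks: int = 52) -> int:
    """Yearly power and cooling savings from consolidating hosts off-hours."""
    return (weekend + night * nights_per_week) * weeks


if __name__ == "__main__":
    print(annual_savings())                   # 338000 with five weeknights
    print(annual_savings(nights_per_week=4))  # 312000 if Friday night counts as weekend
```

Either way, the savings land in the low six figures per year, which is the argument for maxing out host memory up front.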

Thursday, September 23, 2010

Changing IT paradigms, the cyclical revolution.

In 1994 we had two large Unix boxes connected to terminal concentrators. The only way into the machines was via the console. We had physical access to the machines and would load tapes to install software and databases. The only way to administer the machines was through bastion terminals: from a bastion terminal you could type Ctrl-Alt-PF4 and get a login prompt that would take you to a Unix shell.
Slowly IP was adopted, and we could access the machines via terminal emulators. Our desktop machines could do most of the work we needed, but to administer the servers we still had to use the consoles.
The next move was to telnet. Most users still connected to the machines via terminal emulators, but admins could connect from bastion hosts via telnet and get a Unix shell.
Sometime around 2002 our security folks realized telnet was insecure, so we compiled and ran SSH in its place. The remote system console ports did not support SSH, so we had to take them off the network; the console ports had to be directly connected to a terminal concentrator that supported SSH.
As SSH became ubiquitous, we started allowing users to connect to the machines from anywhere via SSH. The console ports now spoke SSH, so we moved those to the regular network as well.
SSH was a good fit until ITIL and change control arrived. What is the number one cause of system outages in an IT data center? Superusers and admins. How do you control users and admins? You don't let them log into the machines unless they have an approved change request. DBAs can't log into a database box unless they are doing an actual update. All SSH traffic must go through bastion hosts. The only services you can see on a machine on the production network are the services it is serving, for example the database, Apache, or middleware. All maintenance on the machines is done through a bastion maintenance VLAN, all storage is mounted on the storage VLAN, all dev machines are on a DEV VLAN, all test machines are on a TEST VLAN, and no traffic gets between the VLANs except through bastion hosts.

And we have come full circle.