
Storage Systems September 27, 2007

Posted by Coolguy in IT Infrastructure.

Sun recently did a survey of 200 user sites globally and found that about 70% of the disk space is simply wasted.

Building a data center September 27, 2007

Posted by Coolguy in IT Infrastructure.

Five steps to build an entirely new data center or to renovate an existing one:

1. Get the board behind you

HP’s Mark Hurd and Sun’s Jonathan Schwartz did just that.
2. Choose location carefully

Factors that shouldn’t affect your decision include proximity to any of the company’s locations or highly populated cities. Look at available power supply and square footage. Google built a data center in The Dalles, near Portland, as it’s close to the Columbia River, with potential for unlimited hydroelectric power.

3. Design it Green

Green means lighting, cooling, computer systems and mechanical systems are designed for maximum energy efficiency. Companies spend much more on the power to run a server during its lifetime than they do in capital to purchase it.

Advantages of going green:

  • A green data center will consume 50-80% less power than a typical data center.
  • A smaller building footprint
  • A reduced carbon footprint

Liebert is the US market leader in data center air conditioning and UPS systems.

 

4. Pick products carefully

 

Hardware and software for the data center should be chosen for performance, quality, green credentials and scalability.

 

Consider:

  • Open systems: Hot-swappable disk drives, power supplies, fans. Look for open standards. Not necessarily open source.
  • Versatile products: That allow for interconnecting components.
  • Virtualization: Virtualizing both servers and storage increases visibility of underutilised servers and storage.
  • Data de-duplication: De-duplication eliminates redundant data throughout the storage network, increasing cost effectiveness and efficiency.
  • Thin provisioning: Lets admins limit the allocation of storage to what the apps immediately need. It enables automatic addition of capacity on demand, up to preset limits. EqualLogic, Hitachi, EMC, NetApp, 3PAR and CommVault offer thin provisioning for SAN or iSCSI storage.

5. Turn it off

Put in place the right management systems that will help you manage parts of your systems. Scalent Systems and Cassatt offer systems to manage data centers and server appliances respectively. MonoSphere, Asigra and Onaro offer storage allocation and monitoring systems.

 

 

 

Website on steroids September 26, 2007

Posted by Coolguy in IT Infrastructure, Web Applications.

When a user types in a URL to request a Web page, the page is created by an application server, which executes a script that builds the page. This script contains actions such as calls to database systems to retrieve content. The result is an HTML “blueprint” for the requested page.

The page is then delivered back to the browser in a quick, text-only, nonbandwidth-intensive transfer that does not incorporate graphics. Finally, the browser must fetch the graphics, requesting each object from the appropriate server address based on the embedded URLs in the HTML page.

Because browsers are limited to downloading two to four objects at a time and a typical Web page may contain 30 to 40 embedded objects, a good deal of back-and-forth handshaking between the browser and server is required to complete the loading of a page.

Solutions to speed up

Address both network latency and server latency.

Network Latency
Network Caching addresses network latency. By storing and serving objects from the network edge, caching slashes the time it takes a browser to load an object. Hardware appliances which help with caching include NetCache from NetApp and Cache Server Director from Radware. Even for dynamic content, static elements such as logos can still be cached.

Edge delivery is another option.

Caching is now a sophisticated programmable tool. You cannot just enable caching after an application is written; you have to start thinking about using caching right from the architectural stage. Caching doesn’t effectively address dynamic page generation, which typically accounts for 40% of the time required to deliver a Web page.
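
As a small illustration of designing for caching, a static fragment can be given explicit cache headers so browsers and network caches are allowed to reuse it. This is a minimal, hypothetical Java servlet sketch (class name and content are made up):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Serves a static fragment with an explicit Cache-Control header.
    public class FooterServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Any cache (browser, proxy, edge server) may keep this object for one day.
            resp.setHeader("Cache-Control", "public, max-age=86400");
            resp.setContentType("text/html");
            resp.getWriter().println("<div id=\"footer\">(c) Example Corp</div>");
        }
    }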

Dynamic content accelerators

A dynamic-content accelerator is positioned between the Web server and back-end resources to field and fill logic requests. Relying on the expectation that even personalized content will make use of some recycled data, content accelerators reduce the number of application server and database calls needed to compose an HTML page response.

Chutney and SpiderCache offer products in this space.

ESI

Edge Side Includes (ESI) is an XML-based markup language, proposed to the World Wide Web Consortium as an industry standard, that defines fragments of Web pages, allowing them to be assembled and updated at the edge of the Internet. With ESI, companies can set rules within Web pages, alerting the cache when it is necessary to retrieve fresh information from an origin server and when cached content can be used. New content from origin servers can then be combined with cached content so that an entire Web page is assembled at the network’s edge – no need to retrieve complete pages from the origin infrastructure.
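
To give a feel for the idea, here is a hypothetical Java servlet that emits a mostly static page template containing an ESI include; an ESI-capable edge cache would serve the template from cache and only fetch the personalised fragment from the origin. The servlet, URLs and markup are illustrative only:

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Emits a cacheable page template; the <esi:include> tag marks the fragment
    // that the edge server assembles separately.
    public class ProductPageServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/html");
            PrintWriter out = resp.getWriter();
            out.println("<html><body>");
            out.println("<h1>Product catalogue</h1>");   // static, cacheable content
            // The edge cache replaces this tag with the user-specific fragment.
            out.println("<esi:include src=\"/fragments/recommendations\" />");
            out.println("</body></html>");
        }
    }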

Cantos uses ESI and Akamai

Tangosol can be used to cache data from the database.
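
The underlying idea is the cache-aside pattern: look in the cache first and only hit the database on a miss. Below is a minimal, in-process sketch of that pattern in Java (a product such as Tangosol provides a distributed version of the same idea; the DAO interface here is hypothetical):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Cache-aside: check the cache first, fall back to the database on a miss.
    public class ProductCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final ProductDao dao; // hypothetical database access object

        public ProductCache(ProductDao dao) {
            this.dao = dao;
        }

        public String getDescription(String productId) {
            // computeIfAbsent queries the database only when the key is not cached
            return cache.computeIfAbsent(productId, dao::loadDescription);
        }

        public interface ProductDao {
            String loadDescription(String productId);
        }
    }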

Links:
Original Article
Another Article
Creating a cache friendly website
Edge servers and how it works
MTV Case Study
Jcache
Spiritsoft

Business continuity facts September 26, 2007

Posted by Coolguy in IT Infrastructure.

25% of companies do not test their continuity plans.
Terrorism accounts for 3% of incidents.
Software or hardware failures account for 67% of incidents.
Power failures account for another 16%.

HTTP Methods August 19, 2005

Posted by Coolguy in Networks.

HTTP Methods

  • HTTP 1.1 methods are GET, POST, HEAD, OPTIONS, PUT, DELETE, TRACE and CONNECT
  • Methods can be safe or idempotent
  • Idempotent means the same request can be repeated again and again without unwanted side effects
  • GET, HEAD, PUT, DELETE, OPTIONS and TRACE are idempotent
  • POST is not idempotent
  • OPTIONS: This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval. Asks for list of http methods to which the requested URL can respond.
  • GET: The GET method means retrieve whatever information is identified by the Request-URI
  • A simple hyperlink is always GET
  • If there is no ‘method’ in a form, the default is GET
  • HEAD: The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. It is often used for testing hypertext links for validity, accessibility, and recent modification (see the sketch after this list).
  • POST: The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line.
  • PUT: The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server
  • DELETE: The DELETE method requests that the origin server delete the resource identified by the Request-URI
  • TRACE: The TRACE method is used to invoke a remote, application-layer loop-back of the request message. TRACE allows the client to see what is being received at the other end of the request chain and use that data for testing or diagnostic purposes.
  • CONNECT: Reserved for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunnelling).
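
As a quick illustration of HEAD from the client side, here is a minimal sketch using the standard HttpURLConnection API (the URL is only an example):

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Issue a HEAD request to check a link without downloading the body.
    public class LinkChecker {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.example.com/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("HEAD");          // server must not return a body
            int status = conn.getResponseCode();    // status and headers still arrive
            long length = conn.getContentLengthLong();
            System.out.println("Status: " + status + ", Content-Length: " + length);
            conn.disconnect();
        }
    }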

GET vs POST

  • Use GET if: The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup).
  • Use POST if: The interaction is more like an order, or the interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or the user would be held accountable for the results of the interaction.
  • Issues with GET are:
  • Size of data
  • Security
  • GET can be bookmarked
  • Developers who want to support both methods usually put the logic in doGet(), and then have the doPost() implementation delegate to that doGet(), as sketched below.
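
A minimal sketch of that delegation pattern (the servlet name and request parameter are made up for illustration):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // All of the handling lives in doGet(); doPost() simply forwards to it.
    public class SearchServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String query = req.getParameter("q");   // read the request either way
            resp.setContentType("text/plain");
            resp.getWriter().println("Results for: " + query);
        }

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            doGet(req, resp);                        // delegate to the GET handler
        }
    }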

More:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

http://www.w3.org/2001/tag/doc/whenToUseGet.html#principles-summary

Structure of the Internet and bottlenecks August 10, 2005

Posted by Coolguy in Networks.
  • The Internet is made up of thousands of different networks (also called Autonomous Systems or AS’s) that communicate using the IP protocol
  • These networks range from large backbone providers such as UUNet and PSINet to small local ISPs
  • Each of these networks is a complex entity in itself, being made up of routers, switches, fiber, microwave links, and ATM technology.
  • All of these components work together to move data packets through the network toward their specified destinations.
  • In order for the Internet to function as a single global network interconnecting everyone, all of these individual networks must connect to each other and exchange traffic. This happens through a process called peering.
  • When two networks decide to connect and exchange traffic, a connection called a peering session is established between a pair of routers, each located at the border of one of the two networks. These two routers periodically exchange routing information, thereby informing each other of the destination users and servers that are reachable through their respective networks.
  • There exist thousands of peering points on the Internet, each falling into one of two categories: public or private.
  • Public peering occurs at major Internet interconnection points such as MAE-East, MAE-West, and the Ameritech NAP, while private peering arrangements bypass these points. Peering can either be free, or one network may purchase a connection to another, usually bigger, network.
  • Once the networks are interconnected at peering points, the software running on every Internet router moves packets in such a way as to transport each request and data packet to its correct destination.
  • For scalability purposes, there are two types of routing protocols directing traffic on the Internet today.
  • Interior gateway protocols such as OSPF and RIP create routing paths within individual networks or AS’s
  • Exterior gateway protocol BGP is used to send traffic between different networks.
  • Interior gateway protocols use detailed information on network topology, bandwidth and link delays to compute routes through a network for the packets that enter it.
  • Since this approach does not scale to handle a large-scale network composed of separate administrative domains, BGP is used to link individual networks together to form the Internet.
  • BGP creates routing paths by simply minimizing the number of individual networks (AS’s) a data packet must traverse. While this approach does not guarantee that the routes are even close to optimal, it supports a global Internet by scaling to handle thousands of AS’s and allowing each of them to implement their own independent routing policies (a toy sketch of the hop-count idea follows this list).
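
Here is a toy sketch of that fewest-AS-hops idea. Real BGP is a policy-driven path-vector protocol, so this is only an illustration of the hop-count metric on a made-up topology:

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Queue;

    // Breadth-first search: count how many networks (AS's) a packet would cross.
    public class AsHopCount {
        public static int hops(Map<Integer, List<Integer>> peering, int srcAs, int dstAs) {
            Queue<Integer> queue = new ArrayDeque<>();
            Map<Integer, Integer> dist = new HashMap<>();
            queue.add(srcAs);
            dist.put(srcAs, 0);
            while (!queue.isEmpty()) {
                int as = queue.remove();
                if (as == dstAs) return dist.get(as);
                for (int neighbor : peering.getOrDefault(as, List.of())) {
                    if (!dist.containsKey(neighbor)) {
                        dist.put(neighbor, dist.get(as) + 1);
                        queue.add(neighbor);
                    }
                }
            }
            return -1; // destination AS not reachable
        }

        public static void main(String[] args) {
            // Example peering graph: AS 100 peers with 200, and 200 peers with 300.
            Map<Integer, List<Integer>> peering = Map.of(
                    100, List.of(200),
                    200, List.of(100, 300),
                    300, List.of(200));
            System.out.println(hops(peering, 100, 300)); // prints 2
        }
    }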

Bottlenecks

  • There are four types of bottlenecks that, left unaddressed, can slow down performance and impede the ability of the Internet to handle a quickly growing number of users, services, and traffic.
  • These bottlenecks occur at the following points in the Internet infrastructure:
    1. First Mile
    2. Peering Points
    3. Backbone
    4. Last Mile

Bogged Down at the Beginning

  • Each content provider sets up a Web site in a single physical location and disseminates data, services, and information to all Internet users around the world from this central location.
  • This means that the speed with which users can access the site is necessarily limited by its First Mile connectivity – the bandwidth capacity of the Web site’s connection to the Internet.
  • In order to accommodate a growing number of users in this model, not only must each content provider keep buying larger and larger connections to his or her ISP, but so must the ISP continually expand its internal network capacity, and the same goes for neighboring networks.
  • Since it is impossible in this type of approach to keep up with exponential growth in Internet traffic, the centralized model of content serving is inherently unscalable.
  • In all of these situations, the demand for the content exceeded the first-mile capacity of the Web site.

Peering: Points of Congestion

  • The second type of Internet bottleneck occurs at peering points – the interconnection points between independent networks.
  • The reasons why peering points are bottlenecks are mostly economic.
  • First of all, networks have little incentive to set up free peering arrangements, since there is no revenue generation opportunity in that type of arrangement, but there are considerable setup costs.
  • At the same time, none of the large networks is going to agree to pay another large network for peering, because from a traffic perspective, they would both benefit equally from such an arrangement. As a result, large networks end up not peering with each other very much and so the limited number of peering points between them end up as bottlenecks.
  • One of the most common types of peering occurs when a small network buys connectivity from a large network. The issue that comes up in this situation is the long time it takes to provision the necessary circuits.
  • Although there is plenty of dark, or unused, fiber already in the ground, the telephone companies who own it prefer to install new fiber for each requested circuit in order to increase their revenues. As a result, it takes 3-6 months to provision new circuits for peering. By the time a requested circuit is installed, traffic may have grown beyond expectations, resulting in a full link – a bottleneck.
  • It is also the case that it doesn’t make economic sense for a network to pay for unused capacity. Therefore, a network generally purchases just enough capacity to handle current traffic levels, so every peering link is full. Unfortunately, this practice of running links at full capacity creates severe traffic bottlenecks. Links run at very high utilization exhibit both high packet loss rates as well as very high latency (because of router queuing delays) even for the packets that can get through. As a result, web performance slows to a crawl.
  • Peering bottlenecks may dissipate due to telecom consolidation
  • Since there are thousands of networks in the Internet, there are at least thousands of peering points in the Internet.
  • Since access traffic is evenly spread out over the Internet’s thousands of networks, most traffic must travel through a number of different networks, and, consequently, a number of peering points, to reach its destination.
  • Therefore, the peering bottleneck problem is clearly a large-scale problem inherent to the structure of the Internet.

Breaking the Backbone

  • The third type of Internet bottleneck is in the capacity of the large long-haul networks that make up the Internet backbone.
  • Because today’s centralized model of content serving requires that almost all Internet traffic traverse one or more backbone networks, the capacity of these networks must be able to grow as quickly as Internet traffic.
  • A network’s capacity is determined by the capacity of its cables and routers.
  • Since fiber is cheap, plentiful and able to support high-bandwidth demands, cable capacity is not an issue.
  • Instead, it is the routers at the ends of the fiber cables that limit backbone capacity.
  • At any point in time, the speed of the packet-forwarding hardware and software in routers is limited by current technology.
  • Many ISPs run IP over switched ATM networks, because IP routers have not been able to keep pace with their traffic demands. However, an ATM network is more expensive to deploy and maintain.
  • And while backbone providers too are spending a great deal of money upgrading their routers to handle more traffic, demand will still end up far exceeding capacity.
  • E.g., let’s compute the demand for long-haul Internet capacity. Consider an example application of video-on-demand on the Internet: the personalized Cooking Channel. Instead of watching the broadcast version of the generic Cooking Channel program on TV, each user will be able to put together his or her own “menu” of recipes to learn, catering to specific tastes or entertainment plans. How much Internet capacity will the personalized Cooking Channel consume? Viewer monitoring performed by Nielsen Media Research shows that cooking shows rate about 1/10 of a Nielsen rating point, or 100,000 simultaneous viewers. At a conservatively low encoding rate of 300 Kbps and 100,000 unique simultaneous streams, the personalized Cooking Channel will consume 30 Gbps (a quick arithmetic check follows this list).
  • Now consider WorldCom’s UUNET backbone which carries approximately half of all Internet transit traffic. Its network is comprised of multiple hubs of varying capacity, ranging from T1 (1.544 Mbps) to OC48 (2,450 Mbps or 2.4 Gbps). UUNET’s Canadian and U.S. networks are linked with more than 3,600 Mbps (or 3.6 Gbps) of aggregate bandwidth between multiple metropolitan areas. However, the equivalent of 32 cooking channels alone could overload the capacity of the world’s largest backbone to transfer content from the U.S. to Canada!
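
A quick back-of-the-envelope check of the 30 Gbps figure, using only the numbers quoted above:

    // 100,000 simultaneous streams at 300 Kbps each.
    public class BandwidthEstimate {
        public static void main(String[] args) {
            long viewers = 100_000;
            long kbpsPerStream = 300;
            long totalGbps = viewers * kbpsPerStream / 1_000_000; // Kbps -> Gbps
            System.out.println(totalGbps + " Gbps");              // prints 30 Gbps
        }
    }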

The Long Last Mile

  • A common misconception is that eliminating this last-mile bottleneck will solve all Internet performance problems by providing high-speed Internet access for everyone.
  • In fact, by rate-limiting Internet access, 56 Kbps modems are saving the Internet from a meltdown.
  • If all users were able to access the Internet via multi-megabit cable modem or DSL modems, the other three types of bottlenecks would make the Internet unbearably slow.

Delivering Content from the Network Edge

  • The current centralized model of Internet content serving requires that all user requests and data travel through several networks and, therefore, encounter all four types of core Internet bottlenecks, in order to reach their destinations.
  • Due to this model, the entire Internet slows down during peak traffic times
  • Fortunately, this unacceptable end result can be avoided by replacing centralized content serving with edge delivery, a much more scalable model of distributing information and services to end users.
  • In edge delivery, the content of each Web site is available from multiple servers located at the edge of the Internet. In other words, a user/browser would be able to find all requested content on a server within its home network.
  • The edge is the closest point in the network to the end user, who is the ultimate judge of service.
  • Edge delivery solves the first mile bottleneck by eliminating the central point from which all data must be retrieved.
  • By making each Web site’s content available at multiple servers, each Web site’s capacity is no longer limited by the capacity of a single network link.
  • Edge delivery solves the peering bottleneck problem by making it unnecessary for web requests and data to traverse multiple networks and thus encounter peering points. Of course, in order to accomplish this goal, the edge delivery servers must be deployed inside ALL networks that have access customers.
  • Since there are thousands of networks and access traffic is spread evenly among them, edge delivery schemes with only 50-100 locations or deployed in only the top 10 networks cannot in any way solve the peering bottleneck problem.
  • When content is retrieved from the network edge, the demand for backbone capacity decreases, thus alleviating the bottleneck there. Again, this can only be achieved with edge servers deployed at every one of the thousands of Internet access providers.
  • While edge delivery does not solve the last-mile bottleneck, it does help to alleviate the growing severity of the other three bottlenecks, thus enabling content delivery closer to end users.

Edge Delivery – Challenges

  • Content must be deployed on edge servers
  • Requires massive deployment of edge servers
  • Geographic diversity of edge servers is critical
  • Content management across disparate networks
  • Content must be kept fresh and synchronized
  • Fault tolerance is crucial
  • Monitor Internet traffic conditions
  • Request routing
  • Managing load
  • Performance monitoring

SOCKS August 10, 2005

Posted by Coolguy in Networks.
  • The SOCKS protocol provides a framework for client-server applications in both the TCP and UDP domains to conveniently and securely use the services of a network firewall
  • The basic purpose of the protocol is to enable hosts on one side of a SOCKS server to gain access to hosts on the other side of a SOCKS Server, without requiring direct IP-reachability.
  • The protocol is conceptually a “shim-layer” between the application layer and the transport layer, and as such does not provide network layer gateway services
  • SOCKS includes two components, the SOCKS server and the SOCKS client
  • When an application client needs to connect to an application server, the client connects to a SOCKS proxy server. The proxy server connects to the application server on behalf of the client, and relays data between the client and the application server. To the application server, the proxy server is the client (see the sketch below).
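
As a minimal illustration, the standard Java Socket API can relay a connection through a SOCKS server. Host names and ports below are examples only:

    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.Socket;

    // The SOCKS server connects to the application server on the client's behalf.
    public class SocksClient {
        public static void main(String[] args) throws Exception {
            Proxy socks = new Proxy(Proxy.Type.SOCKS,
                    new InetSocketAddress("socks.example.com", 1080));
            try (Socket socket = new Socket(socks)) {
                socket.connect(new InetSocketAddress("app.example.com", 80));
                socket.getOutputStream().write(
                        "HEAD / HTTP/1.0\r\nHost: app.example.com\r\n\r\n".getBytes());
                InputStream in = socket.getInputStream();
                System.out.println("First byte of reply: " + in.read());
            }
        }
    }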

More:

http://www.javvin.com/protocolSocks.html

Proxy Servers August 10, 2005

Posted by Coolguy in Networks.
  • A proxy server is a program that acts as an intermediary between computers on your LAN and computers on the Internet
  • Proxy servers often have a cache built in to make web surfing faster
  • Some proxy servers also allow the filtering of web content or domains. Additionally, almost all proxy servers support logging
  • Workstations on your network must request data (like web pages) from the proxy server to access the internet.
  • The proxy server then fetches the internet data, checks its filters, and returns it to the workstation that requested it.
  • Because a proxy server does all of the data requesting, each machine must be configured to send all internet requests to the proxy server rather than directly to the internet (a minimal Java example follows this list)
  • Proxy servers are not a ‘transparent’ connection sharing technology
  • Proxy servers provide three main functions:
    firewalling and filtering
    connection sharing
    caching
  • Proxy servers are also more difficult to install and maintain than firewalls, as proxy functionality for each application protocol like HTTP, SMTP, or SOCKS must be configured individually
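
A minimal sketch of that per-machine configuration from a Java client’s point of view, using the standard JVM proxy properties (host name and port are examples; 3128 is Squid’s usual port):

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Every plain-HTTP request from this JVM now goes through the proxy,
    // which fetches the page, applies its filters and caching, and relays it.
    public class ViaProxy {
        public static void main(String[] args) throws Exception {
            System.setProperty("http.proxyHost", "proxy.example.com");
            System.setProperty("http.proxyPort", "3128");
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://www.example.com/").openConnection();
            System.out.println("Status via proxy: " + conn.getResponseCode());
        }
    }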

Products:

Squid

Firewalls August 10, 2005

Posted by Coolguy in Networks.
  • A firewall is simply a program or hardware device that filters the information coming through the Internet connection into your private network or computer system
  • If an incoming packet of information is flagged by the filters, it is not allowed through.
  • A company will place a firewall at every connection to the Internet (for example, at every T1 line coming into the company).
  • The firewall can implement security rules. For example, one of the security rules inside the company might be:
    Out of the 500 computers inside this company, only one of them is permitted to receive public FTP traffic. Allow FTP connections only to that one computer and prevent them on all others.
  • Firewalls use one or more of three methods to control traffic flowing in and out of the network:
  • Packet filtering – Packets (small chunks of data) are analyzed against a set of filters. Packets that make it through the filters are sent to the requesting system and all others are discarded.
  • Proxy service – Information from the Internet is retrieved by the firewall and then sent to the requesting system and vice versa.
  • Stateful inspection – A newer method that doesn’t examine the contents of each packet but instead compares certain key parts of the packet to a database of trusted information. Information traveling from inside the firewall to the outside is monitored for specific defining characteristics, then incoming information is compared to these characteristics. If the comparison yields a reasonable match, the information is allowed through. Otherwise it is discarded.

Customising Firewalls

  • IP addresses: If a certain IP address outside the company is reading too many files from a server, the firewall can block all traffic to or from that IP address.
  • Domain names: A company might block all access to certain domain names, or allow access only to specific domain names.
  • Protocols: A company might set up only one or two machines to handle a specific protocol and ban that protocol on all other machines.
  • Ports: A company might block certain ports access on all machines but one inside the company.
  • Specific words and phrases: The firewall will sniff (search through) each packet of information for an exact match of the text listed in the filter. For example, you could instruct the firewall to block any packet with the word “X-rated” in it (a toy rule-matching sketch follows this list).
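
A toy sketch of that kind of rule matching (addresses, ports and keywords below are made-up examples, and real firewalls work on raw packets rather than strings):

    import java.util.Set;

    // Block traffic whose source IP, destination port, or payload matches a rule.
    public class SimplePacketFilter {
        private final Set<String> blockedIps = Set.of("203.0.113.7");
        private final Set<Integer> blockedPorts = Set.of(21, 23);   // ftp, telnet
        private final Set<String> blockedWords = Set.of("x-rated");

        public boolean allow(String sourceIp, int destPort, String payload) {
            if (blockedIps.contains(sourceIp)) return false;
            if (blockedPorts.contains(destPort)) return false;
            for (String word : blockedWords) {
                if (payload.toLowerCase().contains(word)) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            SimplePacketFilter fw = new SimplePacketFilter();
            System.out.println(fw.allow("198.51.100.4", 80, "hello"));   // true
            System.out.println(fw.allow("203.0.113.7", 80, "hello"));    // false
        }
    }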

Uses of Firewall

Protects from

  • Remote login
  • Application backdoors
  • SMTP session hijacking
  • Operating system bugs
  • Denial of service
  • E-mail bombs
  • Macros
  • Viruses
  • Spam
  • Redirect bombs: Hackers can use ICMP to change (redirect) the path information takes by sending it to a different router. This is one of the ways that a denial of service attack is set up.
  • Source routing : In most cases, the path a packet travels over the Internet (or any other network) is determined by the routers along that path. But the source providing the packet can arbitrarily specify the route that the packet should travel. Hackers sometimes take advantage of this to make information appear to come from a trusted source or even from inside the network! Most firewall products disable source routing by default.

Firewall Products

Radware specific faq

DNS August 10, 2005

Posted by Coolguy in Networks.
  • Domain name servers, or DNS, are an incredibly important but completely hidden part of the Internet
  • Domain name servers translate domain names to IP addresses

Complications:

  • There are billions of IP addresses currently in use, and most machines have a human-readable name as well.
  • There are many billions of DNS requests made every day. A single person can easily make a hundred or more DNS requests a day, and there are hundreds of millions of people and machines using the Internet daily
  • Domain names and IP addresses change daily
  • New domain names get created daily.
  • Millions of people do the work to change and add domain names and IP addresses every day

Distributed System

  • The DNS system is a distributed database
  • Every domain has a domain name server somewhere that handles its requests, and there is a person maintaining the records in that DNS.
  • Name servers do two things all day long:
  • They accept requests from programs to convert domain names into IP addresses
  • They accept requests from other name servers to convert domain names into IP addresses
  • When a request comes in, the name server can do one of four things with it:

    • It can answer the request with an IP address because it already knows the IP address for the domain.
    • It can contact another name server and try to find the IP address for the name requested. It may have to do this multiple times.
    • It can say, “I don’t know the IP address for the domain you requested, but here’s the IP address for a name server that knows more than I do.”
    • It can return an error message because the requested domain name is invalid or does not exist.
  • When you type a URL into your browser, the browser’s first step is to convert the domain name and host name into an IP address so that the browser can go request a Web page from the machine at that IP address
  • To do this conversion, the browser has a conversation with a name server (a minimal lookup sketch appears after this list).
  • When you set up your machine on the Internet, you tell your machine what name server it should use for converting domain names to IP addresses
  • WINIPCFG.EXE, IPCONFIG and nslookup can be used to view the current name server
  • The name server may already know the IP address
  • That would be the case if another request to resolve the same name came in recently
  • In that case, the name server can return the IP address immediately
  • If not, a name server would start its search for an IP address by contacting one of the root name servers.
  • The root servers know the IP address for all of the name servers that handle the top-level domains
  • Your name server would ask the root for www.xyz.com, and the root would say (assuming no caching), “I don’t know the IP address for www.xyz.com, but here’s the IP address for the COM name server.”
  • These root servers are vital to this whole process, so:
    There are many of them scattered all over the planet.
    Every name server has a list of all of the known root servers. It contacts the first root server in the list, and if that doesn’t work it contacts the next one in the list, and so on
  • The root server knows the IP addresses of the name servers handling the several hundred top-level domains
  • It returns to your name server the IP address for a name server for the COM domain
  • Your name server then sends a query to the COM name server asking it if it knows the IP address for www.xyz.com
  • The name server for the COM domain knows the IP addresses for the name servers handling the xyz.com domain, so it returns those.
  • Your name server then contacts the name server for xyz.com and asks if it knows the IP address for www.xyz.com.
  • It does, so it returns the IP address to your name server, which returns it to the browser, which can then contact the server for www.xyz.com to get a Web page
  • There are multiple name servers at every level, so if one fails, there are others to handle the requests
  • Once a name server resolves a request, it caches all of the IP addresses it receives. Once it has made a request to a root server for any COM domain, it knows the IP address for a name server handling the COM domain, so it doesn’t have to bug the root servers again for that information. Name servers can do this for every request, and this caching helps to keep things from bogging down.
  • Name servers do not cache forever, though. The caching has a component, called the Time To Live (TTL), that controls how long a server will cache a piece of information. When the server receives an IP address, it receives the TTL with it. The name server will cache the IP address for that period of time (ranging from minutes to days) and then discard it. The TTL allows changes in name servers to propagate.
  • Not all name servers respect the TTL they receive, however. When HowStuffWorks moved its machines over to new servers, it took three weeks for the transition to propagate throughout the Web.
  • The vast majority of name servers run software called BIND
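
A minimal sketch of what that first step looks like from application code: the JDK asks the machine’s configured name server (via the operating system resolver) for the addresses behind a host name. The host name is only an example:

    import java.net.InetAddress;

    // Resolve a host name to its IP addresses using the configured name server.
    public class Resolve {
        public static void main(String[] args) throws Exception {
            for (InetAddress addr : InetAddress.getAllByName("www.example.com")) {
                System.out.println(addr.getHostAddress());
            }
        }
    }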

BIND

  • BIND (Berkeley Internet Name Domain) is an implementation of the Domain Name System (DNS)
  • It provides components for
  • Domain Name System server (named)
  • Domain Name System resolver library
  • tools for verifying the proper operation of the DNS server
  • The BIND DNS Server is used on the vast majority of name serving machines on the Internet