Author Archive

Let’s Play Vulnerability Bingo!

Posted in System administration, security on July 1st, 2010 by Jessica McKellar6 Comments

Dear Fellow System Administrators,

I like excitement in my life. I go on roller coasters, I ride my bike without a helmet, I make risky financial decisions. I treat my servers no differently. When my Linux vendor releases security updates, I think: I could apply these patches, but why would I? If I did, I’d have to coordinate with my users to schedule a maintenance window for 2am on a Sunday and babysit those systems while they reboot, which is seriously annoying, hurts our availability, and interrupts my beauty sleep (and trust me, I need my beauty sleep). Plus, where’s the fun in having a fully-patched system? Without open vulnerabilities, how else would I have won a ton of money in my office’s Vulnerability Bingo games?
vulnerability bingo card

How can I get in on some Vulnerability Bingo action, you ask? Simple: get yourself some bingo cards, be sure not to patch your systems, and place chips on appropriate squares when your machines are compromised. Or, as a fun variant, place chips when your friends’ machines get compromised! For the less adventurous, place chips as relevant Common Vulnerabilities and Exposures are announced.

What’s really great is the diversity of vulnerabilities. In 2009 alone, Vulnerability Bingo featured:

physically proximate denial of service attacks (CVE-2009-1046).

local denial of service attacks (CVE-2009-0322, CVE-2009-0031, CVE-2009-0269, CVE-2009-1242, CVE-2009-2406, CVE-2009-2407, CVE-2009-2287, CVE-2009-2692, CVE-2009-2909, CVE-2009-2908, CVE-2009-3290, CVE-2009-3547, CVE-2009-3621, CVE-2009-3620) coming in at least 5 great flavors: faults, memory corruption, system crashes, hangs, and the kernel OOPS!

And the perennial favorite, remote denial of service attacks (CVE-2009-1439, CVE-2009-1633, CVE-2009-3613, CVE-2009-2903) including but not limited to system crashes, IOMMU space exhaustion, and memory consumption!

How about leaking potentially sensitive information from kernel memory (CVE-2009-0676, CVE-2009-3002, CVE-2009-3612, CVE-2009-3228) and remote access to potentially sensitive information from kernel memory (CVE-2009-1265)?

Perhaps I can interest you in some privilege escalation (CVE-2009-2406, CVE-2009-2407, CVE-2009-2692, CVE-2009-3547, CVE-2009-3620), or my personal favorites, arbitrary code execution (CVE-2009-2908) and unknown impact (CVE-2009-0065, CVE-2009-1633, CVE-2009-3638).

Sometimes you get a triple threat like CVE-2009-1895, which “makes it easier for local users to leverage the details of memory usage to (1) conduct NULL pointer dereference attacks, (2) bypass the mmap_min_addr protection mechanism, or (3) defeat address space layout randomization (ASLR)“. Three great tastes that taste great together — and a great multi-play Bingo opportunity!

Linux vendors release kernel security updates almost every month (take Red Hat for example), so generate some cards and get in on the action before you miss the next round of exciting CVEs!

Happy Hacking,
Ben Bitdiddle
System Administrator
HackMe Inc.


You Don’t Want to Win Vulnerability Bingo

Don’t be like Ben Bitdiddle. Applying kernel updates doesn’t have to be a compromise between availability and security.

Ksplice Uptrack allows you to apply security updates without rebooting. When your Linux vendor release patches, the Uptrack service will send your machines rebootless versions that can be applied with zero disruption to you, your customers, or your running applications. Seeing is believing: roll out a fully-functional free trial on an unlimited number of machines and see what rebootless updates can do for you.

Query Planners and Baseball

Posted in Databases on June 10th, 2010 by Jessica McKellar10 Comments

I’m not a big sports aficionado. However, I’m compelled by virtue of living in Boston to care about baseball, and I just finished reading Moneyball, which is all about baseball statistics. One thing about baseball I definitely appreciate is that there are lots of published statistics, because lots of statistics means nice big public data sets available for my relational database querying pleasure.

Let’s use baseball statistics as an excuse to learn more about query planners (or use query planners as an excuse to learn more about baseball, if that’s how you roll):

I tend to use Postgres in my day-to-day data processing, so that’s what I’m using here. My data set is from baseball1.com, Sean Lahman’s insanely comprehensive baseball statistics archive, specifically the batting and salaries tables from Version 5.7 (1871-2009).

Batting

For starters, how many people have hit a baseball, professionally? How many people have hit home runs? How many people have hit a lot of home runs (what does ‘a lot’ mean in this context)? I have no idea…

We can get this information from the batting table. This is also an opportunity to look at what the Postgres query planner thinks about how to conduct read queries on a single table. To give the query planner some choices, we’ll create an index on the number of home runs.

baseball# CREATE INDEX bhomers ON batting USING btree(homeruns);

Postgres is nice enough to give us a number of indexing options. In this case we’re using a B-tree, which is like a binary tree’s shorter, fatter cousin: each node can have many children, within a range, and it’s self-balancing, so the tree can have few levels and minimize the number of expensive lookups to get to a value.

Let’s get a count on everyone who’s been at bat since 1871:

baseball=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT(playerid)) FROM batting;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2609.83..2609.84 rows=1 width=9) (actual time=852.611..852.612 rows=1 loops=1)
   ->  Seq Scan on batting  (cost=0.00..2378.06 rows=92706 width=9) (actual time=0.020..27.904 rows=92706 loops=1)
 Total runtime: 852.718 ms
(3 rows)

baseball=# SELECT COUNT(DISTINCT(playerid)) FROM batting;
 count
-------
 17256
(1 row)

Cost is measured in estimated disk page fetches. The first cost number is the estimated start-up cost. Actions like pre-sorting would be bucketed under start up costs; for this simple query we have a startup cost of 0.00 for the sequential scan on the table. The second cost number is the estimated total cost, or the total number of disk page fetches for this query, in this case 2378.06 for the sequential scan. There are 92706 tuples in the table, so 92706/2378.06 ≈ 40 tuples per 4 KB disk page on this machine. The “actual time” numbers are in milliseconds. For bonus fun, frob some of the planner cost constants and see how that affects queries.

We can get the size of the table by itself and the size of the table along with any auxiliary data structures using something like:

baseball=# SELECT pg_size_pretty(pg_relation_size('batting')) As table_size, pg_size_pretty(pg_total_relation_size('batting')) AS table_plus_aux_size;
 table_size | table_plus_aux_size
------------+---------------------
 11 MB      | 15 MB
(1 row)

So on this sequential scan we’re working through 11 MB of data in 27.904 milliseconds. This looks like a sequential read rate of (11 MB / 27.904 ms) x (1000 ms / 1 sec) = 409 MB / sec, which would be amazing on this low-end machine, but all queries in this post were run several times and results reflect the benefit of a warm cache (and that’s a story for another post!).

We have to go through every tuple in the batting table to get the count, so it makes sense that we’re doing a full sequential scan. One disk seek and a big sequential read are going to be cheaper than the disk seeks incurred for random-access reads if we try to take advantage of any of the auxiliary data structures for this table.

Okay, cool, over 17,000 people have been at bat since 1871. But how many have hit homers — more than one, so it’s not a fluke. Let’s increase the selectivity by adding a condition (the WHERE clause) on homeruns, the column on which we’ve created an index:

baseball=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT(playerid)) FROM batting WHERE homeruns > 1;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2379.50..2379.51 rows=1 width=9) (actual time=218.128..218.128 rows=1 loops=1)
   ->  Bitmap Heap Scan on batting  (cost=516.91..2310.90 rows=27439 width=9) (actual time=8.241..19.220 rows=26953 loops=1)
         Recheck Cond: (homeruns > 1)
         ->  Bitmap Index Scan on bhomers  (cost=0.00..510.05 rows=27439 width=0) (actual time=7.735..7.735 rows=26953 loops=1)
               Index Cond: (homeruns > 1)
 Total runtime: 218.248 ms
(6 rows)

baseball=# SELECT COUNT(DISTINCT(playerid)) FROM batting WHERE homeruns > 1;
 count
-------
  5231
(1 row)

5231/17256 = 30% of batters have hit home runs. Nice!

The “actual time” rows value is 26953, so 26953 tuples, or 26953/92706 = 29% of the rows in the table, matched the homeruns condition. The query planner has been maintaining statistics on the data distributions in this table and guessed that 27439 would match the condition, which is pretty close. This is apparently selective enough to switch us over to using bitmaps. In the bitmap index scan, we traverse the bhomer index B-tree to create an in-memory mapping of homeruns to rowids that match the condition (homeruns > 1). We then create a bitmap for the rowids, setting a bit if that row appeared in the mapping. This lets us scan for the matching tuples in the table (aka the “heap”, hence “bitmap heap scan”) in on-disk order, minimizing disk seeks and the number of pages that have to be fetched into memory.

Let’s bump up the selectivity even more and see what it takes to be a real home run juggernaut:

baseball=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT(playerid)) FROM batting WHERE homeruns > 52;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=376.42..376.43 rows=1 width=9) (actual time=0.550..0.550 rows=1 loops=1)
   ->  Index Scan using bhomers on batting  (cost=0.00..376.11 rows=124 width=9) (actual time=0.055..0.431 rows=23 loops=1)
         Index Cond: (homeruns > 52)
 Total runtime: 0.640 ms
(4 rows)

baseball=# SELECT COUNT(DISTINCT(playerid)) FROM batting WHERE homeruns > 52;
 count
-------
    15
(1 row)

This query returned 15 players, meaning more than 52 home runs in one season puts you in the top .1% of all home run hitters. This high degree of selectivity has also switched us over to an index scan. This time, we’re fetching in the order of the bhomer index. It can mean more disk seeks and page fetches to get the matching tuples, but the query is so selective that the extra jumping around is less costly than the sorting that would be required for a bitmap heap scan.

(For the baseball cave-dwellers like myself, Barry Bonds still has the most home runs in a single season, at 73)

Salaries

For reads on a single table, the query planner had to decide between sequential, bitmap heap, and index scans. What happens if we throw a second table into the mix with a join?

First, for fun more indices:

baseball=# CREATE INDEX brbis ON batting USING btree(rbis);
baseball=# CREATE INDEX ssal ON salaries USING btree(salary);

And here’s a join across batting and salaries that will get us information on players making lots of money who haven’t completely failed to generate some RBIs in a given season:

baseball=# explain analyze select b.playerid from batting b, salaries s where b.playerid = s.playerid and b.yearid = s.yearid and s.salary > 30000000 and b.rbis > 10;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=11.71..2984.29 rows=1 width=9) (actual time=51.087..51.129 rows=1 loops=1)
   Hash Cond: (((b.playerid)::text = (s.playerid)::text) AND (b.yearid = s.yearid))
   ->  Seq Scan on batting b  (cost=0.00..2609.82 rows=36275 width=13) (actual time=0.025..42.779 rows=33921 loops=1)
         Filter: (rbis > 10)
   ->  Hash  (cost=11.68..11.68 rows=2 width=13) (actual time=0.038..0.038 rows=1 loops=1)
         ->  Index Scan using ssal on salaries s  (cost=0.00..11.68 rows=2 width=13) (actual time=0.031..0.033 rows=1 loops=1)
               Index Cond: (salary > 30000000::double precision)
 Total runtime: 51.231 ms
(8 rows)

Postgres thinks the smartest thing to do is an index scan on ssal because of the high salary selectivity and a sequential scan on rbis because of the low RBI selectivity. Then do a hash join on the two tables and spit out the rows matching the filters for a snappy runtime of 51 milliseconds.

What happens if we take away the query planner’s ability to use a hash join?

baseball=# set enable_hashjoin to false;
SET
baseball=# explain analyze select b.playerid from batting b, salaries s where b.playerid = s.playerid and b.yearid = s.yearid and s.salary > 30000000 and b.rbis > 10;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2824.10..5005.53 rows=1 width=9) (actual time=71.004..71.075 rows=1 loops=1)
   Join Filter: (((b.playerid)::text = (s.playerid)::text) AND (b.yearid = s.yearid))
   ->  Index Scan using ssal on salaries s  (cost=0.00..11.68 rows=2 width=13) (actual time=0.039..0.048 rows=1 loops=1)
         Index Cond: (salary > 30000000::double precision)
   ->  Materialize  (cost=2824.10..3364.85 rows=36275 width=13) (actual time=0.026..63.490 rows=33921 loops=1)
         ->  Seq Scan on batting b  (cost=0.00..2609.82 rows=36275 width=13) (actual time=0.018..44.113 rows=33921 loops=1)
               Filter: (rbis > 10)
 Total runtime: 71.518 ms
(8 rows)

Without a hash join at its disposal, the query planner opts for a nested loops join. In a nested loops join, we’re basically doing:

for every row in salaries:
    for every row in batting:
        keep based on the join filter?

The materialize is like a cache of the filtered results of the inner loop, so it doesn’t have to be reevaluated after the first iteration through the outer loop. Even if it looks inefficient, it turns out that this method isn’t that much slower than a hash join.

Let’s hamstring it further by disabling nested loops joins too:

baseball=# set enable_nestloop to false;
SET
baseball=# explain analyze select b.playerid from batting b, salaries s where b.playerid = s.playerid and b.yearid = s.yearid and s.salary > 30000000 and b.rbis > 10;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5991.77..6263.62 rows=1 width=9) (actual time=374.931..374.935 rows=1 loops=1)
   Merge Cond: (((b.playerid)::text = (s.playerid)::text) AND (b.yearid = s.yearid))
   ->  Sort  (cost=5980.06..6070.74 rows=36275 width=13) (actual time=343.304..365.986 rows=26055 loops=1)
         Sort Key: b.playerid, b.yearid
         Sort Method:  external merge  Disk: 1056kB
         ->  Seq Scan on batting b  (cost=0.00..2609.82 rows=36275 width=13) (actual time=0.025..76.286 rows=33921 loops=1)
               Filter: (rbis > 10)
   ->  Sort  (cost=11.69..11.69 rows=2 width=13) (actual time=0.083..0.084 rows=1 loops=1)
         Sort Key: s.playerid, s.yearid
         Sort Method:  quicksort  Memory: 25kB
         ->  Index Scan using ssal on salaries s  (cost=0.00..11.68 rows=2 width=13) (actual time=0.045..0.046 rows=1 loops=1)
               Index Cond: (salary > 30000000::double precision)
 Total runtime: 375.410 ms
(13 rows)

Ouch. A merge join, our worst option, not only looks grossly over-complicated but is 7 times slower than the original hash join. The tuples returned from the filter on salaries are small enough to fit in memory, and so we get away with an in-memory quicksort there that is wicked fast, but there are so many RBI tuples that we’re forced to do a merge sort on disk, incurring 343.3 milliseconds, over 90% of the query time in disk seeks, page fetches, and writes.

There are even more ways to influence the query planner.

Bottom of the 9th

Well, I don’t know about you, but I’m glad there’s a query planner making all these choices for me. I’m ready to head over to Fenway, dump beer on Yankees fans, and pretend to be a sabermetrician.

Anyone have a MySQL or Oracle instance handy? How do their query plans compare for these queries?


Still rebooting? Strike three, yer out!

Why waste time, money, and sleep rebooting for security updates when you could be at a ball game? Ksplice Uptrack lets you apply Linux security updates to your running kernel without rebooting.


Special thanks to Uncle Dirtnap for his willingness to discuss query planners during a D&D game. What can I say, that’s how we roll.

The wireless traffic of MIT students

Posted in Networking on May 20th, 2010 by Jessica McKellar36 Comments

Wireless traffic is both interesting and delightfully accessible thanks to the broadcast nature of 802.11. I have spent many a lazy weekend afternoon watching my laptop, the Tivo, and the router chatting away in a Wireshark window.

As fun as the wireless traffic in one’s house may be, there’s something to be said for being able to observe a much larger ecosystem – one with more people with a more diverse set of operating systems, browsers, and intentions as they work on their wireless-enabled devices, giving rise to more interesting background and active traffic patterns and a greater set of protocols in play.

Now, it happens to be the case that sniffing other people’s wireless traffic breaks a number of federal and local laws, including wiretapping laws, and while I am interested in observing other people’s wireless traffic, I’m not interested in breaking the law. Fortunately, Ksplice is down the road from a wonderful school that fosters this kind of intellectual curiosity.

I met with MIT’s Information Services and Technology Security Team, and we came up with a plan for me to gather data that would satisfy MIT’s Athena Rules of Use. I got permission from Robert Morris and Sam Madden to monitor the wireless traffic during their Computer Systems Engineering class and made an announcement at the beginning of a class period explaining what I’d be doing. Somewhat ironically, that day’s lecture was an introduction to computer security.

Some interesting results from the data set collected are summarized below. Traffic was gathered with tcpdump on my laptop as I sat in the middle of the classroom. The data was imported into Wireshark, spit back out as a CSV, and imported into a sqlite database for aggregation queries, read back into tcpdump and filtered there, or hacked up with judicious use of grep, as different needs arose.

Protocol # Packets % Packets
MDNS 259932 30.46
TCP 245268 28.74
ICMPv6 116167 13.61
TPKT 78311 9.18
SSDP 31441 3.68
HTTP 28027 3.28
UDP 17006 1.99
LLMNR 16991 1.99
TLSv1 14390 1.69
DHCPv6 11572 1.36
DNS 10870 1.27
SSH 8804 1.03
SSLv3 3094 0.36
Jabber 2507 0.29
ARP 2003 0.23
SSHv2 1503 0.18
IGMP 1309 0.15
SNMP 1232 0.14
NBNS 619 0.073
WLCCP 513 0.060
AIM Buddylist 265 0.031
NTP 245 0.029
HTTP/XML 218 0.026
MSNMS 192 0.022
IP 139 0.016
AIM 121 0.014
SSL 105 0.012
IPv6 90 0.011
AIM Generic 84 0.0098
DHCP 64 0.0075
ICMP 60 0.0070
X.224 59 0.0069
AIM SST 45 0.0053
AIM Messaging 39 0.0046
BROWSER 37 0.0043
YMSG 35 0.0041
AARP 18 0.0021
OCSP 16 0.0019
AIM SSI 11 0.0013
SSLv2 8 0.00094
T.125 5 0.00059
AIM Signon 4 0.00047
NBP 4 0.00047
AIM BOS 3 0.00035
AIM ChatNav 3 0.00035
AIM Stats 3 0.00035
AIM Location 2 0.00023
ZIP 2 0.00023

Basic statistics
Time spent capturing: 45 minutes
Packets captured: 853436
Number of traffic sources in the room: 21
Number of distinct source and destination IPv4 and IPv6 addresses: 5117
Number of “active” traffic addresses (eg using HTTP, SSH, not background traffic): 581
Number of protocols represented: 48 (note that Wireshark buckets based on the top layer for a packet, so for example TCP is in this count because someone was sending actual TCP traffic without an application layer on top and not because TCP is the transport protocol for HTTP, which is also in this count). These protocols and how much traffic was sent over them are in the table on the left.

IPv4 v. IPv6
Number of IPv4 packets: 580547
Number of IPv6 packets: 270351
2.15 IPv4 packets were sent for every IPv6 packet. IPv6 was only used for background traffic, serving as the internet layer for the following protocols: DHCPv6, DNS, ICMPv6, LLMNR, MDNS, SSDP, TCP, and UDP. The TCP over IPv6 packets were all icslap, ldap, or wsdapi communications between our Windows user discussed below and his or her remote desktop. The UDP over IPv6 packets were all ws-discovery communications, part of a local multicast discovery protocol most likely being used by the Windows machines in the room.

ICMP v. ICMPv6
Number of ICMP packets: 60
Number of ICMPv6 packets: 116167
1936.12 ICMPv6 packets were sent for every ICMP packet. The reason is that ICMPv6 is doing IPv6 work that is taken care of by other link layer protocols in IPv4. Looking at the ICMP and ICMPv6 packets by type:

ICMP Type/Code Pkts
Dest unreachable (Host administratively prohibited) 1
Dest unreachable (Port unreachable) 35
Echo (ping) request 9
Time-to-live exceeded in transit 15

ICMPv6 Type/Code Pkts
Echo request 8
Multicast Listener Report msg v2 7236
Multicast listener done 86
Multicast listener report 548
Neighbor advertisement 806
Neighbor solicitation 105710
Redirect 353
Router advertisement 461
Router solicitation 956
Time exceeded (In-transit) 1
Unknown (0×40) (Unknown (0×87)) 1
Unreachable (Port unreachable) 1

These ICMPv6 packets are mostly doing Neighbor Discover Protocol (NDP) and Multicast Listener Discovery (MLD) work. NDP handles router and neighbor solicitation and is similar to ARP and ICMP under IPv4, and MLD handles multicast listener discovery similar to IGMP under IPv4.

TCP v. UDP
Number of TCP packets: 383122
Number of UDP packets: 350067
1.09 TCP packets were sent for every UDP packet. I would have thought TCP would be a clear winner, but given that MDNS traffic, which is over UDP, makes up over 30% of the packets captured, I guess this isn’t surprising. The 14% of packets unaccounted for at the transport layer are mostly ARP and ICMP traffic. See also this post.

Instant Messenging
Awesomely, AIM, Jabber, MSN Messenger, and Yahoo! Messenger were all represented in the traffic:

Participants # Packets
AIM 22 580
Jabber 6 2507
YMSG 4 35
MSNMS 3 192

AIM is the clear favorite (at least with this small sample size). Note that Jabber has about 1/4th the AIM participants but 4x the number of packets. Either the Jabberers are extra chatty, or the fact that Jabber is an XML-based protocol inflates the size of a conversation dramatically on the wire. Note that some IM traffic (like Google Chat) might have instead been bucketed as HTTP/XML by Wireshark.

That Windows Remote Desktop Person
119489 packets, or 14% of the traffic, were between a computer in the classroom and what is with high probability a Windows machine on campus running the Microsoft Remote Desktop Protocol (see also this support page for a discussion of the protocol specifics).
RDP traffic from client to remote desktop: T.125, TCP, TPKT, X.224
RDP traffic from remote desktop to client: TCP, TPKT
Most of the traffic is T.125 payload packets. TPKTs encapulate transport protocol data units (TPDUs) in the ISO Transport Service on top of TCP. TPKT traffic was all “Continuation” traffic. X.224 transmits status and error codes. TCP “ms-wbt-server” traffic to port 3389 on the remote machine seals the deal on this being an RDP setup.

Security
All SSH and SSHv2 traffic was to Linerva, a public-access Linux server run by SIPB for the MIT community, except for one person talking to a personal server on campus.

Protocol # Packets % Packets
TLSv1 14390 81.8
SSLv3 3094 17.6
SSL 105 .59
SSLv2 8 .045

5 clients were involved with negotiations with SSLv2, which is insecure and disabled by default on most browsers and never got past a “Client Hello”.

HTTP
I wanted to be able to answer questions like “what were the top 20 most visited website” in this traffic capture. The proliferation of content distribution networks makes it harder to track all traffic associated with a popular website by IP addresses or hostnames. I ended up doing a crude but quick grep "Host:" pcap.plaintext | sort | uniq -c | sort -n -r on the expanded plaintext version of the data exported from Wireshark, which gives the most visited hosts based on the number of GET requests. The top 20 most visited hosts by that method were:

Rank GETs Host Rank GETs Host
1 336 www.blogcdn.com 11 88 newsimg.bbc.co.uk
2 229 www.blogsmithmedia.com 12 87 www.google.com
3 211 assets.speedtv.com 13 66 sih.onemadogre.com
4 167 profile.ak.fbcdn.net 14 66 ensidia.com
5 149 images.hardwareinfo.net 15 64 student.mit.edu
6 114 static.ensidia.com 16 61 s0.2mdn.net
7 113 www.facebook.com 17 57 alum.mit.edu
8 111 www.blogsmithcdn.com 18 56 www.wired.com
9 93 ad.doubleclick.net 19 56 cdn.eyewonder.com
10 90 static.mmo-champion.com 20 55 www.google-analytics.com

Alas, pretty boring. The blogcdn, blogsmith and eyewonder hosts are all for Engadget, and fbcdn is part of Facebook. I’ll admit that I’d been a little hopeful that some enterprising student would try to screw up my data by scripting visits to a bunch of porn sites or something. CDNs dominate the top 20, and in fact almost all of the roughly 410 web server IP addresses gathered are for CDNs. Akamai led with 39 distinct IPs, followed by Amazon AWS with 23, and Facebook and Panther CDN with 16, with many more at lower numbers.

Wrap-up
Using the Internet means a lot more than HTTP traffic! 45 minutes of traffic capture gave us 48 protocols to explore. Most of the captured traffic was background traffic, and in particular discovery traffic local to the subnet.

Sniffing wireless traffic (legally) is a great way to learn about networking protocols and the way the Internet works in practice. It can also be incredibly useful for debugging networking problems.

What are some of your fun or awful networking experiences?


Ksplice Uptrack

Like what you see? Subscribe to our blog! Brought to you by the fine folks at Ksplice, bringing you rebootless kernel updates with Ksplice Uptrack.

Dating is rough at the transport layer

Posted in Fun on April 22nd, 2010 by Jessica McKellar16 Comments

Love can be unreliable, even when you use TCP


Ksplice Uptrack

Like what you see? Subscribe to our blog! Brought to you by the fine folks at Ksplice, bringing you rebootless kernel updates with Ksplice Uptrack.

Hello from a libc-free world! (Part 2)

Posted in computer architecture on April 6th, 2010 by Jessica McKellar13 Comments

In the previous post we conquered compilation by constructing a small program that can be compiled without using libc. Understanding object code and the details of an ELF executable are the next step in our adventure.

We left off with the following program pieces:

jesstess@kid-charlemagne:~$ cat stubstart.S
.globl _start

_start:
	call main
	movl $1, %eax
	xorl %ebx, %ebx
	int $0x80
jesstess@kid-charlemagne:~$ cat hello.c
int main()
{
  char *str = "Hello World";
  return 0;
}
jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S hello.c -o hello

What did all that work get us?

jesstess@kid-charlemagne:~/c$ wc -c hello
1373 hello
jesstess@kid-charlemagne:~$ objdump -D hello | wc -l
93

We’re down to a little over 1300 bytes of executable and what at under 100 lines seems like a very reasonable amount of assembly to dissect. Since no little bit of assembly is going to scare us at this point, let’s look at the assembly now, with objdump -D so we see the assembly for all sections (output here). If it looks intimidating at first, just give it a quick once-over and I promise it won’t be by the end of this post.

Alright, we have 5 sections: .text, which contains the familiar _start and main symbols, .rodata, .eh_frame_hdr, .eh_frame, and .comment.

Step 1: Back up – what the heck is a “section”?

If we dust off our favorite copy of the Tool Interface Standard ELF Specification and have a look inside, it tells us this:

An ELF executable like the result of our compilation has two views: it has a program header describing the segments, which contain information used at run-time, and a section header describing the sections, which contain information for linking and relocation. We can look at the program header’s segment information or the section header’s section information with readelf -l or readelf -S, respectively (output here). The output from these commands on our program is summarized in Figure 1. We won’t worry about segments again during this post.

ELF segments and sections

Figure 1: our ELF segments and sections

Step 2: What goes in our sections?

The specification also tells us what goes where in our executable:

.text: The executable instructions for a program.

.rodata: Constant data. This is the “read-only data” segment.

.eh_frame: Information necessary for frame-unwinding during exception handling.

.eh_frame_hdr: To quote the Linux Standard Base Specification: “This section contains a pointer to the .eh_frame section which is accessible to the runtime support code of a C++ application. This section may also contain a binary search table which may be used by the runtime support code to more efficiently access records in the .eh_frame section.”

We don’t have to worry about exceptions with this example, so .eh_frame and .eh_frame_hdr aren’t doing much that we care about, and on this machine, compiling with -fno-asynchronous-unwind-tables will suppress creation of these two sections.

.comment: Compiler version information.

Speaking of getting rid of sections: for those of us with a minimalist aesthetic, strip(1) is our friend. We can --remove-section on non-essential sections like .comment to get rid of them entirely; file(1) will tell us if an ELF executable has been stripped.

Other common sections we don’t see with our example because they’d be empty:

.data: Initialized global variables and initialized static local variables.

.bss: Uninitialized global and local variables; filled with zeroes. A popular section to bring up during CS interviews!

That’s the story on sections. Now, we know that symbols, like _start and main, live in these sections, but are there any more symbols in this program?

Step 3: Understand the symbols and why they live where they live.

We can get symbol information for our executable with objdump -t:

jesstess@kid-charlemagne:~/c$ objdump -t hello

hello:     file format elf64-x86-64

SYMBOL TABLE:
00000000004000e8 l    d  .text                   0000000000000000 .text
0000000000400107 l    d  .rodata                 0000000000000000 .rodata
0000000000400114 l    d  .eh_frame_hdr           0000000000000000 .eh_frame_hdr
0000000000400128 l    d  .eh_frame               0000000000000000 .eh_frame
0000000000000000 l    d  .comment                0000000000000000 .comment
0000000000000000 l    df *ABS*                   0000000000000000 hello.c
00000000004000e8 g       .text                   0000000000000000 _start
0000000000600fe8 g       *ABS*                   0000000000000000 __bss_start
00000000004000f4 g     F .text                   0000000000000013 main
0000000000600fe8 g       *ABS*                   0000000000000000 _edata
0000000000600fe8 g       *ABS*                   0000000000000000 _end

The symbol table for our executable has 11 entries. Weirdly, only rare versions of the objdump man page, like this one, will actually explain the symbol table column by column. It breaks the table down as follows:

Column 1: the symbol’s value/address.
Column 2: a set of characters and spaces representing the flag bits set for the symbol. There are 7 groupings, three of which are represented in this symbol table. The first can be l, g, <space>, or !, if the symbol is local, global, neither, or both, respectively. The sixth can be d, D, or <space>, for debugging, dynamic, or normal, respectively. The seventh can be F, f, O, or <space>, for function, file, object, or normal symbol, respectively. Descriptions of the 4 remaining grouping can be found in that unusally comprehensive objdump manpage.
Column 3: which section the symbol lives in. *ABS*, or absolute, means the symbol is not associated with a certain section.
Column 4: the symbol’s size/alignment.
Column 5: the symbol’s name.

Our 5 sections all have associated local (l) debugging (d) symbols. main is indeed a function (F), and hello.c is in fact a file (f), and it isn’t associated with any particular section (*ABS*). _start and main are part of the executable instructions for our program and thus live in the .text section as we’d expect. The only oddities here are __bss_start, _edata, and _end, all *ABS*olute, global symbols that we certainly didn’t write into our program. Where did they come from?

The culprit this time is the linker script. gcc implicitly called ld to do the linking on this machine as part of the compilation process. ld --verbose will spit out the linker script that was used, and looking at this script (output here) we see that _edata is defined as the end of the .data section, and __bss_start and _end mark the beginning and end of the .bss section. These symbols could be used by memory management schemes (for example if sbrk wants to know where the heap could start) and garbage collectors.

Note that str, our initialized local variable, doesn’t show up in the symbol table. Why? Because it gets allocated on the stack (or possibly in a register) at runtime. However, something related to str is in the .rodata section, even though we don’t see it in the symbol table…

With char *str = "Hello, World"; we’re actually creating two different objects. The first is the string literal “Hello, World”, which is just that array of characters, and has some address but no explicit way of naming it. That array is read-only and lives in .rodata. The second is the local variable str, which is of type “pointer to char”. That is what lives on the stack. Its initial value is the address of the string literal that was created.

We can prove this, and see some other useful information, by looking at the contents of our sections with the strings decoded:

jesstess@kid-charlemagne:~$ objdump -s hello

hello:     file format elf64-x86-64

Contents of section .text:
 4000e8 e80b0000 00b80100 000031db cd809090  ..........1.....
 4000f8 554889e5 48c745f8 0b014000 b8000000  UH..H.E...@.....
 400108 00c9c3                               ...
Contents of section .rodata:
 40010b 48656c6c 6f20576f 726c6400           Hello World.
Contents of section .eh_frame_hdr:
 400118 011b033b 14000000 01000000 e0ffffff  ...;............
 400128 30000000                             0...
Contents of section .eh_frame:
 400130 14000000 00000000 017a5200 01781001  .........zR..x..
 400140 030c0708 90010000 1c000000 1c000000  ................
 400150 f8004000 13000000 00410e10 8602430d  ..@......A....C.
 400160 06000000 00000000                    ........
Contents of section .comment:
 0000 00474343 3a202855 62756e74 7520342e  .GCC: (Ubuntu 4.
 0010 332e332d 35756275 6e747534 2920342e  3.3-5ubuntu4) 4.
 0020 332e3300                             3.3. 

Voila! Our “Hello World” string is in .rodata, and our .comment section is now explained: it just holds a string with the gcc version used to compile the program.

Step 4: Trim the fat and put it all together

This executable has 5 sections: .text, .rodata, .eh_frame_hdr, .eh_frame, and .comment. Really, only one of them, .text, has assembly that’s germane to what this little program does. This can be confirmed by doing an objdump -d (only disassemble those sections which are expected to contain instructions) instead of the objdump -D (disassemble the contents of all sections, not just those expected to contain instructions) done at the beginning of the post and noting that only the content of .text is displayed.

.rodata really only contains the string “Hello World”, and .comment really only contains a gcc version string. The “instructions” for those sections seen in the objdump -D output come from objdump treating the hexadecimal representations of the ASCII characters in those strings as instructions and trying to disassemble them. We can convert the first couple of numbers in the .comment section to ASCII characters to prove this. In Python:

>>> "".join(chr(int(x, 16)) for x in "47 43 43 3a 20 28 55 62 75 6e 74 75".split())
'GCC: (Ubuntu'

In .text, _start calls main, and in main a pointer to the memory location where “Hello World” is stored, 0x40010b (where .rodata starts, as seen in the obdjump -D output), is pushed onto the stack. We then return from main to _start, which takes care of returning from the program, as described in Part I.

And that’s everything! All sections and symbols are accounted for. Nothing is magic (and I mean magic in a good I-would-ace-this-test way, not a sorry-Jimmy-Santa-isn’t-real way). Whew.

Looking at and really understanding the core parts of an ELF executable means that we can add complexity now without cheating our way around parts we don’t understand. To that end, stay tuned for Part 3, where we’ll stuff this program with a veritable variable smörgåsbord and see where everything ends up in the program’s memory.


Ksplice Uptrack: What Understanding ELF Sections Can Do For You

Ksplice Uptrack lets you apply Linux security updates to your running kernel without rebooting. Say goodbye to wasted time, money, and sleep for scheduled downtimes and try it today.

Hello from a libc-free world! (Part 1)

Posted in computer architecture on March 16th, 2010 by Jessica McKellar104 Comments

As an exercise, I want to write a Hello World program in C simple enough that I can disassemble it and be able to explain all of the assembly to myself.

This should be easy, right?

This adventure assumes compilation and execution on a Linux machine. Some familiarity with reading assembly is helpful.

Here’s our basic Hello World program:

jesstess@kid-charlemagne:~/c$ cat hello.c
#include <stdio.h>

int main()
{
  printf("Hello World\n");
  return 0;
}

Let’s compile it and get a bytecount:

jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10931 hello

Yikes! Where are 11 Kilobytes worth of executable coming from? objdump -t hello gives us 79 symbol-table entries, most of which we can blame on our using the standard library.

So let’s stop using it. We won’t use printf so we can get rid of our include file:

jesstess@kid-charlemagne:~/c$ cat hello.c
int main()
{
  char *str = "Hello World";
  return 0;
}

Recompiling and checking the bytecount:

jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10892 hello

What? That barely changed anything!

The problem is that gcc is still using standard library startup files when linking. Want proof? We’ll compile with -nostdlib, which according to the gcc man page won’t “use the standard system libraries and startup files when linking. Only the files you specify will be passed to the linker”.

jesstess@kid-charlemagne:~/c$ gcc -nostdlib -o hello hello.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8

Well, it’s just a warning; let’s check it anyway:

jesstess@kid-charlemagne:~/c$ wc -c hello
1329 hello

That looks pretty good! We got our bytecount down to a much more reasonable size (an order of magnitude smaller!)…

jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault

…at the expense of segfaulting when it runs. Hrmph.

For fun, let’s get our program to be actually runnable before digging into the assembly.

So what is this _start entry symbol that appears to be required for our program to run? Where is it usually defined if you’re using libc?

From the perspective of the linker, by default _start is the actual entry point to your program, not main. It is normally defined in the crt1.o ELF relocatable. We can verify this by linking against crt1.o and noting that _start is now found (although we develop other problems by not having defined other necessary libc startup symbols):

# Compile the source files but don't link
jesstess@kid-charlemagne:~/c$ gcc -Os -c hello.c
# Now try to link
jesstess@kid-charlemagne:~/c$ ld /usr/lib/crt1.o -o hello hello.o
/usr/lib/crt1.o: In function `_start':
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:106: undefined reference to `__libc_csu_fini'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:107: undefined reference to `__libc_csu_init'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:113: undefined reference to `__libc_start_main'

This check conveniently also tells us where _start lives in the libc source: sysdeps/x86_64/elf/start.S for this particular machine. This delightfully well-commented file exports the _start symbol, sets up the stack and some registers, and calls __libc_start_main. If we look at the very bottom of csu/libc-start.c we see the call to our program’s main:

/* Nothing fancy, just call the function.  */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);

and down the rabbit hole we go.

So that’s what _start is all about. Conveniently, we can summarize what happens between _start and the call to main as “set up a bunch of stuff for libc and then call main”, and since we don’t care about libc, let’s just export our own _start symbol that just calls main and link against that:

jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start

_start:
	call main

Compiling and running with our stub _start assembly file:

jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault

Hurrah, our compilation problems go away! However, we still segfault. Why? Let’s compile with debugging information and take a look in gdb. We’ll set a breakpoint at main and step through until the segfault:

jesstess@kid-charlemagne:~/c$ gcc -g -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ gdb hello
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) break main
Breakpoint 1 at 0x4000f4: file hello.c, line 3.
(gdb) run
Starting program: /home/jesstess/c/hello

Breakpoint 1, main () at hello.c:5
5	  char *str = "Hello World";
(gdb) step
6	  return 0;
(gdb) step
7	}
(gdb) step
0x00000000004000ed in _start ()
(gdb) step
Single stepping until exit from function _start,
which has no line number information.
main () at helloint.c:4
4	{
(gdb) step

Breakpoint 1, main () at helloint.c:5
5	  char *str = "Hello World";
(gdb) step
6	  return 0;
(gdb) step
7	}
(gdb) step

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()
(gdb)

Wait, what? Why are we running through main twice? …It’s time to look at the assembly:

jesstess@kid-charlemagne:~/c$ objdump -d hello

hello:     file format elf64-x86-64

Disassembly of section .text:

00000000004000e8 <_start>:
  4000e8:	e8 03 00 00 00       	callq  4000f0
  4000ed:	90                   	nop
  4000ee:	90                   	nop
  4000ef:	90                   	nop    

00000000004000f0 :
  4000f0:	55                   	push   %rbp
  4000f1:	48 89 e5             	mov    %rsp,%rbp
  4000f4:	48 c7 45 f8 03 01 40 	movq   $0x400103,-0x8(%rbp)
  4000fb:	00
  4000fc:	b8 00 00 00 00       	mov    $0x0,%eax
  400101:	c9                   	leaveq
  400102:	c3                   	retq

D’oh! Let’s save a detailed examination of the assembly for later, but in brief: when we return from the callq to main we hit some nops and run right back into main. Since we re-entered main without putting a return instruction pointer on the stack as part of the standard prologue for calling a function, the second call to retq tries to pop a bogus return instruction pointer off the stack and jump to it and we bomb out. We need an exit strategy.

Literally. After the return from callq, push 1, the syscall number for SYS_exit, into %eax, and because we want to say that we’re exiting cleanly, put a status of 0, SYS_exit‘s only argument, into %ebx. Then make the interrupt to drop into the kernel with int $0x80.

jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start

_start:
	call main
	movl $1, %eax
	xorl %ebx, %ebx
	int $0x80
jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
jesstess@kid-charlemagne:~/c$

Success! It compiles, it runs, and if we step through this new version under gdb it even exits normally.

Hello from a libc-free world!

Stay tuned for Part 2, where we’ll walk through the parts of the executable in earnest and watch what happens to it as we add complexity, in the process understanding more about x86 linking and calling conventions and the structure of an ELF binary.


Ksplice Uptrack: What Understanding Object Code Can Do For You

Ksplice Uptrack lets you apply Linux security updates to your running kernel without rebooting. Say goodbye to wasted time, money, and sleep for scheduled downtimes and try it today.