Sounds like you may have a bit of a fill-rate problem. Fill rate can be dramatically impacted by clock speeds. While most motherboard/CPU configurations will perform about the same, graphics cards vary WIDELY. Just because your friend has the same 900 series card doesn't mean your core, shader, and memory clocks are all running at the same speeds, and there are also differences in the width of the memory data path (128, 192, or 256 bit), memory density (how many chips, and how many rows/columns each), and refresh/latency timings (CAS 7, 8, on up to 11 clock cycles of delay) that come into play. All of these things add up to produce different fill rates even at the same clock speeds.
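To see how directly clock speed drives fill rate, here's a rough sketch. Peak pixel fill rate is just the number of ROPs (render output units) times the core clock; the unit counts and clocks below are hypothetical round numbers, not the specs of any particular 900 series card.

```python
# Peak pixel fill rate = ROPs * core clock. The 32-ROP count and the
# two clock speeds are illustrative placeholders, not real card specs.
def pixel_fill_rate_gpix(rops, core_clock_mhz):
    """Peak pixel fill rate in GPixels/sec."""
    return rops * core_clock_mhz / 1000.0

# Same GPU design, two different factory core clocks:
print(pixel_fill_rate_gpix(32, 1100))  # 35.2 GPixels/sec
print(pixel_fill_rate_gpix(32, 1250))  # 40.0 GPixels/sec
```

Same silicon, but the higher-clocked card pushes measurably more pixels per second, which is why two "identical" model numbers can benchmark differently.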
Here's a pretty simplified example. Your motherboard has a base clock for both the memory and the CPU, which then gets multiplied to produce the CPU's rated clock speed...let's say 3GHz. Changing these base speeds and/or the multipliers can have a big impact on performance, even though you may be staying right at that same 3GHz overall speed: 3GHz from a 100MHz base (30x multiplier) vs. ~3GHz from a 133MHz base (22.5x multiplier). The latter will have a much better response time to anything you do, because the base bus speed that is driving all transfers is running at 133MHz versus 100MHz, so you will be feeding data into the CPU ~33% faster. That includes getting all the data to your graphics card initially, and then all communication back and forth is sped up as well. Note that your system memory has a similar mechanic: your memory could be running against base clocks anywhere from 200 to 333MHz (theoretically up to 400MHz), which then get multiplied to reach their rated speeds.
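The base-clock-times-multiplier math above can be sketched in a few lines; these are the same illustrative figures from the example, not readings from any specific board.

```python
# Rated CPU clock = base clock * multiplier. Two different base/multi
# combos can land at roughly the same rated speed, but the higher base
# clock moves data faster on every bus it drives.
def cpu_clock_mhz(base_mhz, multiplier):
    return base_mhz * multiplier

print(cpu_clock_mhz(100, 30))    # 3000 MHz  -> 3 GHz
print(cpu_clock_mhz(133, 22.5))  # 2992.5 MHz -> ~3 GHz

# Nearly identical rated speeds, but the bus feeding data is ~33% faster:
print(round(133 / 100 - 1, 2))   # 0.33
```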
Now look specifically at your graphics card. If it has a wider data path (256 bit vs. 128 bit), it can move up to twice the amount of data per clock cycle. If that memory is also clocked higher or has a lower latency (faster cycle time), the differences between a 128 bit and a 256 bit pipeline compound dramatically, simply because the amount of data sent per clock cycle has doubled. Note that at the top end, you can have cards with 384 and 512 bit interfaces. Just look at some basic math: at a 1GHz effective memory clock, you could be looking at a base of almost 15GB/sec at 128 bit vs. 29.8GB/sec at 256 bit, 44.7GB/sec at 384 bit, or a whopping 59.6GB/sec at 512 bit. That is a big difference in how fast you can feed data to the GPU.
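Those bandwidth numbers fall straight out of bus width times clock speed. This sketch assumes a 1GHz effective memory clock and reports binary gigabytes (GiB), which reproduces the figures quoted above:

```python
# Peak memory bandwidth = (bus width in bytes) * (effective clock).
# Assumes a 1 GHz effective memory clock; results are in GiB/sec.
def bandwidth_gib_s(bus_width_bits, clock_hz=1_000_000_000):
    bytes_per_cycle = bus_width_bits / 8
    return bytes_per_cycle * clock_hz / 2**30

for width in (128, 256, 384, 512):
    print(f"{width:3d}-bit: {bandwidth_gib_s(width):.1f} GB/sec")
# 128-bit: 14.9 GB/sec
# 256-bit: 29.8 GB/sec
# 384-bit: 44.7 GB/sec
# 512-bit: 59.6 GB/sec
```

Doubling the bus width doubles bandwidth at the same clock, so a wide-bus card with modest clocks can still outrun a narrow-bus card with faster memory.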
This is why it is so important to compare the specs on a graphics card. If it looks like too good of a deal, there may be a reason why: compare the particulars on the clock rates and memory details.
CPUID and TechPowerUp have some simple free tools you can download that will show you these specs for your system. Might be worth pulling them up to see just how you are configured and compare against the reference numbers for your card's line; there may be some simple tweaking that can be done to improve your fill rate. CPU-Z is a simple tool for details on your CPU and memory, and their HWMonitor is a nice touch for snapshots of temps. TechPowerUp's GPU-Z is a great tool for monitoring all of this on your graphics card as well.