News
Petabyte Size Data Store Managed by Hadoop & Map Reduce.
Hadoop
——–
Source : http://hadoop.apache.org/ & www.
Today, we’re surrounded by data. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Machines, too, are generating and keeping more and more data. You may even be reading this book as digital data on your computer screen, and certainly your purchase of this book is recorded as data with some retailer.
The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. They needed to go through terabytes and petabytes of data to figure out which websites were popular, what books were in demand, and what kinds of ads appealed to people. Existing tools were becoming inadequate to process such large data sets. Google was the first to publicize MapReduce—a system they had used to scale their data processing needs.
This system aroused a lot of interest because many other businesses were facing similar scaling challenges, and it wasn’t feasible for everyone to reinvent their own proprietary tool. Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system called Hadoop. Soon after, Yahoo and others rallied around to support this effort.
What is Hadoop?
————–
Hadoop is an open source framework for writing and running distributed applications that process large amounts of data. Distributed computing is a wide and varied field, but the key distinctions of Hadoop are that it is:
1. Accessible: Hadoop runs on large clusters of commodity machines or on cloud computing services such as Amazon’s Elastic Compute Cloud (EC2).
2. Robust: Because it is intended to run on commodity hardware, Hadoop is architected with the assumption of frequent hardware malfunctions. It can gracefully handle most such failures.
3. Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
4. Simple: Hadoop allows users to quickly write efficient parallel code.
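To make the "simple parallel code" point concrete, here is a minimal single-machine sketch of the MapReduce programming model in plain Python. It does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. On a real cluster the framework would run the map and reduce steps on many nodes and handle the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The user only writes the map and reduce functions; because each call works on one line or one key independently, the same logic can be spread across more nodes as the data grows.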
Comparing SQL databases and Hadoop:
————————————
Given that Hadoop is a framework for processing data, what makes it better than standard relational databases, the workhorse of data processing in most of today’s applications? One reason is that SQL (structured query language) is by design targeted at structured data. Many of Hadoop’s initial applications deal with unstructured data such as text. From this perspective Hadoop provides a more general paradigm than SQL.
For working only with structured data, the comparison is more nuanced. In principle, SQL and Hadoop can be complementary, as SQL is a query language which can be implemented on top of Hadoop as the execution engine. But in practice, SQL databases tend to refer to a whole set of legacy technologies, with several dominant vendors, optimized for a historical set of applications. Many of these existing commercial databases are a mismatch to the requirements that Hadoop targets.
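As a rough illustration of how a SQL query can be expressed as map and reduce steps, the sketch below computes the same GROUP BY total two ways: once with SQLite from Python's standard library, and once as explicit map, shuffle, and reduce steps. The sales table, its columns, and the sample rows are invented for this example, and the code runs on one machine rather than on Hadoop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical structured records: (region, sale_amount).
SALES = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 4)]


def totals_with_sql(rows):
    """Declarative version: let a SQL engine do the grouping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    )
    conn.close()
    return result


def totals_with_mapreduce(rows):
    """The same query written as explicit map, shuffle, and reduce steps."""
    # Map: emit (key, value) pairs; here each row already is such a pair.
    mapped = ((region, amount) for region, amount in rows)
    # Shuffle: collect all values that share a key.
    grouped = defaultdict(list)
    for region, amount in mapped:
        grouped[region].append(amount)
    # Reduce: aggregate each key's values.
    return {region: sum(amounts) for region, amounts in grouped.items()}


if __name__ == "__main__":
    print(totals_with_sql(SALES))
    print(totals_with_mapreduce(SALES))
```

Both functions return the same per-region totals, which is the sense in which a SQL layer can sit on top of a MapReduce-style execution engine.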
Some production implementations of Hadoop:
——————————————————
Complete List @ http://wiki.apache.org/hadoop/PoweredBy
Sybase IQ
———
Sybase IQ : http://www.computerworld.com/s/article/9221355/Updated_Sybase_IQ_supports_Hadoop_MapReduce_Big_Data_
eBay
—-
532-node cluster (8 * 532 cores, 5.3 PB).
Heavy usage of Java MapReduce, Pig, Hive, HBase
Using it for Search optimization and Research.
Facebook
——-
We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
Currently we have 2 major clusters:
A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
A 300-machine cluster with 2400 cores and about 3 PB raw storage.
Each (commodity) node has 8 cores and 12 TB of storage.
We are heavy users of both streaming and the Java APIs. Using these features, we have built a higher-level data warehousing framework called Hive (see http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
LinkedIn
———
We have multiple grids divided up based upon purpose.
Hardware:
120 Nehalem-based Sun x4275, with 2×4 cores, 24GB RAM, 8x1TB SATA
580 Westmere-based HP SL 170x, with 2×4 cores, 24GB RAM, 6x2TB SATA
1200 Westmere-based SuperMicro X8DTT-H, with 2×6 cores, 24GB RAM, 6x2TB SATA
Software:
CentOS 5.5 -> RHEL 6.1
Sun JDK 1.6.0_14 -> Sun JDK 1.6.0_20 -> Sun JDK 1.6.0_26
Apache Hadoop 0.20.2+patches -> Apache Hadoop 0.20.204+patches
Pig 0.9 heavily customized
Azkaban for scheduling
Hive, Avro, Kafka, and other bits and pieces…
Twitter
——–
We use Hadoop to store and process tweets, log files, and many other types of data generated across Twitter. We use Cloudera’s CDH2 distribution of Hadoop, and store all data as compressed LZO files.
We use both Scala and Java to access Hadoop’s MapReduce APIs.
We use Pig heavily for both scheduled and ad-hoc jobs, due to its ability to accomplish a lot with few statements.
We employ committers on Pig, Avro, Hive, and Cassandra, and contribute much of our internal Hadoop work to open source (see hadoop-lzo).
For more on our use of Hadoop, see the following presentations: Hadoop and Pig at Twitter and Protocol Buffers and Hadoop at Twitter
Yahoo!
——–
More than 100,000 CPUs in >40,000 computers running Hadoop
Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)
Used to support research for Ad Systems and Web Search
Also used to do scaling tests to support development of Hadoop on larger clusters
Our Blog – Learn more about how we use Hadoop.
>60% of Hadoop Jobs within Yahoo are Pig jobs.
Multi-Path Replication (MPR) technology : Replication Server 15.7
The imminent release of Replication Server 15.7 continues to push the envelope and maintain its leading edge by introducing new Multi-Path Replication (MPR) technology.
So, what is MPR? MPR improves replication performance and reduces latency by enabling parallel paths of data from the source database to the target database. These parallel paths will process data independently of each other to improve overall efficiency, performance and load balancing.
Full Source @ http://blogs.sybase.com/zhangb/2011/12/replication-server-improves-performance-and-reduces-latency-with-mpr/#respond
Note:
What about the order of transactions, which must be maintained on the target side?
Transactions may arrive at the target rapidly, but they must still be applied in order.
Commit order is maintained within a single path. To increase performance on a single path, one can employ the parallel DSI, bulk copy and HVAR features that RS introduced in earlier releases. To take advantage of MPR, users need to fully understand the application schema in order to divide it among paths, because commit order is not guaranteed across paths.
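The ordering rule can be sketched with a small simulation. This is not Replication Server code or configuration; the table names, path names, and the PATH_FOR_TABLE mapping are hypothetical. Each path is modelled as a FIFO queue, and the target drains the paths in an arbitrary interleaving, so order holds inside a path but not between paths.

```python
import random
from collections import defaultdict

# Hypothetical mapping of tables to replication paths, chosen by the user.
PATH_FOR_TABLE = {"orders": "path_a", "order_items": "path_a", "audit_log": "path_b"}


def route(transactions):
    """Split a commit-ordered stream into one FIFO queue per path."""
    paths = defaultdict(list)
    for txn in transactions:
        paths[PATH_FOR_TABLE[txn["table"]]].append(txn)
    return paths


def apply_at_target(paths, seed=0):
    """Drain the paths in an arbitrary interleaving; each path stays FIFO."""
    rng = random.Random(seed)
    queues = {name: list(txns) for name, txns in paths.items()}
    applied = []
    while any(queues.values()):
        name = rng.choice([n for n, q in queues.items() if q])
        applied.append(queues[name].pop(0))
    return applied


def commit_ids(applied, path):
    """Return the commit ids applied through one path, in apply order."""
    return [t["id"] for t in applied if PATH_FOR_TABLE[t["table"]] == path]


if __name__ == "__main__":
    tables = ["orders", "audit_log", "order_items", "audit_log", "orders"]
    source = [{"id": i, "table": t} for i, t in enumerate(tables, start=1)]
    applied = apply_at_target(route(source))
    print("path_a:", commit_ids(applied, "path_a"))
    print("path_b:", commit_ids(applied, "path_b"))
```

Tables whose changes must be applied in relative order (here, orders and order_items) are placed on the same path, which is why the schema has to be understood before dividing it.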
ASE database for SAP ERP
Hello all,
This is a compilation of facts about Sybase ASE on SAP from Rob’s blog:
Read Full Story : http://blogs.sybase.com/database/2011/12/so-what-does-an-ase-database-look-like-in-sap-erp/
- SAP has released Business Suite on ASE version 15.7.
- All SAP application data resides in a single ASE database. There is another small database for use by SAP tools.
- The ASE database uses a 16KB page size.
- For ERP only (i.e. not counting CRM and the other Business Suite modules), the database contains about 80,000 tables and 170,000 indexes. This is because SAP ERP has many features and functions, all with their own set of tables. SAP customers typically run only a subset of all those functions so in practice a large part of those 80,000 tables will always remain empty.
- All SAP tables use datarows locking (there is an interesting historical dimension).
- All table names are in uppercase; some table names contain special characters, like the slash character in “/BCV/C_QATTR” (I don’t have a clue what that name means, BTW).
- Apart from the tables, there are also about 10,000 views. No stored procedures or triggers are used.
- SAP makes heavy use of dynamic SQL (also known as “prepared statements”).
- Many tables have a text or image column.
- All tables are owned by one database user (and that’s not the dbo user).
- The ASE database is accessed through ODBC.
- SAP makes frequent use of the built-in ASE Job Scheduler (originally added in ASE 12.5.1).
- The ASE server uses Unicode with the utf8 character set.
K21– ASE’S KERNEL DESIGN FOR THE 21ST CENTURY – ASE 15.7’s THREADED KERNEL K21
Basic Difference:
Process Kernel:
Pre-15.7 kernel (except Windows)
Each engine is a separate process
Retained in 15.7 for risk mitigation
Threaded Kernel:
Default kernel for 15.7
Each engine is a thread of a single process
Additional threads for handling I/O, etc.
ASE on Windows has always been thread based
http://www.sybase.com/files/Product_Overviews/ASE-15.7-New-Threaded-Kernel.pdf
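As a loose analogy for the threaded kernel, the sketch below runs several "engine" threads inside one operating system process, all pulling work from a shared queue. It is not ASE code; run_engines, the engine count, and the squaring task are hypothetical stand-ins. The point it shows is that every engine reports the same process ID, whereas in a process-per-engine design each engine would have its own.

```python
import os
import queue
import threading


def run_engines(tasks, engine_count=4):
    """Run engine threads inside one process over a shared work queue."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    lock = threading.Lock()

    def engine(engine_id):
        # Each engine keeps taking tasks until the shared queue is empty.
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return
            with lock:
                results.append((engine_id, os.getpid(), task * task))

    threads = [threading.Thread(target=engine, args=(i,)) for i in range(engine_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = run_engines(range(8))
    print("process ids seen:", {pid for _, pid, _ in done})
```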
Survey among Sybase and Oracle customers – Bloor’s Research .
1. License fees: 75% of respondents thought that Sybase ASE was less expensive, by an average of 28%.
2. Support costs: 78% of respondents thought that Sybase ASE was less expensive, by an average of 32%.
3. Number of database administrators: nobody thought that Oracle required fewer DBAs and 61% thought that Sybase required fewer DBAs. On average the saving was 32%.
4. Frequency of security patches: again, nobody thought that this was less frequent in the case of Oracle while 68% thought that this was the case with Sybase, with an average reduction of 22%.
5. Issue resolution: 73% thought that Sybase was faster at resolving issues, typically being 21% faster.
Reference: http://www.sybase.com/files/White_Papers/SYBASE_ASE_Bloor_Research_TCO_vs_Oracle.pdf
sybaseblog.com completed 2 yrs on Oct 31st!
All,
Last Monday, Oct 31st 2011, sybaseblog.com completed 2 years!
Thanks for all your support and wishes!
Happy Sybase Learning !
Cheers…
RIP Ritchie SIR!
Jobs used C to develop the Mac, and all the major products are built on the C language and Unix, but after his death Sir Ritchie did not get the media attention that Jobs got.
He was not the owner of any software giant and was not a businessman.
RIP Sir Ritchie. People will remember your contribution forever…
After a long illness, Dennis Ritchie, father of Unix and an esteemed computer scientist, died last weekend at the age of 70.
Ritchie, also known as “dmr”, is best known for creating the C programming language as well as being instrumental in the development of UNIX along with Ken Thompson. Ritchie spent most of his career at Bell Labs, which at the time of his joining in 1967 was part of one of the largest phone providers in the U.S. and was one of the most well-known research labs in operation.
Working alongside Thompson (who had written B) at Bell in the late sixties, the two men set out to develop a more efficient operating system for the up-and-coming minicomputer, resulting in the release of Unix (running on a DEC PDP-7) in 1971.
Though Unix was cheap and compatible with just about any machine, allowing users to install a variety of software systems, the OS was written in machine (or assembly) language, meaning that it had a small vocabulary and suffered in relation to memory.
By 1973, Ritchie and Thompson had rewritten Unix in C, developing its syntax, functionality, and beyond to give the language the ability to program an operating system. The kernel was published in the same year.
Today, C remains the second most popular programming language in the world (or at least the language in which the second most lines of code have been written), and ushered in C++ and Java; while the pair’s work on Unix led to, among other things, Linus Torvalds’ Linux. The work has without a doubt made Ritchie one of the most important, if not under-recognized, engineers of the modern era.
His work, specifically in relation to UNIX, led to him becoming a joint recipient of the Turing Award with Ken Thompson in 1983, as well as a recipient of the National Medal of Technology in 1998 from then-president Bill Clinton.
“UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity” – Dennis Ritchie, who was a genius.
Why Software is Soft?
Why is software soft? It’s a very basic question, and worth knowing the answer.
A computer is designed with a layered architecture. At the core (the inner layer) we have all the physical components (hardware).
In the beginning, we interacted directly with the hardware, mainly in 1s and 0s, or you could even say in the on/off states of a transistor. It was a really difficult way for users to interact.
To reduce the hardness of interacting with the physical components, we started layering the hardware with pieces of code.
Simply put, to remove the hardness of the hardware and make it soft for users, we used layers, which we now call software.
Thanks Prof Raman for clearing my thoughts!
ASE 15.7 for SAP® Business Suite Released in Techwave!!!
Hi Folks,
The most awaited database, ASE 15.7, was released yesterday at TechWave in Vegas:
Extremely good news for Sybase Users!!!!!!!!
http://www.sybase.com/detail?id=1094783
http://www.sybase.com/asebuiltforbusiness
http://www.ctoedge.com/content/making-database-smarter
ASE 15.7 key features include:
Management of Large Datasets
- Compression: allows large databases to be stored more compactly and reduces I/O times to ensure high performance on even the largest databases.
- Reduced Query Latency: helps better handle large data sets, especially those which use dynamic SQL for interactive data retrieval.
- Replication Performance: increases the performance of Sybase’s industry-leading transaction replication and syncing technology.
- Enhancements for Parallel Hardware: improves optimization of multi-core/multi-threaded CPU architectures to get the maximum performance out of today’s latest processors.
Simplified Administration
- Online Operations: increases data availability while allowing data to be optimized for application performance.
- Extended Diagnostics: allows DBAs to quickly pinpoint performance bottlenecks and speed customer support requests.
- Strong Password Encryption: protects the database from external intrusion and hacking.
- Single Sign-on & Login Profiles: makes it easier to manage large numbers of users and simplifies end-user access to the system.
Ease of Application Development
- Efficient Management of Large Objects: inline management of large objects as well as enhanced application development features such as large objects as parameters to stored procedures.
- Enhanced Application Language Capabilities: many improved Transact-SQL™ language features to increase productivity of application developers as well as support for a variety of popular languages such as Python, PHP and Perl.
- An Enterprise-Class DB for ISV applications: enhanced business-critical performance for ISV applications out of the box, enabling ISVs to easily write and port their applications to ASE 15.7.
