Posts tagged News

Petabyte-Size Data Store Managed by Hadoop & MapReduce.


Hadoop
------

Source : http://hadoop.apache.org/ & www.
Today, we’re surrounded by data. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Machines, too, are generating and keeping more and more data. You may even be reading this book as digital data on your computer screen, and certainly your purchase of this book is recorded as data with some retailer.

The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. They needed to go through terabytes and petabytes of data to figure out which websites were popular, what books were in demand, and what kinds of ads appealed to people. Existing tools were becoming inadequate to process such large data sets. Google was the first to publicize MapReduce—a system they had used to scale their data processing needs.

This system aroused a lot of interest because many other businesses were facing similar scaling challenges, and it wasn’t feasible for everyone to reinvent their own proprietary tool. Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system called Hadoop. Soon after, Yahoo and others rallied to support this effort.

What is Hadoop?
---------------

Hadoop is an open source framework for writing and running distributed applications that process large amounts of data. Distributed computing is a wide and varied field, but the key distinctions of Hadoop are that it is

1. Accessible: Hadoop runs on large clusters of commodity machines or on cloud computing services such as Amazon’s Elastic Compute Cloud (EC2).
2. Robust: Because it is intended to run on commodity hardware, Hadoop is architected with the assumption of frequent hardware malfunctions. It can gracefully handle most such failures.
3. Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
4. Simple: Hadoop allows users to quickly write efficient parallel code.
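To make the MapReduce model concrete, here is a minimal, single-process Python sketch of the classic word-count job. It is not Hadoop’s API (real jobs use Hadoop’s Java Mapper/Reducer classes or Hadoop Streaming); the function names are hypothetical, and the three phases run one after another in one process, whereas Hadoop spreads the map and reduce work across the nodes of the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each input record is mapped independently, which is what lets Hadoop
    # run many mappers in parallel on the nodes holding the data.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Each key is reduced independently, so reducers can also run in parallel.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

The user only writes the map and reduce functions; partitioning the input, grouping by key, and rerunning tasks on failed nodes are the framework’s job, which is where the "Simple" and "Robust" properties come from.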

Comparing SQL databases and Hadoop
----------------------------------

Hadoop is a framework for processing data. What makes it better than standard relational databases, the workhorse of data processing in most of today’s applications? One reason is that SQL (structured query language) is by design targeted at structured data. Many of Hadoop’s initial applications deal with unstructured data such as text. From this perspective, Hadoop provides a more general paradigm than SQL.

When working only with structured data, the comparison is more nuanced. In principle, SQL and Hadoop can be complementary, as SQL is a query language that can be implemented on top of Hadoop as the execution engine. In practice, however, SQL databases tend to refer to a whole set of legacy technologies, with several dominant vendors, optimized for a historical set of applications. Many of these existing commercial databases are a mismatch for the requirements that Hadoop targets.
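As a rough illustration of SQL running on top of a MapReduce engine (the idea behind tools such as Hive), a query like SELECT region, SUM(amount) FROM sales GROUP BY region can be expressed as a mapper that emits (region, amount) and a reducer that sums each group. The sketch below is a hypothetical single-process Python model: the "sales" table, its columns, and the data are made up, and it is not how Hive actually compiles queries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical "sales" rows: (region, amount). Illustrative data only.
Row = Tuple[str, float]


def mapper(row: Row) -> Iterator[Tuple[str, float]]:
    """Project the GROUP BY key and the column being aggregated."""
    region, amount = row
    yield region, amount


def reducer(region: str, amounts: List[float]) -> Tuple[str, float]:
    """Apply the aggregate function (SUM) to one group."""
    return region, sum(amounts)


def group_by_sum(rows: Iterable[Row]) -> Dict[str, float]:
    """Equivalent of: SELECT region, SUM(amount) FROM sales GROUP BY region."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)  # the shuffle step groups values by key
    return dict(reducer(k, v) for k, v in groups.items())


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
    print(group_by_sum(sales))
```

A WHERE clause would become a filter inside the mapper, and other aggregates (COUNT, MAX) only change the reducer.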

Some Production Implementations of Hadoop
-----------------------------------------

Complete List @ http://wiki.apache.org/hadoop/PoweredBy

Sybase IQ
---------
Sybase IQ : http://www.computerworld.com/s/article/9221355/Updated_Sybase_IQ_supports_Hadoop_MapReduce_Big_Data_

eBay
----

A 532-node cluster (8 × 532 = 4,256 cores, 5.3 PB).
Heavy usage of Java MapReduce, Pig, Hive, HBase
Used for search optimization and research.

Facebook
--------

We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.

Currently we have 2 major clusters:

A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
A 300-machine cluster with 2400 cores and about 3 PB raw storage.
Each (commodity) node has 8 cores and 12 TB of storage.
We are heavy users of both streaming and the Java APIs. We have built a higher-level data warehousing framework using these features called Hive (see http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
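With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines on standard input and write lines on standard output; by default the text before the first tab is the key and the rest is the value, and the framework sorts the mapper output by key before the reducer sees it. A minimal Python sketch of such a pair (word count, in one script that takes a hypothetical map/reduce argument) might look like this, assuming that default tab-separated convention.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def run_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'key<TAB>value' line per word (text before the tab is the key)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def run_reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; assumes input arrives sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    # Hypothetical local test: cat input.txt | python wc.py map | sort | python wc.py reduce
    stage = run_mapper if sys.argv[1:] == ["map"] else run_reducer
    for out in stage(sys.stdin):
        print(out)
```

Piping the mapper through sort and into the reducer is a cheap way to test such scripts locally, because sort stands in for the shuffle that Hadoop performs between the two stages.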

LinkedIn
--------

We have multiple grids divided up based upon purpose.
Hardware:
120 Nehalem-based Sun x4275, with 2×4 cores, 24GB RAM, 8x1TB SATA
580 Westmere-based HP SL 170x, with 2×4 cores, 24GB RAM, 6x2TB SATA
1200 Westmere-based SuperMicro X8DTT-H, with 2×6 cores, 24GB RAM, 6x2TB SATA
Software:
CentOS 5.5 -> RHEL 6.1
Sun JDK 1.6.0_14 -> Sun JDK 1.6.0_20 -> Sun JDK 1.6.0_26
Apache Hadoop 0.20.2+patches -> Apache Hadoop 0.20.204+patches
Pig 0.9 heavily customized
Azkaban for scheduling
Hive, Avro, Kafka, and other bits and pieces…

Twitter
-------

We use Hadoop to store and process tweets, log files, and many other types of data generated across Twitter. We use Cloudera’s CDH2 distribution of Hadoop, and store all data as compressed LZO files.

We use both Scala and Java to access Hadoop’s MapReduce APIs
We use Pig heavily for both scheduled and ad-hoc jobs, due to its ability to accomplish a lot with few statements.
We employ committers on Pig, Avro, Hive, and Cassandra, and contribute much of our internal Hadoop work to open source (see hadoop-lzo)
For more on our use of Hadoop, see the following presentations: Hadoop and Pig at Twitter and Protocol Buffers and Hadoop at Twitter

Yahoo!
------

More than 100,000 CPUs in >40,000 computers running Hadoop
Our biggest cluster: 4500 nodes (boxes with 2 × 4 CPUs, 4 × 1 TB disk and 16 GB RAM)
Used to support research for Ad Systems and Web Search
Also used to do scaling tests to support development of Hadoop on larger clusters
Our Blog – Learn more about how we use Hadoop.
>60% of Hadoop Jobs within Yahoo are Pig jobs.

 


ASE database for SAP ERP


Hello all,

Here is a compilation of notes on Sybase ASE for SAP from Rob’s blog:

Read Full Story : http://blogs.sybase.com/database/2011/12/so-what-does-an-ase-database-look-like-in-sap-erp/

  • SAP has released Business Suite on ASE version 15.7.
  • All SAP application data resides in a single ASE database. There is another small database for use by SAP tools.
  • The ASE database uses a 16KB page size.
  • For ERP only (i.e. not counting CRM and the other Business Suite modules), the database contains about 80,000 tables and 170,000 indexes. This is because SAP ERP has many features and functions, all with their own set of tables. SAP customers typically run only a subset of all those functions so in practice a large part of those 80,000 tables will always remain empty.
  • All SAP tables use datarows locking (there is an interesting historical dimension to this).
  • All table names are in uppercase; some table names contain special characters, like the slash character in “/BCV/C_QATTR” (I don’t have a clue what that name means, BTW).
  • Apart from the tables, there are also about 10,000 views. No stored procedures or triggers are used.
  • SAP makes heavy use of dynamic SQL (also known as “prepared statements”).
  • Many tables have a text or image column.
  • All tables are owned by one database user (and that’s not the dbo user).
  • The ASE database is accessed through ODBC.
  • SAP makes frequent use of the built-in ASE Job Scheduler (originally added in ASE 12.5.1).
  • The ASE server uses Unicode with the utf8 character set.
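Two of the points above, table names containing special characters and heavy use of prepared statements, can be illustrated with Python’s built-in sqlite3 module standing in for ASE. This is only an analogy: SAP reaches ASE over ODBC, which this sketch does not use, and the column names are hypothetical. A name like /BCV/C_QATTR has to be written as a quoted identifier, and a prepared statement keeps its SQL text fixed, with ? placeholders, while only the parameter values change between executions.

```python
import sqlite3

# SQLite stands in for ASE; the identifier is quoted because it contains '/'.
TABLE = '"/BCV/C_QATTR"'  # table name from the post; columns below are hypothetical


def load_and_query() -> list:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            f"CREATE TABLE {TABLE} (CLIENT TEXT, ATTR_NAME TEXT, ATTR_VALUE INTEGER)"
        )
        # One parameterized INSERT, executed once per row with different values.
        insert_sql = f"INSERT INTO {TABLE} (CLIENT, ATTR_NAME, ATTR_VALUE) VALUES (?, ?, ?)"
        conn.executemany(insert_sql, [("100", "A", 1), ("100", "B", 2), ("200", "A", 3)])
        # Same idea for reads: the SQL text stays fixed, only the parameter changes.
        select_sql = f"SELECT ATTR_NAME, ATTR_VALUE FROM {TABLE} WHERE CLIENT = ? ORDER BY ATTR_NAME"
        return conn.execute(select_sql, ("100",)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(load_and_query())
```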

sybaseblog.com completed 2 yrs on Oct 31st!


All,

Last Monday, Oct 31st 2011, sybaseblog.com completed 2 years!

Thanks for all your support and wishes!

Happy Sybase Learning !

Cheers…

Isolation Level – Summarized.


Isolation Level?
================

Data concurrency: many users can access the data at the same time.

Data consistency: each user sees a consistent view of the data, including visible changes made by the user’s own transactions and by transactions of other users.

Isolation: a property that defines how and when the changes made by one operation become visible to other concurrent operations. Isolation is one of the ACID properties.

Lower isolation levels increase transaction concurrency at the risk of allowing transactions to observe a fuzzy or incorrect database state. Your application design has to account for these incorrect states.

4 Isolation Levels:
===================

The ANSI/ISO SQL-92 specifications define four isolation levels:

(1) READ UNCOMMITTED.
(2) READ COMMITTED.
(3) REPEATABLE READ.
(4) SERIALIZABLE.

Lower isolation level -> higher concurrency, lower data consistency, less locking overhead.
Higher isolation level -> lower concurrency, higher data consistency, possibly more deadlocks in a multi-user environment.

Three preventable phenomena
===========================

P1 (Dirty Read): Transaction T1 modifies a data item. Another transaction T2 then reads that data item before T1 performs a COMMIT or ROLLBACK. If T1 then performs a ROLLBACK, T2 has read a data item that was never committed and so never really existed.

P2 (Non-repeatable or Fuzzy Read): Transaction T1 reads a data item. Another transaction T2 then modifies or deletes that data item and commits. If T1 then attempts to reread the data item, it receives a modified value or discovers that the data item has been deleted.

P3 (Phantom): Transaction T1 reads a set of data items satisfying some search condition. Transaction T2 then creates data items that satisfy T1’s search condition and commits. If T1 then repeats its read with the same search condition, it gets a set of data items different from the first read.
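The first two phenomena can be reproduced with a deliberately simplified, hypothetical Python model: a single data item with no locking, where a reader either may or may not see other transactions’ uncommitted writes. Real engines such as ASE enforce isolation with locks (or row versions), so this only sketches the anomalies, not how any database implements isolation levels. P3 is left out because it needs a set of rows and a search condition.

```python
from typing import Dict, Tuple


class ToyStore:
    """Hypothetical single-item store with no locking, used only to illustrate P1 and P2."""

    def __init__(self, value: int) -> None:
        self.committed = value
        self.pending: Dict[str, int] = {}  # uncommitted writes, keyed by transaction id

    def write(self, txn: str, value: int) -> None:
        self.pending[txn] = value

    def commit(self, txn: str) -> None:
        self.committed = self.pending.pop(txn)

    def rollback(self, txn: str) -> None:
        self.pending.pop(txn, None)

    def read(self, txn: str, read_uncommitted: bool) -> int:
        if txn in self.pending:  # a transaction always sees its own writes
            return self.pending[txn]
        if read_uncommitted and self.pending:  # may see another transaction's uncommitted write
            return next(iter(self.pending.values()))
        return self.committed


def dirty_read_demo() -> Tuple[int, int]:
    """P1: T2 reads T1's uncommitted value, then T1 rolls back."""
    store = ToyStore(100)
    store.write("T1", 999)
    seen_by_t2 = store.read("T2", read_uncommitted=True)
    store.rollback("T1")
    return seen_by_t2, store.committed  # (999, 100): T2 saw a value that never existed


def non_repeatable_read_demo() -> Tuple[int, int]:
    """P2: T1 reads twice at READ COMMITTED while T2 updates and commits in between."""
    store = ToyStore(100)
    first = store.read("T1", read_uncommitted=False)
    store.write("T2", 150)
    store.commit("T2")
    second = store.read("T1", read_uncommitted=False)
    return first, second  # (100, 150): the same read returns different values
```

Passing read_uncommitted=False in the first demo would make T2 read 100 instead, which is exactly what READ COMMITTED guarantees; preventing P2 as well requires keeping T1’s read stable (for example by holding a shared lock until T1 ends), which this toy model does not do.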

-------------------------------------------------------------------
Isolation Level     Dirty Read     Nonrepeatable Read  Phantom Read
-------------------------------------------------------------------
Read uncommitted    Possible       Possible            Possible
Read committed      Not possible   Possible            Possible
Repeatable read     Not possible   Not possible        Possible
Serializable        Not possible   Not possible        Not possible
-------------------------------------------------------------------
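For code that has to reason about these guarantees (for example, a test harness picking the weakest acceptable level), the table can be encoded directly as data. The names below are hypothetical, and the table only states what SQL-92 permits at each level; a particular product, ASE included, may prevent more than this minimum.

```python
from typing import List

# The SQL-92 table above, encoded as data: True means the phenomenon is possible.
PHENOMENA = ("dirty read", "nonrepeatable read", "phantom read")

ALLOWED = {
    "READ UNCOMMITTED": (True, True, True),
    "READ COMMITTED": (False, True, True),
    "REPEATABLE READ": (False, False, True),
    "SERIALIZABLE": (False, False, False),
}

LEVELS = list(ALLOWED)  # ordered from lowest to highest isolation


def is_possible(level: str, phenomenon: str) -> bool:
    return ALLOWED[level.upper()][PHENOMENA.index(phenomenon.lower())]


def prevented_by(level: str) -> List[str]:
    return [p for p, possible in zip(PHENOMENA, ALLOWED[level.upper()]) if not possible]


def weakest_level_preventing(*phenomena: str) -> str:
    """Return the lowest SQL-92 level that rules out all of the given phenomena."""
    for level in LEVELS:
        if not any(is_possible(level, p) for p in phenomena):
            return level
    return LEVELS[-1]


# Each step up prevents everything the level below it prevents, plus one more phenomenon.
for lower, higher in zip(LEVELS, LEVELS[1:]):
    assert set(prevented_by(lower)) < set(prevented_by(higher))
```

For instance, an application that can tolerate phantoms but not dirty or non-repeatable reads would ask for weakest_level_preventing("dirty read", "nonrepeatable read"), which returns REPEATABLE READ, and so trades the least concurrency for the consistency it needs.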

ASE 15.7 for SAP® Business Suite Released in Techwave!!!


Hi Folks,

The most awaited database release, ASE 15.7, was released yesterday at TechWave in Las Vegas:

Extremely good news for Sybase Users!!!!!!!!

http://www.sybase.com/detail?id=1094783

http://www.sybase.com/asebuiltforbusiness

http://www.ctoedge.com/content/making-database-smarter

http://blogs.sybase.com/tradingandrisk/2011/09/sybase-unveils-latest-ase-at-techwave-2011-in-las-vegas/

ASE 15.7 key features include:

Management of Large Datasets

  • Compression: allows large databases to be stored more compactly and reduces I/O times to ensure high performance on even the largest databases.
  • Reduced Query Latency: helps better handle large data sets, especially those which use dynamic SQL for interactive data retrieval.
  • Replication Performance: increases the performance of Sybase’s industry-leading transaction replication and syncing technology.
  • Enhancements for Parallel Hardware: improves optimization of multi-core/multi-threaded CPU architectures to get the maximum performance out of today’s latest processors.

Simplified Administration

  • Online Operations: increases data availability while allowing data to be optimized for application performance.
  • Extended Diagnostics: allows DBAs to quickly pinpoint performance bottlenecks and speed up customer support requests.
  • Strong Password Encryption: protects the database from external intrusion and hacking.
  • Single Sign-on & Login Profiles: makes it easier to manage large numbers of users and simplifies end-user access to the system.

Ease of Application Development

  • Efficient Management of Large Objects: inline management of large objects as well as enhanced application development features such as large objects as parameters to stored procedures.
  • Enhanced Application Language Capabilities: many improved Transact-SQL language features to increase the productivity of application developers, as well as support for a variety of popular languages such as Python, PHP and Perl.
  • An Enterprise-Class DB for ISV applications: enhanced business-critical performance for ISV applications out of the box, enabling ISVs to easily write and port their applications to ASE 15.7.

bcp copy in failed


Hi Folks,

A few days back, we faced an issue with bcp in: the message was “bcp copy in failed”, even though the rows were being inserted successfully. The user we were using was the dbo of the database and had all permissions.

We tried another login ID, which was aliased to dbo, and it worked fine.

We tried removing the user, adding it as a dbo alias, and granting explicit permissions, but none of that worked.

It was totally weird, as everything else looked fine, and we could not investigate further because the error message was not very explanatory.

I should also mention that we had refreshed this database from a 12.5.4 environment to a 15.0.3 environment.

Finally, we dropped the user and the login account and added them again: we created a new login account with the same name and then added it to the database as a dbo alias. Now it works fine.
I am still not sure what caused the issue.

If failure is my destiny, I would like to come top in failures!!


- If 100 plans fail, I will make 200 the next day!

- If failure is my destiny, I would like to come top in failures!!

Sybase: PowerBuilder and PowerDesigner

  • Sybase PowerBuilder is a RAD tool that lets you develop rich GUI applications, based on the DataWindow concept. I’ve been told that part of the SAP POS module is actually developed with PowerBuilder.
  • Sybase PowerDesigner is a modeling tool that lets you handle anything from a data model to a full enterprise architecture, and is widely seen as one of the best such tools around.

Source : http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/25286
Neither PowerBuilder nor PowerDesigner is tied to any specific database; both work with most common database brands.


17,700+ Hits on sybaseblog.com!


Hi Guys,

Today your blog crossed 17,700+ hits!!

I would like to congratulate you and thank you for your continued support!!!

Thanks once again!

Cheers!!

AnVa

New Parallel Distributed Query And Advanced Workload Management Capabilities in IQ15.3


Source : http://www.sybase.com/detail?id=1093604&contentOnly=true

Sybase, Inc., an SAP® company (NYSE: SAP) and industry leader in enterprise and mobile software, today announced the general availability of Sybase® IQ 15.3, powered by a new generation of shared everything Massively Parallel Processing (MPP) technology. With this release, enterprise IT departments can overcome scalability limitations of today’s data warehouses in many industries including financial services, telecommunications, information providers, healthcare and insurance. By implementing a business analytics information solution that allows sharing of computing and data resources with the innovative Sybase IQ PlexQ™ technology, enterprises can lead the next wave of data warehouse transformation by breaking down user and information silos to drive analytics adoption throughout the entire organization.

Sybase IQ 15.3 key new features include: (more…)
