Posts by sybanva
Petabyte Size Data Store Managed by Hadoop & Map Reduce.
Hadoop
——–
Source : http://hadoop.apache.org/ & www.
Today, we’re surrounded by data. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Machines, too, are generating and keeping more and more data. You may even be reading this book as digital data on your computer screen, and certainly your purchase of this book is recorded as data with some retailer.
The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. They needed to go through terabytes and petabytes of data to figure out which websites were popular, what books were in demand, and what kinds of ads appealed to people. Existing tools were becoming inadequate to process such large data sets. Google was the first to publicize MapReduce—a system they had used to scale their data processing needs.
This system aroused a lot of interest because many other businesses were facing similar scaling challenges, and it wasn’t feasible for everyone to reinvent their own proprietary tool. Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system called Hadoop. Soon after, Yahoo and others rallied around to support this effort.
What is Hadoop ?
————–
Hadoop is an open source framework for writing and running distributed applications that process large amounts of data. Distributed computing is a wide and varied field, but the key distinctions of Hadoop are that it is
1. Accessible: Hadoop runs on large clusters of commodity machines or on cloud computing services such as Amazon’s Elastic Compute Cloud (EC2).
2. Robust: Because it is intended to run on commodity hardware, Hadoop is architected with the assumption of frequent hardware malfunctions. It can gracefully handle most such failures.
3. Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
4. Simple: Hadoop allows users to quickly write efficient parallel code.
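To make the "simple parallel code" point concrete, here is a minimal sketch of the MapReduce programming model in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process. It only shows the shape of a job: map emits key/value pairs, the framework groups them by key, and reduce combines each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs on commodity machines",
            "Hadoop scales by adding machines"]
    print(word_count(docs))
```

In a real cluster the map and reduce calls would run on different nodes and the grouping step would move data across the network; the user-written part stays about this small.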
Comparing SQL databases and Hadoop:
————————————
Given that Hadoop is a framework for processing data, what makes it better than standard relational databases, the workhorse of data processing in most of today’s applications? One reason is that SQL (structured query language) is by design targeted at structured data. Many of Hadoop’s initial applications deal with unstructured data such as text. From this perspective Hadoop provides a more general paradigm than SQL.
For working only with structured data, the comparison is more nuanced. In principle, SQL and Hadoop can be complementary, as SQL is a query language which can be implemented on top of Hadoop as the execution engine. But in practice, SQL databases tend to refer to a whole set of legacy technologies, with several dominant vendors, optimized for a historical set of applications. Many of these existing commercial databases are a mismatch to the requirements that Hadoop targets.
Some production implementations of Hadoop:
——————————————————
Complete List @ http://wiki.apache.org/hadoop/PoweredBy
Sybase IQ
———
Sybase IQ : http://www.computerworld.com/s/article/9221355/Updated_Sybase_IQ_supports_Hadoop_MapReduce_Big_Data_
EBay
—-
532-node cluster (8 * 532 cores, 5.3PB).
Heavy usage of Java MapReduce, Pig, Hive, HBase
Using it for Search optimization and Research.
Facebook
——-
We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
Currently we have 2 major clusters:
A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
A 300-machine cluster with 2400 cores and about 3 PB raw storage.
Each (commodity) node has 8 cores and 12 TB of storage.
We are heavy users of both streaming and the Java APIs. Using these features, we have built a higher-level data warehousing framework called Hive (see http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
LinkedIn
———
We have multiple grids divided up based upon purpose.
Hardware:
120 Nehalem-based Sun x4275, with 2×4 cores, 24GB RAM, 8x1TB SATA
580 Westmere-based HP SL 170x, with 2×4 cores, 24GB RAM, 6x2TB SATA
1200 Westmere-based SuperMicro X8DTT-H, with 2×6 cores, 24GB RAM, 6x2TB SATA
Software:
CentOS 5.5 -> RHEL 6.1
Sun JDK 1.6.0_14 -> Sun JDK 1.6.0_20 -> Sun JDK 1.6.0_26
Apache Hadoop 0.20.2+patches -> Apache Hadoop 0.20.204+patches
Pig 0.9 heavily customized
Azkaban for scheduling
Hive, Avro, Kafka, and other bits and pieces…
Twitter
——–
We use Hadoop to store and process tweets, log files, and many other types of data generated across Twitter. We use Cloudera’s CDH2 distribution of Hadoop, and store all data as compressed LZO files.
We use both Scala and Java to access Hadoop’s MapReduce APIs
We use Pig heavily for both scheduled and ad-hoc jobs, due to its ability to accomplish a lot with few statements.
We employ committers on Pig, Avro, Hive, and Cassandra, and contribute much of our internal Hadoop work to opensource (see hadoop-lzo)
For more on our use of Hadoop, see the following presentations: Hadoop and Pig at Twitter and Protocol Buffers and Hadoop at Twitter
Yahoo!
——–
More than 100,000 CPUs in >40,000 computers running Hadoop
Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)
Used to support research for Ad Systems and Web Search
Also used to do scaling tests to support development of Hadoop on larger clusters
Our Blog – Learn more about how we use Hadoop.
>60% of Hadoop Jobs within Yahoo are Pig jobs.
Implementation of Function String in Sybase Replication Server(SRS)
These experiences were shared by the senior DBAs named below. I hope they help you understand function strings from an implementation point of view in a replication environment:
Craig Oakley , Senior DBA.
—————————--
We used function strings when we wanted to replicate all columns to some servers and only selected columns to other (web-facing) servers. This was particularly useful before Rep Server allowed multiple RepDefs on the same table. One concern was text columns that were not being replicated to the web-facing server: we had to create a function string to get a text pointer (we used a one-row table and just updated all the text columns on top of each other, as the value was not needed on that server). Failure to get a text pointer caused the DSI to go down, and we could not specify that as a condition to ignore.
Beyond this, I would imagine function strings could help specify how you want the update to be done, which could be a performance improvement. It would also allow for a different implementation at the replicate than there is at the primary (such as a table at the primary being two joined tables at the replicate).
Sukhesh Nair, Senior Sybase DBA
———————————–
We used to have a setup where data was replicated from Sybase to Oracle as well as to a warm standby Sybase server. Rep Server function strings helped in filtering the data that needed to be passed to Oracle. They helped immensely in streamlining the data flow to targets by manipulating the incoming data through function strings. I feel this is one of the most advanced and useful, yet least used, capabilities of Sybase Rep Server.
The deterrent could be because of the complexity it would introduce to the replication system. The setup we had worked wonderfully and never gave us any major problems. Without proper monitoring (which needs to be scripted by DBAs) it used to be hard to maintain. Many of the current Rep Server administrators I see do not have adequate knowledge or experience of handling function strings.
Rey Wang , Senior Sybase DBA
————————-
You can map a delete to a no-op with a function string.
Partha Gogoi Senior DBA
————————-
We use function strings to transform data at the replicate. We have databases being replicated from Toronto and New York to London, Sydney and Singapore, and the client ids are transformed at the replicate because, as per business requirements, the client ids are different at each site. Of course, having a universal client id would simplify things, but the systems and databases at each site grew independently until replication was set up, and it would be a lot of rework to change all the client ids at the replicate sites.
Øystein Grinaker Senior DBA
—————————
A Function String could be used to change default behaviour.
Say you delete a row in a table on the PDB, but you do not want to delete the row on the RDB. Then make a change in rs_delete: you can make rs_delete perform just a logical delete by updating a delete marker for that specific row.
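The logical-delete idea can be modelled outside Replication Server. The sketch below is a conceptual illustration in Python with SQLite, not Replication Server function-string syntax: the `clients` table, the `deleted` marker column and the `apply_at_replicate` function are all hypothetical names. It shows only the behaviour being described, where an incoming delete is applied at the replicate as an update of a marker column.

```python
import sqlite3


def make_replicate():
    """Create an in-memory stand-in for the replicate database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients ("
        "id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO clients (id, name) VALUES (?, ?)",
        [(1, "alpha"), (2, "beta")],
    )
    return conn


def apply_at_replicate(conn, operation, key):
    """Apply a replicated operation, overriding the default delete behaviour."""
    if operation == "delete":
        # Keep the row and set a delete marker instead of removing it.
        conn.execute("UPDATE clients SET deleted = 1 WHERE id = ?", (key,))
    else:
        raise ValueError(f"unsupported operation: {operation}")


if __name__ == "__main__":
    replicate = make_replicate()
    apply_at_replicate(replicate, "delete", 1)
    print(replicate.execute(
        "SELECT id, name, deleted FROM clients ORDER BY id").fetchall())
```

Mapping the delete to a no-op, as mentioned above, would be the same model with the update statement left out.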
Source : Linkedin.com
Multi-Path Replication (MPR) technology : Replication Server 15.7
The imminent release of Replication Server 15.7 continues to push the envelope and maintain its leading edge by introducing new Multi-Path Replication (MPR) technology.
So, what is MPR? MPR improves replication performance and reduces latency by enabling parallel paths of data from the source database to the target database. These parallel paths will process data independently of each other to improve overall efficiency, performance and load balancing.
Full Source @ http://blogs.sybase.com/zhangb/2011/12/replication-server-improves-performance-and-reduces-latency-with-mpr/#respond
Note :
What about the order of transactions, which needs to be maintained at the target side?
Even though transactions can arrive rapidly at the target, they must be applied in order.
Commit order is maintained within single path. To increase performance on a single path, one can employ parallel DSI, Bulk copy and HVAR features RS has introduced in earlier releases. To take advantage of MPR, users need to fully understand application schema to divide them as commit order is not guaranteed among paths.
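The ordering rule can be illustrated with a small model. This is a sketch under stated assumptions, not how Replication Server implements MPR: the routing by table name, the `path_of_table` mapping and both function names are hypothetical. It shows that when a commit-ordered stream is split into paths, each path keeps its own commit order, while nothing constrains how the paths interleave with each other.

```python
from collections import defaultdict


def route_to_paths(transactions, path_of_table):
    """Split a commit-ordered stream into per-path queues.

    Each transaction is a (commit_seq, table, payload) tuple. Appending in
    arrival order preserves commit order inside every path.
    """
    paths = defaultdict(list)
    for commit_seq, table, payload in transactions:
        paths[path_of_table[table]].append((commit_seq, table, payload))
    return dict(paths)


def is_commit_ordered(queue):
    """Check that one path's queue is in ascending commit sequence."""
    seqs = [seq for seq, _, _ in queue]
    return seqs == sorted(seqs)


if __name__ == "__main__":
    stream = [
        (1, "orders", "insert"),
        (2, "audit", "insert"),
        (3, "orders", "update"),
        (4, "audit", "insert"),
    ]
    # Hypothetical split: tables that depend on each other share a path.
    mapping = {"orders": "path_a", "audit": "path_b"}
    for name, queue in route_to_paths(stream, mapping).items():
        print(name, queue, is_commit_ordered(queue))
```

This is why the schema has to be understood before dividing it: two tables whose changes must be applied in a fixed relative order belong on the same path.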
Sybase Interview Questions
The same list has been updated at http://sybaseblog.com/interviewquestions/
How can we configure the dbcc database?
How can you configure sybsecurity?
Have you ever worked on a terabyte-size database? How do you take backups of it?
What is the difference between MSA and WS? Can we consider MSA as a WS?
You are not able to execute any command in ASE because tempdb is full, and you can't create a user-defined tempdb on the fly. How will you investigate?
What are the new features of Sybase ASE 15?
What are the different options available with reorg?
Why do we require reorg?
Suppose everything is fine in the replication environment but data is not replicating. How will you troubleshoot this?
What is gen id in rep server?
How can you check the latency in the replication environment?
What is HA in Sybase? How can we monitor the HA status?
Sybase IQ : Architecture & Benefits
Sybase IQ ??
================
Sybase® IQ is a high-performance decision-support server designed specifically for data warehousing.
Sybase IQ is part of the Sybase product family that includes Adaptive Server Enterprise and SQL Anywhere. Component Integration Services within Sybase IQ provide direct access to relational and nonrelational databases on mainframe, UNIX, or Windows servers.
Architecture ??
===============
Sybase IQ architecture differs from most relational databases. Sybase IQ focuses on readers, not writers, which provides a fast query response for many users.
Data is stored in columns, not rows
Placing indexes on all columns provides a performance advantage
A large page size provides a performance advantage
A large temporary cache provides a performance advantage for most operations
Access to data occurs at the table level
Most query results focus on data at the table level
Most insertions and deletions write data for an entire table, not for a single row.
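The column-versus-row point above can be shown with a toy example. This is only a sketch of the general idea of column-oriented storage, not Sybase IQ's storage format or indexing; the sample data and the function names `to_columns` and `total` are invented for illustration.

```python
def to_columns(rows):
    """Pivot a list of row dictionaries into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total(columns, name):
    """Aggregate a single column without touching any other column."""
    return sum(columns[name])


if __name__ == "__main__":
    # Row layout: every record carries all of its fields together.
    rows = [
        {"region": "east", "units": 10, "price": 2.5},
        {"region": "west", "units": 4, "price": 3.0},
        {"region": "east", "units": 7, "price": 2.5},
    ]
    columns = to_columns(rows)
    # A query such as "total units" reads only the 'units' list.
    print(total(columns, "units"))
```

With the row layout, the same aggregate would have to walk every full record; with the column layout it reads one contiguous list, which is the kind of access pattern a read-focused warehouse favours.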
Benefits ??
=========
Sybase IQ is a decision support system optimized to deliver superior performance for mission-critical business solutions.
Intelligent query processing that uses index-only access plans to process any type of query.
Ad hoc query performance on uniprocessor and parallel systems.
Multiplex capability for managing large query loads in a multi-server configuration.
Fully-flexible schema support.
Efficient query execution without query-specific tuning under most circumstances.
Fast initial and incremental loading.
Fast aggregations, counts, comparisons of data.
Parallel processing optimized for multi-user environments.
Stored procedures.
Increased productivity due to reduced query time.
Entire database and indexing stored in less space than raw data.
Reduced input/output (I/O).
Happy,Healthy & Successful New Year 2012
Wishing you a happy, healthy and successful New Year 2012! Happy learning Sybase!
ASE database for SAP ERP
Hello all,
This is a compilation of points about Sybase ASE on SAP from Rob’s blog:
Read Full Story : http://blogs.sybase.com/database/2011/12/so-what-does-an-ase-database-look-like-in-sap-erp/
- SAP has released Business Suite on ASE version 15.7.
- All SAP application data resides in a single ASE database. There is another small database for use by SAP tools.
- The ASE database uses a 16KB page size.
- For ERP only (i.e. not counting CRM and the other Business Suite modules), the database contains about 80,000 tables and 170,000 indexes. This is because SAP ERP has many features and functions, all with their own set of tables. SAP customers typically run only a subset of all those functions so in practice a large part of those 80,000 tables will always remain empty.
- All SAP tables use datarows locking (there is an interesting historical dimension).
- All tables names are in uppercase; some table names contain special characters, like the slash character in “/BCV/C_QATTR” (I don’t have a clue what that name means, BTW)
- Apart from the tables, there are also about 10,000 views. No stored procedures or triggers are used.
- SAP makes heavy use of dynamic SQL (also known as “prepared statements”).
- Many tables have a text or image column.
- All tables are owned by one database user (and that’s not the dbo user).
- The ASE database is accessed through ODBC.
- SAP makes frequent use of the built-in ASE Job Scheduler (originally added in ASE 12.5.1).
- The ASE server uses Unicode with the utf8 character set.
What’s in YOUR Architecture?
What’s in your architecture?
Probably not enough if you’re using just data modeling to integrate all the components of your enterprise, especially your information. To lay the
foundation of this paper, let’s start from the same level of understanding: the need for a common approach to managing all the aspects of information to
enable rapid business performance in the 21st century economy.
Competition in the marketplace is always fierce. To stay abreast, organizations must constantly analyze their customer needs and expectations, enhance
or innovate their business processes, and deliver products and services that create exceptional customer value. Organizations also need to be customer-centric to forge long-term relationships with clients and consumers. Only organizations that are agile enough to respond to volatile market conditions
with innovation, expedited time-to-market processes, and reduced costs can differentiate themselves from the competition. Such agility occurs when a
company’s IT operations are closely aligned with its business operations. IT needs to understand business to implement technologies and applications that
support the current and future business goals.
Read Full Article @ http://www.sybase.com/files/White_Papers/Sybase_Whats_in_Your_Architecture_WP.pdf
